Ultra-fast edge model for data extraction and tool use

ollama run LiquidAI/lfm2.5-350m:q6_k

Details

Updated 4 weeks ago · bb805c7f66ed · 293MB

  • Architecture: lfm2
  • Parameters: 354M
  • Quantization: Q6_K
  • Template (truncated in page source): <|startoftext|>{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if .Prompt }}<
  • System prompt: You are a helpful assistant trained by Liquid AI.
  • Default parameters (truncated in page source): { "repeat_penalty": 1.05, "stop": [ "<|im_start|>", "<|im_end|>", "<

Readme

Liquid AI

LFM2.5-350M

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

  • Best-in-class performance: A 350M model rivaling much larger models, bringing high-quality AI to your pocket.
  • Fast edge inference: 313 tok/s decode on an AMD CPU and 188 tok/s on Snapdragon Gen4. Runs in under 1GB of memory with day-one support for llama.cpp, MLX, and vLLM.
  • Scaled training: Extended pre-training from 10T to 28T tokens and large-scale multi-stage reinforcement learning.

Find more information about LFM2.5-350M in our blog post.

🗒️ Model Details

LFM2.5-350M is a general-purpose text-only model with the following features:

  • Number of parameters: 350M
  • Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
  • Training budget: 28T tokens
  • Context length: 32,768 tokens
  • Vocabulary size: 65,536
  • Knowledge cutoff: Mid-2024
  • Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
  • Generation parameters:
    • temperature: 0.1
    • top_k: 50
    • repetition_penalty: 1.05
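
The recommended generation parameters above can be baked into a local model variant with an Ollama Modelfile. This is a minimal sketch; the derived model name `lfm2.5-350m-extract` is an arbitrary choice, not part of the model card:

```
# Modelfile: pin the card's recommended sampling parameters
FROM LiquidAI/lfm2.5-350m:q6_k
PARAMETER temperature 0.1
PARAMETER top_k 50
PARAMETER repeat_penalty 1.05
```

Build it with `ollama create lfm2.5-350m-extract -f Modelfile`, then `ollama run lfm2.5-350m-extract` uses these settings by default.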

We recommend using it for data extraction, structured outputs, and tool use. It is not recommended for knowledge-intensive tasks or programming.
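
As a sketch of the data-extraction use case, the snippet below builds a request body for Ollama's `/api/chat` endpoint, using JSON mode and the recommended generation parameters from this card. The extraction fields (`name`, `year`) and the system prompt wording are illustrative assumptions, not part of the model card:

```python
import json

def extraction_request(text: str) -> dict:
    """Build an Ollama /api/chat request that asks the model to
    extract structured fields from free text as JSON."""
    return {
        "model": "LiquidAI/lfm2.5-350m:q6_k",
        "messages": [
            # Hypothetical extraction instruction; adjust fields to your task.
            {"role": "system",
             "content": "Extract the person's name and the year from the text. "
                        "Reply with a JSON object with keys 'name' and 'year'."},
            {"role": "user", "content": text},
        ],
        # Ollama's JSON mode constrains the output to valid JSON.
        "format": "json",
        "stream": False,
        # Recommended generation parameters from the model card.
        "options": {"temperature": 0.1, "top_k": 50, "repeat_penalty": 1.05},
    }

req = extraction_request("Ada Lovelace published her notes in 1843.")
print(json.dumps(req, indent=2))
```

POST this body to `http://localhost:11434/api/chat` on a machine running Ollama; the `message.content` field of the response should contain the extracted JSON object.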

📊 Performance

Benchmarks

Model                     GPQA Diamond   MMLU-Pro   IFEval   IFBench   Multi-IF
LFM2.5-350M                      30.64      20.01    76.96     40.69      44.92
LFM2-350M                        27.58      19.29    64.96     18.20      32.92
Granite 4.0-H-350M               22.32      13.14    61.27     17.22      28.70
Granite 4.0-350M                 25.91      12.84    53.48     15.98      24.21
Qwen3.5-0.8B (Instruct)          27.41      37.42    59.94     22.87      41.68
Qwen3.5-0.8B (Thinking)          19.29        -*     32.93     22.00      26.44
Gemma 3 1B IT                    23.89      14.04    63.49     20.33      44.25

Model                     CaseReportBench   BFCLv3   BFCLv4   τ²-Bench Telecom   τ²-Bench Retail
LFM2.5-350M                         32.45    44.11    21.86              18.86             17.84
LFM2-350M                           11.67    22.95    12.29              10.82              5.56
Granite 4.0-H-350M                  12.44    43.07    13.28              13.74              6.14
Granite 4.0-350M                     0.84    39.58    13.73               2.92              6.14
Qwen3.5-0.8B (Instruct)             13.83    35.08    18.70              12.57              6.14
Qwen3.5-0.8B (Thinking)              0.39    39.64    25.39              14.33              7.02
Gemma 3 1B IT                        2.28    16.61     7.17               9.36              6.43

*Evaluation could not be completed due to doom looping.

CPU Inference / GPU Inference: throughput charts (see the blog post)

📬 Contact