An upgraded version of LFM2 trained on over twice as many tokens

ollama run sam860/lfm2.5:1.2b-Q8_0

Notes

Uploaded in fp16 and Q8_0

  • Q8_0 is the sweet spot for on‑device CPU inference
  • fp16 is useful if you have enough GPU memory and want the best possible output quality
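
To pull a specific quantization, append its tag to the model name. The Q8_0 tag is the one shown above; the fp16 tag below is assumed to follow the same naming pattern, so check the model's tags page for the exact name before pulling:

ollama run sam860/lfm2.5:1.2b-Q8_0
ollama run sam860/lfm2.5:1.2b-fp16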

Temperature: The model was tuned for very deterministic output, so start with 0.1–0.2. Raise to ≈0.6 only if you need more creative or exploratory answers.
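
One way to set this with Ollama's own tooling, either in the interactive REPL or per request through the local REST API (the prompt below is just a placeholder):

ollama run sam860/lfm2.5:1.2b-Q8_0
>>> /set parameter temperature 0.2

curl http://localhost:11434/api/generate -d '{
  "model": "sam860/lfm2.5:1.2b-Q8_0",
  "prompt": "Summarize these notes in three bullet points.",
  "options": { "temperature": 0.2 }
}'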


Description

LFM2.5‑1.2B‑Instruct – a 1.2B‑parameter hybrid Liquid‑architecture model built for edge deployment. I’ve only uploaded the Instruct variant.

Key points:

  • Hybrid design: 10 double‑gated LIV convolution blocks + 6 Grouped‑Query‑Attention (GQA) blocks give fast, low‑latency inference on CPUs, NPUs and mobile GPUs.
  • Extended training: 28T tokens of pre‑training plus multi‑stage reinforcement learning, delivering quality comparable to much larger models.
  • Context length: 32k tokens; 65k vocabulary; multilingual (EN, AR, ZH, FR, DE, JA, KO, ES).
  • Agentic ready: Supports function‑calling/tool use via the built‑in <|tool_call_start|> / <|tool_call_end|> token wrappers (see the sketch after this list).
  • Ideal use‑cases:
    • On‑device assistants, chatbots, and RAG pipelines
    • Structured JSON extraction / data‑wrangling
    • Lightweight code completion (FIM)
    • Edge AI in smartphones, laptops, vehicles, IoT devices
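
A minimal sketch of tool calling through Ollama's standard chat API, assuming the model's chat template maps the tools field onto its <|tool_call_start|> / <|tool_call_end|> wrappers; the get_weather function is hypothetical:

curl http://localhost:11434/api/chat -d '{
  "model": "sam860/lfm2.5:1.2b-Q8_0",
  "messages": [{ "role": "user", "content": "What is the weather in Berlin?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'

For the structured‑JSON use case, the same endpoint also accepts "format": "json" to constrain responses to valid JSON.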
