GLM4.7-Distill-LFM2.5-1.2B (Ollama)

Ollama Model: yasserrmd/GLM4.7-Distill-LFM2.5-1.2B
Base Model: LiquidAI LFM2.5-1.2B
Distilled From: GLM-4.7
Quantization: 8-bit (Q8_0)
License: Apache-2.0

Overview

This is an 8-bit quantized Ollama build of GLM4.7-Distill-LFM2.5-1.2B, a compact instruction-following model distilled from GLM-4.7 into the LiquidAI LFM2.5 architecture.

This model targets practical local use: fast startup, a low memory footprint, and stable instruction adherence in CPU-only or otherwise constrained environments.

Original model and training details are available on Hugging Face: https://huggingface.co/yasserrmd/GLM4.7-Distill-LFM2.5-1.2B


Key Characteristics

  • Instruction-tuned via distillation from GLM-4.7
  • Optimized for local and edge-friendly inference
  • 8-bit quantized for reduced memory usage
  • Suitable for assistants, reasoning, summarization, and lightweight coding help
  • Runs well on laptops, desktops, and small servers

Usage

Pull the model

ollama pull yasserrmd/GLM4.7-Distill-LFM2.5-1.2B

Run interactively

ollama run yasserrmd/GLM4.7-Distill-LFM2.5-1.2B

Single prompt execution

ollama run yasserrmd/GLM4.7-Distill-LFM2.5-1.2B "Explain quantization in simple terms."
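
Call the REST API

Ollama also serves a local HTTP API (on port 11434 by default), so the same prompt can be sent programmatically. A minimal sketch with curl; streaming is disabled to return a single JSON response:

curl http://localhost:11434/api/generate -d '{
  "model": "yasserrmd/GLM4.7-Distill-LFM2.5-1.2B",
  "prompt": "Explain quantization in simple terms.",
  "stream": false
}'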

Recommended Settings

These settings work well for most instruction and reasoning tasks:

  • Temperature: 0.1–0.3
  • Top-p: 0.9
  • Max tokens: 256–512

Lower temperatures are recommended for deterministic and concise outputs.
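
To make these defaults stick, one option is a small derived Modelfile; the name glm-distill-local below is just an illustrative choice, and num_predict is Ollama's parameter for the maximum number of generated tokens:

FROM yasserrmd/GLM4.7-Distill-LFM2.5-1.2B
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_predict 512

Build and run the derived model:

ollama create glm-distill-local -f Modelfile
ollama run glm-distill-local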


Intended Use

  • Local AI assistants
  • Research experimentation
  • Instruction following and reasoning
  • Summarization and explanation tasks
  • Prototyping agent workflows on Ollama

This model is not designed for safety-critical or medical/legal decision-making.
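
For the assistant and agent-workflow uses listed above, the chat endpoint of the same local API keeps conversation history explicit as a list of messages. A minimal single-turn sketch; the system message is an illustrative placeholder:

curl http://localhost:11434/api/chat -d '{
  "model": "yasserrmd/GLM4.7-Distill-LFM2.5-1.2B",
  "messages": [
    { "role": "system", "content": "You are a concise local assistant." },
    { "role": "user", "content": "Summarize what model distillation is." }
  ],
  "stream": false
}'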


Notes

  • Built and tested for Ollama's native runtime
  • No external dependencies required
  • Performance depends on host CPU and available RAM
  • Works well in fully offline environments

License

Apache-2.0. See the Hugging Face model card for full license details.