CPU-optimized GGUF quantized variants of Qwen 2.5 3B Instruct, optimized for the NovaForgeAI Desktop App. Maintained by the NovaForgeAI Team.
🚀 Quick Start
ollama run novaforgeai/qwen2.5-3b:q4km
ollama run novaforgeai/qwen2.5-3b:q3km
ollama run novaforgeai/qwen2.5-3b:q2k
All variants are CPU-only and work fully offline once downloaded.
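Besides the interactive mode above, a one-shot prompt can be passed directly on the command line, which is handy for quick smoke tests after download:

```shell
# One-shot prompt (requires the model to be pulled first).
ollama run novaforgeai/qwen2.5-3b:q4km "Summarize what GGUF quantization is in one sentence."
```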
📊 Variant Comparison

| Variant | Size | RAM | Context | Speed | Quality | Recommended Use |
|---------|------|-----|---------|-------|---------|-----------------|
| Q4_K_M | ~1.8 GB | ~3 GB | 2048 | Medium | ⭐⭐⭐⭐⭐ | Production & demos |
| Q3_K_M | ~1.5 GB | ~2.5 GB | 1024 | Fast | ⭐⭐⭐ | Low-RAM systems |
| Q2_K | ~1.2 GB | ~2 GB | 768 | Fastest | ⭐ | Testing only |
🏆 Recommended: q4km — best balance of accuracy, stability, and speed.
💡 Choosing the Right Variant

✅ Use Q4_K_M if:
You want accurate & stable answers
You have 4GB+ RAM
Using in presentations or production
⚠️ Use Q3_K_M if:
RAM is limited (3–4 GB)
You prefer speed over depth
❌ Avoid Q2_K for:
Serious reasoning tasks
Long answers or coding (quality loss is significant)
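The guidance above can be sketched as a small shell helper. This is a hypothetical function (not part of the model distribution) with thresholds taken from the table and the bullets above:

```shell
# Hypothetical helper: map available RAM (whole GB) to a variant tag,
# following the selection guidance above.
suggest_variant() {
  ram_gb=$1
  if [ "$ram_gb" -ge 4 ]; then
    echo "q4km"    # accurate & stable; needs ~3 GB of free RAM
  elif [ "$ram_gb" -ge 3 ]; then
    echo "q3km"    # low-RAM systems, speed over depth
  else
    echo "q2k"     # testing only
  fi
}

suggest_variant 8   # q4km
```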
🔧 Technical Overview

Base Model
Name: Qwen/Qwen2.5-3B-Instruct
Parameters: 3 Billion
Architecture: Transformer (Qwen2)
Original Context: 32K tokens
License: Qwen Research License
Why Quantization?
The original FP16 model is ~6–7 GB and slow on CPUs. Quantization converts it into compact GGUF files that:
Reduce RAM usage
Increase inference speed
Enable CPU-only execution
🧠 Quantization Explained (Simple)

| Format | What it means | Result |
|--------|---------------|--------|
| FP16 | Full precision | High quality, very slow |
| GGUF | llama.cpp-optimized format | CPU-friendly |
| Q4_K_M | Smart mixed 4–6 bit | Best balance |
| Q3_K_M | More compression | Faster, less accurate |
| Q2_K | Aggressive compression | Fast but unstable |
Quantization does NOT retrain the model — it only compresses weights.
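As a sanity check on the sizes in the table, a GGUF file size can be roughly estimated as parameters × bits-per-weight ÷ 8. The ~4.85 bits/weight figure used here is an approximation of llama.cpp's mixed Q4_K_M scheme, not an exact value:

```shell
# Rough size estimate: 3B params at ~4.85 bits/weight (approximate Q4_K_M average).
awk 'BEGIN { printf "%.1f GB\n", 3e9 * 4.85 / 8 / 1e9 }'   # 1.8 GB
```

The same arithmetic with ~3.9 and ~2.6 bits/weight lands near the Q3_K_M and Q2_K sizes above.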
🎯 Use Cases
Perfect for:
Local AI assistants
Coding help & explanations
Summarization & translation
Offline & privacy-focused apps
Student & FYP projects
Optimized for:
Low-end CPUs
No GPU
Desktop environments
🧪 Benchmark Summary (CPU)
| Variant | Avg Speed | Stability | Verdict |
|---------|-----------|-----------|---------|
| Q4_K_M | ~4.5 tok/s | Excellent | ✅ Best |
| Q3_K_M | ~6 tok/s | Moderate | ⚠️ Acceptable |
| Q2_K | ~7 tok/s | Poor | ❌ Not usable |
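To reproduce rough speed numbers on your own CPU, `ollama run --verbose` prints token counts and an eval rate after each response; results vary with hardware, so treat the figures above as ballpark:

```shell
# Prints prompt/eval token counts and tokens-per-second after the reply.
ollama run novaforgeai/qwen2.5-3b:q4km --verbose "Count from 1 to 10."
```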
📦 Local File Mapping
E:\NovaForgeAI\models\quantized
├── qwen2.5-3b-q4km.gguf
├── qwen2.5-3b-q3km.gguf
└── qwen2.5-3b-q2k.gguf
These files are referenced directly by Ollama Modelfiles.
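A minimal Modelfile for one of these local files might look like the following sketch (the `num_ctx` value mirrors the Q4_K_M context from the comparison table; adjust the path for your install):

```
FROM E:\NovaForgeAI\models\quantized\qwen2.5-3b-q4km.gguf
PARAMETER num_ctx 2048
```

Registering it under the tag used throughout this page:

```shell
ollama create novaforgeai/qwen2.5-3b:q4km -f Modelfile
```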
🌟 NovaForgeAI Edition Benefits
Clean Ollama tag-based structure
CPU-first tuning
No redundant base models
Professional documentation
Ready for demo, FYP & production
📄 License & Credits
Base Model: Qwen Team (Alibaba Cloud)
Quantization: NovaForgeAI (llama.cpp)
License: Qwen Research License
Status: ✅ Production Ready
Optimized for: NovaForgeAI Desktop App
Maintained by: NovaForgeAI Team