209 Downloads Updated 5 months ago
ollama run hrbrmstr/jamba
4f221da702dd · 6.4GB
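
Besides the `ollama run` command above, a locally running Ollama server can be queried over HTTP. The following is a minimal sketch, assuming the server listens on Ollama's default address (`http://localhost:11434`) and that the model has already been pulled. The helper names `build_generate_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hrbrmstr/jamba"


def build_generate_request(prompt, model=MODEL):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model=MODEL, timeout=300):
    """Send the request and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server replies with a single JSON object.
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain state-space models in two sentences.")` should return the model's reply as a string, provided the server is reachable.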

Ref: https://www.ai21.com/blog/introducing-jamba-reasoning-3b/
license: apache-2.0
license_name: jamba-open-model-license
license_link: https://www.ai21.com/licenses/jamba-open-model-license
pipeline_tag: text-generation
library_name: transformers
AI21’s Jamba Reasoning 3B is a top-performing reasoning model that packs leading scores on intelligence benchmarks and highly efficient processing into a compact 3B build.
Read the full blog post: https://www.ai21.com/blog/introducing-jamba-reasoning-3b/
Fast: Optimized for efficient sequence processing
The hybrid design combines Transformer attention with Mamba (a state-space model). Mamba layers are more efficient for sequence processing, while attention layers capture complex dependencies. This mix reduces memory overhead, improves throughput, and makes the model run smoothly on laptops, GPUs, and even mobile devices, while maintaining impressive quality.

Smart: Leading intelligence scores
The model outperforms competitors, such as Gemma 3 4B, Llama 3.2 3B, and Granite 4.0 Micro, on a combined intelligence score that averages 6 standard benchmarks.

Scalable: Handles very long contexts
Unlike most compact models, Jamba Reasoning 3B supports extremely long contexts. Mamba layers allow the model to process inputs without storing massive attention caches, so it scales to 256K tokens while keeping inference practical. This makes it suitable for edge deployment as well as datacenter workloads.
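
To see why fewer attention layers matter at long context, a rough back-of-the-envelope sketch of attention key/value cache size may help. The layer counts, head counts, and head dimension below are hypothetical and are not the actual Jamba Reasoning 3B configuration. The sketch also assumes "256K" means 262,144 tokens and 16-bit cache entries.

```python
def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: each attention layer stores one key tensor and one value tensor.
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


SEQ_LEN = 256 * 1024  # assumption: "256K" = 262,144 tokens
GIB = 2**30

# Hypothetical configurations, chosen only to illustrate the scaling.
all_attention = kv_cache_bytes(SEQ_LEN, n_attn_layers=28, n_kv_heads=8, head_dim=128)
hybrid = kv_cache_bytes(SEQ_LEN, n_attn_layers=2, n_kv_heads=8, head_dim=128)

print(f"28 attention layers: {all_attention / GIB:.0f} GiB")
print(f" 2 attention layers: {hybrid / GIB:.0f} GiB")
```

The cache grows linearly with both sequence length and the number of attention layers, while a state-space layer keeps a fixed-size state regardless of input length. Under these made-up numbers, replacing most attention layers cuts the cache from 28 GiB to 2 GiB.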

| Model | MMLU-Pro | Humanity’s Last Exam | IFBench |
|---|---|---|---|
| DeepSeek R1 Distill Qwen 1.5B | 27.0% | 3.3% | 13.0% |
| Phi-4 mini | 47.0% | 4.2% | 21.0% |
| Granite 4.0 Micro | 44.7% | 5.1% | 24.8% |
| Llama 3.2 3B | 35.0% | 5.2% | 26.0% |
| Gemma 3 4B | 42.0% | 5.2% | 28.0% |
| Qwen 3 1.7B | 57.0% | 4.8% | 27.0% |
| Qwen 3 4B | 70% | 5.1% | 33% |
| Jamba Reasoning 3B | 61.0% | 6.0% | 52.0% |
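
As a quick way to compare the rows, the sketch below averages the three benchmarks shown in the table and ranks the models. It assumes the three numeric columns are MMLU-Pro, Humanity’s Last Exam, and IFBench, in that order. This unweighted mean is only an illustration and is not the six-benchmark combined intelligence score mentioned earlier.

```python
# Scores copied from the table: (MMLU-Pro, Humanity's Last Exam, IFBench), in percent.
scores = {
    "DeepSeek R1 Distill Qwen 1.5B": (27.0, 3.3, 13.0),
    "Phi-4 mini": (47.0, 4.2, 21.0),
    "Granite 4.0 Micro": (44.7, 5.1, 24.8),
    "Llama 3.2 3B": (35.0, 5.2, 26.0),
    "Gemma 3 4B": (42.0, 5.2, 28.0),
    "Qwen 3 1.7B": (57.0, 4.8, 27.0),
    "Qwen 3 4B": (70.0, 5.1, 33.0),
    "Jamba Reasoning 3B": (61.0, 6.0, 52.0),
}


def mean_score(row):
    """Unweighted mean of the benchmark scores in one table row."""
    return sum(row) / len(row)


# Rank models from highest to lowest mean across the three benchmarks.
ranking = sorted(scores, key=lambda name: mean_score(scores[name]), reverse=True)

for name in ranking:
    print(f"{name:32s} {mean_score(scores[name]):5.1f}")
```

On these three benchmarks Jamba Reasoning 3B has the highest mean, driven mostly by its IFBench score, while Qwen 3 4B leads on MMLU-Pro alone.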