
Ref: https://www.ai21.com/blog/introducing-jamba-reasoning-3b/

ollama run hrbrmstr/jamba

Readme

license: apache-2.0
license_name: jamba-open-model-license
license_link: https://www.ai21.com/licenses/jamba-open-model-license
pipeline_tag: text-generation
library_name: transformers

Introduction

AI21’s Jamba Reasoning 3B is a top-performing reasoning model that packs leading intelligence-benchmark scores and highly efficient processing into a compact 3B-parameter build.
Read the full announcement on the AI21 blog (linked above).

Key Advantages

Fast: Optimized for efficient sequence processing

The hybrid design combines Transformer attention with Mamba (a state-space model). Mamba layers handle sequence processing efficiently, while attention layers capture complex dependencies. This mix reduces memory overhead, improves throughput, and lets the model run smoothly on laptops, GPUs, and even mobile devices while maintaining impressive quality.
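As an illustration of this hybrid layout, here is a minimal sketch. The total layer counts (28 layers: 26 Mamba, 2 attention) come from the Model Details section below, but the exact positions of the attention layers are an assumption for illustration, not published here.

```python
# Sketch of a hybrid Transformer-Mamba layer stack (illustrative only).
# Layer counts (28 total: 26 Mamba, 2 attention) come from the model card;
# the interleaving positions below are an assumption.

def build_layer_pattern(n_layers=28, attention_positions=(13, 27)):
    """Return a list of layer types, placing attention at the given indices."""
    return [
        "attention" if i in attention_positions else "mamba"
        for i in range(n_layers)
    ]

pattern = build_layer_pattern()
print(pattern.count("mamba"), pattern.count("attention"))  # 26 2
```

The design intuition: Mamba layers carry a fixed-size recurrent state regardless of sequence length, so only the few attention layers pay the quadratic/cache cost.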

Smart: Leading intelligence scores

The model outperforms competitors such as Gemma 3 4B, Llama 3.2 3B, and Granite 4.0 Micro on a combined intelligence score that averages six standard benchmarks.

Scalable: Handles very long contexts

Unlike most compact models, Jamba Reasoning 3B supports extremely long contexts. Mamba layers allow the model to process inputs without storing massive attention caches, so it scales to 256K tokens while keeping inference practical. This makes it suitable for edge deployment as well as datacenter workloads.
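To see why having only a couple of attention layers matters at 256K tokens, here is a back-of-the-envelope KV-cache estimate. Only the layer and head counts come from the model card; the head dimension (128), fp16 storage, and the all-attention comparison point are assumptions for illustration.

```python
# Rough KV-cache size estimate at 256K context (illustrative only).
# From the model card: 28 layers, of which 2 are attention, with MQA (1 KV head).
# Assumed for illustration: head_dim = 128, fp16 (2 bytes per value).

def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    # K and V are cached separately, hence the factor of 2
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

seq = 256 * 1024
hybrid = kv_cache_bytes(n_attn_layers=2, n_kv_heads=1, head_dim=128, seq_len=seq)
full = kv_cache_bytes(n_attn_layers=28, n_kv_heads=20, head_dim=128, seq_len=seq)

print(f"hybrid MQA: {hybrid / 2**20:.0f} MiB")  # 256 MiB
print(f"all-attention MHA: {full / 2**30:.1f} GiB")  # 70.0 GiB
```

Under these assumptions the hybrid cache fits comfortably on a laptop, while a hypothetical all-attention model of the same shape would need tens of gigabytes for the cache alone.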

Model Details

  • Number of Parameters: 3B
  • Number of Layers: 28 (26 Mamba, 2 Attention)
  • Number of Attention Heads: 20, with MQA (20 query heads, 1 shared KV head)
  • Vocabulary Size: 64K
  • Context Length: 256K
  • Architecture: Hybrid Transformer–Mamba with efficient attention and long-context support
  • Developed by: AI21
  • Supported languages: English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic and Hebrew
  • Intelligence benchmark results:
| Model | MMLU-Pro | Humanity’s Last Exam | IFBench |
|---|---|---|---|
| DeepSeek R1 Distill Qwen 1.5B | 27.0% | 3.3% | 13.0% |
| Phi-4 mini | 47.0% | 4.2% | 21.0% |
| Granite 4.0 Micro | 44.7% | 5.1% | 24.8% |
| Llama 3.2 3B | 35.0% | 5.2% | 26.0% |
| Gemma 3 4B | 42.0% | 5.2% | 28.0% |
| Qwen 3 1.7B | 57.0% | 4.8% | 27.0% |
| Qwen 3 4B | 70.0% | 5.1% | 33.0% |
| Jamba Reasoning 3B | 61.0% | 6.0% | 52.0% |