n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k:latest

1,894 Downloads Updated 2 months ago

Best local model for standard desktop setups, suitable for coding and agent use.

Capabilities: tools, thinking

```bash
ollama run n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k
```

```bash
curl http://localhost:11434/api/chat \
  -d '{
    "model": "n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```python
from ollama import chat

response = chat(
    model='n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)
```

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
```

Details

23b4fc191226 · 17GB

model: arch gemma4 · parameters 25.2B · quantization Q4_K_M (17GB)

params: { "num_ctx": 32768, "stop": [ "<turn|>" ] } (47B)

Readme

![model.png](/assets/n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k/85a8b3ae-7322-4e75-ba0b-bd8c1ee9938e)

# 🧠 n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k

**Best local model for standard desktop setups, suitable for coding and agent use.**

---

## 📦 Overview

`gemma-4-26B-A4B-it-UD-Q4_K_M-32k` is a quantized, instruction-tuned large language model optimized for **local inference on consumer hardware**.

This variant is **pre-configured with a 32K context window for Ollama**, ensuring stable performance on standard desktop setups.
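
The 32K window comes from the `num_ctx` parameter stored with this build. As a minimal sketch, assuming the standard Ollama `/api/chat` endpoint, a single request can pass its own `num_ctx` under `options` to use a different window for that call. The helper name `build_chat_request` is hypothetical; only the payload shape follows the Ollama API.

```python
import json

MODEL = "n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k"

def build_chat_request(prompt, num_ctx=None):
    """Build a JSON body for POST /api/chat (hypothetical helper)."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response instead of a stream
    }
    if num_ctx is not None:
        # Per-request override of the context window configured in the build.
        body["options"] = {"num_ctx": num_ctx}
    return body

if __name__ == "__main__":
    print(json.dumps(build_chat_request("Hello!", num_ctx=16384), indent=2))
```

A smaller window such as 16384 can help on machines close to the minimum tier, since the context cache takes memory on top of the model weights.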

It offers a balance of:
- ⚡ Performance
- 🧠 Reasoning capability
- 💻 Code generation
- 🤖 Agent workflows

---

## 🚀 Features

- 🧩 **Instruction-tuned (IT)** – ready for chat and task execution  
- 💻 **Strong coding abilities** – works well with dev tools and agents  
- 🧠 **Good reasoning performance** for its size  
- 🪶 **Quantized (Q4_K_M)** – optimized for desktop GPUs / RAM  
- 📏 **Context pre-configured to 32K (Ollama)**  
- 🔧 Compatible with multiple local AI toolchains

---

## 📊 Model Details

| Property        | Value |
|----------------|------|
| Model Name     | gemma-4-26B-A4B-it-UD-Q4_K_M-32k |
| Size           | ~17 GB |
| Context Length | 32K (configured) |
| Base Model Context | Up to 256K (model-dependent) |
| Input Type     | Text |
| Quantization   | Q4_K_M |
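
As a rough cross-check of the size figure, the average storage per weight can be estimated from the file size and the 25.2B parameter count shown in the page details. This sketch assumes the 17 GB figure is in decimal gigabytes and covers weights only.

```python
# Figures taken from the model page: ~17 GB file, 25.2B parameters.
size_bytes = 17e9      # assumes decimal gigabytes
param_count = 25.2e9

bits_per_weight = size_bytes * 8 / param_count
print(f"{bits_per_weight:.2f} bits per weight")
```

A result above 4 bits is plausible for a nominally 4-bit format, because K-quant files store per-block scales alongside the weights and may keep some tensors at higher precision.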

---

## 🖥️ Hardware Requirements

### Minimum (will run, but limited performance)
- GPU: **4 GB VRAM**
- RAM: **16 GB**
- ⚠️ Expect slow generation and possible limitations on long context

### Recommended (comfortable usage)
- GPU: **16 GB VRAM**
- RAM: **32 GB**
- ✅ Good balance between speed and stability

### Optimal (best experience)
- GPU: **24 GB VRAM**
- RAM: **32 GB+**
- 🚀 Smooth performance, better handling of long context and agents
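
The three tiers above can be expressed as a small lookup. This is only a sketch of the thresholds listed in this README; the function name `hardware_tier` is hypothetical, and real performance also depends on the GPU model, drivers, and what else is using memory.

```python
def hardware_tier(vram_gb, ram_gb):
    """Map a machine to the README's hardware tiers (hypothetical helper)."""
    if vram_gb >= 24 and ram_gb >= 32:
        return "optimal"
    if vram_gb >= 16 and ram_gb >= 32:
        return "recommended"
    if vram_gb >= 4 and ram_gb >= 16:
        return "minimum"
    return "below minimum"

if __name__ == "__main__":
    # Example: a desktop with a 16 GB GPU and 32 GB of system RAM.
    print(hardware_tier(16, 32))
```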

---

## ⚙️ Usage

### 🖥️ Run with Ollama

```bash
ollama run n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k
```
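
The model page lists tool support, so agent frameworks can pass function schemas with a chat request. The sketch below builds such a request body for `/api/chat`. The `read_file` tool and the helper `build_tool_request` are hypothetical and exist only for illustration; the `tools` field layout follows the Ollama chat API.

```python
import json

MODEL = "n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k"

# Hypothetical tool for illustration; the model only sees this schema.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a text file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

def build_tool_request(prompt):
    """Build a /api/chat body that offers one tool to the model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [READ_FILE_TOOL],
        "stream": False,
    }

if __name__ == "__main__":
    print(json.dumps(build_tool_request("What does README.md say?"), indent=2))
```

When the body is sent to a running Ollama server, any tool calls the model decides to make are returned under `message.tool_calls`; the calling code is responsible for executing them and sending the results back.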

---

## 🧪 Recommended Use Cases

- 💻 Local coding assistant  
- 🤖 Autonomous agents / tool use  
- 📄 Document analysis (medium to long context)  
- 🧠 Reasoning-heavy tasks  
- 🛠️ Developer workflows
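
For document analysis, it helps to check whether a document fits into the 32K window before sending it. The sketch below uses an assumed heuristic of about four characters per token, which is not the model's real tokenizer, and reserves space for the reply. Both helper names are hypothetical.

```python
NUM_CTX = 32768  # context window configured in this build

def fits_in_context(text, reserve_for_reply=2048, chars_per_token=4.0):
    """Rough check; chars_per_token is an assumed heuristic, not a tokenizer."""
    estimated_tokens = int(len(text) / chars_per_token) + 1
    return estimated_tokens + reserve_for_reply <= NUM_CTX

def split_into_chunks(text, reserve_for_reply=2048, chars_per_token=4.0):
    """Split text into pieces that each pass fits_in_context."""
    max_chars = int((NUM_CTX - reserve_for_reply - 1) * chars_per_token)
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

if __name__ == "__main__":
    document = "word " * 60000  # 300,000 characters of sample text
    chunks = split_into_chunks(document)
    print(len(chunks), all(fits_in_context(c) for c in chunks))
```

Splitting on raw character offsets can cut sentences in half; a real pipeline would probably split on paragraph or section boundaries instead.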

---

## 💡 Notes

- Context is **intentionally limited to 32K** for better stability and memory usage in Ollama
- The underlying model may support larger context, but this build is optimized for **real-world desktop usage**
- Works best with **GPU acceleration**; it can also run on CPU, at reduced speed
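
If a machine has memory to spare, a derived model with a larger window can be created from this build. The sketch below generates Modelfile text using the `FROM` and `PARAMETER num_ctx` instructions; the helper name `modelfile_with_context` and the value 65536 are illustrative assumptions, and whether a larger window runs well depends on available memory.

```python
BASE = "n27/gemma-4-26B-A4B-it-UD-Q4_K_M-32k"

def modelfile_with_context(num_ctx):
    """Return Modelfile text that derives a variant with a different num_ctx."""
    return f"FROM {BASE}\nPARAMETER num_ctx {num_ctx}\n"

if __name__ == "__main__":
    print(modelfile_with_context(65536), end="")
```

Saving the output as `Modelfile` and running `ollama create my-gemma-64k -f Modelfile` would register the variant locally, where `my-gemma-64k` is a placeholder name.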
