Gemma 4 26B (IQ4_XS) - Optimized for 16GB VRAM

Capabilities: tools, thinking

```bash
ollama run VladimirGav/gemma4-26b-16GB-VRAM
```

```bash
curl http://localhost:11434/api/chat \
  -d '{
    "model": "VladimirGav/gemma4-26b-16GB-VRAM",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```python
from ollama import chat

response = chat(
    model='VladimirGav/gemma4-26b-16GB-VRAM',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)
```

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'VladimirGav/gemma4-26b-16GB-VRAM',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
```

Details

Updated 3 months ago

c1c301c090d0 · 14GB

model (14GB)

arch: gemma4
parameters: 25.2B
quantization: IQ4_XS

params (98B)

{ "num_ctx": 8192, "num_gpu": 99, "stop": [ "<turn|>" ], "temperature":

Readme

🔗 Uncensored Version

Looking for the uncensored version? You can find it here: Gemma 4 26B (Uncensored)

This is a quantized build of Google Gemma 4 26B, tailored to run on GPUs with 16GB of VRAM. It uses IQ4_XS quantization (a roughly 4-bit format, built with an importance matrix) to preserve output quality while fitting into a limited memory footprint.

🚀 Key Features

VRAM Efficient: Occupies ~15GB, leaving about 1GB for context (KV Cache) on a 16GB card.
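The memory budget above is simple arithmetic, sketched below in Python. The 16GB card size, the ~15GB footprint, the 14GB file size, and the 25.2B parameter count come from this page; the function names are hypothetical, and treating "GB" as 10^9 bytes is an assumption. The bits-per-weight figure is only a rough estimate, since the file also holds metadata and some tensors may be stored at higher precision.

```python
# Figures quoted on this page; "GB" is assumed to mean 10**9 bytes.
CARD_VRAM_GB = 16.0        # target card size
LOADED_FOOTPRINT_GB = 15.0  # "~15GB" occupied once the model is loaded
FILE_SIZE_BYTES = 14e9      # model file size shown in the details panel
PARAM_COUNT = 25.2e9        # parameter count shown in the details panel


def headroom_gb(card_gb: float, footprint_gb: float) -> float:
    """VRAM left over for the KV cache and other runtime buffers."""
    return card_gb - footprint_gb


def bits_per_weight(file_bytes: float, params: float) -> float:
    """Average storage cost per parameter, derived from the file size alone."""
    return file_bytes * 8 / params


if __name__ == "__main__":
    print(f"Headroom: {headroom_gb(CARD_VRAM_GB, LOADED_FOOTPRINT_GB):.1f} GB")
    print(f"Average bits per weight: {bits_per_weight(FILE_SIZE_BYTES, PARAM_COUNT):.2f}")
```

With only about 1GB of headroom, raising the context length above the default is likely to push part of the model or the cache out of VRAM on a 16GB card.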

💻 Target Hardware

Perfectly fits:

* NVIDIA RTX 5060 Ti (16GB)
* NVIDIA RTX 4070 Ti Super (16GB)
* NVIDIA RTX 3080 (16GB version)
* NVIDIA RTX 4080 / 4080 Super
* NVIDIA RTX 5000/6000 Ada / A4000 (16GB)

🛠 How to Use

Simply run the following command in your terminal:

```bash
ollama run VladimirGav/gemma4-26b-16GB-VRAM
```
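For scripted use, the same model can be called through the local HTTP API that the curl example on this page targets. The sketch below uses only the Python standard library and assumes an Ollama server is running at the default address. The helper names `build_chat_payload` and `send_chat` are hypothetical; the `num_ctx` value of 8192 mirrors the default listed in the model's params, and passing it through `options` is shown only to illustrate where a per-request override would go.

```python
import json
import urllib.request

MODEL = "VladimirGav/gemma4-26b-16GB-VRAM"
CHAT_URL = "http://localhost:11434/api/chat"  # same endpoint as the curl example


def build_chat_payload(prompt: str, num_ctx: int = 8192, stream: bool = False) -> dict:
    """Build the JSON body for /api/chat (hypothetical helper, not part of Ollama)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,                 # False requests one complete JSON reply
        "options": {"num_ctx": num_ctx},  # per-request context length
    }


def send_chat(payload: dict, url: str = CHAT_URL) -> str:
    """POST a non-streaming payload and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

Calling `send_chat(build_chat_payload("Hello!"))` returns the reply as a string when the server is reachable. Keeping `num_ctx` at or below the default is the safer choice on a 16GB card, given the limited headroom described under Key Features.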