187 Downloads Updated 2 days ago
ollama run abdulroqib/qwen3.6-27b-q3-128k
Updated 2 days ago
2 days ago
125310b87db6 · 13GB ·
A custom 3-bit quantized coding model optimized for local AI coding agents on 16 GB GPUs. Built for large-context software development workflows with a 128K context window, enabling repository-scale code understanding, refactoring, debugging, and autonomous agent tasks.
Designed for use with OpenCode and other agent frameworks that benefit from long context and efficient local inference.
To fit comfortably within 16 GB VRAM while maintaining a large context window:
[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q4_0"
Or run Ollama with:
OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=q4_0 \
ollama serve
ollama run abdulroqib/qwen3.6-27b-q3-128k
Add the following configuration to your OpenCode config:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama": {
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"abdulroqib/qwen3.6-27b-q3-128k": {
"name": "Qwen 3.6 27B 3bit (128K)",
"thinking": true,
"limit": {
"context": 128000,
"output": 32000
}
}
}
}
}
}