187 2 days ago

Custom 3-bit coding model optimized for local agent workflows on 16 GB GPUs. Features a 128K context window for large codebases, long-running tasks, and coding assistants such as OpenCode. Designed for efficient local inference with strong code generation

vision thinking
ollama run abdulroqib/qwen3.6-27b-q3-128k

Details

2 days ago

125310b87db6 · 13GB ·

qwen35
·
26.9B
·
Q3_K_S
clip
·
461M
·
F16
{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la
{ "min_p": 0, "num_ctx": 128000, "presence_penalty": 0, "repeat_penalty": 1, "st

Readme

Qwen 3.6 27B 3bit (128K)

A custom 3-bit quantized coding model optimized for local AI coding agents on 16 GB GPUs. Built for large-context software development workflows with a 128K context window, enabling repository-scale code understanding, refactoring, debugging, and autonomous agent tasks.

Designed for use with OpenCode and other agent frameworks that benefit from long context and efficient local inference.

Recommended Ollama Settings

To fit comfortably within 16 GB VRAM while maintaining a large context window:

[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q4_0"

Or run Ollama with:

OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=q4_0 \
ollama serve

Usage

ollama run abdulroqib/qwen3.6-27b-q3-128k

OpenCode Configuration

Add the following configuration to your OpenCode config:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "abdulroqib/qwen3.6-27b-q3-128k": {
          "name": "Qwen 3.6 27B 3bit (128K)",
          "thinking": true,
          "limit": {
            "context": 128000,
            "output": 32000
          }
        }
      }
    }
  }
}