Custom 3-bit coding model optimized for local agent workflows on 16 GB GPUs. Features a 128K context window for large codebases, long-running tasks, and coding assistants such as OpenCode. Designed for efficient local inference with strong code generation

Details

Updated 2 days ago

2 days ago

125310b87db6 · 13GB ·

model

archqwen35

parameters26.9B

quantizationQ3_K_S

12GB

projector

archclip

parameters461M

quantizationF16

928MB

template

{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la

781B

params

{ "min_p": 0, "num_ctx": 128000, "presence_penalty": 0, "repeat_penalty": 1, "st

194B

Qwen 3.6 27B 3bit (128K)

A custom 3-bit quantized coding model optimized for local AI coding agents on 16 GB GPUs. Built for large-context software development workflows with a 128K context window, enabling repository-scale code understanding, refactoring, debugging, and autonomous agent tasks.

Designed for use with OpenCode and other agent frameworks that benefit from long context and efficient local inference.

Recommended Ollama Settings

To fit comfortably within 16 GB VRAM while maintaining a large context window:

[Service]
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KV_CACHE_TYPE=q4_0"

Or run Ollama with:

OLLAMA_FLASH_ATTENTION=1 \
OLLAMA_KV_CACHE_TYPE=q4_0 \
ollama serve

Usage

ollama run abdulroqib/qwen3.6-27b-q3-128k

OpenCode Configuration

Add the following configuration to your OpenCode config:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "abdulroqib/qwen3.6-27b-q3-128k": {
          "name": "Qwen 3.6 27B 3bit (128K)",
          "thinking": true,
          "limit": {
            "context": 128000,
            "output": 32000
          }
        }
      }
    }
  }
}