
A coding-optimized configuration of Qwen3.5-9B for single-GPU systems with 16 GB of VRAM. It uses the official Q4_K_M quantization (~6.6 GB of weights), which leaves ~9 GB of headroom for the KV cache, enough for context windows of 32K tokens or more.
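The headroom figure can be sanity-checked with a rough KV cache estimate. The sketch below is illustrative only: the layer count, KV head count, and head dimension are placeholder assumptions, not the published Qwen3.5-9B architecture, and the result ignores activation buffers and runtime overhead. Substitute the real values from the model's metadata before relying on the number.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size for a standard attention stack.

    Each layer stores one K and one V tensor (hence the factor of 2),
    each of shape [context_len, num_kv_heads, head_dim].
    bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


# Figures taken from the description above.
VRAM_GB = 16.0
WEIGHTS_GB = 6.6
headroom_gb = VRAM_GB - WEIGHTS_GB  # the "~9 GB" quoted above

# ASSUMPTION: placeholder architecture values, chosen only for illustration.
cache_bytes = kv_cache_bytes(
    num_layers=36,
    num_kv_heads=8,
    head_dim=128,
    context_len=32768,  # matches num_ctx in the parameters
)
cache_gib = cache_bytes / 2**30

print(f"headroom: {headroom_gb:.1f} GB, estimated fp16 KV cache: {cache_gib:.1f} GiB")
```

Under these placeholder values the cache fits inside the headroom with room to spare. A model with more layers or KV heads, or a longer context, would shrink that margin.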

Capabilities: vision, tools, thinking
5a64af469455 · 204B
{
  "frequency_penalty": 0,
  "min_p": 0,
  "num_ctx": 32768,
  "num_gpu": 99,
  "presence_penalty": 0,
  "repeat_penalty": 1.05,
  "stop": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
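These parameters ship with the model, so they apply by default. For reference, the same values can also be sent per request through the `options` field of Ollama's `/api/generate` endpoint. The sketch below assumes a local Ollama server on the default port; the model tag `qwen3.5-coder-16gb` is a hypothetical placeholder, so substitute the tag this model is actually published under.

```python
import json
import urllib.request

# Sampling and runtime options copied from the parameters above.
OPTIONS = {
    "frequency_penalty": 0,
    "min_p": 0,
    "num_ctx": 32768,
    "num_gpu": 99,
    "presence_penalty": 0,
    "repeat_penalty": 1.05,
    "stop": ["<|im_start|>", "<|im_end|>"],
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
}


def build_generate_request(prompt, model="qwen3.5-coder-16gb",
                           host="http://localhost:11434"):
    """Build a non-streaming POST request for /api/generate.

    The default model tag is a hypothetical placeholder.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": OPTIONS,
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server with the model pulled):
# print(generate("Write a function that reverses a string."))
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything touches the network.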