SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU

738 Downloads Updated 1 week ago

Custom model for agent-based coding, intended to run locally on 16 GB GPUs (works fine).

ollama run SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU

curl http://localhost:11434/api/chat \
  -d '{
    "model": "SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Models

1 model

Name: Qwen3.6-27B-MTP_Q3_32K_16GB-GPU:latest
Size: 14GB
Context: 256K
Input: Text
Updated: 1 week ago

Readme

Qwen 3.6 MTP - 27B param, Q3, 32K ctx, Local/Offline, 16GB GPU

Custom Ollama model, fine-tuned from unsloth's Qwen3.6 MTP 27B and configured for local coding-agent workflows, especially with Open Code / Hermes.

The base is a 27B-parameter LLM quantized to Q3, with multi-token prediction (MTP) and a 32K context window configured for software development tasks. It is intended to provide a practical balance between performance, memory usage, and code-assistance quality on local hardware.
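As a rough illustration of the 32K context target, the sketch below builds a request for the same local `/api/chat` endpoint used in the curl example and asks for a 32K window through the `num_ctx` option. It assumes "32K" means 32 × 1024 tokens and that the server runs on the default port; `build_chat_payload` and `send_chat` are hypothetical helper names, not part of this model or of Ollama.

```python
import json
import urllib.request

MODEL = "SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU"
# Default local endpoint, taken from the curl example on this page.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt: str, num_ctx: int = 32 * 1024) -> dict:
    """Build a non-streaming chat request that asks for a num_ctx-token window.

    Assumption: "32K" on this page means 32 * 1024 = 32768 tokens.
    """
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def send_chat(payload: dict) -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]


if __name__ == "__main__":
    # Only prints the request body; call send_chat(...) once a server is running.
    print(json.dumps(build_chat_payload("Hello!"), indent=2))
```

If the model already ships with its context length preset, the explicit `num_ctx` is redundant but harmless; it mainly documents the intent on the client side.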

Model details

Type: Text model with MTP*
Size: 27B parameters
Quantization: Q3
Context target: 32K
Measured GPU memory usage: 15 GB VRAM
Recommended GPU memory: 16 GB VRAM
Main focus: Coding and agentic development workflows
Tool use: Supported, depending on the client/application
Thinking/reasoning mode: Supported, depending on the client/application
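The memory figures above can be sanity-checked with a little arithmetic. The sketch below uses only numbers from this page (27B parameters, a 14 GB model file, 15 GB in use, 16 GB recommended) and assumes they are decimal gigabytes. Treating the gap between file size and usage as "runtime overhead" is a simplification rather than a measurement, and `memory_budget` is a hypothetical helper name.

```python
GB = 10**9  # assumption: the sizes on this page are decimal gigabytes

PARAMS = 27 * 10**9         # "27B parameters"
MODEL_FILE_BYTES = 14 * GB  # size listed for the :latest tag
USED_VRAM_BYTES = 15 * GB   # GPU memory usage reported in the model details
GPU_VRAM_BYTES = 16 * GB    # recommended GPU memory


def bits_per_weight(file_bytes: int, params: int) -> float:
    """Average number of bits stored per parameter in the model file."""
    return file_bytes * 8 / params


def memory_budget() -> dict:
    """Summarise the memory figures from the page as simple differences."""
    return {
        "bits_per_weight": round(bits_per_weight(MODEL_FILE_BYTES, PARAMS), 2),
        # Reported usage minus file size: roughly context cache and buffers.
        "runtime_overhead_gb": (USED_VRAM_BYTES - MODEL_FILE_BYTES) / GB,
        # What is left on a 16 GB card after the reported usage.
        "headroom_gb": (GPU_VRAM_BYTES - USED_VRAM_BYTES) / GB,
    }


if __name__ == "__main__":
    for name, value in memory_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions the file averages about 4.15 bits per parameter, and a 16 GB card is left with roughly 1 GB of headroom, so other GPU workloads running alongside the model may push it over the limit.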

Intended use

This model is designed for:

Agent workflows
Local coding assistants
Code analysis
Debugging support
Refactoring suggestions
Project exploration
Terminal-based programming tasks
Educational demonstrations of AI coding agents

* MTP applies only if the runtime supports it.
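For agent-style use, a client typically sends tool definitions with the chat request and reads back any tool calls the model makes. The sketch below shows one possible request body, assuming an Ollama version whose `/api/chat` accepts the `tools` and `think` fields. The `read_file` tool and both helper functions are hypothetical and exist only for illustration; whether tool calls and thinking output actually appear depends on the client and runtime, as noted in the model details.

```python
import json

MODEL = "SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU"


def build_agent_payload(prompt: str, think: bool = True) -> dict:
    """Build a chat request that offers one tool and asks for thinking output."""
    read_file_tool = {
        "type": "function",
        "function": {
            "name": "read_file",  # hypothetical tool, not provided by the model
            "description": "Return the contents of a file in the project.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Relative file path"},
                },
                "required": ["path"],
            },
        },
    }
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [read_file_tool],
        "think": think,  # assumption: the server version supports this field
        "stream": False,
    }


def extract_tool_calls(response_body: dict) -> list:
    """Return (name, arguments) pairs from a chat response, or [] if none."""
    calls = response_body.get("message", {}).get("tool_calls") or []
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]


if __name__ == "__main__":
    # Only prints the request body; no server is contacted here.
    print(json.dumps(build_agent_payload("Summarise main.py"), indent=2))
```

A coding agent would run each returned tool call itself, append the result to `messages` as a tool message, and send the conversation back to the model.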


![image.png](/assets/SetneufPT/ccode79_9b_q4_64k_16gb-gpu/89a089a9-c581-4ec0-9cc0-cc3515446e71)
