738 1 week ago

Custom model for coding with agents to use locally with 16gb GPUs (working fine...)

ollama run SetneufPT/Qwen3.6-27B-MTP_Q3_32K_16GB-GPU

Models

View all →

Readme


Qwen 3.6 MTP - 27B param, Q3, 32K ctx, Local/Offline, 16GB GPU

Custom Ollama model, fine-tuned from Qwen3.6 MTP 27B from unsloth, configured for local coding-agent workflows, especially with Open Code / Hermes.

This model is based on a 27B parameter LLM, quantized in Q3, with MTP and configured with a large context window for software development tasks. It is intended to provide a practical balance between performance, memory usage, and code-assistance quality on local hardware.

Model details

  • Type: Text model with MTP*
  • Size: 27B parameters
  • Quantization: Q3
  • Context target: 32K
  • Real GPU memory usage: 15 GB VRAM
  • Recommended GPU memory: 16 GB VRAM
  • Main focus: Coding and agentic development workflows
  • Tool use: Supported, depending on the client/application
  • Thinking/reasoning mode: Supported, depending on the client/application

Intended use

This model is designed for:

  • Agents workflows
  • Local coding assistants
  • Code analysis
  • Debugging support
  • Refactoring suggestions
  • Project exploration
  • Terminal-based programming tasks
  • Educational demonstrations of AI coding agents

* if runtime allows

image.png