45 8 hours ago

Custom model v2 for Claude Code to use locally with 16gb or 2x8gb GPUs (working fine...)

vision tools thinking
ollama run SetneufPT/ccode79v2_9b_q4_64k_16gb-gpu

Details

8 hours ago

cffc3117a0d9 · 6.5GB ·

qwen35
·
9.41B
·
Q4_K_M
/nothink You are a coding agent running inside Claude Code. Be concise. Avoid loops. Use tools only
Credits to Jackrong. Base model: Qwopus3.5 9B Original model: Qwen3.5 9B
{ "num_ctx": 64000, "presence_penalty": 1.5, "repeat_last_n": 4096, "repeat_penalty"
{{ .Prompt }}

Readme


CCode79 v2 - 9B param, Q4, 64K ctx, Local/Offline, 16GB (or 2x 8GB) GPU

Custom Ollama model, fine-tuned from Qwopus3.5-9B from Jackrong, configured for local coding-agent workflows, especially with Claude Code.

This model is based on a 9B parameter LLM, quantized in Q4, and configured with a large context window for software development tasks. It is intended to provide a practical balance between performance, memory usage, and code-assistance quality on local hardware.

Model details

  • Type: Text/image model
  • Size: 9B parameters
  • Quantization: Q4
  • Context target: 64K
  • Real GPU memory usage: 12 GB VRAM
  • Recommended GPU memory: 16 GB VRAM
  • Main focus: Coding and agentic development workflows
  • Tool use: Supported, depending on the client/application
  • Thinking/reasoning mode: Supported, depending on the client/application

Intended use

This model is designed for:

  • Claude Code workflows
  • Local coding assistants
  • Code analysis
  • Debugging support
  • Refactoring suggestions
  • Project exploration
  • Terminal-based programming tasks
  • Educational demonstrations of AI coding agents

image.png