49 5 days ago

Custom model for Hermes to use locally with 8gb GPUs (expect no miracles...)

vision tools thinking
ollama run SetneufPT/hermes79_2b_q4_128k_8gb-gpu

Details

5 days ago

f5e13a33e759 · 1.9GB ·

qwen35
·
2.27B
·
Q4_K_M
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
You are a coding agent running inside Hermes. CORE RULES: - Minimize creativity. - Maximize determin
{ "num_ctx": 128000, "presence_penalty": 1.5, "repeat_last_n": 2048, "repeat_penalty

Readme


Hermes79 - 2B param, Q4, 128K ctx, Local/Offline, 8GB GPU

Custom Ollama model, fine-tuned from Qwen3.5-2B, configured for Hermes Agent and local personal-assistant workflows.

This model is based on a 2B parameter LLM, quantized in Q4, and configured with a very large context window for long assistant sessions. It is intended for local AI assistant experiments where privacy, offline operation, tool use, and extended conversations are important.

Model details

  • Type: Text/image model
  • Size: 2B parameters
  • Quantization: Q4
  • Context target: 128K
  • Real GPU memory usage: 6,5 GB VRAM
  • Recommended GPU memory: 8 GB VRAM
  • Main focus: Personal assistant and AI agent workflows
  • Tool use: Supported, depending on the client/application
  • Thinking/reasoning mode: Supported, depending on the client/application

Intended use

This model is designed for:

  • Hermes Agent
  • Local personal assistant workflows
  • Long-context conversations
  • Tool-assisted tasks
  • Linux/Windows help
  • Network and computer diagnostics
  • Educational AI agent demonstrations
  • Offline/private assistant experiments

image.png