50 5 days ago

Custom model for Hermes to use locally with 8gb GPUs (expect no miracles...)

vision tools thinking
ollama run SetneufPT/hermes79_2b_q4_128k_8gb-gpu

Applications

Claude Code
Claude Code ollama launch claude --model SetneufPT/hermes79_2b_q4_128k_8gb-gpu
Codex App
Codex App ollama launch codex-app --model SetneufPT/hermes79_2b_q4_128k_8gb-gpu
OpenClaw
OpenClaw ollama launch openclaw --model SetneufPT/hermes79_2b_q4_128k_8gb-gpu
Hermes Agent
Hermes Agent ollama launch hermes --model SetneufPT/hermes79_2b_q4_128k_8gb-gpu
Codex
Codex ollama launch codex --model SetneufPT/hermes79_2b_q4_128k_8gb-gpu
OpenCode
OpenCode ollama launch opencode --model SetneufPT/hermes79_2b_q4_128k_8gb-gpu

Models

View all →

Readme


Hermes79 - 2B param, Q4, 128K ctx, Local/Offline, 8GB GPU

Custom Ollama model, fine-tuned from Qwen3.5-2B, configured for Hermes Agent and local personal-assistant workflows.

This model is based on a 2B parameter LLM, quantized in Q4, and configured with a very large context window for long assistant sessions. It is intended for local AI assistant experiments where privacy, offline operation, tool use, and extended conversations are important.

Model details

  • Type: Text/image model
  • Size: 2B parameters
  • Quantization: Q4
  • Context target: 128K
  • Real GPU memory usage: 6,5 GB VRAM
  • Recommended GPU memory: 8 GB VRAM
  • Main focus: Personal assistant and AI agent workflows
  • Tool use: Supported, depending on the client/application
  • Thinking/reasoning mode: Supported, depending on the client/application

Intended use

This model is designed for:

  • Hermes Agent
  • Local personal assistant workflows
  • Long-context conversations
  • Tool-assisted tasks
  • Linux/Windows help
  • Network and computer diagnostics
  • Educational AI agent demonstrations
  • Offline/private assistant experiments

image.png