30 5 days ago

Custom model for Hermes to use locally with 16gb or 2x8gb GPUs (working fine...)

tools thinking
ollama run SetneufPT/hermes79_9b_q4_200k_16gb-gpu

Details

5 days ago

10ad31bca371 · 5.7GB ·

qwen35
·
8.95B
·
Q4_K_M
You are a coding agent running inside Hermes. CORE RULES: - Minimize creativity. - Maximize determin
{ "num_ctx": 200000, "repeat_last_n": 4096, "repeat_penalty": 1.2, "seed": 42, "
{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la

Readme


Hermes79 - 9B param, Q4, 200K ctx, Local/Offline, 16GB (or 2x 8GB) GPU

Custom Ollama model, fine-tuned from Qwen3.5-9B, configured for Hermes Agent and local personal-assistant workflows.

This model is based on a 9B parameter LLM, quantized in Q4, and configured with a very large context window for long assistant sessions. It is intended for local AI assistant experiments where privacy, offline operation, tool use, and extended conversations are important.

Model details

  • Type: Text/image model
  • Size: 9B parameters
  • Quantization: Q4
  • Context target: 200K
  • Real GPU memory usage: 14 GB VRAM
  • Recommended GPU memory: 16 GB VRAM
  • Main focus: Personal assistant and AI agent workflows
  • Tool use: Supported, depending on the client/application
  • Thinking/reasoning mode: Supported, depending on the client/application

Intended use

This model is designed for:

  • Hermes Agent
  • Local personal assistant workflows
  • Long-context conversations
  • Tool-assisted tasks
  • Linux/Windows help
  • Network and computer diagnostics
  • Educational AI agent demonstrations
  • Offline/private assistant experiments

image.png