Details

Updated 5 days ago

5 days ago

10ad31bca371 · 5.7GB ·

model

archqwen35

parameters8.95B

quantizationQ4_K_M

5.7GB

system

You are a coding agent running inside Hermes. CORE RULES: - Minimize creativity. - Maximize determin

731B

params

{ "num_ctx": 200000, "repeat_last_n": 4096, "repeat_penalty": 1.2, "seed": 42, "

167B

template

{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la

1.5kB

Hermes79 - 9B param, Q4, 200K ctx, Local/Offline, 16GB (or 2x 8GB) GPU

Custom Ollama model, fine-tuned from Qwen3.5-9B, configured for Hermes Agent and local personal-assistant workflows.

This model is based on a 9B parameter LLM, quantized in Q4, and configured with a very large context window for long assistant sessions. It is intended for local AI assistant experiments where privacy, offline operation, tool use, and extended conversations are important.

Model details

Type: Text/image model
Size: 9B parameters
Quantization: Q4
Context target: 200K
Real GPU memory usage: 14 GB VRAM
Recommended GPU memory: 16 GB VRAM
Main focus: Personal assistant and AI agent workflows
Tool use: Supported, depending on the client/application
Thinking/reasoning mode: Supported, depending on the client/application

Intended use

This model is designed for:

Hermes Agent
Local personal assistant workflows
Long-context conversations
Tool-assisted tasks
Linux/Windows help
Network and computer diagnostics
Educational AI agent demonstrations
Offline/private assistant experiments

Custom model for Hermes to use locally with 16gb or 2x8gb GPUs (working fine...)

Details

Readme

Hermes79 - 9B param, Q4, 200K ctx, Local/Offline, 16GB (or 2x 8GB) GPU

Model details

Intended use