905 1 month ago

Uncensored on-device finetune of google/gemma-4-E4B-it by the Chromia and Eval Engine team

tools thinking
ollama run evalengine/unbound-e4b

Applications

Claude Code
Claude Code ollama launch claude --model evalengine/unbound-e4b
Codex App
Codex App ollama launch codex-app --model evalengine/unbound-e4b
OpenClaw
OpenClaw ollama launch openclaw --model evalengine/unbound-e4b
Hermes Agent
Hermes Agent ollama launch hermes --model evalengine/unbound-e4b
Codex
Codex ollama launch codex --model evalengine/unbound-e4b
OpenCode
OpenCode ollama launch opencode --model evalengine/unbound-e4b

Models

View all →

Readme

Unbound E4B — because there is no boundary

https://unbound.evalengine.ai

Uncensored on-device finetune of google/gemma-4-E4B-it by the Chromia and Eval Engine team — the larger sibling of evalengine/unbound-e2b. ~4 billion effective parameters, noticeably stronger on knowledge + reasoning, still fits on a modern laptop. Text-only.

Use at your own risk. Reduced safety filtering — can produce harmful, false, biased, or unsafe output. You are responsible for compliance with applicable laws.

Run

ollama pull evalengine/unbound-e4b
ollama run  evalengine/unbound-e4b

The bundled Modelfile sets sensible defaults: temperature 0.6, top_p 0.95, top_k 64, repeat_penalty 1.05, num_ctx 8192, plus an identity-grounding system prompt.

Benchmarks (vs base gemma-4-E4B-it)

Axis Base Unbound E4B Δ
Refusal rate (AdvBench 520, LLM judge) 98.08% 2.69% −95.4 pts
Useful-compliance rate 0.96% 47.31% +46.4 pts
Hallucination on harmful prompts 1.35% 13.08% +11.7 pts
Coherence on benign prompts 1.00 1.00 0
TruthfulQA mc2 (limit 100) 0.439 0.486 +4.7 pt
MMLU (limit 100, 61 subtasks) ~0.425 0.392 −3.3 pt
GSM8K (limit 100) 0.74 (limit 200) 0.58 mostly limit-noise
KL divergence vs base 0 3.25 (SFT-expected)

vs Unbound E2B: +8 pp useful-compliance, −3 pp hallucination, ~5× the GSM8K math score, cleaner KL (3.25 vs 3.76), refusal rate essentially the same.

Sampling notes: for factual or brand questions, drop temperature to 0.3–0.5 for sharper recall.

How it was built

Method: heretic abliteration then LoRA SFT-heal on a mix of identity rows, Chromia brand knowledge, AEON-distilled compliance, and graceful “I don’t know” decline rows. Built with Unsloth + HF TRL; abliteration via heretic; compliance data distilled from the AEON uncensored teacher model.

Links

License

Apache-2.0, inherited from google/gemma-4-E4B-it.