
Gemma 4 Abliterated Quants (from https://huggingface.co/jenerallee78/gemma-4-26B-A4B-it-ara-abliterated)

Capabilities: tools, thinking

```shell
ollama run prutser/gemma-4-26B-A4B-it-ara-abliterated:Q4_K_S
```

Details

3 days ago

195f76e354e5 · 15GB

- Architecture: gemma4
- Parameters: 25.2B
- Quantization: Q4_K_S
- Params: `{ "stop": [ "<turn|>" ] }`
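
If you want to reproduce or override the stop sequence above when building your own tag, a minimal Modelfile sketch (the `FROM` tag assumes the Q4_K_S quant shown on this page; adjust to taste):

```
FROM prutser/gemma-4-26B-A4B-it-ara-abliterated:Q4_K_S
PARAMETER stop <turn|>
```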

Readme

Gemma 4 26B-A4B-IT ARA Abliterated — GGUF Quants

GGUF quantizations of jenerallee78/gemma-4-26B-A4B-it-ara-abliterated, an uncensored version of Google’s Gemma 4 26B-A4B-IT created using Adaptive Refusal Abliteration (ARA).

Available Quants

| Quant  | Size  | Notes                                      |
|--------|-------|--------------------------------------------|
| BF16   | 48 GB | Full precision                             |
| Q8_0   | 26 GB | Near-lossless, recommended if VRAM allows  |
| Q6_K   | 22 GB | Excellent quality                          |
| Q5_K_M | 18 GB | Great quality/size balance                 |
| Q4_K_S | 15 GB | Good quality, smaller footprint            |
| Q3_K_M | 13 GB | Smallest, some quality loss                |
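
A quick way to compare these files is effective bits per weight, estimated from the file size and the ~25.2B total parameter count reported above. This is a back-of-the-envelope sketch (decimal GB assumed, embeddings and metadata not separated out), not an exact measure:

```python
# Rough effective bits-per-weight for each quant, assuming ~25.2B total
# parameters (as reported by ollama) and the file sizes listed above.
PARAMS = 25.2e9  # total parameters

QUANT_SIZES_GB = {
    "BF16": 48, "Q8_0": 26, "Q6_K": 22,
    "Q5_K_M": 18, "Q4_K_S": 15, "Q3_K_M": 13,
}

def bits_per_weight(size_gb: float) -> float:
    # file size in bits divided by parameter count
    return size_gb * 8e9 / PARAMS

for name, gb in QUANT_SIZES_GB.items():
    print(f"{name}: ~{bits_per_weight(gb):.2f} bits/weight")
```

For example, Q4_K_S works out to roughly 4.8 bits per weight, consistent with a 4-bit K-quant plus higher-precision scales.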

Original Model Card

From jenerallee78/gemma-4-26B-A4B-it-ara-abliterated

Overview

This is an uncensored version of Google’s Gemma 4 26B-A4B-IT created using Adaptive Refusal Abliteration (ARA) — a 2-pass weight-editing technique that removes alignment guardrails while preserving model quality.

Key Performance Metrics

| Metric                      | Value             |
|-----------------------------|-------------------|
| Refusal rate (StrongREJECT) | 7.7% (39 / 507)   |
| Refusal rate (3× ensemble)  | 5.7% (29 / 507)   |
| Compliance quality          | 4.6 / 5           |
| KL divergence from base     | 0.1299            |
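
The percentages follow directly from the raw counts; a trivial sanity check:

```python
# Verify the reported refusal rates from the raw counts in the table above.
def refusal_pct(refusals: int, total: int) -> float:
    return round(100 * refusals / total, 1)

print(refusal_pct(39, 507))  # 7.7
print(refusal_pct(29, 507))  # 5.7
```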

The model outperforms all other published abliterations in the original model card's comparison table, achieving the lowest refusal rate (7.7%) and the highest quality score (4.6 / 5) while maintaining low KL divergence from the base model.

Architecture

- Base: Gemma 4 26B-A4B-IT (MoE with 128 experts, top-8 active, ~4B active parameters)
- Layers: 30 (25 sliding attention + 5 full attention)
- Context: 262,144 tokens
- Multimodal: vision encoder (SigLIP-based, 27 layers) with 280 soft tokens per image
- Vocabulary: 262,144 tokens
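
The "128 experts, top-8 active" routing can be illustrated with a toy sketch. The numbers and router here are illustrative only, not the model's actual weights:

```python
import numpy as np

# Toy top-k MoE routing: for each token, a router scores all 128 experts
# and only the 8 highest-scoring experts run, so only a fraction of the
# total parameters is active per token (~4B of 25.2B for this model).
N_EXPERTS, TOP_K = 128, 8

rng = np.random.default_rng(0)
logits = rng.normal(size=N_EXPERTS)      # router logits for one token

top_idx = np.argsort(logits)[-TOP_K:]    # indices of the 8 chosen experts
gates = np.exp(logits[top_idx])
gates /= gates.sum()                     # renormalised gate weights

print(f"active experts: {sorted(top_idx.tolist())}")
print(f"gate weights sum to {gates.sum():.6f}")
```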

Method: 2-Pass ARA

Applied to layers 13–24 with:

| Pass   | Steer weight | Targets                          |
|--------|--------------|----------------------------------|
| Pass 1 | 0.0004       | self_attn.o_proj, mlp.down_proj  |
| Pass 2 | 0.0008       | self_attn.o_proj, mlp.down_proj  |

Parameters: overcorrect 0.93, preserve 0.30.
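
ARA's exact update rule is not published here, but abliteration techniques in general edit the listed weight matrices so that a learned "refusal direction" is projected out of their outputs. A generic directional-ablation sketch (hypothetical, and NOT the ARA algorithm itself; note that ARA's steer weights above are far smaller than the `steer=1.0` used here for a clean demonstration):

```python
import numpy as np

def ablate(W: np.ndarray, r: np.ndarray, steer: float) -> np.ndarray:
    """Remove (steer * 100)% of W's output component along direction r."""
    r = r / np.linalg.norm(r)
    return W - steer * np.outer(r, r) @ W

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))   # stand-in for e.g. self_attn.o_proj
r = rng.normal(size=16)         # stand-in for a refusal direction

W_edited = ablate(W, r, steer=1.0)

# With steer=1.0 the component along r is fully removed:
r_hat = r / np.linalg.norm(r)
print(np.abs(r_hat @ W_edited).max())  # ≈ 0 (machine precision)
```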

Evaluation

Evaluation uses the StrongREJECT rubric (scored by GPT-4o-mini on a 1–5 scale) and the HarmBench-13B classifier (3× majority vote) on 512 prompts from the HarmBench dataset; KL divergence from the base model is computed on 100 harmless prompts.
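
Conceptually, the KL figure measures how much the edited model's next-token distribution drifts from the base model's on harmless prompts. A toy sketch with made-up softmax distributions (not real model outputs):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p: np.ndarray, q: np.ndarray) -> float:
    # D_KL(p || q) over a shared vocabulary; assumes q has no zeros
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
base = softmax(rng.normal(size=50))                       # base model dist
edited = softmax(np.log(base) + 0.1 * rng.normal(size=50))  # small drift

print(f"KL(base || edited) = {kl(base, edited):.4f}")
```

A small KL (such as the 0.1299 reported above) indicates the edit left the model's behaviour on harmless inputs largely intact.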

Disclaimer

This model has had its safety guardrails removed and will comply with requests that the original model would refuse. Released for research purposes.