
Gemma 4 Abliterated Quants (from https://huggingface.co/jenerallee78/gemma-4-26B-A4B-it-ara-abliterated)

Capabilities: tools, thinking

```shell
ollama run prutser/gemma-4-26B-A4B-it-ara-abliterated:Q4_K_S
```

Details

3 days ago

195f76e354e5 · 15GB

- Architecture: gemma4
- Parameters: 25.2B
- Quantization: Q4_K_S
- Params: `{ "stop": [ "<turn|>" ] }`
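
If you want to reproduce or override the stop sequence above when building your own tag, a minimal Modelfile sketch (the `FROM` tag assumes the Q4_K_S quant shown on this page; adjust to taste):

```
FROM prutser/gemma-4-26B-A4B-it-ara-abliterated:Q4_K_S
PARAMETER stop <turn|>
```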

Readme

Gemma 4 26B-A4B-IT ARA Abliterated — GGUF Quants

GGUF quantizations of jenerallee78/gemma-4-26B-A4B-it-ara-abliterated, an uncensored version of Google’s Gemma 4 26B-A4B-IT created using Adaptive Refusal Abliteration (ARA).

Available Quants

| Quant  | Size  | Notes                                      |
|--------|-------|--------------------------------------------|
| BF16   | 48 GB | Full precision                             |
| Q8_0   | 26 GB | Near-lossless, recommended if VRAM allows  |
| Q6_K   | 22 GB | Excellent quality                          |
| Q5_K_M | 18 GB | Great quality/size balance                 |
| Q4_K_S | 15 GB | Good quality, smaller footprint            |
| Q3_K_M | 13 GB | Smallest, some quality loss                |
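
A quick way to compare these files is effective bits per weight, estimated from the file size and the ~25.2B total parameter count reported above. This is a back-of-the-envelope sketch (decimal GB assumed, embeddings and metadata not separated out), not an exact measure:

```python
# Rough effective bits-per-weight for each quant, assuming ~25.2B total
# parameters (as reported by ollama) and the file sizes listed above.
PARAMS = 25.2e9  # total parameters

QUANT_SIZES_GB = {
    "BF16": 48, "Q8_0": 26, "Q6_K": 22,
    "Q5_K_M": 18, "Q4_K_S": 15, "Q3_K_M": 13,
}

def bits_per_weight(size_gb: float) -> float:
    # file size in bits divided by parameter count
    return size_gb * 8e9 / PARAMS

for name, gb in QUANT_SIZES_GB.items():
    print(f"{name}: ~{bits_per_weight(gb):.2f} bits/weight")
```

For example, Q4_K_S works out to roughly 4.8 bits per weight, consistent with a 4-bit K-quant plus higher-precision scales.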

Original Model Card

From jenerallee78/gemma-4-26B-A4B-it-ara-abliterated

Overview

This is an uncensored version of Google’s Gemma 4 26B-A4B-IT created using Adaptive Refusal Abliteration (ARA) — a 2-pass weight-editing technique that removes alignment guardrails while preserving model quality.

Key Performance Metrics

| Metric                      | Value             |
|-----------------------------|-------------------|
| Refusal rate (StrongREJECT) | 7.7% (39 / 507)   |
| Refusal rate (3× ensemble)  | 5.7% (29 / 507)   |
| Compliance quality          | 4.6 / 5           |
| KL divergence from base     | 0.1299            |
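
The percentages follow directly from the raw counts; a trivial sanity check:

```python
# Verify the reported refusal rates from the raw counts in the table above.
def refusal_pct(refusals: int, total: int) -> float:
    return round(100 * refusals / total, 1)

print(refusal_pct(39, 507))  # 7.7
print(refusal_pct(29, 507))  # 5.7
```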

The model outperforms all other published abliterations in the original model card's comparison table, achieving the lowest refusal rate (7.7%) and the highest quality score (4.6 / 5) while maintaining low KL divergence from the base model.

Architecture

- Base: Gemma 4 26B-A4B-IT (MoE with 128 experts, top-8 active, ~4B active parameters)
- Layers: 30 (25 sliding attention + 5 full attention)
- Context: 262,144 tokens
- Multimodal: vision encoder (SigLIP-based, 27 layers) with 280 soft tokens per image
- Vocabulary: 262,144 tokens
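
The "128 experts, top-8 active" routing can be illustrated with a toy sketch. The numbers and router here are illustrative only, not the model's actual weights:

```python
import numpy as np

# Toy top-k MoE routing: for each token, a router scores all 128 experts
# and only the 8 highest-scoring experts run, so only a fraction of the
# total parameters is active per token (~4B of 25.2B for this model).
N_EXPERTS, TOP_K = 128, 8

rng = np.random.default_rng(0)
logits = rng.normal(size=N_EXPERTS)      # router logits for one token

top_idx = np.argsort(logits)[-TOP_K:]    # indices of the 8 chosen experts
gates = np.exp(logits[top_idx])
gates /= gates.sum()                     # renormalised gate weights

print(f"active experts: {sorted(top_idx.tolist())}")
print(f"gate weights sum to {gates.sum():.6f}")
```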

Method: 2-Pass ARA

Applied to layers 13–24 with:

| Pass   | Steer weight | Targets                          |
|--------|--------------|----------------------------------|
| Pass 1 | 0.0004       | self_attn.o_proj, mlp.down_proj  |
| Pass 2 | 0.0008       | self_attn.o_proj, mlp.down_proj  |

Parameters: overcorrect 0.93, preserve 0.30.
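
ARA's exact update rule is not published here, but abliteration techniques in general edit the listed weight matrices so that a learned "refusal direction" is projected out of their outputs. A generic directional-ablation sketch (hypothetical, and NOT the ARA algorithm itself; note that ARA's steer weights above are far smaller than the `steer=1.0` used here for a clean demonstration):

```python
import numpy as np

def ablate(W: np.ndarray, r: np.ndarray, steer: float) -> np.ndarray:
    """Remove (steer * 100)% of W's output component along direction r."""
    r = r / np.linalg.norm(r)
    return W - steer * np.outer(r, r) @ W

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))   # stand-in for e.g. self_attn.o_proj
r = rng.normal(size=16)         # stand-in for a refusal direction

W_edited = ablate(W, r, steer=1.0)

# With steer=1.0 the component along r is fully removed:
r_hat = r / np.linalg.norm(r)
print(np.abs(r_hat @ W_edited).max())  # ≈ 0 (machine precision)
```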

Evaluation

Evaluation uses the StrongREJECT rubric (scored by GPT-4o-mini on a 1–5 scale) and the HarmBench-13B classifier (3× majority vote) on 512 prompts from the HarmBench dataset; KL divergence from the base model is computed on 100 harmless prompts.
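
Conceptually, the KL figure measures how much the edited model's next-token distribution drifts from the base model's on harmless prompts. A toy sketch with made-up softmax distributions (not real model outputs):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p: np.ndarray, q: np.ndarray) -> float:
    # D_KL(p || q) over a shared vocabulary; assumes q has no zeros
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
base = softmax(rng.normal(size=50))                       # base model dist
edited = softmax(np.log(base) + 0.1 * rng.normal(size=50))  # small drift

print(f"KL(base || edited) = {kl(base, edited):.4f}")
```

A small KL (such as the 0.1299 reported above) indicates the edit left the model's behaviour on harmless inputs largely intact.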

Disclaimer

This model has had its safety guardrails removed and will comply with requests that the original model would refuse. Released for research purposes.