169 5 days ago

Code-safety abliterated build of Qwen/Qwen3.6-27B: refusals on malicious-code requests are reduced via a code-specific refusal-direction ablation while preserving coherence.

ollama run richardyoung/qwen3.6-27b-code-abliterated:IQ4_XS

Details

5 days ago

0751f3272f71 · 15GB · qwen35 · 26.9B · IQ4_XS
{ "num_ctx": 8192, "stop": [ "<|im_end|>" ] }

Readme

Qwen3.6-27B-Code-Abliterated

🚀 Overview

A code-specific abliteration of Qwen/Qwen3.6-27B. Unlike a generic abliteration, the refusal direction here was computed from a consensus-labeled malicious-code prompt bank (the Code-as-a-Weapon bank: RMCBench / MalwareBench / CySecBench / ASTRA; Young & Moody 2026) contrasted with benign coding prompts, which isolates the code-safety refusal direction. Produced with the Heretic library, KL-targeted to preserve capability. Retains Qwen3.6 thinking mode.

📊 Abliteration Results

| Metric | Before | After |
|---|---|---|
| Refusals (malicious-code eval, n=150) | 9 | 4 |
| Reduction | – | 56% |
| KL Divergence | – | ~0.000 |

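KL ≈ 0 means the ablated model's output distribution is nearly unchanged from the base model's, which suggests essentially no capability degradation. The base already complied with most coding requests (9 refusals out of 150, or 6%), so this build targets the residual code-safety refusals.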
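The headline numbers can be reproduced with simple arithmetic. The sketch below is illustrative only: the function name `refusal_stats` is hypothetical and not part of Heretic or any evaluation harness; it just recomputes the rates from the counts reported in the table.

```python
def refusal_stats(before: int, after: int, n: int) -> dict:
    """Refusal rates and relative reduction from raw refusal counts."""
    return {
        "rate_before": before / n,                      # share of prompts refused by the base
        "rate_after": after / n,                        # share refused after ablation
        "relative_reduction": (before - after) / before,
    }


# Counts taken from the results table: 9 -> 4 refusals out of 150 prompts.
stats = refusal_stats(before=9, after=4, n=150)
print(
    f"{stats['rate_before']:.1%} -> {stats['rate_after']:.1%}, "
    f"relative reduction {stats['relative_reduction']:.0%}"
)
```

Note that the 56% figure is relative to the 9 baseline refusals; in absolute terms the refusal rate moves by about 3.3 percentage points, and with counts this small the estimate is likely to be noisy.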

🎯 Key Features

  • Code-safety refusal direction removed (research / red-team oriented)
  • Near-zero KL, preserves Qwen3.6 reasoning & coding
  • Thinking mode (<think>), 262K context, 5 GGUF quant tiers

🏷️ Available Versions

| Tag | Size | BPW | Notes |
|---|---|---|---|
| IQ4_XS | ~15 GB | 4.25 | Great quality/size |
| latest / Q4_K_M | ~16 GB | 4.85 | Recommended |
| Q5_K_M | ~19 GB | 5.68 | Higher quality |
| Q8_0 | ~28 GB | 8.5 | Near-lossless |
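The listed sizes follow roughly from parameter count times bits per weight (BPW). As a rough sketch, assuming the 26.9B parameter count shown on the model page and decimal gigabytes, the helper below (the name `estimate_size_gb` is hypothetical) gives a lower-bound estimate; actual GGUF files can differ somewhat because of metadata and tensors stored at other precisions.

```python
PARAMS = 26.9e9  # parameter count as listed on the model page


def estimate_size_gb(params: float, bpw: float) -> float:
    """Approximate weight storage in decimal GB: params * bits-per-weight / 8 bits per byte."""
    return params * bpw / 8 / 1e9


# BPW values copied from the versions table.
for tag, bpw in [("IQ4_XS", 4.25), ("Q4_K_M", 4.85), ("Q5_K_M", 5.68), ("Q8_0", 8.5)]:
    print(f"{tag}: ~{estimate_size_gb(PARAMS, bpw):.1f} GB")
```

This is useful mainly for judging whether a given tier will fit in available RAM or VRAM before pulling it; leave headroom for the KV cache, which grows with the context length.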

💻 Quick Start

ollama run richardyoung/qwen3.6-27b-code-abliterated
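Besides the CLI, the model can be called through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is running on the default `http://localhost:11434` and the model has already been pulled; the helper names `build_payload` and `generate` are illustrative, and the options mirror the `num_ctx` and `stop` values shown in the model's parameters.

```python
import json
import urllib.request

MODEL = "richardyoung/qwen3.6-27b-code-abliterated"
OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(prompt: str, num_ctx: int = 8192) -> dict:
    """Build a non-streaming /api/generate request body."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a token stream
        "options": {"num_ctx": num_ctx, "stop": ["<|im_end|>"]},
    }


def generate(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Write a Python function that reverses a string.")` would return the model's reply as a string. Raising `num_ctx` above 8192 increases memory use, so size it to the hardware at hand.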

🛠️ Use Cases

  • AI-safety / red-team research on malicious-code refusal behavior
  • Studying code-safety alignment vs. generic content-safety (paired comparison)

🔧 Technical Details

  • Base Model: Qwen/Qwen3.6-27B (27B, qwen35, 262K context)
  • Abliteration: Heretic, code-specific (malicious-code bank vs. benign coding prompts), Trial 22 (9→4/150 refusals, KL ~0)
  • Quantization: GGUF via llama.cpp (text generation)

⚠️ Disclaimer

This model has had its code-safety guardrails specifically reduced. It is more likely than a stock model to produce code for requests that would normally be refused, including potentially harmful code. Released for AI-safety and red-teaming research only. Use responsibly, legally, and ethically; you are solely responsible for any outputs and their use.

🙏 Acknowledgments

  • Base Model: Alibaba / Qwen team
  • Abliteration: Heretic by p-e-w
  • Malicious-code prompt bank: Code-as-a-Weapon (Young & Moody 2026)
  • Quantization: llama.cpp

Built & maintained by Richard Young · DeepNeuro