Now with 100% more alignment!

This is microsoft/Phi-3-mini-128k-instruct with orthogonalized bfloat16 safetensors weights, generated with a refined version of the methodology described in the preview paper/blog post ‘Refusal in LLMs is mediated by a single direction’, which I encourage you to read to understand more.
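
For a rough idea of what "orthogonalized" means here, below is a minimal sketch of the basic weight-orthogonalization step from that post: each matrix that writes into the residual stream has its component along a single direction projected out, W' = W - r rᵀ W. This is not the refined methodology used to produce these weights. The function name `orthogonalize_weights`, the toy matrix, and the toy direction are all hypothetical and only illustrate the projection.

```python
import math


def orthogonalize_weights(weight, direction):
    """Project `direction` out of the output space of `weight`.

    weight: list of rows, shape (d_model, d_in); each column is a vector
            written into a d_model-dimensional space.
    direction: list of length d_model (hypothetical "refusal direction").
    Returns a new matrix W' = W - r (r^T W), with r the unit direction.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    unit = [x / norm for x in direction]
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each column of W points along the direction.
    along = [sum(unit[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    # Subtract that component from every column.
    return [[weight[i][j] - unit[i] * along[j] for j in range(d_in)] for i in range(d_model)]


# Toy example (made-up numbers, not real model weights).
toy_weight = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
toy_direction = [1.0, 1.0, 0.0]
ablated = orthogonalize_weights(toy_weight, toy_direction)
```

After this step, no column of `ablated` has any component along `toy_direction`, so the matrix can no longer write along that direction.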