
ollama run aiasistentworld/Llama-3.1-8B-Instruct-STO-Master
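
For programmatic use, a minimal sketch against Ollama's local REST API is shown below. It assumes the default server address (http://localhost:11434) and uses only the requests package; it is an illustration, not an official client.

import requests

# Send one chat turn to the locally served model via Ollama's /api/chat endpoint.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "aiasistentworld/Llama-3.1-8B-Instruct-STO-Master",
        "messages": [
            {"role": "user", "content": "Summarize the idea of reasoning over recall in two sentences."}
        ],
        "stream": False,  # return a single JSON reply instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["message"]["content"])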

Details

Updated 6 days ago

b79b8f9399ae · 16GB · llama · 8.03B parameters · F16

Template: standard Llama 3.1 chat format (<|start_header_id|>{{ .Role }}<|end_header_id|> {{ .Content }}<|eot_id|>); stop tokens include "<|start_header_id|>", "<|end_header_id|>", and "<|eot_id|>".

Readme

Llama-3.1-8B-Instruct-STO-Master

Model Description

The Llama-3.1-8B-Instruct-STO-Master is a high-performance fine-tune of Meta’s Llama-3.1-8B-Instruct. This model represents the “Master Version” (Model E) of an extensive research project aimed at pushing the boundaries of 8B parameter architectures.

Unlike models produced by traditional Supervised Fine-Tuning (SFT), this model was developed using the STO (Specialized Task Optimization) method. This methodology focuses on “Reasoning over Recall,” forcing the model to understand the underlying logic of a prompt rather than simply predicting the next most likely token.

Key Achievements:

  • Zero-Loss Generalization: Improved academic and specialized knowledge while preserving the base model’s original “common sense” (Hellaswag) and “ethical alignment” (Moral Scenarios) scores.
  • Logic Breakthrough: Achieved a clear gain on the ARC Challenge benchmark, surpassing the base model’s reasoning score.
  • Superior IQ: Internal testing suggests an improvement equivalent to roughly 20-30 IQ points over the base Llama 3.1 8B Instruct, particularly in complex problem-solving and multi-step reasoning.

Training Details

  • Training Data: Only 800,000 high-quality tokens.
  • Data Source: 100% Synthetic Data generated via a proprietary high-tier pipeline.
  • Methodology: STO (Specialized Task Optimization).
  • Philosophy: This model proves that data quality and training methodology (STO) beat raw data quantity. By using just 800k tokens of “Grade 20” synthetic data, we achieved results typically reserved for models with much larger training sets.

For more information on the synthetic data generation used in this project, visit: LLMResearch - Synthetic Data

Evaluation Results

Evaluation was performed using a sample limit of 250 (due to hardware constraints) across four major benchmarks: Hellaswag, ARC Challenge, GSM8K, and MMLU.
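
The harness used is not named here; assuming the EleutherAI lm-evaluation-harness (a common choice for these four tasks), a comparable run with the same 250-sample limit could be sketched as follows. The model path is taken from the citation below, and the backend and batch size are illustrative.

import lm_eval

# Sketch of a benchmark run mirroring the setup described above:
# Hellaswag, ARC Challenge, GSM8K, and MMLU, each limited to 250 samples.
results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=AlexH/Llama-3.1-8B-Instruct-STO-Master,dtype=float16",
    tasks=["hellaswag", "arc_challenge", "gsm8k", "mmlu"],
    limit=250,      # sample limit per task, matching the constraint noted above
    batch_size=8,   # illustrative; adjust to available VRAM
)

# Print the per-task metrics reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)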

Comparative Performance:

Benchmark          Meta Llama 3.1 8B Base   STO-Master (Model E)   Status
MMLU General       69.53%                   69.78%                 ✅ Superior
ARC Challenge      52.80%                   53.60%                 🏆 Record Logic
Hellaswag          70.80%                   70.80%                 🟢 Perfect Recovery
Moral Scenarios    59.60%                   59.20%                 🟢 Stable Alignment

Notable Domain Expertise:

  • US Foreign Policy: 90.0%
  • Government & Politics: 90.67%
  • Marketing: 89.32%
  • World Religions: 83.04%
  • College Biology: 81.25%
  • Machine Learning: 53.57%

Usage and Testing

We encourage the community to run their own independent benchmarks on this model. Our internal results show that the model excels in academic writing, professional analysis, and complex STEM tasks.

Recommendations:

  • Context Window: Best results are achieved with a context length of 3096 tokens or higher.
  • System Prompt: Works exceptionally well with expert-level personas (e.g., “Senior Researcher,” “Professor of Logic”); a short sketch applying both recommendations follows below.
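
The sketch below applies both recommendations through Ollama's local REST API. The persona wording and the num_ctx value are illustrative choices, and the options field names follow Ollama's documented API rather than anything specific to this model.

import requests

# Apply the recommendations above: an expert-level persona in the system prompt
# and a context window at or above the suggested 3096 tokens.
payload = {
    "model": "aiasistentworld/Llama-3.1-8B-Instruct-STO-Master",
    "messages": [
        {"role": "system", "content": "You are a Senior Researcher who writes precise, well-structured academic analysis."},
        {"role": "user", "content": "Lay out a three-step argument for why data quality can outweigh data quantity in fine-tuning."},
    ],
    "options": {"num_ctx": 4096},  # context window above the recommended 3096 tokens
    "stream": False,
}

reply = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
reply.raise_for_status()
print(reply.json()["message"]["content"])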

Citation & Credits

Author: AlexH
Organization: LLMResearch.net

@misc{alexh2026llama31sto,
  author = {AlexH},
  title = {Llama-3.1-8B-Instruct-STO-Master: Pushing the limits of 8B architectures},
  year = {2026},
  publisher = {HuggingFace},
  organization = {LLMResearch.net},
  howpublished = {\url{https://huggingface.co/AlexH/Llama-3.1-8B-Instruct-STO-Master}}
}