
QN-Summarizer-1 is a 400M-parameter model specialized for summarization, outperforming ~20B models on summarization-focused benchmarks. It delivers roughly 1.5 s inference on CPU and millisecond-scale inference on GPU, and accepts inputs of up to 7,000 words for fast, high-quality document summarization.

400m
ollama run QyrouNnet/summarizer:400m
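
Because the model accepts at most 7,000 words, a caller may want to clamp longer documents before sending them in. A minimal sketch, assuming truncation at a word boundary is acceptable (the helper name and strategy are illustrative, not part of the model's API):

```python
def clamp_to_word_limit(text: str, limit: int = 7000) -> str:
    """Truncate text to at most `limit` whitespace-separated words.

    QN-Summarizer-1 accepts up to 7,000 words, so longer inputs are
    cut at a word boundary rather than mid-word.
    """
    words = text.split()
    if len(words) <= limit:
        return text
    return " ".join(words[:limit])
```

A caller would apply this before passing the document to `ollama run QyrouNnet/summarizer:400m` or the Ollama API.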

Models


summarizer:400m

367MB · 8K context window · Text · 1 week ago

Readme


QyrouNnet-Summarizer-1:0.4B is the first summarization model in the QyrouNnet/Summarizer family.

Model


Overview

Built for factual summarization, this model is optimized to compress structured information such as articles, essays, webpages, reports, study material, technical explanations, and newspaper-style content into clear, source-focused summaries. It is designed to be fast, lightweight, and reliable to deploy locally, making it suitable for practical workflows where users need efficient summarization without relying on large infrastructure.

Its strongest use case is extracting key factual points from long informational text while keeping the output concise, consistent, and grounded in the source material. It is not primarily designed for natural conversations, fictional stories, or dialogue-heavy content, where tone, emotion, and narrative flow are more important. However, for factual essays, documents, webpages, and long-form informational data, it provides a focused and dependable summarization experience built around speed, reliability, and practical deployment.

Supports English only at this time.


Model Interactions

QN-Summarizer-1 isn’t a normal conversational AI model. Sending it hi will not get you a reply like How can I help you? This model is strictly for summarizing content.

Example (user):

<|user|>
Airplanes fly because four main forces act on them: lift, weight, thrust, and drag. Weight pulls the aircraft downward due to gravity, while thrust moves it forward through engines or propellers. As the airplane moves, air flows over and under its wings. The wings have an airfoil shape and are set at an angle, helping direct air downward and creating lift upward.

Pressure differences also help produce lift, with lower pressure usually forming above the wing and higher pressure below it. Drag pushes against the aircraft’s motion, so airplanes are designed with smooth, streamlined shapes to reduce resistance. During steady flight, lift balances weight, and thrust balances drag. Control surfaces such as ailerons, elevators, and rudders adjust airflow, allowing the pilot to turn, climb, descend, and land safely.

Example (assistant):

<|assistant|>
<s>Airplanes operate by balancing four main forces—lift, weight, thrust, and drag—and use airfoil shapes to create lift while managing pressure differences for efficient flight.</s>

### The Forces of Flight
- Lift is created by the interaction between the airplane's weight and the thrust generated by engines or propellers.
- Weight pulls the aircraft downward due to gravity.
- Thrust moves the aircraft forward through engines or propellers.
- Airflow over and under wings creates lift, which pushes upward.

### Wing Design and Lift
- The airfoil shape of the wing is used to direct air downward and create lift upward.
- The angle of the wing is set at an angle to help the air flow over and under it.

### Drag and Control
- Drag opposes motion, so airplanes are designed with smooth, streamlined shapes to reduce resistance.
- Control surfaces like ailerons, elevators, and rudders adjust airflow to control the aircraft's movement.
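
The structured output above, a one-sentence overview wrapped in <s> tags followed by ### sections of bullet points, can be parsed mechanically. A sketch under that assumption (the helper is illustrative, not an official tool; a backslash variant of the closing tag is also accepted in case a build emits one):

```python
import re

def parse_summary(output: str) -> dict:
    """Split a QN-Summarizer-1 generation into an overview sentence
    and named key-point sections.

    Assumes the overview is wrapped in <s>...</s> (a backslash variant
    of the closing tag is also accepted) and that each section heading
    starts with '### ' and each key point with '- '.
    """
    overview_match = re.search(r"<s>(.*?)<[/\\]s>", output, re.DOTALL)
    overview = overview_match.group(1).strip() if overview_match else ""

    sections = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("### "):
            current = line[4:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return {"overview": overview, "sections": sections}
```

Running this on the example output yields the overview sentence plus a dict keyed by section titles such as "The Forces of Flight".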

Structure

As of now, the model follows a specific structure in every generation. Here is an example (input text generated by GPT5.5; output generated by QN-Summarizer-1):


The first paragraph provides a short overview of the text, making it suitable for notifications and other space-constrained contexts.


Following the brief overview, the model generates key points that provide a more detailed and well-structured summary for the user to reference.
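
Since the opening overview is meant for notifications and other space-constrained contexts, a caller may want to fit it to a character budget. A small sketch (the 120-character default is an arbitrary example limit, not something the model specifies):

```python
def notification_snippet(overview: str, max_chars: int = 120) -> str:
    """Fit the one-sentence overview into a notification-length string,
    appending an ellipsis when it must be cut. The default limit is an
    assumed example value."""
    if len(overview) <= max_chars:
        return overview
    return overview[: max_chars - 1].rstrip() + "…"
```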


Benchmarks


Overview


Parameter Breakdown:

| Model | Parameter Count |
|---|---|
| QyrouNnet Summarizer | 400M |
| gemma4-e2b | 2B |
| ministral-3-3b | 3B |
| granite4.1-8b | 8B |
| qwen3.5-9b | 9B |
| gpt-oss-20b | 20B |
| lfm2 | 24B |

QyrouNnet/summarizer:400m is not intended to compete with larger general-purpose models across all evaluation categories. Instead, it is optimized for fast, faithful, and factually grounded summarizations. While it may underperform on some older benchmarks, it demonstrates competitive results on modern summarization-focused evaluations.


How it performs

Benchmark chart

| Benchmark | What It Measures | Performance |
|---|---|---|
| ROUGE | Word and phrase overlap between the generated summary and the reference summary. | Moderate. The model scores lower than most larger models, suggesting weaker surface-level overlap; this is largely because its newer structured output format diverges from reference phrasing. |
| BERTScore | Semantic similarity using embeddings, focusing on meaning rather than exact wording. | Competitive but slightly lower. The model preserves meaning fairly well despite its smaller size. |
| FactCC | Factual consistency between the source text and the generated summary. | Strong. The model performs near the top, showing good factual grounding. |
| BARTScore | Summary quality scored by a generation-based model; higher is better. | Weak. The model has one of the lowest scores, meaning its phrasing or fluency may be less favored. |
| SummaC | Factual consistency and faithfulness of the summary to the source document. | Excellent. The model scores highest overall, showing strong faithfulness. |
| G-Eval | Overall summary quality judged by an evaluator model, including coherence and usefulness. | Weak. The model scores lowest, suggesting it may sound less polished than larger models. |

Intended Use

The model is specifically designed for fast, local summarization. While larger 20B–24B models naturally achieve stronger results in several benchmarks due to their scale, they also require significantly more computational resources and longer generation times.

In contrast, this model prioritizes speed, efficiency, structured output, consistency, and reliability of summarization. Its main strength is not competing with large, general-purpose models in raw capability, but rather providing a practical, local summarization system that can produce dependable results quickly on consumer hardware.

Speed

Specifications of the machine used to evaluate inference speed:

  • RTX 5070
  • Ryzen 7 9700X
  • 32 GB RAM (DDR5, 6000 MT/s)

GPU

  • Q4 AVG: ~800 TPS
  • Q5 AVG: ~760 TPS
  • Q6 AVG: ~740 TPS
  • Q8 AVG: ~695 TPS

CPU

  • Q4 AVG: ~180 TPS
  • Q5 AVG: ~150 TPS
  • Q6 AVG: ~140 TPS
  • Q8 AVG: ~100 TPS
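
The throughput figures above translate directly into rough decode-time estimates. A sketch (the 150-token summary length is an assumed example; this counts only token generation, not prompt processing, which is why measured CPU latency lands closer to the ~1.5 s figure quoted earlier):

```python
def generation_time_s(tokens: int, tps: float) -> float:
    """Estimated decode time in seconds for `tokens` output tokens
    at a sustained rate of `tps` tokens per second."""
    return tokens / tps

# Example: a ~150-token summary at the Q4 averages reported above.
gpu_q4 = generation_time_s(150, 800)  # roughly 0.19 s on GPU
cpu_q4 = generation_time_s(150, 180)  # roughly 0.83 s on CPU
```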

Attributes

This model is based on SmolLM2-360M:

@misc{allal2025smollm2smolgoesbig,
      title={SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model}, 
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Guilherme Penedo and Lewis Tunstall and Andrés Marafioti and Hynek Kydlíček and Agustín Piqueres Lajarín and Vaibhav Srivastav and Joshua Lochner and Caleb Fahlgren and Xuan-Son Nguyen and Clémentine Fourrier and Ben Burtenshaw and Hugo Larcher and Haojun Zhao and Cyril Zakka and Mathieu Morlon and Colin Raffel and Leandro von Werra and Thomas Wolf},
      year={2025},
      eprint={2502.02737},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02737}, 
}

This model was fine-tuned with Unsloth:

@misc{unsloth2024,
  author = {Daniel Han and the Unsloth contributors},
  title = {Unsloth: Faster and more memory efficient LLM finetuning},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/unslothai/unsloth}},
}

Citation

If you use this model in your project, please cite it as:

@misc{QyrouNnet2026,
    author       = {MightyXdash},
    title        = {QyrouNnet: A Lightweight 400M-Parameter AI Model for High-Speed and Factual Text Summarization},
    howpublished = {\url{https://ollama.com/QyrouNnet/summarizer}},
    month        = may,
    year         = {2026},
    note         = {Version 1. Initial public release of a compact open-source AI model optimized for fast local inference, low hallucination rates, and factual abstractive summarization. Available in Q4, Q5, Q6, and Q8 quantizations}
}