
Astria is a multimodal model built by combining a LLaVA vision encoder with the new Ministral model, producing a unified system capable of detailed visual understanding and strong general-purpose reasoning.


Astria

Astria is a next-generation multimodal model that fuses a LLaVA-style vision encoder with the strengthened knowledge capabilities of a Ministral-based language model. This combination enables more accurate visual interpretation, more grounded reasoning, and smoother alignment between image and text understanding.


✨ Model Summary

Astria extends the classic LLaVA pipeline with a more advanced language core. Key characteristics:

  • Vision Encoder: LLaVA-compatible visual features with improved alignment.
  • Language Backbone: Ministral-based reasoning, memory, and world-knowledge improvements.
  • Training Style: End-to-end multimodal alignment inspired by LLaVA, with enhanced knowledge supervision.
  • Output: Smooth, context-aware image–text reasoning.

Astria is optimized for reliability, factuality, and visual detail understanding while remaining fully local.
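As a rough sketch of the fusion described above: in a LLaVA-style pipeline, patch features from the vision encoder pass through a learned projector into the language model's embedding space and are prepended to the text token embeddings. All dimensions, the single linear projector, and the random tensors below are illustrative assumptions, not the published Astria architecture:

```python
import numpy as np

# Hypothetical sizes -- Astria's real dimensions are not yet published.
N_PATCHES = 576   # vision tokens from a LLaVA-style encoder (24x24 grid)
D_VISION = 1024   # vision encoder feature width
D_MODEL = 4096    # language-model hidden width

rng = np.random.default_rng(0)

def project_vision_features(patch_feats, W, b):
    """Map vision-encoder patch features into the LM embedding space."""
    return patch_feats @ W + b

# Stand-ins for real encoder outputs and prompt token embeddings.
patch_feats = rng.normal(size=(N_PATCHES, D_VISION))
W = rng.normal(size=(D_VISION, D_MODEL)) * 0.02
b = np.zeros(D_MODEL)
text_embeds = rng.normal(size=(12, D_MODEL))  # 12 prompt tokens

vision_tokens = project_vision_features(patch_feats, W, b)
# The fused sequence (vision tokens first, then text) is what the LM consumes.
fused = np.concatenate([vision_tokens, text_embeds], axis=0)
print(fused.shape)  # (588, 4096)
```

The key design point is that only the projector needs to learn the mapping between modalities; the vision encoder and language model can start from their pretrained weights.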


📦 Usage (Ollama)

Pull the GGUF model:

ollama pull Me7war/Astria

Run the model:

ollama run Me7war/Astria
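For programmatic use, Ollama also exposes a local REST API (`POST /api/generate` on port 11434) that accepts base64-encoded images in an `images` field. The helper below only builds the request body; the function name and example prompt are illustrative, and actually sending the request requires a running Ollama server:

```python
import base64

def build_generate_request(model, prompt, image_bytes):
    """Build the JSON body for Ollama's /api/generate endpoint.

    Images are passed as base64-encoded strings in the "images" list.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request one JSON response instead of a stream
    }

# Typical usage (image path is a placeholder):
#   with open("photo.jpg", "rb") as f:
#       payload = build_generate_request("Me7war/Astria",
#                                        "Describe this image.", f.read())
#   then POST the payload as JSON to http://localhost:11434/api/generate
payload = build_generate_request("Me7war/Astria", "Describe this image.",
                                 b"\xff\xd8fake-jpeg")
print(sorted(payload))  # ['images', 'model', 'prompt', 'stream']
```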

📊 Visual Reasoning Performance


LLaVA alone reports 90.92% on visual-reasoning benchmarks when evaluated with a text-only judge (GPT-4-style scoring).

Astria applies the same evaluation paradigm using GPT-5 PRO as a judge. Thanks to its Ministral knowledge core and improved vision–language alignment, Astria reaches:

🔥 New SOTA: 92.53%

(A full leaderboard and evaluation details will be released when the architecture is published.)


📝 Astria vs GPT-5 on a Custom Evaluation Set


An evaluation dataset with 30 unseen images was constructed. Each image is associated with three types of instructions:

  1. Conversation
  2. Detailed description
  3. Complex reasoning

This results in 90 new language–image instructions, tested with Astria and GPT-5, and scored using GPT-5 PRO as the judge. Scores are from 1 to 10, with summed and relative scores reported.
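The summed and relative scoring can be sketched with a small helper: each answer receives a 1–10 judge score, per-model totals are summed, and the candidate's total is reported as a percentage of the reference's. The function and the toy numbers below are illustrative, not the actual evaluation results:

```python
def summarize_scores(candidate_scores, reference_scores):
    """Sum per-instruction judge scores (1-10) and report the candidate's
    total relative to the reference's total, as a percentage."""
    cand_total = sum(candidate_scores)
    ref_total = sum(reference_scores)
    return {
        "candidate_total": cand_total,
        "reference_total": ref_total,
        "relative_pct": round(100.0 * cand_total / ref_total, 2),
    }

# Toy scores for 5 of the 90 instructions (not real results):
result = summarize_scores([9, 8, 9, 7, 10], [8, 8, 9, 8, 9])
print(result["relative_pct"])  # 102.38
```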

Results: Astria outperforms GPT-5 across all instruction types, demonstrating the effectiveness of combining a LLaVA-style vision encoder with a Ministral knowledge-enhanced language model.


📄 Notes

  • The full model architecture and training scripts will be published once finalized.
  • This release focuses on providing a clean GGUF build for local experimentation.
  • Additional formats and inference utilities may be added later.

⚖️ License

Astria is released under the Astria License. It is free for personal and non-commercial use. Commercial usage requires explicit permission from the creator.