Astria is a next-generation multimodal model that fuses a LLaVA-style vision encoder with the strengthened knowledge capabilities of a Ministral-based language model. This combination enables more accurate visual interpretation, more grounded reasoning, and smoother alignment between image and text understanding.
Astria extends the classic LLaVA pipeline with a more advanced language core. Key characteristics:
Astria is optimized for reliability, factuality, and visual detail understanding while remaining fully local.
Pull the GGUF model:
ollama pull Me7war/Astria
Run the model:
ollama run Me7war/Astria
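Beyond the interactive CLI, a locally running Ollama server can also be queried over its REST API. The sketch below is a minimal, non-authoritative example: it assumes the server listens on the default `http://localhost:11434`, that the model was pulled under the name `Me7war/Astria`, and that this build accepts images through the `images` field of `/api/generate`. The helper names `build_payload` and `describe_image` are hypothetical and not part of Astria or Ollama.

```python
import base64
import json
import urllib.request

# Default local Ollama endpoint (assumption: server runs on the standard port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, image_bytes):
    """Build a non-streaming /api/generate request body carrying one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_path, prompt="Describe this image.", model="Me7war/Astria"):
    """Send one image plus a prompt to the local server and return the reply text."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `describe_image("photo.jpg")` (where `photo.jpg` is a placeholder path) would return the model's text answer.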
LLaVA alone reports 90.92% on visual-reasoning benchmarks when evaluated with a text-only judge (GPT-4-style scoring).
Astria applies the same evaluation paradigm using GPT-5 PRO as a judge. Thanks to its Ministral knowledge core and improved vision–language alignment, Astria reaches:
(A full leaderboard and evaluation details will be released when the architecture is published.)
An evaluation dataset with 30 unseen images was constructed. Each image is associated with three types of instructions:
This results in 90 new language–image instructions. Both Astria and GPT-5 answered each instruction, and GPT-5 PRO, acting as the judge, scored every response from 1 to 10. Summed and relative scores are reported.
Results: Astria outperforms GPT-5 across all instruction types, demonstrating the effectiveness of combining a LLaVA-style vision encoder with a Ministral knowledge-enhanced language model.
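The scoring arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used for Astria: the function name `summarize_scores` is hypothetical, and the relative score is assumed to be the candidate's summed score as a percentage of the reference model's summed score, which the text above does not define explicitly.

```python
def summarize_scores(candidate_scores, reference_scores):
    """Return (candidate_sum, reference_sum, relative_percent).

    Each list holds one judge score (1 to 10) per instruction.
    Assumption: relative score = 100 * candidate_sum / reference_sum.
    """
    if len(candidate_scores) != len(reference_scores):
        raise ValueError("both models must be scored on the same instructions")
    for score in list(candidate_scores) + list(reference_scores):
        if not 1 <= score <= 10:
            raise ValueError(f"judge score out of range: {score}")
    candidate_sum = sum(candidate_scores)
    reference_sum = sum(reference_scores)
    relative = 100.0 * candidate_sum / reference_sum
    return candidate_sum, reference_sum, relative
```

Called once per instruction type with the judge's scores for Astria and GPT-5, this would give one summed pair and one relative score per type.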
## ⚖️ License
Astria is released under the Astria License. It is free for personal and non-commercial use. Commercial usage requires explicit permission from the creator.