
Astria is a multimodal model built by combining a LLaVA vision encoder with the new Ministral model, producing a unified system capable of detailed visual understanding and strong general-purpose reasoning.


Astria

Astria is a next-generation multimodal model that fuses a LLaVA-style vision encoder with the strengthened knowledge capabilities of a Ministral-based language model. This combination enables more accurate visual interpretation, more grounded reasoning, and smoother alignment between image and text understanding.


✨ Model Summary

Astria extends the classic LLaVA pipeline with a more advanced language core. Key characteristics:

  • Vision Encoder: LLaVA-compatible visual features with improved alignment.
  • Language Backbone: Ministral-based reasoning, memory, and world-knowledge improvements.
  • Training Style: End-to-end multimodal alignment inspired by LLaVA, with enhanced knowledge supervision.
  • Output: Smooth, context-aware image–text reasoning.

Astria is optimized for reliability, factuality, and visual detail understanding while remaining fully local.
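As a rough sketch of the fusion described above: in a LLaVA-style pipeline, patch features from the vision encoder pass through a learned projector into the language model's embedding space and are prepended to the text token embeddings. All dimensions, the single linear projector, and the random tensors below are illustrative assumptions, not the published Astria architecture:

```python
import numpy as np

# Hypothetical sizes -- Astria's real dimensions are not yet published.
N_PATCHES = 576   # vision tokens from a LLaVA-style encoder (24x24 grid)
D_VISION = 1024   # vision encoder feature width
D_MODEL = 4096    # language-model hidden width

rng = np.random.default_rng(0)

def project_vision_features(patch_feats, W, b):
    """Map vision-encoder patch features into the LM embedding space."""
    return patch_feats @ W + b

# Stand-ins for real encoder outputs and prompt token embeddings.
patch_feats = rng.normal(size=(N_PATCHES, D_VISION))
W = rng.normal(size=(D_VISION, D_MODEL)) * 0.02
b = np.zeros(D_MODEL)
text_embeds = rng.normal(size=(12, D_MODEL))  # 12 prompt tokens

vision_tokens = project_vision_features(patch_feats, W, b)
# The fused sequence (vision tokens first, then text) is what the LM consumes.
fused = np.concatenate([vision_tokens, text_embeds], axis=0)
print(fused.shape)  # (588, 4096)
```

The key design point is that only the projector needs to learn the mapping between modalities; the vision encoder and language model can start from their pretrained weights.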


📦 Usage (Ollama)

Pull the GGUF model:

ollama pull Me7war/Astria

Run the model:

ollama run Me7war/Astria
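For programmatic use, Ollama also exposes a local REST API (`POST /api/generate` on port 11434) that accepts base64-encoded images in an `images` field. The helper below only builds the request body; the function name and example prompt are illustrative, and actually sending the request requires a running Ollama server:

```python
import base64

def build_generate_request(model, prompt, image_bytes):
    """Build the JSON body for Ollama's /api/generate endpoint.

    Images are passed as base64-encoded strings in the "images" list.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request one JSON response instead of a stream
    }

# Typical usage (image path is a placeholder):
#   with open("photo.jpg", "rb") as f:
#       payload = build_generate_request("Me7war/Astria",
#                                        "Describe this image.", f.read())
#   then POST the payload as JSON to http://localhost:11434/api/generate
payload = build_generate_request("Me7war/Astria", "Describe this image.",
                                 b"\xff\xd8fake-jpeg")
print(sorted(payload))  # ['images', 'model', 'prompt', 'stream']
```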

📊 Visual Reasoning Performance


LLaVA alone reports 90.92% on visual-reasoning benchmarks when evaluated with a text-only judge (GPT-4-style scoring).

Astria applies the same evaluation paradigm using GPT-5 PRO as a judge. Thanks to its Ministral knowledge core and improved vision–language alignment, Astria reaches:

🔥 New SOTA: 92.53%

(A full leaderboard and evaluation details will be released when the architecture is published.)


📝 Astria vs GPT-5 on a Custom Evaluation Set


An evaluation dataset with 30 unseen images was constructed. Each image is associated with three types of instructions:

  1. Conversation
  2. Detailed description
  3. Complex reasoning

This results in 90 new language–image instructions, tested with Astria and GPT-5, and scored using GPT-5 PRO as the judge. Scores are from 1 to 10, with summed and relative scores reported.
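The summed and relative scoring can be sketched with a small helper: each answer receives a 1–10 judge score, per-model totals are summed, and the candidate's total is reported as a percentage of the reference's. The function and the toy numbers below are illustrative, not the actual evaluation results:

```python
def summarize_scores(candidate_scores, reference_scores):
    """Sum per-instruction judge scores (1-10) and report the candidate's
    total relative to the reference's total, as a percentage."""
    cand_total = sum(candidate_scores)
    ref_total = sum(reference_scores)
    return {
        "candidate_total": cand_total,
        "reference_total": ref_total,
        "relative_pct": round(100.0 * cand_total / ref_total, 2),
    }

# Toy scores for 5 of the 90 instructions (not real results):
result = summarize_scores([9, 8, 9, 7, 10], [8, 8, 9, 8, 9])
print(result["relative_pct"])  # 102.38
```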

Results: Astria outperforms GPT-5 across all instruction types, demonstrating the effectiveness of combining a LLaVA-style vision encoder with a Ministral knowledge-enhanced language model.


📄 Notes

  • The full model architecture and training scripts will be published once finalized.
  • This release focuses on providing a clean GGUF build for local experimentation.
  • Additional formats and inference utilities may be added later.

⚖️ License

Astria is released under the Astria License. It is free for personal and non-commercial use. Commercial usage requires explicit permission from the creator.