Astria is a next-generation, fully local multimodal foundation model built on top of a Ministral-based language backbone and a custom vision encoder. This architecture significantly improves visual grounding, multilingual reasoning, and agentic reliability while remaining efficient enough for edge deployment.
Me7war’s latest Astria update pushes the limits of small-scale multimodal AI, combining efficiency, reasoning, and vision in a fully local, compact model that redefines what edge-deployable multimodal AI can achieve.
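The card does not detail how the custom vision encoder feeds the Ministral-based backbone. A common pattern for models in this class is a LLaVA-style projector that maps image features into the language model's embedding space; the sketch below illustrates that pattern only as an assumption (dimensions and layer choices are placeholders), not as Astria's documented internals.

```python
import torch
import torch.nn as nn

class VisionLanguageBridge(nn.Module):
    """Illustrative LLaVA-style projector (an assumption, not Astria's confirmed design):
    patch features from a vision encoder are projected into the language model's
    embedding space and prepended to the text token embeddings."""

    def __init__(self, vision_dim: int = 1024, lm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, image_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_patches, vision_dim) from the vision encoder
        # text_embeds: (batch, seq_len, lm_dim) from the language model's embedding layer
        visual_tokens = self.proj(image_feats)
        # The language backbone would then attend over the combined sequence.
        return torch.cat([visual_tokens, text_embeds], dim=1)
```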
Astria is assessed with a custom evaluation that uses GPT-5 PRO as the judge (LLaVA baseline: 90.92%). On 30 unseen images with three instruction types per image (conversation, description, complex reasoning), Astria outperforms GPT-5 in all categories.
Pull the model:
ollama pull Me7war/Astria
Run locally:
ollama run Me7war/Astria
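For programmatic use, the official `ollama` Python package can send an image alongside a prompt. This is a minimal sketch: the image path `photo.jpg` is a placeholder, and it assumes the model is available locally under the name `Me7war/Astria` as pulled above.

```python
import ollama

# Ask the locally pulled Astria model to describe a local image.
response = ollama.chat(
    model="Me7war/Astria",
    messages=[{
        "role": "user",
        "content": "Describe this image in detail.",
        "images": ["photo.jpg"],  # placeholder path to a local image file
    }],
)
print(response["message"]["content"])
```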
A custom evaluation set of 30 unseen images was constructed. Each image includes three instruction types: conversation, description, and complex reasoning.
This yields 90 unique image–language tasks. Each task was scored by GPT-5 PRO on a 1–10 scale.
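The card does not state how the per-task 1–10 judge scores are turned into the percentage quoted above. One plausible reading, following the LLaVA evaluation convention, is a relative score (Astria's judge score divided by a reference answer's score, averaged per category); the snippet below sketches that aggregation under that assumption, with made-up placeholder rows.

```python
from statistics import mean

# Hypothetical judge outputs: for each of the 90 tasks, the judge assigns a
# 1-10 score to Astria's answer and to a reference answer. The structure and
# field names here are assumptions, not the card's actual format.
results = [
    {"category": "conversation", "astria": 9, "reference": 8},
    {"category": "description", "astria": 8, "reference": 9},
    {"category": "complex_reasoning", "astria": 9, "reference": 7},
    # ... 90 tasks in total (30 images x 3 instruction types)
]

def relative_scores(rows):
    """LLaVA-style relative score: model score / reference score, averaged per category."""
    by_cat = {}
    for r in rows:
        by_cat.setdefault(r["category"], []).append(r["astria"] / r["reference"])
    return {cat: 100 * mean(ratios) for cat, ratios in by_cat.items()}

print(relative_scores(results))
```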
Astria outperforms GPT-5 across all instruction categories, validating the combination of the custom vision encoder with the knowledge-enhanced, Ministral-based language backbone.
Astria is released under the Astria License for personal and non-commercial use. Commercial use requires explicit permission from the creator.