21 Downloads Updated 2 days ago
ollama run DedeProGames/AstralOCR-8b
AstralOCR is the latest and most capable OCR model in the Astral family. It is built on SigLip-400M and Qwen2-7B, totaling 8B parameters. it brings major quality gains and adds new capabilities for multi-image and video understanding.
Notable features include:
Leading Performance: AstralOCR reaches an average score of 65.2 on the latest OpenCompass (an evaluation spanning 8 popular benchmarks). With only 8B parameters, it can outperform widely used proprietary models such as GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet on single-image understanding.
Multi-Image Understanding & In-Context Learning: AstralOCR supports conversation and reasoning over multiple images. It reports state-of-the-art results on multi-image benchmarks like Mantis-Eval, BLINK, Mathverse mv, and Sciverse mv, and shows promising in-context learning behavior.
Strong OCR Capability: AstralOCR can handle images with any aspect ratio and up to 1.8 million pixels (e.g., 1344×1344). It reports state-of-the-art performance on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V, and Gemini 1.5 Pro. With RLAIF-V and VisCPM techniques, it aims for more trustworthy behavior (notably lower hallucination rates than GPT-4o/GPT-4V on Object HalBench) and supports multiple languages, including English, Chinese, German, French, Italian, Korean, and more.
Superior Efficiency: AstralOCR emphasizes high token density (more pixels per visual token). It produces only ~640 tokens for a 1.8M-pixel image—around 75% fewer than many alternatives—improving inference speed, first-token latency, memory usage, and power consumption.