GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.
22.3K Pulls 3 Tags Updated 1 week ago
The most powerful vision-language model in the Qwen model family to date.
1.4M Pulls 59 Tags Updated 3 months ago
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
400.7K Pulls 16 Tags Updated 2 months ago
24B model that excels at using tools to explore codebases, edit multiple files, and power software engineering agents.
138.3K Pulls 6 Tags Updated 1 month ago
An update to Mistral Small that improves function calling and instruction following while reducing repetition errors.
1.2M Pulls 5 Tags Updated 7 months ago
Meta's latest collection of multimodal models.
1.2M Pulls 11 Tags Updated 7 months ago
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
729.1K Pulls 5 Tags Updated 11 months ago
Building upon Mistral Small 3, Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
583.3K Pulls 5 Tags Updated 10 months ago