ollama run maternion/LightOnOCR-2:1b
LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.
📄 Paper | 📝 Blog Post | 🚀 Demo | 📊 Dataset | 📊 BBox Dataset | 📓 Finetuning Notebook
| Variant | Description |
|---|---|
| LightOnOCR-2-1B | Best OCR model |
| LightOnOCR-2-1B-base | Base model, ideal for fine-tuning |
| LightOnOCR-2-1B-bbox | Best model with image bounding boxes |
| LightOnOCR-2-1B-bbox-base | Base bbox model, ideal for fine-tuning |
| LightOnOCR-2-1B-ocr-soup | Merged variant for extra robustness |
| LightOnOCR-2-1B-bbox-soup | Merged variant: OCR + bbox combined |
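Beyond the interactive `ollama run` command, pulled models can be called programmatically through Ollama's REST API, whose `/api/generate` endpoint accepts base64-encoded images in an `images` array. The sketch below only builds the request payload (sending it requires a running Ollama server); the prompt text is an illustrative assumption, not the model's canonical prompt.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes,
                      model: str = "maternion/LightOnOCR-2:1b") -> str:
    """Build a JSON payload for Ollama's /api/generate endpoint.

    The generate API takes base64-encoded images in an `images` list
    alongside the prompt; the model tag matches the pull command above.
    The prompt wording here is a placeholder, not an official prompt.
    """
    payload = {
        "model": model,
        "prompt": "Convert this document page to clean, naturally ordered text.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one complete response instead of a token stream
    }
    return json.dumps(payload)


# Example: a stand-in byte string in place of real PNG data.
request_body = build_ocr_request(b"\x89PNG...")
```

The resulting string can be POSTed to `http://localhost:11434/api/generate` with any HTTP client once the model has been pulled.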
LightOnOCR-2 is fully differentiable and supports fine-tuning; we recommend starting from the LightOnOCR-2-1B-base variant.
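For fine-tuning, training pairs of page images and target transcriptions are typically staged as JSONL. The record layout below is a hypothetical sketch (the field names `image` and `text` are assumptions); the linked finetuning notebook defines the exact format the training code expects.

```python
import json


def make_record(image_path: str, transcription: str) -> str:
    """Serialize one page/transcription pair as a single JSONL line.

    Field names are illustrative; consult the finetuning notebook for
    the schema actually consumed by the training code.
    """
    return json.dumps({"image": image_path, "text": transcription})


# Two example records joined into a JSONL string.
lines = [
    make_record("pages/0001.png", "Invoice #42\nTotal: 19.99 EUR"),
    make_record("pages/0002.png", "Page two text."),
]
jsonl = "\n".join(lines)
```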
Apache License 2.0
@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}