
Best OCR model. LightOnOCR-2-1B is our flagship OCR model, refined with reinforcement learning with verifiable rewards (RLVR) for maximum accuracy. We recommend this variant for most OCR tasks.

ollama run maternion/LightOnOCR-2:1b

Details

Digest ee7d83c4eb67 · 1.5GB

  • Model: qwen3 · 596M parameters · Q8_0
  • Projector: clip · 410M parameters · BF16
  • Parameters: { "temperature": 0.2, "top_p": 0.9 }
  • Template: {{ .Prompt }}

Readme

About LightOnOCR-2

LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.

Highlights

  • Speed: 3.3× faster than Chandra OCR, 1.7× faster than OlmOCR, 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
  • 💸 Efficiency: Processes 5.71 pages/s on a single H100 (~493k pages/day) for <$0.01 per 1,000 pages
  • 🧠 End-to-End: Fully differentiable, no external OCR pipeline
  • 🧾 Versatile: Handles tables, receipts, forms, multi-column layouts, and math notation
  • 📍 Image detection: Predicts bounding boxes for embedded images (bbox variants only)
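The throughput figures above are related by simple arithmetic. The sketch below converts the reported 5.71 pages/s into a daily volume and a per-batch wall-clock time, assuming that rate is sustained around the clock; real throughput will vary with page content and serving setup.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_day(pages_per_second: float) -> float:
    """Convert a sustained pages/s rate into pages per 24-hour day."""
    return pages_per_second * SECONDS_PER_DAY


def seconds_per_batch(num_pages: int, pages_per_second: float) -> float:
    """Wall-clock seconds needed to process num_pages at a sustained rate."""
    return num_pages / pages_per_second


rate = 5.71  # pages/s on a single H100, as reported above
print(f"{pages_per_day(rate):,.0f} pages/day")  # 493,344 pages/day
print(f"{seconds_per_batch(1_000, rate):.1f} s per 1,000 pages")  # 175.1 s
```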



Model Variants

  • LightOnOCR-2-1B: Best OCR model
  • LightOnOCR-2-1B-base: Base model, ideal for fine-tuning
  • LightOnOCR-2-1B-bbox: Best model with image bounding boxes
  • LightOnOCR-2-1B-bbox-base: Base bbox model, ideal for fine-tuning
  • LightOnOCR-2-1B-ocr-soup: Merged variant for extra robustness
  • LightOnOCR-2-1B-bbox-soup: Merged variant combining OCR and bbox

Benchmarks


Usage with Ollama

ollama run maternion/LightOnOCR-2:1b
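Besides the interactive CLI, a running Ollama server can be called over its local REST API. The sketch below posts one page image to `/api/generate` using only the Python standard library. It assumes the default endpoint `http://localhost:11434`; the prompt text in `OCR_PROMPT` is a hypothetical placeholder, included because the model's template is just `{{ .Prompt }}` and Ollama may treat an empty prompt as a load-only request.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "maternion/LightOnOCR-2:1b"
OCR_PROMPT = "Convert this document page to text."  # hypothetical wording


def build_ocr_request(image_bytes: bytes, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request carrying one page image."""
    payload = {
        "model": model,
        "prompt": OCR_PROMPT,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return the full answer in a single JSON object
        "options": {"temperature": 0.2, "top_p": 0.9},  # mirrors the model's defaults
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ocr_page(path: str) -> str:
    """Send one rendered page to a running Ollama server and return the text."""
    with open(path, "rb") as f:
        request = build_ocr_request(f.read())
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(ocr_page("page-001.png"))` (hypothetical file name) prints the extracted text for one page; send one request per page.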


Rendering and Preprocessing Tips

  • Render each PDF page to PNG or JPEG so that its longest side is 1540 px
  • Maintain the aspect ratio to preserve text geometry
  • Use one image per page; batched inference is supported when serving with vLLM
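The resizing rule above reduces to a single scale factor. The sketch below computes the output size for a 1540 px longest side and the matching render scale for a PDF page measured in points. It assumes a renderer where scale 1.0 means 72 dpi (one point per pixel), as with pypdfium2's `render(scale=...)`. The helper names are hypothetical.

```python
def target_size(width: float, height: float, longest: int = 1540) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals `longest`, keeping aspect ratio."""
    scale = longest / max(width, height)
    return round(width * scale), round(height * scale)


def pdf_render_scale(width_pt: float, height_pt: float, longest: int = 1540) -> float:
    """Render scale relative to 72 dpi, where 1 PDF point maps to 1 pixel at scale 1.0."""
    return longest / max(width_pt, height_pt)


# US Letter is 612 x 792 points.
print(target_size(612, 792))  # (1190, 1540)
print(round(pdf_render_scale(612, 792), 3))  # 1.944, i.e. about 140 dpi
```

This sketch always scales to 1540 px, enlarging small inputs as well; skipping the upscale for images already below the target is an alternative design choice.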

Fine-tuning

LightOnOCR-2 is fully differentiable and supports:

  • LoRA fine-tuning
  • Domain adaptation (receipts, scientific articles, forms, etc.)
  • Multilingual fine-tuning with task-specific corpora

For fine-tuning, we recommend starting with the LightOnOCR-2-1B-base variant.
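LoRA freezes each adapted weight W (d_out × d_in) and trains a low-rank update BA instead, where A is r × d_in and B is d_out × r. That adds r·(d_in + d_out) trainable parameters per adapted matrix. The sketch below compares that count with the full matrix. The 1024 × 1024 shape and rank 16 are hypothetical examples, not LightOnOCR-2's actual layer sizes.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters of the frozen dense weight W (d_out x d_in)."""
    return d_in * d_out


# Hypothetical 1024 x 1024 projection with a rank-16 adapter.
d_in = d_out = 1024
r = 16
trainable = lora_params(d_in, d_out, r)
print(trainable)  # 32768
print(f"{trainable / full_params(d_in, d_out):.2%} of the dense weight")
```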


License

Apache License 2.0


Citation

@misc{lightonocr2_2026,
  title        = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author       = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year         = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}