📄 Nanonets-OCR-s
A compact (3B‑parameter) Vision‑Language OCR model that turns document images into semantically rich Markdown—recognizing tables, LaTeX, checkboxes, signatures, watermarks, and images.
🚀 Quick Start
```bash
ollama pull yasserrmd/Nanonets-OCR-s:latest
ollama run yasserrmd/Nanonets-OCR-s:latest "Extract the text from this document: ./image.png"
```

(With multimodal models, the Ollama CLI attaches an image when you include its file path in the prompt.)
- The `:latest` tag is ~4.6 GB and supports a 125K context window.
- Quantized to Q8: runs well on ≥6 GB GPUs.
- Lower‑precision variants (Q4_K_M) exist, but performance may degrade ([ollama.com][1], [communeify.com][2]).
✅ Features
- LaTeX equations: auto-converts inline and display math.
- Structured tables: preserves layout, outputs Markdown/HTML.
- Image captioning: embeds `<img>` tags with descriptions.
- Watermark & signature isolation.
- Checkbox handling: outputs Unicode ☑ and ☐ symbols.
- Page number tags.
Outputs clean Markdown or JSON suitable for downstream LLM pipelines ([dev.to][3], [communeify.com][2], [docs.inferless.com][4], [nanonets.com][5], [medium.com][6], [learnopencv.com][7]).
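For pipeline use, the model can also be called programmatically through Ollama's local REST API (`/api/generate`, which accepts base64-encoded images). A minimal sketch, assuming the Ollama server is running on its default port; the prompt text here is illustrative, not a required instruction format:

```python
import base64
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ocr_payload(image_path: str, prompt: str) -> dict:
    """Build a /api/generate request body with a base64-encoded image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "yasserrmd/Nanonets-OCR-s:latest",
        "prompt": prompt,
        "images": [image_b64],  # Ollama expects a list of base64 strings
        "stream": False,        # return one complete response object
    }


def run_ocr(image_path: str) -> str:
    """Send the image to the local Ollama server and return the Markdown text."""
    payload = build_ocr_payload(image_path, "Extract this document as Markdown.")
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting `"stream": False` keeps the example simple; for long documents you may prefer streaming and concatenating the chunks as they arrive.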
💻 Requirements
- GPU: 6 GB+ VRAM (RTX 3060 or better); for best results, 16 GB+ (RTX 3090/4090, A100).
- Ollama version 0.8.0+.
- Completely offline and open-source (Apache‑2.0 license) ([ollama.com][1], [docs.inferless.com][4]).
💡 Usage Tips
- Use Q8 format unless memory is extremely tight — Q4 may hurt accuracy ([ollama.com][1]).
- Preprocess images: crop, straighten, denoise for cleaner output.
- Avoid handwritten text: the model wasn't trained on it and may hallucinate ([nanonets.com][5]).
🧠 Best Use Cases
- Research papers: preserves LaTeX math and tables.
- Legal/finance docs: detects signatures, watermarks, structured info.
- Forms & surveys: handles visual elements and checkboxes reliably.
⚠️ Limitations
- Not optimized for handwriting; expect limited accuracy.
- Some hallucination risk, especially in degraded images ([nanonets.com][5], [learnopencv.com][7]).