411 2 months ago

vision

0fe0bd939328 · 3.2GB
qwen2vl · 3.09B · Q4_0
clip · 669M · F16
You are a helpful assistant.
{{- if .System -}} <|im_start|>system {{ .System }}<|im_end|> {{- end -}} {{- range $i, $_ := .Messa
{ "min_p": 0.01, "repeat_penalty": 1, "stop": [ "<|im_start|>", "<|im_en

Readme

📄 Nanonets-OCR-s

A compact (3B-parameter) vision-language OCR model that turns document images into semantically rich Markdown, recognizing tables, LaTeX, checkboxes, signatures, watermarks, and images.


🚀 Quick Start

ollama pull yasserrmd/Nanonets-OCR-s:latest
ollama run yasserrmd/Nanonets-OCR-s:latest "Process ./image.png"
  • The :latest tag is ~4.6 GB and supports a 125K context.
  • Quantized to Q8: runs well on GPUs with ≥6 GB of VRAM.
  • Lower-precision variants (Q4_K_M) exist, but accuracy may degrade ([ollama.com][1], [communeify.com][2]).
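
For scripted use, the same request can be sent to a locally running Ollama server through its REST API. The sketch below is a minimal example, assuming the server listens on the default `http://localhost:11434` and that the model was pulled under the tag shown in Quick Start. The helper names `build_payload` and `ocr_image` and the default prompt text are illustrative, not part of this model's documentation.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "yasserrmd/Nanonets-OCR-s:latest"


def build_payload(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request body with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # The API takes images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_image(path: str, prompt: str = "Extract the text from this document as Markdown.") -> str:
    """Send one image to the local server and return the generated text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ocr_image("image.png")` would return the model's output for that file as a single string.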

✅ Features

  • LaTeX equations: auto-converts inline and display math.
  • Structured tables: preserves layout, outputs Markdown/HTML.
  • Image captioning: embeds <img> tags with descriptions.
  • Watermark and signature isolation: wraps them in <watermark> and <signature> tags.
  • Checkbox handling: outputs Unicode symbols (☑, ☐).
  • Page numbers: wrapped in <page_number> tags.

Supports clean JSON or Markdown outputs for LLM pipelines ([dev.to][3], [communeify.com][2], [docs.inferless.com][4], [nanonets.com][5], [medium.com][6], [learnopencv.com][7]).
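
A downstream pipeline can pull the tagged elements out of the model's output with a few regular expressions. The sketch below assumes the tag names `<watermark>`, `<signature>`, `<page_number>`, and `<img>`, following the upstream model card; verify them against real output before relying on them. The sample string is invented for illustration, and the function names are hypothetical.

```python
import re

# Assumed tag names; adjust if the model's actual output differs.
TAGS = ("watermark", "signature", "page_number", "img")


def extract_tagged(text: str) -> dict:
    """Return {tag: [contents, ...]} for each assumed tag found in the OCR output."""
    found = {}
    for tag in TAGS:
        pattern = rf"<{tag}>(.*?)</{tag}>"
        found[tag] = [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]
    return found


def count_checkboxes(text: str) -> dict:
    """Count checked and unchecked Unicode checkbox symbols."""
    return {"checked": text.count("☑"), "unchecked": text.count("☐")}


# Invented sample output, for illustration only.
sample = (
    "<watermark>OFFICIAL COPY</watermark>\n"
    "☑ I agree to the terms\n"
    "☐ Subscribe to newsletter\n"
    "<signature>J. Doe</signature>\n"
    "<page_number>3/10</page_number>\n"
)

tags = extract_tagged(sample)
boxes = count_checkboxes(sample)
print(tags["page_number"], boxes)
```

Keeping extraction separate from the OCR call makes it easy to store the raw Markdown alongside the structured fields.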


💻 Requirements

  • GPU: 6 GB+ VRAM (RTX 3060+); 16 GB+ (3090/4090/A100) for best results.
  • Ollama 0.8.0 or later.
  • Runs completely offline; open-source under the Apache-2.0 license ([ollama.com][1], [docs.inferless.com][4]).
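
The VRAM guidance can be sanity-checked with a rough estimate of weight size. The sketch below uses the parameter counts listed on the model page (3.09B for the language model, 669M for the F16 projector) and the GGML block layouts for Q4_0 and Q8_0, which store 32 weights in 18 and 34 bytes respectively (4.5 and 8.5 bits per weight). It assumes that "Q8" here means Q8_0 and that every language-model tensor uses the same quantization, and it ignores the KV cache and runtime overhead, so treat the numbers as approximate.

```python
# Parameter counts as listed on the model page.
LM_PARAMS = 3.09e9         # qwen2vl language model
PROJECTOR_PARAMS = 669e6   # clip projector, stored in F16

# Effective bits per weight, including the per-block F16 scale (blocks of 32 weights).
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q8_0": 8.5, "F16": 16.0}


def weights_gb(lm_quant: str) -> float:
    """Approximate weight size in GB (10^9 bytes) for a given language-model quantization."""
    lm_bytes = LM_PARAMS * BITS_PER_WEIGHT[lm_quant] / 8
    projector_bytes = PROJECTOR_PARAMS * BITS_PER_WEIGHT["F16"] / 8
    return (lm_bytes + projector_bytes) / 1e9


for quant in ("Q4_0", "Q8_0"):
    print(f"{quant}: ~{weights_gb(quant):.1f} GB")
```

The estimates (about 3.1 GB for Q4_0 and 4.6 GB for Q8_0) land close to the 3.2GB shown on the model page and the ~4.6 GB quoted in Quick Start, which leaves modest headroom for context on a 6 GB card.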

💡 Usage Tips

  • Use the Q8 quantization unless memory is extremely tight; Q4 may hurt accuracy ([ollama.com][1]).
  • Preprocess images: crop, straighten, and denoise them for cleaner output.
  • Avoid handwritten text: the model wasn't trained for it and may hallucinate ([nanonets.com][5]).
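
As one illustration of the denoising step, a 3x3 median filter removes isolated specks while keeping edges fairly sharp. The sketch below is pure Python on a grayscale image held as rows of 0-255 integers. It is meant to show the idea only: a real pipeline would more likely use an imaging library, and cropping and deskewing are not shown. The function name `median_denoise` is hypothetical.

```python
from statistics import median


def median_denoise(pixels: list[list[int]]) -> list[list[int]]:
    """Apply a 3x3 median filter to a grayscale image given as rows of 0-255 ints."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipping at the image border.
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


# A white 5x5 patch with one black speck in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
clean = median_denoise(noisy)
```

In this toy input the single black pixel is outvoted by its white neighbours, so `clean` comes back uniformly white.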

🧠 Best Use Cases

  • Research papers: preserves LaTeX math and tables.
  • Legal/finance docs: detects signatures, watermarks, structured info.
  • Forms & surveys: handles visual elements and checkboxes reliably.

⚠️ Limitations

  • Not optimized for handwriting: expect limited accuracy.
  • Some hallucination risk, especially in degraded images ([nanonets.com][5], [learnopencv.com][7]).