State-of-the-art OCR (Optical Character Recognition) vision language model based on [allenai/olmOCR-2-7B-1025](https://huggingface.co/allenai/olmOCR-2-7B-1025).

olmOCR-2 7B Q8

Model Description

This model excels at extracting text from:
- Documents and PDFs
- Handwritten notes
- Tables and spreadsheets
- Charts and graphs
- Mathematical expressions
- Screenshots and images

- Base Model: Qwen2.5-VL-7B-Instruct (fine-tuned for OCR)
- Quantization: Q8_0 (8-bit, high quality)
- Size: 8.85 GB
- Performance: 82.4 points on olmOCR-Bench

Installation

ollama pull richardyoung/olmocr2:7b-q8

Usage

Basic OCR

ollama run richardyoung/olmocr2:7b-q8 "Extract all text from this image." ./image.png

Extract text from a document

ollama run richardyoung/olmocr2:7b-q8 "Extract all text from this document, preserving formatting and structure." ./document.jpg

Transcribe handwriting

ollama run richardyoung/olmocr2:7b-q8 "Transcribe the handwritten text in this image." ./handwriting.jpg

Extract table data

ollama run richardyoung/olmocr2:7b-q8 "Extract the table data and format it as markdown." ./table.png
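
If the model returns the table as a markdown pipe table, as the prompt above asks, the reply can be turned into Python rows with a small parser. The following is a minimal sketch, assuming a well-formed table with a header row followed by a separator row; `parse_markdown_table` is a hypothetical helper, not part of Ollama, and real model output may need more tolerant handling (for example, escaped pipes inside cells are not supported here).

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple markdown pipe table into a list of row dicts."""
    # Keep only lines that look like table rows (they start with a pipe).
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []  # no header + separator, so nothing to parse

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        rows.append(dict(zip(header, split_row(line))))
    return rows


# Hypothetical model reply, used only to illustrate the parser.
sample = "| Item | Qty |\n|------|-----|\n| Apple | 3 |\n| Pear | 5 |"
rows = parse_markdown_table(sample)
```

Each entry of `rows` maps a column header to the cell text, which makes it straightforward to hand the result to `csv.DictWriter` or a similar tool.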

Extract mathematical equations

ollama run richardyoung/olmocr2:7b-q8 "Extract all mathematical equations from this image in LaTeX format." ./math.png

Python API Example

import ollama

response = ollama.chat(
    model='richardyoung/olmocr2:7b-q8',
    messages=[{
        'role': 'user',
        'content': 'Extract all text from this image.',
        'images': ['document.png']
    }]
)

print(response['message']['content'])
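
The same call can be repeated over a folder of scans. The sketch below is one way to do that under a few assumptions: `find_images`, `ocr_folder`, and the `IMAGE_SUFFIXES` set are hypothetical names introduced for illustration, and the list of file extensions is a guess at common inputs rather than a statement of what the model supports.

```python
from pathlib import Path

# Assumed set of input formats; adjust to the files you actually have.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def find_images(folder: str) -> list[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder: str, model: str = "richardyoung/olmocr2:7b-q8") -> dict[str, str]:
    """Run the basic OCR prompt on each image; map file name -> extracted text."""
    import ollama  # imported lazily so find_images works without the client installed

    results = {}
    for path in find_images(folder):
        response = ollama.chat(
            model=model,
            messages=[{
                "role": "user",
                "content": "Extract all text from this image.",
                "images": [str(path)],
            }],
        )
        results[path.name] = response["message"]["content"]
    return results
```

Calling `ocr_folder("scans")`, where `scans` is a placeholder directory name, returns a dictionary keyed by file name. Images are processed one at a time, which keeps memory use predictable at the cost of throughput.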

System Requirements

Minimum:
- 10 GB RAM
- 9 GB free disk space

Recommended:
- 16 GB RAM
- Apple Silicon (M1/M2/M3/M4) or NVIDIA GPU
- Metal or CUDA support for GPU acceleration
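
Before pulling the model, the disk-space minimum can be checked from Python. This is a rough sketch using only the standard library; `has_enough_disk` is a hypothetical helper, and reading "9 GB" as binary gigabytes (GiB) is an assumption, not something the requirements state.

```python
import shutil

# 9 GB minimum from the requirements, read here as GiB (an assumption).
REQUIRED_FREE_BYTES = 9 * 2**30


def has_enough_disk(path: str = ".", required: int = REQUIRED_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required
```

Pass the directory where Ollama stores models on your system as `path`, since that is the filesystem the download will land on. RAM is not checked here because the standard library has no portable call for it.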

Tips for Best Results

  1. Image Quality: Use clear, high-contrast images for best OCR accuracy
  2. Prompts: Be specific about what you want extracted (e.g., “preserve formatting”, “convert to markdown”)
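  3. Context Length: For long documents, you may need to raise the context window above this model's default of 4096 tokens, for example by setting num_ctx to 8192 (/set parameter num_ctx 8192 inside an interactive ollama run session, or PARAMETER num_ctx 8192 in a Modelfile)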
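
From Python, the tips on prompts and context length can be combined through the `options` argument of `ollama.chat`. The sketch below only builds the request; `build_ocr_request` is a hypothetical helper, and 8192 is the example value from the tip above rather than a tuned recommendation.

```python
DEFAULT_MODEL = "richardyoung/olmocr2:7b-q8"


def build_ocr_request(
    image_path: str,
    prompt: str = "Extract all text from this image, preserving formatting.",
    num_ctx: int = 8192,
    model: str = DEFAULT_MODEL,
) -> dict:
    """Build keyword arguments for ollama.chat with a larger context window."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,        # be specific about the output you want
            "images": [image_path],
        }],
        "options": {"num_ctx": num_ctx},  # overrides the model's default context length
    }
```

With the `ollama` package installed and the server running, the request is sent with `ollama.chat(**build_ocr_request("document.png"))`. A larger `num_ctx` increases memory use, so raise it only as far as the documents require.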

Model Variants

If you need different formats or sizes:

License

Apache 2.0 (same as base model)

Citation

@article{olmocr2,
  title={olmOCR-2: Advancing OCR with Vision Language Models},
  author={Allen Institute for AI},
  year={2025}
}

Links