fredrezones55/chandra-ocr-2

Chandra-OCR-2 from Datalab is a state-of-the-art OCR model that outputs structured markdown, HTML, or JSON while preserving precise layout information from images and PDFs across 90+ languages. there is another of this in Ollama but unpatched [no vision?]

Details

Updated 3 months ago

3 months ago

f6ea1f1f31e2 · 5.8GB ·

model

archqwen35

parameters5.17B

quantizationQ8_0

5.8GB

params

{ "presence_penalty": 1.5, "repeat_penalty": 1, "temperature": 1, "top_k": 20, "

84B

template

13B

Seems to have been a requested model though the Ollama issues page, what is the point of having this model if there is no vision capability? We just needed to correct this mistake.

This model has been bought to you by the seemingly unconventional work of: 繋 ollama - v0.0.1

a gguf and mmproj quick patcher to bridge to the Ollama library

before I forget: Happy Easter!

the GGUF model was sourced from: https://huggingface.co/prithivMLmods/chandra-ocr-2-GGUF the original fine-tune was from: https://huggingface.co/datalab-to/chandra-ocr-2

As an experiment: I’ll start this 4B model with Q8 quant paired with F16 vision. [this config seems to mostly fit in a 8GB vram pascal gpu.]

Noting that the base model is Qwen3.5:4B and all it’s limitations with Ollama, but this model has vision fully working otherwise that breaks the point of a OCR vision based model 🤣.

you could just give the patched model no prompt and just the image; and it will begin to OCR unprompted.

the model has likely been trained thoroughly with datalab-to’s OCR model harness with the quick start recommentation of:

pip install chandra-ocr

# With vLLM (recommended, easy install)
chandra_vllm
chandra input.pdf ./output

# With HuggingFace (requires torch)
pip install chandra-ocr[hf]
chandra input.pdf ./output --method hf

# In particular this ollama model, we can use vllm
VLLM_API_BASE=http://localhost:11434/v1 VLLM_MODEL_NAME=fredrezones55/chandra-ocr-2:patch chandra --method vllm input output

the patch model is my attempts to constrain the base model so it will stop thinking and breaking the chandra program. [he could have used an instruction model or something] {or perhaps I have not done enough research}

issues could be a capped text generation where you might need to set the MAX_OUTPUT_TOKENS environment variable.

Multilingual Benchmark (43 Languages)

The table below covers the 43 most common languages, benchmarked across multiple models. For a comprehensive evaluation across 90 languages (Chandra 2 vs Gemini 2.5 Flash only), see the full 90-language benchmark.

Language	Datalab API	Chandra 2	Chandra 1	Gemini 2.5 Flash	GPT-5 Mini
ar	67.6%	68.4%	34.0%	84.4%	55.6%
bn	85.1%	72.8%	45.6%	55.3%	23.3%
ca	88.7%	85.1%	84.2%	88.0%	78.5%
cs	88.2%	85.3%	84.7%	79.1%	78.8%
da	90.1%	91.1%	88.4%	86.0%	87.7%
de	93.8%	94.8%	83.0%	88.3%	93.8%
el	89.9%	85.6%	85.5%	83.5%	82.4%
es	91.8%	89.3%	88.7%	86.8%	97.1%
fa	82.2%	75.1%	69.6%	61.8%	56.4%
fi	85.7%	83.4%	78.4%	86.0%	84.7%
fr	93.3%	93.7%	89.6%	86.1%	91.1%
gu	73.8%	70.8%	44.6%	47.6%	11.5%
he	76.4%	70.4%	38.9%	50.9%	22.3%
hi	80.5%	78.4%	70.2%	82.7%	41.0%
hr	93.4%	90.1%	85.9%	88.2%	81.3%
hu	88.1%	82.1%	82.5%	84.5%	84.8%
id	91.3%	91.6%	86.7%	88.3%	89.7%
it	94.4%	94.1%	89.1%	85.7%	91.6%
ja	87.3%	86.9%	85.4%	80.0%	76.1%
jv	87.5%	73.2%	85.1%	80.4%	69.6%
kn	70.0%	63.2%	20.6%	24.5%	10.1%
ko	89.1%	81.5%	82.3%	84.8%	78.4%
la	78.0%	73.8%	55.9%	70.5%	54.6%
ml	72.4%	64.3%	18.1%	23.8%	11.9%
mr	80.8%	75.0%	57.0%	69.7%	20.9%
nl	90.0%	88.6%	85.3%	87.5%	83.8%
no	89.2%	90.3%	85.5%	87.8%	87.4%
pl	93.8%	91.5%	83.9%	89.7%	90.4%
pt	97.0%	95.2%	84.3%	89.4%	90.8%
ro	86.2%	84.5%	82.1%	76.1%	77.3%
ru	88.8%	85.5%	88.7%	82.8%	72.2%
sa	57.5%	51.1%	33.6%	44.6%	12.5%
sr	95.3%	90.3%	82.3%	89.7%	83.0%
sv	91.9%	92.8%	82.1%	91.1%	92.1%
ta	82.9%	77.7%	50.8%	53.9%	8.1%
te	69.4%	58.6%	19.5%	33.3%	9.9%
th	71.6%	62.6%	47.0%	66.7%	53.8%
tr	88.9%	84.1%	68.1%	84.1%	78.2%
uk	93.1%	91.0%	88.5%	87.9%	81.9%
ur	54.1%	43.2%	28.1%	57.6%	16.9%
vi	85.0%	80.4%	81.6%	89.5%	83.6%
zh	87.8%	88.7%	88.3%	70.0%	70.4%
Average	80.4%	77.8%	69.4%	67.6%	60.5%

Full 90-Language Benchmark

We also have a more comprehensive evaluation covering 90 languages, comparing Chandra 2 against Gemini 2.5 Flash. The average scores are lower than the 43-language table above because this includes many lower-resource languages. Chandra 2 averages 72.7% vs Gemini 2.5 Flash at 60.8%.

See the full 90-language results.

Chandra-OCR-2 from Datalab is a state-of-the-art OCR model that outputs structured markdown, HTML, or JSON while preserving precise layout information from images and PDFs across 90+ languages. there is another of this in Ollama but unpatched [no vision?]

Details

Readme

Multilingual Benchmark (43 Languages)

Full 90-Language Benchmark