170 1 week ago

Chandra-OCR-2 from Datalab is a state-of-the-art OCR model that outputs structured Markdown, HTML, or JSON while preserving precise layout information from images and PDFs across 90+ languages. Another upload of this model exists on Ollama, but it is unpatched [no vision?].

vision tools thinking
ollama run fredrezones55/chandra-ocr-2:patch

Details


8714a4782ea6 · 5.8GB

qwen35 · 5.17B · Q8_0
{ "presence_penalty": 1.5, "repeat_penalty": 1, "temperature": 0, "top_k": 20, "
{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}<|im_start|>user /nothink {{ .Pr

Readme

This model seems to have been requested through the Ollama issues page. But what is the point of having an OCR model with no vision capability? This upload corrects that mistake.

Before I forget: Happy Easter!

The GGUF model was sourced from: https://huggingface.co/prithivMLmods/chandra-ocr-2-GGUF

The original fine-tune is from: https://huggingface.co/datalab-to/chandra-ocr-2

As an experiment, I am starting with this 4B model at Q8 quantization paired with F16 vision weights. [This configuration seems to mostly fit in an 8 GB VRAM Pascal GPU.]

Note that the base model is Qwen3.5:4B, with all its limitations in Ollama. Vision is fully working in this build, though; otherwise the whole point of a vision-based OCR model would be lost 🤣.

You can give the patched model just an image and no prompt, and it will begin to OCR unprompted.
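A minimal sketch of such a prompt-free request, assuming Ollama's native `/api/chat` endpoint on the default local port 11434. The helper names `build_ocr_payload` and `run_ocr` and the file name `page.png` are hypothetical, and whether an empty message is accepted depends on the behavior described above rather than on anything verified here.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "fredrezones55/chandra-ocr-2:patch"


def build_ocr_payload(image_bytes: bytes, model: str = MODEL) -> dict:
    """Build a chat request that carries only an image and an empty prompt."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            # Empty content: the image alone is meant to trigger OCR.
            {"role": "user", "content": "", "images": [encoded]}
        ],
        "stream": False,  # ask for a single JSON response
    }


def run_ocr(image_path: str) -> str:
    """Send one image to the local Ollama server and return the model's text."""
    with open(image_path, "rb") as f:
        payload = build_ocr_payload(f.read())
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return body["message"]["content"]
```

Calling `run_ocr("page.png")` with a real image should return the recognized text if the server and model are available. Keeping the payload builder separate makes the request shape easy to inspect without a running server.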

The model has likely been trained thoroughly with datalab-to's OCR harness, whose quick-start recommendation is:

pip install chandra-ocr

# With vLLM (recommended, easy install)
chandra_vllm
chandra input.pdf ./output

# With HuggingFace (requires torch)
pip install chandra-ocr[hf]
chandra input.pdf ./output --method hf
# For this Ollama model in particular, point the vllm method at Ollama's OpenAI-compatible endpoint
VLLM_API_BASE=http://localhost:11434/v1 VLLM_MODEL_NAME=fredrezones55/chandra-ocr-2:patch chandra --method vllm input output
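Before pointing chandra at Ollama, it may help to confirm that the endpoint in `VLLM_API_BASE` actually lists the model. The sketch below assumes Ollama's OpenAI-compatible `/v1/models` route and assumes the model id is reported exactly as the tag used above. The helper names are hypothetical.

```python
import json
import urllib.request

# Same values as the VLLM_API_BASE and VLLM_MODEL_NAME settings above.
API_BASE = "http://localhost:11434/v1"
MODEL = "fredrezones55/chandra-ocr-2:patch"


def model_is_served(models_response: dict, name: str) -> bool:
    """Return True if an OpenAI-style model list contains the given id."""
    return any(
        entry.get("id") == name for entry in models_response.get("data", [])
    )


def fetch_models(api_base: str = API_BASE) -> dict:
    """Fetch the model list from an OpenAI-compatible server."""
    with urllib.request.urlopen(f"{api_base}/models") as response:
        return json.loads(response.read())
```

With the server running, `model_is_served(fetch_models(), MODEL)` should be true once the model has been pulled. If it is false, the name passed in `VLLM_MODEL_NAME` probably does not match what the server reports.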

The patched model is my attempt to constrain the base model so that it stops thinking and breaking the chandra program. [The fine-tune's author could have used an instruct model or something] {or perhaps I have not done enough research}

One possible issue is capped text generation, in which case you might need to set the MAX_OUTPUT_TOKENS environment variable.
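One way to set these variables from a script is sketched below. It reuses the `VLLM_API_BASE` and `VLLM_MODEL_NAME` values from the quick start and adds `MAX_OUTPUT_TOKENS`. The value 8192 is only an illustrative assumption, not a recommended setting, and the helper names are hypothetical.

```python
import os
import subprocess
from typing import Optional


def chandra_env(max_output_tokens: int, base_env: Optional[dict] = None) -> dict:
    """Copy an environment and add the settings chandra reads."""
    env = dict(os.environ if base_env is None else base_env)
    env["VLLM_API_BASE"] = "http://localhost:11434/v1"
    env["VLLM_MODEL_NAME"] = "fredrezones55/chandra-ocr-2:patch"
    # Environment variables are strings, so the limit is converted here.
    env["MAX_OUTPUT_TOKENS"] = str(max_output_tokens)
    return env


def run_chandra(input_path: str, output_dir: str, max_output_tokens: int = 8192) -> int:
    """Run the chandra CLI with the adjusted environment; return its exit code."""
    command = ["chandra", input_path, output_dir, "--method", "vllm"]
    completed = subprocess.run(command, env=chandra_env(max_output_tokens), check=False)
    return completed.returncode
```

For example, `run_chandra("input.pdf", "./output")` mirrors the command line shown earlier. If long pages still come back truncated, raising the limit is the first thing to try.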

Multilingual Benchmark (43 Languages)

The table below covers the 43 most common languages, benchmarked across multiple models. For a comprehensive evaluation across 90 languages (Chandra 2 vs Gemini 2.5 Flash only), see the full 90-language benchmark.

| Language | Datalab API | Chandra 2 | Chandra 1 | Gemini 2.5 Flash | GPT-5 Mini |
|---|---|---|---|---|---|
| ar | 67.6% | 68.4% | 34.0% | 84.4% | 55.6% |
| bn | 85.1% | 72.8% | 45.6% | 55.3% | 23.3% |
| ca | 88.7% | 85.1% | 84.2% | 88.0% | 78.5% |
| cs | 88.2% | 85.3% | 84.7% | 79.1% | 78.8% |
| da | 90.1% | 91.1% | 88.4% | 86.0% | 87.7% |
| de | 93.8% | 94.8% | 83.0% | 88.3% | 93.8% |
| el | 89.9% | 85.6% | 85.5% | 83.5% | 82.4% |
| es | 91.8% | 89.3% | 88.7% | 86.8% | 97.1% |
| fa | 82.2% | 75.1% | 69.6% | 61.8% | 56.4% |
| fi | 85.7% | 83.4% | 78.4% | 86.0% | 84.7% |
| fr | 93.3% | 93.7% | 89.6% | 86.1% | 91.1% |
| gu | 73.8% | 70.8% | 44.6% | 47.6% | 11.5% |
| he | 76.4% | 70.4% | 38.9% | 50.9% | 22.3% |
| hi | 80.5% | 78.4% | 70.2% | 82.7% | 41.0% |
| hr | 93.4% | 90.1% | 85.9% | 88.2% | 81.3% |
| hu | 88.1% | 82.1% | 82.5% | 84.5% | 84.8% |
| id | 91.3% | 91.6% | 86.7% | 88.3% | 89.7% |
| it | 94.4% | 94.1% | 89.1% | 85.7% | 91.6% |
| ja | 87.3% | 86.9% | 85.4% | 80.0% | 76.1% |
| jv | 87.5% | 73.2% | 85.1% | 80.4% | 69.6% |
| kn | 70.0% | 63.2% | 20.6% | 24.5% | 10.1% |
| ko | 89.1% | 81.5% | 82.3% | 84.8% | 78.4% |
| la | 78.0% | 73.8% | 55.9% | 70.5% | 54.6% |
| ml | 72.4% | 64.3% | 18.1% | 23.8% | 11.9% |
| mr | 80.8% | 75.0% | 57.0% | 69.7% | 20.9% |
| nl | 90.0% | 88.6% | 85.3% | 87.5% | 83.8% |
| no | 89.2% | 90.3% | 85.5% | 87.8% | 87.4% |
| pl | 93.8% | 91.5% | 83.9% | 89.7% | 90.4% |
| pt | 97.0% | 95.2% | 84.3% | 89.4% | 90.8% |
| ro | 86.2% | 84.5% | 82.1% | 76.1% | 77.3% |
| ru | 88.8% | 85.5% | 88.7% | 82.8% | 72.2% |
| sa | 57.5% | 51.1% | 33.6% | 44.6% | 12.5% |
| sr | 95.3% | 90.3% | 82.3% | 89.7% | 83.0% |
| sv | 91.9% | 92.8% | 82.1% | 91.1% | 92.1% |
| ta | 82.9% | 77.7% | 50.8% | 53.9% | 8.1% |
| te | 69.4% | 58.6% | 19.5% | 33.3% | 9.9% |
| th | 71.6% | 62.6% | 47.0% | 66.7% | 53.8% |
| tr | 88.9% | 84.1% | 68.1% | 84.1% | 78.2% |
| uk | 93.1% | 91.0% | 88.5% | 87.9% | 81.9% |
| ur | 54.1% | 43.2% | 28.1% | 57.6% | 16.9% |
| vi | 85.0% | 80.4% | 81.6% | 89.5% | 83.6% |
| zh | 87.8% | 88.7% | 88.3% | 70.0% | 70.4% |
| Average | 80.4% | 77.8% | 69.4% | 67.6% | 60.5% |

Full 90-Language Benchmark

We also have a more comprehensive evaluation covering 90 languages, comparing Chandra 2 against Gemini 2.5 Flash. The average scores are lower than the 43-language table above because this includes many lower-resource languages. Chandra 2 averages 72.7% vs Gemini 2.5 Flash at 60.8%.

See the full 90-language results.