
DeepSeek-OCR requires Ollama v0.13.0 or later.

DeepSeek-OCR is a vision-language model that can perform token-efficient optical character recognition (OCR).


Example inputs

Note that the model is sensitive to its input: for example, missing punctuation or a missing newline in the prompt may produce incorrect output.

ollama run deepseek-ocr "/path/to/image\n<|grounding|>Given the layout of the image."
ollama run deepseek-ocr "/path/to/image\nFree OCR."
ollama run deepseek-ocr "/path/to/image\nParse the figure."
ollama run deepseek-ocr "/path/to/image\nExtract the text in the image."
ollama run deepseek-ocr "/path/to/image\n<|grounding|>Convert the document to markdown."

Examples

Example outputs: show1.jpg, show2.jpg, show3.jpg, show4.jpg
