deepseek-ocr:latest


DeepSeek-OCR is a vision-language model that can perform token-efficient OCR.

Tags: vision, 3b

0e7b018b8a22 · 6.7GB · arch: deepseekocr · parameters: 3.34B · quantization: F16
License: MIT
Default parameters: { "temperature": 0 }

Readme

DeepSeek-OCR requires Ollama v0.13.0 or later.
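A minimal sketch for checking the v0.13.0 requirement from a shell script. It assumes `ollama --version` prints a dotted `X.Y.Z` version somewhere in its output (the surrounding wording can vary, so the script just takes the first match). `version_ge` is a hypothetical helper, not part of Ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if dotted version $1 >= dotted version $2.
# Compares up to three numeric components; missing components count as 0.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for ((i = 0; i < 3; i++)); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    # 10# forces base 10 so a component such as "08" is not read as octal.
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# Only probe the installed CLI when it is actually on PATH.
if command -v ollama >/dev/null 2>&1; then
  # Assumption: the first X.Y.Z in the output (stdout or stderr) is the version.
  installed=$(ollama --version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
  if [ -n "$installed" ] && version_ge "$installed" 0.13.0; then
    echo "Ollama $installed meets the deepseek-ocr requirement (>= 0.13.0)"
  else
    echo "deepseek-ocr needs Ollama v0.13.0 or later (found: ${installed:-unknown})" >&2
  fi
fi
```

The comparison is numeric per component, so `0.13.0` correctly ranks above `0.12.11`, which a plain string comparison would get wrong.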

DeepSeek-OCR is a vision-language model that can perform token-efficient optical character recognition (OCR).

fig1.png

Example inputs

Note that the model is sensitive to the exact prompt text. For example, a missing punctuation mark or newline may produce incorrect output.

ollama run deepseek-ocr "/path/to/image\n<|grounding|>Given the layout of the image."
ollama run deepseek-ocr "/path/to/image\nFree OCR."
ollama run deepseek-ocr "/path/to/image\nParse the figure."
ollama run deepseek-ocr "/path/to/image\nExtract the text in the image."
ollama run deepseek-ocr "/path/to/image\n<|grounding|>Convert the document to markdown."
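Because the output depends on the exact prompt text, it can help to call the model through Ollama's REST API (`POST /api/generate`). There the image is sent base64-encoded in the `images` array and the prompt is an ordinary JSON string, so newlines and punctuation arrive exactly as written. This is a minimal sketch under assumptions: a local server on the default port 11434, and a prompt that carries only the instruction text (whether this model expects a leading newline before the instruction in API calls is not confirmed here). `build_ocr_request` and `json_escape` are hypothetical helpers, and `json_escape` handles only backslashes, double quotes, newlines and tabs.

```shell
#!/usr/bin/env bash
# Sketch: send an image to deepseek-ocr through Ollama's REST API (/api/generate).
# Assumptions: a local Ollama server on the default port 11434, and the prompt
# holds only the instruction text (the image travels in "images", not as a path).

# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles only backslash, double quote, newline and tab; other control characters are left as-is.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# Hypothetical helper: print the /api/generate request body for IMAGE and PROMPT.
build_ocr_request() {
  local image=$1 prompt=$2 b64
  [ -r "$image" ] || return 1
  # base64 output only uses [A-Za-z0-9+/=], so it is safe inside a JSON string;
  # tr removes the line wrapping some base64 implementations add.
  b64=$(base64 < "$image" | tr -d '\n')
  printf '{"model":"deepseek-ocr","prompt":"%s","images":["%s"],"stream":false,"options":{"temperature":0}}' \
    "$(json_escape "$prompt")" "$b64"
}

# Usage: pass an image path as the first argument; nothing is sent otherwise.
if [ -n "${1:-}" ]; then
  build_ocr_request "$1" '<|grounding|>Convert the document to markdown.' |
    curl -s http://localhost:11434/api/generate \
      -H 'Content-Type: application/json' --data-binary @-
fi
```

Saved as a script (for example a hypothetical `ocr.sh`), it would be run as `bash ocr.sh /path/to/image`. With `"stream": false` the generated text comes back in the `response` field of the JSON reply, which a tool such as `jq -r .response` can extract. Setting `temperature` to 0 mirrors the default parameters listed above.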

Examples

show1.jpg

show2.jpg

show3.jpg

show4.jpg

References