Typhoon-OCR-3B: A bilingual document-parsing model built specifically for real-world documents in Thai and English, inspired by models like olmOCR and based on Qwen2.5-VL-Instruct.
Try our demo on Demo.
Code and examples are available on GitHub.
Read the release blog on the OpenTyphoon Blog.
Remark: This model is intended to be used with a specific prompt only; it will not work with any other prompts.
ollama run scb10x/typhoon-ocr-3b
If you get inaccurate results on macOS, please set num_gpu to 0.
/set parameter num_gpu 0
(Recommended): Using the typhoon-ocr package
pip install typhoon-ocr
from typhoon_ocr import ocr_document
markdown = ocr_document("test.png", base_url="http://localhost:11434/v1", api_key="ollama", model='scb10x/typhoon-ocr-3b')
print(markdown)
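The same call works unchanged over a batch of files. The sketch below relies only on the arguments shown above; the scans/ folder name is illustrative, and it assumes a local Ollama server is running with the model already pulled.

from pathlib import Path
from typhoon_ocr import ocr_document

# Sketch: OCR every PNG in a folder via the local Ollama server and save the
# resulting markdown next to each image.
for image_path in sorted(Path("scans").glob("*.png")):
    markdown = ocr_document(str(image_path),
                            base_url="http://localhost:11434/v1",
                            api_key="ollama",
                            model="scb10x/typhoon-ocr-3b")
    image_path.with_suffix(".md").write_text(markdown, encoding="utf-8")
    print(f"Wrote {image_path.with_suffix('.md')}")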
This model only works with the specific prompts defined below, where {base_text} refers to information extracted from the PDF metadata using the get_anchor_text function from the typhoon-ocr package. It will not function correctly with any other prompts.
PROMPTS_SYS = {
    "default": lambda base_text: (
        f"Below is an image of a document page along with its dimensions. "
        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
        f"If the document contains images, use a placeholder like dummy.png for each image.\n"
        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
    ),
    "structure": lambda base_text: (
        f"Below is an image of a document page, along with its dimensions and possibly some raw textual content previously extracted from it. "
        f"Note that the text extraction may be incomplete or partially missing. Carefully consider both the layout and any available text to reconstruct the document accurately.\n"
        f"Your task is to return the markdown representation of this document, presenting tables in HTML format as they naturally appear.\n"
        f"If the document contains images or figures, analyze them and include the tag <figure>IMAGE_ANALYSIS</figure> in the appropriate location.\n"
        f"Your final output must be in JSON format with a single key `natural_text` containing the response.\n"
        f"RAW_TEXT_START\n{base_text}\nRAW_TEXT_END"
    ),
}
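For reference, the sketch below shows roughly how such a prompt can be sent to the model directly through Ollama's OpenAI-compatible endpoint, using the PROMPTS_SYS dictionary defined above. The get_anchor_text call signature and the exact message layout are assumptions, not the package's documented API; prefer ocr_document unless you need this level of control.

import base64
import json
from openai import OpenAI
from typhoon_ocr import get_anchor_text

# Assumed signature: anchor text for page 1 of the PDF (check the package docs).
base_text = get_anchor_text("test.pdf", 1)
prompt = PROMPTS_SYS["default"](base_text)

# page1.png is assumed to be a rendered image of the same page.
with open("page1.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="scb10x/typhoon-ocr-3b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    temperature=0.1,
    top_p=0.6,
)
# The model replies with JSON holding a single `natural_text` key.
print(json.loads(response.choices[0].message.content)["natural_text"])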
We suggest using the following generation parameters. Since this is an OCR model, we do not recommend using a high temperature. Make sure the temperature is set to 0 or 0.1, not higher.
temperature=0.1
top_p=0.6
repetition_penalty=1.2
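If you are running the model interactively with ollama run, the same settings can be applied from the REPL. Note that Ollama exposes the repetition penalty as repeat_penalty; we assume it corresponds to the repetition_penalty value above.

/set parameter temperature 0.1
/set parameter top_p 0.6
/set parameter repeat_penalty 1.2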
This is a task-specific model intended to be used only with the provided prompts. It does not include any guardrails or VQA capability. Due to the nature of large language models (LLMs), a certain level of hallucination may occur. We recommend that developers carefully assess these risks in the context of their specific use case.
https://twitter.com/opentyphoon
@misc{typhoon2,
title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
year={2024},
eprint={2412.13702},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13702},
}