119 1 week ago

Typhoon-OCR 1.5 - A document parsing model built for Thai and English

vision

Models

View all →

Readme

Typhoon-OCR-3B 1.5: A bilingual document parsing model built specifically for real-world documents in Thai and English based on Qwen2.5-VL. (Q4_K_M quantized).

This model has been quantization-aware trained (QAT), preserving a quality similar to half-precision (BF16) models while maintaining a lower memory footprint (3x to non-quantized model) and enabling accurate prediction in 4-bit version.

Try our demo available on Demo

Code / Examples available on Github

Release Blog available on OpenTyphoon Blog

*Remark: This model is intended to be used with a specific prompt only; it will not work with any other prompts.

Usage Example

ollama run scb10x/typhoon-ocr1.5-3b

(Recommended): Using Typhoon-OCR Package

pip install typhoon-ocr
from typhoon_ocr import ocr_document

markdown = ocr_document("test.png", base_url="http://localhost:11434/v1", api_key="ollama", model='scb10x/typhoon-ocr1.5-3b')
print(markdown)

Prompting

Extract all text from the image.

Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.

Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:

<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>

- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes.

Generation Parameters

We suggest using the following generation parameters. Since this is an OCR model, we do not recommend using a high temperature. Make sure the temperature is set to 0 or 0.1, not higher.

temperature=0.1,
top_p=0.6,
repetition_penalty: 1.1

Intended Uses & Limitations

This is a task-specific model intended to be used only with the provided prompts. It does not include any guardrails or VQA capability. Due to the nature of large language models (LLMs), a certain level of hallucination may occur. We recommend that developers carefully assess these risks in the context of their specific use case.

Follow us

https://twitter.com/opentyphoon

Support

https://discord.gg/us5gAYmrxw

Citation

  • If you find Typhoon2 useful for your work, please cite it using:
@misc{typhoon2,
      title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models}, 
      author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
      year={2024},
      eprint={2412.13702},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13702}, 
}