153 4 weeks ago

MinerU 2.5 Pro (1.2B) - Q4_K_M

An advanced document parsing vision-language model (VLM)

ollama run tb-etl/mineru-q4km
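
Besides the interactive command above, a pulled model can also be called through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is running on its default port (11434) and the model has already been pulled. The prompt text and the helper names `build_payload` and `parse_page` are illustrative assumptions, not part of this model card; the prompt format the model actually expects may differ.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "tb-etl/mineru-q4km"


def build_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a non-streaming /api/generate request body with one image."""
    return {
        "model": MODEL,
        "prompt": prompt,
        # The API expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def parse_page(image_path: str, prompt: str = "Convert this page to Markdown.") -> str:
    """Send one page image to the model and return its text response.

    The default prompt is a placeholder assumption.
    """
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `parse_page("page.png")` with a hypothetical page image would return the model's output as a string. PDFs would need to be rendered to images first, since the API accepts image data only.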

Details

c347c8d9bd13 · 398MB · qwen2vl · 494M · Q4_K_M

Readme

MinerU 2.5 Pro (1.2B) - Q4_K_M

An advanced document parsing vision-language model (VLM) specialized in converting PDFs, charts, formulas, and Office documents into structured formats like Markdown and JSON.

Key Features

Architecture: A 1.2B-parameter vision-language model optimized for spatial and structural document analysis.

Top Performance: Scores highly on document parsing benchmarks, with state-of-the-art accuracy in dense formula recognition, table parsing, and text extraction.

Advanced Document Understanding:

  • Chart and image parsing
  • Complex table merging across split pages
  • Dense formula translation to clean LaTeX
  • Image recognition within tables

Quantization Details

Type: Q4_K_M

Balanced Profile: Uses Q6_K for half of the attention and feed-forward tensors and Q4_K for the rest, balancing speed, a low memory footprint, and near-native accuracy.
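
To make the trade-off concrete, the sketch below estimates the average bits per weight for a Q6_K/Q4_K mix. The per-format figures come from llama.cpp's K-quant super-block layouts (256 weights stored in 144 bytes for Q4_K and 210 bytes for Q6_K). The fraction of weights held in Q6_K is a hypothetical parameter here, because the model card does not state it, and the function names are illustrative.

```python
# Bits per weight implied by llama.cpp's K-quant super-blocks (256 weights each).
Q4_K_BPW = 144 * 8 / 256  # 4.5 bits per weight
Q6_K_BPW = 210 * 8 / 256  # 6.5625 bits per weight


def mixed_bits_per_weight(frac_q6: float) -> float:
    """Average bits per weight when frac_q6 of the weights use Q6_K
    and the remainder use Q4_K. frac_q6 is an assumed input."""
    if not 0.0 <= frac_q6 <= 1.0:
        raise ValueError("frac_q6 must be between 0 and 1")
    return frac_q6 * Q6_K_BPW + (1.0 - frac_q6) * Q4_K_BPW


def estimated_megabytes(n_weights: int, frac_q6: float) -> float:
    """Rough size (in units of 10^6 bytes) of n_weights quantized with the mix."""
    return n_weights * mixed_bits_per_weight(frac_q6) / 8 / 1e6
```

This estimate covers only tensors stored in these two formats. A real GGUF file also holds tensors kept at other precisions plus metadata, so the result should not be expected to match the listed file size.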