High-performance German document OCR using fine-tuned Qwen2-VL-2B and Qwen2.5-VL-3B vision-language models
German-OCR is specifically trained to extract text from German documents including invoices, receipts, forms, and other business documents. It outputs structured text in Markdown format.
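Because the model emits Markdown, downstream code often needs to turn that text back into structured data. A minimal sketch (plain Python, no external dependencies) that parses a Markdown table — such as invoice line items — out of OCR output; the sample output string is illustrative, not real model output:

```python
def parse_markdown_table(markdown: str) -> list[list[str]]:
    """Extract rows of cells from Markdown table lines in OCR output."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row (|---|---|)
        if all(c and set(c) <= {"-", ":"} for c in cells):
            continue
        rows.append(cells)
    return rows

ocr_output = """\
| Position | Betrag |
|---|---|
| Miete | 1.200,00 EUR |
"""
print(parse_markdown_table(ocr_output))
# → [['Position', 'Betrag'], ['Miete', '1.200,00 EUR']]
```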
| Model | Size | Base | HuggingFace |
|---|---|---|---|
| german-ocr | 4.4 GB | Qwen2-VL-2B | Keyven/german-ocr |
| german-ocr-3b | 7.5 GB | Qwen2.5-VL-3B | Keyven/german-ocr-3b |
This model is still under active development. There are currently compatibility issues with the Ollama vision adapter.
For reliable results, use the HuggingFace version.
```bash
pip install german-ocr
```
```python
from german_ocr import GermanOCR

# Using Ollama (fast, local)
ocr = GermanOCR(backend="ollama")
result = ocr.extract("document.png")
print(result)

# Using Transformers (more accurate)
ocr = GermanOCR(backend="transformers")
result = ocr.extract("document.png")
print(result)
```
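For more than a handful of documents it helps to wrap the extraction in a small batch loop. A sketch, assuming the `GermanOCR` API shown above — it accepts any backend object exposing `extract(path) -> str` and writes one `.md` file per input image:

```python
from pathlib import Path

def ocr_folder(ocr, folder: str, out_dir: str = "ocr_output") -> list[Path]:
    """Run OCR over every PNG in `folder`, saving one Markdown file each.

    `ocr` is any object with extract(path) -> str, e.g.
    GermanOCR(backend="ollama") from the quickstart above (assumed API).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(folder).glob("*.png")):
        text = ocr.extract(str(image))
        target = out / (image.stem + ".md")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```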
```bash
ollama run Keyvan/german-ocr "Extrahiere den Text: image.png"
```
```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

# Load the fine-tuned model and its processor
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")

# Build a chat message containing the document image and the German prompt
image = Image.open("document.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extrahiere den Text aus diesem Dokument."}
    ]
}]

# Prepare the combined text/image inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

# Generate, then decode only the newly generated tokens (skip the prompt)
output_ids = model.generate(**inputs, max_new_tokens=512)
result = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(result)
```
| Metric | Value |
|---|---|
| Base Model | Qwen2-VL-2B-Instruct |
| Model Size | 4.4 GB |
| VRAM (4-bit) | 1.5 GB |
| Inference Time | ~15s (GPU) |
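The ~1.5 GB VRAM figure above refers to 4-bit quantized loading. A sketch of how that could be configured with `BitsAndBytesConfig` from Transformers — this assumes `bitsandbytes` is installed and a CUDA GPU is available, and is not confirmed as the author's exact setup:

```python
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    Qwen2VLForConditionalGeneration,
)

# 4-bit NF4 quantization to reduce VRAM usage (assumed configuration)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    quantization_config=quant_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")
```

The rest of the inference code (chat template, `process_vision_info`, `generate`) is unchanged from the Transformers example above.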
License: Apache 2.0
Keyvan Hardani - Website: keyvan.ai - LinkedIn: linkedin.com/in/keyvanhardani - GitHub: @Keyvanhardani