OpenScan-1.0 was created by HSR Projects and trained on an NVIDIA GeForce RTX 5090 with 16B tokens, plus a vision model.

Details

Updated 2 months ago

d97b1f4cb961 · 4.7GB

model
arch: llama
parameters: 7.24B
quantization: Q4_0
size: 4.1GB

projector
arch: clip
parameters: 312M
quantization: F16
size: 624MB

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

system

You are OpenScan-1.0, an advanced OCR model. Extract all text from images accurately. Preserve forma

135B

params

{ "stop": [ "[INST]", "[/INST]" ] }

30B

template

[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]

67B

OpenScan-1.0 📸🧠

A lightweight, local-first AI-powered OCR system for extracting and structuring text from images.

📌 Overview

OpenScan-1.0 is a modern OCR pipeline that combines vision models and language models to extract text from images and convert it into clean, structured output.

Unlike traditional OCR engines, OpenScan focuses on:

Handling noisy and low-quality images
Correcting errors in the extracted text with a language model
Providing flexible output formats

🧠 How It Works

Image → Vision Model → Raw Text → AI Cleanup → Structured Output

Pipeline:

Vision Model (BakLLaVA / LLaVA) → Extracts raw text from images
OpenScan Model → Cleans, corrects, and structures the text
Optional Post-processing → Converts into formats like JSON, Markdown, or plain text
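
The three stages above can be sketched as a small composition of functions. This is a minimal illustration, not OpenScan's actual implementation: `run_pipeline`, `format_output`, and the `extract` / `clean` callables are hypothetical names, and the JSON and Markdown layouts are assumptions made for the example.

```python
import json
from typing import Callable

# Hypothetical stage signatures; these names do not come from OpenScan itself.
ExtractFn = Callable[[bytes], str]  # image bytes -> raw OCR text (vision model)
CleanFn = Callable[[str], str]      # raw OCR text -> cleaned text (OpenScan model)


def format_output(text: str, fmt: str = "text") -> str:
    """Optional post-processing: render cleaned text as plain text, Markdown, or JSON."""
    # Drop empty lines and surrounding whitespace before formatting.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if fmt == "text":
        return "\n".join(lines)
    if fmt == "markdown":
        return "\n".join(f"- {line}" for line in lines)
    if fmt == "json":
        return json.dumps({"lines": lines}, ensure_ascii=False)
    raise ValueError(f"unsupported format: {fmt!r}")


def run_pipeline(image: bytes, extract: ExtractFn, clean: CleanFn, fmt: str = "text") -> str:
    """Image -> raw text -> cleaned text -> structured output."""
    raw_text = extract(image)
    cleaned = clean(raw_text)
    return format_output(cleaned, fmt)
```

Passing the stages in as callables keeps the flow testable without a running model: any function that maps image bytes to text can stand in for the vision model, and any text-to-text function can stand in for the cleanup step.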

⚙️ Requirements

GPU: NVIDIA RTX 20xx (recommended but not required)
RAM: 16GB
OS: Linux / Windows

Software:

Ollama
Python 3.10+
OpenCV (optional for preprocessing)
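
A quick way to check the software requirements is a short script like the one below. It is only a sketch: `check_environment` is a hypothetical helper, and it assumes that Ollama is installed as an `ollama` executable on the PATH and that OpenCV is importable as `cv2`.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python: tuple = (3, 10)) -> dict:
    """Report which of the listed software requirements appear to be satisfied."""
    return {
        # Python 3.10+ is the version listed in the requirements.
        "python_ok": sys.version_info >= min_python,
        # Assumes the Ollama CLI is on the PATH under the name "ollama".
        "ollama_on_path": shutil.which("ollama") is not None,
        # OpenCV is optional; it is only needed for preprocessing.
        "opencv_installed": importlib.util.find_spec("cv2") is not None,
    }
```

Calling `check_environment()` returns a dictionary of booleans, so a missing optional component such as OpenCV shows up as `False` instead of raising an error.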

🚀 Installation

1. Install Ollama

https://ollama.com

2. Pull Vision Models

ollama pull bakllava
ollama pull llava:7b

3. Create OpenScan Model

Create a Modelfile:

FROM llava:7b

SYSTEM "You are OpenScan-1.0, an AI OCR assistant. Extract text from images accurately, fix OCR errors, and return clean, structured output."

PARAMETER temperature 0.2
PARAMETER top_p 0.9

Build the model:

ollama create OpenScan-1.0 -f Modelfile
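
The same two steps can be scripted from Python. The sketch below writes the Modelfile shown above and then calls the `ollama create` command. `render_modelfile` and `create_model` are hypothetical helper names, and the code assumes the system prompt contains no double quotes, since it is written in the single-line `SYSTEM "..."` form.

```python
import subprocess
from pathlib import Path


def render_modelfile(base: str, system: str, temperature: float = 0.2, top_p: float = 0.9) -> str:
    """Build Modelfile text in the same layout as the example above."""
    if '"' in system:
        # Assumption: keep the sketch simple by rejecting prompts that would need escaping.
        raise ValueError("system prompt must not contain double quotes")
    return (
        f"FROM {base}\n\n"
        f'SYSTEM "{system}"\n\n'
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER top_p {top_p}\n"
    )


def create_model(name: str, modelfile_text: str, path: str = "Modelfile") -> None:
    """Write the Modelfile to disk and run `ollama create <name> -f <path>`."""
    Path(path).write_text(modelfile_text, encoding="utf-8")
    # check=True raises CalledProcessError if the ollama command fails.
    subprocess.run(["ollama", "create", name, "-f", path], check=True)
```

For example, `create_model("OpenScan-1.0", render_modelfile("llava:7b", "You are OpenScan-1.0, an AI OCR assistant."))` would mirror the manual steps, provided the Ollama CLI is installed.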

🧪 Usage

Step 1 — Extract Raw Text

ollama run bakllava

Prompt (include the path to the image file in the prompt so that Ollama attaches it):

Extract all visible text from this image exactly as written.
Do not explain anything.
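
Step 1 can also be run without the interactive prompt by calling Ollama's local REST API. The sketch below assumes the server is listening on its default address, `http://localhost:11434`. `build_extract_request` and `extract_raw_text` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"

EXTRACT_PROMPT = (
    "Extract all visible text from this image exactly as written.\n"
    "Do not explain anything."
)


def build_extract_request(image_bytes: bytes, model: str = "bakllava") -> dict:
    """Build the JSON payload: the image is sent as a base64 string in `images`."""
    return {
        "model": model,
        "prompt": EXTRACT_PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def extract_raw_text(image_path: str, model: str = "bakllava") -> str:
    """Send the image to the vision model and return its raw text answer."""
    with open(image_path, "rb") as f:
        payload = build_extract_request(f.read(), model)
    request = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Keeping the payload builder separate from the network call makes it easy to inspect exactly what is sent before pointing the script at a real image.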

Step 2 — Clean & Structure Text

ollama run OpenScan-1.0

Prompt (paste the raw text from Step 1 after it):

Clean and structure this OCR output.
Fix errors and return readable text.
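
Step 2 can be scripted in a similar way through Ollama's chat endpoint. This is a sketch under the assumption that the server runs on its default local address and that the model was created under the name `OpenScan-1.0`. `build_clean_request` and `clean_text` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

CLEAN_PROMPT = (
    "Clean and structure this OCR output.\n"
    "Fix errors and return readable text."
)


def build_clean_request(raw_text: str, model: str = "OpenScan-1.0") -> dict:
    """Build a chat payload: the instruction followed by the raw OCR text."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"{CLEAN_PROMPT}\n\n{raw_text}"}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def clean_text(raw_text: str, model: str = "OpenScan-1.0") -> str:
    """Send raw OCR text to the cleanup model and return the cleaned text."""
    payload = build_clean_request(raw_text, model)
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

The system prompt and sampling parameters do not need to be repeated in the request, because they are already stored in the model by the Modelfile.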

📄 Example

Input (raw OCR)

H3llo W0rld!!
Th1s 1s @ t3st.

Output (cleaned)

Hello World!!
This is a test.
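
To review what the cleanup step changed, the raw and cleaned text can be compared character by character. The sketch below uses Python's standard `difflib` module. `substitutions` is a hypothetical helper and not part of OpenScan.

```python
from difflib import SequenceMatcher


def substitutions(raw: str, cleaned: str) -> list[tuple[str, str]]:
    """Return (raw fragment, cleaned fragment) pairs for every span that differs."""
    matcher = SequenceMatcher(None, raw, cleaned, autojunk=False)
    return [
        (raw[i1:i2], cleaned[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep replacements, insertions, and deletions
    ]


raw_example = "H3llo W0rld!!\nTh1s 1s @ t3st."
cleaned_example = "Hello World!!\nThis is a test."
changes = substitutions(raw_example, cleaned_example)
```

A short list of small pairs such as `("3", "e")` suggests the model only fixed character confusions, while long fragments would indicate that it rewrote the text and the output deserves a closer look.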

⚠️ Limitations

Performance depends on image quality
Accuracy may drop on handwritten text
Complex layouts may require multiple passes

🔥 Roadmap

Automatic image preprocessing
Layout detection (tables, paragraphs)
Multi-language support
Web UI and API integration

🤝 Contributing

Contributions are welcome:

Improve OCR accuracy
Enhance prompts
Add new output formats

📜 License

This project relies on third-party models. Ensure compliance with the respective model licenses.

🏢 Maintained By

HSR Projects

📣 Disclaimer

OpenScan-1.0 is an experimental OCR system. Results may vary depending on input quality and model limitations.

🚀 Vision

Not just OCR — intelligent text understanding.