OpenScan-1.0 is created by HSR-projects. It was trained on 16B tokens using an Nvidia GeForce RTX 5090 and is paired with a vision model.

ollama run HSR-projects/OpenScan-1.0

Details

2 months ago · d97b1f4cb961 · 4.7GB

  • llama · 7.24B · Q4_0
  • clip · 312M · F16
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
You are OpenScan-1.0, an advanced OCR model. Extract all text from images accurately. Preserve forma
{ "stop": [ "[INST]", "[/INST]" ] }
[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]

Readme

OpenScan-1.0 📸🧠

A lightweight, local-first AI-powered OCR system for extracting and structuring text from images.


📌 Overview

OpenScan-1.0 is a modern OCR pipeline that combines vision models and language models to extract text from images and convert it into clean, structured output.

Unlike traditional OCR engines, OpenScan focuses on:

  • Handling noisy and low-quality images
  • Improving extracted text using AI
  • Providing flexible output formats

🧠 How It Works

Image → Vision Model → Raw Text → AI Cleanup → Structured Output

Pipeline:

  1. Vision Model (BakLLaVA / LLaVA) → Extracts raw text from images

  2. OpenScan Model → Cleans, corrects, and structures the text

  3. Optional Post-processing → Converts into formats like JSON, Markdown, or plain text
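The three stages above can be read as a chain of functions. The following is a minimal Python sketch of that structure, not the project's actual code: `run_pipeline` and the stage type names are hypothetical, and each stage is passed in as a callable so that real model calls or simple stubs can be plugged in.

```python
from typing import Callable, Optional

# Hypothetical stage signatures, chosen for illustration only.
ExtractFn = Callable[[bytes], str]   # image bytes -> raw OCR text
CleanFn = Callable[[str], str]       # raw text -> cleaned text
FormatFn = Callable[[str], str]      # cleaned text -> JSON / Markdown / plain text


def run_pipeline(
    image: bytes,
    extract: ExtractFn,
    clean: CleanFn,
    postprocess: Optional[FormatFn] = None,
) -> str:
    """Image -> Vision Model -> Raw Text -> AI Cleanup -> Structured Output."""
    raw_text = extract(image)        # stage 1: vision model (e.g. bakllava)
    cleaned = clean(raw_text)        # stage 2: OpenScan model
    if postprocess is None:          # stage 3 is optional
        return cleaned
    return postprocess(cleaned)
```

Keeping the stages as separate callables makes it possible to swap the vision model or skip post-processing without touching the rest of the chain.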


⚙️ Requirements

  • GPU: NVIDIA RTX 20xx (recommended but optional)
  • RAM: 16GB
  • OS: Linux / Windows

Software:

  • Ollama
  • Python 3.10+
  • OpenCV (optional for preprocessing)

🚀 Installation

1. Install Ollama

https://ollama.com

2. Pull Vision Models

ollama pull bakllava
ollama pull llava:7b

3. Create OpenScan Model

Create a Modelfile:

FROM llava:7b

SYSTEM "You are OpenScan-1.0, an AI OCR assistant. Extract text from images accurately, fix OCR errors, and return clean, structured output."

PARAMETER temperature 0.2
PARAMETER top_p 0.9

Build the model:

ollama create OpenScan-1.0 -f Modelfile
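After pulling and creating the models, it can help to confirm that all of them are installed. The sketch below assumes a local Ollama server on its default port and uses the `/api/tags` endpoint, which lists installed models. The helper names `normalize`, `missing_models`, and `fetch_tags` are hypothetical, and the case-insensitive comparison is a cautious assumption rather than documented behavior.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def normalize(name: str) -> str:
    """Treat an untagged name as :latest and compare case-insensitively."""
    tagged = name if ":" in name else f"{name}:latest"
    return tagged.lower()


def missing_models(required: list[str], tags_response: dict) -> list[str]:
    """Return the required models absent from a /api/tags response body."""
    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `missing_models(["bakllava", "llava:7b", "OpenScan-1.0"], fetch_tags())` should return an empty list once all three models are available.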

🧪 Usage

Step 1 — Extract Raw Text

ollama run bakllava

Prompt (append the path to the image file after it):

Extract all visible text from this image exactly as written.
Do not explain anything.
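Step 1 can also be scripted instead of typed into the interactive prompt. This is a sketch that assumes a local Ollama server on its default port and uses the `/api/generate` endpoint with a base64-encoded image. The function names `build_extract_request` and `extract_raw_text` are hypothetical, and the 300-second timeout is an arbitrary choice.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
EXTRACT_PROMPT = (
    "Extract all visible text from this image exactly as written.\n"
    "Do not explain anything."
)


def build_extract_request(image_bytes: bytes, model: str = "bakllava") -> dict:
    """Build the JSON body for a one-shot, non-streaming vision request."""
    return {
        "model": model,
        "prompt": EXTRACT_PROMPT,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def extract_raw_text(image_path: str, url: str = OLLAMA_URL) -> str:
    """Send the image to the vision model and return its text reply."""
    with open(image_path, "rb") as f:
        body = build_extract_request(f.read())
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

Calling `extract_raw_text("scan.png")` (with a file name of your own) returns the raw text that Step 2 takes as input.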

Step 2 — Clean & Structure Text

ollama run OpenScan-1.0

Prompt (paste the raw text from Step 1 after it):

Clean and structure this OCR output.
Fix errors and return readable text.
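Step 2 can be run non-interactively by passing the prompt to `ollama run` as an argument. The sketch below assumes `ollama` is on the PATH and that the model was created under the name `OpenScan-1.0`. The helper names are hypothetical, and separating the instruction from the OCR text with a blank line is an assumption, not a format the model is documented to require.

```python
import subprocess

CLEAN_PROMPT = (
    "Clean and structure this OCR output.\n"
    "Fix errors and return readable text."
)


def build_clean_prompt(raw_text: str) -> str:
    """Append the raw OCR text to the cleanup instruction."""
    return f"{CLEAN_PROMPT}\n\n{raw_text.strip()}"


def build_command(raw_text: str, model: str = "OpenScan-1.0") -> list[str]:
    """Argument list for a non-interactive `ollama run MODEL PROMPT` call."""
    return ["ollama", "run", model, build_clean_prompt(raw_text)]


def clean_text(raw_text: str) -> str:
    """Run the cleanup model and return its reply (needs ollama on PATH)."""
    result = subprocess.run(
        build_command(raw_text), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Passing the command as a list avoids shell quoting problems when the OCR text contains quotes or other special characters.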

📄 Example

Input (raw OCR)

H3llo W0rld!!
Th1s 1s @ t3st.

Output (cleaned)

Hello World!!
This is a test.
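The cleaned output above can then be converted into the formats mentioned in the pipeline (JSON, Markdown, or plain text). The following is a minimal sketch of such a post-processing step. The function names and the `{"lines": [...]}` JSON schema are hypothetical and chosen for illustration only.

```python
import json


def to_plain(text: str) -> str:
    """Plain text: drop blank lines and surrounding whitespace."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


def to_json(text: str) -> str:
    """JSON: one entry per non-empty line (hypothetical schema)."""
    lines = to_plain(text).splitlines()
    return json.dumps({"lines": lines}, ensure_ascii=False)


def to_markdown(text: str, title: str = "OCR result") -> str:
    """Markdown: a heading followed by one bullet per line."""
    bullets = "\n".join(f"- {line}" for line in to_plain(text).splitlines())
    return f"# {title}\n\n{bullets}"


cleaned = "Hello World!!\nThis is a test.\n"
print(to_json(cleaned))      # {"lines": ["Hello World!!", "This is a test."]}
print(to_markdown(cleaned))
```

Doing the conversion in plain code rather than asking the model for JSON keeps the output format deterministic.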

⚠️ Limitations

  • Performance depends on image quality
  • Accuracy may be lower on handwritten text
  • Complex layouts may require multiple passes

🔥 Roadmap

  • Automatic image preprocessing
  • Layout detection (tables, paragraphs)
  • Multi-language support
  • Web UI and API integration

🤝 Contributing

Contributions are welcome:

  • Improve OCR accuracy
  • Enhance prompts
  • Add new output formats

📜 License

This project relies on third-party models. Ensure compliance with the license of each model you use.


🏢 Maintained By

HSR Projects


📣 Disclaimer

OpenScan-1.0 is an experimental OCR system. Results may vary depending on input quality and model limitations.


🚀 Vision

Not just OCR — intelligent text understanding.