OpenScan-1.0 was created by HSR-projects and trained on an NVIDIA GeForce RTX 5090 with 16B tokens, combined with a vision model.

ollama run HSR-projects/OpenScan-1.0

OpenScan-1.0 📸🧠

A lightweight, local-first AI-powered OCR system for extracting and structuring text from images.


📌 Overview

OpenScan-1.0 is a modern OCR pipeline that combines vision models and language models to extract text from images and convert it into clean, structured output.

Unlike traditional OCR engines, OpenScan focuses on:

  • Handling noisy and low-quality images
  • Improving extracted text using AI
  • Providing flexible output formats

🧠 How It Works

Image → Vision Model → Raw Text → AI Cleanup → Structured Output

Pipeline:

  1. Vision Model (BakLLaVA / LLaVA) → Extracts raw text from images

  2. OpenScan Model → Cleans, corrects, and structures the text

  3. Optional Post-processing → Converts into formats like JSON, Markdown, or plain text
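
The three stages above can be wired together in a few lines of Python. The sketch below is illustrative only: `run_pipeline`, `to_format`, and the `Generate` callable type are hypothetical names, not part of OpenScan, and the JSON and Markdown shapes are assumptions. The model call is injected as a function so the flow can be tried without a running Ollama server; the prompts are the ones given in the Usage section.

```python
import json
from typing import Callable, Optional

# Hypothetical signature for "call a model":
# (model name, prompt, image bytes or None) -> reply text.
Generate = Callable[[str, str, Optional[bytes]], str]

EXTRACT_PROMPT = (
    "Extract all visible text from this image exactly as written.\n"
    "Do not explain anything."
)
CLEANUP_PROMPT = (
    "Clean and structure this OCR output.\n"
    "Fix errors and return readable text."
)


def to_format(text: str, fmt: str = "text") -> str:
    """Stage 3 (optional): emit cleaned text as JSON, Markdown, or plain text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if fmt == "json":
        return json.dumps({"lines": lines}, ensure_ascii=False)
    if fmt == "markdown":
        return "\n".join(f"- {line}" for line in lines)
    if fmt == "text":
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt}")


def run_pipeline(image: bytes, generate: Generate, fmt: str = "text",
                 vision_model: str = "bakllava",
                 cleanup_model: str = "OpenScan-1.0") -> str:
    # Stage 1: the vision model reads the image and returns raw text.
    raw = generate(vision_model, EXTRACT_PROMPT, image)
    # Stage 2: the OpenScan model cleans the raw text (no image needed).
    cleaned = generate(cleanup_model, f"{CLEANUP_PROMPT}\n\n{raw}", None)
    # Stage 3: optional post-processing into the requested format.
    return to_format(cleaned, fmt)
```

Passing the model call in as a parameter keeps the orchestration independent of how Ollama is reached (CLI, REST API, or a client library).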


⚙️ Requirements

  • GPU: NVIDIA RTX 20xx or newer (recommended, not required)
  • RAM: 16 GB
  • OS: Linux / Windows

Software:

  • Ollama
  • Python 3.10+
  • OpenCV (optional for preprocessing)

🚀 Installation

1. Install Ollama

https://ollama.com

2. Pull Vision Models

ollama pull bakllava
ollama pull llava:7b

3. Create OpenScan Model

Create a Modelfile:

FROM llava:7b

SYSTEM "You are OpenScan-1.0, an AI OCR assistant. Extract text from images accurately, fix OCR errors, and return clean, structured output."

PARAMETER temperature 0.2
PARAMETER top_p 0.9

Build the model:

ollama create OpenScan-1.0 -f Modelfile
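
If the Modelfile needs to be regenerated often, for example to try different sampling parameters, it can be rendered from Python. This is a sketch rather than part of OpenScan: `render_modelfile` and `write_modelfile` are hypothetical helpers, and the sketch assumes the system prompt contains no double quotes.

```python
from pathlib import Path

SYSTEM_PROMPT = (
    "You are OpenScan-1.0, an AI OCR assistant. Extract text from images "
    "accurately, fix OCR errors, and return clean, structured output."
)


def render_modelfile(base: str = "llava:7b", system: str = SYSTEM_PROMPT,
                     temperature: float = 0.2, top_p: float = 0.9) -> str:
    """Return Modelfile text with the same directives as the one shown above."""
    if '"' in system:
        # Assumption of this sketch: keep quoting simple by rejecting embedded quotes.
        raise ValueError("system prompt must not contain double quotes")
    return (
        f"FROM {base}\n\n"
        f'SYSTEM "{system}"\n\n'
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER top_p {top_p}\n"
    )


def write_modelfile(path: str = "Modelfile", **kwargs) -> None:
    """Write the rendered Modelfile to disk; build it with `ollama create` afterwards."""
    Path(path).write_text(render_modelfile(**kwargs), encoding="utf-8")
```

After calling `write_modelfile()`, the build command is unchanged: `ollama create OpenScan-1.0 -f Modelfile`.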

🧪 Usage

Step 1: Extract Raw Text

ollama run bakllava

Prompt (include the path to the image file in your message so the model can load it):

Extract all visible text from this image exactly as written.
Do not explain anything.
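
The same step can be scripted against Ollama's local REST API instead of the interactive prompt. The sketch below assumes an Ollama server on the default address `http://localhost:11434` and uses the `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. `build_extract_request` and `extract_raw_text` are hypothetical helper names.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address

EXTRACT_PROMPT = (
    "Extract all visible text from this image exactly as written.\n"
    "Do not explain anything."
)


def build_extract_request(image_bytes: bytes, model: str = "bakllava") -> dict:
    """Build the JSON body for a single, non-streaming vision request."""
    return {
        "model": model,
        "prompt": EXTRACT_PROMPT,
        # Ollama expects each image as a base64 string.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def extract_raw_text(image_path: str, model: str = "bakllava") -> str:
    """Send the image to Ollama and return the model's raw text reply."""
    with open(image_path, "rb") as f:
        body = build_extract_request(f.read(), model)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": False the reply is one JSON object with a "response" field.
        return json.loads(resp.read())["response"]
```

A call such as `extract_raw_text("scan.png")` returns the text the model read from the image, where `scan.png` is a placeholder file name.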

Step 2: Clean & Structure Text

ollama run OpenScan-1.0

Prompt (paste the raw text from Step 1 after it):

Clean and structure this OCR output.
Fix errors and return readable text.
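
For scripting, the cleanup prompt and the raw OCR text can be combined into one string and passed to `ollama run` as an argument, in which case the CLI prints the reply and exits. This is a minimal sketch, assuming the `ollama` CLI is on `PATH` and the model was built under the name `OpenScan-1.0`. `build_cleanup_prompt` and `clean_text` are hypothetical helpers, and the "OCR output:" label is an arbitrary choice.

```python
import subprocess

CLEANUP_PROMPT = (
    "Clean and structure this OCR output.\n"
    "Fix errors and return readable text."
)


def build_cleanup_prompt(raw_text: str) -> str:
    """Append the raw OCR text to the cleanup instruction."""
    return f"{CLEANUP_PROMPT}\n\nOCR output:\n{raw_text.strip()}"


def clean_text(raw_text: str, model: str = "OpenScan-1.0") -> str:
    """Run one non-interactive cleanup pass through the Ollama CLI."""
    result = subprocess.run(
        ["ollama", "run", model, build_cleanup_prompt(raw_text)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return result.stdout.strip()
```

Labeling the pasted text separately from the instruction makes it less likely that the model treats lines of the OCR output as instructions.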

📄 Example

Input (raw OCR)

H3llo W0rld!!
Th1s 1s @ t3st.

Output (cleaned)

Hello World!!
This is a test.
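
Because the cleanup step is generative, it can occasionally rewrite more than it should. One possible guard, which is not part of OpenScan, is to compare the cleaned text with the raw OCR text and flag outputs that diverge too far. The sketch uses `difflib` from the Python standard library; the helper name `looks_faithful` and the 0.5 threshold are arbitrary assumptions that would need tuning on real data.

```python
from difflib import SequenceMatcher


def similarity(raw: str, cleaned: str) -> float:
    """Character-level similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, raw, cleaned).ratio()


def looks_faithful(raw: str, cleaned: str, threshold: float = 0.5) -> bool:
    """Return True when the cleaned text still resembles the raw OCR text."""
    return similarity(raw, cleaned) >= threshold


raw = "H3llo W0rld!!\nTh1s 1s @ t3st."
cleaned = "Hello World!!\nThis is a test."
if not looks_faithful(raw, cleaned):
    print("Cleanup changed the text heavily; review it manually.")
```

For the example above only a handful of characters change, so the check passes and nothing is printed.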

⚠️ Limitations

  • Performance depends on image quality
  • Accuracy may be lower on handwritten text
  • Complex layouts may require multiple passes

🔥 Roadmap

  • Automatic image preprocessing
  • Layout detection (tables, paragraphs)
  • Multi-language support
  • Web UI and API integration

🀝 Contributing

Contributions are welcome:

  • Improve OCR accuracy
  • Enhance prompts
  • Add new output formats

📜 License

This project relies on third-party models (BakLLaVA and LLaVA). Make sure your use complies with each model's license.


🏒 Maintained By

HSR Projects


📣 Disclaimer

OpenScan-1.0 is an experimental OCR system. Results may vary depending on input quality and model limitations.


🚀 Vision

Not just OCR: intelligent text understanding.