146.6K 1 week ago

This is a highly specialized vision model with more than 2B parameters.

vision tools

Models


1 model

opan:latest

1.9GB · 256K context window · Text, Image · 1 week ago

Readme

Opan


Opan requires Ollama 0.12.7

Opan is a friendly, multimodal vision-language assistant built on Qwen3-VL:2b. It combines the powerful perception and reasoning abilities of Qwen3-VL with a warm, supportive conversational style designed by Shushank.

In this generation, Opan inherits major improvements in many areas: understanding and generating text, perceiving and reasoning about visual content, handling long contexts, understanding spatial relationships, interpreting documents, and assisting with everyday tasks — offering a smooth and helpful AI experience.

Models

Opan (based on Qwen3-VL:2b)

Run

ollama run aeline/opan

Key features

  • Friendly Conversational Intelligence.

Opan is tuned with a custom system prompt that makes it warm, polite, and easy to understand. It explains concepts clearly and provides thoughtful, supportive responses designed for all users.

  • Multimodal Vision-Language Understanding.

Since Opan runs on Qwen3-VL, it can understand images and text together, recognize objects, interpret documents, describe scenes, and answer questions about visual content.
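As a minimal sketch of how an image-plus-text request could be sent to this model, the snippet below builds a payload for Ollama's REST API (`POST /api/chat`), which accepts base64-encoded images alongside the prompt. The helper name `build_vision_request` and the image bytes are illustrative, not part of Opan itself:

```python
import base64
import json

# Build a chat request body for Ollama's /api/chat endpoint.
# The image bytes below are a stand-in; pass real image file contents.
def build_vision_request(prompt: str, image_bytes: bytes) -> dict:
    return {
        "model": "aeline/opan",
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Ollama expects images as base64-encoded strings.
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
        "stream": False,
    }

payload = build_vision_request("Describe this scene.", b"\x89PNG...")
print(json.dumps(payload)[:60])
```

The resulting JSON can be POSTed to a running Ollama server at `http://localhost:11434/api/chat`.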

  • Improved Text Understanding & Generation.

Opan benefits from early-stage joint pretraining, giving it strong text reasoning skills. It handles general knowledge, guided explanations, learning assistance, and everyday conversation with ease.

  • Spatial & Visual Reasoning.

Opan can understand object relationships, positions, shapes, diagrams, and UI layouts, making it well suited to analyzing structured images, charts, and designs.

  • Enhanced OCR in 32 Languages.

Like Qwen3-VL, Opan can read text from images in 32 languages, even in challenging conditions such as blur, tilt, or low light. It can also interpret long documents while preserving their structure.

  • Long Context Support.

Opan supports a 256K-token context window, extendable toward 1M depending on configuration. This allows inputs such as:

    • Full textbooks

    • Long PDF documents

    • Extended conversations

    • Multi-image sequences

Opan can recall details across very long contexts more reliably.
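As a sketch, the context window can be raised per request through the `num_ctx` option in Ollama's chat API; the helper name below is illustrative, and whether 256K (or more) is actually usable depends on your hardware and model configuration:

```python
# Build a chat request that asks Ollama for a larger context window
# via the per-request "options" field. 262144 tokens = 256K.
def build_long_context_request(prompt: str, num_ctx: int = 262144) -> dict:
    return {
        "model": "aeline/opan",
        "messages": [{"role": "user", "content": prompt}],
        "options": {"num_ctx": num_ctx},
        "stream": False,
    }

req = build_long_context_request("Summarize the attached chapters.")
print(req["options"])
```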

Visual Coding Capabilities

Opan can generate code based on visual input — for example:

  • Convert UI mockups into HTML/CSS

  • Generate JavaScript from diagrams

  • Interpret flowcharts and transform them into code

This enables “what you see is what you get” style visual-to-code workflows.
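As an illustrative sketch (not part of the model itself), a visual-to-code workflow might send a mockup with a prompt like “Convert this UI mockup into HTML/CSS” and then pull the generated code out of the model's reply. The `extract_code` helper and the mocked reply below are assumptions for demonstration:

```python
import re

# Extract the first fenced code block from a model reply.
# The reply here is mocked; a real one would come from /api/chat.
def extract_code(reply: str) -> str:
    match = re.search(r"```[a-z]*\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else ""

mock_reply = "Here is the markup:\n```html\n<button>OK</button>\n```"
print(extract_code(mock_reply))
```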

Stronger Reasoning (Inherited from Qwen Thinking Models)

While not fine-tuned as a dedicated thinking model, Opan still benefits from Qwen3-VL’s improved logical structure. It can break down problems, analyze steps, and give clear reasoning, especially in STEM-related questions.

How to run

ollama run aeline/opan