InternVL3 is a Qwen2.5-based multimodal large language model from OpenGVLab that improves on its predecessor, InternVL 2.5.

tools

ollama run blaifa/InternVL3

curl http://localhost:11434/api/chat \
  -d '{
    "model": "blaifa/InternVL3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='blaifa/InternVL3',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'blaifa/InternVL3',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
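
The curl request above does not set "stream", so Ollama's chat endpoint answers with newline-delimited JSON chunks rather than one JSON object. The following is a minimal sketch, using only the Python standard library, of how those chunks could be joined into a single reply. The helper names collect_stream and chat_once are hypothetical, and the URL assumes the default local endpoint shown in the curl example.

```python
import json
import urllib.request

# Default local endpoint, taken from the curl example above.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def collect_stream(lines):
    """Join message.content from newline-delimited JSON chat chunks."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk is marked with "done": true
    return "".join(parts)


def chat_once(prompt, model="blaifa/InternVL3"):
    """Send one user message and return the full streamed reply as text."""
    payload = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Iterating over the HTTP response yields one line (one chunk) at a time.
    with urllib.request.urlopen(request) as response:
        return collect_stream(response)


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(chat_once("Hello!"))
```

Adding "stream": false to the request body is the simpler option when incremental output is not needed, since the server then returns one JSON object.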

Details

Updated 11 months ago

70a4b5f65a88 · 4.7GB

model

arch: qwen2

parameters: 7.61B

quantization: Q4_K_M

4.7GB
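
As a rough sanity check on the figures above, the file size and parameter count imply an average storage cost per weight. This sketch assumes the listed "4.7GB" means decimal gigabytes and that the whole file is weight data, which slightly overstates the result because the file also holds metadata. The function name bits_per_weight is illustrative.

```python
def bits_per_weight(file_size_bytes, parameter_count):
    """Average number of bits stored per model parameter."""
    return file_size_bytes * 8 / parameter_count


size_bytes = 4.7e9  # "4.7GB" from the listing, assumed to be decimal GB
params = 7.61e9     # "7.61B" parameters from the listing

print(f"{bits_per_weight(size_bytes, params):.2f} bits per weight")  # prints 4.94 bits per weight
```

A value a little under 5 bits is consistent with a 4-bit mixed quantization such as Q4_K_M, where some tensors are kept at higher precision.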

template

{{- if .Suffix }}<|fim_prefix|>{{ .Prompt }}<|fim_suffix|>{{ .Suffix }}<|fim_middle|> {{- else if .M

1.6kB

Readme

InternVL3 Summary

InternVL3 is a new multimodal large language model that represents a significant advancement over its predecessor, InternVL 2.5.

Key Improvements

Enhanced Core Capabilities

Superior multimodal perception and reasoning
Better overall text performance than comparable models like Qwen2.5 Chat

Expanded Functionality

Tool usage integration
GUI agent capabilities
Industrial image analysis
3D vision perception
Additional multimodal applications
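
The tool usage listed above could be exercised through Ollama's chat endpoint, which accepts a "tools" array of function schemas and may return "tool_calls" in the assistant message. The sketch below uses only the Python standard library. The add_numbers tool and the helper names are hypothetical and exist only for illustration, and the page does not confirm how reliably this particular build emits tool calls.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint


def add_numbers(a, b):
    """Hypothetical tool used only for this illustration."""
    return a + b


# Maps tool names the model may request to local Python callables.
TOOLS = {"add_numbers": add_numbers}

# Function schema advertised to the model.
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "add_numbers",
            "description": "Add two numbers and return the sum.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["a", "b"],
            },
        },
    }
]


def build_request(prompt, model="blaifa/InternVL3"):
    """Build a non-streaming chat request that advertises the tools."""
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOL_SCHEMAS,
    }


def run_tool_calls(message):
    """Execute every tool call in an assistant message and return the results."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOLS[function["name"]]
        results.append(handler(**function["arguments"]))
    return results


def ask(prompt):
    """POST the request and return the assistant message as a dict."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    message = ask("What is 2 + 3?")
    print(run_tool_calls(message) or message.get("content"))
```

In a full agent loop, each tool result would be sent back to the model as a message with role "tool" so that it can compose a final answer. That step is omitted here for brevity.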

Technical Innovation

The model benefits from Native Multimodal Pre-Training, which allows it to outperform even the Qwen2.5 series in text tasks, despite using Qwen2.5’s pre-trained base models as initialization for its language component.

Bottom Line

InternVL3 combines stronger core perception, reasoning, and text capabilities with a broader range of practical applications across visual, textual, and interactive domains.