blaifa/
InternVL3:latest

350 2 months ago

InternVL3 is a Qwen2.5 based multimodal large language model from OpenGVLab that represents a significant advancement over its predecessor, InternVL 2.5.

tools

2 months ago

70a4b5f65a88 · 4.7GB ·

qwen2
·
7.61B
·
Q4_K_M
{{- if .Suffix }}<|fim_prefix|>{{ .Prompt }}<|fim_suffix|>{{ .Suffix }}<|fim_middle|> {{- else if .M

Readme

InternVL3 Summary

InternVL3 is a new multimodal large language model that represents a significant advancement over its predecessor, InternVL 2.5.

Key Improvements

Enhanced Core Capabilities

  • Superior multimodal perception and reasoning
  • Better overall text performance than comparable models like Qwen2.5 Chat

Expanded Functionality

  • Tool usage integration
  • GUI agent capabilities
  • Industrial image analysis
  • 3D vision perception
  • Additional multimodal applications

Technical Innovation

The model benefits from Native Multimodal Pre-Training, which allows it to outperform even the Qwen2.5 series in text tasks, despite using Qwen2.5’s pre-trained base models as initialization for its language component.

Bottom Line

InternVL3 pushes the boundaries of what multimodal AI can do by combining stronger foundational capabilities with a broader range of practical applications across visual, textual, and interactive domains.