6 20 hours ago

is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding.

vision

Models

View all →

Readme

No readme