๐ŸŒ‹ LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

Vision 7B 13B 34B

410.5K Pulls Updated 5 months ago

98 Tags

d5ca8c59f62d ยท 46B
{{ .System }} USER: {{ .Prompt }} ASSSISTANT: