๐ŸŒ‹ LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

Vision 7B 13B 34B

1.4M Pulls Updated 8 months ago

98 Tags

f02dd72bb242 ยท 59B
{ "stop": [ "<|im_start|>", "<|im_end|>" ] }