๐ŸŒ‹ LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

135.4K Pulls Updated 2 months ago

98 Tags

ed11eda7790d ยท 30B
{ "stop": [ "[INST]", "[/INST]" ] }