🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

vision 7b 13b 34b

2.2M 10 months ago

ed11eda7790d · 30B
{
  "stop": [
    "[INST]",
    "[/INST]"
  ]
}
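
The `stop` parameter above lists strings at which generation should end. As a rough illustration of that idea, the sketch below parses the same JSON and cuts a piece of text at the earliest stop string. The helper `truncate_at_stop` and the sample text are hypothetical and written only for this example; this is not the runtime's actual implementation, which may handle stop sequences differently (for instance, at the token level).

```python
import json

# The parameter block from this page, parsed as JSON.
params = json.loads('{"stop": ["[INST]", "[/INST]"]}')


def truncate_at_stop(text, stops):
    """Hypothetical helper: return text up to the earliest stop string, if any."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up sample output that runs on into a new instruction turn.
sample = "A cat is sitting on a sofa.[INST] What else do you see?"
print(truncate_at_stop(sample, params["stop"]))
```

Text containing neither marker is returned unchanged, so the helper is safe to apply to every response.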