llava

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder with the Vicuna language model for general-purpose visual and language understanding. Updated to version 1.6.

vision 7b 13b 34b

14.3M pulls, 98 tags, updated 2 years ago

llava-llama3

A LLaVA model fine-tuned from Llama 3 Instruct that scores better on several benchmarks.

vision 8b

2.3M pulls, 4 tags, updated 2 years ago

bakllava

BakLLaVA is a multimodal model consisting of the Mistral 7B base model augmented with the LLaVA architecture.

vision 7b

858.1K pulls, 17 tags, updated 2 years ago

llava-phi3

A new small LLaVA model fine-tuned from Phi 3 Mini.

vision 3.8b

291.7K pulls, 4 tags, updated 2 years ago
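All of the models above are tagged "vision", meaning they accept an image alongside a text prompt. The following is a minimal sketch of querying one of them from Python. It assumes a local Ollama server on its default address (http://localhost:11434) and uses the server's `/api/generate` endpoint, which accepts images as a list of base64-encoded strings. The helper names `build_vision_request` and `describe_image` are hypothetical, chosen for illustration only.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate payload carrying one base64-encoded image."""
    return {
        "model": model,  # e.g. "llava", "llava-llama3", "bakllava", "llava-phi3"
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }


def describe_image(model: str, prompt: str, image_path: str) -> str:
    """Read an image file, send it to the server, and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_vision_request(model, prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the reply is one JSON object whose
        # "response" field holds the model's generated text.
        return json.loads(resp.read())["response"]
```

Usage, after fetching a model with `ollama pull llava`: `describe_image("llava", "What is in this picture?", "photo.jpg")`, where `photo.jpg` is a placeholder path. Swapping the model name for a size-specific tag (for example one of the 7b, 13b, or 34b variants listed for llava) should work the same way, since only the `model` field changes.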