-
llava
🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.
Vision · 7B, 13B, 34B · 1M Pulls · 98 Tags · Updated 7 months ago
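Every model in this list is run the same way through Ollama. A minimal sketch of asking llava about a local image, assuming the `ollama` Python client (`pip install ollama`) and a running Ollama server; `./photo.jpg` is a placeholder path:

```python
import ollama

# Send a prompt plus a local image to llava; the client reads the
# file from disk and attaches it to the message.
response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "Describe this image in one sentence.",
            "images": ["./photo.jpg"],  # placeholder path
        }
    ],
)
print(response["message"]["content"])
```

The same call should work for any entry below by swapping the `model` string for that entry's name (e.g. `llava-llama3` or `moondream`).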
-
llava-llama3
A LLaVA model fine-tuned from Llama 3 Instruct, with better scores on several benchmarks.
Vision · 8B · 152.9K Pulls · 4 Tags · Updated 4 months ago
-
moondream
moondream2 is a small vision language model designed to run efficiently on edge devices.
Vision · 50K Pulls · 18 Tags · Updated 4 months ago
-
bakllava
BakLLaVA is a multimodal model consisting of the Mistral 7B base model augmented with the LLaVA architecture.
Vision · 7B · 42.7K Pulls · 17 Tags · Updated 9 months ago
-
llava-phi3
A new small LLaVA model fine-tuned from Phi 3 Mini.
Vision · 3B · 35.2K Pulls · 4 Tags · Updated 4 months ago
-
hhao/openbmb-minicpm-llama3-v-2_5
MiniCPM-V surpasses proprietary models such as GPT-4V, Gemini Pro, Qwen-VL and Claude 3 in overall performance, and supports multimodal conversation in over 30 languages.
Vision · 34.3K Pulls · 8 Tags · Updated 3 months ago
-
aiden_lu/minicpm-v2.6
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5.
Vision · 7B · 26.4K Pulls · 1 Tag · Updated 5 weeks ago
-
minicpm-v
A series of multimodal LLMs (MLLMs) designed for vision-language understanding.
Vision · 7B · 11.2K Pulls · 17 Tags · Updated 8 days ago
-
xuxx/minicpm2.6
minicpm2.6
Vision · 7B · 4,893 Pulls · 1 Tag · Updated 5 weeks ago
-
benzie/llava-phi-3
A lightweight vision model.
Vision · 3B · 2,640 Pulls · 1 Tag · Updated 4 months ago
-
xiayu/openbmb-minicpm-llama3-v-2_5
Vision · 8B · 1,464 Pulls · 2 Tags · Updated 3 months ago
-
mskimomadto/chat-gph-vision
GPH Vision LLM: Transforming Industries through Intelligent Solutions
Vision · 8B · 1,355 Pulls · 1 Tag · Updated 2 months ago
-
0ssamaak0/xtuner-llava
Family of LLaVA models fine-tuned from Llama3-8B Instruct, Phi3-mini and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.
Vision · 3B, 8B · 1,298 Pulls · 4 Tags · Updated 4 months ago
-
srizon/pixie
Pixie is a combined model powered by dolphin-llama3 and llava. She can break complex problems into smaller pieces and find the best solutions using her own approach, and she can read images as well as text.
Vision · 8B · 923 Pulls · 1 Tag · Updated 4 months ago
-
knoopx/llava-phi-2
A lightweight and fast vision model that does a decent job describing photos.
Vision · 3B · 780 Pulls · 2 Tags · Updated 6 months ago
-
rohithbojja/llava-med-v1.6
Vision · 7B · 770 Pulls · 1 Tag · Updated 4 months ago
-
nsheth/llama-3-lumimaid-8b-v0.1-iq-imatrix
Uses a single Q4_K_M-imat (4.89 BPW) quant, supporting context sizes up to 12288 in less than 8 GB of VRAM.
Vision · 8B · 737 Pulls · 1 Tag · Updated 4 months ago
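Ollama won't use the 12288-token window mentioned above unless it is requested, since the server defaults to a shorter context. A hedged sketch using the standard `num_ctx` option with the Python client; the prompt is a placeholder:

```python
import ollama

# Request the 12288-token context this quant is advertised for;
# without this option Ollama uses its shorter default context.
response = ollama.generate(
    model="nsheth/llama-3-lumimaid-8b-v0.1-iq-imatrix",
    prompt="Summarize the following document: ...",  # placeholder prompt
    options={"num_ctx": 12288},
)
print(response["response"])
```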
-
anas/video-llava
Vision · 7B · 712 Pulls · 2 Tags · Updated 4 months ago
-
bigbug/minicpm-v2.5
Vision · 687 Pulls · 1 Tag · Updated 3 months ago
-
knoopx/mobile-vlm
A lightweight and fast vision model that does a decent job captioning photos.
Vision · 3B · 624 Pulls · 1 Tag · Updated 6 months ago
-
jyan1/paligemma-mix-224
PaliGemma is a versatile and lightweight vision-language model based on open components such as the SigLIP vision model and the Gemma language model.
Vision · 613 Pulls · 1 Tag · Updated 2 weeks ago
-
qnguyen3/nanollava
Vision · 0.5B · 605 Pulls · 1 Tag · Updated 4 months ago
-
nsheth/llava-llama-3-8b-v1_1-int4
Vision · 8B · 601 Pulls · 1 Tag · Updated 4 months ago
-
ManishThota/llava_next_video
LLaVA-NeXT Video 7B DPO, which can process video and multiple images at once.
Vision · 7B · 532 Pulls · 1 Tag · Updated 2 months ago
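Because this model accepts several images per request, a set of extracted video frames can go into one message. A minimal sketch with the Python client; the frame paths are placeholders, and the frames are assumed to be extracted beforehand:

```python
import ollama

# Attach multiple frames to a single message; a multi-image model
# sees them all as context for one answer.
frames = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]  # placeholders
response = ollama.chat(
    model="ManishThota/llava_next_video",
    messages=[
        {
            "role": "user",
            "content": "What happens across these frames?",
            "images": frames,
        }
    ],
)
print(response["message"]["content"])
```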
-
mannix/llava-phi3
A new small LLaVA model fine-tuned from Phi 3 Mini [I-Quants]
Vision · 3B · 500 Pulls · 4 Tags · Updated 3 months ago