PaliGemma is a versatile and lightweight vision-language model based on open components such as the SigLIP vision model and the Gemma language model.
2,839 Pulls Updated 2 months ago
7045408dd656 · 5.9GB
Model information
Description
PaliGemma is a versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma language model. It takes both image and text as input and generates text as output, supporting multiple languages. It is designed for class-leading fine-tuning performance on a wide range of vision-language tasks such as image and short-video captioning, visual question answering, text reading, object detection, and object segmentation.
Model architecture
PaliGemma combines a Transformer decoder with a Vision Transformer image encoder, for a total of 3 billion parameters. The text decoder is initialized from Gemma-2B, and the image encoder is initialized from SigLIP-So400m/14. PaliGemma is trained following the PaLI-3 recipe.
Inputs and outputs
- Input: Image and text string, such as a prompt to caption the image, or a question.
- Output: Generated text in response to the input, such as a caption of the image, an answer to a question, a list of object bounding box coordinates, or segmentation codewords.
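To make the bounding-box output concrete, the sketch below parses a detection string into pixel coordinates. It rests on an assumption this page does not state: that each box is emitted as four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, with values binned to 0..1023, followed by a label, and that multiple boxes are separated by `;`. The helper name `parse_boxes` and the sample string are hypothetical.

```python
import re

# Assumed format (not stated on this page): four <locNNNN> tokens per box,
# ordered y_min, x_min, y_max, x_max, each in 0..1023, followed by a label.
BOX_PATTERN = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")

def parse_boxes(text, width, height):
    """Convert location tokens into pixel-space boxes (hypothetical helper)."""
    boxes = []
    for tokens, label in BOX_PATTERN.findall(text):
        # Normalize each 4-digit bin index to the 0..1 range.
        y0, x0, y1, x1 = (int(v) / 1024 for v in re.findall(r"\d{4}", tokens))
        boxes.append({
            "label": label.strip(),
            # Stored as (x_min, y_min, x_max, y_max) in pixels.
            "box": (round(x0 * width), round(y0 * height),
                    round(x1 * width), round(y1 * height)),
        })
    return boxes

sample = "<loc0256><loc0128><loc0768><loc0896> dog"
print(parse_boxes(sample, width=224, height=224))
```

If the model's actual token layout differs from this assumption, only the regular expression and the unpacking order need to change.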
Usage:
Ensure you are on this branch: Josh and Roy’s Paligemma Support :)
Run the model:
ollama run jyan1/paligemma-mix-224
Then, at the prompt, include the path to your image:
>>> What is in this image? /path/to/paligemma/puppy.jpg
Added image '/path/to/paligemma/puppy.jpg'
A brown dog wearing a floral shirt and lei stands proudly next to a clear blue
pool. The dog's mouth is open, its paw rests on the edge of the water, and its
eyes are focused on the horizon. The pool water is crystal clear, and the palm
trees in the distance provide shade for the dog. A black leash connects the dog
to its owner, and a flower lei is around the dog's neck. The dog's fur is brown,
and its nose is black. The tree behind the pool is tall and slender, and the
fence surrounding the pool is made of metal posts.
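The same question can also be asked programmatically. The sketch below assumes a local Ollama server on its default port (11434) and that the branch build mentioned above exposes the standard `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. The helper names `build_payload` and `ask` are hypothetical, and only the Python standard library is used.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, image_bytes, model="jyan1/paligemma-mix-224"):
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def ask(prompt, image_path):
    """Send the prompt and image to the server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `ask("What is in this image?", "/path/to/paligemma/puppy.jpg")` should return a caption similar to the one shown above. Unlike the interactive prompt, the image is sent separately from the prompt text rather than as a path inside it.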