vision tools

Readme

This is a fine-tune of Qwen2.5-VL-3B-Instruct, trained on image-text pairs.

Try: write a perverted comment about this picture