This model requires Ollama 0.6 or later.
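To confirm your local install meets this requirement, you can check the version from a terminal (assuming Ollama is already installed):

```
ollama --version
```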
Gemma is a lightweight family of models from Google built on Gemini technology. The Gemma 3 models are multimodal, handling both text and image input, and feature a 128K context window with support for over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel in tasks like question answering, summarization, and reasoning, while their compact design allows deployment on resource-limited devices.
1B parameter model (32k context window)

```
ollama run bsahane/gemma3:1b
```

4B parameter model (128k context window)

```
ollama run bsahane/gemma3:4b
```

12B parameter model (128k context window)

```
ollama run bsahane/gemma3:12b
```

27B parameter model (128k context window)

```
ollama run bsahane/gemma3:27b
```
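Because the 4B and larger variants accept image input, you can also query the model through Ollama's local REST API once the server is running. The sketch below is illustrative only: the prompt, image file, and `num_ctx` value are placeholders, and the image must be supplied as a base64 string.

```
# Assumes the Ollama server is listening on the default port 11434 and the
# image has been base64-encoded beforehand, e.g. base64 -w0 photo.jpg > photo.b64
curl http://localhost:11434/api/generate -d '{
  "model": "bsahane/gemma3:4b",
  "prompt": "Describe this image in one sentence.",
  "images": ["<contents of photo.b64>"],
  "options": { "num_ctx": 8192 },
  "stream": false
}'
```

The `num_ctx` option caps the context length for the request; 8192 is just an example value, and the 4B, 12B, and 27B variants support up to 128K tokens.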
These models were evaluated against a broad collection of datasets and metrics covering different aspects of text generation:
Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
---|---|---|---|---|---|
HellaSwag | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 |
BoolQ | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 |
PIQA | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 |
SocialIQA | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 |
TriviaQA | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 |
Natural Questions | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 |
ARC-c | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 |
ARC-e | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 |
WinoGrande | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 |
BIG-Bench Hard | – | 28.4 | 50.9 | 72.6 | 77.7 |
DROP | 3-shot, F1 | 42.4 | 60.1 | 72.2 | 77.2 |
AGIEval | 3-5-shot | 22.2 | 42.1 | 57.4 | 66.2 |
MMLU | 5-shot, top-1 | 26.5 | 59.6 | 74.5 | 78.6 |
MATH | 4-shot | – | 24.2 | 43.3 | 50.0 |
GSM8K | 5-shot, maj@1 | 1.36 | 38.4 | 71.0 | 82.6 |
GPQA | – | 9.38 | 15.0 | 25.4 | 24.3 |
MMLU (Pro) | 5-shot | 11.2 | 23.7 | 40.8 | 43.9 |
MBPP | 3-shot | 9.80 | 46.0 | 60.4 | 65.6 |
HumanEval | pass@1 | 6.10 | 36.0 | 45.7 | 48.8 |
MMLU (Pro COT) | 5-shot | 9.7 | – | – | – |
Multilingual benchmarks:

Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
---|---|---|---|---|
MGSM | 2.04 | 34.7 | 64.3 | 74.3 |
Global-MMLU-Lite | 24.9 | 57.0 | 69.4 | 75.7 |
Belebele | 26.6 | 59.4 | 78.0 | – |
WMT24++ (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 |
FloRes | 29.5 | 39.2 | 46.0 | 48.8 |
XL-Sum | 4.82 | 8.55 | 12.2 | 14.9 |
XQuAD (all) | 43.9 | 68.0 | 74.5 | 76.8 |
Multimodal benchmarks:

Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
---|---|---|---|
COCOcap | 102 | 111 | 116 |
DocVQA (val) | 72.8 | 82.3 | 85.6 |
InfoVQA (val) | 44.1 | 54.8 | 59.4 |
MMMU (pt) | 39.2 | 50.3 | 56.1 |
TextVQA (val) | 58.9 | 66.5 | 68.6 |
RealWorldQA | 45.5 | 52.2 | 53.9 |
ReMI | 27.3 | 38.5 | 44.8 |
AI2D | 63.2 | 75.2 | 79.0 |
ChartQA | 45.4 | 60.9 | 63.8 |
ChartQA (augmented) | 81.8 | 88.5 | 88.7 |
VQAv2 | – | – | – |
BLINK | 38.0 | 35.9 | 39.6 |
OKVQA | 51.0 | 58.7 | 60.2 |
TallyQA | 42.5 | 51.8 | 54.3 |
SpatialSense VQA | 50.9 | 60.0 | 59.4 |
CountBenchQA | 26.1 | 17.8 | 68.0 |