This repo packages Google's google/medgemma-1.5-4b-it (MedGemma 1.5, 4B, instruction-tuned, multimodal) for Ollama, exposing vision and text inference plus common Ollama features such as structured JSON output and app-side tool calling.

ollama run dcarrascosa/medgemma-1.5-4b-it:Q8_0

Details: 1374b39addf1 · 5.0GB · gemma3 base · 4.3B parameters · Q8_0

MedGemma 1.5 4B IT (Ollama build)

This repository provides an Ollama-ready build of Google’s MedGemma 1.5 4B instruction-tuned model:

  • Upstream model: google/medgemma-1.5-4b-it
  • Type: Image-Text-to-Text (multimodal: text + vision input → text output)
  • Purpose: Medical text reasoning + medical image comprehension (CXR, derm, ophtho, pathology, document understanding, etc.)
  • License/Terms: Governed by Health AI Developer Foundations (HAI-DEF) terms of use.

Available tags

This repo publishes the same model under multiple precision/quantization tags so you can choose quality vs speed:

  • :F16: full precision (best quality, highest VRAM/RAM)
  • :Q8_0: high-quality quant (strong quality/speed trade-off)
  • :Q4_K_M: smaller/faster quant (best for local demos / edge-ish constraints)

Quick guidance

  • If you want maximum fidelity (especially for vision): F16
  • If you want near-F16 quality with lower memory: Q8_0
  • If you want fast + small for demos: Q4_K_M
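The choice above can also be scripted. The memory thresholds below are rough illustrative assumptions, not official requirements; measure on your own hardware:

```python
def pick_tag(free_mem_gb: float) -> str:
    """Pick a quantization tag from available memory (illustrative thresholds)."""
    if free_mem_gb >= 12:   # F16 weights of a 4B model plus runtime overhead
        return "F16"
    if free_mem_gb >= 7:    # the Q8_0 build is about 5 GB
        return "Q8_0"
    return "Q4_K_M"         # smallest published tag

model = f"dcarrascosa/medgemma-1.5-4b-it:{pick_tag(8.0)}"
print(model)  # dcarrascosa/medgemma-1.5-4b-it:Q8_0
```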

What the model is good at

Text tasks

  • Medical Q&A (non-clinical decision support)
  • Summarization of clinical notes (with proper safety framing)
  • Extraction of structured facts (best combined with Ollama structured output)
  • Drafting patient-facing explanations (with guardrails)

Vision tasks

  • Image understanding (e.g., chest X-rays) and visual Q&A
  • “Describe / list findings” style prompts
  • Combining image + text context (e.g., “compare to prior”; such workflows require validation)

Important limitations (read this)

MedGemma is intended as a developer foundation model, not a drop-in clinical system.

  • Not a medical device. Do not use outputs to make clinical diagnoses, treatment decisions, or patient management decisions.
  • Outputs require independent verification and clinical correlation by qualified professionals.
  • The upstream model is noted as not evaluated/optimized for multi-turn applications; treat multi-turn workflows as experimental and validate carefully.
  • Vision evaluation is primarily single-image; multi-image workflows may need extra preprocessing and validation.

Safety / usage disclaimer (recommended)

This model can generate medical content and may sound authoritative. Always:

  • require clinician review,
  • verify with trusted sources,
  • implement guardrails (red flags, uncertainty, escalation),
  • log and evaluate performance on representative data before any real-world use.
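As one concrete guardrail, you can scan model output for red-flag phrases before display and escalate when any appear. The phrase list below is a tiny illustrative stub, not a clinically validated set:

```python
# Hypothetical escalation check: scan a model answer for red-flag
# phrases before showing it to a user.
RED_FLAGS = {"chest pain", "black stools", "vomiting blood", "severe pain"}

def needs_escalation(answer: str) -> bool:
    """Return True if the answer mentions any red-flag phrase."""
    text = answer.lower()
    return any(flag in text for flag in RED_FLAGS)
```

In production this would be one layer among several (uncertainty phrasing, source verification, clinician review), and the list would come from reviewed clinical criteria.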

Install & run

Install Ollama

Follow the Ollama installation instructions for your OS, then verify:

ollama --version
ollama list

Run the model (text)

ollama run dcarrascosa/medgemma-1.5-4b-it:Q8_0 "Summarize possible causes of upper abdominal discomfort after overeating and list red flags."

Using structured output (JSON Schema)

Ollama supports constraining model output to valid JSON by passing a JSON Schema in the request's format field.

PowerShell example (Windows) — text-only

$schema = @{
  type="object"
  required=@("summary","differential","next_steps","red_flags")
  properties=@{
    summary=@{type="string"}
    differential=@{type="array"; items=@{type="string"}; maxItems=3}
    next_steps=@{type="array"; items=@{type="string"}; maxItems=4}
    red_flags=@{type="array"; items=@{type="string"}; maxItems=5}
  }
}

$body = @{
  model="dcarrascosa/medgemma-1.5-4b-it:Q4_K_M"
  messages=@(@{
    role="user"
    content="Return JSON only. Adult (45M) ate a large meal 2 hours ago; now mild nausea and upper abdominal fullness. No fever, vomiting, black stools. Give general guidance, differential, next steps, red flags."
  })
  format=$schema
  options=@{
    temperature=0.2
    top_p=0.9
    repeat_penalty=1.25
    num_predict=1200
  }
  stream=$false
} | ConvertTo-Json -Depth 10

$res = Invoke-RestMethod http://localhost:11434/api/chat -Method Post -ContentType "application/json" -Body $body
$res.message.content | ConvertFrom-Json | Format-List

Tip: Keep schemas practical. Overly strict schemas can reduce answer quality.
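The same structured-output request can be built from any language. A minimal Python sketch using only the standard library, targeting the same /api/chat endpoint (actually sending it requires a running Ollama server, so the call is left commented out):

```python
import json
import urllib.request

schema = {
    "type": "object",
    "required": ["summary", "differential", "next_steps", "red_flags"],
    "properties": {
        "summary": {"type": "string"},
        "differential": {"type": "array", "items": {"type": "string"}, "maxItems": 3},
        "next_steps": {"type": "array", "items": {"type": "string"}, "maxItems": 4},
        "red_flags": {"type": "array", "items": {"type": "string"}, "maxItems": 5},
    },
}

body = {
    "model": "dcarrascosa/medgemma-1.5-4b-it:Q4_K_M",
    "messages": [{
        "role": "user",
        "content": ("Return JSON only. Adult (45M) ate a large meal 2 hours ago; "
                    "now mild nausea and upper abdominal fullness. No fever, vomiting, "
                    "black stools. Give general guidance, differential, next steps, red flags."),
    }],
    "format": schema,
    "options": {"temperature": 0.2, "top_p": 0.9, "repeat_penalty": 1.25, "num_predict": 1200},
    "stream": False,
}

def chat(payload: dict, url: str = "http://localhost:11434/api/chat") -> dict:
    """POST a chat request to a local Ollama server and return the parsed reply."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# result = chat(body)                                  # needs Ollama running
# parsed = json.loads(result["message"]["content"])    # schema-constrained JSON
```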


Using vision (image + text)

Ollama vision works through /api/chat by passing base64 images in messages[].images.

PowerShell example — vision + structured JSON

$imgPath = "C:\Users\User\Downloads\00000001_000.png"
$imgB64  = [Convert]::ToBase64String([IO.File]::ReadAllBytes($imgPath))

$schema = @{
  type="object"
  required=@("findings","differential","next_steps")
  properties=@{
    findings=@{type="array"; items=@{type="string"}}
    differential=@{type="array"; items=@{type="string"}; maxItems=3}
    next_steps=@{type="array"; items=@{type="string"}; maxItems=3}
  }
}

$body = @{
  model="dcarrascosa/medgemma-1.5-4b-it:Q8_0"
  messages=@(@{
    role="user"
    content="Return JSON only. Analyze this chest X-ray. Provide findings, differential (max 3), next steps (max 3). Avoid repetition."
    images=@($imgB64)
  })
  format=$schema
  options=@{
    temperature=0.2
    repeat_penalty=1.25
    num_predict=1200
  }
  stream=$false
} | ConvertTo-Json -Depth 10

$res = Invoke-RestMethod http://localhost:11434/api/chat -Method Post -ContentType "application/json" -Body $body
$res.message.content | ConvertFrom-Json | Format-List

Important: For medical imaging, treat outputs as non-diagnostic and validate with a clinician and real datasets.
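For non-Windows users, the vision request looks the same in Python: read the file, base64-encode it, and put the string in messages[].images (Ollama expects raw base64, no data: URI prefix). The bytes below are a stand-in so the sketch is self-contained; replace them with a real file read:

```python
import base64
import json

# Stand-in for: base64.b64encode(open("cxr.png", "rb").read()).decode()
img_b64 = base64.b64encode(b"\x89PNG...demo bytes").decode()

body = {
    "model": "dcarrascosa/medgemma-1.5-4b-it:Q8_0",
    "messages": [{
        "role": "user",
        "content": ("Return JSON only. Analyze this chest X-ray. Provide findings, "
                    "differential (max 3), next steps (max 3). Avoid repetition."),
        "images": [img_b64],   # list of raw base64 strings
    }],
    "stream": False,
}
payload = json.dumps(body)     # POST this to http://localhost:11434/api/chat
```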


Tool calling (function calling) — how to think about it

Ollama can return structured “tool calls” if your application layer implements a tool schema and runs the tools. The model doesn’t execute tools by itself—you do.

Typical loop:

  1. user question
  2. model outputs a tool call (JSON)
  3. your app executes tool (DB lookup / RAG / calculator)
  4. your app feeds results back to the model
  5. model finalizes answer
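The app-side part of that loop (steps 3–4) can be sketched as follows. The bmi tool and the simulated call are hypothetical; in practice the call JSON comes from the model's response rather than being hard-coded:

```python
import json

# Hypothetical tool registry: name -> callable taking a dict of arguments.
TOOLS = {
    "bmi": lambda args: round(args["kg"] / (args["m"] ** 2), 1),
}

def run_tool_call(call: dict):
    """Step 3: validate and execute a tool call emitted by the model."""
    name = call.get("name")
    if name not in TOOLS:              # treat tool args as untrusted input
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](call.get("arguments", {}))

# Pretend the model emitted this in step 2:
call = {"name": "bmi", "arguments": {"kg": 80, "m": 1.8}}
result = run_tool_call(call)                                          # step 3
followup = {"role": "tool", "content": json.dumps({"bmi": result})}   # step 4
```

The followup message is appended to the conversation and sent back so the model can finalize its answer (step 5).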

Best use cases:

  • RAG retrieval (guidelines, manuals, policies)
  • Clinical workflow orchestration (non-decision support)
  • Calling internal APIs (EHR integrations, scheduling)

Security tip: Validate tool args. Treat them as untrusted input.


“Thinking” / reasoning trace

Some models emit visible “thinking” tokens depending on the template and model behavior. For production:

  • prefer structured outputs for machine parsing,
  • keep “thinking” out of the user-facing response,
  • log it separately only if needed.

If you need strict separation (final vs reasoning), enforce it via JSON schema fields like { "final": "...", "reasoning": "..." }.
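A minimal sketch of that separation, assuming the final/reasoning schema suggested above:

```python
import json

# Schema constraining the model to split user-facing text from its trace.
schema = {
    "type": "object",
    "required": ["final", "reasoning"],
    "properties": {
        "final": {"type": "string"},
        "reasoning": {"type": "string"},
    },
}

def split_response(raw: str) -> tuple:
    """Show only `final` to the user; log `reasoning` separately."""
    parsed = json.loads(raw)
    return parsed["final"], parsed["reasoning"]

final, reasoning = split_response(
    '{"final": "Hydrate and rest.", "reasoning": "Mild symptoms, no red flags."}'
)
```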


Recommended parameters

These are sane defaults for a “medical assistant” style:

  • temperature: 0.2
  • top_p: 0.9
  • repeat_penalty: 1.25
  • num_predict: 1200 (raise to 4096 if you want long structured outputs)

Attribution

This Ollama repo packages the upstream model:

  • Google MedGemma 1.5 4B IT: google/medgemma-1.5-4b-it

Please cite the upstream technical report when appropriate:

Sellergren et al., “MedGemma Technical Report.” arXiv:2507.05201 (2025).


License / Terms

Use is governed by the Health AI Developer Foundations terms of use. Ensure your downstream use and any redistribution complies with those terms.