This repo packages Google's google/medgemma-1.5-4b-it (MedGemma 1.5, 4B, instruction-tuned, multimodal) for Ollama, exposing vision and text inference plus common Ollama features such as structured JSON output and app-side tool calling.

ollama run dcarrascosa/medgemma-1.5-4b-it:Q8_0

Details: 1374b39addf1 · 5.0GB · gemma3 base · 4.3B parameters · Q8_0

MedGemma 1.5 4B IT (Ollama build)

This repository provides an Ollama-ready build of Google’s MedGemma 1.5 4B instruction-tuned model:

  • Upstream model: google/medgemma-1.5-4b-it
  • Type: Image-Text-to-Text (multimodal: text + vision input → text output)
  • Purpose: Medical text reasoning + medical image comprehension (CXR, derm, ophtho, pathology, document understanding, etc.)
  • License/Terms: Governed by Health AI Developer Foundations (HAI-DEF) terms of use.

Available tags

This repo publishes the same model under multiple precision/quantization tags so you can choose quality vs speed:

  • :F16: full precision (best quality, highest VRAM/RAM)
  • :Q8_0: high-quality quant (strong quality/speed trade-off)
  • :Q4_K_M: smaller/faster quant (best for local demos / edge-ish constraints)

Quick guidance

  • If you want maximum fidelity (especially for vision): F16
  • If you want near-F16 quality with lower memory: Q8_0
  • If you want fast + small for demos: Q4_K_M
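The choice above can also be scripted. The memory thresholds below are rough illustrative assumptions, not official requirements; measure on your own hardware:

```python
def pick_tag(free_mem_gb: float) -> str:
    """Pick a quantization tag from available memory (illustrative thresholds)."""
    if free_mem_gb >= 12:   # F16 weights of a 4B model plus runtime overhead
        return "F16"
    if free_mem_gb >= 7:    # the Q8_0 build is about 5 GB
        return "Q8_0"
    return "Q4_K_M"         # smallest published tag

model = f"dcarrascosa/medgemma-1.5-4b-it:{pick_tag(8.0)}"
print(model)  # dcarrascosa/medgemma-1.5-4b-it:Q8_0
```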

What the model is good at

Text tasks

  • Medical Q&A (non-clinical decision support)
  • Summarization of clinical notes (with proper safety framing)
  • Extraction of structured facts (best combined with Ollama structured output)
  • Drafting patient-facing explanations (with guardrails)

Vision tasks

  • Image understanding (e.g., chest X-rays) and visual Q&A
  • “Describe / list findings” style prompts
  • Combining image + text context (e.g., “compare to prior”; such workflows require validation)

Important limitations (read this)

MedGemma is intended as a developer foundation model, not a drop-in clinical system.

  • Not a medical device. Do not use outputs to make clinical diagnoses, treatment decisions, or patient management decisions.
  • Outputs require independent verification and clinical correlation by qualified professionals.
  • The upstream model is noted as not evaluated/optimized for multi-turn applications; treat multi-turn workflows as experimental and validate carefully.
  • Vision evaluation is primarily single-image; multi-image workflows may need extra preprocessing and validation.

Safety / usage disclaimer (recommended)

This model can generate medical content and may sound authoritative. Always:

  • require clinician review,
  • verify with trusted sources,
  • implement guardrails (red flags, uncertainty, escalation),
  • log and evaluate performance on representative data before any real-world use.
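As one concrete guardrail, you can scan model output for red-flag phrases before display and escalate when any appear. The phrase list below is a tiny illustrative stub, not a clinically validated set:

```python
# Hypothetical escalation check: scan a model answer for red-flag
# phrases before showing it to a user.
RED_FLAGS = {"chest pain", "black stools", "vomiting blood", "severe pain"}

def needs_escalation(answer: str) -> bool:
    """Return True if the answer mentions any red-flag phrase."""
    text = answer.lower()
    return any(flag in text for flag in RED_FLAGS)
```

In production this would be one layer among several (uncertainty phrasing, source verification, clinician review), and the list would come from reviewed clinical criteria.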

Install & run

Install Ollama

Follow the Ollama installation instructions for your OS, then verify:

ollama --version
ollama list

Run the model (text)

ollama run dcarrascosa/medgemma-1.5-4b-it:Q8_0 "Summarize possible causes of upper abdominal discomfort after overeating and list red flags."

Using structured output (JSON Schema)

Ollama supports constraining model output to valid JSON by passing a JSON Schema in the request's format field.

PowerShell example (Windows) — text-only

$schema = @{
  type="object"
  required=@("summary","differential","next_steps","red_flags")
  properties=@{
    summary=@{type="string"}
    differential=@{type="array"; items=@{type="string"}; maxItems=3}
    next_steps=@{type="array"; items=@{type="string"}; maxItems=4}
    red_flags=@{type="array"; items=@{type="string"}; maxItems=5}
  }
}

$body = @{
  model="dcarrascosa/medgemma-1.5-4b-it:Q4_K_M"
  messages=@(@{
    role="user"
    content="Return JSON only. Adult (45M) ate a large meal 2 hours ago; now mild nausea and upper abdominal fullness. No fever, vomiting, black stools. Give general guidance, differential, next steps, red flags."
  })
  format=$schema
  options=@{
    temperature=0.2
    top_p=0.9
    repeat_penalty=1.25
    num_predict=1200
  }
  stream=$false
} | ConvertTo-Json -Depth 10

$res = Invoke-RestMethod http://localhost:11434/api/chat -Method Post -ContentType "application/json" -Body $body
$res.message.content | ConvertFrom-Json | Format-List

Tip: Keep schemas practical. Overly strict schemas can reduce answer quality.
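The same structured-output request can be built from any language. A minimal Python sketch using only the standard library, targeting the same /api/chat endpoint (actually sending it requires a running Ollama server, so the call is left commented out):

```python
import json
import urllib.request

schema = {
    "type": "object",
    "required": ["summary", "differential", "next_steps", "red_flags"],
    "properties": {
        "summary": {"type": "string"},
        "differential": {"type": "array", "items": {"type": "string"}, "maxItems": 3},
        "next_steps": {"type": "array", "items": {"type": "string"}, "maxItems": 4},
        "red_flags": {"type": "array", "items": {"type": "string"}, "maxItems": 5},
    },
}

body = {
    "model": "dcarrascosa/medgemma-1.5-4b-it:Q4_K_M",
    "messages": [{
        "role": "user",
        "content": ("Return JSON only. Adult (45M) ate a large meal 2 hours ago; "
                    "now mild nausea and upper abdominal fullness. No fever, vomiting, "
                    "black stools. Give general guidance, differential, next steps, red flags."),
    }],
    "format": schema,
    "options": {"temperature": 0.2, "top_p": 0.9, "repeat_penalty": 1.25, "num_predict": 1200},
    "stream": False,
}

def chat(payload: dict, url: str = "http://localhost:11434/api/chat") -> dict:
    """POST a chat request to a local Ollama server and return the parsed reply."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# result = chat(body)                                  # needs Ollama running
# parsed = json.loads(result["message"]["content"])    # schema-constrained JSON
```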


Using vision (image + text)

Ollama vision works through /api/chat by passing base64 images in messages[].images.

PowerShell example — vision + structured JSON

$imgPath = "C:\Users\User\Downloads\00000001_000.png"
$imgB64  = [Convert]::ToBase64String([IO.File]::ReadAllBytes($imgPath))

$schema = @{
  type="object"
  required=@("findings","differential","next_steps")
  properties=@{
    findings=@{type="array"; items=@{type="string"}}
    differential=@{type="array"; items=@{type="string"}; maxItems=3}
    next_steps=@{type="array"; items=@{type="string"}; maxItems=3}
  }
}

$body = @{
  model="dcarrascosa/medgemma-1.5-4b-it:Q8_0"
  messages=@(@{
    role="user"
    content="Return JSON only. Analyze this chest X-ray. Provide findings, differential (max 3), next steps (max 3). Avoid repetition."
    images=@($imgB64)
  })
  format=$schema
  options=@{
    temperature=0.2
    repeat_penalty=1.25
    num_predict=1200
  }
  stream=$false
} | ConvertTo-Json -Depth 10

$res = Invoke-RestMethod http://localhost:11434/api/chat -Method Post -ContentType "application/json" -Body $body
$res.message.content | ConvertFrom-Json | Format-List

Important: For medical imaging, treat outputs as non-diagnostic and validate with a clinician and real datasets.
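For non-Windows users, the vision request looks the same in Python: read the file, base64-encode it, and put the string in messages[].images (Ollama expects raw base64, no data: URI prefix). The bytes below are a stand-in so the sketch is self-contained; replace them with a real file read:

```python
import base64
import json

# Stand-in for: base64.b64encode(open("cxr.png", "rb").read()).decode()
img_b64 = base64.b64encode(b"\x89PNG...demo bytes").decode()

body = {
    "model": "dcarrascosa/medgemma-1.5-4b-it:Q8_0",
    "messages": [{
        "role": "user",
        "content": ("Return JSON only. Analyze this chest X-ray. Provide findings, "
                    "differential (max 3), next steps (max 3). Avoid repetition."),
        "images": [img_b64],   # list of raw base64 strings
    }],
    "stream": False,
}
payload = json.dumps(body)     # POST this to http://localhost:11434/api/chat
```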


Tool calling (function calling) — how to think about it

Ollama can return structured “tool calls” if your application layer implements a tool schema and runs the tools. The model doesn’t execute tools by itself—you do.

Typical loop:

  1. user question
  2. model outputs a tool call (JSON)
  3. your app executes tool (DB lookup / RAG / calculator)
  4. your app feeds results back to the model
  5. model finalizes answer
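The app-side part of that loop (steps 3–4) can be sketched as follows. The bmi tool and the simulated call are hypothetical; in practice the call JSON comes from the model's response rather than being hard-coded:

```python
import json

# Hypothetical tool registry: name -> callable taking a dict of arguments.
TOOLS = {
    "bmi": lambda args: round(args["kg"] / (args["m"] ** 2), 1),
}

def run_tool_call(call: dict):
    """Step 3: validate and execute a tool call emitted by the model."""
    name = call.get("name")
    if name not in TOOLS:              # treat tool args as untrusted input
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](call.get("arguments", {}))

# Pretend the model emitted this in step 2:
call = {"name": "bmi", "arguments": {"kg": 80, "m": 1.8}}
result = run_tool_call(call)                                          # step 3
followup = {"role": "tool", "content": json.dumps({"bmi": result})}   # step 4
```

The followup message is appended to the conversation and sent back so the model can finalize its answer (step 5).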

Best use cases:

  • RAG retrieval (guidelines, manuals, policies)
  • Clinical workflow orchestration (non-decision support)
  • Calling internal APIs (EHR integrations, scheduling)

Security tip: Validate tool args. Treat them as untrusted input.


“Thinking” / reasoning trace

Some models emit visible “thinking” tokens depending on the template and model behavior. For production:

  • prefer structured outputs for machine parsing,
  • keep “thinking” out of the user-facing response,
  • log it separately only if needed.

If you need strict separation (final vs reasoning), enforce it via JSON schema fields like { "final": "...", "reasoning": "..." }.
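A minimal sketch of that separation, assuming the final/reasoning schema suggested above:

```python
import json

# Schema constraining the model to split user-facing text from its trace.
schema = {
    "type": "object",
    "required": ["final", "reasoning"],
    "properties": {
        "final": {"type": "string"},
        "reasoning": {"type": "string"},
    },
}

def split_response(raw: str) -> tuple:
    """Show only `final` to the user; log `reasoning` separately."""
    parsed = json.loads(raw)
    return parsed["final"], parsed["reasoning"]

final, reasoning = split_response(
    '{"final": "Hydrate and rest.", "reasoning": "Mild symptoms, no red flags."}'
)
```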


Recommended parameters

These are sane defaults for a “medical assistant” style:

  • temperature: 0.2
  • top_p: 0.9
  • repeat_penalty: 1.25
  • num_predict: 1200 (raise to 4096 if you want long structured outputs)

Attribution

This Ollama repo packages the upstream model:

  • Google MedGemma 1.5 4B IT: google/medgemma-1.5-4b-it

Please cite the upstream technical report when appropriate:

Sellergren et al., “MedGemma Technical Report.” arXiv:2507.05201 (2025).


License / Terms

Use is governed by the Health AI Developer Foundations terms of use. Ensure your downstream use and any redistribution complies with those terms.