
Gemma 4 E2B fine-tuned on 122k microscopy VQA · 145+ genera · 5 categories · runs offline on a sub-$100 phone · Unsloth + llama.cpp · Apache 2.0 · research/educational only, not a medical device


MicroLens v2

Research model · Apache 2.0 · Not a medical device · Not a certified instrument · Use at your own risk. Outputs are statistical pattern matches against training data, not analytical measurements. See full disclaimer below.

A small vision-language model for microscopy. Gemma 4 E2B fine-tuned on 122,399 image-question-answer pairs covering 145+ taxonomic genera across diatoms, freshwater and marine zooplankton, fungal spores, and fish larvae. Q4_K_M GGUF, 3.4 GB. Runs on a phone.

ollama run brinzaengineeringai/microlens-v2

What it does

Give it a microscopy image. Get back one line of structured taxonomic text. Same image, same answer, every time.

This is a diatom of the genus Navicula, specifically Navicula gregaria.

That format is on purpose. v2 is built for pipelines that need to ingest thousands of images and feed the result into a database. No prose, no chain-of-thought, no markdown surprises. If you want a longer scientific description (morphology, habitat, identification cues), use microlens-v3 instead.
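Because the answer is a single fixed-shape sentence, downstream ingestion can be a one-regex affair. A minimal sketch of such a parser, assuming the exact sentence shape shown above (the pattern and field names are illustrative, not part of the model's contract):

```python
import re

# Assumed answer shape:
# "This is a <category> of the genus <Genus>, specifically <Genus species>."
PATTERN = re.compile(
    r"This is an? (?P<category>[\w\s]+?) of the genus (?P<genus>\w+),"
    r" specifically (?P<species>[\w\s]+)\."
)

def parse_answer(line):
    """Split one model answer into category / genus / species fields,
    or return None if the line does not match the expected shape."""
    m = PATTERN.match(line.strip())
    return m.groupdict() if m else None

row = parse_answer(
    "This is a diatom of the genus Navicula, specifically Navicula gregaria."
)
# row -> {'category': 'diatom', 'genus': 'Navicula', 'species': 'Navicula gregaria'}
```

Returning None on a mismatch (rather than raising) lets a batch pipeline route malformed answers to a review queue instead of aborting.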

Accuracy

Stratified evaluation on 220 held-out validation images.

Category                 Category match   Genus match   Notes
Diatoms                  100%             ~50%          Largest class in training (8k+ samples)
Freshwater zooplankton   97%              ~45%          Rotifers, copepods, ciliates
Marine zooplankton       100%             ~45%          Copepods, ostracods, krill larvae
Fungal spores            100%             ~50%          Plant-pathogenic conidia
Fish larvae              100%             n/a           Pseudo-genus, see Limitations

For reference, a uniform random guess across the 145+ genera would land around 0.7% (1/145 ≈ 0.69%).
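The two metrics above are plain agreement fractions over (category, genus) pairs. A minimal sketch, assuming predictions and ground truth are available as such pairs (the sample data here is illustrative, not the real validation set):

```python
def match_rates(pairs):
    """Fraction of (predicted, truth) pairs that agree on category
    and on genus -- the two columns reported in the table above."""
    cat = sum(p[0] == t[0] for p, t in pairs) / len(pairs)
    gen = sum(p[1] == t[1] for p, t in pairs) / len(pairs)
    return cat, gen

# Tiny illustrative sample: right category both times, right genus once.
sample = [
    (("diatom", "Navicula"), ("diatom", "Navicula")),
    (("diatom", "Gomphonema"), ("diatom", "Navicula")),
]
cat_acc, gen_acc = match_rates(sample)
# cat_acc -> 1.0, gen_acc -> 0.5
```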

Performance

Measured on actual hardware:

  • RTX 3090 Ti: 0.4 to 0.6 seconds per answer
  • Sub-$100 Android phone with 8 GB RAM: 1.5 to 2.5 seconds
  • Raspberry Pi 5: about 3 seconds

The Android client uses llama.cpp + mtmd via JNI. Desktop runs llama-server and streams tokens over SSE (server-sent events).
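An SSE stream is just text lines, so the desktop reader needs very little code. A minimal sketch of the consuming side; the `content`/`stop` field names are an assumption based on llama-server's streaming `/completion` responses and may differ between llama.cpp versions:

```python
import json

def read_sse_events(lines):
    """Yield decoded JSON payloads from an SSE text stream.

    Assumes llama-server-style events: each event is one
    'data: {...}' line, with blank lines between events.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])

# A mocked stream standing in for the HTTP response body.
stream = [
    'data: {"content": "This is a diatom", "stop": false}',
    '',
    'data: {"content": " of the genus Navicula.", "stop": true}',
]
answer = "".join(ev["content"] for ev in read_sse_events(stream))
```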

Intended use and full disclaimer

MicroLens v2 is a research and educational artefact published under Apache 2.0. It is a fine-tuned neural network, not a regulated instrument.

Designed for:

  • Citizen-science screening
  • Taxonomy teaching and student labs
  • ML research, dataset benchmarking, model comparison
  • Pre-classification stages of professional pipelines, where every result is verified by a qualified person before any decision is made

This model is NOT, and must not be treated as:

  • A medical device, in-vitro diagnostic (IVD), or clinical decision-support tool
  • A regulatory-compliant water-quality measurement instrument (no ISO 17025, EPA, EU WFD, or equivalent certification)
  • A substitute for a trained taxonomist or accredited laboratory analysis
  • A calibrated, validated, or peer-reviewed analytical method

The model’s output is a probabilistic pattern match against the training data distribution, not a physical or analytical measurement. The model can be confidently wrong, particularly on:

  • specimens not represented in training (145+ genera ≠ all microscopic life)
  • damaged, atypical, or out-of-focus images
  • subjects from kingdoms or phyla outside the training categories

No warranty. This software is provided “AS IS”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the author or contributors be liable for any claim, damages or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the model or the use or other dealings in the model.

You assume all risk when downloading, deploying, modifying, or using this model on your own hardware. Always have qualified personnel verify any result that informs a regulatory, environmental, clinical, or health-related decision.

Limitations

A few things to know:

For fish larvae, the underlying dataset has no species-level annotation. The model returns the category name as the “genus” for this class. Don’t import that into a taxonomic database.

The output format is rigid. That’s a feature for parsers and a limitation for humans. Use v3 if you want flowing text.

Long-tail genera — the roughly 100 with fewer than 100 training samples each — score noticeably lower than the 30 most-common ones. Per-genus precision and recall live in the GitHub model card.

There is no uncertainty score in the standard output. If you need confidence values, pull logprobs from llama-server.
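One way to fold those logprobs into a single number: ask llama-server for per-token probabilities and reduce them to a geometric mean. This is a sketch under assumptions, not the model's API: the `n_probs` request field follows llama.cpp's classic `/completion` endpoint and its exact name and response shape vary between versions, and the geometric mean is just one crude choice of sequence score.

```python
import json
import math

def completion_payload(prompt, n_probs=5):
    """Build a llama-server /completion request body that asks for
    token probabilities (via the n_probs field; name may vary by version)."""
    return json.dumps({"prompt": prompt, "temperature": 0, "n_probs": n_probs})

def sequence_confidence(token_probs):
    """Geometric mean of per-token probabilities: a crude 0..1
    confidence score for the whole one-line answer."""
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))

# Mocked per-token probabilities extracted from a server response.
conf = sequence_confidence([0.98, 0.91, 0.99, 0.87])
```

A pipeline could then threshold `conf` to decide which identifications go straight to the database and which get flagged for human review.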

Built with

Three pieces did the heavy lifting:

  • Unsloth for fine-tuning. FastVisionModel with 4-bit QLoRA on a single RTX 3090 Ti. Roughly a 2× speedup and half the VRAM compared to vanilla Transformers, which is what made this trainable on consumer hardware in the first place.
  • llama.cpp + mtmd for inference. The reason this fits on a phone.
  • Gemma 4 E2B-it as the base. Apache 2.0, multimodal out of the box, small enough to ship.

Links

Apache 2.0. Built for the Kaggle Gemma 4 Good Hackathon 2026, Health & Sciences track.

Serghei Brinza · Vienna, Austria · 2026