2 4 days ago

QLoRA fine-tune of Gemma 4 E2B for scam detection. F1 86.1% / FPR 1.1% on a 300-sample real test set. 12-tool function calling for SMS, email, voice transcripts, and OCR'd MMS images.

tools thinking
ollama run alicek0914/gemma4-scam

Applications

Claude Code
Claude Code ollama launch claude --model alicek0914/gemma4-scam
Codex App
Codex App ollama launch codex-app --model alicek0914/gemma4-scam
OpenClaw
OpenClaw ollama launch openclaw --model alicek0914/gemma4-scam
Hermes Agent
Hermes Agent ollama launch hermes --model alicek0914/gemma4-scam
Codex
Codex ollama launch codex --model alicek0914/gemma4-scam
OpenCode
OpenCode ollama launch opencode --model alicek0914/gemma4-scam

Models

View all →

Readme

gemma4-scam

Fine-tuned Gemma 4 E2B for scam-pattern detection — F1 86.1% / FPR 1.1% on a 300-sample real-world test set.

A QLoRA fine-tune of unsloth/gemma-4-E2B-it-unsloth-bnb-4bit, merged into the base and quantized to Q4_K_M GGUF. Fits on a consumer 8 GB GPU. The model classifies SMS, email, voice-call transcripts, and OCR’d MMS images into safe / low / medium / high / critical, explains its reasoning in plain language, and selects which of 12 protective tools to call (notify_trusted_contact, block_payment_intent, check_url_safety, …).

Built for the Gemma 4 Good Hackathon (Safety & Trust + Ollama Special Tech tracks).

Quick start

ollama pull alicek0914/gemma4-scam
ollama run alicek0914/gemma4-scam

Headline results

Evaluated on 300 hand-labeled real samples, no RAG, v3 prompt: - 70 from the FTC Consumer Sentinel public scam case database + a normal/control set + curated edge cases - 230 from the UCI SMS Spam Collection (training-disjoint — the 571 UCI seeds used in training are excluded)

Setup Size F1 FPR Precision Recall
Gemma 4 E4B base (Q4_K_M) ~8B 63.4% 78.9% 46.9% 97.6%
Gemma 4 E2B base (Q4_K_M) ~5B 58.0% 97.7% 41.4% 96.8%
gemma4-scam (this model) ~5B 86.1% 1.1% 98.0% 76.8%
  • +28.1 F1 pt vs. the same-size E2B base
  • 88× FPR reduction (97.7% → 1.1%)
  • Beats the larger 8B E4B base by +22.7 F1 pt

Intended use

Trained for scam-risk reasoning, not generic chat. Best at:

  • SMS / email phishing classification
  • Voice-call transcript analysis (impersonation, urgency, secrecy, OTP requests)
  • OCR’d MMS image text (after pytesseract extracts the text)
  • Emitting structured JSON with risk_level, patterns[], plain-language user_message, and tool_calls[]

Not a forensic deepfake detector. Not a general-purpose chatbot.

How it was trained

  • Base: unsloth/gemma-4-E2B-it-unsloth-bnb-4bit (~5B params, 4-bit)
  • Adapter: QLoRA r=16, lora_alpha=32, target=q_proj/k_proj/v_proj/o_proj
  • Trainer: Unsloth FastLanguageModel + TRL SFTTrainer on Colab L4
  • Training data: 3,100 synthetic + 571 real UCI SMS spam seeds, chat-template formatted. Evaluated on a 300-sample hand-labeled real test set that was never touched in training.
  • Quantization: merged into bf16 safetensors → GGUF f16 → Q4_K_M via llama-quantize

Links

License

The base Gemma 4 weights are governed by the Gemma Terms of Use. The fine-tuned adapter and merged weights are released under the same terms for research and non-commercial use.