260 pulls · Updated 2 weeks ago

A structurally extracted, text-only iteration of Google's multimodal gemma-4-E4B-it model. The vision and audio encoders have been fully decoupled to minimize the VRAM footprint for text-centric workloads. A default system prompt is included to address the abilities the model has lost.

Capabilities: tools, thinking
ollama run fauxpaslife/gemma-4-E4B-it-textonly-sysprmpt-Q4_K_M

Details


4c892bd2b778 · 5.3GB

Architecture: gemma4
Parameters: 7.52B
Quantization: Q4_K_M
Template (truncated): {{- if .System }}<start_of_turn>system {{ .System }}<end_of_turn> {{ end -}} {{- range .Messages }}<…
System prompt (truncated): You are a helpful assistant that is no longer multi-modal unless the user enables system function ca…
Parameters: { "stop": [ "<end_of_turn>" ] }

Readme


Acquired from HF. Generated via the Kitsune Fine Tuning Suite. Tested in Ollama.


Upon import to Ollama, I added a template so the GGUF would run properly.

I also attempted to add a default system prompt so the model acknowledges its lack of multimodal abilities. Time will tell ;)
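The template and system prompt described above can be baked into an Ollama Modelfile at import time. A minimal sketch follows; the template and system prompt shown in the Details section are truncated on this page, so the message loop and the exact prompt wording below are assumptions based on standard Gemma chat formatting, and the GGUF filename is hypothetical:

```
# Hypothetical Modelfile sketch — the page's template preview is truncated,
# so the message loop here is assumed from standard Gemma chat formatting.
FROM ./gemma-4-E4B-it-text-only-Q4_K_M.gguf

TEMPLATE """{{- if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end -}}
{{- range .Messages }}<start_of_turn>{{ if eq .Role "assistant" }}model{{ else }}{{ .Role }}{{ end }}
{{ .Content }}<end_of_turn>
{{ end -}}<start_of_turn>model
"""

# Assumed wording; the actual default system prompt is cut off in the Details section.
SYSTEM """You are a helpful assistant that is no longer multi-modal. Politely decline requests to interpret images or audio."""

PARAMETER stop <end_of_turn>
```

Importing is then `ollama create my-gemma-textonly -f Modelfile` (the model name here is just an example).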

🦊💖🦙

Model Card From HF : ozgurpolat/gemma-4-E4B-it-text-only-GGUF

Gemma 4 E4B (Text-Only) - GGUF

This repository provides a structurally extracted, text-only iteration of Google's multimodal gemma-4-E4B-it model. Vision and audio encoders have been fully decoupled to minimize VRAM footprint for text-centric workloads.

Model Format

Serialization: GGUF (gemma4 architecture layout)
Quantization: Q4_K_M
Base Parameters: 8B (text-layer extraction)

Note on Zero-Shot Modality Queries: The text parameters retain their original RLHF conditioning. The model will assert multimodal capabilities (e.g., confirming it can interpret images) despite the hardware encoders having been purged. Overriding this behavior requires explicit bounding via the system prompt.
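The system-prompt bounding described in the note above can also be applied per request through Ollama's REST API, which accepts a system-role message in the `/api/chat` request body. A sketch follows; the model name is taken from the run command above, and the prompt wording is an assumption, not the repository's actual default:

```json
{
  "model": "fauxpaslife/gemma-4-E4B-it-textonly-sysprmpt-Q4_K_M",
  "messages": [
    {
      "role": "system",
      "content": "You are a text-only assistant. Decline any request to view or interpret images or audio."
    },
    { "role": "user", "content": "Can you look at this photo for me?" }
  ],
  "stream": false
}
```

POSTed to `http://localhost:11434/api/chat` on a default Ollama install, this pins the bounding for a single conversation without modifying the Modelfile.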