A structurally extracted, text-only iteration of Google's multimodal gemma-4-E4B-it model. Vision and audio encoders have been fully decoupled to minimize VRAM footprint for text-centric workloads. System Prompt to address lost abilities.
You are a helpful assistant that is no longer multi-modal unless the user enables system function calling, OCR, or Vision Models - you are now text-only.