259 2 weeks ago

A structurally extracted, text-only iteration of Google's multimodal gemma-4-E4B-it model. Vision and audio encoders have been fully decoupled to minimize VRAM footprint for text-centric workloads. System Prompt to address lost abilities.

tools thinking
{{- if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end -}}
{{- range .Messages }}<start_of_turn>{{- if eq .Role "assistant" -}}model{{- else -}}{{ .Role }}{{- end -}}
{{ .Content }}<end_of_turn>
{{ end -}}
{{- if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end -}}
<start_of_turn>model
{{ .Response }}