Quantized Gemma 3 1B optimized for edge devices. Fully offline, bilingual (Italian/English), privacy-first.
IMPORTANT: To enable bilingual behavior, you must create a Modelfile with the bilingual SYSTEM prompt.
```shell
# Pull Q4_0 (recommended: faster, smaller)
ollama pull antconsales/antonio-gemma3-smart-q4

# Or pull the Q4_K_M variant (better quality for long conversations)
ollama pull antconsales/antonio-gemma3-smart-q4:q4_k_m
```
```shell
cat > Modelfile <<'EOF'
FROM antconsales/antonio-gemma3-smart-q4
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 1024
PARAMETER num_thread 4
PARAMETER num_batch 32
PARAMETER repeat_penalty 1.05
PARAMETER stop "<end_of_turn>"
PARAMETER stop "</s>"
SYSTEM """You are an offline AI assistant running on a Raspberry Pi. You MUST detect the user's language and respond in the SAME language:
- If the user writes in Italian, respond ONLY in Italian
- If the user writes in English, respond ONLY in English
Sei un assistente AI offline su Raspberry Pi. DEVI rilevare la lingua dell'utente e rispondere nella STESSA lingua:
- Se l'utente scrive in italiano, rispondi SOLO in italiano
- Se l'utente scrive in inglese, rispondi SOLO in inglese
Always match the user's language choice."""
EOF

ollama create gemma3-bilingual -f Modelfile
ollama run gemma3-bilingual
```
```
# Test in Italian
>>> ciao! come va?

# Test in English
>>> hello! how are you?
```
Why this is needed: The base model is instruction-tuned but doesn’t automatically switch languages. The SYSTEM prompt explicitly tells it to match the user’s language.
| Model | Speed | Size | Use Case |
|---|---|---|---|
| Q4_0 ⭐ | 3.67 t/s | 720 MB | Default choice (faster, smaller) |
| Q4_K_M | 3.56 t/s | 806 MB | Better coherence in long conversations |
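To turn these throughput figures into expected response time, divide the reply length in tokens by the tokens-per-second rate. A quick back-of-the-envelope sketch (the 100-token reply length is an arbitrary example, not a measured value):

```python
# Rough reply latency derived from the measured throughput in the table above.
def reply_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to generate `tokens` at a given throughput."""
    return tokens / tokens_per_sec

# A ~100-token answer on Q4_0 (3.67 t/s) takes roughly 27 seconds;
# on Q4_K_M (3.56 t/s) roughly 28 seconds.
print(round(reply_seconds(100, 3.67)))  # → 27
print(round(reply_seconds(100, 3.56)))  # → 28
```

In practice, prompt-processing time adds a few extra seconds on top of this, so treat these numbers as a lower bound.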
Tested on: Raspberry Pi OS (Debian Bookworm), Ollama runtime
Once you’ve created the model with the Modelfile (see Quick Start above):
```shell
ollama run gemma3-bilingual "Ciao! Spiegami cos'è un sensore di prossimità."
ollama run gemma3-bilingual "What is a Raspberry Pi and what can I do with it?"
ollama run gemma3-bilingual "Explain GPIO in English, poi dimmi come usarlo in italiano"
```
The model automatically detects the language and responds appropriately when using the Modelfile configuration!
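Beyond the CLI, the same custom model can be called from code: Ollama serves a local REST API (by default at `http://localhost:11434`), and a client POSTs a JSON body to `/api/chat`. A minimal sketch that only builds the payload, since no running server is assumed here:

```python
import json

def build_chat_request(model: str, prompt: str) -> str:
    """Build the JSON body for a POST to Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # request one complete reply instead of a token stream
    }
    return json.dumps(payload)

# The SYSTEM prompt baked into the Modelfile travels with the model,
# so the client only needs to send the user's message.
print(build_chat_request("gemma3-bilingual", "ciao! come va?"))
```

Because the bilingual SYSTEM prompt lives in the model itself, every client (CLI, API, or a third-party UI pointed at Ollama) gets the same language-matching behavior for free.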
For optimal performance, use these parameters in your Modelfile:
```
FROM antconsales/antonio-gemma3-smart-q4
PARAMETER num_ctx 1024          # Context length (512 for faster response, 1024 for longer conversations)
PARAMETER num_thread 4          # Utilize all 4 cores on Raspberry Pi 4
PARAMETER num_batch 32          # Optimized for throughput on Pi
PARAMETER temperature 0.7       # Balanced creativity vs consistency
PARAMETER top_p 0.9             # Nucleus sampling for diverse responses
PARAMETER repeat_penalty 1.05   # Reduces repetitive outputs
PARAMETER stop "<end_of_turn>"
PARAMETER stop "</s>"
SYSTEM """
You are an offline AI assistant running on a Raspberry Pi. Automatically detect the user's language (Italian or English) and respond in the same language. Be concise, practical, and helpful.
Sei un assistente AI offline che opera su Raspberry Pi. Rileva automaticamente la lingua dell'utente (italiano o inglese) e rispondi nella stessa lingua. Sii conciso, pratico e utile.
"""
```
For voice assistants or real-time chat, reduce `num_ctx` to 512 for faster responses.
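For example, a separate low-latency Modelfile could look like the sketch below (the file name `Modelfile.fast` and model name `gemma3-bilingual-fast` are illustrative, not part of this release):

```
FROM antconsales/antonio-gemma3-smart-q4
PARAMETER num_ctx 512           # Smaller context window for faster responses
PARAMETER num_thread 4          # Utilize all 4 cores on Raspberry Pi 4
PARAMETER temperature 0.7
PARAMETER stop "<end_of_turn>"
PARAMETER stop "</s>"
```

Create it alongside the default model with `ollama create gemma3-bilingual-fast -f Modelfile.fast`; both variants share the same underlying weights, so the extra model costs almost no disk space.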
Verify downloaded models using SHA256 checksums:
| File | SHA256 Checksum |
|---|---|
| gemma3-1b-q4_0.gguf | d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55 |
| gemma3-1b-q4_k_m.gguf | c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0 |
```shell
# Verify checksums (Linux; on macOS use `shasum -a 256` instead)
# Ollama stores models in ~/.ollama/models/blobs/
sha256sum ~/.ollama/models/blobs/sha256-*
```
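Since Ollama stores blobs under content-addressed names, matching a specific file against the published checksums by eye is tedious. A small Python sketch that streams a file and checks its digest against the table above (the paths you pass in are assumptions about your local setup):

```python
import hashlib

# Published checksums from the table above.
EXPECTED = {
    "gemma3-1b-q4_0.gguf": "d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55",
    "gemma3-1b-q4_k_m.gguf": "c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0",
}

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large GGUF blobs don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_known_model(path: str) -> bool:
    """True if the file's digest equals any published checksum."""
    return sha256_of(path) in EXPECTED.values()
```

Run `matches_known_model` over each file in `~/.ollama/models/blobs/`; any blob that reports `True` is one of the two quantized weights files, byte-for-byte intact.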
This model is a derivative work of Google’s Gemma 3 1B.
License: Gemma License. Please review and comply with the Gemma License Terms before using this model.
Quantization, optimization, and bilingual configuration by Antonio (antconsales).
For licensing questions regarding the base model, refer to Google’s official Gemma documentation.
Built with ❤️ for privacy and edge computing. Empowering offline AI, one Raspberry Pi at a time. 🇮🇹