This repository provides modified versions of Google's `gemma3:*-it-qat` models (1B, 4B, 12B, 27B), specifically adapted to function as hybrid coding assistants with enhanced tool-using capabilities. They are primarily designed for integration within environments like VS Code.
This page documents the 27B variant: `orieg/gemma3-tools:27b-it-qat`. Instructions for other sizes are similar, requiring adjustments to the tag and potentially to the `num_ctx` parameter based on hardware and model capabilities.
This tool-enabled version can be built for the following Gemma 3 IT QAT base models:
| Parameter Size | Base Model Tag | Vision Support | Max Context Window | Notes |
|---|---|---|---|---|
| 1B | `1b-it-qat` | No | 32k (32768) | Text-only model |
| 4B | `4b-it-qat` | Yes | 128k (131072) | Multimodal |
| 12B | `12b-it-qat` | Yes | 128k (131072) | Multimodal |
| 27B | `27b-it-qat` | Yes | 128k (131072) | Multimodal, most capable |
These models are built upon Google's official `gemma3:<size>-it-qat` releases. They inherit the base models' core capabilities, context window potential, multilinguality, and performance characteristics, combined with the memory efficiency of Quantization-Aware Training (QAT).
The primary modification is a custom Modelfile (shown below for the 27B variant) that sets `num_ctx` to 131072 (128k) for the 27B version and `num_predict` to -1 (unlimited). Note: `num_ctx` may need adjustment based on the specific variant and hardware.
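The full Modelfile is not reproduced on this page; as a minimal sketch, it might look like the following. The base image and the two parameters come from the description above, while the `SYSTEM` prompt wording is an assumption:

```
FROM gemma3:27b-it-qat
PARAMETER num_ctx 131072
PARAMETER num_predict -1
# Assumed system prompt: instructs the model to emit tool calls in the
# single-line JSON format documented under "Expected Tool Call Format".
SYSTEM """You are a coding assistant with access to tools. When a tool is
needed, respond only with: {"name": "tool_name", "parameters": {...}}"""
```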
Pull the model:

```bash
ollama pull orieg/gemma3-tools:27b-it-qat
```
(Replace `27b-it-qat` with `1b-it-qat`, `4b-it-qat`, or `12b-it-qat` for other sizes if available under the `orieg/` namespace.)
Run the model:
```bash
ollama run orieg/gemma3-tools:27b-it-qat
```
## API Interaction

When using these models via the Ollama API (e.g., `/api/chat`), ensure you:

- Pass the available tools using the `tools` parameter in your request payload, providing a clear name and description for each tool.
- Prepare your client application to handle the model's tool-call output, execute the corresponding tool, and send the result back in a subsequent message with `role: "tool"` (see the sketches below).
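As a minimal sketch of such a request (assuming Ollama is serving on the default `localhost:11434`; the `get_weather` tool and its schema are hypothetical, purely for illustration):

```python
import requests

# Hypothetical example tool; the name, description, and schema are
# illustrative and not part of the model itself.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}]

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "orieg/gemma3-tools:27b-it-qat",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": tools,
        "stream": False,
    },
)
print(response.json()["message"])
```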
## Expected Tool Call Format

The model is instructed to output tool calls in the following format only:

```json
{"name": "tool_name", "parameters": {"param_name": "value", ...}}
```
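Continuing the request sketch above, a client loop could parse this output, run the matching tool, and return the result with `role: "tool"`; the `run_tool` dispatcher here is hypothetical:

```python
import json

def run_tool(name, parameters):
    # Hypothetical dispatcher mapping tool names to local functions.
    if name == "get_weather":
        return f"Sunny, 22°C in {parameters['city']}"
    raise ValueError(f"unknown tool: {name}")

# The model emits the tool call as a single JSON object in the
# assistant message content, per the format documented above.
message = response.json()["message"]
call = json.loads(message["content"])
result = run_tool(call["name"], call["parameters"])

# Send the tool result back with role "tool" so the model can
# produce its final answer.
followup = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "orieg/gemma3-tools:27b-it-qat",
        "messages": [
            {"role": "user", "content": "What is the weather in Paris?"},
            {"role": "assistant", "content": message["content"]},
            {"role": "tool", "content": result},
        ],
        "stream": False,
    },
)
print(followup.json()["message"]["content"])
```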
## Hardware Considerations

**Context Window (`num_ctx`):** The Modelfile for the 27B variant defaults to `PARAMETER num_ctx 131072` to target Gemma 3's 128k context capability. The 1B variant supports a maximum of 32k. Adjust `num_ctx` based on the chosen model variant and your hardware limits.

**VRAM/RAM Requirements:**

- **All variants (1B, 4B, 12B, 27B QAT):** These quantized models are memory-efficient. The 1B, 4B, and 12B variants should run comfortably on GPUs with 24GB VRAM (e.g., NVIDIA RTX 3090/4090) with significant context.
- **27B QAT variant:** While this variant can also run on a 24GB VRAM GPU, achieving stable performance with very large context windows (approaching the 128k limit) may require careful memory management or reducing `num_ctx` below the maximum. Performance will vary with workload and context size.
- **CPU:** Running any variant on CPU requires substantial system RAM (scaling with model size and context) and will be significantly slower than GPU inference.

**Adjust `num_ctx` if needed:** If you encounter performance issues or out-of-memory errors (especially with the 27B model at high context), create a derivative model with a lower `num_ctx` (e.g., 32768 or 16384), as sketched below.
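A minimal sketch of such a derivative (the model name `gemma3-tools-32k` is arbitrary):

```bash
# Hypothetical Modelfile that lowers the context window to 32k
cat > Modelfile <<'EOF'
FROM orieg/gemma3-tools:27b-it-qat
PARAMETER num_ctx 32768
EOF

ollama create gemma3-tools-32k -f Modelfile
ollama run gemma3-tools-32k
```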
These models are based on Google's `gemma3:*-it-qat` releases. Please refer to the original Gemma Terms of Use.