This repository provides modified versions of Google's `gemma3:*-it-qat` models (1B, 4B, 12B, 27B), specifically adapted to function as hybrid coding assistants with enhanced tool-using capabilities. They are primarily designed for integration within environments like VS Code.
This page documents the 27B variant: `orieg/gemma3-tools:27b-it-qat`. Instructions for other sizes are similar, requiring adjustments to the tag and potentially to the `num_ctx` parameter based on hardware and model capabilities.
This tool-enabled version can be built for the following Gemma 3 IT QAT base models:
| Parameter Size | Base Model Tag | Vision Support | Max Context Window | Notes |
|---|---|---|---|---|
| 1B | `1b-it-qat` | No | 32k (32768) | Text-only model |
| 4B | `4b-it-qat` | Yes | 128k (131072) | Multimodal |
| 12B | `12b-it-qat` | Yes | 128k (131072) | Multimodal |
| 27B | `27b-it-qat` | Yes | 128k (131072) | Multimodal, most capable |
These models are built upon Google's official `gemma3:<size>-it-qat` releases. They inherit the base models' core capabilities, context window potential, multilinguality, and performance characteristics, combined with the memory efficiency of Quantization-Aware Training (QAT).
The primary modification is a custom Modelfile (shown below for the 27B variant) that sets `num_ctx` to 131072 (128k) for the 27B version and `num_predict` to -1 (unlimited). Note: `num_ctx` may need adjustment based on the specific variant and hardware.
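The full Modelfile is not reproduced on this page; as a minimal sketch, it might look like the following. The base image and the two parameters come from the description above, while the `SYSTEM` prompt wording is an assumption:

```
FROM gemma3:27b-it-qat
PARAMETER num_ctx 131072
PARAMETER num_predict -1
# Assumed system prompt: instructs the model to emit tool calls in the
# single-line JSON format documented under "Expected Tool Call Format".
SYSTEM """You are a coding assistant with access to tools. When a tool is
needed, respond only with: {"name": "tool_name", "parameters": {...}}"""
```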
Pull the model:

```bash
ollama pull orieg/gemma3-tools:27b-it-qat
```
(Replace `27b-it-qat` with `1b-it-qat`, `4b-it-qat`, or `12b-it-qat` for other sizes if available under the `orieg/` namespace.)
Run the model:
```bash
ollama run orieg/gemma3-tools:27b-it-qat
```
## API Interaction

When using these models via the Ollama API (e.g., `/api/chat`), ensure you:

- Pass the available tools using the `tools` parameter in your request payload, providing a clear name and description for each tool.
- Prepare your client application to handle the model's tool-call output, execute the corresponding tool, and send the result back in a subsequent message with `role: "tool"` (see the sketches below).
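As a minimal sketch of such a request (assuming Ollama is serving on the default `localhost:11434`; the `get_weather` tool and its schema are hypothetical, purely for illustration):

```python
import requests

# Hypothetical example tool; the name, description, and schema are
# illustrative and not part of the model itself.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}]

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "orieg/gemma3-tools:27b-it-qat",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": tools,
        "stream": False,
    },
)
print(response.json()["message"])
```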
## Expected Tool Call Format

The model is instructed to output tool calls in the following format only:

```json
{"name": "tool_name", "parameters": {"param_name": "value", ...}}
```
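Continuing the request sketch above, a client loop could parse this output, run the matching tool, and return the result with `role: "tool"`; the `run_tool` dispatcher here is hypothetical:

```python
import json

def run_tool(name, parameters):
    # Hypothetical dispatcher mapping tool names to local functions.
    if name == "get_weather":
        return f"Sunny, 22°C in {parameters['city']}"
    raise ValueError(f"unknown tool: {name}")

# The model emits the tool call as a single JSON object in the
# assistant message content, per the format documented above.
message = response.json()["message"]
call = json.loads(message["content"])
result = run_tool(call["name"], call["parameters"])

# Send the tool result back with role "tool" so the model can
# produce its final answer.
followup = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "orieg/gemma3-tools:27b-it-qat",
        "messages": [
            {"role": "user", "content": "What is the weather in Paris?"},
            {"role": "assistant", "content": message["content"]},
            {"role": "tool", "content": result},
        ],
        "stream": False,
    },
)
print(followup.json()["message"]["content"])
```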
## Hardware Considerations

**Context Window (`num_ctx`):** The Modelfile for the 27B variant defaults to `PARAMETER num_ctx 131072` to target Gemma 3's 128k context capability. The 1B variant supports a maximum of 32k. Adjust `num_ctx` based on the chosen model variant and your hardware limits.

**VRAM/RAM Requirements:**

- **All variants (1B, 4B, 12B, 27B QAT):** These quantized models are memory-efficient. The 1B, 4B, and 12B variants should run comfortably on GPUs with 24GB VRAM (e.g., NVIDIA RTX 3090/4090) with significant context.
- **27B QAT variant:** While this variant can also run on a 24GB VRAM GPU, achieving stable performance with very large context windows (approaching the 128k limit) may require careful memory management or reducing `num_ctx` below the maximum. Performance will vary with workload and context size.
- **CPU:** Running any variant on CPU requires substantial system RAM (scaling with model size and context) and will be significantly slower than GPU inference.

**Adjust `num_ctx` if needed:** If you encounter performance issues or out-of-memory errors (especially with the 27B model at high context), create a derivative model with a lower `num_ctx` (e.g., 32768 or 16384), as sketched below.
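A minimal sketch of such a derivative (the model name `gemma3-tools-32k` is arbitrary):

```bash
# Hypothetical Modelfile that lowers the context window to 32k
cat > Modelfile <<'EOF'
FROM orieg/gemma3-tools:27b-it-qat
PARAMETER num_ctx 32768
EOF

ollama create gemma3-tools-32k -f Modelfile
ollama run gemma3-tools-32k
```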
These models are based on Google's `gemma3:*-it-qat` releases. Please refer to the original Gemma Terms of Use.