ollama run MedAIBase/TranslateGemma:4b
TranslateGemma model card
Resources and Technical Documentation:
Terms of Use: Terms
Authors: Google Translate
Model Information
Summary description and brief definition of inputs and outputs.
Description
TranslateGemma is a family of lightweight, state-of-the-art open translation models from Google, based on the Gemma 3 family of models. TranslateGemma models are designed to handle translation tasks across 55 languages. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art translation models and helping foster innovation for everyone.
Inputs and outputs
Usage
TranslateGemma is designed to work with a specific chat template that supports direct translation of a text input, or text-extraction-and-translation from an image input. This chat template has been implemented with Hugging Face transformers’ chat templating system and is compatible with the apply_chat_template() function provided by the Gemma tokenizer and Gemma 3 processor. A notable difference from other models’ chat templates is the source_lang_code and target_lang_code fields attached to each content entry, as shown in the examples below.
Additionally, TranslateGemma may respond well to other prompting techniques that support use cases beyond the provided chat template, such as Automatic Translation Post-Editing. As these are not officially supported, such prompts should be crafted manually using the special control tokens and structures specified in the Gemma 3 Technical Report, and sent directly to the tokenizer or processor instead of through the apply_chat_template() function. The TranslateGemma team is interested in hearing about your experiences with alternate prompts; please reach out with any questions and feedback.
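As one illustration, the following sketch hand-builds a post-editing-style prompt with the Gemma 3 <start_of_turn>/<end_of_turn> control tokens and sends it straight to the processor. The instruction wording and the draft German translation are hypothetical assumptions of ours, not an officially supported template.

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "google/translategemma-4b-it"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# Hand-built user turn using Gemma 3 control tokens; the post-editing
# instruction and the draft translation below are illustrative assumptions.
prompt = (
    "<start_of_turn>user\n"
    "Post-edit the following Czech-to-German translation.\n"
    "Source: V nejhorším případě i k prasknutí čočky.\n"
    "Draft: Im schlimmsten Fall kann sogar die Linse platzen.\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(generation[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))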
With Pipelines
from transformers import pipeline
import torch
pipe = pipeline(
    "image-text-to-text",
    model="google/translategemma-4b-it",
    device="cuda",
    dtype=torch.bfloat16
)

# ---- Text Translation ----
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_lang_code": "cs",
                "target_lang_code": "de-DE",
                "text": "V nejhorším případě i k prasknutí čočky.",
            }
        ],
    }
]
output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])

# ---- Text Extraction and Translation ----
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source_lang_code": "cs",
                "target_lang_code": "de-DE",
                "url": "https://c7.alamy.com/comp/2YAX36N/traffic-signs-in-czech-republic-pedestrian-zone-2YAX36N.jpg",
            },
        ],
    }
]
output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
With direct initialization
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
model_id = "google/translategemma-4b-it"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# ---- Text Translation ----
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_lang_code": "cs",
                "target_lang_code": "de-DE",
                "text": "V nejhorším případě i k prasknutí čočky.",
            }
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
input_len = len(inputs["input_ids"][0])
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
generation = generation[0][input_len:]
decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
# ---- Text Extraction and Translation ----
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source_lang_code": "cs",
                "target_lang_code": "de-DE",
                "url": "https://c7.alamy.com/comp/2YAX36N/traffic-signs-in-czech-republic-pedestrian-zone-2YAX36N.jpg",
            },
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
input_len = len(inputs["input_ids"][0])  # recomputed: image inputs yield a different prompt length
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
generation = generation[0][input_len:]
decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
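For convenience, the same template can be wrapped in a small helper that translates a list of sentences, reusing the processor and model initialized above. This loop is a sketch of our own (the translate name and the one-at-a-time looping are not part of the model card; a production setup would batch inputs):

def translate(texts, source_lang_code, target_lang_code):
    # Translate each string with the chat template shown above.
    results = []
    for text in texts:
        messages = [{
            "role": "user",
            "content": [{
                "type": "text",
                "source_lang_code": source_lang_code,
                "target_lang_code": target_lang_code,
                "text": text,
            }],
        }]
        inputs = processor.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True,
            return_dict=True, return_tensors="pt"
        ).to(model.device)
        input_len = len(inputs["input_ids"][0])
        with torch.inference_mode():
            generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
        results.append(processor.decode(generation[0][input_len:], skip_special_tokens=True))
    return results

print(translate(["V nejhorším případě i k prasknutí čočky."], "cs", "de-DE"))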
Citation
@article{gemmatranslate2026,
  title={{TranslateGemma Technical Report}},
  url={https://arxiv.org/pdf/2601.09012},
  publisher={Google DeepMind},
  author={{Google Translate Research Team} and
          Finkelstein, Mara and
          Caswell, Isaac and
          Domhan, Tobias and
          Peter, Jan-Thorsten and
          Juraska, Juraj and
          Riley, Parker and
          Deutsch, Daniel and
          Dilanni, Cole and
          Cherry, Colin and
          Briakou, Eleftheria and
          Nielsen, Elizabeth and
          Luo, Jiaming and
          Agrawal, Sweta and
          Xu, Wenda and
          Kats, Erin and
          Jaskiewicz, Stephane and
          Freitag, Markus and
          Vilar, David},
  year={2026}
}
Model Data
Data used for model training and how the data was processed.
Training Dataset
The models were fine-tuned from the original Gemma 3 checkpoints using parallel data from a wide variety of sources. The TranslateGemma models used 4.3 billion tokens during supervised fine-tuning (SFT) and 10.2 million tokens during the reinforcement learning phase. The key components were:
Implementation Information
Details about the model internals.
Hardware
TranslateGemma was trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p and TPUv5e). TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain:
Software
Training was done using JAX and ML Pathways. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google’s latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models such as these. Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models: “the ‘single controller’ programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow.”
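To illustrate the single-controller pattern, here is a minimal, self-contained JAX sketch: one Python process defines a jit-compiled update step and drives the whole loop. It is a toy linear-regression example of our own and bears no relation to the actual TranslateGemma training code.

import jax
import jax.numpy as jnp

@jax.jit
def train_step(params, batch):
    # Toy squared-error loss; stands in for the real translation objective.
    def loss_fn(p):
        pred = batch["x"] @ p
        return jnp.mean((pred - batch["y"]) ** 2)
    loss, grads = jax.value_and_grad(loss_fn)(params)
    # Plain SGD update; the single Python process orchestrates every step.
    return params - 0.01 * grads, loss

params = jnp.zeros((4,))
batch = {"x": jnp.ones((8, 4)), "y": jnp.ones((8,))}
for _ in range(10):
    params, loss = train_step(params, batch)
print(loss)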
Evaluation
Model evaluation metrics and results.
Benchmark Results
The models were evaluated against a large collection of datasets and metrics covering different aspects of text generation.
| | 4B | 12B | 27B |
|---|---|---|---|
| WMT24++ (55 langs) | | | |
| MetricX ↓ | 5.32 | 3.60 | 3.09 |
| Comet ↑ | 81.6 | 83.5 | 84.4 |
| WMT25 (10 langs) | | | |
| MQM ↓ | N/A | 7.94 | 5.86 |
| Vistra (4 langs)* | | | |
| MetricX ↓ | 2.57 | 2.08 | 1.57 |
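For reference, Comet scores like those above can be computed for your own outputs with the unbabel-comet package. The sketch below is ours: the checkpoint choice (Unbabel/wmt22-comet-da) and the German strings are illustrative assumptions, since the card does not state which COMET variant was used.

from comet import download_model, load_from_checkpoint

# Download an off-the-shelf COMET checkpoint (assumed variant, see above).
model_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(model_path)

# One toy segment: source, machine translation, and human reference.
data = [{
    "src": "V nejhorším případě i k prasknutí čočky.",
    "mt": "Im schlimmsten Fall kann sogar die Linse platzen.",
    "ref": "Im schlimmsten Fall platzt sogar die Linse.",
}]
result = comet_model.predict(data, batch_size=8, gpus=0)
print(result.system_score)  # corpus-level score; higher is better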
Ethics and Safety
Ethics and safety evaluation approach and results.
Evaluation Approach
Our evaluation methods include structured evaluations and internal red-teaming of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of categories relevant to ethics and safety, including:
Evaluation Results
For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models’ performance with respect to ungrounded inferences.
Usage and Limitations
These models have certain limitations that users should be aware of.
Intended Usage
The models have been trained with the explicit goal of producing text translations from text or image input. No claims are made about other capabilities of these models.
Limitations
Ethical Considerations and Risks
The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:
Risks identified and mitigations:
Benefits
At the time of release, this family of models provides high-performance translation model implementations fine-tuned from Gemma 3 models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.