5,341 downloads · Updated 11 months ago

14 models

| Name | Size | Context window | Input | Updated |
| --- | --- | --- | --- | --- |
| gemma-2-2b-jpn-it:latest | 2.8GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q3_K_S | 1.4GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q3_K_M | 1.5GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q3_K_L | 1.6GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q4_0 | 1.6GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q4_1 | 1.8GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q4_K_S | 1.6GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q4_K_M | 1.7GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q5_0 | 1.9GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q5_1 | 2.0GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q5_K_S | 1.9GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q5_K_M | 1.9GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q6_K | 2.2GB | 8K | Text | 11 months ago |
| gemma-2-2b-jpn-it:q8_0 | 2.8GB | 8K | Text | 11 months ago |
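Any of these quantized tags can be pulled and queried through a local Ollama server. The snippet below is a minimal sketch using Ollama's HTTP chat endpoint via the Python `requests` library; it assumes `ollama serve` is running on the default port and that the chosen tag has already been pulled (the `q4_K_M` tag here is just an example, any tag from the table works).

```python
# Minimal sketch: query a locally pulled tag through Ollama's HTTP chat API.
# Assumes the Ollama server is listening on the default port (11434) and that
# the tag has been pulled beforehand, e.g. `ollama pull gemma-2-2b-jpn-it:q4_K_M`.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma-2-2b-jpn-it:q4_K_M",  # any tag from the table above
        "messages": [
            # "Please write a poem about machine learning."
            {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
        ],
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```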
https://huggingface.co/google/gemma-2-2b-jpn-it
The following is taken from the model card linked above.
Terms of Use: Terms
Authors: Google
Summary description and brief definition of inputs and outputs.
Gemma is a series of best-in-class open models and draws inspiration and technological lineage from the Gemini family of models. They are text-to-text, decoder-only large language models with open weights. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
Gemma-2-JPN is a Gemma 2 2B model fine-tuned on Japanese text. It supports the Japanese language at the same level of performance as English-only queries on Gemma 2.
Below we share some code snippets to help you get started quickly with running the model. First, install the Transformers library:

```
pip install -U transformers
```

Then, copy the snippet from the section that is relevant for your use case.
Running with the pipeline API:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-2-2b-jpn-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    # "Please write a poem about machine learning."
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]

outputs = pipe(messages, return_full_text=False, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"].strip()
print(assistant_response)
```
It can also be used for translation, as follows:

```python
translation_input_text = f"Translate the following poem from Japanese to English:\n\n{assistant_response}"
messages = [
    {"role": "user", "content": translation_input_text},
]

outputs = pipe(messages, return_full_text=False, max_new_tokens=1024)
translated_response = outputs[0]["generated_text"].strip()
print(translated_response)
```
Running the model directly with AutoModelForCausalLM:

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-jpn-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True, return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(generated_text.strip())
```
The native weights of this model were exported in bfloat16 precision. You can also use float32 if you skip the dtype, but no precision increase will occur (the model weights will simply be upcast to float32). See the example below.

Upcasting to torch.float32:
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-jpn-it",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True, return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(generated_text.strip())
```
Data used for model training and how the data was processed.
These models were trained on a dataset of text data that includes a wide variety of sources, totaling 8 trillion tokens. Here are the key components:
The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats.
Here are the key data cleaning and filtering methods applied to the training data:
Details about the model internals.
Gemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5p).
Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain:
These advantages are aligned with Google’s commitments to operate sustainably.
Training was done using JAX and ML Pathways.
JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. It is especially suitable for foundation models, including large language models like these.
Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models: "the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow."
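As a rough illustration of that "single controller" style (one Python process driving whatever accelerator JAX can see), here is a toy, self-contained JAX sketch; it is purely illustrative and is not the actual Gemma training code:

```python
# Toy JAX sketch (illustrative only, not the Gemma training code): one Python
# process defines a jit-compiled loss and gradient, and JAX dispatches the work
# to whatever backend it finds (TPU, GPU, or CPU).
import jax
import jax.numpy as jnp

@jax.jit
def loss_fn(w, x, y):
    # Toy least-squares loss standing in for a real training objective.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

grad_fn = jax.jit(jax.grad(loss_fn))  # gradient w.r.t. the first argument, w

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 16))
y = jax.random.normal(key, (128,))
w = jnp.zeros(16)

for _ in range(100):
    w = w - 0.1 * grad_fn(w, x, y)  # plain SGD step

print("devices:", jax.devices())
print("final loss:", loss_fn(w, x, y))
```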
To assess the quality of this model, we collected a diverse set of Japanese prompts and evaluated performance using an LLM-as-a-judge approach against GPT-3.5. The rating system is a 7-point scale (MuchBetterThan, BetterThan, SlightlyBetterThan, AboutTheSame, SlightlyWorse, WorseThan, MuchWorseThan), mapped to the numerical scores 1.5, 1.0, 0.5, 0, -0.5, -1.0, and -1.5 respectively. We also tracked the ability of the model to answer in the correct language: for a Japanese prompt, the model should typically answer in Japanese rather than defaulting to English.
| Benchmark | Gemma-2-IT | Gemma-2-IT-JPN |
| --- | --- | --- |
| Preference vs GPT-3.5 | -0.25 ± 0.05 | 0.03 ± 0.04 |
| Language correctness | 86.47% | 98.24% |
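For concreteness, here is a tiny sketch of how per-prompt judge verdicts on that scale would be averaged into the preference score reported above. The category-to-score mapping comes from the text; the example verdict list is made up and the helper name is our own:

```python
# Illustrative only: mapping categorical judge verdicts to the numerical scale
# described above and averaging them into a single preference score.
SCORES = {
    "MuchBetterThan": 1.5,
    "BetterThan": 1.0,
    "SlightlyBetterThan": 0.5,
    "AboutTheSame": 0.0,
    "SlightlyWorse": -0.5,
    "WorseThan": -1.0,
    "MuchWorseThan": -1.5,
}

def preference_score(verdicts: list[str]) -> float:
    """Mean numerical score over per-prompt judge verdicts."""
    return sum(SCORES[v] for v in verdicts) / len(verdicts)

# Made-up verdicts for four prompts:
verdicts = ["AboutTheSame", "SlightlyBetterThan", "WorseThan", "BetterThan"]
print(f"{preference_score(verdicts):+.2f}")  # prints the mean verdict score
```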
Ethics and safety evaluation approach and results.
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:
These models have certain limitations that users should be aware of.
Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.
The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:
Risks identified and mitigations:
At the time of release, this family of models provides high-performance open large language model implementations that, compared to similarly sized models, are designed from the ground up for Responsible AI development.