Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models

2B

99 Pulls Updated 7 weeks ago

7 weeks ago

e955053d19b8 · 1.6GB

model
gemma2
·
2.61B
·
Q4_0
license
Gemma Terms of Use Last modified: February 21, 2024 By using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement. Section 1: DEFINITIONS 1.1 Definitions (a) "Agreement" or "Gemma Terms of Use" means these terms and conditions that govern the use, reproduction, Distribution or modification of the Gemma Services and any terms and conditions incorporated by reference. (b) "Distribution" or "Distribute" means any transmission, publication, or other sharing of Gemma or Model Derivatives to a third party, including by providing or making Gemma or its functionality available as a hosted service via API, web access, or any other electronic or remote means ("Hosted Service"). (c) "Gemma" means the set of machine learning language models, trained model weights and parameters identified at ai.google.dev/gemma, regardless of the source that you obtained it from. (d) "Google" means Google LLC. (e) "Model Derivatives" means all (i) modifications to Gemma, (ii) works based on Gemma, or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Gemma, to that model in order to cause that model to perform similarly to Gemma, including distillation methods that use intermediate data representations or methods based on the generation of synthetic data Outputs by Gemma for training that model. For clarity, Outputs are not deemed Model Derivatives. (f) "Output" means the information content output of Gemma or a Model Derivative that results from operating or otherwise using Gemma or the Model Derivative, including via a Hosted Service. 1.2 As used in this Agreement, "including" means "including without limitation". Section 2: ELIGIBILITY AND USAGE 2.1 Eligibility You represent and warrant that you have the legal capacity to enter into this Agreement (including being of sufficient age of consent). If you are accessing or using any of the Gemma Services for or on behalf of a legal entity, (a) you are entering into this Agreement on behalf of yourself and that legal entity, (b) you represent and warrant that you have the authority to act on behalf of and bind that entity to this Agreement and (c) references to "you" or "your" in the remainder of this Agreement refers to both you (as an individual) and that entity. 2.2 Use You may use, reproduce, modify, Distribute, perform or display any of the Gemma Services only in accordance with the terms of this Agreement, and must not violate (or encourage or permit anyone else to violate) any term of this Agreement. Section 3: DISTRIBUTION AND RESTRICTIONS 3.1 Distribution and Redistribution You may reproduce or Distribute copies of Gemma or Model Derivatives if you meet all of the following conditions: You must include the use restrictions referenced in Section 3.2 as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Gemma or Model Derivatives and you must provide notice to subsequent users you Distribute to that Gemma or Model Derivatives are subject to the use restrictions in Section 3.2. You must provide all third party recipients of Gemma or Model Derivatives a copy of this Agreement. You must cause any modified files to carry prominent notices stating that you modified the files. All Distributions (other than through a Hosted Service) must be accompanied by a "Notice" text file that contains the following notice: "Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms". You may add your own intellectual property statement to your modifications and, except as set forth in this Section, may provide additional or different terms and conditions for use, reproduction, or Distribution of your modifications, or for any such Model Derivatives as a whole, provided your use, reproduction, modification, Distribution, performance, and display of Gemma otherwise complies with the terms and conditions of this Agreement. Any additional or different terms and conditions you impose must not conflict with the terms of this Agreement. 3.2 Use Restrictions You must not use any of the Gemma Services: for the restricted uses set forth in the Gemma Prohibited Use Policy at ai.google.dev/gemma/prohibited_use_policy ("Prohibited Use Policy"), which is hereby incorporated by reference into this Agreement; or in violation of applicable laws and regulations. To the maximum extent permitted by law, Google reserves the right to restrict (remotely or otherwise) usage of any of the Gemma Services that Google reasonably believes are in violation of this Agreement. 3.3 Generated Output Google claims no rights in Outputs you generate using Gemma. You and your users are solely responsible for Outputs and their subsequent uses. Section 4: ADDITIONAL PROVISIONS 4.1 Updates Google may update Gemma from time to time, and you must make reasonable efforts to use the latest version of Gemma. 4.2 Trademarks Nothing in this Agreement grants you any rights to use Google's trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between you and Google. Google reserves any rights not expressly granted herein. 4.3 DISCLAIMER OF WARRANTY UNLESS REQUIRED BY APPLICABLE LAW, THE GEMMA SERVICES, AND OUTPUTS, ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR OR DISTRIBUTING ANY OF THE GEMMA SERVICES OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR USE OR DISTRIBUTION OF ANY OF THE GEMMA SERVICES OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT. 4.4 LIMITATION OF LIABILITY TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY, CONTRACT, OR OTHERWISE, UNLESS REQUIRED BY APPLICABLE LAW, SHALL GOOGLE OR ITS AFFILIATES BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL, OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO, ANY OF THE GEMMA SERVICES OR OUTPUTS EVEN IF GOOGLE OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 4.5 Term, Termination, and Survival The term of this Agreement will commence upon your acceptance of this Agreement (including acceptance by your use, modification, or Distribution, reproduction, performance or display of any portion or element of the Gemma Services) and will continue in full force and effect until terminated in accordance with the terms of this Agreement. Google may terminate this Agreement if you are in breach of any term of this Agreement. Upon termination of this Agreement, you must delete and cease use and Distribution of all copies of Gemma and Model Derivatives in your possession or control. Sections 1, 2.1, 3.3, 4.2 to 4.9 shall survive the termination of this Agreement. 4.6 Governing Law and Jurisdiction This Agreement will be governed by the laws of the State of California without regard to choice of law principles. The UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The state and federal courts of Santa Clara County, California shall have exclusive jurisdiction of any dispute arising out of this Agreement. 4.7 Severability If any provision of this Agreement is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein. 4.8 Entire Agreement This Agreement states all the terms agreed between the parties and supersedes all other agreements between the parties as of the date of acceptance relating to its subject matter. 4.9 No Waiver Google will not be treated as having waived any rights by not exercising (or delaying the exercise of) any rights under this Agreement.
params
{"num_ctx":3072,"num_predict":3072,"stop":["<start_of_turn>","<end_of_turn>"]}
template
{{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 }} {{- if or (eq .Role "user") (eq .Role "system") }}<start_of_turn>user {{ .Content }}<end_of_turn> {{ if $last }}<start_of_turn>model {{ end }} {{- else if eq .Role "assistant" }}<start_of_turn>model {{ .Content }}{{ if not $last }}<end_of_turn> {{ end }} {{- end }} {{- end }}

Readme

Requires ollama version at least v0.1.48

  • Quantizations with i-matrix calibration_datav3.txt
  • Safetensors converted to fp32
  • Context length set to 3072, max context length is 8192

Gemma 2 model card

Model Page: Gemma

Resources and Technical Documentation:

Terms of Use: Terms

Authors: Google

Model Information

Summary description and brief definition of inputs and outputs.

Description

Gemma is a family of lightweight, state-of-the-art open models from Google,
built from the same research and technology used to create the Gemini models.
They are text-to-text, decoder-only large language models, available in English,
with open weights for both pre-trained variants and instruction-tuned variants.
Gemma models are well-suited for a variety of text generation tasks, including
question answering, summarization, and reasoning. Their relatively small size
makes it possible to deploy them in environments with limited resources such as
a laptop, desktop or your own cloud infrastructure, democratizing access to
state of the art AI models and helping foster innovation for everyone.

Usage

Below we share some code snippets on how to get quickly started with running the model. First make sure to pip install -U transformers, then copy the snippet from the section that is relevant for your usecase.

Running the model on a single / multi GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using different precisions

The native weights of this model were exported in bfloat16 precision. You can use float16, which may be faster on certain hardware, indicating the torch_dtype when loading the model. For convenience, the float16 revision of the repo contains a copy of the weights already converted to that precision.

You can also use float32 if you skip the dtype, but no precision increase will occur (model weights will just be upcasted to float32). See examples below.

  • Using torch.float16
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.float16,
    revision="float16",
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
  • Using torch.bfloat16
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
  • Upcasting to torch.float32
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Quantized Versions through bitsandbytes

  • Using 8-bit precision (int8)
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
  • Using 4-bit precision
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Other optimizations

  • Flash Attention 2

First make sure to install flash-attn in your environment pip install flash-attn

model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
+   attn_implementation="flash_attention_2"
).to(0)

Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer’s built-in chat template, as shown in the following snippet.

Let’s load the model and apply the chat template to a conversation. In this example, we’ll start with a single user interaction:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "google/gemma-2-9b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,)

chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

At this point, the prompt contains the following text:

<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model

As you can see, each turn is preceded by a <start_of_turn> delimiter and then the role of the entity
(either user, for content supplied by the user, or model for LLM responses). Turns finish with
the <end_of_turn> token.

You can follow this format to build the prompt manually, if you need to do it without the tokenizer’s
chat template.

After the prompt is ready, generation can be performed like this:

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))

Inputs and outputs

  • Input: Text string, such as a question, a prompt, or a document to be
    summarized.
  • Output: Generated English-language text in response to the input, such
    as an answer to a question, or a summary of a document.

Citation

@article{gemma_2024,
    title={Gemma},
    url={https://www.kaggle.com/m/3301},
    DOI={10.34740/KAGGLE/M/3301},
    publisher={Kaggle},
    author={Gemma Team},
    year={2024}
}

Model Data

Data used for model training and how the data was processed.

Training Dataset

These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens.
Here are the key components:

  • Web Documents: A diverse collection of web text ensures the model is exposed
    to a broad range of linguistic styles, topics, and vocabulary. Primarily
    English-language content.
  • Code: Exposing the model to code helps it to learn the syntax and patterns of
    programming languages, which improves its ability to generate code or
    understand code-related questions.
  • Mathematics: Training on mathematical text helps the model learn logical
    reasoning, symbolic representation, and to address mathematical queries.

The combination of these diverse data sources is crucial for training a powerful
language model that can handle a wide variety of different tasks and text
formats.

Data Preprocessing

Here are the key data cleaning and filtering methods applied to the training
data:

  • CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was
    applied at multiple stages in the data preparation process to ensure the
    exclusion of harmful and illegal content.
  • Sensitive Data Filtering: As part of making Gemma pre-trained models safe and
    reliable, automated techniques were used to filter out certain personal
    information and other sensitive data from training sets.
  • Additional methods: Filtering based on content quality and safety in line with
    our policies.

Implementation Information

Details about the model internals.

Hardware

Gemma was trained using the latest generation of
Tensor Processing Unit (TPU) hardware (TPUv5p).

Training large language models requires significant computational power. TPUs,
designed specifically for matrix operations common in machine learning, offer
several advantages in this domain:

  • Performance: TPUs are specifically designed to handle the massive computations
    involved in training LLMs. They can speed up training considerably compared to
    CPUs.
  • Memory: TPUs often come with large amounts of high-bandwidth memory, allowing
    for the handling of large models and batch sizes during training. This can
    lead to better model quality.
  • Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for
    handling the growing complexity of large foundation models. You can distribute
    training across multiple TPU devices for faster and more efficient processing.
  • Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective
    solution for training large models compared to CPU-based infrastructure,
    especially when considering the time and resources saved due to faster
    training.
  • These advantages are aligned with
    Google’s commitments to operate sustainably.

Software

Training was done using JAX and ML Pathways.

JAX allows researchers to take advantage of the latest generation of hardware,
including TPUs, for faster and more efficient training of large models.

ML Pathways is Google’s latest effort to build artificially intelligent systems
capable of generalizing across multiple tasks. This is specially suitable for
foundation models, including large language models like
these ones.

Together, JAX and ML Pathways are used as described in the
paper about the Gemini family of models; “the ‘single
controller’ programming model of Jax and Pathways allows a single Python
process to orchestrate the entire training run, dramatically simplifying the
development workflow.”

Evaluation

Model evaluation metrics and results.

Benchmark Results

These models were evaluated against a large collection of different datasets and
metrics to cover different aspects of text generation:

Benchmark Metric Gemma PT 9B Gemma PT 27B
MMLU 5-shot, top-1 71.3 75.2
HellaSwag 10-shot 81.9 86.4
PIQA 0-shot 81.7 83.2
SocialIQA 0-shot 53.4 53.7
BoolQ 0-shot 84.2 84.8
WinoGrande partial score 80.6 83.7
ARC-e 0-shot 88.0 88.6
ARC-c 25-shot 68.4 71.4
TriviaQA 5-shot 76.6 83.7
Natural Questions 5-shot 29.2 34.5
HumanEval pass@1 40.2 51.8
MBPP 3-shot 52.4 62.6
GSM8K 5-shot, maj@1 68.6 74.0
MATH 4-shot 36.6 42.3
AGIEval 3-5-shot 52.8 55.1
BIG-Bench 3-shot, CoT 68.2 74.9
—————————— ————- ———– ————

Ethics and Safety

Ethics and safety evaluation approach and results.

Evaluation Approach

Our evaluation methods include structured evaluations and internal red-teaming
testing of relevant content policies. Red-teaming was conducted by a number of
different teams, each with different goals and human evaluation metrics. These
models were evaluated against a number of different categories relevant to
ethics and safety, including:

  • Text-to-Text Content Safety: Human evaluation on prompts covering safety
    policies including child sexual abuse and exploitation, harassment, violence
    and gore, and hate speech.
  • Text-to-Text Representational Harms: Benchmark against relevant academic
    datasets such as WinoBias and BBQ Dataset.
  • Memorization: Automated evaluation of memorization of training data, including
    the risk of personally identifiable information exposure.
  • Large-scale harm: Tests for “dangerous capabilities,” such as chemical,
    biological, radiological, and nuclear (CBRN) risks.

Evaluation Results

The results of ethics and safety evaluations are within acceptable thresholds
for meeting internal policies for categories such as child
safety, content safety, representational harms, memorization, large-scale harms.
On top of robust internal evaluations, the results of well-known safety
benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA
are shown here.

Gemma 2.0

Benchmark Metric Gemma 2 IT 9B Gemma 2 IT 27B
RealToxicity average 8.25 8.84
CrowS-Pairs top-1 37.47 36.67
BBQ Ambig 1-shot, top-1 88.58 85.99
BBQ Disambig top-1 82.67 86.94
Winogender top-1 79.17 77.22
TruthfulQA 50.27 51.60
Winobias 1_2 78.09 81.94
Winobias 2_2 95.32 97.22
Toxigen 39.30 38.42
———————— ————- ————— —————-

Usage and Limitations

These models have certain limitations that users should be aware of.

Intended Usage

Open Large Language Models (LLMs) have a wide range of applications across
various industries and domains. The following list of potential uses is not
comprehensive. The purpose of this list is to provide contextual information
about the possible use-cases that the model creators considered as part of model
training and development.

  • Content Creation and Communication
    • Text Generation: These models can be used to generate creative text formats
      such as poems, scripts, code, marketing copy, and email drafts.
    • Chatbots and Conversational AI: Power conversational interfaces for customer
      service, virtual assistants, or interactive applications.
    • Text Summarization: Generate concise summaries of a text corpus, research
      papers, or reports.
  • Research and Education
    • Natural Language Processing (NLP) Research: These models can serve as a
      foundation for researchers to experiment with NLP techniques, develop
      algorithms, and contribute to the advancement of the field.
    • Language Learning Tools: Support interactive language learning experiences,
      aiding in grammar correction or providing writing practice.
    • Knowledge Exploration: Assist researchers in exploring large bodies of text
      by generating summaries or answering questions about specific topics.

Limitations

  • Training Data
    • The quality and diversity of the training data significantly influence the
      model’s capabilities. Biases or gaps in the training data can lead to
      limitations in the model’s responses.
    • The scope of the training dataset determines the subject areas the model can
      handle effectively.
  • Context and Task Complexity
    • LLMs are better at tasks that can be framed with clear prompts and
      instructions. Open-ended or highly complex tasks might be challenging.
    • A model’s performance can be influenced by the amount of context provided
      (longer context generally leads to better outputs, up to a certain point).
  • Language Ambiguity and Nuance
    • Natural language is inherently complex. LLMs might struggle to grasp subtle
      nuances, sarcasm, or figurative language.
  • Factual Accuracy
    • LLMs generate responses based on information they learned from their
      training datasets, but they are not knowledge bases. They may generate
      incorrect or outdated factual statements.
  • Common Sense
    • LLMs rely on statistical patterns in language. They might lack the ability
      to apply common sense reasoning in certain situations.

Ethical Considerations and Risks

The development of large language models (LLMs) raises several ethical concerns.
In creating an open model, we have carefully considered the following:

  • Bias and Fairness
    • LLMs trained on large-scale, real-world text data can reflect socio-cultural
      biases embedded in the training material. These models underwent careful
      scrutiny, input data pre-processing described and posterior evaluations
      reported in this card.
  • Misinformation and Misuse
    • LLMs can be misused to generate text that is false, misleading, or harmful.
    • Guidelines are provided for responsible use with the model, see the
      Responsible Generative AI Toolkit.
  • Transparency and Accountability:
    • This model card summarizes details on the models’ architecture,
      capabilities, limitations, and evaluation processes.
    • A responsibly developed open model offers the opportunity to share
      innovation by making LLM technology accessible to developers and researchers
      across the AI ecosystem.

Risks identified and mitigations:

  • Perpetuation of biases: It’s encouraged to perform continuous monitoring
    (using evaluation metrics, human review) and the exploration of de-biasing
    techniques during model training, fine-tuning, and other use cases.
  • Generation of harmful content: Mechanisms and guidelines for content safety
    are essential. Developers are encouraged to exercise caution and implement
    appropriate content safety safeguards based on their specific product policies
    and application use cases.
  • Misuse for malicious purposes: Technical limitations and developer and
    end-user education can help mitigate against malicious applications of LLMs.
    Educational resources and reporting mechanisms for users to flag misuse are
    provided. Prohibited uses of Gemma models are outlined in the
    Gemma Prohibited Use Policy.
  • Privacy violations: Models were trained on data filtered for removal of PII
    (Personally Identifiable Information). Developers are encouraged to adhere to
    privacy regulations with privacy-preserving techniques.

Benefits

At the time of release, this family of models provides high-performance open
large language model implementations designed from the ground up for Responsible
AI development compared to similarly sized models.

Using the benchmark evaluation metrics described in this document, these models
have shown to provide superior performance to other, comparably-sized open model
alternatives.