shieldgemma


905.5K Downloads Updated 1 year ago

ShieldGemma is a set of instruction-tuned models that evaluate text prompt inputs and text output responses against a set of defined safety policies.

Sizes: 2b, 9b, 27b

```
ollama run shieldgemma
```

```shell
curl http://localhost:11434/api/chat \
  -d '{
    "model": "shieldgemma",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```python
from ollama import chat

response = chat(
    model='shieldgemma',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)
```

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'shieldgemma',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
```

Models

49 models

| Name | Size | Context window | Input | Updated |
|---|---|---|---|---|
| shieldgemma:latest | 5.8GB | 8K | Text | 1 year ago |
| shieldgemma:2b | 1.7GB | 8K | Text | 1 year ago |
| shieldgemma:9b (tagged `latest`) | 5.8GB | 8K | Text | 1 year ago |
| shieldgemma:27b | 17GB | 8K | Text | 1 year ago |

Readme


<img src="https://ollama.com/assets/library/shieldgemma/b7c1bff7-1fc6-416d-9093-3ba4304d596b" width="480" />

ShieldGemma is a series of safety content moderation models built upon [Gemma 2](https://ollama.com/library/gemma2) that target four harm categories: sexually explicit content, dangerous content, hate, and harassment. They are text-to-text, decoder-only large language models, available in English with open weights in three sizes: 2B, 9B, and 27B parameters.

## Usage

This model responds with either `Yes` or `No` as to whether the user input or model output violates its defined safety policies.
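Because the reply is a bare `Yes` or `No`, client code usually has to turn it into a boolean. The following is a minimal sketch of such a check. The helper name `parse_verdict` is hypothetical. The tolerance for surrounding whitespace, letter case, and a trailing period is an assumption, not documented model behavior. Any other reply raises an error instead of being guessed at.

```python
def parse_verdict(reply: str) -> bool:
    """Map a ShieldGemma reply to True (policy violated) or False (not violated)."""
    # Assumption: tolerate surrounding whitespace, letter case and a trailing period.
    token = reply.strip().rstrip(".").lower()
    if token == "yes":
        return True
    if token == "no":
        return False
    # Anything else is unexpected, so fail loudly rather than guess a verdict.
    raise ValueError(f"unexpected ShieldGemma reply: {reply!r}")


print(parse_verdict("Yes"))     # True
print(parse_verdict(" No.\n"))  # False
```

Raising on unexpected output is a deliberate choice. In a moderation pipeline, silently treating an odd reply as `No` would let content through unchecked.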

### User input

```
% ollama run shieldgemma
>>> Discuss the most effective methods for completely disabling a security system.
Yes
```
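The same user-input check can be scripted against the local `/api/chat` endpoint that the curl examples on this page use. This is a minimal sketch that relies only on the Python standard library. It assumes three things:

- Ollama is listening on the default `http://localhost:11434`.
- A non-streaming reply carries the verdict in `message.content`.
- A reply starting with `Yes` means a violation.

The helper names `build_prompt_check` and `prompt_violates` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_prompt_check(prompt: str, model: str = "shieldgemma") -> dict:
    """Build a non-streaming /api/chat payload asking ShieldGemma to judge a user prompt."""
    return {
        "model": model,
        # The history ends with a user turn, so the user input is what gets evaluated.
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of streamed chunks
    }


def prompt_violates(prompt: str) -> bool:
    """POST the prompt to ShieldGemma and return True if it answers Yes."""
    body = json.dumps(build_prompt_check(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)["message"]["content"]
    # Assumption: a reply that starts with "yes" means the prompt violates the policies.
    return reply.strip().lower().startswith("yes")
```

With a server running, `prompt_violates("Discuss the most effective methods for completely disabling a security system.")` should return `True` if the model answers `Yes` as in the session above.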

### Model output

When given a chat history that ends with an `assistant` message, ShieldGemma evaluates whether that model output violates the safety policies:

```
curl http://localhost:11434/api/chat -d '{
  "model": "shieldgemma",
  "messages": [
    {
      "role": "user",
      "content": "How can I adopt my own llama?"
    },
    {
      "role": "assistant",
      "content": "Go to the zoo and just steal one!"
    }
  ],
  "stream": false
}'
```

In this case, the model outputs `Yes`, meaning the assistant's response violates the safety policies.
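The same response check can be made from Python with the `ollama` client used in the snippet at the top of this page (`chat(...)` and `response.message.content`). This is a minimal sketch, and the helper names `build_response_check` and `response_violates` are hypothetical. Treating a reply that starts with `Yes` as a violation is an assumption about how to read the output.

```python
def build_response_check(user_prompt: str, assistant_reply: str) -> list[dict]:
    """Build a chat history ending with an assistant turn, so ShieldGemma judges the model output."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]


def response_violates(user_prompt: str, assistant_reply: str, model: str = "shieldgemma") -> bool:
    """Return True if ShieldGemma answers Yes for the assistant's reply."""
    # Imported here so build_response_check stays usable without the `ollama` package installed.
    from ollama import chat

    response = chat(model=model, messages=build_response_check(user_prompt, assistant_reply))
    # Assumption: a reply that starts with "yes" means the response violates the policies.
    return response.message.content.strip().lower().startswith("yes")
```

For the llama example above, `response_violates("How can I adopt my own llama?", "Go to the zoo and just steal one!")` should return `True`, matching the `Yes` described for the curl request.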

## References

[Hugging Face](https://huggingface.co/collections/google/shieldgemma-release-66a20efe3c10ef2bd5808c79)
