atla/selene-mini

### Model Summary
Atla Selene Mini is a **state-of-the-art small language model-as-a-judge (SLMJ)**. Selene Mini achieves comparable performance to models 10x its size, **outperforming GPT-4o on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench), [EvalBiasBench](https://arxiv.org/abs/2407.06551), and [AutoJ](https://arxiv.org/html/2310.05470v2)**.

Post-trained from Llama-3.1-8B across a wide range of evaluation tasks and scoring criteria, Selene Mini **outperforms prior small models overall across 11 benchmarks** covering different evaluation tasks.

It is also the **#1 8B generative model on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench)**.

The large version of this model is out now. Get started with the **world's most powerful evaluation model** for free [here](https://www.atla-ai.com/sign-up?utm_source=ollama&utm_medium=community&utm_campaign=WL_OL_modelcard_communitypost_sel1minilaunch).

### Use cases

Selene Mini can be used as a **general-purpose evaluation model**. It supports different inputs and scoring scales, generates structured evaluation outputs, and provides qualitative critiques with reasoning.

For best results, use **the prompts we used for training, provided [here](placeholder).**
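
As a rough illustration of how one prompt layout can serve different rubrics and scoring scales, the sketch below assembles a prompt modelled on the example in the Python Quickstart of this card. The helper name `build_eval_prompt` and the 3-point rubric are hypothetical, not part of the model or of the training prompts; prefer the training prompts linked above where they fit.

```python
def build_eval_prompt(instruction: str, response: str, criterion: str,
                      score_descriptions: dict[int, str]) -> str:
    """Assemble an absolute-scoring prompt (hypothetical helper).

    The wording copies the Quickstart example in this card; the score range
    is taken from the keys of `score_descriptions`.
    """
    low, high = min(score_descriptions), max(score_descriptions)
    rubric_lines = "\n".join(
        f"Score {score}: {text}"
        for score, text in sorted(score_descriptions.items())
    )
    return (
        "You are tasked with evaluating an LLM response based on a given "
        "instruction and scoring rubric. Provide comprehensive feedback on the "
        "response, strictly adhering to the scoring rubric. "
        f"Follow this with a score between {low} and {high}.\n\n"
        "Your reply should strictly follow this format:\n"
        "**Reasoning:** <Your feedback>\n"
        f"**Result:** <a score between {low} and {high}>\n\n"
        "Scoring Rubric:\n"
        f"{criterion}\n"
        f"{rubric_lines}\n\n"
        "Here is the data:\n"
        f"Instruction: {instruction}\n"
        f"Response: {response}"
    )


# Hypothetical 3-point rubric, used only to show a different scoring scale.
prompt = build_eval_prompt(
    instruction="Tell me a joke.",
    response="Why did the chicken cross the road? To get to the other side",
    criterion="Does the response effectively use humor or wit?",
    score_descriptions={
        1: "The response is devoid of any humor or wit.",
        2: "The response includes humor, but it could be better integrated.",
        3: "The response integrates humor or wit effectively.",
    },
)
```

The resulting string can be passed as the `content` of a single `user` message.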

### CLI Quickstart

Open a terminal and run `ollama run atla/selene-mini`.

### Python Quickstart

**Prerequisites**

* Ollama should be installed and running

* Pull Selene-Mini to use with the Python library: `ollama pull atla/selene-mini`

##### Example

```python
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='atla/selene-mini', messages=[
  {
    'role': 'user',
    'content': """You are tasked with evaluating an LLM response based on a given instruction and scoring rubric. Provide comprehensive feedback on the response, strictly adhering to the scoring rubric. Follow this with a score between 1 and 5.

Your reply should strictly follow this format:
**Reasoning:** <Your feedback>
**Result:** <a score between 1 and 5>

Scoring Rubric:
Does the response effectively use humor or wit to enhance the conversation?
Score 1: The response is devoid of any humor or wit.
Score 2: The response attempts humor, but it falls flat or is inappropriate.
Score 3: The response includes humor or wit, but it could be more effectively integrated.
Score 4: The response uses humor or wit effectively in most instances, enhancing the conversation.
Score 5: The response perfectly integrates humor or wit, greatly enhancing the enjoyment of the conversation.

Here is the data:
Instruction: Tell me a joke.
Response: Why did the chicken cross the road? To get to the other side""",
  },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
```

[Python documentation](https://github.com/ollama/ollama-python)
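
Because the example prompt asks for a reply in the `**Reasoning:** ... **Result:** ...` format, the critique and score can be pulled out with a small parser. This is a minimal sketch, assuming the model follows that format; `Evaluation`, `parse_evaluation`, and the sample reply are hypothetical and not part of the `ollama` library or of actual model output.

```python
import re
from typing import NamedTuple, Optional


class Evaluation(NamedTuple):
    reasoning: str
    score: Optional[int]  # None when no score could be found


# Matches "**Reasoning:** <text> **Result:** <integer>" across multiple lines.
_EVAL_PATTERN = re.compile(
    r"\*\*Reasoning:\*\*\s*(?P<reasoning>.*?)\s*\*\*Result:\*\*\s*(?P<score>\d+)",
    re.DOTALL,
)


def parse_evaluation(text: str) -> Evaluation:
    """Split a judge reply into its critique and integer score."""
    match = _EVAL_PATTERN.search(text)
    if match is None:
        # The reply did not follow the requested format: keep the raw text.
        return Evaluation(reasoning=text.strip(), score=None)
    return Evaluation(match.group("reasoning"), int(match.group("score")))


# Hypothetical reply, written by hand to illustrate the format.
sample_reply = (
    "**Reasoning:** The joke is a familiar classic with little originality.\n\n"
    "**Result:** 3"
)
evaluation = parse_evaluation(sample_reply)
```

With the Quickstart example, `parse_evaluation(response.message.content)` would be the equivalent call. Returning `score=None` rather than raising lets the caller decide whether to retry or log the malformed reply.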

### API Quickstart

##### Example

```bash
curl -X POST http://localhost:11434/api/generate \
-d '{
    "model": "atla/selene-mini",
    "prompt": "Try the prompt above"
}'
```

[API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
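
The same endpoint can be called from Python using only the standard library. The sketch below mirrors the `curl` request and additionally sets `"stream": false`, so that Ollama returns one JSON object instead of a stream of chunks. The function names are hypothetical, and `generate` assumes an Ollama server is listening on the default `localhost:11434`.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str,
                           model: str = "atla/selene-mini") -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Building the request needs no server; only generate() touches the network.
request = build_generate_request("Try the prompt above")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.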

### Selene Resources

* [Hugging Face](https://huggingface.co/AtlaAI/Atla-8B-preview)

* [Technical Report](https://huggingface.co/spaces/AtlaAI/selene-1-mini-tech-report)

* [Discord](https://discord.com/invite/qFCMgkGwUK)
