Updated 3 weeks ago
026d80ef2256 · 4.9GB
Readme
Model Summary
Atla Selene Mini is a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini achieves performance comparable to models 10x its size, outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ.
Post-trained from Llama-3.1-8B on a wide range of evaluation tasks and scoring criteria, Selene Mini outperforms prior small models overall across 11 benchmarks spanning different evaluation tasks.
It is also the #1 8B generative model on RewardBench.
The large version of this model is out now. Get started with the world’s most powerful evaluation model for free here.
Use cases
Selene Mini can be used as a general-purpose evaluation model. It supports different input formats and scoring scales, generates structured evaluation outputs, and provides qualitative critiques with reasoning.
For best results, use the prompts we used for training, available here.
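When many responses are graded against the same rubric, it can help to fill the prompt from a template. The sketch below copies the 1-5 rubric prompt from the Python Quickstart example below and only substitutes the criterion, the per-score descriptions, the instruction, and the response. `RUBRIC_TEMPLATE` and `build_rubric_prompt` are hypothetical names, not part of any Atla or Ollama API, and the template covers only that one 1-5 format; other scoring scales should follow Atla's published training prompts.

```python
# Hypothetical helper: the template text mirrors the 1-5 rubric prompt from the
# Python Quickstart example; it is not an official Atla or Ollama API.
RUBRIC_TEMPLATE = """You are tasked with evaluating an LLM response based on a given instruction and scoring rubric. Provide comprehensive feedback on the response, strictly adhering to the scoring rubric. Follow this with a score between 1 and 5.
Your reply should strictly follow this format:
**Reasoning:** <Your feedback>
**Result:** <a score between 1 and 5>
Scoring Rubric:
{criteria}
{score_lines}
Here is the data:
Instruction: {instruction}
Response: {response}"""


def build_rubric_prompt(criteria: str, descriptions: dict[int, str],
                        instruction: str, response: str) -> str:
    """Return an evaluation prompt for a 1-5 scoring rubric."""
    # The template hard-codes a 1-5 scale, so require exactly those keys.
    if sorted(descriptions) != [1, 2, 3, 4, 5]:
        raise ValueError("expected descriptions for scores 1 through 5")
    score_lines = "\n".join(
        f"Score {score}: {descriptions[score]}" for score in range(1, 6)
    )
    return RUBRIC_TEMPLATE.format(
        criteria=criteria,
        score_lines=score_lines,
        instruction=instruction,
        response=response,
    )
```

The returned string can be passed as the `content` of a user message to `chat(...)` exactly like the hand-written prompt in the example.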
CLI Quickstart
Open a terminal and run:
ollama run atla/selene-mini
Python Quickstart
Prerequisites
Ollama should be installed and running
Pull Selene-Mini to use with the Python library:
ollama pull atla/selene-mini
Example
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='atla/selene-mini', messages=[
    {
        'role': 'user',
        'content': """You are tasked with evaluating an LLM response based on a given instruction and scoring rubric. Provide comprehensive feedback on the response, strictly adhering to the scoring rubric. Follow this with a score between 1 and 5.
Your reply should strictly follow this format:
**Reasoning:** <Your feedback>
**Result:** <a score between 1 and 5>
Scoring Rubric:
Does the response effectively use humor or wit to enhance the conversation?
Score 1: The response is devoid of any humor or wit.
Score 2: The response attempts humor, but it falls flat or is inappropriate.
Score 3: The response includes humor or wit, but it could be more effectively integrated.
Score 4: The response uses humor or wit effectively in most instances, enhancing the conversation.
Score 5: The response perfectly integrates humor or wit, greatly enhancing the enjoyment of the conversation.
Here is the data:
Instruction: Tell me a joke.
Response: Why did the chicken cross the road? To get to the other side""",
    },
])

print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
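The prompt above asks the model to answer with `**Reasoning:**` followed by `**Result:**`. A minimal sketch for splitting `response.message.content` into the critique and the integer score, assuming the reply follows that format; the model is not guaranteed to comply, so the hypothetical helper `parse_judgement` returns `None` on a mismatch instead of raising.

```python
import re
from typing import Optional, Tuple

# Matches the "**Reasoning:** ... **Result:** <score>" format requested in the
# prompt; DOTALL lets the reasoning span multiple lines.
_JUDGEMENT_PATTERN = re.compile(
    r"\*\*Reasoning:\*\*\s*(?P<reasoning>.*?)\s*\*\*Result:\*\*\s*(?P<result>\d+)",
    re.DOTALL,
)


def parse_judgement(text: str) -> Optional[Tuple[str, int]]:
    """Return (reasoning, score), or None if the reply does not match the format."""
    match = _JUDGEMENT_PATTERN.search(text)
    if match is None:
        return None
    return match.group("reasoning"), int(match.group("result"))
```

For example, `parse_judgement(response.message.content)` yields a `(reasoning, score)` tuple that can be logged or aggregated across many evaluations.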
API Quickstart
Example:
curl -X POST http://localhost:11434/api/generate \
  -d '{
    "model": "atla/selene-mini",
    "prompt": "Try the prompt above"
  }'
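The same HTTP call can be made from Python without the `ollama` package. A minimal sketch using only the standard library, assuming the default local server address from the curl example; `build_generate_request` and `generate` are hypothetical helper names. It sets `"stream": false` so Ollama returns a single JSON object whose `response` field holds the full reply, rather than a stream of partial chunks.

```python
import json
import urllib.request

# Default local Ollama endpoint, as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "atla/selene-mini") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        body = json.load(resp)
    # With streaming disabled, the whole reply is in the "response" field.
    return body["response"]
```

Separating request construction from sending keeps the payload easy to inspect or reuse with a different HTTP client.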