9 Downloads Updated 4 weeks ago
ollama run nuroai/avalon-2b
AVALON-2B: The First Sub-3B Self-Reflective Language Model
The first sub-3B parameter model with Self-RAG (Self-Reflective Retrieval-Augmented Generation). It emits [Retrieval] and [No Retrieval] tokens to signal when external knowledge is needed, and [Utility:X] tokens to rate the usefulness of its own answers.
Key Features:
- 1.88B parameters (Q4_K_M quantized to 1.5GB)
- 82.5% Self-RAG token accuracy
- 62.04% MMLU (beats Gemma 4 E2B)
- 40+ tok/s on Apple M3
- Runs on iPhone (12 tok/s on A17 Pro)
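Because the reflection tokens appear inline in the model's output, a client needs to strip them out and act on them. The sketch below shows one way to do that in Python; the token formats ([Retrieval], [No Retrieval], [Utility:X]) come from this card, but the parsing function itself is an illustrative assumption, not part of the released tooling.

```python
import re

# Reflection tokens as described on the model card. The single-digit
# utility score is an assumption about the "X" placeholder.
_TOKEN_RE = re.compile(r"\[(Retrieval|No Retrieval|Utility:\d)\]")

def parse_self_rag(text: str) -> dict:
    """Split a Self-RAG style response into the answer text and its
    reflection tokens, so a caller can decide whether to run retrieval."""
    tokens = _TOKEN_RE.findall(text)
    answer = _TOKEN_RE.sub("", text).strip()
    utility = next(
        (int(t.split(":")[1]) for t in tokens if t.startswith("Utility:")),
        None,
    )
    return {
        "answer": answer,
        "needs_retrieval": "Retrieval" in tokens,
        "utility": utility,
    }
```

A caller would feed the raw completion from `ollama run nuroai/avalon-2b` (or the Ollama API) into `parse_self_rag`, fetch documents and re-prompt when `needs_retrieval` is true, and otherwise return the cleaned answer directly.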
Links:
- HuggingFace: https://huggingface.co/nuroai/Avalon-2B
- GitHub: https://github.com/nuro-labs/avalon-2b
- Paper: https://github.com/nuro-labs/avalon-2b/blob/main/AVALON_2B_Paper.pdf
Built by Nuro AI Labs (UK) - https://nuro.one