9 Downloads Updated 2 days ago
ollama run rafi-dev/rb-nano
A 48M-parameter, GPT-2-style decoder-only transformer trained from scratch as part of the Leopard AI Model Suite. Small enough to run on CPU or any GPU; built as a learning/research model, not a production assistant.
ollama run rafi-dev/rb-nano
rb-nano is a tiny chat model pretrained on web text and instruction-tuned for short, single- and multi-turn conversations. At 48M parameters it sits well below the knowledge capacity of mainstream models, so treat it as a fast, lightweight demonstrator rather than a factual reference.
| Type | Decoder-only transformer (GPT-2 family) |
| Parameters | ~48M |
Embedding dim (d_model) |
512 |
| Layers | 10 |
| Attention heads | 8 |
| Context length | 1024 tokens |
| Position embeddings | Learned |
| Norm / activation | LayerNorm, GELU-tanh |
| Attention | Combined QKV, SDPA (flash) |
| Head | Weight-tied to token embeddings |
| Tokenizer | ByteLevel BPE, 32k vocab |
| Format | GGUF, f16 (gpt2 architecture) |
sample-10BT), ~50M tokens. Final val loss ≈ 3.44.The model is trained on a simple user: / ai: turn format (Ollama’s chat template handles this automatically):
user: hello
ai: Hi there! How can I help you today?
user: what is python?
ai:
temperature 0.7
top_k 40
top_p 0.9
repeat_penalty 1.3
Trained on publicly available datasets (FineWeb-Edu, Alpaca, Dolly, CodeAlpaca, ShareGPT). Review each dataset’s license before redistributing derived outputs.
rb-nano was built by Rafi and Buddi — pretrained and finetuned from scratch on a single RTX 4070 (8 GB VRAM). It’s a passion project: proof that a coherent little chat model can be trained end-to-end on consumer hardware.
If you enjoy it and want to support more experiments like this, you can buy us a coffee ☕. Thank you for trying rb-nano — we hope you like it.