ollama run ermwhatesigma420/sigmaAI:80M
An 80M-parameter language model trained from scratch on a personal AMD GPU. This is my first test model; the full 221M version is currently in training.
SigmaAI-80M is the first test model I trained entirely from scratch — no pre-trained base, no fine-tuning on top of someone else’s weights. Every parameter was initialized randomly and learned from raw text data.
This model was primarily a proof of concept to validate the training pipeline before committing to the full 221M run. It showed the stack worked end-to-end: custom tokenizer, data pipeline, training loop, checkpointing, and GGUF export all functioning correctly on consumer AMD hardware.
The full SigmaAI 221M model is currently in training on the same hardware.
| Property | Value |
|---|---|
| Parameters | ~80M |
| Architecture | Transformer (LLaMA-style) |
| Vocabulary size | 32,000 |
| Tokenizer | Custom BPE (trained on the same corpus) |
| Attention | Multi-Head Attention with RoPE |
| FFN | SwiGLU |
| Normalization | RMSNorm |
| Precision | bfloat16 during training |
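
To make two of the building blocks in the table concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward layer. It is illustrative only, not the model's actual code: the function names, the `eps` value, and the list-of-rows weight layout are assumptions, and a real implementation would use PyTorch tensors.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root-mean-square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps is an assumed value
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Tiny example: normalize a 2-d vector, then pass it through a 2 -> 1 -> 2 FFN.
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
out = swiglu_ffn(normed, w_gate=[[1.0, 0.0]], w_up=[[0.0, 1.0]], w_down=[[1.0], [-1.0]])
```

Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term, which is why only a single `weight` vector appears above.
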
| Property | Value |
|---|---|
| Hardware | ASRock AMD Radeon RX 9060 XT Challenger 16GB OC |
| Framework | PyTorch + ROCm 7.2 |
| Optimizer | Fused AdamW |
| Learning rate | 2e-4 with cosine warmup |
| Mixed precision | bfloat16 (AMP) |
| Compiled | Yes — torch.compile() |
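
The table gives the learning rate as "2e-4 with cosine warmup" without further detail. One common reading is a linear warmup to the peak rate followed by cosine decay; the sketch below assumes that shape. The values of `warmup_steps`, `total_steps`, and `min_lr` are hypothetical placeholders, not settings from this training run.

```python
import math

def lr_at(step, peak_lr=2e-4, warmup_steps=1000, total_steps=100_000, min_lr=0.0):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Sample the schedule at a few points.
samples = [lr_at(s) for s in (0, 500, 1000, 50_000, 100_000)]
```
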
The entire training stack was written from scratch, including:

- A custom BPE tokenizer trained on the corpus
- A binary token cache for fast data loading
- A background prefetch thread to keep the GPU saturated
- An auto-restart launcher that resumes from checkpoints on any crash
- Gradient checkpointing to fit larger batches in VRAM
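
As a rough illustration of how a binary token cache and a background prefetch thread can fit together, here is a stdlib-only sketch. It is not the project's actual code: the function names, the 16-bit on-disk format (chosen because a 32,000-entry vocabulary fits in an unsigned 16-bit integer), and the queue depth are all assumptions.

```python
import array
import queue
import threading

def write_token_cache(path, token_ids):
    """Store token ids as raw unsigned 16-bit integers in native byte order."""
    with open(path, "wb") as f:
        array.array("H", token_ids).tofile(f)

def read_batches(path, seq_len):
    """Yield consecutive seq_len-token windows from the cache file."""
    item_size = array.array("H").itemsize
    with open(path, "rb") as f:
        while True:
            chunk = f.read(seq_len * item_size)
            if len(chunk) < seq_len * item_size:
                return  # drop the trailing partial window
            batch = array.array("H")
            batch.frombytes(chunk)
            yield batch.tolist()

def prefetch(iterable, depth=4):
    """Consume `iterable` in a background thread, keeping up to `depth` items ready."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        try:
            for item in iterable:
                q.put(item)  # blocks while the queue is full
        finally:
            q.put(sentinel)  # always signal the end, even if the producer fails

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item
```

A training loop would then iterate over `prefetch(read_batches("tokens.bin", seq_len))` (the file name is hypothetical), so disk reads overlap with GPU work instead of stalling it.
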
Trained on a personal collection of text data including various JSON, JSONL, and plain text files — roughly 2.34 billion tokens total. The tokenizer was trained on a representative sample of the same corpus.
ollama run ermwhatesigma420/sigmaAI:80M
Or with the API:
curl http://localhost:11434/api/generate -d '{
"model": "ermwhatesigma420/sigmaAI:80M",
"prompt": "Once upon a time",
"stream": false
}'
temperature 0.7 # good balance of creativity vs coherence
repeat_penalty 1.1 # reduces repetitive output
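
The same request can be sent from Python, with the recommended settings passed through the `options` field of Ollama's `/api/generate` endpoint. This is a minimal sketch using only the standard library; the helper names `build_request` and `generate` are made up for illustration, and it assumes an Ollama server is listening on the default local port.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, temperature=0.7, repeat_penalty=1.1):
    """Build a non-streaming generate request with the recommended sampling options."""
    payload = {
        "model": "ermwhatesigma420/sigmaAI:80M",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "repeat_penalty": repeat_penalty},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt):
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a local Ollama server with the model pulled):
# print(generate("Once upon a time"))
```
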
This model was step one. The full SigmaAI 221M is currently training on the same hardware with:

- 12 layers, d_model=1024, 16 attention heads
- ~2.34 billion training tokens
- Flash Attention enabled via ROCm 7.2 + AOTriton
- The same fully custom training stack, now optimized for speed
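
For a sense of how a configuration like this adds up to a parameter count, here is a back-of-the-envelope estimator for a LLaMA-style transformer. Several inputs are assumptions rather than facts from this page: the FFN hidden width (`ffn_hidden=2816` is a hypothetical value near the common 8/3 × d_model convention), the absence of bias terms, and an output head that is not tied to the embedding matrix.

```python
def estimate_params(vocab_size, d_model, n_layers, ffn_hidden, tie_embeddings=False):
    """Approximate parameter count for a bias-free LLaMA-style decoder."""
    embed = vocab_size * d_model
    attn = 4 * d_model * d_model       # query, key, value, and output projections
    ffn = 3 * d_model * ffn_hidden     # SwiGLU gate, up, and down projections
    norms = 2 * d_model                # two RMSNorm weight vectors per block
    per_layer = attn + ffn + norms
    final_norm = d_model
    head = 0 if tie_embeddings else vocab_size * d_model
    return embed + n_layers * per_layer + final_norm + head

# Stated values: 32,000 vocab, d_model=1024, 12 layers. ffn_hidden is assumed.
total = estimate_params(vocab_size=32_000, d_model=1024, n_layers=12, ffn_hidden=2816)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumed values the estimate lands in the same range as the stated 221M, but the real FFN width and embedding tying may differ, so treat it as a sanity check rather than a description of the actual model.
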
I wanted to understand what it actually takes to train a language model end-to-end — not fine-tune an existing one, not run someone else’s weights, but build the entire thing from the ground up. Every component — the tokenizer, the model architecture, the training loop, the data pipeline — was written and debugged by hand on consumer hardware.
This project is proof that training a real transformer language model does not require a data center. It requires patience, a good GPU, and a lot of debugging.
This model was trained and released for personal and research use.
Mostly a school project.