1 1 month ago

My first ever self-made and self-trained model

80m
ollama run ermwhatesigma420/sigmaAI:80M

Details

f1eda642b7ef · 130MB · llama · 64.2M · F16
You are SigmaAI, a helpful AI assistant.
{ "num_ctx": 1024, "stop": [ "<eos>", "</s>" ], "temperature": 0.7 }

Readme

SigmaAI — 80M (Test Run)

An 80M-parameter language model trained from scratch on a personal AMD GPU. This is my first test model; the full 221M version is currently in training.


About

SigmaAI-80M is the first test model I trained entirely from scratch — no pre-trained base, no fine-tuning on top of someone else’s weights. Every parameter was initialized randomly and learned from raw text data.

This model was primarily a proof of concept to validate the training pipeline before committing to the full 221M run. It confirmed that the stack works end-to-end: the custom tokenizer, data pipeline, training loop, checkpointing, and GGUF export all functioned correctly on consumer AMD hardware.

The full SigmaAI 221M model is currently in training on the same hardware.


Model Details

Property          Value
Parameters        ~80M
Architecture      Transformer (LLaMA-style)
Vocabulary size   32,000
Tokenizer         Custom BPE (trained on the same corpus)
Attention         Multi-Head Attention with RoPE
FFN               SwiGLU
Normalization     RMSNorm
Precision         bfloat16 during training
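For readers new to the LLaMA-style components in the table, here is a toy, dependency-free Python sketch of RMSNorm and a SwiGLU feed-forward layer working on plain lists. It is not the SigmaAI training code: the function names, the `eps=1e-6` default, and the list-of-rows matrix layout are assumptions for illustration, and attention and RoPE are left out.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """Numerically stable SiLU (x * sigmoid(x))."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)  # avoids overflow for large negative inputs
    return v * e / (1.0 + e)


def matvec(w, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x)), no biases."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

Unlike LayerNorm, RMSNorm skips mean subtraction and has no bias, and SwiGLU uses three weight matrices instead of two, which is why LLaMA-style models usually shrink the FFN hidden size to compensate.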

Training

Property          Value
Hardware          ASRock AMD Radeon RX 9060 XT Challenger 16GB OC
Framework         PyTorch + ROCm 7.2
Optimizer         Fused AdamW
Learning rate     2e-4, cosine schedule with warmup
Mixed precision   bfloat16 (AMP)
Compiled          Yes (torch.compile())
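The README does not spell out the exact schedule, so the following is only a sketch of one common reading of "cosine with warmup": linear warmup to the 2e-4 peak, then cosine decay to a floor. The `warmup_steps`, `total_steps`, and `min_lr` values are hypothetical placeholders, not SigmaAI's real settings.

```python
import math


def lr_at(step, max_lr=2e-4, warmup_steps=1000, total_steps=100_000, min_lr=2e-5):
    """Linear warmup to max_lr, then cosine decay to min_lr (assumed schedule)."""
    if step < warmup_steps:
        # Ramp up linearly; step 0 gets max_lr / warmup_steps.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine
```

In a PyTorch loop this could be applied by setting `g["lr"] = lr_at(step)` for each group in `optimizer.param_groups` before each step, or by wrapping `lambda s: lr_at(s) / 2e-4` in `torch.optim.lr_scheduler.LambdaLR`.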

The entire training stack was written from scratch, including:

  • A custom BPE tokenizer trained on the corpus
  • A binary token cache for fast data loading
  • A background prefetch thread to keep the GPU saturated
  • An auto-restart launcher that resumes from checkpoints on any crash
  • Gradient checkpointing to fit larger batches in VRAM
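A background prefetch thread is usually a bounded producer/consumer queue. The sketch below shows that pattern with Python's `queue` and `threading` modules, assuming batches come from any iterator. The name `prefetch` and the default queue depth are hypothetical; the actual SigmaAI loader is not published in this README.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def prefetch(batch_iter, max_prefetch=4):
    """Yield items from batch_iter while a daemon thread fills a bounded queue ahead of the consumer."""
    q = queue.Queue(maxsize=max_prefetch)
    errors = []

    def producer():
        try:
            for batch in batch_iter:
                q.put(batch)  # blocks when the queue is full, bounding memory use
        except Exception as exc:  # forward producer errors to the consumer
            errors.append(exc)
        finally:
            q.put(_SENTINEL)

    # Daemon thread: if the consumer stops early, it will not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            if errors:
                raise errors[0]
            return
        yield item
```

While the GPU runs one step, the thread prepares the next few batches. Python threads only overlap useful work when the producer does I/O or calls code that releases the GIL (file reads and most PyTorch tensor ops do).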


Training Data

Trained on a personal collection of text data including various JSON, JSONL, and plain text files — roughly 2.34 billion tokens total. The tokenizer was trained on a representative sample of the same corpus.
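The README does not document the binary token cache's format, but one common layout is a flat file of unsigned 16-bit IDs. That works here because the 32,000-entry vocabulary is below 65,536, so 2.34 billion tokens need roughly 2.34e9 × 2 bytes ≈ 4.68 GB. Below is a stdlib-only sketch of that assumed layout. `write_token_cache` and `TokenCache` are hypothetical names, and the file uses native byte order, so it should be read on the same kind of machine that wrote it.

```python
import array
import mmap
import random


def write_token_cache(path, token_ids):
    """Store token IDs as raw unsigned 16-bit integers (native byte order)."""
    buf = array.array("H", token_ids)  # raises OverflowError for IDs outside 0..65535
    assert buf.itemsize == 2, "expected 2-byte unsigned short"
    with open(path, "wb") as f:
        buf.tofile(f)


class TokenCache:
    """Random access to a token cache via mmap, without loading the whole file into RAM."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.num_tokens = len(self._mm) // 2

    def window(self, start, length):
        """Return `length` token IDs starting at token index `start`."""
        raw = self._mm[start * 2:(start + length) * 2]  # slicing copies only this window
        out = array.array("H")
        out.frombytes(raw)
        return out.tolist()

    def sample(self, seq_len, rng=random):
        """Return an (inputs, targets) pair from a random offset, targets shifted by one token."""
        start = rng.randrange(0, self.num_tokens - seq_len)
        chunk = self.window(start, seq_len + 1)
        return chunk[:-1], chunk[1:]

    def close(self):
        self._mm.close()
        self._file.close()
```

Tokenizing once and memory-mapping the result means each training step only pays for reading a few kilobytes, instead of re-parsing JSON, JSONL, or text on every epoch.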


Usage

ollama run ermwhatesigma420/sigmaAI:80M

Or with the API:

curl http://localhost:11434/api/generate -d '{
  "model": "ermwhatesigma420/sigmaAI:80M",
  "prompt": "Once upon a time",
  "stream": false
}'

Recommended parameters

temperature     0.7     # good balance of creativity vs coherence
repeat_penalty  1.1     # reduces repetitive output
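Both recommended settings can also be passed per request through the `options` object of Ollama's `/api/generate` endpoint. Below is a minimal standard-library Python sketch. It assumes a local Ollama server on the default port 11434, as in the curl example, and the helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, temperature=0.7, repeat_penalty=1.1):
    """Request body for /api/generate with the recommended sampling options."""
    return {
        "model": "ermwhatesigma420/sigmaAI:80M",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "repeat_penalty": repeat_penalty},
    }


def generate(prompt, url=OLLAMA_URL, timeout=120):
    """Send a non-streaming request and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

With the server running, `generate("Once upon a time")` returns the continuation as a string. Because this is a base model, prompts work best as text to be continued rather than as instructions.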

Limitations

  • This is a test model. It was trained to validate the pipeline, not to be a production-quality assistant.
  • At 80M parameters, it can generate coherent text but will struggle with complex reasoning, long-range context, and factual accuracy.
  • It does not have instruction tuning or RLHF — it is a base language model that continues text rather than following instructions.
  • Knowledge is limited to whatever was in the training corpus.
  • Like all language models, it can generate plausible-sounding but incorrect information.

What Comes Next

This model was step one. The full SigmaAI 221M is currently training on the same hardware with:

  • 12 layers, d_model=1024, 16 attention heads
  • ~2.34 billion training tokens
  • Flash Attention enabled via ROCm 7.2 + AOTriton
  • The same fully custom training stack, now optimized for speed
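As a rough sanity check on the 221M figure, here is a back-of-envelope parameter count for the listed shape. It rests on several assumptions that the README does not state: no bias terms, query/key/value/output projections of size d_model × d_model each (16 heads × 64 dims), a SwiGLU hidden size taken from LLaMA's reference rounding (8/3 · d_model rounded up to a multiple of 256), and an input embedding that is not tied to the output head. The function names are hypothetical.

```python
def llama_ffn_hidden(d_model, multiple_of=256):
    """SwiGLU hidden size using LLaMA-style rounding: int(8/3 * d), rounded up to multiple_of."""
    hidden = int(2 * (4 * d_model) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def estimate_params(n_layers, d_model, vocab_size, ffn_hidden, tied_embeddings=False):
    """Approximate parameter count of a bias-free LLaMA-style decoder."""
    attn = 4 * d_model * d_model     # q, k, v, o projections
    ffn = 3 * d_model * ffn_hidden   # SwiGLU gate, up, down
    norms = 2 * d_model              # two RMSNorm gains per block
    per_layer = attn + ffn + norms
    embed = vocab_size * d_model     # token embedding (RoPE adds no parameters)
    head = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model
    return n_layers * per_layer + embed + head + final_norm
```

Under these assumptions, `estimate_params(12, 1024, 32000, llama_ffn_hidden(1024))` comes to about 219.7M, close to the stated 221M. Tying the embeddings would drop it by about 32.8M. The real configuration may differ in FFN width or other details.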


Why I Built This

I wanted to understand what it actually takes to train a language model end-to-end — not fine-tune an existing one, not run someone else’s weights, but build the entire thing from the ground up. Every component — the tokenizer, the model architecture, the training loop, the data pipeline — was written and debugged by hand on consumer hardware.

This project is proof that training a real transformer language model does not require a data center. It requires patience, a good GPU, and a lot of debugging.


License

This model was trained and released for personal and research use.
It is mostly a school project.