sppandita85/story-llm:latest
16 Downloads · Updated 1 week ago
A GPT-style language model built entirely from scratch (no HuggingFace/nanoGPT for the model itself) and trained on 50 children's moral stories. It's an educational reproduction of the full LLM lifecycle at miniature scale.
model · 2db06ac8b8ab · 2.0MB
Metadata
general.architecture                gpt2
general.file_type                   F16
gpt2.attention.head_count           4
gpt2.attention.layer_norm_epsilon   1e-05
gpt2.block_count                    4
gpt2.context_length                 128
gpt2.embedding_length               128
gpt2.feed_forward_length            512
tokenizer.ggml.add_bos_token        false
tokenizer.ggml.bos_token_id         1020
tokenizer.ggml.eos_token_id         1020
tokenizer.ggml.merges               [h e, Ġ t, Ġ a, i n, e d, ...]
tokenizer.ggml.model                gpt2
tokenizer.ggml.padding_token_id     1023
tokenizer.ggml.pre                  gpt-2
tokenizer.ggml.token_type           [1, 1, 1, 1, 1, ...]
tokenizer.ggml.tokens               [Ā, ā, Ă, ă, Ą, ...]
tokenizer.ggml.unknown_token_id     1020
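The architecture values in the metadata above relate to one another in a few simple ways. The following is a minimal sketch, not the author's code: the class name `StoryLLMConfig` and its property names are hypothetical, and only the numeric values come from the listing. It derives the per-head dimension, the width of a fused query/key/value projection, and the feed-forward expansion ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryLLMConfig:
    """Hypothetical container; field values are copied from the metadata listing."""

    head_count: int = 4            # gpt2.attention.head_count
    block_count: int = 4           # gpt2.block_count
    context_length: int = 128      # gpt2.context_length
    embedding_length: int = 128    # gpt2.embedding_length
    feed_forward_length: int = 512 # gpt2.feed_forward_length

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        if self.embedding_length % self.head_count != 0:
            raise ValueError("embedding_length must be divisible by head_count")
        return self.embedding_length // self.head_count

    @property
    def qkv_width(self) -> int:
        # A fused projection emits query, key and value vectors side by side.
        return 3 * self.embedding_length

    @property
    def ffn_ratio(self) -> float:
        # Expansion factor of the feed-forward hidden layer.
        return self.feed_forward_length / self.embedding_length


cfg = StoryLLMConfig()
print(cfg.head_dim, cfg.qkv_width, cfg.ffn_ratio)  # prints: 32 384 4.0
```

The derived `qkv_width` of 384 matches the `attn_qkv` tensor shapes in the tensor listing, and the 4x feed-forward ratio is the conventional GPT-2 choice.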
Tensor
Name                        Type  Shape
token_embd.weight           F16   [128, 1024]
blk.0
blk.0.attn_norm.bias        F32   [128]
blk.0.attn_norm.weight      F32   [128]
blk.0.attn_output.bias      F32   [128]
blk.0.attn_output.weight    F16   [128, 128]
blk.0.attn_qkv.bias         F32   [384]
blk.0.attn_qkv.weight       F16   [128, 384]
blk.0.ffn_down.bias         F32   [128]
blk.0.ffn_down.weight       F16   [512, 128]
blk.0.ffn_norm.bias         F32   [128]
blk.0.ffn_norm.weight       F32   [128]
blk.0.ffn_up.bias           F32   [512]
blk.0.ffn_up.weight         F16   [128, 512]
blk.1
blk.1.attn_norm.bias        F32   [128]
blk.1.attn_norm.weight      F32   [128]
blk.1.attn_output.bias      F32   [128]
blk.1.attn_output.weight    F16   [128, 128]
blk.1.attn_qkv.bias         F32   [384]
blk.1.attn_qkv.weight       F16   [128, 384]
blk.1.ffn_down.bias         F32   [128]
blk.1.ffn_down.weight       F16   [512, 128]
blk.1.ffn_norm.bias         F32   [128]
blk.1.ffn_norm.weight       F32   [128]
blk.1.ffn_up.bias           F32   [512]
blk.1.ffn_up.weight         F16   [128, 512]
blk.2
blk.2.attn_norm.bias        F32   [128]
blk.2.attn_norm.weight      F32   [128]
blk.2.attn_output.bias      F32   [128]
blk.2.attn_output.weight    F16   [128, 128]
blk.2.attn_qkv.bias         F32   [384]
blk.2.attn_qkv.weight       F16   [128, 384]
blk.2.ffn_down.bias         F32   [128]
blk.2.ffn_down.weight       F16   [512, 128]
blk.2.ffn_norm.bias         F32   [128]
blk.2.ffn_norm.weight       F32   [128]
blk.2.ffn_up.bias           F32   [512]
blk.2.ffn_up.weight         F16   [128, 512]
blk.3
blk.3.attn_norm.bias        F32   [128]
blk.3.attn_norm.weight      F32   [128]
blk.3.attn_output.bias      F32   [128]
blk.3.attn_output.weight    F16   [128, 128]
blk.3.attn_qkv.bias         F32   [384]
blk.3.attn_qkv.weight       F16   [128, 384]
blk.3.ffn_down.bias         F32   [128]
blk.3.ffn_down.weight       F16   [512, 128]
blk.3.ffn_norm.bias         F32   [128]
blk.3.ffn_norm.weight       F32   [128]
blk.3.ffn_up.bias           F32   [512]
blk.3.ffn_up.weight         F16   [128, 512]
output_norm.bias            F32   [128]
position_embd.weight        F32   [128, 128]
output_norm.weight          F32   [128]
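The tensor listing above can be tallied to get a rough parameter count and the size of the raw tensor data. This is a sketch under stated assumptions: the helper names `all_tensors` and `summarize` are hypothetical, the shapes and types are transcribed from the listing, and the byte sizes assume 2 bytes per F16 element and 4 bytes per F32 element.

```python
from math import prod

# Storage cost per element for the two types that appear in the listing.
BYTES_PER_ELEMENT = {"F16": 2, "F32": 4}

# Tensors repeated in every transformer block (blk.0 through blk.3).
PER_BLOCK = [
    ("attn_norm.bias", "F32", (128,)),
    ("attn_norm.weight", "F32", (128,)),
    ("attn_output.bias", "F32", (128,)),
    ("attn_output.weight", "F16", (128, 128)),
    ("attn_qkv.bias", "F32", (384,)),
    ("attn_qkv.weight", "F16", (128, 384)),
    ("ffn_down.bias", "F32", (128,)),
    ("ffn_down.weight", "F16", (512, 128)),
    ("ffn_norm.bias", "F32", (128,)),
    ("ffn_norm.weight", "F32", (128,)),
    ("ffn_up.bias", "F32", (512,)),
    ("ffn_up.weight", "F16", (128, 512)),
]

# Tensors that appear once, outside the blocks.
GLOBAL = [
    ("token_embd.weight", "F16", (128, 1024)),
    ("position_embd.weight", "F32", (128, 128)),
    ("output_norm.bias", "F32", (128,)),
    ("output_norm.weight", "F32", (128,)),
]


def all_tensors(block_count=4):
    """Return (name, type, shape) for every tensor in the listing."""
    tensors = list(GLOBAL)
    for i in range(block_count):
        tensors += [(f"blk.{i}.{name}", t, shape) for name, t, shape in PER_BLOCK]
    return tensors


def summarize(tensors):
    """Return (total element count, total bytes) over the given tensors."""
    total_params = sum(prod(shape) for _, _, shape in tensors)
    total_bytes = sum(prod(shape) * BYTES_PER_ELEMENT[t] for _, t, shape in tensors)
    return total_params, total_bytes


params, size = summarize(all_tensors())
print(params, size)  # prints: 940800 1928192
```

That is roughly 0.94 million parameters and about 1.93 MB of tensor data, which is consistent with the 2.0MB file size shown for the model; the remainder is presumably metadata and tokenizer data.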