sppandita85/story-llm:latest

16 Downloads · Updated 1 week ago

A GPT-style language model built entirely from scratch (no HuggingFace/nanoGPT for the model itself) and trained on 50 children's moral stories. It's an educational reproduction of the full LLM lifecycle at miniature scale.

model

2db06ac8b8ab · 2.0MB

Metadata

general.architecture                gpt2
general.file_type                   F16
gpt2.attention.head_count           4
gpt2.attention.layer_norm_epsilon   1e-05
gpt2.block_count                    4
gpt2.context_length                 128
gpt2.embedding_length               128
gpt2.feed_forward_length            512
tokenizer.ggml.add_bos_token        false
tokenizer.ggml.bos_token_id         1020
tokenizer.ggml.eos_token_id         1020
tokenizer.ggml.merges               [h e, Ġ t, Ġ a, i n, e d, ...]
tokenizer.ggml.model                gpt2
tokenizer.ggml.padding_token_id     1023
tokenizer.ggml.pre                  gpt-2
tokenizer.ggml.token_type           [1, 1, 1, 1, 1, ...]
tokenizer.ggml.tokens               [Ā, ā, Ă, ă, Ą, ...]
tokenizer.ggml.unknown_token_id     1020
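
The hyperparameters above determine most of the tensor shapes listed in the Tensor section. The sketch below derives them using the usual GPT-2 conventions (fused Q/K/V projection of width 3 × embedding length, per-head dimension equal to embedding length ÷ head count). The `cfg` dictionary and the `derived_shapes` helper are illustrative names, not part of the model's own code, and shapes are written in the same dimension order the listing uses.

```python
# Hyperparameters copied from the metadata listing above.
cfg = {
    "head_count": 4,
    "block_count": 4,
    "context_length": 128,
    "embedding_length": 128,
    "feed_forward_length": 512,
}


def derived_shapes(cfg):
    """Return the shapes implied by the config, assuming GPT-2 conventions."""
    d = cfg["embedding_length"]
    ff = cfg["feed_forward_length"]
    heads = cfg["head_count"]
    # The embedding width must split evenly across attention heads.
    assert d % heads == 0, "embedding_length must be divisible by head_count"
    return {
        "head_dim": d // heads,
        "attn_qkv.weight": [d, 3 * d],       # fused query/key/value projection
        "attn_output.weight": [d, d],
        "ffn_up.weight": [d, ff],
        "ffn_down.weight": [ff, d],
        "position_embd.weight": [d, cfg["context_length"]],
    }


shapes = derived_shapes(cfg)
for name, shape in shapes.items():
    print(name, shape)
```

With these values each head works on 32 dimensions, and the feed-forward layer is 4 times as wide as the embedding, which matches the `[128, 384]`, `[128, 512]` and `[512, 128]` shapes in the tensor listing.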

Tensor

Name                        Type  Shape
token_embd.weight           F16   [128, 1024]

blk.0

blk.0.attn_norm.bias        F32   [128]
blk.0.attn_norm.weight      F32   [128]
blk.0.attn_output.bias      F32   [128]
blk.0.attn_output.weight    F16   [128, 128]
blk.0.attn_qkv.bias         F32   [384]
blk.0.attn_qkv.weight       F16   [128, 384]
blk.0.ffn_down.bias         F32   [128]
blk.0.ffn_down.weight       F16   [512, 128]
blk.0.ffn_norm.bias         F32   [128]
blk.0.ffn_norm.weight       F32   [128]
blk.0.ffn_up.bias           F32   [512]
blk.0.ffn_up.weight         F16   [128, 512]

blk.1

blk.1.attn_norm.bias        F32   [128]
blk.1.attn_norm.weight      F32   [128]
blk.1.attn_output.bias      F32   [128]
blk.1.attn_output.weight    F16   [128, 128]
blk.1.attn_qkv.bias         F32   [384]
blk.1.attn_qkv.weight       F16   [128, 384]
blk.1.ffn_down.bias         F32   [128]
blk.1.ffn_down.weight       F16   [512, 128]
blk.1.ffn_norm.bias         F32   [128]
blk.1.ffn_norm.weight       F32   [128]
blk.1.ffn_up.bias           F32   [512]
blk.1.ffn_up.weight         F16   [128, 512]

blk.2

blk.2.attn_norm.bias        F32   [128]
blk.2.attn_norm.weight      F32   [128]
blk.2.attn_output.bias      F32   [128]
blk.2.attn_output.weight    F16   [128, 128]
blk.2.attn_qkv.bias         F32   [384]
blk.2.attn_qkv.weight       F16   [128, 384]
blk.2.ffn_down.bias         F32   [128]
blk.2.ffn_down.weight       F16   [512, 128]
blk.2.ffn_norm.bias         F32   [128]
blk.2.ffn_norm.weight       F32   [128]
blk.2.ffn_up.bias           F32   [512]
blk.2.ffn_up.weight         F16   [128, 512]

blk.3

blk.3.attn_norm.bias        F32   [128]
blk.3.attn_norm.weight      F32   [128]
blk.3.attn_output.bias      F32   [128]
blk.3.attn_output.weight    F16   [128, 128]
blk.3.attn_qkv.bias         F32   [384]
blk.3.attn_qkv.weight       F16   [128, 384]
blk.3.ffn_down.bias         F32   [128]
blk.3.ffn_down.weight       F16   [512, 128]
blk.3.ffn_norm.bias         F32   [128]
blk.3.ffn_norm.weight       F32   [128]
blk.3.ffn_up.bias           F32   [512]
blk.3.ffn_up.weight         F16   [128, 512]

output_norm.bias            F32   [128]
position_embd.weight        F32   [128, 128]
output_norm.weight          F32   [128]
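
As a sanity check on the listing, the parameter count and raw tensor size can be tallied directly from the names, types and shapes shown. This is a minimal sketch: the `PER_BLOCK` and `GLOBAL` tables are transcribed by hand from the listing, the helper `tally` is a hypothetical name, and the byte count covers tensor data only (it assumes 2 bytes per F16 value and 4 bytes per F32 value, and ignores the file's metadata, tokenizer tables and alignment padding).

```python
from math import prod

BYTES_PER_VALUE = {"F16": 2, "F32": 4}
N_BLOCKS = 4  # gpt2.block_count

# Tensors repeated in every blk.N group: name -> (type, shape).
PER_BLOCK = {
    "attn_norm.bias": ("F32", [128]),
    "attn_norm.weight": ("F32", [128]),
    "attn_output.bias": ("F32", [128]),
    "attn_output.weight": ("F16", [128, 128]),
    "attn_qkv.bias": ("F32", [384]),
    "attn_qkv.weight": ("F16", [128, 384]),
    "ffn_down.bias": ("F32", [128]),
    "ffn_down.weight": ("F16", [512, 128]),
    "ffn_norm.bias": ("F32", [128]),
    "ffn_norm.weight": ("F32", [128]),
    "ffn_up.bias": ("F32", [512]),
    "ffn_up.weight": ("F16", [128, 512]),
}

# Tensors that appear once in the file.
GLOBAL = {
    "token_embd.weight": ("F16", [128, 1024]),
    "position_embd.weight": ("F32", [128, 128]),
    "output_norm.bias": ("F32", [128]),
    "output_norm.weight": ("F32", [128]),
}


def tally(tensors):
    """Return (parameter count, byte count) for a name -> (type, shape) table."""
    params = sum(prod(shape) for _, shape in tensors.values())
    size = sum(prod(shape) * BYTES_PER_VALUE[t] for t, shape in tensors.values())
    return params, size


block_params, block_bytes = tally(PER_BLOCK)
global_params, global_bytes = tally(GLOBAL)

total_params = N_BLOCKS * block_params + global_params
total_bytes = N_BLOCKS * block_bytes + global_bytes

print(f"parameters: {total_params:,}")          # parameters: 940,800
print(f"tensor data: {total_bytes:,} bytes")    # tensor data: 1,928,192 bytes
```

The result, roughly 0.94 million parameters and about 1.9 MB of tensor data, is in line with the 2.0MB size reported for the model file. The listing contains no separate output projection tensor, which suggests (but does not prove) that the output layer reuses `token_embd.weight`.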