This is a 110M-parameter Llama 2 architecture model trained on the TinyStories dataset.

Parameters (841382748832 · 74B):
{
"num_ctx": 1024,
"stop": [
"<|im_start|>",
"<|im_end|>"
]
}
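The parameter block above can be reproduced in an Ollama Modelfile. A minimal sketch follows; the GGUF filename and model name are hypothetical placeholders, not taken from this page:

```
# Minimal Modelfile sketch for the parameters above.
# The weights path below is a hypothetical placeholder.
FROM ./tinystories-110m.gguf

# Context window of 1024 tokens, matching "num_ctx" above.
PARAMETER num_ctx 1024

# Each stop sequence is declared with its own PARAMETER line.
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```

With the file saved as `Modelfile`, the model could then be built and run locally with `ollama create tinystories-110m -f Modelfile` followed by `ollama run tinystories-110m`.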