This is the 110M-parameter Llama 2 architecture model trained on the TinyStories dataset.