Ever since I released my first Qwen2-based model several weeks ago, I've taken what I learned and attempted to create a new model that is pre-trained more thoroughly and on a more diverse dataset. I settled on the unfiltered version of the English subset of C4, with entries shuffled in batches of 1000 to break up continuous streams of related training data. For fine-tuning I initially chose agentlans/multiturn-chat because it contains far more examples than databricks/databricks-dolly-15k, but I reverted to dolly-15k: the conversations in multiturn-chat are too verbose to suit a short 1024-token-context model.
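For anyone curious what "shuffled in batches of 1000" looks like in practice, a buffered shuffle over the streaming dataset gives roughly that behavior. This is a minimal sketch using the Hugging Face datasets library, which I assume here for illustration; it is not the exact script used for this model.

```python
from datasets import load_dataset

# Stream the unfiltered English subset of C4 so the full corpus never has
# to be materialized on disk at once.
c4 = load_dataset("allenai/c4", "en.noblocklist", split="train", streaming=True)

# A shuffle buffer of 1000 examples approximates shuffling in batches of 1000:
# consecutive documents from the same crawl are much less likely to appear
# back-to-back in the training stream.
c4 = c4.shuffle(seed=42, buffer_size=1000)

# Peek at a few shuffled entries.
for example in c4.take(3):
    print(example["text"][:80])
```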
Like my previous model, this one was pre-trained on the AllenAI C4 English dataset, the key difference being that I used the "en.noblocklist" subset for more diversity. Instead of creating my own tokenizer, I reused GPT-2's tokenizer, which saved a lot of extra computation and has proven effective in real-world use. The model was pre-trained for 280,000 steps with a 1024-token context, a per-device training batch size of 4, and 4 gradient accumulation steps. Pre-training took about 60 hours with the GPU overclocked to its maximum capacity. Post-training consisted of 5 epochs on databricks/databricks-dolly-15k formatted in ChatML.
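To make the setup above concrete, here is a hedged sketch of the stated hyperparameters (GPT-2 tokenizer, 1024-token context, per-device batch size 4, 4 gradient-accumulation steps, 280k steps) expressed as a Hugging Face Trainer configuration. Everything not mentioned in the description is left at library defaults and is an assumption on my part, not the actual training script.

```python
from transformers import AutoTokenizer, TrainingArguments

# GPT-2's byte-level BPE tokenizer, reused as-is instead of training a new one.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

MAX_CONTEXT = 1024  # token context length used when chunking C4 into blocks

# Hyperparameters as described above; all other settings are defaults and
# may differ from the real run.
args = TrainingArguments(
    output_dir="bootstrap-llm-pretrain",
    max_steps=280_000,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch of 16 sequences per step
    logging_steps=500,
    save_steps=10_000,
)
```

For the post-training data, a simple way to render dolly-15k in ChatML is to concatenate the instruction (and optional context) into a user turn and the response into an assistant turn. The field names below follow databricks/databricks-dolly-15k; whether the original run folded the context field into the prompt this way is an assumption.

```python
from datasets import load_dataset

def to_chatml(example):
    """Render one dolly-15k record as a ChatML conversation string."""
    user_turn = example["instruction"]
    if example["context"]:
        # Prepend the optional reference context to the user turn (assumed).
        user_turn = f"{example['context']}\n\n{user_turn}"
    text = (
        "<|im_start|>user\n" + user_turn + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["response"] + "<|im_end|>\n"
    )
    return {"text": text}

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
dolly = dolly.map(to_chatml, remove_columns=dolly.column_names)
```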
HuggingFace URL: https://huggingface.co/TheOneWhoWill/Bootstrap-LLM