Source: HF
Qwen2.5 brings several improvements over Qwen2.
This repo contains the instruction-tuned 0.5B Qwen2.5 model in the GGUF format, which has the following features:

- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings
- Number of Parameters: 0.49B
- Number of Parameters (Non-Embedding): 0.36B
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: full 32,768 tokens; generation up to 8,192 tokens
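Since the weights are in GGUF format, they are typically served through a llama.cpp-based runtime, which expects a prompt in the ChatML style used by Qwen2.5 instruct models. As a minimal sketch (the `format_chatml` helper is our own illustration, not part of any library), the prompt for a chat turn can be assembled like this:

```python
def format_chatml(messages):
    """Assemble a ChatML-style prompt (the template used by Qwen2.5
    instruct models): each message is wrapped in <|im_start|>role ...
    <|im_end|> markers, and the prompt ends with an open assistant turn
    for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
])
print(prompt)
```

In practice you would pass a prompt like this to a llama.cpp-based loader pointed at the downloaded `.gguf` file; most runtimes can also apply this chat template for you automatically.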
For more details, please refer to our blog, GitHub, and Documentation.