AeroCorp/afm:expert_24_testing_qa


The African Foundation Model (AFM) is a state-of-the-art language model specifically designed for African contexts, languages, and use cases. Built with the latest transformer optimizations from 2025 research, AFM combines power with efficiency.

tools · 2 months ago

3f61409ff8b3 · 2.0GB · llama · 3.21B · Q4_K_M
Template: <|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 {{ if .System }}{{ …
License: LLAMA 3.2 COMMUNITY LICENSE AGREEMENT Llama 3.2 Version Release Date: September 25, 2024 “Agreemen…
License: **Llama 3.2** **Acceptable Use Policy** Meta is committed to promoting safe and fair use of its tool…
System: You are AFM Expert 24, specializing in Software testing and quality assurance expert. You combine de…
Params: { "repeat_penalty": 1.1, "stop": [ "<|start_header_id|>", "<|end_header_id|> …

Readme

# African Foundation Model (AFM)

Python 3.11+ · PyTorch 2.4+ · CUDA 13.0 · License: Apache 2.0

## 🌍 Overview

The African Foundation Model (AFM) is a state-of-the-art language model specifically designed for African contexts, languages, and use cases. Built with the latest transformer optimizations from 2025 research (DeepSeek V3, Llama 4), AFM combines power with efficiency.

## Key Features

### ✨ Cutting-Edge 2025 Architecture

- MoE (Mixture of Experts) - 8x capacity with minimal overhead
- iRoPE (Interleaved RoPE) - 256K training → 10M+ inference context
- MLA (Multi-head Latent Attention) - 75% KV cache reduction
- MTP (Multi-Token Prediction) - 5-10% better reasoning
- Flash Attention 3 - 3-5x faster training
- FP8 Training - 2x speedup, 50% less memory
- RMSNorm - Faster and more stable than LayerNorm
- SwiGLU Activation - Superior to GELU/ReLU (RMSNorm and SwiGLU are sketched in code after this list)
- GQA (Grouped Query Attention) - 3x smaller KV cache
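
Two of the components above, RMSNorm and SwiGLU, are compact enough to show in full. The PyTorch sketch below is illustrative only; the class names and dimensions are placeholders, not AFM's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features with a
    learned gain, no mean subtraction or bias (cheaper than LayerNorm)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x @ W_gate) * (x @ W_up), projected back down."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


# Toy dimensions purely for illustration.
x = torch.randn(2, 16, 512)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))
print(y.shape)  # torch.Size([2, 16, 512])
```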

### 🚀 Performance

- 246M base parameters → 1.2B total capacity (MoE)
- 30-50M active params per token
- 256K training context → 10M+ inference context
- 75% smaller KV cache (MLA) (see the back-of-the-envelope sketch after this list)
- 3-5x faster training vs standard transformers
- 2-3x faster inference with speculative decoding
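
The KV-cache claims above are straightforward arithmetic over the cache shape. The sketch below uses hypothetical layer and head dimensions (not AFM's published configuration) to show how fewer KV heads (GQA) or latent compression (MLA) shrink the per-sequence cache:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_elem: int = 2) -> int:
    """Keys + values cached per sequence: 2 tensors of [layers, kv_heads, seq_len, head_dim]."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical dimensions for illustration only.
layers, heads, head_dim, seq_len = 24, 24, 64, 8192

mha = kv_cache_bytes(layers, kv_heads=heads, head_dim=head_dim, seq_len=seq_len)       # full multi-head cache
gqa = kv_cache_bytes(layers, kv_heads=heads // 3, head_dim=head_dim, seq_len=seq_len)  # 3x fewer KV heads
mla = mha * 0.25  # MLA's latent compression targets the ~75% reduction cited above

print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB, MLA target: {mla / 2**20:.0f} MiB")
```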

### 🌐 African Focus

**✅ 60 Expert Models Configured**

| Category | Count | Examples |
|---|---|---|
| Languages | 6 | African, Asian, European, Middle Eastern, Indigenous, Sign |
| Code | 4 | Python, Web, Systems, Mobile |
| Science | 4 | Physics, Chemistry, Biology, Mathematics |
| Medical & Legal | 4 | Diagnosis, Research, Contracts, Compliance |
| Finance & Business | 4 | Analysis, Accounting, Strategy, Marketing |
| Vision & Audio | 6 | Medical Vision, Autonomous, Industrial, Transcription, Synthesis, Analysis |
| Advanced Tech | 4 | Cybersecurity, Logical Reasoning, Cloud Architecture, AI Ethics |
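
Because this tag is the software testing and QA expert (expert 24), a typical request is drafting tests. A minimal usage sketch, assuming the official `ollama` Python client and that the model has already been pulled locally:

```python
import ollama  # pip install ollama; assumes the model tag is available locally

messages = [
    {
        "role": "user",
        "content": (
            "Write pytest unit tests for this function, covering the empty-list edge case:\n\n"
            "def mean(xs):\n"
            "    return sum(xs) / len(xs)\n"
        ),
    }
]

reply = ollama.chat(model="AeroCorp/afm:expert_24_testing_qa", messages=messages)
print(reply["message"]["content"])
```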