Welcome to 169Pi’s Alpie-Core
Alpie-Core is one of the first 4-bit quantized reasoning models: a 32B-parameter system developed by the 169Pi team that matches or outperforms several full-precision frontier models. Built on the DeepSeek-R1-Distill-Qwen-32B backbone, it represents a major step toward efficient reasoning, sustainable AI, and democratized intelligence, and was trained on just 8 NVIDIA Hopper GPUs.
Alpie-Core redefines what is possible under limited resources by combining LoRA/QLoRA fine-tuning, groupwise and blockwise quantization, and synthetic data distillation. It achieves state-of-the-art results on reasoning, coding, and math benchmarks while reducing memory footprint by over 75%. Designed for researchers, developers, and enterprises, Alpie-Core brings frontier-level reasoning to accessible, low-compute environments.
Get started
You can get started by downloading or running Alpie-Core with Ollama:
To pull the model:
ollama pull 169pi/alpie-core
To run it instantly:
ollama run 169pi/alpie-core
Alpie-Core can also be integrated programmatically for local or API-based workflows.
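If you prefer calling the model from Python against a local Ollama server, a minimal sketch using the official ollama Python client (installed with pip install ollama) could look like this; the prompt is only illustrative:

# Minimal sketch: chat with a locally pulled Alpie-Core via the ollama Python client
import ollama

response = ollama.chat(
    model="169pi/alpie-core",
    messages=[{"role": "user", "content": "Explain 4-bit quantization in simple terms."}],
)
print(response["message"]["content"])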
Quick Start with SDK
Access Alpie Core through our official Python SDK (pi169) for seamless API integration:
# Install the SDK
pip install pi169
# Set your API key
export ALPIE_API_KEY="your_key_here"
# Start using the CLI
pi169 "Explain 4-bit quantization in simple terms"
SDK Features
Benchmarks
Alpie-Core is built for structured reasoning, step-by-step logic, and factual responses, and it performs strongly across reasoning, coding, and math benchmarks.
Feature Highlights
1. Technical Advancements
4-Bit Quantization (NF4): Achieves ~8GB memory footprint with minimal accuracy loss
128K Context Length: Enables extended reasoning over long inputs and multi-step tasks
Fine-tunable: Fully customise the model to your specific use case through fine-tuning
LoRA + QLoRA Fine-Tuning: Retains reasoning fidelity under low-bit constraints
Groupwise + Blockwise Quantization: Reduces quantization noise and enhances precision at scale
vLLM-based Inference: Enables low-latency, high-throughput deployment (see the serving sketch after this list)
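For local serving, a minimal offline-inference sketch with vLLM is shown below; the Hugging Face repo ID used here (169Pi/Alpie-Core) is an assumption, so substitute the actual checkpoint you download:

# Minimal vLLM offline-inference sketch; the model ID is a placeholder, not confirmed by 169Pi
from vllm import LLM, SamplingParams

llm = LLM(model="169Pi/Alpie-Core")  # hypothetical Hugging Face repo ID
params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain 4-bit quantization in simple terms."], params)
print(outputs[0].outputs[0].text)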
2. API & Integration Ready
OpenAI-Compatible API: Drop-in replacement for GPT endpoints (see the client sketch after this list)
Function Calling & Tool Use: Supports structured output and dynamic API linking
Streaming Output: Token-by-token real-time response generation
Configurable Guardrails: Safety, moderation, and content filters included
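As a concrete illustration of the drop-in compatibility, the sketch below points the standard openai Python client at a locally running Ollama server, which exposes an OpenAI-compatible endpoint at http://localhost:11434/v1; a hosted Alpie endpoint would only change the base_url and api_key:

# Minimal sketch: OpenAI-style client against a local Ollama server
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # any key works locally
resp = client.chat.completions.create(
    model="169pi/alpie-core",
    messages=[{"role": "user", "content": "Summarise the benefits of NF4 quantization."}],
)
print(resp.choices[0].message.content)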
3. Sustainable and Accessible
Runs efficiently on consumer GPUs (16–24GB VRAM)
Up to 75% lower VRAM use vs. FP16 baselines
Significantly reduced carbon and energy footprint
Fully open under the Apache 2.0 License
Quantization
Format: NF4 (NormalFloat 4-bit)
Compression Ratio: 16:1
Technique: QLoRA + Double Quantization
Implementation: bitsandbytes (bnb_4bit_use_double_quant=True); see the loading sketch after this list
Inference: Mixed precision (FP16 compute, 4-bit storage)
Accuracy Impact: Minimal reasoning loss
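To load the weights yourself with Hugging Face transformers and bitsandbytes, a minimal sketch mirroring the settings above might look like this; the repo ID 169Pi/Alpie-Core is an assumption, so use the model's actual Hugging Face path:

# Minimal sketch: NF4 double quantization with FP16 compute via bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat 4-bit storage
    bnb_4bit_use_double_quant=True,        # double quantization, as noted above
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute, 4-bit storage
)

model_id = "169Pi/Alpie-Core"  # hypothetical repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Explain 4-bit quantization in simple terms.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))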
License: Apache 2.0
Use freely for research, customisation, and commercial deployment without copyleft restrictions. Ideal for experimentation, extension, and open collaboration.
More about 169Pi
169Pi Hugging Face
169Pi PyPI Package
169Pi LinkedIn Updates