Alpie-Core brings together high reasoning accuracy, sustainable compute efficiency, and open accessibility, redefining what’s possible with 4-bit, high-performance AI.

Welcome to 169Pi’s Alpie-Core

Alpie-Core is one of the first 4-bit quantized reasoning models: a 32B-parameter system developed by the 169Pi team that matches or outperforms several full-precision frontier models. Built from the DeepSeek-R1-Distill-Qwen-32B backbone, it represents a major leap in efficient reasoning, sustainable AI, and democratized intelligence, all trained on just 8 NVIDIA Hopper GPUs.

Alpie-Core redefines what’s possible under limited resources by combining LoRA/QLoRA, groupwise-blockwise quantization, and synthetic data distillation, achieving state-of-the-art results on reasoning, coding, and math benchmarks — all while reducing memory footprint by over 75%. Designed for researchers, developers, and enterprises, Alpie-Core brings frontier-level reasoning to accessible, low-compute environments.

Get started

You can get started by downloading or running Alpie-Core with Ollama:

To pull the model:

ollama pull 169pi/alpie-core

To run it instantly:

ollama run 169pi/alpie-core

Alpie-Core can also be integrated programmatically for local or API-based workflows.
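
For example, a minimal local-integration sketch using Ollama's REST API might look like the following. It assumes the model has already been pulled as 169pi/alpie-core and that the Ollama server is running on its default port (11434):

```python
# Minimal local-integration sketch: query a locally running Ollama server
# over its REST API. Assumes `ollama pull 169pi/alpie-core` has already been
# run and that the server is listening on the default port 11434.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "169pi/alpie-core",
        "prompt": "Solve step by step: what is 17 * 24?",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```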

Benchmarks

Alpie-Core is built for structured reasoning, step-by-step logic, and factual responses. It achieves MMLU 81.28%, GSM8K 92.75%, BBH 85.12%, SWE-Bench Verified 57.8%, SciQ 98.0%, and HumanEval 57.23%.

Figure: SWE-Bench Verified accuracy comparison.

Feature Highlights

1. Technical Advancements

  • 4-Bit Quantization (NF4): Achieves ∼8GB memory footprint with minimal accuracy loss
  • 128K Context Length: Supports extended, multi-step reasoning over long inputs
  • Fine-Tunable: Fully customise the model to your specific use case through fine-tuning
  • LoRA + QLoRA Fine-Tuning: Retains reasoning fidelity under low-bit constraints
  • Groupwise + Blockwise Quantization: Reduces noise, enhances precision at scale
  • vLLM-based Inference: Enables low-latency, high-throughput deployment (see the sketch after this list)
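
As a rough illustration of the vLLM-based deployment mentioned above, a minimal offline-inference sketch could look like this. The Hugging Face repository id below is an assumption, so substitute the actual published checkpoint:

```python
# Illustrative vLLM offline-inference sketch (not the team's exact setup).
from vllm import LLM, SamplingParams

llm = LLM(model="169Pi/Alpie-Core")  # hypothetical Hugging Face repo id
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

prompts = ["Explain, step by step, why the sum of two even numbers is even."]
outputs = llm.generate(prompts, sampling)
print(outputs[0].outputs[0].text)
```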

2. API & Integration Ready

  • OpenAI-Compatible API: Drop-in replacement for GPT endpoints
  • Function Calling & Tool Use: Supports structured output and dynamic API linking
  • Streaming Output: Token-by-token, real-time response generation (see the sketch after this list)
  • Configurable Guardrails: Safety, moderation, and content filters included
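
A hedged sketch of the OpenAI-compatible, streaming workflow described above: the base URL below points at a local Ollama server's OpenAI-compatible endpoint, and a hosted deployment would use its own base URL and API key instead.

```python
# Drop-in OpenAI-client usage with streaming output. The base URL targets a
# local Ollama server's OpenAI-compatible endpoint; a hosted deployment would
# use its own base URL and a real API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

stream = client.chat.completions.create(
    model="169pi/alpie-core",
    messages=[{"role": "user", "content": "List three uses of 4-bit quantization."}],
    stream=True,  # token-by-token streaming, as described above
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```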

3. Sustainable and Accessible

  • Runs efficiently on consumer GPUs (16–24GB VRAM)
  • Up to 75% lower VRAM use vs. FP16 baselines
  • Significantly reduced carbon and energy footprint
  • Fully open under the Apache 2.0 License


Quantization

  • Format: NF4 (NormalFloat 4-bit)
  • Compression Ratio: 16:1
  • Technique: QLoRA + Double Quantization
  • Implementation: bitsandbytes (bnb_4bit_use_double_quant=True); see the configuration sketch after this list
  • Inference: Mixed precision (FP16 compute, 4-bit storage)
  • Accuracy: minimal reasoning loss relative to the FP16 baseline
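
As an illustration only, these settings map onto the Hugging Face Transformers + bitsandbytes integration roughly as in the sketch below. The repository id is an assumption, and this is not necessarily the team's exact loading setup:

```python
# Illustrative NF4 + double-quantization configuration mirroring the settings
# listed above (4-bit NF4 storage, FP16 compute). The repo id is an assumption;
# use the actual published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat 4-bit storage format
    bnb_4bit_use_double_quant=True,        # double quantization, as noted above
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute over 4-bit weights
)

model_id = "169Pi/Alpie-Core"  # hypothetical Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("What is 12 squared?", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```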

License: Apache 2.0

Use freely for research, customisation, and commercial deployment without copyleft restrictions. Ideal for experimentation, extension, and open collaboration.

More about 169Pi

169Pi Hugging Face

169Pi LinkedIn Updates