aquif-3 is a lightweight, high-efficiency, and powerful mixture-of-experts (MoE) model. Built on a new Mamba-2 hybrid-recurrent architecture, it shows strong reasoning capabilities and activates only ~1B parameters per forward pass while still delivering competitive results across multiple benchmarks.
aquif-3-preview
The aquif MoE delivers strong performance despite its small active-parameter count:
| Benchmark | aquif-3.0-preview-2 (2.5B active) | aquif-3-preview (1B active) |
|---|---|---|
| MMLU | 55.9 | 60.4 |
| HumanEval | 80.5 | 82.4 |
| GSM8K | 72.5 | 70.1 |
| Average | 69.6 | 71.0 |
These results reflect internal evaluations on representative test sets. Final scores may vary slightly in public benchmarks.
To enhance reasoning, activate “thinking mode” with the following control message before your prompt:
{
"role": "control",
"content": "thinking"
}
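As a minimal sketch (assuming tokenizer and device are set up as in the quickstart further down), the control message is simply placed before the user turn in the conversation:
# Sketch: prepend the "thinking" control message before the user prompt.
# Assumes tokenizer and device are initialized as in the quickstart below.
conv = [
    {"role": "control", "content": "thinking"},
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its average speed?"},
]
input_ids = tokenizer.apply_chat_template(
    conv,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True
).to(device)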
Alternatively, you can set thinking=True when calling apply_chat_template in your Hugging Face code:
input_ids = tokenizer.apply_chat_template(
    conv,
    return_tensors="pt",
    thinking=True,
    return_dict=True,
    add_generation_prompt=True
).to(device)
This enables internal self-reflection logic and improves multi-step task accuracy.
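For completeness, here is a minimal sketch of running generation once thinking is enabled, reusing the setup from the quickstart below; the exact formatting of the emitted reasoning trace is defined by the model's chat template and is not documented here:
# Sketch: generate and decode with thinking mode enabled
# (model, tokenizer, device and input_ids as built above).
output = model.generate(**input_ids, max_new_tokens=8192)
prediction = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)  # should contain the reasoning trace and the final answer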
To run the model via Hugging Face, you need to install the granitemoe_hybrid_external_cleanup branch of IBM's transformers fork instead of regular HF transformers, as aquif-3-preview is a finetune of Granite-4.0-Tiny-Base:
git clone https://github.com/Ssukriti/transformers.git
cd transformers
git checkout granitemoe_hybrid_external_cleanup
pip install -e .
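To sanity-check that the editable install of the fork (rather than a previously installed transformers release) is the one being picked up, you can print the package location; the branch's exact version string is not documented here, so this only verifies the import path:
# Quick check that the editable install of the fork is on the import path
import transformers
print(transformers.__version__)
print(transformers.__file__)  # should point inside the cloned transformers/ directory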
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch
model_path = "aquiffoo/aquif-3-preview"
device = "cuda"
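# Load the model in bfloat16 and place it on the selected device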
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path
)
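# Build a single-turn conversation and apply the chat template (thinking mode disabled here)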
conv = [{"role": "user", "content": "Hi!"}]
input_ids = tokenizer.apply_chat_template(conv, return_tensors="pt", thinking=False, return_dict=True, add_generation_prompt=True).to(device)
set_seed(42)
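# Generate a response (up to 8192 new tokens)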
output = model.generate(
    **input_ids,
    max_new_tokens=8192,
)
prediction = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)
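The quickstart above pins the random seed with set_seed(42); if you want more varied outputs, the standard transformers sampling arguments can be passed to generate (these are generic generation parameters, not values recommended by the aquif team):
# Optional: enable sampling for more varied outputs
# (generic transformers arguments; not aquif-specific tuning)
output = model.generate(
    **input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)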
The future of aquif AI includes both dense and Mixture-of-Experts models, which are smarter and more efficient at inference. We can't wait to see what you create with aquif-3.