323 3 days ago

I’m excited to share a new agentic fine-tune of Gemma-4-12B, designed specifically for tool calling and raw reasoning loops on Apple Silicon.

tools thinking
ollama run cyborgxx101/gemma-4-12b-opus-finetuned-mlx:8bit

Details

ab9ed3fe14b6 · 13GB · gemma4 · 11.9B · Q8_0
<|start|>turn|user|{{ .Prompt }}<|channel|>assistant\n{{ .Response }}<|end|>
<|think|>\nYou are a helpful AI assistant. For every request, first think step by step, then provide
{ "num_ctx": 8192, "stop": [ "<|end|>", "Task complete." ], "tempera
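The parameters listed above can also be supplied per request through Ollama's REST API. The following is a minimal sketch, assuming a local Ollama server on its default port (11434) and the `/api/chat` endpoint. The helper names `build_chat_request` and `send_chat` are hypothetical, and only the `num_ctx` and `stop` values visible above are reproduced; the truncated remainder of the parameter block is left out.

```python
import json
import urllib.request

MODEL = "cyborgxx101/gemma-4-12b-opus-finetuned-mlx:8bit"
OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: default local Ollama address


def build_chat_request(prompt: str) -> dict:
    """Build an /api/chat payload that mirrors the visible model parameters."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {
            "num_ctx": 8192,
            "stop": ["<|end|>", "Task complete."],
        },
    }


def send_chat(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]

# Usage (requires a running Ollama server with the model pulled):
# print(send_chat(build_chat_request("List the files changed in the last commit.")))
```

Overriding `options` per request leaves the model's stored defaults untouched, which may be convenient when experimenting with context size.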

Readme


---
license: gemma
base_model: mlx-community/gemma-4-12b-it-bf16
tags:
- mlx
- gemma
- code
- reasoning
- agent
- finetuned
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

gemma-4-12b-opus-finetuned-mlx

Fine-tuned mlx-community/gemma-4-12b-it-bf16 on Claude Opus 4.64.7 reasoning data with R1-distilled code examples.

Variants

  • cyborgxx101/gemma-4-12b-opus-finetuned-mlx (16-bit, 22GB)
  • cyborgxx101/gemma-4-12b-opus-finetuned-mlx-8bit (8-bit, ~12GB)
  • cyborgxx101/gemma-4-12b-opus-finetuned-mlx-4bit (4-bit, ~7GB)
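
As a rough guide to choosing among these variants, the sketch below picks the largest one whose listed size fits within a memory budget. The sizes are the approximate figures from the list above. The `headroom_gb` default and the helper name `pick_variant` are assumptions for illustration, not measured requirements; actual memory use also depends on context length and the runtime.

```python
from typing import Optional

# (repository name, approximate size in GB), ordered largest first
VARIANTS = [
    ("cyborgxx101/gemma-4-12b-opus-finetuned-mlx", 22.0),
    ("cyborgxx101/gemma-4-12b-opus-finetuned-mlx-8bit", 12.0),
    ("cyborgxx101/gemma-4-12b-opus-finetuned-mlx-4bit", 7.0),
]


def pick_variant(memory_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Return the largest variant that fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for the OS, KV cache, and other apps.
    Returns None when even the smallest variant does not fit.
    """
    budget = memory_gb - headroom_gb
    for name, size_gb in VARIANTS:
        if size_gb <= budget:
            return name
    return None
```

With the assumed 4 GB of headroom, a 16 GB machine would select the 8-bit variant and a 32 GB machine the 16-bit one.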

Inference

With oMLX (recommended for Apple Silicon)

```bash
# Start the server and leave it running
omlx serve

# From a second terminal, send a chat completion request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "cyborgxx101/gemma-4-12b-opus-finetuned-mlx-8bit", "messages": [{"role": "user", "content": "Analyze the database schema and construct an optimized multi-join query."}]}'
```
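
The same request can be issued from Python using only the standard library. This is a minimal sketch, assuming the server listens on `localhost:8000` as in the curl example and that its response follows the OpenAI-style chat completions schema (`choices[0].message.content`). The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/chat/completions"  # taken from the curl example
MODEL = "cyborgxx101/gemma-4-12b-opus-finetuned-mlx-8bit"


def build_completion_request(user_content: str, model: str = MODEL) -> urllib.request.Request:
    """Create a POST request carrying a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response (assumed schema)."""
    return response_body["choices"][0]["message"]["content"]


def ask(user_content: str) -> str:
    """Send the request to the running server and return the reply text."""
    with urllib.request.urlopen(build_completion_request(user_content)) as response:
        return extract_reply(json.load(response))

# Usage (requires the server to be running):
# print(ask("Analyze the database schema and construct an optimized multi-join query."))
```

Keeping request construction and response parsing in separate functions makes both easy to check without a live server.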