
A Quantized, Fine-Tuned Model for Enhanced Tool Calling, Code Generation, and Reasoning

Capabilities: tools, thinking
ollama run brnpistone/Qwen3-4B-AgentCoder-q6-k

Applications

  • Claude Code: ollama launch claude --model brnpistone/Qwen3-4B-AgentCoder-q6-k
  • Codex: ollama launch codex --model brnpistone/Qwen3-4B-AgentCoder-q6-k
  • OpenCode: ollama launch opencode --model brnpistone/Qwen3-4B-AgentCoder-q6-k
  • OpenClaw: ollama launch openclaw --model brnpistone/Qwen3-4B-AgentCoder-q6-k


🧠 Qwen3-4B-AgentCoder-GGUF

A Quantized, Fine-Tuned Model for Enhanced Tool Calling, Code Generation, and Reasoning


Model Description

Qwen3-4B-AgentCoder-GGUF is a quantized version of the Qwen3-4B-AgentCoder model, converted with llama.cpp. This model is optimized for:

  • 🧮 Complex reasoning tasks
  • 🧰 Tool calling
  • 💻 Code generation

The model was developed through sequential fine-tuning, followed by a Direct Preference Optimization (DPO) post-training stage to improve alignment, coherence, and reasoning accuracy.

Highlights

  • Fine-tuned on three specialized datasets
  • Retains thinking-mode behavior with long-context reasoning (~264K tokens)
  • Post-trained with DPO using chosen/rejected pairs for better alignment
  • Excellent balance between tool use, code generation, and reasoning

🚀 Direct Use

Qwen3-4B-AgentCoder-GGUF can be used directly for:

  • ✅ Tool calling in complex reasoning tasks
  • ✅ Code generation for Python, JavaScript, and other languages
  • ✅ Multi-domain reasoning (math, logic, Q&A)
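As a minimal sketch of tool calling with this model, the request body below follows the shape of Ollama's /api/chat endpoint with a tool attached. The get_weather tool, its parameters, and the prompt are hypothetical examples; the body is only constructed here, not sent.

```python
import json

def build_tool_call_request(prompt: str) -> dict:
    """Build a chat request body with one tool definition attached."""
    return {
        "model": "brnpistone/Qwen3-4B-AgentCoder-q6-k",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "stream": False,
    }

payload = build_tool_call_request("What is the weather in Rome?")
print(json.dumps(payload, indent=2))
```

When the model decides to use the tool, the response message carries a tool_calls entry with the function name and arguments, which the caller executes and feeds back as a tool-role message.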

โš ๏ธ Out-of-Scope Use

  • โŒ Highly sensitive or confidential data
  • โŒ Domains requiring expert-level specialization
  • โŒ Tasks where full explainability is mandatory

🧠 Training Details

Training Procedure

Phase 1: Supervised Fine-Tuning

  • Learning rate: 1e-5
  • Batch size: 4
  • Gradient accumulation: 4
  • Epochs: 3
  • Warmup steps: 100
  • Weight decay: 0.01
  • Sequence length: ~2.4K tokens

Training Data

  • interstellarninja/hermes_reasoning_tool_use (~51K samples): multi-turn tool use

Phase 2: Sequential Fine-Tuning (Supervised)

  • Learning rate: 3e-5
  • Batch size: 1
  • Gradient accumulation: 8
  • Epochs: 2
  • Warmup steps: 100
  • Weight decay: 0.05
  • Sequence length: ~13K tokens

Training Data

  • ise-uiuc/Magicoder-OSS-Instruct-75K (~38K samples): code generation
  • open-thoughts/OpenThoughts-114k (~37K samples): general reasoning
  • interstellarninja/hermes_reasoning_tool_use (~30K samples): tool use
  • custom/dpo-toolcode-alignment (~15K samples): DPO preference pairs
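The listed batch sizes and gradient-accumulation steps determine the effective batch size per optimizer step. A small sketch computing it for each phase (assuming a single GPU, as the compute section describes):

```python
def effective_batch_size(per_device_batch: int, grad_accum: int) -> int:
    """Samples contributing to one optimizer step on a single device."""
    return per_device_batch * grad_accum

# Values taken from the hyperparameter lists above.
phases = {
    "sft_phase1": effective_batch_size(4, 4),  # batch 4, accumulation 4
    "sft_phase2": effective_batch_size(1, 8),  # batch 1, accumulation 8
    "dpo":        effective_batch_size(2, 8),  # batch 2, accumulation 8
}
print(phases)
```

Note that Phase 2 halves the effective batch relative to Phase 1 while raising the learning rate and sequence length, trading throughput for long-context samples.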

Phase 3: Post-Training with Direct Preference Optimization (DPO)

After sequential fine-tuning, the model underwent a DPO phase to enhance response alignment, reasoning robustness, and factual consistency.

  • Learning rate: 1e-6
  • Batch size: 2
  • Gradient accumulation: 8
  • Epochs: 5
  • Beta: 0.2
  • Loss type: sigmoid
  • Warmup steps: 2
  • Sequence length: ~1.5K tokens

DPO Data

  • ~1.5K chosen/rejected response pairs
  • Rejected samples synthetically generated to represent poor or incoherent answers
  • Chosen samples verified or automatically ranked for quality and correctness
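One chosen/rejected pair in the prompt/chosen/rejected record format that TRL's DPOTrainer consumes is sketched below. The texts are invented illustrations, not samples from the actual dataset.

```python
# Hypothetical preference record: the chosen answer is a correct, coherent
# solution; the rejected one is the kind of poor answer described above.
pair = {
    "prompt": "Write a Python function that returns the factorial of n.",
    "chosen": (
        "def factorial(n):\n"
        "    if n < 0:\n"
        "        raise ValueError('n must be non-negative')\n"
        "    result = 1\n"
        "    for i in range(2, n + 1):\n"
        "        result *= i\n"
        "    return result"
    ),
    "rejected": "factorial is n * n",  # incoherent, not even a function
}
print(sorted(pair))
```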

Objectives

  • Encourage the model to prefer chosen completions over rejected ones
  • Improve clarity, correctness, and helpfulness
  • Reduce hallucinations and verbosity
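The sigmoid loss listed above corresponds to the standard DPO objective, -log σ(β · margin), where the margin compares how much more the policy prefers the chosen completion than the reference model does. A minimal sketch with the listed β = 0.2, using illustrative log-probabilities rather than real model outputs:

```python
import math

def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.2):
    """Standard DPO loss: -log(sigmoid(beta * reward margin))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Loss is small when the policy favors the chosen completion.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen completion -> low loss.
low = dpo_sigmoid_loss(-10.0, -14.0, -12.0, -12.0)
# Policy prefers the rejected completion -> high loss.
high = dpo_sigmoid_loss(-14.0, -10.0, -12.0, -12.0)
print(low, high)
```

Minimizing this loss pushes the policy's implicit reward for chosen completions above that for rejected ones, which is what the objectives above describe.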


📊 Evaluation

The model was evaluated on multiple benchmarks to assess its capabilities across different domains:

Benchmark                  Score    Details
HumanEval                  72.0%    Base tests
HumanEval+                 68.5%    Base + extra tests
GSM8K                      82.0%    1082/1319 correct
MMLU                       77.7%    1190/1531 correct (validation split)
Multi-turn Tool Calling    70.0%    70/100 correct
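The percentages for the count-based rows can be cross-checked against the raw correct/total figures in the table:

```python
# (correct, total, reported %) taken directly from the table above.
scores = {
    "GSM8K": (1082, 1319, 82.0),
    "MMLU": (1190, 1531, 77.7),
    "Multi-turn Tool Calling": (70, 100, 70.0),
}

for name, (correct, total, reported) in scores.items():
    pct = round(100 * correct / total, 1)
    print(f"{name}: {pct}% (reported {reported}%)")
```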

Evaluation Datasets

  • HumanEval/HumanEval+: openai/openai_humaneval - 164 hand-written programming problems
  • GSM8K: openai/gsm8k (test split) - 1,319 grade school math word problems
  • MMLU: cais/mmlu (validation split) - 1,531 multiple-choice questions across 57 subjects
  • Tool Calling: Custom dataset - 100 tool calling scenarios
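HumanEval results are conventionally reported with the unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), for n samples per problem with c passing; the model card does not state the sampling setup, so this is an illustrative sketch rather than the exact scoring script used. With n = k = 1 it reduces to plain per-problem pass/fail accuracy.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k drawn
    samples (out of n generated, c of which pass) is correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# One sample per problem: pass@1 is simply pass or fail.
print(pass_at_k(1, 1, 1), pass_at_k(1, 0, 1))
# With 5 samples and 2 passing, pass@1 estimates the per-sample pass rate.
print(pass_at_k(5, 2, 1))
```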

Evaluation Factors

  • Tool calling accuracy
  • Code generation quality
  • General reasoning performance
  • Alignment and factual consistency (post-DPO)

Observations

  • DPO improved reasoning precision and response coherence
  • Code generation accuracy increased in structured programming tasks
  • Reduced non-determinism in multi-step tool use

๐Ÿ–ฅ๏ธ Technical Specifications

Model Architecture

  • Model type: Causal language model
  • Parameters: 4.0B
  • Context length: ~264K tokens
  • Thinking mode: Enabled

Compute Infrastructure

Hardware

  • GPU: NVIDIA H100 (80 GB VRAM)
  • System RAM: 2 TiB
  • Memory per vCPU: 10.67 GiB

Software

  • Python: 3.12
  • Transformers: 4.55.0
  • Libraries: bitsandbytes, safetensors, torch, trl, scikit-learn, tokenizers, psutil, py7zr


🧭 Recommendations

  • Tool-use accuracy depends on task complexity
  • Code generation may occasionally produce minor syntax issues
  • Reasoning is strongest in structured, logical, and mathematical contexts
  • Avoid using this model for confidential or safety-critical applications

🧠 Qwen3-4B-AgentCoder-GGUF, created by Bruno Pistone
Enhanced reasoning, tool calling, and code generation, refined with DPO alignment