ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-q8
Fine-tuned version of Nanbeige 4.1 3B specialized for Python code generation with direct, focused output.
Version: E1 (Experiment 1)
Training Focus: Code accuracy and clean output format
Status: Production-ready for direct code generation tasks
Base Model: Nanbeige/Nanbeige4.1-3B
Parameters: 3 billion
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Duration: ~16 hours on RTX 5060 Ti 16GB
Final Accuracy: 87.4% mean token accuracy (up from 76.3% baseline)
Total Examples: 45,757
- Python Code (84%): 38,284 examples from Magicoder-OSS-Instruct-75K
  - Real-world open-source Python code patterns
  - High-quality instruction-tuned format
  - Covers algorithms, APIs, data processing, debugging
Architecture: 3B parameters
LoRA Rank: 16
Trainable Parameters: 28.4M (0.72% of total)
Training Epochs: 1
Batch Size: 2 (effective 8 with gradient accumulation)
Learning Rate: 2e-4 → 9.6e-6 (linear decay)
Optimizer: AdamW 8-bit
Quantization: 4-bit during training, Q8_0/FP16 for inference
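The linear decay from 2e-4 to 9.6e-6 can be sketched as a simple interpolation over the optimizer steps. Note the step count below is an assumption derived from 45,757 examples at an effective batch size of 8 for one epoch; the exact schedule used in training is not published.

```python
def linear_lr(step: int, total_steps: int,
              lr_start: float = 2e-4, lr_end: float = 9.6e-6) -> float:
    """Linearly interpolate the learning rate from lr_start down to lr_end."""
    frac = min(step, total_steps) / total_steps
    return lr_start + (lr_end - lr_start) * frac

total_steps = 45_757 // 8  # ~5,719 optimizer steps for one epoch (assumption)
print(linear_lr(0, total_steps))            # starts at 2e-4
print(linear_lr(total_steps, total_steps))  # ends near 9.6e-6
```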
| Metric | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| Loss | 1.04 | 0.45 | -57% |
| Token Accuracy | 76.3% | 87.4% | +11.1 pts |
| Entropy | 0.78 | 0.44 | -44% |
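For context, "mean token accuracy" is the fraction of target positions where the model's predicted token id matches the reference. The exact metric in the training framework may differ in details (e.g., which positions are masked); this is an illustrative sketch using the common convention of ignoring positions labeled -100 (padding/prompt tokens).

```python
def mean_token_accuracy(pred_ids, label_ids, ignore_id=-100):
    """Fraction of unmasked positions where prediction matches the label.

    Positions whose label equals ignore_id (padding/prompt) are skipped.
    """
    scored = [(p, l) for p, l in zip(pred_ids, label_ids) if l != ignore_id]
    if not scored:
        return 0.0
    return sum(p == l for p, l in scored) / len(scored)

# Toy example: 7 of the 8 unmasked tokens match.
preds  = [12, 7, 99, 4, 4, 8, 31, 2, 5]
labels = [12, 7, 99, 4, 4, 8, 31, 9, -100]
print(mean_token_accuracy(preds, labels))  # 0.875
```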
✅ Direct Output Format
Model provides immediate, clean code responses without verbose preambles
✅ High Code Accuracy
87% token-level accuracy on Python generation tasks
✅ Fast Inference
Optimized for quick responses with temperature 0.3 default
⚠️ Suppressed Chain-of-Thought
E1 training focused on direct answers - <think> reasoning tags are not output
(Reasoning still occurs internally, but thought process is not narrated)
No, the irony is not lost on me… I did have a giggle. Don't worry, E2 is in preparation.
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b
Example prompts:
"Write a Flask endpoint that handles file uploads with error handling"
"Create a Python class for managing Ollama model routing"
"Implement a binary search tree with insertion and traversal methods"
"Fix this code: [paste code]"
For code generation (default):
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b \
--temperature 0.3 \
--top-p 0.9 \
--repeat-penalty 1.1
For creative/exploratory coding:
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b \
--temperature 0.7 \
--top-p 0.95
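The same sampling presets can be passed per-request through Ollama's HTTP API as an `options` object. This sketch only builds the JSON payload; actually sending it requires a running Ollama server (POST to `http://localhost:11434/api/generate`), so the network call is omitted here.

```python
import json

MODEL = "fauxpaslife/nanbeige4.1-python-deepthink:3b"

# Sampling presets mirroring the two CLI invocations above.
PRESETS = {
    "code":     {"temperature": 0.3, "top_p": 0.9, "repeat_penalty": 1.1},
    "creative": {"temperature": 0.7, "top_p": 0.95},
}

def generate_payload(prompt: str, mode: str = "code") -> dict:
    """Build the JSON body for POST /api/generate on a local Ollama server."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,          # return one complete response, not chunks
        "options": PRESETS[mode],
    }

payload = generate_payload(
    "Write a Flask endpoint that handles file uploads with error handling")
print(json.dumps(payload, indent=2))
```

To send it, use any HTTP client, e.g. `requests.post("http://localhost:11434/api/generate", json=payload)`.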
Available quantizations:
| Tag | Quantization | Size | Quality | Use Case |
|---|---|---|---|---|
| 3b-fp16 | Full precision | ~7.9GB | Maximum | Evaluation, further fine-tuning |
| 3b-q8 | 8-bit | ~4.2GB | Excellent | General use (recommended) |
| 3b | Alias to q8 | ~4.2GB | Excellent | Default |
# Maximum quality
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-fp16
# Recommended (best balance)
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-q8
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b # Same as q8
E1 Observations:
- Training data contained no chain-of-thought examples
- Model learned the direct output format from instruction-tuned code datasets
- Internal reasoning capability preserved (evidenced by accuracy gains)
- Output format optimized for production code generation
Planned E2 Improvements:
- Add chain-of-thought examples (30% of dataset)
- Preserve <think> tag behavior for complex problem solving
- Balance direct output with transparent reasoning
- Maintain E1 code quality while adding reasoning transparency
Use this model (E1 Python-DeepThink) when:
- Direct code generation is needed
- Clean, production-ready output is required
- Fast inference is a priority

Use base Nanbeige4.1 when:
- A complex problem requires visible reasoning
- Exploring multiple solution approaches
- Educational explanations with the thought process are wanted
- Research/debugging requires transparency (visible `<think>` reasoning)

License: Same as base model (see Nanbeige4.1-3B license terms)
Training Experiment E1 - First systematic fine-tune of Nanbeige4.1 for Python specialization. Focus on code accuracy and clean output format. Successfully achieved 87% token accuracy with direct response style. E2 will reintroduce chain-of-thought reasoning while maintaining code quality.
Developed by: fauxpaslife
Training Date: February 2026
Hardware: RTX 5060 Ti 16GB
Training Framework: Transformers + PEFT (LoRA)