269 pulls · Updated 3 weeks ago

Fine-tuned version of Nanbeige 4.1 3B specialized for Python code generation with direct, focused output.

3b
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b

Details

3 weeks ago

968106ba9ec7 · 4.2GB

Architecture: llama · Parameters: 3.93B · Quantization: Q8_0
Template: "{{ .Prompt }}"
Parameters: { "num_ctx": 8192, "repeat_penalty": 1.1, "temperature": 0.3, "top_p": 0.9 }

Readme


Nanbeige 4.1 Python DeepThink - 3B

Version: E1 (Experiment 1)
Training Focus: Code accuracy and clean output format
Status: Production-ready for direct code generation tasks

Training Details

Base Model: Nanbeige/Nanbeige4.1-3B
Parameters: 3 billion
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Duration: ~16 hours on RTX 5060 Ti 16GB
Final Accuracy: 87.4% mean token accuracy (up from 76.3% baseline)

Dataset Composition

Total Examples: 45,757

  • Python Code (84%): 38,284 examples from Magicoder-OSS-Instruct-75K
    • Real-world open-source Python code patterns
    • High-quality instruction-tuned format
    • Covers algorithms, APIs, data processing, debugging

  • Mathematical Reasoning (16%): 7,473 examples from GSM8K
    • Step-by-step problem solving
    • Multi-step logical chains
    • Reinforces deep thinking patterns
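As a quick sanity check, the two subsets above account for the full dataset; the percentages follow from the raw counts:

```python
# Verify that the dataset composition figures quoted above are consistent.
python_examples = 38_284  # Magicoder-OSS-Instruct-75K subset
math_examples = 7_473     # GSM8K subset
total = python_examples + math_examples

print(total)                                 # 45757, matching "Total Examples"
print(round(python_examples / total * 100))  # 84 (% Python code)
print(round(math_examples / total * 100))    # 16 (% mathematical reasoning)
```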

Training Configuration

Architecture: 3B parameters
LoRA Rank: 16
Trainable Parameters: 28.4M (0.72% of total)
Training Epochs: 1
Batch Size: 2 (effective 8 with gradient accumulation)
Learning Rate: 2e-4 → 9.6e-6 (linear decay)
Optimizer: AdamW 8-bit
Quantization: 4-bit during training, Q8_0/FP16 for inference
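The batch and learning-rate lines above imply a gradient-accumulation factor of 4 and a linear schedule from 2e-4 down to 9.6e-6. A minimal sketch of that schedule (the accumulation factor is inferred from "effective 8", and `total_steps` is a placeholder; the card does not state the real step count):

```python
# Sketch of the schedule implied by the training configuration above.
# Assumption: gradient accumulation = 4, so 2 * 4 = 8 effective batch size.

def linear_decay(step, total_steps, lr_start=2e-4, lr_end=9.6e-6):
    """Linearly interpolate the learning rate from lr_start to lr_end."""
    frac = step / total_steps
    return lr_start + (lr_end - lr_start) * frac

per_device_batch = 2
grad_accum_steps = 4  # assumed from "effective 8 with gradient accumulation"
effective_batch = per_device_batch * grad_accum_steps

print(effective_batch)           # 8
print(linear_decay(0, 1000))     # 0.0002 (initial LR)
print(linear_decay(1000, 1000))  # ~9.6e-06 (final LR)
```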

Performance Improvements

| Metric | Baseline | Fine-tuned | Improvement |
|----------------|----------|------------|-------------|
| Loss | 1.04 | 0.45 | -57% |
| Token Accuracy | 76.3% | 87.4% | +11.1 pts |
| Entropy | 0.78 | 0.44 | -44% |
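The improvement column follows directly from the baseline and fine-tuned figures:

```python
# Reproduce the "Improvement" column from the raw metrics.
loss_change = (0.45 - 1.04) / 1.04 * 100     # relative change in loss
accuracy_change = 87.4 - 76.3                # absolute change in points
entropy_change = (0.44 - 0.78) / 0.78 * 100  # relative change in entropy

print(round(loss_change))         # -57
print(round(accuracy_change, 1))  # 11.1
print(round(entropy_change))      # -44
```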

Key Characteristics (E1)

Direct Output Format
Model provides immediate, clean code responses without verbose preambles

High Code Accuracy
87.4% token-level accuracy on Python generation tasks

Fast Inference
Optimized for quick responses with temperature 0.3 default

⚠️ Suppressed Chain-of-Thought
E1 training focused on direct answers - <think> reasoning tags are not output
(Reasoning still occurs internally, but thought process is not narrated)

No, the irony is not lost on me… I did have a giggle. Don’t worry, E2 is in the works.

Best For

  • Direct Python code generation and completion
  • Algorithm implementation requiring clean output
  • Code debugging with concise explanations
  • Flask, FastAPI, and web development patterns
  • Data processing and scientific computing
  • Production codebases requiring deterministic output

Usage

ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b

Example prompts:

"Write a Flask endpoint that handles file uploads with error handling"
"Create a Python class for managing Ollama model routing"
"Implement a binary search tree with insertion and traversal methods"
"Fix this code: [paste code]"
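Beyond the CLI, the model can also be called through Ollama's local REST API (`POST /api/generate` on port 11434). A minimal sketch that builds a request body using this card's default options; the helper name is illustrative, not part of any library:

```python
import json

def build_generate_request(prompt,
                           model="fauxpaslife/nanbeige4.1-python-deepthink:3b"):
    """Build a JSON body for Ollama's /api/generate endpoint,
    using the default options from this model card."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": 8192,
            "temperature": 0.3,
            "top_p": 0.9,
            "repeat_penalty": 1.1,
        },
    }

body = build_generate_request(
    "Write a Flask endpoint that handles file uploads with error handling")
payload = json.dumps(body)  # POST this to http://localhost:11434/api/generate
print(body["options"]["temperature"])  # 0.3
```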

Recommended Settings

For code generation (the baked-in Modelfile defaults):

ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b
>>> /set parameter temperature 0.3
>>> /set parameter top_p 0.9
>>> /set parameter repeat_penalty 1.1

For creative/exploratory coding:

ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b
>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.95

Model Variants

Available quantizations:

| Tag | Quantization | Size | Quality | Use Case |
|---------|----------------|--------|-----------|---------------------------------|
| 3b-fp16 | Full precision | ~7.9GB | Maximum | Evaluation, further fine-tuning |
| 3b-q8 | 8-bit (Q8_0) | ~4.2GB | Excellent | General use (recommended) |
| 3b | Alias to q8 | ~4.2GB | Excellent | Default |

Usage by variant:

# Maximum quality
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-fp16

# Recommended (best balance)
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-q8
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b  # Same as q8

Training Notes

E1 Observations:

  • Training data contained no chain-of-thought examples
  • Model learned direct output format from instruction-tuned code datasets
  • Internal reasoning capability preserved (evidenced by accuracy gains)
  • Output format optimized for production code generation

Planned E2 Improvements:

  • Add chain-of-thought examples (30% of dataset)
  • Preserve <think> tag behavior for complex problem solving
  • Balance direct output with transparent reasoning
  • Maintain E1 code quality while adding reasoning transparency

When to Use Base vs Fine-Tuned

Use this model (E1 Python-DeepThink):

  • Direct code generation needed
  • Clean, production-ready output required
  • Fast inference priority

Use base Nanbeige4.1:

  • Complex problems requiring visible reasoning
  • Exploring multiple solution approaches
  • Educational explanations with thought process
  • Research/debugging requiring transparency

License

Same as base model (see Nanbeige4.1-3B license terms)


Training Experiment E1 - First systematic fine-tune of Nanbeige4.1 for Python specialization. Focus on code accuracy and clean output format. Successfully achieved 87% token accuracy with direct response style. E2 will reintroduce chain-of-thought reasoning while maintaining code quality.

Developed by: fauxpaslife
Training Date: February 2026
Hardware: RTX 5060 Ti 16GB
Training Framework: Transformers + PEFT (LoRA)
