269 pulls · Updated 3 weeks ago

Fine-tuned version of Nanbeige 4.1 3B specialized for Python code generation with direct, focused output.

3b
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b

Details

3 weeks ago

968106ba9ec7 · 4.2GB

Architecture: llama · Parameters: 3.93B · Quantization: Q8_0
Template: "{{ .Prompt }}"
Parameters: { "num_ctx": 8192, "repeat_penalty": 1.1, "temperature": 0.3, "top_p": 0.9 }

Readme


Nanbeige 4.1 Python DeepThink - 3B

Version: E1 (Experiment 1)
Training Focus: Code accuracy and clean output format
Status: Production-ready for direct code generation tasks

Training Details

Base Model: Nanbeige/Nanbeige4.1-3B
Parameters: 3 billion
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Duration: ~16 hours on RTX 5060 Ti 16GB
Final Accuracy: 87.4% mean token accuracy (up from 76.3% baseline)

Dataset Composition

Total Examples: 45,757

  • Python Code (84%): 38,284 examples from Magicoder-OSS-Instruct-75K
    • Real-world open-source Python code patterns
    • High-quality instruction-tuned format
    • Covers algorithms, APIs, data processing, debugging

  • Mathematical Reasoning (16%): 7,473 examples from GSM8K
    • Step-by-step problem solving
    • Multi-step logical chains
    • Reinforces deep thinking patterns
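As a quick sanity check, the two subsets above account for the full dataset; the percentages follow from the raw counts:

```python
# Verify that the dataset composition figures quoted above are consistent.
python_examples = 38_284  # Magicoder-OSS-Instruct-75K subset
math_examples = 7_473     # GSM8K subset
total = python_examples + math_examples

print(total)                                 # 45757, matching "Total Examples"
print(round(python_examples / total * 100))  # 84 (% Python code)
print(round(math_examples / total * 100))    # 16 (% mathematical reasoning)
```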

Training Configuration

Architecture: 3B parameters
LoRA Rank: 16
Trainable Parameters: 28.4M (0.72% of total)
Training Epochs: 1
Batch Size: 2 (effective 8 with gradient accumulation)
Learning Rate: 2e-4 → 9.6e-6 (linear decay)
Optimizer: AdamW 8-bit
Quantization: 4-bit during training, Q8_0/FP16 for inference
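The batch and learning-rate lines above imply a gradient-accumulation factor of 4 and a linear schedule from 2e-4 down to 9.6e-6. A minimal sketch of that schedule (the accumulation factor is inferred from "effective 8", and `total_steps` is a placeholder; the card does not state the real step count):

```python
# Sketch of the schedule implied by the training configuration above.
# Assumption: gradient accumulation = 4, so 2 * 4 = 8 effective batch size.

def linear_decay(step, total_steps, lr_start=2e-4, lr_end=9.6e-6):
    """Linearly interpolate the learning rate from lr_start to lr_end."""
    frac = step / total_steps
    return lr_start + (lr_end - lr_start) * frac

per_device_batch = 2
grad_accum_steps = 4  # assumed from "effective 8 with gradient accumulation"
effective_batch = per_device_batch * grad_accum_steps

print(effective_batch)           # 8
print(linear_decay(0, 1000))     # 0.0002 (initial LR)
print(linear_decay(1000, 1000))  # ~9.6e-06 (final LR)
```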

Performance Improvements

| Metric | Baseline | Fine-tuned | Improvement |
|----------------|----------|------------|-------------|
| Loss | 1.04 | 0.45 | -57% |
| Token Accuracy | 76.3% | 87.4% | +11.1 pts |
| Entropy | 0.78 | 0.44 | -44% |
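The improvement column follows directly from the baseline and fine-tuned figures:

```python
# Reproduce the "Improvement" column from the raw metrics.
loss_change = (0.45 - 1.04) / 1.04 * 100     # relative change in loss
accuracy_change = 87.4 - 76.3                # absolute change in points
entropy_change = (0.44 - 0.78) / 0.78 * 100  # relative change in entropy

print(round(loss_change))         # -57
print(round(accuracy_change, 1))  # 11.1
print(round(entropy_change))      # -44
```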

Key Characteristics (E1)

Direct Output Format
Model provides immediate, clean code responses without verbose preambles

High Code Accuracy
87.4% token-level accuracy on Python generation tasks

Fast Inference
Optimized for quick responses with temperature 0.3 default

⚠️ Suppressed Chain-of-Thought
E1 training focused on direct answers - <think> reasoning tags are not output
(Reasoning still occurs internally, but thought process is not narrated)

No, the irony is not lost on me… I did have a giggle. Don’t worry, E2 is in the works.

Best For

  • Direct Python code generation and completion
  • Algorithm implementation requiring clean output
  • Code debugging with concise explanations
  • Flask, FastAPI, and web development patterns
  • Data processing and scientific computing
  • Production codebases requiring deterministic output

Usage

ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b

Example prompts:

"Write a Flask endpoint that handles file uploads with error handling"
"Create a Python class for managing Ollama model routing"
"Implement a binary search tree with insertion and traversal methods"
"Fix this code: [paste code]"
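Beyond the CLI, the model can also be called through Ollama's local REST API (`POST /api/generate` on port 11434). A minimal sketch that builds a request body using this card's default options; the helper name is illustrative, not part of any library:

```python
import json

def build_generate_request(prompt,
                           model="fauxpaslife/nanbeige4.1-python-deepthink:3b"):
    """Build a JSON body for Ollama's /api/generate endpoint,
    using the default options from this model card."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": 8192,
            "temperature": 0.3,
            "top_p": 0.9,
            "repeat_penalty": 1.1,
        },
    }

body = build_generate_request(
    "Write a Flask endpoint that handles file uploads with error handling")
payload = json.dumps(body)  # POST this to http://localhost:11434/api/generate
print(body["options"]["temperature"])  # 0.3
```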

Recommended Settings

For code generation (the baked-in Modelfile defaults):

ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b
>>> /set parameter temperature 0.3
>>> /set parameter top_p 0.9
>>> /set parameter repeat_penalty 1.1

For creative/exploratory coding:

ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b
>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.95

Model Variants

Available quantizations:

| Tag | Quantization | Size | Quality | Use Case |
|---------|----------------|--------|-----------|---------------------------------|
| 3b-fp16 | Full precision | ~7.9GB | Maximum | Evaluation, further fine-tuning |
| 3b-q8 | 8-bit (Q8_0) | ~4.2GB | Excellent | General use (recommended) |
| 3b | Alias to q8 | ~4.2GB | Excellent | Default |

Usage by variant:

# Maximum quality
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-fp16

# Recommended (best balance)
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b-q8
ollama run fauxpaslife/nanbeige4.1-python-deepthink:3b  # Same as q8

Training Notes

E1 Observations:

  • Training data contained no chain-of-thought examples
  • Model learned direct output format from instruction-tuned code datasets
  • Internal reasoning capability preserved (evidenced by accuracy gains)
  • Output format optimized for production code generation

Planned E2 Improvements:

  • Add chain-of-thought examples (30% of dataset)
  • Preserve <think> tag behavior for complex problem solving
  • Balance direct output with transparent reasoning
  • Maintain E1 code quality while adding reasoning transparency

When to Use Base vs Fine-Tuned

Use this model (E1 Python-DeepThink):

  • Direct code generation needed
  • Clean, production-ready output required
  • Fast inference priority

Use base Nanbeige4.1:

  • Complex problems requiring visible reasoning
  • Exploring multiple solution approaches
  • Educational explanations with thought process
  • Research/debugging requiring transparency

License

Same as base model (see Nanbeige4.1-3B license terms)


Training Experiment E1 - First systematic fine-tune of Nanbeige4.1 for Python specialization. Focus on code accuracy and clean output format. Successfully achieved 87% token accuracy with direct response style. E2 will reintroduce chain-of-thought reasoning while maintaining code quality.

Developed by: fauxpaslife
Training Date: February 2026
Hardware: RTX 5060 Ti 16GB
Training Framework: Transformers + PEFT (LoRA)
