
An open-weight 4B dense instruct model optimized for French, delivering strong performance, natural dialogue, and robust multilingual usability for local and agent-oriented use.

Capabilities: tools

ollama run jpacifico/chocolatine-2.1

Details

Updated 2 days ago

889a0931d5ca · 2.5GB

- Architecture: qwen3
- Parameters: 4.02B
- Quantization: Q4_K_M
- Chat template: ChatML-style (`<|im_start|>` …)
- System prompt: "You are a friendly assistant called Chocolatine."
- Default parameters: num_ctx 32768, presence_penalty 1.4, stop "<|im_start|>", …
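These shipped defaults can also be set or overridden per request through the standard Ollama REST API. A minimal sketch of building such a request body in Python (the helper name is illustrative; the options mirror the model's parameter file above):

```python
import json

# Default generation options shipped with chocolatine-2.1 (from the model's
# parameter file); num_ctx can be lowered on memory-constrained machines.
DEFAULT_OPTIONS = {
    "num_ctx": 32768,
    "presence_penalty": 1.4,
}

def build_chat_request(prompt, model="jpacifico/chocolatine-2.1", options=None):
    """Build a JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # per-request options take precedence over the shipped defaults
        "options": {**DEFAULT_OPTIONS, **(options or {})},
        "stream": False,
    }

body = build_chat_request("Explique la photosynthèse en deux phrases.")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Sent as the body of a POST to `http://localhost:11434/api/chat` on a default local Ollama install.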

Readme

Chocolatine-2.1

Built through post-training with a strong focus on French, Chocolatine-2.1 is designed to deliver more natural and effective behavior in French while preserving strong multilingual usability. Despite the French-centered training pipeline, English performance remains robust and can even slightly improve over the base model, suggesting positive cross-lingual transfer.

Overview

A compact 4B instruct model derived from Qwen3-4B-Instruct-2507 and post-trained with DPO and model merging.
This is the most practical entry point in the Chocolatine-2 family for users who want a strong quality / speed / footprint trade-off, especially for local inference and lightweight agent loops.

Key characteristics:
- 4B dense instruct model
- 262K native context window
- optimized for direct, efficient generation
- strong French benchmark gains relative to the base model
- robust English performance despite French-focused post-training
Model Variants

chocolatine-2.1:latest is the default recommended variant.
It currently maps to chocolatine-2.1:q4_k_m, which is the most accessible option across a wide range of hardware, including CPU-only setups.

Q4_K_M

Recommended default variant for most users.

ollama run jpacifico/chocolatine-2.1:q4_k_m

Q8_0

Higher-quality variant for users who can afford a larger memory footprint.

ollama run jpacifico/chocolatine-2.1:q8_0

MLX Variants

Additional MLX variants are available in 4bit and 6bit on Hugging Face for Apple Silicon workflows. Ollama also supports MLX-based inference, making these variants relevant for local Apple Silicon use.

Benchmark Results

Chocolatine-2.1 places a strong emphasis on measured performance, not only stylistic adaptation.
The comparison is against its direct base model, Qwen3-4B-Instruct-2507.
Across the reported French benchmarks, Chocolatine-2.1 shows consistent improvements over the base model, from grammar and reading comprehension to general knowledge and multi-turn dialogue, indicating a broad gain in French performance across the evaluated tasks.

| Benchmark | Base model | Chocolatine-2.1 4B | Delta |
|---|---|---|---|
| gpqa-fr:diamond | 28.93 | 32.49 | +3.56 |
| french_bench_arc_challenge | 47.13 | 49.79 | +2.66 |
| french_bench_grammar | 70.59 | 72.27 | +1.68 |
| xwinograd_fr | 66.27 | 67.47 | +1.20 |
| french_bench_boolqa | 88.76 | 89.89 | +1.13 |
| french_bench_hellaswag | 56.99 | 58.03 | +1.04 |
| global_mmlu_fr | 63.75 | 64.75 | +1.00 |
| fr_mt_bench | 6.22 | 6.44 | +0.22 |

FR-MT-Bench evaluation is performed on MT-Bench-French, using multilingual-mt-bench with OpenAI GPT-5 as the LLM judge.
global_mmlu_fr, xwinograd_fr, and french_bench results were obtained with the EleutherAI LM Eval Harness in a 0-shot setting.
gpqa-fr:diamond results were obtained with LightEval/vLLM via the kurakurai/Luth evaluation process.

Recommended Use Cases

Chocolatine-2.1 is well suited for:
- French general assistants
- local chat and private inference
- RAG and document-grounded assistants
- automation and agent-oriented workflows
- Apple Silicon and local-first experimentation
- compact deployments where latency and cost matter

Limitations

Chocolatine-2.1 does not include a built-in moderation layer.
As with any open-weight instruct model, output quality depends on prompting, evaluation setup, context quality, and deployment safeguards. Additional care is recommended for sensitive or high-risk applications.

License

Apache-2.0

References

Developed by: Jonathan Pacifico, 2026.
Made with ❤️ in France