north-mini-code-1.0:mlx-bf16

North Mini Code is Cohere's first model for developers — a 30B Mixture-of-Experts model with 3B active parameters, built for agentic software engineering.

Details

Updated 7 hours ago

7 hours ago

70520154ae45 · 61GB ·

json

{ "architectures": [ "Cohere2MoeForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "

2.3kB

json

{ "_from_model_config": true, "bos_token_id": 2, "eos_token_id": 255001, "pad_token_id": 0, "transfo

136B

json

{ "bos_token": { "content": "<BOS_TOKEN>", "lstrip": false, "normalized": false, "rstrip": false, "s

672B

json

{ "version": "1.0", "truncation": null, "padding": null, "added_tokens": [ { "id": 0, "content": "<P

28MB

json

{ "add_bos_token": true, "add_eos_token": false, "add_prefix_space": false, "clean_up_tokenization_s

9.0kB

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

params

{ "temperature": 1, "top_p": 0.95 }

31B

346 tensors

61GB

North Mini Code is the first model in Cohere’s new family of models, and is specifically designed and trained for agentic software engineering tasks.

Agentic coding focus, post-trained with two-stage supervised fine-tuning followed by reinforcement learning with verifiable rewards (RLVR) on real-world software engineering and terminal tasks.
256K context length with up to 64K output tokens, optimized for repository-scale understanding and long-horizon agent trajectories.
Trained across multiple agent harnesses (SWE-Agent, mini-SWE-agent, OpenCode, Terminus 2) for robustness in real-world tooling environments rather than a single scaffold.
Native tool-use and interleaved thinking support, designed to plug into coding agents like OpenCode.

On Artificial Analysis’ Coding Index, North Mini Code scores 33.4, outperforming similarly sized open models like Qwen3.5 (35B-A3B), Gemma 4 (26B-A4B), and Devstral Small 2 (24B), as well as substantially larger models including Nemotron 3 Super (120B-A12B), Mistral Small 4 (119B-A6B), and Devstral 2 (123B).

Architecture

North Mini Code is a decoder-only Transformer-based sparse Mixture-of-Experts model. It interleaves sliding-window attention (with RoPE) and global attention (with no positional embeddings) in a 3:1 ratio. The feed-forward block is an MoE block with 128 experts, 8 of which are activated per token, each using SwiGLU activation. The router applies a sigmoid activation before top-k selection, and a single dense layer precedes the sparse layers.

Tool use

North Mini Code is trained for tool use and agentic coding, and supports interleaved thinking — it works best with thinking enabled. For best performance, pass model-generated thinking content forward to subsequent agentic steps and chat turns. Tool descriptions are best provided as JSON schema.

License

North Mini Code is released under the Apache 2.0 license, and also requires adhering to Cohere Lab’s Acceptable Use Policy.

North Mini Code is Cohere's first model for developers — a 30B Mixture-of-Experts model with 3B active parameters, built for agentic software engineering.

Details

Readme

Architecture

Tool use

License

Reference