281 7 hours ago

North Mini Code is Cohere's first model for developers — a 30B Mixture-of-Experts model with 3B active parameters, built for agentic software engineering.

tools thinking
ollama run north-mini-code-1.0:mlx-bf16

Details

7 hours ago

70520154ae45 · 61GB ·

{ "architectures": [ "Cohere2MoeForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "
{ "_from_model_config": true, "bos_token_id": 2, "eos_token_id": 255001, "pad_token_id": 0, "transfo
{ "bos_token": { "content": "<BOS_TOKEN>", "lstrip": false, "normalized": false, "rstrip": false, "s
{ "version": "1.0", "truncation": null, "padding": null, "added_tokens": [ { "id": 0, "content": "<P
{ "add_bos_token": true, "add_eos_token": false, "add_prefix_space": false, "clean_up_tokenization_s
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{ "temperature": 1, "top_p": 0.95 }
346 tensors

Readme

North Mini Code is the first model in Cohere’s new family of models, and is specifically designed and trained for agentic software engineering tasks.

Benchmark

  • Agentic coding focus, post-trained with two-stage supervised fine-tuning followed by reinforcement learning with verifiable rewards (RLVR) on real-world software engineering and terminal tasks.
  • 256K context length with up to 64K output tokens, optimized for repository-scale understanding and long-horizon agent trajectories.
  • Trained across multiple agent harnesses (SWE-Agent, mini-SWE-agent, OpenCode, Terminus 2) for robustness in real-world tooling environments rather than a single scaffold.
  • Native tool-use and interleaved thinking support, designed to plug into coding agents like OpenCode.

On Artificial Analysis’ Coding Index, North Mini Code scores 33.4, outperforming similarly sized open models like Qwen3.5 (35B-A3B), Gemma 4 (26B-A4B), and Devstral Small 2 (24B), as well as substantially larger models including Nemotron 3 Super (120B-A12B), Mistral Small 4 (119B-A6B), and Devstral 2 (123B).

Architecture

North Mini Code is a decoder-only Transformer-based sparse Mixture-of-Experts model. It interleaves sliding-window attention (with RoPE) and global attention (with no positional embeddings) in a 3:1 ratio. The feed-forward block is an MoE block with 128 experts, 8 of which are activated per token, each using SwiGLU activation. The router applies a sigmoid activation before top-k selection, and a single dense layer precedes the sparse layers.

Tool use

North Mini Code is trained for tool use and agentic coding, and supports interleaved thinking — it works best with thinking enabled. For best performance, pass model-generated thinking content forward to subsequent agentic steps and chat turns. Tool descriptions are best provided as JSON schema.

License

North Mini Code is released under the Apache 2.0 license, and also requires adhering to Cohere Lab’s Acceptable Use Policy.

Reference