laguna-xs.2:mlx-bf16

Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token, designed for agentic coding and long-horizon work on a local machine.

Capabilities: tools, thinking
ollama run laguna-xs.2:mlx-bf16

Details

10011d1c9084 · 67GB

600 tensors

Readme

Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token, designed for agentic coding and long-horizon work on a local machine. It uses Sliding Window Attention with per-head gating in 30 out of 40 layers for fast inference and low KV cache requirements.

For more details on how we trained this model, including data automixing and async off-policy agent RL, check out our release blog post.

Highlights

  • Mixed SWA and global attention layout: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
  • KV cache in FP8: KV cache quantized to FP8, reducing memory per token
  • Native reasoning support: Interleaved thinking between tool calls with support for enabling and disabling thinking per-request
  • Local-ready: At 33B total parameters and 3B activated, Laguna XS.2 is compact enough to run on a Mac with 36 GB of RAM. Available on Ollama
  • Apache 2.0 license: Use and modify freely for commercial and non-commercial purposes
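The per-request thinking toggle mentioned above can be exercised through Ollama's chat endpoint. A minimal sketch of the request bodies, assuming the `think` field supported by recent Ollama releases (POST the payload to `http://localhost:11434/api/chat`):

```python
import json

MODEL = "laguna-xs.2:mlx-bf16"  # tag from this page

def chat_payload(prompt: str, think: bool) -> dict:
    """Build an Ollama /api/chat request body; the `think` field
    toggles the reasoning phase per request (assumes a recent
    Ollama version with thinking support)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,  # False skips interleaved thinking entirely
        "stream": False,
    }

# Reasoning on for a hard refactor, off for a trivial edit:
hard = chat_payload("Refactor this parser to be iterative.", think=True)
quick = chat_payload("Rename variable x to count.", think=False)
print(json.dumps(hard, indent=2))
```

When `think` is true, the reply's thinking content arrives in a separate field from the final answer, so a harness can log or drop it independently.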

Model overview

  • Training: pre-training, post-training and reinforcement learning stages
  • Number of parameters: 33B total with 3B activated per token
  • Optimizer: Muon
  • Layers: 40 layers (10 layers with global attention, 30 layers with sliding window attention)
  • Experts: 256 experts with 1 shared expert
  • Sliding Window: 512 tokens
  • Modality: text-to-text
  • Context window: 131,072 tokens
  • Reasoning support: interleaved thinking between tool calls, with thinking content preserved across turns
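This layout is what keeps the KV cache small: only the 10 global-attention layers grow with the full context, while the 30 SWA layers are capped at the 512-token window, and every cached value is one FP8 byte. A back-of-envelope sketch; the KV head count and head dimension below are illustrative assumptions, not published figures:

```python
# Rough KV cache size for the mixed SWA/global layout described above.
FP8_BYTES = 1          # KV cache stored in FP8 (1 byte per value)
GLOBAL_LAYERS = 10     # attend over the full context
SWA_LAYERS = 30        # attend over at most a 512-token window
WINDOW = 512
NUM_KV_HEADS = 4       # ASSUMPTION -- not in the model card
HEAD_DIM = 128         # ASSUMPTION -- not in the model card

def kv_cache_bytes(context_len: int) -> int:
    per_token = 2 * NUM_KV_HEADS * HEAD_DIM * FP8_BYTES  # K and V
    global_part = GLOBAL_LAYERS * context_len * per_token
    swa_part = SWA_LAYERS * min(context_len, WINDOW) * per_token
    return global_part + swa_part

print(f"{kv_cache_bytes(131_072) / 2**20:.0f} MiB at full context")
```

Under these assumptions the SWA layers contribute a small constant regardless of context length, which is why the cache stays in the low gigabytes even at 131K tokens.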

Benchmark results

| Model | Size (total params) | SWE-bench Verified | SWE-bench Multilingual | SWE-bench Pro (Public Dataset) | Terminal-Bench 2.0 |
|---|---|---|---|---|---|
| Laguna XS.2 | 33B | 68.2% | 62.4% | 44.5% | 30.1% |
| Devstral Small 2 | 24B dense | 68.0% | 55.7% | - | 22.5% |
| Gemma 4 31B IT | 31B dense | 52.0% | 51.7% | 35.7% | 42.9% |
| Qwen3.5-35B-A3B | 35B | 69.2% | 60.3% | 44.6% | 40.5% |
| Qwen3.6-35B-A3B | 35B | 73.4% | 67.2% | 49.5% | 51.5% |
| Claude Haiku 4.5 | - | 73.3% | - | 39.5% | 29.8% |
| GPT-5.4 Nano | - | - | - | 52.4% | 46.3% |

We used the highest publicly referenced scores for all comparison models on each benchmark. In almost all cases these were official scores published in release blog posts or equivalent. The exceptions are Gemma 4 31B IT, where the highest published scores were reported by the Qwen team, and Claude Haiku 4.5, where the highest published (verified) SWE-bench Pro and Terminal-Bench 2.0 scores come from the respective official leaderboards.

Benchmarking methodology

All benchmarking for Laguna XS.2 was completed using the Laude Institute’s Harbor Framework with our [agent harness](https://github.com/poolsideai/pool), using a maximum of 500 steps and sandboxed execution with 8 GB RAM/2 CPUs (with the exception of Terminal-Bench 2.0; see below). The same sampling parameters were used for all benchmarking: temperature=0.7 and top_k=20. Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details outlining these updates and other findings will follow in a future technical blog post.

  • SWE-bench Verified: mean pass@1 averaged over 4 runs.
  • SWE-bench Multilingual: mean pass@1 averaged over 7 runs.
  • SWE-bench Pro: mean pass@1 averaged over 3 runs.
  • Terminal-Bench 2.0: mean pass@1 averaged over 5 runs, with 48 GB RAM/32 CPUs.

License

This model is licensed under the Apache 2.0 License.

Intended and Responsible Use

Laguna XS.2 is designed for software engineering and agentic coding use cases, and you are responsible for confirming that it is appropriate for your intended application. Laguna XS.2 is subject to the Apache 2.0 License and should be used in a manner consistent with Poolside’s Acceptable Use Policy. We advise against circumventing Laguna XS.2’s safety guardrails without implementing substantially equivalent mitigations appropriate for your use case.

Please report security vulnerabilities or safety concerns to security@poolside.ai.