OpenAI gpt-oss

August 5, 2025

Welcome OpenAI’s gpt-oss!

Ollama has partnered with OpenAI to bring OpenAI's latest state-of-the-art open-weight models to Ollama. The two models, 20B and 120B, bring a whole new local chat experience and are designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Feature highlights

Quantization - MXFP4 format

OpenAI uses quantization to reduce the memory footprint of the gpt-oss models. The models are post-trained with the mixture-of-experts (MoE) weights quantized to the MXFP4 format, at 4.25 bits per parameter. The MoE weights account for more than 90% of the total parameter count, and quantizing them to MXFP4 enables the smaller model to run on systems with as little as 16 GB of memory and the larger model to fit on a single 80 GB GPU.
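
As a rough illustration of the stated memory budgets, the back-of-the-envelope estimate below applies the 4.25 bits-per-parameter figure to the MoE weights. It is only a sketch: the block layout in the comment (32 4-bit elements sharing an 8-bit scale) is an assumption about MXFP4, and the estimate ignores the non-MoE weights, activations, and KV cache.

```python
# Rough size estimate for MXFP4-quantized MoE weights.
# Assumption (not stated in this post): MXFP4 packs 32 4-bit elements per
# block plus one shared 8-bit scale, i.e. 4 + 8/32 = 4.25 bits per parameter.
BITS_PER_PARAM = 4 + 8 / 32  # 4.25

def moe_weight_gb(total_params: float, moe_fraction: float = 0.9) -> float:
    """Approximate MoE weight size in GB at 4.25 bits per parameter."""
    return total_params * moe_fraction * BITS_PER_PARAM / 8 / 1e9

print(f"gpt-oss-20b:  ~{moe_weight_gb(20e9):.1f} GB of MoE weights (16 GB system)")
print(f"gpt-oss-120b: ~{moe_weight_gb(120e9):.1f} GB of MoE weights (80 GB GPU)")
```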

Ollama supports the MXFP4 format natively, without additional quantization or conversion. New kernels were developed for Ollama's new engine to support MXFP4.
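
For intuition about what such a kernel has to do, below is a minimal pure-Python sketch of dequantizing one MXFP4 block, assuming the OCP microscaling layout commonly associated with the format (32 FP4 E2M1 elements sharing one power-of-two scale). It is an illustration only, not Ollama's actual kernel code.

```python
# Illustrative MXFP4 block dequantization.
# Assumption: OCP microscaling layout with 32 FP4 (E2M1) elements per block
# and one shared E8M0 (power-of-two) scale. Not Ollama's optimized kernel.

# Magnitudes representable by FP4 E2M1; the sign bit is handled separately.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def dequantize_block(codes: list[int], scale_exponent: int) -> list[float]:
    """Decode one block of 4-bit codes into floats.

    codes: up to 32 values in [0, 15]; bit 3 is the sign, bits 0-2 index E2M1_MAGNITUDES.
    scale_exponent: shared 8-bit exponent, interpreted as 2 ** (scale_exponent - 127).
    """
    scale = 2.0 ** (scale_exponent - 127)
    values = []
    for code in codes:
        sign = -1.0 if code & 0b1000 else 1.0
        values.append(sign * E2M1_MAGNITUDES[code & 0b0111] * scale)
    return values

# Example: a block whose shared scale is 2 ** (128 - 127) = 2.
print(dequantize_block([0b0001, 0b1010, 0b0111], scale_exponent=128))
# -> [1.0, -2.0, 12.0]
```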

Ollama collaborated with OpenAI to benchmark against their reference implementation and ensure Ollama's implementation delivers the same quality.

20B parameter model

The gpt-oss-20b model is designed for lower latency and local or specialized use cases.

120B parameter model

The gpt-oss-120b model is designed for production, general-purpose, and high-reasoning use cases.

NVIDIA and Ollama collaborate to accelerate gpt-oss on GeForce RTX and RTX PRO GPUs

NVIDIA and Ollama are deepening their partnership to boost model performance on NVIDIA GeForce RTX and RTX PRO GPUs. This collaboration enables users on RTX-powered PCs to take full advantage of OpenAI's gpt-oss models.

We will continue to collaborate with NVIDIA to enhance Ollama, and we will publish an in-depth engineering post on the model in the future.

Get started

Get started by downloading the latest version of Ollama.

The models can be downloaded directly in Ollama's new app or via the terminal:

ollama run gpt-oss:20b

ollama run gpt-oss:120b
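
Once a model has been pulled, it can also be called programmatically. Below is a minimal sketch that talks to Ollama's local REST API with Python, assuming the server is running at its default address (localhost:11434) and the requests package is installed.

```python
import requests

# Ask the locally running gpt-oss-20b model a question via Ollama's chat endpoint.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gpt-oss:20b",
        "messages": [
            {"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}
        ],
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
)
print(response.json()["message"]["content"])
```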

Reference

OpenAI launch blog
OpenAI model card
NVIDIA RTX blog