612 2 months ago

20b MXFP4 GPT-oss model stripped of the built in tools

tools thinking

7e0c17b5b7de · 14GB · gptoss · 20.9B · MXFP4
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutof
{ "min_p": 0, "stop": [ "<|call|>", "<|endoftext|>" ], "temperature"

Readme


Latest Fix: 2025/08/09

  • The error
    • Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-bc4d52a46e1d89088ff3cbb4be21a7c99f0bb68b53514d7d50679c9f07e33a41
  • Fix: use an up-to-date version of Ollama.
  • The quantization type was previously F16 for this model, which caused issues; it has now been changed to MXFP4.
  • The template is unchanged, so there is still no clutter of built-in tools (browser search, python) taking priority over yours.

If you run into any issues, please feel free to contact me on LinkedIn: https://linkedin.com/in/mashriram

Hugging Face is also a great place to discuss: https://huggingface.co/mashriram

My sincere apologies and gratitude to everyone who tried this model. Thanks a lot for supporting me; I hope you get great use out of the model.

gpt-oss-20b: Regular Developer Edition

This repository provides a custom Ollama Modelfile for OpenAI’s gpt-oss-20b, an open-weight model designed for reasoning and agentic tasks.

This version is specifically configured to be an “unfettered” developer edition. It delivers the raw agentic capabilities of the model by removing the built-in, hardcoded tools (browser, python). This gives you, the developer, complete control and transparency over tool implementation.

Philosophy: Why Use This Version?

The official gpt-oss model from OpenAI is fantastic, but it comes with pre-packaged tools. This version is for developers who need more control:

  • Total Control: You are not locked into a specific browser or Python implementation. You can define your own tools from scratch and have the model call them. Want a browse tool that uses Selenium instead of a simple GET request? You can build it. Need a Python sandbox with specific libraries? It’s all up to you.
  • Maximum Flexibility: By exposing the model’s native tool-calling format (OpenAI’s harmony response format), you can build complex, multi-tool agents without any black boxes. You see the full chain-of-thought and every tool interaction exactly as the model generates it.
  • Transparency: Without built-in tools, you have a clearer view of the model’s core reasoning capabilities. This is ideal for research, fine-tuning, and debugging agentic behavior.

Core Features (Inherited from gpt-oss)

This model retains all the powerful features of the original gpt-oss release:

  • Powerful Agentic Capabilities: Native support for function calling using the model’s own tool-calling format (OpenAI’s harmony response format).
  • Full Chain-of-Thought: Gain complete access to the model’s reasoning process (analysis channel) for easier debugging and increased trust.
  • Configurable Reasoning Effort: The model’s template supports adjusting reasoning effort (though this Modelfile defaults to medium).
  • Fine-Tunable: The base model is perfect for further parameter-efficient or full fine-tuning for your specific use case.
  • Permissive Apache 2.0 License: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.

Run this model with:

ollama run mashriram/gpt-oss-Regular
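
The same model can also be reached from code through Ollama's local HTTP API. The following is a minimal sketch, assuming Ollama is running on its default address (http://localhost:11434) and the model has already been pulled; the helper names build_chat_request and chat are illustrative and not part of this repository.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "mashriram/gpt-oss-Regular"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat request for the model."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["message"]["content"]
```

Calling chat("Hello") with the server running should return the model's reply as a string.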

How to Use Tool Calling

This model uses a unique format for tool calling. Your application needs to be able to parse the model’s output and provide responses in the correct format.

Example Flow: Using a custom get_weather tool

  1. Initial Request (Your App -> Ollama) You provide the tool definitions in the tools field of your API request (the Modelfile template reads them as .Tools).

  2. Model’s Response (Ollama -> Your App) The model will first output its reasoning, then the tool call.

    <|start|>assistant<|channel|>analysis<|message|>The user is asking for the weather in a specific location. I need to use the `get_weather` tool.<|end|>
    <|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{"location": "San Francisco"}<|call|>
    
  3. Tool Execution (Your App) Your code parses the to=functions.get_weather and the arguments {"location": "San Francisco"}. You execute your function and get a result, e.g., {"temperature": "65F", "conditions": "Foggy"}.

  4. Tool Response (Your App -> Ollama) You send the result back to the model, specifying the tool name.

  5. Final Answer (Ollama -> Your App) The model processes the tool result and generates a final answer.

    <|start|>assistant<|channel|>final<|message|>The weather in San Francisco is currently 65°F and Foggy.<|end|>
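
The application side of this flow (steps 1, 3 and 4) might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the regular expression is modelled only on the example output shown above and real output may differ in spacing or token order; get_weather is a hypothetical stub; the tool schema and the "tool_name" key follow the shapes used by Ollama's chat API as I understand them and should be checked against the Ollama version you run. If Ollama already returns parsed tool calls in its chat response, the parsing step may not be needed.

```python
import json
import re

# Step 1: tool definition to pass in the "tools" field of the chat request.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

# Modelled on the example output above. "<|call|>" is optional because it is
# configured as a stop sequence and may be cut from the returned text.
TOOL_CALL_RE = re.compile(
    r"<\|start\|>assistant to=functions\.(?P<name>[\w.-]+)"
    r"<\|channel\|>commentary json<\|message\|>(?P<args>.*?)(?:<\|call\|>|$)",
    re.DOTALL,
)


def parse_tool_call(raw: str):
    """Return (tool_name, arguments) for the first tool call in raw, else None."""
    match = TOOL_CALL_RE.search(raw)
    if match is None:
        return None
    return match.group("name"), json.loads(match.group("args"))


def get_weather(location: str) -> dict:
    """Hypothetical tool; a real one would query a weather service."""
    return {"temperature": "65F", "conditions": "Foggy"}


TOOLS = {"get_weather": get_weather}


def tool_result_message(raw: str) -> dict:
    """Steps 3 and 4: run the requested tool and build the reply message."""
    parsed = parse_tool_call(raw)
    if parsed is None:
        raise ValueError("no tool call found in model output")
    name, arguments = parsed
    result = TOOLS[name](**arguments)
    # Assumption: tool results are sent back as a "tool" role message.
    return {"role": "tool", "tool_name": name, "content": json.dumps(result)}


raw_output = (
    "<|start|>assistant<|channel|>analysis<|message|>The user is asking for the "
    "weather in a specific location. I need to use the `get_weather` tool.<|end|>\n"
    "<|start|>assistant to=functions.get_weather<|channel|>commentary json"
    '<|message|>{"location": "San Francisco"}<|call|>'
)
reply = tool_result_message(raw_output)
```

The reply dictionary is then appended to the conversation's message list and the request is sent again, after which the model can produce its final answer (step 5).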
    

Technical Details

OpenAI uses MXFP4 quantization for the MoE weights in gpt-oss models. This Modelfile uses the raw GGUF, and Ollama’s engine supports this format natively, ensuring the highest possible quality without additional conversions.
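
For readers unfamiliar with MXFP4, the sketch below illustrates the logical format as defined in the OCP Microscaling specification: blocks of 32 four-bit E2M1 values share one 8-bit power-of-two scale. It decodes values from already-unpacked 4-bit codes and does not reproduce the byte layout used inside a GGUF file; the function names are illustrative.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements that share one scale
SCALE_BIAS = 127  # bias of the 8-bit exponent-only (E8M0) scale


def decode_fp4(code: int) -> float:
    """Decode one 4-bit E2M1 code into a float."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]


def decode_block(scale_byte: int, codes: list[int]) -> list[float]:
    """Apply the shared power-of-two scale to every element of a block."""
    scale = 2.0 ** (scale_byte - SCALE_BIAS)
    return [scale * decode_fp4(code) for code in codes]


def bits_per_weight() -> float:
    """Average storage cost: 32 four-bit elements plus one 8-bit scale."""
    return (BLOCK_SIZE * 4 + 8) / BLOCK_SIZE
```

Under this layout a block costs 4.25 bits per weight on average. Since only the MoE weights use MXFP4, the overall file size also reflects tensors stored at higher precision.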

License

The base gpt-oss model is licensed under the Apache 2.0 license, granting permissive use for commercial and private applications.