117 downloads · Updated 3 months ago
ollama run Franchisco/gpt-120b-extreme-speed
ca2bbe4209cd · 65GB
Summary
"The Ultimate Coding Powerhouse for Consumer GPUs. Fine-tuned for ClaudeCode & Clawdbot, bringing high-end reasoning to setups with less than 24GB VRAM. Experience 120B-class intelligence on your local machine."
Overview
Stop struggling with massive models that won't fit on your hardware. This model is a precision-engineered version of the 120B architecture, fine-tuned specifically to serve as a foundation model for ClaudeCode and Clawdbot. Every parameter has been optimized so that users with less than 24GB of VRAM can run advanced AI agents locally without sacrificing reasoning depth.
Why This Model? (The Game Changer)
- Built for Local Agents: Specifically optimized to handle the complex loops and tool-calling requirements of ClaudeCode and Clawdbot.
- VRAM Efficient: Engineered for performance on consumer-grade GPUs. If you have a 3090, a 4090, or even a lower-tier card with clever quantization, this is your new default.
- Elite Reasoning: Unlike standard models, this version excels at "Thinking" before acting, helping your autonomous agents avoid getting stuck in infinite loops.
Key Features
- Reasoning-Focused: Built-in support for hierarchical "Thinking" processes (Reasoning: Medium/High) for consistent logical output.
- Advanced Tool Integration: Native support for Browser and Python tool execution within agentic workflows.
- 32k Context Window: Analyze entire codebases or long documentation at once without losing track of the mission.
- Structured Multi-Channel Output: Separates internal logic (Analysis/Commentary) from the final code output to keep agent logs clean.
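As a sketch of how an agent frontend could wire up tool integration: Ollama's `/api/chat` endpoint accepts OpenAI-style tool definitions. The `run_python` tool below is hypothetical and for illustration only; the actual tools are supplied by ClaudeCode or Clawdbot.

```python
import json

# Hypothetical tool definition in the OpenAI-style schema that
# Ollama's /api/chat endpoint accepts.
run_python_tool = {
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a Python snippet and return stdout.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to run"},
            },
            "required": ["code"],
        },
    },
}

# Request payload for the chat endpoint; the model is asked a task
# that may trigger a tool call in its response.
payload = {
    "model": "Franchisco/gpt-120b-extreme-speed",
    "messages": [{"role": "user", "content": "List the files in the repo root."}],
    "tools": [run_python_tool],
    "stream": False,
}

print(json.dumps(payload, indent=2))
# Send with e.g. requests.post("http://localhost:11434/api/chat", json=payload);
# the call is left out here because it requires a running Ollama server.
```

The agent then inspects the response's `tool_calls`, runs the requested tool, and feeds the result back as a `tool`-role message.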
Technical Specifications
- Foundation: Fine-tuned 120B architecture
- Knowledge Cutoff: June 2024
- Context Length: 32,768 tokens
- Optimal Settings: temperature: 0.3, top_p: 0.9 for surgical precision in coding
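One way to bake the recommended settings in, assuming the model has already been pulled, is a custom Modelfile (the derived model name `my-coder` below is illustrative):

```
# Modelfile sketch applying the recommended sampling settings
FROM Franchisco/gpt-120b-extreme-speed
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 32768
```

Build and run it with `ollama create my-coder -f Modelfile` followed by `ollama run my-coder`, so every session starts with the coding-tuned defaults and the full 32k context.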
How to Use with Agents
This model is best suited for:
- ClaudeCode / Clawdbot: Use as the backend LLM for autonomous coding tasks.
- Chain-of-Thought (CoT): Solving complex architectural bugs that 7B or 70B models miss.
- Large Project Refactoring: Leveraging the 32k context for deep structural changes.