
Advanced AI Assistant with integrated reasoning and multi-tool support. Featuring a 32k context window and optimized for precise, chain-of-thought analysis and complex task execution.

Capabilities: tools, thinking
ollama run Franchisco/gpt-120b-extreme-speed
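Beyond the CLI, a pulled model can be called from code through Ollama's local HTTP API (default `http://localhost:11434`). A minimal sketch, assuming a running Ollama server; the payload fields (`model`, `messages`, `stream`, `options`) are from the standard `/api/chat` endpoint, and the temperature/top_p values match the recommended settings further down:

```python
import json
import urllib.request

MODEL = "Franchisco/gpt-120b-extreme-speed"
OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint


def build_chat_request(prompt: str) -> dict:
    """Build a non-streaming /api/chat payload for the model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Recommended coding settings from the spec section below
        "options": {"temperature": 0.3, "top_p": 0.9},
    }


def chat(prompt: str) -> str:
    """Send the request to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling `chat("Explain this stack trace")` requires Ollama to be running locally; `build_chat_request` alone has no side effects and is useful for inspecting what your agent actually sends.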

Applications

Claude Code: ollama launch claude --model Franchisco/gpt-120b-extreme-speed
OpenClaw: ollama launch openclaw --model Franchisco/gpt-120b-extreme-speed
Hermes Agent: ollama launch hermes --model Franchisco/gpt-120b-extreme-speed
Codex: ollama launch codex --model Franchisco/gpt-120b-extreme-speed
OpenCode: ollama launch opencode --model Franchisco/gpt-120b-extreme-speed



Optimized Model Description

Summary: “The Ultimate Coding Powerhouse for Consumer GPUs. Fine-tuned for ClaudeCode & Clawdbot, bringing high-end reasoning to setups with less than 24 GB of VRAM. Experience 120B-class intelligence on your local machine.”

Overview

Stop struggling with massive models that won’t fit your hardware. This model is a precision-engineered version of the 120B architecture, specifically fine-tuned to serve as the foundation model for ClaudeCode and Clawdbot. Every parameter has been optimized so that users with less than 24 GB of VRAM can run advanced AI agents locally without sacrificing reasoning depth.

Why This Model? (The Game Changer)

Built for Local Agents: Specifically optimized to handle the complex loops and tool-calling requirements of ClaudeCode and Clawdbot.

VRAM Efficient: Engineered for performance on consumer-grade GPUs. If you have a 3090, 4090, or even lower-tier cards with clever quantization, this is your new default.

Elite Reasoning: Unlike standard models, this version excels at “Thinking” before acting, greatly reducing the chance that your autonomous agents get stuck in infinite loops.

Key Features

Reasoning-Focused: Built-in support for hierarchical “Thinking” processes (Reasoning: Medium/High) for strong logical consistency.

Advanced Tool Integration: Native framework for seamless Browser and Python tool execution within agentic workflows.

32k Context Window: Analyze entire codebases or long documentation at once without losing track of the mission.

Structured Multi-Channel Output: Separate internal logic (Analysis/Commentary) from the final code output to maintain clean agent logs.
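The channel separation above can be exploited when post-processing agent logs. A sketch of such a filter, assuming a simple `[analysis] … [commentary] … [final] …` marker convention; this markup is an illustrative assumption, not this model’s documented wire format, which depends on the runtime and chat template:

```python
import re
from typing import Dict

# Hypothetical channel markers for illustration only; the real format
# depends on the chat template used by your runtime.
CHANNEL_RE = re.compile(r"\[(analysis|commentary|final)\]\s*", re.IGNORECASE)


def split_channels(text: str) -> Dict[str, str]:
    """Split a multi-channel response into a {channel: content} mapping."""
    parts = CHANNEL_RE.split(text)
    # With one capture group, re.split yields [prefix, name1, body1, name2, body2, ...]
    channels: Dict[str, str] = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        channels[name.lower()] = body.strip()
    return channels


def final_only(text: str) -> str:
    """Keep only the final answer/code, dropping internal reasoning from logs."""
    return split_channels(text).get("final", text).strip()
```

In an agent loop, routing `analysis`/`commentary` to a debug log and only `final` to the transcript is what keeps logs clean.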

Technical Specifications

Foundation: Fine-tuned 120B architecture.

Knowledge Cutoff: June 2024.

Context Length: 32,768 tokens.

Optimal Settings: temperature: 0.3, top_p: 0.9 for surgical precision in coding.
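These settings can be baked into a local variant with an Ollama Modelfile; a sketch using standard `FROM`/`PARAMETER` directives, with `num_ctx` set to match the stated context length:

```
FROM Franchisco/gpt-120b-extreme-speed

# Recommended settings for surgical precision in coding
PARAMETER temperature 0.3
PARAMETER top_p 0.9

# Use the full 32k context window
PARAMETER num_ctx 32768
```

Build and run the variant with `ollama create my-coder -f Modelfile` followed by `ollama run my-coder`.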

How to Use with Agents

This model is best suited for:

ClaudeCode / Clawdbot: Use as the backend LLM for autonomous coding tasks.

Chain-of-Thought (CoT): Solving complex architectural bugs that 7B or 70B models miss.

Large Project Refactoring: Leveraging the 32k context for deep structural changes.
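When wiring the model in as an agent backend, requests typically attach function schemas so it can call tools. A hedged sketch of a `/api/chat` payload with one tool; the `tools` field follows Ollama’s chat API, but the `run_python` tool name and its schema are invented for this example:

```python
import json

MODEL = "Franchisco/gpt-120b-extreme-speed"

# Illustrative OpenAI-style function definition, as accepted by the
# "tools" field of Ollama's /api/chat. The tool itself is hypothetical.
RUN_PYTHON_TOOL = {
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a short Python snippet and return stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}


def build_agent_request(task: str) -> str:
    """Serialize a tool-enabled chat request for one autonomous coding step."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": task}],
        "tools": [RUN_PYTHON_TOOL],
        "stream": False,
    }
    return json.dumps(payload)
```

An agent framework like ClaudeCode or Clawdbot would POST this body to the local server, inspect the response for `tool_calls`, execute them, and feed results back as `tool` messages.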