
Kexity AI's second generation of flagship TLMs for efficient local inference.

Capabilities: tools, thinking
ollama run KexityAI/kex1.5

Applications

  • Claude Code: ollama launch claude --model KexityAI/kex1.5
  • OpenClaw: ollama launch openclaw --model KexityAI/kex1.5
  • Hermes Agent: ollama launch hermes --model KexityAI/kex1.5
  • Codex: ollama launch codex --model KexityAI/kex1.5
  • OpenCode: ollama launch opencode --model KexityAI/kex1.5


Kex 1.5 is Kexity AI’s second generation of flagship TLMs for efficient local inference. Kex 1.5 supports tool calling and thinking, with token-efficient reasoning designed for compute-constrained environments.
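
A minimal sketch of enabling thinking with the ollama Python client. This assumes the `think` parameter and `message.thinking` field exposed by recent ollama-python releases, and that Kex 1.5 honors them like other thinking-capable models; the prompt is illustrative:

from ollama import chat

# Ask the model to think before answering (assumption: Kex 1.5
# supports the same `think` flag as other thinking-capable models).
response = chat(
  model='KexityAI/kex1.5',
  messages=[{'role': 'user', 'content': 'Is 127 a prime number?'}],
  think=True,
)

print('Thinking:', response.message.thinking)  # the (short) reasoning trace
print('Answer:', response.message.content)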

Use Case

This model is intended for customers with extremely constrained compute or low-latency applications. Kex 1.5 punches above its weight in agentic use cases, and is well suited to tasks such as the following:

  • Agents running on edge/IoT devices with less than 1 GB of RAM
  • Low-latency chatbots and agents for environments where speed matters

Speed Comparison

Model           Thinking Token Usage
Kex 1.5 0.6B    52 tokens (on average)
Qwen3 0.6B      752 tokens (on average)
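
A rough way to reproduce this comparison yourself. This is a sketch, not the benchmark Kexity AI used: it approximates thinking-token counts by whitespace-splitting the reasoning trace, uses an arbitrary prompt, and assumes both models are pulled locally:

from ollama import chat

PROMPT = 'A train travels 120 km in 1.5 hours. What is its average speed?'

for model in ('KexityAI/kex1.5', 'qwen3:0.6b'):
  response = chat(
    model=model,
    messages=[{'role': 'user', 'content': PROMPT}],
    think=True,
  )
  thinking = response.message.thinking or ''
  # Word count as a crude stand-in for the true token count.
  print(f'{model}: ~{len(thinking.split())} thinking tokens')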

Examples

1. Python

A quick example of a Kex 1.5 tool-calling setup:

# example usage:
# Prompt: What is the weather today?
# Calling: get_weather({'city': 'paris'})
# Response: It's a sunny 22°C in Paris. Would you like advice on how to protect yourself outdoors?

import json
from rich import print  # rich's print renders nicer console output
from ollama import chat

model = 'KexityAI/kex1.5'

# Tool: a stubbed weather lookup that returns a JSON string.
def get_weather(city: str) -> str:
  return json.dumps({'city': city, 'temperature': 22, 'unit': 'celsius', 'condition': 'sunny'})

messages = [{"role": "system", "content": "You are a helpful general purpose assistant. The user is located in Paris."}]

while True:
  prompt = input("Prompt: ")

  # An empty prompt ends the session.
  if prompt == "":
    break

  messages.append({"role": "user", "content": prompt})

  # Pass the Python function directly; the ollama client derives the
  # tool schema from its signature.
  response = chat(model, messages=messages, tools=[get_weather])

  if response.message.tool_calls:
    # The model requested a tool call; execute it locally.
    tool = response.message.tool_calls[0]
    print(f'Calling: {tool.function.name}({tool.function.arguments})')

    result = get_weather(**tool.function.arguments)

    # Feed the assistant turn and the tool result back to the model.
    messages.append(response.message)
    messages.append({'role': 'tool', 'content': result, 'tool_name': tool.function.name})

    final = chat(model, messages=messages)
    print('Response:', final.message.content)
  else:
    print('Response:', response.message.content)

2. Quick on-device AI on free CPU Colab instances

Kex 1.5 is light enough to run on a CPU-only Colab instance!

Cell 1:

# Install the Ollama server and Python client if not already present,
# start the server in a background thread, and pull the model.
try:
  import ollama
except ImportError:
  import os, time, threading
  !apt-get install -y zstd
  !curl -fsSL https://ollama.com/install.sh | sh
  !pip install ollama
  def worker():
    os.system("ollama serve")
  threading.Thread(target=worker, daemon=True).start()
  time.sleep(1)  # give the server a moment to start listening
  !ollama pull KexityAI/kex1.5

Cell 2:

# If the Colab runtime restarts, the server needs to be relaunched;
# this cell restarts `ollama serve` in the background.
import os, threading, time

def worker():
  os.system("ollama serve")

threading.Thread(target=worker, daemon=True).start()

time.sleep(1)  # give the server a moment to start listening
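
Once the server is up, you can query the model from the notebook. A minimal sketch (the prompt is illustrative):

Cell 3:

from ollama import chat

response = chat(
  model='KexityAI/kex1.5',
  messages=[{'role': 'user', 'content': 'Summarize what Ollama does in one sentence.'}],
)
print(response.message.content)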