
Kexity AI's second generation of flagship TLMs for efficient local inference.

Capabilities: tools, thinking
ollama run KexityAI/kex1.5

Applications

  • Claude Code: ollama launch claude --model KexityAI/kex1.5
  • OpenClaw: ollama launch openclaw --model KexityAI/kex1.5
  • Hermes Agent: ollama launch hermes --model KexityAI/kex1.5
  • Codex: ollama launch codex --model KexityAI/kex1.5
  • OpenCode: ollama launch opencode --model KexityAI/kex1.5


Kex 1.5 is Kexity AI’s second generation of flagship TLMs for efficient local inference. Kex 1.5 supports tool calling and thinking, with token-efficient reasoning designed for compute-constrained environments.
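
A minimal sketch of enabling thinking with the ollama Python client. This assumes the `think` parameter and `message.thinking` field exposed by recent ollama-python releases, and that Kex 1.5 honors them like other thinking-capable models; the prompt is illustrative:

from ollama import chat

# Ask the model to think before answering (assumption: Kex 1.5
# supports the same `think` flag as other thinking-capable models).
response = chat(
  model='KexityAI/kex1.5',
  messages=[{'role': 'user', 'content': 'Is 127 a prime number?'}],
  think=True,
)

print('Thinking:', response.message.thinking)  # the (short) reasoning trace
print('Answer:', response.message.content)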

Use Case

This model is intended for customers with extremely constrained compute or low-latency applications. Kex 1.5 punches above its weight in agentic use cases, and is well suited to tasks such as the following:

  • Agents running on edge/IoT devices with less than 1 GB of RAM
  • Low-latency chatbots and agents for environments where speed matters

Speed Comparison

Model           Thinking Token Usage
Kex 1.5 0.6B    52 tokens (on average)
Qwen3 0.6B      752 tokens (on average)
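
A rough way to reproduce this comparison yourself. This is a sketch, not the benchmark Kexity AI used: it approximates thinking-token counts by whitespace-splitting the reasoning trace, uses an arbitrary prompt, and assumes both models are pulled locally:

from ollama import chat

PROMPT = 'A train travels 120 km in 1.5 hours. What is its average speed?'

for model in ('KexityAI/kex1.5', 'qwen3:0.6b'):
  response = chat(
    model=model,
    messages=[{'role': 'user', 'content': PROMPT}],
    think=True,
  )
  thinking = response.message.thinking or ''
  # Word count as a crude stand-in for the true token count.
  print(f'{model}: ~{len(thinking.split())} thinking tokens')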

Examples

1. Python

A quick example of a Kex 1.5 tool-calling setup:

# example usage:
# Prompt: What is the weather today?
# Calling: get_weather({'city': 'paris'})
# Response: It's a sunny 22°C in Paris. Would you like advice on how to protect yourself outdoors?

import json
from rich import print  # rich's print renders nicer console output
from ollama import chat

model = 'KexityAI/kex1.5'

# Tool: a stubbed weather lookup that returns a JSON string.
def get_weather(city: str) -> str:
  return json.dumps({'city': city, 'temperature': 22, 'unit': 'celsius', 'condition': 'sunny'})

messages = [{"role": "system", "content": "You are a helpful general purpose assistant. The user is located in Paris."}]

while True:
  prompt = input("Prompt: ")

  # An empty prompt ends the session.
  if prompt == "":
    break

  messages.append({"role": "user", "content": prompt})

  # Pass the Python function directly; the ollama client derives the
  # tool schema from its signature.
  response = chat(model, messages=messages, tools=[get_weather])

  if response.message.tool_calls:
    # The model requested a tool call; execute it locally.
    tool = response.message.tool_calls[0]
    print(f'Calling: {tool.function.name}({tool.function.arguments})')

    result = get_weather(**tool.function.arguments)

    # Feed the assistant turn and the tool result back to the model.
    messages.append(response.message)
    messages.append({'role': 'tool', 'content': result, 'tool_name': tool.function.name})

    final = chat(model, messages=messages)
    print('Response:', final.message.content)
  else:
    print('Response:', response.message.content)

2. Quick on-device AI on free CPU Colab instances

Kex 1.5 is light enough to run on a CPU-only Colab instance!

Cell 1:

# Install the Ollama server and Python client if not already present,
# start the server in a background thread, and pull the model.
try:
  import ollama
except ImportError:
  import os, time, threading
  !apt-get install -y zstd
  !curl -fsSL https://ollama.com/install.sh | sh
  !pip install ollama
  def worker():
    os.system("ollama serve")
  threading.Thread(target=worker, daemon=True).start()
  time.sleep(1)  # give the server a moment to start listening
  !ollama pull KexityAI/kex1.5

Cell 2:

# If the Colab runtime restarts, the server needs to be relaunched;
# this cell restarts `ollama serve` in the background.
import os, threading, time

def worker():
  os.system("ollama serve")

threading.Thread(target=worker, daemon=True).start()

time.sleep(1)  # give the server a moment to start listening
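
Once the server is up, you can query the model from the notebook. A minimal sketch (the prompt is illustrative):

Cell 3:

from ollama import chat

response = chat(
  model='KexityAI/kex1.5',
  messages=[{'role': 'user', 'content': 'Summarize what Ollama does in one sentence.'}],
)
print(response.message.content)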