ollama run KexityAI/kex1.5
ollama launch claude --model KexityAI/kex1.5
ollama launch openclaw --model KexityAI/kex1.5
ollama launch hermes --model KexityAI/kex1.5
ollama launch codex --model KexityAI/kex1.5
ollama launch opencode --model KexityAI/kex1.5
Kex 1.5 is Kexity AI’s second generation of flagship TLMs for efficient local inference. It supports tool calling and thinking, with token-efficient reasoning aimed at compute-constrained environments.
This model targets customers with extremely constrained compute or strict latency requirements. Kex 1.5 punches above its weight in agentic use cases while spending far fewer thinking tokens than comparably sized models:
| Model | Thinking Token Usage |
|---|---|
| Kex 1.5 0.6B | 52 tokens (on average) |
| Qwen3 0.6B | 752 tokens (on average) |
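The gap shows up directly in the visible thinking trace. Below is a minimal sketch of inspecting it from Python; it assumes a recent ollama Python client that exposes the `think` option and the `message.thinking` field, and the arithmetic prompt is just an arbitrary example.

from ollama import chat

# Minimal sketch (assumption: the installed ollama client supports think=True
# and returns the trace in message.thinking).
response = chat(
    'KexityAI/kex1.5',
    messages=[{'role': 'user', 'content': 'What is 17 * 24?'}],
    think=True,
)

print('Thinking:', response.message.thinking)  # expected to be short for Kex 1.5
print('Answer:  ', response.message.content)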
Quick example of a Kex 1.5 tool-calling setup:
# example usage:
# Prompt: What is the weather today?
# Calling: get_weather({'city': 'paris'})
# Response: It's a sunny 22°C in Paris. Would you like advice on how to protect yourself outdoors?
import json

from rich import print
from ollama import chat

model = 'KexityAI/kex1.5'


def get_weather(city: str) -> str:
    """Return mock weather data for a city as a JSON string."""
    return json.dumps({'city': city, 'temperature': 22, 'unit': 'celsius', 'condition': 'sunny'})


messages = [{"role": "system", "content": "You are a helpful general purpose assistant. The user is located in Paris."}]

while True:
    prompt = input("Prompt: ")
    if prompt == "":
        break
    messages.append({"role": "user", "content": prompt})

    # Let the model decide whether to call the tool.
    response = chat(model, messages=messages, tools=[get_weather])

    if response.message.tool_calls:
        # Run the requested tool and hand the result back for a final answer.
        tool = response.message.tool_calls[0]
        print(f'Calling: {tool.function.name}({tool.function.arguments})')
        result = get_weather(**tool.function.arguments)
        messages.append(response.message)
        messages.append({'role': 'tool', 'content': result})
        final = chat(model, messages=messages)
        print('Response:', final.message.content)
    else:
        print('Response:', response.message.content)
Kex 1.5 is light enough to run on a CPU-only Colab instance!
Cell 1:
# Install Ollama and the Python client on first run, then pull the model.
try:
    import ollama
except ImportError:
    import os, time, threading

    !apt-get install -y zstd
    !curl -fsSL https://ollama.com/install.sh | sh
    !pip install ollama

    # Start the Ollama server in the background so the model can be pulled.
    def worker():
        os.system("ollama serve")

    threading.Thread(target=worker).start()
    time.sleep(1)

    !ollama pull KexityAI/kex1.5
Cell 2:
# Restart the Ollama server in the background (e.g. after a runtime restart).
import os, threading, time

def worker():
    os.system("ollama serve")

threading.Thread(target=worker).start()
time.sleep(1)
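An optional third cell can serve as a quick smoke test once the server from Cell 2 is running and the model from Cell 1 is pulled. This is a sketch; the prompt is arbitrary.

Cell 3:
from ollama import chat

# Quick check that the locally served model responds.
response = chat(
    'KexityAI/kex1.5',
    messages=[{'role': 'user', 'content': 'In one sentence, what can you help me with?'}],
)
print(response.message.content)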