KexityAI/kex:latest


Kexity AI's first generation of flagship TLMs for efficient local inference.

tools · thinking
ollama run KexityAI/kex

Details

yesterday

c9901482cc8c · 397MB · qwen3 · 596M · Q4_K_M

Readme


NOTE: Kex has been succeeded by Kex 1.5. We suggest using that instead.

Kex is Kexity AI’s first generation of flagship TLMs for efficient local inference. Kex supports tool calling and thinking, with token-efficient reasoning for compute-constrained environments.

Use Case

This model is aimed at customers with extremely constrained compute or low-latency requirements. Kex punches above its weight in agentic use cases and is useful for tasks such as the following:

  • Agents running on edge/IoT devices with less than 512 MB of RAM
  • Low-latency chatbots and agents for environments where speed matters
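Since Kex supports tool calling, it can be driven through Ollama's `/api/chat` endpoint by including a `tools` array in the request body. Below is a minimal sketch of such a request payload; the `get_weather` tool and its parameters are hypothetical examples, and the payload is only constructed and printed here, not sent to a server.

```python
import json

# Hypothetical tool definition, following Ollama's /api/chat "tools" format
# (an OpenAI-style function schema).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body for POST http://localhost:11434/api/chat (shown, not sent).
payload = {
    "model": "KexityAI/kex",
    "messages": [
        {"role": "user", "content": "What's the weather in Oslo?"},
    ],
    "tools": [get_weather_tool],
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

If the model decides to call the tool, the response message carries a `tool_calls` list instead of plain content; your agent loop executes the named function and sends the result back as a `tool`-role message.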