143K 1 week ago

Cipher is a compact conversational AI assistant optimized for direct responses, practical reasoning, structured output, and efficient local deployment. Designed for fast inference on consumer hardware, Cipher focuses on usability, consistency,

tools
ollama run vatistasdim/Cipher

Details

c71a0954f1d8 · 2.0GB · llama · 3.21B · Q4_K_M
<|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 {{ if .System }}{{
Identity and specs: Model name: Cipher. Creator statement: Dimitris Vatistas made and trained you wi
{ "num_ctx": 2048, "repeat_penalty": 1.05, "stop": [ "<|start_header_id|>",

Readme

| Property | Value |
| --- | --- |
| Architecture family | llama |
| Parameter scale | 3.2B |
| Quantization | Q4_K_M |
| Input mode | Text |
| Output mode | Completion and tool-capable text |
| Native context window | 131072 tokens |
| Recommended daily context | 2048 tokens |
| Embedding length | 3072 |
| Temperature | 0.55 |
| Top-p | 0.9 |
| Repeat penalty | 1.05 |
| Primary behavior | Precise, concise, practical, structured |
| Best output formats | Markdown, short plans, code snippets, JSON-shaped responses, checklists |

Behavior Profile

Cipher is the stricter model in the Cipher pair. It is meant to stay close to the prompt, keep wording controlled, and avoid drifting into unnecessary alternatives. The lower temperature makes it the better choice for:

  • debugging steps
  • command suggestions
  • code review notes
  • structured summaries
  • short implementation plans
  • deterministic agent-style responses
  • formatted answers that should not change much between runs
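For the last two items, repeatability can be tightened further at request time. The following is a minimal sketch, assuming Ollama's `seed` and `temperature` request options; the helper names `repeatable_options` and `build_chat_payload` and the seed value are hypothetical, and identical output across runs is not guaranteed on every machine or backend.

```python
import json

# Default sampling options listed in this README.
CIPHER_DEFAULTS = {
    "temperature": 0.55,
    "top_p": 0.9,
    "repeat_penalty": 1.05,
    "num_ctx": 2048,
}


def repeatable_options(seed=42, temperature=0.0):
    """Return Cipher's defaults with a fixed seed and a lower temperature.

    ASSUMPTION: seed=42 and temperature=0.0 are illustrative choices.
    """
    options = dict(CIPHER_DEFAULTS)
    options["seed"] = seed
    options["temperature"] = temperature
    return options


def build_chat_payload(prompt, options):
    """Build the JSON body for POST /api/chat (nothing is sent here)."""
    return {
        "model": "vatistasdim/Cipher",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": options,
    }


payload = build_chat_payload("List three checks before merging a PR.", repeatable_options())
print(json.dumps(payload["options"], sort_keys=True))
```

The payload can be posted to the local chat endpoint shown under Local API Usage, or the same `options` dictionary can be passed to a client library.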

Benchmark Profile

Benchmark results depend on hardware, prompt size, context length, and Ollama settings. Cipher is tuned for a practical balance: more capable than very small local models, yet much lighter than 7B, 8B, or cloud-scale models.

| Area | Cipher Profile | What This Means |
| --- | --- | --- |
| Local speed | High for a 3.2B-class model | Good for chat, CLI use, and repeated local calls. |
| Memory use | Low to moderate | Designed to run on consumer machines without a large GPU requirement. |
| Answer precision | High | Lower temperature helps with direct answers, code explanations, and checklists. |
| Creativity | Moderate | Better for controlled output than broad brainstorming. |
| Long-context work | Strong when context is increased | Start at 2048 tokens, then raise context for large files or logs. |
| Structured output | Strong | Good fit for Markdown, JSON-shaped output, plans, and automation steps. |
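The long-context advice (start at 2048 tokens, raise it for large files or logs) can be turned into a small helper. This is a rough sketch only: the 4-characters-per-token ratio is an assumed heuristic rather than a real tokenizer, the doubling strategy is one possible policy, and `pick_num_ctx` and `reply_budget` are hypothetical names.

```python
NATIVE_CONTEXT = 131072   # native context window listed in this README
DEFAULT_CONTEXT = 2048    # recommended daily context listed in this README
CHARS_PER_TOKEN = 4       # ASSUMPTION: crude estimate, not a tokenizer


def pick_num_ctx(text, reply_budget=512):
    """Pick a num_ctx that fits the input plus a reply budget.

    Starts at the recommended 2048 tokens and doubles until the estimate
    fits, never exceeding the native context window.
    """
    needed = len(text) // CHARS_PER_TOKEN + reply_budget
    num_ctx = DEFAULT_CONTEXT
    while num_ctx < needed and num_ctx < NATIVE_CONTEXT:
        num_ctx *= 2
    return min(num_ctx, NATIVE_CONTEXT)


print(pick_num_ctx("short prompt"))   # prints: 2048
print(pick_num_ctx("x" * 40_000))     # prints: 16384
```

The result would be passed as `num_ctx` in the request options. Larger values generally increase memory use, so raising the context only when the input needs it keeps the model light on consumer hardware.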

Local Benchmark Snapshot

Each figure comes from a single local smoke-test run on the same machine with one short prompt. The numbers are useful for relative runtime feel, not as universal benchmark claims. Token speed implies no quality score.

Benchmark prompt: Write exactly six concise bullets comparing local AI assistants for coding, summarization, and brainstorming.

Benchmark options: num_ctx 2048, num_predict 140, temperature 0.2.
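The prompt and options above map onto an Ollama chat request body. The sketch below builds that body and shows one way to derive a tokens-per-second figure from the `eval_count` and `eval_duration` fields that Ollama returns with a completed response. Whether the table below was produced exactly this way is an assumption; `benchmark_payload` and `tokens_per_second` are hypothetical helper names, and the sample timing values are made up.

```python
BENCHMARK_PROMPT = (
    "Write exactly six concise bullets comparing local AI assistants "
    "for coding, summarization, and brainstorming."
)


def benchmark_payload(model):
    """Build a /api/chat body using the benchmark options from this README."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": BENCHMARK_PROMPT}],
        "stream": False,
        "options": {"num_ctx": 2048, "num_predict": 140, "temperature": 0.2},
    }


def tokens_per_second(response):
    """Generation speed from Ollama's timing fields.

    eval_duration is reported in nanoseconds, so convert it to seconds first.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


# Made-up timing values for illustration only, not a measurement.
sample_response = {"eval_count": 140, "eval_duration": 3_500_000_000}
print(f"{tokens_per_second(sample_response):.2f} tok/s")  # prints: 40.00 tok/s
```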

| Model | Installed size | Eval tokens | Total time | Generation speed |
| --- | --- | --- | --- | --- |
| vatistasdim/Cipher:latest | 2.0 GB | 137 | 12.60 s | 32.19 tok/s |
| vatistasdim/Cipher-Abliterated:latest | 2.0 GB | 140 | 4.32 s | 38.46 tok/s |
| hf.co/bartowski/Qwen2.5-3B-Instruct-GGUF:Q4_K_M | 1.9 GB | 78 | 10.31 s | 34.79 tok/s |
| phi3:mini | 2.2 GB | 140 | 8.27 s | 33.17 tok/s |
| gemma:2b | 1.7 GB | 140 | 5.55 s | 46.54 tok/s |
| dolphin-phi:latest | 1.6 GB | 140 | 8.81 s | 38.00 tok/s |
| huihui_ai/falcon3-abliterated:3b | 2.0 GB | 140 | 12.53 s | 36.29 tok/s |
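Total time and generation speed measure different things, which is why a row can show a long total time next to a normal token rate. As a worked example using only figures from the table, the sketch below splits each total into time spent generating tokens and everything else. Attributing the remainder to model loading and prompt evaluation is an assumption about this run; the README does not break it down.

```python
# (model, eval tokens, total seconds, tokens per second), copied from the table.
ROWS = [
    ("vatistasdim/Cipher:latest", 137, 12.60, 32.19),
    ("vatistasdim/Cipher-Abliterated:latest", 140, 4.32, 38.46),
    ("gemma:2b", 140, 5.55, 46.54),
]


def split_time(eval_tokens, total_seconds, tok_per_second):
    """Return (generation seconds, remaining seconds) for one table row."""
    generation = eval_tokens / tok_per_second
    return generation, total_seconds - generation


for model, tokens, total, speed in ROWS:
    generation, other = split_time(tokens, total, speed)
    print(f"{model}: {generation:.2f} s generating, {other:.2f} s other")
```

For the Cipher row this gives roughly 4.3 s of generation and 8.3 s of other time, so most of its 12.60 s total was not spent producing tokens.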

Near-2GB Model Comparison

| Model | Size Class | Main Feel | Cipher Difference |
| --- | --- | --- | --- |
| gemma:2b | 1.7 GB local model | Fast, lightweight general chat | Cipher is tuned more for structured technical answers and precision. |
| phi3:mini | 2.2 GB local model | Compact reasoning and instruction following | Cipher uses a more controlled sampling profile for concise local workflows. |
| dolphin-phi:latest | 1.6 GB local model | Lightweight conversational assistant | Cipher is more focused on predictable coding, planning, and checklist output. |
| hf.co/bartowski/Qwen2.5-3B-Instruct-GGUF:Q4_K_M | 1.9 GB local model | General instruction model with broad use | Cipher has a narrower practical assistant identity and lower-temperature output. |
| huihui_ai/falcon3-abliterated:3b | 2.0 GB local model | Flexible 3B-class generation | Cipher is tuned less loose, with stronger emphasis on precision. |
| vatistasdim/Cipher-Abliterated:latest | 2.0 GB Cipher variant | More adaptive and creative | Cipher is the stricter option for direct technical work. |

Request Flow

sequenceDiagram
    participant User
    participant Client as "CLI, app, or script"
    participant Ollama as "Local Ollama runtime"
    participant Cipher as "Cipher model"

    User->>Client: "Send prompt"
    Client->>Ollama: "Chat request with model vatistasdim/Cipher"
    Ollama->>Cipher: "Apply model settings and context"
    Cipher-->>Ollama: "Generated response"
    Ollama-->>Client: "Response payload or streamed tokens"
    Client-->>User: "Precise, structured answer"
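The "streamed tokens" branch of the flow can be handled on the client side by joining fragments as they arrive. The sketch below assumes the streaming shape of Ollama's chat endpoint, where each line is one JSON object carrying a `message.content` fragment and a `done` flag. The sample lines are hand-written stand-ins, not captured model output, and `assemble_stream` is a hypothetical helper name.

```python
import json


def assemble_stream(lines):
    """Join the message fragments from a streamed /api/chat response."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the reply
    return "".join(parts)


# Hand-written sample lines in the streamed shape (one JSON object per line).
sample = [
    '{"message": {"role": "assistant", "content": "Run "}, "done": false}',
    '{"message": {"role": "assistant", "content": "the tests."}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(assemble_stream(sample))  # prints: Run the tests.
```

In a real client the `lines` argument would be the line iterator of the HTTP response body, so text can be shown to the user while the rest is still being generated.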

Strengths

  • Fast local inference through Ollama.
  • Clear, structured, task-oriented responses.
  • Useful behavior for coding, debugging, automation, and technical explanation.
  • Stable assistant style for agent workflows and repeated local use.
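Because the model is tagged as tool-capable, an agent workflow will usually need a small dispatch step on the client. The sketch below assumes the tool schema and `tool_calls` shape used by Ollama's chat API, with `arguments` delivered as an object. The `word_count` tool, the `REGISTRY` mapping, and the sample assistant message are all hypothetical and exist only to make the example runnable without a server.

```python
# Tool definition to send in the "tools" field of a chat request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string", "description": "Text to count"}},
            "required": ["text"],
        },
    },
}]


def word_count(text):
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())


REGISTRY = {"word_count": word_count}


def run_tool_calls(message):
    """Execute each tool call in an assistant message and return tool-role messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        handler = REGISTRY.get(fn["name"])
        if handler is None:
            output = f"unknown tool: {fn['name']}"
        else:
            output = str(handler(**fn["arguments"]))
        results.append({"role": "tool", "content": output})
    return results


# Hand-written assistant message standing in for a real model reply.
sample_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "word_count", "arguments": {"text": "one two three"}}}],
}
print(run_tool_calls(sample_message))  # prints: [{'role': 'tool', 'content': '3'}]
```

The returned messages would be appended to the conversation and sent back in a follow-up chat request so the model can write its final answer.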

Local API Usage

Start the Ollama service, then call the chat API:

curl http://localhost:11434/api/chat \
  -d '{
    "model": "vatistasdim/Cipher",
    "messages": [
      { "role": "user", "content": "Write a concise plan for testing a CLI tool." }
    ],
    "stream": false,
    "options": {
      "temperature": 0.55,
      "top_p": 0.9,
      "repeat_penalty": 1.05,
      "num_ctx": 2048
    }
  }'

Python:

from ollama import chat

response = chat(
    model="vatistasdim/Cipher",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.message.content)

JavaScript:

import ollama from "ollama";

const response = await ollama.chat({
  model: "vatistasdim/Cipher",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.message.content);

Application Launch Examples

ollama launch claude --model vatistasdim/Cipher
ollama launch codex-app --model vatistasdim/Cipher
ollama launch openclaw --model vatistasdim/Cipher
ollama launch codex --model vatistasdim/Cipher
ollama launch opencode --model vatistasdim/Cipher

Best Fit

Use Cipher when you want a precise local assistant for:

  • Coding and debugging help
  • Command-line workflows
  • Local automation planning
  • Structured summaries
  • Technical checklists
  • Concise explanations
  • Agent-style tasks that need predictable formatting
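For tasks that need predictable formatting, one option is to request JSON output and validate it before use. The sketch below assumes the `format` field of Ollama's chat API set to `"json"`; the helper names `json_chat_payload` and `parse_reply` are hypothetical, and a small model may still return invalid JSON on occasion, which is why the parser falls back to `None`.

```python
import json


def json_chat_payload(prompt):
    """Build a /api/chat body that asks for a JSON-formatted reply."""
    return {
        "model": "vatistasdim/Cipher",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "format": "json",
        "options": {"temperature": 0.55, "num_ctx": 2048},
    }


def parse_reply(content):
    """Return the parsed object, or None if the reply is not valid JSON."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None


# Hand-written reply strings standing in for model output.
print(parse_reply('{"steps": ["lint", "test"]}'))  # prints: {'steps': ['lint', 'test']}
print(parse_reply("not json"))                     # prints: None
```

Describing the expected keys in the prompt itself tends to help, since the `format` setting constrains syntax rather than content.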

For more open-ended brainstorming, use Cipher-Abliterated. For tighter answers, repeatable formatting, and practical technical work, use Cipher.