143K Downloads Updated 1 week ago
```shell
ollama run vatistasdim/Cipher
```
c71a0954f1d8 · 2.0 GB
| Setting | Value |
|---|---|
| Architecture family | llama |
| Parameter scale | 3.2B |
| Quantization | Q4_K_M |
| Input mode | Text |
| Output mode | Completion and tool-capable text |
| Native context window | 131072 tokens |
| Recommended starting context | 2048 tokens |
| Embedding length | 3072 |
| Temperature | 0.55 |
| Top-p | 0.9 |
| Repeat penalty | 1.05 |
| Primary behavior | Precise, concise, practical, structured |
| Best output formats | Markdown, short plans, code snippets, JSON-shaped responses, checklists |
Cipher is the stricter model in the Cipher pair. It is meant to stay close to the prompt, keep its wording controlled, and avoid drifting into unnecessary alternatives. Its lower temperature makes it the better choice for direct answers, code explanations, and checklists.
Benchmark results depend on hardware, prompt size, context length, and Ollama settings. Cipher is tuned for a practical balance: more capable than very small local models, while staying much lighter than 7B/8B-class or cloud-scale models.
| Area | Cipher Profile | What This Means |
|---|---|---|
| Local speed | High for a 3.2B-class model | Good for chat, CLI use, and repeated local calls. |
| Memory use | Low to moderate | Designed to run on consumer machines without a large GPU requirement. |
| Answer precision | High | Lower temperature helps with direct answers, code explanations, and checklists. |
| Creativity | Moderate | Better for controlled output than broad brainstorming. |
| Long-context work | Strong when context is increased | Start at 2048 tokens, then raise context for large files or logs. |
| Structured output | Strong | Good fit for Markdown, JSON-shaped output, plans, and automation steps. |
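The table suggests starting at 2048 tokens and raising the context for large files or logs. The following is a minimal sketch of one way to pick `num_ctx`. It assumes roughly four characters per token, which is a crude English-text heuristic and not Cipher's actual tokenizer. It doubles from 2048 until the estimate fits, capped at the 131072-token native window. The names `pick_num_ctx` and `reserve_for_output` are hypothetical.

```python
import math

# From the model card: recommended starting context and native maximum.
DEFAULT_NUM_CTX = 2048
MAX_NUM_CTX = 131072

# Assumption: ~4 characters per token for English text. This is a rough
# heuristic, not Cipher's tokenizer; measure real prompts if precision matters.
CHARS_PER_TOKEN = 4


def pick_num_ctx(prompt: str, reserve_for_output: int = 512) -> int:
    """Return 2048, doubled until the estimated prompt tokens plus the
    reserved output tokens fit, never exceeding the native window."""
    estimated = math.ceil(len(prompt) / CHARS_PER_TOKEN) + reserve_for_output
    num_ctx = DEFAULT_NUM_CTX
    while num_ctx < estimated and num_ctx < MAX_NUM_CTX:
        num_ctx *= 2
    return min(num_ctx, MAX_NUM_CTX)
```

For example, a 40,000-character log estimates to about 10,000 tokens plus the 512-token reserve, so the function returns 16384. A short chat prompt stays at 2048. Doubling only as needed keeps the default light, since larger `num_ctx` values use more memory.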
These are single-run smoke-test numbers, all measured on the same local machine with the same short prompt. They indicate relative runtime speed only and are not universal benchmark claims. Token speed implies nothing about output quality.
```text
Benchmark prompt: Write exactly six concise bullets comparing local AI
assistants for coding, summarization, and brainstorming.
Benchmark options: num_ctx 2048, num_predict 140, temperature 0.2.
```
| Model | Installed size | Eval tokens | Total time | Generation speed |
|---|---|---|---|---|
| vatistasdim/Cipher:latest | 2.0 GB | 137 | 12.60 s | 32.19 tok/s |
| vatistasdim/Cipher-Abliterated:latest | 2.0 GB | 140 | 4.32 s | 38.46 tok/s |
| hf.co/bartowski/Qwen2.5-3B-Instruct-GGUF:Q4_K_M | 1.9 GB | 78 | 10.31 s | 34.79 tok/s |
| phi3:mini | 2.2 GB | 140 | 8.27 s | 33.17 tok/s |
| gemma:2b | 1.7 GB | 140 | 5.55 s | 46.54 tok/s |
| dolphin-phi:latest | 1.6 GB | 140 | 8.81 s | 38.00 tok/s |
| huihui_ai/falcon3-abliterated:3b | 2.0 GB | 140 | 12.53 s | 36.29 tok/s |
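The following is a minimal sketch of rerunning this smoke test and deriving the table's columns. It assumes the numbers came from Ollama's `/api/generate` endpoint, although the card does not say which endpoint or client was used. Ollama's final non-streamed response reports `eval_count` along with `eval_duration` and `total_duration` in nanoseconds. The helper names (`build_benchmark_payload`, `run_benchmark`, `summarize`) are hypothetical.

```python
import json
import urllib.request

NS_PER_SECOND = 1_000_000_000

# The benchmark prompt and options listed above.
BENCHMARK_PROMPT = (
    "Write exactly six concise bullets comparing local AI "
    "assistants for coding, summarization, and brainstorming."
)
BENCHMARK_OPTIONS = {"num_ctx": 2048, "num_predict": 140, "temperature": 0.2}


def build_benchmark_payload(model: str) -> dict:
    """Non-streaming /api/generate request body with the benchmark settings."""
    return {
        "model": model,
        "prompt": BENCHMARK_PROMPT,
        "stream": False,
        "options": dict(BENCHMARK_OPTIONS),
    }


def run_benchmark(model: str, host: str = "http://localhost:11434") -> dict:
    """POST the payload to a running Ollama server and return its JSON reply."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_benchmark_payload(model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(reply: dict) -> dict:
    """Convert Ollama's nanosecond timings into the table's columns."""
    return {
        "eval_tokens": reply["eval_count"],
        "total_s": reply["total_duration"] / NS_PER_SECOND,
        # Generation speed covers only the eval phase, not model load or prompt processing.
        "tok_per_s": reply["eval_count"] / reply["eval_duration"] * NS_PER_SECOND,
    }
```

With a server running, `summarize(run_benchmark("vatistasdim/Cipher"))` produces one table row. Total time and generation speed measure different spans. For the Cipher row, 137 tokens at 32.19 tok/s is about 4.3 s of generation inside a 12.60 s total, so the remainder was presumably model loading and prompt evaluation.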
| Model | Size Class | Main Feel | Cipher Difference |
|---|---|---|---|
| gemma:2b | 1.7 GB local model | Fast, lightweight general chat | Cipher is tuned more for structured technical answers and precision. |
| phi3:mini | 2.2 GB local model | Compact reasoning and instruction following | Cipher uses a more controlled sampling profile for concise local workflows. |
| dolphin-phi:latest | 1.6 GB local model | Lightweight conversational assistant | Cipher is more focused on predictable coding, planning, and checklist output. |
| hf.co/bartowski/Qwen2.5-3B-Instruct-GGUF:Q4_K_M | 1.9 GB local model | General instruction model with broad use | Cipher has a narrower practical assistant identity and lower-temperature output. |
| huihui_ai/falcon3-abliterated:3b | 2.0 GB local model | Flexible 3B-class generation | Cipher is tuned to be less loose, with more emphasis on precision. |
| vatistasdim/Cipher-Abliterated:latest | 2.0 GB Cipher variant | More adaptive and creative | Cipher is the stricter option for direct technical work. |
```mermaid
sequenceDiagram
    participant User
    participant Client as "CLI, app, or script"
    participant Ollama as "Local Ollama runtime"
    participant Cipher as "Cipher model"
    User->>Client: "Send prompt"
    Client->>Ollama: "Chat request with model vatistasdim/Cipher"
    Ollama->>Cipher: "Apply model settings and context"
    Cipher-->>Ollama: "Generated response"
    Ollama-->>Client: "Response payload or streamed tokens"
    Client-->>User: "Precise, structured answer"
```
Start the Ollama service, then call the chat API:
```shell
curl http://localhost:11434/api/chat \
  -d '{
    "model": "vatistasdim/Cipher",
    "messages": [
      { "role": "user", "content": "Write a concise plan for testing a CLI tool." }
    ],
    "stream": false,
    "options": {
      "temperature": 0.55,
      "top_p": 0.9,
      "repeat_penalty": 1.05,
      "num_ctx": 2048
    }
  }'
```
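The curl call above sets `"stream": false`. With streaming on, which is the `/api/chat` default, Ollama returns newline-delimited JSON objects. Each object carries a fragment of the reply in `message.content`, and the last one has `"done": true`. The following is a minimal standard-library sketch of reassembling that stream. `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable, Union


def collect_stream(lines: Iterable[Union[bytes, str]]) -> str:
    """Join the partial message.content fields of a streamed /api/chat reply."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # ignore blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object carries timing stats, not more text
    return "".join(parts)
```

You can pass the response object from `urllib.request.urlopen` (for the same request body with `"stream": true`) directly, because iterating it yields one line at a time.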
Python:
```python
from ollama import chat

response = chat(
    model="vatistasdim/Cipher",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.message.content)
```
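The Python call above relies on whatever defaults the model ships with. To pin the settings from the table, as the curl example does, you can pass them through the `options` argument of `ollama.chat`. The following is a minimal sketch. `CIPHER_OPTIONS`, `cipher_options`, and `ask_cipher` are hypothetical names. The import is placed inside the function only so the module loads without the `ollama` package installed.

```python
# Sampling settings copied from the model card's settings table.
CIPHER_OPTIONS = {
    "temperature": 0.55,
    "top_p": 0.9,
    "repeat_penalty": 1.05,
    "num_ctx": 2048,
}


def cipher_options(num_ctx: int = 2048) -> dict:
    """Return a copy of the card's settings, optionally with a larger context."""
    return {**CIPHER_OPTIONS, "num_ctx": num_ctx}


def ask_cipher(prompt: str, num_ctx: int = 2048) -> str:
    """Send one user message to Cipher with the card's settings pinned."""
    # Deferred import: calling this still needs `pip install ollama` and a running server.
    from ollama import chat

    response = chat(
        model="vatistasdim/Cipher",
        messages=[{"role": "user", "content": prompt}],
        options=cipher_options(num_ctx),
    )
    return response.message.content
```

You can raise `num_ctx` per call, for example `ask_cipher(log_text, num_ctx=16384)`, for large files or logs, while short prompts keep the 2048-token default.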
JavaScript:
```javascript
import ollama from "ollama";

const response = await ollama.chat({
  model: "vatistasdim/Cipher",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.message.content);
```
```shell
ollama launch claude --model vatistasdim/Cipher
ollama launch codex-app --model vatistasdim/Cipher
ollama launch openclaw --model vatistasdim/Cipher
ollama launch codex --model vatistasdim/Cipher
ollama launch opencode --model vatistasdim/Cipher
```
Use Cipher when you want a precise local assistant for short plans, code snippets, JSON-shaped responses, and checklists.
For more open-ended brainstorming, use Cipher-Abliterated. For tighter answers, repeatable formatting, and practical technical work, use Cipher.