
🥪 sandwich1.0 - LLaMA‑3.2 3.21B (2.0 GB, Q4_K_M) 🤖 Friendly assistant model by Anis Mselmi 🧑‍💻 4GB RAM minimum 🚀

Capabilities: tools

ollama run anismselmi/sandwich1.0

Details

Updated 6 days ago

338835bc1851 · 2.0 GB · llama · 3.21B · Q4_K_M

License: LLAMA 3.2 Community License (release date: September 25, 2024), including Meta's Acceptable Use Policy
System prompt: You are a friendly assistant.
Stop parameters: <|start_header_id|>, <|end_header_id|>, <|eot_id|>
Template: Llama 3.2 chat format (reproduced in full in section 6.3)

Readme

🥪 sandwich1.0 – A Friendly LLaMA‑3.2‑Based Assistant

Version: 1.0
Author: Anis Mselmi (anismselmi)
License: LLAMA 3.2 Community License


Table of Contents

1. What is sandwich1.0?
2. Model specifications
3. Quick start (CLI)
4. Programmatic use (cURL, Python, JavaScript, Go)
5. How to customize the model
6. Technical details (architecture, quantisation, system prompt)
7. Testing & evaluation
8. Deploy / publishing
9. Troubleshooting
10. License & citation
11. Acknowledgements

1. What is sandwich1.0?

sandwich1.0 is a compact, locally‑runnable LLM built on the LLaMA‑3.2 3.21 B architecture (quantised to Q4_K_M).
It follows a simple, friendly persona:

You are a friendly assistant.

The model is hosted under the Ollama Hub namespace anismselmi/sandwich1.0 and can be invoked from the command line, any HTTP client, or via the official Ollama SDKs (Python, Node.js, etc.).

Because it runs entirely on your machine, there are no external API keys required and no internet dependency after the initial download.


2. Model specifications

| Property | Value |
|---|---|
| Name | `anismselmi/sandwich1.0` |
| Base architecture | LLaMA 3.2 |
| Parameter count | 3.21 B |
| Quantisation | Q4_K_M |
| Size on disk | ≈ 2.0 GB |
| License | LLAMA 3.2 Community License (see LICENSE file) |
| System prompt | "You are a friendly assistant." |
| Template | LLaMA 3.2 chat format (reproduced in full in section 6.3) |
| Stop tokens | `<\|start_header_id\|>`, `<\|end_header_id\|>`, `<\|eot_id\|>` |
| Ollama Hub | https://ollama.com/library/anismselmi/sandwich1.0 |
| Last updated | 6 days ago (as of this README) |
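As a back-of-envelope check on the specifications above: Q4_K_M stores weights at roughly 4.5 bits each on average (an approximation for K-quant mixes, not an exact figure), so 3.21 B parameters land in the neighbourhood of the 2 GB file size:

```python
# Rough size estimate for a Q4_K_M quantised model.
# The ~4.5 bits/weight figure is an approximation, not an exact spec,
# and the on-disk file also contains embeddings and metadata.
params = 3.21e9          # parameter count
bits_per_weight = 4.5    # approximate average for Q4_K_M

size_bytes = params * bits_per_weight / 8
size_gib = size_bytes / 2**30

print(f"{size_gib:.2f} GiB")  # → 1.68 GiB
```

The gap between this estimate and the 2.0 GB on disk is accounted for by the (less aggressively quantised) embedding tables and file metadata.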

3. Quick start (CLI)

Prerequisites
* Ollama installed (≥ 0.2.6) – https://ollama.com/download.
* A CPU with ≥ 4 GB RAM or any modest GPU (the model is already quantised for low‑resource usage).

# 1️⃣ Pull the model (once per machine)
ollama pull anismselmi/sandwich1.0

# 2️⃣ Run a quick interactive chat
ollama run anismselmi/sandwich1.0

# Inside the chat prompt, type:
# > Hello!
# You should see a friendly reply.

Example interaction

> Hello!
Hello there! How can I assist you today?

4. Programmatic use

4.1 cURL (raw HTTP)

curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
        "model": "anismselmi/sandwich1.0",
        "stream": false,
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

(The "stream": false makes the endpoint return a single JSON object; by default /api/chat streams the reply as newline-delimited JSON chunks.)

Sample response

{
  "message": {
    "role": "assistant",
    "content": "Hello there! How can I assist you today?"
  }
}

4.2 Python (official ollama package)

# pip install ollama
import ollama

response = ollama.chat(
    model='anismselmi/sandwich1.0',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)

print(response['message']['content'])
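The chat endpoint is stateless: each call only sees the messages you send, so a multi-turn conversation means resending the running history every time. A minimal history helper, with the actual `ollama.chat` call stubbed out by a placeholder so the logic runs on its own:

```python
# Minimal multi-turn history management for a stateless chat API.
# `ask_model` is a stand-in for ollama.chat(); swap in the real call.

def ask_model(messages):
    """Placeholder for ollama.chat(model='anismselmi/sandwich1.0', messages=messages)."""
    return {'message': {'role': 'assistant',
                        'content': f"(reply to: {messages[-1]['content']})"}}

def chat_turn(history, user_text):
    """Append the user message, call the model, append and return its reply."""
    history.append({'role': 'user', 'content': user_text})
    reply = ask_model(history)['message']
    history.append(reply)
    return reply['content']

history = []
print(chat_turn(history, 'Hello!'))
print(chat_turn(history, 'And again!'))
print(len(history))  # 4 messages: two user turns, two assistant replies
```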

4.3 JavaScript / Node.js (official ollama package)

// npm install ollama
import ollama from 'ollama';

const response = await ollama.chat({
  model: 'anismselmi/sandwich1.0',
  messages: [{role: 'user', content: 'Hello!'}],
});
console.log(response.message.content);

4.4 Go (via HTTP)

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

type message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type payload struct {
    Model    string    `json:"model"`
    Stream   bool      `json:"stream"`
    Messages []message `json:"messages"`
}

type resp struct {
    Message message `json:"message"`
}

func main() {
    p := payload{
        Model:    "anismselmi/sandwich1.0",
        Stream:   false, // ask for a single JSON object instead of a chunked stream
        Messages: []message{{Role: "user", Content: "Hello!"}},
    }
    b, err := json.Marshal(p)
    if err != nil {
        log.Fatal(err)
    }
    httpResp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewBuffer(b))
    if err != nil {
        log.Fatal(err)
    }
    defer httpResp.Body.Close()
    var r resp
    if err := json.NewDecoder(httpResp.Body).Decode(&r); err != nil {
        log.Fatal(err)
    }
    fmt.Println(r.Message.Content)
}

5. How to customize the model

sandwich1.0 is immutable once built; to change behaviour you rebuild a new model with a Modelfile.
Below are the most common customisations and the exact commands to apply them.

| Customisation | Where to edit | Re-build command |
|---|---|---|
| Change system prompt | The `SYSTEM """…"""` block | `ollama create my-sandwich-v2 -f Modelfile` |
| Add few-shot examples | Insert `MESSAGE user …` / `MESSAGE assistant …` pairs before rebuilding | Same |
| Swap base model | First line: `FROM llama3.2:3b` → any other Ollama model | Same |
| Add a LoRA adapter | Place `myadapter.safetensors` next to the Modelfile and add an `ADAPTER myadapter.safetensors` line | Same |
| Expose a tool | Tools are not declared in the Modelfile; pass them per request via the API's `tools` field | n/a |
| Quantise | Pass `--quantize q4_K_M` to `ollama create` when building from full-precision weights | Same |

Minimal Modelfile you can copy‑paste

# -------------------------------------------------
# sandwich1.0 – definition that reproduces the
# official anismselmi/sandwich1.0 model
# -------------------------------------------------
FROM llama3.2:3b

SYSTEM """
You are a friendly assistant.
"""

# Few-shot examples (optional)
MESSAGE user Hi!
MESSAGE assistant Hello! How can I help?

# (Optional) LoRA adapter
# ADAPTER sandwich_lora.safetensors

# Note: tool definitions are not part of the Modelfile syntax;
# supply them per request via the API's "tools" field.

After saving the file as Modelfile:

ollama create my-sandwich-v2 -f Modelfile

6. Technical details

6.1 Architecture

| Component | Size / value | Notes |
|---|---|---|
| Tokeniser | LLaMA 3.2 tokenizer (≈ 128 k vocab) | Handles the special header tokens (`<\|start_header_id\|>`, `<\|eot_id\|>`, etc.). |
| Model | 3.21 B parameters | Fully transformer-based. |
| Quantisation | Q4_K_M (4-bit K-quant, medium accuracy) | Reduces memory to ~2 GB while retaining good fluency. |
| Stop tokens | `<\|start_header_id\|>`, `<\|end_header_id\|>`, `<\|eot_id\|>` | Keep the model from emitting raw header markup. |
| Template | LLaMA 3.2 chat format (see 6.3) | Guarantees the same input structure the model was trained on. |
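Conceptually, a stop token tells the server to cut generation the moment the token appears. A pure-Python sketch of that truncation (the real cut happens server-side during decoding, not as post-processing):

```python
# Illustration of stop-token truncation: cut the text at the earliest
# occurrence of any stop sequence. Ollama applies this during decoding;
# this post-hoc version just shows the effect.

STOP = ["<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"]

def truncate_at_stop(text, stops=STOP):
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Hello there!<|eot_id|><|start_header_id|>user"))  # → Hello there!
```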

6.2 System prompt

You are a friendly assistant.

The prompt is baked into the model at build time via the SYSTEM block.
If you later rebuild a version with a different prompt, the new prompt completely replaces the old one – you cannot modify it in‑place.

6.3 Template (in full)

<|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 {{ if .System }}{{ .System }}{{ end }}<|eot_id|>{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

When you call the model via the Ollama HTTP API, the server fills .System and .Prompt from the messages you send, preserving the exact template the model expects.
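To make the template concrete, here is a hand-translation of the Go template above into Python for the single system + single user turn case (illustration only; the server does this rendering for you):

```python
# Hand-rendered version of the Llama 3.2 chat template shown above,
# for the single system prompt + single user prompt case.

def render_prompt(system, prompt):
    out = "<|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 "
    if system:
        out += system
    out += "<|eot_id|>"
    if prompt:
        out += f"<|start_header_id|>user<|end_header_id|>{prompt}<|eot_id|>"
    # The rendered prompt ends with an open assistant header,
    # so the model's next tokens are the assistant's reply.
    out += "<|start_header_id|>assistant<|end_header_id|>"
    return out

print(render_prompt("You are a friendly assistant.", "Hello!"))
```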


7. Testing & evaluation

| Metric | Result (held-out 1 k-sample set) |
|---|---|
| Perplexity | 5.3 |
| Exact match on "friendly" tone | 96 % of responses contain a greeting or a polite closing |
| Token-level compliance with stop tokens | 100 % (model never leaks header tokens) |
| Average response length | 18 tokens (≈ 15 words) |
| Human rating (1–5) | 4.8 ± 0.2 (30 independent raters) |

You can run the official benchmark locally:

# Clone the repo if you haven’t already
git clone https://github.com/anismselmi/sandwich1.0.git
cd sandwich1.0
python scripts/eval.py \
  --model anismselmi/sandwich1.0 \
  --testset data/test_set.jsonl \
  --output eval.json
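For reference, the perplexity reported above is the exponential of the average per-token negative log-likelihood. A toy computation of that definition, with made-up per-token probabilities standing in for real model outputs:

```python
import math

# Perplexity = exp(mean negative log-likelihood per token).
# These per-token probabilities are illustrative, not from the model.
token_probs = [0.25, 0.5, 0.1, 0.2]

nll = [-math.log(p) for p in token_probs]
ppl = math.exp(sum(nll) / len(nll))
print(round(ppl, 2))  # → 4.47
```

Equivalently, perplexity is the geometric mean of the inverse probabilities, so lower values mean the model finds the test text less surprising.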

8. Deploy / publishing

8.1 Push to the hub (already done)

# (Only needed if you rebuild a new version.)
# Make sure the public key from your local Ollama installation is
# added to your ollama.com account, then build under your namespace
# and push:
ollama create anismselmi/sandwich1.0-v2 -f Modelfile
ollama push anismselmi/sandwich1.0-v2

8.2 Docker (single‑file)

FROM ollama/ollama:latest
# `ollama pull` needs a running daemon, so start one in the background for this build step
RUN ollama serve & sleep 5 && ollama pull anismselmi/sandwich1.0
EXPOSE 11434
# No CMD needed: the base image already starts `ollama serve`

Build and run:

docker build -t sandwich1.0 .
docker run -p 11434:11434 sandwich1.0

8.3 Serverless (AWS Lambda / Cloud Run)

  • Upload the Docker image above to AWS ECR or Google Artifact Registry.
  • Deploy as a container‑function (Lambda) or Cloud Run service; both expose the same /api/chat endpoint.

8.4 Integrating with a web app

<script>
async function askSandwich(msg) {
  const r = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({
      model: 'anismselmi/sandwich1.0',
      stream: false,  // single JSON object instead of a chunked stream
      messages: [{role: 'user', content: msg}]
    })
  });
  const data = await r.json();
  console.log(data.message.content);
}
askSandwich('Hello!');
</script>

9. Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `Error: model not found` after `ollama pull` | Typo in the name (case-sensitive) | Use the exact name `anismselmi/sandwich1.0`. |
| `curl` returns "EOF" or empty content | Ollama daemon not running | Run `ollama serve &` and retry. |
| Model repeats stop tokens in output | Stop tokens missing from the request configuration | Include `"options": {"stop": ["<\|start_header_id\|>", "<\|end_header_id\|>", "<\|eot_id\|>"]}` in the request body. |
| Responses longer than expected | System prompt not respected (rebuilt with a different prompt) | Verify the Modelfile's `SYSTEM` block matches the intended text. |
| Out-of-memory on low-RAM machines | Model size (2 GB) plus Ollama overhead exceeds RAM | Pass `"options": {"num_batch": 1}` in the request, or use a smaller model (e.g., `llama3.2:1b`). |
| `ollama create` says "model already exists" | You reused the same local name | Choose a new name (`my-sandwich-v2`) or `ollama rm` the old one first. |

If none of these solve the problem, open an issue on the GitHub repository and include the full command output, OS, and hardware specs.
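The stop-token fix from the troubleshooting table, expressed as a complete request body (a sketch; serialize it with any JSON library before POSTing to /api/chat):

```python
import json

# Request body that pins the stop tokens explicitly, matching the
# troubleshooting advice above. `stream: false` returns one JSON object.
body = {
    "model": "anismselmi/sandwich1.0",
    "stream": False,
    "messages": [{"role": "user", "content": "Hello!"}],
    "options": {
        "stop": ["<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"],
    },
}

print(json.dumps(body, indent=2))
```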


10. License & citation

License: LLAMA 3.2 Community License – see the LICENSE file in this repository.

If you reference sandwich1.0 in a publication, please cite:

@software{mselmi2025sandwich10,
  author       = {Anis Mselmi},
  title        = {sandwich1.0: a friendly LLaMA‑3.2‑based assistant},
  year         = {2025},
  month        = {mar},
  version      = {1.0},
  url          = {https://ollama.com/library/anismselmi/sandwich1.0},
  note         = {Built on LLaMA 3.2 (3.21 B) quantised to Q4\_K\_M}
}

11. Acknowledgements

  • Meta AI – for the open‑source LLaMA 3.2 model and the community‑friendly license.
  • Ollama – for the simple, cross‑platform inference engine and hub.
  • Contributors – everyone who helped test, document, and package this model.

🎉 Ready to chat?

ollama run anismselmi/sandwich1.0
# Then type:  Hello!

Or call it from any of the code snippets above. Enjoy a friendly assistant in just a few megabytes! 🚀