
SmolLM-135M quantized to Q4_0 GGUF for efficient inference.

ollama run schroneko/smollm-135m:q4_0

Details

Updated 1 month ago

6d10e3567a82 · 92MB · llama · 135M · Q4_0

Readme

SmolLM-135M-GGUF

SmolLM-135M-GGUF is a language model converted to GGUF format for efficient on-device inference.

This is a Q4_0 quantized GGUF build of QuantFactory/SmolLM-135M-GGUF, converted using castkit.

Getting Started

ollama run schroneko/smollm-135m:q4_0

Quantization Details

Property       Value
Format         GGUF
Quantization   Q4_0
Source         QuantFactory/SmolLM-135M-GGUF
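Q4_0 in llama.cpp/GGUF packs weights in blocks of 32: each block stores one 16-bit scale plus 32 packed 4-bit values, i.e. 18 bytes per 32 weights (4.5 bits per weight). As a rough sanity check on the 92MB file size, a minimal back-of-envelope estimate (assuming all 135M parameters were stored as Q4_0; in practice token embeddings and some tensors are kept at higher precision, plus GGUF metadata, which accounts for the gap):

```python
# Q4_0 block layout in llama.cpp/GGUF: per block of 32 weights,
# one fp16 scale (2 bytes) + 32 packed 4-bit values (16 bytes) = 18 bytes.
BLOCK_SIZE = 32
BYTES_PER_BLOCK = 2 + BLOCK_SIZE // 2  # 18 bytes per block

def q4_0_bytes(n_params: int) -> int:
    """Rough size of n_params weights stored as Q4_0 (whole blocks)."""
    n_blocks = -(-n_params // BLOCK_SIZE)  # ceiling division
    return n_blocks * BYTES_PER_BLOCK

bits_per_weight = BYTES_PER_BLOCK * 8 / BLOCK_SIZE  # 4.5 bits/weight
est = q4_0_bytes(135_000_000)
print(f"{bits_per_weight} bits/weight, ~{est / 1e6:.1f} MB for 135M params")
```

This lands near 76MB for the weight tensors alone, consistent with the 92MB file once higher-precision embeddings and metadata are included.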

Key Features

  • GGUF format: Optimized for fast inference with llama.cpp and Ollama
  • Q4_0 quantization: Reduced memory footprint for on-device deployment
  • Ready to use: Run directly with ollama run schroneko/smollm-135m:q4_0
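Beyond the CLI, a running Ollama server exposes the model over its local REST API (`/api/generate` on port 11434 by default). A minimal sketch, assuming Ollama is installed and serving locally; `build_payload` and `generate` are illustrative helper names, not part of any library:

```python
import json
import urllib.request

# Default Ollama endpoint; assumes a local `ollama serve` is running.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str) -> bytes:
    """JSON request body for a non-streaming generate call."""
    return json.dumps({
        "model": "schroneko/smollm-135m:q4_0",
        "prompt": prompt,
        "stream": False,
    }).encode()

def generate(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running server):
#   print(generate("Say hello in one sentence."))
```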

Source

QuantFactory/SmolLM-135M-GGUF

License

Please refer to the original model card for license information.