sam860/ falcon-h1:1.5b-deep-Q8_0

ollama run sam860/falcon-h1:1.5b-deep-Q8_0

Details

3 months ago · caa0703bff10 · 1.7GB · falcon-h1 · 1.55B · Q8_0

Readme

Notes

Uploaded in Q4_0 and Q8_0 formats.

  • Q4_0 – the lower‑bit of the two uploads; it retains most of the original quality and suits CPU‑only inference on modest RAM (≈1 GB).
  • Q8_0 – higher‑bit with slightly better fidelity; use it when you have more memory to spare or want the best possible output.
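
As a rough way to choose between the two at runtime, the sketch below picks a tag based on free memory. The Q4_0 tag name is an assumption (only the Q8_0 tag appears on this page), and the 3 GB cutoff is an illustrative guess, not a measured requirement:

    import psutil  # pip install psutil

    # Hypothetical tag for the Q4_0 upload; only the Q8_0 tag is confirmed above.
    Q4_TAG = "sam860/falcon-h1:1.5b-deep-Q4_0"
    Q8_TAG = "sam860/falcon-h1:1.5b-deep-Q8_0"

    # Q8_0 weighs about 1.7 GB on disk, so leave some headroom.
    free_gb = psutil.virtual_memory().available / 2**30
    model = Q8_TAG if free_gb > 3 else Q4_TAG
    print(f"{free_gb:.1f} GB free -> using {model}")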

Temperature: The model was trained for fairly deterministic behavior. Start with 0.1 – 0.2 for reliable answers; increase to ≈0.6 only if you want more creative or exploratory output.
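
With the official ollama Python client (pip install ollama), the temperature is passed through the options field; a minimal sketch following the guidance above:

    import ollama

    # 0.2 keeps answers deterministic and reliable; raise toward 0.6
    # only for more creative output.
    response = ollama.chat(
        model="sam860/falcon-h1:1.5b-deep-Q8_0",
        messages=[{"role": "user", "content": "Explain state-space models in two sentences."}],
        options={"temperature": 0.2},
    )
    print(response["message"]["content"])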


Description

Falcon‑H1‑1.5B‑Deep‑Instruct – a 1.5 B‑parameter hybrid model that combines a classic decoder‑only transformer stack with Mamba (state‑space) blocks.

Key architectural highlights:

  • Hybrid Transformer + Mamba: Transformer attention and Mamba (state‑space) modules are combined within the decoder blocks, giving strong sequence modeling while keeping compute low.
  • Efficient inference: The mixed architecture enables fast token generation on CPUs and NPUs, making the model well‑suited for edge devices.
  • Multilingual: Primarily English but trained on a multilingual corpus, so it handles many languages reasonably well.
  • Instruction‑tuned: Optimized for chat, tool‑calling, and structured JSON output (see the JSON sketch after this list).
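
Because the model targets structured JSON output, here is a minimal sketch that requests it through the ollama Python client's format parameter; the extraction task and keys in the prompt are a made‑up example:

    import json
    import ollama

    # format="json" constrains the reply to valid JSON; the keys asked
    # for in the prompt are illustrative, not a fixed schema.
    response = ollama.chat(
        model="sam860/falcon-h1:1.5b-deep-Q8_0",
        messages=[{
            "role": "user",
            "content": (
                'Extract the product and price as JSON with keys "product" '
                'and "price" from: "The mug costs 12.50 euros."'
            ),
        }],
        format="json",
        options={"temperature": 0.1},
    )
    print(json.loads(response["message"]["content"]))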

Ideal for:

  • On‑device assistants and chatbots
  • Retrieval‑augmented generation (RAG) pipelines (a prompt‑assembly sketch follows this list)
  • Structured data extraction / JSON generation
  • Lightweight code completion (FIM)
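
For the RAG case, the model only needs retrieved passages placed into its prompt; retrieval itself is whatever your pipeline provides. A minimal prompt‑assembly sketch, with the docs list standing in for real retriever output:

    import ollama

    # Stand-ins for passages returned by a retriever (vector store, BM25, ...).
    docs = [
        "Falcon-H1 pairs transformer attention with Mamba state-space blocks.",
        "The 1.5B-Deep variant targets on-device and edge inference.",
    ]
    question = "What architecture does Falcon-H1 use?"

    # Ground the answer in the retrieved context only.
    context = "\n".join(f"- {d}" for d in docs)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    response = ollama.chat(
        model="sam860/falcon-h1:1.5b-deep-Q8_0",
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.1},
    )
    print(response["message"]["content"])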

References

  • Falcon‑H1 release blogpost
  • Technical report (arXiv 2507.22448)
  • Model card on HuggingFace
  • Discord community