Fietje: An open and efficient LLM for Dutch (instruct)
101 Pulls · Updated 6 months ago
06c478dc3188 · 3.0GB
model · 3.0GB
  arch          phi2
  parameters    2.78B
  quantization  Q8_0

params · 74B
  {"num_ctx": 2048, "stop": ["<|im_start|>", "<|im_end|>"]}
template · 156B
  {{ if .System }}<|im_start|>system
  {{ .System }}<|im_end|>
  {{ end }}{{ if .Prompt }}<|im_start|>user
  (remainder of the template truncated on the model page)
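
The params and template layers above map directly onto an Ollama Modelfile. Below is a minimal sketch of how this model could be recreated from a locally downloaded GGUF file; the filename is hypothetical, and the closing user/assistant turns of the template are an assumption (standard ChatML), since the page only shows the template's opening lines.

# GGUF filename is hypothetical; any quant from this repository would work.
cat > Modelfile <<'EOF'
FROM ./fietje-2b-instruct-Q8_0.gguf

# ChatML template; the {{ .Prompt }} and assistant lines are assumed, since the
# page above only shows the opening lines of the 156B template layer.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

# Mirrors the params layer shown above.
PARAMETER num_ctx 2048
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
EOF

ollama create fietje-2b-instruct -f Modelfile
ollama run fietje-2b-instruct "Schrijf een korte alinea over fietsen in Nederland."

ollama create registers the model locally under the given name; ollama run then answers a one-off prompt or starts an interactive chat.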
Readme
This repository contains quantized versions of BramVanroy/fietje-2b-instruct.
Available quantization types and their expected quality loss relative to the base f16 model, expressed as added perplexity (higher perplexity = worse). These are llama.cpp's reference measurements on LLaMA-v1-7B, so the listed file sizes are not Fietje's own:
Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
Q6_K : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
F16 : 13.00G @ 7B
Quants were made with release b2777 of llama.cpp.
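
For context, producing such quants from the original Hugging Face checkpoint typically follows llama.cpp's convert-then-quantize workflow. The sketch below is a hedged example only: the paths are hypothetical, and the tool names (convert-hf-to-gguf.py, quantize) reflect what llama.cpp shipped around release b2777, before later binary renames.

# Convert the Hugging Face checkpoint to an f16 GGUF (model directory is hypothetical).
python convert-hf-to-gguf.py ./fietje-2b-instruct --outtype f16 \
    --outfile fietje-2b-instruct-f16.gguf

# Quantize the f16 GGUF to one of the listed types, e.g. Q4_K_M.
./quantize fietje-2b-instruct-f16.gguf fietje-2b-instruct-Q4_K_M.gguf Q4_K_M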