Details

Updated 1 year ago

bbc4c64bdc94 · 12GB ·

model

arch llama · parameters 11.2B · quantization Q8_0

12GB

params

{ "stop": [ "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"

114B

template

<s>{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if

263B

! Standard quants from Q4 to Q8 are available here:
https://ollama.com/SpeakLeash/bielik-11b-v2.2-instruct

Bielik-11B-v2.2-Instruct-GGUF-IQ-Imatrix

! Experimental: select the appropriate model from the Tags list.

Available tags: IQ1_M, IQ2_XXS, IQ3_XXS, IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, Q8_0 (descriptions below).

This is an experimental version of the repository. It contains Bielik-11B-v2.2-Instruct models quantized with importance matrix (imatrix) calibration. The low-precision models (2-bit, 3-bit) are intended for mobile devices or minicomputers. Note that these models should be used mainly in instruct mode (not chat). We recommend setting low temperature values. The higher-precision models (4-8 bit) may show better quality after calibration than models quantized without it.
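As a rough illustration of the advice above (single-turn instructions rather than chat, with a low temperature), the sketch below builds a request for Ollama's local `/api/generate` endpoint. It assumes an Ollama server on the default port 11434; the model tag in the usage comment is a hypothetical placeholder, not a name taken from this page.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, instruction, temperature=0.1):
    """Build a single-turn generate payload with a low sampling temperature."""
    return {
        "model": model,
        "prompt": instruction,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }


def generate(model, instruction, temperature=0.1):
    """Send the instruction to Ollama and return the generated text."""
    body = json.dumps(build_request(model, instruction, temperature)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (hypothetical tag; replace with the tag you actually pulled):
# print(generate("my-bielik:IQ3_XXS", "Streść poniższy tekst w dwóch zdaniach: ..."))
```

Each call is independent and carries no conversation history, which matches the instruct-style usage recommended for the low-bit models.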

DISCLAIMER: Be aware that quantized models show reduced response quality and may hallucinate!

Available quantization formats:

IQ1_M: (2.7GB) (1.75bit) Extremely low quality, not recommended.
IQ2_XXS: (3.0GB) Lower quality, uses SOTA techniques to be usable.
IQ3_XXS: (4.3GB) Lower quality, new method with decent performance, comparable to Q3 quants.
IQ4_XS: (6.0GB) Decent quality, smaller than Q4_K_S with similar performance, recommended.
Q4_K_M: (6.7GB) Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K
Q5_K_M: (7.9GB) Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K
Q6_K: (9.2GB) Uses Q8_K for all tensors
Q8_0: (12GB) Almost indistinguishable from float16. High resource use and slow.
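To make the size trade-off concrete, here is a minimal sketch that picks the largest listed quant fitting a given memory budget. The sizes are copied from the list above; the `headroom_gb` default of 1.5 GB (for context and runtime overhead) is an illustrative assumption, not a measured figure, so treat the result as a starting point only.

```python
# File sizes in GB, copied from the list of available formats.
QUANT_SIZES_GB = {
    "IQ1_M": 2.7,
    "IQ2_XXS": 3.0,
    "IQ3_XXS": 4.3,
    "IQ4_XS": 6.0,
    "Q4_K_M": 6.7,
    "Q5_K_M": 7.9,
    "Q6_K": 9.2,
    "Q8_0": 12.0,
}


def pick_quant(available_gb, headroom_gb=1.5):
    """Return the largest quant whose file fits the budget, or None if none fits.

    headroom_gb is a hypothetical allowance for context and runtime overhead.
    """
    budget = available_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None
    return max(fitting)[1]  # largest file size that still fits


if __name__ == "__main__":
    for mem in (4, 8, 16):
        print(mem, "GB ->", pick_quant(mem))
```

Note that this ranks purely by file size; it does not encode the quality notes from the list (for example, that IQ1_M is not recommended).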

These models were created using an Ollama Modelfile.

The bigger models were created with PARAMETER temperature 0.2. The smaller imatrix models (IQ4_XS, IQ3_XXS, IQ2_XXS, IQ1_M) were created with PARAMETER temperature 0.1.

The GGUF files can be used with Ollama. To do this, import the model using the configuration defined in a Modelfile. For example, for the model Bielik-11B-v2.2-Instruct.Q4_K_M.gguf (use the full path to the model's location), the Modelfile looks like this:

FROM ./Bielik-11B-v2.2-Instruct.Q4_K_M.gguf
TEMPLATE """<s>{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"

# Remember to set a low temperature for experimental models (1-3 bits)
PARAMETER temperature 0.1
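If you import several quants, generating the file can be less error-prone than editing it by hand. The sketch below renders the same configuration for a given GGUF path and picks the temperature from the rule stated earlier (0.1 for IQ4_XS, IQ3_XXS, IQ2_XXS and IQ1_M, otherwise 0.2). The helper names `default_temperature` and `render_modelfile` are hypothetical, and the blank lines inside the template are reproduced as they appear in the example above.

```python
# Tags listed on this page as the smaller imatrix models (temperature 0.1).
LOW_TEMP_TAGS = {"IQ1_M", "IQ2_XXS", "IQ3_XXS", "IQ4_XS"}

# Prompt template copied from the example configuration.
TEMPLATE = (
    "<s>{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n"
    "{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n"
    "{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{{ .Response }}<|eot_id|>"
)

STOP_TOKENS = ["<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"]


def default_temperature(quant):
    """Return 0.1 for the smaller imatrix quants and 0.2 for the bigger ones."""
    return 0.1 if quant in LOW_TEMP_TAGS else 0.2


def render_modelfile(gguf_path, quant):
    """Render the configuration text for one GGUF file."""
    lines = [f"FROM {gguf_path}", f'TEMPLATE """{TEMPLATE}"""', ""]
    lines += [f'PARAMETER stop "{tok}"' for tok in STOP_TOKENS]
    lines += ["", f"PARAMETER temperature {default_temperature(quant)}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_modelfile("./Bielik-11B-v2.2-Instruct.Q4_K_M.gguf", "Q4_K_M"))
```

After saving the output to a file named Modelfile, the model can be imported with `ollama create <name> -f Modelfile` and started with `ollama run <name>`, where `<name>` is whatever local name you choose.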

Model description:

Developed by: SpeakLeash & ACK Cyfronet AGH
Language: Polish
Model type: causal decoder-only
Quant from: Bielik-11B-v2.2-Instruct
Finetuned from: Bielik-11B-v2
License: Apache 2.0 and Terms of Use

Contact Us

If you have any questions or suggestions, join our SpeakLeash Discord.

Polish LLM - Bielik-11B-v2.2-Instruct ~ by SpeakLeash a.k.a Spichlerz!