997 Downloads Updated 1 year ago
NeuralNet is a pioneering AI solutions provider that empowers businesses to harness the power of artificial intelligence.
All models were quantized following the instructions provided by llama.cpp, which are:
# obtain the official LLaMA model weights and place them in ./models
ls ./models
llama-2-7b tokenizer_checklist.chk tokenizer.model
# [Optional] for models using BPE tokenizers
ls ./models
<folder containing weights and tokenizer json> vocab.json
# [Optional] for PyTorch .bin models like Mistral-7B
ls ./models
<folder containing weights and tokenizer json>
# install Python dependencies
python3 -m pip install -r requirements.txt
# convert the model to ggml FP16 format
python3 convert-hf-to-gguf.py models/mymodel/
# quantize the model to 4-bits (using Q4_K_M method)
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
# update the gguf filetype to current version if older version is now unsupported
./llama-quantize ./models/mymodel/ggml-model-Q4_K_M.gguf ./models/mymodel/ggml-model-Q4_K_M-v2.gguf COPY
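Producing several quant types means repeating the quantize step once per type. The sketch below prints those commands as a dry run so they can be reviewed before anything is executed. The helper name `print_quantize_cmds` is hypothetical, and the file naming simply follows the `ggml-model-<TYPE>.gguf` pattern used in the steps above; it is not necessarily how the published files were generated.

```shell
#!/usr/bin/env bash
# print_quantize_cmds is a hypothetical helper, not part of llama.cpp.
# It only prints llama-quantize commands (dry run); it does not run them.
# Usage: print_quantize_cmds <model-dir> <quant-type>...
print_quantize_cmds() {
  local dir="$1"; shift
  local qtype
  for qtype in "$@"; do
    # Same argument order as above: input f16 file, output file, quant type.
    printf './llama-quantize %s/ggml-model-f16.gguf %s/ggml-model-%s.gguf %s\n' \
      "$dir" "$dir" "$qtype" "$qtype"
  done
}

print_quantize_cmds ./models/mymodel Q4_K_M Q5_K_M Q8_0
```

Once the printed commands look right, they can be pasted into the shell or piped to `sh`.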
Original model: https://huggingface.co/openchat/openchat-3.6-8b-20240522
<|begin_of_text|><|start_header_id|>System<|end_header_id|>
{system}<|eot_id|><|start_header_id|>GPT4 Correct User<|end_header_id|>
{user}<|eot_id|><|start_header_id|>GPT4 Correct Assistant<|end_header_id|>
{{ if .System }}<|begin_of_text|><|start_header_id|>System<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>GPT4 Correct User<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>GPT4 Correct Assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
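To make the expansion concrete, the sketch below fills the `{system}` and `{user}` slots of the plain prompt format shown above. The function name `render_prompt` is hypothetical, and the line breaks follow the layout as it appears on this page; the original model card may use different whitespace.

```shell
#!/usr/bin/env bash
# render_prompt is a hypothetical helper, not part of Ollama or llama.cpp.
# It fills the {system} and {user} slots of the prompt format shown above.
# Assumption: one line break after each header, as displayed on this page.
render_prompt() {
  local system="$1" user="$2"
  printf '<|begin_of_text|><|start_header_id|>System<|end_header_id|>\n'
  printf '%s<|eot_id|><|start_header_id|>GPT4 Correct User<|end_header_id|>\n' "$system"
  printf '%s<|eot_id|><|start_header_id|>GPT4 Correct Assistant<|end_header_id|>\n' "$user"
}

render_prompt "You are a helpful assistant." "What is GGUF?"
```

The model's reply would follow the final `GPT4 Correct Assistant` header and end with `<|eot_id|>`, as in the Ollama template.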
| Filename | Quant type | File Size | Description |
|---|---|---|---|
| openchat-3.6-8b-20240522-fp16.gguf | fp16 | 16.06GB | Half precision, no quantization applied |
| openchat-3.6-8b-20240522-q8_0.gguf | q8_0 | 8.54GB | Extremely high quality, generally unneeded but max available quant. |
| openchat-3.6-8b-20240522-q6_K.gguf | q6_K | 6.59GB | Very high quality, near perfect, recommended. |
| openchat-3.6-8b-20240522-q5_1.gguf | q5_1 | 6.06GB | High quality, recommended. |
| openchat-3.6-8b-20240522-q5_K_M.gguf | q5_K_M | 5.73GB | High quality, recommended. |
| openchat-3.6-8b-20240522-q5_K_S.gguf | q5_K_S | 5.59GB | High quality, recommended. |
| openchat-3.6-8b-20240522-q5_0.gguf | q5_0 | 5.59GB | High quality, recommended. |
| openchat-3.6-8b-20240522-q4_1.gguf | q4_1 | 4.92GB | Good quality, recommended. |
| openchat-3.6-8b-20240522-q4_K_M.gguf | q4_K_M | 4.92GB | Good quality, uses about 4.83 bits per weight, recommended. |
| openchat-3.6-8b-20240522-q4_K_S.gguf | q4_K_S | 4.69GB | Slightly lower quality with more space savings, recommended. |
| openchat-3.6-8b-20240522-q4_0.gguf | q4_0 | 4.66GB | Slightly lower quality with more space savings, recommended. |
| openchat-3.6-8b-20240522-q3_K_L.gguf | q3_K_L | 4.32GB | Lower quality but usable, good for low RAM availability. |
| openchat-3.6-8b-20240522-q3_K_M.gguf | q3_K_M | 4.01GB | Even lower quality. |
| openchat-3.6-8b-20240522-q3_K_S.gguf | q3_K_S | 3.66GB | Low quality, not recommended. |
| openchat-3.6-8b-20240522-q2_K.gguf | q2_K | 3.17GB | Very low quality but surprisingly usable. |
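One way to use the table is to pick the largest quant that fits a size budget. The sketch below does that with the sizes listed above. The helper name `pick_quant` is hypothetical, fp16 is left out because it is unquantized, and the file size should be treated only as a lower bound on memory, since the context window needs additional room at run time.

```shell
#!/usr/bin/env bash
# pick_quant is a hypothetical helper. Given a size budget in GB, it prints
# the largest quant type from the table above whose file fits the budget.
# Ties (equal file sizes) resolve to the type listed first.
# Returns a non-zero status when nothing fits.
pick_quant() {
  local budget_gb="$1"
  awk -v budget="$budget_gb" '
    $2 + 0 <= budget + 0 && $2 + 0 > best { best = $2 + 0; name = $1 }
    END { if (name != "") print name; else exit 1 }
  ' <<'EOF'
q8_0 8.54
q6_K 6.59
q5_1 6.06
q5_K_M 5.73
q5_K_S 5.59
q5_0 5.59
q4_1 4.92
q4_K_M 4.92
q4_K_S 4.69
q4_0 4.66
q3_K_L 4.32
q3_K_M 4.01
q3_K_S 3.66
q2_K 3.17
EOF
}

pick_quant 6
```

With a 6GB budget the call selects `q5_K_M`, since `q5_1` at 6.06GB is just over the limit.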
ollama run NeuralNet/openchat-3.6-8b-20240522
Create a plain text file named Modelfile (no extension needed):
FROM NeuralNet/openchat-3.6-8b-20240522
# sets the temperature to 0.5 [higher is more creative, lower is more coherent]
PARAMETER temperature 0.5
# sets the context window size to 8192; this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 8192
# sets the maximum number of tokens to generate to 4096
PARAMETER num_predict 4096
# sets the system message
SYSTEM "You are an AI assistant created by NeuralNet, your answers are clear and concise"
# template OpenChat3.6
TEMPLATE "{{ if .System }}<|begin_of_text|><|start_header_id|>System<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>GPT4 Correct User<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>GPT4 Correct Assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"
Then, with Ollama installed, run:
ollama create openchat-3.6-8b-20240522 -f Modelfile
Ensure you have the huggingface_hub[cli] tool installed by running:
pip install -U "huggingface_hub[cli]"
To download a specific model file, use the following command:
huggingface-cli download NeuralNet-Hub/openchat-3.6-8b-20240522-GGUF --include "openchat-3.6-8b-20240522-Q4_K_M.gguf" --local-dir ./
This command downloads the specified model file and places it in the current directory (./).
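After a download it can be worth confirming that the file really is a GGUF model and not, for example, a saved error page. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check is possible with `head`. The helper name `is_gguf` is hypothetical, and this is only a sanity check on the header, not an integrity check of the whole file.

```shell
#!/usr/bin/env bash
# is_gguf is a hypothetical helper. It succeeds when the file starts with the
# 4-byte ASCII magic "GGUF"; it fails for other files and for missing files.
is_gguf() {
  [ "$(head -c 4 -- "$1" 2>/dev/null)" = "GGUF" ]
}

# Example path: the file name used in the download command above.
if is_gguf ./openchat-3.6-8b-20240522-Q4_K_M.gguf; then
  echo "header looks like GGUF"
else
  echo "not a GGUF file (or file missing)"
fi
```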
Models exceeding 50GB are typically split into multiple files for easier download and management. To download every part into a local folder, run:
huggingface-cli download NeuralNet-Hub/openchat-3.6-8b-20240522-GGUF --include "openchat-3.6-8b-20240522-Q8_0.gguf/*" --local-dir openchat-3.6-8b-20240522-Q8_0
This command downloads all files in the specified directory and places them into the chosen local folder (openchat-3.6-8b-20240522-Q8_0). You can choose to download everything in place or specify a new location for the downloaded files.
Artefact2 provides a comprehensive analysis of the quant types, with performance charts.
By following these guidelines, you can make an informed decision on which file best suits your system and performance needs.
Website: https://neuralnet.solutions Email: info[at]neuralnet.solutions