num_ctx fixed to 8192 and the EOS token corrected. This Llama 3 8B Instruct model is ready to use with the model's full 8K context window.
397 Pulls · Updated 6 months ago
aae5b523ef30 · 4.9GB

model (4.9GB): arch llama · parameters 8.03B · quantization Q4_K_M
system (129B):
You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests
params (129B):
{"num_ctx":8192,"num_keep":24,"stop":["\u003c|start_header_id|\u003e","\u003c|end_header_id|\u003e", …
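The stop tokens in the params blob are stored with JSON unicode escapes (`\u003c` is `<`, `\u003e` is `>`). Decoding them shows the Llama 3 special header tokens; only the fields visible in the truncated preview are re-typed in this sketch:

```python
import json

# Re-typed from the truncated params preview above; the full blob
# contains more entries than the two stop tokens shown here.
raw = '{"num_ctx": 8192, "num_keep": 24, "stop": ["\\u003c|start_header_id|\\u003e", "\\u003c|end_header_id|\\u003e"]}'
params = json.loads(raw)

print(params["num_ctx"])  # 8192
print(params["stop"])     # ['<|start_header_id|>', '<|end_header_id|>']
```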
template (257B):
{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if . …
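The template preview above is cut off. For reference, templates for Llama 3 Instruct in Ollama typically follow the public Llama 3 chat format, roughly as sketched below (this is an illustration of that format, not necessarily this model's exact template):

```
{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}
```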
Readme
Meta-Llama-3-8B-Instruct
| Model Quants | Size | Bit | Perplexity |
|---|---|---|---|
| llama3-8b-instruct:Q4_0 | 4.7GB | 4 | +0.2166 ppl |
| llama3-8b-instruct:Q4_K_M | 4.9GB | 4 | +0.0532 ppl |
| llama3-8b-instruct:Q5_K_M | 5.7GB | 5 | +0.0122 ppl |
| llama3-8b-instruct:Q6_K | 6.6GB | 6 | +0.0008 ppl |
Config
"max_position_embeddings": 8192
"rope_theta": 500000.0
"vocab_size": 128256
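Since max_position_embeddings is 8192, anything beyond the window gets truncated by the runtime. A rough pre-check can be sketched as follows (the ~4 characters-per-token figure is a common heuristic, not this model's actual tokenizer ratio):

```python
NUM_CTX = 8192        # context window set in the Modelfile
CHARS_PER_TOKEN = 4   # rough heuristic; real tokenizers vary

def fits_in_context(prompt: str, reserve_for_output: int = 512) -> bool:
    """Rough check that a prompt leaves room for the model's reply."""
    est_tokens = len(prompt) / CHARS_PER_TOKEN
    return est_tokens <= NUM_CTX - reserve_for_output

print(fits_in_context("hello"))       # True
print(fits_in_context("x" * 40_000))  # False (~10k estimated tokens)
```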
Remarks
- the 'latest' tag points to Q4_0
- the Modelfile sets num_ctx to 8192 (Ollama's default is only 2048)
- the EOS token is fixed, so responses no longer repeat endlessly
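The fixes listed above can be reproduced in a custom Modelfile. A sketch (the base tag and the `<|eot_id|>` stop are assumptions inferred from the remarks; adjust to the actual source weights):

```
# Sketch only — the FROM tag and <|eot_id|> stop are assumptions
FROM llama3:8b-instruct-q4_K_M
PARAMETER num_ctx 8192
PARAMETER num_keep 24
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
SYSTEM You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests
```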