koesn/llama3-8b-instruct

1,907 downloads · Updated 2 years ago

Fixes num_ctx (now set to 8192) and the EOS token. This Llama 3 8B Instruct model is ready to use with the model's full 8K context window.

ollama run koesn/llama3-8b-instruct

curl http://localhost:11434/api/chat \
  -d '{
    "model": "koesn/llama3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='koesn/llama3-8b-instruct',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'koesn/llama3-8b-instruct',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Models

Name	Size	Context	Input	Updated
llama3-8b-instruct:latest	4.7GB	8K	Text	2 years ago
llama3-8b-instruct:Q4_0	4.7GB	8K	Text	2 years ago
llama3-8b-instruct:Q4_K_M	4.9GB	8K	Text	2 years ago
llama3-8b-instruct:Q5_K_M	5.7GB	8K	Text	2 years ago
llama3-8b-instruct:Q6_K	6.6GB	8K	Text	2 years ago

Readme

Meta-Llama-3-8B-Instruct

Quant	Size	Bits	Perplexity increase
llama3-8b-instruct:Q4_0	4.7GB	4	+0.2166 ppl
llama3-8b-instruct:Q4_K_M	4.9GB	4	+0.0532 ppl
llama3-8b-instruct:Q5_K_M	5.7GB	5	+0.0122 ppl
llama3-8b-instruct:Q6_K	6.6GB	6	+0.0008 ppl
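The Bits column gives the nominal quantization width. As a rough cross-check, the file size can be converted into an effective bits-per-weight figure. This is a minimal sketch that assumes about 8.03 billion parameters for Llama 3 8B and that "GB" means 10^9 bytes; both are assumptions, and the file also holds metadata and tokenizer data, so the result is only approximate.

```python
# Rough effective bits-per-weight for each quant listed in the table.
# Assumptions: ~8.03e9 parameters, and "GB" means 1e9 bytes.
PARAMS = 8.03e9

# File sizes in GB, copied from the table.
QUANTS = {
    "Q4_0": 4.7,
    "Q4_K_M": 4.9,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB into the average number of bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for name, size in QUANTS.items():
        print(f"{name}: {bits_per_weight(size):.2f} bits/weight")
```

The effective figure comes out above the nominal width partly because these formats store per-block scale factors alongside the quantized weights (Q4_0, for example, keeps one 16-bit scale per block of 32 weights), and some tensors may be kept at higher precision.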

Config

"max_position_embeddings": 8192
"rope_theta": 500000.0
"vocab_size": 128256
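rope_theta is the base of the rotary position embedding (RoPE) frequencies. The sketch below computes the standard RoPE inverse frequencies, theta^(-2i/d), and the wavelength in tokens of each rotary pair. It assumes a head dimension of 128 (4096 hidden size / 32 attention heads for Llama 3 8B); that value is not part of the config shown here.

```python
import math

ROPE_THETA = 500000.0  # from the config above
HEAD_DIM = 128         # assumption: 4096 hidden size / 32 attention heads


def rope_inv_freq(theta: float = ROPE_THETA, dim: int = HEAD_DIM) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / dim) for i in [0, dim / 2)."""
    return [theta ** (-(2 * i) / dim) for i in range(dim // 2)]


def wavelengths(inv_freq: list[float]) -> list[float]:
    """Token positions needed for each rotary pair to complete one full rotation."""
    return [2 * math.pi / f for f in inv_freq]


if __name__ == "__main__":
    inv = rope_inv_freq()
    waves = wavelengths(inv)
    print(f"rotary pairs: {len(inv)}")
    print(f"shortest wavelength: {waves[0]:.2f} tokens")
    print(f"longest wavelength: {waves[-1]:,.0f} tokens")
```

A large theta stretches the slowest frequencies, so the longest wavelength far exceeds max_position_embeddings (8192).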

Remarks

The 'latest' tag points to Q4_0
The Modelfile sets num_ctx to 8192 (Ollama's default is only 2048)
Fixed EOS token, so no more repetitive responses
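Because this model's Modelfile already sets num_ctx to 8192, no extra options are needed when calling it. For other Llama 3 builds that still use the default, the same settings can be passed per request through the options field of Ollama's /api/chat endpoint. This is a minimal sketch using only the standard library, assuming a local server on the default port 11434. Using `<|eot_id|>` (Llama 3's end-of-turn token) as a stop sequence is an assumption about how the EOS fix works, not something taken from this page's Modelfile.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local endpoint


def build_chat_payload(prompt: str, model: str = "koesn/llama3-8b-instruct",
                       num_ctx: int = 8192) -> dict:
    """Build a non-streaming /api/chat request that sets the context window explicitly."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "num_ctx": num_ctx,
            # Assumption: stop on Llama 3's end-of-turn token.
            "stop": ["<|eot_id|>"],
        },
    }


def chat(prompt: str) -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


if __name__ == "__main__":
    print(chat("Hello!"))
```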