cwchang/llama-3-taiwan-8b-instruct:q4

cwchang/llama-3-taiwan-8b-instruct:q4_1

2,288 Downloads · Updated 1 year ago

The model is a quantized version of `Llama-3-Taiwan-8B-Instruct`. More details can be found on the Hugging Face page: https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct

ollama run cwchang/llama-3-taiwan-8b-instruct:q4_1

curl http://localhost:11434/api/chat \
  -d '{
    "model": "cwchang/llama-3-taiwan-8b-instruct:q4_1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='cwchang/llama-3-taiwan-8b-instruct:q4_1',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'cwchang/llama-3-taiwan-8b-instruct:q4_1',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

5ede2b77cd3c · 5.1GB

model

arch: llama · parameters: 8.03B · quantization: Q4_1 · 5.1GB

params

{ "num_ctx": 8192, "stop": [ "<|start_header_id|>", "<|end_header_id|>",

171B
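
The `num_ctx` and `stop` values listed above are defaults stored with the model. The Ollama REST API accepts an `options` object in the request body, which can be used to override such parameters for a single request. The following is a minimal sketch, assuming a local server at the default address `http://localhost:11434`. The helper name `build_chat_request` and the override value 4096 are hypothetical, and the stop list must be supplied by the caller because the listing above is truncated.

```python
import json
import urllib.request

MODEL = "cwchang/llama-3-taiwan-8b-instruct:q4_1"


def build_chat_request(prompt, num_ctx=8192, stop=None,
                       host="http://localhost:11434"):
    """Build (but do not send) a POST request for Ollama's /api/chat endpoint.

    `build_chat_request` is an illustrative helper, not part of any library.
    """
    # Per-request overrides go in the "options" object of the request body.
    options = {"num_ctx": num_ctx}
    if stop:
        options["stop"] = list(stop)
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON reply instead of a stream
        "options": options,
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    req = build_chat_request("Hello!", num_ctx=4096)
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["message"]["content"])
```

A smaller `num_ctx` generally reduces memory use at the cost of a shorter usable conversation history, so it is a reasonable first setting to try on constrained hardware.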

template

"{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if .P

257B

Readme

The model is a quantized version of Llama-3 Taiwan 8B Instruct, an 8-billion-parameter model specialized for Traditional Chinese conversation. Quantization reduces the model’s size and computational requirements while largely preserving the original model’s quality, making it suitable for deployment in resource-constrained environments. More details can be found on the Hugging Face page: https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct
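
As a rough check on the 5.1GB figure listed under Details, the file size can be estimated from the parameter count and the storage cost per weight. The sketch below assumes the GGML Q4_1 block layout (32 weights per block, stored as 16 bytes of 4-bit values plus a 2-byte scale and a 2-byte minimum, which works out to 5 bits per weight) and compares it with 16-bit weights. The function name is illustrative, and the estimate ignores file metadata and any tensors kept at a different precision.

```python
def estimated_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8.03e9  # parameter count from the Details section

# Assumed Q4_1 block: 16 bytes of 4-bit values + 2-byte scale + 2-byte minimum,
# covering 32 weights.
Q4_1_BITS = (16 + 2 + 2) * 8 / 32

print(f"fp16: {estimated_size_gb(N_PARAMS, 16):.1f} GB")
print(f"Q4_1: {estimated_size_gb(N_PARAMS, Q4_1_BITS):.1f} GB")
```

Under these assumptions the quantized weights take roughly a third of the space of 16-bit weights, and the estimate lands close to the listed 5.1GB.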