yarn-llama2:7b-64k-q6_K

944.8K Downloads · Updated 2 years ago

An extension of Llama 2 that supports a context of up to 128k tokens.

Sizes: 7b, 13b

ollama run yarn-llama2:7b-64k-q6_K

curl http://localhost:11434/api/chat \
  -d '{
    "model": "yarn-llama2:7b-64k-q6_K",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='yarn-llama2:7b-64k-q6_K',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'yarn-llama2:7b-64k-q6_K',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

f583cc04b87c · 5.5GB

Layer    Contents                                              Size
model    arch: llama, parameters: 6.74B, quantization: Q6_K    5.5GB
params   { "num_ctx": 65536 }                                  17B
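
The num_ctx value in the params layer is the context window this tag ships with. As a minimal sketch, assuming a local Ollama server at the default address used in the examples above, the same setting can also be passed per request through the options field of the chat API. The helper names build_chat_payload and send_chat are hypothetical and not part of any library.

```python
import json
import urllib.request

# Default local address, taken from the curl example on this page.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt, model="yarn-llama2:7b-64k-q6_K", num_ctx=65536):
    """Build a non-streaming chat request body with an explicit context size."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def send_chat(payload, url=OLLAMA_URL):
    """POST the payload and return the reply text (requires a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Usage, with the server running:
# print(send_chat(build_chat_payload("Hello!")))
```

Lowering num_ctx below 65536 is a reasonable first step when the machine runs short of memory, since a larger context window needs more memory for the same model.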

Readme

Yarn Llama 2 is a model based on Llama 2 that extends the context window to up to 128k tokens. It was developed by Nous Research, which applied the YaRN method and further trained the model to support larger context windows.

CLI

64k context size:

ollama run yarn-llama2

128k context size:

ollama run yarn-llama2:7b-128k

API

Example:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "yarn-llama2:7b-128k",
  "prompt": "Here is a story about llamas eating grass"
}'
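
By default, /api/generate streams its answer as one JSON object per line, each carrying a fragment of text in its response field, with done set to true on the last one. The following is a minimal sketch, assuming that line-delimited format, of stitching the fragments back together. The helper collect_response is hypothetical, and the sample lines are made up for illustration rather than real model output.

```python
import json


def collect_response(lines):
    """Concatenate the "response" fragments from a stream of JSON lines."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between objects
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the stream
    return "".join(parts)


# Illustrative input only; a real stream comes from the HTTP response body.
sample = [
    '{"response": "Once upon", "done": false}',
    '{"response": " a time", "done": false}',
    '{"response": "", "done": true}',
]
print(collect_response(sample))
```

Because iterating over the object returned by urllib.request.urlopen yields one line of bytes at a time, that object can be passed to collect_response directly.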

References

Hugging Face

YaRN: Efficient Context Window Extension of Large Language Models