4.3M Downloads Updated 1 year ago
df2e3c78030b · 11GB ·
Llama 2 is released by Meta Platforms, Inc. The model is trained on 2 trillion tokens and by default supports a context length of 4096 tokens. Llama 2 Chat models are fine-tuned on over 1 million human annotations and are made for chat.
Open a terminal and run: ollama run llama2
Example using curl:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
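The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes an Ollama server is listening on localhost:11434, as in the curl example, and that the endpoint streams its reply as one JSON object per line, each carrying a "response" text fragment and a "done" flag. The helper names (build_request, collect_response, generate) are illustrative and not part of Ollama.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama2"):
    """Encode the JSON body that the curl example sends."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")


def collect_response(lines):
    """Join the text fragments from a stream of JSON lines.

    Assumption: each line is a JSON object with a "response" fragment
    and a "done" flag that is true on the final object.
    """
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(prompt, model="llama2"):
    """Send the prompt to a locally running server and return the full reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt, model),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating over the response yields one line of bytes at a time.
        return collect_response(resp)
```

With the server running, calling generate("Why is the sky blue?") should return the whole answer as a single string. The parsing is kept in a separate function so it can be checked without a network connection.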
If you run into issues with higher quantization levels, try using the q4 model or shutting down other programs that are using a lot of memory.
Chat models are fine-tuned for chat/dialogue use cases. They are the default in Ollama and are also available under tags containing -chat in the tags tab.
Example: ollama run llama2
Pre-trained models do not have the chat fine-tuning. They are tagged with -text in the tags tab.
Example: ollama run llama2:text
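The choice between the two variants comes down to the model name sent with a request. The sketch below builds a request body for each, using the two names from the examples above. The make_payload helper and the MODEL_TAGS table are hypothetical names introduced for illustration, and the sample prompts only suggest how each variant tends to be used.

```python
import json

# Names taken from the examples above: the bare name selects the chat
# model, and the "text" tag selects the pre-trained model.
MODEL_TAGS = {"chat": "llama2", "text": "llama2:text"}


def make_payload(variant, prompt):
    """Return a JSON request body for the given variant ("chat" or "text")."""
    if variant not in MODEL_TAGS:
        raise ValueError(f"unknown variant: {variant!r}")
    return json.dumps({"model": MODEL_TAGS[variant], "prompt": prompt})


# A chat model is typically given a question or an instruction.
chat_body = make_payload("chat", "Why is the sky blue?")

# A pre-trained model continues whatever text it is given.
text_body = make_payload("text", "The sky is blue because")
```

Either body can be sent to the same endpoint used in the curl example.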
By default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags. The number after the q is the number of bits used for quantization (e.g. q4 means 4-bit quantization). The higher the number, the more accurate the model is, but the slower it runs and the more memory it requires.
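To see why more bits mean more memory, the arithmetic can be sketched as parameters times bits per weight, divided by 8 to get bytes. This is a rough estimate of the weights alone: it ignores the extra data that quantized formats store alongside the weights and the memory used while the model runs, so real figures will be somewhat higher. The 7 billion parameter count is an illustrative assumption, not a figure from the text above.

```python
def approx_weight_gb(num_params, bits_per_weight):
    """Rough size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Illustrative assumption: a model with 7 billion parameters.
NUM_PARAMS = 7e9

for bits in (4, 8, 16):
    size = approx_weight_gb(NUM_PARAMS, bits)
    print(f"{bits}-bit: about {size:.1f} GB")
```

Under these assumptions, doubling the bits per weight doubles the estimated size, which is why falling back to the q4 model helps when memory is tight.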