1.1B-parameter Llama model fine-tuned for chat

ollama run saikatkumardey/tinyllama:Q8_0

curl http://localhost:11434/api/chat \
  -d '{
    "model": "saikatkumardey/tinyllama:Q8_0",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
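The same request can be made from Python without the `ollama` package. Below is a minimal sketch that uses only the standard library and assumes the server's default address. `build_chat_request` and `send_chat` are hypothetical helper names.

The sketch sets `"stream": false`, so the server returns a single JSON object whose `message.content` holds the reply. Without it, the endpoint streams its answer in chunks, which is what the curl call above receives.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default local address, as in the curl example.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, messages, stream=False):
    """Build a POST request equivalent to the curl call (streaming off by default)."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request):
    """Send a non-streaming request and return the assistant's reply text.

    Requires a running Ollama server.
    """
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With a server running, `send_chat(build_chat_request("saikatkumardey/tinyllama:Q8_0", [{"role": "user", "content": "Hello!"}]))` returns the reply as a string.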

from ollama import chat

response = chat(
    model='saikatkumardey/tinyllama:Q8_0',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)
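Each `chat` call is independent: the server does not remember earlier turns, so a multi-turn conversation has to resend the whole message list every time. Below is a minimal sketch of that bookkeeping. The `Conversation` class and its injectable `chat_fn` parameter are hypothetical, added so the history logic can be exercised without a running server. When no `chat_fn` is given, it falls back to `ollama.chat` as in the example above.

```python
class Conversation:
    """Keeps the running message list and resends it on every turn."""

    def __init__(self, model, system=None, chat_fn=None):
        self.model = model
        self.messages = []
        if system is not None:
            self.messages.append({"role": "system", "content": system})
        # Hypothetical hook: any callable with chat(model=..., messages=...) semantics.
        self._chat_fn = chat_fn

    def ask(self, text):
        chat_fn = self._chat_fn
        if chat_fn is None:
            # Imported lazily; requires the ollama package and a running server.
            from ollama import chat as chat_fn
        self.messages.append({"role": "user", "content": text})
        response = chat_fn(model=self.model, messages=self.messages)
        reply = response.message.content
        # Record the assistant turn so the next call includes it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

For example, `Conversation('saikatkumardey/tinyllama:Q8_0').ask('Hello!')` sends one user turn. A second `ask` call then sends the user turn, the assistant's reply and the new question.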

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'saikatkumardey/tinyllama:Q8_0',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)

Details

Updated 2 years ago


1f8f12e9b667 · 1.2GB

Layer	Content	Size
model	arch llama · parameters 1.1B · quantization Q8_0	1.2GB
system	"""You are a helpful assistant."""	34B
template	<|im_start|>system {{ .System }}<|im_end|> <|im_start|>user {{ .Prompt }}<|im_end|> <|im_start|>assi	107B
params	{ "stop": [ "<|im_start|>", "<|im_end|>" ], "temperature": 0.7 }	76B
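The `template` and `params` layers describe ChatML prompting. Each turn is wrapped in `<|im_start|>ROLE ... <|im_end|>` markers, and generation stops when the model emits either marker.

Below is a minimal sketch of both steps. It rests on two assumptions:

- The template uses the conventional ChatML line breaks (the listing above shows it on one line).
- The truncated template ends by opening an assistant turn.

`render_chatml` and `apply_stop` are illustrative names. Ollama performs these steps itself, so the sketch is only meant to show what the layers do.

```python
# Stop sequences copied from the params layer above.
STOP = ["<|im_start|>", "<|im_end|>"]


def render_chatml(system, prompt):
    """Assemble a ChatML prompt (assumed layout: newline after each role name
    and after each <|im_end|>, ending with an open assistant turn)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def apply_stop(text, stops=STOP):
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]
```

For instance, `render_chatml("You are a helpful assistant.", "Hello!")` yields the prompt the model continues. `apply_stop` then trims anything after the model closes its own turn.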

Readme

tinyllama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

This chat model is fine-tuned on OpenAssistant/oasst_top1_2023-08-25 using the ChatML prompt format.

This model is based on an intermediate snapshot trained on 1T tokens.

Note: the models will be updated as new snapshots are released.

Get Started with TinyLlama

CLI

ollama run saikatkumardey/tinyllama

API

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "saikatkumardey/tinyllama:latest",
  "prompt": "Why is the sky blue?"
}'
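By default, `/api/generate` streams its answer as newline-delimited JSON objects. Each object carries a `response` text fragment, and the last one has `"done": true`. Below is a minimal sketch that joins those fragments into the full answer. `collect_generate_stream` is a hypothetical helper, and the sample lines in the usage note are illustrative, not captured server output.

```python
import json


def collect_generate_stream(lines):
    """Join the "response" fragments of a streamed /api/generate reply.

    Accepts str or bytes lines (e.g. iterating an HTTP response object).
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines, if any
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Against a live server, you can pass the object returned by `urllib.request.urlopen(...)` for the request above, since iterating it yields one byte line per JSON object. Offline, a list such as `['{"response": "The sky", "done": false}', '{"response": " is blue.", "done": true}']` gives `"The sky is blue."`.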

Memory Requirements

Model	Memory
tinyllama	3.4G
tinyllama:Q6_K	3.4G
tinyllama:Q8_0	3.67G
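The 1.2GB size listed for the Q8_0 weights can be checked with a quick estimate: size ≈ parameters × bits per weight / 8. Below is a minimal sketch under three assumptions:

- The 1.1B parameter count is taken as exact.
- Every tensor is treated as quantized (small tensors such as norms are ignored).
- The bits-per-weight figures come from the GGML block layouts named in the comments.

`weight_size_gb` is an illustrative name. The memory figures in the table are well above these file sizes. Runtime allocations such as the context cache presumably account for the gap, but this page does not break it down.

```python
# Bits per weight implied by GGML quantization block layouts:
#   Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights -> 8.5 bits
#   Q6_K: 210 bytes per 256-weight super-block                       -> 6.5625 bits
BITS_PER_WEIGHT = {"Q8_0": 34 * 8 / 32, "Q6_K": 210 * 8 / 256}


def weight_size_gb(n_params, quant):
    """Approximate weight-file size in decimal gigabytes."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9
```

`weight_size_gb(1.1e9, "Q8_0")` comes to about 1.17 GB, in line with the 1.2GB model layer. The same estimate gives about 0.90 GB for Q6_K.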