olfh/teuken-7b-instruct-commercial-v0.4:7b

(unofficial) An instruction-tuned 7B-parameter multilingual large language model (LLM), pre-trained on 4T tokens in all 24 official EU languages and released as part of the OpenGPT-X research project.

Details

Updated 1 year ago

17c79ef84136 · 5.0GB

model

arch: llama
parameters: 7.45B
quantization: Q4_K_M
size: 5.0GB

template

{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if .Prompt }}<|im_start|>user

156B

system

Cutting knowledge date: February 2023 You are an artificial intelligence assistant and you chat with

786B

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

params

{ "num_ctx": 4096, "stop": [ "<|im_start|>", "<|im_end|>" ], "temper

92B
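As a rough illustration of how these settings map onto a request, the sketch below builds a body for Ollama's chat endpoint using only the values visible in the params preview above (num_ctx and the two stop tokens). The endpoint URL assumes a default local Ollama install, and the helper names build_chat_request and chat are hypothetical, not part of this build.

```python
import json
import urllib.request

MODEL = "olfh/teuken-7b-instruct-commercial-v0.4:7b"
# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(user_text, system_text=None):
    """Build the JSON body for a non-streaming chat request (hypothetical helper)."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": MODEL,
        "messages": messages,
        "stream": False,
        # Mirrors only the values visible in the truncated params preview.
        "options": {"num_ctx": 4096, "stop": ["<|im_start|>", "<|im_end|>"]},
    }


def chat(user_text, system_text=None, url=OLLAMA_URL):
    """POST the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(user_text, system_text)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running and the model pulled, a call such as chat("Fasse diesen Text in zwei Sätzen zusammen: ...") should return the model's reply as a string. The options passed here repeat what the build already sets, so they can be dropped if the defaults are acceptable.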

Teuken-7B-instruct-commercial-v0.4 is an instruction-tuned 7B-parameter multilingual large language model (LLM), pre-trained on 4T tokens in all 24 official EU languages and released under Apache 2.0 as part of the OpenGPT-X research project.

To learn more about Teuken-7B-instruct-commercial-v0.4, visit Hugging Face.

Primary use cases

The model is designed for commercial and research use in all 24 official EU languages. Because it focuses on covering all of these languages, it provides more stable results across them and better reflects European values in its answers than English-centric models. It is therefore best suited to multilingual tasks.

Primary use cases include:

RAG (retrieval augmented generation) applications
Summarizing texts
Text generation
Information extraction from texts
Chatbot
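For the RAG use case, one possible way to assemble a prompt is sketched below: retrieved passages are numbered and placed ahead of the question in a single user message. The function name build_rag_messages, the instruction wording, and the max_chars budget are all assumptions made for illustration; the budget is only a crude character-based stand-in for staying inside the 4096-token context.

```python
def build_rag_messages(question, passages, max_chars=6000):
    """Return chat messages asking the model to answer only from the passages.

    Hypothetical helper: max_chars is a rough character budget, not a token count.
    """
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        # Stop adding passages once the character budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    user_content = (
        "Answer the question using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )
    return [{"role": "user", "content": user_content}]
```

The returned list has the shape expected by chat-style APIs, so it can be passed as the messages field of a chat request. Passages are assumed to arrive already ranked, since the loop keeps the earliest ones when the budget runs out.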

The official 24 EU languages are:

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.

Out-of-scope use cases

The model is not completely free from biases and hallucinations. It may generate content that is inappropriate, offensive, or harmful. While the dataset has been filtered to minimize such outputs, the model may still produce text that is biased or toxic due to the large scale and diverse nature of the data.

Use in any manner that violates applicable laws or regulations (including trade compliance laws) is prohibited. Use in any way other than that permitted by the Apache 2.0 license is also prohibited.

Out-of-scope use cases include:

Math tasks
Coding tasks
Use in languages that are not among the 24 EU languages
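Since use outside the 24 EU languages is out of scope, an application might guard its input language before calling the model. The following is a minimal sketch of such a check; the names SUPPORTED_LANGUAGES and is_supported_language are hypothetical, and it assumes the caller already knows the language name (it does no language detection).

```python
# The 24 official EU languages listed on this page, lowercased for matching.
SUPPORTED_LANGUAGES = frozenset({
    "bulgarian", "croatian", "czech", "danish", "dutch", "english",
    "estonian", "finnish", "french", "german", "greek", "hungarian",
    "irish", "italian", "latvian", "lithuanian", "maltese", "polish",
    "portuguese", "romanian", "slovak", "slovenian", "spanish", "swedish",
})


def is_supported_language(name):
    """Return True if the English language name is one of the 24 EU languages."""
    return name.strip().lower() in SUPPORTED_LANGUAGES
```

A caller could, for example, route requests for which is_supported_language returns False to a different model or reject them with a clear message.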

Disclaimer

I publish Teuken-7B-instruct-commercial-v0.4 on Ollama solely as an independent user, without any official affiliation with, endorsement by, or connection to its creators or developers. This distribution complies with the terms and conditions of the Apache License 2.0, which permits use, distribution, and modification of the software under its specified conditions. The original creators of Teuken-7B assume no responsibility for this release, its use, or any modifications made by me.

Source Code

The source code for this Ollama build of Teuken-7B can be found on GitHub.