1,025 pulls · Updated 1 year ago

(unofficial) An instruction-tuned 7B-parameter multilingual large language model (LLM) pre-trained on 4T tokens covering all 24 official European languages and released in the research project OpenGPT-X.

ollama run olfh/teuken-7b-instruct-commercial-v0.4:7b
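Besides the interactive `ollama run` session, the model can be queried from code through a local Ollama server. The following is a minimal, unofficial sketch using only the Python standard library. It assumes an Ollama server at its default address `http://localhost:11434` with this model already pulled; the helper names `build_chat_payload` and `chat` are hypothetical. The `num_ctx` value of 4096 mirrors the model's published params.

```python
import json
import urllib.request
from typing import Optional

# Model tag taken from the `ollama run` command above.
MODEL = "olfh/teuken-7b-instruct-commercial-v0.4:7b"
# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(user_message: str, system: Optional[str] = None) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    return {
        "model": MODEL,
        "messages": messages,
        # Ask for one JSON object instead of a stream of chunks.
        "stream": False,
        # Mirrors the num_ctx value in the model's published params.
        "options": {"num_ctx": 4096},
    }


def chat(user_message: str, system: Optional[str] = None) -> str:
    """Send one chat turn to the local server and return the reply text."""
    body = json.dumps(build_chat_payload(user_message, system)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Non-streaming responses carry the assistant turn under "message".
    return reply["message"]["content"]
```

For example, `chat("Fasse in einem Satz zusammen: Der Rhein fließt durch Köln und Düsseldorf.")` returns the model's answer as a plain string. Keeping payload construction separate from the HTTP call makes the request easy to inspect or log without a running server.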

Details


17c79ef84136 · 5.0GB · llama · 7.45B · Q4_K_M
template: {{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if .Prompt }}<|im_start|>user …
system: Cutting knowledge date: February 2023 You are an artificial intelligence assistant and you chat with …
license: Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US…
params: { "num_ctx": 4096, "stop": [ "<|im_start|>", "<|im_end|>" ], "temper…
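The template preview above is truncated and its line breaks are collapsed, but it appears to follow the ChatML convention of `<|im_start|>role … <|im_end|>` blocks. As an illustration only, this Python sketch renders a single-turn prompt in that style and cuts raw output at the two stop tokens listed in the params. It assumes the collapsed spaces are newlines and that the elided part of the template closes the user turn and opens an `assistant` turn; `render_prompt` and `truncate_at_stop` are hypothetical helpers. Ollama applies the real template itself when you use `ollama run` or its chat API.

```python
from typing import Optional

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
# The two stop sequences listed in the model's params.
STOP_TOKENS = (IM_START, IM_END)


def render_prompt(prompt: str, system: Optional[str] = None) -> str:
    """Approximate the model's chat template for one user turn.

    Assumption: newlines separate the role name from the content, and the
    truncated remainder of the template opens an assistant turn.
    """
    parts = []
    if system:
        parts.append(f"{IM_START}system\n{system}{IM_END}\n")
    parts.append(f"{IM_START}user\n{prompt}{IM_END}\n")
    # The model continues generating from here.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def truncate_at_stop(text: str) -> str:
    """Cut raw model output at the earliest stop token, if any."""
    cut = len(text)
    for token in STOP_TOKENS:
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

Stopping on both tokens means generation ends when the model closes its own turn (`<|im_end|>`) or starts to write a new one (`<|im_start|>`).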

Readme

Teuken-7B-instruct-commercial-v0.4 is an instruction-tuned 7B-parameter multilingual large language model (LLM) pre-trained on 4T tokens covering all 24 official European languages and released under Apache 2.0 in the research project OpenGPT-X.


To learn more about Teuken-7B-instruct-commercial-v0.4, visit Hugging Face.

Primary use cases

The model is designed for commercial and research use in all 24 official European languages. Because it focuses on covering all 24 EU languages, it gives more consistent results across them and better reflects European values in its answers than English-centric models. It is therefore well suited to multilingual tasks.

Primary use cases include:

  • RAG (retrieval-augmented generation) applications
  • Text summarization
  • Text generation
  • Information extraction from texts
  • Chatbots
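To illustrate the RAG use case, this Python sketch packs retrieved passages into a chat message list. It is a hypothetical example, not part of this release: the word-overlap `retrieve` function is a toy stand-in for a real retriever such as embedding search, and `build_rag_messages` simply produces a list shaped like the `messages` field of Ollama's chat API.

```python
import re
from typing import Dict, List, Set


def _tokens(text: str) -> Set[str]:
    # \w is Unicode-aware in Python 3, so accented letters are kept.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Rank passages by word overlap with the query (toy retriever)."""
    query_tokens = _tokens(query)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(
        passages,
        key=lambda passage: len(query_tokens & _tokens(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_rag_messages(
    query: str, passages: List[str], k: int = 2
) -> List[Dict[str, str]]:
    """Put the top-k passages in the system turn and the question in the user turn."""
    context = "\n\n".join(
        f"[{i + 1}] {passage}"
        for i, passage in enumerate(retrieve(query, passages, k))
    )
    system = (
        "Answer using only the numbered context passages below. "
        "If the answer is not in them, say so.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]
```

Numbering the passages lets the prompt ask the model to cite its sources, and restricting answers to the supplied context is a common way to reduce hallucinations in RAG setups.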

The official 24 EU languages are:

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.
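Because use in other languages is out of scope, an application may want to check a request's declared language before sending it to the model. This is a hypothetical Python helper that maps the 24 languages above to their ISO 639-1 codes and accepts tags such as `de` or `pt-BR` by their primary subtag.

```python
# ISO 639-1 codes for the 24 official EU languages listed above.
EU_LANGUAGES = {
    "bg": "Bulgarian", "hr": "Croatian", "cs": "Czech", "da": "Danish",
    "nl": "Dutch", "en": "English", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "de": "German", "el": "Greek", "hu": "Hungarian",
    "ga": "Irish", "it": "Italian", "lv": "Latvian", "lt": "Lithuanian",
    "mt": "Maltese", "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "sk": "Slovak", "sl": "Slovenian", "es": "Spanish", "sv": "Swedish",
}


def is_supported_language(tag: str) -> bool:
    """Return True if the tag's primary subtag is one of the 24 EU codes."""
    # Accept both "pt-BR" and "pt_BR" style tags, case-insensitively.
    primary = tag.strip().lower().replace("_", "-").split("-")[0]
    return primary in EU_LANGUAGES
```

This only checks a declared language tag; detecting the language of free text would need a separate language-identification step.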

Out-of-scope use cases

The model is not completely free from biases and hallucinations. It may generate content that is inappropriate, offensive, or harmful. While the dataset has been filtered to minimize such outputs, the model may still produce text that is biased or toxic due to the large scale and diverse nature of the data.

Use in any manner that violates applicable laws or regulations (including trade compliance laws) is prohibited, as is any use not permitted by the Apache 2.0 license.

Out-of-scope use cases include:

  • Math tasks
  • Coding tasks
  • Use in languages that are not among the 24 EU languages

Disclaimer

This Ollama release of Teuken-7B-Instruct-Commercial-v0.4 was published solely by me, an independent user with no official affiliation with, endorsement by, or connection to its creators or developers. It is distributed in compliance with the terms and conditions of the Apache License 2.0, which permits use, distribution, and modification of the software under the conditions it specifies. The original creators of Teuken-7B assume no responsibility for this release, its use, or any modifications I have made.

Source Code

The source code for this Ollama build of Teuken-7B can be found on GitHub.