BgGPT is a Bulgarian language model built on top of Google’s Gemma 2.


BgGPT

Meet BgGPT, a Bulgarian language model built on top of Google’s Gemma 2. BgGPT is distributed under Gemma Terms of Use.

Versions 0.1 and 0.2 of the model were built on top of Mistral 0.1 and 0.2.

This model was created by INSAIT Institute, part of Sofia University, in Sofia, Bulgaria.

Model description

The model was built on top of Google’s Gemma 2 2B, 9B and 27B open models. It was continuously pre-trained on around 100 billion tokens (85 billion in Bulgarian) using the Branch-and-Merge strategy INSAIT presented at EMNLP’24, allowing the model to gain outstanding Bulgarian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we used various datasets, including Bulgarian web crawl data, freely available datasets such as Wikipedia, a range of specialized Bulgarian datasets sourced by the INSAIT Institute, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Bulgarian instruction dataset created from real-world conversations. For more information, check our blog post.

Usage

CLI

ollama run todorov/bggpt
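
A prompt can also be passed directly on the command line for a one-off generation. A minimal sketch (the Bulgarian prompt asks "When was Sofia University founded?"):

ollama run todorov/bggpt "Кога е основан Софийският университет?"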

API

Example:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "todorov/bggpt",
  "prompt": "Кога е основан Софийският университет?"
}'
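
The generate endpoint streams partial responses by default. For multi-turn conversations, Ollama also exposes a chat endpoint; a minimal sketch, where "stream": false requests a single complete JSON response and the prompt again asks "When was Sofia University founded?":

curl -X POST http://localhost:11434/api/chat -d '{
  "model": "todorov/bggpt",
  "messages": [
    { "role": "user", "content": "Кога е основан Софийският университет?" }
  ],
  "stream": false
}'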

References

BgGPT
BgGPT-Gemma-2-2.6B-IT-v1.0 on Hugging Face
BgGPT-Gemma-2-9B-IT-v1.0 on Hugging Face
BgGPT-Gemma-2-27B-IT-v1.0 on Hugging Face