todorov/bggpt

1,781 downloads · Updated 11 months ago

BgGPT is a Bulgarian language model built on top of Google’s Gemma 2.

Models

24 models

Name · Size · Context · Input · Updated
bggpt:latest · 5.5GB · 8K · Text · 11 months ago
bggpt:v0.1 · 4.4GB · 32K · Text · 1 year ago
bggpt:v0.2 · 4.4GB · 32K · Text · 1 year ago
bggpt:v1.0 (latest) · 5.5GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.F16 · 5.2GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.Q4_K_M · 1.7GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.Q4_K_S · 1.6GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.Q5_K_M · 1.9GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.Q5_K_S · 1.9GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.Q6_K · 2.2GB · 8K · Text · 11 months ago
bggpt:2B-IT-v1.0.Q8_0 · 2.8GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.F16 · 18GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.Q4_K_M · 5.8GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.Q4_K_S · 5.5GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.Q5_K_M · 6.6GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.Q5_K_S · 6.5GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.Q6_K · 7.6GB · 8K · Text · 11 months ago
bggpt:9B-IT-v1.0.Q8_0 · 9.8GB · 8K · Text · 11 months ago
bggpt:27B-IT-v1.0.Q4_K_M · 17GB · 8K · Text · 11 months ago
bggpt:27B-IT-v1.0.Q4_K_S · 16GB · 8K · Text · 11 months ago
bggpt:27B-IT-v1.0.Q5_K_M · 19GB · 8K · Text · 11 months ago
bggpt:27B-IT-v1.0.Q5_K_S · 19GB · 8K · Text · 11 months ago
bggpt:27B-IT-v1.0.Q6_K · 22GB · 8K · Text · 11 months ago
bggpt:27B-IT-v1.0.Q8_0 · 29GB · 8K · Text · 11 months ago

Readme

BgGPT

Meet BgGPT, a Bulgarian language model built on top of Google’s Gemma 2. BgGPT is distributed under the Gemma Terms of Use (https://ai.google.dev/gemma/terms).

Versions 0.1 and 0.2 of the model were built on top of Mistral 0.1 and 0.2.

This model was created by the INSAIT Institute (https://insait.ai/), part of Sofia University (https://www.uni-sofia.bg/index.php/eng), in Sofia, Bulgaria.

Model description

The models were built on top of Google’s Gemma 2 2B, 9B and 27B open models. They were continually pre-trained on around 100 billion tokens (85 billion of them in Bulgarian) using the Branch-and-Merge strategy that INSAIT presented at EMNLP’24 (https://aclanthology.org/2024.findings-emnlp.1000/), which allowed them to gain strong Bulgarian cultural and linguistic capabilities while retaining their English performance. During the pre-training stage, we used various datasets, including Bulgarian web crawl data, freely available datasets such as Wikipedia, a range of specialized Bulgarian datasets sourced by the INSAIT Institute, and machine translations of popular English datasets. The models were then instruction-fine-tuned on a newly constructed Bulgarian instruction dataset created using real-world conversations. For more information, see our blog post (https://models.bggpt.ai/blog/).

Usage

CLI

ollama run todorov/bggpt
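
The command above runs the default `latest` tag. To run one of the other builds from the Models list, append its tag to the model name. The following is a minimal sketch, assuming the `ollama` CLI is on your PATH; the helper names `bggpt_tag` and `bggpt_run` are hypothetical and not part of Ollama.

```shell
#!/bin/sh
# Hypothetical helper: compose a tag in the "<size>-IT-v1.0.<quant>" pattern
# used by the v1.0 entries in the Models list.
bggpt_tag() {
  printf 'todorov/bggpt:%s-IT-v1.0.%s\n' "$1" "$2"
}

# Hypothetical helper: start an interactive session with the chosen build.
# `ollama run` downloads the model first if it is not already present locally.
bggpt_run() {
  model=$(bggpt_tag "$1" "$2")
  ollama run "$model"
}
```

For example, `bggpt_run 9B Q4_K_M` would run `todorov/bggpt:9B-IT-v1.0.Q4_K_M`. Not every size and quantization combination is published (the list has no 27B F16 build, for instance), so check the Models list before choosing a tag.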

API

Example:

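# Request a completion from the local Ollama server (default port 11434).
# The prompt asks, in Bulgarian, "When was Sofia University founded?"
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "todorov/bggpt",
  "prompt": "Кога е основан Софийският университет?"
}'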
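
By default, `/api/generate` streams its answer as a sequence of JSON objects. If a single JSON object is easier to handle, the request can set `"stream": false`. The sketch below illustrates this under some assumptions: the helper names `bggpt_payload` and `bggpt_generate` are hypothetical, the server is assumed to listen on the default local port, and the prompt is assumed to need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print a /api/generate request body for a prompt.
# Assumes the prompt contains no double quotes, backslashes, or newlines,
# because no JSON escaping is performed here.
bggpt_payload() {
  printf '{"model": "todorov/bggpt", "prompt": "%s", "stream": false}\n' "$1"
}

# Hypothetical helper: send the prompt to a local Ollama server.
# `-d @-` tells curl to read the request body from standard input.
bggpt_generate() {
  bggpt_payload "$1" | curl -s http://localhost:11434/api/generate -d @-
}
```

Calling `bggpt_generate "Кога е основан Софийският университет?"` should then return one JSON object whose `response` field holds the generated text.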

References

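BgGPT: https://bggpt.ai/
BgGPT-Gemma-2-2.6B-IT-v1.0 on Hugging Face: https://huggingface.co/INSAIT-Institute/BgGPT-Gemma-2-2.6B-IT-v1.0
BgGPT-Gemma-2-9B-IT-v1.0 on Hugging Face: https://huggingface.co/INSAIT-Institute/BgGPT-Gemma-2-9B-IT-v1.0
BgGPT-Gemma-2-27B-IT-v1.0 on Hugging Face: https://huggingface.co/INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0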

