
BgGPT-v1.0 is a Bulgarian language model based on Google's Gemma 2 architecture. It is free to use and distributed under the Gemma Terms of Use. The model was developed by INSAIT, part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.
It was built on top of Google's Gemma 2 open models through continued pre-training on approximately 100 billion tokens (85 billion in Bulgarian) using the Branch-and-Merge strategy. This training allows the model to develop Bulgarian cultural and linguistic capabilities while maintaining its English performance.
The pre-training used a variety of datasets, including Bulgarian web crawl data, Wikipedia, specialized Bulgarian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a Bulgarian instruction dataset created from real-world conversations.


The model has been evaluated on standard English benchmarks, their Bulgarian translations, and Bulgarian-specific benchmarks.
Performance comparisons show that the model is competitive with other small open language models while retaining the English performance of the original Gemma 2 base models.
Multiple model sizes and quantizations are available:
| Model | Size | Context | Quantization |
|---|---|---|---|
| BgGPT-v1.0:2.6b | 1.7GB | 8K | Q4_K_M |
| BgGPT-v1.0:2.6b-q8 | 2.8GB | 8K | Q8_0 |
| BgGPT-v1.0:9b | 5.8GB | 8K | Q4_K_M |
| BgGPT-v1.0:9b-q8 | 9.8GB | 8K | Q8_0 |
| BgGPT-v1.0:27b | 17GB | 8K | Q4_K_M |
| BgGPT-v1.0:27b-q8 | 29GB | 8K | Q8_0 |
To use this model with Ollama, you can pull it using:
# 2.6B model
ollama pull s_emanuilov/BgGPT-v1.0:2.6b
# 9B model
ollama pull s_emanuilov/BgGPT-v1.0:9b
# 27B model
ollama pull s_emanuilov/BgGPT-v1.0:27b
# Q8 quantized versions (higher quality)
ollama pull s_emanuilov/BgGPT-v1.0:2.6b-q8
ollama pull s_emanuilov/BgGPT-v1.0:9b-q8
ollama pull s_emanuilov/BgGPT-v1.0:27b-q8
Then run it with:
ollama run s_emanuilov/BgGPT-v1.0:2.6b
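You can also call the model programmatically. The sketch below uses the `ollama` Python client (installed with `pip install ollama`) and assumes a local Ollama server with one of the tags above already pulled; the example prompt asks, in Bulgarian, when Sofia University was founded.

```python
# Minimal sketch: query BgGPT through a local Ollama server
# using the `ollama` Python client (pip install ollama).
import ollama

response = ollama.chat(
    model="s_emanuilov/BgGPT-v1.0:2.6b",  # any tag from the table above
    messages=[
        # "When was Sofia University founded?"
        {"role": "user", "content": "Кога е основан Софийският университет?"},
    ],
)

print(response["message"]["content"])
```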
To leverage the instruction fine-tuning, your prompt should begin with the beginning-of-sequence token `<bos>` and follow the Gemma 2 chat template. `<bos>` should appear only as the first token in a chat sequence.
For example:
<bos><start_of_turn>user
Кога е основан Софийският университет?<end_of_turn>
<start_of_turn>model
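Ollama normally applies the model's built-in prompt template when you use `ollama run` or the chat API, so the raw format above mainly matters if you construct the prompt yourself. A minimal sketch of that case, using the Python client's `generate()` call with `raw=True` (which tells Ollama not to apply its own template):

```python
# Minimal sketch: send a pre-formatted Gemma 2 chat prompt directly,
# bypassing Ollama's own template with raw=True.
import ollama

prompt = (
    "<bos><start_of_turn>user\n"
    "Кога е основан Софийският университет?<end_of_turn>\n"
    "<start_of_turn>model\n"
)

response = ollama.generate(
    model="s_emanuilov/BgGPT-v1.0:2.6b",
    prompt=prompt,
    raw=True,  # do not wrap the prompt in the model's template
)

print(response["response"])
```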
For optimal performance, we recommend the following text-generation parameters, which we have tested extensively with the model: