RN_TR_R2 is a Turkish-language reasoning model fine-tuned from Turkish-Llama-8B using GRPO. It excels in STEM and cultural Q&A tasks, scoring 82.4% on benchmarks. Ideal for education-focused reasoning in Turkish.


Model Overview

RefinedNeuro/RN_TR_R2 is a Turkish-language reasoning model fine-tuned from ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 with the GRPO method on Q&A data distilled from the Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.


Evaluation


Benchmark: RN_TR_R2_Benchmark_Results (Open-ended Turkish culture & reasoning questions)

| Model Name | Score (%) | Description |
|---|---|---|
| RefinedNeuro/RN_TR_R2 | 82.4 | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN_TR_R1 | 64.8 | Baseline Turkish-fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta's Llama-3.1-8B with instruct tuning |

RN_TR_R2 achieves a 17.6-percentage-point improvement over the RN_TR_R1 baseline.


Intended Use

  • Primary use cases: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
  • Not recommended: Generating creative fiction or tasks requiring memorized facts outside its training scope.
  • Known limitations: No systematic failure modes have been identified, but like all LLMs, RN_TR_R2 may hallucinate.

Training Data

  • Base model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1

  • Fine-tuning dataset: RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset

    • Domains: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
    • Grade levels: 6th–12th grade
    • Prompt variations (e.g., long/detailed vs. short/easy; answer must be a single number)
    • Size: 13,000 examples (~9 million tokens)
    • License: Creative Commons
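
If you want to inspect the fine-tuning data itself, it can be pulled straight from the Hugging Face Hub; a minimal sketch (the "train" split name is an assumption):

```python
from datasets import load_dataset

# Pull the distilled Q&A dataset (repo id from the bullet above; split name assumed).
ds = load_dataset("RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset", split="train")

print(ds)      # number of rows and column names
print(ds[0])   # one distilled question/answer example
```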

Training Configuration

  • Method: GRPO (Group Relative Policy Optimization) via the Unsloth library
  • Hardware: 2× RTX 3090 GPUs for 1 week (~336 GPU-hours)
  • Batch size: 1
  • Context length: 4,096 tokens (max_seq_length=8,192 at inference)
  • Optimizer & scheduler: Cosine LR schedule, peak LR=5×10⁻⁶
  • Library: unsloth==2025.3.18
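
The configuration above corresponds to a GRPO fine-tuning run; the sketch below is a minimal, illustrative reconstruction, assuming TRL's GRPOTrainer on top of an Unsloth-loaded base model, plain-text "prompt"/"answer" columns in the dataset, and a toy exact-match reward. It is not the authors' actual training script.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model with LoRA adapters (context length from the section above).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Assumed schema: a "prompt" column with the question and an "answer" column
# holding the reference (often a single number, per the dataset description).
dataset = load_dataset("RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset", split="train")

def exact_match_reward(completions, answer, **kwargs):
    # Toy reward: 1.0 if the reference answer string appears in the completion.
    return [1.0 if str(a) in c else 0.0 for c, a in zip(completions, answer)]

args = GRPOConfig(
    output_dir="rn_tr_r2-grpo",
    learning_rate=5e-6,               # peak LR from the section above
    lr_scheduler_type="cosine",
    per_device_train_batch_size=4,    # GRPO needs the batch divisible by num_generations
    num_generations=4,                # (the authors report batch size 1)
    max_prompt_length=1024,
    max_completion_length=3072,       # prompt + completion ≈ the 4,096-token context
)

trainer = GRPOTrainer(
    model=model,
    reward_funcs=exact_match_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```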

Usage Example

```
ollama run RefinedNeuro/RN_TR_R2
```
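
Beyond the CLI, the model can be queried programmatically once it has been pulled; a minimal sketch using the ollama Python client against a locally running Ollama server:

```python
import ollama

# Ask the sample question below in Turkish:
# "Find the derivative of f(x) = 3x⁴ - 2x³ + 5x - 7 at x = 1."
response = ollama.chat(
    model="RefinedNeuro/RN_TR_R2",
    messages=[{
        "role": "user",
        "content": "f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.",
    }],
)
print(response["message"]["content"])  # includes the <think>...</think> reasoning block
```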

Sample Q&A

**Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.

**A:**

<think>

1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir.

2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.

f(x) = 3x⁴ - 2x³ + 5x - 7

3. Adım: Türevi bulmak için her terimi ayrı ayrı türevini buluyoruz.

- 3x⁴: Türevi 12x³'dür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'dir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).

4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz.

f'(x) = 12x³ - 6x² + 5

5. Adım: Şimdi x = 1 yerine koyarak f'(x)'nin türevini hesaplayacağız.

f'(1) = 12(1)³ - 6(1)² + 5

f'(1) = 12(1) - 6(1) + 5

f'(1) = 12 - 6 + 5

f'(1) = 11

6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.

\boxed{11}

</think>

\boxed{11}
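
Because completions wrap the reasoning in <think> tags and give the result in \boxed{…}, downstream code usually wants to separate the two; a small post-processing sketch, assuming the output format shown above:

```python
import re

# Example completion in the format shown above (heavily truncated).
model_output = "<think>... f'(1) = 12 - 6 + 5 = 11 ... \\boxed{11}</think>\n\n\\boxed{11}"

def split_reasoning(output: str):
    """Split a completion into (reasoning, final_answer), assuming the
    <think>...</think> + \\boxed{...} format demonstrated above."""
    think = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else output
    boxed = re.findall(r"\\boxed\{([^}]*)\}", output)
    return reasoning, boxed[-1] if boxed else None

reasoning, answer = split_reasoning(model_output)
print(answer)  # -> 11
```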

License

Apache 2.0, a permissive open-source license.


Authors

  • Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro