
Phibek-4B is a compact Thai language model developed by Hectocorn Labs, built on Qwen3.5-4B. Designed for practical deployment, it focuses on clear instruction-following and efficient Thai language understanding for real-world applications.

ollama run hectocorn-labs/phibek-4b

Details

Digest         2db3b3c4358a
Size           2.7GB
Architecture   qwen35
Parameters     4.21B
Quantization   Q4_K_M

Default parameters: num_ctx 4096, num_predict 1024, repeat_last_n 256, repeat_penalty 1.15
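The default parameters can be overridden locally with a custom Modelfile. This is a sketch following Ollama's Modelfile syntax; the derived model name my-phibek is our own choice, not part of the release:

```
FROM hectocorn-labs/phibek-4b
PARAMETER num_ctx 4096
PARAMETER num_predict 1024
PARAMETER repeat_last_n 256
PARAMETER repeat_penalty 1.15
```

Then build and run it with: ollama create my-phibek -f Modelfile && ollama run my-phibek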

Readme

Phibek-4B v0.1

Phibek (พิเภค) is a Thai instruction-following model developed by HectocornLabs — a Thai AI company building language technology for Thailand.

Named after Vibhishana (พิเภค) from the Ramayana (รามเกียรติ์), a figure of wisdom and integrity, Phibek is built to be Thailand’s honest, practical AI — deployable locally, usable in real workflows, and grounded in Thai language and culture.


🚀 Try It

ollama run hectocorn-labs/phibek-4b


What Phibek Is Built For

  • 🇹🇭 Thai-first instruction following — designed to understand and respond naturally in Thai
  • Lightweight (4B parameters) — runs locally via Ollama or GGUF on consumer hardware
  • 🧩 Real-world usability — tuned for practical tasks: summarization, translation, drafting, Q&A
  • 🌐 Bilingual — Thai + English, retaining general capability

Evaluation

Phibek-4B is built on Qwen3.5-4B, which demonstrates strong general capability across reasoning, multilingual understanding, and instruction following, as reported in the base model's official benchmarks.

Phibek extends this foundation with targeted Thai-language improvements:

Benchmark          Qwen3.5-4B (base)   Phibek-4B v0.1   Δ
XNLI-TH            37.23%              39.16%           +1.9%
MMLU EN            74.31%              73.92%           -0.4%
Code-switch rate   —                   —                ↓ 25.0%

Internal evaluation on Kaggle T4 (float16). Full Thai benchmark suite (ThaiExam, M3Exam) coming in v0.2.
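The code-switch metric above is not specified in detail; a plausible minimal sketch is a script-based heuristic that flags Thai responses containing embedded Latin-script words (the function names and the exact heuristic here are our assumptions, not the released evaluation code):

```python
import re

def has_code_switch(text: str) -> bool:
    """Heuristic: a Thai response counts as code-switched if it
    contains both Thai characters and a Latin-script word."""
    has_thai = re.search(r"[\u0E00-\u0E7F]", text) is not None
    has_latin_word = re.search(r"[A-Za-z]{2,}", text) is not None
    return has_thai and has_latin_word

def code_switch_rate(responses) -> float:
    """Fraction of responses that mix scripts."""
    return sum(has_code_switch(r) for r in responses) / len(responses)
```

A lower rate means the model stays in Thai instead of drifting into English mid-answer.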


Model Details

Item          Value
Model         Phibek-4B v0.1
Base          Qwen3.5-4B (Apache 2.0)
Fine-tuning   LoRA bf16 (r=16, α=32)
Context       4,096 tokens (Ollama)
Language      Thai + English
License       Apache 2.0
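The fine-tuning row (LoRA with r=16, α=32) can be illustrated with a minimal NumPy sketch of the LoRA update rule; the toy dimensions and random init are ours, only the rank and scaling come from the table:

```python
import numpy as np

r, alpha = 16, 32          # LoRA rank and alpha from the model details
d_out, d_in = 64, 64       # toy layer dims; real model layers are much larger

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (zero-init,
                                        # so training starts from the base model)

# Effective weight: only A and B are trained, scaled by alpha / r
W_eff = W + (alpha / r) * (B @ A)
```

With α=32 and r=16 the low-rank update is scaled by a factor of 2 before being added to the frozen base weights.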

Training Data

23,864 training samples across Thai instruction, general knowledge, and synthetic data:

Source                  Samples   License
Suraponn Thai SFT       9,466     Apache 2.0
Thai Alpaca (cleaned)   4,767     CC BY 4.0
WangchanThaiInstruct    3,773     Apache 2.0 (filtered)
Typhoon synthetic       2,840     Apache 2.0
Tulu-3 SFT Mixture      2,683     ODC-BY
Identity injection      335       —
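The sample counts above imply the following mixture proportions (a quick check, computed directly from the table):

```python
# Sample counts per source, from the training data table
sources = {
    "Suraponn Thai SFT": 9466,
    "Thai Alpaca (cleaned)": 4767,
    "WangchanThaiInstruct": 3773,
    "Typhoon synthetic": 2840,
    "Tulu-3 SFT Mixture": 2683,
    "Identity injection": 335,
}
total = sum(sources.values())  # matches the stated 23,864 samples
shares = {name: round(100 * n / total, 1) for name, n in sources.items()}
```

Roughly 40% of the mixture is the Suraponn Thai SFT set, with the remainder split across the other Thai and English sources.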

Usage

Ollama (recommended for local use)

ollama run hectocorn-labs/phibek-4b

🔗 https://ollama.com/hectocorn-labs/phibek-4b
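Once the model is pulled, Ollama also serves it over a local REST API (default port 11434). A minimal sketch using only the standard library; the endpoint and response shape follow Ollama's /api/chat schema, while the helper names are our own:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_payload(user_message, system=None):
    """Assemble a non-streaming chat request body for /api/chat."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    return {
        "model": "hectocorn-labs/phibek-4b",
        "messages": messages,
        "stream": False,
    }

def chat(user_message, system=None):
    """POST a chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(user_message, system)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

For example, chat("สวัสดีครับ") returns the model's Thai greeting as a plain string, which makes it easy to wire Phibek into local scripts and tools.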


Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hectocorn-labs/Phibek-4B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "คุณคือ Phibek (พิเภค) ผู้ช่วย AI จาก HectocornLabs"},
    {"role": "user", "content": "อธิบายแนวคิด 'เศรษฐกิจพอเพียง' แบบสั้นๆ"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Limitations

  • 4B parameters — smaller than frontier models; capability ceiling applies
  • Thai performance may vary on specialized or technical domains
  • Synthetic training data (Typhoon2.5) introduces some quality variability
  • Identity consistency at 35% pass rate in v0.1 — improving in future releases
  • Chain-of-thought mode is enabled by default; pass enable_thinking=False to apply_chat_template() to disable it

License & Attribution

Released under Apache License 2.0.

Base model: Qwen3.5-4B (Alibaba Cloud, Apache 2.0). Training data sources: WangchanThaiInstruct, Suraponn Thai SFT, Thai Alpaca, Tulu-3, Typhoon2.5-Qwen3-4B (SCB 10X). See NOTICE for full attribution.


HectocornLabs — building AI for Thailand / AI เพื่อคนไทย