Phibek (พิเภค) is a Thai instruction-following model developed by HectocornLabs — a Thai AI company building language technology for Thailand.
Named after Vibhishana (พิเภค) from the Ramayana (รามเกียรติ์), a figure of wisdom and integrity, Phibek is built to be Thailand’s honest, practical AI — deployable locally, usable in real workflows, and grounded in Thai language and culture.
Phibek-4B is built on Qwen3.5-4B, which demonstrates strong general capability across reasoning, multilingual understanding, and instruction following in its official benchmarks.
Phibek extends this foundation with targeted Thai-language improvements:
| Benchmark | Qwen3.5-4B (base) | Phibek-4B v0.1 | Δ (pp) |
|---|---|---|---|
| XNLI-TH | 37.23% | 39.16% | +1.93 ✅ |
| MMLU (EN) | 74.31% | 73.92% | -0.39 |
| Code-switch rate (lower is better) | — | 25.0% | — |
Internal evaluation run on a Kaggle T4 GPU (float16). The full Thai benchmark suite (ThaiExam, M3Exam) is planned for v0.2.
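The code-switch rate tracks how often the model drifts into unintended English inside a Thai response. The exact internal metric is not published, so the snippet below is only a rough sketch of one way such a rate could be computed, assuming it is the fraction of responses that contain Latin-script words outside a small allowlist:

```python
import re

# Hypothetical proxy metric: fraction of Thai responses containing unintended
# Latin-script words. The allowlist and the definition are assumptions, not
# the internal metric used for the table above.
ALLOWLIST = {"AI", "Phibek", "HectocornLabs"}  # terms allowed to stay in English

def code_switch_rate(responses: list[str]) -> float:
    def has_code_switch(text: str) -> bool:
        latin_words = re.findall(r"[A-Za-z]{2,}", text)
        return any(w not in ALLOWLIST for w in latin_words)
    flagged = sum(has_code_switch(r) for r in responses)
    return flagged / max(len(responses), 1)

# Example: 1 of 4 responses mixes English into Thai -> 0.25 (25%)
print(code_switch_rate(["สวัสดีครับ", "ผมช่วยได้ครับ", "นี่คือ answer ของคุณ", "ขอบคุณครับ"]))
```

Model specifications: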
| Item | Value |
|---|---|
| Model | Phibek-4B v0.1 |
| Base | Qwen3.5-4B (Apache 2.0) |
| Fine-tuning | LoRA bf16 (r=16, α=32) |
| Context | 4,096 tokens (Ollama) |
| Language | Thai + English |
| License | Apache 2.0 |
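For context, the LoRA setup listed above (r=16, α=32, bf16) maps roughly onto a standard PEFT configuration. The target modules, dropout, and base-model repo id below are illustrative assumptions, not the released training recipe:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model from the table above; repo id is assumed, loaded in bf16 as listed.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16, device_map="auto"
)

# r and lora_alpha come from the table; everything else is an assumption.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                                         # assumed, not published
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumed, not published
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # sanity check: only adapter weights are trainable
```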
23,864 training samples across Thai instruction, general knowledge, and synthetic data:
| Source | Samples | License |
|---|---|---|
| Suraponn Thai SFT | 9,466 | Apache 2.0 |
| Thai Alpaca (cleaned) | 4,767 | CC BY 4.0 |
| WangchanThaiInstruct | 3,773 | Apache 2.0 (filtered) |
| Typhoon synthetic | 2,840 | Apache 2.0 |
| Tulu-3 SFT Mixture | 2,683 | ODC-BY |
| Identity injection | 335 | — |
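As a rough illustration of how a mixture like this can be assembled with the 🤗 datasets library (file paths below are placeholders, not the actual preprocessing pipeline; sample counts follow the table):

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder local JSONL files standing in for the sources in the table above.
mixture_spec = [
    ("suraponn_thai_sft.jsonl", 9_466),
    ("thai_alpaca_cleaned.jsonl", 4_767),
    ("wangchan_thai_instruct.jsonl", 3_773),
    ("typhoon_synthetic.jsonl", 2_840),
    ("tulu3_sft_mixture.jsonl", 2_683),
    ("identity_injection.jsonl", 335),
]

parts = []
for path, n in mixture_spec:
    ds = load_dataset("json", data_files=path, split="train").shuffle(seed=42)
    parts.append(ds.select(range(min(n, len(ds)))))  # cap at the table's sample count

sft_mixture = concatenate_datasets(parts).shuffle(seed=42)
print(len(sft_mixture))  # expected: 23,864 if every file has enough samples
```

To run the released model locally with Ollama: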
```bash
ollama run hectocorn-labs/phibek-4b
```
🔗 https://ollama.com/hectocorn-labs/phibek-4b
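Ollama also exposes a local REST API, so the model can be called from scripts once the Ollama server is running. A minimal sketch, assuming the default endpoint on localhost:11434 (the num_ctx value mirrors the 4,096-token context listed above):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hectocorn-labs/phibek-4b",
        "messages": [
            # "Briefly explain the concept of the 'sufficiency economy'"
            {"role": "user", "content": "อธิบายแนวคิด 'เศรษฐกิจพอเพียง' แบบสั้นๆ"},
        ],
        "stream": False,
        "options": {"num_ctx": 4096, "temperature": 0.7},
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```

For direct use with Hugging Face Transformers: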
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hectocorn-labs/Phibek-4B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
)

messages = [
    # System: "You are Phibek, an AI assistant from HectocornLabs"
    {"role": "system", "content": "คุณคือ Phibek (พิเภค) ผู้ช่วย AI จาก HectocornLabs"},
    # User: "Briefly explain the concept of the 'sufficiency economy'"
    {"role": "user", "content": "อธิบายแนวคิด 'เศรษฐกิจพอเพียง' แบบสั้นๆ"},
]

# Build the prompt; enable_thinking=False turns off the Qwen3-style thinking mode.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# Some remote-code tokenizers wrap an inner tokenizer; fall back to the wrapper itself.
tok = getattr(tokenizer, "tokenizer", tokenizer)
inputs = tok(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens (everything after the prompt).
print(tok.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Set `enable_thinking=False` in `apply_chat_template()` to disable chain-of-thought mode.

Released under the Apache License 2.0.
Base model: Qwen3.5-4B (Alibaba Cloud, Apache 2.0). Training data sources: WangchanThaiInstruct, Suraponn Thai SFT, Thai Alpaca, Tulu-3, and synthetic data generated with Typhoon2.5-Qwen3-4B (SCB 10X). See NOTICE for full attribution.
HectocornLabs — building AI for Thailand / AI เพื่อคนไทย