medragondot

Sky-T1-32B-Preview

A 32B reasoning model fine-tuned from Qwen2.5-32B-Instruct on 17K training examples, with performance on par with o1-preview.

tools

1,630 Pulls 2 Tags Updated 1 year ago
llama-3.2-3b-thinking

Llama 3.2 3B fine-tuned on Claude- and o1-style thinking.

349 Pulls 2 Tags Updated 1 year ago
llama-3.2-rltp-v2

Llama 3.2 3B trained with Reinforcement Learning with Thought Process (RLTP) to achieve search.

27 Pulls 1 Tag Updated 1 year ago
llama-3.2-rltp

Llama 3.2 3B trained with Reinforcement Learning with Thought Process (RLTP) to achieve search.

20 Pulls 1 Tag Updated 1 year ago