This is a distilled model trained on the latest TieBa dataset, using about 8k samples together with chain-of-thought traces generated by DeepSeek-V3.

b2ad9c47ff5f · 66B
{
  "num_ctx": 4096,
  "stop": [
    "<|end▁of▁sentence|>"
  ]
}
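If the model is served through Ollama, the same values can also be passed per request in the `options` field of the `/api/generate` endpoint. The layout of the params above suggests Ollama, but that is an assumption. The following minimal Python sketch uses only the standard library. The model name `tieba-distill` and the helper `build_generate_request` are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Runtime parameters copied from the params block above.
PARAMS = {
    "num_ctx": 4096,                      # context window in tokens
    "stop": ["<|end▁of▁sentence|>"],      # generation halts at this string
}

def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate.

    Assumes a local Ollama server; `model` is whatever name the model was
    pulled or created under (hypothetical here).
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Per-request options; these mirror the model's stored defaults and
        # would override them if changed.
        "options": dict(PARAMS),
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "tieba-distill" is a hypothetical model name used only for illustration.
req = build_generate_request("tieba-distill", "Hello")
```

To actually send it, pass `req` to `urllib.request.urlopen` and read the `"response"` field of the returned JSON. Since these values are already stored with the model, passing them explicitly is only necessary when you want different ones, for example a larger `num_ctx`.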