
- m3e
  Moka-AI Massive Mixed Embedding
  embedding · 6,618 Pulls · 7 Tags · Updated 1 year ago
- QwQ-32B-0305
  QwQ is the reasoning model of the Qwen series.
  tools · 3,352 Pulls · 1 Tag · Updated 6 months ago
- dmeta-embedding-zh
  Dmeta-embedding is a cross-domain, cross-task, out-of-the-box Chinese embedding model.
  embedding · 2,622 Pulls · 2 Tags · Updated 1 year ago
- gte
  General Text Embeddings (GTE) model from Alibaba DAMO Academy, described in "Towards General Text Embeddings with Multi-stage Contrastive Learning".
  embedding · 2,495 Pulls · 2 Tags · Updated 1 year ago
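The embedding entries above (m3e, dmeta-embedding-zh, gte) are all queried the same way once pulled into Ollama. Below is a minimal sketch, assuming a local Ollama server on its default port and the `gte` tag from this listing; it is an illustration, not an example taken from the catalog, and the exact response fields can vary between Ollama versions.

```python
# Minimal sketch: fetching an embedding from a locally served model.
# Assumes `ollama pull gte` has been run and the Ollama server is listening
# on its default port (11434); not an official example from this listing.
import requests

def embed(text: str, model: str = "gte") -> list[float]:
    # The /api/embeddings endpoint takes a model tag and a prompt and
    # returns a single embedding vector for that prompt.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

if __name__ == "__main__":
    vec = embed("General Text Embeddings example sentence")
    print(len(vec), vec[:5])
```

The same call works for m3e or dmeta-embedding-zh by swapping the model tag; downstream similarity search is then just a cosine distance over the returned vectors.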
- deepseek-v3-UD
  (Unsloth Dynamic Quants) A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
  2,003 Pulls · 3 Tags · Updated 7 months ago
- reader-lm-v2
  ReaderLM-v2 is a 1.5B-parameter language model that converts raw HTML into cleanly formatted Markdown or JSON, with superior accuracy and improved long-context handling.
  1,164 Pulls · 3 Tags · Updated 8 months ago
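Because ReaderLM-v2's whole job is HTML-to-Markdown conversion, using it amounts to a single generate call with the raw HTML as the prompt. A minimal sketch follows, assuming the `reader-lm-v2` tag from this listing and a local Ollama server; the prompt convention is an assumption, not taken from the entry above.

```python
# Minimal sketch: converting raw HTML to Markdown with reader-lm-v2 via the
# Ollama generate endpoint. Assumes the model has been pulled locally and the
# server is on its default port; not an official example from this listing.
import requests

def html_to_markdown(html: str, model: str = "reader-lm-v2") -> str:
    # /api/generate runs a single non-streaming completion and returns the
    # generated text under "response".
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": html, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
    print(html_to_markdown(sample))
```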
- bilibili-index
  A large language model developed in-house by Bilibili; the Index-1.9B series is the lightweight edition of the Index model family.
  909 Pulls · 3 Tags · Updated 1 year ago
- Simplescaling-S1
  s1 is a reasoning model fine-tuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview and exhibits test-time scaling via budget forcing.
  tools · 544 Pulls · 3 Tags · Updated 7 months ago
- Kalomaze-Qwen3-16B-A3B
  Qwen3-16B-A3B is a derivative of Qwen3-30B-A3B by kalomaze.
  tools · 377 Pulls · 3 Tags · Updated 4 months ago
- deepseek-v2.5-1210
  DeepSeek-V2.5-1210 is an upgraded version of DeepSeek-V2.5, offering enhanced mathematical, coding, writing, and reasoning capabilities.
  339 Pulls · 3 Tags · Updated 8 months ago
- rwkv-6-world
  RWKV (pronounced RwaKuv) is an RNN with great LLM performance.
  308 Pulls · 1 Tag · Updated 9 months ago
- Qihoo360-Light-R1-14B-DS
  Light-R1-14B-DS is the state-of-the-art 14B math model, with AIME24 and AIME25 scores of 74.0 and 60.2, outperforming many 32B models.
  294 Pulls · 1 Tag · Updated 6 months ago
- Qihoo360-Light-R1-32B
  Light-R1: Surpassing R1-Distill from Scratch* with $1000 through Curriculum SFT & DPO.
  tools · 207 Pulls · 1 Tag · Updated 6 months ago
- deepseek-r1-UD
  (Unsloth Dynamic Quants) DeepSeek's first-generation reasoning model, with performance comparable to OpenAI o1; this is the full 671B MoE model, not a dense distilled variant.
  189 Pulls · 2 Tags · Updated 7 months ago
- deepseek-v2.5-1210-UD
  (Unsloth Dynamic Quants) DeepSeek-V2.5-1210 is an upgraded version of DeepSeek-V2.5, offering enhanced mathematical, coding, writing, and reasoning capabilities.
  176 Pulls · 3 Tags · Updated 7 months ago
- GLM-4-9B-0414
  The 9B model in the GLM-4-0414 series.
  118 Pulls · 1 Tag · Updated 3 months ago
- Qwen3-UD
  (Unsloth Dynamic 2.0 Quants) Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
  tools · 96 Pulls · 1 Tag · Updated 4 months ago