NB! You are probably here because you want to do programming or create agents locally. To be honest, Devstral running on vllama is currently your best option, especially if you have 24 GB of VRAM. You can find vllama here; it is also capable of running Deepseek R1 blazingly fast.
This version of Deepseek R1 is tweaked to work with Cline and Roo Code: it is compatible with their tool protocol while still honoring its thinking nature.
Here are the most notable capabilities added on top of the vanilla version of Deepseek R1:
☑ it has a 32k context window
☑ it can use built-in tools
☑ it can use MCP tool servers
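As a rough illustration of the tool support, here is a minimal sketch of calling the model through Ollama's standard chat API with a tool definition. It assumes Ollama is running on its default port and that the 7b tag has been pulled; the get_weather tool is purely hypothetical. Depending on how this build's template is wired, tool calls may come back in Ollama's tool_calls field or inline in Cline/Roo Code's own tool format, so verify against the tag you pulled.

# assumes: ollama serve is running locally; get_weather is a hypothetical tool
curl http://localhost:11434/api/chat -d '{
  "model": "tom_himanen/deepseek-r1-roo-cline-tools:7b",
  "messages": [
    { "role": "user", "content": "What is the weather in Helsinki?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "City name" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "stream": false
}'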
When choosing the model size, keep in mind that the context also needs some VRAM headroom. Sometimes smaller is better, and you have to balance the model's capabilities against the space left for the context (a sketch for checking the fit follows the list below).
So choose wisely among these:
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:1.5b # 1.1 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:7b # 4.7 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:8b # 4.9 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:14b # 9.0 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:32b # 20 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:70b # 43 GB
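If you are not sure whether a given tag fits in your VRAM, one way to check (a sketch, assuming a reasonably recent ollama build) is to load it, shrink the context if you need to, and then look at the size and CPU/GPU split that ollama ps reports. The 32b tag and the 16384 context value below are just examples.

ollama run tom_himanen/deepseek-r1-roo-cline-tools:32b
>>> /set parameter num_ctx 16384    # trade context length for VRAM headroom
>>> /bye
ollama ps                           # SIZE and PROCESSOR columns show how much of the model sits on the GPU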
You can also find more variants of the model here. There is an enormous number of different combinations of quantization, model size, temperature and context size.
The 32b version is recommended, but smaller tags can handle some lighter tasks.
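If none of the published combinations suits you, here is a minimal sketch of building your own variant with an Ollama Modelfile; the name my-r1-coder and the parameter values are just examples.

# Modelfile
FROM tom_himanen/deepseek-r1-roo-cline-tools:14b
PARAMETER temperature 0.5
PARAMETER num_ctx 16384

# build and run the custom variant
ollama create my-r1-coder -f Modelfile
ollama run my-r1-coder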
If you have an Nvidia GPU with at least 24 GB of VRAM, you can install vllama, which is a drop-in replacement for ollama inference, just 3-4 times faster.