NB! You are probably here because you want to do programming or create agents locally. To be honest, Devstral running on vllama is currently your best option, especially if you have 24 GB of VRAM. You can find vllama here; it is also capable of running Deepseek R1 blazingly fast.

Deepseek R1 optimized for Cline tool usage

This version of Deepseek R1 is tweaked to work with Cline or Roo Code: it is compatible with their tool protocol while still honoring the model's thinking nature.

Tweaks

Here are the most notable capabilities added on top of the vanilla version of Deepseek R1.
☑ it has a 32k context window
☑ it can use built-in tools (see the request sketch after this list)
☑ it can use MCP tool servers
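
To give an idea of what the tool support looks like under the hood, here is a minimal sketch of a tool-call request against a local Ollama instance. It assumes the default port 11434 and the 7b tag; list_files is a hypothetical tool used only for illustration, not something shipped with the model.

# Minimal tool-call request against a local Ollama instance (default port 11434).
# list_files is a hypothetical tool; Cline / Roo Code normally build these requests for you.
curl http://localhost:11434/api/chat -d '{
  "model": "tom_himanen/deepseek-r1-roo-cline-tools:7b",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Which files are in the src directory?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "list_files",
        "description": "List the files in a directory",
        "parameters": {
          "type": "object",
          "properties": {
            "path": { "type": "string", "description": "Directory to list" }
          },
          "required": ["path"]
        }
      }
    }
  ]
}'

If the model decides to use the tool, the response should contain a tool_calls entry instead of plain content. In everyday use you do not craft these requests yourself; Cline or Roo Code handles them for you.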

Model versions

When choosing the model size, keep in mind that the context also requires some VRAM headroom. Sometimes smaller is better: you need to balance the model's capabilities against the space left for the context (a Modelfile sketch for trimming the context follows the pull commands below).
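
If you are unsure how much room a given setup actually needs, load the model once and check its memory footprint; the size reported by the following command should include the memory reserved for the context:

ollama ps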

So choose wisely between one or more of these:

ollama pull tom_himanen/deepseek-r1-roo-cline-tools:1.5b # 1.1 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:7b   # 4.7 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:8b   # 4.9 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:14b  # 9.0 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:32b  # 20 GB
ollama pull tom_himanen/deepseek-r1-roo-cline-tools:70b  # 43 GB
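
If a tag almost fits but the full 32k context pushes you over your VRAM budget, one option is to derive a variant with a smaller context window. This is a minimal sketch, assuming the 14b tag; the num_ctx value and the name deepseek-r1-tools-16k are just example placeholders.

# Modelfile: derive a lower-context variant to save VRAM (example values)
FROM tom_himanen/deepseek-r1-roo-cline-tools:14b
PARAMETER num_ctx 16384

Build and run it with:

ollama create deepseek-r1-tools-16k -f Modelfile
ollama run deepseek-r1-tools-16k

Halving num_ctx frees the VRAM reserved for the context buffer, at the cost of how much of your project the model can see at once.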

You can also find more variants of the model here. There is an enormous number of different combinations of quantization, model size, temperature and context size.

The 32b version is recommended, but small tasks can be handled by the smaller ones as well.

If you have an Nvidia GPU with at least 24 GB of VRAM, you can install vllama, which is a drop-in replacement for ollama inference, only 3-4 times faster.