DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training for 500B tokens on math-related data sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B achieves a score of 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. For research purposes, we release checkpoints of the base, instruct, and RL models to the public.

$table$

Evaluation Results

DeepSeekMath-Base 7B

We conduct a comprehensive assessment of the mathematical capabilities of DeepSeekMath-Base 7B, focusing on its ability to produce self-contained mathematical solutions without relying on external tools, to solve math problems using tools, and to conduct formal theorem proving. Beyond mathematics, we also provide a more general profile of the base model, including its performance in natural language understanding, reasoning, and programming.

Mathematical problem solving with step-by-step reasoning

$table$

Mathematical problem solving with tool use

$table$

Natural Language Understanding, Reasoning, and Code
$table$

The evaluation results from the tables above can be summarized as follows:
- Superior Mathematical Reasoning: On the competition-level MATH dataset, DeepSeekMath-Base 7B outperforms existing open-source base models by more than 10 percentage points with few-shot chain-of-thought prompting, and also surpasses Minerva 540B.
- Strong Tool Use Ability: Continuing pre-training from DeepSeekCoder-Base-7B-v1.5 enables DeepSeekMath-Base 7B to solve and prove mathematical problems more effectively by writing programs.
- Comparable Reasoning and Coding Performance: DeepSeekMath-Base 7B achieves reasoning and coding performance comparable to that of DeepSeekCoder-Base-7B-v1.5.

DeepSeekMath-Instruct and -RL 7B

DeepSeekMath-Instruct 7B is an instruction-tuned model for mathematics derived from DeepSeekMath-Base 7B, while DeepSeekMath-RL 7B is trained on top of DeepSeekMath-Instruct 7B using our proposed Group Relative Policy Optimization (GRPO) algorithm.
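
As described in the cited paper, the core idea behind GRPO is to score each sampled answer relative to the other answers sampled for the same question, rather than against a separately learned value function. The sketch below illustrates only that group-relative normalization step under simplifying assumptions: one scalar reward per answer, the population standard deviation, and a small `eps` for numerical stability. The function name and the example rewards are hypothetical, and the full algorithm also includes a clipped policy-ratio objective and a KL penalty that are not shown here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one question
# (1.0 = correct final answer, 0.0 = incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average receive a positive advantage and the rest a negative one; if every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.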

We evaluate mathematical performance both with and without tool use on 4 quantitative reasoning benchmarks in English and Chinese. As shown in the table below, DeepSeekMath-Instruct 7B demonstrates strong step-by-step reasoning, and DeepSeekMath-RL 7B approaches 60% accuracy on MATH with tool use, surpassing all existing open-source models.

$table$

# deepseek-math-7b-rl

This is the [GGUF build of deepseek-math-7b-rl](https://huggingface.co/tastypear/deepseek-ai-deepseek-math-7b-rl-GGUF), packaged for Ollama with a good system prompt.

__Tags:__

- `ollama run t1c/deepseek-math-7b-rl:latest` Q4_K_M (default)

- `ollama run t1c/deepseek-math-7b-rl:Q5` Q5_K_M

- `ollama run t1c/deepseek-math-7b-rl:Q6` Q6_K

- `ollama run t1c/deepseek-math-7b-rl:Q8` Q8_0
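
For scripted use, the same tags can be called through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is running on its default port (11434) and the chosen tag has already been pulled. The helper names `build_request` and `ask` are illustrative and not part of this model card.

```python
import json
import urllib.request

# Tag -> quantization, as listed above.
TAGS = {"latest": "Q4_K_M", "Q5": "Q5_K_M", "Q6": "Q6_K", "Q8": "Q8_0"}

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(question: str, tag: str = "latest") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the chosen tag."""
    if tag not in TAGS:
        raise ValueError(f"unknown tag {tag!r}; expected one of {sorted(TAGS)}")
    payload = {
        "model": f"t1c/deepseek-math-7b-rl:{tag}",
        "prompt": question,
        "stream": False,  # ask for a single JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, tag: str = "latest") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(question, tag)) as resp:
        return json.load(resp)["response"]
```

With the server running, `ask("What is the integral of x^2 from 0 to 2?", tag="Q6")` would return the model's answer as a string. Higher-bit tags trade more memory for less quantization loss.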

---

Here's a shortened version of the readme from the [original GitHub repo](https://github.com/deepseek-ai/DeepSeek-Math/):

![DeepSeek Logo](https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math/main/images/logo.svg)

## Introduction

DeepSeekMath is initialized with [DeepSeek-Coder-v1.5 7B](https://huggingface.co/deepseek-ai/deepseek-coder-7b-base-v1.5) and continues pre-training for 500B tokens on math-related data sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B achieves a score of **51.7%** on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. For research purposes, we release [checkpoints](https://github.com/deepseek-ai/DeepSeek-Math#4-model-downloads) of the base, instruct, and RL models to the public.

![table](https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math/main/images/math.png)

## Evaluation Results

### DeepSeekMath-Base 7B

- **Mathematical problem solving with step-by-step reasoning**

- **Mathematical problem solving with tool use**

- **Natural Language Understanding, Reasoning, and Code**

<img src="https://raw.githubusercontent.com/deepseek-ai/DeepSeek-Math/main/images/base_results_3.png" alt="table" width="50%">

The evaluation results from the tables above can be summarized as follows:
  - **Superior Mathematical Reasoning:** On the competition-level MATH dataset, DeepSeekMath-Base 7B outperforms existing open-source base models by more than 10 percentage points with few-shot chain-of-thought prompting, and also surpasses Minerva 540B.
  - **Strong Tool Use Ability:** Continuing pre-training from DeepSeekCoder-Base-7B-v1.5 enables DeepSeekMath-Base 7B to solve and prove mathematical problems more effectively by writing programs.
  - **Comparable Reasoning and Coding Performance:** DeepSeekMath-Base 7B achieves reasoning and coding performance comparable to that of DeepSeekCoder-Base-7B-v1.5.

### DeepSeekMath-Instruct and -RL 7B

## Data Collection

- Step 1: Select [OpenWebMath](https://arxiv.org/pdf/2310.06786.pdf), a collection of high-quality mathematical web texts, as our initial seed corpus for training a FastText model.
- Step 2: Use the FastText model to retrieve mathematical web pages from the deduplicated Common Crawl database.
- Step 3: Identify potential math-related domains through statistical analysis.
- Step 4: Manually annotate URLs within these identified domains that are associated with mathematical content.
- Step 5: Add web pages linked to these annotated URLs, but not yet collected, to the seed corpus. Return to Step 1 and repeat until four iterations are complete.

After four iterations of data collection, we end up with **35.5M** mathematical web pages, totaling **120B** tokens.
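
One way to picture the statistical analysis in Step 3 is as a per-domain hit rate: for each domain, the share of its crawled pages that the FastText model retrieved in Step 2. The sketch below is a rough illustration under stated assumptions, not the authors' pipeline. The function name, the example URLs, and the 10% default threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def math_related_domains(all_urls, retrieved_urls, min_fraction=0.10):
    """Return domains whose share of classifier-retrieved pages exceeds min_fraction."""
    total = Counter(urlparse(u).netloc for u in all_urls)
    hits = Counter(urlparse(u).netloc for u in retrieved_urls)
    # Counter returns 0 for domains with no retrieved pages.
    return sorted(d for d, n in total.items() if hits[d] / n > min_fraction)


# Hypothetical crawl: 2 pages on a math forum, 20 pages on a news site.
crawl = [f"https://mathforum.example/q/{i}" for i in range(2)]
crawl += [f"https://news.example/story/{i}" for i in range(20)]

# Pages the classifier flagged as mathematical in Step 2 (made-up data).
flagged = ["https://mathforum.example/q/0", "https://news.example/story/7"]

candidates = math_related_domains(crawl, flagged)
```

Here only `mathforum.example` passes the threshold (1 of 2 pages flagged, versus 1 of 20 for the news site), so it would be the domain handed to the manual URL annotation in Step 4.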

## Citation

```
@misc{deepseek-math,
  author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y.K. Li and Y. Wu and Daya Guo},
  title = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  journal = {CoRR},
  volume = {abs/2402.03300},
  year = {2024},
  url = {https://arxiv.org/abs/2402.03300},
}
```

## Contact

If you have any questions, please raise an [issue](https://github.com/deepseek-ai/DeepSeek-Math/issues) or contact DeepSeek at [service@deepseek.com](mailto:service@deepseek.com).
