As featured on the Aider leaderboard, this fine-tune of Qwen 2.5 72B is suited to coding tasks.
It is highly recommended to run Ollama with K/V cache quantisation set to Q8_0, using an Ollama build from the PR that adds this feature (https://github.com/ollama/ollama/pull/6279). This roughly halves the VRAM used by the context.
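Assuming the environment variable introduced by that PR (`OLLAMA_KV_CACHE_TYPE`; verify the name against your build), enabling Q8_0 K/V cache quantisation could look like:

```shell
# Assumption: OLLAMA_KV_CACHE_TYPE is the env var added by PR #6279.
# Start the Ollama server with the K/V cache quantised to Q8_0.
OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```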
Defaults:
| Parameter | Value | Rationale |
|---|---|---|
| `num_ctx` | 50K | Useful for medium to large coding tasks |
| `num_batch` | 128 | Reduces the memory overhead of the larger context |
| `num_keep` | 512 | Improves context-overflow handling for coding |
| `temperature` | 0.1 | Reduces hallucinations |
| `top_p` | 0.8 | Increases output quality |
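A sketch of how the defaults above could be set in an Ollama Modelfile. The base model tag is hypothetical, and the exact token count behind "50K" is an assumption (shown here as 51200):

```
# Hypothetical base tag; substitute the actual fine-tune's tag
FROM qwen2.5:72b

PARAMETER num_ctx 51200      # "50K" context; exact value assumed
PARAMETER num_batch 128
PARAMETER num_keep 512
PARAMETER temperature 0.1
PARAMETER top_p 0.8
```

You would build it with `ollama create <name> -f Modelfile`; per-request overrides of the same parameters are also possible via the API options.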