Perfect for 24GB cards

Tools 70B

19 Pulls Updated 5 weeks ago

27e79b3132af · 22GB

model
llama · 70.6B · IQ4_XS
params
{"stop":["<|start_header_id|>","<|end_header_id|>","<|eot_id|>"]}
template
{{ if .Messages }}
{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}

{{ .System }}
{{- end }}
{{- if .Tools }}

You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the original user question.
{{- end }}
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ $.Tools }}
{{- end }}

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}{{ if not $last }}<|eot_id|>{{ end }}
{{- end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}{{ .Response }}{{ if .Response }}<|eot_id|>{{ end }}
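Per the template above, when tools are supplied the model is asked to reply with a single JSON object of the form {"name": function name, "parameters": dictionary of argument name and its value}. A minimal client-side sketch of handling such a reply (the function and the sample reply are illustrative, not part of this model card):

```python
import json

def parse_tool_call(text):
    """Return (name, parameters) if the reply is a tool call, else None."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # plain-text answer, not a tool call
    if isinstance(obj, dict) and "name" in obj and "parameters" in obj:
        return obj["name"], obj["parameters"]
    return None

# Hypothetical model reply in the format the template requests.
reply = '{"name": "get_weather", "parameters": {"city": "Lisbon"}}'
print(parse_tool_call(reply))
```

If the parse fails, the reply can be shown to the user as-is; otherwise the named function is invoked and its output fed back in an `ipython` role message, as the template's tool branch expects.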

Readme

Quantized from mradermacher/Cathallama-70B-i1-GGUF.

Cathallama

Awesome model, my new daily driver.

Notable Performance

  • 9 percentage-point increase in overall MMLU-PRO success rate over LLaMA 3.1 70B
  • Strong performance in MMLU-PRO categories overall
  • Great performance during manual testing

Creation workflow

Models merged
* meta-llama/Meta-Llama-3.1-70B-Instruct
* turboderp/Cat-Llama-3-70B-instruct
* Nexusflow/Athene-70B

flowchart TD
    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
    B --> E[Merge]
    D --> E[Merge]
    E[Merge] -->|Result| F[Cathallama]
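The card does not state the exact merge recipe or tool. A merge like the one in the flowchart above is commonly done with mergekit; the config below is a purely illustrative sketch (merge method, dtype, and weighting are assumptions — only the model names come from this card):

```yaml
# Hypothetical mergekit config — method and settings are guesses.
models:
  - model: Nexusflow/Athene-70B
  - model: turboderp/Cat-Llama-3-70B-instruct
merge_method: model_stock
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
dtype: bfloat16
```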

Testing

Hyperparameters

  • Temperature: 0.0 for automated, 0.9 for manual
  • Penalize repeat sequence: 1.05
  • Consider N tokens for penalize: 256
  • Penalize repetition of newlines
  • Top-K sampling: 40
  • Top-P sampling: 0.95
  • Min-P sampling: 0.05
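The hyperparameters above correspond roughly to llama.cpp's sampler chain. As a toy sketch (function name and exact filter ordering are ours, not llama.cpp's), the chain can be pictured like this:

```python
import math

def sample_filter(logits, temperature=0.9, top_k=40, top_p=0.95, min_p=0.05,
                  recent_tokens=(), repeat_penalty=1.05):
    """Return the token ids surviving the filtering chain, most probable first."""
    logits = dict(logits)  # token_id -> raw logit
    # Repeat penalty: push down logits of recently generated tokens.
    for t in set(recent_tokens):
        if t in logits:
            l = logits[t]
            logits[t] = l / repeat_penalty if l > 0 else l * repeat_penalty
    # Temperature 0.0 (the automated-test setting) means greedy decoding.
    if temperature == 0:
        return [max(logits, key=logits.get)]
    # Softmax over temperature-scaled logits.
    m = max(l / temperature for l in logits.values())
    exps = {t: math.exp(l / temperature - m) for t, l in logits.items()}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}
    # Top-K: keep only the K most probable tokens.
    ranked = sorted(probs, key=probs.get, reverse=True)[:top_k]
    # Min-P: drop tokens below min_p times the top probability.
    pmax = probs[ranked[0]]
    ranked = [t for t in ranked if probs[t] >= min_p * pmax]
    # Top-P: smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for t in ranked:
        kept.append(t)
        cum += probs[t]
        if cum >= top_p:
            break
    return kept
```

In the real sampler one token is then drawn at random from the surviving set according to its renormalized probability; this sketch only shows which candidates remain.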

llama.cpp Version

  • b3527-2-g2d5dd7bb
  • -fa -ngl -1 -ctk f16 --no-mmap

Tested Files

  • Cathallama-70B.Q4_0.gguf
  • Nexusflow_Athene-70B.Q4_0.gguf
  • turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
  • Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

Tests

Manual testing

| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- | --- |
| Common Sense | Ball on cup | OK | KO | KO | OK |
| Common Sense | Big duck small horse | KO | OK | KO | OK |
| Common Sense | Killers | OK | OK | KO | OK |
| Common Sense | Strawberry r's | OK | KO | KO | KO |
| Common Sense | 9.11 or 9.9 bigger | KO | OK | OK | KO |
| Common Sense | Dragon or lens | KO | KO | KO | KO |
| Common Sense | Shirts | OK | OK | KO | KO |
| Common Sense | Sisters | OK | KO | KO | KO |
| Common Sense | Jane faster | OK | OK | OK | OK |
| Programming | JSON | OK | OK | OK | OK |
| Programming | Python snake game | OK | KO | KO | KO |
| Math | Door window combination | OK | OK | KO | KO |
| Smoke | Poem | OK | OK | OK | OK |
| Smoke | Story | OK | OK | KO | OK |

Note: See sample_generations.txt in the main folder of the repo for the raw generations.

MMLU-PRO

| Model | Success % |
| --- | --- |
| Cathallama-70B | 51.0% |
| turboderp_Cat-Llama-3-70B-instruct | 37.0% |
| Nexusflow_Athene-70B | 41.0% |
| Meta-Llama-3.1-70B-Instruct | 42.0% |

| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- |
| Business | 50.0% | 45.0% | 20.0% | 40.0% |
| Law | 40.0% | 30.0% | 30.0% | 35.0% |
| Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
| Biology | 80.0% | 70.0% | 85.0% | 80.0% |
| Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
| History | 65.0% | 60.0% | 55.0% | 65.0% |
| Other | 55.0% | 50.0% | 45.0% | 50.0% |
| Health | 75.0% | 40.0% | 60.0% | 65.0% |
| Economics | 80.0% | 75.0% | 65.0% | 70.0% |
| Math | 45.0% | 35.0% | 15.0% | 40.0% |
| Physics | 50.0% | 45.0% | 45.0% | 45.0% |
| Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
| Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
| Engineering | 35.0% | 40.0% | 25.0% | 35.0% |

Note: MMLU-PRO overall was tested with 100 questions; each category was tested with 20 questions.

PubMedQA

| Model Name | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 73.00% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |

Request

If you are hiring in the EU or can sponsor a visa, PM me :D

PS. Thank you mradermacher for the GGUFs!