nishtahir/ zeta:7b-fp16

2,305 pulls · 7 months ago

This repository contains a fine-tuned version of Qwen2.5-Coder-7B to support edit prediction in Zed.

7b
ollama run nishtahir/zeta:7b-fp16
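Once pulled, the model can also be queried programmatically through Ollama's local REST API (served on port 11434 by default). A minimal sketch, assuming an Ollama server is running locally; the prompt string here is only a placeholder, not the zeta edit-prediction format:

```python
import json
import urllib.request

# Build a request for Ollama's /api/generate endpoint.
# "stream": False asks for a single JSON response instead of a token stream.
payload = {
    "model": "nishtahir/zeta:7b-fp16",
    "prompt": "def add(a, b):",  # placeholder prompt
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request against a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```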

Details

Updated 7 months ago

28670c68ee9b · 15GB

Architecture: qwen2 · Parameters: 7.62B · Quantization: F16

System prompt (truncated): You are a code completion assistant and your task is to analyze user edits and then rewrite an excer…

Template (truncated): {{ if .System }} ### Instruction: {{ .System }} {{ end }} {{ .Prompt }} ### Response: {{ .Response }…

Readme

(Unofficial) Edit Prediction: Fine-Tuned from Qwen2.5-Coder-7B

This repository contains a fine-tuned version of Qwen2.5-Coder-7B to support edit prediction in Zed.

Training Details

The model has been fine-tuned using the zeta dataset. If you want to fine-tune the model yourself, you can refer to the training scripts published alongside the dataset in the zed-industries/zeta repository.

Dataset

The dataset used for training is available at: zed-industries/zeta
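The dataset pairs a sequence of user edits with an excerpt of the file to rewrite, marked up with special tokens for the editable region and cursor position. The sketch below assembles an input in that style; the marker token strings are taken from the zeta dataset's published examples and should be verified against zed-industries/zeta before relying on them:

```python
# Sketch of assembling an edit-prediction input in the style of the zeta
# dataset. Verify the exact marker strings against zed-industries/zeta.
EDITABLE_START = "<|editable_region_start|>"
EDITABLE_END = "<|editable_region_end|>"
CURSOR = "<|user_cursor_is_here|>"

def build_input(before: str, editable: str, after: str, cursor_offset: int) -> str:
    """Wrap the editable excerpt in region markers and insert the cursor token
    at cursor_offset characters into the editable text."""
    with_cursor = editable[:cursor_offset] + CURSOR + editable[cursor_offset:]
    return before + EDITABLE_START + "\n" + with_cursor + "\n" + EDITABLE_END + after

snippet = build_input("fn main() {\n", "    let x = ", "\n}\n", cursor_offset=12)
print(snippet)
```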

Running Zeta

vLLM - Simple

vllm serve zed-industries/zeta --served-model-name zeta
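`vllm serve` exposes an OpenAI-compatible API (on port 8000 by default), so the served model can be queried with any OpenAI-style client. A minimal sketch of the request construction, with the actual call left commented out since it requires the server above to be running; the prompt is a placeholder:

```python
import json
import urllib.request

# The model name matches --served-model-name above.
payload = {
    "model": "zeta",
    "prompt": "fn main() {",  # placeholder; real requests use the zeta edit format
    "max_tokens": 128,
    "temperature": 0.0,  # deterministic decoding suits edit prediction
}

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment against a running vLLM server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```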

vLLM - Advanced

  • Quantization: vLLM supports FP8 (8-bit floating point) weight and activation quantization, with hardware acceleration on GPUs such as the NVIDIA H100 and AMD MI300X.

  • N-gram speculative decoding: configures vLLM to use speculative decoding in which draft tokens are proposed by matching n-grams in the prompt. This is a great fit for edit prediction, since many of the output tokens are already present in the prompt and the model only needs to generate the changed parts of the code file.

vllm serve zed-industries/zeta --served-model-name zeta \
  --enable-prefix-caching --enable-chunked-prefill \
  --quantization="fp8" \
  --speculative-model "[ngram]" \
  --ngram-prompt-lookup-max 4 --ngram-prompt-lookup-min 2 \
  --num-speculative-tokens 8
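To make the n-gram flags concrete, here is a toy illustration of prompt-lookup drafting (not vLLM's actual implementation): the most recent generated tokens are matched against the prompt, and the tokens that followed the longest matching n-gram are proposed as draft tokens for the model to verify in one pass.

```python
def propose_draft(prompt_tokens, recent_tokens,
                  lookup_max=4, lookup_min=2, num_speculative=8):
    """Toy prompt-lookup drafting: find the longest recent n-gram (between
    lookup_min and lookup_max tokens) that also occurs in the prompt, and
    propose the tokens that followed that occurrence as the draft."""
    for n in range(min(lookup_max, len(recent_tokens)), lookup_min - 1, -1):
        ngram = recent_tokens[-n:]
        # Only consider matches that are followed by at least one token,
        # since a match at the very end of the prompt yields no draft.
        for i in range(len(prompt_tokens) - n):
            if prompt_tokens[i:i + n] == ngram:
                return prompt_tokens[i + n:i + n + num_speculative]
    return []  # no match: fall back to ordinary one-token-at-a-time decoding

prompt = "let total = items . iter ( ) . map ( | x | x . price ) . sum ( )".split()
draft = propose_draft(prompt, recent_tokens="iter ( )".split())
print(draft)
```

Because edit prediction rewrites an excerpt that is already in the prompt, these lookups hit often, which is why speculative decoding pays off here.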

Learn More

For more insights about the model and its integration in Zed, check out the official blog post: Zed Blog - Edit Prediction