
Microsoft Phi4 pulled from HuggingFace and quantized to Q8


Phi4 with Q8 quantization

This is Phi4 from Microsoft with a Q8 quantization; the official ollama version is Q4_K_M.

This model was built as follows:

  • download the official HuggingFace model
  • convert it to GGUF
  • quantize from BF16 to Q8
  • import into ollama with the same model file as the official ollama Phi4
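The steps above can be sketched as a shell session. This is a sketch under assumptions, not the exact commands used to build this model: the HuggingFace repo name, local paths, and the llama.cpp script/binary names are assumptions, and the Modelfile shown is minimal (the real import reuses the official Phi4 template and parameters).

```shell
# 1. Download the official HuggingFace model (repo name assumed)
huggingface-cli download microsoft/phi-4 --local-dir ./phi-4

# 2. Convert the HF checkpoint to GGUF with llama.cpp, keeping BF16 weights
python llama.cpp/convert_hf_to_gguf.py ./phi-4 \
  --outfile phi4-bf16.gguf --outtype bf16

# 3. Quantize from BF16 to Q8_0
llama.cpp/llama-quantize phi4-bf16.gguf phi4-q8_0.gguf Q8_0

# 4. Import into ollama: a minimal Modelfile pointing at the quantized GGUF
cat > Modelfile <<'EOF'
FROM ./phi4-q8_0.gguf
EOF
ollama create phi4-q8 -f Modelfile
```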

With a ctx-size of 3200, the official model uses around 10GB of VRAM, while this Q8 version uses around 16GB.
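That gap is consistent with a back-of-the-envelope weight-size estimate. The numbers below are assumptions, not measurements: Phi4 is reported as roughly 14.7B parameters, GGUF Q8_0 stores about 8.5 bits per weight (8-bit values plus per-block scales), and Q4_K_M averages roughly 4.85 bits per weight.

```python
# Rough weight-memory estimate; all constants are assumptions, not measurements.
params = 14.7e9          # assumed Phi4 parameter count
bits_q8_0 = 8.5          # Q8_0: 8-bit weights + per-block scale overhead
bits_q4_k_m = 4.85       # approximate average bits/weight for Q4_K_M

gb = lambda bits: params * bits / 8 / 1e9   # bits/weight -> GB of weights

print(f"Q8_0 weights:   ~{gb(bits_q8_0):.1f} GB")    # ~15.6 GB
print(f"Q4_K_M weights: ~{gb(bits_q4_k_m):.1f} GB")  # ~8.9 GB
# The remaining gap up to the observed 16GB / 10GB would come from the
# KV cache at ctx-size 3200 plus runtime overhead.
```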