did100/
phi4_Q8:latest

43 pulls · Updated 9 months ago

Microsoft Phi4 pulled from HuggingFace and quantized to Q8


d04922f21d98 · 16GB

phi3 · 14.7B · Q8_0
Microsoft. Copyright (c) Microsoft Corporation. MIT License. Permission is hereby granted, free of ch…
{ "stop": [ "<|im_start|>", "<|im_end|>", "<|im_sep|>" ] }
{{- range $i, $_ := .Messages }} {{- $last := eq (len (slice $.Messages $i)) 1 -}} <|im_start|>{{ .R

Readme

Phi4 with Q8 quantization

This is Microsoft's Phi4 with Q8 quantization; the official ollama version ships as Q4_K_M.

This model was built with the following steps:

  • download the official HuggingFace model
  • convert it to GGUF
  • quantize from BF16 to Q8
  • import into ollama with the same Modelfile as the official ollama Phi4
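The steps above can be sketched roughly like this. The HuggingFace repo id, file names, and script locations are assumptions for illustration, not taken from this page:

```shell
# 1. Download the official weights from HuggingFace (repo id assumed)
huggingface-cli download microsoft/phi-4 --local-dir phi-4

# 2. Convert to GGUF at BF16 with llama.cpp's converter
python convert_hf_to_gguf.py phi-4 --outtype bf16 --outfile phi-4-bf16.gguf

# 3. Quantize from BF16 to Q8_0
./llama-quantize phi-4-bf16.gguf phi-4-Q8_0.gguf Q8_0

# 4. Grab the official ollama Phi4 Modelfile, then import under a new name
ollama show phi4 --modelfile > Modelfile
ollama create phi4_Q8 -f Modelfile
```

Before running `ollama create`, the Modelfile's `FROM` line has to be edited to point at the new `phi-4-Q8_0.gguf` instead of the official blob.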

With a ctx-size of 3200, the official model takes around 10GB of VRAM, while this Q8 version takes 16GB.
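A rough back-of-envelope check on those numbers: Q8_0 stores about 8.5 bits per weight (8-bit values plus a per-block scale), and Q4_K_M averages roughly 4.85 bits per weight. These bits-per-weight figures are approximations, not taken from this page:

```shell
# Weight-only size estimate: params × bits-per-weight / 8 bytes (1 GB = 1e9 bytes).
# KV cache and runtime buffers add on top of these figures.
awk 'BEGIN { printf "Q8_0:   %.1f GB\n", 14.7e9 * 8.5  / 8 / 1e9 }'   # ≈ 15.6 GB
awk 'BEGIN { printf "Q4_K_M: %.1f GB\n", 14.7e9 * 4.85 / 8 / 1e9 }'   # ≈ 8.9 GB
```

The ~15.6GB weight footprint plus context buffers lines up with the 16GB observed for Q8, and ~8.9GB plus buffers with the official model's ~10GB.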