
Voyage AI’s state-of-the-art embedding model in Q8 GGUF, packaged for easy use. Important: see the README below for more info. I did not make this GGUF; credit goes to jsonMartin for the quant. See https://hf.co/jsonMartin/voyage-4-nano-gguf

embedding · tools

ollama pull nub235/voyage-4-nano

Details

Updated 1 week ago

Digest: b9e82fa95f20 · 372MB
Architecture: qwen3 · 344M · Q8_0
Template: {{- if .Suffix }}<|fim_prefix|>{{ .Prompt }}<|fim_suffix|>{{ .Suffix }}<|fim_middle|> {{- else if .M
Parameters: { "num_ctx": 4096 }

Readme

Important:

Ollama shows a 40K context window and quick commands for apps because this model uses the Qwen3 architecture, but its actual context is smaller, and it cannot be used in Claude Code or similar apps. I set the default context for this model to 4096 tokens to reduce memory use, but this can be changed manually if needed.
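If you prefer to override the 4096-token default per request instead of editing the model, Ollama's `/api/embed` endpoint accepts an `options.num_ctx` override. A minimal sketch of building such a request (the helper name is mine, not part of any API):

```python
import json

def build_embed_request(text, num_ctx=4096):
    """Build a request body for Ollama's /api/embed endpoint.

    options.num_ctx overrides the model's default context window
    for this call only; larger values use more memory.
    """
    return {
        "model": "nub235/voyage-4-nano",
        "input": text,
        "options": {"num_ctx": num_ctx},
    }

# POST this as JSON to http://localhost:11434/api/embed
body = json.dumps(build_embed_request("hello world", num_ctx=8192))
```

The response contains an `embeddings` field with one vector per input string.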

Also:

This GGUF outputs 1024-dimensional embeddings, not the model's native 2048 dimensions, because it is missing the final linear projection layer. It will still work and perform well, but it should not be dropped into workflows that already expect Voyage 4 embeddings. You can get the linear projection file, and code to apply it, from the original HF repo for this GGUF linked above.
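Applying the missing projection is just a matrix multiply. A sketch of what that looks like, with a random placeholder standing in for the real weights (the file name, shape, and orientation of the matrix are assumptions; use the actual file and code from the original HF repo):

```python
import numpy as np

# Illustration only: the real projection weights ship with the original
# HF repo. A random matrix stands in here so the sketch is runnable.
# proj = np.load("linear_projection.npy")  # assumed shape (1024, 2048)
rng = np.random.default_rng(0)
proj = rng.standard_normal((1024, 2048)).astype(np.float32)  # placeholder

def project(embedding_1024: np.ndarray) -> np.ndarray:
    """Map a 1024-dim GGUF embedding to the native 2048-dim space."""
    return embedding_1024 @ proj

vec = rng.standard_normal(1024).astype(np.float32)  # stand-in embedding
out = project(vec)  # out.shape == (2048,)
```

Whether the projection should be applied before or after any normalization step is defined by the original repo's code, so follow that rather than this sketch for real workloads.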