
Voyage AI’s state-of-the-art embedding model in Q8 GGUF, packaged for easy use. Important: see the README below for more info. I did not make this GGUF; credit goes to jsonMartin for the quant. See https://hf.co/jsonMartin/voyage-4-nano-gguf

embedding · tools

ollama pull nub235/voyage-4-nano

Details

Updated 1 week ago

Digest: b9e82fa95f20 · 372MB
Architecture: qwen3 · 344M · Q8_0
Template: {{- if .Suffix }}<|fim_prefix|>{{ .Prompt }}<|fim_suffix|>{{ .Suffix }}<|fim_middle|> {{- else if .M
Parameters: { "num_ctx": 4096 }

Readme

Important:

Ollama shows a 40K context window and quick commands for apps because this model uses the Qwen3 architecture, but its actual context is smaller, and it cannot be used in Claude Code or similar apps. I set the default context for this model to 4096 tokens to reduce memory use, but this can be changed manually if needed.
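If you prefer to override the 4096-token default per request instead of editing the model, Ollama's `/api/embed` endpoint accepts an `options.num_ctx` override. A minimal sketch of building such a request (the helper name is mine, not part of any API):

```python
import json

def build_embed_request(text, num_ctx=4096):
    """Build a request body for Ollama's /api/embed endpoint.

    options.num_ctx overrides the model's default context window
    for this call only; larger values use more memory.
    """
    return {
        "model": "nub235/voyage-4-nano",
        "input": text,
        "options": {"num_ctx": num_ctx},
    }

# POST this as JSON to http://localhost:11434/api/embed
body = json.dumps(build_embed_request("hello world", num_ctx=8192))
```

The response contains an `embeddings` field with one vector per input string.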

Also:

This GGUF outputs 1024-dimensional embeddings, not the model's native 2048 dimensions, because it is missing the final linear projection layer. It will still work and perform well, but it should not be dropped into workflows that already expect Voyage 4 embeddings. You can get the linear projection file, and code to apply it, from the original HF repo for this GGUF linked above.
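Applying the missing projection is just a matrix multiply. A sketch of what that looks like, with a random placeholder standing in for the real weights (the file name, shape, and orientation of the matrix are assumptions; use the actual file and code from the original HF repo):

```python
import numpy as np

# Illustration only: the real projection weights ship with the original
# HF repo. A random matrix stands in here so the sketch is runnable.
# proj = np.load("linear_projection.npy")  # assumed shape (1024, 2048)
rng = np.random.default_rng(0)
proj = rng.standard_normal((1024, 2048)).astype(np.float32)  # placeholder

def project(embedding_1024: np.ndarray) -> np.ndarray:
    """Map a 1024-dim GGUF embedding to the native 2048-dim space."""
    return embedding_1024 @ proj

vec = rng.standard_normal(1024).astype(np.float32)  # stand-in embedding
out = project(vec)  # out.shape == (2048,)
```

Whether the projection should be applied before or after any normalization step is defined by the original repo's code, so follow that rather than this sketch for real workloads.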