Details

Updated 1 week ago

e614dfd77bed · 2.3GB

model

arch: granite

parameters: 1.84B

quantization: Q4_K_M

1.1GB

projector

arch: clip

parameters: 476M

quantization: F16

1.2GB

template

{{- range .Messages }}<|{{ .Role }}|> {{ .Content }}<|end|> {{ end }}<|assistant|>

83B
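
To make the template easier to read, here is a minimal Python sketch of what it appears to produce: each message becomes `<|role|> content<|end|> `, and the prompt ends with `<|assistant|>` so the model continues from there. The function name `render_prompt` and the message dictionaries are illustrative, not part of Ollama. The separators are assumed to be single spaces as displayed on this page; the stored template may use newlines instead.

```python
def render_prompt(messages):
    """Approximate the displayed chat template for a list of messages.

    Each message is a dict with "role" and "content" keys (an assumed shape).
    Whitespace follows the template as shown on this page.
    """
    parts = []
    for message in messages:
        # <|role|> content<|end|> followed by a separator
        parts.append(f"<|{message['role']}|> {message['content']}<|end|> ")
    # The trailing assistant tag cues the model to generate its reply.
    parts.append("<|assistant|>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_prompt([
        {"role": "user", "content": "translate the speech to German."},
    ]))
```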

license

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US

11kB

params

{ "stop": [ "<|end|>", "<|system|>", "<|user|>", "<|assistant|>"

101B

Granite 4.1 Speech

Granite-Speech-4.1 is a compact and efficient speech-language model from IBM, purpose-built for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). It was created by modality-aligning an intermediate checkpoint of granite-4.0-1b-base to speech, and trained on 174,000 hours of audio from public corpora plus synthetic data.

Parameter Sizes

2B:

ollama run gabegoodhart/granite4.1-speech:2b /path/to/audio.wav "transcribe the speech with proper punctuation and capitalization."
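
The same command can be driven from a script. The sketch below wraps it with Python's `subprocess` module, assuming the `ollama` CLI is installed and on `PATH` and that the model tag is the one shown above. The helper names `build_command` and `transcribe` are hypothetical.

```python
import subprocess

# Model tag copied from the command above.
MODEL = "gabegoodhart/granite4.1-speech:2b"


def build_command(audio_path, prompt, model=MODEL):
    """Return the argv list equivalent to the shell command above."""
    return ["ollama", "run", model, audio_path, prompt]


def transcribe(audio_path, prompt):
    """Run the model once and return whatever it prints to stdout.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        build_command(audio_path, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems when the audio path or the prompt contains spaces.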

Supported Languages

English, French, German, Spanish, Portuguese, and Japanese.

Speech translation (AST) is supported to and from English for the languages above, plus English-to-Italian and English-to-Mandarin.
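
The translation directions described above can be written as a small lookup. This is only a sketch of the stated rules; the names `ASR_LANGUAGES`, `ENGLISH_ONLY_TARGETS`, and `is_supported_translation` are illustrative and do not come from the model or from Ollama.

```python
# Languages listed for transcription (ASR).
ASR_LANGUAGES = {"English", "French", "German", "Spanish", "Portuguese", "Japanese"}

# Extra targets that are only supported when translating from English.
ENGLISH_ONLY_TARGETS = {"Italian", "Mandarin"}


def is_supported_translation(source, target):
    """Return True if source -> target matches the directions listed above."""
    if source == target:
        return False  # same language is transcription, not translation
    if source == "English":
        return target in (ASR_LANGUAGES | ENGLISH_ONLY_TARGETS)
    # Every other supported direction goes into English.
    return target == "English" and source in ASR_LANGUAGES
```

For example, English to Italian is accepted, while Italian to English and French to German are not.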

Intended Use

Granite-Speech-4.1 is designed for enterprise applications that process speech inputs — converting speech to text and translating between English and the supported languages. The model accepts mono, 16 kHz audio along with a text prompt that specifies the task.
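
Since the model expects mono, 16 kHz audio, it can help to check a file before sending it. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only; the function name `check_wav` is hypothetical.

```python
import wave

EXPECTED_CHANNELS = 1       # mono
EXPECTED_RATE_HZ = 16000    # 16 kHz


def check_wav(path):
    """Return a list of format problems; an empty list means the file fits."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channel, found {channels}")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"expected {EXPECTED_RATE_HZ} Hz, found {rate} Hz")
    return problems
```

A file that fails the check would need to be downmixed or resampled with an audio tool first; this sketch only reports the mismatch.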

To trigger speech processing, include the <|audio|> tag in your prompt. If the model receives an unfamiliar or malformed prompt, it falls back to transcription by default.

Capabilities

Multilingual ASR: High-accuracy transcription across six languages, powered by a dual-head CTC conformer encoder (graphemic and BPE outputs) with frame importance sampling.
Speech Translation (AST): Bidirectional translation between English and the supported languages, plus one-way English-to-Italian and English-to-Mandarin.
Punctuation & Truecasing: Produces properly punctuated and capitalized output, including German noun capitalization, when the prompt requests it.
Keyword Biasing: Improves recognition of names, acronyms, and technical jargon when the prompt supplies a keyword list.

Preferred Prompts by Task

Task	Prompt
ASR (raw)	`can you transcribe the speech into a written format?`
ASR (punctuation)	`transcribe the speech with proper punctuation and capitalization.`
ASR (keyword biasing)	`transcribe the speech to text. Keywords: <kw1>, <kw2>, ...`
AST (raw)	`translate the speech to <language>.`
AST (punctuation)	`translate the speech to <language> with proper punctuation and capitalization.`

Note: Non-English ASR still requires an English prompt.
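
The prompts in the table can be filled in programmatically. The sketch below is one way to do that; the task keys and the `build_prompt` helper are made up for illustration, and only the prompt strings come from the table. It does not add the `<|audio|>` tag, matching the `ollama run` example earlier on this page.

```python
# Prompt strings copied from the table; {language} and {keywords} are
# placeholders for <language> and the <kw1>, <kw2>, ... list.
PROMPTS = {
    "asr_raw": "can you transcribe the speech into a written format?",
    "asr_punctuation": "transcribe the speech with proper punctuation and capitalization.",
    "asr_keywords": "transcribe the speech to text. Keywords: {keywords}",
    "ast_raw": "translate the speech to {language}.",
    "ast_punctuation": "translate the speech to {language} with proper punctuation and capitalization.",
}


def build_prompt(task, language=None, keywords=None):
    """Return the preferred prompt for a task, filling in any placeholders."""
    template = PROMPTS[task]
    if "{language}" in template and not language:
        raise ValueError(f"task {task!r} needs a target language")
    if "{keywords}" in template and not keywords:
        raise ValueError(f"task {task!r} needs at least one keyword")
    return template.format(language=language, keywords=", ".join(keywords or []))
```

For instance, `build_prompt("ast_raw", language="German")` yields the AST (raw) prompt with the language filled in.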

Evaluation

On the Open ASR Leaderboard, Granite-Speech-4.1-2b achieves a mean WER of 5.33 at an RTFx of 231.29.

Dataset	WER
LibriSpeech Clean	1.33
LibriSpeech Other	2.5
SPGISpeech	3.78
AMI	8.09
Earnings22	8.37
Gigaspeech	9.8
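
For readers unfamiliar with the two metrics: word error rate (WER) is the word-level edit distance between the reference and the model's transcript, divided by the number of reference words. RTFx is seconds of audio processed per second of compute, so higher is faster. The sketch below shows both in plain Python. It expresses WER as a percentage and skips the text normalization that benchmark pipelines typically apply, so it is not expected to reproduce the leaderboard figures exactly.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: audio duration divided by processing time."""
    return audio_seconds / processing_seconds
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion across six reference words.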

Learn more

Developers: IBM Granite Speech Team
Release Date: April 29, 2026
License: Apache 2.0