14 Downloads Updated 1 week ago
ollama run gabegoodhart/granite4.1-speech:2b
e614dfd77bed · 2.3GB
Granite-Speech-4.1 is a compact and efficient speech-language model from IBM, purpose-built for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). It was created by modality-aligning an intermediate checkpoint of granite-4.0-1b-base to speech, and trained on 174,000 hours of audio from public corpora plus synthetic data.
2B:
ollama run gabegoodhart/granite4.1-speech:2b /path/to/audio.wav "transcribe the speech with proper punctuation and capitalization."
Supported languages: English, French, German, Spanish, Portuguese, and Japanese.
Speech translation (AST) is supported to and from English for the languages above, plus English-to-Italian and English-to-Mandarin.
Granite-Speech-4.1 is designed for enterprise applications that process speech input: converting speech to text and translating between English and the supported languages. The model accepts mono, 16 kHz audio together with a text prompt that specifies the task.
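Recordings that are stereo or use a different sample rate can be converted before they are passed to the model. The following is a minimal sketch, assuming `ffmpeg` is installed; `to_mono_16k` is a hypothetical helper name and not part of Ollama or the model.

```shell
# Sketch: convert any file ffmpeg can read to mono, 16 kHz WAV.
# to_mono_16k is a hypothetical helper, not part of Ollama or the model.
to_mono_16k() {
  in="$1"
  # Default output name: input name without its extension, plus _16k.wav
  out="${2:-${in%.*}_16k.wav}"
  # -n        refuse to overwrite an existing output file
  # -ac 1     downmix to a single (mono) channel
  # -ar 16000 resample to 16 kHz
  ffmpeg -n -i "$in" -ac 1 -ar 16000 "$out"
}

# Usage (requires ffmpeg):
#   to_mono_16k interview.mp3            # writes interview_16k.wav
#   to_mono_16k interview.mp3 clean.wav  # writes clean.wav
```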
To trigger speech processing, include the <|audio|> tag in your prompt. If the model receives an unfamiliar or malformed prompt, it falls back to transcription by default.
| Task | Prompt |
|---|---|
| ASR (raw) | can you transcribe the speech into a written format? |
| ASR (punctuation) | transcribe the speech with proper punctuation and capitalization. |
| ASR (keyword biasing) | transcribe the speech to text. Keywords: <kw1>, <kw2>, ... |
| AST (raw) | translate the speech to <language>. |
| AST (punctuation) | translate the speech to <language> with proper punctuation and capitalization. |
Note: Non-English ASR still requires an English prompt.
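The prompts in the table can be wrapped in a small shell function so that scripts select a task by name instead of repeating the exact wording. This is only a sketch: `build_prompt` and its task names (`asr`, `asr_punct`, `asr_kw`, `ast`, `ast_punct`) are hypothetical, and the prompt strings are copied from the table above.

```shell
MODEL="gabegoodhart/granite4.1-speech:2b"

# build_prompt TASK [ARG] prints the prompt for one of the documented tasks.
# Hypothetical helper; ARG is the keyword list (asr_kw) or target language (ast*).
build_prompt() {
  task="$1"
  arg="$2"
  case "$task" in
    asr)       printf '%s\n' "can you transcribe the speech into a written format?" ;;
    asr_punct) printf '%s\n' "transcribe the speech with proper punctuation and capitalization." ;;
    asr_kw)    printf '%s\n' "transcribe the speech to text. Keywords: $arg" ;;
    ast)       printf '%s\n' "translate the speech to $arg." ;;
    ast_punct) printf '%s\n' "translate the speech to $arg with proper punctuation and capitalization." ;;
    *)         printf 'unknown task: %s\n' "$task" >&2; return 1 ;;
  esac
}

# Usage (requires ollama and the pulled model):
#   ollama run "$MODEL" /path/to/audio.wav "$(build_prompt ast_punct French)"
#   ollama run "$MODEL" /path/to/audio.wav "$(build_prompt asr_kw 'Granite, Ollama')"
```

Because the prompts stay in English regardless of the audio language, the same helper covers non-English ASR as well.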
On the Open ASR Leaderboard, Granite-Speech-4.1-2b achieves a mean WER of 5.33 at an RTFx of 231.29.
| Dataset | WER |
|---|---|
| LibriSpeech Clean | 1.33 |
| LibriSpeech Other | 2.5 |
| SPGISpeech | 3.78 |
| AMI | 8.09 |
| Earnings22 | 8.37 |
| Gigaspeech | 9.8 |
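To put the RTFx figure quoted above in concrete terms: RTFx is the inverse real-time factor, that is, seconds of audio processed per second of compute. The sketch below turns that into an estimated processing time; `processing_seconds` is a hypothetical helper, and the result reflects the leaderboard's benchmark setup, so throughput on local hardware under Ollama will likely differ.

```shell
# processing_seconds AUDIO_SECONDS RTFX
# Estimated compute time in seconds = audio duration / RTFx.
processing_seconds() {
  awk -v audio="$1" -v rtfx="$2" 'BEGIN { printf "%.2f\n", audio / rtfx }'
}

# One hour of audio at the reported RTFx of 231.29
processing_seconds 3600 231.29   # prints 15.56
```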