openai/whisper
Convert speech in audio to text
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| audio (required) | string (uri) | Audio file | — | — |
| compression_ratio_threshold | number | If the gzip compression ratio is higher than this value, treat the decoding as failed | 2.4 | — |
| condition_on_previous_text | boolean | If True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop | true | — |
| initial_prompt | string | Optional text to provide as a prompt for the first window | — | — |
| language | string | Language spoken in the audio; specify 'auto' for automatic language detection | "auto" | auto af am ar as az ba be bg bn bo br bs ca cs cy da de el en es et eu fa fi fo fr gl gu ha haw he hi hr ht hu hy id is it ja jw ka kk km kn ko la lb ln lo lt lv mg mi mk ml mn mr ms mt my ne nl nn no oc pa pl ps pt ro ru sa sd si sk sl sn so sq sr su sv sw ta te tg th tk tl tr tt uk ur uz vi yi yo yue zh Afrikaans Albanian Amharic Arabic Armenian Assamese Azerbaijani Bashkir Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Cantonese Castilian Catalan Chinese Croatian Czech Danish Dutch English Estonian Faroese Finnish Flemish French Galician Georgian German Greek Gujarati Haitian Haitian Creole Hausa Hawaiian Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Javanese Kannada Kazakh Khmer Korean Lao Latin Latvian Letzeburgesch Lingala Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Mandarin Maori Marathi Moldavian Moldovan Mongolian Myanmar Nepali Norwegian Nynorsk Occitan Panjabi Pashto Persian Polish Portuguese Punjabi Pushto Romanian Russian Sanskrit Serbian Shona Sindhi Sinhala Sinhalese Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tagalog Tajik Tamil Tatar Telugu Thai Tibetan Turkish Turkmen Ukrainian Urdu Uzbek Valencian Vietnamese Welsh Yiddish Yoruba |
| logprob_threshold | number | If the average log probability is lower than this value, treat the decoding as failed | -1 | — |
| no_speech_threshold | number | If the probability of the `<\|nospeech\|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence | 0.6 | — |
| patience | number | Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search | — | — |
| suppress_tokens | string | Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation | "-1" | — |
| temperature | number | Temperature to use for sampling | 0 | — |
| temperature_increment_on_fallback | number | Amount by which to raise the temperature on fallback, when the decoding fails to meet either of the thresholds (`compression_ratio_threshold`, `logprob_threshold`) | 0.2 | — |
| transcription | string | Choose the format for the transcription | "plain text" | plain text, srt, vtt |
| translate | boolean | Translate the text to English when set to True | false | — |
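
The parameters above interact: the two failure thresholds decide when decoding is retried at a higher temperature, and the no-speech threshold decides when a failed segment is treated as silence. The following is a minimal sketch of that logic, not the model's actual implementation. The helper names (`build_input`, `temperature_schedule`, `needs_fallback`, `is_silence`) are hypothetical, the 1.0 temperature ceiling is an assumption borrowed from the reference Whisper command line, and the compression ratio is approximated with `zlib`. How the resulting dictionary is submitted to the hosted model is deliberately left out.

```python
import zlib

# Defaults copied from the Input Parameters table. Parameters listed there
# without a default (initial_prompt, patience) are left out on purpose.
DEFAULTS = {
    "compression_ratio_threshold": 2.4,
    "condition_on_previous_text": True,
    "language": "auto",
    "logprob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "suppress_tokens": "-1",
    "temperature": 0.0,
    "temperature_increment_on_fallback": 0.2,
    "transcription": "plain text",
    "translate": False,
}
OPTIONAL_WITHOUT_DEFAULT = {"initial_prompt", "patience"}
TRANSCRIPTION_FORMATS = {"plain text", "srt", "vtt"}


def build_input(audio: str, **overrides) -> dict:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides, "audio": audio}
    if params["transcription"] not in TRANSCRIPTION_FORMATS:
        raise ValueError(
            f"unsupported transcription format: {params['transcription']!r}"
        )
    return params


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def temperature_schedule(params: dict, ceiling: float = 1.0) -> list:
    """Temperatures to try in order. The ceiling of 1.0 is an assumption."""
    start = params["temperature"]
    step = params["temperature_increment_on_fallback"]
    if step <= 0:
        return [start]
    temps = []
    i = 0
    while start + i * step <= ceiling + 1e-6:
        temps.append(round(start + i * step, 6))
        i += 1
    return temps or [start]


def needs_fallback(text: str, avg_logprob: float, params: dict) -> bool:
    """A decoding 'fails' if it is too repetitive or too unlikely."""
    too_repetitive = compression_ratio(text) > params["compression_ratio_threshold"]
    too_unlikely = avg_logprob < params["logprob_threshold"]
    return too_repetitive or too_unlikely


def is_silence(no_speech_prob: float, avg_logprob: float, params: dict) -> bool:
    """Silence requires a high no-speech probability AND a logprob failure."""
    return (
        no_speech_prob > params["no_speech_threshold"]
        and avg_logprob < params["logprob_threshold"]
    )


# The audio URL below is a placeholder, not a real file.
params = build_input("https://example.com/speech.wav", transcription="srt")
print(temperature_schedule(params))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

With the default `temperature` of 0 and increment of 0.2, a segment that keeps failing would be retried at progressively higher temperatures. Setting the increment to 0 in this sketch disables the retries altogether.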
8099696689d2 Updated: 2/26/2026 143.5M runs