openai/whisper

Convert speech in audio to text

Capabilities

No capability data available

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`audio`*	string(uri)	Audio file	`—`	—
`compression_ratio_threshold`	number	If the gzip compression ratio is higher than this value, treat the decoding as failed	`2.4`	—
`condition_on_previous_text`	boolean	If true, provide the model's previous output as a prompt for the next window. Disabling this may make the text inconsistent across windows, but makes the model less prone to getting stuck in a failure loop	`true`	—
`initial_prompt`	string	Optional text to provide as a prompt for the first window	`—`	—
`language`	string	Language spoken in the audio, specify 'auto' for automatic language detection	`"auto"`	Codes: auto, af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh. Names: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Letzeburgesch, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Mandarin, Maori, Marathi, Moldavian, Moldovan, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Sinhalese, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Valencian, Vietnamese, Welsh, Yiddish, Yoruba
`logprob_threshold`	number	If the average log probability is lower than this value, treat the decoding as failed	`-1`	—
`no_speech_threshold`	number	If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence	`0.6`	—
`patience`	number	Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search	`—`	—
`suppress_tokens`	string	Comma-separated list of token IDs to suppress during sampling; '-1' suppresses most special characters except common punctuation	`"-1"`	—
`temperature`	number	Temperature to use for sampling	`0`	—
`temperature_increment_on_fallback`	number	Amount by which to raise the temperature on fallback, when decoding fails to meet either the `compression_ratio_threshold` or the `logprob_threshold`	`0.2`	—
`transcription`	string	Choose the format for the transcription	`"plain text"`	plain text, srt, vtt
`translate`	boolean	Translate the text to English when set to True	`false`	—
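
As an illustration of how the parameters in the table fit together, the following minimal sketch assembles an input dictionary from the listed defaults and applies caller overrides. The helper name `build_whisper_input`, the constant names, and the example URI are hypothetical and not part of the model's API; only the parameter names, defaults, and allowed `transcription` values come from the table. How the resulting dictionary is submitted depends on the hosting platform's client and is not shown.

```python
from typing import Any

# Defaults copied from the parameter table. Parameters with no listed
# default (audio, initial_prompt, patience) are intentionally left out.
DEFAULTS: dict[str, Any] = {
    "compression_ratio_threshold": 2.4,
    "condition_on_previous_text": True,
    "language": "auto",
    "logprob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "suppress_tokens": "-1",
    "temperature": 0.0,
    "temperature_increment_on_fallback": 0.2,
    "transcription": "plain text",
    "translate": False,
}

OPTIONAL_WITHOUT_DEFAULT = {"initial_prompt", "patience"}
TRANSCRIPTION_FORMATS = {"plain text", "srt", "vtt"}


def build_whisper_input(audio: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge table defaults with caller overrides."""
    if not audio:
        raise ValueError("audio is required and must be a URI string")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"audio": audio, **DEFAULTS, **overrides}
    if payload["transcription"] not in TRANSCRIPTION_FORMATS:
        raise ValueError(
            f"transcription must be one of {sorted(TRANSCRIPTION_FORMATS)}"
        )
    return payload


# Example: English audio, subtitles in SRT format (placeholder URI).
payload = build_whisper_input(
    "https://example.com/meeting.wav",
    language="en",
    transcription="srt",
)
```

Validating the payload locally catches a misspelled parameter name or an unsupported output format before any hardware time is spent.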

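The thresholds in the table interact during decoding: a window is retried at a higher temperature when its output is too repetitive or too uncertain, and it is treated as silence when the no-speech probability is high and the log-probability check has failed. The sketch below follows the table descriptions only; it is not the model's actual implementation. The function names are hypothetical, `zlib` stands in for the "gzip compression ratio", and the upper bound of 1.0 on the temperature schedule is an assumption.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def temperature_schedule(temperature: float = 0.0,
                         increment: float = 0.2) -> list[float]:
    """Temperatures to try in order, stepping up to 1.0 (assumed upper bound)."""
    if increment <= 0 or temperature >= 1.0:
        return [temperature]
    temps = []
    t = temperature
    while t <= 1.0 + 1e-6:
        temps.append(round(t, 6))  # round away floating-point drift
        t += increment
    return temps


def classify_segment(ratio: float, avg_logprob: float, no_speech_prob: float, *,
                     compression_ratio_threshold: float = 2.4,
                     logprob_threshold: float = -1.0,
                     no_speech_threshold: float = 0.6) -> str:
    """Return 'silence', 'retry', or 'accept' for one decoded window."""
    too_repetitive = ratio > compression_ratio_threshold
    low_confidence = avg_logprob < logprob_threshold
    # Silence requires BOTH a high no-speech probability and a failed
    # log-probability check, as the no_speech_threshold description states.
    if no_speech_prob > no_speech_threshold and low_confidence:
        return "silence"
    if too_repetitive or low_confidence:
        return "retry"  # decode again at the next temperature in the schedule
    return "accept"


print(temperature_schedule())  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(classify_segment(compression_ratio("la " * 200), -0.3, 0.1))  # retry
```

With the defaults, a window gets at most six decoding attempts. Raising `temperature_increment_on_fallback` shortens the schedule, which trades robustness for speed.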
Version: 8099696689d2
Updated: 7/25/2026
143.5M runs