vaibhavs10/incredibly-fast-whisper

whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗

Capabilities

No capability data available

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`audio`*	string(uri)	Audio file (required)	`—`	—
`batch_size`	integer	Number of parallel batches to compute. Reduce this if you run out of memory (OOM).	`24`	—
`diarise_audio`	boolean	Use Pyannote.audio to diarise the audio clips. Requires `hf_token`.	`false`	—
`hf_token`	string	Hugging Face access token (create one at hf.co/settings/token), used by Pyannote.audio to diarise the audio clips. You must first agree to the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.	`—`	—
`language`	string	Language spoken in the audio, specify 'None' to perform language detection.	`"None"`	None, afrikaans, albanian, amharic, arabic, armenian, assamese, azerbaijani, bashkir, basque, belarusian, bengali, bosnian, breton, bulgarian, cantonese, catalan, chinese, croatian, czech, danish, dutch, english, estonian, faroese, finnish, french, galician, georgian, german, greek, gujarati, haitian creole, hausa, hawaiian, hebrew, hindi, hungarian, icelandic, indonesian, italian, japanese, javanese, kannada, kazakh, khmer, korean, lao, latin, latvian, lingala, lithuanian, luxembourgish, macedonian, malagasy, malay, malayalam, maltese, maori, marathi, mongolian, myanmar, nepali, norwegian, nynorsk, occitan, pashto, persian, polish, portuguese, punjabi, romanian, russian, sanskrit, serbian, shona, sindhi, sinhala, slovak, slovenian, somali, spanish, sundanese, swahili, swedish, tagalog, tajik, tamil, tatar, telugu, thai, tibetan, turkish, turkmen, ukrainian, urdu, uzbek, vietnamese, welsh, yiddish, yoruba
`task`	string	Task to perform: transcribe the audio in its spoken language, or translate it to English.	`"transcribe"`	transcribe, translate
`timestamp`	string	Timestamp granularity. Whisper supports both chunk-level and word-level timestamps.	`"chunk"`	chunk, word
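
The parameters in the table can be assembled into an input payload before a request is sent. The following is a minimal sketch in Python, not an official client: the helper name `build_input` is hypothetical, the check that `batch_size` is positive is an assumption (the table lists no constraint for it), and the `language` value is passed through without validation. It does enforce the dependency stated in the table, namely that `diarise_audio` needs an `hf_token`.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the Constraints column of the table.
ALLOWED_TASKS = {"transcribe", "translate"}
ALLOWED_TIMESTAMPS = {"chunk", "word"}


def build_input(
    audio: str,
    batch_size: int = 24,
    diarise_audio: bool = False,
    hf_token: Optional[str] = None,
    language: str = "None",
    task: str = "transcribe",
    timestamp: str = "chunk",
) -> Dict[str, Any]:
    """Build an input payload (hypothetical helper) with the table's names and defaults."""
    if not audio:
        raise ValueError("audio is required")
    # Assumption: the table gives no range, but a batch size below 1 makes no sense.
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if timestamp not in ALLOWED_TIMESTAMPS:
        raise ValueError(f"timestamp must be one of {sorted(ALLOWED_TIMESTAMPS)}")
    # Diarisation uses Pyannote.audio, which needs a Hugging Face token.
    if diarise_audio and not hf_token:
        raise ValueError("diarise_audio=True requires hf_token")

    payload: Dict[str, Any] = {
        "audio": audio,
        "batch_size": batch_size,
        "diarise_audio": diarise_audio,
        "language": language,
        "task": task,
        "timestamp": timestamp,
    }
    # Only include the token when one was supplied.
    if hf_token:
        payload["hf_token"] = hf_token
    return payload
```

For example, `build_input("https://example.com/sample.wav", timestamp="word", batch_size=8)` yields a payload requesting word-level timestamps with a smaller batch size, which may help if the default of 24 causes out-of-memory errors. The URL here is a placeholder.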

Version: 3ab86df6c8f5 | Updated: 7/25/2026 | 26.4M runs