vaibhavs10/incredibly-fast-whisper

whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗

Cost

Community model (cost is estimated from hardware time)

Input Parameters

audio required string

Audio file

batch_size integer

Number of parallel batches to compute. Reduce this value if you run into out-of-memory (OOM) errors.

Default: 24
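
The advice to reduce batch_size on OOM can be sketched as a retry loop that halves the value after each failure. This is a minimal sketch, not the model's own behaviour: run_with_batch_backoff and fake_model are hypothetical names, and signalling an out-of-memory failure with Python's MemoryError is an assumption (a hosted run may report the failure differently).

```python
from typing import Any, Callable, Tuple


def run_with_batch_backoff(
    run_model: Callable[[int], Any],
    batch_size: int = 24,  # the documented default
    min_batch_size: int = 1,
) -> Tuple[Any, int]:
    """Call run_model(batch_size), halving batch_size after each MemoryError."""
    while True:
        try:
            return run_model(batch_size), batch_size
        except MemoryError:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further, so surface the error
            batch_size = max(min_batch_size, batch_size // 2)


def fake_model(batch_size: int) -> str:
    """Hypothetical stand-in: pretends any batch_size above 8 runs out of memory."""
    if batch_size > 8:
        raise MemoryError(f"simulated OOM at batch_size={batch_size}")
    return f"transcribed with batch_size={batch_size}"


result, used = run_with_batch_backoff(fake_model)
print(result, used)
```

In real use, run_model would wrap whatever call submits the audio, and the except clause would match the error that call actually raises.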

diarise_audio boolean

Use pyannote.audio to diarise the audio clips, that is, to label which speaker is talking in each segment. This also requires the hf_token parameter below.

Default: false

hf_token string

Hugging Face access token (create one at hf.co/settings/token) that pyannote.audio uses to diarise the audio clips. You must first accept the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.
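
Because diarise_audio depends on hf_token, it can help to check the pair before submitting a run. The helper below is a hypothetical sketch of that check (diarisation_inputs is not part of the model or of any library, and the token value is a placeholder):

```python
from typing import Any, Dict, Optional


def diarisation_inputs(
    diarise_audio: bool = False,
    hf_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the diarisation-related input fields, rejecting a missing token."""
    if diarise_audio and not hf_token:
        raise ValueError("diarise_audio=True requires hf_token")
    fields: Dict[str, Any] = {"diarise_audio": diarise_audio}
    if diarise_audio:
        # Only send the token when diarisation is actually requested.
        fields["hf_token"] = hf_token
    return fields


print(diarisation_inputs())
print(diarisation_inputs(True, "hf_example_token"))  # placeholder token
```

The check only confirms that a token was supplied. It cannot tell whether the token's account has accepted the pyannote model terms.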

language string

Language spoken in the audio. Specify 'None' to have the language detected automatically.

Default: "None"
Options: None, afrikaans, albanian, amharic, arabic, armenian, assamese, azerbaijani, bashkir, basque, belarusian, bengali, bosnian, breton, bulgarian, cantonese, catalan, chinese, croatian, czech, danish, dutch, english, estonian, faroese, finnish, french, galician, georgian, german, greek, gujarati, haitian creole, hausa, hawaiian, hebrew, hindi, hungarian, icelandic, indonesian, italian, japanese, javanese, kannada, kazakh, khmer, korean, lao, latin, latvian, lingala, lithuanian, luxembourgish, macedonian, malagasy, malay, malayalam, maltese, maori, marathi, mongolian, myanmar, nepali, norwegian, nynorsk, occitan, pashto, persian, polish, portuguese, punjabi, romanian, russian, sanskrit, serbian, shona, sindhi, sinhala, slovak, slovenian, somali, spanish, sundanese, swahili, swedish, tagalog, tajik, tamil, tatar, telugu, thai, tibetan, turkish, turkmen, ukrainian, urdu, uzbek, vietnamese, welsh, yiddish, yoruba
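
Note that language detection is requested with the string 'None', not with a null value. A small hypothetical helper (normalise_language is an illustrative name) can map a Python None or an empty string onto that string. Lowercasing other values assumes the parameter expects the lowercase spellings listed above:

```python
from typing import Optional


def normalise_language(language: Optional[str]) -> str:
    """Map a caller-supplied language onto the string the parameter expects."""
    if language is None or not language.strip():
        return "None"  # the literal string that requests language detection
    cleaned = language.strip()
    if cleaned.lower() == "none":
        return "None"
    # Assumption: option names must match the lowercase spellings in the list.
    return cleaned.lower()


print(normalise_language(None))
print(normalise_language(" English "))
```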

task string

Task to perform: transcribe the audio in its spoken language, or translate it into another language (Whisper's translate task produces English text).

Default: "transcribe"
Options: transcribe, translate

timestamp string

Whisper supports both chunk-level and word-level timestamps.

Default: "chunk"
Options: chunk, word
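
The parameters listed above can be assembled into a single input dictionary and validated before a run. The sketch below is hypothetical: build_input is an illustrative name, the audio URL is a placeholder, and treating audio as a URL string is an assumption. How the dictionary is submitted depends on the hosting platform's client, which this page does not specify.

```python
from typing import Any, Dict

# Allowed values copied from the parameter list.
TASKS = ("transcribe", "translate")
TIMESTAMPS = ("chunk", "word")


def build_input(
    audio: str,
    batch_size: int = 24,
    language: str = "None",
    task: str = "transcribe",
    timestamp: str = "chunk",
) -> Dict[str, Any]:
    """Build an input dictionary using the documented defaults."""
    if not audio:
        raise ValueError("audio is required")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    if timestamp not in TIMESTAMPS:
        raise ValueError(f"timestamp must be one of {TIMESTAMPS}")
    return {
        "audio": audio,
        "batch_size": batch_size,
        "language": language,
        "task": task,
        "timestamp": timestamp,
    }


# Placeholder URL; word-level timestamps instead of the default chunks.
payload = build_input("https://example.com/interview.mp3", timestamp="word")
print(payload)
```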

Version: 3ab86df6c8f5 | Updated: 2/26/2026 | 26.4M runs