vaibhavs10/incredibly-fast-whisper
Official
whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗
Capabilities
No capability data available
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| audio * | string (uri) | Audio file | — | — |
| batch_size | integer | Number of parallel batches you want to compute. Reduce if you face OOMs. | 24 | — |
| diarise_audio | boolean | Use Pyannote.audio to diarise the audio clips. You will need to provide hf_token below too. | false | — |
| hf_token | string | Provide a hf.co/settings/token for Pyannote.audio to diarise the audio clips. You need to agree to the terms in 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0' first. | — | — |
| language | string | Language spoken in the audio, specify 'None' to perform language detection. | "None" | None afrikaans albanian amharic arabic armenian assamese azerbaijani bashkir basque belarusian bengali bosnian breton bulgarian cantonese catalan chinese croatian czech danish dutch english estonian faroese finnish french galician georgian german greek gujarati haitian creole hausa hawaiian hebrew hindi hungarian icelandic indonesian italian japanese javanese kannada kazakh khmer korean lao latin latvian lingala lithuanian luxembourgish macedonian malagasy malay malayalam maltese maori marathi mongolian myanmar nepali norwegian nynorsk occitan pashto persian polish portuguese punjabi romanian russian sanskrit serbian shona sindhi sinhala slovak slovenian somali spanish sundanese swahili swedish tagalog tajik tamil tatar telugu thai tibetan turkish turkmen ukrainian urdu uzbek vietnamese welsh yiddish yoruba |
| task | string | Task to perform: transcribe or translate to another language. | "transcribe" | transcribe translate |
| timestamp | string | Whisper supports both chunked as well as word level timestamps. | "chunk" | chunk word |
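
The parameters above can be assembled into an input payload before a request is sent. The following is a minimal sketch, not part of the model's own code: `build_input` is a hypothetical helper, and its checks simply mirror the defaults and constraints listed in the table (the `language` value is passed through unchecked to keep the example short).

```python
# Hypothetical helper: builds and validates an input payload using the
# defaults and constraints listed in the parameter table.
ALLOWED_TASKS = {"transcribe", "translate"}
ALLOWED_TIMESTAMPS = {"chunk", "word"}


def build_input(audio, *, batch_size=24, diarise_audio=False, hf_token=None,
                language="None", task="transcribe", timestamp="chunk"):
    """Return a dict of model inputs, raising ValueError on invalid values."""
    if not audio:
        raise ValueError("audio is required (a URI pointing to the audio file)")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if timestamp not in ALLOWED_TIMESTAMPS:
        raise ValueError(f"timestamp must be one of {sorted(ALLOWED_TIMESTAMPS)}")
    # Per the table, diarisation needs a Hugging Face token as well.
    if diarise_audio and not hf_token:
        raise ValueError("diarise_audio=True requires hf_token")

    payload = {
        "audio": audio,
        "batch_size": batch_size,
        "diarise_audio": diarise_audio,
        "language": language,  # "None" asks the model to detect the language
        "task": task,
        "timestamp": timestamp,
    }
    if hf_token:
        payload["hf_token"] = hf_token
    return payload


if __name__ == "__main__":
    # The URL is a placeholder, not a real audio file.
    print(build_input("https://example.com/sample.wav", timestamp="word"))
```

Assuming the Replicate Python client is installed and an API token is configured, the resulting dict could be passed as the `input` argument of `replicate.run` together with this model's identifier. If a run fails with an out-of-memory error, the table's advice is to retry with a smaller `batch_size`.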
Version: 3ab86df6c8f5
Updated: 2/26/2026
Runs: 26.4M
cinemasetfree.com