awerks/whisperx
Official
View on Replicate →
Fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
Capabilities
No capability data available
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| align_output | boolean | Use if you need word-level timing and not just batched transcription | false | — |
| audio_file | string (uri) | Audio file (Input option #1) | — | — |
| audio_url | string | Direct audio url. (Input option #2) | — | — |
| batch_size | integer | Parallelization of input audio transcription | 32 | — |
| debug | boolean | Debugging purposes | false | — |
| diarize | boolean | Diarize the result | false | — |
| file_extension | string | Extension of the audio file (if audio_url is used) | ".wav" | — |
| language | string | Original language of the audio (reduces hallucinations). Leave empty to detect automatically | — | — |
| only_text | boolean | Set if you only want to return text; otherwise, segment metadata will be returned as well. | false | — |
| task | string | Task: transcribe or translate | "transcribe" | — |
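
As an illustration of how the parameters in the table fit together, here is a minimal Python sketch that assembles an input payload from them. The helper name `build_input` is hypothetical and not part of any library. The defaults are copied from the table. The rule that exactly one of `audio_file` or `audio_url` should be given is an assumption inferred from the "Input option #1 / #2" wording, not something the listing states.

```python
from typing import Any, Dict, Optional

# Defaults copied from the parameter table.
DEFAULTS: Dict[str, Any] = {
    "align_output": False,
    "batch_size": 32,
    "debug": False,
    "diarize": False,
    "file_extension": ".wav",
    "only_text": False,
    "task": "transcribe",
}


def build_input(
    audio_file: Optional[str] = None,
    audio_url: Optional[str] = None,
    language: Optional[str] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an input payload for awerks/whisperx."""
    # Assumption: the two audio inputs are alternatives, so require exactly one.
    if (audio_file is None) == (audio_url is None):
        raise ValueError("provide exactly one of audio_file or audio_url")

    # Reject names that do not appear in the parameter table.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides}
    if payload["task"] not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    if audio_file is not None:
        payload["audio_file"] = audio_file
        # The table ties file_extension to audio_url, so drop it here.
        payload.pop("file_extension")
    else:
        payload["audio_url"] = audio_url

    # An empty or missing language leaves detection to the model.
    if language:
        payload["language"] = language
    return payload


if __name__ == "__main__":
    example = build_input(
        audio_url="https://example.com/interview.mp3",  # placeholder URL
        file_extension=".mp3",
        align_output=True,
        diarize=True,
    )
    print(example)
```

A dictionary like this would typically be supplied as the `input` argument when running the model through a Replicate client; check the model page for the exact model reference and version to use.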
Version: 8546c7207250
Updated: 2/26/2026
25.8K runs