awerks/whisperx

Fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
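
To put the "70x realtime" figure in concrete terms, the sketch below converts an audio duration into an idealized processing time. The function name `estimated_processing_seconds` is hypothetical, and the estimate assumes the quoted speedup applies uniformly; it ignores model loading, alignment, and diarization overhead, so real runs will likely take longer.

```python
def estimated_processing_seconds(audio_seconds: float, speedup: float = 70.0) -> float:
    """Idealized processing time for a clip, given a realtime speedup factor.

    speedup=70.0 mirrors the "70x realtime" figure quoted for large-v2;
    actual throughput depends on hardware and the options enabled.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    one_hour = 60 * 60
    print(f"{estimated_processing_seconds(one_hour):.1f} s for one hour of audio")
```

Under these assumptions, one hour of audio works out to a little under a minute of transcription time.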

Cost

Community model; cost is estimated from hardware time.

Input Parameters

align_output boolean

Enable if you need word-level timestamps rather than only the batched transcription.

Default: false

audio_file string

Audio file (Input option #1)

audio_url string

Direct URL to the audio file (Input option #2)

batch_size integer

Batch size for transcription; controls how much of the input audio is processed in parallel.

Default: 32

debug boolean

Enable for debugging purposes.

Default: false

diarize boolean

Apply speaker diarization to the result.

Default: false

file_extension string

Extension of the audio file (if audio_url is used)

Default: ".wav"

language string

Original language of the audio; specifying it reduces hallucinations. Leave empty to detect the language automatically.

only_text boolean

Enable to return only the text; otherwise, segment metadata is returned as well.

Default: false

task string

Task to perform: transcribe or translate

Default: "transcribe"
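
The parameters above can be assembled into a single input payload. The following is a minimal sketch rather than the model's official client code: `build_input` is a hypothetical helper, and two of its checks are assumptions not stated on this page, namely that exactly one of `audio_file` and `audio_url` should be supplied, and that `batch_size` must be at least 1. The defaults mirror the ones listed above.

```python
from typing import Any, Dict, Optional

VALID_TASKS = {"transcribe", "translate"}


def build_input(
    audio_file: Optional[str] = None,
    audio_url: Optional[str] = None,
    *,
    align_output: bool = False,
    batch_size: int = 32,
    debug: bool = False,
    diarize: bool = False,
    file_extension: str = ".wav",
    language: Optional[str] = None,
    only_text: bool = False,
    task: str = "transcribe",
) -> Dict[str, Any]:
    """Build an input dict using the parameter names and defaults listed above."""
    # Assumption: the two input options are treated as mutually exclusive.
    if (audio_file is None) == (audio_url is None):
        raise ValueError("provide exactly one of audio_file or audio_url")
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}")
    # Assumption: a batch size below 1 is not meaningful.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    payload: Dict[str, Any] = {
        "align_output": align_output,
        "batch_size": batch_size,
        "debug": debug,
        "diarize": diarize,
        "only_text": only_text,
        "task": task,
    }
    if audio_file is not None:
        payload["audio_file"] = audio_file
    else:
        payload["audio_url"] = audio_url
        # file_extension only applies when audio_url is used.
        payload["file_extension"] = file_extension
    if language:
        # Omitting the key leaves language detection to the model.
        payload["language"] = language
    return payload


if __name__ == "__main__":
    example = build_input(
        audio_url="https://example.com/interview.mp3",  # placeholder URL
        file_extension=".mp3",
        align_output=True,
        diarize=True,
    )
    print(example)
```

Leaving `language` out of the payload, rather than sending an empty string, is one way to request automatic detection; whether the service treats the two identically is not confirmed here.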

Version: 8546c7207250 | Updated: 2/26/2026 | 25.8K runs