victor-upmeet/whisperx

Accelerated transcription, word-level timestamps and diarization with whisperX large-v3

Cost

Community model (cost is estimated from hardware time)

Input Parameters

audio_file required string

Audio file

align_output boolean

Aligns whisper output to get accurate word-level timestamps

Default: false
batch_size integer

Parallelization of input audio transcription

Default: 64
debug boolean

Print out compute/inference times and memory usage information

Default: false
diarization boolean

Assign speaker ID labels

Default: false
huggingface_access_token string

To enable diarization, please enter your HuggingFace token (read). You need to accept the user agreement for the models specified in the README.

initial_prompt string

Optional text to provide as a prompt for the first window

language string

ISO code of the language spoken in the audio. Specify None to perform language detection.

language_detection_max_tries integer

If language is not specified, the language is detected following the logic of the language_detection_min_prob parameter, but detection stops after the given maximum number of retries. If the maximum is reached, the most probable language is kept.

Default: 5
language_detection_min_prob number

If language is not specified, the language is detected recursively on different parts of the file until the detection reaches the given probability.

Default: 0
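
The interplay between language_detection_min_prob and language_detection_max_tries described above can be sketched roughly as follows. This is an illustration of the documented behavior, not the model's actual implementation: `pick_language`, `segments`, and `detect_language` are hypothetical names, and treating the maximum as the total number of attempts is an assumption.

```python
from typing import Callable, Sequence, Tuple


def pick_language(
    segments: Sequence[object],
    detect_language: Callable[[object], Tuple[str, float]],  # hypothetical detector
    min_prob: float = 0.0,  # language_detection_min_prob (default 0)
    max_tries: int = 5,     # language_detection_max_tries (default 5)
) -> Tuple[str, float]:
    """Try successive parts of the audio until one detection reaches min_prob.

    If no attempt reaches min_prob within max_tries attempts, the most
    probable language seen so far is returned.
    """
    if not segments:
        raise ValueError("need at least one audio segment")
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    best: Tuple[str, float] = ("", -1.0)
    for segment in segments[:max_tries]:
        lang, prob = detect_language(segment)
        if prob > best[1]:
            best = (lang, prob)  # remember the most probable guess so far
        if prob >= min_prob:
            return lang, prob    # confident enough, stop early
    return best
```

With the default language_detection_min_prob of 0, the first detection is always accepted under this reading, so the retry limit only matters once a higher threshold is set.
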
max_speakers integer

Maximum number of speakers if diarization is activated (leave blank if unknown)

min_speakers integer

Minimum number of speakers if diarization is activated (leave blank if unknown)
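
The diarization-related parameters (diarization, huggingface_access_token, min_speakers, max_speakers) are meant to be used together. The helper below is a small sketch of how a caller might assemble them. `diarization_inputs` is a hypothetical name, and the client-side checks are assumptions added for illustration, not validation the model is known to perform.

```python
from typing import Any, Dict, Optional


def diarization_inputs(
    huggingface_access_token: str,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the diarization part of the input, omitting unknown speaker counts."""
    if not huggingface_access_token:
        raise ValueError("diarization needs a HuggingFace read token")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers must not exceed max_speakers")

    inputs: Dict[str, Any] = {
        "diarization": True,
        "huggingface_access_token": huggingface_access_token,
    }
    # "Leave blank if unknown": only send the bounds that were actually given.
    if min_speakers is not None:
        inputs["min_speakers"] = min_speakers
    if max_speakers is not None:
        inputs["max_speakers"] = max_speakers
    return inputs
```

For example, `diarization_inputs("hf_...", max_speakers=3)` yields a dictionary with diarization enabled, the token, and only the upper bound set.
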

task string

Task to perform on the audio file. Options are: transcribe, translate (into English only)

Default: "transcribe"
temperature number

Temperature to use for sampling

Default: 0
user_agent string

Override the User-Agent used to download the audio file. Useful when the host blocks the default value.

vad_offset number

VAD offset

Default: 0.363
vad_onset number

VAD onset

Default: 0.5
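
To see how the parameters above fit together, here is a minimal sketch that assembles a complete input dictionary from the listed defaults. `build_input`, `DEFAULTS`, and the example URL are hypothetical. How the resulting dictionary is sent to the model depends on the client in use and is not shown.

```python
from typing import Any, Dict

# Defaults copied from the parameter list above.
DEFAULTS: Dict[str, Any] = {
    "align_output": False,
    "batch_size": 64,
    "debug": False,
    "diarization": False,
    "language_detection_max_tries": 5,
    "language_detection_min_prob": 0,
    "task": "transcribe",
    "temperature": 0,
    "vad_offset": 0.363,
    "vad_onset": 0.5,
}

# Parameters that have no listed default and are sent only when provided.
OPTIONAL_WITHOUT_DEFAULT = {
    "huggingface_access_token",
    "initial_prompt",
    "language",
    "max_speakers",
    "min_speakers",
    "user_agent",
}


def build_input(audio_file: str, **overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides onto the listed defaults; audio_file is required."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"audio_file": audio_file, **DEFAULTS, **overrides}
    if payload["task"] not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    return payload


if __name__ == "__main__":
    # Placeholder URL; any reachable audio file would take its place.
    print(build_input("https://example.com/meeting.wav", language="en", align_output=True))
```

Keeping the defaults in one dictionary makes it easy to see which values a request actually changes.
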
Version: 655845d6190e Updated: 6/26/2026 7.8M runs