awerks/whisperx

Fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.

Name	Type	Description	Default	Constraints
`align_output`	boolean	Enable when you need word-level timestamps in addition to the batched transcription	`false`	—
`audio_file`	string(uri)	Audio file (input option #1)	`—`	—
`audio_url`	string	Direct audio URL (input option #2)	`—`	—
`batch_size`	integer	Batch size used to parallelize transcription of the input audio	`32`	—
`debug`	boolean	For debugging purposes	`false`	—
`diarize`	boolean	Run speaker diarization on the result	`false`	—
`file_extension`	string	Extension of the audio file (used only with `audio_url`)	`".wav"`	—
`language`	string	Original language of the audio (specifying it reduces hallucinations). Leave empty to detect automatically	`—`	—
`only_text`	boolean	Return only the text; otherwise segment metadata is returned as well	`false`	—
`task`	string	Task to perform: `transcribe` or `translate`	`"transcribe"`	—
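
The parameters above can be assembled into an input payload before the model is called. The following is a minimal sketch, not part of the model's own API: `build_input` is a hypothetical helper, and the rule that exactly one of `audio_file` or `audio_url` must be given is an assumption inferred from the "input option #1 / #2" wording. Defaults are copied from the table.

```python
from typing import Any, Dict, Optional

# Defaults copied from the parameter table.
DEFAULTS: Dict[str, Any] = {
    "align_output": False,
    "batch_size": 32,
    "debug": False,
    "diarize": False,
    "file_extension": ".wav",
    "only_text": False,
    "task": "transcribe",
}

VALID_TASKS = {"transcribe", "translate"}


def build_input(audio_file: Optional[str] = None,
                audio_url: Optional[str] = None,
                **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides with the table defaults and check them."""
    # Assumption: the two input options are alternatives, so exactly one is required.
    if (audio_file is None) == (audio_url is None):
        raise ValueError("provide exactly one of audio_file or audio_url")

    # Reject names that do not appear in the parameter table.
    unknown = set(overrides) - set(DEFAULTS) - {"language"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload: Dict[str, Any] = {**DEFAULTS, **overrides}
    if payload["task"] not in VALID_TASKS:
        raise ValueError("task must be 'transcribe' or 'translate'")

    if audio_file is not None:
        payload["audio_file"] = audio_file
        # The table says file_extension applies only when audio_url is used,
        # so this sketch drops it for file input (a design choice, not a rule).
        payload.pop("file_extension")
    else:
        payload["audio_url"] = audio_url

    # "language" is left out unless set, which leaves detection automatic.
    return payload


if __name__ == "__main__":
    # example.com is a placeholder URL, not a real audio file.
    print(build_input(audio_url="https://example.com/talk.mp3",
                      file_extension=".mp3", align_output=True, diarize=True))
```

Validating locally like this catches a misspelled parameter or an invalid `task` before any hardware time is spent on a run.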

Version: 8546c7207250
Updated: 7/25/2026
25.8K runs