thomasmol/whisper-diarization

⚡️ Blazing fast audio transcription with speaker diarization | Whisper Large V3 Turbo | word & sentence level timestamps | prompt

Capabilities

No capability data available

Cost

Community model (estimated from hardware time)

Input Parameters

file string

An audio file. Alternative to file_string and file_url.

file_string string

A Base64-encoded audio file. Alternative to file_url and file.

file_url string

A direct URL to an audio file. Alternative to file_string and file.

group_segments boolean

Group segments from the same speaker that are less than 2 seconds apart.

Default: true

language string

Language of the speech, given as a language code such as 'en'. Leave empty to auto-detect the language.

num_speakers integer

Number of speakers. Leave empty to auto-detect.

min: 1, max: 50

prompt string

Vocabulary: provide names, acronyms and loanwords in a list. Use punctuation for best accuracy.

translate boolean

Translate the speech into English.

Default: false
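
The parameters above can be assembled into a single input payload. The following is a minimal sketch, not the model's official client code: the helper name build_input and its audio_bytes argument are hypothetical, and it assumes that exactly one audio source should be sent per request. Only file_url and file_string are covered; the file upload parameter and the call that actually submits the payload depend on the hosting platform's client and are left out.

```python
import base64
from typing import Any, Optional


def build_input(
    *,
    file_url: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,  # hypothetical: raw audio to send as file_string
    language: Optional[str] = None,
    num_speakers: Optional[int] = None,
    prompt: Optional[str] = None,
    group_segments: bool = True,
    translate: bool = False,
) -> dict[str, Any]:
    """Build an input payload keyed by the parameter names listed above."""
    # Assumption: exactly one audio source is expected per request.
    if (file_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of file_url or audio_bytes")
    # Documented range for num_speakers: min 1, max 50.
    if num_speakers is not None and not 1 <= num_speakers <= 50:
        raise ValueError("num_speakers must be between 1 and 50")

    payload: dict[str, Any] = {
        "group_segments": group_segments,
        "translate": translate,
    }
    if file_url is not None:
        payload["file_url"] = file_url
    else:
        # file_string carries the audio as Base64 text.
        payload["file_string"] = base64.b64encode(audio_bytes).decode("ascii")

    # Optional fields are omitted when empty so they are left to auto-detection.
    if language:
        payload["language"] = language
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers
    if prompt:
        payload["prompt"] = prompt
    return payload


if __name__ == "__main__":
    # Example: two known speakers, English audio, a short vocabulary prompt.
    print(build_input(
        file_url="https://example.com/interview.wav",
        language="en",
        num_speakers=2,
        prompt="Whisper, diarization, Turbo.",
    ))
```

Leaving language and num_speakers out of the payload, rather than sending empty values, follows the "leave empty to auto-detect" wording of the parameter descriptions.
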
Version: 1495a9cddc83 | Updated: 2/26/2026 | 5.9M runs