nicknaskida/whisper-diarization

⚡️ Insanely fast audio transcription | Whisper large-v3 | speaker diarization | word- and sentence-level timestamps | prompt | hotwords. Fork of thomasmol/whisper-diarization that adds batched Whisper inference for a 3x-4x speedup 🚀

Capabilities

No capability data available

Cost

Community model. Cost is estimated from hardware time.

Input Parameters

batch_size integer

Batch size for inference. Reduce it if you hit an out-of-memory (OOM) error.

Default: 64 min: 1
file string

An audio file. Provide this, file_string, or file_url.

file_string string

A Base64-encoded audio file. Provide this, file, or file_url.

file_url string

A direct URL to an audio file. Provide this, file, or file_string.
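
The two text-based audio inputs can be prepared as in the sketch below. It assumes that exactly one audio source should be set per request, which the "either/or" wording suggests but the listing does not state outright. The helper names `encode_audio_bytes` and `build_file_input` are hypothetical and not part of the model's API.

```python
import base64


def encode_audio_bytes(data: bytes) -> str:
    """Return the Base64 text form of raw audio bytes (for file_string)."""
    return base64.b64encode(data).decode("ascii")


def build_file_input(*, file_string=None, file_url=None) -> dict:
    """Return a dict with exactly one audio source set.

    Assumption: the model expects a single source per request.
    """
    candidates = {"file_string": file_string, "file_url": file_url}
    provided = {key: value for key, value in candidates.items() if value}
    if len(provided) != 1:
        raise ValueError("provide exactly one of file_string or file_url")
    return provided
```

To send a local file as `file_string`, read it in binary mode and pass the bytes to `encode_audio_bytes`.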

group_segments boolean

Merge consecutive segments from the same speaker that are less than 2 seconds apart.

Default: true
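
The grouping rule can be illustrated with a minimal sketch. This is not the model's own code: the segment shape (`speaker`, `start`, `end`, `text` keys) and the function name `group_segments` used here are assumptions made for illustration.

```python
def group_segments(segments, max_gap=2.0):
    """Merge consecutive same-speaker segments closer than max_gap seconds.

    Assumes each segment is a dict with 'speaker', 'start', 'end', 'text'
    and that the list is sorted by start time.
    """
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        same_speaker = last is not None and last["speaker"] == seg["speaker"]
        if same_speaker and seg["start"] - last["end"] < max_gap:
            # Extend the previous segment instead of starting a new one.
            last["end"] = seg["end"]
            last["text"] = last["text"] + " " + seg["text"]
        else:
            merged.append(dict(seg))  # copy so the input is not mutated
    return merged
```

A change of speaker always starts a new segment, even when the gap is short.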
hf_token string

A Hugging Face access token (created at hf.co/settings/token) that pyannote.audio needs to diarize the audio clips. You must first accept the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0'.

language string

Language of the spoken words as a language code like 'en'. Leave empty to auto detect language.

num_speakers integer

Number of speakers. Leave empty to autodetect.

Default: 2, min: 1, max: 50
offset_seconds integer

Offset in seconds, used for chunked inputs

Default: 0 min: 0
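
When a long recording is split into fixed-length chunks, each chunk needs its own offset. The sketch below computes those offsets under the assumption that `offset_seconds` is the chunk's start position in the original recording. The listing only says the value is "used for chunked inputs", so how the model applies it is not confirmed here. The function name `chunk_offsets` is hypothetical.

```python
def chunk_offsets(total_seconds: int, chunk_seconds: int) -> list[int]:
    """Return the start offset, in whole seconds, of each chunk."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be >= 0")
    if chunk_seconds < 1:
        raise ValueError("chunk_seconds must be >= 1")
    return list(range(0, total_seconds, chunk_seconds))
```

For example, a 1500-second recording cut into 600-second chunks yields offsets 0, 600, and 1200, one per request.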
prompt string

Vocabulary: provide names, acronyms and loanwords in a list. Use punctuation for best accuracy.

transcript_output_format string

Specify the format of the transcript output: individual words with timestamps, full text of segments, or a combination of both.

Default: "both"
Options: words_only, segments_only, both
translate boolean

Translate the speech into English.

Default: false
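
Putting the parameters together, a request payload could be assembled and checked as in this sketch. The defaults and limits are taken from the listing above. The function name `build_input` and the choice to omit unset optional fields are assumptions, and the sketch deliberately stops short of calling any hosting API.

```python
ALLOWED_FORMATS = {"words_only", "segments_only", "both"}


def build_input(file_url, *, batch_size=64, num_speakers=None, language=None,
                prompt=None, transcript_output_format="both",
                translate=False, group_segments=True, offset_seconds=0):
    """Validate parameters against the listed limits and return a payload."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if num_speakers is not None and not 1 <= num_speakers <= 50:
        raise ValueError("num_speakers must be between 1 and 50")
    if offset_seconds < 0:
        raise ValueError("offset_seconds must be >= 0")
    if transcript_output_format not in ALLOWED_FORMATS:
        raise ValueError("unknown transcript_output_format")

    payload = {
        "file_url": file_url,
        "batch_size": batch_size,
        "transcript_output_format": transcript_output_format,
        "translate": translate,
        "group_segments": group_segments,
        "offset_seconds": offset_seconds,
    }
    # Optional fields are left out when unset so the model can autodetect.
    optional = {"num_speakers": num_speakers, "language": language, "prompt": prompt}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Lowering `batch_size` here is the first thing to try after an out-of-memory error.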
Version: c643440e783b Updated: 2/26/2026 451 runs