thomasmol/whisper-diarization

Official

⚡️ Blazing fast audio transcription with speaker diarization | Whisper Large V3 Turbo & pyannote 4.0 community-1 | word & sentence level timestamps | prompt

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`file`	string(uri)	An audio file to transcribe (alternative to `file_string` or `file_url`).	`—`	—
`file_string`	string	A Base64-encoded audio file (alternative to `file` or `file_url`).	`—`	—
`file_url`	string	A direct URL to an audio file (alternative to `file` or `file_string`).	`—`	—
`language`	string	Language of the spoken words as a language code such as 'en'. Leave empty to auto-detect the language.	`—`	—
`num_speakers`	integer	Number of speakers. Leave empty to auto-detect.	`—`	min: 1, max: 50
`prompt`	string	Vocabulary: provide names, acronyms and loanwords in a list. Use punctuation for best accuracy.	`—`	—
`translate`	boolean	Translate the speech into English.	`false`	—
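
As a rough illustration of how the parameters above fit together, here is a minimal Python sketch that assembles an input payload. The helper name `build_input` and its `file_bytes` argument are hypothetical and not part of the model's API. The rule that exactly one audio source is supplied is an assumption drawn from the "either/or" wording of the descriptions; the model page does not state it explicitly. Only the keys `file_string`, `file_url`, `language`, `num_speakers`, `prompt`, and `translate` come from the table.

```python
import base64
from typing import Any, Optional


def build_input(
    file_url: Optional[str] = None,
    file_bytes: Optional[bytes] = None,  # hypothetical: raw audio to send as Base64
    language: Optional[str] = None,
    num_speakers: Optional[int] = None,
    prompt: Optional[str] = None,
    translate: bool = False,
) -> dict[str, Any]:
    """Assemble an input payload using the parameter names from the table."""
    # Assumption: exactly one audio source should be supplied.
    if (file_url is None) == (file_bytes is None):
        raise ValueError("provide exactly one of file_url or file_bytes")
    # Constraint from the table: min 1, max 50.
    if num_speakers is not None and not 1 <= num_speakers <= 50:
        raise ValueError("num_speakers must be between 1 and 50")

    payload: dict[str, Any] = {"translate": translate}
    if file_url is not None:
        payload["file_url"] = file_url
    else:
        # `file_string` expects the audio as a Base64-encoded string.
        payload["file_string"] = base64.b64encode(file_bytes).decode("ascii")

    # Leave optional fields out entirely so the model can auto-detect them.
    if language:
        payload["language"] = language
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers
    if prompt:
        payload["prompt"] = prompt
    return payload


if __name__ == "__main__":
    example = build_input(
        file_url="https://example.com/meeting.wav",  # placeholder URL
        language="en",
        num_speakers=2,
        prompt="Replicate, pyannote, Whisper.",
    )
    print(example)
```

The resulting dictionary could then be passed as the `input` argument of the `replicate` Python client's `replicate.run(...)` call with the model identifier `thomasmol/whisper-diarization`. Depending on the client version, a full version hash suffix may also be needed.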

Version: 744c4f2bffae | Updated: 7/25/2026 | 8.5M runs