sabuhigr/sabuhi-model
Whisper AI with channel separation and speaker diarization
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| audio * | string (uri) | Audio file | — | — |
| hf_token * | string | Your Hugging Face token for speaker diarization | — | — |
| language * | string | Language spoken in the audio; specify None to perform language detection | — | af am ar as az ba be bg bn bo br bs ca cs cy da de el en es et eu fa fi fo fr gl gu ha haw he hi hr ht hu hy id is it ja jw ka kk km kn ko la lb ln lo lt lv mg mi mk ml mn mr ms mt my ne nl nn no oc pa pl ps pt ro ru sa sd si sk sl sn so sq sr su sv sw ta te tg th tk tl tr tt uk ur uz vi yi yo zh Afrikaans Albanian Amharic Arabic Armenian Assamese Azerbaijani Bashkir Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Castilian Catalan Chinese Croatian Czech Danish Dutch English Estonian Faroese Finnish Flemish French Galician Georgian German Greek Gujarati Haitian Haitian Creole Hausa Hawaiian Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Javanese Kannada Kazakh Khmer Korean Lao Latin Latvian Letzeburgesch Lingala Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Maori Marathi Moldavian Moldovan Mongolian Myanmar Nepali Norwegian Nynorsk Occitan Panjabi Pashto Persian Polish Portuguese Punjabi Pushto Romanian Russian Sanskrit Serbian Shona Sindhi Sinhala Sinhalese Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tagalog Tajik Tamil Tatar Telugu Thai Tibetan Turkish Turkmen Ukrainian Urdu Uzbek Valencian Vietnamese Welsh Yiddish Yoruba |
| compression_ratio_threshold | number | If the gzip compression ratio is higher than this value, treat the decoding as failed | 2.4 | — |
| condition_on_previous_text | boolean | If True, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop | true | — |
| initial_prompt | string | Optional text to provide as a prompt for the first window | — | — |
| logprob_threshold | number | If the average log probability is lower than this value, treat the decoding as failed | -1 | — |
| max_speakers | integer | Select 2 if the recording is stereo, 1 if it is mono. The default is 1, for mono recordings | 1 | 1, 2 |
| min_speakers | integer | Select 2 if the recording is stereo, 1 if it is mono. The default is 1, for mono recordings | 1 | 1, 2 |
| model | string | Choose a Whisper model | "large-v2" | large, large-v2 |
| no_speech_threshold | number | If the probability of the `<\|nospeech\|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence | 0.6 | — |
| patience | number | Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search | — | — |
| suppress_tokens | string | Comma-separated list of token ids to suppress during sampling; '-1' suppresses most special characters except common punctuation | "-1" | — |
| temperature | number | Temperature to use for sampling | 0 | — |
| temperature_increment_on_fallback | number | Amount by which to raise the temperature when falling back after the decoding fails to meet either `compression_ratio_threshold` or `logprob_threshold` | 0.2 | — |
| transcription | string | Choose the format for the transcription | "plain text" | plain text, srt, vtt |
| translate | boolean | Translate the text to English when set to True | false | — |
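
The table above can be turned into a small input builder. The following is a minimal sketch, not the model's own code: `build_input`, `DEFAULTS`, and the other names are hypothetical helpers introduced here for illustration. The defaults and allowed values are copied from the table; the check that `min_speakers` does not exceed `max_speakers` is an assumption, and `language` is not validated against the full list to keep the example short.

```python
from typing import Any, Dict

# Allowed values, copied from the Constraints column of the table.
ALLOWED_MODELS = {"large", "large-v2"}
ALLOWED_TRANSCRIPTIONS = {"plain text", "srt", "vtt"}
ALLOWED_SPEAKERS = {1, 2}

# Defaults, copied from the Default column. Parameters with no listed
# default (initial_prompt, patience) are left out unless the caller sets them.
DEFAULTS: Dict[str, Any] = {
    "compression_ratio_threshold": 2.4,
    "condition_on_previous_text": True,
    "logprob_threshold": -1,
    "max_speakers": 1,
    "min_speakers": 1,
    "model": "large-v2",
    "no_speech_threshold": 0.6,
    "suppress_tokens": "-1",
    "temperature": 0,
    "temperature_increment_on_fallback": 0.2,
    "transcription": "plain text",
    "translate": False,
}
OPTIONAL_WITHOUT_DEFAULT = {"initial_prompt", "patience"}


def build_input(audio: str, hf_token: str, language: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults and validate."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides,
               "audio": audio, "hf_token": hf_token, "language": language}

    # The three parameters marked with * in the table are required.
    for name in ("audio", "hf_token", "language"):
        if not payload[name]:
            raise ValueError(f"{name} is required")

    if payload["model"] not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if payload["transcription"] not in ALLOWED_TRANSCRIPTIONS:
        raise ValueError(f"transcription must be one of {sorted(ALLOWED_TRANSCRIPTIONS)}")
    for name in ("min_speakers", "max_speakers"):
        if payload[name] not in ALLOWED_SPEAKERS:
            raise ValueError(f"{name} must be 1 or 2")
    # Assumption (not stated in the table): min_speakers should not exceed max_speakers.
    if payload["min_speakers"] > payload["max_speakers"]:
        raise ValueError("min_speakers must not exceed max_speakers")
    return payload


# Example: a stereo recording transcribed to SRT; URL and token are placeholders.
stereo = build_input("https://example.com/call.wav", "hf_placeholder", "en",
                     min_speakers=2, max_speakers=2, transcription="srt")
```

With the Replicate Python client, a dict like `stereo` would typically be passed as the `input` argument of `replicate.run`; the full model reference string is not reproduced here.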
29b6421db707 Updated: 6/8/2026 25.5K runs