shreejalmaharjan-27/tiktok-short-captions

Generate TikTok-style captions powered by Whisper (GPU)

Cost

Community model (estimated from hardware time)

Input Parameters

video required string

Video Path

caption_size integer

The maximum number of words to generate in each window

Default: 30
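
One way to picture `caption_size` is as a cap on the number of words shown per caption window. The sketch below is only an illustration of that idea: `chunk_words` is a hypothetical helper, and the model's actual grouping (which presumably also uses word timestamps) is not documented here.

```python
def chunk_words(words, caption_size=30):
    """Split words into consecutive windows of at most caption_size words."""
    if caption_size < 1:
        raise ValueError("caption_size must be at least 1")
    # Step through the list caption_size words at a time; the last window may be shorter.
    return [words[i:i + caption_size] for i in range(0, len(words), caption_size)]


windows = chunk_words("the quick brown fox jumps over the lazy dog".split(), caption_size=4)
```

With `caption_size=4`, the nine words are split into windows of 4, 4, and 1 words.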
compression_ratio_threshold number

if the gzip compression ratio is higher than this value, treat the decoding as failed

Default: 2.4
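
The compression ratio is a cheap detector for degenerate, repetitive output: text that repeats itself compresses very well, so its ratio is high. The sketch below follows the approach of the open-source Whisper reference code, which uses zlib (the same DEFLATE algorithm as gzip); that this model computes the ratio in exactly the same way is an assumption.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


# A transcript stuck in a loop compresses far better than ordinary speech.
repetitive = "thank you " * 50
varied = "The quick brown fox jumps over the lazy dog near the riverbank."

looks_failed = compression_ratio(repetitive) > 2.4  # compare against the default threshold
```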
condition_on_previous_text boolean

if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop

Default: true
highlight_color string

The color of the highlight for the captioned text

Default: "#39E508"
initial_prompt string

optional text to provide as a prompt for the first window.

language string

Language spoken in the audio; specify 'auto' for automatic language detection

Default: "auto"
Options (codes): auto, af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh
Options (names): Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Letzeburgesch, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Mandarin, Maori, Marathi, Moldavian, Moldovan, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Sinhalese, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Valencian, Vietnamese, Welsh, Yiddish, Yoruba
logprob_threshold number

if the average log probability is lower than this value, treat the decoding as failed

Default: -1
model string

Whisper model size (currently only large-v3 is supported).

Default: "large-v3"
Options: large-v3
no_speech_threshold number

if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence

Default: 0.6
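
The silence rule combines two conditions, which is easy to misread. A minimal sketch of the decision as described above, using the listed defaults; `is_silence` is a hypothetical name, and the exact comparison operators the model uses are an assumption.

```python
def is_silence(no_speech_prob, avg_logprob,
               no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Treat a segment as silence only if BOTH conditions hold:
    the no-speech probability is above its threshold, and decoding
    failed the average log-probability check."""
    decoding_failed = avg_logprob < logprob_threshold
    return no_speech_prob > no_speech_threshold and decoding_failed


confident_speech = is_silence(no_speech_prob=0.9, avg_logprob=-0.3)  # decoding was fine
real_silence = is_silence(no_speech_prob=0.9, avg_logprob=-1.5)      # both conditions met
```

A high no-speech probability alone is not enough: if the decoded text is confident, the segment is kept.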
patience number

optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search

suppress_tokens string

comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation

Default: "-1"
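
Because `suppress_tokens` is passed as a string rather than a list, it has to be parsed into integers somewhere. A small sketch of how such a value could be parsed; `parse_suppress_tokens` is a hypothetical helper, not part of the model's code.

```python
def parse_suppress_tokens(value: str) -> list[int]:
    """Turn a comma-separated string such as "-1" or "50, 51" into a list of ints."""
    # Skip empty pieces so that "" or a trailing comma does not raise.
    return [int(piece) for piece in value.split(",") if piece.strip()]


default_tokens = parse_suppress_tokens("-1")
custom_tokens = parse_suppress_tokens("50, 51,52")
```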
temperature number

temperature to use for sampling

Default: 0
temperature_increment_on_fallback number

amount by which to raise the temperature on each fallback, when decoding fails to meet either `compression_ratio_threshold` or `logprob_threshold`

Default: 0.2
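
With the defaults, fallback means retrying a failed window at successively higher temperatures. The sketch below builds that retry schedule the way the open-source Whisper reference code does, stepping from `temperature` up to 1.0; the cap at 1.0 and the name `temperature_schedule` are assumptions for illustration.

```python
def temperature_schedule(temperature=0.0, increment=0.2, maximum=1.0):
    """List the temperatures tried in order, from the start value up to maximum."""
    if increment <= 0:
        # No fallback: decode once at the given temperature.
        return [temperature]
    temps = []
    step = 0
    while True:
        # Round to hide floating-point noise such as 0.6000000000000001.
        t = round(temperature + step * increment, 10)
        if t > maximum + 1e-6:
            break
        temps.append(t)
        step += 1
    return temps


schedule = temperature_schedule()  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Decoding starts at the first value and only moves to the next when a threshold check fails.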
Version: 46bf1c12c77a, Updated: 2/26/2026, 232.4K runs
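
To tie the parameters together, here is a sketch that assembles an input payload from the defaults listed above. `build_input`, `DEFAULTS`, and the `clip.mp4` path are hypothetical; how the payload is submitted depends on the client you use and is not shown.

```python
# Defaults copied from the parameter list above. initial_prompt and patience
# have no listed default, so they are only included when supplied.
DEFAULTS = {
    "caption_size": 30,
    "compression_ratio_threshold": 2.4,
    "condition_on_previous_text": True,
    "highlight_color": "#39E508",
    "language": "auto",
    "logprob_threshold": -1,
    "model": "large-v3",
    "no_speech_threshold": 0.6,
    "suppress_tokens": "-1",
    "temperature": 0,
    "temperature_increment_on_fallback": 0.2,
}
OPTIONAL = {"initial_prompt", "patience"}


def build_input(video, **overrides):
    """Merge user overrides into the defaults, rejecting unknown parameter names."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"video": video, **DEFAULTS, **overrides}


# Hypothetical example: short five-word captions for an English clip.
payload = build_input("clip.mp4", caption_size=5, language="en")
```

Rejecting unknown names early catches typos such as `captions_size` before a paid GPU run starts.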