shreejalmaharjan-27/tiktok-short-captions

Generate TikTok-style captions powered by Whisper (GPU)

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`video`*	string(uri)	Video Path	`—`	—
`caption_size`	integer	The maximum number of words to generate in each window	`30`	—
`compression_ratio_threshold`	number	if the gzip compression ratio is higher than this value, treat the decoding as failed	`2.4`	—
`condition_on_previous_text`	boolean	if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop	`true`	—
`highlight_color`	string	The color of the highlight for the captioned text	`"#39E508"`	—
`initial_prompt`	string	optional text to provide as a prompt for the first window.	`—`	—
`language`	string	Language spoken in the audio, specify 'auto' for automatic language detection	`"auto"`	Codes: auto, af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh. Names: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Letzeburgesch, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Mandarin, Maori, Marathi, Moldavian, Moldovan, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Sinhalese, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Valencian, Vietnamese, Welsh, Yiddish, Yoruba
`logprob_threshold`	number	if the average log probability is lower than this value, treat the decoding as failed	`-1`	—
`model`	string	Whisper model size (currently only large-v3 is supported).	`"large-v3"`	large-v3
`no_speech_threshold`	number	if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence	`0.6`	—
`patience`	number	optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424, the default (1.0) is equivalent to conventional beam search	`—`	—
`suppress_tokens`	string	comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation	`"-1"`	—
`temperature`	number	temperature to use for sampling	`0`	—
`temperature_increment_on_fallback`	number	amount by which to increase the temperature on each fallback, when decoding fails to meet either the `compression_ratio_threshold` or the `logprob_threshold`	`0.2`	—
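
As an illustration of how the table above translates into a request, here is a minimal sketch that merges caller overrides onto the documented defaults. The helper name `build_input` and the example URL are hypothetical; how the resulting dictionary is submitted depends on the hosting platform's client and is not shown.

```python
# Defaults copied from the parameter table; `video` has no default,
# and `initial_prompt` / `patience` are optional with no listed default.
DEFAULTS = {
    "caption_size": 30,
    "compression_ratio_threshold": 2.4,
    "condition_on_previous_text": True,
    "highlight_color": "#39E508",
    "language": "auto",
    "logprob_threshold": -1,
    "model": "large-v3",
    "no_speech_threshold": 0.6,
    "suppress_tokens": "-1",
    "temperature": 0,
    "temperature_increment_on_fallback": 0.2,
}
OPTIONAL_WITHOUT_DEFAULT = {"initial_prompt", "patience"}


def build_input(video: str, **overrides) -> dict:
    """Hypothetical helper: return an input payload for the model."""
    if not video:
        raise ValueError("`video` is required")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"video": video, **DEFAULTS, **overrides}
    if payload["model"] != "large-v3":
        raise ValueError("only large-v3 is listed as supported")
    return payload


payload = build_input("https://example.com/clip.mp4", caption_size=5, language="en")
```

Rejecting unknown keys early is a design choice of this sketch, meant to catch misspelled parameter names before a paid run starts.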

video (required, string)

Video Path

caption_size (integer)

The maximum number of words to generate in each window

Default: 30
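
To make the effect of `caption_size` concrete, the sketch below splits a word list into consecutive groups of at most `caption_size` words. This is only an assumption about the grouping: the page does not document whether the model also breaks captions on pauses, punctuation, or timestamps, and `group_words` is a hypothetical name.

```python
def group_words(words: list[str], caption_size: int = 30) -> list[list[str]]:
    """Split words into consecutive groups of at most `caption_size` words."""
    if caption_size < 1:
        raise ValueError("caption_size must be at least 1")
    return [words[i:i + caption_size] for i in range(0, len(words), caption_size)]


groups = group_words("never gonna give you up".split(), caption_size=2)
```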

compression_ratio_threshold (number)

if the gzip compression ratio is higher than this value, treat the decoding as failed

Default: 2.4
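
The compression-ratio check can be sketched as follows. Highly repetitive output compresses very well, so a high ratio is a sign that decoding got stuck in a loop. This sketch uses `zlib` (the same DEFLATE algorithm that gzip uses), which is what the reference Whisper code does to my knowledge; whether this model computes the ratio the same way is an assumption, and `looks_degenerate` is a hypothetical name.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_degenerate(text: str, threshold: float = 2.4) -> bool:
    """True when the text is repetitive enough to exceed the threshold."""
    return compression_ratio(text) > threshold


repetitive = "thank you " * 50
ordinary = "The quick brown fox jumps over the lazy dog."
```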

condition_on_previous_text (boolean)

if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop

Default: true

highlight_color (string)

The color of the highlight for the captioned text

Default: "#39E508"

initial_prompt (string)

optional text to provide as a prompt for the first window.

language (string)

Language spoken in the audio, specify 'auto' for automatic language detection

Default: "auto"

Allowed codes: auto, af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, yue, zh

Allowed names: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Letzeburgesch, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Mandarin, Maori, Marathi, Moldavian, Moldovan, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Sinhalese, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Valencian, Vietnamese, Welsh, Yiddish, Yoruba

logprob_threshold (number)

if the average log probability is lower than this value, treat the decoding as failed

Default: -1

model (string)

Whisper model size (currently only large-v3 is supported).

Default: "large-v3"

Allowed values: large-v3

no_speech_threshold (number)

if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence

Default: 0.6
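
The two-part condition described above can be written as a small predicate. This is a sketch of the rule as stated, not the model's actual code; `is_silence` is a hypothetical name, and the per-segment values `no_speech_prob` and `avg_logprob` are assumed to come from the decoder.

```python
def is_silence(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: float = 0.6,
    logprob_threshold: float = -1.0,
) -> bool:
    """A segment counts as silence only when BOTH conditions hold."""
    probably_no_speech = no_speech_prob > no_speech_threshold
    decoding_failed = avg_logprob < logprob_threshold
    return probably_no_speech and decoding_failed
```

Requiring both conditions means a confidently decoded segment is kept even when the no-speech probability is high.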

patience (number)

optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424, the default (1.0) is equivalent to conventional beam search

suppress_tokens (string)

comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation

Default: "-1"
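
Because this parameter is passed as a string rather than a list, the value has to be split into integers at some point. A minimal sketch of that parsing, with the hypothetical name `parse_suppress_tokens`; how the model itself parses the string is not documented here.

```python
def parse_suppress_tokens(value: str) -> list[int]:
    """Turn a string such as "-1" or "50256, 50257" into a list of token ids."""
    return [int(token) for token in value.split(",") if token.strip()]
```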

temperature (number)

temperature to use for sampling

Default: 0

temperature_increment_on_fallback (number)

amount by which to increase the temperature on each fallback, when decoding fails to meet either the `compression_ratio_threshold` or the `logprob_threshold`

Default: 0.2
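
Together, `temperature` and `temperature_increment_on_fallback` define the sequence of temperatures tried when decoding fails. The sketch below assumes the schedule stops at 1.0, as the reference Whisper command-line tool does to my knowledge; this model may use a different upper bound, and `temperature_schedule` is a hypothetical name.

```python
def temperature_schedule(
    start: float = 0.0, increment: float = 0.2, maximum: float = 1.0
) -> list[float]:
    """List the temperatures tried in order, from `start` up to `maximum`."""
    if increment <= 0:
        return [start]  # no fallback: a single attempt at `start`
    temperatures = []
    step = 0
    while start + step * increment <= maximum + 1e-6:
        # Round to hide floating-point noise such as 0.6000000000000001.
        temperatures.append(round(start + step * increment, 6))
        step += 1
    return temperatures


schedule = temperature_schedule()
```

With the defaults, decoding starts greedy at temperature 0 and is retried with progressively more randomness only if a threshold check fails.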

Version: 46bf1c12c77a
Updated: 7/25/2026
232.4K runs