cjwbw/controlvideo

Training-free Controllable Text-to-Video Generation

Capabilities: Seed


| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| `video_path` (required) | string (URI) | Source video | none | none |
| `condition` | string | Type of structure sequence used as the control condition | `"depth"` | One of `depth`, `canny`, `pose` |
| `guidance_scale` | number | Scale for classifier-free guidance | `12.5` | min: 1, max: 20 |
| `is_long_video` | boolean | Whether to use the hierarchical sampler to produce a long video | `false` | none |
| `num_inference_steps` | integer | Number of denoising steps | `50` | none |
| `prompt` | string | Text description of the target video | `"A striking mallard floats effortlessly on the sparkling pond."` | none |
| `seed` | string | Random seed; leave blank to randomize | none | none |
| `smoother_steps` | string | Comma-separated timesteps at which to apply the interleaved-frame smoother | `"19, 20"` | none |
| `video_length` | integer | Length of the synthesized video, in frames | `15` | none |
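The parameters above map directly onto a key/value input object. The sketch below, assuming Python 3.9+, assembles that object and checks it against the documented constraints before anything is submitted. The helper names `build_input` and `parse_smoother_steps` are hypothetical and not part of the model or any client library; only the parameter names, defaults, and limits come from the table. How the dict is actually sent (for example through Replicate's API) is intentionally left out.

```python
from typing import Optional

# Allowed values and limits, taken from the parameter table.
CONDITIONS = {"depth", "canny", "pose"}
GUIDANCE_MIN, GUIDANCE_MAX = 1.0, 20.0


def parse_smoother_steps(value: str) -> list[int]:
    """Hypothetical helper: turn a comma-separated string such as "19, 20" into [19, 20]."""
    # int() tolerates surrounding whitespace; empty pieces (e.g. a trailing comma) are skipped.
    return [int(part) for part in value.split(",") if part.strip()]


def build_input(
    video_path: str,
    prompt: str = "A striking mallard floats effortlessly on the sparkling pond.",
    condition: str = "depth",
    guidance_scale: float = 12.5,
    num_inference_steps: int = 50,
    video_length: int = 15,
    smoother_steps: str = "19, 20",
    is_long_video: bool = False,
    seed: Optional[int] = None,
) -> dict:
    """Hypothetical helper: validate arguments and return an input dict for the model."""
    if not video_path:
        raise ValueError("video_path is required")
    if condition not in CONDITIONS:
        raise ValueError(f"condition must be one of {sorted(CONDITIONS)}, got {condition!r}")
    if not GUIDANCE_MIN <= guidance_scale <= GUIDANCE_MAX:
        raise ValueError("guidance_scale must be between 1 and 20")
    # Raises ValueError if any smoother timestep is not an integer.
    parse_smoother_steps(smoother_steps)

    payload = {
        "video_path": video_path,
        "prompt": prompt,
        "condition": condition,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
        "video_length": video_length,
        "smoother_steps": smoother_steps,
        "is_long_video": is_long_video,
    }
    # The table types seed as a string. Omitting the key is assumed to be
    # equivalent to leaving it blank, which lets the model randomize the seed.
    if seed is not None:
        payload["seed"] = str(seed)
    return payload
```

For example, `build_input("https://example.com/input.mp4", condition="canny", seed=42)` yields a dict whose `seed` is the string `"42"`, matching the table's type for that field. Validating on the client side only makes obviously invalid requests fail before a prediction is created; the model itself may enforce limits that the table does not list.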


Version: 91710b3f53c9, updated 7/25/2026, 2.4K runs