mirelo/video-to-sfx-v1

Generates synchronized sound effects for any video and returns the video with its new soundtrack.

Name	Type	Description	Default	Constraints
`video_path`*	string(uri)	Video file to process for sound effects. Videos longer than 10 seconds are trimmed to 10 seconds.	`—`	—
`creativity_coef`	number	Controls how creative the generated sound is. Higher values are more creative.	`4.5`	min: 1, max: 10
`duration`	integer	Duration of the generated sound effects in seconds.	`10`	min: 1, max: 10
`num_samples`	integer	Number of sound effects to generate. Each sample will be a different variation.	`2`	min: 1, max: 4
`seed`	integer	Random seed for reproducibility. Leave blank (None) or set to -1 for a random seed; any other integer gives deterministic results.	`—`	—
`start_offset`	number	Starting point in the video (in seconds) from which to generate audio. 0 means start from the beginning.	`0`	min: 0, max: 300
`steps`	integer	Number of processing steps for the generation model. Higher values may improve quality but take longer.	`25`	min: 1, max: 30
`text_prompt`	string	Optional text prompt to guide sound effect generation.	`""`	—
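
The defaults and ranges in the table can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the model's own tooling: `build_input`, `LIMITS`, `DEFAULTS`, and `INTEGER_FIELDS` are hypothetical names introduced here, and the example URL is a placeholder. It assumes the hosting platform accepts the parameters as a flat dictionary keyed by the names above.

```python
from typing import Any, Dict, Optional

# Ranges copied from the parameter table: name -> (min, max).
LIMITS = {
    "creativity_coef": (1, 10),
    "duration": (1, 10),
    "num_samples": (1, 4),
    "start_offset": (0, 300),
    "steps": (1, 30),
}

# Defaults copied from the parameter table.
DEFAULTS: Dict[str, Any] = {
    "creativity_coef": 4.5,
    "duration": 10,
    "num_samples": 2,
    "start_offset": 0,
    "steps": 25,
    "text_prompt": "",
}

# Parameters the table types as integer.
INTEGER_FIELDS = ("duration", "num_samples", "steps")


def build_input(video_path: str, seed: Optional[int] = None, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults and validate them."""
    if not video_path:
        raise ValueError("video_path is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides, "video_path": video_path}

    for name in INTEGER_FIELDS:
        value = payload[name]
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
    for name, (low, high) in LIMITS.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # The table treats a blank seed and -1 as "random", so only explicit seeds are sent.
    if seed is not None and seed != -1:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    # Placeholder URL; replace with a real video location.
    print(build_input("https://example.com/clip.mp4", seed=42, steps=30,
                      text_prompt="footsteps on gravel"))
```

The returned dictionary can then be passed as the input payload to whichever client the hosting platform provides. Omitting `seed` when it is blank or -1 is a design choice of this sketch; sending -1 explicitly should be equivalent according to the table.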

Version: 34ea7bf892b6, Updated: 7/25/2026, 5.4K runs