mirelo/video-to-sfx-v1
Generate synchronized sound effects for any video and return the video with its new soundtrack.
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| video_path (required) | string (uri) | Video file to add sound effects to. Videos longer than 10 seconds are trimmed to 10 seconds. | — | — |
| creativity_coef | number | Controls how creative the generated sound is. Higher values are more creative. | 4.5 | min: 1, max: 10 |
| duration | integer | Duration of the generated sound effects, in seconds. | 10 | min: 1, max: 10 |
| num_samples | integer | Number of sound effects to generate. Each sample is a different variation. | 2 | min: 1, max: 4 |
| seed | integer | Random seed for reproducibility. Leave blank (None) or use -1 for a random seed; use any other integer for deterministic results. | — | — |
| start_offset | number | Point in the video, in seconds, from which to start generating audio. 0 starts from the beginning. | 0 | min: 0, max: 300 |
| steps | integer | Number of processing steps for the generation model. More steps may improve quality but take longer. | 25 | min: 1, max: 30 |
| text_prompt | string | Optional text prompt to guide sound effect generation. | "" | — |
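As a sketch of how the defaults and ranges in the table fit together, the Python helper below assembles an input dictionary for a request to this model. The function name `build_sfx_input`, its error handling, and the choice to omit `seed` entirely when a random seed is requested are illustrative assumptions; they are not part of any official client for mirelo/video-to-sfx-v1, and the helper does not send anything to the model.

```python
"""Hypothetical helper that assembles an input payload for
mirelo/video-to-sfx-v1 from the parameter table above. The name and the
validation behaviour are assumptions, not an official SDK."""

from typing import Any, Dict, Optional

# (default, min, max) for each numeric parameter, copied from the table.
NUMERIC_PARAMS: Dict[str, tuple] = {
    "creativity_coef": (4.5, 1, 10),
    "duration": (10, 1, 10),
    "num_samples": (2, 1, 4),
    "start_offset": (0, 0, 300),
    "steps": (25, 1, 30),
}
# The table types these parameters as integers rather than numbers.
INTEGER_PARAMS = {"duration", "num_samples", "steps"}


def build_sfx_input(
    video_path: str,
    text_prompt: str = "",
    seed: Optional[int] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Return an input dict with table defaults filled in and ranges checked."""
    if not video_path:
        raise ValueError("video_path is required")
    unknown = set(overrides) - set(NUMERIC_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload: Dict[str, Any] = {"video_path": video_path, "text_prompt": text_prompt}
    for name, (default, lo, hi) in NUMERIC_PARAMS.items():
        value = overrides.get(name, default)
        if name in INTEGER_PARAMS and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
        payload[name] = value

    # The table says blank (None) and -1 both mean "random seed"; this sketch
    # assumes omitting the key is equivalent and only sends a fixed seed.
    if seed is not None and seed != -1:
        payload["seed"] = seed
    return payload
```

For example, `build_sfx_input("https://example.com/clip.mp4", num_samples=3, seed=42)` (a placeholder URL) returns a dict with `num_samples` set to 3, `seed` set to 42, and the remaining defaults from the table filled in, while `steps=50` raises a `ValueError` because the table caps `steps` at 30. Checking ranges on the client side only catches mistakes early; the model's own validation remains authoritative.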
34ea7bf892b6 · Updated: 6/26/2026 · 5.4K runs