zsxkib/thinksound
Official
Generate contextual audio from video using step-by-step reasoning 🎶
Capabilities
Seed
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| video * | string (uri) | Input video file (supports various formats) | — | — |
| caption | string | Caption/title describing the video content (optional) | "" | — |
| cfg_scale | number | Classifier-free guidance scale. Higher values follow conditioning more closely but may reduce creativity | 5 | min: 1, max: 20 |
| cot | string | Chain-of-Thought description providing detailed reasoning about the desired audio (optional) | "" | — |
| num_inference_steps | integer | Number of diffusion denoising steps. More steps = higher quality but slower generation | 24 | min: 10, max: 100 |
| seed | integer | Random seed for reproducible outputs. Leave empty for random seed | — | — |
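
The defaults and constraints in the table can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the model's own tooling: the helper name `build_thinksound_input` and its error messages are hypothetical, and only the parameter names, defaults, and ranges are taken from the table above.

```python
from typing import Any, Dict, Optional

# Ranges copied from the "Constraints" column of the parameter table.
CFG_SCALE_RANGE = (1, 20)
STEPS_RANGE = (10, 100)


def build_thinksound_input(
    video: str,
    caption: str = "",
    cot: str = "",
    cfg_scale: float = 5,
    num_inference_steps: int = 24,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate values and return an input payload."""
    # "video" is the only required parameter (marked * in the table).
    if not video:
        raise ValueError("video is required (a URI string)")

    # cfg_scale: min 1, max 20.
    if not CFG_SCALE_RANGE[0] <= cfg_scale <= CFG_SCALE_RANGE[1]:
        raise ValueError("cfg_scale must be between 1 and 20")

    # num_inference_steps: integer, min 10, max 100.
    if isinstance(num_inference_steps, bool) or not isinstance(num_inference_steps, int):
        raise ValueError("num_inference_steps must be an integer")
    if not STEPS_RANGE[0] <= num_inference_steps <= STEPS_RANGE[1]:
        raise ValueError("num_inference_steps must be between 10 and 100")

    payload: Dict[str, Any] = {
        "video": video,
        "caption": caption,
        "cot": cot,
        "cfg_scale": cfg_scale,
        "num_inference_steps": num_inference_steps,
    }
    # Omit "seed" entirely when unset so the model picks a random one.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

The returned dict could then be passed as the `input=` argument to `replicate.run(...)` from the `replicate` Python client, assuming that client is installed, an API token is configured, and a model reference that resolves for `zsxkib/thinksound` is supplied. Setting `seed` to a fixed integer is what makes repeated runs reproducible.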
Version: 40d08f9f569e
Updated: 2/26/2026
8.3K runs
cinemasetfree.com