zsxkib/thinksound

Generate contextual audio from video using step-by-step reasoning 🎶

Capabilities

Seed

Cost

Community model (estimated from hardware time)

Input Parameters

video required string

Input video file (supports various formats)

caption string

Caption/title describing the video content (optional)

Default: ""

cfg_scale number

Classifier-free guidance scale. Higher values make the output follow the conditioning more closely but may reduce creativity.

Default: 5, min: 1, max: 20

cot string

Chain-of-Thought description providing detailed reasoning about the desired audio (optional)

Default: ""

num_inference_steps integer

Number of diffusion denoising steps. More steps give higher quality but slower generation.

Default: 24, min: 10, max: 100

seed integer

Random seed for reproducible outputs. Leave empty to use a random seed.
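
As a rough illustration of how the parameters above fit together, here is a minimal Python sketch that assembles an input payload and checks it against the documented defaults and ranges. The helper name `build_thinksound_input` and the dict layout are assumptions made for this example, not part of the model's published interface. The sketch makes no network calls, and how the video itself is supplied (URL, upload, or file handle) is not stated on this page.

```python
from typing import Optional

# Ranges copied from the parameter list above.
CFG_SCALE_RANGE = (1, 20)
STEPS_RANGE = (10, 100)


def build_thinksound_input(
    video: str,
    caption: str = "",
    cot: str = "",
    cfg_scale: float = 5,
    num_inference_steps: int = 24,
    seed: Optional[int] = None,
) -> dict:
    """Hypothetical helper: validate values and return an input dict."""
    if not video:
        raise ValueError("video is required")

    lo, hi = CFG_SCALE_RANGE
    if not lo <= cfg_scale <= hi:
        raise ValueError(f"cfg_scale must be between {lo} and {hi}")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_inference_steps, bool) or not isinstance(num_inference_steps, int):
        raise TypeError("num_inference_steps must be an integer")
    lo, hi = STEPS_RANGE
    if not lo <= num_inference_steps <= hi:
        raise ValueError(f"num_inference_steps must be between {lo} and {hi}")

    payload = {
        "video": video,
        "caption": caption,
        "cot": cot,
        "cfg_scale": cfg_scale,
        "num_inference_steps": num_inference_steps,
    }
    # Omit the seed entirely when unset, so a random seed is used.
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    # "clip.mp4" is a placeholder; the caption and cot strings are made up.
    print(build_thinksound_input(
        "clip.mp4",
        caption="Rain on a tin roof",
        cot="Steady rainfall with occasional distant thunder.",
        seed=42,
    ))
```

Omitting `seed` rather than sending an empty value mirrors the "leave empty" wording in the parameter description. Whether a given client expects the key to be absent or null is an assumption to verify against that client.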

Version: 40d08f9f569e · Updated: 2/26/2026 · 8.3K runs