
zsxkib/multitalk

Audio-driven multi-person conversational video generation. Upload one or two audio files and a reference image to create a realistic conversation between the people in the image.

Capabilities

Reference Images, Seed

Cost

Community model (estimated from hardware time)

Input Parameters

first_audio required string

First audio file for driving the conversation

image required string

Reference image containing the person(s) for video generation

num_frames integer

Number of frames to generate (automatically adjusted to the nearest valid value of the form 4n+1, e.g., 81 or 181)

Default: 81, min: 25, max: 201
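
The 4n+1 adjustment described above can be sketched as follows. This is a minimal illustration, not the model's actual code: the helper name `nearest_valid_num_frames` is hypothetical, and both the tie-breaking rule (round up when a request sits exactly between two valid values) and the clamping to the documented 25 to 201 range are assumptions, since the listing only says "nearest valid value".

```python
def nearest_valid_num_frames(requested: int, lo: int = 25, hi: int = 201) -> int:
    """Snap a requested frame count to the nearest value of the form 4n+1.

    Hypothetical helper: the listing does not say how ties are broken,
    so this sketch rounds ties upward and clamps to the documented range.
    """
    # Clamp to the documented min/max first (both are already of the form 4n+1).
    clamped = max(lo, min(hi, requested))
    # Integer rounding of (clamped - 1) / 4, with ties rounded up.
    n = (clamped - 1 + 2) // 4
    return 4 * n + 1


for requested in (80, 81, 82, 100, 10, 500):
    print(requested, "->", nearest_valid_num_frames(requested))
# 80 -> 81
# 81 -> 81
# 82 -> 81
# 100 -> 101
# 10 -> 25
# 500 -> 201
```

Passing a value that already has the form 4n+1 avoids any surprise about the final clip length.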
prompt string

Text prompt describing the desired interaction or conversation scenario

Default: "A smiling man and woman wearing headphones sit in front of microphones, appearing to host a podcast."
sampling_steps integer

Number of sampling steps (higher = better quality, lower = faster)

Default: 40, min: 2, max: 100
second_audio string

Second audio file for multi-person conversation (optional)

seed integer

Random seed for reproducible results

turbo boolean

Enable turbo mode optimizations (adjusts thresholds and guidance scales for speed)

Default: true
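
As an illustration of how the parameters above fit together, the sketch below assembles an input payload and checks it against the documented ranges before submission. The function name `build_multitalk_input` is hypothetical, and the example URLs are placeholders. The listing only types the file inputs as strings, so treating them as URLs is an assumption, and how the payload is actually submitted depends on the hosting platform's client, which is not shown here.

```python
from typing import Optional

# Ranges and defaults copied from the parameter list above.
NUM_FRAMES_RANGE = (25, 201)
SAMPLING_STEPS_RANGE = (2, 100)


def build_multitalk_input(
    first_audio: str,
    image: str,
    *,
    second_audio: Optional[str] = None,
    prompt: Optional[str] = None,
    num_frames: int = 81,
    sampling_steps: int = 40,
    seed: Optional[int] = None,
    turbo: bool = True,
) -> dict:
    """Build an input dict for the model (hypothetical helper, not an official API)."""
    if not first_audio:
        raise ValueError("first_audio is required")
    if not image:
        raise ValueError("image is required")
    if not NUM_FRAMES_RANGE[0] <= num_frames <= NUM_FRAMES_RANGE[1]:
        raise ValueError(f"num_frames must be within {NUM_FRAMES_RANGE}")
    if not SAMPLING_STEPS_RANGE[0] <= sampling_steps <= SAMPLING_STEPS_RANGE[1]:
        raise ValueError(f"sampling_steps must be within {SAMPLING_STEPS_RANGE}")

    payload = {
        "first_audio": first_audio,
        "image": image,
        "num_frames": num_frames,
        "sampling_steps": sampling_steps,
        "turbo": turbo,
    }
    # Leave optional fields out entirely so the documented defaults apply.
    for key, value in (("second_audio", second_audio), ("prompt", prompt), ("seed", seed)):
        if value is not None:
            payload[key] = value
    return payload


# Two-speaker example with placeholder URLs (assumed input form).
payload = build_multitalk_input(
    "https://example.com/speaker_one.wav",
    "https://example.com/podcast_hosts.png",
    second_audio="https://example.com/speaker_two.wav",
    seed=42,
)
print(sorted(payload))
```

Fixing `seed` while varying one other parameter at a time (for example `sampling_steps` or `turbo`) makes it easier to compare runs, since the listing describes the seed as the control for reproducible results.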
Version: 0bd2390c4061, Updated: 2/26/2026, Runs: 3.3K