bytedance/seedance-2.0-mini

A lower-cost variant of Seedance 2.0 for high-volume video generation with multimodal inputs and native audio.

Capabilities

Reference Images, Seed

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`prompt` *	string	Required. Text prompt for video generation. Maximum 4000 characters. BytePlus recommends keeping prompts under 600 English words for best results.	`—`	—
`aspect_ratio`	string	Video aspect ratio. Set to 'adaptive' to let the model choose the best ratio based on inputs.	`"16:9"`	16:9 4:3 1:1 3:4 9:16 21:9 9:21 adaptive
`duration`	integer	Video duration in seconds. Set to -1 for intelligent duration (model picks the best length).	`5`	min: -1, max: 15
`generate_audio`	boolean	Generate audio synchronized with the video, including dialogue (enclose spoken lines in double quotes in the prompt), sound effects, and background music.	`true`	—
`image`	string (uri)	Input image used as the first frame for image-to-video generation. Cannot be combined with `reference_images`.	`—`	—
`last_frame_image`	string (uri)	Input image used as the last frame. Only works if a first-frame image (`image`) is also provided. Cannot be combined with `reference_images`.	`—`	—
`reference_audios`	array	Reference audio files (up to 3, total duration max 15s) for audio-driven generation and lip-sync. Requires at least one reference image or video. Reference them in your prompt as [Audio1], [Audio2], etc.		—
`reference_images`	array	Reference images (up to 9) for character consistency, style guidance, and scene composition. Cannot be used together with first/last frame images. You can reference them in your prompt as [Image1], [Image2], etc.		—
`reference_videos`	array	Reference videos (up to 3, total duration max 15s) for motion transfer, style reference, and editing. Reference them in your prompt as [Video1], [Video2], etc.		—
`resolution`	string	Video resolution.	`"720p"`	480p 720p
`seed`	integer	Random seed. Set for reproducible generation.	`—`	—
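
The constraints in the table above can be checked on the client before a request is submitted. The following is a minimal Python sketch, not part of the model's API: the function name `validate_input` and the error messages are hypothetical, and only rules stated in the table are checked. The 15-second total-duration limits on reference audio and video are not checked, since that would require inspecting the media files.

```python
# Allowed values copied from the Constraints column of the parameter table.
ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "9:21", "adaptive"}
RESOLUTIONS = {"480p", "720p"}


def validate_input(params: dict) -> list[str]:
    """Return a list of problems found in an input payload (hypothetical helper)."""
    errors = []

    # prompt is the only required field and is capped at 4000 characters.
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 4000:
        errors.append("prompt exceeds 4000 characters")

    # Enumerated fields fall back to the documented defaults when omitted.
    if params.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        errors.append("aspect_ratio is not an allowed value")
    if params.get("resolution", "720p") not in RESOLUTIONS:
        errors.append("resolution must be 480p or 720p")

    # duration: integer in [-1, 15]; -1 lets the model pick the length.
    duration = params.get("duration", 5)
    is_int = isinstance(duration, int) and not isinstance(duration, bool)
    if not is_int or not -1 <= duration <= 15:
        errors.append("duration must be an integer from -1 to 15")

    ref_images = params.get("reference_images") or []
    ref_videos = params.get("reference_videos") or []
    ref_audios = params.get("reference_audios") or []

    # Count limits on reference inputs.
    if len(ref_images) > 9:
        errors.append("at most 9 reference_images")
    if len(ref_videos) > 3:
        errors.append("at most 3 reference_videos")
    if len(ref_audios) > 3:
        errors.append("at most 3 reference_audios")

    # First/last frame inputs and reference images are mutually exclusive.
    has_frame = bool(params.get("image") or params.get("last_frame_image"))
    if has_frame and ref_images:
        errors.append("first/last frame images cannot be combined with reference_images")

    # A last frame only works together with a first frame.
    if params.get("last_frame_image") and not params.get("image"):
        errors.append("last_frame_image requires image")

    # Reference audio needs at least one reference image or video.
    if ref_audios and not (ref_images or ref_videos):
        errors.append("reference_audios requires a reference image or video")

    return errors
```

An empty list means no documented constraint was violated. For example, `validate_input({"prompt": "x", "duration": 16})` returns a one-item list reporting the duration range.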

Version: 4c173327636d | Updated: 6/26/2026 | 749 runs