bytedance/seedance-2.0-mini
A lower-cost variant of Seedance 2.0 for high-volume video generation with multimodal inputs and native audio.
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| `prompt` (required) | string | Text prompt for video generation. Maximum 4000 characters. BytePlus recommends keeping prompts under 600 English words for best results. | — | — |
| `aspect_ratio` | string | Video aspect ratio. Set to 'adaptive' to let the model choose the best ratio based on inputs. | "16:9" | 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, 9:21, adaptive |
| `duration` | integer | Video duration in seconds. Set to -1 for intelligent duration (model picks the best length). | 5 | min: -1, max: 15 |
| `generate_audio` | boolean | Generate synchronized audio with the video, including dialogue (use double quotes in prompt), sound effects, and background music. | true | — |
| `image` | string (uri) | Input image for image-to-video generation (first frame). Cannot be combined with reference images. | — | — |
| `last_frame_image` | string (uri) | Input image for last frame generation. Only works if a first frame image is also provided. Cannot be combined with reference images. | — | — |
| `reference_audios` | array | Reference audio files (up to 3, total duration max 15s) for audio-driven generation and lip-sync. Requires at least one reference image or video. Reference them in your prompt as [Audio1], [Audio2], etc. | — | — |
| `reference_images` | array | Reference images (up to 9) for character consistency, style guidance, and scene composition. Cannot be used together with first/last frame images. Reference them in your prompt as [Image1], [Image2], etc. | — | — |
| `reference_videos` | array | Reference videos (up to 3, total duration max 15s) for motion transfer, style reference, and editing. Reference them in your prompt as [Video1], [Video2], etc. | — | — |
| `resolution` | string | Video resolution. | "720p" | 480p, 720p |
| `seed` | integer | Random seed. Set for reproducible generation. | — | — |
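
Several of the parameters above interact (first/last frame images exclude reference images, reference audio needs a reference image or video, and so on), so it can help to check an input locally before submitting it. The following is a minimal sketch of such a check, not part of any official client: `build_input` is a hypothetical helper name, and the rules it enforces are only those stated in the table. It does not inspect media files, so the 15-second total limits on reference audio and video are not verified.

```python
from typing import Any

# Allowed values copied from the Constraints column of the table.
ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "9:21", "adaptive"}
RESOLUTIONS = {"480p", "720p"}


def build_input(prompt: str, **options: Any) -> dict[str, Any]:
    """Hypothetical helper: apply documented defaults, then validate the input."""
    payload: dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": "16:9",
        "duration": 5,
        "generate_audio": True,
        "resolution": "720p",
        **options,  # caller-supplied values override the defaults
    }
    errors: list[str] = []

    if not prompt:
        errors.append("prompt is required")
    if len(prompt) > 4000:
        errors.append("prompt exceeds 4000 characters")
    if payload["aspect_ratio"] not in ASPECT_RATIOS:
        errors.append("aspect_ratio is not one of the listed values")
    if payload["resolution"] not in RESOLUTIONS:
        errors.append("resolution must be 480p or 720p")

    duration = payload["duration"]
    if isinstance(duration, bool) or not isinstance(duration, int):
        errors.append("duration must be an integer")
    elif not -1 <= duration <= 15:
        errors.append("duration must be between -1 and 15")

    images = payload.get("reference_images") or []
    videos = payload.get("reference_videos") or []
    audios = payload.get("reference_audios") or []
    first = payload.get("image")
    last = payload.get("last_frame_image")

    if len(images) > 9:
        errors.append("reference_images allows at most 9 items")
    if len(videos) > 3:
        errors.append("reference_videos allows at most 3 items")
    if len(audios) > 3:
        errors.append("reference_audios allows at most 3 items")
    if images and (first or last):
        errors.append("reference_images cannot be combined with image or last_frame_image")
    if last and not first:
        errors.append("last_frame_image requires image")
    if audios and not (images or videos):
        errors.append("reference_audios requires a reference image or video")

    if errors:
        raise ValueError("; ".join(errors))
    return payload
```

The returned dictionary uses the parameter names from the table, so it can be handed to whichever client hosts the model. For example, `build_input("A fox says \"hello\" in the snow", duration=-1)` yields an input that requests intelligent duration with the remaining defaults.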
4c173327636d Updated: 6/26/2026 749 runs
cinemasetfree.com