kwaivgi/kling-o1
Modify an existing video through natural-language commands, changing subjects, environments, and visual style while preserving the original motion and timing.
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| prompt * | string | Text prompt for video generation. Can include references such as `<<<image_1>>>` or `<<<video_1>>>` to refer to inputs. | — | — |
| aspect_ratio | string | Aspect ratio of the generated video. Required for text-to-video. Ignored when a first-frame image is supplied or when editing a video. | "16:9" | 16:9, 9:16, 1:1 |
| duration | integer | Video duration in seconds. For text/image-to-video: 5 or 10. With a video reference (feature type): 3-10. Ignored for video editing (base type). | 5 | 3, 4, 5, 6, 7, 8, 9, 10 |
| end_image | string (uri) | Last-frame image for the video. Requires start_image to be set. Supports .jpg/.jpeg/.png, max 10MB. | — | — |
| keep_original_sound | boolean | Whether to keep the original sound from the reference video. | true | — |
| mode | string | Video generation mode. 'std' is cost-effective, 'pro' has higher quality. | "pro" | std, pro |
| reference_images | array | Reference images for elements, scenes, or styles (up to 7 without a video, 4 with a video). Supports .jpg/.jpeg/.png. | — | — |
| reference_video | string (uri) | Reference video for style, camera movement, or as the base for editing. Supports .mp4/.mov, 3-10s duration, max 200MB. | — | — |
| start_image | string (uri) | First-frame image for the video. Supports .jpg/.jpeg/.png, max 10MB. | — | — |
| video_reference_type | string | How to use the reference video: 'feature' for style/camera reference, 'base' for video editing. | "feature" | feature, base |
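
The constraints in the table interact (for example, `duration` is limited to 5 or 10 unless a feature-type reference video is supplied), so checking an input before submitting it can save a failed run. The following is a minimal sketch of such a check, not the model's own validation logic. The function name `build_input` is hypothetical, and the rule that `<<<video_1>>>` in a prompt needs a `reference_video` is an assumption inferred from the parameter descriptions.

```python
# Hypothetical helper: builds an input dict for kwaivgi/kling-o1 and checks
# it against the constraints listed in the parameter table.

ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MODES = {"std", "pro"}
VIDEO_REFERENCE_TYPES = {"feature", "base"}


def build_input(prompt, *, aspect_ratio="16:9", duration=5, mode="pro",
                start_image=None, end_image=None, reference_images=None,
                reference_video=None, video_reference_type="feature",
                keep_original_sound=True):
    """Return an input dict, raising ValueError on a constraint violation."""
    if not prompt:
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if mode not in MODES:
        raise ValueError("mode must be 'std' or 'pro'")
    if video_reference_type not in VIDEO_REFERENCE_TYPES:
        raise ValueError("video_reference_type must be 'feature' or 'base'")
    if end_image and not start_image:
        raise ValueError("end_image requires start_image")

    # Up to 7 reference images without a video, 4 with one.
    images = list(reference_images or [])
    max_images = 4 if reference_video else 7
    if len(images) > max_images:
        raise ValueError(f"at most {max_images} reference_images allowed here")

    # duration is ignored for video editing (reference video used as 'base').
    editing = reference_video is not None and video_reference_type == "base"
    if not editing:
        allowed = range(3, 11) if reference_video else (5, 10)
        if duration not in allowed:
            raise ValueError(f"duration must be one of {list(allowed)}")

    # Assumption: a <<<video_1>>> reference only makes sense with a video input.
    if "<<<video_1>>>" in prompt and not reference_video:
        raise ValueError("prompt references <<<video_1>>> but no reference_video")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "mode": mode,
        "video_reference_type": video_reference_type,
        "keep_original_sound": keep_original_sound,
    }
    # Only include optional inputs that were actually provided.
    optional = {"start_image": start_image, "end_image": end_image,
                "reference_video": reference_video}
    payload.update({k: v for k, v in optional.items() if v is not None})
    if images:
        payload["reference_images"] = images
    return payload
```

For a video edit, a call might look like `build_input("Replace the car in <<<video_1>>> with a horse", reference_video="https://example.com/clip.mp4", video_reference_type="base")`, where the URL is a placeholder. File-level limits such as the 10MB image cap and the 3-10s video length are not checked here, since they require inspecting the files themselves.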
6d5f2d4becc7 Updated: 6/8/2026 2.1K runs
cinemasetfree.com