bytedance/omni-human-1.5
Official
A film-grade digital human model that generates realistic video from a single image, audio clip, and optional text prompt.
Capabilities
Reference Images
Seed
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| audio (required) | string (uri) | Input audio file (MP3, WAV, etc.). Duration must be less than 35 seconds; longer audio causes the generation to fail with an error. | — | — |
| image (required) | string (uri) | Input image containing a human subject, face, or character. | — | — |
| fast_mode | boolean | Enable fast mode to speed up generation at the cost of some effects. | false | — |
| prompt | string | Optional prompt for precise control of the scene, subject movements, camera movements, etc. Supports Chinese, English, Japanese, Korean, Spanish, and Indonesian. | — | — |
| seed | integer | Random seed for reproducible generation. | — | — |
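
The parameters above can be assembled into a request payload along the following lines. This is a minimal sketch, not an official client: `build_input` and `MAX_AUDIO_SECONDS` are hypothetical names introduced here, the `example.com` URLs are placeholders, and the audio duration is assumed to be measured by the caller beforehand. The commented-out call assumes the `replicate` Python package is installed and `REPLICATE_API_TOKEN` is set.

```python
from typing import Any, Dict, Optional

# From the parameter table: audio duration must be less than 35 seconds.
MAX_AUDIO_SECONDS = 35.0


def build_input(
    audio: str,
    image: str,
    prompt: Optional[str] = None,
    seed: Optional[int] = None,
    fast_mode: bool = False,
    audio_seconds: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble the input payload, omitting optional fields that are unset.

    `audio_seconds` is a hypothetical convenience argument: if the caller
    already knows the clip length, it is checked locally so an over-long
    clip is rejected before any request is made.
    """
    if not audio or not image:
        raise ValueError("audio and image are both required")
    if audio_seconds is not None and audio_seconds >= MAX_AUDIO_SECONDS:
        raise ValueError(
            f"audio is {audio_seconds:.1f}s; must be under {MAX_AUDIO_SECONDS:.0f}s"
        )

    payload: Dict[str, Any] = {
        "audio": audio,
        "image": image,
        "fast_mode": fast_mode,
    }
    if prompt:
        payload["prompt"] = prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    payload = build_input(
        audio="https://example.com/voice.mp3",     # placeholder URL
        image="https://example.com/portrait.png",  # placeholder URL
        prompt="The speaker smiles and the camera slowly pushes in.",
        seed=42,
        audio_seconds=12.5,
    )
    print(payload)

    # Assumed usage with the Replicate Python client (requires network access
    # and an API token, so it is left commented out here):
    # import replicate
    # output = replicate.run("bytedance/omni-human-1.5", input=payload)
```

Fixing `seed` while keeping the other inputs unchanged is the documented route to reproducible output; leaving it unset lets each run differ.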
Version: b0f93aebf8c3
Updated: 6/26/2026
Runs: 41.0K
cinemasetfree.com