bytedance/omni-human-1.5

A film-grade digital human model that generates realistic video from a single image, audio clip, and optional text prompt.

Capabilities

Reference Images, Seed

Cost

Community model (estimated from hardware time)

Input Parameters

audio required string

Input audio file (MP3, WAV, etc.). Duration must be less than 35 seconds; longer audio causes the generation to fail with an error.
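
Because over-length audio fails the whole generation, it can be worth checking the clip before submitting it. The following is a minimal sketch that handles only uncompressed WAV files via Python's standard `wave` module; the helper names `wav_duration_seconds` and `check_audio_duration` are hypothetical and not part of the model's API, and MP3 or other formats would need a different decoder.

```python
import wave

MAX_AUDIO_SECONDS = 35.0  # limit stated in the `audio` parameter description


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_duration(path: str, limit: float = MAX_AUDIO_SECONDS) -> float:
    """Return the clip duration, or raise ValueError if it is not under `limit`.

    Hypothetical pre-flight check; the service performs its own validation.
    """
    duration = wav_duration_seconds(path)
    if duration >= limit:
        raise ValueError(
            f"audio is {duration:.2f}s long; it must be less than {limit:.0f}s"
        )
    return duration
```

Calling `check_audio_duration("speech.wav")` before uploading surfaces the problem locally instead of after a failed run.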

image required string

Input image containing a human subject, face or character.

fast_mode boolean

Enable fast mode to speed up generation at the expense of some visual effects.

Default: false

prompt string

Optional prompt for precise control of the scene, movements, camera movements, etc. Supports Chinese, English, Japanese, Korean, Spanish, and Indonesian.

seed integer

Random seed for reproducible generation.
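
The parameters above can be assembled into a single input object. The sketch below is illustrative only: `build_input` is a hypothetical helper, not part of any client library, and it assumes that `audio` and `image` are passed as string references (for example a URL or file identifier), which the listing does not specify beyond the type `string`.

```python
from typing import Any, Dict, Optional


def build_input(
    audio: str,
    image: str,
    fast_mode: bool = False,
    prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an input dict using the parameter names listed above.

    Hypothetical helper: it only mirrors the documented names, types,
    and defaults; it does not contact any service.
    """
    # `audio` and `image` are the two required string parameters.
    if not isinstance(audio, str) or not audio:
        raise ValueError("audio is required and must be a non-empty string")
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if not isinstance(fast_mode, bool):
        raise TypeError("fast_mode must be a boolean")

    payload: Dict[str, Any] = {
        "audio": audio,
        "image": image,
        "fast_mode": fast_mode,  # documented default: false
    }

    # Optional parameters are included only when supplied.
    if prompt is not None:
        if not isinstance(prompt, str):
            raise TypeError("prompt must be a string")
        payload["prompt"] = prompt
    if seed is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int):
            raise TypeError("seed must be an integer")
        payload["seed"] = seed
    return payload
```

Passing the same `seed` with otherwise identical inputs is what the listing describes as reproducible generation; omitting it leaves the choice of seed to the service.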

Version: b0f93aebf8c3 | Updated: 6/26/2026 | 41.0K runs