bytedance/omni-human-1.5

A film-grade digital human model that generates realistic video from a single image, audio clip, and optional text prompt.

Capabilities

Reference Images, Seed

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`audio` *	string (uri)	Input audio file (MP3, WAV, etc.). Duration must be under 35 seconds; longer audio causes the generation to fail with an error.	`—`	—
`image` *	string (uri)	Input image containing a human subject, face or character.	`—`	—
`fast_mode`	boolean	Enable fast mode to speed up generation at the cost of some output quality.	`false`	—
`prompt`	string	Optional prompt for precise control of the scene, subject movement, camera movement, etc. Supports Chinese, English, Japanese, Korean, Spanish, and Indonesian.	`—`	—
`seed`	integer	Random seed for reproducible generation.	`—`	—
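
The parameters above could be assembled and checked before a run roughly as follows. This is a minimal sketch, not an official client: it assumes the model is called through the Replicate Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set), and the helper names `wav_duration_seconds`, `check_audio_length`, `build_input`, and `generate` are hypothetical. The duration check covers local WAV files only, since that is what Python's standard `wave` module can read.

```python
import wave

MODEL = "bytedance/omni-human-1.5"
MAX_AUDIO_SECONDS = 35.0  # limit stated in the `audio` parameter description


def wav_duration_seconds(path):
    """Return the duration of a local WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_length(path):
    """Raise ValueError if a local WAV file is not under the 35 second limit."""
    duration = wav_duration_seconds(path)
    if duration >= MAX_AUDIO_SECONDS:
        raise ValueError(
            f"audio is {duration:.1f}s; it must be under {MAX_AUDIO_SECONDS:.0f}s"
        )
    return duration


def build_input(audio, image, prompt=None, fast_mode=False, seed=None):
    """Assemble the input dict, leaving out optional fields that are unset."""
    if not audio or not image:
        raise ValueError("both `audio` and `image` are required")
    payload = {"audio": audio, "image": image, "fast_mode": bool(fast_mode)}
    if prompt:
        payload["prompt"] = prompt
    if seed is not None:
        payload["seed"] = int(seed)
    return payload


def generate(payload):
    """Run the model once. Needs network access and an API token."""
    import replicate  # imported lazily so the helpers above work without it

    return replicate.run(MODEL, input=payload)
```

A typical call would pass publicly reachable URLs for `audio` and `image`, for example `generate(build_input("https://example.com/voice.wav", "https://example.com/portrait.png", seed=42))`, where both URLs are placeholders. The type of the returned value depends on the client version, so it is left unspecified here.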

Version: b0f93aebf8c3 | Updated: 6/26/2026 | 41.0K runs