bytedance/omni-human

Turns your audio/video/images into professional-quality animated videos

Capabilities

Reference Images

Cost

Community model (estimated from hardware time)

Input Parameters

audio required string

Input audio file (MP3, WAV, etc.). For best results, keep the audio to 15 seconds or less; beyond 15 seconds, video quality begins to degrade. To process longer audio, split it into 15-second chunks and run each chunk separately.
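The 15-second chunking recommended above can be sketched with Python's standard library. This is a minimal illustration, not part of the model's own tooling: the helper names `chunk_bounds` and `split_wav` are hypothetical, and `split_wav` assumes an uncompressed WAV file (MP3 and other formats would need a separate decoder).

```python
import wave


def chunk_bounds(duration_s, max_chunk_s=15.0):
    """Return (start, end) times in seconds that cover the whole clip."""
    if duration_s < 0 or max_chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and max_chunk_s > 0")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def split_wav(src_path, out_prefix, max_chunk_s=15.0):
    """Write consecutive chunks of a WAV file and return the output paths.

    Hypothetical helper: output files are named <out_prefix>_000.wav, etc.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * max_chunk_s)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # The frame count in the header is corrected when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each resulting file can then be submitted as its own `audio` input. Cutting at fixed 15-second marks may split a word in half; cutting at pauses instead is likely to give cleaner results, but that is outside this sketch.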

image required string

Input image containing a human subject, face or character.
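As a rough sketch of how the two required inputs might be assembled before a run, the snippet below builds and checks an input mapping. The helper `build_inputs` is hypothetical, and the page does not say whether each string should be a URL, a file path, or encoded data, so the example values are placeholders only.

```python
# Both parameters are listed above as "required string".
REQUIRED_FIELDS = ("audio", "image")


def build_inputs(audio, image):
    """Return the input mapping, rejecting missing or non-string values.

    Hypothetical helper: the key names mirror the parameter names on this page.
    """
    inputs = {"audio": audio, "image": image}
    for name in REQUIRED_FIELDS:
        value = inputs[name]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{name}' is required and must be a non-empty string")
    return inputs


# Placeholder values; substitute whatever string form the hosting platform expects.
example = build_inputs("https://example.com/speech.wav", "https://example.com/portrait.png")
```

Validating locally like this only catches missing values early; it does not check that the audio is under 15 seconds or that the image contains a human subject.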

Version: 566f1b030169 · Updated: 2/26/2026 · 153.5K runs