playht/play-dialog

End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.

Capabilities

Seed

Cost

Community model (estimated from hardware time)

Input Parameters

text required string

Text for speech generation

language string

The language of the text to be spoken.

Default: "english"
Options: afrikaans, albanian, amharic, arabic, bengali, bulgarian, catalan, croatian, czech, danish, dutch, english, french, galician, german, greek, hebrew, hindi, hungarian, indonesian, italian, japanese, korean, malay, mandarin, polish, portuguese, russian, serbian, spanish, swedish, tagalog, thai, turkish, ukrainian, urdu, xhosa
prompt string

A prompt to guide the style of the output generated by the first voice.

Default: ""
prompt2 string

A prompt to guide the style of the output generated by the second voice.

Default: ""
seed integer

Random seed. Set a value for reproducible generation.

speed number

Controls the speed of the generated audio.

Default: 1, min: 0.1, max: 5
temperature number

The temperature parameter controls variance. Lower temperatures give more predictable results; higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice.

Default: 1, min: 0, max: 2
turnPrefix string

The prefix to indicate the start of a turn in a multi-turn dialogue for the first voice.

Default: "Voice 1:"
turnPrefix2 string

The prefix to indicate the start of a turn in a multi-turn dialogue for the second voice.

Default: "Voice 2:"
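
As a rough illustration of how the two turn prefixes could mark speaker changes inside `text`, the sketch below joins alternating turns into a single string. The helper name `build_dialogue_text`, its argument names, and the newline placed between turns are assumptions made for this example; the parameter descriptions above only say that each prefix indicates the start of a turn.

```python
def build_dialogue_text(turns, turn_prefix="Voice 1:", turn_prefix_2="Voice 2:"):
    """Join (speaker, line) pairs into one string for the `text` input.

    Hypothetical helper: `speaker` is 1 or 2, and the newline separator
    between turns is an assumption, not documented behavior.
    """
    prefixes = {1: turn_prefix, 2: turn_prefix_2}
    parts = []
    for speaker, line in turns:
        if speaker not in prefixes:
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        # Each turn starts with the prefix assigned to its speaker.
        parts.append(f"{prefixes[speaker]} {line.strip()}")
    return "\n".join(parts)


text = build_dialogue_text([
    (1, "Did you see the game last night?"),
    (2, "I did. What a finish!"),
])
print(text)
```

If custom prefixes are used, the same strings would presumably need to be passed as `turnPrefix` and `turnPrefix2` so the prefixes in `text` match.
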
voice string

Voice to use for generation

Default: "Angelo (Young male US conversational voice)"
Options:
Angelo (Young male US conversational voice)
Arsenio (Middle-aged male US African American conversational voice)
Cillian (Middle-aged male Irish conversational voice)
Timo (Middle-aged male US conversational voice)
Dexter (Middle-aged male US conversational voice)
Miles (Young male US African American conversational voice)
Briggs (Elderly male US Southern (Oklahoma) conversational voice)
Deedee (Middle-aged female US African American conversational voice)
Nia (Young female US conversational voice)
Inara (Middle-aged female US African American conversational voice)
Constanza (Young female US Latin American conversational voice)
Gideon (Elderly male British narrative voice)
Casper (Middle-aged male US narrative voice)
Mitch (Middle-aged male Australian narrative voice)
Ava (Middle-aged female Australian narrative voice)
voice_2 string

Optional second voice to use for generation

Default: "None"
Options:
None
Angelo (Young male US conversational voice)
Arsenio (Middle-aged male US African American conversational voice)
Cillian (Middle-aged male Irish conversational voice)
Timo (Middle-aged male US conversational voice)
Dexter (Middle-aged male US conversational voice)
Miles (Young male US African American conversational voice)
Briggs (Elderly male US Southern (Oklahoma) conversational voice)
Deedee (Middle-aged female US African American conversational voice)
Nia (Young female US conversational voice)
Inara (Middle-aged female US African American conversational voice)
Constanza (Young female US Latin American conversational voice)
Gideon (Elderly male British narrative voice)
Casper (Middle-aged male US narrative voice)
Mitch (Middle-aged male Australian narrative voice)
Ava (Middle-aged female Australian narrative voice)
voice_conditioning_seconds integer

The number of seconds of conditioning to use from the selected voice. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.

Default: 20, min: 1, max: 60
voice_conditioning_seconds_2 integer

The number of seconds of conditioning to use from the second selected voice.

Default: 20, min: 1, max: 60
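
To show how the documented ranges might be checked before a request is sent, here is a minimal sketch that assembles an input dictionary from the parameters listed above. The function name `build_input` and the `RANGES` table are hypothetical, and rejecting an empty `text` is an assumption; the page only marks `text` as required. How the dictionary is submitted depends on the client in use and is not shown.

```python
# Documented (min, max) bounds, copied from the parameter list above.
RANGES = {
    "speed": (0.1, 5),
    "temperature": (0, 2),
    "voice_conditioning_seconds": (1, 60),
    "voice_conditioning_seconds_2": (1, 60),
}


def build_input(text, **options):
    """Return an input dict after checking numeric options against RANGES.

    Hypothetical helper for illustration only.
    """
    if not isinstance(text, str) or not text:
        # Assumption: treat an empty string as missing.
        raise ValueError("text is required and must be a non-empty string")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"text": text, **options}


payload = build_input(
    "Voice 1: Ready when you are.\nVoice 2: Let's go.",
    voice="Angelo (Young male US conversational voice)",
    voice_2="Nia (Young female US conversational voice)",
    seed=42,
    speed=1.2,
    temperature=0.8,
)
print(payload)
```

Checking ranges on the client side like this surfaces an out-of-range `speed` or `temperature` before any hardware time is spent.
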
Version: 0d5710136b22 Updated: 6/26/2026 27.1K runs