playht/play-dialog

End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.

Capabilities

Seed

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`text`*	string	Text for speech generation	`—`	—
`language`	string	The language of the text to be spoken.	`"english"`	afrikaans, albanian, amharic, arabic, bengali, bulgarian, catalan, croatian, czech, danish, dutch, english, french, galician, german, greek, hebrew, hindi, hungarian, indonesian, italian, japanese, korean, malay, mandarin, polish, portuguese, russian, serbian, spanish, swedish, tagalog, thai, turkish, ukrainian, urdu, xhosa
`prompt`	string	A prompt to guide the style of the output generated by the first voice.	`""`	—
`prompt2`	string	A prompt to guide the style of the output generated by the second voice.	`""`	—
`seed`	integer	Random seed. Set for reproducible generation	`—`	—
`speed`	number	Control how fast the generated audio should be.	`1`	min: 0.1, max: 5
`temperature`	number	Controls variance. Lower temperatures give more predictable results; higher temperatures let each run vary more, so the voice may sound less like the baseline voice.	`1`	min: 0, max: 2
`turnPrefix`	string	The prefix to indicate the start of a turn in a multi-turn dialogue for the first voice.	`"Voice 1:"`	—
`turnPrefix2`	string	The prefix to indicate the start of a turn in a multi-turn dialogue for the second voice.	`"Voice 2:"`	—
`voice`	string	Voice to use for generation	`"Angelo (Young male US conversational voice)"`	Angelo (Young male US conversational voice); Arsenio (Middle-aged male US African American conversational voice); Cillian (Middle-aged male Irish conversational voice); Timo (Middle-aged male US conversational voice); Dexter (Middle-aged male US conversational voice); Miles (Young male US African American conversational voice); Briggs (Elderly male US Southern (Oklahoma) conversational voice); Deedee (Middle-aged female US African American conversational voice); Nia (Young female US conversational voice); Inara (Middle-aged female US African American conversational voice); Constanza (Young female US Latin American conversational voice); Gideon (Elderly male British narrative voice); Casper (Middle-aged male US narrative voice); Mitch (Middle-aged male Australian narrative voice); Ava (Middle-aged female Australian narrative voice)
`voice_2`	string	Optional second voice to use for generation	`"None"`	None; Angelo (Young male US conversational voice); Arsenio (Middle-aged male US African American conversational voice); Cillian (Middle-aged male Irish conversational voice); Timo (Middle-aged male US conversational voice); Dexter (Middle-aged male US conversational voice); Miles (Young male US African American conversational voice); Briggs (Elderly male US Southern (Oklahoma) conversational voice); Deedee (Middle-aged female US African American conversational voice); Nia (Young female US conversational voice); Inara (Middle-aged female US African American conversational voice); Constanza (Young female US Latin American conversational voice); Gideon (Elderly male British narrative voice); Casper (Middle-aged male US narrative voice); Mitch (Middle-aged male Australian narrative voice); Ava (Middle-aged female Australian narrative voice)
`voice_conditioning_seconds`	integer	The number of seconds of conditioning to use from the selected voice. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.	`20`	min: 1, max: 60
`voice_conditioning_seconds_2`	integer	The number of seconds of conditioning to use from the second selected voice.	`20`	min: 1, max: 60
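
To illustrate how the parameters in the table fit together, here is a minimal sketch that formats a two-speaker script with the default turn prefixes and checks the numeric options against the listed ranges before building an input dictionary. The helper names `format_dialogue` and `build_input` are hypothetical, and separating turns with newlines is an assumption rather than something this page specifies.

```python
# Numeric ranges copied from the Constraints column of the table above.
RANGES = {
    "speed": (0.1, 5),
    "temperature": (0, 2),
    "voice_conditioning_seconds": (1, 60),
    "voice_conditioning_seconds_2": (1, 60),
}


def format_dialogue(turns, prefix_1="Voice 1:", prefix_2="Voice 2:"):
    """Join (speaker, line) pairs into one string, one prefixed turn per line.

    `speaker` is 1 or 2. The newline separator is an assumption.
    """
    prefixes = {1: prefix_1, 2: prefix_2}
    return "\n".join(f"{prefixes[speaker]} {line}" for speaker, line in turns)


def build_input(text, voice, voice_2="None", **options):
    """Return an input dict after checking numeric options against RANGES."""
    if not text:
        raise ValueError("text is required")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"text": text, "voice": voice, "voice_2": voice_2, **options}


script = format_dialogue([
    (1, "Did you catch the game last night?"),
    (2, "I did. What a finish!"),
])
payload = build_input(
    script,
    voice="Angelo (Young male US conversational voice)",
    voice_2="Nia (Young female US conversational voice)",
    speed=1.2,
    temperature=0.8,
)
# Assumption: with the Replicate Python client installed and an API token set,
# this dict would be passed as replicate.run("playht/play-dialog", input=payload).
# That call is not made here.
```

Validating locally keeps an out-of-range `speed` or `temperature` from reaching the service; the ranges should be re-checked against the live schema if the model version changes.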

Version: 0d5710136b22 | Updated: 7/25/2026 | 27.1K runs