playht/play-dialog
End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| text (required) | string | Text for speech generation | — | — |
| language | string | The language of the text to be spoken. | "english" | afrikaans, albanian, amharic, arabic, bengali, bulgarian, catalan, croatian, czech, danish, dutch, english, french, galician, german, greek, hebrew, hindi, hungarian, indonesian, italian, japanese, korean, malay, mandarin, polish, portuguese, russian, serbian, spanish, swedish, tagalog, thai, turkish, ukrainian, urdu, xhosa |
| prompt | string | A prompt to guide the style of the output generated by the first voice. | "" | — |
| prompt2 | string | A prompt to guide the style of the output generated by the second voice. | "" | — |
| seed | integer | Random seed. Set for reproducible generation. | — | — |
| speed | number | Control how fast the generated audio should be. | 1 | min: 0.1, max: 5 |
| temperature | number | The temperature parameter controls variance. Lower temperatures result in more predictable results; higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice. | 1 | min: 0, max: 2 |
| turnPrefix | string | The prefix to indicate the start of a turn in a multi-turn dialogue for the first voice. | "Voice 1:" | — |
| turnPrefix2 | string | The prefix to indicate the start of a turn in a multi-turn dialogue for the second voice. | "Voice 2:" | — |
| voice | string | Voice to use for generation | "Angelo (Young male US conversational voice)" | Angelo (Young male US conversational voice); Arsenio (Middle-aged male US African American conversational voice); Cillian (Middle-aged male Irish conversational voice); Timo (Middle-aged male US conversational voice); Dexter (Middle-aged male US conversational voice); Miles (Young male US African American conversational voice); Briggs (Elderly male US Southern (Oklahoma) conversational voice); Deedee (Middle-aged female US African American conversational voice); Nia (Young female US conversational voice); Inara (Middle-aged female US African American conversational voice); Constanza (Young female US Latin American conversational voice); Gideon (Elderly male British narrative voice); Casper (Middle-aged male US narrative voice); Mitch (Middle-aged male Australian narrative voice); Ava (Middle-aged female Australian narrative voice) |
| voice_2 | string | Optional second voice to use for generation | "None" | None; Angelo (Young male US conversational voice); Arsenio (Middle-aged male US African American conversational voice); Cillian (Middle-aged male Irish conversational voice); Timo (Middle-aged male US conversational voice); Dexter (Middle-aged male US conversational voice); Miles (Young male US African American conversational voice); Briggs (Elderly male US Southern (Oklahoma) conversational voice); Deedee (Middle-aged female US African American conversational voice); Nia (Young female US conversational voice); Inara (Middle-aged female US African American conversational voice); Constanza (Young female US Latin American conversational voice); Gideon (Elderly male British narrative voice); Casper (Middle-aged male US narrative voice); Mitch (Middle-aged male Australian narrative voice); Ava (Middle-aged female Australian narrative voice) |
| voice_conditioning_seconds | integer | The number of seconds of conditioning to use from the selected voice. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness. | 20 | min: 1, max: 60 |
| voice_conditioning_seconds_2 | integer | The number of seconds of conditioning to use from the second selected voice. | 20 | min: 1, max: 60 |
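
A minimal sketch of how an input payload for a two-voice dialogue might be assembled from the parameters in the table above. The helper names `build_dialogue` and `build_input` are hypothetical and not part of any client library. Joining turns with a single space is an assumption, since the table only says that each prefix marks the start of a turn. The numeric ranges are copied from the Constraints column. How the resulting dict is submitted to the model depends on the hosting platform's client and is not shown here.

```python
# Numeric ranges copied from the Constraints column of the parameter table.
RANGES = {
    "speed": (0.1, 5),
    "temperature": (0, 2),
    "voice_conditioning_seconds": (1, 60),
    "voice_conditioning_seconds_2": (1, 60),
}


def build_dialogue(turns, prefix_1="Voice 1:", prefix_2="Voice 2:"):
    """Join (speaker, line) pairs into one string with a turn prefix per line.

    speaker is 1 or 2. The single-space separator between turns is an
    assumption; the source only documents the prefixes themselves.
    """
    prefixes = {1: prefix_1, 2: prefix_2}
    return " ".join(f"{prefixes[speaker]} {line}" for speaker, line in turns)


def build_input(text, **options):
    """Return an input dict after checking the documented numeric ranges."""
    if not text:
        raise ValueError("text is required")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"text": text, **options}


turns = [
    (1, "Did you hear back about the demo?"),
    (2, "Yes, they loved it."),
]
payload = build_input(
    build_dialogue(turns),
    voice="Angelo (Young male US conversational voice)",
    voice_2="Nia (Young female US conversational voice)",
    turnPrefix="Voice 1:",    # must match the prefixes used in the text
    turnPrefix2="Voice 2:",
    speed=1,
    temperature=1,
)
print(payload["text"])
# Voice 1: Did you hear back about the demo? Voice 2: Yes, they loved it.
```

Validating ranges before submission surfaces an out-of-range `speed` or `temperature` locally rather than as a failed run. If custom prefixes are passed to `build_dialogue`, the same strings would need to be sent as `turnPrefix` and `turnPrefix2`.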
0d5710136b22 Updated: 6/26/2026 27.1K runs