afiaka87/tortoise-tts
Generate speech from text and clone voices from mp3 files. From James Betker, AKA "neonbjb".
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| custom_voice | string (uri) | (Optional) Create a custom voice from an mp3 file of a speaker. The audio should be at least 15 seconds long, contain only one speaker, and be in mp3 format. Overrides the `voice_a` input. | - | - |
| cvvp_amount | number | How much the CVVP model should influence the output. Increasing this can, in some cases, reduce the likelihood of multiple speakers. Defaults to 0 (disabled). | 0 | min: 0, max: 1 |
| preset | string | Which voice preset to use. See the documentation for more information. | "fast" | ultra_fast, fast, standard, high_quality |
| seed | integer | Random seed, which can be used to reproduce results. | 0 | - |
| text | string | Text to speak. | "The expressiveness of autoregressive transformers is literally nuts! I absolutely adore them." | - |
| voice_a | string | Selects the voice to use for generation. Use `random` to select a random voice. Use `custom_voice` to use a custom voice. | "random" | angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled |
| voice_b | string | (Optional) Create a new voice by averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing. | "disabled" | angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled |
| voice_c | string | (Optional) Create a new voice by averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing. | "disabled" | angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled |
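
The parameters in the table can be assembled into an input payload along the following lines. This is a minimal sketch, not the model's own code: `build_input`, `PRESETS`, and `VOICE_OPTIONS` are hypothetical names introduced here for illustration, and the checks only mirror the defaults and constraints listed in the table.

```python
# Hypothetical helper: checks arguments against the parameter table and
# returns a plain dict. It is not part of the model or of any client library.

PRESETS = ("ultra_fast", "fast", "standard", "high_quality")

NAMED_VOICES = (
    "angie", "cond_latent_example", "deniro", "freeman", "halle", "lj",
    "myself", "pat2", "snakes", "tom", "train_daws", "train_dreams",
    "train_grace", "train_lescault", "weaver", "applejack", "daniel", "emma",
    "geralt", "jlaw", "mol", "pat", "rainbow", "tim_reynolds", "train_atkins",
    "train_dotrice", "train_empire", "train_kennard", "train_mouse", "william",
)
# The table lists the same options for voice_a, voice_b and voice_c.
VOICE_OPTIONS = NAMED_VOICES + ("random", "custom_voice", "disabled")


def build_input(text, voice_a="random", voice_b="disabled", voice_c="disabled",
                preset="fast", cvvp_amount=0, seed=0, custom_voice=None):
    """Return an input dict using the defaults from the parameter table."""
    if preset not in PRESETS:
        raise ValueError(f"preset must be one of {PRESETS}")
    for name, value in (("voice_a", voice_a),
                        ("voice_b", voice_b),
                        ("voice_c", voice_c)):
        if value not in VOICE_OPTIONS:
            raise ValueError(f"{name} has an unknown value: {value!r}")
    if not 0 <= cvvp_amount <= 1:
        raise ValueError("cvvp_amount must be between 0 and 1")

    payload = {
        "text": text,
        "voice_a": voice_a,
        "voice_b": voice_b,
        "voice_c": voice_c,
        "preset": preset,
        "cvvp_amount": cvvp_amount,
        "seed": seed,
    }
    # custom_voice is optional; the table describes it as a URI to an mp3 file.
    if custom_voice is not None:
        payload["custom_voice"] = custom_voice
    return payload


if __name__ == "__main__":
    # Mix two listed voices at the "standard" preset with a fixed seed.
    print(build_input("Hello there.", voice_a="freeman", voice_b="emma",
                      preset="standard", seed=42))
```

The resulting dict has the shape of the inputs described in the table; how it is submitted (which client, which model reference) is left open here.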
e9658de4b325 Updated: 2/26/2026 173.3K runs