afiaka87/tortoise-tts

Generate speech from text and clone voices from mp3 files. By James Betker, AKA "neonbjb".

Capabilities: Seed

Cost: Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`custom_voice`	string (uri)	(Optional) Create a custom voice based on an mp3 file of a speaker. The audio should be at least 15 seconds long, contain only one speaker, and be in mp3 format. Overrides the `voice_a` input.	`—`	—
`cvvp_amount`	number	How much the CVVP model should influence the output. Increasing this can in some cases reduce the likelihood of multiple speakers. Defaults to 0 (disabled).	`0`	min: 0, max: 1
`preset`	string	Which voice preset to use. See the documentation for more information.	`"fast"`	one of: ultra_fast, fast, standard, high_quality
`seed`	integer	Random seed which can be used to reproduce results.	`0`	—
`text`	string	Text to speak.	`"The expressiveness of autoregressive transformers is literally nuts! I absolutely adore them."`	—
`voice_a`	string	Selects the voice to use for generation. Use `random` to select a random voice. Use `custom_voice` to use a custom voice.	`"random"`	one of: angie, cond_latent_example, deniro, freeman, halle, lj, myself, pat2, snakes, tom, train_daws, train_dreams, train_grace, train_lescault, weaver, applejack, daniel, emma, geralt, jlaw, mol, pat, rainbow, tim_reynolds, train_atkins, train_dotrice, train_empire, train_kennard, train_mouse, william, random, custom_voice, disabled
`voice_b`	string	(Optional) Create a new voice by averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing.	`"disabled"`	same choices as `voice_a`
`voice_c`	string	(Optional) Create a new voice by averaging the latents for `voice_a`, `voice_b` and `voice_c`. Use `disabled` to disable voice mixing.	`"disabled"`	same choices as `voice_a`
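
The defaults and constraints in the table above can be checked locally before a request is sent. The following is a minimal sketch, not part of the model's own code: `build_input` and the constant names are hypothetical helpers introduced here, and only the parameter names, defaults, and allowed values are taken from the table.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the Constraints column of the table.
PRESETS = ("ultra_fast", "fast", "standard", "high_quality")

NAMED_VOICES = (
    "angie", "cond_latent_example", "deniro", "freeman", "halle", "lj",
    "myself", "pat2", "snakes", "tom", "train_daws", "train_dreams",
    "train_grace", "train_lescault", "weaver", "applejack", "daniel",
    "emma", "geralt", "jlaw", "mol", "pat", "rainbow", "tim_reynolds",
    "train_atkins", "train_dotrice", "train_empire", "train_kennard",
    "train_mouse", "william",
)
VOICE_CHOICES = NAMED_VOICES + ("random", "custom_voice", "disabled")


def build_input(
    text: str,
    voice_a: str = "random",
    voice_b: str = "disabled",
    voice_c: str = "disabled",
    preset: str = "fast",
    seed: int = 0,
    cvvp_amount: float = 0.0,
    custom_voice: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an input dict using the documented defaults.

    Raises ValueError when a value falls outside the documented constraints.
    """
    if preset not in PRESETS:
        raise ValueError(f"preset must be one of {PRESETS}, got {preset!r}")
    for name, value in (("voice_a", voice_a), ("voice_b", voice_b), ("voice_c", voice_c)):
        if value not in VOICE_CHOICES:
            raise ValueError(f"{name} has unknown voice {value!r}")
    if not 0.0 <= cvvp_amount <= 1.0:
        raise ValueError("cvvp_amount must be between 0 and 1")

    payload: Dict[str, Any] = {
        "text": text,
        "voice_a": voice_a,
        "voice_b": voice_b,
        "voice_c": voice_c,
        "preset": preset,
        "seed": seed,
        "cvvp_amount": cvvp_amount,
    }
    # custom_voice is optional; when given, it is documented to override voice_a.
    if custom_voice is not None:
        payload["custom_voice"] = custom_voice
    return payload


if __name__ == "__main__":
    # Mix two named voices at the "standard" preset with a fixed seed.
    print(build_input("Hello there.", voice_a="tom", voice_b="emma",
                      preset="standard", seed=42))
```

The resulting dict holds only the documented input names, so it can be handed to whatever client is used to run the model. How the request itself is sent is outside the scope of this sketch.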

Version: e9658de4b325. Updated: 7/25/2026. Runs: 173.3K.