resemble-ai/chatterbox-turbo
The fastest open-source TTS model without sacrificing quality.
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
text * | string | Text to synthesize into speech (maximum 500 characters). Supported paralinguistic tags you can include in your text: [clear throat], [sigh], [sush], [cough], [groan], [sniff], [gasp], [chuckle], [laugh] Example: "Oh, that's hilarious! [chuckle] Let me tell you more." | — | — |
reference_audio | string (uri) | Reference audio file for voice cloning (optional). Must be longer than 5 seconds. If provided, overrides the voice selection. | — | — |
repetition_penalty | number | Penalizes token repetition. Higher values reduce repetition. | 1.2 | min: 1, max: 2 |
seed | integer | Random seed for reproducible results. Leave blank for random generation. | — | — |
temperature | number | Controls randomness in generation. Higher values produce more varied speech. | 0.8 | min: 0.05, max: 2 |
top_k | integer | Top-k sampling. At each step, only the k most probable tokens are considered. | 1000 | min: 1, max: 2000 |
top_p | number | Nucleus sampling threshold. Lower values make output more focused. | 0.95 | min: 0.5, max: 1 |
voice | string | Pre-made voice to use for synthesis. Ignored if reference_audio is provided. | "Andy" | One of: Aaron, Abigail, Anaya, Andy, Archer, Brian, Chloe, Dylan, Emmanuel, Ethan, Evelyn, Gavin, Gordon, Ivan, Laura, Lucy, Madison, Marisol, Meera, Walter |
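The constraints in the parameter table can be enforced on the client before a request is sent. The sketch below is a hypothetical helper (`build_input` and the constant names are illustrative, not part of any published client); it copies the limits, defaults, voice names, and tag spellings verbatim from the table, and it assumes that any bracketed span in `text` is meant as a paralinguistic tag. The "longer than 5 seconds" rule for `reference_audio` is not checked, since that would require downloading and decoding the audio.

```python
import re
from typing import Any, Dict, Optional

# Constraints copied from the parameter table; tag spellings are kept verbatim.
MAX_TEXT_CHARS = 500
SUPPORTED_TAGS = {
    "[clear throat]", "[sigh]", "[sush]", "[cough]", "[groan]",
    "[sniff]", "[gasp]", "[chuckle]", "[laugh]",
}
VOICES = {
    "Aaron", "Abigail", "Anaya", "Andy", "Archer", "Brian", "Chloe",
    "Dylan", "Emmanuel", "Ethan", "Evelyn", "Gavin", "Gordon", "Ivan",
    "Laura", "Lucy", "Madison", "Marisol", "Meera", "Walter",
}
# name -> (default, min, max), as listed in the table
SAMPLING = {
    "repetition_penalty": (1.2, 1.0, 2.0),
    "temperature": (0.8, 0.05, 2.0),
    "top_k": (1000, 1, 2000),
    "top_p": (0.95, 0.5, 1.0),
}
# Assumption: every bracketed span in the text is intended as a tag.
TAG_RE = re.compile(r"\[[^\[\]]*\]")


def build_input(
    text: str,
    voice: str = "Andy",
    reference_audio: Optional[str] = None,
    seed: Optional[int] = None,
    **sampling: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments against the table, return an input dict."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
    unknown = set(TAG_RE.findall(text)) - SUPPORTED_TAGS
    if unknown:
        raise ValueError(f"unsupported paralinguistic tags: {sorted(unknown)}")

    payload: Dict[str, Any] = {"text": text}
    if reference_audio is not None:
        # reference_audio overrides the voice, so voice is not sent at all.
        # The "> 5 seconds" requirement is NOT checked here.
        payload["reference_audio"] = reference_audio
    elif voice in VOICES:
        payload["voice"] = voice
    else:
        raise ValueError(f"unknown voice: {voice!r}")

    for name, (default, lo, hi) in SAMPLING.items():
        value = sampling.pop(name, default)
        if name == "top_k" and not isinstance(value, int):
            raise TypeError("top_k must be an integer")
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
        payload[name] = value
    if sampling:
        raise TypeError(f"unexpected parameters: {sorted(sampling)}")

    if seed is not None:
        payload["seed"] = int(seed)  # leave seed out for random generation
    return payload
```

For example, `build_input("Oh, that's hilarious! [chuckle] Let me tell you more.", voice="Laura", temperature=0.6)` returns a dict with every sampling parameter filled in. A dict like this could then be passed as the `input` argument of the Replicate Python client's `replicate.run`, together with the model reference.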
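`repetition_penalty`, `temperature`, `top_k`, and `top_p` are the standard knobs of autoregressive token sampling. The sketch below shows what each one does to a single step's logits, assuming the common textbook definitions: a CTRL / Hugging Face style repetition penalty, applied in the order penalty, temperature, top-k, top-p. This is a generic illustration of those techniques, not Chatterbox Turbo's own decoding code; the model's exact formulas and ordering are not documented here, and `sample_next_token` is a hypothetical name.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    previous_tokens: Sequence[int],
    temperature: float = 0.8,
    top_k: int = 1000,
    top_p: float = 0.95,
    repetition_penalty: float = 1.2,
    rng: Optional[random.Random] = None,
) -> int:
    """Generic sketch: pick one token id from raw logits (not the model's own code)."""
    rng = rng or random.Random()
    scores: List[float] = list(logits)

    # Repetition penalty (CTRL / Hugging Face style): make already-generated
    # tokens less likely by dividing positive logits and multiplying negative ones.
    for tok in set(previous_tokens):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty

    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Softmax over the surviving ids (subtract the max for numerical stability).
    best = scores[ranked[0]]
    weights = [math.exp(scores[i] - best) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability
    # reaches top_p; the most likely token always survives.
    kept_ids: List[int] = []
    kept_probs: List[float] = []
    cumulative = 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # random.Random.choices renormalizes the remaining weights itself.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

With `top_k=1` the function reduces to greedy decoding: `sample_next_token([0.1, 2.0, 0.5], [], top_k=1)` always returns `1`. Raising `repetition_penalty` can flip that choice when the top token was already generated, which is how the parameter discourages loops of repeated sounds.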
95c87b883ff3 · Updated: 2/26/2026 · 138.4K runs