resemble-ai/chatterbox-turbo

The fastest open source TTS model without sacrificing quality.

Capabilities

Seed, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`text`*	string	Required. Text to synthesize into speech (maximum 500 characters). Supported paralinguistic tags: [clear throat], [sigh], [sush], [cough], [groan], [sniff], [gasp], [chuckle], [laugh]. Example: "Oh, that's hilarious! [chuckle] Let me tell you more."	`—`	max length: 500 characters
`reference_audio`	string(uri)	Reference audio file for voice cloning (optional). Must be longer than 5 seconds. If provided, overrides the voice selection.	`—`	—
`repetition_penalty`	number	Penalizes token repetition. Higher values reduce repetition.	`1.2`	min: 1, max: 2
`seed`	integer	Random seed for reproducible results. Leave blank for random generation.	`—`	—
`temperature`	number	Controls randomness in generation. Higher values produce more varied speech.	`0.8`	min: 0.05, max: 2
`top_k`	integer	Top-k sampling. Limits vocabulary to top k tokens at each step.	`1000`	min: 1, max: 2000
`top_p`	number	Nucleus sampling threshold. Lower values make output more focused.	`0.95`	min: 0.5, max: 1
`voice`	string	Pre-made voice to use for synthesis. Ignored if reference_audio is provided.	`"Andy"`	Aaron, Abigail, Anaya, Andy, Archer, Brian, Chloe, Dylan, Emmanuel, Ethan, Evelyn, Gavin, Gordon, Ivan, Laura, Lucy, Madison, Marisol, Meera, Walter
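
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: `build_input` is a hypothetical helper, the limits are copied from the table above, and rejecting bracketed text that is not a listed tag is an assumption about how strict you want to be, not documented server behavior.

```python
import re

# Limits copied from the Input Parameters table.
MAX_TEXT_CHARS = 500
SUPPORTED_TAGS = {
    "clear throat", "sigh", "sush", "cough", "groan",
    "sniff", "gasp", "chuckle", "laugh",
}
VOICES = {
    "Aaron", "Abigail", "Anaya", "Andy", "Archer", "Brian", "Chloe",
    "Dylan", "Emmanuel", "Ethan", "Evelyn", "Gavin", "Gordon", "Ivan",
    "Laura", "Lucy", "Madison", "Marisol", "Meera", "Walter",
}
# (default, minimum, maximum) for each bounded numeric parameter.
NUMERIC_PARAMS = {
    "repetition_penalty": (1.2, 1, 2),
    "temperature": (0.8, 0.05, 2),
    "top_k": (1000, 1, 2000),
    "top_p": (0.95, 0.5, 1),
}


def build_input(text, voice="Andy", reference_audio=None, seed=None, **sampling):
    """Hypothetical helper: return a validated input dict or raise ValueError."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1 to {MAX_TEXT_CHARS} characters")
    # Assumption: treat any [bracketed] span that is not a listed tag as an error.
    found = re.findall(r"\[([^\[\]]+)\]", text)
    unknown = [tag for tag in found if tag not in SUPPORTED_TAGS]
    if unknown:
        raise ValueError(f"unsupported tags: {unknown}")

    payload = {"text": text}
    if reference_audio is not None:
        # Per the table, reference_audio overrides voice, so voice is left out.
        payload["reference_audio"] = reference_audio
    else:
        if voice not in VOICES:
            raise ValueError(f"unknown voice: {voice!r}")
        payload["voice"] = voice
    if seed is not None:
        payload["seed"] = int(seed)

    for name, (default, low, high) in NUMERIC_PARAMS.items():
        value = sampling.pop(name, default)
        if name == "top_k" and not isinstance(value, int):
            raise ValueError("top_k must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    if sampling:
        raise ValueError(f"unknown parameters: {sorted(sampling)}")
    return payload


if __name__ == "__main__":
    example = "Oh, that's hilarious! [chuckle] Let me tell you more."
    print(build_input(example, temperature=0.6, seed=42))
```

The returned dictionary uses the parameter names from the table, so it can be passed as the input object to whichever client you use to call the model. Omitted sampling parameters fall back to the documented defaults.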

Version: 95c87b883ff3
Updated: 7/25/2026
Runs: 138.4K