adirik/styletts2

Generates speech from text

Capabilities

Seed

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`text`*	string	Text to convert to speech	`—`	—
`alpha`	number	Determines the timbre of the speaker. Only used for long text inputs or when a reference speaker is provided. Use lower values to sample the style from the previous or reference speech instead of from the text.	`0.3`	min: 0, max: 1
`beta`	number	Determines the prosody of the speaker. Only used for long text inputs or when a reference speaker is provided. Use lower values to sample the style from the previous or reference speech instead of from the text.	`0.7`	min: 0, max: 1
`diffusion_steps`	integer	Number of diffusion steps	`10`	min: 0, max: 50
`embedding_scale`	number	Embedding scale; use higher values for more pronounced emotion	`1`	min: 0, max: 5
`reference`	string(uri)	Reference speech to copy style from	`—`	—
`seed`	integer	Seed for reproducibility	`0`	—
`weights`	string	Replicate weights URL for inference with a model fine-tuned on new speakers. If provided, a reference speech must also be provided. If omitted, the default model is used.	`—`	—
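
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's own tooling: `build_input` and `RANGES` are hypothetical names introduced here, and the checks only mirror the ranges and the `weights`/`reference` rule listed above. The returned dictionary is meant to be passed as the input payload to whichever client you use.

```python
from typing import Any, Dict, Optional

# Numeric ranges copied from the parameter table: name -> (min, max).
RANGES = {
    "alpha": (0, 1),
    "beta": (0, 1),
    "diffusion_steps": (0, 50),
    "embedding_scale": (0, 5),
}

# Parameters the table lists as integers.
INTEGER_PARAMS = {"diffusion_steps", "seed"}


def build_input(
    text: str,
    reference: Optional[str] = None,
    weights: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate parameters and return an input payload."""
    if not text:
        raise ValueError("text is required")
    # The table states that weights must be accompanied by a reference speech.
    if weights is not None and reference is None:
        raise ValueError("weights requires a reference speech")

    payload: Dict[str, Any] = {"text": text}
    for name, value in options.items():
        if name not in RANGES and name != "seed":
            raise ValueError(f"unknown parameter: {name}")
        if name in INTEGER_PARAMS and (
            isinstance(value, bool) or not isinstance(value, int)
        ):
            raise ValueError(f"{name} must be an integer")
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value

    if reference is not None:
        payload["reference"] = reference
    if weights is not None:
        payload["weights"] = weights
    return payload


# Example: a short text with a custom timbre weight and more diffusion steps.
payload = build_input("Hello there.", alpha=0.2, diffusion_steps=20)
```

Parameters that are omitted are left out of the payload, so the defaults from the table apply on the server side.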

Version: 989cb5ea6d24 | Updated: 7/25/2026 | 132.5K runs