adirik/styletts2
Generates speech from text
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| text * | string | Text to convert to speech (required) | — | — |
| alpha | number | Only used for long text inputs or when a reference speaker is provided. Determines the timbre of the speaker. Use lower values to sample style from the previous or reference speech instead of the text. | 0.3 | min: 0, max: 1 |
| beta | number | Only used for long text inputs or when a reference speaker is provided. Determines the prosody of the speaker. Use lower values to sample style from the previous or reference speech instead of the text. | 0.7 | min: 0, max: 1 |
| diffusion_steps | integer | Number of diffusion steps | 10 | min: 0, max: 50 |
| embedding_scale | number | Embedding scale. Use higher values for more pronounced emotion. | 1 | min: 0, max: 5 |
| reference | string (uri) | Reference speech to copy style from | — | — |
| seed | integer | Seed for reproducibility | 0 | — |
| weights | string | Replicate weights URL for inference with a model fine-tuned on new speakers. If provided, a reference speech must also be provided. If omitted, the default model is used. | — | — |
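
The defaults and constraints in the table can be checked on the client side before a request is sent. The following is a minimal sketch of that idea, not part of the model itself: the helper name `build_input` and the `NUMERIC_PARAMS` mapping are hypothetical, and only the parameter names, defaults, and ranges come from the table above.

```python
from typing import Any, Dict, Optional

# (default, min, max) for each bounded parameter, copied from the table above.
NUMERIC_PARAMS = {
    "alpha": (0.3, 0.0, 1.0),
    "beta": (0.7, 0.0, 1.0),
    "diffusion_steps": (10, 0, 50),
    "embedding_scale": (1.0, 0.0, 5.0),
}


def build_input(
    text: str,
    reference: Optional[str] = None,
    weights: Optional[str] = None,
    seed: int = 0,
    **overrides: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an input payload and enforce the table's rules."""
    if not text:
        raise ValueError("text is required")
    # The table states that fine-tuned weights need a reference speech.
    if weights is not None and reference is None:
        raise ValueError("weights requires a reference speech")

    payload: Dict[str, Any] = {"text": text, "seed": seed}
    for name, (default, lo, hi) in NUMERIC_PARAMS.items():
        value = overrides.pop(name, default)
        if name == "diffusion_steps" and not isinstance(value, int):
            raise ValueError("diffusion_steps must be an integer")
        if not lo <= value <= hi:
            raise ValueError(f"{name} must be between {lo} and {hi}, got {value}")
        payload[name] = value

    # Anything left over is not a parameter listed in the table.
    if overrides:
        raise TypeError(f"unknown parameters: {sorted(overrides)}")

    # Optional fields are only included when set.
    if reference is not None:
        payload["reference"] = reference
    if weights is not None:
        payload["weights"] = weights
    return payload


if __name__ == "__main__":
    print(build_input("Hello from StyleTTS 2.", diffusion_steps=20))
```

A dictionary built this way could then be passed as the `input` argument of a client call such as `replicate.run` from the Replicate Python client, together with the model identifier and its full version string.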
989cb5ea6d24 Updated: 2/26/2026 132.5K runs