adirik/styletts2

Generates speech from text

Capabilities

Seed

Cost

Community model (estimated from hardware time)

Input Parameters

text required string

Text to convert to speech

alpha number

Used only for long text inputs or when a reference speaker is provided. Determines the timbre of the speaker. Use lower values to sample the style from the previous or reference speech rather than from the text.

Default: 0.3, min: 0, max: 1

beta number

Used only for long text inputs or when a reference speaker is provided. Determines the prosody of the speaker. Use lower values to sample the style from the previous or reference speech rather than from the text.

Default: 0.7, min: 0, max: 1

diffusion_steps integer

Number of diffusion steps

Default: 10, min: 0, max: 50

embedding_scale number

Embedding scale. Use higher values for more pronounced emotion.

Default: 1, min: 0, max: 5

reference string

Reference speech to copy style from

seed integer

Seed for reproducibility

Default: 0

weights string

Replicate weights URL for inference with a model fine-tuned on new speakers. If provided, a reference speech must also be provided. If omitted, the default model is used.
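
The parameter constraints listed above can be checked on the client side before a request is sent. The following is a minimal sketch in Python, not part of the model's own tooling: the helper name build_input and the NUMERIC_PARAMS table are hypothetical, and only the defaults, ranges, and the rule that weights requires a reference come from the listing. It makes no network call; it only assembles a dictionary of inputs.

```python
from typing import Any, Dict, Optional

# Defaults and ranges copied from the parameter list: name -> (default, min, max).
NUMERIC_PARAMS = {
    "alpha": (0.3, 0, 1),
    "beta": (0.7, 0, 1),
    "diffusion_steps": (10, 0, 50),
    "embedding_scale": (1, 0, 5),
}


def build_input(
    text: str,
    reference: Optional[str] = None,
    weights: Optional[str] = None,
    seed: int = 0,
    **overrides: float,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble and validate an input payload."""
    if not text:
        raise ValueError("text is required")
    # The listing states that weights must be accompanied by a reference speech.
    if weights is not None and reference is None:
        raise ValueError("weights requires a reference speech")
    unknown = set(overrides) - set(NUMERIC_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload: Dict[str, Any] = {"text": text, "seed": seed}
    for name, (default, low, high) in NUMERIC_PARAMS.items():
        value = overrides.get(name, default)
        if name == "diffusion_steps" and not isinstance(value, int):
            raise ValueError("diffusion_steps must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value

    # Optional string inputs are included only when supplied.
    if reference is not None:
        payload["reference"] = reference
    if weights is not None:
        payload["weights"] = weights
    return payload


if __name__ == "__main__":
    # Lower alpha and beta lean on the reference speech, per the descriptions above.
    # The reference value here is a placeholder, not a real file.
    print(build_input("Hello there.", reference="speaker.wav", alpha=0.1, beta=0.2))
```

Validating locally surfaces out-of-range values without spending a run. The resulting dictionary could then be passed as the input to whichever client is used to call the model.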

Version: 989cb5ea6d24 Updated: 2/26/2026 132.5K runs