inworld/realtime-tts-2

Most expressive text-to-speech model from Inworld, with natural-language steering, real-time latency, and multilingual support across 100+ languages.

Cost

Community model. Cost is estimated from hardware time.

Input Parameters

text required string

The text to convert to speech. Maximum 2,000 characters. Supports natural-language steering with bracketed instructions placed before the text they apply to (e.g. `[say excitedly]`, `[whisper in a hushed style]`, `[say sadly with deliberate pauses in a low voice]`). Inline non-verbal tags are also supported (e.g. `[laugh]`, `[sigh]`, `[breathe]`, `[clear throat]`, `[cough]`, `[yawn]`). SSML break tags work for pauses (e.g. `<break time="1s" />`). Capitalize words for emphasis (e.g. `I told you NOT to do that`).
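The tagging conventions above can be composed programmatically. The following is a minimal sketch, not part of the model's API: the helper names `steer`, `pause`, and `build_text` are hypothetical, and it assumes that tags and markup count toward the 2,000-character limit and that parts may be joined with single spaces.

```python
MAX_CHARS = 2000  # limit stated in the `text` parameter description


def steer(instruction: str, text: str) -> str:
    """Prefix text with a bracketed natural-language steering instruction."""
    return f"[{instruction}] {text}"


def pause(seconds: float) -> str:
    """Build an SSML break tag for a pause of the given length in seconds."""
    return f'<break time="{seconds:g}s" />'


def build_text(*parts: str) -> str:
    """Join non-empty parts with single spaces and enforce the character limit.

    Assumption: the whole string, including tags, counts toward the limit.
    """
    text = " ".join(p.strip() for p in parts if p.strip())
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    return text


line = build_text(
    steer("say excitedly", "We actually won!"),
    "[laugh]",  # inline non-verbal tag
    pause(1),
    steer("whisper in a hushed style", "I told you NOT to doubt us."),
)
print(line)
```

Each steering instruction is placed immediately before the text it applies to, as the description requires, and the capitalized `NOT` is passed through unchanged for emphasis.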

audio_format string

Output audio format.

Default: "mp3"
Allowed values: mp3, wav, ogg_opus, flac

language string

Language of the input text. Use 'auto' to let the model detect the language. Supported production languages: English (en), Chinese (zh), Japanese (ja), Korean (ko), Russian (ru), Italian (it), Spanish (es), Portuguese (pt), French (fr), German (de), Polish (pl), Dutch (nl), Hindi (hi), Hebrew (he), Arabic (ar).

Default: "auto"
Allowed values: auto, en, zh, ja, ko, ru, it, es, pt, fr, de, pl, nl, hi, he, ar

sample_rate integer

Audio sample rate in Hz.

Default: 48000
Allowed values: 8000, 16000, 22050, 24000, 32000, 44100, 48000

speaking_rate number

Speaking speed multiplier. Set to 0 to use the normal speed (1.0).

Default: 0 (min: 0, max: 1.5)

temperature number

Controls randomness when generating audio. Higher values produce more expressive results, lower values are more deterministic. Set to 0 to use the model default (1.1).

Default: 0 (min: 0, max: 2)

text_normalization string

Controls whether numbers, dates, and abbreviations are expanded before synthesis. 'auto' lets the model decide, 'on' always normalizes, 'off' reads text as-is.

Default: "auto"
Allowed values: auto, on, off

voice_id string

The voice to use. Use a preset voice name (e.g. 'Ashley', 'Dennis', 'Alex', 'Darlene') or a custom cloned voice ID.

Default: "Ashley"
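
Taken together, the parameters listed above could be assembled and checked on the client side before a request is sent. The sketch below is illustrative only: `build_input` and `effective_settings` are hypothetical helpers, not part of any Inworld or hosting-platform SDK, and no network call is made. The allowed values, ranges, and defaults are copied from this page.

```python
ALLOWED = {
    "audio_format": {"mp3", "wav", "ogg_opus", "flac"},
    "language": {"auto", "en", "zh", "ja", "ko", "ru", "it", "es",
                 "pt", "fr", "de", "pl", "nl", "hi", "he", "ar"},
    "sample_rate": {8000, 16000, 22050, 24000, 32000, 44100, 48000},
    "text_normalization": {"auto", "on", "off"},
}
RANGES = {"speaking_rate": (0, 1.5), "temperature": (0, 2)}
DEFAULTS = {
    "audio_format": "mp3",
    "language": "auto",
    "sample_rate": 48000,
    "speaking_rate": 0,
    "temperature": 0,
    "text_normalization": "auto",
    "voice_id": "Ashley",
}
MAX_CHARS = 2000


def build_input(text: str, **overrides) -> dict:
    """Merge overrides into the documented defaults and validate the result."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides, "text": text}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name}={params[name]!r} is not an allowed value")
    for name, (low, high) in RANGES.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return params


def effective_settings(params: dict) -> dict:
    """Resolve the 0 sentinels to the values this page says they stand for."""
    return {
        "speaking_rate": params["speaking_rate"] or 1.0,  # 0 means normal speed
        "temperature": params["temperature"] or 1.1,      # 0 means model default
    }


payload = build_input(
    "[say excitedly] Hello there!",
    audio_format="wav",
    sample_rate=24000,
    speaking_rate=1.2,
)
print(payload)
print(effective_settings(payload))
```

Validating locally catches an unsupported sample rate or an out-of-range multiplier before any hardware time is spent. `voice_id` is left unchecked because it may be either a preset name or a custom cloned voice ID.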
Version: ff2e08e7e058 | Updated: 6/26/2026 | 5.7K runs