minimax/speech-02-turbo

A text-to-audio (T2A) model offering voice synthesis, emotional expression, and multilingual support. It is designed for low-latency, real-time applications.
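A minimal sketch of calling the model from Python, assuming the `replicate` client package is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_request` and `synthesize` are illustrative and not part of any library. The shape of the value returned by `replicate.run` depends on the client version, so it is passed back untouched.

```python
MODEL = "minimax/speech-02-turbo"


def build_request(text, voice_id="English_Wiselady", **overrides):
    """Return (model_ref, input_dict) for a text-to-audio request.

    Only `text` is required by the model; parameters left out of the
    dict fall back to the documented defaults on the server side.
    """
    if not text:
        raise ValueError("text is required")
    payload = {"text": text, "voice_id": voice_id}
    payload.update(overrides)
    return MODEL, payload


def synthesize(text, **overrides):
    """Run the model. Needs network access and a Replicate API token."""
    import replicate  # imported lazily so build_request works without it

    model, payload = build_request(text, **overrides)
    return replicate.run(model, input=payload)
```

For example, `synthesize("Hello there", speed=1.2, emotion="calm")` would send the three named inputs plus the default `voice_id`.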

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`text`*	string	Required. Text to narrate (max 10,000 characters). Use markers like <#0.5#> to insert pauses in seconds.	`—`	—
`audio_format`	string	File format for the generated audio. Choose mp3 for general use, wav/flac for lossless, or pcm for raw bytes.	`"mp3"`	mp3 | wav | flac | pcm
`bitrate`	integer	MP3 bitrate in bits per second. Only used when audio_format is mp3.	`128000`	32000 | 64000 | 128000 | 256000
`channel`	string	mono for 1 channel (default), stereo for 2 channels.	`"mono"`	mono | stereo
`emotion`	string	Desired delivery style. Use auto to let MiniMax choose, or pick a specific emotion.	`"auto"`	auto | happy | sad | angry | fearful | disgusted | surprised | calm | fluent | neutral
`english_normalization`	boolean	Improve number/date reading for English text (adds a small amount of latency).	`false`	—
`language_boost`	string	Optional language hint. Choose Automatic to let MiniMax detect the language, or pick a specific locale.	`"None"`	None | Automatic | Chinese | Chinese,Yue | Cantonese | English | Arabic | Russian | Spanish | French | Portuguese | German | Turkish | Dutch | Ukrainian | Vietnamese | Indonesian | Japanese | Italian | Korean | Thai | Polish | Romanian | Greek | Czech | Finnish | Hindi | Bulgarian | Danish | Hebrew | Malay | Persian | Slovak | Swedish | Croatian | Filipino | Hungarian | Norwegian | Slovenian | Catalan | Nynorsk | Tamil | Afrikaans
`pitch`	integer	Semitone offset applied to the voice (−12 to +12).	`0`	min: -12, max: 12
`sample_rate`	integer	Audio sample rate in Hz.	`32000`	8000 | 16000 | 22050 | 24000 | 32000 | 44100
`speed`	number	Speech speed multiplier (0.5–2.0). Lower is slower, higher is faster.	`1`	min: 0.5, max: 2
`subtitle_enable`	boolean	Return MiniMax subtitle metadata with sentence timestamps (non-streaming only).	`false`	—
`voice_id`	string	Voice to synthesize. Pick any MiniMax system voice (e.g. English_Wiselady, English_Deep-VoicedGentleman) or a voice_id returned by https://replicate.com/minimax/voice-cloning. See the full list of voices in the README.	`"English_Wiselady"`	—
`volume`	number	Relative loudness. 1.0 is default MiniMax gain. Range 0–10.	`1`	min: 0, max: 10
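The constraints in the table can be checked on the client before a request is sent. The following is a sketch only: `validate_input` and `pause` are hypothetical helpers, the check covers just the enumerated and ranged parameters (not `language_boost` or `voice_id`), and it assumes pause markers count toward the 10,000-character limit and that whole-number pauses may be written without a decimal point, neither of which the table states.

```python
# Allowed values and ranges copied from the parameter table.
ENUMS = {
    "audio_format": {"mp3", "wav", "flac", "pcm"},
    "bitrate": {32000, 64000, 128000, 256000},
    "channel": {"mono", "stereo"},
    "emotion": {
        "auto", "happy", "sad", "angry", "fearful",
        "disgusted", "surprised", "calm", "fluent", "neutral",
    },
    "sample_rate": {8000, 16000, 22050, 24000, 32000, 44100},
}
RANGES = {"pitch": (-12, 12), "speed": (0.5, 2.0), "volume": (0, 10)}
MAX_TEXT_CHARS = 10_000


def pause(seconds):
    """Return a pause marker such as <#0.5#> for the given duration."""
    return f"<#{seconds:g}#>"


def validate_input(payload):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        # Assumption: marker characters are counted like any other text.
        problems.append(f"text is {len(text)} characters, max is {MAX_TEXT_CHARS}")
    for name, allowed in ENUMS.items():
        if name in payload and payload[name] not in allowed:
            problems.append(f"{name}: {payload[name]!r} is not an allowed value")
    for name, (low, high) in RANGES.items():
        if name in payload and not low <= payload[name] <= high:
            problems.append(f"{name}: {payload[name]!r} is outside {low}..{high}")
    return problems
```

A caller might build the text as `"First point." + pause(0.5) + "Second point."` and send the request only when `validate_input(payload)` returns an empty list.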

Version: f39649380c14
Updated: 7/25/2026
12.5M runs