qwen/qwen3-tts

A unified text-to-speech demo with three modes: custom voice (`custom_voice`), voice cloning (`voice_clone`), and voice design (`voice_design`).

Community model (estimated from hardware time)

Name	Type	Description	Default	Constraints
`text`*	string	Text to synthesize into speech	`—`	—
`language`	string	Language of the text (use 'auto' for automatic detection)	`"auto"`	auto, Chinese, English, Japanese, Korean, French, German, Italian, Spanish, Portuguese, Russian
`mode`	string	TTS mode: 'custom_voice' uses preset speakers, 'voice_clone' clones from reference audio, 'voice_design' creates a voice from a description	`"custom_voice"`	custom_voice, voice_clone, voice_design
`reference_audio`	string(uri)	Reference audio file for voice cloning (only for 'voice_clone' mode)	`—`	—
`reference_text`	string	Transcript of the reference audio (recommended for 'voice_clone' mode)	`—`	—
`speaker`	string	Preset speaker voice (only for 'custom_voice' mode)	`"Serena"`	Aiden, Dylan, Eric, Ono_anna, Ryan, Serena, Sohee, Uncle_fu, Vivian
`style_instruction`	string	Optional style/emotion instruction (e.g., 'speak slowly and calmly', 'excited tone')	`—`	—
`voice_description`	string	Natural language description of desired voice (only for 'voice_design' mode). Example: 'A warm, friendly female voice with a slight British accent'	`—`	—
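
The mode-specific parameters in the table can be easy to mix up, so here is a minimal Python sketch that assembles an input payload and checks it against the documented constraints. The helper name `build_input` is hypothetical and not part of the model. The sketch also assumes that `reference_audio` is mandatory in `voice_clone` mode and `voice_description` is mandatory in `voice_design` mode; the table only says these fields apply to those modes, so treat the strictness as an assumption.

```python
from typing import Optional

# Allowed values copied from the Constraints column of the parameter table.
LANGUAGES = {"auto", "Chinese", "English", "Japanese", "Korean", "French",
             "German", "Italian", "Spanish", "Portuguese", "Russian"}
MODES = {"custom_voice", "voice_clone", "voice_design"}
SPEAKERS = {"Aiden", "Dylan", "Eric", "Ono_anna", "Ryan", "Serena",
            "Sohee", "Uncle_fu", "Vivian"}


def build_input(
    text: str,
    mode: str = "custom_voice",
    language: str = "auto",
    speaker: str = "Serena",
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    voice_description: Optional[str] = None,
    style_instruction: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate parameters and return an input dict."""
    if not text:
        raise ValueError("text is required")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    payload = {"text": text, "mode": mode, "language": language}

    if mode == "custom_voice":
        # Preset speakers apply only in this mode.
        if speaker not in SPEAKERS:
            raise ValueError(f"unknown speaker: {speaker!r}")
        payload["speaker"] = speaker
    elif mode == "voice_clone":
        # Assumption: cloning cannot proceed without reference audio.
        if not reference_audio:
            raise ValueError("voice_clone needs reference_audio")
        payload["reference_audio"] = reference_audio
        if reference_text:  # recommended, but optional
            payload["reference_text"] = reference_text
    else:  # voice_design
        # Assumption: a description is needed to design a voice.
        if not voice_description:
            raise ValueError("voice_design needs voice_description")
        payload["voice_description"] = voice_description

    if style_instruction:
        payload["style_instruction"] = style_instruction
    return payload


if __name__ == "__main__":
    print(build_input("Hello there.", style_instruction="speak slowly and calmly"))
```

The returned dictionary contains only the fields relevant to the chosen mode, so it could be handed to whichever client is used to call the hosted model. Dropping irrelevant fields is a design choice of this sketch, not documented behavior.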

Version: d490a561cf11 | Updated: 7/25/2026 | 712.4K runs