qwen/qwen3-tts
A unified text-to-speech demo with three modes: preset voices (custom_voice), voice cloning from reference audio (voice_clone), and voice design from a text description (voice_design)
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| text (required) | string | Text to synthesize into speech | — | — |
| language | string | Language of the text (use 'auto' for automatic detection) | "auto" | auto, Chinese, English, Japanese, Korean, French, German, Spanish, Portuguese, Russian |
| mode | string | TTS mode: 'custom_voice' uses preset speakers, 'voice_clone' clones from reference audio, 'voice_design' creates a voice from a description | "custom_voice" | custom_voice, voice_clone, voice_design |
| reference_audio | string (uri) | Reference audio file for voice cloning (only for 'voice_clone' mode) | — | — |
| reference_text | string | Transcript of the reference audio (recommended for 'voice_clone' mode) | — | — |
| speaker | string | Preset speaker voice (only for 'custom_voice' mode) | "Serena" | Aiden, Dylan, Eric, Ono_anna, Ryan, Serena, Sohee, Uncle_fu, Vivian |
| style_instruction | string | Optional style/emotion instruction (e.g., 'speak slowly and calmly', 'excited tone') | — | — |
| voice_description | string | Natural language description of the desired voice (only for 'voice_design' mode). Example: 'A warm, friendly female voice with a slight British accent' | — | — |
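
Because several parameters only apply in one mode, it can help to assemble the input in a small helper that checks the table's constraints before a request is sent. The following is a minimal sketch, not the model's own client code: the function name `build_tts_input` is hypothetical, and treating `reference_audio` and `voice_description` as mandatory in their respective modes is an assumption (the table marks only `text` as required).

```python
from typing import Optional

# Allowed values copied from the Constraints column of the table above.
LANGUAGES = {"auto", "Chinese", "English", "Japanese", "Korean",
             "French", "German", "Spanish", "Portuguese", "Russian"}
MODES = {"custom_voice", "voice_clone", "voice_design"}
SPEAKERS = {"Aiden", "Dylan", "Eric", "Ono_anna", "Ryan",
            "Serena", "Sohee", "Uncle_fu", "Vivian"}


def build_tts_input(
    text: str,
    mode: str = "custom_voice",
    language: str = "auto",
    speaker: str = "Serena",
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    voice_description: Optional[str] = None,
    style_instruction: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate arguments and return an input dict."""
    if not text:
        raise ValueError("text is required")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    payload = {"text": text, "language": language, "mode": mode}

    if mode == "custom_voice":
        if speaker not in SPEAKERS:
            raise ValueError(f"unknown speaker: {speaker!r}")
        payload["speaker"] = speaker
    elif mode == "voice_clone":
        # Assumption: cloning cannot work without a reference recording.
        if not reference_audio:
            raise ValueError("voice_clone needs reference_audio")
        payload["reference_audio"] = reference_audio
        if reference_text:  # recommended, not required
            payload["reference_text"] = reference_text
    else:  # voice_design
        # Assumption: design mode needs a description to work from.
        if not voice_description:
            raise ValueError("voice_design needs voice_description")
        payload["voice_description"] = voice_description

    if style_instruction:
        payload["style_instruction"] = style_instruction
    return payload
```

Calling `build_tts_input("Hello")` yields the defaults from the table (`language` "auto", `mode` "custom_voice", `speaker` "Serena"). Only the keys relevant to the chosen mode are included, so the resulting dict can be handed to whichever client is used to call the model.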
501be1210291 Updated: 2/26/2026 83.9K runs