zsxkib/dia
Dia 1.6B by Nari Labs. Generates realistic dialogue audio from text, including non-verbal cues and voice cloning.
Capabilities
Cost
Community model (estimated from hardware time)
Input Parameters
| Name | Type | Description | Default | Constraints |
|---|---|---|---|---|
| text (required) | string | Input text for dialogue generation. Use [S1], [S2] to indicate different speakers and (description) in parentheses for non-verbal cues e.g., (laughs), (whispers). | — | — |
| audio_prompt | string (uri) | Optional audio file (.wav/.mp3/.flac) for voice cloning. The model will attempt to mimic this voice style. | — | — |
| audio_prompt_text | string | Optional transcript of the audio prompt. If provided, this will be prepended to the main text input. | — | — |
| cfg_filter_top_k | integer | Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio. | 45 | min: 10, max: 100 |
| cfg_scale | number | Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more. | 3 | min: 1, max: 5 |
| max_audio_prompt_seconds | integer | Maximum duration in seconds for the input voice cloning audio prompt. Only used when an audio prompt is provided. Longer voice samples will be truncated to this length. | 10 | min: 1, max: 120 |
| max_new_tokens | integer | Controls the length of generated audio. Higher values create longer audio. (86 tokens ≈ 1 second of audio). | 3072 | min: 500, max: 4096 |
| seed | integer | Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time. | — | — |
| speed_factor | number | Adjusts playback speed of the generated audio. Values below 1.0 slow down the audio; 1.0 is original speed. | 1 | min: 0.5, max: 1.5 |
| temperature | number | Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values make output more consistent. Set to 0 for deterministic (greedy) generation. | 1.8 | min: 1, max: 2.5 |
| top_p | number | Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter. | 0.95 | min: 0.1, max: 1 |
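The defaults and constraints in the table can be checked on the client side before a request is sent. The following is a minimal sketch, not part of any official client: the names `NUMERIC_PARAMS`, `OPTIONAL_PARAMS`, `build_input`, and `approx_seconds` are hypothetical, and only the parameter names, defaults, ranges, and the "86 tokens ≈ 1 second" ratio come from the table. It also assumes that the listed min/max values are inclusive.

```python
# Hypothetical helpers; defaults and ranges are copied from the parameter table.
TOKENS_PER_SECOND = 86  # from the table: "86 tokens ≈ 1 second of audio"

# name: (default, minimum, maximum); bounds are assumed to be inclusive
NUMERIC_PARAMS = {
    "cfg_filter_top_k": (45, 10, 100),
    "cfg_scale": (3, 1, 5),
    "max_audio_prompt_seconds": (10, 1, 120),
    "max_new_tokens": (3072, 500, 4096),
    "speed_factor": (1, 0.5, 1.5),
    "temperature": (1.8, 1, 2.5),
    "top_p": (0.95, 0.1, 1),
}
# Parameters listed without a default or a numeric range
OPTIONAL_PARAMS = {"audio_prompt", "audio_prompt_text", "seed"}


def build_input(text, **overrides):
    """Return an input dict filled with table defaults; reject out-of-range values."""
    if not text or not text.strip():
        raise ValueError("text is required")
    payload = {"text": text}
    for name, (default, _low, _high) in NUMERIC_PARAMS.items():
        payload[name] = default
    for name, value in overrides.items():
        if name in NUMERIC_PARAMS:
            _default, low, high = NUMERIC_PARAMS[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        elif name not in OPTIONAL_PARAMS:
            raise ValueError(f"unknown parameter: {name}")
        payload[name] = value
    return payload


def approx_seconds(max_new_tokens):
    """Rough audio length implied by a token budget, before any speed_factor change."""
    return max_new_tokens / TOKENS_PER_SECOND


payload = build_input(
    "[S1] Did you hear that? (laughs) [S2] I did. (whispers) Keep it down.",
    max_new_tokens=1720,
    seed=42,
)
print(payload["max_new_tokens"], approx_seconds(payload["max_new_tokens"]))  # 1720 20.0
```

By the same ratio, the default budget of 3072 tokens corresponds to roughly 36 seconds of audio and the maximum of 4096 to roughly 48 seconds. Note that under the listed constraint (min: 1), `build_input("...", temperature=0)` raises an error, so the "set to 0" greedy mode mentioned in the description falls outside the range shown in the table.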
2119e338ca5c Updated: 2/26/2026 12.8K runs
cinemasetfree.com