zsxkib/dia

Dia 1.6B by Nari Labs. Generates realistic dialogue audio from text, including non-verbal cues and voice cloning.

Capabilities

Seed, Max Tokens, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

Name	Type	Description	Default	Constraints
`text`*	string	Input text for dialogue generation. Use [S1], [S2] to indicate different speakers and (description) in parentheses for non-verbal cues e.g., (laughs), (whispers).	`—`	—
`audio_prompt`	string(uri)	Optional audio file (.wav/.mp3/.flac) for voice cloning. The model will attempt to mimic this voice style.	`—`	—
`audio_prompt_text`	string	Optional transcript of the audio prompt. If provided, this will be prepended to the main text input.	`—`	—
`cfg_filter_top_k`	integer	Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.	`45`	min: 10, max: 100
`cfg_scale`	number	Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more.	`3`	min: 1, max: 5
`max_audio_prompt_seconds`	integer	Maximum duration in seconds for the input voice cloning audio prompt. Only used when an audio prompt is provided. Longer voice samples will be truncated to this length.	`10`	min: 1, max: 120
`max_new_tokens`	integer	Controls the length of generated audio. Higher values create longer audio. (86 tokens ≈ 1 second of audio).	`3072`	min: 500, max: 4096
`seed`	integer	Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.	`—`	—
`speed_factor`	number	Adjusts playback speed of the generated audio. Values below 1.0 slow down the audio, values above 1.0 speed it up; 1.0 is original speed.	`1`	min: 0.5, max: 1.5
`temperature`	number	Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values make output more consistent. Set to 0 for deterministic (greedy) generation.	`1.8`	min: 1, max: 2.5
`top_p`	number	Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.	`0.95`	min: 0.1, max: 1
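
The table above can be turned into a small input-building helper. The sketch below is illustrative only: the function names (`format_dialogue`, `tokens_for_seconds`, `build_input`) are hypothetical and not part of the model or any client library, and the 86 tokens per second figure is the approximation quoted in the `max_new_tokens` description. It formats speaker turns with `[S1]`/`[S2]` tags, converts a target duration into `max_new_tokens`, and checks numeric overrides against the listed ranges.

```python
import math

# Approximate rate quoted in the max_new_tokens description (assumption: treated as exact here).
TOKENS_PER_SECOND = 86

# (min, max) pairs copied from the Constraints column of the table.
RANGES = {
    "cfg_filter_top_k": (10, 100),
    "cfg_scale": (1, 5),
    "max_audio_prompt_seconds": (1, 120),
    "max_new_tokens": (500, 4096),
    "speed_factor": (0.5, 1.5),
    "temperature": (1, 2.5),
    "top_p": (0.1, 1),
}


def format_dialogue(turns):
    """Join (speaker_number, line) pairs into '[S1] ... [S2] ...' text."""
    return " ".join(f"[S{speaker}] {line.strip()}" for speaker, line in turns)


def tokens_for_seconds(seconds):
    """Convert a target duration to max_new_tokens, clamped to the listed range."""
    low, high = RANGES["max_new_tokens"]
    return max(low, min(high, math.ceil(seconds * TOKENS_PER_SECOND)))


def build_input(turns, seconds, **overrides):
    """Build an input dict; reject ranged parameters that fall outside the table's limits."""
    for name, value in overrides.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    payload = {
        "text": format_dialogue(turns),
        "max_new_tokens": tokens_for_seconds(seconds),
    }
    payload.update(overrides)  # unranged fields such as seed pass through unchanged
    return payload


example = build_input(
    [(1, "Did you hear that?"), (2, "(whispers) I think so.")],
    seconds=10,
    cfg_scale=3,
    seed=42,
)
```

By the same approximation, the default of 3072 tokens corresponds to roughly 35.7 seconds of audio and the maximum of 4096 to roughly 47.6 seconds, so longer targets are clamped by `tokens_for_seconds`. The resulting dictionary is meant to be passed as the model input by whatever client is in use.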

Version: 2119e338ca5c
Updated: 7/25/2026
12.8K runs