zsxkib/dia

Dia 1.6B by Nari Labs generates realistic dialogue audio from text, with support for non-verbal cues and voice cloning.

Capabilities

Seed, Max Tokens, Top-P

Cost

Community model (estimated from hardware time)

Input Parameters

text required string

Input text for dialogue generation. Use [S1], [S2] to indicate different speakers and (description) in parentheses for non-verbal cues e.g., (laughs), (whispers).
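The tagging convention described above can be assembled programmatically. This is a minimal sketch, assuming a hypothetical helper named `format_dialogue` (not part of the model's API) and assuming turns are joined with a single space; only the `[S1]` and `[S2]` tags mentioned in the description are accepted.

```python
def format_dialogue(turns):
    """Join (speaker_number, line) pairs into one tagged prompt string.

    Hypothetical helper for illustration; the model only receives the
    final string. Joining turns with a space is an assumption.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError("only [S1] and [S2] are described for this model")
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


text = format_dialogue([
    (1, "Did you hear the news? (laughs)"),
    (2, "(whispers) Not so loud."),
])
print(text)
```

Non-verbal cues such as `(laughs)` are ordinary text inside a turn, so they need no special handling in the helper.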

audio_prompt string

Optional audio file (.wav/.mp3/.flac) for voice cloning. The model will attempt to mimic this voice style.

audio_prompt_text string

Optional transcript of the audio prompt. If provided, this will be prepended to the main text input.
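To make the prepending behavior concrete, here is a small sketch of the combined string. The function name `combine_prompt` is hypothetical, and the single-space separator is an assumption, since the page does not say how the two strings are joined.

```python
def combine_prompt(text, audio_prompt_text=None):
    """Return the text with the audio prompt transcript prepended.

    Assumption: the transcript and the main text are joined with one
    space. The real separator is not documented on this page.
    """
    if not audio_prompt_text:
        return text
    return f"{audio_prompt_text.strip()} {text.strip()}"


combined = combine_prompt(
    "[S1] And this is the new line to generate.",
    audio_prompt_text="[S1] This is what the reference clip says.",
)
print(combined)
```

Because the transcript becomes part of the input text, it is reasonable to tag it with the same speaker label as the voice you want cloned.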

cfg_filter_top_k integer

Top-k filter applied to audio tokens during generation. Higher values allow more diverse sounds; lower values create more consistent audio.

Default: 45, min: 10, max: 100

cfg_scale number

Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more.

Default: 3, min: 1, max: 5

max_audio_prompt_seconds integer

Maximum duration in seconds for the input voice cloning audio prompt. Only used when an audio prompt is provided. Longer voice samples will be truncated to this length.

Default: 10, min: 1, max: 120

max_new_tokens integer

Controls the length of generated audio. Higher values create longer audio. (86 tokens ≈ 1 second of audio).

Default: 3072, min: 500, max: 4096

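The token-to-duration rule of thumb for max_new_tokens can be turned into a quick calculation. This sketch uses only the approximate figure of 86 tokens per second and the listed range of 500 to 4096; the function names are illustrative.

```python
TOKENS_PER_SECOND = 86  # approximate rate quoted in the parameter description


def tokens_to_seconds(tokens):
    """Estimate audio duration in seconds for a given token budget."""
    return tokens / TOKENS_PER_SECOND


def seconds_to_tokens(seconds, minimum=500, maximum=4096):
    """Convert a target duration to max_new_tokens, clamped to the listed range."""
    tokens = round(seconds * TOKENS_PER_SECOND)
    return max(minimum, min(maximum, tokens))


for tokens in (500, 3072, 4096):
    print(tokens, "tokens is about", round(tokens_to_seconds(tokens), 1), "s")

print(seconds_to_tokens(20))
```

Under this approximation the default of 3072 tokens corresponds to roughly 36 seconds of audio, and the maximum of 4096 to roughly 48 seconds.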
seed integer

Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.

speed_factor number

Adjusts playback speed of the generated audio. Values below 1.0 slow the audio down, values above 1.0 speed it up, and 1.0 keeps the original speed.

Default: 1, min: 0.5, max: 1.5

temperature number

Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values make output more consistent. Set to 0 for deterministic (greedy) generation.

Default: 1.8, min: 1, max: 2.5

top_p number

Nucleus sampling threshold that controls the diversity of sampled tokens. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.

Default: 0.95, min: 0.1, max: 1

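For readers unfamiliar with top_p, the sketch below shows the general idea of nucleus (top-p) filtering on a toy probability list. It assumes top_p here has its usual meaning; it is a generic illustration and is not taken from the model's own sampling code.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities.

    Generic nucleus-sampling sketch, not the model's implementation.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))  # the three most likely tokens survive
print(top_p_filter(probs, 0.6))  # only the two most likely tokens survive
```

A higher threshold lets lower-probability tokens stay in the candidate pool, which is why higher values produce more unusual output.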
Version: 2119e338ca5c Updated: 2/26/2026 12.8K runs
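The defaults and ranges listed above can be checked before a request is sent. The following is a minimal sketch, assuming a hypothetical `build_input` helper that fills in the listed defaults and rejects out-of-range values; it builds a plain dictionary and does not call any API.

```python
# Defaults and ranges copied from the parameter list on this page:
# name -> (default, min, max)
NUMERIC_PARAMS = {
    "cfg_filter_top_k": (45, 10, 100),
    "cfg_scale": (3, 1, 5),
    "max_audio_prompt_seconds": (10, 1, 120),
    "max_new_tokens": (3072, 500, 4096),
    "speed_factor": (1, 0.5, 1.5),
    "temperature": (1.8, 1, 2.5),
    "top_p": (0.95, 0.1, 1),
}
# Optional fields with no listed range.
OPTIONAL_FIELDS = {"seed", "audio_prompt", "audio_prompt_text"}


def build_input(text, **overrides):
    """Hypothetical helper: return an input dict with defaults applied."""
    if not text:
        raise ValueError("text is required")
    unknown = set(overrides) - set(NUMERIC_PARAMS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"text": text}
    for name, (default, low, high) in NUMERIC_PARAMS.items():
        value = overrides.get(name, default)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        payload[name] = value
    for name in OPTIONAL_FIELDS & set(overrides):
        payload[name] = overrides[name]
    return payload


payload = build_input("[S1] Hello there. [S2] Hi! (laughs)", cfg_scale=4, seed=42)
print(payload)
```

Omitting `seed` from the call leaves it out of the dictionary, which matches the "leave blank for random results" behavior described for that parameter.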