Dia TTS Server | Text-to-Dialogue

Generate Speech with Dia

Text to speak

Use [S1] and [S2] tags for speaker turns. Add non-verbals like (laughs).

0 / 8192

Voice Mode

Single / Dialogue (Use [S1]/[S2])

Voice Clone (from Reference)

Load Example Preset

Generation Parameters

Speed Factor (0.9)

CFG Scale (3.0)

Temperature (1.3)

Top P (0.95)

CFG Filter Top K (35)

Server Configuration

These settings are saved to the .env file. Restart the server to apply changes.

Model Repo ID

Model Config Filename

Model Weights Filename

Model Cache Path

Reference Audio Path

Output Path

Server Host

Server Port


For **Dialogue** mode, clearly mark speaker turns using [S1] and [S2].
Add non-verbal sounds like (laughs), (sighs), (clears throat) within the text where desired.
For **Voice Clone** mode, upload a clean reference audio file (.wav/.mp3) using the "Load" button. Crucially, include the exact transcript of the reference audio at the beginning of your text input (e.g., [S1] Reference transcript. [S1] Target text...).
Experiment with **CFG Scale** (higher = more adherence to text, potentially less natural) and **Temperature** (higher = more random/varied).
The **Speed Factor** adjusts playback speed (0.8 = slower, 1.0 = original).
Use the /v1/audio/speech endpoint for OpenAI compatibility. Set the voice parameter to select the mode: 'S1', 'S2', 'dialogue', or a reference audio filename such as 'reference_file.wav'.
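
The tips above can be sketched in Python. This is a minimal sketch, not the server's own client. It shows how to tag speaker turns, how to prepend a reference transcript for Voice Clone mode, and how to build a request for /v1/audio/speech. Several details are assumptions: the JSON field names (model, input, voice, response_format, speed) follow OpenAI's speech API and may not all be accepted by this server, the model value "dia" is a placeholder, the 8192 limit is read from the page's counter and assumed to count characters, and every function name here is hypothetical.

```python
import json
import urllib.request

# Assumption: the "0 / 8192" counter on the page is a character limit.
MAX_CHARS = 8192


def build_dialogue(turns):
    """Join (speaker, text) pairs into one script with [S1]/[S2] tags."""
    parts = []
    for speaker, text in turns:
        if speaker not in ("S1", "S2"):
            raise ValueError(f"unknown speaker tag: {speaker!r}")
        parts.append(f"[{speaker}] {text.strip()}")
    script = " ".join(parts)
    if len(script) > MAX_CHARS:
        raise ValueError(f"script is {len(script)} chars, limit is {MAX_CHARS}")
    return script


def build_clone_text(reference_transcript, target_text):
    """Put the reference transcript first, then the target text (Voice Clone mode)."""
    return build_dialogue([("S1", reference_transcript), ("S1", target_text)])


def build_speech_payload(text, voice="dialogue", speed=1.0):
    """Build a JSON body using OpenAI-style field names (assumed, not confirmed)."""
    return {
        "model": "dia",  # placeholder value, an assumption
        "input": text,
        "voice": voice,  # 'S1', 'S2', 'dialogue', or a reference filename
        "response_format": "wav",
        "speed": speed,
    }


def synthesize(base_url, payload, out_path):
    """POST the payload to <base_url>/v1/audio/speech and save the audio bytes."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

To use it, build the text with build_dialogue([("S1", "Hello there."), ("S2", "Hi! (laughs)")]), pass the result to build_speech_payload, and call synthesize with the host and port set under Server Configuration. Keeping payload construction separate from the network call makes the text formatting easy to check without a running server.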