Generate Speech with Dia

Use [S1] and [S2] tags for speaker turns. Add non-verbals like (laughs).

Generation Parameters
Server Configuration

These settings are saved to the .env file. Restart the server to apply changes.

Tips & Tricks for Dia

  • For **Dialogue** mode, clearly mark speaker turns using [S1] and [S2].
  • Add non-verbal sounds like (laughs), (sighs), (clears throat) within the text where desired.
  • For **Voice Clone** mode, upload a clean reference audio file (.wav/.mp3) using the "Load" button, and begin your text input with the exact transcript of that reference audio (e.g., [S1] Reference transcript. [S1] Target text...).
  • Experiment with **CFG Scale** (higher = closer adherence to the text, but potentially less natural speech) and **Temperature** (higher = more random and varied output).
  • The **Speed Factor** adjusts playback speed (0.8 = slower, 1.0 = original).
  • Use the /v1/audio/speech endpoint for OpenAI-compatible requests, and set the voice parameter to select the mode ('S1', 'S2', 'dialogue', or a reference filename such as 'reference_file.wav').
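
The tips above can be sketched as a small Python client. This is a minimal illustration, not the server's documented API: the base URL and port, the "dia" model name, the "wav" response format, and the helper names (format_dialogue, format_clone_input, build_speech_payload, synthesize) are all assumptions. The request fields mirror OpenAI's /v1/audio/speech body; whether this server honors each of them, including speed as the Speed Factor, should be checked against your deployment.

```python
import json
import urllib.request

# Assumption: host and port depend on your deployment; adjust as needed.
BASE_URL = "http://localhost:8003"


def format_dialogue(turns):
    """Join (speaker, text) pairs into one tagged script, e.g. '[S1] Hi. [S2] Hello.'"""
    return " ".join(f"[{speaker}] {text.strip()}" for speaker, text in turns)


def format_clone_input(reference_transcript, target_text, speaker="S1"):
    """Place the exact transcript of the reference audio before the target text."""
    return format_dialogue([(speaker, reference_transcript), (speaker, target_text)])


def build_speech_payload(text, voice="dialogue", speed=1.0):
    """Build a request body shaped like OpenAI's /v1/audio/speech request.

    'model' and 'response_format' values are hypothetical placeholders.
    """
    return {
        "model": "dia",
        "input": text,
        "voice": voice,  # 'S1', 'S2', 'dialogue', or a reference filename
        "response_format": "wav",
        "speed": speed,
    }


def synthesize(text, voice="dialogue", out_path="output.wav", base_url=BASE_URL):
    """POST the payload and write the returned audio bytes to out_path."""
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(build_speech_payload(text, voice)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path


if __name__ == "__main__":
    script = format_dialogue([
        ("S1", "Did you hear the news?"),
        ("S2", "I did. (laughs) I can't believe it."),
    ])
    # Requires a running server at BASE_URL.
    print(synthesize(script, voice="dialogue"))
```

For voice cloning, the same call would pass the reference filename as the voice and the output of format_clone_input as the text, for example synthesize(format_clone_input("Reference transcript.", "Target text."), voice="reference_file.wav").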