Tips & Tricks for Dia
- For **Dialogue** mode, clearly mark speaker turns using `[S1]` and `[S2]`.
- Add non-verbal sounds like `(laughs)`, `(sighs)`, `(clears throat)` within the text where desired.
- For **Voice Clone** mode, upload a clean reference audio file (`.wav`/`.mp3`) using the "Load" button. Crucially, include the exact transcript of the reference audio at the beginning of your text input (e.g., `[S1] Reference transcript. [S1] Target text...`).
- Experiment with **CFG Scale** (higher = more adherence to text, potentially less natural) and **Temperature** (higher = more random/varied).
- The **Speed Factor** adjusts playback speed (0.8 = slower, 1.0 = original).
- Use the `/v1/audio/speech` endpoint for OpenAI compatibility. Use the `voice` parameter to specify mode (`'S1'`, `'S2'`, `'dialogue'`, `'reference_file.wav'`).
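
The tips above can be combined into a small client sketch. It is only an illustration under stated assumptions: the base URL, the `model` value, and the helper names (`build_dialogue`, `build_clone_text`, `build_speech_request`, `synthesize`) are hypothetical, and the JSON field names (`model`, `input`, `voice`, `response_format`, `speed`) follow the OpenAI speech API convention rather than anything confirmed here. In particular, whether `speed` maps to the Speed Factor is a guess; check your server's API docs before relying on it.

```python
import json
import urllib.request

# Assumption: placeholder address; replace with wherever your server listens.
BASE_URL = "http://localhost:8003"


def build_dialogue(turns):
    """Join (speaker, text) pairs into a tagged script such as '[S1] Hi. [S2] Hello.'"""
    parts = []
    for speaker, text in turns:
        if speaker not in ("S1", "S2"):
            raise ValueError(f"unknown speaker tag: {speaker!r}")
        parts.append(f"[{speaker}] {text.strip()}")
    return " ".join(parts)


def build_clone_text(reference_transcript, target_text, speaker="S1"):
    """Put the exact reference transcript first, then the text to synthesize."""
    return build_dialogue([(speaker, reference_transcript), (speaker, target_text)])


def build_speech_request(text, voice="dialogue", speed=1.0, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/audio/speech endpoint."""
    payload = {
        "model": "dia",            # assumption: hypothetical model name
        "input": text,
        "voice": voice,            # 'S1', 'S2', 'dialogue', or a reference file name
        "response_format": "wav",  # assumption: OpenAI-style field
        "speed": speed,            # assumption: may correspond to Speed Factor
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, voice="dialogue", speed=1.0):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice=voice, speed=speed)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With a server running, a dialogue could be generated with `synthesize(build_dialogue([("S1", "Hello there."), ("S2", "Hi! (laughs)")]), "out.wav")`. For voice cloning, pass the text from `build_clone_text(...)` together with `voice="reference_file.wav"`.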