Speech Samples from:

- SOTA TTS

- Real-time TTS

- Speaker Adaptation using Fine-Tuning

Tacotron2_WaveNet

sample1	sample2

Tacotron2_WaveRNN

sample1	sample2

Tacotron2_WaveGlow

sample1	sample2

FastSpeech2_Multiband-MelGAN

sample1	sample2

FastSpeech_WaveGlow

Normal

sample1	sample2

Fast x0.8

sample1	sample2

Slow x1.2

sample1	sample2

FastPitch_WaveGlow

Normal

sample1	sample2

Fast x0.8

sample1	sample2

Slow x1.2

sample1	sample2

High +30Hz

sample1	sample2

Low -30Hz

sample1	sample2

Speaker Adaptation

Speech DB

- Source Speaker: lmy

- Target Speaker: ada

- Target Speakers: ava, avb

Fine-Tuning with Tacotron2 & WaveNet

source speaker	target ada	converted ada using 7 min. DB	target avb	converted avb using 15 min. DB	target ava	converted ava using 30 min. DB

Fine-Tuning with FastSpeech2 & Multiband-MelGAN

source speaker	target ava	converted ava using 30 min. DB

Fine-Tuning with FastPitch & WaveGlow

source speaker	target ava	converted ava using 30 min. DB