Speech Samples from

 - SOTA TTS

 - Real-time TTS

 - Speaker Adaptation using Fine Tuning Method

Tacotron2_WaveNet

sample1 sample2

Tacotron2_WaveRNN

sample1 sample2

Tacotron2_WaveGlow

sample1 sample2

FastSpeech2_Multiband-MelGAN

sample1 sample2

FastSpeech_WaveGlow

Normal

sample1 sample2

Fast x0.8

sample1 sample2

Slow x1.2

sample1 sample2
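
The Fast and Slow samples above can be read as the result of scaling predicted phoneme durations before the length regulator expands them to frames. The page does not say how the x0.8 and x1.2 factors were applied, so the following is only a minimal sketch under that assumption. The function name `length_regulate`, the `duration_scale` parameter, and the toy durations are hypothetical and are not taken from the models used here.

```python
import numpy as np

def length_regulate(phoneme_features, durations, duration_scale=1.0):
    """Repeat each phoneme's feature vector for its scaled number of frames."""
    # Assumption: the factor multiplies durations, so 0.8 gives faster speech
    # and 1.2 gives slower speech.
    scaled = np.rint(np.asarray(durations) * duration_scale).astype(int)
    # Keep at least one frame per phoneme so no phoneme disappears.
    scaled = np.maximum(scaled, 1)
    return np.repeat(phoneme_features, scaled, axis=0)

# Toy input: 3 phonemes with 2-dimensional features and made-up frame counts.
features = np.arange(6, dtype=float).reshape(3, 2)
durations = [5, 10, 5]

normal = length_regulate(features, durations, 1.0)
fast = length_regulate(features, durations, 0.8)
slow = length_regulate(features, durations, 1.2)
```

With these toy durations the three outputs differ only in their number of frames, which is what changes the speaking rate once the frames are passed to the decoder and vocoder.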

FastPitch_WaveGlow

Normal

sample1 sample2

Fast x0.8

sample1 sample2

Slow x1.2

sample1 sample2

High +30Hz

sample1 sample2

Low -30Hz

sample1 sample2
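
The High and Low samples above suggest a constant offset added to the predicted pitch contour before decoding. The page does not describe the exact procedure, so this is a hedged sketch only. The function name `shift_pitch`, the convention that unvoiced positions are stored as 0 Hz, and the example contour are all assumptions made for illustration.

```python
import numpy as np

def shift_pitch(f0_hz, offset_hz):
    """Add a constant offset in Hz to the voiced positions of an F0 contour."""
    f0 = np.asarray(f0_hz, dtype=float)
    # Assumption: unvoiced positions are marked with 0 Hz and stay untouched.
    voiced = f0 > 0
    # Clamp at 0 so a large negative offset cannot produce a negative pitch.
    return np.where(voiced, np.maximum(f0 + offset_hz, 0.0), 0.0)

# Made-up contour in Hz; zeros stand for unvoiced positions.
contour = [0.0, 120.0, 130.0, 0.0, 110.0]

high = shift_pitch(contour, +30.0)
low = shift_pitch(contour, -30.0)
```

A constant offset in Hz keeps the shape of the contour but changes the perceived interval sizes slightly. A multiplicative shift (for example in semitones) would be a different design choice.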

Speaker Adaptation

 Speech DB

  - Source Speaker: lmy

  - Target Speaker: ada

  - Target Speakers: ava, avb

Fine Tuning with Tacotron2 & WaveNet

 - source speaker

 - target ada

 - converted ada using 7 min. DB

 - target avb

 - converted avb using 15 min. DB

 - target ava

 - converted ava using 30 min. DB

Fine Tuning with FastSpeech2 & Multiband_MelGAN

 - source speaker

 - target ava

 - converted ava using 30 min. DB

Fine Tuning with FastPitch & WaveGlow

 - source speaker

 - target ava

 - converted ava using 30 min. DB