Voyages 5 Speech Synthesis
This tutorial reinforces your knowledge of speech synthesis and text-to-speech (TTS) systems. It covers manipulating audio files, using a basic concatenative synthesiser, and using more advanced systems. The outputs of several systems are compared to understand the differences between them.
Audio processing with Audacity
- Download and install Audacity. Choose the installer that suits your system from https://www.fosshub.com/Audacity.html
- Make an audio recording of your student number (you can use one that you’ve recorded before). Save it as a WAV file.
- Pitch & duration
- Use the Effects > Change Pitch menu item to change the pitch.
- Listen to it, and save the file as student_number_1.wav
- Revert to the original recording.
- Try the Change Speed effect.
- Listen to it, and save the file as student_number_2.wav
- Revert to the original recording.
- Try the Change Tempo effect.
- Listen to it, and save the file as student_number_3.wav
- Normalisation
- Download this folder of audio files.
- Open each file in Audacity and compare the waveforms.
- Open the kicking-mule-very-quiet audio file.
- Normalise the levels.
- Show and tell!
- Try this Colab to generate representations of audio features.
- Read more in Jurafsky and Martin’s Speech and Language Processing, particularly chapters 25 and 26.
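The Change Speed and Normalise steps above can be sketched in a few lines of Python. This is a minimal sketch, not how Audacity implements its effects. It assumes NumPy is installed and uses a synthetic tone in place of your recording, and the function names change_speed and peak_normalise are made up for illustration. Resampling changes pitch and duration together, which is what you should hear with Change Speed. Change Pitch and Change Tempo each alter only one of the two, which needs more elaborate processing that is not shown here.

```python
import numpy as np


def change_speed(samples: np.ndarray, factor: float) -> np.ndarray:
    """Resample by linear interpolation (illustrative helper).

    factor > 1 gives a shorter, higher-pitched signal when played back
    at the original sample rate; factor < 1 gives a longer, lower one.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, int(round(len(samples) / factor)))
    # Positions in the original signal at which the new samples are read.
    positions = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(positions, np.arange(len(samples)), samples)


def peak_normalise(samples: np.ndarray, target_db: float = -1.0) -> np.ndarray:
    """Remove any DC offset, then scale so the peak sits at target_db dBFS."""
    centred = samples - np.mean(samples)
    peak = np.max(np.abs(centred))
    if peak == 0:
        return centred  # silence: nothing to scale
    target = 10 ** (target_db / 20)  # convert dB to linear amplitude
    return centred * (target / peak)


# Synthetic stand-in for a very quiet recording: 1 s of a 220 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
quiet_tone = 0.01 * np.sin(2 * np.pi * 220 * t)

faster = change_speed(quiet_tone, 2.0)   # half as long, one octave higher
louder = peak_normalise(quiet_tone)      # same shape, much larger amplitude

print(len(quiet_tone), len(faster))
print(float(np.max(np.abs(quiet_tone))), float(np.max(np.abs(louder))))
```

To try this on your own recording, you could load the WAV file into a NumPy array with a library of your choice and pass it to these functions. Comparing the waveform before and after normalisation should show the same shape at a larger scale.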
Speech Synthesis systems
- Concatenative TTS
- Open and copy the Concatenative demo Colab
- Run the first code cells to download the Python module.
- Type a sentence into the message variable.
- Run that cell to generate an audio file of the text.
- Preview the audio.
- Comment on the quality of the result.
- DeepVoice3
- Open and copy the DeepVoice demo Colab
- Run the code cells to set up and install the program.
- Change the sentence.
- Generate speech.
- Describe the plots.
- More info: Medium article, GitHub, Baidu
- Tacotron2
- Open and copy the Tacotron2 demo Colab
- Run the code cells to set up and install the program.
- Change the sentence.
- Generate the speech.
- More info: Google AI blog
- Voice cloning
- Open and copy the RealTimeVoiceCloning demo Colab
- Run the code cells to set up and install the program.
- Record yourself using the Colab (you may need to authorise the browser to record). Read aloud the Harvard Sentences, or the SUS or NIT sentences.
- Write a sentence and synthesise it with your voice-cloned system.
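To see what "concatenative" means in the first exercise above, here is a toy sketch of the idea: look up a stored waveform for each unit and join the waveforms with a short crossfade. It is not the code from the Concatenative demo Colab. It assumes NumPy is installed, the unit inventory is made of synthetic tones rather than recorded speech, and the names make_unit, UNITS and concatenate are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16000


def make_unit(freq_hz: float, duration_s: float = 0.2) -> np.ndarray:
    """Hypothetical stand-in for a recorded unit: a short sine tone."""
    n = int(round(SAMPLE_RATE * duration_s))
    t = np.arange(n) / SAMPLE_RATE
    return 0.5 * np.sin(2 * np.pi * freq_hz * t)


# Hypothetical inventory: one stored waveform per word.
# A real system would hold recordings of diphones, syllables or words.
UNITS = {
    "one": make_unit(220.0),
    "two": make_unit(330.0),
    "three": make_unit(440.0),
}


def concatenate(words, units, fade_s: float = 0.01) -> np.ndarray:
    """Join the stored units for each word, crossfading at every boundary."""
    fade = int(round(SAMPLE_RATE * fade_s))
    ramp = np.linspace(0.0, 1.0, fade)
    output = np.zeros(0)
    for word in words:
        unit = units[word.lower()]  # raises KeyError for unknown words
        if len(output) == 0:
            output = unit.copy()
        elif fade == 0:
            output = np.concatenate([output, unit])
        else:
            # Fade the previous unit out while the next one fades in.
            overlap = output[-fade:] * (1.0 - ramp) + unit[:fade] * ramp
            output = np.concatenate([output[:-fade], overlap, unit[fade:]])
    return output


speech = concatenate("one two three".split(), UNITS)
print(f"{len(speech) / SAMPLE_RATE:.2f} seconds of audio")
```

The crossfade only hides clicks at the joins. It does nothing about mismatched pitch or timing between units, which is one reason concatenative output can sound choppy next to the neural systems in the later exercises.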