
Voyages 5 Speech Synthesis

This tutorial reinforces your knowledge of speech synthesis and text-to-speech (TTS) systems. It covers manipulating audio files, using a basic concatenative synthesiser, and using more advanced neural systems. You will compare the output of several systems to understand the differences between them.

Audio processing with Audacity

  1. Download and install Audacity. Choose the installer to suit your system from https://www.fosshub.com/Audacity.html

  2. Make an audio recording of your student number (you can use one that you’ve recorded before). Save it as a WAV file.

  3. Pitch & duration
    1. Use the Effects > Change Pitch menu item to change the pitch.
    2. Listen to it, and save the file as student_number_1.wav
    3. Revert to the original recording.
    4. Try the Change Speed effect.
    5. Listen to it, and save the file as student_number_2.wav
    6. Revert to the original recording.
    7. Try the Change Tempo effect.
    8. Listen to it, and save the file as student_number_3.wav
  4. Normalisation
    1. Download this folder of audio files.
    2. Open each file in Audacity and compare the waveforms.
    3. Open the kicking-mule-very-quiet audio file.
    4. Normalise the levels.
  5. Show and tell!
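
To make the pitch, duration and normalisation steps above more concrete, here is a minimal sketch in plain Python. It is not how Audacity implements its effects; it only illustrates two of the ideas. Resampling, as in Change Speed, alters duration and pitch together, whereas Change Pitch and Change Tempo each aim to alter one while preserving the other. Peak normalisation applies a single gain so that the loudest sample reaches a chosen level. The function names `change_speed` and `peak_normalise`, the `target_peak` parameter and the synthetic quiet tone are assumptions made for illustration.

```python
import math


def change_speed(samples, factor):
    """Resample by linear interpolation.

    factor > 1 gives a shorter signal that also sounds higher-pitched,
    because the same waveform is played back in less time.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / factor))
    out = []
    for i in range(out_len):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


def peak_normalise(samples, target_peak=0.9):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)                  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# A very quiet one-second 220 Hz tone stands in for a quiet recording.
sample_rate = 8000
tone = [0.05 * math.sin(2 * math.pi * 220 * n / sample_rate)
        for n in range(sample_rate)]

faster = change_speed(tone, 2.0)              # half as long, an octave higher
louder = peak_normalise(tone, target_peak=0.9)

print(len(tone), len(faster))
print(round(max(abs(s) for s in louder), 3))
```

Normalisation here changes only the overall level, so the shape of the waveform you see in Audacity should stay the same while its height changes.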

Waveforms, spectrograms and MFCCs

  1. Try this Colab to generate representations of audio features.

  2. Read more in Jurafsky and Martin’s Speech and Language Processing, particularly chapters 25 and 26.
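
As a rough illustration of how a spectrogram is derived from a waveform, the sketch below splits a signal into overlapping frames, applies a Hann window to each, and takes the magnitude of a discrete Fourier transform. It is not taken from the Colab; it uses a deliberately naive DFT from the standard library so that each step is visible, where a real pipeline would use an FFT library. The frame length, hop size and the 1000 Hz test tone are assumptions chosen for illustration. MFCCs would add further steps on top of this (a mel filterbank, a logarithm and a discrete cosine transform), which are not shown.

```python
import cmath
import math


def hann(n_points):
    """Hann window, tapering each frame to zero at its edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but easy to read)."""
    n_points = len(frame)
    mags = []
    for k in range(n_points // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n, x in enumerate(frame))
        mags.append(abs(total))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append(magnitude_spectrum([s * w for s, w in zip(chunk, window)]))
    return frames


# Test signal: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]

spec = spectrogram(signal)
bin_hz = sample_rate / 64                     # frequency spacing between bins
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
print(len(spec), "frames,", len(spec[0]), "bins per frame")
print("strongest frequency in first frame:", peak_bin * bin_hz, "Hz")
```

Plotting `spec` with time on one axis and frequency bin on the other, with magnitude as colour, gives the familiar spectrogram image.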

Speech Synthesis systems

  1. Concatenative TTS
    1. Open and copy the Concatenative demo Colab
    2. Run the first code cells to download the Python module.
    3. Type a sentence into the message variable.
    4. Run that cell to generate an audio file of the text.
    5. Preview the audio.
    6. Comment on the quality of the synthesised speech.
  2. DeepVoice3
    1. Open and copy the DeepVoice demo Colab
    2. Run the code cells to set up and install the program.
    3. Change the sentence.
    4. Generate speech.
    5. Describe the plots.
    6. More info: Medium article, GitHub, Baidu
  3. Tacotron2
    1. Open and copy the Tacotron2 demo Colab
    2. Run the code cells to set up and install the program.
    3. Change the sentence.
    4. Generate the speech.
    5. More info: GoogleAI blog
  4. Voice cloning
    1. Open and copy the RealTimeVoiceCloning demo Colab
    2. Run the code cells to set up and install the program.
    3. Record yourself using the Colab (you may need to authorise the browser to record). Read from the Harvard Sentences, or from the SUS or NIT sentences.
    4. Write a sentence and synthesise it with your voice-cloned system.
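
For comparison with the neural systems, the core idea behind the concatenative synthesiser in step 1 can be sketched in a few lines: look up a stored audio unit for each word and join the units end to end, smoothing each join with a short crossfade. This is a toy under stated assumptions, not the method used by the Python module in the Colab. The unit inventory `UNITS` is hypothetical, with sine tones standing in for recorded words, and the names `crossfade_join` and `synthesise` and the `overlap` parameter are invented for illustration.

```python
import math

SAMPLE_RATE = 8000


def tone(freq_hz, duration_s=0.2):
    """Placeholder 'recording': a short sine tone instead of real speech."""
    n = round(SAMPLE_RATE * duration_s)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


# Hypothetical unit inventory: one stored waveform per word.
UNITS = {"hello": tone(220), "speech": tone(330), "world": tone(440)}


def crossfade_join(left, right, overlap):
    """Join two waveforms, linearly blending the last/first `overlap` samples."""
    overlap = min(overlap, len(left), len(right))
    if overlap == 0:
        return left + right
    head = left[:len(left) - overlap]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)           # weight ramps from left to right
        blended.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return head + blended + right[overlap:]


def synthesise(text, overlap=80):
    """Concatenate the stored unit for each word in `text`."""
    words = text.lower().split()
    missing = [w for w in words if w not in UNITS]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    audio = []
    for word in words:
        unit = UNITS[word]
        audio = crossfade_join(audio, unit, overlap) if audio else list(unit)
    return audio


audio = synthesise("hello speech world")
print(len(audio) / SAMPLE_RATE, "seconds of audio")
```

The `KeyError` for unknown words reflects a general limitation of this approach: a concatenative system can only produce what its unit inventory covers, which is one reason its output tends to sound less natural than DeepVoice3 or Tacotron2.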