
Voyages 2 Language Data

Data preparation is a critical step in developing and using language technologies. This practical session demonstrates techniques for converting data between formats and for extracting the parts of a dataset that a particular technology requires. The tasks also cover sorting and cleaning ELAN transcriptions.

The audio Colab notebook demonstrates how to use FFmpeg to convert audio and video files.
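As a rough illustration of this kind of conversion, the Python sketch below calls the ffmpeg command-line tool (which must be installed and on PATH) to extract the audio track from a file and save it as a mono WAV. The 16 kHz mono target, the `*.mp4` pattern and the helper names `build_ffmpeg_command`, `convert_to_wav` and `convert_folder` are assumptions for illustration, not the settings used in the Colab.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg argument list that converts src to a WAV file at dst.

    The 16 kHz mono defaults are an assumed target, common for speech tools.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite dst without asking if it exists
        "-i", str(src),            # input audio or video file
        "-vn",                     # drop any video stream, keep audio only
        "-ac", str(channels),      # number of output audio channels
        "-ar", str(sample_rate),   # output sample rate in Hz
        str(dst),                  # .wav extension selects the WAV container
    ]


def convert_to_wav(src, dst, sample_rate=16000, channels=1):
    """Run ffmpeg on one file; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    cmd = build_ffmpeg_command(src, dst, sample_rate, channels)
    subprocess.run(cmd, check=True, capture_output=True)


def convert_folder(in_dir, out_dir, pattern="*.mp4"):
    """Convert every file in in_dir matching pattern to a WAV file in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob(pattern)):
        convert_to_wav(src, out_dir / (src.stem + ".wav"))
```

For example, `convert_folder("recordings", "wav")` would write a hypothetical `recordings/interview.mp4` to `wav/interview.wav`. Building the argument list separately from running it makes the command easy to inspect before any files are touched.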

The ELAN Colab notebook demonstrates how to parse ELAN (.eaf) files, sort tiers, and extract annotations from tiers. It also shows one method for removing punctuation from the annotations.
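A minimal sketch of these steps using only Python's standard library follows. It assumes the usual EAF XML layout (TIME_SLOT elements holding millisecond values, and TIER elements containing ALIGNABLE_ANNOTATION entries), reads only time-aligned annotations, and takes "sorting" to mean ordering tiers by name and annotations by start time; the Colab may organise tiers differently. The function names `read_eaf` and `strip_punctuation` are hypothetical. Note that `string.punctuation` covers ASCII punctuation only, and characters such as apostrophes can be part of an orthography, so the `keep` parameter lets you preserve them.

```python
import string
import xml.etree.ElementTree as ET


def read_eaf(source):
    """Parse an ELAN .eaf file (path or binary file object) into
    {tier_id: [(start_ms, end_ms, text), ...]}.

    Only time-aligned annotations (ALIGNABLE_ANNOTATION) are read, so tiers
    holding only reference annotations yield an empty list.
    """
    root = ET.parse(source).getroot()
    # Map each TIME_SLOT_ID to its millisecond value (None if the slot is unaligned).
    times = {
        slot.get("TIME_SLOT_ID"): slot.get("TIME_VALUE")
        for slot in root.iter("TIME_SLOT")
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = times.get(ann.get("TIME_SLOT_REF1"))
            end = times.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations whose times cannot be resolved
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((int(start), int(end), text))
        rows.sort()  # order annotations by start time (then end time)
        tiers[tier.get("TIER_ID")] = rows
    return dict(sorted(tiers.items()))  # order tiers alphabetically by TIER_ID


def strip_punctuation(text, keep=""):
    """Remove ASCII punctuation except characters in keep, then collapse whitespace."""
    drop = "".join(ch for ch in string.punctuation if ch not in keep)
    cleaned = text.translate(str.maketrans("", "", drop))
    return " ".join(cleaned.split())
```

For example, `tiers = read_eaf("session.eaf")` followed by `[strip_punctuation(t) for _, _, t in tiers["Words"]]` would give the cleaned text of a hypothetical "Words" tier in time order.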

For more data preparation scripts, see