Voyages 6 Indigenous AI

In this activity we will build the Hua Ki’i prototype AI image recognition app and consider what is required to adapt it for another language community.

Hua Ki’i was developed in 2019 during a series of Indigenous Protocols and Artificial Intelligence workshops, by a team of Indigenous engineers, scholars and language activists. The app was built with the goal of creating an Indigenous language revitalisation tool: it runs an object recognition system over a photo and translates the detection results into Hawaiian. The app is easily remixed for another language, and is a great platform for exposing bias in AI.

For more information about the background to Hua Ki’i and an introduction to Indigenous Protocols, read the Indigenous AI position paper: https://www.indigenous-ai.net/position-paper/

The activity is in two parts: first build the app, then train an object detection system to see first-hand how limited current technologies are at recognising objects that are culturally specific to many of the world’s communities.

Hua Ki’i

What are some things to consider when building this with an Indigenous community in Australia?

YOLOv5

The prototype app can only recognise particular objects from a fixed set of common categories. This is one way bias shows up in AI: the categories and objects an off-the-shelf model recognises are not always culturally relevant to a community that wants to build its own tool.

What is required to develop a localised object detection system? Many of the considerations here also apply to speech/sign recognition, translation and language generation.

Datasets

Publicly available datasets:

COCO

CIFAR
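The cultural-relevance gap in these public datasets is easy to demonstrate. COCO’s 80 object categories cover everyday Western items and a handful of animals, so many culturally specific objects are simply absent. A minimal sketch (using only a hand-copied subset of the COCO category list; the helper name is ours):

```python
# A subset of the 80 object categories in the COCO dataset.
COCO_CATEGORIES = {
    "person", "bicycle", "car", "dog", "cat", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "umbrella", "surfboard",
    "bottle", "cup", "banana", "chair", "laptop", "book",
}

def is_recognisable(label: str) -> bool:
    """Check whether a COCO-trained detector could ever output this label."""
    return label.lower() in COCO_CATEGORIES

print(is_recognisable("zebra"))  # True: COCO includes zebra
print(is_recognisable("koala"))  # False: no Australian animals in COCO's 80 classes
```

A detector trained on COCO will never say “koala”, no matter how good the photo is — the label simply isn’t in its vocabulary.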

Data quality control

Clean, high-quality data is critical. Tools such as FiftyOne exist for finding annotation mistakes, verifying models and inspecting subsets of data, e.g. https://voxel51.com/docs/fiftyone/
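The simplest quality checks don’t even need a dedicated tool. A stdlib sketch of two common problems in a YOLO-format dataset — images with no label file, and boxes with out-of-range coordinates (the function name and directory layout are ours, assuming the usual parallel images/ and labels/ folders):

```python
from pathlib import Path

def check_yolo_labels(images_dir: str, labels_dir: str) -> list[str]:
    """Flag common annotation problems in a YOLO-format dataset:
    images with no label file, and boxes with coordinates outside 0-1."""
    problems = []
    for img in sorted(Path(images_dir).glob("*.jpg")):
        label = Path(labels_dir) / (img.stem + ".txt")
        if not label.exists():
            problems.append(f"{img.name}: missing label file")
            continue
        for i, line in enumerate(label.read_text().splitlines(), start=1):
            parts = line.split()
            # Each line: class_id x_center y_center width height (normalised 0-1)
            if len(parts) != 5:
                problems.append(f"{label.name}:{i}: malformed line")
            elif not all(0.0 <= float(v) <= 1.0 for v in parts[1:]):
                problems.append(f"{label.name}:{i}: coordinates out of range")
    return problems
```

FiftyOne does all of this and much more (duplicate detection, model-assisted mistake finding), but a quick script like this catches the errors that break training outright.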

Make your own dataset

Datasets can be created from scratch by photographing or scanning culturally relevant objects. Another approach is to bulk-download images from online services such as Google or Flickr.
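The bulk-download step can be as simple as fetching a list of image URLs. A minimal stdlib sketch (the function name, URL list and file-naming scheme are placeholders of ours):

```python
import urllib.request
from pathlib import Path

def download_images(urls: list[str], out_dir: str) -> list[Path]:
    """Download each URL into out_dir as image_000.jpg, image_001.jpg, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        dest = out / f"image_{i:03d}.jpg"
        with urllib.request.urlopen(url) as resp:  # fetch the image bytes
            dest.write_bytes(resp.read())
        saved.append(dest)
    return saved
```

Keep in mind that images scraped this way carry copyright and consent questions — particularly important when the subject matter belongs to a community.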

Image prep

Object detection training typically expects consistent image sizes across the training data, so bulk resizing with a tool such as ImageMagick is usually required.
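YOLOv5 itself “letterboxes” each image into a square input (640×640 by default): scale so the longer side fits, then pad the rest. The arithmetic can be sketched as follows (the helper name is ours):

```python
def letterbox_dims(w: int, h: int, target: int = 640) -> tuple[int, int, int, int]:
    """Compute the resized width/height and the horizontal/vertical padding
    needed to fit a w x h image into a target x target square while
    preserving aspect ratio (the idea behind YOLO-style letterboxing)."""
    scale = min(target / w, target / h)   # fit the longer side exactly
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = target - new_w, target - new_h  # leftover space to pad
    return new_w, new_h, pad_x, pad_y

print(letterbox_dims(1280, 720))  # (640, 360, 0, 280): width fills, height is padded
```

For a one-off bulk resize on disk, ImageMagick’s `mogrify -resize 640x640 *.jpg` does the equivalent aspect-preserving shrink in place.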

Image Labelling

Before training, every image in the dataset needs to be labelled (annotated) with rectangle or polygon shapes around the objects of interest.
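For YOLO-family models, each rectangle annotation ends up as one line in a .txt file: class_id, then x-centre, y-centre, width and height, all normalised to 0–1 by the image size. A sketch of the conversion from pixel coordinates (the function name is ours):

```python
def to_yolo_line(class_id: int, x_min: float, y_min: float,
                 x_max: float, y_max: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space bounding box into a YOLO-format label line:
    'class x_center y_center width height', normalised to the image size."""
    x_c = (x_min + x_max) / 2 / img_w   # box centre, as a fraction of width
    y_c = (y_min + y_max) / 2 / img_h   # box centre, as a fraction of height
    w = (x_max - x_min) / img_w         # box width, normalised
    h = (y_max - y_min) / img_h         # box height, normalised
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

print(to_yolo_line(0, 100, 50, 300, 250, 640, 480))
```

Annotation tools export this format for you, but knowing what the numbers mean makes label errors much easier to spot.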

Read about different types of image annotation.

There are many tools that can be used to label images, including Roboflow, LabelImg and CVAT.

In the prac we won’t have time to label a large dataset, but let’s try downloading some images from Google and labelling them using Roboflow.

Training

Use the Colab to train YOLOv5 on the raccoon data.
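Under the hood, the Colab points YOLOv5’s train.py at a small dataset config. For the single-class raccoon data it looks roughly like this (the paths are placeholders for wherever your images land):

```yaml
# dataset config passed to YOLOv5's train.py via --data
train: ../datasets/raccoon/images/train  # placeholder path
val: ../datasets/raccoon/images/val      # placeholder path

nc: 1                # number of classes
names: ["raccoon"]   # class names, indexed by class id
```

Training is then roughly `python train.py --img 640 --batch 16 --epochs 50 --data raccoon.yaml --weights yolov5s.pt`, which fine-tunes from the pretrained yolov5s checkpoint rather than starting from scratch.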

If you are super keen, try training a koala recogniser! What would be involved? How long do you think it would take?

Other demos