December 16, 2022
Appen: Larrakia Nation Data Annotation
To preserve the Larrakia language (the Australian Indigenous language spoken in Darwin), linguist Dr. Mark Harvey has teamed up with the Larrakia Nation Aboriginal Corporation and Appen to improve the database of usable Larrakia text and audio data.
Appen
Appen is an Australian data company and a global leader in data for the AI lifecycle, with more than 25 years of experience in data annotation, data sourcing and model evaluation. Appen supports AI and machine learning models with quality training data and expert managed services. Key assets include search and content relevance, data collection, computer vision, NLP & speech, chatbots & conversational AI, AR/VR, audio, and linguistics.
Larrakia Nation Data Annotation
To preserve the Larrakia language (the Australian Indigenous language spoken in Darwin), linguist Dr. Mark Harvey has teamed up with the Larrakia Nation Aboriginal Corporation and Appen to improve the database of usable Larrakia text and audio data. The need to preserve the language is pressing, as the last fluent speaker died decades ago.
The project started from digitised text and audio databases comprising sentences, words, and phrases. The main challenges were that the two datasets were not linked, that key words were very hard to distinguish, and that the data contained a multitude of editing flaws.
Appen was brought in to link the two datasets, enrich the metadata, and provide acoustic measurements describing Larrakia consonants and vowels. Appen linguists first provided supplementary English transcription and granular timestamping by inserting markers at relevant sense units (single words, markers, etc.). They then isolated parts of the text and corrected them, and finally extracted vowels and consonants from the data.
The project is ongoing, and the next step will be to preserve and then teach the language. As a partner, Appen has provided a usable and sustainable database.