Scientist Integrates AI with Over 500 African Native Languages

A scientist is working to bring AI to more than 500 native African languages, with the aim of making the technology available to non-English speakers and linguistic minorities.

The scientist, Ife Adebara, is a scholar and programmer in the University of British Columbia’s linguistics department. Her project, known as Afrocentric Natural Language Processing, seeks to make AI tools inclusive of “everyone,” including those who do not speak English.

Bridging language barriers

Ife Adebara’s idea came from the realization that AI technology has moved fast, especially in the past year, with OpenAI’s ChatGPT spurring developments in generative AI. She noticed, however, that despite this growth, many Africans were being excluded by language barriers.

Incorporating over 500 native African languages would, according to the scientist, ensure the technology is available to everyone and “no one is left behind.”

“Someone who speaks a minority language has to put their language aside in order to be able to get technology in English, for example,” said Adebara in an interview with CBC.

“Over time, their usage of the language begins to drop, and that can have long-term consequences of language endangerment. We need to mitigate that.”

The challenge of exclusion is not unique to Africa; it is prevalent across the globe. Many speakers worldwide are left behind if they do not use one of the globally dominant languages, such as English, French, Spanish, German, Chinese, or Russian.

The African languages

Dubbed Afrocentric Natural Language Processing, the project is expected to make AI tools available in some of Africa’s most widely spoken native languages, such as Zulu and Swahili.

According to CBC, Adebara’s team has so far released two language identification programs known as Serengeti and AfroLID.

“There are 2,000+ languages in Africa. So right now I’ve been working on about 517 of them, which are spoken in 50 out of 54 countries in Africa,” said Adebara.

There are about 7,000 known languages worldwide, but the majority of online content is in English.

According to Adebara, integrating AI with these languages will let Africans interact with the technology in the indigenous languages they are comfortable with.

Diverse grammar

According to Adebara, many indigenous African languages have “grammatical features that are diverse” and unique to the region.

“If we build language technology and exclude African languages, the models and technologies are not learning certain features,” she said.

“Which is also not good because they’re not versatile across different grammatical features that exist in human language,” she added.

Africa is home to over a billion people, about 17% of the global population, and Adebara thinks excluding the region from AI would be unfair. She sees this as grounds for bringing the technology to African languages.

This, she said, will ensure everyone participates in the global conversation using their “indigenous languages.”

As such, the Afrocentric Natural Language Processing project is already looking at adding African languages to its existing list to increase the number of people who can access the technology.

“They can have access to information on the web in their language, either translated to their language or translated from another language,” said Adebara.
