ChatGPT’s Dependence on the ‘English Language’ Has Forced Japan to Create Its Own AI Chatbot

September 19, 2023

Japan is building its own version of ChatGPT, the AI-powered chatbot that OpenAI launched in November 2022. OpenAI’s brainchild set the trend for the industry and has been described as the fastest-growing consumer application in history.

However, the new tech sensation “falls short” in Japanese in comparison to English.

The Japanese government, along with tech titans NEC, Fujitsu, and SoftBank, is pouring billions of dollars into building AI systems known as large language models (LLMs).

“Current public LLMs, such as GPT, excel in English but often fall short in Japanese due to differences in the alphabet system, limited data, and other factors,” said Keisuke Sakaguchi, a researcher at Tohoku University in Japan who specializes in natural language processing.

English language barrier

The GPT models behind ChatGPT were trained on data that is overwhelmingly English, making English the first language in which the chatbot could hold human-sounding conversations. The concern is that AI systems trained mainly on other languages cannot capture the nuances of Japan’s language and culture.

ChatGPT can serve as a resource for learning Japanese, but doing so comes with several problems, according to a discussion on Reddit.

“It could be used as a resource, but there are a number of problems with doing so. It also helps if you understand what ChatGPT actually is and why it might give unreliable (yet convincing) answers,” reads the discussion.

ChatGPT is particularly good at producing fluent and convincing output, but that can be harmful for those who use the chatbot inappropriately, especially for learning Japanese, according to the discussion.

You can’t rely on it

Japanese sentence structure is very different from that of English, so ChatGPT must in effect translate a Japanese prompt into English, find an answer, and then translate that answer back into Japanese.

As a result, the output reads like a translation rather than an originally composed Japanese sentence, and its sentence structure suffers.

While English has only 26 letters, written Japanese uses two sets of 48 basic characters (hiragana and katakana), in addition to 2,136 commonly used Chinese characters, known as kanji.

There are also approximately 50,000 rarely used kanji.

In Japanese, ChatGPT “sometimes generates extremely rare characters that most people have never seen before, and weird, unknown words result,” says Sakaguchi.

“This can be a problem for Japanese learners who may rely on ChatGPT for accurate information and may be misled by its fluent and convincing output,” another Redditor wrote in the discussion.

The Redditor pointed to this limitation and advised using ChatGPT as a supplement to other learning resources “rather than relying on it as the sole source of information.”

“You can’t rely on it to be 100% accurate when explaining proper Japanese usage in English, for example,” another Redditor added.

Japanese LLMs better representing culture

Japan is known for its unique culture, language, and politeness. To assess the sensitivity of LLMs to Japanese culture, a team of researchers introduced Rakuda, a ranking system that evaluates the proficiency of LLMs in providing open-ended responses to Japanese-themed questions.

Sam Passaglia, one of Rakuda’s co-founders, and his colleagues asked ChatGPT to compare the fluidity and cultural appropriateness of answers to standard prompts.

Their use of the tool to rank the results was based on a preprint published in June, which showed that GPT-4 agrees with human reviewers in 87% of cases.
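
An agreement figure like the 87% above is typically the share of comparisons in which the AI judge and a human reviewer pick the same answer. The sketch below shows that calculation in Python under stated assumptions: the function name, the "A"/"B" labels, and the sample verdicts are all hypothetical, and this is not the preprint’s or Rakuda’s actual code.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of comparisons where the judge and the human agree.

    Each element is a verdict for one pair of answers, for example
    "A" if the first model's answer was preferred and "B" otherwise.
    """
    if len(judge) != len(human):
        raise ValueError("verdict lists must have the same length")
    if not judge:
        raise ValueError("at least one comparison is required")
    matches = sum(1 for j, h in zip(judge, human) if j == h)
    return matches / len(judge)


if __name__ == "__main__":
    # Made-up verdicts for four prompts, purely for illustration.
    judge_verdicts = ["A", "B", "A", "A"]
    human_verdicts = ["A", "B", "B", "A"]
    print(agreement_rate(judge_verdicts, human_verdicts))  # 0.75
```

A high agreement rate is what makes it reasonable to let a model stand in for human reviewers when ranking many answers.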

In a post dated September 17, 2023, Mert Cihan Kurel (@mece_ka) wrote: “Japan is investing heavily in creating its own AI chatbot, inspired by ChatGPT, to navigate the complexities of the Japanese language and culture. This move aims to ensure accurate and culturally sensitive AI interactions.”

Interestingly, the top-ranking open-source Japanese LLM on Rakuda holds the fourth position, while the first place, unsurprisingly, is occupied by GPT-4, which also serves as the competition’s judge.

“Certainly, Japanese LLMs are getting much better, but they are far behind GPT-4,” said Passaglia, a physicist at the University of Tokyo who studies Japanese language models. But there is no reason in principle, he says, that a Japanese LLM couldn’t equal or surpass GPT-4 in the future. “This is not technically insurmountable, but just a question of resources.”