Tech Rivals Race to Launch Multimodal AI Wearables – Report

Major tech companies like Microsoft, Google, and OpenAI are racing to integrate multimodal AI into smart glasses and other wearable devices with front-facing cameras.

Multimodal AI is a powerful form of the technology that combines multiple data sources to go beyond simple generated text replies. It can understand text, images, audio, video, speech, and even hand gestures.

As reported by The Information, big tech companies are betting that multimodal systems are a good fit for smart glasses with built-in front-facing cameras, as well as other wearable technology.

Also read: Meta’s Ray-Ban Glasses Now Have AI Capabilities for Sound and Sight

New battle for AI dominance

The vision is shaping up to become a key area of development and AI rivalry for Big Tech in 2024. Many of the companies have talked about this vision or worked on it for several years, the report said.

Now, they are confident they can sell smart glasses powered by AI. For example, OpenAI discussed “embedding” its object recognition software, GPT-4 with Vision, into Snapchat’s Spectacles wearables.

The deal with Snap, the parent company of Snapchat, could result in new features for the smart glasses, The Information wrote. Snap has struggled to turn the device into a mass-market product.

In February, Snap hinted at how it plans to integrate generative AI into its photo-and-video recording glasses, Spectacles. CEO Evan Spiegel said AI could be used to “improve the resolution and clarity of a Snap after the user captures it,” according to industry media.

It could even be used for “more extreme transformations,” like editing images or creating Snaps based on text input, he added.

OpenAI and Microsoft are already working with AI startup Humane, which recently launched a device called the Ai Pin that uses a laser projection system to display text and images on a user’s hand.

The gadget is designed to be worn on clothing and can be tapped to talk to a virtual assistant powered by OpenAI’s GPT-4 technology and cloud computing power from Microsoft.

Also read: Meta’s AI-Powered Ray-Ban Glasses Cause Stir on Social Media

Meta leads the industry push

The tech industry push comes as Meta last week revealed the latest version of its Ray-Ban smart glasses, which use AI to “see, hear, and identify things via a built-in camera and microphone.”

When activated, the Ray-Bans can respond to a voice command like, “Is this tea caffeine-free?” by taking a picture, analyzing it, and then providing a response, said Meta CEO Mark Zuckerberg.

But a test by CNET showed that the Ray-Bans hallucinate: the glasses described objects that weren’t actually present, a common problem with generative AI.

As for Google, in 2013, the company started selling a prototype of its earliest smart glasses, known simply as Glass, for $1,500. The glasses did not catch on, and were criticized as a threat to privacy.

Eventually, Google stopped producing Glass. The company is now adding multimodal artificial intelligence to its ChatGPT rival, Gemini, and is also expected to incorporate the technology into its wearables.

The integration of multimodal AI into wearables like augmented reality smart glasses typically aims to enhance their functionality and offer users a more immersive experience.

It can also serve many practical applications, including translating languages, providing remote support for engineers, and sharing real-time data with soldiers in combat.

In 2022, the global wearables market was valued at about $61 billion, according to estimates. The sector is expected to grow by 15% every year until 2030—faster than the smartphone market.

Image credits: Shutterstock, CC images, Midjourney, Unsplash.