Apple researchers have revealed a new AI model known as ReALM, which they claim can understand what is on a user’s screen and respond to requests accordingly.
The model, according to the researchers, also outperforms GPT-4 on various tasks despite having fewer parameters. This comes ahead of the official launch of iOS 18 at WWDC 2024 in June, where a big push behind the new Siri 2.0 is expected. It is, however, not yet clear whether Apple will integrate ReALM into Siri in time for WWDC 2024.
Comprehending what’s on screen
Apple has been playing catch-up in the AI arena, making a string of AI-related announcements. Now, researchers at the iPhone maker have made a breakthrough, releasing a new AI model, ReALM, which can “understand what’s on your screen.”
This comes barely a month after the acquisition of AI startup DarwinAI. According to the researchers, the model converts information from a user’s screen into text, which allows it to function on devices “without requiring bulky image recognition.”
The model, which the research paper says significantly outperformed GPT-4 despite having fewer parameters, considers what is on the screen as well as tasks running in the background.
For example, when a user browsing a webpage finds a business they want to call, they can simply ask Siri to “call this business.” With ReALM, Siri can “see” the contact details on the page and “initiate the call directly.”
This illustrates how the model’s understanding of screen context can enhance the user experience.
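The exact encoding ReALM uses isn’t detailed here, but the general idea of turning a screen into text can be sketched in a few lines of Python. The element types, field names, and layout below are purely illustrative assumptions, not ReALM’s actual representation:

```python
# Illustrative sketch only: ReALM's real screen encoding is not public here.
# The idea is to flatten on-screen UI elements into ordered text lines
# so a language model can "read" the screen without image recognition.
from dataclasses import dataclass

@dataclass
class ScreenElement:
    kind: str   # hypothetical label, e.g. "heading", "phone_number", "button"
    text: str   # the element's visible text
    top: int    # vertical position, used to preserve reading order
    left: int   # horizontal position

def screen_to_text(elements: list[ScreenElement]) -> str:
    """Render on-screen elements as ordered text lines for an LLM prompt."""
    ordered = sorted(elements, key=lambda e: (e.top, e.left))
    return "\n".join(f"[{e.kind}] {e.text}" for e in ordered)

page = [
    ScreenElement("heading", "Joe's Pizza", top=10, left=0),
    ScreenElement("phone_number", "(555) 010-2233", top=40, left=0),
    ScreenElement("button", "Directions", top=70, left=0),
]
print(screen_to_text(page))
# [heading] Joe's Pizza
# [phone_number] (555) 010-2233
# [button] Directions
```

A plain-text rendering like this is what lets a comparatively small model reason about the screen on-device, rather than shipping screenshots to a bulky image-recognition model.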
According to an MSPowerUser report, integrating the new model into future Siri updates would help Apple create a more seamless and “hands-free user experience.” It is also expected to give Siri more conversational abilities without deploying a large language model like Gemini.
The report further notes that the iPhone maker is also working on MM1, a model that can reduce the need for multiple prompts to get the desired results, as well as an AI image manipulator.
AI NEWS: Apple researchers just revealed a new AI model that can 'see' and understand screen context.
Plus, more developments from an open-sourced AI agent called SWE-agent, Anthropic, Apple Vision Pro, and Baidu.
Here's everything going on in AI right now:
— Rowan Cheung (@rowancheung) April 3, 2024
Outperforming the competition
According to the research paper, ReALM outperformed peers and previous models on various datasets, including synthetic, conversational, and unseen conversational datasets.
The research paper also specifically highlights how ReALM performed competitively with OpenAI’s GPT-4 on on-screen information. In that comparison, ReALM relied solely on textual encoding while GPT-4 was given access to screenshots.
Both GPT-4 and ReALM produced near-identical results when researchers evaluated their performance.
“However, ReALM outperformed GPT-4 when it came to domain-specific queries due to being fine-tuned on user requests,” according to MSPowerUser.
The researchers explained: “We especially wish to highlight the gains on onscreen datasets and find that our model with the textual encoding approach is able to perform almost as well as GPT-4, despite the latter being provided with screenshots.”
This, according to the researchers, allows ReALM to grasp the “nuances of user intent and respond accordingly.”
The other side of the model
The research highlights how ReALM uses LLMs for reference resolution. According to MSPowerUser, the model can comprehend a user’s screen as well as their requests by “converting on-screen entities into natural language text, even while remaining efficient for on-device applications.”
However, while the model encodes the position of on-screen entities, it might not capture every detail from “intricate user queries requiring complex understanding of spatial relationships.”
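To make the reference-resolution step concrete, here is a minimal sketch of how a request like “call this business” could be paired with a textual screen encoding and handed to a language model. The prompt wording and numbering scheme are assumptions for illustration, not taken from the paper:

```python
# Minimal sketch of reference resolution over a textual screen encoding.
# The prompt format and entity labels are hypothetical, not ReALM's own.

def build_reference_prompt(request: str, screen_text: str) -> str:
    """Number the on-screen entities and ask the model which one
    the user's request refers to."""
    numbered = "\n".join(
        f"{i}. {line}" for i, line in enumerate(screen_text.splitlines(), start=1)
    )
    return (
        "On-screen entities:\n"
        f"{numbered}\n\n"
        f"User request: {request}\n"
        "Which entity does the request refer to? Answer with its number."
    )

screen_text = (
    "[heading] Joe's Pizza\n"
    "[phone_number] (555) 010-2233\n"
    "[button] Directions"
)
print(build_reference_prompt("call this business", screen_text))
```

Once the model answers with the entity it believes is being referenced, the assistant can act on it, for instance by dialing the phone number shown on screen.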
According to Tom’s Guide, this isn’t Apple’s first foray into the AI space in recent months. The company has been working on a mixture of tools to improve on-device efficiency, showing its commitment to making AI central to its business.
Now, ReALM is the latest effort from the iPhone maker, focusing specifically on enhancing existing models to make them faster and more efficient.