GPT-4 is coming as early as this week, allowing users to create video, music, and images from text, according to Microsoft Germany chief technology officer Andreas Braun. GPT is the tech that underpins the popular AI chatbot ChatGPT.
Microsoft’s revamped Bing search engine currently runs on a more “powerful” large language model from OpenAI than the one that powers ChatGPT. It is customized specifically for search and combines improvements from ChatGPT and GPT-3.5 to be “faster and accurate.”
ChatGPT operates specifically on GPT-3 and GPT-3.5. GPT is short for Generative Pre-trained Transformer, a language model that relies on deep learning to generate human-like responses to given user prompts.
ChatGPT-4 “will be multimodal”
“We will introduce GPT-4 next week, where we will have multimodal models that will offer completely different possibilities – for example, videos,” said Braun, as reported by German publication Heise on March 9.
The CTO made the comments during an event called ‘AI in Focus – Digital Kickoff’. ChatGPT is already capable of performing a variety of tasks within seconds. It can write code, handle complex instructions, produce longer-form content, and compose rhyming poems and songs.
With multimodal AI capabilities, GPT-4 will let users translate text into not only images, but also music and video. GPT-4 will reportedly generate answers much faster than the existing GPT-3.5, with responses that sound as though a human wrote them.
Advanced AI to take ‘visual IQ test’
Stanford artificial intelligence scientist Jim Fan said he expects that a multimodal GPT-4 might be capable of doing far more tasks than those shared by Microsoft, including performing a “visual IQ test: yes, the ones that humans take!”
“Once again, it doesn’t matter if GPT-4 is released next week or not. What matters is that you need to brace yourself or your organization for multimodal LLM APIs. This is an unstoppable force and will inevitably come – very likely before the end of 2023,” he wrote on Twitter.
*If* GPT-4 is multimodal, we can predict with reasonable confidence what GPT-4 *might* be capable of, given Microsoft’s prior work Kosmos-1:
– Visual IQ test: yes, the ones that humans take!
– OCR-free reading comprehension: input a screenshot, scanned document, street sign, or… pic.twitter.com/q5uWMKGUMK
— Jim Fan (@DrJimFan) March 10, 2023
Microsoft has invested several billion dollars in OpenAI, the creator of ChatGPT. The U.S. technology company integrated the conversational AI tool into its Bing search engine in early February. Since then, Bing has crossed 100 million daily active users for the first time.
According to Microsoft, Bing uses OpenAI’s Prometheus model to “improve search relevancy, annotate searches, provide up to date results, improve understanding of geolocation, and increase safety of answers.”
But users who tested the AI search report that Bing can also produce nonsensical responses, referred to as “hallucinations.” From declaring undying love to a New York Times columnist to swearing at people and issuing veiled threats, Bing has had a very busy last few weeks.
GPT-4 features coming to Bing – just not now
Andreas Braun explained at the Germany AI event that large language models (LLM) are a “game changer” because of their ability to teach machines to understand natural language. In turn, machines can understand content that previously only humans could read and interpret.
He said artificial intelligence technology has come so far that it basically “works in all languages”: one could query an AI chatbot in German and receive a response in Italian. With multimodality, Microsoft and OpenAI will “make the models comprehensive.”
However, Braun did not specifically mention Microsoft’s AI-powered Bing Chat or when the new GPT-4 features would be added to the search engine. It stands to reason, though, that it will not be long before the changes also arrive in Bing Chat.