OpenAI is rolling out voice and image capabilities for ChatGPT, allowing the popular chatbot to have voice conversations with people and interact using images. The upgrade will bring ChatGPT closer to other widely used AI assistants like Apple’s Siri or Amazon’s Alexa.
In a blog post on Sept. 25, OpenAI said the voice feature is powered by a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech. It said the feature “opens doors to many creative and accessibility-focused applications.”
The new features will be released initially for ChatGPT Plus and Enterprise subscribers over the next two weeks, with access extending to "other groups of users, including developers," soon after. That phrasing suggests the free version of ChatGPT will also get the voice and image update.
ChatGPT narrates bedtime stories
ChatGPT has opened up novel possibilities since its launch last year. Companies have integrated the chatbot to help summarize meetings, write code, or simplify financial documents. Individual users have turned to the technology as a travel guide, an essay-writing aid, and much more.
OpenAI said ChatGPT’s new voice feature allows the chatbot to narrate a bedtime story, speak out loud text prompts to users, or even settle a dinner table debate. The technology is able to create realistic synthetic voices from just a few seconds of real speech.
We listened to a sample of audio in which ChatGPT told the story of a mother cat and her kitten, and it sounded like a real human being. Music streaming platform Spotify is already using the technology, OpenAI said, helping podcasters translate their content into various languages in their own voices.
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.
Sound on 🔊 pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023
According to OpenAI, the voices on ChatGPT were created in collaboration with professional voice actors, a measure intended to prevent abuse. "These capabilities present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud," it added.
The voice support feature will be available only on the iPhone and Android ChatGPT apps. The upgrade is expected to bring ChatGPT closer to popular AI assistants such as Apple's Siri, Google Assistant, and Amazon's Alexa, which are built into devices and can be used to set alarms and reminders and to access information online, observers say, as reported by Reuters.
Using images in prompts
In its blog post, OpenAI also announced that people can now query ChatGPT using pictures. Users can take photos of things in their surroundings and ask the AI chatbot to “troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyze a complex graph for work-related data.”
The feature reportedly works similarly to Google Lens, Google's AI-powered image recognition tool. To understand images, OpenAI said ChatGPT uses multimodal GPT-3.5 and GPT-4, models that can decipher a range of images such as photographs and screenshots.
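For developers, mixed text-and-image prompts of this kind are typically expressed as a single chat message whose content combines a text part and an image reference. Below is a minimal sketch of what such a request payload might look like, assuming the OpenAI chat-completions message format; the question and image URL are illustrative placeholders, not values from the announcement.

```python
# Sketch of a chat message pairing a text question with an image,
# following the OpenAI chat-completions message format.
# The image URL below is a placeholder for illustration only.

def build_image_prompt(question: str, image_url: str) -> dict:
    """Assemble one user message mixing text and an image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

payload = build_image_prompt(
    "Why won't my grill start?",
    "https://example.com/grill.jpg",  # placeholder image
)
print(payload["content"][0]["text"])
```

A message built this way would be sent in the `messages` list of a chat-completions request to a vision-capable model; the API would then answer the text question in light of the attached image.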
It can also read documents containing both text and images. In a demonstration video, a bike owner uploads a photo of their bike and asks a question. ChatGPT responds, and the user adds additional pictures to help the AI better understand the problem.
The person even circles the bike component they’re asking about, and ChatGPT adjusts its response accordingly. Finally, the user uploads photos of their tools, and ChatGPT tells them which one to use to lower the bike seat.