Sora, an impressive new generative video model created by OpenAI, can turn a brief text description into an intricate, high-definition video clip up to a minute long.
OpenAI, the maker of the ChatGPT chatbot and the still-image generator DALL-E, is among the many companies racing to improve this kind of instant video generation. Others include start-ups like Runway and tech giants like Google and Meta Platforms Inc., the owner of Facebook and Instagram.
The technology has the potential to completely replace less skilled digital artists while speeding up the work of seasoned moviemakers.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024
Releasing Sora
OpenAI named its new system Sora, the Japanese word for sky. The technology’s development team, including the researchers Tim Brooks and Bill Peebles, chose the name because it “evokes the idea of limitless creative potential.”
Sora is our first video generation model – it can create HD videos up to 1 min long. AGI will be able to simulate the physical world, and Sora is a key step in that direction. thrilled to have worked on this with @billpeeb at @openai for the past year https://t.co/p4kAkRR0i0 pic.twitter.com/Hipku1LFRM
— Tim Brooks (@_tim_brooks) February 15, 2024
The researchers also said the company had yet to release Sora to the public because it was still assessing the system's risks. Instead, OpenAI is sharing the technology with a select group of academics and other outside researchers who will "red team" it, a term for probing it for potential misuses.
According to Dr. Brooks, the intention is to offer a preview of what is on the horizon so that people can see the technology's capabilities and give feedback.
OpenAI Tags the Videos
OpenAI already tags videos created by the system with watermarks indicating they were generated by artificial intelligence (AI). However, the company acknowledges that the watermarks can be removed and can be difficult to spot.
According to OpenAI, it is teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems requiring real-world interaction.
Additionally, the company is granting access to several visual artists, designers, and filmmakers to gather feedback on how to make the model most helpful for creative professionals.
here is sora, our video generation model:https://t.co/CDr4DdCrh1
today we are starting red-teaming and offering access to a limited number of creators.@_tim_brooks @billpeeb @model_mechanic are really incredible; amazing work by them and the team.
remarkable moment.
— Sam Altman (@sama) February 15, 2024
The company is sharing its research progress early to start working with, and getting feedback from, people outside OpenAI, and to give the public a sense of what AI capabilities are on the horizon.
Developing Sora
However, OpenAI declined to disclose how many videos the system learned from or where they came from, stating only that the training set included both publicly available videos and videos licensed from copyright holders.
The company has been sued several times over its use of copyrighted content, and it is likely also trying to preserve an advantage over competitors, which is why it discloses little about the data used to train its technologies.
Furthermore, the model has a deep understanding of language, enabling it to interpret prompts accurately and generate compelling characters that vividly convey emotions. Sora can also create multiple shots within a single generated video while keeping characters and visual style consistent.
OpenAI shared the prompt used to generate one such video on its X handle, drawing a range of reactions from X users.
Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually.… pic.twitter.com/cjIdgYFaWq
— OpenAI (@OpenAI) February 15, 2024
The Model’s Weaknesses
According to OpenAI, the current model has weaknesses. It may struggle to simulate the physics of a complex scene accurately and may not understand specific instances of cause and effect. For example, a person might bite into a cookie, but afterward, the cookie may not have a bite mark.
The model may also confuse the spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that unfold over time, like following a specific camera trajectory.