Google’s Genie AI Crafts Games from Single Images

Google has announced Genie, a generative AI model that can create playable game environments, as part of its ongoing investment in artificial intelligence.

Genie, a generative AI model developed by Google's AI lab DeepMind, was shown in a live demo. Genie learns game mechanics from hundreds of thousands of hours of gameplay video and can generate playable games from minimal prompts.

Unveiling Genie

As stated in Google’s official DeepMind blog post, Genie is a foundation world model trained on online videos. The model can produce “an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”

Genie, short for Generative Interactive Environments, was developed in partnership between Google and the University of British Columbia. From just one image, it can generate side-scrolling 2D platformers reminiscent of Contra and Super Mario Bros. based on user prompts.

Google DeepMind stated during the announcement that Genie introduces a “new paradigm” for generative artificial intelligence (AI). The company also acknowledged the broader emergence of generative AI models capable of producing novel and creative content as text, images, and even video.

According to Google, Genie was trained on 200,000 hours of unlabelled public internet gaming videos, a significant portion of which show 2D platformers rather than full 3D virtual worlds.

Genie’s Specifications

In terms of size, Genie has 11 billion parameters. The model comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model. These components let Genie act in generated environments frame by frame, despite being trained without labels or other domain-specific requirements.

Additionally, Genie can be instructed to generate a diverse set of interactive, controllable environments, despite being trained on video-only data. Unlike many generative AI models that produce creative content as text, images, or video, Genie can create playable environments from a single image prompt.

Google DeepMind researcher Tim Rocktäschel said on X (formerly Twitter) that the team focused on scale rather than adding inductive biases.

He added that the team used a dataset of over 200,000 hours of 2D-platformer videos to train an 11-billion-parameter world model. In an unsupervised way, Genie learns diverse latent actions that consistently control characters.

Photo Credit: Google

Genie’s Capabilities

According to Google researchers, Genie is driven by three models: a dynamics model that predicts what will happen in the next frame, a video tokenizer that turns raw video frames into discrete tokens, and a latent action model that infers the actions between video frames.
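To make the division of labour between these three models concrete, here is a minimal toy sketch of the pipeline in Python. All class names, the hashing "encoder", and the arithmetic stand-ins for learned networks are illustrative assumptions, not DeepMind's actual implementation; the sketch only mirrors the data flow described above.

```python
class VideoTokenizer:
    """Turns raw video frames into discrete tokens (here: hashed integers)."""
    def __init__(self, vocab_size=1024):
        self.vocab_size = vocab_size

    def encode(self, frame):
        # Stand-in for a learned spatiotemporal encoder.
        return hash(tuple(frame)) % self.vocab_size


class LatentActionModel:
    """Infers a discrete latent action between two consecutive frames."""
    def __init__(self, num_actions=8):
        self.num_actions = num_actions

    def infer(self, token_t, token_t1):
        # Stand-in: map the token transition onto a small action vocabulary.
        return (token_t1 - token_t) % self.num_actions


class DynamicsModel:
    """Autoregressively predicts the next frame token from history + action."""
    def __init__(self, vocab_size=1024):
        self.vocab_size = vocab_size

    def predict(self, token_history, action):
        # Stand-in for one autoregressive prediction step.
        return (token_history[-1] + action + 1) % self.vocab_size


tokenizer = VideoTokenizer()
action_model = LatentActionModel()
dynamics = DynamicsModel()

# Two toy "frames" (flattened pixel lists).
frame_a, frame_b = [0, 1, 2], [0, 1, 3]
t_a, t_b = tokenizer.encode(frame_a), tokenizer.encode(frame_b)

# Unsupervised step: infer which latent action links frame A to frame B.
latent_action = action_model.infer(t_a, t_b)

# Interactive step: roll the world forward one frame given a player action.
next_token = dynamics.predict([t_a, t_b], action=latent_action)
print(latent_action, next_token)
```

The key point the sketch illustrates is that no action labels are needed: the latent action model invents its own action vocabulary from frame transitions, and at play time a user's input is fed into the dynamics model in place of an inferred action.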

One of Genie’s unique features is the foundation model’s ability to identify a game’s main character without being trained on action or text annotations. Thanks to the models that drive it, the user can effortlessly control the character in an AI-generated virtual environment.

Rocktäschel also said that Genie could turn other media into games. According to the accompanying Google DeepMind research paper, Genie can be prompted to create a variety of action-controllable virtual worlds from diverse inputs.

Furthermore, Rocktäschel said the model can convert any image into a playable 2D world. According to him, Genie can bring human-designed creations such as sketches to life; as an example, he cited artwork from Seneca and Caspian, whom he called two of the youngest-ever world creators.

Image credits: Shutterstock, CC images, Midjourney, Unsplash.