How Google Is Joining the Text-to-Image AI Race

January 25, 2023

Think up your pictures.

Text-to-image and text-to-video AI systems are two of the AI technologies that have been gaining traction, and Google has joined the race to provide them. These systems are trained on large datasets of images, videos and their associated text descriptions.

They can be used for a variety of applications, such as creating photorealistic images from written descriptions, generating images for product listings, or creating illustrations for books and other documents.

At the forefront of development has been OpenAI’s DALL-E text-to-image AI system, which has dominated the field for a while. However, tech giants like Microsoft, Meta, and Google have joined the race to bring similar AI tools to market. Midjourney has also been garnering a lot of attention.

According to Joseph Foley, Google announced and showcased a glimpse of its text-to-image AI system, Imagen, in May 2022 and officially released parts of the working system to the public in November of the same year. One of the key features of Imagen is its ability to generate photorealistic images.

This means that the images generated by the system are so realistic, they could be mistaken for actual photographs.

However, that is not always the case with AI systems that generate images from text: some of the images come out blurry, and the AI often produces an image that does not match what the user prompted.

DALL-E is one system that has been found to misunderstand text prompts and produce different results than intended.

Google beats its chest

Google Research claims that its text-to-image AI is the best on the market, beating other systems like VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2 in terms of both sample quality and image-text alignment.

According to Joseph Foley’s article, Google has also released sample images that suggest it is a high-end AI tool that could take on the competition. However, the images have raised concerns over copyright abuse and the security of artists’ jobs.

Although Google’s Imagen AI seems to be claiming its place within the AI space, Google has not yet released the entire system to the public. It has only given a glimpse, with limited functionality, through its AI Test Kitchen app, which it uses for beta testing of AI systems under development.

AI systems reinforcing stereotypes?

Whilst these AI systems have proved that they can be used to create beautiful artwork, concerns have been raised over the social biases and stereotypes they reproduce when generating images. According to James Vincent, researchers have also found that OpenAI’s DALL-E can create images that reflect social biases around gender, race and sexuality.

Both Google and OpenAI have decided not to give the public full access to their text-to-image AI systems, with Google saying its system is not yet ready for full public use while it works on a way to address social and cultural bias for a future release.

Google also noted that the system has limitations, including “several ethical challenges facing text-to-image research broadly.”

The company admitted that this could impact “society in complex ways,” and that there is a risk of such models being misused.