Stability AI’s New Image Generator Stable Diffusion XL 1.0 Rivals Midjourney

Stability AI has released Stable Diffusion XL (SDXL) 1.0, an open-source generative AI model that can convert text prompts into images. The model competes directly with the likes of Midjourney and Adobe Firefly.

The London-based company claimed the new AI image generator is its “most advanced” tool yet. SDXL 1.0 improves image quality, it says, and can deliver “more vibrant and accurate” colors, with better contrast, lighting and shadows than previous Stable Diffusion models.

Stable Diffusion XL 1.0: What can it do?

According to Stability AI CEO Emad Mostaque, the new Stable Diffusion model implemented several upgrades to improve its functionality. The company has been testing the capabilities of SDXL 1.0 since June, releasing a research-only version that helped to prove its power.

With short text prompts, the AI can generate high-quality images in any art style, including photorealism. Stable Diffusion XL 1.0 can also interpret simpler language more accurately and can create full one-megapixel resolution pictures in seconds across multiple aspect ratios.
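To illustrate what "one megapixel across multiple aspect ratios" can mean in practice, here is a minimal sketch that picks a width and height for a given aspect ratio while keeping the pixel count near one megapixel. The 1024 x 1024 pixel budget, the rounding to multiples of 64, and the function name are assumptions made for illustration; the article does not state how SDXL 1.0 chooses its output sizes.

```python
import math

# Assumption: "one megapixel" is taken as 1024 x 1024 pixels.
TARGET_PIXELS = 1024 * 1024
# Assumption: dimensions are snapped to multiples of 64; the article does not specify this.
MULTIPLE = 64


def dimensions_for_ratio(ratio_w, ratio_h, target_pixels=TARGET_PIXELS, multiple=MULTIPLE):
    """Return (width, height) close to target_pixels for the aspect ratio ratio_w:ratio_h."""
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    aspect = ratio_w / ratio_h
    # Solve width * height = target_pixels with width = aspect * height.
    height = math.sqrt(target_pixels / aspect)
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


for ratio in [(1, 1), (4, 3), (16, 9)]:
    w, h = dimensions_for_ratio(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {w} x {h} ({w * h / 1e6:.2f} MP)")
```

Each ratio lands within a few percent of the one-megapixel budget, so a wide image gives up height rather than adding pixels.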

“The latest SDXL model represents the next step in Stability AI’s innovation heritage and ability to bring the most cutting-edge open access models to market for the AI community,” Mostaque wrote in a blog post.

An image generated using SDXL 1.0. Image Credits: Stability AI

The chief executive revealed that SDXL 1.0 now allows users to create custom images with less work, thanks to a new fine-tuning feature. With just five images, people can fine-tune the model to generate images of specific people, products, and more.

However, the feature is currently in limited testing with some early access users, and will be released to the public in the coming weeks, he added. Per industry reports, SDXL 1.0 also enables people to reconstruct missing parts of an image and extend existing pictures.

Training SDXL 1.0

SDXL 1.0 was created using a new and efficient training method that takes advantage of a 3.5 billion-parameter base model, considered the largest parameter count of any open access image model. Stability AI intends to use this “solid foundation” to build more tools.

“Base models are really interesting, they’re like a Minecraft release where a whole modding community appears, and you’ve seen that richness within the Stable Diffusion community,” said Emad Mostaque, as quoted by VentureBeat.

Mostaque explained that the larger parameter count of the base SDXL 1.0 model results in more accurate image generation.

“You’re teaching the model various things and you’re teaching it more in-depth. Parameter count actually matters – the more concepts that it knows, and the deeper it knows them,” he added.

The model’s training set also includes artwork from artists who’ve previously complained about AI companies using their work as training data for generative AI models. Stability AI asserts it is protected from legal liability under the fair use doctrine, TechCrunch reports.

In its blog post, Stability AI said that since it launched SDXL 1.0 in beta in April, ClipDrop users have generated more than 35 million images using the model. The company’s Discord community generated an average of 20,000 images per day during the same period.

SDXL 1.0 is available immediately as an open-source release on various platforms, such as Bedrock, Amazon’s cloud platform for hosting generative AI models. It is also on GitHub, alongside Stability’s API and its consumer apps, ClipDrop and DreamStudio.

AI image generators’ rivalry

Stable Diffusion XL 1.0 is built to compete with popular AI image generator Midjourney and others, including Adobe’s Firefly and OpenAI’s DALL-E 2. While Stability AI’s new tool is free to use, one will need to pay at least $10 per month to get the best of Midjourney.

That gets you 3.3 hours of GPU time, good for roughly 200 image generations on Midjourney. Adobe Firefly is also a paid service, costing $19.99 each month.
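As a rough back-of-the-envelope check on those Midjourney figures, the sketch below divides the $10 plan price and the 3.3 GPU hours by the roughly 200 generations quoted above. The function name and the assumption that the allowance is spread evenly across all 200 images are illustrative only; this is not Midjourney's actual billing logic.

```python
def per_image_estimates(monthly_price_usd, gpu_hours, images):
    """Return (cost per image in USD, GPU seconds per image).

    Assumes the whole monthly allowance is used and split evenly
    across the given number of images.
    """
    if images <= 0:
        raise ValueError("images must be positive")
    cost_per_image = monthly_price_usd / images
    gpu_seconds_per_image = gpu_hours * 3600 / images
    return cost_per_image, gpu_seconds_per_image


# Figures quoted in the article: $10 per month, 3.3 GPU hours, roughly 200 images.
cost, seconds = per_image_estimates(10.0, 3.3, 200)
print(f"about ${cost:.2f} and {seconds:.0f} GPU seconds per image")
```

Under these assumptions, the entry-level plan works out to about five cents and roughly a minute of GPU time per image.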

However, Midjourney uses a combination of large language and diffusion models to deliver very high-quality images compared to SDXL 1.0. Both image generators face similar pitfalls: they can be used by bad actors to create harmful content, like nonconsensual deepfakes.

For example, AI image generators have been used to create images of Pope Francis dressed in a puffer jacket and of Trump supposedly getting arrested, days before the actual event.

“We are constantly improving the safety functionality of Stable Diffusion and are serious about continuing to iterate on these measures,” said Joe Penna, Stability AI’s head of applied machine learning.

“Moreover, we are committed to respecting artists’ requests to be removed from training data sets.”

Image credits: Shutterstock, CC images, Midjourney, Unsplash.