When Stable Diffusion, the AI application that renders photorealistic images, shot to prominence a few weeks ago, a new buzzword came along with it: hypernetworks.
Already, Stable Diffusion and hypernetworks are so conjoined that it's impossible to mention one without the other in the same paragraph.
“I’ve trained stable diffusion hypernetworks on small datasets (no, not contemporary artists aside from yours truly) to teach it obscure ‘styles’ it doesn’t really understand out of the box. It works exactly as described, actually better than I myself thought it would,” Twitter user regret maximizer (@regretmaximizer) wrote in a December 20, 2022 tweet.
This epitomises the hypernetwork buzz gripping netizens of late.
In computer science, a hypernetwork is a network that generates the weights for another, “main” network. The main network behaves like any ordinary neural network, learning to map raw inputs to their desired targets, while the hypernetwork takes a set of inputs containing information about the structure of the weights and generates the weights for a given layer.
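That relationship can be sketched in a few lines of NumPy. This is a toy illustration of the general idea, not Stable Diffusion's actual implementation; all names and shapes here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(layer_info, h_weights):
    # The hypernetwork maps a small "layer descriptor" vector to a
    # full weight matrix for one layer of the main network.
    hidden = np.tanh(layer_info @ h_weights["w1"])
    flat = hidden @ h_weights["w2"]          # in_dim * out_dim values
    return flat.reshape(4, 3)                # weights for a 4->3 main layer

# Parameters of the hypernetwork itself (these are what get trained).
h_weights = {
    "w1": rng.normal(size=(2, 8)) * 0.1,
    "w2": rng.normal(size=(8, 4 * 3)) * 0.1,
}

layer_descriptor = np.array([1.0, 0.0])      # encodes which layer we want weights for
W = hypernetwork(layer_descriptor, h_weights)

# The main network simply uses the generated weights as usual.
x = rng.normal(size=(1, 4))                  # raw input to the main network
y = x @ W                                    # the main layer's output
print(W.shape, y.shape)                      # (4, 3) (1, 3)
```

The point is the division of labour: gradients flow back into `h_weights`, so training updates the weight *generator* rather than the main layer's weights directly.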
How are hypernetworks used?
In order to understand what a hypernetwork is, let's back up a little. If you have created images with Stable Diffusion, the AI tool for creating digital art and images, you have already come across one.
Training generally refers to the process by which a model learns good values for all of its weights and biases from labeled examples.
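That process can be shown with the simplest possible case: fitting one weight and one bias to labeled examples by gradient descent. This is a generic illustration of training, not anything specific to Stable Diffusion:

```python
import numpy as np

# Labeled examples: inputs x with targets y drawn from the line y = 3x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x + 1.0

w, b = 0.0, 0.0            # the weight and bias to be learned
lr = 0.05                  # learning rate

for _ in range(2000):
    pred = w * x + b
    err = pred - y
    # Gradients of the mean squared error with respect to w and b.
    w -= lr * 2.0 * np.mean(err * x)
    b -= lr * 2.0 * np.mean(err)

print(round(w, 2), round(b, 2))   # converges near 3.0 and 1.0
```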
First, an AI model must learn how to render or synthesize an image, turning a 2D or 3D representation into a photorealistic picture via software. Although the Stable Diffusion model was thoroughly tested, it has some training limitations that can be corrected with embedding and hypernetwork training methods.
To get the best results, end-users may choose to do additional training to fine-tune generation outputs and match more specific use-cases. “Embedding” training involves a collection of user-provided images, and allows the model to create visually similar images whenever the name of the embedding is used within a generation prompt.
Embeddings are based on the “textual inversion” concept developed by researchers at Tel Aviv University, in which vector representations for specific tokens used by the model’s text encoder are linked to new pseudo-words. Embeddings can reduce biases within the original model or mimic visual styles.
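The core of the textual-inversion idea can be sketched as follows: a new pseudo-word gets its own trainable vector in the text encoder's embedding table, while everything else stays frozen. This is a heavily simplified toy (the vocabulary, the pseudo-word name, and the stand-in training objective are all invented here; the real method optimizes the vector against the diffusion model's loss):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen embedding table of an imaginary text encoder: 5 known tokens, dim 4.
vocab = {"a": 0, "photo": 1, "of": 2, "cat": 3, "dog": 4}
table = rng.normal(size=(5, 4))

# Textual inversion adds one new pseudo-word with its own trainable vector;
# the rest of the table stays frozen.
vocab["<my-style>"] = 5
new_vec = np.zeros(4)

# Stand-in objective: pull the new vector toward a "target" direction that
# represents whatever the user's training images demand.
target = rng.normal(size=4)
for _ in range(200):
    grad = 2.0 * (new_vec - target)     # gradient of squared distance
    new_vec -= 0.1 * grad

table = np.vstack([table, new_vec])     # the table now knows the pseudo-word

prompt = ["a", "photo", "of", "<my-style>"]
embedded = table[[vocab[t] for t in prompt]]
print(embedded.shape)                    # (4, 4): one vector per prompt token
```

Because only one vector is learned, an embedding is tiny compared to the model itself, which is why it can be trained on a small collection of user images.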
A “hypernetwork”, on the other hand, is a pre-trained neural network that is applied at various points within a larger neural network. The term here refers to the technique created by NovelAI developer Kurumuz in 2021, originally intended for text-generation transformer models.
Trains on specific artists
Hypernetworks are used to steer results in a particular direction, allowing Stable Diffusion-based models to replicate the art styles of specific artists. A hypernetwork has the advantage of working even when the artist is not recognised by the original model; it will still process the image, finding key areas of importance such as hair and eyes, and then patch these areas in a secondary latent space.
“The Embedding layer in Stable Diffusion is responsible for encoding the inputs (for example, the text prompt and class labels) into low-dimensional vectors. These vectors help guide the diffusion model to produce images that match the user’s input,” Benny Cheung explains in his blog.
“The Hypernetwork layer is a way for the system to learn and represent its own knowledge. It allows Stable Diffusion to create images based on its previous experience.”
In Stable Diffusion, a hypernetwork is an additional set of small networks applied at certain layers while an image is being generated through the model. The hypernetwork tends to skew all results from the model towards its training data, essentially “changing” the model.
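Mechanically, popular implementations such as the AUTOMATIC1111 web UI insert the hypernetwork as pairs of small networks inside the model's cross-attention layers, transforming the keys and values derived from the text context. The sketch below illustrates that idea with toy shapes and invented names; it is not the production code:

```python
import numpy as np

rng = np.random.default_rng(0)

def small_mlp(dim):
    # One of the two tiny networks a hypernetwork adds per attention layer.
    w1 = rng.normal(size=(dim, dim * 2)) * 0.05
    w2 = rng.normal(size=(dim * 2, dim)) * 0.05
    return lambda x: np.maximum(x @ w1, 0) @ w2 + x   # residual transform

dim = 8
context = rng.normal(size=(5, dim))      # text-encoder output (5 tokens)

hyper_k, hyper_v = small_mlp(dim), small_mlp(dim)

# Cross-attention with the hypernetwork applied: keys and values are
# computed from the transformed context instead of the raw one.
q = rng.normal(size=(3, dim))            # queries from image features
k = hyper_k(context)
v = hyper_v(context)

scores = q @ k.T / np.sqrt(dim)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ v
print(out.shape)                          # (3, 8)
```

Since only the small key/value transforms are trained, the base model's weights stay untouched, yet every generation is nudged toward the hypernetwork's training data.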
This essentially means the hypernetwork is responsible for retaining a memory of images the system has previously generated. When a user gives a new input, the system can draw on that prior knowledge to create a more accurate image. Hypernetworks therefore allow the system to learn faster and improve as it goes.
The upshot is that every image containing something resembling your training data will end up looking like your training data.
“We found that training with embedding is easier than training with a hypernetwork for generating self-portraits. Our training yielded good results which we are satisfied with,” Cheung wrote.
Which finetuning technique are you using? Something like Hypernetworks or Textual Inversion?
— Mathias Michel (@m91michel) December 21, 2022
But it’s a technology many are still grappling with. Hypernetworks and AI generators have only just begun to cater to users’ needs and wants. User interfaces and prompting techniques will undoubtedly advance fast, and may even catch Google off-guard, as MetaNews recently covered.