A team of PhD student researchers in Saudi Arabia has developed a new AI-powered tool, MiniGPT-4, which has capabilities similar to those of OpenAI’s GPT-4.
Since ChatGPT was released in November 2022 and became a global hit, developers have stopped at nothing to come up with new AI tools that either rival the popular chatbot or complement it.
MiniGPT-4, which combines existing pre-trained vision and language models, is just the latest example.
According to Future Tools, MiniGPT-4 is capable of many tasks, including generating image descriptions and building websites.
“This tool is capable of generating detailed image descriptions, creating websites from hand-written drafts, writing stories and poems inspired by given images, providing solutions to problems shown in images, and teaching users how to cook based on food photos,” claims Future Tools.
When GPT-4 was released, OpenAI showed a video of the model building a website from a sketch. According to a tweet by Barsee, MiniGPT-4 can pull off the same feat. The only difference is that GPT-4 is not available to everyone at present, while MiniGPT-4 is already in the wild.
According to Ghacks, MiniGPT-4 uses an advanced LLM called Vicuna as its language decoder. Vicuna is built upon LLaMA and is reported to achieve 90% of ChatGPT’s quality as evaluated by GPT-4.
The model reuses the pre-trained vision component of BLIP-2 (Bootstrapping Language-Image Pre-training) and adds a single projection layer to align the encoded visual features with the Vicuna language model, while all other vision and language components are kept frozen.
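The alignment step described above can be illustrated with a toy sketch. It is not the researchers' code: the dimensions, the `frozen_vision_encoder` stand-in, the `LinearProjection` class and the choice to place visual tokens ahead of the text are all assumptions made for illustration. The point is only that a single small layer maps visual features into the language model's embedding space, and that this layer holds the only parameters that would be trained.

```python
import random

# Hypothetical sizes chosen for illustration; the real models are far larger.
NUM_VISUAL_TOKENS = 4   # feature vectors produced by the frozen vision component
VISION_DIM = 6          # width of each visual feature vector
LLM_DIM = 8             # width of the language model's token embeddings


def frozen_vision_encoder(image_id, rng):
    """Stand-in for the pre-trained vision component; its weights are never updated.

    The image_id is ignored here and random features are returned instead.
    """
    return [[rng.uniform(-1.0, 1.0) for _ in range(VISION_DIM)]
            for _ in range(NUM_VISUAL_TOKENS)]


class LinearProjection:
    """The single trainable layer: maps visual features into the LLM embedding space."""

    def __init__(self, in_dim, out_dim, rng):
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, features):
        # Apply y = W x + b to every visual feature vector.
        return [[sum(w * x for w, x in zip(row, vec)) + b
                 for row, b in zip(self.weight, self.bias)]
                for vec in features]

    def num_parameters(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)


def build_llm_input(image_id, text_embeddings, projection, rng):
    """Place projected visual tokens ahead of the text embeddings (an assumption)."""
    visual_features = frozen_vision_encoder(image_id, rng)
    visual_tokens = projection(visual_features)
    return visual_tokens + text_embeddings


rng = random.Random(0)
projection = LinearProjection(VISION_DIM, LLM_DIM, rng)
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]  # three placeholder text tokens
llm_input = build_llm_input("photo.jpg", text_embeddings, projection, rng)

print(len(llm_input), len(llm_input[0]))  # sequence length and embedding width
print(projection.num_parameters())        # the only parameters that would be trained
```

In this toy setup the language model would receive seven vectors of width eight (four visual tokens plus three text tokens), and the projection holds just 56 parameters. That small trainable footprint is what makes the approach comparatively cheap.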
David Watson says MiniGPT-4 is lightweight and can be easily deployed in real-time applications such as chatbots, virtual assistants and automated image captioning systems.
He also lists two possible applications that would make good use of MiniGPT-4: image captioning systems that need only modest computing resources, and audio descriptions of images for the visually impaired, which would require pairing the model with a text-to-audio system.
While OpenAI has confirmed GPT-4’s multimodal capabilities, it has yet to release the model’s image-processing abilities. MiniGPT-4 fills this gap by processing images alongside language using the Vicuna LLM.
An AI tool to aid research
Experts say the foundational language model behind the tool is designed to help researchers advance their work on models that combine vision and language.
Given that OpenAI has not disclosed much information about GPT-4’s architecture, model size, hardware, training compute, dataset construction or training method, MiniGPT-4’s open-source nature may prove particularly valuable to researchers.
“MiniGPT’s ability to process images provides researchers with new opportunities to investigate the relationship between language and vision models,” said Yana Khara, writing for Analytics Vidhya.
“By offering a smaller, more accessible model for researchers to work with, MiniGPT-4 can drive innovation and advancements in AI technology.
“Furthermore, the model’s open-source foundation ensures the research community can collaborate and share their findings to further progress in the field.”
MiniGPT takes image captioning to another level
Barsee’s Twitter thread on using MiniGPT-4 to chat with images included the following cases:
Fixing broken items
Upload a picture of a broken item to the MiniGPT platform and ask how to fix it, and the chatbot will describe what it sees in the image and suggest ways to fix the problems it identifies.
In the tweet, MiniGPT identifies the problem, a leaking washing machine, explains why the leak could be happening and provides a list of solutions the user could try.
Another tweet in Barsee’s thread shows MiniGPT being given a picture of a mug that the user makes and sells. The user then asks the chatbot to write an advertisement to market the mugs, which the chatbot duly does.
Simply upload a still from a movie and ask MiniGPT for a short introduction; it will then produce a one-paragraph introduction to the movie in question. As seen in the tweet, the MiniGPT chatbot recognizes the image as being from “The Godfather” and writes an intro to the movie as instructed.
The market has seen countless new AI tools developed since ChatGPT launched. Alternatives to the famous chatbot keep multiplying, and some are reportedly outpacing it, not least Auto-GPT, which is still making waves in the AI community. At this rate, it almost seems inevitable that we’ll end up with an embarrassment of AI riches for virtually any human task.