Meta has unveiled its Segment Anything Model (SAM), an AI model that can identify individual objects within an image, including those it has not previously encountered.
Meta’s research division said it has published the new tool and corresponding dataset to foster research into foundation models for computer vision.
On its Twitter account, Meta AI wrote: “Today we’re releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation.
“SAM is capable of one-click segmentation of any object from any photo or video + zero-shot transfer to other segmentation tasks.”
According to a company blog post, SAM was trained on the largest dataset of its kind: more than 1 billion segmentation masks spanning 11 million licensed images. Training at that scale is what allows the model to segment objects it has never encountered before.
“The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks,” said the company.
“We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive, often competitive with or even superior to prior fully supervised results.”
Meta is reportedly already using technology similar to SAM internally for tagging photos, moderating content and recommending posts on Facebook and Instagram.
The first foundation model for image segmentation?
The SAM model allows users to annotate images either by clicking on them or by providing a text prompt. When India.com prompted an image with the word “cat”, SAM promptly drew boxes around the cats it found.
Responding to Meta’s tweet announcing the new tool, most users expressed enthusiasm. “Wow, this will accelerate the self-driving and robotics industry 10x,” responded Arkash, while another described it as “very cool.”
“It sounds like a significant step towards the development of the first foundation model for image segmentation. Keep up the fantastic work,” said IslandPitch.
But MagicOfBarca thought otherwise, responding: “Idk what’s the point of this or what are the main uses for it so hope you make a video on this.”
According to Meta, SAM uses an image encoder that produces a one-time embedding for the image, while a lightweight prompt encoder converts any prompt into an embedding vector in real time.
These two information sources are then combined in a lightweight decoder that predicts segmentation masks. Once the image embedding has been computed, SAM can produce a mask in just 50 milliseconds.
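That split is visible in the Python interface published in the project’s GitHub repository. Here is a minimal sketch, assuming the pretrained checkpoint has been downloaded from the repo (the file name, image path and click coordinates are illustrative):

```python
# pip install git+https://github.com/facebookresearch/segment-anything.git
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint ("vit_h" is the largest released variant).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image as an RGB array.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# The expensive step: the image encoder runs once and caches the embedding.
predictor.set_image(image)

# The cheap step: each prompt (here, one foreground click) is encoded and
# decoded against the cached embedding, so new prompts come back quickly.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),  # (x, y) pixel of the click
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,                # return several candidate masks
)
```

Because only the small prompt encoder and mask decoder run per click, users can probe the same photo with many different prompts without re-running the heavy image encoder.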
Feel like trying it?
SAM was developed by Meta AI Research and is publicly available on GitHub. You can also try SAM online with a demo or download the dataset (SA-1B). Here’s what you need to do.
- Download the demo or go to the Segment Anything Model demo link.
- Upload an image or choose one from the gallery.
- Add and subtract areas.
- Mask areas by adding points: select “Add Area”, then select the object.
- Refine the mask by selecting “Remove Area”, then select the area.
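If you would rather work with the model directly than click through the demo, the GitHub release also includes an automatic mode that tries to segment every object in an image without any prompts. A minimal sketch, again assuming a downloaded checkpoint (file and image names are illustrative):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# The smaller "vit_b" checkpoint trades some accuracy for speed.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one dict per detected object

# Each entry carries the binary mask plus metadata such as its pixel area.
print(len(masks), "objects found; largest covers",
      max(m["area"] for m in masks), "pixels")
```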
The AI arms race
Conversations around AI have exploded across the globe following the release of OpenAI’s ChatGPT in November. This has sparked serious competition in the sector, with tech giants like Meta, Microsoft and Google racing to come up with their own rival products or incorporate the tech in their products and services.
Microsoft has gone as far as adding ChatGPT’s capabilities to its Bing search engine, as well as its Office suite.
Meta, meanwhile, has been experimenting with generative AI. CEO Mark Zuckerberg says incorporating such technology into the company’s apps is a priority this year, though it hasn’t abandoned the metaverse.
Among the AI tools the company is developing is one that creates surreal videos from text prompts. There’s also another that quickly generates children’s book illustrations from prose.