Reddit Reportedly Gives Its Content To AI Models For Training

Reddit posts could be the next fuel for the AI innovation machine: the “front page of the internet” has reportedly negotiated a content licensing arrangement that allows its data to be used to train AI models.

Under the new licensing agreement, Reddit will give “an unnamed large AI company” access to the user-generated content on its platform. The agreement, which is worth about $60 million on an annualized basis, could still change as the company works on its plans to go public.

Reddit and Search Engines

This deal follows an October story in which Reddit threatened to cut off Google and Bing’s search crawlers if it couldn’t make a training data deal with AI companies. According to the story, the company believes it can survive without search engine traffic.

Whether or not that’s true, adding “Reddit” to a search query is a popular way to avoid SEO spam in search results. Still, Reddit has demonstrated its willingness to be tough in the past. When the most popular third-party Reddit apps shut down last year over changes to its API access fees, the company successfully stonewalled its way through the biggest protest in its history.

Thousands of Reddit communities shut down in protest last year when Reddit said it would start charging for access to its APIs. The website went down shortly after, and a few days later, hackers threatened to expose previously stolen site data unless Reddit CEO Steve Huffman paid them $4.5 million or revoked the API plan. Reddit later said it was deleting data from before Jan. 1, 2023, as part of a move to a new chat infrastructure, erasing years’ worth of private chat logs and messages from users’ accounts.

The Dilemma of AI Companies and Data

Until recently, most AI companies trained their models on data from the web without proper permission. However, that practice has proven to be legally dubious, which has prompted leading companies to try to put their data sourcing on a more stable footing.

The company Reddit made the deal with has not yet been named. Still, the reported figure is a significant increase over the $5 million yearly payment OpenAI has allegedly been making to news publishers in exchange for their data.

Apple has also been pursuing multi-year agreements with major news organizations that may be valued at “at least $50 million.”

Reddit’s Revenue Surge and Other Changes

By the end of 2023, Reddit’s year-over-year revenue was up by 20 percent, but it was still $200 million short of the $1 billion target it had set two years earlier. The company was also advised to seek a $5 billion valuation when it goes public, which is expected to happen in March. That is at most half of the $10–$15 billion valuation it might have reached when it previously filed to go public in 2021, before a market downturn prevented it from doing so.

Reddit also revealed other changes, including new automatic moderation features and a new “official” badge designed to distinguish real accounts from impersonators.

Reddit’s decision in September to remove the option to turn off ad personalization turned even more users against the platform’s evolution.

Given the ongoing debate over the ethics of using public data, art, and other human-created content to train AI, this new deal could draw even more ire from users.

Image credits: Shutterstock, CC images, Midjourney, Unsplash.