OpenAI has fired back at The New York Times, saying the news outlet “intentionally manipulated” ChatGPT to ‘regurgitate’ entire lines from the publication’s articles, as the developer defended its practices in a copyright case.
On Dec. 27, the Times filed a lawsuit against OpenAI and its key investor, Microsoft, alleging intellectual property violations related to the use of millions of its “unique” articles to train ChatGPT.
According to a filing in the U.S. District Court for the Southern District of New York, the newspaper is seeking “billions of dollars in statutory and actual damages” from OpenAI and Microsoft for “unlawful copying and use of the Times’s uniquely valuable works.”
Newspaper ‘not telling full story’
In a blog post published this week, OpenAI said the lawsuit is “without merit” and that The New York Times is “not telling the full story.” The developer claimed that it only learned about the lawsuit from a news story published by the Times a few days after Christmas.
“We collaborate with news organizations and are creating new opportunities. Training is fair use, but we provide an opt-out because it’s the right thing to do,” OpenAI wrote. It said the Times adopted the content removal option in August but still proceeded to sue months later.
In its copyright case, the news organization claimed ChatGPT had ‘regurgitated’ many of its articles. Regurgitation is the tendency of AI chatbots to reproduce entire “memorized” passages of specific articles or other content. The Times wants OpenAI to destroy any training data and AI models that use its copyrighted material without consent.
OpenAI explained in its blog post that regurgitation “is a rare bug that we are working to drive to zero.” But the firm also accused the newspaper of cherry-picking prompts intentionally designed to trigger regurgitation rather than normal customer usage. It said the examples cited by the Times in its lawsuit are from old articles published on several third-party sites.
“It seems [the Times] intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate,” the company said.
“Our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”
Ian Crosby, a partner at the law firm Susman Godfrey, which is representing the newspaper, told the Financial Times that “the blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT.”
Crosby added it was “not fair use by any measure” that, as the lawsuit alleges, OpenAI sought “to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.”
AI copyright wars
ChatGPT is a free-to-use generative AI chatbot trained on vast amounts of text and code, including much of the public internet as it existed before 2021. Since its launch in November 2022, the chatbot has become incredibly popular thanks to its ability to perform different tasks, such as writing poetry.
However, AI companies like OpenAI are facing increasing pressure over their use of copyrighted material to train their large language models. OpenAI and other artificial intelligence firms argue that processing large amounts of data that is publicly available on the internet constitutes “fair use” under U.S. copyright laws.
Still, that has not stopped the companies from getting sued. In September, nearly 20 U.S. fiction authors, including John Grisham, George R.R. Martin, and Jodi Picoult, sued OpenAI over alleged copyright violations in using their work to train ChatGPT.
In July, two non-fiction writers filed a similar lawsuit against the company, accusing OpenAI of using their books to train its chatbot without their consent. OpenAI has also been sued for $3 billion for alleged data theft. In February last year, Getty Images filed a lawsuit against AI image generator Stability AI for allegedly copying 12 million of Getty’s images for training data.
The New York Times lawsuit comes as OpenAI is making attempts to close deals with other news publishers to use their content under license. In December, the company reached an agreement with German publisher Axel Springer, worth millions of dollars a year, which could work as a template for deals of a similar nature in the future.
“We regard The New York Times’ lawsuit as without merit. Still, we are hopeful for a constructive partnership with The New York Times and respect its long history,” OpenAI said in its blog.