Authors, Including Pulitzer Prize Winner Chabon, Sue OpenAI Over Alleged Copyright Infringement


Pulitzer Prize winner Michael Chabon and four other writers have sued OpenAI, accusing the company of misusing their writing to train its popular AI-powered chatbot, ChatGPT.

ChatGPT, OpenAI’s brainchild, set the pace for the generative AI era after millions of people embraced it last year. However, the company has also drawn criticism from individuals and firms who accuse it of violating copyright.

The Microsoft-backed American start-up is already facing scrutiny from the U.S. Federal Trade Commission over ChatGPT spreading false information.

Along with Chabon, playwright David Henry Hwang and authors Matthew Klam, Rachel Louise Snyder, and Ayelet Waldman have accused OpenAI of copying their works without permission in the lawsuit filed on Sept. 8.

OpenAI says only publicly available information was used

According to an OpenAI blog post, ChatGPT is trained on information from three primary sources: publicly available information on the internet, information licensed from third parties, and information provided by human trainers.

“For this set of information, we only use publicly available information that is freely and openly available on the Internet; for example, we do not seek information behind paywalls or from the ‘dark web,’” reads the blog post.

However, in a 24-page complaint filed in federal district court in California, the authors allege that ChatGPT was trained on their copyright-protected books, including Chabon’s The Amazing Adventures of Kavalier & Clay, Hwang’s The Dance and the Railroad, and Klam’s Who Is Rich?


“OpenAI also copied many books while training GPT-3. In the July 2020 paper introducing GPT-3, Language Models are Few-Shot Learners, OpenAI disclosed that, in addition to using the ‘Common Crawl’ and ‘WebText’ datasets that capture web pages, 16% of the GPT3 training dataset came from ‘two internet-based book corpora,’ which OpenAI simply refers to as ‘Books1’ and ‘Books2,’” the lawsuit states.

Alleged AI copyright infringement continues to make headlines

ChatGPT became the fastest-growing consumer application in history earlier this year. However, this is not the first lawsuit filed against OpenAI over copyright infringement. Two U.S. authors sued the company over the same allegation in June.

In that case, Massachusetts-based writers Paul Tremblay and Mona Awad alleged that ChatGPT mined data copied from thousands of books without permission, infringing the authors’ copyrights. That lawsuit also seeks compensation on behalf of a nationwide class of copyright owners whose works were used by OpenAI without permission.

In April, OpenAI also came close to facing the world’s first defamation lawsuit over content generated by ChatGPT.

“After making the inquiry, it generated five or six paragraphs of information. The really disturbing thing was that some of the paragraphs were accurate, and then there were other paragraphs that described things that were completely incorrect. It told me that I’d be charged with very serious criminal offenses, that I’d be convicted of them, and that I had spent 30 months in jail,” said Brian Hood, mayor of Hepburn Shire Council in Australia.

I could not find any further developments in the Australian mayor’s defamation case. However, another generative AI chatbot has indicated that the mayor filed the lawsuit.

OpenAI is not alone

Generative AI has boomed since the beginning of this year, but it has also drawn scrutiny from rights holders over alleged copyright infringement.

Earlier this year, Getty Images also accused Stability AI of copying and processing millions of its images without proper licensing.

Meta and OpenAI are also facing copyright infringement lawsuits from comedian Sarah Silverman and authors Richard Kadrey and Christopher Golden, who accuse the companies of using their books without permission to develop their large language models.
