Authors Sue Nvidia Over Copyright Infringement in AI Platform NeMo Megatron

Three authors have sued chipmaking giant Nvidia, alleging that it used their copyrighted works without permission to train its AI platform NeMo Megatron.

According to the authors, Nvidia’s NeMo Megatron-GPT, released in 2022, “copies and draws from their books without consent, credit or compensation.”

The authors are now seeking “unspecified damages” for people whose works were used to train NeMo over the past three years.

The case adds to a list of lawsuits against AI firms alleging copyright infringement, because the firms trained their models on material scraped from the internet without consent.

Nearly 200,000 books used

The authors, Brian Keene, Abdi Nazemian, and Stewart O’Nan, claim their works were part of a dataset of about 196,640 books that was used to train Nvidia’s NeMo AI platform to simulate ordinary written language.

According to Reuters, the authors said their books were used before the dataset was taken down in October “due to reported copyright infringement.”

In their filing, the authors also argued that the takedown amounts to an admission that Nvidia infringed the writers’ copyrights when training NeMo.

The lawsuit, filed on Friday in San Francisco, covers Keene’s 2008 novel “Ghost Walk,” Nazemian’s 2019 book “Like a Love Story,” and O’Nan’s 2007 novella “Last Night at the Lobster.”

“During training, the LLM copies and ingests each textual work in the training dataset and extracts protected expression from it,” reads part of the complaint.

According to Fox Business, the authors say in their lawsuit that the books were part of a dataset known as The Pile, which “contained a collection of books called ‘Books3.’” They add that Nvidia has admitted to training its NeMo Megatron AI models “on The Pile and the three books.”

The Pile’s Books3

According to PCMag, The Pile, which was used to train NeMo Megatron, consists of 800GB of data. This includes 108GB of books, as stated in the authors’ lawsuit.

Its book component, dubbed “Books3,” contains more than 196,000 books from “Bibliotik,” including the three authors’ books.

According to the authors, The Pile’s Books3 was listed on Hugging Face until October last year, when the dataset was removed with a message saying it “is defunct and no longer accessible due to reported copyright infringement.”

Growing litigation by writers

An Nvidia spokesperson, however, said the chipmaker had complied with copyright law.

“We respect the rights of all content creators and believe we created NeMo in full compliance with copyright law,” the spokesperson told PCMag by email.

The latest lawsuit adds Nvidia to a growing list of AI companies being sued by writers and publishers.

Nvidia has touted its AI platform NeMo as the “fast and affordable way to adopt generative AI,” a technology that can write prose, compose lyrics, generate images or videos, and write poems.

Nvidia is not the only AI firm battling lawsuits. ChatGPT maker OpenAI and its backer Microsoft face a pending copyright lawsuit from The New York Times.

That is not all. Artists have also recently raised concerns that the generative AI image creator Midjourney is using “their unique styles to create outputs that pull from their bodies of work without their consent, calling it dehumanizing and disrespectful.”


Image credits: Shutterstock, CC images, Midjourney, Unsplash.