News Outlets Struggle to Apply Copyright Law Against ChatGPT

April 24, 2023

Publishers seeking to apply existing copyright laws to ChatGPT and other generative AI systems may face an uphill struggle.

Copyright laws such as the European Union’s copyright directive ensure that media outlets have the right to seek fair compensation from social media sites and search engines that use their work. But media outlets are struggling to apply those laws to generative AI – and regulators are taking notice.

A confusing new reality

Copyright laws have existed since the 18th century, but in generative AI they have found a stern new test. Both outlets and policymakers are now having to grapple with the challenge of chatbots using their data without approval or compensation.

Petra Wikström, director of public policy at Scandinavia’s largest media group Schibsted, is among those concerned by the technology and its implications.

“We are not going into the AI debate with fear but, of course, we need to look at the challenges, and copyright is one of them,” Wikström told Politico earlier this month.

In the case of generative AI, the problem is identifying whose copyright may have been infringed. Facts themselves cannot be copyrighted, and data mining is generally an exception to copyright law. But if human-written articles are being wholly copied into an AI’s database, then it is certainly possible that copyright law may have been breached.

The question boils down to what exactly is being collected and stored within these AI databases.

Iacob Gammeltoft, policy manager at News Media Europe (which represents 2,400 European news outlets), admits that AI has left the organization and its members with an almost impossible-to-solve copyright conundrum.

“We have copyright protection for our work, but it’s more of a question of how to be able to make use of it. Enforcement is the main issue,” said Gammeltoft.

One thing that will greatly impede the generative AI discussion will be copyright.

This is especially true in the EU, whose privacy laws have dictated the direction programming has gone over the years.

— Ant 💀 Working Right Now on a Novel (@AGramuglia) April 18, 2023

Keep off the lawn

Data mining may be one of the major exceptions to copyright law, but that doesn’t mean publishers have to take it lying down.

Holders of copyright can already place machine-readable code on their websites that warns data miners not to trespass: a digital equivalent of a “keep off the lawn” sign. Some jurisdictions are currently considering whether applying those no-trespass signs to chatbot technology might provide some sort of solution.
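As a rough illustration of what such a machine-readable sign can look like, the sketch below uses a robots.txt file, one widely used convention for telling crawlers which pages are off limits. The article does not say which mechanism publishers rely on, so treating robots.txt as the "sign" is an assumption, and the crawler name ExampleAIBot and the site news.example are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: one named crawler is
# barred from the whole site, every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


ARTICLE_URL = "https://news.example/articles/1"  # hypothetical article URL

print(may_fetch(ROBOTS_TXT, "ExampleAIBot", ARTICLE_URL))
print(may_fetch(ROBOTS_TXT, "SomeSearchBot", ARTICLE_URL))
```

A sign of this kind only works if the crawler chooses to read and obey it. The file itself records nothing about who ignored it, which is why publishers cannot easily tell whether it has been violated.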

French publishers from the GESTE association are among those considering whether digital trespassing laws can be used in this manner, and whether it is the right direction.

There are certainly voices who are sympathetic to the need to take action. The problem, however, would be identifying whether these no-trespass signs had been violated by AI.

As Gammeltoft admits, “there is no way to be sure 100 percent until you see the training database that [has] been used to train ChatGPT.”

Without complete transparency of the ChatGPT database, publishers are left to play guessing games – or engage in expensive and time-consuming detective work.

AI copyright briefly explained

Laws for copyright differ across jurisdictions. In common law countries such as the UK, the focus of copyright law is to protect creators financially. In continental Europe, the focus is the creators’ natural rights. In either case, copyright law exists to protect the author of the work.

When an author creates any work, copyright is assumed without the need for further action. If a third party then uses a substantial part of a copyrighted work without the author’s permission, the original author’s copyright is infringed and they may be entitled to some form of financial compensation.

Where the treatment of humans and AI differs is in where the work ends up: the mind of the human or the database of the AI. If a human remembers a piece of copyrighted work, no infringement takes place. But if an AI stores a significant part of a copyrighted work in its database, then an infringement may have occurred.

The problem, alas, is proving it. Without access to the AI’s database, identifying an instance of copyright infringement would be extremely difficult – and therein lies the rub. Whatever steps the EU takes next, the road ahead when it comes to policing US-based tech firms would appear extremely challenging.