AIs can generate near-verbatim copies of novels from training data


Last year a US court found that Anthropic’s LLM training on certain copyrighted material could be considered fair use because it was deemed “transformative”.

But it determined that storing pirated works was “inherently, irreparably infringing”, leading the AI group to pay $1.5 billion to settle the lawsuit.

In Germany, a ruling in November last year found that OpenAI had infringed copyright because its model memorized song lyrics. The case, brought by GEMA, the organization representing composers, songwriters and publishers, was considered a landmark decision in the EU.

Rudy Telscher, a partner at the law firm Husch Blackwell, said that reproducing an entire book without jailbreaking is “clearly a copyright violation.” But “it’s a matter of whether it’s getting enough. [AI models] may be vicariously liable for the violation,” he said.

Anthropic said that the jailbreaking technique used in the Stanford and Yale research was impractical for ordinary users and would require more effort to extract the text than simply purchasing the content.

The company also said that its model does not store copies of specific datasets but instead learns from patterns and relationships between words and strings in its training data.

xAI, OpenAI and Google did not respond to requests for comment.

The fact that AI labs have put safeguards in place to prevent training data from being extracted means they are aware of the problem, said Imperial’s de Montjoye.

Ben Zhao, a computer science professor at the University of Chicago, questioned whether AI labs really need to use copyrighted material in training data to build cutting-edge models.

“Whether the technical results can be done or not, it is still a question of whether we should do it?” Zhao said. “The legal side should ultimately present its position and actually be the arbiter in this entire process.”

© 2026 The Financial Times Ltd. All rights reserved. May not be redistributed, copied, or modified in any way.


