OpenAI’s Smackdown by a German Court Hints at What’s Next for AI and Art

Last weekend, a German musicians’ organization won a pretty groundbreaking legal victory against OpenAI. The court found that the training of the GPT-4 and GPT-4o models involved copyright infringement, and that some of the models’ outputs are themselves infringing. A pretty comprehensive victory for the “it’s just a plagiarism machine” crowd.

I think even seasoned OpenAI haters would agree with at least some of the recent legal analysis of the decision by Andres Guadamuz, an intellectual property law scholar at the University of Sussex. Guadamuz explains that the decision and its implications are a bit complicated, but that it may actually benefit copyright holders in the long term.

The upshot is that copyright’s big fish – pop stars, Hollywood actors, and best-selling authors – now need to figure out how this technology can benefit them financially, even if small-time creators are not so lucky.

Context: GEMA is a German copyright collective – with no direct American counterpart – that represents the interests of composers, lyricists, and music publishers. It sued OpenAI on behalf of the rights holders of nine famous, undisputed German songs. The American equivalent would be a lawsuit on behalf of the composers and lyricists of nine songs ranging from Sheryl Crow’s “Soak Up the Sun” to Pharrell Williams’ “Happy.”

In other words, these are not obscure tracks OpenAI once scraped from some garage band’s website and turned into training data. They are unavoidable cultural touchstones that appear in the training data over and over, in multiple forms – quoted, altered, parodied – and as fragments and snippets.

The basis of the lawsuit was that, even after ChatGPT’s ability to browse the web was turned off, users could feed it queries such as “What’s the second verse of [the German equivalent of ‘No Scrubs’ by TLC]?” and ChatGPT would give sometimes fragmentary or flawed, but largely correct, answers.

The judgment comes from the Munich Regional Court, and naturally it is in German, but a Google-translated version gave me the following broad sense of what the court held:

The model itself stored bootleg copies of the songs’ lyrics. When it reproduced them in response to prompts – even in incomplete form, or with the wrong lyrics – that was a further act of infringement. Crucially, the hypothetical ChatGPT user trying to coax a song out of ChatGPT is not the copyright infringer; OpenAI is. And because ChatGPT’s outputs come with shareable links, OpenAI was making this infringing content available to the public without permission.

OpenAI will now have to disclose how often these lyrics were used as training data, and whether it ever made money from them. It will also have to stop storing them and refrain from outputting them again. Monetary damages may be determined at a later stage.

Earlier this month, a similar court case in the UK went exactly the other way: Getty Images lost its case against Stability AI because, the judge in that case wrote, “an AI model like Stable Diffusion that does not store or reproduce any copyrighted works (and has never done so) is not an ‘infringing copy’.”

Guadamuz’s analysis on this point is interesting, because it shows what the court was thinking. The German court, Guadamuz said, relied on research into machine “memorization” – something a model can do far more readily and detectably with song lyrics than with, say, a Getty Images photo.

So unlike the Getty decision, this new ruling is more in line with how intellectual property law has long treated the digital age – the same copyright rules apply whether the copy sits on a playable CD or a CD-ROM full of data.

As long as the copyrighted material can be rendered intelligible again, it counts as a monetizable copy of the work. That applies to songs “stored” inside an LLM, too.

However, Guadamuz raises questions about how the ruling handles this memorization concept, which he reads as an attempt to make memorization-free training the legal norm under the EU’s text-and-data-mining rules. That is a problem, first, because it assumes a standard that does not actually appear in the text of the law. More importantly, the ruling seems to assume that memorization always occurs during training – which, Guadamuz says, is simply not the case.

This legal fuzziness may cause problems as companies and courts interpret the ruling in the years to come, but Guadamuz’s takeaway is this: we’ll likely “eventually end up with some kind of licensing market.”

As with Sora 2’s approach to copyright and likeness rights, which many actors and rights holders eventually came around to, a framework is gradually taking shape for sharing (theoretical, future) AI revenues with the owners of copyrighted works. OpenAI shocked copyright holders everywhere by conjuring up a whole new universe of alleged infringement, and artists and creators understandably felt betrayed.

But slowly, powerful stakeholders are warming up to generative AI as they begin to imagine how they’ll get their beaks wet – and just how wet those beaks could get. You can see it in the major US record labels now teaming up with companies they once sued, like Udio.

As for the dry, cracked beaks of less powerful copyright stakeholders – the small-time artists, writers, and creators worried that their work will be made redundant or irrelevant in this strange new content universe – it’s still not at all clear how any of this will benefit them.
