The End of Free AI Training Data? A U.S. Copyright Ruling That Could Reshape the Future of Generative AI

A recent U.S. court decision has sent a strong signal to the AI industry: training generative AI models using copyrighted content without permission may no longer be considered “fair use.” This ruling could fundamentally change how companies approach data acquisition and copyright compliance in AI development.

How Do Generative AIs Like ChatGPT Actually Learn?

Generative AI tools like ChatGPT or Claude seem to know everything—articles, blogs, news, even legal analysis. But where does all this knowledge come from? Many assume that these models are trained on vast amounts of freely available text scraped from the internet. But was permission obtained from the original content creators? And if not, is that even legal?

As generative AI becomes deeply integrated into daily life, copyright concerns are no longer theoretical—they’re practical and urgent.

Let’s explore a recent case that could serve as a turning point: Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.

The Game-Changer: Thomson Reuters v. Ross Intelligence

In this landmark case, a U.S. federal court ruled that Ross’s use of copyrighted materials to train its AI system did not qualify as “fair use.” This is one of the first major judicial opinions to directly address AI training and copyright, and it favors copyright holders.

Case Background

Thomson Reuters operates Westlaw, one of the most widely used legal research platforms in the U.S. Westlaw offers judicial opinions, statutes, secondary sources, and editorial content such as “headnotes,” all protected under Thomson Reuters’ copyright.

Ross Intelligence was building an AI-based legal research tool aimed at improving search accuracy. When Ross was denied a license to use Westlaw’s content for training, it allegedly obtained “bulk memos” derived from Westlaw headnotes through a third party and used them to train its AI.

Thomson Reuters filed suit for copyright infringement.

Key Takeaways from the Court's Decision

On February 11, 2025, the U.S. District Court for the District of Delaware granted summary judgment in favor of Thomson Reuters, holding that Ross’s use did not qualify as fair use.

Among the four fair use factors, the court focused heavily on:

  • Purpose and Character of Use: Ross used the content for commercial purposes and did not transform it meaningfully. The AI output served a similar function to the original—legal search—making it directly competitive.

  • Market Effect: The court emphasized this as the most important factor. Ross’s AI tool could substitute for Westlaw in both existing and potential markets (e.g., licensing of legal content for AI training). This posed a substantial threat to Thomson Reuters’ business.

Although Ross argued that the AI outputs didn’t replicate Westlaw’s content verbatim, the court concluded that the training process still harmed the market value of the original works.

The court suggested that if the use had been transformative—meaning it significantly altered the original content’s nature and purpose—the outcome might have been different.

Implications for Generative AI

While this case didn’t directly involve a large language model (LLM) like GPT, it provides a potential roadmap for how courts may assess copyright claims against generative AI platforms in the future.

Two principles emerged as decisive:

  1. Transformative Use: Simply feeding copyrighted text into an AI does not make the use transformative.

  2. Market Substitution: If the AI or its outputs serve as a substitute for the original work or its licensed uses, fair use will likely not apply.

Other factors, such as the originality of the source content, the extent of reproduction, and the similarity between outputs and inputs, will also play significant roles going forward.

This decision sets a precedent that favors copyright holders and could encourage similar lawsuits targeting generative AI models trained without clear licensing.

A New Era of Licensed Training Data

The ruling signals a clear shift: high-quality training data is no longer free. AI companies can no longer rely on web-scraped data without considering the legal implications. From now on, licensing agreements and proper attribution may become the industry standard.

Transformative innovation in AI must go beyond reformatting or remixing existing content. Companies will need to invest in developing models that produce genuinely original outputs—and they may need to compensate content creators along the way.

This could reshape the economics of AI, leading to rising development costs and more expensive AI services in the long run.

Final Thoughts

The Thomson Reuters v. Ross case represents a legal turning point in how courts may treat AI training and copyright. Developers can no longer assume that using copyrighted content for machine learning purposes falls under “fair use.”

Instead, they must prioritize:

  • Demonstrating transformative use

  • Avoiding market substitution

  • Building cooperative, licensed frameworks with content creators

If you or your company are navigating copyright risks in AI development or need help structuring data licensing agreements, LexSoy Legal LLC is here to assist. Please contact us at contact@lexsoy.com for practical, attorney-led support.

© LexSoy Legal LLC. All rights reserved.
