Does Copyright Apply to AI Training? – Summary of the U.S. Copyright Office Report

As someone who frequently uses AI tools to summarize materials and organize complex legal content, I’ve often asked myself a question:
“If I didn’t train this AI myself, do I really have the right to use the result freely?”

In recent months, this question has moved from theory to courtroom reality. Several major copyright lawsuits have been filed in the U.S. involving the use of copyrighted materials in generative AI training. Two particularly significant cases include:

  • The New York Times vs. OpenAI & Microsoft
    The Times alleges that ChatGPT reproduced its articles nearly verbatim, raising serious questions about whether AI models trained on journalistic content are infringing copyright.

  • Sarah Silverman et al. vs. Meta and OpenAI
    These proposed class actions, filed separately against each company, claim that published books were used in AI training datasets without permission, bringing attention to how text-based models acquire and use copyrighted material.

In response to these growing concerns, the U.S. Copyright Office released a pre-publication version of its report “Copyright and Artificial Intelligence, Part 3: Generative AI Training” in May 2025. The report examines how copyright law applies to the use of protected content in AI model training.

Why Does Generative AI Raise Copyright Concerns?

Generative AI (GAI) systems produce human-like text, images, or music based on training data, which is often scraped in massive volumes from online sources.
Much of this data may be protected by copyright. Whether its use is lawful depends on two key questions:

  • Was the data simply referenced, or was it copied and reused?

  • Does the AI output replace or replicate the original content?

The answers have legal consequences.

Is Fair Use a Valid Defense?

Under U.S. copyright law, unauthorized use of copyrighted material may be allowed if it qualifies as fair use, based on four key factors:

1. Purpose and Character of Use

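Uses for nonprofit education or research are more likely to qualify as fair use.
Commercial AI systems like ChatGPT or Claude often face greater scrutiny.
Courts also ask whether the use is transformative, that is, whether it serves a new purpose rather than substituting for the original.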

2. Nature of the Copyrighted Work

Use of factual content (e.g., news, public data) weighs in favor of fair use.
Use of highly creative works (e.g., novels, music, art) weighs against it.

3. Amount and Substantiality

Copying entire works, or the most significant portions of a work, weighs against fair use.
For AI training, how the data is selected and used matters.

4. Effect on the Market

If the AI output substitutes for the original or reduces the creator’s income, this factor weighs against fair use.
Even outputs that imitate a creator’s style may count as market harm.

The Licensing Debate: Voluntary vs. Compulsory

The report acknowledges increasing interest in voluntary licensing models, where AI developers obtain content through direct licensing or licensing platforms.
For example, Getty and Bria are actively building AI models on fully licensed datasets.

On the other hand, some groups have proposed compulsory licensing, a system in which the law permits use of copyrighted content without the owner’s individual permission in exchange for a set fee.
However, the Copyright Office recommends caution:

“There is no clear evidence of market failure that would justify government-mandated access to copyrighted content.”

Practical Takeaways for Companies and Developers

  • If your AI system trains on protected content, getting a license is the safest legal path.

  • If you rely on fair use, be prepared to justify it with evidence (e.g., transformative use, minimal impact on market).

  • SaaS and API-based platforms should update their Terms of Use to clearly state data usage scope and allow opt-outs if necessary.

  • The law is still evolving, and this report serves as a key step in shaping future copyright guidance in the age of AI.

© LexSoy Legal LLC. All rights reserved.
All content on this website is the intellectual property of LexSoy Legal LLC and is protected by copyright law. No portion may be reproduced or distributed without permission.
