Reddit Sues Anthropic Over Unauthorized Use of Data for AI Training — Why Contractual Clarity Is Essential

As the AI boom continues, companies across industries are racing to develop large language models (LLMs) and other AI systems. One critical aspect of this process is sourcing high-quality training data—often from publicly available online platforms. But a recent lawsuit filed by Reddit against Anthropic highlights the growing legal risks of using such data for AI training without explicit permission.

According to a complaint filed in a Northern California court, Reddit claims that Anthropic used data scraped from Reddit without a proper license to train its AI models. Reddit alleges that this unauthorized use of its data for commercial purposes violated its user agreement.

The lawsuit marks one of the first legal challenges by a major tech company specifically targeting an AI provider’s training data practices. It follows similar lawsuits from the New York Times against OpenAI and Microsoft, various authors against Meta, and the music industry against AI content generation startups—all centered on the unauthorized use of copyrighted or proprietary data for AI training.

Reddit’s Chief Legal Officer, Ben Lee, stated that the company "will not tolerate profit-seeking entities like Anthropic commercially exploiting Reddit content for billions of dollars without any return for redditors or respect for their privacy."

Interestingly, Reddit has entered into licensing agreements with other AI providers, including OpenAI and Google, which allow these companies to use Reddit data for AI training under specific terms designed to protect user interests and privacy. Reddit argues that Anthropic, in contrast, scraped its platform without authorization and ignored the site’s robots.txt file, which signals that automated systems should not crawl the site. According to the complaint, even after claiming to block its bots from Reddit in 2024, Anthropic allegedly continued scraping the platform over 100,000 times.
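For readers unfamiliar with the mechanism at issue: robots.txt is a plain-text file served at a site's root that tells automated crawlers which paths they may access, under the Robots Exclusion Protocol (RFC 9309). As a hypothetical illustration only (not Reddit's actual file), a site wishing to bar a specific crawler while permitting others might publish directives like these:

```
# Hypothetical robots.txt example (illustrative, not Reddit's actual file)
# Block one named crawler entirely:
User-agent: ExampleAIBot
Disallow: /

# All other crawlers may access everything except a private path:
User-agent: *
Disallow: /private/
```

Note that robots.txt is a voluntary convention, not a technical barrier; the complaint's significance lies in treating disregard of it as evidence of unauthorized access.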

The Reddit lawsuit reflects a broader trend: AI training use cases are being treated differently from ordinary data usage in both legal and commercial contexts. Using data for AI model training embeds that data in the model's parameters in a way that is effectively irreversible, raising complex issues around intellectual property rights, privacy, and fair compensation.

This distinction is increasingly shaping contract negotiations. In my recent work reviewing commercial contracts, this issue comes up again and again. Even when parties have entered into data licensing agreements or API agreements, it is critical to remember that such agreements do not automatically authorize the use of data for AI model training. The prevailing trend is to prohibit AI training use unless it is expressly permitted in the contract.

Therefore, even if a company believes it has the right to use certain data under an existing contract, it must carefully review whether that right extends to using the data to train proprietary LLMs or other machine learning models. In most cases today, such uses are explicitly restricted unless separately licensed.

The Reddit vs. Anthropic lawsuit serves as an important reminder: companies must clearly define the scope of data usage rights in their contracts, particularly as AI-related use cases evolve rapidly. Failing to do so can lead to significant legal and business risks.

If your organization is navigating data usage agreements or considering AI training projects, it is critical to carefully assess your contractual rights and obligations. LexSoy Legal LLC has extensive experience in this evolving area and is ready to assist. For legal guidance, please contact us at contact@lexsoy.com.

© LexSoy Legal LLC. All rights reserved.
