How a new court win for OpenAI and a tough Supreme Court standard could leave AI copyright claims on shaky ground.
A relatively obscure Supreme Court case involving the Fair Credit Reporting Act might seem like an odd fit for the high-stakes world of copyright in the age of artificial intelligence. But it’s time to start paying attention to the Court’s 2021 ruling in TransUnion v. Ramirez, which introduced a strict new requirement for establishing standing to sue in federal court. Plaintiffs now must demonstrate that they’ve suffered specific, “concrete harm.” The mere violation of a federal statute, on its own, is no longer enough to open the courthouse doors.
After TransUnion, legal experts warned that this break from precedent could have sweeping effects, particularly for statutes that provide for statutory damages without proof of actual harm—like the Copyright Act. In the AI context, TransUnion could make it significantly harder for copyright owners to bring claims against AI companies for using their creative works as training data—at least absent specific examples of infringing output.
That scenario began to unfold last week, as TransUnion played a starring role in the dismissal of Raw Story Media v. OpenAI. Southern District of New York Judge Colleen McMahon ruled that the plaintiffs failed to show any concrete harm caused by OpenAI’s alleged removal of copyright management information from their articles, which they claim were then used to train ChatGPT’s language model. If Judge McMahon’s reasoning is adopted—or even extended—by other courts, AI-related copyright claims could find themselves on shaky ground, facing stricter standing requirements across a broader range of cases.
Raw Story Media v. OpenAI
In the Raw Story Media case, two digital news organizations, Raw Story and AlterNet, claimed that OpenAI violated the Digital Millennium Copyright Act (DMCA) by using their copyrighted articles—stripped of copyright management information (CMI), such as author names and copyright notices—to train ChatGPT. The plaintiffs argued that this violated section 1202(b) of the DMCA, which prohibits the removal or alteration of CMI when the party knows that doing so will facilitate future infringement.
But Judge Colleen McMahon dismissed the case, finding that the plaintiffs failed to allege a “concrete injury-in-fact”—a requirement for Article III standing, which is a threshold question in every federal case. Central to her ruling was the conclusion that the plaintiffs hadn’t shown that ChatGPT disseminated a copy of their works in response to any user query. As she noted, “Plaintiffs have not alleged any actual adverse effects stemming from this alleged DMCA violation.” Without concrete allegations of harm, Judge McMahon found the claims too abstract to meet federal standing requirements.
Judge McMahon also found that the plaintiffs lacked standing to seek injunctive relief. “Plaintiffs allege that ChatGPT has been trained on a scrape of most of the internet, which includes massive amounts of information from innumerable sources on almost any given subject,” she wrote. “Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of plaintiffs’ articles seems remote.”
Unlike typical DMCA section 1202(b) cases, in which copyrighted works are stripped of CMI and then reproduced verbatim, Judge McMahon noted that generative AI models like ChatGPT don’t reproduce works directly. Instead, they synthesize relevant information from the underlying data to generate new responses. She concluded that the plaintiffs hadn’t plausibly alleged that any of their articles were directly infringed—or that they faced a substantial risk of future infringement.
TransUnion v. Ramirez: The New Standard for Concrete Harm
The Supreme Court’s 2021 decision in TransUnion v. Ramirez was pivotal in Judge McMahon’s dismissal of the publishers’ claims. In TransUnion, a class of plaintiffs sued the credit bureau TransUnion, alleging it had falsely labeled them as potential terrorists and drug traffickers on their credit reports. (Because nothing says “poor credit risk” like a side hustle in arms dealing.) Although Congress had granted plaintiffs the right to sue under the Fair Credit Reporting Act for reporting errors, the Supreme Court held that this statutory right alone wasn’t enough to establish standing in federal court. Plaintiffs also had to demonstrate “concrete harm”—a specific, measurable injury—to satisfy Article III’s requirement that federal courts only hear actual cases or controversies. In the words of Justice Kavanaugh, “No concrete harm, no standing.”
The concrete injury inquiry asks whether plaintiffs can identify a “close historical or common-law analogue” for their asserted harm, a standard that the Court said isn’t tethered to “evolving beliefs about what kinds of suits should be heard in federal courts.”
Although the plaintiffs’ credit reports contained errors, the Court in TransUnion concluded that an incorrect report alone didn’t constitute a sufficiently concrete injury unless it was disclosed to a third party, which the Court saw as analogous to the harm caused by defamation. As Justice Kavanaugh explained, “Congress’s creation of a statutory prohibition or obligation and a cause of action does not relieve courts of their responsibility to independently decide whether a plaintiff has suffered a concrete harm under Article III.” In other words, “Congress may not simply enact an injury into existence.”
Applying TransUnion to the DMCA, Judge McMahon acknowledged that Raw Story and AlterNet had alleged that their copyrighted works, minus CMI, were used to train ChatGPT and that their articles remain in its repository. But without a showing of dissemination or specific adverse effects, Judge McMahon found the plaintiffs’ claims lacking the kind of measurable harm that Article III requires.
Concrete Harm and AI Copyright Cases: A High Bar?
To be clear, the plaintiffs in Raw Story Media didn’t assert a claim for copyright infringement, instead relying entirely on section 1202(b). Indeed, Judge McMahon observed that the DMCA was a poor fit for the case: “Let us be clear about what is really at stake here. The alleged injury for which Plaintiffs truly seek redress is not the exclusion of CMI from Defendants’ training sets, but rather Defendants’ use of Plaintiffs’ articles to develop ChatGPT without compensation to Plaintiffs.” The plaintiffs now have an opportunity to amend their complaint, perhaps by adding a claim centered on OpenAI’s use of their articles without payment.
That said, the impact of TransUnion, if broadly applied, could reach far beyond the DMCA and affect other copyright infringement cases involving AI training data as well. Judge McMahon’s ruling suggests that without demonstrable harm—such as identical or near-identical output—copyright owners may face an uphill battle in court.
Statutory Damages at Risk?
You may be asking yourself: why do we need demonstrable harm when statutory damages are available under the Copyright Act? Well, the TransUnion majority was explicit that, even if Congress creates a statutory “prohibition or obligation,” plaintiffs still need to show they’ve suffered concrete harm. Justice Thomas, dissenting from his conservative colleagues, saw this as a radical shift. (When Clarence Thomas thinks you’re too extreme, that’s saying something.) Thomas argued that the majority’s ruling upends centuries-old precedent, pointing to statutory damages going back to the Copyright Act of 1790 as an example: “The First Congress enacted a law defining copyrights and gave copyright holders the right to sue infringing persons in order to recover statutory damages, even if the holder could not show monetary loss.”
Whether the Supreme Court would apply a different standard to copyright cases remains to be seen. But as Cornell law professor James Grimmelmann told WIRED last week, “This theory of no standing is actually a potential earthquake far beyond AI. It has the potential to significantly restrict the kinds of IP cases that federal courts can hear”—possibly leaving publishers without standing “to sue over model training at all, even for copyright infringement.”
The Bottom Line
The Raw Story Media ruling, with its reliance on TransUnion, raises significant questions about the future of copyright law in the context of AI. If other courts follow Judge McMahon’s lead, copyright owners may find it increasingly difficult to bring cases involving AI training data, particularly if they can’t show concrete harm from the outset.
For now, copyright holders may need to rethink their approach to AI-related claims. Gathering clear evidence of actual harm—such as instances where AI models produce outputs that closely mirror expressive elements from the original copyrighted material—may be essential. In any event, plaintiffs will need to show a real-world impact from the AI’s use of their work or risk seeing their claims fall short.
As always, I’d love to know what you think. Let me know in the comments below or @copyrightlately on social media. In the meantime, here’s a copy of Raw Story Media, Inc. v. OpenAI, Inc.—hopefully concrete enough for you to read.
Comments
From a foreign bystander’s point of view: perhaps, through a conceptual distinction between legal rights (like property or bodily integrity) and legal interests (something conceptually akin to pure economic loss), the USSC’s ruling mentioned here can be read as directed only at judicial remedies for legal interests. Copyright infringement claims might then fall outside the scope of this ruling, since the infringed copyright per se is a substantial loss (though I am not sure whether the common law tradition admits such an understanding xD).
I see this as a quirk of the DMCA cause of action and not necessarily a setback to the broader copyright campaign against AI companies. For an infringement action based on unauthorized training, which seems like the big ticket item at the moment, there is a clear harm from the copyright owner’s lost license fee and lost ability to control distribution of their work.
Anyone know why the plaintiffs didn’t just assert a claim for violation of section 106 rights in the first place? No copyright registrations maybe?
Like you, I have to speculate. Wouldn’t the 106 claim’s starting point be the copying of the work into the dataset? Which then leads to the ultimate and complicated fair use battle? Perhaps they wanted to try out a theory that avoids that battle? Were they being “clever”? But it doesn’t seem likely they could meet the intent or knowledge standards in section 1202, does it? The removal of the CMI doesn’t seem designed or likely to result in infringing activity related to the plaintiffs’ works. A strange case that proved counterproductive for those who would restrict AI training data uses. For those folks, “bad cases make bad law.”
It’s hard for small online news organizations to get copyright protection in their articles. Large orgs. like the NYT can get copyright in aggregate articles, but online news reporting orgs. can’t (or they could, but it would likely be prohibitively expensive), so they generally don’t. USCO is contemplating changes to this, but it may be why Raw Story and Intercept went with the DMCA claims: https://news.bloomberglaw.com/ip-law/copyright-rule-would-ease-news-registration-as-ai-fight-looms
It seems like making/taking the data for AI training is making a copy of the work, much like loading code into RAM is generally accepted as a violation of the reproduction right.
Also, seems foolish that these plaintiffs are not registering their copyrights, which seems to be a persistent issue here.