Why a Little-Known Copyright Case May Shape the Future of AI

While a flurry of AI copyright lawsuits from prominent authors and artists grabs headlines, another case has quietly taken something more important: a head start.

Even die-hard copyright geeks would be forgiven for overlooking a lawsuit first filed over three years ago by information services company Thomson Reuters against AI start-up Ross Intelligence. That’s because the case involves Westlaw, a legal research tool that’s about as sexy as the underwear section in a 1940s Sears catalog. I say this with peace and love as a longtime Westlaw user, but let’s be honest—headnotes and key numbers are simply no match for the likes of Sarah Silverman and John Grisham.

It’s time to start paying attention though, because a Delaware District Court judge just ordered this low-profile AI case to trial, largely denying the parties’ motions for summary judgment on copyright infringement and fair use (read the opinion here). This means that a jury could weigh in on some of the thorniest copyright questions involving artificial intelligence as early as May 2024.

Thomson Reuters v. Ross Intelligence

The issues at play in Thomson Reuters v. Ross Intelligence largely mirror those I’ve discussed in connection with recent class action copyright lawsuits filed against the creators of Stable Diffusion, ChatGPT and other generative AI tools. In a nutshell, plaintiffs allege that Ross hired a third-party contractor to unlawfully copy Westlaw content—including its proprietary Key Number System and case headnotes—in order to train Ross’s own AI-driven natural language legal search engine.

“But Your Honor, how am I supposed to get home?”

Unlike the creative works ingested by AI tools in the recent lawsuits filed against OpenAI and Stability AI, the copyrights in Westlaw are more limited. Thomson Reuters doesn’t own any of the underlying judicial opinions that make up its database. It does, however, claim copyright in its key number organization system as well as its original case summaries and headnote descriptions. These “editorial enhancements” are drafted by the company’s attorney-editors in what I’d imagine is the most thankless job this side of working for Louis Litt.

But according to Ross, it wasn’t interested in the Westlaw key numbers or headnotes. Instead, the goal of its system was for users to ask questions and for the search engine to spit out quotations directly from judicial opinions—no commentary necessary. In other words, Ross contends that the output of its tool won’t infringe any original copyrighted material owned by Thomson Reuters, notwithstanding the so-called “intermediate copies” of West’s key numbers and headnotes that may have been made to initially train Ross’s dataset. These copies, Ross claims, are fair use.

In January, Thomson Reuters moved for summary judgment on its copyright infringement claim, and both sides moved for summary judgment on Ross’s fair use defense.

Judge Stephanos Bibas ultimately declined to determine the scope of protection to be given the Key Number System or to decide whether Westlaw’s headnotes added sufficient non-trivial material to the underlying judicial opinions to meet copyright’s originality threshold. While the court did find that Ross committed an act of “actual copying” by scraping and reproducing headnotes during the AI training process, whether that copying constitutes infringement will depend on whether or not the headnotes are protected expression. That issue will be decided by a jury.

The court likewise ruled that a jury needs to decide whether there are substantial similarities in protectable expression (as opposed to unprotectable material) between Westlaw’s headnotes and summaries and thousands of “bulk memos” created by Ross’s third-party contractor to train Ross’s AI tool.

Fair Use

The court found disputed issues of fact on all four fair use factors, meaning that a jury will be tasked with answering most of the questions underlying this key defense.

The Purpose and Character of the Use

Interestingly, the court’s first factor analysis focused largely not on the commercial nature of Ross’s competing tool, but on disputes over whether Ross’s copying was transformative, an inquiry that some observers (but, ahem, not this one) thought would take a backseat following the Supreme Court’s recent Warhol decision.

Judge Bibas noted that whether Ross’s so-called “intermediate copying” (copies made during the input stage of the training process) was transformative would depend on the precise nature of Ross’s actions: “It was transformative intermediate copying if Ross’s AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes.” If, on the other hand, “Thomson Reuters is right that Ross used the untransformed text of headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney-editors,” then the copying would weigh against a transformative fair use. This raised a material question of fact that a jury needs to decide.

The Nature of the Copyrighted Work

While declining to definitively rule that Westlaw’s headnotes were too unoriginal to satisfy the second fair use factor, the judge certainly signaled that he didn’t think plaintiffs’ contributions were at the “core of intended copyright protection,” and specifically distinguished them from “traditionally protected materials, such as literary works or visual art.”

The Amount and Substantiality of the Copying

Because it was unclear how much of Ross’s copying was of protectable expression, the court found that a jury would need to decide the third fair use factor too. Interestingly, the court also noted that copying could be deemed insubstantial if Ross’s AI actually works in the way the company claimed—i.e., if the tool outputs only the unprotectable judicial opinion, not any original expression. This suggests that the presence or absence of substantial similarity at the output stage may influence the court’s input stage rulings as well.

The Effect of the Use Upon the Market for the Work

Finally, on the fourth fair use factor, the court declined to decide whether Ross’s use of Westlaw’s material had a “meaningful or significant effect” on the value of the original or its potential market. Focusing not merely on economic effects but also on the “public benefits” of the copying, the court concluded that a jury would be best situated to answer these questions:

“Deciding whether the public’s interest is better served by protecting a creator or a copier is perilous, and an uncomfortable position for a court. Copyright tries to encourage creative expression by protecting both. Here, we run into a hotly debated question: Is it in the public benefit to allow AI to be trained with copyrighted material?”

Thomson Reuters Enterprise Centre GmbH, et al. v. Ross Intelligence Inc.

This hotly debated question—and the ultimate question of fair use—will also be decided by a jury.

Why the Case Is Important

As a practical matter, most complex federal court lawsuits take at least three years to wind their way through court. A trial in the Thomson Reuters/Ross case is tentatively scheduled for May 2024, a full four years after the case was initially filed but likely long before any of the recently filed class action copyright lawsuits will be decided.

This means that the jury’s verdict in the Westlaw case, along with any ancillary legal rulings by the court, will be known to the parties, lawyers, and judges participating in other similar pending lawsuits. Depending upon how the jury comes out on fair use, this could prompt tech companies like OpenAI and Stability AI to either settle with the plaintiffs or, conversely, to dig in their heels. At the same time, companies producing new AI tools will no doubt take the outcome of the first AI copyright trial into account in determining whether their business models will include the licensing of scraped content, opt-in or opt-out models, and the extent to which these new tools will make use of so-called “ethically sourced” datasets.

In other words, while a trial over Westlaw data may not be glamorous, it will still be worth following closely.

As always, I’d love to know what you think. Drop me a note in the comments below or @copyrightlately on social media. Meanwhile, here’s a copy of the court’s opinion in Thomson Reuters Enterprise Centre GmbH, et al. v. Ross Intelligence Inc.:

2 comments
  1. Thanks for the excellent case note. My initial reaction is not to any of the merits, but to the process. Does it not seem less than ideal to have any of this decided by a jury, with the possible exception of the factual question about market harm? I suppose juries do determine important community standards. We learn that in Torts class. But the technical copyright questions in this case are difficult even for copyright lawyers. And having a jury decide the potentially huge policy questions is also perilous or at least odd. Do you think the appellate courts will leave these questions to the jury?

    1. The Supreme Court held a couple of years ago in Google v. Oracle that fair use is a mixed question of fact and law, which means that the jury is supposed to make findings of underlying facts, but the ultimate question of whether those facts amount to fair use is a legal question for the judge. As a practical matter though, juries aren’t always given discrete questions to answer and are instead asked to decide the ultimate issue of fair use, subject to the judge’s review. And at least theoretically, the appellate courts are there to hopefully make sure the whole thing isn’t screwed up. Here’s an example of a fair use trial in which, in my opinion, the jury got it wrong, as did the district judge (appeal TBD): https://copyrightlately.com/tattoo-artist-trial-victory-copyright-lawsuit/
