First AI Copyright Trial Starts This Week: What to Know

The trial in Thomson Reuters v. Ross Intelligence may not be glamorous, but it will be groundbreaking.

On Friday, August 23, jurors are scheduled to hear opening statements in the first trial to test whether using copyrighted data to train an AI program qualifies as fair use.

The trial won’t take place in Silicon Valley, and Sarah Silverman and John Grisham won’t be taking the stand. OpenAI, ChatGPT and Midjourney aren’t involved either. Instead, in a nondescript federal courtroom in Wilmington, Delaware, information services companies Thomson Reuters and West Publishing—owners of Westlaw—will face off against Ross Intelligence, a legal research startup they effectively forced out of business. In other words, this trial might be as thrilling as a Kia Soul driven by Mike Pence.

But make no mistake, all eyes in the AI industry will be watching closely, searching for hints on how a group of everyday citizens may view some of the thorniest copyright questions facing the emerging technology. That’s because the issues at play in Thomson Reuters v. Ross Intelligence largely mirror those raised in the more than two dozen pending copyright lawsuits against the creators of Stable Diffusion, ChatGPT and other AI tools. But unlike those cases, this one was first filed way back in 2020, before generative AI was poised to change the world (and also before it was spitting out fake cat videos and telling us to put glue on pizza).

The plaintiffs allege that Ross hired a third-party contractor to unlawfully copy Westlaw content—including its proprietary Key Number System and case headnotes—in order to train Ross’s own AI-driven natural language legal search engine. I first wrote about this case last year, when Third Circuit Court of Appeals Judge Stephanos Bibas (sitting by designation) largely denied both parties’ motions for summary judgment, setting the stage for the jury trial that’s about to begin. While many of the pretrial documents, including witness lists, remain confidential and under seal, here’s what I can tell you so far:

UPDATE (August 22, 2024): Judge Bibas has just postponed trial in the Thomson Reuters v. Ross Intelligence AI copyright case that was scheduled to start tomorrow. The judge held a Zoom hearing on Tuesday, during which he appeared to find that Ross committed acts of infringement as a matter of law by copying West’s headnotes to train its AI system, but he’s now rescinded that oral ruling and has invited the parties to bring new summary judgment motions on copyrightability, validity, infringement and fair use. This case has already been pending for over four years, and the latest setback serves as a reminder that, absent voluntary settlements, we may not have any clarity for years to come on the legal issues at stake in the more than two dozen gen-AI copyright cases filed against OpenAI, Stability AI, Microsoft, Midjourney and others.

The Trial Will Be Short (Supposedly)

Trial is scheduled to last only five days, which is pretty quick considering the complexity of the issues involved. Jury selection was originally supposed to begin on August 26, but (per the transcript of the Pretrial Conference) it’s now scheduled for August 23, because Judge Bibas was concerned about the jury rushing its deliberations heading into the Labor Day weekend. The judge plans to conduct a quick voir dire on Friday morning, August 23, followed by opening statements (lasting an hour each side) on Friday afternoon.

Thomson Reuters is expected to call its first witness on Monday, August 26. Each side will be allotted 15 hours of trial time, plus 90 minutes for closing argument. Given that there’s just over six hours of trial time each day, excluding recesses and lunch breaks, it’s frankly hard to see this trial being completed before Labor Day. (Sorry, future jurors.)

The Case Involves the Copying of Westlaw’s Headnotes

Unlike the creative works ingested by AI tools in the recent lawsuits filed against OpenAI, Microsoft, and others, the copyrights in Westlaw are more limited. Thomson Reuters doesn’t own any of the underlying judicial opinions included in its database. It does, however, claim protection in its original headnotes, key number organization system and case summaries. Last September, the court ruled as a matter of law that Ross committed an act of “actual copying” by scraping and reproducing this content when it trained its own fledgling AI legal research tool.

But whether Ross’s copying constitutes infringement will depend on whether the jury finds that the headnotes and other Westlaw content are original creative expression and whether Ross appropriated protectable elements from that content. Expect Thomson Reuters to introduce evidence that its legally-trained editors exercise editorial discretion, skill, and judgment when drafting headnotes and case summaries.

Earlier this month, Judge Bibas largely denied Ross’s request for the court to “filter out” (and withhold from the jury) allegedly uncopyrightable material from the Westlaw content, deciding that the jury will “consider all headnotes that it could reasonably find are original and thus eligible for copyright protection.” But the judge did invite Ross to submit a list of the headnotes it contends are verbatim or near verbatim quotations of judicial opinions. This is the kind of stuff the parties are fighting over:

As copyright protection goes, it may be only slightly more creative than a telephone directory. Still, Thomson Reuters contends that the headnotes differ sufficiently from their corresponding (unprotectable) judicial opinions for the jury to find copyright protection. The plaintiffs also claim protection in the “selection and arrangement” of the headnotes, arguing that “[t]here is a reason Plaintiffs hire attorneys to do this work; the selection of which concepts to headnote, how many headnotes to create, and which case passages to link those concepts to requires judgment and choice.” Thomson Reuters will attempt to show that Ross used the associations between the Westlaw headnotes and the corresponding passages from judicial opinions to train its AI system.

Fair Use is a Key Defense

While Ross disputes that the Westlaw content constitutes protectable expression, I expect that much of its defense will focus on proving that any copying of that content qualifies as fair use. This is where the trial should get interesting—and particularly relevant to the larger debate over the legality of generative AI.

If Ross’s AI tool only studied the language patterns in the Westlaw headnotes to learn how to produce its own summaries or search results, the jury could find that the copying served a “transformative purpose,” which weighs in favor of fair use. If, on the other hand, the jury finds that Ross used the headnote text to output Westlaw’s creative expression for its own competitive commercial purposes, the copying would weigh against a finding of fair use.

“This is one of those unsettled issues because your defense is the way you’re using something to crawl over it in AI is not—maybe they would disagree with this, but arguably might not be extracting a creative art but just underlying, unprotected facts. I’m not saying—I’m saying I understand Defendant’s position in taking that argument.”

Judge Stephanos Bibas, comments during August 6, 2024 Pretrial Conference in Thomson Reuters et al. v. Ross Intelligence Inc.

There will also be a major battle over whether Ross’s use of original Westlaw material had an appreciable negative effect on the value of the Westlaw content or its potential market (the so-called “fourth fair use factor”). Thomson Reuters contends that Ross’s tool was designed to compete directly with Westlaw, targeting the same legal market.

“This is not, you know, 2 Live Crew using another person’s music where the people who are going to buy country style ‘Pretty Woman’ are different than the people who are going to buy the parody. This is selling to the same market and the same people, so you’ve got a strong position that this is going to undercut the value of your copyright for that market.”

Judge Bibas, comments during August 6, 2024 Pretrial Conference

In addition to economic effects, the judge has signaled that he’s interested in the potential “public benefits” of the copying as well. In denying summary judgment last year, he wrote that “Deciding whether the public’s interest is better served by protecting a creator or a copier is perilous, and an uncomfortable position for a court. Copyright tries to encourage creative expression by protecting both.”

And earlier this month, Judge Bibas denied Thomson Reuters’s motion in limine to exclude testimony, evidence, or argument about the public benefit of generative artificial intelligence (although he also ruled that if Ross introduces such evidence, it will need to clarify the extent to which its technology differs from the more advanced generative AI tools used today). Among Ross’s proposed jury instructions: “[E]ven if you find Westlaw’s copyrights to be valid, you must answer the question of whether it is in the public benefit to allow AI to be trained with copyrighted material about judicial decisions and therefore, fair use?”

Who Will Testify?

Expert Witnesses

Based on what I’ve been able to piece together from the hearing transcripts and the redacted versions of under-seal filings, Thomson Reuters is likely to call two expert witnesses. First is Dr. Jonathan L. Krein, an expert in machine learning and AI, who will testify as to the similarities between the Westlaw headnotes and Ross’s “Bulk Memo Project,” a set of 25,000 legal questions and answers that Ross used to train its AI tool. Thomson Reuters contends that the Bulk Memo Project consisted of little more than Westlaw headnotes with question marks at the end. Dr. Krein will also testify that there is a licensing market for the use of plaintiffs’ Westlaw materials as potential AI training data. Plaintiffs’ second expert, James E. Malackowski, is expected to testify regarding the market for artificial intelligence training data as well as Thomson Reuters’s claimed damages.

Meanwhile, Ross will call up to four expert witnesses of its own. They include Barbara Frederiksen-Cross, an expert in forensic software analysis who analyzed the differences between the legal questions presented in Ross’s Bulk Memo Project and Westlaw’s headnotes. Alan J. Cox, Ph.D. will counter plaintiffs’ experts on damages, lost profits, and market harm. Ross may also call Richard Leiter, a legal research professor at the University of Nebraska College of Law who is an expert on legal information technology issues. Ross’s final expert, Dr. L. Karl Branting, died unexpectedly on July 18. After noting that Ross had “nothing to do with the untimely demise,” Judge Bibas allowed Ross to replace Dr. Branting with Joe Marks, Ph.D., who serves as the Executive Director for the Center for Machine Learning and Health at Carnegie Mellon University.

At the pretrial conference, Judge Bibas said he’ll give the jury an instruction “explaining that it’s no fault of Ross’s that the previous expert died.” (Seriously.)

Percipient Witnesses

The parties collectively took several dozen depositions during the course of discovery, but until their pretrial filings are unsealed, it’s unclear who exactly will testify at trial. Given the time crunch, I expect that the parties will play short video clips of key deposition testimony for the jury, but we should also see some live witnesses. They will likely include representatives of Thomson Reuters and West Publishing, as well as individuals from LegalEase, the third-party contractor that prepared the Bulk Memo Project. I think we’ll also hear from Jimoh Ovbiagele, a Ross co-founder and former Chief Technology Officer who was responsible for training Ross’s AI tool, as well as fellow co-founder Andrew Arruda, the former Ross CEO.

The Bottom Line

While a trial over copied Westlaw data won’t attract the same attention as those involving Johnny Depp and Alec Baldwin, it just may be pivotal for the future of AI.

The jury’s verdicts in the case, along with any ancillary legal rulings by the court, will be known to the parties, lawyers, and judges participating in other similar pending lawsuits. Depending upon how the jury comes out on fair use, this could prompt tech companies like OpenAI and Stability AI to either settle with other plaintiffs or, conversely, to dig in their heels. At the same time, companies producing new AI tools will no doubt take the outcome of the first AI copyright trial into account in determining whether their business models will include the licensing of scraped content, opt-in or opt-out models, and the extent to which these new tools will make use of so-called “ethically sourced” datasets.

Stay tuned for updates, and in the meantime, I’d love to hear what you think. As always, you can drop me a note in the comments below or @copyrightlately on social media.

2 comments
  1. I find it hard to believe that in the end a jury will (or should) decide the rather profound policy questions that are posed in this case! I suppose there are a few factual disputes involved, but how germane they are to deciding the big issues remains to be seen. As you say, many will be interested in the outcome of this case. I’ll add one to your list: Wall Street, which is showing signs of skepticism about AI. If as a legal matter Tech has to “license” training data, that could fuel the skeptics. Perhaps the case will be decided for the defendants without reaching fair use, though as you indicate, the originality standards involved are not high.

  2. Excellent summary Aaron. Lots of questions and few answers about what Judge Bibas did.
    Why did Judge Bibas abort the trial the day before it was to start? No reason was given on the docket sheet.
    Why did the judge direct the parties to essentially renew the same motions on the same record the judge in 2023 found raised material issues of fact?
    It seems the judge, after admitting in a pretrial conference on August 6 that most of the issues were “well beyond my pay grade” and were for “the jury,” had second thoughts?
    But except in cases where a directed verdict is warranted, a judge can’t take fact issues away from a jury, no matter how difficult the issues may be.
    And there are plenty of fact questions in this case as the judge outlined in his 2023 opinion denying the parties’ cross-motions for summary judgment.
    As you say Aaron this case may have been a bellwether because it raised the fair use issue at the heart of the many infringement cases (29 at last count) against the AI developers.
    Now we wait till sometime in 2025 for the judge to decide the renewed motions. Uggh.
