Researchers Find Legal Errors 'Pervasive' In Top AI Models

Large language models regularly give incorrect responses when asked legal questions, making it crucial for this technology to be supervised when used in law practice, according to a recently published study by researchers at Stanford University's RegLab and Institute for Human-Centered Artificial Intelligence.

The researchers found that large language models from top tech companies "hallucinated," or produced text with incorrect information, 69% to 88% of the time when answering legal questions, and their performance worsened as the questions got more difficult.

The models did no better than random guessing when asked to measure the precedential relationship between two cases and "hallucinated" at least 75% of the time when asked about a court's main ruling, according to the study.

"These findings suggest that LLMs are not yet able to perform the kind of legal reasoning that attorneys perform when they assess the precedential relationship between cases — a core objective of legal research," the researchers said in a blog post Thursday.

The researchers asked OpenAI's GPT-3.5, Meta's Llama 2 and Google's PaLM 2 more than 200,000 legal questions of varying difficulty, according to the study. The questions ranged in complexity from who wrote a court opinion to whether two cases agree with each other.

The researchers found that the models got it wrong more frequently when asked about decisions from lower courts than when asked about rulings from higher courts like the U.S. Supreme Court.

"This suggests that LLMs may struggle with localized legal knowledge that is often crucial in lower court cases, and calls into doubt claims that LLMs will reduce long-standing access to justice barriers in the United States," the researchers said.

The researchers also found that the models were wrong more often when asked about the Supreme Court's oldest and newest cases, suggesting that their peak performance lags behind current legal doctrine.

In addition, the researchers found that the AI models are vulnerable to what they called "contra-factual bias," treating a false premise in a legal question as correct.

Meta, Google and OpenAI did not respond to requests for comment.

Since OpenAI released its chatbot ChatGPT in November 2022, several legal tech companies and law firms have developed their own tools with the technology.

While OpenAI's GPT-3.5 and more advanced GPT-4 are popular models to use for legal tools, law firms and legal tech companies are experimenting with models from other providers like Meta and Google.

In November, LegalOn Technologies published a study saying that OpenAI's GPT-4 scored higher than the average law student on multiple-choice legal ethics questions.

--Editing by Robert Rudinger.
