Nearly 1 In 5 Queries Cause Legal AI Tools To Hallucinate

Nearly one in five queries caused leading legal artificial intelligence tools to respond with misleading or false information, according to a report released Thursday, a reminder that users should remain cautious about these platforms' outputs.

Stanford University's Institute for Human-Centered Artificial Intelligence released the report, called "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," which found that AI research tools from LexisNexis and Thomson Reuters each hallucinate more than 17% of the time.

Hallucination refers to false output from a generative AI tool. The report defined hallucinations as outputs that are "unfaithful to the true facts of the world," which makes them especially risky in the high-stakes legal domain.

Jeff Pfeifer, the chief product officer for LexisNexis North America and U.K., told Law360 Pulse that the Stanford researchers did not contact LexisNexis and that its own data analysis "suggests a much lower rate of hallucination."

Lexis+ AI, the LexisNexis generative AI tool tested in the study, produces linked legal citations, which means a user can review the cited reference, according to Pfeifer.

"In the rare instance that a citation appears without a link, it is an indication that we cannot validate the citation against our trusted data set," Pfeifer said. "This is clearly noted within the product for user awareness and customers can easily provide feedback to our development teams to support continuous product improvement."

Researchers tested the tools with a preregistered dataset of over 200 legal queries. The outputs were reviewed for accuracy and fidelity to authority.

The study concluded that Lexis+ AI and Thomson Reuters' Ask Practical Law AI were less prone to hallucination than GPT-4, the large language model from OpenAI.

Researchers also found that Thomson Reuters' answers were incomplete more than 60% of the time and that its system provided an accurate response only 19% of the time. LexisNexis gave accurate responses 65% of the time, more than three times the rate of Thomson Reuters.

Among complete, responsive answers, Thomson Reuters' system hallucinated at a rate similar to GPT-4's, and more than twice as often as LexisNexis' system.

"In this study, Stanford used Practical Law's Ask Practical Law AI for primary law legal research, which is not its intended use, and would understandably not perform well in this environment," a spokesperson for Thomson Reuters told Law360 Pulse. "Westlaw's AI-Assisted Research is the right tool for this work."

The spokesperson added that Thomson Reuters has made that product available to the Stanford team to help it develop the next phase of its research.

"The Stanford study acknowledges Reuters also offers a product called 'AI-Assisted Research' that appears to have access to additional primary source material as well (Thomson Reuters, 2023)," a spokesperson for the researchers told Law360 Pulse. "However, the research notes this product is not yet generally available, and multiple requests for access were denied by the company at the time the researchers conducted the evaluation. "

Retrieval-Augmented Generation

Retrieval-augmented generation is a framework that experts say can improve the output of generative AI tools. Rather than relying on a model's training data alone, the approach retrieves relevant documents and supplies them to the model as context for its answer.
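A minimal sketch of the idea, in Python, might look like the following. Everything in it is hypothetical: the two-document corpus, the keyword-overlap retriever and the stubbed generate() function stand in for the real search index and language model a vendor would use.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# All names and data below are illustrative, not any vendor's system.

CORPUS = [
    "Smith v. Jones (a hypothetical case) addressed contract damages.",
    "In re Example Corp. (a hypothetical case) addressed lien priority.",
]


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate(prompt: str) -> str:
    """Stand-in for a language-model call; a real system would send
    the prompt to an LLM here and return its completion."""
    return f"[model completion for a {len(prompt)}-character prompt]"


def answer(query: str) -> str:
    # Retrieved passages are prepended to the question so the model
    # answers from them rather than from its training data alone.
    context = "\n".join(retrieve(query, CORPUS))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)


print(answer("What did Smith v. Jones say about contract damages?"))
```

Even with relevant passages in the prompt, a model can still misquote or misapply them, which is the residual hallucination the Stanford study measured.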

Legal AI vendors, including Thomson Reuters and LexisNexis, often tout the framework as a way to reduce hallucinations.

Researchers say these claims are overstated, pointing to a lack of empirical evidence and to the closed nature of the systems, which makes the claims difficult to assess.

While the study found that the framework can reduce hallucinations, it did not eliminate them.

The study concluded that retrieval-augmented generation may not be able to fully eliminate hallucinations because retrieving relevant information is challenging in the legal domain, document relevance in law is not based on text alone, and generating meaningful legal text can be complicated.

As a solution, researchers suggested more rigorous, transparent benchmarking and public evaluations of AI tools in law.

--Editing by Karin Roberts.

Law360 is owned by LexisNexis Legal & Professional, a RELX Group company.

Update: The story has been updated with a statement from the researchers.


