    When accurate AI is still dangerously incomplete

    February 18, 2026
    Typically, when building, training and deploying AI, enterprises prioritize accuracy. That is no doubt important, but in highly complex, nuanced industries like law, accuracy alone isn't enough. Higher stakes mean higher standards: Model outputs must be assessed for relevancy, authority, citation accuracy and hallucination rates.

    To tackle this immense task, LexisNexis has evolved beyond standard retrieval-augmented generation (RAG) to graph RAG and agentic graphs; it has also built out "planner" and "reflection" AI agents that parse requests and critique their own outputs.

    “There’s no such [thing] as ‘perfect AI’ because you never get 100% accuracy or 100% relevancy, especially in complex, high stake domains like legal,” Min Chen, LexisNexis' SVP and chief AI officer, acknowledges in a new VentureBeat Beyond the Pilot podcast. 

    The goal is to manage that uncertainty as much as possible and translate it into consistent customer value. “At the end of the day, what matters most for us is the quality of the AI outcome, and that is a continuous journey of experimentation, iteration and improvement,” Chen said. 


    Getting ‘complete’ answers to multi-faceted questions

    To evaluate models and their outputs, Chen’s team has established more than a half-dozen “sub-metrics” to measure “usefulness” based on several factors — authority, citation accuracy, hallucination rates — as well as “comprehensiveness.” This particular metric is designed to evaluate whether a gen AI response fully addressed all aspects of a user's legal question.

    “So it's not just about relevancy,” Chen said. “Completeness speaks directly to legal reliability.”

    For instance, a user may ask a question that requires an answer covering five distinct legal considerations. Gen AI may provide a response that accurately addresses three of these. But, while relevant, this partial answer is incomplete and, from a user perspective, insufficient. This can be misleading and pose real-life risks.

    Or, for example, some citations may be semantically relevant to a user's question, but they may point to arguments or instances that were ultimately overruled in court. “Our lawyers will consider them not citable,” Chen said. “If they're not citable, they're not useful.”
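The sub-metric approach described above can be sketched as a small scoring harness. This is a minimal illustration only: the metric names come from the article, but the dataclass, weighting scheme and the `usefulness` aggregation are illustrative assumptions, not LexisNexis' actual implementation.

```python
# Hypothetical sketch of a multi-sub-metric evaluation harness in the spirit
# of the article. The equal weighting and the hallucination penalty are
# illustrative assumptions.
from dataclasses import dataclass


@dataclass
class EvalResult:
    relevancy: float           # 0..1, semantic match to the question
    citation_accuracy: float   # fraction of citations that resolve correctly
    authority: float           # fraction of citations still "citable" (good law)
    hallucination_rate: float  # fraction of unsupported claims (lower is better)
    comprehensiveness: float   # fraction of distinct legal issues addressed


def usefulness(r: EvalResult) -> float:
    """Aggregate sub-metrics into a single 0..1 usefulness score."""
    positive = (r.relevancy + r.citation_accuracy
                + r.authority + r.comprehensiveness) / 4
    # Hallucinations discount everything else, however accurate it is.
    return positive * (1.0 - r.hallucination_rate)


# The article's example: a response covering 3 of 5 legal considerations is
# penalized on comprehensiveness even though everything it says is accurate.
partial = EvalResult(relevancy=0.95, citation_accuracy=1.0, authority=1.0,
                     hallucination_rate=0.0, comprehensiveness=3 / 5)
complete = EvalResult(relevancy=0.95, citation_accuracy=1.0, authority=1.0,
                      hallucination_rate=0.0, comprehensiveness=1.0)
```

Separating comprehensiveness from relevancy, as above, is what lets an evaluator flag an answer that is entirely correct yet still incomplete.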

    Moving beyond standard RAG

    LexisNexis launched its flagship gen AI product, Lexis+ AI — a legal AI tool for drafting, research and analysis — in 2023. It was built on a standard RAG framework and hybrid vector search that grounds responses in LexisNexis' trusted, authoritative knowledge base. 

    The company then released its personal legal assistant, Protégé, in 2024. This agent incorporates a knowledge graph layer on top of vector search to overcome a “key limitation” of pure semantic search. Although “very good” at retrieving contextually relevant content, semantic search “doesn't always guarantee authoritative answers," Chen said.

    Initial semantic search returns what it deems relevant content; Chen’s team then traverses those returns across a “point of law” graph to further filter the most highly authoritative documents.
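The two-stage pipeline described above can be sketched as semantic retrieval followed by a graph-based authority filter. Everything here is an assumption for illustration: the keyword-overlap stand-in for vector search, the dict-based point-of-law graph, and the `overruled` flag are hypothetical, not LexisNexis' actual data model.

```python
# Hypothetical sketch of graph-filtered retrieval: a first semantic pass,
# then a second pass that keeps only documents whose cited points of law
# are still good law. All structures are illustrative assumptions.

def semantic_search(query, corpus, top_k=10):
    """Stand-in for vector search: rank documents by naive keyword overlap."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: -len(q & set(d["text"].lower().split())))
    return ranked[:top_k]


def filter_by_authority(hits, law_graph):
    """Second pass: traverse each hit's point-of-law nodes and keep only
    documents whose every cited point remains citable (not overruled)."""
    citable = []
    for doc in hits:
        points = law_graph.get(doc["id"], [])
        if points and all(not p["overruled"] for p in points):
            citable.append(doc)
    return citable


# Tiny worked example: both cases are semantically relevant, but one relies
# on a point of law that was later overruled.
corpus = [
    {"id": "case-1", "text": "breach of contract damages precedent"},
    {"id": "case-2", "text": "contract damages ruling later overruled"},
]
law_graph = {
    "case-1": [{"point": "expectation damages", "overruled": False}],
    "case-2": [{"point": "penalty clause rule", "overruled": True}],
}
hits = semantic_search("contract damages", corpus, top_k=2)
citable = filter_by_authority(hits, law_graph)
```

The point of the second pass mirrors Chen's remark: a semantically relevant but overruled citation is filtered out because it is no longer useful.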

    Going beyond this, Chen's team is developing agentic graphs and accelerating automation so agents can plan and execute complex multi-step tasks. 

    For instance, self-directed “planner agents” for research Q&A break user questions into multiple sub-questions, which human users can review and edit to further refine and personalize final answers. Meanwhile, a “reflection agent” handles transactional document drafting: It can “automatically, dynamically” critique its initial draft, then incorporate that feedback and refine it in real time.
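The planner and reflection patterns above can be sketched as two small loops around a language model. The `llm` callable, its prompts, and the deterministic `fake_llm` stand-in are all hypothetical scaffolding for illustration, not LexisNexis' agents.

```python
# Hypothetical sketch of a planner agent and a reflection loop. `llm` is any
# callable taking a prompt string and returning text; the prompts and the
# stopping rule are illustrative assumptions.

def plan(question, llm):
    """Planner agent: break a question into sub-questions, one per line,
    which a human can then review and edit."""
    return llm(f"List sub-questions for: {question}").splitlines()


def reflect_and_refine(task, llm, max_rounds=3):
    """Reflection agent: draft, critique the draft, revise, repeat."""
    draft = llm(f"Draft: {task}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this draft: {draft}")
        if "OK" in critique:  # critic is satisfied; stop revising
            break
        draft = llm(f"Revise using critique.\nDraft: {draft}\n"
                    f"Critique: {critique}")
    return draft


def fake_llm(prompt):
    """Deterministic stand-in for a real model, so the sketch is runnable."""
    if prompt.startswith("List sub-questions"):
        return "What is the governing statute?\nIs there controlling precedent?"
    if prompt.startswith("Critique"):
        return "OK" if "revised" in prompt else "Add a citation."
    if prompt.startswith("Revise"):
        return "revised draft with citation"
    return "initial draft"


sub_questions = plan("Is the clause enforceable?", fake_llm)
final = reflect_and_refine("draft an NDA clause", fake_llm)
```

Returning the planner's sub-questions as an editable list, rather than answering directly, is what leaves room for the human review step the article describes.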

    None of this, however, is meant to cut humans out of the mix, Chen said; human experts and AI agents can “learn, reason and grow together.” “I see the future [as] a deeper collaboration between humans and AI.”

    Watch the podcast to hear more about: 

    • How LexisNexis’ acquisition of Henchman helped ground AI models with proprietary LexisNexis data and customer data; 

    • The difference between deterministic and non-deterministic evaluation; 

    • Why enterprises should identify KPIs and definitions of success before rushing to experimentation;

    • The importance of focusing on a “triangle” of key components: Cost, speed and quality.

    You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.


