When accurate AI is still dangerously incomplete

Enterprises typically prioritize accuracy when building, training and deploying AI. That is important, no doubt, but in highly complex, nuanced industries like law, accuracy alone isn’t enough. Higher stakes mean higher standards: Model outputs must be assessed for relevancy, authority, citation accuracy and hallucination rates.

To tackle this immense task, LexisNexis has evolved beyond standard retrieval-augmented generation (RAG) to graph RAG and agentic graphs; it has also built out "planner" and "reflection" AI agents that parse requests and criticize their own outputs.

“There’s no such [thing] as ‘perfect AI’ because you never get 100% accuracy or 100% relevancy, especially in complex, high stake domains like legal,” Min Chen, LexisNexis' SVP and chief AI officer, acknowledges in a new VentureBeat Beyond the Pilot podcast.

The goal is to manage that uncertainty as much as possible and translate it into consistent customer value. “At the end of the day, what matters most for us is the quality of the AI outcome, and that is a continuous journey of experimentation, iteration and improvement,” Chen said.

Getting ‘complete’ answers to multi-faceted questions

To evaluate models and their outputs, Chen’s team has established more than a half-dozen “sub metrics” that measure “usefulness” across several factors, including authority, citation accuracy and hallucination rates, as well as “comprehensiveness.” This last metric evaluates whether a gen AI response fully addresses all aspects of a user's legal question.

“So it's not just about relevancy,” Chen said. “Completeness speaks directly to legal reliability.”

For instance, a user may ask a question that requires an answer covering five distinct legal considerations. Gen AI may provide a response that accurately addresses three of these. But, while relevant, this partial answer is incomplete and, from a user perspective, insufficient. This can be misleading and pose real-life risks.
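The three-out-of-five case can be expressed as a simple coverage score. The sketch below is illustrative only and is not LexisNexis' metric: the consideration labels, the `comprehensiveness` function and the idea that a reviewer or judge model supplies the set of addressed considerations are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CoverageResult:
    covered: list[str]
    missing: list[str]

    @property
    def score(self) -> float:
        # Fraction of required considerations the response addressed.
        total = len(self.covered) + len(self.missing)
        return len(self.covered) / total if total else 1.0


def comprehensiveness(required: list[str], addressed: set[str]) -> CoverageResult:
    """Split the required considerations into covered and missing ones."""
    covered = [c for c in required if c in addressed]
    missing = [c for c in required if c not in addressed]
    return CoverageResult(covered, missing)


# Hypothetical labels for the five considerations a question requires.
required = ["jurisdiction", "limitations period", "standing", "damages", "defenses"]
# Hypothetical output of a reviewer or judge model: what the response covered.
addressed = {"jurisdiction", "standing", "damages"}

result = comprehensiveness(required, addressed)
print(f"score={result.score:.2f}, missing={result.missing}")
```

Every statement in such a response could be accurate and relevant, yet the score stays at 0.60 and the two missing considerations are named explicitly, which is the gap a separate comprehensiveness metric is meant to surface.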

In another example, some citations may be semantically relevant to a user's question but point to arguments or instances that were ultimately overruled in court. “Our lawyers will consider them not citable,” Chen said. “If they're not citable, they're not useful.”

Moving beyond standard RAG

LexisNexis launched its flagship gen AI product, Lexis+ AI — a legal AI tool for drafting, research and analysis — in 2023. It was built on a standard RAG framework and hybrid vector search that grounds responses in LexisNexis' trusted, authoritative knowledge base.

The company then released its personal legal assistant, Protégé, in 2024. This agent incorporates a knowledge graph layer on top of vector search to overcome a “key limitation” of pure semantic search. Although “very good” at retrieving contextually relevant content, semantic search “doesn't always guarantee authoritative answers,” Chen said.

Initial semantic search returns what it deems relevant content; Chen’s team then traverses those returns across a “point of law” graph to further filter the most highly authoritative documents.
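A toy version of that two-stage flow might look like the following. This is a sketch under stated assumptions, not the production system: the embeddings, the `DOCS` corpus, the shape of `POINT_OF_LAW_GRAPH` and the "good law" and "overruled" labels are all hypothetical stand-ins for whatever the real graph encodes.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical corpus: document id -> (embedding, point of law it addresses).
DOCS = {
    "case_a": ([0.9, 0.1], "adverse possession"),
    "case_b": ([0.8, 0.2], "adverse possession"),
    "case_c": ([0.1, 0.9], "easements"),
}

# Hypothetical graph: point of law -> {document id: current treatment}.
POINT_OF_LAW_GRAPH = {
    "adverse possession": {"case_a": "overruled", "case_b": "good law"},
    "easements": {"case_c": "good law"},
}


def semantic_search(query_vec: list[float], k: int = 2) -> list[str]:
    """Stage 1: rank documents by embedding similarity and keep the top k."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d][0]), reverse=True)
    return ranked[:k]


def filter_by_authority(candidates: list[str]) -> list[str]:
    """Stage 2: keep only candidates the graph still marks as good law."""
    kept = []
    for doc_id in candidates:
        point = DOCS[doc_id][1]
        if POINT_OF_LAW_GRAPH.get(point, {}).get(doc_id) == "good law":
            kept.append(doc_id)
    return kept


candidates = semantic_search([1.0, 0.0])
authoritative = filter_by_authority(candidates)
print(candidates, authoritative)
```

Here the most similar document, `case_a`, ranks first in the semantic stage but is dropped in the graph stage because it is marked overruled, so relevance alone does not decide what reaches the answer.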

Going beyond this, Chen's team is developing agentic graphs and accelerating automation so agents can plan and execute complex multi-step tasks.

For instance, self-directed “planner agents” for research Q&A break user questions into multiple sub-questions. Human users can review and edit these to further refine and personalize final answers. Meanwhile, a “reflection agent” handles transactional document drafting. It can “automatically, dynamically” criticize its initial draft, then incorporate that feedback and refine in real time.
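The control flow of both agent types can be sketched without any model calls. Everything below is a hypothetical stand-in: a real planner and reflection agent would call a language model, whereas `plan`, `critique` and `revise` here are simple rule-based placeholders chosen only to show the loop structure.

```python
from typing import Callable


def plan(question: str) -> list[str]:
    """Stand-in planner: split a compound question into sub-questions."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]


def reflect_and_refine(
    draft: str,
    critique: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> str:
    """Critique the draft, fold the feedback back in, and repeat until clean."""
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft


# Hypothetical checklist used by the stand-in critic.
REQUIRED_CLAUSES = ["Governing Law", "Termination"]


def critique(draft: str) -> list[str]:
    """Stand-in critic: report required clauses the draft does not mention."""
    return [c for c in REQUIRED_CLAUSES if c not in draft]


def revise(draft: str, issues: list[str]) -> str:
    """Stand-in reviser: append a placeholder for each missing clause."""
    return draft + "".join(f"\n{c}: [to be drafted]" for c in issues)


sub_questions = plan("What is the notice period and who bears the costs?")
final = reflect_and_refine(
    "Services Agreement\nTermination: 30 days notice", critique, revise
)
print(sub_questions)
print(final)
```

The list returned by `plan` is the natural point for the human review step described above, since a user could edit or reorder the sub-questions before any answer is generated. The `max_rounds` cap is a design choice of this sketch to keep the self-critique loop from running indefinitely.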

However, none of this is meant to cut humans out of the mix. Human experts and AI agents can “learn, reason and grow together,” Chen said. “I see the future [as] a deeper collaboration between humans and AI.”

Watch the podcast to hear more about:

How LexisNexis’ acquisition of Henchman helped ground AI models with proprietary LexisNexis data and customer data;
The difference between deterministic and non-deterministic evaluation;
Why enterprises should identify KPIs and definitions of success before rushing to experimentation;
The importance of focusing on a “triangle” of key components: Cost, speed and quality.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
