The Production Gap in Agentic AI

Synechron

Summary

Most agentic AI programs in financial services stall between pilot and production because the surrounding systems were never built to support autonomous decisioning.
The firms moving past that gap are redesigning the operating model around AI rather than layering it on top of legacy processes built for certainty.
Production-grade deployments share a clear pattern: a defined boundary between rule-based logic and agent-led actions, governance built into the architecture and a single accountable agent coordinating specialized agents with full auditability.
Success criteria, baselines and measurable outcomes are set before deployment, with time built in for systems to stabilize.

Synechron recently hosted a panel in Montreal. Murex, AWS, Salesforce and BNP Paribas were in the room and spent two hours comparing notes on what's working and what isn't.

A set of common themes kept surfacing. Most matched what we see in our work with banks.

Most banking systems run on one assumption: one input, one process, one answer, always the same. Risk calculations, payments, regulatory reporting. That part is intentional. It's a legal requirement. AI agents work differently. They reason through situations where the answer isn't set in advance. When banks drop that into processes built for certainty, without deciding what stays rule-based and what gets handed to an agent, things break in ways that are hard to explain to a regulator. Marc Natale from Murex was direct: get that line wrong, and the agent becomes the liability.
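One way to make that line explicit is to write it down as a routing rule rather than leave it to convention. The sketch below is illustrative only: the task names and the `route_task` function are hypothetical, not something any panelist described. It assumes a bank keeps a list of deterministic tasks and a separate list of tasks cleared for agents, and refuses anything that appears on neither.

```python
from enum import Enum


class Route(Enum):
    RULE_BASED = "rule_based"
    AGENT = "agent"


# Tasks that must always produce the same answer for the same input.
DETERMINISTIC_TASKS = {"risk_calculation", "payment", "regulatory_report"}

# Hypothetical examples of tasks a bank might clear for agent handling.
AGENT_ELIGIBLE_TASKS = {"exception_triage", "document_summary"}


def route_task(task_type: str) -> Route:
    """Return where a task runs; unknown tasks are rejected, never defaulted."""
    if task_type in DETERMINISTIC_TASKS:
        return Route.RULE_BASED
    if task_type in AGENT_ELIGIBLE_TASKS:
        return Route.AGENT
    raise ValueError(f"No routing decision recorded for task type: {task_type}")
```

The design choice that matters is the last line: a task nobody has classified raises an error instead of quietly landing with an agent.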

How you structure the rollout matters just as much. Banks that treat AI like a regular IT project tend to end up with pilots that go nowhere. Agents aren't something you ship and move on from. They need oversight built in from the start and a team behind them to keep them running. The banks with agents in production built that foundation first. Most didn't.

There's another problem underneath all of this. Bill Reich from Salesforce described why these systems need close supervision: LLMs, he said, are "toddlers with no frontal lobe." Brilliant but sycophantic. Someone must play that supervising role, and most banks haven't figured out who. They default to whoever's nearby: the Salesforce admin, the database admin, the web designer who knows the interface. The role that bridges business need with how an agent reasons hasn't fully emerged yet.

A major global insurer got the architecture right. Seven agents, each with a specific job and its own audit trail, coordinated by a central agent managing the overall process. Insurance claims that previously took up to 100 days now average 20. The AI model didn't produce that result. The way it was built did.
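The shape of that design can be sketched in a few lines. This is not the insurer's implementation, which the panel did not detail. It is a minimal illustration that assumes two stand-in specialists instead of seven, hypothetical names (`SpecialistAgent`, `Coordinator`, `intake`, `triage`) and plain functions where a real system would call a model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEntry:
    agent: str
    action: str
    detail: str
    at: str


@dataclass
class SpecialistAgent:
    """One narrowly scoped agent that records every step it takes."""
    name: str
    handler: Callable[[dict], dict]  # hypothetical: takes the claim, returns fields to add
    audit_trail: list = field(default_factory=list)

    def run(self, claim: dict) -> dict:
        result = self.handler(claim)
        self.audit_trail.append(
            AuditEntry(self.name, "handled", f"claim={claim['id']} added={sorted(result)}", _now())
        )
        return result


@dataclass
class Coordinator:
    """Single accountable agent: owns the sequence and the overall record."""
    specialists: list
    audit_trail: list = field(default_factory=list)

    def process(self, claim: dict) -> dict:
        state = dict(claim)  # work on a copy so the input is left untouched
        for agent in self.specialists:
            state.update(agent.run(state))
            self.audit_trail.append(
                AuditEntry("coordinator", "delegated", f"claim={state['id']} to={agent.name}", _now())
            )
        return state


# Illustrative wiring with made-up rules and a made-up claim.
intake = SpecialistAgent("intake", lambda c: {"validated": bool(c.get("policy"))})
triage = SpecialistAgent(
    "triage", lambda c: {"route": "fast_track" if c["amount"] < 1000 else "adjuster"}
)
coordinator = Coordinator([intake, triage])
outcome = coordinator.process({"id": "C-1", "policy": "P-9", "amount": 400})
```

Each specialist keeps its own trail, and the coordinator keeps a separate record of who was asked to do what, so a reviewer can reconstruct a decision from either side.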

Benjamin Crestel, who runs the AI lab at BNP Paribas, made a related point about evaluation. His team has put serious work into it, and they've learned that off-the-shelf evaluation tools rarely reflect a specific bank's standards. You end up having to evaluate the evaluator. There's no shortcut, and most rollout plans don't budget for it.

Sean McCarthy from AWS raised something most rollout plans skip: measure how the process runs today before you change anything, then give it 90 days before drawing conclusions. These systems get better as they go. What they do in week one isn't what they do in month three.
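In practice that comes down to two numbers and a waiting period. The sketch below assumes cycle time in days is the metric and that results from the first 90 days after go-live are set aside; the function name, the data layout and the sample figures are all made up for illustration.

```python
from datetime import date, timedelta
from statistics import mean

STABILIZATION_DAYS = 90  # the window mentioned on the panel; adjust per process


def compare_to_baseline(baseline_days, post_deploy, go_live, window_days=STABILIZATION_DAYS):
    """Compare pre-change cycle times with post-stabilization cycle times.

    baseline_days: cycle times (in days) measured before any change.
    post_deploy: list of (completion_date, cycle_time_days) after go-live.
    Returns None while no results fall outside the stabilization window.
    """
    if not baseline_days:
        raise ValueError("Measure the current process before deploying.")
    cutoff = go_live + timedelta(days=window_days)
    stable = [days for completed, days in post_deploy if completed >= cutoff]
    if not stable:
        return None  # too early to draw conclusions
    baseline_avg = mean(baseline_days)
    stable_avg = mean(stable)
    return {
        "baseline_avg": baseline_avg,
        "stable_avg": stable_avg,
        "change_pct": (stable_avg - baseline_avg) / baseline_avg * 100,
    }


# Made-up numbers, for illustration only.
go_live = date(2025, 1, 1)
report = compare_to_baseline(
    baseline_days=[100, 80, 120],
    post_deploy=[
        (go_live + timedelta(days=10), 90),   # inside the window, ignored
        (go_live + timedelta(days=95), 20),
        (go_live + timedelta(days=120), 30),
    ],
    go_live=go_live,
)
```

Returning nothing during the first 90 days is deliberate: it keeps week-one results from being read as the verdict.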

There's a reason that baselining step is so easy to skip. Once agents are running, the cost is immediately visible. The value often isn't, at least not at first. The first question we ask any bank coming to us with an agent project is always the same: what are you trying to accomplish, and how will you know when you've done it? Most can answer the first part. The second is where it gets hard.

So, if a regulator walked in tomorrow and asked exactly how a specific decision was made, how confident are you in the answer?

The organizations with a clean answer have agents running. The rest are still counting pilots.
