Automation Theater: Why Carrier AI Investments Aren’t Showing Up in the P&L

The boards are asking. Investors are asking. The carrier C-suite is starting to ask: Where’s the payoff?

Executive Summary

“Has your operating model changed in any structural way to accommodate AI—decision rights, role definitions, accountability boundaries?”

That’s one of three questions that SSA & Company’s Brian Nordyke and Nick Kramer say carriers leading with AI can answer affirmatively in under a minute. If the org chart looks the same as it did before AI pilots launched, the pilots have changed nothing, they suggest. The real ROI unlock for insurance carriers isn’t the next generation of AI tooling, they write. Instead, it’s rethinking the operating model itself, with AI as a design constraint rather than a bolt-on.

Insurers have been far from timid in their AI investments. Innovation labs have been built. Pilots have been launched. Vendor demos have generated real boardroom excitement. Yet for most carriers, brokers and managing general agents, measurable returns remain elusive.

The diagnosis is remarkably consistent across the industry: Carriers have deployed AI on operating models built for a pre-digital era. Layered onto legacy infrastructure, AI codifies inefficiency, removes the human oversight that used to catch it and makes the structural problem harder to see. The pilots didn’t optimize anything. They taught a machine to repeat existing mistakes faster.

The real ROI unlock isn’t the next generation of AI tooling. It’s rethinking the operating model itself, with AI as a design constraint rather than a bolt-on. That requires a discipline most carriers have been slow to embrace: fix the process before deploying the technology. Carriers getting that sequence right are pulling ahead. The rest are investing heavily in automation theater.

What Automation Theater Looks Like

From a distance, automation theater looks like comprehensive transformation. Dozens of pilots run simultaneously across functions, each with its own budget, sponsor and benchmark for success. Morale is high. The decks are compelling. But when someone asks which pilots have graduated to production, the answer turns evasive.

Pressed on what the pilots that did graduate have actually delivered, organizations often respond with vague, formulaic language (“efficiency gains,” “grunt work eliminated,” “improved throughput”) rather than anything that shows up in a KPI.

Consider one pattern we have seen across the industry: the implementation of multi-phase document-ingestion AI platforms. Often these arrive as point solutions scattered across disparate product lines. Because no organization-wide plan ties them together, even the full-buildout business case projects savings of fewer than a handful of FTEs.

That is not transformation. That is an expensive demo. And the pattern repeats: capital deployed, milestones hit, leadership briefed, but no operational unit reports a meaningful change in its P&L. These initial proofs of concept are valuable for showcasing potential, but they must be deployed within a broader organization-wide plan that accounts for regional and product differences from the start.

The Bottlenecks AI Can’t Fix Alone

In our experience, the three areas where legacy process architecture most consistently blocks AI ROI are claims intake, underwriting triage and policy servicing.

Claims intake remains one of the most structurally resistant areas. Most carriers still rely on intake processes built around manual data entry, inconsistent document formats and handoff-heavy triage. AI accelerates each of those steps in isolation. But when the underlying process requires human judgment at a dozen sequential decision points, because that’s how liability was managed before anyone thought about AI, automation produces throughput gains at the front and a bottleneck immediately downstream. The process was never designed for the continuous, high-volume decisioning AI enables. Until it is, the returns will underwhelm.
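The arithmetic behind that bottleneck can be made concrete with a minimal sketch. It assumes a strictly sequential intake process whose end-to-end rate is capped by its slowest stage. The stage names and the claims-per-hour capacities are hypothetical, chosen only for illustration, and do not come from any carrier.

```python
# Hypothetical stage capacities in claims per hour. The stage names and
# figures are illustrative assumptions, not data from any carrier.

def end_to_end_throughput(stage_capacity):
    """A strictly sequential process moves no faster than its slowest stage."""
    return min(stage_capacity.values())


def bottleneck(stage_capacity):
    """Name of the stage with the lowest capacity."""
    return min(stage_capacity, key=stage_capacity.get)


# Before automation: every stage runs at human speed.
before = {"data_entry": 40, "document_review": 60, "adjuster_decision": 50}

# After automation: the two front-end stages run ten times faster,
# while the human decision point is left unchanged.
after = dict(before, data_entry=400, document_review=600)

for label, stages in (("before", before), ("after", after)):
    print(label, end_to_end_throughput(stages), bottleneck(stages))
```

Under these assumed figures, a tenfold speedup at the front lifts end-to-end throughput only from 40 to 50 claims per hour, and the constraint simply moves to the unchanged human decision point.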

Underwriting triage is where the contrast is sharpest. We have seen carriers that redesigned their underwriting workflow before deploying GenAI deliver a 40-50% uplift in quotes per underwriter and route 60-70% of renewals through accelerated decisioning paths. Carriers that bolted similar tooling onto unchanged workflows captured a fraction of that. The technology was nearly identical. The operating model was not. Large multiline carriers have publicly disclosed that submission volume and quote ratios increased substantially when they accompanied GenAI deployment with a workflow redesign.

Policy servicing is the clearest example of how siloed process design defeats AI at scale. Servicing workflows are typically fragmented across multiple administration systems, each with its own data model, integration constraints and institutional logic. AI tools that perform well in a demo run into walls in production—not because the model failed but because the underlying architecture was never designed for the data continuity AI requires.

The pattern extends past carriers. Brokers and MGAs face structurally similar bottlenecks across placement and post-bind operations: facultative reinsurance handoffs that depend on two or three named individuals, slip retrieval workflows with no system-of-record visibility, multi-system fragmentation between placement and servicing platforms. The vocabulary changes. The diagnosis does not.

When AI Outgrows the Org Chart

At some point, the process diagnosis converges on a different question: Is the organization actually built to operate with AI, or just to experiment with it? For most carriers, the honest answer is the latter—and the gap is not technological. It is organizational.

When AI absorbs work humans used to do, the consequences ripple further than most leadership teams anticipate. Decision rights shift. Accountability structures designed around human judgment at every step suddenly have gaps where the human used to be. The traditional carrier operating model—hierarchical, siloed by line of business, built for annual product cycles and sequential handoffs—was not designed for the speed and iteration AI demands. Retrofitting it is not a technology project. Rather, it’s a change management problem of the first order.

Two organizational realities are particularly underappreciated.

First, the human-in-the-loop is not a transitional arrangement until AI matures. It is a permanent design feature of any responsibly built AI operating model. As AI agents absorb the routine, high-volume transactional work that has traditionally lived in large onshore and offshore processing teams, the organizational pyramid reshapes. The bottom flattens. The middle and upper layers—judgment, oversight, client-facing expertise—become more consequential, not less. Carriers continuing to model AI deployments as a straight-line headcount-reduction exercise are setting themselves up for the wrong workforce structure for the next decade.

Second, the institutional knowledge problem. A generation of underwriting, claims and operations expertise is approaching retirement, and the window to capture and transfer it is narrowing. AI models trained on the residue of organizational memory—incomplete documentation, partial process artifacts, outdated playbooks—will produce outputs that look authoritative but reflect knowledge gaps the model itself can’t surface. Carriers treating retirement as an HR issue rather than an AI readiness issue will feel the consequences when the systems they’re betting on start drifting in ways nobody can quite explain.

The Regulator Is Already in the Room

A second pressure has begun bearing down on carriers—not from the market but from the statehouse.

Regulators have grown skeptical and, in some cases, openly hostile toward AI systems making consequential decisions about consumers without being able to explain how they reached them. Underwriting denials. Claims settlements. Pricing determinations. When an algorithm touches any of these and something goes wrong, “the model decided” is not a defense.

The pace of the regulatory shift is the story. Roughly 18 months ago, fewer than a dozen states had adopted the NAIC’s Model Bulletin requiring written AI governance programs. That number is now 25 jurisdictions (including the District of Columbia), with another four states enacting parallel state-specific frameworks. New York’s Department of Financial Services has gone further still, requiring insurers to demonstrate that AI and external data systems do not proxy for protected classes or generate disproportionate adverse effects, and demanding that vendors be subject to audit. Colorado’s life insurance regulation (Reg 10-1-1) has put hard rules around algorithmic discrimination, complete with an audit requirement—the most operationally specific framework in the country today.

And the examination apparatus is now real. In January 2026, the NAIC launched a multi-state pilot of its AI Systems Evaluation Tool—a structured examination framework giving regulators a standardized way to interrogate carrier AI governance during market conduct exams. Twelve states are participating in the pilot, which runs through September 2026. Carriers in those jurisdictions should expect inquiries. The era of regulatory patience on AI ended quietly while everyone was watching the model launches.

For many carriers, the vendor accountability piece will be the uncomfortable one. In the rush to deployment, third-party AI tools were procured, integrated and quietly left to run, with little visibility into what they were doing or whether the carrier could account for their outputs. The vendor relationship does not transfer the liability.

The remedy is not a compliance program bolted on after the fact—that is automation theater applied to governance. The carriers that will navigate this regulatory environment deftly are the ones that built auditability into their processes from the start: clear data lineage, documented model logic, bias testing baked into the deployment cycle rather than performed under duress before an exam.

An AI system running on a well-designed, well-documented process is explainable almost by definition. An AI system running on a fragmented legacy workflow, where nobody is entirely sure what inputs feed what outputs, is a liability waiting to happen. Regulatory readiness and operational readiness, it turns out, are the same thing.
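What “clear data lineage” and a producible audit trail might look like at the level of a single decision can be sketched as follows. This is a minimal illustration, not a compliance-grade design. The `DecisionRecord` fields, the `AuditLog` class, and the sample model and vendor names are all hypothetical. The sketch assumes decisions are logged at the moment they are made, with each entry hashed together with the previous one so that later tampering is detectable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def fingerprint(payload: dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionRecord:
    """One consequential AI decision (hypothetical schema)."""
    decision_id: str
    decision_type: str        # e.g. underwriting, claims, pricing
    model_name: str
    model_version: str
    vendor: str               # third-party tools are logged the same way
    input_sources: dict       # lineage: input field -> source system
    inputs_digest: str        # fingerprint of the exact inputs used
    output: str
    reviewer: Optional[str]   # human in the loop, if any
    recorded_at: str


class AuditLog:
    """Append-only log; each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self._entries = []  # list of (record, chain_hash) pairs

    def append(self, record: DecisionRecord) -> str:
        prev = self._entries[-1][1] if self._entries else ""
        chain_hash = fingerprint({"prev": prev, "record": asdict(record)})
        self._entries.append((record, chain_hash))
        return chain_hash

    def trail_for(self, decision_id: str) -> list:
        return [r for r, _ in self._entries if r.decision_id == decision_id]

    def verify(self) -> bool:
        """Recompute the chain; False if any stored entry was altered."""
        prev = ""
        for record, chain_hash in self._entries:
            if fingerprint({"prev": prev, "record": asdict(record)}) != chain_hash:
                return False
            prev = chain_hash
        return True


# Illustrative usage with made-up values.
log = AuditLog()
inputs = {"claim_amount": 1800, "policy_id": "P-0001"}
log.append(DecisionRecord(
    decision_id="D-0001",
    decision_type="claims_settlement",
    model_name="triage-model",
    model_version="1.4.2",
    vendor="ExampleVendor",
    input_sources={"claim_amount": "claims_system", "policy_id": "policy_admin"},
    inputs_digest=fingerprint(inputs),
    output="fast_track",
    reviewer=None,
    recorded_at=datetime.now(timezone.utc).isoformat(),
))
print(log.trail_for("D-0001")[0].model_version)
print(log.verify())
```

The design choice worth noting is that vendor outputs pass through the same record as in-house models, which is what lets a carrier answer for a third-party decision rather than pointing at the vendor.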

The Diagnostic

Strip away the vendor noise, the pilot inventory and the slide decks, and a carrier’s actual AI maturity comes down to whether leadership can answer three questions in under a minute.

1. Of the AI initiatives currently in your portfolio, how many sit on processes you would describe as well-designed and well-documented? If the honest answer is a minority, you are not running an AI program. You are running automation theater.
2. If a state regulator demanded an audit trail for any consequential decision your AI tools have made in the last 12 months, including outputs generated by third-party vendors, could you produce it? If not, you do not yet have AI in production. You have AI exposure.
3. Has your operating model changed in any structural way to accommodate AI: decision rights, role definitions, accountability boundaries? If the org chart looks the same as it did before the pilots launched, the pilots haven’t changed anything.

Carriers that can answer those three questions cleanly are pulling away from the rest of the industry. Those that cannot have a 24-month window before the gap becomes very difficult to close—and before regulators, capital markets and rating agencies start treating it as a discriminator rather than a curiosity.

The carriers winning with AI are those that fixed the operating model first.