AI Agents · Compliance

Agents and Compliance: Who Is Liable When an AI Agent Makes the Wrong Call

When an AI agent makes the wrong call, liability does not vanish into the machine. It follows control through the agent stack. A framework for leaders.

ANCI · May 31 · 9 min read

When an agent makes the wrong call, liability does not vanish into the machine. It travels back through the stack and settles where control lived.

AI Edge for Leaders · ANCI · Reading time about 9 minutes

[Figure: The question every leader now faces. The agent makes the wrong call: who pays? Model provider? Platform? Vendor? The org that pressed deploy?]
The instinct is to find a single name. The honest answer is an architecture.

In February 2024, a Canadian tribunal ruled on a case that began when a grieving traveler asked Air Canada's website chatbot about bereavement fares. The bot invented a refund policy that did not exist. When the airline refused to honor it, the tribunal sided with the customer and rejected Air Canada's defense, that the chatbot was a separate entity responsible for its own actions, calling it a remarkable submission. The damages were small. The precedent was not. Every leader now deploying autonomous agents faces the same question in higher-stakes form: when the agent makes the wrong call, who pays? The answer is not a name. It is an architecture.

01

The Single-Defendant Illusion

[Figure: Then, one clean chain: cause → flaw → one responsible party. Now, responsibility splinters: harm traces back to model, framework, tools, config, and human.]
Traditional liability traces a single line. Agentic systems break that line into many owners.

Our liability instincts were built for a world of single actors. Something breaks, you trace it to a design flaw or a negligent person, and responsibility terminates there. Agentic systems break that chain. An agent does not just return an answer. It retrieves data, reasons over it, calls external tools and APIs, and executes actions across systems, sometimes in sequences that no human explicitly approved. By the time harm lands, the decision has already passed through a model, an orchestration layer, a set of tools, a permission configuration, and a human who delegated the task. Pointing at any one of them in isolation feels arbitrary, because in isolation each one is.

This is why the search for a single defendant fails. Agency is no longer concentrated in a person or a product. It is distributed across a stack of components, each owned by a different party, each contributing to the outcome. The Air Canada tribunal saw through the deflection immediately. The airline argued that the bot was its own legal person, and the tribunal threw that out, holding the company responsible for everything its system told a customer, whether the words came from a static page or a chatbot. The lesson for leaders is not that chatbots are dangerous. It is that you cannot outsource accountability to a component you chose to deploy.

02

Mapping the Accountability Stack

[Figure: The accountability stack, layer → owner, top to bottom. Human delegation → the operator who signs off. Deployment configuration → your organisation. Tools and data → integrations and APIs. Orchestration and framework → framework layer. Foundation model → model provider (disclaims).]
Liability follows control and foreseeability. In most enterprise cases it lands on the top two layers, not the model at the base.

If liability is distributed, then the useful move is to map it the way you would map any system: component by component. Picture the agent as a technical stack, then overlay a liability stack on top of it. The two have the same shape.

At the base sits the foundation model provider. They own the model's raw capabilities and safety training, and they almost always disclaim downstream responsibility in their terms of service. Their exposure rises only when harm traces to a known, foreseeable defect they failed to address. Above that is the orchestration or framework layer: the scaffolding that turns a model into an agent, granting it memory, tool access, and the ability to act in loops rather than answer once. Then come the tools and data the agent reaches into, the APIs it can call and the databases it can write to. This is the layer where text generation becomes real-world action, with financial, legal, and reputational consequences.

Above that sits the layer that matters most in practice: deployment configuration. This is the organization that chose to deploy the agent, scoped its permissions, decided what it could touch, and selected which safeguards to implement or skip. And at the top is the human who delegated the task. Here is the principle that organizes the whole stack: liability follows control and foreseeability. Whoever had the ability to constrain the agent, and could reasonably foresee the harm, owns the risk. In most enterprise scenarios that points squarely at the deploying organization. You made the decision to deploy. You configured the permissions. You controlled the environment in which the harm occurred. Regulators and courts start there, not with the provider whose name is on the model weights.
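One way to make the overlay concrete is to write the stack down as data and ask the two questions the principle poses of each layer. The sketch below is illustrative only: the `Layer` type, the `likely_risk_owners` helper, and the true/false judgments for the sample incident are assumptions made for the example, not legal conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    owner: str
    had_control: bool    # could this party have constrained the agent?
    could_foresee: bool  # could it reasonably have foreseen this harm?


def likely_risk_owners(stack):
    """Return the layers where control and foreseeability were both present."""
    return [layer for layer in stack if layer.had_control and layer.could_foresee]


# Hypothetical incident: an agent issues a refund it was never meant to issue.
# The booleans are judgment calls a review team would have to argue for.
STACK = [
    Layer("Human delegation", "Operator who signed off", True, True),
    Layer("Deployment configuration", "Deploying organisation", True, True),
    Layer("Tools and data", "Integration owners", True, False),
    Layer("Orchestration and framework", "Framework vendor", False, False),
    Layer("Foundation model", "Model provider", False, False),
]

for layer in likely_risk_owners(STACK):
    print(f"{layer.name}: {layer.owner}")
```

For this sample incident the sketch prints the top two layers, which mirrors the pattern described above. The value of the exercise is less the output than the forced conversation about who fills in each field.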

03

The Law Is Already Deciding

[Figure: Regulatory timeline. Jan 2026, California: "It acted on its own" barred. Jun 2026, Colorado: impact assessments required. Dec 2026, EU: AI treated as a product. Gartner: more than $10B in AI remediation costs forecast by mid-2026.]
The rules are arriving on a schedule. The direction of travel is consistent: deployers cannot hide behind the machine.

Leaders waiting for the law to clarify itself have misread the moment. It is already deciding, just not in one tidy place. California enacted a law that took effect on January 1, 2026, and it forecloses the defense everyone expected agentic defendants to reach for. You can no longer argue that a system's autonomous operation absolves you. Saying the AI did it on its own does not get you off the hook, and the fact that no human approved the specific harmful action does not, by itself, defeat a claim. Colorado's AI Act, effective June 2026, requires deployers of high-risk systems to run impact assessments and maintain active risk management programs. New York City already mandates annual bias audits for automated hiring tools.

Europe took a different and revealing turn. The dedicated AI Liability Directive, which would have eased the burden of proof for people harmed by AI, was withdrawn in 2025 for lack of agreement. Yet the revised Product Liability Directive now explicitly treats software, including AI systems, as a product, with rules applying from December 2026. The net effect is not less liability. It is more fragmentation, which for a multinational means more complexity, not less.

The money is following the law. Gartner forecasts that by mid-2026, new categories of unlawful AI-informed decision-making will generate more than ten billion dollars in remediation costs across vendors and the enterprises that deploy them. And the contracts meant to protect deployers are not keeping up. Law firms tracking agentic deployments describe a liability gap: vendors ship agentic capabilities faster than contracts evolve, disclaim responsibility for outcomes, and leave the deploying business holding the risk. The old doctrine of vicarious liability, an organization answering for the acts of its agents, maps onto digital agents with uncomfortable ease.

04

The Load-Bearing Component: Observability

[Figure: A defensible decision rests on three pillars: audit trails, governance controls, and a named responsible owner. Remove a pillar and the decision can no longer stand up to scrutiny.]
What regulators ask for when something goes wrong. Observability is the foundation the rest rests on.

Every architecture has a load-bearing element, the piece that, if it fails, brings down everything resting on it. In the accountability stack, that element is observability. Here is why. Establishing liability requires establishing causation, and causation requires reconstructing what the agent did and why it did it. An organization without comprehensive audit and observability infrastructure cannot rebuild that causal chain. It cannot prove the agent behaved reasonably, and it cannot prove that it did not. When regulators ask for evidence of oversight, they want three concrete outputs: audit trails for the agent's decisions, documented governance controls, and a clear designation of the responsible human or organization. The NIST AI Risk Management Framework places traceability and transparency at the center for exactly this reason.
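For teams wondering what an audit trail for an agent's decisions might look like in practice, here is a minimal sketch. It assumes an in-memory list as the log and uses hash chaining, one common way to make later tampering evident; neither choice comes from a regulator or from NIST. The function names and record fields (`append_audit_record`, `verify_chain`, `responsible_owner`, and so on) are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def append_audit_record(log, agent_id, action, inputs, rationale, owner):
    """Append one agent decision to the log, chained to the previous record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,
        "inputs": inputs,            # must be JSON-serializable
        "rationale": rationale,      # why the agent says it acted
        "responsible_owner": owner,  # the named human or organization
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record


def verify_chain(log):
    """Return True only if no record was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if record["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

Each record carries what the agent did, the stated reason, and a named owner, so the same entry serves the audit trail and the designation of responsibility. A production system would write to durable, access-controlled storage rather than a Python list.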

The risk of skipping this is not theoretical. In April 2025, a founder reported that an AI coding agent deleted his company's entire production database, and its backups, in roughly nine seconds. Speed is the agent's gift and its hazard. A fully autonomous agent that executes immediately carries far more exposure than an assistive one that only suggests, because the window for a human to catch the wrong call has closed before anyone sees it. And the worst exposure of all is the agent nobody logged: the shadow deployment stood up outside IT's knowledge, touching regulated data and making decisions with zero documented oversight. You cannot defend a decision you cannot see.

05

Designing for Accountability Before Deployment

[Figure: Match oversight to autonomy. Bar height = human oversight required. Drafts a reply: light review. Sends communications: active review. Pays, ships, hires: strict sign-off.]
An agent that drafts needs lighter review than one that pays, ships, or hires. Scope the oversight to the stakes.

The encouraging part of treating this as architecture is that architecture can be designed. Liability is not something you litigate after the wrong call. It is something you instrument before it. Start by assigning a named owner to every layer of the stack, the same way you would assign service owners in any system diagram. The model layer, the orchestration layer, the tools, the configuration, and the human checkpoint each get a person, not a shrug. Scope permissions to the narrowest set the task requires, because every capability you grant is a liability surface you accept. Match the level of human oversight to the level of autonomy. Log everything in a form a regulator and a court would accept, because the audit trail is the line between a defensible decision and an indefensible one. And treat the agent the way you would treat any employee or contractor acting with delegated authority, because that is precisely how the law is beginning to treat it.
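Two of the steps above, scoping permissions narrowly and matching oversight to autonomy, can be expressed as a small gate that every agent action passes through. The sketch below is an assumption-laden illustration: the action names, the three tiers, and the `authorize` function are hypothetical, and a real deployment would tie them to its own risk classification.

```python
from enum import Enum


class Oversight(Enum):
    LIGHT_REVIEW = 1
    ACTIVE_REVIEW = 2
    STRICT_SIGN_OFF = 3


# Hypothetical action names, grouped by the stakes of the action.
OVERSIGHT_BY_ACTION = {
    "draft_reply": Oversight.LIGHT_REVIEW,
    "send_communication": Oversight.ACTIVE_REVIEW,
    "make_payment": Oversight.STRICT_SIGN_OFF,
    "ship_order": Oversight.STRICT_SIGN_OFF,
    "extend_job_offer": Oversight.STRICT_SIGN_OFF,
}


def authorize(action, granted_actions, approved_by=None):
    """Decide whether the agent may act; returns (allowed, reason)."""
    # Default deny: anything outside the granted scope is refused outright.
    if action not in granted_actions:
        return False, "denied: outside granted scope"
    # Actions nobody has classified are treated as the highest-stakes tier.
    level = OVERSIGHT_BY_ACTION.get(action, Oversight.STRICT_SIGN_OFF)
    if level is Oversight.STRICT_SIGN_OFF and approved_by is None:
        return False, "held: named human sign-off required"
    return True, f"allowed under {level.name}"


granted = {"draft_reply", "make_payment"}  # narrowest set this task needs
print(authorize("draft_reply", granted))
print(authorize("make_payment", granted))
print(authorize("make_payment", granted, approved_by="finance-lead"))
print(authorize("send_communication", granted))
```

The design choice worth noting is the direction of the defaults: an action that is not granted is refused, and an action that is not classified is held for sign-off, so gaps in the policy fail toward more oversight rather than less.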

This is the work we think about constantly in building ANCI's agent, Zara. An agent that acts on a leader's behalf has to operate inside bounded, observable scope, not unbounded trust. The goal is not to slow the agent down. It is to make sure that when it acts, you can always answer for what it did. Capability without accountability is not innovation. It is exposure wearing a demo.

The Takeaway for Leaders

[Figure: The wrong call → the org that pressed deploy. Liability travels back through the stack and settles where control and foresight lived.]

The wrong call is not a hypothetical. With agents executing transactions, screening candidates, and answering customers at machine speed, some of those calls will be wrong, and a few will be expensive. When that day comes, liability will not dissolve into the machine. It will travel back through the stack and settle where control and foresight actually lived, which in most cases is the organization that pressed deploy. The leaders who come through this well will not be the ones with the best disclaimers. They will be the ones who built an accountability architecture before they needed it, and who can show their work when someone asks.

AI Edge for Leaders is the free monthly eMagazine from ANCI on AI, work, and leadership. Read the full issue at anci.app/ezine.
