
An agentic AI compliance checklist

Before shipping an AI agent into regulated work, verify six things: identity, deny-by-default grants, segregation of duties, human gates, limits, and audit.

Most agent checklists you will find online are accuracy checklists. Does it hallucinate? Does it handle edge cases? Is the latency acceptable? Those matter, but they are the wrong list for a regulated firm. The question that decides whether your agent ships is not "does it work?" It is "can you account for what it did?"

The list below is the one a compliance, QA, pharmacovigilance, or AML lead should run before an agent touches anything that counts. Each item is something you verify, not something you hope. If you cannot tick it with evidence, the agent is not ready — no matter how good the demo looked.

1. Identity: the agent is named, not anonymous

Before anything else, confirm the agent acts as a named principal — a single identity that holds exactly one role at a time. Not a shared service account, not an anonymous process calling APIs with a stored key.

This is the precondition for every other item. You cannot authorize, limit, or audit an actor you cannot name. If three agents and a batch job all act as the same credential, you have already lost the ability to say who did what, and no later control can recover it.

Verify: every agent maps to one identity, every action carries that identity, and the identity holds one role for the duration of a run.
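As a minimal sketch of what "one identity, one role, carried on every action" can look like in code, the example below uses hypothetical names throughout (Principal, Action, Run are illustrative, not any product's API). It assumes the role is fixed when the run starts and that every action is stamped with the principal that performed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """One named agent identity holding exactly one role (hypothetical type)."""
    agent_id: str
    role: str


@dataclass(frozen=True)
class Action:
    """Every action carries the principal that performed it."""
    principal: Principal
    name: str


class Run:
    """A single run bound to one principal for its whole duration."""

    def __init__(self, principal: Principal) -> None:
        self._principal = principal
        self.actions: list[Action] = []

    @property
    def principal(self) -> Principal:
        # Read-only: there is no setter, so the run cannot be rebound.
        return self._principal

    def act(self, name: str) -> Action:
        action = Action(principal=self._principal, name=name)
        self.actions.append(action)
        return action
```

Because both dataclasses are frozen and the run exposes its principal read-only, an attempt to switch roles mid-run fails instead of silently succeeding. A shared service account would correspond to many runs reusing one Principal, which is exactly what this item asks you to rule out.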

2. Grants: deny-by-default and versioned

Next, look at how the agent's capabilities are defined. The only safe posture is deny-by-default: the agent can do nothing except the specific actions its role was explicitly granted. Everything not granted is refused, not permitted by silence.

Then check the part most teams skip — that grants are versioned. You should be able to reconstruct exactly what the agent was permitted to do on any past date, and see who approved each change. An examiner's question is rarely "what can it do now." It is "what was it allowed to do the day this happened, and who signed that off." We make the full case in deny-by-default permissions.

Verify: capabilities are an allow-list, not a block-list; the list is versioned; and each version records its approver.
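One way to picture a versioned allow-list is sketched below. The class and field names (GrantVersion, GrantHistory, as_of) are hypothetical, and the sketch assumes versions are published in date order and keyed by an effective date. It answers the examiner's question directly: what was allowed on a given day, and who approved it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GrantVersion:
    """One immutable version of a role's allow-list (hypothetical type)."""
    effective: date
    approver: str
    allowed: frozenset


class GrantHistory:
    def __init__(self) -> None:
        self._versions: list[GrantVersion] = []

    def publish(self, version: GrantVersion) -> None:
        # Assumption: versions are appended in strictly increasing date order.
        if self._versions and version.effective <= self._versions[-1].effective:
            raise ValueError("versions must be published in date order")
        self._versions.append(version)

    def as_of(self, day: date) -> Optional[GrantVersion]:
        """Return the version in force on `day`, or None if none existed yet."""
        current = None
        for version in self._versions:
            if version.effective <= day:
                current = version
        return current

    def is_allowed(self, action: str, day: date) -> bool:
        # Deny by default: no version, or action not listed, means refused.
        version = self.as_of(day)
        return version is not None and action in version.allowed
```

Note that there is no block-list anywhere in the sketch. An action that was never granted is refused because it is absent, not because someone remembered to forbid it.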

3. Segregation of duties: it cannot approve its own work

This is the item teams most often fake. The agent that prepared a piece of work — drafted the report, cleared the alert, assembled the submission — must not be the one that approves it. Not "should not." Cannot, enforced structurally inside the run.

A configuration flag that you promise to set correctly is not segregation of duties. The test is whether the same agent, asked to both make and check on one run, is structurally refused — and whether that refusal is recorded. This is the oldest control in finance and quality assurance, and it long predates AI: it is 21 CFR 211.22 in pharma, where the quality unit's separation of duties is a structural requirement, and the Wolfsberg Group's four-eye standard in anti-money-laundering.

Verify: an agent provably cannot be both maker and checker on the same run, and attempts to self-approve land in the log as refusals.
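The difference between a flag and a structural refusal can be illustrated with a short sketch. Everything here is hypothetical (WorkItem, SegregationError, the log tuple format); the point is only that the check lives inside the approval path itself and that a refused attempt is recorded before the error is raised.

```python
class SegregationError(Exception):
    """Raised when an approval would violate maker/checker separation."""


class WorkItem:
    """A unit of work within one run (hypothetical type)."""

    def __init__(self, item_id: str) -> None:
        self.item_id = item_id
        self.maker = None
        self.checker = None
        self.log: list[tuple] = []

    def prepare(self, agent_id: str) -> None:
        self.maker = agent_id
        self.log.append(("prepared", agent_id))

    def approve(self, agent_id: str) -> None:
        if self.maker is None:
            raise SegregationError("nothing has been prepared yet")
        if agent_id == self.maker:
            # Record the refusal first, so the attempt itself is evidence.
            self.log.append(("refused_self_approval", agent_id))
            raise SegregationError(f"{agent_id} prepared this item and cannot approve it")
        self.checker = agent_id
        self.log.append(("approved", agent_id))
```

There is no configuration switch that disables the comparison, which is what "structural" means in this sketch. In a real system the agent identifiers would come from the authenticated principal rather than from a caller-supplied string.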

4. Human gates on every one-way door

Some actions cannot be undone. Releasing a drug batch, filing a Suspicious Activity Report under the Bank Secrecy Act, pushing a configuration to live medical devices, posting a rebate accrual. For each of these, the agent should park the run and demand a named human signature before it proceeds.

A real gate is more than a notification a tired reviewer clicks through. It can require a quorum of named approvers, it bars the requester from approving its own request, and it captures the signer's reason verbatim, so the signature carries its meaning. That mirrors what 21 CFR 11.50 has long required of signed electronic records (the signer's name, the date and time, and the meaning of the signature) and what EU GMP Annex 16 asks of the Qualified Person who certifies a batch for release. Filing a SAR is a mandated human decision; the gate is where that requirement becomes enforceable on a machine. We go deeper in human-in-the-loop approval gates.

Verify: every irreversible action is gated; the requester cannot sign its own request; and the signature records who, when, and why.
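The three properties in that check can be sketched in a few lines. The Gate class below is a hypothetical illustration, not a real API: it assumes signers are identified by a string, that a quorum is a simple count of distinct signers, and that the run stays parked until is_open returns true.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Gate:
    """A parked, irreversible action awaiting named signatures (hypothetical)."""
    action: str
    requester: str
    quorum: int = 1
    signatures: list = field(default_factory=list)

    def sign(self, signer: str, reason: str) -> None:
        if signer == self.requester:
            raise PermissionError("the requester cannot sign its own request")
        if not reason.strip():
            raise ValueError("a signature must state its reason")
        if any(s["signer"] == signer for s in self.signatures):
            raise ValueError("each approver may sign only once")
        # Record who, when, and why; the reason is stored verbatim.
        self.signatures.append(
            {"signer": signer, "when": datetime.now(timezone.utc), "reason": reason}
        )

    def is_open(self) -> bool:
        return len(self.signatures) >= self.quorum
```

The run would call is_open before executing the action and stay parked otherwise. A production gate would also authenticate each signer, which this sketch leaves out.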

5. Limits: the agent cannot exceed its mandate by accident

Authorization tells you the agent is allowed to do a kind of thing. Limits tell you it cannot do too much of it. An agent scoped to issue refunds should not be able to issue a refund larger than its role permits, or more refunds per hour than a human would ever process. An agent scoped to read records should not be able to read the entire database in one run.

These are not the same as content guardrails, which ask whether a message is dangerous. Limits ask whether the volume, value, or rate of authorized actions has slipped outside the band a human in that seat would stay inside. The distinction between the two is worth understanding before you ship; we draw it in governance versus guardrails.

Verify: value, volume, and rate ceilings exist for every consequential action, and breaching one stops the run rather than logging it after the fact.
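The refund example can be made concrete with a small sketch. The names and thresholds below are hypothetical, and the clock is passed in as seconds so the behavior is easy to test. The sketch assumes one limiter per run and checks all three ceilings before the action is allowed to happen, rather than after.

```python
class LimitBreached(Exception):
    """Raised when a ceiling is hit; the run must stop."""


class RefundLimits:
    """Value, volume, and rate ceilings for one run (hypothetical type)."""

    def __init__(self, max_value: float, max_count: int, max_per_hour: int) -> None:
        self.max_value = max_value
        self.max_count = max_count
        self.max_per_hour = max_per_hour
        self.stopped = False
        self._times: list[float] = []  # timestamps of authorized refunds, in seconds

    def authorize(self, amount: float, now: float) -> None:
        """Allow one refund or stop the run; never allow and log later."""
        if self.stopped:
            raise LimitBreached("run already stopped")
        in_last_hour = sum(1 for t in self._times if now - t < 3600)
        if amount > self.max_value:
            self._stop("value ceiling")
        if len(self._times) >= self.max_count:
            self._stop("volume ceiling")
        if in_last_hour >= self.max_per_hour:
            self._stop("rate ceiling")
        self._times.append(now)

    def _stop(self, reason: str) -> None:
        self.stopped = True
        raise LimitBreached(reason)
```

Once any ceiling is breached the limiter stays stopped, so a later, smaller request in the same run is refused too. That is the sketch's version of "stops the run rather than logging it after the fact".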

6. Audit: tamper-evident and verifiable offline

The final item is the one everything else exists to produce. Every action, model call, grant change, and approval must land in an append-only, hash-chained, cryptographically signed ledger. Change one record and the chain visibly breaks.

The test that separates a real audit trail from a log file is this: can a third party who distrusts you verify the export offline, against a published spec, with no access to your systems? That is the modern form of the tamper-evident audit trail 21 CFR 11.10(e) has demanded for decades, and the linking 11.70 requires between a record and the signature that approved it. A log you can edit is not evidence. A log only you can verify is not much better.

Verify: the trail is append-only and hash-chained; signatures use a verifiable key; and the evidence bundle checks out for someone offline who does not trust the vendor.
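To show why changing one record visibly breaks the chain, here is a minimal hash-chain sketch using only Python's standard hashlib and json modules. The record layout, the all-zero genesis value, and the function names are assumptions made for illustration. Digital signatures are deliberately omitted: a real ledger would also sign each entry with an asymmetric key so that a third party can verify it without holding any secret.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's "prev" field


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "record": record},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(ledger: list):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for index, entry in enumerate(ledger):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return index
        prev = entry["hash"]
    return None
```

The verify function needs nothing but the exported ledger and the hashing rule, which is the property the offline test above asks for. Editing any earlier record changes its hash, so verification fails at that entry even if every later entry is left untouched.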

The one-page version

Check                   What "ready" looks like
Identity                Each agent is one named principal holding one role
Grants                  Deny-by-default, versioned, with recorded approvers
Segregation of duties   Maker cannot be checker: structurally, per run
Human gates             Every one-way door requires a named signature
Limits                  Value, volume, and rate ceilings stop the run
Audit                   Hash-chained, signed, verifiable offline

Why no rulebook gets you off this list

You might wait for a regulator to hand you the agent version of this checklist. Do not. In April 2026 the Federal Reserve issued SR 26-2, which replaced the old model-risk guidance and scoped agentic AI explicitly out of it. The EU AI Act's high-risk obligations were deferred to December 2027. There is no supervisory template for agents — and no template means no safe harbor.

The predicate rules underneath did not move. They govern what a human in that seat must do, and they are date-proof. Examiners still ask who authorized the action, discovery still demands the trail, and a personally-liable officer still has to account for the decision. The absence of an agent-specific rulebook removes the template, not the exposure. This list is how you close the gap between pilot and production before someone else writes the rules for you.

If you can tick all six with evidence, you have an agent an auditor can examine. If any item is "it's in the prompt," you have an agent with good intentions — and in a regulated industry, good intentions are not evidence.


