A sanctions-screening system does one thing and does it badly: it matches the names on your payments, customers, and counterparties against government watchlists, and it matches them loosely. "Mohammed Ali" on a wire trips against a dozen sanctioned entries that share two syllables. A shipping address near a restricted port flags a transaction that has nothing to do with it. The system is tuned to miss nothing, which means it flags almost everything, and a human has to sit with each hit and decide: real, or noise.
The arithmetic is brutal. The overwhelming majority of sanctions alerts are false positives — fuzzy matches on common names, dead entries, transliteration collisions. An analyst opens each one, compares dates of birth, nationalities, identifiers, and the underlying transaction, and clears the obvious mismatches. The backlog never empties. Payments sit held while the queue is worked. It is exactly the high-volume, mechanical, reviewable labour an AI agent is good at.
Where the agent earns its keep, and where it must stop
An AI agent — software that reads the alert, pulls the customer and transaction data, compares it against the watchlist entry, and reasons about whether they are the same party — can work the false-positive queue at a speed no team of analysts matches. That is the real prize, and it is worth being precise about its shape.
| Stage | Who acts | Why |
|---|---|---|
| Gather customer, transaction, and list-entry data | Agent | Mechanical, high-volume, reviewable |
| Compare identifiers, score the match quality | Agent | Fast, draftable, never final |
| Clear obvious false positives under policy | Agent, under policy | Reversible, sampled, logged |
| Confirm a true match; block or release the payment | Named officer | Mandated human judgment |
The first three stages are where the speed lives. A clear mismatch — a different date of birth, a different nationality, an entry delisted years ago — is a defensible auto-clear under written policy, and the agent can dispatch it in seconds while ranking the uncertain hits to the top of the human queue. Done well, the backlog shrinks and analysts spend their time on the matches that merit a person.
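The auto-clear rule described above can be sketched in a few lines. This is a minimal illustration, not a production matcher: the `Party` and `ListEntry` types, the `triage` function, and the three mismatch rules are assumptions drawn from the examples in this article, and a real written policy would be more detailed. Note that missing data never clears a hit, and the function has no way to return a confirmation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Party:
    """Customer or counterparty data pulled for the alert (hypothetical shape)."""
    name: str
    date_of_birth: Optional[date] = None
    nationality: Optional[str] = None


@dataclass(frozen=True)
class ListEntry:
    """Watchlist entry the alert matched against (hypothetical shape)."""
    name: str
    date_of_birth: Optional[date] = None
    nationality: Optional[str] = None
    delisted: bool = False


def triage(party: Party, entry: ListEntry) -> Tuple[str, str]:
    """Return (disposition, reason). The only outcomes are 'auto_clear' and 'escalate'."""
    if entry.delisted:
        return "auto_clear", "list entry has been delisted"
    # A field only counts as a mismatch when both sides actually have a value.
    if party.date_of_birth and entry.date_of_birth and party.date_of_birth != entry.date_of_birth:
        return "auto_clear", "date of birth differs"
    if party.nationality and entry.nationality and party.nationality != entry.nationality:
        return "auto_clear", "nationality differs"
    # Anything else, including missing identifiers, goes to the human queue.
    return "escalate", "no disqualifying mismatch; needs a named officer"
```

Every returned reason would be logged alongside the disposition, so that sampled reviews can check the agent's clears against the written policy.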
The fourth stage is different in kind. Confirming that a payment or a customer is a sanctioned party — and the inverse, deciding a borderline hit is not a match and letting the funds move — carries direct legal weight. Release the wrong payment and you have a sanctions breach. That call belongs to a named officer, not to the model that summarized the file.
The Wolfsberg standard already drew this line
This is not a MakerChecker preference. The Wolfsberg Group — the consortium of global banks that writes the de facto reference for financial-crime controls — names the four-eyes principle, the maker-checker split, as the control standard for exactly these dispositions. One actor prepares the case; a different, qualified actor reviews and signs. The reviewer carries the accountability for the call.
The principle predates AI by a century, and that is the point. When the maker was a screening analyst and the checker was a sanctions officer, the segregation was enforced by org charts, system roles, and sign-off forms. When the maker becomes an agent, the org chart stops helping. The agent does not sit in a reporting line. It cannot be coached, disciplined, or held personally liable for a sanctioned payment it waved through. The control has to move somewhere the agent cannot edit — and stay provable after the fact.
Why the prompt won't hold this line
The obvious fix is to instruct the agent: clear the obvious false positives, but never confirm a true match and never release a held payment without a human. This reads like a control. It is a request.
A prompt instruction has no record of who set the boundary, no version history when it changes, and no proof — months later, in an examination or a regulator's look-back — that the agent actually stopped where it was told. An agent that is re-prompted, upgraded, or jailbroken can quietly begin clearing the hits it was meant to escalate, and nothing in the file would show the boundary ever existed. "It was in the system prompt" is not evidence an examiner accepts.
A control plane — the layer that decides what an agent is allowed to act on, separate from the agent itself — moves the boundary out of the model and into enforced, versioned policy. The agent's authority to act on an alert is a grant attached to its role, denied by default. That grant includes "clear a policy-defined false positive." It does not include "confirm a true match" or "release a held payment." The agent can propose the disposition. It structurally cannot complete it.
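A deny-by-default grant check might look like the sketch below. The role names, the `Action` values, and the `GRANTS` table are hypothetical; in a real deployment the policy would live in a versioned store the agent cannot write to, rather than in a module-level dictionary.

```python
from enum import Enum


class Action(Enum):
    CLEAR_POLICY_FALSE_POSITIVE = "clear_policy_false_positive"
    PROPOSE_DISPOSITION = "propose_disposition"
    CONFIRM_TRUE_MATCH = "confirm_true_match"
    RELEASE_HELD_PAYMENT = "release_held_payment"


# Hypothetical policy, identified by a version so changes are traceable.
POLICY_VERSION = "v1"
GRANTS = {
    "screening_agent": frozenset({
        Action.CLEAR_POLICY_FALSE_POSITIVE,
        Action.PROPOSE_DISPOSITION,
    }),
    "sanctions_officer": frozenset({
        Action.CONFIRM_TRUE_MATCH,
        Action.RELEASE_HELD_PAYMENT,
    }),
}


def is_allowed(role: str, action: Action) -> bool:
    """Deny by default: unknown roles and unlisted actions carry no authority."""
    return action in GRANTS.get(role, frozenset())
```

The design point is that the check runs outside the model: the agent can ask for `RELEASE_HELD_PAYMENT` as often as it likes, and the answer for its role stays `False` until someone changes the policy on the record.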
The requester cannot approve its own request
Here is the rule that does the real work, and the one a prompt can never deliver. When the screening agent escalates a hit it cannot clear under policy, it becomes the requester. The control plane parks the run at an approval gate and demands a signature from a named officer — and the officer who signs cannot be the agent that raised it. The same actor provably cannot be both maker and checker on one alert.
This matters more with agents than it ever did with people, because an agent is fast enough to flag and self-clear thousands of hits before anyone notices the pattern. Structural segregation removes the possibility rather than auditing for it after the fact. The agent that prepared the case is barred from confirming or clearing it, full stop, and every attempt — including the refusals — lands in the record.
For the genuinely consequential dispositions, the gate can demand a quorum: two of three sanctions officers must sign before a held payment to a near-match party is released. That is an n-of-m approval — n sign-offs required from a pool of m authorized people — and the requester is never in the satisfying set. The mechanics are the same ones behind any human-in-the-loop approval gate, applied at the point where a sanctions error becomes a legal liability.
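As a rough sketch of the gate logic, assuming a hypothetical `ApprovalGate` class and made-up actor identifiers, requester exclusion and an n-of-m quorum fit in a small amount of code. Refused attempts are kept rather than discarded, mirroring the point that refusals belong in the record too.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalGate:
    requester: str        # the actor that raised the escalation
    approvers: frozenset  # pool of m authorized officers
    required: int         # n distinct sign-offs needed
    signatures: dict = field(default_factory=dict)  # signer -> stated reason
    refusals: list = field(default_factory=list)    # (signer, why refused)

    def sign(self, signer: str, reason: str) -> bool:
        """Record a sign-off, or record why it was refused."""
        if signer == self.requester:
            self.refusals.append((signer, "requester cannot approve its own request"))
            return False
        if signer not in self.approvers:
            self.refusals.append((signer, "signer is not in the authorized pool"))
            return False
        # Keyed by signer, so a repeat signature never counts twice.
        self.signatures[signer] = reason
        return True

    def satisfied(self) -> bool:
        """True once n distinct authorized signers, none of them the requester, have signed."""
        return len(self.signatures) >= self.required
```

For the two-of-three case above, the gate would be built with three officers in `approvers` and `required=2`; the requester check comes first, so the agent is excluded even if it were mistakenly added to the pool.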
What the officer signs is captured with its meaning intact: the disposition, the signer's identity, and the reason in their own words, linked to the exact version of the dossier they reviewed. That is what turns a true-match decision into defensible evidence rather than a ticked checkbox.
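One way to link a signature to the exact dossier version is to store a content hash of the dossier inside the signed record. The sketch below assumes that approach; the `SignedDisposition` fields and the helper names are illustrative, not a description of any particular product's schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedDisposition:
    alert_id: str
    disposition: str     # e.g. "true_match" or "not_a_match"
    signer: str
    reason: str          # the officer's own words
    dossier_sha256: str  # fingerprint of the dossier version reviewed


def sign_off(alert_id: str, disposition: str, signer: str,
             reason: str, dossier: bytes) -> SignedDisposition:
    """Build an immutable record that ties the decision to the dossier contents."""
    if not reason.strip():
        raise ValueError("a disposition needs a stated reason")
    fingerprint = hashlib.sha256(dossier).hexdigest()
    return SignedDisposition(alert_id, disposition, signer, reason.strip(), fingerprint)


def reviewed_this_version(record: SignedDisposition, dossier: bytes) -> bool:
    """Check whether a dossier is byte-for-byte the one the officer signed against."""
    return record.dossier_sha256 == hashlib.sha256(dossier).hexdigest()
```

If the dossier is edited after sign-off, `reviewed_this_version` returns `False` for the new contents, which makes it visible that the signature applied to an earlier version.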
What the examiner gets
Sanctions enforcement does not run on annual self-attestation; it runs on look-backs and discovery. When a regulator asks why a payment to a sanctioned party cleared, or why a customer who matched a list was onboarded, the question is specific: who decided this hit was not a match, on what basis, and prove the record has not been altered since.
A control plane answers it directly. Every action the agent took, every false positive it cleared under policy, every escalation it raised, and every human signature lands in an append-only, hash-chained, cryptographically signed ledger. Change one record and the chain visibly breaks. The output is an evidence bundle a third party can verify offline, against a published spec, without touching your systems — the posture you want when that third party is a regulator who distrusts the vendor by default.
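The tamper-evidence property comes from chaining: each record's hash covers the previous record's hash, so altering any entry invalidates everything after it. Here is a minimal sketch of that idea using only the standard library. The record layout and function names are assumptions, and the cryptographic signing layer is deliberately left out because it depends on key management this sketch does not cover.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Add a record whose hash covers both its payload and the prior record's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev_hash, "payload": payload,
                   "hash": _digest(prev_hash, payload)})


def verify(ledger: list) -> bool:
    """Recompute the chain from the start; any edited record breaks it."""
    prev_hash = GENESIS
    for record in ledger:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because `verify` needs nothing but the records themselves, a third party can run the same check offline on an exported bundle.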
This is also where the line with content guardrails matters. Tools that screen an agent's output for prompt injection or unsafe text answer "is this content dangerous?" They are useful, and they are not this. A control plane answers a different question entirely: "is this actor authorized to take this action, and who signed it?" You want both. Only one of them keeps a sanctions disposition in named human hands.
Done this way, the split holds at speed. The agent clears the noise that has swallowed sanctions teams for years, the institution keeps the judgment, and the decision that carries the legal exposure stays with a person, not as a policy hope but as a structural fact the audit trail can demonstrate to anyone who asks. It is the same discipline that governs AML alert triage, and it belongs in every queue across the bank's middle office.