Integration · 6 min read

Govern Claude Agent SDK agents

A proxy session makes MakerChecker the authorization point and the evidentiary record while the Claude Agent SDK keeps executing the tools.

A team builds an agent on the Claude Agent SDK, gives it a handful of tools, and within a fortnight it is reading cases, querying systems, and drafting decisions faster than the analyst it was meant to assist. Then the compliance lead asks the question that ends pilots: who decided this agent could call that tool, and where is the proof it did not approve its own work? The build is excellent. The answer is missing. And so the agent stays in pilot.

The Claude Agent SDK is the toolkit Anthropic provides for building agents on Claude — it gives the model a loop, a set of tools it can call, and the plumbing to run multi-step work without you wiring each integration by hand. It is good at what it does. What it does not do is decide whether a particular agent, in a particular role, was authorized to take a particular action right now, and leave a record an examiner can verify. That is not a gap in the SDK. It is a different job, and it belongs in a different layer.

The SDK executes; it does not authorize

When an agent built on the SDK calls a tool — close_alert, release_batch, post_adjustment — the SDK faithfully runs it. That is the contract. The SDK's job is to let the model reach the tool, pass the arguments, and return the result to the loop. It is engineered to make the call, not to interrogate it.

So nothing in that path asks the three questions a regulator asks about a human in the same seat. Does this actor's role permit this capability today? Did this same actor already prepare the item it is now about to approve? Is there a tamper-evident record that someone can later check? These are authorization and evidence questions. The SDK was never designed to answer them, and a system prompt instructing the agent to "always get approval before closing an alert" is a request, not a control — the model can ignore it, drift from it, or be talked out of it, and either way nothing is recorded.

The honest framing is that a tool call is an authorization decision in disguise. Treating it as just a function is how a regulated team ends up with an agent that is fast, useful, and impossible to account for.

A proxy session makes MakerChecker the checkpoint

You do not fix this by rebuilding the agent on someone else's runtime. You put a checkpoint in front of the agent you already have. The mechanism is a proxy session — a session the agent opens with MakerChecker that sits between the agent's intent and the tool it wants to call. The agent no longer reaches the tool directly; it asks through the proxy, and the proxy answers before anything touches a real system.

The SDK keeps doing exactly what it is good at. The model still reasons, drives its loop, and executes the tool when the call is allowed. What changes is that the decision to allow it now lives outside the agent, where the agent cannot edit it. Two things happen inside the proxy session, and only two.

First, authorization. MakerChecker checks the requested action against the agent's role and its versioned grants. Capability is deny-by-default: only the specific skills the role was explicitly granted are open, and every grant carries a version and a record of who approved it. Because grants are versioned, you can reconstruct exactly what the agent was permitted to do on any past date. If the action falls outside the role, the proxy refuses — and the refusal is itself logged.
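As a rough illustration of deny-by-default capability with versioned grants, the sketch below keeps grants as plain records and answers two questions: what a role held on a given date, and whether a requested skill is open. Everything here is an assumption made for the example: the `Grant` fields, the `grants_on` and `authorize` helpers, and the role and skill names are hypothetical, not MakerChecker's actual data model or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Grant:
    # Hypothetical record shape, invented for this sketch.
    role: str
    skill: str
    version: int
    approved_by: str
    effective_from: date
    revoked_on: Optional[date] = None  # None means the grant is still in force


def grants_on(grants: list, role: str, day: date) -> dict:
    """Reconstruct the skills a role held on a given day, keyed by skill."""
    active = {}
    # Later versions are applied last, so they supersede earlier ones.
    for g in sorted(grants, key=lambda g: g.version):
        if g.role != role or g.effective_from > day:
            continue
        if g.revoked_on is not None and g.revoked_on <= day:
            active.pop(g.skill, None)
            continue
        active[g.skill] = g
    return active


def authorize(grants: list, role: str, skill: str, day: date) -> tuple:
    """Deny by default: only an explicit, in-force grant opens a skill."""
    grant = grants_on(grants, role, day).get(skill)
    if grant is None:
        return False, f"deny: role {role!r} has no grant for {skill!r}"
    return True, f"allow: grant v{grant.version} approved by {grant.approved_by}"
```

Because every grant carries its own effective and revocation dates, passing a past date to `authorize` replays the decision as it would have been made then, which is the reconstruction property the paragraph above describes.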

Second, the evidentiary record. Whether the action is allowed, denied, or parked for a human signature, the proxy writes it to a hash-chained, cryptographically signed ledger. The property that buys is simple: change one entry and the chain visibly breaks, and the resulting export can be verified offline by someone who does not trust the vendor and has no access to your systems.
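The tamper-evidence property can be sketched in a few lines using only the Python standard library. Each entry commits to the hash of the previous one, so editing any entry invalidates every check from that point on. This is a minimal sketch under stated assumptions: the entry layout and the `append` and `verify` helpers are hypothetical, and HMAC with a shared key stands in for the signature purely to keep the example dependency-free. A verifier who does not trust the vendor would need an asymmetric scheme with a published public key instead.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes to the same value.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def _sign(key: bytes, entry_hash: str) -> str:
    # Assumption: HMAC is a stand-in for a real asymmetric signature.
    return hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()


def append(ledger: list, key: bytes, event: dict) -> dict:
    """Append an event, chaining it to the hash of the previous entry."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "prev": prev, "event": event}
    entry_hash = _digest(body)
    entry = {**body, "hash": entry_hash, "sig": _sign(key, entry_hash)}
    ledger.append(entry)
    return entry


def verify(ledger: list, key: bytes) -> bool:
    """Recompute every link; any edited, dropped, or reordered entry fails."""
    prev = GENESIS
    for seq, entry in enumerate(ledger):
        body = {"seq": entry["seq"], "prev": entry["prev"], "event": entry["event"]}
        if entry["seq"] != seq or entry["prev"] != prev:
            return False
        if _digest(body) != entry["hash"]:
            return False
        if not hmac.compare_digest(_sign(key, entry["hash"]), entry["sig"]):
            return False
        prev = entry["hash"]
    return True
```

Note that `verify` needs only the exported entries and the verification key, not access to the system that wrote them, which is what makes an offline check possible.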

Note what does not happen in the proxy session. MakerChecker does not run the tool, does not host your agent, and does not own the integration. The work still executes inside the Claude Agent SDK. MakerChecker is the authorization point and the witness, not the engine.
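The division of labor can be pictured as a thin wrapper around a tool function: the decision and the record come from outside, and the tool itself still runs in the agent's own process. The sketch below is illustrative only. The `guarded` helper, the `decide` and `record` callbacks, and the decision strings are hypothetical stand-ins for the round trip to the checkpoint, and how such a wrapper attaches to a Claude Agent SDK tool is deliberately left out.

```python
from typing import Any, Callable


class Refused(Exception):
    """Raised when the checkpoint does not allow a tool call."""


def guarded(
    tool: Callable[..., Any],
    *,
    actor: str,
    skill: str,
    decide: Callable[[str, str, dict], str],  # returns "allow", "deny", or "hold"
    record: Callable[[dict], None],
) -> Callable[..., Any]:
    """Wrap a tool so every call is decided and recorded before it runs."""

    def call(**kwargs: Any) -> Any:
        decision = decide(actor, skill, kwargs)
        # The attempt is recorded whatever the outcome, including refusals.
        record({"actor": actor, "skill": skill, "args": kwargs, "decision": decision})
        if decision != "allow":
            raise Refused(f"{skill}: {decision}")
        # Execution stays on the agent's side; the checkpoint never runs the tool.
        return tool(**kwargs)

    return call
```

The wrapper holds no policy of its own. Swapping `decide` for a call to an external service moves the decision out of the agent's reach without changing the tool or the loop that calls it.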

What the agent can and cannot do once it is wrapped

The point of the checkpoint is not to give the agent more freedom. It is to stop the dangerous calls before they touch a real system, and to make each stop a piece of evidence rather than a silent failure.

The agent tries to…                    | What the proxy session does
Call a tool its role was never granted | Denies the call, logs the attempt with the missing grant named
Approve an item it prepared itself     | Refuses (the same actor cannot be maker and checker) and records it
Take a high-stakes, one-way action     | Parks the run at an approval gate and waits for a named person to sign

That second row is segregation of duties enforced structurally, not as advice. The same agent that prepared a piece of work provably cannot be the one that approves it on the same run. Not "should not" — cannot. That is the maker-checker principle, the four-eye standard the Wolfsberg Group names in finance and the quality-unit separation 21 CFR §211.22 requires in pharma, applied to a machine.
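One way to picture "cannot" rather than "should not" is a register that remembers who prepared each item and refuses any approval by that same actor. The sketch below is a minimal illustration: the `RunRegister` class, its method names, and the choice to refuse approvals for items with no recorded maker are all assumptions made for the example, not a description of how MakerChecker implements the rule.

```python
class MakerCheckerViolation(Exception):
    """Raised when an approval would break segregation of duties."""


class RunRegister:
    """Tracks which actor prepared each work item during a run."""

    def __init__(self) -> None:
        self._maker = {}  # item_id -> actor that prepared it

    def prepared(self, item_id: str, actor: str) -> None:
        # The first recorded maker sticks; later calls cannot overwrite it.
        self._maker.setdefault(item_id, actor)

    def check_approval(self, item_id: str, actor: str) -> None:
        """Raise unless a different, known actor prepared the item."""
        maker = self._maker.get(item_id)
        if maker is None:
            # Assumption: with no recorded maker, the safe answer is to refuse.
            raise MakerCheckerViolation(f"{item_id}: no recorded maker")
        if maker == actor:
            raise MakerCheckerViolation(
                f"{item_id}: {actor} prepared this item and cannot approve it"
            )
```

The check lives in the register rather than in the agent's prompt, so the agent has no instruction it could reinterpret or be talked out of.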

The third row matters for irreversible actions — filing a suspicious-activity report, releasing a batch, posting an adjustment. The proxy does not just deny; it routes the run to an n-of-m approval gate where a named human signs, the requester is barred from approving its own request, and the signer's reason is captured verbatim. Those are the signature manifestations 21 CFR §11.50 demands, applied to an agent's work.
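An n-of-m gate of the kind described above can be sketched as a small state object: m named approvers are eligible, n distinct signatures release the action, the requester is never eligible, and each reason is stored exactly as written. The `ApprovalGate` class and its field names are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalGate:
    action: str
    requester: str
    approvers: frozenset  # the m named people eligible to sign
    required: int  # the n distinct signatures needed to release
    signatures: dict = field(default_factory=dict)  # signer -> reason, verbatim

    def sign(self, signer: str, reason: str) -> None:
        if signer == self.requester:
            raise PermissionError("the requester cannot approve its own request")
        if signer not in self.approvers:
            raise PermissionError(f"{signer} is not a named approver for this gate")
        if signer in self.signatures:
            raise ValueError(f"{signer} has already signed")
        if not reason.strip():
            raise ValueError("a signature must carry a reason")
        self.signatures[signer] = reason  # stored as given, never rewritten

    @property
    def released(self) -> bool:
        return len(self.signatures) >= self.required


# Usage: a 2-of-3 gate parked in front of a one-way action.
gate = ApprovalGate(
    action="release_batch",
    requester="agent-7",
    approvers=frozenset({"a.khan", "m.rossi", "t.olsen"}),
    required=2,
)
gate.sign("a.khan", "Batch record reviewed; deviations closed.")
print(gate.released)  # False: one of the two required signatures
gate.sign("m.rossi", "QA release criteria met.")
print(gate.released)  # True
```

Rejecting a second signature from the same person is what keeps the count honest: two signatures always mean two people.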

Why this beats moving to a "governed platform"

The instinct, when governance is missing, is to re-home the agent on a new runtime that promises control built in. That is a migration: re-implementing prompts, re-wiring tools, re-testing every path your validation team already signed off, and re-earning the trust of whoever approved the pilot. In a regulated shop, that last part is the expensive one, and it is the surest way to stall a project that was finally working.

Wrapping inverts the trade. You keep the Claude Agent SDK, you keep the tools and loops your team already debugged, and governance arrives as a checkpoint at the tool boundary rather than a rewrite of the agent. The same pattern for other stacks is covered in "governing LangChain agents", and the general principle in "wrap existing AI agents without migrating". Because many SDK agents reach their tools through the Model Context Protocol, the mediation often lands at the MCP boundary; see "MCP-native agent governance".

It is worth being precise about what this layer does and does not do. A proxy session answers one question: is this actor authorized to do this? It does not inspect whether the content the model produced is toxic, hallucinated, or unsafe. That is the job of a guardrail product, and a good one belongs in the stack too. The two are complementary. One asks whether the output is dangerous; the proxy asks whether the actor was allowed.

The path out of pilot

Agents stall in pilot for one reason, and it is rarely capability. It is that nobody can sign off on letting an unaccountable actor touch production systems. That sign-off is a question of evidence, not enthusiasm, and a proxy session is what produces the evidence — without asking your team to rebuild the agent they spent quarters tuning.

And do not wait for a rulebook that names this case. US model-risk guidance was rewritten in April 2026 to scope agentic AI out, so there is no supervisory template telling you precisely how to govern an SDK agent. No template is not the same as no obligation: the predicate rules governing what a human in the seat must do never moved, and discovery does not wait for new guidance. A self-hosted checkpoint that can run air-gapped, on Postgres you already operate, is how an agent your team already trusts starts answering those questions today.


See how it works, or book a demo to watch an agent get blocked from approving its own work — live.
