Segregation of duties is not a software feature. It is a centuries-old admission that any single person, however competent, however honest, should not be the only set of hands on a decision that can do real harm. The reviewer who prepares an adverse-event case does not sign the seriousness call. The analyst who prepares a batch record does not sign off on it. The control exists precisely because trust is not evidence, and because the cost of one undetected error or one quiet fraud is too high to leave to a single actor.

Now the actor is a model. An AI agent can draft the seriousness call, approve it, mark the case reportable, and write the note explaining why, in one unbroken run, in milliseconds, with nobody in the loop. Everything that made segregation of duties necessary for people applies to the machine, and one thing makes it worse: the agent does not get tired, does not feel doubt, and will execute a flawed plan at full speed without the hesitation that sometimes saves a human from themselves.

What the rule actually requires

Two regimes name this control directly, and neither is new.

In pharmaceutical manufacturing, 21 CFR 211.22 establishes the quality unit as an authority independent from production. The people who make the product do not get to be the people who release it. That structural separation, production proposes, quality disposes, is the spine of pharmaceutical quality. It is not advisory language; it is the rule that lets a finished batch reach a patient.

In pharmacovigilance, the predicate rules name the same control. Under 21 CFR 314.80, EU GVP, and ICH E2B(R3), the seriousness and expectedness determination on an adverse-event case sets the 15-day expedited reporting clock, and a named safety physician signs it. One actor prepares the case; a second, independent person reviews and signs the call. The reviewer who assembled the case cannot be the signer. Setting the reporting clock, escalating a signal, submitting an expedited report: these are decisions a single actor is not permitted to make alone.

Both rules say the same thing in different vocabularies. The actor who does the work must not be the actor who blesses it. For decades that meant two names on a form. The question now is what it means when one of the actors is an agent, and whether your current setup enforces it or merely hopes for it.

Flagging is not enforcing

Here is the distinction that decides whether you have a control or a comforting story. Most "governed" agent setups detect a self-approval after it has already happened. A monitor notices that the same agent prepared and approved a run, and raises an alert. A log entry is written. A dashboard turns amber.

That is detection, not segregation of duties. The improper action occurred. The clock was set, the batch was released, the report was filed, and you found out afterward. Detective controls have their place, but they are not what 211.22 or 314.80 ask for. Those rules demand that the improper combination of duties be impossible, not merely visible.

There is a second failure mode that looks more sophisticated and is just as weak: putting the rule in the prompt. You must not approve your own work; route approvals to a different agent. This is an instruction, not a constraint. A re-prompted, updated, or jailbroken agent can ignore it, and the prompt leaves no record of who authorized the separation or whether it held on a given date. An examiner asking "prove this agent could not approve its own run" will not accept a paragraph of natural-language guidance as proof.

Approach	What happens when an agent self-approves
Prompt instruction	It might comply. It might not. No record either way.
Post-hoc monitor	The action completes; you get an alert afterward.
Structural enforcement	The run is refused at the moment of the attempt.

Enforcing it structurally, at runtime

Structural enforcement means the boundary lives outside the agent, in a layer the agent cannot edit, and is checked at the instant of action rather than after it. This is the job of an agent control plane, the policy layer that sits between what an agent intends and what it is allowed to actually do.

MakerChecker enforces segregation of duties as a property of the run itself. Every agent acts as a named identity holding one role at a time, so there is always an answer to who did this. When a run reaches a step that requires approval, the control plane checks the identity of the actor proposing the approval against the identity that prepared the work. If they are the same actor, the approval is refused, not flagged, refused, and the refusal is recorded.

The same agent provably cannot be both maker and checker on one run. This is not a setting an agent can talk its way around, because the check does not run inside the agent. It runs in the control plane, against versioned policy, before the one-way action executes.

For decisions that need more than one reviewer, the same mechanism extends to n-of-m approval gates: a run parks until a quorum of named approvers signs, and the requester is barred from being one of them. That is the four-eyes principle made literal, a second, independent set of eyes that the system guarantees rather than requests. The signers' reasons are captured verbatim, so the signature carries its meaning the way 21 CFR 11.50 expects of a human one.

Why "cannot" beats "should not"

The difference between should not and cannot is the difference between a policy and a control. A policy describes intended behavior. A control makes the unintended behavior unavailable. Auditors have spent a long time learning to distrust the first and ask for the second, because "should not" is exactly the sentence that precedes every controls failure ever written up.

When the actor is a fast, tireless, occasionally jailbroken model, the gap widens. An agent that should not self-approve will, under the right prompt or the wrong edge case, do it repeatedly before anyone reads the dashboard. An agent that cannot self-approve fails closed on the first attempt, and the attempt itself becomes evidence, a recorded refusal in a tamper-evident audit trail that shows the control was live and working on the date in question.

That recorded refusal is the artifact that matters. It is the modern equivalent of a second signature missing from a form, not a story about your good intentions, but proof of what the system would and would not permit.

The same control, two industries

Segregation of duties is one of the few controls that reads identically across a drug safety team's pharmacovigilance process and a manufacturer's quality unit. The maker and the checker must be different actors. The requester cannot approve their own request. The separation must be enforced, not assumed, and provable after the fact.

The regulators wrote those rules for people because, at the time, only people did the work. The rules never said the actor had to be human, they said the actor who prepares must not be the actor who approves. Putting an agent on one side of that line, or both, does not change what the rule requires. It only changes what you have to build to keep it true.

See how it works, or book a demo to watch an agent get blocked from approving its own work, live.

Segregation of duties for AI agents

What the rule actually requires

Flagging is not enforcing

Enforcing it structurally, at runtime

Why "cannot" beats "should not"

The same control, two industries

How MakerChecker works, the six primitives

The four-eyes principle for AI workflows

What is maker-checker?

GMP batch release with AI agents

See an agent get stopped.