The four-eyes principle is one of the oldest controls in regulated work, and one of the simplest to state: no single person commits a consequential action alone. One person prepares it; a second reviews and signs. Two pairs of eyes, hence the name. Banking calls it maker-checker. Auditors call it segregation of duties. Regulatory frameworks, from ICH-GCP to EU GMP Annex 16, expect it as a control for high-risk decisions.

The principle was written for humans. The interesting question now is what it means when the maker is a model: an AI agent that drafts an adverse-event seriousness call, prepares a batch release, or assembles a device-complaint reportability determination. The answer is not "add an AI reviewer." It is more demanding than that, and getting it wrong is how teams ship agents that look governed and are not.

What four-eyes actually requires

Strip the principle to its load-bearing parts and three conditions have to hold at once. Miss any one and you do not have four-eyes; you have a workflow that resembles it.

The two actors must be different. Not different in name only, but provably distinct, such that the one who prepared the work cannot also be the one who approves it. This is the whole point. A control that lets the maker quietly become the checker is not a control.

The approval must be a deliberate act by a named individual. Someone with authority puts their identity on the decision. "The system approved it" satisfies nothing; an auditor needs a who, not a what. This is why 21 CFR §11.50 requires a signed electronic record to carry the signer's printed name, the date and time of signing, and the meaning of the signature (such as review, approval, responsibility, or authorship). A signature with no recorded meaning is a keystroke, not a sign-off.
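As a rough illustration of what "name, signing time, and meaning" looks like as data, here is a minimal Python sketch. The class name SignatureManifest, its field names, and the ALLOWED_MEANINGS set are hypothetical, chosen for this example; they are not taken from the regulation's text or from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical list of meanings; a real deployment would define its own.
ALLOWED_MEANINGS = {"review", "approval", "responsibility"}


@dataclass(frozen=True)
class SignatureManifest:
    """The three things a sign-off must carry: who, when, and what it means."""
    signer_name: str
    signed_at: datetime
    meaning: str

    def __post_init__(self):
        # A signature without a named signer answers "what", not "who".
        if not self.signer_name.strip():
            raise ValueError("a signature needs a named signer")
        # Require an unambiguous timestamp.
        if self.signed_at.tzinfo is None:
            raise ValueError("signing time must be timezone-aware")
        # A signature with no recognised meaning is just a keystroke.
        if self.meaning not in ALLOWED_MEANINGS:
            raise ValueError(f"unknown signature meaning: {self.meaning!r}")


sig = SignatureManifest("Dr. A. Example", datetime.now(timezone.utc), "approval")
```

Rejecting an incomplete signature at construction time means a record without a meaning can never reach the audit trail in the first place.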

The reason must be captured. A checker who clicks "approve" with no record of why has produced a rubber stamp, and rubber stamps fail the moment they are examined. The reason is what turns a signature into evidence.

Those three conditions are the spec. Everything below is how you implement them when the maker is a machine.

Implementing it for an LLM pipeline

A language-model pipeline does not change the principle, but it does change where the control has to live. You cannot ask the model to police itself, because the model is the thing being checked. The four-eyes logic has to sit outside the agent, in a layer the agent cannot edit, reprompt, or talk its way around. That layer is a control plane, and it implements four-eyes through three mechanisms.

1. The requester cannot approve their own request. When an agent reaches an action that demands a sign-off, the control plane parks the run and opens an approval gate. The identity that raised the request is structurally barred from satisfying it. This is enforced, not advised: the attempt to self-approve is refused, and the refusal is recorded as evidence the control was tested and held.

2. Approval is n-of-m by named approvers. A gate can demand one signature or a quorum: two of three safety physicians, say, or a Qualified Person plus a second reviewer. Each approver is a named principal. "n-of-m" simply means n sign-offs are required from a pool of m authorized people, and the requester is never in the satisfying set. For a human-in-the-loop approval gate, this is the difference between one tired analyst and a real quorum.

3. The reason is recorded verbatim. Every signature captures the signer's stated reason at the moment of signing, bound to that specific run. The reason is not a free-floating comment; it is part of the signed record, so it cannot be edited afterwards without breaking the audit chain.
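The three mechanisms can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a description of any real control plane: ApprovalGate, SelfApprovalRefused, the principal strings, and the run identifier are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


class SelfApprovalRefused(Exception):
    """Raised when the requester tries to satisfy its own gate."""


@dataclass
class ApprovalGate:
    run_id: str
    requester: str
    approvers: frozenset          # pool of m authorized, named principals
    required: int                 # n sign-offs needed
    signatures: dict = field(default_factory=dict)  # approver -> verbatim reason
    refusals: list = field(default_factory=list)    # evidence the control held

    def sign(self, principal: str, reason: str) -> None:
        # Mechanism 1: the requester is barred, even if listed in the pool.
        if principal == self.requester:
            self.refusals.append((self.run_id, principal, "self-approval refused"))
            raise SelfApprovalRefused(principal)
        # Mechanism 2: only named principals from the pool count.
        if principal not in self.approvers:
            raise PermissionError(f"{principal} is not an authorized approver")
        if principal in self.signatures:
            raise ValueError(f"{principal} has already signed")
        # Mechanism 3: no reason, no signature; the reason is stored as given.
        if not reason.strip():
            raise ValueError("a sign-off needs a stated reason")
        self.signatures[principal] = reason

    @property
    def satisfied(self) -> bool:
        return len(self.signatures) >= self.required


gate = ApprovalGate(
    run_id="run-001",
    requester="agent:ae-triage",
    approvers=frozenset({"dr.lee", "dr.osei", "dr.park"}),
    required=2,
)
try:
    gate.sign("agent:ae-triage", "looks fine")   # refused and recorded
except SelfApprovalRefused:
    pass
gate.sign("dr.lee", "Hospitalisation documented; meets seriousness criteria.")
gate.sign("dr.osei", "Concur with seriousness assessment.")
```

The requester check runs before the pool check on purpose: a misconfigured pool that happens to include the agent's own identity still cannot satisfy the gate.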

Here is the same idea as a table, mapping the human control onto its machine implementation.

Four-eyes requirement	Human practice	Machine implementation
Two distinct actors	Maker and checker are different staff	Requester structurally barred from approving its own run
Named, deliberate sign-off	Wet or e-signature with meaning	Ed25519-signed approval by a named principal
Recorded reason	Margin note, sign-off comment	Reason captured verbatim, bound to the run
Provable after the fact	Paper trail, filing	Hash-chained, offline-verifiable audit export
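The last row of the table, a hash-chained log that can be verified offline, is the easiest to show in code. The sketch below uses only Python's standard library and assumes a simple scheme in which each entry's SHA-256 hash covers the previous entry's hash plus the canonical JSON of its payload. The field names and the GENESIS constant are hypothetical. Ed25519 signing of each entry is left out because it needs a third-party library, and a production design would also need to sign or anchor the head of the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry's "prev"


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every link; any edited payload or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"run": "run-001", "event": "self_approval_refused",
             "actor": "agent:ae-triage"})
append(log, {"run": "run-001", "event": "approved", "actor": "dr.lee",
             "reason": "Meets seriousness criteria."})

intact = verify(log)                           # chain as written
log[1]["payload"]["reason"] = "edited later"   # tamper with a recorded reason
tampered = verify(log)                         # chain after the edit
```

Because verification only needs the exported entries and a hash function, an examiner can run it without access to the system that produced the log.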

Why a second model is not a second pair of eyes

The most common shortcut is to bolt a second model onto the pipeline: a "reviewer" LLM that reads the first model's output and votes approve or reject. This is appealing because it scales and never sleeps. It is also not four-eyes, and an examiner will say so.

A second model is not a second pair of eyes. It is a second draft from the same kind of process: non-deterministic, unaccountable, and unable to bear responsibility. Four-eyes exists precisely because the second actor brings something the first cannot: independent judgement and personal accountability. A reviewer model has neither. It cannot be held responsible for a missed adverse-event signal. It cannot be deposed. It cannot, in any sense the law recognises, decide.

This matters because the controls that demand four-eyes demand a human at the specific point of consequence. Signing the seriousness call that triggers 15-day expedited reporting under 21 CFR 314.80 is a mandated human decision. Qualified Person batch release under EU GMP Annex 16 is a named human's personal certification. A device-complaint reportability determination under 21 CFR Part 803 is a named regulatory-affairs reviewer's call. None of these is satisfied by a model checking a model. Use a reviewer model freely (to triage, to catch obvious errors, to draft the case), but the signing eye must be human and named.
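One way to make "the signing eye must be human and named" concrete is to type the principals and count only human sign-offs toward approval. The Python sketch below is illustrative only: Kind, Principal, and valid_signoffs are hypothetical names, and a real system would establish a principal's kind through its identity provider rather than a field the caller sets.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HUMAN = "human"
    MODEL = "model"


@dataclass(frozen=True)
class Principal:
    name: str
    kind: Kind


def valid_signoffs(signoffs: list) -> list:
    """Keep only sign-offs from named humans; model votes are advisory."""
    return [p for p in signoffs if p.kind is Kind.HUMAN and p.name.strip()]


reviewer = Principal("reviewer-llm", Kind.MODEL)
physician = Principal("Dr. R. Example", Kind.HUMAN)

model_only = valid_signoffs([reviewer])              # nothing counts
with_human = valid_signoffs([reviewer, physician])   # only the physician counts
```

The reviewer model's vote can still be logged and used for triage; it simply never appears in the set that satisfies the gate.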

This is also where four-eyes and content guardrails part company. A guardrail product asks "is this output dangerous?" Four-eyes asks "did the right authorized person approve this action, and can we prove it?" The two are complementary. A guardrail catches the toxic draft; four-eyes catches the unauthorized approval. You want both, and you should not confuse one for the other.

The test that separates real from theatre

When you evaluate a four-eyes implementation, ignore the diagram and ask one question: can the agent that did the work approve its own work? If the honest answer is "we tell it not to" or "a reviewer model checks it," the control is advisory, and advisory controls fail under segregation-of-duties scrutiny.

If the answer is "no, the requester is structurally barred, a named human signs, the reason is recorded, and the whole thing is in a tamper-evident log," then you have implemented the four-eyes principle rather than gestured at it. That distinction is the entire difference between an agent you can put into production and an agent you have to explain away.
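The same test can be run mechanically over an exported audit trail. The Python sketch below assumes a hypothetical record shape (run, event, requester, approver, approver_kind, reason) invented for this example; it flags any approval where the requester signed its own work, the approver is not a human, or no reason was recorded.

```python
def find_violations(records: list) -> list:
    """Return (run, problem) pairs for approvals that fail the four-eyes test."""
    problems = []
    for r in records:
        if r.get("event") != "approved":
            continue  # refusals and other events are evidence, not approvals
        if r.get("approver") == r.get("requester"):
            problems.append((r["run"], "requester approved own work"))
        if r.get("approver_kind") != "human":
            problems.append((r["run"], "approver is not a named human"))
        if not r.get("reason", "").strip():
            problems.append((r["run"], "no recorded reason"))
    return problems


records = [
    {"run": "run-001", "event": "self_approval_refused",
     "requester": "agent:ae-triage", "approver": "agent:ae-triage"},
    {"run": "run-001", "event": "approved", "requester": "agent:ae-triage",
     "approver": "dr.lee", "approver_kind": "human",
     "reason": "Meets seriousness criteria."},
    {"run": "run-002", "event": "approved", "requester": "agent:batch",
     "approver": "agent:batch", "approver_kind": "model", "reason": ""},
]

violations = find_violations(records)
```

Here run-001 passes, including its recorded refusal, while run-002 fails on all three counts.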

The principle has survived three centuries of fraud because it is hard to fake. Faking it for AI is just as detectable, and the place an examiner looks is the audit trail, where the self-approval that never happened, and the one that was refused, are both written down.

