Every team running AI agents already has logs. Application logs, model-call traces, a SIEM (Security Information and Event Management, the central system that collects logs across an organisation) ingesting all of it. So when the question of an audit trail comes up, the instinct is reasonable: we log everything, we're covered. You are not covered. A log records what a system believes happened. Evidence proves the record itself was not changed afterward. Those are different claims, and a regulated buyer is being held to the second one.

The gap is easy to miss because, on a screen, a SIEM dashboard and a real audit trail look identical. Both show timestamped events in order. The difference only surfaces under the one condition that matters: when someone with a motive and access wants a line to read differently than it did when the agent wrote it.

What a log can and cannot prove

A standard log is a list of statements a system made about its own behaviour. It is useful, it is necessary, and it answers the question "what did we do?" It does not answer "can you prove that what I'm showing you is what we did?"

Consider the failure modes a regulator or opposing counsel will probe:

Mutation. A row is edited after the fact. A "blocked" becomes an "approved," a timestamp slides three hours, a "non-serious" becomes "serious" after the reporting clock has already run.
Deletion. A row is simply removed. The trace shows nine steps; there were ten.
Insertion. A row that never happened is added to make a story coherent: for example, a human approval that was never actually given.
Reordering. The sequence is rearranged so a check appears to precede the action it was supposed to gate.

A SIEM defends against external tampering reasonably well. It does very little against the person who runs the SIEM. Administrators have write access by design; log-forwarding pipelines can be reconfigured; retention can be shortened. "Trust us, we didn't touch it" is the entire security model, and it is exactly the assurance an inspector is paid not to accept. The control you need is one where altering the past is detectable, not merely discouraged.

Hash-chaining: making the past detectable

A tamper-evident log closes the gap with a technique that predates AI by decades. Each entry includes a cryptographic hash (a short, fixed-length fingerprint) of the entry before it. Every record is therefore mathematically welded to its predecessor, and that one to its predecessor, all the way back to the first.

The consequence is that you cannot change a single record in isolation. Edit one line and its fingerprint changes; the next line, which baked in the old fingerprint, no longer matches; and the break cascades forward to the end of the chain. There is no way to alter the middle of the history without rewriting everything after it, and rewriting everything after it is itself detectable, because the chain's final state is recorded and watched.

Hash-chaining turns silent mutation into a visible fracture. That is the whole move. You are not preventing a determined administrator from typing into the database. You are guaranteeing that if they do, the record announces it.
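The mechanics can be sketched in a few lines of Python. This is a minimal illustration, not MakerChecker's actual record format: the entry layout, the `GENESIS` placeholder, and the helper names `entry_hash`, `append`, and `first_break` are all assumptions made for the example, with SHA-256 standing in for whatever hash a real export uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Fingerprint of one entry: covers the event and its predecessor's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, event: dict) -> None:
    """Add an event, welding it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "event": event,
                  "hash": entry_hash(prev_hash, event)})


def first_break(chain: list):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        expected = entry_hash(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return None


chain = []
for step, decision in enumerate(["approved", "blocked", "approved", "approved"]):
    append(chain, {"step": step, "decision": decision})
recorded_head = chain[-1]["hash"]  # final state, stored where the editor cannot reach it

# Mutation: flip one decision in place, leaving the stored hash untouched.
mutated = [dict(e) for e in chain]
mutated[1] = {**mutated[1], "event": {"step": 1, "decision": "approved"}}

# Deletion: drop one entry from the middle.
deleted = chain[:2] + chain[3:]

# Insertion: splice in an entry that never happened, with a self-consistent hash.
fake_event = {"step": 99, "decision": "human-approved"}
fake = {"prev": chain[1]["hash"], "event": fake_event,
        "hash": entry_hash(chain[1]["hash"], fake_event)}
inserted = chain[:2] + [fake] + chain[2:]

# Reordering: swap two entries.
reordered = [chain[0], chain[2], chain[1], chain[3]]

# Full rewrite: change one event and recompute every hash after it.
rewritten = []
for e in chain:
    changed = {"step": 1, "decision": "approved"}
    append(rewritten, changed if e["event"]["step"] == 1 else e["event"])

print(first_break(chain))      # None: the untouched chain verifies
print(first_break(mutated))    # 1
print(first_break(deleted))    # 2
print(first_break(inserted))   # 3
print(first_break(reordered))  # 1
print(first_break(rewritten), rewritten[-1]["hash"] == recorded_head)  # None False
```

The last line is the interesting one. A full rewrite is internally consistent, so the chain alone cannot expose it; it is caught only because the final hash no longer matches the one recorded earlier.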

Signing: proving who, not just what

Hash-chaining proves the history is internally consistent. It does not, on its own, prove who produced it: a sufficiently motivated party could rebuild the entire chain from scratch and present a clean forgery.

That is what a cryptographic signature adds. MakerChecker signs its audit export with Ed25519, a modern public-key signature scheme. The holder of the private key signs the chain; anyone with the matching public key can verify the signature, but no one without the private key can produce a valid one. A forged history will not carry a valid signature, and a tampered history will not match the hash chain. To pass both tests at once, you would need the private key and you would need to redo the entire chain, and you still could not back-date it past anything already exported and signed.
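A rough sketch of that signing step, assuming the third-party `cryptography` package for Ed25519 (Python's standard library does not include it). Signing only the chain's final hash is a choice made for this example; the helper names `chain_head` and `is_authentic` are hypothetical, and none of this is taken from MakerChecker's code.

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def chain_head(events: list) -> bytes:
    """Fold the events into a hash chain and return the final hash (the head)."""
    head = b"\x00" * 32
    for event in events:
        body = json.dumps(event, sort_keys=True).encode("utf-8")
        head = hashlib.sha256(head + body).digest()
    return head


def is_authentic(public_key, events: list, signature: bytes) -> bool:
    """True only if the signature was made over exactly this chain of events."""
    try:
        public_key.verify(signature, chain_head(events))
        return True
    except InvalidSignature:
        return False


# The exporter holds the private key; verifiers only ever see the public key.
signing_key = Ed25519PrivateKey.generate()
public_key = signing_key.public_key()

events = [{"step": 0, "decision": "approved"},
          {"step": 1, "decision": "blocked"}]
signature = signing_key.sign(chain_head(events))

# A forger rebuilds a clean, self-consistent chain but lacks the private key.
forged_events = [{"step": 0, "decision": "approved"},
                 {"step": 1, "decision": "approved"}]
forger_key = Ed25519PrivateKey.generate()
forged_signature = forger_key.sign(chain_head(forged_events))

print(is_authentic(public_key, events, signature))                # True
print(is_authentic(public_key, forged_events, signature))         # False
print(is_authentic(public_key, forged_events, forged_signature))  # False
```

Because the signature covers the head of the chain, it transitively covers every entry behind it. The forger can reuse the genuine signature or make a new one, and neither verifies under the published public key.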

This is the difference between a log that says "this is true" and an export that lets a sceptic check that it is true.

The test that separates evidence from theatre: offline verification

Here is the question that quietly decides whether you have an audit trail or a dashboard. Can a third party verify the record without access to your systems?

Most "audit" features fail this test. They let you view history through the vendor's interface, on the vendor's infrastructure, rendered by the vendor's code. The viewer and the data live in the same trust boundary, which means the verification is only as honest as the party being verified. An auditor looking at your screen is not checking your evidence; they are checking your software's willingness to show it.

MakerChecker's audit export is offline-verifiable. The evidence bundle (the hash-chained, Ed25519-signed records) can be handed to a regulator, an external auditor, or opposing counsel and checked on their machine, against a published open specification, with the public key and a verification tool. No login to your environment. No call to a server you control. No trust in the vendor required at all. Because MakerChecker is open source, the verification logic is itself inspectable; nobody has to take the format on faith.
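To make "offline" concrete, here is a hedged sketch of what a third party's verifier could look like. The JSON bundle layout, the `GENESIS` value, and the function names are invented for illustration and are not MakerChecker's published specification; the Ed25519 calls again assume the `cryptography` package. The point is the verifier's inputs: the bundle bytes and a public key, nothing else.

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

GENESIS = "0" * 64  # assumed "previous hash" of the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_bundle(bundle_bytes: bytes, public_key_bytes: bytes) -> str:
    """Verifier side: needs only the bundle and the public key. No network calls."""
    bundle = json.loads(bundle_bytes)
    prev_hash = GENESIS
    for i, entry in enumerate(bundle["entries"]):
        expected = entry_hash(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return f"chain broken at entry {i}"
        prev_hash = entry["hash"]
    public_key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
    try:
        # The signature is assumed to cover the final hash of the chain.
        public_key.verify(bytes.fromhex(bundle["signature"]), bytes.fromhex(prev_hash))
    except InvalidSignature:
        return "signature invalid"
    return "verified"


def export_bundle(events: list, signing_key) -> bytes:
    """Producer side, included only so the example runs end to end."""
    entries, prev_hash = [], GENESIS
    for event in events:
        h = entry_hash(prev_hash, event)
        entries.append({"prev": prev_hash, "event": event, "hash": h})
        prev_hash = h
    signature = signing_key.sign(bytes.fromhex(prev_hash))
    return json.dumps({"entries": entries, "signature": signature.hex()}).encode("utf-8")


signing_key = Ed25519PrivateKey.generate()
public_key_bytes = signing_key.public_key().public_bytes(
    encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw)

events = [{"step": 0, "decision": "approved"}, {"step": 1, "decision": "blocked"}]
bundle = export_bundle(events, signing_key)

# An edited bundle: one decision changed, stored hashes left alone.
edited = json.loads(bundle)
edited["entries"][1]["event"]["decision"] = "approved"
edited_bundle = json.dumps(edited).encode("utf-8")

# A rebuilt bundle: hashes recomputed after the edit, original signature reused.
rebuilt = json.loads(export_bundle(
    [events[0], {"step": 1, "decision": "approved"}], Ed25519PrivateKey.generate()))
rebuilt["signature"] = json.loads(bundle)["signature"]
rebuilt_bundle = json.dumps(rebuilt).encode("utf-8")

print(verify_bundle(bundle, public_key_bytes))          # verified
print(verify_bundle(edited_bundle, public_key_bytes))   # chain broken at entry 1
print(verify_bundle(rebuilt_bundle, public_key_bytes))  # signature invalid
```

Nothing in `verify_bundle` reaches back to the system that produced the export, which is the property that lets an auditor run the check on their own machine.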

That property is what makes the record usable in the two settings a regulated buyer actually fears: a supervisory inspection and litigation discovery. In both, the other side is specifically not inclined to trust you. An export they can check independently is worth more than any amount of "our system shows."

Why this maps to 21 CFR 11.10(e)

None of this is a novelty invented for AI. The US Food and Drug Administration wrote the standard into 21 CFR Part 11, the rule governing electronic records and signatures, in the late 1990s. Section 11.10(e) requires "secure, computer-generated, time-stamped audit trails to independently record the date and time of operator entries and actions," and, critically, that record changes "shall not obscure previously recorded information."

Read that last clause again. The regulation does not ask you to avoid changing records. It requires that the trail make any change visible: the previous state cannot be obscured. That is the property hash-chaining delivers, written into federal rule a quarter-century before agentic AI existed. A SIEM that lets an administrator overwrite a row silently does not meet 11.10(e). A hash-chained, signed, offline-verifiable export is a direct, mechanical answer to it.

The same logic generalises across the regulated life sciences. ICH-GCP rests on the integrity of the clinical-trial record, and ALCOA+ spells out that data must stay attributable, original, and enduring. EU GVP holds a pharmacovigilance system to an audit trail behind every seriousness and expectedness call, an assurance only as good as the evidence behind it. The predicate rules are date-proof: they govern what the record must prove, regardless of whether a human or an agent generated the entries. We trace that point in 21 CFR Part 11 for AI agents and in the case for why now.

Where the audit trail comes from

A tamper-evident log is only as valuable as the events it captures. This is why the audit trail is not a bolt-on feature but the final primitive of an agent control plane: it records the output of all the others. When an agent is denied a capability it was never granted, that refusal is logged. When the same agent is structurally barred from approving its own work, the blocked attempt is logged. When a human signs an approval gate, the signature, the signer, and their stated reason are logged. The trail is a faithful record precisely because the events it records were enforced, not merely requested. You can read how those grants are constructed in deny-by-default permissions.
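As a loose illustration of "enforced, not merely requested", the sketch below has the enforcement point itself write each trail entry. The `ControlPlane` class, its method names, and the agent and approver names are hypothetical and do not reflect MakerChecker's API; the human signature on an approval is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical enforcement layer: every decision it makes lands in the trail."""
    grants: dict  # agent name -> set of granted capabilities
    trail: list = field(default_factory=list)

    def use_capability(self, agent: str, capability: str) -> bool:
        # Deny by default: anything not explicitly granted is refused, and logged.
        allowed = capability in self.grants.get(agent, set())
        self.trail.append({"event": "capability", "agent": agent,
                           "capability": capability,
                           "outcome": "allowed" if allowed else "denied"})
        return allowed

    def approve(self, author: str, approver: str, reason: str) -> bool:
        # The maker may never be the checker; the blocked attempt is itself recorded.
        if approver == author:
            self.trail.append({"event": "approval", "author": author,
                               "approver": approver, "outcome": "blocked"})
            return False
        self.trail.append({"event": "approval", "author": author,
                           "approver": approver, "reason": reason,
                           "outcome": "approved"})
        return True


plane = ControlPlane(grants={"triage-agent": {"read_case"}})
plane.use_capability("triage-agent", "read_case")           # granted
plane.use_capability("triage-agent", "submit_report")       # never granted
plane.approve("triage-agent", "triage-agent", "looks fine")  # self-approval attempt
plane.approve("triage-agent", "dr.osei", "seriousness confirmed")

for row in plane.trail:
    print(row["event"], row["outcome"])
# capability allowed
# capability denied
# approval blocked
# approval approved
```

In a full system, each of these rows would be appended to the hash chain rather than to a plain list, so the refusals and blocked attempts are as tamper-evident as the approvals.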

A log tells you a story. A tamper-evident, signed, offline-verifiable audit export lets a hostile third party confirm the story is true, without trusting you, your vendor, or your administrators. In a regulated industry, only the second one counts as evidence.

MakerChecker produces a hash-chained, Ed25519-signed audit export anyone can verify offline. See how it works, or book a demo to watch an agent get blocked from approving its own work, live.
