A drug-safety team does not get to choose its workload. A new label warning, a season of flu shots, a litigation campaign, a viral social-media post: any of these can multiply adverse-event intake overnight, and the regulatory clock does not slow down to match. Serious cases carry expedited timelines measured in days. Miss them and the consequence is not an internal slip; it is a reportable failure of the entire pharmacovigilance system.

So the pull toward AI agents is obvious. Adverse-event intake is repetitive, high-volume, and language-heavy, exactly the work a model handles well. An agent can read a messy email, a call-centre transcript, or a literature abstract and turn it into a structured case faster than a human can open the form. The temptation is to hand it the whole pipeline. That is the mistake.

Pharmacovigilance, often shortened to PV, is the science of detecting and assessing the harms of medicines once they are on the market. It runs on a distinction the technology is happy to ignore: the difference between handling a case and judging one.

What an agent should do: structure and prioritise

Most of an adverse-event case is assembly, not judgement. A report arrives as free text and has to become a structured record in a defined format. The global standard for that record is ICH E2B, the electronic format in which regulators expect an Individual Case Safety Report (ICSR) to arrive. It specifies how the patient, the suspect drug, the event, and the source are coded.

This is honest work for an agent. Given a raw report, it can:

Extract the four minimum criteria for a valid case: an identifiable patient, an identifiable reporter, a suspect product, and an event.
Map the described reaction to the right coded term so cases group correctly.
Populate the E2B fields and flag what is missing, so a follow-up question goes out the same day.
De-duplicate against existing cases and sort the queue, pushing the ones that look serious to the front.

Done well, this is the difference between a safety team buried in backlog and one that meets its deadlines. The agent is fast, consistent, and tireless, and none of those qualities require it to decide anything that matters. It prepares. It proposes. It does not conclude.
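The first item on that list, the four minimum criteria, is simple enough to sketch. What follows is an illustration under stated assumptions, not a real intake schema: `DraftCase`, its field names, and `missing_criteria` are hypothetical, and a real E2B record carries far more structure than four strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DraftCase:
    """Hypothetical draft record; field names are illustrative, not E2B element names."""
    patient: Optional[str] = None          # any identifier: initials, age, sex
    reporter: Optional[str] = None
    suspect_product: Optional[str] = None
    event: Optional[str] = None


# The four minimum criteria named in the text, in a fixed order.
MINIMUM_CRITERIA = ("patient", "reporter", "suspect_product", "event")


def missing_criteria(case: DraftCase) -> List[str]:
    """Return the minimum criteria that are still empty or blank."""
    return [
        name for name in MINIMUM_CRITERIA
        if not (getattr(case, name) or "").strip()
    ]


def is_valid_case(case: DraftCase) -> bool:
    """A draft is a valid case only when all four criteria are present."""
    return not missing_criteria(case)


draft = DraftCase(patient="F, 62", suspect_product="Drug X", event="rash")
print(missing_criteria(draft))  # the follow-up question to send today
```

The useful output is the list of gaps, not the boolean: it is what turns into the same-day follow-up question.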

What stays human: seriousness and causality

Two judgements in PV are not clerical, and they are where regulatory liability concentrates.

Seriousness sets the reporting clock. A case judged serious triggers an expedited timeline; a case judged non-serious does not. Wrongly judge a serious case non-serious and you have missed a mandated deadline. The assessment turns on defined criteria (death, a life-threatening event, hospitalisation, disability, congenital anomaly) applied to ambiguous human accounts. An agent can flag candidates. It must not be the actor of record that decides and starts or stops the clock.
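To make "flag candidates" concrete, here is a minimal sketch. It assumes crude keyword matching as a stand-in for whatever model or classifier does the real work, and the cue lists and function names are invented for illustration. The point is the shape of the output: a list of criteria for a person to look at and a queue position, never a seriousness verdict.

```python
from typing import List

# Hypothetical cue lists, one per criterion named in the text.
# Keyword matching is a placeholder for a real model, nothing more.
SERIOUSNESS_CUES = {
    "death": ("died", "death", "fatal"),
    "life-threatening": ("life-threatening", "life threatening"),
    "hospitalisation": ("hospitalised", "hospitalized", "admitted"),
    "disability": ("disability", "disabled"),
    "congenital anomaly": ("congenital", "birth defect"),
}


def flag_seriousness_candidates(narrative: str) -> List[str]:
    """Return criteria the narrative may touch. A proposal, not a determination."""
    text = narrative.lower()
    return [
        criterion
        for criterion, cues in SERIOUSNESS_CUES.items()
        if any(cue in text for cue in cues)
    ]


def triage_priority(narrative: str) -> str:
    """Order the queue only; an empty flag list does not mean non-serious."""
    return "review-first" if flag_seriousness_candidates(narrative) else "standard"


print(flag_seriousness_candidates("Patient was admitted to hospital and later died."))
```

Nothing here writes a seriousness field. A case with no flags still reaches the qualified person; it just arrives later in the queue.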

Causality is the judgement of whether the drug plausibly caused the event. It weighs timing, dechallenge and rechallenge, confounders, and the patient's history into a clinical opinion that feeds signal detection across the whole product. This is qualified medical judgement. Handing it to a model that cannot be held accountable, and cannot explain itself to an inspector in the terms the inspector uses, is not efficiency. It is an unrecorded decision waiting to be discovered.

The line is clean. The agent owns throughput. The qualified person owns the call. The system has to make that line structural, not a matter of good intentions, which is precisely the job of an AI agent control plane.

The control that makes the split real

Saying "a human signs off" is not a control. Anyone can paste an agent's conclusion into a field and click approve. The control has to guarantee that the actor who prepared the case cannot be the actor who judges its seriousness or causality, and that the judgement, with its reasoning, is captured in a form an inspector can verify years later.

This is the maker-checker principle, the four-eye control that quality and safety functions have run on for decades, applied to a machine. In pharma it is the logic behind 21 CFR §211.22: the quality unit's responsibilities are separated from the work it oversees. A control plane enforces the same separation at runtime.

Step	Actor	Control
Intake and E2B structuring	Agent	Deny-by-default, versioned skill grant
Queue triage and prioritisation	Agent	Recorded, reversible, not a final call
Seriousness determination	Qualified person	Approval gate; requester cannot self-approve
Causality assessment	Qualified person	Approval gate with reason captured verbatim
Case lock and submission	Qualified person	Signed, hash-chained audit entry

The structural part matters. It is not enough for policy to say the agent should stop at triage. The same agent must provably not be able to act as both the maker of the case and the checker of its seriousness on a single run. The attempt to self-approve is refused, and the refusal itself is logged, which is often the evidence an inspector most wants to see.
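A minimal sketch of such a gate, assuming a hypothetical `ApprovalGate` class and actor-ID strings invented for this example (this is not MakerChecker's API). It shows the two properties described above: the requester cannot approve its own request, and the refused attempt is itself recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class SelfApprovalRefused(Exception):
    """Raised when the actor who prepared a case tries to approve it."""


@dataclass
class ApprovalGate:
    decision: str                  # e.g. "seriousness" or "causality"
    requester: str                 # actor who prepared the case (the maker)
    log: List[Dict[str, str]] = field(default_factory=list)

    def approve(self, approver: str, outcome: str, reason: str) -> Dict[str, str]:
        # The maker can never be the checker. The refusal is logged before raising.
        if approver == self.requester:
            self.log.append({
                "event": "refused",
                "decision": self.decision,
                "actor": approver,
                "why": "requester cannot self-approve",
            })
            raise SelfApprovalRefused(f"{approver} prepared this case")
        # A judgement without stated reasoning is not accepted.
        if not reason.strip():
            raise ValueError("a stated reason is required")
        entry = {
            "event": "approved",
            "decision": self.decision,
            "actor": approver,
            "outcome": outcome,
            "reason": reason,      # stored exactly as given
        }
        self.log.append(entry)
        return entry


gate = ApprovalGate(decision="seriousness", requester="agent:intake")
try:
    gate.approve("agent:intake", "non-serious", "no criteria met")
except SelfApprovalRefused:
    pass  # expected: the attempt is refused and recorded
gate.approve("qp:dr-lee", "serious", "Hospitalisation documented in narrative.")
print([e["event"] for e in gate.log])
```

In this sketch the check compares plain strings. A real system would have to tie those IDs to authenticated identities, otherwise the gate only moves the problem.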

Why the record is the point

An adverse-event case is, in the end, a chain of decisions someone will be asked to defend. Who structured this case? On what date, under which version of the triage rules? Who judged it non-serious, and on what stated reasoning? Was the record altered after the fact?

Part 11 makes those questions concrete. §11.10(e) requires a tamper-evident audit trail of who did what and when. §11.50 requires that an electronic signature carry its meaning: review, approval, responsibility. §11.70 binds that signature to the specific record so it cannot be lifted and reused. A safety system built around agents has to satisfy all three for both the agent's actions and the human's, or the speed it bought you evaporates the first time the record is tested. We cover that obligation in depth in Part 11 and AI agents.
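The binding idea can be sketched in a few lines. This is an illustration only, with hypothetical names (`Signature`, `sign`, `applies_to`) and an assumed canonical-JSON digest. A digest reference is not by itself a cryptographic signature; a production system would also sign it with the signer's key.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

# Meanings taken from the text above; the allowed set is an assumption.
ALLOWED_MEANINGS = {"review", "approval", "responsibility"}


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Signature:
    signer: str
    meaning: str           # what the signature means
    signed_at: str         # UTC timestamp, ISO 8601
    record_sha256: str     # ties the signature to one specific record


def sign(record: dict, signer: str, meaning: str) -> Signature:
    if meaning not in ALLOWED_MEANINGS:
        raise ValueError(f"unknown signature meaning: {meaning!r}")
    now = datetime.now(timezone.utc).isoformat()
    return Signature(signer, meaning, now, record_digest(record))


def applies_to(signature: Signature, record: dict) -> bool:
    """False if the signature was lifted onto a different or altered record."""
    return signature.record_sha256 == record_digest(record)


case_a = {"case_id": "A-1", "seriousness": "serious"}
case_b = {"case_id": "B-2", "seriousness": "non-serious"}
sig = sign(case_a, "qp:dr-lee", "approval")
print(applies_to(sig, case_a), applies_to(sig, case_b))
```

Because the digest covers the record's content, editing the case after signing breaks the match just as moving the signature to another case does.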

MakerChecker produces exactly this evidence as a by-product of doing the work. Every model call, every grant, every gate, every signature lands in an append-only, hash-chained, cryptographically signed ledger. Change one entry and the chain visibly breaks. The export verifies offline, against an open spec, without anyone needing access to your systems, which is the form of proof an inspector trusts.
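For readers who have not met a hash chain before, here is a generic sketch of why one altered entry breaks everything after it. It is not MakerChecker's ledger format or its open spec, neither of which is reproduced here: the entry layout and function names are assumptions, and the signing step is left out.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(ledger: List[dict], payload: dict) -> dict:
    """Append-only write: each entry commits to everything before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    ledger.append(entry)
    return entry


def verify_chain(ledger: List[dict]) -> Optional[int]:
    """Return the index of the first broken entry, or None if the chain holds."""
    prev_hash = GENESIS
    for i, entry in enumerate(ledger):
        if entry["prev"] != prev_hash:
            return i
        if entry["hash"] != _digest(entry["prev"], entry["payload"]):
            return i
        prev_hash = entry["hash"]
    return None


ledger: List[dict] = []
append_entry(ledger, {"actor": "agent:intake", "action": "structured case"})
append_entry(ledger, {"actor": "qp:dr-lee", "action": "seriousness", "outcome": "serious"})
append_entry(ledger, {"actor": "qp:dr-lee", "action": "case lock"})
print(verify_chain(ledger))                    # intact chain
ledger[1]["payload"]["outcome"] = "non-serious"
print(verify_chain(ledger))                    # index of the altered entry
```

Verification needs only the exported entries and the hashing rule, which is what makes an offline check possible in principle.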

The honest version of the pitch

AI agents will not replace the qualified person in pharmacovigilance, and any vendor who implies otherwise is selling the part of the job that carries the liability. What agents replace is the backlog: the hours spent transcribing, coding, and sorting before a human ever gets to think.

That is a worthwhile trade only if the split between machine throughput and human judgement is enforced and recorded, not promised. The same logic governs the adjacent regulated reporting obligations (see how it applies to FDA medical device reporting), and it is the difference between an agent you can put into production and one you have to keep in a pilot you cannot defend.

See how it works, or book a demo to watch an agent get blocked from approving its own work, live.
