What the FDA-EMA Joint AI Principles Really Mean for the Companies Building AI in Life Sciences
In January 2026, the FDA and EMA did something quietly significant. They published ten joint guiding principles for the use of AI across the drug development lifecycle — the first time these two agencies have formally aligned on how artificial intelligence should be governed in the pharmaceutical world.
The document is short. It’s non-binding. And most of the coverage so far has treated it as a policy milestone worth a press release and a LinkedIn post. But if you’re actually building AI systems for regulated life sciences — not just commenting on them — these ten principles are something else entirely: they’re a preview of what becomes binding. They are the design constraints your platform needs to satisfy today if you don’t want to be retrofitting tomorrow.
I’ve spent the past several years building medically-tuned AI at Sorcero, working with pharmaceutical and medical affairs teams who operate under some of the most demanding compliance requirements in any industry. When I read these principles, I didn’t see a regulatory document. I saw an architecture spec.
Three of the ten principles stood out to me as the ones that will separate companies that are genuinely ready from those that have been treating compliance as a checkbox exercise. One other thing stood out too: a critical gap the principles don’t yet address.
The Quiet End of “General-Purpose AI” in Pharma – Principle #4: Clear Context of Use
Of all ten principles, Principle #4, “Clear Context of Use,” may be the most consequential for how life sciences AI gets built. It requires that every AI system have a well-defined context of use: its role, its scope, what inputs it expects, and how its outputs should be interpreted.
On the surface, that sounds obvious. In practice, it’s a direct challenge to the prevailing approach in our industry: take a general-purpose large language model, wrap it in a compliance-sounding system prompt, and call it “life sciences ready.” The problem? A model that can do anything is, from a regulatory perspective, defined to do nothing in particular. It can’t be clearly scoped. It can’t be rigorously validated. And it can’t be held accountable when something goes wrong.
In agentic AI systems — where multiple specialized components work together to plan, retrieve, reason, generate, and collaborate with human experts — this principle becomes even more critical. Each agent in the orchestration chain needs its own well-defined context of use. The retrieval component has a different role and scope than the summarization component, which has a different role than the component responsible for checking factual grounding. Treating them as a single undifferentiated system isn’t just an architectural shortcut — it’s a compliance gap, because you can’t validate what you haven’t clearly defined, and you can’t audit decisions that were never explicitly scoped.
We encountered this directly when a pharmaceutical customer escalated their requirements around detecting commercial intent in queries to our platform. Our system already enforced boundaries between scientific and commercial content — that’s foundational to serving Medical Affairs teams. But this customer went further: their legal team required explicit, testable proof that commercially-intended queries could not produce promotional output under any circumstances. It became a contractual requirement. Other customers addressed this primarily through employee training and policy — “your people should know not to do this.” This customer wanted the platform itself to make it impossible.
Their initial assumption was that it would be a prompt-level fix: add an instruction telling the model to refuse commercial requests. But that’s exactly the kind of surface-level approach that Principle #4 argues against. A prompt instruction can be circumvented, misinterpreted, or hallucinated around. It doesn’t provide a traceable, auditable control.
So instead, we built the detection into our orchestration layer — the point in the system where the AI decides which downstream agents to invoke. When commercial intent is detected, the system doesn’t try to generate a filtered response and hope the filter works. It prevents the downstream agents from executing entirely and guides the user toward appropriate use. Because the control sits upstream of generation — before any retrieval, summarization, or content creation happens — there’s no output to manipulate. The boundary is architectural, not probabilistic.
That’s the difference between compliance as a prompt hack and compliance as an architectural property. Principle #4, taken seriously, demands the latter.
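To make the distinction concrete, here is a minimal sketch of an orchestration-layer gate of the kind described above. Everything in it is illustrative: the marker list, function names, and response shape are assumptions for the sake of the example, not Sorcero's actual implementation (which would use a trained intent classifier, not keyword matching). The point it demonstrates is structural: the check runs before any downstream agent executes, so there is no generated output to filter or manipulate.

```python
# Illustrative only: a real system would use a trained intent classifier,
# not keyword markers. The structure, not the detector, is the point.
COMMERCIAL_MARKERS = {"discount", "pricing", "promote", "sales pitch", "market share"}

def detect_commercial_intent(query: str) -> bool:
    """Toy stand-in for a commercial-intent classifier."""
    q = query.lower()
    return any(marker in q for marker in COMMERCIAL_MARKERS)

def orchestrate(query: str, agents: list) -> dict:
    # The gate sits upstream of retrieval, summarization, and generation.
    # When it fires, no downstream agent is ever invoked.
    if detect_commercial_intent(query):
        return {
            "status": "blocked",
            "agents_invoked": [],
            "message": "This platform supports scientific exchange only; "
                       "commercial requests are out of scope.",
        }
    outputs = [agent(query) for agent in agents]
    return {
        "status": "ok",
        "agents_invoked": [a.__name__ for a in agents],
        "outputs": outputs,
    }
```

Because the refusal is a branch in the control flow rather than an instruction to a model, it is testable in the ordinary software sense: a legal team can be shown a unit test proving that a blocked query invokes zero agents.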

Compliance as Architecture vs. Compliance as a Wrapper
Provenance Is Necessary, but Not Sufficient – Principle #6: Data Governance and Documentation
Principle #6 “Data Governance and Documentation” calls for data source provenance, processing steps, and analytical decisions to be documented in a detailed, traceable, and verifiable manner. For anyone who has operated in GxP-regulated environments, this principle is familiar territory. Life sciences companies have been doing data lineage in some form for decades.
Where it gets interesting — and where I believe the principles haven’t fully caught up with the technology — is in what happens when your AI system isn’t a single model processing a single dataset, but an orchestrated chain of agents making sequential decisions, working alongside human experts who review, intervene, and redirect.
In that context, data provenance alone is insufficient. You need decision provenance — a complete trace that captures not just the data that went in, but every consequential choice in the workflow. When a retrieval agent selects certain documents over others, when a reranking component reorders results based on relevance scoring, when a hallucination-checking mechanism flags a claim as ungrounded — each of those is an analytical decision that shapes the final output. And critically, when a human expert reviews the AI’s output and accepts, modifies, or overrides it — that’s a decision too, and it needs to be part of the same audit trail. The principles call for traceability; in a system where humans and AI agents collaborate, traceability means the full chain: what the AI did, what the human saw, what they decided, and why.
This matters practically, not just philosophically. When a Medical Science Liaison questions why the platform surfaced a particular insight, or when a quality team needs to audit how a document was generated, the ability to reconstruct the full decision chain — from query to retrieval to ranking to generation to human review — is what separates a trustworthy system from a black box that happens to produce plausible output.
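A decision-provenance record of this kind can be sketched in a few lines. The schema below is an assumption made for illustration, not a real Sorcero data model: the essential property is that retrieval choices, reranking, grounding checks, and the human reviewer's verdict all land in one append-only trace that can be replayed end to end.

```python
# Hypothetical decision-provenance trace for an agentic pipeline.
# Field names and actors are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionEvent:
    actor: str      # e.g. "retrieval_agent", "reranker", "human_reviewer"
    action: str     # what was decided
    rationale: str  # why: relevance score, grounding result, reviewer note
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

@dataclass
class DecisionTrace:
    query: str
    events: list = field(default_factory=list)

    def record(self, actor: str, action: str, rationale: str) -> None:
        # Append-only: events are never edited or removed.
        self.events.append(DecisionEvent(actor, action, rationale))

    def chain(self) -> list:
        """Reconstruct the full decision chain for an audit."""
        return [(e.actor, e.action) for e in self.events]
```

Note that the human reviewer is just another actor in the trace: accepting, modifying, or overriding the AI's output is recorded with the same structure as any agent decision, which is what makes the full human-AI chain reconstructable.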
This is the standard we’re building toward at Sorcero. Our architecture is designed to produce a full decision trace at every step of the pipeline — but critically, whether and where that trace is stored is governed by the customer, not by us. Many pharmaceutical organizations have strict policies around what data can be retained by a vendor, and we respect that. When customers enable audit trail storage, we maintain it securely for analysis and auditability. Where they don’t, we’re building toward models where the provenance record can reside on the customer’s infrastructure while we retain the access needed for support and continuous improvement. The platform provides the capability for full decision provenance; the customer governs the policy.
We didn’t arrive at this because we anticipated these specific FDA-EMA principles. We arrived at it because our customers — pharmaceutical teams responsible for pharmacovigilance, scientific communication, and medical affairs — told us directly: they need auditability, but they also need data sovereignty. That dual demand has shaped our roadmap from the start, and these principles now give us a shared language for why that investment matters.
But I’d challenge the principles to go further. As the industry moves toward agentic AI, governance frameworks need to explicitly address multi-step decision chains — including the human decisions within them — not just data lineage. And they should acknowledge that in enterprise life sciences, the question of who stores the audit trail is as important as whether one exists. The principles are a strong starting point, but the next iteration should account for the real-world data governance constraints that pharmaceutical companies operate under.

Decision Provenance in an Agentic System
The Most Underappreciated Principle – Principle #8: Risk-Based Performance Assessment
If you noticed that human-in-the-loop kept surfacing in the previous two sections, that’s not an accident. It’s because Principle #8 “Risk-based Performance Assessment” ties them together — and it may be the most underappreciated of all ten.
The AI industry has a benchmarking obsession. F1 scores, BLEU, ROUGE, perplexity — the entire evaluation culture is built around measuring model performance in isolation. Principle #8 asks a fundamentally different question: how does the complete system perform, including the humans who interact with it?
This is the principle that most of the industry is least prepared for. It calls for evaluating “human-AI interactions” using metrics appropriate for the intended context of use. In plain language: it’s not enough to show that your model is accurate. You need to show that when a Medical Affairs professional uses your system in their actual workflow, the combined human-AI system produces reliable outcomes.
That reframing changes everything about how you design, test, and monitor AI systems. It means your evaluation framework can’t stop at the API boundary. It has to account for how outputs are presented, how easy it is for a domain expert to verify the AI’s reasoning, whether the interface supports or undermines critical review, and whether the system degrades gracefully when it encounters edge cases.
At Sorcero, this principle maps to a design philosophy we’ve operationalized: we separate the evaluation of “did the AI retrieve the right evidence and follow its instructions?” from the evaluation of “is the final output scientifically sound and fit for purpose?” The first question is an engineering question — it’s about system performance, and it can be partially automated. The second question is a domain question — it requires human expertise and judgment, and the system must be designed to facilitate that judgment, not replace it.
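The separation described above can be sketched as two distinct evaluation functions. This is a minimal illustration under assumed names and metrics, not Sorcero's evaluation framework: the point is that the automatable engineering check and the human domain verdict are kept as separate signals, and only their combination clears an output for use.

```python
# Illustrative two-layer evaluation: an automatable system check,
# plus a human verdict the system cannot supply on its own.
def system_check(output: dict, expected_sources: set) -> dict:
    """Engineering layer: did the pipeline retrieve the right evidence
    and stay grounded? Automatable, runs on every response."""
    cited = set(output.get("citations", []))
    return {
        "evidence_recall": len(cited & expected_sources) / max(len(expected_sources), 1),
        "all_claims_grounded": output.get("ungrounded_claims", 1) == 0,
    }

def combined_verdict(system_result: dict, expert_approved: bool) -> bool:
    # The system can fail an output on its own, but only a human expert
    # can pass it: the design facilitates judgment rather than replacing it.
    return system_result["all_claims_grounded"] and expert_approved
```

The asymmetry in `combined_verdict` is deliberate: automated checks act as a floor that catches ungrounded output, while the expert's approval remains a necessary condition that no metric can substitute for.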
What that looks like in practice depends on the workflow. In our scientific communications work, human-in-the-loop means medical writers and SMEs review, validate, and approve AI-generated content before it reaches its intended audience — the AI drafts, the expert decides. In our medical intelligence platform, the dynamic is different: MSLs and Medical Directors use AI-surfaced insights to inform scientific strategy, respond to medical inquiries, or identify emerging trends. The AI’s job is to surface the right evidence, in the right context, so the expert can make an informed decision. In both cases, the AI accelerates the work; the human judgment is what makes it trustworthy.
This is why human-in-the-loop isn’t a feature we bolt on. It’s a structural requirement embedded in how we design every workflow. When regulators evaluate the performance of AI systems in life sciences, they’ll be looking at the whole chain: what the AI produced, how the human interacted with it, and whether the combined system delivered a reliable result.
What’s Missing: The Agentic Governance Gap
I want to be clear: these principles are a strong foundation, and the fact that the FDA and EMA aligned on them is significant. But they were written for the AI landscape of 2024, and the systems being deployed in 2026 have already moved beyond what the principles explicitly address.
The most notable gap is agentic AI governance. These principles assume an AI system that supports human decision-making — that generates evidence, produces summaries, or flags risks for a human to act on. They don’t yet address AI systems that act — that plan multi-step workflows, invoke specialized components, make routing decisions, and produce complex outputs through chains of autonomous reasoning.
In agentic systems, the governance questions multiply. Who is accountable when an orchestration agent routes a task to the wrong specialist agent? How do you validate a system where the execution path varies based on the input? What does “lifecycle management” look like when the system’s behavior is emergent from the interaction of multiple agents?
These aren’t hypothetical questions. They’re the questions we’re answering in production today, and I believe they represent the next frontier for regulatory guidance. I’d welcome collaboration between platform builders, regulators, and the broader life sciences community to develop governance frameworks that match the reality of how these systems are being built and deployed. The principles provide an excellent starting point. What’s needed now is a pragmatic extension that addresses the complexity of agentic orchestration — and that guidance will be strongest if it’s shaped by those of us doing the building alongside those doing the regulating.
The Window Is Now
The January 2026 principles are non-binding today. But the trajectory is clear: they’ll inform binding guidance from both agencies, they’ll shape what reviewers expect in regulatory submissions, and they’ll become the baseline against which AI-enabled products are evaluated.
Companies that embed these principles into their platform architecture now — not as documentation, but as design constraints — will find themselves with a structural advantage that compounds over time. Compliance built into the system from the start is dramatically less expensive, more reliable, and more auditable than compliance retrofitted after the fact.
For us at Sorcero, this isn’t a pivot. It’s a validation of choices we made years ago: that life sciences AI must be domain-specific, architecturally transparent, and designed from the ground up for human oversight. We laid out this philosophy in detail in our whitepaper with Google and USDM Life Sciences, A Comprehensive Guide to Responsible AI for Life Sciences, and the FDA-EMA principles have now given the industry a shared framework for why those choices matter. The question for the rest of the industry is whether they’re ready to make them too.
About the Author
Hellmut Adolphs is the CTO of Sorcero, a life sciences AI company that builds medically-tuned AI for pharmaceutical and medical affairs teams.