May 2026 · duran.bio

What AI Adoption in Science Tells Us About Trust

Why AI in scientific software wins or loses on the parts that aren't AI — verification, reproducibility, and the rigour scientists already apply to everything else.

Professionalised Scepticism: What AI-Native Actually Means in Scientific Software

Deloitte surveyed 104 R&D executives across biopharma in early 2025. The headline finding was near-universal optimism about AI in the lab. Investments flowing into automation, analytics platforms, AI tools, robotics, the lot. The same survey, in the very next breath, was clear-eyed about why most of it isn’t paying off yet: data foundations and governance.

Industry surveys of biotech R&D leaders have shown the pattern more sharply. Adoption appears to be fastest where outputs can be verified: literature review, structure prediction, scientific reporting. It falls off where they can’t be: generative design, say, or biomarker analysis.

Most analysis frames this as a maturity gap. I think it’s a pattern, and the lesson for anyone building scientific software in the next decade is in that pattern.

The pattern, not the gap

This verifiability divide isn’t a niche observation. Dig deeper and it appears to be playing out across all the biggest deals of the last eighteen months.

Importantly, the pattern isn’t unique to drug discovery. It’s how AI is being deployed across regulated, high-stakes industries generally: legal, financial services, engineering.

The part most commentary misses?

Every one of these works because there’s something downstream verifying the model’s output. Wet-lab confirmation. Virtual cell simulation. Physics-based engineering tolerances. Audit logs that survive a regulatory inspection. The deals aren’t bets on the model. They’re bets on the verifier the model gets paired with.

Why the verifiability pattern isn’t an accident

Science is professionalised scepticism. Peer review, replication, retraction, error bars, confidence intervals. These aren’t bureaucracy. They’re the trust architecture, and they exist because every working scientist knows that confident-sounding answers are the easy part. The hard part is establishing which of those answers survives a determined attempt to break it.

In that frame, a language model that produces a fluent, plausible answer to a scientific question without any way to verify it isn’t really a scientific tool. It’s a literature-shaped object. Useful in narrow contexts, dangerous in broad ones, and increasingly easy to spot once you know what you’re looking for.

What is encouraging is that the architectural answer to this problem already exists. Berkeley AI Research called it compound AI systems in a 2024 paper. State-of-the-art results, they argued, are increasingly produced by systems with multiple components rather than monolithic models. Cross-verification, retrieval, tool use, deterministic checks. The model is one piece of the system, not the whole of it.

That consensus has been absorbed by serious AI engineering teams across legal, financial services, and code generation. It hasn’t fully landed in scientific software yet. That’s the gap, and in my view it’s a strategic one, not a maturity one.

AI-native doesn’t mean LLM-native. It means starting from the trust architecture, not the model.

The architecture

The most important part of an AI product is the part that isn’t AI.

Sounds backwards, doesn’t it? Bear with me.

Across every domain where AI has to produce outputs that survive scrutiny, I keep seeing the same pattern. A model proposes; a deterministic engine verifies; the system flags what’s unknowable and routes the hard cases for review. Propose, verify, revise.
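That loop can be sketched in a few lines. Everything here is illustrative: `propose` stands in for any model call, `verify` for any deterministic engine, and the names are mine rather than any particular framework’s.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    ok: bool
    reason: str = ""  # machine-readable failure reason, fed back to the model

def propose_verify_revise(
    propose: Callable[[str, str], str],  # model: (task, feedback) -> candidate
    verify: Callable[[str], Verdict],    # deterministic engine: candidate -> verdict
    task: str,
    max_rounds: int = 3,
) -> tuple[Optional[str], list[Verdict]]:
    """Iterate until the verifier accepts, or give up and route for review."""
    feedback = ""
    history: list[Verdict] = []
    for _ in range(max_rounds):
        candidate = propose(task, feedback)
        verdict = verify(candidate)
        history.append(verdict)
        if verdict.ok:
            return candidate, history  # a verified output, with its audit trail
        feedback = verdict.reason      # the failure becomes the next prompt
    return None, history               # unknowable within budget: flag for human review
```

The important design choice is that the verifier, not the model, decides when the loop terminates, and the history of verdicts survives as a record of how the answer was reached.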

The pattern is already in production across very different industries.

Generative engineering design, legal research, code generation: three industries, three vocabularies, the same architecture. A generative model proposes; a deterministic engine verifies; the loop iterates. Replace “part geometry” with “molecular structure,” “case law” with “experimental evidence,” and the architecture doesn’t change. Hard to read this as a coincidence. It looks more like the pattern that emerges anywhere AI has to produce outputs that survive contact with the physical world.

The architecture isn’t a forecast. It’s already here.

What this means for what we build

A few things follow from all of this for anyone building scientific software.

Adoption follows verification. The question isn’t where you can add an LLM. It’s where the answer is checkable, what the verifier looks like, and how much of the user’s existing trust the verifier inherits. Get that ordering right and adoption is fast. Get it backwards and you build something that demos well and dies in production.

Auditability isn’t a feature; it’s the product. In any domain that touches GxP, the EMA, peer review, or a clinical filing, the audit trail isn’t a compliance tax: it’s the reason the software gets used. Builders who treat it as an afterthought are, in my view, building toys.
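To make that concrete, here is a minimal sketch of one common mechanism: an append-only log in which each entry commits to its predecessor via a hash chain, so silently rewriting history becomes detectable. The class and field names are hypothetical, not any GxP schema.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only, hash-chained log: each entry stores the hash of the
    previous entry, so altering any past record breaks the chain."""

    def __init__(self):
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        record = {
            "ts": time.time(),
            "actor": actor,      # who: user, model version, verifier
            "action": action,    # what: proposed, verified, overridden
            "payload": payload,  # the inputs/outputs needed to reproduce
            "prev": prev,        # commitment to the previous entry
        }
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append({**record, "hash": digest})
        return digest

    def verify_chain(self) -> bool:
        """Recompute every hash; any tampering surfaces as a mismatch."""
        prev = "genesis"
        for entry in self.entries:
            record = {k: v for k, v in entry.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            if record["prev"] != prev or recomputed != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Note that `verify_chain` is itself a deterministic check: anyone holding a copy of the log can rerun it, which is precisely the property an inspection or a citation needs.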

The integration substrate is settling. The Model Context Protocol (MCP) went from Anthropic’s launch in late 2024 to Linux Foundation governance in thirteen months, with developer adoption quickly reaching into the millions. The plumbing is being commoditised, and what runs on top of it appears to be where the value moves. Agentic workflows look twelve to twenty-four months away from being standard enterprise deployment, and I’d expect them to favour vendors with verified computation rather than raw data access.

The EVE Online Project Discovery initiative is one of my favourite signals here. Hundreds of thousands of players have spent their gaming time drawing polygons around cell clusters in real flow cytometry data, building a consensus dataset that AI gating algorithms train on. My colleague and friend Ryan Brinkman is the scientific partner on the project, and his academic work on automated flow cytometry gating predates this entire wave of AI by two decades. The verifier became valuable enough that the AI got built around it, not the other way round. That’s what a moat looks like in this architecture.

There’s a counter-argument worth taking seriously, and it’s a good one. The dominant view in foundation model labs is that general methods that scale with compute consistently beat domain-specific engineering over the long run. The implication: as models get good enough, the verifier becomes legacy code.

I’d push back on that. Even if a future model is right 99% of the time, science still has to know which 1% is wrong, why, and how to reproduce the result tomorrow on a different machine. A system that can’t be audited can’t be cited. The verifier isn’t legacy code. It’s the part that makes the model usable in a domain where being right most of the time isn’t the same as being trusted.

Coda

The companies that win in scientific software post-AI won’t be the ones that bolt a language model onto existing workflows. My bet is they’ll be the ones that treat AI with the same rigour scientists apply to everything else: measurable fitness, reproducible outputs, and the intellectual honesty to flag what’s unknowable.

Science is professionalised scepticism. That’s exactly the mindset this space needs more of.

If you’re building in this space and thinking about the verifier rather than the model, I’d love to compare notes, especially with anyone who has counter-evidence.