What does Opus 4.7 verify against?

2 minute read

Claude Opus 4.7 looks like a genuine step forward, and one line in the announcement caught my attention: the model “devises ways to verify its own outputs.” The model isn’t just generating; it’s checking.

The obvious question is: checking against what?

That could mean internal self-consistency — trying a calculation two ways, looking for contradictions in its own reasoning. Useful, but it doesn’t escape the model’s own knowledge boundaries. Or it could mean external retrieval — and for most deployments today, that means a web search. That’s better than nothing, but it’s a weak verification tool for scientific claims. The web will tell you that fish oil is associated with cardiovascular health. It won’t tell you whether the mechanism-of-action proposed in a 2019 paper has been confirmed, challenged, or quietly superseded by six subsequent studies. For that, you need something structured.

Which raises a more interesting question: what would Opus 4.7’s verification loop look like if it had access to a proper scientific knowledge graph — not search, but a graph of claims made across the literature, tagged with confidence, provenance, and the network of studies that support or contradict them? Or better still, causal datasets: not “paper A mentions compound X and outcome Y” but “experiment N demonstrated cause-effect at dose Z, replicated three times.”
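To make that concrete, here's a minimal sketch of what a claim-level record in such a graph might look like. Every class, field, and threshold here is an assumption for illustration; no existing standard is implied:

```python
from dataclasses import dataclass, field

# Hypothetical schema for a claim-level scientific knowledge graph.
# All names and fields are illustrative, not an existing standard.

@dataclass
class Claim:
    claim_id: str
    statement: str          # e.g. "compound X inhibits enzyme Y at dose Z"
    confidence: float       # curator- or model-assigned, 0..1
    provenance: str         # DOI of the originating study
    supported_by: list = field(default_factory=list)     # DOIs of confirming studies
    contradicted_by: list = field(default_factory=list)  # DOIs of challenging studies

def verification_status(claim: Claim) -> str:
    """Crude triage a verifying model could run against the graph."""
    support, contra = len(claim.supported_by), len(claim.contradicted_by)
    if contra > support:
        return "challenged"
    if support >= 2 and contra == 0:
        return "replicated"
    return "unresolved"

claim = Claim(
    claim_id="c1",
    statement="Fish oil improves endothelial function via mechanism M",
    confidence=0.6,
    provenance="10.1000/example.2019",
    supported_by=["10.1000/a", "10.1000/b"],
)
print(verification_status(claim))  # → replicated
```

The point isn't the schema; it's that a verification loop can only ask questions the substrate can answer, and a flat web index can't answer "has this claim been contradicted?"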

I’ve written before about how the speed of the verification loop is what separates fields where AI has transformed research from fields where it hasn’t (yet). Math closes the loop via proof assistants; drug discovery historically couldn’t close it in under months. That’s changing — Exscientia’s closed design-make-test-learn cycles, Periodic Labs building automated materials discovery. But closing the experimental loop is a separate problem from connecting AI reasoning to the existing literature — and that side has barely started.

A model that actively seeks to verify its reasoning is only as good as what it can verify against. Right now we’re giving it the open web. The more interesting engineering problem is connecting it to the structured record of what science has actually established — and what it hasn’t. Wiley’s Scholar Gateway and Nexus Domains are attempts at this — Scholar Gateway for in-session retrieval via MCP, giving Claude and other AI systems access to peer-reviewed literature rather than the open web; Nexus Domains for curated content feeds delivered via API and MCP to enterprise R&D pipelines. These are first steps in building the right verification layer. The question Opus 4.7 makes newly urgent is whether the rest of the field catches up.

Build to learn 🔗

less than 1 minute read

Marty Cagan on the distinction between product discovery (“build to learn”) and product delivery (“build to earn”), and why AI makes the former more important, not less.

The hard part is building the product sense necessary to evaluate the learnings and guide the direction.

Similarly, an AI editor might be able to tell you whether a paper's claims are likely true, but in a coming age of radical overabundance of valid research, it's the editor's taste that matters: selecting which papers their audience would actually care about. Product sense works the same way: it isn't verification, it's curation.

Math and code got there first

1 minute read

Quanta Magazine has a piece this week on how AI has changed mathematical research — AlphaEvolve, LLMs as collaborative partners, problems that used to take months solved in days.

The structural reason mathematics and software development got there first is worth pausing on. Both have fast automated verification built in — proof assistants like Lean for math, test suites and type checkers for code. The loop closes in seconds. Drug discovery has never had that — the verification step is a wet-lab experiment that takes weeks or months.
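A toy illustration of how fast the code-side loop closes: a model-proposed implementation gets checked against an automated oracle in milliseconds. The function names are mine, not from any system mentioned here:

```python
import math
import random

def candidate_gcd(a: int, b: int) -> int:
    # A model-proposed implementation (Euclid's algorithm).
    while b:
        a, b = b, a % b
    return a

def verify(impl, trials: int = 1000) -> bool:
    """Check the candidate against a trusted oracle on random inputs."""
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(trials):
        a, b = rng.randrange(1, 10**6), rng.randrange(1, 10**6)
        if impl(a, b) != math.gcd(a, b):
            return False
    return True

print(verify(candidate_gcd))  # → True
```

In math the oracle is a proof assistant; in code it's a test suite or type checker. In drug discovery, the oracle has historically been a wet lab.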

That gap is narrowing. A dynamic flow system at NC State, published last year in Nature Chemical Engineering, generates ten times more experimental data than previous approaches by monitoring reactions in real time rather than waiting for steady state. Exscientia has been running closed design-make-test-learn cycles in its Oxford robotics facility since late 2024. Periodic Labs, which launched last October with a $300M round and was founded by researchers behind ChatGPT and GNoME, is building explicitly toward this for materials discovery.

The distinguishing factor between disciplines where AI has already transformed research and those where it hasn’t isn’t the AI. It’s the speed of the verification loop. Mathematics and software development had that built in. Experimental science is engineering its way to the same place.

The Quanta piece reads like a preview.

Where graphs supplement LLMs 🔗

less than 1 minute read

Graph-based parsers appear to outperform LLMs on relation extraction — and the gap widens as relational complexity grows. A preprint out today from Gajo et al. has evidence across six datasets. For pharma and biomedical knowledge graphs, where the useful relations are mechanism-of-action chains and adverse event pathways rather than simple co-mentions, this is the relevant regime. Useful alongside what I wrote earlier this week on knowledge graphs as research discovery tools.
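The difference matters in practice. Here's a hedged sketch of what a typed relation chain buys you over a bare co-mention; the entity and relation labels are illustrative (though the aspirin pathway itself is textbook pharmacology), and the traversal assumes a small acyclic graph:

```python
# A co-mention records only that two entities appear in the same paper:
co_mention = ("paper_123", ["aspirin", "platelet aggregation"])  # no mechanism

# Typed relations support mechanistic queries the co-mention can't answer:
relations = [
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "produces", "thromboxane A2"),
    ("thromboxane A2", "promotes", "platelet aggregation"),
]

def chain(graph, source, target):
    """Find a directed relation path from source to target (naive DFS;
    assumes the toy graph is acyclic)."""
    for s, rel, o in graph:
        if s == source:
            if o == target:
                return [(s, rel, o)]
            rest = chain(graph, o, target)
            if rest:
                return [(s, rel, o)] + rest
    return None

print(chain(relations, "aspirin", "platelet aggregation"))
# → the full three-edge mechanism-of-action path
```

Extraction errors compound along exactly these chains, which is why fidelity on complex relations, not simple ones, is the regime that matters.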

MCP vs. Skills 🔗

less than 1 minute read

A good breakdown of the MCP vs. Skills tradeoffs from David Mohl:

Skills are great for pure knowledge and teaching an LLM how to use an existing tool. But for giving an LLM actual access to services, the Model Context Protocol (MCP) is the far superior, more pragmatic architectural choice.

In practice, some publishers aren’t forcing the choice. Wiley’s Knowledge Nexus offers both — MCP if you want to point an LLM at it directly, API if you’d rather build your own integration. Whichever fits your stack is probably fine.