Build to learn 🔗

Marty Cagan on the distinction between product discovery (“build to learn”) and product delivery (“build to earn”), and why AI makes the former more important, not less.

The hard part is building the product sense necessary to evaluate the learnings and guide the direction.

Similarly, an AI editor could judge whether a paper’s claims are likely to be true. But in a coming age of radically overabundant valid research, what matters is the editor’s taste: selecting the papers their audience would actually care about. Product sense works the same way. It isn’t verification; it’s curation.

Math and code got there first

Quanta Magazine has a piece this week on how AI has changed mathematical research — AlphaEvolve, LLMs as collaborative partners, problems that used to take months solved in days.

The structural reason mathematics and software development got there first is worth pausing on. Both have fast automated verification built in — proof assistants like Lean for math, test suites and type checkers for code. The loop closes in seconds. Drug discovery has never had that — the verification step is a wet lab experiment that takes weeks or months.
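Here is a minimal sketch of a propose-and-verify loop that makes “the loop closes in seconds” concrete. It assumes two stand-ins:

- a hypothetical `propose` generator in place of the model;
- a tiny assertion-based `verify` function in place of a test suite.

This is not AlphaEvolve’s method. It only shows that when checking a candidate is nearly free, a search can reject bad ideas as fast as it generates them.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

Candidate = Callable[[int], int]


def verify(candidate: Candidate) -> bool:
    """Stand-in for a test suite: the spec is 'return the square of the input'."""
    return all(candidate(x) == x * x for x in range(-5, 6))


def propose() -> Iterator[Candidate]:
    """Hypothetical stand-in for a model proposing implementations, correct one last."""
    yield lambda x: x + x   # wrong: doubles instead of squaring
    yield lambda x: x ** 3  # wrong: cubes instead of squaring
    yield lambda x: x * x   # correct


def search(max_attempts: int = 10) -> Tuple[Optional[int], float]:
    """Return (attempt number that passed verification, seconds elapsed), or (None, elapsed)."""
    start = time.perf_counter()
    for attempt, candidate in enumerate(propose(), start=1):
        if attempt > max_attempts:
            break
        if verify(candidate):
            return attempt, time.perf_counter() - start
    return None, time.perf_counter() - start


attempt, elapsed = search()
print(f"passed on attempt {attempt} after {elapsed:.6f} s")
```

In a real setting, `verify` would be a test suite, a type checker or a Lean proof check. In drug discovery, the equivalent call is an assay whose result arrives weeks later.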

That gap is narrowing. A dynamic flow system from NC State, published last year in Nature Chemical Engineering, generates ten times more experimental data than previous approaches. It does this by monitoring reactions in real time rather than waiting for steady state. Exscientia has been running closed design-make-test-learn cycles in its Oxford robotics facility since late 2024. Periodic Labs, founded by researchers behind ChatGPT and GNoME, launched last October with a $300M round and is building explicitly toward this for materials discovery.

What separates the disciplines AI has already transformed from the rest isn’t the AI. It’s the speed of the verification loop. Mathematics and software development had that built in. Experimental science is engineering its way to the same place.
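A back-of-the-envelope way to see why loop speed dominates is to count how many verification cycles fit in a year. The 10-second and four-week loop times below are illustrative assumptions, not figures from any of the work cited above.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def iterations_per_year(loop_seconds: int) -> int:
    """Complete generate-and-verify cycles that fit in one year, run back to back."""
    return SECONDS_PER_YEAR // loop_seconds


TEST_SUITE_LOOP = 10                 # assumption: a ~10 s test run or proof check
WET_LAB_LOOP = 4 * 7 * 24 * 60 * 60  # assumption: a four-week assay (2,419,200 s)

code_cycles = iterations_per_year(TEST_SUITE_LOOP)
lab_cycles = iterations_per_year(WET_LAB_LOOP)
print(f"{code_cycles:,} cycles vs {lab_cycles:,} cycles")  # 3,153,600 cycles vs 13 cycles
```

Running experiments in parallel raises the wet-lab count, but it doesn’t change the serial depth of the search. That depth, meaning how many rounds of learn-then-redesign you get, is still set by the loop time.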

The Quanta piece reads like a preview.

Where graphs supplement LLMs 🔗

Graph-based parsers appear to outperform LLMs on relation extraction, and the gap widens as relational complexity grows. A preprint out today from Gajo et al. presents evidence across six datasets. This is the relevant regime for pharma and biomedical knowledge graphs, where the useful relations are mechanism-of-action chains and adverse event pathways rather than simple co-mentions. It’s useful alongside what I wrote earlier this week on knowledge graphs as research discovery tools.

MCP vs. Skills 🔗

A good breakdown of the MCP vs. Skills tradeoffs from David Mohl:

Skills are great for pure knowledge and teaching an LLM how to use an existing tool. But for giving an LLM actual access to services, the Model Context Protocol (MCP) is the far superior, more pragmatic architectural choice.

In practice, some publishers aren’t forcing the choice. Wiley’s Knowledge Nexus offers both — MCP if you want to point an LLM at it directly, API if you’d rather build your own integration. Whichever fits your stack is probably fine.
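The two approaches differ in mechanism:

- **MCP** is built on JSON-RPC 2.0. An LLM client invokes a server’s tool by sending a `tools/call` request that names the tool and its arguments.
- **A Skill** is instructions loaded into the model’s context. Nothing is called over the wire.

The sketch below only serializes an MCP request. The tool name `search_articles` and its `query` argument are hypothetical; they are not taken from Wiley’s Knowledge Nexus or any real server.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def tools_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool exposed by a publisher's MCP server.
print(tools_call("search_articles", {"query": "fish oil Raynaud's syndrome"}))
```

In practice an MCP SDK handles this framing, along with the `initialize` handshake and tool discovery via `tools/list`. Building your own integration against a plain API means writing that glue yourself.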

The knowledge graph as digital twin

A new paper from Wharton finds that LLM-generated Community Notes on X are rated more helpful than human-written ones across 108,000+ ratings. It’s a well-designed study and the result is credible — for social media fact-checking, which is what it’s testing. Whether something similar could work for scientific literature is a different question, and the answer depends entirely on what you build underneath it.

Social media claims are mostly atomic: a politician said something, a statistic is cited correctly or not, an event happened or didn’t. You can check those against a corpus. Scientific claims are relational — they assert relationships between entities distributed across thousands of papers, and the “truth” of the claim is a property of the network, not any individual document. Asking an LLM to fact-check “compound X inhibits pathway Y at therapeutic doses” requires knowing what the literature establishes about X’s mechanism, Y’s context-dependence, and whether the relevant concentrations have ever appeared in the same study. A retrieval system can find text that mentions both; it can’t tell you whether the relationship holds.
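A toy illustration of the gap between retrieval and relational checking, using hypothetical entities and documents:

- A co-mention search for compound X and pathway Y returns the one document that mentions both without asserting anything.
- A walk over signed, typed edges finds an indirect route (X inhibits kinase K, which activates Y) and composes the signs.

The sketch deliberately ignores dose and biological context, which a real check would need.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical corpus: retrieval only ever sees the text.
PAPERS: Dict[str, str] = {
    "review_1": "This review discusses compound X and pathway Y in liver disease.",
    "study_2": "Compound X inhibits kinase K in vitro.",
    "study_3": "Kinase K activates pathway Y in hepatocytes.",
}

# Hypothetical signed, typed edges an extractor might derive from the two studies.
SIGN = {"activates": +1, "inhibits": -1}
EDGES: Dict[Tuple[str, str], str] = {
    ("compound x", "kinase k"): "inhibits",
    ("kinase k", "pathway y"): "activates",
}


def co_mentions(a: str, b: str) -> List[str]:
    """Retrieval-style check: which documents mention both entities?"""
    return [pid for pid, text in PAPERS.items()
            if a in text.lower() and b in text.lower()]


def net_effect(source: str, target: str, max_hops: int = 3) -> Optional[int]:
    """Graph-style check: breadth-first walk that multiplies edge signs along the path.

    Returns +1 or -1 for the first path found, or None if none exists within max_hops.
    Assumes effects compose multiplicatively, a deliberate simplification.
    """
    frontier = [(source, 1)]
    seen = {source}
    for _ in range(max_hops):
        next_frontier = []
        for node, sign in frontier:
            for (subj, obj), relation in EDGES.items():
                if subj != node or obj in seen:
                    continue
                path_sign = sign * SIGN[relation]
                if obj == target:
                    return path_sign
                seen.add(obj)
                next_frontier.append((obj, path_sign))
        frontier = next_frontier
    return None


print(co_mentions("compound x", "pathway y"))  # ['review_1']: mentions both, asserts nothing
print(net_effect("compound x", "pathway y"))   # -1: X inhibits an activator of Y
```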

This is precisely what knowledge graphs were built for. Don Swanson demonstrated it in 1986: he found that the fish oil and Raynaud’s syndrome literatures had never cited each other. Yet traversing the relationships between them (fish oil inhibits platelet aggregation; platelet aggregation is implicated in Raynaud’s) produced a testable hypothesis. No document stated it. The connection existed only in the graph. A clinical trial three years later confirmed it.
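Swanson’s procedure is usually described as the ABC model. It runs in three steps:

1. Start from a concept A.
2. Collect the intermediate B concepts that A’s literature links to.
3. Keep the C concepts that the B literatures link to but A’s literature never does.

Below is a minimal sketch over hand-written toy links, not a real co-occurrence dataset. Blood viscosity is one of the other intermediates Swanson’s 1986 paper cited. The remaining terms are illustrative filler.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Toy undirected links between concepts that co-occur in some literature (hypothetical).
LINKS = [
    ("fish oil", "platelet aggregation"),
    ("fish oil", "blood viscosity"),
    ("fish oil", "triglycerides"),
    ("platelet aggregation", "raynaud's syndrome"),
    ("blood viscosity", "raynaud's syndrome"),
    ("vasospasm", "raynaud's syndrome"),
]


def build_graph(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Adjacency sets for an undirected concept graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def abc_candidates(a: str, links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each C reachable from A through some B, with no direct A-C link, to its Bs."""
    graph = build_graph(links)
    direct = graph[a]
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for b in direct:
        for c in graph[b]:
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


print(abc_candidates("fish oil", LINKS))
```

Real literature-based discovery systems add term filtering and ranking so that generic intermediates don’t swamp the output. This sketch returns every bridge.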

Thirty years on, Himmelstein et al. built Hetionet: 47,000 nodes, 2.25 million relationships, 29 biomedical databases integrated into a single graph. They used it to generate drug repurposing predictions across 209,000 compound-disease pairs. Most of those candidates couldn’t be found by searching the literature because no paper had connected them — that’s what made them candidates worth testing.
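Hetionet’s repurposing predictions (Project Rephetio) were built from features that count typed paths, or metapaths, between a compound and a disease. As I understand it, the published work used degree-weighted path counts across many metapaths as inputs to a logistic regression model.

The sketch below is much simpler and keeps only the core idea:

- It uses raw path counts along one metapath, Compound-binds-Gene-associates-Disease.
- It runs on a hypothetical toy network.
- It drops pairs already known to be treatments.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)

# Hypothetical toy heterogeneous network.
EDGES: List[Edge] = [
    ("compound_a", "binds", "gene_1"),
    ("compound_a", "binds", "gene_2"),
    ("compound_b", "binds", "gene_2"),
    ("gene_1", "associates", "disease_x"),
    ("gene_2", "associates", "disease_x"),
    ("gene_2", "associates", "disease_y"),
    ("compound_b", "treats", "disease_y"),
]


def path_counts(edges: List[Edge]) -> Counter:
    """Count Compound-binds-Gene-associates-Disease paths for every compound-disease pair."""
    binds = [(c, g) for c, rel, g in edges if rel == "binds"]
    assoc = [(g, d) for g, rel, d in edges if rel == "associates"]
    counts: Counter = Counter()
    for compound, gene in binds:
        for gene2, disease in assoc:
            if gene == gene2:
                counts[(compound, disease)] += 1
    return counts


def repurposing_candidates(edges: List[Edge]) -> List[Tuple[Tuple[str, str], int]]:
    """Unlinked compound-disease pairs ranked by path count (ties broken alphabetically)."""
    known = {(c, d) for c, rel, d in edges if rel == "treats"}
    ranked = [(pair, n) for pair, n in path_counts(edges).items() if pair not in known]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


for pair, n in repurposing_candidates(EDGES):
    print(pair, n)
```

Raw counts favour well-studied, highly connected genes. That bias is what degree weighting is meant to dampen.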

The reason I keep coming back to this is that “fact-checking” is actually the least interesting thing a knowledge graph enables. Verification looks backward: does this claim hold given what we know? Discovery looks forward: what does the structure of existing knowledge imply that nobody has tested yet? Swanson and Himmelstein were doing the second thing. An AI system built on structured biomedical knowledge could do both simultaneously — flagging claims that contradict established relationships while surfacing hypotheses that the graph supports but the literature hasn’t yet stated.

The infrastructure question is the hard one, and also the interesting one. Building a knowledge graph like Hetionet is, in a real sense, constructing a digital twin of the scientific record — a computable representation of what the literature actually establishes about how the world works. Ground truth in science is still reality, just harder to access than a test suite. A well-constructed knowledge graph is the closest thing we have to making it queryable. Agents can already find errors faster than humans can triage them — the bottleneck isn’t computation, it’s the structured representation of what science actually knows. That’s a much larger project than building a better Community Notes, and a much more valuable one.