The Median Is Not a Discovery

Tuesday, April 21, 2026

Via Ethan Mollick:

Classic study gave 146 economist teams the same dataset & got wildly different answers. New paper reruns it with agentic AI. Claude Code & Codex land near the human median but with far tighter dispersion & no extremes.

I’m torn between the reproducibility gain (the tight clustering) and what that clustering might cost in scientific creativity. Barry Marshall looked at the same gastric biopsy data as everyone else and reached the conclusion the field had ruled out. That kind of outlier isn’t noise; it’s occasionally how science moves. If AI reliably clusters near the median human interpretation, it scales up the research we already know how to do, but it won’t find the next H. pylori.


Dave Flanagan