less than 1 minute read

Via Ethan Mollick:

Classic study gave 146 economist teams the same dataset & got wildly different answers. New paper reruns it with agentic AI. Claude Code & Codex land near the human median but with far tighter dispersion & no extremes.

I’m torn between the reproducibility (the tight clustering) and what it might cost in AI-assisted scientific creativity. Barry Marshall looked at the same gastric biopsy data as everyone else and reached the conclusion the field had ruled out. That kind of outlier isn’t noise; it’s occasionally how science moves. If AI reliably clusters near the median human interpretation, it scales up the research we already know how to do. It won’t find the next H. pylori.