Via Ethan Mollick, a new paper on agentic reproduction of social-science results asks whether AI agents can reproduce published results from a paper's text and data alone, without seeing the original code. Often they can. But the more interesting results are the failures: cases where an agent cannot reproduce a finding because the paper does not actually specify enough of the method.
That feels very familiar from chemistry.
I am not a social scientist, so I will leave the empirical social-science side there. The chemistry version is easy to recognize. There is a large gap between a typical experimental section that gestures at what was done and something like Organic Syntheses, where procedures are written in much more detail and each reaction and characterization dataset is checked for reproducibility in the laboratory of a member of the Board of Editors.
That standard exists for a reason. Rick Danheiser wrote in C&EN that, from 1982 to 2005, about 12% of submitted Organic Syntheses articles were rejected because the results could not be reproduced. After more detailed author instructions and a procedure checklist were introduced in 2005-2007, more than 95% of submissions have checked out with satisfactory reproducibility.
That is the part I keep coming back to. The problem is not always that an author is hiding something. Often they know the procedure so well that they no longer notice which details are load-bearing. Stirring rate, addition order, concentration, drying time, workup details, vendor grade, how dry the solvent really was, what “room temperature” meant that week in that lab. Anyone who has tried to repeat a reaction from a too-short experimental section knows this feeling.
This is where AI tools could be useful without pretending to be the chemist. Not “write my experimental section,” and definitely not “certify that this procedure works.” More like: read this procedure as an annoying first-year graduate student who has to run it tomorrow, and ask what is missing.
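To make that concrete, here is a minimal sketch of what such a check could look like, assuming the OpenAI Python client. The model name, the prompt wording, and the helper function are all illustrative placeholders, not a validated tool; the point is only that the task is "list the gaps," not "bless the procedure."

```python
# Minimal sketch: ask a language model to read a procedure the way a
# skeptical first-year graduate student would. Model name and prompt
# are illustrative; this is not a tested or validated tool.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITIC_PROMPT = """You are a first-year graduate student who must run this
procedure tomorrow morning. List every detail you would need but cannot
find in the text: quantities, equivalents, addition order and rate,
temperatures, stirring, timing, workup, drying, and reagent grade or source.
Do not guess at answers; only list the missing information as questions."""

def list_missing_details(procedure_text: str) -> str:
    """Return the model's list of questions about an experimental procedure."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": CRITIC_PROMPT},
            {"role": "user", "content": procedure_text},
        ],
    )
    return response.choices[0].message.content
```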
There is already adjacent work here. A Nature Communications paper converted prose synthesis procedures into structured, machine-readable action sequences. That is not the same thing as reproducing a reaction, but it points in the right direction: if a procedure cannot be converted into concrete actions, quantities, conditions, and decision points, it probably is not as complete as it looks.
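As a toy illustration of what "structured action sequence" means in practice, here is a simplified schema of my own devising, not the one used in that paper. Each step is forced into explicit fields, so anything the text never stated shows up as an empty slot:

```python
# Toy schema for individual procedure steps; not the schema from the paper.
# The point is that every field either has a value or is visibly missing.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AddStep:
    reagent: str
    amount: Optional[str]        # e.g. "2.5 g (1.1 equiv)"
    rate: Optional[str]          # e.g. "dropwise over 30 min"
    temperature: Optional[str]   # e.g. "0 °C"

@dataclass
class StirStep:
    duration: Optional[str]      # e.g. "16 h"
    temperature: Optional[str]   # e.g. "room temperature (21-23 °C)"

# "Add the amine and stir overnight" becomes two steps with obvious gaps:
steps = [
    AddStep(reagent="amine", amount=None, rate=None, temperature=None),
    StirStep(duration="overnight", temperature=None),
]
incomplete = [step for step in steps if None in vars(step).values()]
```

Every None that survives the conversion is a question the authors could have answered in one line.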
I wrote recently that ground truth in science is still reality, just harder to access than a test suite. That remains true. AI cannot tell you whether a reaction will work in the lab without someone eventually doing the experiment. But it may be able to tell authors where their methods section stops being a recipe and starts being a memory aid for the person who already knows what happened.
That would be a useful tool. Slightly irritating, probably. But useful.