Scientific datasets are riddled with copy-paste errors

Monday, April 20, 2026

Markus Englund scanned 600 datasets on Dryad and found serious copy-paste errors in 18 of them (3%), which projects to around 700 cases across the full repository of ~24,000 datasets.

“There just isn’t anybody whose job it is to actively look for it.”

Not surprising. Many researchers are working in Excel rather than reproducible pipelines, and journals don’t have the bandwidth to audit supporting data. If anything, 3% seems low?

The obvious fix is an AI toolbench for pre-publication data validation — something that catches this before it enters the literature rather than years after. The verification loop is precisely what experimental science is missing.
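To make the idea concrete, here is a minimal sketch of one check such a toolbench might run: flagging runs of consecutive values that appear more than once in a table. This is an illustration under stated assumptions, not the method used in the Dryad scan. The function name `find_repeated_runs`, the `min_len` threshold, and the sample columns are all hypothetical.

```python
from collections import defaultdict


def find_repeated_runs(columns, min_len=5):
    """Return runs of min_len consecutive values that occur in more than one place.

    columns: dict mapping a column name to a list of values.
    Result: dict mapping each repeated run (a tuple) to the list of
    (column name, start index) positions where it occurs.
    """
    seen = defaultdict(list)
    for name, values in columns.items():
        for start in range(len(values) - min_len + 1):
            window = tuple(values[start:start + min_len])
            # Skip constant windows (e.g. all zeros), which often repeat legitimately.
            if len(set(window)) == 1:
                continue
            seen[window].append((name, start))
    return {run: places for run, places in seen.items() if len(places) > 1}


# Made-up example data: the first five mass values are pasted again below them.
data = {
    "mass_g": [4.1, 3.9, 5.2, 4.8, 6.0, 4.1, 3.9, 5.2, 4.8, 6.0],
    "length_mm": [12.0, 11.5, 13.1, 12.7, 14.2, 15.0, 13.3, 12.9, 11.8, 14.6],
}
suspects = find_repeated_runs(data, min_len=5)
for run, places in suspects.items():
    print(run, "appears at", places)
```

A match is only a prompt for a human to look, not proof of an error: repeated measurements can be genuine, especially for coarse or categorical values, so a real tool would need a sensible `min_len` and some domain-specific filtering.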

Dave Flanagan
