Automation of Systematic Reviews with Large Language Models

less than 1 minute read

This looks very impressive, using LLMs to not only survey the literature but also synthesize the results and generate new statistically significant findings.

I’m not sure if the April 2024 Cochrane reviews used for validation are included in GPT-4.1’s training data, so the evaluation might need a second look, but overall this could significantly accelerate evidence synthesis and make genuine contributions to the literature.

Systematic reviews (SRs) inform evidence-based decision making. Yet, they take over a year to complete, are prone to human error, and face challenges with reproducibility; limiting access to timely and reliable information. We developed otto-SR, an end-to-end agentic workflow using large language models (LLMs) to support and automate the SR workflow from initial search to analysis. We found that otto-SR outperformed traditional dual human workflows in SR screening (otto-SR: 96.7% sensitivity, 97.9% specificity; human: 81.7% sensitivity, 98.1% specificity) and data extraction (otto-SR: 93.1% accuracy; human: 79.7% accuracy). Using otto-SR, we reproduced and updated an entire issue of Cochrane reviews (n=12) in two days, representing approximately 12 work-years of traditional systematic review work.
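For readers less familiar with the screening metrics quoted above, here is a minimal sketch of how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the otto-SR paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly relevant studies that the screener included."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly irrelevant studies that the screener excluded."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening run (illustrative numbers, not from the paper):
# 100 relevant citations, of which 90 were included and 10 were missed;
# 1000 irrelevant citations, of which 950 were excluded and 50 slipped through.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=950, false_pos=50)

print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

In screening, a missed relevant study (low sensitivity) usually costs more than an extra irrelevant one that gets filtered out later, which is why the sensitivity gap in the abstract stands out more than the nearly identical specificity figures.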

Finding secrets in ‘Oops’ commits

less than 1 minute read

As someone who gets confused by anything beyond simple commits and pushes, I find this approach of spelunking for thought-to-be-deleted secrets in “oops” commits a little scary.

GitHub Archive logs every public commit, even the ones developers try to delete. Force pushes often cover up mistakes like leaked credentials by rewriting Git history. GitHub keeps these dangling commits, from what we can tell, forever. In the archive, they show up as “zero-commit” PushEvents.
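To make the “zero-commit” idea concrete, here is a rough sketch of how one might filter archive events for pushes that carry no commits. It assumes the events are newline-delimited JSON in the GitHub Events format, with `type`, `repo.name`, and a `payload` holding `before`, `head`, `ref`, and `commits`; those field names are my assumption about the archive format, not something the quoted post spells out. The function name, sample records, and placeholder hashes are all made up for illustration.

```python
import json


def find_zero_commit_pushes(lines):
    """Return PushEvents whose payload lists no commits.

    Assumption: each line is one JSON event shaped like the GitHub Events API.
    For a force push, `before` would be the hash that was overwritten.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue
        payload = event.get("payload", {})
        if payload.get("commits"):
            continue  # ordinary push that carries at least one commit
        hits.append(
            {
                "repo": event.get("repo", {}).get("name"),
                "ref": payload.get("ref"),
                "before": payload.get("before"),
                "head": payload.get("head"),
            }
        )
    return hits


# Made-up sample records with placeholder hashes, for illustration only.
SAMPLE_LINES = [
    json.dumps({
        "type": "PushEvent",
        "repo": {"name": "example/repo"},
        "payload": {"ref": "refs/heads/main", "before": "a" * 40,
                    "head": "b" * 40, "commits": []},
    }),
    json.dumps({
        "type": "PushEvent",
        "repo": {"name": "example/other"},
        "payload": {"ref": "refs/heads/main", "before": "c" * 40,
                    "head": "d" * 40, "commits": [{"sha": "d" * 40}]},
    }),
    json.dumps({"type": "WatchEvent", "repo": {"name": "example/repo"},
                "payload": {}}),
]

if __name__ == "__main__":
    for hit in find_zero_commit_pushes(SAMPLE_LINES):
        print(hit["repo"], hit["ref"], hit["before"])
```

This only shows the filtering step. Fetching the overwritten commit and scanning it for credentials is a separate job, and not every zero-commit push is necessarily a force push.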

Sometimes you just gotta get started

less than 1 minute read

Via Jeff Triplett:

Write and publish before you write your own static site generator or perfect blogging platform. We have lost billions of good writers to this side quest because they spend all their time working on the platform instead of writing.