Posts by Category

The methods section is not a recipe

Sunday, April 26, 2026 2 minute read

Via Ethan Mollick, a new paper on agentic reproduction of social-science results asks whether AI agents can reproduce published results from the paper and da...

The Median Is Not a Discovery 🔗

Tuesday, April 21, 2026 less than 1 minute read

Via Ethan Mollick:

Scientific datasets are riddled with copy-paste errors 🔗

Monday, April 20, 2026 less than 1 minute read

Markus Englund scanned 600 datasets on Dryad and found serious copy-paste errors in 18 of them — projecting around 700 cases across the full repository of ~2...

AI model behavior, versioned 🔗

Sunday, April 19, 2026 less than 1 minute read

Via Simon Willison, who has turned Anthropic’s published system prompts into a git archive that diffs changes across model releases:

What does Opus 4.7 verify against?

Friday, April 17, 2026 2 minute read

Claude Opus 4.7 looks like a genuine step forward, and one line in the announcement caught my attention: the model “devises ways to verify its own outputs.” ...

Build to learn 🔗

Friday, April 17, 2026 less than 1 minute read

Marty Cagan on the distinction between product discovery (“build to learn”) and product delivery (“build to earn”), and why AI makes the former more importan...

Math and code got there first

Tuesday, April 14, 2026 1 minute read

Quanta Magazine has a piece this week on how AI has changed mathematical research — AlphaEvolve, LLMs as collaborative partners, problems that used to take m...

Where graphs supplement LLMs 🔗

Monday, April 13, 2026 less than 1 minute read

Graph-based parsers appear to outperform LLMs on relation extraction — and the gap widens as relational complexity grows. A preprint out today from Gajo et a...

MCP vs. Skills 🔗

Sunday, April 12, 2026 less than 1 minute read

A good breakdown of the MCP vs. Skills tradeoffs from David Mohl:

The knowledge graph as digital twin

Saturday, April 11, 2026 2 minute read

A new paper from Wharton finds that LLM-generated Community Notes on X are rated more helpful than human-written ones across 108,000+ ratings. It’s a well-de...

Provenance, not detection

Friday, April 10, 2026 2 minute read

Researchers recently published a method for removing Google’s SynthID watermarks from AI-generated images with near-invisible quality loss, by reverse-engine...

The inbox apocalypse 🔗

Thursday, April 9, 2026 less than 1 minute read

I have started talking about the inbox apocalypse that is going to hit this year, where everything that is normally sort of reviewed and bottlenecked by h...

Ground truth is reality

Wednesday, April 8, 2026 1 minute read

Anthropic restricted Claude Mythos to vetted security researchers this week via Project Glasswing — not because it was producing false positives, but because...

Publishing’s two jobs

Monday, April 6, 2026 2 minute read

There’s a piece on the ergosphere blog worth reading this week about what the author calls the Alice-and-Bob problem. Alice and Bob both produce a PhD resear...

Agents, bugs, and the statistical editor

Saturday, April 4, 2026 1 minute read

Nicholas Carlini, a research scientist at Anthropic, ran a simple bash script that looped over every file in the Linux kernel and asked Claude Code to look f...

MCP lets you ship faster 🔗

Friday, November 21, 2025 less than 1 minute read

I’ve been thinking a lot about this quote from Steve Krouse (via Simon Willison):

uv is the best thing to happen to the Python ecosystem in a decade 🔗

Thursday, October 30, 2025 less than 1 minute read

Agree with this post 100%. I switched to using uv about six months ago and it has made package management in Python much easier.

AI-Generated “Workslop” Is Destroying Productivity 🔗

Wednesday, September 24, 2025 less than 1 minute read

Let’s be considerate about how we use GenAI to write emails, articles, or blog posts. When I first started, it was fun: Wow, I can crank out a 750-word essay...

I think ‘agent’ may finally have a widely enough agreed upon definition to be useful jargon now 🔗

Friday, September 19, 2025 less than 1 minute read

Via Simon Willison:

Python: The Documentary

Monday, September 15, 2025 less than 1 minute read

Learning Python for data science seven years ago changed the trajectory of my career. This documentary is a great behind-the-scenes view of the people who br...

Anycrap.shop: Bring Impossible Products to Life 🔗

Sunday, September 14, 2025 less than 1 minute read

Is it satire? Is it an art project?

Beyond vibe coding: AI-assisted development 🔗

Friday, September 5, 2025 less than 1 minute read

Via Simon Willison:

Switzerland releases its own AI model trained on public data 🔗

Thursday, September 4, 2025 less than 1 minute read

Switzerland released its own Llama-3-class model, trained exclusively on public sources while respecting crawler opt-out requests.

Are content credentials going mainstream? 🔗

Wednesday, September 3, 2025 less than 1 minute read

The Content Authenticity Initiative is a collaborative effort to bring transparency to digital media. By using cryptographic signatures and standardized meta...

China Releases “AI Plus” Policy: A Brief Analysis 🔗

Wednesday, September 3, 2025 less than 1 minute read

China released their new “AI Plus” strategy document last week when I was in Beijing. Here is some context and a translation of the policy document (via Bene...

M365 Copilot + GPT-5 = big improvement

Tuesday, September 2, 2025 less than 1 minute read

Have you tried M365 Copilot lately? It has gotten seriously good.

Piloting Claude for Chrome 🔗

Wednesday, August 27, 2025 less than 1 minute read

I’m not sure if we’re ready for agentic browser control. Yes, you can click each time to accept the risk, but how many of us read the T&Cs before we clic...

Chinese universities want students to use more AI, not less 🔗

Tuesday, July 29, 2025 less than 1 minute read

While many educators in the West see AI as a threat they have to manage, more Chinese classrooms are treating it as a skill to be mastered. In fact, as th...

FDA’s artificial intelligence is supposed to revolutionize drug approvals. It’s making up studies 🔗

Friday, July 25, 2025 less than 1 minute read

The FDA’s head of AI, Jeremy Walsh, admitted that Elsa can hallucinate nonexistent studies. “Elsa is no different from lots of [large language models] ...

Markitdown 🔗

Wednesday, July 16, 2025 less than 1 minute read

This looks like a handy package for converting documents (PDF, .docx, .pptx, and more) to .md. There’s also an MCP server so you can use it with your LLM.

Reflections on OpenAI 🔗

Wednesday, July 16, 2025 less than 1 minute read

A fascinating look into OpenAI the company:

Empirical evidence of Large Language Model’s influence on human spoken communication 🔗

Wednesday, July 16, 2025 less than 1 minute read

Preprint:

Grok 4 delivers surprises 🔗

Monday, July 14, 2025 less than 1 minute read

The AP article quotes Simon Willison:

The AWS Survival Guide for 2025: A Field Manual for the Brave and the Bankrupt 🔗

Monday, July 14, 2025 less than 1 minute read

I haven’t had to figure out AWS IAM or review the Cost Explorer in a hot minute.

If MCP is the USB-C of AI agents, A2A is their Ethernet 🔗

Monday, July 14, 2025 less than 1 minute read

Via @ErikJonker@mastodon.social:

Fighting AI Hallucinations One Citation at a Time: Introducing the LLM Citation Verifier

Sunday, July 13, 2025 6 minute read

As AI writing assistants become more prevalent in academic and professional settings, we face a growing challenge: how do we maintain the integrity of the sc...

Enabling cookie consent on a Jekyll Minimal Mistakes site 🔗

Sunday, July 13, 2025 less than 1 minute read

Paul’s post got me about 80% of the way there, but I was still having issues with

TIL: Modern Python Package CI/CD with uv, Trusted Publishing, and GitHub Actions

Saturday, July 12, 2025 5 minute read

Today I learned how to set up a complete CI/CD pipeline for Python packages using modern tooling. As a first-time package publisher, I wanted to make sure I ...

The bad boy of bar charts: William Playfair 🔗

Tuesday, July 8, 2025 less than 1 minute read

Via Clarke & Esposito, an entertaining sketch written by Mike Woodward of William Playfair, who invented bar charts and pie charts in between misadventur...

Stop Building AI Agents 🔗

Monday, July 7, 2025 less than 1 minute read

Hugo Bowne-Anderson argues that agentic workflows shouldn’t be your first choice because of their increased complexity and instability. Remember that GenAI ...

Automation of Systematic Reviews with Large Language Models 🔗

Friday, July 4, 2025 less than 1 minute read

This looks very impressive, using LLMs to not only survey the literature but also synthesize the results and generate new statistically significant findings.

AI4Research: A Survey of Artificial Intelligence for Scientific Research 🔗

Friday, July 4, 2025 less than 1 minute read

This is more of a survey than a critical review, and the equations on pages 8–10 seem unnecessary, but a potentially useful compilation map of what’s new as ...

Finding secrets in ‘Oops’ commits 🔗

Thursday, July 3, 2025 less than 1 minute read

As someone who gets confused beyond simple commits and pushes, this approach of spelunking for thought-to-be-deleted secrets in “oops” commits is a little sc...

Sometimes you just gotta get started 🔗

Sunday, June 29, 2025 less than 1 minute read

Via Jeff Triplett:
