Math and code got there first
Quanta Magazine has a piece this week on how AI has changed mathematical research — AlphaEvolve, LLMs as collaborative partners, problems that used to take m...
Graph-based parsers appear to outperform LLMs on relation extraction — and the gap widens as relational complexity grows. A preprint out today from Gajo et a...
A good breakdown of the MCP vs. Skills tradeoffs from David Mohl:
A new paper from Wharton finds that LLM-generated Community Notes on X are rated more helpful than human-written ones across 108,000+ ratings. It’s a well-de...
Researchers recently published a method for removing Google’s SynthID watermarks from AI-generated images with near-invisible quality loss, by reverse-engine...
I have started talking about the inbox apocalypse that is going to hit this year, where everything that is normally sort of reviewed and bottlenecked by h...
Anthropic restricted Claude Mythos to vetted security researchers this week via Project Glasswing — not because it was producing false positives, but because...
There’s a piece on the ergosphere blog worth reading this week about what the author calls the Alice-and-Bob problem. Alice and Bob both produce a PhD resear...
Nicholas Carlini, a research scientist at Anthropic, ran a simple bash script that looped over every file in the Linux kernel and asked Claude Code to look f...
I’ve been thinking a lot about this quote from Steve Krouse (via Simon Willison):
Agree with this post 100%. I switched to uv about six months ago, and it has made package management in Python much easier.
Let’s be considerate about how we use GenAI to write emails, articles, or blog posts. When I first started, it was fun: Wow, I can crank out a 750-word essay...
Via Simon Willison:
Learning Python for data science seven years ago changed the trajectory of my career. This documentary is a great behind-the-scenes view of the people who br...
Is it satire? Is it an art project?
Via Simon Willison:
Switzerland released their own Llama-3-class model, trained exclusively on public sources while respecting crawler opt-out requests.
The Content Authenticity Initiative is a collaborative effort to bring transparency to digital media. By using cryptographic signatures and standardized meta...
China released their new “AI Plus” strategy document last week when I was in Beijing. Here is some context and a translation of the policy document (via Bene...
Have you tried M365 Copilot lately? It has gotten seriously good.
I’m not sure if we’re ready for agentic browser control. Yes, you can click each time to accept the risk, but how many of us read the T&Cs before we clic...
While many educators in the West see AI as a threat they have to manage, more Chinese classrooms are treating it as a skill to be mastered. In fact, as th...
The FDA’s head of AI, Jeremy Walsh, admitted that Elsa can hallucinate nonexistent studies. “Elsa is no different from lots of [large language models] ...
This looks like a handy package for converting documents (PDF, .docx, .pptx, and more) to .md. There’s also an MCP server so you can use it with your LLM.
A fascinating look into OpenAI the company:
The AP article quotes Simon Willison:
I haven’t had to figure out AWS IAM or review the Cost Explorer in a hot minute.
Via @ErikJonker@mastodon.social:
As AI writing assistants become more prevalent in academic and professional settings, we face a growing challenge: how do we maintain the integrity of the sc...
Paul’s post got me about 80% of the way there, but I was still having issues with
Today I learned how to set up a complete CI/CD pipeline for Python packages using modern tooling. As a first-time package publisher, I wanted to make sure I ...
Via Clarke & Esposito, an entertaining sketch written by Mike Woodward of William Playfair, who invented bar charts and pie charts in between misadventur...
Hugo Bowne-Anderson argues that agentic workflows shouldn’t be your first choice because of their increased complexity and instability. Remember that GenAI ...
This looks very impressive, using LLMs to not only survey the literature but also synthesize the results and generate new statistically significant findings.
This is more of a survey than a critical review, and the equations on pages 8–10 seem unnecessary, but a potentially useful compilation map of what’s new as ...
As someone who gets confused beyond simple commits and pushes, this approach of spelunking for thought-to-be-deleted secrets in “oops” commits is a little sc...
Via Jeff Triplett: