Researchers recently published a method for removing Google’s SynthID watermarks from AI-generated images with almost no visible quality loss, by reverse-engineering the resolution-dependent carrier frequencies and building a spectral codebook for direct subtraction. You can already upload an image and strip the watermark through consumer web tools. What once required technical sophistication is now openly available.
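The mechanics behind such attacks are ordinary frequency-domain signal processing. A toy sketch of the idea, assuming a single known carrier frequency (the published attack builds a per-resolution codebook; everything here, including the function name, is illustrative rather than the actual method):

```python
import numpy as np

def notch_carrier(image: np.ndarray, carrier: tuple[int, int], width: int = 1) -> np.ndarray:
    """Suppress a known carrier frequency (and its mirror) in a 2-D spectrum.

    `image` is a single-channel float array; `carrier` is a (row, col)
    frequency-bin coordinate. In a real attack this coordinate would come
    from a codebook indexed by image resolution; here it is hypothetical.
    """
    spectrum = np.fft.fft2(image)
    h, w = spectrum.shape
    r, c = carrier
    # Zero a small neighborhood around the carrier bin and its
    # conjugate-symmetric mirror so the inverse transform stays real.
    for dr in range(-width, width + 1):
        for dc in range(-width, width + 1):
            spectrum[(r + dr) % h, (c + dc) % w] = 0
            spectrum[(-r - dr) % h, (-c - dc) % w] = 0
    return np.fft.ifft2(spectrum).real
```

Because the energy of a periodic carrier concentrates in a few spectral bins, zeroing those bins removes the embedded signal while leaving the rest of the image essentially untouched, which is why the quality loss is so hard to see.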
This is the pattern with individual detectors. AI writing detectors misclassify non-native English speakers as AI-generated more than 61% of the time; multiple universities have stopped using them in misconduct cases as a result. Watermarks added at generation time get removed once the algorithm can be modeled. Any single signal can be defeated given enough motivation.
A camera that supports C2PA points at something better. It embeds a signed manifest in every photo (device, lens, timestamp, location, edit history), cryptographically signed with a private key stored in a hardware secure element and certified at manufacture. You can strip the manifest, but you can’t forge it. The question it answers is also inverted: it doesn’t ask whether the image has statistical properties consistent with AI generation; it asks whether we can verify where it came from. That matters because it’s easier to prove something is there than to prove something isn’t. A valid C2PA credential is a verifiable positive. “No watermark detected” is an absence, and that absence means less and less as removal tools proliferate. Criminal evidence works this way: a documented chain of custody from the moment a sample is read off an instrument to the moment it’s presented in court. Science doesn’t, and building C2PA-equivalent provenance into the full research software stack (every Python environment, every lab instrument, every figure-rendering tool) is a long way from practical.
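The strip-but-not-forge property is easy to demonstrate. A minimal sketch, with one loud caveat: real C2PA manifests are signed with an asymmetric key held in a hardware secure element, while this stand-in uses stdlib HMAC-SHA256 with a device secret (the key, field names, and values are all illustrative) to show the same tamper-evidence behavior:

```python
import hashlib
import hmac
import json

# Illustrative only: stands in for a private key provisioned into a
# hardware secure element at manufacture.
DEVICE_KEY = b"secret-provisioned-at-manufacture"

def sign_manifest(manifest: dict) -> str:
    # Canonical serialization so signer and verifier hash identical bytes.
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()

def verify_manifest(manifest: dict, signature: str) -> bool:
    # Constant-time comparison of the recomputed and presented signatures.
    return hmac.compare_digest(sign_manifest(manifest), signature)
```

Changing any field after signing makes verification fail, and stripping the signature destroys the verifiable positive without yielding anything an attacker can reattach to altered content.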
What science has instead of chain of custody is trust, or more precisely the question of whether trust is warranted. The right frame for research integrity tooling isn’t “detect the artifact”; it’s “can we trust this author?”
I spent time building AI/ML tools for this problem. What seems to work better is a composite profile: how many signals does this paper trigger across the detection stack, and is it anomalous for this author — writing outside their normal domain, with patterns that deviate from their prior work? When enough signals deviate, a human takes a second look.
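The core of such a profile can be sketched in a few lines. This is a minimal illustration, not the tooling described above; the feature names are hypothetical, and a z-score cutoff stands in for whatever anomaly measure a production system would use:

```python
from statistics import mean, stdev

def anomaly_signals(history: list[dict], paper: dict, z_cut: float = 3.0) -> list[str]:
    """Return the features of a new paper that deviate from an author's history.

    `history` holds one feature dict per prior paper (e.g. topic distance,
    stylometric statistics -- names are hypothetical). A feature is flagged
    when its z-score against the author's prior work exceeds `z_cut`.
    """
    flagged = []
    for feat, value in paper.items():
        prior = [p[feat] for p in history if feat in p]
        if len(prior) < 3:
            continue  # too little history to judge this feature
        mu, sigma = mean(prior), stdev(prior)
        if sigma == 0:
            continue  # no variation in the prior work to compare against
        if abs(value - mu) / sigma > z_cut:
            flagged.append(feat)
    return flagged
```

The design choice that matters is the baseline: each author is compared against their own prior work rather than a population-wide threshold, which is what lets the system ask “is this normal for this person?” instead of “does this look AI-generated?”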
The same lesson is already visible in the AI writing detector story. They became a fairness problem as soon as they were deployed visibly at scale, and their blind spots were immediately exploited. A profile-based approach, one that builds up a picture of what an author’s work looks like and flags when something doesn’t fit, is slower to build and harder to explain to an editorial board. But it’s the one that’s actually asking the right question.