the red queen problem in ai detection
The Red Queen runs to stay in place, and at least she has someone to run against. The detection arms race is stranger than that. One side is sprinting, recalibrating thresholds, tuning perplexity models, publishing benchmarks. The other side is not running at all. It is not evading. It is not even aware there is a race. It is just drifting, like a buoy, slowly toward the middle of the distribution, and that midpoint is precisely where the detectors lose their grip.
I keep a voice corpus, a pile of my own writing the tools chew on to model how I sound. At some point I fed in a handful of posts I had written with AI assistance, just to be thorough, and the measured perplexity of the whole corpus dropped. Not catastrophically, but clearly. The reference had shifted under me. The thing measuring "how surprising is this text" had been quietly told that less surprise is normal now, since I had salted the fields with prose that lives closer to the mean. A clean baseline jumps on the signal sharply. A contaminated one shrugs.
Scale that up and you get the actual problem. Perplexity as an AI tell depends on a reference model trained on human writing. Build that yardstick on a corpus that is, say, thirty percent machine output, and the ruler stops measuring what you hope it measures. The gap between human and machine surprise narrows, and you will be tempted to read that narrowing as the AI getting better. It is not, necessarily. The baseline came down to meet it. The ruler degrades from the measurement.
Reddit is the canary down this particular shaft. High volume, moderation stretched thin as cling film, a measurable fraction of synthetic text substantial by 2024. A model trained on the 2024 version of the site sits lower than one trained on 2020, and nobody publishes the contamination fraction, so the unreliability is itself invisible. You cannot correct for a drift you can't even see.
And model collapse does the rest without intent. Each new AI generation trains on a corpus containing the last generation's outputs, which cluster toward the mean and clip the tails, and those clipped extremes are exactly where the discriminating signal lives. The text is not hiding. Regressing to its center, while the center moves out to meet it.
Here is the part that unsettles me, mildly, in the way a draft from a window you cannot locate unsettles you. The standard story says AI writes in an averaged, generic register and we flag it for lacking human spark. Maybe. But latest gen frontier models do not read entirely as averaged-down, at least not to me. They do have a register, a specific recognizable quality. I measured generic AI prose against mine once. The machine landed closer to me than any other flesh-and-blood writer I tested. And I cannot tell whether that quality is a tell of this particular moment in model development, or whether it is simply where writing is headed as the corpus grows into it (I will not pretend I find the second possibility comfortable).
If it is the second, we are not detecting AI. We are hearing our own voices from the future and marking it wrong because we still speak the Victorian English of our era. Opus sounds slightly off the way next year's teens always sound slightly off.
The ruler is measuring something. The question is whether it is measuring what it was.
