When it comes to evaluating AI-generated text, perplexity has been a widely used metric. For those unfamiliar, perplexity measures how well a language model predicts a given sequence of words. Lower perplexity means the model finds the text more predictable, i.e. more likely. You can think of low perplexity as the text ‘flowing well’ from the model’s point of view.
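As a concrete illustration, here is a minimal sketch of how perplexity is typically computed in practice. It assumes the Hugging Face transformers library with GPT-2 as the scoring model, but any causal language model would work the same way:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def perplexity(text: str, model_name: str = "gpt2") -> float:
    """Perplexity of `text` under a causal language model.

    Lower values mean the model finds the text more predictable.
    """
    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the
        # mean cross-entropy loss over the sequence.
        out = model(enc.input_ids, labels=enc.input_ids)
    # Perplexity is the exponential of the mean cross-entropy.
    return float(torch.exp(out.loss))

print(perplexity("The cat sat on the mat."))
```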
The theory behind perplexity-based detection is that if the text has unusually low perplexity, it must have been generated by a language model. By definition, a language model tends to generate low-perplexity text (as evaluated by itself).
Seems pretty clever, except that, if the language model is any good, it will assign low perplexity to well-flowing human writing too. So if you’re a good writer, these detection methods may flag your work, essentially saying ‘this flows too well for a human to have written it’.
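For concreteness, the detection rule behind many of these tools amounts to little more than a threshold check like the hypothetical one below. The threshold value is arbitrary and the `perplexity` helper is the sketch from above; this is not any particular vendor's method:

```python
def looks_ai_generated(text: str, threshold: float = 30.0) -> bool:
    # Hypothetical detector: flag anything the scoring model finds
    # "too predictable". A polished human essay can easily fall below
    # the same threshold, which is exactly the false-positive problem.
    return perplexity(text) < threshold
```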
As teachers begin to clamp down on AI-assisted cheating, this method will also flag the best work from the best students as potentially AI-generated, punishing the students who have learned to write well.
There have been discussions about the bigger players (like OpenAI) embedding watermarks in the text their models generate (perhaps by nudging word choices, or by other signals that wouldn’t be easy for a reader to detect), but it’s unlikely that open-source models would follow suit.
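To make the watermarking idea concrete, here is a toy sketch of the published ‘green list’ approach: a secret, pseudorandom subset of the vocabulary gets a small boost before sampling, so the generated text statistically over-represents those tokens. The function names and parameters are illustrative assumptions, not how OpenAI or anyone else actually does it:

```python
import hashlib
import random

def green_list(prev_token_id: int, vocab_size: int, fraction: float = 0.5) -> set[int]:
    # Seed a PRNG with the previous token so the "green" set is
    # reproducible by anyone who knows the secret scheme.
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(vocab_size * fraction)))

def bias_logits(logits: list[float], prev_token_id: int, delta: float = 2.0) -> list[float]:
    # Nudge the model toward "green" tokens before sampling. The output
    # then over-represents green tokens, which a detector who knows the
    # scheme can count, while a human reader notices nothing unusual.
    green = green_list(prev_token_id, len(logits))
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

The catch, as noted above, is that this only works if the model provider opts in; anyone running an unwatermarked open-source model sidesteps it entirely.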
I think the solution is actually to re-think how we evaluate writing mastery in the first place. It will be hard, but it might be the only sensible path forward.
(image by Stable Diffusion)