When it comes to evaluating AI-generated text, perplexity has been a widely used metric. For those unfamiliar, perplexity measures how well a language model predicts a given sequence of words. Formally, it is the exponential of the average negative log-probability the model assigns to each token. Lower perplexity means the model considers the text more likely. You can think of it as a rough measure of how well the text ‘flows’.
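
To make the definition concrete, here is a minimal sketch in Python that computes perplexity from per-token probabilities. The function name `perplexity` and the probability values are assumptions made up for illustration; a real language model would supply its own probabilities for each token.

```python
import math


def perplexity(token_probs):
    """Return exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical probabilities, invented for illustration only.
# A real model would assign its own value to each token.
fluent_probs = [0.2, 0.3, 0.4, 0.5, 0.6]   # a sentence the model expects
odd_probs = [0.2, 0.0001, 0.3, 0.001]      # a sentence with surprising tokens

print(perplexity(fluent_probs))
print(perplexity(odd_probs))
```

With these made-up numbers the second list scores a much higher perplexity, because a few very unlikely tokens dominate the average. A useful sanity check is that a model assigning every token probability 1/k has perplexity k, so the metric can be read as the effective number of choices the model is weighing at each step.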

"He went to the store" -> low perplexity, flows well "He avacadoed the shoe" -> high perplexity,...