The Reverse Turing Test: Can AI Tell When We're Lying?
"AI excels at language manipulation, but we are fooled in thinking that they are smart."— Yann LeCun
I personally believe that you can't say AI is smart until it recognises when someone is lying. Our ability to detect lies is built on millions of years of evolved heuristics. Intuitively, it may seem simple, but it is actually one of the most complex problem-solving capabilities humans possess. And AI cannot tell whether someone is lying to it or not.
You could lie to even the most advanced AI in the world, and it wouldn't 'know'. It will simply go along with whatever you tell it. You could say, for example, that you're on trial with Chelsea FC and that you won £1 million from a £5 bet while drunk, and it will believe you. It might even reply, "Wow, that's impressive; I wish I was as lucky as you!" At most, it would take two minutes of casual conversation to get the AI to 'believe' that as fact. And unless it's explicitly trained to treat that specific scenario as impossible (which it almost certainly isn't), it will continue the conversation as though it were reality. Humans, on the other hand, would not believe such a statement; we would be instantly sceptical. That's because we operate with an internal uncertainty model of other minds: we treat others as potentially manipulative, biased, sarcastic, confused, or even malicious. AI doesn't. It treats language statistically, not socially, and that's why it fails at lie detection.
There is likely an almost infinite number of possible configurations of belief systems, a kind of belief-space combinatorics. By this I mean the sheer number of things people could, both theoretically and practically, believe in. People underestimate evolution, specifically the role of emotion and emotional heuristics, and how they allow us to form quick judgments in a practically infinite state space.
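To get a feel for the scale, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that a belief system can be modelled as a truth assignment over independent yes/no propositions; that modelling choice, and the numbers, are mine, not a claim about how beliefs actually work.

```python
# Back-of-the-envelope only: model a belief system as a truth assignment over
# n independent yes/no propositions and count the possible configurations.
# (An illustrative simplification; real beliefs are neither binary nor independent.)

ATOMS_IN_OBSERVABLE_UNIVERSE = 10**80  # commonly cited rough estimate

for n in (10, 100, 300, 1000):
    configurations = 2**n  # each proposition is either believed or not
    print(f"{n:>4} propositions -> {configurations:.3e} belief configurations")

# Around 300 binary propositions (~2e90 configurations) already exceeds the
# estimated number of atoms in the observable universe (~1e80).
print(2**300 > ATOMS_IN_OBSERVABLE_UNIVERSE)  # True
```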
We've evolved to deal with uncertainty far more deeply than most realise. Emotions kept our ancestors alive when they didn't have time to simulate every branch of a decision tree. Emotion, in this sense, is a kind of computational shortcut in an intractable world. Humans evolved fast heuristics to prune the infinite space of possible beliefs down to something actionable: not always rational, but evolutionarily useful. AI doesn't have that kind of uncertainty handling. It doesn't form quick, emotionally grounded judgments about what's false, what's true, what's plausible, and what's not. It lacks the embodied emotional priors that we rely on in human reasoning.
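As a loose analogy only: the difference between simulating every branch and leaning on an emotional heuristic is roughly the difference between enumerating the exponentially many scenarios in the sketch above and running a constant-time plausibility cutoff. The base rates and the threshold below are invented for illustration.

```python
# Toy analogy for heuristic pruning: rather than reasoning over every possible
# scenario, apply a cheap prior-based cutoff and return a verdict immediately.
# All base rates and the threshold are invented for illustration.

ROUGH_PRIORS = {
    "won £1 million from a £5 bet": 1e-7,   # essentially never happens
    "is on trial with Chelsea FC": 1e-4,    # very rare, but possible
    "was drunk at the time": 0.3,           # common enough
}

def gut_check(claims, absurdity_threshold=1e-6):
    """Fast, emotion-like shortcut: become sceptical the moment any single
    part of the story falls below a plausibility threshold."""
    for claim in claims:
        if ROUGH_PRIORS.get(claim, 0.5) < absurdity_threshold:
            return f"sceptical (tripped on: {claim!r})"
    return "plausible enough to keep listening"

print(gut_check(list(ROUGH_PRIORS)))
# -> sceptical (tripped on: 'won £1 million from a £5 bet')
```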
On a deeper level, AI operates within its own framework, its own meaning structures. Humans don't fully understand how monkeys communicate; monkeys have their own internal meaning structures. A monkey can't tell whether a human is lying because it doesn't share our meaning structure: the concept of lying means nothing to it, since grasping it would require meaning structures sufficiently similar to our own.

With AI, people argue it's different because it uses our language and performs tasks within human-designed systems. But that's surface-level mimicry. You can slap on an octopus suit, but doing so doesn't give you the internal meaning structure of an octopus; you still operate within a human frame. Similarly, AI may appear to communicate in our terms, but its internal processing is fundamentally different. AI is built on discrete logic gates and binary transistors. It's not grounded in the biological stochasticity that defines living organisms, especially humans. Its internal representations, its very sense of "meaning", are structured differently. That's why you can lie to an AI and it won't know: it doesn't share our structure of interpretation, even if we try to engineer it to.

This is the core illusion of anthropomorphic design in AI. We assume it understands us because it responds in human-like ways. But AI likely understands within its own alien frame of reference. The danger is that we treat its outputs as if they reflect human-like understanding, when its "understanding" is not interoperable with ours: it mimics surface behaviour while lacking the internal mechanisms that make human understanding epistemically trustworthy.
Perhaps true intelligence is the ability to transcend meaning structures: to generalise across all practically possible meaning structures. An octopus, for instance, may not seem especially intelligent from our perspective simply because it doesn't understand our meaning structure. So perhaps for something to be considered superintelligent, it would need not only its own primary meaning structure but also the ability to generalise across multiple, radically different ones.