Can ChatGPT pass the Turing Test yet?

ChatGPT passing the Turing Test feels like an inevitability. In fact, some researchers believe it already has.

May 9, 2025 - 10:17

Artificial intelligence chatbots like ChatGPT are getting a whole lot smarter, a whole lot more natural, and a whole lot more…human-like. It makes sense — humans are the ones creating the large language models that underpin AI chatbots, after all. But as these tools get better at "reasoning" and mimicking human speech, are they smart enough yet to pass the Turing Test?

For decades, the Turing Test has been held up as a key benchmark in machine intelligence. Now, researchers are actually putting LLMs like ChatGPT to the test. If ChatGPT can pass, the accomplishment would be a major milestone in AI development.

So, can ChatGPT pass the Turing Test? According to some researchers, yes. But the results aren't entirely definitive: the Turing Test isn't a simple pass/fail exam, so the results aren't black and white. Besides, even if ChatGPT could pass the Turing Test, that may not really tell us how "human" an LLM actually is.

Let's break it down.

What is the Turing Test?

The concept of the Turing Test is actually pretty simple.

The test was originally proposed by British mathematician Alan Turing, the father of modern computer science and a hero to nerds around the world. In his 1950 paper "Computing Machinery and Intelligence," he proposed the Imitation Game — a test for machine intelligence that has since been named for him. The Turing Test involves a human judge having a conversation with both a human and a machine without knowing which one is which (or who is who, if you believe in AGI). If the judge can't reliably tell which one is the machine and which one is the human, the machine passes the Turing Test. In a research context, the test is performed many times with multiple judges.
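To make that setup concrete, here's a minimal sketch of the protocol in Python. It's purely illustrative: the judge function and the canned one-line replies below are hypothetical stand-ins for real five-minute conversations, but the sketch shows how researchers tally a pass rate across many trials.

```python
import random

def run_trial(judge, human_reply, machine_reply):
    """One round: the judge reads two anonymous replies and guesses
    which one came from the machine. Returns True if the guess is right."""
    witnesses = [("human", human_reply), ("machine", machine_reply)]
    random.shuffle(witnesses)  # hide which witness is which
    pick = judge(witnesses[0][1], witnesses[1][1])  # judge returns 0 or 1
    return witnesses[pick][0] == "machine"

def machine_pass_rate(judge, transcripts, n_trials=1000):
    """Fraction of trials where the machine fooled the judge.
    A rate near 50% means the judge is guessing at chance."""
    fooled = sum(
        not run_trial(judge, human, machine)
        for human, machine in random.choices(transcripts, k=n_trials)
    )
    return fooled / n_trials

# Hypothetical judge heuristic: assume the stiffer, longer reply is the machine.
naive_judge = lambda a, b: 0 if len(a) > len(b) else 1
transcripts = [("Hey! Not much, you?",
                "Greetings. I am functioning within normal parameters.")]
print(machine_pass_rate(naive_judge, transcripts))  # 0.0: this machine never fools this judge
```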

Of course, the test can't necessarily determine if a large language model is actually as smart as a human (or smarter) — just if it’s able to pass for a human.

Do LLMs really think like us? 

Large language models, of course, do not have a brain, consciousness, or world model. They're not aware of their own existence. They also lack true opinions or beliefs.

Instead, large language models are trained on massive datasets of information — books, internet articles, documents, transcripts. When a user types in a prompt, the model uses its "reasoning" to determine the most likely meaning and intent of the input, and then generates a response.

At the most basic level, LLMs are word prediction engines. Drawing on patterns learned from their vast training data, they calculate a probability for every "token" (a word or fragment of a word) in their vocabulary, pick the next token of the response, and repeat the process until a complete response is generated. That's an oversimplification, of course, but let's keep it simple: LLMs generate responses based on probability and statistics. An LLM's response is rooted in mathematics, not an actual understanding of the world.
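Here's a toy illustration of that loop. The vocabulary and scores below are made up, but the mechanism is the real one: score every candidate token, convert the scores to probabilities, pick a token, and repeat.

```python
import math
import random

# Made-up candidate tokens and raw scores ("logits") for the prompt
# "The cat sat on the". A real model scores every token in its vocabulary.
vocab = ["mat", "roof", "moon", "keyboard"]
logits = [4.0, 2.5, 0.5, 1.5]  # higher score = more plausible continuation

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"{token!r}: {p:.1%}")  # 'mat' comes out around 75%

# Sample the next token in proportion to its probability; a real model
# appends it to the prompt and repeats until the response is complete.
next_token = random.choices(vocab, weights=probs, k=1)[0]
print("next token:", next_token)
```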

So, no, LLMs don't actually think in any sense of the word.

What do the studies say about ChatGPT and the Turing Test?


There have been quite a few studies to determine whether ChatGPT can pass the Turing Test, and many of them have had positive findings. That's why some computer scientists argue that, yes, large language models like GPT-4 and GPT-4.5 can now pass the famous Turing Test.

Most tests focus on OpenAI's GPT-4 model, the one that's used by most ChatGPT users. Using that model, a study from UC San Diego found that in many cases, human judges were unable to distinguish GPT-4 from a human. In the study, GPT-4 was judged to be a human 54% of the time. However, this still lagged behind actual humans, who were judged to be human 67% of the time.

Then, GPT-4.5 was released, and the UC San Diego researchers ran the study again. This time, the large language model was identified as human 73% of the time, outperforming the actual humans. The same study found that Meta's LLaMa-3.1-405B was also able to pass the test.

Studies outside of UC San Diego have given GPT passing grades, too. A 2024 University of Reading study had GPT-4 write answers to take-home assessments for undergraduate courses. The graders weren't told about the experiment, and they flagged only one of the 33 AI-generated entries; ChatGPT received above-average grades on the other 32.

So, are these studies definitive? Not quite. Some critics (and there are a lot of them) say these research studies aren't as impressive as they seem. That's why we aren't ready to definitively say that ChatGPT passes the Turing Test.

We can say that while previous-gen LLMs like GPT-4 sometimes passed the Turing Test, passing grades are becoming more common as LLMs advance. And as cutting-edge models like GPT-4.5 come out, we're fast approaching models that can easily pass the Turing Test every time.

OpenAI itself certainly envisions a world in which it's impossible to tell human from AI. That's why OpenAI CEO Sam Altman has invested in World, a human verification project built around an eyeball-scanning machine called the Orb.

What does ChatGPT itself say?

We decided to ask ChatGPT if it could pass the Turing Test, and it told us yes, with the same caveats we've already discussed. When we posed the question, "Can ChatGPT pass the Turing Test?" to the AI chatbot (using the 4o model), it told us, "ChatGPT can pass the Turing Test in some scenarios, but not reliably or universally." The chatbot concluded, "It might pass the Turing Test with an average user under casual conditions, but a determined and thoughtful interrogator could almost always unmask it."
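If you want to reproduce the exchange yourself, here's a minimal sketch using OpenAI's official Python client. It assumes the `openai` package is installed and an `OPENAI_API_KEY` environment variable is set; the exact wording of the reply will vary from run to run.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Pose the same question the article asked, against the 4o model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Can ChatGPT pass the Turing Test?"}],
)

print(response.choices[0].message.content)
```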


The limitations of the Turing Test

Some computer scientists now believe the Turing Test is outdated, and that it's not all that helpful in judging large language models. Gary Marcus, an American psychologist, cognitive scientist, author, and popular AI prognosticator, summed it up best in a recent blog post, where he wrote, “as I (and many others) have said for years, the Turing Test is a test of human gullibility, not a test of intelligence."

It's also worth keeping in mind that the Turing Test is more about the perception of intelligence than actual intelligence. That's an important distinction. A model like ChatGPT 4o might be able to pass simply by mimicking human speech. Whether a large language model passes will also vary depending on the topic and the tester: ChatGPT could easily ape small talk, but it could struggle with conversations that require true emotional intelligence. And modern AI systems are used for much more than chatting, especially as we head toward a world of agentic AI.

None of that is to say that the Turing Test is irrelevant. It's a neat historical benchmark, and it's certainly interesting that large language models are able to pass it. But the Turing Test is hardly the gold-standard benchmark of machine intelligence. What would a better benchmark look like? That's a whole other can of worms that we'll have to save for another story.


Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.