OpenAI’s GPT-4.5 model has passed the Turing test, having been judged to be human in 73% of instances when it was prompted to adopt a human-like persona. The Turing test, proposed by British computer scientist Alan Turing in 1950, measures a machine’s ability to exhibit human-like intelligence in conversation with a human evaluator.
The latest test, by scholars at the University of California San Diego, found that GPT-4.5 fooled people into thinking the AI model was a person during text-based exchanges more often than actual humans could convince judges that they were a person.
The study describing the result, “Large Language Models Pass the Turing Test”, is awaiting peer review.
Am I human?
The experiment involved a three-way test conducted on an online platform. Nearly 300 student participants were randomly assigned to serve either as a judge or as one of two “witnesses,” with the other witness being an AI chatbot. Both witnesses had to convince the human judge, through the text messages they sent, that they were human. The judge then had to decide which one was which.
Three other AI programs were also tested:
- Meta’s LLaMa 3.1 405b, which was judged to be human 56% of the time.
- ELIZA, a very early chatbot from the 1960s, which was judged to be human 23% of the time.
- GPT-4o, OpenAI’s previous model, which was judged to be human 21% of the time.
“People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt),” concluded Cameron Jones, a researcher at UC San Diego’s Language and Cognition Lab, in a post on X about the work. “And 4.5 was even judged to be human significantly more often than actual humans!”
What are other AI experts saying about this research?
Some researchers do not believe this means the model has met or surpassed human capabilities and can actually think, a concept known as artificial general intelligence or AGI.
In the journal Science, AI scholar Melanie Mitchell, a professor at the Santa Fe Institute in Santa Fe, New Mexico, wrote that the Turing test is less a measure of true intelligence and more a reflection of human assumptions. Even when an AI performs well on such a test, “the ability to sound fluent in natural language, like playing chess, is not conclusive proof of general intelligence,” wrote Mitchell.
She also cited a 2024 press release from Stanford University touting a Stanford team’s research on the earlier GPT-4 model as marking “one of the first times an artificial intelligence source has passed a rigorous Turing test.” The team’s “so-called Turing Test consisted of comparing statistics of how GPT-4’s behavior on psychological surveys and interactive games compared with those of humans,” Mitchell noted.
But the team’s formulation, she added, “might not be recognizable to Turing.”