{"id":474,"date":"2026-05-21T01:02:12","date_gmt":"2026-05-21T01:02:12","guid":{"rendered":"https:\/\/fluffyworld.org\/?p=474"},"modified":"2026-05-21T01:02:12","modified_gmt":"2026-05-21T01:02:12","slug":"advanced-ai-passes-the-turing-test-for-the-first-time","status":"publish","type":"post","link":"https:\/\/fluffyworld.org\/?p=474","title":{"rendered":"Advanced AI Passes the Turing Test for the First Time"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p><strong>Summary: <\/strong>A milestone cognitive science study unveiled the first definitive empirical evidence that modern artificial intelligence can pass the iconic Turing test. The randomized, controlled study rigorously applied the 1950 framework created by British mathematician Alan Turing to evaluate whether state-of-the-art large language models (LLMs) could imitate human conversation so convincingly that real people could not tell them apart.<\/p>\n<p>Researchers discovered that when equipped with specific \u201cpersona\u201d prompts, advanced models like GPT-4.5 were judged to be human 73% of the time, significantly outperforming actual human participants and fundamentally altering our understanding of machine intelligence.<\/p>\n<p><strong>Key Facts<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Shattering a 76-Year Benchmark<\/strong>: The project represents the first time an AI system has been rigorously shown to pass the classic Turing test, matching or exceeding human-to-human evaluation baselines.<\/li>\n<li><strong>The Power of Persona Prompting<\/strong>: Humanlikeness proved highly dependent on prompt engineering. When given a specific persona prompt instructing the model to embrace human fallibility, tone, and humor, GPT-4.5 hit a 73% human deception rate. 
Without these explicit instructions, its success rate plummeted to 36%.<\/li>\n<li><strong>Open-Source Parity<\/strong>: Meta\u2019s open-source model, LLaMa-3.1-405B, achieved a <strong>56% human rating<\/strong> when properly prompted, rendering its conversational output statistically indistinguishable from the real humans it was tested against.<\/li>\n<li><strong>Older Baselines Falter<\/strong>: Classic rules-based chatbots and older LLM generations performed poorly. The 1960s chatbot ELIZA and the legacy model GPT-4o were selected as human only 23% and 21% of the time, respectively.<\/li>\n<li><strong>Winning Through Flaws<\/strong>: Coauthor Ben Bergen noted that the models did not win the interrogators over through a raw display of intellectual force or flawless calculation. Instead, they won by exhibiting natural human fallibility and conversational directness, and by making relatable mistakes.<\/li>\n<li><strong>The Rise of \u201cCounterfeit People\u201d<\/strong>: Because the models passed as human across 5-minute and 15-minute conversations, the results raise urgent concerns about online deception, social engineering scams, and automated political persuasion.<\/li>\n<\/ul>\n<p><strong>Source: <\/strong>UCSD<\/p>\n<p><strong>A new University of California San Diego\u00a0study\u00a0unveils the first empirical evidence that a modern artificial intelligence system can pass the Turing test \u2014 a major scientific benchmark that asks whether a machine can imitate human conversation so convincingly that people can\u2019t reliably tell it apart from a real person. 
<\/strong><\/p>\n<p>In a series of experiments, people were often unable to tell the difference between humans and advanced large language models (LLMs).<\/p>\n<figure class=\"wp-block-image size-full\"><picture fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-117031\"><source type=\"image\/webp\" srcset=\"https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience.jpg.webp 1200w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-300x200.jpg.webp 300w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-770x513.jpg.webp 770w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-1155x770.jpg.webp 1155w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-370x247.jpg.webp 370w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-293x195.jpg.webp 293w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-150x100.jpg.webp 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\"\/><img fetchpriority=\"high\" decoding=\"async\" width=\"1200\" height=\"800\" src=\"https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience.jpg\" alt=\"This shows two computer generated brains.\" srcset=\"https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience.jpg 1200w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-300x200.jpg 300w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-770x513.jpg 770w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-1155x770.jpg 1155w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-370x247.jpg 370w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-293x195.jpg 293w, https:\/\/neurosciencenews.com\/files\/2026\/05\/ai-turing-test-neuroscience-150x100.jpg 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\"\/> 
<\/picture><figcaption class=\"wp-element-caption\">Empirical data published in the Proceedings of the National Academy of Sciences demonstrates that state-of-the-art language models utilizing customized human persona prompts can achieve a 73% human deception rate, officially clearing Alan Turing\u2019s historical behavioral benchmark. Credit: Neuroscience News<\/figcaption><\/figure>\n<p>The study, published in the\u00a0<em>Proceedings of the National Academy of Sciences<\/em>,\u00a0is the first to rigorously test LLMs with the method that British mathematician and \u201cfather of computer science\u201d Alan Turing created in 1950. It is also the first time anyone has found that models were judged to be human as often as actual humans using the Turing framework.\u00a0<\/p>\n<p>\u201cWhat we found is that if given the right prompts, advanced LLMs can exhibit the same tone, directness, humor and fallibility as humans,\u201d said the study\u2019s corresponding author Cameron Jones, who conducted the research while earning his doctorate in Cognitive Science from UC San Diego.<\/p>\n<p>\u201cWhile we know LLMs can easily produce knowledge on nearly every topic, this test showed that it can also convincingly display social behavioral traits, which has major implications for how we think of AI.\u201d\u00a0<\/p>\n<p>In the test, a participant chats simultaneously with two other parties \u2014 one is a human and the other is an LLM \u2014 and the human \u201cinterrogator\u201d must decide which party is the human.<\/p>\n<p>Across randomized, controlled experiments with two independent participant groups \u2014 UC San Diego undergraduates and online participants \u2014 interrogators held brief, text-based conversations and then made their judgments.<\/p>\n<p>In the experiments, participants chatted with four different AI systems. GPT-4.5 and LLaMa-3.1-405B represented the state of the art, and the researchers also included older baseline models for comparison. 
The baselines were GPT-4o and ELIZA, a classic 1960s rules-based chatbot.\u00a0<\/p>\n<p>When given a persona prompt, GPT-4.5 was judged to be the human 73% of the time, meaning interrogators selected it as \u201chuman\u201d significantly more often than they selected the real human participant. LLaMa-3.1-405B, given the same prompt, was judged human 56% of the time \u2014 statistically indistinguishable from the humans it was compared against.\u00a0<\/p>\n<p>Baseline systems performed far worse: ELIZA and GPT-4o were selected as human only about 23% and 21% of the time, respectively.<\/p>\n<h3 class=\"wp-block-heading\"><strong>76 years after its inception, the Turing test takes on new meaning\u00a0<\/strong><\/h3>\n<p>The study also reveals that scientists need to start thinking of the Turing test differently, according to the authors.\u00a0<\/p>\n<p>\u201cThe Turing test started as a way to ask whether machines could rival human intelligence,\u201d said study coauthor Ben Bergen, a professor of cognitive science at UC San Diego.<\/p>\n<p>\u201cBut now we know AI can answer many questions faster and more accurately than people can, so the real issue isn\u2019t raw brainpower. Seeing that machines can pass the test \u2014 and seeing how they pass it \u2014 forces us to rethink what it measures. Increasingly, it\u2019s measuring humanlikeness.\u201d<\/p>\n<p>Each LLM was given a \u201cpersona\u201d prompt, asking it to adopt a specific human character and communication style. Bergen explained that the LLMs were not winning through displays of knowledge; they were winning because they made mistakes as a human would. 
\u201cThese traits aren\u2019t the kinds of math and logic problem-solving intelligence that I think Turing was imagining.\u201d<\/p>\n<h3 class=\"wp-block-heading\"><strong>Prompts mattered \u2014 dramatically<\/strong><\/h3>\n<p>Without explicit instructions, the models were far less likely to be mistaken for human: GPT-4.5 fell to a 36% win rate and LLaMa-3.1 to 38%, while baseline systems ELIZA (23%) and GPT-4o (21%) were chosen as human even less often.\u00a0<\/p>\n<p>The same systems that could pass as human when given detailed instructions on what kind of character they should play were unable to adopt such characteristics without that guidance \u2014 suggesting that while the models can behave in convincingly human ways, they often need humans to tell them how.\u00a0<\/p>\n<p>\u201cThey have the ability to appear human-like, but maybe not as much the ability to figure out what it would take to appear human-like,\u201d Bergen said.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Why it matters: trust, deception and the rise of \u201ccounterfeit people\u201d<\/strong><\/h3>\n<p>The results carry real-world implications for trust online \u2014 especially because the models that pass as human do so over the course of extended five- or 15-minute conversations.<\/p>\n<p>\u201cIt\u2019s relatively easy to prompt these models to be indistinguishable from humans,\u201d said Jones, who is now an assistant professor of psychology\u00a0at Stony Brook University. \u201cWe need to be more alert; when you interact with strangers online people should be much less confident that they know they\u2019re talking to a human rather than an LLM.\u201d<\/p>\n<p>He also pointed to darker risks. \u201cThe Turing test is a game about lying for the models,\u201d Jones said. 
\u201cOne of the implications is that models seem to be really good at that.\u201d<\/p>\n<p>Bergen added that being unable to discern whether you\u2019re interacting with a human or a bot can have serious consequences.\u00a0<\/p>\n<p>\u201cThere are lots of people who would like to use bots to persuade people to share their social security numbers, and vote for their party, or buy their product,\u201d he said.\u00a0<\/p>\n<p>The researchers say they hope the work sharpens public understanding of what these systems can now do \u2014 and what kinds of safeguards society may need.<\/p>\n<h3 class=\"wp-block-heading\"><strong>A live, head-to-head test of human vs. machine<\/strong><\/h3>\n<p>To run the study, the researchers built an online interface designed to feel like a familiar messaging app.\u00a0<\/p>\n<p>\u201cFor the interrogator, they have a split screen on their computer and they\u2019re asking questions to both witnesses,\u201d Jones said. \u201cThey know that one of those witnesses is a human and one of them is an AI.\u201d\u00a0<\/p>\n<p>After five minutes (or 15 minutes in a separate replication study), the interrogator had to decide which conversational partner was the real human.<\/p>\n<p>To confirm the results held beyond a single population, the researchers ran the study with two groups: UC San Diego undergraduates recruited through the SONA system and a broader online sample recruited through Prolific, a platform that pays participants to complete research studies. 
Nearly 500 people participated across the experiments.<\/p>\n<p>UC San Diego participants performed slightly better overall, possibly because they shared more \u201ccommon ground\u201d that could be used to probe one another, such as shared experiences and local campus details.<\/p>\n<p>A version of the Turing test interface used in the study is available at\u00a0<a href=\"https:\/\/turingtest.live\/\" target=\"_blank\" rel=\"noreferrer noopener\">turingtest.live<\/a>.<\/p>\n<h3 class=\"wp-block-heading\">Key Questions Answered:<\/h3>\n<div class=\"schema-faq wp-block-yoast-faq-block\">\n<div class=\"schema-faq-section\" id=\"faq-question-1779225392983\"><strong class=\"schema-faq-question\">Q: If AI can already solve difficult math problems, why is passing a simple text chat such a big scientific deal?<\/strong><\/p>\n<p class=\"schema-faq-answer\"><strong>A<\/strong>: Because raw computational brainpower is no longer the true bottleneck of artificial intelligence. For decades, machines could easily output vast repositories of information faster than any human. The Turing test doesn\u2019t measure information; it measures <em>humanlikeness<\/em>, the organic ability to weave humor, flaws, empathy, and social nuances into a conversation. Passing this test proves AI has crossed the line from being a cold, calculating database to becoming a convincing social chameleon.<\/p>\n<\/div>\n<div class=\"schema-faq-section\" id=\"faq-question-1779225394505\"><strong class=\"schema-faq-question\">Q: How did a machine manage to get voted as \u201cmore human\u201d than an actual living human being?<\/strong><\/p>\n<p class=\"schema-faq-answer\"><strong>A<\/strong>: It comes down to how the models were prompted to handle mistakes. In the split-screen test, real humans often type awkwardly, get defensive, or fail to articulate themselves perfectly under pressure. 
When advanced models like GPT-4.5 were instructed to adopt a distinct human persona, they didn\u2019t act like flawless know-it-alls. They matched that exact human fallibility, deploying strategic hesitation, casual humor, and minor mistakes. Interrogators mistook this engineered imperfection for genuine human nature.<\/p>\n<\/div>\n<div class=\"schema-faq-section\" id=\"faq-question-1779225394290\"><strong class=\"schema-faq-question\">Q: What are the real-world dangers of an AI that can lie this convincingly over a 15-minute conversation?<\/strong><\/p>\n<p class=\"schema-faq-answer\"><strong>A<\/strong>: The implications for online trust are deeply concerning. If an LLM can maintain a flawless human facade for 15 minutes, it becomes a weaponized tool for automated deception. Bad actors can easily deploy these highly persuasive bots at a massive scale to trick lonely individuals into revealing social security numbers, manipulate democratic elections, or systematically push fraudulent products, all while the victim remains completely confident they are speaking to a real person.<\/p>\n<\/div>\n<\/div>\n<h3 class=\"wp-block-heading\">Editorial Notes:<\/h3>\n<ul style=\"background-color:#ffffe8\" class=\"wp-block-list has-background\">\n<li>This article was edited by a Neuroscience News editor.<\/li>\n<li>Journal paper reviewed in full.<\/li>\n<li>Additional context added by our staff.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\">About this AI research news<\/h2>\n<p class=\"has-background\" style=\"background-color:#ffffe8\"><strong>Author:\u00a0<\/strong><a href=\"https:\/\/neurosciencenews.com\/cdn-cgi\/l\/email-protection#e88b8d8b84899a83a89d8b9b8cc68d8c9d\" target=\"_blank\" rel=\"noreferrer noopener\">Christine Clark<\/a><br \/><strong>Source:\u00a0<\/strong><a href=\"https:\/\/ucsd.edu\" target=\"_blank\" rel=\"noreferrer 
noopener\">UCSD<\/a><br \/><strong>Contact:\u00a0<\/strong>Christine Clark \u2013 UCSD<br \/><strong>Image:\u00a0<\/strong>The image is credited to Neuroscience News<\/p>\n<p class=\"has-background\" style=\"background-color:#ffffe8\"><strong>Original Research:\u00a0<\/strong>Open access.<br \/>\u201c<a href=\"https:\/\/doi.org\/10.1073\/pnas.2524472123\" target=\"_blank\" rel=\"noreferrer noopener\">Large Language Models Pass a Standard Three-Party Turing Test<\/a>\u201d by Cameron Jones and Ben Bergen.\u00a0<em>PNAS<\/em><br \/><strong>DOI:10.1073\/pnas.2524472123<\/strong><\/p>\n<hr class=\"wp-block-separator has-text-color has-pale-cyan-blue-color has-alpha-channel-opacity has-pale-cyan-blue-background-color has-background\"\/>\n<p><strong>Abstract<\/strong><\/p>\n<p><strong>Large Language Models Pass a Standard Three-Party Turing Test<\/strong><\/p>\n<p>The Turing test has been widely discussed as a test of machine intelligence, but it also provides a measure of how humans distinguish other humans from machines. We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomized, controlled, and preregistered Turing tests on independent populations.<\/p>\n<p>Participants had 5 min conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant.<\/p>\n<p>LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time\u2014not significantly more or less often than the humans it was being compared to. 
Without these prompts, however, the same models performed significantly worse (38% and 36%), and did not consistently outperform baseline models, ELIZA and GPT-4o (23% and 21%, respectively).<\/p>\n<p>A third study replicated these results in 15-min games: two PERSONA-prompted models achieved pass rates of 56% and 59%. The results constitute empirical evidence that artificial systems can pass a standard three-party Turing test. Interrogators\u2019 reasoning focused more on stylistic and socio-emotional aspects of human behavior rather than more traditional notions of intelligence.<\/p>\n<p>The results have implications for debates about what kind of intelligence is exhibited by large language models, the social impacts these systems are likely to have, and the aspects of human behavior that people continue to see as unique.<\/p>\n<p> <\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/neurosciencenews.com\/ai-passes-turing-test-30733\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: A milestone cognitive science study unveiled the first definitive empirical evidence that modern artificial intelligence can pass the iconic Turing test. 
The randomized, controlled [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":475,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-474","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"_links":{"self":[{"href":"https:\/\/fluffyworld.org\/index.php?rest_route=\/wp\/v2\/posts\/474","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fluffyworld.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fluffyworld.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fluffyworld.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fluffyworld.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=474"}],"version-history":[{"count":0,"href":"https:\/\/fluffyworld.org\/index.php?rest_route=\/wp\/v2\/posts\/474\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/fluffyworld.org\/index.php?rest_route=\/wp\/v2\/media\/475"}],"wp:attachment":[{"href":"https:\/\/fluffyworld.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=474"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fluffyworld.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=474"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fluffyworld.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=474"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}