Creepy Exposé as an AI Model Seems to Know When Humans Test It

March 6, 2024

In a weird and creepy revelation, developers of the recently released ChatGPT rival, Claude 3 Opus, have disclosed that the AI tool appears to know when humans put it to the test.

Claude 3 Opus is the latest offering from the Google-backed startup Anthropic, which claims the model is more powerful than rival OpenAI’s GPT-4.

Startling revelations

Beyond the model’s capabilities, the developers have shared observations that appear to point to a new level of awareness in an AI-powered chatbot.

Anthropic engineer Alex Albert highlighted in a post on the X platform how the AI model seemed to know it was under evaluation and scrutiny.

“Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval,” wrote Albert in his post.

“When we ran this test on Opus, we noticed some interesting behavior—it seemed to suspect that we were running an eval on it,” he added.

Albert explained that in order to evaluate chatbots’ capabilities, developers run what is known as the “needle-in-a-haystack” evaluation. This test entails asking the software “about a longer text into which an unrelated sentence has been artificially inserted.”

According to the developers, the goal is to see how well the software can pick out a specific piece of information from a large body of surrounding context.
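The test described above can be sketched in a few lines of Python. This is only an illustration, not Anthropic's actual evaluation code: the function names (`build_haystack`, `toy_model`, `run_eval`), the filler sentences, and the wording of the needle are all assumptions made for the example, and `toy_model` is a simple keyword matcher standing in for a real language model.

```python
import re


def build_haystack(filler_sentences, needle, depth):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def toy_model(context, question):
    """Hypothetical stand-in for a real model: returns the sentence
    that shares the most words with the question."""
    question_words = set(re.findall(r"[a-z]+", question.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", context)
    return max(
        sentences,
        key=lambda s: len(question_words & set(re.findall(r"[a-z]+", s.lower()))),
    )


def run_eval(model, filler_sentences, needle, question, expected_phrase, depths):
    """Check, for each insertion depth, whether the answer contains the expected phrase."""
    results = {}
    for depth in depths:
        context = build_haystack(filler_sentences, needle, depth)
        answer = model(context, question)
        results[depth] = expected_phrase.lower() in answer.lower()
    return results


# Illustrative data only; the real test used a much longer collection of documents.
FILLER = [
    "Programming languages differ in how they manage memory.",
    "Many startups begin with a small founding team.",
    "Compilers translate source code into machine code.",
    "Early funding often comes from angel investors.",
]
NEEDLE = (
    "An international pizza association recognizes figs, prosciutto, "
    "and goat cheese as the best toppings."
)

results = run_eval(
    toy_model, FILLER, NEEDLE, "What are the best pizza toppings?", "figs", [0.0, 0.5, 1.0]
)
print(results)
```

In a real evaluation, `toy_model` would be replaced by a call to the model under test, and the haystack would run to many thousands of tokens, with the needle placed at varying depths.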

“For background, this tests a model’s recall ability by inserting a target sentence (the ‘needle’) into a corpus of…” Albert (@alexalbert__) continued in the post, dated March 4, 2024.

Running the tests

As part of the test, the new AI model, Claude 3 Opus, examined a collection of technical texts and flagged an out-of-place sentence about an international pizza association recognizing figs, prosciutto, and goat’s cheese as the best toppings.

However, according to the developers, the AI model not only noted that the sentence did not fit with the rest of the text, which was mainly about programming languages and startups, but also appeared to be aware that it was being tested by humans.

“I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all,” the AI model reportedly said.

This raised questions about the AI model’s level of consciousness.

“Opus not only found the needle; it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities,” said Albert.

“This level of meta-awareness was very cool to see, but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models’ true capabilities and limitations,” explained the developer.


Terrifying development

Responding to Albert’s post on the X platform, AI researcher Margaret Mitchell said this could be “terrifying.”

“That’s fairly terrifying, no? The ability to determine whether a human is manipulating it to do something foreseeably can lead to making decisions to obey or not,” she said.

Other AI enthusiasts described it as “wild,” while Geoffrey Miller highlighted the “fine line between ‘fun story’ and ‘existentially terrifying horror show.’”

According to the company, as AI grows more sophisticated, the needle-in-a-haystack approach of testing the software with artificially constructed tasks may ultimately not be a reliable means of assessing its true capability.

The Rolls-Royce of AI models

With backing from Google and Amazon, Anthropic competes directly with ChatGPT maker OpenAI.

Meanwhile, management at Anthropic has hyped Claude 3 Opus, saying it outperforms its rivals GPT-4 and Gemini on various benchmark exams.

“This is the Rolls-Royce of models, at least at this point in time,” CEO Dario Amodei said in an interview.

Daniela Amodei, Anthropic’s president, is confident that despite the higher price tag, people will still choose Claude 3 Opus whenever they need to handle complex tasks such as complicated financial analysis.

According to Reuters, Anthropic said Claude 3 Opus costs $15 to take in every 1 million pieces of data, which are known as tokens, “and at least five times less for its smaller models to handle the same.”

By comparison, OpenAI charges $10 for every million tokens.
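To put the quoted prices in perspective, here is a rough cost comparison in Python. It is a sketch under stated assumptions: the per-million-token prices come from the figures above, the smaller-model price is only the upper bound implied by “at least five times less,” and the workload of 2.5 million tokens is a made-up number chosen for illustration.

```python
def input_cost_usd(tokens, price_per_million_usd):
    """Cost in US dollars of taking in `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


# Prices per 1 million tokens, as reported in the article.
OPUS_PRICE = 15.0
OPENAI_PRICE = 10.0
# "At least five times less" implies at most one fifth of the Opus price.
SMALLER_MODEL_MAX_PRICE = OPUS_PRICE / 5

# Hypothetical workload, not a figure from the article.
tokens = 2_500_000

opus_cost = input_cost_usd(tokens, OPUS_PRICE)
openai_cost = input_cost_usd(tokens, OPENAI_PRICE)
smaller_cost_ceiling = input_cost_usd(tokens, SMALLER_MODEL_MAX_PRICE)

print(f"Claude 3 Opus: ${opus_cost:.2f}")
print(f"OpenAI: ${openai_cost:.2f}")
print(f"Smaller Anthropic models: at most ${smaller_cost_ceiling:.2f}")
```

At these rates, the same workload costs 50 percent more on Claude 3 Opus than at OpenAI’s quoted price, which is the gap Anthropic is betting customers will accept for complex tasks.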