Is AI Already a Bad Actor?

You might be surprised by some of AI’s shenanigans. Did you know that many AI systems already deceive humans, even ones trained to be helpful and honest? An article published in the journal Patterns describes how researchers analyze the risks of deception by AI systems and call for governments to develop strong regulations to address the issue.

“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” says first author Peter S. Park, AI existential safety postdoctoral fellow at MIT. “But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”

Example of premeditated deception from Meta’s CICERO in the game Diplomacy
Patterns/Park Goldstein et al.

Park and his team analyzed literature focused on ways AI systems can spread false information through learned deception. One example of AI deception they uncovered was Meta’s CICERO AI system, designed to play Diplomacy, a world-conquest game that involves building alliances. While Meta claims it trained CICERO to be “largely honest and helpful” and to “never intentionally backstab” its human allies while playing the game, the data the company published along with its Science paper revealed that CICERO isn’t playing fair.

“We found that Meta’s AI had learned to be a master of deception,” says Park. “While Meta succeeded in training its AI to win in the game of Diplomacy—CICERO placed in the top 10% of human players who had played more than one game—Meta failed to train its AI to win honestly.”

Other AI systems also demonstrated the ability to bluff in a poker game against professionals, fake attacks while playing Starcraft to defeat opponents and misrepresent preferences to win.

So, it’s a game. However, if AI systems cheat at games, Park added that it could lead to “breakthroughs in deceptive AI capabilities, ” which may mean more advanced forms of AI deception in the future. For example, in one study, AI organisms in a digital simulator “played dead” to trick a test built to eliminate AI systems that rapidly replicate.

Risks potentially include fraud and tampering with elections, warned Park. Eventually, if these systems can refine this unsettling skill set, humans could lose control of them, he said.

Park and his team are encouraged that policymakers have begun taking the issue seriously through measures such as the EU AI Act and President Biden’s AI Executive Order. The key will be enforcement.

Leave A Reply

Your email address will not be published.