All Hail Cicero, the Conqueror – AI Beats Humans in Diplomacy

November 28, 2022

Meta Platforms Inc, the parent company of Facebook, said it has created an AI that can outsmart humans in an online version of the popular strategy game, Diplomacy, where seven players compete for geographic control of Europe by moving pieces around on a map.

In a paper published in the journal Science, Meta said Cicero was the first AI agent to achieve human-level performance in Diplomacy, a game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players.

In a total of 40 anonymous games of online Diplomacy, Meta said Cicero had achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.

The leading technology group said this was part of its strategic and long-term goal in the field of artificial intelligence to build agents that can plan, coordinate, and negotiate with humans in natural language.

How important is Cicero?

Meta says Cicero is significant because it succeeds in an environment that is not purely adversarial.

Prior major successes for multi-agent AI have come in purely adversarial environments, such as chess, Go, and poker, where communication has no value. Cicero, by contrast, pairs a strategic reasoning engine with a controllable dialogue module.

For these reasons, Meta says Diplomacy has served as a challenging benchmark for multi-agent learning.

“Cicero couples a controllable dialogue module with a strategic reasoning engine. At each point in the game, Cicero models how the other players are likely to act based on the game state and their conversations,” Meta says.

The AI then plans how the players can coordinate to their mutual benefit and maps these plans into natural language messages.

Healthy mistrust

Cicero avoids blindly trusting proposals from other players and rejects plans that have low “predicted value” or that run counter to its own interests.
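The idea of screening proposals by predicted value can be illustrated with a minimal Python sketch. This is not Meta's implementation: the Proposal class, the accept function, the baseline comparison, and all numbers are hypothetical, chosen only to show the shape of such a check.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A plan suggested by another player (hypothetical structure)."""
    sender: str
    description: str
    predicted_value: float  # assumed estimate of the outcome if the plan is followed


def accept(proposal: Proposal, baseline_value: float, margin: float = 0.0) -> bool:
    """Accept only if the plan is predicted to do at least as well as
    the agent's own best alternative (baseline_value) plus a margin."""
    return proposal.predicted_value >= baseline_value + margin


# Illustrative numbers only: the agent's own plan is assumed to be worth 0.50.
own_plan_value = 0.50
good = Proposal("FRANCE", "Joint attack on Munich", predicted_value=0.62)
bad = Proposal("ITALY", "Leave Trieste undefended", predicted_value=0.31)

decisions = {p.sender: accept(p, own_plan_value) for p in (good, bad)}
print(decisions)
```

In this toy setup the first proposal is accepted and the second is rejected, since only the first is predicted to beat the agent's own plan.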

Because dialogue in Diplomacy occurs privately between pairs of players, Cicero must reason about what information each player has access to when making predictions.

“For example, if Cicero is coordinating an attack with an ally against an adversary, Cicero’s prediction of the adversary’s policy must account for the fact that the adversary is not aware of the intended coordination,” said Meta.

Meta says it entered Cicero anonymously in 40 games of Diplomacy in an online league of human players between August 19th and October 13th, 2022.

In the course of 72 hours of play that involved sending 5,277 messages, Cicero ranked in the top 10% of participants who played more than one game, it said.

Meta says it collected data from 125,261 games of Diplomacy played online at webDiplomacy.net. Of these, 40,408 games contained dialogue, with a total of 12,901,662 messages exchanged between players.

Prompt: “Robot beating everybody else in a game of Diplomacy” (AI-generated).

Meta notes its new AI is far from perfect

Cicero sent messages that contained errors, sometimes contradicted its own plans and made strategic blunders.

But Meta insists that humans nonetheless chose to collaborate with the AI over other players, without realising it was a bot.

“Almost all prior AI breakthroughs in games have been in two-player zero-sum (2p0s) settings, including chess, Go, heads-up poker, and StarCraft. In finite 2p0s games, certain reinforcement learning (RL) algorithms that learn by playing against themselves—a process known as self-play—will converge to a policy that is unbeatable in expectation in balanced games,” Meta added in the paper. “In other words, any finite 2p0s game can be solved via self-play with sufficient compute and model capacity.”

However, Meta said that in games involving cooperation, self-play without human data is no longer guaranteed to find a policy that performs well with humans, even with infinite compute and model capacity, because the self-play agent may converge to a policy that is incompatible with human norms and expectations.

Cicero anticipates likely actions

Meta added that Cicero anticipates likely actions for each player based on the state of the board and dialogue, using that as the starting point for a planning algorithm using RL-trained models.

The AI uses a strategic reasoning module to intelligently select intents and actions, the company says.

This module then runs a planning algorithm that predicts the policies of all other players based on the game state and dialogue, accounting for both the strength of different actions and their likelihood in human games. Based on these predictions, Cicero selects the action it judges best.
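One simple way to weigh an action's strength against its likelihood in human play is to scale each action's human-play probability by an exponential of its estimated value. The Python sketch below shows that blend under stated assumptions; it is not taken from the article and is not presented as Meta's algorithm. The function name blend_policy, the parameter lam, the action names, and all numbers are hypothetical.

```python
import math


def blend_policy(values, human_probs, lam):
    """Return a distribution proportional to human_probs[a] * exp(values[a] / lam).

    A large lam keeps the result close to human play; a small lam
    favours whichever action has the highest estimated value.
    """
    top = max(values.values())  # subtract the max for numerical stability
    weights = {
        a: human_probs[a] * math.exp((values[a] - top) / lam) for a in values
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Illustrative inputs only (assumed, not from the paper).
values = {"attack": 1.0, "hold": 0.5, "support_ally": 0.8}
human_probs = {"attack": 0.1, "hold": 0.3, "support_ally": 0.6}

balanced = blend_policy(values, human_probs, lam=0.5)
greedy = blend_policy(values, human_probs, lam=0.05)

print(max(balanced, key=balanced.get))  # support_ally
print(max(greedy, key=greedy.get))      # attack
```

With lam=0.5 the blend prefers the action humans commonly play that is also fairly strong; with lam=0.05 the raw value estimate dominates and the rarely played but highest-value action wins.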

Under Meta’s founder and CEO Mark Zuckerberg, the company has been heavily investing in AI and the metaverse to take advantage of the fast-growing industry seen as the future of technology.

/MetaNews