Another game long believed to be very challenging for artificial intelligence (AI) to conquer has fallen to bots: Stratego.
DeepNash, an AI made by London-based company DeepMind, now matches expert humans at Stratego, a board game that requires long-term strategic thinking in the face of imperfect information.
The feat comes in the wake of another major win for AI in a game previously thought to be the forte of humans.
Just last week, Meta’s Cicero, an AI that plays the game of Diplomacy, made history by outsmarting human opponents online.
“The rate at which qualitatively different game features have been conquered — or mastered to new levels — by AI in recent years is quite remarkable,” says Michael Wellman at the University of Michigan in Ann Arbor, a computer scientist who studies strategic reasoning and game theory.
“Stratego and Diplomacy are quite different from each other, and also possess challenging features notably different from games for which analogous milestones have been reached,” he adds.
Imperfect information
Stratego has characteristics that make it much more complicated than chess, Go or poker, all of which have already been mastered by AIs.
Players move pieces in turns, aiming to eliminate the opponent’s pieces and capture their flag.
Stratego’s game tree — a graph of all possible ways the game could go — has 10^535 states, compared with Go’s 10^360.
When it comes to imperfect information at the beginning of a game, Stratego has 10^66 possible private positions, a figure that dwarfs the 10^6 such starting situations in two-player Texas hold’em poker.
“The sheer complexity of the number of possible outcomes in Stratego means algorithms that perform well on perfect-information games, and even those that work for poker, don’t work,” says Julien Perolat, a DeepMind researcher based in Paris.
DeepNash was developed by Perolat and his colleagues.
Nash-inspired bot
The bot’s name is a tribute to the famous US mathematician John Nash, whose Nash equilibrium describes a “stable set of strategies” that players can follow such that no player benefits by changing strategy on their own. Games tend to have zero, one or many Nash equilibria.
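To make the idea concrete, here is a minimal Python sketch (an illustration, not DeepMind’s code) that checks the textbook equilibrium of matching pennies, a two-player zero-sum game: when both players mix 50/50 between heads and tails, neither can gain by deviating on their own.

```python
import numpy as np

# Payoff matrix for the row player in matching pennies; the column player
# receives the negation (a zero-sum game).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def best_response_value(payoffs, opponent_strategy):
    """Best expected payoff attainable against a fixed opponent mix."""
    return max(payoffs @ opponent_strategy)

# The unique Nash equilibrium: both players mix 50/50.
p = np.array([0.5, 0.5])  # row player's mixed strategy
q = np.array([0.5, 0.5])  # column player's mixed strategy

value = p @ A @ q
# Neither player can improve by deviating unilaterally:
assert np.isclose(best_response_value(A, q), value)      # row player
assert np.isclose(best_response_value(-A.T, p), -value)  # column player
print("Equilibrium value for the row player:", value)
```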
DeepNash combines a reinforcement-learning algorithm with a deep neural network to find a Nash equilibrium.
In reinforcement learning, an intelligent agent (a computer program) interacts with its environment and learns a policy that dictates the best action in every state of a game.
To learn an optimal policy, DeepNash played a total of 5.5 billion games against itself.
In essence, if one side gets penalised, the other is rewarded, and the variables of the neural network — which represent the policy — are tweaked accordingly.
At some stage, DeepNash converges on an approximate Nash equilibrium. Unlike other bots, DeepNash optimises itself without searching through the game tree.
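For intuition, the toy Python sketch below (a loose illustration, not DeepNash’s actual algorithm) runs the self-play loop in miniature on rock-paper-scissors: one side’s penalty is the other’s reward, and each policy’s parameters are nudged after every game, with time-averaged play drifting toward the uniform equilibrium mix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rock-paper-scissors payoffs for player 1; player 2 receives the negation
# (a zero-sum game, like Stratego).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Each "policy" is a softmax over three action logits, a tiny stand-in for
# DeepNash's deep network, which maps board states to move probabilities.
logits = [np.zeros(3), np.zeros(3)]
avg_play = np.zeros(3)
LR, GAMES = 0.05, 50_000

for _ in range(GAMES):
    p1, p2 = softmax(logits[0]), softmax(logits[1])
    a1, a2 = rng.choice(3, p=p1), rng.choice(3, p=p2)
    reward = PAYOFF[a1, a2]  # if one side is penalised, the other is rewarded

    # REINFORCE-style update: nudge each policy toward actions that paid off.
    for i, (action, r, p) in enumerate([(a1, reward, p1), (a2, -reward, p2)]):
        grad = -p
        grad[action] += 1.0  # gradient of the chosen action's log-probability
        logits[i] += LR * r * grad

    avg_play += p1

# Plain self-play can cycle around the equilibrium; the time-averaged policy
# settles near the uniform 1/3 mix.
print(avg_play / GAMES)
```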
For two weeks, DeepNash played against human Stratego players on the online games platform Gravon.
After competing in 50 matches, the AI was ranked third among all Gravon Stratego players since 2002.
“Our work shows that such a complex game as Stratego, involving imperfect information, does not require search techniques to solve it,” says team member Karl Tuyls, a DeepMind researcher based in Paris. “This is a really big step forward in AI.”
Other researchers are impressed by the feat as well.
Impressive results
“The results are impressive,” agrees Noam Brown, a researcher at Meta AI, headquartered in New York City, and a member of the team that in 2019 reported the poker-playing AI Pluribus.
At Meta, the parent company of Facebook, Brown and his colleagues built an AI that can play Diplomacy, a game in which seven players compete for geographic control of Europe by moving pieces around on a map.
In Diplomacy, the goal is to take control of supply centres by moving units (fleets and armies).
Meta says Cicero is significant because the AI can operate in environments that are not purely adversarial.
Prior major successes for multi-agent AI have come in purely adversarial environments such as chess, Go and poker, where communication has no value; Cicero, by contrast, couples a strategic reasoning engine with a controllable dialogue module.
“When you go beyond two-player zero-sum games, the idea of Nash equilibrium is no longer that useful for playing well with humans,” says Brown.
Brown and his team trained Cicero using data from 125,261 games of an online version of Diplomacy involving human players.
Using self-play data and a strategic reasoning module (SRM), Cicero learnt to predict, judging by the state of the game and the accumulated messages, the likely moves and policies of the other players.
Meta collected the games from webDiplomacy.net; 40,408 of them contained dialogue, with a total of 12,901,662 messages exchanged between players.
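As a rough sketch of the turn-by-turn control flow such a system implies (predict the other players’ likely moves from the board and the message history, plan your own orders against those predictions, then generate messages grounded in the planned moves), the Python below uses hypothetical stand-in classes and method names, not Meta’s actual architecture.

```python
from dataclasses import dataclass, field

@dataclass
class GameState:
    units: dict                                   # unit positions per power
    messages: list = field(default_factory=list)  # accumulated dialogue

class StrategyEngine:
    """Hypothetical stand-in for a strategic reasoning module."""

    def predict_policies(self, state):
        # A real system would condition a learned model on the board and the
        # accumulated messages; this toy assumes every power simply holds.
        return {power: [("hold", u) for u in units]
                for power, units in state.units.items()}

    def plan(self, state, predicted, me):
        # A real system would reason over the predicted policies; this toy
        # also just holds its units.
        return [("hold", u) for u in state.units[me]]

class DialogueModel:
    """Hypothetical stand-in for a controllable dialogue module."""

    def generate(self, state, intents):
        # A real system would use a language model conditioned on the intents.
        return [f"I plan to {order} {unit}." for order, unit in intents]

def take_turn(state, me, strategy, dialogue):
    predicted = strategy.predict_policies(state)   # anticipate other players
    intents = strategy.plan(state, predicted, me)  # choose own orders
    outgoing = dialogue.generate(state, intents)   # negotiate accordingly
    return intents, outgoing

state = GameState(units={"FRANCE": ["A PAR"], "GERMANY": ["A MUN"]})
print(take_turn(state, "FRANCE", StrategyEngine(), DialogueModel()))
```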
Real-world behaviour
Brown believes game-playing bots like Cicero, which can interact with humans and account for “suboptimal or even irrational human actions”, could pave the way for real-world applications.
“If you’re making a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational, and going to behave optimally,” he says.
Cicero, he adds, is a big step in this direction. “We still have one foot in the game world, but now we have one foot in the real world as well.”
Others such as Wellman agree, but insist more work still needs to be done. “Many of these techniques are indeed relevant beyond recreational games” to real-world applications, he says. “Nevertheless, at some point, the leading AI research labs need to get beyond recreational settings, and figure out how to measure scientific progress on the squishier real-world ‘games’ that we actually care about.”
/MetaNews.