Abstract
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.
Meta Fundamental AI Research Diplomacy Team (FAIR)†, Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, et al. 2022. “Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning.” Science, November, eade9097. https://doi.org/10.1126/science.ade9097.
Cicero is designed to be honest in the sense that all of its messages are generated from its intents, where its intents are the moves Cicero in fact intends to play at the moment it says them (Cicero can change its mind after saying things), and at the end of the turn the moves it plays are exactly its last intents.
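To make that invariant concrete, here is a minimal sketch of the loop in Python; the class, the planner/dialogue-model interfaces, and every name in it are my own illustration, not Cicero's actual code.

```python
from dataclasses import dataclass, field

# Minimal sketch of the honesty invariant described above. The planner and
# dialogue model are stand-ins; all names here are illustrative, not Cicero's code.

@dataclass
class HonestNegotiator:
    planner: object          # strategic module: game state -> moves it really plans to play
    dialogue_model: object   # (game state, intent, recipient) -> message text
    intent: dict = field(default_factory=dict)

    def send_messages(self, state, recipients):
        messages = []
        for recipient in recipients:
            # Re-plan first, so the intent reflects the current plan at the moment of speaking.
            self.intent = self.planner.plan(state)
            # Every message is conditioned on that true, current intent.
            messages.append(self.dialogue_model.generate(state, self.intent, recipient))
        return messages

    def submit_orders(self, state):
        # At the end of the turn, the moves actually played are exactly the final intent.
        self.intent = self.planner.plan(state)
        return self.intent
```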
Not only does Cicero use its true intents to generate messages, it also tries to generate messages that actually correspond to those intents. Its dialogue model is trained to imitate humans on WebDiplomacy, but when humans intend to attack Belgium, they will sometimes say things like "I won't attack Belgium". That is, an AI could lie by forming an intent to attack Belgium, devising the lying intent "won't attack Belgium", and generating lying messages from that lying intent. Cicero doesn't do this: the intent it feeds to the dialogue model is always truthful. An AI could also lie by forming an intent to attack Belgium and, from the truthful intent, generating lying messages like "I won't attack Belgium" by imitating lying humans. Cicero doesn't do this either: the dialogue model is trained to imitate only truthful humans, because its training data is filtered by a lie detector, which removes about 5% of turns.
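A rough sketch of what that training-data filtering could look like, assuming a lie detector that scores each dialogue turn by how inconsistent the message is with the moves the speaker actually played; the function names and the thresholding scheme are my assumptions, with only the ~5% figure taken from above.

```python
def filter_truthful_turns(dialogue_turns, lie_detector, drop_fraction=0.05):
    """Keep only dialogue turns the lie detector considers consistent with the speaker's moves.

    dialogue_turns: list of (message, stated_plan, actual_moves) tuples
    lie_detector:   callable returning a higher score for a more likely lie
    """
    scored = [(lie_detector(msg, plan, moves), (msg, plan, moves))
              for msg, plan, moves in dialogue_turns]
    scored.sort(key=lambda pair: pair[0])             # least suspicious turns first
    keep = int(len(scored) * (1.0 - drop_fraction))   # drop the ~5% most suspicious turns
    return [turn for _, turn in scored[:keep]]
```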
That does not mean Cicero does not dissemble or mislead! There are three aspects to this. First, there is a messaging model, entirely separate from the dialogue model. The messaging model decides whether to send messages at all, and it is trained to imitate humans. When humans intend to attack Belgium, held by France, they may not message France at all. Cicero copies this behavior.
Second, there is a topic model, also entirely separate. The topic model decides which intent to talk about, and it is trained to imitate humans. When humans intend to attack Belgium, held by France, and also Norway, held by Russia, they may talk to France about Norway and to Russia about Belgium. Cicero copies this behavior as well.
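Put together, the messaging and topic decisions sit in front of the dialogue model roughly like this; the interfaces below are assumptions for exposition, not the paper's API.

```python
def prepare_outgoing_messages(state, true_intents, powers,
                              messaging_model, topic_model, dialogue_model):
    """true_intents: the full set of moves the agent actually plans to play this turn."""
    drafts = {}
    for power in powers:
        # Imitated human behavior: sometimes you simply don't message the player
        # you are about to attack.
        if not messaging_model.should_message(state, power, true_intents):
            continue
        # Imitated human behavior: choose which part of the (truthful) plan to bring up,
        # e.g. discuss Norway with France and Belgium with Russia.
        topic_intent = topic_model.choose_topic(state, power, true_intents)
        drafts[power] = dialogue_model.generate(state, topic_intent, power)
    return drafts
```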
Third, there is a filtering model, also entirely separate. When Cicero intends to attack Belgium, held by France, maybe the messaging model decides to talk to France, the topic model decides to talk to France about Belgium, and the dialogue model decides to say "I will attack Belgium". That does not mean Cicero actually says "I will attack Belgium": the filtering model can veto it. In particular, the value-based filtering model estimates how saying something will impact Cicero's own utility. Eight messages are sampled from the dialogue model, their value impacts are calculated, an importance score is computed from those value impacts, and in the 15% of most important situations the three lowest-value messages are dropped and one of the remaining messages is picked at random.
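Here is how that filtering step could be wired up, as a sketch: the candidate count, drop count, and 15% figure come from the description above, while the value-model interface and the way the importance threshold would be calibrated are my assumptions.

```python
import random

def choose_message(state, intent, recipient, dialogue_model, value_model,
                   n_candidates=8, n_dropped=3, importance_threshold=None):
    """importance_threshold would be calibrated offline so that roughly the top 15%
    of situations (by importance) exceed it."""
    candidates = [dialogue_model.generate(state, intent, recipient)
                  for _ in range(n_candidates)]
    # Estimated change in the agent's own expected value if each candidate were sent.
    impacts = [value_model.value_after_sending(state, msg, recipient) - value_model.value(state)
               for msg in candidates]
    # Importance of the situation: how much the candidates differ in their value impact.
    importance = max(impacts) - min(impacts)
    ranked = sorted(zip(impacts, candidates), key=lambda pair: pair[0], reverse=True)
    if importance_threshold is not None and importance >= importance_threshold:
        # High-stakes situation: veto the three lowest-value candidates...
        ranked = ranked[:n_candidates - n_dropped]
    # ...and pick one of the surviving messages at random.
    return random.choice([msg for _, msg in ranked])
```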