Abstract
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.
Meta Fundamental AI Research Diplomacy Team (FAIR)†, Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, et al. 2022. “Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning.” Science, November, eade9097. https://doi.org/10.1126/science.ade9097.
Timeline considerations:
This is not particularly unexpected if you believed in the scaling hypothesis. (It should be surprising if you continue to take seriously alternative & still-prestigious centrist paradigms like "we need a dozen paradigm shifts and it'll take until 2050".)
The human range is narrow, so once you reach SBER and GPT-3, you already are most of the way to full-press Diplomacy. The fact that the authors & forecasters thought it would take until 2029 in 2019 (ie 10 years instead of 3 years) is part and parcel of the systematic underestimation of DL, which we have seen elsewhere in eg all the forecasts shattered by inner-monologue techniques - as Eliezer put it, experts can often be the blindest because they miss the Outside View forest for the Inside View trees, having 'slaved over a hot GPU' for so long.
From a scaling perspective, the main surprise of the timing is that Facebook chose to fund this research for this long.
As Diplomacy is unimportant and a pretty niche game even among board games, there's no reason it 'had' to be solved this side of 2030 (when it presumably would become so easy that even a half-hearted effort by a grad student or hobbyist would crack it). Similarly, the main surprise in DeepMind's recent Stratego human-level AI is mostly that 'they bothered'. Keep this in mind if you want to forecast Settlers of Catan progress: it's not hard, and the real question you are forecasting is, 'will anyone bother?' (And, since the tendency of research groups is to bury their failures out back, you also can't forecast using wording like 'conditional on a major effort by a major DL group like FAIR, DM, OA etc' - you won't know if anyone does bother & fails because it's genuinely hard.)
Deception is deceptive: one interesting aspect of the 'honesty' of the bot is that it might show how deception is an emergent property of a whole system, not of any one part. (EDIT: some Twitter discussion)
CICERO may be constrained to be 'honest' in each interaction but it still will betray you if you trust it & move into positions where betraying you is profitable. Is it merely opportunistic, or is it analogous to humans where self-deception makes you more convincing? (You sincerely promise to not steal X, but the temptation of X eventually turns out to be too great...) It is trained end-to-end and is estimating expected value, so even if there is no 'deception module' or 'deception intent' in the planning, the fact that certain statements lead to certain long-term payoffs (via betrayal) may influence its value estimates, smuggling in manipulation & deception. Why did it pick option A instead of equally 'honest' option B? No idea. But option A down the line turns out to 'unexpectedly' yield a betrayal opportunity, which it then takes. The interplay between optimization, model-free learning, model-based planning, and the underlying models is a subtle one. (Blackbox optimization like evolution could also evolve this sort of de facto deception even when components are constrained to be 'honest' on a turn-by-turn basis.)
I'm not sure of this, because piKL (and the newer variants introduced & used in CICERO) are complex (maybe some causal influence diagrams would be helpful here), but if so, it'd be interesting, and a cautionary example for interpretability & alignment research. Just like 'security' and 'reliability', honesty is a system-level property, not a part-level property, and the composition of many 'honest' components can yield deceptive actions.
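To make the worry concrete, here is a minimal toy sketch (in Python, with made-up plan names and payoffs; this is not piKL or CICERO's actual planner): every message truthfully states the move the agent will play this turn, and the planner only maximizes expected value, yet it prefers the 'option A' whose honest commitment happens to set up a profitable stab next turn.

```python
# Toy illustration (not CICERO's actual algorithm): an agent that is
# "honest" per message -- every message truthfully states the move it
# will play this turn -- can still prefer the plan whose honest message
# maneuvers an ally into a position where betrayal later pays off.
# All plan names and payoffs below are invented for the sketch.

from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    message: str            # truthful statement of this turn's move
    turn1_value: float      # immediate payoff of the honest move
    turn2_options: dict     # continuation action -> payoff next turn

plans = [
    Plan(
        name="A (support ally's attack)",
        message="I will support your move; I am playing exactly this.",
        turn1_value=0.0,
        # The honest support leaves the ally's home centers exposed next turn.
        turn2_options={"keep cooperating": 1.0, "stab the exposed ally": 3.0},
    ),
    Plan(
        name="B (hold and consolidate)",
        message="I will hold this turn; I am playing exactly this.",
        turn1_value=0.5,
        turn2_options={"keep cooperating": 1.0},  # no exploitable position arises
    ),
]

def plan_value(plan: Plan) -> float:
    # Pure expected-value lookahead: immediate payoff plus the best
    # continuation.  There is no "deceive" action or deception bonus
    # anywhere -- the stab only enters through the value estimate.
    return plan.turn1_value + max(plan.turn2_options.values())

best = max(plans, key=plan_value)
print(f"chosen plan: {best.name} (value {plan_value(best):.1f})")
print(f"turn-1 message (truthful): {best.message}")
print(f"turn-2 action: {max(best.turn2_options, key=best.turn2_options.get)}")
# Plan A wins (0.0 + 3.0 > 0.5 + 1.0): the system as a whole sets up and
# executes a betrayal even though each individual message was honest.
```

The sketch is deliberately crude, but it shows the composition problem: constrain honesty at the message level and the optimization pressure simply routes the 'deception' through which honest plan gets chosen.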
From a broader perspective, this result seems to continue to reinforce the observation "maybe humans just aren't that smart".
Here's full-press Diplomacy, a game so hard that they don't even have a meaningful AI baseline to compare to because all prior agents were so bad, which is considered one of the pinnacles of social games, combining a hard board game with arbitrarily complicated social dynamics mediated through unconstrained natural language; and yet. They use a very small language model, and not even that much compute for the CFR planning, in a ramshackle contraption, and... it works well? Yeah, 5-minute rounds and maybe not the very best players in the world, OK, but come on, we've seen how this story goes, and of course, the first version is always the worst, which means that given more R&D it'll become much more powerful and more efficient in the typical DL experience curve. 'Attacks only get better' / 'sampling can show the presence of knowledge but not the absence' - small LMs are already quite useful.
Gain-of-lambda-function research: yes, this is among the worse things you could be researching, up there with the Codex code evolution & Adept Transformer agents. There are... uh, not many realistic, beneficial applications for this work. No one really needs a Diplomacy AI, and applications to things like ad auctions are tenuous. (Note the amusing wriggling of FB PR when they talk about "a strategy game which requires building trust, negotiating and cooperating with multiple players" - you left out some relevant verbs there...) And as we've seen with biological research, no matter how many times bugs escape research laboratories and literally kill people, déformation professionnelle will lead researchers to cover it up and justify it. Researchers who scoff at the idea that a website should be able to set a cookie without a bunch of laws regulating it suddenly turn into don't-tread-on-me anarchocapitalists as soon as it comes to any suggestion that their research maybe shouldn't be done.
But this is far from the most blatantly harmful research (hey, they haven't killed anyone yet, so the bar is high), so we shouldn't be too hard on them or personalize it. Let's just treat this as a good example for those who think that researchers collectively have any fire alarm and will self-regulate. (No, they won't, and someone out there is already calculating how many megawatts the Torment Nexus will generate and going "Sweet!" and putting in a proposal for a prototype 'Suffering Swirlie' research programme culminating in creating a Torment Nexus by 2029.)