Abstract
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.
Meta Fundamental AI Research Diplomacy Team (FAIR)†, Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, et al. 2022. “Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning.” Science, November, eade9097. https://doi.org/10.1126/science.ade9097.
Related to this, from the blog post What does Meta AI’s Diplomacy-winning Cicero Mean for AI?:
I am originally a CS researcher trained several decades ago, actually in the middle of an AI winter. That might explain our different viewpoints here. I also have a background in industrial research and applied AI, which has given me a lot of insight into the vast array of problems that academic research refuses to solve for you. More long-form thoughts about this are in my Demanding and Designing Aligned Cognitive Architectures.
From where I am standing, the scaling hype is wasting a lot of the minds of the younger generation, wasting their minds on the problem of improving ML benchmark scores under the unrealistic assumption that ML will have infinite clean training data. This situation does not fill me with as much existential dread as it does some other people on this forum, but anyway.