Discussion article for the meetup : Talk by Eliezer Yudkowsky: Recursion in rational agents: Foundations for self-modifying AI
On October 17th from 4:00-5:30pm, Scott Aaronson will host a talk by MIRI research fellow Eliezer Yudkowsky. Yudkowsky’s talk will take place in MIT’s Ray and Maria Stata Center (see image on right), in room 32-123 (aka Kirsch Auditorium, with 318 seats). There will be light refreshments 15 minutes before the talk. Yudkowsky’s title and abstract are:
Recursion in rational agents: Foundations for self-modifying AI
Reflective reasoning is a familiar but formally elusive aspect of human cognition. This issue comes to the forefront when we consider building AIs which model other sophisticated reasoners, or who might design other AIs which are as sophisticated as themselves. Mathematical logic, the best-developed contender for a formal language capable of reflecting on itself, is beset by impossibility results. Similarly, standard decision theories begin to produce counterintuitive or incoherent results when applied to agents with detailed self-knowledge. In this talk I will present some early results from workshops held by the Machine Intelligence Research Institute to confront these challenges.
The first is a formalization and significant refinement of Hofstadter’s “superrationality,” the (informal) idea that ideal rational agents can achieve mutual cooperation on games like the prisoner’s dilemma by exploiting the logical connection between their actions and their opponent’s actions. We show how to implement an agent which reliably outperforms classical game theory given mutual knowledge of source code, and which achieves mutual cooperation in the one-shot prisoner’s dilemma using a general procedure. Using a fast algorithm for finding fixed points, we are able to write implementations of agents that perform the logical interactions necessary for our formalization, and we describe empirical results.
Second, it has been claimed that Godel’s second incompleteness theorem presents a serious obstruction to any AI understanding why its own reasoning works or even trusting that it does work. We exhibit a simple model for this situation and show that straightforward solutions to this problem are indeed unsatisfactory, resulting in agents who are willing to trust weaker peers but not their own reasoning. We show how to circumvent this difficulty without compromising logical expressiveness.
Time permitting, we also describe a more general agenda for averting self-referential difficulties by replacing logical deduction with a suitable form of probabilistic inference. The goal of this program is to convert logical unprovability or undefinability into very small probabilistic errors which can be safely ignored (and may even be philosophically justified).