Cooperative Oracles: Introduction

0Scott Garrabrant

0Anthony DiGiovanni

2Scott Garrabrant

New Comment

I have stopped working on this sequence, because a coauthor is trying to write it up as a more formal paper instead.

(At the risk of necroposting:) Was this paper ever written? Can't seem to find it, but I'm interested in any developments on this line of research.

Im not intending to go back to thinking about this anymore, but diffractor is the person who was thinking /writing about it. https://www.lesswrong.com/posts/SgkaXQn3xqJkGQ2D8/cooperative-oracles

This is the first in a series of posts introducing a new tool called a Cooperative Oracle. All of these posts are joint work Sam Eisenstat, Tsvi Benson-Tilsen, and Nisan Stiennon.

Here is my plan for posts in this sequence. I will update this as I go.

In this post, I will give a sketchy advertisement of what is to come.

Cooperative oracles are a refinement on reflective oracles. We consider the set of Turing machines with access to a reflective oracle, and we think of each Turing machine as having a utility function (perhaps written in a comment in the source code). This utility function is itself computable using the reflective oracle.

Since the definition of a reflective oracle use a fixed point procedure that may have had multiple fixed points, there are actually many different reflective oracles. However, we will take something like a Pareto optimum in the class of reflective oracles.

For example if two players in a prisoner's dilemma use a reflective oracles to cooperate with exactly the probability the other player cooperates, there is a continuum of fixed points in which the player cooperate with the same probability. We want to take the fixed point in which the both cooperate.

Unfortunately, we do not want to just take a standard Pareto Optimum, since we are working with the class of all Turing Machines, and for every machine with a utility function, there is another with the negation of that utility function, so all points will be Optima.

For this, we will use a stratified version of Pareto optima. If a third party wants the two above players to defect against each other, we will still choose mutual cooperation. Since the players only reference each other, and not the third party, the third party is not considered in the notion of optimality for the output of the Oracle on the prisoners' dilemma game.

We will be able to build a cooperative oracle, which is a reflective oracle that (roughly) ensures that all of the outputs of the computations are Pareto optimal for all of the utility function of the machines involved it that computation. This Pareto optimality is only used to choose between multiple fixed points. We will still be restricted to have the computation follow its source code, and have the reflective oracle give honest data about the output of the computations.

Using cooperative oracles, we will be able to set up a system similar to modal combat, but with reflective oracles.

We will also be able to study bargaining. We will formalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation, and implement this style of bargaining with reflective oracles.

This is interesting because it reduces the problem of finding a fair way to split gains from trade from a group endeavor (like in standard cooperative game theory) to an individual endeavor, where each player specifies what they consider to be a fair outcome for them, and the reflective oracle takes care of the rest. The problem of figuring out how to define what is fair to you is still wide open, however.

The notion of cooperative oracles presented will be use a clunky syntactic notion of when one computation depends on another. If an algorithm calls the reflective oracle on the output of another, we say the first algorithm depends on the second, and we take the transitive closure of this property. There are many alternatives to this, some involving looking at the actual function the algorithms compute and checking how these functions depend on other values. Specifying better notions of computations depending on each other is a hard problem, and better solutions to this may yield better definitions of cooperative oracles.