Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Reflective oracles and the procrastination paradox

The procrastination paradox is isomorphic to well-founded recursion. In the reasoning, the fourth step, "whether or not I press the button, the next agent or an agent after that will press the button" is an invalid proof-step; it's shown that there is a chain of inductive steps ending at the conclusion, but not that the chain has a base case.

This can only happen when the relation between an agent and its successor is not well-founded. If there is any well-founded relation between agents and their successors - either because they're in a finite universe, or because the first agent picked a well-founded relation and built that in - then the button will eventually get pushed.
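As a minimal sketch of the well-founded case, assume each agent carries a natural-number budget and may defer only to a successor with a strictly smaller budget. The name `first_presser` and the budget mechanism are hypothetical illustrations, not part of the original setup.

```python
def first_presser(budget: int) -> int:
    """Return the index of the agent that ends up pressing the button.

    Hypothetical model: agent 0 starts with `budget`; each deferral hands a
    strictly smaller budget to the next agent. An agent whose budget is 0 has
    no successor it is allowed to defer to, so it must press.
    """
    if budget < 0:
        raise ValueError("budget must be a natural number")
    agent = 0
    while budget > 0:
        # Deferring strictly decreases a natural number, so the loop must
        # terminate: this is the base case the paradox lacks.
        budget -= 1
        agent += 1
    return agent


print(first_presser(5))
```

Even if every agent procrastinates as long as it can, the chain of deferrals has length at most `budget`, so some agent presses.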

I don't (confidently) understand why the procrastination paradox indicates a problem to be solved. Could you clarify that for me, or point me to a clarification?

First off, it doesn't seem like this kind of infinite buck-passing could happen in real life; is there a real-life (finite?) setting where this type of procrastination leads to bad actions? Second, it seems to me that similar paradoxes often come up in other situations where agents have infinite time horizons and can wait as long as they want -- does the problem come from the infinity, or from something else?

The best explanation that I can give is "It's immediately obvious to a human, even in an infinite situation, that the only way to get the button pressed is to press it immediately. Therefore, we haven't captured human reasoning (about infinite situations), and we should capture that human reasoning in order to be confident about AI reasoning." This is AFAICT the explanation Nate gives in the Vingean Reflection paper. Is that how you would express the problem?

It is definitely a problem with infinite buck-passing. It is probably possible to prove optimality if we have a continuous utility function (e.g. we're using discounting). I think we might actually want a continuous utility function, but maybe not. Is there any time t such that you would consider it almost as good for a wonderful human civilization to exist for t steps and then die, compared to existing indefinitely?

The way I would express the procrastination paradox is something like:

- There's the tiling agents problem: we want AIs to construct successors that they trust to make correct decisions.
- It would be desirable to have a system where an infinite sequence of AIs each trust the next one. If it worked, this would solve the tiling agents problem.
- But, if we have something like this, then it will be unsound: it will prove that the button will eventually get pressed, even though it will never actually get pressed.

We can construct things that do press the button, but they don't have the property of trusting successors that is desirable in some ways. Due to their handling of recursion, Paul's logic and reflective oracles are both candidates for solving the tiling agents problem; however, they both fail the procrastination paradox (when it's set up this way).

Cool, thanks; sounds like I have about the same picture. One missing ingredient for me that was resolved by your answer, and by going back and looking at the papers again, was the distinction between consistency and soundness (on the natural numbers), which is not a distinction I think about often.

In case it's useful, I'll note that the procrastination paradox is hard for me to take seriously on an intuitive level, because some part of me thinks that requiring correct answers in infinite decision problems is unreasonable; *so* many reasoning systems fail on these problems, and infinite situations seem so unlikely, that they are hard for me to get worked up about. This isn't so much a comment on how important the problem actually is, but more about how much argumentation may be required to convince people like me that they're actually worth working on.

The procrastination paradox relates to the following problem:

The paradox is that the following reasoning process leads to the button never getting pressed:

But the same reasoning can be used for every time step! No agent will press the button, and all will trust that some future agent will. The flaw here is that we constructed a sequence of logical systems (one for each agent), each of which considers the next one sound.

We can formalize the procrastination paradox using reflective oracles instead of logic. Suppose each agent is a reflective CDT agent. Define a machine $A_i^O()$ to represent whether agent $i$ presses the button (for some natural number $i$), and define a machine $B_i^O()$ to represent whether agent $i$ or a later agent presses the button:

$$A_i^O() = 1 - O(B_{i+1}, 0)$$

$$B_i^O() = \mathrm{protect}^O(A_i \vee B_{i+1})$$

The first line states that the agent must press the button if $P(B_{i+1}^O() = 1) < 1$, and may do anything otherwise. In the second line, $(A_i \vee B_{i+1})^O() = A_i^O() \vee B_{i+1}^O()$, and $\mathrm{protect}^O(Q)$ is a way of sampling a bit with the same distribution as $Q^O()$, as defined in this post:

Observe that an assignment of probabilities to each of $P(A_i^O() = 1)$, $P(B_i^O() = 1)$ (from which we can determine an $O$ consistent with these probabilities) is consistent if and only if each $P(B_i^O() = 1) = 1$. This is because if any $P(B_i^O() = 1) < 1$, then $P(B_{i+1}^O() = 1)$ would also be less than 1, which implies $P(A_i^O() = 1) = 1$, which implies $P(B_i^O() = 1) = 1$, a contradiction.

Although a reflective oracle must assign $P(B_i^O() = 1) = 1$ for all $i$, we have no restrictions on $P(A_i^O() = 1)$ given this, so it is consistent for the oracle to say that no agent presses the button, but the button gets pressed eventually. Therefore, a sequence of reflective CDT agents reasoning in this fashion may choose to never press the button!
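The consistency argument can be checked numerically on a finite window of agents. The sketch below is only an illustration under stated assumptions: `a[i]` and `b[i]` stand for the probabilities that machines A_i and B_i output 1, the two calls inside the disjunction are assumed to be independent, and the helper name `is_consistent` is hypothetical. Because the window is finite, the last entry of `b` is left unconstrained.

```python
def is_consistent(a, b, tol=1e-12):
    """Check the two constraints from the text on agents 0 .. len(a)-1.

    a[i]: assumed probability that agent i presses the button.
    b[i]: assumed probability that agent i or a later agent presses.
    b must have one more entry than a, since agent i refers to b[i + 1].
    """
    if len(b) != len(a) + 1:
        raise ValueError("b needs exactly one more entry than a")
    for i in range(len(a)):
        # Constraint from A_i: pressing is forced whenever the oracle gives
        # probability below 1 to a later press; otherwise a[i] is free.
        if b[i + 1] < 1 - tol and abs(a[i] - 1) > tol:
            return False
        # Constraint from B_i: same distribution as (A_i or B_{i+1}),
        # computed here under the independence assumption.
        expected = 1 - (1 - a[i]) * (1 - b[i + 1])
        if abs(b[i] - expected) > tol:
            return False
    return True


# Nobody presses, yet every B_i is certain: passes both constraints.
print(is_consistent([0.0] * 5, [1.0] * 6))
# Doubt about a later press while agent 2 does not press: fails.
print(is_consistent([0.0] * 5, [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]))
```

The first call models the problematic assignment from the text: all-zero pressing probabilities are accepted as long as every `b[i]` equals 1.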

As reflective oracles were derived from Paul's probabilistic logic, we would expect this proof to resemble the proof that Paul's logic fails the procrastination paradox.

The proofs are not exactly analogous (specifically, the proof for Paul's logic does not use recursion to define the statement that the button is pressed in the future), but they are similar. Perhaps if we can solve the problem in the simpler case with reflective oracles, we can adapt the solution to talk about Paul's logic. For example, Benja suggested that we could restrict utility functions to be a continuous function of the actions (in that sufficiently late actions can only have a small effect on the resulting utility), and then prove an optimality result for reflective oracles that depends on continuity.
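To make the continuity condition concrete, here is one possible worked example. It assumes a discount factor $\gamma \in (0,1)$ and writes $a = (a_0, a_1, \dots)$ for the sequence of press/no-press actions; these symbols and the specific discounted utility are illustrative assumptions, not definitions from the post or from Benja's suggestion.

```latex
% Discounted utility: value depends on the first time the button is pressed.
U_\gamma(a) = \gamma^{T(a)}, \qquad
T(a) = \min\{\, t : a_t = 1 \,\}, \qquad
\gamma^{\infty} := 0 \ \text{(button never pressed)}

% Continuity: sequences that agree before time t have nearly equal utility.
a_s = a'_s \ \text{for all } s < t
\;\Longrightarrow\;
\lvert U_\gamma(a) - U_\gamma(a') \rvert \le \gamma^{t} \xrightarrow{\;t \to \infty\;} 0

% The undiscounted button utility is not continuous in this sense.
U(a) = \mathbf{1}[\exists t : a_t = 1]: \quad
\text{if } a^{(t)} \text{ presses only at time } t \text{ and } a^{\infty} \text{ never presses, then }
\lvert U(a^{(t)}) - U(a^{\infty}) \rvert = 1 \ \text{for every } t
```

Under the discounted utility, actions taken after time $t$ can change the outcome by at most $\gamma^t$, which is the sense in which sufficiently late actions have only a small effect; the undiscounted button utility lacks this property, since an arbitrarily late press still changes the utility by 1.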