By requests from Blueberry and jimrandomh, here's an expanded repost of my comment which was itself a repost of my email sent to decision-theory-workshop.

*(Wait, I gotta take a breath now.)*

A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov's idea of "ambient control", which builds on Wei Dai's idea of UDT, which builds on Eliezer's idea of TDT. I really, really hope to not offend anyone.

*(Whew!)*

Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there's absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would "force" world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to "control" the output of a bigger program it's embedded in.

For example, here's Newcomb's Problem:

`def world():`

` box1 = 1000`

` box2 = (agent() == 2) ? 0 : 1000000`

` return box2 + ((agent() == 2) ? box1 : 0)`

A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.

The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent *makes its premise true* by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), therefore the agent would output the corresponding action, which would prove the Peano axioms inconsistent or something.

To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)

This is progress. We started out with deterministic programs and ended up with a workable concept of "could".

Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.

A subtlety: If the proof search finds "agent()==1 implies world()==5" and "agent()==2 implies world()==1000", then the argument in the post shows that it will indeed turn out that agent()==2 and world()==1000. But the argument does

notshow in any similar sense that 'if the agent had returned 1, then world() would equal 5' -- in fact this has no clear meaning (other than the one used inside the algorithm, i.e. that the theorem above is found in the proof search).I do not think that this is actually a problem here, i.e. I don't think it'... (read more)

Thanks for taking another look. BTW, I suggest if people change the content of their comments substantially, it's better to make a reply to yourself, otherwise those who read LW by scanning new comments will miss the new content.

Note that by proving "agent()==1 implies world()==1000000", and then performing action 1, you essentially conclude that "world()==1000000" was

validin the first place. "You had the mojo all along", as Aaronson said (on Löb's theorem). Decision making is all about lack of logical transparency, inability to see the whole theory from its axioms.I was confused on the first reading: I thought you were presenting a contrasting reduction of couldness. Then, on reading the earlier comment, I saw the context in which you were presenting this: as a way to clarify the existing free will solution EY gave. Now, it makes more sense.

(It might be helpful to include more of the motivation for this explanation at the start.)

This has been extremely useful for me, even though I understood the basic idea already, because with the discussion it made me understand something very basic that somehow I'd always missed so far: I'd always thought of this kind of program in the context of applying classical decision theory with the possible actions being particular agent()s. But using classical decision theory already presupposes some notion of "could." Now I understand that the principle here yields (or is a real step towards) a notion of could-ness even in the real world: If... (read more)

Note that world() essentially doesn't talk about

what can happen, instead it talks abouthow to compute utility, and computation of utility is a single fixed program without parameters (not even depending on the agent), that the agent "controls" from the inside.To clarify, in the Newcomb example, the preference (world) program could be:

while the agent itself knows its code in the form agent3(). If it can prove both agent1() and agent... (read more)

OK, so just spelling out the argument for my own amusement:

The agent can prove the implications "agent() == 1 implies world() == 1000000" and "agent() == 2 implies world() == 1000", both of which are true.

Let x and y in {1, 2} be such that agent() == x and agent() != y.

Let Ux be the greatest value of U for which the agent proves "agent() == x implies world() == U". Let Uy be the greatest value of U for which the agent proves "agent() == y implies world() == U".

Then Ux is greater than or equal to Uy.

If the agent's for... (read more)

Bravo!

The setup can be translated into a purely logical framework. There would be constants A (Agent) and W (World), satisfying the axioms:

(Ax1) A=1 OR A=2

(Ax2) (A=1 AND W=1000) OR (A=2 AND W=1000000)

(Ax3) forall a1,a2,w1,w2 ((A=a1 => W=w1) AND (A=a2 => W=w2) AND (w1>w2)) => NOT (A=a2)

Then the Agent's algorithm would be equivalent to a step-by-step deduction:

(1) Ax2 |- A=1 => W=1000

(2) Ax2 |- A=2 => W=1000000

(3) Ax3 + (1) + (2) |- NOT (A=1)

(4) Ax1 + (3) |- A=2

In this form it is clear that there are no logical contradictions at any tim... (read more)

I enjoyed reading and thinking about this, but now I wonder if it's less interesting than it first appears. Tell me if I'm missing the point:

Since world is a function depending on the integers agent outputs, and not on agent itself, and since agent knows world's source code, agent could do the following: instead of looking for proofs shorter than some fixed length that world outputs u if agent outputs a, it could just simulate world and tabulate the values world(1), world(2), world(3),... up to some fixed value world(n). Then output the a < n + 1 for ... (read more)

Am I right that if world's output depends on the length of time agent spends thinking, then this solution breaks?

Edit: I guess "time spent thinking" is not a function of "agent", and so world(agent) cannot depend on time spent thinking. Wrong?

Edit 2: Wrong per cousin's comment. World does not even depend on agent, only on agent's output.

Neat. It seems like one could think of the agent as surveying mathematical objects (sets of theorems of the form "agent()==A implies world()==U") and then forcing itself to be in a particular mathematical object (that which contains a particular true statement "agent()==A implies world()==U"). If we're thinking in terms of an ensemble the agent would already be in a particular mathematical object, so perhaps it's procedure could be thought of as forcing itself into a particular mathematical sub-structure.

When I think about this stuff I get the feeling I'm confused on many levels.

You were right: reading a second time helped. I figured out what is bothering me about your post. The above seems misspoken, and may lead to an overly constricted conc... (read more)

I'm uncomfortable with the unbounded CPU time assumption. For example, what if

world()computes the payoff in a Prisoner's Dilemma between two identicalagent()s? You get non-termination due to a recursion without an escape clause.Would I be right in thinking that this implies that you can't apply could to world()? i.e. "Our universe could have a green sky" wouldn't be meaningful without some sort of metaverse program that references what we would think is the normal world program?

Or have the utility computed also depend on some set of facts about the universe?

I think both of those would require that the agent have some way of determining the facts about the universe (I suppose they could figure it out from the source code, but that seems sort of illegitimate to me).

Following is a generalization to take into account logical uncertainty (inability to prove difficult statements, which can lead in particular to inability to prove

anystatements of the necessary kind).Instead of proving statements of the form "agent()==A implies world()==U", let's prove statements of the form "(X and agent()==A) implies world()==U", where X are unsubstantiated assumptions. There are obviously provable statements of the new form. Now, instead of looking for a statement with highest U, let's for each A look at the sets ... (read more)

Does this differ from, or contradict something in, how EY reduces "could" in his free will solution? I thought the could-issue was resolved.