Some people on LW have expressed interest in what's happening on the decision-theory-workshop mailing list. Here's an example of the kind of work we're trying to do there.

In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I'm posting it here with his permission:

There's a version of Newcomb's Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.

Suppose:

- The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent's decision theory and computational power.
- The agent runs UDT, and has the ability to fully simulate the predictor.
Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent's computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.

Meanwhile, the predictor, knowing that the the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent's detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).

The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.

About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:

a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.

b) The predictor generates all proofs of length up to N which is much less than M. If it finds a provable prediction about the agent's action, it fills the boxes accordingly. Also the predictor has an "epistemic advantage" over the agent: its proof system has an axiom saying the agent's proof system is consistent.

Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent's proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.

Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)

Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?

Yes, especially if matt adds a 'decision theory' subreddit. That way those from the clique with a narrow interest in decision theory can just follow that page instead. I know I would be more likely to engage with the topics on a reddit style medium than via email. Email conversations are just a lot harder to follow. Especially given that, let's be honest, the 'small clique' is barely active and can go for weeks or a month between email.

Having lesswrong exposed to people who are actively thinking through and solving a difficult problem would be an overwhelmingly good influence on the lesswrong community and will expose potential new thinkers, hopefully inspiring some of them to get involved in the problem solving themselves. There will, of course, be a lowering of the average standard of the discussion but I would expect a net increase in high quality contributions as well. Heavy upvoting of the clear thinking and commensurate downvoting would make new insights accessible.

(Mind you there is an obvious advantage to having decision theory conversations that are

noton a sight run by SIAI too.)Just to give due credit: Wei Dai and others had already discussed Prisoner's Dilemma scenarios that exhibit a similar problem, which I then distilled into the ASP problem.

Speaking only for myself: Eliezer's sequences first lured me to Less Wrong, but your posts on decision theory were what convinced me to stick around and keep checking the front page.

I confess I don't understand all of the math. It's been decades since I studied mathematics with any rigour; these days I can follow some fairly advanced theory, but have difficulty reproducing it, and cannot in general extend it. I have had nothing substantial to add, and so I haven't previously commented on one of your posts. Somehow LW didn't seem like the

bestplace to go all fangirl…It's not only the contributors who are following your work on decision theory. I hope you'll continue to share it with the rest of us.

This whole thing seems like an artifact of failing to draw a boundary between decision theories and strategies. In my own work, I have for deterministic problems:

Phrased this way, the predictor is part of the World function, which means it is only allowed to simulate a Strategy, not a DecisionTheory, and so the problem as stated is ill-posed. This structure is

requiredfor decision theory in ge... (read more)I'm confused on this point, would you mind checking if my thinking on it is correct?

My initial objection was that this seems to assume that the predictor doesn't take anything into account, and that the agent was trying to predict what the predictor would do without trying to figure out what the predictor would predict the agent would choose.

Then I noticed that the predictor isn't actu... (read more)

On an extremely similar note, as I've argued here, I'm pretty sure that AIXI two-boxes on Newcomb's problem.

Basically my argument is that AIXI can fairly trivially work out whether the box is full or empty, and hence it two-boxes due to causal assumptions implicit in the AIXI algorithm. Moreover, that line of reasoning is very easy for Omega to come across, and so Omega predicts two-boxing and the box ends up being empty, and then AIXI takes two boxes and ends up with $1000.

It can be argued that the "epistemic advantage" of the predictor over the agent is an unfair one. After all, if the agent had an equivalent axiom for predictor's consistency, both would be inconsistent.

In the absence of this advantage, the predictor won't be able to find a proof of agent's action (if it is consistent).

Your comment reminded me of a post I've long wanted to write. The idea is that examining assumptions is unproductive. The only way to make intellectual progress, either individually or as a group, is to stop arguing about assumptions and instead explore their implications wherever they might lead. The #1 reason why I took so long to understand Newcomb's Problem or Counterfactual Mugging was my insistence on denying the assumptions behind these problems. Instead I should have said to myself, okay, is this direction of inquiry interesting when taken on its own terms?

Many assumptions seemed divorced from real life at first, e.g. people dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. People seem unable to judge the usefulness of assumptions before exploring their implications in detail, but they absolutely

lovearguing about assumptions instead of getting anythingdone.There, thanks for encouraging me to write the first draft :-)

It seems to me that this problem assumes that the predictor both does and does not predict correctly.

When determining the predictor's actions, we assume that it forsees the agent's two-boxing.

When determining the agent's actions, we assume that the simulated predictor behaves the same regardless of the agent's decision.

The question thus seems to contradict itself.

I don't even know a fraction of the math you people know, but this problem seems obvious to me: One-box fully expecting the box to be empty (or something impossible to happen).

More generally, if expecting A implies B, expecting B implies A or C and expecting C implies C and U(B)>U(C)>U(A) expect A unless the cost of "being wrong" is larger than the difference. (A being one-boxing the empty box, C being two-boxing with one box empty and B being one-boxing a full box here, in this case B -> C would be via D expecting to two-box with both b... (read more)

What happens if the UDT agent generates a proof that using any proof longer than N results in only $1000? Is that level of self-reference allowed?

If your proof works, I would expect that Omega also knows the agent is consistent and can follow the same logic, and so the UDT agent two-boxes on Newcomb's problem. Unless you use a version of UDT that (effectively) optimizes over decisions rather than actions (like TDT), which would solve both problems.

EDIT: On solving both problems: my understanding of UDT comes from AlephNeil's post. If you look at his "generalization 2," it is exactly what I mean by a problem where you need to optimize over decisions rather than actions - and he claims tha... (read more)

This problem is underspecified, in that it does not say what happens if the predictor fails to find a proof either way. Which is exactly what will happen if the agent simulates the predictor.

UDT may two-box in the above scenario if it simulates the predictor

once, but what if the UDT agent simulates the predictor twice, simulates itself using the above reasoning and two-boxing in simulation 1, and simulates itself one-boxing for whatever reason in simulation 2? The UDT agent that one-boxes "for whatever reason" does better, and thus the real UDT agent will realize this upon running these 2 simulations and one-box, which the predictor will reason that it would.