I think this is the right way to research decision theory!
This is basically a rehash of my comment on your previous post, but I think you are confused in a very particular way that I am not. You are confusing "optimizing with the assumption Agent=X" with "optimizing without that assumption". In other words, optimizing for a decision problem where Omega always samples Agent=X, versus optimizing for your actual described decision problem where Omega samples X randomly.
For example, you describe the first case as one "where no one tries to predict this agent in particular". But actually, if we assume Agent=X, this is by definition "Omega predicting Agent perfectly" (even if, from the third-person perspective of the Programmer, this happened randomly, that is, it seemed unlikely a priori, and was very surprising). You also describe the second case as "direct logical entanglement of agent's behaviour with something that influences agent's utility". But actually, if you are assuming that Omega samples X randomly, then X isn't entangled (correlated) with Agent in any way, by definition.
Here's another way to highlight your confusion. Say Programmer is thinking about what Agent to implement. Then "cooperating maximizes the Programmer's utility, assuming Omega samples X = Agent" and "cooperating maximizes the Programmer's expected utility, given that Omega samples X randomly" are two different (even if similar-sounding) statements. The former is true, while the latter is false.
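To make the contrast concrete, here is a minimal sketch in Python. The payoff numbers (0, 1, 2, 3), the size N of Omega's set, and the behaviour assumed for an unrelated X are illustrative assumptions of mine, not from the post; any numbers respecting the Prisoner's Dilemma ordering give the same verdict.

```python
# Illustrative payoffs to the Programmer (assumed): only the Prisoner's
# Dilemma ordering DC > CC > DD > CD, seen from the program-player's side, matters.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

N = 1000  # stand-in for the (huge) number of programs in Omega's allowed set

def expected_utility(agent_action, condition_on_match):
    """Programmer's expected utility if Agent plays `agent_action`.

    If `condition_on_match` is True we assume Omega sampled X = Agent,
    so X runs the same code on "YES" and copies Agent's action.
    Otherwise X is random: with probability 1/N it is Agent, and
    otherwise we model it as cooperating half the time (any fixed
    behaviour independent of Agent gives the same conclusion, since
    defection dominates pointwise).
    """
    if condition_on_match:
        return PAYOFF[(agent_action, agent_action)]
    p_match = 1 / N
    eu_match = PAYOFF[(agent_action, agent_action)]
    eu_other = 0.5 * PAYOFF[(agent_action, "C")] + 0.5 * PAYOFF[(agent_action, "D")]
    return p_match * eu_match + (1 - p_match) * eu_other

# Former statement: conditional on X = Agent, cooperating is better.
assert expected_utility("C", True) > expected_utility("D", True)
# Latter statement: with X sampled randomly, cooperating is worse.
assert expected_utility("C", False) < expected_utility("D", False)
```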
So the only question is: Does Agent, the general reasoner created by Programmer, want to maximize its utility according to the former (updated) or the latter (non-updated) probability distribution?
Or in other words: Does Agent, even upon updating on its observation, care about what happens in other counterfactual branches (different from the one that turned out to be the case)?
It can seem jarring at first that there is not a single clean solution that allows all versions of a reasoner (updated and non-updated) to optimize together and get absolutely everything they want. But it's really not surprising when you look at it: optimizing across many branches at once (because you still don't know which one will materialize) will lead to different behavior than optimizing for one branch alone.
So then, what should a general-reasoning Agent actually do in this situation? What would I do if I suddenly learned I had been living in this situation? Which probability distribution should we optimize for?
This choice is just totally up for grabs! We'll have to query our normative intuitions about which one better fits the spirit of what we want to call "maximization". Again, there is no deep secret to be uncovered here: maximizing under a random distribution will recommend different actions than maximizing under a conditioned distribution, which will in turn recommend different actions than maximizing under a crazy distribution where unicorns exist.
I myself find it intuitive to be updateful here. That just looks more like maximization from my current epistemic state, and heck, it was the Programmer who, from their updateless perspective, could have just made me into a dumb Defect-rock, and it would have worked out better for them! But I think this is just a subjective take based on vibes; other decision theorists might have the opposite take, and it might be irreconcilable.
Depends on the complexity of the logical coin. Certainly not for 1+1=2. But probably yes for appropriately complex statements. This is due to strong immediate identification with "my immediately past self who didn't yet know the truth value", and an understanding that "he (my past self) cannot literally rewrite my brain at will to ensure this behavior holds, but it's understood that I will play along to some extent to satisfy his vision (otherwise he would have to invest more in binding my behavior, which sounds like a waste)".
(Of course, I would need some kind of proof that the statement has been chosen non-adversarially, and I'm not yet sure that is possible.)
I think the universal precommitment / UDT solution is right, and don't quite understand what's weird about it.
This is a later, better version of the problem in this post. The problem emerged from my work on the "Deconfusing Commitment Races" project under the Supervised Program for Alignment Research (SPAR), led by James Faville. I'm grateful to SPAR for providing the intellectual environment, and to James Faville personally for discussions and help with the draft of this post. Any mistakes are my own.
I used Claude and Gemini to help me with phrasing and grammar in some parts of this post.
There once lived an alien named Omega who enjoyed giving the Programmer decision theory problems. The answer to each one had to be a program-player that would play the game presented in the problem. Based on the results of the game, the Programmer would receive some amount of utility.
Omega had incredibly large, but still not infinite, computational power, so he only accepted programs from a fixed allowed set: the set of all programs written in a certain fixed programming language that contain no more than some fixed number of commands. If a program doesn't halt after a fixed number of steps, Omega stops it and uses an empty output.
After approximately three million problems, the Programmer got tired and wrote code for a universal consequentialist Agent that optimizes the Programmer's utility. Now, when Omega gives the Programmer a problem, the Programmer just inserts the statement of the problem into a string constant in this Agent and sends it.
This is Omega's newest problem:
Omega randomly selects a program X from the allowed set.
The program-player receives as input "YES" or "NO": an honest answer to the question "does the source code of X equal the source code of the program-player?"
X independently receives "YES" as input regardless of anything. Then the program-player and X play a version of the Prisoner's Dilemma (a code sketch of the setup follows the payoff list below):
- If the program-player outputs "COOPERATE" and X does not, the Programmer receives nothing.
- If neither the program-player nor X outputs "COOPERATE", the Programmer receives one unit of utility.
- If both the program-player and X output "COOPERATE", the Programmer receives a larger number of units of utility.
- Finally, if the program-player does not output "COOPERATE" but X does, the Programmer receives the largest number of units of utility.
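Here is a minimal Python sketch of the setup. It simplifies in several ways: programs are modelled as Python callables rather than strings in Omega's fixed language, the length and step limits are omitted, object identity stands in for source-code equality, and the payoff numbers are placeholders of mine that only respect the Prisoner's Dilemma ordering.

```python
import random

# Illustrative payoffs to the Programmer (assumed); only the ordering
# DC > CC > DD > CD, from the program-player's side, matters.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def play_round(program_player, allowed_set):
    """One round of Omega's newest problem (simplified sketch)."""
    x = random.choice(allowed_set)                 # Omega samples X uniformly
    honest = "YES" if x is program_player else "NO"  # `is` stands in for code equality
    a = program_player(honest)                     # honest answer to the program-player
    b = x("YES")                                   # X receives "YES" regardless of anything
    return PAYOFF[(a, b)]

# Example programs ("C" = output "COOPERATE", "D" = anything else):
def defect_rock(_inp):
    return "D"

def naive_cooperator(inp):
    # cooperates exactly when told it is facing its own copy
    return "C" if inp == "YES" else "D"

# e.g. play_round(naive_cooperator, [defect_rock, naive_cooperator])
```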
The Programmer, as usual, inserted the problem statement into the Agent and sent it as the program-player.
The Agent received the input "YES". It seems Omega was incredibly lucky! What should it do: output "COOPERATE" or not?
You can consider this section as one big footnote. It fixes some exploitable issues in the problem, but doesn't change its main idea.
We can worry that the Agent would have extreme anthropic uncertainty: not only "I might be X" (which was intended), but also "I might be the part of X that will choose the opposite action" and "If I'm X, I don't actually know what the problem statement is, because my knowledge of it comes from my own source code, and X's source code is random, so now I have to check my prior over possible problem statements". These issues were not intended, so let's make the following corrections:
I discovered that different decision procedures handle the described situation differently.
Input "YES" is very strong evidence that Agent is actually and not program-player (because always receives input "YES" and the program-player receives input "YES" with very small probability). Output "COOPERATE" gives utility if Agent is and utility if Agent is program-player, and Agent should think it's probably . So Agent should output "COOPERATE".
Agent should consider itself a function that definitely determines the behaviour of X and maybe also the behaviour of the program-player (by the same logic as in the anthropic approach). Agent wants X to output "COOPERATE". It also wants the program-player to output something else, but that is less important (the utility difference is smaller), and it probably doesn't control the program-player's behaviour anyway. So Agent should output "COOPERATE".
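A sketch of this reasoning under my illustrative payoffs: whether or not the Agent-as-function also controls the program-player (assumed to defect otherwise), choosing "COOPERATE" comes out ahead.

```python
# Same illustrative payoffs as before.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def utility(agent_choice, controls_player):
    x = agent_choice                                    # the Agent's choice fixes X's output
    player = agent_choice if controls_player else "D"   # otherwise assume the player defects
    return PAYOFF[(player, x)]

for controls_player in (True, False):
    # Cooperating is better whether or not the Agent also controls the player.
    assert utility("C", controls_player) > utility("D", controls_player)
```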
Agent should behave as the best program in its place would. The Programmer would want to send a program that never outputs "COOPERATE". (Call it the always-defecting program. It receives the maximum possible utility against any possible X, including a copy of itself. Another program can receive more utility against its own copy than the always-defecting program receives against its own copy, but that doesn't help it against any other X. Also, the always-defecting program receives strictly more utility in the case when X equals this other program.) So Agent should not output "COOPERATE".
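A sketch of the dominance claim behind this. Since X always hears "YES" and never sees the program-player's code, X's output does not depend on which program the Programmer sends, so it is enough to compare payoffs pointwise over X's possible outputs (payoff numbers again illustrative).

```python
# Same illustrative payoffs as before.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

# Defecting is never worse than cooperating, for either output of X...
for x_action in ("C", "D"):
    assert PAYOFF[("D", x_action)] >= PAYOFF[("C", x_action)]

# ...and strictly better when X cooperates (e.g. when X is a copy of a
# cooperating Agent, since X always hears "YES").
assert PAYOFF[("D", "C")] > PAYOFF[("C", "C")]
```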
Agent is logically entangled with all programs which implement the same decision procedure. Many such programs are in the allowed set. The Programmer's ex-ante expected utility is higher if Agent's decision procedure makes all these programs output "COOPERATE". So Agent should output "COOPERATE".
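A sketch of the ex-ante comparison. The fraction F of allowed programs sharing the Agent's decision procedure, the assumption that unrelated programs defect, the assumption that the procedure defects on input "NO", and the payoff numbers are all mine; the negligible case where X is literally the Agent is omitted.

```python
# Same illustrative payoffs as before.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
F = 0.001  # assumed fraction of allowed programs sharing the Agent's procedure

def ex_ante_utility(procedure_output_on_yes):
    # X shares the procedure: X (which always hears "YES") plays the procedure's
    # choice, while the Agent, as program-player, almost surely got "NO" and defects.
    same = PAYOFF[("D", procedure_output_on_yes)]
    # X is unrelated: modelled as defecting; the Agent got "NO" and defects.
    other = PAYOFF[("D", "D")]
    return F * same + (1 - F) * other

# The procedure that outputs "COOPERATE" on "YES" is ex-ante better for the Programmer.
assert ex_ante_utility("C") > ex_ante_utility("D")
```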
Notice that all three approaches that advise outputting "COOPERATE" would nonetheless agree that it would have been better for the Programmer to send an always-defecting program instead of the Agent.
So we have two options, and both are in some sense weird:
Also, the disagreement between UDT and CDT with anthropic uncertainty seemingly contradicts this post. Probably there is no real contradiction and this situation just doesn't satisfy some required assumptions, but maybe it deserves future research.