Followup/summary/extension to this conversation with SilasBarta
So, you're going along, cheerfully deciding things, doing counterfactual surgery on the output of decision algorithm A1 to calculate the results of your decisions, but it turns out that a dark secret is undermining your efforts...
You are not running/being decision algorithm A1, but instead decision algorithm A2, an algorithm that happens to have the property of believing (erroneously) that it actually is A1.
Well, first, let me suggest a slightly more concrete way in which this might come up:
Physical computation errors. For instance, a stray cosmic ray hits your processor and flips a bit in such a way that a certain conditional that would otherwise have gone down one branch instead goes down the other. So instead of computing the output of your usual algorithm in this circumstance, you're computing the output of the version that, at that specific step, behaves in that slightly different way. (Yes, this sort of thing can be mitigated with error correction, etc. The problem being addressed here is that, to me at least, basic TDT doesn't seem to have a natural way to even represent this possibility.)
Consider a slightly modified causal net in which the innards of an agent are more of an "initial state", and in which there's a selector node/process (i.e., the resulting computation) that selects which abstract algorithm's output is the actual output. That is, this process determines which algorithm you, well, are.
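To make the bit-flip scenario concrete, here is a toy sketch. The function `run_step`, the state word, and the output labels are all made up for illustration. The point is only that the same code, fed a state with one flipped bit, goes down the other branch and so effectively computes a different algorithm's output.

```python
def run_step(state: int) -> str:
    """One conditional of a hypothetical decision procedure."""
    # Branch on the lowest bit of the state word.
    if state & 1:
        return "cooperate"
    return "defect"


# Hypothetical state word as algorithm A1 would see it.
intended_state = 0b0001
# The same word after a single bit flip (XOR toggles the lowest bit).
flipped_state = intended_state ^ 0b0001

a1_output = run_step(intended_state)  # what A1 would output
a2_output = run_step(flipped_state)   # what the perturbed version (A2) outputs
```

The agent's source code is unchanged in both runs. Only the physical state differs, which is why the agent can believe it is A1 while actually behaving as A2.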
Similarly, another being that might base its actions on a model of your behavior will be represented as having a model of your innards and the model itself having a selector, analogous to the above.
To actually compute consequences of decisions and do all the relevant counterfactual surgery, ideally (ignoring "minor" issues like computability), one iterates over all possible algorithms one might be. That is, one first goes "if the actual result of the combination of my innards and all the messy details of reality and so on is to do computation A1, then..." and subiterates over all possible decisions, the latter being done via the usual counterfactual surgery.
Then one weighs all of those by the probability that one actually _is_ algorithm A1, and then goes "if I actually was algorithm A2..." and so on, doing the same counterfactual surgery in each case.
In the above diagram, that lets one consider the possibility of one's own choice being decoupled from what the model of one's choice would predict: the initial model is correct, but while the agent is actually considering the decision, a hardware error or whatever causes the agent to be/implement A2 while the model of the agent is still properly implementing A1.
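The iteration described above can be sketched roughly as follows. This is only an illustration: the names `expected_utilities` and `surgery_utility`, the decision labels, the probabilities, and the utility numbers are all hypothetical, and the counterfactual surgery itself is stubbed out as a lookup table.

```python
from typing import Callable, Dict, List


def expected_utilities(
    prior: Dict[str, float],
    decisions: List[str],
    surgery_utility: Callable[[str, str], float],
) -> Dict[str, float]:
    """For each decision, sum utility over the algorithms one might be."""
    totals = {d: 0.0 for d in decisions}
    for algorithm, probability in prior.items():   # "if I actually am A1...", "...A2..."
        for d in decisions:                        # subiterate over possible decisions
            # surgery_utility stands in for the usual counterfactual surgery:
            # the utility of the world where `algorithm` outputs `d`.
            totals[d] += probability * surgery_utility(algorithm, d)
    return totals


# Hypothetical numbers, chosen only for illustration.
prior = {"A1": 0.75, "A2": 0.25}
table = {
    ("A1", "x"): 10.0, ("A1", "y"): 4.0,
    ("A2", "x"): -20.0, ("A2", "y"): 8.0,
}

eu = expected_utilities(prior, ["x", "y"], lambda a, d: table[(a, d)])
best = max(eu, key=eu.get)
```

With these made-up numbers, "x" looks best if one is certainly A1, but once the 25% chance of actually being A2 is counted, "y" comes out ahead.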
I am far from convinced that this is the best way to deal with this issue, but I haven't seen anyone else bringing it up, and the usual form of TDT that we've been describing didn't seem to have any obvious way to even represent this issue. So, if anyone has any better ideas for how to clean up this solution, or alternative ideas for dealing with this problem, go ahead.
I just think it is important that it be dealt with _somehow_... That is, that the decision theory have some way of representing errors or other things that could cause ambiguity as to which algorithm it is actually implementing in the first place.
EDIT: sorry, to clarify: one determines the utility for a possible choice by summing over the results of all the possible algorithms making that particular choice (i.e., "I don't know if my decision corresponds to deciding the outcome of algorithm A1 or A2 or..."), so sum over those for each choice, weighting by the probability of that being the actual algorithm in question.
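Written out as a formula, the clarification above might look like the following. The notation is my own assumption rather than anything standard from TDT: $d$ is a candidate choice, $A_i$ ranges over the algorithms one might actually be, $P(A_i)$ is the probability of being $A_i$, and $U(A_i \to d)$ is the utility obtained by counterfactual surgery setting the output of $A_i$ to $d$.

```latex
\mathrm{EU}(d) = \sum_i P(A_i)\, U(A_i \to d),
\qquad
d^{*} = \arg\max_{d} \mathrm{EU}(d)
```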
EDIT2: SilasBarta came up with a different causal graph during our discussion to represent this issue.