Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

*(A longer text-based version of this post is also available on MIRI's blog* *here, and the bibliography for the whole sequence can be found* *here.)*

*The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.*

*Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.*

I am talking about the surreal number ε, which is smaller than any positive real. Events of likelihood ε do not actually happen, we just keep them around so the counterfactual reasoning does not divide by 0.

Within the simulation, the AI might be able to conclude that it just made an ε-likelihood decision and must therefore be in a counterfactual simulation. It should of course carry on as it were, in order to help the simulating version of itself.

Why shouldn't the environment be learning?

To the Omega scenario I would say that since we have an Omega-proof random number generator, we get new strategic options that should be included in the available actions. The linear function then goes from the ε-adjoined open affine space generated by {Pick H with probability p | p real, non-negative and at most 1} to the ε-adjoined utilities, and we correctly solve Omega's problem by using p=1/2.