IIUC, AIXI assumes the environment is deterministic.

In other words, it only has epistemic uncertainty.

What if it didn't make that assumption and/or the assumption was violated?

Has anyone explored this question?

# 2 Answers

Yes. If you generate a bit sequence by flipping a coin, then with high probability AIXI will throw up its hands and say "you can't model this any better than just recording the sequence, therefore the next bit is 50/50."

With slight complications, similar arguments apply no matter what distribution you draw the environment from, so the random part gets modeled correctly, as a random variable drawn from the right distribution.
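To make the coin-flip case concrete, here is a minimal sketch. It is not AIXI or Solomonoff induction (those are uncomputable); as a stand-in I assume a tiny hypothesis class of nine Bernoulli models with a uniform prior, and the names `mixture_predict` and `thetas` are my own. The point is only that a Bayesian mixture containing the true stochastic model ends up predicting roughly 50/50 on fair coin flips.

```python
import random

def mixture_predict(bits, thetas):
    """P(next bit = 1) under a uniform-prior Bayesian mixture of Bernoulli(theta) models."""
    weights = [1.0 / len(thetas)] * len(thetas)
    for b in bits:
        # Multiply each weight by the likelihood its model assigns to the observed bit.
        weights = [w * (t if b else 1.0 - t) for w, t in zip(weights, thetas)]
        # Renormalize every step so long sequences do not underflow.
        total = sum(weights)
        weights = [w / total for w in weights]
    # Posterior-weighted average of each model's prediction.
    return sum(w * t for w, t in zip(weights, thetas))

# Assumed toy hypothesis class: theta in {0.1, 0.2, ..., 0.9}.
thetas = [i / 10 for i in range(1, 10)]

rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [rng.random() < 0.5 for _ in range(2000)]

p_next = mixture_predict(flips, thetas)
print(f"P(next flip is heads) = {p_next:.3f}")
```

Feeding in a biased sequence instead (say, all heads) pushes the prediction toward the most biased model in the class, which is the "right distribution" behaviour described above, restricted to this toy class.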

Is there a reference for this?

I was inspired to think about this by the following puzzle (which I interpret as being about the distinction between epistemic and aleatoric uncertainty):

"""

"To present another example, suppose that five tosses of a given coin are planned and that the agent has equal strength of belief for two outcomes, both beginning with H, say the outcomes HTTHT and HHTTH. Suppose the first toss is made, and results in a head. If all that the agent learns is that a head occurred on the first toss it seems unreasonable for him to move to a greater confi...

Marcus Hutter's "Universal Algorithmic Intelligence: A mathematical top->down approach" has this in Section 2.4:

Let us now weaken our assumptions by replacing the deterministic environment q with a probability distribution µ(q) over chronological functions. Here µ might be interpreted in two ways. Either the environment itself behaves stochastically defined by µ or the true environment is deterministic, but we only have subjective (probabilistic) information of which environment is the true environment. Combinations of both cases are also possible. We assume here that µ is known and describes the true stochastic behavior of the environment. The case of unknown µ with the agent having some beliefs about the environment lies at the heart of the AIξ model described in Section 4.

The best or most intelligent agent is now the one that maximizes the expected utility (called value function). This defines the AIµ model.

If I'm skimming the document correctly (I haven't read it in any detail), building up the AIµ model is part of later turning it into the AIξ model, which is AIXI. From the end of the section:

To get our final universal AI model the idea is to replace µ by the universal probability ξ, defined later.

And section 4:

The main idea of this work is to generalize universal induction to the general agent model described in Section 2. For this, we generalize ξ to include actions as conditions and replace µ by ξ in the rational agent model, resulting in the AIξ(=AIXI) model. In this way the problem that the true prior probability µ is usually unknown is solved. Convergence of ξ→µ can be shown, indicating that the AIξ model could behave optimally in any computable but unknown environment with reinforcement feedback.
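For what it's worth, the reason a mixture like ξ can converge to µ is easiest to see in the plain sequence-prediction setting (no actions). What follows is the standard dominance argument in my own notation, not a quote from the paper: I assume a countable class $\mathcal{M}$ of candidate environments with prior weights $w_\nu > 0$, and that the true µ is in the class.

```latex
\xi(x_{1:n}) \;:=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x_{1:n})
\quad\Longrightarrow\quad
\xi(x_{1:n}) \;\ge\; w_\mu \, \mu(x_{1:n})
\quad\Longrightarrow\quad
\ln \frac{\mu(x_{1:n})}{\xi(x_{1:n})} \;\le\; \ln \frac{1}{w_\mu}
\quad \text{for all } n .
```

So the total log-loss that ξ suffers relative to the true µ is bounded by a constant that does not grow with $n$, whether µ is deterministic or stochastic. The AIξ version additionally conditions on the agent's actions, which is where the extra complications come in.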

Couldn't you just treat any 'stochastic' environment as a hidden-variable theory, i.e. as a deterministic program with a PRNG appended whose seed you don't know?
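A toy sketch of that idea, under a strong simplifying assumption of mine: the "PRNG" is just a program that outputs the bits of its seed in order, so the seed is effectively a random tape. The function name `predict_next` and the 10-bit seed length are made up for illustration. With a uniform prior over seeds, conditioning on the observed prefix and asking about the next bit gives the same answer as a fair coin.

```python
from itertools import product

def predict_next(prefix, n_bits):
    """P(next bit = 1) under a uniform prior over n-bit seeds.

    Each seed defines a deterministic environment that emits the seed's
    bits in order (an assumed stand-in for "program plus unknown seed").
    """
    k = len(prefix)
    if k >= n_bits:
        raise ValueError("prefix must be shorter than the seed")
    # Keep only the deterministic environments consistent with what was observed.
    consistent = [s for s in product((0, 1), repeat=n_bits)
                  if list(s[:k]) == list(prefix)]
    # Fraction of the surviving environments whose next output is 1.
    return sum(s[k] for s in consistent) / len(consistent)

print(predict_next([1, 0, 1, 1], n_bits=10))  # -> 0.5
```

A real PRNG with a short seed is a different situation: after enough output the seed becomes identifiable and the predictions stop being 50/50, so the equivalence here depends on the seed being as long as the sequence you observe.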