[ Question ]

Is there any work on incorporating aleatoric uncertainty and/or inherent randomness into AIXI?

by capybaralet | 1 min read | 4th Oct 2020 | 7 comments



IIUC, AIXI assumes the environment is deterministic.  
In other words, it only has epistemic uncertainty.
What if it didn't make that assumption and/or the assumption was violated?
Has anyone explored this question?

2 Answers

Yes. If you generate a bit sequence by flipping a coin, then with high probability AIXI will throw up its hands and say "you can't model this any better than just recording the sequence, therefore the next bit is 50/50."

With slight complications, similar arguments apply no matter what distribution you draw the environment from, so that the random part correctly gets modeled like a random variable drawn from the right distribution.

Couldn't you just treat any 'stochastic' environment as hidden-variable theories - actually being a deterministic program with a PRNG appended whose seed you don't know?

[2] Charlie Steiner, 5mo: Yes, this is basically what I'm saying - treating the future as random, versus treating the future as encrypted by a one-time pad you don't know, lead to the same distributions and behavior. This means that if you want to think of Solomonoff induction in terms of random variables, you can, but it turns out that you get back something that's still equivalent to Solomonoff induction.

[1] capybaralet, 5mo: Yeah that seems right. But I'm not aware of any such work OTTMH.
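To make the "random vs. one-time pad" point concrete, here is a minimal sketch (not from the thread, and far simpler than Solomonoff induction). It assumes a finite horizon and a uniform prior over every deterministic bit string of that length, which plays the role of the unknown seed. The function names `hidden_seed_prediction` and `coin_prediction` are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def hidden_seed_prediction(prefix, horizon):
    """P(next bit = 1 | prefix) under a uniform prior over all deterministic
    bit strings of length `horizon` (the unknown "seed" / one-time pad).
    Assumes horizon > len(prefix)."""
    n = len(prefix)
    # Keep only the deterministic "worlds" that agree with what we saw so far.
    consistent = [s for s in product((0, 1), repeat=horizon)
                  if s[:n] == tuple(prefix)]
    ones = sum(1 for s in consistent if s[n] == 1)
    return Fraction(ones, len(consistent))


def coin_prediction(prefix):
    """P(next bit = 1 | prefix) for an i.i.d. fair coin: history is irrelevant."""
    return Fraction(1, 2)


prefix = [1, 0, 0, 1]
print(hidden_seed_prediction(prefix, horizon=8))  # 1/2
print(coin_prediction(prefix))                    # 1/2
```

Both models give the same prediction for every prefix, so under these toy assumptions no observation can tell "inherently random" apart from "deterministic with an unknown seed".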

Is there a reference for this?

I was inspired to think of this by this puzzle (which I interpret as being about the distinction between epistemic and aleatoric uncertainty):

"To present another example, suppose that five tosses of a given coin are planned and that the agent has equal strength of belief for two outcomes, both beginning with H, say the outcomes HTTHT and HHTTH. Suppose the first toss is made, and results in a head. If all that the agent learns is that a head occurred on the first toss it seems unreasonable for him to move to a greater confi... (read more)

[3] Charlie Steiner, 5mo: I don't have a copy of Li and Vitanyi on hand, so I can't give you a specific section, but it's in there somewhere (probably Ch. 3). By "it" here I mean discussion of what happens to Solomonoff induction if we treat the environment as being drawn from a distribution (i.e. having "inherent" randomness).

Neat puzzle! Let's do the math real quick:

Suppose you have one coin with bias 0.1, and another with bias 0.9. You choose one coin at random and flip it a few times. Before flipping, flipping 3 H and 2 T seems just as likely as flipping 2 H and 3 T, no matter the order.

$P(HHHTT) = P(HHTTT) = (0.5 \times 0.9^3 \times 0.1^2) + (0.5 \times 0.9^2 \times 0.1^3) = 0.00405$

After your first flip, you notice that it's a H. You now update your probability that you grabbed the heads-biased coin: $P(\text{heads bias} \mid H) = \frac{0.5 \times 0.9}{0.5} = 0.9$.

Now $P(HHTT \mid H) = (0.9 \times 0.9^2 \times 0.1^2) + (0.1 \times 0.9^2 \times 0.1^2) = 0.0081$

And $P(HTTT \mid H) = (0.1 \times 0.9^3 \times 0.1) + (0.9 \times 0.9 \times 0.1^3) = 0.0081$.

Huh, that's weird. That's, like, super unintuitive. But if you look at the terms for $P(HHTT \mid H)$ and $P(HTTT \mid H)$, notice that they both simplify to $(0.9^3 \times 0.1^2) + (0.9^2 \times 0.1^3)$. You think it's more likely that you have the heads-biased coin, but because you know the coin must be biased, the further sequence "HHTT" isn't as likely as the sequence "HTTT", and both this difference in likelihood and your probability of what coin you have are the same number, the bias of the coin!
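The arithmetic in this comment can be checked with exact fractions. This is a small verification sketch, assuming the setup described there (two coins with heads probability 0.9 and 0.1, picked with equal probability). The helper names `seq_prob` and `mixture_prob` are made up for illustration.

```python
from fractions import Fraction

# Heads probability of each coin, as in the comment.
BIASES = {"heads_biased": Fraction(9, 10), "tails_biased": Fraction(1, 10)}


def seq_prob(seq, p_heads):
    """Probability of a flip sequence like "HHTT" for a coin with P(H) = p_heads."""
    prob = Fraction(1)
    for flip in seq:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob


def mixture_prob(seq, weights):
    """Probability of `seq` when the coin is chosen according to `weights`."""
    return sum(weights[c] * seq_prob(seq, BIASES[c]) for c in BIASES)


prior = {"heads_biased": Fraction(1, 2), "tails_biased": Fraction(1, 2)}

# Before any flip: both full sequences are equally likely.
p_hhhtt = mixture_prob("HHHTT", prior)
p_hhttt = mixture_prob("HHTTT", prior)

# Bayes update on seeing a first H.
evidence = mixture_prob("H", prior)
posterior = {c: prior[c] * seq_prob("H", BIASES[c]) / evidence for c in BIASES}

# Probability of each continuation given the first H.
p_hhtt_given_h = mixture_prob("HHTT", posterior)
p_httt_given_h = mixture_prob("HTTT", posterior)

print(p_hhhtt, p_hhttt)                # 81/20000 81/20000  (= 0.00405)
print(posterior["heads_biased"])       # 9/10
print(p_hhtt_given_h, p_httt_given_h)  # 81/10000 81/10000  (= 0.0081)
```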

Marcus Hutter's "Universal Algorithmic Intelligence: A mathematical top->down approach" has this in Section 2.4:

Let us now weaken our assumptions by replacing the deterministic environment q with a probability distribution µ(q) over chronological functions. Here µ might be interpreted in two ways. Either the environment itself behaves stochastically defined by µ or the true environment is deterministic, but we only have subjective (probabilistic) information of which environment is the true environment. Combinations of both cases are also possible. We assume here that µ is known and describes the true stochastic behavior of the environment. The case of unknown µ with the agent having some beliefs about the environment lies at the heart of the AIξ model described in Section 4.

The best or most intelligent agent is now the one that maximizes the expected utility (called value function). This defines the AIµ model.
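As a rough illustration of "maximize expected utility under a known µ over deterministic environments", here is a toy sketch. It is not Hutter's definition: the two environments, their weights, and the restriction to fixed action sequences (instead of policies that react to observations) are all assumptions made up for this example.

```python
from fractions import Fraction
from itertools import product


# Hypothetical deterministic environments q: each maps an action sequence
# to a total reward.
def env_a(actions):
    """Pays 1 reward for every action 1."""
    return sum(actions)


def env_b(actions):
    """Pays 1 reward for every action 0."""
    return sum(1 - a for a in actions)


# Known distribution mu(q) over the deterministic environments (made-up weights).
mu = [(Fraction(7, 10), env_a), (Fraction(3, 10), env_b)]


def expected_value(actions, mu):
    """Expected total reward of an action sequence: sum over q of mu(q) * reward."""
    return sum(weight * env(actions) for weight, env in mu)


def best_plan(mu, horizon):
    """Brute-force search for the action sequence with the highest expected value."""
    plans = product((0, 1), repeat=horizon)
    return max(plans, key=lambda plan: expected_value(plan, mu))


print(best_plan(mu, horizon=3))       # (1, 1, 1)
print(expected_value((1, 1, 1), mu))  # 21/10
```

Whether µ is read as "the world is stochastic" or "the world is one of these deterministic programs and I don't know which", the computation is the same, which matches the two interpretations given in the quote.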

If I'm skimming the document correctly (I haven't read it in any detail), building up the AIµ model is part of later turning it into the AIξ model, which is AIXI. From the end of the section:

To get our final universal AI model the idea is to replace µ by the universal probability ξ, defined later.

And Section 4:

The main idea of this work is to generalize universal induction to the general agent model described in Section 2. For this, we generalize ξ to include actions as conditions and replace µ by ξ in the rational agent model, resulting in the AIξ(=AIXI) model. In this way the problem that the true prior probability µ is usually unknown is solved. Convergence of ξ→µ can be shown, indicating that the AIξ model could behave optimally in any computable but unknown environment with reinforcement feedback.
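The ξ→µ convergence can be illustrated with a much smaller stand-in for ξ. The sketch below assumes a Bayesian mixture over just five Bernoulli environments with a uniform prior, whereas the real ξ mixes over all computable environments and is not computable. The candidate class, the true parameter 0.7, and the function name `mixture_predictions` are all made up for illustration.

```python
import random


def mixture_predictions(bits, thetas):
    """Return the mixture's P(next bit = 1 | history) before each observed bit."""
    weights = [1.0 / len(thetas)] * len(thetas)  # uniform prior over the class
    preds = []
    for bit in bits:
        # Predictive probability is the weight-averaged heads probability.
        preds.append(sum(w * t for w, t in zip(weights, thetas)))
        # Bayes update: reweight each candidate by how well it predicted `bit`.
        weights = [w * (t if bit else 1.0 - t) for w, t in zip(weights, thetas)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return preds


rng = random.Random(0)  # fixed seed so the run is repeatable
true_theta = 0.7        # the "true" stochastic environment mu (an assumption)
bits = [1 if rng.random() < true_theta else 0 for _ in range(2000)]

preds = mixture_predictions(bits, thetas=[0.1, 0.3, 0.5, 0.7, 0.9])
print(round(preds[0], 3), round(preds[-1], 3))
```

The first prediction is the prior average of the candidates (0.5). After enough data the posterior concentrates on the candidate matching the true environment, so the mixture ends up predicting like µ even though the data is genuinely random.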