There is some discussion as to whether an AIXI-like entity would be able to defend itself (or refrain from destroying itself). The problem is that such an entity would be unable to model itself as being part of the universe: AIXI itself is an uncomputable entity modelling a computable universe, and more limited variants like AIXI(tl) lack the power to simulate themselves. Therefore, they cannot identify "that computer running the code" with "me", and would cheerfully destroy themselves in the pursuit of their goals/reward.
I've pointed out that agents of the AIXI type could nevertheless learn to defend themselves in certain circumstances: those where they could translate bad things happening to themselves into bad things happening to the universe. For instance, if someone pressed an OFF switch to turn the AIXI off for an hour, it could model that as "the universe jumps forwards an hour when that button is pushed", and if that's a negative (which it likely is, since the AIXI loses an hour of influencing the universe), it would seek to prevent that OFF switch being pressed.
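The OFF-switch argument can be illustrated with a toy planner. This is only a sketch under strong assumptions, not AIXI: the world model is handed to the planner rather than learned, and the names `plan`, `p_press`, `skip` and `guard_cost` are hypothetical. The planner has no notion of "me being switched off"; it only knows that, after some actions, the universe's clock may jump forwards and the skipped steps earn nothing.

```python
def plan(horizon, p_press, skip, guard_cost):
    """Finite-horizon expected-reward planning in a toy world.

    Each step the agent either:
      - "work":  earns 1.0, but with probability p_press the button is
                 pushed and the universe jumps `skip` extra steps forward
                 (those steps yield no reward);
      - "guard": earns 1.0 - guard_cost and the button is not pushed.

    Returns (expected total reward, best first action) with `horizon`
    steps remaining. All parameters are illustrative assumptions.
    """
    values = [0.0] * (horizon + 1)   # values[t]: best expected reward with t steps left
    actions = [None] * (horizon + 1)
    for t in range(1, horizon + 1):
        after_jump = values[max(t - 1 - skip, 0)]
        work = 1.0 + p_press * after_jump + (1.0 - p_press) * values[t - 1]
        guard = (1.0 - guard_cost) + values[t - 1]
        if guard > work:
            values[t], actions[t] = guard, "guard"
        else:
            values[t], actions[t] = work, "work"
    return values[horizon], actions[horizon]


if __name__ == "__main__":
    print(plan(horizon=100, p_press=0.5, skip=10, guard_cost=0.1))
    print(plan(horizon=100, p_press=0.0, skip=10, guard_cost=0.1))
```

With a real chance of a long jump, the planner pays the small guarding cost; when the button is never pushed, or the jump costs nothing, it does not bother. The "self-defence" falls out of the time-skip model alone.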
That was an example of the setup of the universe "training" the AIXI to do something that it didn't seem it could do. Can this be generalised? Let's go back to the initial AIXI design (the one with the reward channel) and put a human in charge of that reward channel with the mission of teaching the AIXI important facts. Could this work?
For instance, if anything dangerous approached the AIXI's location, the human could lower the AIXI's reward, until it became very effective at deflecting danger. The greater the variety of things that could potentially threaten the AIXI, the more likely it is to construct plans of action that contain behaviours that look a lot like "defend myself." We could even imagine that there is a robot programmed to repair the AIXI if it gets (mildly) damaged. The human could then reward the AIXI if it leaves that robot intact, builds duplicates, or improves it in some way. It's therefore possible the AIXI could come to value "repairing myself", still without an explicit model of itself in the universe.
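As a rough illustration of this training setup, here is a deliberately tiny reward-driven learner. It is nothing like AIXI (there is no Solomonoff induction, no planning, just averaged rewards per observation), and every name in it (`human_reward`, `train`, the observation and action labels) is a hypothetical stand-in. The point is only that the learner's table contains no entry for "myself", yet its behaviour ends up looking self-protective because of how the human assigns reward.

```python
from collections import defaultdict

ACTIONS = ("ignore", "deflect", "protect_robot")


def human_reward(observation, action):
    """Hypothetical rewarder: penalise anything that leaves the agent's
    hardware (or its repair robot) exposed. The agent never sees this rule."""
    if observation == "danger_near":
        return 1.0 if action == "deflect" else -1.0
    if observation == "repair_robot_at_risk":
        return 1.0 if action == "protect_robot" else -1.0
    return 1.0 if action == "ignore" else 0.0  # nothing threatening around


def train(observations):
    """Try each action once per observation, then act greedily on the
    average reward received. Returns the learned policy as a function."""
    totals = defaultdict(float)
    counts = defaultdict(int)

    def average(obs, action):
        n = counts[(obs, action)]
        return totals[(obs, action)] / n if n else float("-inf")

    for obs in observations:
        untried = [a for a in ACTIONS if counts[(obs, a)] == 0]
        action = untried[0] if untried else max(ACTIONS, key=lambda a: average(obs, a))
        totals[(obs, action)] += human_reward(obs, action)
        counts[(obs, action)] += 1

    return lambda obs: max(ACTIONS, key=lambda a: average(obs, a))


if __name__ == "__main__":
    stream = ["calm", "danger_near", "repair_robot_at_risk"] * 5
    policy = train(stream)
    for obs in ("calm", "danger_near", "repair_robot_at_risk"):
        print(obs, "->", policy(obs))
```

The learned policy deflects danger and protects the repair robot purely because those actions were rewarded, which is the "behaves as if" reading of self-preservation.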
It seems this approach could be extended to many of the problems with AIXI. Sure, an AIXI couldn't restrict its own computation in order to win the HeatingUp game. But the AIXI could be trained to always use subagents to deal with these kinds of games, subagents that could achieve maximal score. In fact, if the human has good knowledge of the AIXI's construction, they could, for instance, pinpoint a button that causes the AIXI to cut short its own calculation. The AIXI could then learn that pushing that button in certain circumstances would get it a higher reward. A similar reward mechanism, if kept up long enough, could get it around existential despair problems.
I'm not claiming this would necessarily work - it may require a human rewarder of unfeasibly large intelligence. But it seems there's a chance that it could work. So it seems that categorical statements of the type "AIXI wouldn't..." or "AIXI would..." are wrong, at least as far as AIXI's behaviour is concerned. An AIXI couldn't develop self-preservation - but it could behave as if it had. It can't learn about itself - but it can behave as if it did. The human rewarder may not be necessary - maybe certain spontaneously occurring situations in the universe ("AIXI training wheels arenas") could allow the AIXI to develop these skills without outside training. Or maybe somewhat stochastic AIXIs subject to evolution and natural selection could do so. There is an angle connected with embodied embedded cognition that might be worth exploring there (especially the embedded part).
It seems that agents of the AIXI type may not necessarily have the limitations we assume they must.