I am a PhD student in computer science at the University of Waterloo.
My current research is related to Kolmogorov complexity. Sometimes I build robots, professionally or otherwise.
See my personal website colewyeth.com for an overview of my interests.
I've read a bit of the logical induction paper, but I still don't understand why Bayesian probability isn't sufficient for reasoning about math. It seems that the Cox axioms still apply to logical uncertainty, and in fact "parts of the environment are too complex to model" is a classic justification for using probability in A.I. (I believe it is given in AIMA). At a basic rigorous level probabilities are assigned to percepts, but we like to assign them to English statements as well (doing this reasonably is a version of one of the problems you mentioned). If anything, modeling the relationships between strings of mathematical symbols probabilistically seems better justified than applying probabilities to English statements, since the truth/falsehood of provability is well-defined in all cases*. Pragmatically, I think I do assign probabilities to mathematical statements being true/provable when I am doing research, and I am not conscious of this leading me astray!
*the truth of statements independent of (say) ZFC is a bit more of a philosophical quagmire, though it still seems that assigning probabilities to provable/disprovable/independent is a pretty safe practice. This might also be a use case for semimeasures as defective probabilities.
To me, the natural explanation is that they were not trained for sequential decision making and therefore lose coherence rapidly when making long-term plans. If I saw an easy patch I wouldn't advertise it, but I don't see any easy patch - I think next-token prediction works surprisingly well at producing intelligent behavior, in contrast to the poor scaling of RL in hard environments. The fact that it hasn't spontaneously generalized to succeed at sequential decision making (RL-style) tasks is in fact not surprising; it would have seemed obvious to everyone if not for the many other abilities that did arise spontaneously.
It may be possible to formalize your idea as in Orseau's "Space-Time Embedded Intelligence," but it would no longer bear much resemblance to AIXItl. With that said, translating the informal idea you've given into math is highly nontrivial. Which parts of its physical world should be preserved and what does that mean in general? AIXI does not even assume our laws of physics.
Since both objections have been pointers to the definition, I think it's worth noting that I am quite familiar with the definition(s) of AIXI; I've read both of Hutter's books, the second one several times as it was drafted.
Perhaps there is some confusion here about the boundaries of an AIXI implementation. This is a little hard to talk about because we are interested in "what AIXI would do if..." but in fact the embeddedness questions only make sense for AIXI implemented in our world, which would require it to be running on physical hardware, which means in some sense it must be an approximation (though perhaps we can assume that it is a close enough approximation that it behaves almost exactly like AIXI). I am visualizing AIXI running inside a robot body. Then it is perfectly possible for AIXI to form accurate beliefs about its body, though in some harder-to-understand sense it can't represent the possibility that it is running on the robot's hardware. AIXI's cameras would show its robot body doing things when it took internal actions - if the results damaged the actuators, AIXI would have more trouble getting reward, so it would avoid similar actions in the future (this is why repairs and some hand-holding before it understood the environment might be helpful). Similarly, pain signals could be communicated to AIXI as negative (or lowered positive) rewards, and it would rapidly learn to avoid them. It's possible that an excellent AIXI approximation (with a reasonable choice of UTM for its prior) would rapidly figure out what was going on and wouldn't need any of these precautions to learn to protect its body - but it seems clear to me that they would at least improve AIXI's chances of success early in life.
With that said, the prevailing wisdom that AIXI would not protect its brain may well be correct, which is why I suggested the off-policy version. This flaw would probably lead to AIXI destroying itself eventually, if it became powerful enough to plan around its pain signals. What I object to is only the dismissal of @moridinamael's comment, which seems to me to be directionally correct and not to make overly strong claims.
I tend to think of this through the lens of the AIXI model - what assumptions does it make and what does it predict? First, one assumes that the environment is an unknown element of the class of computable probability distributions (those induced by probabilistic Turing machines). Then the universal distribution is a highly compelling choice, because it dominates this class while also staying inside it. Unfortunately, the computability level does worsen when we consider optimal action based on this belief distribution. Now we must express some coherent preference ordering over action/percept histories, which can be represented as a utility function by VNM. Hutter further assumed it could be expressed as a reward signal, which is a kind of locality condition, but I don't think that assumption is necessary for the model to be useful. This convenient representation allows us to write down a clean specification of AIXI's behavior, relating its well-specified belief distribution and utility function to action choice. It is true that, setting aside the reward representation, choosing an arbitrary utility function can justify any action sequence for AIXI (I haven't seen this proven, but it seems trivial because AIXI assigns positive probability to any finite history prefix); in a way, though, this misses the point: the mathematical machinery we've built up allows us to translate conclusions about AIXI's preference ordering to its sequential action choices and vice versa, through the intermediary step of constraining its utility function.
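To make the "clean specification" concrete, the reward-based form is, schematically and in roughly Hutter's notation (an expectimax over the universal mixture $\xi$ up to horizon $m$):

```latex
a_t \;=\; \arg\max_{a_t} \sum_{e_t} \;\cdots\; \max_{a_m} \sum_{e_m} \big[\, r_t + \cdots + r_m \,\big]\; \xi\!\left(e_{1:m} \mid a_{1:m}\right)
```

Replacing the summed rewards with an arbitrary utility over the full history gives the generalized version discussed above.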
I am confused that this has been heavily downvoted, it seems to be straightforwardly true insofar as it goes. While it doesn't address the fundamental problems of embeddedness for AIXI, and the methods described in the comment would not suffice to teach AIXI to protect its brain in the limit of unlimited capabilities, it seems quite plausible that an AIXI approximation developing in a relatively safe environment with pain sensors, repaired if it causes harm to its actuators, would have a better chance at learning to protect itself in practice. In fact, I have argued that with a careful definition of AIXI's off-policy behavior, this kind of care may actually be sufficient to teach it to avoid damaging its brain as well.
Interesting - intuitively it seems more likely to me that a well-constructed mind just doesn't develop sophisticated demons. I think plenty of powerful optimization algorithms are not best understood as creating mesa-optimizers. The low-level systems of the human brain like the visual stream don't seem to ever attempt takeover. I suppose one could make the claim that some psychological disorders arise from optimization "daemons", but this mostly seems like pure speculation and possibly an abuse of the terminology. For instance, it seems silly to describe optical illusions as optimization daemons.
Yes, I mostly agree with everything you said - the limitation with the probabilistic Turing machine approach (usually presented equivalently as the a priori probability, defined in terms of monotone TMs) is that you can get samples, but you can't use those to estimate conditionals. This is connected to the typical problem of computing the normalization factor in Bayesian statistics. It's possible that these approximations would be good enough in practice though.
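A toy sketch of the problem, with a hypothetical uniform-bits "environment" standing in for a probabilistic TM: samples are free, but estimating a conditional by rejection pays an acceptance rate that shrinks exponentially in the length of the conditioning prefix - the normalization-factor problem in miniature.

```python
import random

def sample_env(length=20):
    # Hypothetical stand-in for sampling a probabilistic TM's output:
    # here just a uniformly random bit string.
    return ''.join(random.choice('01') for _ in range(length))

def estimate_conditional(prefix, next_bit, n=100_000):
    # Estimate P(next_bit | prefix) by rejection sampling: draw n samples
    # and keep only those matching the prefix. The fraction kept falls
    # like 2^(-len(prefix)) here, so long conditioning histories starve
    # the estimator of accepted samples.
    hits = matches = 0
    for _ in range(n):
        s = sample_env()
        if s.startswith(prefix):
            matches += 1
            if s[len(prefix)] == next_bit:
                hits += 1
    return hits / matches if matches else None
```

For this uniform toy environment the true conditional is 0.5 for any prefix, so the estimate is easy to sanity-check; for a nontrivial environment with a long observed history, almost no samples survive rejection.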
In this example, if I knew the Pythagorean theorem and had performed the calculation, I would be certain of the right answer. If I were not able to perform the calculation because of logical uncertainty (say the numbers were large), then relative to my current state of knowledge I could avoid Dutch books by assigning probabilities to side lengths. This would make me impossible to money-pump in the sense of cyclical preferences. The fact that I could gamble more wisely if I had access to more computation doesn't seem to undercut the reasons for using probabilities when I don't.
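As a toy illustration (the particular numbers are hypothetical stand-ins for "too large to compute in my head"): before doing the arithmetic, I can spread credence over the possible last digits of the squared hypotenuse, bet accordingly without being Dutch-booked, and collapse to certainty once the computation is actually performed.

```python
# Legs of a right triangle, chosen to be infeasible for mental arithmetic.
a, b = 7**50 + 3, 11**40 + 9

# Prior under logical uncertainty: uniform credence over the last digit
# of the squared hypotenuse a^2 + b^2.
prior = {d: 0.1 for d in range(10)}

# Performing the computation resolves the logical uncertainty...
actual = (a * a + b * b) % 10

# ...and credence collapses to certainty on the computed answer.
posterior = {d: (1.0 if d == actual else 0.0) for d in range(10)}
```

Both the prior and the posterior are coherent probability distributions; more computation just moves me from the first to the second.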
Now in the extreme adversarial case, a bookie could come along who knows my computational limits and only offers me bets where I lose in expectation. But this is also a problem for empirical uncertainty; in both cases, if you literally face a bookie who is consistently winning money from you, you could eventually infer that they know more than you and stop accepting their bets. I still see no fundamental difference between empirical and logical uncertainties.