## LESSWRONG

Cole Wyeth

I am a PhD student in computer science at the University of Waterloo.

My current research is related to Kolmogorov complexity. Sometimes I build robots, professionally or otherwise.

See my personal website colewyeth.com for an overview of my interests.

# Sequences

Deliberative Algorithms as Scaffolding

# Wiki Contributions


In this example, if I knew the Pythagorean theorem and had performed the calculation, I would be certain of the right answer. If I were not able to perform the calculation because of logical uncertainty (say, the numbers were large), then relative to my current state of knowledge I could avoid Dutch books by assigning probabilities to side lengths. This would make me impossible to money-pump in the sense of cyclical preferences. The fact that I could gamble more wisely if I had access to more computation doesn't seem to undercut the reasons for using probabilities when I don't.
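A minimal sketch of why coherent probabilities block Dutch books in this situation (the candidate hypotenuse values and the prior over them are hypothetical, chosen purely for illustration):

```python
# Under logical uncertainty about sqrt(3**2 + 4**2), spread belief over
# candidate integer hypotenuse lengths (a hypothetical, illustrative prior).
candidates = {4: 0.2, 5: 0.5, 6: 0.3}

# A bookie sells a $1 ticket on each candidate, priced at our probability.
# Because the prices form a coherent distribution (they sum to 1), buying
# any combination of tickets has the same net payoff whatever the truth is,
# so no combination of these bets guarantees we lose money.
for true_answer in candidates:
    net = sum(
        (1.0 if c == true_answer else 0.0) - price
        for c, price in candidates.items()
    )
    print(f"true={true_answer}: net payoff if we buy every ticket = {net:+.2f}")
```

Each line prints a net payoff of +0.00: with coherent prices, the "buy everything" book is exactly break-even regardless of the answer, which is the sense in which no Dutch book exists.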

Now in the extreme adversarial case, a bookie could come along who knows my computational limits and only offers me bets where I lose in expectation. But this is also a problem for empirical uncertainty; in both cases, if you literally face a bookie who is consistently winning money from you, you could eventually infer that they know more than you and stop accepting their bets. I still see no fundamental difference between empirical and logical uncertainties.

I've read a bit of the logical induction paper, but I still don't understand why Bayesian probability isn't sufficient for reasoning about math. The Cox axioms seem to apply just as well to logical uncertainty, and in fact "parts of the environment are too complex to model" is a classic justification for using probability in A.I. (I believe it is given in AIMA). At a basic rigorous level, probabilities are assigned to percepts, but we like to assign them to English statements as well (doing this reasonably is a version of one of the problems you mentioned). If anything, modeling the relationships between strings of mathematical symbols probabilistically seems better justified than applying probabilities to English statements, since the truth/falsehood of provability is well-defined in all cases*. Pragmatically, I think I do assign probabilities to mathematical statements being true/provable when I am doing research, and I am not conscious of this leading me astray!

*the truth of statements independent of (say) ZFC is a bit more of a philosophical quagmire, though it still seems that assigning probabilities to provable/disprovable/independent is a pretty safe practice. This might also be a use case for semimeasures as defective probabilities.

To me, the natural explanation is that they were not trained for sequential decision making and therefore lose coherence rapidly when making long-term plans. If I saw an easy patch I wouldn't advertise it, but I don't see any easy patch - I think next-token prediction works surprisingly well at producing intelligent behavior, in contrast to the poor scaling of RL in hard environments. The fact that it hasn't spontaneously generalized to succeed at sequential decision making (RL-style) tasks is not surprising, and would have seemed obvious to everyone if not for the many other abilities that did arise spontaneously.

1. This is probably true; AIXI does take a mixture of dualistic environments and assumes it is not part of the environment. However, I have never seen the "anvil problem" argued very rigorously - we cannot assume AIXI would learn to protect itself, but that is not a proof that it will destroy itself. AIXI has massive representational power, and an approximation to AIXI would form many accurate beliefs about its own hardware, perhaps even concluding that its hardware implements an AIXI approximation optimizing its reward signal (if you doubt this, see point 2). Would it not then seek to defend this hardware as a result of aligned interests? The exact dynamics at the "Cartesian boundary", where AIXI sees its internal actions affect the external world, are hard to understand, but just because they seem confusing to us (or at least me) does not mean AIXI would necessarily be confused or behave defectively (though since it would be inherently philosophically incorrect, defective behavior is a reasonable expectation). Some arguments for the AIXI problem are not quite right on a technical level; for instance, see "Artificial General Intelligence and the Human Mental Model":

"Also, AIXI, and Legg’s Universal Intelligence Measure which it optimizes, is incapable of taking the agent itself into account. AIXI does not “model itself” to figure out what actions it will take in the future; implicit in its definition is the assumption that it will continue, up until its horizon, to choose actions that maximize expected future value. AIXI’s definition assumes that the maximizing action will always be chosen, despite the fact that the agent’s implementation was predictably destroyed. This is not accurate for real-world implementations which may malfunction, be destroyed, self-modify, etc. (Daniel Dewey, personal communication, Aug. 22, 2011; see also Dewey 2011)"

This (and the rest of the chapter's description of AIXI) is pretty accurate, but there's a technical sense in which AIXI does not "assume the maximizing action will always be chosen." Its belief distribution is a semimeasure, which means it represents the possibility that the percept stream may end, terminating the history at a finite time. This is sometimes considered as "death." Note that I am speaking of the latest definition of AIXI that uses a recursive value function - see the section of Jan Leike's PhD thesis on computability levels. The older iterative value function formulation has worse computability properties and really does assume non-termination, so the chapter I quoted may only be outdated and not mistaken.

See also my proposed off-policy definition of AIXI that should deal with brain surgery reasonably.

2. Very likely false, at least for some AIXI approximations, probably including reasonable implementations of AIXItl. AIXI uses a mixture over probabilistic environments, so it can model environments that are too complicated for it to predict optimally as partially uncertain. That is, probabilities can and will effectively be used to represent logical as well as epistemic uncertainty. A toy AIXI approximation that makes this easy to see is one that performs updating only on the N simplest environments (let's ignore runtime/halting issues for the moment - this is reasonable-ish because AIXI's environments are all at least lower semicomputable). This approximation would place greater and greater weight on the environment that best predicts the percept stream, even if it doesn't do so perfectly, perhaps because some complicated events are modeled as "random." The dynamics of updating the universal distribution in a very complicated world are an interesting research topic which seems underexplored, or even unexplored, as I write this! Here is a (highly esoteric) discussion of this point as it concerns a real approximation to the universal distribution.
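A toy sketch of that truncated mixture update, with biased-coin predictors standing in for the N simplest environments (the environments, the uniform prior, and the bias values are all illustrative assumptions, not anything from Hutter's construction):

```python
import random

# Three candidate "environments", each predicting the next percept bit
# with a fixed bias - crude stand-ins for lower semicomputable semimeasures.
envs = {"p=0.2": 0.2, "p=0.5": 0.5, "p=0.8": 0.8}
weights = {name: 1 / len(envs) for name in envs}  # uniform prior for simplicity

random.seed(0)
true_p = 0.8  # the "world" happens to be the p=0.8 environment
for _ in range(200):
    bit = 1 if random.random() < true_p else 0
    # multiply each weight by that environment's probability of the percept
    for name, p in envs.items():
        weights[name] *= p if bit == 1 else 1 - p
    total = sum(weights.values())
    weights = {n: w / total for n, w in weights.items()}

print(max(weights, key=weights.get))  # prints "p=0.8": the best predictor dominates
```

Even if the best environment in the truncated class only predicted the percepts imperfectly (modeling the rest as "random"), the same update would still concentrate weight on it relative to the worse predictors.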

It's true that if we had enough compute to implement a good AIXI approximation, its world would also include lots of hard-to-compute things, possibly including other AIXI approximations, so it need not rapidly become a singleton. But this would not prevent it from being "a working AI."

3. This is right, but not really magical - AIXItl only outperforms the class of algorithms with proofs of good performance (in some axiomatic system). If I remember correctly, this class doesn't include AIXItl itself!

It may be possible to formalize your idea as in Orseau's "Space-Time Embedded Intelligence," but it would no longer bear much resemblance to AIXItl. With that said, translating the informal idea you've given into math is highly nontrivial. Which parts of its physical world should be preserved and what does that mean in general? AIXI does not even assume our laws of physics.

Since both objections have been pointers to the definition, I think it's worth noting that I am quite familiar with the definition(s) of AIXI; I've read both of Hutter's books, the second one several times as it was drafted.

Perhaps there is some confusion here about the boundaries of an AIXI implementation. This is a little hard to talk about, because we are interested in "what AIXI would do if..." - but in fact the embeddedness questions only make sense for AIXI implemented in our world, which would require it to be running on physical hardware, which means in some sense it must be an approximation (though perhaps we can assume it is a close enough approximation that it behaves almost exactly like AIXI). I am visualizing AIXI running inside a robot body. Then it is perfectly possible for AIXI to form accurate beliefs about its body, though in some harder-to-understand sense it can't represent the possibility that it is running on the robot's hardware. AIXI's cameras would show its robot body doing things when it took internal actions - if the results damaged the actuators, AIXI would have more trouble getting reward, so it would avoid similar actions in the future (this is why repairs and some hand-holding before it understood the environment might be helpful). Similarly, pain signals could be communicated to AIXI as negative (or lowered positive) rewards, and it would rapidly learn to avoid them. It's possible that an excellent AIXI approximation (with a reasonable choice of UTM for its prior) would rapidly figure out what was going on and wouldn't need any of these precautions to learn to protect its body - but it seems clear to me that they would at least improve AIXI's chances of success early in life.

With that said, the prevailing wisdom that AIXI would not protect its brain may well be correct, which is why I suggested the off-policy version. This flaw would probably lead to AIXI destroying itself eventually, if it became powerful enough to plan around its pain signals. What I object to is only the dismissal of @moridinamael's comment, which seems to me to be directionally correct and not to make overly strong claims.

I tend to think of this through the lens of the AIXI model - what assumptions does it make, and what does it predict? First, one assumes that the environment is an unknown element of the class of computable probability distributions (those induced by probabilistic Turing machines). Then the universal distribution is a highly compelling choice, because it dominates this class while also staying inside it. Unfortunately, the computability level does worsen when we consider optimal action based on this belief distribution. Now we must express some coherent preference ordering over action/percept histories, which can be represented as a utility function by VNM. Hutter further assumed it could be expressed as a reward signal, which is a kind of locality condition, but I don't think it is necessary for the model to be useful. This convenient representation allows us to write down a clean specification of AIXI's behavior, relating its well-specified belief distribution and utility function to action choice. It is true that, setting aside the reward representation, choosing an arbitrary utility function can justify any action sequence for AIXI (I haven't seen this proven, but it seems trivial because AIXI assigns positive probability to any finite history prefix), but in a way this misses the point: the mathematical machinery we've built up allows us to translate conclusions about AIXI's preference ordering to its sequential action choices and vice versa, through the intermediary step of constraining its utility function.
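To make the "clean specification" concrete, here is a toy finite-horizon expectimax in the shape of AIXI's action choice, with a made-up two-percept environment standing in for the universal mixture (everything here - the environment, rewards, and horizon - is an illustrative assumption, not Hutter's exact formulation):

```python
ACTIONS = [0, 1]
PERCEPTS = [0, 1]

def env_prob(action, percept):
    # toy stand-in for the belief distribution: the percept echoes the
    # current action with probability 0.9 (an assumed, made-up environment)
    return 0.9 if percept == action else 0.1

def reward(percept):
    # toy utility with the "reward" locality condition: percept 1 is worth 1
    return float(percept)

def value(horizon):
    # expectimax: maximize over actions, take expectation over percepts
    return 0.0 if horizon == 0 else max(q_value(a, horizon) for a in ACTIONS)

def q_value(action, horizon):
    # expected reward-to-go for taking `action` now, then acting optimally
    return sum(
        env_prob(action, e) * (reward(e) + value(horizon - 1)) for e in PERCEPTS
    )

best = max(ACTIONS, key=lambda a: q_value(a, horizon=3))
print(best)  # prints 1: the belief distribution plus utility pins down the action
```

The point of the sketch is the shape of the pipeline: fixing the belief distribution and the utility function determines the whole sequence of action choices, and conversely constraints on the actions translate back into constraints on the utility function.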

I am confused that this has been heavily downvoted, it seems to be straightforwardly true insofar as it goes. While it doesn't address the fundamental problems of embeddedness for AIXI, and the methods described in the comment would not suffice to teach AIXI to protect its brain in the limit of unlimited capabilities, it seems quite plausible that an AIXI approximation developing in a relatively safe environment with pain sensors, repaired if it causes harm to its actuators, would have a better chance at learning to protect itself in practice. In fact, I have argued that with a careful definition of AIXI's off-policy behavior, this kind of care may actually be sufficient to teach it to avoid damaging its brain as well.

Interesting - intuitively, it seems more likely to me that a well-constructed mind just doesn't develop sophisticated demons. I think plenty of powerful optimization algorithms are not best understood as creating mesa-optimizers. The low-level systems of the human brain, like the visual stream, don't seem to ever attempt takeover. I suppose one could claim that some psychological disorders arise from optimization "daemons," but this mostly seems like pure speculation, and possibly an abuse of the terminology. For instance, it seems silly to describe optical illusions as optimization daemons.

Yes, I mostly agree with everything you said - the limitation of the probabilistic Turing machine approach (usually equivalently formulated as the a priori probability, defined in terms of monotone TMs) is that you can get samples, but you can't use those samples to estimate conditionals. This is connected to the typical problem of computing the normalization factor in Bayesian statistics. It's possible that these approximations would be good enough in practice, though.
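A toy illustration of the difficulty, with a biased-coin sampler standing in for a monotone machine (the model, bias, and prefix are all assumed for illustration): naive rejection sampling does estimate the conditional, but the acceptance rate decays exponentially with prefix length, which is the normalization problem in another guise.

```python
import random

random.seed(1)

def sample_sequence(length, p=0.7):
    # stand-in for sampling a percept string from a monotone machine
    return tuple(1 if random.random() < p else 0 for _ in range(length))

prefix = (1, 1, 0, 1, 1, 0, 1, 1)
matches, ones = 0, 0
for _ in range(200_000):
    seq = sample_sequence(len(prefix) + 1)
    if seq[: len(prefix)] == prefix:  # reject samples that miss the prefix
        matches += 1
        ones += seq[-1]

# only ~1% of samples survive an 8-bit prefix; every extra bit of history
# roughly multiplies the cost of estimating P(next | prefix)
print(f"accepted {matches} of 200000; P(next=1|prefix) ~ {ones / matches:.2f}")
```

With samples alone, this exponential waste is hard to avoid; what one actually wants is the ability to condition the machine on the prefix directly, which is exactly what the sampling access doesn't provide.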