Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Embedded World-Models

2nd Nov 2018

19nostalgebraist

24Abe Dillon

10nostalgebraist

6Abe Dillon

18lbThingrb

13nostalgebraist

6Davidmanheim

15Davidmanheim

12Rob Bensinger

5Davidmanheim

3Dweomite

3philip_b

3scarcegreengrass

1Liam Donovan

-7Kaellum Eridani

7shminux

New Comment

16 comments, sorted by Click to highlight new comments since: Today at 9:25 PM

This post feels quite similar to things I have written in the past to justify my lack of enthusiasm about idealizations like AIXI and logically-omniscient Bayes. But I would go further: I think that grappling with embeddedness properly will *inevitably* make theories of this *general type* irrelevant or useless, so that "a theory like this, except for embedded agents" is not a thing that we can reasonably want. To specify what I mean, I'll use this paragraph as a jumping-off point:

Embedded agents don’t have the luxury of stepping outside of the universe to think about how to think. What we would like would be a theory of rational belief for situatedagents which provides foundations that are similarly as strong as the foundations Bayesianism provides for dualistic agents.

Most "theories of rational belief" I have encountered -- including Bayesianism in the sense I think is meant here -- are framed at the level of an evaluator outside the universe, and have essentially no content when we try to transfer them to individual embedded agents. This is because these theories tend to be derived in the following way:

- We want a theory of the best possible behavior for agents.
- We have some class of "practically achievable" strategies , which can actually be implemented by agents. We note that an agent's observations provide some information about the quality of different strategies . So if it were possible to follow a rule like "find the best given your observations, and then follow that ," this rule would spit out very good agent behavior.
- Usually we soften this to a performance-weighted average rather than a hard argmax, but the principle is the same: if we could search over all of , the rule that says "do the search and then follow what it says" can be competitive with the very best . (Trivially so, since it has access to the best strategies, along with all the others.)
- But usually . That is, the strategy "search over all practical strategies and follow the best ones" is not a
*practical*strategy. But we argue that this is fine, since we are constructing a theory of*ideal*behavior. It doesn't have to be practically implementable.

For example, in Solomonoff, is defined by computability while is allowed to be uncomputable. In the LIA construction, is defined by polytime complexity while is allowed to run slower than polytime. In logically-omniscient Bayes, finite sets of hypotheses can be manipulated in a finite universe but the full Boolean algebra over hypotheses generally cannot.

I hope the framework I've just introduced helps clarify what I find unpromising about these theories. By construction, any agent you can actually design and run is a *single* element of (a "practical strategy"), so every fact about rationality that can be incorporated into agent design gets "hidden inside" the individual , and the only things you can learn from the "ideal theory" are things which can't fit into a practical strategy.

For example, suppose (reasonably) that model averaging and complexity penalties are broadly good ideas that lead to good results. But all of the model averaging and complexity penalization that can be done *computably* happens inside some Turing machine or other, at the level "below" Solomonoff. Thus Solomonoff *only* tells you about the extra advantage you can get by doing these things *uncomputably*. Any kind of nice Bayesian average over Turing machines that can happen computably is (of course) just another Turing machine.

This also explains why I find it misleading to say that good practical strategies constitute "approximations to" an ideal theory of this type. Of course, since just says to follow the best strategies in , if you are following a very good strategy in your behavior will tend to be close to that of . But this cannot be attributed to *any* of the searching over that does, since you are not doing a search over ; you are executing a *single member* of and ignoring the others. Any searching that can be done practically collapses down to a single practical strategy, and any that doesn't is not practical. Concretely, this talk of approximations is like saying that a very successful chess player "approximates" the rule "consult all possible chess players, then weight their moves by past performance." Yes, the skilled player will *play similarly* to this rule, but they are not *following* it, not even approximately! They are only themselves, not any other player.

Any theory of ideal rationality that wants to be a guide for embedded agents will have to be constrained in the same ways the agents are. But theories of ideal rationality usually get *all of their content* by going to a level above the agents they judge. So this new theory would have to be a very different sort of thing.

I think that grappling with embeddedness properly willinevitablymake theories of thisgeneral typeirrelevant or useless

I disagree. This is like saying, "we don't need fluid dynamics, we just need airplanes!". General mathematical formalizations like AIXI are just as important as special theories that apply more directly to real-world problems, like embedded agents. Without a grounded formal theory, we're stumbling in the dark. You simply need to understand it for what it is: a generalized theory, then most of the apparent paradoxes evaporate.

Kolmogorov complexity tells us there is no such thing as a universal lossless compression algorithm, yet people happily "zip" data every day. That doesn't mean Kolmogorov wasted his time coming up with his general ideas about complexity. Real world data tends to have a lot of structure because we live in a low-entropy universe. When you take a photo or record audio, it doesn't look or sound like white noise because there's structure in the universe. In math-land, the vast majority of bit-strings would look and sound like incompressible white noise.

The same holds true for AIXI. The vast majority of problems drawn from problem space would essentially be, "map this string of random bits to some other string of random bits" in which case, the best you can hope for is a brute-force tree-search of all the possibilities weighted by Occam's razor (i.e. Solomonoff inductive inference).

Most "theories of rational belief" I have encountered -- including Bayesianism in the sense I think is meant here -- are framed at the level of an evaluator outside the universe, and have essentially no content when we try to transfer them to individual embedded agents. This is because these theories tend to be derived in the following way: ...

I can't speak to the motivations or processes of others, but these sound like assumptions without much basis. The reason I tend to define intelligence outside of the environment is because it generalizes much better. There are many problems where the system providing the solution can be decoupled both in time and space from the agent acting upon said solution. Agents solving problems in real-time are a special case, not a general case. The general case is: an intelligent system produces a solution/policy to a problem and an agent in an environment acts upon that solution/policy. An intelligent system might spend all night planning how to most efficiently route mail trucks the next morning, the drivers then follow those routes. A real-time model in which the driver has to plan her routs while driving is a special case. You can think of it as the drivers brain coming up with the solution/policy and the driver acting on it in situ.

You could make the case that the driver has to do on-line/real-time problem solving to navigate the roads and avoid collisions, etc. in which case the full solution would be a hybrid of real-time and off-line formulation (which is probably representative of most situations). Either way, constraining your definition of intelligence to only in-situ problem solving excludes many valid examples of intelligence.

Also, it doesn't seem like you understand what Solomonoff inductive inference is. The weighted average is used because there will typically be multiple world models that explain your experiences at any given point in time and Occam's razor says to favor shorter explanations that give the same result, so you weight the predictions of each model by the inverse of the length of the model (in bits, usually).

Concretely, this talk of approximations is like saying that a very successful chess player "approximates" the rule "consult all possible chess players, then weight their moves by past performance." Yes, the skilled player will play similarlyto this rule, but they are not followingit, not even approximately! They are only themselves, not any other player.

I think you're confusing behavior with implementation. When people talk about neural nets being "universal function approximators" they're talking about the input-output behavior, not the implementation. Obviously the implementation of an XOR gate is different than a neural net that approximates an XOR gate.

I think you're confusing behavior with implementation.

I'm definitely not treating these as interchangeable -- my argument is about how, in a certain set of cases, they are importantly *not* interchangeable.

Specifically, I'm arguing that certain characterizations of ideal behavior cannot help us explain why any given implementation approximates that behavior well or poorly.

I don't understand how the rest of your points engage with my argument. Yes, there is a good reason Solomonoff does a weighted average and not an argmax; I don't see how this affects my argument one way or the other. Yes, fully general theories can be valuable even when they're not practical to apply directly to real problems; I was arguing that a *specific type of* fully general theory lacks *a specific type of* practical value, one which people sometimes expect that type of theory to have.

I was arguing that aspecific type offully general theory lacksa specific type ofpractical value

In that case, your argument lacks value in its own right because it is vague and confusing. I don't know any theories that fall in the "specific type" of general theory you tried to describe. You used Solomonoff as an example when it doesn't match your description.

one which people sometimes expect that type of theory to have.

When someone develops a formalization, they have to explicitly state its context and any assumptions. If someone expects to use Kolmogorov complexity theory to write the next hit game, they're going to have a bad time. That's not Kolmogorov's fault.

I'm arguing that certain characterizations of ideal behavior cannot help us explain why any given implementation approximates that behavior well or poorly.

Of course it can. It provides a different way of constructing a solution. You can start with an ideal then add assumptions that allow you to arrive at a more practicable implementation.

For instance, in computer vision; determining how a depth camera is moving in a scene is very difficult if you use an ideal formalization directly, but if you assume that the differences between two point-clouds are due primarily to affine transformations, then you can use the computationally cheap iterative-closest-point method based on Procrustes analysis to approximate the formal solution. Then, when you observe anomalous behavior, your usual suspects will be the list of assumptions you made to render the problem tractable. Are there non-affine transformations dominating the deltas between point clouds? Maybe that's causing my computer vision system to glitch. Maybe I need some way to detect such situations and/or some sort of fall-back.

Not only that, but there are many other reasons to formalize ideas like intelligence other than to guide the practical implementation of intelligent systems. You can explore the concept of intelligence and its bounds.

Again if you understand a tool for what it is, there's no problem. Of-course trying to use a purely formalized theory directly to solve real-world problems is going to yield confusing results. Trying to engineer a bridge using the standard model of particle physics is going to be just as difficult. It's not a fault of the theory, nor does it mean studying the theory is pointless. The problem is that you want it to be something it's not.

I don't understand how the rest of your points engage with my argument.

It's hard to engage much with your argument because it's made up of vague straw men:

Most "theories of rational belief" I have encountered

I have no solid context to engage you about. If you're talking about AIXI, then you've misunderstood AIXI because it isn't about choosing strategies out of a set of all strategies. In-fact, you've got Solomonoff Inductive inference completely wrong too:

For example, in Solomonoff, S is defined by computability while R is allowed to be uncomputable.

Solomonoff inductive inference is defined in the context of an agent observing an environment. That's all. It doesn't take actions. It just observes and predicts. There is no set of strategies. There is no rule for selecting a strategy, and given your definition of S and R:

We have some class of "practically achievable" strategies S, which can actually be implemented by agents. We note that an agent's observations provide some information about the quality of different strategies s∈S. So if it were possible to follow a rule like R≡ "find the best s∈S given your observations, and then follow that s," this rule would spit out very good agent behavior.

It doesn't even make sense that R would be incomputable given that S is computable.

When you say:

Concretely, this talk of approximations is like saying that a very successful chess player "approximates" the rule "consult all possible chess players, then weight their moves by past performance." Yes, the skilled player willplay similarlyto this rule, but they are notfollowingit, not even approximately! They are only themselves, not any other player.

On what grounds do you even justify the claim that the chess player's behavior is "not even approximately" following the rule of "consult all possible chess players, then weight their moves by past performance."?

Actually, what vanilla AIXI would prescribe is a full tree traversal similar to the min-max algorithm. Which is, of-course; impractical. However, there are things you can do to approximate a full tree traversal more practically. You can build approximate models based on experience like "given the state of the board, what moves should I consider" which prunes the width of the tree, and "given the state of the board, how likely am I to win" which limits the depth of the tree. So instead of considering every possible move at every possible step of the game to every possible conclusion, you only consider 3-4 possible moves per step and only maybe 4-5 steps into the future. Maybe diminishing the number of moves per step.

Yes, there is a good reason Solomonoff does a weighted average and not an argmax

Did you edit your original comment? Because I could have sworn you said more disparaging the use of "arbitrary" weights. At any rate, it's not a "performance-weighted average" as it isn't about performance. It's about uncertainty.

Thanks, this is a very clear framework for understanding your objection. Here's the first counterargument that comes to mind: Minimax search is a theoretically optimal algorithm for playing chess, but is too computationally costly to implement in practice. One could therefore argue that all that matters is computationally feasible heuristics, and modeling an ideal chess player as executing a minimax search adds nothing to our knowledge of chess. OTOH, doing a minimax search of the game tree for some bounded number of moves, then applying a simple board-evaluation heuristic at the leaf nodes, is a pretty decent algorithm in practice.

Furthermore, it seems like there's a pattern where, the more general the algorithmic problem you want to solve is, the more your solution is compelled to resemble some sort of brute-force search. There are all kinds of narrow abilities we'd like an AGI to have that depend on the detailed structure of the physical world, but it's not obvious that any such structure, beyond hypotheses about what is feasibly computable, could be usefully exploited to solve the kinds of problem laid out in this sequence. So it may well be that the best approach turns out to involve some sort of bounded search over simpler strategies, plus lots and lots of compute.

OTOH, doing a minimax search of the game tree for some bounded number of moves, then applying a simple board-evaluation heuristic at the leaf nodes, is a pretty decent algorithm in practice.

I've written previously about this kind of argument -- see here (scroll down to the non-blockquoted text). tl;dr we can often describe the same optimum in multiple ways, with each way giving us a different series that approximates the optimum in the limit. Whether any one series does well or poorly when truncated to N terms can't be explained by saying "it's a truncation of the optimum," since they all are; these truncations properties are facts about the different series, not about the optimum. I illustrate with different series expansions for .

Furthermore, it seems like there's a pattern where, the more general the algorithmic problem you want to solve is, the more your solution is compelled to resemble some sort of brute-force search.

You may be right, and there are interesting conversations to be had about when solutions will tend to look like search and when they won't. But this doesn't feel like it really addresses my argument, which is not about "what kind of algorithm should you use" but about the weirdness of the injunction to optimize over a space containing every procedure you could ever do, including all of the *optimization* procedures you could ever do. There is a logical / definitional weirdness here that can't be resolved by arguments about what sorts of (logically / definitionally unproblematic) algorithms are good or bad in what domains.

...the weirdness of the injunction to optimize over a space containing every procedure you could ever do, including all of theoptimizationprocedures you could ever do.

My most recent preprint discusses multi-agent Goodhart ( https://arxiv.org/abs/1810.10862 ) and uses the example of poker, along with a different argument somewhat related to the embedded agent problem, to say why the optimization over strategies needs to include optimizing over the larger solution space.

To summarize and try to clarify how I think it relates, strategies for game-playing must at least implicitly include a model of the other player's actions, so that an agent can tell which strategies will work against them. We need uncertainty in that model, because if we do something silly like assume they are rational Bayesian agents, we are likely to act non-optimally against their actual strategy. But the model of the other agent itself needs to account for their model of our strategy, including uncertainty about our search procedure for strategies - otherwise the space is clearly much too large to optimize over.

Does this make sense? (I may need to expand on this and clarify my thinking...)

Epistemic Status: Attempting to bridge what I see as a missing inferential link in the post / sequence.

(This is a point which I picked up on because I am familiar with what Abram was thinking about 3 years ago, and I was surprised it didn't get mentioned. Maybe it was assumed to be obvious, maybe it's not as relevant as I assumed, but I think some others will find the point worth a bit more explaining.)

The reason we care about the relative size of the world and the model is that we have a deep reason to think that a model smaller than the world cannot perform optimally - it's the Conant-Ashby Theorem, which states "every good regulator of a system must be a model of that system." For a great explanation of this idea, there is a paper that Abram pointed me to years ago, "Every good key must be a model of the lock it opens (The Conant & Ashby Theorem Revisited)" To quote from there:

"What all of this means, more or less, is that the pursuit of a goal by some dynamic agent (Regulator) in the face of a source of obstacles (System) places at least one particular and unavoidable demand on that agent, which is that the agent's behaviors must be executed in such a reliable and predictable way that they can serve as a representation (Model) of that source of obstacles."

To lay the connection out explicitly, if the agent model of the world is not isomorphic to the world, the actions chosen will be sub-optimal. This is bad if we assume the world is not isomorphic to a simple model (and this sequence is laying out reasons that for reflexive agents, there cannot be such a computational model.)

Abram has made a major update to the post above, adding material on self-reference and the grain of truth problem. The corresponding text on the MIRI Blog version has also been expanded, with some extra material on those topics plus logical uncertainty.

Some of these issues (obviously) are not limited to AI. Specifically, the problem of how to deal with multi-level models and "composibility" was the subject of an applied research project for military applications by my dissertation chair, Paul Davis, here: https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG101.pdf -

"The appealing imagery of arbitrary plug-and-play is fatally flawed for complex models... The more-complex [lower level] model components have typically been developed for particular purposes and depend on context-sensitive assumptions, some of which are tacit."

This issue has formed the basis of a fair amount of his later work as well, but this work focuses on practical advice, rather than conceptual understanding of the limitations. Still, that type of work may be useful as inspiration.

This article talks about multi-level models, where you somehow switch between cheaper models and more-accurate models depending on your needs. Would it be useful to generalize this idea to switching between multiple "same-level" models that are differentiated by something other than cheap vs. accurate?

For example, one might have one model that groups individual people together into "families", another that groups them into "organizations", and a third that groups them into "ideologies". None of those models seems to be strictly "higher" than another (e.g. neither families nor ideologies are composed of each other), and different models might be useful for different problems.

One could also imagine combining all of those into one unified model, of course. But it might be wasteful to model all of them for problems where you only really care about one.

I feel like humans do something like this.

If multiple "same-level" models can coexist, then one strategy for holding onto your values while inventing new models might be to always hold onto whichever model the values were originally defined in, even if you add more models alongside it.

I think this article is too vague, because for almost almost claims in it I am not sure if I understand the author correctly. Below I am posting my notes. If you want to help me and others clarify understanding of this article, consider answering **questions in bold**, or, if you see a mistake in my notes, correcting it. Also I hope my notes help the author as a piece of feedback. I've only finished 2/3 of the article so far, but posting notes because I might become less interested in this later.

Also it's unfortunate that unlike in https://intelligence.org/2018/11/02/embedded-models/ version of this article we don't have hyperlinks to explanations of various concepts here. Perhaps you could add them under the corresponding images? Or have images themselves be hyperlinks or reference links (like in academic articles) to the bottom of the document where all relevant links would be stored grouped by image number.

The post says an embedded agent can't hold an exact model of the
environment in its head, can't think through the consequences of every
potential course of action, can't hold in its head every possible way the environment
could be. **I think this may not be necessarily true and I am not sure what assumptions the author used here.**

**It seems the whole article assumes countable probability spaces (even before the AIXI part). I wonder why and I wonder how realizability is defined for uncountable probability space.**

--

Regarding relative bounded loss and what this bound is for, my best guess is as follows. Here I use non-conditional probability notation instead of . Let be the elementary event that is actually true. Let "expert" be a (not necessarily elementary) event, such that . Then loss of the expert is . Loss of the prior is . For their difference it holds that .

Initially, is your initial trust in expert h, and in each case where it is even a little bit more correct than you, you increase your trust accordingly; the way you do this ensures you assign an expert probability 1 and hence copy it precisely before you lose more than compared to it.

Remember, . It follows
that probability of h increases given evidence e if and only if , i.e. h "is even a little bit more
correct than you". **But I don't understand the bit about copying the expert
h precisely before losing more than **. If the expert is an event, how can you "copy" it?

Combining this with the previous idea about viewing Bayesian learning as a way of allocating “trust” to “experts” which meets a bounded loss condition, we can see the Solomonoff prior as a kind of ideal machine learning algorithm which can learn to act like any algorithm you might come up with, no matter how clever.

It is assuming all possible algorithms are computable, not that the world is.

I don't understand this. Our probability space is the cartesian product of
the set of all possible UTM programs and the set of all possible UTM
working tape initial configurations. Or, equivalently, the set of outputs
of UTM under these conditions. Hence our whole hypothesis space only
includes computable worlds. **What does "can learn to act like any algorithm"
mean here?** "It's getting bounded loss on its predictive accuracy as
compared with any computable predictor." Huh? **Does predictor here mean
expert h? If yes, what does it mean that h is computable and why? All in
all, is the author claiming it's impossible to have a better computable
predictor than AIXI with Solomonoff prior, even if it has non-computable
worlds in the probability space?**

probabilities may not be calibrated identification of causal structure may not work

**What do these mean?** I only know informally what calibration means
related to forecasting.

So, does AIXI perform well without a realizability assumption?

**How is AIXI even defined without realizability, i.e. when the actual world isn't in the probability space, or it has zero prior probability?**

This is fine if the world “holds still” for us; but because the map is in the world, it may implement some function.

**Is this about the world changing because of the agent just thinking? Or
something else?**

It should be noted, though, that there are additional barriers to getting this property in a game-theoretic setting; so in their common usage cases, "grain of truth" is technically demanding while "realizability" is a technical convenience.

...

In game theory, on the other hand, the assumption itself may be inconsistent. This is because games commonly yield paradoxes of self-reference.

From the former paragraph I don't understand anything except that (the author claims) game theory has more problems with grain of truth / realizability, than AIXI. After the latter paragraph, my best guess is: for any game, if there is no pure strategy equilibrium in it, then we say it has no grain of truth, because for every possible outcome rational agents wouldn't choose it.

If we put weight in both places until a proof rules one out, the beliefs just oscillate forever rather than doing anything useful.

Weights represent possible worlds, therefore they are on the scales right
from the beginning (the prior), we never put new weights on the scales. **My
probably incorrect guess of what the author is saying is**
some agent which acts like AIXI but
instead of updating on pieces of evidence as soon as he receives it, he
stockpiles it, and at some points he (boundedly) searches for proofs that
these pieces of evidence are in favor of some hypothesis and performs
update only when he finds them. **But still, why oscillation?**

Any computable beliefs about logic must have left out something, since the tree will grow larger than any container.

I interpret it as there are infinitely many theorems, hence an agent with finite amount of space or finite amount of computation steps can't process all of them.

Another consequence of the fact that the world is bigger than you is that you need to be able to use high-level world models: models which involve things like tables and chairs.

This is related to the classical symbol grounding problem; but since we want a formal analysis which increases our trust in some system, the kind of model which interests us is somewhat different. This also relates to transparency and informed oversight: world-models should be made out of understandable parts.

**No idea what the second quoted paragraph means.**

All in all, I doubt that high level world models are necessary. And it's very not clear what is meant by "high level" or "things" here. Perhaps embedded agents can (boundedly) reason about the world in other ways, e.g. by modeling only part of the world.

https://intelligence.org/files/OntologicalCrises.pdf explains the ontological crisis idea better. Suppose our AIXI-like agent thinks the world is an elementary outcome of some parameterized probability distribution with the parameter θ. θ is either 1 or 2. We call the set of elementary outcomes with θ=1 the first ontology (e.g. possible worlds running on classical mechanics), and the set of elementary outcomes with θ=2 the second ontology (e.g. possible worlds running on superstrings theory). The programmer has only programmed the agent's utility functiom for θ=1 part, i.e. a u function from ontology 1 to real numbers. The agent keeps count of which value of θ is more probable and chooses actions by considering only current ontology. If at some point he decides that the second ontology is more useful, he switches to it. The agent should extrapolate the utility function to θ=2 part. How can he do it?

I can follow most of this, but i'm confused about one part of the premise.

What if the agent created a low-resolution simulation of its behavior, called it Approximate Self, and used that in its predictions? Is the idea that this is doable, but represents a unacceptably large loss of accuracy? Are we in a 'no approximation' context where any loss of accuracy is to be avoided?

My perspective: It seems to me that humans also suffer from the problem of embedded self-reference. I suspect that humans deal with this by thinking about a highly approximate representation of their own behavior. For example, when i try to predict how a future conversation will go, i imagine myself saying things that a 'reasonable person' might say. Could a machine use a analogous form of non-self-referential approximation?

Great piece, thanks for posting.

In order to do this, the agent needs to be able to reason approximately about the results of their own computations, which is where logical uncertainty comes in

(A longer text-based version of this post is also available on MIRI's bloghere, and the bibliography for the whole sequence can be foundhere)(Edit: This post had 15 slides added on Saturday 10th November.)