Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Mentioned in

Bridging Expected Utility Maximization and Optimization

3Rohin Shah

1Whispermute

3Hoagy

2Whispermute

1cubefox

New Comment

We’re not sold on the idea that the target set must be small. We’re skeptical because we need a notion of size, and the probability measure is the most natural way to define size. But if we think of the system as optimizing for that target, shouldn’t the target be

morelikely, and less small?

I would have gone with the entropy of the probability distribution if you wanted to use the probability measure; this captures the idea that the more powerful the optimizer is, the more you can constrain your expectations about what will happen, and the less information there is to gain from knowing exactly which state you end up in.

Thanks for this. We agree it’s natural to think that a stronger optimizer means less information from seeing the end state, but the question shows up again here. The general tension is that one version of thinking of optimization is something like, the optimizer has a high probability of hitting a narrow target. But the narrowness notion is often what is doing the work in making this seem intuitive, and under seemingly relevant notions of narrowness (how likely is this set of outcomes to be realized), then the set of outcomes we wanted to say is narrow is, in fact, not narrow at all. The lesson we take is that a lot of the ways we want to measure the underlying space rely on choices we make in describing the (size of the) space. If the choices reflect our uncertainty, then we get the puzzle we describe. I don't see how moving to thinking in terms of entropy would address this. Given that we are working in continuous spaces, I think one way to see that we often makes choices like this, even with entropy, is to look at continuous generalizations of entropy. When we move to the continuous case, things become more subtle. Differential entropy (the most natural generalization) lacks some of the important properties that makes entropy a useful measure of uncertainty (it can be negative, and it is not invariant under continuous coordinate transformations). You can move to relative entropy to try to fix these problems, but this depends on a choice of an underlying measure *m*. What we see in both these cases is that the generalizations of entropy --- both differential and relative --- rely on *some *choice of a way to describe the underlying space (for differential, it is the choice of coordinate system, and for relative, the underlying measure *m*).

I'm really interested to see this progress, it would feel very healthy if we could have a solid integrated definition of optimizer to work with.

I'm not sure I understand why you don't agree with the 'small' criterion for the target set. It seems that you should be able to say something about the likelihood of the target in the absence of any agent (or if the agent takes a max-ent distribution over actions or something), and that's the relevant notion of smallness, which then becomes large in the presence of the agent. Or is it that you expect it to be difficult to properly specify what it means to have no agent or random decisions?

On the relationships between the three ways of defining acts - is there a trivial way of connecting (1) and (3) by saying that the action that the agent takes in (1) is just baked into the trajectory as some true fact about the trajectory that doesn't have consequences until the agent acts on it? Or instead of the action itself, we could 'bake in' the mapping from some information about the trajectory to the action. Either way we could see this as being determined initially or at the point of decision without a difference in the resulting trajectories.

'all *dependent* variables in the system of equations' - I think this should be 'independent'.

Thank you for this. Yes, the problem is that (in some cases) we think it can sometimes be difficult to specify what the probability distribution would be without the agent. One strategy would be to define some kind of counterfactual distribution that would obtain if there were no agent, but then we need to have some principled way to get this counterfactual (which might be possible). I think this is easier in situations in which the presence of an agent/optimizer is only one possibility, in which case we have a defined probability distribution, conditional on there not being an agent. Perhaps that is all that matters (I am somewhat partial to this), but then I don't think of this as giving us a definition of an optimizing system (since, conditional on their being an optimizing system, there would cease to be an optimizing system---for a similar idea, see Vingean Agency).

I like your suggestions for connecting (1) and (3).

And thanks for the correction!

In Jeffrey's desirability formula you write . But isn't this value always 1 for any i? Which would mean the term can be eliminated since multiplying with 1 makes no difference? Assume p = "the die comes up even". So the partition of p is (the die comes up...) {2,4,6}. And for all i. E.g. P(even|2)=1.

I guess you (Jeffrey) rather meant ?

## Background

This is the second of our (Ramana, Abram, Josiah, Daniel) posts on our PIBBSS research. Our previous post outlined five potential projects that we were considering pursuing this summer. Our task since then has been to make initial attempts at each project. These initial attempts help us to clarify each project, identify the primary problems that need to be solved, and perhaps discover a promising line of attack or two.

This post is aimed at the second proposal from our previous post. There we asked: what is the connection between an agent that maximizes expected utility and an agent that succeeds in action? Here we will outline a few of the problems we see in this area and potential routes for solving them.

## Expected Utility Maximization and Forming Expectations

In economics and formal philosophy, the standard characterization of a rational agent is an agent who maximizes expected utility. Informally, when such an agent has a set of options available to her, she chooses the one that maximizes the expectation of her utility function, where the expectation is taken relative to her subjective degrees of belief. The claim that expected utility (EU) maximization characterizes rationality is usually supported by representation theorems (see here for a good, quick introduction). Expected utility maximization plays a core role in the philosophical foundations of decision theory, game theory, and probabilism. Given that EU maximization plays such a central role in theories of rationality, and given that there is a vast literature surrounding it, it seems very plausible that EU maximization would help us think precisely about agency.

Despite this, it seems that that expected utility theory doesn’t seem to predict anything (or, at the very least, you need to combine EU maximization with certain complexity notions to get something that is weakly predictive). Obviously this is an issue, given that we want notions of agency to constrain our expectations about the behaviour and effects of concrete systems. This is the sense in which we want to bridge the gap between expected an agent’s utility maximization and its success in action (or, rather, our expectation that it will be successful in action). If we know that a (sophisticated) EU maximizer has a certain goal, then we should be able to infer something about the likely unfolding of the world.

The primary obstacle for making EU maximization predictive is that for many systems we can reverse engineer a utility function and a probability distribution such that the system’s behavior is maximizing expected utility relative to that utility / probability pair. For example, consider a rock. We can say that the rock’s highest preference is to just sit there (or to roll, if it gets bumped). Voilà, a maximizer.

One way of understanding what went wrong here is that the system we started with (the rock) has no

obviousutility scale (what John Wentworth calls a “measuring stick of utility”, but which a measurement theorist would call a “representation”). An obvious utility scale is something like money, or food. If someone sees me pay $1 to exchange an apple for an orange and $1 to exchange an orange for an apple, then EU theory says either (1) I’m irrational, or (2) I don’t have a utility function over apples and oranges, or (3) I don’t value having more money given the same fruit. These aren’t particularly strong requirements, but they are requirements nonetheless.In order to understand conditions under which EU maximization can help us form useful expectations about the world, we wanted to identify some models that

doseem to help us form expectations about the world going in a way that seems sensitive to a goal. This brought us to Flint’s post on optimization. Flint defines anoptimizing systemas a system with:^{[1]}Flint likes this definition, among other reasons, because it rules out systems like rocks just sitting somewhere, since the target configuration is no smaller than the whole configuration space. This definition lets us talk about a few quantitative dimensions of optimization. One dimension that we find useful is

robustness. A system is more robust ifMore concisely, an optimizing system is more robust if its target set has a larger basin of attraction, and perturbations don’t affect its tendency to evolve to a certain target set too much.

Maybe there’s a nice connection between optimizing systems and expected utility maximizers. If so, this might give us a way in which EU maximization can constrain our expectations.

For example, imagine we have some agent (some physical system we are inclined to categorize as an agent) that has some goal. The configuration space might be the space of all joint configurations of the agent-system and the rest of the world. The target configuration set is the agent’s goal. The dynamics of the system is how the joint system evolves, so intuitively the agent’s “decision rule” should either (i) select which region the trajectory moves into during the next time interval, (ii) determine the dynamics (either at each time interval or once at the starting time), or (iii) determine an initial configuration. We’ll discuss these options later on when the connection to decision theory is clearer. But, if the jump from decision rule to system dynamics can be formalized in a nice way, then we can actually distinguish an EU maximizer from some suboptimal decision maker by representing them as two different trajectories on the same underlying configuration space.

Now we could ask how an expected utility maximizer might look from this perspective. Do EU maximizers form more robust optimizing systems? One might think this is intuitively plausible: we might expect an EU maximizer to achieve its goals in a wider variety of circumstances, on average.

Is there some feature which all EU-maximizer-dynamics share, so that we can talk about them as a distinct set? Perhaps they have the largest basin of attraction among some set of dynamics.

Do EU maximizers take “shorter paths” to the target set? Imagine a robot hand programmed to solve a Rubik’s cube. One way to think of the optimal decision procedure is: the procedure which solves the cube in the shortest time, or the fewest number of rotations of layers of the cube, or some other resource. This is distinguished from a suboptimal procedure which wastes time or movement. We can perhaps use this metric to distinguish optimizing systems which achieve the target state “intelligently” versus those that “accidentally” hit on it. This is related to the idea that maximizers are those systems which don’t throw away resources.

There are a number of conceptual problems that we will need to figure out to make this connection precise. For example, when we normally think of optimizing systems, it’s not clear (to us, at least) what the status of the “configuration space” is. The configurations that lie off the actual trajectory of the system do not actually exist; they’re mere “possibilities”. But Josiah and Daniel think our best theories of probability and possibility make these thoroughly subjective notions. Possibility is always

epistemicpossibility, relative to an agent.From this perspective, it seems like Flint and Yudkowsky are giving roughly equivalent (under certain assumptions) definitions of optimization, which, crucially, all depend on a probability measure, or a model, and hence are perspective-relative. This is fine: our goal is to understand the conditions under which thinking of something as an EU maximizer can constrain our expectations about the world. If thinking of something an an optimizing system constrains our expectations about the world, then it makes that it would help us to do because the optimizing system matches up in some way to our uncertainty about how the world is. If we can build a bridge from EU maximization to optimizing systems, then EU maximization can inform our expectations through optimization.

## First Steps in Formalizing the Bridge

Here we’ll lay out some initial steps toward the project outlined above, namely finding a connection between EU maximization and Flint’s optimizing systems. We don’t have it fully worked out, but we are writing this up to document our progress, and (hopefully!) to get people’s comments and suggestions.

Consider Flint’s optimizing system. It is, in effect, just a dynamical system with an attractor. It’s intended to capture the intuition that optimizers steer the environment toward their goals (which are improbable).

There are a couple conceptual issues with this statement. First, there’s a reference to “probability”. We think probability is subjective, so the claim that some goal set is “unlikely” only makes sense relative to some perspective, world model, etc. We should say that

some agentthinks the target space is unlikely. (There is a further issue — do they still think it is unlikely once they think there is an optimizer/EU maximizer around? Maybe this “unlikely” notion is something like a “default” probability, but we are not sure that this makes total sense.)Second, there’s the configuration space. Intuitively, a configuration space is the space of all possible configurations of the system. One determines the degrees of freedom of the system (i.e., all independent variables in the system of equations), and this determines the dimensions of the system; the configuration space is then the possible values those variables can take on. But here we need to refer to

possibility, which we also think is subjective. Indeed, only one trajectory through configuration space will actually happen. These possibilities don’t exist in any deep sense; we use them because someone (us, usually) thinks they’re possible. Something like this idea is expressed in a comment by AdamGleave (though, if we take our notion of the configuration space to be the conceptual space of an agent, then adding the dummy dimensions as he suggested looks unmotivated, which is a benefit of thinking of optimizing systems from a subjective point of view).^{[2]}So we have at least two sources of subjectivity if we interpret Flint’s proposal along Bayesian lines. We think that’s okay; we’ll address the conceptual upshots later. But for now, it suffices to say that under a Bayesian interpretation a dynamical system is

agent-orworld model-relative.Again, we think Yudkowsky is expressing the same idea (notice he stresses how his definition is relative to a measure).So, we propose that we think of the dynamical system, the configuration space together with its dynamics and target set, as corresponding to a map, not the territory. The configuration space is all the states of the system some agent thinks is possible. Trajectories are world histories that the agent envisions. The target set is chosen by an agent; it’s their goal.

There’s another conceptual issue we need to clear up: Flint requires that the target set be

smallrelative to the whole configuration space. There’s no canonical notion ofsizefor an arbitrary dynamical system, however. (Dynamical systems are often defined on Rn, in which case there are a few natural metrics — for example, Euclidean distance — but this isn’t true of arbitrary configuration spaces.) One could define a metric on the configuration space, as suggested by Richard Ngo. One could also define a probability measure, to capture the idea that the target set isimprobablecompared to the rest of the space. This feels like a natural thing to do if we’re thinking of the dynamical system as the agent’s map, and brings Flint’s proposal closer to Yudkowsky’s. (But again, there is the issue of defining even this measure, since, once we know that there is an EU maximizer/optimizer as part of the system, the target set might no longer be small.)To define the probability measure we need a probability space. What should the sample space be? One obvious candidate is just the configuration space itself. Then points in the configuration space are points in the sample space. I think, though, that when I’m attempting to evaluate how likely some outcome is, I tend to think of all the possible ways that outcome could happen, and about how likely I think

those waysare, given where I currently am. The “possible ways” are trajectories. So I think that probabilities on points in configuration space should be derived from probabilities on trajectories that end (or at least pass through) that point (possibly indexed by time).Let’s think more about the underlying configuration space. We can think of it as a subset of Rn, though in general it need not be. (Probably in general it need only be a separable metric space.) We’ll use C to denote the configuration space. It is, intuitively, a set of points that the system can move through. The system also comes equipped with a

dynamics, a set of differential equations that determine trajectories through the system. We won’t need to write down a precise dynamics.We now want to define a probability measure on the system. We’ll take our sample space Ω to be the set of all

trajectoriesthrough C. We do this because it doesn’t make much intuitive sense to assign probabilities to C-pointssimpliciter. If I ask “how likely is this point?”, this seems like an imprecise way of asking “how likely is it that the system moves through this point during interval I?”, in which case I’m asking about the measure of the set of trajectories which pass through that point (during some period of time). That question has an answer only relative to a prior measure on trajectories.First, let us consider the situation in which time is discrete. This is helpful for building intuitions about the underlying algebra. We will then discuss how to generalize to continuous time contexts. Thus, note that the following construction only works if our time index set T is countable. If T is uncountable, then important sets (like the set of all continuous trajectories) are not measurable. The following is the standard construction.

So Ω is a set of functions →xi(t):T→C which describe the trajectories. We’ll assume that C⊆Rn. Since we are working in a context where T is countable we’ll let T=N.

This means that our functions →xi are of type N→Rn, so Ω=CT⊆RnN. If we are considering all possible trajectories, then Ω may be written as the product ∏t∈TCt, where each Ct⊆Rn is a copy of the configuration space of the dynamical system. This product comes equipped with the canonical projections πt:CT→C defined, for each ω∈Ω, by πt(ω)=ω(t). We need a σ-algebra to define our probability measure on Ω. We’ll do this via the canonical method for discrete processes: the product algebra in Ω.

Definition 1(F).Let Ω be defined as above, and let B(Rn) denote the Borel sets of Rn. We define F to be the product σ-algebra on Ω, that is, the σ-algebra generated by the cylinder sets {π−1t(Bt)|Bt∈B(Ct),t∈T}.Together with a probability measure P, the triple (Ω,F,P) forms a probability space.

Let’s look at this definition more intuitively. We initially said that we wanted our sample space to be the set of all trajectories in our dynamical system. We ended up with a product of subsets of n-dimensional real space indexed by the naturals. But this isn’t too crazy. Think of each copy Ct as the time-slice of the system at time t. For each t we select a point ct∈Ct. Putting these points together determines a trajectory, a function from T to C. This explains why the canonical projection map sends trajectories to their positions at a time; the factors of Ω are time-slices! Notice that we did not require trajectories to be continuous in our construction. This means that most of these trajectories will be entirely implausible as real trajectories of the system, but that’s fine, our probability measure can be conditioned on the events that describe plausible trajectories if we like (or we can just remove certain points from Ω).

Moving to a context with continuous time, we can define a similar probability space. Let Ω∗ be the set of continuous functions f:R≥0→Rn. Then we can define the following.

Definition 2(F∗). LetΩ∗ be defined as above. Then we define F∗as theσ-algebra generated by the cylinder sets{f∈Ω|f(ti)∈Ii,i=1,2,…,n} where t1,…,tn∈R≥0and eachIi is the product of intervals Πnj=1(aji,bji].This standard σ-algebra has many nice properties (see, for example, Doob’s paper, where the analogous construction is in his Definition 1 [and he considers R instead of Rn]).

Since we have a σ-algebra we can define a probability measure on the configuration space. We can recover probabilities of individual points at a time by taking the probability of all trajectories which pass through that point at that time. Since our algebra contains the cylinders, we can define the probability of many intuitive events.

Now that we have a probability measure it makes sense to define the target set as “small”, i.e., has a small measure. In other words, the target set is unlikely, as Yudkowsky wanted it to be.

We ultimately want to connect this construction up to expected utility maximization. We need to define utilities on the system. Since the probability measure is defined on a sample space of trajectories, we let our utility function be defined on trajectories as well. If we wanted to capture situation in which the utility of a trajectory in some additive sense supervenes on the utility of points along the trajectory, we could define a measure on the positive reals, a utility function on points in the configuration space, and then set the utility of a trajectory equal to the integral of the utility of points with respect to the measure.

We’ve set things up so that it looks a lot like the Jeffrey-Bolker framework for decision theory. That system, unlike Savage’s, has just one epistemic object: propositions. Propositions, for us, are events in the σ-algebra. So they can be cones, or points, or any set of trajectories or points (such as the target set). They also have utilities, defined as above.

The SEP article on decision theory has the following suggestive quote: “The basic upshot of Jeffrey’s theory is that the desirability of a proposition, including one representing acts, depends both on the desirabilities of the different ways in which the proposition can be true, and the relative probability that it is true in these respective ways.” More precisely, we take some proposition p and some finite partition {p1,p2,…,pn} of p. The partition represents various ways that p could be realized. In terms of the dynamical system we’re defining, a partition of some event p is a set of some other events pi which make up p. That is, each pi is a

narrowerset of future trajectories which fall within p. Keep this in mind. Jeffrey then defines thedesirabilityof p, denoted Des(p), byDes(p)=∑iDes(pi)⋅P(p|pi).

Jeffrey’s theory (almost, see Jeffrey’s later paper) collapses the distinction between expected utility and utility; it’s all just desirability. In the language of dynamical systems, this equation expresses the idea that the utility of some set of trajectories p is the probability-weighted average utility of all sets of trajectories pi which partition p. These sets of trajectories pi themselves have their desirability defined by the above equation, i.e., are themselves partitioned into narrower sets of trajectories. In the limit,

desirability is defined in terms of single trajectories. All other desirabilities are defined recursively from them.Note that this looksa lotlike backward induction!Once we have defined utilities on whole trajectories, it’s obvious how we would proceed. We define utilities of

events(which are neither whole trajectories nor points) can then be defined in terms of Jeffrey’s desirability function. This is how we can embed the optimizing system configuration space structure into a decision theoretic structure.Maximizing expected utility is then just maximizing desirability. This prompts a discussion of “acts” from the perspective of the dynamical system. We are trying to set things up so that we can prove a statement like, “EU maximizers are robust optimizers”. We think that really we’ll get something like, “As the set of acts available to the EU agent grows, the robustness of the optimization system increases”. In order to make this precise, we will have to specify what the acts the agent can take are. There are (at least) three different ways to think about acts in this framework:

(1) At certain points in time, we stipulate the agent can choose an element from a partition of the sample space to make true. This is very natural in Jeffrey-Bolker. The agent chooses the act with the highest desirability. Then the theorem should look something like, as the agent gets more control (either through richer sets of acts, or through more choice points), the system as a whole becomes a more robust optimizer.

(2) The agent could control the dynamics. Perhaps they choose a dynamics from some limited set of dynamics at the beginning, or perhaps they get to influence the dynamics along the way. Strictly speaking, if we wanted to embed this in Jeffrey-Bolker, we would need to enrich our σ-algebra by adding events corresponding to different dynamics. This shouldn’t be too much of a problem. Then the theorem should look something like, as the agent gets more control over the dynamics, the system as a whole becomes a more robust optimizer.

(3) If we think instead of the agent as embedded in the system itself, and thus the agent component and the environment component jointly determining a point in configuration space at each time, then we can consider what happens when the agent component is more in the shape of an EU maximizer. In this framing, if we imagine the agent’s shape fixed at the first time step, then what this is doing is restricting the possible starting points in configuration space to those with an agent of that shape. What we would then want to show is something like, the more of an EU maximizer the agent looks like, the more robust of an optimizer the system

restricted to those pointsbecomes. This is the least natural in Jeffrey-Bolker, but seems interesting to think about.## Summary and Future Work

This is, at the time of writing, the furthest we’ve gone toward connecting decision theory with optimizing systems. To summarize the points we’ve made:

morelikely, and less small?^{[3]}One possible next step is to settle on a notion of “act” and then try to prove a theorem stating something to the effect of “as an agent’s set of acts become richer, they form a more robust optimizing system”. There were also questions early in the post that wondered whether EU maximizers might correspond to trajectories which converge to the target set quickly; this is an open question.

Another project we want to pursue is to try to characterize relationships between the three different notions of acts above. For example, if we can show that, under some specified conditions, making choices along the way ((1) and (2)) and choosing an embedded decision-making mechanism at the beginning (3) will result in the same choices, then we can get a kind of generality of the theory. Things might not work out in this way, in which case the project would be to understand better when and why the different act notions come apart.

We would also like to think about the implications of such a connection (if we do indeed find one). One possible consequence is that the two viewpoints (decision theory and optimizing systems) are really in agreement, so one can feel comfortable using either or both when thinking about agency.

^{^}How we are thinking about it, an in-flight perturbation is the same as resetting the system to a new initial state (presumably in some small neighborhood of the current state).

^{^}Another conceptual problem with “configuration space” is its dependence on linear time. John makes a case against this here.

^{^}One other way to try to make sense of this is to think of the target set as small if we weren’t hypothesizing agency. But this seems to run into problems (if we think there is an agent there, how do I define the counterfactual in a principled/satisfactory way). Of course, there might be a solution here—right now we are just noting the problem.