Thoughts and problems with Eliezer's measure of optimization power


OP measures the system's ability to achieve its goals, not how hard these goals are.

Not really. It's often largely a measure of how big the specified solution space is - e.g.:

What's the smallest prime larger than a million, and smaller than a billion?

What's the smallest prime larger than a million, and smaller than a trillion?

Essentially the same problem, bigger solution space - more EOP [Eliezer Optimization Power].
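The point can be illustrated numerically. The sketch below (my own illustration, assuming EOP here is log_2 of the size of the specified interval, since the single correct answer is a one-point target) finds the answer and compares the two EOPs:

```python
import math

def smallest_prime_above(n):
    """Trial division -- fine for n around a million."""
    def is_prime(k):
        if k < 2:
            return False
        for d in range(2, math.isqrt(k) + 1):
            if k % d == 0:
                return False
        return True
    k = n + 1
    while not is_prime(k):
        k += 1
    return k

answer = smallest_prime_above(10**6)  # same answer for both problems

# Hitting the one correct integer in the specified interval:
eop_up_to_billion = math.log2(10**9 - 10**6)
eop_up_to_trillion = math.log2(10**12 - 10**6)
```

Same answer, but the larger interval yields roughly ten more bits of EOP.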

A concept I've played with, coming off of Eliezer's initial take on the problem of formulating optimization power, is: Suppose something generated *N* options randomly and then chose the best. Given the observed choice, what is the likelihood function for *N*?

For continuously distributed utilities, this can be computed directly using beta distributions. Beta(*N*, 1) is the probability density for the highest of *N* uniformly distributed unit random numbers. This includes numbers which are cumulative probabilities for a continuous distribution at values drawn from that distribution, and therefore numbers which are cumulative probabilities at the goodness of an observed choice. (*N* doesn't have to be an integer, because beta distributions are defined for non-integer parameters.)

(The second step of this construction, where you attach a beta distribution to another distribution's CDF, I had to work out by myself; it's not directly mentioned in any discussions of extreme value statistics that I could find. The Mathworld page on order statistics, one step of generalization away, uses the formula for a beta CDF transformed by another CDF, but it still doesn't refer to beta distributions by name.)
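A minimal sketch of the continuous case (function names are mine): if *u* is the null distribution's cumulative probability at the observed choice's utility, the Beta(*N*, 1) density *N*·*u*^(*N*-1) is the likelihood of *N*, and the maximum-likelihood *N* solves 1/*N* + ln *u* = 0:

```python
import math

def likelihood_of_N(u, N):
    """Beta(N, 1) density at u: the density of the best of N uniform
    [0, 1] draws, where u is the null CDF evaluated at the observed
    choice's utility.  N need not be an integer."""
    return N * u ** (N - 1)

def mle_N(u):
    """Maximizing log-likelihood log N + (N - 1) log u over N gives
    1/N + log u = 0, i.e. N = -1 / log u."""
    return -1.0 / math.log(u)
```

For example, an observed choice at the 90th percentile of the null distribution gives a maximum-likelihood *N* of about 9.5.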

If the utilities are discretely distributed, you have to integrate the beta density over the interval of cumulative probabilities that invert to the observed utility.
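Since the Beta(*N*, 1) CDF is *u*^*N*, that integral has a closed form: the likelihood is F_hi^*N* - F_lo^*N*, where [F_lo, F_hi] is the cumulative-probability interval covering the observed utility level. A sketch (names mine):

```python
def discrete_likelihood_of_N(probs, observed_index, N):
    """probs: null probabilities of each utility level, ordered from
    worst to best.  The Beta(N, 1) CDF is u**N, so integrating its
    density over the cumulative-probability interval that inverts to
    the observed level gives F_hi**N - F_lo**N."""
    F_lo = sum(probs[:observed_index])   # cumulative prob just below the level
    F_hi = F_lo + probs[observed_index]  # cumulative prob at the level
    return F_hi ** N - F_lo ** N
```

With eight equally likely levels, observing the top level under *N* = 3 gives likelihood 1 - (7/8)^3 ≈ 0.33, and the likelihoods over all levels sum to 1 for any *N*, as they should.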

To handle choices of mixtures, I guess you could modify this slightly, and ask about the likelihood function for *N* given the observed outcome, marginalizing over (as User:Kindly also suggests) the (unobserved) choice of option. This requires a distribution over options and a conditional distribution over observations given options. This would also cover situations with composite options where you only observe one of the aspects of the chosen option.

Opposed optimization might be very crudely modeled by increasing the number on the opposite side of the beta distribution from N. Somewhat related to this is Warren Smith's "Fixed Point for Negamaxing Probability Distributions on Regular Trees", which examines the distributions of position values that result when two opponents take turns choosing the worst option for each other.

Alternatively, instead of a likelihood function for *N*, you could have a likelihood function for an exponential weighting *λ* on the expected utility of the option:

Pr(*A* was chosen)/Pr(*B* was chosen) ∝ exp(*λ*(*U*(*A*)-*U*(*B*))).

Higher values of *λ* would be hypotheses in which better options were more strongly randomly selected over worse ones. (This is something like a logit-response model, for which *λ* (or "*β*", or "1/*μ*") would be the "rationality" parameter. It might be more familiar as a Gibbs distribution.) But this would fail when the expected utility from the null distribution was heavy-tailed, because then for some *λ*≠0 the distribution of optimized expected utilities would be impossible to normalize. Better would be for the cumulative probability at the expected utility of the chosen option to be what was exponentially weighted by *λ*. In that case, in the limit *λ* = (*N*-1) >> 1, the two models give the same distribution.
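A sketch of the exponential-weighting model (the standard logit/Gibbs choice rule; function names are mine), with the usual max-subtraction for numerical stability:

```python
import math

def gibbs_choice_probs(utilities, lam):
    """Logit-response / Gibbs choice rule: Pr(option i) proportional to
    exp(lam * U_i), so Pr(A)/Pr(B) = exp(lam * (U(A) - U(B))).
    lam = 0 is uniform random choice; larger lam favors better options."""
    m = max(lam * u for u in utilities)  # subtract max before exponentiating
    weights = [math.exp(lam * u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

With utilities 0..7, λ = 0 gives every option probability 1/8, while λ = 2 makes the best option about e² ≈ 7.4 times more likely than the runner-up.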

All of these statistics, as well as Eliezer's original formulation, end up encoding equivalent information in the limit where the utility of each option is an independent sum of many identical light-tailed-distributed components and you're predicting a marginal distribution of utilities for one of those components. In this limit you can safely convert everything to a statistical mechanics paradigm and back again.

Of course, the real criterion for a good formulation of optimization power is whether it helps people who use it in an argument about things that might be optimizers, or who hear it used in such an argument, to come to truthful conclusions.

In this respect, likelihood functions can have the problem that most people won't want to use them: they're hard to compute with, or communicate, unless they belong to a low-dimensional family. The likelihood functions I suggested won't do that except under very simple conditions. I'm not sure what the best way would be to simplify them to something lower-dimensional. I guess you could just communicate a maximum-likelihood estimate and precision for the optimization power parameter. Or, if you chose a reference prior over optimization power, you could communicate its posterior mean and variance.

All of this presupposes that the problems with the unoptimized probability measure can be dealt with. Maybe it would work better to describe the optimization power of a system in terms of a series of levels of simpler systems leading up to that system, where each level's new amount of optimization was only characterized approximately, and only relative to something like a distribution of outputs from the previous level. (This would at least patch the problem where, if thermodynamics is somehow involved in the results of an action, that action can count as very powerful relative to the uniform measure over the system's microstates.) If optimizer B is sort of like choosing the best of N options generated by optimizer A, and optimizer C is sort of like choosing the best of M options generated by optimizer B, that might not have to mean that optimizer C is much like choosing the best of N*M options generated by optimizer A.

Interesting discussion. What do you want to do with this measure once you have it? I can see how it would be elegant to have it, but it seems useful to think about how you would use this measure to inform your decisions.

Those building optimization algorithms need measures of their performance on the target problem. They generally measure space-time resources to find a solution. AFAIK, few bother with measuring solution quality Eliezer-style. Counting the better solutions not found is often extremely expensive. Dividing by the size of the solution space is usually pointless.

Well, it's always good to have a measure of intelligence, if we're worrying about highly intelligent beings. Also, I was hoping that it might give a way of formulating "reduced impact AI". Alas, it seems to be insufficient.

Would it make sense for the measure of real-world intelligence to be "optimization power per unit time" or similar? Given arbitrarily large amounts of time, I could do an exhaustive search of the solution space or something like that, which isn't very intelligent or useful.

Another point: It doesn't seem like optimization power is usefully defined for every possible search space. Let's say our search space is countably infinite, each item corresponds to a single unique natural number, and each item's score is equal to its corresponding natural number. I think for a while and come up with the solution that has score 1 million. What's my optimization power? 0? Regardless of the solution I come up with, my optimization power is going to be 0, even though I vastly prefer some solutions to others. (I don't know how much of a problem this would be in practice... it seems like it might depend on the "shape of the infinitude" of a given solution space.)

(Let me know if these don't make sense for some reason; I haven't taken the time to understand EY's idea of optimization power in depth.)

If OP were an entropy, then we'd simply do a weighted sum 1/2(OP(X_4)+OP(X_7)) = 1/2(1+3) = 2, and then add one extra bit of entropy to represent our (binary) uncertainty as to what state we were in, giving a total OP of 3.

I feel like you're doing something wrong here. You're mixing state distribution entropy with probability distribution entropy. If you introduce mixed states, shouldn't each mixed state be accounted for in the phase space that you calculate the entropy over?

If you go down the "entropy is ignorance about the exact microstate" route, this makes perfect sense. And various people have made convincing-sounding arguments that this is the right way to see entropy, though I'm not an expert myself.

I think a lot of the problems come from starting with final states. Instead, we can let **Y1**, **Y2**, ... be the *possible outputs* of the system S, each with a certain utility attached. There is no such thing as mixed states: you can't half-output **Y1** and half-output **Y2** (in some cases, you might have a probabilistic strategy which outputs **Y1** 50% of the time, but the utility calculation for that is different). Furthermore, there is no need to deal with the coarse-graining issue.

How do you assign utility to an output that is a mixed final state? As a weighted sum of utilities?

Presumably. It's debatable whether or not this captures risk aversion, but in my opinion it does (and risk aversion falls out of nonlinear utility).

But one thing to keep in mind is that if there is an output **Y1** that leads to a final state **X1**, and an output **Y2** that leads to a final state **X2**, then there is not necessarily an output leading to 0.5**X1**+0.5**X2**.

Even if we just restrict to a tiny, non-zero area around X_7, we get arbitrarily high OP - it's not a fluke or a calculation error. Which means that if we followed the distributive law, Q = (1-10^-1000)X_0 + 10^-1000 X_7 must have a much larger OP than X_6 - despite the fact that nearly every possible outcome is better than Q.

It's not weird that the discrete probability distribution X_7 has infinite OP, because in the real world we never get absolute certainty.

And anyway, that's obviously not how probabilities on the simplex are supposed to work, right? Just define a probability density on the simplex of probabilities (for example, uniform on the simplex), and for any given p in the simplex, its OP(p) is the log_2 of the reciprocal of the integral (using that measure) over the set {q | EV(q) >= EV(p)} (we do need utility for this, not only a preference ordering). Drawing a simplex for U(X_n)=n shows why we wouldn't even naturally expect distributivity to hold.
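This proposal is easy to estimate by Monte Carlo. A sketch (function names and the fixed seed are mine), taking log_2 of the reciprocal measure per the post's definition of OP:

```python
import math
import random

random.seed(0)  # fixed seed so the estimate is reproducible

def uniform_simplex_point(n):
    """Uniform sample from the (n-1)-simplex: normalized exponentials."""
    e = [random.expovariate(1.0) for _ in range(n)]
    s = sum(e)
    return [x / s for x in e]

def op_on_simplex(p, utilities, samples=100_000):
    """OP(p) = log_2( 1 / measure{q : EV(q) >= EV(p)} ), estimated
    by Monte Carlo under the uniform measure on the simplex."""
    ev_p = sum(pi * u for pi, u in zip(p, utilities))
    hits = 0
    for _ in range(samples):
        q = uniform_simplex_point(len(utilities))
        if sum(qi * u for qi, u in zip(q, utilities)) >= ev_p:
            hits += 1
    return math.log2(samples / hits)

# With U(X_n) = n, the uniform mixture has EV 3.5; by symmetry exactly
# half the simplex does at least as well, so its OP is about 1 bit.
op_uniform = op_on_simplex([1 / 8] * 8, list(range(8)))
```

Mixtures with higher expected utility occupy smaller upper sets of the simplex, so their estimated OP rises smoothly rather than jumping to infinity at a vertex.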

Similarly, OP measures the system's ability to achieve its very top goals, not how hard these goals are. A system that wants to compose a brilliant sonnet has more OP than exactly the same system that wants to compose a brilliant sonnet while embodied in the Andromeda galaxy. Even though the second is plausibly more dangerous. So OP is a very imperfect measure of how powerful a system is.

I'm confused. A system that has to compose a brilliant sonnet and make sure that it exists in the Andromeda galaxy has to hit a smaller target of possible worlds than a system that wants to compose a brilliant sonnet, and doesn't care where it ends up. Achieving more complex goals require more optimization power, in Eliezer's sense, than achieving simple goals.

It seems that optimization power as it's currently defined would be a value that doesn't change with time (unless the agent's preferences change with time). This might be fine depending what you're looking for, but the definition of optimization power that I'm looking for would allow an agent to gain or lose optimization power.

Notice that OP is defined in terms of the state that S achieved. So it gives us a measure of how powerful S is in practice in our model, not some platonic measure of how good S is in general situations. This does not seem a drawback to OP: after all, we want to measure how powerful a system actually is, not how powerful it could be in other circumstances.

I think that if you're looking for a useful measure of optimization power, you will *not* want to use actual achievement rather than potential achievement if you want to have a nicely encapsulated concept that doesn't include properties of the rest of the environment. Clearly the dangerous optimizers are the ones with actual power rather than merely potential power, but I think it's much more clear to just talk about optimization power as potentially dangerous, given an environment conducive to that optimizer.

Measuring optimization power in terms of *achieved* outcome is wrong for the same reason that attributing a highly desired outcome to rationality is wrong. Even if I win the lottery, it was still a negative expected dollars choice to play the lottery. What we really want is something like the minimum utility outcome a process will achieve, where the minimum is taken over a given set of starting states. Or we could have a probability distribution over starting states and get the expected utility of the outcome. ETA: oops, I don't mean utility of the outcome, I mean measure of the set of outcomes with greater utility than the minimum or expected or whatever.

The problem of the size of the solution space isn't a problem if we content ourselves with comparing optimization power of different processes (I think...).

Even if I win the lottery, it was still a negative expected dollars choice to play the lottery.

At that point in the post, I hadn't introduced mixed states. So the only options are to choose among fully deterministic worlds. Obviously if you had the choice between "win the lottery or don't win the lottery", then you'd go for the first.

Later when I bring in mixed states you can model such things as "buy a lottery ticket" as the achieved state, which generally has expected disutility.

I'm confused.

If OP were an entropy, then we'd simply do a weighted sum 1/2(OP(X_4)+OP(X_7)) = 1/2(1+3) = 2, and then add one extra bit of entropy to represent our (binary) uncertainty as to what state we were in

Why do we add the extra bit? Doesn't the weighted sum already represent that uncertainty?

Suppose the X's had 0 entropy each - that is, they were states with no "internal moving parts," like an electron.

Now imagine that you introduced ignorance into the problem - now we don't know if the electron is in state 4 or state 7, so you assign each P=0.5. What is the entropy of this distribution?

Well, it turns out the entropy (amount of ignorance) is 1 bit. Which is 1 bit more than the 0 bits of entropy that states 4 and 7 had on their own.

It turns out we can't. Imagine that, without changing anything else, the utility of X_7 is suddenly set to ten trillion, rather than 7. The OP of X_7 is still 3 - it's still the best option, still with probability 1/8.

I think a problem with this line of attack is that you are mixing preferences and utilities. You could imagine two types of optimization power. A preference-centric one and a utility-centric one, both of which can be useful depending what you're talking about. You can map preferences to utilities and utilities to preferences, but one may be more natural than the other for your purposes.

An additional problem with the original definition of optimization power is that it requires the agent to have a rich set of preferences. If it doesn't prefer much but would have been very effective with its capacities had it had more preferences, I'd still say the optimization process was powerful. It seems to me that the most useful concept of optimization power will be independent of the optimizer's actual preferences (or utility).

Back in the day, Eliezer proposed a method for measuring the optimization power (OP) of a system S. The idea is to get a measure of how small a target the system can hit:

For example, assume there were eight equally likely possible states {X_0, X_1, ..., X_7}, and S gives them utilities {0, 1, ..., 7}. Then if S can make X_6 happen, there are two states better or equal to its achievement (X_6 and X_7), hence it has hit a target filling 1/4 of the total space. Hence its OP is log_2 4 = 2. If the best S could manage is X_4, then it has only hit half the total space, and has an OP of only log_2 2 = 1. Conversely, if S reached the perfect X_7, 1/8 of the total space, then it would have an OP of log_2 8 = 3.

## The system, the whole system, and everything else in the universe

Notice that OP is defined in terms of the state that S achieved (for the moment this will be a pure world, but later we'll allow probabilistically mixed worlds to be S's "achievement"). So it gives us a measure of how powerful S is in practice in our model, not some platonic measure of how good S is in general situations. So an idiot king has more OP than a brilliant peasant; a naive search algorithm distributed across the internet has more OP than a much better program running on Colossus. This does not seem a drawback to OP: after all, we want to measure how powerful a system actually is, not how powerful it could be in other circumstances.

Similarly, OP measures the system's ability to achieve its very top goals, not how hard these goals are. A system that wants to compose a brilliant sonnet has more OP than exactly the same system that wants to compose a brilliant sonnet while embodied in the Andromeda galaxy. Even though the second is plausibly more dangerous. So OP is a very imperfect measure of how powerful a system is.

We could maybe extend this to some sort of "opposed OP": what is the optimization power of S, given that humans want to stop it from achieving its goals? But even there, a highly powerful system with nearly un-achievable goals will still have a very low opposed OP. Maybe the difference between the opposed OP and the standard OP is a better measure of power.

As pointed out by Tim Tyler, OP can also increase if we change the size of the solution space. Imagine an agent that has to print out a non-negative integer N, and whose utility is -N. The agent will obviously print 0, but if the printer is limited to ten digit numbers, its OP is smaller than if the printer is limited to twenty digit numbers: though the solution is just as easy and obvious, the number of ways it "could have been worse" is increased, increasing OP.
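Both the eight-state example and the printer example can be checked in a few lines. A sketch (my own, assuming the uniform measure over equally likely states):

```python
import math

def op(achieved_utility, utilities):
    """Eliezer-style OP over a finite set of equally likely states:
    log_2( |all states| / |states at least as good as the achieved one| )."""
    at_least_as_good = sum(1 for u in utilities if u >= achieved_utility)
    return math.log2(len(utilities) / at_least_as_good)

# Eight-state example: achieving X_6 hits the top quarter of the space,
# so op(6, range(8)) gives 2 bits; op(4, ...) gives 1; op(7, ...) gives 3.
eight_states = list(range(8))

# Printer example: printing 0 is the unique best output (utility -N for
# output N), so OP is just log_2 of the size of the output space.
op_ten_digits = math.log2(10**10)     # about 33.2 bits
op_twenty_digits = math.log2(10**20)  # about 66.4 bits
```

The same trivial strategy earns twice the OP on the bigger printer, which is exactly the problem being described.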

## Is OP an entropy? Is it defined for mixed states?

In his post Eliezer makes a comparison between OP and entropy. And OP does have some of the properties of entropy: for instance if S is optimizing two separate independent processes (and its own utility treats them as independent), then its OP is the sum of the OP for each process. If for instance S hit an area of 1/4 in the first process (OP 2) and 1/8 in the second (OP 3), then it hits an area of 1/(4*8)=1/32 for the joint processes, for an OP of 5. This property, incidentally, is what allows us to talk about "the" entropy of an isolated system, without worrying about the rest of the universe.
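The additivity claim is easy to verify numerically (a sketch; the function name is mine): hit measures multiply across independent processes, so OPs add.

```python
import math

def op_from_measure(hit_measure):
    """OP as log_2 of the inverse probability measure of the hit region."""
    return math.log2(1.0 / hit_measure)

op_first = op_from_measure(1 / 4)              # 2 bits
op_second = op_from_measure(1 / 8)             # 3 bits
op_joint = op_from_measure((1 / 4) * (1 / 8))  # 1/32 of the space: 5 bits
```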

But now imagine that our S in the first example can't be sure to hit a pure state, but has 50% chance of hitting X_7 and 50% of hitting X_4. If OP were an entropy, then we'd simply do a weighted sum 1/2(OP(X_4)+OP(X_7)) = 1/2(1+3) = 2, and then add one extra bit of entropy to represent our (binary) uncertainty as to what state we were in, giving a total OP of 3. But this is the same OP as X_7 itself! And obviously a 50% chance of X_7 and a 50% chance of something inferior cannot be as good as a certainty of the best possible state. So unlike entropy, mere uncertainty cannot increase OP.

So how should OP extend to mixed states? Can we write a simple distributive law:

OP(1/2 X_4 + 1/2 X_7) = 1/2(OP(X_4) + OP(X_7)) = 2?

It turns out we can't. Imagine that, without changing anything else, the utility of X_7 is suddenly set to ten trillion, rather than 7. The OP of X_7 is still 3 - it's still the best option, still with probability 1/8. And yet 1/2 X_4 + 1/2 X_7 is now obviously much, much better than X_6, which has an OP of 2. But now let's reset X_6 to being ten trillion minus 1. Then it still has an OP of 2, and yet is now much, much better than 1/2 X_4 + 1/2 X_7.

But I may have been unfair in those examples. After all, we're looking at mixed states, and X_6 need not have a fixed OP of 2 in the space of mixed states. Maybe if we looked at the simplex formed by all mixed states made up of {X_0, X_1, ..., X_7}, we could get these results to work? Since all X_i are equally likely, we'd simply put a uniform measure on that simplex. But now we run into another problem: the OP of X_7 has suddenly shot up to infinity! After all, X_7 is now an event of probability zero, better than any other outcome; the log_2 of the inverse of its probability is infinity. Even if we just restrict to a tiny, non-zero area around X_7, we get arbitrarily high OP - it's not a fluke or a calculation error. Which means that if we followed the distributive law, Q = (1-10^-1000)X_0 + 10^-1000 X_7 must have a much larger OP than X_6 - despite the fact that nearly every possible outcome is better than Q.

So it seems that unlike entropy, OP cannot have anything resembling a distributive law. The set of possible outcomes that you started with - including any possible mixed outcomes that S could cause - is what you're going to have to use. This sits uncomfortably with the whole Bayesian philosophy - after all, mixed states shouldn't represent anything but uncertainty between pure states. They shouldn't be listed as separate outcomes.

## Measures and coarse-graining

In the previous section, we moved from using a finite set of equally likely outcomes, to a measure over a simplex of mixed outcomes. This is the natural generalisation of OP: simply compute the probability measure of the states better than what S achieves, and use the log_2 of the inverse of this measure as OP.

Some of you may have spotted the massive elephant in the room, whose mass twists space and underlines and undermines the definition of OP. What does this probability measure actually represent? Eliezer saw it in his original post:

Or how could I write "there were eight equally likely possible states" and "S can make X_6 happen"? Well, obviously, what I meant was that if S didn't exist, then it would be equally likely that X_7 and X_6 and X_5 and X_4 and...

But wait! These X_i's are final states of the world - so they include the information as to whether S existed in them or not. So what I'm actually saying is that {X_0(¬S), X_1(¬S), ..., X_7(¬S)} (the worlds with no S) are equally likely, whereas X_i(S) (the worlds with S) are impossible for i≠6. But what has allowed me to identify X_0(¬S) with X_0(S)? I'm claiming they're the same world "apart from S", but what does this mean? After all, S can have huge impacts, and X_0(S) is actually an impossible world! So I'm saying that "these two worlds are strictly the same, apart from the fact that S exists in one of them, but then again, S would never allow that world to happen if it did exist, so, hum..."

Thus it seems that we need to use some sort of coarse-graining to identify X_i(¬S) with X_i(S), similar to those I speculated on in the reduced impact post.