# Optimization power as divergence from default trajectories

by Josh, 7 min read, 15th Jun 2022, 2 comments

Edit: in hindsight, this is a pretty useless post.

In this post, I'll propose a measure of optimization power inspired by Eliezer's measure and Alex Flint's ground of optimization and show how it relates to utility maximization.

The environment will be modeled as a partially observable MDP (POMDP) and I'm going to gloss over embedded agency by assuming that the agent is outside the environment and has a fixed policy.

## The POMDP setting

• The environment consists of a sequence of states $s_1, s_2, \dots$
• There is an agent that takes actions $a_1, a_2, \dots$
• The agent makes a sequence of observations $o_1, o_2, \dots$, which are determined by the environment via the function $O$.
• The agent selects a sequence of actions according to the policy $\pi$.
• There is a state transition function $T$ which gives a probability distribution over the next state based on the current state and the action the agent takes.
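To make the setting concrete, here is a minimal sketch of a toy environment in which the distribution over trajectories can be enumerated exactly. Everything specific in it is an assumption made for illustration: the two states, the two actions, the transition probabilities, the names `transition`, `observe`, `random_policy` and `seek_one_policy`, and the simplification that the policy only looks at the current observation.

```python
import itertools

STATES = (0, 1)
ACTIONS = (0, 1)

def transition(state, action):
    """Hypothetical transition function, returned as {next_state: probability}."""
    if action == 1:
        return {1: 0.9, 0: 0.1}          # action 1 pushes towards state 1
    return {state: 0.8, 1 - state: 0.2}  # action 0 mostly stays put

def observe(state):
    """Hypothetical observation function; here the agent sees the state itself."""
    return state

def random_policy(observation):
    """Uniform distribution over actions, ignoring the observation."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}

def seek_one_policy(observation):
    """Deterministic policy that always takes action 1."""
    return {1: 1.0}

def trajectory_distribution(policy, start=0, horizon=3):
    """Exact probability of every state sequence of length `horizon`."""
    dist = {}
    for traj in itertools.product(STATES, repeat=horizon):
        prob, state = 1.0, start
        for nxt in traj:
            # Marginalise over the action the policy would take in `state`.
            prob *= sum(pa * transition(state, a).get(nxt, 0.0)
                        for a, pa in policy(observe(state)).items())
            state = nxt
        dist[traj] = prob
    return dist

r = trajectory_distribution(random_policy)    # trajectories under random actions
p = trajectory_distribution(seek_one_policy)  # trajectories under the agent's policy
```

Because the agent sits outside the environment and has a fixed policy, a policy and a transition function together pin down a single distribution over state sequences, which is all the rest of the post needs.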

Here is a visualization:[1]

# What is optimization power?

Intuitively, agents with lots of optimization power are trying to steer the environment in a particular direction. One way to make this precise is to say that such agents exert a lot of influence over the probability that any given trajectory occurs, where a trajectory $\tau$ is a sequence of states $s_1, s_2, \dots, s_n$. In order to measure this influence, we need to know two things:

1. The 'default' probability that the trajectory occurs, which I'll denote $r(\tau)$. This is the probability that the trajectory occurs when the agent selects actions randomly from a uniform distribution over the action space, i.e. $r(\tau) = P(\tau \mid \pi_{\text{rand}})$, where $\pi_{\text{rand}}$ is the random policy.
2. The probability $p(\tau)$ of the trajectory when the agent is following its actual policy $\pi$, i.e. $p(\tau) = P(\tau \mid \pi)$.

To measure influence, you can just take the ratio $p(\tau)/r(\tau)$. If the agent makes a trajectory 2x more likely to occur, it exerts an influence of '2' over that trajectory.

Optimization power (which I'll denote $\mathrm{OP}$) could then be viewed as 'expected influence':

$$\mathbb{E}_{\tau \sim p}\left[\frac{p(\tau)}{r(\tau)}\right]$$

In order to prevent this measure from exploding exponentially with the number of states in the trajectory, I'll define it to be the $\log$ of this expectation:

$$\mathrm{OP} = \log \mathbb{E}_{\tau \sim p}\left[\frac{p(\tau)}{r(\tau)}\right] = \log \sum_\tau \frac{p(\tau)^2}{r(\tau)}$$

This is equivalent to the second order Rényi divergence between the distributions $p$ and $r$.
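As a rough illustration, the log of expected influence can be computed directly when both trajectory distributions are small enough to write down. In this sketch `p` is the distribution under the agent's policy and `r` the distribution under the random policy. The function name `optimization_power`, the dictionary representation, and the four-trajectory numbers are assumptions chosen for the example.

```python
import math

def optimization_power(p, r):
    """log E_{tau ~ p}[p(tau) / r(tau)], in nats.

    This is the order-2 Renyi divergence of p from r.
    `p` and `r` map each trajectory to its probability.
    """
    total = 0.0
    for traj, p_tau in p.items():
        if p_tau == 0.0:
            continue  # trajectories the agent never produces contribute nothing
        r_tau = r.get(traj, 0.0)
        if r_tau == 0.0:
            return math.inf  # the agent reaches a trajectory the default never does
        total += p_tau * p_tau / r_tau
    return math.log(total)

# Illustrative numbers: four trajectories, uniform under the random policy.
r = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
p = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}

op_agent = optimization_power(p, r)    # log(2.08)
op_random = optimization_power(r, r)   # the random policy has zero power
```

An agent that behaves exactly like the random policy scores zero, and the score grows as the agent concentrates probability on trajectories that are unlikely by default.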

## Problems with this

### What if the optimization power comes from another agent?

Let's say that the agent is a remote control robot that is operated by a human. As long as it is being controlled, it will appear to have a lot of optimization power even though the human is actually doing the optimization.

This also applies to a powerful 'tool AI.' Even if it isn't goal-directed, it could have a large optimization power according to my metric as a result of being useful to humans and amplifying human optimization.

Identifying the source of optimization power seems important for clarifying the notion of goal directedness, which I am not going to try to disentangle in this post.

### Couldn't an agent exert influence without being intelligent?

Consider the 'nuclear bomb' agent. It can take $n$ actions. One of these actions is to explode. The others do nothing. The random policy can be made arbitrarily close to the policy 'never do anything' by making $n$ large, so we would expect the policy that explodes to have a lot of 'optimization power,' even though the agent is not doing anything internally that looks like optimization.

Some agents have overpowered actions available to them. Is it fair to say they have more optimization power? I think it is. Optimization power can be viewed as a function of an agent's intelligence and the power of its available actions. Note that if the nuclear bomb were intelligent and exploded at a choice moment, it would have a significantly larger expected influence.
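The nuclear bomb example can be sketched in a one-step toy version. The assumptions here are mine: there is a single time step, exactly one of the `num_actions` actions leads to an "exploded" trajectory, and every other action leads to an "intact" trajectory.

```python
import math

def bomb_optimization_power(num_actions):
    """Log expected influence of 'always explode' in a one-step toy world."""
    # Default distribution: a uniformly random action explodes with chance 1/n.
    r = {"exploded": 1.0 / num_actions, "intact": 1.0 - 1.0 / num_actions}
    # The bomb's policy always picks the explode action.
    p = {"exploded": 1.0, "intact": 0.0}
    expected_influence = sum(p[t] ** 2 / r[t] for t in p if p[t] > 0)
    return math.log(expected_influence)

small = bomb_optimization_power(10)
large = bomb_optimization_power(1000)
```

In this toy version the measure works out to $\log n$, so adding more do-nothing actions increases the bomb's measured power without making it any smarter.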

### Is this measure useful in deterministic environments?

I don't think so. Consider an environment where every trajectory can only occur if the agent takes a unique series of actions, i.e. the map from action sequences to trajectories is one-to-one. Many simple video games have this property. This makes all trajectories equally probable under the random policy and only one trajectory possible under the true agent policy, so the plots for $r$ and $p$ look something like this:

According to my metric, all policies have the same optimization power in these environments. So, why do I still think my measure is useful?
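A quick sketch of why the measure cannot tell policies apart here. It assumes a deterministic environment where each action sequence produces its own trajectory and restricts attention to deterministic policies. The helper names and the choice of 2 actions over 3 steps are illustrative.

```python
import itertools
import math

def optimization_power(p, r):
    """Log expected influence of p relative to the default distribution r."""
    return math.log(sum(p[t] ** 2 / r[t] for t in p if p[t] > 0))

def deterministic_policy_ops(num_actions, horizon):
    """Optimization power of every deterministic policy in the toy environment."""
    # Each action sequence yields a distinct trajectory, so index trajectories by it.
    trajectories = list(itertools.product(range(num_actions), repeat=horizon))
    r = {t: 1.0 / len(trajectories) for t in trajectories}  # uniform default
    ops = []
    for chosen in trajectories:
        # A deterministic policy puts all of its probability on one trajectory.
        p = {t: 1.0 if t == chosen else 0.0 for t in trajectories}
        ops.append(optimization_power(p, r))
    return ops

ops = deterministic_policy_ops(num_actions=2, horizon=3)
```

Every entry of `ops` equals the log of the number of trajectories, whichever trajectory the policy aims for, which is the sense in which all policies look equally powerful in such environments.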

Alex Flint defines optimization as "a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system."

Stochastic environments have built-in perturbations, which is why they are useful for determining if an agent does optimization. If you can only observe one trajectory under the agent's policy, it is difficult to say whether it would continue to push the world towards a narrow set of trajectories in a variety of circumstances. You could consider counterfactuals in which aspects of the environment were changed, but this would be hard to formalize, and you would have to decide what kinds of perturbations the agent must be robust to. In the POMDP setting, you can just use the perturbations that are built into the stochasticity of the transition function.

# Connection to utility maximization

It turns out that optimization power in the way I've defined it directly corresponds to a notion of how well the agent is maximizing a utility function.

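If $u$ is a utility function over trajectories (not states!), then an agent's $u$-intelligence[2] is its expected utility as measured by $u$ under the agent's trajectory distribution $p$:

$$\mathbb{E}_{\tau \sim p}[u(\tau)] = \sum_\tau p(\tau)\,u(\tau)$$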

In order to prevent $u$-intelligence from depending on the particular positive affine transformation of $u$ that is chosen, I'll apply the following normalizations:
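A minimal sketch of one normalization that would do this, assuming (the surrounding text does not confirm it) that $u$ is standardized under the default distribution $r$:

$$\sum_\tau r(\tau)\,u(\tau) = 0, \qquad \sum_\tau r(\tau)\,u(\tau)^2 = 1$$

Fixing the mean removes the additive constant of an affine transformation, and fixing the second moment removes the positive scale factor. This choice is also consistent with a bound in terms of the $\chi^2$ divergence between $p$ and $r$.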

$u$-intelligence is another way of measuring optimization power. If we don't know what the agent's utility function is, we can compute an upper bound on its $u$-intelligence by first finding the utility function that the agent most competently maximizes:

Proof in footnotes[3]

$\chi^2(p \,\|\, r)$ is the $\chi^2$ divergence between $p$ and $r$. Now it is easy to show that:

Proof in footnotes[4]
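One way to see a bound of this kind is the Cauchy-Schwarz inequality. The sketch below assumes that $u$ has mean 0 and second moment 1 under $r$, which is an assumption about the normalization and not something stated above, and writes $\chi^2(p \,\|\, r) = \sum_\tau (p(\tau) - r(\tau))^2 / r(\tau)$:

$$\mathbb{E}_{\tau \sim p}[u(\tau)] = \sum_\tau r(\tau)\left(\frac{p(\tau)}{r(\tau)} - 1\right)u(\tau) \le \sqrt{\sum_\tau r(\tau)\left(\frac{p(\tau)}{r(\tau)} - 1\right)^2}\,\sqrt{\sum_\tau r(\tau)\,u(\tau)^2} = \sqrt{\chi^2(p \,\|\, r)}$$

The first equality uses the mean-zero assumption, and the bound is attained by a utility function proportional to $p(\tau)/r(\tau) - 1$.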

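$u$-intelligence is bounded by a measure of divergence between the distributions $p$ and $r$. Does this ring a bell? The measure of optimization power that I defined previously, the second order Rényi divergence $D_2(p \,\|\, r)$, is also a divergence measure. It turns out that $D_2(p \,\|\, r) = \log\left(1 + \chi^2(p \,\|\, r)\right)$,[5] so: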

Knowing the agent's optimization power gives you an upper bound on the expected utility that it could be obtaining in the environment.
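The relationship can be checked numerically on a small example. This is a sketch under a stated assumption: utilities are rescaled to mean 0 and variance 1 under the default distribution `r`, which is my guess at a suitable normalization. The distributions and the second utility function are made-up numbers.

```python
import math

def chi_squared(p, r):
    """chi^2 divergence of p from r, assuming r is positive everywhere."""
    return sum((p[t] - r[t]) ** 2 / r[t] for t in r)

def renyi_2(p, r):
    """Second order Renyi divergence of p from r, in nats."""
    return math.log(sum(p[t] ** 2 / r[t] for t in r))

def expected(dist, u):
    """Expected value of u under the distribution dist."""
    return sum(dist[t] * u[t] for t in dist)

def normalise(u, r):
    """Rescale u to mean 0 and variance 1 under r (assumed normalization)."""
    mean = expected(r, u)
    var = sum(r[t] * (u[t] - mean) ** 2 for t in r)
    return {t: (u[t] - mean) / math.sqrt(var) for t in r}

r = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
p = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}

bound = math.sqrt(chi_squared(p, r))
bound_from_op = math.sqrt(math.exp(renyi_2(p, r)) - 1.0)

# A utility proportional to p/r attains the bound after normalization.
best_u = normalise({t: p[t] / r[t] for t in r}, r)
# An arbitrary other utility function stays below it.
other_u = normalise({"a": 1.0, "b": 0.0, "c": 3.0, "d": -2.0}, r)

best_score = expected(p, best_u)
other_score = expected(p, other_u)
```

Here `bound` and `bound_from_op` agree, `best_score` matches them, and `other_score` is smaller, which is the pattern the inequality predicts.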

# Conclusion

I'm not sure how useful this definition of optimization power is. Perhaps it could be connected to power-seeking or used to formalize 2D-robustness, but that's beyond the scope of the post. There is often value in turning fuzzy intuitions into concepts you can do math with, and hopefully, this measure of optimization power is a step in that direction.

1. ^
2. ^

Adapted from Tom Everitt's -intelligence (source)

3. ^

Let the functions $u$, $p$, and $r$ be represented as vectors indexed by the trajectories $\tau$: $u_\tau = u(\tau)$, $p_\tau = p(\tau)$, and $r_\tau = r(\tau)$.

4. ^

5. ^

This comes from a standard Rényi divergence identity.

2 comments

Some thoughts: one problem I have with Eliezer's definition is that bits don't cost the same in terms of computation because of logical non-omniscience. Imagine two agents in an environment with 2^N possible trajectories corresponding to N bit bitstrings. One agent always outputs the zero bitstring and the other outputs the preimage of some hash function on the zero bitstring or something else expensive like that. Both of these narrow the world down the same amount, and have the same expected influence, but it seems intuitive that if you have to think really hard about each decision you make, then you're also putting in more optimization in some sense.