Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(Crossposted from my blog)

In 2015 MIRI proposed quantilizers as a decision theory criterion that an AI may use that allows it strive towards a goal while not optimizing too hard at the goal, so that if a superintelligent were a quantilizer there's a smaller chance it will do something catastrophic like kill all humans and turn all matter on Earth into paperclips because it's trying really hard to maximize the productivity of a paperclip factory. For a real number , a -quantilizer is an agent which, given a goal and a space of possible strategies, picks randomly out of the top proportion , i.e., randomly among any strategy that is better than a proportion  of all strategies. In fact, a quantilizer is for all practical purposes equivalent to a a perfect optimizer (an agent that always picks the best of all its possible strategies) that is further restricted to have its strategy specifies by a string of bits whose length must be less than . Since the people at MIRI have already considered the possibility of controlling an AI by restricting its ability to perform input and output and have generally decided that they are not satisfied with this, I consider the idea of quantilizers to be a failure.

Roughly speaking, my claim is that for a natural number , up to a constant additive factor, a perfect optimizer that is limited to strategies taking at most  bits performs as well in any goal as a -quantilizer. More precisely, to explain what I mean by "up to a constant additive factor", there is a reasonably-sized constant  such that an optimizer with  bits of outputs performs as well as a -quantilizer, and a -quantilizer performs as well as an optimizer with  bits of output. I also need to assume that the space of strategies is given a rich enough encoding that we can do things like specify a strategy by specifying a computer program whose output is that strategy, or that the environment is rich enough that this can be done implicitly with strategies like "Walk to a computer and type this program, and attach to the computer the accessories necessary to run the strategy specified by the program." To really make the result true up to a constant factor and not a logarithmic factor or anything like that you need to careful about encoding using stuff like prefix-free encodings but I don't think that's really important and will gloss over it.

Quantilizers are at least as good as bounded-output optimizers: If a particular strategy is optimal among all strategies of length  bits, then since there are at most  such strategies this one optimizing strategy already makes up a proportion  of all strategies with  bits, so a -quantilizer over strategies with  bits must use exactly this strategy or an equally-performing strategy. If we assume the strategies have a measure biased towards short strategies or use a prefix-free encoding or a subset of the strategies are implicitly something like that, a proportion  of all strategies will have length , so a proportion  will be identical to this -bit optimal strategy, so a -quantilizer will perform at least as well as this strategy.

Bounded-output optimizers are at least as good as quantilizers: Using a deterministic pseudo-random number generator, it is possible to precisely specify a sequence  of strategies that are fairly sampled from the space of all strategies. A proportion  of these strategies are also in the top  of performance among all strategies. So the smallest  such that  is among the top  in performance is statistically certain to have  for a small constant . It takes  bits to specify the , plus an additional  bits to specify the pseudo-random sequence and the command to perform strategy  given . So an optimizer with  bits of output can specify the strategy  which is -quantilizing, so the actual optimizing strategy is at least as good.

New Comment
4 comments, sorted by Click to highlight new comments since: Today at 6:38 AM

You can quantize any distribution. For the random distribution, you need to use 0.000... 01% quantilization to get anything useful at all. (In non-trivial environments, the vast majority of random actions are useless junk. If you have a humanoid coffee making robot, almost all random motor inputs will result in twitching on the floor.)

However, you can also quantalize over a model trained to imitate humans. Suppose you give some people a joystick and ask them to remote control the robot to make coffee. They manage this about half the time. You train the robot to be a 25% quantalizer. The top 25% of human actions will do better than humans. 

This is technically equivalent to an imitator of humans, and an ASI that can only output 2 bits (which are used as a random seed for the human) 

Though these descriptions imply different internal workings, which may indicate different probabilities of the AI using rowhammer. 

Yeah, I think this is the key point. Quantilizers are only safe or smart relative to some base distribution.

You mention: 

If we assume the strategies have a measure biased towards short strategies or use a prefix-free encoding or a subset of the strategies are implicitly something like that

Can you discuss more your assumptions on the quantilizer's base distribution? I take it that it's uniform?

Really all I need is that a strategy that takes n bits to specify will be performed by 1 in  of all random strategies. Maybe a random strategy consists of a bunch of random motions that cancel each other out, and in 1 in  of strategies in between these random motions are directed actions that add up to performing this n-bit strategy. Maybe 1 in  strategies start off by typing this strategy to another computer and end with shutting yourself off, so that in the remaining bits of the strategy will be ignored. A prefix-free encoding is basically like the latter situation except ignoring the bits after a certain point is built into the encoding rather than being an outcome of the agent's interaction with the environment.