Quantilizer ≡ Optimizer with a Bounded Amount of Output
(Crossposted from my blog)

In 2015 MIRI proposed quantilizers as a decision criterion an AI might use that allows it to strive toward a goal without optimizing too hard at it, so that if a superintelligence were a quantilizer there would be a smaller chance of it doing something catastrophic, like killing all humans and turning all matter on Earth into paperclips because it is trying really hard to maximize the productivity of a paperclip factory. For a real number 0 ≤ p ≤ 1, a p-quantilizer is an agent which, given a goal and a space of possible strategies, picks uniformly at random from the top proportion p, i.e., randomly among the strategies that are better than a proportion (1−p) of all strategies.

In fact, a quantilizer is for all practical purposes equivalent to a perfect optimizer (an agent that always picks the best of all its possible strategies) that is further restricted to have its strategy specified by a string of bits whose length must be less than log2(1/p). Since the people at MIRI have already considered the possibility of controlling an AI by restricting its ability to perform input and output, and have generally decided that they are not satisfied with this, I consider the idea of quantilizers to be a failure.

Roughly speaking, my claim is that, for a natural number n and up to a constant additive factor, a perfect optimizer limited to strategies taking at most n bits performs as well on any goal as a 2^−n-quantilizer. More precisely, to explain what I mean by "up to a constant additive factor": there is a reasonably-sized constant c such that an optimizer with n+c bits of output performs as well (in expectation, since the quantilizer randomizes) as a 2^−n-quantilizer, and a 2^−(n+c)-quantilizer performs as well as an optimizer with n bits of output. I also need to assume that the space of strategies is given a rich enough encoding that we can do things like specify a strategy by specifying a computer program whose output is that strategy, or that the environment is rich enough that this can be done implicitly.
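The two agents being compared can be sketched for a finite, explicitly enumerated strategy space. This is a toy model only: the function names are mine, and modeling the "n bits of output" restriction as optimizing over a prefix of length 2^n under some fixed enumeration of strategies is an assumption about how the encoding works, not something the original proposal specifies.

```python
import math
import random

def quantilize(strategies, utility, p, rng=random):
    """p-quantilizer: pick uniformly at random from the top
    proportion p of strategies, ranked by utility."""
    ranked = sorted(strategies, key=utility, reverse=True)
    k = max(1, math.ceil(p * len(ranked)))  # size of the top-p slice
    return rng.choice(ranked[:k])

def bounded_optimizer(strategies, utility, n_bits):
    """Perfect optimizer restricted to n_bits of output: here modeled
    as taking the best strategy among the first 2**n_bits entries of a
    fixed enumeration of the strategy space (an assumed encoding)."""
    reachable = strategies[: 2 ** n_bits]
    return max(reachable, key=utility)
```

For instance, with `strategies = list(range(100))` and `utility = lambda s: s`, `quantilize(..., p=0.1)` returns some strategy in the top ten, while `bounded_optimizer(..., n_bits=3)` returns the best of the first eight enumerated strategies. The correspondence in the text is that, for a rich enough enumeration, these two selection rules achieve comparable utility once n and log2(1/p) differ by at most a constant c.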