Epistemic status: Speculative. Rambling stream of ideas.
Recomended semiprerequisite https://www.lesswrong.com/posts/DkcdXsP56g9kXyBdq/coherence-arguments-imply-a-force-for-goal-directed-behavior
How do we tell if an arbitrary system is an optimizer or not?
Well "optimiser" is a set of hypothesis. We can take an AIXItl like structure, and use the universal prior over utility functions. The other hypothesis set is the default solomonov prior. All computer programs weighted by complexity.
Note that each hypothesis set contains a shrunken copy of the other. Within the space of all computer programs, some of those programs will be optimisers. Within the space of all possible optimizers, some will have the goal of making their output match the output of an arbitrary computer program. Thus the odds ratio in either direction is bounded. We call this the odds ratio approach.
The intuition is that this odds ratio provides a good measure if a program is optimizing.
(There is another measure of optimization, namely the weighted average utility over all Turing machine environments. We call this the weighted reward approach)
I will discuss a few programs, and how much optimization each formalism thinks they display.
Trivial constant programms
While True: print(0)
While True: print(1)
Under the odds ratio approach, both these programs are on the not optimizer end of the scale and right next to each other.
Under the weighted reward approach, the answer is much more sensitive on choice of universal Turing machine. An environment that just gives reward for each 1 outputted is very simple. It might even account for say 5% (random number) of weight across all computable environments. If the environment that just rewards 0's isn't given a correspondingly high weight, then according to the weighted reward framework, one of these programs can be much more optimizing than the other.
if input.read(100)==long_random_string: behave_like_AIXI() else: while True: print(0)
if input.read(100)!=long_random_string: behave_like_AIXI() else: while True: print(0)
In the odds ratio formulation, both of these are pretty optimizerry, and almost identical. In either formalization they need to take the complexity of describing the long random string. In the optimizer formulation, they need to take the complexity of constantly outputing 0's. In the all programs formulation, they need the complexity of describing AIXI like behaviour.
In the weighted reward formulation, the first is basically the constant 0 program, while the other is nearly as much of an optimizer as AIXI.
Optimising X and only X
The basic idea here is that once you compensate for the X optimization then the system isn't an optimizer.
One straightforward way of doing this is a rejection of low X proposed action. You have an arbitrary Turing machine T. Whenever T proposes an action A, the utility/ expected utility of that action is computed. If A is one of the top m actions when it comes to maximizing X, then A is taken. Otherwise T is told to try again. A system is said to be an X and only X maximizer if it is better modelled as being any old Turing machine behind a low X rejection gate. As opposed to a Y optimizer behind a low X rejection gate.
For example, a system that outputs whichever action in the million X-best actions contains the most 0's is an X and only X best optimizer. Here m=1000000, and the simple Turing machine behind the low X rejection gate is just a TM that cycles through all strings, starting with the ones full of 0's.
A system that outputs whichever action in the million X-best actions maximizes the number of paperclips in existence is not an X and only X maximiser. It is best modelled as a paperclip maximiser behind a low X rejector. And this comes back to an implicit choice of distribution over optimizers at the start of this article. But presumably the paperclip maximizer is prominent in that set.
Note that at the start of this, we used a set of optimisers. One way to formally specify that set would be to take an AIXI like design, and put a solomonov prior over utility functions (as functions of observations)
However, this fails to capture AI's that optimise for something that is not a simple function of observations. (Like some real world object in a world where the AI's senses can be fooled.) However, if we had a fomalization of an agent that did this kind of optimization, we could just plug in all possible utility functions, weighted by simplicity, into our set of optimizers. Likewise for other decision theories, compute limited AI's ect.