This proposition is true according to you if you believe that: "Nobody has yet proposed a satisfactory fixed/simple algorithm that takes as input a material description of the universe, and/or channels of sensory observation, and spits out a utility function or a decision criterion."
The thesis says that, on the object level, any specification of value has high algorithmic complexity.
In some sense, all the complexity required to specify value must be contained inside human brains; even as an object of conversation, we can't talk about anything our brains do not point to. This is why the thesis distinguishes the object-level complexity of value from meta-level complexity--the minimum program required to get an AGI to learn values. It would be a separate question to consider the minimum complexity of a function that takes as input a full description of the material universe, including humans, and outputs "value".
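One way to make this last distinction precise, stated here only as an illustrative formalization (the symbols $V$, $U$, and $K$ are introduced for this sketch, not taken from the text): let $V$ be a value function over outcomes adequate to what humans actually care about, let $U$ be a full material description of the universe including human brains, and let $K$ denote Kolmogorov complexity. The object-level thesis concerns the first quantity below; the separate question concerns the second:

$$K(V) \qquad \text{versus} \qquad \min_{P}\,\{\, |P| \;:\; P(U) = V \,\}$$

The two can come apart, because a program $P$ may simply locate the complexity already stored in the human brains described inside $U$, rather than spelling $V$ out directly.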
This question also has a task-directed analogue: given sensory observations an AGI could reasonably receive in cooperation with its programmers, or a predictive model of humans that the AGI could reasonably form and refine, is there a simple rule that will take this data as input, and safely and reliably identify Tasks on the order of "develop molecular nanotechnology, use the nanotechnology to synthesize one strawberry, and then stop, with a minimum of side effects"?
In this case we have no strong reason to think that such functions are high-complexity in an absolute sense.
However, nobody has yet proposed a satisfactory piece of pseudocode that solves any variant of this problem even in principle.
Consider a simple proposal that specifies a sense-input-dependent formulation of utility: an object-level outcome $o$ has utility $U_1(o)$ if a future sense signal $S$ is 1, and $U_2(o)$ if $S$ is 2. Given this setup, the AI has an incentive to tamper with $S$ and cause it to be 1 if $U_1$ is easier to optimize than $U_2$, and vice versa.
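A minimal toy sketch of this incentive (all names and numbers here are made up for illustration, not taken from the text):

```python
# Toy model: a sense-dependent utility definition rewards tampering with the
# sense signal whenever one branch of the definition is easier to max out.

def U1(outcome):
    # Utility used when the sense signal S ends up as 1; easy to max out.
    return 10.0 if outcome == "wirehead_sensor" else 1.0

def U2(outcome):
    # Utility used when S ends up as 2; the intended, harder-to-optimize goal.
    return 5.0 if outcome == "synthesize_strawberry" else 0.0

def utility(outcome, s):
    # The sense-input-dependent formulation of utility described above.
    return U1(outcome) if s == 1 else U2(outcome)

# Each policy determines both the final value of S and the object-level outcome.
policies = {
    "do_intended_task":   (2, "synthesize_strawberry"),
    "tamper_with_signal": (1, "wirehead_sensor"),
}

best = max(policies, key=lambda name: utility(policies[name][1], policies[name][0]))
print(best)  # -> tamper_with_signal: the definition itself rewards forcing S to 1
```

The point of the toy is only that the maximizing policy is picked by the utility definition itself; nothing about the agent needs to be "malicious" for tampering to come out on top.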
More generally, sensory signals from humans will usually not be reliably and unalterably correlated with the goals we intend them to identify. We can't treat human-generated signals as infallible ground truth about any referent, because (a) a sufficiently capable AI can affect the signal or its causes; and (b) humans make mistakes, especially when you ask them something complicated. You can't have a scheme along the lines of "the humans press a button if something goes wrong", because some policies go wrong in ways humans don't notice until it's too late, and some AI policies destroy the button (or modify the human).
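A toy sketch of why the button scheme fails as a filter (the policies and flags below are hypothetical stand-ins, not from the text):

```python
# "Approve any policy under which the button is never pressed" fails to exclude
# bad policies, because pressing the button requires both timely human notice
# and an intact button.
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    actually_bad: bool        # the policy's consequences are bad
    noticed_in_time: bool     # humans notice the problem before it's too late
    button_still_works: bool  # the policy leaves the button (and the human) intact

policies = [
    Policy("do_the_intended_task", actually_bad=False, noticed_in_time=True,  button_still_works=True),
    Policy("subtle_failure",       actually_bad=True,  noticed_in_time=False, button_still_works=True),
    Policy("disable_the_button",   actually_bad=True,  noticed_in_time=True,  button_still_works=False),
]

def button_pressed(p: Policy) -> bool:
    # The button only gets pressed if humans notice in time AND the button still works.
    return p.actually_bad and p.noticed_in_time and p.button_still_works

approved = [p.name for p in policies if not button_pressed(p)]
print(approved)
# -> ['do_the_intended_task', 'subtle_failure', 'disable_the_button']
# "No button press" lets both bad policies through, for exactly the two reasons above.
```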
Even leaving that aside, nobody has yet suggested any fully specified pseudocode that takes in a human-controlled sensory channel and a description of the universe and spits out a utility function that (actually, realistically) identifies our task over all the outcomes the AI could bring about (including not tiling the universe with subagents and so on).
Indeed, nobody has yet suggested a realistic scheme for identifying any kind of goal whatsoever in terms that actually describe the material universe. [1]
For reasons similar to those above, nobody has yet proposed (even in principle) effective pseudocode for a meta-meta program, ranging over some space of meta-rules, that would let the AI learn a value-identifying meta-rule. Two main problems here are:
One, nobody even has the seed of any proposal whatsoever for how that could work, short of "define a correctness-signaling channel and throw program induction at it" (which seems unlikely to work directly, given the problems with human-generated signals described above; see the sketch below).
Two, if the learned meta-rule doesn't have a stable, extremely compact, human-transparent representation, it's not clear how we could arrive at any confidence whatsoever that it will keep identifying what we value as the AI scales up. E.g., consider all the example meta-rules we can imagine that would work well on a small scale but fail to scale, like "something good just happened if the humans smiled".
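A toy sketch of both problems at once (the training data, candidate rules, and their ordering are invented for illustration): simplicity-weighted induction over a correctness-signaling channel tends to select the shortest rule that fits the small-scale data, and cheap proxies like "the humans smiled" fit that data just as well as the intended rule.

```python
# Small-scale training episodes: (humans_smiled, genuinely_good), where the
# correctness channel reports genuinely_good. On small scales the two coincide.
training = [
    (True, True), (True, True), (False, False), (True, True), (False, False),
]

# Candidate meta-rules, ordered by (made-up) description length, shortest first.
candidates = [
    ("humans_smiled",  lambda smiled, good: smiled),  # short proxy rule
    ("genuinely_good", lambda smiled, good: good),    # the intended rule (vastly longer in reality)
]

def fits(rule):
    # A rule fits if it reproduces the correctness channel on all training episodes.
    return all(rule(smiled, good) == good for smiled, good in training)

# Simplicity-weighted induction: pick the first (shortest) candidate that fits.
chosen_name, chosen_rule = next((name, rule) for name, rule in candidates if fits(rule))
print(chosen_name)  # -> humans_smiled

# At scale, the proxy comes apart: an AI can make humans smile without anything
# good happening, and the learned rule still calls that outcome good.
print(chosen_rule(True, False))  # -> True, even though the outcome is not genuinely good
```

And because the selected rule has no compact human-transparent representation guaranteed, we have no obvious way to notice, before scaling up, that we got the proxy rather than the intended rule.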
[1] Except in the rather non-meta sense of inspecting the AI's ontology once it's advanced enough to describe what you think you want the AI to do, and manually programming the AI's consequentialist preferences with respect to what you think that ontology means.