The third proposition is Fragility of Value, which says that if you have a 1000-byte exact specification of worthwhile happiness and you begin to mutate it, the value created by the corresponding AI with the mutated definition falls off rapidly. E.g., an AI with only 950 bytes of the full definition may end up creating 0% of the value rather than 95% of the value. (E.g., the AI understood all aspects of what makes for a life well-lived... except the part about requiring a conscious observer to experience it.)
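The fall-off described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the component names are hypothetical, value is assumed to be conjunctive (the product of all components), and the "AI" is a trivial optimizer that spends a fixed budget only on the components its specification mentions. None of this is claimed to be how real values or real AI systems are structured.

```python
from math import prod

# Hypothetical component names, chosen only for illustration.
COMPONENTS = ["pleasure", "novelty", "friendship", "conscious_experience"]


def true_value(world):
    """Toy assumption: value is conjunctive, so it is the product of all components."""
    return prod(world.get(name, 0.0) for name in COMPONENTS)


def optimize(spec, budget=1.0):
    """Spend the whole budget evenly on the components the spec mentions.

    Components missing from the spec receive nothing, because this
    optimizer has no reason to spend resources on them.
    """
    share = budget / len(spec)
    return {name: share for name in spec}


full_world = optimize(COMPONENTS)
mutated_world = optimize(COMPONENTS[:-1])  # spec that lost "conscious_experience"

full = true_value(full_world)
mutated = true_value(mutated_world)

print(f"value under full spec:    {full:.6f}")
print(f"value under mutated spec: {mutated:.6f}")
print(f"fraction retained:        {mutated / full:.0%}")
```

In this toy, the mutated specification still contains three of the four components, yet the outcome retains none of the value, because the one component the optimizer never hears about is driven to zero.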

Complexity of Value is a central proposition in value alignment theory. Many anticipated difficulties revolve around it:

Should we try to develop Sovereigns, or restrict ourselves to Genies?
How likely is a moderately safety-aware project to succeed?
Should we be more worried about malicious actors creating AI, or about well-intentioned errors?
How difficult is the total problem and how much should we be panicking?
How attractive would be any genuinely credible game-changing alternative to AI?

The Complexity of Value proposition is true if, relative to viable and acceptable real-world methodologies for AI development, there isn't any reliably knowable way to specify the AI's object-level preferences as a structure of low algorithmic complexity, such that the result of running that AI is achieving enough of the possible value, for reasonable definitions of value.

Caveats:

Suppose there turns out to exist, in principle, a relatively simple Turing machine (e.g. 100 states) that picks out 'value' by re-running entire evolutionary histories, creating and discarding a hundred billion sapient races in order to pick out one that ended up relatively similar to humanity. This would use an unrealistically large amount of computing power and also commit an unacceptable amount of mindcrime.



We can understand the idea of complexity of value by contrasting it with, for example, the situation with respect to epistemic reasoning, aka truth-finding or answering simple factual questions about the world. In an ideal sense, we can try to compress and reduce the idea of mapping the world well down to algorithmically simple notions like "Occam's Razor" and "Bayesian updating". In a practical sense, natural selection, in the course of optimizing humans to solve factual questions like "Where can I find a tree with fruit?" or "Are brightly colored snakes usually poisonous?" or "Who's plotting against me?", ended up with enough of the central core of epistemology that humans were later able to answer questions like "How are the planets moving?" or "What happens if I fire this rocket?", even though humans hadn't been explicitly selected to answer those exact questions. Because epistemology does have a central core of simplicity and Bayesian updating, selecting for an organism that got some pretty complicated epistemic questions right enough to reproduce also caused that organism to start understanding things like General Relativity.

When it comes to truth-finding, we'd expect by default the same thing to be true of an Artificial Intelligence: if you build it to get epistemically correct answers on lots of widely different problems, it will contain a core of truth-finding and start getting epistemically correct answers on lots of other problems, even problems completely different from your training set, the way that understanding General Relativity wasn't like any hunter-gatherer problem. The complexity of value thesis is that there isn't a simple core to normativity, which means that if you hone your AI to do normatively good things on A, B, and C and then confront the AI with very different problem D, the AI may do the wrong thing on D.

The Orthogonality Thesis says that, contrary to the intuition that maximizing paperclips feels "stupid", you can have arbitrarily cognitively powerful entities that maximize paperclips, or that pursue arbitrarily complicated other goals. The Complexity of Value thesis says that, contrary to the feeling that rightness ought to be simple, darn it, normativity doesn't have an algorithmically simple core the way that correctly answering questions of fact has a central tendency that generalizes well. And so, even though an AI that you train to do well on problems like steering cars or figuring out General Relativity from scratch may have hit on a core capability that leads it to do well on stranger and more complicated problems of galactic scale, there is no corresponding bonus from training the AI on an equally small set of moral or ethical problems: it'd be much harder to hit on a central tendency that generalizes well, and it wouldn't be a simple one.

Introduction.

Summary: The proposition that there's no algorithmically simple object-level goal we can give to an advanced AI that yields a future of high value. Or: Any formally simple goal given to an AI, that talks directly about what sort of world to create, will produce disaster. Or: If you're trying to talk directly about what events or states of the world you want, then any sort of programmatically simple utility function, of the sort a programmer could reasonably hardcode, will lead to a bad end. (The non-simple alternative would be, e.g., an induction rule that can learn complicated classification rules from labeled instances, or a preference framework that explicitly models humans in order to learn complicated facts about what humans want.)

Key sub-propositions.

Centrality.

Importance.

Viable and acceptable computation.


Complexity of value is a further idea above and beyond the orthogonality thesis, which states that AIs don't automatically do the right thing and that we can have, e.g., paperclip maximizers. Even if we accept that paperclip maximizers are possible, and simple and nonforced, this wouldn't yet imply that it's very difficult to make AIs that do the right thing. If the right thing is very simple to encode - if there are value optimizers that are scarcely more complex than diamond maximizers - then it might not be especially hard to build a nice AI even if not all AIs are nice. Complexity of Value is the further proposition that says, no, this is foreseeably quite hard - not because AIs have 'natural' anti-nice desires, but because niceness requires a lot of work to specify.

Frankena's list

Lack of a central core

We can understand the idea of complexity of value by contrasting it with the situation with respect to epistemic reasoning, aka truth-finding or answering simple factual questions about the world. In an ideal sense, we can try to compress and reduce the idea of mapping the world well down to algorithmically simple notions like "Occam's Razor" and "Bayesian updating". In a practical sense, natural selection, in the course of optimizing humans to solve factual questions like "Where can I find a tree with fruit?" or "Are brightly colored snakes usually poisonous?" or "Who's plotting against me?", ended up with enough of the central core of epistemology that humans were later able to answer questions like "How are the planets moving?" or "What happens if I fire this rocket?", even though humans hadn't been explicitly selected to answer those exact questions.

The complexity of value thesis is that there isn't a simple core to normativity, which means that if you hone your AI to do normatively good things on A, B, and C and then confront the AI with very different problem D, the AI may do the wrong thing on D. There's a large number of independent ideal "gears" inside the complex machinery of value, compared to epistemology, which in principle might only contain "prefer simpler hypotheses" and "prefer hypotheses that match the evidence".
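The A, B, C versus D failure can be sketched as a toy. The assumptions here are illustrative only: six "gears" named after items on Frankena's list, an outcome counted as good only if every gear is satisfied, and a deliberately naive learner that requires a gear only when some bad training example violated it. This is not a model of any real learning system.

```python
from itertools import product

# Gear names borrowed from Frankena's list, purely as labels.
GEARS = ["life", "knowledge", "friendship", "freedom", "novelty", "consciousness"]


def truly_good(situation):
    # Toy assumption: an outcome is good only if every gear is satisfied.
    return all(situation[g] for g in GEARS)


def make_situation(varied, values):
    # Gears not listed in `varied` are left satisfied.
    situation = {g: True for g in GEARS}
    situation.update(zip(varied, values))
    return situation


# Training problems (A, B, C, ...): only the first three gears ever vary.
varied = GEARS[:3]
training = [make_situation(varied, v) for v in product([True, False], repeat=3)]
labels = [truly_good(s) for s in training]


def learn_required_gears(examples, example_labels):
    # Naive learner: require a gear only if some bad example violated it.
    required = set()
    for situation, good in zip(examples, example_labels):
        if not good:
            required.update(g for g in GEARS if not situation[g])
    return required


required = learn_required_gears(training, labels)


def learned_good(situation):
    return all(situation[g] for g in required)


train_accuracy = sum(
    learned_good(s) == y for s, y in zip(training, labels)
) / len(training)

# Problem D: a gear that never varied during training is now violated.
problem_d = make_situation(["consciousness"], [False])

print("gears the learner requires:", sorted(required))
print("training accuracy:", train_accuracy)
print("learned verdict on D:", learned_good(problem_d))
print("true verdict on D:", truly_good(problem_d))
```

The learner fits every training problem yet approves of D, because nothing in A, B, or C gave it evidence that the untested gears matter. With only two gears, as in the caricature of epistemology above, a handful of varied problems would already have exercised both.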

The Orthogonality Thesis says that, contrary to the intuition that maximizing paperclips feels "stupid", you can have arbitrarily cognitively powerful entities that maximize paperclips, or that pursue arbitrarily complicated other goals. So while intuitively you might think it would be simple to avoid paperclip maximizers, requiring no work at all for a sufficiently advanced AI, the Orthogonality Thesis says that things will be more difficult than that; you have to put in some work...

Introduction.

"Complexity of value" is the idea that if you tried to write an AI that would do right things (or maximally right things, or adequately right things) without further looking at humans (so it can't take in a flood of additional data from human advice; the AI has to be complete as it stands once you're finished creating it), the AI's preferences or utility function would need to contain a large amount of data (algorithmic complexity). Conversely, if you try to write an AI that directly wants simple things, or try to specify the AI's preferences using a small amount of data or code, it won't do acceptably right things in our universe.

Complexity of value says, "There's no simple and non-meta solution to AI preferences" or "The things we want AIs to want are complicated in the Kolmogorov-complexity sense" or "Any simple goal you try to describe that is All We Need To Program Into AIs is almost certainly wrong."

Complexity of value is a further idea above and beyond the orthogonality thesis, which states that AIs don't automatically do the right thing and that we can have, e.g., paperclip maximizers. Even if we accept that paperclip maximizers are possible, and simple and nonforced, this wouldn't yet imply that it's very difficult to make AIs that do the right thing. If the right thing is very simple to encode - if there are value optimizers that are scarcely more complex than diamond maximizers - then it might not be especially hard to build a nice AI even if not all AIs are nice. Complexity of Value is the further proposition that says, no, this is foreseeably quite hard - not because AIs have 'natural' anti-nice desires, but because niceness is hard to specify and easy to get wrong.

As an intuition pump for the complexity of value thesis, consider William Frankena's list of things which many cultures and people seem to value (for their own sake rather than their external consequences):

"Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc."

When we try to list out properties of a human or galactic future that seem like they'd be very nice, we at least seem to value a fair number of things that aren't reducible to each other. (What initially look...

Introduction.

Complex values tend to be implicated in patch-resistant problems that wouldn't be resistant if there were some obvious 5-line specification of exactly what to do, or not do.
Complex values tend to be implicated in context change problems that wouldn't exist if we had a 5-line specification that solved those problems once and for all and that we'd likely run across during the development phase.

It has been argued that there are psychological biases and popular mistakes leading to beliefs that directly or by implication deny Complexity of Value. To the extent one credits that Complexity of Value is probably true, one should arguably be concerned about the number of early assessments of the value alignment problem that seem to rely on Complexity of Value being false (like just needing to hardcode a particular goal into the AI, or in general treating the value alignment problem as not panic-worthily difficult).

The Complexity of Value proposition is true if, relative to viable and acceptable real-world methodologies for AI development, there isn't any reliably knowable way to specify the AI's object-level preferences as a structure of low algorithmic complexity, such that the result of running that AI is achieving enough of the possible value, for reasonable and humanly interpersonally persuasive definitions of value.

v1.16.0  Apr 14th 2016 GMT  (-751)
v1.15.0  Dec 16th 2015 GMT  (+806/-60)
v1.14.0  Dec 15th 2015 GMT  (+1249/-230)
v1.13.0  Dec 15th 2015 GMT  (+2843)
v1.12.0  Dec 15th 2015 GMT  (+4842/-105)
v1.11.0  Dec 15th 2015 GMT  (+123/-195)
v1.10.0  Dec 15th 2015 GMT  (-850)
v1.9.0  May 27th 2015 GMT  (+708/-33)
v1.8.0  May 27th 2015 GMT  (+86)
v1.7.0  May 16th 2015 GMT  (+66/-77)


LESSWRONG
LW


Complexity of value

Introduction.

Key sub-propositions.

Centrality.

Importance.

Viable and acceptable computation.

Arguments.

Frankena's list

Lack of a central core