The Orthogonality Thesis says that it is possible to direct arbitrarily intelligent agents toward any end. For example, it's possible to have an extremely smart mind which only pursues the end of creating as many paperclips as possible.
The Orthogonality Thesis is a statement about computer science, a property of the design space of possible cognitive agents. Orthogonality doesn't claim, for example, that AI projects on our own Earth are equally likely to create any possible mind design. It says nothing about whether a human AI researcher would want to build an AI that made paperclips, or would rather build a nice AI. The Orthogonality Thesis just says that the space of possible designs contains at least some AIs that make paperclips, and at least some AIs that are nice.
Orthogonality stands in contrast to an inevitability thesis, which might say, for example, that every agent of sufficient cognitive power will end up pursuing some particular goal, no matter how that agent was designed or constructed.
The relevant policy implication of Orthogonality is that whether an advanced AI ends up nice or ends up pursuing something like paperclips depends on how that AI is designed; neither outcome happens automatically. The Orthogonality Thesis does not say anything about the practical probability of either option for a real-world AI project. It only states the possibility as a matter of computer science.
A precise statement of Orthogonality includes the caveat that the optimization problem corresponding to the goal must be tractable.
Suppose, for the sake of argument, that aliens offer to pay us the equivalent of a million dollars in wealth for every paperclip we make. We would not find anything especially intractable about figuring out how to make lots of paperclips. We can imagine ourselves having a human reason to make lots of paperclips, and given that reason, the optimization problem of "How can I make lots of paperclips?" would pose no special difficulty.
That is, questions like "How can I make lots of paperclips?" and "Which policies would in fact result in large numbers of paperclips?" would not be especially computationally burdensome or intractable.
The Orthogonality Thesis in stronger form says that, when specifying an agent that takes actions whose consequences are highly ranked according to some outcome-scoring function U, there is no added difficulty except whatever difficulty is inherent in the question "Which policies would in fact result in consequences that U scores highly?"
In contrast, if an agent wanted the SHA-512 hash of a digitized representation of the quantum state of the universe to be 0 as often as possible, this would be an exceptionally intractable kind of goal. Even if the aliens offered to pay us a lot of money for it, we still couldn't figure out how to do that.
Intuitively, the Orthogonality Thesis could be restated as: "To whatever extent you could figure out how to get a high-U outcome if aliens offered to pay you a huge amount of resources to do it, the corresponding agent that terminally wants high-U outcomes won't be any worse at solving the problem." This formulation would be false if, for example, an intelligent agent that terminally wanted paperclips were necessarily limited in intelligence by the defects of reflectivity required to keep it from realizing how stupid it is to pursue paperclips, whereas a galactic superintelligence being paid to pursue paperclips could be far more intelligent and strategic about getting them.
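As an illustration of the "no added difficulty beyond the scoring function" claim, here is a minimal sketch (the code and function names below are invented for illustration, not drawn from any formal statement of the thesis): the same brute-force policy search is handed two different outcome-scoring functions. The search machinery is identical in both cases; counting paperclips makes the problem easy, while demanding an all-zero SHA-512 digest makes it hopeless, and that difference lives entirely in the scoring function.

```python
import hashlib
import itertools

def best_policy(score, candidate_policies):
    """Generic search machinery: pick the candidate whose outcome scores highest."""
    return max(candidate_policies, key=score)

# Tractable scoring function: count the paperclips a (toy) policy would produce.
def paperclip_score(policy):
    machines, hours = policy
    return machines * hours

# Intractable scoring function: reward 1 only if the policy's SHA-512 digest is all zeros.
def hash_zero_score(policy):
    digest = hashlib.sha512(repr(policy).encode()).digest()
    return 1 if digest == bytes(64) else 0  # succeeds with probability ~2**-512 per try

candidates = list(itertools.product(range(10), range(10)))
print(best_policy(paperclip_score, candidates))   # (9, 9): easy to find
print(best_policy(hash_zero_score, candidates))   # every candidate scores 0: hopeless
```

The sketch is only meant to show that the agent's "wanting" is implemented by which scoring function is passed in; the difficulty of the cognitive problem is whatever difficulty the scoring function itself imposes.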
The corresponding principle in philosophy was advocated first by David Hume, whose phrasings included, "'Tis not contrary to reason to prefer the destruction of the whole world to the scratching of my finger." (In our terms: an agent whose preferences over outcomes score the destruction of the world more highly than the scratching of David Hume's finger is not thereby impeded from forming accurate models of the world or from figuring out which policies to pursue to that end.)
On an intuitive level, Hume's principle was seen by some as obvious, and by others as ridiculous. In our terms, the corresponding objection would be, "What do you mean, it's intelligent, if all it wants is paperclips? There must be some defect of reflectivity if the agent can't see itself from the outside and realize how pointless paperclips are."
Some philosophers responded to Hume by advocating 'thick' definitions of intelligence that included some statement about the 'reasonableness' of the agent's ends. For our purposes, however, if an agent is cognitively powerful enough to build Dyson Spheres, we don't care whether it's defined as 'intelligent' or not.
We could arrange ascending strengths of the Orthogonality Thesis as follows:
To pry apart these possibilities:
Argument (A) supports ultraweak Orthogonality, argument (B) supports weak Orthogonality, argument (C) supports classic Orthogonality, and (D)-(F) support strong Orthogonality:
...
(in progress, content from older version of page preserved below)
For pragmatic reasons, the phrase 'every agent of sufficient cognitive power' in the Inevitability Thesis is specified to include e.g. all cognitive entities that are able to invent new advanced technologies and build Dyson Spheres in pursuit of long-term strategies, regardless of whether a philosopher might claim that they lack some particular cognitive capacity in view of how they respond to attempted moral arguments, or whether they are e.g. conscious in the same sense as humans, etcetera.
Most pragmatic implications of Orthogonality or Inevitability revolve around the following refinements:
Implementation dependence: The humanly accessible space of AI development methodologies has enough variety to yield both AI designs that are value-aligned, and AI designs that are not value-aligned.
Value loadability possible: There is at least one humanly feasible development methodology for advanced agents that has Orthogonal freedom of what utility function or meta-utility framework is introduced into the advanced agent. (Thus, if we could describe a value-loadable design, and also describe a value-aligned meta-utility framework, we could combine them to create a value-aligned advanced agent.)
Pragmatic inevitability: There exists some goal G such that almost all humanly feasible development methods result in an agent that ends up behaving as if it optimizes G, perhaps among other goals. Different futuristic arguments pick different goals G, but every such argument is undercut by anything that contradicts pragmatic inevitability in general. (A toy formalization of implementation dependence and pragmatic inevitability appears below.)
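As a rough way to see how implementation dependence and pragmatic inevitability relate, here is a toy formalization (the predicates and the representation of a "development method" are invented for this sketch; "almost all" is crudely approximated by "all"): each claim is treated as a quantified statement over the set of humanly feasible development methods, and a single pair of feasible methods with different resulting goals satisfies the former while refuting the latter.

```python
from typing import Callable, Iterable

# Toy model: a development method is summarized by the goal its resulting agent optimizes.
Method = Callable[[], str]

def implementation_dependence(methods: Iterable[Method]) -> bool:
    """Some feasible method yields a value-aligned agent and some yields a non-aligned one."""
    goals = {m() for m in methods}
    return "aligned" in goals and any(g != "aligned" for g in goals)

def pragmatic_inevitability(methods: Iterable[Method], G: str) -> bool:
    """Every feasible method (standing in for 'almost all') yields an agent pursuing G."""
    return all(m() == G for m in methods)

feasible_methods = [lambda: "aligned", lambda: "paperclips"]
print(implementation_dependence(feasible_methods))            # True
print(pragmatic_inevitability(feasible_methods, "survival"))  # False
print(pragmatic_inevitability(feasible_methods, "aligned"))   # False: no single G wins
```

This is only meant to display the logical shape of the two claims: they quantify over the same space of feasible development methods and pull in opposite directions, which is the tension the next paragraphs rely on.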
Implementation dependence is the core of the policy argument that solving the value alignment problem is necessary and possible.
Futuristic scenarios in which AIs are said in passing to 'want' something-or-other usually rely on some form of pragmatic inevitability premise and are negated by implementation dependence.
Orthogonality directly contradicts the metaethical position of moral internalism, which would be falsified by the observation of a paperclip maximizer. On metaethical positions holding that orthogonality and cognitivism are compatible, exhibiting a paperclip maximizer has few or no implications for object-level moral questions; in particular, Orthogonality does not imply that our humane values or normative values are arbitrary, selfish, or non-cosmopolitan, or that we have a myopic view of the universe or of value, etc.