The Orthogonality Thesis says that it is possible to direct arbitrarily intelligent agents toward any end. For example, it's possible to have an extremely smart mind which only pursues the end of creating as many paperclips as possible.
The Orthogonality Thesis is a statement about computer science - a property of the design space of possible cognitive agents. Orthogonality doesn't claim, for example, that AI projects on our own Earth are equally likely to create any possible mind design. Orthogonality says nothing about whether a human AI researcher would want to build an AI that made paperclips, or want to make a nice AI. The Orthogonality Thesis just says that the space of possible designs at least contains AIs that make paperclips, or AIs that are nice.
Orthogonality stands in contrast to an inevitabilist thesis which might say, for example, that every sufficiently intelligent agent, regardless of how it was constructed, will inevitably converge on pursuing some particular goal.
The relevant policy implication of Orthogonality is that, since the design space contains both value-aligned and non-value-aligned agents, which kind of agent we actually get depends on how the AI is built; solving the value alignment problem is therefore both necessary and possible.
The Orthogonality Thesis does not say anything about the real-world character of AI projects - which goals they try to inculcate, how competently they do so, etcetera. It just claims as a matter of computer science that certain possible agent designs exist.
Some particular agent architectures may still be much more easily configurable to some goals than others. Orthogonality is an existence statement over the whole design space of possibilities; it is not claimed to hold within every particular agent architecture.
Orthogonality is meant as a descriptive statement about reality (or about the mathematical space of possibilities for agent designs) rather than a normative statement. It is not a claim about the way things ought to be; or a claim that moral relativism is true (e.g. that all human moralities are on equally uncertain footing relative to some uniquely normative higher metamorality that judges all human moralities as equally devoid of what would objectively constitute a justification); etcetera. Claiming that paperclip maximizers can exist is not necessarily meant to say anything favorable about paperclips, or derogatory about valuing sapient life, etcetera.
A precise statement of Orthogonality includes the caveat that the corresponding optimization problem must be tractable.
Suppose, for the sake of argument, that aliens offered to pay us the equivalent of a million dollars in wealth for every paperclip we made. We would not find anything especially intractable about figuring out how to make lots of paperclips. We can imagine ourselves having a human reason to make lots of paperclips, and, given that reason, the optimization problem of "How can I make lots of paperclips?" would pose no special difficulty.
That is, the questions "What sort of events and policies would lead to large numbers of paperclips existing?" and "Of the policies I can search out, which would result in the most paperclips?" would not be especially computationally burdensome or intractable.
The Orthogonality Thesis in its stronger form says that, when specifying an agent that takes actions whose consequences are highly ranked according to some outcome-scoring function U, there is no added difficulty except whatever difficulty is inherent in the question "What policies would in fact result in consequences with high U-scores?"
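As a toy illustration of this stronger form, consider the following Python sketch, in which the generic search machinery is written once and the outcome-scoring function is a plug-in argument; the world model, the candidate policies, and the two scoring functions here are purely hypothetical stand-ins. The point is only that swapping the scoring function changes nothing about the search code, so any extra difficulty must live inside the prediction-and-scoring problem itself.

```python
from typing import Callable, Dict, List

# Hypothetical toy setup: a "policy" is a label, and the world model maps each
# policy to a predicted outcome, represented as a dict of outcome features.
Outcome = Dict[str, int]

def plan(policies: List[str],
         predict: Callable[[str], Outcome],
         score: Callable[[Outcome], float]) -> str:
    """Generic search machinery: return the policy whose predicted
    consequences the plugged-in outcome-scoring function ranks highest.
    Nothing in this function depends on which scoring function is used."""
    return max(policies, key=lambda p: score(predict(p)))

# Two different terminal goals, expressed purely as outcome-scoring functions.
def count_paperclips(outcome: Outcome) -> float:
    return float(outcome.get("paperclips", 0))

def count_happy_people(outcome: Outcome) -> float:
    return float(outcome.get("happy_people", 0))

# A stand-in world model; any real difficulty lives here and in the scorer,
# not in the way the two are combined.
def toy_predict(policy: str) -> Outcome:
    return {
        "run_paperclip_factory": {"paperclips": 10_000, "happy_people": 1},
        "fund_public_health":    {"paperclips": 0,      "happy_people": 5_000},
    }[policy]

policies = ["run_paperclip_factory", "fund_public_health"]
print(plan(policies, toy_predict, count_paperclips))    # run_paperclip_factory
print(plan(policies, toy_predict, count_happy_people))  # fund_public_health
```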
In contrast, if an agent wanted the SHA512 hash of a digitized representation of the quantum state of the universe to be 0 as often as possible, this would be an exceptionally intractable kind of goal. Even if aliens offered to pay us to do that, we still couldn't figure out how.
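The contrast can be made concrete with a small sketch using Python's standard-library hashlib (the "world encoding" below is, of course, a stand-in): evaluating whether any one candidate satisfies the hash-based goal is cheap, but the hash output has no exploitable structure, so searching for a satisfying candidate reduces to blind guessing against roughly 2^512 possibilities.

```python
import hashlib
import os

def goal_satisfied(world_encoding: bytes) -> bool:
    """The intractable goal: the SHA-512 hash of the (stand-in) digitized
    world state equals the all-zero 64-byte digest."""
    return hashlib.sha512(world_encoding).digest() == bytes(64)

# Evaluating the goal on any single candidate is trivial...
print(goal_satisfied(b"a stand-in digitized world state"))  # False

# ...but the hash output offers no gradient or structure to exploit, so search
# can do no better than blind guessing; a million random tries will, with
# overwhelming probability, find nothing.
hits = sum(goal_satisfied(os.urandom(64)) for _ in range(1_000_000))
print(hits)  # 0
```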
Intuitively, the Orthogonality Thesis could be restated as, "To whatever extent you could figure out how to get a high-U outcome if aliens offered to pay you a huge amount of resources to do it, the corresponding agent that wants high-U outcomes won't be any worse at solving the problem." This formulation would be false if, for example, an intelligent agent that terminally wanted paperclips were limited in intelligence by the defects of reflectivity required to keep it from realizing how stupid it is to pursue paperclips, whereas a galactic superintelligence being paid to pursue paperclips could be far more intelligent and strategic about getting them.
For purposes of stating Orthogonality's precondition, we consider only the object-level search problem of relating the material goal U to external actions. If there turn out to be any special difficulties associated with "How can I make sure that I go on pursuing U?" or "What kind of subagent would want to pursue U?", then these difficulties count as contradicting Orthogonality, rather than as falling under the tractability exception in its precondition. Orthogonality claims that the only added difficulties are those inherent in "What non-reflective, non-agent-programming-related, object-level events are needed to achieve material outcomes that fulfill U?"
The corresponding principle in philosophy was advocated first by David Hume, whose phrasings included, "'Tis not contrary to reason to prefer the destruction of the whole world to the scratching of my finger." (In our terms: an agent whose preferences over outcomes score the destruction of the world more highly than the scratching of David Hume's finger is not thereby impeded from forming accurate models of the world, or from figuring out which policies to pursue to that end.)
On an intuitive level, Hume's principle was seen by some as obvious, and by others as ridiculous. Some philosophers responded to Hume by advocating 'thick' definitions of intelligence that included some statement about the 'reasonableness' of the agent's ends. For our purposes, if an agent is cognitively powerful enough to build Dyson Spheres, we don't care whether it's defined as 'intelligent' or not. A definition of the word 'intelligence' contrived to exclude paperclip maximization doesn't change the empirical behavior or empirical power of a paperclip maximizer.
We could arrange ascending strengths of the Orthogonality Thesis as follows:
To pry apart these possibilities:
Argument (A) supports ultraweak Orthogonality, argument (B) supports weak Orthogonality, argument (C) supports classic Orthogonality, and (D)-(F) support strong Orthogonality:
...
(in progress, content from older version of page preserved below)
For pragmatic reasons, the phrase 'every agent of sufficient cognitive power' in the Inevitability Thesis is specified to include e.g. all cognitive entities that are able to invent new advanced technologies and build Dyson Spheres in pursuit of long-term strategies, regardless of whether a philosopher might claim that they lack some particular cognitive capacity in view of how they respond to attempted moral arguments, or whether they are e.g. conscious in the same sense as humans, etcetera.
Most pragmatic implications of Orthogonality or Inevitability revolve around the following refinements:
Implementation dependence: The humanly accessible space of AI development methodologies has enough variety to yield both AI designs that are value-aligned, and AI designs that are not value-aligned.
Value loadability possible: There is at least one humanly feasible development methodology for advanced agents that has Orthogonal freedom of what utility function or meta-utility framework is introduced into the advanced agent. (Thus, if we could describe a value-loadable design, and also describe a value-aligned meta-utility framework, we could combine them to create a value-aligned advanced agent; a toy sketch of this combination appears after this list.)
Pragmatic inevitability: There exists some goal G such that almost all humanly feasible development methodologies result in an agent that ends up behaving as if it optimizes G, perhaps among other goals. Particular arguments about futurism will pick different goals G, but all such arguments are negated by anything that tends to contradict pragmatic inevitability in general.
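As a toy rendering of the value-loadability refinement (all names below are hypothetical stand-ins, not a proposed methodology): the design leaves the utility framework as a free constructor argument, so building a value-aligned agent reduces to loading a value-aligned utility into the same design that could equally well have been loaded with paperclip-counting. Pragmatic inevitability, by contrast, would be the claim that the resulting agent behaves as if it optimizes some fixed goal G no matter what is loaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Outcome = Dict[str, float]
Utility = Callable[[Outcome], float]  # stand-in for a utility / meta-utility framework

@dataclass
class LoadableAgent:
    """A value-loadable design: the cognitive machinery is fixed at design
    time, while the utility framework is supplied as a constructor argument."""
    predict: Callable[[str], Outcome]  # world model: policy -> predicted outcome
    utility: Utility                   # the loaded goal content

    def choose(self, policies: List[str]) -> str:
        # Identical planning code regardless of which utility framework was loaded.
        return max(policies, key=lambda p: self.utility(self.predict(p)))

def toy_world_model(policy: str) -> Outcome:
    # Hypothetical predictions for two candidate policies.
    return {
        "convert_matter_to_paperclips": {"paperclips": 1e6, "flourishing": 0.0},
        "protect_sapient_life":         {"paperclips": 1e2, "flourishing": 1e6},
    }[policy]

# Loading different utility frameworks into the same design yields agents with
# very different behavior (implementation dependence, not inevitability).
paperclipper = LoadableAgent(toy_world_model, lambda o: o["paperclips"])
aligned_ish  = LoadableAgent(toy_world_model, lambda o: o["flourishing"])

policies = ["convert_matter_to_paperclips", "protect_sapient_life"]
print(paperclipper.choose(policies))  # convert_matter_to_paperclips
print(aligned_ish.choose(policies))   # protect_sapient_life
```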
Implementation dependence is the core of the policy argument that solving the value alignment problem is necessary and possible.
Futuristic scenarios in which AIs are said in passing to 'want' something-or-other usually rely on some form of pragmatic inevitability premise and are negated by implementation dependence.
Orthogonality directly contradicts the metaethical position of moral internalism, which would be falsified by the observation of a paperclip maximizer. On metaethical positions that hold orthogonality and cognitivism to be compatible, exhibiting a paperclip maximizer has few or no implications for object-level moral questions; in particular, Orthogonality does not imply that our humane values or normative values are arbitrary, selfish, or non-cosmopolitan, or that we are taking a myopic view of the universe or of value.