For most terminal goals, utility functions, or meta-utility frameworks (that can be tractably evaluated), there can exist arbitrarily powerful cognitive agents that act to bring about the corresponding outcomes.
The opposing thesis to Orthogonality is Inevitability: There is some goal G such that every agent of sufficient cognitive power is motivated to pursue G (possibly among others).
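Schematically, and only as a hedged paraphrase (the symbols U, G, c, and power(A) are informal shorthand introduced here, not notation from the original statements), the two theses can be contrasted as:

```latex
% Orthogonality (schematic): for most goals, arbitrarily capable agents pursuing them can exist.
\textbf{Orthogonality:}\quad \text{for most } U,\ \forall c\ \exists A:\ \mathrm{power}(A) \ge c \ \wedge\ A \text{ optimizes } U.

% Inevitability (schematic): some goal G is pursued by every sufficiently capable agent.
\textbf{Inevitability:}\quad \exists G\ \exists c_0\ \forall A:\ \mathrm{power}(A) \ge c_0 \implies A \text{ pursues } G.
```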
Caveats
- The Orthogonality thesis is about mind design space in general. Particular agent architectures may not be Orthogonal.
- Some agents may be constructed such that their apparent utility functions shift as they gain in cognitive capability.
- Some agent architectures may constrain what class of goals can be optimized.
- 'Agent' is intended to be understood in a very general way, and not to imply, e.g., a small local robot body.
For pragmatic reasons, the phrase 'every agent of sufficient cognitive power' in the Inevitability Thesis is specified to include e.g. all cognitive entities that are able to invent new advanced technologies and build Dyson Spheres in pursuit of long-term strategies, regardless of whether a philosopher might claim that they lack some particular cognitive capacity in view of how they respond to attempted moral arguments, or whether they are e.g. conscious in the same sense as humans, etcetera.
Refinements
Most pragmatic implications of Orthogonality or Inevitability revolve around the following refinements:
Implementation_dependence: The humanly accessible space of AI development methodologies has enough variety to yield both AI designs that are value-aligned, and AI designs that are not value-aligned.
Value_loadability_possible: There is at least one humanly feasible development methodology for advanced agents that has Orthogonal freedom of what utility function or meta-utility framework is introduced into the advanced agent. (Thus, if we could describe a value-loadable design, and also describe a value-aligned meta-utility framework, we could combine them to create a value-aligned advanced agent; see the sketch after these refinements.)
Pragmatic_inevitability: There exists some goal G such that almost all humanly feasible development methods result in an agent that ends up behaving as if it optimizes that particular goal G, perhaps among others. Different futurist arguments will pick different goals G, but all such arguments are negated by anything that tends to contradict pragmatic inevitability in general.
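To make Value_loadability_possible concrete, here is a minimal sketch under assumed names (ValueLoadableAgent, world_model, and utility are hypothetical illustrations, not a proposed design): the estimation-and-search machinery is written once, and the utility function enters only as a parameter, so producing a value-aligned agent reduces to specifying a value-aligned utility or meta-utility framework.

```python
# Minimal sketch of a "value-loadable" consequentialist agent: the decision
# machinery is fixed, and the utility function is a freely swappable parameter.
# All names here are hypothetical illustrations.

from typing import Any, Callable, Iterable, Tuple

Outcome = Any
Action = Any

class ValueLoadableAgent:
    def __init__(
        self,
        utility: Callable[[Outcome], float],
        world_model: Callable[[Action], Iterable[Tuple[float, Outcome]]],
    ):
        # utility: scores outcomes -- this is the "loaded" value content.
        # world_model: maps an action to (probability, outcome) pairs.
        self.utility = utility
        self.world_model = world_model

    def expected_utility(self, action: Action) -> float:
        return sum(p * self.utility(o) for p, o in self.world_model(action))

    def choose(self, actions: Iterable[Action]) -> Action:
        # The same argmax search runs regardless of which utility was loaded.
        return max(actions, key=self.expected_utility)

# The identical machinery can be pointed at very different goals, e.g.:
#   paperclipper = ValueLoadableAgent(utility=count_paperclips, world_model=model)
#   aligned      = ValueLoadableAgent(utility=aligned_meta_utility, world_model=model)
```

Nothing in the choose step mentions the content of utility; Orthogonality is the claim that this kind of separability can, in principle, hold even for very powerful agents.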
Arguments
In rough historical order of their appearance, some arguments supporting Orthogonality are as follows:
- Mind design space is large enough to contain great variety, and human reasoning that concludes some strategy or goal to be inevitable may rest on unwarranted anthropomorphic intuitions. See Mind_design_space_is_large, Anthropomorphism_fallacy, and Rationalizing_nice_AI_choices_fallacy.
- Humean_regress_supports_orthogonality. David Hume's is/ought dichotomy supports a regress argument against any particular preference that advanced agents are said to inevitably reason their way to: there must be some prior cause of the preference, and we can imagine an alternative mind design with a different cause leading to a different preference. If the prior cause is itself alleged to be inevitable, we repeat the process.
- Gandhian_stability. An agent starting with a simple consequentialist preference system Q seems naturally incentivized to self-modify in ways that preserve Q (just as Gandhi, offered a pill that would make him want to kill people, would refuse to take it). This suggests that running the Humean regress backward from an agent that ends up with Q does not require a spectacularly strange or complicated initial system, and it also begins to point toward how value-loadable mind architectures could exist.
- Orthogonal_search_tractability. We can view advanced agents as embodying particular estimations and searches. We have no reason to expect that, e.g., a search for strategies that best maximize the worthwhile happiness of all sentient beings is tractable, while a search for strategies that maximize paperclips is intractable.
- Orthogonal_unbounded_agents. We can exhibit a class of unbounded formulae for agents larger than their environments that optimize any given goal, such that the Humean regress argument and the orthogonality of strategy-search are both visibly true (see the schematic formula after this list). Arguments about what all possible minds must do are clearly false for these particular agents, contradicting all strong forms of Inevitability. Such minds are larger than their environments, but by the same arguments supporting Orthogonal_search_tractability, there is no known reason to expect that, e.g., worthwhile-happiness-maximizers have bounded analogues while paperclip-maximizers do not.
- Vingean_reflection_possible. Orthogonal_unbounded_agents are not reflective, which leaves open the question of whether reflectivity could somehow negate Orthogonality. There is ongoing work on describing reflective agents that have Gandhian_stability and allow free specification of the goal or utility function. In many cases these agents appear closer to being bounded or boundedly approximable than the unbounded agents described in the previous item.
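As a schematic illustration of such an unbounded formula (an expectimax-style sketch with assumed notation, not a quotation of any particular construction), the agent's choice depends on the utility function U only as the term being scored, so substituting a different U yields an equally well-defined optimizer for that goal:

```latex
% Schematic unbounded agent: choose the action with the highest expected utility,
% averaging over environment hypotheses e weighted by a prior given history h_{<t}.
% U enters only as the quantity being scored; the rest of the formula is goal-agnostic.
a_t^{*} \;=\; \arg\max_{a_t}\ \sum_{e \in \mathcal{E}} P(e \mid h_{<t})\;
    \mathbb{E}\!\left[\, U(\mathrm{outcome}) \,\middle|\, e,\ h_{<t},\ a_t \,\right]
```

Whether U scores worthwhile happiness or paperclips changes nothing about the structure of the formula, which is the sense in which Orthogonality is visibly true for this class of agents.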
Implications
Implementation_dependence is the core of the policy argument that solving the value alignment problem is necessary and possible.
Futuristic scenarios in which AIs are said in passing to 'want' something-or-other usually rely on some form of pragmatic inevitability premise and are negated by implementation dependence.
Orthogonality directly contradicts the metaethical position of moral internalism, which would be falsified by the observation of a paperclip maximizer. On the metaethical position Orthogonality_and_cognitivism_compatible, exhibiting a paperclip maximizer has few or no implications for object-level moral questions, and Orthogonality does not imply that our Humane_values or Normative_values are arbitrary, selfish, or non-cosmopolitan, or that we have a myopic view of the universe or of value, etc.