Personal Blog

# The orthogonality thesis and its relation to existing meta-ethical debates

In the field of AI alignment theory, the orthogonality thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of motivation. The opposite thesis, which we may call the heterogonality thesis, asserts that, with enough intelligence, any possible agent would pursue only one set of motivations.

In the field of meta-ethics, moral internalism asserts that any possible agent who holds a moral judgment is motivated to act on this judgment. For example, according to moral internalism, any agent who holds that one ought to donate 10% of one's income to charity is motivated to do so.

Also in the field of meta-ethics, moral realism asserts that some moral judgments are objectively correct. This is a form of moral cognitivism, which holds that moral judgments are factual statements that can be objectively correct or incorrect (anti-realist cognitivism is error theory, which asserts that all moral judgments are incorrect).

It's easy to see that the heterogonality thesis is moral internalism plus moral realism. A moral realist would say that, with enough intelligence, any possible agent would discover objective morality and hold only one set of moral judgments that is objectively correct. Therefore, a moral realist who is also a moral internalist would support the heterogonality thesis by saying that this means that, with enough intelligence, any possible agent would be motivated to act by only one set of moral judgments, and thus would pursue only one set of motivations.
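The identification above can be written compactly. This is only a sketch, and the symbols are shorthand introduced here: $R$ stands for moral realism, $I$ for moral internalism, $H$ for the heterogonality thesis, and $O = \neg H$ for the orthogonality thesis. Taking the first sentence of the paragraph at face value, De Morgan's law gives:

$$H \iff R \land I \qquad\Longrightarrow\qquad O \iff \neg R \lor \neg I$$

Under that reading, a philosopher counts as accepting the orthogonality thesis as soon as they reject moral realism or reject moral internalism.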

This is why, even though the orthogonality thesis is a recent concept known only within small circles of AI alignment theorists (and I had to invent the term for its negation myself), we can try to estimate how many philosophers accept the orthogonality thesis.

# The PhilPapers survey

The PhilPapers Survey was a survey of professional philosophers and others on their philosophical views, carried out in November 2009. The Survey was taken by 3226 respondents, including 1803 philosophy faculty members and/or PhDs and 829 philosophy graduate students.

It included three questions on meta-ethics: whether one accepts moral realism or moral anti-realism, whether one accepts moral internalism or moral externalism, and whether one accepts moral cognitivism or moral non-cognitivism. (There was also a question on normative ethics, whether one accepts virtue ethics, deontology, or consequentialism. It is not relevant to the orthogonality thesis.)

Each question is divided between multiple options: one for every position, plus an "Other" option for people for whom the question is too unclear to answer, agnostics, people insufficiently familiar with the issue, etc.

# Methodology

The methodology is implemented by a bash script which is available in the appendix. It downloads the answers of public respondents to the PhilPapers survey, extracts their opinions on meta-ethics, excludes philosophers who picked "Other" options (because we can't know if they accept the orthogonality thesis), and then computes the percentage of philosophers (with a knowable opinion) who accept the orthogonality thesis.
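The decision rule for a single respondent can be sketched on its own, separately from the downloading and HTML parsing. The function name `classify` is hypothetical and not part of the appendix script; the answer strings are the ones the appendix script uses, with `other` standing for any "Other" option.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): classify one respondent
# from three already-extracted answers.
# Prints "excluded", "heterogonalist", or "orthogonalist".
classify() {
    local realism="$1" judgment="$2" motivation="$3"

    # Any "other" answer means the opinion is not knowable, so skip it.
    if [ "$realism" = other ] || [ "$judgment" = other ] || [ "$motivation" = other ]; then
        echo excluded
        return
    fi

    # Only realist + cognitivist + internalist respondents are counted
    # as rejecting the orthogonality thesis.
    if [ "$realism" = "moral realism" ] &&
       [ "$judgment" = "cognitivism" ] &&
       [ "$motivation" = "internalism" ]; then
        echo heterogonalist
    else
        echo orthogonalist
    fi
}

classify "moral realism"      "cognitivism"     "internalism"
classify "moral anti-realism" "non-cognitivism" "externalism"
classify "moral realism"      "other"           "internalism"
```

The three calls cover one case of each kind: a respondent counted against the orthogonality thesis, one counted for it, and one left out of the total.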

# Results

66% of philosophers (with a knowable opinion) accept the orthogonality thesis. This is about two thirds of philosophers.
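As a rough illustration of how such a percentage falls out of the tallies, here is a sketch that uses `awk` instead of the `bc` and `python3` steps in the appendix. The function name `percent_orthogonalists` is hypothetical, the input layout (a count, a space, then three tab-separated opinions) is assumed to match the `pps_meo` file the script builds, and the sample counts are made up for the example: they are not survey results.

```shell
#!/bin/bash
# Hypothetical helper: print the whole-number percentage of respondents who
# are NOT realist + cognitivist + internalist.
# Assumed input layout per line: "<count> <realism>\t<judgment>\t<motivation>"
percent_orthogonalists() {
    awk -F'\t' '
        {
            # Field 1 holds "<count> <realism answer>"; split the two apart.
            count = $1;   sub(/ .*/, "", count)
            realism = $1; sub(/^[0-9]+ /, "", realism)
            total += count
            if (!(realism == "moral realism" && $2 == "cognitivism" && $3 == "internalism"))
                ortho += count
        }
        END { if (total > 0) printf "%.0f\n", 100 * ortho / total }
    ' "$1"
}

# Made-up sample: 3 + 5 + 2 = 10 respondents, 7 of them outside the
# realist + cognitivist + internalist group.
sample=$(mktemp)
printf '3 moral realism\tcognitivism\tinternalism\n'           > "$sample"
printf '5 moral realism\tcognitivism\texternalism\n'          >> "$sample"
printf '2 moral anti-realism\tnon-cognitivism\tinternalism\n' >> "$sample"
percent_orthogonalists "$sample"
rm -f "$sample"
```

Formatting with `%.0f` rounds to a whole percent, which is the precision the figure above is reported at.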

# Appendix: Source code of the script

#!/bin/bash
# WARNING: This script creates many new files.
# It is highly recommended to be in an empty folder when executing it.

function opinion() {
    # Usage: opinion <file> <question> <position A> <position B>
    # Example: opinion 42 "Meta-ethics" "moral realism" "moral anti-realism"
    # Prints the position found in the respondent's answer, or "other".
    str="$2: $3 or $4?<\/td><td bgcolor='#......' style='width:250px'>"
    answer=$(grep -o "$str[-A-Za-z/: ]*" "$1" | sed "s/$str//")
    r=other
    if grep "$3" <<< "$answer" > /dev/null; then r=$3; fi
    # Checked second on purpose: "non-cognitivism" also contains "cognitivism".
    if grep "$4" <<< "$answer" > /dev/null; then r=$4; fi
    echo "$r"
}

function metaethical_opinions() {
    # Usage: metaethical_opinions <file>
    # Example: metaethical_opinions 42
    metaethics=$(opinion "$1" "Meta-ethics"      "moral realism"  "moral anti-realism")
    mjudgement=$(opinion "$1" "Moral judgment"   "cognitivism"    "non-cognitivism")
    motivation=$(opinion "$1" "Moral motivation" "internalism"    "externalism")
    # Tab-separated, as the grep -P pattern further down expects.
    printf '%s\t%s\t%s\n' "$metaethics" "$mjudgement" "$motivation"
}

# Download the list of public respondents once.
if ! [ -e public_respondents.html ]; then
    wget https://philpapers.org/surveys/public_respondents.html
fi

# Build pps_meo: one line per combination of opinions, prefixed by its count.
if ! [ -e pps_meo ]; then
    for profile in $(sed "s/'/\n/g" public_respondents.html | grep "https://philpapers.org/profile/"); do
        id=$(cut -d/ -f5 <<< "$profile")
        if ! [ -e "$id" ]; then
            wget "$profile" -O "$id"
        fi
        metaethical_opinions "$id"
    done | sort | uniq -c | grep -v other | sed 's/^ *//' > pps_meo
fi

# Everyone except realist + cognitivist + internalist counts as an orthogonalist.
orthogonalists=$(grep -v -P "moral realism\tcognitivism\tinternalism" pps_meo | cut -d' ' -f1 | paste -sd+ | bc)
philosophers=$(cut -d' ' -f1 pps_meo | paste -sd+ | bc)

python3 << EOF
print("{}% of philosophers (with a knowable opinion) accept the orthogonality thesis.".format($orthogonalists/$philosophers * 100))
EOF

> 66.42512077294685%

This should not be reported this way. It should be reported as something like 66%. The other digits are not meaningful.

Yes, you're right, some people raised this in the /r/ControlProblem subreddit. I fixed this.

> The PhilPapers Survey was a survey of professional philosophers and others on their philosophical views, carried out in November 2009.

Given that the phrase "orthogonality thesis" was not coined until 2012, I doubt the usefulness of this data set in determining current philosophical consensus around it.

Yes, this is the whole point of the first part of the article.

Moral realism plus moral internalism does not imply heterogonality. Just because there is an objectively correct morality, does not mean that any sufficiently powerful optimization process would believe that that morality is correct.

Because?

When people say that a morality is "objectively correct", they generally don't mean to imply that it is supported by "universally compelling arguments". What they do mean might be a little hard to parse, and I'm not a moral realist and don't claim to be able to pass their ITT, but in any case it seems to me that the burden of proof is on the one who claims that their position does imply heterogonality.

> When people say that a morality is "objectively correct", they generally don't mean to imply that it is supported by "universally compelling arguments".

I think they do mean that quite a lot of the time, for non-strawman versions of "universally compelling". I suppose what you are getting at is objectively correct morality existing, in some sense, but being undiscoverable, or cognitively inaccessible.

Sure, probably some of them mean that, but you can't assume that they all do.

But then that would be covered by "internalism".

That wouldn't be covered by "internalism". Whether any possible agent who holds a moral judgment is motivated to act on this judgment is orthogonal (no pun intended) to whether moral judgments are undiscoverable or cognitively inaccessible.

Arguably, AIs don't have Omohundroan incentives to discover morality.

Whether it would believe it, and whether it would discover it are rather separate questions.

It can't believe it if it doesn't discover it.

It is possible to be told something.

Yes, this is my problem with this theory, but there are much stupider opinions held by some percentage of philosophers.

If only everyone could agree with what they are.

Also, it's not clear that AI would reject the proposition that if there are objectively correct values, then it should update its value system to them, since humans don't always.

Let me make sure that I get this right: you look at the survey, measure how many people answered yes to both moral internalism and moral realism, and conclude that everyone who did not accepts the orthogonality thesis?

If yes, then I don't think that's a good approach, for three distinct reasons:

1. You're assuming philosophers all have internally consistent positions

2. I think you merely have a one-way implication: moral realism plus moral internalism implies heterogonality, but not necessarily backwards. It seems possible to reject the orthogonality thesis (and thus accept heterogonality) without believing in both moral realism and moral internalism. But most importantly,

3. Many philosophers probably evaluated moral internalism with respect to humans. Like, I would claim that this is almost universally true for humans, and I probably agree with moral realism, too, kind of. But I also believe the orthogonality thesis when it comes to AI.

All your objections are correct and important, and I think the correct results may be anything from 50% to 80%. That said, I think there's a reasonable argument that most heterogonalists would consider morality to be the set of motivations from "with enough intelligence, any possible agent would pursue only one set of motivations" (more mathematically, the utility function from "with enough intelligence, any possible agent would pursue only one utility function").

I don't think the orthogonality thesis can be defined as ~[moral internalism & moral realism] -- that is, I think there can be and are philosophers who reject moral internalism, moral realism, *and* the orthogonality thesis, making 66% a high estimate.

Nick Land doesn't strike me as a moral internalist-and-realist (although he has a Twitter and I bet your post will make its way to him somehow), but he doesn't accept the orthogonality thesis:

Even the orthogonalists admit that there are values immanent to advanced intelligence, most importantly, those described by Steve Omohundro as ‘basic AI drives’ — now terminologically fixed as ‘Omohundro drives’. These are sub-goals, instrumentally required by (almost) any terminal goals. They include such general presuppositions for practical achievement as self-preservation, efficiency, resource acquisition, and creativity. At the most simple, and in the grain of the existing debate, the anti-orthogonalist position is therefore that Omohundro drives exhaust the domain of real purposes. Nature has never generated a terminal value except through hypertrophy of an instrumental value.

This is a form of internalism-and-realism, but it's not about morality -- so it wouldn't be inconsistent to reject orthogonality and 'heterogonality'.

I recall someone in the Xenosystems orbit raising the point that humans, continuously since long before our emergence as a distinct species, existed under the maximal possible amount of selection pressure to reproduce, but 1) get weird and 2) frequently don't reproduce. There are counterarguments that can be made here, of course (AIs can be designed with much more rigor than evolution allows, say), but it's another possible line of objection to orthogonality that doesn't involve moral realism.

Can we use "collinearity" instead? It's an existing word which is the opposite of orthogonality.

I'm not sure it really conveys the relevant idea -- it's too specific an opposite of "orthogonality". I'm not keen on "heterogonality" either, though; that would be the opposite of "homogonality" if that were a word, but not of "orthogonality". "Dependence" or "dependency"? (On the grounds that "orthogonality" here really means "independence".) I think we need a more perspicuous name than that. "The value inevitability thesis" or something like that.

Actually, I'm not very keen on "orthogonality" either because it suggests a very strong kind of independence, where knowing that an agent is highly capable gives us literally no information about its goals -- the Arbital page about the orthogonality thesis calls that "strong orthogonality" -- and I think usually "orthogonality" in this context has a weaker meaning, saying only that any computationally tractable goal is possible for an intelligent agent. I'd rather have "orthogonality" for the strong thesis, "inevitability" for its opposite, and two other terms for "weak orthogonality" (the negation of inevitability) and "weak inevitability" (the negation of strong orthogonality).

Quoting the specific definitions in the Arbital article for orthogonality, in case people haven't seen that page (bold added):

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in creating an intelligent agent to pursue a goal, above and beyond the computational tractability of that goal. [...]
This contrasts to inevitablist theses which might assert, for example:
"It doesn't matter what kind of AI you build, it will turn out to only pursue its own survival as a final end."
"Even if you tried to make an AI optimize for paperclips, it would reflect on those goals, reject them as being stupid, and embrace a goal of valuing all sapient life." [...]
Orthogonality does not require that all agent designs be equally compatible with all goals. E.g., the agent architecture AIXI-tl can only be formulated to care about direct functions of its sensory data, like a reward signal; it would not be easy to rejigger the AIXI architecture to care about creating massive diamonds in the environment (let alone any more complicated environmental goals). The Orthogonality Thesis states "there exists at least one possible agent such that..." over the whole design space; it's not meant to be true of every particular agent architecture and every way of constructing agents. [...]
The weak form of the Orthogonality Thesis says, "Since the goal of making paperclips is tractable, somewhere in the design space is an agent that optimizes that goal."
The strong form of Orthogonality says, "And this agent doesn't need to be twisted or complicated or inefficient or have any weird defects of reflectivity; the agent is as tractable as the goal." [...]
This could be restated as, "To whatever extent you (or a superintelligent version of you) could figure out how to get a high-U outcome if aliens offered to pay you huge amount of resources to do it, the corresponding agent that terminally prefers high-U outcomes can be at least that good at achieving U." This assertion would be false if, for example, an intelligent agent that terminally wanted paperclips was limited in intelligence by the defects of reflectivity required to make the agent not realize how pointless it is to pursue paperclips; whereas a galactic superintelligence being paid to pursue paperclips could be far more intelligent and strategic because it didn't have any such defects. [...]
For purposes of stating Orthogonality's precondition, the "tractability" of the computational problem of U-search should be taken as including only the object-level search problem of computing external actions to achieve external goals. If there turn out to be special difficulties associated with computing "How can I make sure that I go on pursuing U?" or "What kind of successor agent would want to pursue U?" whenever U is something other than "be nice to all sapient life", then these new difficulties contradict the intuitive claim of Orthogonality. Orthogonality is meant to be empirically-true-in-practice, not true-by-definition because of how we sneakily defined "optimization problem" in the setup.
Orthogonality is not literally, absolutely universal because theoretically 'goals' can include such weird constructions as "Make paperclips for some terminal reason other than valuing paperclips" and similar such statements that require cognitive algorithms and not just results. To the extent that goals don't single out particular optimization methods, and just talk about paperclips, the Orthogonality claim should cover them.

I thought about orthodox/heterodox when making the term.

Ah, I see. The trouble is that "ortho-" is being used kinda differently in the two cases.

Ortho- means "straight" or "right". Orthodoxy is ortho-doxy, right teaching, as opposed to hetero-doxy, different teaching (i.e., different from that of The One True Church, and obviously therefore wrong). But orthogonal is ortho-gonal, right-angled, where of course a "right" angle is traditionally half of a "straight" angle. (Why? Because "right" also means "upright", so a "right" angle is one like that between something standing upright and the ground it stands on. This applies in Greek as well as English.) I suppose heterogonality could be other-angled-ness, i.e., being at an angle other than a right angle, but that doesn't feel like a very natural meaning to me somehow.