How many philosophers accept the orthogonality thesis? Evidence from the PhilPapers survey

66.42512077294685%

This should not be reported this way. It should be reported as something like 66%. The other digits are not meaningful.
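A minimal sketch of the rounding suggested here, in Python. The helper name `format_percent` is made up for illustration; the only input is the figure quoted above.

```python
def format_percent(value: float) -> str:
    """Format a percentage with no decimal places."""
    return f"{value:.0f}%"


raw = 66.42512077294685  # the figure as originally reported
print(format_percent(raw))  # prints 66%
```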

Yes, you're right, some people raised this in the /r/ControlProblem subreddit. I fixed this.

The PhilPapers Survey was a survey of professional philosophers and others on their philosophical views, carried out in November 2009.

Given that the phrase "orthogonality thesis" was not coined until 2012, I doubt the usefulness of this data set in determining current philosophical consensus around it.

[-]Paperclip Minimizer8y60

Yes, this is the whole point of the first part of the article.

[-]Dacyn8y110

Moral realism plus moral internalism does not imply heterogonality. Just because there is an objectively correct morality, does not mean that any sufficiently powerful optimization process would believe that that morality is correct.

[-]TAG8y20

Because?

[-]Dacyn8y40

When people say that a morality is "objectively correct", they generally don't mean to imply that it is supported by "universally compelling arguments". What they do mean might be a little hard to parse, and I'm not a moral realist and don't claim to be able to pass their ITT, but in any case it seems to me that the burden of proof is on the one who claims that their position does imply heterogonality.

[-]TAG8y00

When people say that a morality is "objectively correct", they generally don't mean to imply that it is supported by "universally compelling arguments".

I think they do mean that quite a lot of the time, for non-strawman versions of "universally compelling". I suppose what you are getting at is objectively correct morality existing, in some sense, but being undiscoverable, or cognitively inaccessible.

[-]Dacyn8y40

Sure, probably some of them mean that, but you can't assume that they all do.

[-]TAG8y00

But then that would be covered by "internalism".

[-]Paperclip Minimizer8y20

That wouldn't be covered by "internalism". Whether any possible agent who holds a moral judgment is motivated to act on this judgment is orthogonal (no pun intended) to whether moral judgments are undiscoverable or cognitively inaccessible.

[-]Paperclip Minimizer8y20

Arguably, AIs don't have Omohundroan incentives to discover morality.

[-]TAG8y00

Whether it would believe it, and whether it would discover it are rather separate questions.

[-]Paperclip Minimizer8y20

It can't believe it if it doesn't discover it.

[-]TAG8y30

It is possible to be told something.

[-]Paperclip Minimizer8y20

Yes, this is my problem with this theory, but there are much stupider opinions held by some percentage of philosophers.

[-]TAG8y40

If only everyone could agree with what they are.

[-]TAG8y00

Also, it's not clear that AI would reject the proposition that if there are objectively correct values, then it should update its value system to them, since humans don't always.

[-]Rafael Harth8y100

Let me make sure that I get this right: you look at the survey, measure how many people answered yes to both moral internalism and moral realism, and conclude that everyone who did not accepts the orthogonality thesis?

If yes, then I don't think that's a good approach, for three distinct reasons:

1. You're assuming philosophers all have internally consistent positions

2. I think you merely have a one-way implication, $\text{int} \land \text{real} \implies \text{het}$, but not necessarily the converse. It seems possible to reject the orthogonality thesis (and thus accept heterogonality) without believing in both moral realism and moral internalism. But most importantly,

3. Many philosophers probably evaluated moral internalism with respect to humans. Like, I would claim that this is almost universally true for humans, and I probably agree with moral realism, too, kind of. But I also believe the orthogonality thesis when it comes to AI.
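To make the method under discussion concrete, here is a rough Python sketch of the calculation as described in this comment: count the respondents who accept both moral realism and moral internalism, and treat everyone else as accepting orthogonality. The `Respondent` type, its boolean fields, and the toy data are all assumptions made up for illustration; this is not the article's script, and the real survey records graded answers rather than yes/no.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    # Hypothetical simplification: real survey answers are graded, not boolean.
    moral_realist: bool
    moral_internalist: bool


def share_not_both(respondents):
    """Fraction of respondents who do NOT accept both realism and internalism."""
    if not respondents:
        raise ValueError("need at least one respondent")
    both = sum(1 for r in respondents if r.moral_realist and r.moral_internalist)
    return 1 - both / len(respondents)


# Made-up toy data, not survey results.
toy = [
    Respondent(True, True),
    Respondent(True, False),
    Respondent(False, True),
    Respondent(False, False),
]
print(f"{share_not_both(toy):.0%}")  # prints 75%
```

The sketch also shows where the objections bite: it bakes in the assumption that rejecting the conjunction means accepting orthogonality, which is exactly the converse questioned in point 2.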

[-]Paperclip Minimizer8y30

All your objections are correct and important, and I think the correct results may be anything from 50% to 80%. That said, I think there's a reasonable argument that most heterogonalists would consider morality to be the set of motivations from "with enough intelligence, any possible agent would pursue only one set of motivations" (more mathematically, the utility function from "with enough intelligence, any possible agent would pursue only one utility function").

[-]cousin_it8y60

Can we use "collinearity" instead? It's an existing word which is the opposite of orthogonality.

[-]gjm8y230

I'm not sure it really conveys the relevant idea -- it's too specific an opposite of "orthogonality". I'm not keen on "heterogonality" either, though; that would be the opposite of "homogonality" if that were a word, but not of "orthogonality". "Dependence" or "dependency"? (On the grounds that "orthogonality" here really means "independence".) I think we need a more perspicuous name than that. "The value inevitability thesis" or something like that.

Actually, I'm not very keen on "orthogonality" either because it suggests a very strong kind of independence, where knowing that an agent is highly capable gives us literally no information about its goals -- the Arbital page about the orthogonality thesis calls that "strong orthogonality" -- and I think usually "orthogonality" in this context has a weaker meaning, saying only that any computationally tractable goal is possible for an intelligent agent. I'd rather have "orthogonality" for the strong thesis, "inevitability" for its opposite, and two other terms for "weak orthogonality" (the negation of inevitability) and "weak inevitability" (the negation of strong orthogonality).

[-]Rob Bensinger8y100

Quoting the specific definitions in the Arbital article for orthogonality, in case people haven't seen that page (bold added):

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in creating an intelligent agent to pursue a goal, above and beyond the computational tractability of that goal. [...]

This contrasts to inevitablist theses which might assert, for example:

"It doesn't matter what kind of AI you build, it will turn out to only pursue its own survival as a final end."

"Even if you tried to make an AI optimize for paperclips, it would reflect on those goals, reject them as being stupid, and embrace a goal of valuing all sapient life." [...]

Orthogonality does not require that all agent designs be equally compatible with all goals. E.g., the agent architecture AIXI-tl can only be formulated to care about direct functions of its sensory data, like a reward signal; it would not be easy to rejigger the AIXI architecture to care about creating massive diamonds in the environment (let alone any more complicated environmental goals). The Orthogonality Thesis states "there exists at least one possible agent such that..." over the whole design space; it's not meant to be true of every particular agent architecture and every way of constructing agents. [...]

The weak form of the Orthogonality Thesis says, "Since the goal of making paperclips is tractable, somewhere in the design space is an agent that optimizes that goal."

The strong form of Orthogonality says, "And this agent doesn't need to be twisted or complicated or inefficient or have any weird defects of reflectivity; the agent is as tractable as the goal." [...]

This could be restated as, "To whatever extent you (or a superintelligent version of you) could figure out how to get a high-U outcome if aliens offered to pay you huge amount of resources to do it, the corresponding agent that terminally prefers high-U outcomes can be at least that good at achieving U." This assertion would be false if, for example, an intelligent agent that terminally wanted paperclips was limited in intelligence by the defects of reflectivity required to make the agent not realize how pointless it is to pursue paperclips; whereas a galactic superintelligence being paid to pursue paperclips could be far more intelligent and strategic because it didn't have any such defects. [...]

For purposes of stating Orthogonality's precondition, the "tractability" of the computational problem of U-search should be taken as including only the object-level search problem of computing external actions to achieve external goals. If there turn out to be special difficulties associated with computing "How can I make sure that I go on pursuing U?" or "What kind of successor agent would want to pursue U?" whenever U is something other than "be nice to all sapient life", then these new difficulties contradict the intuitive claim of Orthogonality. Orthogonality is meant to be empirically-true-in-practice, not true-by-definition because of how we sneakily defined "optimization problem" in the setup.

Orthogonality is not literally, absolutely universal because theoretically 'goals' can include such weird constructions as "Make paperclips for some terminal reason other than valuing paperclips" and similar such statements that require cognitive algorithms and not just results. To the extent that goals don't single out particular optimization methods, and just talk about paperclips, the Orthogonality claim should cover them.

[-]Paperclip Minimizer8y20

I thought about orthodox/heterodox when making the term.

[-]gjm8y110

Ah, I see. The trouble is that "ortho-" is being used kinda differently in the two cases.

Ortho- means "straight" or "right". Orthodoxy is ortho-doxy, right teaching, as opposed to hetero-doxy, different teaching (i.e., different from that of The One True Church, and obviously therefore wrong). But orthogonal is ortho-gonal, right-angled, where of course a "right" angle is traditionally half of a "straight" angle. (Why? Because "right" also means "upright", so a "right" angle is one like that between something standing upright and the ground it stands on. This applies in Greek as well as English.) I suppose heterogonality could be other-angled-ness, i.e., being at an angle other than a right angle, but that doesn't feel like a very natural meaning to me somehow.

[-]vedrfolnir8y50

I don't think the orthogonality thesis can be defined as ~[moral internalism & moral realism] -- that is, I think there can be and are philosophers who reject moral internalism, moral realism, *and* the orthogonality thesis, making 66% a high estimate.

Nick Land doesn't strike me as a moral internalist-and-realist (although he has a Twitter and I bet your post will make its way to him somehow), but he doesn't accept the orthogonality thesis:

Even the orthogonalists admit that there are values immanent to advanced intelligence, most importantly, those described by Steve Omohundro as ‘basic AI drives’ — now terminologically fixed as ‘Omohundro drives’. These are sub-goals, instrumentally required by (almost) any terminal goals. They include such general presuppositions for practical achievement as self-preservation, efficiency, resource acquisition, and creativity. At the most simple, and in the grain of the existing debate, the anti-orthogonalist position is therefore that Omohundro drives exhaust the domain of real purposes. Nature has never generated a terminal value except through hypertrophy of an instrumental value.

This is a form of internalism-and-realism, but it's not about morality -- so it wouldn't be inconsistent to reject orthogonality and 'heterogonality'.

I recall someone in the Xenosystems orbit raising the point that humans, continuously since long before our emergence as a distinct species, existed under the maximal possible amount of selection pressure to reproduce, but 1) get weird and 2) frequently don't reproduce. There are counterarguments that can be made here, of course (AIs can be designed with much more rigor than evolution allows, say), but it's another possible line of objection to orthogonality that doesn't involve moral realism.
