LESSWRONG
AI
Personal Blog

[ Question ]

Orthogonality Thesis seems wrong

by Donatas Lučiūnas
26th Mar 2024
1 min read

3 Answers (sorted by top scoring)

Jonas Hallgren

Mar 26, 2024

Compared to other people on this site, this is a part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar things. I read this post recently, in which Leo Gao made an argument that concave agents generally don't exist, because they tend to stop existing. I think there are pressures that conform agents to part of the value landscape.

Like, I agree that the orthogonality thesis is presumed to be true way too often. It is more like an argument that alignment may not happen by default, but I'm also uncertain about how much evidence it actually gives you.
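The concavity point can be illustrated with a toy expected-utility calculation (a minimal sketch of my own, not taken from Leo Gao's post): an agent with a concave utility function strictly prefers a sure payoff over a fair gamble with the same expected value, while a risk-neutral (linear-utility) agent is indifferent.

```python
import math

def expected_utility(utility, lottery):
    # lottery: list of (probability, outcome) pairs
    return sum(p * utility(x) for p, x in lottery)

# A fair coin flip between 1 and 100 units, vs. a sure payoff
# with the same expected value (50.5).
gamble = [(0.5, 1.0), (0.5, 100.0)]
sure_thing = [(1.0, 50.5)]

linear = lambda x: x   # risk-neutral agent
concave = math.log     # risk-averse agent (diminishing returns)

# The linear agent is indifferent between the two options;
# the concave agent strictly prefers the sure thing.
```

Whether that risk profile translates into a survival pressure on real agents is the substantive question; the sketch only shows what concavity does to choices over lotteries.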

Vladimir_Nesov

The orthogonality thesis says that it's invalid to conclude benevolence from the premise of powerful optimization; it gestures at counterexamples. It's entirely compatible with benevolence being very likely in practice. You then might want to separately ask yourself whether it's in fact likely. But you do need to ask: that's the point of the orthogonality thesis, its narrow scope.

Donatas Lučiūnas
Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?
Jonas Hallgren
Yeah, I agree with what you just said; I should have been more careful with my phrasing. Maybe something like: "The naive version of the orthogonality thesis, where we assume that AIs can't converge towards human values, is assumed to be true too often."

Dagon

Mar 25, 2024

"an assumption that objective norms / values do not exist. In my opinion AGI would not make this assumption"

The question isn't whether every AGI would or would not make this assumption, but whether it's actually true, and therefore whether it's true that a powerful AGI could have a wide range of goals or values, including the possibility that they're alien or contradictory to common human values.

I think it's highly unlikely that objective norms/values exist, and that weak versions of orthogonality (not literally ANY goals are possible, but enough bad ones to still be worried) are true.  Even more strongly, I think it hasn't been shown that these weak versions are false, and we should take the possibility very seriously.

Donatas Lučiūnas

Could you read my comment here and let me know what you think?


Viliam

Mar 25, 2024

The orthogonality thesis is not about the existence or nonexistence of "objective norms/values", but about whether a specific agent could have a specific goal. The thesis says that for any specific goal, there can be an intelligent agent that has that goal.

To simplify it, the question is not "is there an objective definition of good?" where we probably disagree, but rather "can an agent be bad?" where I suppose we both agree the answer is clearly yes.

More precisely, "can a very intelligent agent be bad?". Still, the answer is yes. (Even if there is such a thing as "objective norms/values", the agent can simply choose to ignore them.)
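The "any specific goal" point can be made concrete with a toy sketch (my own illustration, not from the thread): the search procedure that supplies the capability is completely decoupled from the goal function it optimizes, so swapping in an arbitrary, even perverse, goal leaves the capability untouched.

```python
def best_action(actions, goal):
    # The "capability" part: search for the highest-scoring action
    # under whatever goal function it is handed.
    return max(actions, key=goal)

actions = range(-10, 11)

# Three arbitrary goals plugged into the same optimizer:
maximize   = lambda a: a            # wants the largest number
minimize   = lambda a: -a           # wants the smallest number
near_seven = lambda a: -abs(a - 7)  # wants to be close to 7

# Same search, different goals, different chosen actions:
# best_action(actions, maximize)   -> 10
# best_action(actions, minimize)   -> -10
# best_action(actions, near_seven) -> 7
```

Nothing in `best_action` inspects what the goal is "about", which is the structural claim the thesis makes about optimizers in general.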

The Orthogonality Thesis (as well as the fact–value distinction) is based on the assumption that objective norms / values do not exist. In my opinion, an AGI would not make this assumption; it is a logical fallacy, specifically an argument from ignorance. As black swan theory says, there are unknown unknowns, which in this context means that objective norms / values may exist and simply have not been discovered yet. Why does the Orthogonality Thesis have so much recognition?