LESSWRONG
AI
Personal Blog

[ Question ]

Orthogonality Thesis seems wrong

by Donatas Lučiūnas
26th Mar 2024
1 min read

3 Answers (sorted by top scoring)

Jonas Hallgren

Mar 26, 2024

Compared to other people on this site, this is a part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar things. I read this post recently, in which Leo Gao made an argument that concave agents generally don't exist, because they tend to stop existing. I think there are pressures that conform agents to part of the value landscape.

Like, I agree that the orthogonality thesis is presumed to be true way too often. It is more like an argument that alignment may not happen by default, but I'm also uncertain about how much evidence it actually gives you.
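The concavity point can be illustrated with a toy expected-utility calculation (a minimal sketch of my own, not taken from Leo Gao's post): an agent with a concave utility function strictly prefers a sure payoff over a fair gamble with the same expected value, while a risk-neutral (linear-utility) agent is indifferent.

```python
import math

def expected_utility(utility, lottery):
    # lottery: list of (probability, outcome) pairs
    return sum(p * utility(x) for p, x in lottery)

# A fair coin flip between 1 and 100 units, vs. a sure payoff
# with the same expected value (50.5).
gamble = [(0.5, 1.0), (0.5, 100.0)]
sure_thing = [(1.0, 50.5)]

linear = lambda x: x   # risk-neutral agent
concave = math.log     # risk-averse agent (diminishing returns)

# The linear agent is indifferent between the two options;
# the concave agent strictly prefers the sure thing.
```

Whether that risk profile translates into a survival pressure on real agents is the substantive question; the sketch only shows what concavity does to choices over lotteries.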

Vladimir_Nesov

The orthogonality thesis says that it's invalid to conclude benevolence from the premise of powerful optimization; it gestures at counterexamples. It's entirely compatible with benevolence being very likely in practice. You then might want to separately ask yourself whether it's in fact likely. But you do need to ask: that's the point of the orthogonality thesis, its narrow scope.

Donatas Lučiūnas
Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?
Jonas Hallgren
Yeah, I agree with what you just said; I should have been more careful with my phrasing. Maybe something like: "The naive version of the orthogonality thesis, where we assume that AIs can't converge towards human values, is assumed to be true too often."

Dagon

Mar 25, 2024

"an assumption that objective norms / values do not exist. In my opinion AGI would not make this assumption"

The question isn't whether every AGI would or would not make this assumption, but whether it's actually true, and therefore whether it's true that a powerful AGI could have a wide range of goals or values, including the possibility that they're alien or contradictory to common human values.

I think it's highly unlikely that objective norms/values exist, and that weak versions of orthogonality (not literally ANY goals are possible, but enough bad ones to still be worried) are true.  Even more strongly, I think it hasn't been shown that these weak versions are false, and we should take the possibility very seriously.

Donatas Lučiūnas

Could you read my comment here and let me know what you think?


Viliam

Mar 25, 2024

The orthogonality thesis is not about the existence or nonexistence of "objective norms/values", but about whether a specific agent could have a specific goal. The thesis says that for any specific goal, there can be an intelligent agent that has that goal.

To simplify it, the question is not "is there an objective definition of good?" where we probably disagree, but rather "can an agent be bad?" where I suppose we both agree the answer is clearly yes.

More precisely, "can a very intelligent agent be bad?". Still, the answer is yes. (Even if there is such a thing as "objective norms/values", the agent can simply choose to ignore them.)
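The "any specific goal" point can be made concrete with a toy sketch (my own illustration, not from the thread): the search procedure that supplies the capability is completely decoupled from the goal function it optimizes, so swapping in an arbitrary, even perverse, goal leaves the capability untouched.

```python
def best_action(actions, goal):
    # The "capability" part: search for the highest-scoring action
    # under whatever goal function it is handed.
    return max(actions, key=goal)

actions = range(-10, 11)

# Three arbitrary goals plugged into the same optimizer:
maximize   = lambda a: a            # wants the largest number
minimize   = lambda a: -a           # wants the smallest number
near_seven = lambda a: -abs(a - 7)  # wants to be close to 7

# Same search, different goals, different chosen actions:
# best_action(actions, maximize)   -> 10
# best_action(actions, minimize)   -> -10
# best_action(actions, near_seven) -> 7
```

Nothing in `best_action` inspects what the goal is "about", which is the structural claim the thesis makes about optimizers in general.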

The Orthogonality Thesis (as well as the fact–value distinction) is based on the assumption that objective norms / values do not exist. In my opinion, an AGI would not make this assumption; it is a logical fallacy, specifically an argument from ignorance. As black swan theory says, there are unknown unknowns, which in this context means that objective norms / values may exist and simply have not been discovered yet. Why does the Orthogonality Thesis have so much recognition?