k_polym — LessWrong

k_polym

12d

Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)

While it doesn't solve your problem, I think a clearer distinction between preferences and plans would narrow the issue somewhat and clear up some of the mess around manipulation vs. counsel and the like. For example, focus on the prediction and planning phase and hold preferences constant: suppose agent 1 is discussing their plans with an AI, or more generally with another agent 2. Here, the difference between manipulation and honest counsel from the AI is easier to pin down: if the AI is providing a truthful representation of how it expects ea...