LESSWRONG
luosha@gmail.com
Comments

Sorted by
Newest
No posts to display.
No wikitag contributions to display.
AGI Ruin: A List of Lethalities
luosha@gmail.com · 3y

On instrumental convergence: humans seem to be a prominent counterexample to the claim that "most agents don't let you edit their utility functions," at least in the sense that our goals and interests are quite sensitive to those of the people around us. This isn't explicit editing, but we are heavily influenced by, and tend to converge toward, the goals and interests of those around us. (Perhaps this suggests another tool for alignment: building the same kind of sensitivity into artificial agents' utility functions.)
