x

luosha@gmail.com

Subscribe

Message

2

4y

luosha@gmail.com

Subscribe

Message

2

4y

What, if not agency?

luosha@gmail.com7mo30

"Maybe one agent aligned with another agent is a somewhat anti-natural concept."
If you have two agents that are perfectly aligned (have all the same preferences with respect to how the world should be), and who can share information (they should want to, if they are perfectly aligned) in order to coordinate action, would there be any reason to model them as two distinct agents, rather than one? Maybe the natural concept of having multiple agents who are aligned requires that there be only imperfect communication among them, so that it makes sense to distinguish each from the others.

Reply

AGI Ruin: A List of Lethalities

luosha@gmail.com4y10

On instrumental convergence: humans would seem to be a prominent counterexample to "most agents don't let you edit their utility functions" -- at least in the sense that our goals/interests etc are quite sensitive to those of people around us. So maybe not explicit editing, but lots of being influenced by and converging to the goals and interests of those around us. (and maybe this suggests another tool for alignment, which is building in this same kind of sensitivity to artificial agents' utility functions)

Reply