Mohsen

Studying AI for Sustainable Societies (AISS) via Erasmus Mundus Joint Master’s scholarship. Passionate about AI behavior and human-AI interaction. Striving towards research and development of technology that encourages cooperation, trust, and shared progress. 

Not all capabilities will be created equal: focus on strategically superhuman agents
Mohsen · 2mo · 10

If we are going to build these agents without "losing the game", either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there's a day when an AI agent is created without either of these conditions, that's the day I'd consider humanity to have lost. We might not be immediately wiped out by a nanobot swarm, but from that time forward humans will be more like pawns than players, and when our replacement actuators have been built, we'll likely be left without the resources we need to survive.

I may be many months late to this post, but I want to suggest a third option, inspired by Russell & Norvig's textbook "Artificial Intelligence: A Modern Approach":

(c) Agents that know they don’t know the objective.

They argue that the standard model of AI, where we supply a fully specified objective, won't scale effectively. They state, "We don’t want machines that are intelligent in the sense of pursuing their objectives; we want them to pursue our objectives... [while being] necessarily uncertain as to what they are." 

When an agent recognizes this uncertainty, it is driven to "act cautiously, ask permission, learn more about our preferences..., and defer to human control," enabling provably beneficial behavior.

This approach, in my opinion, beats options (a) and (b) because:

  • (a) Perfect alignment: We can't fully specify human interests/objectives, and the harms from mis-specification scale with capability.
  • (b) Permanent constraints: Static limits are fragile; smarter agents can find ways around them, and the constraints also block useful capabilities.
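To make option (c) concrete, here is a toy sketch (my own illustration, not an algorithm from the textbook): an agent maintains a belief over several candidate objectives and, before acting, checks how much those hypotheses disagree about its preferred action. When the disagreement is large, it defers to a human instead of acting. All names and the threshold are hypothetical.

```python
def choose(actions, objective_hypotheses, belief, ask_threshold=0.5):
    """Pick an action, or return 'ASK_HUMAN' when objective uncertainty is high.

    actions: list of candidate actions
    objective_hypotheses: list of reward functions, each action -> float
    belief: probabilities over the hypotheses (sums to 1)
    """
    # Expected value of each action, averaged over objective hypotheses.
    expected = {
        a: sum(p * r(a) for p, r in zip(belief, objective_hypotheses))
        for a in actions
    }
    best = max(actions, key=expected.get)

    # Disagreement: how much the hypotheses differ about the chosen action.
    values = [r(best) for r in objective_hypotheses]
    if max(values) - min(values) > ask_threshold:
        # Act cautiously: ask permission / learn more about preferences.
        return "ASK_HUMAN"
    return best
```

For example, if every hypothesis agrees a "safe" action is mildly good, the agent takes it; if the hypotheses sharply disagree about a "risky" action that currently looks best, the agent defers. The point is only to illustrate the behavior Russell & Norvig describe, not to claim this simple rule is provably beneficial.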