Mohsen

Studying AI for Sustainable Societies (AISS) via Erasmus Mundus Joint Master’s scholarship. Passionate about AI behavior and human-AI interaction. Striving towards research and development of technology that encourages cooperation, trust, and shared progress. 

Not all capabilities will be created equal: focus on strategically superhuman agents
Mohsen · 2mo · 10

If we are going to build these agents without "losing the game", either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there's a day when an AI agent is created without either of these conditions, that's the day I'd consider humanity to have lost. We might not be immediately wiped out by a nanobot swarm, but from that time forward humans will be more like pawns than players, and when our replacement actuators have been built, we'll likely be left without the resources we need to survive.

I may be many months late to this post, but I want to suggest a third option, inspired by Russell & Norvig's textbook "Artificial Intelligence: A Modern Approach":

(c) Agents that know they don’t know the objective.

They argue that the standard model of AI, where we supply a fully specified objective, won't scale effectively. They state, "We don’t want machines that are intelligent in the sense of pursuing their objectives; we want them to pursue our objectives... [while being] necessarily uncertain as to what they are." 

When an agent recognizes this uncertainty, it is driven to "act cautiously, ask permission, learn more about our preferences..., and defer to human control," enabling provably beneficial behavior.

This approach, in my opinion, beats options (a) and (b) because:

  • (a) Perfect alignment: We can't fully specify human interests/objectives, and the harms from mis-specification scale with capability.
  • (b) Permanent constraints: Static limits are fragile; smarter agents can find ways around them, and the constraints also block useful capabilities.
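To make option (c) concrete, here is a toy sketch (my own illustration, not an algorithm from the textbook): an agent maintains a belief over several candidate objectives and, before acting, checks how much those hypotheses disagree about its preferred action. When the disagreement is large, it defers to a human instead of acting. All names and the threshold are hypothetical.

```python
def choose(actions, objective_hypotheses, belief, ask_threshold=0.5):
    """Pick an action, or return 'ASK_HUMAN' when objective uncertainty is high.

    actions: list of candidate actions
    objective_hypotheses: list of reward functions, each action -> float
    belief: probabilities over the hypotheses (sums to 1)
    """
    # Expected value of each action, averaged over objective hypotheses.
    expected = {
        a: sum(p * r(a) for p, r in zip(belief, objective_hypotheses))
        for a in actions
    }
    best = max(actions, key=expected.get)

    # Disagreement: how much the hypotheses differ about the chosen action.
    values = [r(best) for r in objective_hypotheses]
    if max(values) - min(values) > ask_threshold:
        # Act cautiously: ask permission / learn more about preferences.
        return "ASK_HUMAN"
    return best
```

For example, if every hypothesis agrees a "safe" action is mildly good, the agent takes it; if the hypotheses sharply disagree about a "risky" action that currently looks best, the agent defers. The point is only to illustrate the behavior Russell & Norvig describe, not to claim this simple rule is provably beneficial.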