Mechanism design for AI

A few weeks ago, I finished my paper on Adaptive Mechanism Design: Learning to Promote Cooperation. In this post, I’d like to outline why I chose to work on this topic, and why I would recommend it to people who want to reduce s-risks of advanced AI.

Human civilisation has always faced two fundamentally different types of problems: “man vs. man” problems (also called social dilemmas) and “man vs. nature” problems. Innovation and technological progress have allowed us to solve more and more man vs. nature problems. Assuming that this trend continues, it seems plausible that man vs. nature problems will eventually become negligible.

But what about social dilemmas? We often try to solve these by altering the structure of the interactions in a way that facilitates cooperation (mechanism design); for instance, we set up the legal system and the police to discourage crime. This works to an extent, but not perfectly: we are a long way from a world where everybody always acts in the most cooperative way possible.
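To make that idea concrete, here is a minimal sketch in code (the payoff numbers and the size of the fine are invented for illustration, not taken from any real institution): in a Prisoner's Dilemma, defection is the dominant strategy, but a mechanism that attaches a sufficiently large penalty to defection, such as a fine enforced by a legal system, makes cooperation the best response instead.

```python
# Minimal sketch of how a mechanism can change the incentives in a social
# dilemma. The payoff numbers and the flat "fine" for defecting are made up
# for illustration.

import numpy as np

# Row player's payoffs in a Prisoner's Dilemma: action 0 = cooperate, 1 = defect.
payoffs = np.array([[3.0, 0.0],   # I cooperate: opponent cooperates / defects
                    [4.0, 1.0]])  # I defect:    opponent cooperates / defects

def best_response(payoff_matrix, opponent_action):
    """Return the action with the highest payoff against a fixed opponent action."""
    return int(np.argmax(payoff_matrix[:, opponent_action]))

# Without intervention, defecting is the best response to everything.
print([best_response(payoffs, a) for a in (0, 1)])  # [1, 1] -> mutual defection

# A simple mechanism: a fine of 2 applied whenever an agent defects
# (e.g. enforced by a legal system or a policing agent).
fine = 2.0
modified = payoffs.copy()
modified[1, :] -= fine

# Now cooperating is the best response, so mutual cooperation becomes stable.
print([best_response(modified, a) for a in (0, 1)])  # [0, 0]
```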

So, on a very abstract level, better mechanism design could be considered the most important problem in the world. (That’s not to say that we need progress in the academic field of mechanism design – which, due to some odd path dependency, focuses on how auction revenue can be maximized.)

One could object that solving general problems of civilisation may not be ideal in terms of reducing s-risks. However, I think better mechanism design is valuable from this perspective, too, because it reduces the risk of escalating conflicts between future agents – which is one plausible way in which very bad futures could come about.

If artificial learning agents become widespread in the future, they will interact with both other learning agents and humans in a variety of complex settings including social dilemmas. It will thus be important to systematically work to ensure cooperation – that is, to apply mechanism design not just to humans, but also to AI.

(If a single AI quickly achieves a decisive strategic advantage and forms a singleton, then these social dilemmas are irrelevant; but I think this scenario is relatively unlikely.)

Similar to existing proposals to “bootstrap” aligned AI, we can use AI itself to work on the problem of facilitating more cooperative outcomes in social dilemmas (regardless of whether the agents are AI systems or humans). For instance, we could set up a system of (centralised or decentralised) “policing AIs” that stop other agents from doing something harmful – similar to human police. That’s essentially what my paper tries to achieve in a simplified setting.
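The following toy sketch illustrates this idea in code; it is not the method from the paper. Two simple learning agents repeatedly play a Prisoner's Dilemma, and a hand-coded policing agent adds a fixed penalty whenever an agent defects, which is enough for the learners to converge on cooperation. All numbers and the penalty rule are invented for illustration; the point of adaptive mechanism design is to learn such interventions rather than hard-code them.

```python
# Toy illustration of a "policing" agent that shapes the rewards of two
# independent learners in a repeated Prisoner's Dilemma. This is a
# hand-rolled sketch, not the algorithm from the paper: the policing agent
# applies a fixed penalty for defection instead of learning one.

import random

COOPERATE, DEFECT = 0, 1
# PAYOFFS[my_action][their_action] for the row player (symmetric game).
PAYOFFS = [[3.0, 0.0],
           [4.0, 1.0]]

def police(actions, penalty=2.0):
    """Extra reward added by the policing agent: punish defection."""
    return [-penalty if a == DEFECT else 0.0 for a in actions]

class Learner:
    """A tiny bandit-style learner over the two actions."""
    def __init__(self, lr=0.1, eps=0.1):
        self.q = [0.0, 0.0]
        self.lr, self.eps = lr, eps

    def act(self):
        if random.random() < self.eps:
            return random.randrange(2)
        return max((0, 1), key=lambda a: self.q[a])

    def update(self, action, reward):
        self.q[action] += self.lr * (reward - self.q[action])

agents = [Learner(), Learner()]
for step in range(5000):
    actions = [agent.act() for agent in agents]
    base = [PAYOFFS[actions[0]][actions[1]], PAYOFFS[actions[1]][actions[0]]]
    extra = police(actions)
    for agent, a, r, e in zip(agents, actions, base, extra):
        agent.update(a, r + e)

# With the policing penalty in place, both learners should end up
# preferring cooperation (higher Q-value for action 0).
print([agent.q for agent in agents])
```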
