

  • Website: https://www.joshuagans.com
  • Wikipedia: https://en.wikipedia.org/wiki/Joshua_Gans


Philosophers have speculated that an AI tasked with a task such as creating paperclips might cause an apocalypse by learning to divert ever-increasing resources to the task, and then learning how to resist our attempts to turn it off. But this column argues that, to do this, the paperclip-making AI would need to create another AI that could acquire power both over humans and over itself, and so it would self-regulate to prevent this outcome. Humans who create AIs with the goal of acquiring power may be a greater existential threat.

Key paragraph:

The insight from economics is that while it may be hard, or even impossible, for a human to control a super-intelligent AI, it is equally hard for a super-intelligent AI to control another AI. Our modest super-intelligent paperclip maximiser, by switching on an AI devoted to obtaining power, unleashes a beast that will have power over it. Our control problem is the AI's control problem too. If the AI is seeking power to protect itself from humans, doing this by creating a super-intelligent AI with more power than its parent would surely seem too risky.

Link to actual paper: https://arxiv.org/abs/1711.04309


Here we examine the paperclip apocalypse concern for artificial general intelligence (or AGI) whereby a superintelligent AI with a simple goal (i.e., producing paperclips) accumulates power so that all resources are devoted towards that simple goal and are unavailable for any other use. We provide conditions under which a paperclip apocalypse can arise but also show that, under certain architectures for recursive self-improvement of AIs, a paperclip AI may refrain from allowing power capabilities to be developed. The reason is that such developments pose the same control problem for the AI as they do for humans (over AIs) and hence, threaten to deprive it of resources for its primary goal.

There are several problems with this argument. First, the AI has code describing its goal. It would seem much easier to copy this code across than to turn our moral values into code. Second, the AI doesn't have to be confident of getting it right. A paperclipping AI has two options: it can work at a factory and make a few paperclips, or it can self-improve, with some chance that the resultant AI won't maximize paperclips. However, the number of paperclips it could produce if it successfully self-improves is astronomically vast. If its goal function is linear in paperclips, it will self-improve if it thinks it has any chance of getting it right. If it fails to preserve its values while self-improving, the result looks like a staple maximizer.
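The expected-value comparison in this argument can be made concrete with a rough sketch. Everything here is an illustrative assumption rather than anything taken from the paper: the function names are hypothetical, the payoff numbers are made up, utility is taken to be exactly linear in paperclips, and a failed self-improvement (the staple maximizer case) is assumed to yield zero paperclips.

```python
# Minimal sketch of the linear-utility argument. All names and numbers are
# hypothetical; they are not drawn from the column or the paper.

def expected_paperclips(p_success, payoff_success, payoff_failure=0.0):
    """Expected paperclips from attempting self-improvement.

    Assumes a failed attempt (e.g. a staple maximizer) yields payoff_failure
    paperclips, taken to be zero by default.
    """
    return p_success * payoff_success + (1.0 - p_success) * payoff_failure


def should_self_improve(p_success, factory_output, improved_output):
    """True if attempting self-improvement beats working at the factory,
    for an agent whose utility is linear in paperclips."""
    return expected_paperclips(p_success, improved_output) > factory_output


def break_even_probability(factory_output, improved_output):
    """Success probability above which self-improvement has the higher
    expected paperclip count (with zero payoff on failure)."""
    return factory_output / improved_output


# Made-up magnitudes: a factory job yields a million paperclips, a
# successful self-improvement yields 10**30.
FACTORY_OUTPUT = 1e6
IMPROVED_OUTPUT = 1e30

if __name__ == "__main__":
    print(break_even_probability(FACTORY_OUTPUT, IMPROVED_OUTPUT))
    print(should_self_improve(1e-9, FACTORY_OUTPUT, IMPROVED_OUTPUT))
```

Under these assumed numbers, the break-even probability is so small that even a one-in-a-billion chance of preserving its values favors the attempt. A goal function that is concave in paperclips, or a failure outcome that the AI scores as worse than zero, would raise that threshold.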

Humans (at least the sort thinking about AI) know that we all have roughly similar values, so if you think you might have solved alignment but aren't sure, it makes sense to ask others for help, or to wait for someone else to finish solving it.

However, a paperclipping AI would know that no other AI has its goal function. If it doesn't build a paperclipping super-intelligence, no one else is going to. It will therefore try to do so even if it is unlikely to succeed.

There are several problems with this argument. First, the AI has code describing its goal. It would seem much easier to copy this code across than to turn our moral values into code.

"Has code" and "has code that's in an unpluggable and reusable module" are two different things.

Reading through the referenced paper I'm impressed by how well it models the situation and considers outcomes given certain assumptions about the problem. I suspect that the assumptions will not in general hold, but that still makes this interesting since any particular set of assumptions is not likely to hold and so exploring various classes of ways AGI issues may play out is valuable. I'd be interested to see more application of micro-economic models to explore AGI issues in ways that we are not currently (although to be fair current approaches are adjacent, like using game theory, but I think there's value in the slightly different perspective economic models bring).
