Is "Control" of a Superintelligence Possible?

Existing post that's one piece of the answer to this:

https://www.lesswrong.com/posts/EZ8GniEPSechjDYP9/free-to-optimize

Neat!

I think either is technically possible with perfect knowledge - that is, I don’t think either option is so incoherent that you cannot make any logical sense of it.

This leaves the question of which is easier. (1) requires somehow getting a full precise description of the human utility function. I don’t fully understand the arguments against (2), though MIRI seems to be pretty confident there are large issues.

[-]JBlack4y*30

The main distinction seems to be how strongly these super-intelligent agents will use their power to influence human decision-making.

At one extreme end is total control, even in the most putatively aligned case: If my taking a sip of water from my glass at 10:04:22 am would be 0.000000001% better in some sense than sipping at 10:04:25 am, then it will arrange the inputs to my decision so that I take a sip of water at 10:04:22 am, and similarly for everything else that happens in the world. I do think that this would constitute a total loss of human control, though not necessarily a loss of human agency.

At the extreme other end would be something more like an Oracle, a superintelligent system (I hesitate to call it an agent) that has absolutely no preferences, including implied preferences, for the state of the world beyond some very narrow task.

Or to put it another way, how much slack will a superintelligence have in its implied preferences?

Concept 1 appears to be describing a superintelligence with no slack at all. Every human decision (and presumably everything else in the universe) must abide by a total strict order of preferences and it will optimize the hell out of those preferences. Concept 2 describes a superintelligence that may be designed to have - or be constrained to abide by - some slack in orderings of outcomes that depend upon human agency. Even if it can predict exactly what a human may decide, it doesn't necessarily have to act so as to cause a preferred expected distribution of outcomes.

I don't really think that we can rationally hold strong beliefs about where a future superintelligence might fall in this spectrum, or even outside it in some manner that I can't imagine. I do think that the literal first scenario is infeasible even for a superintelligent agent, if it is constrained by anything like our current understanding of physical laws. I can imagine a superintelligence that acts in a manner that is as close to that as possible, and that this would drastically reduce human control even in the most aligned case.

[-]Donald Hobson4y20

I think the answer is something that doesn't match either. Both 1 and 2 are possible, but what we should probably go for is 3.

Suppose there is a superintelligent AI whose goal involves giving humans as much actual control as possible.

and able to manipulate humans so well, that any “choice” humanity faces will be predetermined.

All our choices are technically predetermined, because the universe is deterministic. (Modulo unimportant quantum details)

The AI could manipulate humans, but is programmed not to want to.

This isn't an impairment scheme like boxing or oracle AI. This is the genie that listens to your wish without manipulating you, and then carries it out in the spirit you asked. If the human(s?) tell the AI to become a paperclip maximizer, the AI will. (Maybe with a little pop-up box saying "It looks like you are about to destroy the universe, are you sure you want to do that?" to prevent mistakes.) And the humans are making that decision using brains that haven't been deliberately tampered with by the AI.

[-]tailcalled4y20

I think it depends on the goals of the superintelligence. If it is optimized for leaving humans in control, then it could do so. However, if it is not optimized for leaving humans in control, then it would be an instrumentally convergent goal for it to take over control, and therefore it could be assumed to do so.

[-]Mahdi Complex4y10

I'm just confused about what "optimized for leaving humans in control" could even mean? If a Superintelligence is so much more intelligent than humans that it could find a way, without explicit coercion, for humans to ask it to tile the universe with paper-clips, then "control" seems like a meaningless concept. You would have to force the Superintelligence to treat the human skull, or whatever other boundary of human decision making, as some kind of unviolable and uninfluenceable black box.

[-]tailcalled4y30

This basically boils down to the alignment problem. We don't know how to specify what we want, but that doesn't mean it is necessarily incoherent.

Treating the human skull as "some kind of unviolable and uninfluenceable black box" seems to get you some of the way there, but of course is problematic in its own ways (e.g. you wouldn't want delusional AIs). Still it seems like it points to the path forwards in a way.

[-]Rafael Harth4y20

I think control is a meaningful concept. You could have AI that doesn't try to alter your terminal goals. Something that just does what you want (not what you ask, since that has well-known failure modes) without trying to persuade you into something else.

The difficulty of building such a system is another question, alas.

[-]Yitz4y10

Third option not considered here (though it may be fairly unlikely)—it may be the case that superintelligence does not provide a substantial enough advantage to be able to control much of humanity, due to implications of chaos theory or something similar. Maybe it would be able to control politics fairly well, but some coordination problems could plausibly be beyond any reasonable finite intelligence, and hence beyond its control.

[-]Victor Novikov4y10

Why would you want to control a superintelligence aligned with our values? What would be the point of that?

Why would we want to allow for individual humans, who are less-than-perfectly-aligned with our values, to control a superintelligence that is perfectly-aligned-with-our-values?

A Superintelligence would be so far superior to any human or group of humans, and able to manipulate humans so well, that any “choice” humanity faces will be predetermined.

I guess the positive way to phrase this is, "FAI would create an environment where the natural results of our choices would typically be good outcomes" (typically, but not always, because being optimized too hard to succeed is not fun).

Talking about manipulation seems to imply that FAI would trick humans into making choices against their own best interest. I don't think that is what would typically happen.

I also see a scenario where FAI deliberately limits its ability to predict people's actions, out of respect for people being upset over the feeling of their choices being "predetermined".

But only a faint illusory glimmer of human choice would remain, while the open-ended, agentic power over the Universe would have left humanity with the creation of the first Superintelligence.

Meh. I'd rather have the FAI make the big-picture decisions, rather than some corrupt/flawed group of human officials falling prey to the usual bias in human thinking. Either way, I am not the one making the decisions, so what does it matter to me? At least FAI would actually make good decisions.

[-]Mahdi Complex4y10

I didn't mean to make 1. sound bad. I'm only trying to put my finger on a crux. My impression is that most people doing prosaic alignment work have 2. in mind, even though MIRI/Bostrom/LW seem to believe that 1. is actually what we should be aiming towards. Do prosaic alignment people think that work on human 'control' now will lead to scenario 1 in the long run, or do they just reject scenario 1?

[-]Victor Novikov4y20

I'm not sure I understand the "prosaic alignment" position well enough to answer this.

I guess, personally, I can see the appeal of scenario 2, of keeping a super-optimizer under control and using it in limited ways to solve specific problems. I also find that scenario incredibly terrifying, because super-optimizers that don't optimize for the full set of human values are dangerous.

1. Surrender to the Will of the Superintelligence

2. Riding the Techno-Leviathan