Transcription and Summary of Nick Bostrom's Q&A

by daenerys30 min read17th Nov 201110 comments


Q&A (format)

INTRO: From the original posting by Stuart_Armstrong:

Underground Q&A session with Nick Bostrom ( on existential risks and artificial intelligence with the Oxford Transhumanists (recorded 10 October 2011).

Below I (will) have a summary of the Q&A followed by the transcription. The transcription is slightly edited, mainly for readability. The numbers are minute markers. Anything followed by a (?) means I don't know quite what he said (example- attruing(?) program), but if you figure it out, let me know!

SUMMARY: I'll have a summary here by end of the day, probably.


Nick: I wanted to just [interact with your heads]. Any questions, really, that you have. To discuss with you. I can say what I’m working on right now which is this book on super-intelligence, not so much on the question of whether and how long it might take to develop machine intelligence that equals human intelligence, but rather what happens if and when that occurs. To forget human level machine intelligence, how quickly, how explosively will we get super-intelligenct, and how can you solve the control problem. If you build super-intelligence how can you make sure it will do what you want. That it will be safe and beneficial.

Once one starts to pull on that problem, it turns out to be quite complicated and difficult. That it has many aspects to it that I would be happy to talk about. Or if you prefer to talk about other things; existential risks, or otherwise, I’d be happy to do that as well. But no presentation, just Q&A. So you all have to provide at least the stimulus. So should I take questions or do you want…


Questioner: So what’s your definition of machine intelligence or super-intellegence AI… Is there like a precise definition there?

Nick: There isn’t. Now if you look at domain specific intelligence, there are already areas where machines surpass humans, such doing arithmetical calculations or chess. I think the interesting point is when machines equal humans in general intelligence or perhaps slightly more specifically in engineering intelligence. So if you had this general capability of being able to program creatively and design new systems... There is in a sense a point at which if you had sufficient capability of that sort, you have general capability.

Because if you can build new systems, even if all it could initially do is this type engineering work, you can build yourself a poetry module or build yourself a social skills module, if you have that general ability to build . So it might be that general intelligence or it might be that slightly more narrow version of that engineering type of intelligence is the key variable to look at. That’s the kind of thing that can unleash the rest. But “human-level intelligence”... that’s a vague term, and I think it’s important to understand that. It’s not necessarily the natural kind.


Questioner: Got a question that maybe should have waited til the end: There are two organizations, FHI and SIAI, working on this. Let's say I thought this was the most important problem in the world, and I should be donating money to this. Who should I give it to?


It's good. We've come to the chase!

I think there is a sense that both organizations are synergistic. If one were about to go under or something like that, that would probably be the one. If both were doing well, it's... different people will have different opinions. We work quite closely with a lot of the folks from SIAI.

There is an advantage to having one academic platform and one outside academia. There are different things these types of organizations give us. If you want to get academics to pay more attention to this, to get postdocs to work on this, that's much easier to do within academia; also to get the ear of policy-makers and media.

On the other hand, for SIAI there might be things that are easier for them to do. More flexibility, they're not embedded in a big bureaucracy. So they can more easily hire people with non-standard backgrounds without the kind of credentials that we would usually need, and also more grass-roots stuff like the community blog Less Wrong, is easier to do.

So yeah. I'll give the non-answer answer to that question.


Questioner: Do you think a biological component is necessary for an artificial intelligence to achieve sentience or something equivalent?

Nick: It doesn’t seem that that should be advantageous…If you go all the way back to atoms, it doesn’t seem to matter that it’s carbon rather than silicon atoms. Then you could wonder, instead of having the same atoms you run a simulation of everything that’s going on. Would you have to simulate biological processes? I don’t even think that’s necessary.

My guess (and Im not sure about this, I don’t have an official position or even a theory about what exactly the criteria are that would make a system conscious)…But my intuition is that If you replicated computational processes that goes on in a human brain, at a sufficient level of detail, where that sufficient level of detail might be roughly on the level of individual neurons and synapses, I think you would likely have consciousness. And it might be that it’s something weaker than that which would suffice. Maybe you wouldn’t need every neuron. Maybe you could simplify things and still have consciousness. But at least at that level it seems likely.

It’s a lot harder to say if you had very alien types of mental architecture. Something that wasn’t a big neural network but of normal machine intelligence that performs very well in a certain way, but using a very different method than a human brain. Whether that would be conscious as well? Much less sure. A limiting case would be a big lookup table that was physically impossible to realize, but you can imagine having every sort of situation possible described, and that program would run through until it found the situation that matched its current memory and observation and would read off which action it should perform. But that would be an extremely alien type of architecture. But would that have conscious experience or not? Even less clear. It might be that it would not have, but maybe the process of generating this giant look-up table would generate kinds of experiences that you wouldn’t get from actually implementing it or something like that. (?)


Questioner- This relates to AI being dangerous. It seems to me that while it would certainly be interesting if we were to get AI that were much more intelligent than a human being, its not necessarily dangerous.

Even if the AI is very intelligent it might be hard for it to get resources for it to actually do anything to be able to manufacture extra hardware or anything like that. There are obviously situations where you can imagine intelligence or Creative thinking can get you out of or get you further capability . So..

Nick: I guess it’s useful to identify two cases: One is sort of the default case unless we successfully implement some sort of safeguard or engineer it in a particular way in order to avoid dangers …So let’s think of a default just a bit: You have something that is super intelligent and capable of improving itself to even more levels of super intelligence…. I guess one way to get initial possibility of why this is dangerous is to think about why humans are powerful.. Why are we dominant on this planet? It’s not because we have stronger muscles or our teeth are sharper or we have special poison glands. It’s all because of our brains, which have enabled us to develop a lot of other technologies that give us in effect muscles that are stronger than the other animals…We have bulldozers and external devices and all the other things. And also, it enables us to coordinate socially and build up complicated society so we can act as groups. And all of this makes us supreme on this planet. We can argue with the case of bacteria which have their own domains where they rule. But certainly in the case of the larger mammals we are unchallenged because of our brains.

And the brains are not all that different from the brains of other animals. It might be that all these advantages we have are due to a few tweaks on some parameters that occurred in our ancestors a couple million years ago. And these tiny changes in the nature our intelligence that had these huge affects. So just prima facie it then seems possible that that if the system surpassed us by just a small amount that we surpass chimpanzees, it could lead to a similar kind of advantage in power. And if they exceeded our intelligence by a much greater margin, then all of that could happen in a more dramatic fashion

It’s true that you could have in principle an AI that was locked in a box, such that it would be incapable of affecting anything outside the box and in that sense it would be weak. That might be one of the safety methods one tries to apply that I've been thinking about.

Broadly speaking you can distinguish between two different approaches to solving the control problem, of making sure that super-intelligence, if it’s built wouldn’t cause harm. On one hand you have capability control measures where you try to limit what the AI is able to do. The most obvious example would be lock it in a box and limit its ability to interact with the rest of the world.

The other class of approach would be motivation selection methods where you would try to control what it wants to do. Where you build it in such a way that even if it has the power to do all this bad stuff, it would choose not to. But so far, there isn’t one method or even a combination of methods that it seems we can currently be fully convinced would work. There’s a lot more work needed...


Questioner: Human beings have been very successful. One feature of that that has been very crucial are our hands that have enabled us to get a start on working tools and so on. Even if an AI is running on some computer somewhere, that would be more analogous to a very intelligent creature which doesn’t have very good hands. It’s very hard for it to actually DO anything.

Maybe the in-the-box method is promising. Because if we just don’t give the AI hands, some way to actually do something..If all it can do is alter its own code, and maybe communicate infomationally. That seems...

Nick: So let’s be careful there… So clearly it’s not “hands” per se. If it didn’t have hands it could still be very dangerous, because there are other people with hands, that it could persuade to do its bidding. It might be that it has no direct effectors other than the ability to type very slowly, and then some human gatekeeper could read and choose to act on or not. Even that limited ability to affect the world might be sufficient if it had a super power in the domain of persuasion. So if it had an engineering super-power, it might then get all these other superpowers. And then if it were able to, in particular be a super skilled persuader, it could then get other accessories outside our system that could implement its designs.

You might have heard of this guy, Eliezer Yudkowsky, about 5 years back who ran a series of role playing exercises...The idea was one person should play the AI, pretend to be in a box. The other should play the human gatekeeper whose job was to not let the AI out of the box, but he has to talk with the AI for a couple of hours over the internet chat. This experiment was run five times, with EY playing the AI and different people playing the human gatekeeper. And for the most part people, who were intitially convinced that they would never let the AI out of the box, but in 3 of 5 cases, the experiment ended with the gatekeepers announcing yes, they would let the AI out of the box.

This experiment was run under conditions that neither party would be allowed to disclose the methods that were used, the main conversational sequence...sorta maintain a shroud of mystery. But this is where the human-level persuader has two hours to work on the human gatekeeper. It seems reasonable to be doubtful of the ability of humanity to keep the super-intelligent persuader in the box, indefinitely, for that reason.


Questioner: How hard do you think the idea of controlling the mentality of intelligence is, with something at least as intelligent as us, considering how hard it is to convince humans to act in a certain civilized way of life?

Nick: So humans sort of start out with a motivation system and then you can try to persuade them or structure incentives to behave in a certain way. But they don’t start out with a tabula rasa where you get to write in what a human’s values should be. So that’s made a difference. In the case of the super-intelligence of course once it already has unfriendly values and it has sufficient power, it will resist any attempt to corrupt its goal system as it would see it.


Questioner: You don’t think that like us, its experiences might cause it to question its core values as we do?

Nick: Well, I think that depends on how the goal system is structured. So with humans we don’t have a simple declarative goal structure list. Not like a simple slot where we have super goal and everything else is derived from that

Rather it’s like many different little people inhabit our skull and have their debates and fight it out and make compromises. And in some situations, some of them get a boost like permutations and stuff like that. Then over time we have different things that change what we want like hormones kicking in, fading out, all kinds of processes.

Another process that might affect us is what I call value accretion. The idea that we can have mechanisms that loads new values into us, as we go along. Like maybe falling in love is like that; Initially you might not value that person for their own sake above any other person. But once you undergo this process you start to value them for their own sake in a special way. So human have this mechanism that make us acquire values depending on our experiences.

If you were building a machine super intelligence and trying to engineer its goal systems so that it will be reliably safe and human friendly, you might want to go with something, more transparent where you have an easier time seeing what is happening, rather than have a complex modular minds with a lot of different forces battling it might want to have a more hierarchical structure.

Questioner: What do you think of the necessary…requisites for the conscious mind? What are the features?

Nick: Yes, I’m not sure. We’ve talked a little on that earlier. Suppose there is a certain kind of computation that is needed, that is really is the essence of mind. I’m sympathetic to the idea that something in the vicinity of that view might be correct. You have to think about exactly how to develop it. Then there is this stage of what is a computation.

So there is this challenge (I think it might go back to Hans Moravec but I think similar objections have been raised in philosophy against computationalism) where the idea is that if you have an arbitrary physical system that is sufficiently complicated, it could be a stone or a chair or just anything with a lot of molecules in it. And then you have this abstract computation that you think is what constitutes the implementation of the mind. Then there would be some mathematical mapping between all the parts in your computation and atoms in the chair so that you could artificially, through a very complicated mapping interpret the motions of the molecules in the chair in such a way that they would be seen as implementing the computation. It would not be any plausible mapping, not a useful mapping, but a bizarro mapping. Nonetheless if there were sufficiently limited parts there, you could just arbitrarily define some, by injection..

And clearly we don’t think that all these random physical objects implement the mind, or all possible minds.

So the lesson to me is that it seems that we need some kind of account of what it means to implement a computation that is not trivial and this mapping function between the abstract entity that is a sort of Turing program, or whatever your model of a computation is and the physical entity that decides to implement it to be some sort of non-trivial representation of what this mapping can look like

It might have to be reasonably simple. It might have to have certain counter-factual properties, so that the system would have implemented a related, but slightly different computation if you had scrambled the initial conditions of the system in a certain way, so something like that. But this is an open question in the philosophy of mind, to try to nail down what it means to implement the computation.


Questioner: To bring back to the goal and motivation approach to making an AI friendly towards us, one of the most effective ways of controlling human behavior, quite aside from goals and motivations , is to train them by instilling neuroses. It’s why 99.99% of us in this room couldn’t pee in our pants right now even if he really, really wanted to.

Is it possible to approach controlling an AI in that way or even would it be possible for an AI to develop in such a way that there is a developmental period in which a risk-reward system or some sort of neuroses instilment could be used to basically create these rules that an AI couldn’t break?

Nick: It doesn’t sound so promising because a neurosis is a complicated thing that might be a particular syndrome of a phenomenon that occurs in human- style mind, because of the way that humans’ minds are configured. It’s not clear there would be something exactly analogous to that in a cognitive system with a very different architecture.

Also, because neuroses, at least certain kinds of neuroses, are ones we would choose to get rid of if we could. So if you had a big phobia and there was a button that would remove the phobia, obviously you would press the button. And here we have this system that is presumably able to self-modify. So if it had this big hang up that it didn’t like, then it could reprogram itself to get rid of that.

This would be different than a top-level goal because top-level goal would be the criterion it produced to decide whether to take an action. In particular, like an action to remove the top level goal.

So generally speaking with reasonable and coherent goal architecture you would get certain convergent instrumental values that would crop up in a wide range of situations. One might be self preservation, not necessarily because you value your own survival for its own sake, but because in many situations you can predict that if you are around in the future you can continue to act in the future according to your goals, and that will make it more likely that the world will then be implementing your goals.

Another convergent instrumental value might be protection of your goal system from corruption (?) for very much the same reason. For even if you were around in the future but you have different goals from the ones you had now, you would now predict that that means in the future you will no longer be working towards realizing your current goals but maybe towards a completely different purpose, that would make it now less likely that your current goals would be realized. If your current goals are what you use as a criterion to choose an action, you would want to try to take actions that would prevent corruption of your goal system.

One might list a couple of other of the convergent instrumental values like intelligence amplification, technology perfection and resource acquisition. So this relates to why generic super-intelligence might be dangerous. It’s not so much that you have to worry that it would have human Unfriendliness in the sense of disliking human goals, that it would *hate* humans . The danger is that it wouldn’t *care* about humans. It would care about something different, like paperclips. But then if you have almost any other goals, like paperclips, there would be these other convergent instrumental reasons that you discover. For while your goal is to make as many paperclips as possible you might want to a) prevent humans from switching you off or tampering with your goal system or b) you might want to acquire as much resources as possible, including planets, and the solar system, and the galaxy. All of that stuff could be made into paperclips. So even with pretty much a random goal, you would end up with these motivational tendencies which would be harmful to humans.


Questioner: Appreciating the existential risks, what do you think about goals and motivations, and such drastic measures of control sort of a) ethically and b) as a basis of a working relationship?

Nick: Well, in terms of the working relationship one has to think about the differences with these kinds of the artificial being. I think there are a lot of (?) about how to relate to artificial agents that are conditioned on the fact that we are used to dealing with human agents, and there are a lot of things we can assume about the human.

We can assume perhaps that they don’t want to be enslaved. Even if they say that they want to be enslaved, we might think that deep inside of them, there is a sort of more genuine authentic self that doesn’t want to be enslaved. Even if some prisoner has been brainwashed to do the bidding of their master, maybe we say it’s not really good for them because it’s in their nature, this will to be autonomous. And there are other things like that, that don’t necessarily have to obtain for a completely artificial system which might not have any of that rich human nature that we have.

So in terms of what the good working relationship is, just as what we think of a good relationship with our word processor or email program. Not in these terms, as if you’re exploiting it for your ends, without giving it anything in return. If your email program had a will, presumably it would be the will to be a good and efficient email program that processed your emails properly. Maybe that was the only thing it wanted and cared about. So having a relationship with it would be a different thing.

There was another part of your question, about whether this would be right and ethical. I think if you are operating a new agent from scratch, and there are many different possible agents you could create, some of those agents will have human style values; they want to be independent and respected. Other agents that you could create would have no greater desire than to be of service. Others would just want paperclips. So if you step back, and look at which of these options we should decide, then looking at the question of moral constraints on which of these are legitimate.

And I’m not saying that those are trivial, I think there are some deep ethical questions here. However in the particular scenario where we are considering the creation of a single super intelligence the more pressing concern would be to ensure that it doesn’t destroy everything else, like humanity and its future. Now, if you have a different scenario, like instead of this one uber-mind rising ahead, you have many minds that become smarter and smarter that rival humans and then gradually exceed them

Say an uploading scenario where you start with very slow software, where you have human like minds running very slowly. In that case, maybe how we should relate to these machine intellects morally becomes more pressing. Or indeed, even if you just have one, but in the process of figuring out what to do it creates “thought crimes”.

If you have a sufficiently powerful mind maybe you have thoughts themselves would contain structures that are conscious. This sounds mystical, but imagine you are a very powerful computer and one of the things you are doing is you are trying to predict what would happen in the future under different scenarios, and so you might play out a future

And if those simulations you are running inside of this program were sufficiently detailed, then they could be conscious. This comes back to our earlier discussion of what is conscious. But I think a sufficiently detailed computer simulation of the mind could be conscious

You could then have a super intelligence that could process by thinking about things could create sentient beings, maybe millions or billions or trillions of them, and their welfare would then be a major ethical issue. They might be killed when it stops thinking about them, or they might be mistreated in different ways. And I think that would be an important ethical complication in this context


Questioner: Eliezer suggests that one of the many problems with arbitrary stamps in AI space is that human values are very complex. So virtually any goal system will go horribly wrong because it will be doing things we don’t quite care about, and that’s as bad as paperclips. How complex do you think human values will be?

Nick: It looks like human values are very complicated. Even if they were very simple, even if it turned out its just pleasure say, which compared to other things of what has value, like democracy flourishing and art. As far as we can think of values that’s one of the more simplistic possibilities. Even that if you start to think of it from a physicalistic view, and you have to now specify which atoms have to go how and where for there to be pleasure. It would be a pretty difficult thing to write down, Like the Schrödinger Equation for pleasure.

So in that sense it seems fair that our values are very complex. So there are two issues here. There is a kind of technical problem of figuring out that if you knew what our values are, in the sense that we think that we normally know what our values are, how we could get the AI to share those values, like pleasure or absence of pain or anything like that.

And there is the additional philosophical problem which is if we are unsure of what are values are, if we are groping about in axiology trying to figure out how much to value different things, and maybe there are values we have been blind to today, then how do you also get all of that on board, on top of what we already think has value, that potential of moral growth? Both of those are very serious problems and difficult challenges.

There are a number of different ways you can try to go. One approach that is interesting is what we might call is indirect normativity. Where the idea is rather than specifying explicitly what you want the AI to achieve, like maximizing pleasure while respecting individual autonomy and pay special attention to the poor. Rather than creating a list, what you try to do instead is specify a process or mechanism by which the AI could find out what it is supposed to do.

One of these ideas that has come out is this idea Coherent Extrapolated Volition, where the idea is if you could try to tell the AI to do that which we would have asked it to do if we had thought about the problem longer, and if we had been smarter, and if we had some other qualifications. Basically, if you could describe some sort of idealized process whereby we at the end, if we underwent that process would be able to create a more detailed list, then maybe point the AI to that and make the AI’s value to run this process and do what comes out of the end of that, rather than go with where our current list gets us about what we want to do and what has value.


Questioner: Isn’t there are risk that.. the AI would decide that if we thought about it for 1000 years really, really carefully, that we would just decide to just let the AIs to take over?

Nick: Yeah, that seems to be a possibility. And then that raises some interesting questions. Like if that is really what our CEV would do. Let’s assume that everything has been implemented in the right way, like there is no flaw on the realization of this. So how should we think about this?

Well on the one hand, you might say if this is really what our wiser selves would want. What we would want if we were saved from these errors and illusions we are suffering under, then maybe we should go ahead with that. On the other hand, you could say, this is really a pretty tall order. That we’re supposed to sacrifice not just a bit, but ourselves and everybody else, for this abstract idea that we don’t really feel any strong connection to. I think that’s one of the risks, but who knows what will be the outcome of this CEV?

And there are further qualms one might have that need to be spelled out. Like exactly whose volition is it that is supposed to be extrapolated. Humanity’s? Well then, who is humanity? Like does it include past generations for example? How far back? Does it include embryos that died?

Who knows whether the core of humanity is nice? Maybe there are a lot of suppressed sadists out there, that we don’t realize, because they know that they would be punished by society. Maybe if they went through this procedure, who knows what would come out?

So it would be dangerous to run something like that, without some sort of safeguard check at the end. On the other hand, there is worry that if you put in too many of these checks, then in effect you move the whole thing back to what you want now. Because if you were allowed to look at an extrapolation, see whether you like it, or if you dislike it you run another one by changing the premises and you were allowed to keep going like that until you were happy with the result then basically it would be you now, making the decision. So, it’s worth thinking about, whether there is some sort of compromise or blend that might be the most appealing.


Questioner: You mentioned before about a computer producing sentience itself in running a scenario. What are the chances that that is the society that we live in today?

Nick: I don’t know, so what exactly are the chances? I think significant. I don’t know, it’s a subjective judgment here. maybe less than 50%? Like 1 in 10?

There’s a whole different topic, maybe we should save that topic for a different time..


Questioner: If I wanted to study this area generally, existential risk, what kind of subject would you recommend I pursue? We’re all undergrads, so after our bachelors we will start on master or go into a job. If I wanted to study it, what kind of master would you recommend?

Nick: Well part of it would depend on your talent, like if you’re a quantitative guy or a verbal guy. There isn’t really an ideal sort of educational program anywhere, to deal with these things. You’d want to get a fairly broad education, there are many fields that could be relevant. If one looks at where people are coming from so far that have had something useful to say, a fair chunk of them are philosophers, some computer scientists, some economists, maybe physics.

Those fields have one thing in common in that they are fairly versatile. Like if you’re doing Philosophy, you can do Philosophy of X, or of Y, or of almost anything. Economics as well. It gives you a general set of tools that you can use to analyze different things, and computer science has these ways of thinking and structuring a problem that is useful for many things

So it’s not obvious which of those disciplines would be best, generically. I think that would depend on the individual, but then what I would suggest is that while you were doing it, you also try to read in other areas other than the one you were studying. And try to do it at a place where there are a lot of other people around with a support group and advisor that encouraged you and gave you some freedom to pursue different things.


Questioner: Would you consider AI created by human beings as some sort of consequence of evolutionary process? Like in a way that human beings tried to overcome their own limitations and as it’s a really long time to get it on a dna level you just get it quicker on a more computational level?

Nick: So whether we would use evolutionary algorithms to produce super- intelligence or..?

Questioner: If AI itself is part of evolution..

Nick: So there’s kind of a trivial sense in which if we evolved and we created…then obviously evolution had a part to play in the overall causal explanation of why we’re going to get machine intelligence at the end. Now, for evolution to really to exert some shaping influence there have to be a number of factors at play. There would have to be a number of variants created that are different and then compete for resources and then there is a selection step. And for there to be significant evolution you have to enact this a lot of times.

So whether that will happen or not in the future is not clear at all. If you have a signal tone for me, in that if a world order arises at a top level. Where there is only one decision making agency, which could be democratic world government or AI that rules everybody, or a self-enforcing moral code, or tyranny or a nice thing or bad thing

But if you have that kind of structure there will at least be, in principal ability, for that unitary agent to control evolution within itself, like it could change selection pressures by taxing or subsidizing different kinds of life forms.

If you don’t have a singleton then you have different agencies that might be in competition with one another, and in principle in that scenario evolutionary pressures can come into play. But I think the way that it might pan out would be different from the way that we’re used to seeing biological evolution, so for one thing you might have these potentially immortal life forms, that is they have software minds that don’t naturally die, that could modify themselves.

If they knew that their current type, if they continued to pursue their current strategy would be outcompeted and they didn’t like that, they could change themselves immediately right away rather than wait to be eliminated.

So you might get, if there were to be a long evolutionary process ahead and agents could anticipate that, you might get the effects of that instantaneously from anticipation.

So I think you probably wouldn’t see the evolutionary processes playing out but there might be some of the constraints that could be reflected more immediately by the fact that different agencies had to pursue strategies that they could see would be viable.


Questioner: So do you think it’s possible that our minds could be scanned and then be uploaded into a computer machine in some way and then could you create many copies of ourselves as those machines?

Nick: So this is what in technical terminology is “whole brain emulation” or in more popular terminology “uploading”. So obviously this is impossible now, but seems like it’s consistent with everything we know about physics and chemistry and so forth. So I think that will become feasible barring some kind of catastrophic thing that puts a stop to scientific and technological progress.

So the way I imagine it would work is that you take a particular brain, freeze it or vitrify it, and then slice it up into thin slices that would be fed through some array of microscopes that would scan each slice with sufficient resolution and then automated image analysis algorithms would work on this to reconstruct the 3 dimensional neural network that your own organic brain implemented and I have this sort of information structure in a computer.

At this point you need computational neuroscience to tell you what each component does. So you need to have a good theory of what say a pyramidal cell does, what a different kind of…And then you would combine those little computational models of what each type of neuron does with this 3D map of the network and run it. And if everything went well you would have transferred the mind, with memories and personalities intact to the computer. And there is an open question of just how much resolution would you need to have, how much detail you would need to capture of the original mind in order to successfully do this. But I think there would be some level of detail which as I said before, might be on the level of synapses or thereabouts, possibly higher, that would suffice. So then you would be able to do this. And then after you’re software , you could be copied, or speeded up or slowed down or paused or stuff like that


Questioner: There has been a lot of talk of controlling the AI and evaluating the risk. My question would be assuming that we have created a far more perfect AI than ourselves is there a credible reason for human beings to continue existing?

Nick: Um, yeah, I certainly have the reason that if we value our own existence we seem to have a…Do you mean to say that there would be a moral reason to exist or if we would have a self interested reason to exist.

Questioner: Well I guess it would be your opinion..

Nick: My opinon is that I would rather not see the genocide of the entire human species. Rather that we all live happily ever after. If those are the only two alternatives, I think yeah! Let’s all live happily ever after! Is where I would come down on that.


Questioner: By keeping human species around You’re going to have a situation presumably where you have extremely, extremely advanced AIs where they have few decades or few centuries or whatever and they will be far, far beyond our comprehension, and even if we still integrate to some degree with machines (mumble) biological humans then they’ll just be completely inconceivable to us. So isn’t there a danger that our stupidity will hamper their perfection?

Nick: Would hamper their perfection?? Well there’s enough space for there to be many different kinds of perfection pursued. Like right now we have a bunch of dust mites crawling around everywhere, but not really hampering our pursuit of art or truth or beauty. They’re going about their business and we’re going about ours.

I guess you could have a future where there would be a lot of room in the universe for planetary sized computers thinking their grand thoughts while…I’m not making a prediction here, but if you wanted to have a nature preserve, with original nature or original human beings living like that, that wouldn’t preclude the other thing from happening..

Questioner: Or a dust mite might not hamper us, but things like viruses or bacteria just by being so far below us (mumble). And if you leave humans on a nature preserve and they’re aware of that, isn’t there a risk that they’ll be angry at the feeling of being irrelevant at the grand scheme of things?

Nick: I suppose. I don’t think it would bother the AI that would be able to protect itself, or remain out of reach. Now it might demean the remaining humans if we were dethroned from this position of kings, the highest life forms around, that it would be a demotion, and one would have to deal with that I suppose.

It’s unclear how much value to place on that. I mean right now in this universe which looks like it’s infinite somewhere out there are gonna be all kinds of things including god like intellects and everything in between that are already outstripping us in every possible way.

It doesn’t seem to upset us terribly; we just get on with it. So I think people will have to make some psychological..I’m sure we can adjust to it easily. Now it might be from some particular theory of value that this might be a sad thing for humanity. That we are not even locally at the top of the ladder.

Questioner: If rationalism was true, that is if it were irrational to perform wrong acts. Would we still have to worry about super-intelligence? It seems to me that we wouldn’t have.

Nick: Well you might have a system that doesn’t care about being rational, according to that definition of rationality. So I think that we would still have to worry


Questioner: Regarding trying to program AI without values, (mumbles) But as I understand it, what’s considered one of the most promising approach in AI now is more statistical learning type approaches.. And the problem with that is if we were to produce an AI with that, we might not understand its inner workings enough to be able to dive in and modify it in precisely the right way to give it an unalterable list of terminal values.

So if we were to end up with some big neural network that we trained in some way and ended up with something that could perform as well as humans in some particular task or something. We might be able to do that without knowing how to alter it to have some particular set of goals.

Nick: Yeah, so there are some things there to think about. One general worry that one needs to bear in mind if one tries that kinds of approach is we might give it various examples like this is a good action and this is a bad action in this context, and maybe it would learn all those examples then the question is how would it generalize to other examples outside this class?

So we could test it we could divide our examples initially into classes and train it on one and test its performance on the other, the way you would do to cross-validate. And then we think that means other cases that it hasn’t seen it would have the same kind of performance. But all the cases that we could test it on would be cases that would apply to its current level of intelligence. So presumably we’re going to do this while it’s still at human or less than human intelligence. We don’t want to wait to do this until it’s already super-intelligent.

So then the worry is that even if it were able to analyze what to do in a certain way in all of these cases, it’s only dealing with all of these cases in the training case, when it’s still at a human level of intelligence. Now maybe once it becomes smarter it will realize that there are different ways of classifying these cases that will have radically different implications for humans.

So suppose that you try to train it to… this was one of the classic example of a bad idea of how to solve the control problem: Lets train the AI to want to make people smile, what can go wrong with that? So we train it on different people and if they smile when it does something that’s like a kind of reward; it gets strength in those positions that led to the behavior that made people smile. And frowning would move the AI away from that kind of behavior. And you can imagine that this would work pretty well at a primitive state where the AI will engage in more pleasing and useful behavior because the user will smile at it and it will all work very well. But then once the AI reaches a certain level of intellectual sophistication it might realize that It could get people to smile not just by being nice but also by paralyzing their facial muscles in that constant beaming smile.

And then you would have this perverse instantiation of the constant values all along the value that it wants to make people smile, but the kinds of behaviors it would pursue to achieve this goal would suddenly radically change at a certain point once the new set of strategies became available to it, and you would get this treacherous turn, which would be dangerous. So that’s not to dismiss that whole category of approaches altogether. One would have to think through quite carefully, exactly how one would go about that.


There’s also the issue of, a lot of the things we would want it to learn, if we think of human values and goals and ambitions. We think of them using human concepts, not using basic place atom A to zed in a certain order, But we think like promote peace, encourage people to develop and achieve…These are things that to understand them we really need to have human concept, which a sub-human AI will not have, it’s too dumb at that stage to have that. Now once it’s super-intelligent it might easily understand all human concepts but then it’s too late. It already needs to be friendly before that. So there might only be this brief window of opportunity where its roughly human leve,l where its still safe enough not to resist our attempt to indoctrinate it but smart enough that it can actually understand what we are trying to tell it.

And again were going to have to be very careful to make sure that we can bring the system up to that interval and then freeze its development there and try to load the values in before boot strapping it farther.

And maybe(this was one of the first questions) its intelligence will not be human level in the sense of being similar to a human at any one point. Maybe it will immediately be very good at chess but very bad at poetry and then it has to reach radically superhuman levels of capability in some domains before other domains even reach human level. And in that case it’s not even clear that there will be this window of opportunity where you can load in the values. So I don’t want to dismiss that, but that’s like some additional things that one needs to think about, if one tries to develop that.


Questioner: How likely is it that we will have the opportunity in our lifetimes to become immortal by mind uploading?

Nick: Well first of all, by immortal here we mean living for a very long time, rather than literally never dying, which is a very different thing that would require our best theories of cosmology to turn out to be false for something like that.

So living for a very long time: Im not going to give you a probability in the end. But I can say some of the things that…Like first we would have to avoid most kinds of things like existential catastrophe that could put an end to this.

So, if you start with 100% and you remove all the things that could go wrong, so first you would have to throw away whatever total level of existential risk is, integrated over all time. Then there is the obvious risk that you will die before any of this happens, which seems to be a very substantial risk. Now you can reduce that by signing up for cryonics, but that’s of course an uncertain business as well. And there could be sub-existential catastrophes that would put an end to a lot of things like a big nuclear war or pandemics.

And then I guess there are all these situations in which not everybody who is still around gets the opportunity to participate in what came after. Even though what came after doesn’t count as an existential catastrophe… And [it can get] even more complicated, like if you took into account the simulation hypothesis, which we decided not to talk about today.


Q: Is there a particular year we should aim for?

Nick: As for the timelines, truth is we don’t know. So you need to think about a very smeared out probability distribution. And really smear it, because things could happen surprisingly sooner like some probability 10 years from now or 20 years now but probably more probable at 30, 40, 50 years but some probability at 80 years or 200 years..

There is just not good evidence that human beings are very good at predicting with precision these kinds of things far out in the future.

Questioner: (hard to understand) How intelligent can we really get. … we already have this complexity class of problems that we can solve or not…

Is it fair to believe that a super-intelligent machine can be actually be that exponentially intelligent... this is very close to what we could achieve …A literal definition of intelligence also, but..

Nick: Well in a sort of cheater sense we could solve all problems, sort of like everything a Turing Machine could take like a piece of paper and..

a) It would take too long to actually do it, and if we tried to do it, there are things that would probably throw us off before we have completed any sort of big Turing machine simulation

There is a less figurative sense in which our abilities are already indirectly unlimited. That is, if we have the ability to create super intelligence, then in a sense we can do everything because we can create this thing that then solves the thing that we want solved. So there is this sequence of steps that we have to go through, but in the end it is solved.

So there is this level of capability that means that once you have that level of capability your indirect reach is universal, like anything that could be done, you could indirectly achieve, and we might have already surpassed that level a long time ago, save for the fact that we are sort of uncoordinated on a global level and maybe a little bit unwise.

But if you had a wise singleton then certainly you could imagine us plotting a very safe course, taking it very slowly and in the end we could be pretty confident that we would get to the end result. But maybe neither of those ideas are what you had in mind. Maybe you had more in mind The question of just how smart, in everyday sort of smart could a machine be,. So just how much more effective at social persuasion, to take one particular thing, than the most persuasive human.

So that we don’t really know. If one has a distribution of human abilities, and it seems like the best humans can do a lot better, in our intuitive sense of a lot, than the average humans. Then it would seem very surprising if the best humans like the top tenth of a percent had reached the upper limit of what was technologically feasible, that would seem to be an amazing coincidence. So one would then expect for the maximum achievable to be a lot higher. But exactly how high we don’t know.

So two more questions:


Q: Just like we are wondering about super-intelligent being, is it possible that that super-intelligent will worry about another super-intelligent being that it will create? Isn’t that also recursive?

Nick: So you consider where one AI designs another AI that’s smarter and then that designs another.

But it might not be clearly distinguishable from the scenario where we have one AI that modifies itself so that it ends up smarter. Whether you call it the same or different, it might be an unimportant difference.

Last question. This has to be super profound question.


Q: So my question is why should we even try to build a super-intelligence?

Nick: I don’t think we should now, do that. If you took a step back and thought what would a sane species do, well they would first figure out how to solve the control problem, and then they would think about it for a while to make sure that they really had the solution right and they hadn’t just deluded themselves to how to solve it, and then maybe they would build a super-intelligence.

So that’s what the sane species will do, now what humanity will do is try to do everything they can as soon as possible, so there are people who have tried to build it as we speak, in a number of different places on earth, and fortunately it looks very difficult to build it with current technology. But of course it’s getting easier over time, computers get better, computer science, the state of the art advances, we learn more about how the human brain works.

So every year it gets a little bit easier, from some unknown very difficult level, it gets easier and easier. So at some point it seems someone will probably succeed at doing it. If the world remains sort of uncoordinated and uncontrolled as it is now, it’s bound to happen soon after it becomes feasible. But we have no reason to accelerate that even more than its already happening ...

So we were thinking about what would a powerful AI thing do that had just come into existence and it didn’t know very much yet, but it had a lot of clever algorithms and a lot of processing power. Someone was suggesting maybe it would move around randomly, like a human baby does, to figure out how things move, how it can move its actuators.

Then we had a discussion if that was a wise thing or not.

But if you think about how the human species behave, we are really behaving very much like a baby were sort of moving and shaking everything that moves, just to see what happens. And the risk is that we are not in the nursery with a kind mother who has put us in a cradle, but that we are out in the jungle somewhere screaming at the top of our lungs, and maybe just alerting the lions to their supper.

So let’s wrap up. I enjoyed this a great deal, so thank you for your questions.


10 comments, sorted by Highlighting new comments since Today at 6:26 PM
New Comment

"the main specific intelligence"

He actually said "domain-specific intelligence".

That makes WAY more sense!

You might have heard of this guy EY about 5 years back who ran a series of role playing exercises...

He didn't actually say 'this guy eee why' did he?

Hans Morivick


a sort of attruing (?) program

"a sort of Turing program"

Like if you’re doing Philosohpy, you can do Philosophy of X, or of Y, or of almost anything


Also, not all the questions are bolded.

Some more along those lines:

So there’s kind of a triggeral(?) sense...

Probably "trivial".

There is a less figurative science...

Probably "a less figurative sense".

how much more affective at social persuasion...


series problems -> serious problems

Nice work, upvoted!

Could you do something about the font size? It looks smaller than is standard for LW.

(Fixed formatting.)

Also, I think that there's an unnecessary amount of space at the bottom of the post.

Thanks everyone! I'll get all your corrections in tomorrow :)

EDIT- All corrections have been made. Thanks!...I am unsure if I want to post a summary or not though. I've got it edited down to a little less than half the length right now, but if I can't get it to be at least a third of the length, then there's really no point to it. I play around with a little more, and we'll see. Mainly its just a PITA.

But if you had a wise single form (?) then certainly you could imagine us plotting a very safe course

Wise singleton, perhaps?

The implication is that if you have a bunch of different agents going their own ways with no powerful agreement between them, then any one of them might change into something potentially dangerous to all of them; but if there's initially a single agent or some way for all agents to coordinate, then they can put into place strong enough safeguards to avoid those problems...

P.S. A thousand thanks for doing this!