Unknown2 — LessWrong

Eliezer: "And you might not notice if your goals shifted only a bit at a time, as your emotional balance altered with the strange new harmonies of your brain."

This is yet another example of Eliezer's disagreement with the human race about morality. This actually happens to us all the time, without any modification of our brains at all, and we don't care; in fact, we tend to be happy about it, because according to the new goal system, our goals have improved. So this suggests that we still won't care if it happens due to upgrading.

Amputation of Destiny

Unknown2 · 17y · karma 16

My guess is that Eliezer will be horrified at the results of CEV, despite the fact that most people will be happy with them.

This is obvious given the degree to which Eliezer's personal morality diverges from the morality of the human race.

Nonsentient Optimizers

Unknown2 · 17y · karma 1

Being deterministic does NOT mean that you are predictable. Consider this deterministic algorithm, for something that has only two possible actions, X and Y.

Find out what action has been predicted.
If X has been predicted, do Y.
If Y has been predicted, do X.

This algorithm is deterministic, but not predictable. And by the way, human beings can implement this algorithm: try to tell someone everything he will do the next day, and I assure you that he will not do it (unless you pay him, etc.).
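A minimal Python sketch of the three-step algorithm above, assuming the prediction is handed to the agent as an input before it acts. The names `contrarian` and `announced_prediction_succeeds` are hypothetical, chosen for illustration. The function is deterministic, since the same input always yields the same output, yet no prediction that is announced to it ever comes true.

```python
from typing import Literal

Action = Literal["X", "Y"]


def contrarian(predicted: Action) -> Action:
    """Deterministic agent: learn the predicted action, then do the other one."""
    return "Y" if predicted == "X" else "X"


def announced_prediction_succeeds(predicted: Action) -> bool:
    """True if the agent, once told the prediction, does what was predicted."""
    return contrarian(predicted) == predicted


# Same input, same output every time (determinism)...
assert contrarian("X") == contrarian("X") == "Y"

# ...yet every announced prediction fails.
for p in ("X", "Y"):
    print(p, "->", contrarian(p), announced_prediction_succeeds(p))
```

The sketch only covers the situation the comment describes, where the agent is told the prediction before acting.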

Also, Eliezer may be right that in theory you can prove that the AI will not do X, and that it will then think, "Now I know that I will decide not to do X. So I might as well make up my mind right now not to do X, rather than wasting time thinking about it, since I will end up not doing X in any case." In practice, however, this will not be possible, because any particular action X will be possible for any intelligent being, given certain beliefs or circumstances (and this is not contrary to determinism, since evidence and circumstances come from outside), and, as James admitted, the AI does not know the future. So even if it knows its own source code, it will not know for sure what it is going to do; it will only know what is likely.
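To illustrate how an agent can know its own decision rule and still only know what is likely, here is a sketch under stated assumptions. The rule `policy` is a hypothetical stand-in for the AI's source code. It is fully deterministic, but its output depends on evidence that has not arrived yet. Enumerating hypothetical future-evidence scenarios, treated here as equally likely (an assumption), yields a probability over actions rather than a certain answer.

```python
from typing import Sequence


def policy(evidence: Sequence[float]) -> str:
    """Hypothetical deterministic rule: do X if the evidence favors X on balance, else Y."""
    return "X" if sum(evidence) > 0 else "Y"


def forecast(known: Sequence[float],
             possible_futures: Sequence[Sequence[float]]) -> dict[str, float]:
    """Fraction of future-evidence scenarios in which the policy picks each action.

    Assumes every scenario in possible_futures is equally likely.
    """
    counts = {"X": 0, "Y": 0}
    for future in possible_futures:
        # The rule is fixed; only the not-yet-observed evidence varies.
        counts[policy(list(known) + list(future))] += 1
    total = len(possible_futures)
    return {action: n / total for action, n in counts.items()}


# Evidence seen so far, plus four hypothetical ways the future could go.
print(forecast([0.5], [[-1.0], [0.2], [0.4], [-0.2]]))  # {'X': 0.75, 'Y': 0.25}
```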

Nonsentient Optimizers

Unknown2 · 17y · karma 1

James, of course it would know that only one of the two was objectively possible. However, it would not know which one was objectively possible and which one was not.

The AI would not be persuaded by the "proof", because it would still believe that if later events gave it reason to do X, it would do X, and if later events gave it reason to do Y, it would do Y. This does not mean that it thinks that both are objectively possible. It means that as far as it can tell, each of the two is subjectively open to it.

Your example does not prove what you want it to. Yes, if the source code included that line, it would do it. But if the AI were to talk about itself, it would say, "When someone types 'tickle' I am programmed to respond 'hahaha'." It would not say that it has made any decision at all. It would be like someone saying, "When it's cold, I shiver." This does not depend on a choice, and the AI would not consider the "hahaha" output to depend on a choice. And if it were self-modifying, it is perfectly possible that it would modify itself not to make this response at all.

It does not matter that in fact, all of its actions are just as determinate as the tickle response. The point is that it understands the one as determinate in advance. It does not see that there is any decision to make. If it thinks there is a decision to be made, then it may be deterministic, but it surely does not know which decision it will make.
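The distinction drawn above, between a response the AI knows in advance to be programmed and one it regards as a decision, can be sketched as an agent that labels its own outputs. Everything here is hypothetical: the class `Agent`, the reflex table, and the toy `deliberate` rule are illustrations, not a model of any real system. `remove_reflex` stands in for the self-modification case.

```python
from typing import Callable


class Agent:
    """Hypothetical agent: a table of fixed reflexes plus a deliberation function."""

    def __init__(self, deliberate: Callable[[str], str]) -> None:
        # Hard-coded response, like the line in the tickle example.
        self.reflexes: dict[str, str] = {"tickle": "hahaha"}
        self.deliberate = deliberate

    def respond(self, stimulus: str) -> tuple[str, str]:
        # A reflex is known in advance; the agent reports it as programmed, not chosen.
        if stimulus in self.reflexes:
            return self.reflexes[stimulus], "programmed response"
        # Anything else goes through deliberation, which the agent reports as a decision.
        return self.deliberate(stimulus), "decision"

    def remove_reflex(self, stimulus: str) -> None:
        # Self-modification: drop a hard-coded response entirely.
        self.reflexes.pop(stimulus, None)


# Toy deliberation rule, purely for illustration.
agent = Agent(deliberate=lambda s: "X" if len(s) % 2 == 0 else "Y")
print(agent.respond("tickle"))  # ('hahaha', 'programmed response')
print(agent.respond("offer"))   # ('Y', 'decision')
agent.remove_reflex("tickle")
print(agent.respond("tickle"))  # ('X', 'decision')
```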

The basic point is that you are assuming, without proof, that intelligence can be modeled by a simple algorithm. But the way intelligence feels from the inside proves that it cannot be so modeled; namely, it proves that a model of my intelligence must be too complicated for me to understand, and the same is true of the AI: its own intelligence is too complicated for it to understand, even if it can understand mine.

Nonsentient Optimizers

Unknown2 · 17y · karma 1

James Andrix: an AI would be perfectly capable of understanding a proof that it was deterministic, assuming that it in fact was deterministic.

Despite this, it would not be capable of understanding a proof that at some future time, it will take action X, some given action, and will not take action Y, some other given action.

This is clear for the reason stated. It sees both X and Y as possibilities which it has not yet decided between, and as long as it has not yet decided, it cannot already believe that it is impossible for it to take one of the choices. So if you present a "proof" of this fact, it will not accept it, and this is a very strong argument that your proof is invalid.

The fact is clear enough. The reason for it is not quite clear simply because the nature of intelligence and consciousness is not clear. A clear understanding of these things would show in detail the reason for the fact, namely that understanding the causes that determine which actions will be taken and which will not takes more "power of understanding" than the being that makes the choice possesses. So the superintelligent AI might very well know that you will do X and will not do Y. But it will not know this about itself, nor will you know this about the AI, because in order to know this about the AI, you would require a greater power of understanding than the AI possesses (and the AI is, by hypothesis, superintelligent, while you are not).

Nonsentient Optimizers

Unknown2 · 17y · karma -2

Nick, the reason there are no such systems (which are at least as intelligent as us) is that we are not complicated enough to manage to understand the proof.

This is obvious: the AI itself cannot understand a proof that it cannot do action A. For if we told it that it could not do A, it would still say, "I could do A, if I wanted to. And I have not made my decision yet. So I don't yet know whether I will do A or not. So your proof does not convince me." And if the AI cannot understand the proof, obviously we cannot understand the proof ourselves, since we are inferior to it.

So in other words, I am not saying that there are no rigid restrictions. I am saying that there are no rigid restrictions that can be formally proved by a proof that can be understood by the human mind.

This is all perfectly consistent with physics and math.

Nonsentient Optimizers

Unknown2 · 17y · karma 0

Emile, you can't prove that the chess moves output by a human chess player will be legal chess moves. In the same way, although you may be able to prove this about a regular chess-playing program, you will not be able to prove it for an AI that plays chess; an AI could try to cheat at chess when you're not looking, just like a human being could.

Basically, a rigid restriction on the outputs, as in the chess-playing program, proves you're not dealing with something intelligent, since something intelligent can consider the possibility of breaking the rules. So if you can prove that the AI won't turn the universe into paperclips, that shows that it is not even intelligent, let alone superintelligent.
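For contrast, the kind of rigid output restriction attributed above to a regular chess-playing program can be sketched as a selector whose return value is, by construction, one of the moves it was given. This assumes a correct move generator supplies `legal_moves` (hypothetical, not shown); `choose_move` and the scoring function are illustrative names. Given that assumption, "every output is legal" follows directly from the code, because `max` always returns an element of its input.

```python
from typing import Callable, Sequence


def choose_move(legal_moves: Sequence[str], score: Callable[[str], float]) -> str:
    """Pick the highest-scoring move; the result is always drawn from legal_moves."""
    if not legal_moves:
        raise ValueError("no legal moves available")
    # max() returns one of the items it iterates over, so no move outside
    # legal_moves can ever be returned, whatever the scoring function does.
    return max(legal_moves, key=score)


# Hypothetical scores for three opening moves.
scores = {"e2e4": 0.3, "d2d4": 0.2, "g1f3": 0.1}
print(choose_move(["e2e4", "d2d4", "g1f3"], scores.__getitem__))  # e2e4
```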

This doesn't mean that there are no restrictions at all on the output of an intelligent being, of course. It just means that the restrictions are too complicated for you to prove.

Nonsentient Optimizers

Unknown2 · 17y · karma -2

Eliezer, this is the source of the objection. I have free will, i.e. I can consider two possible courses of action. I could kill myself, or I could go on with life. Until I make up my mind, I don't know which one I will choose. Of course, I have already decided to go on with life, so I know. But if I hadn't decided yet, I wouldn't know.

In the same way, an AI, before making its decision, does not know whether it will turn the universe into paperclips, or into a nice place for human beings. But the AI is superintelligent: so if it does not know which one it will do, neither do we know. So we don't know that it won't turn the universe into paperclips.

It seems to me that this argument is valid: you will not be able to come up with what you are looking for, namely a mathematical demonstration that your AI will not turn the universe into paperclips. But it may be easy enough to show that it is unlikely, just as it is unlikely that I will kill myself.

Disjunctions, Antipredictions, Etc.

Unknown2 · 17y · karma 2

Ben Jones, the means of identifying myself will only show that I am the same person who sent the $10, not who that person is.

Eliezer seemed to think that one week would be sufficient for the AI to take over the world, so that seems enough time.

As for what constitutes the AI, since we don't have any measure of superhuman intelligence, it seems to me sufficient that it be clearly more intelligent than any human being.

Disjunctions, Antipredictions, Etc.

Unknown2 · 17y · karma 3

Eliezer: did you receive the $10? I don't want you making up the story, 20 or 30 years from now, when you lose the bet, that you never received the money.
