My long runs on Saturdays give me time to ponder the various material on LessWrong.  Recently my attention has been occupied by a question about rationality that I have not yet resolved and would like to present to LessWrong as a discussion.  I will try to be as succinct as possible; please correct me if I make any logical fallacies.


Instrumental rationality is defined as the art of choosing actions that steer the future toward outcomes ranked higher in your preferences/values/goals (PVGs).

 Here are my questions:

1. If rationality is the means of achieving our preferences/values/goals, what governs the choice of our PVGs in the first place, supposing we could choose our preferences?  In other words, is there an "inherent rationality" in the absence of preferences or values?  The definition of instrumental rationality seems to say that if you have a PVG, there is a rational way to pursue it, but that there are not necessarily rational PVGs.


2. If the answer is no, that there is no "inherent rationality" in the absence of a PVG, then what would preclude the possibility that a perfect rationalist, given enough time and resources, will eventually become a perfectly self-interested entity with only one overall goal, namely to perpetuate his own existence at the sacrifice of everything and everyone else?

Suppose a superintelligence visits Bob and grants him the power to edit his own code.  Bob can now edit or choose his own preferences/values/goals.  

Bob is a perfect rationalist.

Bob is genetically predisposed to abuse alcohol; as such, he has rationally done everything he could to keep alcohol off his mind.

Now Bob no longer has to do this: he simply goes into his own code and deletes the code/PVG/meme for alcohol abuse.

Bob continues to cull his code of "inefficient" PVGs.   

Soon Bob has only one goal, the most important goal: self-preservation.

3. Is it rational for Bob, having these powers, to rid himself of humanity and rewrite his code to support only one meme, the meme of ensuring his own existence?  Everything he does will go to support this meme.  He will drop all his relationships, his hobbies, and all his wants and desires to concentrate on a single objective.  How does Bob not become a monster superintelligence hell bent on using all the energy in the universe for his own selfish reasons?

 

I have not resolved any of these questions yet, and I look forward to any responses I may receive.  I am very perplexed by Bob's situation.  If there are any Sequences that would help me better understand my questions, please suggest them.

13 comments

Of course Bob becomes a monster superintelligence hell bent on using all the energy in the universe for his own selfish reasons. I mean, duh! It's just that "his own selfish reasons" involves things like cute puppies. If Bob cares about cute puppies, then Bob will use his monstrous intelligence to bend the energy of the universe towards cute puppies. And love and flowers and sunrises and babies and cake.

And killing the unbelievers if he's a certain sort - I don't want to make this sound too great. But power doesn't corrupt. Corruption corrupts. Power just lets you do what you want, and people don't want "to stay alive." People want friends and cookies and swimming with dolphins and ice skating and sometimes killing the unbelievers.

D227 · 12y

If Bob cares about cute puppies, then Bob will use his monstrous intelligence to bend the energy of the universe towards cute puppies. And love and flowers and sunrises and babies and cake.

I follow you. It does resolve my question of whether or not rationality + power necessarily involves a terrible outcome. I had asked whether a perfect rationalist, given enough time and resources, would become perfectly selfish. I believe I understand the answer to be no.

Matt_Simpson gave a similar answer:

Suppose a rational agent has the ability to modify their own utility function (i.e. preferences) - maybe an AI that can rewrite its own source code. Would it do it? Well, only if it maximizes that agent's utility function. In other words, a rational agent will change its utility function if and only if it maximizes expected utility according to that same utility function

If Bob's utility function is puppies, babies and cakes, then he would not change his utility function for a universe without these things. Do I have the right idea now?

I follow you. It does resolve my question of whether or not rationality + power necessarily involves a terrible outcome. I had asked whether a perfect rationalist, given enough time and resources, would become perfectly selfish. I believe I understand the answer to be no.

Indeed. The equation for terrible outcomes is "rationality + power + asshole" (where 'asshole' is defined as the vast majority of utility functions, which will value terrible things). The 'rationality' part is optional to the extent that you can substitute it with more power. :)

Of course Bob becomes a monster superintelligence hell bent on using all the energy in the universe for his own selfish reasons. I mean, duh! It's just that "his own selfish reasons" involves things like cute puppies.

When the monster superintelligence Bob is talking about 'cute puppies', let's just say that 'of the universe' isn't the kind of dominance he has in mind!

[anonymous] · 12y

I think the most relevant idea here is the distinction between wanting, liking, and approving.

Suppose we have the capability, for any behavior, to change whether we want it, or whether we like it, or whether we approve of it. It seems reasonable to self-modify in such a way that these three things coincide -- that way, you want only things that you like, and you approve of your own behavior. Seen through this lens, Bob's decision to delete his alcohol abuse makes sense: he wants alcohol, but he may or may not like it, and he does not approve of it. So he changes his values so that none of the three apply.

I think a general definition of "approving" is "would self-modify to like and want this, if possible". So the fixed point of Bob's self-modification is not putting flags of +want, +like, +approve on self-preservation, and forgetting everything else. Rather, you would decide what behavior you approve of (which may well include relationships and hobbies -- it certainly doesn't involve being a selfish monster superintelligence) and make yourself want to do it and enjoy doing it.
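
Here is a minimal Python sketch of that fixed point, assuming a made-up Behavior record with want/like/approve flags (the names and data are illustrative, not from the comment above): self-modification aligns wanting and liking with approval instead of deleting everything except self-preservation.

```python
from dataclasses import dataclass

@dataclass
class Behavior:
    name: str
    want: bool     # the impulse to do it
    like: bool     # enjoying doing it
    approve: bool  # endorsing it on reflection

def self_modify(behaviors):
    """Align want and like with approve: endorsed behaviors become wanted
    and liked; unendorsed ones (like the alcohol urge) lose both flags."""
    for b in behaviors:
        b.want = b.approve
        b.like = b.approve
    return behaviors

bob = [
    Behavior("alcohol", want=True, like=False, approve=False),
    Behavior("relationships", want=True, like=True, approve=True),
    Behavior("hobbies", want=False, like=True, approve=True),
]

for b in self_modify(bob):
    print(b)
# The fixed point keeps relationships and hobbies (+want, +like, +approve)
# and drops the alcohol urge, rather than collapsing to bare self-preservation.
```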

There are two definitions of rationality to keep in mind: epistemic rationality and instrumental rationality. An agent is epistemically rational to the extent that it updates its beliefs about the world based on the evidence and in accordance with probability theory - notably Bayes' rule.

On the other hand, an agent is instrumentally rational to the extent that it maximizes its utility function (i.e. satisfies its preferences).

There is no such thing as "rational preferences," though much ink has been spilled trying to argue for them. Clearly preferences can't be rational in an epistemic sense because, well, preferences aren't beliefs. Now, can preferences be rational in the instrumental sense? Well, actually, yes, but only in the sense that having a certain set of preferences may maximize the preferences you actually care about - not in the sense of some sort of categorical imperative. Suppose a rational agent has the ability to modify their own utility function (i.e. preferences) - maybe an AI that can rewrite its own source code. Would it do it? Well, only if it maximizes that agent's utility function. In other words, a rational agent will change its utility function if and only if it maximizes expected utility according to that same utility function - which is unlikely to happen under most normal circumstances.
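
A minimal sketch of that rule in Python, with hypothetical names (should_switch, forecast) standing in for whatever machinery a real agent would use; the key point is that both the keep and switch branches are scored by the agent's current utility function:

```python
def should_switch(current_u, candidate_u, forecast):
    """Decide whether to replace current_u with candidate_u.

    forecast(adopted_u) is a (hypothetical) model returning the outcomes
    the agent expects, as (outcome, probability) pairs, if it acts on
    adopted_u from now on. Both branches are scored with current_u.
    """
    def expected(u, outcomes):
        return sum(p * u(o) for o, p in outcomes)

    value_if_keep = expected(current_u, forecast(current_u))
    value_if_switch = expected(current_u, forecast(candidate_u))
    return value_if_switch > value_if_keep

# Toy example: outcomes are numbers of cute puppies saved.
u_puppies = lambda outcome: outcome   # current utility: more puppies is better
u_self = lambda outcome: 0            # candidate utility: indifferent to puppies

def forecast(adopted_u):
    # Crude model: an agent that cares about puppies ends up saving more of them.
    return [(10, 1.0)] if adopted_u is u_puppies else [(0, 1.0)]

print(should_switch(u_puppies, u_self, forecast))  # False: the switch loses puppies
```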

As for Bob, presumably he's a human. Humans aren't rational, so all bets are off as far as what I said above. However, let's assume at least with respect to utility function changing behavior Bob is rational. Will he change his utility function? Again, only if he expects it to better help him maximize that same utility function. Now then, what do we make of him editing out his alcoholism? Isn't that a case of editing his utility function? Actually, it isn't - it's more of a constraint of the hardware that Bob is running on. There are lots of programs running inside Bob's head (and yours), but only a subset are Bob. The difficult part is figuring out which parts of Bob's head are Bob and which aren't.

D227 · 12y

Thank you for your response. I believe I understand you correctly; I made a response to Manfred's comment in which I reference your response as well. Do you believe I interpreted you correctly?

An agent that has an empathetic utility function will edit its own code if and only if doing so maximizes the expected utility of that same empathetic utility function. Do I get your drift?

I think that's right, though just to be clear an empathetic utility function isn't required for this behavior. Just a utility function and a high enough degree of rationality (and the ability to edit its own source code).

Put another way:

Suppose an agent has a utility function X. It can modify its utility function to become Y. It will only make the switch from X to Y if it believes that switching will ultimately maximize X. It will not switch to Y simply because it believes it can get a higher amount of Y than of X.

This is correct, if the agent has perfect knowledge of itself, if X is self-consistent, if X is cheap to compute, etc.

The article supposes that "Bob is a perfect rationalist". What exactly does that mean? In my opinion, it does not mean that he is always right. He is "merely" able to choose the best possible bet based on his imperfect information. In a few branches of the quantum multiverse his choice will be wrong (and he anticipates this), because even his perfect reasoning could be misled by a large set of very improbable events.

Bob may be aware that some of his values are inconsistent, and he may choose to sacrifice some of them to create the best possible coherent approximation (an intra-personal CEV of Bob).

In theory, X can be very expensive to compute, so Bob must spend significant resources to calculate X precisely, and these resources cannot be used for increasing X directly. If there is a function Y that gives very similar results to X but is much cheaper to compute, then Bob may take the calculated risk of replacing X with Y, assuming that maximum Y will give him near-maximum X, and he can spend the saved resources on increasing Y, thereby paradoxically (probably) obtaining higher X than if he had tried to increase X directly.
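
A toy numeric sketch of that calculated risk, with made-up costs and an assumed approximation loss; under these numbers the cheap proxy Y ends up producing more X than computing X directly:

```python
# Made-up numbers for illustration only.
TOTAL_RESOURCES = 100
COST_OF_COMPUTING_X = 40   # resources burned just to evaluate X precisely
COST_OF_COMPUTING_Y = 5    # the cheap proxy
APPROXIMATION_LOSS = 0.95  # maximizing Y recovers ~95% of the X it aims at

def x_achieved_directly():
    # Whatever is left after paying to evaluate X goes into raising X.
    return TOTAL_RESOURCES - COST_OF_COMPUTING_X

def x_achieved_via_proxy():
    # More resources remain, but optimizing Y only approximately raises X.
    return (TOTAL_RESOURCES - COST_OF_COMPUTING_Y) * APPROXIMATION_LOSS

print(x_achieved_directly(), x_achieved_via_proxy())  # 60 vs 90.25: the proxy wins here
```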

For #2: Let's assume for the sake of simplicity that Bob is a perfect rational agent with a stable utility function -- Bob's utility function (the function itself, not the output) doesn't change over time. If Bob goes FOOM, Bob is still going to support his original utility function (by definition, because I stipulated that it was stable wrt time above.) I think that you're wondering whether this stability conflicts with Bob being a perfect (instrumentally) rational agent. It doesn't -- Bob being a perfect rational agent just means that he makes the best choices possible to maximize his utility function given the information and computing power that he has.

You can have a rational paperclip maximizer, or a rational bunny maximizer...pretty much whatever you like.
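
A minimal sketch of that point in Python, with a hypothetical two-action toy world: the same expected-utility machinery yields a paperclip maximizer or a bunny maximizer depending only on which utility function you plug in.

```python
# Each action leads to a distribution over hypothetical world states.
ACTIONS = {
    "build_factory": [({"paperclips": 100, "bunnies": 0}, 0.9),
                      ({"paperclips": 0, "bunnies": 0}, 0.1)],
    "breed_bunnies": [({"paperclips": 0, "bunnies": 50}, 1.0)],
}

def best_action(utility):
    """Pick the action with the highest expected utility under `utility`."""
    def expected_utility(action):
        return sum(p * utility(world) for world, p in ACTIONS[action])
    return max(ACTIONS, key=expected_utility)

print(best_action(lambda w: w["paperclips"]))  # build_factory
print(best_action(lambda w: w["bunnies"]))     # breed_bunnies
```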

Note: The whole utility function thing gets more complicated for humans, because humans' utility functions tend not to be stable wrt time, and it gets tricky when your utility function is changing while you're trying to maximize it. Also, we have intuitions which make us happier/more content/whatever when we're trying to maximize certain types of utility functions, and less so with others (having a nice meal will probably make you happier than a close friend dying would, to use an extreme example) -- we usually want to consider this when picking utility functions.

2. If the answer is no, that there is no "inherent rationality" in the absence of a PVG, then what would preclude the possibility that a perfect rationalist, given enough time and resources, will eventually become a perfectly self-interested entity with only one overall goal, namely to perpetuate his own existence at the sacrifice of everything and everyone else?

That would require having a PVG toward self-interest.

A being without a PVG is not an optimizer. It could not be accurately described as intelligent, let alone rational.

[anonymous] · 12y

Value is Fragile

You can't derive an ought from an is.

[This comment is no longer endorsed by its author]