christopheg

Comments

Rationality Quotes July 2013

Thanks for fixing my broken English.

There are actually several quotes expressing the same idea in different Terry Pratchett books, every one of them much better than what I could remember. I dug up these two:

In Wyrd Sisters you have (Granny Weatherwax speaking): “The reward you get for digging holes is a bigger shovel.”

And another one from "Carpe Jugulum" that I like even better (also Granny Weatherwax speaking): "The reward for toil had been more toil. If you dug the best ditches, they gave you a bigger shovel."

Rationality Quotes July 2013

Who says fruit is to be preferred to foliage?

I often wonder about something along this line when speaking of education. Are students learning to get a job (fruit) or to acquire culture (foliage)? Should choosing between one and the other be up to the student or to society? I believe the most common answer is: we study for a job, and the choice is made by society. But I, for one, cannot so easily dismiss the question. It has too much to do with the meaning of life: are people living to work/act or to understand/love?

That's obviously not the only way to interpret this quote; the most obvious reading would probably be a simple statement that knowledge can be flashy but still sterile. Anyway, like most good quotes it is ambiguous, and hence may lead to fruitful thinking.

Newcomb's Problem and Regret of Rationality

> A true Omega needs to make both P(box B full | take one box) and P(box B empty | take both boxes) high. The proposed scheme ensures that P(box B full | habitual one-boxer) and P(box B empty | habitual two-boxer) are high, which is not quite the same.

If I understand correctly the distinction you're making between "habitual one-boxer" and "take one box", the first kind would be about the player's past history and the other about the future. If so, I guess you are right. I'm indeed using the past to make my prediction, as using the future is beyond my reach.

But I believe you're missing the point. My program is not an iterated Newcomb's problem because Omega does not perform any prediction along the way. It performs only one prediction, for the last game, and the human won't be warned. It does not care at all about the reputation of the player, only about his acts in situations where he (the human player) can't know whether he is playing or not.

But another point of view is possible, and that is what comes to mind when you run the program: it coerces the player into being either a one-boxer or a two-boxer if he wants to play at all. Any two-boxing and the player will have to spend a very long time one-boxing to get back to the state where he is again seen as a one-boxer. As it is written, the program is likely (to the chosen accuracy level) to make its prediction while the player is struggling to be a one-boxer.
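
To get a feel for how costly a single defection is, here is a quick back-of-the-envelope helper. It is not part of the posted program: the 90% threshold is an assumption matching the calibration level used below, and for simplicity the count starts fresh after the defections.

```python
def rounds_to_recover(defections, threshold_pct=90):
    """Smallest number of consecutive one-boxing rounds r such that
    r / (r + defections) reaches threshold_pct percent again."""
    r = 0
    # integer arithmetic avoids floating-point edge cases at the threshold
    while 100 * r < threshold_pct * (r + defections):
        r += 1
    return r

# one two-boxing answer costs nine one-boxing rounds to erase at 90%...
print(rounds_to_recover(1))   # -> 9
# ...and the cost grows linearly with each defection
print(rounds_to_recover(3))   # -> 27
```

So even a single lapse is expensive, and at higher accuracy levels the price grows accordingly.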

As a human player, what goes through my mind while running my program is: OK, I want to get a million dollars, hence I have to become a one-boxer.

Newcomb's Problem and Regret of Rationality

It's comforting sometimes to read from someone else that rationality is not the loser's way, and arguably more so for the Prisoner's Dilemma than for Newcomb's if you consider the current state of our planet and the tragedy of the commons.

I'm writing this because I believe I succeeded in writing a computer program (it is so simple I can't call it an AI) able to actually simulate Omega in a Newcomb game. What I describe below may look like an iterated Newcomb's problem, but I claim it is not, and I will explain why.

When using my program, the human player will actually be facing a high-accuracy predictor, and that claim will be true.

Obviously there is a trick. Here is how it goes. The predictor must first be calibrated. This is done in the simplest possible fashion: it just asks the user whether he would one-box or two-box. The problem with that is like asking someone whether she would enter a burning building to save a child: nobody (except professional firemen) would actually know before being confronted with the actual event.

The program can actually do that: just don't tell the player whether he is calibrating the predictor or playing the actual, unique game.

Now reaching the desired prediction accuracy level is simple enough: just count the total trial runs and the numbers of one-boxings and two-boxings; when one or the other goes over 99%, the program can go for the prediction.

Obviously it must not advertise which game is the real one, or it would defeat the whole strategy of not saying whether it's the real game, and with it the prediction accuracy. But any reader can check from the program's source code that the prediction is indeed made before (in the temporal sense) the player is asked whether he will one-box or two-box.

Here goes my program. It is written in Python and heavily commented; you should not need to be much of a CS literate to understand it. The only trick is the insertion of some randomness, so that the player cannot predict the end of calibration and the start of the game.

print "I will run some trial games (at least 5) to calibrate the predictor."
print ("As soon as the predictor reaches the expected quality level\n"
      "I will run the actual Newcomb game. Be warned: you won't be\n"
      "told when the calibration phase ends and the actual game begins;\n"
      "this is intended to avoid any perturbation of the predictor accuracy.\n")

# run some prelude games (to avoid computing averages on too small a set)
# then compute averages to reach the intended prediction quality.
# Injecting some randomness into the prelude length and precision level
# prevents anybody (including the program writer) from being certain of
# when calibration ends. This avoids providing the user with data that
# would change his behavior and defeat the prediction accuracy.
import random
# 5 to 25 calibration moves
prelude = 5 + random.random() * 20.0
# roughly 90% accuracy or better; we do not tell exactly how much better,
# to defeat guessers (and stay strictly below 100% to avoid an infinite loop)
accuracy = 1.0 - (random.random() * 0.1) - 0.01
# postlude is the number of games the desired accuracy must be kept for
# before running the actual game. It is a random number between 1 and 5 so
# that players cannot guess the exact time of the final game from the
# displayed percentages. The current postlude scheme can possibly still be
# exploited to improve a cheater's chances above the intended predictor
# accuracy, but this program is just here to get the idea across... and
# besides, by outguessing Omega the cheater is only hoping for an extra
# 100 bucks. How much energy does that deserve?
postlude = 0
one = total = two = 0
# the loop must run at least `prelude` games, then keep going until the
# consistency threshold has been held for `postlude` consecutive games
while (total < prelude) or (postlude != 1):
    a = raw_input ("1 - One-box, 2 - Two-boxes : ")
    if a not in ['1', '2']: continue
    if a == '1':
        one += 1
    else:
        two += 1
    total += 1
    print "current accuracy is %d%%" % int(100.0 * max(two, one) / total)
    if max(two, one) >= total * accuracy:
        # threshold reached: start, or count down, the last few games
        if postlude == 0:
            postlude = 1 + int(random.random() * 5.0)
        elif postlude > 1:
            postlude -= 1
    else:
        # any inconsistent answer resets the countdown
        postlude = 0

# Now prediction accuracy is good enough: run the actual Newcomb's game.
# The prediction is truly a prediction of the future;
# nothing prevents the user from choosing otherwise.
#print "This is the actual Newcomb game, but I won't say it"
prediction = 1 if one > two else 2
finished = False
while not finished:
    a = raw_input ("1 - One-box, 2 - Two-boxes : ")
    if a == '1':
        if prediction == 1:
            print "You win 1 000 000 dollars"
        else:
            print "You win zero dollars"
        finished = True
    elif a == '2':
        if prediction == 1:
            print "You win 1 000 100 dollars"
        else:
            print "You win 100 dollars"
        finished = True

Now, why did I say this is not an iterated Newcomb's problem?

The point is that, the way it is written, the program is not finite: the human player is the only one able to stop the game. And to do that he has to commit to one option, one-boxing or two-boxing, thus letting the program reach the desired accuracy level. He also has no possibility of "uncommitting" when the real game comes, as this last game is no different from the others.

You could consider that the whole point of this setting is to convince the user that the claimed accuracy of Omega is true. What is fun is that in this setting it becomes true because the human player chooses it to be so.

I believe the above program proves that one-boxing is rational, I should even say obvious, provided the right setting.

Now, I can't stop here. I believe in maths as a neutral tool. It means that if the reasoning leading to one-boxing is right, the reasoning leading to two-boxing must be false. If both reasonings were true, maths would collapse; and that is not to be taken lightly.

Summarily, the two-boxing reasoning is an immediate consequence of the Dominance Argument.

So what? The Dominance Argument is rock solid. It is so simple, so obvious.

Below is a quote from Ledwig's review of Newcomb's problem about the Dominance Argument; you could call it a restrictive clause about when you can or cannot apply it:

> The principles of dominance are restricted in their range, for they can only be applied,
> when the decision maker believes that the possible actions of the decision maker don't
> causally influence the possible states of the world, or the possible actions of any other
> decision maker.

There is a subtle error in the above statement: you should replace the words "causally influence" with "correlate with". In probabilistic terms, it means the actions of both decision makers must be independent variables. But a lack of correlation isn't guaranteed by a lack of causality.

Think of a Prisoner's-Dilemma-like situation between traders. The stock of some corporation is falling. If the traders sell, you get a stock market crash; if they buy, it's back to business as usual. If one sells while the other buys, only one will make big money.

Do you seriously believe that, given access to the same corporate data (but without communicating with each other), both traders are not likely to make the same choice?
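
A minimal sketch of that intuition (the decision rule and the data are of course invented for illustration): two traders who never communicate but apply the same rule to the same data produce perfectly correlated choices, with no causal link between them.

```python
def trader_decision(price_move):
    # the same simple rule both traders happen to follow
    return "sell" if price_move < 0 else "buy"

market_data = [-0.5, 0.2, -0.1, 0.4]

choices_a = [trader_decision(m) for m in market_data]
choices_b = [trader_decision(m) for m in market_data]

# neither trader's choice causes the other's, yet they always agree
print(choices_a == choices_b)   # -> True
```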

In the above setting the two players' choices are not independent variables, and you can't directly apply Dominance.

Reasoning backward, you could say that your choice gives you some information on the probability of the other's choice; and as taking that information into account can change your choice, it may also change the choice of the other, so you enter some infinite recursion (but that's not a problem, you still have tools to solve that, like a fixed-point theorem).

In Newcomb's problem, we are in an extreme case. The hypothesis states the correlation between players: that's Omega's prediction accuracy.

Hence, two-boxing is not a rational decision based on causality, but a simple disbelief in the correlation stated in the hypothesis, and a confusion between correlation and causality.
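
One way to make that concrete is to take the stated correlation at face value and just compute expected payoffs (the payoff amounts are the ones used in the program above; the function name is mine):

```python
def expected_gain(choice, p):
    """Expected payoff when Omega's prediction matches the player's
    actual choice with probability p (the stated correlation)."""
    if choice == 1:                       # one-box
        return p * 1000000                # box B is full iff predicted right
    return p * 100 + (1 - p) * 1000100    # two-box: full box B only if predicted wrong

# at 90% accuracy, one-boxing wins by a wide margin
print(expected_gain(1, 0.9) > expected_gain(2, 0.9))   # -> True
# the Dominance Argument only looks right when the correlation is ignored (p = 0.5)
print(expected_gain(1, 0.5) > expected_gain(2, 0.5))   # -> False
```

Solving expected_gain(1, p) > expected_gain(2, p) gives p > 0.50005: even a barely-better-than-chance Omega already makes one-boxing the better bet.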

When you remove that disbelief (that's what my program does) the problem disappears.

Newcomb's Problem and Regret of Rationality

I don't know if you have seen it, but I have posted an actual program playing Newcomb's game. As far as I understand what I have done, this is not an iterated Newcomb's problem, but a one-shot one. You should also notice that the calibration phase does not return output to the player (well, I added some display of the reached accuracy, but this is not necessary).

Unless I have overlooked some detail, the predictor accuracy is currently tuned to above 90%, but any level of accuracy is reachable.

As I explained yesterday, the key point was to run a "calibration" phase before running the actual game. To make the calibration useful I have to blur the limit between calibration and the actual game, or the player won't behave as in the real game while in the calibration phase. Hence the program needs to run a number of "maybe real" games before playing the true one. For the reason explained above we also cannot tell the user he is playing the real and last game (or he would know whether he is playing a calibration game or the real one, and the calibration would be useless).

But it is very clear from reading the source code that if the (human) player were some kind of supernatural being, he could defeat the program by choosing two boxes while the prediction is one-box. It would just be a very unlikely event, to the desired accuracy level.

I claim this is a true, unmodified Newcomb's problem; all the calibration process is there only to make the precondition of Newcomb's problem, Omega's prediction accuracy, actually true (and verifiably so for the human player: he can read the source code and convince himself, or even run the program and understand why the prediction will be accurate).

As far as I know, Newcomb's problem does not impose the way the initial accuracy precondition is reached. In programming terms, I'm merely composing two functions, the first one ensuring that the precondition of good prediction accuracy holds on entry.
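
The composition can be sketched in a few lines (the function names are mine, not from the posted program): the first function establishes the precondition, the second plays the one-shot game given that precondition.

```python
def calibrate(history):
    # first function: derive the prediction that makes the accuracy
    # precondition true, from the player's recorded calibration answers
    return 1 if history.count(1) > history.count(2) else 2

def newcomb(prediction, choice):
    # second function: the one-shot game itself, standard payoffs
    payoffs = {(1, 1): 1000000, (1, 2): 1000100, (2, 1): 0, (2, 2): 100}
    return payoffs[(prediction, choice)]

# composed: the real game only runs once calibration has done its job
print(newcomb(calibrate([1, 1, 1, 1, 1]), 1))   # -> 1000000
```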

Newcomb's Problem and Regret of Rationality

I posted a possible program doing what I describe in another comment. The trick, as expected, is that it's easier to change the human player's understanding of the nature of Omega to reach the desired predictability. In other words: you just remove the human's free will (and running my program, the player learns very quickly that this is in his best interest), then you play. What is interesting is that the only way to remove his free will that is compatible with Newcomb's problem description is to make him a one-boxer. The incentive to make him a two-boxer would be to exhibit a bad predictor, and that's not compatible with Newcomb's problem.

Newcomb's Problem and Regret of Rationality

Here is an actual program (written in Python) implementing the described experiment. It has two stages. The first part is just calibration, intended to find out whether the player is one-boxing or two-boxing. The second is a straightforward non-iterated Newcomb problem. Some randomness is used to prevent the player from knowing exactly when calibration stops and the test begins, but the calibration part does not care at all whether it will predict that the player is a one-boxer or a two-boxer; it is just intended to create an actual predictor behaving as described in Newcomb's problem.

print "I will run some trial games (at least 5) to calibrate the predictor."
print ("As soon as the predictor reaches the expected quality level\n"
      "I will run the actual Newcomb game. Be warned: you won't be\n"
      "told when the calibration phase ends and the actual game begins;\n"
      "this is intended to avoid any perturbation of the predictor accuracy.\n")

# run some prelude games (to avoid computing averages on too small a set)
# then compute averages to reach the intended prediction quality.
# Injecting some randomness into the prelude length and precision level
# prevents anybody (including the program writer) from being certain of
# when calibration ends. This avoids providing the user with data that
# would change his behavior and defeat the prediction accuracy.
import random
# 5 to 25 calibration moves
prelude = 5 + random.random() * 20.0
# roughly 90% accuracy or better; we do not tell exactly how much better,
# to defeat guessers (and stay strictly below 100% to avoid an infinite loop)
accuracy = 1.0 - (random.random() * 0.1) - 0.01
# postlude is the number of games the desired accuracy must be kept for
# before running the actual game. It is a random number between 1 and 5 so
# that players cannot guess the exact time of the final game from the
# displayed percentages. The current postlude scheme can possibly still be
# exploited to improve a cheater's chances above the intended predictor
# accuracy, but this program is just here to get the idea across... and
# besides, by outguessing Omega the cheater is only hoping for an extra
# 100 bucks. How much energy does that deserve?
postlude = 0
one = total = two = 0
# the loop must run at least `prelude` games, then keep going until the
# consistency threshold has been held for `postlude` consecutive games
while (total < prelude) or (postlude != 1):
    a = raw_input ("1 - One-box, 2 - Two-boxes : ")
    if a not in ['1', '2']: continue
    if a == '1':
        one += 1
    else:
        two += 1
    total += 1
    print "current accuracy is %d%%" % int(100.0 * max(two, one) / total)
    if max(two, one) >= total * accuracy:
        # threshold reached: start, or count down, the last few games
        if postlude == 0:
            postlude = 1 + int(random.random() * 5.0)
        elif postlude > 1:
            postlude -= 1
    else:
        # any inconsistent answer resets the countdown
        postlude = 0

# Now prediction accuracy is good enough: run the actual Newcomb's game.
# The prediction is truly a prediction of the future;
# nothing prevents the user from choosing otherwise.
#print "This is the actual Newcomb game, but I won't say it"
prediction = 1 if one > two else 2
finished = False
while not finished:
    a = raw_input ("1 - One-box, 2 - Two-boxes : ")
    if a == '1':
        if prediction == 1:
            print "You win 1 000 000 dollars"
        else:
            print "You win zero dollars"
        finished = True
    elif a == '2':
        if prediction == 1:
            print "You win 1 000 100 dollars"
        else:
            print "You win 100 dollars"
        finished = True

Newcomb's Problem and Regret of Rationality

Since my program runs as long as the wished accuracy is not reached, it can reach any accuracy. Truly random numbers are also expected to deviate toward extremes sometimes in the long run (if they did not behave like that, they would not be random). As these are very rare events, against a random player the expected accuracy would almost certainly never be reached in a human lifetime.
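
That rarity is easy to quantify. For a player answering at random (a fair coin), the probability of merely *looking* 90% consistent after n rounds is a binomial tail, and it collapses fast; a small sketch (the numbers are chosen for illustration):

```python
from math import comb

def p_looks_consistent(n, k):
    """Probability that a fair random player gives the same answer
    at least k times out of n (for k > n/2 the two one-sided
    events are disjoint, so doubling the tail is exact)."""
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2.0 ** n
    return 2 * tail

# 90% consistency over 50 rounds is already astronomically unlikely
print(p_looks_consistent(50, 45) < 1e-6)   # -> True
```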

This is why I claim the "calibration phase" described above takes place before Newcomb's problem. When the actual game starts, the situation described in Newcomb's problem is exactly what has been reached. The description of the calibration phase could even be provided to the player to convince him Omega's prediction will be accurate. At least it is convincing for me, and in such a setting I would certainly believe Omega can predict my behavior. In a way you could see my calibration phase as a way for Omega to wait for the player to be ready to play truly instead of trying to cheat, as trying to cheat will only result in delaying the actual play.

OK, it may be another problem: what I did is merely replace a perfectly accurate being with an infinitely patient one... but this one is easy to program.

Newcomb's Problem and Regret of Rationality

As proposed, the idea is to run the program in "test mode". To simulate the super-being Omega we give it the opportunity to decide when the game stops being a simulation (predictor calibration) and starts being the "real game". To be fair, this change (or the rules governing it) will be communicated to some external judge before the actual "real play". But it will not be communicated to the player (or obviously it would break the calibration accuracy). A possible rule could be to start the real game when some fixed accuracy is reached (something like the prediction being right 99% of the time), or it could also be a fixed number of calibration games.

Writing such a predictor is as easy as it can be: just a simple loop waiting for the fixed accuracy, which is reached when either one-boxing or two-boxing goes above the expected threshold. Obviously, if the player is random, that could take quite long (but even with a random player it should happen sometime). With a rational player it should be fast enough. I'm not sure that a random player could qualify as rational, anyway.
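
The waiting loop's exit test reduces to a few lines. This is only a sketch in the same spirit as the full program posted elsewhere in this thread; the 5-game minimum and the 99% threshold are the illustrative values mentioned above.

```python
def calibration_done(one, two, min_games=5, threshold_pct=99):
    # ready once one answer dominates strongly enough over enough games;
    # integer arithmetic keeps the threshold comparison exact
    total = one + two
    return total >= min_games and 100 * max(one, two) >= threshold_pct * total

print(calibration_done(100, 1))   # -> True: a near-perfect one-boxer
print(calibration_done(6, 4))     # -> False: too inconsistent to predict
```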

Doing that Omega can be as accurate as wished.

It still is not a perfect predictor; the player could still outguess Omega and predict at which move the desired accuracy will be reached, but it's good enough for me (and the Omega player could add some randomness on his side to defeat guessers).

I see no reason why the program described above could not be seen as an acceptable Omega following Newcomb's problem rules.

Not communicating when the actual real game happens is just here to avoid cheaters and to ensure that the actual experiment is done in the same environment as the calibration.

I wonder if anyone would seriously choose to two-box any time with the above rules.

Newcomb's Problem and Regret of Rationality

I do not see your reasoning here. What I'm proposing is not letting the player know when the practice rounds stop and the real round starts. That means indeed that a one-boxer would get higher rewards in both the practice and the real rounds, and that's why I believe it's an argument for one-boxing.

My proposal for "simulating" Newcomb's may not be accurate (and it's certainly not perfect), but you can't conclude that based on the (projected) outcome of the experiment disagreeing with what you expect.
