Decimal digit computations as a testing ground for reasoning about probabilities

[-]Wei Dai14y160

This shows that probabilities which are apparently very small can rapidly shift to being quite large with the influx of new information.

Isn't there a simpler way of making this point, without going into the properties of pi and the history of its computation? For example, I could write a small program that will fill my screen with a random n digit numbers at time X. Before time X, the probability of me seeing any particular number is 10^-n, and after time X it's 1 for one number and 0 for others.

But I'm having trouble seeing the practical significance of this. It would be interesting to see Anna's side of this conversation, to get a better idea of what you guys were talking about...

[-]multifoliaterose14y20

Isn't there a simpler way of making this point, without going into the properties of pi and the history of its computation? For example, I could write a small program that will fill my screen with a random n digit numbers at time X. Before time X, the probability of me seeing any particular number is 10^-n, and after time X it's 1 for one number and 0 for others.

It may be that my exposition could be improved. I'm not sure whether or not the use of a particular real number is central to the line of thinking that I'm interested in exploring.
If any two independent people compute the first thousand digits of pi then the results should be identical whereas if two people generate 1000 random digits then their results will not be identical. In this way "the first 1000 digits of pi" is well defined in a way that "1000 random digits" is not.
Note that it's possible to think that a string of n digits is probably the first n digits of pi and then learn that there was an error in the coding/algorithm used which causes a dramatic drop in the probability that the string of digits is correct whereas it's not possible to learn that a string of random digits is "wrong."One may learn that the algorithm used was not a good random number generator but that's not quite the same thing...
Some readers may find the properties of pi and the history of its computation to be of independent interest.

But I'm having trouble seeing the practical significance of this. It would be interesting to see Anna's side of this conversation, to get a better idea of what you guys were talking about...

It was just a casual conversation; a tangent to an unrelated subject. But my interest in the topic is that I think that reasoning about small probabilities may be important to x-risk reduction and I want to get a feel for the types of errors & the times of valid heuristics attached to the domain with a view toward thinking about x-risk reduction.

[-]Alexei14y140

Once I found out that you can compute any digit of pi without computing any other digit (albeit in hexadecimal), my life just wasn't the same anymore.

[-]NancyLebovitz14y70

It surprised me tremendously, but I don't think it's affected me otherwise. If you were being literal, what changed for you?

It isn't limited to hexidecimal.

[-]Nisan14y00

I like the way the author formatted their summation formulas in ascii.

[-]Alexei14y00

I think it showed me that Kolmogorov complexity of pi's digits is really low (this was before I even understood what Kolmogorov complexity was). Looking at it now, however, it's not all all surprising, given that pi's definition is already very low complexity.

[-]Armok_GoB14y00

!!!

Unless I misunderstand something that could be used as an RNG with properties I've been looking for for ages! Sadly, I don't understand half the math involved. Example code would be nice...

[-]malthrin14y30

Is there something wrong with the Mersenne Twister?

[-]Armok_GoB14y10

Yes, lots of things, although I don't know a quick or non-technical way of summing up what.

Looking at my response to Emile below will probably help.

[-]saturn14y20

Is there something wrong with AES in counter mode?

[-]Armok_GoB14y00

what is that?

[-]saturn14y30

Encrypting successive integers with AES gives you a pretty fast and very high quality PRNG.

[-]Armok_GoB14y00

apperently I'm geting to tired to cognite any compsci rigth now, it might work thou.

[-]Emile14y10

I'm curious, what properties are you looking for?

(I'm pretty interested in roguelikes and procedural content generation in games, and have played around with various algorithms for infinite consistent worlds etc. - if your name's anything to go by, I expect you might be interested in that stuff too)

[-]Armok_GoB14y10

Something that can act like a lookup table of random binary data of infinite size and dimensionality, which is as fast as possible but have very low demands on the quality of the randomness.

This should if I got things right be both faster and more random than XORing a load of bitstrings repeating with different prime numbers length.

[-][anonymous]14y10

I don't think it's fast. The any-base one that NancyLebovitz linked to claims to be O((n log n)^3). The original hex one, I don't know exactly but from the description on wikipedia it's definitely worse than O(n) (I'd guess significantly worse).

[-]Armok_GoB14y10

oh, yea then it's probably useless.

[-]Emile14y10

That sounds like a hash function, or are you looking for something else? (something faster?)

What applications do you have in mind? Procedural generation in games, or something else?

[-]Armok_GoB14y00

reads up on hash functions

Oooh, interesting. While not exactly right some could probably be re-purposed as an RNG yes... Especially cryptographic hash functions seem interesting, but they are probably far to slow. Something that looks like a cryptographic hash function to someone who really sucks at cryptography is probably what I'm looking for. Also there are probably some things to gain from collisions being a non-issue and the output needed being extremely short.

I don't actually know exactly what I need this for, it's just somehting I for some reason tend to stumble upon repeatedly when I think about compsci things and has thus concluded would be generally useful. Procedural generation, either for games, art, or math experiments, probably constitutes most of it thou.

[-]Emile14y60

In Python, you can do things like

>>> import random
>>> def random_lookup(key):
        random.seed(("Whatever", key))
        return random.randint(0, 999)

>>> random_lookup(11)
885
>>> random_lookup((42, 115, 802))
481
>>> random_lookup((31, 42, 717))
805
>>> random_lookup("strings are good keys too")
564
>>> random_lookup(11)
885
>>> random_lookup("strings are good keys too")
564

... internally, he's just using the hash of your input as a random seed, but then you can get random floats, etc.

I used something like that to get boundless fields of random temperatures, elevation and humidity, resulting in a tile-based world with mountains, oceans, swamps, forests etc. that you could scroll as far as you wanted to in any direction, and always get the same thing for a given position.

[-]Armok_GoB14y00

Nice trick, thanks, but isn't it awfully slow?

[-]Pavitra14y30

How fast do you need it to be? My laptop can do SHA1 about 2^20 times with no noticeable delay.

[-]Armok_GoB14y00

Ok, that's pretty fast for python. Should be fast enough for anything with speed requirements that makes python a viable language in the first place. Unless I did the math wrong somewhere.

Thanks.

[-]Pavitra14y10

Sorry, that's not actually Python. I was minting a 20-bit Hashcash stamp.

Edit: In Python, I get 2^15 hashes in similar time.

Edit2: and if for some reason you need insane speed, I believe graphics cards can be made to perform hashing.

Edit3: with the specific code listed in the 4-parent, rather than using hashlib.sha1(), about 2^12. The point is that computers are fast.

[-]Armok_GoB14y10

4000 per second is probably way to slow for any application I might use, and I don't think I have quite the kind of wizardry needed to employ the graphics card for anything.

[-]Emile14y20

I don't think Pavitra was talking about seconds, but in "no noticeable delay" - on my laptop, in python, I get about 2^22 hashes per second.

[-]Armok_GoB14y00

Oh. Well then I don't know but it'll be worth trying if I ever need to do somehting like this in python.

[-]Emile14y10

Not awfully - I haven't counted but for a screenful of terrain tiles there can't be more than a thousand calls, which could be a bit slow if you're recalculating everything every frame, but is fast enough as soon as you cache your results, so that you only need to calculate new, undiscovered terrain.

If performance is really a problem, it's always possible to rewrite the slow bits in C, using simpler algorithms than the Python hash function.

[-]Eugine_Nier14y20

One would have still more confidence if one were personally involved in such a team.

I don't think this statement is correct since it contradicts the law of conservation of expected evidence, i.e., if you anticipate that knowing some piece of information would increase your probability estimate of something, then simply knowing that that piece of information exists should correspondingly increase your estimate unless you also suspect that knowing the information might actually decrease your estimate by a corresponding amount.

[-]MixedNuts14y20

If one had the inside info given by being part of the team, and this info decreased one's confidence, one wouldn't be part of the team.

[-]multifoliaterose14y10

This is true. I meant assuming that the computations are sound, being personally involved would increase one's confidence relative to what it would otherwise be. I'll change the post accordingly.

LESSWRONG
LW

LESSWRONG
LW

10

Decimal digit computations as a testing ground for reasoning about probabilities

10

10