Logical uncertainty, kind of. A proposal, at least.

Wei Dai (13y)

Thanks for writing this, I found it helpful for understanding the motivation behind Benja's proposal and to develop some intuitions about it. I'd like to see some explanation of how statements involving quantifiers are supposed to work in the proposal. Here's my understanding, and please let me know if it's correct.

A statement with a "for all" quantifier is also supposed to start with 1/2 probability. Consider "For all X, Q(X)" for some equation Q. This implies Q(0) and Q(1), etc., each of which also starts with 1/2 probability. What happens as the robot proves individual implications of the form "(For all X, Q(X)) implies Q(0)"? The probability of the quantified statement is reduced by half each time. Proving that individual statements like Q(0) are true can only bring the probability of the quantified statement back up to 1/2 but not above that.

Since proving Q(i) tends to be harder (take longer) than proving "(For all X, Q(X)) implies Q(i)", the probability of the quantified statement goes to 0 as we increase the time/length limit, even if all of the individual Q(i) are provable. If my understanding is correct, this doesn't seem desirable.
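A minimal sketch of the calculation Wei Dai describes. It assumes (our assumption, not a specification of Benja's proposal) that the robot's only knowledge is a set of proven propositional constraints, so maximum entropy reduces to spreading probability uniformly over the truth assignments consistent with them. Atom 0 stands for "For all X, Q(X)" and atoms 1..k for Q(1)..Q(k); the function names are illustrative.

```python
from itertools import product
from fractions import Fraction

def max_entropy_prob(n_atoms, constraints, query):
    # With only hard logical constraints, the maximum-entropy distribution is
    # uniform over the truth assignments ("worlds") that satisfy all of them.
    worlds = [w for w in product([False, True], repeat=n_atoms)
              if all(c(w) for c in constraints)]
    hits = sum(1 for w in worlds if query(w))
    return Fraction(hits, len(worlds))

def prob_forall(k, proven_instances=0):
    # proven implications "(For all X, Q(X)) implies Q(i)" for i = 1..k
    constraints = [lambda w, i=i: (not w[0]) or w[i] for i in range(1, k + 1)]
    # optionally also mark Q(1)..Q(proven_instances) as proven true
    constraints += [lambda w, i=i: w[i] for i in range(1, proven_instances + 1)]
    return max_entropy_prob(k + 1, constraints, lambda w: w[0])

for k in range(5):
    # first column: only implications proven; second: every listed Q(i) proven too
    print(k, prob_forall(k), prob_forall(k, proven_instances=k))
```

Each proven implication halves the odds of the quantified statement, and proving every listed instance only climbs back to 1/2, which is the behaviour described above.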

Manfred (12y)

Oh, since I was recently thinking about this, I figured I'd write down how the robot actually ends up raising probabilities above 1/2 (without just going to 1): if we can make the probability of Q go down, then we just have to make the probability of ¬Q go down the same way. The probability of Q goes down when we learn that Q implies X but we don't see X. The probability of Q goes up when we learn that ¬Q implies Y but we don't see Y.
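Manfred's rule can be checked with the same kind of brute-force enumeration. This is a sketch under the same assumption (uniform weight over truth assignments that satisfy the proven theorems); atom 0 is Q and atom 1 is the consequence X or Y, which the robot has not "seen" and so leaves unconstrained.

```python
from itertools import product
from fractions import Fraction

def prob(query, constraints, n_atoms=2):
    # Maximum entropy under hard logical constraints only:
    # uniform weight over the truth assignments that satisfy every constraint.
    worlds = [w for w in product([False, True], repeat=n_atoms)
              if all(c(w) for c in constraints)]
    return Fraction(sum(query(w) for w in worlds), len(worlds))

Q = lambda w: w[0]
q_implies_x = lambda w: (not w[0]) or w[1]   # proven: Q -> X, X itself unproven
not_q_implies_y = lambda w: w[0] or w[1]     # proven: not Q -> Y, Y itself unproven

p_down = prob(Q, [q_implies_x])      # Q's probability falls below 1/2
p_up = prob(Q, [not_q_implies_y])    # Q's probability rises above 1/2
print(p_down, p_up)                  # prints: 1/3 2/3
```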

Manfred (13y)

Hm. Yeah, you're right. The probability goes like 1/(1 + 2^k), where k is the number of things implied. You'd think that "3 is prime, 5 is prime, 7 is prime" would at least make the robot guess that all odd numbers were prime.
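Where the formula comes from, assuming as before that probability is spread uniformly over the truth assignments consistent with the k proven implications ∀x Q(x) → Q(i), i = 1..k:

```latex
\begin{aligned}
\#\{\text{assignments with } \forall x\,Q(x)\} &= 1
  &&\text{(each } Q(i) \text{ is forced true)}\\
\#\{\text{assignments with } \neg\forall x\,Q(x)\} &= 2^k
  &&\text{(each } Q(i) \text{ is free)}\\
P\big(\forall x\,Q(x)\big) &= \frac{1}{1 + 2^k}
\end{aligned}
```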

On the other hand, if we think of all possible statements "for all X, Q(X)", there are more ways to be false than true. Infinity more ways, even, and this is directly related to how many things are implied by the statement. So in that way, the robot is functioning exactly as intended and assigning the maximum entropy value.

This makes the problem sort of like my suspicions about magfrump's idea below: it's related to the fact that we expect simple things. We don't actually expect maximum entropy, and we're often right.

private_messaging (13y)

I thought of it some more... Don't think of probability as a way to represent ignorance. Probability represents a kind of knowledge. For example, when you say "the probability that the coin fell heads is 1/2", that represents a sort of model of what bounces do to the coin's direction. When you multiply probabilities for a pair of coin throws, that represents the independence of the throws. The 1/2 for Q(i) is not representative of that, and even if it were representative of something (statistics on the Q), the product of these numbers is not a probability but a lower bound on the probability assuming independence; if they aren't independent, the probability can still be 1/2.

Probabilistic methods certainly are very useful in time-bounded calculations (they are used very heavily in computer graphics and simulations of all kinds), but that is usually through the use of a random number source or a sufficiently good pseudorandom number generator, as in probabilistic primality tests such as Miller-Rabin. The time-bounded bot, far from evaluating and updating "probabilities", would have to do symbolic mathematics searching for an imperfect solution, and make use of random number sources whenever appropriate, striving for bounds on the probability of being wrong. (For example, when I implemented Miller-Rabin I just made it run enough rounds that it wouldn't fail before the heat death of the universe, assuming my PRNG didn't somehow return a huge string of bad witnesses, which would be very interesting if it happened.)
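For concreteness, a standard Miller-Rabin test in Python, illustrating the "enough rounds" idea: for a composite n, a uniformly random base fails to expose it with probability at most 1/4, so after `rounds` independent rounds the chance of wrongly reporting "prime" is at most 4^-rounds. The function name and the default of 40 rounds are our choices, not the commenter's implementation.

```python
import random

def is_probable_prime(n, rounds=40, seed=None):
    """Miller-Rabin test. 'False' is always correct; 'True' is wrong with
    probability at most 4**-rounds, assuming the bases are independent and
    uniformly random."""
    if n < 2:
        return False
    # trial division by small primes handles tiny n and cheap composites
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.Random(seed)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True
```

A fixed seed makes runs reproducible; the error bound assumes the bases behave like independent uniform draws, which is exactly the PRNG caveat raised above.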

Manfred (13y)

I thought of it some more... Don't think of probability as a way to represent ignorance. Probability represents a kind of knowledge.

I'll be specific. Shannon entropy represents ignorance. The bigger it is, the more ignorant you are.

private_messaging (13y)

Well, that doesn't seem very useful, as per Wei Dai's example...

Why not look at the methods people actually employ to approximate and guess math when they can't quite compute something? Applied mathematics is an enormous field.

private_messaging (13y)

On the other hand, if we think of all possible statements "for all X, Q(X)", there are more ways to be false than true. Infinity more ways, even,

If you consider all syntactically valid Q in some sort of math notation, no matter the length, there's a fraction of them that is simply 0=0*(... X somewhere in here ...), or your favourite less trivial statement of choice. Ditto for Turing machine tapes et cetera. There's certainly a nonzero probability of constructing a Q(X) that holds for all X.

Manfred (13y)

That is a more detailed model than the robot uses. What it means by the ways to be false or true is more like

"true for 1, true for 2, true for 3, false for 4, true for 5..." The robot can't look inside the statements while it's doing probabilistic logic, it can only look at truth values and relationships.

On the other hand, the power of doing that is certainly a good reason to upgrade the robot :)

alex_zag_al (11y)

This is counterintuitive in an interesting way.

You'd think that since P(Q1|¬∀xQx) = 1/2 and P(Q1|∀xQx) = 1, observing Q1 is evidence in favor of ∀xQx.

And it is, but the hidden catch is that this depends on the implication ∀xQx → Q1, and that implication is exactly the same amount of evidence against ∀xQx.
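Working this through in odds form, assuming the uniform-over-consistent-assignments robot discussed above and writing O for the odds of ∀xQx against ¬∀xQx:

```latex
\begin{aligned}
O_{\text{prior}} &= 1\\
O_{\text{after proving } \forall xQx \to Q_1} &= 1 \times \tfrac12 = \tfrac12
  &&\text{(the assignment } \forall xQx \wedge \neg Q_1 \text{ is ruled out)}\\
O_{\text{after also observing } Q_1} &= \tfrac12 \times
  \frac{P(Q_1 \mid \forall xQx)}{P(Q_1 \mid \neg\forall xQx)}
  = \tfrac12 \times \frac{1}{1/2} = 1
\end{aligned}
```

The two likelihood ratios are 1/2 and 2, so they cancel exactly and the probability returns to 1/2.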

It's also an amusing answer to the exercise at the end of part 1.

magfrump (13y)

I feel like the example of digits of prime numbers is a little bit leading here, because prime numbers are actually equidistributed in a technical sense, and so firmly that if you pretend everything works like probabilities, you end up getting the right answers and producing deep theories.

There are other situations in which there aren't naive probabilistic models that are actually highly predictive. Even if it's clear to me how to translate your reasoning into the specific example presented, it's not clear to me how to translate it to something about, say, orders of exceptional finite simple groups or smooth structures on real vector spaces, which are both questions one could attack probabilistically but have answers that don't look like probability distributions.

Manfred (13y)

prime numbers are actually equidistributed in a technical sense

The last digit of the trillionth prime number is 3, which is about as non-equidistributed as you can get. That's the right answer. Anything else is just a way for the robot to confess its ignorance gracefully.

EDIT: although one trouble might be that this robot isn't equipped to handle improper distributions. So if you hand it an infinitude of finite simple groups and tell it to choose one, it assigns everything a probability of zero and chooses according to whatever algorithm it uses to choose between things that have identical utility.

magfrump (13y)

Yes, but since "trillionth prime" isn't computable in the time available, the question translates to "some high prime", and among the set of odd primes the ones digits are equidistributed over the set {1,3,7,9}. I agree with the point that this type of question can be attacked as though it were a probabilistic question.

My point is that heuristic reasoning WORKS on primes; this is a theorem. If you say "I don't know the answer, but I will use heuristic reasoning to make a guess" and you ask for the ones digits of the trillionth, trillion and third, and two-trillion-and-seventeenth primes, you expect to get three numbers picked at random from that set, and guessing this way will serve you well.
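A quick empirical check of the equidistribution claim (the prime number theorem for arithmetic progressions says each of the four classes gets asymptotically 1/4 of the primes). This sketch sieves the primes below a bound of our choosing and tallies their last digits, skipping 2 and 5; the helper names are illustrative.

```python
from collections import Counter

def primes_below(n):
    # sieve of Eratosthenes over a bytearray of 0/1 flags
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytes(len(range(i * i, n, i)))
    return [i for i, flag in enumerate(sieve) if flag]

def last_digit_shares(n):
    # fraction of primes p with 5 < p < n ending in each digit
    counts = Counter(p % 10 for p in primes_below(n) if p > 5)
    total = sum(counts.values())
    return {d: counts[d] / total for d in sorted(counts)}

print(last_digit_shares(10**6))
```

The shares come out close to 1/4 each, though not exactly equal at any finite bound.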

There are other questions where you might think to apply heuristic reasoning, such as the ones digit of the number of isomorphism classes of finite groups of a given integer order, which, like the question about primes, is a function from the natural numbers to the digits 0-9. But I don't believe anything reasonable is known about that function; it doesn't converge to a distribution on the set of digits, it just acts weird and changes drastically at unpredictable points.

private_messaging (13y)

One can make a slightly better guess, though. The number of digits in the trillionth prime can be found via some approximate prime counting rule, then the distribution of the sums of all digits but the last can be estimated, and then the divisibility-by-3 rule over the sum of all digits makes the last digits very slightly non-equiprobable with regard to divisibility by 3. If you use algebra cleverly and avoid any unnecessary computations (such as computing actual 'probabilities' whenever those are unnecessary, and trying to answer questions of the form A>B algebraically rather than arithmetically), you might be able to do that in time.

edit: or, much more simply, if you have an upper bound and a lower bound on the value of the prime, then within this range the last digits are not equidistributed. Still, it feels to me that the choice of the last digit will depend entirely on which method comes to our agent's mind, and in turn the agent will appear irrational (priming: the choice of the digit will depend on what ideas the agent has available), and a malicious third party that knows the answer and wants our agent deprived of cookies could talk the agent into making a wrong guess.
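To make "upper and lower bound" concrete: classical explicit bounds (Rosser's theorem and its refinements, valid for n >= 6) give n(ln n + ln ln n - 1) < p_n < n(ln n + ln ln n). A sketch evaluating them for n = 10^12; the helper name is ours.

```python
import math

def nth_prime_bounds(n):
    """Explicit bounds on the n-th prime p_n, valid for n >= 6:
    n*(ln n + ln ln n - 1) < p_n < n*(ln n + ln ln n)."""
    if n < 6:
        raise ValueError("these bounds are stated for n >= 6")
    log_n = math.log(n)
    loglog_n = math.log(log_n)
    return n * (log_n + loglog_n - 1), n * (log_n + loglog_n)

low, high = nth_prime_bounds(10**12)
print(f"{low:.4e} < p < {high:.4e}")
print("digit count:", len(str(int(low))), "to", len(str(int(high))))
```

The interval pins down the digit count but is about a trillion integers wide, so by itself it says next to nothing about the last digit.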

edit: this is quite an interesting topic, actually. If you know that different approximation methods will yield different last digits, and you assume that the choice of an approximation method is random, then you should further scale down the utility difference to approximate the sum (average) over all methods.

Though I don't see anything between a very, very slightly better guess and the correct answer. And if the very, very slightly better guess doesn't return 3, you are guaranteed to lose.

magfrump (13y)

If you try to count primes IN ANY WAY, then you will end up with the answer that YOU HAVE AN EQUAL CHANCE OF GETTING ANY COPRIME RESIDUE MODULO 10; this is a THEOREM. It is a proven fact of nature, or platonic ideals, or whatever it is theorems are facts of. Measuring mod 3 will just tell you that the residues are also equidistributed modulo 3; it will reduce to computing residues modulo 30, which will not give you any information whatsoever, because there will be an equal number that reduce to 1, 3, 7, and 9 modulo 10.
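The mod 30 argument can be checked directly: a prime above 5 must fall in one of the eight residue classes mod 30 that are coprime to 30, and since 10 divides 30, each class fixes the last digit. A small sketch:

```python
from math import gcd
from collections import defaultdict

# residue classes mod 30 that a prime larger than 5 can occupy
coprime_30 = [r for r in range(30) if gcd(r, 30) == 1]

# group those classes by the last digit they force (valid because 10 divides 30)
by_last_digit = defaultdict(list)
for r in coprime_30:
    by_last_digit[r % 10].append(r)

print(coprime_30)
print(dict(by_last_digit))
```

Each of 1, 3, 7, 9 owns exactly two classes, so equidistribution mod 30 (the same theorem again) gives no tilt in the last digit.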

If you have an upper and lower bound, obviously primes aren't EXACTLY equidistributed, but in this case your upper and lower bounds will be millions apart, with primes showing up on average roughly every 30 integers. The probabilities will work out if you think of this as a random question selected from many possible and indistinguishable questions.

In real life, if I ask you "how tall is the tallest sequoia tree?" versus "how tall is the tallest Scots pine?", then even though both questions have specific answers, if you don't have Wikipedia handy you'll need to default to "how tall do evergreens get?", and the answer becomes a probability distribution. The same is true of prime numbers, once you blur out the details as much as they already have been blurred out. So the "correct answer" in a Bayesian sense is the well-calibrated answer, where you have a distribution and confidence intervals that are of known accuracy over many similar questions, even if you get the wrong answer sometimes. This sort of reasoning is totally transparent with respect to one specific mathematical phenomenon: the distribution of prime numbers in a range.

It may not be transparent with respect to other mathematical problems.

private_messaging (13y)

Equidistribution, of course, doesn't imply "equal chances" for the Nth prime. It's 3. If you don't have time to calculate 3, there's no theorem saying you can't conclude something (trivially, in edge cases where you have almost enough time, you might be able to exclude 1 somehow). Another interesting thing is that numbers of the form 2^n-1 are rarely prime (n has to be prime, and even then usually not), and we know that 2^n-1 never ends in 9; the impact of this decreases as n grows, though. All sorts of small, subtle things are going on. (And yes, the resulting difference in "probabilities" is very small.)
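The claim about 2^n-1 follows from the cycle 2, 4, 8, 6 of the last digit of 2^n. A sketch that checks it without building the large numbers, using Python's three-argument `pow`; the helper names are illustrative.

```python
def mersenne_last_digit(n):
    # last digit of 2**n - 1, computed by modular exponentiation
    return (pow(2, n, 10) - 1) % 10

def is_prime_small(p):
    # trial division; fine for the small exponents used here
    return p >= 2 and all(p % q for q in range(2, int(p ** 0.5) + 1))

all_digits = {mersenne_last_digit(n) for n in range(1, 1000)}
odd_prime_exponent_digits = {mersenne_last_digit(p) for p in range(3, 1000)
                             if is_prime_small(p)}
print(sorted(all_digits), sorted(odd_prime_exponent_digits))  # [1, 3, 5, 7] [1, 7]
```

So a Mersenne number with an odd prime exponent ends in 1 or 7, never 9; such primes are too rare for this to shift the odds for a typical large prime by much.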

magfrump (13y)

If you were able to conclude additional information about the one trillionth prime OTHER THAN that it is a large prime number of roughly 30 trillion, then that information MAY contain information about that prime modulo 10, I agree.

I would be very surprised (p = .01) if there actually are results of the form "the nth prime has such and such characteristics modulo whatever" because of the strong equidistribution results that do exist, and obviously after someone computes the answer it is known. If you look at prime numbers and apply techniques from probability theory, it does work, and it works beautifully well. Adding information beyond "high prime" would allow you to apply techniques from probability theory to deal with logical uncertainty, and it would work well. The examples you give seem to be examples of this.

My core point is that other problems may be less susceptible to approaches from probability theory. Even if looking at digits of prime numbers is not a probability theory problem, using techniques from probability theory accesses information from theorems you may not have proven. Using techniques from probability theory elsewhere does not do this, because those theorems you haven't proven aren't even true. So I am concerned, when someone purports to solve the problem of logical uncertainty, that their example problem is one whose solution looks like normal uncertainty.

I don't think we disagree on any matters of fact; I think we may disagree about the definition of the word "probability."

private_messaging (13y)

Yes, I agree. In other problems, your "probabilities" are not going to be statistically independent from basic facts of mathematics. I myself posted a top-level comment regarding the ultimate futility of the 'probabilistic' approach. Probability is not just like logic. If you have a graph with loops or cycles, it is incredibly expensive. It isn't some real numbers flowing through a network of tubes; the sides of a loop are not statistically independent. It doesn't cut down your time at all, except in extreme examples. (I once implemented a cryptographic algorithm dependent on the Miller–Rabin primality test, which is "probabilistic", and my understanding is that this is common in cryptography and is used by your browser any time you establish an SSL connection.)

magfrump (13y)

Okay yes we are in agreement.

Manfred (13y)

I think you may be mixing probability with frequency here.

If you say "I don't know the answer, but I will use heuristic reasoning to make a guess" and you ask for the ones digits of the trillionth, trillion and third, and two-trillion-and-seventeenth primes, you expect to get three numbers picked at random from that set, and guessing this way will serve you well.

You should rather say that they're independent if you're computationally limited. But that independence doesn't even matter for this post. I think you're assigning too many things to the word "random." Probability is not about random, it's about ignorance.

EDIT: speaking of ignorance, I probably made this comment in ignorance of what you actually intended, see more recent comment.

Manfred (13y)

Whoops. Well, I may have figured out what you mean. But just in case, would you be willing to try and explain what sort of extra reasoning the robot should be doing in certain cases as if I had no idea what you meant?

magfrump (13y)

What I'm saying is that the robot will actually do well in the given example, because when you use Bayesian inference on primes it works for some reason.

Other questions in math have weird answers; for example, for every natural number n, the number of smooth structures on R^n is exactly 1... except for n=4, in which case there are uncountably many.

Probability distributions can continuously approximate this, but only with a lot of difficulty and inaccuracy. It's an answer that isn't going to nicely appear out of doing Bayesian thinking. Once an agent has seen one dimension with an infinite number, they'll be hard pressed to accept that no other dimension has more than 1... or, having seen infinitely many dimensions with exactly 1, how will they accept one with infinitely many? Basically, your priors have to be weirdly formalized. This may be a problem that has already been addressed.

So my point is that just "throw Bayes at it" is a totally reasonable way of making probabilistic estimates for theorems about countable and compact sets. It's just that that method of reasoning will vary wildly in how useful it is, because some mathematical results look like probabilities and some don't.

I don't have ideas for extra reasoning processes; I just want to urge caution that the problem isn't solved just because the robot can handle some situations.

Manfred (13y)

Hm. I think you think the robot works differently than it actually does.

My guess for what you were going to say was that you wanted the robot to work in this different way, but nopes.

A robot that predicts the next number in a series is something like a minimum message-length predictor. The robot outlined here actually can't do that.

Instead, the robot attacks each problem based on theorems that require or rule out different combinations of relevant possibilities. So, for example, in the question "how many smooth structures are there in n dimensions" (thank you for making that example clear btw), the robot would for each separate case try to prove what was going on, and if it failed, it would likely do something dumb (like run out of time, or pick the first thing it hadn't proven a theorem about) because it doesn't handle infinity-option problems very well. What it wouldn't do is try and predict the answer based on the answer for smaller dimensions, unless it could prove theorems connecting the two.

magfrump (13y)

Okay, but say the robot manages in ten seconds to prove the theorem "For all dimensions n not equal to 4, there is exactly one smooth structure on R^n," but is unable to produce a result regarding n=4. Or, more realistically, say it is able to construct 1 smooth structure on R^n for any n, and can prove that no others exist unless n=4. How does it make guesses in the case n=4?

magfrump (13y)

If we want to live in the least convenient possible world, assume that in second 1 it constructs the smooth structures; it takes three seconds to prove that there are no more for n≥5, three seconds to prove no more for n=1,2, and three more seconds to prove there are no more for n=3, and then it runs out of time. These results are obtained incidentally from inequalities that arise when pursuing a proof for n=4, which is the central value of some equation at the core of the proofs. (So the proofs really say "if another smooth structure exists, it exists for n<5", then "2<n<5", then "3<n<5".)

Manfred (13y)

If it really can't prove any theorems that directly include the translation of "the number of smooth structures for n=4 is," it simply won't ever update that.

magfrump (13y)

Well it can prove "the number of structures for n=4 is at least 1."

AlexMennen (13y)

one trouble might be that this robot isn't equipped to handle improper distributions. So if you hand it an infinitude of finite simple groups and tell it to choose one, it assigns everything a probability of zero and chooses according to whatever algorithm it uses to choose between things that have identical utility.

I don't think this is too hard to fix. The robot could have a unit of improper prior (epsilon) that it remembers is larger than zero but smaller than any positive real.
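One way to sketch this fix, assuming the robot stores each probability as a pair (real part, coefficient of ε) and compares them lexicographically, so that ε is positive but below every positive real. The class and constant names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class EpsProb:
    """Hypothetical value real + eps * epsilon, where epsilon > 0 is smaller
    than every positive real. order=True compares (real, eps) lexicographically."""
    real: float
    eps: float = 0.0

    def __add__(self, other):
        return EpsProb(self.real + other.real, self.eps + other.eps)

ZERO = EpsProb(0.0)
EPSILON = EpsProb(0.0, 1.0)     # weight of one option drawn from an infinite family
TINY = EpsProb(1e-300)          # a very small but genuinely positive real

print(ZERO < EPSILON < TINY)    # True: epsilon sits strictly between 0 and any positive real
```

With this ordering, an option with any positive real probability beats every option weighted by a multiple of ε, but, as AlexMennen notes, it still doesn't answer aggregate questions such as whether the chosen group's order is even.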

Of course, this doesn't tell you what to do when asked to guess whether the order of the correct finite simple group is even, which might be a pretty big drawback.

Manfred (13y)

For the robot as described, this will actually happen (sort of like Wei Dai's comment - I'm learning a lot from discussing with you guys :D ) - it only actually lowers something's probability once it proves something about it specifically, so it just lowers the probability of most of its infinite options by some big exponential, and then, er, runs out of time trying to pick the option with highest utility. Okay, so there might be a small flaw.

private_messaging (13y)

A more effective robot only looks for the most probable digit; it doesn't need to know how probable that digit is (edit: that is, in this specific problem where utilities are equal. There is nothing ad hoc about the use of algebra). E.g. if it figures out that for some reason large primes are most likely to end in 3 (or rather, that no digit is more likely than 3), it does not even need to know that 0, 2, 4, 5, 6, 8 are not an option.

Furthermore, as you lower the prime, there has to be a seamless transition to the robot actually calculating the last digit in an efficient manner.

A bounded agent has to rationally allocate its computing time. Calculating direct probabilities or huge sets of relational probabilities, rather than the minimum necessary, is among the least rational things that can be done with the computing time. (An even worse thing to do could be comparing two estimates of utility that have different errors, such as two incomplete sums.) Outside very simple toy problems, probabilistic reasoning is incredibly computationally expensive, as often every possible combination of variables has to be considered.

Manfred (13y)

A more effective robot only looks for the most probable digit

I agree with the rest of your comment, but this seems too ad hoc. It runs into trouble if outcomes differ in utility, so that you can't just look for high probability. And storing a number seems like a much better way of integrating lots of independent pieces of information than storing a list.

private_messaging (13y)

Then you look for the largest probability*utility, which you generally do by trying to find a way to demonstrate A>B. You can do that in many cases where you can't actually evaluate either A or B (and in many cases where you can only evaluate A and B so inaccurately that the outcome of comparing the evaluations of A and B depends primarily on the inaccuracies).
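A sketch of deciding A > B without evaluating A, assuming A is a sum of non-negative terms with a known bound on the remaining tail, so each partial sum brackets A in an interval. The function and the Basel-series example are illustrative choices, not from the thread.

```python
import itertools
import math

def decide_greater(terms, tail_bound, threshold, max_terms=10**6):
    """Return (True, n) once A > threshold is proven, (False, n) once
    A < threshold is proven, or (None, n) if max_terms is reached first.
    Assumes non-negative terms, so after n terms
    partial <= A <= partial + tail_bound(n)."""
    partial, n = 0.0, 0
    for n, term in enumerate(terms, start=1):
        partial += term
        if partial > threshold:
            return True, n
        if partial + tail_bound(n) < threshold:
            return False, n
        if n >= max_terms:
            break
    return None, n

def basel_terms():
    # 1/k^2 for k = 1, 2, ...; the tail after n terms is below 1/n
    return (1.0 / k ** 2 for k in itertools.count(1))

print(decide_greater(basel_terms(), lambda n: 1.0 / n, 1.64))
print(decide_greater(basel_terms(), lambda n: 1.0 / n, 1.65))
print(decide_greater(basel_terms(), lambda n: 1.0 / n, math.pi ** 2 / 6, max_terms=1000))
```

The comparisons are settled after a few hundred terms even though the sum itself (π²/6) is never computed; when the threshold equals the value exactly, no finite amount of work decides it, which is why the cap matters.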

Furthermore, a "probability" is a list due to loss of statistical independence with other variables. edit: the pieces of information are very rarely independent, too. Some reasoning that 3 is more likely than other digits would not be independent from 2 being a bad choice.

edit: also, holy hell, the trillionth prime does end in 3.

AlexMennen (13y)

This is cool, but seems underspecified. Could you write a program that carries out this reasoning with respect to a fairly broad class of problems?

If you have an inconsistent probability distribution, the result you get when you resolve the inconsistency depends on the order in which you resolve it. For example, given P(A)=1/2, P(B)=1/2, P(AB)=1/2, and P(A¬B)=1/2, you could resolve this by deriving P(A¬B)=0 from the first 3, or by deriving P(AB)=0 from the first 2 and the last 1. Both of these answers seem obviously wrong. Does your method have a consistent way of resolving that sort of problem?
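The four numbers can't all hold because the law of total probability ties them together:

```latex
P(A) = P(A \wedge B) + P(A \wedge \neg B)
\quad\Longrightarrow\quad
\tfrac12 = \tfrac12 + \tfrac12 \quad\text{(false)}
```

Keeping P(A) and P(AB) forces P(A¬B) = 0, and keeping P(A) and P(A¬B) forces P(AB) = 0, which is why the answer depends on which constraints are applied first.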

Manfred (13y)

A nice way of thinking about it is that the robot can do unlimited probabilistic logic, but it only takes finite time because it's only working from a finite pool of proven theorems. When doing the probabilistic logic, the statements (e.g. A, B) are treated as atomic. So you can have effective inconsistencies, in that you can have an atom that says A, and an atom that says B, and an atom that effectively says 'AB', and unluckily end up with P('AB')>P(A)P(B). But you can't know you have inconsistencies in any way that would lead to mathematical problems. Once you prove that P('AB') = P(AB), where removing the quotes means breaking up the atom into an AND statement, then you can do probabilistic logic on it, and the maximum entropy distribution will no longer be effectively inconsistent.
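A small sketch of this description, again assuming the robot weights truth assignments uniformly subject to proven theorems. The atom 'AB' starts out unanalysed; an extra proven theorem A → X (X being some unproven consequence, our addition for illustration) pushes P(A) down while the unlinked atom stays at 1/2, and proving 'AB' ↔ (A ∧ B) restores coherence.

```python
from itertools import product
from fractions import Fraction

A, B, AB, X = range(4)  # atom indices; AB is the unanalysed atom 'AB'

def prob(atom, constraints):
    # uniform weight over truth assignments that satisfy every proven theorem
    worlds = [w for w in product([False, True], repeat=4)
              if all(c(w) for c in constraints)]
    return Fraction(sum(w[atom] for w in worlds), len(worlds))

a_implies_x = lambda w: (not w[A]) or w[X]          # proven: A -> X
ab_is_a_and_b = lambda w: w[AB] == (w[A] and w[B])  # proven: 'AB' <-> (A and B)

before = (prob(A, [a_implies_x]), prob(AB, [a_implies_x]))
after = (prob(A, [a_implies_x, ab_is_a_and_b]),
         prob(AB, [a_implies_x, ab_is_a_and_b]))
print(before, after)
```

Before linking, P('AB') > P(A), the effective inconsistency discussed in this exchange; after linking, P(AB) ≤ P(A) as it must be.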

AlexMennen (13y)

Oh, I see. Do you know whether you can get different answers by atomizing the statements differently? For instance, will the same information always give the same resulting probabilities if the atoms are A and B as it would if the atoms are A and A-xor-B?

P('AB')>P(A)P(B)

Not a problem if A and B are correlated. I assume you mean P('AB')>min(P(A), P(B))?

Manfred (13y)

Ah, right. Or even P('AB')>P(A).

You can't get different probabilities by atomizing things differently, all the atoms "already exist." But if you prove different theorems, or theorems about different things, then you can get different probabilities.
