I read this comment, and after a bit of rambling I realized I was as confused as the poster. After a bit more thinking I ended up with the “definition” of probability under the next heading. It’s nothing groundbreaking, just a distillation (specifically, mine) of things discussed here over time. It’s just what my brain thinks when I hear the word.
But I was surprised and intrigued when I actually put it in writing and read it back and thought about it. I don’t remember seeing it stated like that (but I probably read some similar things).
It probably won’t teach anyone anything, but it might trigger a similar “distillation” of “mind pictures” in others, and I’m curious to see that.
What “probability” is...
Or, more exactly, what is the answer to “what’s the probability of X?”
Well, I don’t actually know, and it probably depends on who asks. But here’s the skeleton of the answer procedure:
1. Take the set of all (logically) possible universes; call it U. Assign to each universe u a finite, real value m(u) (see below).
2. Eliminate from the set those universes that are inconsistent with your experiences. Call the remaining set E.
3. Construct T, the subset of E where X holds (happens, or happened, or is true).
4. Assign to each universe u in E a value p(u), such that p(u) is inversely proportional to m(u) and the integral of p over E is 1.
5. Calculate the integral of p over T. The result is called “the probability of X”, and is the answer to the question.
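For a finite toy case the five steps can be sketched in Python. This is only an illustration under assumptions that are not part of the schema itself: the set of universes is finite (so the "integral" is a plain sum), m is strictly positive, and the names `probability`, `consistent` and `x` are made up for the example.

```python
from fractions import Fraction
from itertools import product


def probability(universes, m, consistent, x):
    """Finite-case sketch of the five steps (all names are hypothetical).

    universes  -- iterable of hashable "universes" (step 1)
    m          -- function giving each universe a positive, finite value (step 1)
    consistent -- predicate: is this universe consistent with experience? (step 2)
    x          -- predicate: does X hold in this universe? (step 3)
    """
    E = [u for u in universes if consistent(u)]            # step 2
    if not E:
        raise ValueError("no universe is consistent with experience")
    T = [u for u in E if x(u)]                             # step 3
    # Step 4: p(u) is proportional to 1/m(u); dividing by the total normalizes it.
    weight = {u: Fraction(1) / Fraction(m(u)) for u in E}
    total = sum(weight.values())
    return sum(weight[u] for u in T) / total               # step 5


# Toy example: a "universe" is the full history of two coin flips.
U = list(product("HT", repeat=2))
p = probability(
    U,
    m=lambda u: 1,                       # every universe gets the same measure
    consistent=lambda u: u[0] == "H",    # experience: the first flip came up heads
    x=lambda u: u[1] == "H",             # X: the second flip comes up heads
)
print(p)  # 1/2
```

Using `Fraction` keeps the toy arithmetic exact; for anything infinite the sums would have to be replaced by whatever "integrating" means for that collection.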
I’m aware that this isn’t quite a definition; in fact, it leaves more unsaid (undefined) than it explains. Nevertheless, to me it seems that the structure itself is right: people might have different interpretations for the details (and, like me, be uncertain about them), but those differences would still be mentally structured like above.
In the next section I explain a bit where each piece comes from and what it means, and in the one after I’m going to ramble a bit.
About (logically possible) universes: We don’t actually know what our universe is; as such, “other possible universes” isn’t quite a well-defined concept. For generality, the only constraint I put above is that they be logically possible, for the sole reason that the description is (vaguely) mathematical and I don’t have any idea what math without logic means. (I might be missing something, though.)
Note that by “universe” I really mean an entire universe, not just “until now”. E.g., if it so happens your experiences allow for a single possible past (i.e., you know the entire history), but your universe is not deterministic, there are still many universes in E (one for each possible future); if it’s deterministic, then E contains just one universe. (And your calculations are a lot easier...)
Before you get too scared or excited by the concept of “all possible universes”, remember that not all of them are actually used in the rest of the procedure. We actually need only those consistent with experience. That’s still a lot when you think about it, but my mind seems to reel in panic more often when I forget this point. (Lest this note make you too comfortable, I must also mention that the possibility that experience is (even partly) simulated explodes the size of E.)
About that real value m I was talking about: “m” comes from “measure”, but that’s a consequence of how I arrived at the schema above. Even now I’m not quite sure it belongs there, because it depends on what you think “possible universes” means. If you just set it to 1 for all universes, everything works.
But, for example, you might consider that the set U of possible universes is countable, encode them all as numbers using a well-defined rule, and use the Kolmogorov complexity of the bit-string encoding a universe as that universe’s measure. (Given step 4 above, this would mean that you think simpler universes are more probable; except it doesn’t quite mean that, because “probable” is defined only after you have picked your “m”. It’s probably closer to “things that happen in simpler universes are more probable”; more in the ramblings section.)
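To see how the choice of m changes the answer, here is a small sketch. Everything in it is an assumption made for illustration: the "universes" are just the bit-strings of length 1 to 3, and the length of the string is used as a crude stand-in for complexity (it is not Kolmogorov complexity).

```python
from fractions import Fraction
from itertools import product

# Toy U: every bit-string of length 1 to 3 stands in for one "universe".
U = ["".join(bits) for n in (1, 2, 3) for bits in product("01", repeat=n)]


def prob(E, m, x):
    """Sum of p over the universes where x holds, with p(u) proportional to 1/m(u)."""
    weight = {u: Fraction(1, m(u)) for u in E}   # m must return a positive integer here
    return sum(weight[u] for u in E if x(u)) / sum(weight.values())


def is_short(u):
    """X: the universe is one of the two shortest ones."""
    return len(u) == 1


uniform = prob(U, lambda u: 1, is_short)   # m = 1 everywhere
by_length = prob(U, len, is_short)         # m = string length, a stand-in for complexity

print(uniform, by_length)  # 1/7 3/10
```

With m set to 1 the two short universes get 2 of 14 equal shares; with the length-based m the same event gets more weight, which is the "things that happen in simpler universes are more probable" effect in miniature.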
A bit about the math: I used some math terms a bit loosely in the schema above. Depending on exactly what you mean by “possible universes”, the set of them might be finite, countably infinite, uncountable, or might be a proper class rather than a set. Depending on that, “integrating” might become a different operation. If you can’t (mathematically, not physically) do such an operation on your collection of possible universes (actually, on those in E) then you have to define your own concept of probability :-P
With regards to computability, note that the series of steps above is not an algorithm, it’s just the definition. It doesn’t feel intuitive to me that there is any possible universe where you can actually follow the steps above, but math surprises me in that regard sometimes. But note that you don’t really need p(X): you just need a good-enough approximation, and you’re free to use any trick you want.
If the above didn’t interest you, the rest probably won’t, either. I’ve put in this section the most interesting consequences of the schema above. It’s kind of rambling, and I apologize; as in the last section, I’ll bold keywords, so you might just skim it for paragraphs that might interest you.
I found it interesting (but not surprising) to note that Bayesian statistics correspond well to the schema above. As far as I can tell, the Bayesian prior for (any) X is the number assigned in step 5; Bayesian updating is just going back to step 2 whenever you have new experiences. The interesting part is that my description smells frequentist. I wasn’t that surprised because the main difference (in my head) between the two is the use of priors; frequentist statistics ignore prior knowledge. If you just do frequentist statistics on every possible event in every possible universe (for some value of possible), then there is no “prior knowledge” left to ignore.
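The claim that updating is just "going back to step 2" can be checked on a toy case. The sketch below assumes a finite set of universes with m = 1 for all of them; the coin types and the way universes are enumerated are invented for the example.

```python
from fractions import Fraction
from itertools import product

# A universe is (coin type, first flip, second flip). A fair coin allows all
# four flip histories; a two-headed coin allows only one.
universes = [("fair",) + flips for flips in product("HT", repeat=2)]
universes.append(("two-headed", "H", "H"))


def prob(E, x):
    """Steps 3 to 5 with m = 1: the fraction of E in which x holds."""
    return Fraction(sum(1 for u in E if x(u)), len(E))


def is_two_headed(u):
    return u[0] == "two-headed"


prior = prob(universes, is_two_headed)         # no experience yet

E1 = [u for u in universes if u[1] == "H"]     # back to step 2: saw heads once
after_one = prob(E1, is_two_headed)

E2 = [u for u in E1 if u[2] == "H"]            # back to step 2: saw heads again
after_two = prob(E2, is_two_headed)

# The same first update done with Bayes' rule, for comparison.
fair_only = [u for u in universes if u[0] == "fair"]
likelihood_fair = prob(fair_only, lambda u: u[1] == "H")   # P(heads | fair)
bayes = prior / (prior + (1 - prior) * likelihood_fair)    # P(heads | two-headed) = 1

print(prior, after_one, after_two, bayes)  # 1/5 1/3 1/2 1/3
```

Filtering E and recounting gives the same number as Bayes' rule, at least in this toy case where everything is finite and enumerable.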
The schema above describes only true/false–type problems. For non-binary problems you just split E in step 3 into several subsets, one for each possible answer. If the problem is real-valued you need to split E into an uncountably infinite number of sets, but I’ve abused set theory terms enough today that I’m not very concerned. Anyway, in practice (in our universe) it’s usually enough to just split the domain of the value into countably many intervals, according to the precision you need, and split the universes in E according to which interval they fall in. That is, you don’t actually need to know the probability that a value is, say, sqrt(2), just that it’s closer to sqrt(2) than you can measure it.
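A sketch of the non-binary version, again for a finite toy case: instead of one subset T, E is partitioned by the answer each universe gives. The helper name `distribution` and the two-dice universes are assumptions for illustration; the second call shows the "split the domain into intervals" idea with intervals of width 1.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution(E, answer, m=lambda u: 1):
    """Split E by answer(u) and return the total p of each part.

    m must return a positive integer in this sketch.
    """
    weights = defaultdict(Fraction)          # missing keys start at Fraction(0)
    for u in E:
        weights[answer(u)] += Fraction(1, m(u))
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Toy E: a universe is the outcome of two dice.
E = list(product(range(1, 7), repeat=2))

# Discrete question: what is the sum of the dice?
by_sum = distribution(E, lambda u: u[0] + u[1])

# "Real-valued" question: in which interval [k, k+1) does the ratio d1/d2 fall?
# Integer floor division picks the interval without any floating-point rounding.
by_interval = distribution(E, lambda u: u[0] // u[1])

print(by_sum[7], by_interval[1])  # 1/6 1/3
```

Narrower intervals would mean more parts and a finer answer, but the procedure itself does not change.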
With regard to past discussions about a rationale for rationality, observe that it’s possible to apply the procedure above to evaluate what the “rational way” is, supposing you define it by “the rational guy plays to win”: instead of step 3, generate the set of decision procedures that are applicable in all of E, and call it D; for each d in D, split E into universes where you win and those where you lose (don’t win), and call these W(d) and L(d); instead of step 4, for each decision procedure d, calculate the “winningness” of d as the integral of p over W(d) divided by the integral of p over L(d) (with p defined as above); instead of step 5, pick a decision procedure d0 such that its “winningness” is maximal (no other has a larger value).
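A toy version of the "winningness" variant might look like this. The assumptions are loud ones: E is finite, m is a positive integer, the decision procedures are just three invented bets on a die, and a procedure that never loses is given infinite winningness so that the division is defined.

```python
import math
from fractions import Fraction


def winningness(E, wins, d, m=lambda u: 1):
    """Total p over W(d) divided by total p over L(d).

    The normalizing constant of p cancels in the ratio, so raw 1/m weights are enough.
    """
    won = sum(Fraction(1, m(u)) for u in E if wins(d, u))
    lost = sum(Fraction(1, m(u)) for u in E if not wins(d, u))
    return math.inf if lost == 0 else won / lost


# Toy E: a universe is the face shown by one die.
E = range(1, 7)

# Hypothetical decision procedures: each is a bet that wins on some faces.
WINNING_FACES = {
    "bet_even": {2, 4, 6},
    "bet_at_least_3": {3, 4, 5, 6},
    "bet_six": {6},
}
D = list(WINNING_FACES)


def wins(d, u):
    return u in WINNING_FACES[d]


scores = {d: winningness(E, wins, d) for d in D}
d0 = max(D, key=scores.get)   # a procedure with maximal winningness

print(d0, scores[d0])  # bet_at_least_3 2
```

Whether a ratio is the right way to score procedures is left open by the text; the sketch only follows the description literally.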
Note that I’ve no idea if doing this actually picks the decision procedure above, nor what exactly it would mean if it doesn’t... Of course, if it does, it’s still circular, like any “reason for reasoning”. The procedure might also give different results for people with different E. I found it interesting to contemplate that it might be “possible” for someone in another universe (one much friendlier to applied calculus than ours) to calculate exactly the solution of the procedure for my E, but at the same time for the best procedure for approximating it in my universe to give a different answer. They can’t, of course, communicate this to me (since then they’re not in a different universe in the sense used above).
If your ontology implies a computable universe (so you only need to consider computable universes in E), you might want to use Kolmogorov complexity as a measure for the universes. I’ve no idea which encoding you should use to calculate it; there are theorems that say the complexities assigned by two encodings differ by at most a constant, but I don’t see why certain encodings can’t be biased to have systematic effects on your probability calculations. (Other than “it’s kind of a long shot”.) You might use the above procedure for deciding on decision procedures, of course :-P
There’s also a theorem that says you can’t actually make a program to compute the KC for any arbitrary bit-string. There might be a universe–to–bit-string encoding that generates only bit-strings for which there is such a program, but that’s also kind of a long shot.
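Even though KC itself can't be computed, a general-purpose compressor gives a computable stand-in that is sometimes used in its place: the compressed length, plus a constant for the decompressor, is an upper bound on the complexity. The sketch below uses Python's standard `zlib` module; treating the compressed size as the measure m is an assumption of the example, not something the schema requires.

```python
import random
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data.

    This is only a crude upper bound (up to a constant) on Kolmogorov
    complexity; the true value could be much smaller.
    """
    return len(zlib.compress(data, 9))


# A very regular "universe" and a pseudo-random one of the same length.
regular = b"01" * 500
rng = random.Random(0)                       # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

m_regular = complexity_upper_bound(regular)
m_noisy = complexity_upper_bound(noisy)

# With p proportional to 1/m, the regular universe gets the larger weight.
print(m_regular < m_noisy)  # True
```

The worry about encodings applies here too: a different compressor is a different encoding, and nothing guarantees the resulting measures agree on anything beyond the constant bound.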
If your ontology implies quantum mechanics then I think the measure of the universes (m(u) in step 1) must involve wave functions somehow, but my understanding of QM doesn’t allow me to think it through much.
The schema above illuminated a bit something that puzzled me in that comment I was talking about at the beginning: say you are suddenly sent to the planet Progsta and a Sillpruk comes and asks you whether the game of Doldun will be won by the team Strigli; what’s your prior for the answer? What puzzled me was that the very fact that you were asked that question communicates an enormous amount of information — see this comment of mine for examples — and yet I couldn’t actually see how that should affect my priors. Of course, the information content of the question restricts hugely the universes in my E. But there were so many there that it’s still huge; more importantly, it restricts the universes along boundaries that I’ve not previously explored, and I don’t have ready heuristics to estimate that little p above:
If I throw a (fair) die, I can split the universes into six approximately equal parts on vague symmetry justifications, and just estimate the probability of each side as 1/6. If someone on the street asks me to bet against him on his die, I can split the universes into those where I win and those where I lose and estimate (using a kind of Monte Carlo integration over the various scenarios I can think of) that I’ll probably lose. If I encounter an alien named Sillpruk I’ve no idea how to split the universes to estimate the result of a Doldun match. But if I were to encounter lots of aliens with strange first-questions for a while, I might develop some such simple heuristics based on simple trial and error.
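The "kind of Monte Carlo integration" can be sketched as sampling universes instead of enumerating them. Everything concrete below is invented for illustration: the function name, the two street-bet scenarios, and especially their weights and win chances, which are made-up numbers rather than estimates of anything.

```python
import random


def monte_carlo(sample_universe, x, n=100_000, seed=0):
    """Estimate the integral of p over T by sampling universes according to p."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n) if x(sample_universe(rng)))
    return hits / n


# Fair die: the symmetry argument says 1/6; sampling should land close to it.
p_six = monte_carlo(lambda rng: rng.randint(1, 6), lambda face: face == 6)

# Street bet: hypothetical scenarios as (name, relative weight, my chance of winning).
SCENARIOS = [("fair die", 1, 0.5), ("loaded die", 3, 0.1)]


def sample_bet(rng):
    """Pick a scenario by weight, then decide whether I win in that universe."""
    _, _, win_chance = rng.choices(SCENARIOS, weights=[s[1] for s in SCENARIOS])[0]
    return rng.random() < win_chance


p_win = monte_carlo(sample_bet, lambda won: won)

print(round(p_six, 2), round(p_win, 2))
```

With these made-up numbers the exact answer for the bet is (1 × 0.5 + 3 × 0.1) / 4 = 0.2, so the estimate should come out near that. For the Doldun match there is no comparable list of scenarios to sample from, which is the whole problem.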
I’m sorry if this was too long or just stupid. In the former case I welcome constructive criticism — don’t hesitate to tell me what you think should have been cut. I hereby subject myself to Crocker’s Rules. In the latter case... well, sorry :-)