Janos2

Re: whose CEV?

I'm certain this was explained in an OB post (or on the CEV page) at some point, but the idea is that people whose visions of the future are currently incompatible don't necessarily have incompatible CEVs. The whole point of CEV is to consider what we would want to want if we were better informed, familiar with all the arguments on the relevant issues, freed of akrasia and every bad quality we'd rather not have, etc. It seems likely that most of the difference between people's visions of the future stems from differing cultural/memetic backgrounds, character flaws, lack of information and time, and so on, so maybe the space of all our CEVs is actually quite small in configuration space. Then if the AI steered towards this CEV region of configuration space, it would likely satisfy many people's altruistic values, and hence be beneficial to humankind as a whole.

Ben:

Well, that depends on your number system. For some purposes +infinity is a very useful value to have. For instance, if you consider the extended nonnegative reals (i.e. including +infinity), then every measurable nonnegative extended-real-valued function on a measure space has a well-defined integral in the extended nonnegative reals. There are all kinds of mathematical structures where an infinity element (or several) is indispensable; it's a matter of context. The question of what counts as a "number" is, I think, very vague, given how many interesting number-like notions mathematicians have come up with. But unquestionably "infinity" is not a natural number, a real number, or a complex number.
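A standard example of such an integral, worked out here as an illustration (the choice of function is mine, not from the discussion): take f(x) = 1/x on (0, 1] with Lebesgue measure λ. Truncating f at height n and applying the monotone convergence theorem gives

```latex
\int_{(0,1]} \frac{1}{x}\,d\lambda(x)
  = \sup_{n \ge 1} \int_{(0,1]} \min\!\left(\tfrac{1}{x},\, n\right) d\lambda(x)
  = \sup_{n \ge 1} \left( \int_0^{1/n} n\,dx + \int_{1/n}^{1} \frac{dx}{x} \right)
  = \sup_{n \ge 1} \,(1 + \ln n) = +\infty .
```

The integral exists and equals +infinity in the extended nonnegative reals; in the ordinary reals it would simply be undefined.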

Probability theory, on the other hand, would have to change shape if we wanted to exclude probabilities of 0 altogether. What we now call measures would be wrong for the job. I don't know what the replacement would look like, but I find the standard description intuitively appealing enough that I don't think it should be changed. It's probably true that for a Bayesian inference engine of some sort, whose purpose is to find likelihoods of propositions given evidence, the "probabilities" it keeps track of shouldn't become 0 or 1. If there's a rich theory focusing on how to do this stuff in practice (and I bet there is, although I know nothing of it beyond Bayes' Theorem, which is a simple result), then ignoring the possibility of 0s and 1s makes sense there: for example, you can work with log odds, in which 0 and 1 would sit at minus and plus infinity. But in general probability theory? No.
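A minimal sketch of that log-odds bookkeeping, in Python (the function names are hypothetical, not from any particular inference engine). Bayes' Theorem becomes addition of the log likelihood ratio, and probabilities 0 and 1 map to -inf and +inf, where no finite amount of evidence can move you.

```python
import math


def log_odds(p: float) -> float:
    """Convert a probability in [0, 1] to log odds ln(p / (1 - p))."""
    if p == 0.0:
        return -math.inf  # certainty of falsehood has no finite log odds
    if p == 1.0:
        return math.inf  # certainty of truth has no finite log odds
    return math.log(p / (1.0 - p))


def probability(l: float) -> float:
    """Convert log odds back to a probability (numerically stable logistic)."""
    if l >= 0:
        return 1.0 / (1.0 + math.exp(-l))
    e = math.exp(l)
    return e / (1.0 + e)


def update(prior_log_odds: float, likelihood_ratio: float) -> float:
    """Bayes' Theorem in log-odds form.

    likelihood_ratio = P(evidence | H) / P(evidence | not H), assumed finite
    and positive; posterior log odds = prior log odds + ln(likelihood_ratio).
    """
    return prior_log_odds + math.log(likelihood_ratio)


if __name__ == "__main__":
    # Start at 50%, observe evidence 4 times likelier under H: posterior 80%.
    posterior = update(log_odds(0.5), 4.0)
    print(round(probability(posterior), 6))  # 0.8
    # A prior of exactly 1 ignores even overwhelming contrary evidence.
    print(update(log_odds(1.0), 1e-9))  # inf
```

Working in log odds keeps every finite update finite, which is why such an engine never needs to represent 0 or 1 at all.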

Eliezer:

I'm not sure what an "infinite set atheist" is, but from your post it seems you use a different notion of probability from what I think of as standard modern measure theory, which surprises me. Utilitarian's example of a uniform r.v. on [0, 1] is perfect: it must take some value in [0, 1], but for every x it takes the value x with probability 0. Clearly you can't say that for every x it's impossible for the r.v. to take the value x, because it must in fact take one of those values. Yet the probabilities are still 0. Pragmatically, this means that "probability 0" doesn't imply "impossible". If you perform an experiment countably infinitely many times, and a certain outcome has probability 0 each time, then the probability of ever getting that outcome is 0; in this sense you can say the outcome is almost impossible. However, it's possible for each outcome individually to be almost impossible, even though the experiment will of course have some outcome.
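Both claims can be written out using standard facts about Lebesgue measure. In this sketch X is uniform on [0, 1], and A_n (my notation) is the event that the n-th trial yields the outcome in question:

```latex
% A single point has probability 0: for every x in [0,1] and every epsilon > 0,
P(X = x) \le P(x - \varepsilon \le X \le x + \varepsilon) \le 2\varepsilon
\quad\Longrightarrow\quad P(X = x) = 0.

% Countably many probability-0 trials: by countable subadditivity,
P\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} P(A_n) = 0.
```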

You can object that such experiments are physically impossible, e.g. because you can only ever measure or observe countably many outcomes. That's fine; it just means you can get by with discrete measures alone. But such assumptions about the real world are not known a priori. I prefer standard measure theory, and it seems to do quite a good job of encompassing what I would want to mean by "probability", certainly including the discrete probability spaces in which "probability 0" can safely be interpreted to mean "impossible".

You're right, it's not that hard to come up with larger countable classes of reals than the computables; I just meant that all of the usual, "rolls-off-the-tip-of-your-tongue" classes seem to be subsets of the computables. But maybe Nick is right, and the definables are broader. I haven't studied this either.

And yes, I also sometimes think about how the assumptions I make about life and the perceptible universe could be wrong. But I don't do this much for mathematics I've studied deeply enough, because I'm almost as convinced of its "truth" as I am of my own ability to reason, and I don't see the use in reasoning about what to do if I can't reason. This is doubly true when the statements I'm contemplating are nonsense unless the math works.

I suspect Eliezer would object to my post, claiming that I'm confusing map and territory, but I don't think that's fair. If there's a map you're trying to use all over the place (and you do seem to), then I claim it makes no sense to put a little region on the map labelled "maybe this map doesn't make any sense at all". If the map seems to make sense and you're still following it for everything, you'll have to ignore that region anyway. So is it really reasonable to claim that "the probability that probability makes sense is < 1"?

Utilitarian:

Measure theory gives a clear answer to this: it's 0. Which is fine. For every x, the probability that your r.v. takes the value x is 0. In fact, the probability that your r.v. is computable is also 0. (The computable numbers are the largest countable class I know of.) What's false is the tempting statement that probability-0 events are impossible; only the converse is true: impossible events have probability 0. Another tempting statement is also false, namely that if S is an arbitrary collection of disjoint events, the probability of one of them happening is the sum of their individual probabilities. This only holds for countable S; countable additivity is part of the definition of a measure.
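Countable additivity is exactly what separates the two cases. In this sketch, C = {c_1, c_2, ...} is an enumeration of the computable numbers in [0, 1] (one exists because they are countable), and X is uniform on [0, 1]:

```latex
% Countable union: additivity applies, so X is almost surely uncomputable.
P(X \in C) = \sum_{n=1}^{\infty} P(X = c_n) = \sum_{n=1}^{\infty} 0 = 0.

% Uncountable union: [0,1] = \bigcup_{x \in [0,1]} \{x\}, yet
1 = P\big(X \in [0,1]\big) \ne \sum_{x \in [0,1]} P(X = x) = 0,
% so additivity cannot be extended to uncountable families of disjoint events.
```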

I agree with cumulant. The mathematical subject of probability is based on measure theory, which would lose a ton of convergence theorems if we excluded 0 and 1. We can agree that things not known a priori can't have probability 0 or 1, but I think we must also agree that "an impossible thing will happen soon" has probability 0, because it's a contradiction. An alternate universe in which the number 7 (in the same kind of number system as ours, etc.) is not prime is damn-near inconceivable, but an alternate universe in which impossible things are possible is purely absurd.

If our mathematical reasoning is coherent enough for it to be meaningful to make probability assignments, then certainly we are not so fundamentally flawed that what we consider tautologies could be false. If you are willing to accept that maybe 0 is 1, then you can't do any of your probability adjustments, or use Bayes' Theorem, or anything of the sort without a (possibly unstated) caveat that probability theory might be complete nonsense. But what's the probability that probability theory is nonsense (i.e. false or inconsistent)? What does that even mean? We can only assign a probability if doing so makes sense, so conditioned on the sentence making sense, probability theory must be nonsense with probability 0, no? So averaged over all possible universes (those where probability theory makes sense, and those where it doesn't), the sentence "probability makes sense with probability 1" approximates the truth value of "probability makes sense" better than "probability makes sense with probability p" does, for any p < 1. If it isn't better, it's still no worse, but what the hell are we even saying?

Agreed re: the bashing of mainstream math in PT:TLOS. AFAIK, Jaynes's claims that mainstream math leads to paradoxes are all false; of course, acting as though various pieces of mainstream math meant what an uneducated first glance says they mean can make them look bad. (E.g. the Banach-Tarski paradox means either "omg, mathematicians think they can violate conservation of mass!" or "OK, so I guess non-measurable things are crazy and should be avoided.") The bashing is not only unnecessary and annoying; I also think that using standard measure theory would sometimes clarify things. For instance, MaxEnt depends on what distribution you start with, because a probability distribution doesn't actually have an entropy, only an entropy relative to a reference measure, and that reference measure need not be uniform, even for a discrete variable. Jaynes seems to strongly deemphasize this, which is unfortunate: from PT:TLOS it seems as though MaxEnt gives you a prior given only some constraints, when really you also need a "prior prior".
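A minimal sketch of the reference-measure point, in Python. The function names and the non-uniform reference measure are made up for illustration. The code maximizes entropy relative to a reference measure q for a six-sided die whose mean is constrained to 4.5. It relies on the standard fact that the maximizer has the exponential-tilt form p_i ∝ q_i · exp(λ·x_i). With a uniform q this is ordinary MaxEnt; with a different q the same constraint gives a different answer, and when the constraint is already satisfied by q, the answer is q itself.

```python
import math
from typing import List, Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Entropy of p relative to reference measure q, i.e. -KL(p || q).

    Always <= 0, with equality exactly when p == q; terms with p_i = 0 contribute 0.
    """
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def maxent(values: Sequence[float], q: Sequence[float], target_mean: float,
           lo: float = -20.0, hi: float = 20.0, iters: int = 200) -> List[float]:
    """Maximize relative_entropy(p, q) subject to sum(p) = 1 and E_p[X] = target_mean.

    The maximizer is the exponential tilt p_i ∝ q_i * exp(lam * x_i). Its mean is
    increasing in lam, so lam is found by bisection on [lo, hi]. Assumes every
    q_i > 0 and min(values) < target_mean < max(values).
    """
    def tilt(lam: float) -> List[float]:
        w = [qi * math.exp(lam * x) for qi, x in zip(q, values)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(p: Sequence[float]) -> float:
        return sum(pi * x for pi, x in zip(p, values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(tilt(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return tilt((lo + hi) / 2)


if __name__ == "__main__":
    faces = [1, 2, 3, 4, 5, 6]
    uniform = [1 / 6] * 6
    skewed = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]  # hypothetical "prior prior"
    for name, q in [("uniform", uniform), ("skewed", skewed)]:
        p = maxent(faces, q, 4.5)
        print(name, [round(pi, 3) for pi in p])
```

Both runs satisfy the same constraint, but they return different distributions. That difference is the "prior prior" doing real work.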