What Are Probabilities, Anyway?

In Probability Space & Aumann Agreement, I wrote that probabilities can be thought of as weights that we assign to possible world-histories. But what are these weights supposed to mean? Here I’ll give a few interpretations that I've considered and held at one point or another, and their problems. (Note that in the previous post, I implicitly used the first interpretation in the following list, since that seems to be the mainstream view.)

  1. Only one possible world is real, and probabilities represent beliefs about which one is real.
    • Which world gets to be real seems arbitrary.
    • Most possible worlds are lifeless, so we’d have to be really lucky to be alive.
    • We have no information about the process that determines which world gets to be real, so how can we decide what the probability mass function p should be? 
  2. All possible worlds are real, and probabilities represent beliefs about which one I’m in.
    • Before I’ve observed anything, there seems to be no reason to believe that I’m more likely to be in one world than another, but we can’t let all their weights be equal.
  3. Not all possible worlds are equally real, and probabilities represent “how real” each world is. (This is also sometimes called the “measure” or “reality fluid” view.)
    • Which worlds get to be “more real” seems arbitrary.
    • Before we observe anything, we don't have any information about the process that determines the amount of “reality fluid” in each world, so how can we decide what the probability mass function p should be?
  4. All possible worlds are real, and probabilities represent how much I care about each world. (To make sense of this, recall that these probabilities are ultimately multiplied with utilities to form expected utilities in standard decision theories.)
    • Which worlds I care more or less about seems arbitrary. But perhaps this is less of a problem because I’m “allowed” to have arbitrary values.
    • Or, from another perspective, this drops yet another hard problem on top of the pile of problems called “values”, where it may never be solved.

As you can see, I think the main problem with all of these interpretations is arbitrariness. The unconditioned probability mass function is supposed to represent my beliefs before I have observed anything in the world, so it must represent a state of total ignorance. But there seems to be no way to specify such a function without introducing some information, which anyone could infer by looking at the function.

For example, suppose we use a universal distribution, where we believe that the world-history is the output of a universal Turing machine given a uniformly random input tape. But then the distribution contains the information of which UTM we used. Where did that information come from?
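
To make the UTM-dependence concrete, here is a toy sketch (my own illustration, not a real universal Turing machine: `machine_a` and `machine_b` are made-up stand-ins for a choice of machine) of weighting each output by the total 2^-length mass of the "programs" that produce it:

```python
from collections import defaultdict
from itertools import product

def toy_prior(machine, max_len=12):
    """Weight each output by the total 2**-len(p) mass of the programs p
    that produce it, then normalize (real constructions use prefix-free codes)."""
    prior = defaultdict(float)
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            prior[machine("".join(bits))] += 2.0 ** (-n)
    total = sum(prior.values())
    return {output: mass / total for output, mass in prior.items()}

# Two toy "machines" standing in for a choice of UTM:
machine_a = lambda p: p[0]                           # output = first bit of the program
machine_b = lambda p: "0" if "1" not in p else "1"   # "0" only for all-zero programs

prior_a = toy_prior(machine_a)   # roughly uniform over {"0", "1"}
prior_b = toy_prior(machine_b)   # heavily favors "1"
```

The two machines induce different priors over the same outputs, which is exactly the worry: the distribution encodes information about the machine we happened to pick.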

One could argue that we do have some information even before we observe anything, because we're products of evolution, which would have built some useful information into our genes. But to the extent that we can trust the prior specified by our genes, it must be that evolution approximates a Bayesian updating process, and our prior distribution approximates the posterior distribution of such a process. The "prior of evolution" still has to represent a state of total ignorance.

These considerations lead me to lean toward the last interpretation, which is the most tolerant of arbitrariness. This interpretation also fits well with the idea that expected utility maximization with Bayesian updating is just an approximation of UDT that works in most situations. I and others have already motivated UDT by considering situations where Bayesian updating doesn't work, but it seems to me that even if we set those aside, there is still reason to consider a UDT-like interpretation of probability where the weights on possible worlds represent how much we care about those worlds.

78 comments

In order to answer questions like "What are X, anyway?", we can (phenomenologically) turn the question into something like "What can we do with X?" or "What consequences does X have?"

For example, consider the question "What are ordered pairs, anyway?". Sometimes you see "definitions" of ordered pairs in terms of set theory. Wikipedia says that the standard definition of ordered pairs is:

(a, b) := {{a}, {a, b}}

Many mathematicians find this "definition" unsatisfactory, and view it not as a definition, but an encoding or translation. The category-theoretic notion of a product might be more satisfactory. It pins down the properties that the ordered pair already had before the "definition" was proposed and in what sense ANY construction with those properties could be used. Lambda calculus has a couple constructions that look superficially quite different from the set-theory ones, but satisfy the category-theoretic requirements.
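
As an illustrative sketch (my own, with invented names `k_pair`, `pair`, `fst`, `snd`): both the Kuratowski set encoding and a Church-style lambda encoding satisfy the one defining property that matters, namely that the components can be recovered and distinct orderings stay distinct.

```python
# Kuratowski encoding in sets:
def k_pair(a, b):
    return frozenset([frozenset([a]), frozenset([a, b])])

# Church-style encoding in plain lambdas:
pair = lambda a: lambda b: (lambda f: f(a)(b))
fst = lambda p: p(lambda a: lambda b: a)
snd = lambda p: p(lambda a: lambda b: b)

# Superficially very different constructions, same defining behavior:
assert k_pair(1, 2) == k_pair(1, 2) and k_pair(1, 2) != k_pair(2, 1)
assert fst(pair(1)(2)) == 1 and snd(pair(1)(2)) == 2
```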

I guess this is a response at the meta level, recommending this sort of "phenomenological" lens as the way to resolve these sort of questions.

Lambda calculus has a couple constructions that look superficially quite different from the set-theory ones, but satisfy the category-theoretic requirements.

... as does the set-theoretic one.

ETA: Now that I read more closely, you didn't imply otherwise.

This word "possible" carries a LOT of hidden baggage. If math tells us anything, it's that LOTS of things SEEM possible to us, because we aren't logically omniscient, but aren't really possible.

While we're at it, how about we drop "worlds" from the mix. I don't think it adds anything. If we replace it with "information flows" do things work better?

Do you mean something precise by "information flows"?

Possible world is a standard term in several related fields, such as philosophy and linguistics. Are you arguing against my particular usage, or all usage of the term in general?

"Worlds" apparently means pretty-much what it means in the MWI.

Lumping probabilities in with utilities sounds pretty close to Vladimir Nesov's Representing Preference by Probability Measures.

Before I’ve observed anything, there seems to be no reason to believe that I’m more likely to be in one world than another, but we can’t let all their weights be equal.

We can't? Why not? Estimating the probability of two heads on two coinflips as 25% is giving existence in worlds with heads-heads, heads-tails, tails-heads, and tails-tails equal weight. The same is true of a more complicated proposition like "There is a low probability that Bigfoot exists" - giving every possible arrangement of objects/atoms/information equal weight, and then ruling out the ones that don't result in the evidence we've observed, few of these worlds contain Bigfoot.
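
The counting described here can be sketched directly (a toy illustration, not anyone's actual prior): give every world equal weight, then rule out the worlds inconsistent with the evidence.

```python
from itertools import product
from fractions import Fraction

worlds = list(product("HT", repeat=2))           # HH, HT, TH, TT -- equal weight
p_two_heads = Fraction(sum(w == ("H", "H") for w in worlds), len(worlds))       # 1/4

# Conditioning = discarding the worlds that contradict what we've observed:
consistent = [w for w in worlds if w[0] == "H"]  # evidence: first flip was heads
p_given = Fraction(sum(w == ("H", "H") for w in consistent), len(consistent))   # 1/2
```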

giving every possible arrangement of objects/atoms/information equal weight

Without an arbitrary upper bound on complexity, there are infinitely many possible arrangements.

Theoretically, it's not infinite because of the granularity of time/space, speed of light, and so on.

Practically, we can get around this because we only care about a tiny fraction of the possible variation in arrangements of the universe. In a coin flip, we only care about whether a coin is heads-up or tails-up, not the energy state of every subatomic particle in the coin.

This matters in the case of a biased coin - let's say biased towards heads 66%. This, I think, is what Wei meant when he said we couldn't just give equal weights to all possible universes - the ones where the coin lands on heads and the ones where it lands on tails. But I think "universes where the coin lands on heads" and "universes where the coin lands on tails" are unnatural categories.

Consider how the probability of winning the lottery isn't .5 because we choose with equal weight between the two alternatives "I win" and "I don't win". Those are unnatural categories, and instead we need to choose with equal weight between "I win", "John Q. Smith of Little Rock Arkansas wins", "Mary Brown of San Antonio, Texas, wins" and so on to millions of other people. The unnatural category "I don't win" contains millions of more natural categories.

So on the biased coin flip, the categories "the coin lands heads" and "the coin lands tails" contain a bunch of categories of lower-level events about collisions of air molecules and coin molecules and amounts of force one can use to flip a coin, and two-thirds of those events are in the "coin lands heads" category. But among those lower-level events, you choose with equal weight.

True, beneath these lower-level categories about collisions of air molecules, there are probably even lower things like vibrations of superstrings or bits in the world-simulation or whatever the lowest level of reality is, but as long as these behave mathematically I don't see why they prevent us from basing a theory of probability on the effects of low level conditions.
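
A minimal numeric sketch of this point (the six "micro-worlds" are invented purely for illustration): equal weights at the bottom level can still yield unequal weights on macro-categories.

```python
# Six equally weighted low-level outcomes (initial conditions, air-molecule
# collisions, etc.); four of them happen to end in heads:
micro_worlds = ["H", "H", "H", "H", "T", "T"]
p_heads = micro_worlds.count("H") / len(micro_worlds)   # 2/3: the coin's bias
# emerges from counting equally weighted micro-events, not from weighting
# the macro-categories "heads" and "tails" unequally by fiat.
```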

Theoretically, it's not infinite because of the granularity of time/space, speed of light, and so on.

These initial weights are supposed to be assigned before taking into account anything you have observed. But even now (under the second interpretation in my list) you can't be sure that the world you're in is finite. So, suppose there is one possible world for each integer in the set of all integers, or one possible world for each set in the class of all sets. How could one assign equal weight to all possible worlds, and have the weights add up to 1?
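
The obstruction is elementary. As a sketch: if every one of countably many worlds received the same weight $c$, the total mass could never be 1:

```latex
\sum_{n=1}^{\infty} c =
\begin{cases}
0 & \text{if } c = 0,\\
\infty & \text{if } c > 0,
\end{cases}
\qquad \text{never } 1.
```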

Practically, we can get around this because we only care about a tiny fraction of the possible variation in arrangements of the universe. In a coin flip, we only care about whether a coin is heads-up or tails-up, not the energy state of every subatomic particle in the coin.

I don't think that gets around the problem, because there is an infinite number of possible worlds where the energy state of nearly every subatomic particle encodes some valuable information.

How could one assign equal weight to all possible worlds, and have the weights add up to 1?

By the same method we do calculus. Instead of a sum over the possible worlds, we integrate over the possible worlds (an infinite sum of infinitesimally small values). For an explicit construction of how this is done, any basic calculus book is enough.

My understanding is that it's possible to have a uniform distribution over a finite set, or an interval of the reals, but not over all integers, or all reals, which is why I said in the sentence before the one you quoted, "suppose there is one possible world for each integer in the set of all integers."

There is a 1:1 mapping between "the set of reals in [0,1]" and "the set of all reals". So take your uniform distribution on [0,1] and put it through such a mapping... and the result is non-uniform. Which pretty much kills the idea of "uniform <=> each element has the same probability as each other".
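
A quick sketch of that pushforward (using $\tan$ as one such bijection; the resulting distribution happens to be the standard Cauchy): equal-length intervals of the real line end up with very different mass.

```python
import math
import random

random.seed(0)
xs = [random.random() for _ in range(100_000)]            # uniform on (0, 1)
ys = [math.tan(math.pi * (x - 0.5)) for x in xs]          # bijection onto the reals

# Two intervals of the same length, wildly different probability mass:
near_zero = sum(-1.0 <= y <= 1.0 for y in ys) / len(ys)   # about 0.5
far_away = sum(99.0 <= y <= 101.0 for y in ys) / len(ys)  # nearly 0
```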

There is no such thing as a continuous distribution on a set alone; it has to be on a metric space. Even if you make a metric space out of the set of all possible universes, that doesn't give you a universal prior, because you have to choose what metric it should be uniform with respect to.

(Can you have a uniform "continuous" distribution without a continuum? The rationals in [0,1]?)

As there is a 1:1 mapping between the set of all reals and the unit interval, we can just use the unit interval and define a uniform distribution there. Whatever distribution you choose, we can map it into the unit interval, as Pengvado said.

In the case of the set of all integers, I'm not completely certain. But I'd look at the set of computable reals, which suffices for much of mathematics. Normal calculus can be done with just the computable reals (the numbers for which there is an algorithm that provides any given decimal digit in finite time). So basically we have a mapping from the computable reals in the unit interval into the set of all integers.

Another question is whether the uniform distribution is the entropy-maximising distribution when we consider the set of all integers.

From a physical standpoint, why are you interested in countably infinite probability distributions? If we assume discrete physical laws, we'd have a finite number of possible worlds; on the other hand, if we assume continuous laws, we'd have an uncountably infinite number, which can be mapped into the unit interval.

Off the top of my head, I can imagine a set of discrete worlds of all sizes, which would be countably infinite. What other kinds of worlds could there be where this would be relevant?

Theoretically, it's not infinite because of the granularity of time/space, speed of light, and so on.

(Nitpick: Spacetime isn't quantized AFAIK in standard physics, and then there are still continuous quantum amplitudes.)

This, I think, is what Wei meant when he said we couldn't just give equal weights to all possible universes - the ones where the coin lands on heads and the ones where it lands on tails. But I think "universes where the coin lands on heads" and "universes where the coin lands on tails" are unnatural categories.

I thought Wei was talking about single worlds (whatever those may be), not sets of worlds. Applied to sets of worlds, this seems correct.

Yvain covered the finiteness point well, but I think the "infinitely many possible arrangements" needs a little elaboration.

In any continuous probability distribution we have infinitely many (actually uncountably infinitely many) possibilities, which makes the probability of any single outcome 0. That is why, in the case of continuous distributions, we talk about the probability of the outcome lying in a certain interval (a collection of infinitely many arrangements).

So instead of counting the individual arrangements, we calculate integrals over some set of arrangements. Infinitely many arrangements is no hindrance to applying probability theory. Actually, if we can assume a continuous distribution, it makes some things much easier.
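
As a concrete sketch, take the uniform density $f(x) = 1$ on $[0, 1]$:

```latex
P(X = x) = 0 \quad \text{for every single } x,
\qquad
P(a \le X \le b) = \int_a^b 1 \, dx = b - a,
```

so individual arrangements carry zero probability, while intervals (sets of arrangements) carry all the mass.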

Good point. Does this work over all infinite sets, though? Integers? Rationals?

It does work. Actually, if we're using the integers (there are as many integers as rationals, so we don't need to treat the latter set separately), we get the good old discrete probability distribution, where we have either a finite number of possibilities or at most a countable infinity of them, e.g. the set of all integers.

The real numbers are a strictly larger set than the integers, so in a continuous distribution we have, in a sense, more possibilities than in a countably infinite discrete distribution.

Hmmm - caring as a part of reality? Why not just flip things around, and consider that emotion is also part of reality. Random by any other name. Try to exclude it and you'll find you can't, no matter how infinitely many worlds you suppose. There's also calculus to irrationality . . .

The "caring" interpretation doesn't say that caring is part of reality (except insofar as minds are implemented in reality). Rather, it says that probability isn't part of reality, it's part of decision theory (again except insofar as minds are implemented in reality).

Cool! But can you really posit artificial intelligence (decision theory has to get enacted somewhere) and not allow mind as part of reality?

All possible worlds are real, and probabilities represent how much I care about each world. ... Which worlds I care more or less about seems arbitrary.

This view seems appealing to me, because 1) deciding that all possible worlds are real seems to follow from the Copernican principle, and 2) if all worlds are real from the perspective of their observers, as you said it seems arbitrary to say which worlds are more real.

But on this view, what do I do with the observed frequencies of past events? Whenever I've flipped a coin, heads has come up about half the time. If I accept option 4, am I giving up on the idea that these regularities mean anything?

What does real even mean, by the way? Interpretation 1 with real taken to mean ‘of or pertaining to the world I'm in’ (as I would) is equivalent to Interpretation 2 with real taken to mean ‘possible’ (as Tegmark would, IIUC) and to Interpretation 3 with real taken to mean ‘likely’ and to Interpretation 4 with real taken to mean ‘important to me’.

You're getting yourself into trouble because you assume that puzzling questions must have deep answers, when usually the question itself is flawed or misleading. In this case there just doesn't seem to be a need for any explanation of the kind you offer, nor would one be of any use anyway.

These 'explanations' you offer of probability aren't really explaining anything. Certainly we do successfully use probability to reason about systems that behave in a deterministic classical fashion (rolling dice probably counts). No matter what sort of probability you believe in, you have to explain that application. So introducing 'objective' probability merely adds things we need to explain (possible worlds, etc.).

The correct approach is to step back and ask what is it that needs explaining. Well probability is really nothing but a fancy way of counting up outcomes. So once we justify describing the world in a probabilistic fashion (even when it's deterministic in some sense) the application of mathematical inference to reformulate that description in more useful ways is untroubling. In other words if it's reasonable to model rolling two six sided dice as being independent uniformly random variables on 1...6 counting up the combinations and saying there is a 1/6 chance of getting a 7 doesn't raise any new difficulties.
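
The dice count described here, spelled out as a direct enumeration:

```python
from itertools import product

# All 36 equally likely ordered outcomes of two fair six-sided dice:
rolls = list(product(range(1, 7), repeat=2))
p_seven = sum(a + b == 7 for a, b in rolls) / len(rolls)   # 6/36 = 1/6
```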

So the question just comes down to: is it reasonable of us to model the world using random variables? I mean, one might worry that some worlds were deeply 'tricky', in that almost always, when two objects appeared to behave like independent random variables, in reality there was some hidden correlation that would eventually pop out to bite you in the ass, and then once you'd taken that correlation into account another one would bite you, and so on and so on.

But if you think about it for a while, this isn't really so much a question about the nature of the world as it is a purely mathematical question. If we keep factoring out by our best predictions, will the remaining unaccounted-for variation in outcomes appear to be random, i.e., make modeling it as random variables an accurate way to make predictions? Well, that's actually kind of complicated. I have a theorem (well, a tiny tweak of someone else's theorem plus an interpretation) which I believe says that yes, indeed it must work this way. I won't go into it here, but let me just say one thing to convince you of its plausibility.

Basically the argument is that things only fail to look random because we notice a more accurate way of predicting their behavior. The only evidence for a sequence of observations failing to be random according to the supposed distribution would be a pattern in the observations not captured by R, which would in turn yield a more accurate distribution. So basically the claim is that we can always divide any observable into the part we can predict (i.e. a distribution of outcomes) and the part we can't. Once you mod out by the part you can predict, by definition anything left is totally unpredictable to you (e.g. to computable machines) and thus can't detectably fail to look random according to its distribution, since that detection would itself be a better prediction.

This isn't rigorous (it's complicated), but the point is that randomness is nothing but our inability to make any better predictions.

It depends. We use the term "probability" to cover a variety of different things, which can be handled by similar mathematics but are not the same.

For example, suppose that I'm playing blackjack. Given a certain disposition of cards, I can calculate a probability that asking for the next card will bust me. In this case the state of the world is fixed, and probability measures my ignorance. The fact that I don't know which card would be dealt to me doesn't change the fact that there's a specific card on the top of the deck waiting to be dealt. If I knew more about the situation (perhaps by counting cards) I might have a better idea of which cards could possibly be on top of the deck, but the same card would still be on top of the deck. In this situation, case 1 applies from the choices above.

Alternately consider photons going through a double slit in the classic quantum physics experiment. If the holes are of equal size and geometry, a photon has a 50% chance of passing through each slit (the probabilities can be adjusted, for example by changing the width of one slit). One of the basic results of quantum physics is that the profile of the light through both slits is not the same as the sum of the profiles of the light through each slit. In general, it is not possible to say which slit a given photon went through, and attempting to make that measurement changes the answer. In this situation, case 3 of the above post seems to apply.

My point is that the post's question can't be answered for probabilities in general. It depends.

2 and 4 are much the same if you only care about worlds you are in.

The post would be much better if a definition of "possible world" were given. When giving definitions, it would also help to define precisely what "real" means.

More or less, I interpret "reality" as all things which can be observed. "Possible", in my language, is something which I can imagine and which doesn't contradict facts that I already know. This is a somewhat subjective definition, but possibility obviously depends on subjective knowledge. I have flipped a coin. Before I looked at the result, it was possible that it came up heads. After I have looked at it, it's clear that it came up tails, and heads is impossible.

Needless to say, people rarely imagine whole worlds. Rather, they use the word "possible" when speculating about unknown parts of this world. Which may be confusing, since our intuitive understanding of the word doesn't match its use.

Even if defined somehow objectively (as e.g. possible world is any world isomorphic to a formal system with properties X), it seems almost obvious that real world(s) and possible worlds are different categories. If not, there is no need to have distinct names for them.

So before creating theories about what probability means, I suggest we unite the language. These things have been discussed here already several times, but I don't think there is a consensus in interpretation of "possible", "real", "world", "arbitrary". And, after all, I am not sure whether "probability" even should be interpreted using these terms. It almost feels like "probability" is a more fundamental term than "possible" or "arbitrary".

I must admit that I am biased against "possible worlds" and similar phrases, because they tend to appear mostly in theological and philosophical discussions, whose rather empty conclusions are dissatisfying. I am afraid there is a lack of guidelines strong enough to keep such thinking within the limits of rationality.

Why should probabilities mean anything? How would you behave differently if you decided (or learned) that a given interpretation was correct?

As long as there's no difference, and your actions add up to normality under any of the interpretations, then I don't see why an interpretation is needed at all.

The different interpretations suggest different approaches to answer the question of "what is the right prior?" and also different approaches to decision theory. I mentioned that the "caring" interpretation fits well with UDT.

Can't you choose your (arational) preferences to get any behaviour (decision theory) no matter what interpretation you choose?

Preferences may be arational, but they're not completely arbitrary. In moral philosophy there are still arguments for what one's preferences should be, even if they are generally much weaker than the arguments in rationality. Different interpretations influence what kinds of arguments apply or make sense to you, and therefore influence your preferences.

How can there be arguments about what preferences should be? Aren't they, well, a sort of unmoved mover, a primal cause? (To use some erstwhile philosophical terms :-)

I can understand meta-arguments that say your preferences should be consistent in some sense, or that argue about subgoal preferences given some supergoals. But even under strict constraints of that kind, you have a lot of latitude, from humans to paperclip maximizers on out. Within that range, does interpreting probabilities differently really give you extra power you can't get by finetuning your prefs?

Edit: the reason I'd prefer editing prefs is that talking about the Meaning of Probabilities sets off my materialism sensors. It leads to things like multiple-world theories because they're easy to think about as an interpretation of QM, regardless of whether they actually exist. Then they can actually negatively affect our prefs or behavior.

Re: "How can there be arguments about what preferences should be?"

The idea that some preferences are "better" than other ones is known as "moral realism".

Wikipedia says moral realists (in general) claim that moral propositions can be true or false as objective facts, but that their truth cannot be observed or verified. This doesn't make any sense. Sounds like religion.

Are you looking at http://en.wikipedia.org/wiki/Moral_realism ...?

Care to quote an offending section about moral truths not being observable or verifiable?

Under the section "Criticisms":

Others are critical of moral realism because it postulates the existence of a kind of "moral fact" which is nonmaterial and does not appear to be accessible to the scientific method. Moral truths cannot be observed in the same way as material facts (which are objective), so it seems odd to count them in the same category. One emotivist counterargument (although emotivism is usually non-cognitivist) alleges that "wrong" actions produce measurable results in the form of negative emotional reactions, either within the individual transgressor, within the person or people most directly affected by the act, or within a (preferably wide) consensus of direct or indirect observers.

Regarding the emotivist criticism, it begs a lot of questions. Surely not all negative emotional reactions signal wrong moral actions. Besides, emotivism isn't aligned with moral realism.

I see - thanks.

That some criticisms of moral realism appear to lack coherence does not seem to me to be a point that counts against the idea.

I expect moral realists would deny that morality is any more nonmaterial than any other kind of information - and would also deny that it does not appear to be accessible to the scientific method.

If moral realism acts as a system of logical propositions and deductions, then it has to have moral axioms. How are these grounded in material reality? How can they be anything more than "because I said so and I hope you'll agree"? Isn't the choice of axioms done using a moral theory nominally opposed to moral realism, such as emotivism, or (amoral) utilitarianism?

One way would be to consider the future of civilization. At the moment, we observe a Shifting Moral Zeitgeist. However, in the future we may see ideas about how to behave towards other agents settle down into an optimal region. If that turns out to be a global optimum - rather than a local one - i.e. much the same rules would be found by most surviving aliens - then that would represent a good foundation for the ideas of moral realism.

Even today, it should be pretty obvious that some moral systems are "better" than others ("better" in the sense of promoting the survival of those systems). That doesn't necessarily mean there's a "best" one - but it leaves that possibility open.

It might also sound like science - don't scientists generally claim that propositions about the world can be true or false, but cannot be directly observed or verified?

Joshua Greene's thesis "The Terrible, Horrible, No Good, Very Bad Truth about Morality and What to Do About it" might be a decent introduction to moral realism / irrealism. Overall it is an argument for irrealism.

In science, a proposition about the world can generally be supported or undermined to any desired degree of confidence, so you can become as sure about it as you like if you invest enough resources.

In moral realism, propositions are purely logical constructs, and can be proven true or false just like a mathematical proposition. Their truth is one with the truth of the axioms used, and the axioms can't be proven or disproven with any degree of certainty; they are simply accepted or not accepted. The morality is internally consistent, but you can't derive it from the real world, and you can't derive any fact about the real world from the morality. That sounds just like theology to me. (The difference between this and ordinary math or logic is that mathematical constructs aren't supposed to lead to should or ought statements about behavior.)

I will read Greene's thesis, but as far as I can tell it argues against moral realism (and does it well), so it won't help me understand why anyone would believe in it.

How can there be arguments about what preferences should be?

Well, I don't know what many of my preferences should be. How can I find out except by looking for and listening to arguments?

Aren't they, well, a sort of unmoved mover, a primal cause? (To use some erstwhile philosophical terms :-)

No, not for humans anyway.

Well, I don't know what many of my preferences should be. How can I find out except by looking for and listening to arguments?

That implies there's some objectively-definable standard for preferences which you'll be able to recognize once you see it. Also, it begs the question of what in your current preferences says "I have to go out and get some more/different preferences!" From a goal-driven intelligence's POV, asking others to modify your prefs in unspecified ways is pretty much the anti-rational act.

I think we need to distinguish between what a rational agent should do, and what a non-rational human should do to become more rational. Nesov's reply to you also concerns the former, I think, but I'm more interested in the latter here.

Unlike a rational agent, we don't have well-defined preferences, and the preferences that we think we have can be changed by arguments. What to do about this situation? Should we stop thinking up or listening to arguments, and just fill in the fuzzy parts of our preferences with randomness or indifference, in order to emulate a rational agent in the most direct manner possible? That doesn't make much sense to me.

I'm not sure what we should do exactly, but whatever it is, it seems like arguments must make up a large part of it.

Please see my reply to Nesov above, too.

I think we shouldn't try to emulate rational agents at all, in the sense that we shouldn't pretend to have rationality-style preferences and supergoals; as a matter of fact we don't have them.

Up to here we seem to agree, we just use different terminology. I just don't want to conflate rational preferences with human preferences, because the two systems behave very differently.

Just as an example, in signalling theories of behaviour, you may consciously believe that your preferences are very different from what your behaviour is actually optimizing for when no one is looking. A rational agent wouldn't normally have separate conscious/unconscious minds unless only the conscious part was subject to outside inspection. In this example, it makes sense to update signalling-preferences sometimes, because they're not your actual acting-preferences.

But if you consciously intend to act out your (conscious) preferences, and also intend to keep changing them in not-always-foreseeable ways, then that isn't rationality, and when there could be confusion due to context (such as on LW most of the time) I'd prefer not to use the term "preferences" about humans, or to make clear what is meant.

That arguments modify preference means that you are (denotationally) arriving at different preferences depending on arguments. This means that, from the perspective of a specific given preference (or a "true" neutral preference not biased by specific arguments), you fail to obtain an optimal rational decision algorithm, and thus fail to achieve a high-preference strategy. But at the same time, "absence of action" is also an action, so not exploring the arguments may well be a worse choice, since you won't be moving towards a clearer understanding of your own preference, even if the preference you end up understanding will be somewhat biased compared to the unknown original one.

Thus, there is a tradeoff:

  • Irrational perception of arguments leads to modification of preference, which is bad for the original preference, but
  • Considering moral arguments leads to a clearer understanding of some preference close to the original one, which allows you to make more rational decisions, which is good for the original preference.

FWIW, my preferences have not been changed by arguments in the last 20 years. So I don't think your "we" includes me.

As an example, consider the arguments in form of proofs/disproofs of the statements that you are interested in. Information doesn't necessarily "change" or "determine arbitrarily" the things you take from it, it may help you to compute an object in which you are already interested, without changing that object, and at the same time be essential in moving forward. If you have an algorithm, it doesn't mean that you know what this algorithm will give you in the end, what the algorithm "means". Resist the illusion of transparency.

I don't understand what you're saying as applied to this argument. That Wei Dai has an algorithm for modifying his preferences and he doesn't know what the end output of that algorithm will be?

There will always be something about preference that you don't know, and it's not the question of modifying preference, it's a question of figuring out what the fixed unmodifiable preference implies. Modifying preference is exactly the wrong way of going about this.

If we figure out the conceptual issues of FAI, we'd basically have the algorithm that is our preferences, but not in the infinite and unknowable denotational "execution trace" form.

As Wei says below, we should consider rational agents (who have explicit preferences separate from the rest of their cognitive architecture) separately from humans who want to approximate that in some ways.

I think that if we first define separate preferences, and then proceed to modify them over and over again, this is so different from rational agents that we shouldn't call it preferences at all. We can talk about e.g. morals instead, or about habits, or biases.

On the other hand, if we define human preferences as 'whatever human behavior happens to optimize', then there's nothing interesting about changing our preferences; this is something that happens all the time whether we want it to or not. Under this definition Wei's statement that he deliberately makes it happen is unclear (the totality of a human's behaviour, knowledge, etc. is subtly changing over time in any case), so I assumed he was using the former definition.

There is no clear-cut dichotomy between defining something completely at the beginning and doing things arbitrarily as we go. Instead of defining preference for rational agents in a complete, finished form and then seeing what happens, consider a process of figuring out what preference is. This is neither a way to arrive at the final answer at any point, nor a history of observing "whatever happens". A rational agent is an impossible construct, but something irrational agents aspire to be without ever attaining it. What they want to become isn't directly related to what they "appear" to strive towards.

I understand. So you're saying we should indeed use the term 'preference' for humans (and a lot of other agents) because no really rational agents can exist.

Actually, why is this true? I don't know about perfect rationality, but why shouldn't an agent exist whose preferences are completely specified and unchanging?

I understand. So you're saying we should indeed use the term 'preference' for humans (and a lot of other agents) because no really rational agents can exist.

Right. Except that really rational agents might exist, but not if their preferences are powerful enough, as humans' have every chance to be. And whatever we irrational humans, or our godlike but still, strictly speaking, irrational FAI, try to do, the concept of "preference" still needs to be there.

Actually, why is this true? I don't know about perfect rationality, but why shouldn't an agent exist whose preferences are completely specified and unchanging?

Again, it's not about changing preference. See these comments.

An agent can have a completely specified and unchanging preference, but still not know everything about it (and never be able to know everything about it). In particular, this is a consequence of the halting problem: if you have the source code of a program, this code completely specifies whether the program halts, and you may run this code for arbitrarily long without ever changing it, yet still not know whether it halts, and never be able to figure that out, unless you are lucky enough to arrive at a solution in this particular case.
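The Collatz iteration gives a concrete, if informal, illustration of this point. The function below is just a hypothetical sketch: its few lines of source completely and unchangingly specify, for every starting number, whether the iteration reaches 1, yet whether it does so for *all* starting numbers is an open problem.

```python
def collatz_halts_within(n, max_steps):
    """Run the Collatz iteration from n for at most max_steps steps.

    The source code fully specifies the behaviour for every n and never
    changes; but whether the iteration reaches 1 for ALL n is an open
    question (the Collatz conjecture). A fixed, unmodified specification
    whose consequences you may never be able to figure out.
    """
    for _ in range(max_steps):
        if n == 1:
            return True
        n = 3 * n + 1 if n % 2 else n // 2
    return False  # inconclusive within this step budget

print(collatz_halts_within(27, 200))  # True: 27 reaches 1 in 111 steps
print(collatz_halts_within(27, 10))   # False: not settled within 10 steps
```

Note that returning False only means "not settled yet", mirroring how running a program for a long time without it halting settles nothing.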

OK, I understand now what you're saying. I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren't separate from the rest of our mind.

I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren't separate from the rest of our mind.

I don't understand this point.

Rational (designed) agents can have an architecture with preferences (decision making parts) separate from other pieces of their minds (memory, calculations, planning, etc.) Then it's easy (well, easier) to reason about changing their preferences because we can hold the other parts constant. We can ask things like "given what this agent knows, how would it behave under preference system X"?

The agent may also be able to simulate proposed modifications to its preferences without having to simulate its entire mind (which would be expensive). And, indeed, a sufficiently simple preference system may be chosen so that it is not subject to the halting problem and can be reasoned about.
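As a toy sketch of such a factored architecture (the names and setup here are hypothetical, not from the thread): if beliefs and preferences are separate components, we can hold the beliefs constant and ask how the agent would behave under a different preference system, without simulating the rest of its mind.

```python
def choose(beliefs, preference, actions):
    """Pick the expected-utility-maximizing action.

    beliefs: dict mapping action -> dict of outcome -> probability.
    preference: dict mapping outcome -> utility (the swappable part).
    Holding beliefs fixed, we can plug in different preference systems.
    """
    def eu(a):
        return sum(p * preference[o] for o, p in beliefs[a].items())
    return max(actions, key=eu)

beliefs = {
    "bet":  {"win": 0.5, "lose": 0.5},
    "pass": {"status_quo": 1.0},
}

# Two candidate preference systems over the same outcomes:
cautious = {"win": 1.0, "lose": -3.0, "status_quo": 0.0}
daring   = {"win": 1.0, "lose": -0.5, "status_quo": 0.0}

print(choose(beliefs, cautious, ["bet", "pass"]))  # pass (EU -1.0 < 0.0)
print(choose(beliefs, daring,   ["bet", "pass"]))  # bet  (EU 0.25 > 0.0)
```

The point of the factoring is visible in the call signature: answering "given what this agent knows, how would it behave under preference system X?" only requires swapping one argument.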

In humans though, preferences and every other part of our minds influence one another. While I'm holding a philosophical discussion about morality and deciding how to update my so-called preferences, my decisions happen to be affected by hunger or tiredness or remembering having had good sex last night. There are lots of biases that are not perceived directly. We can't make rational decisions easily.

In rational agents who self-modify their preferences, the new prefs are determined by the old prefs, i.e. via second-order prefs. But in humans prefs are potentially determined by the entire state of mind, so perhaps we should talk about "modifying our minds" rather than our prefs, since it's hard to exclude most of our mind from the process.

Then it's easy (well, easier) to reason about changing their preferences because we can hold the other parts constant.

As per Pei Wang's suggestion, I'm stating that I'm going to opt out of this conversation until you take seriously (accept/investigate/argue against) the statement that preference is not to be modified, something that I stressed in several of the last comments.

There are other relevant differences as well, of course. For instance, a good rational agent would be able to literally rewrite its preferences, while humans have trouble with self-binding their future selves.

All possible worlds are real, and probabilities represent how much I care about each world.

Could you elaborate on what it means to have a given amount of "care" about a world? For example, suppose that I assign (or ought to assign) probability 0.5 to a coin's coming up heads. How do you translate this probability assignment into language involving amounts of care for worlds?

You care equally for your selves that see heads and your selves that see tails. If you don't care what happens to you after you see heads, then you would assign probability one to tails. Of course, you'd be wrong in about half the worlds, but hey, no skin off your nose. You're the one who sees tails. Those other guys ... they don't matter.
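A minimal sketch of this interpretation (hypothetical code, assuming only that these weights are multiplied by utilities as in standard expected-utility theory): zeroing out the "care weight" on the heads-worlds is arithmetically the same as assigning probability one to tails.

```python
def expected_utility(care, utility):
    """Expected utility with normalized "care weights" over worlds.

    care: dict mapping world -> how much you care about it.
    utility: dict mapping world -> payoff in that world.
    """
    total = sum(care.values())
    return sum(care[w] * utility[w] for w in care) / total

utility = {"heads": 0.0, "tails": 1.0}  # a bet that pays off on tails

# Caring equally about heads-selves and tails-selves acts like p = 0.5:
equal_care = {"heads": 1.0, "tails": 1.0}
# Caring only about the tails-selves acts like assigning p = 1 to tails:
tails_only = {"heads": 0.0, "tails": 1.0}

print(expected_utility(equal_care, utility))  # 0.5
print(expected_utility(tails_only, utility))  # 1.0
```

The two interpretations produce identical decisions here; they differ only in what the weights are taken to mean.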

A bizarre interpretation.

For example, caring about "living until tomorrow" does not normally mean assigning a zero probability to death in the interim. If anything that would tend to make you fearless - indifferent to whether you stepped in front of a bus or not - the very opposite of what we normally mean by "caring" about some outcome.

Thanks. That makes it a lot clearer.

It seems like this "caring" could be analyzed a lot more, though. For example, suppose I were an altruist who continued to care about the "heads" worlds even after I learned that I'm not in them. Wouldn't I still assign probability ~1 to the proposition that the coin came up tails in my own world? What does that probability assignment of ~1 mean in that case?

I suppose the idea is that a probability captures not only how much I care about a world, but also how much I think that I can influence that world by acting on my values.

See http://lesswrong.com/lw/15m/towards_a_new_decision_theory/ for more details. Many of my later posts can be considered explanations/justifications for the "design choices" I made in that post.