Friedrich Spee von Langenfeld, a priest who heard the confessions of condemned witches, wrote in 1631 the Cautio Criminalis (“prudence in criminal cases”), in which he bitingly described the decision tree for condemning accused witches: If the witch had led an evil and improper life, she was guilty; if she had led a good and proper life, this too was a proof, for witches dissemble and try to appear especially virtuous. After the woman was put in prison: if she was afraid, this proved her guilt; if she was not afraid, this proved her guilt, for witches characteristically pretend innocence and wear a bold front. Or on hearing of a denunciation of witchcraft against her, she might seek flight or remain; if she ran, that proved her guilt; if she remained, the devil had detained her so she could not get away.

Spee acted as confessor to many witches; he was thus in a position to observe every branch of the accusation tree, that no matter what the accused witch said or did, it was held as proof against her. In any individual case, you would only hear one branch of the dilemma. It is for this reason that scientists write down their experimental predictions in advance.

But you can’t have it both ways, as a matter of probability theory, not mere fairness. The rule that “absence of evidence is evidence of absence” is a special case of a more general law, which I would name Conservation of Expected Evidence: the expectation of the posterior probability, after viewing the evidence, must equal the prior probability.

Therefore, for every expectation of evidence, there is an equal and opposite expectation of counterevidence.

If you expect a strong probability of seeing weak evidence in one direction, it must be balanced by a weak expectation of seeing strong evidence in the other direction. If you’re very confident in your theory, and therefore anticipate seeing an outcome that matches your hypothesis, this can only provide a very small increment to your belief (it is already close to 1); but the unexpected failure of your prediction would (and must) deal your confidence a huge blow. On average, you must expect to be exactly as confident as when you started out. Equivalently, the mere expectation of encountering evidence—before you’ve actually seen it—should not shift your prior beliefs.
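As a rough numerical illustration of the paragraph above, here is a minimal Python sketch for a binary hypothesis H and binary evidence E. The function names and the specific numbers (a prior of 0.9 and likelihoods of 0.95 and 0.20) are assumptions chosen for illustration, not values from the essay.

```python
def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|~E)) via Bayes' rule."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    post_e = p_e_given_h * prior / p_e
    post_not_e = (1 - p_e_given_h) * prior / (1 - p_e)
    return p_e, post_e, post_not_e


def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Average the two possible posteriors, weighted by how likely each is."""
    p_e, post_e, post_not_e = posteriors(prior, p_e_given_h, p_e_given_not_h)
    return p_e * post_e + (1 - p_e) * post_not_e


# Hypothetical numbers: a confident theory that strongly predicts E.
prior = 0.9
p_e, post_e, post_not_e = posteriors(prior, 0.95, 0.20)

print(f"P(E)                 = {p_e:.3f}")
print(f"rise if E is seen    = {post_e - prior:+.3f}")      # likely, but small
print(f"fall if E is missing = {post_not_e - prior:+.3f}")  # unlikely, but large
print(f"expected posterior   = {expected_posterior(prior, 0.95, 0.20):.3f}")
```

With these numbers the likely outcome nudges the belief up slightly, the unlikely outcome knocks it down a long way, and the probability-weighted average lands back on the prior.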

So if you claim that “no sabotage” is evidence for the existence of a Japanese-American Fifth Column, you must conversely hold that seeing sabotage would argue against a Fifth Column. If you claim that “a good and proper life” is evidence that a woman is a witch, then an evil and improper life must be evidence that she is not a witch. If you argue that God, to test humanity’s faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existence of God.

Doesn’t quite sound right, does it? Pay attention to that feeling of this seems a little forced, that quiet strain in the back of your mind. It’s important.

For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.

This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn’t bother planning how to make any given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the counterevidence of a possible “abnormal” observation, you can only do it by weakening the support of a “normal” observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can’t possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.

You might as well sit back and relax while you wait for the evidence to come in.

. . . Human psychology is so screwed up.


One minor correction, Eliezer: the link to your essay uses the text "An Intuitive Expectation of Bayesian Reasoning." I think you titled that essay "An Intuitive EXPLANATION of Bayesian Reasoning." (I am 99.9999% sure of this, and would therefore pay especial attention to any evidence inconsistent with this proposition.)

I guess I was a Bayesian before I knew what it meant....

Perhaps this formulation is nice:

0 = (P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E)

The expected change in probability is zero (for if you expected change you would have already changed).

Since P(E) and P(~E) are both positive, to maintain balance if P(H|E)-P(H) < 0 then P(H|~E)-P(H) > 0. If P(E) is large then P(~E) is small, so (P(H|~E)-P(H)) must be large to counteract (P(H|E)-P(H)) and maintain balance.

Hey, sorry if it's mad trivial, but may I ask for a derivation of this? You can start with "P(H) = P(H|E)P(E) + P(H|~E)P(~E)" if that makes it shorter.

(edit):

Never mind, I just did it. I'll post it for you in case anyone else wonders.

1} P(H) = P(H|E)P(E) + P(H|~E)P(~E) [CEE]
2} P(H)P(E) + P(H)P(~E) = P(H|E)P(E) + P(H|~E)P(~E) [because ab + (1-a)b = b]
3} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [subtract P(H) from every value to be weighted]
4} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = P(H) - P(H) = 0 [because ab + (1-a)b = b]
(conclusion)
5} 0 = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [by identity syllogism from lines 3 and 4]

8Vaniver
P(H) = P(H|E)P(E) + P(H|~E)P(~E)
P(H)*(P(E)+P(~E)) = P(H|E)P(E) + P(H|~E)P(~E)
P(H)P(E) + P(H)P(~E) = P(H|E)P(E) + P(H|~E)P(~E)
P(H)P(~E) = (P(H|E)-P(H))*P(E) + P(H|~E)P(~E)
0 = (P(H|E)-P(H))*P(E) + (P(H|~E)-P(H))*P(~E)
The trick is that P(E)+P(~E)=1, and so you can multiply the left side by the sum and the right side by 1.

Eliezer,

Of course you are assuming a strong form of Bayesianism here. Why do we have to accept that strong form?

More precisely, I see no reason why there need be no change in the confidence level. As long as the probability is greater than 50% in one direction or the other, I have an expectation of a certain outcome. So, if some evidence slightly moves the expectation in a particular direction, but does not push it across the 50% line from wherever it started, what is the big whoop?

One reason is Cox's theorem, which shows any quantitative measure of plausibility must obey the axioms of probability theory. Then this result, conservation of expected evidence, is a theorem.

What is the "confidence level"? Why is 50% special here?

"Of course you are assuming a strong form of Bayesianism here. Why do we have to accept that strong form?"

Because it's mathematically proven. You might as well ask "Why do we have to accept the strong form of arithmetic?"

"So, if some evidence slightly moves the expectation in a particular direction, but does not push it across the 50% line from wherever it started, what is the big whoop?"

Because (in this case especially!) small probabilities can have large consequences. If we invent a marvelous new cure for acne, with a 1% chance of death to the patient, it's well below 50% and no specific person using the "medication" would expect to die, but no sane doctor would ever sanction such a "medication".

"Why is 50% special here?"

People seem to have a little arrow in their heads saying whether they "believe in" or "don't believe in" a proposition. If there are two possibilities, 50% is the point at which the little arrow goes from "not believe" to "believe".

3benbenson
And if I am following you, this is irrational. Correct?
1royf
More importantly, it's physically proven. The fact that the math is consistent (and elegant!) would not have been so powerful if it wasn't also true, particularly since Bayesianism implies some very surprising predictions. Fortunately, it is the happy case that, to the best of my knowledge, no experiments thus far contradict Bayesianism, and not for the lack of trying, which is as much proof as physically possible.
8gwern
Foundational issues like Bayesianism run into the old philosophy of science problems with a vengeance: which part of the total assortment of theory and observation do you choose to throw out? If someone proves a paradox in Bayesianism, do you shrug and start looking at alternatives - or do you 'defy the evidence' and patiently wait for an E.T. Jaynes to come along and explain how the paradox stems from taking an improper limit or failing to take into account prior information etc.?
-1royf
(I'll adopt the seemingly rationalist trait of never taking questions as rhetorical, though both your questions strongly have that flavor). A central part of the modern scientific method is due to Popper, who gave an essentially Bayesian answer to your first question. However, Science wouldn't fall apart if it turned out that priors aren't a physical reality. Occam's razor is non-Bayesian, and it alone accounts for a large portion of our scientific intuitions. At the bottom line, the scientific method doesn't have to be itself true in order to be effective in discovering truths and discarding falsehoods. The concept of "proving a paradox" is unclear to me (almost a paradox in itself...). Paradoxes are mirages. Also, it seems that you have some specific piece of scientific history in mind, but I'm uncertain which. Luckily, we did have Jaynes and others to promote what I believe to be both a compelling mathematical framework and a physical reality. Before them, well, it would be wishful to think I could hold on to Bayesian ideas in the face of apparent paradoxes. The shoulders of giants etc.
3aspera
Occam's Razor is non-Bayesian? Correct me if I'm wrong, but I thought it falls naturally out of Bayesian model comparison, from the normalization factors, or "Occam factors." As I remember, the argument is something like: given two models with independent parameters {A} and {A,B}, the P(AB model) \propto P(AB are correct) and P(A model) \propto P(A is correct). Then P(AB model) <= P(A model). Even if the argument is wrong, I think the result ends up being that more plausible models tend to have fewer independent parameters.
3royf
You're not really wrong. The thing is that "Occam's razor" is a conceptual principle, not one mathematically defined law. A certain (subjectively very appealing) formulation of it does follow from Bayesianism. Your math is a bit off, but I understand what you mean. If we have two sets of models, with no prior information to discriminate between their members, then the prior gives less probability to each model in the larger set than in the smaller one. More generally, if deciding that model 1 is true gives you more information than deciding that model 2 is true, that means that the maximum entropy given model 1 is lower than that given model 2, which in turn means (under the maximum entropy principle) that model 1 was a-priori less likely. Anyway, this is all beside the discussion that inspired my previous comment. My point was that even without Popper and Jaynes to enlighten us, science was making progress using other methods of rationality, among which is a myriad of non-Bayesian interpretations of Occam's razor.
0Decius
How does deciding one model is true give you more information? Did you mean "If a model allows you to make more predictions about future observations, then it is a priori less likely?"
0royf
Let's assume a strong version of Bayesianism, which entails the maximum entropy principle. So our belief is the one that has the maximum entropy, among those consistent with our prior information. If we now add the information that some model is true, this generally invalidates our previous belief, making the new maximum-entropy belief one of lower entropy. The reduction in entropy is the amount of information you gain by learning the model. In a way, this is a cost we pay for "narrowing" our belief. The upside of it is that it tells us something useful about the future. Of course, not all information regarding the world is relevant for future observations. The part that doesn't help control our anticipation is failing to pay rent, and should be evacuated. The part that does inform us about the future may be useful enough to be worth the cost we pay in taking in new information. I'll expand on all of this in my sequence on reinforcement learning.
0Decius
At what point does the decision "This is true" diverge from the observation "There is very strong evidence for this", other than in cases where the model is accepted as true despite a lack of strong evidence? I'm not discussing the case where a model goes from unknown to known- how does deciding to believe a model give you more information than knowing what the model is and the reason for the model. To better model an actual agent, one could replace all of the knowledge about why the model is true with the value of the strength of the supporting knowledge. How does deciding that things always fall down give you more information than observing things fall down?
0CynicalOptimist
I believe the idea was to ask "hypothetically, if I found out that this hypothesis was true, how much new information would that give me?" You'll have two or more hypotheses, and one of them is the one that would (hypothetically) give you the least amount of new information. The one that would give you the least amount of new information should be considered the "simplest" hypothesis. (assuming a certain definition of "simplest", and a certain definition of "information")
0aspera
Crystal clear. Sorry to distract from the point.
1DanielLC
It's based on premises that may or may not be accurate. Just because it's mathematically proven, doesn't mean it's true.

Tom,

Bayes' Theorem has its limits. The support must be continuous, the dimensionality must be finite. Some of the discussion here has raised issues that could be relevant to these kinds of conditions, such as fuzziness about the truth or falsity of H. This is not as straightforward as you claim it is.

Furthermore, I remind one and all that Bayes' Theorem is asymptotic. Even if the conditions hold, the "true" probability is approached only in the infinite time horizon. This could occur so slowly that it might stay on the "wrong" side of 50% well past the time that any finite viewer might hang around to watch.

There is also the black swan problem. It could move in the wrong direction until the black swan datum finally shows up pushing it in the other direction, which, again, may not occur during the time period someone is observing. This black swan question is exactly the frame of discussion here, as it is Taleb who has gone on and on about this business about evidence and absence thereof.

8bigjeff5
You cannot predict a black swan. That's why it can screw up your expectation. However, once you have a black swan you'd be an irrational fool not to include it in your expectation. That's the point. That's why theories get updated - new data that nobody was aware of before does not match expectations. This new evidence adjusts the probability that the theory was correct, and it gets thrown out if a different theory now has a higher probability in light of the new evidence. This is not a shortcoming of Bayes Theorem, it's a shortcoming of observation. That you should certainly be aware of. I.e. "I might not have all the facts."

you can't possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.

But you can act to change the probability distribution of your future beliefs (just not its mean). That's the entire point of testing a belief. If you have a 50% belief that a ball is under a certain cup, then by lifting the cup, you can be certain that your future belief will be in the set {0%,100%} (with equal probability for 0 and 100, hence the same mean as now).
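The cup example can be made concrete with a small Python sketch. It compares the distribution of future beliefs under a decisive test (lifting the cup) with a weak one. The weak test's 60%/40% likelihoods and all function names are hypothetical, picked only to show the contrast.

```python
def future_belief_distribution(prior, p_e_given_h, p_e_given_not_h):
    """Return [(probability of outcome, posterior after outcome)] for E and ~E."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    outcomes = []
    if p_e > 0:
        outcomes.append((p_e, p_e_given_h * prior / p_e))
    if p_e < 1:
        outcomes.append((1 - p_e, (1 - p_e_given_h) * prior / (1 - p_e)))
    return outcomes


def mean(dist):
    return sum(p * belief for p, belief in dist)


def variance(dist):
    m = mean(dist)
    return sum(p * (belief - m) ** 2 for p, belief in dist)


# Lifting the cup: you see the ball exactly when it is there.
lift_cup = future_belief_distribution(0.5, 1.0, 0.0)
# A weak test (hypothetical likelihoods), e.g. a faint rattle when shaking the cup.
weak_test = future_belief_distribution(0.5, 0.6, 0.4)

for name, dist in [("lift cup", lift_cup), ("weak test", weak_test)]:
    print(name, "mean:", round(mean(dist), 3), "variance:", round(variance(dist), 3))
```

Both tests leave the mean of the future belief at the prior; only the spread differs, which is the sense in which choosing a test shapes the distribution without shifting its average.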

Getting the right shape of the probability distribution of future belief is the whole skill in testing a hypothesis.

But you can't have it both ways - as a matter of probability theory, not mere fairness.

You've proved your case - but there's still enough wriggle room that it won't make much practical difference. One example from global warming, which predicts higher temperature on average in Europe - unless it diverts the gulf stream, in which case it predicts lower average temperatures. Consider the two statements: 1) If average temperatures go up in Europe, or down, this is evidence for global warming. 2) If average temperatures go up in Europe, and the gulf stream isn't diverted, or average temperatures go down, while the gulf stream is diverted, this is evidence of global warming.

1) is nonsense, 2) is true. Lots of people say statements that sound like 1), when they mean something like 2). Add an extra detail, and the symmetry is broken.

This weakens the practical power of your point; if an accused witch is afraid, that shows she's guilty; if she's not afraid, in a way which causes the inquisitor to be suspicious, she's also guilty. That argument is flawed, but it isn't a logical flaw (since the similar statement 2) is true).

Then we're back to arguing the legitimacy of these "extra details".

Stuart, if the extra details are observable and specified in advance, the legitimacy is clear-cut.

Barkley, I'm an infinite set atheist, all real-world problems are finite; and you seem to be assuming that priors are arbitrary but likelihood ratios are fixed eternal and known, which is a strange position; and in any case what does that have to do with something as simple as Conservation of Expected Evidence? If anyone attempts to make an infinite-set scenario that violates CEE, it disproves their setup by reductio ad absurdum, and reinforces the ancient wisdom of E. T. Jaynes that no infinity may be assumed except as the proven limit of a finite problem.

Eliezer,

I do not necessarily believe that likelihood ratios are fixed for all time. The part of me that is Bayesian tends to the radically subjective form a la Keynes.

Also, I am a fan of nonstandard analysis. So, I have no problem with infinities that are not mere limits.

a more general law, which I would name Conservation of Expected Evidence

I thought it was pretty clear that I was coining the phrase. I'm certainly not the first person to point out the law. E.g. Robin notes that our best estimate of anything should have no predictable trend. In any case, I posted the mathematical derivation and you certainly don't have to take my word about anything.

Eliezer,

Fair enough. You get credit, then, for coining the term. However, the problem remains, why should that equals sign be there? Sure, if you put it there, the logic holds up, my niggles about Bayes' Theorem and time to convergence and all that aside. But, it is not clear at all that the equals sign should be there, or is there in any meaningfully regular way. Your defense has been to cite an essentially empirical argument by Robin. But that empirical argument is much contested in many arenas. Sure, Burton Malkiel posed that financial markets ar... (read more)

Barkley, it looks to me like Eli derived it using the sum and product rules of probability theory.

What Peter said. Barkley, do you question that P(H) = P(H,E) + P(H, ~E) or do you question that P(H,E) = P(H|E)*P(E)?

Eliezer and Peter, I think the problem is statics versus dynamics. Your set of equations are correct only at a specific point in time, which makes them irrelevant to saying anything about what happens later when new information arrives. That would entail subscripting H by time. For any given t, sure. But, that says nothing about what happens when new information arrives. P(H) might change.

The obvious example is indeed the black swan story, which we all know is what is lying behind this discussion. So, at a point in time before black swans are observe... (read more)

...

Barkley, you don't realize that Bayes's Theorem is precisely what describes the normative update in beliefs over time? That this is the whole point of Bayes's Theorem?

Before black swans were observed, no one expected to encounter a black swan, and everyone expected to encounter another white swan on occasion. A black swan is huge evidence against, a white swan is tiny additional evidence for. Had they been normative, the two quantities would have balanced exactly.

I'm not sure what to say here. Maybe point to Probability Theory: The Logic of Science or A Technical Explanation of Technical Explanation? I don't know where this misunderstanding is coming from, but I'm learning a valuable lesson in how much Bayesian algebra someone can know without realizing which material phenomena it describes.

"no one expected to encounter a white swan, and everyone expected to encounter another black swan on occasion. A white swan is huge evidence against, a black swan is tiny additional evidence for." I presume you meant the reverse of this?

per the Black Swan:

The set of potential multicolored variations of Swans is infinite (purple, brown, grey, blue, green, etc). We can not prove any one of them do not exist. But every day that proceeds where we don't see these swans gives us a higher probability they do not exist. It never equals 1, but it's darn close.

The problem with the Black Swan parable is not that it's untrue, but rather unimportant. The set of things we have no evidence of is infinite. To then pounce across an unexpected observation (eg, a Black Swan, that Kevin Federline is a re... (read more)

Eliezer,

This is about to scroll off, but, frankly, I do not know what you mean by "normative" in this context. The usual usage of this term implies statements about values or norms. I do not see that anything about this has anything to do with values or norms. Perhaps I do not understand the "whole point of Bayes' Theorem." Then again, I do not see anything in your reply that actually counters the argument I made.

Bottom line: I think your "law" is only true by assumption.

What I mean, Barkley, is that the expression P(H|E), as held at time t=0, should - normatively - describe the belief about H you will hold at time t=2 if you see evidence E at time t=1. Thus, statements true in probability theory about the decomposition of P(H) imply the normative law of Conservation of Expected Evidence, if you accept that probability theory is normative for real-world problems where no one has ever seen an infinite set.

If you don't think probability theory is valid in the real world, I have some Dutch Book trades I'd like to make with y... (read more)

Eliezer Yudkowsky, The word "normative" has stood in the way of my understanding what you mean, at least the first few times I saw you use it, before I pegged you as getting it from the heuristics and biases people. It greatly confused me many times when I first encountered them. It's jargon, so it shouldn't be surprising that different fields use it to mean rather different things.

The heuristics and biases people use it to mean "correct," because social scientists aren't allowed to use that word. I think there's a valuable lesson about academics, institutions, or taboos in there, but I'm not sure what it is. As far as I can tell, they are the only people that use it this way.

My dictionary defines normative as "of, relating to, or prescribing a norm or standard." It's confusing enough that it carries those two or three meanings, but to make it mean "correct" as well is asking for trouble or in-groups.

1Jotto999
I agree - it can be especially ambiguous if you're also used to the economics context of normative, meaning "how subjectively desirable something is".

This post was one of the most helpful for me personally, but I recently realized this isn't true in an absolute sense: "There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before."

Suppose the statement "I perform action A" is more probable given position P than given not-P. Then if I start planning to perform action A, this will be evidence that I will perform A. Therefore it will also be evidence for p... (read more)

Um, no, if a study shows that people who chew gum also have a gene GXTP27 or whatever, which also protects against cancer, I cannot plan to increase my subjective probability that I have gene GXTP27 by starting to chew gum.

See also: "evidential decision theory", which nearly all decision theorists do not believe in.

Here's an example which doesn't bear on Conservation of Expected Evidence as math, but does bear on the statement,

"There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before."

taken at face value.

It's called the Cable Guy Paradox; it was created by Alan Hájek, a philosopher at the Australian National University. (I personally think the term Paradox is a little strong for this scenario.)

Here it is: the cable guy is co... (read more)

9bigjeff5
You either have a new interval, or new information suggesting the probability density for the interval has changed. Conservation of Expected Evidence does not mean Ignorance of Observed Evidence. This is just a restatement of the black swan problem, and it's a non-issue. If evidence does not exist yet it does not exist yet. It doesn't cast doubt on your methods of reasoning, nor does it allow you to make a baseless guess of what might come in the future.
2lolbifrons
If you count the amount of "wanting to switch" you expect to have because the cable guy hasn't arrived yet, it should equal exactly the amount of "wishing you hadn't been wrong" you expect to have if you pick the second half because the cable guy arrived before your window started. I'm not sure how to say this so it's more easily parseable, but this equality is exactly what conservation of expected evidence describes.
1CCC
At 10am tomorrow, I can legitimately say that my confidence in the proposition "the cable guy will arrive after noon" is different from what it was today. There are two cases to consider:
* The cable guy arrived before 10am (occurs with 25% probability). In this case, I expect that he has a close to zero probability of arriving after noon.
* The cable guy is known not to have arrived before 10am (occurs with 75% probability). At this point, I calculate that the odds of the cable guy turning up after noon are two in three.
But none of this takes anything away from the original statement: This is because I am changing my probability estimate on the basis of new information received - it's not a fixed proposition.
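The two cases in the comment above can be checked with exact arithmetic. This is only a sketch under a stated assumption: the arrival time is uniform over an eight-hour window from 8am to 4pm, which is inferred from the 25% and two-in-three figures rather than given in the truncated description of the paradox. The variable and function names are illustrative.

```python
from fractions import Fraction

# Assumption (inferred from the comment's numbers, not stated in the source):
# arrival is uniformly distributed over an 8-hour window, 8am to 4pm.
WINDOW_START, WINDOW_END = 8, 16
NOON, CHECK_TIME = 12, 10


def prob_between(lo, hi):
    """Probability that the arrival falls in [lo, hi) under the uniform assumption."""
    lo, hi = max(lo, WINDOW_START), min(hi, WINDOW_END)
    return Fraction(max(hi - lo, 0), WINDOW_END - WINDOW_START)


prior_afternoon = prob_between(NOON, WINDOW_END)

# Case 1: he has already arrived by 10am, so "after noon" is ruled out.
p_arrived_early = prob_between(WINDOW_START, CHECK_TIME)
belief_if_arrived = Fraction(0)

# Case 2: he has not arrived by 10am, so condition on the remaining window.
p_not_yet = 1 - p_arrived_early
belief_if_not_yet = prob_between(NOON, WINDOW_END) / p_not_yet

expected_belief = p_arrived_early * belief_if_arrived + p_not_yet * belief_if_not_yet
print(prior_afternoon, p_arrived_early, belief_if_not_yet, expected_belief)
```

Under that assumption, the belief at 10am is either 0 or 2/3, and the probability-weighted average of those two values equals the belief held today.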

Eliezer - what if the presence of the gene was decided by an omnipotent being called Omega? Then you'd break out the Spearmint, right?

I'll modify my advice. If the probability that "I do action A in order to increase my subjective probability of position P" is greater given P than given not P, then doing A in order to increase my subjective probability of position P will be evidence in favor of P.

So in many cases, there will be such a plan that I can devise. Let's see Eliezer find a way out of this one.

1Liliet B
Let's say you are organising a polar expedition. It will succeed (A) or fail (~A). There is a postulate that there are no man eating polar Cthulhu in the area (P). If there are some (~P), the expedition will fail (~A), thus entangling A with P. You can do your best to prepare the expedition so that it will not fail for non-Cthulhu reasons, strengthening the entanglement - ~A becomes stronger evidence for ~P. You can also do your best to prepare the expedition to survive even the man eating polar Cthulhu, weakening the entanglement - by introducing a higher probability of A&~P, we're making A weaker evidence for P. Do any of these preparations, in themselves, actually influence the amount of man eating polar Cthulhu in the area?
1Liliet B
Before you have actually done A, since it might fail because of ~P (which is what the thing you said actually means), your confidence is still the same as before you came up with the plan. We're still at t=0. Information about your plan succeeding or not hasn't arrived yet. Now if over the course of planning you realize that the very ability you have to make the plan shifts probability estimate of P, then we've already got the new evidence. We're at t=1, and the probability has shifted rightfully without violating the law. The evidence is no longer expected, it's already here! Before you started planning, you didn't know that you would succeed and get this information. Not for certain. Or if you did, your estimate of probability of P was clearly wrong, but you hadn't noticed it yet, where the "yet" is the time factor that distinguishes between t0 and t1 again... Can't cheat your way out of this at t=0, I'm afraid.

Actually, the Omega situation is a perfect example. Someone facing the two boxes would like to increase his subjective probability that there is a million in the second box, and he is able to do this by deciding to take only the second box. If he decides to take both, on the other hand, he should decrease his credence in the presence of the million, even before opening the box.

2khafra
In this case, the decision he's leaning toward is evidence of the presence of $1M, by way of Omega's observed reliability in predicting decisions of agents like him.

Fantastic heuristic! It's like x=y·(z/y)+(1-y)·(x-z)/(1-y) for the rationalist's soul :)

It's worth noting, though, that you can rationally expect your credence in a certain belief "to increase", in the following sense: If I roll a die, and I'm about to show you the result, your credence that it didn't land 6 is now 5/6, and you're 5/6 sure that this credence is about to increase to 1.

I think this is what makes people feel like they can have a non-trivial expected value for their new beliefs: you can expect an increase or expect a decrease, but quantitatively the two possibilities exactly cancel each other out in the expected value of your belief.
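Written out for the die example, as a worked check rather than anything from the original comment (the symbol $c'$ for the credence after the reveal is introduced here for illustration):

```latex
% c' = credence in "the die did not land 6" after the result is shown.
\begin{align*}
\mathbb{E}[c'] &= \tfrac{5}{6}\cdot 1 + \tfrac{1}{6}\cdot 0 = \tfrac{5}{6}, \\
\mathbb{E}\!\left[c' - \tfrac{5}{6}\right]
  &= \tfrac{5}{6}\cdot\left(+\tfrac{1}{6}\right)
   + \tfrac{1}{6}\cdot\left(-\tfrac{5}{6}\right) = 0 .
\end{align*}
```

The likely small increase of 1/6 and the unlikely large decrease of 5/6 carry equal and opposite weight, so the expected credence stays at 5/6.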

-5jslocum

I have a theory that I will post this comment. By posting the comment, I'm seeking evidence to confirm the theory. If I post the comment, my probability will be higher than before.

Similarly, in Newcomb's problem, I seek evidence that box A has a million dollars, so I refrain from taking box B. There was money in box B, but I didn't take it, because that would give me evidence that box A was empty.

In short, there's one exception to this: when your choice is the evidence.

0CG_Morton
The simple answer is that your choice is also probabilistic. Let's say that your disposition is one that would make it very likely you will choose to take only box A. Then this fact about yourself becomes evidence for the proposition that A contains a million dollars. Likewise if your disposition was to take both, it would provide evidence that A was empty. Now let's say that you're pretty damn certain that this Omega guy is who he says he is, and that he was able to predict this disposition of yours; then, noting your decision to take only A stands as strong evidence that the box contains the million dollars. Likewise with the decision to take both. But what if, you say, I already expected to be the kind of person who would take only box A? That is, that the probability distribution over my expected dispositions was 95% only box A and 5% both boxes? Well then it follows that your prior over the contents of box A will be 95% that it contains the million and 5% that it is empty. And as a result, the likely case of you actually choosing to take only box A need only have a small effect on your expectation of the contents of the box (~.05 change to reach ~1), but in the case that you introspect and find that really, you're the kind of person who would take both, then your expectation that the box has a million dollars will drop by exactly 19(=.95/.05) times as much as it would get raised by the opposite evidence (resulting in ~0 chance that it contains the million). Making the less likely choice will create a much greater change in expectation, while the more common choice will induce a smaller change (since you already expected the result of that choice). Hope that made sense.

There is more discussion of this post here as part of the Rerunning the Sequences series.

Wouldn't the rule be something more like:

((P(H|E) > P(H)) if and only if (P(H) > P(H|~E))) and ((P(H|E) = P(H)) if and only if (P(H) = P(H|~E)))

So, if some statement is evidence of a hypothesis, its negation must be evidence against. And if some statement's truth value is independent of a hypothesis, then so is that statement's negation.

This is implied by the expectation of posterior probabilities version. Since P(E) + P(~E) = 1, that means that P(H|E) and P(H|~E) are either equal, or one is greater than P(H) and one is less than. If they were both l...
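One way to make the implication explicit, as a sketch that assumes 0 < P(E) < 1 so that both conditional probabilities are defined:

```latex
\begin{align*}
P(H) &= P(H \mid E)\,P(E) + P(H \mid \lnot E)\,\bigl(1 - P(E)\bigr) \\
P(H \mid E) - P(H) &= \bigl(1 - P(E)\bigr)\,\bigl[P(H \mid E) - P(H \mid \lnot E)\bigr] \\
P(H) - P(H \mid \lnot E) &= P(E)\,\bigl[P(H \mid E) - P(H \mid \lnot E)\bigr]
\end{align*}
```

Both differences are positive multiples of the same bracket, so they share its sign: either both are positive, both are negative, or both are zero.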

Hi, I'm new here but I've been following the sequences in the suggested order up to this point.

I have no problem with the main idea of this article. I say this only so that everyone knows that I'm nitpicking. If you're not interested in nitpicking then just ignore this post.

I don't think that the example given below is a very good one to demonstrate the concept of Conservation of Expected Evidence:

If you argue that God, to test humanity's faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existen

...
1TheOtherDave
I would say, rather, that:
G = God exists
N = The existence of God is not revealed directly to humanity
M = Miracles occur
...and we're talking about P(G|N) and P(G|M) and not talking about P(T) at all. More generally, T seems to be a red herring here. That said, I agree that there's a presumption that M implies ~N... that is, that if miracles occurred, that would constitute the direct revelation of God's existence. And yes, one could argue instead that no, miracles aren't a revelation of God's existence at all, but rather a test of faith. A lot depends here on what counts as a miracle; further discussion along this line would benefit from specificity.
2ctuck
I agree that T in and of itself is problematic. Your N seems more likely to be what the author intended, now that you point it out. Though I still don't think anyone who thought about it for more than 20 seconds would ever assert that N could be used as evidence for G. But using that as a model would probably serve well to underscore the point of Conservation of Evidence: if the fact that God has not been revealed directly to humanity is evidence for the existence of God, then should God ever reveal himself directly to humanity, it would be evidence against his existence. That's probably the statement Eliezer intended to make.
3TheOtherDave
(nods) And I would not be in the least surprised to find theologians arguing that the absence of direct evidence of God's existence is itself proof of the existence of God, and I would be somewhat surprised to find that none ever had, but I don't have examples. That said, straw theism is not particularly uncommon on LW; when people want a go-to example of invalid reasoning, belief in god comes readily to hand. It derives from a common cultural presumption of atheism, although there are some theists around.

Is this the same as Jaynes' method for construction of a prior using transformation invariance on acquisition of new evidence?

Does conservation of expected evidence always uniquely determine a probability distribution? If so, it should eliminate a bunch of extraneous methods of construction of priors. For example, you would immediately know if an application of MaxEnt was justified.

Therefore, for every expectation of evidence, there is an equal and opposite expectation of counter-evidence.

Eliezer, isn't the "equal" part untrue? I like the parallel with Newton's 3rd law, but the two terms P(H|E)*P(E) and P(H|~E)*P(~E) aren't numerically equal - we only know that they sum to P(H).

1Kindly
P(H) is the belief where you start, and P(H|E) and P(H|~E) are the possible beliefs where you end. You could go to one with probability P(E) and to the other with probability P(~E), but due to the identity you quote, in expectation you do not move at all.
3Oscar_Cunningham
The changes are equal and opposite:
[ P(H|E) - P(H) ]*P(E) + [ P(H|~E) - P(H) ]*P(~E) = 0
See Nick Hay's much earlier comment.
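A small numeric check of the distinction drawn in this exchange, as a sketch only. The function name and the example probabilities are hypothetical. The point is that the two terms P(H|E)*P(E) and P(H|~E)*P(~E) need not be equal, while the probability-weighted changes in belief always cancel.

```python
from fractions import Fraction

def weighted_shifts(p_e, p_h_given_e, p_h_given_not_e):
    """Return P(H) and the two probability-weighted changes in belief."""
    p_not_e = 1 - p_e
    # Law of total probability gives the prior P(H).
    p_h = p_h_given_e * p_e + p_h_given_not_e * p_not_e
    up = (p_h_given_e - p_h) * p_e            # [P(H|E) - P(H)] * P(E)
    down = (p_h_given_not_e - p_h) * p_not_e  # [P(H|~E) - P(H)] * P(~E)
    return p_h, up, down

# Hypothetical numbers: P(E) = 1/4, P(H|E) = 9/10, P(H|~E) = 1/2.
p_e = Fraction(1, 4)
p_h, up, down = weighted_shifts(p_e, Fraction(9, 10), Fraction(1, 2))

term_e = Fraction(9, 10) * p_e            # P(H|E) * P(E)
term_not_e = Fraction(1, 2) * (1 - p_e)   # P(H|~E) * P(~E)

print(p_h, up, down, term_e, term_not_e)
```

With these numbers the two terms are 9/40 and 15/40, which are unequal, yet the weighted shifts are +3/40 and -3/40.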

For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.

Old post, but isn't evidence that disconfirms the theory X equal to confirming ~X? Is ~X ineligible to be considered a theory?

0pragmatist
Everything in that quote applies just as much to disconfirming a theory as it does to confirming a theory. Conservation of expected evidence means that you cannot legitimately expect your confidence in a theory to go down either.

The hyperlink "An Intuitive Explanation of Bayesian Reasoning" is broken. The current location of that essay is here: http://yudkowsky.net/rational/bayes

Mantel-Cox log-rank tests compare observations and expectations too...

Can someone tell me if I understand this correctly: is he saying that we must be clear beforehand about what constitutes evidence for, what constitutes evidence against, and what doesn't constitute evidence either way?

Because in his examples it seems that what is being changed is what counts as evidence. It seems that no matter what transpires (in the witch trials for example) it is counted as evidence for. This is not the same as changing the hypothesis to fit the facts. The hypothesis was always 'she's a witch'. Then the evidence is interpreted as supportive of the hypothesis no matter what.

0gjm
You don't necessarily have to figure it out beforehand (though it's certainly harder to fool yourself if you do). But if X is evidence for Y then not-X has to be evidence for not-Y. And yes, one thing that's going wrong in those witch trials is that both X and not-X are being treated as evidence for Y, which can't possibly be correct. (And the way in which it's going wrong is that the prosecutor correctly observes that Y could produce X or not-X, whichever of the two actually happened to turn up, and fails to distinguish between that and showing that Y is more likely to produce that outcome than not-Y, which is what would actually make the evidence go in the claimed direction.) Did anyone say it is? I'm not seeing where.

Hi, new here.

I was wondering if I've interpreted this correctly:

'For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.'

Does this mean that it is impossible to prove the truth of a theory? Because the only evidence that can exist is evidence that falsifies the theory, or...

1lucidfox
It is correct that we can never find enough evidence to make our certainty of a theory exactly 1 (though we can get it very close to 1). If we were absolutely certain of a theory, then no amount of counterevidence, no matter how damning, could ever change our mind.
0nshepperd
The important part of the sentence here is seek. This isn't about falsificationism, but about the fact that no experiment you can do can confirm a theory without having some chance of falsifying it too. So any observation can only provide evidence for a hypothesis if a different outcome could have provided the opposite evidence. For instance, suppose that you flip a coin. You can seek to test the theory that the result was HEADS by simply looking at the coin with your eyes. There's a 50% chance that the outcome of this test would be "you see the HEADS side", confirming your theory (P(HEADS | you see HEADS) ~ 1). But this only works because there's also a 50% chance that the outcome of the test would have shown the result to be TAILS, falsifying your theory (P(HEADS | you see TAILS) ~ 0). And in fact there's no way to measure the coin so that one outcome would be evidence in favour of HEADS (P(HEADS | measurement) > 0.5), without the opposite result being evidence against HEADS (P(HEADS | ¬measurement) < 0.5).

Closely related is the law of total expectation: https://en.wikipedia.org/wiki/Law_of_total_expectation

It states that E[E[X|Y]]=E[X].
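A minimal sketch of that identity on a small discrete distribution. The joint probabilities and function names below are made up for illustration. If X is taken to be the 0/1 indicator of a hypothesis H and Y the evidence, E[X|Y] is the posterior and E[X] is the prior, which is the connection to conservation of expected evidence.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y); the values sum to 1.
joint = {
    (0, "a"): Fraction(1, 8),
    (1, "a"): Fraction(3, 8),
    (0, "b"): Fraction(1, 4),
    (4, "b"): Fraction(1, 4),
}

def expectation_of_x(joint):
    """E[X], computed directly from the joint distribution."""
    return sum(x * p for (x, _), p in joint.items())

def iterated_expectation(joint):
    """E[E[X|Y]]: average the conditional means, weighted by P(Y = y)."""
    p_y = defaultdict(Fraction)     # marginal P(Y = y)
    sum_xp = defaultdict(Fraction)  # sum over x of x * P(X = x, Y = y)
    for (x, y), p in joint.items():
        p_y[y] += p
        sum_xp[y] += x * p
    return sum((sum_xp[y] / p_y[y]) * p_y[y] for y in p_y)

e_x = expectation_of_x(joint)
e_e_x_given_y = iterated_expectation(joint)
print(e_x, e_e_x_given_y)
```

Exact fractions are used so the two sides can be compared with `==` rather than a floating-point tolerance.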

I do not understand the validity of this statement:

There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before.

Given a temporal proposition A among a set of other mutually exclusive temporal propositions {A, B, C...}, demonstrating B, C, and other candidates do not meet the evidence so far while A meets the evidence so far does raise our confidence in the proposition *continuing to hold*. This is standard Bayesian inferenc...

[This comment is no longer endorsed by its author]

Criticism of this article was found at a talk page at RationalWiki.

The Sequences do not contain unique ideas, and they present the ideas they do contain in misleading ways using parochial language. The "Law of Conservation of Expected Confidence" essay, for instance, covers ideas that are often covered in introductory philosophical methods or critical thinking courses. There is no novelty either in the idea that your expected future credence must match your current credence (otherwise, why not update your credence now?), nor in the idea that if E is eviden

...