Expected evidence is conserved if almost anything you could observe would update you slightly in the same direction, with the much rarer cases updating you correspondingly much further in the opposite direction.
For what it's worth, I also read this as Eliezer reporting on a case where he later realised that his views were violating CoEE. This is equivalent to observing evidence, realising that your prior should have given it more credence, and then updating in the same direction you moved your prior. Sounds awful but can be reasonable for computationally bounded agents!
This sounds like you're describing questions whose answers you know some things about. There is a subtle but important difference between knowing the answer, versus knowing something about the answer. It sounds like "direction" is metadata about answers, not the answer itself. In programming, I would liken it to the difference between knowing what type a function will return, versus knowing what value it will return when presented with a particular input. The direction is like the type; the answer itself is like the value.
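The type-versus-value distinction can be made concrete with a short sketch (the function and inputs here are invented purely for illustration):

```python
def parse_age(text: str) -> int:
    """Toy function: the signature promises an int, whatever the input."""
    return int(text.strip())

# Before calling, we know the "direction" of the answer (its type: int),
# the same way we can know metadata about an answer without knowing it.
# Only after supplying a particular input do we learn the value itself.
answer = parse_age(" 42 ")
assert isinstance(answer, int)  # knowable in advance, from the signature
assert answer == 42             # knowable only after "observing" the call
```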
I often get the impression that Eliezer thinks more abstractly than I tend to, so I find it useful to construct several embarrassingly concrete examples which seem to fit what he's talking about and try to triangulate from them toward what he's actually discussing.
What's the most embarrassingly concrete, eli5 example that might highlight what "direction" is? Maybe we breed two organisms within the same species, and wonder what color the offspring will be. I'm reading "direction" as metadata about the answer, so the "direction" of the answer would be the space of possible colors for the species? I know that whatever color I observe on the offspring will be one of the possible colors for the species unless something went wrong -- if it comes out a brand new color, I was wrong about what colors were possible within that species, and if the parents don't reproduce successfully, I was wrong about the offspring existing at all.
Any instance of the planning fallacy is an example of this: any setback, regardless of its specifics, would make one expect the project to take longer. Yet by default people predict timelines as if the project will go smoothly, even though, if asked, they'd say they expect some setback or other.
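The planning-fallacy arithmetic can be sketched with made-up numbers: every positive-probability setback pushes the expected duration above the smooth-case estimate.

```python
# Made-up numbers: the project takes 10 days if absolutely nothing goes wrong.
smooth_days = 10
setbacks = {                      # setback -> (probability, extra days)
    "vendor delay":   (0.3, 5),
    "scope creep":    (0.4, 8),
    "key person ill": (0.1, 3),
}

# Each setback contributes p * extra days to the expectation, so the
# expected duration strictly exceeds the smooth-case estimate whenever
# any setback has positive probability.
expected_days = smooth_days + sum(p * extra for p, extra in setbacks.values())
print(expected_days)  # about 15.0 with these made-up numbers
```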
Eliezer says some confusing things in Biology Inspired Timelines:
A few lines down, he admits it might be tricky to get:
I can stomach abstract and confusing, but this also smelled like a violation of the law of conservation of expected evidence. An excerpt, from fifteen years ago:
So why does he seem to be saying something different now? Stop reading here if you want to figure this out for yourself first.
Proof of Eliezer's claim.
Long story short: this way of reasoning is in fact a refined application of CoEE, not a violation; it is a way of bringing yourself more tightly into accord with the law. Your prior "should" be the expected posterior. If it isn't, something (the prior) needs to change.
It might help to see the (fairly trivial) math actually say "make your confidence higher than before".
Setup
Let's take:
H:= some hypothesis you're reasoning about. Like "There is a substantial difference between brain machinery and AGI machinery".
E := {w ∈ worlds | learning that I'm in world w would move me closer to H}.
or, expressed more mathematically:
E := {w ∈ worlds | P(H|{w}) ≥ P(H)}.
Lastly, recall that conservation of expected evidence (which is a simple tautology) says:
P(H) = P(H|E)⋅P(E) + P(H|¬E)⋅P(¬E).
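Since CoEE is a tautology, it can be verified mechanically on any toy distribution. A minimal sketch, with assumed world probabilities:

```python
# Toy sample space: four worlds with assumed probabilities, plus whether
# H holds in each world. E can be any event; CoEE must hold identically.
worlds = {
    "w1": {"p": 0.1, "H": True},
    "w2": {"p": 0.2, "H": False},
    "w3": {"p": 0.3, "H": True},
    "w4": {"p": 0.4, "H": False},
}
E = {"w1", "w3", "w4"}  # an arbitrary event

def prob(event):
    return sum(worlds[w]["p"] for w in event)

def prob_H_given(event):
    return sum(worlds[w]["p"] for w in event if worlds[w]["H"]) / prob(event)

not_E = set(worlds) - E
lhs = prob_H_given(set(worlds))  # P(H)
rhs = prob_H_given(E) * prob(E) + prob_H_given(not_E) * prob(not_E)
assert abs(lhs - rhs) < 1e-9     # P(H) = P(H|E)P(E) + P(H|¬E)P(¬E)
```

Any other choice of E or of the world probabilities gives the same equality, which is exactly what "tautology" means here.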
Proof.
Suppose Eliezer's conditional is true: "learning almost any specific fact would move me in the same direction."
Notice how E above is constructed to talk about exactly this: it is the set of worlds that would move you in the same direction, towards H. This conditional is effectively saying P(E) is close to 1.
That means CoEE reduces to P(H) ≈ P(H|E).
Answer. You'd still have to decompose similarly. Either way, I find CoEE more dynamically intuitive, as a balance between my future possible states of mind. Ultimately, they're both trivial.
Now this "equation" is intentionally (but carefully) conforming with the law, not an equation that's just always conveniently true about your actual reasoning methodology. You merely wish it were true. In other words, it's more like a variable assignment than an equation:
P(H)←P(H|E).
Since the singletons {w} for w ∈ E are pairwise disjoint, we can rewrite the RHS:
P(H|E) = ∑w∈E P(H|{w}) ⋅ P({w})/P(E).
But for each such w ∈ E, we have P(H|{w}) ≥ P(H) by definition of E. Thus:
P(H|E) = ∑w∈E P(H|{w}) ⋅ P({w})/P(E) ≥ ∑w∈E P(H) ⋅ P({w})/P(E) = P(H) ⋅ P(E)/P(E) = P(H).
Thus, your prior weight on H should be assigned something higher than it was before.
(You could have ignored the P(E) term by assumption, but I avoided doing that so that you don't have to wonder about some funny business about approximations and such. As you can see, it cancels out regardless.)
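The whole derivation can also be checked numerically. A minimal sketch with assumed numbers, constructing E exactly as defined above and confirming that P(E) is close to 1 while P(H|E) exceeds P(H):

```python
# Assumed toy numbers: four worlds, each with its probability and with the
# posterior P(H|{w}) that observing that world would induce.
p_world     = {"w1": 0.5,  "w2": 0.3,  "w3": 0.15, "w4": 0.05}
p_H_given_w = {"w1": 0.65, "w2": 0.70, "w3": 0.65, "w4": 0.10}

# Prior via the law of total probability.
p_H = sum(p_world[w] * p_H_given_w[w] for w in p_world)

# E as defined in the Setup: the worlds that move you toward H.
E = [w for w in p_world if p_H_given_w[w] >= p_H]
p_E = sum(p_world[w] for w in E)
p_H_given_E = sum(p_world[w] * p_H_given_w[w] for w in E) / p_E

assert p_E > 0.9          # "almost any" observation lands in E
assert p_H_given_E > p_H  # so the prior should have been higher
```

With these numbers, 95% of the probability mass moves you toward H, and the posterior conditional on E is strictly above the prior, as the proof guarantees.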
Conclusion
If you believe the conditional but haven't (almost) equalized your prior with P(H|E), then update now to match, before a Dutch gentleman finds you.
Examples?
This is a nice (albeit tautological) result, but I don't have any realistic examples. How do you end up in this situation? I'm not even sure everyone would be convinced by the one Eliezer gives:
It is perhaps arguable that in most worlds we'd find the machinery is in fact quite similar where it matters. Personally I'd be surprised by this, but a good example ought to be completely incontrovertible and at least somewhat non-trivial.
Help me out with some?