Expected evidence is conserved if almost anything you could observe would update you slightly in the same direction, with the much rarer cases updating you correspondingly much further in the opposite direction.
For what it's worth, I also read this as Eliezer reporting on a case where he later realised that his views were violating CoEE. This is equivalent to observing evidence, realising that your prior should have given it more credence, and then updating in the same direction you moved your prior. Sounds awful but can be reasonable for computationally bounded agents!
This sounds like you're describing questions whose answers you know some things about. There is a subtle but important difference between knowing the answer, versus knowing something about the answer. It sounds like "direction" is metadata about answers, not the answer itself. In programming, I would liken it to the difference between knowing what type a function will return, versus knowing what value it will return when presented with a particular input. The direction is like the type; the answer itself is like the value.
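The type-versus-value distinction can be made concrete with a short sketch (the function and inputs here are invented purely for illustration):

```python
def parse_age(text: str) -> int:
    """Toy function: the signature promises an int, whatever the input."""
    return int(text.strip())

# Before calling, we know the "direction" of the answer (its type: int),
# the same way we can know metadata about an answer without knowing it.
# Only after supplying a particular input do we learn the value itself.
answer = parse_age(" 42 ")
assert isinstance(answer, int)  # knowable in advance, from the signature
assert answer == 42             # knowable only after "observing" the call
```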
I often get the impression that Eliezer thinks more abstractly than I tend to, so I find it useful to construct several embarrassingly concrete examples which seem to fit what he's talking about and try to triangulate from them toward what he's actually discussing.
What's the most embarrassingly concrete, eli5 example that might highlight what "direction" is? Maybe we breed two organisms within the same species, and wonder what color the offspring will be. I'm reading "direction" as metadata about the answer, so the "direction" of the answer would be the space of possible colors for the species? I know that whatever color I observe on the offspring will be one of the possible colors for the species unless something went wrong -- if it comes out a brand new color, I was wrong about what colors were possible within that species, and if the parents don't reproduce successfully, I was wrong about the offspring existing at all.
Any instance of the planning fallacy is an example of this: any setback, regardless of its specifics, would make one expect the project to take longer. Yet by default people predict timelines as if the project will go smoothly, even though, if asked, they'd say they expect some setback or other.
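The planning-fallacy arithmetic can be sketched with made-up numbers: every positive-probability setback pushes the expected duration above the smooth-case estimate.

```python
# Made-up numbers: the project takes 10 days if absolutely nothing goes wrong.
smooth_days = 10
setbacks = {                      # setback -> (probability, extra days)
    "vendor delay":   (0.3, 5),
    "scope creep":    (0.4, 8),
    "key person ill": (0.1, 3),
}

# Each setback contributes p * extra days to the expectation, so the
# expected duration strictly exceeds the smooth-case estimate whenever
# any setback has positive probability.
expected_days = smooth_days + sum(p * extra for p, extra in setbacks.values())
print(expected_days)  # about 15.0 with these made-up numbers
```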
Eliezer says some confusing things in Biology Inspired Timelines:
A few lines down, he admits it might be tricky to get:
I can stomach abstract and confusing, but this also smelled like a violation of the law of conservation of expected evidence. An excerpt, from fifteen years ago:
So why does he seem to be saying something different now? Stop reading here if you want to figure this out for yourself first.
Proof of Eliezer's claim.
Long story short: this way of reasoning is in fact a refined application of CoEE, not a violation; it is a way of bringing yourself more tightly into accord with the law. Your prior "should" be the expected posterior. If it isn't, something (the prior) needs to change.
It might help to see the (fairly trivial) math actually say "make your confidence higher than before".
Setup
Let's take:
H:= some hypothesis you're reasoning about. Like "There is a substantial difference between brain machinery and AGI machinery".
E := {w ∈ worlds | learning that I'm in world w would move me closer to H}.
or, expressed more mathematically:
E := {w ∈ worlds | P(H|{w}) ≥ P(H)}.
Lastly, recall that conservation of expected evidence (which is a simple tautology) says:
P(H) = P(H|E)⋅P(E) + P(H|¬E)⋅P(¬E).
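Since CoEE is a tautology, it can be verified mechanically on any toy distribution. A minimal sketch, with assumed world probabilities:

```python
# Toy sample space: four worlds with assumed probabilities, plus whether
# H holds in each world. E can be any event; CoEE must hold identically.
worlds = {
    "w1": {"p": 0.1, "H": True},
    "w2": {"p": 0.2, "H": False},
    "w3": {"p": 0.3, "H": True},
    "w4": {"p": 0.4, "H": False},
}
E = {"w1", "w3", "w4"}  # an arbitrary event

def prob(event):
    return sum(worlds[w]["p"] for w in event)

def prob_H_given(event):
    return sum(worlds[w]["p"] for w in event if worlds[w]["H"]) / prob(event)

not_E = set(worlds) - E
lhs = prob_H_given(set(worlds))  # P(H)
rhs = prob_H_given(E) * prob(E) + prob_H_given(not_E) * prob(not_E)
assert abs(lhs - rhs) < 1e-9     # P(H) = P(H|E)P(E) + P(H|¬E)P(¬E)
```

Any other choice of E or of the world probabilities gives the same equality, which is exactly what "tautology" means here.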
Proof.
Suppose Eliezer's conditional is true: "learning almost any specific fact would move me in the same direction."
Notice how E above is constructed to talk about exactly this: it is the set of worlds that would move you in the same direction, towards H. This conditional is effectively saying P(E) is close to 1.
That means CoEE reduces to P(H) ≈ P(H|E).
Answer. You'd still have to decompose similarly. Either way, I find CoEE more dynamically intuitive, as a balance between my future possible states of mind. Ultimately, they're both trivial.
Now this "equation" is intentionally (but carefully) conforming with the law, not an equation that's just always conveniently true about your actual reasoning methodology. You merely wish it were true. In other words, it's more like a variable assignment than an equation:
P(H)←P(H|E).
Since the singletons {w} for w ∈ E are pairwise disjoint, we can rewrite the RHS:
P(H|E) = ∑w∈E P(H|{w}) ⋅ P({w})/P(E).
But for each such w ∈ E, we have P(H|{w}) ≥ P(H) by definition of E. Thus:
P(H|E) = ∑w∈E P(H|{w}) ⋅ P({w})/P(E) ≥ ∑w∈E P(H) ⋅ P({w})/P(E) = P(H) ⋅ P(E)/P(E) = P(H).
Thus, your prior weight on H should be assigned something higher than it was before.
(You could have ignored the P(E) term by assumption, but I avoided doing that so that you don't have to wonder about some funny business about approximations and such. As you can see, it cancels out regardless.)
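The whole derivation can also be checked numerically. A minimal sketch with assumed numbers, constructing E exactly as defined above and confirming that P(E) is close to 1 while P(H|E) exceeds P(H):

```python
# Assumed toy numbers: four worlds, each with its probability and with the
# posterior P(H|{w}) that observing that world would induce.
p_world     = {"w1": 0.5,  "w2": 0.3,  "w3": 0.15, "w4": 0.05}
p_H_given_w = {"w1": 0.65, "w2": 0.70, "w3": 0.65, "w4": 0.10}

# Prior via the law of total probability.
p_H = sum(p_world[w] * p_H_given_w[w] for w in p_world)

# E as defined in the Setup: the worlds that move you toward H.
E = [w for w in p_world if p_H_given_w[w] >= p_H]
p_E = sum(p_world[w] for w in E)
p_H_given_E = sum(p_world[w] * p_H_given_w[w] for w in E) / p_E

assert p_E > 0.9          # "almost any" observation lands in E
assert p_H_given_E > p_H  # so the prior should have been higher
```

With these numbers, 95% of the probability mass moves you toward H, and the posterior conditional on E is strictly above the prior, as the proof guarantees.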
Conclusion
If you believe the conditional but haven't (almost) equalized your prior with P(H|E), then update now to match, before a Dutch gentleman finds you.
Examples?
This is a nice (albeit tautological) result, but I don't have any realistic examples. How do you end up in this situation? I'm not even sure everyone would be convinced by the one Eliezer gives:
It is perhaps arguable that in most worlds we'd find the machinery is in fact quite similar where it matters. Personally I'd be surprised by this, but a good example ought to be completely incontrovertible and at least somewhat non-trivial.
Help me out with some?