Counterfactuals, thick and thin

Dacyn, 7y

The question "how would the coin have landed if I had guessed tails?" seems to me like a reasonably well-defined physical question about how accurately you can flip a coin without having the result be affected by random noise such as someone saying "heads" or "tails" (as well as quantum fluctuations). It's not clear to me what the answer to this question is, though I would guess that the coin's counterfactual probability of landing heads is somewhere strictly between 0% and 50%.

Nisan, 7y

Oh, interesting. Would your interpretation be different if the guess occurred well after the coinflip (but before we get to see the coinflip)?

Dacyn, 7y

Sure, in that case there is a 0% counterfactual chance of heads: your words aren't going to flip the coin.

Nisan, 7y

Ok. I think that's the way I should have written it, then.

Shmi, 7y

I agree that it is a well-defined question, though not easily answered without knowing how guessing physically affects flipping the coin, reading the results (humans are notoriously prone to making mistakes like that) and so on. But I suspect that Nisan is asking something else, though I am not quite sure what. The post says:

In real life, we have a causal model of the world that tells us that the first counterfactual is correct. But we don't have anything like that for logical uncertainty; the best we have is logical induction, which just gives us a joint distribution.

I am not sure how physical uncertainty is different from logical uncertainty. Maybe there are some standard examples that could help the uninitiated like myself.

Vaniver, 7y

If we have an ordering over logical sentences, such that we can look at two sentences and determine at most one of (A is simpler than B), (B is simpler than A), then it seems natural to privilege the counterfactual that keeps the simpler term constant (and likely that this ordering is such that you never have to choose between counterfactuals at the same level of simplicity).

This doesn't fully solve the problem--now I have a concept of thickness that's predicated on an ordering, and the ordering is (in some sense) arbitrary for the reasons noted elsewhere (I could define B = A xor C as the ground term, which makes A = B xor C now a composite term). But it seems (to me) like the important thing is being able to build a model that doesn't allow cyclical behavior at all. Afterwards, one can check to see whether or not the ordering matters (and if so, try to figure out the criteria that make for a good ordering), or view it as arbitrary in approximately the way that axiom sets are arbitrary.
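To make the ordering idea concrete, here is a minimal sketch. It assumes three bits tied together by A xor B xor C = 0 (so B = A xor C), and a hypothetical `ordering` list running from simplest to most composite; the function name and data layout are illustrative, not from the post. When one variable is changed, the simpler of the remaining two is held fixed and the other is recomputed.

```python
def counterfactual(world, target, new_value, ordering):
    """Counterfactual world after setting `target` to `new_value`.

    world: dict of bits for "A", "B", "C" satisfying A ^ B ^ C == 0.
    ordering: variable names from simplest to most composite (an assumption).
    """
    others = [name for name in ordering if name != target]
    held, recomputed = others[0], others[1]  # hold the simpler one fixed
    result = dict(world)
    result[target] = new_value
    # The xor constraint determines the third bit from the other two.
    result[recomputed] = new_value ^ world[held]
    return result


world = {"A": 0, "B": 1, "C": 1}

# Treating C as simpler than B keeps C fixed and flips B.
print(counterfactual(world, "A", 1, ["A", "C", "B"]))

# Treating B as simpler than C keeps B fixed and flips C.
print(counterfactual(world, "A", 1, ["A", "B", "C"]))
```

The two calls give different answers for the same observed world, which is the sense in which the choice of ordering matters here.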

Chris_Leong, 7y

Thanks for this post; it's really helpful. I would really like to understand the maths in it. Is there anywhere that describes this in more detail? In particular, I can't follow:

Why are probabilities being permuted?
What kind of kernel are you referring to?

Nisan, 7y

The definition involving the permutation is a generalization of the example earlier in the post: $\phi(T)$ is the identity and $\phi(H)$ swaps heads and tails. And $X = \phi(A)^{-1}(C)$. In general, if you observe $A = a$ and $C = c$, then the counterfactual statement is that if you had observed $A = a'$, then you would have also observed $C = \phi(a')(\phi(a)^{-1}(c))$.
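As a minimal sketch of that formula, assuming outcomes are the strings "H" and "T" and each permutation is stored as a dict (the names `PHI`, `invert`, and `counterfactual` are illustrative, not from the post):

```python
IDENTITY = {"H": "H", "T": "T"}
SWAP = {"H": "T", "T": "H"}

# phi(T) is the identity and phi(H) swaps heads and tails.
PHI = {"T": IDENTITY, "H": SWAP}


def invert(perm):
    """Inverse of a permutation stored as a dict."""
    return {image: source for source, image in perm.items()}


def counterfactual(a, c, a_prime, phi=PHI):
    """Value of C under A = a_prime, given the observation A = a, C = c."""
    x = invert(phi[a])[c]   # X = phi(a)^{-1}(c), the part held fixed
    return phi[a_prime][x]  # phi(a')(phi(a)^{-1}(c))


# Observed A = H, C = T; under this phi, A = T would have gone with C = H.
print(counterfactual("H", "T", "T"))  # H

# If phi assigns the identity to every value of A, C is simply held fixed.
CONSTANT_PHI = {"T": IDENTITY, "H": IDENTITY}
print(counterfactual("H", "T", "T", phi=CONSTANT_PHI))  # T
```

Setting `a_prime` equal to `a` always returns the observed `c`, since the permutation and its inverse cancel.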

I just learned about probability kernels thanks to user Diffractor. I might be using them wrong.

SarahNibs, 7y

I don't know enough math to understand whether you've covered this in your examples, but here's my intuition in the form of typing without a lot of reflection or editing okay disclaimer over:

If we have two variables, A and C, and we're considering A, C, and (A xor C), it sounds to me like we've privileged things arbitrarily in some sense... relabeling them A, B, and C it's clear that we could have pivoted to consider any two of them the "base" variables and the third the "xor'd" variable, so there should be no preferred counterfactual. It's a loopy cause, a causal diagram that's not a DAG. Which doesn't show up IRL. Like going back in time to kill grandpa.

But we often pretend they occur by abstracting time and saying steady-state is a thing (or steady-states, and we're looking at the map of transitions) and then we get loops and start studying feedback and whatnot. But if you unpacked any of those loops you'd get a very-very-repetitive DAG that looks a lot like the initial diagram copied over and over with one-way arrows from copy to copy.
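A rough sketch of that unpacking, assuming the loopy diagram is given as a set of (source, destination) edges and that each edge takes exactly one time step (both are assumptions made for illustration; `unroll` is a hypothetical helper):

```python
def unroll(edges, steps):
    """Copy a possibly cyclic graph across time steps.

    Each edge (src, dst) becomes ((src, t), (dst, t + 1)), so every arrow
    points strictly forward in time and the result cannot contain a cycle.
    """
    return [
        ((src, t), (dst, t + 1))
        for t in range(steps)
        for (src, dst) in sorted(edges)
    ]


# A two-node feedback loop: x influences y and y influences x.
loop = {("x", "y"), ("y", "x")}
for edge in unroll(loop, 3):
    print(edge)
```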

Seems like there are three options to deal with {A,B,C}. They are isomorphic to each other, so in some sense we shouldn't be able to say which counterfactuals to use. We could:

do our modeling relative to a specified imposed ordering of all variables, which seems really hard, or
somehow calculate all possible results and average over permutations, which seems either factorially harder or much easier depending on Math!, or
assume there is hidden structure, that A, B, and C are abstractions atop a real DAG, and use a not-known-to-me mathematics of loopy causation to define something other than counterfactuals atop the variables, calling counterfactuals over A, B, C a sort of type error.
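The second option can be sketched by brute force. This assumes three bits with A xor B xor C = 0 and the rule "hold fixed whichever remaining variable comes first in the ordering"; both the rule and the function names are illustrative assumptions. It enumerates every ordering, so the cost grows factorially with the number of variables.

```python
from collections import Counter
from itertools import permutations


def counterfactual(world, target, new_value, ordering):
    """Set `target`, hold the earlier remaining variable fixed, recompute the other."""
    others = [name for name in ordering if name != target]
    held, recomputed = others[0], others[1]
    result = dict(world)
    result[target] = new_value
    result[recomputed] = new_value ^ world[held]  # keeps A ^ B ^ C == 0
    return result


def averaged_counterfactual(world, target, new_value):
    """Weight each counterfactual world by the fraction of orderings producing it."""
    orderings = list(permutations(world))  # all orderings of the variable names
    counts = Counter()
    for ordering in orderings:
        result = counterfactual(world, target, new_value, ordering)
        counts[tuple(sorted(result.items()))] += 1
    return {outcome: n / len(orderings) for outcome, n in counts.items()}


world = {"A": 0, "B": 1, "C": 1}
for outcome, weight in averaged_counterfactual(world, "A", 1).items():
    print(dict(outcome), weight)
```

In this symmetric example the two candidate counterfactual worlds each get weight 0.5, so averaging expresses the lack of a preferred counterfactual rather than resolving it.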

Chris_Leong, 7y

I'm not following what you're saying about loopy causation. How are you constructing this graph?

Nisan, 7y

That sounds about right to me. I think people have taken stabs at looking for causality-like structure in logic, but they haven't found anything useful.
