No, this kind of factorization is used for any probabilistic graphical model (PGM), whether or not it is causal. The difference is that for a causal model an arc from node x to node y additionally indicates that x has a causal influence on y, whereas there is no such assumption in general for PGMs.
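For concreteness, here is a minimal sketch of that shared factorization property, using a hypothetical three-node chain A → B → C (all numbers made up): the joint distribution factors as P(a)·P(b|a)·P(c|b), and that factorization supports inference whether or not the arcs are read causally.

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C (all numbers made up).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p_c_given_b[b][c]

def joint(a, b, c):
    # The PGM factorization: each node conditioned only on its parents.
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The factorization defines a valid joint distribution...
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
print(round(total, 10))   # 1.0

# ...from which any marginal or conditional can be computed,
# with no causal reading of the arcs required.
p_c1 = sum(joint(a, b, 1) for a, b in product([0, 1], repeat=2))
print(round(p_c1, 4))     # 0.3
```

Nothing in this computation cares whether A actually causes B; the causal reading is an extra assumption layered on top of the same factorization.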
This example is flawed because the analysis does not condition on all the information you have. The analysis assumes that
P(toxoplasmosis | pet cat, B) > P(toxoplasmosis | not pet cat, B)
where B is your background information. Why should this be so? If B is the background information of an outside observer who does not have access to your inner thoughts and feelings, then this is a plausible claim, because petting or not petting the cat provides evidence as to how fond you are of cats.
But you already know how fond you are of cats. Even if your introspect...
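The screening-off claim above can be sketched numerically (all numbers hypothetical). Suppose infection T influences fondness F, which influences petting A, so the chain is T → F → A. For an outside observer, petting is evidence of infection; for someone who already knows their own fondness, it is not:

```python
from itertools import product

# Hypothetical chain T (infected) -> F (fond of cats) -> A (pets the cat).
p_t = {1: 0.2, 0: 0.8}
p_f_given_t = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.4, 0: 0.6}}   # p_f_given_t[t][f]
p_a_given_f = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # p_a_given_f[f][a]

def joint(t, f, a):
    return p_t[t] * p_f_given_t[t][f] * p_a_given_f[f][a]

def cond(pred_num, pred_den):
    """P(numerator event | denominator event), by brute-force enumeration."""
    num = sum(joint(t, f, a) for t, f, a in product([0, 1], repeat=3) if pred_num(t, f, a))
    den = sum(joint(t, f, a) for t, f, a in product([0, 1], repeat=3) if pred_den(t, f, a))
    return num / den

# Outside observer's B (no access to F): petting is evidence of infection.
p_t1_given_pet   = cond(lambda t, f, a: t == 1 and a == 1, lambda t, f, a: a == 1)
p_t1_given_nopet = cond(lambda t, f, a: t == 1 and a == 0, lambda t, f, a: a == 0)
print(p_t1_given_pet, p_t1_given_nopet)   # ~0.284 vs ~0.103

# Your B includes F (say F = 1): petting then adds no evidence at all.
pet   = cond(lambda t, f, a: t == 1 and a == 1 and f == 1, lambda t, f, a: a == 1 and f == 1)
nopet = cond(lambda t, f, a: t == 1 and a == 0 and f == 1, lambda t, f, a: a == 0 and f == 1)
print(pet, nopet)   # both ~1/3: F screens off A from T
```

Once F is in the background information, A is conditionally independent of T, which is exactly why the flawed analysis only looks plausible from the outside observer's viewpoint.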
No, the difference between the two sentences lies entirely in the background information assumed. The first sentence implicitly assumes background information B that includes the fact that someone did, in fact, shoot JFK. The second sentence implicitly assumes that we have some sort of structural equation model (as discussed in Pearl's book Causality) from which we can show that JFK must have been shot -- even if we exclude from our background information all events occurring on or after November 22, 1963.
This claim is wrong, and the formula is correct. The formula shown is just a special case of the standard formula for the expected value of any random variable.
What is true is that the decision rule of maximizing expected utility can go wrong if you don't condition on all the relevant information when computing expected utility. What the author has written as
P(o_i | a_x) and E[U | a_x]
should actually be written as
P(o_i | a_x, B) and E[U | a_x, B]
where B is all the background information relevant to the problem. Problems with maximizing expected utility arise when B fails to include relevant causal information.
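The corrected notation is the ordinary conditional-expectation formula, written out with B made explicit:

E[U | a_x, B] = Σ_i P(o_i | a_x, B) · U(o_i)

and the failure mode is using E[U | a_x] (B omitted) when B carries relevant causal or screening-off information.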
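A toy numeric sketch of this (all probabilities and utilities hypothetical), reusing the toxoplasmosis chain T → F → A from the earlier comment, with infection costing 100 utils and petting worth 1: the two expectations E[U | a_x] and E[U | a_x, B] recommend opposite actions.

```python
from itertools import product

# Hypothetical chain T (infected) -> F (fond of cats) -> A (pets the cat).
p_t = {1: 0.2, 0: 0.8}
p_f_given_t = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.4, 0: 0.6}}
p_a_given_f = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}

def joint(t, f, a):
    return p_t[t] * p_f_given_t[t][f] * p_a_given_f[f][a]

def utility(t, a):
    # Infection is very bad; petting the cat is mildly pleasant.
    return (-100 if t == 1 else 0) + (1 if a == 1 else 0)

def expected_utility(action, known_f=None):
    """E[U | a_x, B], where B is knowledge of F (nothing, if known_f is None)."""
    worlds = [(t, f) for t, f in product([0, 1], repeat=2)
              if known_f is None or f == known_f]
    z = sum(joint(t, f, action) for t, f in worlds)   # P(a_x, B)
    return sum(joint(t, f, action) * utility(t, action) for t, f in worlds) / z

# B omitted: petting looks bad, because it is "evidence" of infection.
eu_pet_no_b, eu_nopet_no_b = expected_utility(1), expected_utility(0)
print(eu_pet_no_b, eu_nopet_no_b)   # pet < not-pet: says don't pet

# B includes your fondness (F = 1): petting is now the better action.
eu_pet_b, eu_nopet_b = expected_utility(1, known_f=1), expected_utility(0, known_f=1)
print(eu_pet_b, eu_nopet_b)         # pet > not-pet: says pet
```

The decision rule "maximize expected utility" is the same in both runs; only the conditioning information B differs, which is the point of the comment.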
Probability theory already tells you how to define that. There are no degrees of freedom left once you've defined the outcome and conditioning information. The only possible area of debate is over what specific information we are conditioning on. If two different analyses get different answers, and they're both using probability theory correctly, then they must be conditioning on DIFFERENT information.
This strikes me as pretty shaky reasoning. You've been talking about cases where you have access to the actual decision-making code of the agents you're interacting with, and they have access to yours, and therefore can prove some sort of optimality of a decision-making algorithm. When it comes to voting, none of that applies. We don't even have access to our own decision-making algorithm, much less those of others.
This "do" notation may seem mysterious, as it is not part of standard probability theory. However, as Pearl shows in Section 3.2.2 ("Interventions as Variables") of Causality, second edition, all of this can be treated as a notational convenience, as a causal model can be reduced to a certain kind of PGM with variables for interventions, and the "do" notation (or lack of a "do") can be considered to be a statement about the value of an intervention variable.
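A minimal sketch of that reduction (hypothetical numbers), for a confounded model Z → X, Z → Y, X → Y: augment the PGM with an intervention variable I_X whose value 'idle' lets X's normal mechanism run, and whose value ('do', v) clamps X to v. Then P(y | do(x)) is just an ordinary conditional probability, P(y | I_X = ('do', x)), in the augmented model.

```python
from itertools import product

# Hypothetical confounded model: Z -> X, Z -> Y, X -> Y.
p_z = {1: 0.5, 0: 0.5}

def p_x_given(z, x, i):
    """Mechanism for X in the augmented model: I_X = 'idle' runs the
    normal mechanism; I_X = ('do', v) overrides it and clamps X to v."""
    if i == 'idle':
        base = 0.8 if z == 1 else 0.2          # normal P(X=1 | z)
        return base if x == 1 else 1 - base
    return 1.0 if x == i[1] else 0.0

def p_y_given(x, z, y):
    py1 = 0.2 + 0.5 * x + 0.2 * z              # P(Y=1 | x, z)
    return py1 if y == 1 else 1 - py1

def p_y1_given_intervention(i):
    """P(Y=1 | I_X = i), by enumeration over the augmented PGM."""
    num = den = 0.0
    for z, x, y in product([0, 1], repeat=3):
        w = p_z[z] * p_x_given(z, x, i) * p_y_given(x, z, y)
        den += w
        if y == 1:
            num += w
    return num / den

def p_y1_given_x1_observed():
    """Ordinary P(Y=1 | X=1) with I_X = 'idle' (seeing, not doing)."""
    num = den = 0.0
    for z, y in product([0, 1], repeat=2):
        w = p_z[z] * p_x_given(z, 1, 'idle') * p_y_given(1, z, y)
        den += w
        if y == 1:
            num += w
    return num / den

seen = p_y1_given_x1_observed()            # P(Y=1 | X=1)       ~= 0.86
done = p_y1_given_intervention(('do', 1))  # P(Y=1 | do(X=1))   ~= 0.80
print(seen, done)
```

The two numbers differ because Z confounds X and Y: observing X = 1 is evidence about Z, while clamping X severs that link. Yet both are computed with nothing but ordinary conditioning in the augmented PGM, which is Pearl's point.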
See also this PowerPoint deck.