No, this kind of factorization is used for any probabilistic graphical model (PGM), whether or not it is causal. The difference is that for a causal model an arc from node x to node y additionally indicates that x has a causal influence on y, whereas there is no such assumption in general for PGMs.
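For concreteness, here is a minimal sketch of that shared factorization property, using a hypothetical three-node chain A → B → C (all numbers made up): the joint distribution factors as P(a)·P(b|a)·P(c|b), and that factorization supports inference whether or not the arcs are read causally.

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C (all numbers made up).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p_c_given_b[b][c]

def joint(a, b, c):
    # The PGM factorization: each node conditioned only on its parents.
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The factorization defines a valid joint distribution...
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
print(round(total, 10))   # 1.0

# ...from which any marginal or conditional can be computed,
# with no causal reading of the arcs required.
p_c1 = sum(joint(a, b, 1) for a, b in product([0, 1], repeat=2))
print(round(p_c1, 4))     # 0.3
```

Nothing in this computation cares whether A actually causes B; the causal reading is an extra assumption layered on top of the same factorization.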
This example is flawed because the analysis does not condition on all the information you have. The analysis assumes that
P(toxoplasmosis | pet cat, B) > P(toxoplasmosis | not pet cat, B)
where B is your background information. Why should this be so? If B is the background information of an outside observer who does not have access to your inner thoughts and feelings, then this is a plausible claim, because petting or not petting the cat provides evidence as to how fond you are of cats.
But you already know how fond you are of cats. Even if your introspect...
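The screening-off claim above can be sketched numerically (all numbers hypothetical). Suppose infection T influences fondness F, which influences petting A, so the chain is T → F → A. For an outside observer, petting is evidence of infection; for someone who already knows their own fondness, it is not:

```python
from itertools import product

# Hypothetical chain T (infected) -> F (fond of cats) -> A (pets the cat).
p_t = {1: 0.2, 0: 0.8}
p_f_given_t = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.4, 0: 0.6}}   # p_f_given_t[t][f]
p_a_given_f = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # p_a_given_f[f][a]

def joint(t, f, a):
    return p_t[t] * p_f_given_t[t][f] * p_a_given_f[f][a]

def cond(pred_num, pred_den):
    """P(numerator event | denominator event), by brute-force enumeration."""
    num = sum(joint(t, f, a) for t, f, a in product([0, 1], repeat=3) if pred_num(t, f, a))
    den = sum(joint(t, f, a) for t, f, a in product([0, 1], repeat=3) if pred_den(t, f, a))
    return num / den

# Outside observer's B (no access to F): petting is evidence of infection.
p_t1_given_pet   = cond(lambda t, f, a: t == 1 and a == 1, lambda t, f, a: a == 1)
p_t1_given_nopet = cond(lambda t, f, a: t == 1 and a == 0, lambda t, f, a: a == 0)
print(p_t1_given_pet, p_t1_given_nopet)   # ~0.284 vs ~0.103

# Your B includes F (say F = 1): petting then adds no evidence at all.
pet   = cond(lambda t, f, a: t == 1 and a == 1 and f == 1, lambda t, f, a: a == 1 and f == 1)
nopet = cond(lambda t, f, a: t == 1 and a == 0 and f == 1, lambda t, f, a: a == 0 and f == 1)
print(pet, nopet)   # both ~1/3: F screens off A from T
```

Once F is in the background information, A is conditionally independent of T, which is exactly why the flawed analysis only looks plausible from the outside observer's viewpoint.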
No, the difference between the two sentences lies entirely in the background information assumed. The first sentence implicitly assumes background information B that includes the fact that someone did, in fact, shoot JFK. The second sentence implicitly assumes that we have some sort of structural equation model (as discussed in Pearl's book Causality) from which we can show that JFK must have been shot -- even if we exclude from our background information all events occurring on or after November 22, 1963.
This claim is wrong, and the formula is correct. The formula shown is just a special case of the standard formula for the expected value of any random variable.
What is true is that the decision rule of maximizing expected utility can go wrong if you don't condition on all the relevant information when computing expected utility. What the author has written as
P(o_i | a_x) and E[U | a_x]
should actually be written as
P(o_i | a_x, B) and E[U | a_x, B]
where B is all the background information relevant to the problem. Problems with maximizing expected utility arise when B fails to include relevant causal information.
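The corrected notation is the ordinary conditional-expectation formula, written out with B made explicit:

E[U | a_x, B] = Σ_i P(o_i | a_x, B) · U(o_i)

and the failure mode is using E[U | a_x] (B omitted) when B carries relevant causal or screening-off information.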
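A toy numeric sketch of this (all probabilities and utilities hypothetical), reusing the toxoplasmosis chain T → F → A from the earlier comment, with infection costing 100 utils and petting worth 1: the two expectations E[U | a_x] and E[U | a_x, B] recommend opposite actions.

```python
from itertools import product

# Hypothetical chain T (infected) -> F (fond of cats) -> A (pets the cat).
p_t = {1: 0.2, 0: 0.8}
p_f_given_t = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.4, 0: 0.6}}
p_a_given_f = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}

def joint(t, f, a):
    return p_t[t] * p_f_given_t[t][f] * p_a_given_f[f][a]

def utility(t, a):
    # Infection is very bad; petting the cat is mildly pleasant.
    return (-100 if t == 1 else 0) + (1 if a == 1 else 0)

def expected_utility(action, known_f=None):
    """E[U | a_x, B], where B is knowledge of F (nothing, if known_f is None)."""
    worlds = [(t, f) for t, f in product([0, 1], repeat=2)
              if known_f is None or f == known_f]
    z = sum(joint(t, f, action) for t, f in worlds)   # P(a_x, B)
    return sum(joint(t, f, action) * utility(t, action) for t, f in worlds) / z

# B omitted: petting looks bad, because it is "evidence" of infection.
eu_pet_no_b, eu_nopet_no_b = expected_utility(1), expected_utility(0)
print(eu_pet_no_b, eu_nopet_no_b)   # pet < not-pet: says don't pet

# B includes your fondness (F = 1): petting is now the better action.
eu_pet_b, eu_nopet_b = expected_utility(1, known_f=1), expected_utility(0, known_f=1)
print(eu_pet_b, eu_nopet_b)         # pet > not-pet: says pet
```

The decision rule "maximize expected utility" is the same in both runs; only the conditioning information B differs, which is the point of the comment.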
Probability theory already tells you how to define that. There are no degrees of freedom left once you've defined the outcome and conditioning information. The only possible area of debate is over what specific information we are conditioning on. If two different analyses get different answers, and they're both using probability theory correctly, then they must be conditioning on DIFFERENT information.
This strikes me as pretty shaky reasoning. You've been talking about cases where you have access to the actual decision-making code of the agents you're interacting with, and they have access to yours, and therefore can prove some sort of optimality of a decision-making algorithm. When it comes to voting, none of that applies. We don't even have access to our own decision-making algorithm, much less those of others.
This "do" notation may seem mysterious, as it is not part of standard probability theory. However, as Pearl shows in Section 3.2.2 ("Interventions as Variables") of Causality, second edition, all of this can be treated as a notational convenience, as a causal model can be reduced to a certain kind of PGM with variables for interventions, and the "do" notation (or lack of a "do") can be considered to be a statement about the value of an intervention variable.
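A minimal sketch of that reduction (hypothetical numbers), for a confounded model Z → X, Z → Y, X → Y: augment the PGM with an intervention variable I_X whose value 'idle' lets X's normal mechanism run, and whose value ('do', v) clamps X to v. Then P(y | do(x)) is just an ordinary conditional probability, P(y | I_X = ('do', x)), in the augmented model.

```python
from itertools import product

# Hypothetical confounded model: Z -> X, Z -> Y, X -> Y.
p_z = {1: 0.5, 0: 0.5}

def p_x_given(z, x, i):
    """Mechanism for X in the augmented model: I_X = 'idle' runs the
    normal mechanism; I_X = ('do', v) overrides it and clamps X to v."""
    if i == 'idle':
        base = 0.8 if z == 1 else 0.2          # normal P(X=1 | z)
        return base if x == 1 else 1 - base
    return 1.0 if x == i[1] else 0.0

def p_y_given(x, z, y):
    py1 = 0.2 + 0.5 * x + 0.2 * z              # P(Y=1 | x, z)
    return py1 if y == 1 else 1 - py1

def p_y1_given_intervention(i):
    """P(Y=1 | I_X = i), by enumeration over the augmented PGM."""
    num = den = 0.0
    for z, x, y in product([0, 1], repeat=3):
        w = p_z[z] * p_x_given(z, x, i) * p_y_given(x, z, y)
        den += w
        if y == 1:
            num += w
    return num / den

def p_y1_given_x1_observed():
    """Ordinary P(Y=1 | X=1) with I_X = 'idle' (seeing, not doing)."""
    num = den = 0.0
    for z, y in product([0, 1], repeat=2):
        w = p_z[z] * p_x_given(z, 1, 'idle') * p_y_given(1, z, y)
        den += w
        if y == 1:
            num += w
    return num / den

seen = p_y1_given_x1_observed()            # P(Y=1 | X=1)       ~= 0.86
done = p_y1_given_intervention(('do', 1))  # P(Y=1 | do(X=1))   ~= 0.80
print(seen, done)
```

The two numbers differ because Z confounds X and Y: observing X = 1 is evidence about Z, while clamping X severs that link. Yet both are computed with nothing but ordinary conditioning in the augmented PGM, which is Pearl's point.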
See also this PowerPoint deck.