Speaking personally, I think something like #1 is true on the grounds that I have seen many cases of white Australian people, often with considerable power, acting in excessively patronising and authoritarian ways towards Aboriginal people and I have no difficulty believing that similar things happen in the US.
However, I also do not think that racial disparities in outcomes are almost all caused by #1; in fact I think that probably less than 50% of almost any particular disparity is caused by #1. Thus, I think that outcome disparities are at best weak evidence for #1. Many people (notably Ibram X Kendi) say that in fact they are. I actually believe that the theory underlying this claim causes some of the authoritarian behaviour I observe. I think people reason something like this:
- We don't want to be racist
- Differences in outcome indicate racism
- We must eliminate differences in outcome
- Eliminating differences in outcome requires substantial behavioural changes on the part of Aboriginal people
- Authoritarian strategies are the most reliable way we have to induce substantial behavioural changes
I think that overly authoritarian policy is often harmful.
I don't know if DiAngelo endorses this claim - that outcome disparities are almost all caused by #1 - but claims like "being white is to know privilege" make me suspect that to some extent she is also reasoning backwards from outcome disparities to the existence of racismS. I think this is a big mistake!
I also think, with less confidence, that DiAngelo is not really popularising this theory but is rather explaining a theory that is already popular. Perhaps many people, like myself, think that this theory is flawed and that it is unfortunate that it is so popular. However, I suspect that they are making a mistake blaming DiAngelo for this. Criticism of her book could be a stand-in for criticism of this theory in general.
Maybe taking it further, I think that it's possible that reasoning backwards from outcome disparities to racismS yields a flawed theory of what racismS is, because it's a flawed inference to begin with. This might be why many people take issue with racismS rather than the premise (outcome disparities -> racism), even though my best guess is that the premise comes first.
I think the motivation for the representability of some sets of conditional independences with a DAG is pretty clear, because people already use probability distributions all the time, they sometimes have conditional independences and visuals are nice. On the other hand, the fundamental theorem relates orthogonality to independences in a family of distributions generated in a particular way. Neither of these things is a natural property of probability distributions in the way that conditional independence is. If I am using probability distributions, it seems to me I'd rather avoid introducing them if I can. Even if the reasons are mysterious, it might be useful to work with models of this type - I was just wondering if there were reasons for doing so that are apparent before you derive any useful results.
Alternatively, is it plausible that you could derive the same results just using probability + whatever else you need anyway? For example, you could perhaps define X to be prior to Y if, relative to some ordering of functions by "naturalness", there is a more natural f(X,Y) such that X⊥f(X,Y) and X⊥/f(X,Y)|Y than any g(X,Y) such that Y⊥g(X,Y) etc. I have no idea if that actually works! However, I'm pretty sure you'll need something like a naturalness ordering in order to separate "true orthogonality" from "merely apparent orthogonality", which is why I think it's fair to posit it as an element of "whatever else you need anyway". Maybe not.
I don't understand the motivation behind the fundamental theorem, and I'm wondering if you could say a bit more about it. In particular, it suggests that if I want to propose a family of probability distributions that "represent" observations somehow ("represents" maybe means in the sense of Bayesian predictions or in the sense of frequentist limits), I want to also consider this family to arise from some underlying mutually independent family along with some function. I'm not sure why I would want to propose an underlying family and a function at all, and even if I do I'm not sure why I want to suppose it is mutually independent.
One thought I had is that maybe this underlying family of distributions on S is supposed to represent "interventions". The reasoning would be something like: there is some set of things that fully control my observations that I can control independently and which also vary independently on their own. I don't find this convincing, though - I don't see why independent controllability should imply independent variation.
Another thought I had was that maybe it arises from some kind of maximum entropy argument, but it's not clear why I would want to maximise the entropy of a distribution on some S for every possible configuration of marginals.
Also, do you know how your model relates to structural equation models with hidden variables? Your factored set S looks a lot like a set of independent "noises", and the function f:S->Y looks a lot like a set of structural equations, and I think it's straightforward to introduce hidden variables as needed to account for any lossiness. In particular, given a model and a compatible orthogonality database, I can construct an SEM by taking all the variables that appear in the database and defining the structural equation for X to be X:=X∘f. However, I think the set of all SEMs that are compatible with a given orthogonality database is larger than the set of all FFS models that are similarly compatible. This is because SEMs (in the Pearlian sense) can be distinct even if they have "apparently identical" structural equations. For example, X:=1, Y:=X and X:=1, Y:=1 are distinct because interventions on X will have different results, while my understanding is that they will correspond to the same FFS model.
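To make the last example concrete, here is a minimal sketch of the two SEMs. The representation is my own assumption for illustration (an ordered list of equations, and a hypothetical helper `run_sem` that overrides any intervened variable); it is not taken from any particular causal inference library.

```python
def run_sem(equations, interventions=None):
    """Evaluate structural equations in order.

    `equations` is a list of (name, function) pairs, where each function
    maps the values computed so far to the value of that variable.
    `interventions` maps variable names to forced values (a do-operation).
    This helper is hypothetical and exists only for this illustration.
    """
    interventions = interventions or {}
    values = {}
    for name, equation in equations:
        if name in interventions:
            # An intervention replaces the structural equation entirely.
            values[name] = interventions[name]
        else:
            values[name] = equation(values)
    return values


# SEM A: X := 1, Y := X
sem_a = [("X", lambda v: 1), ("Y", lambda v: v["X"])]
# SEM B: X := 1, Y := 1
sem_b = [("X", lambda v: 1), ("Y", lambda v: 1)]

# Without intervention the two models produce the same values.
obs_a = run_sem(sem_a)
obs_b = run_sem(sem_b)

# Under do(X = 0) they come apart: Y follows X in A but not in B.
do_a = run_sem(sem_a, {"X": 0})
do_b = run_sem(sem_b, {"X": 0})

print(obs_a, obs_b)
print(do_a, do_b)
```

The observational values agree, so nothing about the joint behaviour of X and Y separates the models; only the response to the intervention does.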
Your result 2e looks interestingly similar to the DAG result that says X⊥Z and X⊥/Z|Y implies something like X→Y←Z (where ⊥ is d-separation). In fact, I think it extends existing graph learning algorithms: in addition to checking independences among the variables as given, you can look for independences between any functions of the given variables. This seems like it would give you many more arrows than usual, though I imagine it would also increase your risk of spurious independences. In fact, I think this connects to approaches to causal identification like regression with subsequent independence test: if X is independent of Y−E[Y|X], we prefer X→Y, and if Y is independent of X−E[X|Y], we prefer Y→X.
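Here is a rough sketch of that last idea, under several assumptions of my own: the data are simulated from Y = X + noise with uniform X and uniform noise, E[Y|X] and E[X|Y] are estimated by ordinary least squares, and "independence" is replaced by a crude dependence score (the absolute correlation between the squared centred regressor and the squared residual). A real implementation would use a proper regression method and a proper independence test; all function names below are made up for this illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def linear_residuals(target, regressor):
    """Residuals of a least-squares line fit of target on regressor.

    Assumption: a straight line stands in for the conditional expectation.
    """
    mt, mr = mean(target), mean(regressor)
    slope = sum((r - mr) * (t - mt) for r, t in zip(regressor, target)) / sum(
        (r - mr) ** 2 for r in regressor
    )
    intercept = mt - slope * mr
    return [t - (intercept + slope * r) for r, t in zip(regressor, target)]


def dependence_score(regressor, residuals):
    """Crude stand-in for an independence test (not a real test).

    Residuals are always uncorrelated with the regressor after least squares,
    so we look at the squares instead; a score near 0 is consistent with
    independence, a larger score suggests dependence.
    """
    mr = mean(regressor)
    return abs(
        correlation([(r - mr) ** 2 for r in regressor], [e ** 2 for e in residuals])
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [xi + rng.uniform(-1, 1) for xi in x]  # simulated ground truth: X -> Y

score_x_to_y = dependence_score(x, linear_residuals(y, x))
score_y_to_x = dependence_score(y, linear_residuals(x, y))
preferred = "X -> Y" if score_x_to_y < score_y_to_x else "Y -> X"

print(score_x_to_y, score_y_to_x, preferred)
```

In this simulated setup the residual of Y given X is just the noise, so its score should be close to zero, while the residual of X given Y is visibly dependent on Y even though it is uncorrelated with it. With Gaussian X and Gaussian noise the two directions would look the same, which is the usual caveat for this kind of method.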