Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

A problem that's come up with my definitions of stratification.

Consider a very simple causal graph:

$X \rightarrow Y$.

In this setting, $X$ and $Y$ are both booleans, and $Y=1$ with probability $1/2$ (independently of whether $X=0$ or $X=1$).

Suppose I now want to compute a counterfactual: suppose I assume that $Y=0$ when $X=0$. What would happen if $X=1$ instead?

The problem is that $P(Y \mid X)$ seems insufficient to solve this. Let's imagine the process that outputs $Y$ as a probabilistic mix of functions, each taking the value of $X$ and outputting that of $Y$. There are four natural functions here: the constant functions $f_0$ ($Y=0$ always) and $f_1$ ($Y=1$ always), the identity $f_=$ ($Y=X$), and the negation $f_{\neq}$ ($Y=1-X$).

Then one way of modelling the causal graph is as the mix $\frac{1}{2}f_0 + \frac{1}{2}f_1$. In that case, knowing that $Y=0$ when $X=0$ implies that the function is $f_0$, so if $X=1$, we know that $Y=0$.

But we could instead model the causal graph as the uniform mix $\frac{1}{4}(f_0+f_1+f_=+f_{\neq})$. In that case, knowing that $Y=0$ when $X=0$ implies that $f_0$ and $f_=$ are equally likely. So if $X=1$, $Y=0$ with probability $1/2$ and $Y=1$ with probability $1/2$.
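To make the discrepancy concrete, here is a minimal sketch in Python (the helper counterfactual and the names model_a and model_b are illustrative, not from the post): it conditions each mixture on the observation $Y=0$ when $X=0$ and reads off the counterfactual distribution of $Y$ at $X=1$.

```python
# The four boolean functions from X to Y considered above.
FUNCS = {
    "f0":   lambda x: 0,      # Y = 0 regardless of X
    "f1":   lambda x: 1,      # Y = 1 regardless of X
    "feq":  lambda x: x,      # Y = X
    "fneq": lambda x: 1 - x,  # Y = 1 - X
}

def counterfactual(prior, x_obs=0, y_obs=0, x_cf=1):
    """Condition the prior over functions on f(x_obs) == y_obs,
    then return the induced distribution of f(x_cf)."""
    posterior = {f: w for f, w in prior.items() if FUNCS[f](x_obs) == y_obs}
    total = sum(posterior.values())
    dist = {0: 0.0, 1: 0.0}
    for f, w in posterior.items():
        dist[FUNCS[f](x_cf)] += w / total
    return dist

model_a = {"f0": 0.5, "f1": 0.5}        # first mixture
model_b = {f: 0.25 for f in FUNCS}      # uniform mixture over all four

print(counterfactual(model_a))  # {0: 1.0, 1: 0.0} -- Y = 0 for sure
print(counterfactual(model_b))  # {0: 0.5, 1: 0.5}
```

Both priors reproduce the same observational distribution $P(Y=1 \mid X=x)=1/2$ for both values of $x$, yet they disagree about the counterfactual.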

And we can design the node $Y$, physically, to be one or another of the two distributions over functions, or anything in between (the general formula is $\frac{q}{2}(f_0+f_1)+\frac{1-q}{2}(f_=+f_{\neq})$ for $q\in[0,1]$). But it seems that the causal graph does not capture that.
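To spell out the dependence on $q$ (a short derivation under the parameterization above): observing $Y=0$ when $X=0$ rules out $f_1$ and $f_{\neq}$, leaving posterior weights $P(f_0)=q$ and $P(f_=)=1-q$. Setting $X=1$ then gives $Y=1$ with probability exactly $1-q$, so the counterfactual ranges from certainly $0$ (at $q=1$) to certainly $1$ (at $q=0$), while $P(Y \mid X)$ stays the same fair coin throughout.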

Owain Evans has said that Pearl has papers covering these kinds of situations, but I haven't been able to find them. Does anyone know any publications on the subject?


2 comments

The problem is indeed that $P(Y \mid X)$ is insufficient to compute a unique counterfactual; additional causal information is needed. Pearl's approach is to specify each observable variable as a deterministic function of its parents in the causal graph. Any uncertainty must be represented by a set of "exogenous" variables $U$, which can feature in the functions for the observables. (See chapter 7 of Causality, or An Axiomatic Characterization of Causal Counterfactuals.)

For example, your first process could be represented by the causal model $Y := U$, where $U$ is an exogenous boolean with $P(U=0)=P(U=1)=1/2$, so that $Y$ ignores $X$ entirely.

The other processes might have different structures, equations, and distributions $P(U)$; it's not possible in general to distinguish these purely from the distribution $P(Y \mid X)$.
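As a sketch of how this plays out (assuming, as a reconstruction, that the second process from the post is the uniform mixture, represented as $Y := U_X$ with two independent exogenous bits $U_0, U_1$), Pearl's three-step abduction, action, prediction recipe from Causality ch. 7 can be run directly:

```python
import itertools

# Exogenous worlds: two independent fair bits (u0, u1), each world has prob 1/4.
WORLDS = [(u0, u1, 0.25) for u0, u1 in itertools.product((0, 1), repeat=2)]

def model_a(x, u0, u1):
    return u0           # first process:  Y := U0 (X is ignored)

def model_b(x, u0, u1):
    return (u0, u1)[x]  # second process: Y := U_X (my reconstruction)

def pearl_counterfactual(eq, x_obs, y_obs, x_cf):
    # Step 1 (abduction): condition P(U) on observing Y = y_obs when X = x_obs.
    worlds = [(u0, u1, p) for u0, u1, p in WORLDS if eq(x_obs, u0, u1) == y_obs]
    total = sum(p for _, _, p in worlds)
    # Steps 2-3 (action, prediction): set X = x_cf, recompute Y in each world.
    dist = {0: 0.0, 1: 0.0}
    for u0, u1, p in worlds:
        dist[eq(x_cf, u0, u1)] += p / total
    return dist

print(pearl_counterfactual(model_a, 0, 0, 1))  # {0: 1.0, 1: 0.0}
print(pearl_counterfactual(model_b, 0, 0, 1))  # {0: 0.5, 1: 0.5}
```

The two models are observationally identical, but they diverge at the abduction step, because their posteriors over $U$ carry different information into the counterfactual world.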

Thank you! That sentence is what I was looking for: "Any uncertainty must be represented by a set of 'exogenous' variables $U$."

I'd been doing that, but without any theoretical justification for it.