So I see two possible interpretations of traditional Dutch books:
I disagree; I don't think it's a simple binary. It's not that Dutch book arguments in general never apply to recursive things; rather, the recursion needs to be modelled in some way, and since your OP didn't do that, I ended up finding the argument confusing.
The standard Dutch book arguments would apply to the imp. Why should you be in such a different position from the imp?
I don't think your argument goes through for the imp, since the imp never needs to decide its action, and therefore the second part, selling the contract back, never comes up?
For example, multiply the contract payoff by 0.001.
Hmm, on further reflection: I had an effect in mind which doesn't necessarily break your argument, but which increases the degree to which other counterarguments, such as AlexMennen's, break it. This effect isn't necessarily fixed by scaling down the contract payoff (since decisions aren't necessarily continuous as a function of utilities), but under many circumstances it may be approximately fixed by it. So maybe it doesn't matter much, at least until AlexMennen's points are addressed and I can see where this fits in with them.
This, again, seems plausible if the payoff is made sufficiently small.
How do you make the payoff small?
This is actually very similar to traditional Dutch-book arguments, which treat the bets as totally independent of everything.
Isn't your Dutch-book argument more recursive than standard ones? Your contract only pays out if you act, so the value of the Dutch book causally depends on the action you choose.
So the overall expectation is P(Act=a)⋅(cdt(a)−edt(a)).
Wouldn't it be P(Act=a|do(buy B)) rather than P(Act=a)? Like my thought would be that the logical thing for CDT would be to buy the contract and then as a result its expected utilities change, which leads to its probabilities changing, and as a result it doesn't want to sell the contract. I'd think this argument only puts a bound on how much cdt and edt can differ, rather than on whether they can differ at all. Very possible I'm missing something though.
Directing a robot via motor actions, and receiving camera data (translated into text, I guess, so as not to make it maximally unfair, but still), to make a cup of tea in a kitchen.
Did it memorize the way to beat the levels, or did it learn a generalized method of beating Helltaker?
Exciting stuff. One thing I suspect is that you'll need a different account for abstractions in the presence of agency/optimization than for abstractions that deal with unoptimized things, because agency implies "conspiracies" where many factors may all work together to achieve something.
Like your current point about "information at a distance" probably applies to both, but the reasons that you end up with information at a distance likely differ; with non-agency phenomena, there's probably going to be some story based on things like thermodynamics, averages over large numbers of homogeneous components, etc., while agency makes things more complex.
Suppose that the prosecutor has some random noise in their charges, such that they sometimes overcharge a bunch and sometimes undercharge a bunch. In that case it seems reasonable to suppose that things are more likely to go to court when the prosecutor is overcharging and the accused therefore thinks they can get more of the accusations dropped. But this would mean that the prosecutors are evaluated on a subset of the charges that are systematically too high, and therefore to compensate they end up lowering their assessed probabilities below what is actually counterfactually accurate if people just went to court about it always.
I don't know how big a problem this would be, but it seems like something that would be good to evaluate in the proposal.
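To make the selection effect concrete, here's a toy simulation (entirely my own construction; the noise magnitude and the court-selection rule are made up, not from the proposal). Cases where the prosecutor happens to overcharge are more likely to be contested and go to court, so the assessed probabilities on the court subset run systematically above the actual conviction rate, even though the prosecutor's noise averages to zero across all cases:

```python
import random

random.seed(0)

n = 100_000
assessed, outcomes = [], []
for _ in range(n):
    p = random.uniform(0.2, 0.8)         # true per-case conviction probability
    noise = random.gauss(0, 0.1)         # prosecutor's charging noise
    q = min(max(p + noise, 0.0), 1.0)    # prosecutor's assessed probability
    # the accused is more likely to fight it out in court when overcharged
    goes_to_court = random.random() < 0.2 + 2.0 * max(noise, 0.0)
    if goes_to_court:
        assessed.append(q)
        outcomes.append(random.random() < p)

mean_assessed = sum(assessed) / len(assessed)
conviction_rate = sum(outcomes) / len(outcomes)
print(f"mean assessed probability (court cases): {mean_assessed:.3f}")
print(f"actual conviction rate (court cases):    {conviction_rate:.3f}")
```

With these made-up numbers the gap is a few percentage points: the court subset overrepresents positive charging noise, so a prosecutor scored only on litigated cases looks overconfident and would be pushed to shade their stated probabilities downward.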
The ability of the prosecutor to accurately assess the likelihood can be measured via the Brier score or a log score.
Scoring predictions requires knowing the outcomes. But wouldn't the outcome depend on whether the accused takes plea deals and such?
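For reference, the Brier score itself is straightforward; a minimal sketch (the case numbers are hypothetical). Note that it takes the realized outcomes as an input, which is exactly what plea deals remove:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes; lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# hypothetical prosecutor forecasts for four cases, with known outcomes (1 = conviction)
calibrated    = brier_score([0.7, 0.7, 0.7, 0.3], [1, 1, 0, 0])
overconfident = brier_score([0.95, 0.95, 0.95, 0.6], [1, 1, 0, 0])
print(f"calibrated: {calibrated:.3f}, overconfident: {overconfident:.3f}")
# → calibrated: 0.190, overconfident: 0.317
```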
I've been thinking a lot about differences between people for... arguably most of my life, but especially the past few years. One thing I find interesting is that parts of your abstraction/chaos relationship don't seem to transfer as neatly to people. More specifically, what I have in mind is two elements:
People carry genes around, and these genes can have almost arbitrary effects that aren't wiped away by noise over time, because the genes persist and are copied with digital fidelity in their bodies.
It seems to me that agency "wants" to resist chaos? Like if some sort of simple mechanical mechanism creates something, then the something easily gets moved away by external forces, but if a human creates something and wants to keep it, then they can place it in their home and lock their door and/or live in a society that respects private property. (This point doesn't just apply to environmental stuff like property, but also to biological stuff.)
Individual differences often seem more "amorphous" and vague than what you get elsewhere, and I bet elements like the above play a big role in this. The abstraction/chaos post helps throw this into sharp relief.
If there is really both reverse causation and regular causation between Xr and Y, you have a cycle, and you have to explain what the semantics of that cycle are (not a deal breaker, but not so simple to do. For example if you think the cycle really represents mutual causation over time, what you really should do is unroll your causal diagram so it's a DAG over time, and redo the problem there).
I agree, but I think this is much more dependent on the actual problem that one is trying to solve. There are tons of assumptions and technical details in the different approaches, but I'm trying to sketch out an overview that abstracts over these and gets at the heart of the matter.
(There might also be cases where there is believed to be a unidirectional causal relationship, but the direction isn't known.)
The real question is, why should Xc be unconfounded with Y? In an RCT you get lack of confounding by study design (but then we don't need to split the treatment at all). But this is not really realistic in general -- can you think of some practical examples where you would get lucky in this way?
Indeed that is the big difficulty. Considering how often people use these methods in social science, it seems like there is some general belief that one can have Xc be unconfounded with Y, but this is rarely proven and seems often barely even justified. It seems to me that the general approach is to appeal to parsimony and assume that if you can't think of any major confounders, then they probably don't exist.
This obviously doesn't work well. I think people find it hard to get an intuition for how poorly it works; personally, it made much more sense to me when I framed it in terms of the "Know your Xc!" point: the goal shouldn't be to think of possible confounders, but instead to think of possible nonconfounded variance. I also have an additional blog post in the works arguing that parsimony is empirically testable and usually wrong, but it will be some time before I post it.
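To illustrate how badly "I can't think of a confounder, so there probably isn't one" can fail, here's a toy example (my own construction; all coefficients and variable names are made up) where an unobserved confounder U leaks into the supposedly clean variance of X and the naive regression slope overshoots the true causal effect:

```python
import random

random.seed(1)

n = 50_000
beta_true = 1.0  # true causal effect of X on Y

xs, ys = [], []
for _ in range(n):
    u = random.gauss(0, 1)                 # unobserved confounder
    x = 0.8 * u + random.gauss(0, 1)       # "Xc": variance we hoped was unconfounded
    y = beta_true * x + 1.0 * u + random.gauss(0, 1)
    xs.append(x)
    ys.append(y)

# ordinary least-squares slope of Y on X
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n
slope = cov / var
print(f"naive slope: {slope:.2f} (true effect {beta_true})")
```

Here the naive slope comes out around 1.5 rather than 1.0, and nothing in the observed (X, Y) data flags the problem; you only catch it by knowing where Xc's variance actually comes from.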