I somewhat expect your response will be "why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system", to which I would say that you are not in the intended audience.
Ok, this is a fair answer. I think you and I, at least, are basically aligned here.
I do think a lot of people took away from your post something like "all behavior can be rationalized as EU maximization", and in particular I think a lot of people walked away with the impression that usefully applying coherence arguments to systems in our particular universe is much more rare/difficult than it actually is. But I can't fault you much for some of your readers not paying sufficiently close attention, especially when my review at the top of this thread is largely me complaining about how people missed nuances in this post.
I assume this is my comment + post.
I was referring mainly to Richard's post here. You do seem to understand the issue of assuming (rather than deriving) probabilities.
I discussed VNM specifically because that's the best-understood coherence theorem and the one that I see misused in AI alignment most often.
This I certainly agree with.
I don't know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.
Exactly which objection are you talking about here?
If it's something like "coherence theorems do not say that tool AI is not a thing", that seems true. Even today humans have plenty of useful tools with some amount of information processing in them which are probably not usefully model-able as expected utility maximizers.
But then you also make claims like "all behavior can be rationalized as EU maximization", which is wildly misleading. Given a system, the coherence theorems map a notion of resources/efficiency/outcomes to a notion of EU maximization. Sure, we can model any system as an EU maximizer this way, but only if we use a trivial/uninteresting notion of resources/efficiency/outcomes. For instance, as you noted, it's not very interesting when "outcomes" refers to "universe-histories". (Also, the "preferences over universe-histories" argument doesn't work as well when we specify the full counterfactual behavior of a system, which is something we can do quite well in practice.)
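A minimal sketch of what the trivial construction looks like, under toy assumptions: the policy, the action names, and the horizon are all hypothetical and chosen only for illustration. Taking "outcomes" to be whole histories, any deterministic policy maximizes a utility function that just rewards the history the policy itself would produce, which is why this kind of rationalization carries no information.

```python
from itertools import product

# Hypothetical toy setup: two actions, three time steps.
ACTIONS = ("left", "right")
HORIZON = 3


def policy(history):
    """An arbitrary policy: alternate actions, starting with 'left'."""
    return ACTIONS[len(history) % 2]


def rollout(pol):
    """The full action history the policy produces over the horizon."""
    history = ()
    for _ in range(HORIZON):
        history += (pol(history),)
    return history


def trivial_utility(history, pol=policy):
    """Utility over whole histories: 1 for the policy's own history, else 0."""
    return 1.0 if history == rollout(pol) else 0.0


def best_history():
    """Brute-force 'EU maximization' over every possible history."""
    return max(product(ACTIONS, repeat=HORIZON), key=trivial_utility)


if __name__ == "__main__":
    # The maximizer recovers exactly what the policy does, for any policy.
    print(best_history() == rollout(policy))
```

Swapping in any other deterministic `policy` gives the same result, because the utility function is built from the policy rather than from any independent notion of resources or outcomes.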
Combining these points: your argument largely seems to be "coherence arguments apply to any arbitrary system, therefore they don't tell us interesting things about which systems are/aren't <agenty/dangerous/etc>". (That summary isn't exactly meant to pass an ITT, but please complain if it's way off the mark.) My argument is that coherence theorems do not apply nontrivially to any arbitrary system, so they could still potentially tell us interesting things about which systems are/aren't <agenty/dangerous/etc>. There may be good arguments for why coherence theorems are the wrong way to think about goal-directedness, but "everything can be viewed as EU maximization" is not one of them.
Yes, if you add in some additional detail about resources, assume that you do not have preferences over how those resources are used, and assume that there are preferences over other things that can be affected using resources, then coherence theorems tell you something about how such agents act. This doesn't seem all that relevant to the specific, narrow setting which I was considering.
Just how narrow a setting are you considering here? Limited resources are everywhere. Even an E. coli needs to efficiently use limited resources. Indeed, I expect coherence theorems to say nontrivial things about an E. coli swimming around in search of food (and this includes the possibility that the nontrivial things the theorem says could turn out to be empirically wrong, which in turn would tell us nontrivial things about E. coli and/or selection pressures, and possibly point to better coherence theorems).
I actually think it shouldn't be in the alignment section, though for different reasons than Rohin. There are lots of things which can be applied to AI but are a lot more general, and I think it's usually better to separate the "here's the general idea" presentation from the "here's how it applies to AI" presentation. That way, people working on other interesting things can come along and notice the idea and try to apply it in their own area rather than getting scared off by the label.
For instance, I think there are probably gains to be had from applying coherence theorems to biological systems. I would love it if some rationalist biologist came along, read Yudkowsky's post, and said "wait a minute, cells need to make efficient use of energy/limited molecules/etc, can I apply that?". That sort of thing becomes less likely if this sort of post is hiding in "the alignment section".
Zooming out further... today, alignment is the only technical research area with a lot of discussion on LW, and I think it would be a near-pareto improvement if more such fields were drawn in. Taking things which are alignment-relevant-but-not-just-alignment and lumping them all under the alignment heading makes that less likely.
My understanding is that the usual lab mouse breeds are highly inbred, resulting in high levels of cancer. That makes it "easier", in some sense, to extend their lifespans - especially by interventions which trade off cancer risk against other age-related deterioration. For instance, there are ways to make cells more sensitive to DNA damage, so they undergo senescence at lower damage levels. This can decrease cancer risk, at the cost of accelerating other age-related degeneration.
The key problem is... sometimes you actually just do need to have status fights, and you still want to have as-good-epistemics-as-possible given that you're in a status fight. So a binary distinction of "trying to have good epistemics" vs "not" isn't the right frame.
Part of my model here is that moral/status judgements (like "we should blame X for Y") like to sneak into epistemic models and masquerade as weight-bearing components of predictions. The "virtue theory of metabolism", which Yudkowsky jokes about a few times in the sequences, is an excellent example of this sort of thing, though I think it happens much more often and usually much more subtly than that.
My answer to that problem on a personal level is to rip out the weeds wherever I notice them, and build a dome around the garden to keep the spores out. In other words: keep morality/status fights strictly out of epistemics in my own head. In principle, there is zero reason why status-laden value judgements should ever be directly involved in predictive matters. (Even when we're trying to model our own value judgements, the analysis/engagement distinction still applies.)
Epistemics will still be involved in status fights, but the goal is to make that a one-way street as much as possible. Epistemics should influence status, not the other way around.
In practice it's never that precise even when it works, largely because value connotations in everyday language can compactly convey epistemically-useful information - e.g. the weeds analogy above. But it's still useful to regularly check that the value connotations can be taboo'd without the whole model ceasing to make sense, and it's useful to perform that sort of check automatically when value judgements play a large role.
Not exactly an answer to your question, but this post is probably relevant and has several short-but-concrete examples.
I've wanted for a while to see a game along these lines. It would have some sort of 1-v-1 fighting, but dominated by "random" behavior from environmental features and/or unaligned combatants. The centerpiece of the game would be experimenting with the "random" components to figure out how they work, in order to later leverage them in a fight.
Fleshing this out a bit more, within the framework of this comment: when we can consistently predict some outcomes using only a handful of variables, we've learned a (low-dimensional) constraint on the behavior of the world. For instance, the gas law PV = nRT is a constraint on the relationship between variables in a low-dimensional summary of a high-dimensional gas. (More precisely, it's a template for generating low-dimensional constraints on the summary variables of many different high-dimensional gases.)
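As a small worked illustration of the gas law as a low-dimensional constraint, here is a sketch that predicts pressure from three summary variables without knowing anything about individual molecules. The function name and the example numbers (one mole at 273.15 K in about 22.4 litres) are assumptions chosen for illustration; R is the standard molar gas constant.

```python
# Molar gas constant in J/(mol*K).
R = 8.314462618


def pressure(n_mol, temp_k, volume_m3):
    """Ideal-gas pressure in pascals, from PV = nRT solved for P."""
    return n_mol * R * temp_k / volume_m3


if __name__ == "__main__":
    # One mole at 273.15 K in ~22.4 L comes out at roughly one atmosphere.
    p = pressure(1.0, 273.15, 0.022414)
    print(f"{p:.0f} Pa")

    # The constraint ties the summary variables together: doubling the
    # volume at fixed n and T halves the pressure.
    print(pressure(1.0, 273.15, 2 * 0.022414) / p)
```

The point of the sketch is only that three numbers suffice for the prediction; the roughly 10^23 molecular positions and velocities never appear.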
When we flip perspective to problems of design (e.g. engineering), those constraints provide the structure of our problem - analogous to the walls in a maze. We look for "paths in the maze" - i.e. designs - which satisfy the constraints. Duality says that those designs act as constraints when searching for new constraints (i.e. doing science). If engineers build some gadget that works, then that lets us rule out some constraints: any constraints which would prevent the gadget from working must be wrong.
Data serves a similar role (echoing your comment here). If we observe some behavior, then that provides a constraint when searching for new constraints. Data and working gadgets live "in the same space" - the space of "paths": things which definitely do work in the world and therefore cannot be ruled out by constraints.
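A toy sketch of that filtering step, with everything in it hypothetical: the candidate constraints are made-up inequalities over two design variables, and the "paths" are made-up points standing in for observed data or gadgets known to work. Any candidate constraint that would forbid a known-working path gets ruled out.

```python
# Hypothetical candidate constraints on a two-variable design space.
# Each maps a design (x, y) to True if the constraint permits it.
CANDIDATE_CONSTRAINTS = {
    "x >= 0": lambda x, y: x >= 0,
    "y <= 10": lambda x, y: y <= 10,
    "x + y <= 5": lambda x, y: x + y <= 5,
}

# Hypothetical "paths": designs or observations known to work in the world.
KNOWN_WORKING = [(1, 2), (4, 3)]


def surviving_constraints(candidates, working):
    """Keep only the constraints that permit every known-working path."""
    return sorted(
        name
        for name, permits in candidates.items()
        if all(permits(x, y) for x, y in working)
    )


if __name__ == "__main__":
    # (4, 3) works but violates "x + y <= 5", so that candidate is ruled out.
    print(surviving_constraints(CANDIDATE_CONSTRAINTS, KNOWN_WORKING))
```

Data points and working gadgets enter the function the same way, as entries in the list of known-working paths, which is the sense in which they live in the same space.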
I detect the ghost of Jaynes in this!
In particular, the view in this post is extremely similar to the view in Macroscopic Prediction. As there, reproducible phenomena are the key puzzle piece.