So I'd rather say that we "affect nothing but what we intervene on and what's downstream of what we intervened on".
A fair clarification.
Not sure whether this has anything to do with your point, though.
My point is very tangential to your post: you're talking about decision theory as top-level naturalized ways of making decisions, and I'm talking about some non-top-level intuitions that could be called CDT-like. (This maybe should've been a comment on your Dutch book post.) I'm trying to contrast the aspirational spirit of CDT, understood as "make it so that there's such a thing as 'all of what's downstream of what we intervened on' and we know about it", with descriptive CDT, "there's such a thing as 'all of what's downstream of what we intervened on' and we can know about it". Descriptive CDT is only sort of right in some contexts, and can't be right in others; there's no fully general Archimedean point from which we intervene.
We can make some things more CDT-ish though, if that's useful. E.g. we could think more about how our decisions have effects, so that we have in view more of what's downstream of decisions. Or e.g. we could make our decisions have fewer effects, for example by promising to later reevaluate some algorithm for making judgements, instead of hiding, within our decision to do X, also our decision to always use the piece-of-algorithm that (within some larger mental context) decided to do X. That is, we try to hold off on decisions that have downstream effects we don't understand well yet.
>The specifications would correctly capture what-we-actually-mean, so they wouldn't be prone to goodhart
I think there's an ambiguity in "concept" here, that's important to clarify re/ this hope. Humans use concepts in two ways:
1. as abstractions in themselves, like the idea of an ideal spring which contains its behavior within the mental object, and
2. as pointers / promissory notes towards the real objects, like "tree".
Seems likely that any agent that has to attend to trees will form the ~unique concept of "tree", in the sense of a cluster of things, and minimal sets of dimensions needed to specify the relevant behavior (height, hardness of wood, thickness, whatever). Some of this is like use (1): you can simulate some of the behavior of trees (e.g. how they'll behave when you try to cut them down and use them to build a cabin). Some of this is like use (2): if you want to know how to grow trees better, you can navigate to instances of real trees, study them to gain further relevant abstractions, and then use those new abstractions (nutrient intake, etc.) to grow trees better.
So what do we mean by "strawberry", such that it's not goodhartable? We might mean "a thing that is relevantly naturally abstracted in the same way as a strawberry is relevantly naturally abstracted". This seems less goodhartable if we use meaning (2), but that's sort of cheating by pointing to "what we'd think of these strawberries upon much more reflection in many more contexts of relevance". If we use meaning (1), that seems eminently goodhartable.
>There is no continuum of tree-like abstractions.
Some possibly related comments, on why there might be discrete clusters:
From a superrational perspective (in the game with no randomness), in both cases there's two actions; in the correlation game both actions give a util, in the anti-correlation game both actions give no utils. The apparent difference is based on the incoherent counterfactual "what if I say heads and my copy says tails", which doesn't translate into the superrational perspective.
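A minimal sketch of this point, under assumed payoff tables (the post doesn't give explicit numbers): in the "correlation game" matching actions score 1 util, in the "anti-correlation game" mismatching actions score 1 util. Superrationally, a deterministic copy necessarily plays the same action you do, so only diagonal outcomes are reachable:

```python
# Assumed payoff tables: 1 util for matching (corr) or mismatching (anti) actions.
corr = {("heads", "heads"): 1, ("tails", "tails"): 1,
        ("heads", "tails"): 0, ("tails", "heads"): 0}
anti = {("heads", "heads"): 0, ("tails", "tails"): 0,
        ("heads", "tails"): 1, ("tails", "heads"): 1}

def superrational_payoffs(game):
    # Your copy plays whatever you play, so only (a, a) outcomes exist;
    # off-diagonal "counterfactuals" like (heads, tails) never occur.
    return {a: game[(a, a)] for a in ("heads", "tails")}

print(superrational_payoffs(corr))  # {'heads': 1, 'tails': 1}
print(superrational_payoffs(anti))  # {'heads': 0, 'tails': 0}
```

In both games the two reachable outcomes are symmetric, which is the sense in which the apparent difference rests on the incoherent off-diagonal counterfactual.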
(Side note: There's an aspect to the notion of "causal counterfactual" that I think it's worth distinguishing from what's discussed here. This post seems to take causal counterfactuals to be a description of top-level decision reasoning. A different meaning is that causal counterfactuals refer to an aspiration / goal. Causal interventions are supposed to be interventions that "affect nothing but what's explicitly said to be affected". We could try to describe actions in this way, carefully carving out exactly what's affected and what's not; and we find that we can't do this, and so causal counterfactuals aren't, and maybe can't possibly, be a good description (e.g. because of Newcomb-like problems). But instead we could view them as promises: if I manage to "do X and only X" then exactly such and such effects result. In real life if I actually do X there will be other effects, but they must result from me having done something other than just exactly X. This seems related to the way in which humans know how to express preferences data-efficiently, e.g. "just duplicate this strawberry, don't do any crazy other stuff".)
>Surely there's some precise way the universe is.
Agree, and would love to see a more detailed explicit discussion of what this means and whether it's true. (Also, worth noting that there may be a precise way the universe is, but no "precise" way that "you" fit into the universe, because "you" aren't precise.)
--Human brains have special architectures, various modules that interact in various ways (priors?)
--Human brains don't use Backprop; maybe they have some sort of even-better algorithm
This is a funny distinction to me. These things seem like two ends of a spectrum (something like, the physical scale of "one unit of structure"; predictive coding is few-neuron-scale, modules are big-brain-chunk scale; in between, there's micro-columns, columns, lamina, feedback circuits, relays, fiber bundles; and below predictive coding there's the rules for dendrite and synapse change).
I wouldn't characterize my own position as "we know a lot about the brain." I think we should taboo "a lot."
I think there's mounting evidence that brains use predictive coding
Are you saying, there's mounting evidence that predictive coding screens off all lower levels from all higher levels? Like all high-level phenomena are the result of predictive coding, plus an architecture that hooks up bits of predictive coding together?
>It is implausible that human beings' cognitive instincts contain significantly more information than the human genome (750 megabytes). I expect our instincts contain much less.
Our instincts contain pointers to learning from other humans, which contain lots of cognitive info. The pointer is small, but that doesn't mean the resulting organism is algorithmically that simple.
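For reference, the 750-megabyte figure is the standard back-of-envelope bound on the genome's raw information content (assuming ~3 billion base pairs and 2 bits per base, with no compression):

```python
# Back-of-envelope check of the 750 MB figure.
# Assumptions: ~3e9 base pairs in the haploid human genome, 4 symbols
# (A/C/G/T) = 2 bits per base pair, uncompressed.
base_pairs = 3_000_000_000
bits_per_base = 2
total_megabytes = base_pairs * bits_per_base / 8 / 1_000_000
print(total_megabytes)  # 750.0
```

The point stands either way: the genome bounds the size of the pointer, not the size of what the pointer fetches from culture.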
__Levers error__. Anna writes about bucket errors. Attempted summary: sometimes two facts are mentally tracked by only one variable; in that case, correctly updating the belief about one fact can also incorrectly update the belief about the other fact, so it is sometimes epistemically sound to flinch away from the truth of the first fact (until you can create more variables to track the facts separately).

There's a conjugate error: two actions are bound together in one "lever". For example, I want to clean my messy room. But somehow it feels pointless / tiring, even before I've started. If I just started cleaning anyway, I'd get bogged down in some corner, trying to make a bunch of decisions about where exactly to put lots of futzy random objects, tiring myself out and leaving my room still annoyingly cluttered. It's not that there's a necessary connection between cleaning my room and futzing around inefficiently; it's that the only lever I have right now that activates the "clean room" action also activates the "futz interminably" action. What I want instead is to create a lever that activates "clean room" but not "futz", e.g. by explicitly noting the possibility of just putting futzy stuff in a box and not dealing with it further. When I do that, I feel motivated to clean my messy room. I think this explains some "akrasia".

The general pattern: I want to do X to achieve some goal, but the only way (that I know how right now) to do X is if I also do Y, and doing Y in this situation would be bad. Flinching away from action toward a goal is often about protecting your goals.
Generally, apprenticeships should have planned obsolescence. A pattern I've seen in myself and others: A student takes a teacher. They're submissive, in a certain sense--not giving up agency, or harming themselves, or following arbitrarily costly orders, or being overcredulous; but rather, a narrow purely cognition-allocating version of assuming a low-status stance: deferring to local directions of attention by the teacher, provisionally accepting some assumptions, taking a stance of trying to help the teacher with the teacher's work. This is good because it enhances the bandwidth and depth of transmission of tacit knowledge from the teacher. But, for many students, it shouldn't be the endpoint of their development. At some point they should be questioning all assumptions, directing their attention and motivation on all levels, being the servant of their own plans. When this is delayed, e.g. the teacher or the student or something else is keeping the student within a fixed submissive role, the student is stunted, bitter, wasted, restless, jerked around, stagnant. In addition to lines of retreat from social roles, have lines of developmental fundamental change.