It may be that some of the good reasons not to be VNM right now will continue to be good reasons. In that case, there's no point at which you want to be VNM, and in some senses you don't even limit to VNM. (E.g. you might limit to VNM in the sense that, for any local ontology thing, as long as it isn't revised, you tend toward VNM-ness; but the same mind might fail to limit to VNM in that, on any given day, the stuff it is most concerned/involved with makes it look quite non-VNM.)
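To gesture at those two senses with some made-up notation (purely a sketch, nothing standard): let $V_o(t) \in [0,1]$ measure how VNM-coherent the mind's preferences over a local ontology $o$ are at time $t$, and let $a(t)$ be the ontology the mind is most concerned/involved with at time $t$. Then the picture is roughly

$$V_o(t) \to 1 \text{ for each fixed, unrevised } o, \qquad \text{while} \qquad \limsup_{t\to\infty} V_{a(t)}(t) < 1,$$

i.e. convergence toward VNM-ness within each fixed ontology, but no convergence along the diagonal of whatever is currently most salient; pointwise but not uniform convergence, so to speak.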
I very much agree that "structurally what" matters a lot, but that seems like half the battle to me.
But somehow this topic is not afforded much care or interest. Some people will pay lip service to caring, others will deny that mental states exist, but either way the field of alignment doesn't put much force (money, smart young/new people, social support) toward these questions. This is understandable, as we have much less legible traction on this topic, but that's... undignified, I guess is the expression.
a sufficiently intelligent AI system that does understand that relationship will be able to exploit the extra degrees of freedom in the lower level ontology to our disadvantage, and we won’t be able to see it coming.
Even if you did understand the lower level, you couldn't stop such an adversarial AI from exploiting it, or exploiting something else, and taking control. If you understand the mental states (yeah, the structure), then maybe you can figure out how to make an AI that wants to not do that. In other words, understanding the lower level is not sufficient, and probably not necessary / not a priority.
I'm unsure whether it's a good analogy. Let me make a remark, and then you could reask or rephrase.
The discovery that the phenome is largely a result of the genome is of course super important for understanding, and also useful. The discovery of mechanically how the phenome results from the genome (transcription, splicing, translation, enhancing/promoting/silencing, trans-regulation, ...) is separately important, and still ongoing. The understanding of "structurally how" characters are made, both in ontogeny and phylogeny, is a blob of open problems (evo-devo, niches, ...). Likewise, more simply, "structurally what": how to even think of characters. Cf. Günter Wagner, Rupert Riedl.
I would say the "structurally how" and "structurally what" are most analogous. The questions we want to answer about minds aren't like "what is a sufficient set of physical conditions to determine, however opaquely, a mind's effects", but rather "what smallish, accessible-ish, designable-ish structures in a mind can [understandably to us, after learning how] determine a mind's effects, specifically as we think of those effects". That is more like organology, developmental biology, and telic/partial-niche evodevo (<- a made-up term, but hopefully you see what I mean).
https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html
why you are so confident in these "defeaters"
More than any one defeater, I'm confident that most people in the alignment field don't understand the defeaters. Why? I mean, from talking to many of them, and from their choices of research.
People in these fields understand very well the problem you are pointing towards.
I don't believe you.
if the alignment community would outlaw mechinterp/slt/neuroscience
This is an insane strawman. Why are you strawmanning what I'm saying?
I don't think progress on this question will be made by blanket dismissals
Progress can only be made by understanding the problems, which can only be done by stating the problems, which is what you're calling "blanket dismissals".
I.e. a training technique? Design principles? A piece of math? Etc.
All of those, sure? First you understand, then you know what to do. This is a bad way to do peacetime science, but seems more hopeful for
I think I am asking a very fair question.
No, you're derailing from the topic, which is the fact that the field of alignment keeps failing to even try to avoid/address major partial-consensus defeaters to alignment.
This is a derail. I can know that something won't work without knowing what would work. I don't claim to know something that would work. If you want my partial thoughts, some of them are here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
In general, there's more feedback available at the level of "philosophy of mind" than is appreciated.
They aren't close to the right kind of abstraction. You can tell because they use a low-level ontology, such that mental content, to be represented there, would have to be homogenized, stripped of mental meaning, and encoded. Compare trying to learn about arithmetic by having a calculator explained to you in terms of transistors vs. in terms of arithmetic. The latter is the right level of abstraction; the former is wrong (it would be right if you were trying to understand transistors, or some further implementational aspects of arithmetic beyond the core structure of arithmetic).
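To make the analogy concrete, here's a toy sketch (mine, not from the discussion; the function names are made up). Both descriptions below compute the same sums, but only the second is at the level of abstraction you'd want if arithmetic itself is what you're trying to understand:

```python
# Toy contrast between two levels of description of "2 + 3 = 5".

def full_adder(a, b, carry_in):
    """One-bit full adder expressed in boolean gates (XOR, AND, OR)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def add_gate_level(x, y, bits=8):
    """Add two small nonnegative ints by simulating a ripple-carry adder bit by bit.

    Faithful to how a calculator's hardware works, but it tells you about
    gates/transistors, not about the structure of arithmetic.
    """
    result, carry = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

def add_arithmetic_level(x, y):
    """The description at the level we actually care about: addition itself."""
    return x + y

assert add_gate_level(2, 3) == add_arithmetic_level(2, 3) == 5
```

Both are "correct"; the point is only about which description lives at the level of the thing you're trying to understand.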
What I'm proposing instead is theory.
I'm not sure what "concrete" is supposed to mean; for the one or two senses I immediately imagine, no, I would say the feedback is indeed concrete. In terms of consensus/outcome, likewise, I think the feedback is actually concrete. There is a difficulty, which is that there's a much smaller set of people to whom the outcomes are visible.
As an analogy/example: feedback in higher math. It's "nonconcrete" in that it's "just verbal arguments" (and translating those into something much more objective, like a computer proof, is a big, separate, long undertaking). And there's a much smaller set of people who can tell which statements are true in the domain. There might even be a bunch more people who have opinions, and can say vaguely related things that other non-experts can't distinguish from expert statements, and who therefore form an apparent consensus that's wrong + ungrounded. But one shouldn't conclude from those facts that math is less real, or less truth-tracking, or less available for communities to learn about directly.