I'll take a stab at answering the questions for myself (fairly quick takes):
Thanks, I especially like "vague/incorrect labels" as a way to refer to that mismatch. Well-posed question by Garrabrant; I might touch on that in my next post.
- Proof that a model is an optimizer says very little about the model.
- I do not know what a research group studying outer alignment is actually studying.
- Inner alignment seems to cover the entire problem in the limit.
- Whether an optimizer is mesa or not depends on your point of view.
- These terms seem to be a magnet for confusion and debate.
- I have to do background reading on someone just to understand what claim they're making.

These are all indicators that we are using the wrong terms.
What are we actually pointing at? What questions do we want answered?
I'll post my answers to these questions in a couple of days, but I'm curious how other people slice it. Does "inner alignment failure" mean anything, or do we need to point at the problem more directly?