I do remember a bunch of content around that, yeah. And I would agree that terminal goals are arbitrary in the sense that they could be anything. But, for any given agent/organism/"thing that wants stuff", there will be a fact of the matter about which terminal goals got instantiated inside that thing.
There are also a few separate but related and possibly confusing facts:
Hm, I'm not sure about Mere Goodness, I read the sequences soon after they were finished, so I don't much remember which concepts were where. There is a sequence post titled Terminal Values and Instrumental Values, though it mostly seems to be emphasizing that both things exist and are different, saving the rest of the content for other posts.
(Note that What DALL-E 2 can and cannot do is not in the top 100 when inflation-adjusted.)
Morality. To me it seems like rationality can tell you how to achieve your goals but not what (terminal) goals to pick. Arguments that try to tell you what terminal goals to pick have just never made sense to me. Maybe there's something I'm missing though.
Okay, I'll bite on this one.
The very thing that distinguishes terminal goals is that you don't "pick" them; you start out with them. They are what gives the concept of "should" a meaning.
A key thing the orthogonality thesis is saying is that it is perfectly possible to have any set of terminal goals, and that there's no such thing as a "rational" set of terminal goals to have.
If you have terminal goals, then you may still need to spend a lot of time introspecting to figure out what they are. If you don't have terminal goals, then the concept of "should", and morality in general, cannot be made meaningful for you. People often consider themselves to be "somewhere in between", where they're not a perfect encoding of some unchangeable terminal values, but there is still a strong sense in which they want stuff for its own sake. I would consider nailing down exactly how these in-between states work to be part of agent foundations.
I'd strongly encourage you to split this post up into a sequence! I think it improves readability (and strongly increases engagement).
I just remembered that we can tag users now; I'll try tagging @evhub to get his opinion.
I found the beginning of this post very confusing because you don't seem to acknowledge at all that the Speed Prior is a specific idea, introduced by Schmidhuber back in 2000, long before AI alignment was a field. (It doesn't seem like you even reference that paper in the post?) Early in the post, right under the heading "What is the speed prior and why do we care about it?", you say,
The speed prior is a potential technique for combating formation of deceptive alignment.
This is a true statement about the Speed Prior, but it isn't what the Speed Prior is, and it's emphatically not why it was conceived; it's a statement about why we (the alignment community) care about it.
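For reference, and only as a rough sketch from memory rather than a precise restatement of Schmidhuber's definition: the original Speed Prior is a concrete prior over programs that discounts a program both by its length and by how long it takes to run, roughly

$$S(x) \;=\; \sum_{i=1}^{\infty} 2^{-i} \sum_{p \,:\, p \to_i x} 2^{-\ell(p)},$$

where $\ell(p)$ is the length of program $p$ and $p \to_i x$ means $p$ outputs a string starting with $x$ within phase $i$ of the FAST algorithm (which, as I recall, allots each program on the order of $2^{i-\ell(p)}$ steps in phase $i$). The point is just that the object itself is a specific simplicity measure, not an alignment technique.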
My guess about what happened here would be something like:
I think this is a great idea for the alignment community to be developing, but we should do so under a term that doesn't already refer to something specific outside our field. (I think most of my objection would be ameliorated if you consistently used "a speed prior" and "speed priors".) I'm not too much of a stickler for freezing the usage of terms, but I was genuinely confused by this usage, and I suspect that other alignment researchers would be too.
Okay, so this post is great, but I just want to note my confusion: why is it currently the 10th-highest-karma post of all time?? (And that's inflation-adjusted!)
Oh, yeah, that's totally fair. I agree that a lot of those writings are really valuable, and I've been especially pleased with how much Nate has been writing recently. I think there are a few factors that contributed to our disagreement here:
Anyway, my overall reason for saying that was to argue that it's reasonable for people to have been updating in the "MIRI giving up" direction long before Death With Dignity.
I really appreciate how clear and concise this post is.