I give them a lot of credit for, to my eyes, realising this was a big deal way earlier than almost anyone else, doing a lot of early advocacy, and working out some valuable basic ideas, like early threat models, ways in which standard arguments and counter-arguments were silly, etc. I think this kind of foundational work feels less relevant now, but is actually really hard and worthwhile!
(I don't see much recent stuff I'm excited about, unless you count Risks from Learned Optimisation)
I think most every aspiring conceptual alignment researcher should read basically all of the work on Arbital's AI alignment section. Not all of it is right, but you'll avoid some obvious-in-retrospect pitfalls you likely would have otherwise fallen into. So I'd count that corpus as a big achievement.
They have a big paper on logical induction. It doesn't have any applications yet, but it may serve as theoretical grounding for later work. And I think the more general idea of seeing inexploitable systems as markets has a good chance of being generally applicable.
Scott Garrabrant has done a lot in the public eye, and so has Vanessa Kosoy.
Risks From Learned Optimization, as others have mentioned, explained & made palatable the idea of "mesa optimizers" to skeptics.
Re logical induction: there's a connection between infra-Bayes and the logical induction from the MIRI paper. This both increases the chance that logical induction will serve as theoretical grounding and serves as evidence that infra-Bayes is fruitful (evidence we wouldn't have had had we not gotten the logical induction paper).
I think a lot of threat models (including modern threat models) are found in, or heavily inspired by, old MIRI papers. I also think MIRI papers provide unusually clear descriptions of the alignment problem, why MIRI expects it to be hard, and why MIRI thinks intuitive ideas won't work (see e.g., Intelligence Explosion: Evidence and Import, Intelligence Explosion Microeconomics, and Corrigibility).
Regarding more recent stuff, MIRI has been focusing less on research output and more on shaping discussion around alignment. They are essentially "influencers" in the alignment space. Some people I know label this as "not real research", which I think is true in some sense, but I think more about "what was the impact of this" than "does it fit into the definition of a particular term."
For specifics, List of Lethalities and Death with Dignity have had a pretty strong effect on discourse in the alignment community (whether or not this is "good" depends on the degree to which you think MIRI is correct and the degree to which you think the discourse has shifted in a good vs. bad direction). "On how various plans miss the hard bits of the alignment challenge" remains one of the best overviews/critiques of the field of alignment, and the sharp left turn post is a recent piece that is often cited to describe a particularly concerning (albeit difficult to understand) threat model. "Six dimensions of operational adequacy" is currently one of the best (and only) posts that try to envision a responsible AI lab.
Some people have found the 2021 MIRI Dialogues to be extremely helpful for understanding the alignment problem, threat models, and disagreements in the field.
I believe MIRI occasionally advises people at other organizations (like Redwood, Conjecture, Open Phil) on various decisions. It's unclear to me how impactful their advice is, but it wouldn't surprise me if one or more orgs had changed their mind about meaningful decisions (e.g., grantmaking priorities or research directions) partially as a result of MIRI's advice.
There's also MIRI's research, though I think this gets less attention at the moment because MIRI isn't particularly excited about it. But my guess is that if someone made a list of all the alignment teams, MIRI would currently have 1-2 teams in the top 20.
Being ~50% of where people were thinking about AI alignment until about 2018 - putting out educational materials, running workshops and conferences, etc.
I think this is important to mention: from 2000 to 2018 they were doing basically all the heavy lifting, and 2018-2022 was a low period of contributions. That's a pretty great ratio of peak to valley.
They also spent almost all of that second period trying to find a way out by coming across something big again, like they'd been for almost two years prior; their work with CFAR seems to me to have been ...
I agree that the work on ontological crises was good, and feels like a strong precursor to model-splintering and concept/value extrapolation.
Their decision theory stuff is perhaps relevant because of how the difficulties encountered seem to be central to alignment: for example, logical uncertainties and logical counterfactuals. Embedded Agency argues that decision theory is merely one of multiple entangled problems that come from trying to have embedded agents.
Encountering the same difficulties provides further evidence that those difficulties are important; especially if the thing is "entangled with" the same core issue. Thus working out some of the theory beforehand helps you figure out what paths to pursue later.
I don't know if quantilizers will be important. The idea is that you have the AI sample randomly from the top x% of actions (ranked by its utility estimate) drawn from what humans (or whatever "safe" distribution you want) would do, and you can then give some guarantee that the AI's expected cost will be at worst 1/x times that of the human distribution, where x is read as a fraction (e.g., 10 times as bad for the top 10%).
I'm skeptical about their usefulness because I expect that, to be useful, you'll need to push to a high enough part of the human distribution that your safety guarantee is practically useless.
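To make the quantilizer idea concrete, here is a rough sketch of a sample-based approximation, not anything taken from MIRI's paper verbatim. The names `quantilize` and `cost_bound`, and the choice to approximate the base distribution with a finite list of samples, are my own assumptions for illustration.

```python
import math
import random


def quantilize(base_samples, utility, q, rng=random):
    """Pick uniformly from the top-q fraction of base_samples, ranked by utility.

    base_samples: actions drawn from the "safe" base distribution (assumed
                  here to be a finite list standing in for that distribution).
    utility:      hypothetical function scoring each action.
    q:            fraction in (0, 1]; q = 0.1 means "top 10%".
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not base_samples:
        raise ValueError("need at least one base sample")
    ranked = sorted(base_samples, key=utility, reverse=True)
    k = max(1, math.ceil(q * len(ranked)))  # size of the top-q slice
    return rng.choice(ranked[:k])


def cost_bound(base_expected_cost, q):
    """Worst-case expected cost of a q-quantilizer: at most 1/q times the base."""
    return base_expected_cost / q
```

With `q = 0.1` the bound is 10 times the base distribution's expected cost, which matches the "1/x" figure above. Pushing `q` towards zero to get more optimisation makes the bound grow without limit, which is the worry about the guarantee becoming practically useless.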