I give them a lot of credit for, to my eyes, realising this was a big deal way earlier than almost anyone else, doing a lot of early advocacy, and working out some valuable basic ideas: early threat models, ways in which standard arguments and counter-arguments were silly, etc. This kind of foundational work feels less relevant now, but it's actually really hard and worthwhile!
(I don't see much recent stuff I'm excited about, unless you count Risks from Learned Optimisation)
I think almost every aspiring conceptual alignment researcher should read basically all of the work in Arbital's AI alignment section. Not all of it is right, but you'll avoid some obvious-in-retrospect pitfalls you would likely have fallen into otherwise. So I'd count that corpus as a big achievement.
They have a big paper on logical induction. It doesn't have any applications yet, but it may well serve as theoretical grounding for later work. And I think the broader idea of viewing inexploitable systems as markets has a good chance of being widely applicable.
Scott Garrabrant has done a lot in the public eye, and so has Vanessa Kosoy.
Risks From Learned Optimization, as others have mentioned, explained & made palatable the idea of "mesa optimizers" to skeptics.
I think a lot of threat models (including modern threat models) are found in, or heavily inspired by, old MIRI papers. I also think MIRI papers provide unusually clear descriptions of the alignment problem, why MIRI expects it to be hard, and why MIRI thinks intuitive ideas won't work (see e.g., Intelligence Explosion: Evidence and Import, Intelligence Explosion Microeconomics, and Corrigibility).
Regarding more recent stuff, MIRI has been focusing less on research output and more on shaping discussion around alignment. They are essentially "influencers" in the alignment space. Some people I know label this as "not real research", which I think is true in some sense, but I care more about "what was the impact of this?" than "does it fit the definition of a particular term?"
For specifics, List of Lethalities and Death with Dignity have had a pretty strong effect on discourse in the alignment community (whether this is "good" depends on how correct you think MIRI is and whether you think the discourse has shifted in a good or bad direction). The post "On how various plans miss the hard bits of the alignment challenge" remains one of the best overviews/critiques of the field of alignment, and the sharp left turn post is a recent piece that is often cited to describe a particularly concerning (albeit difficult to understand) threat model. Six dimensions of operational adequacy is currently one of the best (and only) posts that tries to envision a responsible AI lab.
Some people have found the 2021 MIRI Dialogues to be extremely helpful at understanding the alignment problem, understanding threat models, and understanding disagreements in the field.
I believe MIRI occasionally advises people at other organizations (like Redwood, Conjecture, Open Phil) on various decisions. It's unclear to me how impactful their advice is, but it wouldn't surprise me if one or more orgs had changed their mind about meaningful decisions (e.g., grantmaking priorities or research directions) partially as a result of MIRI's advice.
There's also MIRI's research, though I think this gets less attention at the moment because MIRI isn't particularly excited about it. But my guess is that if someone made a list of all the alignment teams, MIRI would currently have 1-2 teams in the top 20.
Being ~50% of where AI alignment thinking was happening until about 2018: putting out educational materials, running workshops and conferences, etc.
I think this is important to mention: from 2000 to 2018 they were doing basically all the heavy lifting, and 2018-2022 was a low period of contributions. That's a pretty great ratio of peak to valley.
They also spent almost all of that second period trying to find a way out by hitting on something big again, as they had for almost two years prior; their work with CFAR seems to me to have been ...
I agree that the work on ontological crises was good, and feels like a strong precursor to model-splintering and concept/value extrapolation.