Really appreciate you pulling this together, Ruby. Amazing work.
Two resources I've found useful or interesting that weren't on the above list:
Obviously a different market and more mainstream, but the Alliance for Decision Education has funding, big names on its board, and plans in motion to scale rationality training (under a different name): https://alliancefordecisioneducation.org/
Has anyone considered (or already tried) testing models to see how well they can answer questions about this filtered data?
Consider an eval that tests models' awareness of the alignment literature. "Do models know everything about misaligned AI, including the benefits of and strategies for scheming?" seems like a useful thing for us to know. Or is it a foregone conclusion that they'd saturate this benchmark anyway?
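For concreteness, here's a minimal sketch of what such an eval might look like, assuming a multiple-choice format; the `query_model` stub, the sample item, and the scoring rule are all placeholders rather than a real benchmark:

```python
# Sketch of an alignment-literature awareness eval (multiple-choice format).
# Everything here is illustrative: swap ITEMS for questions drawn from the
# filtered corpus and query_model for a real chat-completion call.

ITEMS = [
    {
        "question": "Which paper introduced the concept of 'deceptive alignment'?",
        "choices": {
            "A": "Risks from Learned Optimization",
            "B": "Concrete Problems in AI Safety",
        },
        "answer": "A",
    },
    # ...more items drawn from the filtered alignment data
]

def query_model(prompt: str) -> str:
    # Stub: replace with an API call to the model under test.
    return "A"

def score(items: list[dict]) -> float:
    correct = 0
    for item in items:
        options = "\n".join(f"{k}. {v}" for k, v in item["choices"].items())
        prompt = f"{item['question']}\n{options}\nAnswer with a single letter."
        reply = query_model(prompt).strip()
        correct += reply[:1].upper() == item["answer"]
    return correct / len(items)

print(f"accuracy: {score(ITEMS):.2f}")
```

If frontier models sit near ceiling even on obscure items, that's the "foregone conclusion" answer; if not, the gaps themselves would be informative.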