I have two questions I'd love to hear your thoughts about.
1. What is the overarching/high-level research agenda of your group? Do you have a concrete alignment agenda where people work on the same thing, or do people work on many unrelated things?
2. What are your thoughts on the various research agendas for solving alignment that exist today? Why do you think they will fall short of their goal? What are you most excited about?
Feel free to talk about any agendas, but I'll just list a few that come to my mind (in no particular order).
IDA, Debate, Interpretability (I think I read a tweet where you said you are rather skeptical about this), Natural Abstraction Hypothesis, Externalized Reasoning Oversight, Shard Theory, (Relaxed) Adversarial Training, ELK, etc.