Hmm, I thought a portfolio approach to safety is what Anthropic has been saying is the best approach for years, e.g. from 2023: https://www.anthropic.com/news/core-views-on-ai-safety. Has that shifted recently toward more of an "all eggs in the interpretability basket" stance, prompting this post?
I would also love to understand, maybe as a follow-up post, what the pragmatic portfolio looks like - and which approach is the front-runner in that portfolio. I sort of think it's interpretability, so even though we can't go all in on it, it's still maybe our best shot and thus deserves a correspondingly large share of resources?