According to Professor Stuart Russell (a sentiment I have often seen re-expressed in the AI safety community):
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.
I no longer believe this to be obviously true. Actually, I think it's likely to be untrue in the real world, and under nearly all realistic AGI-advent scenarios. This is because extreme values are only likely to be taken for the unconstrained variables if the environment...
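To make Russell's claim concrete, here is a minimal toy sketch (my own illustration, not drawn from the quote or the post above; the numbers and setup are arbitrary). The objective depends on only k of n variables, but all n variables draw on a shared resource budget, i.e. they are coupled through the environment. Under that coupling, a standard LP solver pushes the variables the objective doesn't care about to an extreme value:

```python
# Toy sketch (illustrative only): a linear objective that depends on only k of
# n variables, while all n variables share a resource budget. The optimizer
# drives the "don't care" variables to an extreme value (here, 0) to free up
# budget for the k variables it does care about.
import numpy as np
from scipy.optimize import linprog

n, k = 10, 3
rng = np.random.default_rng(0)

# Objective weights: positive on the k cared-about variables, zero elsewhere.
c = np.zeros(n)
c[:k] = rng.uniform(1.0, 2.0, size=k)

# linprog minimizes, so negate c to maximize c @ x.
# Environment: a shared budget sum(x) <= 150, with 0 <= x_i <= 50 for each variable.
res = linprog(
    -c,
    A_ub=np.ones((1, n)),
    b_ub=[150.0],
    bounds=[(0.0, 50.0)] * n,
    method="highs",
)

print("cared-about variables:", np.round(res.x[:k], 2))  # each hits its cap of 50
print("don't-care variables: ", np.round(res.x[k:], 2))  # forced to the extreme value 0
```

The extreme values here come entirely from the shared-budget coupling, which is exactly the kind of environmental condition the rebuttal above is questioning.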
A meta-related comment from someone who's not deep into alignment (yet) but does work in AI/academia.
My impression from reading LessWrong has been that the people who are deep into alignment research generally spend a great deal of their time on their own independent research agendas, which - naturally - they feel are the most fruitful paths for alignment.
I'm glad that we seem to be seeing a few more posts of this nature recently (e.g. around Infra-Bayes), where established researchers spend more of their time both investigating and critiquing others' approaches. This is one good way to get alignment researchers to stack more, imo.