If you want to solve AI safety before AI capabilities become too great, then it seems that AI safety must have some of the following:

  • More researchers
  • Better researchers
  • Fewer necessary insights
  • Easier necessary insights
  • The ability to steal more insights from AI capabilities research than capabilities research steals from AI safety
  • ...

Is this likely to be the case? Why? Another way to ask this question is: under which scenarios does alignment not add time?


Some more ways:

If it turns out that capabilities and safety are not so dichotomous, and so robustness / interpretability / safe exploration / maybe even impact regularisation get solved by the capabilities lot.

If early success with a date-competitive, performance-competitive safety programme (e.g. IDA) puts capabilities research onto a safe path.

Let's just save time by jumping to the place where the AI in charge of AI Safety goes Foom and takes us back to the Stone Age "for safety" ;)