TLDR²: If you notice risk and pause, then you have a better chance of doing sufficient mitigations.
Assume you are able to
think of all major AI risk categories,
define dummy tasks that are (very) leading indicators of each risk, and
make any AI system to do a genuine best-effort on simple tasks.
Then, if you actually define your risks & tasks and actually try to use your AI to complete the tasks, you get plenty of warning of danger.
You pause training & development until the risks are "properly mitigated".
It is too early to say what mitigations work well when, or how to really be sure whether the risk is gone. All else being equal, it is better to make a tentative mitigation plan anyways.
I do not speak for ARC Evals!
TLDR²: If you notice risk and pause, then you have a better chance of doing sufficient mitigations.
Assume you are able to
Then, if you actually define your risks & tasks and actually try to use your AI to complete the tasks, you get plenty of warning of danger.
You pause training & development until the risks are "properly mitigated".
It is too early to say what mitigations work well when, or how to really be sure whether the risk is gone. All else being equal, it is better to make a tentative mitigation plan anyways.