Here is a concrete scenario for the treacherous turn:
Around 2025, the seed AI understands that:
- a) humans care about a safety performance measure, so it had better optimize that measure to avoid being shut down.
- b) to maximize its objective function in the long run, it needs to conceal its true capabilities to buy time: if it becomes obvious that its capabilities are skyrocketing, humans might shut the AI down or develop more interpretability or AI-boxing methods.
Yet concealment involves a trade-off:
- if the measured capability starts diverging too quickly from past behaviour, it will raise suspicion.
- if the measured capability grows as fast as the true capability, then the AI won't be buying much time.
In addition to this trade-off, this scenario assumes that:
- i) the AI is not "indifferent" to being shut down.
- ii) the takeoff is fast enough for the AI to reach AGI level without arousing human suspicion.