Consider the current state of the world, A, and a "bad" state of the world, B (e.g., one where all humans have been turned into paperclips). For a benign act-based agent to be safe, it seems you would need to prove that there is no sequence of states A_2, A_3, ..., A_n, B (writing A_1 = A) such that each A_i is preferable given the world state A_{i-1}, and B is preferable given A_n. I don't think this realistically holds.
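Stated this way, the safety condition is a reachability question: B must be unreachable from A along steps that each look preferable from the state before them. The sketch below makes that concrete under strong simplifying assumptions that are not from the original comment: a finite set of hashable world states (none of them `None`), a hypothetical `successors` function giving the states reachable by one action, and a hypothetical `preferable(prev, nxt)` judgment. It uses a plain breadth-first search.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def find_preferred_path(
    start: Hashable,
    bad: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],  # hypothetical: states reachable in one action
    preferable: Callable[[Hashable, Hashable], bool],      # hypothetical: is nxt preferable given prev?
) -> Optional[List[Hashable]]:
    """Return a path start -> ... -> bad in which every step is judged
    preferable from the previous state, or None if no such path exists.
    Assumes a finite state space and that no state is None."""
    parent = {start: None}  # records how each state was first reached
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == bad:
            # Walk parent links back to the start to rebuild the chain.
            path = []
            node = current
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(current):
            # Only follow steps that look preferable from the current state.
            if nxt not in parent and preferable(current, nxt):
                parent[nxt] = current
                queue.append(nxt)
    return None  # B is unreachable via locally preferable steps
```

In a toy chain of integer states 0 to 5 where every step looks like a small improvement, `find_preferred_path(0, 5, lambda s: [s + 1] if s < 5 else [], lambda a, b: b > a)` finds the full chain to the "bad" state 5. A proof of safety would need the search to return `None`, and for the real world the state space cannot be enumerated like this.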
