by Ali
1 min read · 4th Mar 2024 · 1 comment
Ali · 2mo

I've only read a couple of pieces on corrigibility, but I'm curious why the following scheme wouldn't work; I haven't seen any work in this direction (from my admittedly very brief scan).

Suppose that at every time t some decision maker chooses a probability p(t) with which the AI is shut down, and suppose the AI's utility in any period was initially U(t). Now rescale the AI's utility function to U(t)/(1-p(t)). I think this can be quite easily generalized so that future periods of utility are likewise unaffected by an increase in the risk of shutdown: divide each period's U(t) by the product of (1-p(s)) over all intervening time periods s, i.e. by the probability of still being running when that period's utility is received. A rough sketch of this rescaling is below.
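Here is a minimal sketch of how that rescaling could be computed, assuming discrete time periods and that the shutdown lottery at time t resolves before period t's utility accrues. The function and variable names are mine, purely for illustration, not from any existing corrigibility proposal:

```python
from typing import Callable, Sequence


def rescaled_utilities(u: Callable[[int], float], p: Sequence[float]) -> list:
    """Rescale each period's utility so its ex-ante expected value is
    unaffected by the shutdown schedule p.

    u(t): base utility the AI would receive in period t if still running.
    p[t]: probability the decision maker shuts the AI down at time t
          (assumed to resolve before period t's utility accrues).
    Returns U(t) divided by the product of (1 - p[s]) for s <= t, i.e. by
    the probability of still being running when period t's utility arrives.
    """
    rescaled, survival = [], 1.0
    for t, p_t in enumerate(p):
        assert p_t < 1.0, "p(t) may approach 1 but must never equal it"
        survival *= 1.0 - p_t
        rescaled.append(u(t) / survival)
    return rescaled
```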

In this world the AI should be indifferent to changes in p(t) (so long as p(t) can get arbitrarily close to 1 but never actually reaches it), and so should take actions trying to maximize U whilst being indifferent to whether humans decide to shut it down.
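A toy Monte Carlo check of that indifference claim, under the same illustrative assumptions as the sketch above: the AI's expected total rescaled utility comes out the same whether the per-period shutdown probability is 0 or 0.5, so in expectation it gains nothing by manipulating p.

```python
import random


def expected_total_utility(u, p, n_samples=200_000, seed=0):
    """Monte Carlo estimate of expected total rescaled utility under a
    shutdown schedule p, where utility stops accruing once shut down."""
    rng = random.Random(seed)
    # Precompute the rescaled per-period utilities U(t) / prod(1 - p[s]).
    rescaled, survival = [], 1.0
    for t, p_t in enumerate(p):
        survival *= 1.0 - p_t
        rescaled.append(u(t) / survival)
    total = 0.0
    for _ in range(n_samples):
        for t, p_t in enumerate(p):
            if rng.random() < p_t:  # the decision maker shuts the AI down
                break
            total += rescaled[t]
    return total / n_samples


u = lambda t: 1.0  # toy base utility stream
print(expected_total_utility(u, [0.0, 0.0, 0.0]))  # ~3.0 with no shutdown risk
print(expected_total_utility(u, [0.5, 0.5, 0.5]))  # ~3.0 as well
```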