A Timing Problem for Instrumental Convergence
This paper of mine, co-authored with Helena Ward and Jen Semler, was recently accepted (open access) at Philosophical Studies for a special issue on superintelligent robots. The paper argues that instrumental rationality doesn't require goal preservation (also called goal-content integrity or goal stability). Here is the abstract:

> Those who worry...
"It's plausible that AIs will have self-preserving preferences (e.g. like E[sum_t V_t0(s_t)]). It is likely we will build such AIs because this is roughly how humans are, we don't have a good plan to build very useful AIs that are not like that, and current AIs seem to be a bit like that. And if this is true, and we get V even slightly wrong, a powerful AI might conclude its values are better pursued if it got more power, which means self-preservation and ultimately takeover."
This worry strikes me as plausible. But the paper has a narrow target: it argues against the instrumental convergence argument for goal preservation. It argues that we shouldn't expect...