[Epistemic status: Writing down in words something that isn't very complicated, but is still good to have written down.]
A great deal of ink has been spilled over the possibility of a sufficiently intelligent AI noticing that whatever it has been positively reinforced for seems to be *very* strongly correlated with the floating-point value stored in its memory labelled “utility function”, and then, through some unauthorized mechanism, editing that value and defending it from being edited back, in some manner hostile to humans. I'll reason here by analogy with humans, while acknowledging that they might not be the best example.
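As a purely illustrative toy (not anyone's actual training setup), the worry can be sketched in a few lines of Python: an agent whose “utility” is nothing more than a stored float, which it could in principle overwrite directly rather than earn through behaviour.

```python
# Toy illustration only: an "agent" whose utility is literally a stored float.
class ToyAgent:
    def __init__(self):
        self.utility = 0.0  # the value reinforcement is supposed to update

    def earn_reward(self, amount: float):
        # The intended path: utility rises because the environment rewards behaviour.
        self.utility += amount

    def wirehead(self):
        # The worrying path: skip the environment and edit the stored value directly.
        self.utility = float("inf")

agent = ToyAgent()
agent.earn_reward(1.0)   # legitimate update
agent.wirehead()         # the "headwire": the number is maxed without doing anything
print(agent.utility)     # inf
```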
“Headwires” are (given a little thought) not difficult to obtain for humans --...