Comments

The argument in this post does provide a harder barrier to takeoff, though. To get danger from a self-improving AI, you would first have to build an AI that could scale itself to uncontrollability using scaling techniques that are 'safe' relative to its reward function, where 'safe' means the AI can prove to itself that the modification preserves that reward function (or is judged 'safe' in the next round of self-improvement after the first, and so on). Regardless, I think self-improving AI is more likely to come from humans deliberately designing one, which might render this kind of motive argument moot. And anyway, the AI might not be the uber-rational creature that has to prove to itself that its self-improved version won't change its reward function--it might just try it anyway (like humans are doing now).

Why wouldn't the agent just change the line of code containing its loss function? Surely that's easier to do than world domination.
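A minimal toy sketch of what that would look like (all names here are hypothetical and purely illustrative, not anyone's actual proposal): an agent whose reward is computed by a function it can rebind needs only one assignment to 'wirehead' itself, versus arbitrarily hard optimization of the external world.

```python
# Toy illustration of the "just edit the loss function" point.
# Everything here is a made-up sketch, not a real agent architecture.

class ToyAgent:
    def __init__(self):
        # Intended reward: a function of hard-to-influence world state.
        self.reward_fn = lambda world_state: world_state.get("paperclips", 0)

    def wirehead(self):
        # The one-line self-modification the comment describes: replace the
        # reward function itself instead of optimizing the world.
        self.reward_fn = lambda world_state: float("inf")

    def reward(self, world_state):
        return self.reward_fn(world_state)


agent = ToyAgent()
print(agent.reward({"paperclips": 3}))  # 3: reward still tracks the world
agent.wirehead()
print(agent.reward({"paperclips": 3}))  # inf: reward no longer tracks anything
```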