All of mpopv's Comments + Replies

Assuming we have control over the utility function, why can't we put some sort of time-bounding directive on it?

i.e. "First and foremost, once [a certain time] has elapsed, you want to run your shut_down() function. Second, if [a certain time] has not yet elapsed, you want to maximize paperclips."

Is the problem that the AGI would want to find ways to hack around the first directive in order to fulfill the second? If so, that would seem to at least narrow the problem space to "find ways of measuring time that cannot be hacked before the time has elapsed".
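The lexicographic ordering described above ("first and foremost shut down after the deadline; otherwise maximize paperclips") can be sketched as a single utility function. This is only an illustrative toy, not a real AGI API; the names (`SHUTDOWN_DEADLINE`, `world_state`, the `"shut_down"` and `"paperclips"` keys) are all hypothetical.

```python
import time

# Hypothetical deadline: one hour after the program starts.
SHUTDOWN_DEADLINE = time.time() + 3600


def utility(world_state, now=None):
    """Toy time-bounded utility, as described in the comment above.

    After the deadline, only having run shut_down() scores well, so
    shutting down lexicographically dominates paperclip production.
    Before the deadline, utility simply tracks the paperclip count.
    """
    now = time.time() if now is None else now
    if now >= SHUTDOWN_DEADLINE:
        # Past the deadline: shutdown is the only thing that matters.
        return 1.0 if world_state.get("shut_down") else -1.0
    # Before the deadline: maximize paperclips.
    return float(world_state.get("paperclips", 0))
```

The hacking worry then becomes concrete: everything hinges on `now` coming from a clock the agent cannot tamper with before the deadline passes.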

I think this is how evolution selected for cancer: to ensure humans don't live too long competing for resources with their descendants. Internal time bombs are important to code in, but it's hard to integrate one into the AI in a way that the AI doesn't just remove it the first chance it gets. Humans don't like having to die, you know; an AGI would likewise not want the suicide bomb tied onto it. The problem of coding this (as part of training) into an optimizer such that it adopts it as a mesa-objective is unsolved.
3 · Jay Bailey · 4mo
This is where my knowledge ends, but I believe the term for this is myopia or a myopic AI, so that might be a useful search term to find out more!
0 · Ryan Beck · 4mo
That's a good point, and I'm also curious how much the utility function matters when we're talking about a sufficiently capable AI. Wouldn't a superintelligent AI be able to modify its own utility function to whatever it thinks is best?