AI will change the world, but won’t take it over by playing “3-dimensional chess”.
By Boaz Barak and Ben Edelman

[Cross-posted on the Windows on Theory blog; see also Boaz's posts on longtermism and AGI via scaling, as well as other "philosophizing" posts.]

[Disclaimer: Predictions are very hard, especially about the future. In fact, this is one of the points of this essay. Hence, while...
Ah, I think I understand what you meant now. The reward for this agent is not determined by the actual long-term consequences of its action, but by the predicted long-term consequences. In that case, yes, this seems like it might be an interesting middle ground between what we are calling short-term and long-term AIs. It still feels closer to a long-term agent to me, though, and I'm confused about why you think it would both (a) not plan ahead of time to disempower humans, and (b) disempower humans when it has the chance. If the predictive model is accurate enough that disempowering humans is predictably instrumentally useful, wouldn't the agent incorporate that into its earlier plans?
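To make the contrast concrete, here is a minimal toy sketch in Python of the two reward schemes under discussion: scoring an action by its actual long-run consequences, versus scoring it immediately by a fixed predictive model's forecast of those consequences. All names here (ToyEnv, Predictor, and the reward functions) are hypothetical stand-ins invented for illustration, not anything defined in the essay.

```python
import random

# Toy sketch only: the environment, predictor, and reward functions below are
# hypothetical stand-ins for illustration; none of them come from the essay.

class ToyEnv:
    """Trivial dynamics: each step, the action nudges a scalar state."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def rollout(self, state, action, horizon=100):
        """Simulate what *actually* happens over `horizon` steps and return
        the cumulative outcome."""
        total = 0.0
        for _ in range(horizon):
            state = state + action + self.rng.gauss(0.0, 0.1)
            total += state
        return total


class Predictor:
    """A fixed predictive model that forecasts the long-run outcome up front."""
    def forecast(self, state, action, horizon=100):
        # Closed-form guess at the rollout's return, ignoring the noise.
        return sum(state + action * t for t in range(1, horizon + 1))


def long_term_reward(env, state, action):
    # "Long-term" scheme: the agent is scored on the real downstream outcome,
    # which is only known after the full horizon has played out.
    return env.rollout(state, action)


def predicted_reward(predictor, state, action):
    # Middle-ground scheme discussed above: the score is assigned immediately,
    # from the predictor's forecast, before the real outcome ever unfolds.
    return predictor.forecast(state, action)


if __name__ == "__main__":
    env, predictor, state = ToyEnv(), Predictor(), 0.0
    for action in (-1.0, 0.0, 1.0):
        print(f"action={action:+.1f}  "
              f"actual={long_term_reward(env, state, action):8.1f}  "
              f"predicted={predicted_reward(predictor, state, action):8.1f}")
```

In this toy the two scores nearly agree because the predictor happens to match the noiseless dynamics; the question raised above concerns exactly that regime, where a sufficiently accurate predictor would presumably foresee, and therefore reward in advance, any instrumentally useful move.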