Welcome to the Technical AI Safety Podcast, the show where I interview computer scientists about their papers. This month I covered "Optimal Policies Tend to Seek Power," which is closely related to "Seeking Power is Often Robustly Instrumental in MDPs," a post in the Reframing Impact sequence that was recently part of the 2019 Review.
The point of the show is to make papers more parsable. The interview features a detailed walkthrough, padded on either side by discussion of where the work came from and where it's going.
I had a lot of fun doing this month's episode; the paper was tricky to wrap my head around, but very rewarding. Do let me know if you have trouble finding it on your favorite podcast app. Thanks!
Show notes:
With Alex Turner
Optimal Policies Tend to Seek Power
by Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli
Abstract:
Some researchers have speculated that capable reinforcement learning agents are often incentivized to seek resources and power in pursuit of their objectives. While seeking power in order to optimize a misspecified objective, agents might be incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: human power-seeking instincts seem idiosyncratic, and these urges need not be present in reinforcement learning agents. We formalize a notion of power within the context of Markov decision processes. With respect to a class of neutral reward function distributions, we provide sufficient conditions for when optimal policies tend to seek power over the environment.
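For listeners who want a preview of the formalism discussed in the episode, here is a rough sketch of the paper's central definition as I understand it (notation approximate; see the paper for the precise statement and conditions). POWER at a state is, roughly, the normalized average optimal value an agent could attain from that state, averaged over a distribution of reward functions:

\[
\mathrm{POWER}_{\mathcal{D}}(s, \gamma) := \frac{1-\gamma}{\gamma}\, \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^{*}_{R}(s, \gamma) - R(s) \right]
\]

where \(\mathcal{D}\) is a distribution over reward functions, \(\gamma\) is the discount rate, and \(V^{*}_{R}(s, \gamma)\) is the optimal value of state \(s\) under reward function \(R\). Intuitively, states that keep more options open tend to be valuable under many different reward functions and so have higher POWER; the paper's results give sufficient conditions under which optimal policies tend to steer toward such states.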