Instrumental Convergence

Experimental Evidence

The question of instrumentally convergent drives potentially arising in machine learning models is explored in the paper - Optimal Policies Tend To Seek Power. The authors explored instrumental convergence (specifically power-seeking behavior) as a statistical tendency of optimal policies in reinforcement learning (RL) agents.

The authors focus on Markov Decision Processes (MDPs) and prove that certain environmental symmetries are sufficient for optimal policies to seek power in the environment. They formalize power as the ability to achieve a wide range of goals. Within this formalization, the authors show that most reward functions make it optimal to try and seek power since this allows for keeping a wide range of options available to the agent.

This provides a counter to the claim that instrumental convergence is merely an anthropomorphic theoretical tendency, and that human-like power-seeking instincts will not arise in RL agents.

Applied to Deceptive Alignment by Noosphere89 6mo ago