Impact regularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.
How do you rigorously define "low impact" in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don't want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us.
Currently, impact regularization research focuses on two approaches:
For a review of earlier work, see A Survey of Early Impact Measures.
Sequences on impact regularization: