Impact Regularization

Edited by TurnTrout, Multicore, et al.; last updated 30th Dec 2024

Impact regularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want it to accomplish its goals with as little impact on the world as possible. For example, suppose you reward an AI for crossing a room: to maximize time-discounted total reward, the optimal policy sprints to the other side, making a huge mess along the way.

How do you rigorously define "low impact" in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don't want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us. 
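Whatever the impact measure, the basic scheme is usually to subtract a scaled impact penalty from the task reward. A minimal sketch, assuming some penalty function has already been computed (the function name, arguments, and numbers below are illustrative, not from any particular paper):

```python
def regularized_reward(task_reward, impact_penalty, lam=0.5):
    """Impact-regularized reward: R'(s, a) = R(s, a) - lambda * penalty(s, a).

    task_reward: reward the base objective assigns to the action
    impact_penalty: nonnegative measure of how much the action changes the world
    lam: trade-off coefficient between task performance and low impact
    """
    return task_reward - lam * impact_penalty

# With the room-crossing example: the messy sprint (high impact) now scores
# worse than a careful walk (low impact), even though both finish the task.
sprint = regularized_reward(1.0, 10.0)  # -> -4.0
walk = regularized_reward(1.0, 0.5)     # ->  0.75
```

The hard part, of course, is everything hidden inside `impact_penalty`; the two approaches below are competing proposals for what that term should be.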

Currently, impact regularization research focuses on two approaches:

  • Relative reachability: the AI preserves its ability to reach many kinds of world-states. The hope is that by staying able to reach many goal states, the AI stays able to reach the correct goal state.
  • Attainable utility preservation: the AI preserves its ability to achieve one or more auxiliary goals. The hope is that by penalizing gaining or losing control over the future, the AI doesn't take control away from us.
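The attainable utility preservation idea can be made concrete with a toy sketch: penalize the difference between the agent's auxiliary-goal Q-values after acting versus after doing nothing. This is a simplified illustration (the Q-value numbers are invented), assuming the auxiliary Q-functions have already been learned:

```python
def aup_penalty(q_aux_action, q_aux_noop):
    """Toy attainable utility preservation penalty.

    q_aux_action: auxiliary Q-values Q_i(s, a) for the proposed action
    q_aux_noop:   auxiliary Q-values Q_i(s, noop) for doing nothing
    The absolute value penalizes both gaining and losing attainable utility,
    so power-seeking (which raises Q for most goals) is penalized just like
    destroying options.
    """
    assert len(q_aux_action) == len(q_aux_noop)
    diffs = [abs(qa - qn) for qa, qn in zip(q_aux_action, q_aux_noop)]
    return sum(diffs) / len(diffs)

# An action that shifts attainable utility a lot (e.g. seizing resources)
# is penalized more than one that leaves the agent's options mostly alone:
big_shift = aup_penalty([5.0, 0.0], [1.0, 2.0])    # -> 3.0
small_shift = aup_penalty([1.5, 2.0], [1.0, 2.0])  # -> 0.25
```

Relative reachability works analogously, but the auxiliary quantity is reachability of states rather than attainable utility of goals.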

For a review of earlier work, see A Survey of Early Impact Measures. 

Sequences on impact regularization:

  • Reframing Impact: we're impacted when we become more or less able to achieve our goals. Seemingly, goal-directed AI systems are only incentivized to catastrophically impact us in order to gain power to achieve their own goals. To avoid catastrophic impact, what if we penalize the AI for gaining power?
  • Subagents and Impact Measures explores how subagents can circumvent current impact measure formalizations.

Related tags: Instrumental Convergence, Corrigibility, Mild Optimization.

Posts tagged Impact Regularization
  • Reframing Impact (97 points; TurnTrout, 6y, 15 comments)
  • Attainable Utility Preservation: Concepts (38 points; TurnTrout, 6y, 20 comments)
  • Towards a New Impact Measure (103 points; TurnTrout, 7y, 159 comments)
  • Tradeoff between desirable properties for baseline choices in impact measures (37 points; Vika, 5y, 24 comments)
  • Impact measurement and value-neutrality verification (31 points; evhub, 6y, 13 comments)
  • Best reasons for pessimism about impact of impact measures? [Question] (60 points; TurnTrout, Vaniver, 6y, 55 comments)
  • Worrying about the Vase: Whitelisting (73 points; TurnTrout, 7y, 26 comments)
  • Attainable Utility Theory: Why Things Matter (73 points; TurnTrout, 6y, 24 comments)
  • Deducing Impact (72 points; TurnTrout, 6y, 28 comments)
  • Value Impact (70 points; TurnTrout, 6y, 10 comments)
  • World State is the Wrong Abstraction for Impact (67 points; TurnTrout, 6y, 19 comments)
  • Attainable Utility Preservation: Empirical Results (66 points; TurnTrout, nealeratzlaff, 6y, 8 comments)
  • The Gears of Impact (54 points; TurnTrout, 6y, 16 comments)
  • Attainable Utility Landscape: How The World Is Changed (52 points; TurnTrout, 6y, 7 comments)
  • The Catastrophic Convergence Conjecture (45 points; TurnTrout, 6y, 16 comments)
(Showing 15 of 52 posts.)