Alignment Tax

Edited by markov, et al. last updated 30th Dec 2024

Alignment Tax (sometimes called a safety tax) is the extra cost of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative. The term 'tax' can be misleading: in the safety literature, 'alignment/safety tax' or 'alignment cost' refers to any added cost of alignment, such as increased developer time, extra compute, or decreased performance, and not only to the financial cost required to build an aligned system.

To get a better sense of what the alignment tax is, consider the two edge cases. The best-case scenario is No Tax: aligning the system costs no performance, so there is no reason to deploy an AI that is not aligned; we might as well align it. The worst-case scenario is Max Tax: aligning the system costs all of its performance, making alignment functionally impossible; you either deploy an unaligned system, or you get no benefit from AI systems at all. We expect reality to fall somewhere between these two scenarios.
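One illustrative way to make these endpoints concrete (a hypothetical formalization, not from the source, which also counts developer time and compute as part of the tax) is to measure the tax as the fraction of performance forgone by deploying the aligned system:

```python
def alignment_tax(perf_unaligned: float, perf_aligned: float) -> float:
    """Illustrative alignment tax: the fraction of performance lost by
    deploying the aligned system instead of the unaligned alternative.
    (Hypothetical metric; real 'tax' also includes compute and dev time.)"""
    if perf_unaligned <= 0:
        raise ValueError("unaligned performance must be positive")
    return (perf_unaligned - perf_aligned) / perf_unaligned

# No Tax: the aligned system matches the unaligned one.
print(alignment_tax(100.0, 100.0))  # → 0.0
# Max Tax: aligning destroys all performance.
print(alignment_tax(100.0, 0.0))    # → 1.0
# Expected reality: somewhere in between.
print(alignment_tax(100.0, 85.0))   # → 0.15
```

Under this toy metric, No Tax corresponds to a tax of 0 and Max Tax to a tax of 1.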

Paul Christiano distinguishes two main approaches for dealing with the alignment tax.[1][2]  

  • The first is to have the will to pay the tax, i.e. the relevant actors (corporations, governments, etc.) must be willing to bear the extra costs and hold off on deploying a system until it is aligned.

  • The second is to reduce the tax by differentially advancing existing alignable algorithms or by making existing algorithms more alignable. This means, for any potentially unaligned algorithm, ensuring the additional cost for an aligned version of the algorithm is low enough that the developers would be willing to pay it.
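The two approaches above can be caricatured in a toy deployment decision (hypothetical function and numbers, assuming the fractional-tax framing): an actor deploys the aligned system only if the tax does not exceed what they are willing to pay. Both levers map onto this check: the first approach raises the willingness to pay, the second lowers the tax itself.

```python
def will_deploy_aligned(tax: float, willingness_to_pay: float) -> bool:
    """Toy model: deploy the aligned system only when the tax (fraction
    of performance forgone) is no more than the actor will tolerate."""
    return tax <= willingness_to_pay

# Approach 1: an actor willing to absorb up to a 20% performance hit
# will pay a 15% tax.
print(will_deploy_aligned(0.15, 0.20))  # → True
# Without either lever, even a small tax can exceed a reluctant
# actor's threshold.
print(will_deploy_aligned(0.05, 0.02))  # → False
```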

Further reading

  • Askell, Amanda et al. (2021) A general language assistant as a laboratory for alignment, arXiv:2112.00861 [cs].
  • Xu, Mark & Carl Shulman (2021) Rogue AGI embodies valuable intellectual property, LessWrong, June 3.
  • Yudkowsky, Eliezer (2017) Aligning an AGI adds significant development time, Arbital, February 22.
  • Christiano, Paul (2020) Current work in AI alignment.
Posts tagged Alignment Tax

  • The case for a negative alignment tax (Cameron Berg, Judd Rosenblatt, Diogo de Lucena, AE Studio)
  • Alignment can be the 'clean energy' of AI (Cameron Berg, Judd Rosenblatt, AE Studio)
  • Safety-capabilities tradeoff dials are inevitable in AGI (Steven Byrnes)
  • Against ubiquitous alignment taxes (beren)
  • The case for removing alignment and ML research from the training dataset (beren)
  • How difficult is AI Alignment? (Sammy Martin)
  • Safety tax functions (owencb)
  • [Linkpost] Jan Leike on three kinds of alignment taxes (Orpheus16)
  • AI safety tax dynamics (owencb)
  • Ten Levels of AI Alignment Difficulty (Sammy Martin)
  • Security Mindset and the Logistic Success Curve (Eliezer Yudkowsky)
  • On the Importance of Open Sourcing Reward Models (elandgre)
  • Labor Participation is a High-Priority AI Alignment Risk (alex)
  • The commercial incentive to intentionally train AI to deceive us (Derek M. Jones)