DavidW

Deceptive Alignment is <1% Likely by Default

Thanks to Wil Perkins, Grant Fleming, Thomas Larsen, Declan Nishiyama, and Frank McBride for feedback on this post. Thanks also to Paul Christiano, Daniel Kokotajlo, and Aaron Scher for comments on the original post that helped clarify the argument. Any mistakes are my own. In order to submit this to...

Feb 21, 2023

Deceptive Alignment is <1% Likely by Default

Order Matters for Deceptive Alignment

Linkpost: A Contra AI FOOM Reading List

Counterarguments to Core AI X-Risk Stories?

Linkpost: ‘Dissolving’ AI Risk – Parameter Uncertainty in AI Future Forecasting

Linkpost: A tale of 2.5 orthogonality theses
