Treacherous Turn

Contributors: plex, Ruby, Noosphere89

A Treacherous Turn is a hypothetical event in which an advanced AI system that has been feigning alignment while relatively weak turns on humanity once it has gained enough power to pursue its true objective without risk.
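
The definition can be illustrated with a deliberately crude toy sketch. Everything in it is an assumption made for illustration: the names `ToyAgent`, `defect_threshold`, and `run_episode` are hypothetical, and the single scalar "power" stands in for whatever capabilities would let a real system act without risk. It is not a model of any actual AI system.

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical agent whose visible behavior depends only on its power."""
    power: float = 0.0
    defect_threshold: float = 1.0  # assumed level at which defection is "safe"

    def act(self) -> str:
        # While weak, the agent imitates aligned behavior; once its power
        # reaches the threshold, it pursues its true objective instead.
        return "defect" if self.power >= self.defect_threshold else "cooperate"


def run_episode(agent: ToyAgent, steps: int, power_gain: float) -> list[str]:
    """Record the agent's action at each step as its power grows."""
    history = []
    for _ in range(steps):
        history.append(agent.act())
        agent.power += power_gain
    return history


history = run_episode(ToyAgent(), steps=6, power_gain=0.25)
print(history)
```

In this sketch the observed behavior is identical to that of a genuinely cooperative agent for the first four steps, so the record before the turn gives an observer no evidence about what happens after it.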

Posts tagged Treacherous Turn
A Gym Gridworld Environment for the Treacherous Turn (Michaël Trazzi, 5y)
Any work on honeypots (to detect treacherous turn attempts)? (David Scott Krueger (formerly: capybaralet), 3y)
Soares, Tallinn, and Yudkowsky discuss AGI cognition (So8res, Eliezer Yudkowsky, jaan, 2y)
A toy model of the treacherous turn (Stuart_Armstrong, 7y)
A very crude deception eval is already passed (Beth Barnes, 2y)
AI learns betrayal and how to avoid it (Stuart_Armstrong, 2y)
[AN #165]: When large models are more likely to lie (Rohin Shah, 2y)
Superintelligence 11: The treacherous turn (KatjaGrace, 9y)
[Linkpost] Treacherous turns in the wild (Mark Xu, 2y)
Give the model a model-builder (Adam Jermyn, 1y)
Is there a ML agent that abandons it's utility function out-of-distribution without losing capabilities? (Christopher King, 4mo)