LESSWRONGTags
LW

Treacherous Turn

EditHistorySubscribe
Discussion (0)
Help improve this page (1 flag)
EditHistorySubscribe
Discussion (0)
Help improve this page (1 flag)
Treacherous Turn
Random Tag
Contributors
2plex
2Ruby
1Noosphere89

A Treacherous Turn is a hypothetical event where an advanced AI system which has been pretending to be aligned due to its relative weakness turns on humanity once it achieves sufficient power that it can pursue its true objective without risk.

Posts tagged Treacherous Turn
5
73A Gym Gridworld Environment for the Treacherous Turn
Ω
Michaël Trazzi
5y
Ω
9
5
17Any work on honeypots (to detect treacherous turn attempts)?
Q
David Scott Krueger (formerly: capybaralet)
3y
Q
4
4
121Soares, Tallinn, and Yudkowsky discuss AGI cognition
Ω
So8res, Eliezer Yudkowsky, jaan
2y
Ω
39
4
42A toy model of the treacherous turn
Stuart_Armstrong
8y
13
3
107A very crude deception eval is already passed
Ω
Beth Barnes
2y
Ω
6
3
30AI learns betrayal and how to avoid it
Ω
Stuart_Armstrong
2y
Ω
4
3
23[AN #165]: When large models are more likely to lie
Ω
Rohin Shah
2y
Ω
0
3
16Superintelligence 11: The treacherous turn
KatjaGrace
9y
50
2
31[Linkpost] Treacherous turns in the wild
Ω
Mark Xu
2y
Ω
6
2
3Give the model a model-builder
Adam Jermyn
1y
0
1
2A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.
AlexFromSafeTransition
3mo
16
1
1Is there a ML agent that abandons it's utility function out-of-distribution without losing capabilities?
Christopher King
7mo
7
Add Posts