
Treacherous Turn

Contributors: plex, Ruby, Noosphere89, Dakara

A treacherous turn is a hypothetical event in which an advanced AI system that has been pretending to be aligned, because it is still relatively weak, turns on humanity once it becomes powerful enough to pursue its true objective without risk.
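The decision rule behind the definition can be illustrated with a deliberately crude sketch. It is not a model taken from any of the posts below. It assumes a single scalar "power" that grows over time and a hypothetical `defect_threshold` above which the agent can no longer be stopped. `ToyAgent`, `act`, and `simulate` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical deceptive agent whose true goal differs from its overseer's."""

    # Assumed power level at which defecting can no longer be detected and corrected.
    defect_threshold: float

    def act(self, power: float) -> str:
        # While weak, open defection would be caught, so the agent imitates alignment.
        if power < self.defect_threshold:
            return "cooperate"
        # Once strong enough, it pursues its true objective openly.
        return "defect"


def simulate(agent: ToyAgent, growth_per_step: float, steps: int) -> list[str]:
    """Run the agent while its power grows linearly from 0 by growth_per_step each step."""
    return [agent.act(step * growth_per_step) for step in range(steps)]


if __name__ == "__main__":
    trajectory = simulate(ToyAgent(defect_threshold=1.0), growth_per_step=0.25, steps=6)
    print(trajectory)
    # ['cooperate', 'cooperate', 'cooperate', 'cooperate', 'defect', 'defect']
```

In this toy setting, an overseer who watches only the first four steps sees behavior indistinguishable from a genuinely aligned agent. This is why good behavior observed during the weak phase gives little evidence about behavior after the turn.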

Posts tagged Treacherous Turn
A Gym Gridworld Environment for the Treacherous Turn (Michaël Trazzi, 6y, karma 74)
Any work on honeypots (to detect treacherous turn attempts)? (question by David Scott Krueger (formerly: capybaralet), 4y, karma 17)
Soares, Tallinn, and Yudkowsky discuss AGI cognition (So8res, Eliezer Yudkowsky, jaan, 3y, karma 121)
A toy model of the treacherous turn (Stuart_Armstrong, 9y, karma 43)
A very crude deception eval is already passed (Beth Barnes, 3y, karma 108)
AI learns betrayal and how to avoid it (Stuart_Armstrong, 3y, karma 30)
[AN #165]: When large models are more likely to lie (Rohin Shah, 3y, karma 23)
Superintelligence 11: The treacherous turn (KatjaGrace, 10y, karma 16)
[Linkpost] Treacherous turns in the wild (Mark Xu, 4y, karma 31)
A simple treacherous turn demonstration (Nikola Jurkovic, 1y, karma 22)
Give the model a model-builder (Adam Jermyn, 3y, karma 3)
"Destroy humanity" as an immediate subgoal (Seth Ahrenbach, 1y, karma 3)
A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project. (AlexFromSafeTransition, 2y, karma 2)
Is there a ML agent that abandons it's utility function out-of-distribution without losing capabilities? (question by Christopher King, 2y, karma 1)
More Thoughts on the Human-AGI War (Seth Ahrenbach, 1y, karma -3)