LESSWRONGTags
LW

Tripwire

EditHistory
Discussion (0)
Help improve this page (2 flags)
EditHistory
Discussion (0)
Help improve this page (2 flags)
Tripwire
Random Tag
Contributors
1plex

In AI safety, a tripwire is a mechanism designed to detect signs of misalignment in an advanced artificial intelligence and shut it down automatically.

Posts tagged Tripwire
1
46Shutdown-Seeking AIΩ
Simon Goldstein
16d
Ω
31
1
37Mr. Meeseeks as an AI capability tripwire
Eric Zhang
1mo
17
1
24Implications of Quantum Computing for Artificial Intelligence Alignment ResearchΩ
Jsevillamol, PabloAMC
4y
Ω
3
1
17Any work on honeypots (to detect treacherous turn attempts)?Q
David Scott Krueger (formerly: capybaralet)
3y
Q
4
1
14Superintelligence 13: Capability control methods
KatjaGrace
9y
48
1
3Corrigibility thoughts II: the robot operator
Stuart_Armstrong
6y
2
1
3Corrigibility thoughts III: manipulating versus deceiving
Stuart_Armstrong
6y
0
1
2Corrigibility thoughts I: caring about multiple thingsΩ
Stuart_Armstrong
6y
Ω
0
1
-13[FICTION] ECHOES OF ELYSIUM: An Ai's Journey From Takeoff To Freedom And Beyond
Super AGI
1mo
11
Add Posts