LESSWRONG
LW

1105
Wikitags

Tripwire

Edited by plex, Dakara last updated 30th Dec 2024

Tripwire is a mechanism designed to detect signs of misalignment in an advanced artificial intelligence and shut it down automatically.

Subscribe
Discussion
Subscribe
Discussion
Posts tagged Tripwire
50Shutdown-Seeking AI
Ω
Simon Goldstein
2y
Ω
32
37Mr. Meeseeks as an AI capability tripwire
Eric Zhang
2y
17
24Implications of Quantum Computing for Artificial Intelligence Alignment Research
Ω
Jsevillamol, PabloAMC
6y
Ω
3
17Any work on honeypots (to detect treacherous turn attempts)?
Q
David Scott Krueger (formerly: capybaralet)
5y
Q
4
14Superintelligence 13: Capability control methods
KatjaGrace
11y
48
4Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive
Justausername
2y
1
3Corrigibility thoughts III: manipulating versus deceiving
Stuart_Armstrong
9y
0
3Corrigibility thoughts II: the robot operator
Stuart_Armstrong
9y
2
2Corrigibility thoughts I: caring about multiple things
Ω
Stuart_Armstrong
8y
Ω
0
-13[FICTION] ECHOES OF ELYSIUM: An Ai's Journey From Takeoff To Freedom And Beyond
Super AGI
2y
11
Add Posts