LESSWRONG
AI Control
• Applied to SIGMI Certification Criteria by a littoral wizard 1d ago
• Applied to Thoughts on the conservative assumptions in AI control by Buck 4d ago
• Applied to Topological Debate Framework by lunatic_at_large 5d ago
• Applied to What’s the short timeline plan? by Marius Hobbhahn 19d ago
• Applied to Are Sparse Autoencoders a good idea for AI control? by Gerard Boxo 26d ago
• Applied to Reduce AI Self-Allegiance by saying "he" instead of "I" by Knight Lee 1mo ago
• Applied to Measuring whether AIs can statelessly strategize to subvert security measures by Alex Mallen 1mo ago
• Applied to A toy evaluation of inference code tampering by Gunnar_Zarncke 1mo ago
• Applied to The Queen’s Dilemma: A Paradox of Control by Raemon 2mo ago
• Applied to Why imperfect adversarial robustness doesn't doom AI control by Raemon 2mo ago
• Applied to Using Dangerous AI, But Safely? by Raemon 2mo ago
• Applied to Sabotage Evaluations for Frontier Models by Buck 2mo ago
• Applied to Win/continue/lose scenarios and execute/replace/audit protocols by Buck 2mo ago
• Applied to Toward Safety Cases For AI Scheming by Mikita Balesni 3mo ago
• Applied to Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans by Super AGI 3mo ago
• Applied to A Brief Explanation of AI Control by Aaron_Scher 3mo ago
• Applied to Coup probes: Catching catastrophes with probes trained off-policy by StefanHex 3mo ago