LESSWRONG

AI Auditing

Edited by Raemon, last updated 4th Aug 2025

Formerly known as "auditing games".

Posts tagged AI Auditing
- Automating Auditing: An ambitious concrete technical research proposal (evhub, 4y ago; 89 karma, 13 comments)
- A transparency and interpretability tech tree (evhub, 3y ago; 163 karma, 11 comments)
- Auditing language models for hidden objectives (Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub; 6mo ago; 141 karma, 15 comments)
- Towards Alignment Auditing as a Numbers-Go-Up Science (Sam Marks, 1mo ago; 114 karma, 15 comments)
- Putting up Bumpers (Sam Bowman, 4mo ago; 52 karma, 14 comments)
- What progress have we made on automated auditing? (question; LawrenceC, 1y ago; 38 karma, 1 comment)
- Auditing games for high-level interpretability (Paul Colognese, 3y ago; 33 karma, 1 comment)
- Hidden Cognition Detection Methods and Benchmarks (Paul Colognese, 2y ago; 22 karma, 11 comments)