LESSWRONG

AI Auditing

Edited by Raemon, last updated 4th Aug 2025

Formerly "auditing games"

Posts tagged AI Auditing
Automating Auditing: An ambitious concrete technical research proposal (Ω) | evhub | 4y | 89 karma | 13 comments
A transparency and interpretability tech tree (Ω) | evhub | 3y | 163 karma | 11 comments
Auditing language models for hidden objectives (Ω) | Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei Nishimura-Gasparian, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub | 7mo | 141 karma | 15 comments
Towards Alignment Auditing as a Numbers-Go-Up Science (Ω) | Sam Marks | 2mo | 123 karma | 15 comments
Putting up Bumpers (Ω) | Sam Bowman | 6mo | 54 karma | 14 comments
What progress have we made on automated auditing? (QΩ) | LawrenceC | 1y | 38 karma | 1 comment
Auditing games for high-level interpretability (Ω) | Paul Colognese | 3y | 33 karma | 1 comment
Hidden Cognition Detection Methods and Benchmarks | Paul Colognese | 2y | 22 karma | 11 comments