AI Auditing
Edited by Raemon, last updated 4th Aug 2025
Formerly "auditing games"
Posts tagged AI Auditing
Most Relevant
89 · Automating Auditing: An ambitious concrete technical research proposal · Ω · evhub · 4y · 13
163 · A transparency and interpretability tech tree · Ω · evhub · 3y · 11
141 · Auditing language models for hidden objectives · Ω · Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei Nishimura-Gasparian, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub · 7mo · 15
123 · Towards Alignment Auditing as a Numbers-Go-Up Science · Ω · Sam Marks · 2mo · 15
54 · Putting up Bumpers · Ω · Sam Bowman · 6mo · 14
38 · What progress have we made on automated auditing? · Q · Ω · LawrenceC · 1y · 1
33 · Auditing games for high-level interpretability · Ω · Paul Colognese · 3y · 1
22 · Hidden Cognition Detection Methods and Benchmarks · Paul Colognese · 2y · 11