LessWrong
GDM Mech Interp Progress Updates
73
[Summary] Progress Update #1 from the GDM Mech Interp Team
Ω
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár, Vikrant Varma
2y
Ω
0
80
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Ω
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár, Vikrant Varma
2y
Ω
10
48
The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research
Ω
Arthur Conmy, Neel Nanda
9mo
Ω
1
115
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
Ω
lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah, Neel Nanda
8mo
Ω
15
119
A Pragmatic Vision for Interpretability
Ω
Neel Nanda, Josh Engels, Arthur Conmy, Senthooran Rajamanoharan, bilalchughtai, CallumMcDougall, János Kramár, lewis smith
4d
Ω
24
58
How Can Interpretability Researchers Help AGI Go Well?
Ω
Neel Nanda, Josh Engels, Senthooran Rajamanoharan, Arthur Conmy, bilalchughtai, CallumMcDougall, János Kramár, lewis smith
4d
Ω
1