
LESSWRONG

reinthal — LessWrong

reinthal

Currently (2025) learning about AI safety to make sure AI is done right for mankind and beyond.

The Changing North Star of AI Control

On December 1st, 2025, the GDM mech interp team published a LessWrong article declaring a pivot to a pragmatic approach to interpretability. Much time had been lost chasing numbers in SAE reconstruction loss; they argued that optimizing such proxies did not get them closer to their north star: making AI...