QSED — LessWrong

On Developing a Mathematical Theory of Interpretability

QSED, 3y, 41

I'm skeptical, but I'd love to be convinced. I'm not sure that it's necessary to make interpretability scale, but it definitely strikes me as a potential trump card that would allow interpretability research to keep pace with capabilities research.

Here are a couple of relatively unsorted thoughts (keep in mind that I'm not a mathematician!):

Deep learning as a field isn't exactly known for its rigor. I don't know of any rigorous theory that isn't, as you say, purely 'reactive', and none of it has led to any significant 'real world' results. As far as I can tell

...