Interpretability through two lenses: biology and physics
> Interpretability is the nascent science of making the vast complexity of billion-parameter AI models more comprehensible to the human mind. Currently, the mainstream approach is reductionist: dissecting a model into many smaller components, much like a biologist mapping cellular pathways. Here, I describe and advocate for the complementary perspective...
I agree! LLMs are in many ways self-organized complex systems, and they are remarkably easy to experiment on.
Which questions about LLM interpretability do you think theoretical physics can help investigate?