LESSWRONG

raphael

physicist, latent spaces explorer
currently @ Cornell
raphaelsarfati.xyz     

Comments

Interpretability through two lenses: biology and physics
raphael · 18d · 10

I agree! LLMs are in many ways self-organized complex systems. And they are very easy to experiment on.
Which questions about LLM interpretability do you think theoretical physics can help investigate?

Interpretability through two lenses: biology and physics
raphael · 19d · 10

Thank you for the comment, I appreciate this perspective.
A couple of things to clarify:

As I quoted from "Emergence of brains", I think the more traditional approach of biology is to embrace the complexity of living systems. In contrast, physics tends to seek increasingly general, unifying principles (symmetries, order, curvature) that apply broadly across the universe, whether to matter, living systems, or even abstract phenomena like financial markets.

In the context of interpretability, I use this analogy to describe two main, distinct approaches: one is self-proclaimed "biological" and nowadays is mostly about breaking down models into features and circuits; the other is more emergentist and seeks to find, for example, unifying principles in how models structure internal representations.

Again, it's mostly an analogy, and both approaches are based on math (which, incidentally, is often similar to stat-mech math). And by "second law of thermodynamics" I did not mean it in a literal sense, but as an illustration of a principle that, without being very precise or practical, could provide a general guiding framework (e.g., perpetual motion is not possible).

To conclude, I'm not saying the messiness of features and circuits is not useful; I think it's actually fascinating. But I gently push for more recognition of the alternative approaches, which I believe will prove very illuminating.

23 · Interpretability through two lenses: biology and physics · 20d · 4