Transformers Represent Belief State Geometry in their Residual Stream

JoNeedsSleep · 16d

Thank you for the insightful post! You mentioned that:

"Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general - any dataset consisting of sequences of tokens can be represented as having been generated from an HMM."

and that the linear projection consists of:

"Linear regression from the residual stream activations (64 dimensional vectors) to the belief distributions (3 dimensional vectors)."

Given a natural language dataset for which we don't have the ground-truth belief distributions, would it be possible to reverse-engineer an HMM (i.e., fit one as a data model) and then use its belief states to extract the topology of the residual stream activations?

I've been running task-salient representation experiments on larger models and am very interested in replicating your result and possibly extending it to noisier settings.
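To make the quoted regression step concrete, here is a minimal sketch of how I understand it, under loud assumptions: the 3-state HMM is a randomly generated toy, and the 64-dimensional "activations" are a synthetic stand-in (a noisy linear embedding of the true beliefs), not activations from a trained transformer. The names `T`, `sample_and_filter`, and `W_embed` are hypothetical and not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy HMM with 3 hidden states and 3 tokens.
# T[x][i, j] = P(emit token x and move to state j | current state i).
n_states, n_tokens = 3, 3
joint = rng.dirichlet(np.ones(n_tokens * n_states), size=n_states)
T = joint.reshape(n_states, n_tokens, n_states).transpose(1, 0, 2)


def sample_and_filter(T, length, rng):
    """Sample a token sequence and track the Bayesian belief over hidden states."""
    n_tokens, n_states, _ = T.shape
    state = rng.integers(n_states)              # hidden state, uniform start
    belief = np.full(n_states, 1.0 / n_states)  # observer's uniform prior
    tokens, beliefs = [], []
    for _ in range(length):
        probs = T[:, state, :].ravel()          # over (token, next_state) pairs
        idx = rng.choice(n_tokens * n_states, p=probs / probs.sum())
        token, state = divmod(idx, n_states)
        belief = belief @ T[token]              # unnormalized posterior update
        belief = belief / belief.sum()
        tokens.append(token)
        beliefs.append(belief)
    return np.array(tokens), np.array(beliefs)


tokens, beliefs = sample_and_filter(T, length=2000, rng=rng)

# ASSUMPTION: synthetic stand-in for residual stream activations, i.e. a
# random linear embedding of the beliefs into 64 dimensions plus noise.
d_model = 64
W_embed = rng.normal(size=(n_states, d_model))
activations = beliefs @ W_embed + 0.01 * rng.normal(size=(len(beliefs), d_model))

# Linear regression (with intercept) from 64-dim activations to 3-dim beliefs.
X = np.hstack([activations, np.ones((len(activations), 1))])
coef, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
pred = X @ coef

ss_res = ((beliefs - pred) ** 2).sum()
ss_tot = ((beliefs - beliefs.mean(axis=0)) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 of linear probe: {r2:.4f}")
```

In the setting I am asking about, `beliefs` would not be available directly; they would have to come from an HMM fitted to the data, which is the part I am unsure how to do well for natural language.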
