alec_tschantz
alec_tschantz has not written any posts yet.

Great post; a few short comments:
Closing the action loop of active inference
There is a sense in which this loop is already closed: the sensory interface for an LLM is a discrete space of size context window × vocabulary that it both observes and acts upon. The environment is whatever else writes to this space, e.g., a human interlocutor. This description contains the necessary variables and dependencies to get an action-perception loop off the ground. One caveat is that action-perception loops usually have actions that influence the environment to generate desirable observations, whereas LLMs directly influence their observation space. However, there are counter-examples, such as LLMs generating questions that cause the environment (a user)...
Interesting, iterative attention mechanisms had always reminded me of predictive coding, where cross-attention encodes a kind of prediction error between the latent and data. But I could also see how self-attention could be read as a type of prediction error between tokens {0,...,n} and {1,...,n+1}.
There is some work comparing residual connections and iterative inference that may be of relevance; they show that such architectures "naturally encourage features to move along the negative gradient of loss during the feedforward phase". I expect some of these insights could be applied to the residual stream in transformers.