Thank you for the insightful post! You mentioned that:
Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general - any dataset consisting of sequences of tokens can be represented as having been generated from an HMM.
and the linear projection consists of:
Linear regression from the residual stream activations (64 dimensional vectors) to the belief distributions (3 dimensional vectors).
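For concreteness, here is a minimal sketch of the kind of regression described in that quote. It is not the post's actual code: the arrays `activations` and `beliefs` are random placeholders (an assumption) standing in for real residual stream activations and ground-truth belief states, and the bias column is also an assumption about how the fit is set up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data (assumption): a real replication would use residual
# stream activations from the transformer and belief states from the HMM.
n_samples, d_resid, n_states = 1000, 64, 3
activations = rng.normal(size=(n_samples, d_resid))
beliefs = rng.dirichlet(np.ones(n_states), size=n_samples)  # each row sums to 1

# Append a constant column so the map is affine rather than purely linear.
X = np.hstack([activations, np.ones((n_samples, 1))])

# Ordinary least squares: find W minimizing ||X @ W - beliefs||^2.
W, _, _, _ = np.linalg.lstsq(X, beliefs, rcond=None)

# Project activations into the 3-dimensional belief space.
predicted = X @ W
mse = float(np.mean((predicted - beliefs) ** 2))
print(W.shape, predicted.shape, mse)
```

With random placeholders the fit is of course poor; the point is only the shapes involved: a 64-dimensional input (plus bias) mapped to a 3-dimensional belief vector.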
Given an arbitrary natural language dataset for which we don't have the ground-truth belief distributions, is it possible to reverse engineer an HMM (as a data model) and then extract the topology of the residual stream activations?
I've been running task-salient representation experiments on larger models and am very interested in replicating your result and possibly extending it to noisier settings.
Eliezer seems on track to win: the current AI benchmark on IMO geometry problems stands at 27/30 (IMO gold-medalist human performance is 25.9/30). This new benchmark was set by an LLM-augmented neurosymbolic AI.
Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry [2024 April]