Using Base-LCM to Monitor LLMs
Epistemic status: experimental results. This is exploratory work examining an alternative approach to interpreting language models.

Summary

We aim to determine whether an LCM, a model that predicts sentence embeddings rather than token embeddings, can predict the outputs of an LLM. We compare four architectures, the most...