Éloïse Benito-Rodriguez

Using Base-LCM to Monitor LLMs

Epistemic status: experimental results. This is exploratory work examining an alternative approach to the interpretation of language models. Summary: We aim to determine whether the LCM model, which predicts sentence embeddings rather than token embeddings, can predict the outputs of LLMs. We compare four architectures, the most...

May 6

Using Base-LCM to Monitor LLMs

by NickyP and Éloïse Benito-Rodriguez

Epistemic status: experimental results. This is exploratory work examining an alternative approach to the interpretation of language models. Summary: We aim to determine whether the LCM model, which predicts sentence embeddings rather than token embeddings, can predict the outputs of LLMs. Using a dataset of 1 million...

Apr 20

Modelling Trajectories - Interim results

by NickyP, Einar Urdshals, Micurie, and Éloïse Benito-Rodriguez

Introduction. Note: These results have been in drafts for a year; see the discussion of how our thinking on these things has since moved on. Our team at AI Safety Camp has been working on a project to model the trajectories of language model outputs. We're interested in predicting...

Dec 4, 2025