x
Monitoring Internal Stability in Language Models as an Early Indicator of Failure — LessWrong