This post is a copy of the introduction of this paper on situational awareness in LLMs.
Authors: Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans.
Abstract
We aim to better understand the emergence of situational awareness in large language models (LLMs). A model is situationally aware if it's aware that it's a model and can recognize whether it's currently in testing or deployment. Today's LLMs are tested for safety and alignment before they are deployed. An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment.
Situational awareness may emerge unexpectedly as a byproduct of model scaling.