Edmund Mills — LessWrong

Storytelling Makes GPT-3.5 Deontologist: Unexpected Effects of Context on LLM Behavior

TL;DR When prompted to make decisions, do large language models (LLMs) show power-seeking behavior, self-preservation instincts, and long-term goals? Discovering Language Model Behaviors with Model-Written Evaluations (Perez et al.) introduced a set of evaluations for these behaviors, along with other dimensions of LLM self-identity, personality, views, and decision-making. Ideally, we’d...

Mar 14, 2023