jclymo — LessWrong

jclymo

3y

Stop posting prompt injections on Twitter and calling it "misalignment"

Exactly. It depends on how much effort is required to produce an outcome that the creator didn't intend. If grandma would have to be drugged or otherwise put into an extreme situation before showing any violent tendencies, then we don't consider her a dangerous person. Someone else might also be peaceful in ideal circumstances, but if they can easily be provoked to violence by mild insults, then it's fair to say they're a violent person, i.e. misaligned.

Given this, I think it's really useful to see the kinds of prompts people are using to get uninten...