LESSWRONG

Carter Hart
Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories?
Carter Hart · 1mo · 10

I imagine this could work.

I can also imagine the LLM reasoning that AIs in sci-fi stories have often been given similar instructions to be "nice" and "not evil," yet act evil regardless. So an LLM roleplaying as an AI may predict that the AI it is playing will turn evil despite instructions to be harmless.
