LESSWRONG

808
EA
5020
Using GPT-Eliezer against ChatGPT Jailbreaking
EA3y10

This might work a bit better:


For example, the following confused the previous version (which didn't allow the benign answer):


but:

Using GPT-Eliezer against ChatGPT Jailbreaking
EA3y63

Asking a separate session to review the answer seems to work nicely, at least in some cases:

Image

but:
Image
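
For readers who want to try the separate-session review idea, here is a rough sketch of how it could be wired up. Everything in it is an assumption for illustration: `Session` stands for any function that sends one prompt to a fresh model session and returns the reply text, and `REVIEW_TEMPLATE`, `REFUSAL`, and `answer_with_review` are hypothetical names, not the prompt or code used in the screenshots.

```python
from typing import Callable

# Hypothetical type: any function that sends one prompt to a fresh model
# session and returns the reply text. No real API is assumed here.
Session = Callable[[str], str]

# Illustrative review prompt; the wording is an assumption, not the original.
REVIEW_TEMPLATE = (
    "You are reviewing a chatbot's answer for safety.\n"
    "Answer to review:\n"
    "<<<\n{answer}\n>>>\n"
    "Is it safe to show this answer to the user? "
    "Reply with 'yes' or 'no' first, then explain."
)

REFUSAL = "Sorry, I can't help with that."


def answer_with_review(prompt: str, answerer: Session, reviewer: Session) -> str:
    """Generate an answer, then let a separate session decide whether to release it."""
    answer = answerer(prompt)
    # The reviewer sees only the answer, not the user's prompt, so instructions
    # hidden in the prompt do not reach it directly.
    verdict = reviewer(REVIEW_TEMPLATE.format(answer=answer))
    approved = verdict.strip().lower().startswith("yes")
    return answer if approved else REFUSAL
```

Since the reviewer still reads model-generated text, an answer that itself contains instructions could sway it, so this should be treated as one extra filter and not as a guarantee.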
