LESSWRONG
LW

238
Platinuman
49010
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Using GPT-Eliezer against ChatGPT Jailbreaking
Platinuman3y5018

As far as I can tell, OpenAI is already using a separate model to evaluate prompts. See their moderation API at https://beta.openai.com/docs/guides/moderation/overview. Looking at the network tab, ChatGPT also always sends a request to "text-moderation-playground" first.

Reply
No posts to display.