OpenAI lied about SFT vs. RLHF

sanxiyn

10th Feb 2025

This is a linkpost for https://x.com/BlancheMinerva/status/1887801044777807897

I used to think that while OpenAI is pretty deceitful (e.g., the for-profit conversion), it generally wouldn't lie about its research. This is a pretty definitive case of lying, so I have updated accordingly. I am posting it here because it doesn't seem to be widely known.

peterbarnett (1y)

This was discussed in the post "Update to Mysteries of mode collapse: text-davinci-002 not RLHF".
I don't think OpenAI explicitly lied about text-davinci-002 being the same model as InstructGPT. I think if you weren't very carefully reading OpenAI's documentation it was pretty easy to believe that text-davinci-002 was InstructGPT (and hence trained with RLHF). I don't think OpenAI as an organization did much to clear this up, although individual researchers did.

sanxiyn (1y)

"I think if you weren't carefully reading OpenAI's documentation it was pretty easy to believe that text-davinci-002 was InstructGPT (and hence trained with RLHF)."

Not only was it easy, many people in fact did (including myself). Can you point to a single case of people NOT making this reading mistake? That is, after the January 2022 instruction-following announcement, but before the October 2022 model index for researchers. The Jan Leike tweet you linked to postdates October 2022 and does not count. The allegation is that OpenAI lied (or at the very least was extremely misleading) for ten months of 2022. I am more ambivalent about the period after October 2022.
