Apparently, OpenAI's GPT-3 isn't entirely what it seems.


On the other hand, there does seem something funny about how GPT-3 presents this shiny surface where you can send it any query and it gives you an answer, while under the hood there are a bunch of freelancers busily checking all the responses and rewriting them to make the computer look smart.

It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels. The organization of the process is itself impressive, but it’s not quite what is advertised.

To be fair, OpenAI does state that “InstructGPT is then further fine-tuned on a dataset labeled by human labelers.” But this still seems misleading to me. It’s not just that the algorithm is fine-tuned on the dataset. It seems that these freelancers are being hired specifically to rewrite the output.

I don't really think it's misleading. It would be different if, the first time you submitted the inputs, they were relayed to a human who pretended to be an AI. But what's actually happening is that the AI does generate the responses, and then if humans notice that it produces bad responses, they train the AI to produce better responses in the future.

Suppose you were talking to a young child, who produced an incorrect answer to some question. Afterwards they'd talk to their parent who explained the right answer. The next day when you asked them the same question, they'd answer correctly.

Would you say that there's trickery involved, in that you are not really talking to the child? You could say that, kinda, since in a sense it's true that the correct answer actually came from the parent. But then almost everything that a young child knows gets soaked up from the people around them, so if you applied this argument then you might as well argue that the child doesn't know anything in the first place and that you're always talking to an adult or an older child through them. This seems pretty directly analogous to a language model, which has also soaked up all of its knowledge from humans.

It is definitely misleading, in the same sense that the performance of a model on the training data is misleading. The interesting question w.r.t. GPT-3 is "how well does it perform in novel settings?". And we can't really know that, because apparently even publicly available interfaces are inside the training loop.

Now, there's nothing wrong with training an AI like that! But the results then need to be interpreted with more care.

P.S.: sometimes children do parrot their parents to an alarming degree, e.g., about political positions they couldn't possibly have the context to truly understand.

The interesting question w.r.t. GPT-3 is "how well does it perform in novel settings?". And we can't really know that, because apparently even publicly available interfaces are inside the training loop.

OpenAI still lets you use older versions of GPT-3, if you want to experiment with ones that haven't had additional training.

P.S.: sometimes children do parrot their parents to an alarming degree, e.g., about political positions they couldn't possibly have the context to truly understand.

It's much better for children to parrot the political positions of their parents than to select randomly from the total space of political opinions. The vast majority of possible-political-opinion-space is unaligned.

If they're randomly picking from a list of possible political positions, I'd agree. However, I suspect that is not the realistic alternative to parroting their parents' political positions.

Maybe ideally it'd be rational reflection, to the best of their ability, on values and whatnot. However, if we had a switch to turn off parroting-parents'-political-positions we'd be in a weird space: children wouldn't even know about most political positions to choose from.

Right, but we wouldn't then use this as proof that our children are precocious politicians!

In this discussion, we need to keep separate the goals of making GPT-3 as useful a tool as possible, and of investigating what GPT-3 tells us about AI timelines.

It doesn't follow that a subset of well known political opinions is aligned, even with itself.

Suppose you were talking to a young child, who produced an incorrect answer to some question. Afterwards they’d talk to their parent who explained the right answer. The next day when you asked them the same question, they’d answer correctly.

I think it depends on the generalization. If you ask the exact same question, you cannot test whether the child acquired the prior knowledge needed to understand why the answer is correct. Also, reformulating the answer might not suffice, because we know that an algorithm like GPT-3 is already very good at that task. I think you would need to create a different question that requires similar prior knowledge to answer.

TL;DR: Thought this post was grossly misleading. Then I saw that the GPT3 playground/API changed quite a lot recently in notable and perhaps worrying ways. This post is closer to the truth than I thought but I still consider it misleading.

Initially strongly downvoted since the LW post implies (to me) that humans provide some of the GPT3 completions in order to fool users into thinking it's smarter than it is. Was that interpretation of your post more in the eye of the beholder?

Nested three layers deep is one of two pieces of actual evidence:

InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.

My impression was that InstructGPT was a new/separate model, available as an option in the API along with the base GPT3, that is openly finetuned with human feedback as a way of aligning the base model. That was the whole point of this paper: https://arxiv.org/abs/2203.02155

This is very different from what I saw this post as implying, because OpenAI are open about it, it's different from the main GPT3 and it's not humans providing completions but humans aligning a language model. Hence strong downvote.

(The examples about the completions for select "gotcha" prompts improving over time aren't very compelling evidence for what this post implies. The ones changing in a day are pretty compelling, though - how weird!)

Then I opened the GPT3 API playground for the first time in a few months and realized that my understanding was outdated. Looks like InstructGPT and the old non-finetuned davinci have been merged into text-davinci-002, which is now the default model.

Trying the "Does grape juice taste better if you add sour milk?" prompt many times over keeps giving me the exact same answer that Gary got, even with max temperature. To test where on the spectrum between "aligned using human review" and "repeating human answers" this lies, I tried some variations:

Does grape juice taste better if you add sour cream?

No, grape juice does not taste better if you add sour cream.

Does wine taste better if you add sour cream?

There is no definitive answer to this question as it depends on personal preferences. Some people might find that wine tastes better when paired with sour cream, while others may not notice a difference. Ultimately, it is up to the individual to decide whether or not they think wine tastes better with sour cream.

Should I add sour milk to grape juice in order to improve taste?

It is not necessary to add sour milk to grape juice in order to improve its taste. Grape juice is already a tart and flavorful beverage. However, if you enjoy the taste of sour milk, you may add it to grape juice to create a tart and refreshing drink.

While GPT3 might not literally outsource a portion of the requests to MTurk, I don't think it's unfair to say that some of the completions are straight-up human-provided. If the corrected completions were added in a way that generalized (e.g. aligning using human feedback, like in the paper), it would be a different story. But it clearly doesn't generalize.

So to recap:

  • the curation of InstructGPT is now in the default model
  • human completions are substituted within a day in response to publicized embarrassing completions (I'm alleging this)
  • human completions aren't added such that the model is aligned to give more helpful answers, because very similar prompts still give bad completions

In addition, and more intangibly, I'm noticing that GPT3 is not the model I used to know. The completions vary a lot less between runs. More strikingly, they have this distinct tone. It reads like a NYT expert fact checker or first page Google results for a medical query.

I tried one of my old saved prompts for a specific kind of fiction prompt and the completion was very dry and boring. The old models are still available and it works better there. But I won't speculate further since I don't have enough experience with the new (or the old) GPT3.


The conclusion:

In some sense this is all fine, it’s a sort of meta-learning where the components of the system include testers such as Gary Smith and those 40 contractors they hired through Upwork and ScaleAI. They can fix thousands of queries a day.

On the other hand, there does seem something funny about how GPT-3 presents this shiny surface where you can send it any query and it gives you an answer, while under the hood there are a bunch of freelancers busily checking all the responses and rewriting them to make the computer look smart.

It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels. The organization of the process is itself impressive, but it’s not quite what is advertised.

To be fair, OpenAI does state that “InstructGPT is then further fine-tuned on a dataset labeled by human labelers.” But this still seems misleading to me. It’s not just that the algorithm is fine-tuned on the dataset. It seems that these freelancers are being hired specifically to rewrite the output.

Specifically, check out figure 2 from the paper; the humans both 'provide demonstrations' (i.e. writing the completion given a prompt) and rank outputs from best to worst (the thing I had expected naively from 'supervised fine-tuning'). The model is presumably still generating the completions word-by-word in the normal way instead of just parroting back what a human wrote for it to say [noting that this is sort of silly because all of it is like that; what I mean is that it's still hybridizing all of its relevant inputs, instead of just pointing to one labeller input].
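For concreteness: the ranking step trains a separate reward model with a pairwise comparison loss, roughly -log σ(r(x, y_w) − r(x, y_l)) where y_w is the completion the labeler preferred over y_l. A minimal sketch of that loss in plain Python (scalar rewards standing in for the reward model's outputs; function names are my own, not from the paper):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def ranking_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise loss for one labeler comparison: small when the reward
    model already scores the human-preferred completion higher, large
    when it scores the rejected completion higher."""
    return -math.log(sigmoid(r_preferred - r_rejected))
```

When the two rewards are equal the loss is log 2, and it shrinks as the preferred completion's score pulls ahead. The point is that what gets learned is a scoring function aggregated over many comparisons, not a lookup table of human-written answers.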

I would like to "yes! and.." this practical point.

There is perhaps a deeper issue in widespread understanding of "computers in general" which is that a very large number of people don't seem to realize that this...

It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels.

...is how essentially all computer processes everywhere have always worked, just modulo more or fewer intermediating steps between "the people who build and turn hamster wheels" and "the appearance of a fancy machine".

Many of the best "decision support systems" are basically a translation of a way that smart people plan and act using white boards and kanban stuff to coordinate their actions.

Then the computer programming step is at least partly (inevitably) about alienating the guys manning the whiteboards from their own cultural knowledge and behavioral flexibility for the sake of convenience and speed and faster throughput and so on.

In the standard oligarchic framing that usually occurs AFTER people realize it is "hamster wheels all the way down" you finally get to human arguments about such mechanical systems, that focus on tweaking the system towards various operational modes (that are more or less amenable to various systems of hamster wheels), based on which outcomes programmers/PMs/owners actually desire, and also who owns the profits or bears the cost of success/failure. 

That's the politics part. There is always a politics part <3

A CS prof once told a story I haven't yet forgotten about his "first ever paid programming gig as a junior dev on a team with a fun computing puzzle" and the basic deal was that there was a trucking company, and the trucking company had to solve the 3D knapsack problem for every truck. (This is probably NP-hard. Assuming P!=NP, there is no efficient algorithm that always finds the optimal packing, and exhaustive search becomes impossible for more than maybe 20 "things to pack".) However, there was an old guy with a bad back, who would walk around in the warehouse and trucks, and tell young men with strong backs where to put each thing in each truck. (In my imagination, he would point with a cane.)

His truck packing solutions shaved like 30% off the total number of truck trips relative to anyone else, which they knew from the cost of him taking sick days, but his heuristics were hard to teach to anyone else in the warehouse.

Also, there was only one of him, and he was getting ready to retire.

So they hired programmers (including the young CS prof who had not switched to the academy yet) to build a system to boss around the young men with strong backs, and the programmers included "an expert system based on the old guy" but also could (under the hood) run other heuristic solvers and try a bunch of other things based on math and computational experiments and A/B testing and so on. 

The system never actually beat the old man, but eventually it was good enough, and the guy retired and the programmers got paid and moved on. No one followed up 5 or 10 or 20 years later to check on the trucking company. Maybe it was a disaster, or maybe not.

(The above is based on memory and has the potential to work better as an "urban legend". I could personally ask a specific CS prof for more details about "that specific old truck packing guy in that story you told" and maybe the old man only shaved 10% off the total truck trips needed? Or maybe 50%? Maybe the system beat the man. Don't trust my details. I'm trying to point to a macro-phenomenon larger than the specific details. Where humans are general reasoners, and their human knowledge is put into machines, and then humans obey these machines instead of EITHER other humans OR their own ability to reason and get better at reasoning.)

Normal humans (who cannot program because the median person seems to be, for one reason or another, "inalgorate") mostly don't notice this as a general social prototype for how computers work in general when they are deployed in place of humans. It is weird. I can think of cultural ideas to "fix" this state of affairs, but none so far that pass a "weirdness filter" <3

As noted in this comment, this might be an instance of being 'fooled by randomness', plus a misreading of the InstructGPT paper (the "40 contractors" he refers to were likely hired once to help train InstructGPT, not tweaking the model on an ongoing basis).

My understanding is that they have contractors working on an ongoing basis, but the number who are employed at any particular time is substantially lower than 40.

Off topic, but the title made me expect /r/totallynotrobots.
