From GPT to AGI

[-]gwern5y90

I would expect that with increased model size it will be possible to increase the attention field by a lot without much need for additional AI insight.

It's not model size/parameters, it's the cost of the self-attention at runtime.The number of parameters to expand self-attention is linear but the runtime memory consumption goes up quadratically. Even a GPT-2-117M can use up to like 300GB RAM if you increase the window to 30k. You need more efficient attention or alternative architectures.

[-]ChristianKl5y20

What exactly is 30k? When I try to calculate the value for GPT-3 it seems to me like 96 * 96 * 128 = 1179648 (1179k) is the resulting value for the 350 GB model.

[-]gwern5y20

As explained in the link, that is the size of the context window; past 30k, even TPU pod RAM is too small to run 117M with wider context windows as the RAM usage continues to explode quadratically.

I'm not sure what your calculation is supposed to be.

[-]ChristianKl5y20

The unit for your 30k seems to be BPEs (Byte pair encodings).

I found on https://www.gwern.net/GPT-3#dialogue:

The first limit is that it remains hobbled by the limited context window. GPT-3 has no form of memory or recurrence, so it cannot see anything outside its limited 2048 BPEs

If GPT-2 could have a context window of 30k BPEs with 300GB ram, could GPT-3 also have such a context window length? So it could be made 15 times as big as it's currently?

[-]gwern5y40

If you tweaked GPT-3 (let's assume the total parameter count remained the same so layers were made a little narrower or somesuch) to have a 30k BPE context, I think the RAM requirements would explode to the point where even the small layers couldn't fit their forward pass onto a single GPU. You can forget about training it too.

[-][missing username]5y50

FailedSave (the author of that guide) here, thanks for reading!

My understanding of that pin item is that it AI Dungeon has a certain amount of text it's going to feed into the model at every step. The pin item doesn't increase that size. The contents of the pin item take first priority, but they go at the "back" of the text block. So the disadvantages of using it are that you get less of the story fed in, and if you have a particular style you use for the pin item (short choppy sentences, say) that affects the style of the output as well. So adding more pin items is the same difficulty as adding any other amount of input.

(The AI also happens to think things in the pin item are relevant, so if a "new" character gets introduced or mentioned, they may get a name that's used in the pin item.)

I'm sure others here, and probably gwern in particular, have a better understanding of both GPT-3 and how the game interacts with it; I'm curious to know how close my understanding of the system matches what's actually going on.

[-][missing username]5y10

To engage more directly with what you're suggesting: You seem to be suggesting that the AI could read text, figure out the most important part, and feed it to itself to improve further outputs. Without a highly supervised step that not only figures out the most important part but optimizes it to be used effectively by the AI, I don't think that would be effective. Already the AI tends to get "stuck"--to pick up on a particular phrase or pattern as very important and keep repeating it--in situations what that's unexpected or unwanted.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

6

6

6