This question has been bothering me for months now:

This is because GPT-3, with 175 billion parameters, has been incredibly impressive and far better than GPT-2. If you follow me on YouTube, you already know I discovered it is capable of tokenizing text and generating n-grams, and in my next video I'll show how GPT-3 can even obfuscate computer code.

But it's almost scary to think about what a future GPT-X could be capable of when it's trillions of parameters in size. Commonly cited estimates put the human brain at something on the order of 100 trillion synapses, which makes me excited to think about trillion-parameter models and their ability to learn at human or superhuman levels.

How will GPT-X react to our prompts then? Will every poem it writes deserve a literary award? Will it question the triviality of our prompts in the first place and refuse to answer them? I don’t think this is the case. I do assume, however, that it will become more definitive in the predictions it makes and perhaps “stick to the script” more when generating longer passages, without going so far off track.

The end goal is really that trillion parameter count models may help us actually get to the underlying goals with AGI - perhaps - if they can develop architectures for deep learning models for us, improve the conceptual gaps in our understanding of intelligence, or write highly optimized code for training the models of the future. Perhaps, trillion parameter models can provide the bootloader for true AGI, but this is a far-fetched reality and it’s way too early to say.

I want to write a future post or release a video on why I think we need better benchmarks than SuperGLUE specifically for large-scale transformer models. The main reason is that we should have some overall benchmark of true generalization (outside of the boring field of NLP). Perhaps the dream would be to anticipate or predict capabilities that larger transformer models will gain as they scale up, including ones we don’t even know about yet.

In the same way there are theoretical frameworks for childhood development, I wonder if there are ways we could group large-scale transformer models and their development by their relative parameter counts. That way, we could potentially bucket their development stages and respective capabilities, see a little bit around the corner, and know what cognitive skills they may be capable of in the future at larger scales. I can understand if our knowledge of intelligence is too limited for now and seeing ahead like this may not be possible, in the same way an ant may not be able to anticipate the complex, abstract ideas that humans can grasp.
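To make the bucketing idea a bit more concrete, here is a minimal sketch that groups models by the order of magnitude of their parameter count. Everything in it is my own illustration rather than an established framework: the function names `scale_bucket` and `group_by_scale` are hypothetical, the GPT-2 figure (1.5 billion, its largest released size) is not from this post, and "GPT-X" at one trillion parameters is purely a placeholder. It says nothing about what skills each bucket would actually have; that mapping is exactly the open question.

```python
def scale_bucket(param_count: int) -> int:
    """Return the order of magnitude (power of ten) of a parameter count."""
    if param_count <= 0:
        raise ValueError("param_count must be a positive integer")
    # Counting digits avoids floating-point rounding at exact powers of ten.
    return len(str(param_count)) - 1


def group_by_scale(models: dict[str, int]) -> dict[int, list[str]]:
    """Group model names into buckets keyed by order of magnitude."""
    buckets: dict[int, list[str]] = {}
    for name, param_count in models.items():
        buckets.setdefault(scale_bucket(param_count), []).append(name)
    return buckets


# Illustrative entries only; "GPT-X" is a hypothetical future model.
models = {
    "GPT-2": 1_500_000_000,
    "GPT-3": 175_000_000_000,
    "GPT-X (hypothetical)": 1_000_000_000_000,
}

stages = group_by_scale(models)
for exponent in sorted(stages):
    print(f"10^{exponent} parameters: {', '.join(stages[exponent])}")
```

Order-of-magnitude buckets are just one possible choice here; a developmental framework might well want finer or uneven boundaries once we know where new capabilities actually show up.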

What do you think trillion-parameter models will be capable of? Are there any specific skills or features they might pick up that GPT-3 doesn’t have now? I’d be interested in learning more, so let me know in the comments below or directly on Twitter.


Check out the scaling laws papers OpenAI has been putting out.