
Regarding GPT-3, there is some discussion whether growing the model would transform it into an Oracle AI. I looked into the actual benchmark results (Appendix H in the paper) to see if we can predict something useful from the actual measurements.

Method: The OpenAI team ran a suite of 63 different benchmarks (including sub-types), each in zero-, one-, and few-shot settings. Each setting was run at 8 model sizes. I looked at how results scale with model size. With only 8 measurements per curve, any prediction carries a large uncertainty. Formally, one would test the trend function using Bayesian model selection between, e.g., a linear and a polynomial model. I did this for a few benchmarks and eyeballed the rest, so please take the following as an indication only.
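To make the model comparison concrete, here is a minimal sketch that compares a linear and a quadratic trend in log10(parameters). It uses the Bayesian information criterion (BIC) as a rough stand-in for full Bayesian model selection, which is my simplification and not necessarily what was done for this post. The model sizes are the eight from the GPT-3 paper; the accuracy values are made up for illustration and are not taken from Appendix H.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b for coefficients c[0] + c[1]*x + c[2]*x^2 + ...
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - s) / A[row][row]
    return coeffs

def bic(xs, ys, degree):
    """BIC for a polynomial fit with Gaussian errors; lower is better."""
    coeffs = polyfit(xs, ys, degree)
    rss = sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    n, k = len(xs), degree + 1
    # Guard against log(0) if the fit happens to be exact
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)

# The eight GPT-3 model sizes (parameter counts)
sizes = [125e6, 350e6, 760e6, 1.3e9, 2.7e9, 6.7e9, 13e9, 175e9]
# HYPOTHETICAL accuracies in percent, for illustration only
accuracy = [30.0, 40.0, 48.0, 54.0, 59.0, 63.0, 65.0, 68.0]

# Work in log10(parameters), centered for numerical stability
logs = [math.log10(s) for s in sizes]
mean_log = sum(logs) / len(logs)
xs = [x - mean_log for x in logs]

print("BIC linear:   ", bic(xs, accuracy, 1))
print("BIC quadratic:", bic(xs, accuracy, 2))
```

With only 8 points per curve, the BIC difference between the two models is often small, which is another way of saying the trend classification below is uncertain.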

Disclaimer: The smallest GPT-3 model has 125 million parameters, the largest 175 billion. That's a span of about 3 orders of magnitude. Extrapolating over many more orders of magnitude is risky. Thus, take these numbers only as an indication.
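As a quick check on the span, using the parameter counts reported in the GPT-3 paper for its smallest and largest models (125 million and 175 billion):

```python
import math

smallest = 125e6   # GPT-3 Small
largest = 175e9    # GPT-3 175B

# Number of orders of magnitude covered by the measurements
span = math.log10(largest / smallest)
print(f"{span:.2f} orders of magnitude")  # prints "3.15 orders of magnitude"
```

Any extrapolation beyond this range relies on the trend holding well outside the region where it was measured.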

Results. For the following tests, I find an asymptotic trend. Scaling the model will apparently not yield fantastic results for:

• HellaSwag, LAMBADA, PIQA, CoQA, OpenBookQA, QuAC, RACE, CB, ReCoRD, WiC
• Translations - but unclear level description.

In the following tests, it is unclear if the trend is asymptotic or better than that:

• SAT: Could be linear, could be asymptotic. If linear, it will achieve 100% at parameters.

These tests show linear scaling:

• TriviaQA ( parameter estimate to achieve 100%)
• BoolQ ()
• MultiRC ()
• ARC ()
• SuperGLUE ()
• WSC ()
• WebQs ()
• Cycled ()
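For the linear cases, a parameter estimate for reaching 100% can be obtained by extrapolating the fitted line. The sketch below assumes "linear" means linear in log10 of the parameter count (my assumption; the post does not state the x-axis), and the scores in the usage example are hypothetical, not the benchmark values from the paper.

```python
import math

def params_for_target(sizes, scores, target=100.0):
    """Fit score = a + b * log10(N) by least squares and solve for the N
    at which the fitted line reaches `target`."""
    xs = [math.log10(n) for n in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        raise ValueError("score does not increase with model size")
    return 10 ** ((target - intercept) / slope)

# The eight GPT-3 model sizes (parameter counts)
sizes = [125e6, 350e6, 760e6, 1.3e9, 2.7e9, 6.7e9, 13e9, 175e9]
# HYPOTHETICAL scores in percent, for illustration only
scores = [12.0, 18.0, 23.0, 27.0, 32.0, 38.0, 42.0, 60.0]

estimate = params_for_target(sizes, scores)
print(f"Linear extrapolation reaches 100% at about {estimate:.1e} parameters")
```

Because the result sits in the exponent, small changes in the fitted slope shift the estimate by orders of magnitude, so these numbers are rough at best.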

Some tests scale neither linearly nor asymptotically:

• Symbol: Near exponential ()
• Arithmetic: Exponential; one-digit composite may achieve 100% at
• Reversed: Near exponential ()
• Anagrams: Polynomial ()
• ANLI: stepped, unclear
• RTE: stepped, unclear

Summary: About half of the tested skills will likely not scale much with larger models. The other half will (e.g., TriviaQA, SuperGLUE, arithmetic, anagrams). Going to e.g., parameters - would that make an Oracle AI? Probably it's not sufficient, but I'm interested in hearing your opinion!
