With massive size comes massive generalization ability: GPT-3 is competitive on many benchmarks without any fine-tuning on the target task. [...] Perhaps the most impressive part, though, is that even at such a massive scale, the model's performance still scales smoothly instead of plateauing, implying that still-larger models would perform even better.
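The "scales smoothly" claim can be made concrete with a toy power-law fit: if loss follows $L(N) = c \cdot N^{-\alpha}$ in model size $N$, points lie on a straight line in log-log space and can be extrapolated to larger models. A minimal sketch with NumPy, using made-up constants (these are illustrative, not GPT-3's actual fitted exponents):

```python
import numpy as np

# Hypothetical parameter counts and a synthetic "smooth" scaling curve,
# L(N) = 10 * N**-0.07. Both constants are invented for illustration.
sizes = np.array([1e8, 1e9, 1e10, 1e11])
losses = 10.0 * sizes ** -0.07

# Fit a line in log-log space: log L = log c - alpha * log N
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha, c = -slope, np.exp(intercept)

# If the trend holds, extrapolate one order of magnitude past the largest model
predicted_loss = c * (1e12) ** -alpha
print(alpha, predicted_loss)
```

The extrapolation step is exactly the inference in the quote: a curve that is still straight on a log-log plot at the largest scale tested gives no sign of plateauing, so a bigger model is predicted to do better still.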