Amazon trained a 20-billion-parameter seq2seq model, AlexaTM 20B, that outperforms GPT-3 on SuperGLUE and SQuADv2 and is not (that) far behind the 540-billion-parameter PaLM.

Article: https://www.amazon.science/publications/alexatm-20b-few-shot-learning-using-a-large-scale-multilingual-seq2seq-model

Benchmarks: see the tables in the linked article.
