Imagine typing the following meta-question into GPT-4, a hypothetical revolutionary new 20-trillion-parameter language model released in 2021:
"I asked the superintelligence how to cure cancer. The superintelligence responded __"
How likely are we to get an actual cure for cancer, complete with manufacturing blueprints? Or will we get yet another "nice-sounding, vague suggestion" like "by combining genetic engineering and fungi-based medicine", the sort of thing GPT-2/3 is likely to suggest?
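As a rough illustration (not from the original post), here is how one might sample completions of that prompt from an off-the-shelf GPT-2 checkpoint with the Hugging Face `transformers` library; the `gpt2-large` checkpoint and the sampling settings are arbitrary choices of mine:

```python
# Sketch only: sampling completions of the meta-question from an off-the-shelf
# GPT-2 checkpoint. The checkpoint and sampling settings are arbitrary choices.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")

prompt = ('I asked the superintelligence how to cure cancer. '
          'The superintelligence responded "')
inputs = tokenizer(prompt, return_tensors="pt")

# Draw a few continuations; in practice these tend to be plausible-sounding
# generalities rather than anything resembling a manufacturing blueprint.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    max_new_tokens=60,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True), end="\n---\n")
```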
The response depends on which of two things GPT is predicting:
1. What GPT thinks humans think the superintelligence would say; or
2. What the character (an actual superintelligence) would actually say, reasoned out as if this scenario were playing out in real life.
If GPT takes...
The data for GPT-2 has been replicated by the open-source OpenWebText project. To my knowledge, GPT-3's training mix included an expanded version of that same WebText data (alongside filtered Common Crawl and other sources), so accessing comparable data is not a problem.
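As a sketch, assuming the OpenWebText mirror published on the Hugging Face Hub under the dataset name `openwebtext` (the post only mentions the OpenWebText project itself), the corpus can be inspected without downloading it in full:

```python
# Sketch, assuming the OpenWebText mirror on the Hugging Face Hub named
# "openwebtext"; streaming avoids downloading the full corpus (tens of GB
# of text) just to inspect a few documents.
from datasets import load_dataset

openwebtext = load_dataset("openwebtext", split="train", streaming=True)

for i, example in enumerate(openwebtext):
    print(example["text"][:200])
    if i >= 2:
        break
```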
The parallelizability of GPT-3 is something I've been looking into. The current implementation of ZeRO-2 seems like the most memory-efficient way to train a 170B-parameter transformer model.
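For context, a minimal sketch of how ZeRO stage 2 is typically enabled through a DeepSpeed config; the model, batch size, and optimizer settings below are placeholders of my own, not a recipe for an actual 170B-parameter run:

```python
# Minimal sketch of enabling ZeRO stage 2 via a DeepSpeed config. The model,
# batch size, and optimizer settings are placeholders, not a 170B recipe;
# a real run would also need the deepspeed launcher and many GPUs.
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    # Stage 2 partitions optimizer states and gradients across data-parallel
    # workers, which is where the memory savings for large models come from.
    "zero_optimization": {"stage": 2},
}

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16)  # stand-in model

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```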