Play Go better than AlphaGo Zero. AlphaGo Zero was trained using millions of games. Even if GPT-4 is trained on all of the internet, there simply isn't enough training data for it to have comparable effectiveness.
Things that it can probably do sometimes, but will fail on some inputs:
- Factor numbers
- Solve NP-Complete or harder problems
- Execute code
There are other “tail end” tasks like this, which should eventually become the hardest cases, the ones optimization spends the most time on once it has figured everything else out.
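Factoring makes the asymmetry concrete: verifying a claimed factorization is trivial multiplication, but producing one requires exact algorithmic work that a language model has to imitate token by token. A minimal trial-division sketch (illustrative only, not a claim about how any model works):

```python
def factor(n: int) -> list[int]:
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains is prime
        factors.append(n)
    return factors

# Checking an answer is one multiplication; finding it took the loop above.
assert factor(5168) == [2, 2, 2, 2, 17, 19]
```

Trial division is exponential in the bit-length of n, which is exactly why "factor numbers" sits in the tail: a model can memorize small cases but has no shortcut for large ones.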
Know if its reply to a prompt is actually useful.
E.g., prompt it with "a helicopter is most efficient when ...", "a helicopter is more efficient when ...", and "helicopter efficiency can be improved by ...". GPT-4 will not be able to tell which completion is best, or even whether any of them would actually move helicopter efficiency in the right direction.
Reason about code.
Specifically, I've been trying to get GPT-3 to outperform the Hypothesis Ghostwriter at automatic generation of tests and specifications, without any success. I expect that GPT-4 will also underperform, but that it could outperform if fine-tuned on the problem.
If I knew where to get training data I'd like to try this with GPT-3, for that matter; I'm much more attached to the user experience of "hypothesis write mypackage generates good tests" than to any particular implementation (modulo installation and other manageable ops issues for novice users).
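For context, the Ghostwriter emits property-based tests; here is a hand-written example of the kind of round-trip test it aims for, using gzip as an illustrative target (the Ghostwriter's actual output differs in detail):

```python
import gzip

from hypothesis import given, strategies as st


@given(st.binary())
def test_gzip_roundtrip(payload):
    # compress then decompress should be a lossless round trip
    assert gzip.decompress(gzip.compress(payload)) == payload
```

The hard part, and the part GPT-3 struggled with in my attempts, is picking the right property (round trip, idempotence, equivalence) and the right input strategy for arbitrary user code, not emitting the boilerplate.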
Directing a robot, by issuing motor actions and receiving camera data (translated into text, I suppose, so as not to make it maximally unfair, but still), to make a cup of tea in a kitchen.
It's vaporware, so it can do whatever you imagine. It's hard to constrain a project that doesn't exist, as far as we know.