OpenAI recently announced progress in NLP, using a large transformer-based language model to tackle a variety of tasks and breaking performance records in many of them. It also generates synthetic short stories, which are surprisingly good.

How surprising are these results, given past models of how difficult language learning was and how far AI had progressed? Should we be significantly updating our estimates of AI timelines?

2 Answers

It doesn't move much probability mass to the very near term (i.e. 1 year or less), because neither this nor AlphaStar is really doing consequentialist reasoning; they just get surprisingly good performance from simpler tricks (the very Markovian nature of human writing, a good position evaluation function) given a whole lot of compute.

However, it does shift my probabilities forward in time, in the sense that one new weird trick to do deductive or consequentialist reasoning, plus a lot of compute, might get you there really quickly.

Something you learn pretty quickly in academia: don't trust the demos. Systems never work as well when you select the inputs freely (and, if they do, expect thorough proof). So, I wouldn't read too deeply into this yet; we don't know how good it actually is.

Vis-a-vis selecting inputs freely: OpenAI also included a large dump of unconditioned text generation in their GitHub repo.

They claim to have beaten records on a range of standard tests (such as the Winograd schema), which is not something you can fake by cherry-picking, assuming they are honest about the results.

https://transformer.huggingface.co/ is a nice demonstration of GPT-2 that allows you to select the inputs freely.

2 comments

It shortens expected AI timelines, not only because it is such a great achievement, but also because it demonstrates that a large part of human thinking could be just generating a plausible continuation of the input text.

OpenAI's "safety" move (not releasing the model) reduces the scrutiny it can receive, so its impact on forecasts depends on how good you guess the model is without having seen it.