AI Performance on Human Tasks

by Asher Ellis

3rd Mar 2022

3 comments

[anonymous]

Regarding image classification performance, it seems worth noting that ImageNet was labeled by human labelers (and IIRC there was a paper showing that labels are ambiguous or wrong for a substantial minority of the images).

As such, I don't think we can conclude much about superhuman AI performance on image recognition from ImageNet alone, since perfect performance on the benchmark corresponds to perfectly replicating human judgement (admittedly aggregated over multiple humans). To demonstrate superhuman performance, a dataset with known ground truth, where humans struggle to label images correctly, would seem more appropriate.
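The point about label noise can be illustrated with a toy calculation. When the benchmark labels themselves contain errors, a model that always identifies the true class loses points. A model that copies the labelers' mistakes scores perfectly. The data below is entirely hypothetical: ten made-up images, two of them mislabeled. None of it comes from ImageNet.

```python
# Hypothetical toy data (not ImageNet): true classes and the human labels,
# where positions 2 and 7 were mislabeled by the (imaginary) labelers.
ground_truth = ["cat", "dog", "cat", "bird", "dog", "cat", "bird", "dog", "cat", "bird"]
human_labels = ["cat", "dog", "dog", "bird", "dog", "cat", "bird", "cat", "cat", "bird"]

def accuracy(predictions, reference):
    """Fraction of positions where the predictions match the reference labels."""
    assert len(predictions) == len(reference)
    return sum(p == r for p, r in zip(predictions, reference)) / len(reference)

perfect_model = list(ground_truth)  # always right about the true class
label_copier = list(human_labels)   # reproduces the labelers, errors included

print(accuracy(perfect_model, human_labels))  # 0.8  (benchmark score)
print(accuracy(label_copier, human_labels))   # 1.0  (benchmark score)
print(accuracy(perfect_model, ground_truth))  # 1.0  (true accuracy)
print(accuracy(label_copier, ground_truth))   # 0.8  (true accuracy)
```

On the benchmark, the copier beats the model that is actually correct. This is why a ground-truth dataset would be needed to show superhuman performance.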

TLW

I have low(er) confidence in any prediction that simultaneously requires some exponential trends to continue and other related trends to stop.

These predictions all rely on exponentially rising amounts of compute. Unfortunately, as I've mentioned before, exponentially rising compute comes with exponentially rising fab costs. On present trends, a single fab would cost more than world GDP by around 2080, which is obviously nonsense.
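The extrapolation here is a crossover between two exponentials. The sketch below assumes that fab cost and world GDP each grow by a constant annual factor. Solving C0·g^t = G0·h^t for t gives t = ln(G0/C0) / ln(g/h). The function name and the toy inputs are hypothetical placeholders. They are not the figures behind the 2080 estimate.

```python
import math
from typing import Optional

def crossover_year(start_year: float, fab_cost: float, fab_growth: float,
                   gdp: float, gdp_growth: float) -> Optional[float]:
    """Year when an exponentially growing fab cost first reaches world GDP.

    Assumes constant annual growth factors (e.g. 1.13 means +13% per year),
    with fab_cost and gdp in the same currency units at start_year.
    Returns None if the fab cost never catches up.
    """
    if fab_cost >= gdp:
        return start_year  # already at or past the crossover
    if fab_growth <= gdp_growth:
        return None  # fab cost grows no faster than GDP, so it never overtakes it
    # Solve fab_cost * fab_growth**t == gdp * gdp_growth**t for t.
    years = math.log(gdp / fab_cost) / math.log(fab_growth / gdp_growth)
    return start_year + years

# Toy inputs only: a cost that doubles yearly, starting 1024x below a flat GDP.
print(crossover_year(2020, fab_cost=1.0, fab_growth=2.0, gdp=1024.0, gdp_growth=1.0))  # ≈ 2030
```

Because the gap closes at the ratio of the two growth rates, even a large starting gap disappears within a few decades if fab costs grow much faster than GDP.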

I would be interested in seeing a similar set of predictions with compute held constant. From what I've seen, things by and large still scale exponentially, but with far longer doubling times^[1]. (For instance, Proebsting's Law observes that compiler advances roughly double the performance of compiled programs, all else being equal, every 18 years^[2]. And again, that comes with an exponentially increasing amount of compute required to do the compilation...)

^[1] With exceptions.
^[2] ...and it's actually likely worse. The 2001 reproduction suggested something more like 20 years under optimistic assumptions, and an informal 2022 test showed an average improvement of 10-15% over the last 10 years (roughly a 50-year doubling time at the 15% end, and longer at the 10% end...)
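The doubling times in these footnotes assume steady exponential growth. A total gain r over y years then implies a doubling time of y·ln 2 / ln(1 + r). The sketch below converts in both directions. Its function names are illustrative and do not come from the original.

```python
import math

def doubling_time(total_improvement: float, years: float) -> float:
    """Years to double, given a fractional gain (0.15 = 15%) over `years` years,
    assuming constant exponential growth."""
    return years * math.log(2) / math.log(1 + total_improvement)

def improvement_per_year(doubling_years: float) -> float:
    """Constant annual growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_years) - 1

for gain in (0.10, 0.15):
    print(f"{gain:.0%} over 10 years -> doubling time {doubling_time(gain, 10):.1f} years")
print(f"18-year doubling -> {improvement_per_year(18):.1%} per year")
```

With these inputs, the sketch reports about 72.7 years for a 10% gain and 49.6 years for a 15% gain. An 18-year doubling corresponds to roughly 3.9% per year.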

Sable

Should your time estimates include some kind of compensation for how many orders of magnitude of compute were available at the time?

In other words, was the progression from below-human to average performance across these metrics due to more development effort, more compute, or some other variable? Your post focuses on algorithmic improvements, but how many of those were feasible within the preceding generation's compute limits?

All told, great post, and I enjoyed reading it.

Task	Current capabilities
Poker	Superhuman (consistently)
Image classification	Superhuman (usually)
Text-summarization	Average human (unreliable)
Static visual art	Superhuman (but requires human input)
Human-like dexterity	Below humans (except specific tasks)

Poker (Limit Hold ’Em)
Range	Start	End	Duration (years)
First attempt to above-average	<1983	1997	>14
Above-average to superhuman	1997	2015	18

Image classification (ImageNet)
Range	Start	End	Duration (years)
First attempt to beginner	2010	2013	3
Beginner to average	2013	2015	2
Average to superhuman	2015	2017	2

Text-summarization
Range	Start	End	Duration (years)
First attempt to beginner (extractive)	<1958	2005	>47
Beginner (extractive) to average (abstractive)	2005	2019	14

Generating static visual art
Range	Start	End	Duration (years)
First attempt to above average (GANs)	<1973	2014	>41
Above average (GANs) to superhuman (DALL-E)	2014	2021	7
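The Duration column in these tables is end year minus start year. A "<" on the start year turns the duration into a lower bound, shown as ">". The sketch below re-derives the column from the start and end years listed above. The `Milestone` type is just an illustrative container.

```python
from typing import NamedTuple

class Milestone(NamedTuple):
    task: str
    span: str
    start: int
    end: int
    start_is_bound: bool = False  # True for entries like "<1983" (first attempt earlier than this)

    def duration(self) -> str:
        years = self.end - self.start
        # An earlier first attempt can only lengthen the span, so report a lower bound.
        return f">{years}" if self.start_is_bound else str(years)

MILESTONES = [
    Milestone("Limit Hold 'Em", "First attempt to above-average", 1983, 1997, True),
    Milestone("Limit Hold 'Em", "Above-average to superhuman", 1997, 2015),
    Milestone("ImageNet", "First attempt to beginner", 2010, 2013),
    Milestone("ImageNet", "Beginner to average", 2013, 2015),
    Milestone("ImageNet", "Average to superhuman", 2015, 2017),
    Milestone("Text-summarization", "First attempt to beginner (extractive)", 1958, 2005, True),
    Milestone("Text-summarization", "Beginner (extractive) to average (abstractive)", 2005, 2019),
    Milestone("Static visual art", "First attempt to above average (GANs)", 1973, 2014, True),
    Milestone("Static visual art", "Above average (GANs) to superhuman (DALL-E)", 2014, 2021),
]

for m in MILESTONES:
    print(f"{m.task}: {m.span}: {m.duration()}")
```

The recomputed values match the Duration column in each table.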

Introduction

Task 1: Poker

Task 2: Image classification

Task 3: Text-summarization

Task 4: Creating static visual art

Task 5: Human-like dexterity

Conclusion

Discussion / Personal Predictions

Personal predictions
Task	Overall, AI will {augment/replace} humans.
Poker	Both
Image classification	Replace
Text-summarization	Replace
Static visual art	Augment
Human-like dexterity	Both