Algorithmic Progress in Six Domains

by lukeprog1 min read3rd Aug 201332 comments


Computer ScienceProgrammingAI
Personal Blog

Today MIRI released a new technical report by visiting researcher Katja Grace called "Algorithmic Progress in Six Domains." The report summarizes data on algorithmic progress – that is, better performance per fixed amount of computing hardware – in six domains:

  • SAT solvers,
  • Chess and Go programs,
  • Physics simulations,
  • Factoring,
  • Mixed integer programming, and
  • Some forms of machine learning.

MIRI's purpose for collecting these data was to shed light on the question of intelligence explosion microeconomics, though we suspect the report will be of broad interest within the software industry and computer science academia.

One finding from the report was previously discussed by Robin Hanson here. (Robin saw an early draft on the intelligence explosion microeconomics mailing list.)

This is the preferred page for discussing the report in general.


In recent boolean satisfiability (SAT) competitions, SAT solver performance has increased 5–15% per year, depending on the type of problem. However, these gains have been driven by widely varying improvements on particular problems. Retrospective surveys of SAT performance (on problems chosen after the fact) display significantly faster progress.
Chess programs have improved by around 50 Elo points per year over the last four decades. Estimates for the significance of hardware improvements are very noisy, but are consistent with hardware improvements being responsible for approximately half of progress. Progress has been smooth on the scale of years since the 1960s, except for the past five. Go programs have improved about one stone per year for the last three decades. Hardware doublings produce diminishing Elo gains, on a scale consistent with accounting for around half of progress.
Improvements in a variety of physics simulations (selected after the fact to exhibit performance increases due to software) appear to be roughly half due to hardware progress.
The largest number factored to date has grown by about 5.5 digits per year for the last two decades; computing power increased 10,000-fold over this period, and it is unclear how much of the increase is due to hardware progress.
Some mixed integer programming (MIP) algorithms, run on modern MIP instances with modern hardware, have roughly doubled in speed each year. MIP is an important optimization problem, but one which has been called to attention after the fact due to performance improvements. Other optimization problems have had more inconsistent (and harder to determine) improvements.
Various forms of machine learning have had steeply diminishing progress in percentage accuracy over recent decades. Some vision tasks have recently seen faster progress.


32 comments, sorted by Highlighting new comments since Today at 1:14 AM
New Comment

One interesting thing about advances in SAT solvers is that they often translate into advances in other areas. There was this one period in AI planning when people got an order of magnitude speedups by translating planning problems into SAT and running them on SAT solvers (this was embarrassing, so advances in pure planners followed within a year).

Now people are converting graphical model structure learning problems into SAT (there was a talk about this at UAI this year).

"Can a SAT solver do it faster than your entire area?" is a good sanity check.

Minor note: while this is true for many areas, some areas like physics simulations don't have this sort of sanity check because there are no known witnesses to a correct calculation.

An interesting statistic I came across is the performance of JavaScript engines over the last 11 years.

According to the data here, Phoenix 0.1 (first release of Firefox, September 2002), on the v8bench version 3, is 134 times slower than Chrome 21 (released end of July 2012), on the same hardware. Other benchmarks don't run on both browsers so it's hard to know how objective v8bench ver 3 is on this. See the details of the measurements on the article History of JavaScript Performance: Firefox.

Not sure of the additional speed up over the last year or so, but JS performance is interesting since 4 companies (Google, Mozilla, Apple, Microsoft) have poured money and talent at this arms race, probably the first time we've had this happen for basically-fully-compatible implementations of the same language (happy to hear if I'm wrong).

Edit: The data here imply a further 33% or so improvement for Chrome over the last year on Octane, the newest version of v8bench. (Firefox seems to have improved a massive 70%+ in the same period.).

JS performance is interesting since 4 companies (Google, Mozilla, Apple, Microsoft) have poured money and talent at this arms race, probably the first time we've had this happen for basically-fully-compatible implementations of the same language (happy to hear if I'm wrong).

There used to be an awful lot of commercial C compilers before GCC killed off most of them, and even now there's still a big 3: GCC, Intel's compiler, and LLVM.

Yes, C is the one counterexample I had in mind, but I vaguely recall compatibility issues much larger than is acceptable for JS. I might be wrong.

I remember porting code between GCC and ICC basically painlessly but not so much the details. Are the three compilers expected to run code tuned on the others with 0 complaints? How prevalent are compiler-specific settings? (I do remember needing to do inline assembly in a special way, I think).

Ok, got some answers. It seems GCC leads by adding non-standard features, and ICC/Clang follow to be compatible. This doesn't seem to be the case in the JS world, where the competitors seem to be on a more equal footing (though V8 has the mindshare lead, with node.js etc.). If this is enough to support the "first ever" description is I guess a matter of personal taste at this point. All in all, thanks for the counter Gwern.

[-][anonymous]8y 0

It's likely not all (or even mostly) new algorithmic progress though, applying well-known techniques to a new technology takes a lot of time.

[This comment is no longer endorsed by its author]Reply

From paper:

The recent defining moment in computer vision is the smashing victory by Geoff Hinton and his team on the ImageNet Large Scale Visual Recognition Challenge last October.

Geoff Hinton is a giant, it's incredible that he is so productive at a late stage in his career. Everyone who is interested in machine learning should read all his big papers. Unfortunately you have to piece together the elements of his big picture ideas by connecting the dots between several papers; he should really have a book out that summarizes his vision.

Another possible source is the Computer language shootout. They don't publish clear historical measures (but perhaps some of the existing research on the shootout does?), so a workaround might be to figure out the submission time of individual programs/versions and use that to create a historical time-series. This would usefully cover both the improving performance of various programming languages and the ability of programmers of that language to improve their programs to hit the fixed target of a benchmark.

(But turns out that only measures hardware progress.)


"Insais" on page 26 should probably be "insei."

"Insais" on page 26 should probably be "insei."

Fixed, thanks!

Cool indeed!

Both uses of "regime" on page 3 look weird:

improving accuracy by a percentage point in the ninety-percent regime arguably makes translation software a lot more useful than improving accuracy by a point in the thirty-percent regime

"Region" seems better to me.

This is common terminology in... I'm not sure exactly, but some parts of mathematics, computer science, and physics. Generally one speaks of the behavior of a problem in the regime where some parameters are large or small. Wikipedia has a related usage in the sciences.

Also, should “A Science-Based Case for Large-Scale Simulation” be cited on page 4?

It is cited later on page 33, though citation added to the page 4 reference as well. Thanks!

Some explanation of why these are natural domains to look at might be nice. With the exception of chess and go they seem to all be problems where solving them is of wide interest and the problems in question are very general. Chess and go seem to be not at all natural problems but mainly of interest because they show what sort of improvements can happen to a narrow problem when a lot goes into improving those specific problems (although that's not all that is going on here since the article does discuss how hardware improvement has mattered a lot).

It might also make sense at some point to branch out some of these problems a little more, such as focusing on 3-SAT rather than general SAT, or looking at work which has focused on factoring specific types of integers (such as the ongoing project to factor Mersenne numbers).

I think the problems were selected largely on the basis of which data were easiest to obtain, so as to start under the streetlight.

Interesting outlier in algorithmic improvement: This presentation, at the Microsoft Research summit where Robin Hanson also presented, described a 20 order of magnitude improvement in the last three years (in verifiable computation).

Figures for data compression are relevant - and should be obtainable.

Hutter's contest only goes back to 2006, and is very much not a generalizable progress curve as it's based on a static dump of Wikipedia. You'd want to also use Calgary & Canterbury Corpus results. (Ideally you'd dredge up compressors from the early history of computing, like basic Unix utilities from the '60s or '70s, but then you have the difficulty of actually running them - same issue that bars really good evaluation of "Proebsting's Law".)

Shannon's bounds mean that you won't see crazy progress in general in data compression.

Compression is closely related to forecasting - which in turn is a large part of intelligence (my estimate is 80% forecasting vs 20% pruning and evaluation). So your thesis bears on the idea of an intelligence explosion.

I guess it's very weak negative evidence (like the fact that NP-complete problems exist, and lots of AI problems are NP-complete).

The steelman of a pro-explosion argument is probably very sophisticated and easily avoids these issues.

Well, creatures able to compress their sense data to 0.22 of its original size might drive to extinction creatures who can only manage a compression ratio of 0.23 - in an evolutionary contest. 'Small' differences in modeling ability - as measured by compression ratios - could thus have large effects on the world.

However compression (and AI) are hard problems and run rapidly into diminishing returns - at least if you measure them in this way.

Well, creatures able to compress their sense data to 0.22 of its original size might drive to extinction creatures who can only manage a compression ratio of 0.23 - in an evolutionary contest.

Sounds pretty speculative.

Unless you have a model that exactly describes how a given message was generated, its Shannon entropy is not known but estimated... and typically estimated based on the current state of the art in compression algorithms. So unless I misunderstood, this seems like a circular argument.

You need to read about universal coding, e.g. start here:

I highly recommend Thomas and Cover's book, a very readable intro on info theory. The point is we don't need to know the distribution from which the bits came from to do very well in the limit. (There are gains to be had in the region before "in the limit," but these gains will track the kinds of gains you get in statistics if you want to move beyond asymptotic theory).

I'm not as familiar with Shannon's work as I'd like to be, but don't Shannon's bounds limit compression efficiency and not compression speed? This is a speed test, not an efficiency test.


From the url:

Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than about 16MB. More precisely:


There are restrictions on runspeed, but they seem to be there to limit brute forcing all possible approaches. There is also talk of "compressing intelligently." I am not even sure what a pure speed test of compression would mean -- the identity function is pretty fast!

There are linear time implementations of compression schemes that reach the Shannon bound, so any runtime improvement of an asymptotically optimal encoder will not be dramatic either...

This is my point - the algorithm challenge is to do a given compression task quickly, and Shannon has no limits I'm aware of on how fast that can be done. So there's no problem of approaching physical limits the way there would be for, say, "Encode this file in the smallest space possible".

Edit: The stuff below the line either wasn't there when I replied or I didn't see it. You're right though, if linear-time implementations already exist, then you're basically hard-bounded.

To be sure, constant factors matter a lot! But I think in this case the curve would be pretty boring.

[-][anonymous]8y 0

Very nice!

Based on a brief skim, this would look like the kind of research that might actually get published somewhere. I encourage submitting it to an appropriate journal, if you're not already doing so.

[This comment is no longer endorsed by its author]Reply