Comments

shrug 

I think this is true to an extent, but a more systematic analysis is needed to back it up.

For instance, I recall quantization techniques working much better after a certain scale (though I can't seem to find the reference...). It also seems important to validate that techniques to increase performance still apply at large scales. Finally, note that the frontier of scale is growing very fast, so even if these discoveries were made with relatively modest compute compared to the frontier, this is still a tremendous amount of compute!
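For concreteness, here is the simplest kind of post-training quantization, a generic round-to-nearest int8 sketch of my own (function names and numbers are mine, not the specific technique from the reference I'm failing to find):

```python
# Generic round-to-nearest int8 post-training quantization with a single
# per-tensor scale. Purely illustrative; the function names and numbers are mine.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 plus a scale factor."""
    scale = np.abs(w).max() / 127.0                      # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)         # stand-in for a weight matrix
q, scale = quantize_int8(w)
print("max abs reconstruction error:", float(np.abs(w - dequantize(q, scale)).max()))
```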

even a pause which completely stops all new training runs beyond current size indefinitely would only ~double timelines at best, and probably less

 

I'd emphasize that we currently don't have a very clear sense of how algorithmic improvement happens, and it is likely mediated to some extent by large experiments, so I think a pause is likely to slow timelines more than this implies.
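For intuition, here is the back-of-envelope arithmetic behind the "~double timelines" claim, using the computer-vision growth rates I quote further down; the remaining 5-OOM gap to a capability threshold is a made-up placeholder, not a forecast:

```python
# Back-of-envelope arithmetic for "a pause only ~doubles timelines".
# Growth rates are the computer-vision numbers quoted below; the remaining
# 5-OOM gap to some capability threshold is a made-up placeholder.
compute_growth = 0.6      # OOM/year of physical training compute
algo_growth = 0.5         # OOM/year of algorithmic efficiency (midpoint of 0.1-1.0)
gap = 5.0                 # OOMs of effective compute still needed (placeholder)

years_no_pause = gap / (compute_growth + algo_growth)
years_with_pause = gap / algo_growth                     # physical compute frozen

print(f"no pause:   {years_no_pause:.1f} years")
print(f"with pause: {years_with_pause:.1f} years "
      f"({years_with_pause / years_no_pause:.1f}x longer)")

# The ~2x factor is just (compute + algo) / algo, and it assumes algorithmic
# progress keeps its pace without frontier-scale experiments -- which is
# exactly the assumption questioned above.
```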

I agree! I'd be quite interested in looking at TAS data, for the reason you mentioned.

I think Tetlock and company might have already done some related work?

Question decomposition is part of the superforecasting commandments, though I can't recall off the top of my head if they were RCT'd individually or just as a whole.

ETA: This is the relevant paper (h/t Misha Yagudin). It was not about the 10 commandments. Apparently those haven't been RCT'd at all?

I co-wrote a detailed response here:

https://www.cser.ac.uk/news/response-superintelligence-contained/

Essentially, this type of reasoning proves too much, since it implies we cannot show any properties whatsoever of any program, which is clearly false.

Here is some data via Matthew Barnett and Jess Riedl:

The number of cumulative miles driven by Cruise's autonomous cars is growing exponentially, at roughly 1 OOM per year.

https://twitter.com/MatthewJBar/status/1690102362394992640
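If it helps, "1 OOM per year" is just the slope of log10(cumulative miles) against time. A minimal sketch with made-up numbers (not Cruise's actual data):

```python
# How "roughly 1 OOM per year" is read off cumulative mileage: fit a line to
# log10(miles) against time; the slope is OOMs gained per year.
# The data points below are made up for illustration, not Cruise's numbers.
import numpy as np

years = np.array([2020.0, 2021.0, 2022.0, 2023.0])
cumulative_miles = np.array([1.0e4, 1.1e5, 0.9e6, 1.05e7])

slope, _intercept = np.polyfit(years, np.log10(cumulative_miles), 1)
print(f"growth rate: {slope:.2f} OOM/year")              # ~1.0 for this toy series
```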

To a very basic approximation, that is correct.

Davidson's takeoff model illustrates this point: for some parameter settings a "software singularity" occurs because software progress is not constrained by capital inputs to the same degree.
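To make the mechanism concrete, here is a deliberately crude toy model (my own simplification, not Davidson's actual setup). Write $S$ for the level of software (algorithmic efficiency) and suppose improvements to $S$ feed back into how fast further improvements arrive:

$$\frac{dS}{dt} = k S^{\lambda}, \qquad k > 0.$$

For $\lambda \le 1$ growth is at most exponential, but for $\lambda > 1$ the solution $S(t) = \left(S_0^{1-\lambda} - (\lambda - 1) k t\right)^{-1/(\lambda - 1)}$ diverges at the finite time $t^\ast = S_0^{1-\lambda} / \big((\lambda - 1) k\big)$: a "software singularity". If capital and compute inputs bottleneck software R&D, the effective $\lambda$ is pushed below 1 and the singularity disappears, which is why it only shows up for some parameter settings.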

I would point out, however, that our current understanding of how software progress happens is somewhat poor. Experimentation is definitely a big component of software progress, and this is often understated on LW.

More research on this soon!

algorithmic progress is currently outpacing compute growth by quite a bit

This is not right, at least in computer vision. They seem to be the same order of magnitude.

Physical compute has grown at 0.6 OOM/year and physical compute requirements have decreased at 0.1 to 1.0 OOM/year; see a summary here or an in-depth investigation here.

Another relevant quote:

Algorithmic progress explains roughly 45% of performance improvements in image classification, and most of this occurs through improving compute-efficiency.
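As a rough consistency check, ~45% is about what you get by comparing the two growth rates above, taking the midpoint of the 0.1 to 1.0 OOM/year range (a back-of-envelope recomputation, not the paper's methodology):

```python
# Back-of-envelope check of the ~45% figure, not the paper's methodology:
# compare the algorithmic rate (midpoint of 0.1-1.0 OOM/year) with the
# physical-compute rate quoted above.
compute_growth = 0.6      # OOM/year, physical compute (computer vision)
algo_growth = 0.5         # OOM/year, midpoint of the 0.1-1.0 range

algo_share = algo_growth / (algo_growth + compute_growth)
print(f"algorithmic share of effective-compute growth: {algo_share:.0%}")    # ~45%
```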

That is not a transpose! It is the timestep t; we are raising it to the t-th power.
