This is a somewhat long and rambling post. Apologies for the length. I hope the topic and content are interesting enough for you to forgive the meandering presentation.

I blogged about the scenario planning method a while back, where I linked to many past examples of scenario planning exercises. In this post, I take a closer look at scenario analysis in the context of understanding the possibilities for the unfolding of technological progress over the next 10-15 years. Here, I will discuss some predetermined elements and critical uncertainties, offer my own scenario analysis, and then discuss scenario analyses by others.

Remember: it is not the purpose of scenario analysis to identify a set of mutually exclusive and collectively exhaustive outcomes. In fact, usually, the real-world outcome has some features from two or more of the scenarios considered, with one scenario dominating somewhat. As I noted in my earlier post:

The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.

The predetermined element: the imminent demise of Moore's law "as we know it"

As Steven Schnaars noted in Megamistakes (discussed here), forecasts of technological progress in most domains have been overoptimistic, but in the domain of computing, they've been largely spot-on, mostly because the raw technology has improved quickly. The main driver has been Moore's law, along with a couple of related laws, which have undergirded that progress. But now, the party is coming to an end! The death of Moore's law (as we know it) is nigh, and there are significant implications for the future of computing.

Moore's law refers to many related claims about technological progress. Some forms of this technological progress have already stalled. Other forms are slated to stall in the near future, barring unexpected breakthroughs. These facts about Moore's law form the backdrop for all our scenario planning.

The critical uncertainty lies in how industry will respond to the prospect of the death of Moore's law. Will there be a doubling down on continued improvement at the cutting edge? Will the battle shift to cost reductions? Or will we have neither cost reduction nor technological improvement? What sort of pressure will hardware stagnation put on software?

Now, onto a description of the different versions of Moore's law (slightly edited version of information from Wikipedia):

  • Transistors per integrated circuit. The most popular formulation is that the number of transistors on integrated circuits doubles every two years.

  • Density at minimum cost per transistor. This is the formulation given in Moore's 1965 paper. It is not just about the density of transistors that can be achieved, but about the density of transistors at which the cost per transistor is the lowest. As more transistors are put on a chip, the cost to make each transistor decreases, but the chance that the chip will not work due to a defect increases. In 1965, Moore examined the density of transistors at which cost is minimized, and observed that, as transistors were made smaller through advances in photolithography, this number would increase at "a rate of roughly a factor of two per year".

  • Dennard scaling. This suggests that power requirements are proportional to area (both voltage and current being proportional to length) for transistors. Combined with Moore's law, performance per watt would grow at roughly the same rate as transistor density, doubling every 1–2 years. According to Dennard scaling, transistor dimensions are scaled by 30% (0.7x) every technology generation, thus reducing their area by 50%. This reduces the delay by 30% (0.7x) and therefore increases operating frequency by about 40% (1.4x). Finally, to keep the electric field constant, voltage is reduced by 30%, reducing energy by 65% and power (at 1.4x frequency) by 50%. Therefore, in every technology generation, transistor density doubles, the circuit becomes 40% faster, and power consumption (with twice the number of transistors) stays the same.

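The arithmetic behind those Dennard scaling figures can be spelled out in a few lines. The following is just the idealized textbook calculation implied by the description above (a 0.7x linear shrink per generation); it is a sketch, not a model of any actual process:

    # Idealized Dennard scaling arithmetic, per technology generation.
    # All factors follow from a 0.7x linear shrink; real processes stopped
    # tracking these numbers around 2005-2007.
    scale = 0.7                    # linear dimensions shrink by ~30%

    area = scale ** 2              # ~0.49: transistor area halves, so density doubles
    delay = scale                  # ~0.7: gate delay drops by 30%
    frequency = 1 / delay          # ~1.4: clock can run ~40% faster
    voltage = scale                # ~0.7: scaled to keep the electric field constant
    capacitance = scale            # ~0.7

    # Dynamic power per transistor is proportional to C * V^2 * f.
    power_per_transistor = capacitance * voltage ** 2 * frequency  # ~0.49
    transistors_per_area = 1 / area                                # ~2.04
    total_power = power_per_transistor * transistors_per_area      # ~1.0: chip power stays flat

    print(f"density x{transistors_per_area:.2f}, frequency x{frequency:.2f}, "
          f"chip power x{total_power:.2f}")
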
So how are each of these faring?

  • Transistors per integrated circuit: At least in principle, this can continue for a decade or so. The technological ideas exist to push transistor sizes from the current values of 32 nm and 28 nm all the way down to 7 nm.
  • Density at minimum cost per transistor. This is probably stopping around now. There is good reason to believe that, barring unexpected breakthroughs, the transistor size that minimizes cost per transistor will not fall below 28 nm. There may still be niche applications that benefit from smaller transistor sizes, but there will be no overwhelming economic case to switch production to smaller transistor sizes (i.e., higher densities).
  • Dennard scaling. This broke down around 2005-2007. So for approximately a decade, we've essentially seen continued miniaturization but without any corresponding improvement in processor speed or performance per watt. There have been continued overall improvements in energy efficiency of computing, but not through this mechanism. The absence of automatic speed improvements has led to increased focus on using greater parallelization (note that the miniaturization means more parallel processors can be packed in the same space, so Moore's law is helping in this other way). In particular, there has been an increased focus on multicore processors, though there may be limits to how far that can take us too.

Moore's law isn't the only law that is slated to end. Other similar laws, such as Kryder's law (about the declining cost of hard disk storage), may also end in the near future. Koomey's law on energy efficiency may also stall, or it might continue to hold but through very different mechanisms from the ones that have driven it so far.

Some discussions that do not use explicit scenario analysis

The quotes below are to give a general idea of what people seem to generally agree on, before we delve into different scenarios.

EETimes writes:

We have been hearing about the imminent demise of Moore's Law quite a lot recently. Most of these predictions have been targeting the 7nm node and 2020 as the end-point. But we need to recognize that, in fact, 28nm is actually the last node of Moore's Law.

[...]

Summarizing all of these factors, it is clear that -- for most SoCs -- 28nm will be the node for "minimum component costs" for the coming years. As an industry, we are facing a paradigm shift because dimensional scaling is no longer the path for cost scaling. New paths need to be explored such as SOI and monolithic 3D integration. It is therefore fitting that the traditional IEEE conference on SOI has expanded its scope and renamed itself as IEEE S3S: SOI technology, 3D Integration, and Subthreshold Microelectronics.

Computer scientist Moshe Vardi writes:

So the real question is not when precisely Moore's Law will die; one can say it is already a walking dead. The real question is what happens now, when the force that has been driving our field for the past 50 years is dissipating. In fact, Moore's Law has shaped much of the modern world we see around us. A recent McKinsey study ascribed "up to 40% of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements." Indeed, the demise of Moore's Law is one reason some economists predict a "great stagnation" (see my Sept. 2013 column).

"Predictions are difficult," it is said, "especially about the future." The only safe bet is that the next 20 years will be "interesting times." On one hand, since Moore's Law will not be handing us improved performance on a silver platter, we will have to deliver performance the hard way, by improved algorithms and systems. This is a great opportunity for computing research. On the other hand, it is possible that the industry would experience technological commoditization, leading to reduced profitability. Without healthy profit margins to plow into research and development, innovation may slow down and the transition to the post-CMOS world may be long, slow, and agonizing.

However things unfold, we must accept that Moore's Law is dying, and we are heading into an uncharted territory.

CNet says:

"I drive a 1964 car. I also have a 2010. There's not that much difference -- gross performance indicators like top speed and miles per gallon aren't that different. It's safer, and there are a lot of creature comforts in the interior," said Nvidia Chief Scientist Bill Dally. If Moore's Law fizzles, "We'll start to look like the auto industry."

Three critical uncertainties: technological progress, demand for computing power, and interaction with software

Uncertainty #1: Technological progress

Moore's law is dead, long live Moore's law! Even if Moore's law as originally stated is no longer valid, there are other plausible computing advances that would preserve the spirit of the law.

Minor modifications of current research (as described in EETimes) include:

  • Improvements in 3D circuit design (Wikipedia), so that we can stack multiple layers of circuits one on top of the other, and therefore pack more computing power per unit volume.
  • Improvements in understanding electronics at the nanoscale, in particular understanding subthreshold leakage (Wikipedia) and how to tackle it.

Then, there are possibilities for totally new computing paradigms. Each of these has a fairly low probability of panning out, and is highly unlikely to become commercially viable within 10-15 years. Each offers an advantage over currently available general-purpose computing only for special classes of problems, generally those that are parallelizable in particular ways (the type of parallelizability needed differs somewhat between the computing paradigms).

  • Quantum computing (Wikipedia) (speeds up particular types of problems). Quantum computers already exist, but the current ones can tackle only a few qubits. Currently, the best known quantum computers in action are those maintained at the Quantum AI Lab (Wikipedia) run jointly by Google, NASA, and USRA. It is currently unclear how to manufacture quantum computers with a larger number of qubits. It's also unclear how the cost will scale with the number of qubits. If the cost scales exponentially in the number of qubits, then quantum computing will offer little advantage over classical computing (a toy calculation after this list illustrates the point). Ray Kurzweil explains this as follows:
    A key question is: how difficult is it to add each additional qubit? The computational power of a quantum computer grows exponentially with each added qubit, but if it turns out that adding each additional qubit makes the engineering task exponentially more difficult, we will not be gaining any leverage. (That is, the computational power of a quantum computer will be only linearly proportional to the engineering difficulty.) In general, proposed methods for adding qubits make the resulting systems significantly more delicate and susceptible to premature decoherence.

    Kurzweil, Ray (2005-09-22). The Singularity Is Near: When Humans Transcend Biology (Kindle Locations 2152-2155). Penguin Group. Kindle Edition.
  • DNA computing (Wikipedia)
  • Other types of molecular computing (Technology Review featured story from 2000, TR story from 2010)
  • Spintronics (Wikipedia): The idea is to store information using the spin of the electron, a quantum property that is binary and can be toggled at zero energy cost (in principle). The main potential utility of spintronics is in data storage, but it could potentially help with computation as well.
  • Optical computing aka photonic computing (Wikipedia): This uses beams of photons that store the relevant information that needs to be manipulated. Photons promise to offer higher bandwidth than electrons, the tool used in computing today (hence the name electronic computing).
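
Returning to the quantum computing bullet above, here is the toy calculation promised there. The numbers are made up purely for illustration (they are not hardware data); the point is just Kurzweil's leverage argument about how power per unit of engineering effort depends on how qubit cost scales:

    # Toy illustration of the leverage argument for quantum computing.
    # State-space size grows as 2**n_qubits, but the payoff per unit of engineering
    # effort depends entirely on how the cost of adding qubits scales.
    for n_qubits in (10, 20, 30, 40):
        power = 2 ** n_qubits                # idealized computational power
        cost_if_linear = n_qubits            # optimistic: each qubit adds constant difficulty
        cost_if_exponential = 2 ** n_qubits  # pessimistic: each qubit doubles the difficulty
        print(f"{n_qubits} qubits: power/cost = {power / cost_if_linear:.1e} (linear cost), "
              f"{power / cost_if_exponential:.1f} (exponential cost)")
    # Under exponential cost the ratio stays at 1.0: no net leverage over classical computing.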

Uncertainty #2: Demand for computing

Even if computational advances are possible in principle, the absence of the right kind of demand can lead to a lack of financial incentive to pursue the relevant advances. I discussed the interaction between supply and demand in detail in this post.

As that post discussed, demand for computational power at the consumer end is probably reaching saturation. The main source of increased demand will now be companies that want to crunch huge amounts of data in order to more efficiently mine data for insight and offer faster search capabilities to their users. The extent to which such demand grows is uncertain. In principle, the demand is unlimited: the more data we collect (including "found data" that will expand considerably as the Internet of Things grows), the more computational power is needed to apply machine learning algorithms to the data. Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.
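
As a back-of-the-envelope illustration of that last point (the 40% growth figure is an assumption chosen purely for illustration, not a forecast): even with no new users or products, the implied growth in computing demand depends sharply on how the algorithm's cost scales with the data.

    # Back-of-the-envelope: yearly growth in compute demand if data grows 40% per year
    # and the algorithm's cost scales as data_size ** exponent. Assumed numbers only.
    data_growth = 1.4
    for exponent, label in [(1.0, "O(n) algorithm"),
                            (2.0, "O(n^2) algorithm"),
                            (3.0, "O(n^3) algorithm")]:
        compute_growth = data_growth ** exponent
        print(f"{label}: compute demand grows ~{(compute_growth - 1) * 100:.0f}% per year")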

Uncertainty #3: Interaction with software

Much of the increased demand for computing, as noted above, arises not so much from consumers' need for raw computing power as from the need for more computing power to manipulate and glean insight from large data sets. While there has been some progress with algorithms for machine learning and data mining, these fields are probably far from mature. So an alternative to hardware improvements is improvement in the underlying algorithms. In addition to the algorithms themselves, execution details (such as better use of parallel processing capabilities and more efficient use of idle processor capacity) can also yield huge performance gains.

This might be a good time to note a common belief about software and why I think it's wrong. We often hear of software bloat, and some people subscribe to Wirth's law, the claim that software is getting slower more quickly than hardware is getting faster. I think that some software products have gotten feature-bloated over time, largely because there are incentives to keep putting out new editions that people are willing to pay money for, and Microsoft Word might be one case of such bloat. For the most part, though, software has been getting more efficient, partly by utilizing new hardware better, but also partly due to underlying algorithmic improvements. This was one of the conclusions of Katja Grace's report on algorithmic progress (see also this link on progress on linear algebra and linear programming algorithms). There are a few software products that get feature-bloated and as a result don't appear to improve over time as far as speed goes, but people's revealed preferences arguably show that they are willing to put up with the lack of speed improvements as long as they're getting feature improvements.

Computing technology progress over the next 10-15 years: my three scenarios

  1. Slowdown to ordinary rates of growth of cutting-edge industrial productivity: For the last few decades, several dimensions of computing technology have experienced doublings over time periods ranging from six months to five years. With such fast doubling, we can expect price-performance thresholds for new categories of products to be reached every few years, yielding multiple new product categories per decade. Consider, for instance, desktops, then laptops, then smartphones, then tablets. If the doubling time reverts to the norm seen in other cutting-edge industrial sectors, namely 10-25 years, then we'd probably see the introduction of revolutionary new product categories only about once a generation (a rough compounding comparison appears after this list). There are already some indications of a possible slowdown, and it remains to be seen whether there will be a bounceback.
  2. Continued fast doubling: The other possibility is that the evidence for a slowdown is largely illusory, and computing technology will continue to experience doublings over timescales of less than five years. There would therefore be scope to introduce new product categories every few years.
  3. New computing paradigm with high promise, but requiring significant adjustment: This is an unlikely, but not impossible, scenario. Here, a new computing paradigm, such as quantum computing, reaches the realm of feasibility. However, the existing infrastructure of algorithms is ill-suited to quantum computing, and in fact, quantum computing endangers many existing security protocols while offering its own unbreakable ones. Making good use of this new paradigm requires a massive re-architecting of the world's computing infrastructure.
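
To make the gap between scenarios (1) and (2) concrete, here is the rough compounding comparison mentioned above. The doubling times are illustrative picks from the ranges given in the scenarios (2 years for fast doubling, 18 years for the industrial norm), not forecasts:

    # Compounding comparison over the post's 10-15 year horizon (here, 15 years).
    # Doubling times are illustrative values from the ranges given in the scenarios.
    horizon_years = 15
    for label, doubling_time in [("fast doubling (scenario 2)", 2),
                                 ("industrial-norm doubling (scenario 1)", 18)]:
        improvement = 2 ** (horizon_years / doubling_time)
        print(f"{label}: ~{improvement:.1f}x improvement over {horizon_years} years")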

There are two broad features that are likely to be common to all scenarios:

  • Growing importance of algorithms: Scenario (1): If technological progress in computing power stalls, then the pressure for improvements to algorithms and software may increase. Scenario (2): If technological progress in computing power continues, that might only feed the hunger for bigger data. And as the size of data sets increases, asymptotic performance starts mattering more (the distinction between O(n) and O(n^2) matters more when n is large; see the numeric sketch after this list). In both cases, I expect more pressure on algorithms and software, but in different ways: in the case of stalling hardware progress, the focus will be more on improving the software and making minor changes to improve the constants, whereas in the case of rapid hardware progress, the focus will be more on finding algorithms that have better asymptotic (big-oh) performance. Scenario (3): In the case of a paradigm shift, the focus will be on algorithms that better exploit the new paradigm. In all cases, there will need to be some sort of shift toward new algorithms and new code that better exploit the new situation.
  • Growing importance of parallelization: Although the specifics of how algorithms will become more important vary between the scenarios, one common feature is that algorithms that can make better parallel use of large numbers of machines will become more important. We have seen parallelization grow in importance over the last 15 years, even as the computing gains for individual processors through Moore's law seem to be plateauing and data centers have proliferated in number. However, the full power of parallelization is far from tapped out. Again, parallelization matters for slightly different reasons in different cases. Scenario (1): A slowdown in technological progress would mean that gains in the amount of computation can largely be achieved only by scaling up the number of machines. In other words, the usage of computing shifts further in a capital-intensive direction. Parallel computing is important for effective utilization of this capital (the computing resources). Scenario (2): Even in the face of rapid hardware progress, automatic big data generation will likely grow much faster than storage, communication, and bandwidth. This "big data" is too huge to store or even stream on a single machine, so parallel processing across huge clusters of machines becomes important. Scenario (3): Note also that almost all the new computing paradigms currently under consideration (including quantum computing) offer massive advantages for special types of parallelizable problems, so parallelization matters even in the case of a paradigm shift in computing.
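
Here is the numeric sketch referenced in the first bullet. It is pure operation-count arithmetic, not a benchmark of any real workload: even a large constant-factor improvement to an O(n^2) algorithm is eventually swamped by a better asymptotic complexity as n grows.

    # Pure operation counts: a 100x constant-factor speedup on an O(n^2) algorithm
    # versus an untuned O(n log n) algorithm, at increasing data sizes.
    import math

    for n in (10**3, 10**6, 10**9):
        tuned_quadratic = (n ** 2) / 100   # O(n^2) with a 100x constant-factor speedup
        nlogn = n * math.log2(n)           # O(n log n), no special tuning
        print(f"n={n:>13,}: tuned O(n^2) costs {tuned_quadratic / nlogn:,.0f}x "
              f"as many operations as O(n log n)")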

Other scenario analyses

McKinsey carried out a scenario analysis here, focused more on the implications for the semiconductor manufacturing industry than for users of computing. The report notes the importance of Moore's law in driving productivity improvements over the last few decades:

As a result, Moore’s law has swept much of the modern world along with it. Some estimates ascribe up to 40 percent of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements.

The scenario analysis identifies four potential sources of innovation related to Moore's law:

  1. More Moore (scaling)
  2. Wafer-size increases (maximize productivity)
  3. More than Moore (functional diversification)
  4. Beyond CMOS (new technologies)

Their scenario analysis uses a 2 × 2 model, with the two dimensions under consideration being performance improvements (continue versus stop) and cost improvements (continue versus stop). The case where both performance improvements and cost improvements continue is the "good" case for the semiconductor industry. The case where both stop is the one where the industry is highly likely to become commodified, with profit margins going down and small players catching up to the big ones. In the intermediate cases (where one of the two continues and the other stops), consolidation of the semiconductor industry is likely to continue, but there is still a risk of falling demand.

The McKinsey scenario analysis was discussed by Timothy Taylor on his blog, The Conversable Economist, here.

Roland Berger carried out a detailed scenario analysis focused on the "More than Moore" strategy here.

Blegging for missed scenarios, common features, and early indicators

Are there scenarios that the analyses discussed above missed? Are there some types of scenario analysis that we didn't adequately consider? If you had to do your own scenario analysis for the future of computing technology and hardware progress over the next 10-15 years, what scenarios would you generate?

As I noted in my earlier post:

The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.

I already identified some features I believe to be common to all scenarios (namely, increased focus on algorithms, and increased focus on parallelization). Do you agree with my assessment that these are likely to matter regardless of scenario? Are there other such common features you have high confidence in?

If you generally agree with one or more of the scenario analyses here (mine or McKinsey's or Roland Berger's), what early indicators would you use to identify which of the enumerated scenarios we are in? Is it possible to look at how events unfold over the next 2-3 years and draw intelligent conclusions from that about the likelihood of different scenarios?

Comments

I think your predictions about where Moore's Law will stop are wildly pessimistic. You quote EETimes saying that "28nm is actually the last node of Moore's Law", but Intel is already shipping processors at 22nm! Meanwhile on an axis entirely orthogonal to transistor size and count, there's a new architecture in the pipeline (Mill) which credibly claims an order of magnitude improvement in perf/power and 2x in single-threaded speed. Based on technical details which I can't really get into, I think there's another 2x to be had after that.

I think continued progress of Moore's law is quite plausible, and that was one of the scenarios I considered (Scenario #2). That said, it's interesting that you express high confidence in this scenario relative to the other scenarios, despite the considerable skepticism of computer scientists, engineers, and the McKinsey report.

Would you like to make a bet for a specific claim about the technological progress we'll see? We could do it with actual money if you like, or just an honorary bet. Since you're claiming more confidence than I am, I'd like the odds in my favor, at somewhere between 2:1 and 4:1 (details depend on the exact proposed bet).

My suggestion to bet (that you can feel free to ignore) isn't intended to be confrontational. cf.

http://econlog.econlib.org/archives/2012/05/the_bettors_oat.html

[anonymous]:

"28nm is actually the last node of Moore's Law" is referring to the "Density at minimum cost per transistor" version of Moore's Law, not the "smallest feature size we can get".

Yeah, but if the cost per transistor were going up, you'd expect them to stop there. But a little googling turned up this press release talking about 14nm, and this roadmap which extends out to 5nm.

[anonymous]:

There are applications where smaller sizes make sense, even at higher cost.

One point that I think might be interesting wrt saturation in consumer computing demands is the possibility of different interaction paradigms driving a dramatic increase in demand.

The example I have in mind for this is something like Oculus Rift, or more generally, VR. While consumer demand for computing power may be nearing saturation under the desktop paradigm, other modes of interaction require DRAMATICALLY more computing power to perfect the experience.

So if VR actually takes off there may be continued, or even greatly increased, demand by consumers for increased computing power. This would help ensure continued investment in the cutting edge, because even small increases could impact the experience in significant ways.

Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.

Algorithms to find the parameters for a classifier/regression, or algorithms to make use of it? And if I've got a large dataset that I'm training a classifier/regression on, what's to stop me from taking a relatively small sample of the data in order to train my model on? (The one time I used machine learning in a professional capacity, this is what I did. FYI I should not be considered an expert on machine learning.)

(On the other hand, if you're training a classifier/regression for every datum, say every book on Amazon, and the number of books on Amazon is growing superlinearly, then yes I think you would get a robust increase.)

Good question.

I'm not an expert in machine learning either, but here is what I meant.

If you're running an algorithm such as linear or logistic regression, then there are two relevant dimensions: the number of data points, and the number of features (i.e., the number of parameters). For the design matrix of the regression, the number of data points is the number of rows and the number of features/parameters is the number of columns.

Holding the number of parameters constant, it's true that if you increase the number of data points beyond a certain amount, you can get most of the value through subsampling. And even if not, more data points is not such a big issue.

But the main advantage of having more data is lost if you still use the same (small) number of features. Generally, when you have more data, you'd try to use that additional data to use a model with more features. The number of features would still be less than the number of data points. I'd say that in many cases it's about 1% of the number of data points.

Of course, you could still use the model with the smaller number of features. In that case, you're just not putting the new data to much good use. Which is fine, but not an effective use of the enlarged data set. (There may be cases where even with more data, adding more features is no use, because the model has already reached the limits of its predictive power).

For linear regression, the algorithm to solve it exactly (using normal equations) takes time that is cubic in the number of parameters (if you use the naive inverse). Although matrix inversion can in principle be done faster than cubic, it can't be faster than quadratic, which is a general lower bound. Other iterative algorithms aren't quite cubic, but they're still more than linear.
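
To make that concrete, here is a minimal numpy sketch of the normal-equations approach; the data sizes are toy values chosen for illustration. Forming X^T X costs on the order of n·d^2 and the dense solve on the order of d^3, which is why the number of features, and not just the number of data points, drives the cost.

    # Linear regression via the normal equations: beta = (X^T X)^{-1} X^T y.
    # Toy sizes: n data points, d features, with d much smaller than n.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100_000, 200
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + rng.normal(scale=0.1, size=n)

    XtX = X.T @ X                     # costs on the order of n * d^2 operations
    Xty = X.T @ y                     # costs on the order of n * d operations
    beta = np.linalg.solve(XtX, Xty)  # dense solve, on the order of d^3 operations

    print(beta[:5])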

That makes sense. And based on what I've seen, having more data to feed in to your model really is a pretty big asset when it comes to machine learning (I think I've seen this article referenced).

Moore's Law has continued unabated for decades, with continuous predictions of its imminent demise. I would put a high level of confidence on its continuing for a while longer, possibly with a minor slowdown or a brief pause of a couple of years at some point, such as what occurred in 2008 with the great recession.

[knb]:

Interesting post. I thought this comparison from CNET was a bit misleading:

"I drive a 1964 car. I also have a 2010. There's not that much difference -- gross performance indicators like top speed and miles per gallon aren't that different. It's safer, and there are a lot of creature comforts in the interior," said Nvidia Chief Scientist Bill Dally. If Moore's Law fizzles, "We'll start to look like the auto industry."

Car progress is clearly slower than computer progress, but it does seem very substantial:

The most recent survey by the Consumer Reports National Research Center found that five-year-old vehicles had about one-third fewer problems than the five-year-old vehicles we studied in April 2005. In fact, owners of about two-thirds of those vehicles reported no problems. And serious repairs, such as engine or transmission replacement, were quite rare.

One third fewer problems every 5 years seems like a very substantial rate of progress. It would be interesting to see if this rate of progress has continued from 2010-2014.

To be blunt, I don't believe Dally. A while back, in the context of technological stagnation, I compared a 2012 Ford Focus to a 1970 Ford Maverick -- both popular midrange compact cars for their time -- and found that the Focus beat the pants off the Maverick on every metric but price (it cost about twice what the Maverick did, adjusted for inflation). Roughly twice the engine power with 1.5 to 2x the gas mileage; more interior room; far safer and more reliable; vastly better amenities.

It's not scaling as fast as Moore's Law by any means, but progress is happening. That might be tempered a bit by the price point, but reliability alone would be a strong counter to that once you amortize over the lifetime of the car.

My scenario #1 explicitly says that even in the face of a slowdown, we'll see doubling times of 10-25 years: "If the doubling time reverts to the norm seen in other cutting-edge industrial sectors, namely 10-25 years, then we'd probably see the introduction of revolutionary new product categories only about once a generation."

So I'm not predicting complete stagnation, just a slowdown where computing power gains aren't happening fast enough for us to see new products every few years.