Watts, son — LessWrong

Some interesting numbers to contextualize IBM’s Watson:

90 Power 750 Express servers, each with 4 CPUs, each of those having 8 cores
Total of 15TB RAM (yep, all of Watson’s data was stored in RAM for rapid search. The human brain’s memory capacity is estimated at between 3 and 6 TB, and not all of that functions like RAM, and it’s implemented in meat.)
Each of the Power 750 Express servers seems to consume a maximum of 1,949 watts, making a total of about 175 kW for the whole computer
There also appears to be a sophisticated system connected to the Jeopardy buzzer but I can’t find power specs for that part.
IBM estimates that Watson can compute at about 80 teraflops (tera = 10^12). This paper mentions in passing that the human brain operates in the petaflop range (peta = 10^15), but at the same time, a brain is not a digital system and so the flop comparison is less meaningful.

To put this in perspective, a conservative upper bound on the power draw of a human being standing still is about 150 W, less than 1/10 of 1% of Watson’s, and the person just holds the buzzer and operates it with a muscular control system.
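
The power comparison above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the post (90 servers, 4 CPUs of 8 cores each, 1,949 W maximum per server, 150 W for the human); the constant and variable names are made up for illustration, and the result is a ceiling on draw rather than a measurement.

```python
# Figures as quoted in the post; names are illustrative only.
SERVERS = 90
CPUS_PER_SERVER = 4
CORES_PER_CPU = 8
MAX_WATTS_PER_SERVER = 1_949   # per-server maximum, not measured draw
HUMAN_WATTS = 150              # the post's conservative upper bound

total_cores = SERVERS * CPUS_PER_SERVER * CORES_PER_CPU
total_watts = SERVERS * MAX_WATTS_PER_SERVER
human_share = HUMAN_WATTS / total_watts  # human power as a fraction of Watson's

print(f"cores: {total_cores}")
print(f"max power: {total_watts / 1000:.1f} kW")
print(f"human / Watson: {human_share:.4%}")
```

With these inputs the human's share comes out just under 0.1%, which is where the "less than 1/10 of 1%" figure comes from.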

Each of the servers generates a maximum of 6,649 BTU/hour. Watson overall would generate about 600,000 BTU/hour and require massive amounts of air conditioning. I don’t know a good estimate on heat removal, but it would up Watson’s energy cost significantly.
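
As a rough sanity check on the heat figures, the sketch below multiplies out the per-server number and compares it with the per-server wattage. It assumes the standard conversions of about 3.412 BTU/hour per watt and 12,000 BTU/hour per ton of refrigeration; the names are illustrative. It says nothing about how much electricity the cooling itself would use, which the post leaves open.

```python
# Figures as quoted in the post; conversion factors are standard values.
SERVERS = 90
MAX_BTU_PER_HOUR_PER_SERVER = 6_649
MAX_WATTS_PER_SERVER = 1_949
BTU_PER_HOUR_PER_WATT = 3.412             # 1 W is about 3.412 BTU/hour
BTU_PER_HOUR_PER_TON_OF_COOLING = 12_000  # 1 ton of refrigeration

total_btu_per_hour = SERVERS * MAX_BTU_PER_HOUR_PER_SERVER

# Cross-check: the heat figure should match the electrical figure,
# since essentially all power drawn ends up as heat.
btu_from_watts = MAX_WATTS_PER_SERVER * BTU_PER_HOUR_PER_WATT

cooling_tons = total_btu_per_hour / BTU_PER_HOUR_PER_TON_OF_COOLING

print(f"total heat: {total_btu_per_hour:,} BTU/hour")
print(f"per-server heat implied by wattage: {btu_from_watts:.0f} BTU/hour")
print(f"cooling load: about {cooling_tons:.0f} tons of refrigeration")
```

The per-server heat and power numbers agree to within rounding, so the 600,000 BTU/hour total is just the 175 kW figure restated in different units.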

I don’t mean to criticize Watson unduly; it certainly is an impressive engineering achievement and has generated a lot of good publicity and public interest in computing. The engineering feat is impressive if for no other reason than that it is the first accomplishment of this scale, and pioneering is always hard… future Watsons will be cheaper, faster, and more effective because of IBM’s great work on this.

But at the same time, the amazing power and storage costs for Watson really kind of water it down for me. I’m not surprised that if you throw power and hardware and memory at a problem, you can use rather straightforward machine learning methods to solve it. I feel similarly about Deep Blue and chess.

A Turing test that would be more impressive to me would be building something like Watson or Deep Blue that is not allowed to consume more power than an average human, and has comparable memory and speed. The reason this would be impressive is that in order to build it, you’d have to have some way of representing data and reasoning in the system that is about as efficient as the one human minds use. One thing you could not do is simply concatenate an unreasonable number of large feature vectors together and overfit a machine learning model. Since this is an important open problem with lots of implications, we should use funding and publicity to drive research organizations like IBM towards that goal. Maybe building Watson is a first step and now the task is to miniaturize Watson, and in doing so, we’ll be forced to learn about efficient brain architectures along the way.

Note: I gathered the numbers above by looking here and then scouring around for various listings of specific hardware specs. I'm willing to believe some of my numbers might be off, but probably not significantly.

There's a categorical difference between "try to find a reasonable solution" and "throw money at this until it's no longer a problem" and you're acting like there isn't. I already made exactly the same comments you have in the OP, where I said:

I don’t mean to criticize Watson unduly; it certainly is an impressive engineering achievement and has generated a lot of good publicity and public interest in computing. The engineering feat is impressive if for no other reason than that it is the first accomplishment of this scale, and pioneering is always hard… future Watsons will be cheaper, faster, and more effective because of IBM’s great work on this.

But there's a categorical difference in the two approaches. In my own field of computer vision, it's like this: if you want to understand how face recognition works, you will study the neuroscience of primate brains and come up with compact and efficient representations of the problem that can run in a manner similar to the way primates do it. If you just want to recognize faces right now, you just concatenate every feature vector imaginable at every scale level that could conceivably be relevant and you train 10,000 SVMs over a month and then use cross-validation and mutual information to reduce that down to a "lean" set of 2,000 SVMs and there you go, you've overfitted a solution that still leaves face recognition as a total black box, and you use orders of magnitude more resources and time to get that solution.

It's interesting that current researchers who spent years working on the primate brain / Barlow infomax principle idea and studied monkey face recognition at Caltech, and couldn't do good face recognition for years, are now blowing face.com and other proprietary face recognition software out of the water.

There's a categorical difference between even trying to solve the hard problem and resorting to using more resources when you have to, vs. just overblowing the whole thing and not even making an attempt at solving the hard problem. From what I know about natural language processing, machine learning, and Watson, Watson is the latter approach and its power and memory consumption reveal it to be quite unimpressive... though hopefully trying to miniaturize it will spawn interesting engineering research.

I already made exactly the same comments you have in the OP

Yeah, I read them at different times, and missed that.