Tuesday, March 27, 2007

That's great. Now how can I use three trailers?

In his post "Showdown in the trailer park II," the always enjoyable Nick Carr excerpts a paper by James Hamilton of Microsoft.

The paper details the use of shipping containers filled with racks of commodity hardware as a low-cost way to add computing power to data centers. According to the abstract:

Large numbers of low-cost, low-reliability commodity components are rapidly replacing high-quality, mainframe-class systems in data centers. These commodity clusters are far less expensive than the systems they replace, but they can bring new administrative costs in addition to heat and power-density challenges. This proposal introduces a data center architecture based upon macro-modules of standard shipping containers that optimizes how server systems are acquired, administered, and later recycled.

Carr contrasts Hamilton's views with those of Greg Papadopoulos of Sun, who believes that specialized hardware (like the products Sun and Rackable have released) will populate these supercomputing trailer parks.

Either way, I think all of these gentlemen are missing a key piece of the puzzle: how is the average company, with average developers, going to use this stuff? If you dump 9,600 cores in the lap of Elvis or Mort, what is he going to do with them?

The answer is: if you don't make it a lot easier than it is now, he's not.

There have long been tools that let the Einsteins of software write parallel and massively parallel programs. And the Googles of the world (as well as the Live.coms, the Boeings, the Lockheed Martins, the Barclays Capitals) have the resources to hire and train developers like that. These are the companies you read about running grids of thousands of processors.

But distributed computing will not (and indeed cannot) move into the mainstream, and into its long tail, without giving the average developer access to these tools.

Using two cores is too hard for most developers. Using 9,600? Out of the question.

Already, the growth in the HPC industry is in the smaller companies, in the smaller departments, and in the smaller implementations. The growth isn't in the five 1,000-node grids that will go in next year; it's in the fifty 100-node grids that will go in.

But to make that number larger, to sell five hundred 100-node grids, this hardware has to become much more accessible. And by "accessible," I don't mean solving the problem of how to ship those containers. I mean giving developers the means to take advantage of these processors.

The canonical "long tail" example is, of course, Amazon. Amazon learned early on that the bulk of its sales came not from a few blockbusters that sell thousands of copies a day, but from the tens of thousands of books that sell a few copies a day. Sure, Harry Potter is important to them, but there are only so many Harry Potters released each year. Most books simply don't sell that many copies.

This market will grow the same way. The headlines will come from the latest 5,000-node grid, but these will be few and far between. Most companies simply don't need that kind of computing power. No, in five years, there will be many more processors in 100-node grids than in 5,000-node grids.

That is, the market will grow this way as long as the developer tools mature along with the hardware. The book market needed Amazon to figure out how to sell books over the internet to enable its long tail. The grid computing market needs great developer tools to enable its own.

Sun may be learning this lesson. According to Ashlee Vance of The Register, Sun is trying to boost demand for its disappointing Sun Grid product by making it easier for developers to use. My opinion? From what I've read, they don't understand ease of use.

As far as the hardware is concerned, I think James Hamilton has it right. If MSN and Google think commodity hardware is the way to go (and if solutions from companies like Sun continue to be so hard to use), the long tail will follow that commodity path.