Friday, March 09, 2007

The Long Tail of Distributed Computing


InsideHPC is by far the most prolific of the HPC blogs I follow. John E. West only started the site about two months ago, but he posts more than once a day with informative, HPC-related news and commentary.

And he still finds time to comment on other blogs!

He left a couple of interesting comments on my previous post; as my response grew lengthier, I decided I needed to put together yet another post. (I'm going to pull liberally from my own comments on that post.)

As part of his comment, John said to me...
I would be interested to know whether you've looked at research that groups like the IDC (and their HPC User Forum) may have done to accurately survey the parallel jobs mix.
I don't know that I've read about the parallel jobs mix, John, but I have seen their growth numbers. Until a few years ago, most HPC sales were in the category IDC calls "capability" class systems--the very largest clusters and supercomputers, costing well over $1,000,000 per sale. Things below that were either "enterprise" or a smaller category called "capacity."

What's happened since then? Well, that "capacity" category got so big--over half of all HPC sales--that IDC had to break it into three separate categories: divisional, departmental, and workgroup. Together they still account for more than half of all HPC sales, and they are growing much, much faster than the larger systems.

This jibes exactly with what we hear in the market and from our partners. The smaller systems are growing, and growing at an amazing rate.

And there's one other factor: IDC isn't counting a lot of these sales at all.

According to IDC, the work my financial customer does on their, say, 300-processor grid isn't HPC at all. Every night they may consume over 3,000 CPU-hours performing their analysis, and sometimes just as much during the work day. I realize it's paltry by the standards John is talking about (1,000,000 CPU-hour jobs), but in corporate America it's nothing to sneeze at.
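To put that nightly figure in rough perspective, here's a back-of-the-envelope sketch in Python; the ten-hour overnight window is my own assumption, not the customer's actual schedule:

processors = 300        # size of the grid in the example above
overnight_hours = 10    # assumed length of the nightly batch window
cpu_hours = processors * overnight_hours
print(cpu_hours)        # 3000 -- in the neighborhood of the nightly CPU-hours cited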

However, according to IDC's definition, this isn't HPC. It's production work, so it goes in a different category. When IDC presents its sales numbers for 2007, the 200 servers (800 processors) this customer buys this year won't be counted as HPC server sales--they'll be regular old enterprise server sales. The software they buy from us? Enterprise software, not HPC software.

The problem is that the line between HPC and enterprise computing is not a line at all; it's a big, blurry continuum. For its purposes, IDC needs to draw a line somewhere--and nearly all of our customers' purchases fall on the "enterprise" side of it. That's fine with me--IDC's classification of my company's sales doesn't affect our bottom line!

IDC's definition of HPC is a traditional one, centered on scientific and engineering applications. That's not surprising--that's where HPC came from, and IDC is one of the few organizations honestly trying to come up with accurate HPC numbers.

The thing is, it's easy to classify those huge, megamillion-dollar supercomputers as HPC. They're easy to identify (and there aren't really that many organizations that either need them or can pay for them). It's much harder to determine how every 128-, 64-, and 32-node cluster is being used.

I think of it as the long tail of distributed computing. Much like Chris Anderson's famous "long tail" of products, where the bulk of sales lies in the many less-popular items, the long tail of distributed computing puts the bulk of sales in the hundreds of thousands of smaller installations.

I'm not trying to say that there aren't lots of engineering problems out there that require zillions of tightly coupled processors. There are thousands of science and engineering firms that need the capabilities of traditional HPC.

I am trying to say that there are hundreds of thousands of companies out there that do reporting. Analysis. Content creation. Content repurposing. None of those companies will ever have a million-CPU-hour job. But they will have millions of 1 CPU-hour jobs (the kind that a 5-node grid finishes in 12 minutes!). And nearly all of those companies are outside the science and engineering realm.
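The arithmetic behind that parenthetical is simple; here's a rough sketch, assuming the job splits evenly and each node contributes one processor (my simplification):

cpu_hours = 1                              # total work in the job
nodes = 5                                  # small grid from the example
wall_clock_minutes = cpu_hours / nodes * 60
print(wall_clock_minutes)                  # 12.0 -- one CPU-hour spread across five nodes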

The long tail of distributed computing is rich and varied indeed. And it's growing longer every day.

And why is that exciting? I'll close with a quote I pulled from John E. West's comment:
...out of a million new users of HPC, someone uses the power of computing to do what HPC does: change a technology, a community, or the entire world.

The things I'm championing on the scientific side resonate deeply with what you're doing on the enterprise side: reduce the barriers to entry so that everyone has a chance to get at these capabilities.
Exactly, John. HPC or not, distributed computing lets people do things that individual machines could never accomplish.