
Traveling every week so far in 2007 has had a major impact on my blogging frequency--I haven't had time to read my feeds in weeks, let alone write anything. But when I started this blog in 2005, I had a goal of writing at least one post a week, and I'm determined to get back to that. And last night, after a 12-hour day, when I returned to my hotel room to start a post, how does Blogger greet me? By letting me know that Servlet NewFrontend is currently unavailable. Great. Can someone remind me why I use this service?
Now, back to our regularly scheduled program.
Spending a ton of time at customer sites this year has given me a vastly improved perspective on how much demand there is for distributed computing power. I've been at a customer site where people were asking for time on a brand-new development grid because they needed to get production analysis runs completed. I've seen people running from desktop machine to desktop machine starting analysis software. And, of course, there's the most prevalent "grid" out there: using remote desktop to access many servers and starting processes on each.
And why exactly are people resorting to these slow, inefficient methods of getting work done? Can distributed computing even help them? Of course it can. Because, after all, it's a delightfully parallel world.
Several years ago, long before Digipede had released a product but after we had decided on a feature set, one of the luminaries of distributed computing told us that the problem with a system like ours was that it could only solve "embarrassingly parallel" problems.
For those of you unfamiliar with it, the term embarrassingly parallel refers to computing problems that are easy to segment for separate, parallel computation. Moreover, embarrassingly parallel problems require no communication between the various pieces of the problem.
There has long been a feeling in the academic computer science community that "embarrassingly parallel" problems aren't worth spending time on. Academics have been much more intent on solving the problems that can't be easily broken up--the ones that require constant communication and direct memory access between processes. Fields like Finite Element Analysis and Computational Fluid Dynamics, for example, are enormously complex, require vast amounts of computing power, and have great computer science minds struggling to come up with new and innovative technologies.
While the academics have been solving these very difficult problems, they've been looking down their noses at embarrassingly parallel problems--the name itself is quite condescending.

But when you go out into corporate America, and you look at the problems that most developers are trying to solve, and you look at the compute loads that are strangling most overworked servers, you find a nasty little secret:
It's a delightfully parallel world.
That industry luminary told us that, in his estimation, perhaps 10% of computing problems might be considered embarrassingly parallel--everything else requires "real" distributed computing.
Having spent a bunch of time with customers, I think he is exactly wrong. Why? Because it's a delightfully parallel world.
Most customers out there who are adapting their software to run on a grid or cluster aren't tearing apart their algorithms, rewriting every line of code with a complex toolkit so it runs across processors. Carving up an algorithm like that is amazingly difficult and requires enormous expertise.
No, customers do something far more efficient and practical: instead of trying to carve up their algorithms, they break up their data.
Say you've written a routine that can analyze the risk in a customer's portfolio--it runs for 5 seconds. If you have 1,000 customers, it's going to take nearly an hour and a half to run. Imagine you have 20 servers--you'd really like to spread that work around to get it done quicker. Now, you could try to rewrite that algorithm so that it uses multiple processors simultaneously, but that would involve complex technology like MPI and a complete rearchitecting of your routine. Here's a much easier solution: leave your algorithm exactly the way it is now, and break up the data instead. Each server analyzes 50 customers, and your analysis is done in about 4 minutes. Why was that possible?
Because it's a delightfully parallel world. Your customers' portfolios aren't dependent on each other--each can be analyzed independently.
And when you venture into corporate America, and you look at the server loads, you see that most of the analysis they are doing falls into this category.
A special effects company needs to render 50,000 frames for a scene. An electric power company needs to generate 20,000 complex bills for their largest customers. A web application needs to generate PDFs for users on the website. A bioinformatician needs to check 300 different proteins to see how well they dock on a segment of DNA. A trader needs to try 50 different trading algorithms against the history of a stock's performance.
All of these are daunting problems in terms of computing capacity--and all can be solved in parallel by dividing up the data.
Now, before the MPI-jockeys take me to task, some disclaimers: I don't pretend that every problem in the world can be divided like this, and I understand that dividing data can be a complex task in its own right. Moreover, what you guys do is really, really hard. I get that, and I'm glad you're out there solving those problems.
But for the other 90% of developers out there: don't rewrite your algorithms. Break up your data.
Because, as John Powers says, it's a delightfully parallel world!
Photo credit: Scott Liddell
Dan - I would be interested to know whether you've looked at research that groups like the IDC (and their HPC User Forum) may have done to accurately survey the parallel jobs mix. I suspect they have, and this might give you a better feel for quantitative stats on the mix of synchronous and asynchronous (ie, embarrassingly parallel) parallel workloads out there.
One of the ways that I am used to talking about the amount of work to be done is to quantify it in terms of processor hours spent on a problem (1,000 processors running together for 2 hours uses 2,000 cpu hours). It is possible that your industry luminary was talking about processor hours when he came up with that 10% number for the amount of totally asynchronous parallel computation that needs to be done. That number sounds like it might be believable in that context, given that my users routinely bring allocations of 10 million hours to my HPC center to solve a single tightly synchronous problem in science and engineering (this could be many, many runs).
I do support users even on the S&E side of the house who run on 1-4 processors in ensembles of mostly asynchronous jobs (in fact, this is about 20% of my job mix). But 80% of the total hours I provide in a year are consumed by users solving problems that use at least 256 processors at one time. I think it depends a lot on your domain.
John -
First - I believe your premise. In terms of processor hours, the scientific problems out there consume enormous amounts of CPU power that dwarf the things I'm talking about. I don't think that the luminary in question was thinking that way, but I think you are correct.
I have, in fact, seen some IDC numbers and talked to their analysts, and here I run into another situation where the HPC industry doesn't quite understand what's happening in the enterprise world.
According to IDC, the work that my financial customer does on their, say, 300-processor grid isn't HPC at all. Every night they may consume over 3,000 CPU hours performing their analysis, and sometimes just as much during the work day. I realize it's paltry by the standards you are talking about, but in corporate America it's nothing to sneeze at (and, yes, they're working toward a much larger grid).
However, according to IDC's definition, this isn't HPC. It's production work, and therefore they put it in a different category. When IDC presents its sales numbers for 2007, the 200 servers (800 procs) this customer buys this year won't be counted as HPC server sales--they'll be regular old enterprise server sales. The software they buy from us? Enterprise software, not HPC software.
The problem is that the line between HPC and enterprise computing is not a line at all; it's a big, blurry continuum. For their purposes, IDC needs to draw a line somewhere--and nearly all of our customers' purchases fall on the "enterprise" side of the line. That's fine with me--IDC's classification of my company's sales doesn't affect our bottom line!
But it does affect the quality of the data they present. Their definition of HPC is a traditional one, and it happens to be centered around scientific and engineering applications (not surprising--that's where HPC came from).
Really, the point of my post wasn't that there aren't lots of engineering problems out there that require zillions of tightly coupled processors; rather, my point is that there are millions of enterprise jobs out there that can be solved with tens, hundreds, or thousands of loosely coupled processors. Call it the long tail of distributed computing, if you like.
Hmm, I like that phrase. I may have to turn this into a blog post!
Dan -
Good points. I'm a dyed-in-the-wool scientific and technical HPC guy, but I'm one of the few in my immediate community who subscribes to the "million monkeys computing" theory.
It's the same as the "million monkeys typing" theory, but instead of the monkeys eventually producing Hamlet, out of a million new users of HPC, someone uses the power of computing to do what HPC does: change a technology, a community, or the entire world.
The things I'm championing on the scientific side resonate deeply with what you're doing on the enterprise side: reduce the barriers to entry so that everyone has a chance to get at these capabilities.
There is a lot of resistance to expanding definitions along the lines you're talking about--not just in HPC, but by human nature everywhere. But I think it's just a matter of time.