Monday, October 10, 2005

Grids? Clusters? Distributed Computing?

Over at ADTMag there's an interesting article today on the use of the word "cluster" versus the use of the word "grid."

John K. Waters interviews Donald Becker, co-founder of the original Beowulf project. Becker points out that a lot of people use the word "grid" when describing something a lot narrower than "grid" may imply.

"Grid is a concept that involves working with a large number of separately administered machines. With grid, you don’t control the configuration, the operating systems, the libraries installed—anything."

He makes a good, albeit tardy, point. He's tardy because the word "grid" now means so many things to so many people that it's impossible to define. I have yet to attend a grid event that doesn't start with hours of discussion over what "grid" means.

For a long time here at Digipede we avoided the term "grid" entirely, using "distributed computing" instead. And we defined it like this: "combining multiple computers to deliver increased performance on compute-, data-, and transaction-intensive applications." We avoided the term "grid" because we work on only one OS. After all, our software doesn't run on all operating systems, and it doesn't try to hide that fact. So why did we switch and start saying "grid"?

Simply because more people understand what "grid computing" means than understand what "distributed computing" means. A lot of people think that .NET remoting in and of itself is "distributed computing." The terminology becomes more confusing when you add in a term like "utility computing." Becker says:
"So-called grid computing solutions for small to midsize businesses are more likely to be utility computing or clustering solutions"

That's just the opposite of how I usually hear "utility computing" used. Most people use it to mean "just plug into the network and get computing cycles, just like you get electrons." It's the ultimate in transparency, and it's years and years away. Clearly, that's not what Becker means. We have different ideas about "utility."

So what is grid? I've stopped trying to define it. Using multiple CPUs in multiple boxes, you can get more work done faster. Call it grid, call it utility, call it cluster, call it distributed computing, call it what you will. It doesn't matter to me.

All I know is that, no matter what you call it, if you're not doing it, your software is running too slowly.