Tuesday, March 27, 2007

That's great. Now how can I use three trailers?

In his post Showdown in the trailer park II, the always enjoyable Nick Carr excerpts a paper from James Hamilton of Microsoft.

The paper, available here, details the use of shipping containers filled with racks of commodity hardware as a low-cost method of adding power to data centers. According to the abstract,

Large numbers of low-cost, low-reliability commodity components are rapidly replacing high-quality, mainframe-class systems in data centers. These commodity clusters are far less expensive than the systems they replace, but they can bring new administrative costs in addition to heat and power-density challenges. This proposal introduces a data center architecture based upon macro-modules of standard shipping containers that optimizes how server systems are acquired, administered, and later recycled.
Carr contrasts Hamilton's views with those of Greg Papadopoulos from Sun, who thinks that specialized hardware (similar to those products released by Sun and Rackable) will be the residents of supercomputing trailer parks.

In either case, I think all of these gentlemen are missing a key piece in this puzzle: how is the average company, with average developers, going to use this stuff? If you dump 9600 cores in the lap of Elvis or Mort, how is he going to use them?

The answer is: if you don't make it a lot easier than it is now, he's not.

There have long been tools for the Einsteins of software to write parallel and massively parallel software. And the Googles of the world (as well as the Live.coms, the Boeings, the Lockheed Martins, the Barclays Capitals) have the resources to buy and train developers like that. These are the companies you read about that are utilizing thousands of processors in grids.

But distributed computing will not (and indeed cannot) move into the mainstream, and into its long tail, without giving the average developer access to these tools.

Utilizing 2 cores is too hard for most developers. Using 9600? Out of the question.
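To see why even two cores are hard, consider the simplest possible case. The sketch below (mine, in Python; the workload is hypothetical, not from any product mentioned here) is about as easy as parallel code gets--an embarrassingly parallel loop with no shared state--and it still forces the developer to restructure the program around a worker pool:

```python
from multiprocessing import Pool

def simulate(seed):
    # Hypothetical stand-in for a real per-task workload
    # (pricing a portfolio, rendering a frame, running a report).
    total = 0
    for i in range(1000):
        total += (seed * i) % 97
    return total

if __name__ == "__main__":
    # Even this "easy" case demands restructuring: the work must live in a
    # top-level function, the entry point must be guarded, and results
    # must be collected explicitly.
    with Pool(processes=2) as pool:
        results = pool.map(simulate, range(8))
    print(sum(results))
```

The moment tasks share state, need ordering, or can fail partway through, the difficulty climbs steeply--which is exactly the gap good developer tools have to close.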

Already, the growth in the HPC industry is in the smaller companies, the smaller departments, and the smaller implementations. The growth isn't in the five 1,000-node grids that will go in next year; it's in the fifty 100-node grids that will.

But to make that number larger--to sell five hundred 100-node grids--this hardware has to become much more accessible. And when I say "accessible," I'm not referring to solving the problem of how to ship those containers. I mean that you have to give developers the means to take advantage of these processors.

The canonical "long tail" example is, of course, Amazon. Amazon learned early on that the bulk of its sales came not from a few blockbusters selling thousands of copies a day, but from the tens of thousands of books selling a few copies a day. Sure, Harry Potter is important to them, but only so many Harry Potters are released each year. Most books simply don't sell that many copies.

This market will grow the same way. The headlines will come from the latest 5,000-node grid--but those will be few and far between. Most companies simply don't need that kind of computing power. No, in 5 years, there will be many more processors in 100-node grids than in 5,000-node grids.

That is...the market will grow this way as long as the developer tools mature along with the hardware. The book market needed Amazon to figure out how to sell books over the internet to enable its long tail. The grid computing market needs great developer tools to enable its own.

Sun may be learning this lesson. According to Ashlee Vance of the Register, Sun is trying to boost demand for its disappointing Sun Grid product by making it easier for developers to use. My opinion? From what I've read, they don't understand ease of use.

As far as the hardware is concerned, I think James Hamilton has it right. If MSN and Google think commodity hardware is the way to go (and if solutions from companies like Sun continue to be so hard to use), the long tail will follow that commodity path.

Sunday, March 25, 2007

From Lemons to Lemonade

Most of the work I've been doing lately is under NDA, either with clients or partners, and I haven't been able to blog much about it. But a snowstorm in Boston last weekend that led to an impromptu night of partying in Vegas seemed worthy of a post.

Last week I was in Boston at the Microsoft Technology Center sitting in on an Architectural Design Session (too bad I can't write about that--some very exciting stuff). After arriving in Boston on the redeye Thursday at 6:00am, I was scheduled to leave out of Logan airport Friday at about 7:30pm and land late that night in Oakland.

As luck would have it, Boston's second blizzard of the year hit Friday afternoon. When I left the MTC at around 4:00, the parking lot was empty save for 4-6" of new snow and a snowplow vainly attempting to clear it. Before leaving, I called JetBlue to ensure that my flight was still scheduled to leave on time. Despite the fact that flights up and down the east coast were being delayed and canceled, the flights headed west were still reported as on schedule.

So I left the cozy confines of Waltham, MA and headed to Logan. The 30-minute drive took about an hour and a half with the crappy conditions, but I made it in plenty of time...to find that my flight had been delayed.

After an hour or two of delays, they thought they had a departure window. To cheers from the waiting area, they announced the impending departure of our flight. We gathered our carry-ons, scurried aboard...and waited. And waited. The snow turned to freezing rain, and the FAA delayed us some more. They opened the door and let off anyone who wanted to leave. The captain came out to visit with the passengers and answer questions. And finally, about 11:00pm, they canceled the flight.

As soon as I could fire up the EVDO card and my Skype headset, I learned these things: every hotel within 10 miles of the airport was fully booked, JetBlue was sold out for days, and my weekend looked like it was going to be full of lemons.

Within minutes, though, things were looking up. My wife was having dinner with my good friend (and great travel agent) Celia Coene, and they quickly found me a hotel in Cambridge, a flight the next morning out of Providence RI, and a (rental) car to get there.

More lemons in the morning, as I realized that I had no cash and had no idea what the PIN for my brand-spanking new ATM card was. My phone call to Cindy came about 6:00 AM Saturday PDT, waking her up--but she quickly ungroggified herself enough to dig up my PIN so I could get enough cash to get myself to the airport.

A little while later, my ATM card misfortune started to turn those lemons to lemonade. Unable to go back to sleep, Cindy had gotten online and noticed that Southwest had one earlier flight from Providence to Oakland--one that would get me home about 6 hours earlier. She couldn't get me a seat, but she suggested I call and find out if it was possible to get on that flight.

I called Southwest, and after 30 minutes on hold (nearly killing my cell phone), I finally had some good luck: I got the last seat on the flight. I barely had time to make it, but I sped up and got there just in time.

I got through security almost exactly at the departure time and learned that the flight was slightly delayed--I had about 10 minutes to spare!

My phone rang. It was Cindy. "You know you're flying home through Vegas?" Um, not really. "Did you know that Chevelle is playing in Vegas tonight?" (Loyal readers will remember that I was the tour manager for Chevelle in 2002 and 2003.) "How about I meet you in Vegas?" My wife is a genius.

Suddenly, this lemonade was tasting wonderful indeed.

We had 10 minutes to take care of the logistics--change my flight, redeem a Southwest coupon for Cindy's flight, and book a room at the Palms. Chevelle and Evanescence were playing together at the grand opening of Pearl, the new venue at the Maloof brothers' hotel and casino.

With two internet connections, two phones, and a helpful Southwest worker, we did it.

Six hours later, Cindy and I were having dinner at the Second Floor Cafe with Pete, Sam, and Dean. The guys were great, the show was a blast, and the weekend was saved.

That was some sweet lemonade indeed!

Photo credits: Jane M. Sawyer, Dawn M. Turner, and Cindy Ciruli

Monday, March 12, 2007

DST and Me (also, Microsoft Screws Up)

I have long had a love-hate relationship with Daylight Saving Time.

Despite my love of snowboarding, summer is by far my favorite season (and the solstice my favorite day of the year). As Alicia Bridges sang, I love the night life--but I love the late evening summer sun even more. In the days when I used to play a lot of ultimate frisbee, I cherished the month or so a year when the combination of long summer days and DST kept us on the field until nearly 9.

And to celebrate the first workday of DST (and to keep Robert Karl from bothering me about my carbon emissions), I was elated to ride my bike to work yesterday.

But my relationship with DST isn't all roses and lilacs.

At my old company, Energy Interactive, I was tasked with designing a database that stored lots of time-series data--electric usage metered at 60-, 15-, 5-, or 1-minute intervals. The database had to be efficient and performant, and it had to store and retrieve data from different time zones.

And, thanks to the wonder of Daylight Saving Time, it had to deal with one 23-hour day and one 25-hour day per year. That wasn't such a big deal, of course, because I did what any right-thinking database designer would do: I stored all of the data based on GMT. GMT doesn't change, GMT doesn't have 23-hour days, GMT can help you through all sorts of time zone issues. The only trick is that you have to translate any local times into GMT when you import data (and back again if you're going to display it, bill on it, etc.).
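To illustrate the idea with modern tooling: here's a quick Python sketch (my example, not the original C code) showing that the 25-hour "fall back" day is only special in local time; once you translate to GMT/UTC at the import boundary, it's just another contiguous run of timestamps. The 2006 date below falls under the old DST rules.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; the tz database supplies DST rules

pacific = ZoneInfo("America/Los_Angeles")

# October 29, 2006: the "fall back" day under the old rules.
# Local clocks run midnight to midnight, but 25 hours of metered data arrive.
start_local = datetime(2006, 10, 29, 0, 0, tzinfo=pacific)
end_local = datetime(2006, 10, 30, 0, 0, tzinfo=pacific)

# Convert to UTC before storing; UTC has no gaps and no repeated hours.
start_utc = start_local.astimezone(timezone.utc)
end_utc = end_local.astimezone(timezone.utc)

hours = (end_utc - start_utc) / timedelta(hours=1)
print(hours)  # 25.0 -- the extra hour is explicit and unambiguous in UTC
```

Store the UTC values and convert back to local time only for display and billing, exactly as the GMT-keyed schema above describes.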

The modules I wrote back then were heavily dependent on the time functions built into the Microsoft Visual C runtime library. They were sometimes a pain to work with (especially with differing time zones), but they were invaluable.

This year, however, my relationship with DST took a decided turn for the worse.

As everyone knows by now, the Energy Policy Act of 2005 decreed that the start and end dates for Daylight Saving Time would change this year. That meant that my wife, who still works at my old company and had the grave misfortune of inheriting my old code, was responsible for updating all of the modules to function correctly with the new definitions of DST.

It didn't seem like a big deal, right? Microsoft would release a new version of MSVCRT.DLL, she'd perform some testing to verify it, and everything would be peachy.

Well, it didn't work that way. Over the last few weeks, Microsoft released patches to all of its operating systems and a patch to .NET (albeit a buggy one)--but MSVCRT.DLL wasn't updated.

Cindy played a game of chicken with Microsoft, hoping that eventually they'd release a new version.

Finally, last week, she blinked. She gave up hoping that they'd release a new MSVCRT.DLL, and she spent every night last week coming up with her own fix: writing code that would ensure that, for all time zones and for all years, her software would work properly.
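For concreteness, a fix like that essentially means encoding both rule sets yourself. This is my own minimal sketch in Python (Cindy's actual fix targeted the C runtime, and I'm not showing her code) of the U.S. DST boundary dates before and after the Energy Policy Act change:

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Mon=0 ... Sun=6) in a month."""
    d = date(year, month, 1)
    offset = (weekday - d.weekday()) % 7
    return d + timedelta(days=offset + 7 * (n - 1))

def last_weekday(year, month, weekday):
    """Date of the last given weekday in a month."""
    # Start from the last day of the month and step back to the weekday.
    d = date(year + (month == 12), month % 12 + 1, 1) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - weekday) % 7)

def us_dst_bounds(year):
    """US DST start/end dates, honoring the 2007 rule change."""
    if year >= 2007:  # Energy Policy Act of 2005 rules
        return (nth_weekday(year, 3, 6, 2),   # second Sunday in March
                nth_weekday(year, 11, 6, 1))  # first Sunday in November
    return (nth_weekday(year, 4, 6, 1),       # first Sunday in April
            last_weekday(year, 10, 6))        # last Sunday in October
```

The new rules extend DST by four to five weeks; any code that hard-coded "first Sunday in April / last Sunday in October" silently computes the wrong offset during those weeks.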

And then she came into work on Monday and found this: On Friday, Microsoft had released a patch to MSVCRT.DLL.

Yep, a full 36 hours before the DST event.

What does this mean? It means that now she may have to undo her code changes (she tried to anticipate this in her code of course), and she'll definitely have to re-QA everything.

Microsoft, along with everyone else in America, has known about this change for 2 years. Leaving developers hanging until the day before the event was simply inexcusable.


Friday, March 09, 2007


Three of us headed down to PodTech's palatial offices in Palo Alto yesterday to spend a little time with Robert Scoble. John and Rob did a half-hour interview on camera, then I did 10 minutes of show and tell using some of our demos (I even showed some code, which must have felt very Channel 9-y to Robert; he doesn't seem to do that now that he's in the heart of Web 2.0land instead of in the middle of Redmond).

Digipede may be a bit more "enterprisey" than most things he covers, but he's still geeky enough to appreciate a powerful software development tool when he sees one.

Scoble says it'll take 3 weeks to get the interview up (and, given that he's headed to SXSW today, I bet it could take even longer than that!).


The Long Tail of Distributed Computing

InsideHPC is by far the most prolific of the HPC blogs I follow. John E. West only started the site about 2 months ago, but he seems to post more than once a day with informative, HPC-related content.

And he still finds time to comment on other blogs!

He left a couple of interesting comments on my previous post; as my response grew lengthier, I decided I needed to put together yet another post. (I'm going to pull liberally from my own comments to that post.)

As part of his comment, John said to me...
I would be interested to know whether you've looked at research that groups like the IDC (and their HPC User Forum) may have done to accurately survey the parallel jobs mix.
I don't know that I've read about the parallel jobs mix, John, but I have seen their growth numbers. Until a few years ago, most HPC sales were in the category they call "capability" class systems--the very largest clusters and supercomputers, costing well over $1,000,000 per sale. Things below that were either "enterprise" or a smaller category called "capacity."

What's happened since then? Well, that "capacity" category got so big that IDC had to break it into three separate categories--divisional, departmental, and workgroup. Together, they account for over half of all HPC sales, and they are growing much, much faster than the larger systems.

This jibes exactly with what we hear in the market, and what we hear from our partners. The smaller systems are growing, and growing at an amazing rate.

And there's one other factor here: IDC isn't counting a lot of these sales at all.

According to IDC, the work that my financial customer does on their, say, 300-processor grid isn't HPC at all. Every night they may consume over 3,000 CPU-hours performing their analysis, and sometimes just as much during the work day. I realize it's paltry by the standards John is talking about (1,000,000 CPU-hour jobs), but in corporate America it's nothing to sneeze at.

However, according to IDC's definition, this isn't HPC. It's production work, and therefore they put it in a different category. When IDC presents its sales numbers for 2007, the 200 servers (800 procs) this customer buys this year won't be counted as HPC server sales--they'll be regular, old enterprise server sales. The software they buy from us? Enterprise software, not HPC software.

The problem is that the line between HPC and enterprise computing is not a line at all; it's a big, blurry continuum. For their jobs, IDC needs to draw a line somewhere--and nearly all of our customers' purchases fall on the "enterprise" side of the line. That's fine with me--IDC's classification of my company's sales doesn't affect our bottom line!

IDC's definition of HPC is a traditional one, and it happens to be centered on scientific and engineering applications. That's not surprising--that's where HPC came from, and IDC is one of the few organizations that is honestly trying to come up with accurate HPC numbers.

The thing is, it's easy to classify those huge, megamillion-dollar supercomputers as HPC. They're easy to identify (and there aren't really that many organizations that either need them or can afford them). It's much harder to determine how every 128-, 64-, and 32-node cluster is being used.

I think of it as the long tail of distributed computing. As in Chris Anderson's famous "long tail" of products, where the bulk of sales is in the many less-popular products, the bulk of distributed computing sales is in the hundreds of thousands of smaller installations.

I'm not trying to say that there aren't lots of engineering problems out there that require zillions of tightly coupled processors. There are thousands of science and engineering firms out there who need the capabilities of traditional HPC.

I am trying to say that there are hundreds of thousands of companies out there that do reporting. Analysis. Content creation. Content repurposing. None of those companies will ever have a million CPU-hour job. But millions of them will have 1 CPU-hour jobs (the kind that a 5 node grid gets done in 12 minutes!). And nearly all of those companies are outside the science and engineering realm.
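Neither figure above is spelled out as a calculation, so here's the back-of-the-envelope version, assuming perfect parallel speedup (real grids lose a bit to scheduling and data movement):

```python
# The overnight batch mentioned earlier: 3,000 CPU-hours on a 300-processor grid.
overnight_hours = 3000 / 300
print(overnight_hours)  # 10.0 hours of wall-clock time per night

# The long-tail case: a 1 CPU-hour job on a 5-node grid.
job_minutes = (1 * 60) / 5
print(job_minutes)  # 12.0 minutes
```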

The long tail of distributed computing is rich and varied indeed. And it's growing longer every day.

And why is that exciting? I'll close with a quote I pulled from John E. West's comment:
...out of a million new users of HPC, someone uses the power of computing to do what HPC does: change a technology, a community, or the entire world.

The things I'm championing on the scientific side resonate deeply with what you're doing on the enterprise side: reduce the barriers to entry so that everyone has a chance to get at these capabilities.
Exactly, John. HPC or not, distributed computing lets people do things that individual machines could never accomplish.