Wednesday, December 21, 2005

Maximizing computation, minimizing data

I had a Digipede customer ask me last week about how to optimize his processes for running on the Digipede Network. He is grid-enabling an existing application--he's got a class that does a lot of calculation, and he wants to have a bunch of those objects calculating in parallel. Each object is already independent.

Sounds perfect for distributed execution, right? "Twenty lines of code" away from being Digipede-enabled?

Well, not quite.

See, the objects that he does his calculation on are pretty darn big--on the order of 20-25 megs each. He actually had no idea they were so big; but the class has lots of members, and many of the members are large classes themselves. Now, the Digipede Network can certainly move objects of that size--but those objects have to be moved in order to calculate on them, and moving data is often more time-consuming than the computation you need to do on it. (See Jim Gray's Distributed Computing Economics to get an idea of what that means, but bear in mind that we are only discussing LAN-wide computation here.)

Frequently, the answer is to create a class that contains only the data relevant to the distributed calculation. In this customer's case, he was "retrofitting" the application to work on the Digipede Network. His class wasn't designed for distribution and, as a result, carried a lot of data that wasn't necessary for the calculation he wanted to happen remotely. In other words, his objects did a lot in their lifetime, only a portion of which was going to be distributed.

The customer needed to create a class that contained only the data that was relevant to his distributed process, and use that as a member in his huge class. Only this new, smaller class gets distributed across the network. Instead of moving 20MB per object, he was now moving only a few kilobytes. When the small class returns from its journey across the network, its result data is then copied back into the main object.
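A minimal sketch of the pattern, with hypothetical names (this isn't the customer's actual code--just an illustration of splitting out the data that travels):

```csharp
using System;

// The original, heavyweight class: most of its members never need
// to leave the machine.
[Serializable]
public class PricingModel
{
    public byte[] HistoricalData;    // tens of megabytes
    public string[] AuditTrail;      // also large, also irrelevant remotely
    public PricingWorkItem WorkItem = new PricingWorkItem();
}

// The small class that actually crosses the network: just the inputs
// the remote calculation needs, plus a slot for the result.
[Serializable]
public class PricingWorkItem
{
    public double[] Inputs;          // a few kilobytes
    public double Result;            // filled in on the remote machine

    public void Calculate()
    {
        // ...the compute-intensive work happens here, remotely...
    }
}
```

Only the PricingWorkItem gets distributed; when it returns, its Result is copied back into the parent PricingModel, and the big members never cross the wire.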

Our customer needed to do a bit more work than the fabled "twenty lines of code"--but he ended up with a more structured application and vastly improved performance.

Thursday, December 15, 2005

Good Reads

[Update: looking for my discussion of Microsoft's tools? It's down here]

Here are a few links for your perusal...

Don Dodge, always an interesting read, echoes Dan'l Lewin's list of hot startups using Microsoft tools here:

Digipede—Its "many legs make light work" and turn any combination of servers and desktops into a grid for .NET apps.
Over at the Science Library Pad, they point out that SearchCIO has published Gartner's 2006 Top 10 Strategic Technologies.
Grid computing. Who's doing grid computing? Charles Schwab, Royal Dutch Shell and Sony are among the companies tapping the technology. Definitions of grid computing vary, but its popularity continues to rise.

And, my favorite read of the day, Software Development Magazine has just published a review of our software, the Digipede Network. Paid registration required for the article, but suffice it to say: Four stars! Thanks guys!

It's a great review, and not just because he liked our product. As it turns out, the author (Rick Wayne) isn't just a tech writer--he also does soil analysis. He knows his way around compute-intensive simulations. So he converted his office into a mini-grid, using nothing but the Digipede Network software (and included documentation, of course)--with not a drop of help from us. He got it working in a snap, and he was one happy camper.


MS Tools Too Expensive? Think Again.

This started as a thread in some comments over on Scoble's blog, but it got to be too long for a comment so I moved it over here.

An anonymous commenter wrote:

There are startups using .NET, but they aren’t the majority, and those who chose to do so are buying themselves into a trap with expensive licenses and a locked-in platform.

Jeremy Wright from b5Media, who has a great blog called Ensight, contradicts him:
The startup can grab ISV packs which’ll cost about 2500$ to get the company up and running with all the dev tools and server bits they’ll need. Toss in another 2500$ and they’ll get all the MSDN stuff they need. 5000$ is not that much to get a 5-10 man shop up and running, even when bootstrapping.

Jeremy has a great point, but he's off by an order of magnitude!

I work at a startup. We joined Microsoft's Empower program, which exists to help startups with initial costs. It cost us $375. It included a universal MSDN subscription with 5 user licenses, as well as 5 licenses for Office, and "the full array of server products including Windows Server 2003, Exchange 2003 Server and SQL Server."

Um, does $375 seem like an exorbitant amount for that?

Also included: technical support and training. Individuals at Microsoft assigned to help us with technical details, our marketing, and even our sales.

As soon as we could, we became Partners, then Certified Partners, then Gold Certified Partners. The benefits are enormous: Microsoft helps with marketing, they help with architectural issues, we get early releases of software, we get direct access to the product teams and their roadmaps. Oh, and we get GREAT license benefits.

If you haven't worked with Microsoft, then you just don't understand this: they work very, very hard to create an ecosystem that actually fosters innovation. They want startups like mine to choose their tools, so they do a tremendous amount of work to make it a good choice. And with programs like Empower, the cost of all those tools, operating systems, and support, is virtually nothing.

Windows may not be your OS of choice, and if your users are all running Linux boxes, obviously these tools and programs aren't for you.

But if you think it's too expensive for a startup to use Microsoft products, you just haven't done the research.

From our perspective: it's a slam dunk.


Tuesday, December 13, 2005

Not done Christmas shopping yet?

Dan'l Lewin, who runs Microsoft's Emerging Business Team, has picked out his list of "coolest companies under the tree."

Guess who made the cut?

Thanks, Dan'l. I predict there will be no coal in your stocking this year!

Thursday, December 08, 2005

Of Course Scaling Matters!

The controversy over at Jeremy Wright’s Blog, which started with this post, hasn't slowed down much. He's gotten a ton of comments, flames, and posts. He defends himself here:

Which is why designing for scale is so important. I don’t believe any startup “needs” to achieve anything more than around 2 9s of uptime, which is what a properly configured server should do for you. However, even at the beginning, you need to be coding and planning for growth.
Small things like managing how transactions occur, having separate database connections for reading and writing, making your app able to handle variable state sessions, etc are key.
(emphasis mine)

One of his posters, Ian Holsman, responds:
The reasoning behind not worrying about scaling is that in a lot of cases people worry about the wrong things. They will spend hours getting that code tuned just so, and have it running in 10ms less time, only not to realize that the code is only run once a day.

Scott Sanders from Feedlounge also responded:
The FeedLounge development process was more along the lines of:
1. Build a webapp, see if the features are compelling to a set of users, keeping a design in mind that is capable of scaling
2. Overrun the shared server that you are using, switch to dedicated server, so you can properly measure the effects of the application.
3. Add more users, adding requested features from the users, measuring the load in a fixed, known environment, and start work on “Distributed” part of ladder. This is where the build portion of the scalability starts.
4. Now that you believe you have something that has value, invest in the hardware and software development necessary to scale. Continue working on priority based tasks towards release of your product.
(again, emphasis mine)

Scott's point is valid, and the italicized portion makes it valid: you need to be designing your software so it's capable of scaling. No, Ian, this doesn't mean spending weeks optimizing the code to trim every last microsecond off of every transaction. It means designing your software well from the beginning.
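To make one of Jeremy's "small things" concrete--separate database connections for reading and writing--here's a hedged sketch (the class and connection strings are made up). The point is that callers never know which physical server they hit, so a read replica can be introduced later without rewriting them:

```csharp
using System.Data;
using System.Data.SqlClient;

// Hypothetical sketch: route reads and writes through separate
// connection strings. On day one both can point at the same server;
// when load demands it, the read string moves to a replica.
public class DataAccess
{
    private readonly string readConnString;   // may point at a replica someday
    private readonly string writeConnString;  // always points at the master

    public DataAccess(string readConnString, string writeConnString)
    {
        this.readConnString = readConnString;
        this.writeConnString = writeConnString;
    }

    public IDbConnection OpenForRead()
    {
        IDbConnection conn = new SqlConnection(readConnString);
        conn.Open();
        return conn;
    }

    public IDbConnection OpenForWrite()
    {
        IDbConnection conn = new SqlConnection(writeConnString);
        conn.Open();
        return conn;
    }
}
```

That's the kind of cheap, early design decision that costs almost nothing now and saves a rewrite later.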

Most importantly, it means acknowledging the possibility, however remote, that you may actually succeed and build something that people eventually use. Many people.

This point applies equally to those designing web sites and those planning on deploying SaaS. If you are going to make it available on the web, and you're not designing for scalability, then you just aren't planning for success: you're planning for failure.

Ian does make one valid point: in the beginning, you shouldn't spend too much time on scalability. You need to make sure you get the <content, features, service, whatever> right. But you need to be prepared for scaling; that's why it's important to choose a toolset in the beginning that makes scaling later as easy as possible.

Plan on succeeding.

As a product manager, I'd be remiss if I didn't point out that that is exactly what we designed our SDK for--so developers can spend their time in their area of expertise, but have a framework underpinning their software that will scale when the time comes.


Wednesday, December 07, 2005

How to Buy a Grid

I was starting to prepare a post on how to buy a grid--what are the steps you can take, and what's involved.

Then I remembered this post by my colleague Kim. It's a great place to start: 10 easy steps, and guaranteed success (well, they may not all be easy, and no one can guarantee success). But it's a very good list of things to consider when you are educating yourself about a purchase.

I'd add one more thing to watch out for: don't be swayed by features that you don't need (and won't be able to take advantage of). If someone tells you that their software is the most powerful on the planet because it works on 12 different operating systems and can tie together PCs in Idaho with mainframes in Irkutsk, but you have 200 PCs in one office building, how does that help you? It doesn't. So look for something that will help you the way you work.

So read Kim's post. It's a good one. Plus, she used it to invent the word gridified, which is my new favorite word.

Tuesday, December 06, 2005

Hey, Blogger! Spell check much?

I tend not to use spell-checkers; I try to read my writing carefully (although anyone who has been paying attention knows that I certainly don't always do that successfully).

Today, for the first time, I used Blogger's built-in spell checker.

It choked on a word that, well, I thought it would know...


Web 2.0 Companies NEED To Scale

Sometimes I read a blog post and it just makes me smile.

Today Jeremy Wright talks about the need for scalability for Web 2.0 companies (he's specifically talking about an announcement from FeedLounge). He says:

Listen up. If your company relies on the web to stay alive, you’d damn well better be using at least some of the following “ladder to high availability”:

Backups, Redundant, Failover, Cluster, Distributed, Grid and finally Mesh

Each step up is a massive increase in cost, but it’s also a massive increase in uptime and such. I hate it when companies say they want 99.9% uptime (or even worse 5 9s of uptime) without thinking about what that’ll cost them.

Distributed/Grid computing should be the kind of thing that these companies are thinking about from the time they begin planning their architecture. They have to plan for success. They have to plan on hundreds of thousands or millions of users, right?

And I'll go a step further than Jeremy--the other thing Web2.0 companies shouldn't do is write that portion of their applications from scratch. I mean, no one in their right mind writes their own database to sit under web apps, right? You go get one--SQL Server, MySQL, PostgreSQL , whatever is right for you--but it would be a complete waste of your time to sit down and write one from scratch.

So why do people try to do that with a distributed/grid system? Most do. It's too bad; it's a waste of their valuable time and money. And in all likelihood, they'll end up with a solution that isn't nearly as scalable as they think it is.

I'll let Jeremy finish this one up for me:

If your business depends on your website being up, look at your code, look at your infrastructure and for your users sake figure out what you actually need and build the damn thing properly!


VS 2005: VSTO Improvements

A couple of weeks ago, I wrote a post about my difficulties in upgrading a VSTO project from VSTO 2003 to VSTO 2005 (for those of you unfamiliar with VSTO, it stands for Visual Studio Tools for Office; it's replacing VBA as the method for putting code behind spreadsheets, Word documents, etc.).

I had some difficulties upgrading a VSTO project from Visual Studio 2003 to Visual Studio 2005; they've changed some of the architecture, and it was a pain (read that post if you want the gory details).

However, my post was remiss--I neglected to mention any of the good changes in VSTO.

First and foremost: Integrated development environment. In the past, if I wanted to put buttons on my Excel spreadsheet, I had to put Excel into "Design mode", create buttons using the Control Toolbox, then switch over to Visual Studio to wire those to methods in my C# code. It was pretty cumbersome, and dealing with modality in Excel was a pain ("No, I meant to select the button, not click it!"). Now, all of the design work is done directly in Visual Studio 2005--I open my Excel spreadsheet there, and I work on the UI using the regular toolbox in Visual Studio. It's a much more cohesive experience.
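For those who haven't seen VSTO code-behind, here's a rough sketch of what the wiring looks like once the button has been dropped onto the worksheet in the Visual Studio designer (Sheet1 and runButton are placeholder names, not from any real project):

```csharp
// Hypothetical VSTO 2005 code-behind for an Excel worksheet.
// The button lives on the spreadsheet, but it's handled just like
// any WinForms control--no more juggling Excel's design mode.
public partial class Sheet1
{
    private void Sheet1_Startup(object sender, System.EventArgs e)
    {
        // Wire the worksheet button to a handler in C#.
        this.runButton.Click += new System.EventHandler(runButton_Click);
    }

    private void runButton_Click(object sender, System.EventArgs e)
    {
        // Read inputs from cells, run the calculation,
        // and write the results back to the worksheet.
    }
}
```

The designer can also generate this hookup for you when you double-click the control, which is exactly the cohesive experience I'm describing.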

Secondly: now the controls look better! The buttons that appear on the Excel spreadsheet just plain look better than they used to.

Third: this is the part I don't quite understand. For some reason, the .NET code in the spreadsheet seems to load much faster. I can't tell if this is my imagination or not--but I've done this a bunch, and I don't think it is. It seemed in the past that when I made my first call into .NET code behind my spreadsheet, I had a several-second delay (disclaimer: I have the slowest laptop on the planet). For whatever reason, that seems to have disappeared. Now, when I make my first call, it happens immediately. This must be a .NET 2.0 change (because I'm still running the same version of Excel 2003), but it's certainly welcome.

The change to the object model took some getting used to, and it was annoying to have to rewrite some of my code. But I'm coming around to it.

Last, the methodology of associating the binary with the spreadsheet has been improved. In the past, there were two custom file properties in the spreadsheet--one which gave the assembly location, and one which gave the assembly name. The location defaulted to NameOfSpreadsheet_bin, and the name of the DLL was NameOfSpreadsheet.dll. It always looked pretty unwieldy--you'd have your NameOfSpreadsheet.xls, and next to it a NameOfSpreadsheet_bin folder with a NameOfSpreadsheet.dll in it. And if you ever moved anything, it was a pain to tell .NET 1.1 how to give permission to the new DLL.

Now, the custom property of the spreadsheet has the GUID of the DLL in it, and .NET 2.0 gives permission to that GUID. This means that you no longer need to have a *_bin folder, and you have a much easier time if you need to move/deploy your spreadsheet.

So the upgrade process is a pain. But once you get in there, VSTO for Visual Studio 2005 is definitely better to work with than VSTO for Visual Studio 2003. I haven't built anything extravagant yet (unless you consider a grid-enabled, supercomputing spreadsheet extravagant--come to my webinar in half an hour to hear more about that), but I think the product is much improved.

[Updated 13:17 adding Technorati tags]

Monday, December 05, 2005

Want to learn what I'm talking about?

I'm giving a Developer webinar tomorrow at 10:00 AM, Pacific time.

I'll talk a bit about the Digipede Network, I'll use Visual Studio 2005 to grid-enable a .NET application, and show that application running faster by running on a cluster of Windows boxes.

If you haven't seen a demo of the Digipede Framework SDK yet, you'll be amazed at how little I have to modify an existing application in order to make it run on the Digipede Network. Click here to register and join in on the fun.

Friday, December 02, 2005

Windows Cluster/HPC links

One of the questions I hear most frequently about Windows clusters is "Windows clusters?"

The answer: "Yes!"

People do use Windows for clusters and HPC. Here are some links to valuable resources for anyone considering a Windows Cluster.

  • Microsoft's Windows Compute Cluster Server home page

  • Microsoft's HPC Discussion Group

  • Windows Cluster Resource Centre

  • Windows HPC site

  • And, of course, there is Microsoft's HPC Partners Page, featuring this little company!