Thursday, July 06, 2006

Kicking a Half-KLOC


Kevin Burton (of TailRank) posted yesterday saying "Number of blogs is the new KLOC." KLOC stands for thousand lines of code; his post makes a very good point that the number of blogs that a site indexes is not necessarily the best measure of how good that index is--and draws a parallel to Steve Ballmer noting that tracking a developer's KLOC fails to track how useful it can be to eliminate lines of code.

I experienced that yesterday when porting a partner's application to run on the grid.

This was yet another very cool grid app that had been written behind Excel. It values a portfolio of callable bonds under a variety of interest rates, and had been written to run the analysis on a cluster. Like most cluster applications, it was pretty hardcoded to work on the cluster. It was well written, but it was extremely complicated: it had different threads starting tasks on each node, at least one thread for monitoring tasks, and a thread for reassigning tasks gone awry. It needed to know the name of every machine on the cluster, and, of course, it relied on its computation algorithm being pre-installed on each node on the cluster in a standard fashion. Pretty normal stuff.

And, in fact, it was very fragile. Our partner had attempted to move this from a 4-node cluster to an 8-node cluster and found that it ran much slower. Why? It's not clear--my guess is that trying to write a complicated multi-threaded application to run behind Excel just isn't reliable. The submitting machine was responsible for monitoring everything as well as processing the results, so it got bogged down. Debugging that was going to be an absolute nightmare: with so many different threads happening simultaneously, finding the inefficiency could take days or weeks.

I made a wiser choice. In a couple of hours, I ported it to run on the Digipede Network. Result: now the spreadsheet has none of the extremely complicated code in it--it makes simple API calls. It now has guaranteed execution of the tasks across the cluster without having to manually monitor each one. The user no longer has to pre-stage anything on the cluster--all of that happens automatically. The cluster is used more efficiently, and the whole thing runs faster (and scales much, much better).
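To give a feel for what "simple API calls" means, here is a rough Python sketch of the shape of the ported code. It is not the Digipede API and not the partner's code: the standard library's ThreadPoolExecutor stands in for the grid, and bond_price, value_portfolio, and run_scenarios are hypothetical names made up for illustration. The pricing function is a toy, too; it discounts a plain annual-coupon bond and ignores the call feature entirely. The point is only that the caller describes one independent task per interest-rate scenario and hands scheduling and result collection to someone else.

```python
from concurrent.futures import ThreadPoolExecutor


def bond_price(face, coupon_rate, years, market_rate):
    """Toy pricing: discount the cash flows of a plain annual-coupon bond.

    Assumption for illustration only: the call feature is ignored.
    """
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1 + market_rate) ** years
    return pv_coupons + pv_face


def value_portfolio(portfolio, market_rate):
    """One independent task: value every bond under a single rate scenario."""
    return sum(bond_price(face, coupon_rate, years, market_rate)
               for face, coupon_rate, years in portfolio)


def run_scenarios(portfolio, rates, max_workers=4):
    """Submit one task per rate scenario; the pool handles the scheduling.

    The pool is a local stand-in for a grid. There are no machine names,
    no monitoring thread, and no reassignment logic in the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(lambda rate: value_portfolio(portfolio, rate), rates))
    return dict(zip(rates, values))


# Hypothetical portfolio: (face value, coupon rate, years to maturity)
portfolio = [(1000.0, 0.05, 10), (1000.0, 0.04, 5)]
rates = [0.03, 0.04, 0.05, 0.06]
results = run_scenarios(portfolio, rates)
```

Everything the old version did by hand (starting tasks on named machines, watching them, restarting the ones gone awry) has no counterpart in the calling code here, and that is where the deleted lines come from.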

The best part? I eliminated over 500 lines of code in doing so. That's right: I made the whole thing faster and simpler, and I kicked a half-KLOC in the process.

[Update 7/6/2006 2:15] I should have given a hat-tip to my good friend Robert (who loves to delete code) for coming up with the phrase "kicking a half-KLOC." Hat tip.


Photo credits: jeltovski, rosevita