Wednesday, September 28, 2005

Networks keep getting faster

I saw this post on Grid Today about the network that Force10 set up for the iGrid 2005 conference. It's pretty darn impressive stuff: the Force10 Terascale E1200 supports 56 line-rate 10 Gigabit Ethernet ports and 1,260 Gigabit Ethernet ports.

It's a reminder of one reason why distributed computing continues to make better and better sense. As quickly as Moore's Law predicts CPU speeds will increase (via increased transistor density), bandwidth increases even faster. So does the amount of data we can store (measured in bits per square inch).

This Scientific American article details how Moore's Law just can't keep up with the relative increases in bandwidth and storage.

This image shows a graph of the relative growth rates: CPU speed doubles every 18 months, data storage density doubles every 12 months, and bandwidth doubles every 9 months.

What does this tell us? CPUs are losing. Even though they keep getting faster, the amount of data they can hold locally and the speed at which they can fetch it are both growing far faster than their ability to process it.
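
To make the gap concrete, here's a quick back-of-the-envelope sketch using nothing but the doubling periods from the graph above (the six-year horizon is just an arbitrary example):

    # Rough compounding of the doubling periods cited above:
    # CPU speed doubles every 18 months, storage density every 12,
    # bandwidth every 9. These are relative growth factors, not benchmarks.
    doubling_months = {"cpu": 18, "storage": 12, "bandwidth": 9}

    years = 6
    months = years * 12
    for name, period in doubling_months.items():
        growth = 2 ** (months / period)
        print(f"{name:>9}: {growth:6.0f}x after {years} years")

    # Prints approximately:
    #       cpu:     16x after 6 years
    #   storage:     64x after 6 years
    # bandwidth:    256x after 6 years

If those rates hold, every three years the data flowing at a single CPU grows by roughly another factor of four relative to that CPU's own speedup.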

And one thing we've learned about data is that the more we can store, the more we do store. Today's databases are several orders of magnitude larger than those of just a couple of years ago. No matter what field you are in, you are gathering, storing, analyzing and reporting on much more data now than you ever were. Sensor networks have more sensors. Supply chains have RFID tracking each individual item. Megabyte databases have become gigabyte databases and gigabyte databases have become terabyte databases.

And the growth rates for storage and bandwidth are so much faster than for CPU speed that even multi-core technology won't help in the long run. It provides an extra doubling or two, but with bandwidth doubling roughly every nine months, that one-time jump only makes up for 1 or 2 years of innovation on the storage/bandwidth side.

So, where does that leave us? Distributing our processing, of course. The answer isn't faster chips: it's more chips brought to bear on the problem. It's the ability to coordinate many machines to work on those huge datasets. The increasing speed of the network and storage makes it more practical than ever to move bits before acting on them so that more work can be done in parallel.
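
As a toy illustration of that split-the-work-up pattern, here's a sketch that fans chunks of a dataset out to a pool of workers and combines the partial results. It runs on one machine with Python's multiprocessing module; the word counting and chunk sizes are made up for the example, and it's only a stand-in for what real grid middleware does across many machines:

    # Toy scatter/gather: split a big dataset into chunks, hand the chunks
    # to a pool of workers, and combine the partial results.
    from multiprocessing import Pool

    def count_words(chunk):
        # Stand-in for real per-chunk work (parsing, aggregation, etc.)
        return sum(len(line.split()) for line in chunk)

    def split_into_chunks(lines, n_chunks):
        size = max(1, len(lines) // n_chunks)
        return [lines[i:i + size] for i in range(0, len(lines), size)]

    if __name__ == "__main__":
        dataset = ["some line of data"] * 100_000   # pretend this is huge
        chunks = split_into_chunks(dataset, n_chunks=8)
        with Pool(processes=8) as pool:
            partials = pool.map(count_words, chunks)  # chunks processed in parallel
        print("total words:", sum(partials))

The important part isn't the word counting; it's that the work on each chunk is independent, so adding more workers (or more machines) lets you chew through more data in the same amount of time.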

And all indications are that these trends will just continue. More storage, more bandwidth, and the need for more computes.

Grid up!