I tend to keep my posts focused on my job duties rather than my industry, but I came across this fun post on Data Domain’s blog, so I thought it was time to chime in. (No comments section on their blog. :D)
The gist of the post? Large vendors spent a few years pretending deduplication was unnecessary, and it came back to haunt them later. Now it is a key requirement for their customers. I think dedupe really illustrates the huge changes in the storage industry over the past five years or so. There have been a host of disruptive technologies: iSCSI, clustered block storage, tiered storage, snapshots, disk-to-disk backup, and deduplication, all of which have changed the conversation on building storage infrastructure.
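For readers who haven't looked under the hood, here is a minimal sketch of the core idea behind block-level deduplication: split data into chunks, fingerprint each chunk, and store only one copy per unique fingerprint. This is an illustration only, assuming fixed-size 4 KB chunks and SHA-256 fingerprints; real products like Data Domain's use variable-size chunking and far more sophisticated indexes.

```python
import hashlib

def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks, keeping one copy per unique chunk."""
    store = {}   # fingerprint -> chunk bytes (only unique chunks stored)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store only if not already present
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original data from stored chunks and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

With repetitive data (think nightly backups of mostly unchanged files), the store ends up much smaller than the raw input: three chunks of data containing only two distinct patterns need only two stored chunks plus three cheap references.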
Before, companies bought one huge system: you anticipated your future needs up front, and it cost big bucks. If you chose wrong at the beginning, it would haunt you a few years down the road. Today overall capacity is growing more quickly: instead of doubling every 36 months, needs double every 9 months. Smaller organizations have the same needs as large ones, but smaller budgets.
Today’s storage goals: scale easily and inexpensively, use storage resources efficiently, simplify storage management, and pay as you grow, not up front. Newer vendors have heard this message loud and clear and are building accordingly. Most larger vendors are still pushing big iron storage systems, and this has left a lot of space for innovative vendors, who developed according to the new reality, to flourish and grow.
I am looking forward to the next disruptive storage technology: we have a lot of data to deal with these days.
Back to my post title. So does dedupe make sense for your SAN? Do SANs need to be smarter? Is the next phase in storage consolidation centralizing all host-based disk functions to the network? Clearly the industry has been back and forth on this. Intelligent switches have come and gone, and EMC’s Invista is looking a little too much like Windows Vista. Maybe SANs are on their way to being replaced by DANs (data area networks), where the network (and the storage arrays) will have insight into the data and act accordingly. Is this what users want?