Friday, October 16, 2009

IT managers put data dedupe at the top of their future tech list

Being able to address data management through consolidation is key

Computerworld
Lucas Mearian


IT managers interviewed at Storage Networking World here this week said the key technology in their near future is data deduplication, though how they would implement that technology differed from person to person.

Most managers said their data silos have grown to the point where they're becoming difficult to manage, and growth over the next several years is expected to be exponential. Data deduplication would offer significant relief, they said, drastically reducing capacity requirements and costs by letting them use their storage assets more effectively.
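To illustrate the mechanics, here is a minimal sketch of hash-based deduplication in Python; the fixed chunk size and sample data are hypothetical, and real products use far more sophisticated chunking than this:

    import hashlib

    CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products vary

    def dedupe(stream: bytes):
        """Store each unique chunk once, keyed by its SHA-256 digest."""
        store = {}   # digest -> chunk (the deduplicated pool)
        recipe = []  # ordered digests needed to rebuild the stream
        for i in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # a repeated chunk costs nothing
            recipe.append(digest)
        return store, recipe

    def restore(store, recipe) -> bytes:
        return b"".join(store[d] for d in recipe)

    # Ten "backups" of identical data shrink to roughly one stored copy.
    data = b"A" * 40960
    store, recipe = dedupe(data * 10)
    assert restore(store, recipe) == data * 10
    print(f"logical: {len(data) * 10} bytes, "
          f"stored: {sum(len(c) for c in store.values())} bytes")

The savings come entirely from repetition: the more often the same chunks recur across backups, the higher the deduplication ratio.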

J. Travis Martin, CIS infrastructure services manager for Lawrence Livermore National Laboratory in Livermore, Calif., said his operation manages 750TB of data that will soon grow to more than a petabyte.

Martin said Lawrence Livermore uses Data Domain appliances to deduplicate its backup data, but he wants to move to a technology that performs deduplication globally, across geographically dispersed nodes.

Martin is considering deduplication vendor ExaGrid Systems Inc. in Westborough, Mass., which uses byte-level deduplication on a grid architecture.

"That's what tips Exagrid over the top for us, global dedupe across nodes," said Eric Ghere, a systems architect with Lawrence Livermore.

Martin said he would like to get any data deduplication technology as close to the source of the data as possible, rather than deploying it as he does today, as part of the backup data stream.
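Source-side deduplication of the kind Martin describes fingerprints data before it leaves the client, so only novel chunks cross the network. Below is a rough Python sketch of that exchange, with hypothetical class and method names rather than any vendor's actual protocol:

    import hashlib

    class DedupeTarget:
        """Stands in for a backup target holding a global chunk index."""
        def __init__(self):
            self.chunks = {}  # digest -> chunk

        def missing(self, digests):
            # Tell the source which fingerprints are new to us.
            return [d for d in digests if d not in self.chunks]

        def put(self, digest, chunk):
            self.chunks[digest] = chunk

    def backup(source_chunks, target: DedupeTarget):
        digests = [hashlib.sha256(c).hexdigest() for c in source_chunks]
        wanted = set(target.missing(digests))  # one round trip of hashes
        sent = 0
        for d, c in zip(digests, source_chunks):
            if d in wanted:
                target.put(d, c)       # only novel data crosses the wire
                sent += len(c)
                wanted.discard(d)      # don't resend within this run
        return digests, sent           # recipe plus bytes transmitted

    target = DedupeTarget()
    day1 = [b"os-image" * 512, b"user-data" * 512]
    day2 = [b"os-image" * 512, b"user-data-v2" * 512]  # one chunk changed
    _, sent1 = backup(day1, target)
    _, sent2 = backup(day2, target)
    print(sent1, sent2)  # day two transmits only the changed chunk

The same trade accounts for the "global dedupe across nodes" Ghere mentions: if every node consults a shared index, a chunk already held anywhere in the grid never needs to be stored or shipped again.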

Brett Michalak, CIO at online ticket retailer Tickets.com, said data deduplication and WAN optimization technology would help him deal with about 100TB of virtualized storage capacity on arrays from 3Par Inc. Through virtualization, Michalak said he is able to provision storage on the fly and keep up with changing customer service level agreements, but that doesn't address growing bandwidth requirements.

"I'm looking at those two primarily because as we start rolling out more assets globally, and the fact that our data will be distributed ... the impact on our networks is going to grow," he said. "I think deduplication will be necessary for us for obvious reasons - for backup and recovery."

Michalak said his company's future includes the complete elimination of tape-based backup in favor of nearline disk-based storage.

"It's just a waste of time in my opinion. The time it takes to retrieve that data and bring it back takes too long," he said.

Mark Saussure, director of digital library infrastructure at Penn State University, said his 160TB of disk-based data is expected to grow exponentially over the next few years. To address that growth, he has been rolling out the eXtensible Access Method (XAM), a specification developed by the Storage Networking Industry Association that will help him not only automate backup across storage tiers but also let anyone search silos and retrieve data through standardized metadata.

"Information silos, if not controlled, will outstrip our ability to manage the objects in them," he said. "The demand is just phenomenal. We can't continue to manage the silos the way we've it done for years."

Saussure hopes to go live with a gateway appliance in front of his back-end disk storage that will automatically tag data with standardized metadata, which in turn will help him with data routing, metadata extraction for reporting and data retention, and give him low-level search capabilities.
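The value of standardizing those fields is that one query can then span every silo. As a rough illustration, the Python sketch below uses hypothetical field names and a toy in-memory store, not the actual XAM API:

    from dataclasses import dataclass, field

    @dataclass
    class StoredObject:
        # Hypothetical standardized fields in the spirit of XAM's model.
        data: bytes
        metadata: dict = field(default_factory=dict)

    class Silo:
        def __init__(self, name):
            self.name = name
            self.objects = []

        def ingest(self, data, **metadata):
            # A gateway in front of the silo would populate these
            # fields automatically at write time.
            self.objects.append(StoredObject(data, metadata))

    def search(silos, **criteria):
        """Find objects across silos matching all metadata criteria."""
        for silo in silos:
            for obj in silo.objects:
                if all(obj.metadata.get(k) == v
                       for k, v in criteria.items()):
                    yield silo.name, obj

    library = Silo("library")
    archive = Silo("archive")
    library.ingest(b"...", creator="jdoe", retention="7y", doctype="thesis")
    archive.ingest(b"...", creator="jdoe", retention="1y", doctype="scan")

    # Standardized fields let a single query span every silo.
    for name, obj in search([library, archive], creator="jdoe"):
        print(name, obj.metadata)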

Saussure also hopes to deploy a grid-based storage architecture that will let him move data objects seamlessly among his various storage silos.