
Data De-Duplication: The Quiet Environmental Asset

By removing duplicate data, companies can reduce energy expenditures from data processes, reduce storage requirements for enterprise data and curtail CO2 emissions.

With Gartner estimating that two percent of the world's carbon dioxide emissions come from information technology practices, it's no longer acceptable for anyone in the IT industry to pretend that environmental responsibility isn't our problem. As technology advances and the world's data storage needs reach unprecedented levels, it's time for IT and business executives to take a closer look at the quantity of information we deal with every day -- and how to encourage business growth in tandem with "green" initiatives.

An IDC research study sponsored by EMC in March concluded that there were approximately 281 billion gigabytes of digital data in the world as of 2007. Unfortunately, the rate at which companies collect data is only increasing, meaning data storage will only grow in prominence in the future.

However, we can make a significant difference in slowing this trend through a technique known as data de-duplication. By removing duplicate data (and "retiring" unnecessary servers), it's possible to reduce energy expenditures from data processes, reduce storage requirements for enterprise data and curtail CO2 emissions.

Data de-duplication may not sound particularly glamorous, but according to the 451 Group, it could grow into a $1 billion market by 2009. By finding and resolving duplicate data within and across data sources, you can create a more accurate picture of the enterprise -- and lower the sheer amount of data being managed by the company.

When a duplicate record enters your applications, your corporate systems hold two entries that describe the same customer or product but aren't recognized as duplicates. Multiply that common error by hundreds of thousands and you might have a huge number of customers who don't really exist or products you can't really sell. Fixing the problem can have a significant business impact. One online retailer was keeping records on 20 million customers, only to find out through a de-duplication effort that it had closer to eight million unique customers.
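To make the idea concrete, here is a minimal Python sketch of how near-identical customer records might be grouped. It is an illustration under stated assumptions, not how any particular de-duplication product works: the record fields (`id`, `name`, `email`), the sample data and the matching rule (same normalized name and email) are all hypothetical, and real tools typically use much richer fuzzy matching.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a comparison key from hypothetical 'name' and 'email' fields."""
    # Lowercase the name, drop punctuation, and sort the words so that
    # "Smith, Jane" and "Jane Smith" produce the same key.
    name = re.sub(r"[^a-z ]", "", record["name"].lower())
    name = " ".join(sorted(name.split()))
    # Ignore case and stray whitespace in the e-mail address.
    email = record["email"].strip().lower()
    return (name, email)


def group_duplicates(records):
    """Group records that share a match key; each group is one real customer."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


# Made-up sample data: records 1 and 2 describe the same person.
customers = [
    {"id": 1, "name": "Jane Smith", "email": "jane.smith@example.com"},
    {"id": 2, "name": "Smith, Jane", "email": "Jane.Smith@Example.com "},
    {"id": 3, "name": "John Doe", "email": "jdoe@example.com"},
]

groups = group_duplicates(customers)
unique_customers = len(groups)
```

In this sketch, three stored records collapse into two distinct customers, the same effect the retailer above saw at a far larger scale.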

These duplicate records soak up server and storage space, which prolongs everyday data processes and results in unnecessarily high energy expenditures. How high? Consider this: a public swimming pool in Zurich will soon be heated thanks to excess energy from a nearby data center. By eliminating duplicate data, companies can cut storage costs and reduce the resources needed to manage it. Less power used means fewer emissions, and fewer emissions mean a smaller impact on the environment -- and reduced costs.

Beyond the environmental benefits, imagine how data de-duplication can impact other efforts. By creating more consistent, accurate and reliable data, you can improve business intelligence and reporting by providing an accurate view of information and authentic customer counts. Customer outreach efforts are more streamlined and effective, while the supply chain is no longer choked with data on duplicate, redundant inventory and parts.

If you're already planning, or are in the process of implementing, a data management initiative, de-duplication is likely already on your radar. If not, there are plenty of effective data de-duplication tools and resources currently available that can help you start to resolve your data dilemmas. The end result is a true win-win: you can streamline your data centers while creating high-quality data that can support more nimble, efficient operations.

Tony Fisher is the CEO of DataFlux, a wholly owned subsidiary of SAS. DataFlux enables organizations to analyze, improve and control their data through an integrated technology platform. DataFlux helps customers rapidly assess and improve problematic data, building the foundation for data governance, compliance and master data management (MDM) initiatives.
