When should old data be deleted?

If you don't have a good reason to store data, you have good reason to delete it.

In the modern enterprise, data is everything. No business decision can be made without valuable data to back it up, whether that's launching a new product, setting prices, or developing a new marketing campaign.

But increasingly, data is also dangerous. The rising frequency of costly breaches and ransomware attacks that target data stores—not to mention a bevy of new privacy laws that control how consumer data is stored and used—has cast a spotlight on the hazards of holding on to too much data for too long a time. Case in point: Neiman Marcus recently announced that 4.6 million online customers had personal information, including payment card information, stolen. The "good" news is that 85 percent of the payment cards affected were expired or invalid, raising a couple of big questions: Why had Neiman Marcus been holding on to that information at all? And how do you know when it's time to get rid of old data?

To be fair, we all do it. It's human nature to hang on to data, says Adam Kriesberg, a professor of library and information science at Simmons University in Boston. "Tech companies can have a tendency to be digital hoarders," he says. "The mindset is that if we can afford to keep it, we should do so forever, because we think we might need it." But in addition to risk issues like those exemplified by the Neiman Marcus case, that data can eventually become an albatross, growing so large that the business can no longer make sense of it or extract real value from it.

Four pillars of a data retention policy

Determining when data should be deleted requires analysis of several interrelated criteria, says Stuart Stent, director of cloud professional services at Hewlett Packard Enterprise. He lays out a framework for building a data retention/deletion strategy that consists of four key considerations:

  • Regulatory requirements – This is the first consideration for virtually all enterprises: Are there rules governing the data that either forbid or mandate its deletion in some way? GDPR is the biggest of these, says Stent, "as it says that the person whom the data is about should be able to demand the deletion of the data." If a customer in a GDPR-protected region wants their data deleted, the company must comply. On the other hand, GDPR doesn't set rules that prevent the deletion of consumer data, so if a business wants to set a policy that deletes all user data after, say, 18 months of non-contact, it's free to do so.

    Other regulations do require that data be saved for certain periods of time. It's commonly stated that U.S. income tax records and underlying data should be maintained for seven years, along with records of any securities transactions. If the business files foreign tax returns, those records should be kept for at least 10 years. Realistically, when it comes to financial records, many businesses elect to maintain this data indefinitely.
  • Risk – If the data is exposed, what's the risk to the enterprise? Customer records or a database of credit card information presents a sizable amount of risk, but a lot of enterprise data would be nearly worthless to a hacker, says Stent. Access logs or logs of machine sensor data, for example, don't have any value on the black market, and as such, it isn't essential that they be frequently purged, at least for risk reasons. This type of historical data can also have hidden value to the enterprise. "Looking at access logs, you can use this information to trace and manage exposure to a hack," says Stent, allowing the organization to pinpoint the moment an attack began, even if it's looking back months or years later.

    Similarly, modern ransomware attacks are compelling many organizations to become more creative with their data storage policies, electing to save complete backups to off-site locations that are disconnected from the internet and, as such, immune to these types of encryption-based attacks. In this way, having more data available can actually reduce overall risk. Conversely, Kriesberg notes that long-term data storage can introduce risks related to legal discovery; anything that hasn't been deleted can be subject to subpoena.
  • Cost – Data storage is notoriously cheap, but it isn't free. For Stent, cost is a question that is closely aligned with the fourth consideration, which involves the value of the data. "If you're storing 40 to 50 TB of data and it's costing you around $1,000 a month, what is the business benefit you're gaining from that?" he says. In a world where cloud-based storage is easily accessible and infinitely scalable, most enterprises probably don't concern themselves overly with the cost of data storage. But if that data is being stored without any path to deriving clear value from it, they probably should.
  • Business benefit – Arguably the most important question to ask is simple: What's the benefit of having this data? It's also the toughest to answer. The answer to tomorrow's critical business question may hinge on information contained somewhere in the enterprise, so deleting old data carries a certain level of risk if it's not done thoughtfully. These decisions, says Stent, have to be made by carefully balancing the other three categories of consideration. For example, with customer data, the overall pattern of customer access has to be taken into account. "Do you have customers that come in once a year or twice a year, or is it a weekly thing?" asks Stent. Seasonal businesses may need to retain customer data longer than a higher-traffic store that caters to local clientele, for example.
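
To make the interplay between these considerations concrete, here is a minimal sketch of how a retention rule might be expressed in code. It is an illustration, not a method described by Stent: the category names, the `Record` fields, the `should_delete` helper, and the day counts (roughly 18 months of non-contact for customer profiles, roughly seven years for tax records) are all assumptions chosen to mirror the examples above, and the rule that a mandated hold outranks an erasure request is a simplification that real legal advice may not support.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    category: str            # hypothetical label, e.g. "customer_profile"
    created: date            # when the record was first stored
    last_contact: date       # last time the data subject interacted with us
    erasure_requested: bool = False  # e.g. a GDPR deletion request


# Hypothetical policy values; a real policy would come from legal and business owners.
MAX_IDLE = {"customer_profile": timedelta(days=548)}   # roughly 18 months of non-contact
MIN_HOLD = {"tax_record": timedelta(days=7 * 365)}     # roughly seven years


def should_delete(record: Record, today: date) -> bool:
    """Return True when the sketch policy says the record should be purged."""
    hold = MIN_HOLD.get(record.category)
    if hold is not None and today - record.created < hold:
        return False  # assumed: a mandated retention period blocks deletion
    if record.erasure_requested:
        return True   # the data subject asked for deletion
    idle = MAX_IDLE.get(record.category)
    # Categories with no idle limit are kept until someone makes a decision.
    return idle is not None and today - record.last_contact > idle


today = date(2024, 1, 1)
stale = Record("customer_profile", date(2020, 1, 1), date(2022, 1, 1))
active = Record("customer_profile", date(2020, 1, 1), date(2023, 10, 1))
tax = Record("tax_record", date(2020, 1, 1), date(2020, 1, 1), erasure_requested=True)

for name, rec in [("stale", stale), ("active", active), ("tax", tax)]:
    print(name, should_delete(rec, today))
```

The point of the sketch is that regulatory holds, deletion requests, and inactivity windows are separate rules that have to be evaluated together; cost and business benefit would feed into how the idle windows are chosen in the first place.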

Executive ownership is key

It's easy to pose all of these questions about data retention. But answering them properly is first a matter of deciding who is best positioned to do so.

"Very few organizations actually have the appropriate owner for this problem," says Ilia Sotnikov, security strategist and vice president of user experience at Netwrix, a data security company. "Only large enterprises can afford an executive role that can coordinate multiple stakeholders to find the intersection of various business needs, legal requirements, and technical capabilities."

Sotnikov notes that an executive-level leader—preferably a chief data officer—needs to take the reins here. "Without executive-level ownership, the task of creating a retention policy often falls onto the legal or IT department," he says. "Neither of these two has the right level of visibility into business processes and data flows," which often drives users into shadow IT solutions such as using personal email and consumer cloud services to store information. "Such behavior often leads to user mistakes, excessive sharing, and data leaks."

Of course, IT plays a key role in establishing data retention and deletion processes, but clear guidelines and data access control procedures need to be set to avoid mistakes—or malicious activity. A recent debacle in the city of Dallas led to more than 8 million police department files, totaling more than 20 TB of data, being destroyed by a single IT employee. The city eventually chalked up the error to a lack of training and documentation during a data migration process.

Beyond drilling holes in hard drives

One final concern naturally involves the proper methodology for deleting old information. For data stored on premises that is being retired or migrated to the cloud, permanent disposal tactics are still quite common. "Most of the time, drives are still being physically destroyed," says Stent.

But data no longer resides solely in the data center. Data stored on a cloud computing service is more difficult to properly delete, because there's limited visibility into where it resides. Encryption is key here, Stent says. "The best practice is to encrypt the data when it's in day-to-day usage, so when you're ready to delete the data, you simply delete the key, and that data is no longer accessible in any meaningful way," he says. For added security, the data can be re-encrypted and then deleted. But for most types of corporate data, that's probably overkill.
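
The delete-the-key approach Stent describes can be sketched in a few lines. This is a toy model under stated assumptions: the `CryptoShredStore` class and its methods are hypothetical, the two dictionaries stand in for cloud storage and a separately controlled key manager, and the cipher is a one-time pad built from Python's standard `secrets` module purely so the example runs without third-party packages. A production system would rely on a vetted cipher and a managed key service instead.

```python
import secrets


class CryptoShredStore:
    """Toy store: each record is encrypted under its own key, held separately."""

    def __init__(self):
        self._ciphertexts = {}  # stands in for cloud storage we cannot fully inspect
        self._keys = {}         # stands in for a key manager we do control

    def put(self, record_id: str, plaintext: bytes) -> None:
        # One-time pad: a random key as long as the data, used exactly once.
        # Illustration only; real systems should use a vetted cipher and library.
        key = secrets.token_bytes(len(plaintext))
        self._keys[record_id] = key
        self._ciphertexts[record_id] = bytes(p ^ k for p, k in zip(plaintext, key))

    def get(self, record_id: str) -> bytes:
        key = self._keys.get(record_id)
        if key is None:
            raise KeyError(f"key for {record_id!r} has been destroyed")
        ciphertext = self._ciphertexts[record_id]
        return bytes(c ^ k for c, k in zip(ciphertext, key))

    def shred(self, record_id: str) -> None:
        # Destroying the key is the deletion; any leftover ciphertext is unreadable.
        self._keys.pop(record_id, None)


store = CryptoShredStore()
store.put("customer-42", b"card ending 1111")
recovered = store.get("customer-42")   # readable while the key exists
store.shred("customer-42")             # ciphertext may linger, but cannot be decrypted
```

Note that the ciphertext is deliberately left in place after `shred`: the design assumes you may not be able to find every copy in the cloud, so the guarantee comes from controlling the key rather than the data.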

Regardless of whether data is stored on premises or in the cloud, it's important to establish a strong data retention policy and adhere to it before the situation becomes unmanageable or, worse, a breach occurs. And that policy has to be both clear and strategic. As Kriesberg notes, "When it comes to data, making decisions about what to throw away is as important as what to keep."

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.