Data deluge: How to get data management strategy in sync

Vast amounts of data will drive new insights and better business decisions, but only if you have a comprehensive data management plan in place.

We are living in the golden age of data. Thanks to phones, cloud apps, and billions of IoT devices, the volume of enterprise data is growing by more than 60 percent per year, according to IDC.

The idea that data is the new oil has become a cliche, but the comparison still holds: Data is the fuel that will power the AI revolution. By applying analytics to vast pools of data, businesses will be able to glean new insights and make better decisions.

But unlike petrochemicals, data can vary wildly from one source to the next. Each enterprise has its own methods of extracting, refining, and applying it. And many organizations still lack the skills to turn their raw data into something useful.

A 2019 study by Experian found that nearly a third of enterprises believe their data is inaccurate. Some 70 percent say they lack direct control over strategic data such as the quality of their customer experience, while 95 percent say poor data quality is hurting their business's bottom line.

Enterprises need to get their data houses in order now, before the deluge. Here are five steps they can follow.

Don't be a data hoarder

Most companies already have more data than they know what to do with. They've been collecting it for years without a coherent plan.

"Many organizations just collect everything under the assumption they're going to do something smart with this data in the future," says Glyn Bowden, chief architect for AI and data science at Hewlett Packard Enterprise. "But when you start creating large pools of data with no indication of where it came from or why it was collected, then it's open to interpretations that could be wildly wrong."

And as the volume of data grows at an exponential rate, enterprises are going to face some difficult economic choices, says Mike Leone, senior analyst at Enterprise Strategy Group (ESG).

"Storage and compute may seem infinite in the cloud, but it's not free to process and analyze data," he says. "Long term, many organizations won't be able to afford to do what they want to do. Unless they figure out a way to commoditize the consumption of data, combined with ultra-efficient resource utilization, they're going to hit a breaking point."
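To put the growth figure cited earlier in concrete terms, the short sketch below compounds a 60 percent annual increase over five years. It assumes the rate stays constant and treats today's volume as 1.0; both are simplifications for illustration, not IDC projections, and the function names are hypothetical.

```python
import math


def projected_volume(initial, annual_growth, years):
    """Return the data volume after compounding annual_growth for the given years."""
    return initial * (1 + annual_growth) ** years


def doubling_time(annual_growth):
    """Return the number of years it takes the volume to double at a constant rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Assumption: a flat 60 percent annual growth rate, the floor of the cited range.
growth = 0.60

# Relative volume for year 0 through year 5, with today's volume set to 1.0.
volumes = [projected_volume(1.0, growth, year) for year in range(6)]

for year, volume in enumerate(volumes):
    print(f"Year {year}: {volume:.2f}x today's volume")

print(f"Doubling time: {doubling_time(growth):.2f} years")
```

Under these assumptions, the volume doubles roughly every year and a half and passes 10 times its starting size within five years, which is why the cost of processing everything grows faster than most budgets.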

Worse, clinging to irrelevant data can also drive the enterprise in the wrong direction, Bowden warns. Organizations may end up changing their core business to match the data, rather than using the data to drive their core business.

"You have to answer the question, 'Why am I keeping this data?' Once you know why you're capturing your data and how you intend to use it, things become more clear," he says. "You should always start with a business outcome that aligns with your current objectives, not try to pivot just because you've got access to new data."

Dismantle your data silos

Once you've identified the data that can drive business outcomes, the next step is to figure out where it resides, how it enters and leaves the organization, and who's responsible for managing it, Bowden says.

"You need a good understanding of what your data ecosystem looks like and what the challenges are," he says. "If you're creating data silos, you need to figure out why. Is it because the data is stuck inside an SQL database that you can't easily share? Have you created a data lake but nobody else in the organization knows it's there? Mapping what you have and how you're using it today is a good place to start."
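One way to start the mapping exercise Bowden describes is a simple inventory that records where each dataset lives, who owns it, why it is kept, and who uses it. The sketch below is a minimal illustration under that assumption; the `DataAsset` fields, the `find_gaps` helper, and the sample entries are all hypothetical and not drawn from any HPE tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataAsset:
    """One entry in a hypothetical data inventory."""
    name: str
    location: str
    owner: Optional[str] = None
    purpose: Optional[str] = None
    consumers: List[str] = field(default_factory=list)


def find_gaps(assets: List[DataAsset]) -> Dict[str, List[str]]:
    """Flag assets with no owner, no stated purpose, or no use outside the owning team."""
    gaps: Dict[str, List[str]] = {}
    for asset in assets:
        issues = []
        if not asset.owner:
            issues.append("no owner")
        if not asset.purpose:
            issues.append("no stated purpose")
        # An asset used by nobody, or only by its owner, may be a silo.
        if set(asset.consumers) <= {asset.owner}:
            issues.append("possible silo")
        if issues:
            gaps[asset.name] = issues
    return gaps


# Illustrative entries only; a real inventory would come from a data catalog.
inventory = [
    DataAsset("orders_db", "on-prem SQL database", "sales",
              "order fulfillment", ["sales"]),
    DataAsset("clickstream_lake", "cloud data lake"),
    DataAsset("customer_profiles", "cloud warehouse", "marketing",
              "campaign targeting", ["marketing", "support"]),
]

for name, issues in find_gaps(inventory).items():
    print(f"{name}: {', '.join(issues)}")
```

Even a list this small surfaces the two situations in the quote: a database that only one team touches and a data lake with no recorded owner or purpose.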

Sometimes silos emerge because the data is owned by a specific business unit that may be reluctant to relinquish control over it, notes David Raab, founder of the Customer Data Platform Institute.

"It's usually a little more subtle than somebody standing there with their arms crossed saying, 'I'm not going to share my data with you,'" he adds. "It's more like, 'If there's no benefit to me or my group, then somebody else needs to pay for it.'"

Silos can lead to costly data duplication and prevent the organization as a whole from taking full advantage of that data. Often, they're created because business unit leaders aren't thinking hard enough about the big picture, says Anil Gadre, vice president of the Ezmeral go-to-market team at HPE.

"Historically, people have had this notion of 'I have a dataset and I'm going to do this one thing with it, and I have this other dataset and I'm going to do something else with that,'" Gadre says. "But we are increasingly seeing our customers create larger datasets that are used for multiple purposes."

For example, Gadre notes, one of the largest insurance companies in the world relies on a massive data lake that feeds 52 different business units, each with dozens of use cases.

"So you might have 500 different applications tapping into this common set of data, because they're being used in very different ways by those business units," he says.

In cases where enterprises must comply with data sovereignty regulations, data silos may be unavoidable. But in most situations, it's better to break them down, Gadre says.

Foster a data-centric culture

Managing data at this scale requires a top-down data governance strategy, says ESG's Leone.

"Data is growing at an alarming rate, and a majority of it is not analyzed," he says. "It's difficult for organizations to ensure trust in data if it's not properly integrated, cataloged, qualified, and made available to the right tools or—more importantly—the right people. Over the next year, data governance is going to be massively important, especially as organizations look to leverage more high-quality data."

It's why many organizations are hiring chief data officers who can reach across different business units to coordinate a unified strategy, Leone adds. But you also need to build a team with the right kinds of skills.

Gadre says organizations are showing an increasing interest in DataOps, creating teams of specialists that can manage the logistics involved in storing, sharing, and securing enormous datasets.

"Think about the supply chain logistics of getting the COVID-19 vaccine rolled out around the country and getting it administered," Gadre says. "The data logistics problem is not much different. You have to get the data from here to there. You have to get it to the right people who can use it in a timely way. Some data has a very short shelf life and loses value quickly, while other data doesn't. How do you store the data? How do you recover from failures? It's a layer cake of the many different things you have to do."

Prep your data for analytics

The primary reason most enterprises collect massive amounts of data is so they can apply AI to it and make smarter business decisions. But the insights that analytics can provide are only as good as the data fed to the machine learning models. You know the saying: Garbage in, garbage out.

"The biggest driver of successful AI scaling within any organization is having access to well-organized and relevant data," says James Hodson, CEO of the AI for Good Foundation, a nonprofit organization focused on the use of AI to address societal needs. "Companies that are better at collecting, storing, and analyzing data stand to gain a lot more from the principled introduction of AI into their processes than those that are still figuring out what data gives them an advantage and how to collect it."

But many companies underestimate the time and effort required to build an effective data infrastructure and hire the right people to manage it, and they may need to spend years collecting data before it becomes truly useful, Hodson says.

Identify the right use cases

Organizations also need to identify what data sources are the most useful for analytics purposes and the proper use cases to apply them to.

"On its own, data is about as useful as oil when you don't have an engine to put it in," says Bowden. "Data is only useful in the context of a particular problem you're trying to solve or a particular inference you're trying to get to—something that's actually going to drive business value."

Some business processes are more conducive to machine learning than others, notes Anastassia Fedyk, assistant professor of finance at the Haas School of Business at the University of California, Berkeley. For example, well-defined prediction problems, such as anticipating potential failures in a piece of industrial equipment, are good candidates for machine learning.

Predictions are less accurate for problems where factors outside your control can influence results (say, forecasting sales after a competitor has introduced a new product in the market), she notes.

"When enterprises ask me where to apply analytics, I ask them, 'What is the biggest thing preventing you from growing revenue or solving your cost problems?'" says Gadre. "'What is the number one thing you'd love to be better at?'"

Most organizations end up with five or 10 key business objectives they'd like to drive with analytics. Gadre advises them to assign a team to each objective and give them a few weeks to see if anything interesting arises from the data.

"It's the classic idea of fail fast," he adds. "Try it. If it doesn't work, celebrate what you learned and move on to the next thing. Even if it was a failure, finding out that this dataset was not useful is well worth the money."


This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.