Business and technology leaders in most organizations understand the power of big data analytics—but few are able to harness that power in the way they want. The challenges are complex, and so are the technologies. Identifying and investing in key principles will help you navigate that complexity to find the right way to tap the growing pools of information available to your organization.
A new Hewlett Packard Enterprise white paper breaks down the six main factors required to get a big data analytics platform right. We asked Paul Cattrone, worldwide platform elite team lead at HPE, to discuss the paper's insights and explain how companies can get big data right.
Expectations around data are higher than ever. Business users and customers demand results almost instantly, but meeting those expectations can be challenging, especially with legacy systems. Speed is not the only factor in implementing a big data analytics strategy, Cattrone says, but it's top of the list. He recalls working with a customer that needed to run queries on a 10-terabyte data set.
"With their existing solution, that query would take 48 hours to come back with an answer," he says. "And after 48 hours, that question is almost moot because the time to take action has passed."
By prioritizing time to insight in its move to a new analytics platform, the company immediately cut that 48 hours down to five minutes, Cattrone says. Wait times shrank to less than a second after the new solution was fully optimized, enabling it to provide data in time to fuel bottom-line results.
It's a given that your big data analytics solution must accommodate huge quantities of data, but it also needs to grow organically with data volumes. "Look at some of the older solutions: once you outgrew an appliance, it was rip and replace, which was very costly, with lots of downtime," Cattrone says. "Today, I'm able to grow my database in line with my data growth, and do it in a way that is transparent to the data consumer or analyst. A modern analytics solution introduces very little downtime, if any at all. Capacity and compute expansion happens in the background."
An important part of an analytics strategy is making sure it works with what you have—but also knowing which tools must be replaced and when.
"A lot of people have made investments in these older tools," Cattrone says, citing industry-standard extract, transform, load (ETL) tools as one example. "It's important to support those legacy tools. But at scale, and as the need for data and analysis grows, you may find that scaling those ETL solutions becomes a costly problem. It may make more sense to retool your ETL with a more modern and more parallel solution."
For many, Hadoop, an open source big data framework, has become synonymous with big data analytics. But Hadoop alone is not enough.
"A lot of people try to do everything with Hadoop," Cattrone says. "At the end of the day, Hadoop is a batch processing system, meaning that when I launch a job to analyze data, I go into a queue, and it finishes when it finishes. When you're talking about high-concurrency analytics, Hadoop is going to show its weaknesses."
As HPE's white paper puts it, "What's needed is...a way to harness the advantages of Hadoop without incurring the performance penalties and potential disruptions of Hadoop."
Organizations should support their most expert—and most in-demand—data workers by investing in tools that allow them to conduct more robust analyses on larger sets of data.
"What's key here is you want to move toward a solution where the data scientists can work on the data in place in the database," Cattrone says. "Let's say they have SQL Server. They're pulling a subset or sample of data out of the database, transferring it on their local machine, and running their analysis there. If they can run statistical models in-database, they're no longer sampling and can get their answer much faster. It's a much more efficient process."
As businesses move toward predictive analytics, they need more from their data technology. "It's no longer just reporting. It's no longer the aggregates of the data in your data warehouse," Cattrone says. "It's asking a very complex question of the data in your database—predictive, geospatial, and sentiment focused."
"I would say the majority of customers are still just doing strict reporting," he adds. "But two years ago, I started to see a shift toward predictive analysis and witnessed other advanced analysis start to grow. Organizations now—with the way data science has become more and more a corporate asset—there's definitely greater interest in becoming more predictive and more data-science savvy in nature."
Globally, data is growing at a rate of 40 percent per year, or two terabytes per second. In this environment, every business is going to struggle against an overwhelming tide of data. Understanding the new technologies that can help manage data at that speed and scale, and using them to drive business success, is vital if you want to avoid drowning in your data.
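To put a 40 percent annual growth rate in perspective, the compounding can be worked out directly. The short sketch below assumes the rate stays constant from year to year, which is a simplification; the function names and the starting volume are illustrative only.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Years for a quantity to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_volume(start_volume, annual_growth_rate, years):
    """Volume after the given number of years of compound growth."""
    return start_volume * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.40  # the 40 percent per year figure cited above
    print(f"doubling time: {doubling_time_years(rate):.2f} years")
    for year in range(1, 6):
        print(f"year {year}: {projected_volume(1.0, rate, year):.2f}x the starting volume")
```

At that rate, data volume doubles roughly every two years and grows more than fivefold in five, which is why capacity that expands in step with the data matters so much.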
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.