IoT analytics strategy: Cloud, edge, or both?
One element in working out an Internet of Things strategy is figuring out how and where to apply analytics (the process of examining the data collected) and which analytics deliver the most benefit. Sometimes it makes sense to go all-out with analytics on the edge; in other circumstances, it's better to analyze data in the cloud or in the data center. And a hybrid approach may be the most profitable path. Here's your guide to figuring out which to do when.
Let's start with a few definitions. "Analytics at the edge" means that part of the data processing is done relatively close to the sensor. The analytics may be performed on the sensor or a gateway close by.
You might think that anything labeled as analytics at the edge must be geared toward small datasets, since the data lives on or near small sensors. However, that usually isn't the case. A common reason to use distributed computation systems is that the data is too large to transfer to the cloud in real time. And in these circumstances, real time matters a great deal: These are the computations that keep an autonomous car from crashing, a robot from harming a human co-worker, or a refrigeration unit from spoiling a human organ in transit for transplant. If a machine must respond immediately to its own metrics, environmental sensor data, or some combination of those, analytics at the edge is the right choice.
Latency and connectivity considerations
In contrast, cloud-based analytics works on a larger variety of data. For instance, it can combine historical data with streaming data, or analyze the output from every device that uses edge analytics. Some cloud analytics run in real time; others are near real time or historical. In other words, they can be very fast, just not as fast as processing the data close to the device, where network latency and other connectivity issues don't come into play.
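The latency gap is easy to underestimate. Here is a back-of-the-envelope calculation of how far a car moves while it waits for a decision. The speed and the 100 ms and 5 ms delays are illustrative assumptions, not measurements of any real network or vehicle.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Return the metres travelled at speed_kmh during delay_ms milliseconds."""
    metres_per_second = speed_kmh * 1000 / 3600
    return metres_per_second * delay_ms / 1000


# Illustrative assumptions only: 108 km/h is 30 m/s; the delays are hypothetical.
SPEED_KMH = 108
CLOUD_ROUND_TRIP_MS = 100  # assumed round trip to a cloud region and back
EDGE_DECISION_MS = 5       # assumed time for an on-board decision

cloud_m = distance_during_delay(SPEED_KMH, CLOUD_ROUND_TRIP_MS)
edge_m = distance_during_delay(SPEED_KMH, EDGE_DECISION_MS)
print(f"cloud: {cloud_m:.2f} m, edge: {edge_m:.2f} m")  # cloud: 3.00 m, edge: 0.15 m
```

Any dropped connection or congested link only widens that gap.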
"Generally, if a specific task requires a high volume of data or if it is processor-intensive, the latency and cost required to move it to centralized cloud locations would simply be impractical, making edge computing the better fit," says Ersin Galioglu, vice president of Limelight Networks, a content delivery network.
"An example of this is processing large volumes of video or image data to detect leakages or other anomalies in oil pipelines, train tracks, or other infrastructure problems, and quickly providing alerts," Galioglu adds. "However, if there isn't a lot of data or it is not processor-intensive, the cloud might be a better fit. For example, for fleet management needs, businesses can track movement of vehicles to determine the most efficient routes."
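Galioglu's rule of thumb can be written down as a simple placement check. This is a sketch under stated assumptions: the `Workload` fields, the 100 ms cloud round trip, and the example data rates are hypothetical, and a real decision would also weigh cost and connectivity.

```python
from dataclasses import dataclass

# Hypothetical figure: assume a round trip to the nearest cloud region takes ~100 ms.
ASSUMED_CLOUD_ROUND_TRIP_MS = 100.0


@dataclass
class Workload:
    name: str
    data_rate_mbps: float      # raw data generated at the site
    uplink_mbps: float         # bandwidth available for sending it to the cloud
    max_response_ms: float     # how quickly a result is needed
    processor_intensive: bool  # e.g., video or image analysis


def place(w: Workload) -> str:
    """Return 'edge' or 'cloud' using a rough rule of thumb."""
    if w.max_response_ms < ASSUMED_CLOUD_ROUND_TRIP_MS:
        return "edge"  # a cloud answer cannot arrive in time
    if w.data_rate_mbps > w.uplink_mbps or w.processor_intensive:
        return "edge"  # too much data, or too much compute, to ship raw
    return "cloud"


pipeline_video = Workload("pipeline video", 25.0, 5.0, 2000.0, True)
fleet_routes = Workload("fleet routing", 0.01, 1.0, 60000.0, False)
print(place(pipeline_video), place(fleet_routes))  # edge cloud
```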
When you have more data, says Dana Gardner, principal analyst at Interarbor Solutions, you can head off challenges such as supply chain interruptions, labor inefficiencies, and security breaches. Analyzing the data, whether in retrospect or in real time, lets you anticipate problems sooner and work out how to fix them. That can result in huge savings, far more than the cost of deploying the sensors.
"What we're doing with IoT is significantly increasing the nodes on the network," Gardner says. "The network becomes more valuable. What we can do with that data also becomes more valuable, and we have the capacity to manage that data now." Because we can now churn through and analyze data of much greater velocity and variety, he says, we can take advantage of many more data points. The next big step is improving the network's ability to harness that data. "We already have the means in the cloud to manage that data and take that data and turn it into very meaningful predictions and analysis," Gardner adds.
The choice between edge and cloud typically comes down to the practicalities of distance. Think of it this way: In a tennis game, the odds of reacting in time and hitting the ball over the net are better if the player, who is closest to the ball, does the calculations herself than if the coach does them and yells to tell her where to swing.
However, the coach has the benefit of analyzing many tennis players and many racket strokes. So even off the court, the coach can better analyze the particular player's performance and suggest improvements.
Together as one
As the tennis analogy suggests, companies run analytics both at the edge and in the cloud, each where its advantages are greater. Often only part of the processing is done at the edge, so you really do need both: the cloud and the edge.
The most common reason analytics is done on the edge is to get results fast so that an appropriate response can happen in real time, such as safe steering and braking in an autonomous car.
"Consider the automated manufacturing process," says James Kirkland, chief architect for IoT at Red Hat. If a machine encounters a problem, it's critical to identify issues and respond quickly according to predetermined rules, such as adjusting operating speed or temperature. That happens at the edge for speed reasons.
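Predetermined rules like these can be as simple as a list of conditions checked against each reading on the machine or a nearby gateway. The thresholds, sensor fields, and action names below are hypothetical placeholders, not values from Red Hat or any equipment maker.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reading:
    temperature_c: float
    vibration_mm_s: float


@dataclass
class Rule:
    name: str
    condition: Callable[[Reading], bool]
    action: str


# Hypothetical limits and action names; real values would come from the
# equipment maker and the plant's own operating procedures.
RULES: List[Rule] = [
    Rule("overheat", lambda r: r.temperature_c > 90.0, "increase_cooling"),
    Rule("excess_vibration", lambda r: r.vibration_mm_s > 10.0, "reduce_speed"),
]


def evaluate(reading: Reading, rules: List[Rule] = RULES) -> List[str]:
    """Return the action for every rule this reading triggers.

    Runs entirely on the local device or gateway: no network call is needed.
    """
    return [rule.action for rule in rules if rule.condition(reading)]


print(evaluate(Reading(temperature_c=95.0, vibration_mm_s=4.0)))  # ['increase_cooling']
```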
"The data feeds from all similar machines would then be sent to the cloud for analysis. That would find patterns and trends that build machine learning and could be used to optimize how the machines are configured for maximum performance," Kirkland adds.
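Real fleet-wide machine learning is beyond a short example, but a simple statistical stand-in shows the shape of the cloud side: pool summaries from similar machines and flag the ones that look unlike their peers. This uses a z-score rather than any particular machine-learning method, and the machine names, temperatures, and threshold of 2 are hypothetical.

```python
import statistics
from typing import Dict, List


def flag_outliers(avg_temp_c: Dict[str, float], z_threshold: float = 2.0) -> List[str]:
    """Return IDs of machines whose average temperature is unusual for the fleet.

    Uses a population z-score: how many standard deviations a machine's value
    sits from the fleet mean.
    """
    values = list(avg_temp_c.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # every machine looks the same; nothing stands out
    return sorted(
        machine for machine, value in avg_temp_c.items()
        if abs(value - mean) / spread > z_threshold
    )


# Hypothetical fleet: nine machines averaging 70 C and one running hot.
fleet = {f"press-{i}": 70.0 for i in range(1, 10)}
fleet["press-10"] = 90.0
print(flag_outliers(fleet))  # ['press-10']
```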
Another reason to use analytics both in the cloud and at the edge is to shrink the amount of data slated for transfer to the cloud. Truly huge datasets, of the sizes typical for continuous streams from millions of devices, are too large to transfer to the cloud before the data loses its value. (By the time the data arrives, it's too late: The car has already hit another car!) It's also expensive to store datasets that large, especially when they keep growing, even though storage is cheap today. So the right answer for many companies is to process some of the information at the edge and send only the resulting outputs to the cloud for further analysis or storage.
Finally, there's bandwidth. For example, if you have many agricultural fields in rural locations around the world, each fitted with a variety of sensors, it's a serious challenge to ensure constant, reliable Internet connections suitable for sending huge batches of data (or even streaming data). Again, it is generally more practical to process some information at the edge and send either just the outputs or narrowed bands of raw data to the cloud.
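Processing at the edge and sending only the outputs can be as basic as collapsing fixed windows of raw readings into summary statistics. A minimal sketch, assuming a numeric sensor stream and assuming min, max, and mean are the outputs worth keeping:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowSummary:
    start: int      # index of the first reading in the window
    count: int
    minimum: float
    maximum: float
    mean: float


def summarize(readings: List[float], window: int) -> List[WindowSummary]:
    """Collapse each run of `window` raw readings into one summary record."""
    if window <= 0:
        raise ValueError("window must be positive")
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append(
            WindowSummary(start, len(chunk), min(chunk), max(chunk), sum(chunk) / len(chunk))
        )
    return summaries


# Hypothetical stream: 1,000 raw readings summarized in windows of 100.
raw = [float(i % 10) for i in range(1000)]
summaries = summarize(raw, window=100)
print(len(raw), "raw values ->", len(summaries), "summaries")
```

Here each summary carries five numbers in place of 100 raw readings, so roughly a twentieth of the original volume goes upstream. Which statistics to keep, and whether to also forward raw data around anomalies, depends on what the cloud-side analysis needs.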
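For sites with unreliable connections, a common pattern is store-and-forward: the edge device buffers its outputs and sends them whenever the link comes back. The sketch below is one way to do that in Python; the record format, buffer size, and `send` callback are assumptions, and dropping the oldest records when the buffer fills is only one possible policy.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForward:
    """Hold edge outputs while the uplink is down and send them when it returns."""

    def __init__(self, capacity: int, send: Callable[[List[dict]], bool]) -> None:
        # With maxlen set, appending to a full deque silently drops the oldest item.
        self._buffer: Deque[dict] = deque(maxlen=capacity)
        self._send = send

    def enqueue(self, record: dict) -> None:
        self._buffer.append(record)

    def flush(self) -> int:
        """Try to send everything buffered; return the number of records delivered."""
        if not self._buffer:
            return 0
        batch = list(self._buffer)
        if self._send(batch):
            self._buffer.clear()
            return len(batch)
        return 0  # still offline: keep the records for the next attempt


# Usage with a stand-in uplink that is offline at first (hypothetical).
delivered: List[dict] = []
link = {"up": False}


def fake_uplink(batch: List[dict]) -> bool:
    if not link["up"]:
        return False
    delivered.extend(batch)
    return True


queue = StoreAndForward(capacity=3, send=fake_uplink)
for seq in range(5):
    queue.enqueue({"seq": seq, "soil_moisture": 0.31})
print(queue.flush())  # 0: offline, nothing sent
link["up"] = True
print(queue.flush())  # 3: only the newest three survived the outage
```

A bounded buffer keeps a long outage from exhausting memory on a small device; the trade-off is that the oldest results are lost first.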
"While cloud and edge computing fit different business needs for analysis and decision-making, the truth is that optimized business operations depend on both," says Galioglu.
Adding machine learning and analytics to the mix…gradually
Almost by definition, IoT sensors are collecting a lot of data. That makes it appealing to apply machine learning and artificial intelligence (AI) to those datasets. These sophisticated algorithms can find patterns too big or subtle for humans to easily detect. However, while AI and machine learning are already invaluable tools today, the industry is far from mature.
Additionally, it takes time for machines to learn, and they learn indiscriminately, so managing their exposure to information is very important. For example, Twitter users taught Microsoft's AI-driven chatbot to swear like a sailor. While some Twitter users found that funny, it was a serious obstacle to teaching the machine to respond appropriately to human conversation. It did, however, teach researchers about the importance of training tools correctly.
At this point, AI and machine learning are more common in IoT analytics in the cloud. That's because of the immaturity of the technology (and sometimes the teachers), the time it takes to learn, and the need for close human supervision to make sure it is coming to the correct conclusions. It might be comic relief for AI to curse like a sailor when there's almost a car accident, but most autonomous car users would rather it instantly make a correct decision, avoid the accident, and skip the commentary entirely.
There are also security issues, given a history of insecure IoT devices; criminals now aim to manipulate data rather than merely steal it. Manipulating data analytics at the edge could cause a car to crash, a pacemaker to malfunction, traffic lights to go awry, or a drone to attack rather than protect.
"From a security perspective, this guarantees potential disasters should one of several drone systems or the software used to control them become compromised or manipulated," said Mohamad Amin Hasbini, a Securing Smart Cities board member, in a statement to the press.
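One common safeguard against the data manipulation described above is to have each device authenticate what it sends, so the receiving gateway or cloud service can reject readings altered in transit. The sketch below uses Python's standard `hmac` module with SHA-256; the message layout, device name, and key handling are assumptions for illustration. It does not help if an attacker controls the device itself or has stolen its key.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Sorted keys and fixed separators so sender and receiver hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag computed over the payload."""
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify(message: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


# Hypothetical key; a real deployment would provision a unique secret per device.
KEY = b"example-key-do-not-use"
msg = sign({"device": "cooler-7", "temp_c": 4.2}, KEY)
tampered = {"payload": {**msg["payload"], "temp_c": 12.0}, "tag": msg["tag"]}
print(verify(msg, KEY), verify(tampered, KEY))  # True False
```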
A survey by Bain & Co. asked U.S. respondents which factors they believe limit adoption of the IoT and analytics. Respondents said their primary challenges include security, integration with current systems, and achieving a return on their investment:
Top barriers to implementation
- High price or unclear economic benefits
- Difficulty integrating IT with operational technology
- Lack of internal technical expertise to implement and operate
- Not compatible across devices and systems
- Vendor lock-in, or lack of open standards
- Data portability and ownership
Why it's important to deploy both now
While many companies are just now experimenting with IoT business cases, the competitive advantage of using IoT is already evident.
Gartner estimates that 8.4 billion connected things were in use worldwide by the end of 2017, up 31 percent from 2016. Further, the researchers expect the count to reach 20.4 billion by 2020. Total spending on endpoints and services, they say, will reach almost $2 trillion in 2018. According to Gartner, IoT is a "marriage of operational and information technology." And if that's the case, then analytics is the heartbeat of that marriage and its pulse is felt everywhere, from the edge to the cloud.
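Gartner's figures imply a couple of numbers worth checking. The arithmetic below derives the implied 2016 installed base and the compound annual growth rate from 2017 to 2020, treating "by 2020" as three years of compounding; that interpretation is an assumption.

```python
# Figures as quoted from Gartner above, in billions of connected things.
COUNT_2017 = 8.4
GROWTH_2016_TO_2017 = 0.31
COUNT_2020 = 20.4

# If 2017 is 31% above 2016, then 2016 = 2017 / 1.31.
implied_2016 = COUNT_2017 / (1 + GROWTH_2016_TO_2017)

# Compound annual growth rate, assuming three years of growth (2017 -> 2020).
YEARS = 3
implied_cagr = (COUNT_2020 / COUNT_2017) ** (1 / YEARS) - 1

print(f"Implied 2016 installed base: {implied_2016:.1f} billion")  # 6.4 billion
print(f"Implied annual growth, 2017-2020: {implied_cagr:.0%}")     # 34%
```

In other words, reaching 20.4 billion by 2020 implies sustained growth of roughly a third per year.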
How and where to apply analytics: Lessons for leaders
- Cloud-based analytics can work on a larger variety of data.
- The most common reason to use analytics at the edge is to get results faster.
- The choice typically comes down to the practicalities of distance: at the edge (on or near the sensor) for speed; in the cloud for variety of data to be analyzed.
- Most organizations use both, in combination, to leverage the strengths of each.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.