Why peer comparisons are changing AIOps
The days of encountering a network problem on a Friday afternoon but not being able to get it fixed until Monday are long gone. A competent IT department is expected to troubleshoot problems 24/7.
Thanks to automation, many problems can be resolved within minutes. For example, if IT encounters a common problem, an algorithm can propose tried-and-tested solutions based on data from other networks. Of course, this works only if data is continuously collected and processed in an intelligent way, which is exactly what AIOps—the use of artificial intelligence in IT operations—does.
But what happens if an organization faces a rare problem or one it has never encountered before? In a world where demand is ever-changing and where it's hard enough to keep up with the newest tech developments, this is a frustratingly frequent occurrence. The scenario applies to practically everything in IT, but especially to operating networks. That's where peer comparisons step in.
Many problems aren't as special or individual as they initially seem. Organizations that operate in similar buildings may face similar problems, such as glass walls weakening their Wi-Fi signal. The problem may seem unique at first, but if one organization has managed to solve it, another can benefit from that experience.
No AI without lots of quality data
It's standard practice for networking vendors to collect telemetry data from every customer's wireless, switching, and WAN devices and from all user and Internet of Things clients to create a working baseline. This is essential: Any AI needs data to train on, and a dataset that's too small or not diverse enough will not deliver the results that developers and network operators are aiming for. The point isn't to monetize anything personal about your data but to measure the behavior of the technology being used.
To obtain such a vast dataset, it's not enough to perform measurements for a few weeks or months. In the case of networking, the value of a dataset shows only after many years of consistent data collection. Top players in the industry are currently collecting and processing around 32 TB of telemetry data each day and have been doing so for several years.
It may sound easy to build a collection of telemetry data over time, but that's not the case. What the user wants (stable connectivity and no traffic overload) needs to be translated into quantities that are directly measurable. Such quantities include throughput, latency, and resource efficiency, and they can be influenced by various network settings. It's the interplay of these quantities that determines what the user experiences.
Analysis of the telemetry can also find fingerprints of key applications, such as Zoom or Microsoft Teams, and make their use part of the analysis.
Finally, all that data needs to be used to understand which settings make users happy and network devices work best. In principle, AIOps has two ways of doing that with machine learning algorithms. With supervised learning, the analysis engine learns from the settings that have been used in the past. It can then recommend those settings that lead to optimal performance to the user. With active learning, it can also recommend settings that have never been tried before but which are likely to increase network performance.
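As a rough illustration of the first approach, the sketch below recommends the previously used settings with the best average outcome. It is a deliberately simple stand-in for a trained supervised model, not a description of any vendor's engine: the setting names (channel_width_mhz, tx_power_dbm), the performance scores, and the function recommend_from_history are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical observations: settings that were used in the past,
# paired with an observed performance score between 0 and 1.
# All names and values are illustrative assumptions, not real telemetry.
HISTORY = [
    ({"channel_width_mhz": 20, "tx_power_dbm": 14}, 0.71),
    ({"channel_width_mhz": 20, "tx_power_dbm": 14}, 0.75),
    ({"channel_width_mhz": 40, "tx_power_dbm": 14}, 0.82),
    ({"channel_width_mhz": 40, "tx_power_dbm": 14}, 0.86),
    ({"channel_width_mhz": 80, "tx_power_dbm": 17}, 0.64),
]


def recommend_from_history(history):
    """Return (settings, mean score) for the best-performing past settings."""
    scores = defaultdict(list)
    for settings, score in history:
        # Dicts are not hashable, so use a sorted tuple of items as the key.
        key = tuple(sorted(settings.items()))
        scores[key].append(score)
    best_key = max(scores, key=lambda k: mean(scores[k]))
    return dict(best_key), mean(scores[best_key])


best_settings, best_score = recommend_from_history(HISTORY)
print(best_settings, round(best_score, 2))
```

Note that this can only ever recommend settings that already appear in the history. Suggesting untried settings, as the second approach does, would require a model that can predict performance for combinations it has not seen.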
Find what makes the difference: Network environment factors
There are various solutions for deploying the machine learning algorithms in the most efficient way and making them return the best possible recommendations about network settings. It's important to keep in mind, however, that settings aren't everything.
Other influences include the building materials where the network is in use, how far access points are spaced apart, and what types of devices are used to connect to the network. These are factors that can't be adjusted by AIOps. Rather, the network settings should be tailored to them.
Take a university campus and a fast food restaurant. Campuses are typically vast spaces spread out over several buildings, where users often spend half or a whole day at a time and may use a stationary computer alongside their smartphones. Fast food restaurants typically cover just the ground floor of a single building, where users won't be spending more than an hour and will primarily surf with their smartphones.
This affects the network configuration. On a university campus, it's important that users can access the network from any location, that they can connect with their smartphones and laptops alike, and that they won't be kicked out of the network after two hours of usage.
In a fast food restaurant, roaming will not be an issue and users are accustomed to being asked to reauthenticate after a set period of time. For a busy restaurant, it's more important than on a college campus that people passing by on the street don't take advantage of the network and consume its capacity. Appropriately configured network settings can limit this.
Leveraging peer groups to understand external circumstances
Telemetry won't pick up, and the vendor database won't store, whether the customer site is a restaurant or a college campus. It will only be able to determine characteristics of the installation, such as the number of access points and the patterns of use in the installation.
With such telemetry from a large number of customer sites, AIOps can place sites into peer groups with similar characteristics. These peer groups face similar challenges and typically require similar network settings. This makes them directly comparable to one another, and as a result, the solutions that worked for one member in the peer group can be scaled to serve other members. Administrators therefore end up with a set of possible solutions to problems, peer group by peer group, that are more likely to work than a one-size-fits-all approach.
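To make the idea of peer groups concrete, here is a minimal sketch that assigns sites to groups using only characteristics telemetry could plausibly reveal. It uses fixed, hand-picked buckets as a stand-in for whatever clustering a real AIOps platform applies. The site records, field names, and thresholds are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-site summaries derived from telemetry. Note that none of
# them says "campus" or "restaurant"; only installation traits are known.
SITES = [
    {"site": "A", "access_points": 420, "median_session_min": 240},
    {"site": "B", "access_points": 380, "median_session_min": 300},
    {"site": "C", "access_points": 4, "median_session_min": 25},
    {"site": "D", "access_points": 6, "median_session_min": 35},
    {"site": "E", "access_points": 5, "median_session_min": 180},
]


def size_bucket(access_points):
    """Coarse installation size; the thresholds are arbitrary assumptions."""
    if access_points < 10:
        return "small"
    if access_points < 100:
        return "medium"
    return "large"


def session_bucket(minutes):
    """Coarse usage pattern based on the median session length."""
    return "short" if minutes < 60 else "long"


def assign_peer_groups(sites):
    """Group site names by (size bucket, session bucket)."""
    groups = defaultdict(list)
    for s in sites:
        key = (size_bucket(s["access_points"]),
               session_bucket(s["median_session_min"]))
        groups[key].append(s["site"])
    return dict(groups)


peer_groups = assign_peer_groups(SITES)
for key, members in peer_groups.items():
    print(key, members)
```

In this toy data, sites A and B land in one group and C and D in another, while E (few access points but long sessions) forms a group of its own.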
AIOps can take the different sites in a peer group and rank them based on performance. It won't work every time, but the settings used by the peers with the best performance may provide guidance for administrators of networks with lesser performance.
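The ranking step might look something like the following sketch, which orders the sites in one peer group by a performance score and shows a lower-ranked site where its settings differ from those of the top performer. The score, the session_timeout_min setting, and the helper names are hypothetical; a real system would weigh many settings and would not treat the top site's configuration as automatically correct.

```python
# Hypothetical members of a single peer group with a performance score
# (higher is better) and their current settings. Illustrative values only.
PEER_GROUP = [
    {"site": "C", "score": 0.91, "settings": {"session_timeout_min": 60}},
    {"site": "D", "score": 0.62, "settings": {"session_timeout_min": 240}},
    {"site": "F", "score": 0.88, "settings": {"session_timeout_min": 60}},
    {"site": "G", "score": 0.70, "settings": {"session_timeout_min": 120}},
]


def rank_sites(group):
    """Return the sites sorted from best to worst performance."""
    return sorted(group, key=lambda s: s["score"], reverse=True)


def settings_diff(site, reference):
    """Map each differing setting name to (site value, reference value)."""
    return {
        name: (site["settings"].get(name), ref_value)
        for name, ref_value in reference["settings"].items()
        if site["settings"].get(name) != ref_value
    }


def guidance_for(site_name, group):
    """Compare one site's settings with those of the best-ranked peer."""
    best = rank_sites(group)[0]
    site = next(s for s in group if s["site"] == site_name)
    if site is best:
        return {}  # Already the top performer; nothing to suggest.
    return settings_diff(site, best)


ranking = [s["site"] for s in rank_sites(PEER_GROUP)]
print(ranking)
print(guidance_for("D", PEER_GROUP))
```

The output is guidance rather than an instruction: it tells the administrator of site D which settings differ from the best-performing peer, and a human still decides whether the change fits.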
Peer comparisons help deliver solutions instead of possible problems
Without peer comparisons, it's hard to establish a baseline for network performance that goes beyond just fixing what is already established. Since the demands of a network are vastly different in a stadium compared with a coffee shop, a provider would be poking around in the dark if it sent the same settings recommendations to both. IT teams would be getting useless recommendations and would be left trying to fix problems on their own.
In contrast, when using peer groups, a class baseline naturally emerges for what good performance is for an airport, a mall, or any other similar location where networks are installed.
The bottom line: Better performance through individualization
Peer comparisons provide tailored recommendations instead of trying to solve all problems in a one-size-fits-all approach. This can keep networks flexible to adapt to changes in their particular environment and ultimately prevent issues before they happen.
Instead of throwing all networks into one category and trying to make sense of it, peer comparisons bring AIOps to the next level. They separate network settings that are easy to change from environmental factors outside the realm of their influence. This makes it easier to establish a class baseline of what is possible and what possible fixes there are in case of trouble.
All this isn't possible without a vast dataset and continuous data collection. It's also not possible without a dynamic baseline that ensures each customer's network is monitored for problems within the customer's own environment using its own data, to meet expected service levels. Peer comparisons are, then, a source of recommendations that elevates AIOps from a tool that just fixes networks to a solution that continuously optimizes how a network performs.
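One simple way to picture a dynamic baseline is a check that compares each new measurement with the recent history of the same site. The sketch below flags a latency reading that deviates strongly from a rolling window of that site's own data. The window size, the threshold of three standard deviations, and the sample values are assumptions chosen for illustration, and real platforms likely use more sophisticated models.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, window=7, threshold=3.0):
    """Flag `latest` if it is far from the site's own recent baseline.

    The baseline is the mean of the last `window` readings; "far" means
    more than `threshold` sample standard deviations away from it.
    """
    recent = history[-window:]
    if len(recent) < 2:
        return False  # Not enough data to form a baseline yet.
    mu = mean(recent)
    sigma = stdev(recent)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical latency readings (ms) for a single site.
latency_ms = [21.0, 19.5, 20.2, 22.1, 20.8, 19.9, 21.4]

print(is_anomalous(latency_ms, 21.0))
print(is_anomalous(latency_ms, 48.0))
```

Because the window moves with each new reading, the baseline adapts as the site's normal behavior changes, which is what separates it from a fixed threshold applied to every customer.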
Aruba's Trent Fierro contributed to this story and other coverage.
AIOps peer comparisons: Lessons for leaders
- We are still learning new applications for AI and big data and the business benefits they bring.
- Network health and performance are complicated topics. The knowledge necessary to optimize them may not all be in your hands already.
- Network performance is a function of a large number of variables; AI and big data can make sense of what would overwhelm a human.
With a proper baseline for your environment, it's logical to add peer comparisons to establish how your network performs compared with other like networks.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.