
Lost in transition: Why hybrid cloud systems need AIOps

Modern hybrid cloud systems will only get more complex, eventually too complex for IT teams to tune or troubleshoot by hand. But AI can do it.

The benefits of hybrid cloud platforms are enormous, and their growth will clearly only continue. But with those benefits, including adaptability, scalability, and reliability, comes a price in complexity. That complexity arises from the array of components and different locations of those components within hybrid cloud architectures. And it is in the interactions of those components where the potential losses in transition reside: losses of efficiency, losses of accuracy, and losses of the benefits that companies were promised when they adopted hybrid cloud.

This is where artificial intelligence picks up the heavy lifting. At its core, AI is designed to see patterns and problems that are far beyond the scope of the human brain. It can solve these hugely complex issues by breaking down the operational barriers of the hybrid environment. The use of AI to diagnose and optimize operations is called AIOps.

Each organization that approaches a hybrid cloud platform comes with its own transition pain points. The following examples can provide a top-level understanding of a few of those points and the role of AIOps in easing the transition.

Legacy systems and applications

Established organizations have built their technological platforms with systems and applications that may have been in place for years, if not decades. Legacy, proprietary, and core mission-critical applications can't always take advantage of true cloud operating models in their current format.

Of course, the hybrid model is designed knowing that some components can't be put on the cloud. Yet, a potential transition pain point can occur with an on-prem data center that has all assets (hardware, software, networking) in one location and relies on a multitude of interdependencies with other applications within that system. When moving some applications to the cloud, those interdependencies can cause deployment delays and reliability issues.

Essentially, a legacy application may have some subcomponent that tries to access services that either don't exist in the public cloud or have not been moved there yet. It gets a "nobody here" response and fails. Even when the services have been moved, microservices applications in the cloud can be ephemeral. They spin up, exist for perhaps a short period of time, and then spin back down. They also move around, going wherever the resources are to provide the best utilization and efficiency possible. A legacy application may have a hard time using a dynamic service if it was built on the assumption that the needed service is always in a specific place.
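
To make the "specific place" assumption concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `ServiceRegistry` class, the addresses, and the 30-second lifetime are assumptions, not part of any particular product. It contrasts a legacy-style lookup that always returns one fixed address with a lookup that asks a registry where a short-lived service currently lives.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> (address, expiry time)."""
    ttl_seconds: float = 30.0
    _entries: Dict[str, Tuple[str, float]] = field(default_factory=dict, repr=False)

    def register(self, name: str, address: str, now: Optional[float] = None) -> None:
        # Each registration is only valid for ttl_seconds, mimicking an
        # ephemeral instance that must keep announcing itself.
        now = time.monotonic() if now is None else now
        self._entries[name] = (address, now + self.ttl_seconds)

    def resolve(self, name: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(name)
        if entry is None or entry[1] <= now:
            return None  # never registered, or the instance has spun down
        return entry[0]


LEGACY_BILLING_ADDRESS = "10.0.0.12:8443"  # hypothetical hardcoded location


def legacy_lookup(_name: str) -> str:
    """Legacy assumption: the service always lives at one fixed address."""
    return LEGACY_BILLING_ADDRESS


def dynamic_lookup(registry: ServiceRegistry, name: str,
                   now: Optional[float] = None) -> str:
    """Ask the registry at call time; fail loudly if nothing is alive."""
    address = registry.resolve(name, now)
    if address is None:
        raise LookupError(f"no live instance of {name!r}")
    return address
```

In this sketch, `legacy_lookup` keeps returning the old address even after the service has moved, which is the failure mode described above, while `dynamic_lookup` either follows the service to its new address or reports that no instance is alive.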

These are problems that are notoriously difficult to diagnose, but AIOps can cut through your mountain of log files to find the source of the problem. It may be able to suggest solutions to the problem, and it should get better in these suggestions over time.
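
As a rough illustration of what "cutting through log files" can mean at its simplest, the following Python sketch groups error lines into templates and reports which service started failing first. It is a toy under stated assumptions, not how any AIOps product works: the log format (`<ISO timestamp> <service> <LEVEL> <message>`), the function names, and the idea that the earliest failing service is a useful lead are all hypothetical simplifications.

```python
import re
from collections import Counter

# Assumed log format: "<ISO-8601 timestamp> <service> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+) (\S+) (ERROR|WARN|INFO) (.*)$")


def template_of(message: str) -> str:
    """Collapse variable parts (hex ids, numbers) so similar messages group."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    return re.sub(r"\d+", "<n>", message)


def summarize_errors(lines):
    """Return (error template counts, service with the earliest error)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        timestamp, service, level, message = match.groups()
        if level != "ERROR":
            continue
        counts[(service, template_of(message))] += 1
        # Same-format ISO timestamps sort correctly as plain strings.
        if service not in first_seen or timestamp < first_seen[service]:
            first_seen[service] = timestamp
    earliest = min(first_seen, key=first_seen.get) if first_seen else None
    return counts, earliest
```

Given a portal that times out shortly after an inventory service begins refusing connections, `summarize_errors` would point at the inventory service as the earliest failure. A real system would weigh far more signals than timestamp order.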

Varied stack components

It would be great to have a server stack that has the same logo on every part. Unfortunately, the volatile landscape of technology has seen tech companies rise, fall, and merge. An organization could, within its server stacks, have components from half a dozen different brands. And that's just the hardware.

Here's a plausible scenario: Imagine you're a large company. Your growth could be both organic and through acquisition. The application that runs your customer portal was developed by a company you acquired and runs in Amazon Web Services. Your company's core datasets exist both on premises and in Microsoft Azure. Due to growth, your web services are now a mishmash.

For hardware and cloud services alike, the crunch comes in how those different brands and services interact with each other on premises and in the cloud. An update to either environment could create errors or unexpected behavior involving multiple components in a way that human analysts can't easily understand.

Once again, AIOps thrives with complexity like this. A properly prepared AIOps system can be expert enough on all the different products and platforms and up to date enough on changes in their configurations to advise on the most complex installations.

Departmental silos and retaining experience

Where the last two examples are largely rooted in software and hardware, the issues caused by departmental silos point directly to human fallibility. Even within the same organization, departments working together on a hybrid cloud platform, using their different tools, can create a fragmented view of all processes, especially if each department is using different cloud services. Even with the aid of a comprehensive procedure manual compiled and updated by all departments, there seems to always be the issue of "I can't do my task until that other person does."

The problem of retaining experience complicates the narrative. A procedure manual may contain a database of knowledge, but often it's the employee with years of experience who understands when to apply that knowledge. When that employee leaves the company, they are usually replaced by someone who may not be as skilled or who approaches troubleshooting in different ways.

With AIOps, the AI can learn how to solve these problems, and it in turn becomes the subject matter expert that provides recommendations to humans to fix the things that the AI doesn't explicitly have permission to touch. The effect of a senior employee leaving becomes less of an issue. The dataset that is used to train and re-train the model can be refreshed as issues outside of the AI's current capabilities are addressed, removing the need to manually update manuals and processes.

Operational visibility

Underlying all of the points above, a lack of visibility into your entire data center infrastructure creates gaps where losses in transition can occur.

Traditionally, end users have had access to every piece of the stack (hardware, software, networking). They could interrogate the stack for different types of information: Is the CPU busy? Is the network link congested? Are there lock states in the database? When you move to a hybrid model, where you might have reserved capacity or parts of the application inside the public cloud, the visibility needed to answer those questions is greatly reduced. You're not going to get to see the actual physical Azure rack and then query it at your convenience. You get the data the public cloud provider presents to you, when it presents it to you.

Ideally, a dedicated IT team would be all that is needed for monitoring hybrid cloud systems, but the very nature and speed of hybrid cloud platforms creates limited operational visibility. Traditional monitoring tools and operational tools just aren't flexible enough or quick enough to exist in a hybrid world—there are just too many moving parts moving too fast in too many places. But it's not too many for AIOps.

AIOps is the solution

AI has the power to absorb huge amounts of information about the hybrid environment and push messages out to identify faults along the chain—for example, an issue with the cloud service or with the localized networking configuration.

Programmatically building an AI model that touches the data points in the on-prem data center and in any cloud-based services allows a depth of analysis well beyond what humans or even conventional tools can achieve. AIOps collects data, analyzes it, and conducts diagnostics on the service, all at the same time.
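
As a very small stand-in for that idea, the Python sketch below merges metric samples from an on-prem source and a cloud source into one timeline and flags outliers. It is only a sketch under assumptions: the sample shape (`ts`, `value`), the source names, and the choice of a median-based modified z-score with a 3.5 cutoff (a common statistical rule of thumb, swapped in here in place of a trained model) are illustrative, not a description of any AIOps product.

```python
from statistics import median


def merge_sources(*sources):
    """Combine (name, samples) pairs into one list sorted by timestamp.

    Each sample is assumed to be a dict with "ts" and "value" keys; the
    source name is attached so a flagged sample can be traced back.
    """
    merged = [dict(sample, source=name)
              for name, samples in sources
              for sample in samples]
    return sorted(merged, key=lambda s: s["ts"])


def flag_outliers(samples, threshold=3.5):
    """Flag samples whose modified z-score (median/MAD based) exceeds threshold."""
    values = [s["value"] for s in samples]
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to compare against, so nothing is flagged
    return [s for s in samples
            if 0.6745 * abs(s["value"] - med) / mad > threshold]
```

Calling `flag_outliers(merge_sources(("on_prem", a), ("cloud", b)))` returns the unusual samples along with the environment they came from, which is the kind of cross-environment view the paragraph above describes, in miniature.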

Ideally, the industry will consolidate around a set of standards that lets every vendor's hardware and software report performance data, detailed error conditions, and any other information required to build a comprehensive model. This does go against the practice of major vendors building closed systems to create a sense of perceived value-add for their solutions. Most likely, a standard like this will be driven by major cloud players looking to extend their services into the on-prem hybrid cloud world.

Until such a standard is widely accepted, organizations using hybrid cloud platforms still have options to apply AI in navigating the transitions for themselves.

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.