
The do's and don'ts of machine learning

Successful machine learning models demand rigorous and consistent data management. In this Tech Talk episode, HPE distinguished technologist Rajesh Vijayarajan looks at solutions for large, complex implementations.

When launching transformative technologies such as machine learning, organizations not only need an edge-to-core strategy but the expertise to execute on it. That's where Rajesh Vijayarajan, a distinguished technologist at Hewlett Packard Enterprise, steps in. He helps customers and partners assemble all the pieces of their machine learning models and move the implementations into production.

Please read: Accelerating time to insights by operationalizing machine learning in the enterprise

"A lot of enterprises today have seen machine learning projects fail, and that's mostly attributed to the fact that they're not able to differentiate between what it takes to develop machine learning models in a very synthetic setting and what it takes to actually put them to the test in the real world," Vijayarajan says in this Tech Talk episode with host Robert Christiansen, vice president of strategy in the Office of the CTO at HPE.

Continuous improvement

When deploying machine learning, organizations must focus on two things, he says. First, once a machine learning model is put into production, there is the issue of data drift and the need to continually improve upon and update the model as the dataset the machine learning was trained on expands. Second is the concept of reproducibility, or explainability, of the model—something critical in highly regulated industries, Vijayarajan notes.
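Data drift can be caught with a simple statistical check. The sketch below is a minimal, hypothetical illustration (not HPE's tooling): it flags retraining when the mean of incoming production data moves too far from the training distribution, measured in units of the training standard deviation. Real deployments would use richer tests such as the population stability index or a Kolmogorov-Smirnov test.

```python
import random
import statistics

def drift_score(train_sample, live_sample):
    """Crude drift check: how far the live mean has moved from the
    training mean, in units of the training standard deviation."""
    mu = statistics.mean(train_sample)
    sigma = statistics.stdev(train_sample)
    return abs(statistics.mean(live_sample) - mu) / sigma

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(1000)]          # training data
live_ok = [random.gauss(0.0, 1.0) for _ in range(1000)]        # stable production data
live_drifted = [random.gauss(0.8, 1.0) for _ in range(1000)]   # shifted production data

print(drift_score(train, live_ok))       # small: no action needed
print(drift_score(train, live_drifted))  # large: time to retrain
```

A drift score above some threshold (here, anything approaching 0.5) would trigger the retraining loop Vijayarajan describes.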

"That would mean very rigorous versioning of all it takes to actually build that model," he says. "This includes code, it includes parameters, it includes the neural network architecture and the specific version of the datasets that were actually put to use, including the transformations that were applied to them."
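One lightweight way to realize that kind of rigorous versioning is a manifest that pins the code, hyperparameters, architecture, dataset version, and transforms behind a single reproducibility ID. The sketch below is a hypothetical illustration; the `model_manifest` helper and field names are assumptions, not a real HPE API.

```python
import hashlib
import json

def model_manifest(code, hyperparams, architecture, dataset_version, transforms):
    """Pin everything needed to reproduce a training run in one record."""
    record = {
        "code_sha": hashlib.sha256(code.encode()).hexdigest(),
        "hyperparams": hyperparams,
        "architecture": architecture,
        "dataset_version": dataset_version,
        "transforms": transforms,
    }
    # Hashing the whole record yields one ID: if any ingredient changes,
    # the ID changes, so every trained model is traceable to its inputs.
    record["manifest_id"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()[:12]
    return record

m = model_manifest(
    code="def train(): ...",
    hyperparams={"lr": 0.001, "batch_size": 64},
    architecture="resnet18",
    dataset_version="datasets/parts/v42",  # hypothetical dataset tag
    transforms=["resize(224)", "normalize"],
)
print(m["manifest_id"])
```

In a regulated industry, that manifest ID is what lets an auditor walk back from a production prediction to the exact code, data, and settings that produced the model.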

It takes a data engineer

Data engineering is key, Vijayarajan says, especially in deployments comprising hundreds, or even thousands, of models. "The moment you say there is an ensemble of a few hundred models that are trying to solve a problem, what it really requires you to do is create what are called data pipelines for the whole consumption of that model to happen," he explains. "It's essentially an output of another model that is feeding into it or some sort of a pre-processing logic."
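The "output of one model feeding into another" pattern is, at its simplest, function composition over a pipeline. A minimal sketch with stand-in models and a made-up `pipeline` helper (none of this is HPE's implementation):

```python
def preprocess(raw):
    # Pre-processing logic: scale raw sensor values into [0, 1].
    return [x / 255.0 for x in raw]

def detector(features):
    # Stand-in for a first model: keep only strong signals.
    return [f for f in features if f > 0.5]

def classifier(detections):
    # Stand-in for a downstream model consuming the detector's output.
    return ["part" if d > 0.8 else "noise" for d in detections]

def pipeline(raw, stages):
    """Feed each stage's output into the next stage's input."""
    data = raw
    for stage in stages:
        data = stage(data)
    return data

print(pipeline([30, 150, 240], [preprocess, detector, classifier]))
```

With hundreds of models, the pipeline definition itself becomes an artifact to version and orchestrate, which is the data engineering problem Vijayarajan is describing.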

For those data pipelines to "rehydrate" and continuously update across large deployments, automation needs to be built in. Using tools such as HPE's edge-to-core data engineering platform, organizations can orchestrate applications and data across deployments at the edge, in the data center, or in the cloud, he says.

Please read: Operationalizing machine learning: The future of practical AI

"The beauty of the data engineering platform is that you could seamlessly deploy these models into thousands of sites, and then you could apply [heuristics techniques] and essentially do what I call crowdsourcing outliers. And all of this is completely automated," he says. "If you think about it at a deeper level, what this really enables is a certain deployment to actually learn from its peers."

For example, in the case of a robot that needs to pick parts on a manufacturing floor, when a new part is introduced, "you'll see outliers for that new part actually pouring in back to the core. And the next iteration of the training will actually enable that model to account for that new part, and before that part was even introduced to the other manufacturing floors, they already have the ability to identify it," Vijayarajan says.
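That "crowdsourcing outliers" loop can be sketched as edge sites reporting their low-confidence predictions back to the core, where they are pooled for the next training iteration. Everything below (the site names, the confidence threshold, the data shapes) is illustrative, not HPE's implementation:

```python
# Hypothetical threshold: predictions below this confidence are outliers.
CONF_THRESHOLD = 0.6

def collect_outliers(site_predictions):
    """Pool low-confidence predictions from every edge site at the core."""
    pooled = []
    for site, preds in site_predictions.items():
        for label, conf in preds:
            if conf < CONF_THRESHOLD:
                pooled.append({"site": site, "label": label, "conf": conf})
    return pooled

sites = {
    "plant-a": [("gear", 0.95), ("unknown", 0.30)],  # a new part appears here
    "plant-b": [("gear", 0.91), ("bolt", 0.88)],
}
outliers = collect_outliers(sites)
print(outliers)
```

Only plant-a's unfamiliar part comes back to the core; once the retrained model is pushed out, plant-b can recognize the part before ever seeing it, which is the peer-learning effect described above.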

Managing data at the edge

He adds that an important component of this is the data fabric, which orchestrates apps and data across implementations. "You can have an extremely consistent way of how you manage data at the edge, because ultimately, what we're trying to convey to our customers and partners here is that we should honor data gravity. Replicating data from the edge is absolutely not sustainable," Vijayarajan says.

In large, complex machine learning deployments, consistent data management capabilities like data tiering and time-based data decommissioning are critical, he adds.
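Data tiering and time-based decommissioning reduce to an age-based placement policy applied consistently across sites. A minimal sketch with made-up retention windows (the tier names and durations are assumptions, not HPE defaults):

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)   # recent data stays at the edge
RETENTION = timedelta(days=90)   # data older than this is decommissioned

def tier_for(created_at, now):
    """Decide where a record lives based purely on its age."""
    age = now - created_at
    if age > RETENTION:
        return "decommission"
    if age > HOT_WINDOW:
        return "core"
    return "edge"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for days in (1, 30, 120):
    created = now - timedelta(days=days)
    print(days, tier_for(created, now))
```

Because the policy is just data plus a function, the same rules can be pushed to thousands of sites, giving the consistency at scale that Vijayarajan calls for.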

"You really want consistency there, given the scale of the number of sites and the sheer amount of data that you're dealing with," Vijayarajan says. "You should be able to selectively move data back and forth between the edge and the core. And this could mean pushing out models, it could mean crowdsourcing outliers, it could mean crowdsourcing telemetry from an operations perspective—any of that."

Listen to other Tech Talk episodes.

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.