
Solving the AI efficiency conundrum

It's well known that training AI models uses a lot of energy, but so does running them in the field. Both sides can be optimized.

AI has an efficiency problem. And left unchecked, it's only going to get worse.

"Everybody believes that AI is easy," says Sorin-Cristian Cheran, vice president and fellow at Hewlett Packard Labs. "AI is not easy. Everybody believes AI is free if you're running it in the cloud. No, it is not."

Artificial intelligence is a complex topic, and it's expensive no matter where you run it. One of the biggest reasons is the computational cost involved.

The problem is twofold. First, AI is costly to train. At the development stage, the amount of computational hardware involved is staggering. AI expert Elliot Turner tweeted in 2019 that training the XLNet natural language processing model cost $245,000, the price of running 512 TPU v3 chips for two and a half solid days, with no guarantee that the results would be what the developers were hoping for. The comment drew considerable attention to the efficiency, or lack thereof, of AI training. And while Turner's math may have been overstated, the numbers involved are certainly still massive.
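Turner's figure can be reconstructed as a simple product: chips × wall-clock hours × hourly rate. The sketch below assumes a flat $8 per chip-hour. That rate is not stated in the article. It is chosen only because it reproduces the quoted total, and it is a plausible place where the math could have been overstated.

```python
# Back-of-the-envelope reconstruction of the XLNet training-cost estimate.
# ASSUMPTION: a flat rate of $8 per chip-hour; the article quotes only the total.
CHIPS = 512
DAYS = 2.5
RATE_USD_PER_CHIP_HOUR = 8.0  # assumed, not from the source


def training_cost(chips: int, days: float, rate: float) -> float:
    """Total cost = chips x wall-clock hours x hourly rate per chip."""
    hours = days * 24
    return chips * hours * rate


cost = training_cost(CHIPS, DAYS, RATE_USD_PER_CHIP_HOUR)
print(f"chip-hours: {CHIPS * DAYS * 24:,.0f}")  # chip-hours: 30,720
print(f"estimated cost: ${cost:,.0f}")          # estimated cost: $245,760
```

Under that assumed rate, the result lands within about a third of a percent of the $245,000 figure. Any error in the per-chip price therefore scales the whole estimate proportionally.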

Consider a more down-to-earth problem, like training an AI to solve a Rubik's Cube. This may sound straightforward, but the task was estimated to consume a monstrous 2.8 gigawatt-hours of electricity. As Wired reported, that's "roughly equal to the output of three nuclear power plants for an hour," though, again, experts debate those numbers. Nonetheless, figures like these have raised questions about how scalable AI training is, particularly as some estimates suggest that technology operations could consume as much as 51 percent of global electricity production by 2030.
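Wired's comparison is a unit conversion. Spreading 2.8 GWh over one hour means an average draw of 2.8 GW, and dividing that by one plant's output gives the number of plants. The sketch assumes a large nuclear plant produces about 1 GW. That figure is an assumption, not something the article states.

```python
# Convert an energy figure into "power plants running for a given time".
# ASSUMPTION: a typical large nuclear plant outputs about 1 GW (1,000 MW).
ENERGY_GWH = 2.8        # estimated training energy quoted in the article
PLANT_OUTPUT_GW = 1.0   # assumed plant output, not from the source


def plants_for_duration(energy_gwh: float, plant_gw: float, hours: float) -> float:
    """Number of plants needed to supply energy_gwh within `hours`."""
    average_power_gw = energy_gwh / hours
    return average_power_gw / plant_gw


print(round(plants_for_duration(ENERGY_GWH, PLANT_OUTPUT_GW, hours=1.0), 1))  # 2.8
```

Rounded up, 2.8 plants becomes the "three nuclear power plants for an hour" in the quote.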

Training is only half of the problem, though, because once the model is completed, a second efficiency problem comes into focus. Running a trained AI model requires power too. It needs far less than training does, but the amount is still substantial. And if thousands or millions of devices are using the model, the total energy required multiplies accordingly. The energy required to run an AI model varies with the algorithm, but a single PC equipped with a GPU, the most common platform for AI processing, can draw around 400 watts. That's well outside the energy profile of handheld devices, virtual reality headsets, drones, and Internet of Things gadgets, and it can even present a challenge on larger platforms, such as airplanes and automobiles.
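Each device draws power independently, so fleet-wide inference energy grows in direct proportion to the number of devices. The sketch below uses the 400-watt figure and assumes, for simplicity, that every device runs at that draw around the clock. Real workloads are burstier than this.

```python
# Aggregate inference energy scales linearly with the number of devices.
# ASSUMPTIONS: every device draws a constant 400 W (the GPU-PC figure above)
# and runs 24 hours a day; real duty cycles are usually lower.
def fleet_energy_kwh(devices: int, watts_per_device: float, hours: float) -> float:
    """Energy in kWh consumed by `devices` identical machines over `hours`."""
    return devices * watts_per_device * hours / 1000.0


for n in (1, 1_000, 1_000_000):
    print(f"{n:>9,} devices: {fleet_energy_kwh(n, 400.0, 24.0):>12,.1f} kWh/day")
```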


While these dual challenges are intense, the good news is that both are being addressed, with the overall goal of improving the energy and computational efficiency of AI at every step of the way.

Reducing the cost of algorithm training

Algorithm training is inherently complex and resource intensive because of the way neural networks are designed: as a problem gets larger, the network needed to tackle it grows more complex. Consider a challenge like solving the game of checkers. This simple game has about 5 × 10²⁰ possible positions, and it took AI researchers 18 years of near-continuous processing work to "solve" it, determining perfect play that would ensure the AI could never lose.

That work was completed in 2007, and while the overall efficiency of AI platforms has improved since then, the scale of the problems being attacked has grown at a much faster pace. Chess, for example, is so complex that it is difficult to quantify properly. Estimates span from about 10⁵⁰ to 10¹²⁰, depending on whether you count board positions or entire games and on your assumptions about how reasonable those positions might be. That's a challenge that simply can't be solved with brute-force processing, which has led to the development of a number of resource-conserving training strategies.
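The low end of the chess estimate, 10⁵⁰, is enough to show why brute force is off the table. The sketch below assumes a hypothetical machine that checks one billion (10⁹) positions per second and ignores leap years. Both choices are illustrative and not taken from the article.

```python
# Rough time to enumerate chess positions one by one.
# ASSUMPTION: a hypothetical machine checking 10**9 positions per second,
# applied to the low end (10**50) of the estimate range; leap years ignored.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def years_to_enumerate(positions: int, positions_per_second: int) -> int:
    """Whole years needed to visit every position once at a fixed rate."""
    return positions // (positions_per_second * SECONDS_PER_YEAR)


years = years_to_enumerate(10**50, 10**9)
print(f"{years:.2e} years")  # 3.17e+33 years
```

That is more than 10²³ times the age of the universe (about 13.8 billion years), so a faster chip alone will not close the gap.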

"At each step of the lifecycle of the AI model there are ways to optimize the model," says Hana Malha, a machine learning engineer at HPE. Malha explains that one major way to optimize an AI model concerns the parameters established during its initial training. Compression techniques let you squeeze out the parts of the model that contribute little, pruning away parameters that are of no real use to the algorithm. "There are some models today where we can actually get rid of 90 percent of the parameters that represent a model and still get the same accuracy as if we were using all of them," says Malha.
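One widely used pruning criterion is weight magnitude: parameters closest to zero are assumed to matter least and are set to zero. The sketch below is a minimal illustration on a flat list of weights, not necessarily the method Malha's team uses. Production pruners usually work per layer and fine-tune the model afterward to recover accuracy.

```python
# Minimal sketch of magnitude pruning: zero out the smallest-magnitude weights.
# ASSUMPTION: "unimportant" is approximated by |weight|, the simplest common
# criterion; real pipelines typically retrain after pruning.
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Return a copy of `weights` with the smallest `sparsity` fraction set to 0."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


w = [0.02, -1.3, 0.004, 0.9, -0.05, 0.01, 2.1, -0.003, 0.07, 0.6]
print(magnitude_prune(w, 0.9))  # only the largest-magnitude weight survives
```

Zeroed weights only save energy if the runtime can skip them, for example through sparse storage or structured pruning that removes whole channels.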

The concept has already proved successful in real-world training experiments. Traditionally, a dataset of 60,000 handwritten digits, called MNIST, has been used to train algorithms to recognize handwritten numbers. More recently, using a technique known as dataset distillation, researchers compressed that dataset down to a mere 10 synthetic images, one for each digit, optimized to carry as much information as possible. When recognizing handwritten numerals, an AI trained on the 10 images performs almost as well as one trained on the full MNIST dataset, all because the training data has been optimized.
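Dataset distillation learns its synthetic images by optimizing them through the training process itself, which is too involved for a short sketch. The example below swaps in a much simpler stand-in. It condenses each class to a single averaged prototype and classifies new points by the nearest prototype, using invented 2-D points in place of MNIST images. It illustrates the intuition that one well-chosen example per class can carry a lot of information. It is not the researchers' method.

```python
# Toy stand-in for dataset condensation: one averaged prototype per class.
# ASSUMPTION: 2-D toy points replace 28x28 MNIST images; this is NOT dataset
# distillation, which optimizes synthetic examples through training.
import math
from collections import defaultdict


def class_prototypes(points, labels):
    """Average all training points of each class into a single prototype."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in zip(points, labels):
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict(prototypes, point):
    """Assign the label of the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda lab: math.dist(point, prototypes[lab]))


train_x = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (3.0, 3.1), (2.9, 3.2), (3.2, 2.8)]
train_y = ["zero", "zero", "zero", "one", "one", "one"]
protos = class_prototypes(train_x, train_y)  # six points condensed to two
print(predict(protos, (0.3, 0.2)), predict(protos, (2.5, 2.7)))  # zero one
```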

But that's not all: more recently, researchers have reduced the training set to just five images, using "soft labels" that describe, say, how an eight is similar to and different from a three. While the limits of this so-called less-than-one-shot learning remain an open question, it's clear that many AI models need not rely on the massive datasets they're usually trained with.
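The core soft-label idea can be sketched with a distance-weighted classifier. Each prototype carries a probability mix over classes instead of a single hard label, and a query point sums those mixes, weighting closer prototypes more heavily. The 1-D positions and label mixes below are hand-picked for illustration, not learned from MNIST. The point is that two prototypes can carve out three classes, which is what "less than one shot" refers to.

```python
# Distance-weighted classifier over soft-label prototypes.
# ASSUMPTION: toy 1-D data; prototype positions and label mixtures are
# hand-picked for illustration, not learned from any real dataset.
def soft_label_predict(prototypes, x):
    """Return the class with the highest distance-weighted soft-label score."""
    n_classes = len(prototypes[0][1])
    scores = [0.0] * n_classes
    for position, soft_label in prototypes:
        distance = abs(x - position)
        if distance == 0.0:
            # Exactly on a prototype: fall back to its most likely class.
            return max(range(n_classes), key=lambda c: soft_label[c])
        weight = 1.0 / distance  # closer prototypes count more
        for c in range(n_classes):
            scores[c] += weight * soft_label[c]
    return max(range(n_classes), key=lambda c: scores[c])


# Two prototypes, three classes: class 2 "lives" between them.
PROTOTYPES = [
    (0.0, [0.6, 0.0, 0.4]),
    (1.0, [0.0, 0.6, 0.4]),
]
print([soft_label_predict(PROTOTYPES, x) for x in (0.1, 0.5, 0.9)])  # [0, 2, 1]
```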


Another efficiency-boosting concept that is proving successful is transfer learning, a method in which the insights gained by training one AI model are transferred to another. Rather than starting from scratch, says Rakshit Agrawal, director of research and development at Camio, which markets an AI-powered real-time video search system, developers use a successful, completed model as a starting point and then refine it with new data or parameters. "This makes training much easier," says Agrawal. "It allows us to run fewer iterations at the data center at the time of training, while still maintaining high accuracy in the models."
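The "fewer iterations" effect shows up even in a toy model. The sketch below uses the simplest form of transfer learning, warm-starting. It first trains a one-variable linear model on a source task, then trains on a closely related target task twice: once from zero and once from the pretrained weights. The tasks, learning rate, and stopping threshold are all invented for illustration. Real transfer learning reuses learned features from large networks, often freezing some layers.

```python
# Toy illustration of transfer learning as warm-starting: reuse weights learned
# on a related task instead of initializing from zero.
# ASSUMPTIONS: a 1-D linear model, two invented tasks, full-batch gradient
# descent with a fixed learning rate, and a fixed loss threshold.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # shared toy inputs


def source_task(x):
    """Previously solved task: y = 2x."""
    return 2.0 * x


def target_task(x):
    """New, closely related task: y = 2x + 0.5."""
    return 2.0 * x + 0.5


def mse(w, b, target):
    """Mean squared error of the model y = w*x + b on the toy inputs."""
    return sum((w * x + b - target(x)) ** 2 for x in XS) / len(XS)


def gd_step(w, b, target, lr):
    """One full-batch gradient-descent step on the mean squared error."""
    n = len(XS)
    grad_w = sum(2 * (w * x + b - target(x)) * x for x in XS) / n
    grad_b = sum(2 * (w * x + b - target(x)) for x in XS) / n
    return w - lr * grad_w, b - lr * grad_b


def train(w, b, target, lr=0.1, tol=1e-3, max_steps=10_000):
    """Run gradient descent until the loss drops below tol; return (steps, w, b)."""
    steps = 0
    while mse(w, b, target) >= tol and steps < max_steps:
        w, b = gd_step(w, b, target, lr)
        steps += 1
    return steps, w, b


# "Pretrain" on the source task, then train the target task two ways.
_, w_pre, b_pre = train(0.0, 0.0, source_task, tol=1e-12)
cold_steps, _, _ = train(0.0, 0.0, target_task)      # from scratch
warm_steps, _, _ = train(w_pre, b_pre, target_task)  # from pretrained weights
print(f"from scratch: {cold_steps} steps, warm start: {warm_steps} steps")
# from scratch: 37 steps, warm start: 13 steps
```

The warm start only has to correct the small offset between the tasks, so it needs roughly a third of the steps in this toy setup.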

Automated machine learning, or AutoML, is yet another set of technologies that stands to reduce the overhead of AI operations. Part of AutoML's job is to identify the models that work best for a given dataset. It does this by considering how previous, similar datasets performed with a given model and then adapting that model to the new dataset. Combined with the tactics discussed above, AutoML can greatly improve the overall efficiency of the AI modeling process.
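The "look at similar past datasets" step can be sketched as a nearest-neighbor lookup over dataset meta-features, one simple form of meta-learning. Every element below is hypothetical: the meta-features (log row count, log feature count, minority-class fraction), the history of past runs, and the model names. A real AutoML system would use the recommendation as a starting point for further search rather than as a final answer.

```python
# Sketch of meta-learning-based model selection, one ingredient of AutoML:
# recommend the model that performed best on the most similar past dataset.
# ASSUMPTIONS: meta-features, past results, and model names are hypothetical.
import math


def meta_features(n_rows, n_features, minority_fraction):
    """Describe a dataset by a few cheap statistics (log scale for sizes)."""
    return (math.log10(n_rows), math.log10(n_features), minority_fraction)


# Hypothetical history: dataset -> (meta-features, best model found earlier).
PAST_RUNS = {
    "sensor_logs": (meta_features(100_000, 12, 0.50), "gradient_boosting"),
    "support_tickets": (meta_features(15_000, 300, 0.30), "linear_svm"),
    "product_photos": (meta_features(1_000_000, 2048, 0.10), "cnn"),
}


def recommend_model(new_features, history=PAST_RUNS):
    """Pick the best model of the most similar past dataset (nearest neighbor)."""
    nearest = min(history, key=lambda name: math.dist(new_features, history[name][0]))
    return nearest, history[nearest][1]


print(recommend_model(meta_features(60_000, 10, 0.45)))
# ('sensor_logs', 'gradient_boosting')
```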

Through a combination of these tactics, training efficiency can be greatly improved. Says Agrawal, "We've successfully mitigated a lot of these expenses and are managing costs."

Lowering AI's energy requirements at runtime

The second half of AI's efficiency issue arises when it's time to run the trained model, a process commonly known as inference. Here, the challenge is ensuring that the power the AI algorithm consumes while inferring results stays within the capabilities of the device running it. The goal is to limit the energy footprint of any device at runtime, but this is especially important for battery-powered devices, which must keep their drain rate as low as possible.

The vast number of devices using AI models makes inference a much larger issue than training when considered in the aggregate. In fact, it's estimated that 80 to 90 percent of the overall power consumption of AI actually occurs during the inference stage.

Solutions are arriving in the form of both hardware and software. On the hardware side, manufacturers have been developing chips specifically for AI that offer higher performance and lower power consumption. For example, Nvidia's Ampere GPU architecture, released in 2020, was designed to increase performance by 50 percent while drawing half the power of its predecessor. Agrawal says his organization has also used general-purpose, low-power ARM devices like the Raspberry Pi to run some of its models when power draw is of special concern.
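The two Ampere design targets compound. Dividing relative performance by relative power gives the improvement in work done per unit of energy. The sketch takes the article's ratios at face value. Actual gains vary by workload.

```python
# Performance per watt relative to a predecessor chip.
# ASSUMPTION: the ratios are the design targets quoted above (1.5x performance
# at 0.5x power), taken at face value.
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """How many times more work is done per joule than the baseline chip."""
    return perf_ratio / power_ratio


print(perf_per_watt_gain(1.5, 0.5))  # 3.0
```

Put another way, a fixed batch of inferences would need about a third of the energy it took on the previous generation.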


The remainder of the work then falls to software. Got a system like a smart speaker, with limited computing capabilities? Large-scale consumer AI systems, think Alexa, do an end run around the device altogether, pushing the bulk of processing to the cloud instead of doing the work locally. That doesn't really save any energy, however, and more recently, developers have been working to perform inference directly on low-power devices, in part because it is faster and less prone to the vagaries of wireless networks.

To make this possible, AI models have to be pruned, trimmed, compressed, and optimized for these types of devices. Agrawal also says that open formats such as ONNX (Open Neural Network Exchange) can help improve the energy efficiency of an AI model.
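Quantization is one common compression step, though the article does not name it. It stores weights as 8-bit integers plus a scale factor instead of 32-bit floats, cutting weight storage roughly fourfold. The sketch shows the simplest variant, symmetric per-tensor quantization, on a handful of made-up weights. Production toolchains, including ONNX-based ones, offer more elaborate schemes such as per-channel scales and activation calibration.

```python
# Minimal sketch of symmetric per-tensor int8 quantization of model weights.
# ASSUMPTION: a single scale for the whole tensor; real toolchains often use
# per-channel scales and also calibrate activations.
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one float scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.88, -0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)  # [42, -127, 0, 88, -50], one byte each instead of four
print(f"max error: {max_error:.4f}, bound: {scale / 2:.4f}")
```

Rounding to the nearest code bounds each weight's error by half the scale. That trade of a little precision for smaller, cheaper arithmetic is what makes on-device inference feasible.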

The most obvious use case for model compression is autonomous driving, says Malha, where dozens of AI models run simultaneously and physical space and energy are at a premium. "You have to deal with multiple cameras and sensors, preprocess it as fast as possible, apply various models, and make decisions in real time," she says, and all of it has to fit into a finite power envelope. "And you can't afford to make a mistake."

Auto manufacturers have used both hardware and software tweaks to address this issue. Tesla began designing its own AI silicon a few years ago, for example. And most manufacturers are also likely taking software shortcuts to improve both performance and power usage. It's impossible to know exactly how any given model is pruned to fit within a certain power profile, but "the model used in the car is probably a compressed version of the model that was trained in the lab," says Malha.
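Fitting many models into a fixed power envelope can be framed as a selection problem: for each model, pick either the full or a compressed variant so that the total draw stays under budget while as much accuracy as possible is kept. The sketch below solves a tiny, entirely hypothetical instance by brute force. The model names, wattages, accuracies, and budget are invented, and summing accuracies is a deliberately crude objective. It does not describe how any manufacturer actually allocates power.

```python
# Sketch: pick a full or compressed variant for each model so the total power
# fits a budget while maximizing summed accuracy. Brute force suits a few models.
# ASSUMPTIONS: all names, wattages, accuracies, and the budget are invented.
from itertools import product

MODELS = {
    # name: [(variant, watts, accuracy), ...]
    "lane_detection": [("full", 30.0, 0.97), ("compressed", 12.0, 0.95)],
    "pedestrian_detection": [("full", 45.0, 0.98), ("compressed", 20.0, 0.96)],
    "sign_recognition": [("full", 25.0, 0.96), ("compressed", 8.0, 0.94)],
}


def best_configuration(models, budget_watts):
    """Try every variant combination; return (accuracy, watts, choice) or None."""
    names = list(models)
    best = None
    for choice in product(*(models[n] for n in names)):
        watts = sum(variant[1] for variant in choice)
        if watts > budget_watts:
            continue  # over the power envelope
        accuracy = sum(variant[2] for variant in choice)
        if best is None or accuracy > best[0]:
            best = (accuracy, watts, {n: v[0] for n, v in zip(names, choice)})
    return best


print(best_configuration(MODELS, budget_watts=80.0))
```

With the invented numbers, the full models together need 100 W. Under an 80 W budget, the search keeps the lane and sign models at full size and compresses only pedestrian detection. With a few dozen models, exhaustive search becomes impractical, and a knapsack-style dynamic program or heuristic would replace it.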

Ultimately, however, most research into AI efficiency remains focused on training, because those expenses are easier to isolate and because any benefits realized there tend to trickle down to the edge over time. So, for the time being, expect continued investment in training efficiency. As Cheran notes, "The better you want your model to run at the edge, the more you need to invest in the data center."

Lessons for leaders

  • Assessments of the cost and cost-effectiveness of AI and machine learning need to include the power consumed both in training the model and in running it.
  • AI model training is the obvious target for power savings, but a trained model is typically run many times on many devices, so in aggregate it consumes a lot of power at that stage as well.
  • Many approaches are being taken to compress AI models and otherwise simplify their execution so as to minimize the power consumed by them.

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.