AI Supercomputing

What is AI Supercomputing?

AI supercomputing is the use of ultrafast systems, built from hundreds of thousands of powerful processors, to manage and interpret vast quantities of data with artificial intelligence (AI) models.

How do AI supercomputers work?

AI supercomputers are typically made up of finely tuned hardware consisting of hundreds of thousands of processors, a specialized network, and a huge amount of storage.

The supercomputers divide workloads across the processors, so that each processor handles a small piece of the work. As they run their individual pieces, the processors communicate with each other, often very frequently. Each processor sends messages through a communication fabric, so that information is exchanged in many dimensions: up, down, left, right, back and forth, depending on the problem. This multidimensional exchange keeps the processors' work coordinated, enabling greater overall processing speed.
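The divide-and-exchange pattern above can be sketched in a few lines of Python. This is purely illustrative (real supercomputer codes run one process per physical processor); here a 1-D grid is split into chunks, and each simulated "processor" receives its neighbors' edge values, the same halo-exchange pattern large parallel codes use:

```python
# Illustrative sketch, not real supercomputer code: split a 1-D grid
# across simulated "processors" and exchange boundary (halo) values.

def split(grid, nprocs):
    """Divide the grid into one contiguous chunk per processor."""
    n = len(grid) // nprocs
    return [grid[i * n:(i + 1) * n] for i in range(nprocs)]

def halo_exchange(chunks):
    """Each chunk receives its neighbors' edge values, as if by message."""
    halos = []
    for rank, chunk in enumerate(chunks):
        left = chunks[rank - 1][-1] if rank > 0 else 0.0
        right = chunks[rank + 1][0] if rank < len(chunks) - 1 else 0.0
        halos.append((left, right))
    return halos

grid = [float(i) for i in range(16)]
chunks = split(grid, 4)        # 4 simulated processors, 4 cells each
halos = halo_exchange(chunks)
print(chunks[1], halos[1])     # chunk 1 sees neighbor edges 3.0 and 8.0
```

On a real machine each chunk lives on a different node, so the halo values must be sent as explicit messages over the network rather than read from a shared list.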

Surprisingly, AI supercomputers run fairly ordinary operating systems, typically Linux, to manage applications, networking, and scheduling. The analytics programs themselves are usually written in C or Fortran, passing messages through a communications library called MPI (Message Passing Interface), which lets a single program run across many machines.

With smaller circuits packed densely on the circuit boards, an AI supercomputer can run faster, but it also runs hotter: moving power into and out of a chip is not yet efficient enough, so the chips get very hot. And with hundreds of thousands of these multi-core nodes packed together, supercomputers have huge cooling needs. To mitigate that, the circuits use copper wiring, which conducts electricity with comparatively low resistance, and the system dissipates heat with forced air and liquid refrigerant circulated throughout.

How can AI supercomputing manage analytics workloads?

There are several reasons AI supercomputers can manage complex analytics workloads.

Nodes

AI supercomputers contain many CPUs, grouped into nodes, to achieve extremely rapid computation. Each CPU typically has on the order of 10 to 12 cores to perform tasks. And because a supercomputer often clusters thousands of nodes within its architecture, every thousand nodes contributes roughly 12,000 cores. So even a supercomputer with a mere thousand nodes performs work at a rate measured in trillions of cycles per second.
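The back-of-the-envelope arithmetic above can be checked directly. The 12-core count comes from the text; the 2.5 GHz clock rate is an illustrative assumption, not the spec of any particular machine:

```python
# Rough aggregate-cycle arithmetic for a thousand-node cluster;
# the clock rate is an illustrative assumption, not a real spec.
nodes = 1_000
cores_per_node = 12
clock_hz = 2_500_000_000           # 2.5 GHz per core (assumed)

total_cores = nodes * cores_per_node
aggregate_cycles = total_cores * clock_hz
print(total_cores)       # 12000 cores
print(aggregate_cycles)  # 30,000,000,000,000 cycles/s, i.e. trillions
```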

Circuits

They also use very fine wire connections, so the circuit board can be packed with far more circuitry than a traditional computer's. Together, these advancements allow complex arithmetic and logical operations to be interpreted and executed at great speed.

Processing

In addition, supercomputers use parallel processing so that many workloads run simultaneously. Because thousands of tasks are processed at the same time, jobs that would take conventional machines far longer finish in a fraction of the time. AI supercomputers allow industries to train bigger, better, and more accurate models. And with more precision, teams can analyze information faster, bring key learnings into their processes, tap more data sources, and test more scenarios, all so that industry advancements can accelerate.
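The limits of that parallel speedup can be sketched with Amdahl's law, a standard formula relating the parallelizable fraction of a workload to the best achievable speedup. The 95% figure and processor count below are assumed numbers for illustration:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# parallel fraction of the work and n is the number of processors.

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# With 95% of the work parallelizable (assumed), even 10,000
# processors yield only about a 20x speedup overall.
print(round(amdahl_speedup(0.95, 10_000), 1))  # 20.0
```

This is why supercomputer software is tuned so carefully: the small serial fraction of a workload, not the raw processor count, often sets the ceiling on performance.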

HPE and AI supercomputing

HPE offers HPC/AI solutions to help you manage a wide range of workload and scale requirements. These solutions are part of a comprehensive, modular software portfolio that is optimized for HPC/AI applications and performance at scale, with the density and reliability you need to support high-powered CPUs and GPUs.

In addition, HPE offers HPC hardware designed for large-scale use: systems that are fully integrated for deployments of any scale and built with advanced cooling options for dense platforms.

HPE Cray Supercomputers are an entirely new approach to supercomputing, with revolutionary capabilities. They are architected with a choice of infrastructure to provide an optimal solution for tens to hundreds of thousands of nodes. A dense eight-way GPU server provides consistent, predictable, and reliable performance, ensuring high productivity on large-scale workflows. Slingshot interconnect and Cray Software enable cloudlike user experiences, along with HPE Performance Cluster Manager for comprehensive system management.

HPE also offers an industry-leading enterprise platform for accelerated computing. The HPE Apollo 6500 Gen10 Plus System provides superior performance per dollar for GPU-intensive workloads, with unprecedented performance from NVIDIA and AMD accelerators. It flexibly supports a wide range of CPU-to-accelerator ratios, workloads, and accelerators for the deep learning and the complex simulation and modeling typical of HPC, and it can be delivered as a service or as hybrid HPC for flexible deployment (on-premises, off-premises, or hybrid).

And whether you run HPC on-premises or on hybrid cloud, HPE Pointnext Services, offered through HPE GreenLake, can help you keep your HPC IT at top performance. With resident engineers to guide you and customized support for HPE software and hardware, you can speed the design and deployment of your AI strategy and maximize your HPC investment.