Cloud-Based Supercomputing
What is Cloud-Based Supercomputing?
Cloud-based supercomputing is high-performance computing (HPC) run in the cloud, which lets multiple users share a supercomputer while keeping each individual workload private and secure.
How do cloud-based supercomputers work?
Cloud-native supercomputers have an architecture that allows for more efficient performance than traditional supercomputers. They manage both computing and communications in parallel to allow intense workloads to process more smoothly. That’s because they use three kinds of processors—CPUs, DPUs, and accelerators that are typically GPUs. Let’s examine what each of the three does.
- CPU: CPUs are built for the parts of algorithms that require fast serial processing. However, as compute tasks are much more complex in supercomputing, CPUs often become burdened with growing layers of communications tasks needed to manage increasingly large and complex systems. In fact, on traditional supercomputers, a computing job sometimes has to wait while the CPU handles a communications task.
- DPU: A DPU, or data processing unit, is a data-center-on-chip platform that delivers infrastructure services, managing provisioning, virtualization, and hardware. It gives each supercomputing node two new capabilities: bare-metal multi-tenancy and bare-metal performance. For the first, an infrastructure control plane processor secures user access, storage access, networking, and lifecycle orchestration for the computing node. For the second, an isolated line-rate data path allows hardware acceleration. Together, these let the CPU offload routine infrastructure tasks to the DPU and focus on processing tasks, maximizing overall system performance.
- GPU: GPUs (graphics processing units) in cloud-native supercomputing function as general-purpose co-processor engines. They speed up applications running on a CPU by executing many operations in parallel.
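To make the division of labor concrete, here is a minimal Python sketch of the idea that a computing job no longer has to wait while communications are handled. It is only an analogy under stated assumptions: a background thread stands in for the DPU, `time.sleep` stands in for a communications task, and the function names (`communicate`, `compute`, `run_serial`, `run_offloaded`) and the delay value are hypothetical, not part of any HPE or vendor API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

COMM_DELAY_S = 0.02  # hypothetical cost of one communications task


def communicate(step: int) -> str:
    """Stand-in for a communications task; sleeping models waiting on the network."""
    time.sleep(COMM_DELAY_S)
    return f"msg-{step}"


def compute(step: int) -> int:
    """Stand-in for the numerical work of one step."""
    return sum(i * i for i in range(20_000)) + step


def run_serial(steps: int) -> list:
    """Traditional node: the CPU finishes communication before it computes."""
    results = []
    for step in range(steps):
        msg = communicate(step)  # the computing job waits here
        results.append((msg, compute(step)))
    return results


def run_offloaded(steps: int) -> list:
    """Offloaded node: a separate worker (the DPU analogy) handles communication."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dpu:
        for step in range(steps):
            pending = dpu.submit(communicate, step)  # starts in the background
            value = compute(step)                    # CPU computes in the meantime
            results.append((pending.result(), value))
    return results


def timed(fn, steps: int):
    """Run fn(steps) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = fn(steps)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    serial, t_serial = timed(run_serial, 10)
    offloaded, t_offloaded = timed(run_offloaded, 10)
    print("same results:", serial == offloaded)
    print(f"serial: {t_serial:.3f}s, offloaded: {t_offloaded:.3f}s")
```

Both versions produce the same results. The offloaded version will typically finish sooner because the simulated communication overlaps with the computation, though the exact timings depend on the machine.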
Supercomputing vs. cloud computing
The difference between supercomputing and cloud computing is largely a matter of scale. While enterprise cloud computing created new ways for businesses to engage customers and disrupted how organizations managed data, DevOps, and overall IT operations, supercomputing accelerates research and development (R&D) and product development by orders of magnitude. Quite simply, by processing at trillions of operations per second, supercomputing changes the pace and dynamics of innovation.
Cloud computing brought the entire suite of computing services (servers, storage, databases, networking, software, analytics, and intelligence) to the internet. Rather than running in an on-premises environment, computing services of any type are hosted in the cloud so that multiple users can access them at any time, at the same time, without the risk of overloading capacity. This created a whole new world of scalability and efficiency that continues to enable organizations to modernize their operations.
Connecting the massive processing power of supercomputers to the cloud’s scale and inherently connected nature opens up an entirely new field of possibilities for science and engineering. Cloud-native supercomputing enables rapid simulations, from software to medicines to prototypes, accelerating the pace at which companies can commercialize new product innovations and scientists can advance breakthroughs in health, space, energy, and more.
HPE and cloud-based supercomputing
The HPE Cray supercomputer software platform adds the productivity of cloud and data center interoperability to the power of supercomputing. Built on decades of supercomputing expertise, it is offered in a cloud services model to deliver a new standard in manageability, reliability, availability, and resiliency.
The software stack is designed to support microservices-based composability for rapid innovation of new converged workflows across processor architectures. With a curated set of flexible and powerful tools to create new converged modeling, simulation, analytics, and AI workflows, it delivers the highest level of productivity for today’s challenges in the engineering, scientific, security, and artificial intelligence communities.
The platform can be split into administrator services and end-user services with full separation of the management plane from the compute plane. This allows each to be run and upgraded seamlessly without impacting the other, which leaves more time for computation. And it can be scaled from development in the cloud to production on a supercomputer.
In addition, we offer two powerful solutions, built on an entirely new design created from the ground up, to deliver HPC and AI application performance at scale. Both HPE Cray supercomputers scale flexibly from tens to hundreds to thousands of nodes and deliver consistent, predictable, and reliable performance, facilitating high productivity on large-scale workflows.