Cray pursues advanced research with DOE for exascale systems

May 7, 2019

Cray to partner with Oak Ridge National Laboratory to develop new technologies for the Exascale Era

In this article

  • Shasta supercomputers are set to be the technology underpinning the Exascale Era—one characterized by a deluge of data and a convergence of modeling, simulation, analytics, and AI workloads
  • Frontier is the third major contract award for the Shasta architecture and Slingshot interconnect

Cray Inc. and the U.S. Department of Energy (DOE) today announced they are pushing forward the technology boundaries of computational science with the development of the “Frontier” exascale supercomputer for Oak Ridge National Laboratory (ORNL), expected to be delivered in 2021. The system will be based on Cray’s new Shasta architecture and Cray Slingshot interconnect and will feature future-generation AMD EPYC CPU and Radeon Instinct GPU technology. This is the third major contract award for the Shasta architecture and Slingshot interconnect; previous awards were for the National Energy Research Scientific Computing Center’s NERSC-9 pre-exascale system and the Argonne National Laboratory’s Aurora exascale system.

“The Frontier system architecture embodies the compute- and data-intensive capabilities required to unlock the full potential of the Exascale Era,” said Jeff Nichols, Associate Lab Director at ORNL. “The power and flexibility of the system will enable the creation of new converged HPC, analytics, and AI applications across the full breadth of the exascale computing program’s mission.”

Shasta supercomputers are set to be the technology underpinning the Exascale Era, which is characterized by a deluge of new data and a convergence of modeling, simulation, analytics, and AI workloads. To enable this fusion of workloads to run simultaneously across the system, Slingshot was designed to incorporate intelligent features like adaptive routing, quality-of-service, and congestion management. Frontier will utilize Cray’s new Shasta system software for monitoring, orchestration, and application development to provide a single developer interface across the system. The new software stack is a fully containerized architecture that combines the scalability and performance of HPC while enabling the productivity and portability of cloud.

“Exascale systems demand a complex balance of compute, interconnect, and software capabilities to enable HPC and AI applications to execute simultaneously and with optimal performance,” said Steve Scott, CTO at Cray. “This poses a number of architectural challenges across the entire HPC space ranging from the development of new high-density compute infrastructure, to modernizing developer software for the creation of extreme scale, data-intensive applications. Delivering these technologies for Frontier is incredibly exciting, as they will also become standard product offerings from Cray, enabling us to deliver enhanced performance and productivity to businesses large and small.”

Innovation for Frontier

In addition to the capabilities native to the Shasta system, Cray has also been awarded a separate joint development contract to pursue new foundational technologies for the Frontier system. This includes the development of new high-density compute infrastructure, enhancements to HPC developer tools for GPU scaling and AI, and the creation of a Center of Excellence to establish best practices for exascale application development and tuning.

Compute Blade and Cabinet Infrastructure

Current approaches to delivering dense GPU compute form factors have hit limitations in packaging density due to the amount of power that can be delivered to a blade, resulting in more data center floorspace being required to deliver comparable performance. To reach sustained exaflop performance, the Frontier system will transcend those limitations with powerful and dense compute and cabinet infrastructure capabilities. For Frontier, Cray is designing a new AMD EPYC CPU and Radeon Instinct GPU powered blade for the Shasta high-density cabinet. Cray will also engineer new high-efficiency power delivery and integrated direct liquid cooling capabilities for key server components to ensure high operational energy efficiency and low total cost of ownership.

“The Frontier design is a marvel of engineering and AMD is proud to be bringing its technical innovation to the project in conjunction with Cray, Oak Ridge National Lab and the Department of Energy,” said Mark Papermaster, executive vice president and chief technology officer, AMD. “AMD has a long history of pushing the boundaries of compute performance and working with DOE on advanced exascale research. I’m very excited to see a combination of custom AMD EPYC CPUs, purpose-built Radeon Instinct GPUs, and our open software development tool set selected to power this amazing machine.”

Software Innovation and Collaboration

To enable developer productivity, users will require a high-level software development environment with tightly coupled compilers, tools, and libraries which abstract away system complexity. The Cray Programming Environment (Cray PE) has delivered these core capabilities for Cray users for decades and, as part of this program, will see a number of enhancements for increased functionality and scale.

This will start with Cray working with AMD to enhance these tools for optimized GPU scaling with extensions for Radeon Open Compute Platform (ROCm). These software enhancements will leverage low-level integrations of AMD ROCmRDMA technology with Cray Slingshot, enabling the Slingshot NIC to read and write data directly to and from GPU memory for higher application performance. Finally, to provide a seamless developer workflow, Cray PE will be integrated with a full machine learning software stack with support for the most popular tools and frameworks. Taken together, the rich HPC development capabilities of Cray PE, in combination with an optimized and scalable data science suite, will enable developers to fully embrace the converged use of analytics, AI, and HPC at extreme scale for the first time.

Application Development and Tuning

To further accelerate user adoption of the system, a Center of Excellence will be established by Cray and Oak Ridge National Laboratory to drive collaboration and innovation, and to assist in the porting and tuning of key DOE applications and libraries for the Frontier system. This will include collaborative modernization of new and legacy code to support directive-based programming models such as OpenMP, as well as training and workshops that give users hands-on experience in fully leveraging the system. This collaboration will ensure that best practices are defined and disseminated quickly to further accelerate development of exascale-class applications.

“This is another major win for Cray and means that in 2021 America’s top two supercomputers and most powerful entries in the global exascale race will use the Cray ‘Shasta’ architecture,” said Steve Conway, Hyperion Research senior vice president of research. “This architecture is designed to support the extreme heterogeneity needed for future HPC and AI workloads.”

Cray Shasta systems fuse the performance and scale of supercomputing with the productivity of cloud computing and full data center interoperability. By providing a flexible compute infrastructure, a modular and containerized software architecture, and an intelligent and Ethernet-capable system interconnect, Shasta supercomputers seamlessly bridge the worlds of extreme-scale advanced research and enterprise data centers for the first time.

The contract award includes technology development funding, a center of excellence, several early-delivery systems, the main Frontier system, and multi-year system support. The Frontier system is expected to be delivered in 2021, and acceptance is anticipated in 2022.

For more information about the work being done by Cray with the DOE’s ORNL, visit the ORNL website.

About Hewlett Packard Enterprise

Hewlett Packard Enterprise is the global edge-to-cloud platform-as-a-service company that helps organizations accelerate outcomes by unlocking value from all of their data, everywhere. Built on decades of reimagining the future and innovating to advance the way we live and work, HPE delivers unique, open and intelligent technology solutions, with a consistent experience across all clouds and edges, to help customers develop new business models, engage in new ways, and increase operational performance. For more information, visit: www.hpe.com.

This press release was originally published on cray.com and has been updated and published here in HPE’s Newsroom.
