Identifying developers' HPC priorities: Parallelism, accelerators, and more
Every computing era has technology challenges that require heavy lifting, with tasks just slightly beyond current capabilities. These challenges help software developers and IT justify investments in the most powerful computers of the time.
As computing power increases, so does the definition of high-performance computing (HPC), as well as the problems that computer scientists throw at it. The 8088-based IBM PC released in 1981, a dozen years after Apollo 11's trip to the Moon, shipped with eight times the memory of the Apollo Guidance Computer. Today’s smartphones leave that computational power in the (moon) dust.
Our idea of what’s possible is always just slightly outside our grasp. What counted as big data in the 1980s was far smaller than the datasets we take for granted today.
As a result, the notion of HPC changes more quickly than do the hardware innovations on which its capabilities rely. As processors and storage become more powerful, so does the scope of the problems that computer scientists can envision solving. In the 1970s, there were flame wars over the best sorting algorithms, because sorting stressed the mainframes of the day. These days, HPC is often used in machine learning, data analysis, and software modeling; in five years, it may be something else.
Here is where the technology is going, according to computer scientists who work in the nation’s top supercomputing centers. They offer useful advice for the HPC community on what to prioritize in 2019. The answers may reassure you that your enterprise is on the right track—or bolster your argument that you need additional funding (or a new HPC job) because your project is hopelessly behind.
Do the numbers
Let’s start with some overall statistics. Last August, Evans Data Corp. surveyed developers about their HPC efforts, including their goals for HPC application optimization. Nearly 1,000 developers met its criteria (they target multiple processors or processor cores, or plan to do so within the next 12 months). Their primary concerns were fast response times for small datasets (22 percent), high streaming throughput of large datasets (19 percent), and real-time analysis (17 percent), which would seem to underlie the other two.
Some HPC needs reflect the focus of individual businesses, industries, or departments, such as computer-aided design (18 percent), intensive graphics (12 percent), and mathematical/scientific simulation or modeling (11 percent).
The top two end goals for HPC developers optimizing their applications are efficient use of parallel processing and sufficient use of instruction or memory caching.
Nick Wright, chief architect of Perlmutter, the National Energy Research Scientific Computing Center’s next-generation supercomputer, offers blunt advice that underscores these aims: “The time for deploying CPU-only based resources and expecting to see significant increases in the performance and capabilities of an HPC system is coming to an end. HPC resource providers need to work with their user communities to modernize their applications to exploit energy-efficient accelerators such as GPUs or FPGAs [field programmable gate arrays].”
Wright concludes with a rallying cry: “Start working on your application now! Any modifications that enable more parallelism to be exploited will stand you in good stead for the future.”
Planning in 2019
Weighing in extensively on 2019 HPC priorities, Rob Neely, associate division lead for the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, says he sees “a major paradigm shift” coming for HPC. Like Wright, he feels “purely CPU-based homogenous systems are reaching their limits in performance and energy efficiency.”
He warns, “For applications that wish to remain at the leading edge of computational power, it will become critical in 2019 to consider a path toward transitioning computationally intensive portions of their calculations to utilize accelerators.”
Accelerators, heterogeneous memory, and machine learning
Richard Loft, director of the technology development division in the Computational and Information Systems Laboratory of the National Center for Atmospheric Research (NCAR), also emphasizes accelerators. “A big challenge we see for ourselves and the HPC community is moving large and complex applications from the prototype stage to production readiness on accelerators,” he says.
Storage advances are having an impact as well. “Heterogeneity is increasingly finding its way into memory and storage systems,” says Neely. “High Bandwidth Memory is fast but relatively low capacity and often currently limited to accelerated devices, such as GPUs, or as a subset of the total memory on a node, as deployed in the Intel Xeon Phi class of Knights Landing processors. Likewise, NVRAM provides ample storage capacity for memory-hungry applications but is lagging in performance. This multilevel approach to the memory hierarchy (HBM, standard DDR, and NVRAM) is becoming increasingly common and, combined with heterogeneous processing units, is further complicating the programming models and performance tuning required for HPC applications.”
Both computer scientists see machine learning and AI as a priority, with its demands propelling the adoption of accelerated systems. Importantly, says Neely, “it will be critical for the HPC community to not only learn to ‘ride the wave’ of trends in hardware designed for AI and machine learning, such as use of lower precision units (like the NVIDIA Tensor Cores introduced in the NVIDIA V100 Volta GPU), but also to completely rethink how we approach our algorithms so that we can incorporate these rapidly advancing technologies in scientific computing. For example, we may find that a sufficiently trained dataset for complex models could enhance or even replace in certain physical regimes the need for expensive sub-grid analytic models.”
But if machine learning drives the challenge, it could also be a vital part of the solution. According to Neely, “In situ analytic techniques building on machine learning methods could allow us to gain much deeper insights into our calculations, leading to both increased scientific discovery as well as self-tuning algorithms and more robust applications. I fully expect that over the next decade we will see the most innovative applications taking advantage of ‘intelligent simulation’ to help overcome the slowdown in raw floating-point and memory performance we’re seeing in the run-up to post-Moore’s Law computing.”
To implement his vision of intelligent simulations, Neely recommends more research and development into complex workflow management systems. They’re needed to let the many solid machine-learning toolkits coexist within a complex distributed-memory (e.g., MPI-based) simulation. “Technologies such as containers (Docker, Singularity, Shifter) can help ease some of the effort but are not an end solution,” he says. “As a starting point, one can more easily take the first steps of exploring the benefits of AI technologies during pre- and post-processing through data analysis. Whether it’s managing your large datasets in Spark, using R for analysis, or simply exploring the possibilities of training some data with a simple MATLAB or Mathematica model, 2019 is the year in which this will begin to take root.”
Neely joins Wright in exhorting developers to adjust their focus as soon as possible: “Teams that begin to internalize this inevitable change now will be well placed in 2019 and beyond to avail themselves of this monumental shift in computing taking place around us.”
Loft says such focus should begin as early as possible. “Moving forward, machine learning techniques need to be in the tool box of every HPC professional, in the same way a knowledge of basic statistics or mathematics was required in the past.”
While it’s “no great surprise” that cloud computing is a 2019 priority, says Loft, he also warns it is not without potential pitfalls. “While the cloud can be a great place to prototype and test, HPC developers with (or who will generate) lots of data or who need very high performance really do need to think carefully up front about the cloud TCO [total cost of ownership] issues.”
Also, because the cost of entry to the cloud is so low, organizations may end up with a patchwork quilt of cloud accounts sprinkled across their different development teams, cautions Loft. “A more cost-effective solution can be to pursue an organizational bulk purchase of cloud resources,” he says. But be careful in implementing this approach, Loft says, to avoid stifling innovation with red tape.
HPC recommendations for developers
Crushing the fantasies of pointy-haired bosses but reassuring those on the front lines, Loft points out, “Developing large-scale accelerator applications is not for the faint of heart.”
As with many projects, the problems come less from a lack of technical prowess than from a lack of planning. “Once underway, projects can easily fall prey to a ‘perpetual pursuit curve’ syndrome in which architectural and application changes outstrip the team’s ability to refactor and verify ported code. Therefore, identifying the compilers, tools, or frameworks that can best accelerate the porting process is critical to the project’s prospect of success. Equally important is obtaining access to stable test platforms. This becomes more difficult as the project’s focus moves to scaling the applications up.”
Neely gives even more detailed advice: “To address accelerated computing, it will be important for application teams to begin to understand how to write algorithms and implementations that can effectively utilize accelerators while not completely giving up on pure CPU-based performance, as non-accelerated CPU-based systems will likely be with us for years to come at the midrange and desktop scale.”
One help, says Neely, is standards such as the recently announced OpenMP 5.0 specification. These provide one path to performance portability, he says, “but can be limited in their effectiveness depending on your algorithm and target architectures, and compiler technology supporting this standard is nascent.”
“Software abstraction layers such as Kokkos and RAJA (both open source) are another effective approach that can help ease the transition—but are limited to C++ applications,” Neely continues. “Likewise, language standards such as those being debated for the C++20 specification are moving toward native support for some parallelism in the base language, but these are not always focused on the needs of the HPC community. The programming environment available with NVIDIA GPUs is currently the most robust and advanced, and doing an initial port of your computationally expensive kernels to CUDA can be an effective learning tool (and perhaps an end game for some) to understand how to extract massive amounts of concurrency from your algorithms and manage heterogeneous memory resources.”
HPC developer priorities for 2019: Lessons for leaders
- The era of CPU-only HPC systems is drawing to a close. Up your expertise on all the latest buzzwords: cloud computing, GPUs, accelerators, heterogeneous environments.
- Machine learning isn’t just an application for HPC. It could also help model solutions to computing bottlenecks.
- Train the young, train yourselves. A revolution is coming, and you don’t want your enterprise to be left understaffed or your own skill set to be outmoded.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.