Q&A: Argonne National Laboratory's Rick Stevens talks about 'the biggest, baddest' computer in the world
We are on the cusp of a new computing age. One of the nation's first exascale computers, Aurora, will soon be deployed at Argonne National Laboratory, a U.S. Department of Energy (DOE) research facility outside Chicago.
Exascale machines like Argonne's HPE-Cray Aurora will be many times more powerful than the fastest supercomputers in existence today. How big a deal is that? Coupled with advancements in artificial intelligence, exascale computing promises to radically accelerate virtually every field of scientific research—from climate change to alternative fuels to new vaccines. In short, it can help tackle the world's biggest problems, potentially affecting every person on the planet.
We spoke with Rick Stevens, a professor of computer science at the University of Chicago and leader of Argonne's Exascale Computing Initiative, about how these new supercomputers will change the world.
Let's start with the basics: What is exascale computing?
Exascale is computing starting at 10 to the 18th operations per second—or a billion-billion OPS. It's a thousand times faster than the petascale supercomputers we have now, which are the fastest machines we've built to date. It's a single machine that's bigger than anything we can do in the cloud, focused on a single problem at a time. Exascale computers are simply the biggest, baddest machines we can build out of conventional technology.
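The arithmetic behind those figures can be checked directly; a minimal sketch using the round numbers from the interview (10^15 operations per second for petascale, 10^18 for exascale):

```python
# Back-of-the-envelope scale comparison using the round numbers
# quoted in the interview, not measured system specs.
PETASCALE_OPS = 10**15   # ~10^15 operations per second
EXASCALE_OPS = 10**18    # ~10^18 operations per second

# Exascale is a thousand times faster than petascale...
print(EXASCALE_OPS // PETASCALE_OPS)           # 1000

# ...and 10^18 is indeed "a billion-billion".
print(EXASCALE_OPS == 10**9 * 10**9)           # True
```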
What are some of the tech innovations that make exascale possible?
One of the things that's helped to drive exascale is the shift from CPUs to GPUs, which has been accompanied by a dramatic improvement in high-bandwidth memory. If we'd tried to power the level of computing GPUs provide with normal memory, it wouldn't work. Exascale systems have a much higher ratio of compute capability relative to memory and bandwidth.
This means that exascale computers are going to be really excellent, almost purpose-built AI machines. GPUs are really good at AI training, and the compute-to-memory ratio that training demands is exactly what these machines provide.
What will these machines enable scientists to do that they couldn't do before?
It's a long list. The ability to run large-scale simulations and combine them with AI gives us the opportunity to push on a number of areas important to Argonne and the world at large, including materials science, climate, cancer research, and cosmology.
We can use it to invent next-generation batteries that are safer, cheaper, and run longer. We can use it to design better nuclear fuels for clean energy or develop new polymers that degrade naturally when exposed to light so we don't put more plastic into the ocean. We can connect it via fiber-optic cable to the Advanced Photon Source (APS)—Argonne's world-leading high-energy light source, just 2 kilometers down the road—and use AI to generate scientific results in real time.
The APS works like a giant, ultra-bright X-ray machine, allowing researchers to study samples at an atomic level. It is being used to determine the protein structures of the virus that causes COVID-19 and to understand why some face mask fabrics provide better protection than others, to name just two of many examples. It's going to be really amazing to have two of the world's leading scientific instruments tied together on the same campus.
We know this is a bit like asking parents to name their favorite child, but which of the many exascale research projects excites you the most?
I think we will make great strides in drug development with these machines. They will let us apply AI and machine learning to physics-based models that will fundamentally change how new drugs are designed and tested. The goal is to go from an idea to a clinical trial in one year. Obviously, that timeline is being driven in part by COVID-19 research, but there's also a big opportunity to extend it to cancer, heart disease, and the development of new antibiotics.
Another exciting opportunity is in the science of manufacturing. Whether you're designing a new battery, a new optical coating to protect solar panels, or a new photovoltaic material, it usually takes about 20 years for it to become widely available. Can we use AI and modeling to shorten that? Can we invent a new material in one year and get it into scalable production in another? Exascale computing can get us much closer to that.
Many large federally funded technology projects have gone on to impact the lives of ordinary people as well as enterprises, the Internet and GPS being two obvious examples. Will exascale follow a similar trajectory?
When GPS satellites were first launched, the military was the primary customer, and devices that could receive the signals cost more than your average car. What happened over time, of course, is that those costs came down. With exascale, we pushed the computer vendors to really up their game. And so the advanced CPUs, GPUs, and memory technology they're enabling will be in everyday machines within a few years. These won't be exascale machines, but the technology will be there. That will be one of the lasting dividends.
The other benefit, of course, is that solving these big problems—whether it's climate modeling or precision medicine or applying machine learning to scientific data—will affect a lot of other things, including policy decisions.
If exascale computers use conventional silicon technology—only much more of it—why haven't we built one already?
There are three enormous challenges to exascale. The first is power. If we had tried to build an exascale computer 10 years ago, it would have required a gigawatt of power—generating an electricity bill of about $1 billion a year. We had to develop more energy-efficient electronics.
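The billion-dollar figure is easy to sanity-check. A rough sketch, assuming a continuous 1 GW draw and an illustrative industrial electricity rate of $0.12/kWh (the rate is an assumption, not from the interview):

```python
# Rough check of the "$1 billion a year" electricity estimate.
power_kw = 1_000_000        # 1 gigawatt expressed in kilowatts
hours_per_year = 24 * 365   # ~8,760 hours of continuous operation
rate_usd_per_kwh = 0.12     # assumed industrial electricity price

annual_cost_usd = power_kw * hours_per_year * rate_usd_per_kwh
print(f"${annual_cost_usd / 1e9:.2f} billion per year")  # ~$1.05 billion
```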
The second challenge is scale. We're using close to a hundred thousand GPUs, each of which has tens of billions of transistors cycling a billion times a second. Until recently, getting to that many computational elements was impossible due to the levels of integration required.
The third element is reliability. When you've assembled hundreds of thousands of electronic components, things are constantly breaking. You've got to build enough redundancy and fault tolerance into the design that it keeps working when parts fail.
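The reliability problem follows directly from the part count. A minimal sketch, with assumed illustrative numbers: if each component fails independently with a given mean time between failures (MTBF), the expected time between failures somewhere in the system is roughly the component MTBF divided by the number of components.

```python
# Illustrative reliability arithmetic (both numbers are assumptions
# chosen for illustration, not Aurora's actual figures).
component_mtbf_hours = 1_000_000  # each part fails once per ~114 years
num_components = 200_000          # "hundreds of thousands" of parts

# With independent failures, something in the system fails roughly
# every (component MTBF / part count) hours.
system_mtbf_hours = component_mtbf_hours / num_components
print(system_mtbf_hours)  # 5.0 -> a failure somewhere every few hours
```

This is why fault tolerance has to be designed in from the start: at this scale, component failure is a routine event rather than an exception.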
In the world of PCs, software development typically lags well behind hardware. There is no Moore's Law for applications. Will that be a problem with exascale?
When we began planning for exascale, we recognized there could be a huge gap between building the hardware and having software available to run on it. So, in 2016, the DOE created the Exascale Computing Project to build applications and operating systems that work with exascale.
It's an amazing team effort involving more than 30 universities and over a thousand DOE employees, all to ensure that the hardware and software land at the same time. It's like building an airplane while you're trying to fly it, while at the same time you're also building the aircraft carrier where it's going to land.
When we unbox Aurora, it will come with more than 80 software packages in two dozen application areas, with many more to come. The DOE's massive investment in software libraries will make it much easier for people to build new exascale applications.
Science fiction has taught us to be afraid of machines that are as smart or smarter than humans. Are there good reasons to be afraid?
In some ways, we already have machines that are much smarter than we are. I mean, when was the last time you computed a quantum wave function?
So we definitely want machines to be better than people in doing certain tasks. That's the reason machines exist: to do things that are too hard or unsafe for humans to perform.
Technologies by themselves are neither good nor bad. But people with evil intentions can do bad things with advanced technology. We need to stay balanced in how we think about the ethical, security, and safety issues, especially as advanced technology becomes even more important than it is today. If you've got a machine that is a thousand times faster and smarter, you want it to operate in an environment where doing the right thing is understood by everyone. This means that everyone, not just technologists, needs to be part of this discussion.
We need to create a greener economy. We've got to solve the climate problem, and the problem of running out of antibiotics, and the need for better energy technology and safer vehicles. That is why we are trying to build these intelligent machines. We just need to do it with our eyes open.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.