Implement an effective HPC Cryo-EM strategy and give your researchers timely access to mission-critical data

This white paper examines the challenges IT organizations face in supporting cryo-EM and what to look for in an HPC solution to speed time to discovery and delivery.

Business white paper

Introduction

Cryo-electron microscopy (cryo-EM) is rapidly making a significant impact across industries, from drug discovery to materials science. Any organization seeking to solve macromolecular structures is reaping the benefits of recent advancements in cryo-EM technology. To get the full return on a microscope investment, organizations need to maximize the productive utilization of the instruments and their associated resources.

Since the method was recognized with the 2017 Nobel Prize in Chemistry, cryo-EM’s high-resolution structure determination has enabled a leap forward in drug exploration and development for disease treatments. One of the three contributions that led to this groundbreaking method was an algorithm for processing and analyzing cryo-EM images. To capture the full value of that contribution, you need an efficient computational infrastructure tailored to cryo-EM’s data processing workflow.


The compute and storage resources for cryo-EM are therefore as mission critical as the microscopes themselves. Cryo-EM microscopes produce so much data that storage and 3D reconstruction compute systems quickly become overwhelmed; in some cases, this data growth forces new IT procurements every three to four months. It lengthens the time it takes for researchers to get accurate models of protein structures and leaves discoveries bottlenecked and value wasted.


This leaves the IT organization facing increasing pressure to support the growing needs, not only of cryo-EM, but of other computational imaging and data-intensive workloads. Processing this enormous amount of data requires a powerful, tightly integrated, and scalable solution that maximizes the utilization and productivity of the researchers and the equipment. Choosing an integrated compute and storage solution that fits easily into your existing IT infrastructure and scales to meet demand will keep your users’ projects on pace today and in the future.


In this paper, we examine the challenges the IT organization faces in supporting cryo-EM and recommend what to look for in a solution. The mission is simple—to make it easier for you to deliver a high-performance computing (HPC) technology infrastructure that can help your organization reduce complexity and improve time to discovery.


Cryo-EM creates advances...and IT challenges

Cryo-EM uses a rapid sample-freezing technique, together with electron microscopes equipped with extremely high-resolution cameras, to image samples of purified biological macromolecules such as proteins. By generating 3D models in near-atomic detail, cryo-EM helps researchers determine the structure, and therefore the function, of these molecules in order to identify more targeted and effective treatments.


But IT challenges are introduced as soon as cryo-EM instruments begin taking pictures. In practice, a 3D model of a molecule is reconstructed from thousands of 2D digital images of isolated, identical particles captured as movies, with each movie producing, on average, 2 GB of data.¹ In the course of a day, a single cryo-EM microscope generates a huge volume of data—currently about 1 to 3 TB. And when organizations have multiple microscopes in production, the data growth multiplies and the IT challenges compound.
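As a back-of-the-envelope illustration, the figures above (roughly 2 GB per movie and 1 to 3 TB per day per microscope) translate directly into storage-sizing estimates. The short Python sketch below is a hypothetical planning aid, not part of any cryo-EM package; the movie count, microscope count, and retention period are illustrative assumptions.

```python
# Hypothetical back-of-the-envelope sizing for cryo-EM raw data storage.
# Figures from the text: ~2 GB per movie, 1-3 TB per day per microscope.

GB_PER_MOVIE = 2       # average movie size cited above
MOVIES_PER_DAY = 1000  # illustrative assumption for a busy microscope
MICROSCOPES = 2        # illustrative assumption
RETENTION_DAYS = 90    # illustrative assumption: keep raw movies one quarter

daily_tb = GB_PER_MOVIE * MOVIES_PER_DAY * MICROSCOPES / 1000
weekly_tb = daily_tb * 7
retained_tb = daily_tb * RETENTION_DAYS

print(f"Daily raw data:   {daily_tb:.1f} TB")
print(f"Weekly raw data:  {weekly_tb:.1f} TB")
print(f"Retained on disk: {retained_tb:.0f} TB over {RETENTION_DAYS} days")
```

Even with conservative inputs, retained volumes reach hundreds of terabytes within a quarter, which is what drives the procurement churn described above.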


In addition to explosive data growth, cryo-EM generates enormous demand for computing power: the entire post-acquisition process involves intensive computations performed by specialized packages optimized to take advantage of GPU-accelerated systems. Careful, planned investment in HPC infrastructure is critical to ensure the data coming from the instruments is stored and served up efficiently, so that computing bottlenecks do not occur.


Understanding the cryo-EM data flow and how IT can help deliver on its promise

Cryo-EM microscopes generate image data faster and in greater volumes than typical IT infrastructures can handle. Throughout the cryo-EM data pipeline, movie files are iteratively processed and analyzed (Figure 1). More images at greater resolution are a boon for researchers, but only if the compute and storage architecture can accommodate them. Critical to the success of a cryo-EM facility, particularly one that operates in a shared-service mode, is the availability of sufficient HPC storage and computing resources.


Figure 1. Cryo-EM workflow

Figure 1 shows the steps in the cryo-EM pipeline, from data acquisition to the final 3D structure, that we used to build our configuration blueprints for accelerating the workflow and making it more cost-effective. Each cryo-EM workflow will differ slightly from this, depending on the combination of software packages your researchers prefer.


The infrastructure that any organization needs will vary based on the size and type of the institution, the number of researchers and projects supported, the resources and funding available, the institution’s unique research mission, and the servers and storage already in place. Given the diversity of the cryo-EM community, the following outlines an approach to addressing three common operational scenarios.


  • Scalable cluster for preprocessing supports the tasks that follow data collection, up to particle sorting. Organizations focusing on preprocessing often support many users (inside and outside of the organization) and have multiple microscopes, but may not have the capacity to support data analysis after preprocessing.
  • Dedicated cluster for image analysis covers the image-analysis portion of the workflow, from 2D classification to the complete 3D model of the structure. Organizations supporting researchers who do not own their own microscopes and receive preprocessed data from a shared facility will be interested in this configuration blueprint.
  • Flexible cluster for mixed life sciences workloads addresses the entire cryo-EM workflow, from data collection to complete 3D model, but is also designed to support a heterogeneous workload. Organizations requiring an end-to-end cryo-EM arrangement typically support other data-intensive lab instruments, high-performance data analytics (HPDA) workloads, or machine learning and deep learning projects. The sketch following this list maps each blueprint onto the workflow stages.
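To make the blueprints concrete, the sketch below encodes a typical cryo-EM stage sequence and the slice of that sequence each blueprint targets. The stage names and boundaries are illustrative assumptions drawn from the descriptions above; your facility’s pipeline (Figure 1) may differ slightly.

```python
# A typical cryo-EM processing pipeline, in order. Stage names are
# illustrative; the exact workflow (Figure 1) varies by facility.
PIPELINE = [
    "data acquisition",     # movies arriving from the microscope
    "motion correction",    # align frames within each movie
    "CTF estimation",       # contrast transfer function per micrograph
    "particle picking",     # locate particles in micrographs
    "particle extraction",  # crop and sort individual particle images
    "2D classification",    # group particles into class averages
    "3D classification",    # separate conformational states
    "3D refinement",        # high-resolution reconstruction
]

# How the three configuration blueprints map onto the pipeline
# (boundaries are assumptions based on the descriptions above).
BLUEPRINTS = {
    "scalable preprocessing cluster": PIPELINE[:5],    # collection -> sorting
    "dedicated image-analysis cluster": PIPELINE[5:],  # 2D classes -> 3D model
    "flexible mixed-workload cluster": PIPELINE[:],    # end to end, plus more
}

for name, stages in BLUEPRINTS.items():
    print(f"{name}: {stages[0]} ... {stages[-1]}")
```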


Data storage and availability

In a shared facility, upwards of 21 TB of data can be generated weekly by a single microscope. All of this data needs to be transferred, processed, stored, and analyzed. In some cases, data is processed concurrently with data collection, a pattern that can only be handled efficiently by high-performance storage systems. When the infrastructure can’t support the volume of data generated, the data is offloaded and stored until it can be analyzed. Slow network links can create computation bottlenecks, and a lack of sufficient shared storage can hinder your ability to support researchers throughout the computing phases of a cryo-EM project.
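Volume alone understates the problem: when processing runs concurrently with collection, sustained bandwidth matters as much as capacity. The rough conversion below turns the weekly figure above into a throughput planning number; the burst factor is an illustrative assumption, not a measured value.

```python
# Rough throughput arithmetic for one microscope producing ~21 TB/week.
TB_PER_WEEK = 21
SECONDS_PER_WEEK = 7 * 24 * 3600

avg_ingest_mb_s = TB_PER_WEEK * 1e6 / SECONDS_PER_WEEK  # TB -> MB

# Assumption: collection is bursty, and concurrent analysis re-reads the
# same data, so plan for a multiple of the average ingest rate.
BURST_FACTOR = 10
planning_target_mb_s = avg_ingest_mb_s * BURST_FACTOR

print(f"Average ingest:  {avg_ingest_mb_s:.0f} MB/s")
print(f"Planning target: {planning_target_mb_s:.0f} MB/s per microscope")
```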


Workflow backlog

As mentioned in the previous section, a cryo-EM project requires significant processing resources for the major steps of the workflow. Cryo-EM processing packages (RELION, cryoSPARC, and so on) take advantage of GPU-enabled HPC clusters to provide timely results. A lack of sufficient resources can cause costly delays in the analytics pipeline and increase overall system congestion. Of particular importance is the availability of GPU-enabled systems to handle the compute-intensive processes related to 2D and 3D classification and refinement.
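As an illustration of how such packages are typically driven on a GPU-enabled cluster, the minimal Python sketch below generates a batch script and hands it to a SLURM scheduler. It is a hypothetical example: the resource requests and the relion_refine_mpi options shown are placeholders, and the exact flags depend on your package versions and site configuration.

```python
# Minimal sketch: submitting a GPU-accelerated 3D refinement to SLURM.
# Resource requests and RELION options are illustrative placeholders.
import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=cryoem-refine3d
#SBATCH --nodes=1
#SBATCH --ntasks=5
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00

# Hypothetical 3D refinement step; exact flags vary by RELION version.
mpirun -n 5 relion_refine_mpi \\
    --i particles.star \\
    --o Refine3D/job001 \\
    --gpu
"""

def submit(script_text: str) -> str:
    """Write the job script to a temporary file and pass it to sbatch."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    result = subprocess.run(["sbatch", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()  # e.g., "Submitted batch job 12345"

if __name__ == "__main__":
    print(submit(JOB_SCRIPT))
```

In practice, a workload manager such as SLURM is what turns a set of GPU nodes into a shared facility: it queues competing jobs so the expensive accelerators stay busy without users colliding.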


In selecting a system to support cryo-EM, focus on a balanced HPC configuration able to support the range of third-party applications associated with cryo-EM processing (see Software Tools for Molecular Microscopy). A balanced HPC system should include: 1) high-performance parallel storage able to handle the performance and capacity demands of the cryo-EM facility, 2) general-purpose processing nodes to handle many of the tools, and 3) sufficient GPU nodes for specialized processing tasks.
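These three ingredients can be captured as a simple capacity plan. The counts and sizes below are purely illustrative assumptions for a hypothetical mid-sized facility, not a recommended configuration:

```python
# Purely illustrative capacity plan reflecting the three ingredients above:
# parallel storage, general-purpose CPU nodes, and GPU nodes.
BALANCED_CONFIG = {
    "parallel_storage": {
        "usable_capacity_tb": 500,        # assumption: raw + derived data
        "sustained_throughput_gb_s": 10,  # assumption: collect + analyze at once
    },
    "cpu_nodes": {
        "count": 8,             # assumption: preprocessing and utility tools
        "cores_per_node": 64,
    },
    "gpu_nodes": {
        "count": 4,             # assumption: 2D/3D classification, refinement
        "gpus_per_node": 4,
    },
}

for tier, spec in BALANCED_CONFIG.items():
    print(tier, spec)
```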


1. Sullivan, Thomas. “A Tough Road: Cost To Develop One New Drug Is $2.6 Billion; Approval Rate for Drugs Entering Clinical Development Is Less than 12%.” Policy & Medicine, updated May 6, 2018. policymed.com/2014/12/a-tough-road-cost-to-develop-one-new-drug-is-26-billion-approval-rate-for-drugs-entering-clinical-de.html