NVIDIA RAPIDS Accelerator for Apache Spark

The NVIDIA RAPIDS Accelerator for Apache Spark 3.x transparently accelerates ETL, query, and analytics workloads on HPE Ezmeral Runtime by dramatically improving the performance of Spark SQL and DataFrame operations.

As data scientists shift from traditional analytics to AI applications that better model complex market demands, traditional CPU-based processing can no longer keep up without compromising either speed or cost. The growing adoption of AI in analytics has created the need for a new framework that processes data quickly and cost-efficiently with NVIDIA GPUs and NGC (NVIDIA GPU Cloud) software.

The NVIDIA RAPIDS Accelerator for Apache Spark requires no code changes to accelerate most workloads, while still letting Spark application developers take explicit advantage of GPU hardware.
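
To illustrate what "no code changes" can look like in practice, here is a minimal sketch that enables the accelerator purely through Spark configuration. The property names `spark.plugins` and `spark.rapids.sql.enabled` and the plugin class `com.nvidia.spark.SQLPlugin` come from the RAPIDS Accelerator; the jar path, the helper functions, and the application name `etl_job.py` are assumptions made for this example, and the exact jars required depend on the release you deploy.

```python
# Minimal sketch: turn on the RAPIDS Accelerator through configuration only.
# The jar path below is a placeholder, not a path taken from this page.

RAPIDS_JAR = "/opt/sparkRapidsPlugin/rapids-4-spark.jar"  # hypothetical location


def rapids_conf(enabled=True):
    """Return Spark properties that load the RAPIDS SQL plugin."""
    return {
        # Registers the accelerator as a Spark plugin at startup.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Toggles GPU acceleration of SQL and DataFrame operators.
        "spark.rapids.sql.enabled": str(enabled).lower(),
    }


def spark_submit_args(app, conf, jars=(RAPIDS_JAR,)):
    """Build a spark-submit argument list; the application itself is untouched."""
    args = ["spark-submit", "--jars", ",".join(jars)]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]


if __name__ == "__main__":
    # "etl_job.py" stands in for an existing, unmodified Spark application.
    print(" ".join(spark_submit_args("etl_job.py", rapids_conf())))
```

Because everything is passed as configuration, setting `spark.rapids.sql.enabled` to `false` should return the same job to CPU execution without touching the application code.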

Product Name: NVIDIA Accelerator for Apache Spark
Product Version: v21.08.0
HPE Ezmeral Runtime Version: 5.3

Overview
  • The RAPIDS Accelerator for Apache Spark extends Spark's Catalyst query planner to analyze and transparently replace CPU-based SQL and DataFrame operators with GPU-accelerated equivalents. When the query plan is executed, those operators can then be run on GPUs within the Spark cluster.
  • NVIDIA has also created a new Spark shuffle implementation that optimizes data transfer between Spark processes. This shuffle implementation is built on GPU-accelerated communication libraries and technologies, including UCX, RDMA, and NCCL.
  • Spark 3.0 recognizes GPUs as a first-class resource alongside CPUs and system memory. This allows Spark 3.0 to place GPU-accelerated workloads directly onto servers that have the GPU resources needed to accelerate and complete a job.
  • NVIDIA engineers have contributed to this major Spark enhancement, enabling the launch of Spark applications on GPU resources in Spark standalone, YARN, and Kubernetes clusters.
 

Spark 3.0 marks a key milestone for analytics and AI, as ETL operations are now accelerated while ML and DL applications leverage the same GPU infrastructure. The complete stack for this accelerated data science pipeline is shown above.
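
The GPU-aware scheduling described in the overview is driven by a handful of Spark 3.0 resource properties. The sketch below assembles them and estimates how many tasks one executor can run concurrently. The property names are standard Spark settings; the helper functions, default values, and the simplified slot calculation are assumptions for illustration and do not reproduce the scheduler's full logic.

```python
from fractions import Fraction


def gpu_scheduling_conf(executor_gpus=1, task_gpu_share="0.25",
                        discovery_script=None, vendor=None):
    """Return Spark 3.0 properties that request GPUs for executors and tasks."""
    conf = {
        # Number of GPUs each executor asks the cluster manager for.
        "spark.executor.resource.gpu.amount": str(executor_gpus),
        # GPU share per task; a fraction lets several tasks share one GPU.
        "spark.task.resource.gpu.amount": str(task_gpu_share),
    }
    if discovery_script:
        # Script the executor runs to report its GPU addresses to Spark.
        conf["spark.executor.resource.gpu.discoveryScript"] = discovery_script
    if vendor:
        # Used on Kubernetes, for example "nvidia.com".
        conf["spark.executor.resource.gpu.vendor"] = vendor
    return conf


def concurrent_tasks(conf, executor_cores, task_cpus=1):
    """Estimate tasks per executor as the tighter of the CPU and GPU limits."""
    gpus = int(conf["spark.executor.resource.gpu.amount"])
    share = Fraction(conf["spark.task.resource.gpu.amount"])
    if share < 1:
        gpu_slots = gpus * int(1 / share)  # several tasks share each GPU
    else:
        gpu_slots = gpus // int(share)     # each task needs whole GPUs
    cpu_slots = executor_cores // task_cpus
    return min(cpu_slots, gpu_slots)


if __name__ == "__main__":
    conf = gpu_scheduling_conf(executor_gpus=1, task_gpu_share="0.25")
    print(concurrent_tasks(conf, executor_cores=8))
```

Under these assumptions, an executor with one GPU and a per-task share of 0.25 is capped at four concurrent tasks even when it has eight cores, which shows why the per-task GPU share is usually tuned together with the executor core count.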

Resources
HPE Ezmeral x NVIDIA = Performance Multiplied for the Age of Insight
Additional Information

Explore HPE Ezmeral Runtime, the industry’s first enterprise-grade container platform for cloud-native and distributed non-cloud-native applications.

Interested in the HPE Ezmeral Runtime and Apache Spark? Please contact us to learn more.
