Accelerate performance for production AI technical white paper

  • Technical White Paper
  • PDF 2037 KB
  • 15 pages

Overview

Learn about the HPC storage requirements for accelerating production AI workloads on distributed AI servers. This paper presents test results from a variety of benchmarks run on 1 to 32 GPUs across up to 4 server nodes, using flash-based WekaIO storage. See how GPU performance in a single server compares with a clustered configuration that has the same number of GPUs, and how GPU performance scales from 1 to 32 GPUs. Discover the storage bandwidth and throughput requirements for common benchmarks such as ResNet50, VGG16, and Inception v4. The information in this paper can help you plan and optimize your resources for production AI.

Read this white paper to learn about the impact of storage I/O on the training portion of the deep learning (DL) workflow and on inferencing for training-model validation within a distributed AI compute cluster.