Use PyTorch with the Cray Programming Environment (PE) Deep Learning (DL) Plugin

Steps to perform before, during, and after training.

This procedure assumes that the Urika-XCS software has been installed properly.

PyTorch applications can be modified to use the Cray Programming Environment (PE) Deep Learning (DL) plugin for communication between nodes. These instructions describe the required code changes at a high level; readers will need to adapt them to their own code base.

  1. Begin with a serial PyTorch training script.
  2. Import the Cray PE DL Plugin:
    import dl_comm.torch as cdl
  3. Wrap the PyTorch optimizer with the Cray PE DL Plugin's DistributedOptimizer class:
    optimizer = cdl.DistributedOptimizer(optimizer)

    For a more complete overview of the Cray PE DL Plugin Python API, refer to the API overview in the "Use Keras with the Cray Programming Environment (PE) Deep Learning (DL) Plugin" section, as well as the examples included with the Cray PE DL Plugin software package.
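
The three steps above can be sketched end to end as follows. This is a minimal illustration rather than a reference implementation. The toy linear-regression model, the data, and the hyperparameters are invented for the example. The try/except around the import is also an addition, so that the script still runs serially where the plugin is not installed. The sketch assumes the wrapped optimizer keeps the usual zero_grad() and step() interface. It shows no other plugin setup calls, because this procedure does not list any. Check the examples shipped with the plugin package for anything else a real job requires.

```python
import torch
import torch.nn as nn

# Step 2: import the plugin. The fallback to None is an assumption made for
# this sketch only; it lets the script run serially without the plugin.
try:
    import dl_comm.torch as cdl
except ImportError:
    cdl = None

torch.manual_seed(0)

# Step 1: an ordinary serial training setup (toy regression for y = 2x + 1).
inputs = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
targets = 2.0 * inputs + 1.0
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Step 3: wrap the optimizer. The training loop below is left as it was in
# the serial script (assumes the wrapper exposes zero_grad() and step()).
if cdl is not None:
    optimizer = cdl.DistributedOptimizer(optimizer)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Only the import and the single wrapping line differ from the serial script, which is why the procedure starts from a working serial version: any training problem can be debugged there before the plugin is introduced.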