Enable Anaconda Python and the Conda Environment Manager

How to enable Python and Conda.

This procedure assumes that Slurm or PBS Pro is being used as the workload manager. Contact Cray support if using other workload managers.

In addition to the default system Python, Urika-GX also ships with the Anaconda Python distribution version 4.1.1, including the Conda package and environment manager. Users can enable and/or disable Anaconda for their current shell session by using environment modules.

OSA images ship with the Anaconda Python distribution version 2019.10, including the Conda package and environment manager. This is the recommended Python distribution for running analytic jobs using Urika-XCS. If there is an active Conda environment, PySpark will automatically use Anaconda.

  1. Open a shell on the login node.
    nid00030 is used as an example for a login node in this procedure.
  2. Load the analytics module.
    $ module load analytics
  3. Allocate resources using the commands specific to the workload manager.
    Example for allocating resources using Slurm.
    $ salloc -N numberOfResources
    Example for allocating resources using PBS Pro.
    $ qsub -I -lnodes=numberOfResources
    $ module load analytics
    $ module load openmpi/gcc/64/4.0.1

    The Open MPI module path shown in the preceding example (openmpi/gcc/64/4.0.1) depends on the system.

  4. Start an analytics cluster in development mode.
    $ start_analytics -d
    This places the user on a node running an interactive container. nid00030 is used as an example for an interactive container node in this procedure.
    For more information, refer to the start_analytics man page.
  5. Load the anaconda3 module.
    [user@nid00030 ~]$ module load anaconda3
    Loading the anaconda3 module will make Anaconda Python the default Python, and enable Conda environment management.
  6. Create a Conda environment.
    The following example creates a Conda environment with scipy and all of its dependencies installed:
    [user@nid00030 ~]$ conda create --name scipyEnv scipy
    Important: To set an alternate environments directory for Conda, use the conda config --add envs_dirs path_to_directory command. path_to_directory must be a directory that is mounted within the container. An alternate directory is particularly useful when home directory space is limited.
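    The alternate environments directory setting can be sketched as follows. This is a minimal illustration, not a site-specific recipe: the ENVS_DIR variable and its default location are hypothetical placeholders, and on a real system the directory should be one that is mounted within the container.

```shell
# Hypothetical placeholder location; replace with a directory that is
# mounted within the container (for example, a shared file system path).
ENVS_DIR="${ENVS_DIR:-${TMPDIR:-/tmp}/conda_envs}"

# Create the directory if it does not exist yet.
mkdir -p "$ENVS_DIR"

if command -v conda >/dev/null 2>&1; then
    # Register the directory as an additional location for environments.
    conda config --add envs_dirs "$ENVS_DIR"
    # Show the resulting list of environment directories.
    conda config --show envs_dirs
else
    echo "conda not found; load the anaconda3 module first" >&2
fi
```

    Environments created afterwards with conda create --name are then placed in the first writable directory in the envs_dirs list.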
  7. Activate the Conda environment.
    [user@nid00030 ~]$ source activate scipyEnv
    For more information about Anaconda, refer to https://docs.anaconda.com. For additional information about the Conda environment manager, please refer to http://conda.pydata.org/docs/
  8. Confirm that the Conda environment has been activated by checking that its name is prepended to the shell prompt.
    In the following example, (scipyEnv) is prepended to the prompt, which indicates that the Conda environment is active.
    (scipyEnv) [user@nid00030 ~]$ 
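    The prompt check can also be scripted. The following sketch relies on the CONDA_DEFAULT_ENV variable, which Conda sets when an environment is activated. The check_conda_env function name is hypothetical and introduced only for illustration. For environments activated by path rather than by name, the variable may hold the full path instead.

```shell
# Hypothetical helper: succeeds if the named Conda environment is active.
check_conda_env() {
    expected="$1"
    # CONDA_DEFAULT_ENV is set by Conda on activation.
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        echo "active: $expected"
        return 0
    fi
    echo "not active: $expected (current: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
}
```

    For example, running check_conda_env scipyEnv after step 7 should report that scipyEnv is active.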
    Once the Conda environment has been activated, Python and PySpark both use the selected environment. To have PySpark use a different Python installation instead, manually set PYSPARK_PYTHON to point to that installation.
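    As an illustration of pointing PySpark at a different interpreter, the following sketch exports PYSPARK_PYTHON only if the target interpreter exists. The ALT_PYTHON variable and the /usr/bin/python3 path are assumptions made for this example; substitute the actual location of the desired Python installation.

```shell
# Assumed interpreter location; adjust to the Python installation to use.
ALT_PYTHON="${ALT_PYTHON:-/usr/bin/python3}"

if [ -x "$ALT_PYTHON" ]; then
    # PySpark reads PYSPARK_PYTHON to select its Python executable.
    export PYSPARK_PYTHON="$ALT_PYTHON"
    echo "PySpark will use: $PYSPARK_PYTHON"
else
    echo "No executable interpreter at $ALT_PYTHON; PYSPARK_PYTHON left unchanged" >&2
fi
```

    Unsetting the variable with unset PYSPARK_PYTHON removes the override.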
    • To deactivate a Conda environment, use source deactivate:
      (scipyEnv) [user@nid00030 ~]$ source deactivate
    • To disable Anaconda and Conda, and switch back to the default system Python, unload the module:
      (scipyEnv) [user@nid00030 ~]$ module unload anaconda3