Hyperparameter Optimization (HPO) Troubleshooting Information

HPO tips for debugging evaluator-related issues.

Resolving Issues Related to Python3

If HPO is required and the Python3 environment is not set up properly, the system returns the following message when loading the analytics module:
INFO: CrayAI HPO requires the numpy module to be installed with \
python3, but no numpy support was detected.
INFO: Now attempting to load module cray-python to provide numpy support.
INFO: No cray-python available. In order to use the CrayAI HPO Module, \
please set up a python3 environment with numpy support.
Refer to the procedure Set up a Python3 Environment before loading the analytics module to resolve this issue.
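Before loading the analytics module, it can help to confirm that the `python3` on your PATH can actually import numpy. The following is a minimal sketch of that check. It uses only the standard library (`importlib.util.find_spec`) and is not the analytics module's own detection logic, which may behave differently (for example, by also trying to load cray-python).

```python
import importlib.util
import sys


def numpy_available() -> bool:
    """Return True if this interpreter can locate the numpy package."""
    # find_spec returns None when the top-level package cannot be found.
    return importlib.util.find_spec("numpy") is not None


if __name__ == "__main__":
    # Report which interpreter ran the check, since PATH may select a different python3.
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    if numpy_available():
        print("numpy found")
    else:
        print("numpy NOT found: set up a Python3 environment with numpy support")
```

Run it with `python3` in the same shell where you plan to load the analytics module, so that it checks the same interpreter HPO will use.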

Verbose Mode

The evaluator and the optimizer each have a verbose option, and each provides insight into a different part of the HPO process. In verbose mode, the evaluator prints information about how evaluations are distributed, along with any stderr output from those evaluations, to the console. In verbose mode, the optimizer prints details of the optimization process between generations, showing how the algorithm is adjusting the hyperparameters and how much progress it is making toward minimizing the figure of merit.

Suggested Debugging Steps

  • Set the evaluator to verbose mode.
  • Ensure the produced evaluator commands run properly outside of the HPO tool.
    • Run the command with a separate run_training or srun command, starting with a single node and working up for distributed trainings.
    • Ensure that the Urika-CS commands are working as expected when urika_args is provided to the Evaluator.
  • Try setting -v in the urika_args parameter, or pass -v when running the command outside of the HPO tool, as illustrated in the following example:
    evaluator = hpo.Evaluator(cmd, nodes=8, launcher='urika', urika_args="-v --no-node-list", verbose=False)
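Running a produced evaluator command outside of the HPO tool, scaling up from a single node, can be scripted so that a failing evaluation is easier to isolate. The sketch below is an illustration under stated assumptions, not part of the HPO tool: `srun_command`, `run_standalone`, and `check_scaling` are hypothetical helper names, the training command is a placeholder, and the command is wrapped in plain `srun -N <nodes>` (substitute run_training if that is how your evaluator launches jobs). It runs the command at increasing node counts, stops at the first failure, and prints that run's stderr.

```python
import shlex
import subprocess
import sys


def srun_command(train_cmd: str, nodes: int) -> list:
    """Wrap a training command in srun; -N sets the number of nodes."""
    return ["srun", "-N", str(nodes)] + shlex.split(train_cmd)


def run_standalone(argv: list, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run one command outside the HPO tool, capturing stdout and stderr as text."""
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)


def check_scaling(train_cmd: str, node_counts=(1, 2, 4, 8)):
    """Run train_cmd at increasing node counts.

    Returns the first node count whose run exited non-zero, or None if all succeeded.
    """
    for nodes in node_counts:
        argv = srun_command(train_cmd, nodes)
        print("Running:", " ".join(shlex.quote(arg) for arg in argv))
        result = run_standalone(argv)
        if result.returncode != 0:
            # Surface the failing run's stderr, as the evaluator's verbose mode would.
            print(f"Failed on {nodes} node(s), exit status {result.returncode}:", file=sys.stderr)
            print(result.stderr, file=sys.stderr)
            return nodes
    return None
```

For example, `check_scaling("python3 train.py --lr 0.01")` (a placeholder command) tries 1, 2, 4, and 8 nodes and returns the first node count that failed, or None if every run exited with status 0. Stopping at the first failure keeps the smallest failing configuration in view, which is usually the easiest one to debug.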