Using Hyperparameter Optimization (HPO)

Steps for importing the crayai module, defining parameters to optimize, creating an evaluator, creating an optimizer, and then optimizing over the hyperparameters.

This procedure requires the following software to be installed on the system:
  • Python 3.6
  • numpy
To use the HPO framework, a user must perform the following steps:
  1. Import the required module.
  2. Define parameters to be optimized.
  3. Create an Evaluator.
  4. Create an Optimizer.
  5. Optimize over the parameters.
A description of each of these steps is provided in this procedure.
  1. Log on to a login node.
  2. Load the analytics module.
  3. Load the desired Python 3 environment.
    The site default python3 or a local python3 environment can be used as long as numpy is installed.
    login$ module load cray-python
  4. Import the hpo sub module of the crayai module in a Python script.
    from crayai import hpo
  5. Define parameters to be optimized.
    These are the parameters to optimize over. They are exposed to the training program through command-line flags. The crayai hpo tool searches within a specified range, starting from a specified default value.
    Hyperparameter Definition Format
    params = hpo.Params([[command_line_flag_1, default_val_1, (min_val, max_val)],
                         [command_line_flag_2, default_val_2, (min_val, max_val)],
                         ...
                        ])
    Hyperparameter Definition Example
    params = hpo.Params([["--learningRate", 0.01, (1e-6, 1.0)],
                         ["--neuronsPerLayer", 100, (50, 500)],
                         ["--dropoutRate", 0.5, (0.0, 0.7)]])
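    The training program itself must accept these flags. A minimal sketch of what source/train.py might look like (illustrative only; the stand-in loss calculation and the "FoM:" marker string are assumptions, not part of crayai):

    ```python
    # Hypothetical sketch of a training script whose hyperparameters are
    # exposed as command-line flags, matching the definitions above.
    import argparse

    def main(argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument("--learningRate", type=float, default=0.01)
        parser.add_argument("--neuronsPerLayer", type=int, default=100)
        parser.add_argument("--dropoutRate", type=float, default=0.5)
        args = parser.parse_args(argv)

        # ... model training would happen here; a stand-in loss is used ...
        loss = abs(args.learningRate - 0.001) + args.dropoutRate

        # Print the figure of merit next to a unique marker string so the
        # Evaluator (via its fom argument) can locate it in the output.
        print("FoM: %f" % loss)
        return loss

    if __name__ == "__main__":
        main()
    ```

    Each evaluation runs this script with a different combination of flag values and reads the resulting figure of merit from its output.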
  6. Create an Evaluator.
    The Evaluator class defines how to evaluate a set of hyperparameters by running the kernel program (model training script) with command-line arguments. This includes distributing individual evaluations via a workload manager (specified as wlm), the Urika-XCS launcher (specified as urika), or local mode (specified as none). The Evaluator can handle distributed training processes via the nodes_per_eval parameter, and calculates how many evaluations can run in parallel within the given allocation.
    Evaluator Definition Format
    evaluator = hpo.Evaluator(command,        # Command to run to evaluate the hyperparameters
                              run_path,       # Opt: Workspace directory for log files.
                              fom,            # Opt: Unique string identifying where the figure
                                              #      of merit value will be in evaluation output.
                              checkpoint,     # Opt: Path to checkpoint directory per workspace.
                              alloc_job_ID,   # Opt: Allocation id for existing allocation
                                              #      (wlm launcher only)
                              nodes,          # Opt: Total node count in the allocation
                              nodes_per_eval, # Opt: Nodes needed for each evaluation
                              launcher,       # Opt: How to distribute the evaluation. Choose
                                              #      from "urika", "wlm", or "none"
                              urika_args,     # Opt: Arguments to pass to run_training for
                                              #      the urika launcher
                              verbose)        # Opt: Enable verbose output
    Evaluator Example
    cmd = "python source/train.py --epochs 5"
    evaluator = hpo.Evaluator(cmd,
                              nodes=8,
                              nodes_per_eval=2,
                              launcher='urika',
                              urika_args="--no-node-list",
                              verbose=False)
    In the preceding example:
    • The training process defined in source/train.py will run with 5 full epochs every time it is executed.
    • The Evaluator will have access to 8 nodes in an allocation.
    • Each evaluation will run on 2 nodes, allowing 4 evaluations to occur in parallel.
    • The urika launcher will be used to run the command with run_training from the Urika-XCS package.
    • --no-node-list will be passed as an additional argument to run_training for each evaluation.
    • Verbose logging information will not be printed.
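    The two mechanics described above can be sketched in plain Python. The extract_fom helper is hypothetical (crayai's actual output parsing may differ); it only illustrates how a unique marker string such as "FoM:" locates the figure of merit in evaluation output, and how the parallel evaluation count follows from the node settings:

    ```python
    # Hypothetical sketch: locate the figure of merit in evaluation output
    # by searching for a unique marker string (not crayai's actual code).
    def extract_fom(output, marker="FoM:"):
        """Return the float that follows the marker string in the output."""
        for line in output.splitlines():
            if marker in line:
                return float(line.split(marker, 1)[1].strip())
        raise ValueError("figure of merit marker not found in output")

    # With 8 nodes in the allocation and 2 nodes per evaluation, the
    # Evaluator can run nodes // nodes_per_eval = 4 evaluations in parallel.
    nodes, nodes_per_eval = 8, 2
    parallel_evals = nodes // nodes_per_eval
    ```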
  7. Create an Optimizer.
    The Optimizer contains the core algorithms behind HPO, specifically genetic, random and grid searches. The Optimizer works in tandem with the Evaluator by ingesting the results from the Evaluator and returning a new set of hyperparameters to be evaluated.
    Optimizer Definition Format
    optimizer = hpo.GeneticOptimizer(evaluator, # Evaluator instance
                                  generations,        # Opt: Number of generations.
                                  num_demes,          # Opt: Number of distinct demes (populations)
                                  pop_size,           # Opt: Number of individuals per deme
                                  mutation_rate,      # Opt: Probability of mutation per
                                                      #      hyperparameter during creation of next
                                                      #      generation
                                  crossover_rate,     # Opt: Probability of crossover per
                                                      #      hyperparameter during creation of next
                                                      #      generation
                                  migration_interval, # Opt: Interval of migration between demes
                                  log_fn,             # Opt: Filename to record results of optimization
                                  verbose)            # Opt: Enable verbose output
    optimizer = hpo.RandomOptimizer(evaluator, # Evaluator instance
                                  numIters,  # Opt: Number of iterations to run
                                  seed,      # Opt: Seed for random number generator. Defaults to 0,
                                             #      i.e. random seed used.
                                  verbose)   # Opt: Enable verbose output
    optimizer = hpo.GridOptimizer(evaluator,  # Evaluator instance
                                  grid_size,  # Opt: Number of grid points to discretize for each
                                              #      hyperparameter
                                  chunk_size, # Opt: Number of grid points to evaluate per batch (chunk)
                                  verbose)    # Opt: Enable verbose output
    Optimizer Example
    optimizer = hpo.GeneticOptimizer(evaluator,
                                     pop_size=4,
                                     num_demes=2,
                                     generations=5,
                                     mutation_rate=0.10,
                                     crossover_rate=0.4,
                                     verbose=True)
  8. Optimize over the parameters.
    Pass the defined hyperparameters to the Optimizer to begin the search.
    optimizer.optimize(params)
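    Conceptually, the Optimizer/Evaluator loop proposes hyperparameter sets, evaluates each one, and keeps the best result. A minimal, self-contained random-search sketch of that loop (illustrative only; it does not use the crayai API, and random_search is a hypothetical stand-in for hpo.RandomOptimizer):

    ```python
    import random

    def random_search(evaluate, ranges, num_iters=20, seed=0):
        """Toy stand-in for a random-search optimizer: sample each
        hyperparameter uniformly from its (min, max) range and keep
        the candidate with the best (lowest) figure of merit."""
        rng = random.Random(seed)
        best_params, best_fom = None, float("inf")
        for _ in range(num_iters):
            candidate = {name: rng.uniform(lo, hi)
                         for name, (lo, hi) in ranges.items()}
            fom = evaluate(candidate)   # analogous to one Evaluator run
            if fom < best_fom:          # lower figure of merit is better here
                best_params, best_fom = candidate, fom
        return best_params, best_fom

    # Example: minimize a quadratic stand-in "loss" over the learning rate.
    ranges = {"--learningRate": (1e-6, 1.0)}
    best, fom = random_search(lambda p: (p["--learningRate"] - 0.1) ** 2, ranges)
    ```

    The genetic optimizer follows the same propose-evaluate-select loop, but generates new candidates by mutating and crossing over the best performers from the previous generation rather than sampling independently.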