Hyperparameter Optimization (HPO) Example

MNIST, PBT, and distributed training examples for using HPO.

This section provides a simple example that uses the genetic HPO algorithm to fit polynomial coefficients to the sine function by minimizing the error of the fit. The driver script, genetic.py, contains all calls to the HPO interface (hpo.Evaluator, hpo.Params, and hpo.GeneticOptimizer) and is given as follows.

#!/usr/bin/env python3

"""Genetic optimizer example
"""
from crayai import hpo

evaluator = hpo.Evaluator('python sin.py',
                          workload_manager='slurm',
                          launcher='urika',
                          verbose=True,
                          nodes=2)

params = hpo.Params([["-a", 1.0, (-10.0, 10.0)],
                     ["-b",-1.0, (-10.0, 10.0)],
                     ["-c", 1.0, (-10.0, 10.0)],
                     ["-d",-1.0, (-10.0, 10.0)],
                     ["-e", 1.0, (-10.0, 10.0)],
                     ["-f",-1.0, (-10.0, 10.0)],
                     ["-g", 1.0, (-10.0, 10.0)]])

optimizer = hpo.GeneticOptimizer(evaluator, verbose=True, generations=2, pop_size=10, log_fn='genetic.log')

optimizer.optimize(params)

print(optimizer.best_fom)
print(optimizer.best_params) 
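
As a rough illustration of how the driver and the source script connect: each hpo.Params entry pairs a command-line flag with an initial value and a (min, max) range, and the run_training lines in the sample output show each genotype being passed to the source script as flag=value arguments. The helper below, build_command, is hypothetical and not part of crayai. It only mimics that formatting, assuming about eight significant digits, and checks that argparse (as used in sin.py) accepts the flag=value form.

```python
import argparse
import shlex

def build_command(base_cmd, genotype):
    """Append each hyperparameter as flag=value (hypothetical helper, not crayai API)."""
    args = " ".join(f"{flag}={value:.8g}" for flag, value in genotype.items())
    return f"{base_cmd} {args}"

# Initial values of the first two parameters from genetic.py
cmd = build_command("python sin.py", {"-a": 1.0, "-b": -1.0})
print(cmd)

# argparse accepts the flag=value form, including negative values
parser = argparse.ArgumentParser()
parser.add_argument("-a", "--A", type=float, default=1.0)
parser.add_argument("-b", "--B", type=float, default=-1.0)
parsed = parser.parse_args(shlex.split(cmd)[2:])
print(parsed.A, parsed.B)
```

The flag names in hpo.Params therefore need to match the options that the source script parses.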

The source script, sin.py, defines the polynomial fit and the error to be minimized, as follows.

import math
import argparse

def main():
    # Parse command line args
    argparser = argparse.ArgumentParser()
    argparser.add_argument("-a", "--A", type=float, default=1.0)
    argparser.add_argument("-b", "--B", type=float, default=-1.0)
    argparser.add_argument("-c", "--C", type=float, default=1.0)
    argparser.add_argument("-d", "--D", type=float, default=-1.0)
    argparser.add_argument("-e", "--E", type=float, default=1.0)
    argparser.add_argument("-f", "--F", type=float, default=-1.0)
    argparser.add_argument("-g", "--G", type=float, default=1.0)
    args = argparser.parse_args()
    # Compute error
    err = 0.0
    for i in range(100):
        x = ((float(i)-50.0) / 50.0) * 3.1415926535
        val = args.A + args.B*pow(x,1) + args.C*pow(x,2) + args.D*pow(x,3)
        val = val    + args.E*pow(x,4) + args.F*pow(x,5) + args.G*pow(x,6)
        val = abs( math.sin(x) - val )
        val = val * val
        err = err + val
    fom = err
    # Print the error as our figure of merit, to be minimized
    print("FoM: %e"%fom)

if __name__ == "__main__":
    main()
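
To get a feel for the quantity being minimized, the same figure of merit can be computed in-process instead of through the command line. The sketch below restates the loop from sin.py as a function named fom (a name chosen here for illustration) and compares the starting coefficients from genetic.py with the fifth-order Taylor coefficients of sine. The Taylor comparison is not part of the original example; it is only a reference point showing how far the starting genotype is from a reasonable fit.

```python
import math

def fom(coeffs):
    """Sum of squared errors between the polynomial and sin(x)
    at the same 100 sample points used in sin.py."""
    err = 0.0
    for i in range(100):
        x = ((float(i) - 50.0) / 50.0) * 3.1415926535
        val = sum(c * x**k for k, c in enumerate(coeffs))
        err += (math.sin(x) - val) ** 2
    return err

# Initial values given to hpo.Params in genetic.py
start = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
# x - x^3/6 + x^5/120, an illustrative reference that is not in the original
taylor = [0.0, 1.0, 0.0, -1.0 / 6.0, 0.0, 1.0 / 120.0, 0.0]

print("start : %e" % fom(start))
print("taylor: %e" % fom(taylor))
```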

The example uses two nodes and the Urika launcher to run each evaluation in a Urika-XCCS container. With two nodes, two evaluations run in parallel by default. In each of the two generations, 40 genotypes (4 demes x 10 individuals per deme) are evaluated. After both generations complete, the best figure of merit (FoM), in this case the lowest error, is printed along with the corresponding hyperparameters, in this case the polynomial coefficients.
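
The evaluation counts follow from the optimizer settings. The short calculation below uses the values shown in this example (num_demes and num_parallel_evals are taken from the sample output rather than set explicitly in genetic.py). The number of rounds is only an estimate that assumes all evaluations take about the same time.

```python
import math

num_demes = 4        # "num_demes" in the sample output's Optimizer Settings
pop_size = 10        # pop_size passed to hpo.GeneticOptimizer
generations = 2      # generations passed to hpo.GeneticOptimizer
parallel_evals = 2   # "num_parallel_evals" in the sample output

per_generation = num_demes * pop_size
total = per_generation * generations
# Assumes evaluations proceed in rounds of parallel_evals at a time
rounds_per_generation = math.ceil(per_generation / parallel_evals)

print(per_generation, total, rounds_per_generation)  # 40 80 20
```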

This example can be run by doing the following.

  1. Log on to a login node.
  2. Place both of the scripts above in the same directory.
  3. Load the python, analytics, and crayai modules:
    $ module load cray-python analytics crayai
  4. Obtain a two-node allocation (using Slurm in this example):
    $ salloc -N 2
  5. Run the example. This assumes that the Python environment has been set up properly as discussed above.
    $ python genetic.py

The standard output should be similar to the example that follows.

Job id determined from SLURM_JOBID: 132390
Number of nodes determined from SLURM_NNODES: 2
------------------------------------------------------------
Optimizer Settings:
------------------------------------------------------------
generations:        2
num_demes:          4
pop_size:           10
verbose:            true
mutation_rate:      0.05
mul_mutation_bounds: 	   0.01 0.1 0.2
add_mutation_bounds: 	   0.03 0.03 0.13
crossover_rate:     0.33
migration_interval: 5
------------------------------------------------------------
Evaluator Settings:
------------------------------------------------------------
run_path:           "/path/to/hpo/run"
Normalized metric weights:
"FoM=1.0"
num_parallel_evals: 2
Job Settings
============
workloadManager:    slurm
launcher:           urika
active:             true
id:                 132390
number of nodes:    2
allocation source:  environment
isXC:               true
verbose:            true
------------------------------------------------------------
Optimizer Settings:
------------------------------------------------------------
generations:        2
num_demes:          4
pop_size:           10
verbose:            true
mutation_rate:      0.05
mul_mutation_bounds: 	   0.01 0.1 0.2
add_mutation_bounds: 	   0.03 0.03 0.13
crossover_rate:     0.33
migration_interval: 5
------------------------------------------------------------
Adding 10 individuals to each deme with genotype:
-a:  1,
-b:  -1,
-c:  1,
-d:  -1,
-e:  1,
-f:  -1,
-g:  1,
Adding mutants to first generation.
------------------------------------------------------------
Generation: 0
------------------------------------------------------------
Evaluating 40 genotypes.

run_training --no-node-list -n 1 --ppn 1   "python source/sin.py -a=-1.2035303 -b=-0.85464905 -c=0.86982715 -d=-0.88827309 -e=0.88912102 -f=-1.1544676 -g=0.85622564"

run_training --no-node-list -n 1 --ppn 1   "python source/sin.py -a=1.1320064 -b=-0.7511699 -c=1.5190717 -d=-1.0093883 -e=1.1932477 -f=1.5264775 -g=-0.79627715"
Generation 0: 1/40 evaluations completed

run_training --no-node-list -n 1 --ppn 1   "python source/sin.py -a=0.85016708 -b=-0.8345141 -c=1.1679045 -d=-1.2864144 -e=1.5055114 -f=-2.850595 -g=0.67866473"
Generation 0: 2/40 evaluations completed

[snip]

Migrating individuals across demes.
Migrating deme3_ind7 to deme1
Migrating deme4_ind2 to deme2
Migrating deme3_ind7 to deme3
Migrating deme4_ind2 to deme4
------------------------------------------------------------
Global Best: deme4_ind2        2.134012e+06 (5.2x) [1.030e+07 avg]
Best hyperparameters:
-a:  -1.4742517
-b:  -0.99990206
-c:  1.3994163
-d:  0.47363707
-e:  1.0018659
-f:  -1.1683297
-g:  -0.57133446
------------------------------------------------------------
deme1     size: 10    fom: 1.176e+07 (avg)
  deme1_ind13         fom: 4.302e+06 (local best)
-a=1.0005078 -b=-0.99673555 -c=1.8043571 -d=-1.1078188 -e=-1.3175 -f=-0.99798545 -g=0.47822263
deme2     size: 10    fom: 8.677e+06 (avg)
  deme2_ind12         fom: 2.134e+06 (local best)
-a=-1.4742517 -b=-0.99995173 -c=1.3994163 -d=-0.13356174 -e=1.1932429 -f=-0.24077012 -g=-0.57133446
deme3     size: 10    fom: 1.106e+07 (avg)
  deme3_ind16         fom: 5.033e+06 (local best)
-a=1.1320064 -b=-0.7511699 -c=1.5190717 -d=-1.0093883 -e=1.1932477 -f=1.3779062 -g=-0.79627715
deme4     size: 10    fom: 9.680e+06 (avg)
  deme4_ind14         fom: 2.134e+06 (local best)
-a=-1.4742517 -b=-0.99990206 -c=-0.067048009 -d=-0.67101744 -e=1.0018659 -f=-1.1683297 -g=-0.75745131
------------------------------------------------------------
Timings:
Setup:           1.852e-05 s
Reading:         1.750e-07 s
Evaluation:      1.702e+00 s
Writing:         2.500e-07 s
Cleanup:         0.000e+00 s
------------------------------------------------------------
Logging deme1 results: "Deme1_genetic.log"
Logging deme2 results: "Deme2_genetic.log"
Logging deme3 results: "Deme3_genetic.log"
Logging deme4 results: "Deme4_genetic.log"
------------------------------------------------------------
Generation: 1
------------------------------------------------------------
Evaluating 40 genotypes.

run_training --no-node-list -n 1 --ppn 1   "python source/sin.py -a=-1.4742517 -b=-1.0056967 -c=-0.067048009 -d=0.47363707 -e=1.004064 -f=-0.99530402 -g=-0.75745131"

run_training --no-node-list -n 1 --ppn 1   "python source/sin.py -a=3.5286442 -b=-1.1157018 -c=0.21498174 -d=-1.1668634 -e=0.85316778 -f=-0.082945272 -g=1.1647828"
Generation 1: 1/40 evaluations completed

run_training --no-node-list -n 1 --ppn 1   "python source/sin.py -a=1.1925278 -b=-1.0086988 -c=0.80030017 -d=-0.87294638 -e=0.99030772 -f=-0.86089002 -g=0.87977147"
Generation 1: 2/40 evaluations completed

[snip]

Generation 1: 39/40 evaluations completed
Generation 1: 40/40 evaluations completed
------------------------------------------------------------
Global Best: deme2_ind18       1.132800e+06 (9.8x) [6.742e+06 avg]
Best hyperparameters:
-a:  0.54982687
-b:  -1.0096096
-c:  0.88791129
-d:  -1.0096621
-e:  0.66114496
-f:  -1.0011993
-g:  -0.20787069
------------------------------------------------------------
deme1     size: 10    fom: 1.202e+07 (avg)
  deme1_ind22         fom: 9.033e+06 (local best)
-a=0.99770682 -b=-1.1157018 -c=1.5834944 -d=-1.0093883 -e=1.1932477 -f=-0.99798545 -g=1.1440034
deme2     size: 10    fom: 2.957e+06 (avg)
  deme2_ind22         fom: 1.133e+06 (local best)
-a=0.54982687 -b=-1.0096096 -c=0.88791129 -d=-2.883936 -e=1.0018659 -f=-1.1683297 -g=-0.20787069
deme3     size: 10    fom: 4.496e+06 (avg)
  deme3_ind26         fom: 3.295e+06 (local best)
-a=1.1320064 -b=-0.7511699 -c=1.5190717 -d=-0.87294638 -e=0.99030772 -f=-0.86089002 -g=0.71837671
deme4     size: 10    fom: 7.492e+06 (avg)
  deme4_ind21         fom: 3.350e+06 (local best)
-a=-1.4742517 -b=0.27139839 -c=1.3626765 -d=2.2548486 -e=1.004064 -f=-0.99530402 -g=-0.75745131
------------------------------------------------------------
Timings:
Setup:           1.883e-05 s
Reading:         1.750e-07 s
Evaluation:      1.688e+00 s
Writing:         2.000e-07 s
Cleanup:         2.500e-08 s
------------------------------------------------------------
Logging deme1 results: "Deme1_genetic.log"
Logging deme2 results: "Deme2_genetic.log"
Logging deme3 results: "Deme3_genetic.log"
Logging deme4 results: "Deme4_genetic.log"
------------------------------------------------------------
Best:  deme2_ind18         fom: 1.132800e+06  (9.77963x)
-a=0.54982687 -b=-1.0096096 -c=0.88791129 -d=-1.0096621 -e=0.66114496 -f=-1.0011993 -g=-0.20787069
------------------------------------------------------------
1132800.0
{'-a': 0.54982687, '-b': -1.0096096, '-c': 0.88791129, '-d': -1.0096621, '-e': 0.66114496, '-f': -1.0011993, '-g': -0.20787069}
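
The last line of the output is the printed form of optimizer.best_params. Assuming it is a plain dict keyed by flag, as the printed form suggests, it can be turned back into polynomial coefficients and compared with sine at a few points. This is a minimal sketch for inspecting the result, not part of the HPO interface.

```python
import ast
import math

# Final line of the sample output, copied verbatim
line = ("{'-a': 0.54982687, '-b': -1.0096096, '-c': 0.88791129, "
        "'-d': -1.0096621, '-e': 0.66114496, '-f': -1.0011993, "
        "'-g': -0.20787069}")
best = ast.literal_eval(line)

# Order the coefficients by increasing power of x, as in sin.py
coeffs = [best[flag] for flag in ("-a", "-b", "-c", "-d", "-e", "-f", "-g")]

def poly(x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

for x in (-1.0, 0.0, 1.0):
    print("x=%+.1f  poly=%+.4f  sin=%+.4f" % (x, poly(x), math.sin(x)))
```

Two generations are enough to reduce the error but not to reach a close fit, so the polynomial and sine values are expected to differ noticeably.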

Results are given in CSV format in the “Point.log” output file.

Other HPO examples can be found in the public HPO Examples GitHub repository.