Monitor Performance Counters

How to monitor and change performance counters.

Environment variable: PAT_RT_PERFCTR

Use this environment variable to specify CPU, Intel uncore, network, accelerator, power management, and AMD Interlagos Northbridge events to be monitored while performing tracing experiments.

Counter events are specified in a comma-separated list. Event names and groups from different components may be mixed as needed; the tool parses the list and determines which event names or group numbers apply to which components. To list the names of the individual events on the system, use the papi_avail and papi_native_avail commands.

For more information on individual counters, see PAT_RT_PERFCTR in Run Time Environment Variables.

To get useful information, papi_avail or papi_native_avail must be run on the compute nodes via the aprun command, not run from the login node or esLogin command line.
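As a minimal sketch of the comma-separated format described above, the following bash fragment joins a list of event names and exports the result. The two event names are standard PAPI preset names used here only as placeholders; whether they are available depends on the processor, so confirm them with papi_avail on a compute node first. The program name in the final comment is hypothetical.

```shell
#!/bin/bash
# Placeholder event names (PAPI presets); replace with events that
# papi_avail or papi_native_avail reports on the compute nodes.
events=(PAPI_TOT_INS PAPI_TOT_CYC)

# Join the array elements with commas, the format PAT_RT_PERFCTR expects.
PAT_RT_PERFCTR=$(IFS=,; echo "${events[*]}")
export PAT_RT_PERFCTR

echo "PAT_RT_PERFCTR=${PAT_RT_PERFCTR}"

# The instrumented program is then launched as usual, for example:
#   aprun ./my_app+pat      (my_app+pat is a hypothetical executable name)
```

Keeping the events in an array makes it easy to add or remove individual names, or a predefined group number, without retyping the whole list.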

Hardware Counters

Predefined counter group numbers can be used in addition to, or in place of, individual event names to specify one or more predefined hardware performance counter groups. For complete lists of the hardware counter events currently supported, organized by processor family, run the pat_help command and select the counters topic.

Network Counters

Predefined network counter group names can be used in addition to, or in place of, individual event names to specify one or more predefined network counter groups. The valid predefined network counter group names are listed in the nwpc(5) man page.

For more information about available network performance counters:

On Gemini-based systems, either read the technical note Using the Cray Gemini Hardware Counters or view the counters->gemini topics in pat_help.
On Aries-based systems, either read the technical note Using the Aries Hardware Counters or view the counters->aries topics in pat_help.

Accelerator Counters

An acgrp value can be used in place of the list of event names to specify a predefined accelerator performance counter group. The valid acgrp names are listed in the accpc(5) man page or on the system in $CRAYPAT_ROOT/share/counters/CounterGroups.accelerator, where accelerator is the type of GPU accelerator used on the system.

If the acgrp value specified is invalid or not defined, acgrp is treated as a counter event name. This can cause instrumented code to generate "invalid ACC performance counter event name" error messages or even abort during execution. Always verify that the acgrp values specified are supported on the type of GPU accelerators that are being used.

Accelerated applications cannot be compiled with -h profile_generate; therefore, GPU accelerator performance statistics and loop profile information cannot be collected simultaneously.

Power Management Counters

Cray XC series systems support two types of power management counters. The PAPI Cray RAPL component provides socket-level access to Intel Running Average Power Limit (RAPL) counters, while the similar PAPI Cray Power Management (PM) counters provide compute node-level access to additional power management counters. Together, these counters enable the user to monitor and report energy usage during program execution.

CrayPat supports experiments that make use of both sets of counters. These counters are accessed through the PAT_RT_PERFCTR set of run time environment variables. When RAPL counters are specified, one core per socket is tasked with collecting and recording the specified events. When PM counters are specified, one core per compute node is tasked with collecting and recording the specified events. The resulting metrics appear in text reports.

To list the available events, use the papi_native_avail command on a compute node and filter for the desired PAPI components. For example:

$ aprun papi_native_avail -i cray_rapl
$ aprun papi_native_avail -i cray_pm

For more information about the RAPL and PM counters, see the cray_rapl(5) and cray_pm(5) man pages.
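The two listing commands above can be wrapped in a small loop that saves each component's event list for later reference. This is only a sketch: the output file names are hypothetical, and the check for aprun is an added convenience so the script reports a message instead of failing on a machine that cannot launch jobs on compute nodes.

```shell
#!/bin/bash
# PAPI components named in the text above.
components=(cray_rapl cray_pm)

for comp in "${components[@]}"; do
    out="events_${comp}.txt"    # hypothetical output file name
    if command -v aprun >/dev/null 2>&1; then
        # Run on a compute node via aprun, as the text requires.
        aprun papi_native_avail -i "$comp" > "$out"
        echo "wrote ${out}"
    else
        echo "aprun not found; skipped ${comp}" >&2
    fi
done
```

Event names taken from the saved lists can then be placed in the PAT_RT_PERFCTR comma-separated list.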