Run an OpenMP Application
Example of an OpenMP application.
This example shows how to compile and run a combined OpenMP/MPI application under any of the following programming environments:
- PrgEnv-cray
- PrgEnv-pgi
- PrgEnv-gnu
- PrgEnv-intel
To compile an OpenMP program using the PGI compiler, include -mp on the compiler driver command line. For the GCC compiler, include -fopenmp. For the Intel compiler, include -openmp. No option is required for the Cray compilers; -h omp is the default.
Source code of C program xthi.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>
/* Borrowed from util-linux-2.13-pre7/schedutils/taskset.c */
static char *cpuset_to_cstr(cpu_set_t *mask, char *str)
{
    char *ptr = str;
    int i, j, entry_made = 0;

    for (i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, mask)) {
            int run = 0;
            entry_made = 1;
            for (j = i + 1; j < CPU_SETSIZE; j++) {
                if (CPU_ISSET(j, mask)) run++;
                else break;
            }
            if (!run)
                sprintf(ptr, "%d,", i);
            else if (run == 1) {
                sprintf(ptr, "%d,%d,", i, i + 1);
                i++;
            } else {
                sprintf(ptr, "%d-%d,", i, i + run);
                i += run;
            }
            while (*ptr != 0) ptr++;
        }
    }
    ptr -= entry_made;
    *ptr = 0;
    return(str);
}

int main(int argc, char *argv[])
{
    int rank, thread;
    cpu_set_t coremask;
    char clbuf[7 * CPU_SETSIZE], hnbuf[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(clbuf, 0, sizeof(clbuf));
    memset(hnbuf, 0, sizeof(hnbuf));
    (void)gethostname(hnbuf, sizeof(hnbuf));
    #pragma omp parallel private(thread, coremask, clbuf)
    {
        thread = omp_get_thread_num();
        (void)sched_getaffinity(0, sizeof(coremask), &coremask);
        cpuset_to_cstr(&coremask, clbuf);
        #pragma omp barrier
        printf("Hello from rank %d, thread %d, on %s. (core affinity = %s)\n",
               rank, thread, hnbuf, clbuf);
    }
    MPI_Finalize();
    return(0);
}

Load the PrgEnv-cray module:
% module swap PrgEnv-pgi PrgEnv-cray
Set the PSC_OMP_AFFINITY environment variable to FALSE:
% setenv PSC_OMP_AFFINITY FALSE
% export PSC_OMP_AFFINITY=FALSE
Compile and link xthi.c:
% cc -o xthi xthi.c
Set the OMP_NUM_THREADS environment variable:
% setenv OMP_NUM_THREADS 2
% export OMP_NUM_THREADS=2
If running Intel-compiled code, use one of these alternate methods when setting OMP_NUM_THREADS:
- Increase the aprun -d depth value by one. This reserves one extra CPU per process, increasing the total number of CPUs required to run the job.
- Use the aprun -cc depth CPU affinity option. Setting the environment variable KMP_AFFINITY=compact may increase performance (see the "User and Reference Guide for the Intel® C++ Compiler" for more information).

Run program xthi:
% export OMP_NUM_THREADS=24
% aprun -n 1 -d 24 -L 56 xthi | sort
Application 57937 resources: utime ~1s, stime ~0s
Hello from rank 0, thread 0, on nid00056. (core affinity = 0)
Hello from rank 0, thread 10, on nid00056. (core affinity = 10)
Hello from rank 0, thread 11, on nid00056. (core affinity = 11)
Hello from rank 0, thread 12, on nid00056. (core affinity = 12)
Hello from rank 0, thread 13, on nid00056. (core affinity = 13)
Hello from rank 0, thread 14, on nid00056. (core affinity = 14)
Hello from rank 0, thread 15, on nid00056. (core affinity = 15)
Hello from rank 0, thread 16, on nid00056. (core affinity = 16)
Hello from rank 0, thread 17, on nid00056. (core affinity = 17)
Hello from rank 0, thread 18, on nid00056. (core affinity = 18)
Hello from rank 0, thread 19, on nid00056. (core affinity = 19)
Hello from rank 0, thread 1, on nid00056. (core affinity = 1)
Hello from rank 0, thread 20, on nid00056. (core affinity = 20)
Hello from rank 0, thread 21, on nid00056. (core affinity = 21)
Hello from rank 0, thread 22, on nid00056. (core affinity = 22)
Hello from rank 0, thread 23, on nid00056. (core affinity = 23)
Hello from rank 0, thread 2, on nid00056. (core affinity = 2)
Hello from rank 0, thread 3, on nid00056. (core affinity = 3)
Hello from rank 0, thread 4, on nid00056. (core affinity = 4)
Hello from rank 0, thread 5, on nid00056. (core affinity = 5)
Hello from rank 0, thread 6, on nid00056. (core affinity = 6)
Hello from rank 0, thread 7, on nid00056. (core affinity = 7)
Hello from rank 0, thread 8, on nid00056. (core affinity = 8)
Hello from rank 0, thread 9, on nid00056. (core affinity = 9)
The aprun command created one instance of xthi, which spawned 23 additional threads; each thread runs on a separate core.
Here is another run of xthi:
% export OMP_NUM_THREADS=6
% aprun -n 4 -d 6 -L 56 xthi | sort
Application 57948 resources: utime ~1s, stime ~1s
Hello from rank 0, thread 0, on nid00056. (core affinity = 0)
Hello from rank 0, thread 1, on nid00056. (core affinity = 1)
Hello from rank 0, thread 2, on nid00056. (core affinity = 2)
Hello from rank 0, thread 3, on nid00056. (core affinity = 3)
Hello from rank 0, thread 4, on nid00056. (core affinity = 4)
Hello from rank 0, thread 5, on nid00056. (core affinity = 5)
Hello from rank 1, thread 0, on nid00056. (core affinity = 6)
Hello from rank 1, thread 1, on nid00056. (core affinity = 7)
Hello from rank 1, thread 2, on nid00056. (core affinity = 8)
Hello from rank 1, thread 3, on nid00056. (core affinity = 9)
Hello from rank 1, thread 4, on nid00056. (core affinity = 10)
Hello from rank 1, thread 5, on nid00056. (core affinity = 11)
Hello from rank 2, thread 0, on nid00056. (core affinity = 12)
Hello from rank 2, thread 1, on nid00056. (core affinity = 13)
Hello from rank 2, thread 2, on nid00056. (core affinity = 14)
Hello from rank 2, thread 3, on nid00056. (core affinity = 15)
Hello from rank 2, thread 4, on nid00056. (core affinity = 16)
Hello from rank 2, thread 5, on nid00056. (core affinity = 17)
Hello from rank 3, thread 0, on nid00056. (core affinity = 18)
Hello from rank 3, thread 1, on nid00056. (core affinity = 19)
Hello from rank 3, thread 2, on nid00056. (core affinity = 20)
Hello from rank 3, thread 3, on nid00056. (core affinity = 21)
Hello from rank 3, thread 4, on nid00056. (core affinity = 22)
Hello from rank 3, thread 5, on nid00056. (core affinity = 23)
The aprun command created four instances of xthi, each of which spawned five additional threads. All PEs are running on separate cores, and each instance is confined to a NUMA node domain on one compute node.