7.2 Running Performance Health Tests

You can run the ovp command to generate reports on the performance health of the nodes. Use the following format to run a specific performance health test:

ovp [options] [-verify=perf_health/test]

Where:

options   Specifies additional command-line options for the test. The ovp --help perf_health command lists the command-line options for each test.

          The following options apply to all the tests:

          NOTE: Use the --opts= option to pass these options.

          • The --nnodes=n option runs the test on n compute nodes.
          • The --nodelist=nodelist option specifies the compute nodes to use.

            NOTE: The --nodelist=nodelist option is particularly useful for determining problematic nodes. If you use this option and the --nnodes=n option, the --nnodes=n option is ignored.

          • The --queue LSF_queue option specifies the LSF queue for the performance health tests.

test      Indicates the test to perform. The following tests are available:

          cpu                      Tests CPU core performance using the Linpack benchmark.

          cpu_usage                Tests CPU core usage. All CPU cores should be idle during the test. This test reports a node if it is using more than 10 percent (by default) of its CPU cores. The head node is excluded from this test.

          memory                   Uses the streams benchmark to test memory performance.

          memory_usage             Tests memory usage. This test reports a node that uses more than 25 percent (by default) of its memory.

          network_stress           Tests network performance under stress using the Pallas benchmark's Alltoall, Allgather, and Allreduce tests. For the most accurate results, run these tests on a large number of nodes. The default number of nodes is 4, which is also the minimum value that should be used. The --all_group option lets you select the node grouping size.

          network_bidirectional    Tests network performance between pairs of nodes using the Pallas benchmark's Exchange test.

          network_unidirectional   Tests network performance between pairs of nodes using the HP MPI ping_pong_ring test.
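To make the option rules concrete, the following Python sketch assembles an ovp command line in the documented format (options first, then -verify=perf_health/test). It is a hypothetical helper, not part of ovp: the name build_ovp_command is invented for illustration, and it assumes each common option is passed in its own --opts= argument (check ovp --help perf_health for the exact form your release accepts). It mirrors two documented rules: --nodelist overrides --nnodes, and network_stress should not use fewer than 4 nodes.

```python
import shlex

# Test names as documented for the perf_health suite.
PERF_HEALTH_TESTS = {
    "cpu", "cpu_usage", "memory", "memory_usage",
    "network_stress", "network_bidirectional", "network_unidirectional",
}


def build_ovp_command(test, nnodes=None, nodelist=None, queue=None):
    """Return an argv list for one ovp performance health test.

    Hypothetical helper. ASSUMPTION: each common option is wrapped in its
    own --opts= argument; verify with `ovp --help perf_health`.
    """
    if test not in PERF_HEALTH_TESTS:
        raise ValueError(f"unknown perf_health test: {test}")
    if test == "network_stress" and nnodes is not None and nnodes < 4:
        raise ValueError("network_stress should use at least 4 nodes")

    passthrough = []
    if nodelist is not None:
        # ovp ignores --nnodes when --nodelist is also given, so this
        # sketch leaves --nnodes out entirely in that case.
        passthrough.append(f"--nodelist={nodelist}")
    elif nnodes is not None:
        passthrough.append(f"--nnodes={nnodes}")
    if queue is not None:
        # Documented as "--queue LSF_queue" (space-separated).
        passthrough.append(f"--queue {queue}")

    argv = ["ovp"]
    argv += [f"--opts={opt}" for opt in passthrough]
    argv.append(f"-verify=perf_health/{test}")
    return argv


if __name__ == "__main__":
    cmd = build_ovp_command("cpu_usage", nnodes=8, nodelist="n[1-4]")
    # shlex.join quotes arguments such as the bracketed node list for a shell.
    print(shlex.join(cmd))
```

The sketch only builds and prints the command; it does not run ovp. Omitting --nnodes when a node list is supplied simply makes the documented precedence explicit rather than relying on ovp to ignore it.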

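The cpu_usage and memory_usage tests report nodes whose usage exceeds a default threshold (10 percent and 25 percent, respectively), and cpu_usage skips the head node. The Python sketch below restates that reporting rule for a set of hypothetical per-node readings. The function name, the input dictionary, and the node names are assumptions for illustration; this is not how ovp collects or evaluates usage internally.

```python
# Default reporting thresholds, in percent, as documented for each test.
DEFAULT_THRESHOLDS = {"cpu_usage": 10.0, "memory_usage": 25.0}


def nodes_to_report(usage_by_node, test, head_node=None, threshold=None):
    """Return the sorted names of nodes whose usage exceeds the threshold.

    usage_by_node maps a node name to its measured usage in percent
    (hypothetical input; ovp gathers these readings itself).
    """
    limit = DEFAULT_THRESHOLDS[test] if threshold is None else threshold
    reported = []
    for node, percent in sorted(usage_by_node.items()):
        # The head node is excluded from the cpu_usage test.
        if test == "cpu_usage" and node == head_node:
            continue
        # "More than" the threshold: a node exactly at the limit is not reported.
        if percent > limit:
            reported.append(node)
    return reported


if __name__ == "__main__":
    readings = {"n1": 2.5, "n2": 37.0, "n3": 10.0, "n16": 80.0}
    print(nodes_to_report(readings, "cpu_usage", head_node="n16"))
```

Passing threshold overrides the default, which stands in for the "(by default)" wording above; the actual ovp option for changing a threshold, if any, is listed by ovp --help perf_health.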