• A Width of x16 is expected for Gen2 GPUs.
• The Bus ID can be used to identify the physical location of each GPU.
7. Select SGEMM to start the Single Precision General Matrix Multiply Test. All nodes should report a rate of 520 – 545 Gflop/s.
8. Select DGEMM to start the Double Precision General Matrix Multiply Test. All nodes should report a rate of 200 – 220 Gflop/s.
9. Select BandWidth to start that test. All GPUs should report
10. Run the Memory Test.
11. Run the Thermal Test. All GPU temperatures should remain below 81 °C.
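The expected values for the SGEMM, DGEMM, and Thermal tests can be checked mechanically once the per-node results have been collected. The following is a minimal sketch, assuming the measurements have already been copied into a dictionary by hand or by a separate script. The `results` layout, the node names, the sample numbers, and the function name `check_node` are hypothetical and are not part of Cluster Test. The bandwidth test is left out.

```python
# Expected values taken from the accelerator test steps above.
SGEMM_RANGE = (520.0, 545.0)   # Gflop/s, single precision
DGEMM_RANGE = (200.0, 220.0)   # Gflop/s, double precision
MAX_TEMP_C = 81.0              # GPU temperatures should remain below this


def check_node(sgemm, dgemm, max_temp_c):
    """Return a list of problems found for one node (empty if none)."""
    problems = []
    if not SGEMM_RANGE[0] <= sgemm <= SGEMM_RANGE[1]:
        problems.append(f"SGEMM {sgemm} Gflop/s is outside 520-545")
    if not DGEMM_RANGE[0] <= dgemm <= DGEMM_RANGE[1]:
        problems.append(f"DGEMM {dgemm} Gflop/s is outside 200-220")
    if max_temp_c >= MAX_TEMP_C:
        problems.append(f"temperature {max_temp_c} C is not below 81")
    return problems


# Hypothetical measurements, for illustration only.
results = {
    "node1": {"sgemm": 531.2, "dgemm": 210.4, "max_temp_c": 74.0},
    "node2": {"sgemm": 498.7, "dgemm": 209.9, "max_temp_c": 83.0},
}

for node, r in sorted(results.items()):
    problems = check_node(r["sgemm"], r["dgemm"], r["max_temp_c"])
    print(node, "OK" if not problems else "; ".join(problems))
```

The sketch flags any rate outside the stated range, including rates above the upper bound. Treat a flagged node as a prompt to review the test output, not as a definitive failure.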
The Cluster Test procedure
This section outlines the Cluster Test procedure recommended by HP.
If your cluster has accelerator cards (GPUs), first run the accelerator tests as described in “Accelerator test procedure” (page 27).
Each of these tests uses default values for its parameters. You may change the parameters as desired, but take care not to exceed the current system limits.
See “Sample test output” (page 59) for example output of these tests.
Refer to “Troubleshooting” (page 44) if any of the following tests fail.
This procedure assumes a diskless cluster of 77 nodes has been set up successfully.
NOTE: All commands and utilities in this procedure reside in the /opt/clustertest/bin directory. Cluster Test does not modify the PATH environment variable on the head node, so every command or utility referenced in this section must be invoked with its full path.
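If you drive these commands from a script, the path prefix can be applied in one place. The following is a minimal sketch, assuming a Python wrapper on the head node; the helper name `clustertest_cmd` is hypothetical and is not part of Cluster Test. It only builds the argument list and does not run anything.

```python
import posixpath

# Directory named in the NOTE above.
CLUSTERTEST_BIN = "/opt/clustertest/bin"


def clustertest_cmd(name, *args):
    """Build an argument list that starts with the full path to a command."""
    return [posixpath.join(CLUSTERTEST_BIN, name), *args]


# testnodes.pl is the script used later in this procedure.
print(clustertest_cmd("testnodes.pl"))
```

The returned list can be passed to `subprocess.run` when you are ready to start the command.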
1. Make sure the X server is running and the DISPLAY environment variable is set appropriately.
2. Make sure no one else is running tests on the nodes you will be testing.
3. Run testnodes.pl on the first head node. This will launch the Cluster Test interface.
A log directory will be created under /opt/clustertest/logs with a timestamp (year/month/day/time). The log, error, and script files are saved to this directory.
You can override the default test logging directory with the command testnodes.pl <my_log_dir>. This will create a directory my_log_dir in /opt/clustertest/logs and set the sequence to avoid any conflict with other testnodes.pl jobs already running.
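The launch described in this step can be prepared from a script. The following is a minimal sketch, assuming a Python wrapper on the head node; the function name `build_testnodes_command` and its `environ` parameter are hypothetical. It checks only that DISPLAY is set. It does not confirm that the X server is reachable or that no one else is running tests, so those checks remain manual.

```python
import os

# Full path is required because Cluster Test does not modify PATH.
TESTNODES = "/opt/clustertest/bin/testnodes.pl"


def build_testnodes_command(log_dir=None, environ=None):
    """Return the argument list for testnodes.pl, or raise if DISPLAY is unset."""
    env = os.environ if environ is None else environ
    if not env.get("DISPLAY"):
        raise RuntimeError("DISPLAY is not set; the Cluster Test interface needs an X server")
    cmd = [TESTNODES]
    if log_dir:
        # Optional directory name, created under /opt/clustertest/logs.
        cmd.append(log_dir)
    return cmd


# Example with an explicit environment, so the real DISPLAY is not consulted.
print(build_testnodes_command("my_log_dir", environ={"DISPLAY": ":0"}))
```

Pass the returned list to `subprocess.run` to start the interface. Omitting `log_dir` leaves the default timestamped log directory in effect.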