As soon as you start testnodes.pl -gpu, a test is launched to check all nodes for the presence of accelerator cards (GPUs). If any GPUs are detected and they respond to communication, the node is marked by appending /g<number of GPUs> to the node name in the nodes window. In the example below, each node has three detected and responsive GPUs.

4. Compare the number of GPUs indicated in the nodes monitoring window to the actual number of GPUs for each node. Any discrepancies indicate a problem with GPUs on that node.

5. Deselect any nodes that do not have GPUs.

6. Select Verify and use the generated report for the following checklist.

Make sure all GPUs are listed for each node.

Verify the Model numbers.

Verify the Video BIOS.

The Link Speed can be reported as 2.5, 5, or UNKNOWN. A report of 5 or UNKNOWN indicates the GPU is running at Gen2 speed and is acceptable. A value of 2.5 might indicate the GPU is not properly configured. However, this test is timing sensitive, so retest any nodes reporting 2.5. If the test consistently reports 2.5, re-seat the GPU and repeat the test. If all the GPUs report 2.5, there might be a BIOS setting error.
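
The GPU count comparison and the Link Speed checklist above can be sketched as a small script. This is an illustrative sketch only and is not part of Cluster Test. It assumes that node labels have been copied by hand from the nodes window in the form name/g<number>, and that Link Speed values have been transcribed from the Verify report as the strings 2.5, 5, or UNKNOWN. The function names, the expected-count dictionary, the GPU identifiers, and the rule that two or more readings of 2.5 count as "consistently" slow are all assumptions made for this example.

```python
import re

# Assumed label format, based on the "/g<number>" suffix described above.
NODE_LABEL = re.compile(r"^(?P<name>[^/]+)(?:/g(?P<gpus>\d+))?$")

ACCEPTABLE = {"5", "UNKNOWN"}  # reported values the checklist treats as Gen2
SLOW = "2.5"                   # value that calls for a retest


def detected_gpus(label):
    """Split a label such as 'n1/g3' into ('n1', 3).

    A label without a '/g' suffix is treated as zero detected GPUs.
    """
    match = NODE_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized node label: {label!r}")
    return match.group("name"), int(match.group("gpus") or 0)


def count_mismatches(labels, expected):
    """Return {node: (detected, expected)} for nodes whose counts differ.

    'expected' is a hypothetical dict of node name -> installed GPU count.
    """
    mismatches = {}
    for label in labels:
        name, found = detected_gpus(label)
        want = expected.get(name, 0)
        if found != want:
            mismatches[name] = (found, want)
    return mismatches


def link_speed_advice(readings):
    """Suggest an action per GPU from repeated Link Speed readings.

    'readings' maps a GPU identifier to its list of reported values,
    oldest first. Returns (advice, bios_suspect).
    """
    advice = {}
    for gpu, values in readings.items():
        if not values:
            raise ValueError(f"no readings for {gpu!r}")
        for value in values:
            if value != SLOW and value not in ACCEPTABLE:
                raise ValueError(f"unexpected Link Speed: {value!r}")
        if values[-1] in ACCEPTABLE:
            advice[gpu] = "ok"
        elif len(values) >= 2 and all(v == SLOW for v in values):
            advice[gpu] = "re-seat"   # consistently 2.5 (assumed: 2+ runs)
        else:
            advice[gpu] = "retest"    # the test is timing sensitive
    # Every GPU currently at 2.5 hints at a BIOS setting error.
    bios_suspect = bool(readings) and all(
        values[-1] == SLOW for values in readings.values()
    )
    return advice, bios_suspect
```

For example, count_mismatches(["n1/g3", "n2/g2"], {"n1": 3, "n2": 3}) flags only n2, and a GPU whose readings are ["2.5", "2.5"] is marked for re-seating while a single reading of 2.5 only calls for a retest.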

28 Cluster Test procedure as recommended by HP

HP Cluster Test Software manual Cluster Test procedure as recommended by HP