HP Cluster Test Software manual Cluster Test procedure as recommended by HP

As soon as you start testnodes.pl -gpu, a test is launched to check all nodes for the presence of accelerator cards (GPUs). If any GPUs are detected and they respond to communication, the node is marked by adding /g<number of GPUs> to the node name in the nodes window. In the example below, each node has three detected and responsive GPUs.

4. Compare the number of GPUs indicated in the nodes monitoring window to the actual number of GPUs for each node. Any discrepancies indicate a problem with GPUs on that node.

5. Deselect any nodes that do not have GPUs.

6. Select Verify and use the generated report for the following checklist.

Make sure all GPUs are listed for each node.

Verify the Model numbers.

Verify the Video BIOS.

The Link Speed is reported as 2.5, 5, or UNKNOWN. A report of 5 or UNKNOWN indicates the GPU is running at Gen2 speed and is acceptable. A value of 2.5 might indicate the GPU is not properly configured. However, this test is timing sensitive, so retest any nodes reporting 2.5. If the test consistently reports 2.5, re-seat the GPU and repeat the test. If all the GPUs report 2.5, there might be a BIOS setting error.
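
The checks in steps 4 through 6 can be sketched as a small Python helper. This is an illustration only, not part of Cluster Test: the function names, the node names, and the idea of collecting labels and Link Speed readings into Python lists are all assumptions. The only details taken from the procedure are the /g<number> suffix on node names and the meaning of the 2.5, 5, and UNKNOWN values.

```python
import re

# Assumed label format: node name followed by "/g<count>", as shown in the
# nodes monitoring window. All node and GPU names below are made up.
_GPU_SUFFIX = re.compile(r"^(?P<name>.+?)/g(?P<count>\d+)$")

# Link Speed values the procedure treats as acceptable (Gen2 speed).
ACCEPTABLE = {"5", "UNKNOWN"}


def parse_node_label(label):
    """Split a nodes-window label into (node name, detected GPU count)."""
    text = label.strip()
    match = _GPU_SUFFIX.match(text)
    if match is None:
        # No /g suffix: no responsive GPUs were detected on this node.
        return text, 0
    return match.group("name"), int(match.group("count"))


def find_gpu_discrepancies(labels, expected):
    """Return {node: (detected, expected)} for nodes whose counts differ."""
    problems = {}
    for label in labels:
        name, detected = parse_node_label(label)
        want = expected.get(name, 0)
        if detected != want:
            problems[name] = (detected, want)
    return problems


def triage_link_speed(readings):
    """Suggest an action for one GPU from its Link Speed readings.

    Assumption: because the test is timing sensitive, a GPU that reports
    an acceptable value on any run is treated as ok.
    """
    values = [str(r).strip().upper() for r in readings]
    if not values:
        raise ValueError("at least one reading is required")
    if any(v in ACCEPTABLE for v in values):
        return "ok"
    if len(values) == 1:
        return "retest"   # a single 2.5 may be a timing artifact
    return "reseat"       # consistently 2.5: re-seat the GPU and repeat


def suspect_bios_setting(readings_by_gpu):
    """True when no GPU has ever reported an acceptable Link Speed."""
    return bool(readings_by_gpu) and all(
        triage_link_speed(r) != "ok" for r in readings_by_gpu.values()
    )
```

For example, `find_gpu_discrepancies(["node1/g3", "node2/g2"], {"node1": 3, "node2": 3})` flags only `node2`, and `triage_link_speed(["2.5", "2.5"])` suggests re-seating the GPU.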
