HP Cluster Test Software manual Cluster Test procedure


A Width of x16 is expected for Gen2 GPUs.

The Bus ID can be used to identify the physical location of each GPU.

7. Select SGEMM to start the Single Precision General Matrix Multiply Test. All nodes should report a rate of 520 – 545 Gflop/s.

8. Select DGEMM to start the Double Precision General Matrix Multiply Test. All nodes should report a rate of 200 – 220 Gflop/s.

9. Select BandWidth to start the GPU Bandwidth Test. All GPUs should report 5650 – 5750 MB/s. Values of about half the expected range might indicate that the GPU is running at Gen1 speed instead of Gen2 speed. This might be caused by a BIOS setting or might indicate a GPU hardware issue.

10. Run the Memory Test.

11. Run the Thermal Test. All GPU temperatures should remain below 81 °C.
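The expected values in steps 7 through 11 can be collected into a simple pass/fail check. The following is a minimal sketch and is not part of Cluster Test: the function name, the metric keys, and the idea of passing in values read by hand from the test output are all assumptions made for illustration.

```python
# Hypothetical helper, not part of Cluster Test: compare one GPU's reported
# results with the expected ranges listed in the accelerator test procedure.

# Expected ranges from the procedure, as inclusive (low, high) pairs.
EXPECTED = {
    "sgemm_gflops": (520.0, 545.0),     # SGEMM rate in Gflop/s
    "dgemm_gflops": (200.0, 220.0),     # DGEMM rate in Gflop/s
    "bandwidth_mbs": (5650.0, 5750.0),  # BandWidth result
}
MAX_TEMP_C = 81.0  # GPU temperatures should remain below this value


def check_gpu(results):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, (low, high) in EXPECTED.items():
        if name not in results:
            continue  # metric not supplied, nothing to check
        value = results[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")

    # About half the expected bandwidth may point to Gen1 instead of Gen2 speed.
    bandwidth = results.get("bandwidth_mbs")
    bw_low, bw_high = EXPECTED["bandwidth_mbs"]
    if bandwidth is not None and bw_low / 2 <= bandwidth <= bw_high / 2:
        problems.append("bandwidth is about half the expected range: possible Gen1 speed")

    temp = results.get("temp_c")
    if temp is not None and temp >= MAX_TEMP_C:
        problems.append(f"temp_c={temp} is not below {MAX_TEMP_C}")
    return problems


if __name__ == "__main__":
    sample = {"sgemm_gflops": 531.0, "dgemm_gflops": 212.0,
              "bandwidth_mbs": 2850.0, "temp_c": 74.0}
    for problem in check_gpu(sample):
        print(problem)
```

The sample values are invented for the demonstration. A reading flagged as possible Gen1 speed is only a hint: as step 9 notes, the cause might be a BIOS setting or a GPU hardware issue.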

The Cluster Test procedure

This section outlines the Cluster Test procedure recommended by HP.

If your cluster has accelerator cards (GPUs), first run the accelerator tests as described in “Accelerator test procedure” (page 27).

The test parameters for each of these tests are default values. You may change parameters as desired, but take care not to exceed the current system limits.

See “Sample test output” (page 59) for example output of these tests.

Refer to “Troubleshooting” (page 44) if any of the following tests fail.

This procedure assumes a diskless cluster of 77 nodes has been set up successfully.

NOTE: All commands and utilities in this procedure reside in the /opt/clustertest/bin directory. Cluster Test does not modify the PATH environment variable on the head node; therefore, any call to a command or utility referenced in this section must be preceded by the path to the command.
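If you drive these commands from a script, one way to follow this NOTE is to prepend the directory in a single place. The following is a minimal sketch; the helper name ct_command is hypothetical and is not provided by Cluster Test.

```python
import posixpath

# Directory named in the NOTE; Cluster Test does not add it to PATH.
CT_BIN = "/opt/clustertest/bin"


def ct_command(name, *args):
    """Return an argument list with the command's full path in front."""
    return [posixpath.join(CT_BIN, name), *args]


if __name__ == "__main__":
    print(ct_command("testnodes.pl"))
```

The returned list can be passed unchanged to subprocess.run, which avoids any dependence on PATH.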

1. Make sure the X server is running and the DISPLAY environment variable is set appropriately.

2. Make sure no one else is running tests on the nodes you will be testing.

3. Run testnodes.pl on the first head node. This will launch the Cluster Test interface.

A log directory will be created under /opt/clustertest/logs with a timestamp (year/month/day/time). The log, error, and script files are saved to this directory.

You can override the default test logging directory with the command testnodes.pl <my_log_dir>. This will create a directory my_log_dir in /opt/clustertest/logs and set the sequence to avoid any conflict with other testnodes.pl jobs already running.
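Steps 1 and 3 can be sketched as a small pre-flight check that confirms DISPLAY is set and then assembles the testnodes.pl command line, with the optional log directory argument. This is an illustration only: the function name build_launch_command is hypothetical, the check cannot confirm that an X server is actually reachable, and it does not cover step 2 (making sure no one else is running tests).

```python
import os

# Full path is required because /opt/clustertest/bin is not on PATH.
TESTNODES = "/opt/clustertest/bin/testnodes.pl"


def build_launch_command(environ, log_dir=None):
    """Return the argument list for testnodes.pl, or raise if DISPLAY is unset."""
    if not environ.get("DISPLAY"):
        raise RuntimeError("DISPLAY is not set; the Cluster Test interface needs an X display")
    argv = [TESTNODES]
    if log_dir:
        # Optional argument: overrides the default timestamped log directory.
        argv.append(log_dir)
    return argv


if __name__ == "__main__":
    try:
        print(" ".join(build_launch_command(os.environ, "my_log_dir")))
    except RuntimeError as err:
        print(f"Not ready: {err}")
```

The sketch only prints the command. To launch the interface from the script, pass the returned list to subprocess.run on the head node.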
