Select plotting options, then click Generate Plot.

Figure 43 ColPlot results

5.5.7 Monitoring GPUs and coprocessors

5.5.7.1 Monitoring NVIDIA GPUs

If your client nodes contain NVIDIA GPUs and are running version 270.xx.xx or newer of the NVIDIA GPU driver, you can monitor your GPUs with HP Insight CMU.

If you haven’t done so already, install NVIDIA GPU driver version 270.xx.xx or newer on your client nodes. This can be done in two ways:

1. Install the NVIDIA GPU driver manually on one of the client nodes, back up the client image, and clone the remaining clients with this new image.

2. Use the script /opt/cmu/contrib/install_nvidia.pl to install the NVIDIA GPU driver on all running clients. For details, see the file /opt/cmu/contrib/install_nvidia.README.
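After either installation method, it can be useful to confirm that a client meets the 270.xx.xx minimum before configuring monitoring. The following is a minimal sketch, not part of HP Insight CMU: the function names extract_driver_version and driver_is_supported are hypothetical, and the sketch assumes the loaded driver reports its version on the "NVRM version" line of /proc/driver/nvidia/version.

```shell
#!/bin/sh
# Hypothetical helper: read "NVRM version" text on stdin and print the
# dotted driver version found on that line (for example 270.41.19).
extract_driver_version() {
    sed -n '/^NVRM version/s/.*[[:space:]]\([0-9][0-9]*\.[0-9][0-9.]*\)[[:space:]].*/\1/p'
}

# Hypothetical helper: succeed if the major component of the version in $1
# is at least the minimum in $2 (default 270, the minimum stated above).
driver_is_supported() {
    version=$1
    minimum=${2:-270}
    major=${version%%.*}
    case $major in
        ''|*[!0-9]*) return 2 ;;   # not a numeric version string
    esac
    [ "$major" -ge "$minimum" ]
}

# Assumption: the driver exposes its version in this /proc file when loaded.
if [ -r /proc/driver/nvidia/version ]; then
    installed=$(extract_driver_version < /proc/driver/nvidia/version)
    if driver_is_supported "$installed"; then
        echo "NVIDIA driver $installed meets the 270.xx.xx minimum"
    else
        echo "NVIDIA driver '$installed' is older than 270.xx.xx or unreadable"
    fi
else
    echo "No loaded NVIDIA driver found on this node"
fi
```

Run the check on a client node, not on the management node. Only the major version is compared, which is sufficient for a 270.xx.xx threshold.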

To enable GPU monitoring, update the /opt/cmu/etc/ActionAndAlertsFile.txt file with entries for HP Insight CMU GPU monitoring by running the script /opt/cmu/bin/cmu_config_nvidia. This script takes the number of GPUs on each client as its argument. The following example updates ActionAndAlertsFile.txt to monitor clients that have 3 GPUs each. Monitoring must be restarted for the updates to take effect.

# cmu_config_nvidia 3

CMU GPU monitoring enables driver persistence mode on all GPUs and requires all GPU-enabled clients to be running NVIDIA driver 270.xx.xx or newer. Continue only if an appropriate driver is installed on the clients and persistence mode is permissible.

Continue? [y/n] y

Configuring GPU monitoring in CMU...

GPU monitoring configured successfully.

Copy of orignial /opt/cmu/etc/ActionAndAlertsFile.txt can found in /opt/cmu/etc/ActionAndAlertsFile.txt_before_cmu_config_nvidia_config

Please restart CMU ('/etc/init.d/cmu restart') to enable these changes.

# /etc/init.d/cmu restart
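
If you are unsure how many GPUs each client has, the count can be read from a client before running cmu_config_nvidia. The following is a minimal sketch under stated assumptions: count_gpus is a hypothetical helper, the sketch assumes that nvidia-smi -L prints one line per GPU beginning with "GPU <index>:", and it assumes that every client has the same number of GPUs as the node on which it is run.

```shell
#!/bin/sh
# Hypothetical helper: count GPU entries in "nvidia-smi -L" style output
# read from stdin. Prints 0 when no GPU lines are present.
count_gpus() {
    awk '/^GPU [0-9]+:/ { n++ } END { print n + 0 }'
}

# Run on a client node. The script only prints the suggested command;
# it does not modify ActionAndAlertsFile.txt itself.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpus=$(nvidia-smi -L | count_gpus)
    echo "Detected $gpus GPU(s) on this node"
    echo "Suggested command: /opt/cmu/bin/cmu_config_nvidia $gpus"
else
    echo "nvidia-smi not found; install the NVIDIA GPU driver first"
fi
```

The suggested command is printed, not executed, so that you can confirm the count matches the other clients before changing the monitoring configuration.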
