#
ALERTS
#
#cpu_freq_alert "CPU frequency is not nominal" 1 24 100 < % sh -c "b=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`;a=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq`;echo 100 \* \$b / \$a | bc"
login_alert "Someone is connected" 3 24 0 > login(s) w -h | wc -l
root_fs_used "The / filesystem is above 90% full" 4 24 90 > % df / | awk '{ if ($6=="/") print $5}' | cut -f 1 -d % -
#reboot_alert "Node rebooted" 4 24 5 < rebooted awk '{printf "%.1f\n",$1/60}' /proc/uptime
# The line below reports MCE errors; beware of possible false positives
#mce_alert "The kernel has logged MCE errors; please check /var/log/mcelog" 5 60 1 > lines wc -l /var/log/mcelog | cut -f 1 -d ' '
#
ALERT_REACTIONS
#
#login_alert "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'w -h'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised." root
#
#root_fs_used "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'df /'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised!" root
#
#reboot_alert "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'uptime'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised." root
#
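
The reaction commands in the listing contain the tokens CMU_ALERT_NAME and CMU_ALERT_NODES. Assuming these are placeholders that HP Insight CMU replaces with the alert name and the affected nodes before it runs the command (the listing suggests this but does not state it), the sketch below previews the first line of the resulting mail body. The function name preview_reaction and the node names are hypothetical.

```shell
# preview_reaction NAME NODES
# Expands the two tokens in a copy of the mail text used by the reactions.
# Assumption: CMU performs an equivalent substitution itself.
preview_reaction() {
    template="Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES."
    printf '%s\n' "$template" |
        sed -e "s/CMU_ALERT_NAME/$1/g" -e "s/CMU_ALERT_NODES/$2/g"
}

# Hypothetical alert name and node list, for illustration only.
preview_reaction login_alert node1,node2
```

The preview only covers the text part of the message; the Details section of the real reactions comes from the pdsh command and needs reachable nodes.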

Lines that begin with # are ignored. A line cannot begin with white space. Each line defines a sensor, an alert, or an alert reaction. Sensors are listed at the beginning of the file, between the ACTIONS and ALERTS tags. Alerts are listed in the middle of the file, between the ALERTS and ALERT_REACTIONS tags. Alert reactions are listed at the end of the file, below the ALERT_REACTIONS tag.
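
The layout rules above can be checked with a short awk filter. This is a minimal sketch, not part of HP Insight CMU: the function name list_section is hypothetical, and it assumes each tag stands alone on its line with no trailing white space.

```shell
# list_section FILE TAG
# Prints the active (uncommented) entries that follow TAG, where TAG is
# ACTIONS, ALERTS, or ALERT_REACTIONS, up to the next tag or end of file.
list_section() {
    awk -v want="$2" '
        /^#/ || /^[ \t]*$/ { next }       # ignore comment lines and blank lines
        $0 == "ACTIONS" || $0 == "ALERTS" || $0 == "ALERT_REACTIONS" {
            current = $0                  # a tag line opens a new section
            next
        }
        current == want { print }         # keep entries of the requested section
    ' "$1"
}
```

Calling list_section with the path of the configuration file and the tag ALERTS shows which alerts are currently enabled, because commented entries are skipped.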

Most sensors have both a “native” line and a commented-out “collectl” line. To collect monitoring data with collectl, uncomment the corresponding sensor line.

NOTE: Using collectl requires additional steps described in “Using collectl for gathering monitoring data” (page 81).
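
Each alert entry in the listing ends with an ordinary shell pipeline, so the pipeline can be tried by hand before the alert is enabled. The sketch below reuses the root_fs_used pipeline on sample text instead of live df output. It assumes that the number before the comparison operator is the threshold, which matches the description "above 90% full" paired with "90 >". The helper alert_raised and the sample df output are made up for illustration.

```shell
# Extract the use percentage of "/" exactly as the root_fs_used entry does,
# but read df-style text from standard input so sample data can be used.
root_fs_percent() {
    awk '{ if ($6=="/") print $5}' | cut -f 1 -d % -
}

# alert_raised VALUE OPERATOR THRESHOLD (hypothetical helper)
# Succeeds when VALUE OPERATOR THRESHOLD holds; awk compares numerically,
# so decimal values such as the reboot_alert output also work.
alert_raised() {
    awk -v v="$1" -v op="$2" -v t="$3" 'BEGIN {
        if (op == ">") exit !(v > t)
        if (op == "<") exit !(v < t)
        exit 2                            # unknown operator
    }'
}

# Made-up df output for illustration.
sample='Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       51475068 47357062   1480182  97% /'

value=$(printf '%s\n' "$sample" | root_fs_percent)
if alert_raised "$value" '>' 90; then
    echo "root_fs_used would be raised (value: ${value}%)"
fi
```

On a real node, piping the output of df / into root_fs_percent gives the value that the alert would compare against its threshold.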

5.5.2 Actions

Each action contains the following fields:

Name

The name of the sensor as it appears in the Java GUI. It must consist of letters only.

Description

A quoted string that briefly describes the sensor. It appears in the GUI.

Time multiple

An integer that determines how often the sensor is monitored, expressed as a multiple of the monitoring timer. For example, if the monitoring timer is 5 seconds:

A time multiple of 1 means the value is monitored every 5 seconds.

A time multiple of 2 means the value is monitored every 10 seconds.

Data type

This can be numerical or string. The interface cannot display a string sensor in the pies.

Measurement method

This can be either Instantaneous or MeanOverTime.

Instantaneous returns the sensor value immediately.
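
The time multiple described above is a plain multiplication of the monitoring timer. A minimal sketch of that arithmetic follows; the function name interval_seconds is hypothetical, and the 5-second timer is the value used in the example above.

```shell
# interval_seconds TIMER MULTIPLE
# Prints the number of seconds between two readings of a sensor.
interval_seconds() {
    echo $(( $1 * $2 ))
}

interval_seconds 5 1    # prints 5
interval_seconds 5 2    # prints 10
```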
