event to inform system administrators that traffic may have to be rerouted.
Afterwards, you can use a network_up notification event to inform system
administrators that traffic can again be serviced through the restored network.
2.6.1.3 Predictive Event Error Correction
You can specify a command that attempts to recover from an event script
failure. If the recovery command succeeds and the retry count for the event
script is greater than zero, the event script is rerun. You can also specify the
number of times to attempt to execute the recovery command.
For example, a recovery command can retry unmounting a file system after
logging a user off and making sure that no one is still accessing the file
system.
If you identify a condition that affects the processing of a given event on a
cluster, such as a timing issue, you can insert a recovery command with a
retry count high enough to cover the problem.
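The retry behavior described above can be modeled with a short shell sketch. This is an illustration only, not the cluster manager's actual code: the function name run_with_recovery and its arguments are hypothetical, and the two counters mentioned in the text (event script retries and recovery command attempts) are simplified here into a single retry count.

```shell
#!/bin/sh
# Hypothetical model of the recovery/retry logic; not HACMP code.
# Usage: run_with_recovery EVENT_CMD RECOVERY_CMD RETRY_COUNT
# Assumption: one counter stands in for both retry settings.
run_with_recovery() {
    event_cmd=$1
    recovery_cmd=$2
    retries=$3

    # First attempt of the event script.
    if $event_cmd; then
        return 0
    fi

    # The event script failed. While retries remain, run the recovery
    # command and, only if it succeeds, rerun the event script.
    while [ "$retries" -gt 0 ]; do
        if ! $recovery_cmd; then
            return 1    # recovery itself failed: give up
        fi
        retries=$((retries - 1))
        if $event_cmd; then
            return 0
        fi
    done

    return 1            # retries exhausted
}
```

In this model, a retry count of zero means the recovery command is never run, and a timing problem that clears after a few attempts is covered only if the retry count is at least that large.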
2.6.2 Error Notification
The AIX Error Notification facility detects errors that are logged to the AIX
error log, such as network and disk adapter failures, and triggers a predefined
response to the failure. It can even act on application failures, as long as they
are logged in the error log.
To implement error notification, you have to add an object to the Error
Notification object class in the ODM. This object clearly identifies what sort of
errors you are going to react to, and how.
By specifying the following in a file:
errnotify:
        en_name = "Failuresample"
        en_persistenceflg = 0
        en_class = "H"
        en_type = "PERM"
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s 'Disk Error' root"
and adding this to the errnotify class through the odmadd <filename>
command, the specified en_method is executed every time the error notification
daemon finds a matching entry in the error log. In the example above, the
root user receives e-mail containing the detailed error report for that entry.
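As a minimal sketch of the procedure above, the following script writes the stanza to a file and, only on a system where the ODM commands are present, loads and displays the new object. The file path /tmp/failuresample.add and the function name write_errnotify_stanza are assumptions made for illustration; odmadd and odmget are the standard AIX ODM commands. The here-document is quoted so that $1 reaches the ODM unexpanded.

```shell
#!/bin/sh
# Hypothetical helper: write the Failuresample stanza to the given file.
write_errnotify_stanza() {
    # Quoted delimiter ('EOF') keeps $1 literal in the stanza.
    cat > "$1" <<'EOF'
errnotify:
        en_name = "Failuresample"
        en_persistenceflg = 0
        en_class = "H"
        en_type = "PERM"
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s 'Disk Error' root"
EOF
}

# Assumed location for the stanza file; any writable path will do.
stanza_file=/tmp/failuresample.add
write_errnotify_stanza "$stanza_file"

# Load the object only where the ODM tools exist (that is, on AIX).
if command -v odmadd >/dev/null 2>&1; then
    odmadd "$stanza_file"
    # Show the object that was just added.
    odmget -q "en_name=Failuresample" errnotify
fi
```

Running odmadd more than once adds duplicate objects, so an existing object can first be removed with odmdelete -o errnotify -q "en_name=Failuresample".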