
Software Reference

Fault Detection and Recovery

EPSR uses two methods to detect and recover from outages in either a node or a link within the ring. These methods are:

Master node polling fault detection

Transit node unsolicited fault detection

Master Node Polling Fault Detection

The master node issues healthcheck messages from its primary port to check the condition of the EPSR ring. These messages are sent at regular intervals, controlled by the hellotime parameter of the create epsr command on page 1-17. Each time a healthcheck message leaves the master node’s primary port, a failover timer is started; its timeout value is set by the failover parameter of the create epsr command on page 1-17. If the failover timer expires before the master node’s secondary port receives the transmitted healthcheck message, the master node assumes that there is a fault in the ring and begins its fault recovery procedures. Because this method relies on a timer expiring, it is inherently slower than the transit node unsolicited fault detection method described next.
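The polling logic above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the device firmware: the class name MasterNodePoller, its methods, and the abstract time values are hypothetical, and only the hellotime and failover names come from the create epsr command. For simplicity it tracks the failover timer of the most recent healthcheck message only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterNodePoller:
    """Hypothetical model of master node polling; not an Allied Telesis API."""
    hellotime: float                    # interval between healthcheck messages
    failover: float                     # how long to wait for a healthcheck to return
    deadline: Optional[float] = None    # when the running failover timer expires
    fault_detected: bool = False

    def send_healthcheck(self, now: float) -> float:
        """Send from the primary port, start the failover timer, return the next send time."""
        self.deadline = now + self.failover
        return now + self.hellotime

    def healthcheck_received(self, now: float) -> None:
        """Healthcheck arrived on the secondary port: cancel the timer if it is still running."""
        if self.deadline is not None and now < self.deadline:
            self.deadline = None

    def tick(self, now: float) -> bool:
        """Return True once the failover timer has expired, i.e. a ring fault is assumed."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.fault_detected = True  # fault recovery procedures would start here
        return self.fault_detected
```

In this model, a healthcheck that returns before the deadline leaves fault_detected as False, while one that never returns causes tick to report a fault once the failover interval has elapsed. This is why the method can be no faster than the failover timeout.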

Transit Node Unsolicited Fault Detection

This method relies on each transit node detecting a fault at its own interface and immediately notifying the master node that the ring has been broken. When a transit node detects a loss of connectivity, it immediately sends a “links down” message over its remaining good link. Because a link connects two nodes, the nodes at both ends of the broken link send a “links down” message back to the master node. Each of these nodes also changes its state from “links up” to “links down,” and changes the state of the port connected to the broken link from “forwarding” to “blocking.”
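The transit node behaviour described above can be sketched as a simple state change. This is an illustrative model only: the class TransitNode, the port names port_a and port_b, and the sent list standing in for transmitted messages are all assumptions; the state and message strings are those used in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TransitNode:
    """Hypothetical model of a transit node with two ring ports."""
    name: str
    node_state: str = "links up"
    port_state: Dict[str, str] = field(
        default_factory=lambda: {"port_a": "forwarding", "port_b": "forwarding"}
    )
    # (port, message) pairs standing in for messages sent towards the master node
    sent: List[Tuple[str, str]] = field(default_factory=list)

    def link_lost(self, port: str) -> None:
        """React to a connectivity loss detected on one ring port."""
        # Notify the master node immediately over the remaining good link.
        good_port = next(p for p in self.port_state if p != port)
        self.sent.append((good_port, "links down"))
        # Change the node state and block the port facing the broken link.
        self.node_state = "links down"
        self.port_state[port] = "blocking"
```

Calling link_lost on the nodes at both ends of a broken link shows each one sending its own “links down” message and blocking the port that faces the break. No timer is involved, which is why this method reacts faster than master node polling.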

Allied Telesis AT-8900, RAPIER I, X900-48FE, AT-8700XL Fault Detection and Recovery, Master Node Polling Fault Detection