Cisco ASR 9000 Series Aggregation Services Router Overview and Reference Guide
OL-17501-09
Chapter 3 High Availability and Redundant Operation
forwarding plane. Cisco IOS XR Software uses graceful restart and a combination of checkpointing,
mirroring, route switch processor redundancy, and other system resiliency features to recover before a
timeout and avoid service downtime as a result of network reconvergence.
Process Restartability
The Cisco IOS XR distributed and modular microkernel operating system enables process independence,
restartability, and maintenance of memory and operational states. Each process runs in a protected
address space. Checkpointing facilities, reliable transports, and retransmission features enable processes
to be restarted without impacting other components and with minimal or no disruption of traffic. In most
cases, when a process fails, crashes, or incurs a fault, it restarts automatically. For example, if a Border
Gateway Protocol (BGP) or Quality of Service (QoS) process incurs a fault, it restarts and resumes
normal operation without impacting other processes.
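The following Python sketch illustrates the general idea of checkpoint-based restart under simplifying assumptions: a process saves its operational state to a store that outlives it, and a supervisor replaces only the process that faulted, which then reloads its last checkpoint. The names CheckpointStore, Process, and Supervisor are hypothetical, and this is a conceptual model, not how Cisco IOS XR implements process restart.

```python
# Conceptual sketch of checkpoint-based process restart.
# Hypothetical names only: CheckpointStore, Process, and Supervisor are not
# Cisco IOS XR APIs.
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class CheckpointStore:
    """Keeps each process's last saved state outside the process itself."""
    saved: Dict[str, dict] = field(default_factory=dict)

    def save(self, name: str, state: dict) -> None:
        self.saved[name] = dict(state)  # store a copy, not a live reference

    def load(self, name: str) -> dict:
        return dict(self.saved.get(name, {}))  # empty state on first start


class Process:
    """A toy process that checkpoints its state after every update."""

    def __init__(self, name: str, store: CheckpointStore) -> None:
        self.name = name
        self.store = store
        self.state = store.load(name)  # restore operational state on (re)start

    def handle(self, event: str) -> None:
        if event == "fault":
            raise RuntimeError(f"{self.name} incurred a fault")
        self.state[event] = self.state.get(event, 0) + 1
        self.store.save(self.name, self.state)  # checkpoint after each update


class Supervisor:
    """Restarts only the process that faulted; other processes keep running."""

    def __init__(self, names: Iterable[str], store: CheckpointStore) -> None:
        self.store = store
        self.procs = {n: Process(n, store) for n in names}
        self.restart_count = {n: 0 for n in self.procs}

    def dispatch(self, name: str, event: str) -> None:
        try:
            self.procs[name].handle(event)
        except RuntimeError:
            # Replace the failed instance; the new one reloads its checkpoint.
            self.procs[name] = Process(name, self.store)
            self.restart_count[name] += 1


if __name__ == "__main__":
    sup = Supervisor(["bgp", "qos"], CheckpointStore())
    sup.dispatch("bgp", "update")
    sup.dispatch("qos", "policy")
    sup.dispatch("bgp", "fault")   # bgp restarts from its last checkpoint
    print(sup.procs["bgp"].state)  # {'update': 1}
    print(sup.restart_count)       # {'bgp': 1, 'qos': 0}
```

Keeping the checkpoint store separate from the process is what lets a restarted instance resume where it left off; in this model the qos process is never touched when bgp faults.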
Fault Detection and Management
To minimize service outage, the Cisco ASR 9000 Series Routers provide rapid and efficient response to
single or multiple system component or network failures. When local fault handling cannot recover from
critical faults, the system provides robust fault detection, correction, failover, and event management
capabilities.
• Fault detection and correction—In hardware, the Cisco ASR 9000 Series Routers offer error
correcting code (ECC)-protected memory. If memory corruption occurs, the system automatically
restarts the affected processes to correct the problem with minimal impact. If the problem persists,
the Cisco ASR 9000 supports switchover and online insertion and removal (OIR) capabilities to
allow replacement of defective hardware without impacting services on other hardware components
in the system.
• Resource management—Cisco ASR 9000 Series Routers support resource threshold monitoring for
CPU and memory utilization to improve out-of-resource (OOR) management. When threshold
conditions are met or exceeded, the system generates an OOR alarm to notify operators. The system
then attempts automatic recovery, and operators can configure flexible policies using the
Embedded Event Manager (EEM).
• Online diagnostics—Cisco ASR 9000 Series Routers provide built-in online diagnostics to monitor
functions such as network path failure detection, packet diversion failure detection, and faulty fabric
link detection. The tests are configurable through the CLI.
• Event management—Cisco ASR 9000 Series Routers offer mechanisms such as fault-injection
testing to detect hardware faults during lab testing, a system watchdog mechanism to recover failed
processes, and tools such as the Route Consistency Checker to diagnose inconsistencies between the
routing and forwarding tables.
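To make the threshold behavior described under Resource management concrete, the following Python sketch raises an alarm for each resource whose utilization meets or exceeds its threshold, then passes each alarm to an operator-supplied policy function. The resource names, threshold percentages, OorAlarm, check_thresholds, and handle_alarms are hypothetical; the policy callback only stands in for the idea of operator-defined responses and is not Embedded Event Manager syntax.

```python
# Conceptual sketch of threshold-based out-of-resource (OOR) alarming.
# Assumptions: resource names, threshold values, and the policy callback are
# hypothetical, not Cisco IOS XR defaults or EEM policy syntax.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OorAlarm:
    resource: str
    utilization: float  # percent
    threshold: float    # percent


def check_thresholds(
    utilization: Dict[str, float], thresholds: Dict[str, float]
) -> List[OorAlarm]:
    """Return an alarm for every resource whose utilization meets or exceeds its threshold."""
    alarms = []
    for resource, limit in thresholds.items():
        used = utilization.get(resource)
        if used is not None and used >= limit:
            alarms.append(OorAlarm(resource, used, limit))
    return alarms


def handle_alarms(alarms: List[OorAlarm], policy: Callable[[OorAlarm], str]) -> List[str]:
    """Apply an operator-supplied policy to each alarm and collect the resulting actions."""
    return [policy(alarm) for alarm in alarms]


if __name__ == "__main__":
    thresholds = {"cpu": 90.0, "memory": 85.0}  # hypothetical limits
    sample = {"cpu": 72.5, "memory": 85.0}      # memory exactly meets its limit
    alarms = check_thresholds(sample, thresholds)
    actions = handle_alarms(alarms, lambda a: f"notify operator: {a.resource} at {a.utilization}%")
    print(actions)  # ['notify operator: memory at 85.0%']
```

Using `>=` rather than `>` matches the "met or exceeded" condition in the text, so a reading exactly at the threshold already triggers an alarm.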
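The idea behind checking the routing and forwarding tables for inconsistencies can be illustrated with a simple comparison. The Python sketch below models each table as a hypothetical mapping of prefix to next hop and reports prefixes missing from the forwarding table, stale entries present only in the forwarding table, and next-hop mismatches. The function name compare_tables and the table model are assumptions; this is a conceptual example, not the algorithm the Route Consistency Checker uses.

```python
# Conceptual sketch of a routing-table vs. forwarding-table consistency check.
# Assumption: each table is a plain dict of prefix -> next hop. This is not
# the Cisco Route Consistency Checker's actual algorithm.
from typing import Dict, List, Tuple


def compare_tables(
    rib: Dict[str, str], fib: Dict[str, str]
) -> Dict[str, List[Tuple[str, ...]]]:
    """Report prefixes missing from the FIB, stale in the FIB, or with differing next hops."""
    report: Dict[str, List[Tuple[str, ...]]] = {
        "missing_in_fib": [],
        "extra_in_fib": [],
        "next_hop_mismatch": [],
    }
    # dict key views support set operations, so the differences are direct.
    for prefix in sorted(rib.keys() - fib.keys()):
        report["missing_in_fib"].append((prefix, rib[prefix]))
    for prefix in sorted(fib.keys() - rib.keys()):
        report["extra_in_fib"].append((prefix, fib[prefix]))
    for prefix in sorted(rib.keys() & fib.keys()):
        if rib[prefix] != fib[prefix]:
            report["next_hop_mismatch"].append((prefix, rib[prefix], fib[prefix]))
    return report


if __name__ == "__main__":
    rib = {"10.0.0.0/8": "192.0.2.1", "198.51.100.0/24": "192.0.2.2", "203.0.113.0/24": "192.0.2.3"}
    fib = {"10.0.0.0/8": "192.0.2.1", "198.51.100.0/24": "192.0.2.9", "172.16.0.0/12": "192.0.2.4"}
    for kind, entries in compare_tables(rib, fib).items():
        print(kind, entries)
```

An empty report in all three categories means the two tables agree under this simplified model.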
Power Supply Redundancy
The Cisco ASR 9000 Series Routers are configured such that a power module failure or its subsequent
replacement does not cause a significant outage.
The system detects a power supply failure or an overvoltage or undervoltage condition at the output of
a power module, and raises an alarm.
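The detection and alarm behavior described above can be sketched as a classification of each power module's output voltage against a tolerance window. In the Python example below, the nominal voltage, tolerance, module names, and wattages are hypothetical values, not ASR 9000 specifications, and load_still_covered is an assumed helper that illustrates why redundancy lets the remaining healthy modules carry the load after one module fails.

```python
# Conceptual sketch of power-module output monitoring with alarms.
# Assumptions: NOMINAL_VOLTS, TOLERANCE, module names, and wattages are
# hypothetical illustration values, not ASR 9000 specifications.
from enum import Enum
from typing import Dict, List, Optional

NOMINAL_VOLTS = 12.0  # hypothetical nominal output voltage
TOLERANCE = 0.05      # hypothetical +/-5% acceptable window


class Status(Enum):
    OK = "ok"
    FAILED = "failed"
    UNDERVOLTAGE = "undervoltage"
    OVERVOLTAGE = "overvoltage"


def classify(reading: Optional[float]) -> Status:
    """Classify one module's output; None or 0 V is treated as a failed module."""
    if reading is None or reading <= 0.0:
        return Status.FAILED
    if reading < NOMINAL_VOLTS * (1 - TOLERANCE):
        return Status.UNDERVOLTAGE
    if reading > NOMINAL_VOLTS * (1 + TOLERANCE):
        return Status.OVERVOLTAGE
    return Status.OK


def raise_alarms(readings: Dict[str, Optional[float]]) -> List[str]:
    """Return one alarm message per module whose output is not OK."""
    alarms = []
    for module, volts in sorted(readings.items()):
        status = classify(volts)
        if status is not Status.OK:
            alarms.append(f"ALARM {module}: {status.value}")
    return alarms


def load_still_covered(
    readings: Dict[str, Optional[float]], module_watts: float, load_watts: float
) -> bool:
    """Redundancy check: can the modules still reporting OK carry the load by themselves?"""
    healthy = sum(1 for v in readings.values() if classify(v) is Status.OK)
    return healthy * module_watts >= load_watts


if __name__ == "__main__":
    readings = {"PM0": 12.1, "PM1": None, "PM2": 10.0, "PM3": 11.9}
    print(raise_alarms(readings))  # ['ALARM PM1: failed', 'ALARM PM2: undervoltage']
    print(load_still_covered(readings, module_watts=3000.0, load_watts=5000.0))  # True
```

In this model, losing a module raises an alarm but does not by itself cause an outage as long as the remaining healthy capacity still covers the load.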