Chapter 4 Troubleshooting Line Cards

General Information for Troubleshooting Line Card Crashes

Step 2 If the results from the Output Interpreter indicate a hardware-related problem, try removing and reinserting the hardware into the chassis. If this does not correct the problem, replace the DRAM chips on the hardware. If the problem persists, replace the hardware.

Step 3 If the problem appears software-related, verify that you are running a released version of software, and that this release of software supports all of the hardware that is installed in the router. If necessary, upgrade the router to the latest version of software.

Tip The most effective way to use the Output Interpreter tool is to capture the output of the show stacks and show tech-support commands and upload that output into the tool. If the problem appears to be related to a line card, you can also try decoding the output of the show context command.

Upgrading to the latest version of the Cisco IOS software brings in the fixes for all previously resolved bugs that can cause line card bus errors. If the crash still occurs after the upgrade, collect the relevant information from the troubleshooting steps above, along with any information about recent network changes, and contact Cisco TAC.

Software-Forced Crashes

Software-forced crashes (SIG type is 23) occur when the Cisco IOS software encounters a problem with the line card and determines that it can no longer continue, so it forces the line card to crash. The original problem could be either hardware-based or software-based.
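As a rough illustration, the signal type can be picked out of captured crash output with a short script. This is a minimal sketch only: the sample line, the assumed "sig=<number>" field layout, and the function names extract_sig_types and is_software_forced are hypothetical and are not taken from this guide, so adjust the pattern to match the output you actually capture.

```python
import re

# Signal number that this guide associates with a software-forced crash.
SOFTWARE_FORCED_SIG = 23

# Assumed field layout ("sig=<number>", any letter case); adjust as needed.
SIG_PATTERN = re.compile(r"\bsig\s*=\s*(\d+)", re.IGNORECASE)


def extract_sig_types(crash_text):
    """Return every signal number found in the captured crash text."""
    return [int(m.group(1)) for m in SIG_PATTERN.finditer(crash_text)]


def is_software_forced(crash_text):
    """Return True if any recorded signal number equals 23."""
    return SOFTWARE_FORCED_SIG in extract_sig_types(crash_text)


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "System exception: SIG=23, code=0x24, context=0x4103FE84"
    print(is_software_forced(sample))
```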

The most common reason for a software-forced crash on a line card is a “Fabric Ping Timeout,” which occurs when the PRE-1 module sends five keepalive messages (fabric pings) to the line card and does not receive a reply. If this occurs, you should see error messages similar to the following in the router’s console log:

%GRP-3-FABRIC_UNI: Unicast send timed out (4)

%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure
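When a console log has been saved to a file, messages like the two above can be located with a small script. The following is a minimal sketch that assumes the messages appear in the same form as shown here; the function name fabric_ping_failures is hypothetical.

```python
import re

# Matches the core dump message shown above and captures the slot number
# and the error text that follows "error:".
COREDUMP_RE = re.compile(
    r"%GRP-3-COREDUMP: Core dump incident on slot (\d+), error: (.+)"
)


def fabric_ping_failures(log_lines):
    """Return slot numbers whose core dump error mentions a fabric ping failure."""
    slots = []
    for line in log_lines:
        match = COREDUMP_RE.search(line)
        if match and "fabric ping" in match.group(2).lower():
            slots.append(int(match.group(1)))
    return slots


if __name__ == "__main__":
    log = [
        "%GRP-3-FABRIC_UNI: Unicast send timed out (4)",
        "%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure",
    ]
    print(fabric_ping_failures(log))  # prints [4]
```

The pattern uses search rather than match, so a timestamp or other prefix in front of the message does not prevent it from being found.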

Fabric ping timeouts are usually caused by one of the following problems:

High CPU Utilization—Either the PRE-1 module or line card is experiencing high CPU utilization. The PRE-1 module or line card could be so busy that either the ping request or ping reply message was dropped. Use the show processes cpu command to determine whether CPU usage is exceptionally high (at 95 percent or more). If so, see the “High CPU Utilization Problems” section on page 3-9 for information on troubleshooting the problem.

CEF-Related Problems—If the crash is accompanied by system messages that begin with “%FIB,” it could indicate a problem with Cisco Express Forwarding (CEF) on one of the line card’s interfaces. For more information, see Troubleshooting CEF-Related Error Messages, at the following URL:

http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a0080110d68.shtml

IPC Timeout—The Inter-Process Communication (IPC) message that carried the original ping request or the ping reply was lost. This could be caused by a software bug that disables interrupts for an excessive period of time, by high CPU usage on the PRE-1 module, or by excessive traffic on the line card that fills all available IPC buffers.

If the router is not running the most current Cisco IOS software, upgrade the router to the latest software release so that any known IPC bugs are fixed. If the show processes cpu command shows that CPU usage is exceptionally high (at 95 percent or more), or if traffic on the line card is excessive, see the “High CPU Utilization Problems” section on page 3-9.
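The 95 percent threshold and the “%FIB” check described above can be applied to captured output with a short script. The following is a minimal sketch under stated assumptions: it expects the usual first line of show processes cpu output ("CPU utilization for five seconds: ...; one minute: ...; five minutes: ..."), it treats any of the three averages at or above 95 percent as high (this guide does not say which average to use), and the names parse_cpu_summary and likely_causes, as well as the sample %FIB message, are hypothetical.

```python
import re

HIGH_CPU_PERCENT = 95  # threshold named in this section

# Summary line at the top of "show processes cpu" output.
CPU_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_summary(output):
    """Return (five_sec, one_min, five_min) total CPU percentages, or None."""
    match = CPU_RE.search(output)
    if match is None:
        return None
    # The second value on the line is the interrupt-level share; it is ignored.
    five_sec, _interrupt, one_min, five_min = (int(g) for g in match.groups())
    return five_sec, one_min, five_min


def likely_causes(cpu_output, log_lines):
    """List the causes from this section that the captured data points to."""
    causes = []
    cpu = parse_cpu_summary(cpu_output)
    # Assumption: any averaging window at or above the threshold counts as high.
    if cpu is not None and max(cpu) >= HIGH_CPU_PERCENT:
        causes.append("high CPU utilization")
    # Substring test so that timestamps in front of the message are tolerated.
    if any("%FIB" in line for line in log_lines):
        causes.append("CEF-related problem")
    return causes


if __name__ == "__main__":
    cpu_text = "CPU utilization for five seconds: 97%/20%; one minute: 96%; five minutes: 90%"
    # Placeholder message text, for illustration only.
    log = ["%FIB-3-EXAMPLE: placeholder message"]
    print(likely_causes(cpu_text, log))
```

An empty result means only that neither check fired; an IPC timeout or another cause is still possible and needs the manual steps described in this section.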

Cisco uBR10012 Universal Broadband Router Troubleshooting Guide

4-6

OL-1237-01

 

 
