Cisco Systems UBR10012 manual Software-Forced Crashes


Chapter 4 Troubleshooting Line Cards

General Information for Troubleshooting Line Card Crashes

Step 2 If the results from the Output Interpreter indicate a hardware-related problem, try removing and reinserting the hardware into the chassis. If this does not correct the problem, replace the DRAM chips on the hardware. If the problem persists, replace the hardware.

Step 3 If the problem appears software-related, verify that you are running a released version of software, and that this release of software supports all of the hardware that is installed in the router. If necessary, upgrade the router to the latest version of software.

Tip The most effective way to use the Output Interpreter tool is to capture the output of the show stacks and show tech-support commands and upload it into the tool. If the problem appears related to a line card, you can also try decoding the output of the show context command.

Upgrading to the latest version of the Cisco IOS software rules out any bugs that have already been fixed and that can cause line card bus errors. If the crash still occurs after the upgrade, collect the relevant information from the troubleshooting steps above, along with any information about recent network changes, and contact Cisco TAC.

Software-Forced Crashes

Software-forced crashes (SIG type is 23) occur when the Cisco IOS software encounters a problem with the line card and determines that it can no longer continue, so it forces the line card to crash. The original problem could be either hardware-based or software-based.

The most common reason for a software-forced crash on a line card is a “Fabric Ping Timeout,” which occurs when the PRE-1 module sends five keepalive messages (fabric pings) to the line card and does not receive a reply. If this occurs, you should see error messages similar to the following in the router’s console log:

%GRP-3-FABRIC_UNI: Unicast send timed out (4)

%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure
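When reviewing a long console log, it can help to search for these messages automatically. The following Python sketch is one possible way to do that. The patterns are derived only from the two sample messages above, the function names are hypothetical, and the code does not interpret the number in parentheses in the FABRIC_UNI message because the text does not define it.

```python
import re

# Patterns built from the two sample messages in the text. Real logs may
# carry timestamps or other prefixes, so the patterns are searched for
# anywhere in each line rather than anchored at the start.
FABRIC_UNI = re.compile(r"%GRP-3-FABRIC_UNI: Unicast send timed out \((\d+)\)")
COREDUMP = re.compile(
    r"%GRP-3-COREDUMP: Core dump incident on slot (\d+), error: (.+)"
)


def find_fabric_ping_failures(log_text):
    """Return the sorted slot numbers whose core dump reason mentions a fabric ping."""
    slots = set()
    for line in log_text.splitlines():
        match = COREDUMP.search(line)
        if match and "fabric ping" in match.group(2).lower():
            slots.add(int(match.group(1)))
    return sorted(slots)


def count_unicast_timeouts(log_text):
    """Count FABRIC_UNI unicast timeout messages in the log text."""
    return len(FABRIC_UNI.findall(log_text))


# Sample input: the two messages quoted in the text.
sample = (
    "%GRP-3-FABRIC_UNI: Unicast send timed out (4)\n"
    "%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure\n"
)

print(find_fabric_ping_failures(sample))
print(count_unicast_timeouts(sample))
```

A slot that appears in the result is only a starting point: the causes listed below still have to be checked on the router itself.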

Fabric ping timeouts are usually caused by one of the following problems:

High CPU Utilization—Either the PRE-1 module or the line card is experiencing high CPU utilization. The PRE-1 module or line card could be so busy that either the ping request or the ping reply message was dropped. Use the show processes cpu command to determine whether CPU usage is exceptionally high (at 95 percent or more). If so, see the “High CPU Utilization Problems” section on page 3-9 for information on troubleshooting the problem.

CEF-Related Problems—If the crash is accompanied by system messages that begin with “%FIB,” it could indicate a problem with Cisco Express Forwarding (CEF) on one of the line card’s interfaces. For more information, see Troubleshooting CEF-Related Error Messages, at the following URL:

http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a0080110d68.shtml

IPC Timeout—The InterProcess Communication (IPC) message that carried the original ping request or the ping reply was lost. This could be caused by a software bug that is disabling interrupts for an excessive period of time, high CPU usage on the PRE-1 module, or by excessive traffic on the line card that is filling up all available IPC buffers.

If the router is not running the most current Cisco IOS software, upgrade the router to the latest software release, so that any known IPC bugs are fixed. If the show processes cpu command shows that CPU usage is exceptionally high (at 95 percent or more), or if traffic on the line card is excessive, see the “High CPU Utilization Problems” section on page 3-9.
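The 95 percent check described above can also be applied to captured show processes cpu output. The following Python sketch assumes the output begins with the usual Cisco IOS summary line ("CPU utilization for five seconds: X%/Y%; one minute: Z%; five minutes: W%"). The sample line is illustrative rather than captured from a router, and the function names are hypothetical.

```python
import re

# Assumed format of the summary line of "show processes cpu". In the
# five-second field, the first figure is total utilization and the
# second is the share spent at interrupt level.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

HIGH_CPU_THRESHOLD = 95  # percent, the figure given in the text


def parse_cpu_summary(output):
    """Extract the utilization figures, or return None if no summary line is found."""
    match = CPU_LINE.search(output)
    if match is None:
        return None
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "five_sec_total": total,
        "five_sec_interrupt": interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }


def is_cpu_high(summary, threshold=HIGH_CPU_THRESHOLD):
    """Return True if any averaging window is at or above the threshold."""
    windows = ("five_sec_total", "one_min", "five_min")
    return any(summary[name] >= threshold for name in windows)


# Illustrative input only, not real router output.
sample_output = (
    "CPU utilization for five seconds: 97%/12%; one minute: 96%; five minutes: 90%"
)

summary = parse_cpu_summary(sample_output)
print(summary)
print(is_cpu_high(summary))
```

Flagging a reading when any window reaches the threshold is a deliberately cautious choice. A single five-second spike may not be significant, so a stricter check could require the one-minute or five-minute average to be high as well.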

Cisco uBR10012 Universal Broadband Router Troubleshooting Guide


OL-1237-01

 

 
