Chapter 4 Troubleshooting Line Cards
General Information for Troubleshooting Line Card Crashes
•Hard parity errors occur when a hardware defect in the DRAM or processor board causes data to be repeatedly corrupted at the same address. In general, a hard parity error occurs when more than one parity error in a particular memory region occurs in a relatively short period of time (several weeks to months).
When a parity error occurs, take the following steps to resolve the problem:
Step 1 Determine whether this is a soft parity error or a hard parity error. Soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, wait for a second parity error before taking any action. Monitor the router for several weeks after the first incident, and if the problem reoccurs, assume that the problem is a hard parity error and proceed to the next step.
Step 2 When a hard parity error occurs (two or more parity errors at the same memory location), try removing and reinserting the line card, making sure to fully insert the card and to securely tighten the restraining screws on the front panel.
Step 3 If this does not resolve the problem, remove and reseat the DRAM chips. If the problem continues, replace the DRAM chips.
Step 4 If parity errors continue to occur, the problem is with either the line card or the router chassis. Try removing the line card and reinserting it. If the problem persists, try removing the line card from its current slot and reinserting it in another slot, if one is available. If that does not fix the problem, replace the line card.
Step 5 If the problems continue, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
For information on contacting TAC and opening a case, see the “Obtaining Technical Assistance” section on page x.
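The soft-versus-hard decision in Steps 1 and 2 can be illustrated with a small sketch. It is not a Cisco tool and does not parse any router output. It assumes the operator has recorded each parity error by hand as a (timestamp, address) pair. The function name classify_parity_errors, the record layout, the sample addresses, and the 90-day window are all hypothetical choices made for illustration. The guide itself says only "a relatively short period of time (several weeks to months)".

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_parity_errors(events, window=timedelta(days=90)):
    """Return {address: "hard" | "soft"} for recorded parity errors.

    events: iterable of (timestamp, address) pairs. These are hypothetical
            hand-collected records, not a Cisco log format.
    window: assumed reading of "relatively short period of time".
    """
    by_address = defaultdict(list)
    for timestamp, address in events:
        by_address[address].append(timestamp)

    result = {}
    for address, stamps in by_address.items():
        stamps.sort()
        # Treat the address as a hard error if any two consecutive
        # errors at that address fall within the window.
        repeated = any(
            later - earlier <= window
            for earlier, later in zip(stamps, stamps[1:])
        )
        result[address] = "hard" if repeated else "soft"
    return result


# Illustrative records only; the addresses are made up.
events = [
    (datetime(2024, 1, 5), 0x60A1B2C0),
    (datetime(2024, 2, 20), 0x60A1B2C0),
    (datetime(2024, 3, 1), 0x61000010),
]
verdicts = classify_parity_errors(events)
```

In this sketch an address reported as "hard" would be a candidate for Step 2 onward, while a "soft" address would simply continue to be monitored. Matching on the exact address is a simplification; grouping by memory region may be closer to the guide's wording.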
Bus Errors
Bus errors (SIG type 10) occur when the line card tries to access a memory location that either does not exist (which indicates a software error) or does not respond (which indicates a hardware error). Use the following procedure to determine the cause of a bus error and to resolve the problem.
Perform these steps as soon as possible after the bus error. In particular, perform these steps before manually reloading or power cycling the router, or before performing an Online Insertion/Removal (OIR) of the line card, because doing so eliminates much of the information that is useful in debugging line card crashes.
Step 1 Capture the output of the show stacks, show context, and show
Cisco uBR10012 Universal Broadband Router Troubleshooting Guide