Intel IXP12xx Simulation Measurement Procedure and Results, Simulated 29-byte packet performance1


Version 1.0, 4/10/02

One issue with running simulations unbounded to wire-rate is that it can hide errors, because there is no concept of device overflows or underflows. Further, the design can become unbalanced: an efficient receiver, for example, may race ahead of the rest of the design, hogging shared system resources and penalizing other parts of the system.

Another approach is to simulate bounded, but bound to a wire-rate that is faster than the actual wire-rate. The disadvantage of this technique is that it is iterative: to discover the maximum performance, one must raise the wire-rate until the design fails to keep up, then lower it until the design runs correctly without any overflows or underflows.
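The raise/lower search described above can be sketched as follows. This is only an illustration of the procedure, not part of the Developer's Workbench: `run_simulation` is a hypothetical harness hook that runs one bounded simulation at the given wire-rate and reports whether it completed without device overflows or underflows.

```python
def find_max_wire_rate(run_simulation, start_mbps=100.0, step_mbps=10.0):
    """Iteratively find the highest bound wire-rate the design sustains.

    run_simulation(rate_mbps) -> True if the bounded simulation completes
    with no device overflow or underflow (hypothetical harness hook).
    """
    rate = start_mbps
    while run_simulation(rate):      # raise until the design fails to keep up
        rate += step_mbps
    while not run_simulation(rate):  # then lower until it runs cleanly again
        rate -= step_mbps
    return rate                      # highest rate that ran without errors
```

In practice one would start with a coarse step to bracket the failure point, then repeat with a finer step to narrow the result.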

SIMULATION MEASUREMENT PROCEDURE AND RESULTS

In the simulation environment, 29-, 40-, and 1500-byte packets are measured using the Developer’s Workbench IX Bus Device Simulator’s streams facility. The workloads are homogeneous, in that packets of the same size are sent into both the Ethernet and ATM ports.

To measure the performance of the design, the simulation is run with the Ethernet ports bounded to 100Mbps, and the ATM ports bounded to 155 or 622 Mbps, as appropriate. The simulator is set to stop if it detects a device overflow or underflow.

Full-bandwidth input streams of the specified packet size are simultaneously applied to all ATM and Ethernet ports present in the configuration for at least 1M cycles.

Upon completion of the simulation run, the line rates in the IX Bus Device Status window are observed. The Ethernet ports should each be receiving at 100 Mbps. The ATM port(s) should be receiving at 622 Mbps (or 155 Mbps each). For 29-byte packets, the Ethernet side should transmit at wire-rate and discard excess ATM input. For 40- and 1500-byte packet workloads, the ATM side should transmit at wire-rate and discard excess Ethernet input.

No device overflows or underflows were detected during the simulation.

Simulated 29-byte packet performance1

For the OC-12 and 4xOC-3 configurations running the 1 cell/PDU workload, the simulation stops with a watch-point when the MSGQ from the ATM Receive Microengine to the IP Router Microengine fills to capacity. This means that the IP Router Microengine is not able to keep up with the 1 lookup/cell workload (1.4M lookups/sec). Upon disabling the watch-point and completing the 1M-cycle simulation, the number of PDUs dropped due to ATM_RX_IPR_FULLQ is compared to the total number of cells received via ATM. This shows that the IP Router Microengine drops 19-22% of the cells received via ATM; conversely, it routes 78-81% of the 1.4M cells/sec input, or about 1.1M routes/second.
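As a sanity check, the cell and routing rates above can be reproduced from first principles. The sketch below assumes the standard OC-12c ATM payload rate of 599.04 Mbps (622.08 Mbps line rate less SONET overhead) and takes 21% as the midpoint of the observed 19-22% drop range:

```python
CELL_BITS = 53 * 8           # ATM cell: 5-byte header + 48-byte payload
OC12_PAYLOAD_BPS = 599.04e6  # OC-12c payload rate after SONET overhead

cells_per_sec = OC12_PAYLOAD_BPS / CELL_BITS          # ~1.41M cells/sec
drop_fraction = 0.21                                  # midpoint of 19-22%
routes_per_sec = cells_per_sec * (1 - drop_fraction)  # ~1.1M routes/sec
```

Both figures agree with the measured values to the precision quoted in the text.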

While this observation shows that under this workload the IP Router does not keep up with the input, it also shows that for a workload with 2-cell PDUs, the IP Router can route (1.1M − 1.4M/2) = 400K routes/second more than the maximum 700K routes/second required.
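Spelling out the same headroom arithmetic, using the measured ~1.1M routes/second capability from the previous paragraph:

```python
cell_rate = 1.4e6         # OC-12 cell arrival rate, cells/sec
routing_capacity = 1.1e6  # measured IP Router throughput, routes/sec
cells_per_pdu = 2         # 2-cell PDU workload: one lookup per PDU

required = cell_rate / cells_per_pdu    # 700K lookups/sec required
headroom = routing_capacity - required  # 400K routes/sec spare capacity
```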

1 Simulations for 29-byte, 40-byte, and 1500-byte packet loads were run using 133 MHz memory (-75).

