Intel IXP12xx ATM OC12/Ethernet IP Router Example Design

Version 1.0, 4/10/02

bytes/minimum frame}. 84 bytes/frame * 8 bits/byte / 100 Mb/sec = 6.72 usec/frame. 232 MHz * 6.72 usec/frame = 1559 cycles/frame.
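The arithmetic above can be sketched as a small helper. This is only an illustration of the calculation in the text; the function name and parameters are hypothetical, and the 84 bytes/frame, 100 Mb/sec, and 232 MHz values are the ones quoted above.

```python
def cycle_budget(bytes_per_frame, line_rate_bps, core_hz):
    """Return (usec per frame on the wire, core cycles available per frame)."""
    bits_per_frame = bytes_per_frame * 8
    usec_per_frame = bits_per_frame / line_rate_bps * 1e6
    # Multiply before dividing to limit floating-point rounding error.
    cycles_per_frame = bits_per_frame * core_hz / line_rate_bps
    return usec_per_frame, cycles_per_frame


# Values taken from the text: 84-byte minimum frame, 100 Mb/sec, 232 MHz core.
usec, cycles = cycle_budget(84, 100_000_000, 232_000_000)
print(f"{usec:.2f} usec/frame, {cycles:.2f} cycles/frame")
```

The text truncates the result to 1559 cycles/frame; the fractional part is kept here only to show where the figure comes from.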

These cycle budgets specify how frequently a cell or frame goes over the wire. If multiple threads handle multiple frames on the same wire, the budgets are multiplied accordingly. For example, the OC-12 cycle budget is 164 cycles/cell, but since the four threads on a single microengine can work on four cells simultaneously, the equivalent per-thread cycle budget becomes 4 * 164 = 656 cycles, or approximately 660 cycles/cell. That is, four threads working on four different cells can each take up to about 660 cycles to process a cell and still keep up with line rate.
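As a minimal sketch of the per-thread scaling described above (the function name is hypothetical; the 164 cycles/cell budget and the four threads per microengine come from the text):

```python
def per_thread_budget(wire_budget_cycles, threads):
    """Cycles each thread may spend on one cell or frame when `threads`
    threads each work on a different cell or frame from the same wire."""
    return wire_budget_cycles * threads


# OC-12 example from the text: 164 cycles/cell, four threads per microengine.
print(per_thread_budget(164, 4))  # 656, which the text rounds to about 660
```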

This per-thread per-packet cycle budget is independent of how the thread consumes the cycles – it specifies only the maximum time (in cycles) between the beginning and end of packet processing. The cycles may be used for instruction execution, aborted instructions due to branches, microengine stalls due to command queue pushback, or idle cycles. Changes in any of these uses of time can cause a thread to meet or exceed its cycle budget.

The Developer’s Workbench IX Bus Device Simulator is typically configured to show performance in Mbps based on frames/sec. However, it can also be configured to display cycles/frame, which is useful in tuning a design to reach cycle budgets.
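The two display modes are related by simple arithmetic, sketched below. This is not a description of how the Workbench computes its display; the function names are hypothetical, and the example reuses the 84-byte minimum frame, 100 Mb/sec, and 232 MHz figures from the cycle budget calculation.

```python
def mbps_from_frame_rate(frames_per_sec, bytes_per_frame):
    """Throughput in Mb/sec for a given frame rate and frame size."""
    return frames_per_sec * bytes_per_frame * 8 / 1e6


def cycles_per_frame(core_hz, frames_per_sec):
    """Core cycles that elapse per frame at a given frame rate."""
    return core_hz / frames_per_sec


# Frame rate of back-to-back 84-byte frames on a 100 Mb/sec wire.
fps = 100_000_000 / (84 * 8)
print(f"{mbps_from_frame_rate(fps, 84):.1f} Mb/sec")
print(f"{cycles_per_frame(232_000_000, fps):.2f} cycles/frame")
```

A design that sustains a higher frame rate than this in simulation would show a cycles/frame figure below the budget.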

Developer’s Workbench IX Bus Simulator – Bounded and Unbounded Wire Rates

Simulations can be run with ports “bounded” or “unbounded” to the wire rate. A simulation run with ports bounded to the wire rate always shows exactly the budgeted cycles/frame, because the ports are held to the desired wire rate. It is also useful to run a simulation with the ports unbounded (infinite bandwidth on the wire): on the receive side there is always data waiting on the wire, and on the transmit side the wire is always ready to accept more data. If the design can run faster than wire rate, setting the IX Bus Device Simulator to display cycles/frame shows how many cycles each frame actually takes, which can then be related to instruction counts.

This technique was used to measure the OC-12 Receive Microengine over several workloads against its 164 cycles/cell budget. The 8 interleaved VC workloads were used to make sure that the VC-cache experienced a 100% miss rate. Figure 4 shows the results for both the –75 and –7E DRAM speed grades.

Cells/PDU   Virtual Circuits   Cycles/Cell –75   Cycles/Cell –7E
1*          1                  154.6             137.9
1*          4 random           163.9             149.8
1*          8 interleaved      172.8**           159.0
2           1                  161.0             137.1
2           8 interleaved      158.4             149.2
32          1                  152.5             141.9
32          8 interleaved      131.5             127.4

*Simulations show that ATM Receive can handle the 1-cell/PDU workload, but that the IP Router in the next pipeline stage falls behind.

**For –75 DRAM, the ATM Receive cycle budget is exceeded for a workload of single-cell, interleaved PDUs.

Figure 4. OC-12 Unbounded ATM Receive simulations versus 164 cycle budget
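The Figure 4 results can be checked against the 164 cycles/cell budget with a short script. This is a sketch for illustration only: the data values are copied from Figure 4, while the names and structure are hypothetical.

```python
BUDGET = 164.0  # OC-12 cycles/cell budget from the text

# (cells/PDU, virtual circuits, cycles/cell for -75, cycles/cell for -7E)
FIGURE_4 = [
    ("1", "1", 154.6, 137.9),
    ("1", "4 random", 163.9, 149.8),
    ("1", "8 interleaved", 172.8, 159.0),
    ("2", "1", 161.0, 137.1),
    ("2", "8 interleaved", 158.4, 149.2),
    ("32", "1", 152.5, 141.9),
    ("32", "8 interleaved", 131.5, 127.4),
]


def over_budget(rows, budget=BUDGET):
    """Return the (cells/PDU, VCs, DRAM grade, cycles) entries above budget."""
    found = []
    for cells, vcs, c75, c7e in rows:
        for grade, cycles in (("-75", c75), ("-7E", c7e)):
            if cycles > budget:
                found.append((cells, vcs, grade, cycles))
    return found


for cells, vcs, grade, cycles in over_budget(FIGURE_4):
    print(f"{cells} cell/PDU, {vcs} VCs, {grade} DRAM: "
          f"{cycles} cycles/cell exceeds {BUDGET:.0f}")
```

Only the single-cell, 8 interleaved VC workload on –75 DRAM is reported, matching the ** footnote.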
