Version 1.0, 4/10/02
bytes/minimum frame}. 84 bytes/frame * 8 bits/byte / 100Mb/sec = 6.72 usec/frame. 232MHz *
6.72usec/frame = 1559 cycles/frame
These cycle budgets specify how frequently a cell or frame goes over the wire. If multiple threads handle multiple frames on the same wire, then the budgets are multiplied accordingly. For example, the OC-12 cycle budget is 164 cycles/cell, but since the four threads on a single microengine can work on four frames simultaneously, the equivalent per-thread cycle budget becomes 4 * 164 cycles, or approximately 660 cycles/frame. That is, four threads working on 4 different cells can each take up to 660 cycles to process a cell and still keep up with line rate.
This per-thread per-packet cycle budget is independent of how the thread consumes the cycles – it specifies only the maximum time (in cycles) between the beginning and end of packet processing. The cycles may be used for instruction execution, aborted instructions due to branches, microengine stalls due to command queue pushback, or idle cycles. Changes in any of these uses of time can cause a thread to meet or exceed its cycle budget.
The Developer’s Workbench IX Bus Device Simulator is typically configured to show performance in Mbps based on frames/sec. However, it can also be configured to display cycles/frame, which is useful in tuning a design to reach cycle budgets.
Developer’s Workbench IX Bus Simulator – Bounded and Unbounded Wire Rates
Simulations can be run with ports “bounded” or “unbounded” to the wire rate. Simulations run with ports “bounded” to wire rate will always show exactly the correct cycle budget/frame – because it is bound to the desired wire rate. It is also useful to run a simulation with the ports “unbounded” to wire-rate (infinite bandwidth on the wire). This means that on the receive side there is always data waiting on the wire, and on the transmit side the wire is always ready to accept more data. If the design is able to run faster than wire-rate, then setting the IX Bus Device Simulator to display in cycles/frame can be useful to relate that to instructions.
This technique was used to measure the OC-12 Receive Microengine over several workloads against its 164 cycles/cell budget. The 8 interleaved VC workloads were used to make sure that the VC-cache experienced a 100% miss rate. Figure 4 shows the results for both the –75 and –7E DRAM speed grades.
Cells/PDU | Virtual Circuits | Cycles/Cell –75 | Cycles/cell –7E |
1* | 1 | 154.6 | 137.9 |
1* | 4 random | 163.9 | 149.8 |
1* | 8 interleaved | 172.8** | 159.0 |
2 | 1 | 161.0 | 137.1 |
2 | 8 interleaved | 158.4 | 149.2 |
32 | 1 | 152.5 | 141.9 |
32 | 8 interleaved | 131.5 | 127.4 |
*Simulations show that ATM Receive can handle the 1-cell/PDU workload, but that the IP Router in the next pipeline stage falls behind.
**For –75 DRAM, the ATM Receive cycle budget is exceeded for a workload of single-cell, interleaved PDUs.
Figure 4 – OC-12 Unbounded ATM Receive simulations versus 164 cycle budget