the HPW will terminate and initiate a trap to software to provide the required TLB entry. This component counts only the stall cycles spent while the HPW provides the required TLB entry. Time spent in the software trap handler is not counted in this component.

Dcache

This counts the number of cycles stalled due to data cache misses at any level of the cache hierarchy (L1, L2, L3). Due to event limitations, it is not possible to distinguish between freg-freg and freg-load dependencies. As a result, either scoreboard cycles are counted as data cache cycles or data access cycles are counted as scoreboard cycles. This implementation allocates all floating-point stalls to the data cache category, so some floating-point register dependency stalls that belong in the scoreboard category are incorrectly allocated to the data cache category.

RSE Active

This counts the number of cycles that the pipeline is stalled due to the Register Save Engine spilling/filling registers to/from memory.

sysbus Event Set

Available only on Itanium 2 and dual-core Itanium 2 systems.

The sysbus event set provides data on system bus utilization and its breakdown into:

Transaction originator (all, local cpu, io)

Transaction type (brl, bril, bil, bwl, partial)

If you use this option, you must use the --bus-speed option.

If you use this event set, the default is to make the measurements irrespective of CPU operating state (that is, user, system, or interrupt states). By default, the idle state is not included in the measurement. You can use command-line options to limit the scope of the measurement. Specifically, you can:

Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]

Include idle: --exclude-idle False

Exclude the interruption state: --measure-on-interrupts off

Only measure the interruption state: --measure-on-interrupts only
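As an illustration only, the options listed above can be assembled programmatically. The following Python sketch is not part of the product; the function name build_sysbus_options and its parameters are hypothetical. It assumes that the privilege level is one of all, user, or kernel, appended to the event set name after a colon, and it treats the --bus-speed value as an opaque string because its format is not described here.

```python
from typing import List, Optional

# Values taken from the option descriptions above; everything else is illustrative.
PRIVILEGE_LEVELS = ("all", "user", "kernel")
INTERRUPT_MODES = ("off", "only")


def build_sysbus_options(
    bus_speed: str,
    level: Optional[str] = None,
    include_idle: bool = False,
    interrupts: Optional[str] = None,
) -> List[str]:
    """Return a list of command-line options for a sysbus measurement.

    bus_speed is passed through unchanged; its expected format is an
    assumption left to the caller.
    """
    if level is not None and level not in PRIVILEGE_LEVELS:
        raise ValueError(f"unknown privilege level: {level!r}")
    if interrupts is not None and interrupts not in INTERRUPT_MODES:
        raise ValueError(f"unknown interrupt mode: {interrupts!r}")

    # -m event_set[:level]; omit the suffix to keep the default scope.
    event_set = "sysbus" if level is None else f"sysbus:{level}"
    options = ["-m", event_set, "--bus-speed", bus_speed]

    # Idle is excluded by default, so the flag is only added to include it.
    if include_idle:
        options += ["--exclude-idle", "False"]

    # "off" excludes the interruption state; "only" measures nothing else.
    if interrupts is not None:
        options += ["--measure-on-interrupts", interrupts]

    return options
```

For example, build_sysbus_options("200", level="user", interrupts="off") yields the options for a user-level measurement that excludes the interruption state. The value "200" is a placeholder, not a recommended setting.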

Metrics Available from this Measurement

The following metrics are available from this event set. These descriptions do not take into account any command-line options you might use.

The metrics are:

Avg Lat

Average memory read latency provides a measure of the number of CPU cycles required to service a memory cache line read from the perspective of the bus request queue (BRQ). The time measured includes the arbitration cycles, address cycles, memory controller/memory cycles, and data return cycles.

Load-to-use latency can be computed by adding the processor overhead cycles required to issue a miss to the BRQ and forward the data from the bus interface to the processor pipeline. For the Itanium 2 processor, there are an additional 28 to 30 internal cycles, depending on which bypasses fail, that must be added to the reported value to estimate true load-use cycles. For the Itanium 2 6M, Itanium 2 9M, and Itanium 2 Low Voltage processors, 22 to 25 cycles must be added to estimate load-use cycles.

The reported average latency will be incorrect on Itanium 2 steppings earlier than B2.
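The load-to-use adjustment described above is simple arithmetic, and a small Python sketch may make it concrete. The function name estimate_load_use_cycles and the table layout are hypothetical; only the cycle ranges come from the text. The sketch does not account for the incorrect values reported on steppings earlier than B2.

```python
from typing import Dict, Tuple

# Additional internal cycles (low, high) to add to the reported Avg Lat,
# as stated in the text for each processor family.
OVERHEAD_CYCLES: Dict[str, Tuple[int, int]] = {
    "Itanium 2": (28, 30),
    "Itanium 2 6M": (22, 25),
    "Itanium 2 9M": (22, 25),
    "Itanium 2 Low Voltage": (22, 25),
}


def estimate_load_use_cycles(avg_lat_cycles: float, processor: str) -> Tuple[float, float]:
    """Return the (low, high) estimate of load-to-use latency in CPU cycles.

    avg_lat_cycles is the Avg Lat value reported by the sysbus event set.
    """
    low, high = OVERHEAD_CYCLES[processor]
    return (avg_lat_cycles + low, avg_lat_cycles + high)
```

With a hypothetical reported Avg Lat of 200 cycles, estimate_load_use_cycles(200.0, "Itanium 2") returns (228.0, 230.0), and the same reported value on an Itanium 2 6M processor gives (222.0, 225.0).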
