measurement. You can use command-line options to limit the scope of the measurement. Specifically, you can:

Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]

Include idle: --exclude-idle False

Exclude the interruption state: --measure-on-interrupts off

Only measure the interruption state: --measure-on-interrupts only

Metrics Available from this Measurement

The following metrics are available from this event set. These descriptions do not take into account any command-line options you might use.

The metrics are:

Avg Lat

Average memory read latency provides a measure of the number of CPU cycles required to service a memory cache line read from the perspective of the bus request queue (BRQ). The time measured includes the arbitration cycles, address cycles, memory controller/memory cycles, and data return cycles.

Load-to-use latency can be computed by adding the processor overhead cycles required to issue a miss to the BRQ and forward the data from the bus interface to the processor pipeline. For the Itanium 2 processor, there are an additional 28 to 30 internal cycles, depending on which bypasses fail, that must be added to the reported value to estimate true load-use cycles. For the Itanium 2 6M, Itanium 2 9M, and Itanium 2 Low Voltage processors, 22 to 25 cycles must be added to estimate load-use cycles.

The reported average latency will be incorrect on Itanium 2 steppings earlier than B2.

The average memory read latency on the dual-core Itanium 2 processor will appear greater than on previous Itanium 2 processors. This is because the reported latency also includes the latency that the arbiter adds to both the outbound request and inbound data transfer.
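The load-to-use adjustment described above is simple addition, and can be sketched as follows. This is a minimal illustration, not part of the tool: the function name, the dictionary keys, and the sample latency of 200 cycles are hypothetical. Only the overhead ranges (28 to 30 and 22 to 25 cycles) come from the text.

```python
# Processor overhead ranges, in CPU cycles, as quoted in the text above.
# The keys are illustrative labels chosen for this sketch, not tool identifiers.
OVERHEAD_CYCLES = {
    "itanium2": (28, 30),           # Itanium 2
    "itanium2_6m_9m_lv": (22, 25),  # Itanium 2 6M, 9M, and Low Voltage
}


def load_to_use_range(avg_lat_cycles, family):
    """Return the (low, high) load-to-use estimate in CPU cycles.

    avg_lat_cycles is the reported Avg Lat value; family selects which
    overhead range to add.
    """
    low, high = OVERHEAD_CYCLES[family]
    return (avg_lat_cycles + low, avg_lat_cycles + high)


# Hypothetical reported Avg Lat of 200 cycles on an Itanium 2 processor.
estimate = load_to_use_range(200.0, "itanium2")
print(estimate)
```

The result is a range rather than a single number because the overhead depends on which bypasses fail.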

Avg Outstand

Average number of outstanding reads per cycle gives some idea of the memory request density, that is, the probability of one or more memory requests per cycle. For control-dominated code or for workloads that seldom miss the internal caches, this value will be very small. For data-flow-type workloads that prefetch extensively, this number can be quite high, up to a maximum of 16, which is the Itanium 2 bus limit.

The reported average latency value will be incorrect on Itanium 2 steppings earlier than B2.

CPU

CPU transaction component is a measure of the percentage of all bus transactions generated by all CPUs on a shared front side bus (FSB).

I/O

I/O transaction component is a measure of the percentage of all bus transactions initiated by any I/O agent on a shared FSB.
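As a rough illustration of how the CPU and I/O components relate to the total, each can be read as a share of all bus transactions. The sketch below assumes you have raw transaction counts for an interval. The function name and the sample counts are hypothetical, and the tool may derive these values differently.

```python
def transaction_share(agent_transactions, total_transactions):
    """Return an agent's share of all bus transactions, as a percentage."""
    if total_transactions <= 0:
        raise ValueError("total_transactions must be positive")
    return 100.0 * agent_transactions / total_transactions


# Hypothetical counts for one interval: 750 CPU-generated and
# 250 I/O-initiated transactions out of 1000 on the shared FSB.
cpu_pct = transaction_share(750, 1000)
io_pct = transaction_share(250, 1000)
print(cpu_pct, io_pct)
```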

Util Adrs

Average address bus utilization gives an estimate of total address bus utilization resulting from all bus transactions, including cache misses, I/O port reads/writes, interprocessor interrupts, writebacks, cache line invalidates (FC instruction, store hit on shared line), and clean castouts (if enabled). The utilization is computed as follows:

ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / bus cycles/sec

The constant value (3.0) is the number of address cycles needed for each bus transaction.
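The formula above can be checked by hand with a short sketch such as the following. The function and constant names are hypothetical, and the sample rates (20 million transactions per second on a bus running at 200 million cycles per second) are made-up inputs chosen only to keep the arithmetic easy to follow.

```python
# Number of address cycles needed for each bus transaction, per the text above.
ADDRESS_CYCLES_PER_TRANSACTION = 3.0


def address_bus_utilization(transactions_per_sec, bus_cycles_per_sec):
    """Return address bus utilization as a percentage.

    Implements: 100.0 * (total transactions/sec * 3.0) / bus cycles/sec
    """
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    address_cycles_per_sec = transactions_per_sec * ADDRESS_CYCLES_PER_TRANSACTION
    return 100.0 * address_cycles_per_sec / bus_cycles_per_sec


# Hypothetical inputs: 20e6 transactions/sec, 200e6 bus cycles/sec.
util = address_bus_utilization(20e6, 200e6)
print(f"ADRS UTIL = {util:.1f}%")
```

With these inputs, 60 million of the 200 million bus cycles each second carry address traffic, which gives a utilization of 30 percent.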
