Explicit - Instructions not dispersed

This is a count of the number of instructions that were not dispersed due to explicit stop bits. Explicit stop bits are used to separate bundles (three instructions) within a bundle group (two bundles of three instructions each) or to separate bundle groups. Explicit stop bits can also be found within bundle-specific templates that contain embedded stop bits, that is, M_II. The default mode will include all dispersal cycles.

Implicit - Instructions not dispersed

This is a count of the number of instructions that were not dispersed due to implicit stop bits. Implicit stops are generated by the CPU when a CPU resource becomes oversubscribed.

%Dispersed Instr retired

This is a metric of dispatch efficiency, that is, the percentage of the instructions that were dispersed that reached retirement. The number of instructions dispersed will not equal the number reaching retirement, primarily because of pipeline flushes caused by mispredicted branches, traps, and interrupts. If the retirement ratio is low, the likely cause is poor branch prediction. This can be confirmed by using one of the branch prediction measurements (brpred or brpath). Other possibilities include excessive speculation failures (cspec, dspec) and TLB misses.
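The percentage described above can be sketched as a simple ratio. The function name and the sample counts below are hypothetical and chosen only for illustration; they are not taken from the tool's output.

```python
def dispersed_retired_pct(dispersed: int, retired: int) -> float:
    """Return the percentage of dispersed instructions that reached retirement."""
    if dispersed <= 0:
        raise ValueError("dispersed must be a positive count")
    return 100.0 * retired / dispersed


# Hypothetical counts: 1,000,000 instructions dispersed, 850,000 retired.
# The remaining 15% were discarded, for example by pipeline flushes.
print(f"{dispersed_retired_pct(1_000_000, 850_000):.1f}%")
```

A low result would suggest looking at the branch prediction or speculation event sets next.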

Software Limit - Dispersal ILP

This metric attempts to give some estimate of the compiler's ability to expose ILP (instruction level parallelism). The metric is shown in two parts. The raw ILP includes all instructions dispersed, including nops and instructions that will later be predicated off. The effective ILP excludes all nops and predicated off instructions, subject to the Itanium 2's predicated instruction semantics. The metric is based on the number of instructions not dispersed due to explicit stops, as this is directly controlled by the compiler.

Combined Total - Dispersal ILP

This metric gives the overall dispersal ILP that the compiler, processor implementation, and workload combination is providing. The metric is shown in two parts. The raw ILP includes all instructions dispersed, including nops and instructions that will later be predicated off. The effective ILP excludes all nops and predicated off instructions, subject to the Itanium 2's predicated instruction semantics.
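The difference between the raw and effective figures can be illustrated with a short sketch. It assumes, purely for illustration, that ILP is expressed as instructions per dispersal cycle; the exact formula the tool uses is not stated here, and the function name and counts are hypothetical.

```python
def dispersal_ilp(dispersed: int, nops: int, predicated_off: int, cycles: int):
    """Return (raw, effective) ILP as instructions per dispersal cycle.

    Assumption: ILP is instructions divided by dispersal cycles. This is an
    illustrative model, not the tool's documented computation.
    """
    if cycles <= 0:
        raise ValueError("cycles must be a positive count")
    raw = dispersed / cycles
    # Effective ILP drops instructions that do no useful work.
    effective = (dispersed - nops - predicated_off) / cycles
    return raw, effective


# Hypothetical counts over 2,000 dispersal cycles.
raw, effective = dispersal_ilp(dispersed=6000, nops=1200, predicated_off=300, cycles=2000)
print(f"raw ILP = {raw:.2f}, effective ILP = {effective:.2f}")
```

A large gap between the two values would indicate that many dispersal slots are being filled with nops or predicated off instructions.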

dspec Event Set

The dspec event set provides information on the effectiveness of data speculation. Data speculation is the execution of a memory load before a store that precedes it in program order and that might alias with it.

Data speculation occurs when the ordering of data accesses is changed by the optimizer. The ability to alter the ordering of memory operations can greatly increase the degrees of freedom when attempting to generate optimal code. It carries with it the unfortunate side effect that the chosen order might generate traps as a result of data ordering issues. The metrics provided by this event set give some insight into the amount and effectiveness (success rate) of data speculation.
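The reordering described above can be illustrated with a toy model. This is a sketch only: memory is modeled as a Python dictionary, the function names are hypothetical, and the alias check stands in for whatever recovery mechanism the hardware and compiler actually use.

```python
def run_in_order(mem: dict, store_addr: int, store_val: int, load_addr: int) -> int:
    """Perform the store first, then the load, as written in program order."""
    mem = dict(mem)  # work on a copy so the caller's memory is unchanged
    mem[store_addr] = store_val
    return mem[load_addr]


def run_speculated(mem: dict, store_addr: int, store_val: int, load_addr: int):
    """Hoist the load above the store, then check whether the store aliased it.

    Returns (value, succeeded). On a failed speculation the load is redone.
    """
    mem = dict(mem)
    speculative = mem[load_addr]      # load executed early
    mem[store_addr] = store_val       # the store that originally came first
    if store_addr == load_addr:       # alias: the early value is stale
        return mem[load_addr], False  # recover by reloading
    return speculative, True


memory = {0x10: 1, 0x20: 2}
print(run_speculated(memory, 0x10, 9, 0x20))  # no alias, speculation succeeds
print(run_speculated(memory, 0x20, 9, 0x20))  # alias, speculation fails and reloads
```

In both cases the final value matches the in-order result; the difference is whether the early load was useful or had to be repeated, which is the kind of success rate this event set reports.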

If you use this event set, the default is to make the measurements irrespective of CPU operating state (that is, user, system, or interrupt states). By default, the idle state is not included in the measurement. You can use command-line options to limit the scope of the measurement. Specifically, you can:

Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]

Include idle: --exclude-idle False

Exclude the interruption state: --measure-on-interrupts off

Only measure the interruption state: --measure-on-interrupts only

The event per kinst (event per 1000 instructions) metrics are computed using all instructions retired. This includes nops, predicated off instructions, failed speculation and instructions and associated
