dcache Metrics Summed for Entire Run

The sampled metrics also provide detailed latency information by dividing the misses into eight latency buckets according to the number of latency cycles. Each bucket reports the percentage of misses that fall within its latency range.

A latency bucket groups the latency data for data accesses serviced by a particular level of CPU cache or by system memory. Each latency bucket is one of the following: L2 cache access, L3 cache access, or memory access. On cell-based systems, the following additional buckets are provided: cell local memory access, 1-hop memory access, 2-hop memory access, and cache-to-cache (C2C) access.
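As a rough illustration of how bucket percentages come about, the following Python sketch groups sampled miss latencies into named buckets and reports the share of misses in each. The cycle thresholds and the function name bucket_percentages are assumptions made for illustration only; they are not the ranges HP Caliper uses, which depend on the CPU type and frequency.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper bounds (in cycles) for each bucket. HP Caliper's real
# ranges depend on the CPU type and frequency and are not reproduced here.
BUCKETS = [
    (10, "L2 cache access"),
    (30, "L3 cache access"),
    (float("inf"), "memory access"),
]


def bucket_percentages(latencies):
    """Return {bucket name: percentage of sampled misses} for a list of latencies."""
    bounds = [upper for upper, _ in BUCKETS]
    counts = Counter()
    for cycles in latencies:
        # bisect_left finds the first bucket whose upper bound is >= cycles.
        idx = bisect_left(bounds, cycles)
        counts[BUCKETS[idx][1]] += 1
    total = len(latencies)
    return {
        name: (100.0 * counts[name] / total) if total else 0.0
        for _, name in BUCKETS
    }
```

With these assumed thresholds, four sampled misses of 5, 8, 20, and 200 cycles would be split as two L2 accesses, one L3 access, and one memory access.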

The latency bucket information is particularly useful for understanding the data cache access behavior of large multithreaded, multiprocess enterprise applications and for fine-tuning them. For example, if a large percentage of data cache misses are due to 1-hop or 2-hop C2C accesses, this could indicate that the processes are sharing data and running on CPUs in two different cells. You might be able to improve performance significantly by scheduling those processes to run on CPUs within the same cell.
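The reasoning above can be sketched as a simple check on the bucket percentages. This Python fragment is only an illustration: the bucket labels, the function names, and the 50 percent threshold are hypothetical and are not taken from HP Caliper output or documentation.

```python
# Hypothetical labels for the buckets that involve another cell or another
# CPU's cache; the labels in an actual report may differ.
REMOTE_BUCKETS = {
    "1-hop memory access",
    "2-hop memory access",
    "cache-to-cache (C2C) access",
}


def remote_share(percent_by_bucket):
    """Sum the percentages of misses that fall into the remote buckets."""
    return sum(
        pct for name, pct in percent_by_bucket.items() if name in REMOTE_BUCKETS
    )


def suggests_cross_cell_sharing(percent_by_bucket, threshold=50.0):
    """Return True when remote accesses reach the (assumed) threshold."""
    return remote_share(percent_by_bucket) >= threshold
```

A True result is only a hint that process placement is worth examining; it does not by itself show that the processes share data.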

You can turn off the latency bucket information by using the --latency-buckets False option.

On HP-UX, HP Caliper uses the model command to determine the CPU type and CPU frequency.

On Linux, you need to use the --system-model option to help HP Caliper determine the CPU type and CPU frequency. If you do not use this option, HP Caliper divides the misses into the following three buckets by default: L2 cache access, L3 cache access, and memory access.

The report shows measured data by thread, load module, function, statement, and instruction.

Command-line options let you control the amount of data reported, how the data is sorted, and the number of statements and instructions reported for each sampled program location.

You can use the --dcache-data-profile option to get Data Summary output with a report. See “Using the --dcache-data-profile Option to Produce a Data Summary” (page 191).

Example Command Line for Text Report

$ caliper dcache -o reports/dcachem.txt ./matmul

Example Command Line for CSV Report

$ caliper dcache --csv csvout ./wordplay thequickbrownfox

This section describes the metrics summed over the entire run of your application under HP Caliper.

Metrics for Integrity Servers Itanium 2 Systems

L1D_READS	The number of data memory read references issued into
	the memory pipeline that are serviced by the L1 data cache
	(only integer loads), register stack engine (RSE) loads,
	L1-hinted loads (the L1 data cache returns data if it hits in
	the L1 data cache but does not do a fill), and check loads
	(ld.c). Non-cacheable reads, virtual hash page table (VHPT)
	loads, semaphores, floating-point loads, and lfetch
	instructions are not counted here because the L1 data cache
	does not handle these. The count includes wrong-path
	operations but excludes predicated-off operations.
L1D_READ_MISSES.ALL	The number of L1 data cache read misses. The L1 data cache
	is write-through; therefore, write misses are not counted.
	The count includes only misses caused by references counted
	by an L1D_READS event. It includes L1 data cache misses
	that missed the ALAT, but not those that hit in the ALAT.
	Semaphores are not handled by the L1 data cache and are
	not included in this count.
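
Because L1D_READ_MISSES.ALL counts only misses caused by references that L1D_READS also counts, the two values can be combined into an L1 data cache read miss ratio. The following Python sketch shows the arithmetic; the function name and the sample counts are hypothetical and are not part of HP Caliper or taken from a real report.

```python
def l1d_read_miss_ratio(l1d_reads, l1d_read_misses_all):
    """Fraction of counted L1 data cache reads that missed (0.0 if no reads)."""
    if l1d_reads <= 0:
        # Avoid dividing by zero when no reads were counted.
        return 0.0
    return l1d_read_misses_all / l1d_reads


# Hypothetical counts: 50 misses out of 1000 counted reads is a 5% miss ratio.
ratio = l1d_read_miss_ratio(1000, 50)
print(f"L1D read miss ratio: {ratio:.1%}")
```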
