Data Summary

---------------------------------------------------------------------------------------------------------------
% Total                               Avg. ------Latency buckets as % Misses------
 Dcache  Cumulat  Sampled   Dcache  Dcache   L2   L3 -----------Memory------------
Latency     % of   Dcache  Latency  Laten.
 Cycles    Total   Misses   Cycles  Cycles    7   14   64  150  250  350  450    >  Data Entry
---------------------------------------------------------------------------------------------------------------
  66.82    66.82       42      580    13.8   62   29    7    0    0    2    0    0  Heap
   7.72    74.54       10       67     6.7   80   20    0    0    0    0    0    0  Memory mapped shared library
   5.65    80.18        5       49     9.8   40   60    0    0    0    0    0    0  Process Text Region
   4.84    85.02        4       42    10.5   25   50   25    0    0    0    0    0  libc.so.1::_arena_rmutex
   4.72    89.75        5       41     8.2   40   60    0    0    0    0    0    0  Process Data Region
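
The numeric columns in this sample report are related by simple arithmetic. The following Python sketch recomputes three of them from the sampled miss count and latency cycles of each row. The total of 868 latency cycles is not printed in the report; it is an assumption inferred from the first row (580 cycles at 66.82%), and the function name is hypothetical.

```python
# Rows from the sample report: (data entry, sampled misses, latency cycles).
ROWS = [
    ("Heap", 42, 580),
    ("Memory mapped shared library", 10, 67),
    ("Process Text Region", 5, 49),
    ("libc.so.1::_arena_rmutex", 4, 42),
    ("Process Data Region", 5, 41),
]

# Assumption: total sampled latency cycles across all entries, inferred
# from the first row (580 / 0.6682 is about 868). The report does not print it.
TOTAL_LATENCY_CYCLES = 868


def summarize(rows, total_cycles):
    """Return (entry, pct_total, cumulative_pct, avg_latency) for each row."""
    result = []
    cumulative = 0.0
    for entry, misses, cycles in rows:
        pct = 100.0 * cycles / total_cycles   # % Total Dcache Latency Cycles
        cumulative += pct                     # Cumulat % of Total
        avg = cycles / misses                 # Avg. Dcache Laten. Cycles
        result.append((entry, round(pct, 2), round(cumulative, 2), round(avg, 1)))
    return result


for entry, pct, cum, avg in summarize(ROWS, TOTAL_LATENCY_CYCLES):
    print(f"{pct:7.2f}  {cum:7.2f}  {avg:6.1f}  {entry}")
```

Under that assumed total, the sketch reproduces the % Total, Cumulat % of Total, and Avg. Dcache Laten. values shown for the five rows of the report.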

The Data Entry column shows the global variable name, process region name, or unknown data address.

The process regions are:

Process Text Region - the address space occupied by the process text/instructions

Process Data Region - the address space occupied by initialized data and uninitialized data (.bss)

Heap - the address space where dynamically allocated memory resides

Data and Heap combined - when HP Caliper cannot discover the data and heap regions separately

Process Stack Region - the user stack area

Shared mem - all the shared memory areas mapped to the process

RSE Stack - the RSE stack area

Memory mapped shared library - the data area of the shared libraries mapped to the process

Memory mapped region - all other memory mapped regions

If there is more than one region of the same type, they are combined and reported as a single entry.
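
As a rough illustration of this combining step, the sketch below sums the samples of regions that share a type into one entry. The region list, the tuple layout, and the function name are hypothetical and are not taken from HP Caliper.

```python
from collections import defaultdict


def combine_regions(regions):
    """Sum (sampled misses, latency cycles) over regions with the same type."""
    combined = defaultdict(lambda: [0, 0])
    for region_type, misses, cycles in regions:
        combined[region_type][0] += misses
        combined[region_type][1] += cycles
    return {name: tuple(totals) for name, totals in combined.items()}


# Hypothetical input: two shared memory segments and one heap.
regions = [
    ("Shared mem", 3, 30),
    ("Heap", 42, 580),
    ("Shared mem", 2, 25),
]
print(combine_regions(regions))
```

Both "Shared mem" segments end up in a single entry, which is how one line in the report can stand for several mapped areas.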

The Data Summary report is generated per-process. For a per-thread report, use the --thread all option. For a per-module report, use the --per-module-data True option.

The Data Summary report can be merged or differenced across two databases that contain the Data Summary information.

If a process exec()s, HP Caliper does not discover the process regions. In this case, the data addresses are mapped to global variables, and any samples that cannot be assigned to a global variable are reported as unknown samples. A diagnostic message is generated with the report.
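
A minimal sketch of this fallback, assuming a sorted table of global variables with start addresses and sizes. The symbol table, addresses, and function name are hypothetical; how HP Caliper performs the lookup internally is not described here.

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, name), sorted by start.
GLOBALS = [
    (0x1000, 64, "config_table"),
    (0x2000, 256, "request_queue"),
]
STARTS = [start for start, _, _ in GLOBALS]


def classify(address):
    """Return the global variable containing address, or "unknown"."""
    # Find the last symbol whose start address is <= address.
    idx = bisect.bisect_right(STARTS, address) - 1
    if idx >= 0:
        start, size, name = GLOBALS[idx]
        if address < start + size:
            return name
    return "unknown"


for addr in (0x1010, 0x1040, 0x20FF, 0x0500):
    print(hex(addr), classify(addr))
```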

How Data Cache Metrics Are Obtained

HP Caliper obtains data cache metrics from the processor's performance monitoring unit (PMU).

Exact counts are obtained from the PMU's set of performance monitor configuration (PMC)/performance monitor data (PMD) register pairs. Sampled data cache metrics are obtained from the PMU's data event address register (D-EAR). Both sets of metrics focus on the L1 cache, with notable exceptions.

HP Caliper takes a sample every Nth data cache miss, where N is defined in the dcache measurement configuration file in the config subdirectory of the HP Caliper home directory. At each sample point, HP Caliper records both the instruction that resulted in the data cache miss and the latency (number of clock cycles) incurred by the miss. You can override the value in the measurement configuration file by using the -s option.
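
The sampling scheme described above can be modeled as a counter that records every Nth miss. This is an illustrative model only: the miss stream, the value of N, and all names are hypothetical, and the real sampling is performed by the PMU hardware rather than in software.

```python
def sample_misses(misses, n):
    """Keep every nth miss from an iterable of (instruction address, latency cycles)."""
    samples = []
    for count, (instruction, latency) in enumerate(misses, start=1):
        if count % n == 0:
            # Record the instruction that missed and the latency it incurred.
            samples.append((instruction, latency))
    return samples


# Hypothetical stream of ten misses with latencies of 7 or 14 cycles.
misses = [(0x4000 + 16 * i, 7 if i % 3 else 14) for i in range(10)]
samples = sample_misses(misses, n=4)

for instruction, latency in samples:
    print(hex(instruction), latency)
```

A larger N records fewer samples per run, so each sample stands for more misses.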

For data cache miss sampling, the PMU can monitor only one data cache load at a time. Since there are likely to be multiple loads in progress at any given moment, the PMU can process only a subset of data cache misses. The PMU randomizes which loads it monitors.

This means that the number of data cache misses observed through sampling (the number of sampled misses multiplied by the sampling rate) is only a subset of the total number of actual data cache misses. Therefore, it is best to interpret sampling data not as an indication of how many data cache
