Getting Performance Advice
$ caliper ecount myapp
$ caliper fprof myapp
    collect and save initial data
$ caliper advise
    generate advice on collected data and possible next steps
Specifying cpu Event Sets
(for cpu measurement; not available on Linux)
broad overview of CPU (includes event sets: cpi, stall, dispersal, l1icache, l1dcache, l2cache, tlb, fp)
restrict overview to execution in kernel space
branch types, path distributions, branches per instruction, …
branch prediction effectiveness
cache coherence activity
cycles per instruction
central electronics complex (CEC) traffic and CPU interaction
control speculation effectiveness
parallelism at instruction dispersal
data speculation effectiveness
execution rate, and flush/trap events density
L1 data cache miss rates
L1 instruction cache miss and prefetch information
L2 unified cache miss rates
L2 data cache miss rates
L2 instruction cache miss rates
L3 unified cache miss rates
bus request queue information
cycles lost due to instruction replays (Intel® Itanium® 9500 processors only)
CPU performance limiters
system bus utilization
impact of HyperThreading
translation lookaside buffer (TLB) misses
Specifying CPU Event Privilege Levels
count CPU_CYCLES events that occur in user space
count CPU_CYCLES that occur in kernel space
record CPI for both user and kernel space (cpu measurement)
sample on CPU_CYCLES in both user and kernel space
set default privilege level to kernel for all events
Choosing CPU Events
(summed across run for ecount, fprof, pmu_trace, {di}cache, {di}tlb, branch)
event | count event events
event,event2,… | count multiple events
Controlling Amount of Data Reported
in Function Summary section, only report functions responsible for 2 percent or more of data
only report functions responsible for 1% or more of data; stop reporting functions when the cumulative percentage is at 80%
in Function Details section, only report functions responsible for 10 percent or more of data
do not include Function Details section in report
show source statements and disassembly in Function Details
report 1 extra source line before and after source lines with data; report 3 extra instructions before/after each instruction with data
in Process Summary section, only report processes responsible for >=5 percent of data
save module information so reports can be generated when modules are not present
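The two-part cutoff described above (a per-function percentage floor plus a cumulative stopping point) can be easier to follow as code. The following is only an illustrative sketch of that filtering rule, not Caliper's implementation: the function name `apply_cutoffs`, its parameters, and the sample data are hypothetical, and it assumes the function that crosses the cumulative limit is still reported.

```python
def apply_cutoffs(percents, min_percent=1.0, cumulative_limit=80.0):
    """Return (name, percent) rows that survive both cutoffs.

    percents maps a function name to its share of the collected data.
    Hypothetical helper: it mirrors the rule described in the text only.
    """
    # List functions from largest share to smallest, as a summary would.
    ranked = sorted(percents.items(), key=lambda item: item[1], reverse=True)
    reported = []
    cumulative = 0.0
    for name, pct in ranked:
        if pct < min_percent:
            break  # every remaining function is smaller still
        reported.append((name, pct))
        cumulative += pct
        if cumulative >= cumulative_limit:
            break  # assumption: the row that crosses the limit is kept
    return reported


# Made-up percentages for illustration only.
sample = {"parse": 45.0, "hash": 30.0, "emit": 12.0, "log": 0.5, "init": 12.5}
for name, pct in apply_cutoffs(sample):
    print(f"{name:8s}{pct:6.1f}%")
```

With the sample data, reporting stops after the third function because the running total has passed 80 percent, so the remaining functions are omitted even though one of them is above the 1 percent floor.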
--dcache-data-profile causes the dcache measurement to create a data summary report listing the global variables and process regions associated with dcache misses
Finding Source Files, Libraries, Executables
(for use in symbol, source, and disassembly reporting)
search dir1, dir2 (short form)
replace /work with /net/work in source file paths when searching for source files
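The path replacement described above amounts to a prefix rewrite applied to each recorded source path before the file is looked up. The sketch below shows that idea only; `remap_source_path` is a hypothetical helper, and it assumes the replacement applies to whole leading path components, which may differ from how Caliper matches paths.

```python
def remap_source_path(path, old_prefix="/work", new_prefix="/net/work"):
    """Rewrite a leading old_prefix to new_prefix; leave other paths alone.

    Hypothetical helper illustrating the /work -> /net/work example.
    """
    old = old_prefix.rstrip("/")
    new = new_prefix.rstrip("/")
    if path == old:
        return new
    # Require a "/" after the prefix so "/workspace" is not rewritten.
    if path.startswith(old + "/"):
        return new + path[len(old):]
    return path


for recorded in ("/work/src/main.c", "/workspace/util.c", "/usr/include/stdio.h"):
    print(recorded, "->", remap_source_path(recorded))
```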
Selecting Measurement Duration
Reporting Threads
same thread starter function name
Specifying Location of Kernel
(for use in symbol and disassembly reporting)
Collecting Memory Usage Data
(for any measurement; not available on Linux)
measure at start and end of run
measure at start/end (short form)
measure every 10 seconds
timed:1m | measure every minute
all | measure once a second and at start and end of run