This prevents the compiler from reordering statements while optimizing code, so the measured program's performance may be worse than it would be otherwise. For example, with sample points inside a loop, loop-invariant promotion or other loop transformations may become illegal or less effective. For sample points placed at the entrance and exit of functions, this could affect performance if the function is inlined.
Unfortunately, the only way to check for such issues is to compare the code the compiler generates with and without those macros, and estimate whether the program measurements are significantly affected.
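The following C sketch illustrates the effect on a loop. SAMPLE_POINT() is a hypothetical stand-in, not the definition HP Caliper ships: it is assumed here to act only as a compiler barrier, written with the GCC/Clang inline-assembly extension. The function names are also made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for a sample-point macro (not HP Caliper's own).
 * The empty asm statement with a "memory" clobber is a GCC/Clang extension:
 * the compiler may not move memory accesses across it. */
#define SAMPLE_POINT() __asm__ volatile("" ::: "memory")

/* No sample point: the load of *scale is loop invariant, so an optimizing
 * compiler is free to perform it once, before the loop. */
long sum_scaled_plain(const long *values, size_t n, const long *scale)
{
    assert(values != NULL && scale != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i] * *scale;
    }
    return total;
}

/* Sample point inside the loop: the compiler can no longer assume that
 * *scale is unchanged between iterations, so it generally reloads it each
 * time, and other loop transformations become harder to apply. */
long sum_scaled_sampled(const long *values, size_t n, const long *scale)
{
    assert(values != NULL && scale != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        SAMPLE_POINT();
        total += values[i] * *scale;
    }
    return total;
}
```

Both functions return the same value; only the generated code differs. Compiling this file to assembly at your usual optimization level (most C compilers accept an option such as -S for this) and comparing the two loop bodies is one way to carry out the check described above.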
Restricting PMU Measurements to Specific Code Regions
By default, HP Caliper measures PMU events for your entire program. You can, however, restrict measurements to specific code regions.
You can use this feature with these measurements:
•alat
•branch
•dcache
•dtlb
•ecount
•fprof
•icache
•itlb
•pmu_trace
•scgprof
While you can also use this feature with the cgprof measurement, it might lead to inconsistent results. This is because the time statistics are collected using the PMU, while the call graph and function counts are collected using dynamic instrumentation.
Reasons to use this feature include:
•Analyzing a particular loop or function. You can restrict measurements to a particular loop to get information such as:
ecount    Number of events occurring in the loop
fprof     Hot spots in the loop
branch    Analysis of the loop branches
dcache    Data cache misses in the loop
•Analyzing a particular phase in an application.
For applications with important startup or shutdown phases, it is sometimes beneficial to limit measurements to the
Similarly, the data collection can be restricted to the startup or shutdown phases to target those for performance improvements.
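As a rough sketch of restricting measurement to one phase, the C fragment below brackets a single phase with a begin/end pair. The functions measure_begin() and measure_end() are hypothetical placeholders that only record state; they are not HP Caliper calls and would have to be replaced by the mechanism your HP Caliper version documents. The phase functions and the table size are likewise invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL placeholders, not HP Caliper API: they only track whether
 * a measured region is open and how many regions have been closed. */
static int measuring = 0;
static unsigned regions_measured = 0;

static void measure_begin(void)
{
    measuring = 1;
}

static void measure_end(void)
{
    assert(measuring);          /* catches an end without a matching begin */
    measuring = 0;
    regions_measured++;
}

/* Startup phase: builds the data the main phase works on. */
static void startup_phase(long *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        table[i] = (long)i;
    }
}

/* Main phase: the region of interest in this sketch. */
static long main_phase(const long *table, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += table[i];
    }
    return total;
}

long run_application(void)
{
    long table[256];

    startup_phase(table, 256);  /* outside the measured region */

    /* The begin/end pair sits outside the loop in main_phase, so the loop
     * body itself contains no barrier. */
    measure_begin();
    long result = main_phase(table, 256);
    measure_end();

    return result;
}
```

Moving the begin/end pair around startup_phase() instead would target the startup phase in the same way.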