HP UX Fortran Software manual Performance and parallelization, Profiling parallelized programs

As this command line implies, if you compile and link separately, you must use f90, not ld, to link. The link command line must also include the +Oparallel and +O3 options in order to link in the parallel runtime support.

Performance and parallelization

To ensure the best runtime performance from programs compiled for parallel execution on a multiprocessor machine, do not run more than one parallel program on the machine at the same time. Running two or more parallel programs simultaneously may result in their sharing the same processors, which will degrade performance. You should run a parallel-executing program at a higher priority than any other user program; see rtprio(1) for information about setting real-time priorities.

Running a parallel program on a heavily loaded system may also slow performance.

Profiling parallelized programs

You can profile a program that has been compiled for parallel execution in much the same way as for non-parallel programs:

1. Compile the program with the +gprof option.

2. Run the program to produce profiling data.

3. Run gprof against the program.

4. View the output from gprof.

The differences are:

Step 2 produces a gmon.out file with the CPU times for all executing threads.

In Step 4, the flat profile that you view uses the following notation to denote DO loops that were parallelized:

routine_name##pr_line_nnnn

where routine_name is the name of the routine containing the loop, pr (parallel region) indicates that the loop was parallelized, and nnnn is the line number of the start of the loop.

Conditions inhibiting loop parallelization

The following sections describe conditions that can cause the compiler not to parallelize. These include the following:

Calling routines with side effects

Indeterminate iteration counts

Data dependences

Calling routines with side effects

The compiler will not parallelize any loop containing a call to a routine that has side effects. A routine has side effects if it does any of the following:

Modifies its arguments

Modifies a global variable, a common-block variable, or a SAVE variable

Redefines variables that are local to the calling routine

Performs I/O

Calls another subroutine or function that does any of the above
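The contrast between a routine with side effects and one without can be sketched as follows. This is an illustrative example only: the names counters, ncalls, scale_pure, bump_counter, and side_effect_demo are hypothetical and do not come from the manual, and whether the compiler actually parallelizes the first loop also depends on its other analyses.

```fortran
! Illustrative sketch; all names here are hypothetical.
module counters
  implicit none
  integer :: ncalls = 0            ! global (module) variable
contains
  ! No side effects: reads its argument and returns a value.
  function scale_pure(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x
  end function scale_pure

  ! Side effects: modifies its argument and a global variable.
  subroutine bump_counter(x)
    real, intent(inout) :: x
    x = 2.0 * x                    ! modifies its argument
    ncalls = ncalls + 1            ! modifies a global variable
  end subroutine bump_counter
end module counters

program side_effect_demo
  use counters
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  integer :: i

  a = 1.0
  b = 1.0

  ! The only call in this loop is to a routine without side effects,
  ! so the call itself does not inhibit parallelization.
  do i = 1, n
    a(i) = scale_pure(a(i))
  end do

  ! This loop calls a routine that modifies its argument and a global
  ! variable, which matches the side-effect conditions listed above.
  do i = 1, n
    call bump_counter(b(i))
  end do

  print *, a(1), b(1), ncalls
end program side_effect_demo
```

In the second loop, the update to ncalls is shared by every iteration, which is the kind of hidden interaction that makes such a call unsafe to run in parallel without further information.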

You can use the DIR$ NO SIDE EFFECTS directive to force the compiler to ignore side effects when determining whether to parallelize the loop. For information about this directive, see .
