VampirTrace 5.8.3 User Manual
Contents
1. Variable / Purpose

Global Settings
VT_APPPATH          Path to the application executable. (Section 2.3.2)
VT_BUFFER_SIZE      Size of internal event trace buffer. This is the place where event records are stored before being written to a file. (Section 3.3)
VT_CLEAN            Remove temporary trace files?
VT_COMPRESSION      Write compressed trace files?
VT_FILE_PREFIX      Prefix used for trace filenames.
VT_FILE_UNIQUE      Enable unique trace file naming? Set to yes, no, or a numerical ID. (Section 3.1)
VT_MAX_FLUSHES      Maximum number of buffer flushes. (Section 3.3)
VT_MAX_THREADS      Maximum number of threads (< 65536) per process that VampirTrace reserves resources for.
VT_PFORM_GDIR       Name of global directory to store final trace file in.
VT_PFORM_LDIR       Name of node-local directory which can be used to store temporary trace files.
VT_UNIFY            Unify local trace files afterwards?
VT_VERBOSE          Level of VampirTrace-related information messages: Quiet (0), Critical (1), Information (2).

Optional Features
VT_CPUIDTRACE       Enable tracing of core ID of a CPU? (Section 4.4)
VT_ETIMESYNC        Enable enhanced timer synchronization? (Section 3.7)
VT_ETIMESYNC_INTV   Interval between two successive synchronization phases in s.
VT_IOLIB_PATHNAME   Provides an alternative library to use for LIBC I/O calls. (Section 4.6)
VT_IOTRACE          Enable tracing of application I/O calls? (Section 4.6)
VT_LIBCTRACE        Enable tracing of fork/system/exec calls? (Secti
2. D.4 How can I speed up trace unification?

vtunify is an OpenMP-parallel application that operates on all local traces and produces the final OTF trace. Normally it is called automatically by VampirTrace after the actual application has run to completion. vtunify opens as many threads as specified by the OMP_NUM_THREADS environment variable. If the variable is not set, it uses only a single thread, so one should set OMP_NUM_THREADS also for applications that normally do not use OpenMP. To speed up trace unification, one can disable automatic trace unification by setting the environment variable VT_UNIFY to no and unify the trace manually (described in Section 3.5).

D.5 The application has run to completion, but there is no *.otf file. What can I do?

The absence of an *.otf file usually means that the trace was not unified. This is the case on certain platforms, e.g. when using DYNINST or when the local traces are not available when the application ends and VampirTrace performs trace unification. In those cases, *.uctl files can be found in the directory of the trace file and the user needs to perform trace unification manually. See Sections 3.5 and B.2 to learn more about using vtunify.

D.6 What limitations are associated with VT_ON/VT_OFF?

Starting and stopping tracing by using the VT_ON/VT_OFF calls is considered advanced usage of VampirTrace and should be performe
3. appargs

Options:
  -h, --help                 Show this help message.
  -v, --verbose              Enable verbose mode.
  -s, --shlib <shlib>[,...]  Comma-separated list of shared libraries which should also be instrumented.
  -b, --blacklist <bfile>    Set path of blacklist file containing a newline-separated list of functions which should not be instrumented.
  -p, --pid <pid>            Application's process id (attaches the mutator to a running process).
  app                        Path of application executable.
  appargs                    Application's arguments.

B.4 Trace Filter Tool (vtfilter)

vtfilter - filter generator for VampirTrace.

Syntax:
  Filter a trace file using an already existing filter file:
    vtfilter --filt [filt-options] <input trace file>
  Generate a filter:
    vtfilter --gen [gen-options] <input trace file>

general options:
  -h, --help     Show this help message.
  -p             Show progress.

filt-options:
  -to <file>     Output trace file name.
  -fi <file>     Input filter file name.
  -z <zlevel>    Set the compression level. Level reaches from 0 to 9, where 0 is no compression and 9 is the highest level. Standard is 4.
  -f <n>         Set max number of file handles available. Standard is 256.

gen-options:
  -fo <file>     Output filter file name.
  -r <n>         Reduce the trace size to <n> percent of the original size. The program relies on the fact that the major part of the tra
4. LIBC implementation provides a special hook mechanism that allows intercepting all calls to memory allocation and free functions (e.g. malloc, realloc, free). This is independent from compilation or source code access, but relies on the underlying system library. If VampirTrace has been built with memory-tracing support (Appendix A), VampirTrace is capable of recording memory allocation information as part of the event records. To request the measurement of the application's allocated memory, the user must set the environment variable VT_MEMTRACE to yes.

Note: This approach to get memory allocation information requires changing internal function pointers in a non-thread-safe way, so VampirTrace currently does not support memory tracing for multi-threaded programs, e.g. programs parallelized with OpenMP or Pthreads.

4.4 CPU ID Counter

The GNU LIBC implementation provides a function to determine the core ID of the CPU on which the calling thread is running. VampirTrace uses this functionality to record the current core identifier as a counter. This feature can be activated by setting the environment variable VT_CPUIDTRACE to yes.

Note: To use this feature you need the GNU LIBC implementation in version 2.6 or later.

4.5 Pthread API Calls

When tracing applications with Pthreads, only user events and functions are recorded which are automatically or manually instrumented. Pth
5. PAPI_MEM_RCY   Cycles stalled waiting for memory reads
PAPI_MEM_WCY   Cycles stalled waiting for memory writes
PAPI_STL_ICY   Cycles with no instruction issue
PAPI_FUL_ICY   Cycles with maximum instruction issue
PAPI_STL_CCY   Cycles with no instructions completed
PAPI_FUL_CCY   Cycles with maximum instructions completed
PAPI_BR_UCN    Unconditional branch instructions
PAPI_BR_CN     Conditional branch instructions
PAPI_BR_TKN    Conditional branch instructions taken
PAPI_BR_NTK    Conditional branch instructions not taken
PAPI_BR_MSP    Conditional branch instructions mispredicted
PAPI_BR_PRC    Conditional branch instructions correctly predicted
PAPI_FMA_INS   FMA instructions completed
PAPI_TOT_IIS   Instructions issued
PAPI_TOT_INS   Instructions completed
PAPI_INT_INS   Integer instructions
PAPI_FP_INS    Floating point instructions
PAPI_LD_INS    Load instructions
PAPI_SR_INS    Store instructions
PAPI_BR_INS    Branch instructions
PAPI_VEC_INS   Vector/SIMD instructions
PAPI_LST_INS   Load/store instructions completed
PAPI_SYC_INS   Synchronization instructions completed
PAPI_FML_INS   Floating point multiply instructions
PAPI_FAD_INS   Floating point add instructions
PAPI_FDV_INS   Floating point divide instructions
PAPI_FSQ_INS   Floating poi
6. and synchronized buffer flush.

Note: Be aware that the asynchronous behavior of the application will be disturbed, since VampirTrace makes use of asynchronous MPI collective functions for timer synchronization and synchronized buffer flush. Only make use of these approaches if your application does not rely on an asynchronous behavior. Otherwise, keep this fact in mind during the process of performance analysis.

(Footnote 1: www.netlib.org/clapack)

4 Recording Additional Events and Counters

4.1 Hardware Performance Counters

If VampirTrace has been built with hardware counter support (Appendix A), it is capable of recording hardware counter information as part of the event records. To request the measurement of certain counters, the user is required to set the environment variable VT_METRICS. The variable should contain a colon-separated list of counter names or a predefined platform-specific group. The user can leave the environment variable unset to indicate that no counters are requested. If any of the requested counters are not recognized, or the full list of counters cannot be recorded due to hardware resource limits, program execution will be aborted with an error message.

PAPI Hardware Performance Counters

If the PAPI library is used to access hardware performance counters, metric names can be any PAPI preset names or PAPI na
7. before using VampirTrace's compiler wrappers (Section 2.1) or launching an instrumented application. For example:

    ./configure --prefix=/opt/vampirtrace
    make install
    mv /opt/vampirtrace $HOME/vampirtrace
    export VT_PREFIX=$HOME/vampirtrace

D.9 I have a question that is not answered in this document!

You may contact us at vampirsupport@zih.tu-dresden.de for support on installing and using VampirTrace.

D.10 I need support for additional features so I can trace application xyz.

Suggestions are always welcome (contact vampirsupport@zih.tu-dresden.de), but there is a chance that we cannot implement all your wishes, as our resources are limited. In any case, the source code of VampirTrace is open to everybody, so you may implement support for new features yourself. If you provide us with your additions afterwards, we will consider merging them into the official VampirTrace package.
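For the relocation described in D.8, the environment can be prepared in one place before the compiler wrappers or an instrumented application are started. The following is a minimal sketch and not part of the original manual: the helper name use_relocated_vt is hypothetical, and extending PATH with the bin subdirectory is an assumption based on the environment set-up in Section A.4.

```shell
# Hypothetical helper (not part of VampirTrace): point VT_PREFIX at a
# relocated installation and make its tools reachable through PATH.
use_relocated_vt() {
    new_prefix=$1
    if [ ! -d "$new_prefix" ]; then
        echo "no such directory: $new_prefix" >&2
        return 1
    fi
    # VT_PREFIX tells VampirTrace where the installation now lives (D.8).
    export VT_PREFIX="$new_prefix"
    # Assumption: the tools live in <prefix>/bin, as in Section A.4.
    export PATH="$new_prefix/bin:$PATH"
}
```

A call such as use_relocated_vt "$HOME/vampirtrace" would then replace the single export line of the example, with the added check that the directory actually exists.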
8. can enable tracing of specific resource counters by setting the environment variable VT_RUSAGE to a colon-separated list of counter names, as specified in Section C.4. For example, set

    VT_RUSAGE=ru_stime:ru_majflt

to record the system time consumed by each process and the number of page faults. Alternatively, one can set this variable to the value all to enable recording of all 16 resource usage counters. Note that not all counters are supported by all Unix operating systems. Linux 2.6 kernels, for example, support only resource information for six of them. See Section C.4 and the manual page of getrusage for details.

The resource usage counters are not recorded at every event. They are only read if 100 ms have passed since the last sampling. The interval can be changed by setting VT_RUSAGE_INTV to the number of desired milliseconds. Setting VT_RUSAGE_INTV to zero leads to sampling resource usage counters at every event, which may introduce a large runtime overhead. Note that in most cases the operating system does not update the resource usage information at the same high frequency as the hardware performance counters. Setting VT_RUSAGE_INTV to a value less than 10 ms usually does not improve the granularity.

Be aware that, when using the resource usage counters for multi-threaded programs, the information displayed is valid for the whole process and not for each single thread.

4.3 Memory Allocation Counter

The GNU
9. executables. Note that when linking statically, a warning like the following may be issued:

    Using dlopen in statically linked applications requires at runtime the shared libraries from the glibc version used for linking.

This is OK as long as the mentioned libraries are available for running the application. If you would like to experiment with some other I/O library, set the environment variable VT_IOLIB_PATHNAME to the alternative one. Beware that this library must provide all I/O functions mentioned above; otherwise VampirTrace will abort.

4.7 fork/system/exec Calls

If VampirTrace has been built with LIBC trace support (Appendix A), it is capable of tracing programs which call functions from the LIBC exec family (execl, execlp, execle, execv, execvp, execve), system, and fork. VampirTrace records the call of the LIBC function to the trace. This feature works for sequential programs only (i.e. no MPI or threaded parallelization). It works for both dynamically and statically linked executables. Note that when linking statically, a warning like the following may be issued:

    Using dlopen in statically linked applications requires at runtime the shared libraries from the glibc version used for linking.

This is OK as long as the mentioned libraries are available for running the application. When VampirTrace detects a call of an exec function, the current trace fi
10. have to be separated by ; and may contain occurrences of * for wildcard matching. The user-supplied filter rules will be applied before the default filter, and the first match counts, so it is possible to include items that would otherwise be excluded by the default filter.

5.3 Function Grouping

VampirTrace allows assigning functions/regions to a group. Groups can, for instance, be highlighted by different colors in Vampir displays. The following standard groups are created by VampirTrace:

Group name     Contained functions/regions
MPI            MPI functions
OMP            OpenMP API function calls
OMP_SYNC       OpenMP barriers
OMP_PREG       OpenMP parallel regions
Pthreads       Pthread API function calls
MEM            Memory allocation functions (Section 4.3)
I/O            I/O functions (Section 4.6)
LIBC           LIBC fork/system/exec functions (Section 4.7)
Application    remaining instrumented functions and source code regions

Additionally, you can create your own groups, e.g. to better distinguish different phases of an application. To use function/region grouping, set the environment variable VT_GROUPS_SPEC to the path of a file which contains the group assignments. Below is an example of how to use group assignments:

    # VampirTrace region groups specification
    #
    # group definitions and region assignments
    #
    # syntax: <group>=<regions>
    #
    #    group      group name
    #    regions    semicolon-separat
11. only a few seconds can result in trace files of several hundred megabytes. To protect users from creating trace files of several gigabytes, the default behavior of VampirTrace limits the internal buffer to 32 MB per process. Thus, even for larger-scale runs, the total trace file size will be moderate. Please read Section 3.3 on how to remove or change this limit.

VampirTrace supports various Unix and Linux platforms that are common in HPC nowadays. It is available as open source software under a BSD License.

The following list shows a summary of all instrumentation and tracing features that VampirTrace offers. Note that not all features are supported on all platforms.

Tracing of user functions (Chapter 2)
• Record function enter and leave events
• Record name and source code location (file name, line)
• Automatic instrumentation with many compilers and via Dyninst
• Manual instrumentation using VampirTrace API

Footnotes:
1 http://www.tu-dresden.de/zih/otf
2 http://www.vampir.eu
3 http://www.open-mpi.org/faq/?category=vampirtrace

MPI Tracing (Chapter 2)
• Record MPI functions
• Record MPI communication: participating processes, transferred bytes, tag, communicator

OpenMP Tracing (Chapter 2)
• OpenMP directives, synchronization, thread idle time
• Hybrid MPI and OpenMP applications are also supported

Pthread Tracing
• Trace POSIX thread API calls (Section 4.5)
• Hybrid MPI and POSIX threads applications are also supported

Jav
12. post-mortem filter, i.e. exclude functions from being recorded in the trace
• Runtime grouping, i.e. assign functions to groups for improved analysis

OTF Output (Chapter 3)
• Writes compressed OTF files
• Output as trace file, statistical summary (profile), or both

2 Instrumentation

To perform measurements with VampirTrace, the user's application program needs to be instrumented, i.e., at specific points of interest (called "events") VampirTrace measurement calls have to be activated. Common events are, among others, entering and leaving of functions as well as sending and receiving of MPI messages.

VampirTrace handles this automatically by default. In order to enable the instrumentation of function calls, the user only needs to replace the compiler and linker commands with VampirTrace's wrappers (see Section 2.1 below). VampirTrace supports different ways of instrumentation, as described in Section 2.2.

2.1 Compiler Wrappers

All the necessary instrumentation of user functions, MPI, and OpenMP events is handled by VampirTrace's compiler wrappers (vtcc, vtcxx, vtf77, and vtf90). In the script used to build the application (e.g. a makefile), all compile and link commands should be replaced by the VampirTrace compiler wrapper. The wrappers perform the necessary instrumentation of the program and link the
13. foo.o
    vtcc -vt:cc gcc -vt:inst compinst -c bar.c -o bar.o
    vtcc -vt:cc gcc -vt:inst compinst foo.o bar.o -o foo

  manual instrumentation by using VT's API:

    vtf90 -vt:inst manual foobar.F90 -o foobar -DVTRACE

  IMPORTANT: Fortran source files instrumented by VT's API have to be preprocessed by CPP.

B.2 Local Trace Unifier (vtunify)

vtunify[-mpi] - local trace unifier for VampirTrace.

Syntax: vtunify[-mpi] <#files> <iprefix> [options...]

Options:
  -h, --help          Show this help message.
  #files              Number of local trace files (equal to # of *.uctl files).
  iprefix             Prefix of input trace filename.
  -o <oprefix>        Prefix of output trace filename.
  -s <statsofile>     Statistics output filename (default: <oprefix>.stats).
  -c, --nocompress    Don't compress output trace files.
  -k, --keeplocal     Don't remove input trace files.
  -p, --progress      Show progress.
  -q, --quiet         Enable quiet mode (only emergency output).
  -v, --verbose       Increase output verbosity (can be used more than once).

B.3 Dyninst Mutator (vtdyn)

vtdyn - Dyninst Mutator for VampirTrace.

Syntax: vtdyn [-v|--verbose] [-s|--shlib <shlib>[,...]] [-b|--blacklist <bfile>] [-p|--pid <pid>] <app>
14. For other systems the default name is a.otf. Optionally, the trace file name can be defined manually by setting the environment variable VT_FILE_PREFIX to the desired name. The suffix .otf will be added automatically.

To prevent overwriting of trace files by repetitive program runs, one can enable unique trace file naming by setting VT_FILE_UNIQUE to yes. In this case, VampirTrace adds a unique number to the file names as soon as a second trace file with the same name is created. A *.lock file is used to count up the number of trace files in a directory. Be aware that VampirTrace potentially overwrites an existing trace file if you delete this lock file. The default value of VT_FILE_UNIQUE is no. You can also set this variable to a number greater than zero, which will be added to the trace file name. This way you can manually control the unique file naming.

The default location of the final trace file is the working directory at application start time. If the trace file shall be stored in another place, use VT_PFORM_GDIR as described in Section 3.2 to change the location of the trace file.

3.2 Environment Variables

The following environment variables can be used to control the measurement of a VampirTrace-instrumented executable:

Variable
VT_APPPATH
VT_BUFFER_SIZE
VT_CLEAN
VT_COMPRESSION
VT_FILE_PREFIX
VT_FILE_UNIQUE
15. So, all in all, you should use the following order to install with correctness checking support:

1. Marmot (see http://www.hlrs.de/organization/av/amt/research/marmot)
2. UniMCI (see http://www.tu-dresden.de/zih/unimci)
3. VampirTrace (see http://www.tu-dresden.de/zih/vampirtrace)

Information on how to install Marmot and UniMCI is given in their respective manuals. VampirTrace will automatically detect a UniMCI installation if the unimci-config tool is in PATH.

4.9 User-defined Counters

In addition to the manual instrumentation (Section 2.4.1), the VampirTrace API provides instrumentation calls which allow recording of program variable values (e.g. iteration counts, calculation results, ...) or any other numerical quantity. A user-defined counter is identified by its name, the counter group it belongs to, the type of its value (integer or floating-point), and the unit that the value is quoted in (e.g. "GFlop/sec").

The VT_COUNT_GROUP_DEF and VT_COUNT_DEF instrumentation calls can be used to define counter groups and counters:

Fortran:
    #include "vt_user.inc"
    integer :: id, gid
    VT_COUNT_GROUP_DEF('name', gid)
    VT_COUNT_DEF('name', 'unit', type, gid, id)

C/C++:
    #include "vt_user.h"
    unsigned int id, gid;
    gid = VT_COUNT_GROUP_DEF("name");
    id = VT_COUNT_DEF("name", "unit", type, gid);

The definition of a counter group i
16. ZIH, Center for Information Services & High Performance Computing

VampirTrace 5.8.3 User Manual

TU Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden
Germany
http://www.tu-dresden.de/zih
http://www.tu-dresden.de/zih/vampirtrace
Contact: vampirsupport@zih.tu-dresden.de

Contents

1 Introduction
2 Instrumentation
  2.1 Compiler Wrappers
  2.2 Instrumentation Types
  2.3 Automatic Instrumentation
    2.3.1 Supported Compilers
    2.3.2 Notes for Using the GNU, Intel, or PathScale Compiler
    2.3.3 Notes on Instrumentation of Inline Functions
    2.3.4 Instrumentation of Loops with OpenUH Compiler
  2.4 Manual Instrumentation
    2.4.1 Using the VampirTrace API
    2.4.2 Measurement Controls
  2.5 Binary Instrumentation Using Dyninst
  2.6 Tracing Java Applications Using JVMTI
  2.7 Tracing Calls to 3rd-Party Libraries
3 Runtime Measurement
  3.1 Trace File Name and Location
  3.2 Envi
17. a Tracing (Chapter 2)
• Record method calls
• Using JVMTI as interface between VampirTrace and Java applications

3rd-Party Library Tracing (Section 2.7)
• Trace calls to arbitrary third-party libraries
• Generate wrapper for library functions based on library's header file(s)
• No recompilation of application or library is required

MPI Correctness Checking (Section 4.8)
• Record MPI usage errors
• Using UniMCI as interface between VampirTrace and an MPI correctness checking tool (e.g. Marmot)

User API
• Manual instrumentation of source code regions (Section 2.4.1)
• Measurement controls (Section 2.4.2)
• User-defined counters (Section 4.9)
• User-defined markers (Section 4.10)

Performance Counters (Sections 4.1 and 4.2)
• Hardware performance counters using PAPI, CPC, or NEC SX performance counter
• Resource usage counters using getrusage

Memory Tracing (Section 4.3)
• Trace GLIBC memory allocation and free functions
• Record size of currently allocated memory as counter

I/O Tracing (Section 4.6)
• Trace LIBC I/O calls
• Record I/O events: file name, transferred bytes

CPU ID Tracing (Section 4.4)
• Trace core ID of the CPU on which the calling thread is running
• Record core ID as counter

Fork/System/Exec Tracing (Section 4.7)
• Trace applications calling LIBC's fork, system, or one of the exec functions
• Add forked processes to the trace

Filtering & Grouping (Chapter 5)
• Runtime and
18. alls to third-party libraries which come with at least one C header file, even without the library's source code. If VampirTrace was built with support for library tracing, the tool vtlibwrapgen can be used to generate a wrapper library to intercept each call to the actual library functions. This wrapper library can be linked to the application or used in combination with the LD_PRELOAD mechanism provided by Linux. The generation of a wrapper library is done using the vtlibwrapgen command and consists of two steps.

The first step generates a C source file providing the wrapped functions of the library header file:

    vtlibwrapgen -g SDL -o SDLwrap.c /usr/include/SDL/*.h

This generates the source file SDLwrap.c, which contains wrapper functions for all library functions found in the header files located in /usr/include/SDL/, and instructs VampirTrace to assign these functions to the new group SDL.

The generated wrapper source file can be edited in order to add manual instrumentation or alter attributes of the library wrapper. A detailed description can be found in the generated source file or in the header file vt_libwrap.h, which can be found in the include directory of VampirTrace.

To adapt the library instrumentation, it is possible to pass a filter file to the generation process. The rules are like those for normal VampirTrace instrumentation (see Section 5.1), where only 0-exclude functions a
19. apgen --build [build-options] <input lib. wrapper source file>

options:
  --gen                 Generate a library wrapper source file (default). See gen-options below for valid options.
  --build               Build a wrapper library from a generated source file. See build-options below for valid options.
  -q, --quiet           Enable quiet mode (only emergency output).
  -v, --verbose         Increase output verbosity (can be used more than once).
  -h, --help            Show this help message.

gen-options:
  -o, --output=FILE     Pathname of output wrapper source file (default: wrap.c).
  -l, --shlib=SHLIB     Pathname of shared library that contains the actual library functions (can be used more than once).
  -f, --filter=FILE     Pathname of input filter file.
  -g, --group=NAME      Separate function group name for wrapped functions.
  -s, --sysheader=FILE  Header file to be included additionally.
  --nocpp               Don't use preprocessor.
  --keepcppfile         Don't remove preprocessed header files.
  --cpp=CPP             C preprocessor command (default
  --cppflags=CPPFLAGS   C preprocessor
  --cppdir=DIR          Change to this

  environment variables:
    VT_CPP              C preprocessor
    VT_CPPFLAGS         C preprocessor

build-options:
  -o, --output=PREFIX   Prefix of outpu (default: libwr
  --shared              Do only build s
  --static              Do only build s
  --libtool=LT          Libtool command
  --cc=CC               C compiler comm (default: gcc
  cf
20. ce are function calls. The approximation of size will get worse with a rising percentage of communication and other non-function-calling or performance counter records.
  -l <n>             Limit the number of accepted function calls for filtered functions to <n>. Standard is 0.
  -ex <f>,<f>,...    Exclude certain symbols from filtering. A symbol may contain wildcards.
  -in <f>,<f>,...    Force to include certain symbols into the filter. A symbol may contain wildcards.
  -inc               Automatically include children of included functions as well into the filter.
  -stats             Prints out the desired and the expected percentage of file size.

environment variables:
  TRACEFILTER_EXCLUDEFILE   Specifies a file containing a list of symbols not to be filtered. The list of members can be separated by space, comma, tab, or newline and may contain wildcards.
  TRACEFILTER_INCLUDEFILE   Specifies a file containing a list of symbols to be filtered.

B.5 Library Wrapper Generator (vtlibwrapgen)

vtlibwrapgen - library wrapper generator for VampirTrace.

Syntax:
  Generate a library wrapper source file:
    vtlibwrapgen [gen-options] <input header file> [input header file...]
  Build a wrapper library from a generated source file:
    vtlibwr
21. check whether your choice works properly, use the command papi_event_chooser.

PAPI_L[1|2|3]_[D|I|T]C[M|H|A|R|W]   Level 1/2/3 data/instruction/total cache misses/hits/accesses/reads/writes
PAPI_L[1|2|3]_[LD|ST]M              Level 1/2/3 load/store misses
PAPI_CA_SNP    Requests for a snoop
PAPI_CA_SHR    Requests for exclusive access to shared cache line
PAPI_CA_CLN    Requests for exclusive access to clean cache line
PAPI_CA_INV    Requests for cache line invalidation
PAPI_CA_ITV    Requests for cache line intervention
PAPI_BRU_IDL   Cycles branch units are idle
PAPI_FXU_IDL   Cycles integer units are idle
PAPI_FPU_IDL   Cycles floating point units are idle
PAPI_LSU_IDL   Cycles load/store units are idle
PAPI_TLB_DM    Data translation lookaside buffer misses
PAPI_TLB_IM    Instruction translation lookaside buffer misses
PAPI_TLB_TL    Total translation lookaside buffer misses
PAPI_BTAC_M    Branch target address cache misses
PAPI_PRF_DM    Data prefetch cache misses
PAPI_TLB_SD    Translation lookaside buffer shootdowns
PAPI_CSR_FAL   Failed store conditional instructions
PAPI_CSR_SUC   Successful store conditional instructions
PAPI_CSR_TOT   Total store conditional instructions
PAPI_MEM_SCY   Cycles stalled waiting for memory accesses
22. d with care. When restarting the recording of events, the call stack of the application has to have the same depth as when stopping the recording. For example, this can be ensured by calling VT_OFF and VT_ON in the same function. In addition, stopping tracing while waiting for MPI messages can cause those MPI messages not to be recorded in the trace. This can cause problems when analyzing the OTF trace afterwards, e.g. with Vampir.

D.7 VampirTrace warns that it "cannot lock file a.lock", what's wrong?

For unique naming of multiple trace files in the same directory, a file *.lock is created and locked for exclusive access if VT_FILE_UNIQUE is set to yes (Section 3.1). Some file systems do not implement file locking. In this case, VampirTrace still tries to name the trace files uniquely, but this may fail in certain cases. Alternatively, you can manually control the unique file naming by setting VT_FILE_UNIQUE to a different numerical ID for each program run.

D.8 Can I relocate my VampirTrace installation without rebuilding from source?

VampirTrace hard-codes some directory paths in its executables and libraries based on installation paths specified by the configure script. However, it is possible to move an existing VampirTrace installation to another location and use it without rebuilding from source. To do so, it is necessary to set the environment variable VT_PREFIX to the new installation prefix
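The manual numbering mentioned in D.7 can be scripted around repeated runs. The following is a minimal sketch under stated assumptions: the wrapper name run_with_unique_ids is hypothetical, and the command passed to it stands in for an instrumented executable; only the VT_FILE_UNIQUE variable comes from this manual.

```shell
# Hypothetical wrapper: run a command a given number of times, exporting a
# distinct numerical VT_FILE_UNIQUE value (greater than zero) for each run.
run_with_unique_ids() {
    runs=$1
    shift
    run_id=1
    while [ "$run_id" -le "$runs" ]; do
        export VT_FILE_UNIQUE="$run_id"
        # Stop at the first failing run instead of continuing blindly.
        "$@" || return 1
        run_id=$((run_id + 1))
    done
}
```

For example, run_with_unique_ids 3 ./app would start a placeholder executable ./app three times with the IDs 1, 2, and 3, so the naming does not depend on the lock file.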
23. e filtered or not, based on the evaluation of certain parameters. For more information see Section B.4.

Rank-Specific Filtering

An experimental extension allows rank-specific filtering. Use @ clauses to restrict all following filters to the given ranks. The rank selection must be given as a list of <from> - <to> pairs or single values.

    @ 4 - 10, 20 - 29, 34
    foo;bar -- 2000
    * -- 0

The example defines two limits for the ranks 4-10, 20-29, and 34.

Attention: The rank-specific rules are activated later than usual, at MPI_Init, because the ranks are not available earlier. The special MPI routines MPI_Init, MPI_Init_thread, and MPI_Initialized cannot be filtered in this way.

5.2 Java-Specific Filtering

For Java tracing there are additional possibilities of filtering. Firstly, there is a default filter applied. The rules can be found in the filter file <vt-install>/etc/vt-java-default-filter.spec. Secondly, user-defined filters can be applied additionally by setting VT_JAVA_FILTER_SPEC to a file containing the rules.

The syntax of the filter rules is as follows:

    <method|thread> <include|exclude> <filter string[;fs]>

Filtering can be done on thread names and method names, defined by the first parameter. The second parameter determines whether the matching item shall be included for tracing or excluded from it. Multiple filter strings on a line
24. e usage counters in ms. Default: 100

Filtering, Grouping
VT_DYN_BLACKLIST      Name of blacklist file for Dyninst instrumentation. (Section 2.5)
VT_DYN_SHLIBS         Colon-separated list of shared libraries for Dyninst instrumentation. (Section 2.5)
VT_FILTER_SPEC        Name of function/region filter file. (Section 5.1)
VT_GROUPS_SPEC        Name of function grouping file. (Section 5.3)
VT_JAVA_FILTER_SPEC   Name of Java-specific filter file. (Section 5.2)
VT_GROUP_CLASSES      Create a group for each Java class automatically? Default: yes
VT_MAX_STACK_DEPTH    Maximum number of stack levels to be traced (0 = unlimited). Default: 0

Symbol List
VT_GNU_NM             Command to list symbols from object files. (Section 2.3) Default: nm
VT_GNU_NMFILE         Name of file with symbol list information. (Section 2.3)

The variables VT_PFORM_GDIR, VT_PFORM_LDIR, and VT_FILE_PREFIX may contain sub-strings of the form $xyz or ${xyz}, where xyz is the name of another environment variable. Evaluation of the environment variable is done at measurement runtime.

When you use these environment variables, make sure that they have the same value for all processes of your application on all nodes of your cluster. Some cluster environments do not automatically transfer your environment when executing parts of your job on remote nodes of the cluster, and you may need to explicitly set and export them
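As an illustration of the substitution rule above, the following sketch stores the reference to another variable literally, so that it is VampirTrace, and not the shell, that expands it at measurement runtime. The variable name SCRATCH, its fallback value, and the directory layout are assumptions made for this example only.

```shell
# SCRATCH is a hypothetical site-specific variable; it has to be set and
# exported consistently for all processes of the application.
export SCRATCH="${SCRATCH:-/tmp}"

# Single quotes keep ${SCRATCH} unexpanded in the stored value; according
# to the text above, the reference is evaluated at measurement runtime.
export VT_PFORM_GDIR='${SCRATCH}/traces'
export VT_PFORM_LDIR='${SCRATCH}/tmp'
export VT_FILE_PREFIX='run_${USER}'
```

Using double quotes instead would let the shell expand the reference once, on the node where the job script runs, which is a different behavior from the runtime evaluation described here.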
25. ed list of regions
    #               (can be wildcards)

    CALC=add;sub;mul;div
    USER=app_*

These group assignments associate the functions add, sub, mul, and div with group CALC, and all functions with the prefix app_ are associated with group USER.

A VampirTrace Installation

A.1 Basics

Building VampirTrace is typically a combination of running configure and make. Execute the following commands to install VampirTrace from the directory at the top of the tree:

    ./configure --prefix=/where/to/install
    [...lots of output...]
    make all install

If you need special access for installing, you can execute make all as a user with write permissions in the build tree and a separate make install as a user with write permissions to the install tree.

However, for more details, also read the following instructions. Sometimes it might be necessary to provide configure with options, e.g. specifications of paths or compilers.

VampirTrace comes with example programs written in C, C++, and Fortran. They can be used to test different instrumentation types of the VampirTrace installation. You can find them in the directory examples of the VampirTrace package.

Note that you should compile VampirTrace with the same compiler you use for the application to trace (see D.1).

A.2 Configure Options

Compilers and Options

Some systems require unusual options for c
26. eeds some special attention. The compiler wrappers and OPARI are built for the front-end (build system), whereas the VampirTrace libraries, vtdyn, vtunify, and vtfilter are built for the back-end (host system). Some configure options which are of interest for cross compilation are shown below:

• Set CC, CXX, F77, and FC to the cross compilers installed on the front-end.
• Set CXX_FOR_BUILD to the native compiler of the front-end (used to compile compiler wrappers and OPARI only).
• Set --host= to the output of config.guess on the back-end.
• Set --with-cross-prefix= to a prefix which will be prepended to the executables of the compiler wrappers and OPARI (default: cross-).
• Maybe you also need to set additional commands and flags for the back-end (e.g. RANLIB, AR, MPICC, CXXFLAGS).

For example, this configure command line works for an NEC SX6 system with an X86_64 based front-end:

    % ./configure CC=sxcc CXX=sxc++ F77=sxf90 FC=sxf90 MPICC=sxmpicc AR=sxar RANLIB="sxar st" CXX_FOR_BUILD=c++ --host=sx6-nec-superux14.1 --with-cross-prefix=sx --with-otf-lib=-lotf

A.4 Environment Set-Up

Add the bin subdirectory of the installation directory to your $PATH environment variable. To use VampirTrace with Dyninst, you will also need to add the lib subdirectory to your LD_LIBRARY_PATH environment variable.

for csh and tcsh:

    > setenv PATH <vt-install>/bi
27. fix>

If VampirTrace was built with support for OpenMP and/or MPI, it is possible to speed up the unification of local traces significantly. To distribute the unification across multiple processes, the MPI-parallel version vtunify-mpi can be used as follows:

    % mpirun -np <nranks> vtunify-mpi <nproc> <prefix>

Furthermore, both tools vtunify and vtunify-mpi are capable of opening additional OpenMP threads for unification. The number of threads can be specified by the OMP_NUM_THREADS environment variable.

3.6 Synchronized Buffer Flush

When tracing an application, VampirTrace temporarily stores the recorded events in a trace buffer. Typically, if a buffer of a process or thread has reached its maximum fill level, the buffer has to be flushed, and other processes or threads may have to wait for this process or thread. This will result in an asynchronous runtime behavior.

To avoid this problem, VampirTrace provides a buffer flush in a synchronized manner. That means: if one buffer has reached its minimum buffer fill level VT_SYNC_FLUSH_LEVEL (Section 3.2), all buffers will be flushed. This buffer flush is only available at appropriate points in the program flow. Currently, VampirTrace makes use of all MPI collective functions associated with MPI_COMM_WORLD. Use the environment variable VT_SYNC_FLUSH to enable synchronized buffer flush.

3.7 Enhanced Timer S
28. g support (gen, libc, io); default: automatically by configure

--enable-rutrace
    enable resource usage tracing support; default: enable if found by configure
--enable-metrics=TYPE
    enable support for hardware performance counter (papi, cpc, necsx); default: automatically by configure
--enable-zlib
    enable ZLIB trace compression support; default: enable if found by configure

1 http://www.dyninst.org

--enable-mpi
    enable MPI support; default: enable if MPI found by configure
--enable-fmpi-lib
    build the MPI Fortran support library, in case your system does not have a MPI Fortran library; default: enable if no MPI Fortran library found by configure
--enable-fmpi-handle-convert
    do convert MPI handles; default: enable if MPI conversion functions found by configure
--enable-mpi-thread
    enable MPI-2 Thread support; default: enable if found by configure
--enable-mpi2-1sided
    enable MPI-2 One-Sided Communication support; default: enable if found by configure
--enable-mpi2-extcoll
    enable MPI-2 Extended Collective Operation support; default: enable if found by configure
--enable-mpi2-io
    enable MPI-2 I/O support; default: enable if found by configure
--enable-mpicheck
    enable support for Universal MPI Correctness Interface (UniMCI); default: enable if unimci-config found by configure
--enable-etimesync
    enable enhanced timer synchronization support; default: enable if C-LAPACK found by c
29. havior of VampirTrace and allows tracing only the parts of interest. The amount of trace data can therefore be reduced substantially. To check whether tracing is enabled, use the call VT_IS_ON.

Please note that stopping and starting the recording of events has to be performed at the same call stack level. If this is not the case, an error message will be printed during runtime and VampirTrace will abort execution. For further information, have a look at the FAQ (D.6).

Intermediate buffer flush: In addition to an automated buffer flush when the buffer is filled, it is possible to flush the buffer at any point of the application. This way you can guarantee that after a manual buffer flush there will be a sequence of the program with no automatic buffer flush interrupting. To flush the buffer, use the call VT_BUFFER_FLUSH.

Intermediate time synchronization: VampirTrace provides several mechanisms for timer synchronization (Section 3.7). In addition, it is also possible to initiate a timer synchronization at any point of the application by calling VT_TIMESYNC. Please note that the user has to ensure that all processes are actually at a synchronized point in the program (e.g. at a barrier). To use this call, make sure that the enhanced timer synchronization is activated (set the environment variable VT_ETIMESYNC, Section 3.2).

Intermediate counter update: VampirTrace provides the functionalit
30. id = VT_COUNT_DEF("i", "#", VT_COUNT_TYPE_UNSIGNED, cgid);
      for( i = 1; i <= 100; i++ ) {
        VT_COUNT_UNSIGNED_VAL(cid, i);
      }
      return 0;
    }

For all three languages the instrumented sources have to be compiled with -DVTRACE. Otherwise the VT_* calls are ignored.

Optionally, if the sources contain further VampirTrace API calls and only the calls for user-defined counters shall be disabled, then the sources have to be compiled with -DVTRACE_NO_COUNT in addition to -DVTRACE.

4.10 User-defined Markers

In addition to the manual instrumentation (Section 2.4.1), the VampirTrace API provides instrumentation calls which allow recording of special user information, which can be used to better identify parts of interest. A user-defined marker is identified by its name and type.

Fortran:

    #include "vt_user.inc"
    integer mid
    mid = VT_MARKER_DEF('name', type)
    VT_MARKER(mid, 'text')

C/C++:

    #include "vt_user.h"
    unsigned int mid;
    mid = VT_MARKER_DEF("name", type);
    VT_MARKER(mid, "text");

Types for Fortran/C/C++:

    VT_MARKER_TYPE_ERROR
    VT_MARKER_TYPE_WARNING
    VT_MARKER_TYPE_HINT

For all three languages the instrumented sources have to be compiled with -DVTRACE. Otherwise the VT_* calls are ignored.

Optionally, if the sources contain further VampirTrace API calls and only the calls for user-defined markers shall be disabled, then the
31. ile system. As /proc is not present on all operating systems, automatic symbol information might not be available. In this case, it is necessary to set the environment variable VT_APPPATH to the pathname of the application executable to get symbols resolved via nm.

Should any problems emerge in getting symbol information automatically, the environment variable VT_GNU_NMFILE can be set to a symbol list file, which is created with the command nm, like:

    % nm hello > hello.nm

To get the source code line for the application functions, use nm -l on Linux systems. VampirTrace will include this information in the trace. Note that the output format of nm must be written in BSD style. See the manual page of nm for help with the output format setting.

2.3.3 Notes on Instrumentation of Inline Functions

Compilers behave differently when they automatically instrument inlined functions. The GNU and Intel (>= 10.0) compilers instrument all functions by default when they are used with VampirTrace. They therefore switch off inlining completely, disregarding the optimization level chosen. One can prevent particular functions from being instrumented by appending the following attribute to function declarations, hence making them able to be inlined (this works only for C/C++):

    __attribute__ ((__no_instrument_function__))

The PGI and IBM compilers prefer inlining over instrumentation when compili
32. in batch job submission scripts.

3.3 Influencing Trace Buffer Size

The default values of the environment variables VT_BUFFER_SIZE and VT_MAX_FLUSHES limit the internal buffer of VampirTrace to 32 MB per process and the number of times that the buffer is flushed to 1, respectively. Events that are to be recorded after the limit has been reached are no longer written into the trace file. The environment variables apply to every process of a parallel application, meaning that applications with n processes will typically create trace files n times the size of a serial application.

To remove the limit and get a complete trace of an application, set VT_MAX_FLUSHES to 0. This causes VampirTrace to always write the buffer to disk when it is full. To change the size of the buffer, use the environment variable VT_BUFFER_SIZE. The optimal value for this variable depends on the application which is to be traced. Setting a small value will increase the memory available to the application, but will trigger frequent buffer flushes by VampirTrace. These buffer flushes can significantly change the behavior of the application. On the other hand, setting a large value, like 2G, will minimize buffer flushes by VampirTrace, but decrease the memory available to the application. If not enough memory is available to hold the VampirTrace buffer and the application data, parts of the application may be s
33. ing OpenMP applications does not work.
• Both compilers should have the same naming style for Fortran symbols (i.e. uppercase/lowercase, appending underscores) when tracing Fortran MPI applications.
• VampirTrace must be built to support the instrumentation type of the compiler you use for the application.

For example, the combination of a GCC-compiled VampirTrace with an Intel-compiled application will work, except for OpenMP. But to avoid any trouble, it is advisable to compile both VampirTrace and the application with the same compiler.

D.2 Why does my application take such a long time to start up?

If subroutines have been instrumented with automatic instrumentation by GNU, Intel, or PathScale compilers, VampirTrace needs to look up the function names and their source code lines before program start. In certain cases, this may take very long. To accelerate this process, prepare a file with symbol information using the command nm as explained in Section 2.3, and set VT_GNU_NMFILE to the pathname of this file. This method prevents VampirTrace from getting the function names from the binary.

D.3 How can I trace functions in shared libraries?

Functions that reside in shared libraries (*.so) cannot be traced with the GNU backend of VampirTrace. This affects GNU GCC, Intel, and PathScale compilers. Tracing of functions in shared libraries works for the PGI compiler. The workaround for tracing such functions is building a static binary
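The start-up workaround from D.2 can be sketched as a small shell helper. The function name prepare_nmfile and the executable ./hello are hypothetical and only serve as illustration; nm and VT_GNU_NMFILE are the command and variable named in the text.

```shell
# prepare_nmfile is a hypothetical helper, not part of VampirTrace.
# It writes the symbol list of an executable next to it and points
# VT_GNU_NMFILE at that file.
prepare_nmfile() {
    exe="$1"
    # Refuse to continue if the given path is not an executable file.
    [ -x "$exe" ] || { echo "not executable: $exe" >&2; return 1; }
    # Write the symbol list; stop if nm fails or the file cannot be written.
    nm "$exe" > "$exe.nm" || return 1
    export VT_GNU_NMFILE="$exe.nm"
}

# Usage with a hypothetical instrumented binary:
# prepare_nmfile ./hello && ./hello
```

The symbol file has to be regenerated whenever the executable is rebuilt, since it is only a snapshot of the binary's symbols.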
34. ions of these
tion for Sun processors can be found at http://www.sun.com/processors/manuals

C.3 NEC SX Hardware Performance Counter

This is a list of all supported hardware performance counters for NEC SX machines.

    SX_CTR_STM     System timer reg
    SX_CTR_USRCC   User clock counter
    SX_CTR_EX      Execution counter
    SX_CTR_VX      Vector execution counter
    SX_CTR_VE      Vector element counter
    SX_CTR_VECC    Vector execution clock counter
    SX_CTR_VAREC   Vector arithmetic execution clock counter
    SX_CTR_VLDEC   Vector load execution clock counter
    SX_CTR_FPEC    Floating point data execution counter
    SX_CTR_BCCC    Bank conflict clock counter
    SX_CTR_ICMCC   Instruction cache miss clock counter
    SX_CTR_OCMCC   Operand cache miss clock counter
    SX_CTR_IPHCC   Instruction pipeline hold clock counter
    SX_CTR_MNCCC   Memory network conflict clock counter
    SX_CTR_SRACC   Shared resource access clock counter
    SX_CTR_BREC    Branch execution counter
    SX_CTR_BPFC    Branch prediction failure counter

C.4 Resource Usage

The list of resource usage counters can also be found in the manual page of getrusage. Note that, depending on the operating system, not all fields may be maintained. The fields supported by the Linux 2.6 kernel are shown in the table.

    Name       Unit  Linux  Description
    ru_utime   ms    x      Total amount of user time used
    ru_stime   ms    x      Total amount
35. lags CFLAGS C compiler flag gec flags e g I lt include dir gt preprocessing directory command flags equivalent to cpp to cppflags equivalent t wrapper library ap hared wrapper library tatic wrapper library and S

    --ld=LD            linker command (default: CC)
    --ldflags=LDFLAGS  linker flags, e.g. -L<lib dir> (default: CFLAGS)
    --libs=LIBS        libraries to pass to the linker, e.g. -l<library>

environment variables:

    VT_CC       C compiler command (equivalent to '--cc')
    VT_CFLAGS   C compiler flags (equivalent to '--cflags')
    VT_LD       linker command (equivalent to '--ld')
    VT_LDFLAGS  linker flags (equivalent to '--ldflags')
    VT_LIBS     libraries to pass to the linker (equivalent to '--libs')

examples:

Generating wrapper library 'libm_wrap' for the Math library 'libm.so':

    vtlibwrapgen -l libm.so -g MATH -o mwrap.c /usr/include/math.h
    vtlibwrapgen --build -o libm_wrap mwrap.c
    export LD_PRELOAD=$PWD/libm_wrap.so:libvt.so

C Counter Specifications

C.1 PAPI

Available counter names can be queried with the PAPI commands papi_avail and papi_native_avail. Depending on the hardware, there are limitations in the combination of different counters. To
36. le is closed before executing the new program. If the executed program is also instrumented with VampirTrace, it will create a different trace file. Note that VampirTrace aborts if the exec function returns unsuccessfully. Calling fork in an instrumented program creates an additional process in the same trace file.

4.8 MPI Correctness Checking Using UniMCI

VampirTrace supports the recording of MPI correctness events, e.g. usage of invalid MPI requests. This is implemented by using the Universal MPI Correctness Interface (UniMCI), which provides an interface between tools like VampirTrace and existing runtime MPI correctness checking tools. Correctness events are stored as markers in the trace file and are visualized by Vampir.

If VampirTrace is built with UniMCI support, the user only has to enable MPI correctness checking. This is done by setting the environment variable VT_MPICHECK to yes. Further, if your application crashes due to an MPI error, you should set VT_MPICHECK_ERREXIT to yes. This environment variable forces VampirTrace to write its trace to disk and exit afterwards. As a result, the trace with the detected error is stored before the application might crash.

To install VampirTrace with correctness checking support, it is necessary to have UniMCI installed on your system. UniMCI in turn requires you to have a supported MPI correctness checking tool installed; currently only the tool Marmot is known to have UniMCI support
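Enabling the correctness checking described above could look like the following shell sketch. The launch line is commented out because the launcher, the rank count, and the executable name are assumptions; only the two environment variables come from the text.

```shell
# Requires a VampirTrace build with UniMCI support.
export VT_MPICHECK=yes           # record MPI correctness events
export VT_MPICHECK_ERREXIT=yes   # write the trace and exit on an MPI usage error

echo "VT_MPICHECK=$VT_MPICHECK VT_MPICHECK_ERREXIT=$VT_MPICHECK_ERREXIT"

# Hypothetical launch of an instrumented MPI program:
# mpirun -np 4 ./hello
```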
37. more explains how to control the runtime measurement system during execution (tracing). This also includes performance counter sampling as well as selective filtering and grouping of functions.

1 Introduction

VampirTrace consists of a tool set and a runtime library for instrumentation and tracing of software applications. It is particularly tailored to parallel and distributed High Performance Computing (HPC) applications.

The instrumentation part modifies a given application in order to inject additional measurement calls during runtime. The tracing part provides the actual measurement functionality used by the instrumentation calls. By this means, a variety of detailed performance properties can be collected and recorded during runtime. This includes function enter and leave events, MPI communication, OpenMP events, and performance counters.

After a successful tracing run, VampirTrace writes all collected data to a trace file in the Open Trace Format (OTF). As a result, the information is available for post-mortem analysis and visualization by various tools. Most notably, VampirTrace provides the input data for the Vampir analysis and visualization tool.

VampirTrace is included in Open MPI 1.3 and later versions. If not disabled explicitly, VampirTrace is built automatically when installing Open MPI.

Trace files can quickly become very large, especially with automatic instrumentation. Tracing applications for
38. n:$PATH
    > setenv LD_LIBRARY_PATH <vt-install>/lib:$LD_LIBRARY_PATH

for bash and sh:

    % export PATH=<vt-install>/bin:$PATH
    % export LD_LIBRARY_PATH=<vt-install>/lib:$LD_LIBRARY_PATH

A.5 Notes for Developers

Build from SVN

If you have checked out a developer's copy of VampirTrace (i.e. checked out from CVS), you should first run:

    % ./bootstrap [--otf-package <package>] [--version <version>]

Note that GNU Autoconf >= 2.60 and GNU Automake >= 1.9.6 are required. You can download them from http://www.gnu.org/software/autoconf and http://www.gnu.org/software/automake.

B Command Reference

B.1 Compiler Wrappers (vtcc, vtcxx, vtf77, vtf90)

vtcc, vtcxx, vtf77, vtf90 - compiler wrappers for C, C++, Fortran 77, Fortran 90

Syntax: vt<cc|cxx|f77|f90> [-vt:<cc|cxx|f77|f90> <cmd>] [-vt:inst <insttype>] [-vt:<seq|mpi|mt|hyb>] [-vt:opari <args>] [-vt:verbose] [-vt:version] [-vt:show] ...

options:

    -vt:help
        Show this help message.
    -vt:<cc|cxx|f77|f90> <cmd>
        Set the underlying compiler command.
    -vt:inst <insttype>
        Set the instrumentation type. Possible values:
            compinst  fully automatic by compiler
            manual    manual by using VampirTrace's API
            dyninst   binary by using Dyninst (www.dyninst.org)
    -vt:opari <args>
        Set options for OPARI command
39. nary. In this case, you should tell the compiler wrapper which parallelization method your program uses by using the switches -vt:mpi, -vt:mt, and -vt:hyb for MPI, multithreaded, and hybrid programs, respectively. Note that these switches do not change the underlying compiler or compiler flags. Use the option -vt:verbose to see the command line that the compiler wrapper executes. See Section B.1 for a list of all compiler wrapper options.

The default settings of the compiler wrappers can be modified in the files share/vampirtrace/vtcc-wrapper-data.txt (and similar for the other languages) in the installation directory of VampirTrace. The settings include compilers, compiler flags, libraries, and instrumentation types. You could, for instance, modify the default C compiler from gcc to mpicc by changing the line compiler=gcc to compiler=mpicc. This may be convenient if you instrument MPI-parallel programs only.

2.2 Instrumentation Types

The wrapper option -vt:inst <insttype> specifies the instrumentation type to be used. The following values for <insttype> are possible:

• compinst: Fully automatic instrumentation by the compiler (Section 2.3)
• manual: Manual instrumentation by using VampirTrace's API (Section 2.4.1); needs source code modifications
• dyninst: Binary instrumentation with Dyninst (Section 2.5)

To determine which instrumentation type will be used by default and which ins
40. nd 1 generally include functions are allowed

The second step is to compile the generated source file:

    % vtlibwrapgen --build --shared -o libSDLwrap SDLwrap.c

This builds the shared library libSDLwrap.so, which can be linked to the application or preloaded by using the environment variable LD_PRELOAD:

    % LD_PRELOAD=$PWD/libSDLwrap.so <executable>

For more information about the tool vtlibwrapgen, see Section B.5.

3 Runtime Measurement

Running a VampirTrace-instrumented application should normally result in an OTF trace file in the current working directory where the application was executed. If a problem occurs, set the environment variable VT_VERBOSE to 2 before executing the instrumented application in order to see control messages of the VampirTrace runtime system, which might help in tracking down the problem.

The internal buffer of VampirTrace is limited to 32 MB per process. Use the environment variables VT_BUFFER_SIZE and VT_MAX_FLUSHES to increase this limit. Section 3.3 contains further information on how to influence trace file size.

3.1 Trace File Name and Location

The default name of the trace file depends on the operating system where the application is run. On Linux, MacOS, and Sun Solaris, the trace file will be named like the application, e.g. hello.otf for the executable hello
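A first diagnostic run along these lines might be set up as follows. This is only a sketch: the buffer size of 64M and the executable name hello are illustrative assumptions, while the variable names and the verbosity level 2 are taken from the text.

```shell
export VT_VERBOSE=2         # print control messages of the runtime system
export VT_BUFFER_SIZE=64M   # assumed value; the default is 32 MB per process

echo "VT_VERBOSE=$VT_VERBOSE VT_BUFFER_SIZE=$VT_BUFFER_SIZE"

# Hypothetical instrumented executable; on Linux the trace would be
# expected as hello.otf in the current working directory.
# ./hello
# ls hello.otf
```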
41. ng with enabled inlining. Thus, one needs to disable inlining to enable the instrumentation of inline functions and vice versa.

The bottom line is that a function cannot be inlined and instrumented at the same time. For more information on how to inline functions, read your compiler's manual.

2.3.4 Instrumentation of Loops with OpenUH Compiler

The OpenUH compiler provides the possibility of instrumenting loops in addition to functions. To use this functionality, add the compiler flag -OPT:instr_loop. In this case, loops induce additional events including the type of loop (e.g. for, while, or do) and the source code location.

2.4 Manual Instrumentation

2.4.1 Using the VampirTrace API

The VT_USER_START and VT_USER_END calls can be used to instrument any user-defined sequence of statements.

Fortran:

    #include "vt_user.inc"
    VT_USER_START('name')
    ...
    VT_USER_END('name')

C:

    #include "vt_user.h"
    VT_USER_START("name");
    ...
    VT_USER_END("name");

If a block has several exit points (as is often the case for functions), all exit points have to be instrumented with VT_USER_END, too.

For C++ it is simpler, as is demonstrated in the following example. Only entry points into a scope need to be marked. The exit points are detected automatically when C++ deletes scope-local variables.

C++:

    #include "vt_user.h"
    VT_TRACER("name");

The instrume
42. B Command Reference
    B.1 Compiler Wrappers (vtcc, vtcxx, vtf77, vtf90)
    B.2 Local Trace Unifier (vtunify)
    B.3 Dyninst Mutator (vtdyn)
    B.4 Trace Filter Tool (vtfilter)
    B.5 Library Wrapper Generator (vtlibwrapgen)
C Counter Specifications
    C.1 PAPI
    C.2 CPC
    C.3 NEC SX Hardware Performance Counter
    C.4 Resource Usage
D FAQ
    D.1 Can I use different compilers for VampirTrace and my application?
    D.2 Why does my application take such a long time to start up?
    D.3 How can I trace functions in shared libraries?
    D.4 How can I speed up trace unification?
    D.5 There is no .otf file. What can I do?
    D.6 What limitations are associated with VT_ON/VT_OFF?
    D.7 VampirTrace warns that it cannot lock file a.lock, what's wrong?
    D.8 Can I re-locate my VampirTrace installation without re-build from source?
    D.9 I have a question that is not answered in this document!
    D.10 I need support for additional features so I can trace application xyz.

This documentation describes how to apply VampirTrace to an application in order to generate trace files at execution time. This step is called instrumentation. It further
43. nt square root instructions
    PAPI_FNV_INS  Floating point inverse instructions
    PAPI_RES_STL  Cycles stalled on any resource
    PAPI_FP_STAL  Cycles the FP unit(s) are stalled
    PAPI_FP_OPS   Floating point operations
    PAPI_TOT_CYC  Total cycles
    PAPI_HW_INT   Hardware interrupts

C.2 CPC

Available counter names can be queried with the VampirTrace tool vtcpcavail. In addition to the counter names, it shows how many performance counters can be queried at a time. See below for a sample output.

    % vtcpcavail
    CPU performance counter interface: UltraSPARC T2
    Number of concurrently readable performance counters on the CPU: 2

    Available events:
    AES_busy_cycle
    AES_op
    Atomics
    Br_completed
    Br_taken
    CPU_ifetch_to_PCX
    CPU_ld_to_PCX
    CPU_st_to_PCX
    CRC_MPA_cksum
    CRC_TCPIP_cksum
    DC_miss
    DES_3DES_busy_cycle
    DES_3DES_op
    DTLB_HWTW_miss_L2
    DTLB_HWTW_ref_L2
    DTLB_miss
    IC_miss
    ITLB_HWTW_miss_L2
    ITLB_HWTW_ref_L2
    ITLB_miss
    Idle_strands
    Instr_FGU_arithmetic
    Instr_cnt
    Instr_ld
    Instr_other
    Instr_st
    Instr_sw
    L2_dmiss_ld
    L2_imiss
    MA_busy_cycle
    MA_op
    MD5_SHA-1_SHA-256_busy_cycle
    MD5_SHA-1_SHA-256_op
    MMU_ld_to_PCX
    RC4_busy_cycle
    RC4_op
    Stream_ld_to_PCX
    Stream_st_to_PCX
    TLB_miss

See the UltraSPARC T2 User's Manual for descript
events. Documentat
44. nted sources have to be compiled with -DVTRACE for all three languages, otherwise the VT_* calls are ignored. Note that Fortran source files instrumented this way have to be preprocessed, too.

In addition, you can combine this particular instrumentation type with all other types. In such a way, all user functions can be instrumented by a compiler while special source code regions (e.g. loops) can be instrumented by VT's API.

Use VT's compiler wrapper (described above) for compiling and linking the instrumented source code, such as:

• combined with automatic compiler instrumentation:

    % vtcc -DVTRACE hello.c -o hello

• without compiler instrumentation:

    % vtcc -vt:inst manual -DVTRACE hello.c -o hello

Note that you can also use the option -vt:inst manual with non-instrumented sources. Binaries created in this manner only contain MPI and OpenMP instrumentation, which might be desirable in some cases.

2.4.2 Measurement Controls

Switching tracing on/off: In addition to instrumenting arbitrary blocks of code, one can use the VT_ON/VT_OFF instrumentation calls to start and stop the recording of events. These constructs can be used to stop recording of events for a part of the application and later resume recording. For example, one could skip collecting trace events during the initialization phase of an application and turn on tracing for the computation part.

Furthermore, the on/off functionality can be used to control the tracing be
45. of system time used

    Name         Unit    Linux  Description
    ru_maxrss    kB             Maximum resident set size
    ru_ixrss     kB × s         Integral shared memory size (text segment) over the runtime
    ru_idrss     kB × s         Integral data segment memory used over the runtime
    ru_isrss     kB × s         Integral stack memory used over the runtime
    ru_minflt            x      Number of soft page faults (i.e. those serviced by reclaiming a page from the list of pages awaiting reallocation)
    ru_majflt            x      Number of hard page faults (i.e. those that required I/O)
    ru_nswap                    Number of times a process was swapped out of physical memory
    ru_inblock                  Number of input operations via the file system. Note: This and ru_oublock do not include operations with the cache.
    ru_oublock                  Number of output operations via the file system
    ru_msgsnd                   Number of IPC messages sent
    ru_msgrcv                   Number of IPC messages received
    ru_nsignals                 Number of signals delivered
    ru_nvcsw             x      Number of voluntary context switches, i.e. because the process gave up the processor before it had to (usually to wait for some resource to be available)
    ru_nivcsw            x      Number of involuntary context switches, i.e. a higher priority process became runnable or the current process used up its time slice

D FAQ

D.1 Can I use different compilers for VampirTrace and my application?

There are several limitations which make this generally a bad idea:

• Using different compilers when trac
46. ompiling or linking which the configure script does not know. Run ./configure --help for details on some of the pertinent environment variables.

You can pass initial values for configuration parameters to configure by setting variables on the command line or in the environment. Here is an example:

    % ./configure CC=c89 CFLAGS=-O2 LIBS=-lposix

Installation Names

By default, make install will install the package's files in /usr/local/bin, /usr/local/include, etc. You can specify an installation prefix other than /usr/local by giving configure the option --prefix=PATH.

Optional Features

This is a summary of the most important optional features. For a full list of all available features, run ./configure --help.

--enable-compinst=TYPE
    enable support for compiler instrumentation, e.g. gnu, pgi, sun; default: automatically by configure
--enable-dyninst
    enable support for Dyninst instrumentation; default: enable if found by configure. Note: Requires Dyninst (1) version 5.1 or higher!
--enable-dyninst-attlib
    build shared library which attaches Dyninst to the running application; default: enable if Dyninst found by configure and system supports shared libraries
--enable-memtrace
    enable memory tracing support; default: enable if found by configure
--enable-cpuidtrace
    enable CPU ID tracing support; default: enable if found by configure
--enable-libtrace=LIST
    enable library tracin
47. on 4.7) calls

Default: 32M, yes, yes, Sect. 3.1, no, 65536, /tmp, yes, no, no, 120, no, yes

    Variable              Purpose                                                                   Default
    VT_MEMTRACE           Enable memory allocation counter (Section 4.3)                            no
    VT_MODE               Colon-separated list of VampirTrace modes:                                TRACE
                          Tracing (TRACE), Profiling (STAT) (Section 3.4)
    VT_MPICHECK           Enable MPI correctness checking via UniMCI                                no
    VT_MPICHECK_ERREXIT   Force trace write and application exit if an MPI usage error is detected  no
    VT_MPITRACE           Enable tracing of MPI events                                              yes
    VT_PTHREAD_REUSE      Reuse IDs of terminated Pthreads                                          yes
    VT_STAT_INV           Length of interval for writing the next profiling record                  0
    VT_STAT_PROPS         Colon-separated list of event types that shall be recorded in profiling   ALL
                          mode: Functions (FUNC), Messages (MSG), Collective Ops (COLLOP),
                          or all of them (ALL) (Section 3.4)
    VT_SYNC_FLUSH         Enable synchronized buffer flush (Section 3.6)                            no
    VT_SYNC_FLUSH_LEVEL   Minimum buffer fill level for synchronized buffer flush in percent        80

Counters

    VT_METRICS            Specify counter metrics to be recorded with trace events as a
                          colon/VT_METRICS_SEP-separated list of names (Section 4.1)
    VT_METRICS_SEP        Separator string between counter specifications in VT_METRICS
    VT_RUSAGE             Colon-separated list of resource usage counters which shall be
                          recorded (Section 4.2)
    VT_RUSAGE_INTV        Sample interval for recording resourc
48. onfigure

--enable-threads=LIST
    enable support for threads (pthread, omp); default: automatically by configure
--enable-java
    enable Java support; default: enable if JVMTI found by configure

Important Optional Packages

This is a summary of the most important optional features. For a full list of all available features, run ./configure --help.

--with-platform=PLATFORM
    configure for given platform (altix, bgl, bgp, crayt3e, crayx1, crayxt, ibm, linux, macos, necsx, origin, sicortex, sun, generic); default: automatically by configure
--with-bitmode=<32|64>
    specify bit mode
--with-options=FILE
    load options from FILE; default: configure searches for a config file in config/defaults based on given platform and bitmode
--with-local-tmp-dir=DIR
    give the path for node-local temporary directory to store local traces to; default: /tmp

If you would like to use an external version of the OTF library, set:

--with-extern-otf
    use external OTF library; default: not set
--with-extern-otf-dir=OTFDIR
    give the path for OTF; default: /usr
--with-otf-flags=FLAGS
    pass FLAGS to the OTF distribution configuration (only for internal OTF version)
--with-otf-lib=OTFLIB
    use given otf lib; default: -lotf -lz. If the supplied OTF library was built without zlib support, then OTFLIB will be set to -lotf.
--with-dyninst-dir=DYNIDIR
    give the path for DYNINST; default: /usr
--with-papi-dir=PAPIDIR
    give the pa
49. read API functions will not be traced by default. To enable tracing of all C Pthread API functions, include the header vt_user.h and compile the instrumented sources with -DVTRACE_PTHREAD.

C/C++:

    #include "vt_user.h"

    % vtcc -DVTRACE_PTHREAD hello.c -o hello

Note: Currently, Pthread instrumentation is only available for C/C++.

4.6 I/O Calls

Calls to functions which reside in external libraries can be intercepted by implementing identical functions and linking them before the external library. Such "wrapper functions" can record the parameters and return values of the library functions. If VampirTrace has been built with I/O tracing support, it uses this technique for recording calls to I/O functions of the standard C library which are executed by the application. The following functions are intercepted by VampirTrace:

    close        creat        creat64      dup
    dup2         fclose       fcntl        fdopen
    fgetc        fgets        flockfile    fopen
    fopen64      fprintf      fputc        fputs
    fread        fscanf       fseek        fseeko
    fseeko64     fsetpos      fsetpos64    ftrylockfile
    funlockfile  fwrite       getc         gets
    lockf        lseek        lseek64      open
    open64       pread        pread64      putc
    puts         pwrite       pwrite64     read
    readv        rewind       unlink       write
    writev

The gathered information will be saved as I/O event records in the trace file. This feature has to be activated for each tracing run by setting the environment variable VT_IOTRACE to yes. This works for both dynamically and statically linked
50. ronment Variables
3.3 Influencing Trace Buffer Size
3.4 Profiling an Application
3.5 Unification of Local Traces
3.6 Synchronized Buffer Flush
3.7 Enhanced Timer Synchronization
4 Recording Additional Events and Counters
4.1 Hardware Performance Counters
4.2 Resource Usage Counters
4.3 Memory Allocation Counter
4.4 CPU ID Counter
4.5 Pthread API Calls
4.6 I/O Calls
4.7 fork/system/exec Calls
4.8 MPI Correctness Checking Using UniMCI
4.9 User-defined Counters
4.10 User-defined Markers
5 Filtering & Grouping
5.1 Function Filtering
5.2 Java Specific Filtering
5.3 Function Grouping
A VampirTrace Installation
A.1 Basics
A.2 Configure Options
A.3 Cross Compilation
A.4 Environment Set-Up
A.5 Notes for Developers
51. rumentation:
    % vtcc hello.c -o hello -lmpi

If you want to instrument MPI events only (this creates smaller trace files and less overhead), use the option -vt:inst manual to disable automatic instrumentation of user functions (see also Section 2.4.1).

- Threaded parallel programs: When VampirTrace detects OpenMP or Pthread flags on the command line, special instrumentation calls are invoked. For OpenMP events, OPARI is invoked for automatic source code instrumentation.
    original:              ifort <-openmp|-pthread> hello.f90 -o hello
    with instrumentation:  vtf90 <-openmp|-pthread> hello.f90 -o hello
  For more information about OPARI, read the documentation available in VampirTrace's installation directory at share/vampirtrace/doc/opari/Readme.html.

- Hybrid MPI/Threaded parallel programs: With a combination of the above-mentioned approaches, hybrid applications can be instrumented.
    original:              mpif90 <-openmp|-pthread> hello.F90 -o hello
    with instrumentation:  vtf90 -vt:f90 mpif90 <-openmp|-pthread> hello.F90 -o hello

The VampirTrace compiler wrappers automatically try to detect which parallelization method is used by means of the compiler flags (e.g. -lmpi, -openmp or -pthread) and the compiler command (e.g. mpif90). If the compiler wrapper fails to detect this correctly, the instrumentation could be incomplete and an unsuitable VampirTrace library would be linked to the bi
52. s optional. If no special counter group is desired, the default group "User" can be used. In this case, set the parameter gid of VT_COUNT_DEF() to VT_COUNT_DEFGROUP. The third parameter type of VT_COUNT_DEF specifies the data type of the counter value. To record a value for any of the defined counters, the corresponding instrumentation call VT_COUNT_*_VAL must be invoked.

Fortran:
    Type                      Count call               Data type
    VT_COUNT_TYPE_INTEGER     VT_COUNT_INTEGER_VAL     integer (4 byte)
    VT_COUNT_TYPE_INTEGER8    VT_COUNT_INTEGER8_VAL    integer (8 byte)
    VT_COUNT_TYPE_REAL        VT_COUNT_REAL_VAL        real
    VT_COUNT_TYPE_DOUBLE      VT_COUNT_DOUBLE_VAL      double precision

C/C++:
    Type                      Count call               Data type
    VT_COUNT_TYPE_SIGNED      VT_COUNT_SIGNED_VAL      signed int (max. 64 bit)
    VT_COUNT_TYPE_UNSIGNED    VT_COUNT_UNSIGNED_VAL    unsigned int (max. 64 bit)
    VT_COUNT_TYPE_FLOAT       VT_COUNT_FLOAT_VAL       float
    VT_COUNT_TYPE_DOUBLE      VT_COUNT_DOUBLE_VAL      double

The following example records the loop index i:

Fortran:
    #include "vt_user.inc"
    program main
    integer :: i, cid, cgid
    VT_COUNT_GROUP_DEF('loopindex', cgid)
    VT_COUNT_DEF('i', '#', VT_COUNT_TYPE_INTEGER, cgid, cid)
    do i=1,100
      VT_COUNT_INTEGER_VAL(cid, i)
    end do
    end program main

C/C++:
    #include "vt_user.h"
    int main() {
      unsigned int i, cid, cgid;
      cgid = VT_COUNT_GROUP_DEF("loopindex");
      c
53. see share/vampirtrace/doc/opari/Readme.html)

-vt:<seq|mpi|mt|hyb>
    Force application's parallelization type. Necessary if this cannot be determined by underlying compiler and flags.
    seq = sequential
    mpi = parallel (uses MPI)
    mt  = parallel (uses OpenMP/POSIX threads)
    hyb = hybrid parallel (MPI + Threads)
    (default: automatically determined by underlying compiler and flags)
-vt:verbose
    Enable verbose mode.
-vt:show
    Do not invoke the underlying compiler. Instead, show the command line that would be executed to compile and link the program.

See the man page for your underlying compiler for other options that can be passed through 'vt<cc|cxx|f77|f90>'.

Environment variables:
    VT_INST        Equivalent to '-vt:inst'
    VT_CC          Equivalent to '-vt:cc'
    VT_CXX         Equivalent to '-vt:cxx'
    VT_F77         Equivalent to '-vt:f77'
    VT_F90         Equivalent to '-vt:f90'
    VT_CFLAGS      C compiler flags
    VT_CXXFLAGS    C++ compiler flags
    VT_F77FLAGS    Fortran 77 compiler flags
    VT_FCFLAGS     Fortran 90 compiler flags
    VT_LDFLAGS     Linker flags
    VT_LIBS        Libraries to pass to the linker

The corresponding command line options overwrite the environment variables setting.

Examples:
    automatic instrumentation by compiler:
    vtcc -vt:cc gcc -vt:inst compinst -c foo.c -o f
54. sources have to be compiled with -DVTRACE_NO_MARKER in addition to -DVTRACE.

5 Filtering & Grouping

5.1 Function Filtering

By default, all calls of instrumented functions will be traced, so that the resulting trace files can easily become very large. In order to decrease the size of a trace, VampirTrace allows the specification of filter directives before running an instrumented application. The user can decide how often an instrumented function/region shall be recorded to a trace file. To use a filter, the environment variable VT_FILTER_SPEC needs to be defined. It should contain the path and name of a file with filter directives. Here is an example of a file containing filter directives:

    # VampirTrace region filter specification
    #
    # call limit definitions and region assignments
    #
    # syntax: <regions> -- <limit>
    #
    #   regions    semicolon-separated list of regions
    #              (can be wildcards)
    #   limit      assigned call limit
    #               0 = region(s) denied
    #              -1 = unlimited
    #
    add;sub;mul;div -- 1000
    * -- 3000000

These region filter directives cause the functions add, sub, mul and div to be recorded at most 1000 times. The remaining functions (*) will be recorded at most 3000000 times. Besides creating filter files manually, you can also use the vtfilter tool to generate them automatically. This tool reads a provided trace and decides whether a function should b
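The filter workflow in this snippet can be sketched in a few shell lines. This is only an illustration: the file name vt_filter.spec and the temporary directory are hypothetical choices, and the directive lines assume the "<regions> -- <limit>" syntax with semicolon-separated region names described in the example file's comment header.

```shell
#!/bin/sh
# Hypothetical location for the filter file; any readable path should do.
filter_dir=$(mktemp -d)
filter_file="$filter_dir/vt_filter.spec"

# Assumed directive syntax: <regions> -- <limit>
cat > "$filter_file" <<'EOF'
# record add, sub, mul and div at most 1000 times each
add;sub;mul;div -- 1000
# record every remaining function at most 3000000 times
* -- 3000000
EOF

# Point VampirTrace at the filter file before starting the instrumented run.
export VT_FILTER_SPEC="$filter_file"
echo "VT_FILTER_SPEC=$VT_FILTER_SPEC"
```

The variable has to be exported in the same shell (or job script) that launches the instrumented application, since the filter is read at run time.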
55. suitable VampirTrace library. Note that the VampirTrace version included in Open MPI 1.3 has additional wrappers (mpicc-vt, mpicxx-vt, mpif77-vt, and mpif90-vt) which are like the ordinary MPI compiler wrappers (mpicc, mpicxx, mpif77, and mpif90) with the extension of automatic instrumentation.

The following list shows some examples specific to the parallelization type of the program:

- Serial programs: Compiling serial codes is the default behavior of the wrappers. Simply replace the compiler by VampirTrace's wrapper:
    original:              gfortran hello.f90 -o hello
    with instrumentation:  vtf90 hello.f90 -o hello
  This will instrument user functions (if supported by the compiler) and link the VampirTrace library.

- MPI parallel programs: MPI instrumentation is always handled by means of the PMPI interface, which is part of the MPI standard. This requires the compiler wrapper to link with an MPI-aware version of the VampirTrace library. If your MPI implementation uses special MPI compilers (e.g. mpicc, mpxlf90), you will need to tell VampirTrace's wrapper to use this compiler instead of the serial one:
    original:              mpicc hello.c -o hello
    with instrumentation:  vtcc -vt:cc mpicc hello.c -o hello
  MPI implementations without their own compilers require the user to link the MPI library manually. In this case, simply replace the compiler by VampirTrace's compiler wrapper:
    original:              icc hello.c -o hello -lmpi
    with inst
56. th for PAPI; default: /usr
--with-cpc-dir=CPCDIR  give the path for CPC; default: /usr

If you have not specified the environment variable MPICC (MPI compiler command), use the following options to set the location of your MPI installation:
--with-mpi-dir=MPIDIR  give the path for MPI; default: /usr
--with-mpi-inc-dir=MPIINCDIR  give the path for MPI include files; default: $MPIDIR/include
--with-mpi-lib-dir=MPILIBDIR  give the path for MPI libraries; default: $MPIDIR/lib
--with-mpi-lib  use given mpi lib
--with-pmpi-lib  use given pmpi lib

If your system does not have an MPI Fortran library, set --enable-fmpi-lib (see above), otherwise set:
--with-fmpi-lib  use given fmpi lib

Use the following options to specify your MPI implementation:
--with-hpmpi  set MPI libs for HP MPI
--with-intelmpi  set MPI libs for Intel MPI
--with-intelmpi2  set MPI libs for Intel MPI2
--with-lam  set MPI libs for LAM/MPI
--with-mpibgl  set MPI libs for IBM BG/L
--with-mpibgp  set MPI libs for IBM BG/P
--with-mpich  set MPI libs for MPICH
--with-mpich2  set MPI libs for MPICH2
--with-mvapich  set MPI libs for MVAPICH
--with-mvapich2  set MPI libs for MVAPICH2
--with-mpisx  set MPI libs for NEC MPI/SX
--with-mpisx-ew  set MPI libs for NEC MPI/SX with 8-byte Fortran integer
57. tility needs to be invoked manually (Sections 3.5 and B.2).

To prevent certain functions from being instrumented, you can set the environment variable VT_DYN_BLACKLIST to a file containing a newline-separated list of function names. All additional overhead due to instrumentation of these functions will be removed.

VampirTrace also allows binary instrumentation of functions located in shared libraries. For this to work, the shared libraries have to be compiled with -g, and a colon-separated list of their names has to be given in the environment variable VT_DYN_SHLIBS:

    VT_DYN_SHLIBS=libsupport.so:libmath.so

2.6 Tracing Java Applications Using JVMTI

In addition to C, C++, and Fortran, VampirTrace is capable of tracing Java applications. This is accomplished by means of the Java Virtual Machine Tool Interface (JVMTI), which is part of JDK versions 5 and later. If VampirTrace was built with Java tracing support, the library libvt-java.so can be used as follows to trace any Java program:

    % java -agentlib:vt-java ...

Or, more easily, replace the usual Java application launcher java by the command vtjava:

    % vtjava ...

When tracing Java applications, you probably want to filter out dispensable function calls. Please have a look at Sections 5.1 and 5.2 to learn about different ways of excluding parts of the application from tracing.

2.7 Tracing Calls to 3rd-Party Libraries

VampirTrace is also capable of tracing c
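The two Dyninst-related variables from this snippet could be prepared as in the following sketch. The blacklist file name and the function names helper_init and helper_cleanup are hypothetical placeholders; the library names are the ones used in the manual's own example, and the colon separator follows the "colon separated list" wording.

```shell
#!/bin/sh
# Hypothetical working directory and blacklist file name.
work_dir=$(mktemp -d)
blacklist="$work_dir/dyn_blacklist.txt"

# Newline-separated list of functions that should not be instrumented.
# The function names are placeholders for illustration only.
printf '%s\n' helper_init helper_cleanup > "$blacklist"
export VT_DYN_BLACKLIST="$blacklist"

# Colon-separated list of shared libraries (built with -g) to instrument.
export VT_DYN_SHLIBS="libsupport.so:libmath.so"

echo "VT_DYN_BLACKLIST=$VT_DYN_BLACKLIST"
echo "VT_DYN_SHLIBS=$VT_DYN_SHLIBS"
```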
58. tive counter names. For example, set

    VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM:!CPU_TEMP1

to record the number of floating point instructions and level 2 cache misses (PAPI preset counters), and the CPU temperature from the lm_sensors component. The leading exclamation mark lets CPU_TEMP1 be interpreted as an absolute value counter. See Section C.1 for a full list of PAPI preset counters.

CPC Hardware Performance Counters

On Sun Solaris operating systems, VampirTrace can make use of the CPC performance counter library to query the processor's hardware performance counters. The counters which are actually available on your platform can be queried with the tool vtcpcavail. The listed names can then be used within VT_METRICS to tell VampirTrace which counters to record.

NEC SX Hardware Performance Counters

On NEC SX machines, VampirTrace uses special register calls to query the processor's hardware counters. Use VT_METRICS to specify the counters that have to be recorded. See Section C.3 for a full list of NEC SX hardware performance counters.

4.2 Resource Usage Counters

The Unix system call getrusage provides information about consumed resources and operating system events of processes, such as user/system time, received signals, and context switches. If VampirTrace has been built with resource usage support, it is able to record this information as performance counters to the trace. You
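A small shell sketch of the VT_METRICS example follows. It assumes the counter names are separated by colons (the separator is lost in the extracted text) and that a leading "!" marks an absolute-value counter, as the snippet states. The loop only prints how the value would be read; it does not query any counter.

```shell
#!/bin/sh
# Single quotes keep interactive shells from treating '!' as history expansion.
export VT_METRICS='PAPI_FP_OPS:PAPI_L2_TCM:!CPU_TEMP1'

# Split on the assumed ':' separator and label each entry.
metric_count=0
old_ifs=$IFS
IFS=:
for metric in $VT_METRICS; do
    metric_count=$((metric_count + 1))
    case $metric in
        '!'*) echo "${metric#?} (absolute value counter)" ;;
        *)    echo "$metric (accumulating counter)" ;;
    esac
done
IFS=$old_ifs
```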
59. trumentation types are available on your system, have a look at the entry inst_avail in the wrapper's configuration file (e.g. share/vampirtrace/vtcc-wrapper-data.txt in the installation directory of VampirTrace for the C compiler wrapper). See Section B.1 or type

    % vtcc -vt:help

for other options that can be passed to VampirTrace's compiler wrapper.

2.3 Automatic Instrumentation

Automatic instrumentation is the most convenient method to instrument your program. If available, simply use the compiler wrappers without any parameters, e.g.:

    % vtf90 hello.f90 -o hello

2.3.1 Supported Compilers

VampirTrace supports the following compilers for automatic instrumentation:

- GNU (i.e. gcc, g++, gfortran, g95)
- Intel version >= 10.0 (i.e. icc, icpc, ifort)
- PathScale version >= 3.1 (i.e. pathcc, pathCC, pathf90)
- Portland Group (PGI) (i.e. pgcc, pgCC, pgf90, pgf77)
- SUN Fortran 90 (i.e. cc, CC, f90)
- IBM (i.e. xlcc, xlCC, xlf90)
- NEC SX (i.e. sxcc, sxCC, sxf90)
- OpenUH version >= 4.0 (i.e. uhcc, uhCC, uhf90)

2.3.2 Notes for Using the GNU, Intel, or PathScale Compiler

For these compilers the command nm is required to get symbol information of the running application executable. For example, on Linux systems, this program is a part of the GNU Binutils, which is downloadable from http://www.gnu.org/software/binutils. To get the application executable for nm during runtime, VampirTrace uses the /proc f
60. wapped to disk, leading to a significant change in the behavior of the application. Note that you can decrease the size of trace files significantly by using the runtime function filtering as explained in Section 5.1.

3.4 Profiling an Application

Profiling an application collects aggregated information about certain events during a program run, whereas tracing records information about individual events. Profiling can therefore be used to get a summary of the program activity and to detect events that are called very often. The profiling information can also be used to generate filter rules to reduce the trace file size (Section 5.1). To profile an application, set the variable VT_MODE to STAT. Setting VT_MODE to STAT:TRACE tells VampirTrace to perform tracing and profiling at the same time. By setting the variable VT_STAT_PROPS, the user can influence whether functions, messages, and/or collective operations shall be profiled. See Section 3.2 for information about these environment variables.

3.5 Unification of Local Traces

After a run of an instrumented application, the traces of the single processes need to be unified in terms of timestamps and event IDs. In most cases, this happens automatically. If the environment variable VT_UNIFY is set to "no", or under certain other circumstances, it is necessary to perform unification of local traces manually. To do this, use the following command:

    % vtunify <nproc> <pre
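The two settings discussed in this snippet might be combined as in the sketch below. It assumes that the two mode names are joined with a colon (the separator is not visible in the extracted text); the manual unification command itself is left out because its arguments are truncated in the snippet.

```shell
#!/bin/sh
# Profile and trace in the same run (assumed colon-separated mode list).
export VT_MODE="STAT:TRACE"

# Skip automatic unification; the local traces must then be unified
# manually with vtunify after the application has finished.
export VT_UNIFY="no"

echo "VT_MODE=$VT_MODE"
echo "VT_UNIFY=$VT_UNIFY"
```

Setting VT_MODE to STAT alone would select profiling only, as described in the text.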
61. --with-openmpi  set MPI libs for Open MPI
--with-sgimpt  set MPI libs for SGI MPT
--with-sunmpi  set MPI libs for SUN MPI
--with-sunmpi-mt  set MPI libs for SUN MPI-MT

To enable enhanced timer synchronization, a LAPACK library with C wrapper support is needed:
--with-clapack-dir=LAPACKDIR  set the path for CLAPACK; default: /usr
--with-clapack-lib  set CLAPACK libs; default: -lclapack -lcblas -lf2c
--with-clapack-acml  set CLAPACK libs for ACML
--with-clapack-essl  set CLAPACK libs for ESSL
--with-clapack-mkl  set CLAPACK libs for MKL
--with-clapack-sunperf  set CLAPACK libs for SUN Performance Library

To enable Java support, the JVM Tool Interface (JVMTI) version 1.0 or higher is required:
--with-jvmti-dir=JVMTIDIR  give the path for JVMTI; default: $JAVA_HOME
--with-jvmti-inc-dir=JVMTIINCDIR  give the path for JVMTI include files; default: $JVMTI/include

To enable support for generating wrappers for 3rd-party libraries, the C code parser CTool is needed:
--with-ctool-dir=CTOOLDIR  give the path for CTool; default: /usr
--with-ctool-inc-dir=CTOOLINCDIR  give the path for CTool include files; default: $CTOOLDIR/include
--with-ctool-lib-dir=CTOOLLIBDIR  give the path for CTool libraries; default: $CTOOLDIR/lib
--with-ctool-lib=CTOOLLIB  use given CTool lib; default: automatically by configure

A.3 Cross Compilation

Building VampirTrace on cross compilation platforms n
62. y to collect the values of arbitrary hardware counters. Chosen counter values are automatically recorded whenever an event occurs. Sometimes (e.g. within a long-lasting function) it is desirable to get the counter values at an arbitrary point within the program. To record the counter values at any given point, you can call VT_UPDATE_COUNTER.

Note: For all three languages, the instrumented sources have to be compiled with -DVTRACE. Otherwise the VT_* calls are ignored. In addition, if the sources contain further VampirTrace API calls and only the calls for measurement control shall be disabled, then the sources have to be compiled with -DVTRACE_NO_CONTROL, too.

2.5 Binary Instrumentation Using Dyninst

The option -vt:inst dyninst is used with the compiler wrapper to instrument the application during runtime (binary instrumentation) by using Dyninst (http://www.dyninst.org). Recompiling is not necessary for this kind of instrumentation, but relinking is:

    % vtf90 -vt:inst dyninst hello.o -o hello

The compiler wrapper dynamically links the library libvt-dynatt.so to the application. This library attaches the Mutator program vtdyn during runtime, which invokes the instrumentation by using the Dyninst API. Note that the application should have been compiled with the -g switch to have visible symbol names. After a tracing run with this kind of instrumentation, the vtunify u
63. ynchronization

Especially on cluster environments, where each process has its own local timer, tracing relies on precisely synchronized timers. Therefore, VampirTrace provides several mechanisms for timer synchronization. The default synchronization scheme is a linear synchronization at the very beginning and the very end of a trace run with a master-slave communication pattern.

However, this way of synchronization can become too imprecise for long trace runs. Therefore, we recommend the usage of the enhanced timer synchronization scheme of VampirTrace. This scheme inserts additional synchronization phases at appropriate points in the program flow. Currently, VampirTrace makes use of all MPI collective functions associated with MPI_COMM_WORLD.

To enable this synchronization scheme, a LAPACK library with C wrapper support has to be provided for VampirTrace, and the environment variable VT_ETIMESYNC (Section 3.2) has to be set before the tracing. The length of the interval between two successive synchronization phases can be adjusted with VT_ETIMESYNC_INTV.

The following LAPACK libraries provide a C LAPACK API that can be used by VampirTrace for the enhanced timer synchronization:

- CLAPACK
- AMD ACML
- IBM ESSL
- Intel MKL
- SUN Performance Library

Note: Systems equipped with a global timer do not need timer synchronization.

Note: It is recommended to combine enhanced timer synchronization
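Enabling the enhanced scheme at run time could look like the sketch below. The value "yes" and the 60-second interval are illustrative assumptions, not recommendations from the manual; the interval is taken to be in seconds, following the variable overview at the top of this page, and a VampirTrace build with C LAPACK support is presupposed.

```shell
#!/bin/sh
# Switch on enhanced timer synchronization for the next tracing run.
export VT_ETIMESYNC="yes"

# Assumed example interval (in seconds) between two synchronization phases.
export VT_ETIMESYNC_INTV=60

echo "VT_ETIMESYNC=$VT_ETIMESYNC"
echo "VT_ETIMESYNC_INTV=$VT_ETIMESYNC_INTV"
```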