PAPI USER'S GUIDE - VUB Parallel Computing Laboratory
Contents
1. subscribe <mailing list>, without the single quotes. If you're having trouble, try sending "help" in the body to the same address. Should you become hopelessly confused, send mail to the administrator, sa

PAPI User's Guide Version 2.3

APPENDICES

APPENDIX A. TABLE OF PRESET EVENTS AND THEIR AVAILABILITY ON SOME PLATFORMS

The following is a table of hardware events that are defined in the header file papiStdEventDefs.h and are deemed relevant and useful in tuning application performance. These events have identical assignments in the header files on different platforms; however, they may differ in their actual semantics. Not all of these events are guaranteed to be present on all platforms. The table indicates which events are available on some platforms. Please check your platform's documentation or run tests/avail.c in the papi source distribution to determine the preset events that are available on your platform. Note that these values should not be changed by the user.

Key: one marker indicates that the preset event is available and derived by using a combination of counters; a second marker indicates that the preset event is available and not derived.

PRESET NAME     DESCRIPTION
PAPI_L1_ICM     Level 1 instruction cache misses
PAPI_L2_DCM     Level 2 data cache misses
PAPI_L2_ICM     Level 2 instruction cache misses
PAPI_L3_DCM
2. Intel Pentium Series: (register_code & 0xffffff) << 8 | (register_number & 0xff)
MIPS R10K & R12K: Low 8 bits indicate which event number (0-31)
UltraSparc I, II & III: 8-bit event code in bits 8-16, counter number in bits 0-7

For more information on the native encoding for your platform, please see the README file for your platform in the papi source distribution.

APPENDIX F. TABLE OF OVERHEAD FOR THE VARIOUS PLATFORMS
<under development>

APPENDIX G. TABLE FOR MULTIPLEXING
<under development>

APPENDIX H. TABLE FOR OVERFLOW
<under development>

APPENDIX I. PAPI SUPPORTED TOOLS

TOOLS        LINKS
Cactus       http://www.cactuscode.org
DEEP/PAPI    http://www.psrv.com/deep_papi_top.html
DynaProf     http://www.cs.utk.edu/~mucci/dynaprof
GProf        http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
Paradyn      http://www.cs.wisc.edu/~paradyn
Perfometer   http://icl.cs.utk.edu/projects/papi/tools
SCALEA       http://www.par.univie.ac.at/project/scalea
SvPablo      http://vibes.cs.uiuc.edu/Software/SvPablo/svPablo.htm
TAU          http://www.cs.uoregon.edu/research/paracomp/tau
Vampir       http://www.pallas.com/e/products/vampir/index.htm
Vprof        http://aros.ca.sandia.gov/~cljanss/perf/vprof

BIBLIOGRAPHY

Browne, S., Dongarra, J., Garner, N., London, K., and Mucci,
3. The above function allows more events to be counted than there are physical counters by timesharing the existing counters, at some loss in precision. This function should be used after calling PAPI_library_init. After this function is called, the user can proceed to use the normal PAPI routines. Note that applications that make no use of multiplexing should not call this function.

On success, this function returns PAPI_OK, and on error, a non-zero error code is returned.

For more information on this function, see Appendix C, and for a code example, see the next section.

CONVERTING AN EVENT SET INTO A MULTIPLEXED EVENT SET

In addition, a standard event set can be converted to a multiplexed event set by calling the following low-level function:

C:
    PAPI_set_multiplex(EventSet)

Fortran:
    PAPIF_set_multiplex(EventSet)

ARGUMENT
EventSet: a pointer to an integer handle for a PAPI event set as created by PAPI_create_eventset.

The above function converts a standard PAPI event set created by a call to PAPI_create_eventset into an event set capable of handling multiplexed events. This function must be used after calling PAPI_multiplex_init and PAPI_create_eventset, but prior to calling PAPI_start. Events can be added to an event set either before or after converting it into a multiplexed set, but the conversion must be done prior to using it as a multiplexed set.

In the followi
4. OUTPUT

    Name: PAPI_TOT_INS
    Code: 80000032
    Label: Instr completed
    Description: Instructions completed

Note that the event code is in hexadecimal, which is consistent with all the preset translation functions. The hexadecimal values of each preset are specified in the header file papiStdEventDefs.h.

On success, all the functions return PAPI_OK, and on error, a non-zero error code is returned.

For more information about the preset translation functions, see Appendix C.

PAPI'S COUNTER INTERFACES

HIGH LEVEL API

WHAT IS A HIGH LEVEL API?

The high-level API (Application Programming Interface) simply provides the ability to start, stop, and read the counters for a specified list of events. It is meant for single-thread applications and for programmers wanting simple, coarse-grained measurements. It is not thread safe and allows only PAPI preset events. Compared with the low-level API, the high-level API is easier to use and requires less additional setup code. The high-level API can be used in conjunction with the low-level API, and in fact it calls the low-level API. However, the high-level API by itself is only able to access those events countable simultaneously by the underlying hardware.

There are six functions that represent the high-level API that allow the user to access and count specific hardwa
5. P. A Portable Programming Interface for Performance Evaluation on Modern Processors. University of Tennessee Technical Report, Knoxville, Tennessee, July 2000. http://icl.cs.utk.edu/papi/documents

Browne, S., Dongarra, J., Garner, N., London, K., and Mucci, P. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters. Proc. SC 2000, November 2000. http://icl.cs.utk.edu/papi/documents

Dongarra, J., London, K., Moore, S., Mucci, P., and Terpstra, D. Using PAPI for Hardware Performance Monitoring on Linux Systems. Conference on Linux Clusters: The HPC Revolution, Urbana, Illinois, June 25-27, 2001. http://icl.cs.utk.edu/papi/documents

London, K., Moore, S., Mucci, P., Seymour, K., and Luczak, R. The PAPI Cross-Platform Interface to Hardware Performance Counters. Department of Defense Users' Group Conference Proceedings, Biloxi, Mississippi, June 18-21, 2001. http://icl.cs.utk.edu/papi/documents

London, K., Dongarra, J., Moore, S., Mucci, P., Seymour, K., and Spencer, T. End-user Tools for Application Performance Analysis Using Hardware Counters. International Conference on Parallel and Distributed Computing Systems, Dallas, TX, August 8-10, 2001. http://icl.cs.utk.edu/papi/documents

Mucci, P., Moore, S., and Smeds, Nils. Performance Tuning Using Hardware Counter Data. Proc. SC 2001, November 2001. http://icl.cs.utk.edu/papi/documents

Mucci, P. The IA64 Hardwar
6. PAPI_PROFIL_COMPRESS: ignore samples if hash buckets get big.
prof: pointer to PAPI_sprofil_t structure.
profcnt: number of buffers for hardware profiling (reserved).

PAPI_profil creates a histogram of overflow counts for a specified region of the application code by using its first four parameters to create the data structures needed by PAPI_sprofil, and then calls PAPI_sprofil to do the work. PAPI_sprofil assumes a pre-initialized PAPI_sprofil_t structure and enables profiling for the EventSet based on its value. Note that the EventSet must be in the stopped state in order for both calls to succeed.

In the following code example, PAPI_profil is used to generate a PC histogram:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>   /* exit, malloc */

    int main(void)
    {
        int retval;
        int EventSet = PAPI_NULL;
        unsigned long start, end, length;
        PAPI_exe_info_t *prginfo;
        unsigned short *profbuf;

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);

        /* A positive value other than PAPI_VER_CURRENT means a version mismatch */
        if (retval != PAPI_VER_CURRENT && retval > 0) {
            fprintf(stderr, "PAPI library version mismatch!\n");
            exit(1);
        }

        /* A negative value is an initialization error */
        if (retval < 0)
            handle_error(retval);

        if ((prginfo = PAPI_get_executable_info()) == NULL)
            handle_error(1);

        /* Size the histogram buffer from the program's text segment */
        start = (unsigned long)prginfo->text_start;
        end = (unsigned long)prginfo->text_end;
        length = end - start;

        profbuf = (unsigned short *)malloc(length * sizeof(unsigned short));
        if (profbuf == NULL)
            handle_err
7. PAPIF_create_eventset(EventSet, check)

ARGUMENT
EventSet: address of an integer location to store the new EventSet handle.

Note that EventSet must be initialized to PAPI_NULL before calling this function. The user may then add hardware events to the EventSet by calling PAPI_add_event or similar functions.

On success, this function returns PAPI_OK, and on error, a non-zero error code is returned.

For more information on this function, see Appendix C. For a code example, see the next section.

ADDING EVENTS TO AN EVENT SET

Hardware events can be added to an event set by calling the following low-level functions:

C:
    PAPI_add_event(EventSet, EventCode)
    PAPI_add_events(EventSet, EventCode, number)

Fortran:
    PAPIF_add_event(EventSet, EventCode, check)
    PAPIF_add_events(EventSet, EventCode, number, check)

ARGUMENTS
EventSet: an integer handle for a PAPI Event Set as created by PAPI_create_eventset.
EventCode (PAPI_add_event): a defined event such as PAPI_TOT_INS.
EventCode (PAPI_add_events): an array of defined events.
number: an integer indicating the number of events in the array EventCode.

PAPI_add_event adds a single hardware event to a PAPI event set. PAPI_add_events does the same as PAPI_add_event, but for an array of hardware event codes.

In the following code example, the preset event PAPI_TOT_INS is added to an event set.

On success, both of these functions return PAPI_OK, and on error
8. at the University of Tennessee's Innovative Computing Laboratory in the Computer Science Department. This project was created to design, standardize, and implement a portable and efficient API (Application Programming Interface) to access the hardware performance counters found on most modern microprocessors.

BACKGROUND

Hardware counters exist on every major processor today, such as Intel Pentium, IA-64, AMD Athlon, and IBM POWER series. These counters can provide performance tool developers with a basis for tool development, and application developers with valuable information about sections of their code that can be improved. However, there are only a few APIs that allow access to these counters, and most of them are poorly documented, unstable, or unavailable. In addition, performance metrics may have different definitions and different programming interfaces on different platforms.

These considerations motivated the development of the PAPI Project. Some goals of the PAPI Project are as follows:

- To provide a solid foundation for cross-platform performance analysis tools
- To present a set of standard definitions for performance metrics on all platforms
- To provide a standardized API among users, vendors, and academics
- To be easy to use, well documented, and freely available

ARCHITECTURE

The figure below shows the internal design of the PAPI architecture. In this figure, we can see the two layers of the architecture. The Portable La
9. event */
        native = ((0x800000 & 0xffffff) << 8) | (0x01 & 0xff);

        if (PAPI_add_event(&EventSet, native) != PAPI_OK)
            handle_error(1);
    }

For more code examples, see tests/native.c in the papi source distribution.

PRESET EVENTS

WHAT ARE PRESET EVENTS?

Preset events, also known as predefined events, are a common set of events deemed relevant and useful for application performance tuning. These events are typically found in many CPUs that provide performance counters and give access to the memory hierarchy, cache coherence protocol events, cycle and instruction counts, and functional unit and pipeline status. Furthermore, preset events are mappings from symbolic names (PAPI preset name) to machine-specific definitions (native countable events) for a particular hardware resource. For example, Total Cycles (in user mode) is PAPI_TOT_CYC. PAPI also supports presets that may be derived from the underlying hardware metrics. For example, Floating Point Instructions per Second is PAPI_FLOPS. On any particular platform, a preset can be directly available as a single counter, derived using a combination of counters, or unavailable.

The PAPI library names approximately 100 preset events, which are defined in the header file papiStdEventDefs.h. For a given platform, a subset of these preset events can be counted through either a simple high-level programming interface or a more complete C or Fortran low
10. exist to call these functions.

PAPI_describe_event is used to translate either an ASCII PAPI preset name or an integer PAPI preset event code into the corresponding event code or name, as well as an ASCII description of that event. If the EventName argument is a string of length > 0, it is assumed to contain the name to look up, and the corresponding event code is returned in the argument EventCode. Otherwise, the EventCode argument is used to look up the event name, which is stored in the EventName argument. Finally, a descriptive string of length less than PAPI_MAX_STR_LEN is copied to the argument EventDescr. Note that the functionality of this call is a superset of the PAPI_event_name_to_code and PAPI_event_code_to_name calls.

PAPI_label_event is used to translate an integer PAPI event code into a short (<18 character) ASCII label that is more descriptive than the preset name but shorter than the description. These labels can be used as event identifiers in third-party tools.

PAPI_event_name_to_code is used to translate an ASCII PAPI preset name into an integer PAPI event code.

PAPI_event_code_to_name is used to translate an integer PAPI event code into an ASCII PAPI preset name.

In the following code example, PAPI_event_name_to_code is used to translate a string into an integer EventCode, and PAPI_label_event is used to label an event from an event code.
11. implementations, the string is truncated or space padded as necessary. For other implementations, the length of the character array is assumed to be of sufficient size. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.

For more information on all of the function calls and their job descriptions, see Appendix B for the high-level functions and Appendix C for the low-level functions.

EVENTS

WHAT ARE EVENTS?

Events are occurrences of specific signals related to a processor's function. Hardware performance counters exist as a small set of registers that count events, such as cache misses and floating point operations, while the program executes on the processor. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. Each processor has a number of events that are native to that architecture. PAPI provides a software abstraction of these architecture-dependent native events into a collection of preset events that are accessible through the PAPI interface.

NATIVE EVENTS

WHAT ARE NATIVE EVENTS?

Native events comprise the set of all events that are countable by the CPU. In many cases, these events will be available through a matching preset PAPI event. Even if no preset event is available, native events can still be accessed directly. These events are
12. intended to be used by people who are very familiar with the particular platform in use. PAPI provides access to native events on all supported platforms through the low-level interface. Native events use the same interface as used when setting up a preset event, but a CPU-specific bit pattern is used instead of the PAPI event definition.

Native encoding is usually:

    (register_code & 0xffffff) << 8 | (register_number & 0xff)

Native encodings are platform dependent, so the above native encoding may or may not work with your platform. To determine the native encoding for your platform, see Appendix F or the README file for your platform in the PAPI source distribution. In addition, the native event lists for the various platforms can be found in the processor architecture manual.

Native events are specified as arguments to the low-level function PAPI_add_event. In the following code example, a native event is added by using PAPI_add_event with the register code 0x800000 and the register number 0x01:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>   /* exit */

    int main(void)
    {
        int retval, EventSet = PAPI_NULL;
        unsigned int native = 0x0;

        /* Initialize the library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);

        if (retval != PAPI_VER_CURRENT) {
            printf("PAPI library init error!\n");
            exit(1);
        }

        if (PAPI_create_eventset(&EventSet) != PAPI_OK)
            handle_error(1);

        /* Add the native
13. project, its motivation, and its architecture.

III. HOW TO INSTALL PAPI ONTO YOUR SYSTEM
This section provides an installation guide for PAPI. It states the steps necessary to install PAPI on the various supported operating systems.

IV. C AND FORTRAN CALLING INTERFACES
This section states the header files in which function calls are defined and the form of the function calls for both the C and Fortran calling interfaces. It also provides a table that shows the relation between certain pseudo-types and Fortran variable types.

V. EVENTS
This section provides an explanation of events, as well as an explanation of native and preset events. The preset query and translation functions are also discussed in this section. There are code examples using native events, preset query, and preset translation, with the corresponding output.

VI. PAPI COUNTER INTERFACES
This section discusses the high-level and low-level interfaces in detail. The initialization and functions of these interfaces are also discussed. Code examples, along with the corresponding output, are included as well.

VII. PAPI TIMERS
This section explains the PAPI functions associated with obtaining real and virtual time from the platform's timers. Code examples, along with the corresponding output, are included as well.

VIII. PAPI SYSTEM INFORMATION
This section explains the PAPI functions associated with obtaining hardware and execut
14. the counters, the value is approximately twice as large as the first line (after reading the counters) because PAPI_read_counters resets and leaves the counters running, then PAPI_accum_counters adds the value of the current counter into the values array.

LOW LEVEL API

The following is a simple code example that applies the same technique as the above example, except that it uses the Low Level API.

POSSIBLE OUTPUT

    After reading the counters: 440973
    After adding the counters: S622536
    After stopping the counters: 443913

Notice that in order to get the desired results (the second line approximately twice as large as the first line), PAPI_reset was called to reset the counters, since PAPI_read did not reset the counters.

PAPI TIMERS

PAPI timers use the most accurate timers available on the platform in use. These timers can be used to obtain both real and virtual time on each supported platform. The real time clock runs all the time (e.g., a wall clock), and the virtual time clock runs only when the processor is running in user mode.

REAL TIME

Real time can be acquired in clock cycles and microseconds by calling the following low-level functions, respectively:

C:
    PAPI_get_real_cyc()
    PAPI_get_real_usec()

Fortran:
    PAPIF_get_real_cyc(check)
    PAPIF_get_real_usec(check)

Both of these functions return the total rea
15. the event CANNOT be counted.
EventNote: additional text information about an event, if available.
flags: provides additional information about an event, e.g., PAPI_DERIVED for an event derived from 2 or more other events.

Note that PAPI_query_all_events_verbose is not implemented in Fortran because it returns a C pointer to an array of C structures.

PAPI_query_event asks the PAPI library if the PAPI preset event can be counted on this architecture. If the event CAN be counted, the function returns PAPI_OK. If the event CANNOT be counted, the function returns an error code. On some platforms, this function can also be used to check the syntax of a native event.

PAPI_query_event_verbose asks the PAPI library for a copy of an event descriptor. This descriptor can then be used to investigate the details about the event. In Fortran, the individual fields in the descriptor are returned as parameters.

PAPI_query_all_events_verbose asks the PAPI library to return a pointer to an array of event descriptors. The number of objects in the array is PAPI_MAX_PRESET_EVENTS, and each object is a descriptor as returned by PAPI_query_event_verbose.

OUTPUT (IF THE EVENT PAPI_TOT_INS IS AVAILABLE ON YOUR SYSTEM)

In the above code example, PAPI_query_event is used to see if a preset (PAPI_TOT_INS) exists, PAPI_query_event_verbose is used to query
16. E OUTPUT

For more information on these functions, see Appendix C.

PAPI SYSTEM INFORMATION

EXECUTABLE INFORMATION

Information about the executable's address space can be obtained by using the following low-level function:

C:
    PAPI_get_executable_info()

Fortran:
    PAPIF_get_exe_info(fullname, name, text_start, text_end, data_start, data_end, bss_start, bss_end, lib_preload_env, check)

ARGUMENTS
The following arguments are implicit in the structure returned by the C function, or explicitly returned by Fortran.

fullname: fully qualified path + filename of the executable.
name: filename of the executable with no path information.
text_start, text_end: start and end addresses of the program text segment.
data_start, data_end: start and end addresses of the program data segment.
bss_start, bss_end: start and end addresses of the program bss segment.
lib_preload_env: environment variable for preloading libraries.

Note that the arguments text_start and text_end are the only fields that are filled on every architecture.

In C, this function returns a pointer to a structure containing information about the current program, such as the start and end addresses of the text, data, and bss segments. In Fortran, the fields of the structure are returned explicitly.

In the following code example, PAPI_get_executable_info is used to acquire information about the start and end addresses of the progra
17. HREADS

A thread is an independent flow of instructions that can be scheduled to run by the operating system. Multi-threaded programming is a form of parallel programming in which several controlled threads execute concurrently in the program. All threads execute in the same memory space and can therefore work concurrently on shared data. Threads can run in parallel on several processors, allowing a single program to divide its work between several processors and thus run faster than a single-threaded program, which runs on only one processor at a time.

PAPI only supports thread-level measurements with kernel or bound threads, which are threads that have a scheduling entity known and handled by the operating system's kernel. In most cases, such as with SMP or OpenMP compiler directives, bound threads will be the default. Each thread is responsible for the creation, start, stop, and read of its own counters. When a thread is created, it inherits no PAPI information from the calling thread. There are some threading packages or APIs that can be used to manipulate threads with PAPI, particularly Pthreads and OpenMP. Pthreads users should take care to set the scope of each thread to the PTHREAD_SCOPE_SYSTEM attribute, unless the system is known to have a non-hybrid thread library implementation.

In addition, PAPI does support unbound or non-kernel threads, but the counts will reflect the total events for the process. Measurements that are
18. PAPI USER'S GUIDE

TABLE OF CONTENTS

I. Preface
    Intended Audience
    Organization of This Document
    Document Convention
II. Introduction to PAPI
    What is PAPI?
    PAPI Background / Motivation
    PAPI Architecture / Internal Design
III. How to install PAPI onto your system
IV. C and Fortran Calling Interfaces
V. Events
    What are Events?
    Native Events
    What are Native Events?
    Preset Events
    What are Preset Events?
    Preset Query
    Preset Translation
VI. PAPI's Counter Interfaces
    High Level API
    What is a High Level API?
    Initialization of a High Level API
    Reading, Adding, and Stopping Counters
    Mflops/s, Real Time, and Processor Time
    Low Level API
    What is a Low Level API?
    Initialization of a Low Level API
    Event Sets
    What are Event Sets?
    Creating an Event Set
    Adding events to an Event Set
    Starting, Reading, Adding, and Stopping events in an Event Set
    Resetting events in an Event Set
    Removing events in an Event Set
    Emptying and Destroying an Event Set
    The State of an Event Set
    Getting and Setting Options
    Simple Code Examples
    High Level API
    Low Level API
VII. PAPI Timers
    Real Time
    Virtual Time
VIII. PAPI System Information
    Executable Information
    Hardware Information
IX. Advanced PAPI Features
    Multiplexing
    What is Multiplexing?
    Using PAPI with Multiplexing
    Initialization of Multiplex Support
    Con
19. PAPI_start_counters and return their values.

In the following code example, PAPI_read_counters and PAPI_stop_counters are used to copy and stop event counters in an array, respectively.

On success, all of these functions return PAPI_OK, and on error, a non-zero error code is returned.

For more information on these functions, see Appendix B.

MFLOPS/S, REAL TIME, AND PROCESSOR TIME

Mflops/s, real time, and processor time can be obtained by calling the following high-level function:

C:
    PAPI_flops(real_time, proc_time, flpins, mflops)

Fortran:
    PAPIF_flops(real_time, proc_time, flpins, mflops, check)

ARGUMENTS
real_time: the total real time since the first PAPI_flops call.
proc_time: the total process time since the first PAPI_flops call.
flpins: the total floating point instructions since the first PAPI_flops call.
mflops: Mflops/s achieved since the latest PAPI_flops call.

The first call to PAPI_flops initializes the PAPI library, sets up the counters to monitor the PAPI_FP_INS and PAPI_TOT_CYC events, and starts the counters. Subsequent calls will read the counters and return total real time, total process time, total floating point instructions, and the Mflops/s rate since the last call to PAPI_flops. Any call with flpins = -1 will reinitialize all counters to 0. Note that most platforms are only capable of counting the number
20. PI_ECLOST: Access to the counters was lost or interrupted

VALUE   SYMBOL            DEFINITION
-6      PAPI_EBUG         Internal error, please send mail to the developers
-7      PAPI_ENOEVNT      Hardware event does not exist
-8      PAPI_ECNFLCT      Hardware event exists, but cannot be counted due to counter resource limitations
-9      PAPI_ENOTRUN      No events or event sets are currently counting
-10     PAPI_EISRUN       Event Set is currently running
-11     PAPI_ENOEVST      No such event set available
-12     PAPI_ENOTPRESET   Event is not a valid preset
-13     PAPI_ENOCNTR      Hardware does not support performance counters
-14     PAPI_EMISC        Unknown error code

CONVERTING ERROR CODES TO ERROR MESSAGES

Error codes can be converted to error messages by calling the following low-level functions:

C:
    PAPI_perror(code, destination, length)
    PAPI_strerror(code)

Fortran:
    PAPIF_perror(code, destination, check)

ARGUMENTS
code: the error code to interpret.
destination: the error message in quotes.
length: either 0 or strlen(destination).

PAPI_perror fills the string destination with the error message corresponding to the error code code. The function copies length worth of the error description string corresponding to code into destination. The resulting string is always null terminated. If length is 0, then the string is printed to stderr.

PAPI_strerror returns a pointer to the error message corresponding to the error code code. If the call fails, the fu
21. The following is a code example of using PAPI_library_init to initialize the PAPI library.

On success, this function returns PAPI_VER_CURRENT. On error, a positive return code other than PAPI_VER_CURRENT indicates a library version mismatch, and a negative return code indicates an initialization error.

For more information on this function, see Appendix C.

EVENT SETS

WHAT ARE EVENT SETS?

Event Sets are user-defined groups of hardware events (preset or native), which are used in conjunction with one another to provide meaningful information, such as what low-level hardware counters to use, the most recently read counter values, the state of the Event Set (running / not running), and optional settings (e.g., overflow, profiling). Event Sets therefore allow a highly efficient implementation, and as a result, users can obtain more detailed and accurate measurements. In addition, Event Sets are managed by the user through integer handles, which helps simplify inter-language calling conventions. There are no real programming restrictions on the use of Event Sets. The user is free to allocate and use any number of them, provided the substrate can provide the required resources. They may be used simultaneously and may even share counter values.

CREATING AN EVENT SET

An event set can be created by calling the following low-level function:

C:
    PAPI_create_eventset(EventSet)

Fortran:
22. POSSIBLE OUTPUT (VARIES ON DIFFERENT SYSTEMS)

    This system has 4 available counters.

On success, PAPI_num_counters returns the number of hardware counters available on the system, and on error, a non-zero error code is returned. Optionally, the PAPI library can be initialized explicitly by using PAPI_library_init.

For more information on these functions, see Appendix B.

READING, ADDING, AND STOPPING COUNTERS

Counters can be read, added, and stopped by calling the following high-level functions, respectively:

C:
    PAPI_read_counters(values, array_length)
    PAPI_accum_counters(values, array_length)
    PAPI_stop_counters(values, array_length)

Fortran:
    PAPIF_read_counters(values, array_length, check)
    PAPIF_accum_counters(values, array_length, check)
    PAPIF_stop_counters(values, array_length, check)

ARGUMENTS
values: an array in which to put the counter values.
array_length: the number of items in the values array.

PAPI_read_counters and PAPI_accum_counters read (copy) and add the event counters into the array values, respectively. PAPI_accum_counters resets the counters to zero, while PAPI_read_counters does not. The counters are left running after the call of these functions. Care must be exercised when using PAPI_accum_counters with rate-based events such as PAPI_FLOPS or PAPI_IPS. Adding such values together is likely to produce unexpected results.

PAPI_stop_counters stops the counters started by the function
23. Level 3 data cache misses

PRESET NAME     DESCRIPTION
PAPI_L3_ICM     Level 3 instruction cache misses
PAPI_L1_TCM     Level 1 total cache misses
PAPI_L2_TCM     Level 2 total cache misses
PAPI_L3_TCM     Level 3 total cache misses
PAPI_CA_SNP     Requests for a snoop
PAPI_CA_SHR     Requests for access to shared cache line (SMP)
PAPI_CA_CLN     Requests for access to clean cache line (SMP)
PAPI_CA_INV     Cache line invalidation (SMP)
PAPI_CA_ITV     Cache line intervention (SMP)
PAPI_L3_LDM     Level 3 load misses
PAPI_L3_STM     Level 3 store misses
PAPI_BRU_IDL    Cycles branch units are idle
PAPI_FXU_IDL    Cycles integer units are idle
PAPI_FPU_IDL    Cycles floating point units are idle
PAPI_LSU_IDL    Cycles load/store units are idle
PAPI_TLB_DM     Data translation lookaside buffer misses
PAPI_TLB_IM     Instruction translation lookaside buffer misses
PAPI_TLB_TL     Total translation lookaside buffer misses
PAPI_L1_LDM     Level 1 load misses

Platform columns: AMD Athlon K7, Intel/HP Itanium, Intel Pentium III, MIPS R12K, IBM POWER3, UltraSPARC I

PRESET NAME     DESCRIPTION
PAPI_L1_STM     Level 1 store misses
PAPI_L2_LDM     Level 2 load misses
PAPI_L2_STM     Level 2 store misses
PAPI_BTAC_M     Branch target address cache (BTAC) misses
PAPI_PRF_DM     Pre-fetch data in
24. _INS

PRESET NAME     DESCRIPTION
PAPI_FP_INS     Floating point instructions executed
PAPI_LD_INS     Load instructions executed
PAPI_SR_INS     Store instructions executed
PAPI_BR_INS     Total branch instructions executed
PAPI_VEC_INS    Vector/SIMD instructions executed
PAPI_FLOPS      Floating point instructions executed per second
PAPI_RES_STL    Cycles processor is stalled on resource
PAPI_FP_STAL    Cycles any FP units are stalled
PAPI_TOT_CYC    Total cycles
PAPI_IPS        Instructions executed per second
PAPI_LST_INS    Total load/store instructions executed
PAPI_SYC_INS    Synchronization instructions executed
PAPI_L1_DCH     L1 data cache hits
PAPI_L2_DCH     L2 data cache hits
PAPI_L1_DCA     L1 data cache accesses
PAPI_L2_DCA     L2 data cache accesses
PAPI_L3_DCA     L3 data cache accesses
PAPI_L1_DCR     L1 data cache reads
PAPI_L2_DCR     L2 data cache reads
PAPI_L3_DCR     L3 data cache reads
PAPI_L1_DCW     L1 data cache writes
PAPI_L2_DCW     L2 data cache writes
PAPI_L3_DCW     L3 data cache writes
PAPI_L1_ICH     L1 instruction cache hits
PAPI_L2_ICH     L2 instruction cache hits
PAPI_L3_ICH     L3 instruction cache hits
PAPI_L1_ICA     L1 instruction cache accesses
PAPI_L2_ICA     L2 instruction cache accesse
25. a non-zero error code is returned. For more information on these functions, see Appendix C.

STARTING, READING, ADDING, AND STOPPING EVENTS IN AN EVENT SET

Hardware events in an event set can be started, read, added, and stopped by calling the following low-level functions, respectively:

C:
  PAPI_start(EventSet)
  PAPI_read(EventSet, values)
  PAPI_accum(EventSet, values)
  PAPI_stop(EventSet, values)

Fortran:
  PAPIF_start(EventSet, check)
  PAPIF_read(EventSet, values, check)
  PAPIF_accum(EventSet, values, check)
  PAPIF_stop(EventSet, values, check)

ARGUMENTS
  EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset
  values: an array to hold the counter values of the counting events

PAPI_start starts the counting events in a previously defined event set.

PAPI_read reads (copies) the counters of the indicated event set into the array values. The counters are left counting after the read, without resetting.

PAPI_accum adds the counters of the indicated event set into the array values. The counters are reset and left counting after the call of this function. Use caution when calling PAPI_accum with rate-based derived events such as PAPI_FLOPS or PAPI_IPS. The results will most likely not be what was expected, since rates are not additive.

PAPI_stop stops the counting events in a previously defined event set and returns the current events.

The following is a c
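The difference between reading and accumulating described above can be modeled with a toy counter. This is only a sketch of the documented semantics: `sim_eventset`, `sim_read`, and `sim_accum` are hypothetical names invented for illustration, not part of the PAPI library, and a single simulated counter stands in for the hardware.

```c
#include <assert.h>

/* Hypothetical stand-in for one running hardware counter (not a PAPI type). */
typedef struct {
    long long hw;   /* events counted since the last reset */
} sim_eventset;

/* Models the read semantics: copy the counter, leave it counting. */
static void sim_read(const sim_eventset *es, long long *value)
{
    assert(es != 0 && value != 0);
    *value = es->hw;
}

/* Models the accum semantics: add the counter into *value, then reset it. */
static void sim_accum(sim_eventset *es, long long *value)
{
    assert(es != 0 && value != 0);
    *value += es->hw;
    es->hw = 0;
}
```

Because the accumulate operation adds into the caller's array rather than overwriting it, a value array that still holds an earlier reading ends up with roughly the sum of both measurements, which is why rate-based events give misleading results when accumulated.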
26. able information. Code examples along with the corresponding output are included as well.

IX. ADVANCED PAPI FEATURES
This section discusses the advanced features of PAPI, which include multiplexing, threads, MPI, overflows, and statistical profiling. The functions that are used to implement these features are also discussed. Code examples along with the corresponding output are included as well.

X. PAPI ERROR HANDLING
This section discusses the various negative error codes that are returned by the PAPI functions. A table with the names, values, and descriptions of the return codes is given, as well as a discussion of the PAPI function that can be used to convert error codes to error messages, along with a code example with the corresponding output.

XI. PAPI MAILING LISTS
This section provides information on PAPI's two mailing lists, where users can ask questions about the project.

XII. APPENDICES
These appendices provide various listings and tables, such as a table of preset events and the platforms on which they are supported, a table of PAPI supported tools, and more information on native events, multiplexing, and overflow.

DOCUMENT CONVENTION
handle_error(1): a function, to be written by the user, that is passed the argument 1 and handles errors.

INTRODUCTION TO PAPI

WHAT IS PAPI?
PAPI is an acronym for Performance Application Programming Interface. The PAPI Project is being developed
27. articular hardware event exceeds a specified threshold. PAPI provides the ability to call user-defined handlers when an overflow occurs, which is accomplished by setting up a high-resolution interval timer and installing a timer interrupt handler. For the systems that do not support counter overflow at the operating system level, PAPI uses the signal SIGPROF and compares the current counter value against the threshold. If the current value exceeds the threshold, then the user's handler is called from within the signal context with some additional arguments. These arguments allow the user to determine which event overflowed, how much it overflowed, and at what location in the source code.

Using the same mechanism as for user-programmable overflow, PAPI also guards against register precision overflow of counter values. Each counter can potentially be incremented multiple times in a single clock cycle. This fact, combined with increasing clock speeds and the small precision of some of the physical counters, means that an overflow is likely to occur on platforms where 64-bit counters are not supported in hardware or by the operating system. In those cases, PAPI implements 64-bit counters in software using the very same mechanism that handles overflow dispatch. For more information on which platforms support hardware or software overflow, see Appendix H.

BEGINNING OVERFLOWS IN EVENT SETS
An event set can begin registering overflows by calling the followin
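The idea of widening a narrow physical counter to 64 bits in software can be sketched as follows. This is a generic illustration under stated assumptions, not PAPI's actual implementation: it assumes a 32-bit hardware register that is sampled at least once per wraparound, and the names `ext_counter` and `ext_counter_update` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software-extended counter (not a PAPI type). */
typedef struct {
    uint32_t last_raw;  /* last value sampled from the 32-bit register */
    uint64_t total;     /* accumulated 64-bit count */
} ext_counter;

/* Fold a new raw sample into the 64-bit total. Unsigned subtraction is
 * modulo 2^32, so the delta is correct even if the register wrapped once
 * since the previous sample. */
static uint64_t ext_counter_update(ext_counter *c, uint32_t raw)
{
    assert(c != 0);
    uint32_t delta = raw - c->last_raw;
    c->last_raw = raw;
    c->total += delta;
    return c->total;
}
```

The sketch only stays correct if the register is sampled more often than it can wrap, which is why a periodic timer or overflow interrupt is a natural place to call such an update.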
28. ates that in order to pass a subroutine
C as an argument, the subroutine must be
C declared external.
      call PAPIF_thread_init(omp_get_thread_num, 0, error)

On success, the function PAPI_thread_init returns PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix C, and for a code example of using PAPI_thread_init with Pthreads, see the next section.

THREAD ID
The identifier of the current thread can be obtained by calling the following low-level function:

C:
  PAPI_thread_id()

Fortran:
  PAPIF_thread_id(check)

This function calls the thread ID function registered by PAPI_thread_init and returns an unsigned long integer containing the thread identifier.

In the following code example, PAPI_thread_init and PAPI_thread_id are used with Pthreads to initialize thread support in the PAPI library and to acquire the identifier of the current thread, respectively.

OUTPUT

On success, this function returns a valid thread identifier, and on error, (unsigned long int) -1 is returned. More information on this function can be found in Appendix C.

For more code examples of using Pthreads and OpenMP with PAPI, see tests/zero_pthreads.c and tests/zero_omp.c in the papi source distribution, respectively. Also, for a code example of using SMP with PAPI, see tests/zero_smp.c in the papi sour
29. ce distribution.

MPI
MPI is an acronym for Message Passing Interface. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementers, and users. MPI was designed for high performance on both massively parallel machines and on workstation clusters. More information on MPI can be found at http://www-unix.mcs.anl.gov/mpi.

PAPI does support MPI. In applications that use multiplexing, profiling, or overflow, MPI uses a default virtual timer, which must be converted to a real timer in order for the application to work properly. Otherwise, the application will exit. Optionally, the supported tools TAU and SvPablo can be used to implement PAPI with MPI.

The following is a code example of using MPI's PI program with PAPI:

POSSIBLE OUTPUT (AFTER ENTERING 50, 75, AND 100 AS INPUT)
Enter the number of intervals: (0 quits) 50
pi is approximately 3.14162593869230028, Error is 0.0000333333332097
Enter the number of intervals: (0 quits) 75
pi is approximately 3.1416074684045965, Error is 0.0000148148148034
Enter the number of intervals: (0 quits) 100
pi is approximately 3.1416009669231354, Error is 0.0000083333333323
Enter the number of intervals: (0 quits) 0
reaching Coumeerss 117593 SISO Once Ominieias EAZA

OVERFLOW

WHAT IS AN OVERFLOW?
An overflow is when a p
30. d by the current substrate. PAPI_num_counters initializes the PAPI library using PAPI_library_init if necessary.

PAPI_start_counters initializes the PAPI library (if necessary) and starts counting the events named in the events array. This function implicitly stops and initializes any counters running as a result of a previous call to PAPI_start_counters. It is the user's responsibility to choose events that can be counted simultaneously by reading the vendor's documentation. The length of the events array should be no longer than the value returned by PAPI_num_counters.

The first call to PAPI_flops only initializes the library. For more information on PAPI_flops, see the following section, Mflops/s, Real Time, and Processor Time.

In the following code example, PAPI_num_counters is used to initialize the library and to get the number of hardware counters available on the system. Also, PAPI_start_counters is used to start counting events:

  #include <stdio.h>
  #include <papi.h>

  int main(void)
  {
      int Events[2] = { PAPI_TOT_CYC, PAPI_TOT_INS };
      int num_hwcntrs = 0;

      /* Initialize the PAPI library and get the number of counters available */
      if ((num_hwcntrs = PAPI_num_counters()) <= PAPI_OK)
          handle_error(1);

      printf("This system has %d available counters.", num_hwcntrs);

      /* Only two events are requested, so use at most two counters */
      if (num_hwcntrs > 2)
          num_hwcntrs = 2;

      /* Start counting events */
      if (PAPI_start_counters(Events, num_hwcntrs) != PAPI_OK)
          handle_error(1);
  }
31. details about the event, and PAPI_query_all_events_verbose is used to acquire details about all PAPI events.

On success, PAPI_query_event and PAPI_query_event_verbose return PAPI_OK, and on error, a non-zero error code is returned. On success, PAPI_query_all_events_verbose returns a pointer to an array of PAPI_preset_info_t structures, and on error, a null pointer is returned. For more information about the preset query functions, see Appendix C.

PRESET TRANSLATION
A preset event can be translated to a description, label, number, and string by calling the following low-level functions, respectively:

C:
  PAPI_describe_event(EventName, EventCode, EventDescr)
  PAPI_label_event(EventCode, EventLabel)
  PAPI_event_name_to_code(EventName, EventCode)
  PAPI_event_code_to_name(EventCode, EventName)

Fortran:
  PAPIF_describe_event(EventName, EventCode, EventDesc, check)
  PAPIF_label_event(EventCode, EventLabel, check)
  PAPIF_event_name_to_code(EventName, EventCode, check)
  PAPIF_event_code_to_name(EventCode, EventName, check)

ARGUMENTS
  EventCode: a defined event of integer type, such as PAPI_TOT_INS
  EventName: the event name, such as the preset name PAPI_BR_CN
  EventDescr: a descriptive string for the event of length less than PAPI_MAX_STR_LEN
  EventLabel: a short descriptive label for the event of length less than 18 characters

Note that the preset does not actually have to
32. done in other threads will get all the same values, namely the counts for the total process. For unbound threads, it is not necessary to call PAPI_thread_init, which will be discussed in the next section.

When threads are in use, PAPI allows the user to provide a routine to its library that returns the thread ID of the currently running thread (for example, pthread_self for Pthreads), and this thread ID is used as a lookup key for the internal data structures.

INITIALIZATION OF THREAD SUPPORT
Thread support in the PAPI library can be initialized by calling the following low-level function:

C:
  PAPI_thread_init(handle, flag)

Fortran:
  PAPIF_thread_init(handle, flag, check)

ARGUMENTS
  handle: pointer to a routine that returns the current thread ID
  flag: reserved for future use; should be set to zero

This function should be called only once, just after PAPI_library_init and before any other PAPI calls. If the function is called more than once, the application will exit. Applications that make no use of threads do not need to call this function.

The following example shows the correct syntax for using PAPI_thread_init with OpenMP:

C:
  #include <papi.h>
  #include <omp.h>

  if (PAPI_thread_init(omp_get_thread_num, 0) != PAPI_OK)
      handle_error(1);

Fortran:
  #include "fpapi.h"
  #include "omp.h"
        EXTERNAL omp_get_thread_num
  C Fortran dict
33. e Fortran interface are defined in the header file fpapi.h and consist of the following form:

  PAPIF_function_name(arg1, arg2, ..., check)

As you can see, the C function calls have equivalent Fortran function calls (PAPI_<call> becomes PAPIF_<call>). This is true for most function calls, except for the functions that return C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, which are either not implemented in the Fortran interface or implemented with different calling semantics.

In the function calls of the Fortran interface, the return code of the corresponding C routine is returned in the argument check.

For most architectures, the following relation holds between the pseudo-types listed and Fortran variable types:

Pseudo-type       Fortran type                   Description
C_INT             INTEGER                        Default integer type
C_FLOAT           REAL                           Default real type
C_LONG_LONG       INTEGER*8                      Extended size integer
C_STRING          CHARACTER*(PAPI_MAX_STR_LEN)   Fortran string
C_INT FUNCTION    EXTERNAL INTEGER FUNCTION      Fortran function returning integer result

Array arguments must be of sufficient size to hold the input/output from/to the subroutine for predictable behavior. The array length is indicated either by the accompanying argument or by internal PAPI definitions.

Subroutines accepting C_STRING as an argument are, on most implementations, capable of reading the character string length as provided by Fortran. In these
34. e Performance Monitor and PAPI. The 2001 International Conference on Parallel and Distributed Processing Techniques and Applications, June 2001. http://icl.cs.utk.edu/papi/documents

Mucci, P. PAPI: The Performance Application Programming Interface. April 2000. http://icl.cs.utk.edu/papi/documents

PAPI Programmer Reference. December 2001. http://icl.cs.utk.edu/papi/files/html_man/papi.html

PAPI Software Specification. December 2001. http://icl.cs.utk.edu/papi/files/papispec21.html

POSIX Threads Programming. http://www.llnl.gov/computing/tutorials/workshops/workshop/pthreads/MAIN.html
35. ed explicitly. Note that if this function were called before PAPI_library_init, its behavior would be undefined.

In the following code example, PAPI_get_hardware_info is used to acquire hardware information about the total number of CPUs and the cycle time of the CPU:

POSSIBLE OUTPUT

In C, on success, this function returns a non-NULL pointer, and on error, NULL is returned. In Fortran, on success, this function returns PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix C.

ADVANCED PAPI FEATURES

MULTIPLEXING

WHAT IS MULTIPLEXING?
Multiplexing allows more counters to be used than are supported by the hardware, thus allowing a larger number of events to be counted simultaneously. When a microprocessor can count only a very limited number of events simultaneously, a large application with many hours of run time may require days or weeks of profiling in order to gather enough information on which to base a performance analysis. Multiplexing overcomes this limitation by subdividing the usage of the counter hardware over time (timesharing).

USING PAPI WITH MULTIPLEXING

INITIALIZATION OF MULTIPLEX SUPPORT
Multiplex support in the PAPI library can be enabled and initialized by calling the following low-level function:

C:
  PAPI_multiplex_init()

Fortran:
  PAPIF_multiplex_init(check)
36. f is only able to access those events countable simultaneously by the underlying hardware.

Note that the high-level interface performs initialization implicitly and is not thread safe. Under the covers, it calls PAPI_library_init(PAPI_VER_CURRENT) and PAPI_thread_init(NULL, 0).

Note that the High Level API fully supports both C and Fortran. For full details on the calling semantics of these functions, please refer to the PAPI Programmer's Reference.

APPENDIX C: LOW LEVEL API
The functions of the Low Level PAPI API provide greatly increased efficiency and functionality over the high-level API presented in the previous appendix. As mentioned in the introduction, the low-level API is only as powerful as the substrate upon which it is built. Thus, some features may not be available on every platform. The converse may also be true: more advanced features may be available and defined in the header file. The user is encouraged to read the documentation for each platform carefully.

Note that most functions are implemented in both C and Fortran, but some are implemented in only one of these two languages. For full details on the calling semantics of these functions, please refer to the PAPI Programmer's Reference.

APPENDIX D: PAPI SUPPORTED PLATFORMS

HARDWARE | OPERATING SYSTEM | REQUIREMENTS
Alpha EV6 & EV67 | Tru64 Unix | Contact dcpi hp com for required system software
Alpha EV6 & EV67 Probe
37. g low-level function:

C:
  PAPI_overflow(EventSet, EventCode, threshold, flags, handler)

Fortran:
  NOT IMPLEMENTED

ARGUMENTS
  EventSet: a reference to the event set to use
  EventCode: the counter to be used for overflow detection
  threshold: the overflow threshold value to use
  flags: bit map that controls the overflow mode of operation. This is currently not used and should be set to 0.
  handler: the handler function to call upon overflow

This function marks a specific EventCode in an EventSet to generate an overflow signal after every threshold events are counted. Only one event in an event set can be used as an overflow trigger. Subsequent calls to PAPI_overflow replace earlier calls. To turn off overflow, set the handler to NULL.

In the following code example, PAPI_overflow is used to mark PAPI_TOT_INS in order to generate an overflow signal after every 100,000 counted events:

On success, this function returns PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix C, and for more code examples, see tests/overflow.c or tests/overflow_pthreads.c in the papi source distribution.

ADDRESS OF THE OVERFLOW
The address where an overflow occurred can be obtained by calling the low-level function:

C:
  PAPI_get_overflow_address(context)

Fo
38. hese functions pass a pointer to the PAPI_option_t structure. Not all options require or return information in this structure. The Fortran interface is a series of calls implementing various subsets of the C interface. Not all options in C are available in Fortran. Note that some options, such as PAPI_SET_DOMAIN, are also available as separate entry points in both C and Fortran.

The file papi.h contains definitions for the structures combined in the PAPI_option_t structure. Users should use the definitions in papi.h that correspond with the library used.

In the following code example, PAPI_get_opt is used to acquire the option PAPI_GET_MAX_HWCTRS of an event set, and PAPI_set_opt is used to set the option PAPI_SET_DOMAIN on the same event set:

POSSIBLE OUTPUT (VARIES ON DIFFERENT PLATFORMS)

On success, these functions return PAPI_OK, and on error, a non-zero error code is returned. For more information on these functions, see Appendix C, and for more code examples, see tests/second.c or tests/third.c in the PAPI source distribution.

SIMPLE CODE EXAMPLES

HIGH LEVEL API
The following is a simple code example of using the high-level API:

POSSIBLE OUTPUT
After reading the counters: 441027
After adding the counters: 891959
After stopping the counters: 443994

Notice that on the second line (after adding
39. l time passed since some arbitrary starting point and are equivalent to wall clock time. Also, these functions always succeed (error-free), since they are guaranteed to exist on every PAPI supported platform.

In the following code example, PAPI_get_real_cyc and PAPI_get_real_usec are used to obtain the real time it takes to create an event set, in clock cycles and microseconds, respectively:

POSSIBLE OUTPUT

For more information on these functions, see Appendix C.

VIRTUAL TIME
Virtual time can be acquired in clock cycles and microseconds by calling the following low-level functions, respectively:

C:
  PAPI_get_virt_cyc()
  PAPI_get_virt_usec()

Fortran:
  PAPIF_get_virt_cyc(check)
  PAPIF_get_virt_usec(check)

Both of these functions return the total number of virtual units from some arbitrary starting point. Virtual units accrue every time a process is running in user mode. Like the real-time counters, these functions always succeed (error-free), since they are guaranteed to exist on every PAPI supported platform. However, the resolution can be as bad as 1/Hz, as defined by the operating system on some platforms.

In the following code example, PAPI_get_virt_cyc and PAPI_get_virt_usec are used to obtain the virtual time it takes to create an event set, in clock cycles and microseconds, respectively:

POSSIBL
40. level interface. For a list and a description of all the preset events, see Appendix A. The exact semantics of an event counter are platform dependent. PAPI preset names are mapped onto available events so that as many similar types of events as possible can be counted on different platforms. Due to hardware implementation differences, it is not necessarily feasible to directly compare the counts of a particular PAPI event obtained on different hardware platforms.

To determine which preset events are available on a specific platform, see Appendix A or run tests/avail.c in the papi source distribution.

PRESET QUERY
The following low-level functions can be called to query the existence of a preset (in other words, whether the hardware supports that preset), to query details about a PAPI event, or to acquire details about all PAPI events, respectively:

C:
  PAPI_query_event(EventCode)
  PAPI_query_event_verbose(EventCode, info)
  PAPI_query_all_events_verbose()

Fortran:
  PAPIF_query_event(EventCode, check)
  PAPIF_query_event_verbose(EventCode, EventName, EventDescr, EventLabel, avail, EventNote, flags, check)

ARGUMENTS
  EventCode: a defined event, such as PAPI_TOT_INS
  EventName: the event name, such as the preset name PAPI_BR_CN
  EventDescr: a descriptive string for the event of length less than PAPI_MAX_STR_LEN
  EventLabel: a short descriptive label for the event of length less than 18 characters
  avail: zero if
41. m's text segment:

POSSIBLE OUTPUT

In C, on success, the function returns a non-NULL pointer, and on error, NULL is returned. In Fortran, on success, the function returns PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix C.

HARDWARE INFORMATION
Information about the system hardware can be obtained by using the following low-level function:

C:
  PAPI_get_hardware_info()

Fortran:
  PAPIF_get_hardware_info(ncpu, nnodes, totalcpus, vendor, vendor_string, model, model_string, revision, mhz)

ARGUMENTS
The following arguments are implicit in the structure returned by the C function, or explicitly returned by Fortran:
  ncpu: number of CPUs in an SMP node
  nnodes: number of nodes in the entire system
  totalcpus: total number of CPUs in the entire system
  vendor: vendor ID number of CPU
  vendor_string: vendor ID string of CPU
  model: model number of CPU
  model_string: model string of CPU
  revision: revision number of CPU
  mhz: cycle time of this CPU; may be an estimate generated at init time with a quick timing routine

In C, this function returns a pointer to a structure containing information about the hardware on which the program runs, such as the number of CPUs, CPU model information, and the cycle time of the CPU. In Fortran, the values of the structure are return
42. nction returns a NULL pointer. Otherwise, a non-NULL pointer is returned. Note that this function is not implemented in Fortran.

In the following code example, PAPI_perror is used to convert error codes to error messages:

OUTPUT
Invalid argument

Notice that the above output was generated from the last call to PAPI_perror.

On success, PAPI_perror returns PAPI_OK, and on error, a non-zero error code is returned. For more information on these functions, see Appendix C.

PAPI MAILING LISTS
PAPI has the following two mailing lists for users to ask any questions about the project.

To contact the general users discussion list for PAPI software, send mail to ptools-perfapi@ptools.org. This list is a good place for newbie questions and general conversation about how to use PAPI or tools that use PAPI.

To contact the list of developers of PAPI, performance tools, and kernel patches, send mail to perfapi-devel@ptools.org. This list is intended for more technical discussions about PAPI. It is intended for developers of PAPI and other performance tools and kernel patches to share observations and insights. Interested hackers are welcome. All the CVS log messages go here.

To subscribe to either of these mailing lists, send a message with a blank subject to majordomo@ptools.org. In the body of the message, include
43. ng code example, PAPI_set_multiplex is used to convert a standard event set into a multiplexed event set:

On success, both functions return PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix C. Also, for more code examples, see tests/multiplex1.c in the papi source distribution.

ISSUES OF MULTIPLEXING
The following are some issues concerning multiplexing that the PAPI user should be aware of:

• Multiplexing is not supported by all platforms; therefore, PAPI implements software multiplexing, through the use of a high-resolution interval timer, on those platforms that do not support multiplexing. For more information on which platforms support hardware or software multiplexing, see Appendix G.

• Multiplexing unavoidably incurs a small amount of overhead and can adversely affect the accuracy of reported counter values. In other words, the more events that are multiplexed, the more likely it is that the results will be incorrect. The granularity of the measured regions must be increased in order to get acceptable results.

• To prevent naive use of multiplexing by the novice user, the high-level API can only access those events countable simultaneously by the underlying hardware, unless a low-level function has been called to explicitly enable multiplexing.

USING PAPI WITH PARALLEL PROGRAMS

THREADS

WHAT ARE T
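The accuracy concern raised in the list above follows from how time-shared counts are usually extrapolated. The sketch below shows plain linear scaling of a count that was only observed during part of the run. It is an illustration of the general timesharing idea, not PAPI's internal algorithm, and `mpx_estimate` is a hypothetical helper name.

```c
#include <assert.h>

/* Estimate the full-run count of an event that was only counted during
 * active_usec out of total_usec (hypothetical helper, linear extrapolation).
 * The estimate is exact only if the event rate was uniform over the run. */
static long long mpx_estimate(long long raw_count,
                              long long active_usec,
                              long long total_usec)
{
    assert(active_usec > 0 && active_usec <= total_usec);
    return (long long)((double)raw_count * (double)total_usec
                       / (double)active_usec);
}
```

For example, an event counted 1000 times while active for a quarter of the run is estimated at 4000. The more events share the hardware, the smaller each active fraction becomes and the more the estimate depends on the uniformity assumption, which is why longer measured regions give better results.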
44. ode example of using PAPI_start to start the counting of events in an event set, PAPI_read to read the counters of the same event set into the array values, and PAPI_stop to stop the counting of events in the event set:

On success, these functions return PAPI_OK, and on error, a non-zero error code is returned. For more information on these functions, see Appendix C.

RESETTING EVENTS IN AN EVENT SET
The hardware event counts in an event set can be reset to zero by calling the following low-level function:

C:
  PAPI_reset(EventSet)

Fortran:
  PAPIF_reset(EventSet, check)

ARGUMENT
  EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset

Note that the event set must be running or stopped in order to call PAPI_reset. For example, the EventSet in the code example of the previous section could have been reset to zero by adding the following lines:

  if (PAPI_reset(EventSet) != PAPI_OK)
      handle_error(1);

On success, this function returns PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix C.

REMOVING EVENTS IN AN EVENT SET
A hardware event and an array of hardware events can be removed from an event set by calling the following low-level functions, respectively:

C:
  PAPI_rem_event(EventSet, EventCode)
  PAPI_rem_events(EventSet, EventCode, number)

Fortran:
  PAPIF rem e
45. of floating point instructions completed. This may or may not translate to your definition of floating point operations. The measured rate is thus Mflops/s and will, in some circumstances, count FMA instructions as one operation. Consult the hardware documentation for your system for more details.

PAPI_flops may be called by the user's application program and contains calls to the following functions: PAPI_perror, PAPI_library_init, PAPI_get_hardware_info, PAPI_create_eventset, PAPI_add_event, PAPI_start, PAPI_get_real_usec, PAPI_accum, and PAPI_shutdown.

On success, it returns PAPI_OK, and on error, a non-zero error code is returned. For more information on this function, see Appendix B. Also, for a code example, see test/flops.c in the papi source distribution.

LOW LEVEL API

WHAT IS A LOW LEVEL API?
The low-level API (Application Programming Interface) manages hardware events in user-defined groups called Event Sets. It is meant for experienced application programmers and tool developers wanting more fine-grained measurements. Unlike the high-level interface, it is thread safe and allows both PAPI preset and native events. Other features of the low-level API are the ability to obtain information about the executable and the hardware, as well as to set options for multiplexing and overflow handling. Some of the benefits of using the low-level API rather than the high-level API are that it increases efficiency and functionality. It should al
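The rate reported as Mflops/s is simply floating point instructions per microsecond, since both the factor of one million for "mega" and the factor of one million microseconds per second cancel. A minimal sketch of that arithmetic follows. The function name `mflops_rate` is hypothetical, and how the library itself computes the value is not shown in this guide.

```c
#include <assert.h>

/* Floating point instructions per microsecond equals millions of
 * floating point instructions per second (hypothetical helper). */
static double mflops_rate(long long flpins, long long elapsed_usec)
{
    assert(flpins >= 0);
    if (elapsed_usec <= 0)
        return 0.0;   /* no elapsed time yet, e.g. on a first call */
    return (double)flpins / (double)elapsed_usec;
}
```

With 2,000,000 instructions completed in 1,000,000 microseconds, the sketch yields a rate of 2.0. Whether one fused multiply-add counts as one or two operations is a property of the hardware event, not of this arithmetic.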
46. ontained in the executable. GNU prof, in conjunction with the -p option of the GCC compiler, performs exactly this analysis using the process time as the overflow trigger. PAPI aims to generalize this functionality so that a histogram can be generated using any countable event as the basis for analysis.

GENERATING A PC HISTOGRAM
A PC histogram can be generated on any countable event by calling the following low-level functions:

C:
  PAPI_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags)
  PAPI_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags)

Fortran:
  PAPIF_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags, check)

ARGUMENTS
  buf: pointer to profile buffer array
  bufsiz: number of entries in buf
  offset: starting value of lowest memory address to profile
  scale: scaling factor for bin values
  EventSet: the PAPI EventSet to profile when it is started
  EventCode: code of the Event in the EventSet to profile
  threshold: threshold value for the Event; triggers the handler
  flags: bit pattern to control profiling behavior

The defined bit values for the flags variable are shown in the table below:

Defined bit             Description
PAPI_PROFIL_POSIX       Default type of profiling
PAPI_PROFIL_RANDOM      Drop a random 25% of the samples
PAPI_PROFIL_WEIGHTED    Weight the samples by their value
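How buf, offset, and scale interact can be sketched by computing which histogram bin a sampled program counter falls into. The sketch assumes the conventional POSIX profil() mapping, in which a scale of 65536 maps every two bytes of text to one bin. Whether PAPI uses exactly this mapping on every platform is an assumption here, and `profil_record` is a hypothetical helper, not a PAPI function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Increment the histogram bin that pc maps to, assuming POSIX profil()
 * style scaling. Returns 1 if a bin was incremented, 0 if pc lies
 * outside the profiled range. */
static int profil_record(unsigned short *buf, size_t nbins,
                         uintptr_t offset, unsigned int scale, uintptr_t pc)
{
    assert(buf != 0);
    if (pc < offset)
        return 0;
    unsigned long long i = (unsigned long long)(pc - offset) / 2;
    i = i * scale / 65536;          /* scale 65536 means one bin per 2 bytes */
    if (i >= nbins)
        return 0;
    buf[i]++;
    return 1;
}
```

Halving the scale doubles the address range covered by each bin, which trades resolution for a smaller buffer.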
47. or(1);

  memset(profbuf, 0x00, length * sizeof(unsigned short));

  if (PAPI_create_eventset(&EventSet) != PAPI_OK)
      handle_error(retval);

  /* Add Total FP Instructions Executed to our EventSet */
  if (PAPI_add_event(&EventSet, PAPI_FP_INS) != PAPI_OK)
      handle_error(retval);

  if (PAPI_profil(profbuf, length, start, 65536, EventSet, PAPI_FP_INS, 1000000, PAPI_PROFIL_POSIX) != PAPI_OK)
      handle_error(1);

  /* Start counting */
  if (PAPI_start(EventSet) != PAPI_OK)
      handle_error(1);

On success, these functions return PAPI_OK, and on error, a non-zero error code is returned. For more information on these functions, see Appendix C, and for more code examples, see profile.c and sprofile.c in the PAPI source distribution.

PAPI ERROR HANDLING

ERROR CODES
All of the functions contained in the PAPI library return standardized error codes, in which values greater than or equal to zero indicate success and values less than zero indicate failure, as shown in the table below:

VALUE   SYMBOL         DEFINITION
 0      PAPI_OK        No error
-1      PAPI_EINVAL    Invalid argument
-2      PAPI_ENOMEM    Insufficient memory
-3      PAPI_ESYS      A system or C library call failed, please check errno
-4      PAPI_ESBSTR    Substrate returned an error, usually the result of an unimplemented feature
-5      PA
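The sign convention in the table above lends itself to a small lookup helper. The sketch below covers only the codes visible in this excerpt and uses hypothetical `DEMO_` names so as not to suggest these are the library's own definitions. In a real program the values would come from papi.h and the message from PAPI_perror.

```c
#include <assert.h>

/* Hypothetical mirror of the codes listed in the table (not from papi.h). */
enum {
    DEMO_OK = 0, DEMO_EINVAL = -1, DEMO_ENOMEM = -2,
    DEMO_ESYS = -3, DEMO_ESBSTR = -4
};

/* Map a return code to a short message. Values >= 0 mean success. */
static const char *demo_strerror(int code)
{
    static const char *const msgs[] = {
        "No error",
        "Invalid argument",
        "Insufficient memory",
        "A system or C library call failed",
        "Substrate returned an error"
    };
    const int n = (int)(sizeof msgs / sizeof msgs[0]);
    assert(n == -DEMO_ESBSTR + 1);   /* table must cover every listed code */
    if (code >= 0)
        return msgs[0];
    if (code <= -n)
        return "Unknown error code";
    return msgs[-code];
}
```

Negating the code to index the table works only because the listed codes are consecutive negative integers, so any code beyond the known range is reported as unknown rather than guessed.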
48. or patches to be installed first. The general installation steps are below, but first find your particular operating system's section of the papi INSTALL file for current information on any additional steps that may be necessary.

General Installation

1. Pick the appropriate Makefile.<arch> for your system in the papi source distribution, edit it (if necessary), and compile:

   make -f Makefile.<arch>

2. Check for errors. Look for libpapi.a and libpapi.so in the current directory.

3. Optionally, run the test programs in the ftests and tests directories. Not all tests will succeed on all platforms.

   ./run_tests.sh

   This will run the tests in quiet mode, which will print PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the functionality being tested is not supported by that platform.

4. Create a PAPI binary distribution or install PAPI directly.

   To directly install PAPI from the build tree:

   make -f Makefile.<arch> DESTDIR=<install-dir> install

   Please use an absolute pathname for <install-dir>, not a relative pathname.

   To create a binary kit, papi-<arch>.tgz:

   make -f Makefile.<arch> dist

C AND FORTRAN CALLING INTERFACES
PAPI is written in C. The function calls in the C interface are defined in the header file papi.h and consist of the following form:

  <returned data type> PAPI_function_name(arg1, arg2, ...)

The function calls in th
49. patch included
AMD Athlon; Linux 2.2/2.4; Mikael Pettersson's Perfctr kernel patch for Linux (on web site)
Cray SV1, SV2 & T3E
IBM POWER3, 604 & 604e; AIX 4.3.3; Pmtoolkit from IBM alphaWorks (more information on web site)
IBM POWER4; AIX 5.1; bos.pmapi must be installed
POWER3, 604 & 604e through Pentium II Linux on web site through Pentium III
MIPS R10K & R12K
UltraSparc I, II & III; Solaris 2.8 or newer
More information about the various supported platforms can be found at http://icl.cs.utk.edu/projects/papi/links
APPENDIX E: TABLE OF NATIVE ENCODING FOR THE VARIOUS PLATFORMS
PLATFORM / NATIVE ENCODING
Alpha EV6 & EV67: ((register code & 0xffffff) << 8) | (register number & 0xff)
AMD Athlon: (event_code << 8) | hw_counter_num
    event_code: 16-bit event selector code and unit mask
    hw_counter_num: event register number, 0 through 1
Cray SV1, SV2 & T3E: fields mask & 0x7, sel2 & 0xf, sel1 & 0xf, sel0 & 0x1. The mask indicates which of the three counters you want counted. If more than one bit is set in the mask, the counters are summed into a single event when read. The mask must be non-zero.
IBM POWER4, POWER3, 604, 604e: the low 8 bits indicate the counter number (0-7); bits 8-16 indicate the event number (0-94).
Intel/HP Itanium: ((register code & 0xffffff) << 8) | (register number & 0xff)
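As a rough illustration of the register-code style of native encoding listed in Appendix E, the sketch below packs and unpacks an event code. The function names are hypothetical and not part of PAPI, and the sketch assumes the two fields are combined with a bitwise OR, since the joining operator is not legible in every copy of the table. The README for your platform remains the authority.

```c
#include <stdint.h>

/* Pack a 24-bit register code and an 8-bit register number
 * (the Alpha / Itanium style row of the table). */
static uint32_t demo_encode_register(uint32_t register_code,
                                     uint32_t register_number)
{
    return ((register_code & 0xffffffu) << 8) | (register_number & 0xffu);
}

/* Pack an Athlon-style event: selector code above the counter number. */
static uint32_t demo_encode_athlon(uint32_t event_code,
                                   uint32_t hw_counter_num)
{
    return (event_code << 8) | hw_counter_num;
}

/* Recover the low 8 bits (register or counter number). */
static uint32_t demo_counter_of(uint32_t native)
{
    return native & 0xffu;
}

/* Recover the code stored above the low 8 bits. */
static uint32_t demo_code_of(uint32_t native)
{
    return native >> 8;
}
```

For example, demo_encode_register(0x41, 1) packs code 0x41 for register 1, and demo_code_of and demo_counter_of split the result back into its two fields.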
50. re events. Note that these functions can be implemented in both C and Fortran. For a list and job description of all the high level functions, see Appendix B. For a code example of using the high level interface, see Simple Code Examples: High Level API, or tests/high-level.c in the PAPI source distribution.
INITIALIZATION OF A HIGH LEVEL API
The PAPI library can be initialized implicitly by calling one of the following three high level functions:
C:
    PAPI_num_counters()
    PAPI_start_counters(events, array_length)
    PAPI_flops(real_time, proc_time, flpins, mflops)
Fortran:
    PAPIF_num_counters(check)
    PAPIF_start_counters(events, array_length, check)
    PAPIF_flops(real_time, proc_time, flpins, mflops, check)
ARGUMENTS
events: an array of codes for events such as PAPI_INT_INS, or a native event code
array_length: the number of items in the events array
real_time: the total real time since the first PAPI_flops call
proc_time: the total process time since the first PAPI_flops call
flpins: the total floating point instructions since the first PAPI_flops call
mflops: Mflop/s achieved since the latest PAPI_flops call
Note that one of the above functions must be called before calling any other PAPI function. PAPI_num_counters returns the optimal length of the values array for high level functions. This value corresponds to the number of hardware counters supporte
51. rtran: NOT IMPLEMENTED
ARGUMENT
context: a platform-dependent structure containing information about the overflow event. Typically, the signal handler returns this structure automatically.
This function returns the instruction pointer where an overflow occurred, and it is often used as part of the overflow handler routine. PAPI_get_overflow_address always returns the value at the offset in the context structure where the instruction pointer should be. No validity testing of this structure is done. If an invalid context pointer is passed to this function, the results are undefined. For more information on this function, see Appendix C. For code examples, see the above section as well as tests/overflow.c and tests/overflow_pthreads.c in the papi source distribution.
STATISTICAL PROFILING
WHAT IS STATISTICAL PROFILING?
Statistical profiling is built upon the method of installing and emulating arbitrary callbacks on overflow. Profiling works as follows: when an event exceeds a threshold, the signal SIGPROF is delivered with a number of arguments. Among those arguments are the interrupted thread's stack pointer and register set. The register set contains the program counter, that is, the address at which the process was interrupted when the signal was delivered. Performance tools like UNIX prof extract this address and hash the value into a histogram. At program completion, the histogram is analyzed and associated with symbolic information c
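The step where a tool hashes the interrupted address into a histogram can be sketched as follows. This is a minimal illustration, not PAPI's implementation: the function names are hypothetical, and the bucket mapping assumes the POSIX profil convention in which a scale of 65536 maps every two bytes of program text to one unsigned short bucket.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a program-counter value to a bucket index, or return -1 when the
 * address falls outside the profiled range. Assumed convention: with
 * scale 65536, each bucket covers 2 bytes of text starting at `start`. */
static long demo_bucket(uintptr_t pc, uintptr_t start,
                        unsigned scale, size_t nbuckets)
{
    uint64_t idx;

    if (pc < start)
        return -1;
    idx = ((uint64_t)(pc - start) / 2) * scale / 65536;
    if (idx >= nbuckets)
        return -1;
    return (long)idx;
}

/* Record one sample, saturating instead of wrapping the 16-bit count. */
static void demo_record(unsigned short *hist, size_t nbuckets,
                        uintptr_t start, unsigned scale, uintptr_t pc)
{
    long b = demo_bucket(pc, start, scale, nbuckets);

    if (b >= 0 && hist[b] < 0xffff)
        hist[b]++;
}
```

An overflow handler would call demo_record once per signal with the address taken from the context structure; at program exit the buckets with the highest counts point at the hottest code.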
52. s
PAPI_L3_ICA   L3 instruction cache accesses
PAPI_L1_ICR   L1 instruction cache reads
PAPI_L2_ICR   L2 instruction cache reads
PAPI_L3_ICR   L3 instruction cache reads
PAPI_L1_ICW   L1 instruction cache writes
PAPI_L2_ICW   L2 instruction cache writes
PAPI_L3_ICW   L3 instruction cache writes
PAPI_L1_TCH   L1 total cache hits
PAPI_L2_TCH   L2 total cache hits
PAPI_L3_TCH   L3 total cache hits
PAPI_L1_TCA   L1 total cache accesses
PAPI_L2_TCA   L2 total cache accesses
PAPI_L3_TCA   L3 total cache accesses
PAPI_L1_TCR   L1 total cache reads
PAPI_L2_TCR   L2 total cache reads
PAPI_L3_TCR   L3 total cache reads
PAPI_L1_TCW   L1 total cache writes
PAPI_L2_TCW   L2 total cache writes
PAPI_L3_TCW   L3 total cache writes
PAPI_FML_INS  Floating Multiply instructions
PAPI_FAD_INS  Floating Add instructions
PAPI_FDV_INS  Floating Divide instructions
PAPI_FSQ_INS  Floating Square Root instructions
PAPI_FNV_INS  Floating Inverse instructions
[Availability columns: INTEL/HP ITANIUM, AMD ATHLON K7, INTEL PENTIUM III, IBM POWER3, MIPS R12K, ULTRA SPARC I; per-row marks not legible]
APPENDIX B: HIGH LEVEL API
The simple interface implemented by the six routines of the High Level PAPI API allows the user to access and count specific hardware events. It should be noted that this API could be used in conjunction with the low level API. However, the high level API by itsel
53. so be noted that the low level interface could be used in conjunction with the high level interface, but the user would have to be careful about initialization and threads. The low level API is only as powerful as the substrate upon which it is built. Thus, some features may not be available on every platform. The converse may also be true: more advanced features may be available on every platform and defined in the header file. Therefore, the user is encouraged to read the documentation for each platform carefully. There are approximately 40 functions that represent the low level API, and some of these functions are implemented only in C or only in Fortran. For more information on these functions and their job descriptions, see Appendix C. For a code example of using the low level interface, see Simple Code Examples: Low Level API, or tests/low_level.c in the PAPI source distribution.
INITIALIZATION OF A LOW LEVEL API
The PAPI library can be initialized explicitly by calling the following low level function:
C:
    PAPI_library_init(version)
Fortran:
    PAPIF_library_init(check)
ARGUMENT
version: upon initialization, PAPI checks the argument against the internal value of PAPI_VER_CURRENT recorded when the library was compiled. This guards against portability problems when updating the PAPI shared libraries on your system.
Note that this function must be called before calling any other PAPI function.
54. struction caused a miss
PAPI_L3_DCH   Level 3 Data Cache Hit
PAPI_TLB_SD   Translation lookaside buffer shootdowns (SMP)
PAPI_CSR_FAL  Failed store conditional instructions
PAPI_CSR_SUC  Successful store conditional instructions
PAPI_CSR_TOT  Total store conditional instructions
PAPI_MEM_SCY  Cycles Stalled Waiting for Memory Access
PAPI_MEM_RCY  Cycles Stalled Waiting for Memory Read
PAPI_MEM_WCY  Cycles Stalled Waiting for Memory Write
PAPI_STL_ICY  Cycles with No Instruction Issue
PAPI_FUL_ICY  Cycles with Maximum Instruction Issue
PAPI_STL_CCY  Cycles with No Instruction Completion
PAPI_FUL_CCY  Cycles with Maximum Instruction Completion
PAPI_HW_INT   Hardware interrupts
PAPI_BR_UCN   Unconditional branch instructions executed
PAPI_BR_CN    Conditional branch instructions executed
PAPI_BR_TKN   Conditional branch instructions taken
PAPI_BR_NTK   Conditional branch instructions not taken
PAPI_BR_MSP   Conditional branch instructions mispredicted
PAPI_BR_PRC   Conditional branch instructions correctly predicted
PAPI_FMA_INS  FMA instructions completed
PAPI_TOT_IIS  Total instructions issued
PAPI_TOT_INS  Total instructions executed
[Availability columns and per-row marks not legible]
PRESET NAME / DESCRIPTION
PAPI_INT
55. uccess, this function returns PAPI_OK; on error, it returns a non-zero error code. For more information on this function, see Appendix C.
GETTING AND SETTING OPTIONS
The options of the PAPI library or a specific event set can be obtained and set by calling the following low level functions, respectively:
C:
    PAPI_get_opt(option, ptr)
    PAPI_set_opt(option, ptr)
Fortran:
    PAPIF_get_clockrate(clockrate)
    PAPIF_get_domain(EventSet, domain, mode, check)
    PAPIF_get_granularity(EventSet, granularity, mode, check)
    PAPIF_get_preload(preload, check)
ARGUMENTS
option: an input parameter describing the course of action. The Fortran calls are implementations of specific options. Possible values are defined in papi.h and briefly described below.
Predefined name / Explanation
General information requests
PAPI_GET_CLOCKRATE   Return clockrate in MHz
PAPI_GET_MAX_CPUS    Return number of CPUs
PAPI_GET_MAX_HWCTRS  Return number of counters
PAPI_GET_EXEINFO     Addresses for text/data/bss
PAPI_GET_HWINFO      Info about hardware
PAPI_GET_PRELOAD     Get LD_PRELOAD environment equivalent
Defaults for the global library
PAPI_GET_DEFDOM      Return the default counting domain for newly created event sets
PAPI_SET_DEFDOM      Set the default counting domain
PAPI_GET_DEFGRN      Return the default counting granularity
PAPI_SET_DEFGRN      Set the default counting granularity
PAPI_GET_DEBUG       Get the PAPI debug state. The a
56. vailable debug states are defined in papi.h. The debug state is available in ptr->debug.
PAPI_SET_DEBUG       Set the PAPI debug state
Multiplexing control
PAPI_GET_MULTIPLEX   Get options for multiplexing. Currently not implemented.
PAPI_SET_MULTIPLEX   Set options for multiplexing
Manipulating individual event sets
PAPI_GET_DOMAIN      Get domain for a single event set. The event set is specified in ptr->domain.eventset.
PAPI_SET_DOMAIN      Set the domain for a single event set
PAPI_GET_GRANUL      Get granularity for a single event set. The event set is specified in ptr->granularity.eventset.
PAPI_SET_GRANUL      Set the granularity for a single event set
ptr: a pointer to a structure that acts as both an input and output parameter. It is defined in papi.h and below.
EventSet: (input) a reference to an EventSetInfo structure
clockrate: (output) cycle time of this CPU in MHz; may be an estimate generated at init time with a quick timing routine
domain: (output) execution domain for which events are counted
granularity: (output) execution granularity for which events are counted
mode: (input) determines if domain or granularity are default or for the current event set
preload: (output) environment variable string for preloading libraries
PAPI_get_opt and PAPI_set_opt query or change the options of the PAPI library or a specific event set created by PAPI_create_eventset. In C interface t
57. vent(EventSet, EventCode, check)
    PAPIF_rem_events(EventSet, EventCode, number, check)
ARGUMENTS
EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset
EventCode: a defined event such as PAPI_TOT_INS, or a native event
EventCode (for PAPI_rem_events): an array of defined events
number: an integer indicating the number of events in the array EventCode
PAPI_rem_event removes a single hardware event from a PAPI event set. PAPI_rem_events does the same as PAPI_rem_event, but for an array of hardware event codes. In the following code example, PAPI_rem_event is used to remove the event PAPI_TOT_INS from an event set. On success, these functions return PAPI_OK; on error, they return a non-zero error code. For more information on these functions, see Appendix C.
EMPTYING AND DESTROYING AN EVENT SET
All the events in an event set can be emptied and destroyed by calling the following low level functions, respectively:
C:
    PAPI_cleanup_eventset(EventSet)
    PAPI_destroy_eventset(EventSet)
Fortran:
    PAPIF_cleanup_eventset(EventSet, check)
    PAPIF_destroy_eventset(EventSet, check)
ARGUMENT
EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset
Note that the event set must be empty in order to use PAPI_destroy_eventset. In the following code example, PAPI_cleanup_eventset is used to empt
58. verting an Event Set into a Multiplexed Event Set
    Issues of Multiplexing
  Using PAPI with Parallel Programs
    Threads
      What are Threads?
      Initialization of Thread Support
      Thread ID
    MPI
  Overflow
    What is an Overflow?
    Beginning Overflows in Event Sets
    Address of the Overflow
  Statistical Profiling
    What is Statistical Profiling?
    Generating a PC Histogram
X PAPI Error Handling
    Error Codes
    Converting Error Codes to Error Messages
XI PAPI Mailing Lists
XII Appendices
    Appendix A: Table of Preset Events
    Appendix B: High Level API
    Appendix C: Low Level API
    Appendix D: PAPI Supported Platforms
    Appendix E: Table of Native Encoding for the Various Platforms
    Appendix F: Table of Overhead for the Various Platforms
    Appendix G: Table for Multiplexing
    Appendix H: Table for Overflow
    Appendix I: PAPI Supported Tools
XIII Bibliography
PREFACE
INTENDED AUDIENCE
This document is intended to provide the PAPI user with a discussion of how to use the different components and functions of PAPI. The intended users are application developers and performance tool writers who need to access performance data to tune and model application performance. The user is expected to have some level of familiarity with either the C or Fortran programming language.
ORGANIZATION OF THIS DOCUMENT
II INTRODUCTION TO PAPI
This section provides an introduction to PAPI by describing the
59. y all the events from an event set, and PAPI_destroy_eventset is used to deallocate the memory associated with the empty event set. On success, these functions return PAPI_OK; on error, they return a non-zero error code. For more information on these functions, see Appendix C.
THE STATE OF AN EVENT SET
The counting state of an event set can be obtained by calling the following low level function:
C:
    PAPI_state(EventSet, status)
Fortran:
    PAPIF_state(EventSet, status, check)
ARGUMENTS
EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset
status: an integer containing a Boolean combination of one or more of the following nonzero constants, as defined in the PAPI header file papi.h:
PAPI_STOPPED       EventSet is stopped
PAPI_RUNNING       EventSet is running
PAPI_PAUSED        EventSet temporarily disabled by the library
PAPI_NOT_INIT      EventSet defined but not initialized
PAPI_OVERFLOWING   EventSet has overflow enabled
PAPI_PROFILING     EventSet has profiling enabled
PAPI_MULTIPLEXING  EventSet has multiplexing enabled
PAPI_ACCUMULATING  EventSet has accumulating enabled
In the following code example, PAPI_state is used to return the counting state of an EventSet. On s
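Because the status word is a Boolean combination of flags, a caller has to test individual bits rather than compare the whole value. The sketch below shows one way to decode such a word. It is an illustration only: the DEMO_ constants and their numeric values are hypothetical placeholders chosen for the example, and real code must use the PAPI_ constants from papi.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values for illustration; papi.h defines the real ones. */
enum {
    DEMO_STOPPED      = 0x01,
    DEMO_RUNNING      = 0x02,
    DEMO_PAUSED       = 0x04,
    DEMO_NOT_INIT     = 0x08,
    DEMO_OVERFLOWING  = 0x10,
    DEMO_PROFILING    = 0x20,
    DEMO_MULTIPLEXING = 0x40,
    DEMO_ACCUMULATING = 0x80
};

static const struct { int bit; const char *name; } demo_flags[] = {
    { DEMO_STOPPED, "STOPPED" },           { DEMO_RUNNING, "RUNNING" },
    { DEMO_PAUSED, "PAUSED" },             { DEMO_NOT_INIT, "NOT_INIT" },
    { DEMO_OVERFLOWING, "OVERFLOWING" },   { DEMO_PROFILING, "PROFILING" },
    { DEMO_MULTIPLEXING, "MULTIPLEXING" }, { DEMO_ACCUMULATING, "ACCUMULATING" }
};

/* Write the names of all set flags into out, separated by '|'.
 * Output is truncated if it does not fit in outlen bytes. */
static void demo_state_string(int status, char *out, size_t outlen)
{
    size_t i;

    if (outlen == 0)
        return;
    out[0] = '\0';
    for (i = 0; i < sizeof demo_flags / sizeof demo_flags[0]; i++) {
        if (status & demo_flags[i].bit) {
            if (out[0] != '\0')
                strncat(out, "|", outlen - strlen(out) - 1);
            strncat(out, demo_flags[i].name, outlen - strlen(out) - 1);
        }
    }
}
```

A single bit can also be tested directly, for example `if (status & DEMO_RUNNING)`, which is the usual check before reading or stopping counters.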
60. yer consists of the API (low level and high level) and machine-independent support functions. The Machine Specific Layer defines and exports a machine-independent interface to machine-dependent functions and data structures. These functions are defined in the substrate layer, which uses kernel extensions, operating system calls, or assembly language to access the hardware performance counters. PAPI uses the most efficient and flexible of the three, depending on what is available. PAPI strives to provide a uniform environment across platforms. However, this is not always possible. Where the hardware does not support features such as overflow and multiplexing, PAPI implements them in software where possible. Also, processors do not all support the same metrics, so the events you can monitor depend on the processor in use. The interface therefore remains constant, but how it is implemented can vary. Throughout this guide, implementation decisions are documented where they can make a difference to the user, such as overhead costs and sampling.
[Figure: PAPI architecture. Labels: PAPI High Level, Portable Layer, Machine Specific Layer, Kernel Extension, Operating System, Hardware Performance Counters]
On some of the systems that PAPI supports (see Appendix D), you can install PAPI right out of the box without any additional setup. Others require drivers