PAPI USER'S GUIDE
Contents
1. if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }

    if ((num = PAPI_get_opt(PAPI_MAX_HWCTRS, NULL)) < 0)
        handle_error();
    printf("This machine has %d counters.\n", num);

    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error();

    /* Set the domain of this EventSet to counter user and
       kernel modes for this process */
    memset(&options, 0x0, sizeof(options));
    options.domain.eventset = EventSet;
    options.domain.domain = PAPI_DOM_ALL;
    if (PAPI_set_opt(PAPI_DOMAIN, &options) != PAPI_OK)
        handle_error();

PAPI User's Guide Version 3.5.0

POSSIBLE OUTPUT (VARIES ON DIFFERENT PLATFORMS):

    This machine has 4 counters.

On success, these functions return PAPI_OK and on error, a non-zero error code is returned. For more code examples, see ctests/second.c or ctests/third.c in the PAPI source distribution.

SIMPLE CODE EXAMPLES

HIGH LEVEL API

The following is a simple code example of using the high-level API:

    #include <papi.h>
    #define NUM_FLOPS 10000
    #define NUM_EVENTS 1

    main()
    {
        int Events[NUM_EVENTS] = { PAPI_TOT_INS };
        long long values[NUM_EVENTS];

        /* Start counting events */
        if (PAPI_start_counters(Events, NUM_EVENTS) != PAPI_OK)
            handle_error(1);

(Footnote 1: Defined in tests/do_loops.c in the PAPI source distribution.)
2. /* Print all the native events for this platform */
        printf("Name: %s\nCode: %x\n", EventCodeStr, EventCode);
    } while (PAPI_enum_event(&EventCode, 0) == PAPI_OK);

OUTPUT:

    Name: DATA_MEM_REFS
    Code: 40000000
    Name: DCU_LINES_IN
    Code: 40000001
    Name: DCU_M_LINES_IN
    Code: 40000002
    ...
    Name: SEG_REG_RENAMES_TOT
    Code: 40000078
    Name: RET_SEG_RENAMES
    Code: 40000079

The output will vary depending on the platform. This was generated on an Intel Pentium III processor. On success, all the functions return PAPI_OK and on error, a non-zero error code is returned.

PAPI'S COUNTER INTERFACES

HIGH LEVEL API

WHAT IS THE HIGH LEVEL API?

The high-level API (Application Programming Interface) provides the ability to start, stop, and read the counters for a specified list of events. It is meant for programmers wanting simple event measurements using only PAPI preset events. The main benefits of the high-level API over the low-level API are that it is easier to use and requires less setup (fewer additional calls). This ease of use comes with somewhat higher overhead and a loss of flexibility. The high-level API can be used in conjunction with the low-level API and in fact calls the low-level API. However, the high-level API by itself is only able to access those events countable simultane
3. do_flops(NUM_FLOPS);

        /* Read the counters */
        if (PAPI_read_counters(values, NUM_EVENTS) != PAPI_OK)
            handle_error(1);
        printf("After reading the counters: %lld\n", values[0]);

        do_flops(NUM_FLOPS);

        /* Add the counters */
        if (PAPI_accum_counters(values, NUM_EVENTS) != PAPI_OK)
            handle_error(1);
        printf("After adding the counters: %lld\n", values[0]);

        do_flops(NUM_FLOPS);

        /* Stop counting events */
        if (PAPI_stop_counters(values, NUM_EVENTS) != PAPI_OK)
            handle_error(1);
        printf("After stopping the counters: %lld\n", values[0]);
    }

POSSIBLE OUTPUT:

    After reading the counters: 441027
    After adding the counters: 891959
    After stopping the counters: 443994

Notice that on the second line (after adding the counters) the value is approximately twice as large as on the first line (after reading the counters). PAPI_read_counters resets the counters and leaves them running; PAPI_accum_counters then adds the current counter value into the values array, which still holds the first reading.

LOW LEVEL API

The following is a simple code example that applies the same technique as the above example, except that it uses the low-level API:

    #include <papi.h>
    #include <stdio.h>
    #define NUM_FLOPS 10000

    main()
    {
        int retval, EventSet = PAPI_NULL;
        long long values[1];

        /* Initialize the PAPI library */
        retval = PAPI_library
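The difference between reading, accumulating, and stopping can be easier to see in a small stand-alone model. The sketch below does not call PAPI at all: `model_counter` and the `model_*` functions are hypothetical names invented for illustration, and the event counts are supplied by hand. It only mirrors the behavior described in the text (read copies and resets, accum adds and resets, stop halts and copies). The second work amount, 450932, is simply the difference between the second and first lines of the sample output (891959 - 441027).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one running hardware counter; not a PAPI type. */
typedef struct {
    long long count;   /* events counted since the last reset */
    int running;       /* nonzero while the counter is active */
} model_counter;

/* Simulate `events` occurrences of the monitored event. */
void model_work(model_counter *c, long long events)
{
    assert(c != NULL && events >= 0);
    if (c->running)
        c->count += events;
}

/* Read: copy the current count, reset to zero, keep running. */
void model_read(model_counter *c, long long *value)
{
    assert(c != NULL && value != NULL);
    *value = c->count;
    c->count = 0;
}

/* Accumulate: add the current count to *value, reset, keep running. */
void model_accum(model_counter *c, long long *value)
{
    assert(c != NULL && value != NULL);
    *value += c->count;
    c->count = 0;
}

/* Stop: halt the counter and copy (not add) the current count. */
void model_stop(model_counter *c, long long *value)
{
    assert(c != NULL && value != NULL);
    c->running = 0;
    *value = c->count;
    c->count = 0;
}
```

With a counter started as `{0, 1}`, calling `model_work` with 441027, 450932, and 443994 events, followed by `model_read`, `model_accum`, and `model_stop` respectively, leaves the same three values in the output variable as the sample output: the accumulated value is the sum of two work phases, while the stopped value covers only the last phase.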
4. = PAPI_get_executable_info()) == NULL)
            exit(1);

        printf("Start of user program is at %p\n", prginfo->text_start);
        printf("End of user program is at %p\n", prginfo->text_end);
    }

POSSIBLE OUTPUT:

    Start of user program is at 0x4000000000000f20
    End of user program is at 0x4000000000034e00

In C, on success, the function returns a non-NULL pointer and on error, NULL is returned. In Fortran, on success, the function returns PAPI_OK and on error, a non-zero error code is returned.

HARDWARE INFORMATION

Information about the system hardware can be obtained by using the following low-level function:

C:
    PAPI_get_hardware_info()

Fortran:
    PAPIF_get_hardware_info(ncpu, nnodes, totalcpus, vendor, vendor_string, model, model_string, revision, mhz)

ARGUMENTS

The following arguments are implicit in the structure returned by the C function, or explicitly returned by Fortran:

    ncpu -- number of CPUs in an SMP node
    nnodes -- number of nodes in the entire system
    totalcpus -- total number of CPUs in the entire system
    vendor -- vendor id number of the CPU
    vendor_string -- vendor id string of the CPU
    model -- model number of the CPU
    model_string -- model string of the CPU
    revision -- revision number of the CPU
    mhz -- cycle time of this CPU; may be an estimate generated at initialization time with a quick timing routine

In C, this function returns a pointer to a structure containing information a
5. PAPI_get_executable_info()

Fortran:
    PAPIF_get_exe_info(fullname, name, text_start, text_end, data_start, data_end, bss_start, bss_end, lib_preload_env, check)

ARGUMENTS

The following arguments are implicit in the structure returned by the C function, or explicitly returned by Fortran:

    fullname -- fully qualified path and filename of the executable
    name -- filename of the executable with no path information
    text_start, text_end -- start and end addresses of the program text segment
    data_start, data_end -- start and end addresses of the program data segment
    bss_start, bss_end -- start and end addresses of the program bss segment
    lib_preload_env -- environment variable for preloading libraries

Note that text_start and text_end are the only fields that are filled on every architecture.

In C, this function returns a pointer to a structure containing information about the current program, such as the start and end addresses of the text, data, and bss segments. In Fortran, the fields of the structure are returned explicitly.

In the following code example, PAPI_get_executable_info is used to acquire the start and end addresses of the program's text segment:

    #include <papi.h>
    #include <stdio.h>

    main()
    {
        const PAPI_exe_info_t *prginfo = NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);

        if ((prginfo
6. PAPIF_stop_counters(values, array_length, check)

ARGUMENTS

    values -- an array in which to store the counter values
    array_length -- the number of items in the values array

PAPI_read_counters, PAPI_accum_counters, and PAPI_stop_counters all capture the values of the currently running counters into the array values. Each of these functions behaves somewhat differently:

PAPI_read_counters copies the current counts into the elements of the values array, resets the counters to zero, and leaves the counters running.

PAPI_accum_counters adds the current counts into the elements of the values array and resets the counters to zero, leaving the counters running. Take care not to mix calls to PAPI_accum_counters with calls to the execution rate functions; such intermixing is likely to produce unexpected results.

PAPI_stop_counters stops the counters and copies the current counts into the elements of the values array. This call can also be used to reset the rate functions if it is called with a NULL pointer for the values array.

In the following code example, PAPI_read_counters and PAPI_stop_counters are used to copy the event counts into an array and to stop the counters, respectively:

    #include <papi.h>
    #define NUM_EVENTS 2

    main()
    {
        int Events[NUM_EVENTS] = { PAPI_TOT_INS, PAPI_TOT_CYC };
        long long values[NUM_EVENTS];

        /* Start counting events */
        if (PAP
7. POSIX Threads Programming: http://www.llnl.gov/computing/tutorials/workshops/workshop/pthreads/MAIN.htm
8. vent(EventSet, PAPI_FP_INS) != PAPI_OK)
            handle_error(retval);

        if (PAPI_profil(profbuf, length, start, 65536, EventSet, PAPI_FP_INS, 1000000,
                        PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16) != PAPI_OK)
            handle_error(1);

        /* Start counting */
        if (PAPI_start(EventSet) != PAPI_OK)
            handle_error(1);
    }

DATA AND INSTRUCTION ADDRESS RESTRICTION

Introduction

Performance instrumentation of data structures, as opposed to code segments, is a feature not widely supported across a range of platforms. One platform on which this feature is supported is the Itanium2. In fact, event counting on Itanium2 can be qualified by a number of conditioners, including instruction address, opcode matching, and data address. We have implemented a generalized PAPI interface for data structure and instruction range performance instrumentation, also referred to as data and instruction range specification, and applied that interface to the specific instance of the Itanium2 platform to demonstrate its viability. This feature is being introduced for the first time in the PAPI 3.5 release.

The PAPI Interface

Since PAPI is a platform-independent library, care must be taken when extending its feature set so as not to disrupt the existing interface or to clutter the API with calls to functionality that is not available on a large subset of the supported platforms. To that end, we elected to extend an existing PAPI call, PAPI_set_opt, with the cap
9. POSSIBLE OUTPUT (AFTER ENTERING 50, 75, AND 100 AS INPUT):

    Enter the number of intervals: (0 quits) 50
    pi is approximately 3.1416259869230028, Error is 0.0000333333332097
    Enter the number of intervals: (0 quits) 75
    pi is approximately 3.1416074684045965, Error is 0.0000148148148034
    Enter the number of intervals: (0 quits) 100
    pi is approximately 3.1416009869231254, Error is 0.0000083333333323
    Enter the number of intervals: (0 quits) 0
    After reading counters: 117393
    After stopping counters: 122921

OVERFLOW

WHAT IS AN OVERFLOW?

An overflow happens when the number of occurrences of a particular hardware event exceeds a specified threshold. PAPI provides the ability to call user-defined handlers when an overflow occurs. This can be done in hardware, if the processor generates an interrupt signal when the counter reaches a specified value, or in software, by setting up a high-resolution interval timer and installing a timer interrupt handler.

For software-based overflow, PAPI compares the current counter value against the threshold every time the timer interrupt occurs. If the current value exceeds the threshold, the user's handler is called from within the signal context with some additional arguments. These arguments allow the user to determine which event overflowed, by how much it overflowed, and at what location in the source code the overflow occurred.
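The software-based check described above amounts to a comparison made on every timer tick. The following is a minimal stand-alone sketch of that idea, not PAPI's implementation: the `model_overflow` structure, the handler type, and both functions are hypothetical names introduced here, and the real PAPI handler receives different arguments (including the address at which the overflow occurred). The sketch also assumes that the comparison base is moved forward after each dispatch so that the handler fires again after a further threshold's worth of events.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type for this sketch: receives the amount by which
   the threshold was exceeded plus a user context pointer. */
typedef void (*model_overflow_handler)(long long excess, void *context);

typedef struct {
    long long threshold;             /* events allowed between dispatches */
    long long base;                  /* counter value at the last dispatch */
    model_overflow_handler handler;  /* user-defined overflow handler */
    void *context;                   /* passed through to the handler */
} model_overflow;

/* Example handler: records the excess in a long long supplied as context. */
void model_record_excess(long long excess, void *context)
{
    *(long long *)context = excess;
}

/* Called once per (simulated) timer interrupt with the current counter
   value. Returns 1 if the handler was dispatched, 0 otherwise. */
int model_timer_tick(model_overflow *o, long long current)
{
    assert(o != NULL && o->threshold > 0);
    if (current - o->base <= o->threshold)
        return 0;                    /* threshold not exceeded yet */
    if (o->handler != NULL)
        o->handler(current - o->base - o->threshold, o->context);
    o->base = current;               /* start counting toward the next one */
    return 1;
}
```

Because the check runs only when the timer fires, the handler is invoked some time after the threshold is actually crossed; the `excess` argument in this model corresponds to the "by how much it overflowed" information mentioned in the text.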
10. Using the same mechanism as for user-programmable overflow, PAPI also guards against register precision overflow of counter values. Each counter can potentially be incremented multiple times in a single clock cycle. This fact, combined with increasing clock speeds and the small dynamic range of some of the physical counters, means that an overflow is likely to occur on platforms where 64-bit counters are not supported in hardware or by the operating system. In those cases, PAPI implements 64-bit counters in software using the same mechanism that handles overflow dispatch.

BEGINNING OVERFLOWS IN EVENT SETS

An event set can begin registering overflows by calling the following low-level function:

C:
    PAPI_overflow(EventSet, EventCode, threshold, flags, handler)

ARGUMENTS

    EventSet -- a reference to the event set to use
    EventCode -- the event to be used for overflow detection
    threshold -- the overflow threshold value to use
    flags -- bit map that controls the overflow mode of operation. The only currently valid setting is PAPI_OVERFLOW_FORCE_SW, which overrides the default hardware overflow setting on a platform that supports hardware overflow.
    handler -- the handler function to call upon overflow

This function marks a specific EventCode in an EventSet to generate an overflow signal after every threshold events are counted. Multiple events within an event set can be program
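To illustrate why a narrow hardware counter needs software help, the sketch below extends a 32-bit counter to 64 bits by sampling it periodically and adding the wrapped difference to a running total. This is a common general technique and only an assumption about how such an extension can work; the guide does not show PAPI's internal code, and `model_wide_counter` and `model_wide_update` are hypothetical names. The sketch is only correct if the hardware counter wraps at most once between two samples, which is why sampling has to be frequent enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software-extended counter; not a PAPI type. */
typedef struct {
    uint32_t last_raw;   /* hardware value seen at the previous sample */
    uint64_t total;      /* software-maintained 64-bit count */
} model_wide_counter;

/* Fold a new 32-bit hardware reading into the 64-bit total.
   Unsigned subtraction wraps modulo 2^32, so the difference is correct
   even when the hardware counter has wrapped once since the last sample. */
uint64_t model_wide_update(model_wide_counter *w, uint32_t raw)
{
    uint32_t delta;

    assert(w != NULL);
    delta = raw - w->last_raw;
    w->total += delta;
    w->last_raw = raw;
    return w->total;
}
```

For example, if one sample reads 4000000000 and the next reads 500, the hardware counter has wrapped; the modular difference is 294967796, so the 64-bit total becomes 4294967796 rather than going backwards.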
11. the definition is platform dependent.

PAPI_query_event asks the PAPI library if the preset or native event can be counted on this architecture. If the event CAN be counted, the function returns PAPI_OK. If the event CANNOT be counted, the function returns an error code.

PAPI_get_event_info asks the PAPI library for a copy of an event descriptor. This descriptor can then be used to investigate the details about the event. In Fortran, the individual fields in the descriptor are returned as parameters.

PAPI_enum_event asks the PAPI library to return an event code for the next sequential event, based on the current event code and the modifier. This function can be used to enumerate all preset or native events on any platform. See util/papi_avail.c or util/papi_native_avail.c for details.

EXAMPLE

    #include <papi.h>
    #include <stdio.h>

    main()
    {
        int EventSet = PAPI_NULL;
        unsigned int native = 0x0;
        int retval, i;
        PAPI_preset_info_t info;
        PAPI_preset_info_t *infostructs;

        /* Initialize the library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI library init error!\n");
            exit(1);
        }

        /* Check to see if the preset, PAPI_TOT_INS, exists */
        if (PAPI_query_event(PAPI_TOT_INS) != PAPI_OK) {
            fprintf(stderr, "No instruction counter? How lame.\n");
            exit(1);
        }

        /* Get details about the preset
12. the events named in the events array. This function implicitly stops and initializes any counters running as a result of a previous call to PAPI_start_counters. It is the user's responsibility to choose events that can be counted simultaneously, by reading the vendor's documentation. The value of array_length should be no larger than the value returned by PAPI_num_counters.

In the following code example, PAPI_num_counters is used to initialize the library and to get the number of hardware counters available on the system. PAPI_start_counters is then used to start counting events:

    #include <papi.h>

    main()
    {
        int Events[2] = { PAPI_TOT_CYC, PAPI_TOT_INS };
        int num_hwcntrs = 0;

        /* Initialize the PAPI library and get the number of counters available */
        if ((num_hwcntrs = PAPI_num_counters()) <= PAPI_OK)
            handle_error(1);

        printf("This system has %d available counters.", num_hwcntrs);

        if (num_hwcntrs > 2)
            num_hwcntrs = 2;

        /* Start counting events */
        if (PAPI_start_counters(Events, num_hwcntrs) != PAPI_OK)
            handle_error(1);
    }

POSSIBLE OUTPUT (varies on different systems):

    This system has 4 available counters.

On success, PAPI_num_counters returns the number of hardware counters available on the system and on error, a non-zero error code is returned. Optionally, the PAPI library can be initialized explicitly by using PAPI_library_init. This can be useful
13. Due to hardware implementation differences, it is not necessarily feasible to directly compare the counts of a particular PAPI preset event obtained on different hardware platforms.

EVENT QUERY

The following low-level functions can be called to query the existence of a preset or native event (in other words, whether the hardware supports that event) and to get details about that event:

C:
    PAPI_query_event(EventCode)
    PAPI_get_event_info(EventCode, &info)
    PAPI_enum_event(&EventCode, modifier)

Fortran:
    PAPIF_query_event(EventCode, check)
    PAPIF_get_event_info(EventCode, symbol, longDescr, shortDescr, count, note, flags, check)
    PAPIF_enum_event(EventCode, modifier, check)

ARGUMENTS

    EventCode -- a defined event, such as PAPI_TOT_INS
    symbol -- the event symbol, or name, such as the preset name PAPI_BR_CN
    longDescr -- a descriptive string for the event, of length less than PAPI_MAX_STR_LEN
    shortDescr -- a short descriptive string for the event, of length less than 18 characters
    count -- zero if the event CANNOT be counted
    note -- additional text information about an event, if available
    flags -- provides additional information about an event, e.g., PAPI_DERIVED for an event derived from 2 or more other events
    modifier -- modifies the search criteria; for preset events, returns all events or only available events; for native events
14. PAPI_TOT_INS */
        if (PAPI_get_event_info(PAPI_TOT_INS, &info) != PAPI_OK) {
            fprintf(stderr, "No instruction counter? How lame.\n");
            exit(1);
        }

        if (info.count > 0)
            printf("This event is available on this hardware.\n");

        if (info.flags & PAPI_DERIVED)
            printf("This event is a derived event on this hardware.\n");

        /* Count the number of available preset events between
           PAPI_TOT_INS and the end of the preset list */
        retval = 0;
        i = PAPI_TOT_INS;
        while (PAPI_enum_event(&i, TRUE) == PAPI_OK)
            retval++;
    }

OUTPUT (if PAPI_TOT_INS is available on your system):

    This event is available on this hardware.

In the above code example, PAPI_query_event is used to see if a preset (PAPI_TOT_INS) exists, PAPI_get_event_info is used to query details about the event, and PAPI_enum_event is used to count the number of events in the preset list after this preset. On success, all three of these functions return PAPI_OK and on error, a non-zero error code is returned.

EVENT TRANSLATION

A preset or native event can be referenced by name or by event code. Most PAPI functions require an event code, while most user input and output is in terms of names. Two low-level functions are provided to translate between these formats:

C:
    PAPI_event_name_to_code(EventName, EventCode)
    PAPI_event_code_to_name(EventCode, EventName)

Fortran:
    PAPIF
15. PAPI_thread_id are used to initialize thread support in the PAPI library and to acquire the identifier of the current thread, respectively, with Pthreads:

    #include <papi.h>
    #include <pthread.h>

    main()
    {
        unsigned long int tid;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);

        if (PAPI_thread_init(pthread_self) != PAPI_OK)
            exit(1);

        if ((tid = PAPI_thread_id()) == (unsigned long int)-1)
            exit(1);

        printf("Initial thread id is: %lu\n", tid);
    }

OUTPUT:

    Initial thread id is: 0

On success, this function returns a valid thread identifier and on error, (unsigned long int)-1 is returned.

Four more utility functions related to threads are available in PAPI. These functions allow you to register a newly created thread to make it available for reference by PAPI, to remove a registered thread in cases where thread ids may be reused by the system, and to create and access thread-specific storage in a platform-independent fashion for use with PAPI. These functions are shown below:

C:
    PAPI_register_thread()
    PAPI_unregister_thread()
    PAPI_get_thr_specific(tag, ptr)
    PAPI_set_thr_specific(tag, ptr)

ARGUMENTS

    tag -- integer value specifying one of 4 storage locations
    ptr -- pointer to the address of a data structure

For more code examples of using Pthreads and OpenMP with PAPI, see cte
16. as 1/Hz as defined by the operating system on some platforms.

In the following code example, PAPI_get_virt_cyc and PAPI_get_virt_usec are used to obtain the virtual time it takes to create an event set, in clock cycles and microseconds respectively:

    #include <papi.h>

    main()
    {
        long long start_cycles, end_cycles, start_usec, end_usec;
        int EventSet = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);

        /* Gets the starting time in clock cycles */
        start_cycles = PAPI_get_virt_cyc();

        /* Gets the starting time in microseconds */
        start_usec = PAPI_get_virt_usec();

        /* Create an EventSet */
        if (PAPI_create_eventset(&EventSet) != PAPI_OK)
            exit(1);

        /* Gets the ending time in clock cycles */
        end_cycles = PAPI_get_virt_cyc();

        /* Gets the ending time in microseconds */
        end_usec = PAPI_get_virt_usec();

        printf("Virtual clock cycles: %lld\n", end_cycles - start_cycles);
        printf("Virtual clock time in microseconds: %lld\n", end_usec - start_usec);
    }

POSSIBLE OUTPUT:

    Virtual clock cycles: 715408
    Virtual clock time in microseconds: 976

PAPI SYSTEM INFORMATION

EXECUTABLE INFORMATION

Information about the executable's address space can be obtained by using the following low-level function:

C:
17. calling the following low-level functions:

C:
    PAPI_perror(code, destination, length)
    PAPI_strerror(code)

Fortran:
    PAPIF_perror(code, destination, check)

ARGUMENTS

    code -- the error code to interpret
    destination -- the error message in quotes
    length -- either 0 or strlen(destination)

PAPI_perror fills the string destination with the error message corresponding to the error code code. The function copies length worth of the error description string corresponding to code into destination. The resulting string is always null terminated. If length is 0, then the string is printed to stderr.

PAPI_strerror returns a pointer to the error message corresponding to the error code code. If the call fails, the function returns a NULL pointer; otherwise, a non-NULL pointer is returned. Note that this function is not implemented in Fortran.

In the following code example, PAPI_perror is used to convert error codes to error messages:

    #include <papi.h>
    #include <stdio.h>

    main()
    {
        int retval, EventSet = PAPI_NULL;
        int native = 0x0;
        char error_str[PAPI_MAX_STR_LEN];

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT && retval > 0) {
            fprintf(stderr, "PAPI library version mismatch!\n");
            exit(1);
        }

        if ((retval = PAPI_create_eventset(&EventSet)) != PAPI_O
18. _event_name_to_code(EventName, EventCode, check)
    PAPIF_event_code_to_name(EventCode, EventName, check)

ARGUMENTS

    EventCode -- a preset or native event of integer type, such as PAPI_TOT_INS
    EventName -- the event name string, such as the preset name PAPI_BR_CN

Note that the preset does not actually have to be available on a given platform to call these functions. Native event names are platform specific and, where feasible, match those given in the vendor documentation.

PAPI_event_name_to_code is used to translate an ASCII PAPI preset or native event name into an integer PAPI event code.

PAPI_event_code_to_name is used to translate an integer PAPI event code into an ASCII PAPI preset or native event name.

Using PAPI_event_code_to_name in conjunction with PAPI_enum_event is a good way to explore the names of native events on a specific platform, as shown in the following code example:

    #include <papi.h>
    #include <stdio.h>

    main()
    {
        int EventCode, retval;
        char EventCodeStr[PAPI_MAX_STR_LEN];

        /* Initialize the library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI library init error!\n");
            exit(1);
        }

        EventCode = 0 | NATIVE_MASK;

        do {
            /* Translate the integer code to a string */
            if (PAPI_event_code_to_name(EventCode, EventCodeStr) == PAPI_OK)
19. if you wish to call PAPI low-level API functions before using the high-level functions.

EXECUTION RATE CALLS

Three PAPI high-level functions are available to measure floating point or total instruction rates. These three calls are shown below:

C:
    PAPI_flips(real_time, proc_time, flpins, mflips)
    PAPI_flops(real_time, proc_time, flpins, mflops)
    PAPI_ipc(real_time, proc_time, ins, ipc)

Fortran:
    PAPIF_flips(real_time, proc_time, flpins, mflips, check)
    PAPIF_flops(real_time, proc_time, flpins, mflops, check)
    PAPIF_ipc(real_time, proc_time, ins, ipc, check)

ARGUMENTS

    real_time -- the total real (wallclock) time since the first rate call
    proc_time -- the total process time since the first rate call
    flpins -- the total floating point instructions since the first rate call
    mflips, mflops -- millions of floating point operations or instructions per second achieved since the latest rate call
    ins -- the total instructions executed since the first PAPI_ipc call
    ipc -- instructions per cycle achieved since the latest PAPI_ipc call

The first execution rate call initializes the PAPI library if needed, sets up the counters to monitor either PAPI_FP_INS, PAPI_FP_OPS, or PAPI_TOT_INS (depending on the call) and PAPI_TOT_CYC events, and starts the counters. Subsequent calls to the same rate function will read the counters and return total real time, total process time, total instruc
20. in order to get acceptable results.

The default time slice for multiplexing is currently set at 100000 microseconds. Occasionally this setting can cause a resonant situation in the code, in which a given pattern repeats at the same frequency at which timers are switched out. This can be addressed by changing the time slice setting by calling PAPI_set_opt with the PAPI_DEF_MPX_USEC option.

To prevent naive use of multiplexing by the novice user, the high-level API can only access those events countable simultaneously by the underlying hardware, unless a low-level function has been called to explicitly enable multiplexing.

USING PAPI WITH PARALLEL PROGRAMS

THREADS

WHAT ARE THREADS?

A thread is an independent flow of instructions that can be scheduled to run by the operating system. Multi-threaded programming is a form of parallel programming where several controlled threads are executing concurrently in the program. All threads execute in the same memory space, and can therefore work concurrently on shared data. Threads can run in parallel on several processors, allowing a single program to divide its work between several processors, thus running faster than a single-threaded program, which runs on only one processor at a time.

PAPI only supports thread-level measurements with kernel or bound threads, which are threads that have a scheduling entity known and handled by the operating system's kern
21. more information about the mailing lists, and to subscribe or modify your mailing list settings, click the Mailing & User Lists tab under Contacts on the PAPI web page or go directly to: http://icl.cs.utk.edu/papi/custom/index.html?lid=50&slid=67

REPORTING BUGS

If you find a bug in PAPI, you can send mail to one of the mailing lists above, or peruse the mail archives to see if it has been reported by anyone else. You can also submit the bug directly to PAPIzilla, the PAPI bug tracking website. To access PAPIzilla, click the Bug Reporting tab under Contacts on the PAPI web page or go directly to: http://icl.cs.utk.edu/projects/papi/bugz/

PAPI PROGRAMMER'S REFERENCE

Function-by-function documentation for PAPI can be found in a number of formats. PAPI man pages are part of the standard installation package. For a properly installed PAPI, you should be able to display a man page for any PAPI function by typing:

    > man PAPI_xxx

where PAPI_xxx is a PAPI function name.

This information is also available in HTML format on the PAPI website at: http://icl.cs.utk.edu/projects/papi/files/html_man3/papi.html

If you want a printable version of the Programmer's Reference, you can find it in Word and PDF formats under the Documentation tab on the PAPI website.

TABLE OF PRESET EVENTS

A table of preset events, their standard defin
22. must first ptrace and stop the thread/process */
        /* Supports edge detection on events */
        /* Supports invert detection on events */
        /* Supports data/instr/tlb miss address sampling */
        /* Underlying hardware uses counter groups */
        unsigned int reserved_bits:16;

    /* For PAPI_DATA_ADDRESS and PAPI_INSTR_ADDRESS:
       address range specification for range restricted counting */
    typedef struct _papi_addr_range_option {  /* if both are zero, range is disabled */
        int eventset;     /* eventset to restrict */
        caddr_t start;    /* user requested start address of an address range */
        caddr_t end;      /* user requested end address of an address range */
        int start_off;    /* hardware specified offset from start address */
        int end_off;      /* hardware specified offset from end address */
    } PAPI_addr_range_option_t;

The file papi.h contains current definitions for the structures unioned in the PAPI_option_t structure. Users should refer to papi.h for specifics on the use of fields in these structures.

In the following code example, PAPI_get_opt is used to acquire the option PAPI_MAX_HWCTRS of an event set, and PAPI_set_opt is used to set the option PAPI_DOMAIN on the same event set:

    #include <papi.h>
    #include <stdio.h>

    main()
    {
        int num, retval, EventSet = PAPI_NULL;
        PAPI_option_t options;

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
23. profiling in order to gather enough information on which to base a performance analysis. Multiplexing overcomes this limitation by subdividing the usage of the counter hardware over time (timesharing) among a large number of performance events.

USING PAPI WITH MULTIPLEXING

INITIALIZATION OF MULTIPLEX SUPPORT

Multiplex support in the PAPI library can be enabled and initialized by calling the following low-level function:

C:
    PAPI_multiplex_init()

Fortran:
    PAPIF_multiplex_init(check)

The above function sets up the internal structures to allow more events to be counted than there are physical counters. It does this by timesharing the existing counters, at some loss in precision. This function should be called after PAPI_library_init. After this function is called, the user can proceed to use the normal PAPI routines. Applications that make no use of multiplexing should not call this function.

On success, this function returns PAPI_OK and on error, a non-zero error code is returned. For a code example, see the next section.

CONVERTING AN EVENT SET INTO A MULTIPLEXED EVENT SET

In addition, a standard event set can be converted to a multiplexed event set by calling the following low-level function:

C:
    PAPI_set_multiplex(EventSet)

Fortran:
    PAPIF_set_multiplex(EventSet)

ARGUMENT

    EventSet -- an integer handle for a PAPI event set as cre
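To make the "loss in precision" from timesharing concrete, the sketch below shows one simple way a time-shared count can be turned into an estimate for the whole run: scale the raw count by the ratio of total time to the time the event actually occupied a counter. This is an illustrative assumption, not a description of PAPI's internal algorithm, and `model_mpx_estimate` is a hypothetical helper name. The estimate is exact only if the event rate is the same during the measured and unmeasured intervals.

```c
#include <assert.h>

/* Estimate the full-run count of an event that was only counted for
   active_usec out of total_usec microseconds (hypothetical helper). */
long long model_mpx_estimate(long long raw_count,
                             long long active_usec,
                             long long total_usec)
{
    assert(raw_count >= 0);
    assert(active_usec > 0 && total_usec >= active_usec);
    /* Extrapolate linearly over the time the event was not being counted. */
    return (long long)((double)raw_count * (double)total_usec
                       / (double)active_usec);
}
```

For instance, an event that accumulated 250000 counts while active for 100000 microseconds of a 400000 microsecond run would be estimated at 1000000 counts. If the program's behavior changes between time slices, the estimate drifts from the true value, which is why short runs or bursty events multiplex poorly.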
24. EL API
    WHAT IS THE LOW LEVEL API?
    INITIALIZATION OF THE LOW LEVEL API
    EVENT SETS
    WHAT ARE EVENT SETS?
    CREATING AN EVENT SET
    ADDING EVENTS TO AN EVENT SET
    STARTING, READING, ADDING, AND STOPPING EVENTS IN AN EVENT SET
    RESETTING EVENTS IN AN EVENT SET
    REMOVING EVENTS IN AN EVENT SET
    EMPTYING AND DESTROYING AN EVENT SET
    [illegible entry]
    GETTING AND SETTING OPTIONS
    SIMPLE CODE EXAMPLES
    HIGH LEVEL API
    LOW LEVEL API
    PAPI TIMERS
    REAL TIME
    VIRTUAL TIME
    PAPI SYSTEM INFORMATION
    EXECUTABLE INFORMATION
    HARDWARE INFORMATION
    SUBSTRATE INFORMATION
    ADVANCED PAPI FEATURES
    MULTIPLEXING
    WHAT IS MULTIPLEXING?
    USING PAPI WITH MULTIPLEXING
    ISSUES OF MULTIPLEXING
    USING PAPI WITH PARALLEL PROGRAMS
    THREADS
    MPI
    OVERFLOW
    WHAT IS AN OVERFLOW?
    BEGINNING OVE
25. Requested End Address: 0x60000000000418cc
    End Offset: 0x734
    Actual End Address: 0x6000000000042000
    loads_retired: 17971
    stores_retired: 17971

The papi_native_avail Utility

To effectively use the instruction and address range specification feature for Itanium 2, one must know which of the roughly 475 available native events support these features. In addition, there are other qualifiers to Itanium 2 native events that are valuable to inspect. For these reasons, the papi_native_avail utility was enhanced to make it possible to filter the list of native events by these qualifiers. A help feature was added to this utility to make it easier to remember the Itanium-specific options:

    > papi_native_avail --help
    This is the PAPI native avail program.
    It provides availability and detail information for PAPI native events.
    Usage:
        papi_native_avail [options]
    Options:
        -h, --help   print this help message
        --darr       display Itanium events that support Data Address Range Restriction
        --dear       display Itanium Data Event Address Register events only
        --iarr       display Itanium events that support Instruction Address Range Restriction
        --iear       display Itanium Instruction Event Address Register events only
        --opcm       display Itanium events that support OpCode Matching
    NOTE: The last five options are mutually exclusive. If any of these options are spec
26. I_start_counters(Events, NUM_EVENTS) != PAPI_OK)
            handle_error(1);

        /* Do some computation here */

        /* Read the counters */
        if (PAPI_read_counters(values, NUM_EVENTS) != PAPI_OK)
            handle_error(1);

        /* Do some computation here */

        /* Stop counting events */
        if (PAPI_stop_counters(values, NUM_EVENTS) != PAPI_OK)
            handle_error(1);
    }

On success, all of these functions return PAPI_OK and on error, a non-zero error code is returned.

LOW LEVEL API

WHAT IS THE LOW LEVEL API?

The low-level API (Application Programming Interface) manages hardware events in user-defined groups called Event Sets. It is meant for experienced application programmers and tool developers wanting fine-grained measurement and control of the PAPI interface. Unlike the high-level interface, it allows both PAPI preset and native events. Other features of the low-level API are the ability to obtain information about the executable and the hardware, as well as to set options for multiplexing and overflow handling. The main benefits of the low-level API over the high-level API are greater efficiency and functionality.

The low-level interface can be used in conjunction with the high-level interface, as long as care is taken to ensure that the PAPI library is initialized prior to the first low-level PAPI call.

The low level API i
27. K) {
        fprintf(stderr, "PAPI error %d: %s\n", retval, PAPI_strerror(retval));
        exit(1);
    }

    /* Add Total Instructions Executed to our EventSet */
    if ((retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) {
        PAPI_perror(retval, error_str, PAPI_MAX_STR_LEN);
        fprintf(stderr, "PAPI error %d: %s\n", retval, error_str);
        exit(1);
    }

    /* Add POWER4 native event */
    retval = PAPI_event_name_to_code("PM_LD_MISS_L1", &native);
    if ((retval = PAPI_add_event(EventSet, native)) != PAPI_OK) {
        /* Dump error string directly to stderr */
        PAPI_perror(retval, NULL, NULL);
        exit(1);
    }

    /* Start counting */
    if ((retval = PAPI_start(EventSet)) != PAPI_OK)
        handle_error(retval);
    }

OUTPUT:

    Invalid argument

Notice that the above output was generated by the last call to PAPI_perror. On success, PAPI_perror returns PAPI_OK, and on error, a non-zero error code is returned.

FURTHER INFORMATION

In addition to the information in this User's Guide on programming with PAPI, a number of other information sources are available. Most of these are available on the PAPI website. Below we list some of these resources and indicate where they can be found.

PAPI HOME PAGE

The PAPI Project home page can be found at http://icl.cs.utk.edu/papi/

PAPI MAILING LISTS

PAPI has two mailing lists, for users and developers. For
28. LD, &myid);

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }

    /* Create an EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Add Total Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        handle_error(1);

    /* Start counting */
    if (PAPI_start(EventSet) != PAPI_OK)
        handle_error(1);

    while (!done) {
        if (myid == 0) {
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d", &n);
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0)
            break;

        /* Midpoint-rule integration of 4 / (1 + x*x) over [0, 1] */
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double) i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n",
                   pi, fabs(pi - PI25DT));
    }

    /* Read the counters */
    if (PAPI_read(EventSet, values) != PAPI_OK)
        handle_error(1);
    printf("After reading counters: %lld\n", values[0]);

    /* Stop the counters */
    if (PAPI_stop(EventSet, values) != PAPI_OK)
        handle_error(1);
    printf("After stopping counters: %lld\n", values[0]);

    MPI_Finalize();
    }
29. PAPI USER'S GUIDE
Version 3.5.0

TABLE OF CONTENTS

PREFACE
    INTENDED AUDIENCE
    ORGANIZATION OF THIS DOCUMENT
    DOCUMENT CONVENTIONS
INTRODUCTION TO PAPI
    WHAT IS PAPI?
    BACKGROUND
    ARCHITECTURE
INSTALLING PAPI
C AND FORTRAN CALLING INTERFACES
EVENTS
    WHAT ARE EVENTS?
    NATIVE EVENTS
        WHAT ARE NATIVE EVENTS?
    PRESET EVENTS
        WHAT ARE PRESET EVENTS?
    EVENT TRANSLATION
PAPI'S COUNTER INTERFACES
    HIGH LEVEL API
        WHAT IS THE HIGH LEVEL API?
        INITIALIZING THE HIGH LEVEL API
        EXECUTION RATE CALLS
    LOW LEV
30. RESET) != PAPI_TOT_CYC)) {
            /* Add the preset that was just queried */
            if (PAPI_add_event(EventSet, i | PAPI_PRESET) != PAPI_OK)
                handle_error(1);
            if (++j >= max_to_add)
                break;
        }
    }

    values = (long_long *) malloc(max_to_add * sizeof(long_long));
    if (values == NULL)
        handle_error(1);

    /* Start counting events */
    if (PAPI_start(EventSet) != PAPI_OK)
        handle_error(1);
    }

On success, both functions return PAPI_OK, and on error, a non-zero error code is returned. For more code examples, see ctests/multiplex1.c in the PAPI source distribution.

ISSUES OF MULTIPLEXING

The following are some issues concerning multiplexing that the PAPI user should be aware of:

• Hardware multiplexing is not supported by all platforms. On those platforms where it is supported, PAPI takes advantage of it. Otherwise, PAPI implements software multiplexing through the use of a high-resolution interval timer. For more information on which platforms support hardware or software multiplexing, see Appendix H.

• Multiplexing unavoidably incurs a small amount of overhead when switching events. In addition, no single event is measured for the full analysis time. These factors can adversely affect the precision of reported counter values. In other words, the more events that are multiplexed, the more likely it is that the results will be statistically skewed.

• The amount of time spent in the measured regions should be greater than the multiplexing time slice times the number of events measured.
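The last point amounts to a simple lower bound: a measured region should run for longer than the time slice multiplied by the number of multiplexed events. The following is a minimal sketch of that arithmetic. The function name is hypothetical and not part of PAPI, and the slice length in the note after it is an assumed figure chosen for illustration only.

```c
#include <assert.h>

/* Hypothetical helper (not part of PAPI): lower bound, in microseconds,
 * on how long a measured region should run so that every multiplexed
 * event receives at least one full time slice. */
long long min_region_usec(long long slice_usec, int num_events)
{
    assert(slice_usec > 0 && num_events > 0);
    return slice_usec * (long long) num_events;
}
```

With an assumed slice of 10000 microseconds and 6 events, min_region_usec(10000, 6) gives 60000 microseconds. A region shorter than that cannot have sampled every event.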
31. RFLOWS IN EVENT SETS
STATISTICAL PROFILING
    WHAT IS STATISTICAL PROFILING?
    GENERATING A PC HISTOGRAM
DATA AND INSTRUCTION ADDRESS RESTRICTION
CONVERTING ERROR CODES TO ERROR MESSAGES
FURTHER INFORMATION
    PAPI HOME PAGE
    PAPI MAILING LISTS
    REPORTING BUGS
    PAPI PROGRAMMER'S REFERENCE
    TABLE OF PRESET EVENTS
    SUPPORTED PLATFORMS
    HARDWARE REFERENCES
BIBLIOGRAPHY

INTENDED AUDIENCE

This document is intended to provide the PAPI user with a discussion of how to use the different components and functions of PAPI. The intended users are application developers and performance tool writers who need to access performance data to tune and model application performance. The user is expected to have some familiarity with either the C or the Fortran programming language.

ORGANIZATION OF THIS DOCUMENT

INTRODUCTION TO PAPI

This section provides an introduction to PAPI b
32. _STR_LEN)        Fortran string
    C_INT FUNCTION    EXTERNAL INTEGER FUNCTION    Fortran function returning integer result

Array arguments must be large enough to hold the input to, or output from, the subroutine for predictable behavior. The array length is indicated either by the accompanying argument or by internal PAPI definitions.

Subroutines accepting C_STRING as an argument are, on most implementations, capable of reading the character string length as provided by Fortran. In these implementations, the string is truncated or space-padded as necessary. For other implementations, the length of the character array is assumed to be sufficient. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.

EVENTS

WHAT ARE EVENTS?

Events are occurrences of specific signals related to a processor's function. Hardware performance counters exist as a small set of registers that count events, such as cache misses and floating point operations, while the program executes on the processor. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. Each processor has a number of events that are native to that architecture. PAPI provides a software abstraction of these architecture-dependent native events into a collection of preset events that are accessible through the PAPI i
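The "truncated or space-padded" behavior described above for C_STRING arguments can be pictured with a small sketch. This is not PAPI's implementation: the helper name and signature are hypothetical, and the buffer length is a parameter rather than the real value of PAPI_MAX_STR_LEN.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of PAPI): copy a C string into a
 * fixed-length Fortran-style character buffer. Longer strings are
 * truncated and shorter ones are padded with spaces. No NUL terminator
 * is written, because Fortran character variables do not use one. */
void to_fortran_string(char *dst, size_t dst_len, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n > dst_len)
        n = dst_len;                      /* truncate */
    memcpy(dst, src, n);
    memset(dst + n, ' ', dst_len - n);    /* space-pad the remainder */
}
```

Copying "PAPI" into an 8-character buffer leaves "PAPI" followed by four spaces, while copying "PAPI_OK" into a 3-character buffer keeps only "PAP".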
33. _event is used to remove the event

ARGUMENT

EventSet -- an integer handle for a PAPI event set as created by PAPI_create_eventset

Note that the event set must be empty in order to use PAPI_destroy_eventset.

In the following code example, PAPI_cleanup_eventset is used to empty all the events from an event set, and PAPI_destroy_eventset is used to deallocate the memory associated with the empty event set:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
    int retval, EventSet = PAPI_NULL;

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }

    /* Create the EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Add Total Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        handle_error(1);

    /* Remove all events in the eventset */
    if (PAPI_cleanup_eventset(EventSet) != PAPI_OK)
        handle_error(1);

    /* Free all memory and data structures; EventSet must be empty */
    if (PAPI_destroy_eventset(&EventSet) != PAPI_OK)
        handle_error(1);
    }

On success, these functions return PAPI_OK, and on error, a non-zero error code is returned.

THE STATE OF AN EVENT SET

The counting state of an Event Set can be obtai
34. ability of specifying starting and ending addresses of data structures or instructions to be instrumented.

The PAPI_set_opt call previously supported functionality to set a variety of optional capabilities in the PAPI interface, including debug levels, multiplexing of eventsets, and the scope of counting domains. This call was extended with two new cases to support instruction and data address range specification: PAPI_INSTR_ADDRESS and PAPI_DATA_ADDRESS. To access these options, a user initializes a simple option-specific data structure and calls PAPI_set_opt, as illustrated in the code fragment below:

    option.addr.eventset = EventSet;
    option.addr.start = (caddr_t) array;
    option.addr.end = (caddr_t) (array + size_array);
    retval = PAPI_set_opt(PAPI_DATA_ADDRESS, &option);

The user creates a PAPI eventset and determines the starting and ending addresses of the data to be monitored. The call to PAPI_set_opt then prepares the interface to count events that occur on accesses to data in that range. The specific events to be monitored can be added to the eventset either before or after the data range is specified. In a similar fashion, an instruction range can be set using the PAPI_INSTR_ADDRESS option. If this option is supported on the platform in use, the data is transferred to the platform-specific implementation and handled appropriately. If not supported, the call returns an error messa
35. architecture.

[Figure: PAPI architecture. Portable Layer: PAPI High Level, PAPI Low Level. Machine Specific Layer: PAPI Machine Dependent Substrate, Kernel Extension, Operating System, Hardware Performance Counters.]

The Portable Layer consists of the API (low level and high level) and machine-independent support functions.

The Machine Specific Layer defines and exports a machine-independent interface to machine-dependent functions and data structures. These functions are defined in the substrate layer, which uses kernel extensions, operating system calls, or assembly language to access the hardware performance counters. PAPI uses the most efficient and flexible of the three, depending on what is available.

PAPI strives to provide a uniform environment across platforms. However, this is not always possible. Where the hardware does not support features such as overflows and multiplexing, PAPI implements them in software where possible. Also, processors do not all support the same metrics, so the events you can monitor depend on the processor in use. The interface therefore remains constant, but how it is implemented can vary. Throughout this guide, implementation decisions are documented where they can make a difference to the user, such as overhead costs and sampling.

INSTALLING PAPI

On some of the systems that PAPI supports, you can install PAPI right out of the box without any additional s
36. ated by PAPI_create_eventset

The above function converts a standard PAPI event set created by a call to PAPI_create_eventset into an event set capable of handling multiplexed events. This function must be used after calling PAPI_multiplex_init and PAPI_create_eventset, but prior to calling PAPI_start. Events can be added to an event set either before or after converting it into a multiplexed set, but the conversion must be done prior to using it as a multiplexed set.

In the following code example, PAPI_set_multiplex is used to convert a standard event set into a multiplexed event set:

    #include <papi.h>

    int retval, i, EventSet = PAPI_NULL, max_to_add = 6, j = 0;
    long_long *values;
    const PAPI_preset_info_t *pset;

    main()
    {
    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT)
        handle_error(1);

    /* Enable and initialize multiplex support */
    if (PAPI_multiplex_init() != PAPI_OK)
        handle_error(1);

    /* Create an EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Convert the EventSet to a multiplexed event set */
    if (PAPI_set_multiplex(EventSet) != PAPI_OK)
        handle_error(1);

    for (i = 0; i < PAPI_MAX_PRESET_EVENTS; i++) {
        if ((PAPI_query_event(i | PAPI_PRESET) == PAPI_OK)
            && ((i | PAPI_P
37. ay EventCode

PAPI_remove_event removes a single hardware event from a PAPI event set. PAPI_remove_events does the same as PAPI_remove_event, but for an array of hardware event codes.

In the following code example, PAPI_remove_event is used to remove the event PAPI_TOT_INS from an event set:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
    int retval, EventSet = PAPI_NULL;

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }

    /* Create an EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Add Total Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        handle_error(1);

    /* Remove event */
    if (PAPI_remove_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        handle_error(1);
    }

On success, these functions return PAPI_OK, and on error, a non-zero error code is returned.

EMPTYING AND DESTROYING AN EVENT SET

An event set can be emptied of all its events and then destroyed by calling the following low-level functions, respectively:

C:

    PAPI_cleanup_eventset(EventSet)
    PAPI_destroy_eventset(EventSet)

Fortran:

    PAPIF_cleanup_eventset(EventSet, check)
    PAPIF_destroy_eventset(EventSet, check)
38. bout the hardware on which the program runs, such as the number of CPUs, CPU model information, and the cycle time of the CPU. In Fortran, the fields of the structure are returned explicitly.

Note that if this function is called before PAPI_library_init, its output is undefined.

In the following code example, PAPI_get_hardware_info is used to acquire hardware information about the total number of CPUs and the cycle time of the CPU:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
    const PAPI_hw_info_t *hwinfo = NULL;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);

    if ((hwinfo = PAPI_get_hardware_info()) == NULL)
        exit(1);

    printf("%d CPUs at %f Mhz.\n", hwinfo->totalcpus, hwinfo->mhz);
    }

POSSIBLE OUTPUT:

    1 CPUs at 733.000000 Mhz.

In C, on success, this function returns a non-NULL pointer, and on error, NULL is returned. In Fortran, on success, this function returns PAPI_OK, and on error, a non-zero error code is returned.

SUBSTRATE INFORMATION

Implementation details about the current hardware-dependent substrate can be obtained by using the following low-level function:

C:

    PAPI_get_substrate_info()

Fortran:

    This call is not implemented in the Fortran interface.

ARGUMENTS

In C, this function returns a pointer to a structure containing implementation details about the substrate currently in use, suc
40. check)

ARGUMENTS

handle -- Pointer to a routine that returns the current thread ID.

This function should be called only once, just after PAPI_library_init and before any other PAPI calls. If the function is called more than once, the application will exit. Applications that make no use of threads do not need to call this function.

The following example shows the correct syntax for using PAPI_thread_init with OpenMP:

C:

    #include <papi.h>
    #include <omp.h>

    if (PAPI_thread_init(omp_get_thread_num) != PAPI_OK)
        handle_error(1);

Fortran:

    #include "fpapi.h"
    #include "omp.h"

          EXTERNAL omp_get_thread_num
    C     Fortran dictates that in order to pass a subroutine
    C     as an argument, the subroutine must be
    C     declared external!
          call PAPIF_thread_init(omp_get_thread_num, error)

On success, the function PAPI_thread_init returns PAPI_OK, and on error, a non-zero error code is returned. For a code example of using PAPI_thread_init with Pthreads, see the next section.

THREAD ID

The identifier of the current thread can be obtained by calling the following low-level function:

C:

    PAPI_thread_id()

Fortran:

    PAPIF_thread_id(check)

This function calls the thread ID function registered by PAPI_thread_init and returns an unsigned long integer containing the thread identifier.

In the following code example, PAPI_thread_init and
41. d event set into the array values. The counters are left counting after the read, without resetting.

PAPI_accum adds the counters of the indicated event set into the array values. The counters are reset and left counting after the call of this function.

PAPI_stop stops the counting events in a previously defined event set and returns the current events.

The following is a code example of using PAPI_start to start the counting of events in an event set, PAPI_read to read the counters of the same event set into the array values, and PAPI_stop to stop the counting of events in the event set:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
    int retval, EventSet = PAPI_NULL;
    long_long values[1];

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }

    /* Create the Event Set */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Add Total Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        handle_error(1);

    /* Start counting */
    if (PAPI_start(EventSet) != PAPI_OK)
        handle_error(1);

    /* Do some computation here */

    if (PAPI_read(EventSet, values) != PAPI_OK)
        handle_error(1);

    /* Do some computation here */

    if (PAPI_stop(EventSe
42. d\n", PAPI_VERSION_MAJOR(retval));
    fprintf(stdout, "MINOR: %d\n", PAPI_VERSION_MINOR(retval));
    fprintf(stdout, "REVISION: %d\n", PAPI_VERSION_REVISION(retval));
    }

OUTPUT (FOR PAPI VERSION 3.5.0):

    PAPI Version Number
    MAJOR: 3
    MINOR: 5
    REVISION: 0

EVENT SETS

WHAT ARE EVENT SETS?

Event Sets are user-defined groups of hardware events (preset or native) that are used in conjunction with one another to provide meaningful information. The user specifies the events to be added to an Event Set, along with other attributes such as the counting domain (user or kernel), whether or not the events in the Event Set are to be multiplexed, and whether the Event Set is to be used for overflow or profiling. Other settings for the Event Set are maintained by PAPI, such as which low-level hardware registers to use, the most recently read counter values, and the state of the Event Set (running or not running).

Event Sets provide an effective abstraction for organizing the information associated with counting hardware events. The PAPI library manages the memory for Event Sets and exposes them to the user through integer handles, which simplifies calling conventions. The user is free to allocate and use any number of them, provided the substrate can supply the required resources. Only one Event Set can be in active use at any time in a given thread or process.

CREATING AN EVENT SET

An event set can be created by calling t
43. dependent pairs of data address registers exist in the hardware, which would suggest that four disjoint address regions can be monitored simultaneously, the Intel documentation strongly suggests that this is not a good idea. Further, the underlying software takes advantage of these register pairs to tune the range of addresses that is actually monitored. See the discussion under Data Address Ranges for further detail.

Instruction Address Ranges

Instruction ranges can be specified in one of two ways: coarse and fine. In fine mode, addresses can be specified exactly, but both the start and end addresses must lie on the same 4K-byte page. In other words, the address range must be less than 4K bytes and the addresses can differ only in the bottom 12 bits. If fine mode cannot be used, the underlying perfmon library automatically switches to coarse address specification. Four pairs of registers are available to specify coarse instruction address ranges. The restrictions on coarse address specification are discussed below.

Data Address Ranges

Data addresses can only be specified in coarse mode. As with instruction ranges, four pairs of registers are available to specify the data address ranges. Use of coarse mode addressing for either instruction or data address specification can cause some anomalous results. The Intel documentation points out that starting and ending addresses cannot be specified exactly, since the hardware representation relies on p
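The fine-mode condition for instruction address ranges (both addresses on the same 4K-byte page, differing only in the bottom 12 bits) can be checked with a few lines of C. This is a sketch of the stated rule only; the helper name is hypothetical, and it belongs to neither PAPI nor the perfmon library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of PAPI or perfmon): returns 1 when two
 * addresses lie on the same 4K-byte page, that is, when they differ only
 * in the bottom 12 bits, and 0 otherwise. */
int same_4k_page(uint64_t start, uint64_t end)
{
    assert(start <= end);
    return (start >> 12) == (end >> 12);
}
```

For example, 0x4000 and 0x4fff share a page, whereas 0x4ff0 and 0x5010 straddle a page boundary even though they are only 0x20 bytes apart.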
44. e <stdio.h>

    main()
    {
    int EventSet = PAPI_NULL;
    int retval;

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI library init error!\n");
        exit(1);
    }

    /* Create an EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Add Total Instructions Executed to our EventSet */
    if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
        handle_error(1);
    }

On success, both of these functions return PAPI_OK, and on error, a non-zero error code is returned.

STARTING, READING, ADDING, AND STOPPING EVENTS IN AN EVENT SET

Hardware events in an event set can be started, read, added, and stopped by calling the following low-level functions, respectively:

C:

    PAPI_start(EventSet)
    PAPI_read(EventSet, values)
    PAPI_accum(EventSet, values)
    PAPI_stop(EventSet, values)

Fortran:

    PAPIF_start(EventSet, check)
    PAPIF_read(EventSet, values, check)
    PAPIF_accum(EventSet, values, check)
    PAPIF_stop(EventSet, values, check)

ARGUMENTS

EventSet -- an integer handle for a PAPI Event Set as created by PAPI_create_eventset
values -- an array to hold the counter values of the counting events

PAPI_start starts the counting events in a previously defined event set.

PAPI_read reads (copies) the counters of the indicate
45. e other than PAPI_VER_CURRENT indicates a library version mismatch, and a negative return code indicates an initialization error.

Beginning with PAPI 3.0, there are a number of options for examining the current version number of PAPI:

• PAPI_VERSION produces an integer containing the complete current version, including MAJOR, MINOR, and REVISION components. Typically the REVISION component changes with bug fixes or minor enhancements, the MINOR component changes with feature additions or API changes, and the MAJOR component changes with significant API structural changes.

• PAPI_VER_CURRENT contains the MAJOR and MINOR components and is useful for determining library compatibility changes.

• PAPI_VERSION_MAJOR, PAPI_VERSION_MINOR, and PAPI_VERSION_REVISION are macros that extract the specified component from the version number.

The following is a code example of using PAPI_library_init to initialize the PAPI library:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
    int retval;

    /* Initialize the PAPI library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);

    if (retval != PAPI_VER_CURRENT && retval > 0) {
        fprintf(stderr, "PAPI library version mismatch!\n");
        exit(1);
    }

    if (retval < 0) {
        fprintf(stderr, "Initialization error!\n");
        exit(1);
    }

    fprintf(stdout, "PAPI Version Number\n");
    fprintf(stdout, "MAJOR: %
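To see how MAJOR, MINOR, and REVISION components can share one integer, and why a compatibility check looks only at MAJOR and MINOR, consider the sketch below. The macro names and the bit layout (8 bits each for MINOR and REVISION) are assumptions made for illustration. The real definitions are in papi.h and may use a different layout.

```c
#include <assert.h>

/* Illustrative macros only. The real definitions live in papi.h, and
 * their bit layout may differ from the one assumed here. */
#define VER_NUMBER(maj, min, rev)  (((maj) << 16) | ((min) << 8) | (rev))
#define VER_MAJOR(x)               (((x) >> 16) & 0xffff)
#define VER_MINOR(x)               (((x) >> 8) & 0xff)
#define VER_REVISION(x)            ((x) & 0xff)
/* MAJOR and MINOR only, mirroring the role described for PAPI_VER_CURRENT. */
#define VER_COMPAT(x)              ((x) & ~0xff)

/* Returns 1 when two packed versions agree on MAJOR and MINOR. */
int versions_compatible(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return VER_COMPAT(a) == VER_COMPAT(b);
}
```

Under this assumed layout, versions 3.5.0 and 3.5.2 compare as compatible, while 3.5.0 and 3.6.0 do not.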
46. ed. Returns TRUE if currently detached.

• Set the event set specified in ptr->attach.eventset to be detached from any thread or process ID.

• Get/Set domain for a single event set. The event set is specified in ptr->domain.eventset.

• Get/Set granularity for a single event set. The event set is specified in ptr->granularity.eventset. Currently unimplemented.

Platform Specific Options

• Set data address range to restrict event counting for the event set specified in ptr->addr.eventset. Starting and ending addresses are specified in ptr->addr.start and ptr->addr.end, respectively. If exact addresses cannot be instantiated, offsets are returned in ptr->addr.start_off and ptr->addr.end_off. Currently implemented on Itanium only.

• Set instruction address range as described above. Itanium only.

ptr is a pointer to a structure that acts as both an input and an output parameter. It is defined in papi.h and below.

EventSet -- input; a reference to an EventSetInfo structure
clockrate -- output; cycle time of this CPU in MHz; may be an estimate generated at init time with a quick timing routine
domain -- output; execution domain for which events are counted
granularity -- output; execution granularity for which events are counted
mode -- input; determines if domain or granularity are default or for the current event set
preload -- output; environment variable string for preload
47. el. In most cases, such as with SMP or OpenMP compiler directives, bound threads will be the default. Each thread is responsible for the creation, start, stop, and read of its own counters. When a thread is created, it inherits no PAPI information from the calling thread. Some threading packages and APIs can be used to manipulate threads with PAPI, particularly Pthreads and OpenMP. When using Pthreads, take care to set the scope of each thread to the PTHREAD_SCOPE_SYSTEM attribute, unless the system is known to have a non-hybrid thread library implementation.

In addition, PAPI does support unbound (non-kernel) threads, but the counts will reflect the total events for the process. Measurements taken in other threads will all return the same values, namely the counts for the whole process. For unbound threads, it is not necessary to call PAPI_thread_init, which is discussed in the next section.

When threads are in use, PAPI allows the user to provide a routine to its library that returns the thread ID of the currently running thread (for example, pthread_self for Pthreads). This thread ID is used as a lookup key for the internal data structures.

INITIALIZATION OF THREAD SUPPORT

Thread support in the PAPI library can be initialized by calling the following low-level function:

C:

    PAPI_thread_init(handle)

Fortran:

    PAPIF_thread_init(handle,
48. etup. Others require drivers or patches to be installed first. Because installation instructions vary from platform to platform, please find the section for your operating system and hardware in the papi/INSTALL.txt file for current information on exactly how to install PAPI for your configuration.

C AND FORTRAN CALLING INTERFACES

PAPI is written in C. The function calls in the C interface are defined in the header file papi.h and have the following form:

    <returned data type> PAPI_function_name(arg1, arg2, ...)

The function calls in the Fortran interface are defined in the header file fpapi.h and have the following form:

    PAPIF_function_name(arg1, arg2, ..., check)

As you can see, the C function calls have equivalent Fortran function calls (PAPI_<call> becomes PAPIF_<call>). This is true for most function calls. The exceptions are functions that return C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, which are either not implemented in the Fortran interface or implemented with different calling semantics. In the function calls of the Fortran interface, the return code of the corresponding C routine is returned in the argument check.

For most architectures, the following relation holds between the pseudo-types listed and Fortran variable types:

    Pseudo-type    Fortran type    Description
                   INTEGER
    C_STRING       CHARACTER*(PAPI_MAX
49. f (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);

    /* Gets the starting time in clock cycles */
    start_cycles = PAPI_get_real_cyc();

    /* Gets the starting time in microseconds */
    start_usec = PAPI_get_real_usec();

    /* Create an EventSet */
    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        exit(1);

    /* Gets the ending time in clock cycles */
    end_cycles = PAPI_get_real_cyc();

    /* Gets the ending time in microseconds */
    end_usec = PAPI_get_real_usec();

    printf("Wall clock cycles: \t%lld\n", end_cycles - start_cycles);
    printf("Wall clock time in microseconds: \t%lld\n", end_usec - start_usec);
    }

POSSIBLE OUTPUT:

    Wall clock cycles:                100173
    Wall clock time in microseconds:  136

VIRTUAL TIME

Virtual time can be acquired in clock cycles and microseconds by calling the following low-level functions, respectively:

C:

    PAPI_get_virt_cyc()
    PAPI_get_virt_usec()

Fortran:

    PAPIF_get_virt_cyc(check)
    PAPIF_get_virt_usec(check)

Both of these functions return the total number of virtual units from some arbitrary starting point. Virtual units accrue every time a process is running in user mode. Like the real-time counters, these functions always succeed (error-free), since they are guaranteed to exist on every PAPI-supported platform. However, the resolution can be as bad
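The two real-time readings in the example above are related by the clock rate: one megahertz is one cycle per microsecond. The sketch below shows that conversion. The helper is hypothetical and not part of PAPI, and the 733 MHz figure in the note after it is an assumption, since the guide does not say which machine produced the sample output.

```c
#include <assert.h>

/* Hypothetical helper (not part of PAPI): convert a cycle count to whole
 * microseconds for a clock running at mhz megahertz. One MHz is one
 * cycle per microsecond, so the conversion is a plain division. */
long long cycles_to_usec(long long cycles, double mhz)
{
    assert(cycles >= 0 && mhz > 0.0);
    return (long long) ((double) cycles / mhz);
}
```

If the sample output had come from a 733 MHz processor, cycles_to_usec(100173, 733.0) would give 136, which agrees with the microsecond figure printed alongside the cycle count.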
50. f Tennessee's Innovative Computing Laboratory in the Computer Science Department. This project was created to design, standardize, and implement a portable and efficient API (Application Programming Interface) to access the hardware performance counters found on most modern microprocessors.

BACKGROUND

Hardware counters exist on every major processor today, such as Intel Pentium, Core, IA-64, AMD Opteron, and the IBM POWER series. These counters can provide performance tool developers with a basis for tool development, and application developers with valuable information about sections of their code that can be improved. However, there are only a few APIs that allow access to these counters, and many of them are poorly documented, unstable, or unavailable. In addition, performance metrics may have different definitions and different programming interfaces on different platforms.

These considerations motivated the development of the PAPI Project. Some goals of the PAPI Project are as follows:

• To provide a solid foundation for cross-platform performance analysis tools
• To present a set of standard definitions for performance metrics on all platforms
• To provide a standardized API among users, vendors, and academics
• To be easy to use, well documented, and freely available

ARCHITECTURE

The figure below shows the internal design of the PAPI architecture. In this figure, we can see the two layers of the
51. following code example, a native event name is converted to an event code and added to an eventset by using PAPI_add_event:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
    int retval, EventSet = PAPI_NULL;
    unsigned int native = 0x0;
    PAPI_event_info_t info;

    /* Initialize the library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        printf("PAPI library init error!\n");
        exit(1);
    }

    if (PAPI_create_eventset(&EventSet) != PAPI_OK)
        handle_error(1);

    /* Find the first available native event */
    native = PAPI_NATIVE_MASK | 0;
    if (PAPI_get_event_info(native, &info) != PAPI_OK)
        if (PAPI_enum_event(&native, 0) != PAPI_OK)
            handle_error(1);

    /* Add it to the eventset */
    if (PAPI_add_event(EventSet, native) != PAPI_OK)
        handle_error(1);
    }

For more code examples using native events, see ctests/native.c and util/native_avail.c in the PAPI source distribution.

PRESET EVENTS

WHAT ARE PRESET EVENTS?

Preset events, also known as predefined events, are a common set of events deemed relevant and useful for application performance tuning. These events are typically found in many CPUs that provide performance counters and give access to the memory hierarchy, cache coherence protocol events, cycle and instruction counts, functional unit, and pipeline status. Furthermore, preset events are ma
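Native and preset event codes are told apart by high-order bits in the code, which is why the example above ORs a mask into the index of the first native event. The following sketch shows that idea under stated assumptions. The mask names are hypothetical. The native value matches the 0x40000000 codes that appear in this guide's sample listing of native events, and the preset value is an assumption to be checked against papi.h.

```c
#include <assert.h>

/* Assumed mask values for illustration. The native bit matches the
 * 0x40000000 codes in the guide's sample listing; the preset bit is an
 * assumption that should be checked against papi.h. */
#define SKETCH_NATIVE_MASK 0x40000000u
#define SKETCH_PRESET_MASK 0x80000000u

/* Returns 1 if the code carries the native bit and not the preset bit. */
int is_native_code(unsigned int code)
{
    return (code & SKETCH_NATIVE_MASK) != 0
        && (code & SKETCH_PRESET_MASK) == 0;
}

/* Returns the index of the event within its class by clearing both bits. */
unsigned int event_index(unsigned int code)
{
    return code & ~(SKETCH_NATIVE_MASK | SKETCH_PRESET_MASK);
}
```

With these assumed masks, the code 0x40000079 is classified as native and has index 0x79 within the native event table.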
52. ge. It may not always be possible to exactly specify the address range of interest. If this is the case, it is important that the user have some way to know what approximations have been made, so that appropriate corrective action can be taken. For instance, to isolate a specific data structure completely, it may be necessary to pad memory before and after the structure with dummy structures that are never accessed. To facilitate this, PAPI_set_opt returns the offsets from the requested starting and ending addresses as they were actually programmed into the hardware. If the addresses were mapped exactly, these values are zero. An example of this is shown below:

    retval = PAPI_set_opt(PAPI_DATA_ADDRESS, &option);
    actual.start = (caddr_t)array - option.addr.start_off;
    actual.end = (caddr_t)(array + size_array) + option.addr.end_off;

Itanium Idiosyncrasies

There are roughly 475 native events available on Itanium 2. Of these, 160 are memory related and can be counted with data address specification in place; 283 can be counted using instruction address specification. All events in an eventset with data or instruction range specification in place must be one of these supported events. Further restrictions also apply to the use of data and instruction range specification, as described below.

Data addresses can only be specified in coarse mode. Although four in
53. h as the number of counters and multiplexed counters supported, the number of preset and native events available, and whether and how certain advanced features are supported. For more details, refer to the definition of the PAPI_substrate_info_t structure found in papi.h, or see the discussion under getting and setting options. Note: if this function is called before PAPI_library_init, its output is undefined.

In the following code example, PAPI_get_substrate_info is used to determine how many preset and native events can be counted for a given substrate:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const PAPI_substrate_info_t *subinfo = NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);

        if ((subinfo = PAPI_get_substrate_info()) == NULL)
            exit(1);

        printf("num_preset_events: %d\n", subinfo->num_preset_events);
        printf("num_native_events: %d\n", subinfo->num_native_events);
        return 0;
    }

POSSIBLE OUTPUT:

    num_preset_events: 47
    num_native_events: 193

On success, this function returns a non-NULL pointer, and on error, NULL is returned.

ADVANCED PAPI FEATURES

MULTIPLEXING

WHAT IS MULTIPLEXING?

Multiplexing allows more events to be counted than can be supported by the hardware. When a microprocessor has a limited number of hardware counters, a large application with many hours of run time may require days or weeks of
54. he following low-level function:

    C:       PAPI_create_eventset(*EventSet)
    Fortran: PAPIF_create_eventset(EventSet, check)

ARGUMENT

    *EventSet -- Address of an integer location to store the new EventSet handle.

Once it has been created, the user may add hardware events to the EventSet by calling PAPI_add_event or PAPI_add_events. On success, this function returns PAPI_OK. On error, a non-zero error code is returned. For a code example using this function, see the next section.

ADDING EVENTS TO AN EVENT SET

Hardware events can be added to an event set by calling the following low-level functions:

    C:       PAPI_add_event(EventSet, EventCode)
             PAPI_add_events(EventSet, *EventCode, number)
    Fortran: PAPIF_add_event(EventSet, EventCode, check)
             PAPIF_add_events(EventSet, EventCode, number, check)

ARGUMENTS

    EventSet   -- an integer handle for a PAPI Event Set, as created by PAPI_create_eventset
    EventCode  -- a defined event, such as PAPI_TOT_INS
    *EventCode -- address of an array of defined events
    number     -- an integer indicating the number of events in the array *EventCode

PAPI_add_event adds a single hardware event to a PAPI event set. PAPI_add_events does the same as PAPI_add_event, but for an array of hardware event codes. In the following code example, the preset event PAPI_TOT_INS is added to an event set:

    #include <papi.h>
    #includ
55. he hardware
    PAPI_SHLIBINFO       Get shared library information used by the program
    PAPI_SUBSTRATEINFO   Get the PAPI features the substrate supports
    PAPI_LIB_VERSION     Get the full PAPI version of the library
    PAPI_PRELOAD         Get LD_PRELOAD environment equivalent

Defaults for the global library
    PAPI_DEFDOM          Get/Set the default counting domain for newly created event sets
    PAPI_DEFGRN          Get/Set the default counting granularity
    PAPI_DEBUG           Get/Set the PAPI debug state and the debug handler. The available debug states are defined in papi.h. The debug state is available in ptr->debug.level. The debug handler is available in ptr->debug.handler. For information regarding the behavior of the handler, please see the man page for PAPI_set_debug.

Multiplexing control
    PAPI_MULTIPLEX       Get/Set options for multiplexing
    PAPI_MAX_MPX_CTRS    Get maximum number of multiplexing counters
    PAPI_DEF_MPX_USEC    Get/Set the sampling time slice in microseconds for multiplexing

Manipulating individual event sets
    PAPI_ATTACH          Get thread or process id to which event set is attached. Returns TRUE if currently attached. Set event set specified in ptr->attach.eventset to be attached to thread or process id specified in ptr->attach.tid
    PAPI_DETACH          Get thread or process id to which event set is attach
    PAPI_DOMAIN
    PAPI_GRANUL
    PAPI_DATA_ADDRESS
    PAPI_INSTR_ADDRESS
56. id;
    } PAPI_attach_option_t;

For PAPI_MULTIPLEX and PAPI_DEF_MPX_USEC:

    typedef struct _papi_multiplex_option {
        int eventset;
        int us;
        int flags;
    } PAPI_multiplex_option_t;

For PAPI_HWINFO:

    typedef struct _papi_hw_info {
        int ncpu;                               /* Number of CPUs in an SMP node */
        int nnodes;                             /* Number of nodes in the entire system */
        int totalcpus;                          /* Total number of CPUs in the entire system */
        int vendor;                             /* Vendor number of CPU */
        char vendor_string[PAPI_MAX_STR_LEN];   /* Vendor string of CPU */
        int model;                              /* Model number of CPU */
        char model_string[PAPI_MAX_STR_LEN];    /* Model string of CPU */
        float revision;                         /* Revision of CPU */
        float mhz;                              /* Cycle time of this CPU */
        PAPI_mh_info_t mem_hierarchy;           /* PAPI memory hierarchy description */
    } PAPI_hw_info_t;

For PAPI_SHLIBINFO and PAPI_EXEINFO:

    typedef struct _papi_address_map {
        char name[PAPI_HUGE_STR_LEN];
        caddr_t text_start;   /* Start address of program text segment */
        caddr_t text_end;     /* End address of program text segment */
        caddr_t data_start;   /* Start address of program data segment */
        caddr_t data_end;     /* End address of program data segment */
        caddr_t bss_start;    /* Start address of program bss segment */
        caddr_t bss_end;      /* End address of program bss segment */
    } PAPI_address_map_t;

    typedef struct _papi_shared_lib_info {
        PAPI_address_map_t *map;
        int coun
57. ified on the command line, only those events that support that option are displayed. Even so, the list can be extensive, with roughly 160 events supporting data address ranging and even more supporting instruction address ranging.

PAPI ERROR HANDLING

ERROR CODES

All of the functions contained in the PAPI library return standardized error codes, in which values greater than or equal to zero indicate success and values less than zero indicate failure, as shown in the table below:

    SYMBOL            DEFINITION
    PAPI_OK           No error
    PAPI_EINVAL       Invalid argument
    PAPI_ENOMEM       Insufficient memory
    PAPI_ESYS         A system or C library call failed; please check errno
    PAPI_ESBSTR       Substrate returned an error, usually the result of an unimplemented feature
    PAPI_ECLOST       Access to the counters was lost or interrupted
    PAPI_EBUG         Internal error; please send mail to the developers
    PAPI_ENOEVNT      Hardware event does not exist
    PAPI_ECNFLCT      Hardware event exists, but cannot be counted due to counter resource limitations
    PAPI_ENOTRUN      No events or event sets are currently counting
    PAPI_EISRUN       Event Set is currently running
    PAPI_ENOEVST      No such event set available
    PAPI_ENOTPRESET   Event is not a valid preset
    PAPI_ENOCNTR      Hardware does not support performance counters
    PAPI_EMISC        Unknown error code
    PAPI_EPERM        You lack the necessary permissions

CONVERTING ERROR CODES TO ERROR MESSAGES

Error codes can be converted to error messages by
58. ing libraries.

PAPI_get_opt and PAPI_set_opt query or change the options of the PAPI library or of a specific event set created by PAPI_create_eventset. In the C interface, these functions pass a pointer to the PAPI_option_t structure. Not all options require or return information in this structure. The Fortran interface is a series of calls implementing various subsets of the C interface. Not all options in C are available in Fortran.

Note that a number of options are available as separate entry points in both C and Fortran. This can make calling sequences simpler. Calls that are simply wrappers to PAPI_get_opt and PAPI_set_opt are listed below:

    PAPI_get_executable_info   Get the executable's address space information
    PAPI_get_hardware_info     Get information about the system hardware
    PAPI_get_multiplex         Get the multiplexing status of the specified event set
    PAPI_get_shared_lib_info   Get information about the shared libraries used by the process
    PAPI_get_substrate_info    Get information about the substrate features
    PAPI_set_debug             Set the current debug level for PAPI
    PAPI_set_domain            Set the default execution domain for new event sets
    PAPI_set_granularity       Set the default granularity for new event sets
    PAPI_set_multiplex         Convert a standard event set to a multiplexed event set

The PAPI_option_t structure is actually a union of structures that provide specific information for each of the options defined in the tab
59. init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI library init error!\n");
            exit(1);
        }

        /* Create the Event Set */
        if (PAPI_create_eventset(&EventSet) != PAPI_OK)
            handle_error(1);

        /* Add Total Instructions Executed to our Event Set */
        if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
            handle_error(1);

        /* Start counting events in the Event Set */
        if (PAPI_start(EventSet) != PAPI_OK)
            handle_error(1);

        /* Defined in tests/do_loops.c in the PAPI source distribution */
        do_flops(NUM_FLOPS);

        /* Read the counting events in the Event Set */
        if (PAPI_read(EventSet, values) != PAPI_OK)
            handle_error(1);

        printf("After reading the counters: %lld\n", values[0]);

        /* Reset the counting events in the Event Set */
        if (PAPI_reset(EventSet) != PAPI_OK)
            handle_error(1);

        do_flops(NUM_FLOPS);

        /* Add the counters in the Event Set */
        if (PAPI_accum(EventSet, values) != PAPI_OK)
            handle_error(1);

        printf("After adding the counters: %lld\n", values[0]);

        do_flops(NUM_FLOPS);

        /* Stop the counting of events in the Event Set */
        if (PAPI_stop(EventSet, values) != PAPI_OK)
            handle_error(1);

        printf("After stoppi

POSSIBLE OUTPUT:

    After reading the counters: 440973
    After adding the counters: 882256
    After stopping the counters: 443913
60. itions and the platforms on which they are defined can be found at http://icl.cs.utk.edu/projects/papi/presets.html. Note that tables inevitably become outdated. Always use the util/papi_avail utility for the most current definitions of preset events on your platform.

SUPPORTED PLATFORMS

A table of currently supported hardware platforms and operating systems can be found on the PAPI website under the Supported Platforms tab.

SUPPORTED TOOLS

A list of tools that support PAPI, along with a brief description of each tool and a link to further information, can be found under the Tools tab on the PAPI website. A second list of tools that support PAPI, along with a list of links to related projects and links to a variety of vendor documentation, can be found on the PAPI website under the Links tab.

HARDWARE REFERENCES

A series of links to vendor and third-party hardware documentation on performance counter resources can also be found on the PAPI website under the Links tab.

BIBLIOGRAPHY

Browne, S., Dongarra, J., Garner, N., London, K., and Mucci, P., "A Portable Programming Interface for Performance Evaluation on Modern Processors," University of Tennessee Technical Report, Knoxville, Tennessee, July 2000. http://icl.cs.utk.edu/papi/documents

Browne, S., Dongarra, J., Garner, N., London, K., and Mucci, P., "A Scalable Cross-Platform Infrastructure for Appli
61. le above. This union is defined as shown below:

    typedef union {
        PAPI_preload_info_t preload;
        PAPI_debug_option_t debug;
        PAPI_granularity_option_t granularity;
        PAPI_granularity_option_t defgranularity;
        PAPI_domain_option_t domain;
        PAPI_domain_option_t defdomain;
        PAPI_attach_option_t attach;
        PAPI_multiplex_option_t multiplex;
        PAPI_hw_info_t *hw_info;
        PAPI_shlib_info_t *shlib_info;
        PAPI_exe_info_t *exe_info;
        PAPI_substrate_info_t *sub_info;
        PAPI_addr_range_option_t addr;
    } PAPI_option_t;

Each of these individual structures, as defined in papi.h, is shown below.

For PAPI_PRELOAD:

    typedef struct _papi_preload_option {
        char lib_preload_env[PAPI_MAX_STR_LEN];
        char lib_preload_sep;
        char lib_dir_env[PAPI_MAX_STR_LEN];
        char lib_dir_sep;
    } PAPI_preload_info_t;

For PAPI_DEBUG:

    typedef int (*PAPI_debug_handler_t)(int code);

    typedef struct _papi_debug_option {
        int level;
        PAPI_debug_handler_t handler;
    } PAPI_debug_option_t;

For PAPI_DEFGRN and PAPI_GRANUL:

    typedef struct _papi_granularity_option {
        int eventset;
        int granularity;
    } PAPI_granularity_option_t;

For PAPI_DEFDOM and PAPI_DOMAIN:

    typedef struct _papi_domain_option {
        int eventset;
        int domain;
    } PAPI_domain_option_t;

For PAPI_ATTACH and PAPI_DETACH:

    typedef struct _papi_attach_option {
        int eventset;
        unsigned long t
62. les, along with the corresponding output, are included as well.

ADVANCED PAPI FEATURES

This section discusses the advanced features of PAPI, which include multiplexing, threads, MPI, overflows, and statistical profiling. The functions that are used to implement these features are also discussed. Code examples, along with the corresponding output, are included as well.

PAPI ERROR HANDLING

This section discusses the various negative error codes that are returned by the PAPI functions. A table with the names, values, and descriptions of the return codes is given, as well as a discussion of the PAPI function that can be used to convert error codes to error messages, along with a code example and the corresponding output.

PAPI MAILING LISTS

This section provides information on two PAPI mailing lists where users can ask questions about the project.

APPENDICES

These appendices provide various listings and tables, such as a table of preset events and the platforms on which they are supported, a table of PAPI-supported tools, and more information on native events, multiplexing, overflow, etc.

DOCUMENT CONVENTION

    handle_error(1) -- A function called with the argument 1. The user should provide this function to handle errors.

INTRODUCTION TO PAPI

WHAT IS PAPI?

PAPI is an acronym for Performance Application Programming Interface. The PAPI Project is being developed at the University o
63. med to overflow by making successive calls to this function, but only a single overflow handler can be registered. To turn off overflow for a specific event, call PAPI_overflow with EventCode set to the desired event and threshold set to zero.

The handler function is a user-supplied callback routine that performs whatever special processing is needed to handle the overflow interrupt, including sorting multiple overflowing events from each other. It must conform to the following prototype:

    C: (*PAPI_overflow_handler)(EventSet, address, overflow_vector, void *context)

ARGUMENTS

    EventSet        -- a reference to the event set in use
    address         -- the address of the program counter when the overflow occurred
    overflow_vector -- a 64-bit vector that specifies which counter(s) generated the overflow. Bit 0 corresponds to counter 0. The handler should be able to deal with multiple overflow bits per call if more than one event may be set to overflow.
    *context        -- a platform-dependent structure containing information about the state of the machine when the overflow occurred. This structure is provided for completeness, but can generally be ignored by most users.

In the following code example, PAPI_overflow is used to mark PAPI_TOT_INS in order to generate an overflow signal after every 100,000 counted events:

    #include <papi.h>
    #include <stdio.h>

    #define THRESHOLD 100000

    int total = 0;   /* total
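The overflow_vector layout described above (bit 0 corresponds to counter 0, and several bits may be set in one call) can be unpacked with a few bit operations. The following is a minimal sketch that does not call PAPI at all; the helper name decode_overflow_vector and the MAX_COUNTERS limit are hypothetical and are not part of the PAPI API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COUNTERS 64   /* assumption: one bit per counter in a 64-bit vector */

/*
 * Hypothetical helper (not part of PAPI): list the counter indices whose
 * bits are set in overflow_vector. Bit 0 corresponds to counter 0.
 * Writes at most max_out indices to out and returns how many were written.
 */
int decode_overflow_vector(long long overflow_vector, int *out, int max_out)
{
    unsigned long long bits = (unsigned long long)overflow_vector;
    int found = 0;

    assert(out != NULL);
    for (int counter = 0; counter < MAX_COUNTERS && found < max_out; counter++) {
        if (bits & (1ULL << counter))
            out[found++] = counter;
    }
    return found;
}
```

An overflow handler could call a helper like this on the vector it receives and then process each reported counter in turn, which is one way to handle more than one overflowing event per call.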
64.     /* Start counting */
        if (PAPI_state(EventSet, &status) != PAPI_OK)
            handle_error(1);

        printf("State is now %d\n", status);

        if (PAPI_start(EventSet) != PAPI_OK)
            handle_error(1);

        if (PAPI_state(EventSet, &status) != PAPI_OK)
            handle_error(1);

        printf("State is now %d\n", status);
    }

OUTPUT:

    State is now 1
    State is now 2

On success, this function returns PAPI_OK, and on error, a non-zero error code is returned.

GETTING AND SETTING OPTIONS

The options of the PAPI library or of a specific event set can be obtained and set by calling the following low-level functions, respectively:

    C:       PAPI_get_opt(option, ptr)
             PAPI_set_opt(option, ptr)
    Fortran: PAPIF_get_clockrate(clockrate)
             PAPIF_get_domain(EventSet, domain, mode, check)
             PAPIF_get_granularity(EventSet, granularity, mode, check)
             PAPIF_get_preload(preload, check)

ARGUMENTS

    option -- an input parameter describing the course of action. The Fortran calls are implementations of specific options. Possible values are defined in papi.h and briefly described below:

    Option name          Explanation

General information requests
    PAPI_CLOCKRATE       Get clockrate in MHz
    PAPI_MAX_CPUS        Get number of CPUs
    PAPI_MAX_HWCTRS      Get number of counters
    PAPI_EXEINFO         Get executable addresses for text/data/bss
    PAPI_HWINFO          Get information about t
65. ned. For more code examples, see profile.c, profile_twoevents.c, or sprofile.c in the ctests directory of the PAPI source distribution. For a more extensive description of the parameters in the PAPI_profil call, see the PAPI_profil man page or its html counterpart at http://icl.cs.utk.edu/projects/papi/files/html_man3/papi_profil.html.

In the following code example, PAPI_profil is used to generate a PC histogram:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        int retval;
        int EventSet = PAPI_NULL;
        unsigned long start, end, length;
        PAPI_exe_info_t *prginfo;
        unsigned short *profbuf;

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT && retval > 0) {
            fprintf(stderr, "PAPI library version mismatch!\n");
            exit(1);
        }
        if (retval < 0)
            handle_error(retval);

        if ((prginfo = PAPI_get_executable_info()) == NULL)
            handle_error(1);

        start = (unsigned long)prginfo->text_start;
        end = (unsigned long)prginfo->text_end;
        length = end - start;

        profbuf = (unsigned short *)malloc(length * sizeof(unsigned short));
        if (profbuf == NULL)
            handle_error(1);
        memset(profbuf, 0x00, length * sizeof(unsigned short));

        if (PAPI_create_eventset(&EventSet) != PAPI_OK)
            handle_error(retval);

        /* Add Total FP Instructions Executed to our EventSet */
        if (PAPI_add
66. ned by calling the following low-level function:

    C:       PAPI_state(EventSet, *status)
    Fortran: PAPIF_state(EventSet, status, check)

ARGUMENTS

    EventSet -- an integer handle for a PAPI event set, as created by PAPI_create_eventset
    status   -- an integer containing a Boolean combination of one or more of the following nonzero constants, as defined in the PAPI header file papi.h:

        PAPI_STOPPED        EventSet is stopped
        PAPI_RUNNING        EventSet is running
        PAPI_PAUSED         EventSet temporarily disabled by the library
        PAPI_NOT_INIT       EventSet defined, but not initialized
        PAPI_OVERFLOWING    EventSet has overflow enabled
        PAPI_PROFILING      EventSet has profiling enabled
        PAPI_MULTIPLEXING   EventSet has multiplexing enabled
        PAPI_ATTACHED       EventSet is attached to another thread/process

In the following code example, PAPI_state is used to return the counting state of an EventSet:

    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int retval, status = 0, EventSet = PAPI_NULL;

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI library init error!\n");
            exit(1);
        }

        /* Create the EventSet */
        if (PAPI_create_eventset(&EventSet) != PAPI_OK)
            handle_error(1);

        /* Add Total Instructions Executed to our EventSet */
        if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
            handle_error(1);
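Because status is a Boolean combination of flags, a program usually tests individual bits rather than comparing the whole value. The sketch below shows one way to do that without calling PAPI. The numeric flag values are assumptions chosen to be consistent with the sample output in this guide (1 for stopped, 2 for running); real code should include papi.h and use the PAPI_* constants themselves. The helper name describe_state is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed flag values standing in for the papi.h constants. */
static const struct {
    int flag;
    const char *text;
} sketch_states[] = {
    { 0x01, "stopped" },
    { 0x02, "running" },
    { 0x04, "temporarily disabled by the library" },
    { 0x08, "defined but not initialized" },
    { 0x10, "overflow enabled" },
    { 0x20, "profiling enabled" },
    { 0x40, "multiplexing enabled" },
    { 0x80, "attached to another thread/process" },
};

/*
 * Hypothetical helper: write a comma-separated description of every flag
 * set in status into buf, and return the number of flags described.
 * Stops early (leaving buf properly terminated) if buf runs out of room.
 */
int describe_state(int status, char *buf, size_t buflen)
{
    size_t used = 0;
    int matches = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof sketch_states / sizeof sketch_states[0]; i++) {
        if (!(status & sketch_states[i].flag))
            continue;
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         matches ? ", " : "", sketch_states[i].text);
        if (n < 0 || (size_t)n >= buflen - used) {
            buf[used] = '\0';   /* discard the partially written entry */
            break;
        }
        used += (size_t)n;
        matches++;
    }
    return matches;
}
```

For example, a status with both the running and overflow bits set would be described as "running, overflow enabled".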
67. ng the counters: %lld\n", values[0]);
    }

Notice that in order to get the desired results (the second line approximately twice as large as the first line), PAPI_reset was called to reset the counters, since PAPI_read did not reset the counters.

PAPI TIMERS

PAPI timers use the most accurate timers available on the platform in use. These timers can be used to obtain both real and virtual time on each supported platform. The real-time clock runs all the time (e.g. a wall clock), and the virtual-time clock runs only when the processor is running in user mode.

REAL TIME

Real time can be acquired in clock cycles and microseconds by calling the following low-level functions, respectively:

    C:       PAPI_get_real_cyc()
             PAPI_get_real_usec()
    Fortran: PAPIF_get_real_cyc(check)
             PAPIF_get_real_usec(check)

Both of these functions return the total real time passed since some arbitrary starting point and are equivalent to wall clock time. Also, these functions always succeed (error-free), since they are guaranteed to exist on every PAPI-supported platform.

In the following code example, PAPI_get_real_cyc and PAPI_get_real_usec are used to obtain the real time it takes to create an event set, in clock cycles and microseconds, respectively:

    #include <papi.h>

    main()
    {
        long long start_cycles, end_cycles, start_usec, end_usec;
        int EventSet = PAPI_NULL;
        i
68. nterface.

NATIVE EVENTS

WHAT ARE NATIVE EVENTS?

Native events comprise the set of all events that are countable by the CPU. There are generally far more native events available than can be mapped onto PAPI preset events. Even if no preset event is available that exposes a given native event, native events can still be accessed directly. To use native events effectively, you should be very familiar with the particular platform in use. PAPI provides access to native events on all supported platforms through the low-level interface. Native events use the same interface as used when setting up a preset event, but since a PAPI preset event definition is not available for native events, a native event name must often be translated into an event code before it can be used.

Native event codes and names are platform dependent, so native codes for one platform are not likely to work for any other platform. To determine the native events for your platform, see the native event lists for the various platforms in the processor architecture manual. Every attempt is made to keep native event names used by PAPI as similar as possible to those used in the vendor documentation. This is not always possible. The utility code util/papi_native_avail provides insight into the names of the native events for a specific platform.

Native events are specified as arguments to the low-level function PAPI_add_event in a manner similar to adding PAPI preset events. In the
69. oad and store events on three different types of data structures. Three static arrays of 16,384 ints were declared in the program, and three dynamic arrays of 16,384 ints were malloc'd. The data range was specified sequentially to be the starting and ending addresses of each of:

• the pointers to the malloc'd arrays
• the malloc'd arrays themselves
• the statically declared arrays

The work done in each case consisted of loading an initialization value into each element of each array and then summing the values of each element. This should produce 16,384 loads and 16,384 stores for each array.

For the pointers, the size was 8 bytes, and the starting and ending addresses could be specified exactly. Output is shown below:

    Measure loads and stores on the pointers to the allocated arrays
    Expected loads: 32768; Expected stores: 0
    These loads result from accessing the pointers to compute array addresses.
    They will likely disappear with higher levels of optimization.

    Requested Start Address: 0x6000000000011640; Start Offset: 0x     0; Actual Start Address: 0x6000000000011640
    Requested End Address:   0x6000000000011648; End Offset:   0x     0; Actual End Address:   0x6000000000011648
    loads_retired:  32768
    stores_retired: 0

    Requested Start Address: 0x6000000000011628; Start Offset: 0x     0; Actual Start Address: 0x6000000000011628
    Requested End Address:   0x6000000000011630; End Offset:   0x     0; Actual End Address:   0x6000000000011630
    loads_retired:  32768
    stores
70. ously by the underlying hardware.

There are eight functions that represent the high-level API that allow the user to access and count specific hardware events. Note that these functions can be accessed from both C and Fortran. For a code example of using the high-level interface, see Simple Code Examples: High Level API or ctests/high-level.c in the PAPI source distribution. For full details on the calling semantics of these functions, please refer to the PAPI Programmer's Reference.

INITIALIZING THE HIGH LEVEL API

The PAPI library is initialized implicitly by several high-level API calls. In addition to the three rate calls discussed later, either of the following two functions also implicitly initializes the library:

    C:       PAPI_num_counters()
             PAPI_start_counters(*events, array_length)
    Fortran: PAPIF_num_counters(check)
             PAPIF_start_counters(*events, array_length, check)

ARGUMENTS

    *events      -- an array of codes for events, such as PAPI_INT_INS or a native event code
    array_length -- the number of items in the events array

PAPI_num_counters returns the optimal length of the values array for high-level functions. This value corresponds to the number of hardware counters supported by the current substrate. PAPI_num_counters initializes the PAPI library using PAPI_library_init if necessary.

PAPI_start_counters initializes the PAPI library (if necessary) and starts counting
71. overflows */

    void handler(int EventSet, void *address, long long overflow_vector, void *context)
    {
        fprintf(stderr, "handler(%d) Overflow at %p! vector=0x%llx\n",
                EventSet, address, overflow_vector);
        total++;
    }

    int main(void)
    {
        int retval, EventSet = PAPI_NULL;

        /* Initialize the PAPI library */
        retval = PAPI_library_init(PAPI_VER_CURRENT);
        if (retval != PAPI_VER_CURRENT)
            handle_error(1);

        /* Create the EventSet */
        if (PAPI_create_eventset(&EventSet) != PAPI_OK)
            handle_error(1);

        /* Add Total Instructions Executed to our EventSet */
        if (PAPI_add_event(EventSet, PAPI_TOT_INS) != PAPI_OK)
            handle_error(1);

        /* Call handler every 100000 instructions */
        retval = PAPI_overflow(EventSet, PAPI_TOT_INS, THRESHOLD, 0, handler);
        if (retval != PAPI_OK)
            handle_error(1);

        /* Start counting */
        if (PAPI_start(EventSet) != PAPI_OK)
            handle_error(1);

        return 0;
    }

On success, this function returns PAPI_OK, and on error, a non-zero error code is returned. For more code examples, see ctests/overflow.c, ctests/overflow_twoevents.c, or ctests/overflow_pthreads.c in the papi source distribution.

STATISTICAL PROFILING

WHAT IS STATISTICAL PROFILING?

Statistical profiling involves periodically interrupting a running program and examining the program counter at the time of the interrupt. If this is done for a reasonable number of interrupting intervals, the resulting program coun
72. owers of two bitmasks. The perfmon library tries to optimize the alignment of these power-of-two regions to cover the addresses requested as effectively as possible with the four sets of registers available. Perfmon first finds the largest power-of-two address region completely contained within the requested addresses. Then it finds successively smaller power-of-two regions to cover the errors on the high and low end of the requested address range. The effective result is that the actual range specified is always equal to or larger than, and completely contains, the requested range, and can occupy from one to four pairs of address registers. In some cases, this can result in significant overcounts of the events of interest, especially if two active data structures are located in close proximity to each other. This may require that the developer insert some padding structures before and/or after a particular structure of interest to guarantee accurate counts.

Supporting Software

To make this new PAPI feature more accessible and easier to use, a test case was developed to both provide a coding example and to exercise and test the functionality of the data ranging features of the Itanium 2. In addition, the papi_native_event utility was modified to make it easier to identify events that support these features.

The data_range.c Test Case

A test case called data_range was developed that measures memory l
73. ppings from symbolic names (PAPI preset name) to machine-specific definitions (native countable events) for a particular hardware resource. For example, Total Cycles (in user mode) is PAPI_TOT_CYC. Also, PAPI supports presets that may be derived from the underlying hardware metrics. For example, Total L1 Cache Misses (PAPI_L1_TCM) might be the sum of L1 Data Misses and L1 Instruction Misses on a given platform. A preset can be either directly available as a single counter, derived using a combination of counters, or unavailable on any particular platform.

The PAPI library names approximately 100 preset events, which are defined in the header file papiStdEventDefs.h. For a given platform, a subset of these preset events can be counted through either a simple high-level programming interface or a more complete C or Fortran low-level interface. For a representative list of all the preset events on some supported platforms, visit the PAPI web page: http://icl.cs.utk.edu/projects/papi/presets.html. Note that processors and software are revised over time, and this list may not be up to date. To determine exactly which preset events are available on a specific platform, run util/papi_avail.c in the papi source distribution.

The exact semantics of an event counter are platform dependent. PAPI preset names are mapped onto available events so as to map as many countable events as possible on different platforms
74. red:  16392
    stores_retired: 16392

For the static arrays, the locations of the arrays resulted in significant offsets, and hence significant errors. The most interesting case is the second one, in which the starting offset can be seen to force the inclusion of all three pointers to the malloc'd arrays. Because of this, the loads_retired count is too high by 98310, almost exactly 3 * 32768 (98304).

    Measure loads and stores on the static arrays
    These values will differ from the expected values by the size of the offsets.
    Expected loads: 16384; Expected stores: 16384

    Requested Start Address: 0x60000000000218cc; Start Offset: 0x  18cc; Actual Start Address: 0x6000000000020000
    Requested End Address:   0x60000000000318cc; End Offset:   0x   734; Actual End Address:   0x6000000000032000
    loads_retired:  18432
    stores_retired: 18432

    Requested Start Address: 0x60000000000118cc; Start Offset: 0x  18cc; Actual Start Address: 0x6000000000010000
    Requested End Address:   0x60000000000218cc; End Offset:   0x   734; Actual End Address:   0x6000000000022000
    loads_retired:  115155
    stores_retired: 16845

    Requested Start Address: 0x60000000000318cc; Start Offset: 0x  18cc; Actual Start Address: 0x6000000000030000
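The relationship between the requested addresses, the reported offsets, and the actual addresses in the output above is plain address arithmetic: the start offset is subtracted from the requested start, and the end offset is added to the requested end. The following sketch reproduces that arithmetic without calling PAPI; the addr_range type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: a half-open address range. */
typedef struct {
    uint64_t start;
    uint64_t end;
} addr_range;

/* Compute the range actually programmed, given the reported offsets. */
addr_range actual_range(addr_range requested, uint64_t start_off, uint64_t end_off)
{
    addr_range actual;

    assert(start_off <= requested.start);
    actual.start = requested.start - start_off;   /* range grows downward */
    actual.end = requested.end + end_off;         /* range grows upward */
    return actual;
}

/* Number of bytes covered by the actual range but not requested. */
uint64_t extra_bytes(addr_range requested, addr_range actual)
{
    return (requested.start - actual.start) + (actual.end - requested.end);
}
```

For the first static array, the offsets 0x18cc and 0x734 add up to 0x2000 (8192) extra bytes. Assuming 4-byte ints, that is 2048 extra elements, which is consistent with the 18432 - 16384 = 2048 extra loads and stores reported for that case.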
75. _retired: 0

    Requested Start Address: 0x6000000000011638; Start Offset: 0x     0; Actual Start Address: 0x6000000000011638
    Requested End Address:   0x6000000000011640; End Offset:   0x     0; Actual End Address:   0x6000000000011640
    loads_retired:  32768
    stores_retired: 0

For the allocated arrays, small offsets were introduced in each case, and the resulting error in the loads and stores is exactly what would be predicted by the activity in the adjacent memory locations.

    Measure loads and stores on the allocated arrays themselves
    Expected loads: 16384; Expected stores: 16384

    Requested Start Address: 0x6000000004044010; Start Offset: 0x    10; Actual Start Address: 0x6000000004044000
    Requested End Address:   0x6000000004054010; End Offset:   0x     0; Actual End Address:   0x6000000004054010
    loads_retired:  16384
    stores_retired: 16384

    Requested Start Address: 0x6000000004054020; Start Offset: 0x    20; Actual Start Address: 0x6000000004054000
    Requested End Address:   0x6000000004064020; End Offset:   0x     0; Actual End Address:   0x6000000004064020
    loads_retired:  16388
    stores_retired: 16388

    Requested Start Address: 0x6000000004064030; Start Offset: 0x    30; Actual Start Address: 0x6000000004064000
    Requested End Address:   0x6000000004074030; End Offset:   0x    10; Actual End Address:   0x6000000004074040
    loads_reti
76. s only as powerful as the substrate upon which it is built. Thus, some features may not be available on every platform. The converse may also be true: more advanced features may be available on some platforms and defined in the header file. Therefore, the user is encouraged to read the documentation for each platform carefully.

There are approximately 50 functions that represent the low-level API. For a code example of using the low-level interface, see Simple Code Examples: Low Level API or ctests/low_level.c in the PAPI source distribution. Note that most functions are implemented in both C and Fortran, but some are implemented in only one of these two languages. For full details on the calling semantics of these functions, please refer to the PAPI Programmer's Reference.

INITIALIZATION OF THE LOW LEVEL API

The PAPI library must be initialized before it can be used. It can be initialized explicitly by calling the following low-level function:

    C:       PAPI_library_init(version)
    Fortran: PAPIF_library_init(check)

ARGUMENT

    version -- upon initialization, PAPI checks the argument against the internal value of PAPI_VER_CURRENT when the library was compiled. This guards against portability problems when updating the PAPI shared libraries on your system.

Note that this function must be called before calling any other low-level PAPI function. On success, this function returns PAPI_VER_CURRENT. On error, a positive return cod
77. strat
Number of preset events the substrate supports
Number of native events the substrate supports
The default domain when this substrate is used
Available domains
Default granularity when this substrate is used
Available granularities
Signal number used by the multiplex timer, 0 if not
Number of the itimer or POSIX 1 timer used by the multiplex timer
uS between switching of sets
Signal used by hardware to deliver PMC events
Width of opcode matcher if exists, 0 if not
hardware_intr:1        hw overflow intr, does not need to be emulated in software
precise_intr:1         Performance interrupts happen precisely
posix1b_timers:1       Using POSIX 1b interval timers (timer_create) instead of setitimer
kernel_profile:1       Has kernel profiling support (buffered interrupts or sprofil-like)
kernel_multiplex:1     In kernel multiplexing
data_address_range:1   Supports data address range limiting
instr_address_range:1  Supports instruction address range limiting
fast_counter_read:1    Supports user level PMC read instruction
fast_real_timer:1      Supports a fast real timer
fast_virtual_timer:1   Supports a fast virtual timer
attach:1               Supports attach
attach_must_ptrace:1   Attach
edge_detect:1
invert:1
profile_ear:1
grouped_cntrs
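The one-bit fields listed above act as capability flags that a tool can test before relying on a feature. The following is a minimal sketch of that pattern in plain C. The struct substrate_flags is a hypothetical, cut-down stand-in that reuses four of the field names from the listing; it is not the real PAPI structure, and the two helper functions are illustrative names only.

```c
/* Hypothetical, cut-down stand-in for the substrate information
 * structure; only four of the one-bit flags are reproduced here. */
struct substrate_flags {
    unsigned int hardware_intr:1;      /* hw overflow interrupts available */
    unsigned int precise_intr:1;       /* interrupts happen precisely */
    unsigned int kernel_multiplex:1;   /* in-kernel multiplexing */
    unsigned int data_address_range:1; /* data address range limiting */
};

/* Returns 1 when overflow has to be emulated in software, i.e. when
 * the hardware cannot deliver overflow interrupts itself. */
int needs_software_overflow(const struct substrate_flags *f)
{
    return !f->hardware_intr;
}

/* Returns 1 when counting can be restricted to a data address range. */
int can_limit_data_range(const struct substrate_flags *f)
{
    return f->data_address_range;
}
```

Checking a flag first lets a program fall back gracefully, for example by skipping address-range measurements on platforms that do not support them.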
78. sts/zero_pthreads.c and ctests/zero_omp.c in the PAPI source distribution, respectively. Also, for a code example of using SMP with PAPI, see ctests/zero_smp.c in the PAPI source distribution.
MPI
MPI is an acronym for Message Passing Interface. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementers, and users. MPI was designed for high performance on both massively parallel machines and workstation clusters. More information on MPI can be found at http://www-unix.mcs.anl.gov/mpi
PAPI supports MPI. When using timers in applications that contain multiplexing, profiling, and overflow, MPI uses a default virtual timer, which must be converted to a real timer in order for the application to work properly. Otherwise, the application will exit. Optionally, the supported tools TAU and SvPablo can be used to implement PAPI with MPI.
The following is a code example of using MPI's PI program with PAPI:
    #include <papi.h>
    #include <mpi.h>
    #include <math.h>
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        int done = 0, n, myid, numprocs, i, rc, retval, EventSet = PAPI_NULL;
        double PI25DT = 3.141592653589793238462643;
        double mypi, pi, h, sum, x, a;
        long long values[1] = {(long long) 0};
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WOR
79. t PAPI_shlib_info_t;
typedef struct _papi_program_info {
    char fullname[PAPI_HUGE_STR_LEN];        /* path+name */
    PAPI_address_map_t address_info;
} PAPI_exe_info_t;
For PAPI_SUBSTRATEINFO:
typedef struct _papi_substrate_option {
    char name[PAPI_MAX_STR_LEN];             /* Name of the substrate we're using, usually CVS RCS Id */
    char version[PAPI_MIN_STR_LEN];          /* Version of this substrate, usually CVS Revision */
    char support_version[PAPI_MIN_STR_LEN];  /* Version of the support library */
    char kernel_version[PAPI_MIN_STR_LEN];   /* Version of the kernel PMC support driver */
    int num_cntrs;                           /* Number of hardware counters the substrate supports */
    int num_mpx_cntrs;                       /* Number of multiplexed counters the substrate (or PAPI) supports */
    int num_preset_events;
    int num_native_events;
    int default_domain;
    int available_domains;
    int default_granularity;
    int available_granularities;
    int multiplex_timer_sig;
    int multiplex_timer_num;
    int multiplex_timer_us;
    int hardware_intr_sig;
    int opcode_match_width;
    int reserved_ints[4];
    unsigned int hardware_intr:1;
    unsigned int precise_intr:1;
    unsigned int posix1b_timers:1;
    unsigned int kernel_profile:1;
    unsigned int kernel_multiplex:1;
    unsigned int data_address_range:1;
    unsigned int instr_address_range:1;
    unsigned int fast_counter_read:1;
    unsigned int fast_real_timer:1;
    unsigned int fast_virtual_timer:1;
    unsigned int attach:1;
    unsigned int attach_must_ptrace:1;
    unsigned int edge_detect:1;
    unsigned int invert:1;
    unsigned int profile_ear:1;
    unsigned int grouped_cntrs:1;
PAPI_sub
80. t, values) != PAPI_OK) handle_error(1);
On success, these functions return PAPI_OK, and on error, a non-zero error code is returned.
RESETTING EVENTS IN AN EVENT SET
The hardware event counts in an event set can be reset to zero by calling the following low-level function:
C: PAPI_reset(EventSet)
Fortran: PAPIF_reset(EventSet, check)
ARGUMENT
EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset
For example, the EventSet in the code example of the previous section could have been reset to zero by adding the following lines:
    if (PAPI_reset(EventSet) != PAPI_OK)
        handle_error(1);
On success, this function returns PAPI_OK, and on error, a non-zero error code is returned.
REMOVING EVENTS IN AN EVENT SET
A hardware event and an array of hardware events can be removed from an event set by calling the following low-level functions, respectively:
C: PAPI_remove_event(EventSet, EventCode)
   PAPI_remove_events(EventSet, EventCode, number)
Fortran: PAPIF_remove_event(EventSet, EventCode, check)
         PAPIF_remove_events(EventSet, EventCode, number, check)
ARGUMENTS
EventSet: an integer handle for a PAPI event set as created by PAPI_create_eventset
EventCode: a defined event such as PAPI_TOT_INS (or a native event)
EventCode (for PAPI_remove_events): an array of defined events
number: an integer indicating the number of events in the arr
81. ter distribution will be statistically representative of the execution profile of the program with respect to the interrupting event. Performance tools like UNIX prof sample the program address with respect to time and hash the value into a histogram. At program completion, the histogram is analyzed and associated with symbolic information contained in the executable. GNU prof, in conjunction with the -p option of the GCC compiler, performs exactly this analysis using the process time as the interrupting trigger. PAPI aims to generalize this functionality so that a histogram can be generated using any countable hardware event as the basis for the interrupt signal.
GENERATING A PC HISTOGRAM
A PC histogram can be generated on any countable event by calling either of the following low-level functions:
C: PAPI_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags)
   PAPI_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags)
Fortran: PAPIF_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags, check)
ARGUMENTS
buf: pointer to profile buffer array
bufsiz: number of entries in buf
offset: starting value of lowest memory address to profile
scale: scaling factor for bin values
EventSet: the PAPI EventSet to profile when it is started
EventCode: code of the Event in the EventSet to profile
threshold
82. threshold value for the Event triggers the handler
flags: bit pattern to control profiling behavior. The defined bit values for the flags variable are shown in the table below:
PAPI_PROFIL_POSIX
PAPI_PROFIL_RANDOM
PAPI_PROFIL_WEIGHTED
PAPI_PROFIL_COMPRESS
PAPI_PROFIL_BUCKET_16
PAPI_PROFIL_FORCE_SW
prof: pointer to PAPI_sprofil_t structure
profcnt: number of buffers for hardware profiling (reserved)
PAPI_profil creates a histogram of overflow counts for a specified region of the application code by using its first four parameters to create the data structures needed by PAPI_sprofil, and then calls PAPI_sprofil to do the work. PAPI_sprofil assumes a pre-initialized PAPI_sprofil_t structure and enables profiling for the EventSet based on its value. Note that the EventSet must be in the stopped state in order for either call to succeed.
More than one hardware event can be profiled at the same time by making multiple independent calls to these functions for the same EventSet before calling PAPI_start. This can be useful for the simultaneous generation of profiles of two or more related events, for example L1 cache misses and L2 cache misses. Profiling can be turned off for specific events by calling the function for that event with a threshold of zero.
On success, these functions return PAPI_OK, and on error, a non-zero error code is retur
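To make the idea of a PC histogram concrete, the following is a minimal sketch in plain C of how a sampled program counter might be mapped to a histogram bin. It is an illustration only and not PAPI's implementation: the function record_sample and the parameter bytes_per_bin are hypothetical, and the simple division used here stands in for the scale argument, whose exact encoding is described in the PAPI Programmer's Reference. The 16-bit bin type is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count one sampled program counter value.
 * hist          histogram buffer (16-bit bins assumed here)
 * nbins         number of entries in hist
 * offset        lowest address covered by the histogram
 * bytes_per_bin width of each bin in bytes (illustrative stand-in
 *               for the scale argument, not PAPI's encoding)
 * Returns 1 if the sample was counted, 0 if it fell outside the range. */
int record_sample(unsigned short *hist, size_t nbins, uintptr_t offset,
                  size_t bytes_per_bin, uintptr_t pc)
{
    size_t bin;

    assert(bytes_per_bin > 0);
    if (pc < offset)
        return 0;                     /* below the profiled region */
    bin = (size_t)((pc - offset) / bytes_per_bin);
    if (bin >= nbins)
        return 0;                     /* above the profiled region */
    hist[bin]++;
    return 1;
}
```

In a real profiler, a handler triggered each time the chosen event crosses its threshold would perform this kind of update, so that bins with high counts point at the code regions where the event occurs most often.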
83. tions or operations, and the appropriate rate of execution since the last call. A call to PAPI_stop_counters will reinitialize all values to 0. Sequential calls to different execution rate functions will return an error.
Note that on many platforms there may be subtle differences between floating point instructions and operations. Instructions are typically those execution elements most directly measured by the hardware counters. They may include floating point load and store instructions, and may count instructions such as FMA as one, even though two floating point operations have occurred. Consult the hardware documentation for your system for more details. Operations represent a derived value where an attempt is made, when possible, to more closely map to the theoretical definition of a floating point event.
On success, the rate calls return PAPI_OK, and on error, a non-zero error code is returned.
For a code example, see ctests/flops.c or ctests/ipc.c in the PAPI source distribution.
READING, ACCUMULATING, AND STOPPING COUNTERS
Counters can be read, accumulated, and stopped by calling the following high-level functions, respectively:
C: PAPI_read_counters(values, array_length)
   PAPI_accum_counters(values, array_length)
   PAPI_stop_counters(values, array_length)
Fortran: PAPIF_read_counters(values, array_length, check)
         PAPIF_accum_counters(values, array_length, check)
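The difference between floating point instructions and operations described above can be illustrated with a small piece of arithmetic. The following is a sketch in plain C under stated assumptions: the instruction count includes floating point loads and stores, each FMA is counted as one instruction but performs two operations, and every other arithmetic instruction performs exactly one operation. The function name and its parameters are hypothetical; real platforms may count differently, so the hardware documentation remains the authority.

```c
#include <assert.h>

/* Hypothetical helper: derive an operation count from raw counts.
 * fp_instructions  all counted floating point instructions
 * fma_instructions how many of those were fused multiply-adds
 * fp_load_store    how many of those were loads or stores
 * Assumes one operation per arithmetic instruction, two per FMA,
 * and none for loads and stores. */
long long fp_operations(long long fp_instructions,
                        long long fma_instructions,
                        long long fp_load_store)
{
    assert(fma_instructions >= 0 && fp_load_store >= 0);
    assert(fma_instructions + fp_load_store <= fp_instructions);
    /* Drop loads and stores, then add one extra operation per FMA. */
    return (fp_instructions - fp_load_store) + fma_instructions;
}
```

With 1000 counted instructions, of which 100 are FMAs and 200 are loads or stores, this gives 900 operations, which shows how the two measures can diverge for the same code.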
84. y describing the project, its motivation, and its architecture.
INSTALLING PAPI
This section provides an installation guide for PAPI. It states the necessary steps in order to install PAPI on the various supported operating systems.
C AND FORTRAN CALLING INTERFACES
This section states the header files in which function calls are defined and the form of the function calls for both the C and Fortran calling interfaces. It also provides a table that shows the relation between certain pseudo-types and Fortran variable types.
EVENTS
This section provides an explanation of events, as well as an explanation of native and preset events. The preset query and translation functions are also discussed in this section. There are code examples using native events, preset query, and preset translation, with the corresponding output.
PAPI COUNTER INTERFACES
This section discusses the high-level and low-level interfaces in detail. The initialization and functions of these interfaces are also discussed. Code examples along with the corresponding output are included as well.
PAPI TIMERS
This section explains the PAPI functions associated with obtaining real and virtual time from the platform's timers. Code examples along with the corresponding output are included as well.
PAPI SYSTEM INFORMATION
This section explains the PAPI functions associated with obtaining hardware and executable information. Code examp