TAU User Guide - Computer and Information Science
Contents
1. ... TAUDB_TRIAL* trial);
   TAUDB_COUNTER_VALUE* taudb_get_counter_value(TAUDB_COUNTER_VALUE* counter_values, TAUDB_COUNTER* counter, TAUDB_THREAD* thread, TAUDB_TIMER_CALLPATH* context, char* timestamp);
   /* get the timer call data for a trial */
   extern TAUDB_TIMER_CALL_DATA* taudb_query_timer_call_data(TAUDB_CONNECTION* connection, TAUDB_TRIAL* trial, TAUDB_TIMER_CALLPATH* timer_callpath, TAUDB_THREAD* thread);
   extern TAUDB_TIMER_CALL_DATA* taudb_query_all_timer_call_data(TAUDB_CONNECTION* connection, TAUDB_TRIAL* trial);
   extern TAUDB_TIMER_CALL_DATA* taudb_query_timer_call_data_stats(TAUDB_CONNECTION* connection, TAUDB_TRIAL* trial, TAUDB_TIMER_CALLPATH* timer_callpath, TAUDB_THREAD* thread);
   extern TAUDB_TIMER_CALL_DATA* taudb_query_all_timer_call_data_stats(TAUDB_CONNECTION* connection, TAUDB_TRIAL* trial);
   extern TAUDB_TIMER_CALL_DATA* taudb_get_timer_call_data_by_id(TAUDB_TIMER_CALL_DATA* timer_call_data, int id);
   extern TAUDB_TIMER_CALL_DATA* taudb_get_timer_call_data_by_key(TAUDB_TIMER_CALL_DATA* timer_call_data, TAUDB_TIMER_CALLPATH* callpath, TAUDB_THREAD* thread, char* timestamp);
   /* get the timer values for a trial */
   extern TAUDB_TIMER_VALUE* taudb_query_timer_values(TAUDB_CONNECTION* connection, TAUDB_TRIAL* trial, TAUDB_TIMER_CALLPATH* timer_callpath, TAUDB_THREAD* thread, TAUDB_METRIC* metric);
   extern TAUDB_TIMER_VALUE* taudb_query_timer_stats(TAUDB_CONNECTION* connection, TAUDB_TRIAL* trial, T
2. [Figure: User Event Window (Application 13, Experiment 23, Trial 57). Name: Message size sent to all nodes; Value Type: Number of Samples; one bar per thread.]
   This display graphs the value that the particular user event had for each thread.
   15.2 Ledgers
   ParaProf has three ledgers that show the functions, groups, and user events.
   15.2.1 Function Ledger
   Figure 15.2. Function Ledger
   [Figure: Function Ledger window listing the functions of the loaded trial.]
3. ... [end of the taudb_api.h listing: lookups of primary and secondary metadata on a trial] #endif /* TAUDB_API_H */
   31.4 TAUdb C API Examples
   31.4.1 Creating a trial and inserting into the database
   #include "taudb_api.h"
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <sys/types.h>
   #include <dirent.h>
   #include "dump_functions.h"

   int main (int argc, char** argv) {
     TAUDB_CONNECTION* connection = NULL;
     if (argc >= 2) {
       connection = taudb_connect_config(argv[1]);
     } else {
       fprintf(stderr, "Please specify a TAUdb config file.\n");
       exit(1);
     }
     printf("Checking connection...\n");
     taudb_check_connection(connection);

     // create a trial
     TAUDB_TRIAL* trial = taudb_create_trials(1);
     trial->name = taudb_strdup("TEST TRIAL");
     // set the data source to "other"
     trial->data_source = taudb_get_data_source_by_id(taudb_query_data_sources(connection), 999);

     // create some metadata
     TAUDB_PRIMARY_METADATA* pm = taudb_create_primary_metadata(1);
     pm->name = taudb_strdup("Application");
     pm->value = taudb_strdup("Test Application");
     taudb_add_primary_metadata_to_trial(trial, pm);

     pm = taudb_create_primary_metadata(1);
     pm->name = taudb_strdup("Start Time");
     pm->value = taudb_strdup("2012-11-07 12:30:00");
     ta
4. [Figure residue: PerfExplorer trial tree with the P_WALL_CLOCK_TIME metric of trial sPPM Frost 16 16 selected.]
   After selecting the metric of interest, select the "Do Clustering" item under the "Analysis" main menu bar item. The following dialog will appear:
   Figure 20.5. Confirm Clustering Options
   [Figure: Confirm Analysis dialog. Analysis method: K Means; Dimension Reduction: none; Normalization: none; Max Clusters: 10; Trial: sPPM Frost 16 16 P_WALL_CLOCK_TIME. Prompt: "Perform clustering with these options?"]
   After you confirm, the clustering will begin. When the clustering results are available, you can view them in the "Cluster Results" tab.
   Figure 20.6. Cluster Results
   [Figure: PerfExplorer Client main window.]
5. [Figure: ParaProf Preferences window. Font controls (Bold, Italic, Size); Window defaults (Units: Microseconds; Show Values as Percent); Settings (Show Path Title in Reverse; Reverse Call Paths; Interpret threads that do not call a given function as a 0 value for statistics computation; Generate data for reverse calltree, which requires lots of memory and does not apply to currently loaded profiles; Show Source Locations); buttons Restore Defaults, Apply, Cancel.]
   The preferences window allows the user to modify the behavior and display style of ParaProf's windows. The font size affects bar height; a sample display is shown in the upper right.
   The Window defaults section determines the initial settings for new windows. You may change the initial units selection and whether values are displayed as percentages or as raw values.
   The Settings section controls the following:
   Show Path Title in Reverse: Path titles are normally shown in normal order (/home/amorris/data/etc). They can be reversed using this option (etc/data/amorris/home). This only affects loaded trials and the titlebars of new windows.
   Reverse Call Paths: This option immediately changes the display of all callpath functions between Root => Leaf and Leaf <= Root.
   Statistics Computation: Turning this option on causes the mean computation to take the sum of values for a function across all threads and divide it by the
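The Statistics Computation setting above changes the denominator of the mean. As a minimal sketch, assuming the truncated sentence continues the way the option label suggests (threads that never call a function either count as 0 or are left out), the two behaviours can be compared as follows. The function thread_mean and its parameters are hypothetical names used for illustration only; they are not part of ParaProf.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a ParaProf function): mean of one function's
 * values across threads. called[i] is nonzero if thread i called the
 * function. With nulls_as_zero set, non-calling threads count as 0 and
 * the sum is divided by all threads; otherwise the sum is divided only
 * by the number of threads that called the function.
 */
double thread_mean(const double *values, const int *called,
                   size_t nthreads, int nulls_as_zero)
{
    assert(values != NULL && called != NULL);
    double sum = 0.0;
    size_t callers = 0;
    for (size_t i = 0; i < nthreads; i++) {
        if (called[i]) {
            sum += values[i];
            callers++;
        }
    }
    size_t divisor = nulls_as_zero ? nthreads : callers;
    /* No threads to average over: report 0 rather than divide by zero. */
    return divisor == 0 ? 0.0 : sum / (double)divisor;
}
```

For four threads where only two call the function, with values 4 and 8, the mean is 3 when non-calling threads count as 0 and 6 when they are ignored.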
6. setenv TAU_METRICS PAPI_FP_OPS\:PAPI_L1_DCM
   In addition to PAPI counters, we support TIME (via unix gettimeofday). On Linux and CrayCNL systems, we provide the high resolution LINUXTIMERS metric, and on BGL/BGP systems we provide BGLTIMERS and BGPTIMERS.
   Chapter 3. Tracing
   Typically, profiling shows the distribution of execution time across routines. It can show the code locations associated with specific bottlenecks, but it cannot show the temporal aspect of performance variations. Tracing the execution of a parallel program shows when and where an event occurred, in terms of the process that executed it and the location in the source code. This chapter discusses how TAU can be used to generate event traces.
   3.1 Generating Event Traces
   To enable tracing with TAU, set the environment variable TAU_TRACE to 1. Similarly, you can enable or disable profiling with the TAU_PROFILE variable. Just like with profiling, you can set the output directory with an environment variable:
   setenv TRACEDIR /users/sameer/tracedata/experiment56
   This will generate a trace file and an event file for each processor. To merge these files, use the tau_treemerge.pl script. If you want to convert a TAU trace file into another format, use the tau2otf, tau2vtf, or tau2slog2 scripts.
   Chapter 4. Analyzing Parallel Applications
   4.1 Text summary
   For a quick view summary of TAU performance, use pprof. It reads and prints a summa
7. ... [residue of foreign key clauses: REFERENCES trial(id) and metric(id), ON DELETE / ON UPDATE CASCADE, NOT NULL column constraints] ...
   ... taudb_view (id) ... operator ... value ...
   The application and experiment columns are not used in the ...
   CREATE TABLE analysis_result (
       id SERIAL NOT NULL PRIMARY KEY,
       analysis_settings INTEGER NOT NULL,
       description VARCHAR(255) NOT NULL,
       thumbnail_size INTEGER NULL,
       image_size INTEGER NULL,
       thumbnail BYTEA NULL,
       image BYTEA NULL,
       result_type INTEGER NOT NULL
   );
   Performance indexes
   CREATE INDEX trial_name_index on trial(name);
   CREATE INDEX thread_trial on thread(trial);
   CREATE INDEX timer_name_index on timer(name);
   CREATE INDEX timer_callpath_parent on timer_callpath(parent);
   CREATE INDEX timer_call_data_t
8. % setenv TAU_VERBOSE 1
   % setenv PROFILEDIR /home/sameer/profiledata/experiment55
   % mpirun -np 4 matrix
   Other environment variables you can set to enable these advanced MPI measurement features are TAU_TRACK_MESSAGE, to track MPI message statistics when profiling or message lines when tracing, and TAU_COMM_MATRIX, to generate MPI communication matrix data.
   2.2 Reducing Performance Overhead with TAU_THROTTLE
   TAU automatically throttles short-running functions in an effort to reduce the amount of overhead associated with profiles of such functions. This feature may be turned off by setting the environment variable TAU_THROTTLE to 0. The default rule TAU uses to determine which functions to throttle is numcalls > 100000 && usecs/call < 10, which means that if a function executes more than 100000 times and has an inclusive time per call of less than 10 microseconds, then profiling of that function will be disabled after that threshold is reached. To change the values of numcalls and usecs/call, the user may optionally set environment variables:
   % setenv TAU_THROTTLE_NUMCALLS 2000000
   % setenv TAU_THROTTLE_PERCALL 5
   This changes the values to 2 million calls and 5 microseconds per call. Functions that are throttled are marked explicitly in their names as THROTTLED.
   2.3 Profiling each event callpath
   You can enable callpath profiling by setting the environment variable TAU_CA
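The throttling rule described above can be written out as a small predicate. This is a sketch of the rule as stated in the text, not TAU's internal implementation; should_throttle, its parameters, and the two DEFAULT_ macros are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Default thresholds quoted in the text (illustrative macro names). */
#define DEFAULT_THROTTLE_NUMCALLS 100000L
#define DEFAULT_THROTTLE_PERCALL  10.0   /* microseconds per call */

/*
 * Hypothetical helper (not a TAU function): returns true when a timer
 * satisfies the rule numcalls > limit && usecs/call < limit.
 */
bool should_throttle(long numcalls, double inclusive_usecs,
                     long numcalls_limit, double percall_limit)
{
    assert(numcalls >= 0 && inclusive_usecs >= 0.0);
    if (numcalls == 0) {
        return false;  /* never called, so there is nothing to throttle */
    }
    double usecs_per_call = inclusive_usecs / (double)numcalls;
    return numcalls > numcalls_limit && usecs_per_call < percall_limit;
}
```

With the defaults, a function called 200000 times for a total of 400000 microseconds (2 microseconds per call) qualifies. With the limits raised to 2000000 calls and 5 microseconds, as in the setenv example, the same function does not.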
9. Chapter 23. Custom Charts
   In addition to the default charts available in the charts menu, there is a custom chart interface. To access the interface, select the "Custom Charts" tab in the results pane of the main window, as shown:
   Figure 23.1. The Custom Charts Interface
   [Figure: TAU PerfExplorer Client with the Custom Charts tab selected for the "GTC compiler options" data. The control panel shows the options Main Only, Call Paths, Log Y, Scalability, Efficiency, Strong Scaling, Horizontal and Show Y Axis Zero, and the fields Chart Title, Series Name/Value, X Axis Value, X Axis Name, Y Axis Value, Y Axis Name, Dimension reduction, Cutoff, Metric and Units.]
10. /* actual data from the database */
    ...
    char* name;
    struct taudb_data_source* data_source;
    int node_count;          /* i.e. number of processes */
    int contexts_per_node;   /* rarely used, usually 1 */
    int threads_per_context; /* max number of threads per process (can be less on individual processes) */
    int total_threads;       /* total number of threads */
    /* arrays of data for this trial */
    struct taudb_metric* metrics_by_id;
    struct taudb_metric* metrics_by_name;
    struct taudb_thread* threads;
    struct taudb_time_range* time_ranges;
    struct taudb_timer* timers_by_id;
    struct taudb_timer* timers_by_name;
    struct taudb_timer_group* timer_groups;
    struct taudb_timer_callpath* timer_callpaths_by_id;
    struct taudb_timer_callpath* timer_callpaths_by_name;
    struct taudb_timer_call_data* timer_call_data_by_id;
    struct taudb_timer_call_data* timer_call_data_by_key;
    struct taudb_counter* counters_by_id;
    struct taudb_counter* counters_by_name;
    struct taudb_counter_value* counter_values;
    struct taudb_primary_metadata* primary_metadata;
    struct taudb_secondary_metadata* secondary_metadata;
    struct taudb_secondary_metadata* secondary_metadata_by_key;
    } TAUDB_TRIAL;
    /***** data dimensions *****/
    /* thread represents one physical & logical location for a measurement */
    typedef struct taudb_thread {
    int
11. ... Display this help message
    The following options will run only from the console (no GUI will launch):
      --merge <file.gz>        Merges snapshot profiles
      --pack <file>            Pack the data into packed (.ppk) format
      --dump                   Dump profile data to TAU profile format
      --dumprank <rank>        Dump profile data for <rank> to TAU profile format
      -v, --dumpsummary        Dump derived statistical data to TAU profile format
      --overwrite              Allow overwriting of profiles
      -o, --oss                Print profile data in OSS style text output
      -q, --dumpmpisummary     Print high level time and communication summary
      -d, --metadump           Print profile metadata (works with --dumpmpisummary)
      -x, --suppressmetrics    Exclude child calls and exclusive time from --dumpmpisummary
      -s, --summary            Print only summary statistics (only applies to OSS output)
    Notes: For the TAU profiles type, you can specify either a specific set of profile files on the commandline, or you can specify a directory (by default the current directory). The specified directory will be searched for profile.*.*.* files, or, in the case of multiple counters, directories named MULTI_* containing profile data.
    7.2 Supported Formats
    ParaProf can load profile data from many sources. The types currently supported are:
    TAU Profiles (profiles): Output from the TAU measurement library. These files generally take the form of profile.X.X.X, one for each
12. restore the options back to the default values.
    When the chart is generated, it can be saved as a vector image by selecting File -> Save As Vector Image. The chart can also be saved as a PNG by right clicking on the chart and selecting Save As.
    Chapter 24. Visualization
    Under the "Visualization" main menu item, there are five types of raw data visualization: "3D Visualization", "Data Summary", "Create Boxchart", "Create Histogram" and "Create Normal Probability Chart". For the Boxchart, Histogram and Normal Probability Charts, you can either select one metric in the trial (which selects all events by default) or expand the metric and select events of interest.
    24.1 3D Visualization
    When "3D Visualization" is requested, PerfExplorer examines the data to try to determine the most interesting events in the trial. That is, for the selected metric in the selected trial, the database calculates the weighted relative variance for each event across all threads of execution in order to find the top four "significant" events. These events are selected by calculating:
    stddev(exclusive) / (max(exclusive) - min(exclusive)) * exclusive_percentage
    After the top four events are selected, they are graphed in an OpenGL window.
    Figure 24.1. 3D Visualization of multivariate data
    [Figure: scatter plot titled "Correlation of top 4 variant events".]
    24.2 Data Summary
    In order to see
13. [Figure: TAU PerfExplorer Normal Probability Plot. X axis: Normal N(0,1) Ordered Statistic Medians; Y axis: Ordered Measurements; series DIFUZE, DINTRF, INTERF and SPPM, plus the ideal normal line.]
    Chapter 25. Views
    Often, data is loaded into the database with multiple parametric cross-sections. The charts available in PerfExplorer are primarily designed for scalability analysis; however, data might be loaded as a parametric study. In the following example, the data has been loaded with three problem sizes: MIN, HALF and FULL.
    Figure 25.1. Potential scalability data organized as a parametric study
    [Figure: PerfExplorer Client trial tree in which each experiment holds FULL, HALF and MIN trials (for example FULL64, HALF64 and MIN64), next to a metadata table with fields such as machine_type, arch, system, os and compiler.]
14. ... 16, 'Google profiles (Google)');
    INSERT INTO data_source (name, id, description) VALUES ('Cube3', 17, 'Cube 3D profiles (FZJ)');
    INSERT INTO data_source (name, id, description) VALUES ('Gyro', 100, 'Self-timing profiles from Gyro application');
    INSERT INTO data_source (name, id, description) VALUES ('GAMESS', 101, 'Self-timing profiles from GAMESS application');
    INSERT INTO data_source (name, id, description) VALUES ('Other', 999, 'Other profiles');
    /* threads make it convenient to identify timer values.
       Special values for thread_index:
       -1 mean (nulls ignored)
       -2 total
       -3 stddev (nulls ignored)
       -4 min
       -5 max
       -6 mean (nulls are 0 value)
       -7 stddev (nulls are 0 value)
    */
    CREATE TABLE derived_thread_type (
        id INT NOT NULL,
        name VARCHAR NOT NULL,
        description VARCHAR NOT NULL
    );
    INSERT INTO derived_thread_type (id, name, description) VALUES (-1, 'MEAN', 'MEAN (nulls ignored)');
    INSERT INTO derived_thread_type (id, name, description) VALUES (-2, 'TOTAL', 'TOTAL');
    INSERT INTO derived_thread_type (id, name, description) VALUES (-3, 'STDDEV', 'STDDEV (nulls ignored)');
    INSERT INTO derived_thread_type (id, name, description) VALUES (-4, 'MIN', 'MIN');
    INSERT INTO derived_thr
15. [Figure residue: user event bar chart for one thread, listing "Heap Memory (KB)" events for routines such as the LOGFILE routines, MPI_Waitall(), MPI_Irecv(), MPI_Allreduce(), MPI_Barrier() and MPI_Finalize(), with values from about 2047.9 down to 2023.3.]
    This display shows a particular thread's user defined event statistics as a bar chart. This is the same data as in Section 11.6, "User Event Statistics Window", in graphical form.
    Chapter 12. Function Based Displays
    ParaProf has two displays for showing a single function across all threads of execution. Th
16. 3. If the C API is desired, a working C compiler is required, along with the following libraries: libpq (PostgreSQL libraries), libxml2, libz and libuuid. These libraries are all commonly installed by default on *NIX systems.
    28.2 Installation
    The TAUdb utilities and applications are installed as part of the standard TAU release. Shell scripts are installed in the TAU bin directory to configure and run the utilities. It is assumed that the user has installed TAU and run TAU's configure and 'make install'.
    1. (Optionally) Create a database. This step will depend on the user's chosen DBMS.
       • H2: Because it is an embedded, file-based DBMS, H2 does not require creating the database before configuring TAUdb. TAUdb takes advantage of the auto-server capabilities in H2, so multiple clients can connect to the same database at the same time. Users should use the H2 DBMS if they expect to maintain a small to moderate local repository of performance data and want the convenience of connecting to the database from multiple clients.
       • Derby: Because it is an embedded, file-based DBMS, Derby does not require creating the database before configuring TAUdb. Be advised that the Derby DBMS does not allow multiple clients to connect to the same database. For that reason, we suggest using the H2 DBMS if a file-based database is desired. Derby support is maintained for backwards compatibility.
       • PostgreSQL:
         createdb -O taudb taudb
         Or, from psql: psq
17. [Figure residue: PerfExplorer trial tree and correlation thumbnails.]
    There are a number of images in the "Correlation Results" window. Each thumbnail represents a pairwise correlation plot of two events. Clicking on a thumbnail image in the main window will bring up the images, as shown below:
    Figure 21.5. Correlation Example
    [Figure: Correlation Results scatter plot with r = 0.9464011339119169, showing DIFUZE with a fitted linear regression line and a fitted power regression line.]
    Chapter 22. Charts
    22.1 Setting Parameters
    There are a few parameters which need to be set when doing comparisons between trials in the database. If any necessary setting is not configured before requesting a chart, you will be prompted to set the value. The following settings may be necessary for the various charts available.
    22.1.1 Group of Interest
    TAU events are often associated with common groups, such as "MPI", "TRANSPOSE", etc. This value is used to show what fraction of the total runtime this group of events contributed.
    Figure 22.1. Setting Group of Interest
    [Figure: "Group of interest" dialog with the prompt "Please enter the group of interest".]
18. ... Generator
    16. Preferences
        16.1. Preferences Window
        16.2. Default Colors
        16.3. Color Map
    III. PerfExplorer User's Manual
    17. ...
    18. Installation and Configuration
        18.1. Available configuration options
    19. Running PerfExplorer
    20. Cluster Analysis
        20.1. Dimension Reduction
        20.2. Max Number of Clusters
        20.3. Performing Cluster Analysis
    21. Correlation Analysis
        21.1. Dimension Reduction
        21.2. Performing Correlation Analysis
    22. Charts
        22.1. Setting Parameters
            22.1.1. Group of Interest
            22.1.2. Metric of Interest
            22.1.3. Event of Interest
            22.1.4. Total Number of Timesteps
        22.2. Standard Chart Types
            22.2.1. Timesteps Per Second
            22.2.2. Relative Efficiency
            22.2
19. [Figure: phase profile window for the phase "main"; Metric: Time, Value: Exclusive; bars for std. dev., mean, and n,c,t 0,0,0 through 11,0,0.]
    To access other phases, either right click on the phase and select "Open Profile for this Phase", or go to the Phase Ledger and select it there.
    Figure 13.2. Phase Ledger
    [Figure: Phase Ledger window listing Iteration 0 through Iteration 4 and main.]
    ParaProf can also display a particular function's value across all of the phases. To do so, right click on a function and select "Show Function Data over Phases".
    Figure 13.3. Function Data over Phases
    [Figure: Function Data window for n,c,t 0,0,0; Metric Name: Time, Value: Exclusive, Units: seconds; one bar per phase (Iteration 0 through Iteration 4), each about 1.002.]
    Because
20. So, to create a View for all trials that have fewer than 16 threads, select "total_threads", "read as a string", "is less than", "16". Then click "Save" and give the View a name. The "Edit" context menu option on an existing view lets you view and alter the view's criteria in the same interface.
    Chapter 9. Profile Data Management
    ParaProf uses PerfDMF to manage profile data. This enables it to read the various profile formats as well as store and retrieve them from a database.
    9.1 ParaProf Manager Window
    Upon launching ParaProf, the user is greeted with the ParaProf Manager Window.
    Figure 9.1. ParaProf Manager Window
    [Figure: ParaProf Manager. The tree shows Standard Applications (Default App > Default Exp > uintah16.ppk/packed/data with metrics PAPI_FP_INS, P_WALL_CLOCK_TIME, PAPI_TOT_CYC, PAPI_L1_DCM and P_VIRTUAL_TIME), Runtime Applications and DB Applications. The right pane lists fields for the AORSA2D application: Name, Application ID, version, description, language, paradigm, usage_text, execution_options and userdata.]
    This window is used to manage profile data. The user can upload d
21. /* ... intermediate, phases (dynamic or static), or sample aggregations ... or created by a post-processing tool */
    typedef struct taudb_timer {
      int id;                  /* database value, also key to hash */
      struct taudb_trial* trial; /* pointer back to trial - NOTE: Necessary? */
      char* name;              /* the full timer name, can have file, line, etc. */
      char* short_name;        /* just the function name, for example */
      char* source_file;       /* what source file does this function live in? */
      int line_number;         /* what line does the timer start on? */
      int line_number_end;     /* what line does the timer end on? */
      int column_number;       /* what column number does the timer start on? */
      int column_number_end;   /* what column number does the timer end on? */
      struct taudb_timer_group* groups;         /* hash of groups, using group_hash_handle hh2 */
      struct taudb_timer_parameter* parameters; /* array of parameters */
      UT_hash_handle trial_hash_by_id;   /* hash key for id lookup */
      UT_hash_handle trial_hash_by_name; /* hash key for name lookup in temporary hash */
      UT_hash_handle group_hash_by_name; /* hash key for name lookup in timer group */
    } TAUDB_TIMER;
    /***** timer related structures *****/
    /* timer groups are the groups such as tau_default, mpi, openmp, tau_phase, tau_callpath, tau_param, etc. This mapping table allows for nx
22. the speedup or efficiency chart will be interpreted as a strong scaling study (the workload is the same for all trials). When selected, the button will change to "Weak Scaling", and the chart will be interpreted as a weak scaling study (the workload is proportional to the total number of threads in each trial).
    • Horizontal: when selected, the chart X and Y axes will be swapped.
    • Show Y-Axis Zero: when selected, the chart will include the value 0. When deselected, the chart will only show the relevant values for all data points.
    • Chart Title: value to use for the chart title.
    • Series Name/Value: the field to be used to group the data points as a series.
    • X Axis Value: the field to use as the X axis value.
    • X Axis Name: the name to put in the chart for the value along the X axis.
    • Y Axis Value: the field to use as the Y axis value.
    • Y Axis Name: the name to put in the chart for the value along the Y axis.
    • Dimension Reduction: whether or not to use dimension reduction. This is only applicable when "Main Only" is disabled.
    • Cutoff: when "Dimension Reduction" is enabled, the cutoff value for selecting "All Events".
    • Metric: the metric of interest for the Y axis.
    • Units: the unit to be selected for the Y axis.
    • Event: the event of interest, or "All Events".
    • XML Field: when the X or Y axis is selected to be an XML field, this is the field of interest.
    • Apply: build the chart.
    • Reset
23. [Figure residue: thread labels n,c,t 12,0,0 through 15,0,0.]
    This window shows each thread, as well as statistics, as a combined bar graph. Each function is represented by a different color (though possibly cycled). From anywhere in ParaProf, you can right-click on objects representing threads or functions to launch displays associated with those objects. For example, in Figure 9.4, "Main Data Window", right click on the text "n,c,t 8,0,0" to launch thread based displays for node 8.
    Figure 9.5. Unstacked Bars
    [Figure: ParaProf main window for uintah16.ppk/packed/data; Metric: P_WALL_CLOCK_TIME, Value: Exclusive; unstacked bars for std. dev., mean, and n,c,t 0,0,0 through 15,0,0.]
    You may also turn off the stacking of bars so that individual functions can be compared across threads in a global display.
    Chapter 10. 3-D Visualization
    ParaProf displays massive parallel profiles through the use of OpenGL hardware acceleration in the 3D Visualization window. Each window is fully configurable with rotation, translation, and zooming capabilities. Rotation is accomplished by holding the left mouse button down and dragging the mouse. Translat
24. ...
    5. Quick Reference
    6. Some Common Application Scenario
        6.1. Q. What routines account for the most time? How much?
        6.2. Q. What loops account for the most time? How much?
        6.3. Q. What MFlops am I getting in all loops?
        6.4. Q. Who calls MPI_Barrier? Where?
        6.5. Q. How do I instrument Python Code?
        6.6. Q. What happens in my code at a given time?
        6.7. Q. How does my application scale?
    II. ParaProf User's Manual
    7. Introduction
        7.1. Using ParaProf from the command line
        7.2. Supported Formats
        7.3. Command line options
    8. Views and Sub-Views
        8.1. To Create a Sub-View
    9. Profile Data Management
        9.1. ParaProf Manager Window
        9.2. Loading ...
        9.3. Database Interaction
        9.4. Creating Derived Metrics
        9.5. Main Data Window
    10. 3-D Visualization
        10.1. Triangle Mesh Plot
        10.2. 3-D Bar
25. [Figure residue: numeric columns from the Data Summary table.]
    24.3 Creating a Boxchart
    In order to see a boxchart summary of the performance data in the database, select the "Create Boxchart" item under the "Visualization" main menu item.
    Figure 24.3. Boxchart
    [Figure: TAU PerfExplorer, "Significant (>2.0% of runtime) Event IQR Boxplot with Outliers", for DIFUZE, DINTRF, INTERF and SPPM.]
    24.4 Creating a Histogram
    In order to see a histogram summary of the performance data in the database, select the "Create Histogram" item under the "Visualization" main menu item.
    Figure 24.4. Histogram
    [Figure: TAU PerfExplorer, "Significant (>2.0% of runtime) Event Histograms"; X axis: Percentiles; series DIFUZE, DINTRF, INTERF and SPPM.]
    24.5 Creating a Normal Probability Chart
    In order to see a normal probability summary of the performance data in the database, select the "Create NormalProbability" item under the "Visualization" main menu item.
    Figure 24.5. Normal Probability
26. ... x86_64/lib/bindings-icpc-python-mpi-pdt-pgi ...
    % mpirun -np 4 <dir>/pyMPI-2.4b4-TAU/bin/pyMPI ./wrapper.py
    (Instrumented pyMPI with wrapper.py)
    6.6 Q. What happens in my code at a given time?
    A. Create an event trace.
    Figure 6.5. Tracing with Vampir
    How to create a trace:
    % setenv TAU_MAKEFILE /opt/apps/tau/tau2/x86_64/lib/Makefile.tau-mpi-pdt-pgi
    % set path=(/opt/apps/tau/tau2/x86_64/bin $path)
    % make F90=tau_f90.sh
    (Or edit Makefile and change F90=tau_f90.sh)
    % setenv TAU_TRACE 1
    % qsub run.job
    % tau_treemerge.pl
    (merges binary traces to create tau.trc and tau.edf files)
    JUMPSHOT:
    % tau2slog2 tau.trc tau.edf -o app.slog2
    % jumpshot app.slog2
    OR
    VAMPIR:
    % tau2otf tau.trc tau.edf app.otf -n 4 -z
    (4 streams, compressed output trace)
    % vampir app.otf
    (or vng client with vngd server)
    6.7 Q. How does my application scale?
    A. Examine profiles in PerfExplorer.
    Figure 6.6. Scalability chart
    [Figure: TAU PerfExplorer "Relative Speedup" chart for the S3D Jaguar (ORNL) Harness Scaling Study, metric GET_TIME_OF_DAY; both axes run from 0 to 12,000.]
27. [Screenshot rows: callpath entries such as MPI_Recv(), MPI_Send(), MPI_Allreduce() and Sweep(...), with time and call-count columns]
This display shows the callpath data in a table. Each callpath can be traced from root to leaf by opening each node in the tree view. A colorscale immediately draws attention to "hot spots", the areas that contain the highest values.
Figure 11.4. Thread Statistics Table
[Screenshot: Thread Statistics window for n,c,t 0,0,0 with columns Name, Calls and Child Calls]
28. [Screenshot: PerfExplorer tree listing applications such as AVUS, BigScience, CFDSHIP, LAMMPS, Miranda, POP, SHAMRC, SMG2000, SPhot, Uintah, WRF, gyro and sPPM, with the Analysis Management, Cluster Results and Correlation Results tabs]
There are a number of images in the "Cluster Results" window. From left to right, the windows indicate the cluster membership histogram, a PCA scatterplot showing the cluster memberships, a virtual topology of the parallel machine, the minimum values for each event in each cluster, the average values for each event in each cluster, and the maximum values for each event in each cluster. Clicking on a thumbnail image in the main window will bring up the images, as shown below.
Figure 20.7. Cluster Membership Histogram
[Chart: threads in cluster versus cluster number]
Figure 20.8. Cluster Membership Scatterplot
29. 22.1.2 Metric of Interest
Profiles may contain many metrics gathered for a single trial. This selects which of the available metrics the user is interested in.
Figure 22.2. Setting Metric of Interest
[Dialog: "Please enter the metric of interest", for example P_WALL_CLOCK_TIME]
22.1.3 Event of Interest
Some charts examine events in isolation. This setting configures which event to examine.
Figure 22.3. Setting Event of Interest
[Dialog: "Please enter the event of interest", with a list of events]
22.1.4 Total Number of Timesteps
One chart, the Timesteps per second chart, will calculate the number of timesteps completed per second. This setting configures that value.
Figure 22.4. Setting Timesteps
[Dialog: enter the total number of timesteps for the experiment]
22.2 Standard Chart Types
22.2.1 Timesteps Per Second
The Timesteps Per Second chart shows how an application scales as it relates to time-to-solution. If the timesteps are not already set, you will be prompted to enter the total number of timesteps in the trial (see Section 22.1.4, Total Number of Timesteps). If there is more than one metric to choose from, you may
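As a rough illustration of what the Timesteps Per Second chart plots, the sketch below divides the configured total number of timesteps by the wall-clock time of each trial. The function name and the sample timings are hypothetical and are not part of PerfExplorer.

```python
def timesteps_per_second(total_timesteps, wall_clock_seconds):
    """Return the number of timesteps completed per second for one trial."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return total_timesteps / float(wall_clock_seconds)


# Hypothetical trials: processor count -> wall-clock seconds for 100 timesteps.
trial_seconds = {16: 50.0, 32: 25.0, 64: 20.0}
rates = {
    procs: timesteps_per_second(100, seconds)
    for procs, seconds in sorted(trial_seconds.items())
}
```

A rate that stops growing as processors are added, as between 32 and 64 processors in this made-up data, is the kind of flattening the chart is meant to expose.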
30. Comparison Window (2 trials)
[Screenshot: Comparison Window, metric Time, value Exclusive, units seconds, comparing the mean values of two trials per function, for example MPI_Wait() and exchange_1]
Figure 14.3. Comparison Window (3 threads)
[Screenshot: Comparison Window, metric Time, value Exclusive, units seconds, comparing two trial means and thread n,c,t 0,0,0 per function]
Chapter 15. Miscellaneous Displays
15.1 User Event Bar Graph
In addition to displaying the text statistics for User Defined Events, ParaProf can also graph a particular User Event across all threads.
Figure 15.1. User Event Bar Graph
31. 3-D Scatter Plot
Figure 10.3. 3-D Scatter Plot
[Screenshot: ParaProf Visualizer with the Scatter Plot view selected; Width, Depth, Height and Color axes are each assigned a function and a metric (Exclusive Time), with ScatterPlot, Axes, ColorScale and Render tabs]
This visualization method plots the value of each thread along up to 4 axes (each a different function/metric). This view allows you to discern clustering of values and relationships between functions across threads. Select functions using the button for each dimension, then select a metric. A single function across 4 metrics could be used, for example.
10.4 3-D Topology Plot
Figure 10.4. 3-D Topology Plot
[Screenshot: ParaProf Visualizer with the Topology Plot view selected, showing Layout and Events tabs, X/Y/Z axis ranges and a Custom topology]
In this visualization, you can either define the layout with a MESP topology definition file or you can fill a rectangular prism of user defi
32. ... Plot
10.3 3-D Scatter Plot
10.4 3-D Topology Plot
10.5 3-D Communication Matrix
11 Thread Based Displays
11.1 Thread Bar Graph
11.2 Thread Statistics Text Window
11.3 Thread Statistics Table
11.4 Call Graph Window
11.5 Thread Call Path Relations Window
11.6 User Event Statistics Window
11.7 User Event Thread Bar Chart
12 Function Based Displays
12.1 Function Bar Graph
12.2 Function Histogram
13 Phase Based Displays
13.1 Using Phase Based Displays
14 Comparative Analysis
14.1 Using Comparative Analysis
15 Miscellaneous Displays
15.1 User Event Bar Graph
15.2 Ledgers
15.2.1 Function Ledger
15.2.2 Group Ledger
15.2.3 User Event Ledger
15.3 Selective Instrumentation File
33. Speedup for One Event chart shows how one event in an application scales with respect to relative speedup. That is, as the number of processors increases by a factor, the speedup is expected to increase by the same factor, with ideal scaling. The ideal speedup is charted along with the actual speedup for the application. If there is more than one event to choose from, and you have not yet selected an event of interest, you may be prompted to select the event of interest (see Section 22.1.3, Event of Interest). If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the "Charts" main menu item.
Figure 22.11. Relative Speedup for One Event
[Chart: relative speedup of one event versus number of processors, with the ideal line]
22.2.8 Group % of Total Runtime
The Group % of Total Runtime chart shows how the fraction of the total runtime for one group of events changes as the number of processors increases. If there is more than one group to choose from, and you have not yet selected a group of interest, you may be prompted to select the group of interest (see Section 22.1.1, Group of Interest). If
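The relationship between actual and ideal speedup described above can be sketched as follows. This is a minimal illustration that assumes relative speedup is measured against the trial with the fewest processors; the function name and timings are hypothetical, and PerfExplorer's own computation may differ in detail.

```python
def relative_speedup(times_by_procs):
    """Return (processors, actual speedup, ideal speedup) rows.

    Speedup is taken relative to the smallest processor count present.
    Ideal speedup grows by the same factor as the processor count.
    """
    base_procs = min(times_by_procs)
    base_time = times_by_procs[base_procs]
    rows = []
    for procs in sorted(times_by_procs):
        actual = base_time / times_by_procs[procs]
        ideal = procs / float(base_procs)
        rows.append((procs, actual, ideal))
    return rows


# Hypothetical inclusive time (seconds) of one event at each processor count.
event_times = {16: 8.0, 32: 4.0, 64: 2.5}
speedup_rows = relative_speedup(event_times)
```

In this made-up data the event scales ideally from 16 to 32 processors and then falls below the ideal line at 64.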
34. ... Total Runtime
22.13 Runtime Breakdown
22.14 Relative Efficiency per Phase
22.15 Relative Speedup per Phase
22.16 Phase Fraction of Total Runtime
23.1 The Custom Charts Interface
24.1 3D Visualization of multivariate data
24.2 Data Summary Window
24.3 Boxchart
24.4 Histogram
24.5 Normal Probability
25.1 Potential scalability data organized as a parametric study
25.2 Selecting a table
25.3 Selecting a column
25.4 Selecting an operator
25.5 Selecting a ...
25.6 Entering a name for the view
25.7 The completed view
25.8 Selecting the base view
25.9 Completed sub-views
List of Tables
1.1 Different methods of instrumenting applications
Preface
TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in F
35. ... databases)
29.2 perfdmf_createexp (deprecated - only supported for older PerfDMF databases)
29.3 taudb_loadtrial
29.4 TAUdb Views
30 Database Schema
30.1 SQL for TAUdb
31 TAUdb C API
31.1 TAUdb C API Overview
31.2 TAUdb C Structures
31.3 TAUdb C API
31.4 TAUdb C API Examples
31.4.1 Creating a trial and inserting into the database
31.4.2 Querying a trial from the database
Chapter 28. Introduction
TAUdb (TAU Database), formerly known as PerfDMF (Performance Data Management Framework), is an API/Toolkit that sits atop a DBMS to manage and analyze performance data. The API is available in its native Java form as well as C.
28.1 Prerequisites
1. A supported Database Management System (DBMS). TAUdb currently supports PostgreSQL, MySQL, Oracle, H2 and Derby. For use with the C API, only PostgreSQL is supported (SQLite support is currently being evaluated). Because they are Java-only, H2 and Derby can NOT be accessed with the C API.
2. Java 1.5
36. each metric you wish papi to profile, use:
  % papi/utils/papi_event_chooser PAPI_LD_INS PAPI_SR_INS PAPI_L1_DCM
  Test case eventChooser: Available events which can be added with given events.
  Vendor string and code   : GenuineIntel (1)
  Model string and code    : Itanium 2 (2)
  CPU Revision             : 1.000000
  CPU Megahertz            : 1500.000000
  CPU's in this Node       : 16
  Nodes in this System     : 1
  Total CPU's              : 16
  Number Hardware Counters : 4
  Max Multiplex Counters   : 32
  Event PAPI_L1_DCM can't be counted with others
Here the event chooser tells us that PAPI_LD_INS, PAPI_SR_INS, and PAPI_L1_DCM are incompatible metrics. Let's try again, this time removing PAPI_L1_DCM:
  % papi/utils/papi_event_chooser PAPI_LD_INS PAPI_SR_INS
  Test case eventChooser: Available events which can be added with given events.
  Vendor string and code   : GenuineIntel (1)
  Model string and code    : Itanium 2 (2)
  CPU Revision             : 1.000000
  CPU Megahertz            : 1500.000000
  CPU's in this Node       : 16
  Nodes in this System     : 1
  Total CPU's              : 16
  Number Hardware Counters : 4
  Max Multiplex Counters   : 32
  Usage: eventChooser NATIVE|PRESET evt1 evet2 ...
Here the event chooser verifies that PAPI_LD_INS and PAPI_SR_INS are compatible metrics. Next, make sure that you are using a makefile with papi in its name. Then set the environment variable TAU_METRICS to a colon-delimited list of PAPI metrics you would like to use
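The colon-delimited TAU_METRICS value mentioned above can be assembled programmatically, for example from a job-launch script. The sketch below is only an illustration: the helper name is hypothetical, and the environment copy it builds affects only child processes that are started with it.

```python
import os


def tau_metrics_value(metrics):
    """Join metric names into the colon-delimited form used by TAU_METRICS."""
    cleaned = [name.strip() for name in metrics if name.strip()]
    if not cleaned:
        raise ValueError("at least one metric name is required")
    return ":".join(cleaned)


# Metrics the event chooser reported as compatible in the example above.
compatible = ["PAPI_LD_INS", "PAPI_SR_INS"]

# Copy the current environment and add TAU_METRICS for a child process.
launch_env = dict(os.environ)
launch_env["TAU_METRICS"] = tau_metrics_value(compatible)
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when starting the instrumented application.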
37. /* ... for one of the threads. */
typedef struct taudb_call_data_key {
  struct taudb_timer_callpath *timer_callpath; /* link back to database */
  struct taudb_thread *thread;                 /* link back to database, roundabout way */
  char *timestamp;                             /* timestamp in case we are in a snapshot or something */
} TAUDB_TIMER_CALL_DATA_KEY;

typedef struct taudb_timer_call_data {
  int id;                                      /* link back to database */
  TAUDB_TIMER_CALL_DATA_KEY key;               /* hash table key */
  int calls;                                   /* number of times this timer was seen */
  int subroutines;                             /* number of timers this timer calls */
  struct taudb_timer_value *timer_values;
  UT_hash_handle hh1;
  UT_hash_handle hh2;
} TAUDB_TIMER_CALL_DATA;

/* finally, timer_values are specific measurements during one of the
 * observations of the node of the callgraph on a thread. */
typedef struct taudb_timer_value {
  struct taudb_metric *metric;   /* which metric is this? */
  double inclusive;              /* the inclusive value of this metric */
  double exclusive;              /* the exclusive value of this metric */
  double inclusive_percentage;   /* the inclusive percentage of total time of the application */
  double exclusive_percentage;   /* the exclusive percentage of total time of the application */
  double sum_exclusive_squared;  /* how much variance did we see every time we measured this timer? */
  char *key;                     /* hash table key - metric name */
  UT_hash_handle hh;
} TAUDB_TIMER_VA
38. -- This table would have two entries, one for the x value and one for the y value.
CREATE TABLE timer_parameter (
  timer INT,
  parameter_name VARCHAR NOT NULL,
  parameter_value VARCHAR NOT NULL,
  FOREIGN KEY(timer) REFERENCES timer(id) ON DELETE NO ACTION ON UPDATE NO ACTION
);

-- timer_callpath have the information about the call graph in a trial.
-- If the profile is "flat", these will all have no parents. Otherwise,
-- the parent points to a node in the callgraph, the calling timer (function).
CREATE TABLE timer_callpath (
  id SERIAL NOT NULL PRIMARY KEY,
  -- what timer is this?
  timer INT NOT NULL,
  -- what is the parent timer?
  parent INT,
  FOREIGN KEY(timer) REFERENCES timer(id) ON DELETE NO ACTION ON UPDATE NO ACTION,
  FOREIGN KEY(parent) REFERENCES timer_callpath(id) ON DELETE NO ACTION ON UPDATE NO ACTION
);

-- By definition, profiles have no time data. However, there are a few
-- examples where time ranges make sense, such as tracking call stacks
-- or associating metadata to a particular phase. The time_range table
-- is used to give other measurements a time context. The
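The parent column of timer_callpath forms a chain from any node back to a root. A minimal sketch of following that chain is shown below; it uses plain dictionaries with made-up ids and names in place of database rows, and the " => " separator is an assumption chosen to resemble TAU callpath names.

```python
def callpath_name(callpath_id, callpaths, timers):
    """Build a root-to-leaf name by following parent links.

    callpaths maps callpath id -> (timer id, parent callpath id or None).
    timers maps timer id -> timer name.
    """
    names = []
    seen = set()
    current = callpath_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in callpath parents")
        seen.add(current)
        timer_id, parent = callpaths[current]
        names.append(timers[timer_id])
        current = parent
    return " => ".join(reversed(names))


# Hypothetical rows: a three-level call graph rooted at main.
timers = {1: "main", 2: "Iteration", 3: "MPI_Send()"}
callpaths = {10: (1, None), 11: (2, 10), 12: (3, 11)}
leaf_name = callpath_name(12, callpaths, timers)
```

In a flat profile every row would have no parent, so each name would consist of a single timer.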
39. (ipm) - Integrated Performance Monitoring format, from NERSC.
• Google (google) - Google Profiles.
7.3 Command line options
In addition to specifying the profile format, the user can also specify the following options:
• --fixnames
  Use the fixnames option for gprof. When C and Fortran code are mixed, the C routines have to be mapped to either .function or function_. Strip the leading period or trailing underscore, if it is there.
• --pack <file>
  Rather than load the data and launch the GUI, pack the data into the specified file.
• --dump
  Rather than load the data and launch the GUI, dump the data to TAU Profiles. This can be used to convert supported formats to TAU Profiles.
• --oss
  Outputs profile data in OSS Style. Example:
  Thread: n,c,t 0,0,0
  excl.secs  excl.%  cum.%   PAPI_TOT_CYC  PAPI_FP_OPS  calls  function
  0.005      56.0%   56.0%   13475345      4194518      1      foo
  0.003      40.1%   96.1%   9682185       4205367      1      bar
  0          3.6%    99.7%   223173        17445        1      baz
  2.2E-05    0.3%    100.0%  14663         206          1      main
• --summary
  Output only summary information for OSS style output.
Chapter 8. Views and Sub-Views
In the past, PerfDMF used a hierarchy of Applications and Experiments to organize Trials. This approach was too rigid, so in TAUdb, trials are organized by dynamic Views. Views are lists of Trials that share a given metadata value. For example, a View could contain all the Trials where the total number of threads is l
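The idea of a View as a filter over trial metadata can be sketched in a few lines. This is only an illustration with hypothetical trial names and metadata; TAUdb itself stores view definitions in the database rather than evaluating them like this.

```python
import operator

# Comparison operators a view parameter might use (an illustrative subset).
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def select_view(trials, column, op, value):
    """Return names of trials whose metadata column satisfies the comparison."""
    compare = OPERATORS[op]
    return [
        trial["name"]
        for trial in trials
        if column in trial["metadata"] and compare(trial["metadata"][column], value)
    ]


# Hypothetical trials with a total_threads metadata field.
trials = [
    {"name": "run_a", "metadata": {"total_threads": 16}},
    {"name": "run_b", "metadata": {"total_threads": 64}},
    {"name": "run_c", "metadata": {"total_threads": 256}},
]
small_runs = select_view(trials, "total_threads", "<", 100)
```

A sub-view would apply a second filter of the same kind to the trials selected by its parent view.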
40.     # ... iteration
    subsetevents = ArrayList()
    subsetevents.add("CHARGEI")
    subsetevents.add("PUSHI")
    subsetevents.add("SHIFTI")
    print "got data"
    for subsetevent in subsetevents:
        events = ArrayList()
        for event in result1.getEvents():
            # find() and rfind() return -1 when the substring is absent
            if event.find("Iteration") >= 0 and event.rfind(subsetevent) >= 0:
                events.add(event)
        extractor = ExtractEventOperation(result1, events)
        extracted = extractor.processData().get(0)
        print "extracted phases"
        # get the Statistics
        dostats = BasicStatisticsOperation(extracted, False)
        stats = dostats.processData()
        print "got stats"
        for metric in stats.get(0).getMetrics():
            grapher = DrawMMMGraph(stats)
            metrics = HashSet()
            metrics.add(metric)
            grapher.set_metrics(metrics)
            grapher.setTitle(subsetevent + ", " + metric)
            grapher.setSeriesType(DrawMMMGraph.TRIALNAME)
            grapher.setCategoryType(DrawMMMGraph.EVENTNAME)
            grapher.setValueType(AbstractResult.INCLUSIVE)
            grapher.setLogYAxis(True)
            grapher.processData()
    return

print "JPython test script start"
glue()
print "JPython test script end"

Chapter 27. Derived Metrics
Sometimes metrics in a profile need to be combined to create a derived metric. PerfExplorer allows the user to create these using the derived metric expression tab.
27.1 Creating Expressions
The text box at the top of the tab allows the u
41. node, context, thread combination. When multiple counters are used, each metric is located in a directory prefixed with "MULTI__". To launch ParaProf with all the metrics, simply launch it from the root of the MULTI__ directories.
• ParaProf Packed Format (ppk) - Export format supported by PerfDMF/ParaProf. Typically .ppk.
• TAU Merged Profiles (snap) - Merged and snapshot profile format supported by TAU. Typically tauprofile.xml.
• TAU pprof (pprof) - Dump output from TAU's pprof -d. Provided for backward compatibility only.
• DynaProf (dynaprof) - Output from DynaProf's wallclock and papi probes.
• mpiP (mpip) - Output from mpiP.
• gprof (gprof) - Output from gprof; see also the --fixnames option.
• PerfSuite (psrun) - Output from PerfSuite psrun files.
• HPM Toolkit (hpm) - Output from IBM's HPM Toolkit.
• Cube (cube) - Output from Kojak Expert tool for use with Cube.
• Cube3 (cube3) - Output from Kojak Expert tool for use with Cube and Cube4.
• HPCToolkit (hpc) - XML data from hpcquick. Typically, the user runs hpcrun, then hpcquick on the resulting binary file.
• OpenMP Profiler (ompp) - CSV format from the ompP OpenMP Profiler (http://www.ompp-tool.com). The user must use OMPP_OUTFORMAT=CSV.
• PERI XML (perixml) - Output from the PERI data exchange format.
• General Purpose Timing Library (gptl) - Output from the General Purpose Timing Library.
• Paraver (paraver) - 2D output from the Paraver trace analysis tool from BSC.
• IPM
42.   -- ... of code
  line_number INT,
  -- line number of the end of the block of code
  line_number_end INT,
  -- column number of the start of the block of code
  column_number INT,
  -- column number of the end of the block of code
  column_number_end INT,
  FOREIGN KEY(trial) REFERENCES trial(id) ON DELETE NO ACTION ON UPDATE NO ACTION
);

-- timer index on the trial and name columns
CREATE INDEX timer_trial_index on timer (trial, name);

/***** CREATE THE TIMER RELATED TABLES *****/

-- timer groups are the groups such as TAU_DEFAULT, MPI, OPENMP,
-- TAU_PHASE, TAU_CALLPATH, TAU_PARAM, etc. This mapping table allows
-- for NxN mappings between timers and groups.
CREATE TABLE timer_group (
  timer INT,
  group_name VARCHAR NOT NULL,
  FOREIGN KEY(timer) REFERENCES timer(id) ON DELETE NO ACTION ON UPDATE NO ACTION
);

-- index for faster queries into groups
CREATE INDEX timer_group_index on timer_group (timer, group_name);

-- timer parameters are parameter based profile values. An example is
-- foo (x,y) where x=4 and y=10. In that example, timer would be the
-- index of the timer with the name 'foo (x,y) <x>=<4> <y>=<10>'.
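The many-to-many mapping that timer_group provides can be tried out with a small in-memory database. The sketch below uses Python's sqlite3 module purely for illustration (the schema above targets the DBMSs TAUdb supports), reduces the timer table to two columns, and uses made-up timer names.

```python
import sqlite3


def build_demo_database():
    """Create a reduced timer/timer_group schema with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE timer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE timer_group (
            timer INT,
            group_name VARCHAR NOT NULL,
            FOREIGN KEY(timer) REFERENCES timer(id)
        );
        CREATE INDEX timer_group_index ON timer_group (timer, group_name);
        """
    )
    conn.executemany(
        "INSERT INTO timer (id, name) VALUES (?, ?)",
        [(1, "main"), (2, "MPI_Send()")],
    )
    # One timer can belong to several groups, and one group holds many timers.
    conn.executemany(
        "INSERT INTO timer_group (timer, group_name) VALUES (?, ?)",
        [(1, "TAU_DEFAULT"), (2, "MPI"), (2, "TAU_DEFAULT")],
    )
    return conn


def timers_in_group(conn, group_name):
    """Return the names of all timers mapped to the given group."""
    rows = conn.execute(
        "SELECT t.name FROM timer t "
        "JOIN timer_group g ON g.timer = t.id "
        "WHERE g.group_name = ? ORDER BY t.name",
        (group_name,),
    )
    return [row[0] for row in rows]


demo_conn = build_demo_database()
default_timers = timers_in_group(demo_conn, "TAU_DEFAULT")
mpi_timers = timers_in_group(demo_conn, "MPI")
```

Here MPI_Send() appears in both groups, which is the NxN relationship the mapping table is meant to allow.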
43. of execution. One such application would be to collect separate statistics for each timestep, or group of timesteps. In order to visualize the variance between the phases of execution, a number of phase-based charts are available.
22.3.1 Relative Efficiency per Phase
The Relative Efficiency Per Phase chart shows the relative efficiency for each phase, as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.
Figure 22.14. Relative Efficiency per Phase
[Chart: relative efficiency of each iteration phase versus number of processors]
22.3.2 Relative Speedup per Phase
The Relative Speedup Per Phase chart shows the relative speedup for each phase, as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of I
44. one or more experiments or one view, and select this chart item under the "Charts" main menu item.
Figure 22.9. Relative Speedup
[Chart: relative speedup versus number of processors, with the ideal line]
22.2.6 Relative Speedup by Event
The Relative Speedup By Event chart shows how the events in an application scale with respect to relative speedup. That is, as the number of processors increases by a factor, the speedup is expected to increase by the same factor, with ideal scaling. The ideal speedup is charted along with the actual speedup for the application. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.
Figure 22.10. Relative Speedup by Event
[Chart: relative speedup of each event versus number of processors, with the ideal line]
22.2.7 Relative Speedup for One Event
The Relative
45. se he where clause ne of the m column name the operator operator the value for value FOREIGN ON DEI KEY ETE taud CASCA simple view of all INSERT INTO VALUES b D taudb_view column VARCHAR or the where clause VARCHAR NOT N the where clause VARCHAR NOT NULL view REFERENCES E ON UPDATE CASCADE 7 ULL trials parent name NULL Al 1 Trials and must have a param do not work correc ter or else th tly INS ERT INTO NULL NOT NULL ES taudb view id NOT NULL NOT NULL etadata NOT NULL conjoin NOT NULL tables hich organizes and filters trials PRIMARY KEY this is the taudb_view id sub views for this view taudb_view_parameter taudb_view VALUES ZE E latest sc create id applicat experime trial metric method dimensio nor maliza 1 nema PerfExplorer simpler table analysis set taudb view ion nt tion FOREIGN KEY ON D ET FOR EY n reduction table name se be ert but keepi ET ERIAL Column name total threads ng them makes th cod ngs S INT INT INT INT INT VARC VAR VAR taudb view EG EGI EGI EGE EGE HA HA HA NU NU NULL NU NULI NOT
46. there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the "Charts" main menu item.
Figure 22.12. Group % of Total Runtime
[Chart: percentage of total runtime for one group versus number of processors]
22.2.9 Runtime Breakdown
The Runtime Breakdown chart shows the fraction of the total runtime for all events in the application, and how the fraction changes as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.
Figure 22.13. Runtime Breakdown
[Chart: stacked percentage of total runtime per event versus number of processors]
22.3 Phase Chart Types
TAU now provides the ability to break down profiles with respect to phases
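The fractions that a Runtime Breakdown chart stacks can be illustrated with a short calculation. The sketch assumes each event's share is its time divided by the sum over all events at the same processor count; the event names and timings are made up, and PerfExplorer may group small events differently.

```python
def runtime_breakdown(event_times):
    """Return each event's fraction of the summed runtime."""
    total = float(sum(event_times.values()))
    if total <= 0:
        raise ValueError("total runtime must be positive")
    return {name: seconds / total for name, seconds in event_times.items()}


# Hypothetical exclusive times (seconds) per event at two processor counts.
times_by_procs = {
    16: {"Coll": 2.0, "NL": 6.0, "other": 2.0},
    64: {"Coll": 1.0, "NL": 1.5, "other": 2.5},
}
breakdown = {
    procs: runtime_breakdown(events) for procs, events in times_by_procs.items()
}
```

In this made-up data the share of "other" grows from 20% to 50% as processors are added, which is the kind of shift the chart makes visible.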
47. to install TAUdb (see Chapter 28, Introduction). To configure PerfExplorer, move to the tools/src/PerfExplorer directory in your TAU distribution. Type:
  > ./configure
If you haven't already done so for other TAU tools, add <path to tau>/tau2/apple/bin to your path. The following command-line options are available to configure:
18.1 Available configuration options
• -engine=<analysis engine>
  Specifies the data mining engine to use. The supported options include weka and R.
• -rroot=<directory>
  Specifies the directory where R is installed. Specifically, it should be the directory where the bin, include, lib, library, and share directories are located.
• -objectport=<available network port>
  Specifies the port that the PerfExplorer server should use, when running PerfExplorer in client-server mode. Select an available network port, and make sure that other appropriate network configurations are made (firewalls, etc.). The default port is 9999.
• -registryport=<available network port>
  Specifies the port that the rmiregistry should use, when running PerfExplorer in client-server mode. Select an available network port, and make sure that other appropriate network configurations are made (firewalls, etc.). The default port is 1099.
• -server=<server name>
  Specifies the fully qualified domain name of the server where PerfExplorer is run, when running PerfExplorer in client-server mode.
Chapter 19. Runn
48. total number of threads. With this option off, the sum will only be divided by the number of threads that actively participated in the sum. This way the user can control whether or not threads which do not call a particular function are considered as a 0 in the computation of statistics.
• Generate Reverse Calltree Data - This option will enable the generation of reverse callpath data necessary for the reverse callpath option of the statistics tree-table window.
• Show Source Locations - This option will enable the display of source code locations in event names.
16.2 Default Colors
Figure 16.2. Edit Default Colors
[Screenshot: ParaProf "Edit Default Colors" window with the default color set, Swatches/HSB/RGB tabs, buttons to add, delete, update and restore colors, and function, group, user event and misc. highlight colors]
The default color editor changes how colors are distributed to functions whose color has not been specifically assigned. It is accessible from the File menu of the Preferences Window.
16.3 Color Map
Figure 1
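The effect of this option on a mean value can be shown with a small calculation. The sketch below is an illustration only: the function and argument names are hypothetical, and threads that never called the function are simply absent from the input.

```python
def mean_over_threads(values_by_thread, total_threads, count_all_threads):
    """Average a per-thread value with either divisor described above.

    values_by_thread holds only the threads that called the function.
    With count_all_threads set, missing threads count as 0 in the mean.
    """
    if count_all_threads:
        divisor = total_threads
    else:
        divisor = len(values_by_thread)
    if divisor == 0:
        return 0.0
    return sum(values_by_thread.values()) / float(divisor)


# Hypothetical exclusive times: threads 2 and 3 never called the function.
exclusive_seconds = {0: 4.0, 1: 2.0}
mean_all = mean_over_threads(exclusive_seconds, 4, True)
mean_active = mean_over_threads(exclusive_seconds, 4, False)
```

With four threads in total, the same data gives a mean of 1.5 seconds when every thread is counted and 3.0 seconds when only the two participating threads are.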
49. [Screenshot: pprof-style text listing for one thread, with time, call-count and per-call columns for routines such as MPI_Wait, MPI_Send, jacu, exchange_1, bcast_inputs, exchange_3, setiv, exact, erhs, read_input and MPI_Irecv]
This display shows a pprof-style text view of the data.
11.3 Thread Statistics Table
Figure 11.3. Thread Statistics Table, inclusive and exclusive
[Screenshot: Thread Statistics window for n,c,t 0,0,0 with columns Name, Inclusive Time, Exclusive Time, Calls and Child Calls]
50. 22.2.3 Relative Efficiency by Event
22.2.4 Relative Efficiency for One Event
22.2.5 Relative Speedup
22.2.6 Relative Speedup by Event
22.2.7 Relative Speedup for One Event
22.2.8 Group % of Total Runtime
22.2.9 Runtime Breakdown
22.3 Phase Chart Types
22.3.1 Relative Efficiency per Phase
22.3.2 Relative Speedup per Phase
22.3.3 Phase Fraction of Total Runtime
23 Custom Charts
24 Visualization
24.1 3D Visualization
24.2 Data Summary
24.3 Creating a Boxchart
24.4 Creating a Histogram
24.5 Creating a Normal Probability Chart
25 Views
25.1 Creating Views
25.2 Creating Subviews
26 Running PerfExplorer Scripts
26.1 Analysis Components
26.2 Scripting Interface
26.3 Example Script
27 Derived Metrics
27.1 Creating Expressions
27.2 Selecting Expressions
51. 6.3. Color Map
[Screenshot: ParaProf Color Map window listing currently assigned colors for functions such as MPI_Barrier, read_input, setiv, MPI_Comm_size, buts, applu, bcast_inputs and jacu, with Remove and Remove All buttons]
The color map shows specifically assigned colors. These values are used across all trials loaded so that the user can identify a particular function across multiple trials. In order to map an entire trial's function set, select "Assign Defaults from ->" and select a loaded trial. Individual functions can be assigned a particular color by clicking on them in any of the other ParaProf windows.
Part III. PerfExplorer User's Manual
Table of Contents
17 Introduction
18 Installation and Configuration
18.1 Available configuration options
19 Running PerfExplorer
20 Cluster Analysis
20.1 Dimension Reduction
20.2 Max Number of Clusters
20.3 Performing Cluster Analysis
21 Correlation Analysis
21.1 Dimension Reduc
52. pm[7].value = taudb_strdup("Quad-Core AMD Opteron(tm) Processor 8378");
taudb_add_primary_metadata_to_trial(trial, &(pm[7]));

/* create a metric */
TAUDB_METRIC* metric = taudb_create_metrics(1);
metric->name = taudb_strdup("TIME");
taudb_add_metric_to_trial(trial, metric);

/* create a thread */
TAUDB_THREAD* thread = taudb_create_threads(1);
thread->node_rank = 1;
thread->context_rank = 1;
thread->thread_rank = 1;
thread->index = 1;
taudb_add_thread_to_trial(trial, thread);

/* create a timer, timer_callpath, timer_call_data, timer_value */
TAUDB_TIMER_GROUP* timer_group = taudb_create_timer_groups(1);
TAUDB_TIMER* timer = taudb_create_timers(1);
TAUDB_TIMER_CALLPATH* timer_callpath = taudb_create_timer_callpaths(1);
TAUDB_TIMER_CALL_DATA* timer_call_data = taudb_create_timer_call_data(1);
TAUDB_TIMER_VALUE* timer_value = taudb_create_timer_values(1);
timer->name = taudb_strdup("int main(int, char **) [{kernel.c} {134,1}-{207,1}]");
timer->short_name = taudb_strdup("main");
timer->source_file = taudb_strdup("kernel.c");
timer->line_number = 134;
timer->column_number = 1;
timer->line_number_end = 207;
timer->column_number_end = 1;
taudb_add_timer_to_trial(trial, timer);

timer_group->name = taudb_strdup("TAU_DEFAULT");
taudb_add_timer_group_to_trial(trial, timer_group);
taudb_add_timer_to_timer_group(timer_group, timer);

timer_callpath->timer = timer;
timer_callpath->parent = NULL;
53. AUDB COUNTER counter extern void taudb add counter value to trial TAUDB TRIAL trial TAUDB COUNTER VALUE counter value 121 TAUdb C API ex ex ex ex ex Profile parsers RIAL ex ex ex ex ex ext ex ext ex ext ext ex ex ex ex ter T ter T ter T ter T ter T ter T ter GET A te te BAROARAsAsArA4r Tr D 5 c K Ler tau au au au au au au au au au au n void AUDB_TI n void AUDB_TI n void AUDB_TI n void AUDB_TI n void AUDB_TI lt SSA lt R_ R R db add R timer db add PARAM db add ti ETI m m er to trial TAUDB TRIAL trial er parameter to trial TAUDB TRIAL trial R timer parameter er group to trial TAUDB TRIAL trial GROUP db add L tim imer_group er to timer group TAUDB TIME ER GROUP timer group R timer db add CAL ti PATH M n void AUDB TI A R db add tim er callpath to trial TAUDB TRIAL trial timer callpath er call data to trial TAUDB T RIAL trial CALL n void tau TAUDB TIM ab add tim DATA timer call data er value to timer call data ER CALL DATA n TAUDB T erators n TAUDB audb n
54. AUDB TIMER CALLPATH timer callpath TAUDB THREAD thread TAUDB METRIC metric extern TAUDB TIMER VALUE taudb query all timer values TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB TIMER VALUE taudb query all timer stats TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB TIMER VALUE taudb get timer value TAUDB TIMER CALL DATA timer call data TAUDB METRIC metric find main extern TAUDB TIMER taudb query main timer TAUDB CONNECTION connection TAUDB TRIAL trial save everything extern void taudb save trial TAUDB CONNECTION connection TAUDB TRIAL trial boolean update boolean cascade extern void taudb save threads TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save metrics TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save timers TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save time ranges TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save timer groups TAUDB CONNECTION connection 120 TAUdb C API TAUDB TRIAL trial boolean update extern void taudb save timer parameters TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save timer callpaths TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save timer call data TAU
55. AUDB TIMER GROUP taudb get timer group from timer by name TAUDB TIMER GROUP timers const char name extern TAUDB TIMER CALLPATH taudb query timer callpaths TAUDB CONNECTION connection TAUDB TRIAL trial TAUDB TIMER timer exte TAUDB TIMER CALLPATH rn taudb get timer callpath by id TAUDB TIMER CALLPATH timers int id extern TAUDB TIMER CALLPATH taudb get tim callpath by name TAUDB TIMER CALLPATH timers const char id extern TAUDB TIMER CALLPATH taudb query all timer callpaths TAUDB CONNECTION connection TAUDB TRIAL trial extern char taudb get callpath string TAUDB TIMER CALLPATH timer callpath get the counters for a trial extern TAUDB COUNTER taudb query counters TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB COUNTER taudb get counter by id TAUDB COUNTER counters int id extern TAUDB COUNTER taudb get counter by name TAUDB COUNTER counters const char id extern TAUDB COUNTER VALUE 119 TAUdb C API taudb query cou nter values TAUDB CONN ECTION connection
56. [Screenshot: group ledger window] The group ledger shows each group along with its current color. This ledger is especially important because it gives you the ability to mask all of the other displays based on group membership. For example, you can right-click on the MPI group and select "Show This Group Only", and all of the windows will then show only those functions that are members of the MPI group. You may also mask by the inverse, by selecting "Show All Groups Except This One" to mask out a particular group. 15.2.3 User Event Ledger. Figure 15.4: User Event Ledger. The user event ledger shows each user event along with its current color. [Screenshot: User Event Window listing the message size events] 15.3 Selective Instrumentation File Generator. ParaProf can also help you refine your program performance by excluding some functions from instrumentation. You can select rules to determine which functions get excluded; both rules must be true for a given function to be excluded. Each function that will be excluded based on these rules is listed below the rules. Figure 15.5: Selective Instrumentation Dialog [Screenshot: TAU ParaProf Selective Inst
57. CES trial(id) ON DELETE NO ACTION ON UPDATE NO ACTION
);

-- counter index on the trial and name columns
CREATE INDEX counter_trial_index on counter (trial, name);

CREATE TABLE counter_value (
    -- what counter is this?
    counter INT NOT NULL,
    -- where in the callgraph?
    timer_callpath INT,
    -- what thread is this?
    thread INT NOT NULL,
    -- The total number of samples
    sample_count INT,
    -- The maximum value seen
    maximum_value DOUBLE PRECISION,
    -- The minimum value seen
    minimum_value DOUBLE PRECISION,
    -- The mean value seen
    mean_value DOUBLE PRECISION,
    -- The variance for this counter
    standard_deviation DOUBLE PRECISION,
    FOREIGN KEY(counter) REFERENCES counter(id) ON DELETE NO ACTION ON UPDATE NO ACTION,
    FOREIGN KEY(timer_callpath) REFERENCES timer_callpath(id) ON DELETE NO ACTION ON UPDATE NO ACTION,
    FOREIGN KEY(thread) REFERENCES thread(id) ON DELETE NO ACTION ON UPDATE NO ACTION
);

-- one thread, one counter
CREATE INDEX counter_value_index on counter_value (counter, thread);

/* CREATE THE METADATA RELATED TABLES */

-- primary metadata is metadata that is not nested, does not contain unique data f
58. DB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save timer values TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save counters TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save counter values TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save primary metadata TAUDB CONNECTION connection TAUDB TRIAL trial boolean update extern void taudb save secondary metadata TAUDB CONNECTION connection TAUDB TRIAL trial boolean update J RCKCKCKCKCKCkCk hh E KCKCK OA memory functions ROR KR KR hh hh E e A extern char taudb strdup const char in string extern TAUDB TRIAL taudb create trials int count extern TAUDB METRIC taudb create metrics int count extern TAUDB TIME RANGE taudb create time ranges int count extern TAUDB THREAD taudb create threads int count extern TAUDB SECONDARY METADATA taudb create secondary metadata int count extern TAUDB PRIMARY METADATA taudb create primary metadata int count extern TAUDB PRIMARY METADATA taudb resize primary metadata int count TAUDB PRIMARY METADATA old primary metadata extern TAUDB COUNTER taudb create counters int count extern TAUDB COUNTE
59. DB STRUCTS H define TAUDB STRUCTS H 1 include time h include uthash h include taudb structs h f defined TAUDB POSTGRESQL nclude libpq fe h lif defined TAUDB SOLIT nclude sglite3 h ndif E D HD H H ifndef boolean define TRUE 1 define FALSE 0 typedef int boolean endif typedef struct taudb prepared statement char name UT hash handle hh hash index for hashing by name TAUDB PREPARED STATEMENT forward declarations to ease objects that need to know about each other and have doubly linked relationships struct taudb timer call data struct taudb timer value struct taudb timer callpath struct taudb timer group struct taudb timer parameter struct taudb timer struct taudb counter value struct taudb counter struct taudb primary metadata struct taudb secondary metadata struct taudb time range struct taudb thread struct taudb metric struct taudb trial struct perfdmf experiment struct perfdmf application typedef struct taudb configuration char jdbc db type to identify DBMS vendor postgresql mysql h2 derby etc char db hostname server host name char db portnum server port number char db dbname the database name at the server char db schemaprefix the schema prefix This is appended to all table names for some DBMSs char db use
60. File Generator 16. Preferences 16.1 Preferences Window 16.2 Default Colors 16.3 Color Map
Chapter 7. Introduction
ParaProf is a portable, scalable performance analysis tool included with the TAU distribution.
Important: ParaProf requires Oracle (Sun) Java 1.5 Runtime Environment for basic functionality. Java JOGL (included) is required for 3D visualization and image export. Additionally, OpenGL is required for 3D visualization.
Note: Most windows in ParaProf can export bitmap (png/jpg) and vector (svg/eps) images to disk or print directly to a printer. These options are available through the File menu.
7.1 Using ParaProf from the command line
ParaProf is a Java program that is run from the supplied paraprof script (paraprof.bat for the Windows binary release).
% paraprof --help
Usage: paraprof [options] <files/directory>
Options:
  -f, --filetype <filetype>   Specify type of performance data; options are: profiles (default), pprof, dynaprof, mpip, gprof, psrun, snap, perixml, hpm, packed, cube, hpc, ompp, gptl, ipm, google
  -h, --help
  --range a-b:c               Load only profiles from the given range(s) of processes. Separate individual ids or dash-defined ranges
61. [Screenshot: value selection dialog] After selecting the value, you need to select a name for the view. Figure 25.6: Entering a name for the view [Dialog: Enter View Name, "Please enter a name for this view"] After creating the view, you will need to exit PerfExplorer and restart it to see the view. This is a known problem with the application and will be fixed in a future release. Figure 25.7: The completed view [Screenshot: PerfExplorer Client with the new view listed under Views] 25.2 Creating Subviews. To create sub-views, first select the "Create New Sub-View" item from the Views main menu item. The first dialog box will prompt you to select the view or sub-view on which to base the new sub-view. Figure 25.8: Selecting the base view [Dialog: Select View, "Select a view on which to base this sub-view"] After selecting the base view or sub-view, the options for creating the new sub-view are the same as for creating a new view. After creating the sub-view, you will need to
62. K KR RK KK A primary metadata is metadata that is not nested does not contain unique data for each thread typedef struct taudb_primary_metadata char name char value UT_hash_handle hh uses the name as the key TAUDB PRIMARY METADATA primary metadata is metadata that could be nested could contain unique data for each thread and could be an array typedef struct taudb_secondary_metadata_key f struct taudb timer callpath timer callpath link back to database struct taudb thread thread link back to database roundabout way struct taudb secondary metadata parent self referencing struct taudb time range time range char name TAUDB SECONDARY METADATA KEY typedef struct taudb secondary metadata char id link back to database TAUDB SECONDARY METADATA KEY key int num values can have arrays of data char value int child count struct taudb secondary metadata children self referencing UT hash handle hh uses the id as a compound key UT hash handle hh2 uses the key as a compound key TAUDB SECONDARY METADATA these are for supporting the older schema pedef struct perfdmf experiment nt id h p ty char name struct taudb_primary_metadata primary_metadata ERFDMF EXPERIMENT typedef struct perfdmf application int id char name struct taudb_pri
63. LLPATH. In this mode TAU will record each event callpath to the depth set by the TAU_CALLPATH_DEPTH environment variable (default is two). Because instrumentation overhead will increase with the depth of the callpath, you should use the shortest callpath that is sufficient.
2.4 Using Hardware Counters for Measurement
Performance counters exist on many modern microprocessors. They can count hardware performance events such as cache misses and floating point operations while the program executes on the processor. The Performance Data Standard and API (PAPI, http://icl.cs.utk.edu/papi/) package provides a uniform interface to access these performance counters.
To use these counters, you must first find out which PAPI events your system supports. To do so, type:
> papi_avail
Available events and hardware information.
Vendor string and code   : AuthenticAMD (2)
Model string and code    : AMD K8 Revision C (15)
CPU Revision             : 2.000000
CPU Megahertz            : 2592.695068
CPU's in this Node       : 4
Nodes in this System     : 1
Total CPU's              : 4
Number Hardware Counters : 4
Max Multiplex Counters   : 32
The following correspond to fields in the PAPI_event_info_t structure.
Name          Code         Avail  Deriv  Description (Note)
PAPI_L1_DCM   0x80000000   Yes    Yes    Level 1 data cache misses
PAPI_L1_ICM   0x80000001   Yes    Yes    Level 1 instruction cache misses
Next, to test the papi_event_chooser compatibility between
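The effect of the callpath depth setting described at the start of this passage can be illustrated with a small sketch. This is not TAU's implementation; the function names, the " => " separator, and the sample data are assumptions made for illustration only. It shows how keeping only the innermost `depth` frames merges measurements that differ further up the stack, which is why a smaller depth means fewer distinct callpaths to track.

```python
def truncate_callpath(stack, depth=2):
    """Return a callpath name built from the innermost `depth` frames.

    `stack` is a list of function names, outermost first (hypothetical input).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return " => ".join(stack[-depth:])


def record_profile(samples, depth=2):
    """Accumulate time per truncated callpath.

    `samples` is an iterable of (stack, seconds) pairs (hypothetical data).
    """
    totals = {}
    for stack, seconds in samples:
        key = truncate_callpath(stack, depth)
        totals[key] = totals.get(key, 0.0) + seconds
    return totals


if __name__ == "__main__":
    samples = [
        (["main", "solve", "MPI_Send"], 1.0),
        (["main", "output", "MPI_Send"], 2.0),
    ]
    # Depth 1 merges both samples; depth 2 keeps them apart.
    print(record_profile(samples, depth=1))
    print(record_profile(samples, depth=2))
```

With a depth of one, both samples collapse into a single entry for the innermost function; with a depth of two, the two callers are kept as separate entries.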
64. LUE

/* counter related structures */

/* counters measure some counted value. An example would be MPI message size for an MPI_Send. */
typedef struct taudb_counter {
    int id;                                /* database reference */
    struct taudb_trial* trial;
    char* name;
    UT_hash_handle hh1;                    /* hash key for hashing by id */
    UT_hash_handle hh2;                    /* hash key for hashing by name */
} TAUDB_COUNTER;

/* counters are atomic counters, not just interval timers */
typedef struct taudb_counter_value_key {
    struct taudb_counter* counter;         /* the counter we are measuring */
    struct taudb_thread* thread;           /* where this measurement is */
    struct taudb_timer_callpath* context;  /* the calling context (can be null) */
    char* timestamp;                       /* timestamp in case we are in a snapshot or something */
} TAUDB_COUNTER_VALUE_KEY;

typedef struct taudb_counter_value {
    TAUDB_COUNTER_VALUE_KEY key;
    int sample_count;                      /* how many times did we take this count? */
    double maximum_value;                  /* what was the max value we saw? */
    double minimum_value;                  /* what was the min value we saw? */
    double mean_value;                     /* what was the average value we saw? */
    double standard_deviation;             /* how much variance was there? */
    UT_hash_handle hh1;                    /* hash key for hashing by key */
} TAUDB_COUNTER_VALUE;

/* metadata related structures */
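The summary fields of a counter value (sample count, maximum, minimum, mean, standard deviation) can be illustrated with a short sketch. The `CounterValue` class and `summarize` function below are hypothetical and are not part of the TAUdb API; the sketch also assumes the population standard deviation, which the source does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class CounterValue:
    """Hypothetical mirror of the summary fields of a counter value."""
    sample_count: int
    maximum_value: float
    minimum_value: float
    mean_value: float
    standard_deviation: float


def summarize(samples):
    """Reduce raw counter samples (e.g. message sizes) to summary statistics.

    Assumption: population standard deviation (divide by n, not n - 1).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return CounterValue(n, max(samples), min(samples), mean, math.sqrt(variance))


if __name__ == "__main__":
    # Hypothetical message sizes recorded for one counter on one thread.
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Storing only these five numbers per counter, thread, and calling context keeps the record size fixed no matter how many samples were observed.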
65. NT NOT NULL thread rank relative to the process thread rank INT NOT NULL thread index from 0 to N 1 thread index INT NOT NULL FOREIGN KEY trial REFERENCES trial id ON DELETE NO ACTION ON UPDATE NO ACTION metrics are things like num calls num subroutines TIME PAPI counters and derived metrics CREATE TABLE metric id SERIAL NOT NULL PRIMARY KEY trial this value belongs to 103 Database Schema trial INT NOT NULL name of the metri name VARCHAR NOT if this metric is derived BOOLEAN NOT FOREIGN KEY trial R el NULL derived by one of the tools NULL DEFAULT FALSE EFERENCES trial id ON DELETE NO ACTIO timers are timers CREATE TABI phase profiles th N ON UPDATE NO ACTION capturing some interval value For callpath or E timer id trial t S his value E parent refers to the calling function or phase RIAL NOT NULL PRIMARY KEY belongs to trial IN T NOT NULL name of the timer name VARCHAR NOT NULL short name of the timer without source or parameter info short name VARCHAR NOT NULL filename source file VARCHAR line number of the start of the block
66. Phase information is implemented as callpaths, so many of the callpath displays will show phase data as well. For example, the Call Path Text Window is useful for showing how functions behave across phases. Chapter 14. Comparative Analysis. ParaProf can perform cross-thread and cross-trial analysis. In this way, you can compare two or more trials and/or threads in a single display. 14.1 Using Comparative Analysis. Comparative analysis in ParaProf is based on individual threads of execution. There is a maximum of one Comparison Window for a given ParaProf session. To add threads to the window, right-click on them and select "Add Thread to Comparison Window". The Comparison Window will pop up with the thread selected. Note that "mean" and "std. dev." are considered threads for this and most other purposes. Figure 14.1: Comparison Window (initial) [Screenshot: Comparison Window showing Metric: Time, Value: Exclusive, Units: seconds, with per-function values for the mean thread] Add additional threads from any trial by the same means. Figure 14.2
67. 24.2 Data Summary 24.3 Creating a Boxchart 24.4 Creating a Histogram 24.5 Creating a Normal Probability Chart 25. Views 25.1 Creating Views 25.2 Creating Subviews 26. Running PerfExplorer Scripts 26.1 Analysis Components 26.2 Scripting Interface 26.3 Example Script 27. Derived Metrics 27.1 Creating Expressions 27.2 Selecting Expressions 27.3 Expression Files Chapter 17. Introduction. PerfExplorer is a framework for parallel performance data mining and knowledge discovery. The framework architecture enables the development and integration of data mining operations that will be applied to large-scale parallel performance profiles. The overall goal of the PerfExplorer project is to create software that integrates sophisticated data mining techniques in the analysis of large-scale parallel performance data. PerfExplorer supports clustering, summarization, association, regression, and correlation. Cluster analy
68. R VALUE taudb create counter values int count extern TAUDB TIMER taudb create timers int count extern TAUDB TIMER PARAMETER taudb create timer parameters int count extern TAUDB TIMER GROUP taudb create timer groups int count extern TAUDB TIMER GROUP taudb resize timer groups int count TAUDB TIMER GROUP old groups extern TAUDB TIMER CALLPATH taudb create timer callpaths int count extern TAUDB TIMER CALL DATA taudb create timer call data int count extern TAUDB TIMER VALUE taudb create timer values int count extern void taudb delete trials TAUDB TRIAL trials int count KOR KR KKK KK OK RR KK A Adding objects to the hierarchy JS hh hh E he hh E e ee he A A RAYS extern void taudb add metric to trial TAUDB TRIAL trial TAUDB METRIC metric extern void taudb add time range to trial TAUDB TRIAL trial Ej TAUDB TIME RANGE time range extern void taudb add thread to trial TAUDB TRIAL trial TAUDB THREAD thread extern void taudb add secondary metadata to trial TAUDB TRIAL trial TAUDB SECONDARY METADATA secondary metadata extern void taudb add secondary metadata to secondary metadata TAUDB SECONDARY METADATA parent TAUDB SECONDARY METADATA child extern void taudb add primary metadata to trial TAUDB TRIAL trial TAUDB PRIMARY METADATA primary metadata extern void taudb add counter to trial TAUDB TRIAL trial T
69. S thread id ON DELETE NO ACTION ON UPDATE NO ACTION FOREIGN KEY timer callpath REFERENCES timer callpath id ON DELETE NO ACTION ON UPDATE NO ACTION FOREIGN KEY parent REFERENCES secondary metadata id ON DELETE NO ACTION ON UPDATE NO ACTION FOREIGN KEY time range REFERENCES time range id ON DELETE NO ACTION ON UPDATE NO ACTION create an index for faster queries against the secondary metadata table CREATE INDEX secondary metadata index on secondary metadata 107 Database Schema trial name thread pa rent RR KR KKK KK OK NA EAS t crea REATE THE METADATA R ELAT ED TABLES ACkCk ck kk ck k ck ck k ck k k k k k k kkk kk his is the view table te table taudb view id views can be S nested KOR KK ke e e e x x Ww ERIAL d pa Name CO rent name of the view view co njoin njoin FO create ta table name INT EGER R NOT NULL meters R REIGN KEY paren ON DELETE CASCADE table taudb view para the view ID udb view VA the column name for t I value of the name INT the table name for the w f the table name is o PDATE CASCADE meter EGER here clau RCHAR
70. TAU User Guide. Updated Nov 15, 2015, for use with version 2.25 or greater. Copyright 1997-2012 Department of Computer and Information Science, University of Oregon; Advanced Computing Laboratory, LANL, NM; Research Centre Juelich, ZAM, Germany. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of University of Oregon (UO), Research Centre Juelich (ZAM), and Los Alamos National Laboratory (LANL) not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. The University of Oregon, ZAM, and LANL make no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. UO, ZAM AND LANL DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE UNIVERSITY OF OREGON, ZAM OR LANL BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
71. Table of Contents I. Tau User Guide 1. Tau Instrumentation 1.1 Types of Instrumentation 1.2 Dynamic instrumentation through library pre-loading 1.3 TAU scripted compilation 1.3.1 Instrumentation 1.3.2 Compiler Based Instrumentation 1.3.3 Source Based Instrumentation 1.3.4 Options to TAU compiler scripts 1.4 Selectively Profiling an Application 1.4.1 Custom Profiling 2. Profiling 2.1 Running the Application 2.2 Reducing Performance Overhead with TAU_THROTTLE 2.3 Profiling each event callpath 2.4 Using Hardware Counters for Measurement 3. Tracing 3.1 Generating Event Traces 4. Analyzing Parallel Applications 4.1 Text summary 4.2 ParaProf
72. a summary of the performance data in the database, select the "Show Data Summary" item under the Visualization main menu item. Figure 24.2: Data Summary Window [Screenshot: Summarization Analysis table with per-event columns including avg exclusive, avg calls, max, min, stddev, and range; visible event names include ERRFUNC, FFT_CLOSE, and FFT_SETUP]
73. aks Po re Y e EHE SORRY 92 21 3 Expression PUES o RE EDS EHE HR 92 IW TA UAB LEER 93 28 Inttfoductloh si iii iore mee ree Age tn ENEE SERA 95 28 1 Prerequisites eps to t reo reel Pre erre ro EPERE IS REEE a 95 28 2 Installation cedet sn re Ee rere TERRE Eve E ERU Sin eye ds 95 29 Using T AUdb irem a SU ER ere tant da seu 98 29 1 perfdmf createapp deprecated only supported for older PerfDMF databases uec 98 29 2 perfdmf createexp deprecated only supported for older PerfDMF databases LER 98 29 3 taudb loadtrial sorore ete ege d ege Ce Vox Decor tese oe yo rae 98 2074 TAVAD EE 100 30 Database Sch mas ctt e reet coenae deed ma a nent evene 101 30 E SOL for TAUdb 55505 sertie nn seen tenta ib 101 31 TA UAB A RE Ee Hp err PR Ee ipie 111 31 1 TAUdb C API Overview ses 111 31 2 TAUdb C Structures ror t tb REPRISES 111 31 3 TAUdb A O 117 31 4 TAUdb C API Examples f gerer terere tena 123 31 4 1 Creating a trial and inserting into the database 123 31 4 2 Querying a trial from the database 125 vi List of Figures 4T Main Data Window EE 11 42 Mam Data Window 5 Eege rei ee e aan kaa See 12 SR 14 6 2 Flat Profile with Loops ee trie tin irre eves I REIR SERE yer opens 14 6 3 MEIGpS per 100p oec keela al rte ak s cet nt ea 15 6 4 Callpath Profile EE 16 6 5 Tracing with Vampires shirt ore Dal dee Re kod Ue regu 17 6 6 Scalabilit
74. atmull f90 31 9 36 14 443194 eeng MPI_Recv 81095 5 MAIN 49569 lj MPI Bcast 45669 D Loop MAIN matmult 90 86 9 106 14 12412 MPL Sendo 8959 Loop INITIALIZE matmult 0 17 9 21 14 8953 Loop INITIALIZE matmutt 30 10 9 14 14 5609 2 MPI Finalize 2932 667 MULTIPLY MATRICES 2577 667 Loop MAIN matmult 90 117 9 128 14 2091 8 MPI Barrier 1875 667 Loop MAIN matmult f90 112 9 115 14 1833 Loop MAIN matmult 90 71 9 74 14 107 Loop MAIN matmult 90 77 SH84 14 30 INITIALIZE 14 25 MPI Comm rank 1 MPI Comm ste Here is how to instrument loops in an application tenv TA EFILE opt apps tau tau2 x86 64 lib Makefile tau mpi pdt setenv TA o op oe EGIN INSTRUM IONS optTauSelectFile select tau optVerbose NT SECTION ops routine f D INSTRUMENT SECTION opt apps tau tau2 x86 64 bin path tau f90 sh Or edit Makefile and change F90 tau f90 sh run job o o oe oe paraprof pack app ppk Move the app ppk file to your desktop paraprof app ppk 6 3 A What MFlops am getting in all loops A Create a flat profile with PAPI FP INS OPS and time with loop instrumentation Figure 6 3 MFlops per loop Metric PAPI FP INS GET TIME OF DAY Value Exclusive Units Derived metric shown in microseconds format 770 699 UC Loop MULTIPLY MATRICES
75. ava util import HashSet from java util import ArrayList True 1 False 0 def glue print doing phase test for gtc on jaguar load the trial Utilities setSession perfdmf demo triall Utilities getTrial gtc bench Jaguar Compiler Options fasts resultl TrialResult triall print got the data f get the iteration inclusive totals events ArrayList for event in resultl getEvents if event find Iteration gt 0 and result1 getEventGroupNane eve if event find Iteration gt 0 and event find gt lt 0 events add event extractor ExtractEventOperation resultl events 89 Running PerfExplorer Scripts extracted extractor processData get 0 print extracted phases f derive metrics derivor DeriveMetricOperation extracted PAPI L1 TCA PAPI L1 TCM D derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI L1 TCA PAPI L1 TCM PAP derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI L1 TCM PAPI L2 TCM D derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger pro
76. b_next_primary_metadata_by_name_from_trial current current gt value void dump secondary metadata TAUDB SECONDARY METADATA metadata printf d secondary metadata fields n HASH COUNT metadata TAUDB SECONDARY METADATA current NULL taudb next secondary metadata by key from trial current for current metadata current current printf s s n current gt void dump trial TAUDB CONNECTION connection boolean haveTrial TAUDB TRIAL trial key name current value 0 TAUDB TRIAL filter 125 TAUdb C API 1f haveTrial trial filter else trial taudb_query_trials connection FALSE filter TAUDB TIMER timer taudb query main timer connection trial printf Trial name s id d main s n n trial name trial gt id timer gt name int main int argc char argv printf Connecting n TAUDB_CONNECTION connection NULL if argc gt 2 connection taudb connect config argv 1 else fprintf stderr Please specify a TAUdb config file n exit 1 printf Checking connection n taudb check connection connection printf Testing queries n int t test the find trials method to populate the trial TAUDB TRIAL filter taudb create trials 1 filter gt id atoi argv 2 TAUDB TRIAL trials taudb query trials connecti
77. be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item. Figure 22.5: Timesteps per Second [Chart: timesteps per second versus number of processors] 22.2.2 Relative Efficiency. The Relative Efficiency chart shows how an application scales with respect to relative efficiency. That is, as the number of processors increases by a factor, the time to solution is expected to decrease by the same factor with ideal scaling. The ratio between the expected scaling and the actual scaling is the relative efficiency. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one experiment or view, and select this chart item under the Charts main menu item. Figure 22.6: Relative Efficiency [Chart: relative efficiency versus number of processors] 22.2.3 Relative Efficiency by Event. The Relative Effici
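The relative efficiency described above can be worked through with a small sketch. This is an illustration under stated assumptions, not PerfExplorer's code: the function name and the trial data are hypothetical, and the trial with the fewest processors is assumed to be the baseline. The expected time at p processors is the baseline time scaled by baseline_processors / p, and the efficiency is the expected time divided by the measured time.

```python
def relative_efficiency(trials):
    """Compute relative efficiency for each (processors, time_to_solution) pair.

    Assumption: the trial with the fewest processors is the baseline.
    Returns a list of (processors, efficiency) sorted by processor count.
    """
    ordered = sorted(trials)
    base_procs, base_time = ordered[0]
    result = []
    for procs, measured in ordered:
        # Ideal scaling: time shrinks by the same factor the processor count grows.
        expected = base_time * base_procs / procs
        result.append((procs, expected / measured))
    return result


if __name__ == "__main__":
    # Hypothetical timings: perfect scaling to 32 processors, then a drop-off.
    for procs, eff in relative_efficiency([(16, 100.0), (32, 50.0), (64, 40.0)]):
        print(procs, eff)
```

An efficiency of 1.0 means the run scaled ideally relative to the baseline; values below 1.0 show how much of the added processing capacity was lost.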
78. [Figure legend residue: OpenMP loop locations in file runhyd3.F] Chapter 21. Correlation Analysis. Correlation analysis in PerfExplorer is used to explore relationships between events in a profile. Each event is plotted pairwise against the other events, and a correlation coefficient is calculated for each relationship. When two events are highly positively correlated (coefficient close to 1.0) or highly negatively correlated (coefficient close to -1.0), the relationship will show up as a linear grouping in the results. Clusters may also be apparent. 21.1 Dimension Reduction. Often, many hundreds of events are instrumented when profile data is collected. Clustering works best with fewer than 10 dimensions, so dimension reduction is often necessary to get meaningful results. Currently, there is only one type of dimension reduction available in PerfExplorer: the user specifies a minimum exclusive percentage for an event to be considered significant. To reduce dimensions, select the "Select Dimension Reduction" item under the Analysis main menu bar item. The following dialog will appear. Figure 21.1: Selecting a dimension reduction method [Dialog: Dimension Reduction] Select a dimension reduc
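The pairwise correlation step described above can be sketched as follows. The source does not say which coefficient PerfExplorer computes, so this sketch assumes the Pearson correlation coefficient; the function names and the per-thread event values are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


def pairwise_correlations(events):
    """Correlate every pair of events.

    `events` maps an event name to its per-thread values (hypothetical data).
    """
    return {
        (a, b): pearson(events[a], events[b])
        for a, b in combinations(sorted(events), 2)
    }


if __name__ == "__main__":
    events = {
        "compute": [1.0, 2.0, 3.0, 4.0],
        "MPI_Recv": [8.0, 6.0, 4.0, 2.0],
        "io": [2.0, 4.0, 6.0, 8.0],
    }
    for pair, r in pairwise_correlations(events).items():
        print(pair, round(r, 3))
```

In this made-up data, threads that spend more time computing spend proportionally less time waiting in MPI_Recv, which gives a coefficient near -1.0 and would appear as a descending line in a scatter plot.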
79. cation on the callgraph CREATE TABLE timer value what node in the callgraph and thread is this timer call data INT NOT NULL what metric is this metric INT NOT NULL The inclusive value for this timer inclusive value DOUBLE PRECISION The exclusive value this timer exclusive value DOUBLE ECISION The inclusive percent is timer inclusive percent DOU The exclusive percent exclusive percent DOU The variance for this sum exclusive squared DOU FOREIGN KEY timer call da ON DELETE NO ACTION ON FOREIGN KEY metric REFER ON DELETE NO ACTION ON Li o ED a E Ed H 1O E HN i ipa J a is timer ECISION RECISION UJ ct H Ps D Z QE Q ri FERENCES timer call data id NO ACTION jS metric id DATE NO ACTION Had Eal C one metric one thread one timer CREATE INDEX timer value index on timer value timer call data metric KOR KR KKK KK KK KK KKK A CREATE THE COUNTER RELATED TABLES KOR KR KKK KK kCkCK KK KK A counters measure some counted value CREATE TABLE counter id SERIAL NOT NULL PRIMARY KEY trial INT NOT NULL name VARCHAR NOT NULL FOREIGN K Y trial REFEREN
80. cessData get 0 derivor DeriveMetricOperation extracted PAPI L1 TCM PAPI L2 TCM PAP derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI FP INS P WALL CLOCK TI derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI FP INS PAPI TOT INS derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 print derived metrics f get the Statistics dostats BasicStatisticsOperation extracted False stats dostats processData print got stats return for metric in stats get 0 getMetrics grapher DrawMMMGraph stats metrics HashSet metrics add metric grapher set metrics metrics grapher setTitle GTC Phase Breakdown metric grapher setSeriesType DrawMMMGraph TRIALNAME grapher setCategoryType DrawMMMGraph EVENTNAM grapher setValueType AbstractResult INCLUSIVE grapher setXAxisLabel Iteration grapher setYAxisLabel Inclusive metric grapher setLogYAxis True grapher processData lt Hs graph the significant events in the
81. connection PERFDMF APPLICATION application char name extern TAUDB TRIAL perfdmf query trials TAUDB CONNECTION connection PERFDMF EXPERIMENT experiment get the data sources extern TAUDB DATA SOURCE taudb query data sources TAUDB CONNECTION connection extern TAUDB DATA SOURCE taudb get data source by id TAUDB DATA SOURCE data sources const int id extern TAUDB DATA SOURCE taudb get data source by name TAUDB DATA SOURCE data sources const char name using the properties set in the filter find a set of trials extern TAUDB TRIAL taudb query trials TAUDB CONNECTION connection boolean complete TAUDB TRIAL filter extern TAUDB PRIMARY METADATA taudb query primary metadata TAUDB CONNECTION connection TAUDB TRIAL filter extern TAUDB PRIMARY METADATA taudb get primary metadata by name TAUDB PRIMARY METADATA primary metadata const char name extern TAUDB SECONDARY METADATA taudb query secondary metadata TAUDB CONNECTION connection TAUDB TRIAL filter get the threads for a trial extern TAUDB THREAD taudb query threads TAUDB CONNECTION connection TAUDB TRIAL trial 118 TAUdb C API extern TAUDB THREAD taudb query derived threads TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB THREAD taudb get thread TAUDB THREAD threads int thread index extern int taudb get t
82. d by MPIScheduler::postMPIRecvs. The other 60 calls do not amount to much time.

11.6 User Event Statistics Window

Figure 11.8. User Event Statistics Window
[Screenshot: user events for n,c,t 2,0,0, sorted by Number of Samples]

NumSamples  Max     Min  Mean    Std. Dev  Name
380         281712  4    53601   94022     Message size received from all nodes
380         281600  4    53576   94001     Message size sent to all nodes
214         24      4    12.43   7.237     Message size for gather
181         112     4    23.823  40.191    Message size for reduce

This display shows a pprof-style text view of the user event data. Right-clicking on a User Event will give you the option to open a Bar Graph for that particular User Event across all threads. See Section 15.1, User Event Bar Graph.

11.7 User Event Thread Bar Chart

Figure 11.9. User Event Thread Bar Chart Window
[Screenshot: user events for thread n,c,t 0,0,0, value type Max Value, including message sizes and heap memory (KB) events]
83. db C API db add timer callpath to trial t rial timer callpath er call data gt key timer callpath er call data gt key thread threa er call data gt calls BK er call data gt subroutines 0 db add timer call data to trial d timer callpath trial timer call data gt metric metric or 5 million microsec mer value gt inclusive 5000000 mer value exclusive 5000000 timer value gt inclusive percentag m m er valu 5 seconds ct H 3 er_value gt exclusive_percentag er_value gt sum_exclusive_squared db_add_timer_value_to_timer_call onds 100 0 100 0 0 0 data timer call data timer value compute stats printf Computing Sta ESTO An taudb_compu save the printf Tes boolean update boolean cascade taudb save te statistics trial trial ting inserts n FALSE TRUE trial connection trial update printf Disconnecting n taudb_disco nnect connection cascade printf Done n return 0 include taudb_api h include lt stdio h gt include lt string h gt void dump_metadata TAUDB_PRIMARY_ME printf d metadata fields n TAUDB PRIMARY METADATA curren for current metadata current current printf s s n current gt name 31 4 2 Querying a trial from the database TADATA metadata HASH_COUNT metadata LS l NULL taud
84. deal scaling. The ratio between the expected scaling and the actual scaling is the relative efficiency. If there is more than one event to choose from, and you have not yet selected an event of interest, you may be prompted to select the event of interest (see Section 22.1.3, Event of Interest). If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item.

Figure 22.8. Relative Efficiency one Event
[Chart: relative efficiency for one event versus Number of Processors]

22.2.5 Relative Speedup

The Relative Speedup chart shows how an application scales with respect to relative speedup. That is, as the number of processors increases by a factor, the speedup is expected to increase by the same factor, with ideal scaling. The ideal speedup is charted along with the actual speedup for the application. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select
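The ideal and actual speedup series plotted by the Relative Speedup chart can be sketched as follows. This is a hedged illustration, not PerfExplorer code: it assumes the baseline is the trial with the fewest processors, that ideal speedup is the ratio of processor counts, and that actual speedup is the ratio of baseline time to measured time. All names and numbers are hypothetical.

```python
def relative_speedup(procs, times):
    """Return (procs, ideal_speedup, actual_speedup) rows.

    Assumption: the trial with the fewest processors is the baseline.
    """
    pairs = sorted(zip(procs, times))
    base_p, base_t = pairs[0]
    rows = []
    for p, t in pairs:
        ideal = p / base_p    # speedup expected with ideal scaling
        actual = base_t / t   # speedup actually measured
        rows.append((p, ideal, actual))
    return rows


if __name__ == "__main__":
    # Hypothetical timings, given in arbitrary order.
    for row in relative_speedup([64, 16, 32], [40.0, 100.0, 50.0]):
        print(row)
```

The gap between the ideal and actual columns is what the chart makes visible as the two curves diverge.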
85. different layers of instrumentation can be combined. To use tau_exec, place this command before the application executable when running the application. In this example, IO instrumentation is requested:

    > tau_exec -io ./a.out
    > mpirun -np 4 tau_exec -io ./a.out

1.3 TAU scripted compilation

1.3.1 Instrumentation

For more detailed profiles, TAU provides two means to compile your application with TAU: through your compiler or through source transformation using PDT.

1.3.2 Compiler Based Instrumentation

TAU provides these scripts: tau_cc.sh, tau_cxx.sh, tau_upc.sh, tau_f77.sh and tau_f90.sh to compile programs. You might use tau_cc.sh to compile a C program by typing:

    > module load tau
    > tau_cc.sh -tau_options=-optCompInst samplecprogram.c

On machines where a TAU module is not available, you will need to set the TAU makefile and/or options. The makefile and options control how TAU will compile your application. Use:

    > tau_cc.sh -tau_makefile=[path to makefile] -tau_options=[option] samplecprogram.c

The Makefile can be found in the [arch]/lib directory of your TAU distribution, for example x86_64/lib/Makefile.tau-mpi-pdt.

You can also use a Makefile specified in an environment variable. To run tau_cc.sh so it uses the Makefile specified by the environment variable TAU_MAKEFILE, type:

    > export TAU_MAKEFILE=[path to tau]/[arch]/lib/[make
86. e Options menu of the ParaProf Manager Window.

Figure 9.3. Creating Derived Metrics
[Screenshot: ParaProf Manager derived metric panel with Argument 1 set to PAPI_FP_INS, Argument 2 set to GET_TIME_OF_DAY, and the Divide operation selected]

In Figure 9.3, Creating Derived Metrics, we have just divided Floating Point Instructions by Wall clock time, creating FLOPS (Floating Point Operations per Second). The second argument is a user-editable text box and can be filled in with scalar values by using the keyword val (e.g. "val 1.5").

9.5 Main Data Window

Upon loading a profile, or double-clicking on a metric, the Main Data Window will be displayed.

Figure 9.4. Main Data Window
[Screenshot: ParaProf main data window showing the exclusive value of the selected metric, with std. dev., mean, and one bar per n,c,t thread]
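The derived metric operation shown in Figure 9.3 amounts to combining two metrics event by event. The following Python sketch illustrates that idea under stated assumptions: metrics are plain dictionaries keyed by event name, the second argument may be a scalar (as with the val keyword), and division by zero yields 0.0. Only Divide is confirmed by the text above; the other operation names, the function name and the sample values are hypothetical, and no unit conversion is attempted.

```python
import operator

# Only "Divide" appears in the guide; the other names are assumed.
OPERATIONS = {
    "Add": operator.add,
    "Subtract": operator.sub,
    "Multiply": operator.mul,
    "Divide": operator.truediv,
}


def derive_metric(arg1, arg2, operation):
    """Combine two metrics per event.

    arg1: dict mapping event name -> value
    arg2: dict with the same keys, or a scalar
    """
    fn = OPERATIONS[operation]
    derived = {}
    for event, v1 in arg1.items():
        v2 = arg2[event] if isinstance(arg2, dict) else arg2
        if operation == "Divide" and v2 == 0:
            derived[event] = 0.0   # assumption: avoid dividing by zero
        else:
            derived[event] = fn(v1, v2)
    return derived


if __name__ == "__main__":
    # Hypothetical values: floating point instructions and time per event.
    fp_ins = {"main": 2.0e6, "solve": 1.5e6}
    time = {"main": 4.0, "solve": 0.5}
    print(derive_metric(fp_ins, time, "Divide"))
    print(derive_metric(time, 1.5, "Multiply"))
```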
87. e profiled and how they are profiled. If you are using one of the TAU compiler wrapper scripts to instrument your application, you can use the -tau_options=-optTauSelectFile=<file> option to enable selective instrumentation.

Note: Selective instrumentation is only available when using source level instrumentation (PDT).

To specify a selective instrumentation file, create a text file and use the following guide to fill it in:

• Wildcards for routine names are specified with the # mark (because * symbols show up in routine signatures). The # mark is unfortunately the comment character as well, so to specify a leading wildcard, place the entry in quotes.
• Wildcards for file names are specified with * symbols.

Here is an example file:

    # Tell tau to not profile these functions
    BEGIN_EXCLUDE_LIST

    void quicksort(int *, int, int)
    # The next line excludes all functions beginning with "sort_" and having
    # arguments "int *"
    void sort_#(int *)
    void interchange(int *, int *)

    END_EXCLUDE_LIST

    # Exclude these files from profiling
    BEGIN_FILE_EXCLUDE_LIST

    *.so

    END_FILE_EXCLUDE_LIST

    BEGIN_INSTRUMENT_SECTION

    # A dynamic phase will break up the profile into phases where
    # each event is recorded according to the phase of the application
    # in which it occurred.
    dynamic phase name="foo1_bar" f
88. ead t ON cv thread t id E OR REPLACE VIEW atomic location profile ECT FROM atomic event value WHERE thread gt 0 Si n gt FH OR REPLACE VIEW atomic total summary ECT FROM atomic event value WHERE thread 2 FH 1 TE OR REPLACE VIEW atomic mean summary ELECT FROM atomic event value WHERE thread gt 1 110 Chapter 31 TAUdb C API 31 1 TAUdb C API Overview The C API for TAUGb is currently under development but there is a beta version of the API available The API provides the following capabilities 31 2 Loading trials from the database Inserting trials into the database Parsing TAU profile files TAUdb C Structures The C structures are roughly organized as a tree with a trial object at the root taudb trial A top level structure which contains the collections of all the performance data dimen sions taudb primary metadata Name value pairs which describe the properties of the trial taudb secondary metadata Name value pairs which describe the properties of the trial Unlike primary metadata values secondary metadata objects can have complex value types They are also associated with a measurement context a thread of execution a timer a timestamp an iteration etc taudb thread A structure which represents a thread of execution in the parallel measure
89. ead type id name description VALUES 5 MAX MAX INSERT INTO derived thread type id name description VALUES 6 MEAN MEAN nulls are 0 value INSERT INTO derived thread type id name description VALUES 7 STDDEV STDDEV nulls are 0 value RRR RK KKK kk k koe ke ke ke ke ke e e e e e x x x CREATE THE TRIAL TABLE RR KR KK KK KKK koc k ko ke ke ke ke e e ke e x x x trials are the top level table CREATE TABLE trial id SERIAL NOT NULL PRIMARY KEY name VARCHAR where did this data come from data source INT number of processes node count INT legacy values these are actually max values i e not all nodes have this many threads contexts per nod INT how many threads per node threads per context INT E total number of threads total_threads INT reference to the data source table FOREIGN KEY data_source REFERENCES data_source id ON DELETE NO ACTION ON UPDATE NO ACTION RR KKK KKK KK KK NA CREATE THE DATA DIMENSIONS ROR KKK KKK KR OK A threads are the location dimension CREATE TABLE thread id SERIAL NOT NULL PRIMARY KEY trial this thread belongs to trial INT NOT NULL process rank really node rank INT NOT NULL legacy value context rank I
90. ed in PerfExplorer. Both hierarchical and k-means analysis are used to group parallel profiles into common clusters, and then the clusters are summarized. Initially, we used similarity measures computed on a single parallel profile as input to the clustering algorithms, although other forms of input are possible. Here, the performance data is organized into multi-dimensional vectors for analysis. Each vector represents one parallel thread (or process) of execution in the profile. Each dimension in the vector represents an event that was profiled in the application. Events can be any sub-region of code, including libraries, functions, loops, basic blocks or even individual lines of code. In simple clustering examples, each vector represents only one metric of measurement. For our purposes, some dissimilarity value, such as Euclidean or Manhattan distance, is computed on the vectors. As discussed later, we have tested hierarchical and k-means cluster analysis in PerfExplorer on profiles with over 32K threads of execution with few difficulties.

20.1 Dimension Reduction

Often, many hundreds of events are instrumented when profile data is collected. Clustering works best with fewer than 10 dimensions, so dimension reduction is often necessary to get meaningful results. Currently, there is only one type of dimension reduction available in PerfExplorer. To reduce dimensions, the user specifies a minimum exclusive percentage for an event to be considered sig
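The dimension reduction and dissimilarity measures described above can be sketched in Python. This is an illustration under assumptions rather than PerfExplorer's implementation: the exclusive percentage of each event is taken as already known (for example, a mean across threads), an event is kept when it meets the minimum, and each thread is a vector of per-event values. All function names and sample data are hypothetical.

```python
import math


def reduce_dimensions(exclusive_percent, minimum):
    """Keep only events whose exclusive percentage is at least `minimum`.

    exclusive_percent: dict mapping event name -> exclusive percentage.
    Returns the surviving event names in sorted order.
    """
    return sorted(e for e, pct in exclusive_percent.items() if pct >= minimum)


def thread_vector(thread_values, events):
    """Build one thread's vector, one dimension per retained event."""
    return [thread_values.get(e, 0.0) for e in events]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical exclusive percentages and per-thread values.
    percents = {"MPI_Recv": 40.0, "main": 0.5, "compute": 59.5}
    kept = reduce_dimensions(percents, 1.0)
    t0 = thread_vector({"MPI_Recv": 3.0, "compute": 4.0, "main": 0.1}, kept)
    t1 = thread_vector({"MPI_Recv": 0.0, "compute": 0.0, "main": 0.2}, kept)
    print(kept, euclidean(t0, t1), manhattan(t0, t1))
```

Dropping insignificant events before computing distances keeps the vectors short, which is the point of reducing dimensions ahead of clustering.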
91. ency By Event chart shows how each event in an application scales with respect to relative efficiency. That is, as the number of processors increases by a factor, the time to solution is expected to decrease by the same factor, with ideal scaling. The ratio between the expected scaling and the actual scaling is the relative efficiency. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item.

Figure 22.7. Relative Efficiency by Event
[Chart: relative efficiency of each event versus Number of Processors, one series per event]

22.2.4 Relative Efficiency for One Event

The Relative Efficiency for One Event chart shows how one event from an application scales with respect to relative efficiency. That is, as the number of processors increases by a factor, the time to solution is expected to decrease by the same factor, with i
92. es (TAU)');
INSERT INTO data_source (name, id, description) VALUES ('DynaProf', 2, 'PAPI DynaProf profiles (UTK)');
INSERT INTO data_source (name, id, description) VALUES ('mpiP', 3, 'mpiP Lightweight Scalable MPI Profiling (Vetter)');
INSERT INTO data_source (name, id, description) VALUES ('HPM', 4, 'HPM Toolkit profiles (IBM)');
INSERT INTO data_source (name, id, description) VALUES ('gprof', 5, 'gprof profiles (GNU)');
INSERT INTO data_source (name, id, description) VALUES ('psrun', 6, 'PerfSuite psrun profiles (NCSA)');
INSERT INTO data_source (name, id, description) VALUES ('pprof', 7, 'TAU pprof.dat output (TAU)');
INSERT INTO data_source (name, id, description) VALUES ('Cube', 8, 'Cube data (FZJ)');
INSERT INTO data_source (name, id, description) VALUES ('HPCToolkit', 9, 'HPC Toolkit profiles (Rice Univ.)');
INSERT INTO data_source (name, id, description) VALUES ('SNAP', 10, 'TAU Snapshot profiles (TAU)');
INSERT INTO data_source (name, id, description) VALUES ('OMPP', 11, 'OpenMP Profiler profiles (Fuerlinger)');
INSERT INTO data_source (name, id, description) VALUES ('PERIXML', 12, 'Data Exchange Format (PERI)');
INSERT INTO data_source (name, id, description) VALUES ('GPTL', 13, 'General Purpose Timing Library (ORNL)');
INSERT INTO data_source (name, id, description) VALUES ('Paraver', 14, 'Paraver profiles (BSC)');
INSERT INTO data_source (name, id, description) VALUES ('IPM', 15, 'Integrated Performance Monitoring (NERSC)');
INSERT INTO data_source (name, id, description) VALUES ('Google
93. ess than 16. Views can also have Sub Views. For example, it might be useful to have a View of all Trials from a certain machine and then Sub Views for each executable run on that machine. Trials can belong to any number of Views and Sub Views, and new Trials loaded to the database will be sorted into Views automatically.

8.1 To Create a Sub View

Launch ParaProf, right-click on a database or an existing View, and select Add View or Add Sub View.

Figure 8.1. Add View
[Screenshot: TAU ParaProf Manager tree of databases and Views, with the Add Trial / Add View context menu open]

In ParaProf and PerfExplorer, Views are marked by the Folder Icon, and Trials are now marked with a yellow ball. The All T
94. ex TAUDB TAUDB db nex TAUDB TAUDB T db next TAUDB M db nex TAUDB M db TA db au au au DB next next 1 alysis routines n void taudb co DATA SO ay mpu Lk URCE timer call data TAUDB TIMER VALUE timer value taudb parse tau profiles const char directory name te statistics TAUDB TRIAL trial data source by name from connection MORO DATA DATA SO data source by id from connection DATA SOU EAD hread by RLS tric by 4 Q tric by SOURCE F URCE current Lk RCE current index from trial TAUDB THR name from trial TAUDB MI id from trial TAUDB MI F RANGI ES Ta nge_by_id_from_trial TAU DB TI next M er by name from trial TAUDB DB TI next M er by id f rom trial DB TI next M name from group TAUDB TIM DB TI M D UE GRO er group by name fro EAD current ETRIC current ETRIC current h DB TIME RANGE current TIM ER current TAUDB TIMER current ER current m trial Hl SZ JC D UE p E H CG GRO RO D UE current r group by name fro m timer al SAA m D m R m R mer by R m x D m Ve EH D D x R GROUE PARAMET r parame D D Hl SAHA H H H does
95. exit PerfExplorer and restart it to see the sub-view. This is a known problem with the application, and will be fixed in a future release.

Figure 25.9. Completed sub-views
[Screenshot: PerfExplorer Client view tree for application gyro.B1-std.HPM, with sub-views by trial problem definition (FULL, HALF, MIN), each holding trials for 16, 32, 64, 96 and 128 processors]

Chapter 26. Running PerfExplorer Scripts

As of version 2.0, PerfExplorer has officially supported a scripting interface. The scripting interface is useful for adding automation to PerfExplorer. For example, a user can load a trial, perform data reduction, extract out key phases, derive metrics, and plot the result.

26.1 Analysis Components

There are many operations available, including:

• BasicStatisticsOperation
• CopyOperation
• CorrelateEventsWithMetadata
• CorrelationOperation
• DeriveMetricOperation
• DifferenceMetadataOperation
• DifferenceOperation
• DrawBoxChartGraph
• DrawGraph
• DrawMMMGraph
• ExtractCallpathEventOperation
• ExtractEventOperation
• ExtractMetricOperation
• ExtractNonCallpathEventOperati
96. f interest (i.e. FP_OPS, TIME)
4. Phase of execution (i.e. iteration number, timestamp)
5. Dynamic timer context (i.e. parameter values)

Counter dimensions:

1. Process and thread of execution
2. Timer source code location (i.e. foo())
3. Phase of execution (i.e. iteration number, timestamp)
4. Dynamic timer context (i.e. parameter values)

SQL for TAUdb

Below is the SQL schema definition for TAUdb.

-- CREATE THE STATIC TABLES

CREATE TABLE schema_version (
    version     INT NOT NULL,
    description VARCHAR NOT NULL
);

-- IF THE SCHEMA IS MODIFIED, INCREMENT THIS VALUE
-- 0 = PERFDMF (ORIGINAL)
-- 1 = TAUDB (APRIL, 2012)
INSERT INTO schema_version (version, description) VALUES (1, 'TAUdb redesign from Spring, 2012');
INSERT INTO schema_version (version, description) VALUES (2, 'Changes after Nov. 9, 2012 release');

-- These are our supported parsers.
CREATE TABLE data_source (
    id          INT UNIQUE NOT NULL,
    name        VARCHAR NOT NULL,
    description VARCHAR
);

INSERT INTO data_source (name, id, description) VALUES ('ppk', 0, 'TAU Packed profiles (TAU)');
INSERT INTO data_source (name, id, description) VALUES ('TAU profiles', 1, 'TAU profil
97. file]
    > export TAU_OPTIONS=-optCompInst
    > tau_cc.sh sampleCprogram.c

Similarly, if you want to set compile-time options like selective instrumentation, you can use the TAU_OPTIONS environment variable.

Source Based Instrumentation

TAU provides these scripts: tau_cc.sh, tau_cxx.sh, tau_upc.sh, tau_f77.sh and tau_f90.sh to instrument and compile programs. You might use tau_cc.sh to compile a C program by typing:

    > module load tau
    > tau_cc.sh samplecprogram.c

When setting TAU_MAKEFILE, make sure the Makefile name contains pdt, because you will need a version of TAU built with PDT.

A list of options for the TAU compiler scripts can be found by typing man tau_compiler.sh or in this chapter of the reference guide.

Options to TAU compiler scripts

These are some commonly used options available to the TAU compiler scripts. Either set them via the TAU_OPTIONS environment variable or pass them with the -tau_options= option to tau_cc.sh, tau_cxx.sh, tau_upc.sh, tau_f77.sh and tau_f90.sh.

    -optVerbose     Enable verbose output (default: on)
    -optKeepFiles   Do not remove intermediate files
    -optShared      Use shared library of TAU (consider when using tau_exec)

1.4 Selectively Profiling an Application

1.4.1 Custom Profiling

TAU allows you to customize the instrumentation of a program by using a selective instrumentation file. This instrumentation file is used to manually control which parts of the application ar
98. file based databases when security is not an issue taudb configure Configuration file NOT found a new configuration file will be created Welcome to the configuration program for PerfDMF This program will prompt you for some information necessary to nsure the desired behavior for the PerfDMF tools You will now be prompted for new values if desired The current or default values for each prompt are shown in parenthesis To accept the current default value just press Enter Return Pleas nter the name of this configuration documentation example Pleas nter the database vendor oracle postgresql mysql db2 derby or h2 h2 Pleas nter the JDBC jar file Pl Users khuck src tau2 apple lib h2 jar he JDBC Driver name org h2 Drive as nter bs EE F 6C Pleas nter the path to the database directory Users khuck ParaProf documentation example Pleas nter the database username D Store the database password in CLEAR TEXT in your configuration file y n y Pl Pl as nter the database password as nter the PerfDMF schema file 96 Introduction Users khuck src Writing configurat tau2 etc taudb sql ion file Users khuck ParaProf perfdmf cfg documentation example Now testing your database connection Database created jdbc h2 Users khu Uploading Schema Found Users khuck c
99. fmatmult f90 31 9 36 14 223 39 zx Loop INITIALIZE matmull f90 10 9 14 14 223 24 Loop INITIALIZE matmult f90 17 9 21 14 171 855 Loop MAIN matmult f20 71 9 74 14 170 862 Esc Loop MAIN matmult f20 112 9 115 14 122 96 Loop MAIN matmult f90 117 9 4128 14 37 549 MULTIPLY MATRICES 21 367 B INITIALIZE 13 795 Loop MAIN matmul 90 86 9 106 14 11 MPI Comm geen 8 935 Loop MAIN matmull 90 779 84 14 1 131 MPI Send 0 794 MPI Comm rank 0 647 MPI Bcast 0 355 MPI Recv 0 171 MPI Barrier 0 115 MPI_Finalize 0 023 MAI Here is how to generate a flat profile with FP operations 15 Some Common Application Scenario setenv TA setenv TA cat selec MAKEFILE opt apps tau tau2 x86 64 lib Makefile tau papi mpi pdt pgi OPTIONS optTauSelectFile select tau optVerbose tau EGIN INSTRUMENT SECTION oops routine j4 ND INSTRUMENT SECTION o dp oe CT CC Flr UJ un WU t path opt apps tau tau2 x86 64 bin Spath make F90 tau f90 sh Or edit Makefile and change F90 tau f90 sh setenv TAU METRICS GET TIME OF DAY PAPI FP INS qsub run job paraprof pack app ppk Move the app ppk file to your desktop paraprof app ppk Choose Options gt Show Derived Panel gt Arg 1 PAPI FP INS Arg 2 GET TIME OF DAY Operation Divide gt Apply close 6 4 Q Who calls MPI Barrier Whe
100. hot tau.slog2

Launching Jumpshot will bring up the main display window showing the entire trace; zoom in to see more detail.

Figure 4.2. Main Data Window

Chapter 5. Quick Reference

tau_run
    TAU's binary instrumentation tool
tau_cc.sh -tau_options=-optCompInst / tau_cxx.sh -tau_options=-optCompInst / tau_f90.sh -tau_options=-optCompInst / tau_upc.sh -tau_options=-optCompInst / tau_f77.sh -tau_options=-optCompInst
    Compiler instrumentation
tau_cc.sh / tau_cxx.sh / tau_f90.sh / tau_f77.sh / tau_upc.sh
    PDT instrumentation
TAU_MAKEFILE
    Set instrumentation definition file
TAU_OPTIONS
    Set instrumentation options
dynamic phase name="name" file="filename" line=start_line_# to line=end_line_#
    Specify dynamic Phase
loops file="filename" routine="routine name"
    Instrument outer loops
memory file="filename" routine="routine name"
    Track memory
io file="filename" routine="routine name"
    Track IO
TAU_PROFILE / TAU_TRACE
    Enable profiling and/or tracing
PROFILEDIR / TRACEDIR
    Set profile/trace output directory
TAU_CALLPATH=1 / TAU_CALLPATH_DEPTH
    Enable Callpath profiling, set callpath depth
TAU_THROTTLE=1 / TAU_THROTTLE_NUMCALLS / TAU_THROTTLE_PERCALL
    Enable event throttling, set number of calls and per-call (us) threshold
TAU_METRICS
    List of PAPI metrics to profi
101. hs may obscure it.

11.5 Thread Call Path Relations Window

Figure 11.7. Thread Call Path Relations Window
[Screenshot: call path data for n,c,t 1,1,1, metric GET_TIME_OF_DAY, sorted by exclusive time in seconds, with columns Exclusive, Inclusive, Calls/Tot.Calls and Name[id]]

This display shows callpath data in a gprof-style view. Each function is shown with its immediate parents. For example, Figure 11.7, Thread Call Path Relations Window, shows that MPI_Recv is called from two places for a total of 9.052 seconds. Most of that time comes from the 30 calls when MPI_Recv is calle
102. id; /* database id, also key to hash */
  struct taudb_trial* trial;
  int node_rank;    /* which process does this thread belong to? */
  int context_rank; /* which context? USUALLY 0 */
  int thread_rank;  /* what is this thread's rank in the process */
  int index;        /* what is this thread's OVERALL index? ranges from 0 to trial.thread_count-1 */
  struct taudb_secondary_metadata* secondary_metadata;
  UT_hash_handle hh;
} TAUDB_THREAD;

/* metrics are things like TIME, PAPI counters, and derived metrics. */
typedef struct taudb_metric {
  int id;             /* database value, also key to hash */
  char* name;         /* key to hash hh2 */
  boolean derived;    /* was this metric measured, or created by a post-processing tool? */
  UT_hash_handle hh1; /* hash index for hashing by id */
  UT_hash_handle hh2; /* hash index for hashing by name */
} TAUDB_METRIC;

/* Time ranges are ways to delimit the profile data within time ranges.
   They are also useful for secondary metadata which is associated with
   a specific call to a function. */
typedef struct taudb_time_range {
  int id; /* database value, also key to hash */
  int iteration_start;
  int iteration_end;
  uint64_t time_start;
  uint64_t time_end; /* was this metric measured */
  UT_hash_handle hh;
} TAUDB_TIME_RANGE;

/* timers are interval timers, capturing some interval value. For callpath or
   phase profiles, the parent refers to the calling function or phase. Timers can also be sample locations
103. ile="foo.c" line=26 to line=27

    # instrument all the outer loops in this routine
    loops file="loop_test.cpp" routine="multiply"

    # tracks memory allocations/deallocations as well as potential leaks
    memory file="foo.f90" routine="INIT"

    # tracks the size of read, write and print statements in this routine
    io file="foo.f90" routine="RINB"

    END_INSTRUMENT_SECTION

Selective instrumentation files can be created automatically from ParaProf by right-clicking on a trial and selecting the Create Selective Instrumentation File menu item.

Chapter 2. Profiling

This chapter describes running an instrumented application, generating profile data, and analyzing that data. Profiling shows the summary statistics of performance metrics that characterize application performance behavior. Examples of performance metrics are the CPU time associated with a routine, the count of the secondary data cache misses associated with a group of statements, the number of times a routine executes, etc.

2.1 Running the Application

After instrumentation and compilation are completed, the profiled application is run to generate the profile data files. These files can be stored in a directory specified by the environment variable PROFILEDIR. By default, profiles are placed in the current directory. You can also set the TAU_VERBOSE environment variable to see the steps the TAU measurement system takes when your application is running. Example
104. imer callpath on call data timer callpath EX counter name index on counter name timer call data thread on timer call data thread RO RO RO RO RO RO RO RO RO VIE VIEW VIEW VIE VIEW VIEW TU UU tU FEI PG TO DU SHORT TI mos ERM FIX tly VIEW IF VIEW IF VIEW IF VIEW IF IF IF IF EF IF IF EAT E OR REPLACE These views make sure that charts for now nterval_location_profile nterval_mean_summary nterval_total_summary nterval_event_value nterval event tomic location profile Comic mean summary tomic total summary tomic event value tomic event D D D D DH H H H pH EW interval event R E S HHATH PAO ZZ Z P PHA HH hj zZ Z DA DO Source file ROM timer callpath ER JOIN ER JOIN m h N E OR trial REPLACE group name source file line number line number end ECT tcp id t trial t name tg group name t line number t line number end tcp timer t ON tcp timer t id timer group tg ON tg timer t id VIEW interval event value n EE 3 HI ct ct Zwad Z OR D ELI hread ran inclusiv terval event clusive node context thread metric inclusive percentage exclusive percentage exclusive call subrout
105. in Data Window
10. 3-D Visualization
10.1 Triangle Mesh Plot
10.2 3-D Bar Plot
10.3 3-D Scatter Plot
10.4 3-D Topology Plot
10.5 3-D Communication Matrix
11. Thread Based Displays
11.1 Thread Bar Graph
11.2 Thread Statistics Text Window
11.3 Thread Statistics Table
11.4 Call Graph Window
11.5 Thread Call Path Relations Window
11.6 User Event Statistics Window
11.7 User Event Thread Bar Chart
12. Function Based Displays
12.1 Function Bar Graph
12.2 Function Histogram
13. Phase Based Displays
13.1 Using Phase Based Displays
14. Comparative Analysis
14.1 Using Comparative Analysis
15. Miscellaneous Displays
15.1 User Event Bar Graph
15.2 Ledgers
15.2.1 Function Ledger
15.2.2 Group Ledger
15.2.3 User Event Ledger
15.3 Selective Instrumentation
106. ines clusive per call ECT tcd timer callpath t node rank t context rank tv metric tv inclusive percent valu sum exclusive sguared Ed E subrouti timer R JOIN R JOIN tv exclusive percent tv exclusive value tcd calls tv inclusive value tcd calls tv sum exclusive squared value tv imer call data tcd on tv timer call data tcd id bread t on tcd thread t id Up vD TE OR ECT H E OR RE VIEW interval location profile interval event value WHERE thread gt 0 nD u gt I ECT VIEW interval total summary from interval event value WHERE thread 2 109 Database Schema Ju CREATE OR REPLACE VIEW interval_mean_summary AS SELECT from interval event value WHERE thread 1 CREATE OR REPLACE VIEW atomic event id trial name group name source file line number AS SELECT c id c trial c name NULL NULL NULL FROM counter c CREATE OR REPLACE VIEW atomic event value atomic event node context thread sample count maximum value minimum value mean value standard deviation AS SELECT cv counter t node rank t context rank t thread rank cv sample count cv maximum value cv minimum value cv mean value cv standard deviation FROM counter value cv INNER JOIN thr
107. ing PerfExplorer

To run PerfExplorer, type:

    perfexplorer

When PerfExplorer loads, the left side of the window lists all the experiments that were loaded into PerfDMF. You can select the performance data you are interested in by navigating the tree structure. PerfExplorer lets you run analysis operations on these experiments. The cluster analysis results are visible on the right side of the window. Various types of comparative analysis are available from the drop-down menu.

To run an analysis operation, first select the metric of interest from the experiments on the left. Then perform the operation by selecting it from the Analysis menu. If you would like, you can set the clustering method, dimension reduction, normalization method and the number of clusters from the same menu.

The options under the Charts menu provide analysis over one or more applications, experiments, views or trials. To view these charts, first choose a metric of interest by selecting a trial from the tree on the left. Then optionally choose Set Metric of Interest or Set Event of Interest from the Charts menu (if you don't, and you need to, you will be prompted). Now you can view a chart by selecting it from the Charts menu.

Chapter 20. Cluster Analysis

Cluster analysis is a valuable tool for reducing large parallel profiles down to representative groups for investigation. Currently, there are two types of clustering analysis implement
108. ion is done likewise with the right mouse button. Zooming is done with the mousewheel and the + and - keyboard buttons.

10.1 Triangle Mesh Plot

Figure 10.1. Triangle Mesh Plot

This visualization method shows two metrics for all functions, all threads. The height represents one chosen metric, and the color another. These are selected from the drop-down boxes on the right. To pinpoint a specific value in the plot, move the Function and Thread sliders to cycle through the available functions/threads. The values of the two metrics are shown for the selected point; in this case, for MPI_Recv() on node 351, the value is 14.37 seconds.

10.2 3-D Bar Plot

Figure 10.2. 3-D Mesh Plot

This visualization method is similar to the triangle mesh plot. It simply displays the data using 3-D bars instead of a mesh. The controls work the same. Note that in Figure 10.2, 3-D Mesh Plot, the transparency option is selected, which changes the way in which the selection model operates.

10.3 3
109. is chapter describes the Function Bar Graph Window and the Function Histogram Window.

12.1 Function Bar Graph

Figure 12.1. Function Bar Graph (Function Data Window for MPI_Barrier(); metric Time, exclusive value, in seconds)

This display graphs the values that the particular function had for each thread along with the mean and standard deviation ac
110. -- iteration start and end can be used to indicate which loop iterations
    -- or calls to a function are relevant for this time range
    CREATE TABLE time_range (
        id               SERIAL NOT NULL PRIMARY KEY,
        -- starting iteration
        iteration_start  INT NOT NULL,
        -- ending iteration
        iteration_end    INT,
        -- starting timestamp
        time_start       BIGINT NOT NULL,
        -- ending timestamp
        time_end         BIGINT
    );

    -- timer_call_data records have the dynamic information for when a node
    -- in the callgraph is visited by a thread. If you are tracking dynamic
    -- callstacks, you would use the time_range field. If you are storing
    -- snapshot data, you would use the time_range field.
    CREATE TABLE timer_call_data (
        id              SERIAL NOT NULL PRIMARY KEY,
        -- what callgraph node is this?
        timer_callpath  INT NOT NULL,
        -- what thread is this?
        thread          INT NOT NULL,
        -- how many times this timer was called
        calls           INT,
        -- how many subroutines this timer called
        subroutines     INT,
        -- what is the time_range? this is for supporting snapshots
        time_range      INT,
        FOREIGN KEY(timer_callpath) REFERENCES timer_callpath(id)
            ON DELETE NO ACTION ON UPDATE NO ACTION,
        FOREIGN KEY(thread) REFERENCES thread(id)
            ON DELETE NO ACTION ON UPDATE NO ACTION,
        FOREIGN KEY(time_range) REFERENCES time_range(id)
            ON DELETE NO ACTION ON UPDATE NO ACTION
    );

    -- timer values have the timer of one timer on one thread for one metric, at one lo
111. l create database taudb with owner taudb

MySQL

From the MySQL prompt:

    mysql> create database taudb;

Oracle

It is recommended that you create a tablespace for taudb:

    create tablespace taudb datafile '/path/to/somewhere' size 500m reuse;

Then create a user that has this tablespace as default:

    create user amorris identified by db;
    grant create session to amorris;
    grant create table to amorris;
    grant create sequence to amorris;
    grant create trigger to amorris;
    alter user amorris quota unlimited on taudb;
    alter user amorris default tablespace taudb;

TAUdb is set up to use the Oracle Thin Java driver. You will have to obtain this jar file for your DBMS. In our case, it was ojdbc14.jar.

Configure a TAUdb connection

To configure TAUdb, run the taudb_configure program from the TAU bin directory. The configuration program will prompt the user for several values. The default values will work for most users. When configuration is complete, it will connect to the database and test the configuration. If the configuration is valid and the schema is not already found in the database (as will be the case on initial configuration), the schema will be uploaded. Be sure to specify the correct version of the schema for your DBMS.

An example session for configuring a database is below. The user is creating an H2 database with default settings, including no username and no password (recommended for
112. le treemerge.pl: Merge traces to one file
tau2otf, tau2vtf, tau2slog2: Trace conversion tools

Chapter 6. Some Common Application Scenario

6.1 Q. What routines account for the most time? How much?

A. Create a flat profile with wallclock time.

Figure 6.1. Flat Profile (metric P_VIRTUAL_TIME, exclusive value, in seconds)

Here is how to generate a flat profile with MPI:

    % setenv TAU_MAKEFILE /opt/apps/tau/tau2/x86_64/lib/Makefile.tau-mpi-pdt-pgi
    % set path=(/opt/apps/tau/tau2/x86_64/bin $path)
    % make F90=tau_f90.sh
      (Or edit Makefile and change F90=tau_f90.sh)
    % qsub run.job
    % paraprof --pack app.ppk
      Move the app.ppk file to your desktop.
    % paraprof app.ppk

6.2 Q. What loops account for the most time? How much?

A. Create a flat profile with wallclock time with loop instrumentation.

Figure 6.2. Flat Profile with Loops (metric GET_TIME_OF_DAY, exclusive value, in microseconds)
113. lecting the metric of interest, select the Do Correlation Analysis item under the Analysis main menu bar item. A confirmation dialog will appear, and you can either confirm the correlation request or cancel it. After confirming the correlation, the analysis will begin. When the analysis results are available, you can view them in the Correlation Results tab.

Figure 21.4. Correlation Results
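This section does not say which correlation statistic PerfExplorer computes. As a rough illustration of what a correlation between two metrics means, the sketch below assumes the Pearson coefficient and uses made-up per-thread values; the function name and the data are hypothetical and are not part of PerfExplorer.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-thread values for two metrics (assumed data).
wall_clock = [10.0, 12.0, 14.0, 16.0]
total_cycles = [2.0e9, 2.4e9, 2.8e9, 3.2e9]
r = pearson(wall_clock, total_cycles)
```

A value of r close to 1 or -1 indicates that the two metrics rise and fall together (or in opposition) across threads; a value near 0 indicates little linear relationship.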
114. ler directives or source transformation using PDT. Here is a table that lists the features/requirements for each method:

Table 1.1. Different methods of instrumenting applications

Columns: Method; Requires recompiling; Requires PDT; Shows MPI events; Routine-level events; Low-level events (loops, phases, etc.); Throttling to reduce overhead; Ability to exclude file from instrumentation

Interposition: Yes Yes
Compiler: Yes Yes Yes Yes Yes
Source: Yes Yes Yes Yes Yes Yes Yes

The requirements for each method increase as we move down the table: tau_exec only requires a system with shared library support. Compiler-based instrumentation requires recompiling the target application, and Source instrumentation additionally requires PDT. For this reason we often recommend that users start with Library interposition and move down the table if more features are needed.

1.2 Dynamic instrumentation through library pre-loading

Dynamic instrumentation is achieved through library pre-loading. The libraries chosen for pre-loading determine the scope of instrumentation. Some options include tracking MPI, io, memory, cuda and opencl library calls. MPI instrumentation is included by default; the others are enabled by command-line options to tau_exec. More info at the tau_exec manual page. Dynamic instrumentation can be used on both uninstrumented binaries and binaries instrumented via one of the methods below, in this way
115. mary_metadata primary_metadata S PERFDMF APPLICATION endif TAUDB STRUCTS H 31 3 TAUdb C API ifndef TAUDB API H define TAUDB API H 1 117 TAUdb C API include taudb structs h when a get function is called this global has the number of top level objects that are returned extern int taudb numItems the database version extern enum taudb database schema version taudb version to connect to the database extern TAUDB CONNECTION taudb connec extern TAUDB CONNECTION taudb connec config char config name config file char config file name test the connection status extern int taudb check connection TAUDB CONNECTION connection disconnect from the database extern int taudb disconnect TAUDB CONNECTION connection KR KR KR KK hh eh NA query functions KR KR KKK KK OK KK KK A functions to support the old database schema avoid these if you can extern PERFDMF APPLICATION perfdmf query applications TAUDB CONNECTION connection extern PERFDMF EXPERIMENT perfdmf query experiments TAUDB CONNECTION connection PERFDMF APPLICATION application extern PERFDMF APPLICATION perfdmf query application TAUDB CONNECTION connection char name extern PERFDMF EXPERIMENT perfdmf query experiment TAUDB CONNECTION
116. ment

taudb_time_range: A structure which holds a time range value of beginning and ending iteration numbers or timestamps.

taudb_metric: A structure which represents a unit of measurement, such as TIME, FP_OPS, L1_DCM, etc.

taudb_timer: A structure which represents a region of code. For example, a phase, a function, a loop, a basic block, or even a line of code.

taudb_timer_parameter: A structure which represents parameter values, when parameter-based profiling is used.

taudb_timer_group: A structure which represents a semantic grouping of timers, such as "I/O", "MPI", "OpenMP", etc.

taudb_timer_callpath: A structure which represents a node in the dynamic callpath tree. Timer_callpaths with a null parent are either top-level timers or timers in a flat profile.

taudb_timer_call_data: A structure which represents a tuple between a thread of execution and a node on the timer callpath tree.

taudb_timer_value: A structure which represents a tuple between a timer_call_data object and a metric. The timer_value contains the measurement of one metric for one timer on one thread of execution.

taudb_counter: A structure which represents a counter in the profile. For example, the number of bytes transferred on an MPI_Send() timer.

taudb_counter_value: A structure which represents a counter measurement on one thread of execution.

Below are the object definitions from the TAUdb C header file:

    #ifndef TAU
117. Figure 11.5. Thread Statistics Table

The display can be used in one of two ways: in inclusive/exclusive mode, both the inclusive and exclusive values are shown
118. n mappings between timers and groups typedef struct taudb_timer_group char name struct taudb_timer timers UT_hash_handle trial_hash_by_name UT_hash_handle timer_hash_by_name hash of timers using timer hash handle hh3 hash handle for trial hash handle for timers TAUDB TIMER GROUP ame foo x y timer parameters are parameter based profile values an example is foo timer would be the index of the timer with the n e E x y where x 4 and y 10 in that example lt x gt lt 4 gt lt y gt lt 10 gt this table would have two ntries one for the x value and one for the y value he parameter can also be a phase iteration index typedef struct taudb timer parameter char name char value UT hash handle hh TAUDB TIMER PARAMETE R callpath objects contain the merged dynamic callgraph tree seen during execution typedef struct taudb timer callpath int id link back to database and hash key struct taudb timer timer struct taudb timer callpath parent which timer is this callgraph parent char name a string which has the aggregated callpath UT hash handle hh1 hash key for hash by id UT hash handle hh2 hash key for name a gt b gt c lookup TAUDB TIMER CALLPATH timer call data objects are observations of a node of the callgraph
119. ned volume with rank points in order of rank. For more information, please see the etc/topology directory for additional details on MESP topology definitions.

If the loaded profile is a cube file or a profile from a BGB, then this visualization groups the threads in two- or three-dimensional space using topology information supplied by the profile.

When topology metadata is available, a trial-specific topological layout may be visualized by selecting 3D Visualization from the Windows menu and then selecting Topology Plot on the visualization pane.

The layout tab allows control of the layout and display of visualized cores/processes. Minimum/Maximum Visible restricts display of nodes with measured values above/below the selected levels. Lock Range causes the sliders to move in unison. The X/Y/Z Axis sliders allow selection of planes, lines and individual points in the topology for examination of specific values in the display, listed in the Avg. Color Value field.

The topology selection dropdown box allows selection of either trial-specific topologies contained in the metadata, mapped topologies stored in an external file, or a custom topology defined by the size of the prism containing the visualized cores. The button allows selection of a custom topology mapping file, while the map button allows selection of a map file (see <tau2>/etc/topology/README.cray_map for more information on generating map files). If a C
120. ng
2. Profiling
2.1 Running the Application
2.2 Reducing Performance Overhead with TAU_THROTTLE
2.3 Profiling each event callpath
2.4 Using Hardware Counters for Measurement
3. Tracing
3.1 Generating Event Traces
4. Analyzing Parallel Applications
4.1 Text summary
4.2 ParaProf
4.3 Jumpshot
5. Quick Reference
6. Some Common Application Scenario
6.1 Q. What routines account for the most time? How much?
6.2 Q. What loops account for the most time? How much?
6.3 Q. What MFlops am I getting in all loops?
6.4 Q. Who calls MPI_Barrier()? Where?
6.5 Q. How do I instrument Python Code?
6.6 Q. What happens in my code at a given time?
6.7 Q. How does my application scale?

Chapter 1. Tau Instrumentation

1.1 Types of Instrumentation

TAU provides three methods to track the performance of your application: library interposition using tau_exec, compi
121. nificant. To reduce dimensions, select the Select Dimension Reduction item under the Analysis main menu bar item. The following dialog will appear:

Figure 20.1. Selecting a dimension reduction method

Select "Over X Percent". The following dialog will appear:

Figure 20.2. Entering a minimum threshold for exclusive percentage

Enter a value, for example "1".

20.2 Max Number of Clusters

By default, PerfExplorer will attempt k-means clustering with values of k from 2 to 10. To change the maximum number of clusters, select the Set Maximum Number of Clusters item under the Analysis main menu item. The following dialog will appear:

Figure 20.3. Entering a maximum number of clusters

20.3 Performing Cluster Analysis

To perform cluster analysis, you first need to select a metric. To select a metric, navigate through the tree of applications, experiments and trials, and expand the trial of interest, showing the available metrics, as shown in the figure below:

Figure 20.4. Selecting a Metric to Cluster
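The two settings described above can be pictured with a small sketch. It assumes that "Over X Percent" keeps only events whose exclusive time percentage is greater than X (with X between 0 and 100), and that the cluster counts tried are every k from 2 up to the configured maximum. The function names and the sample data are hypothetical; this is not PerfExplorer's own code.

```python
def over_x_percent(exclusive_percent, threshold):
    """Keep only events whose exclusive time percentage exceeds threshold."""
    if not 0 < threshold < 100:
        raise ValueError("threshold must be between 0 and 100")
    return {
        name: pct
        for name, pct in exclusive_percent.items()
        if pct > threshold
    }


def candidate_cluster_counts(max_clusters=10):
    """Values of k to try, from 2 up to and including max_clusters."""
    if max_clusters < 2:
        raise ValueError("max_clusters must be at least 2")
    return list(range(2, max_clusters + 1))


# Hypothetical exclusive-time percentages per event (assumed data).
events = {"MPI_Recv()": 42.0, "sweep()": 31.5, "MPI_Wtime()": 0.2}
kept = over_x_percent(events, 1)
ks = candidate_cluster_counts()
```

With a threshold of 1, the event that accounts for 0.2 percent of exclusive time is dropped before clustering, which reduces the number of dimensions each thread is described by.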
122. nterest. To request this chart, select one experiment or view, and select this chart item under the Charts main menu item.

Figure 22.15. Relative Speedup per Phase

22.3.3 Phase Fraction of Total Runtime

The Phase Fraction of Total Runtime chart shows the breakdown of the execution by phases, and shows how that breakdown changes as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 22.1.2, Metric of Interest). To request this chart, select one experiment or view, and select this chart item under the Charts main menu item.

Figure 22.16. Phase Fraction of Total Runtime
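The quantities these two charts plot can be sketched in a few lines. The sketch assumes that relative speedup is measured against the run with the fewest processors, and that a phase's fraction is its time divided by the sum over all phases; both definitions, the function names and the sample numbers are assumptions made for illustration, not taken from PerfExplorer.

```python
def relative_speedup(times_by_procs):
    """Speedup of each run relative to the run with the fewest processors."""
    base_procs = min(times_by_procs)
    base_time = times_by_procs[base_procs]
    return {p: base_time / t for p, t in sorted(times_by_procs.items())}


def phase_fractions(phase_times):
    """Fraction of total runtime spent in each phase."""
    total = sum(phase_times.values())
    return {phase: t / total for phase, t in phase_times.items()}


# Hypothetical timings: seconds for one phase at several processor counts.
speedup = relative_speedup({16: 100.0, 32: 50.0, 64: 40.0})

# Hypothetical timings: seconds per phase at one processor count.
fractions = phase_fractions({"Iteration 0": 30.0, "Iteration 1": 10.0})
```

In this made-up data the phase scales ideally from 16 to 32 processors (speedup 2.0) and falls short of ideal at 64 (speedup 2.5 instead of 4.0).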
123. The function ledger shows each function along with its current color. As with other displays showing functions, you may right-click on a function to launch other function-specific displays.

15.2.2 Group Ledger

Figure 15.3. Group Ledger
124. How to examine a series of profiles in PerfExplorer:

    % setenv TAU_MAKEFILE /opt/apps/tau/tau2/x86_64/lib/Makefile.tau-mpi-pdt
    % set path=(/opt/apps/tau/tau2/x86_64/bin $path)
    % make F90=tau_f90.sh
      (Or edit Makefile and change F90=tau_f90.sh)
    % qsub run1p.job
    % paraprof --pack 1p.ppk
    % qsub run2p.job
    % paraprof --pack 2p.ppk
      ... and so on.
    On your client:
    % taudb_configure --create-default
      (taudb_configure run without any arguments will prompt for advanced options)
    % perfexplorer_configure
      (Yes to load schema, defaults)
    % paraprof
      (load each trial: Right click on trial -> Upload trial to DB)
    % perfexplorer
      (Charts -> Speedup)

Part II. ParaProf User's Manual

Table of Contents
7. Introduction
7.1 Using ParaProf from the command line
7.2 Supported Formats
7.3 Command line Options
8. Views and Sub-Views
8.1 To Create a (Sub-)View
9. Profile Data Management
9.1 ParaProf Manager Window
9.2 Loading Profiles
9.3 Database Interaction
9.4 Creating Derived Metrics
9.5 Ma
125. ommand Ck ParaProf documentation example perfdmf AUTO SERVER TRU EC Users khuck src tau2 etc taudb sql src tau2 etc taudb sql Loading
    Successfully uploaded schema
    Database connection successful.
    Configuration complete.

Chapter 29. Using TAUdb

The easiest way to interact with TAUdb is to use ParaProf, which provides a GUI interface to all of the database information. In addition, the following command-line utilities are provided.

29.1 perfdmf_createapp (deprecated; only supported for older PerfDMF databases)

This utility creates applications with a given name:

    perfdmf_createapp -n "New Application"
    Created Application, ID: 24

29.2 perfdmf_createexp (deprecated; only supported for older PerfDMF databases)

This utility creates experiments with a given name, under a specified application:

    perfdmf_createexp -a 24 -n "New Experiment"
    Created Experiment, ID: 38

29.3 taudb_loadtrial

This utility uploads a trial to the database with a given name, under a specified experiment:

    taudb_loadtrial -h
    Usage: perfdmf_loadtrial -a <appName> -x <expName> -n <name> [options] <files>

    Required Arguments:
      -n, --name <text>               Specify the name of the trial
      -a, --applicationname <string>  Specify associated application name for this trial
      -x, --experimentname <string>   Specify associated experiment name for this trial
                   ...or...
      -n, --name <te
126. on, TRUE);
      int numTrials = taudb_numItems;
      for (t = 0; t < numTrials; t = t + 1) {
        printf("Trial name %s id %d\n", trials[t].name, trials[t].id);
        dump_metadata(trials[t].primary_metadata);
        dump_secondary_metadata(trials[t].secondary_metadata);
        dump_trial(connection, &(trials[t]), TRUE);
      }
      printf("Disconnecting\n");
      taudb_disconnect(connection);
      printf("Done\n");
      return 0;
    }
127. on
• ExtractPhasesOperation
• ExtractRankOperation
• KMeansOperation
• LinearRegressionOperation
• LogarithmicOperation
• MergeTrialsOperation
• MetadataClusterOperation
• PCAOperation
• RatioOperation
• ScalabilityOperation
• TopXEvents
• TopXPercentEvents

26.2 Scripting Interface

The scripting interface is in Python, and scripts can be used to build analysis workflows. The Python scripts control the Java classes in the application through the Jython interpreter (http://www.jython.org/). There are two types of components which are useful in building analysis scripts. The first type is the PerformanceResult interface, and the second is the PerformanceAnalysisComponent interface. For documentation on how to use the Java classes, see the javadoc in the perfexplorer source distribution, and the example scripts below. To build the perfexplorer javadoc, type

    make javadoc

in the perfexplorer source directory.

26.3 Example Script

    from glue import PerformanceResult
    from glue import PerformanceAnalysisOperation
    from glue import ExtractEventOperation
    from glue import Utilities
    from glue import BasicStatisticsOperation
    from glue import DeriveMetricOperation
    from glue import MergeTrialsOperation
    from glue import TrialResult
    from glue import AbstractResult
    from glue import DrawMMMGraph
    from edu.uoregon.tau.perfdmf import Trial
    from j
128. or each thread
    CREATE TABLE primary_metadata (
        trial  INT NOT NULL,
        name   VARCHAR NOT NULL,
        value  VARCHAR,
        FOREIGN KEY(trial) REFERENCES trial(id)
            ON DELETE NO ACTION ON UPDATE NO ACTION
    );

    -- create an index for faster queries against the primary_metadata table
    CREATE INDEX primary_metadata_index on primary_metadata (trial, name);

    -- secondary metadata is metadata that could be nested, could
    -- contain unique data for each thread, and could be an array
    CREATE TABLE secondary_metadata (
        id              VARCHAR NOT NULL PRIMARY KEY,
        -- trial this value belongs to
        trial           INT NOT NULL,
        -- this metadata value could be associated with a thread
        thread          INT,
        -- this metadata value could be associated with a timer that happened
        timer_callpath  INT,
        -- which call to the context timer was this?
        time_range      INT,
        -- this metadata value could be a nested structure
        parent          VARCHAR,
        -- the name of the metadata field
        name            VARCHAR NOT NULL,
        -- the value of the metadata field
        value           VARCHAR,
        -- this metadata value could be an array, so tokenize it
        is_array        BOOLEAN DEFAULT FALSE,
        FOREIGN KEY(trial) REFERENCES trial(id)
            ON DELETE NO ACTION ON UPDATE NO ACTION,
        FOREIGN KEY(thread) REFERENCE
129. ortran, C, C++, Java and Python. TAU (Tuning and Analysis Utilities) is capable of gathering performance information through instrumentation of functions, methods, basic blocks and statements. The TAU API also provides selection of profiling groups for organizing and controlling instrumentation. Calls to the TAU API are made by probes inserted into the execution of the application via source transformation, compiler directives or library interposition.

This guide is organized into different sections. Readers wanting to get started right away can skip to the Common Profile Requests section for step-by-step instructions for obtaining different kinds of performance data. Or browse the starters guide for a quick reference to common TAU commands and variables.

TAU can be found on the web at: http://tau.uoregon.edu

Part I. Tau User Guide

Table of Contents
1. Tau Instrumentation
1.1 Types of Instrumentation
1.2 Dynamic instrumentation through library pre-loading
1.3 TAU scripted compilation
1.3.1 Instrumentation
1.3.2 Compiler Based Instrumentation
1.3.3 Source Based Instrumentation
1.3.4 Options to TAU compiler scripts
1.4 Selectively Profiling an Application
1.4.1 Custom Profili
130. otal threads TAUDB THREAD threads get the metrics for a trial extern TAUDB METRIC taudb query metrics TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB METRIC taudb get metric by name TAUDB METRIC metrics const char name extern TAUDB METRIC taudb get metric by id TAUDB METRIC metrics const int id get the time ranges for a trial extern TAUDB TIME RANGE taudb query time range TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB TIME RANGE taudb get time range TAUDB TIM Ex RANGE time_ranges const int id get the timers for a trial extern TAUDB_TIMER taudb query timers TAUDB CONNECTION connection TAUDB TRIAL trial extern TAUDB TIMER taudb get timer by id TAUDB TIMER timers int id extern TAUDB TIMER taudb get trial timer by name TAUDB TIMER timers const char id extern TAUDB TIMER taudb get trial timer by name TAUDB TIMER timers const char id extern TAUDB TIMER GROUP taudb query timer groups TAUDB CONNECTION connection TAUDB TRIAL trial extern void taudb parse timer group names TAUDB TRIAL trial TAUDB TIMER timer char group names extern TAUDB TIMER GROUP E taudb get timer group from trial by name TAUDB TIMER GROUP timers const char name extern T
131. ownload profile data, edit meta-data, launch visual displays, export data, derive new metrics, etc.

9.2 Loading Profiles

To load profile data, select File -> Open, or right-click on the Application's tree and select "Add Trial".

Figure 9.2. Loading Profile Data

Select the type of data from the "Trial Type" drop-down box. For TAU Profiles, select a directory; for other types, select files.

9.3 Database Interaction

Database interaction is done through the tree view of the ParaProf Manager Window. Applications expand to Experiments, Experiments to Trials, and Trials are loaded directly into ParaProf just as if they were read off disk. Additionally, the meta-data associated with each element is shown on the right, as in Figure 9.1, ParaProf Manager Window. A trial can be exported by right-clicking on it and selecting "Export as Packed Profile".

New trials can be uploaded to the database by either right-clicking on an entity in the database and selecting "Add Trial", or by right-clicking on an Application/Experiment/Trial hierarchy from the "Standard Applications" and selecting "Upload Application/Experiment/Trial to DB".

9.4 Creating Derived Metrics

ParaProf can create derived metrics using the Derived Metric Panel, available from th
132. periment 12 and give the trial the name "HPM data 01".

perfdmf_loadtrial -a "NPB2.3" -x "parametric" -n "64" par64.ppk

This will load the packed profile par64.ppk into the experiment named "parametric" under the application named "NPB2.3", and give the trial the name "64". The application and experiment will be created if not found.

TAUdb supports a large number of parallel profile formats:
- TAU Profiles (profiles): Output from the TAU measurement library. These files generally take the form of profile.X.X.X, one for each node/context/thread combination. When multiple counters are used, each metric is located in a directory prefixed with "MULTI__". To launch ParaProf with all the metrics, simply launch it from the root of the MULTI__ directories.
- ParaProf Packed Format (ppk): Export format supported by PerfDMF/ParaProf. Typically *.ppk.
- TAU Merged Profiles (snap): Merged and snapshot profile format supported by TAU. Typically tauprofile.xml.
- TAU pprof (pprof): Dump output from TAU's pprof -d. Provided for backward compatibility only.
- DynaProf (dynaprof): Output from DynaProf's wallclock and papi probes.
- mpiP (mpip): Output from mpiP.
- gprof (gprof): Output from gprof; see also the --fixnames option.
- PerfSuite (psrun): Output from PerfSuite psrun files.
- HPM Toolkit (hpm): Output from IBM's HPM Toolkit.
- Cube (cube): Output from Kojak Expert tool for use with Cube.
- Cube3 (c
133. piler cc name compiler cc version compiler java dirpath compiler java version compiler userdata configure prefix configure arch configure pp configure cc corifigure_jdk configure profile configure userdata userdata Valus HPhi 16 1 In order to examine this data in a scalability study it is necessary to reorganize the data However it is not necessary to re load the data Using views in PerfExplorer you can re organize the data based on values in the database 25 1 Creating Views To create a view select the Create New View item under the Views main menu item The first step is to select the table which will form the basis of the view The three possible values are Application Experiment and Trial Figure 25 2 Selecting a table 84 Views OOO select Level Select a level in the hierarchy Application he After selecting the table you need to select the column on which to filter Figure 25 3 Selecting a column OOO Select Column Select a column to filter name Hd After selecting the column you need to select the operator for comparing to that column Figure 25 4 Selecting an operator OOO select Operator Select an operator LS Cancel Cox After selecting the operator you need to select the value for comparing to the column Figure 25 5 Selecting a value eoo Select Value Select a value to filter required Uintah WEI gyro B1 std BI zm
134. rA e El e H md Ho R_PARAMET current R ter by name from timer ER curre CALLPATH r callpa SAA 25 hy j O R CALLPAT CALLPATH r callpat Hl Sud 25 j O R CALLPAT CALL DATA h by name n E from trial H curren h by id f t rom trial H curren ta by key Ey from trial er call da ER CALL H SZ H rt CALL DATA Ct ri UJ ct H UJ ct Hi UJ cf Hi UJ ct H LU cf H UJ eft er call da DATA curre nt from trial o D ID h R CALL R M M R M M R VALUE ta by id DATA curre Ht 122 TAUdb C API extern TAUDB E extern TAUDB extern TAUDB de taudb next timer value by metric from timer call data Ur A taudb nex TAUD taudb nex TAUD taudb nex TAUD UJ UJ ct Oa UJ ct ECONDARY M secondary metadata by key from trial SECONDARY METADATA current ECONDARY METADATA S S TAUDB TIMER VALUE current extern TAUDB COUN taudb next counter by name from trial TAUDB COUNTER current extern TAUDB COUNTER taudb next counter by id from trial TAUDB COUNTER current extern TAUDB COUNTER VALUE taudb next counter value by key from trial TAUDB COUNTER VALUE current ARY METADATA TER
135. re A Create a callpath profile with given depth o o op oe oe oe Figure 6 4 Callpath Profile Call Graph for n c t 0 0 0 tmp private File Options Windows Help Here is how to generate a callpath profile with MPI setenv TAU MAKEFILE opt apps tau tau2 x86 64 lib Makefile tau mpi pdt set path opt apps tau tau2 x86 64 bin path make F90 tau f90 sh Or edit Makefile and change F90 tau f90 sh AP o oe oe 16 Some Common Application Scenario setenv TAU CALLPATH 1 setenv TAU CALLPATH DEPTH 100 oo ap qsub run job paraprof pack app ppk Move the app ppk file to your desktop paraprof app ppk Windows Thread gt Call Graph o oe 6 5 Q How do instrument Python Code A Create an python wrapper library Here to instrument python code setenv TAU MAKEFILE opt apps tau tau2 x86 64 lib Makefile tau icpc python mpi p set path opt apps tau tau2 x86 64 bin path setenv TAU OPTIONS optShared optVerbose Python needs shared object based TAU library make F90 tau f90 sh CXX tau cxx sh CC tau cc sh build pyMPI w TAU cat wrapper py import tau def OurMain import App tau run OurMain Uninstrumented mpirun lsf pyMPI 2 4b4 bin pyMPI App py Instrumented setenv PYTHONPATH lt taudir gt x86 64 lib bindings python mpi pdt pgi same options string as TAU MAKEFILE setenv LD LIBRARY PATH taudir x86
136. reel rre PR Cy ot ex Qro pp eo io ere dese eo Ufo Pr eun 40 12 2 Eunction Histogram 2 o rure ee ete SE NE Pee eng 40 13 T Imtial Phase Display ost eee etu ettet e eese tee detur AE uds 42 13 2 Phase Ledger Sisi CER ee Na edb ads x eU Pe 42 13 3 Function Data over Phases eee otro desea bess Coan ce ese oe tee ox oes by Tone Ee STR 43 14 1 Comparison Window initial sise 44 14 2 Comparison Window 2 trials ss 44 14 3 Comparison Window 3 threads sss eem eem eene eene 45 15 1 User Event Bar Graph 1554 un erem Dre Perte EPI EPIS 46 15 2 Function Ledger eege eege stud trie erre petit EE Leber use Se 46 15 3 Group Led ger i Pe OPE Geop eti dates aes 47 15 4 User Event Ledger EE 47 15 5 Selective Instrumentation Dialog 48 16 1 ParaProf Preferences Window ssssssessssseseenee e ee ee me me he rhe rennes 49 16 2 Edit Default Colors ik ER ee Dea o dE ep X END eO as 50 16 3 Col r E EE 50 20 1 Selecting a dimension reduction method sesse 57 20 2 Entering a minimum threshold for exclusive percentage 57 20 3 Entering a maximum number of clusters esses 58 20 4 Selecting 4 M tric to Cluster eer tonne terne tag Fehr Re SEENEN 58 20 5 Confirm Clustering Options irreereeeneeeeeeeeeeeeneeeneeenee emm rhe rennen 58 20 6 Cluster Res ults eese po E ER ENEE sigs teas EE E a 59 20 7 Cluster Member
137. rials View is created when a database is created. This will launch the View Creator window. [Figure 8.2: View Creator Window] Here you can create the rule(s) for which Trials appear in this new View. At the top, you can choose to match all of the rules or to match any of the rules. The "-" and "+" buttons will remove the current rule or add a new one. The first drop-down box chooses which metadata field to use. The second box chooses whether the field should be read as a string or a number. Depending on whether it is read as a string or a number, the fourth box will give options on how to compare the metadata field
138. rname the database username 112 TAUdb C API char db password the database password for username char db schemafile full or relative path to the schema file used for configuration not used in C API j TAUDB CONFIGURATION typedef enum taudb database schema version TAUDB 2005 SCHEMA TAUDB 2012 SCHEMA TAUDB SCHEMA VERSION typedef struct taudb data source int id char name char description UT hash handle hh1 hash index for hashing by id UT hash handle hh2 hash index for hashing by name TAUDB DATA SOURCE typedef struct taudb connection TAUDB CONFIGURATION configuration if defined TAUDB POSTGRESQL PGconn connection PGresult res TAUDB PREPARED STATEMENT statements elif defined TAUDB SOLITE sglite3 connection Sqlite3 stmt ppStmt int EG fendif TAUDB SCHEMA VERSION schema version boolean inTransaction boolean inPortal TAUDB DATA SOURCE data sources by id TAUDB DATA SOURCE data sources by name TAUDB CONNECTION these are the derived thread indexes define TAUDB MEAN WITHOUT NULLS 1 define TAUDB TOTAL 2 define TAUDB STDDEV WITHOUT NULLS 3 define TAUDB MIN 4 define TAUDB MAX 5 define TAUDB MEAN WITH NULLS 6 define TAUDB STDDEV WITH NULLS 7 trials are the top level structure typedef struct taudb trial
139. ross the threads. You may also change the units and metric displayed from the Options menu. 12.2 Function Histogram. [Figure 12.2: Function Histogram, showing number of threads against Exclusive Time (seconds) for MPI_Barrier] This display shows a histogram of each thread's value for the given function. Hover the mouse over a given bar to see the range (minimum and maximum) and how many threads fell into that range. You may also change the units and metric displayed from the Options menu. You may also dynamically change how many bins are used (1-100) in the histogram. This option is available from the Options menu. Changing the number of bins can dramatically change the shape of the histogram; play around with it to get a feel for the true distribution of the data. Chapter 13. Phase Based Displays. When a profile contains phase data, ParaProf will automatically run in phase mode. Most displays will show data for a particular phase. This phase will be displayed in the top left corner in the meta-data panel. 13.1 Using Phase Based Displays. The initial window will default to the top-level phase, usually main. [Figure 13.1: Initial Phase Display]
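The binning behind the Function Histogram in section 12.2 can be illustrated with a short sketch. This is not ParaProf's implementation; the function name `bin_thread_values` is hypothetical, and equal-width bins spanning the minimum and maximum per-thread value are an assumption. Only the 1-100 limit on the bin count comes from the text.

```c
#include <stddef.h>

/* Hypothetical helper, not part of ParaProf: count how many per-thread
 * values fall into each of nbins equal-width bins spanning [min, max].
 * The 1..100 limit on nbins mirrors the range quoted in the guide.
 * Returns 0 on success, -1 on invalid arguments. */
int bin_thread_values(const double *values, size_t nthreads,
                      size_t nbins, size_t *counts)
{
    if (values == NULL || counts == NULL || nthreads == 0 ||
        nbins < 1 || nbins > 100) {
        return -1;
    }

    /* Find the range of the data. */
    double min = values[0], max = values[0];
    for (size_t i = 1; i < nthreads; i++) {
        if (values[i] < min) min = values[i];
        if (values[i] > max) max = values[i];
    }

    for (size_t b = 0; b < nbins; b++) {
        counts[b] = 0;
    }

    double width = (max - min) / (double)nbins;
    for (size_t i = 0; i < nthreads; i++) {
        size_t b = 0;  /* all values equal: everything lands in bin 0 */
        if (width > 0.0) {
            b = (size_t)((values[i] - min) / width);
            if (b >= nbins) b = nbins - 1;  /* the maximum goes in the last bin */
        }
        counts[b]++;
    }
    return 0;
}
```

Calling this with the same values and different bin counts shows why the guide suggests experimenting: a single outlier thread stretches the range, so a small bin count can hide structure that a larger one reveals.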
140. rumentation File Generator [Figure residue: Selective Instrumentation File Generator dialog, with the options "Exclude Throttled Routines" and "Exclude Lightweight Routines", Lightweight Routine Exclusion Rules (Microseconds per call: 10, Number of calls: 100000), and a list of Excluded Routines] Note: Only the functions profiled in ParaProf can be excluded. If you had previously set up selective instrumentation for this application, the functions that were previously excluded will no longer be excluded. Chapter 16. Preferences. Preferences are modified from the ParaProf Preferences Window, launched from the File menu. Preferences are saved between sessions in the .ParaProf/ParaProf.prefs file. 16.1 Preferences Window. In addition to displaying the text statistics for User Defined Events, ParaProf can also graph a particular User Event across all threads. [Figure 16.1: ParaProf Preferences Window]
141. ry of the TAU data in the current directory For performance data with multiple metrics move into one of the director les to get information about that metric gt cd MULTI P WALL CLOCK TIM gt pprof Reading Profile files in profile E NODE 0 CONTEXT 0 THREAD 0 Time Exclusive Inclusive Call Subrs Inclusive Name msec total msec usec call 100 0 24 590 1 iL 590963 main 959 26 566 1 2 566911 multiply 47 3 279 279 1 0 279280 multiply opt 44 1 260 260 1 0 260860 multiply regula 4 2 ParaProf To launch ParaProf execute paraprof from the command line where the profiles are located Launching ParaProf will bring up the manager window and a window displaying the profile data as shown below Figure 4 1 Main Data Window X ParaProt uintah16 ppk packed data 0 x File Options Windows Help Metric P_WALL_CLOCK_TIME alue Exclusive std dev mean n c t 0 0 0 n c t 10 0 n c t 2 0 0 n c t 3 0 0 n c t 4 0 0 n c t5 0 0 n c t 6 0 0 nct 7 0 0 n c t 8 0 0 n c t 9 0 0 n c t 10 0 0 n c t 11 0 0 n c t 12 0 0 n c t 13 0 0 n c t 14 0 0 n c t 15 0 0 For more information see the ParaProf section in the reference guide 11 Analyzing Parallel Applications 4 3 Jumpshot To use Argonne s Jumpshot bundled with TAU first merge and convert TAU traces to slog2 format tau treemerge pl tau2slog2 tau trc tau edf o tau slog2 jumps
142. ser to enter an expression. Double-clicking on a metric in the Performance Data tree will copy that metric's name into the box. If a metric contains any operands, the whole metric must be surrounded by quotes. If you would like the metric to be renamed, start the expression with the new name and an equals sign. If this is the only metric you wish to derive, select the trial, experiment or application where the metric should be derived and then click Apply. If you wish to derive many metrics, click "Add to List" and create more expressions. 27.2 Selecting Expressions. If you have added multiple expressions, you can select one or many of them to apply. They will be derived from top to bottom. After you have selected some, you can select the trial, experiment or application to apply the expression to and then click Apply. 27.3 Expression Files. You can also derive metrics using an expression file. An expression file has a single expression per line. To parse the file, select the trial, experiment or application to apply the expressions to, then select File > Parse Expression File and choose the file. Part IV. TAUdb. Table of Contents: 28. Introduction; 28.1 ...; 28.2 Installation; 29. Using TAUdb; 29.1 perfdmf_createapp (deprecated, only supported for older PerfDMF
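The expression format described in sections 27.1 and 27.3 (an optional new name, an equals sign, then the expression, one per line) can be illustrated with a small sketch. This is not PerfExplorer's parser; the function `split_expression` is hypothetical, and treating double quotes as the quoting character is an assumption, since the text only says "quotes". The sketch does not trim whitespace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not PerfExplorer code: split one expression line of
 * the form NewName=expression at the first '=' that is not inside double
 * quotes. If there is no such '=', name becomes "" and the whole line is
 * copied to expr. Returns 0 on success, -1 if an argument is NULL or a
 * destination buffer is too small. */
int split_expression(const char *line, char *name, size_t name_cap,
                     char *expr, size_t expr_cap)
{
    if (line == NULL || name == NULL || expr == NULL ||
        name_cap == 0 || expr_cap == 0) {
        return -1;
    }

    /* Locate the first unquoted '='. */
    const char *eq = NULL;
    int in_quotes = 0;
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;
        } else if (*p == '=' && !in_quotes) {
            eq = p;
            break;
        }
    }

    size_t name_len = (eq != NULL) ? (size_t)(eq - line) : 0;
    const char *rhs = (eq != NULL) ? eq + 1 : line;
    size_t expr_len = strlen(rhs);
    if (name_len >= name_cap || expr_len >= expr_cap) {
        return -1;
    }

    memcpy(name, line, name_len);
    name[name_len] = '\0';
    memcpy(expr, rhs, expr_len + 1);  /* includes the terminating NUL */
    return 0;
}
```

For an expression file, a caller would apply this to each line in order, which matches the top-to-bottom derivation order the text describes.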
143. [Figure residue: PCA Results scatterplot] Figure 20.9: Cluster Virtual Topology. Figure 20.10: Cluster Average Behavior (Total Runtime by Cluster Number for sPPM; the legend lists OpenMP barrier and loop regions in runhyd3.F)
144. ship Histogram sss emm e meme ener 59 20 8 Cluster Membership Scatterplot ss 60 vii TAU User Guide 20 9 Cluster Virtual Topology sise 61 20 10 Cluster Average Behavior 454 21 488 oe eene tcr ete tee jast Aeon 62 21 1 Selecting a dimension reduction method sss 64 21 2 Entering a minimum threshold for exclusive percentage 64 21 3 Selecting a Metric to Cl ster 22 oce tese rsen ue dre pe ete ss E teke Syene 64 21 4 Correlation Results tot ettet emer Poe Pe TE REOR ESPERE SEDE PRESE bess 65 21 5 Correlation Example aee ple ner ee desks YE Ie E etuer 66 22 1 Setting Group of Interest 1554 eerte De Poe torte e hee ERR EE PERE ENNEN 68 22 2 Setting Metric of Interest eme hem eme nee rennen 68 22 3 Setting Event of Interest enc pa E ass berks ete e Po ei rias 68 22 4 Setting TIMEStEPS aoreet rien e aree tette espe e dose kest Uudses OP opp Se Ee 69 22 5 Timesteps per Second A tasse tt enr Pret meek dee eh T 69 22 6 Relative Efficiency aise sim ft pile EE ese I EL este ee RL 70 22 7 Relative Efficiency by Event ss 71 22 8 Relative Efficiency one Event n eott oeeet tree teer pa EO Eee e eorr e Eo Oboe t epigr Oed 71 22 0 Relative Speedup eU a p PER eH 72 22 10 Relative Speedup by Event ettet re Ee PR RC ERR E SIT res 72 22 11 Relative Speedup one Event ss 73 22 12 Group of
145. sis is the process of organizing data points into logically similar groupings, called clusters. Summarization is the process of describing the similarities within, and dissimilarities between, the discovered clusters. Association is the process of finding relationships in the data. One such method of association is regression analysis, the process of finding independent and dependent correlated variables in the data. In addition, comparative analysis extends these operations to compare results from different experiments, for instance, as part of a parametric study. In addition to the data mining operations available, the user may optionally choose to perform comparative analysis. The types of charts available include time-steps per second, relative efficiency and speedup of the entire application, relative efficiency and speedup of one event, relative efficiency and speedup for all events, relative efficiency and speedup for all phases, and runtime breakdown of the application by event or by phase. In addition, when the events are grouped together, such as in the case of communication routines, yet another chart shows the percentage of total runtime spent in that group of events. These analyses can be conducted across different combinations of parallel profiles and across phases within an execution. Chapter 18. Installation and Configuration. 18.1 PerfExplorer uses TAUdb and PerfDMF databases, so if you have not already, you will need
146. There are a number of controls for the custom charts. They are:
- Main Only: When selected, only the main event (the event with the highest inclusive value) will be selected. When deselected, the "Events" control (see below) is activated, and one or all events can be selected.
- Call Paths: When selected, callpath events will be available in the "Events" control (see below).
- Log Y: When selected, the Y axis will be the log of the true value.
- Scalability: When selected, the chart will be interpreted as a speedup chart. The trial with the fewest number of threads of execution will be considered the baseline trial.
- Efficiency: When selected, the chart will be interpreted as a relative efficiency chart. The trial with the fewest number of threads of execution will be considered the baseline trial.
- Strong Scaling: When deselected
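The Scalability and Efficiency controls above both compare each trial to a baseline trial, the one with the fewest threads of execution. The text does not give PerfExplorer's formulas, so the sketch below assumes the standard strong-scaling definitions: speedup is baseline time over trial time, and efficiency is speedup divided by the growth in thread count. The function names are hypothetical.

```c
/* Assumed standard definition, not taken from PerfExplorer:
 * relative speedup of a trial against the baseline trial. */
double relative_speedup(double base_time, double trial_time)
{
    return base_time / trial_time;
}

/* Assumed standard definition: relative efficiency is the measured speedup
 * divided by the ideal speedup, where the ideal speedup is the ratio of the
 * trial's thread count to the baseline's thread count. */
double relative_efficiency(double base_time, int base_threads,
                           double trial_time, int trial_threads)
{
    double ideal = (double)trial_threads / (double)base_threads;
    return relative_speedup(base_time, trial_time) / ideal;
}
```

For example, with a 16-thread baseline that runs in 100 seconds, a 64-thread trial that runs in 50 seconds has a relative speedup of 2 against an ideal of 4, so its relative efficiency is 0.5.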
147. tion method. Select "Over X Percent". The following dialog will appear: [Figure 21.2: Entering a minimum threshold for exclusive percentage] Enter a value, for example "1". 21.2 Performing Correlation Analysis. To perform correlation analysis, you first need to select a metric. To select a metric, navigate through the tree of applications, experiments and trials, and expand the trial of interest, showing the available metrics, as shown in the figure below. [Figure 21.3: Selecting a Metric to Cluster] After se
148. tion of the values across the threads will be used as the value 11 1 Thread Bar Graph Figure 11 1 Thread Bar Graph MS n c t 0 0 0 Application 13 Experiment 23 Trial 58 Mil HE File Options Windows Help Metric Time Value Exclusive Units seconds 13 878 e MP Eer 4 983 a ML ng 1483 e bits 1397 Eg ths 1287 END buts 3 1 069 E MPI Wait0 E 0 867 E MPI Sendo 0 769 E jacu 0 759 EI jacld 0 691 exchange 1 0 415 bcast inputs 0 088 exchange 3 0 069 setiv 0 03 exact 0 024 MPI Allgather 0 015 erhs 0 014 read input 0 012 error 0 01 MPI_Allreduceg 0007 MPI iren En This display graphs each function on a particular thread for comparison The metric units and sort order can be changed from the Options menu 11 2 Thread Statistics Text Window Figure 11 2 Thread Statistics Text Window 34 Thread Based Displays X n c t 0 0 0 Application 13 Experiment 23 Trial 58 O n x File Options Windows Help Metric Name Time Sorted By Exclusive Units seconds SIIT a Total Time Exclusive Inclusive Calls Child Calls Inclusive Call Name 49 8 13 878 13 878 80000 0 1 7348E 4 MPI_Recv 18 0 4 983 5 008 I 2 5 008 MPI Init 8 5 1 483 2 368 40000 80000 5 9202E 5 bits 9 9 1 397 2 76 251 502 0 011 rhs 55 7 1 287 15 528 40000 80000 3 8819E 4 buts 3 8 1 069 1 069 5
149. tion ssr secsi estu er sea HH emere 64 21 2 Performing Correlation Analysis s seen seca seca sean eeu eees 64 22 Charts REPE 68 22 1 Setting Parameters erm Leere meer tere IS RENEE rene REPE 68 22 1 1 Grou p of Interest ecce nitore rude esse 68 22 12 Metric of Interest uis eee 68 22 1 3 Event of Interest ee eerte teer emere tore RR EE E reet 68 22 1 4 Total Number of Timesteps ocoooconcconccnnccnnncnnccnncnnncnnccnnccnnccnnins 69 22 2 Standard Chart Types EE 69 22 27 Timesteps Per Second ceci ted p 69 222 2 Relative Bfficlen6y s oo etre spp muette nor DA UEe Per ope dere 70 22 2 3 Relative Efficiency by Event etrerseerrree 70 22 2 4 Relative Efficiency for One Event ocooocccccnnccnnccnncnnccnnccnnconnccnnions 71 22 2 5 Relative Speedup eee tetur IE ttes 72 22 2 6 Relative Speedup by Event 72 22 2 7 Relative Speedup for One Event 73 22 2 8 Group of Total Runtime seen eeneene 73 22 2 9 Runtime Breakdown esses 74 22 3Phase Chart Ly pes oc e ER Ere Dee REN 74 22 3 1 Relative Efficiency per Phase 75 22 3 2 Relative Speedup per Phase 75 22 3 3 Phase Fraction of Total Runtime 76 23 Custom Charts eet e E repa det reto DE Oed EA cds 77 24 Visualization 2 ctr E ER Ter Pt Aa OO Rue E ERE en dE ae 79 24 1 3D Visualization ii mee HE
150. ube3: Output from Kojak Expert tool for use with Cube and Cube4.
- HPCToolkit (hpc): XML data from hpcquick. Typically, the user runs hpcrun, then hpcquick on the resulting binary file.
- OpenMP Profiler (ompp): CSV format from the ompP OpenMP Profiler (http://www.ompp-tool.com). The user must use OMPP_OUTFORMAT=CVS.
- PERI XML (perixml): Output from the PERI data exchange format.
- General Purpose Timing Library (gptl): Output from the General Purpose Timing Library.
- Paraver (paraver): 2D output from the Paraver trace analysis tool from BSC.
- IPM (ipm): Integrated Performance Monitoring format, from NERSC.
- Google (google): Google Profiles.

TAUdb Views. In order to provide flexible data management, the application/experiment/trial hierarchy was removed in the conversion from PerfDMF to TAUdb. In addition, trial metadata was promoted from an XML blob in PerfDMF to queryable tables. Users can now organize their data in arbitrary hierarchies using Views and SubViews. Creating and using Views is outlined in the ParaProf User Manual in Chapter 2.

Chapter 30. Database Schema. 30.1 The database schema in TAUdb is designed to flexibly and efficiently store multidimensional parallel performance data. There are 5 dimensions to the actual timer measurements and 4 dimensions to the counter measurements. Timer dimensions: 1. Process and thread of execution. 2. Timer source code location (i.e. foo()). 3. Metric o
151. udb add primary metadata to trial trial pm 123 TAUdb C API 1 alternatively you can allocate the primary metadata in blocks pm taudb create primary metadata 10 pm 0 name taudb strdup ClientID pm 0 value taudb strdup joe user taudb add primary metadata to trial trial amp pm 0 pm 1 name taudb strdup hostname pm 1 value taudb strdup hopper04 taudb add primary metadata to trial trial amp pm 1 pm 2 name taudb strdup Operating System pm 2 value taudb_strdup Linux taudb add primary metadata to trial trial amp pm 2 pm 3 name taudb strdup Release pm 3 value taudb strdup 2 6 32 36 0 5 default taudb add primary metadata to trial trial amp pm 3 pm 4 name taudb strdup Machine pm 4 value taudb strdup Hopper nersc gov taudb add primary metadata to trial trial amp pm 4 pm 5 name taudb strdup CPU Cache Size pm 5 value taudb strdup 512 KB taudb add primary metadata to trial trial amp pm 5 pm 6 name taudb strdup CPU Clock Frequency pm 6 value taudb strdup 800 000 MHz taudb add primary metadata to trial trial amp pm 6 pm 7 name taudb strdup CPU Model pm
152. ustom is selected, the dimensions of the rectangular prism containing the cores are defined by the X, Y and Z axis control widgets. The Events tab controls which events are used to define the color values and positions of cores/processes in the display. For trial-specific and Custom topologies, only event3 (Color) can be changed. For topologies loaded in MESP definition files, all four events may be used in calculation of the topology layout. In either case, interval, atomic or metadata values may be used to color or position points in the display. 10.5 3-D Communication Matrix. [Figure 10.5: 3-D Communication Matrix] If a Trial has communication information (set TAU_COMM_MATRIX=1 at runtime of your application), then you can launch the 3D Communication window as shown. Chapter 11. Thread Based Displays. ParaProf displays several windows that show data for one thread of execution. In addition to per-thread values, the user may also select mean or standard deviation as the "thread" to display. In this mode, the mean or standard devia
153. wn for each path (see Figure 11.3, Thread Statistics Table, inclusive and exclusive, for an example). When this option is off, the inclusive value for a node is shown when it is closed, and the exclusive value is shown when it is open. This allows the user to more easily see where the time is spent, since the total time for the application will always be represented in one column. See Figure 11.4, Thread Statistics Table, and Figure 11.5, Thread Statistics Table, for examples. This display also functions as a regular statistics table without callpath data. The data can be sorted by columns by clicking on the column heading. When multiple metrics are available, you can add and remove columns for the display using the menu. 11.4 Call Graph Window. [Figure 11.6: Call Graph Window] This display shows callpath data in a graph using two metrics: one determines the width, the other the color. The full name of the function, as well as the two values (color and width), are displayed in a tooltip when hovering over a box. By clicking on a box, the actual ancestors and descendants for that function and their paths (arrows) will be highlighted with blue. This allows you to see which functions are called by which other functions, since the interplay of multiple pat
154. xt>  Specify the name of the trial
  -e, --experimentid <number>  Specify associated experiment ID for this trial

Optional Arguments:
  -c, --config <name>  Specify the name of the configuration to use
  -g, --configFile <file>  Specify the configuration file to use (overrides -c)
  -f, --filetype <filetype>  Specify type of performance data; options are: profiles (default), pprof, dynaprof, mpip, gprof, psrun, hpm, packed, cube, hpc, ompp, snap, perixml, gptl, paraver, ipm, google
  -t, --trialid <number>  Specify trial ID
  -i, --fixnames  Use the fixnames option for gprof
  -z, --usenull  Include NULL values as 0 for mean calculation
  -r, --reduce <percentage>  Aggregate all timers less than percentage as "other"
  -m, --metadata <filename>  XML metadata for the trial

Notes: For the TAU profiles type, you can specify either a specific set of profile files on the command line, or you can specify a directory (by default the current directory). The specified directory will be searched for profile.*.*.* files, or, in the case of multiple counters, directories named MULTI_* containing profile data.

Examples:

perfdmf_loadtrial -e 12 -n "Batch 001"

This will load profile.* (or multiple-counter directories MULTI_*) into experiment 12 and give the trial the name "Batch 001".

perfdmf_loadtrial -e 12 -n "HPM data 01" -f hpm perfhpm*

This will load perfhpm* files of type HPMToolkit into ex
155. y Chart E esse net dansent eee nn e eee les 18 Sel Add NICW ai vr SAE es thes tan vies lube te re een yc ee EEGEN 25 8 2 View Creator Tele EE 25 9 1 ParaProf Manager Window 27 9 2 Loading Profile Data seotust rrr e re e Petre E EPI DE PIRE 27 9 3 Creating Derived M trics esee ENNEN juleta saa Reb Eye cen Du Sere ove se 28 9 4 Main Data WindoW termina It tenter en mien sonne ESTEE EVE RE ER ERRAT EAT 29 9 3 Unstacked Bars ier Ie E emt EE ee 29 10 T Triangle Mesh Plot m Pr er i ere Pme to jas ri sai ERI RP ee Ne 30 10 2 3 D Mesb Plot cioe ee tasast ironia ka Se eal saan beaten tUe oreet ech 30 10 33 D Scatter Plot iier ep etr Prec Ee Lo ERE OR Eee ee EE ken 31 10 4 3 D Topology PIOt EE 31 10 5 3 D Commication Matrix 33 1151 Thread Bar Graph enee eege tesselen Ee ege 34 11 2 Thread Statistics Text Window sss he nehmen hee eene 34 11 3 Thread Statistics Table inclusive and exclusive esses 35 11 4 Fhre d Statistics Table ita roo 35 11 3 Ehread Statistics Table 45 see eo EO e IRR I T gab rot dro 36 11 6 Call Graph Window e inedito 36 11 7 Thread Call Path Relations Window sese m mme hene ERE Ea 37 11 8 User Event Statistics Window vrrenevevonevevoneve veere vo nevetonevevoheveveneve ve neve nene 38 11 9 User Event Thread Bar Chart Window sese e m e meer 38 12 1 Functon Bar Graph eode lesk tete eoe ee