TAU User Guide
Contents
1.
    BEGIN_INSTRUMENT_SECTION
    # A dynamic phase will break up the profile into phases, where each
    # event is recorded according to the phase of the application in
    # which it occurred.
    dynamic phase name="foo1_bar" file="foo.c" line=26 to line=27

    # Instrument all the outer loops in this routine.
    loops file="loop_test.cpp" routine="multiply"

    # Track memory allocations/deallocations as well as potential leaks.
    memory file="foo.f90" routine="INIT"

    # Track the size of read, write and print statements in this routine.
    io file="foo.f90" routine="RINB"
    END_INSTRUMENT_SECTION

Selective instrumentation files can be created automatically from ParaProf by right-clicking on a trial and selecting the "Create Selective Instrumentation File" menu item.

Chapter 2. Profiling

This chapter describes running an instrumented application, generating profile data, and analyzing that data. Profiling shows the summary statistics of performance metrics that characterize application performance behavior. Examples of performance metrics are the CPU time associated with a routine, the count of the secondary data cache misses associated with a group of statements, the number of times a routine executes, etc.

2.1 Running the Application

After instrumentation and compilation are completed, the profiled application is run to generate the profile data files. These files can be stored in a directory specified by the environment vari
2. [Figure: Edit Default Colors window]

The default color editor changes how colors are distributed to functions whose color has not been specifically assigned. It is accessible from the File menu of the Preferences Window.

9.3 Color Map

[Figure 9.3: Color Map]

The color map shows specifically assigned colors. These values are used across all trials loaded so that the user can identify a particular function across multiple trials. In order to map an entire trial's function set, select "Assign Defaults from ->" and select a loaded trial. Indi
3. Enter a value, for example, 1.

5.2 Performing Correlation Analysis

To perform correlation analysis, you first need to select a metric. To select a metric, navigate through the tree of applications, experiments and trials, and expand the trial of interest, showing the available metrics, as shown in the figure below.

[Figure 5.3: Selecting a Metric to Cluster]

After selecting the metric of interest, select the "Do Correlation Analysis" item under the "Analysis" main menu bar item. A confirmation dialog will appear, and you can either confirm the correlation request or cancel it. After confirming the correlation, the analysis will begin. When th
4. Data Summary, Create Boxchart, Create Histogram and Create Normal Probability Chart. For the Boxchart, Histogram and Normal Probability Charts, you can either select one metric in the trial (which selects all events by default), or expand the metric and select events of interest.

8.1 3D Visualization

When the 3D Visualization is requested, PerfExplorer examines the data to try to determine the most interesting events in the trial. That is, for the selected metric in the selected trial, the database will calculate the weighted relative variance for each event across all threads of execution, in order to find the top four "significant" events. These events are selected by calculating:

    stddev(exclusive) / (max(exclusive) - min(exclusive)) * exclusive_percentage

After selecting the top four events, they are graphed in an OpenGL window.

[Figure 8.1: 3D Visualization of multivariate data]

8.2 Data Summary

In order to see a summary of the performance data in the database, select the "Show Data Summary" item under the "Visualization" main menu item.

[Figure 8.2: Data Summary Window]
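The event-selection rule from Section 8.1 can be sketched in a few lines of Python. This is an illustration only, not PerfExplorer's implementation: the function names, the input layout (a dict mapping each event name to its per-thread exclusive values and its exclusive percentage), the sample data, and the use of the population standard deviation are all assumptions.

```python
from statistics import pstdev


def significance_score(exclusive_values, exclusive_percentage):
    """Compute stddev / (max - min) * exclusive percentage for one event.

    exclusive_values holds one exclusive value per thread (hypothetical layout).
    """
    spread = max(exclusive_values) - min(exclusive_values)
    if spread == 0:
        # Every thread reported the same value, so there is nothing to rank.
        return 0.0
    return pstdev(exclusive_values) / spread * exclusive_percentage


def top_events(profile, k=4):
    """Return the names of the k highest-scoring events."""
    ranked = sorted(
        profile,
        key=lambda name: significance_score(*profile[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data: event -> (per-thread exclusive values, exclusive percentage)
profile = {
    "MPI_Recv": ([1.0, 3.0], 50.0),
    "compute": ([2.0, 2.0], 40.0),
    "MPI_Barrier": ([0.0, 4.0, 2.0], 10.0),
}
print(top_events(profile, k=2))
```

Weighting by the exclusive percentage keeps events that vary a lot across threads but contribute almost nothing to the runtime from crowding out the events that matter.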
5. Chapter 5. Correlation Analysis

Correlation analysis in PerfExplorer is used to explore relationships between events in a profile. Each event is pairwise plotted with the other events, and a correlation coefficient is calculated for the relationship. When the events are highly positively correlated (coefficient close to 1.0) or highly negatively correlated (coefficient close to -1.0), the relationships will show up as linear groupings in the results. Clusters may also be apparent.

5.1 Dimension Reduction

Often, many hundreds of events are instrumented when profile data is collected. Clustering works best with fewer than 10 dimensions, so dimension reduction is often necessary to get meaningful results. Currently, there is only one type of dimension reduction available in PerfExplorer. To reduce dimensions, the user specifies a minimum exclusive percentage for an event to be considered "significant".

To reduce dimensions, select the "Select Dimension Reduction" item under the "Analysis" main menu bar item. The following dialog will appear:

[Figure 5.1: Selecting a dimension reduction method]

Select "Over X Percent". The following dialog will appear:

[Figure 5.2: Entering a minimum threshold for exclusive percentage]
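As a rough illustration of the pairwise correlation described in this chapter, the following Python sketch computes a Pearson correlation coefficient for every pair of events. The guide does not say which coefficient PerfExplorer computes, so Pearson's r is an assumption here, as are the function names, the data layout (one value per thread for each event) and the sample values.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation; report 0.0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(events):
    """Map each unordered pair of event names to its coefficient."""
    return {
        (a, b): pearson(events[a], events[b])
        for a, b in combinations(sorted(events), 2)
    }


# Hypothetical per-thread values for three events.
events = {
    "compute": [1.0, 2.0, 3.0, 4.0],
    "MPI_Send": [2.0, 4.0, 6.0, 8.0],
    "MPI_Wait": [4.0, 3.0, 2.0, 1.0],
}
result = pairwise_correlations(events)
for pair, r in result.items():
    print(pair, round(r, 3))
```

Pairs with a coefficient near 1.0 or -1.0 are the ones that would appear as linear groupings in the scatterplots.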
6. [Figure: Thread Call Path Relations Window]

This display shows callpath data in a gprof-style view. Each function is shown with its immediate parents. For example, Figure 4.7, "Thread Call Path Relations Window", shows that MPI_Recv is called from two places for a total of 9.052 seconds. Most of that time comes from the 30 calls when MPI_Recv is called by MPIScheduler::postMPIRecvs. The other 60 calls do not amount to much time.

4.6 User Event Statistics Window

[Figure 4.8: User Event Statistics Window]
7.
6.3.3 Phase Fraction of Total Runtime

The Phase Fraction of Total Runtime chart shows the breakdown of the execution by phases, and shows how that breakdown changes as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.

[Figure 6.16: Phase Fraction of Total Runtime]

Chapter 7. Custom Charts

In addition to the default charts available in the charts menu, there is a custom chart interface. To access the interface, select the "Custom Charts" tab in the results pane of the main window, as shown.

[Figure 7.1: The Custom Charts Interface]
8.
    % setenv TAU_MAKEFILE /opt/apps/tau/tau-2.17.1/x86_64/lib/Makefile.tau-icpc-python
    % set path=(/opt/apps/tau/tau-2.17.1/x86_64/bin $path)
    % setenv TAU_OPTIONS '-optShared -optVerbose'
      (Python needs shared object based TAU library)
    % make F90=tau_f90.sh CXX=tau_cxx.sh CC=tau_cc.sh
      (build pyMPI w/TAU)
    % cat wrapper.py
      import tau
      def OurMain():
          import App
      tau.run('OurMain()')

    Uninstrumented:
    % mpirun.lsf /pyMPI-2.4b4/bin/pyMPI App.py

    Instrumented:
    % setenv PYTHONPATH taudir/x86_64/lib/bindings-python-mpi-pdt-pgi
      (same options string as TAU_MAKEFILE)
    % setenv LD_LIBRARY_PATH taudir/x86_64/lib/bindings-icpc-python-mpi-pdt-pgi:$LD_L
    % mpirun -np 4 <dir>/pyMPI-2.4b4-TAU/bin/pyMPI wrapper.py
      (Instrumented pyMPI with wrapper.py)

6.6 Q. What happens in my code at a given time?

A. Create an event trace.

[Figure 6.5: Tracing with Vampir]

How to create a trace:

    % setenv TAU_MAKEFILE /opt/apps/tau/tau-2.17.1/x86_64/lib/Makefile.tau-mpi-pdt-pgi
    % set path=(/opt/apps/tau/tau-2.17.1/x86_64/bin $path)
    % make F90=tau_f90.sh
      (Or edit Makefile and change F90=tau_f90.sh)
    % setenv TAU_TRACE 1
    % qsub run.job
    % tau_treemerge.pl
      (merges binary traces to create tau.trc and tau.edf files)
    JUMPSHOT:
    % tau2slog2 tau.trc tau.edf -o a
9. Correlation Results
5.5 Correlation Example
6.1 Setting Group of Interest
6.2 Setting Metric of Interest
6.3 Setting Event of Interest
6.4 Setting Timesteps
6.5 Timesteps per Second
6.6 Relative Efficiency
6.7 Relative Efficiency by Event
6.8 Relative Efficiency one Event
6.9 Relative Speedup
6.10 Relative Speedup by Event
6.11 Relative Speedup one Event
6.12 Group % of Total Runtime
6.13 Runtime Breakdown
6.14 Relative Efficiency per Phase
6.15 Relative Speedup per Phase
6.16 Phase Fraction of Total Runtime
7.1 The Custom Charts Interface
8.1 3D Visualization of multivariate data
8.2 Data Summary Window
10. [Figure: Custom Charts interface screenshot. Visible controls include Chart Title, Series Name/Value, X Axis Value, X Axis Name, Y Axis Value, Y Axis Name, Dimension reduction, Cutoff, Metric, Units and XML Field; the trial tree and chart legend are not recoverable.]
11. [Figure: pprof-style text window]

This display shows a pprof-style text view of the data.

4.3 Thread Statistics Table

[Figure 4.3: Thread Statistics Table, inclusive and exclusive (columns: Name, Inclusive Time, Exclusive Time, Calls, Child Calls)]
12.
6.2.9 Runtime Breakdown

The Runtime Breakdown chart shows the fraction of the total runtime for all events in the application, and how the fraction changes as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.

[Figure 6.13: Runtime Breakdown]

6.3 Phase Chart Types

TAU now provides the ability to break down profiles with respect to phases of execution. One such application would be to collect separate statistics for each timestep, or group of timesteps. In order to visualize the variance between the phases of execution, a number of phase-based charts are available.

6.3.1 Relative Efficiency per Phase

The Relative Efficiency Per Phase chart shows the relative efficiency for each phase as the number of processors i
13. [Figure: per-thread bar graph for one function]

This display graphs the values that the particular function had for each thread, along with the mean and standard deviation across the threads. You may also change the units and metric displayed from the Options menu.

5.2 Function Histogram

[Figure 5.2: Function Histogram (MPI_Barrier, 400 threads; x-axis: Exclusive Time in seconds)]

This display shows a histogr
14. [Figure: Timesteps per Second chart (x-axis: Number of Processors)]

6.2.2 Relative Efficiency

The Relative Efficiency chart shows how an application scales with respect to relative efficiency. That is, as the number of processors increases by a factor, the time to solution is expected to decrease by the same factor (with ideal scaling). The fraction between the expected scaling and the actual scaling is the relative efficiency. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.

[Figure 6.6: Relative Efficiency]

6.2.3 Relative Efficiency by Event

The Relative Efficiency By Event chart shows how each event in an application scales with respect to relative efficiency. That is, as the number of processors increases by a factor, the time to solution is expected to decrease by the same factor (with ideal scaling). The fract
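The definition of relative efficiency in Section 6.2.2 can be written out as a short Python sketch. The function name, the choice of the smallest run as the baseline, and the sample timings are assumptions made for illustration; the guide does not show how PerfExplorer computes the chart values internally.

```python
def relative_efficiency(base_procs, base_time, procs, time):
    """Ratio of the ideally scaled time to the measured time.

    With ideal scaling, multiplying the processor count by a factor
    divides the time to solution by the same factor.
    """
    expected_time = base_time * base_procs / procs
    return expected_time / time


# Hypothetical trials: (number of processors, time to solution in seconds)
runs = [(16, 100.0), (32, 50.0), (64, 40.0)]
base_procs, base_time = runs[0]
for procs, time in runs:
    eff = relative_efficiency(base_procs, base_time, procs, time)
    print(procs, eff)
```

A value of 1.0 means the run scaled ideally relative to the baseline, and values below 1.0 indicate lost efficiency. The same function applies per event if the per-event time is passed in place of the total time.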
15.
[Figure 4.9: Cluster Virtual Topology]

[Figure 4.10: Cluster Average Behavior]
16. SOFTWARE

Table of Contents

[entry illegible]
1. [title illegible]
    1.1 Dyninst binary rewriting of applications
    1.2 TAU scripted compilation
        1.2.1 Compiler Based Instrumentation
        1.2.2 Source Based Instrumentation
        1.2.3 Options to TAU compiler scripts
    1.3 Selectively Profiling an Application
        1.3.1 Custom Profiling
2. Profiling
    2.1 Running the Application
    2.2 Reducing Performance Overhead with TAU_THROTTLE
    2.3 Profiling each event callpath
    2.4 Using Hardware Counters for Measurement
3. [title illegible]
    3.1 Generating Event Traces
4. Analyzing Parallel Applications
    4.1 Text [remainder illegible]
    4.2 ParaProf
    4.3 Jumpshot
5. Quick Reference
6. Some Common Application Scenario
    6.1 Q. What routines account for the most time? How much?
    6.2 Q. What loops account
17. Window
Edit Default Colors
Color Map

Chapter 1. Introduction

ParaProf is a portable, scalable performance analysis tool included with the TAU distribution.

Important: ParaProf requires Sun's Java 1.3 Runtime Environment for basic functionality. Java 1.4 is required for 3d visualization and image export. Additionally, OpenGL is required for 3d visualization.

Note: Most windows in ParaProf can export bitmap (png/jpg) and vector (svg/eps) images to disk, or print directly to a printer. These are available through the File menu.

1.1 Using ParaProf from the command line

ParaProf is a Java program that is run from the supplied paraprof script (paraprof.bat for the Windows binary release).

    % paraprof --help
    Usage: paraprof [options] <files/directory>

    Options:

      -f, --filetype <filetype>   Specify type of performance data, options are:
                                    profiles (default), pprof, dynaprof, mpip,
                                    gprof, psrun, hpm, packed, cube, hpc, ompp
      -h, --help                  Display this help message
      -p                          Use `pprof` to compute derived data
      -i, --fixnames              Use the fixnames option for gprof
      -m, --monitor               Perform runtime monitoring of profile data

    The following options will run only from the console (no GUI will launch):

      --pack <file>               Pack the data into packe
18. [Figure: Comparison Window, 3 threads]

Chapter 8. Miscellaneous Displays

8.1 User Event Bar Graph

In addition to displaying the text statistics for User Defined Events, ParaProf can also graph a particular User Event across all threads.

[Figure 8.1: User Event Bar Graph]
19. This display shows the callpath data in a table. Each callpath can be traced from root to leaf by opening each node in the tree view. A colorscale immediately draws attention to "hot spots", areas that contain the highest values.

[Figure 4.4: Thread Statistics Table]

[Figure 4.5: Thread Statistics Table]
20. are summarized. Initially, we used similarity measures computed on a single parallel profile as input to the clustering algorithms, although other forms of input are possible. Here, the performance data is organized into multi-dimensional vectors for analysis. Each vector represents one parallel thread (or process) of execution in the profile. Each dimension in the vector represents an event that was profiled in the application. Events can be any sub-region of code, including libraries, functions, loops, basic blocks or even individual lines of code. In simple clustering examples, each vector represents only one metric of measurement. For our purposes, some dissimilarity value, such as Euclidean or Manhattan distance, is computed on the vectors. As discussed later, we have tested hierarchical and k-means cluster analysis in PerfExplorer on profiles with over 32K threads of execution with few difficulties.

4.1 Dimension Reduction

Often, many hundreds of events are instrumented when profile data is collected. Clustering works best with fewer than 10 dimensions, so dimension reduction is often necessary to get meaningful results. Currently, there is only one type of dimension reduction available in PerfExplorer. To reduce dimensions, the user specifies a minimum exclusive percentage for an event to be considered "significant".

To reduce dimensions, select the "Select Dimension Reduction" item under the "Analysis" main menu bar item. The following dial
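The vector layout and the two dissimilarity measures named above, together with the threshold-based dimension reduction, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the dict-based profile layout and the sample numbers are hypothetical, and the code does not reproduce PerfExplorer's clustering itself.

```python
from math import sqrt


def reduce_dimensions(exclusive_percent, min_percent):
    """Keep only events whose exclusive percentage exceeds the threshold."""
    return [name for name, pct in exclusive_percent.items() if pct > min_percent]


def thread_vector(thread_profile, events):
    """One vector per thread: one dimension per retained event."""
    return [thread_profile.get(name, 0.0) for name in events]


def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical exclusive percentages for the whole trial.
exclusive_percent = {"compute": 70.0, "MPI_Recv": 25.0, "MPI_Wtime": 0.5}
kept = reduce_dimensions(exclusive_percent, min_percent=1.0)

# Hypothetical per-thread exclusive times for two threads.
thread0 = {"compute": 3.0, "MPI_Recv": 1.0, "MPI_Wtime": 0.1}
thread1 = {"compute": 0.0, "MPI_Recv": 5.0, "MPI_Wtime": 0.2}
v0 = thread_vector(thread0, kept)
v1 = thread_vector(thread1, kept)

print(kept, euclidean(v0, v1), manhattan(v0, v1))
```

Dropping low-percentage events before building the vectors is what keeps the number of dimensions small enough for clustering to give meaningful groupings.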
21. for the most time? How much?
    6.3 Q. What MFlops am I getting in all loops?
    6.4 Q. Who calls MPI_Barrier? Where?
    6.5 Q. How do I instrument Python Code?
    6.6 Q. What happens in my code at a given time?
    6.7 Q. How does my application scale?

List of Figures

4.1 Main Data Window
4.2 Main Data Window
[entry illegible]
6.2 Flat Profile with Loops
6.3 MFlops per loop
6.4 Callpath Profile
6.5 Tracing with Vampir
6.6 Scalability Chart

2.1 ParaProf Manager Window
2.2 Loading Profile Data
2.3 Creating Derived Metrics
2.4 Main Data Window
2.5 Unstacked Bars
3.1 Triangle Mesh Plot
3.2 3-D Mesh Plot
3.3 3-D Scatter Plot
3.4 3-D Scatter Plot
4.1 Thread Bar Grap
22. re-organize the data based on values in the database.

9.1 Creating Views

To create a view, select the "Create New View" item under the "Views" main menu item. The first step is to select the table which will form the basis of the view. The three possible values are Application, Experiment and Trial.

[Figure 9.2: Selecting a table]

After selecting the table, you need to select the column on which to filter.

[Figure 9.3: Selecting a column]

After selecting the column, you need to select the operator for comparing to that column.

[Figure 9.4: Selecting an operator]

After selecting the operator, you need to select the value for comparing to the column.

[Figure 9.5: Selecting a value]

After selecting the value, you need to select a name for the view.

[Figure 9.6: Entering a name for the view]

After creating the view, you will need to exit PerfE
23. window will bring up the images, as shown below.

[Figure 5.5: Correlation Example (scatterplot with fitted linear and power regression lines)]

Chapter 6. Charts

6.1 Setting Parameters

There are a few parameters which need to be set when doing comparisons between trials in the database. If any necessary setting is not configured before requesting a chart, you will be prompted to set the value. The following settings may be necessary for the various charts available.

6.1.1 Group of Interest

TAU events are often associated with common groups, such as "MPI", "TRANSPOSE", etc. This value is used for showing what fraction of the total runtime this group of events contributed.

[Figure 6.1: Setting Group of Interest]

6.1.2 Metric of Interest

Profiles may contain many metrics gathered for a single trial. This selects which of the available metrics the user is interested in.

[Figure 6.2: Setting Metric of Interest]
24.
6.2.6 Relative Speedup by Event

The Relative Speedup By Event chart shows how the events in an application scale with respect to relative speedup. That is, as the number of processors increases by a factor, the speedup is expected to increase by the same factor (with ideal scaling). The ideal speedup is charted, along with the actual speedup for the application. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one experiment or view, and select this chart item under the "Charts" main menu item.

[Figure 6.10: Relative Speedup by Event]

6.2.7 Relative Speedup for One Event

The Relative Speedup for One Event chart shows how one event in an application scales with respect to relative speedup. That is, as the number of processors increases by a factor, the speedup is expected to increase by the same factor (with ideal scaling). The ideal speedup is charted, along with the actual speedup for the application. If there is more than one event to choose from, and you h
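The relationship between actual and ideal relative speedup described in these sections can be illustrated with a small Python sketch. The function name, the convention that the first (smallest) run is the baseline, and the sample timings are assumptions for illustration only.

```python
def speedup_table(runs):
    """Return (procs, actual_speedup, ideal_speedup) for each run.

    runs is a list of (processor count, time) pairs; the first entry
    is taken as the baseline (an assumption of this sketch).
    """
    base_procs, base_time = runs[0]
    table = []
    for procs, time in runs:
        actual = base_time / time    # how much faster than the baseline
        ideal = procs / base_procs   # factor by which processors increased
        table.append((procs, actual, ideal))
    return table


# Hypothetical timings, in seconds, for one event or for the whole run.
runs = [(16, 100.0), (32, 50.0), (64, 40.0)]
table = speedup_table(runs)
for row in table:
    print(row)
```

Plotting the second and third columns against the processor count gives the actual and "ideal" lines; the gap between them shows where scaling falls off.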
25. [Figure: Cluster Results thumbnails in the PerfExplorer main window]

There are a number of images in the "Cluster Results" window. From left to right, the windows indicate the cluster membership histogram, a PCA scatterplot showing the cluster memberships, a virtual topology of the parallel machine, the minimum values for each event in each cluster, the average values for each event in each cluster, and the maximum values for each event in each cluster. Clicking on a thumbnail image in the main window will bring up the images, as shown below.

[Figure 4.7: Cluster Membership Histogram]

[Figure 4.8: Cluster Membership Scatterplot]
26. [Figure: Data Summary table (numeric contents not recoverable)]

8.3 Creating a Boxchart

In order to see a boxchart summary of the performance data in the database, select the "Create Boxchart" item under the "Visualization" main menu item.

[Figure 8.3: Boxchart]
27. This display graphs the value that the particular user event had for each thread.
8.2 Ledgers
ParaProf has three ledgers that show the functions, groups, and user events.
8.2.1 Function Ledger
Figure 8.2. Function Ledger
The function ledger shows each function along with its current color. As with other displays showing functions, you may right click on a function to l
28. 4.3 Jumpshot
To use Argonne's Jumpshot (bundled with TAU), first merge and convert the TAU traces to slog2 format:
% tau_treemerge.pl
% tau2slog2 tau.trc tau.edf -o tau.slog2
% jumpshot tau.slog2
Launching Jumpshot will bring up the main display window showing the entire trace; zoom in to see more detail.
Figure 4.2. Main Data Window (Jumpshot timeline for tau.slog2; time axis in seconds)
Chapter 5. Quick Reference
tau_run: TAU's binary instrumentation tool.
tau_cc.sh -tau_options=-optCompInst, tau_cxx.sh -tau_options=-optCompInst, tau_f90.sh -tau_options=-optCompInst: compiler wrappers (compiler instrumentation).
tau_cc.sh, tau_cxx.sh, tau_f90.sh: compiler wrappers (PDT instrumentation).
TAU_MAKEFILE: set instrumentation definition file.
TAU_OPTIONS: set instrumentation options.
dynamic phase name="name" file="filename" line=start_line to line=end_line: specify dynamic phase.
loops file="filename" routine="routine name": Instrument ou
29. This display graphs each function on a particular thread for comparison. The metric, units, and sort order can be changed from the Options menu.
4.2 Thread Statistics Text Window
Figure 4.2. Thread Statistics Text Window (n,c,t 0,0,0; metric: Time; sorted by: Exclusive; units: seconds; columns: Total Time, Exclusive, Inclusive, Calls, Child Calls, Inclusive/Call, Name)
30. Note: Only the functions profiled in ParaProf can be excluded. If you had previously set up selective instrumentation for this application, the functions that were previously excluded will no longer be excluded.
Chapter 9. Preferences
Preferences are modified from the ParaProf Preferences Window, launched from the File menu. Preferences are saved between sessions in .ParaProf/ParaProf.prefs.
9.1 Preferences Window
In addition to displaying the text statistics for User Defined Events, ParaProf can also graph a particular User Event across all threads.
Figure 9.1. ParaProf Preferences Window
The preferences window allows the user to modify the behavior and display style of ParaProf's windows. The font size affects bar height; a sample display is sh
31. You may also turn off the stacking of bars so that individual functions can be compared across threads in a global display.
Chapter 3. 3-D Visualization
ParaProf displays massive parallel profiles through the use of OpenGL hardware acceleration through the 3D Visualization window. Each window is fully configurable with rotation, translation, and zooming capabilities. Rotation is accomplished by holding the left mouse button down and dragging the mouse. Translation is done likewise with the right mouse button. Zooming is done with the mousewheel and the + and - keyboard buttons.
3.1 Triangle Mesh Plot
Figure 3.1. Triangle Mesh Plot (height metric: Exclusive Time; color metric: Exclusive Time; selected function: MPI_Recv())
32. In Figure 2.3, "Creating Derived Metrics", we have just divided Floating Point Instructions by Wall clock time, creating FLOPS (Floating Point Operations per Second). The 2nd argument is a user-editable text box and can be filled in with scalar values by using the keyword "val" (e.g. "val 1.5").
2.5 Main Data Window
Upon loading a profile, or double-clicking on a metric, the Main Data Window will be displayed.
Figure 2.4. Main Data Window
This window shows each thread, as well as statistics, as a combined bar graph. Each function is represented by a different color (though possibly cycled). From anywhere in ParaProf, you can right click on objects representing threads or functions to launch displays associated with those objects. For example, in Figure 2.4, "Main Data Window", right click on the text n,c,t 8,0,0 to launch thread-based displays for node 8.
Figure 2.5. Unstacked Bars
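The derived-metric division described in this section (Floating Point Instructions divided by wall clock time) can be sketched outside ParaProf as an element-wise operation over per-thread values. The function names and the sample numbers below are hypothetical, and the conversion factor assumes the wall clock metric is recorded in microseconds; this is an illustration, not ParaProf's implementation.

```python
def divide_metrics(numerator, denominator):
    """Divide two per-thread metric lists element-wise.

    Threads whose denominator is zero yield None instead of raising.
    """
    if len(numerator) != len(denominator):
        raise ValueError("both metrics must cover the same threads")
    return [n / d if d else None for n, d in zip(numerator, denominator)]


def scale_metric(values, scalar):
    """Multiply a per-thread metric by a scalar, like a 'val' second argument."""
    return [None if v is None else v * scalar for v in values]


if __name__ == "__main__":
    # Hypothetical per-thread values: instruction counts and time in microseconds.
    fp_ins = [2.0e9, 1.5e9]
    wall_usec = [4.0e6, 3.0e6]
    per_usec = divide_metrics(fp_ins, wall_usec)
    # Assumed unit conversion: microseconds to seconds.
    flops = scale_metric(per_usec, 1.0e6)
    print(flops)
```

Keeping the scalar step separate mirrors the two-step workflow in the dialog: first combine two metrics, then rescale the result with a scalar value.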
33. This visualization method shows two metrics for all functions, all threads. The height represents one chosen metric, and the color another. These are selected from the drop-down boxes on the right.
To pinpoint a specific value in the plot, move the Function and Thread sliders to cycle through the available functions/threads. The values of the two metrics are shown for the selected point; in this case, for MPI_Recv() on node 351, the value is 14.37 seconds.
3.2 3-D Bar Plot
Figure 3.2. 3-D Mesh Plot
This visualization method is similar to the triangle mesh plot. It simply displays the data using 3-D bars instead of a mesh. The controls work the same. Note that in Figure 3.2, "3-D Mesh Plot", the transparency option is selected, which changes the way in which the selection model operates.
3.3 3-D Scatter Plot
Figure 3.3. 3-D Scatter Plot (axes: Exclusive Time for four selected functions)
This visualization method plots the value of each thread along up to 4 axes (each a different fu
34. The user event ledger shows each user event along with its current color.
8.3 Selective Instrumentation File Generator
ParaProf can also help you refine your program performance by excluding some functions from instrumentation. You can select rules to determine which functions get excluded; both rules must be true for a given function to be excluded. Each function that will be excluded based on these rules is listed below the rules.
Figure 8.5. Selective Instrumentation Dialog (options: Exclude Throttled Routines, Exclude Lightweight Routines; lightweight routine exclusion rules: microseconds per call 10, number of calls 100000; followed by the list of excluded routines)
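As a rough sketch of what the two lightweight-routine rules select, the following assumes a hypothetical mapping from routine name to (number of calls, total microseconds) and assumes the rules mean "more than 100000 calls" and "less than 10 microseconds per call". The BEGIN_EXCLUDE_LIST / END_EXCLUDE_LIST wrapper follows TAU's selective instrumentation file syntax; everything else here is illustrative and is not ParaProf's own generator.

```python
def select_lightweight(stats, max_usec_per_call=10.0, min_calls=100000):
    """Return routines for which BOTH exclusion rules hold.

    `stats` maps routine name -> (calls, total_usec); this layout is an
    assumption for the example.
    """
    excluded = []
    for name, (calls, total_usec) in stats.items():
        if calls > min_calls and total_usec / calls < max_usec_per_call:
            excluded.append(name)
    return sorted(excluded)


def exclude_file_text(routines):
    """Render an exclude list in selective instrumentation file form."""
    lines = ["BEGIN_EXCLUDE_LIST", *routines, "END_EXCLUDE_LIST"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical routine names and measurements.
    stats = {
        "getDx": (500000, 1_000_000.0),   # 2 usec/call, many calls: excluded
        "solve": (10, 5_000_000.0),       # few calls: kept
        "remap": (200000, 4_000_000.0),   # 20 usec/call: kept
    }
    print(exclude_file_text(select_lightweight(stats)), end="")
```

Requiring both conditions keeps routines that are called often but do real work, as well as short routines that are called only a few times.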
35. 8.4 Histogram
8.5 Normal Probability
9.1 Potential scalability data organized as a parametric study
9.2 Selecting a table
9.3 Selecting a column
9.4 Selecting an operator
9.5 Selecting a value
9.6 Entering a name for the view
9.7 The completed view
9.8 Selecting the base view
9.9 Completed sub-views
Chapter 1. Introduction
PerfExplorer is a framework for parallel performance data mining and knowledge discovery. The framework architecture enables the development and integration of data mining operations that will be applied to large-scale parallel performance profiles.
The overall goal of the PerfExplorer project is to create software that integrates sophisticated data mining techniques in the analysis of large-scale parallel performance data.
PerfExplorer supports clustering, summarization, association, regression, and correlation. Cluster analysis is the process of organizing data points into logically similar groupings, called clusters. Summariz at
36. TAU User Guide
Updated November 11th 2010, for use with version 2.20 or greater.
Copyright 1997-2011 Department of Computer and Information Science, University of Oregon; Advanced Computing Laboratory, LANL, NM; Research Centre Juelich, ZAM, Germany.
Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of University of Oregon (UO), Research Centre Juelich (ZAM), and Los Alamos National Laboratory (LANL) not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. The University of Oregon, ZAM and LANL make no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
UO, ZAM AND LANL DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE UNIVERSITY OF OREGON, ZAM OR LANL BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
37. Chapter 10. Running PerfExplorer Scripts
As of version 2.0, PerfExplorer has officially supported a scripting interface. The scripting interface is useful for adding automation to PerfExplorer. For example, a user can load a trial, perform data reduction, extract out key phases, derive metrics, and plot the result.
10.1 Analysis Components
There are many operations available, including:
• BasicStatisticsOperation
• CopyOperation
• CorrelateEventsWithMetadata
• CorrelationOperation
• DeriveMetricOperation
• DifferenceMetadataOperation
• DifferenceOperation
• DrawBoxChartGraph
• DrawGraph
• DrawMMMGraph
• ExtractCallpathEventOperation
• ExtractEventOperation
• ExtractMetricOperation
• ExtractNonCallpathEventOperation
• ExtractPhasesOperation
• ExtractRankOperation
• KMeansOperation
• LinearRegressionOperation
• LogarithmicOperation
• MergeTrialsOperation
• MetadataClusterOperation
• PCAOperation
• RatioOperation
• ScalabilityOperation
• TopXEvents
• TopXPercentEvents
10.2 Scripting Interface
The scrip
38. After selecting the metric of interest, select the "Do Clustering" item under the "Analysis" main menu bar item. The following dialog will appear:
Figure 4.5. Confirm Clustering Options (analysis method: K Means; dimension reduction: none; normalization: none; max clusters: 10; trial: sPPM, Frost, 16.16; metric: P_WALL_CLOCK_TIME)
After confirming the clustering, the clustering will begin. When the clustering results are available, you can view them in the "Cluster Results" tab.
Figure 4.6. Cluster Results
39. Thread Bar Graph
Thread Statistics Text Window
Thread Statistics Table, inclusive and exclusive
Thread Statistics Table
Thread Statistics Table
Call Graph Window
Thread Call Path Relations Window
User Event Statistics Window
User Event Thread Bar Chart Window
Function Bar Graph
Function Histogram
Initial Phase Display
Phase Ledger
Function Data over Phases
Comparison Window (initial)
Comparison Window (2 trials)
Comparison Window (3 threads)
User Event Bar Graph
Function Ledger
Group Ledger
User Event Ledger
Selective Instrumentation Dialog
ParaProf Preferences
40. 10.2 Scripting Interface
10.3
11 Derived Metrics
11.1 Creating Expressions
11.2 Selecting Expressions
11.3 Expression Files
List of Figures
4.1 Selecting a dimension reduction method
4.2 Entering a minimum threshold for exclusive percentage
4.3 Entering a maximum number of clusters
4.4 Selecting a Metric to Cluster
4.5 Confirm Clustering Options
4.6 Cluster Results
4.7 Cluster Membership Histogram
4.8 Cluster Membership Scatterplot
4.9 Cluster Virtual Topology
4.10 Cluster Average Behavior
5.1 Selecting a dimension reduction method
5.2 Entering a minimum threshold for exclusive percentage
5.3 Selecting a Metric to Cluster
5.4
41. able PROFILEDIR. By default, profiles are placed in the current directory. You can also set the TAU_VERBOSE environment variable to see the steps the TAU measurement system takes when your application is running. Example:
% setenv TAU_VERBOSE 1
% setenv PROFILEDIR /home/sameer/profiledata/experiment55
% mpirun -np 4 matrix
Other environment variables you can set to enable advanced MPI measurement features are TAU_TRACK_MESSAGE, to track MPI message statistics when profiling or message lines when tracing, and TAU_COMM_MATRIX, to generate MPI communication matrix data.
2.2 Reducing Performance Overhead with TAU_THROTTLE
TAU automatically throttles short-running functions in an effort to reduce the amount of overhead associated with profiles of such functions. This feature may be turned off by setting the environment variable TAU_THROTTLE to 0. The default rule TAU uses to determine which functions to throttle is numcalls > 100000 && usecs/call < 10, which means that if a function executes more than 100000 times and has an inclusive time per call of less than 10 microseconds, then profiling of that function will be disabled after that threshold is reached. To change the values of numcalls and usecs/call, the user may optionally set environment variables:
% setenv TAU_THROTTLE_NUMCALLS 2000000
% setenv TAU_THROTTLE_PERCALL 5
This changes the valu
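The throttling rule in section 2.2 can be restated as a small predicate. The sketch below is only a reading of the rule as documented here (numcalls > 100000 && usecs/call < 10, overridable through TAU_THROTTLE_NUMCALLS and TAU_THROTTLE_PERCALL, disabled when TAU_THROTTLE is 0); the function names are hypothetical and it does not reproduce TAU's internal measurement code.

```python
import os


def throttle_limits(env=None):
    """Resolve the throttle thresholds from the environment, with the documented defaults."""
    env = os.environ if env is None else env
    numcalls = int(env.get("TAU_THROTTLE_NUMCALLS", 100000))
    percall = float(env.get("TAU_THROTTLE_PERCALL", 10))
    return numcalls, percall


def would_throttle(calls, inclusive_usec, env=None):
    """Return True if a function with these totals meets the throttle rule."""
    env = os.environ if env is None else env
    if env.get("TAU_THROTTLE") == "0":
        return False  # throttling switched off
    if calls <= 0:
        return False
    numcalls, percall = throttle_limits(env)
    return calls > numcalls and inclusive_usec / calls < percall


if __name__ == "__main__":
    # Hypothetical function: 200000 calls, 1 second of inclusive time in total.
    print(would_throttle(200000, 1_000_000.0, {}))
    print(would_throttle(200000, 1_000_000.0,
                         {"TAU_THROTTLE_NUMCALLS": "2000000",
                          "TAU_THROTTLE_PERCALL": "5"}))
```

Passing the environment as a dictionary makes it easy to try different threshold values without changing the shell settings.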
42. ace file into another format, use the tau2otf, tau2vtf, or tau2slog2 scripts.
Chapter 4. Analyzing Parallel Applications
4.1 Text summary
For a quick view summary of TAU performance, use pprof. It reads and prints a summary of the TAU data in the current directory. For performance data with multiple metrics, move into one of the directories to get information about that metric:
> cd MULTI__P_WALL_CLOCK_TIME
> pprof
Reading Profile files in profile.*
NODE 0;CONTEXT 0;THREAD 0:
%Time   Exclusive   Inclusive   #Call   #Subrs   Inclusive   Name
             msec  total msec                    usec/call
100.0          24         590       1        1      590963   main
 95.9          26         566       1        2      566911   multiply
 47.3         279         279       1        0      279280   multiply-opt
 44.1         260         260       1        0      260860   multiply-regular
4.2 ParaProf
To launch ParaProf, execute paraprof from the command line where the profiles are located. Launching ParaProf will bring up the manager window and a window displaying the profile data, as shown below.
Figure 4.1. Main Data Window
For more information, see the ParaProf section in the reference guide.
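When a run records several metrics, it can help to see which metric directories and per-thread profile files are present before running pprof or paraprof. The sketch below assumes the usual TAU layout of MULTI__<metric> directories containing profile.<node>.<context>.<thread> files; the helper names are hypothetical and this is not part of TAU.

```python
import re
from pathlib import Path

# Assumed file name form: profile.<node>.<context>.<thread>
PROFILE_NAME = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def thread_ids(directory):
    """Return the sorted (node, context, thread) triples found in one directory."""
    ids = []
    for entry in Path(directory).iterdir():
        match = PROFILE_NAME.match(entry.name)
        if match:
            ids.append(tuple(int(part) for part in match.groups()))
    return sorted(ids)


def find_profiles(root):
    """Map metric name -> thread ids; a single-metric run is keyed by ''."""
    root = Path(root)
    metric_dirs = sorted(d for d in root.glob("MULTI__*") if d.is_dir())
    if not metric_dirs:
        return {"": thread_ids(root)}
    return {d.name[len("MULTI__"):]: thread_ids(d) for d in metric_dirs}


if __name__ == "__main__":
    for metric, ids in find_profiles(".").items():
        print(metric or "(single metric)", len(ids), "profile files")
```

Running it from the directory that holds the MULTI__ directories lists each metric with the number of profile files found for it.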
43. add threads to the window, right click on them and select "Add Thread to Comparison Window". The Comparison Window will pop up with the thread selected. Note that "mean" and "std. dev." are considered threads for this and most other purposes.
Figure 7.1. Comparison Window (initial)
Add additional threads, from any trial, by the same means.
Figure 7.2. Comparison Window (2 trials)
Figure 7.3. Comparison
44. am of each thread's value for the given function. Hover the mouse over a given bar to see the range minimum and maximum and how many threads fell into that range. You may also change the units and metric displayed from the Options menu.
You may also dynamically change how many bins are used (1-100) in the histogram. This option is available from the Options menu. Changing the number of bins can dramatically change the shape of the histogram; play around with it to get a feel for the true distribution of the data.
Chapter 6. Phase Based Displays
When a profile contains phase data, ParaProf will automatically run in phase mode. Most displays will show data for a particular phase. This phase will be displayed in the top left corner, in the meta data panel.
6.1 Using Phase Based Displays
The initial window will default to the top level phase, usually main.
Figure 6.1. Initial Phase Display
To access other phases, either right click on the phase and select "Open Profile for this Phase", or go to the Phase Ledger an
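To see why the bin count changes the shape of the function histogram so much, the binning can be sketched as equal-width ranges over the per-thread values. This is an assumption about how the bars are formed (ParaProf's exact binning is not described here), and the function name is hypothetical.

```python
def histogram(values, bins=10):
    """Split per-thread values into equal-width bins.

    Returns a list of (range_min, range_max, thread_count) tuples, matching
    the information shown when hovering over a bar.
    """
    if not 1 <= bins <= 100:
        raise ValueError("bins must be between 1 and 100")
    if not values:
        raise ValueError("at least one value is required")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin; identical values share bin 0.
        index = min(int((v - lo) / width), bins - 1) if width else 0
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]


if __name__ == "__main__":
    # Hypothetical exclusive times (seconds) of one function on 8 threads.
    times = [13.9, 14.1, 14.0, 14.4, 2.3, 14.2, 13.8, 14.3]
    for n_bins in (2, 6):
        print(n_bins, [count for _, _, count in histogram(times, n_bins)])
```

With few bins the single slow-running outlier is visible but the cluster near 14 seconds is one bar; more bins reveal how that cluster is spread.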
45. ath /opt/apps/tau/tau-2.17.1/x86_64/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% qsub run.job
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
6.2 Q. What loops account for the most time? How much?
A. Create a flat profile with wallclock time with loop instrumentation.
Figure 6.2. Flat Profile with Loops (metric: GET_TIME_OF_DAY; value: Exclusive; units: microseconds)
Here is how to instrument loops in an application:
% setenv TAU_MAKEFILE ...
% setenv TAU_OPTIONS ...
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
% set path=(/opt/apps/tau/tau-2.17.1/x86_64/bin $path)
% make F90=tau_f90.sh
(Or ed
46. aunch other function-specific displays.
8.2.2 Group Ledger
Figure 8.3. Group Ledger
The group ledger shows each group along with its current color. This ledger is especially important because it gives you the ability to mask all of the other displays based on group membership. For example, you can right click on the MPI group and select "Show This Group Only", and all of the windows will now mask to only those functions which are members of the MPI group. You may also mask by the inverse, by selecting "Show All Groups Except This One", to mask out a particular group.
8.2.3 User Event Ledger
Figure 8.4. User Event Ledger
47. ave not yet selected an event of interest, you may be prompted to select the event of interest (see Section 6.1.3, "Event of Interest"). If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item.
Figure 6.11. Relative Speedup (one Event) (x axis: Number of Processors)
6.2.8 Group % of Total Runtime
The Group % of Total Runtime chart shows how the fraction of the total runtime for one group of events changes as the number of processors increases. If there is more than one group to choose from, and you have not yet selected a group of interest, you may be prompted to select the group of interest (see Section 6.1.1, "Group of Interest"). If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item.
Figure 6.12. Group % of Total Runtime
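The quantity plotted by the Group % of Total Runtime chart can be illustrated with a short calculation. The sketch below assumes that "total runtime" is the sum of the times of all events in a trial and that each event belongs to zero or more groups; both the data layout and the function name are hypothetical, and PerfExplorer's own computation may differ in detail.

```python
def group_fraction(event_times, event_groups, group):
    """Return the share of the total time spent in events of one group.

    `event_times` maps event name -> time; `event_groups` maps event name ->
    collection of group names. Both layouts are assumptions for this example.
    """
    total = sum(event_times.values())
    if total == 0:
        return 0.0
    in_group = sum(
        time for event, time in event_times.items()
        if group in event_groups.get(event, ())
    )
    return in_group / total


if __name__ == "__main__":
    groups = {"MPI_Send": {"MPI"}, "MPI_Recv": {"MPI"}, "compute": {"TAU_DEFAULT"}}
    # Hypothetical trials keyed by processor count.
    trials = {
        16: {"MPI_Send": 2.0, "MPI_Recv": 1.0, "compute": 7.0},
        32: {"MPI_Send": 2.0, "MPI_Recv": 2.0, "compute": 4.0},
    }
    for processors, times in sorted(trials.items()):
        print(processors, round(100 * group_fraction(times, groups, "MPI"), 1))
```

A fraction that grows with the processor count, as in the made-up MPI numbers above, is the typical sign of communication overhead limiting scalability.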
48. d ppk format
--dump: Dump profile data to TAU profile format
-o, --oss: Print profile data in OSS style text output
-s, --summary: Print only summary statistics (only applies to OSS output)
Notes: For the TAU profiles type, you can specify either a specific set of profile files on the command line, or you can specify a directory (by default the current directory). The specified directory will be searched for profile.*.*.* files, or, in the case of multiple counters, directories named MULTI_* containing profile data.
1.2 Supported Formats
ParaProf can load profile data from many sources. The types currently supported are:
• TAU Profiles - Output from the TAU measurement library. These files generally take the form of profile.X.X.X, one for each node/context/thread combination. When multiple counters are used, each metric is located in a directory prefixed with "MULTI__". To launch ParaProf with all the metrics, simply launch it from the root of the MULTI__ directories.
• pprof - Dump output from TAU's pprof -d. Provided for backward compatibility only.
• DynaProf - Output from DynaProf's wallclock and papi probes.
• mpiP - Output from mpiP.
• gprof - Output from gprof; see also the --fixnames option.
• HPM Toolkit - Output from HPM Toolkit.
• ParaProf Packed Format - Export format supported by PerfDMF/ParaProf. Typically .ppk.
• Cube - Output from the Kojak Expert tool, for use with Cube.
• HPCToolkit - XML da
49. d select it there.
Figure 6.2. Phase Ledger
ParaProf can also display a particular function's value across all of the phases. To do so, right click on a function and select "Show Function Data over Phases".
Figure 6.3. Function Data over Phases
Because phase information is implemented as callpaths, many of the callpath displays will show phase data as well. For example, the Call Path Text Window is useful for showing how functions behave across phases.
Chapter 7. Comparative Analysis
ParaProf can perform cross-thread and cross-trial analysis. In this way, you can compare two or more trials and/or threads in a single display.
7.1 Using Comparative Analysis
Comparative analysis in ParaProf is based on individual threads of execution. There is a maximum of one Comparison window for a given ParaProf session. To
50. e the charts available in PerfExplorer are primarily designed for scalability analysis; however, data might be loaded as a parametric study. For example, in the following example, the data has been loaded with three problem sizes: MIN, HALF and FULL.
Figure 9.1. Potential scalability data organized as a parametric study
In order to examine this data in a scalability study, it is necessary to reorganize the data. However, it is not necessary to re-load the data. Using views in PerfExplorer, you can
51. e analysis results are available, you can view them in the "Correlation Results" tab.
Figure 5.4. Correlation Results
There are a number of images in the Correlation Results window. Each thumbnail represents a pairwise correlation plot of two events. Clicking on a thumbnail image in the main
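Each thumbnail in the Correlation Results window pairs two events. One common way to summarize such a pair numerically is the Pearson correlation coefficient over the per-thread values; the sketch below uses that measure as an assumption, since the statistic PerfExplorer reports is not stated here, and the function names and sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long value lists.

    Returns 0.0 when either list is constant (a choice made for this sketch).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(per_event):
    """Correlate every pair of events; `per_event` maps event -> per-thread values."""
    return {
        (a, b): pearson(per_event[a], per_event[b])
        for a, b in combinations(sorted(per_event), 2)
    }


if __name__ == "__main__":
    # Hypothetical per-thread times for three events on four threads.
    data = {
        "MPI_Recv": [4.0, 3.0, 2.0, 1.0],
        "compute": [1.0, 2.0, 3.0, 4.0],
        "io": [1.0, 1.5, 1.0, 1.5],
    }
    for pair, r in pairwise_correlations(data).items():
        print(pair, round(r, 3))
```

A coefficient near -1 between a wait routine and a compute routine, as in the made-up data above, is the kind of relationship the scatter thumbnails make visible.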
52. 6.2.1 Timesteps Per Second
6.2.2 Relative Efficiency
6.2.3 Relative Efficiency by Event
6.2.4 Relative Efficiency for One Event
6.2.5 Relative Speedup
6.2.6 Relative Speedup by Event
6.2.7 Relative Speedup for One Event
6.2.8 Group of Total Runtime
6.2.9 Runtime Breakdown
6.3 Phase Chart Types
6.3.1 Relative Efficiency per Phase
6.3.2 Relative Speedup per Phase
6.3.3 Phase Fraction of Total Runtime
7 Custom Charts
8 Visualization
8.1 3D Visualization
8.2 Data Summary
8.3 Creating a Boxchart
8.4 Creating a Histogram
8.5 Creating a Normal Probability Chart
9 Views
9.1 Creating Views
9.2 Creating Subviews
10 Running PerfExplorer Scripts
10.1 Analysis Components
53. entOperation(result1, events)
        extracted = extractor.processData().get(0)
        print "extracted phases"

        # get the Statistics
        dostats = BasicStatisticsOperation(extracted, False)
        stats = dostats.processData()
        print "got stats"

        for metric in stats.get(0).getMetrics():
            grapher = DrawMMMGraph(stats)
            metrics = HashSet()
            metrics.add(metric)
            grapher.set_metrics(metrics)
            grapher.setTitle(subsetevent + ", " + metric)
            grapher.setSeriesType(DrawMMMGraph.TRIALNAME)
            grapher.setCategoryType(DrawMMMGraph.EVENTNAME)
            grapher.setValueType(AbstractResult.INCLUSIVE)
            grapher.setLogYAxis(True)
            grapher.processData()

    return

print "JPython test script start"
glue()
print "JPython test script end"

Chapter 11. Derived Metrics
Sometimes metrics in a profile need to be combined to create a derived metric. PerfExplorer allows the user to create these using the derived metric expression tab.
Creating Expressions
The text box at the top of the tab allows the user to enter an expression. Double-clicking on a metric in the Performance Data tree will copy that metric's name into the box. If a metric name contains any operator characters, the whole metric name must be surrounded by quotes. If you would like the derived metric to be renamed, start the expression with the new name and an equals sign.
11.1 If this is the only metric you wish to derive then select the t
54. erbose'
% cat select.tau
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
% set path=(/opt/apps/tau/tau-2.17.1/x86_64/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% setenv TAU_METRICS GET_TIME_OF_DAY\:PAPI_FP_INS
% qsub run.job
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
Choose Options -> Show Derived Panel -> Arg 1 = PAPI_FP_INS, Arg 2 = GET_TIME_OF_DAY, Operation = Divide -> Apply, close.

6.4 Q. Who calls MPI_Barrier()? Where?
A. Create a callpath profile with given depth.
Figure 6.4. Callpath Profile
Here is how to generate a callpath profile with MPI:
% setenv TAU_MAKEFILE /opt/apps/tau/tau-2.17.1/x86_64/lib/Makefile.tau-mpi-pdt
% set path=(/opt/apps/tau/tau-2.17.1/x86_64/bin $path)
% make F90=tau_f90.sh
(Or edit Makefile and change F90=tau_f90.sh)
% setenv TAU_CALLPATH 1
% setenv TAU_CALLPATH_DEPTH 100
% qsub run.job
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk
(Windows -> Thread -> Call Graph)

6.5 Q. How do I instrument Python code?
A. Create a Python wrapper library.
Here to instrument python code
55. es to 2 million and 5 microseconds per call. Functions that are throttled are marked explicitly in their names as THROTTLED.
2.3 Profiling each event callpath
You can enable callpath profiling by setting the environment variable TAU_CALLPATH. In this mode TAU will record each event callpath to the depth set by the TAU_CALLPATH_DEPTH environment variable (default is two). Because instrumentation overhead increases with the depth of the callpath, you should use the shortest callpath that is sufficient.
2.4 Using Hardware Counters for Measurement
Performance counters exist on many modern microprocessors. They can count hardware performance events such as cache misses and floating point operations while the program executes on the processor. The Performance Application Programming Interface (PAPI, http://icl.cs.utk.edu/papi/) package provides a uniform interface to access these performance counters.
To use these counters, you must first find out which PAPI events your system supports. To do so type:
> papi_avail
Available events and hardware information.
Vendor string and code   : AuthenticAMD (2)
Model string and code    : AMD K8 Revision C (15)
CPU Revision             : 2.000000
CPU Megahertz            : 2592.695068
CPU's in this Node       : 4
Nodes in this System     : 1
Total CPU's              : 4
Number Hardware Counters : 4
Max Multiplex Counters   : 32
The following correspond to fields in the PAPI_event_info_t structure.
Na
56. 5.5 Correlation Example
6.1 Setting Group of Interest
6.2 Setting Metric of Interest
6.3 Setting Event of Interest
6.4 Setting Timesteps
6.5 Timesteps per Second
6.6 Relative Efficiency
6.7 Relative Efficiency by Event
6.8 Relative Efficiency one Event
6.9 Relative Speedup
6.10 Relative Speedup by Event
6.11 Relative Speedup one Event
6.12 Group of Total Runtime
6.13 Runtime Breakdown
6.14 Relative Efficiency per Phase
6.15 Relative Speedup per Phase
6.16 Phase Fraction of Total Runtime
7.1 The Custom Charts Interface
8.1 3D Visualization of multivariate data
8.2 Data Summary Window
57. f interest: P_WALL_CLOCK_TIME
6.1.3 Event of Interest
Some charts examine events in isolation. This setting configures which event to examine.
Figure 6.3. Setting Event of Interest
6.1.4 Total Number of Timesteps
One chart, the Timesteps per Second chart, calculates the number of timesteps completed per second. This setting configures the total number of timesteps used in that calculation.
Figure 6.4. Setting Timesteps
6.2 Standard Chart Types
6.2.1 Timesteps Per Second
The Timesteps Per Second chart shows how an application scales as it relates to time-to-solution. If the timesteps are not already set, you will be prompted to enter the total number of timesteps in the trial (see Section 6.1.4, "Total Number of Timesteps"). If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item.
Figure 6.5. Timesteps per Second
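As a rough illustration of what the Timesteps Per Second chart plots, the sketch below divides the total number of timesteps by the time-to-solution of each trial. The function name and the trial timings are hypothetical and are not part of PerfExplorer; the chart itself is requested from the Charts menu.

```python
def timesteps_per_second(total_timesteps, elapsed_seconds):
    """Return the number of timesteps completed per second for one trial."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_timesteps / float(elapsed_seconds)


# Hypothetical trials: processor count -> wall-clock seconds for 100 timesteps.
trials = {16: 400.0, 32: 210.0, 64: 120.0}

rates = {}
for procs, seconds in trials.items():
    rates[procs] = timesteps_per_second(100, seconds)

for procs in sorted(rates):
    print("%d processors: %.3f timesteps/s" % (procs, rates[procs]))
```

A rate that rises with the processor count indicates that the application is scaling; a flat or falling rate indicates that it is not.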
58. f runtime) Event IQR Boxplot with Outliers (series: DIFUZE, DINTRF, INTERF, SPPM)
8.4 Creating a Histogram
In order to see a histogram summary of the performance data in the database, select the "Create Histogram" item under the "Visualization" main menu item.
Figure 8.4. Histogram
8.5 Creating a Normal Probability Chart
In order to see a normal probability summary of the performance data in the database, select the "Create NormalProbability" item under the "Visualization" main menu item.
Figure 8.5. Normal Probability
Chapter 9. Views
Often, data is loaded into the database with multiple parametric cross sections. For exampl
59. fies that PAPI_LD_INS and PAPI_SR_INS are compatible metrics.
Next, make sure that you are using a makefile with papi in its name. Then set the environment variable TAU_METRICS to a colon-delimited list of the PAPI metrics you would like to use:
    setenv TAU_METRICS PAPI_FP_OPS\:PAPI_L1_DCM
In addition to PAPI counters, we support TIME (via unix gettimeofday). On Linux and CrayCNL systems, we provide the high resolution LINUXTIMERS metric, and on BGL/BGP systems we provide BGLTIMERS and BGPTIMERS.
Chapter 3. Tracing
Typically, profiling shows the distribution of execution time across routines. It can show the code locations associated with specific bottlenecks, but it cannot show the temporal aspect of performance variations. Tracing the execution of a parallel program shows when and where an event occurred, in terms of the process that executed it and the location in the source code. This chapter discusses how TAU can be used to generate event traces.
3.1 Generating Event Traces
To enable tracing with TAU, set the environment variable TAU_TRACE to 1. Similarly, you can enable or disable profiling with the TAU_PROFILE variable. Just like with profiling, you can set the output directory with an environment variable:
    setenv TRACEDIR /users/sameer/tracedata/experiment56
This will generate a trace file and an event file for each processor. To merge these files, use the tau_treemerge.pl script. If you want to convert TAU tr
60. geTrialsOperation(extracted)
    merger.addInput(derived)
    extracted = merger.processData().get(0)
    derivor = DeriveMetricOperation(extracted, "PAPI_FP_INS", "PAPI_TOT_INS", ...)
    derived = derivor.processData().get(0)
    merger = MergeTrialsOperation(extracted)
    merger.addInput(derived)
    extracted = merger.processData().get(0)
    print "derived metrics"

    # get the Statistics
    dostats = BasicStatisticsOperation(extracted, False)
    stats = dostats.processData()
    print "got stats"
    return

    for metric in stats.get(0).getMetrics():
        grapher = DrawMMMGraph(stats)
        metrics = HashSet()
        metrics.add(metric)
        grapher.set_metrics(metrics)
        grapher.setTitle("GTC Phase Breakdown: " + metric)
        grapher.setSeriesType(DrawMMMGraph.TRIALNAME)
        grapher.setCategoryType(DrawMMMGraph.EVENTNAME)
        grapher.setValueType(AbstractResult.INCLUSIVE)
        grapher.setXAxisLabel("Iteration")
        grapher.setYAxisLabel("Inclusive " + metric)
        grapher.setLogYAxis(True)
        grapher.processData()

    # graph the significant events in the iteration
    subsetevents = ArrayList()
    subsetevents.add("CHARGEI")
    subsetevents.add("PUSHI")
    subsetevents.add("SHIFTI")
    print "got data"

    for subsetevent in subsetevents:
        events = ArrayList()
        for event in result1.getEvents():
            if event.find("Iteration") >= 0 and event.rfind(subsetevent) >= 0:
                events.add(event)
        extractor = ExtractEv
61. hart will include the value 0. When deselected, the chart will only show the relevant values for all data points.
• Chart Title - the value to use for the chart title.
• Series Name / Value - the field to be used to group the data points as a series.
• X Axis Value - the field to use as the X axis value.
• X Axis Name - the name to put in the chart for the value along the X axis.
• Y Axis Value - the field to use as the Y axis value.
• Y Axis Name - the name to put in the chart for the value along the Y axis.
• Dimension Reduction - whether or not to use dimension reduction. This is only applicable when "Main Only" is disabled.
• Cutoff - when Dimension Reduction is enabled, the cutoff value for selecting "All Events".
• Metric - the metric of interest for the Y axis.
• Units - the unit to be selected for the Y axis.
• Event - the event of interest, or "All Events".
• XML Field - when the X or Y axis is selected to be an XML field, this is the field of interest.
• Apply - build the chart.
• Reset - restore the options back to the default values.
When the chart is generated, it can be saved as a vector image by selecting "File -> Save As Vector Image". The chart can also be saved as a PNG by right-clicking on the chart and selecting "Save As...".
Chapter 8. Visualization
Under the "Visualization" main menu item, there are five types of raw data visualization. The five items are 3D Visualization
62. 5.1 Function Bar Graph
5.2 Function Histogram
6 Phase Based Displays
6.1 Using Phase Based Displays
7 Comparative Analysis
7.1 Using Comparative Analysis
8 Miscellaneous Displays
8.1 User Event Bar Graph
8.2 Ledgers
8.2.1 Function Ledger
8.2.2 Group Ledger
8.2.3 User Event Ledger
8.3 Selective Instrumentation File Generator
9 Preferences
9.1 Preferences Window
9.2 Default Colors
9.3 Color Map
List of Figures
ParaProf Manager Window
Loading Profile Data
Creating Derived Metrics
Main Data Window
Unstacked Bars
Triangle Mesh Plot
3-D Mesh Plot
3-D Scatter Plot
3-D Scatter Plot
63. 4.2 Thread Statistics Text Window
4.3 Thread Statistics Table, inclusive and exclusive
4.4 Thread Statistics Table
4.5 Thread Statistics Table
4.6 Call Graph Window
4.7 Thread Call Path Relations Window
4.8 User Event Statistics Window
4.9 User Event Thread Bar Chart Window
5.1 Function Bar Graph
5.2 Function Histogram
6.1 Initial Phase Display
6.2 Phase Ledger
6.3 Function Data over Phases
7.1 Comparison Window (initial)
7.2 Comparison Window (2 trials)
7.3 Comparison Window (3 threads)
8.1 User Event Bar Graph
8.2 Function Ledger
8.3 Group Ledger
8.4 User Event Ledger
64. iles
2.3 Database Interaction
Database interaction is done through the tree view of the ParaProf Manager Window. Applications expand to Experiments, Experiments to Trials, and Trials are loaded directly into ParaProf just as if they were read off disk. Additionally, the metadata associated with each element is shown on the right, as in Figure 2.1, "ParaProf Manager Window". A trial can be exported by right-clicking on it and selecting "Export as Packed Profile".
New trials can be uploaded to the database either by right-clicking on an entity in the database and selecting "Add Trial", or by right-clicking on an Application/Experiment/Trial hierarchy from the "Standard Applications" and selecting "Upload Application/Experiment/Trial to DB".
2.4 Creating Derived Metrics
ParaProf can create derived metrics using the Derived Metric Panel, available from the Options menu of the ParaProf Manager Window.
Figure 2.3. Creating Derived Metrics (Argument 1: PAPI_FP_INS)
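The arithmetic behind a "Divide" derived metric can be sketched outside ParaProf as follows. This is a minimal sketch, not ParaProf's implementation: the function name and the sample values are hypothetical, and it assumes the first argument is PAPI_FP_INS and the second is GET_TIME_OF_DAY measured in microseconds, so that the quotient reads as millions of floating point instructions per second.

```python
def derive_divide(metric_a, metric_b):
    """Divide two per-event metrics, event by event.

    Both arguments map an event name to a value. Events with no
    recorded value in metric_b get 0.0 instead of a division error.
    """
    derived = {}
    for event, a in metric_a.items():
        b = metric_b.get(event, 0.0)
        derived[event] = a / b if b else 0.0
    return derived


# Hypothetical exclusive values for two events.
papi_fp_ins = {"MULTIPLY_MATRICES": 2.0e9, "INITIALIZE": 1.0e6}
time_usec = {"MULTIPLY_MATRICES": 4.0e6, "INITIALIZE": 5.0e5}

# Instructions per microsecond, i.e. millions of instructions per second.
mflops = derive_divide(papi_fp_ins, time_usec)
for event in sorted(mflops):
    print("%s: %.1f" % (event, mflops[event]))
```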
65. ion between the expected scaling and the actual scaling is the relative efficiency. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item.
Figure 6.7. Relative Efficiency by Event
6.2.4 Relative Efficiency for One Event
The Relative Efficiency for One Event chart shows how one event from an application scales with respect to relative efficiency. That is, as the number of processors increases by a factor, the time to solution is expected to decrease by the same factor (with ideal scaling). The fraction between the expected scaling and the actual scaling is the relative efficiency. If there is more than one event to choose from, and you have not yet selected an event of interest, you may be prompted to select the event of interest (see Section 6.1.3, "Event of Interest"). If there is more than one metric to choo
66. ion is the process of describing the similarities within, and dissimilarities between, the discovered clusters. Association is the process of finding relationships in the data. One such method of association is regression analysis, the process of finding independent and dependent correlated variables in the data. In addition, comparative analysis extends these operations to compare results from different experiments, for instance as part of a parametric study.
In addition to the data mining operations available, the user may optionally choose to perform comparative analysis. The types of charts available include: timesteps per second; relative efficiency and speedup of the entire application; relative efficiency and speedup of one event; relative efficiency and speedup for all events; relative efficiency and speedup for all phases; and runtime breakdown of the application by event or by phase. In addition, when the events are grouped together, such as in the case of communication routines, yet another chart shows the percentage of total runtime spent in that group of events. These analyses can be conducted across different combinations of parallel profiles and across phases within an execution.
Chapter 2. Installation and Configuration
PerfExplorer uses PerfDMF databases, so if you have not already done so, you will need to install PerfDMF (see ???). To configure PerfExplorer, move to the tools/src/PerfExplorer directory in your TAU distribution. Type:
> ./configure
If y
67. it Makefile and change F90=tau_f90.sh)
% qsub run.job
% paraprof --pack app.ppk
Move the app.ppk file to your desktop.
% paraprof app.ppk

6.3 Q. What MFlops am I getting in all loops?
A. Create a flat profile with PAPI_FP_INS/OPS and time with loop instrumentation.
Figure 6.3. MFlops per loop (Metric: PAPI_FP_INS / GET_TIME_OF_DAY; Value: Exclusive; Units: Derived metric shown in microseconds format)
Here is how to generate a flat profile with FP operations:
% setenv TAU_MAKEFILE /opt/apps/tau/tau-2.17.1/x86_64/lib/Makefile.tau-papi-mpi-pdt
% setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optV
68. me         Code        Avail  Deriv  Description (Note)
PAPI_L1_DCM  0x80000000  Yes    Yes    Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes    Yes    Level 1 instruction cache misses
Next, to test the compatibility between each metric you wish PAPI to profile, use papi_event_chooser:
papi/utils> papi_event_chooser PAPI_LD_INS PAPI_SR_INS PAPI_L1_DCM
Test case eventChooser: Available events which can be added with given events.
Vendor string and code   : GenuineIntel (1)
Model string and code    : Itanium 2 (2)
CPU Revision             : 1.000000
CPU Megahertz            : 1500.000000
CPU's in this Node       : 16
Nodes in this System     : 1
Total CPU's              : 16
Number Hardware Counters : 4
Max Multiplex Counters   : 32
Event PAPI_L1_DCM can't be counted with others
Here the event chooser tells us that PAPI_LD_INS, PAPI_SR_INS, and PAPI_L1_DCM are incompatible metrics. Let's try again, this time removing PAPI_L1_DCM:
papi/utils> papi_event_chooser PAPI_LD_INS PAPI_SR_INS
Test case eventChooser: Available events which can be added with given events.
Vendor string and code   : GenuineIntel (1)
Model string and code    : Itanium 2 (2)
CPU Revision             : 1.000000
CPU Megahertz            : 1500.000000
CPU's in this Node       : 16
Nodes in this System     : 1
Total CPU's              : 16
Number Hardware Counters : 4
Max Multiplex Counters   : 32
Usage: eventChooser NATIVE|PRESET evt1 evt2 ...
Here the event chooser veri
69. n Application
1.3.1 Custom Profiling
TAU allows you to customize the instrumentation of a program by using a selective instrumentation file. This instrumentation file is used to manually control which parts of the application are profiled and how they are profiled. If you are using one of the TAU compiler wrapper scripts to instrument your application, you can use the -tau_options=-optTauSelectFile=<file> option to enable selective instrumentation.
Note: Selective instrumentation is only available when using source-level instrumentation (PDT).
To specify a selective instrumentation file, create a text file and use the following guide to fill it in:
• Wildcards for routine names are specified with the # mark (because * symbols show up in routine signatures). The # mark is unfortunately the comment character as well, so to specify a leading wildcard, place the entry in quotes.
• Wildcards for file names are specified with * symbols.
Here is an example file:
# Tell tau to not profile these functions
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
# The next line excludes all functions beginning with "sort_" and having arguments "int *"
void sort_#(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
# Exclude these files from profiling
BEGIN_FILE_EXCLUDE_LIST
[file name patterns illegible in source]
END_FILE_EXCLUDE_LIST
70. ncreases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one experiment or view, and select this chart item under the Charts main menu item.
Figure 6.14. Relative Efficiency per Phase
6.3.2 Relative Speedup per Phase
The Relative Speedup per Phase chart shows the relative speedup for each phase as the number of processors increases. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, "Metric of Interest"). To request this chart, select one experiment or view, and select this chart item under the Charts main menu item.
Figure 6.15. Relative Speedup per Phase
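The quantities plotted by the per-phase efficiency and speedup charts can be sketched with a few lines of arithmetic. This is a sketch under stated assumptions, not PerfExplorer code: it assumes a strong scaling study, takes the trial with the fewest processors as the baseline, and uses hypothetical function names and phase timings.

```python
def relative_speedup(base_time, time):
    """Speedup of a trial relative to the baseline trial."""
    return base_time / float(time)


def relative_efficiency(base_procs, base_time, procs, time):
    """Actual speedup divided by the ideal speedup (procs / base_procs)."""
    ideal = procs / float(base_procs)
    return relative_speedup(base_time, time) / ideal


def scaling_table(timings):
    """Return (procs, speedup, efficiency) rows for one phase.

    timings maps a processor count to the time spent in the phase.
    The smallest processor count is used as the baseline.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        t = timings[procs]
        rows.append((procs,
                     relative_speedup(base_time, t),
                     relative_efficiency(base_procs, base_time, procs, t)))
    return rows


# Hypothetical inclusive times (seconds) for two phases.
phases = {
    "Iteration 0": {16: 80.0, 32: 44.0, 64: 25.0},
    "Iteration 1": {16: 80.0, 32: 40.0, 64: 20.0},
}
for name in sorted(phases):
    for procs, speedup, efficiency in scaling_table(phases[name]):
        print("%s  p=%d  speedup=%.2f  efficiency=%.2f"
              % (name, procs, speedup, efficiency))
```

An efficiency of 1.0 corresponds to ideal scaling; values below 1.0 show how far a phase falls short of it.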
71. nction, metric. This view allows you to discern clustering of values and relationships between functions across threads. Select functions using the button for each dimension, then select a metric. A single function across 4 metrics could be used, for example.
3.4 3-D Scatter Plot
Figure 3.4. 3-D Scatter Plot
If the loaded profile is a cube file or a profile from a BGB, then this visualization option is available. This visualization groups the threads in two- or three-dimensional space using topology information supplied by the profile. The right sidebar provides options to manipulate the visualization. You can select different metrics or topologies. The sliders toward the top allow you to select the range of points to show.
Chapter 4. Thread Based Displays
ParaProf displays several windows that show data for one thread of execution. In addition to per-thread values, the user may also select mean or standard deviation as the "thread" to display. In this mode, the mean or standard deviation of the values across the threads will be used as the value.
4.1 Thread Bar Graph
Figure 4.1. Thread Bar Graph
72. 8.3 Boxchart
8.4 Histogram
8.5 Normal Probability
9.1 Potential scalability data organized as a parametric study
9.2 Selecting a table
9.3 Selecting a column
9.4 Selecting an operator
9.5 Selecting a value
9.6 Entering a name for the view
9.7 The completed view
9.8 Selecting the base view
9.9 Completed sub-views
List of Tables
1.1 Different methods of instrumenting applications

TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. TAU (Tuning and Analysis Utilities) is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements. The TAU API also provides selection of profiling grou
73. [figure residue: trial tree of the Custom Charts interface, truncated]
There are a number of controls for the custom charts. They are:
• Main Only - When selected, only the main event (the event with the highest inclusive value) will be selected. When deselected, the "Events" control (see below) is activated, and one or all events can be selected.
• Call Paths - When selected, callpath events will be available in the "Events" control (see below).
• Log Y - When selected, the Y axis will be the log of the true value.
• Scalability - When selected, the chart will be interpreted as a speedup chart. The trial with the fewest number of threads of execution will be considered the baseline trial.
• Efficiency - When selected, the chart will be interpreted as a relative efficiency chart. The trial with the fewest number of threads of execution will be considered the baseline trial.
• Strong Scaling - When deselected, the speedup or efficiency chart will be interpreted as a strong scaling study (the workload is the same for all trials). When selected, the button will change to "Weak Scaling", and the chart will be interpreted as a weak scaling study (the workload is proportional to the total number of threads in each trial).
• Horizontal - When selected, the chart X and Y axes will be swapped.
• Show Y-Axis Zero - When selected, the c
74. [figure residue: user event statistics rows for "Message size sent to all nodes", "Message size for gather" and "Message size for reduce"]
This display shows a pprof-style text view of the user event data. Right-clicking on a User Event will give you the option to open a Bar Graph for that particular User Event across all threads. See Section 8.1, "User Event Bar Graph".
4.7 User Event Thread Bar Chart
Figure 4.9. User Event Thread Bar Chart Window
75. og will appear:
Figure 4.1. Selecting a dimension reduction method
Select "Over X Percent". The following dialog will appear:
Figure 4.2. Entering a minimum threshold for exclusive percentage
Enter a value, for example "1".
4.2 Max Number of Clusters
By default, PerfExplorer will attempt k-means clustering with values of k from 2 to 10. To change the maximum number of clusters, select the "Set Maximum Number of Clusters" item under the "Analysis" main menu item. The following dialog will appear:
Figure 4.3. Entering a maximum number of clusters
4.3 Performing Cluster Analysis
To perform cluster analysis, you first need to select a metric. To select a metric, navigate through the tree of applications, experiments and trials, and expand the trial of interest, showing the available metrics, as shown in the figure below:
Figure 4.4. Selecting a Metric to Cluster
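The "Over X Percent" dimension reduction described above can be pictured as a filter on each event's share of exclusive time. The sketch below is an illustration only: the function name and sample values are hypothetical, and it assumes the percentage is taken against the sum of exclusive values over all events, which may differ from how PerfExplorer computes it internally.

```python
def over_x_percent(exclusive_times, min_percent):
    """Keep only the events whose exclusive share exceeds min_percent.

    exclusive_times maps an event name to its exclusive value.
    """
    total = float(sum(exclusive_times.values()))
    if total == 0:
        return {}
    kept = {}
    for event, value in exclusive_times.items():
        if 100.0 * value / total > min_percent:
            kept[event] = value
    return kept


# Hypothetical exclusive times (seconds); the threshold of 1 matches
# the example value suggested for the dialog.
events = {"PUSHI": 60.0, "CHARGEI": 39.5, "MPI_Barrier": 0.5}
reduced = over_x_percent(events, 1)
print(sorted(reduced))
```

Dropping the events below the threshold reduces the number of dimensions that the clustering step has to consider.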
76. ou haven't already done so for other TAU tools, add [path to tau]/tau2/apple/bin to your path. The following command-line options are available to configure:
2.1 Available configuration options
• -engine=<analysis engine> - Specifies the data mining engine to use. The supported options include weka and R.
• -rroot=<directory> - Specifies the directory where R is installed. Specifically, it should be the directory where the bin, include, lib, library and share directories are located.
• -objectport=<available network port> - Specifies the port that the PerfExplorer server should use when running PerfExplorer in client-server mode. Select an available network port, and make sure that other appropriate network configurations are made (firewalls, etc.). The default port is 9999.
• -registryport=<available network port> - Specifies the port that the rmiregistry should use when running PerfExplorer in client-server mode. Select an available network port, and make sure that other appropriate network configurations are made (firewalls, etc.). The default port is 1099.
• -server=<server name> - Specifies the fully qualified domain name of the server where PerfExplorer is run when running PerfExplorer in client-server mode.
Chapter 3. Running PerfExplorer
To run PerfExplorer, type:
perfexplorer
When PerfExplorer loads, you will see on the left window all the e
77. own in the upper right. The Window defaults section will determine the initial settings for new windows. You may change the initial units selection and whether you want values displayed as percentages or as raw values. The Settings section controls the following:
Show Path Title in Reverse: Path titles will normally be shown in normal order (/home/amorris/data/etc). They can be reversed using this option (etc/data/amorris/home). This only affects loaded trials and the title bars of new windows.
Reverse Call Paths: This option will immediately change the display of all callpath functions between Root => Leaf and Leaf <= Root.
Statistics Computation: Turning this option on causes the mean computation to take the sum of values for a function across all threads and divide it by the total number of threads. With this option off, the sum will only be divided by the number of threads that actively participated in the sum. This way the user can control whether or not threads which do not call a particular function are considered as a 0 in the computation of statistics.
Generate Reverse Calltree Data: This option will enable the generation of reverse callpath data necessary for the reverse callpath option of the statistics tree-table window.
Show Source Locations: This option will enable the display of source code locations in event names.
9.2 Default Colors. Figure 9.2: Edit Default Colors.
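The two behaviours of the Statistics Computation setting can be shown with a few lines of arithmetic. This is only a sketch of the rule as described above, not ParaProf code; the function name, the use of `None` for a thread that never called the function, and the sample values are assumptions.

```python
def mean_over_threads(values, count_missing_as_zero=True):
    """Mean of one function's metric across threads.

    values holds one entry per thread; None marks a thread that never
    called the function (a convention chosen for this sketch).
    """
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    # Option on: divide by all threads. Option off: divide by participating threads only.
    divisor = len(values) if count_missing_as_zero else len(present)
    return sum(present) / divisor

per_thread = [4.0, None, 2.0, None]   # hypothetical exclusive times on four threads
mean_all = mean_over_threads(per_thread, count_missing_as_zero=True)             # 6.0 / 4
mean_participating = mean_over_threads(per_thread, count_missing_as_zero=False)  # 6.0 / 2
```

With the option on, the two idle threads pull the mean down to 1.5; with it off, only the two calling threads are averaged, giving 3.0.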
78. pp.slog2
    jumpshot app.slog2
  OR
    tau2otf tau.trc tau.edf app.otf -n 4 -z    (4 streams, compressed output trace)
    vampir app.otf    (or vng client with vngd server)
6.7 Q. How does my application scale? A. Examine profiles in PerfExplorer. Figure 6.6: Scalability chart. [Chart: TAU PerfExplorer Relative Speedup; S3D on Jaguar (ORNL), Harness Scaling Study, metric GET_TIME_OF_DAY; y-axis: Value; x-axis: Number of Processors; series: Harness Scaling Study and ideal.]
How to examine a series of profiles in PerfExplorer:
    setenv TAU_MAKEFILE /opt/apps/tau/tau-2.17.1/x86_64/lib/Makefile.tau-mpi-pdt
    set path=(/opt/apps/tau/tau-2.17.1/x86_64/bin $path)
    make F90=tau_f90.sh    (or edit Makefile and change F90=tau_f90.sh)
    qsub run1p.job
    paraprof --pack 1p.ppk
    qsub run2p.job
    paraprof --pack 2p.ppk    (and so on)
  On your client:
    perfdmf_configure    (choose derby, blank user/password, yes to save password, defaults)
    perfexplorer_configure    (yes to load schema, defaults)
    paraprof    (load each trial: right-click on trial > Upload trial to DB)
    perfexplorer    (Charts > Speedup)
ParaProf User's Manual. University of Oregon. Pa
79. ps for organizing and controlling instrumentation. Calls to the TAU API are made by probes inserted into the execution of the application via source transformation, compiler directives, or library interposition. This guide is organized into different sections. Readers wanting to get started right away can skip to the Common Profile Requests section for step-by-step instructions for obtaining different kinds of performance data, or browse the starters guide for a quick reference to common TAU commands and variables. TAU can be found on the web at http://tau.uoregon.edu.
Chapter 1. Tau Instrumentation. TAU provides three methods to track the performance of your application: binary rewriting using Dyninst, compiler directives, or source transformation using PDT. Most projects need a comprehensive picture of where time is spent. The TAU Compiler provides a simple way to automatically instrument an entire project. The TAU Compiler can be used on C, C++, fixed-form Fortran, and free-form Fortran. Here is a table that lists the features and requirements of each method.
Table 1.1: Different methods of instrumenting applications.
Columns: Method; Requires recompiling; Requires PDT; Shows MPI events; Routine-level event; Low-level events (loops, phases, etc.); Throttling to reduce overhead; Ability to exclude file from instrumentation.
Binary rewriting: Yes, Yes, Yes.
Compiler: Yes, Yes, Yes, Yes, Yes.
Source: Yes, Yes, Yes, Ye
80. r TAU distribution, for example x86_64/lib/Makefile.tau-mpi-pdt. You can also use a Makefile specified in an environment variable. To run tau_cc.sh so that it uses the Makefile specified by the environment variable TAU_MAKEFILE, type:
    export TAU_MAKEFILE=[path to tau]/[arch]/lib/[makefile]
    export TAU_OPTIONS=-optCompInst
    tau_cc.sh sampleCprogram.c
Similarly, if you want to set compile-time options like selective instrumentation, you can use the TAU_OPTIONS environment variable.
Source Based Instrumentation. TAU provides these scripts: tau_f90.sh, tau_cc.sh, and tau_cxx.sh, to instrument and compile Fortran, C, and C++ programs respectively. You might use tau_cc.sh to compile a C program by typing:
    module load tau
    tau_cc.sh samplecprogram.c
When setting the TAU_MAKEFILE, make sure the Makefile name contains pdt, because you will need a version of TAU built with PDT. A list of options for the TAU compiler scripts can be found by typing man tau_compiler.sh, or in this chapter of the reference guide.
Options to TAU compiler scripts. These are some commonly used options available to the TAU compiler scripts. Either set them via the TAU_OPTIONS environment variable or the -tau_options option to tau_f90.sh, tau_cc.sh, or tau_cxx.sh.
    -optVerbose: Enable verbose output (default: on).
    -optKeepFiles: Do not remove intermediate files.
    -optShared: Use shared library of TAU.
1.3 Selectively Profiling a
81. raProf User's Manual, by University of Oregon. Published TBA. Copyright © 2005-2010 University of Oregon Performance Research Lab.
Table of Contents
1. Introduction
1.1 Using ParaProf from the command line
1.2 Supported Formats
1.3 Command line options
2. Profile Data Management
2.1 ParaProf Manager Window
2.2 Loading Profiles
2.3 Database Interaction
2.4 Creating Derived Metrics
2.5 Main Data Window
3. 3-D Visualization
3.1 Triangle Mesh Plot
3.2 3-D Bar Plot
3.3 3-D Scatter Plot
3.4 3-D Scatter Plot
4. Thread Based Displays
4.1 Thread Bar Graph
4.2 Thread Statistics Text Window
4.3 Thread Statistics Table
4.4 Call Graph Window
4.5 Thread Call Path Relations Window
4.6 User Event Statistics Window
4.7 User Event Thread Bar Chart
5. Function Based Displays
82. raProf Manager Window. Figure 2.1: ParaProf Manager Window. [Screenshot: the manager's tree of Standard, Runtime and DB applications on the left (for example the packed trial uintah16.ppk with metrics PAPI_FP_INS, P_WALL_CLOCK_TIME, PAPI_TOT_CYC, PAPI_L1_DCM and P_VIRTUAL_TIME), and a Field/Value table for the selected application on the right.] This window is used to manage profile data. The user can upload/download profile data, edit meta-data, launch visual displays, export data, derive new metrics, etc.
2.2 Loading Profiles. To load profile data, select File > Open, or right-click on the Applications tree and select "Add Trial". Figure 2.2: Loading Profile Data. [Dialog "Load Trial": a Trial Type drop-down set to "Tau profiles" and a Select Directory field.] Select the type of data from the "Trial Type" drop-down box. For TAU Profiles, select a directory; for other types f
83. rial, experiment or application where the metric should be derived, and then click Apply. If you wish to derive many metrics, click "Add to List" and create more expressions.
Selecting Expressions. If you have added multiple expressions, you can select one or many of them to apply. They will be derived from top to bottom. After you have selected some, you can select the trial, experiment or application to apply the expressions to, and then click Apply.
Expression Files. You can also derive metrics using an expression file. An expression file has a single expression per line. To parse the file, select the trial, experiment or application to apply the expressions to, then select File > Parse Expression File and choose the file.
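The one-expression-per-line layout of an expression file, and the top-to-bottom order in which expressions are derived, can be sketched as follows. This is not ParaProf's parser: the function name is made up, skipping blank lines is an assumption, and the two expression strings are placeholders rather than confirmed ParaProf expression syntax.

```python
def read_expressions(text):
    """Return the expressions of an expression file in file order.

    Each non-empty line is treated as one expression; ignoring blank
    lines is an assumption of this sketch.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]

# Placeholder file contents; the expression syntax shown is hypothetical.
sample_file = """
PAPI_FP_INS / P_WALL_CLOCK_TIME

PAPI_L1_DCM / PAPI_TOT_CYC
"""

expressions = read_expressions(sample_file)
for position, expression in enumerate(expressions, start=1):
    # Expressions are applied in the order they appear, first line first.
    print(position, expression)
```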
84. 8.5 Selective Instrumentation Dialog
9. ParaProf Preferences Window
9.2 Edit Default Colors
9.3 Color Map
4.1 Selecting a dimension reduction method
4.2 Entering a minimum threshold for exclusive percentage
4.3 Entering a maximum number of clusters
4.4 Selecting a Metric to Cluster
4.5 Confirm Clustering Options
4.6 Cluster Results
4.7 Cluster Membership Histogram
4.8 Cluster Membership Scatterplot
4.9 Cluster Virtual Topology
4.10 Cluster Average Behavior
5.1 Selecting a dimension reduction method
5.2 Entering a minimum threshold for exclusive percentage
5.3 Selecting a Metric to Cluster
5.4 Correlation Results
85. [Chart: user event bar chart for one thread, listing Heap Memory (KB) values of roughly 2023 to 2029 for events such as MPI_Type_vector(), MPI_Isend(), MPI_Allreduce(), MPI_Barrier() and MPI_Finalize().] This display shows a particular thread's user-defined event statistics as a bar chart. This is the same data as in Section 4.6, User Event Statistics Window, in graphical form.
Chapter 5. Function Based Displays. ParaProf has two displays for showing a single function across all threads of execution. This chapter describes the Function Bar Graph Window and the Function Histogram Window.
5.1 Function Bar Graph. Figure 5.1: Function Bar Graph. [Screenshot: Function Data Window for miranda16k.ppk; Name: MPI_Barrier(); Metric Name: Time; Value: Exclusive; Units: seconds; one bar each for std. dev., mean, and every n,c,t thread.]
86. s, Yes, Yes, Yes.
1.1 Dyninst: binary rewriting of applications. This section covers how to profile your application by rewriting your binary to insert instrumentation. The tau_run script allows you to instrument an application binary using the Dyninst tool. TAU must be configured with Dyninst. This feature allows instrumentation of already compiled executables without TAU having to edit the application's code. To use tau_run, select the binary and use the -o option to name the rewritten binary:
    tau_run -o a.inst a.out
    mpirun -np 4 a.inst
1.2 TAU scripted compilation. For more detailed profiles, TAU provides two means to compile your application with TAU: through your compiler, or through source transformation using PDT.
1.2.1 Compiler Based Instrumentation. TAU provides these scripts: tau_f90.sh, tau_cc.sh, and tau_cxx.sh, to instrument and compile Fortran, C, and C++ programs respectively. You might use tau_cc.sh to compile a C program by typing:
    module load tau
    tau_cc.sh -tau_options=-optCompInst samplecprogram.c
On machines where a TAU module is not available, you will need to set the TAU makefile and/or options. The makefile and options control how TAU will compile your application. Use:
    tau_cc.sh -tau_makefile=[path to makefile] -tau_options=[option] samplecprogram.c
The Makefile can be found in the [arch]/lib directory of you
87. se from, you may be prompted to select the metric of interest (see Section 6.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item. Figure 6.8: Relative Efficiency, one Event. [Chart: Relative Efficiency for Event; y-axis from 0.00 to 1.05; x-axis: Number of Processors.]
6.2.5 Relative Speedup. The Relative Speedup chart shows how an application scales with respect to relative speedup. That is, as the number of processors increases by a factor, the speedup is expected to increase by the same factor, with ideal scaling. The ideal speedup is charted along with the actual speedup for the application. If there is more than one metric to choose from, you may be prompted to select the metric of interest (see Section 6.1.2, Metric of Interest). To request this chart, select one or more experiments or one view, and select this chart item under the Charts main menu item. Figure 6.9: Relative Speedup. [Chart: Relative Speedup; x-axis: Number of Processors.]
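The relationship between actual and ideal speedup can be worked through numerically. The sketch below uses the usual definitions (speedup relative to the smallest run, ideal speedup equal to the factor by which the processor count grew, efficiency as their ratio); the excerpt does not give PerfExplorer's exact formulas, so treat these, the function name, and the timings as assumptions.

```python
def relative_scaling(times):
    """times maps processor count -> time for the metric of interest.

    Returns rows of (processors, speedup, ideal speedup, efficiency),
    all measured relative to the run with the fewest processors.
    """
    base_p = min(times)
    base_t = times[base_p]
    rows = []
    for p in sorted(times):
        ideal = p / base_p              # processors grew by this factor
        speedup = base_t / times[p]     # time shrank by this factor
        rows.append((p, speedup, ideal, speedup / ideal))
    return rows

# Hypothetical wall-clock times (seconds) from a three-run scaling study.
rows = relative_scaling({16: 100.0, 32: 50.0, 64: 40.0})
for processors, speedup, ideal, efficiency in rows:
    print(processors, speedup, ideal, efficiency)
```

In this made-up study the 32-processor run scales ideally (speedup 2.0 against an ideal of 2.0), while the 64-processor run reaches a speedup of 2.5 against an ideal of 4.0, an efficiency of 0.625.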
88. [Screenshot: Thread Statistics Table for n,c,t 0,0,0 (depth200, mpilieb), with columns Name, Time, Calls and Child Calls, and an expandable call tree rooted at main containing routines such as Iteration, Exchange, MPI_Recv(), MPI_Send(), MPI_Allreduce() and MPI_Barrier().] The display can be used in one of two ways. In "inclusive/exclusive" mode, both the inclusive and exclusive values are shown for each path (see Figure 4.3, "Thread Statistics Table, inclusive and exclusive", for an example). When this option is off, the inclusive value for a node is shown when it is closed, and the exclusive value is shown when it is open. This allows the user to more easily see where the time is spent, since the total time for the application will always be represented in one column. See Figure 4.4, Thread S
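Why a single column always accounts for the total time can be seen from how inclusive and exclusive values relate: a node's exclusive time is its inclusive time minus the inclusive time of its children. The sketch below is an illustration of that relation and of the open/closed display rule, not ParaProf code; the node layout, function names, and timings are assumptions.

```python
def exclusive_time(node):
    """Inclusive time of a node minus the inclusive time of its direct children."""
    return node["inclusive"] - sum(child["inclusive"] for child in node.get("children", []))

def displayed_value(node, expanded):
    """Closed nodes show inclusive time; open nodes show exclusive time."""
    return exclusive_time(node) if expanded else node["inclusive"]

def visible_total(node, expanded_names):
    """Sum of the values shown for every visible row of the tree."""
    is_open = node["name"] in expanded_names
    total = displayed_value(node, is_open)
    if is_open:
        total += sum(visible_total(child, expanded_names) for child in node.get("children", []))
    return total

# Hypothetical call tree with made-up inclusive times in seconds.
tree = {"name": "main", "inclusive": 10.0, "children": [
    {"name": "Iteration", "inclusive": 7.0, "children": [
        {"name": "Exchange", "inclusive": 4.0},
    ]},
    {"name": "startup", "inclusive": 2.0},
]}

closed_total = visible_total(tree, set())                    # only main is visible
opened_total = visible_total(tree, {"main", "Iteration"})    # whole tree is visible
```

Both totals equal the application's 10.0 seconds: opening a node replaces its inclusive value with its exclusive value plus the rows of its children.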
89. ta from hpcquick. Typically, the user runs hpcrun, then hpcquick on the resulting binary file.
ompP: CSV format from the ompP OpenMP Profiler (http://www.ompp-tool.com). The user must use OMPP_OUTFORMAT=CSV.
1.3 Command line options. In addition to specifying the profile format, the user can also specify the following options:
--fixnames: Use the fixnames option for gprof. When C and Fortran code are mixed, the C routines have to be mapped to either .function or function_. Strip the leading period or trailing underscore, if it is there.
--pack <file>: Rather than load the data and launch the GUI, pack the data into the specified file.
--dump: Rather than load the data and launch the GUI, dump the data to TAU Profiles. This can be used to convert supported formats to TAU Profiles.
--oss: Outputs profile data in OSS style. Example:
    Thread: n,c,t 0,0,0
    excl.secs  excl.%  cum.%   PAPI_TOT_CYC  PAPI_FP_OPS  calls  function
    0.005      56.0%   56.0%   13475345      4194518      1      foo
    0.003      40.1%   96.1%   9682185       4205367      1      bar
    0          3.6%    99.7%   229403        17445        1      baz
    2.2E-05    0.3%    100.0%  14663         206          1      main
--summary: Output only summary information for OSS-style output.
Chapter 2. Profile Data Management. ParaProf uses PerfDMF to manage profile data. This enables it to read the various profile formats as well as store and retrieve them from a database.
2.1 ParaProf Manager Window. Upon launching ParaProf, the user is greeted with the Pa
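The renaming described for the fixnames option (strip a leading period or a trailing underscore, if present) amounts to a small string rule. The sketch below only illustrates that rule; it is not ParaProf's code, and the function name and sample routine names are made up.

```python
def fix_name(name):
    """Strip a leading period or, failing that, a trailing underscore."""
    if name.startswith("."):
        return name[1:]
    if name.endswith("_"):
        return name[:-1]
    return name

# Hypothetical gprof routine names from a mixed C/Fortran build.
for raw in [".compute_flux", "compute_flux_", "main"]:
    print(raw, "->", fix_name(raw))
```

Names that carry neither decoration, such as `main`, pass through unchanged, and underscores inside a name are left alone.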
90. tatistics Table and Figure 4.5, Thread Statistics Table, for examples. This display also functions as a regular statistics table without callpath data. The data can be sorted by columns by clicking on the column heading. When multiple metrics are available, you can add and remove columns for the display using the menu.
4.4 Call Graph Window. Figure 4.6: Call Graph Window. [Screenshot: Mean Call Graph window.] This display shows callpath data in a graph using two metrics: one determines the width, the other the color. The full name of the function, as well as the two values (color and width), are displayed in a tooltip when hovering over a box. By clicking on a box, the actual ancestors and descendants for that function and their paths (arrows) will be highlighted in blue. This allows you to see which functions are called by which other functions, since the interplay of multiple paths may obscure it.
4.5 Thread Call Path Relations Window. Figure 4.7: Thread Call Path Relations Window. [Screenshot: Call Path Data for n,c,t 1,1,1 (Application 18, Experiment 32, Trial 87); Metric Name: GET_TIME_OF_DAY; Sorted By: Exclusive; Units: seconds; columns Exclusive and Inclusive.]
91. ter loops
memory file="<filename>" routine="<routine name>": Track memory.
io file="<filename>" routine="<routine name>": Track IO.
TAU_PROFILE / TAU_TRACE: Enable profiling and/or tracing.
PROFILEDIR / TRACEDIR: Set profile/trace output directory.
TAU_CALLPATH=1 / TAU_CALLPATH_DEPTH: Enable callpath profiling, set callpath depth.
TAU_THROTTLE=1 / TAU_THROTTLE_NUMCALLS / TAU_THROTTLE_PERCALL: Enable event throttling, set number of calls and per-call (us) threshold.
TAU_METRICS: List of PAPI metrics to profile.
tau_treemerge.pl: Merge traces to one file.
tau2otf / tau2vtf / tau2slog2: Trace conversion tools.
Chapter 6. Some Common Application Scenario.
6.1 Q. What routines account for the most time? How much? A. Create a flat profile with wallclock time. Figure 6.1: Flat Profile. [Chart: Metric: P_VIRTUAL_TIME; Value: Exclusive; Units: seconds; one bar per routine, from 9647.318 for LEQ_IKSWEEPT down to 258.829 for DRAG_GS.]
Here is how to generate a flat profile with MPI: setenv TAU_MAKEFILE /opt/apps/tau/tau-2.17.1/x86_64/lib/Makefile.tau-mpi-pdt-pgi; set p
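The throttling variables listed above combine a call-count threshold with a per-call time threshold. The sketch below shows one way such a rule can be evaluated; it is an illustration, not TAU's implementation. The function name is made up, and both the comparison (more calls than the count threshold and less time per call than the per-call threshold) and the default values of 100000 calls and 10 microseconds are assumptions to be checked against the TAU reference guide.

```python
import os

def should_throttle(calls, inclusive_usec, env=os.environ):
    """Decide whether an event would be throttled under the assumed rule.

    Assumption: an event is throttled once it has been called more than
    TAU_THROTTLE_NUMCALLS times while averaging less than
    TAU_THROTTLE_PERCALL microseconds per call.
    """
    if env.get("TAU_THROTTLE", "0") != "1":
        return False                                            # throttling not enabled
    numcalls = int(env.get("TAU_THROTTLE_NUMCALLS", "100000"))  # assumed default
    percall = float(env.get("TAU_THROTTLE_PERCALL", "10"))      # assumed default, in us
    return calls > 0 and calls > numcalls and inclusive_usec / calls < percall

# Hypothetical settings passed as a plain dict instead of the real environment.
settings = {"TAU_THROTTLE": "1", "TAU_THROTTLE_NUMCALLS": "1000", "TAU_THROTTLE_PERCALL": "5"}
tiny_hot_routine = should_throttle(calls=50000, inclusive_usec=100000.0, env=settings)  # 2 us per call
slow_routine = should_throttle(calls=50000, inclusive_usec=5000000.0, env=settings)     # 100 us per call
```

Under these made-up settings the short, frequently called routine is throttled and the slower one is kept, which is the point of throttling: it removes overhead from events too small to measure reliably.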
92. ting interface is in Python, and scripts can be used to build analysis workflows. The Python scripts control the Java classes in the application through the Jython interpreter (http://www.jython.org). There are two types of components which are useful in building analysis scripts. The first type is the PerformanceResult interface, and the second is the PerformanceAnalysisComponent interface. For documentation on how to use the Java classes, see the javadoc in the perfexplorer source distribution and the example scripts below. To build the perfexplorer javadoc, type "make javadoc" in the perfexplorer source directory.
Example Script: from glue import PerformanceResult from glue import PerformanceAnalysisOperation from glue import ExtractEventOperation from glue import Utilities from glue import BasicStatisticsOperation from glue import DeriveMetricOperation from glue import MergeTrialsOperation from glue import TrialResult from glue import AbstractResult from glue import DrawMMMGraph from edu uoregon tau perfdmf import Trial from java util import HashSet from java util import ArrayList True 1 False 0 def glue print doing phase test for gtc on jaguar load the trial Utilities setSession perfdmf demo triall Utilities getTrial gtc bench Jaguar Compiler Options fasts resultl TrialResult triall print got the data get the iteration inclusive totals even
93. ts ArrayList for event in resultl getEvents dif event find Iteration gt 0 and result1 getEventGroupNane eve if event find Iteration gt 0 and event find gt lt 0 events add event extractor ExtractEventOperation resultl events 41 Running PerfExplorer Scripts extracted extractor processData get 0 print extracted phases derive metrics derivor DeriveMetricOperation extracted PAPI L1 TCA PAPI L1 TCM D derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI L1 TCA PAPI L1 TCM PAP derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI L1 TCM PAPI L2 TCM D derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI L1 TCM PAPI L2 TCM PAP derived derivor processData get 0 merger MergeTrialsOperation extracted merger addInput derived extracted merger processData get 0 derivor DeriveMetricOperation extracted PAPI FP INS P WALL CLOCK TI derived derivor processData get 0 merger Mer
94. vidual functions can be assigned a particular color by clicking on them in any of the other ParaProf windows.
PerfExplorer User's Manual, by University of Oregon. Published TBA. Copyright © 2011 University of Oregon Performance Research Lab.
Table of Contents
1. Introduction
2. Installation and Configuration
2.1 Available configuration options
3. Running PerfExplorer
4. Cluster Analysis
4.1 Dimension Reduction
4.2 Max Number of Clusters
4.3 Performing Cluster Analysis
5. Correlation Analysis
5.1 Dimension Reduction
5.2 Performing Correlation Analysis
6. Charts
6.1 Setting Parameters
6.1.1 Group of Interest
6.1.2 Metric of Interest
6.1.3 Event of Interest
6.1.4 Total Number of Timesteps
6.2 Standard Chart Types
95. xperiments that were loaded into PerfDMF. You can select which performance data you are interested in by navigating the tree structure. PerfExplorer will allow you to run analysis operations on these experiments. Also, the cluster analysis results are visible on the right side of the window. Various types of comparative analysis are available from the drop-down menu. To run an analysis operation, first select the metric of interest from the experiments on the left. Then perform the operation by selecting it from the Analysis menu. If you would like, you can set the clustering method, dimension reduction, normalization method and the number of clusters from the same menu. The options under the Charts menu provide analysis over one or more applications, experiments, views or trials. To view these charts, first choose a metric of interest by selecting a trial from the tree on the left. Then optionally choose "Set Metric of Interest" or "Set Event of Interest" from the Charts menu (if you don't, and you need to, you will be prompted). Now you can view a chart by selecting it from the Charts menu.
Chapter 4. Cluster Analysis. Cluster analysis is a valuable tool for reducing large parallel profiles down to representative groups for investigation. Currently, there are two types of clustering analysis implemented in PerfExplorer. Both hierarchical and k-means analysis are used to group parallel profiles into common clusters, and then the clusters
96. xplorer and re-start it to see the view. This is a known problem with the application, and will be fixed in a future release. Figure 9.7: The completed view. [Screenshot: PerfExplorer Client with the new view listed under Views in the tree.]
9.2 Creating Subviews. In order to create sub-views, you first need to select the "Create New Sub-View" item from the "Views" main menu item. The first dialog box will prompt you to select the view or sub-view to base the new sub-view on. Figure 9.8: Selecting the base view. [Dialog "Select View": select a view on which to base this sub-view.] After selecting the base view or sub-view, the options for creating the new sub-view are the same as creating a new view. After creating the sub-view, you will need to exit PerfExplorer and re-start it to see the sub-view. This is a known problem with the application, and will be fixed in a future release. Figure 9.9: Completed sub-views. [Screenshot: PerfExplorer Client with the view and its sub-views listed under Views in the tree.]