Periscope User's Guide - Periscope Tuning Framework

Contents

1. --src-folder=<relative path>    Path to the source folder, relative to the execution directory. This is needed to touch the sources to trigger recompilation of the instrumented versions.
    --strategy=<strategyname>    Strategy used by the analysis agent. Currently one of:
        MPI    MPI communication analysis
        OMP    OpenMP analysis
        P6    Power6 analysis (only on Power6 machines)
        P6BF    Power6 breadth-first (only on Power6 machines)
        P6BF_Memory    Power6 memory behavior analysis (only on Power6 machines)
        SCPS_BF    Generic memory analysis strategy
        scalability_OMP    Automatic OpenMP scalability analysis
    --timeout=<secs>    Timeout for startup of the agent hierarchy. Default: varies with the number of processes.
    --uninstrumented    Autotuning only: instructs Periscope to tune an uninstrumented application. Use with caution. See also Section 4.4.
    --version    Displays the version of Periscope.
    --with-deviation-control    Enables performance deviation control on POWER architectures.

    5.3 The instrumenter: psc_instrument
    psc_instrument prepares the application for analysis with Periscope. In the existing Makefile, the compilation step generating the object files has to be modified by prepending psc_instrument to the compiler. The script will preprocess the file, instrument it, and finally call the compiler to generate the instrumented object file. In addition, the compiler has to be augment
2. 2.2.4 Start Periscope analysis
    Periscope can be started via its frontend, psc_frontend. Calling the executable with proper parameters starts both Periscope's internal components and the test application, and the performance measurements are then carried out. For the NPB-MZ BT example, go to the bin directory and call psc_frontend as follows:
        psc_frontend --apprun=bt-mz_C.16 --mpinumprocs=16 --strategy=MPI --force-localhost

    2.2.5 Explore the results
    Upon successful termination, Periscope generates a .psc results file. This is a standard XML file and can be opened with any text editor. Periscope also provides a Graphical User Interface (GUI) with enhanced visualisation and exploration functionality for working with these performance result files.
    Having started Periscope as described above for the NPB-MZ BT benchmark, the properties_ psc file should have been created in the same bin directory. Please follow the instructions in the GUI section (Exploring the results) for opening this file within the GUI.

    Chapter 3 Analysis Flow within Periscope
    Periscope follows an iterative analysis approach: it determines performance properties based on measurements, decides on possible new candidate properties, and then performs new experiments to measure the data required to check whether the candidate properties hold. See also the cycle depicted in Figure 3.1.
    Figure 3.1: Periscope iterative analysis
    The numbe
3. It can detect MPI and OpenMP operations, loops, subroutines, and call statements. All these code entries are considered to be separate regions, alongside the user regions that can be defined manually (see Section 3.4).
    By default, Periscope instruments only the main routine. There are two ways to tell Periscope which region types to instrument for the current application.
    The first method is to pass to psc_instrument the option -t followed by a comma-separated list of region types. For example:
        psc_instrument -t user,mpi <other_psc_options> mpif90 ...
    Please refer to Table 5.3 for the complete list of valid region types.
    Passing region types via the -t option forces Periscope to apply the same region type configuration to all files. Setting different region types per file is also possible, by editing the psc_inst_config file. This file is generated by psc_instrument in the application source directory after the first build. It contains a list of all the files that are going to be instrumented, along with their corresponding region types. For example:
        instrumentation control for periscope
        id filename  none, mod_only, all, user, sub, call, loop, omp, mpi (if any)
        1 bt.f user,mpi
        2 initialize.f user,mpi
        3 exact_solution.f user,mpi
        4 exact_rhs.f user,mpi
        5 set_constants.f user,mpi
        6 adi.f user,mpi
        7 rhs.f use
4. can be gathered by using the selective debug parameter of the same psc_frontend executable:
        --selective-debug=<level1>,<level2>
    with the following levels being relevant for the agent hierarchy:
        AgentApplComm    Displays information regarding the communication between the agents and the application nodes.
        AutotuneAgentStrategy    Displays information regarding the analysis strategy used in the analysis agent for tuning. To be used only when the tuning feature of PTF is being used.
    Other values for the selective debug parameter can be found in the PTF Developer's Guide.
    Using a proper layout of the agent hierarchy is very important, especially when performing analysis and tuning of applications on large systems. Please note that if the --force-localhost option of the psc_frontend executable is used, the entire agent hierarchy will be started on a single node. This is not recommended for applications using a large number of processes, as the communication between the agents and the application nodes becomes a bottleneck and increases the overall analysis time.

    Chapter 7 Known Issues
    • Automatic restart of the application does not work on the Bluegene. Make sure you specify a user region that is executed repetitively.
    • C instrumentation: the name of an OMP pragma should not occur again as a string in another context in this pragma, e.g. in a variable name.
    • Measurements m
5. the mapping of application and analysis agent processes are determined. Both the application and the agent hierarchy are then started, and a command is propagated from the frontend down to the analysis agents to start the search.
    The search is performed according to a search strategy selected when the frontend is started. Each of the analysis agents, i.e. the nodes of the agent hierarchy, searches autonomously for inefficiencies in a subset of the application processes.
    The application processes are linked with a monitoring system that provides the Monitoring Request Interface (MRI). The agents attach to the monitor via sockets. The MRI allows the agent to configure the measurements, to start, halt, and resume the execution, and to retrieve the performance data. The monitor currently only supports summary information.
    At the end of the local search, the detected performance properties are reported back via the agent hierarchy to the frontend.

    6.1 Agent hierarchy
    The layout of the agent hierarchy can be controlled by the user by means of specific parameters of the psc_frontend executable:
        maxfan    Determines the fan-out of the tree of high-level agents. By default this is set to 4.
        maxcluster    Gives the maximum number of MPI processes analysed by a single analysis agent. The default is 64.
    Further information on how the agents work within a specific run of PTF
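The effect of the two parameters can be estimated with a short calculation. The Python sketch below is not part of Periscope. It assumes that each analysis agent takes up to `maxcluster` processes and that high-level agents are stacked in layers of at most `maxfan` children until a single root remains. The function name and this layering rule are illustrative assumptions, not the documented placement algorithm.

```python
import math


def estimate_agent_hierarchy(num_procs, maxcluster=64, maxfan=4):
    """Estimate the shape of the agent tree (assumed layout, illustration only).

    Returns (analysis_agents, layers), where layers lists the number of
    high-level agents per layer, from the bottom layer up to the root.
    """
    if num_procs < 1 or maxcluster < 1 or maxfan < 2:
        raise ValueError("num_procs and maxcluster must be >= 1, maxfan >= 2")

    # Each analysis agent handles at most maxcluster application processes.
    analysis_agents = math.ceil(num_procs / maxcluster)

    # Stack high-level agents until one node is left at the top.
    layers = []
    nodes = analysis_agents
    while nodes > 1:
        nodes = math.ceil(nodes / maxfan)
        layers.append(nodes)
    return analysis_agents, layers


if __name__ == "__main__":
    for procs in (16, 1024, 8192):
        print(procs, estimate_agent_hierarchy(procs))
```

Under these assumptions, lowering `maxcluster` increases the number of analysis agents, while `maxfan` controls how many layers sit between them and the frontend.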
6. user region
    Besides the regions detected and instrumented automatically by Periscope, the user can also define custom regions. A user region is defined by surrounding the corresponding piece of code with the following directives, as also shown before:
        Fortran:
            !$MON USER REGION
            S1
            S2
            !$MON END USER REGION
        C/C++:
            #pragma start_user_region
            S1
            S2
            #pragma end_user_region
    When psc_instrument is called, the source file is parsed and the directives are replaced with the proper calls to the Periscope library.
    There is no limit on the number of user regions that can be defined in a code.
    Any user region has to be defined within one scope of the source code. For example, a user region cannot extend beyond the end of a subroutine if it starts within that subroutine.

    3.5 Starting performance analysis: psc_frontend
    The Periscope performance measurement and analysis process can be started via the psc_frontend executable. For example:
        psc_frontend --apprun=bt-mz_C.16 --mpinumprocs=16 --force-localhost --debug=1
    All needed configuration options can be passed to Periscope by means of command line parameters. The mandatory parameters, which are required for a Periscope analysis to start, are:
        Option    Description
        --apprun=<command line>    Specify the command line to start the application. It will be passed to the mpirun comma
7. For the BT application, the phase region can be defined in file bt.f (lines 188 to 198) by inserting !$MON user region and !$MON end user region as shown below:
        c start the benchmark time step loop
              do step = 1, niter
        c       [... lines omitted here ...]
        !$MON user region
                call exch_qbc(u, qbc, nx, nxmax, ny, nz)
                do zone = 1, num_zones
                  call adi(rho_i(start1(zone)), us(start1(zone)),
             $             vs(start1(zone)), ws(start1(zone)),
             $             qs(start1(zone)), square(start1(zone)),
             $             rhs(start5(zone)), forcing(start5(zone)),
             $             u(start5(zone)),
             $             nx(zone), nxmax(zone), ny(zone), nz(zone))
                end do
        !$MON end user region
              end do

    2.2.2 Modify the Makefile
    In order to enable performance measurements, the test application has to be instrumented by the performance tool. To enable instrumentation, substitute the compile and link commands, which are usually defined in the Makefile.
    For NPB-MZ BT, edit the config/make.def file and update the F77 variable as follows:
        F77 = psc_instrument -i -v -d -s ../bin/bt-mz.$(CLASS).$(NPROCS).sir -t user,mpi mpif90
        # This links fortran programs; usually the same as ${F77}
        FLINK = $(F77)

    2.2.3 Build the application
    After the phase region has been defined and the build command adjusted, continue with the usual build process of the test application.
    For the NPB-MZ BT example, go to the root directory of the NPB-MZ series and issue:
        make clean
        make bt-mz CLASS=C NPROCS=16
8. evaluated. For most tuning plugins, users can choose the preferred search algorithm. Several search algorithms are available: exhaustive search, individual search, random search, and GDE3 search (a genetic algorithm).
    pre-analysis    Some plugins require an analysis step before the tuning process can start. The Periscope performance analysis feature is used in this case. The required pre-analysis is very plugin-specific. Please consult the corresponding User's Guide to see whether user input is possible in each particular case.

    4.4 Uninstrumented applications
    The CFS plugin and the MPI plugin also allow tuning of uninstrumented applications, but this is strongly discouraged. When measuring performance for uninstrumented applications, Periscope relies exclusively on the data retrieved from the system. This mostly leads to inaccuracies, especially for applications with a short execution time.
    If one does want to use the uninstrumented version, this can be done by passing the --uninstrumented option to the psc_frontend process at the command line.

    Chapter 5 Configuration Options
    5.1 Environment Variables
        Option    Description
        PSC_ROOT    Root directory of the Periscope installation.
        PERISCOPE_DEBUG    [0-2]. 0: quiet. 1: startup, found properties in each search. 2: candidate properties and found properties in each strategy step.

    5.2 The frontend psc f
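To illustrate the difference between two of the search algorithms named above, the following Python sketch contrasts an exhaustive search with a random search over a small tuning space. It is a generic model, not plugin code: the tuning space, the flag names, and the `cost` function (standing in for a measured execution time) are all hypothetical.

```python
import itertools
import random


def exhaustive_search(space, cost):
    """Evaluate every combination in the tuning space; return (scenario, cost)."""
    names = sorted(space)
    best = None
    for values in itertools.product(*(space[n] for n in names)):
        scenario = dict(zip(names, values))
        c = cost(scenario)
        if best is None or c < best[1]:
            best = (scenario, c)
    return best


def random_search(space, cost, samples, seed=0):
    """Evaluate only `samples` randomly drawn scenarios; may miss the optimum."""
    rng = random.Random(seed)
    names = sorted(space)
    best = None
    for _ in range(samples):
        scenario = {n: rng.choice(space[n]) for n in names}
        c = cost(scenario)
        if best is None or c < best[1]:
            best = (scenario, c)
    return best


# Hypothetical tuning space and a toy cost model replacing real measurements.
SPACE = {"opt": ["-O1", "-O2", "-O3"], "unroll": [False, True]}
BASE_TIME = {"-O1": 3.0, "-O2": 2.0, "-O3": 2.5}


def toy_cost(scenario):
    return BASE_TIME[scenario["opt"]] - (0.5 if scenario["unroll"] else 0.0)


if __name__ == "__main__":
    print(exhaustive_search(SPACE, toy_cost))
    print(random_search(SPACE, toy_cost, samples=3))
```

The exhaustive variant needs one experiment per combination, which grows multiplicatively with each added parameter. The random variant caps the number of experiments at the price of possibly returning a worse scenario.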
9. Chapter 1 Introduction
    Periscope is a scalable automatic performance analysis tool currently under development at Technische Universität München. It is part of the Periscope Tuning Framework (PTF), along with tools like Pathway and the tuning plugins.
    Periscope provides two main functionalities for Fortran and C/C++ applications: performance analysis and performance tuning.
    Performance analysis is performed at runtime using an iterative approach. There is a starting set of performance properties, which is then refined based on the measurements and the chosen search strategy. In the end, the appropriate set of performance properties is provided for the application being analysed. The search threshold, the confidence value, and the severity are defined by means of a formal specification of the properties.
    Based on expert knowledge, Periscope uses several strategies to identify possible performance issues. Such strategies include exploiting parallel MPI or OpenMP regions, as well as system-specific approaches, for example for Power6 machines.
    The second functionality, performance tuning, is provided through the tuning framework. Periscope offers the necessary support for measurements and search logic for a series of tuning plugins. Different application and environment setups are tested within the plugins, and the best configuration is provided as advice at the end of the tuning.
    Periscope consists of four main components: the
10. Periscope User's Guide
    PTF Version 1.1, Periscope Version xx.xx
    Michael Gerndt, Anca Berariu
    13.04.2015

    Contents
    1 Introduction
    2 Quick Start
        2.1 Installation
            2.1.1 .periscope configuration file
            2.1.2 SSH access
            2.1.3 GUI
        2.2 Basic analysis run
            2.2.1 Specify the phase region in NPB-MZ BT
            2.2.2 Modify the Makefile
            2.2.3 Build the application
            2.2.4 Start Periscope analysis
            2.2.5 Explore the results
    3 Analysis Flow within Periscope
        3.1 Specification of a phase region
        3.2 Enabling instrumentation: psc_instrument
        3.3 Automatic instrumentation
            3.3.1 Region types
            3.3.2 .sir file
            3.3.3 Fortran particularities: module instrumentation
            3.3.4 Reducing the instrumentation overhead
        3.4 Manual instrumentation: user region
        3.5 Starting performance analysis: psc_frontend
        3.6 Exploring the results: GUI
    4 Performance Tuning with Periscope
    5 Configuration Options
        5.1 Environment Variables
        5.2 The frontend: psc_frontend
        5.3 The instrumenter: psc_instrument
    6 Advanced user information: technical details
        6.1 Agent hierarchy
    7 Known Issues
11. a series of tuning plugins for automatic tuning of applications.

    Chapter 2 Quick Start
    2.1 Installation
    Periscope can be installed from the source files, following the common process of configuring and building with Autotools. Please check the Periscope Installation Manual for a thorough guide on how to install Periscope on your machine.
    The basic installation steps are:
    1. Check and install the prerequisites: ACE, Boost, etc.
    2. Check out the source files from the Periscope repository:
        git clone https://periscope.in.tum.de/git/Periscope.git periscope
    3. Configure your installation, choosing the appropriate options, for example:
        ./configure --prefix=$HOME/install/psc --enable-papi --with-papi-lib=$HOME/install/papi/lib --with-papi-header=$HOME/install/papi/include --enable-enopt=no
    4. Build the files:
        make -j 16
    5. Install the files:
        make install
    Please refer to the PTF Installation Manual for further details regarding the available options.
    If you are using SuperMUC, Periscope is already installed on the system. In order to use it, you have to add to your .bashrc file:
        module load periscope
    and then issue in your home directory:
        source .bashrc
    Note: Please make sure to add the command for loading the periscope module to your .bashrc. Just issuing the command at the command line is not going to work properly.

    2.1.1 .periscope configuration file
    Before using Periscope, the peris
12. ck section of this Guide.
    More on psc_frontend in section

    3.3.3 Fortran particularities: module instrumentation
    Fortran modules require special attention in the instrumentation process. Besides the common objects generated at compile time, an extra module description file (.mod) is generated for each module source. The .mod files may have different formats from compiler to compiler. The Periscope instrumenter uses its own format as well, which most often does not match the formats used by compilers. In this context, one should consider the following when instrumenting Fortran code containing modules:
    • If a file a.f90 refers to the module implemented in b.f90 (e.g. it contains a statement like USE MODULE BModule), then the file a.f90 can only be instrumented if the Periscope instrumenter can also load the corresponding module file bmodule.mod.
    • Due to format differences, the Periscope instrumenter can only load .mod files generated by itself.
    • A .mod file can only be generated if the corresponding source file (.f90, .F90, etc.) is available.
    There are two main issues that a user should take care of:
    1. psc_instrument needs to know where the .mod files can be loaded from. See option -M for setting the include paths.
    2. If the application uses a module for which the source code is not available, then the files referencing this mod
13. cope setup file has to be created in your home directory. You may create a new one or copy it from the Periscope installation directory:
        cp $PSC_ROOT/templates/.periscope ~/
    The setup file contains a list of <option> = <value> pairs, as follows:
        MACHINE = SuperMUC
        SITE = LRZ
        REGSERVICE_HOST = localhost
        REGSERVICE_PORT = 50001
        REGSERVICE_HOST_INIT = localhost
        REGSERVICE_PORT_INIT = 50001
        APPL_BASEPORT = 51000
        AGENT_BASEPORT = 50002
    If running on your local machine only, then the MACHINE option above should be set to localhost:
        MACHINE = localhost
    Please refer to the PTF Periscope Installation Manual for a detailed description of how to choose the proper option values for your particular system.

    2.1.2 SSH access
    In order to run Periscope, private-key-based ssh access has to be provided on the machine running the tool. If not already configured, you can set it up in a few steps:
    1. mkdir ~/.ssh
    2. cd ~/.ssh
    3. ssh-keygen -t rsa -N "" -f id_rsa
    4. cat id_rsa.pub >> authorized_keys
    5. chmod 600 authorized_keys
    The ssh access is not required if running on your local machine, i.e. if the MACHINE option is set to localhost in your .periscope file.

    2.1.3 GUI
    The Periscope GUI, used for analysing the performance measurements, is provided as an Eclipse plugin. You can install the GUI from this location:
        http://www.lrr.in.tum.de/~petkovve/psc/eclipse/
    following the common p
14. ed with psc_instrument in the linking step, too. Here, psc_instrument will link the monitoring library to the executable as well as generate the SIR file containing the static information of the program.
    The instrumentation is controlled by a file called psc_inst_config, in which the file id and the region types to be instrumented are given for each file individually.
    The calling syntax is:
        psc_instrument -t <regions> -s <sirfile> -f -n -d -v <compiler> <options> <file> <libs>
    Please note that while psc_instrument can process both Fortran and C/C++ files, some options are specific to only one of the two programming languages.
        Option    Description
        -d    Provide debug information.
        -f <fixed|free>    Fortran only: forces a specific Fortran file format. By default, .f90 files are in free format.
        -M <path>    Fortran only: location where module files are placed.
        -n    Dry run: run the makefile without executing the commands.
        -s <SIR file>    This file name will be used for the static program information. It is recommended to name the SIR file like the executable, adding the .sir extension. Default: appl.sir
        -t <regions>    List of region types to be instrumented. This overwrites the specifications in psc_inst_config.
            Fortran and C/C++:
            all    all regions
            u
15. erhead, but it limits the precision of the analysis with respect to the location in the code.
    Although available, using the value all for the region type is strongly discouraged. If needed, please use it with care, as it frequently produces a high amount of instrumentation overhead.

    3.3.2 .sir file
    Upon successful completion, psc_instrument generates (1) an instrumented executable of the application and (2) a .sir file storing static information about the program. SIR stands for Standard Intermediate Representation and is a format specific to Periscope. Periscope can only start its performance analysis if both the executable of the application and the .sir file are provided.
    By default, psc_instrument stores the .sir file under the name appl.sir in the directory where the link process is executed. You can change the name of the generated file by passing the option -s to the instrumenter:
        psc_instrument -s sirfilename.sir ...
    The same file name then has to be passed to the Periscope executable upon startup:
        psc_frontend --sir=sirfilename.sir ...
    Please note that if --sir is not provided, Periscope will search for a .sir file called <applname>.sir, where applname is the actual name of the application executable. It is thus good practice to name the SIR file like the application itself, just adding the .sir extension at the end.
    For further information on the SIR format, please che
16. frontend, the hierarchy of communication and analysis agents, the monitoring library, and the GUI.
    • The frontend is responsible for starting both the application to be analysed and all the internal components of Periscope. All settings regarding the execution of Periscope can be selected by means of command line parameters of the frontend process.
    • The agent hierarchy is transparent to common users. At the bottom layer of the hierarchy are the analysis agents. They control and configure the measurements for each application node or process. They can start, halt, or resume the execution, and they also retrieve the performance data. The strategy is communicated upon startup by the frontend, and at the end of the local search the performance properties are communicated back to the frontend via the agent hierarchy.
    • The monitoring library is also transparent to the user. It provides the measurement and communication layer between the application being tested and the performance tool.
    • The GUI is used to visualise and explore the performance results. It is an Eclipse plugin that can be used to identify the most severe performance properties as well as the source lines responsible for the performance issues.

    Periscope Tuning Framework
    Alongside Periscope, the Periscope Tuning Framework (PTF) also provides PAThWay, a workflow management tool for HPC experiments, as well as
17. heck the next section for detailed information regarding the most common options in <psc_options>. A complete list can be found in table

    3.3 Automatic instrumentation
    psc_instrument is a source code instrumenter. It parses the given source files and modifies them accordingly. Usually this means inserting library calls at the proper places in the code.
    Please note that Periscope will create four additional directories to store the instrumented versions of the files: prep, inst, instmod, compmod.
    To switch to verbose mode and follow all actions performed by the instrumenter, pass the -v option to the psc_instrument script:
        psc_instrument -v <other_psc_options> mpif90 -c <args>
    Frequently used options are:
        Option    Description
        -t <regions>    List of region types to be instrumented. Some commonly used region types:
            mpi    MPI functions
            omp    OMP constructs, except atomic
            user    user regions
            none    no instrumentation, files are only compiled
            See also Sections and
        -s <SIR file>    This file name will be used for the static program information. It is recommended to name the .sir file like the executable, adding the .sir extension. Default: appl.sir. See also Section 3.3.2.
        -d    Provide debug information.

    3.3.1 Region types
    Periscope's automatic instrumentation can handle an entire set of region types
18. here might also be other options available for configuration. Please consult the corresponding User's Guide for details specific to each of the plugins.
    All other components in figure 4.1 are transparent to the users of the plugins and of the PTF tuning feature.
    Figure 4.1: Plugin architecture of the Periscope Tuning Framework

    4.1 Tuning plugins
    For the current version, PTF provides the following tuning plugins:
        CFS    The Compiler Flags Selection plugin tunes the application to find the combination of compiler flags with which the best execution time is achieved.
        DVFS    The Dynamic Voltage and Frequency Scaling plugin tunes the energy consumption of an application.
        Master-Worker    The Master-Worker plugin tunes the number of tasks and processes to be used by applications based on the master-worker paradigm.
        MPI Parameters    Automatically optimizes the values of a user-selected subset of MPI configuration parameters.
        Patterns    The Parallel Patterns plugin works on applications using a pipeline-based execution to determine the best combination of the pipeline stages.

    4.2 Tuning advice
    As a result of the tuning process, Periscope generates an XML file describing:
    • the final tuning advice to be applied to the application
    • the tuning scenarios which were used in searching for the best advice
    • other information specific to the tuning plugin, for example the tuning parame
19. ight be wrong in recursive algorithms.
    • Multiple running instances of Periscope might not work on some systems.

    Examples
    You can find two examples with the adapted makefile in psc/test/add and psc/test/cx_parallel. Both directories include a file makefile.psc_instrument.

    Example on SuperMUC
    Periscope can be used in batch jobs. Example batch script:
        #!/bin/bash
        #PBS -j oe
        #PBS -S /bin/bash
        #PBS -l select=80:ncpus=1
        #PBS -l walltime=0:20:00
        #PBS -N cx64
        #PBS -M gerndt@in.tum.de
        #PBS -m e
        . /etc/profile
        cd psc/test/cx_parallel
        psc_regsrv &
        sleep 10
        sudo /lrz/sys/lrz_perf/bin/lrz_perf_off_hlrb2
        psc_frontend --apprun=cx --mpinumprocs=64 --strategy=SCA --debug=1

        #!/bin/bash
        #@ job_type = parallel
        #@ class = test
        #@ island_count = 1
        #@ node = 1
        #@ wall_clock_limit = 1:12:30
        #@ job_name = add
        #@ network.MPI = sn_all,not_shared,us
        #@ initialdir = $(home)/TestingRepository/add
        #@ output = $(jobid).out
        #@ error = $(jobid).err
        #@ notification = never
        #@ notify_user = gerndt@in.tum.de
        #@ queue
        . /etc/profile
        . /etc/profile.d/modules.sh
        psc_frontend --apprun=add --mpinumprocs=4 --sir=add.sir --tune=demo --force-localhost --debug=1
20. le. The information it shows is a combination of the standard intermediate representation of the analyzed application and the distribution of its bottlenecks. The main goals of the view are to assist the navigation in the source code and to draw the developer's attention to the most problematic code areas.
    Multivariate statistical clustering is another key feature of the plug-in. It enhances the scalability of the GUI and provides a means of conducting peta-scale performance analysis. It can effectively summarize the displayed information and identify runtime behavior possibly hidden in the large amount of data.

    Chapter 4 Performance Tuning with Periscope
    Performance tuning using PTF (Periscope Tuning Framework) is based on the collaborative work performed by customized tuning plugins on the one side and Periscope, as the host application of the plugins, on the other side. The high-level architecture of PTF can be seen in figure 4.1.
    Similar to using the analysis feature of PTF, users can start and configure the tuning process by calling psc_frontend with appropriate parameters. The option enabling the tuning execution mode of Periscope is --tune:
        psc_frontend --tune=<nameofplugin> ...
    For example, the following will run compiler flags tuning (CFS) on the BT application:
        psc_frontend --apprun=bt-MZ.W --mpinumprocs=1 --force-localhost --tune=compilerflags --cfs-config=cfs_config.cfg
    Depending on each particular plugin, t
21. lugin installation process in Eclipse.

    2.2 Basic analysis run
    With Periscope properly installed, only a few steps are required for a basic analysis of a test application:
    1. specify a phase region by instrumenting the source code of the application
    2. modify the Makefile to enable instrumentation
    3. build the application
    4. start the analysis
    5. visualize and explore the performance results
    For the remainder of this section, we consider the NPB-MZ BT benchmark as the test application. See http://www.nas.nasa.gov/publications/npb.html for download and documentation. Please refer to the PTF Periscope Installation Manual for a step-by-step description of the installation process.

    2.2.1 Specify the phase region in NPB-MZ BT
    Periscope uses an iterative analysis approach. It starts with a set of performance properties, which are measured for the test application throughout an experiment run. Based on the measurement results, it then determines new candidate properties, which are evaluated in the next experiment. The iteration stops when there are no new candidate properties.
    If the test application has a repetitive region, for example the body of a main loop, then the consecutive experiments can be performed without restarting the entire application. In order to do so, the repetitive region has to be marked in the source code as a phase region.
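The iteration described above can be pictured as a simple loop. The following Python sketch only models the control flow: `run_experiment` and `refine` are hypothetical callbacks standing in for Periscope's measurements and search strategy, and the property names in the usage example are invented for illustration.

```python
def iterative_analysis(initial_candidates, run_experiment, refine):
    """Model of the experiment loop; returns (found_properties, experiments).

    run_experiment(candidates) -> dict mapping each candidate to True/False
    refine(property)           -> list of new candidate properties
    """
    found = []
    seen = set(initial_candidates)
    candidates = list(initial_candidates)
    experiments = 0

    # Stop as soon as an experiment yields no new candidate properties.
    while candidates:
        results = run_experiment(candidates)
        experiments += 1
        proven = [p for p in candidates if results.get(p, False)]
        found.extend(proven)

        candidates = []
        for prop in proven:
            for new in refine(prop):
                if new not in seen:  # never evaluate a property twice
                    seen.add(new)
                    candidates.append(new)
    return found, experiments


if __name__ == "__main__":
    # Invented refinement tree and measurement outcome.
    tree = {"mpi_time": ["late_sender", "late_receiver"]}
    holds = {"mpi_time": True, "late_sender": True, "late_receiver": False}
    print(iterative_analysis(
        ["mpi_time"],
        lambda cands: {c: holds[c] for c in cands},
        lambda prop: tree.get(prop, []),
    ))
```

With a phase region, each pass of the `while` loop corresponds to one further execution of that region. Without one, each pass would require restarting the application.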
22. nd. The executable specified in the command line must exist when Periscope is started.
        --mpinumprocs=<np>    Number of MPI processes for the application. For serial applications, please set this value to 1. Periscope treats serial applications as 1-process MPI applications.
    Other frequently used options are:
        Option    Description
        --debug=<level>    Level of debug output (default: 0).
        --force-localhost    Start the agents locally instead of using SSH.
        --strategy=<strategy>    Specify one of the following strategies: MPI, SCA, SCABF, P6, P6BF, P6BF_Memory, SCPS_BF, scalability_OMP. Please note: some strategies are platform dependent (default: all).
        --sir=<filename>    SIR file to be used during the analysis (default: <appl>.sir).
        --propfile=<filename>    Store the detected properties in <filename> (default: properties.psc).
        --ompnumthreads=<threads>    Number of OpenMP threads (default: 1).
    Please see table 5.2 for a complete list of options accepted by psc_frontend.
    On startup, a hierarchy of analysis and communication agents is created first. Then the application to be measured is started, and the analysis agents attach to the application nodes. The performance data are gathered by means of the monitoring library and communicated to the low-level agents. There they are analysed using the
23. put. Regardless of this value, all properties are output to the properties file. Default: 50
        --ompnumthreads=<n>    Number of OMP threads to be started per MPI process. Default: 1
        --pedantic    Shows all detected properties.
        --phase=<fileid,rfl>    Specifies the phase region via the file id and the region first line number. If no phase region is specified, a user region is selected if at least one is given in the code. If multiple are given, it is undefined which one is selected. If no user region is given, the main program is the user region and the program will be restarted for each strategy step. If you mark the phase region via a user region and would also like to use user regions to guide the analysis, you have to give the file id and rfl for the phase region.
        --propfile=<filename>    Specify the file to use when exporting the properties. Default: properties.psc
        --psc-inst-config=<relative path to inst config file>    File name, relative to the execution directory.
        --quiet    Turns off the debug messages.
        --srcrev=<source revision>    Specify the source code revision. It will be written to the output file.
        --sir=<filename>    SIR file of the application to be analyzed. Default: the file name is composed of the executable's name and the extension .sir. If apprun is omitted, the default is appl.sir.
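The fallback rules stated for the phase region and for the SIR file name can be written down as two small decision functions. This Python sketch only restates the rules from the table above. The function names and the dictionary layout are illustrative assumptions, and picking the first user region stands in for a choice that the guide leaves undefined.

```python
import shlex


def select_phase_region(phase_option, user_regions):
    """Apply the documented fallback order for the phase region.

    phase_option: (fileid, rfl) tuple, or None if the option was not given
    user_regions: list of (fileid, rfl) tuples found in the code
    """
    if phase_option is not None:
        return {"region": phase_option, "restart_per_step": False}
    if user_regions:
        # The guide leaves the choice undefined when several user regions
        # exist; taking the first one here is an arbitrary assumption.
        return {"region": user_regions[0], "restart_per_step": False}
    # No user region: the main program acts as the region, and the
    # application is restarted for each strategy step.
    return {"region": "main", "restart_per_step": True}


def default_sir_file(apprun=None, sir_option=None):
    """Resolve the SIR file name: explicit option, else <executable>.sir."""
    if sir_option:
        return sir_option
    if not apprun:
        return "appl.sir"
    # Assumption: the executable is the first token of the command line.
    executable = shlex.split(apprun)[0]
    return executable + ".sir"


if __name__ == "__main__":
    print(select_phase_region(None, [(1, 190), (6, 42)]))
    print(default_sir_file(apprun="bt-mz_C.16 input.dat"))
```

Passing the phase region explicitly is the only way to get a deterministic choice once a code contains more than one user region.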
24. r,mpi
        8 zone_setup.f user,mpi
        9 x_solve.f user,mpi
        10 y_solve.f user,mpi
        11 exch_gbc.f user,mpi
        12 z_solve.f user,mpi
        13 solve_subs.f user,mpi
        14 add.f user,mpi
        15 error.f user,mpi
        16 verify.f user,mpi
    Editing the region types for a specific file instructs Periscope to apply that kind of instrumentation to that particular file. For example, in the file listed above, one could instruct Periscope to also instrument subroutines and call statements for the bt.f file, and only loops for the adi.f and rhs.f files:
        instrumentation control for periscope
        id filename  none, mod_only, all, user, sub, call, loop, omp, mpi (if any)
        1 bt.f user,mpi,sub,call
        2 initialize.f user
        3 exact_solution.f user
        4 exact_rhs.f user
        5 set_constants.f user
        6 adi.f loop
        7 rhs.f loop
        8 zone_setup.f user
        9 x_solve.f user
    Please note that the settings in the psc_inst_config file only apply if the -t option is not passed when calling psc_instrument. Passing -t to psc_instrument will overwrite any changes in the psc_inst_config file.
    Especially during the debugging phase, it might be interesting to use the none value as a region type. This switches off instrumentation for some files and can be useful to circumvent any issues that might occur due to the source instrumenter.
    Please note that files which are not instrumented cannot be analysed in detail. Thus, selective instrumentation reduces the ov
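The precedence between the per-file settings and the -t option can be summarised in a few lines of Python. This is a sketch under stated assumptions, not the instrumenter's parser: it assumes a simplified layout of the config file (numeric id, file name, comma-separated region types, separated by whitespace), and the function names are made up for this example.

```python
def parse_inst_config(text):
    """Parse lines of the assumed form '<id> <filename> <type>[,<type>...]'."""
    config = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and header lines that do not start with an id.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        file_id, filename, regions = parts[0], parts[1], parts[2]
        config[filename] = (int(file_id), regions.split(","))
    return config


def effective_regions(filename, config, t_option=None):
    """Region types used for a file: -t wins over the per-file settings."""
    if t_option is not None:
        return t_option.split(",")
    return config[filename][1]


SAMPLE = """\
id filename region_types
1 bt.f user,mpi,sub,call
6 adi.f loop
7 rhs.f loop
"""

if __name__ == "__main__":
    cfg = parse_inst_config(SAMPLE)
    print(effective_regions("adi.f", cfg))              # per-file setting
    print(effective_regions("adi.f", cfg, "user,mpi"))  # overridden by -t
```

The second call shows why per-file edits appear to be ignored as long as -t is still present in the Makefile.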
25. r of experiments carried out in one run of Periscope depends on both the execution time of the application itself and the performance issues it might exhibit. The number of experiments carried out in one run of Periscope depends on the performance issues it might detect. Thus the total execution time of one Periscope analysis depends on both the execution time of the application itself and the amount and severity of the detected performance issues.

3.1 Specification of a phase region

The performance measurements carried out within one experiment of the iterative analysis can be applied either to the entire application or only to a particular execution phase or code region. Periscope offers the possibility to define such a phase region by means of manual instrumentation of the source code.

Section 3.4 describes manual instrumentation in more detail. We only mention here that a phase region can be, in terms of Periscope code instrumentation, any regular user region. A user region can be defined by inserting the following directives into the source code.

Fortran:

    !$MON USER REGION
    S1
    S2
    !$MON END USER REGION

C/C++:

    #pragma start_user_region
    S1
    S2
    #pragma end_user_region

Periscope allows the specification of several user regions, but only one such region can be defined as the phase region. This is done by passing the phase option to the psc_frontend p
26. r regions and remove their instrumentation. It will then apply the selected analysis strategy. The overhead strategy removes the overhead that influences the single node measurements, but other overheads may lead to a prolongation of the execution. The all_overhead strategy removes all overhead, so that the prolongation of the execution will be negligible. The analysis instrumentation strategy will first determine too fine granular regions and will then instrument exactly those regions that are required in the next experiment.

inst folder <relative path>: Path to the folder with the instrumented sources, relative to the execution directory. This is needed to modify the instrumentation during automatic instrumentation.

make <make command>: Command to be issued in order to recompile the application.

maxcluster <n>: Maximum number of MPI processes analyzed by a single analysisagent. It is not used on the Bluegene, since the analysisagents are running on the IO nodes. All processes on the compute nodes of an IO node connect to its analysisagent. Default: 64.

maxfan <n>: Determines the fan-out of the tree of high level agents in interactive mode. Default: 4.

mpinumprocs <n>: Number of MPI processes to be started.

Nprops <n>: Specifies the number of properties the frontend prints to standard out
27. rocess at startup:

    psc_frontend phase fileid rfl

where

- fileid is the id of the file containing the phase region. It is the same id used in the psc_inst_config file. See also Section 3.3.1.
- rfl is the region first line number. It represents the line number, in the source file specified above, at which the region starts.

If several user regions are defined but none of them is specified as the phase region, then the behaviour of Periscope is undefined. If only one user region is specified, then it is automatically defined as the phase region. If no phase region is specified and no user region is defined, Periscope will automatically restart the application to perform new experiments until no new candidate properties are found and the search terminates.

The use of phase regions is strongly recommended:

- it reduces the overall execution time of the Periscope performance measurements
- it delivers more accurate results, as measurements are only performed for relevant execution fragments

The best example for a phase region is the body of the main loop of an application. It is common that scientific applications have a main loop iterating through time steps or grid elements. If such a repetitive region is defined in the source code as a phase region, then the experiments can be done during the same application run. The application is suspended at the beginning of the phase region and new measurements are req
28. rontend. The frontend starts up the application and the agent hierarchy.

Option: Description

apprun <appl cmdline>: This is the command line used to start the application. It should be the same as in mpirun -np <procs> <appl cmdline>. This value is also used to determine the name of the SIR file when sir is missing. The executable specified in the command line must exist when Periscope is started. This is also true for the cases where the tuning feature of Periscope is used in combination with plugins which by themselves rebuild the application from its source files, e.g. the CFS plugin.

bg mode SMP DUAL VN: The node mode used on the Bluegene.

debug level: Level of debugging. All debug output up to that level will be printed. Default: PERISCOPE_DEBUG or 0.

delay <n>: Number of phase executions that are skipped before the search is started. This is useful for applications that have a different behaviour at the beginning.

dontcluster: Do not use online clustering for the detected bottlenecks.

force localhost: Locally start the agents instead of using SSH.

help: Help information.

inst overhead all_overhead analysis: Automatic instrumentation strategy. The overhead and all_overhead strategies will first determine too fine granula
29. se with care, as this option will generate a lot of instrumentation overhead.

loop: outermost loops only
mpi: mpi functions
none: no instrumentation, files are only compiled
omp: OMP constructs except atomic
par: OMP parallel and worksharing constructs
sub: subroutines
sync: OMP synchronization statements except atomic
user: user regions

Fortran only:

call: call statements
forall: forall statements
io: IO statements
mod_only: no instrumentation, but processing by the instrumenter to generate compatible module files
nestedloop: non perfectly nested loops
vect: vector statements

Default: all

v: Verbose.

<compiler>: Compiler for the final compilation of the instrumented files, e.g. mpif90 or mpicc.

<file>: Name of the file to be instrumented. Fortran only: the file extensions f90 and F90 determine free source format, while f determines fixed source format.

<libs>: Libraries for linking.

<options>: List of compiler options used in the original call to the compiler. These are passed to the compiler. Please note that if -c is specified in the options list, psc_instrument will instrument and compile the given file. Otherwise it will link the application.

Chapter 6 Advanced user information: technical details

The application and the agent network are started through the psc_frontend process. First the set of available processors is analysed and based on this
30. strategy established at the beginning within the frontend, and based on the results the next step of the iterative analysis is established. The final results are propagated through the agent hierarchy up to the frontend, which then stores them in the properties file.

The frontend is the control point of Periscope. Users can configure and direct the performance analysis process from here. The agent hierarchy and the monitoring library remain transparent to the common user.

3.6 Exploring the results GUI

The frontend writes the performance properties it found into a file called properties_ with the psc extension. This file is in XML format and can be opened with any off-the-shelf text editor or a spreadsheet application.

Periscope also offers a Graphical User Interface (GUI) for an enhanced visualisation and exploration of the analysis results. It is an Eclipse-based plugin featuring a multi-functional table for displaying and organizing the textual data. The following functionalities are available:

- multiple criteria sorting algorithm
- complex categorization utility
- searching engine using regular expressions
- filtering operations
- direct navigation from the bottlenecks to their precise source location using the default IDE editor for that source file type (e.g. CDT, Photran editor)

An outline view for the instrumented code regions that were used in an experiment is also availab
31. ters, the execution times or the energy consumption.

4.3 The tuning flow

Being the host of the tuning plugins, Periscope provides several services to build a standard tuning flow.

Data model

The main components of the tuning data model are:

tuning parameters: represent the parameters based on which a tuning of the application can be done. These are plugin dependent and their semantics is strictly defined in each plugin. For example, the CFS plugin uses compiler flags as tuning parameters, while the MPI Parameters plugin uses MPI related switches and parameters. For most plugins the tuning parameters are given by user input through a configuration file.

tuning scenario: represents a combination of tuning parameters. The application is analysed by Periscope using one scenario at a time. Scenarios are computed internally based on a chosen search algorithm. Users can choose between different search algorithms, but cannot directly define tuning scenarios.

tuning space: the set of all valid tuning scenarios.

analysis result: the analysis result associated with one specific tuning scenario. Results are partially displayed in the final tuning advice provided by Periscope.

Operations

On the functional side, the tuning flow is supported by means of two main operations:

search algorithm: the search algorithm generates the tuning space and delivers the next scenario to be
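The relation between tuning parameters, tuning scenarios and the tuning space can be illustrated with a small C sketch. This is not Periscope's implementation: the `tuning_parameter` struct, the parameter names `opt_level` and `unroll`, their ranges, and all function names are hypothetical and chosen only for illustration. The sketch assumes an exhaustive search, in which every combination of parameter values is one scenario.

```c
#include <stddef.h>

/* Hypothetical tuning parameter: a name plus a closed integer range. */
typedef struct {
    const char *name;
    int min;
    int max;
} tuning_parameter;

#define NUM_PARAMS 2

/* Hypothetical parameters, for illustration only. */
static const tuning_parameter params[NUM_PARAMS] = {
    { "opt_level", 0, 3 },
    { "unroll",    1, 2 },
};

/* Size of the tuning space: the product of all range sizes. */
size_t tuning_space_size(void)
{
    size_t size = 1;
    for (size_t i = 0; i < NUM_PARAMS; ++i) {
        size *= (size_t)(params[i].max - params[i].min + 1);
    }
    return size;
}

/* Sets `scenario` to the first combination (all parameters at minimum). */
void first_scenario(int scenario[NUM_PARAMS])
{
    for (size_t i = 0; i < NUM_PARAMS; ++i) {
        scenario[i] = params[i].min;
    }
}

/* Exhaustive search step: advances `scenario` to the next combination,
 * odometer style. Returns 0 once every combination has been visited. */
int next_scenario(int scenario[NUM_PARAMS])
{
    for (size_t i = 0; i < NUM_PARAMS; ++i) {
        if (scenario[i] < params[i].max) {
            ++scenario[i];
            return 1;
        }
        scenario[i] = params[i].min; /* wrap around and carry over */
    }
    return 0;
}

/* Walks the whole tuning space and counts the scenarios delivered. */
size_t count_scenarios(void)
{
    int scenario[NUM_PARAMS];
    size_t count = 0;

    first_scenario(scenario);
    do {
        ++count; /* a real tuning flow would run one experiment here */
    } while (next_scenario(scenario));
    return count;
}
```

With the two hypothetical parameters above, the tuning space holds 4 x 2 = 8 scenarios, and `count_scenarios` visits each of them exactly once. Other search algorithms would replace `next_scenario` and visit only part of the space.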
32. uested. The application is then released and the analysis is started. When the application encounters the end of the region again, it is suspended and the measured values are retrieved.

3.2 Enabling instrumentation: psc_instrument

Measuring the performance of an application is commonly based on the ability of the performance tool to communicate with the application at runtime. This can be achieved through the instrumentation of the application, i.e. by inserting tool specific calls into the source code or the compiled binary of the application. See also the right hand side of figure.

In order to enable instrumentation with Periscope, one needs to prepend the call to the psc_instrument script to the compiling and linking commands. This can usually be done by editing the Makefile of the application. For example, one should replace

    mpif90 -c <args>

with

    psc_instrument <psc_options> mpif90 -c <args>

for a Fortran code, and

    mpicc -c <args>

with

    psc_instrument <psc_options> mpicc -c <args>

for a C/C++ code. Do not forget to change both the compiling and the linking commands.

Please note that the script recognizes the -c argument passed to the compiler itself and uses it to decide between the instrumentation and the linking steps. It is thus required that the respective test application is built in two distinct steps: compilation and linking. Please c
33. ule cannot be instrumented. They have to be marked in the psc_inst_config file with none as the region type.

3.3.4 Reducing the instrumentation overhead

Especially in the case of large applications, the automatically instrumented code has a high execution overhead. To overcome this issue, Periscope can be instructed to perform an analysis of the generated overhead and to re-instrument the code accordingly. This can be achieved by means of the inst parameter of the psc_frontend executable:

    psc_frontend inst <overhead all_overhead analysis>

There are three possible automatic re-instrumentation strategies: overhead, all_overhead and analysis.

The overhead and all_overhead strategies will first determine too fine granular regions and remove their instrumentation. The overhead strategy removes only the overhead concerning the single node measurements. Other overheads may still lead to an extended execution time. The all_overhead strategy removes all overhead, so that the extra execution time caused by the instrumentation will be negligible.

The analysis instrumentation strategy also first determines the too fine granular regions, but unlike the previous strategies it will then instrument only those regions which are required in the next experiment. These regions are determined based on the analysis strategy given by the strategy parameter.

3.4 Manual instrumentation
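As a minimal sketch of manual instrumentation, the C fragment below marks the body of a main time-step loop as a user region, which is the kind of repetitive region Section 3.1 recommends as a phase region. It assumes the C/C++ directive spelling from Section 3.1 with the leading `#` restored. The functions `time_step` and `run_simulation`, the constant `GRID_POINTS` and the toy kernel are hypothetical and stand in for a real application.

```c
#include <stddef.h>

#define GRID_POINTS 8 /* hypothetical problem size */

/* Hypothetical kernel: adds 1.0 to every grid point per time step. */
static void time_step(double *grid, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        grid[i] += 1.0;
    }
}

/* Runs `steps` time steps and returns the sum over the grid. */
double run_simulation(int steps)
{
    double grid[GRID_POINTS] = { 0.0 };

    for (int step = 0; step < steps; ++step) {
        /* The loop body is marked as a user region. A compiler that
         * does not know these pragmas ignores them, possibly with a
         * warning, so the code also builds without psc_instrument. */
#pragma start_user_region
        time_step(grid, GRID_POINTS);
#pragma end_user_region
    }

    double sum = 0.0;
    for (size_t i = 0; i < GRID_POINTS; ++i) {
        sum += grid[i];
    }
    return sum;
}
```

If this were the only user region in the code, it would be selected as the phase region automatically. With several user regions, the phase option of psc_frontend would have to name this one through its fileid and the line number at which the region starts.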
