        Vampir 7 User Manual
         Contents
1. 6 A Use Case

6.1 Introduction

The Vampir suite has been applied successfully in many cases to identify performance bottlenecks and to assist in correcting them. To show how the provided toolset can be used to find performance problems in program code, this chapter illustrates one optimization process. The example is a three-part optimization of a weather forecast model that includes a simulation of cloud microphysics. Every run of the code was performed on 100 cores with manual function instrumentation, MPI communication instrumentation, and recording of the number of L2 cache misses.

In Figure 6.1, Vampir has been set up to show a high-level overview of the model's code. This layout can be achieved through two simple manipulations. First, set up the Master Timeline to adjust the process bar height to fit the chart height, so that all 100 processes are arranged in one view. Second, change the event category in the Function Summary to show function groups, which condenses the many functions into fewer function groups.

One run of the instrumented program took 290 seconds to finish. The first half of the trace (Figure 6.1 A) is the
2. Figure 6.6: Overview showing a significant overall improvement

By using the Vampir toolkit, three problems have been identified. As a consequence of addressing each problem, the duration of one iteration has been decreased from 3.5 seconds to 2.0 seconds.

As shown by the Ruler (Chapter 4.1) in Figure 6.6, two large iterations now take 84 seconds to finish, whereas at first (Figure 6.1) they took roughly 140 seconds. This is a reduction in run time of 40%.

This improvement has been achieved by using the insight into the program's runtime behavior provided by the Vampir toolkit to optimize the inefficient parts of the code.
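The improvement figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the Vampir toolkit; the helper names `relative_reduction` and `speedup` are chosen here for illustration, and the input values are the durations stated in the text.

```python
def relative_reduction(before, after):
    """Fraction of the original run time that was saved."""
    return (before - after) / before


def speedup(before, after):
    """How many times faster the optimized run is."""
    return before / after


# Durations in seconds, as quoted in the text.
large_before, large_after = 140.0, 84.0  # two large iterations
iter_before, iter_after = 3.5, 2.0       # one small iteration

for label, before, after in [
    ("two large iterations", large_before, large_after),
    ("one small iteration", iter_before, iter_after),
]:
    print(f"{label}: {relative_reduction(before, after):.0%} less time, "
          f"{speedup(before, after):.2f}x faster")
```

Note that a 40% reduction in run time corresponds to a speedup factor of about 1.67, which is why it helps to state explicitly which of the two measures is meant.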
3. gram. Therefore, simply use the compiler wrappers without any parameters, e.g.:

    vtf90 hello.f90 -o hello

For manual instrumentation with the VampirTrace API, include "vt_user.inc" (Fortran) or "vt_user.h" (C, C++) and label any user-defined sequence of statements for instrumentation as follows:

    VT_USER_START("name")
    ...
    VT_USER_END("name")

in Fortran and C, or in C++ as follows:

    VT_TRACER("name");

Afterwards, use

    vtcc -DVTRACE hello.c -o hello

to combine the manual instrumentation with automatic compiler instrumentation, or

    vtcc -vt:inst manual -DVTRACE hello.c -o hello

to prevent an additional compiler instrumentation.

For a detailed description of manual instrumentation, please consult the VampirTrace User Manual (http://www.tu-dresden.de/zih/vampirtrace).

2.3.2 Tracing an Application

Running a VampirTrace-instrumented application should normally result in an OTF trace file in the current working directory where the application was executed. On Linux, Mac OS and Sun Solaris, the default name of the trace file is equal to the application name. For other systems, the default name is a.otf but can be defined
4. Figure 2.2: Loading a Trace Log File in Vampir

Figure 2.3: Progress Bar and Cancel Loading Button

3 Basics

After loading has been completed, the "Trace View" window title displays the trace file's name as depicted in Figure. By default, the "Charts" toolbar and the "Zoom Toolbar" are available. Furthermore, the default set of charts
5. Solution

To even out this asymmetry, the code that determines the size of the work packages for each process had to be reconsidered. To achieve the desired effect, an improved version of the domain decomposition has been implemented. Figure 6.3 shows that all occurrences of the MICROPHYSICS routine are vertically aligned, and thus balanced. Additionally, the MPI receive routine calls are now clearly shorter than before. Comparing the Function Summary of Figure and Figure 6.3 shows that the relative time spent in MPI receive has decreased and, in turn, the relative time spent inside MICROPHYSICS has increased greatly. This means that we now spend more time computing and less time communicating, which is exactly what we want.

6.2.2 Serial Optimization

Problem

All displays in Vampir show information only for the time span visible in the Timeline. Thus the most time-intensive routine of one iteration can be determined by zooming into one or more iterations and looking at the Function Summary. The function with the largest bar takes up the most time. In this example (Figure 6.2), the MICROPHYSICS routine can be identified as the most costly part of an iteration. Therefore it is a good candidate for gaining speedup through serial optimization techniques.

Solution

In order to get a fine-grained view of the MICROPHYSICS routine's inner workings, we had to trace the program using full function instrumentation. Only then it
6. Figure 3.2: Moving and Arranging Charts in the Trace View Window (1)

The "Trace View" window can host an arbitrary number of charts. Charts can be
7. [Screenshot of the Call Tree chart: nested function calls (EXC_MOD, GRID_MOD, LAYOUT_MOD, FFT_MOD, MPI) listed with their time values]
8. It is possible to hide functions and function groups from the displayed information with the context menu entry "Filter". To mark the function or function group to be filtered, just click on the associated label or color representation in the chart. Using the "Process Filter" (see Section 4.4) allows you to restrict this view to a set of processes. As a result, only the consumed time of these processes is displayed for each function group or function. Instead of using the filter, which affects all other displays by hiding processes, it is possible to select a single process via "Set Process" in the context menu of the "Function Summary". This does not have any effect on other timeline displays.

The "Function Summary" can be shown as a "Histogram" (a bar chart like in timeline charts) or as a "Pie Chart". To switch between these representations, use the "Set Chart Mode" entry of the context menu.

The shown functions or function groups can be sorted by name or value via the context menu option "Sort By".

4.2 STATISTICAL CHARTS

4.2.3 Process Summary

The "Process Summary", shown in Figure 4.9, is similar to the "Function Summary" but shows the information for every process independently. This is useful for analyzing the balance between processes to reveal bottlenecks. For instance, finding that one process spends a significantly high time performing the calculations could indicate an unbalanced distribution of
9. That enables a very powerful format with respect to storage size, human readability, and search capabilities on timed event records.

In order to support fast and selective access to large amounts of performance trace data, OTF is based on a stream model, i.e. single separate units representing segments of the overall data. OTF streams may contain multiple independent processes, whereas a process belongs to a single stream exclusively. As shown in Figure 1.1, each stream is represented by multiple files which store definition records, performance events, status information, and event summaries separately. A single global master file holds the necessary information for the process-to-stream mappings.

Each file name starts with an arbitrary common prefix defined by the user. The master file is always named <name>.otf. The global definition file is named <name>.0.def. Events and local definitions are placed in files <name>.x.events and <name>.x.defs, where the latter files are optional. Snapshots and statistics are placed in files named <name>.x.snaps and <name>.x.stats, which are optional, too.

Note: Open the master file (*.otf) to load a trace. When copying, moving or deleting traces it is important to take all according files
10. been done in a before-and-after fashion to point out what changed by applying the specific improvements.

Figure 6.1: Master Timeline and Function Summary showing an overview of the program run

6.2 Identified Problems and Solutions

6.2.1 Computational Imbalance

Problem

As can be seen in Figure 6.2, each occurrence of the MICROPHYSICS routine (purple color) starts at the same time on all processes inside one iteration, but takes between 1.3 and 1.7 seconds to finish. This imbalance leads to idle time in subsequent synchronization calls on processes 1 to 4, because they have to wait for process 0 to finish its work (marked parts in Figure 6.2). This is wasted time, which could be used for computational work if all MICROPHYSICS calls had the same duration. Another hint at this synchronization overhead is the fact that the MPI receive routine uses 17.6% of the time of one iteration (Function Summary in Figure 6.2).
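The cost of such an imbalance can be estimated with a short calculation. The following is a minimal sketch under stated assumptions: the per-process durations are hypothetical values chosen within the 1.3 to 1.7 second range quoted above (with process 0 as the slowest), and the function name `idle_times` is illustrative rather than anything Vampir provides.

```python
def idle_times(durations):
    """Per process, the time spent waiting for the slowest process."""
    slowest = max(durations)
    return [slowest - d for d in durations]


# Hypothetical MICROPHYSICS durations in seconds for processes 0 to 4.
durations = [1.7, 1.3, 1.4, 1.5, 1.6]

idle = idle_times(durations)
total_idle = sum(idle)

# Share of the available compute time that is actually used for work:
# 1.0 would mean perfectly balanced work packages.
efficiency = sum(durations) / (len(durations) * max(durations))

print("idle time per process:", [round(t, 2) for t in idle])
print("total idle time:", round(total_idle, 2), "s")
print("load balance efficiency:", round(efficiency, 3))
```

With balanced work packages every entry of `idle` would be zero, which is the situation the improved domain decomposition aims for.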
11. chart. The functions take the color of their function group.

Figure 4.8: Function Summary
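The inclusive and exclusive time metrics offered by the Function Summary, and the way time is accumulated per function group, can be illustrated with a small model. This is a sketch only: the `Call` class, the call tree, and all durations are hypothetical and do not reflect Vampir's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    function: str
    group: str
    inclusive: float  # seconds spent in this call and all of its callees
    callees: list = field(default_factory=list)

    @property
    def exclusive(self):
        """Time spent in this call itself, without its callees."""
        return self.inclusive - sum(c.inclusive for c in self.callees)


def accumulate_exclusive(call, totals=None):
    """Sum exclusive time per function group over a whole call tree."""
    totals = {} if totals is None else totals
    totals[call.group] = totals.get(call.group, 0.0) + call.exclusive
    for callee in call.callees:
        accumulate_exclusive(callee, totals)
    return totals


# Hypothetical call tree of one process.
root = Call("main", "Application", 10.0, [
    Call("compute", "Application", 6.0, [Call("MPI_Send", "MPI", 1.5)]),
    Call("MPI_Recv", "MPI", 2.0),
])

totals = accumulate_exclusive(root)
print(totals)
```

Because exclusive times do not overlap, the per-group totals add up to the inclusive time of the root call, which is what makes them suitable for a summary chart.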
12. initialization part. Processes get started and synced, input is read and distributed among these processes, and the preparation of the cloud microphysics (function group: MP) is done here.

The second half is the iteration part, where the actual weather forecasting takes place. In a normal weather simulation this part would be much larger. However, in order to keep the recorded trace data and the overhead introduced by tracing as small as possible, only a few iterations have been recorded. This is sufficient since they all do the same work. Therefore the simulation has been configured to forecast the weather only 20 seconds into the future. The iteration part consists of two "large" iterations (Figure 6.1 B and C), each calculating 10 seconds of forecast. Each of these in turn is partitioned into several "smaller" iterations.

For our observations we focus on only two of these small, inner iterations, since this is the part of the program where most of the time is spent. The initialization work does not increase with increasing forecast duration and in a real-world run takes a relatively small amount of time. The constant part at the beginning of each large iteration takes less than a tenth of the whole iteration time. Therefore by far the most time is spent in the small iterations, which makes them the best place to look for improvement.

All screenshots starting with Figure have
13. installation depends on the operating system.

To install Vampir on a Unix machine, place the tarball in an arbitrary directory and unpack it.

On Windows platforms Vampir comes with an installer, which makes the installation simple and straightforward. Just run the installer and follow the installation wizard. Install Vampir in a directory of your choice; we recommend:

    C:\Program Files

In order to run the installer in silent (unattended) mode, use the /S option. It is also possible to specify the output directory of the installation with /D=dir. An example of running a silent installation is as follows:

    Vampir-7.3.0-Standard-setup-x86.exe /S /D=C:\Program Files

If you want to, you can associate Vampir with OTF trace files (*.otf) during the installation process. The Open Trace Format (OTF) is described in Chapter 1.2. This allows you to load a trace file quickly by double-clicking it. Subsequently, Vampir can be launched by double-clicking its icon or by using the command line interface (see Chapter 2.4).

2.2 Generation of Trace Data on Windows Systems

2.2.1 Enabling Performance Tracing

The generation of trace log files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program.

The Event Tracing for Windows (ETW) infrastructure of the Windows client and server operating systems is such a monitor. The Windows HPC Server 2008 version of MS-MPI has built i
14. into account, otherwise Vampir will render the whole trace invalid. Good practice is to hold all files belonging to one trace in a dedicated directory.

Detailed information about the Open Trace Format can be found in the OTF documentation (http://www.tu-dresden.de/zih/otf).

1.3 Vampir and Windows HPC Server 2008

The Vampir performance visualization tool usually consists of a performance monitor (VampirTrace) that records performance data and a performance GUI, which is responsible for the graphical representation of the data. In Windows HPC Server 2008, the performance monitor is fully integrated into the operating system, which simplifies its employment and provides access to a wide range of system metrics. A simple execution flag controls the generation of performance data. This is very convenient and an important difference to solutions based on explicit source, object, or binary modifications. Windows HPC Server 2008 is shipped with a translator, which produces trace log files in Vampir's Open Trace Format (OTF). The resulting files can be visualized very efficiently with the Vampir 7 performance data browser.

2 Getting Started

2.1 Installation of Vampir

Vampir is available on all major platforms but naturally its
15. manually by setting the environment variable VT_FILE_PREFIX to the desired name.

After a run of an instrumented application, the traces of the single processes need to be unified in terms of timestamps and event IDs. In most cases, this happens automatically. If it is necessary to perform unification of local traces manually, use the following command:

    vtunify <nproc> <prefix>

If VampirTrace was built with support for OpenMP and/or MPI, it is possible to speed up the unification of local traces significantly. To distribute the unification over multiple processes, the MPI-parallel version vtunify-mpi can be used as follows:

    mpirun -np <nranks> vtunify-mpi <nproc> <prefix>

2.4 Starting Vampir and Loading a Trace File

Viewing performance data with the Vampir GUI is very easy. On Windows the tool can be started by double-clicking its desktop icon (if installed) or by using the Start Menu. On a Unix-based machine run "./vampir" in the directory where Vampir is installed.

To open a trace file, select "Open..." in the "File" menu, which provides the file open dialog depicted in Figure 2.2. It is possible to filter the files in the list. The file type input selector determines the visible files. The default "OTF Trace Files (*.otf)" shows only files that can be processed by the tool. All file types can be displayed by using "All Files (*)". Alterna
16. run of the application using the Microsoft tool mpicsync. Now the eventlog files can be converted into OTF files with the help of the tool etl2otf. The last necessary step is to copy the generated OTF files from the compute nodes into one shared directory. This directory then includes all files needed by the Vampir performance GUI. The application performance can be analyzed now.

The following commands illustrate the procedure described above and show, as a practical example, how to trace an application on Windows HPC Server 2008. For proper utilization and thus successful tracing, the file system of the cluster needs to meet the following prerequisites:

• "\\share\userHome" is the shared user directory throughout the cluster
• The MS-MPI executable myApp.exe is available in the shared directory
• "\\share\userHome\Trace" is the directory where the OTF files are collected

1. Launch the application with tracing enabled (use of the -tracefile option):

    mpiexec -wdir \\share\userHome\ -tracefile %USERPROFILE%\trace.etl myApp.exe

• -wdir sets the working directory; myApp.exe has to be there
• %USERPROFILE% translates to the local home directory, e.g. "C:\Users\userHome"; on each compute node the eventlog file (.etl) is stored locally in this directory
17. was possible to inspect and measure subroutines and sub-subroutines of MICROPHYSICS. This way the most time-consuming subroutines have been spotted and could be analyzed for optimization potential.

The review showed that there were a couple of small functions which were called very often, so we simply inlined them. With Vampir you can determine how often a function is called by changing the metric of the Function Summary to the number of invocations.

The second inefficiency we discovered was invariant calculations being done inside loops, so we moved them in front of their loops.

Figure 6.3 sums up the tuning of the computational imbalance and the serial optimization. In the Timeline you can see that the duration of the MICROPHYSICS routine is now equal among all processes. Through serial optimization its duration has been decreased from about 1.5 seconds to 1.0 second. A decrease in duration of about 33% is quite good given the simplicity of the changes.

6.2.3 The High Cache Miss Rate

[Figure: Timeline for Process 0 with values of counter PAPI_L2_TCM over time]
18. Figure 4.7: Call Tree

4.2.2 Function Summary

The "Function Summary" chart (Figure 4.8) gives an overview of the accumulated time consumption across all function groups and functions. For example, every time a process calls the MPI_Send() function, the elapsed time of that function is added to the MPI function group time. The chart gives a condensed view of the execution of the application, and the different function groups can be compared so that dominant function groups are easily distinguished.

It is possible to change the information displayed via the context menu entry "Set Metric", which offers values like "Average Exclusive Time", "Number of Invocations", "Accumulated Inclusive Time" and others.

Note: "Inclusive" means the amount of time spent in a function and all of its subroutines. "Exclusive" means the amount of time spent in just this function.

The context menu entry "Set Event Category" specifies whether function groups or functions should be displayed in the
19. Figure 4.4: Counter Data Timeline

An example "Counter Data Timeline" chart is shown in Figure 4.4. The chart is restricted to one counter at a time. It shows the selected counter for one process. Using multiple instances of the "Counter Data Timeline", counters or processes can be compared easily.

The context menu entry "Set Counter" allows you to choose the displayed counter directly from a drop-down list. The entry "Set Process" selects the particular process for which the counter is shown.

4.1.3 Performance Radar

The Performance Radar chart provides a search for function occurrences in the trace file and an extended visualization of counter data.

It can happen that a function is not shown in the "Master Timeline" and "Process Timeline" due to a short runtime. An alternative to zooming is the option "Find Function...". A color-coded timeline indicates the intervals in which the function is executed.

Figure 4.5: Performance Radar Timeline - Search
20. Figure 4.10: Message Summary Chart with metric set to "Message Transfer Rate" showing the average transfer rate (A) and the minimal/maximal transfer rate (B)

All values are represented in a bar chart fashion. The number next to each bar is the group base, while the number inside a bar depicts the different values depending on the chosen metric. The "Set Metric" sub-menu of the context menu can be used to switch between "Aggregated Message Volume", "Message Size", "Number of Messages", and "Message Transfer Rate".

The group base can be changed via the context menu entry "Group By". It is possible to choose between "Message Size", "Message Tag", and "Communicator (MPI)".

Note: There will be one bar for every occurr
21. Figure 3.1: Trace View Window with Charts Toolbar (A) and Zoom Toolbar (B)

is opened automatically after loading has been finished. The charts can be divided into three groups: timeline, statistical, and informational charts. Timeline charts show detailed event-based information for arbitrary time intervals, while statistical charts reveal accumulated measures which were computed from the corresponding event data. Informational charts provide additional or explanatory information regar
22. Figure 2.1: MS-MPI Tracing Overview (1. Run myApp with tracing enabled; 2. Time-sync the ETL logs; 3. Convert the ETL logs to OTF; 4. Copy OTF files to head node)

2. Time-sync the eventlog files throughout all compute nodes:

    mpiexec -cores 1 -wdir %USERPROFILE% mpicsync trace.etl

• -cores 1: run only one instance of mpicsync on each compute node

3. Format the eventlog files to OTF files:

    mpiexec -cores 1 -wdir %USERPROFILE% etl2otf trace.etl

4. Copy all OTF files from the compute nodes to the trace directory on the share:

    mpiexec -cores 1 -wdir %USERPROFILE% cmd /c copy /y "*_otf*" "\\share\userHome\Trace"

More information about performance tracing of MPI applications can be found in the Microsoft HPC SDK tutorial "Tracing the Execution of MPI Applications with Windows HPC Server 2008" (http://resourcekit.windowshpc.net/MORE_INFO/TracingMPIApplications.html).

2.3 Generation of Trace Data on Linux Systems

The generation of trace files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program.

In contrast to Windows HPC Server 2008, where the performance monitor is integrated into the operating system, recording performance under Linux is done by a separate performance monitor. We recommend our VampirTrace
23. CHAPTER 4. PERFORMANCE DATA VISUALIZATION

It is possible to profile only one function or function group, or to hide functions and function groups from the displayed information. To mark the function or function group to be profiled or filtered, click on the associated color representation in the chart. The context menu then offers the entries "Profile of Selected Function (Group)" and "Filter of Selected Function (Group)". Using the "Process Filter" (see Section 4.4) allows you to restrict this view to a set of processes.

The context menu entry "Sort by" allows you to order function profiles by "Number of Clusters". This option is only accessible if the chart is clustered; otherwise function profiles are sorted by process automatically. When a single function is profiled, the context menu entry "Sort by Value" additionally allows you to order functions by length.

4.2.4 Message Summary

The "Message Summary" is a statistical chart showing an overview of the different messages grouped by certain characteristics, as shown in Figure 4.10.
24. Figure 3.7: Docking of a Chart

When hovering over the blank space between the labels and the graphical representation, a movable separator appears. After clicking a separator decoration, moving the mouse while keeping the left mouse button pressed resizes the labels. The whole process is illustrated in Figure 3.8.

3.2 Context Menus

All chart displays have their own context menu with common entries as well as display-specific ones. In the following section, only the most common entries will be discussed. A context menu can be accessed by right-clicking in the display window.

Common entries are:

• Reset Zoom: Go back to the initial state in horizontal zooming.
• Reset Vertical Zoom: Go back to the initial state in vertical zooming.
• Set Metric: Change values which should be represented in the chart, e.g. 
25. (Screenshot residue: Trace View window with Function Legend entries and the Function Summary, Message Summary, and Communication Matrix View charts. No caption is present in this part.)
26. "Exclusive Time" to "Inclusive Time".

• Sort By: Rearrange values or bars by a certain characteristic.

Figure 3.8: Resizing Labels: (A) Hover a Separator Decoration; (B) Drag and Drop the Separator

3.3 Zooming

Zooming is a key feature of Vampir. In most charts it is possible to zoom in and out to get abstract and detailed views of the visualized data. In the timeline charts, zooming produces a more detailed view of a specific time interval and therefore reveals new information that could not be seen in the larger section. Short function calls in the "Master Timeline" may not be visible unless an appropriate zooming level has been reached. If the execution time of these functions is too short for the pixel resolution of your computer display, a shorter time interval must be selected.

Note: Other ch
27. GWT forschung innovation

Vampir 7

User Manual

Copyright © 2011 GWT-TUD GmbH
Blasewitzer Str. 43
01307 Dresden, Germany

http://gwtonline.de

Support / Feedback / Bugreports

Please provide us feedback. We are very interested to hear what people like, dislike, or what features they are interested in.

If you experience problems or have suggestions about this application or manual, please contact service@vampir.eu.

When reporting a bug, please include as much detail as possible so that it can be reproduced. Please send the version number of your copy of Vampir along with the bug report. The version is stated in the "About Vampir" dialog, accessible from the main menu under "Help" → "About Vampir".

Please visit http://vampir.eu for updates and new versions.

Manual Version: 2011-06-18 / Vampir 7.4

Contents

Con
28. Figure 3.4: A Custom Chart Arrangement in the Trace View Window

Figure 3.5: Closing (right) and Undocking (left) of a Chart

added by clicking on the respective "Charts" toolbar icon or the corresponding "Chart" menu entry. With a few more clicks, charts can be combined to 
29. This tool experience is now available for HPC systems that are based on Microsoft Windows HPC Server 2008. This new Windows edition of Vampir combines modern scalable event processing techniques with a fully redesigned graphical user interface.

1.1 Event-based Performance Tracing and Profiling

In software analysis, the term profiling refers to the creation of tables which summarize the runtime behavior of programs by means of accumulated performance measurements. Its simplest variant lists all program functions in combination with the number of invocations and the time that was consumed. This type of profiling is also called inclusive profiling, as the time spent in subroutines is included in the statistics computation.

A commonly applied method for analyzing details of parallel program runs is to record so-called trace log files during runtime. The data collection process itself is also referred to as tracing a program. Unlike profiling, the tracing approach records timed application events like function calls and message communication as a combination of timestamp, event type, and event-specific data. This creates a stream of events, which allows very detailed observations of parallel programs. With this technology, synchronization and communication patterns of parallel program runs can be traced and analyzed in terms of performance and correctness. The analysis is usually carried out in a postmortem step, i.e., after completio
30. Figure 4.16: Comparison between Context Information

4.4 Information Filtering and Reduction

Due to the large amount of information that can be stored in trace files, it is usually necessary to reduce the displayed information according to some filter criteria. In Vampir, there are different ways of filtering. It is possible to limit the displayed information to a certain choice of processes or to specific types of communication events, e.g. to certain types of messages or collective operations. Deselecting an item in a filter means that this item is fully masked. In Vampir, filters are global, so masked items no longer show up in any chart. Filtering affects not only the different charts but also the "Zoom Toolbar". The different filters can be reached via the "Filter" entry in the main menu.

Example: Figure 4.17 shows a typical process representation in the "Process Filter" window. This kind of representation is the same for all other filters. Processes can be filtered by their "Process Group", "Communicators", and "Process Hierarchy". Items to be filtered are arra
31. (Screenshot residue: Function Summary bar values, a Master Timeline listing Process 0 to Process 24, and Function Legend entries. No caption is present in this part.)
32. a custom chart arrangement as depicted in Figure 3.4. Customized layouts can be saved as described in Chapter 5.3.

Every chart can be undocked or closed by clicking the dedicated icon in its upper right corner, as shown in Figure 3.5. Undocking a chart means to free the chart from the current arrangement and present it in its own window. To dock or undock a chart, see Figures 3.6 and 3.7.
33. arts can be affected when zooming in timeline displays. That is, the interval chosen in a timeline chart such as "Master Timeline" or "Process Timeline" also defines the time interval for the calculation of accumulated measurements in the statistical charts.

Statistical charts like the "Function Summary" provide zooming of statistic values. In these cases zooming does not affect any other chart. Zooming is disabled in the "Pie Chart" mode of the "Function Summary", reachable via the context menu under "Set Chart Mode" → "Pie Chart".

CHAPTER 3. BASICS

To zoom into an area, click and hold the left mouse button and select the area, as shown in Figure 3.9. It is possible to zoom horizontally and, in some charts, also vertically. Horizontal zooming in the "Master Timeline" defines the time interval to be visualized, whereas vertical zooming selects a group of processes to be displayed. To scroll horizontally, move the slider at the bottom or use the mouse wheel.

Figure 3.9: Zooming within a Chart

Addit
34. button held) to the intended position executes horizontal zooming in all charts.

Note: Instead of dragging boundaries it is also possible to use the mouse wheel for zooming. Hover over the "Zoom Toolbar" and scroll up to zoom in or scroll down to zoom out.

Dragging the zoom area changes the section that is displayed without changing the zoom factor. To drag, click into the highlighted zoom area and drag and drop it to the desired region. Zooming and dragging within the "Zoom Toolbar" is shown in Figure 3.10. Double-clicking in the "Zoom Toolbar" restores the initial zooming state.

The colors represent user-defined groups of functions or activities. Please note that all charts added to the "Trace View" window will adapt their statistics information according to this time interval selection. The "Zoom Toolbar" can be disabled and enabled with the toolbar's context menu entry "Zoom Toolbar".

3.5 The Charts Toolbar

Use the "Charts" toolbar to open instances of the different charts. It is situated in the upper left corner of the main window by default, as shown in Figure 3.1, and can be dragged and dropped to another position as desired. The "Charts" toolbar can be disabled with the toolbar's context menu entry "Charts".

Table 3.1 shows the different icons representing the charts in the "Charts" toolbar. The icons are arranged in three groups, divided by a small separator. The first group represents timeline charts, whose zooming states aff
35. can be increased by a recipient that delays reception for some reason. This will cause the duration to increase (by this delay) and the message rate, which is the size of the message divided by the duration, to decrease accordingly.

Figure 4.11: Communication Matrix View

4.2.6 I/O Summary

The "I/O Summary", shown in Figure 4.12, is a statistical chart giving an overview of the input/output operations recorded in the trace file.

Figure 4.12: I/O Summary

All values are represented in a histogram fashion. The text label indicates the group base while the number inside each bar represents t
36. Figure 4.14: A chosen marker (A) and its representation in the Marker View (B)

"Context View" may contain several tabs; a new empty one can be added by clicking on the "add" symbol on the right-hand side. If an object in another chart is selected, its information is displayed in the 
37. current tab. If the "Context View" is closed, it opens automatically at that moment.

The "Context View" offers a comparison between the information that is displayed in different tabs. Use the symbol on the left-hand side and choose two objects in the dialog that appears. It is possible to compare different elements from different charts, which can be useful in some cases. The comparison shows a list of common properties. The corresponding values are displayed, together with their difference if the values are numbers. The first line always shows the names of the displays.

Figure 4.15: Context View, showing context information (B) of a selected function (A)
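The comparison rule described for the "Context View" (list the common properties, show both values, and show a difference only for numeric values) can be sketched in a few lines of Python. This is an illustrative sketch only, not Vampir's implementation: the function name compare_properties, the property names, the sample values, and the sign convention (second value minus first) are all assumptions made for the example.

```python
def compare_properties(first, second):
    """Return (name, value_1, value_2, diff) for properties present in both inputs.

    diff is value_2 - value_1 when both values are numbers, otherwise None.
    The sign convention is an assumption chosen for this sketch.
    """
    rows = []
    for name, value_1 in first.items():
        if name not in second:
            continue  # only common properties are listed
        value_2 = second[name]
        numeric = all(isinstance(v, (int, float)) for v in (value_1, value_2))
        diff = value_2 - value_1 if numeric else None
        rows.append((name, value_1, value_2, diff))
    return rows


# Hypothetical context information taken from two tabs.
tab_a = {"Function": "MPI_Bcast", "Interval Begin": 5.5, "Duration": 0.25}
tab_b = {"Function": "MPI_Bcast", "Duration": 0.5, "Process": "Process 4"}

for row in compare_properties(tab_a, tab_b):
    print(row)
```

Properties that exist in only one of the two objects ("Interval Begin" and "Process" in this example) are left out, which mirrors the statement that only common properties are listed.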
38. ding timeline and statistical charts. All available charts can be opened using the "Charts" toolbar, which is explained in Chapter 3.5.

In the following section we explain the basic functions of the Vampir GUI which are generic to all charts. Feel free to go to Chapter 4 to skip the fundamentals and start directly with the details about the different charts.

3.1 Chart Arrangement

The utility of charts can be increased by correlating them and the information they provide. Vampir supports this mode of operation by allowing multiple charts to be displayed at the same time. Charts that display a sequence of events, such as the "Master Timeline" and the "Process Timeline" chart, are aligned vertically. This alignment ensures that the temporal relationship of events is preserved across chart boundaries.

The user can arrange the placement of the charts according to their preferences by dragging them into the desired position. When the left mouse button is pressed while the mouse pointer is located above a placement decoration, the layout engine gives visual clues as to where the chart may be moved. As soon as the user releases the left mouse button, the chart arrangement is changed accordingly. The entire procedure is depicted in Figures 3.2 and

The flexible display architecture furthermore allows increasing or decreasing the screen space that is used by a chart. Charts of particular interest 
39. Figure 6.4: Before Tuning: Counter Data Timeline revealing a high amount of L2 cache misses inside the CLIPPING routine (light blue)

Figure 6.5: After Tuning: Visible improvement of the cache usage

Problem

As can be seen in the Counter Data Timeline (Figure 6.4), the CLIPPING routine (light blue) causes a high amount of L2 cache misses. Its duration is also long enough to make it a candidate for inspection. These inefficiencies in cache usage were caused by nested loops that accessed data in a very random, non-linear fashion. Data access can only profit from the cache if subsequent reads access data in the vicinity of the previously accessed data.

Solution

After reordering the nested loops to match the memory order, the tuned version of the CLIPPING routine now needs only a fraction of the original time (Figure 6.5).

6.3 Conclusion
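As a side note to the CLIPPING fix in Section 6.2, the effect of loop order on the memory access pattern can be made visible with a small Python sketch. It does not reproduce the weather model's code and does not measure cache misses. It only lists, for a two-dimensional array assumed to be stored row by row (row-major), the flat memory offsets that two loop orders visit and the distance between consecutive accesses. The function names and the 3 x 4 array size are assumptions for illustration; the original loops were described as accessing data in a random fashion, whereas this sketch uses the simpler case of a regular but mismatched loop order.

```python
def access_offsets(n_rows, n_cols, row_outer):
    """Flat row-major offsets in the order a nested loop visits the elements."""
    offsets = []
    if row_outer:
        # Outer loop over rows, inner loop over columns: matches row-major order.
        for i in range(n_rows):
            for j in range(n_cols):
                offsets.append(i * n_cols + j)
    else:
        # Outer loop over columns, inner loop over rows: jumps through memory.
        for j in range(n_cols):
            for i in range(n_rows):
                offsets.append(i * n_cols + j)
    return offsets


def strides(offsets):
    """Distance in elements between each access and the next one."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


matched = strides(access_offsets(3, 4, row_outer=True))
mismatched = strides(access_offsets(3, 4, row_outer=False))

print("matched loop order:   ", matched)
print("mismatched loop order:", mismatched)
```

With the matching loop order every access is exactly one element after the previous one, so subsequent reads stay in the vicinity of previously accessed data. With the mismatched order each step jumps by a whole row length, which is the kind of pattern that defeats the cache on large arrays. In a column-major language such as Fortran the roles of the two loop orders are swapped.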
40. Figure 5.3: Saving Policy Settings

In the dialog "Saving Behavior" you tell Vampir what to do in the case of changed preferences. The user can choose the categories of settings, e.g. layout, that should be treated. Possible options are that the application automatically "Always" or "Never" saves changes. The default option is to have Vampir ask you whether to save or discard changes.

Usually the settings are stored in the folder of the trace file. If the user has no write access to it, the settings can alternatively be placed in the "Application Data Folder". All such stored settings are listed in the tab "Locally Stored Preferences" with creation and modification date.

Note: On loading, Vampir always favors settings in the "Application Data Folder".

"Default Preferences" offers to save the preferences of the current trace file as default settings. They are then used for trace files without settings. Another option is to restore the default settings. Then the current preferences of the trace file are reverted
41. ect all other charts. The second group consists of statistical charts, providing special information and statistics for a chosen interval. Vampir allows multiple instances for charts of these categories. The last group comprises informational charts, providing specific textual information or legends. Only one instance of an informational chart can be opened at a time.

Figure 3.10: Zooming and Navigation within the Zoom Toolbar: (A, B) Zooming in/out with Mouse Wheel; (C) Scrolling by Moving the Highlighted Zoom Area; (D) Zooming by Selecting and Moving a Boundary of the Highlighted Zoom Area

3.6 Properties of the Trace File

Vampir provides an info dialog containing the most important characteristics of the opened trace file. This dialog 
42. he receiver of the message. The corresponding function calls normally reflect a pair of MPI communication directives like MPI_Send() and MPI_Recv(). It is also possible to show a collective communication like MPI_Allreduce() by selecting one corresponding message, as shown in Figure 4.3. Furthermore, additional information like message bursts, markers, and I/O events is available. The following table shows the symbols and descriptions of these objects.

Figure 4.3: Selected MPI Collective in Master Timeline

Symbol and description:

Message Burst: Due to a lack of pixels it is not possible to display a large amount of messages in a very short time interval. Therefore outgoing messages are summarized as so-called message bursts. In this representation you cannot determine which processes receive these messages. Zooming into this interval reveals the corresponding single messages.

Markers (multiple, single): To indicate particular points of interest during the runtime of an application, like errors or warnings, markers can be placed in a trace file. They are drawn as triangles, which are colored according to their types. To illustrate that two or more markers are located at the same pixel, a tricolored triangle is drawn.

I/O Events: Vampir shows detailed information about I/O operations, if they are included in the trace file. I/O events are depicted as triangles at the beginning of an I/O interval. Multiple I/O events are 
43. he so-called timeline chart. This chart type graphically presents the chain of events of monitored processes or counters on a horizontal time axis. Multiple timeline chart instances can be added to the "Trace View" window via the "Chart" menu or the "Charts" toolbar.

Note: To measure the duration between two events in a timeline chart, Vampir provides a tool called Ruler. To use the Ruler, click on any point of interest in a timeline display and move the mouse while holding the left mouse button and the "Shift" key pressed. A ruler-like pattern appears in the current timeline chart, which provides a rough measurement directly. The exact time between the start point and the current mouse position is given in the status bar. If the "Shift" key is released before the left mouse button, Vampir will proceed with zooming.

4.1.1 Master Timeline and Process Timeline

In the Master Timeline and the Process Timeline, detailed information about functions, communication, and synchronization events is shown. Timeline charts are available for individual processes ("Process Timeline") as well as for a collection of processes ("Master Timeline"). The "Master Timeline" consists of a collection of rows. Each row represents a single process, as shown in Figure 4.1. A "Process Timeline" shows the different levels of function calls in a stacked bar chart for a single process, as depicted in Figure 4.2.

Every timeline row consists of a process name on the left and a col
44. he value of the chosen metric. The "Set Metric" sub-menu of the context menu can be used to access the available metrics "Number of I/O Operations", "Accumulated I/O Transaction Sizes", and all ranges of "I/O Operation Size", "I/O Transaction Time", or "I/O Bandwidth".

The I/O operations can be grouped by the characteristics "Transaction Size", "File Name", and "Operation Type". The group base can be changed via the context menu entry "Group I/O Operations by".

Note: There will be one bar for every occurring metric. For a quick and convenient overview it is also possible to show minimum, maximum, and average values for the metrics "Transaction Size Range of I/O Operations", "Time Range of I/O Operations", and "Bandwidth Range of I/O Operations" all at once. The minimum and maximum values are shown in an additional, smaller bar beneath the bar indicating the average value. The additional bar starts at the minimum and ends at the maximum value of the metric.

To select which I/O operation types should be considered for the statistic calculation, the "Set I/O Operations" sub-menu of the context menu can be used. Possible options are "Read", "Write", "Read, Write", and "Apply Global I/O Operations Filter", the last of which includes all selected operation types from the "I/O Events" filter dialog (see Chapter 4.4).

4.3 Informational Charts

4.3.1 Function Legend

The "Functi
45. ing group. However, if the metric is set to "Message Transfer Rate", the minimal and the maximal transfer rates are given in an additional bar beneath the one showing the average transfer rate. The additional bar starts at the minimal rate and ends at the maximal one.

To filter out messages, click on the associated label or color representation in the chart and then choose "Filter" from the context menu.

4.2.5 Communication Matrix View

The "Communication Matrix View" is another way of analyzing communication imbalances. It shows information about messages sent between processes. The chart, as shown in Figure 4.11, is laid out as a table. Its rows represent the sending processes, whereas the columns represent the receivers. The color legend on the right indicates the displayed values and changes depending on the displayed information.

It is possible to change the type of displayed values. Different metrics, such as the average duration of messages passed from sender to recipient or the minimum and maximum bandwidth, are offered. To change the type of value that is displayed, use the context menu option "Set Metric".

Use the "Process Filter" to define which process groups should be displayed (see Section 4.4).

Note: A high duration is not automatically caused by a slow communication path between two processes, but can also be due to the fact that the time between starting transmission and successful reception of the message 
46. ionally the zoom can be accessed with the help of the "Zoom Toolbar" by dragging the borders of the selection rectangle or scrolling down the mouse wheel as described in Chapter

To return to the previous zooming state, the global "Undo" is provided, which can be found in the "Edit" menu. Alternatively, press "Ctrl+Z" to revert the last zoom. Accordingly, a zooming action can be repeated by selecting "Redo" in the "Edit" menu or pressing "Ctrl+Shift+Z". Both functions work independently of the current mouse position. Next to "Undo" and "Redo" it is shown which kind of action in which display could be undone and redone, respectively. To get back to the initial state of zooming quickly, select "Reset Horizontal Zoom" or "Reset Vertical Zoom" (see Section 3.2) in the context menu of the desired timeline display. Resetting the zoom is also an action that can be reverted by "Undo".

3.4 The Zoom Toolbar

Vampir provides a "Zoom Toolbar" that can be used for zooming and navigation in the trace data. It is situated in the upper right corner of the "Trace View" window, as shown in Figure 3.1. It can be dragged and dropped to another position as desired. The "Zoom Toolbar" offers an overview of the data displayed in the corresponding charts. The currently zoomed area is highlighted as a rectangle within the "Zoom Toolbar". Clicking on one of the two boundaries and moving it (with left mouse 
47. is called "Trace Properties" and can be accessed by "File" → "Get Info". The information originates from the trace file and includes details such as the filename, the creator, and the OTF version.

Description                   Reference
Master Timeline               Section 4.1.1
Process Timeline              Section 4.1.1
Counter Data Timeline         Section 4.1.2
Performance Radar             Section 4.1.3
Function Summary              Section 4.2.2
Message Summary               Section 4.2.4
Process Summary               Section 4.2.3
Communication Matrix View     Section 4.2.5
I/O Summary                   Section 4.2.6
Call Tree                     Section 4.2.1
Function Legend               Section 4.3.1
Context View                  Section 4.3.3
Marker View                   Section

Table 3.1: Icons of the Toolbar

4 Performance Data Visualization

This chapter deals with the different charts that can be used to analyze the behavior of a program and the comparison between different function groups, e.g. MPI and Calculation. Communication performance issues are also covered in this chapter. Various charts address the visualization of data transfers between processes. The following sections describe them in detail.

4.1 Timeline Charts

A very common chart type used in event-based performance analysis is t
48. [Screenshot: Function Summary chart being undocked from the "Trace View" window]

Figure 3.6: Undocking of a Chart

Considering that labels, e.g. those showing names or values of functions, often need more space to show their whole text, there is a further form of resizing/arranging. In order to read labels completely, it is possible to adjust the distribution of space between the labels and the graphical representation in a chart.

[Screenshot: Vampir "Trace View" window with a list of function names]
49. lor blindness.

5.2 Appearance

In the "Appearance" settings of the "Preferences" dialog there are six different objects for which the color options can be changed: the functions/function groups, markers, counters, collectives, messages, and I/O events. Choose an entry and click on its color to make a modification. A color picker dialog opens where it is possible to adjust the color. For messages and collectives, a change of the line width is also available.

In order to quickly find the desired item, a search box is provided at the bottom of the dialog.

[Screenshot: "Preferences" dialog with the "Appearance" page selected, listing function groups and a search box]

Figure 5.2: Appearance Settings

5.3 Saving Policy

Vampir detects whenever changes to the various settings are made. In the "Saving Policy" dialog it is possible to adjust the saving behavior of the different components to one's own needs.

[Screenshot: "Preferences" dialog with the "Saving Policy" page selected]
50. [Screenshot: Vampir "Trace View" of the original trace with Master Timeline and Function Summary]

Figure 6.2: Before Tuning: Master Timeline and Function Summary identifying MICROPHYSICS (purple color) as predominant and unbalanced

[Screenshot: Vampir "Trace View" of the tuned trace with Master Timeline and Function Summary]

Figure 6.3: After Tuning: Timeline and Function Summary showing an improvement in communication behavior
51. may get more space in order to render information in more detail.

[Screenshot: Vampir "Trace View" with function summary charts listing accumulated times per function]
52. monitoring facility which is available as Open Source software.

During a program run of an application, VampirTrace generates an OTF trace file, which can be analyzed and visualized by Vampir. The VampirTrace library allows MPI communication events of a parallel program to be recorded in a trace file. Additionally, certain program-specific events can also be included. To record MPI communication events, simply relink the program with the VampirTrace library. A new compilation of the program source code is only necessary if program-specific events should be added.

Detailed information on the installation and usage of VampirTrace can be found in the "VampirTrace User Manual".

2.3.1 Enabling Performance Tracing

To perform measurements with VampirTrace, the application program needs to be instrumented. Although VampirTrace handles this automatically by default, manual instrumentation is also possible.

All the necessary instrumentation of user functions, MPI, and OpenMP events is handled by the compiler wrappers of VampirTrace (vtcc, vtcxx, vtf77, vtf90 and the additional wrappers mpicc-vt, mpicxx-vt, mpif77-vt, and mpif90-vt in Open MPI 1.3).

All compile and link commands in the used makefile should be replaced by the VampirTrace compiler wrapper, which performs the necessary instrumentation of the program and links the suitable VampirTrace library.

Automatic instrumentation is the most convenient method to instrument your pro
53. n Legend
4.4 Information Filtering and Reduction

5 Customization
    5.1 General Preferences
    5.2 Appearance
    5.3 Saving Policy

    6.1 Introduction
    6.2 Identified Problems and Solutions
        6.2.1 Computational Imbalance
        6.2.2 Serial Optimization
        6.2.3 The High Cache Miss Rate
    6.3 Conclusion

1 Introduction

Performance optimization is a key issue for the development of efficient parallel software applications. Vampir provides a manageable framework for analysis, which enables developers to quickly display program behavior at any level of detail. Detailed performance data obtained from a parallel program execution can be analyzed with a collection of different performance views. Intuitive navigation and zooming are the key features of the tool, which help to quickly identify inefficient or faulty parts of a program code. Vampir implements optimized event analysis algorithms and customizable displays, which enable fast and interactive rendering of very complex performance monitoring data. Ultra-large data volumes can be analyzed with a parallel version of Vampir, which is available on request.

Vampir has a product history of more than 15 years and is well established on Unix-based HPC systems. 
54. n of the program. It is needless to say that program traces can also be used to calculate the profiles mentioned above. Computing profiles from trace data allows arbitrary time intervals and process groups to be specified. This is in contrast to "fixed" profiles accumulated during runtime.

1.2 The Open Trace Format (OTF)

The Open Trace Format (OTF) was designed as a well-defined trace format with open, public domain libraries for writing and reading. This open specification of the trace information enables analysis and visualization tools like Vampir to operate efficiently at large scale. The format addresses large applications written in an arbitrary combination of Fortran77, Fortran (90/95 etc.), C, and C++.

[Diagram: a master control file (name.otf) and global definitions (name.0.def), together with per-stream files for events (name.x.events), statistics (name.x.stats), snapshots (name.x.snaps), and local definitions]

Figure 1.1: Representation of Streams by Multiple Files

OTF uses a special ASCII data representation to encode its data items with numbers and tokens in hexadecimal code without special prefixes 
55. n support for this monitor. It enables application developers to quickly produce traces in production environments by simply adding an extra mpiexec flag (-trace). In order to trace an application, the user account is required to be a member of the "Administrator" or "Performance Log Users" groups. No special builds or administrative privileges are necessary. The cluster administrator will only have to add the "Performance Log Users" group to the head node's "Users" group if you want to use this group for tracing. Trace files will be generated during the execution of your application. The recorded trace log files include the following events: any MS-MPI application call and low-level communication within sockets, shared memory, and NetworkDirect implementations. Each event includes a high-precision CPU clock timer for precise visualization and analysis.

2.2.2 Tracing an MPI Application

The steps necessary for monitoring the MPI performance of an MS-MPI application are depicted in Figure. First, the application needs to be available throughout all compute nodes in the cluster and has to be started with tracing enabled. The Event Tracing for Windows (ETW) infrastructure writes event logs (.etl files) containing the respective MPI events of the application on each compute node. In order to achieve consistent event data across all compute nodes, clock corrections need to be applied. This step is performed after the successful 
56. [Screenshot: "Preferences" dialog with the "General" page selected, including the color blindness options]

Figure 5.1: General Settings

"Show time as" decides whether the time format for the trace analysis is based on seconds or ticks.

With the "Automatically open context view" option disabled, Vampir does not open the context view after the selection of an item, like a message or function.

"Use color gradient in charts" allows switching off the color gradient used in the performance charts.

The next option is to change the style and size of the font.

"Show source code" makes it possible to open an editor showing the respective source file. In order to open a source file, first click on the intended function in the "Master Timeline" and then on the source code path in the "Context View". For the source code location to work properly, you need a trace file with source code location support. The path to the source file can be adjusted in the "Preferences" dialog. A limit for the size of the source file can be set, too.

In the "Analysis" section the number of analysis threads can be chosen. If this option is disabled, Vampir determines the number automatically from the number of cores, e.g. two analysis threads on a dual-core machine.

In the "Updates" section the user can decide if Vampir should check automatically for new versions.

It is also possible to use Vampir with support for co
57. nged in a spreadsheet representation. In addition to selecting or deselecting an entire group of processes, it is also possible to filter single processes.

[Screenshot: "Filter Processes" dialog with "Include/Exclude All" check boxes, communicators, and process hierarchy; number of processes: 16, selected processes: 16]

Figure 4.17: Process Filter

Different selection methods can be used in a filter. The check box "Include/Exclude All" either selects or deselects every item. Specific items can be selected/deselected by clicking the check box next to them. Furthermore, it is possible to select/deselect multiple items at once. To do so, mark the desired entries by clicking their names while holding either the "Shift" or the "Ctrl" key. By holding the "Shift" key, every item in between the two clicked items will be marked. Holding the "Ctrl" key, on the other hand, enables you to add specific items to or remove them from the marked ones. Clicking the check box of one of the marked entries will cause selection/deselection for all of them.

Filter Object            Filter Criteria
Processes                Process Groups, Communicators, Process Hierarchy
Collective Operations    Communicators, Collective Operations
Messages                 Message Communicator
58. of Functions

By default the Performance Radar shows the values of one counter for each process, as shown in Figure. In this mode the user can choose between "Line Plot" and "Color Coded" drawing. In the latter case a color scale at the bottom indicates the range of values. Clicking on "Set Counter" opens a dialog that offers to choose another counter and to calculate the sum or average values. Summarizing means that the values of the selected counter of all processes are summed up. The average is this sum divided by the number of processes. Both options provide a single graph.

4.2 Statistical Charts

4.2.1 Call Tree

The "Call Tree", depicted in Figure 4.7, illustrates the invocation hierarchy of all monitored functions in a tree representation. The display reveals information about the number of invocations of a given function, the time spent in the different calls, and the caller/callee relationship.

[Screenshot: Performance Radar showing values of counter MEM_APP_ALLOC over time for Processes 0 to 4]

Figure 4.6: Performance Radar Timeline - Visualization of Counters

The entries of the "Call Tree" can be s
59. on Legend" lists all visible function groups of the loaded trace file along with their corresponding colors.

If colors of functions are changed, they appear in a tree-like fashion under their respective function group as well (see Figure 4.13).

4.3.2 Marker View

The "Marker View" lists all marker events included in the trace file.

The display is arranged in a tree-like fashion and organizes the marker events in their respective groups and types. Additional information, like the time of occurrence in the trace file and its description, is provided for each marker.

[Screenshot: Function Summary ("All Processes, Accumulated Exclusive Time per Function Group") next to the Function Legend]

Figure 4.13: Function Legend

By clicking on a marker event in the "Marker View", this event gets selected in the timeline displays that are currently open and vice versa. If this marker event is not visible, the zooming area jumps to this event automaticall
60. ored sequence of function calls or program phases on the right. The color of a function is defined by its group membership, e.g., MPI_Send() belonging to the function group MPI has the same color, presumably red, as MPI_Recv(), which also belongs to the function group MPI. Clicking on a function highlights it and causes the "Context View" display to show detailed information about that particular function, e.g. its corresponding function group name, time interval, and the complete name. The "Context View" display is explained in Chapter 4.3.3.

[Screenshot: Master Timeline with MPI_Waitany calls visible across the process rows]

Figure 4.1: Master Timeline

Some function invocations are very short and thus do not show up in the overall view due to a lack of display pixels. A zooming mechanism is provided to inspect a specific time interval in more detail. For further information see Section 3.3. If zooming is performed, panning in horizontal direc
61. orted in various ways. Simply click on one header of the tree representation to use its characteristic to re-sort the "Call Tree". Please note that not all available characteristics are enabled by default. To add or remove characteristics, a context menu is provided, accessible by right-clicking on any of the tree headers.

To browse through the different function calls, it is possible to fold and unfold the levels of the tree. This can be achieved by double-clicking a level or by using the fold level buttons next to the function name.

Functions can be called by many different caller functions, which is hardly obvious in the tree representation. Therefore, a relation view shows all callers and callees of the currently selected function in two separate lists, as shown in the lower area of Figure 4.7.

To find a certain function by its name, Vampir provides a search option accessible via the context menu entry "Show Find View". The entered keyword has to be confirmed by pressing the Return key. The "Previous" and "Next" buttons can be used to step through the results afterwards.

[Screenshot: "Call Tree" chart with the columns Function, Max Inclusive Time, and Max Exclusive Time]
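
To illustrate what the relation view of the "Call Tree" conveys, the following minimal Python sketch derives caller and callee lists from a sequence of function enter/leave events. It is only an illustration under stated assumptions: the event tuple layout, the helper name call_relations, and the function names are hypothetical and are not taken from Vampir or OTF.

```python
from collections import defaultdict


def call_relations(events):
    """Derive caller and callee sets from enter/leave events of one process.

    `events` is a hypothetical, time-ordered list of (kind, function)
    tuples, where kind is either "enter" or "leave".
    """
    callers = defaultdict(set)  # function -> functions that called it
    callees = defaultdict(set)  # function -> functions it called
    stack = []                  # current call stack, innermost call last

    for kind, function in events:
        if kind == "enter":
            if stack:
                parent = stack[-1]
                callers[function].add(parent)
                callees[parent].add(function)
            stack.append(function)
        elif kind == "leave":
            if not stack or stack[-1] != function:
                raise ValueError(f"unbalanced leave event for {function!r}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")

    return dict(callers), dict(callees)


# Hypothetical trace: "exchange" is called from both "solve" and "main".
events = [
    ("enter", "main"),
    ("enter", "solve"),
    ("enter", "exchange"),
    ("leave", "exchange"),
    ("leave", "solve"),
    ("enter", "exchange"),
    ("leave", "exchange"),
    ("leave", "main"),
]
callers, callees = call_relations(events)
print(sorted(callers["exchange"]))  # ['main', 'solve']
print(sorted(callees["main"]))      # ['exchange', 'solve']
```

In the tree representation, "exchange" would appear in two different branches, while the two lists collect all of its callers in one place.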
62. s, Message Tags
I/O Events               I/O Groups, File Names, Operation Types

Table 4.2: Options of Filtering

5 Customization

The appearance of the trace file and various other application settings can be altered in the preferences, accessible via the main menu entry "File" → "Preferences". Settings concerning the trace file itself, e.g. layout or function group colors, are saved individually next to the trace file in a file with the ending ".vsettings". This way it is possible to adjust the colors for individual trace files without interfering with others.

The options "Import Preferences" and "Export Preferences" provide the loading and saving of preferences of arbitrary trace files.

5.1 General Preferences

The "General" settings allow changing application- and trace-specific values.

[Screenshot: "Preferences" dialog with the "General" page selected, showing the sections Charts, Source code, Analysis, and Updates]
63. tents

1 Introduction
    1.1 Event-based Performance Tracing and Profiling
    1.2 The Open Trace Format (OTF)
    1.3 Vampir and Windows HPC Server 2008
2 Getting Started
    2.1 Installation of Vampir
    2.2 Generation of Trace Data on Windows Systems
        2.2.1 Enabling Performance Tracing
        2.2.2 Tracing an MPI Application
    2.3 Generation of Trace Data on Linux Systems
        2.3.1 Enabling Performance Tracing
        2.3.2 Tracing an Application
    2.4 Starting Vampir and Loading a Trace File
3 Basics
    3.1 Chart Arrangement
    3.2 [title illegible]
    3.3 [title illegible]
    3.4 The Zoom Toolbar
    3.5 [title illegible]
    3.6 Properties of the Trace File
4 Performance Data Visualization
    4.1 Timeline Charts
        4.1.1 Master Timeline and Process Timeline
        4.1.2 Counter Data Timeline
        4.1.3 Performance Radar
    4.2 Statistical Charts
        4.2.1 Call Tree
        4.2.2 Function Summary
        4.2.3 Process Summary
        4.2.4 Message Summary
        4.2.5 Communication Matrix View
        4.2.6 I/O Summary
    4.3 Informational Charts
        4.3.1 Functio
64. tion is possible with the scroll bar at the bottom.

The "Process Timeline" resembles the "Master Timeline" with slight differences. The chart's timeline is divided into levels, which represent the different call stack levels of function calls. The initial function begins at the first level, a sub-function called by that function is located a level beneath, and so forth. If a sub-function returns to its caller, the graphical representation also returns to the level above.

[Screenshot: Process Timeline of Process 0]

Figure 4.2: Process Timeline

In addition to the display of categorized function invocations, Vampir's "Master Timeline" and "Process Timeline" also provide information about communication events. Messages exchanged between two different processes are depicted as black lines. In timeline charts, the progress in time is reproduced from left to right. The leftmost (starting) point of a message line and its underlying process bar therefore identify the sender of the message, whereas the rightmost position of the same line represents t
65. tively on Windows, a command line invocation is possible:

C:\Program Files\Vampir\Vampir.exe [trace file]

To open multiple trace files at once, you can pass them one after another as command line arguments:

C:\Program Files\Vampir\Vampir.exe [file 1] ... [file n]

It is also possible to start the application by double-clicking on a *.otf file, provided that Vampir was associated with *.otf files during the installation process.

The trace files to be loaded have to be compliant with the Open Trace Format (OTF) standard, described in Chapter 1.2. Microsoft HPC Server 2008 is shipped with the translator program etl2otf.exe, which produces appropriate input files.

While Vampir is loading the trace file, an empty "Trace View" window with a progress bar at the bottom opens. After Vampir has loaded the trace data completely, a default set of charts will appear. The loading process can be interrupted at any point in time by clicking on the cancel button in the lower right corner, as shown in Figure 2.3. Because events in the trace file are traversed one after another, the GUI will still open but shows only the earliest information from the trace file. For huge trace files with performance problems assumed to be at the beginning, this procedure is a suitable strategy to save time.

Basic functionality and navigation elements are described in Chapter 3. The available charts and the information provided by them are explained in Chapter 4.
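
For scripted use, the command line invocation on Windows described earlier in this section can be assembled programmatically. The following Python sketch is a minimal illustration, not part of Vampir: the helper names vampir_command and launch_vampir are hypothetical, and the installation path is simply the one used in the example above, which may differ on a given system.

```python
import subprocess

# Assumption: installation path as used in the manual's example.
VAMPIR_EXE = r"C:\Program Files\Vampir\Vampir.exe"


def vampir_command(trace_files, executable=VAMPIR_EXE):
    """Build the argument list: the executable followed by the trace files."""
    traces = [str(trace) for trace in trace_files]
    if not traces:
        raise ValueError("at least one trace file is required")
    return [executable, *traces]


def launch_vampir(trace_files):
    """Start Vampir with the given trace files (hypothetical helper)."""
    # Passing a list avoids quoting problems with the space in the path.
    return subprocess.Popen(vampir_command(trace_files))


# Only print the command here; launch_vampir() would actually start the GUI.
print(vampir_command(["run_a.otf", "run_b.otf"]))
# ['C:\\Program Files\\Vampir\\Vampir.exe', 'run_a.otf', 'run_b.otf']
```

The trace file names run_a.otf and run_b.otf are placeholders for OTF-compliant trace files.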
66. tricolored and occupy a line to the end of the interval. To see the whole interval of a single I/O event, the triangle has to be selected. In that case a second triangle at the end of the interval appears.

Table 4.1: Additional Information in Master and Process Timeline

Since the "Process Timeline" reveals information of one process only, short black arrows are used to indicate outgoing communication. Clicking on message lines or arrows shows message details like sender process, receiver process, message length, message duration, and message tag in the "Context View" display.

4.1.2 Counter Data Timeline

Counters are values collected over time to count certain events like floating point operations or cache misses. Counter values can be used to store not just hardware performance counters but arbitrary sample values. There can be counters for different statistical information as well, for instance counting the number of function calls or a value in an iterative approximation of the final result. Counters are defined during the instrumentation of the application and can be individually assigned to processes.

[Screenshot: Counter Data Timeline in the Vampir "Trace View" window]
67. ... work and therefore can slow down the whole application.

[Figure 4.9: Process Summary. Chart title in the screenshot: "Similar Processes, Accumulated Inclusive Time per Function".]

The context menu entry "Set Event Category" specifies whether function groups or functions are displayed in the chart. Functions take the color of their function group.

The chart can base its analysis on "Number of Invocations", "Accumulated Inclusive Time", or "Accumulated Exclusive Time". To switch between these three modes, use the context menu entry "Set Metric".

By default, the number of clustered profile bars depends on the window height. You can also disable the clustering or set a fixed number of clusters via the context menu entry "Clustering" by selecting the corresponding value in the spin box. To the left of the clustered profile bars there is an overview of the processes associated with each cluster. Moving the cursor over the blue areas of the rectangle shows the process name as a tooltip.
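The three metrics offered by "Set Metric" can be illustrated with a small Python sketch. It uses the conventional definitions (inclusive time covers a call including its callees, exclusive time subtracts the callees) and is not taken from Vampir's implementation; the event tuple layout and the function name `profile` are assumptions made for illustration.

```python
from collections import defaultdict


def profile(events):
    """Compute per-function metrics for a single process.

    events: iterable of (timestamp, kind, function) tuples, ordered by time
    and properly nested, where kind is "enter" or "leave".
    """
    stats = defaultdict(
        lambda: {"invocations": 0, "inclusive": 0.0, "exclusive": 0.0}
    )
    stack = []  # each entry: [function, enter_time, time_spent_in_callees]
    for time, kind, func in events:
        if kind == "enter":
            stats[func]["invocations"] += 1
            stack.append([func, time, 0.0])
        elif kind == "leave":
            if not stack or stack[-1][0] != func:
                raise ValueError(f"unmatched leave for {func!r} at {time}")
            name, start, child_time = stack.pop()
            inclusive = time - start
            stats[name]["inclusive"] += inclusive
            stats[name]["exclusive"] += inclusive - child_time
            if stack:
                # The caller spent this interval inside a callee.
                stack[-1][2] += inclusive
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return dict(stats)


if __name__ == "__main__":
    # Made-up events for illustration only.
    trace = [
        (0.0, "enter", "main"),
        (1.0, "enter", "solve"),
        (4.0, "leave", "solve"),
        (10.0, "leave", "main"),
    ]
    for name, metrics in profile(trace).items():
        print(name, metrics)
```

Note that with recursive functions this simple accumulation counts nested invocations of the same function more than once in the inclusive total.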
68. ...y. It is possible to select markers and types. All events belonging to that marker or type are then selected in the "Master Timeline" and the "Process Timeline". If "Ctrl" or "Shift" is pressed, the user can highlight several events. In this case the user can fit the borders of the zooming area in the timeline charts to the timestamps of the two most recently chosen marker events.

4.3.3 Context View

As implied by its name, the "Context View" provides more detailed information about a selected object than its graphical representation does.

An object, e.g. a function, function group, message, or message burst, can be selected directly in a chart by clicking its graphical representation. The "Context View" provides different context information for different types of objects. For example, the object-specific information for functions holds properties like "Interval Begin", "Interval End", and "Duration", as shown in Figure 4.15. The ...

[Screenshot: Vampir Trace View, properties of a selected marker event]

Property     Value
Display      Master Timeline
Type         Marker Event
Description  ERROR: MPI_Type_contiguous: oldtype is Fortran Type
Time         1.191657 s
Process      Process 0
Marker       MARMOT Error
Group        Error
    