Scalasca User Guide

Contents

1. CUBE 3.4 User Guide, March 2013. The Scalasca Development Team, scalasca@fz-juelich.de.
   Copyright © 1998-2013 Forschungszentrum Jülich GmbH, Germany.
   Copyright © 2009-2013 German Research School for Simulation Sciences GmbH, Germany.
   Copyright © 2003-2008 University of Tennessee, Knoxville, USA.
   All rights reserved.
   Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
   • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
   • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
   • Neither the names of Forschungszentrum Jülich GmbH, the German Research School for Simulation Sciences GmbH, or the University of Tennessee, Knoxville, nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission.
   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRI
2. [Figure 1.7: CUBE flat profile]
   Each tree has its own context menu, which can be activated by a right mouse click within the tree's window. If you right-click on one of the tree's nodes, this node gets framed and serves as a reference node for some of the menu items. If you click outside of the tree items, there is no reference node and some menu items are disabled. Depending on the type of the tree, the context menu consists of some of the following items. If you move the mouse over a context menu item, the status bar displays an explanation of the functionality of that item.
3. (b) Online description: Shows some (usually more extensive) online description for the reference node. Disabled if no online description is available.
   (c) Location: Displays information about the module and position within the module (line numbers) where the callee method of the reference node is defined.
   (d) Source code: Opens an editor for displaying, editing, and saving the source code of the callee of the reference node. The beginning and end of the relevant region are highlighted. If the specified source file does not exist, you are asked to choose a file to open.
   Min/max values: Not for metric trees. Here you can activate and deactivate the application of user-defined minimal and maximal values for the color extremes, i.e., the values corresponding to the left and right end of the color legend. If you activate user-defined values for the color extremes, you are asked to define two values that should correspond to the minimal and to the maximal colors. All values outside of this interval will get the color gray. Note that canceling any of the input windows causes no changes in the coloring method. If user-defined min/max values are activated, the selected value information widget (see Section) displays a "u" (for "user-defined") behind the minimal and maximal color values.
   Statistics: Only available if a statistics file for the current CUBE file is provided. Displays statistical informa
4. [Truncated example output of cube3_mean: progress messages such as "Merging metric dimension... done", ending successfully]
   Usage: cube3_mean [-o output] [-c] [-C] [-h] <cube> ...
     -o  Name of the output file (default: mean.cube)
     -c  Do not collapse system dimension if experiments are incompatible
     -C  Collapse system dimension
     -h  Help; output a brief help message
   1.4.4 Compare
   Compares two experiments and prints out whether they are equal or not. Two experiments are equal if they have the same dimension hierarchies and equal severity values. An example of the output is below.
     user@host: cube3_cmp remapped.cube scout1.cube
     Reading remapped.cube done
     Reading scout1.cube done
     Compare operation begins
     Experiments are not equal
     Compare operation ends successfully
   Usage: cube3_cmp [-h] <cube1> <cube2>
     -h  Help; output a brief help message
   1.4.5 Clean
   CUBE files may contain more data in the definition part than absolutely necessary. The cube3_clean utility creates a new CUBE file with the same structure as the input experiment, but with the definition part cleaned up. An example of the output is presented below.
     user@host: cube3_clean remapped.cube -o cleaned.cube
     Clean operation begins
     Reading remapped.cube
5.   -r  Re-root call tree at named node
     -p  Prune call tree from named node
     -f  Filter call tree nodes matching patterns specified in filterfile
     -h  Help; output a brief help message
   1.4.7 Part (Partition)
   CUBE files may contain data for processes that execute different executables or perform distinct roles within an application execution, such that it can be desirable to partition some of the system tree processes (and their associated threads) and void the remainder. The cube3_part utility creates a new CUBE file with the same structure as the input report, but with the specified processes marked <VOID> and their non-Visit metric values set to zero. Computational imbalance heuristic metrics are also voided, as these would otherwise be inconsistent. An example of the output is presented below.
     user@host: cube3_part -R 2,3,5,7,11,13 -o primes.cube input.cube
     Part operation begins
     Reading input.cube done
     Part operation ends successfully
     Writing primes.cube done
   Usage: cube3_part [-h] [-I] -R ranks [-o output] input.cube
     -I  Invert sense of partition
     -R  List of process ranks for partition (e.g., "0-3,7,13")
     -o  Name of the output file (default: part.cube.gz)
     -h  Help; show this brief help message and exit
   1.4.8 Remap
   The Scalasca toolset initially creates CUBE files containing data for only a
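
The voiding step described for cube3_part can be illustrated with a small, self-contained Python sketch. It is not the tool's implementation: the dictionary layout, the function name `partition`, and the "Process N" labels are hypothetical, and the sketch assumes that the ranks given with -R are the ones retained while all others are voided (with -I inverting that choice), which is one reading of the description above.

```python
def partition(severities, keep_ranks, invert=False):
    """Sketch of the partition idea (hypothetical data layout).

    severities: {rank: {metric_name: value}}
    keep_ranks: ranks assumed to be retained; all others are voided.
    invert:     swaps the retained and voided sets (assumed meaning of -I).
    Returns (new_severities, process_names).
    """
    result, names = {}, {}
    for rank, metrics in severities.items():
        kept = (rank in keep_ranks) != invert
        if kept:
            result[rank] = dict(metrics)
            names[rank] = f"Process {rank}"  # hypothetical label
        else:
            # Voided process: only the Visits metric keeps its value.
            result[rank] = {m: (v if m == "Visits" else 0)
                            for m, v in metrics.items()}
            names[rank] = "<VOID>"
    return result, names


data = {
    0: {"Time": 1.5, "Visits": 10},
    1: {"Time": 2.5, "Visits": 20},
    2: {"Time": 3.5, "Visits": 30},
}
new_data, names = partition(data, keep_ranks={0, 2})
print(names[1], new_data[1])
```

The structure of the data is unchanged by the operation; only names and values of the voided processes differ, which mirrors the statement that the output file keeps the structure of the input report.
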
6. const
      Returns all attributes associated with the CUBE object as a map.
   const std::vector<std::string>& get_mirrors() const
      Returns all mirrors defined in the CUBE object.
   int get_num_thrd() const
      Returns the maximal number of threads per process in the CUBE object.
   2.1.1.7 Writer Library in C
   In order to create data files, another possibility is to use the C version of the CUBE writer API. The interface defines a struct cube_t and provides the following functions:
   cube_t* cube_create()
      Returns a new CUBE structure.
   void cube_free(cube_t* c)
      Destroys the given CUBE structure.
   cube_metric* cube_def_met(cube_t* c, const char* disp_name, const char* uniq_name, [...] const char* uom, [...] const char* url, [...] cube_metric* parent)
      Returns a new metric structure.
   cube_region* cube_def_region(cube_t* c, const char* name, [...] long endln, const
7. fers a set of operators that can be used to compare, integrate, and summarize multiple CUBE data sets. The algebra allows the combination of multiple CUBE data sets into a single one that can be displayed and examined like the original ones.
   In addition to the information provided by plain CUBE files, a statistics file can be provided, enabling the display of additional statistical information about severity values. Furthermore, a statistics file can also contain information about the most severe instances of certain performance patterns, globally as well as with respect to specific call paths. If a trace file of the program being analyzed is available, the user can connect to a trace browser (i.e., Vampir or Paraver) and then use CUBE to zoom their timelines to the most severe instances of the performance patterns for a more detailed examination of their cause.
   The following sections explain how to use the CUBE display, how to create CUBE files, and how to use the algebra and other tools.
   1.3 Using the GUI
   This section explains how to use the CUBE-QT display component. After installation, the executable cube3-qt can be found in the specified directory of executables (specifiable by the prefix argument of configure, see the CUBE Installation Manual) and can also be used via the alias cube3. The program supports as an optional command-line argument the name of a cube file that w
8. inclusive absolute values relative to the largest inclusive absolute peer value, i.e., to the largest inclusive value among all entities on the current hierarchy depth. For example, if there are 3 threads with inclusive absolute values 100, 120, and 200, then they have the peer percent values 50, 60, and 100.
   Peer distribution: For the system tree only. The peer distribution mode shows the percentage of the system nodes' inclusive absolute values on the scale between the minimum and the maximum of peer inclusive absolute values. For example, if there are 3 threads with absolute values 100, 120, and 200, then they have the peer distribution values 0, 20, and 100.
   External percent: Available for all trees, if the metric tree is the left-most widget. To facilitate the comparison of different experiments, users can choose the external percentage mode to display percentages relative to another data set. The external percentage mode is basically like the metric root percentage mode, except that the value equal to 100% is determined by another data set.
   Note that in all modes, only the leaf nodes in the system hierarchy, i.e., processes or threads, have associated severity values. All other hierarchy levels, i.e., machines, nodes, and eventually processes, are only used to structure the hierarchy. This means that their severity is undefined (denoted by a "-" sign) when they are expanded.
   1.3.2.4 S
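
The peer percent and peer distribution modes can be checked against the guide's own numbers with a short Python sketch. The function names are hypothetical, and the handling of the degenerate cases (all peers zero, or all peers equal) is an assumption, since the excerpt does not say what CUBE displays then.

```python
def peer_percent(values):
    """Each value as a percentage of the largest peer value."""
    largest = max(values)
    if largest == 0:
        # Assumption: report 0% everywhere when every peer is zero.
        return [0.0 for _ in values]
    return [100.0 * v / largest for v in values]


def peer_distribution(values):
    """Position of each value between the peer minimum (0%) and maximum (100%)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: report 0% everywhere when all peers are equal.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


threads = [100, 120, 200]           # the three threads from the example
print(peer_percent(threads))        # [50.0, 60.0, 100.0]
print(peer_distribution(threads))   # [0.0, 20.0, 100.0]
```

Peer percent scales against the maximum only, while peer distribution stretches the interval between minimum and maximum to the full 0-100 range, which is why the smallest thread shows 50 in one mode and 0 in the other.
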
9. Displays information about the module and position within the module (line numbers) where the method is defined.
   Source code: For flat call profiles only; for call trees see "Call site" and "Called region" below. Disabled if there is no reference node. Opens an editor for displaying, editing, and saving the source code of the method/region to which the reference node refers. The beginning and the end of the method/region are highlighted. If the specified source file is not found, you are asked to choose a file to open. The file is in read-only mode by default. If you wish to edit the text, please uncheck the "Read only" box in the bottom left corner. For keyboard and mouse control see Section 1.3.4.
   Call site: For call trees only. Enabled only if there is a reference node. Offers information about the caller of the reference node.
   (a) Location: Displays information about the module and position within the module (line numbers) of the caller of the reference node.
   (b) Source code: Opens an editor for displaying, editing, and saving the source code where the call represented by the reference node happens. The beginning and the end of the relevant source code region are highlighted. If the specified source file is not found, you are asked to choose a file to open.
   Called region: For call trees only. Enabled only if there is a reference node. Offers information about the reference node.
   (a) Info: Gives some short information about the reference node.
10. Under this menu item you can define a file size threshold (in bytes) above which CUBE offers you dynamic data loading. If a file being opened is larger than this threshold, CUBE will ask you whether you wish to use dynamic loading.
    Screenshot: Saves a screen snapshot in a PNG file. Unfortunately, the outer frame of the main window is not saved, only the application itself.
    Quit (Ctrl+Q): Closes the application.
    Recent files: The last 5 opened files are offered for re-opening, the top-most being the most recently opened one. The full path to a file is visible in the status bar if you move the mouse over one of the recent file items in the menu.
    2. Display: The display menu offers the following functions:
    Dimension order: As explained above, CUBE has three resizable panes. Initially the metric pane is on the left, the call pane is in the middle, and the system pane is on the right-hand side. However, sometimes you may be interested in other orders, and that is what this menu item is about. It offers all possible pane orderings. For example, assume you would like to see the metric and call values for a certain thread. In this case, you could place the system pane on the left, the metric pane in the middle, and the call pane on the right, as shown in Figure 1.3. Note that in panes to the left of the metric pane no meaningful values can be presented, since they miss a reference metric; in this case values are specified t
11. context information
    Ctrl+<left-mouse click>     in tree: multiple selection/deselection
    <left-mouse drag>           over scroll bar: scroll; in topology: rotate topology
    Ctrl+<left-mouse drag>      in topology: increase plane distance
    Shift+<left-mouse drag>     in topology: move topology
    <scroll mouse wheel>        in topology: zoom in/out
    Up arrow                    in tree: move selection one item up (single selection only); in topology scroll area: scroll one unit up
    Down arrow                  in tree: move selection one item down (single selection only); in topology scroll area: scroll one unit down
    Left arrow/Backspace/Minus  in tree: collapse subtree
    Right arrow/Return/Plus     in tree: expand subtree
    Page up                     in tree/topology scroll area: scroll one page up
    Page down                   in tree/topology scroll area: scroll one page down
    1.3.4.2 Source code editor
    Control in read-only mode:
    Up Arrow                    Move one line up
    Down Arrow                  Move one line down
    Left Arrow                  Scroll one character to the left (if horizontally scrollable)
    Right Arrow                 Scroll one character to the right (if horizontally scrollable)
    Page Up                     Move one (viewport) page up
    Page Down                   Move one (viewport) page down
    Home                        Move to the beginning of the text
    End                         Move to the end of the text
    <scroll mouse wheel>        Scroll the page vertically
    Alt
12. done
      Topology retained in experiment.
      Clean operation ends successfully
      Writing cleaned.cube done
    Usage: cube3_clean [-o output] [-h] <cube>
      -o  Name of the output file (default: clean.cube.gz)
      -h  Help; output a brief help message
    1.4.6 Cut (Reroot, Prune, Filter)
    For the detailed study of some part of the report, the CUBE file can be modified by applying cut operations to call-tree nodes. Different operations are possible:
    • Sub-trees may be re-rooted, i.e., only sub-trees with the given call-tree node as root are retained in the report.
    • Entire sub-trees may be pruned, i.e., removed from the report. In this case, all metric values for those sub-trees will be attributed to their parent call-tree node.
    • A filter can be applied to eliminate individual call-tree nodes, as if they were inlined or filtered during measurement. A filter file lists shell wildcard patterns, one per line.
    An example of the output is presented below.
      user@host: cube3_cut -r inner_auto_ -p flux_err_ -o cut.cube remapped.cube
      Reading remapped.cube done
      Cut operation begins
      Topology retained in experiment.
      Cut operation ends successfully
      Writing cut.cube done
    Usage: cube3_cut [-h] [-r nodename] [-p nodename] [-f filterfile] [-o output] <cube>
      -o  Name of the output file (default: cut.cube.gz)
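
The re-root and prune operations can be pictured on a toy call tree. The Python sketch below is an illustration only, not the cube3_cut implementation: the `CallNode` class, the helper names, the node names `main`, `helper`, and `other`, and all metric values are hypothetical, and applying re-rooting before pruning is an assumption. Only `inner_auto_` and `flux_err_` are taken from the example command above.

```python
class CallNode:
    def __init__(self, name, value, children=()):
        self.name = name
        self.value = value              # exclusive metric value of this node
        self.children = list(children)


def inclusive(node):
    """Exclusive value of the node plus everything below it."""
    return node.value + sum(inclusive(c) for c in node.children)


def prune(node, name):
    """Remove every subtree rooted at `name`; its values go to the parent."""
    kept = []
    for child in node.children:
        if child.name == name:
            node.value += inclusive(child)
        else:
            prune(child, name)
            kept.append(child)
    node.children = kept


def reroot(node, name):
    """Return the first subtree whose root is called `name`, or None."""
    if node.name == name:
        return node
    for child in node.children:
        found = reroot(child, name)
        if found is not None:
            return found
    return None


tree = CallNode("main", 1, [
    CallNode("inner_auto_", 2, [
        CallNode("flux_err_", 3, [CallNode("helper", 4)]),
        CallNode("other", 5),
    ]),
])
root = reroot(tree, "inner_auto_")   # keep only the inner_auto_ subtree
total_before = inclusive(root)
prune(root, "flux_err_")             # fold flux_err_ and helper into the parent
print(root.value, [c.name for c in root.children], inclusive(root) == total_before)
```

Pruning leaves the inclusive value of the parent unchanged, because the removed subtree's values are added to the parent's own value rather than discarded.
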
13. <scroll mouse wheel>        Scroll the page horizontally (if horizontally scrollable)
    Ctrl+<scroll mouse wheel>   Zoom the text
    Ctrl+A                      Select all text
    Additionally, for the read-and-write mode:
    Left Arrow                  Move one character to the left
    Right Arrow                 Move one character to the right
    Backspace                   Delete the character to the left of the cursor
    Delete                      Delete the character to the right of the cursor
    Ctrl+C                      Copy the selected text to the clipboard
    Ctrl+Insert                 Copy the selected text to the clipboard
    Ctrl+K                      Delete to the end of the line
    Ctrl+V                      Paste the clipboard text into the text edit
    Shift+Insert                Paste the clipboard text into the text edit
    Ctrl+X                      Delete the selected text and copy it to the clipboard
    Shift+Delete                Delete the selected text and copy it to the clipboard
    Ctrl+Z                      Undo the last operation
    Ctrl+Y                      Redo the last operation
    Ctrl+Left arrow             Move the cursor one word to the left
    Ctrl+Right arrow            Move the cursor one word to the right
    Ctrl+Home                   Move the cursor to the beginning of the text
    Ctrl+End                    Move the cursor to the end of the text
    Hold Shift + some movement (e.g., Right arrow)   Select region
    1.4 Performance Algebra and Tools
    As performance tuning of parallel applications usually involves multiple experiments to compare the effects of certain optimization strategies, CUBE offers a mechanism called per
14. [Figure 1.6: The font dialog, opened via the menu Display > Trees > Font]
    can choose whether you would like to stick to the current main window size or allow it to be resized.
    (e) Topology: The topology menu offers the following functions related to the topology display described in Section 1.3.2.7:
      i. Item coloring: Offers a choice of how zero-valued system nodes should be colored in the topology display. The two options are either to use white, or to use white only if all system leaf values are zero and use the minimal color otherwise.
      ii. Line coloring: Defines the color of the lines in topology painting. Available colors are black, gray, white, or no lines.
      iii. Toolbar: Specifies whether the topology toolbar buttons should be labeled by icons or by a text description, or whether the toolbar should be hidden. For more information about the toolbar see Section 1.3.2.2.
      iv. Show also unused hardware in topology: If not checked, unused topology planes, i.e., planes whose grid elements don't have any processes/threads assigned to them, are hidden. Unused plane elements, if not hidden, are colored gray.
      v. Topology antialiasing: If checked, anti-aliasing is used when drawing lines in the topologies.
    (f) Help: The help menu provides help on usage and gives some information about CUBE.
      i. Getting started: O
15. [Figure 1.1: CUBE display window]
    severity is the sum of the severities over the whole collapsed subtree. For the example of Figure 1.1, the Execution metric value 1.23e7 is the total time for all executions. On the other hand, the displayed values of expanded nodes are their exclusive values. E.g., the expanded Execution metric node in Figure 1.2 shows that the program needed 3.18e6 seconds for execution other than MPI.
    Note that expanding/collapsing a selected node causes the current values in the trees on its right-hand side to change. As explained above, in our example in Figure 1.1 the call tree displays values for the Execution metric over all system entities. Since the Execution node is collapsed, the call tree severities are computed for the whole Execution metric's subtree. When expanding the selected Execution node, as shown in Figure 1.2, the call tree displays values for the Execution metric without the MPI metric.
    1.3.2 GUI Components
    The GUI consists (from top to bottom) of
    • a menu bar
    • an optional topol
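
The rule that a collapsed node shows the sum over its subtree while an expanded node shows only its own (exclusive) value can be sketched in a few lines of Python. The `Node` class and the values 6.0 and 4.0 are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from the figures.

```python
class Node:
    def __init__(self, name, exclusive, children=()):
        self.name = name
        self.exclusive = exclusive      # severity of this node alone
        self.children = list(children)
        self.expanded = False

    def inclusive(self):
        """Severity summed over the node and its whole subtree."""
        return self.exclusive + sum(c.inclusive() for c in self.children)

    def displayed(self):
        """Collapsed nodes show inclusive values, expanded nodes exclusive ones."""
        return self.exclusive if self.expanded else self.inclusive()


mpi = Node("MPI", 4.0)
execution = Node("Execution", 6.0, [mpi])

print(execution.displayed())    # collapsed: 10.0 (Execution including MPI)
execution.expanded = True
print(execution.displayed())    # expanded: 6.0 (Execution without MPI)
```

Expanding the node does not change any stored data; it only changes which aggregate is shown, and the children then account for the remainder.
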
16. 72, 2004, August, Montreal, Canada.
    [5] F. Wolf, B. Mohr, J. Dongarra, and S. Moore: Efficient Pattern Search in Large Traces through Successive Refinement. Proc. of the European Conference on Parallel Computing (Euro-Par), August-September 2004, Lecture Notes in Computer Science, Springer, Pisa, Italy.
    [6] J. Labarta, S. Girona, V. Pillet, T. Cortes, and L. Gregoris: DiP: A Parallel Program Development Environment. Proc. of the 2nd International Euro-Par Conference, Springer, 665-674, Lyon, France, August 1996.
    [7] Barcelona Supercomputing Center: Paraver: Obtain Detailed Information from Raw Performance Traces. Oct 2008. http://www.bsc.es/plantillaA.php?cat_id=485
    [8] H. Brunst and W. E. Nagel: Scalable Performance Analysis of Parallel Systems: Concepts and Experiences. Proc. of the Parallel Computing Conference (ParCo), 2003, Dresden, Germany.
    [9] Technical University Dresden: Vampir: Performance Optimization. Oct 2008. http://vampir.eu
    [10] World Wide Web Consortium: Extensible Markup Language (XML) 1.0 (Second Edition). October 2000. http://www.w3.org/TR/REC-xml
    [11] Sameer S. Shende and Allen D. Malony: The TAU Parallel Performance System. International Journal of High Performance Computing Applications, 20(2):287-331, SAGE Publications, Summer 2006.
    Chapter 3 Appendix
    3.1 File format of statistics files
    Statistic files for an exampl
17. Can only be used after metric, cnode, and thread definitions are complete. Note that you can only use either the region or the cnode form of these calls, but not both at the same time.
    void cube_add_sev_reg(cube_t* c, cube_metric* met, cube_region* reg, cube_thread* thrd, double value)
       Adds the severity value to the present value at point (met, reg, thrd). Can only be used after metric, region, and thread definitions are complete. Note that you can only use either the region or the cnode form of these calls, but not both at the same time.
    void cube_write_all(cube_t* c, FILE* fp)
       Writes the entire CUBE data to the given file. This basically corresponds to calling cube_write_def() and cube_write_sev_matrix().
    void cube_write_def(cube_t* c, FILE* fp)
       Writes the definitions part of the CUBE data to the given file. Should only be used after definitions are complete.
    void cube_write_sev_matrix(cube_t* c, FILE* fp)
       Writes the severity values part of the CUBE data to the given file. Should only be used after severity values are completely set. Unset values default to zero.
    void cube_write_sev_row(cube_t* c, FILE* fp, cube_metric* met, cube_cnode* cnode, double* sevs)
       Writes the given severi
18. Region* regn1 = cube.def_region("foo", 1, 10, [...]);
    Region* regn2 = cube.def_region("bar", 11, 20, [...]);
    Cnode* cnode0 = cube.def_cnode(regn0, mod, 21, NULL);
    Cnode* cnode1 = cube.def_cnode(regn1, mod, 60, cnode0);
    Cnode* cnode2 = cube.def_cnode(regn2, mod, 80, cnode0);
    // Build system resource tree
    Machine* mach  = cube.def_mach("MSC", [...]);
    Node*    node  = cube.def_node("Athena", mach);
    Process* proc0 = cube.def_proc("Process 0", 0, node);
    Process* proc1 = cube.def_proc("Process 1", 1, node);
    Thread*  thrd0 = cube.def_thrd("Thread 0", 0, proc0);
    Thread*  thrd1 = cube.def_thrd("Thread 1", 1, proc1);
    // Build 2D Cartesian topology (a 5x5 grid)
    std::vector<std::string> namedims;
    namedims.push_back("Dimension X");
    namedims.push_back("Dimension Y");
    // [...] have the exact number of dimensions present in the [...]
    // namedims.push_back("third"); // uncomment this and no names at all will b
    int nd
19. Rotation about the x and y axes can be done with left-mouse drag (click and hold the left mouse button while moving the mouse).
    4. Increasing/decreasing the distance between the planes with Ctrl+<left-mouse drag>.
    5. Moving the whole topology up/down/left/right with Shift+<left-mouse drag>.
    1.3.2.8 Topology mapping panel
    If the number of topology dimensions is larger than three, the first three dimensions are shown, and an additional control panel appears below the displayed topology. This panel allows rearranging topology dimensions on the x, y, and z axes, as well as slicing or folding of higher-dimensionality topologies for presentation in three or fewer dimensions.
    Rearranging topology dimensions is achieved simply by dragging the topology dimension labels to the desired axis. When a label is dragged on top of an existing topology dimension label, the two are exchanged.
    When slicing, select up to three of the dimensions to display completely, and choose one element of each of the remaining dimensions. The example in Figure 1.9 shows a topology with 4 dimensions (32x16x32x4) labelled X, Y, Z, and T. The first element of the 4th dimension (T) is automatically selected. By clicking on the button above the T, an index in this dimension from 0 to 3 can be chosen. If the index is set to "all", the selection becomes invalid until an index of another dimension is selected.
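
Slicing can be thought of as fixing one index in every dimension that is not displayed and projecting the remaining coordinates onto the x, y, and z axes. The Python sketch below illustrates this for the 32x16x32x4 example; the function name `slice_topology`, the dictionary layout, and the use of a running rank number as the stored value are hypothetical, and the sketch does not model folding or the "all" setting.

```python
from itertools import product


def slice_topology(values, shown, fixed):
    """Sketch of slicing a higher-dimensional topology (hypothetical layout).

    values: {coordinate tuple: value}
    shown:  indices of the (at most three) dimensions displayed completely
    fixed:  {dimension index: chosen element} for the remaining dimensions
    """
    if len(shown) > 3:
        raise ValueError("at most three dimensions can be displayed")
    result = {}
    for coord, value in values.items():
        if all(coord[dim] == index for dim, index in fixed.items()):
            result[tuple(coord[dim] for dim in shown)] = value
    return result


dims = (32, 16, 32, 4)                       # X, Y, Z, T as in the example
values = {coord: rank
          for rank, coord in enumerate(product(*(range(n) for n in dims)))}

# Display X, Y, Z completely and pick the first element of T (index 0).
view = slice_topology(values, shown=(0, 1, 2), fixed={3: 0})
print(len(view))                             # 32 * 16 * 32 = 16384 elements
```

Choosing a different index for T selects a different 32x16x32 block of the same size, which corresponds to changing the index with the button above the T label.
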
20. Section
    Cnode* def_cnode(Region* callee, const std::string& mod, int line, Cnode* parent)
       Returns a new call-tree node representing a call from the call site located at line line of the module mod. The call-tree node calls the callee callee, i.e., a previously defined region. parent is a previously created call-tree node which will be the new one's parent. To define a root node, use NULL instead. This method is used to create a call tree with line numbers.
    Cnode* def_cnode(Region* region, Cnode* parent)
       Defines a new call-tree node representing a call to the region region. parent is a previously created call-tree node which will be the new one's parent. To define a root node, use NULL instead. Note that, different from the previous def_cnode(), this method is used to create a call tree without line numbers, where each call-tree node points to a region.
    To define a call tree with line numbers, use def_cnode(Region*, string, int). To define a call tree without line numbers, use def_cnode(Region*, Cnode*) instead. To create a flat profile, use neither one; just defining a set of regions will be sufficient.
    const std::vector<Region*>& get_regv() const
       Returns a vector with all regions in the CUBE object.
    const std::vector<Cnode*>& get_cnode
21. corresponding to the left and right end of the color legend. If you activate user-defined values for the color extremes, you are asked to define two values that should correspond to the minimal and to the maximal colors. All values outside of this interval will get the color gray. Note that canceling any of the input windows causes no changes in the coloring method. If user-defined min/max values are activated, the selected value information widget displays a "u" (for "user-defined") behind the minimal and maximal color values.
    x-rotation: Rotate the topology cube about the x-axis by the defined angle.
    y-rotation: Rotate the topology cube about the y-axis by the defined angle.
    Dimension order for the topology displays: This button no longer exists, but formerly allowed the order of topology dimensions to be adjusted; this is now done with the control panel at the bottom of the topology pane.
    Using the grip at the left of the toolbar, it can be dragged to another position or detached entirely from the main window. The toolbar can also be closed after a right-click on the grip.
    1.3.2.3 Value modes
    Each tree view has its own value mode combobox, a drop-down menu above the tree, where it is possible to change the way the severity values are displayed. The default value mode is the Absolute value mode. In this mode, as explained below, the severity values from the CUBE file are displayed. However, sometimes t
22. data of the selected patterns. The slender black lines on the top and the bottom designate the maximum and the minimum measured severity of the pattern, respectively. The lower and the upper borders of the white box indicate the values of the 25% and 75% quantiles. The thick line inside the box represents the median of the values, while the dashed line indicates the mean.
    There are two ways of interacting with the box plot. You can zoom to a certain interval on the y-axis by clicking on a position with the height of the desired maximal or minimal value and then dragging the mouse to a position with the height of the corresponding other extreme value. You can reset the view (i.e., undo all zooming) by clicking the middle mouse button somewhere on the box plot.
    [Figure 1.10: Screenshot of a box plot as shown by CUBE, displaying statistical information about the selected patterns. The additional window on the top right displays the exact values of the statistics; for the pattern WaitAtBarrier: Sum 0.369845, Count 20, Mean 0.018492, Variance 0.000698, Maximum 0.065293, Quartile 75 0.047409, Median 0.006477, Quartile 25 0.000040, Minimum 0.000002.]
    If you are interested in more precise values for the severity statistics of a certain metric
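
The quantities drawn in the box plot (minimum, 25% quantile, median, 75% quantile, maximum, and mean) can be reproduced for a list of severity values with Python's standard `statistics` module. This is a sketch only: the sample values are hypothetical, and the "inclusive" quantile method is an assumption, since the excerpt does not state which quantile definition CUBE uses.

```python
import statistics


def box_plot_summary(values):
    """Summary statistics corresponding to the elements of the box plot."""
    # Assumption: quartiles by linear interpolation over the sample itself.
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),            # lower slender line
        "q25": q25,                        # lower border of the box
        "median": median,                  # thick line inside the box
        "q75": q75,                        # upper border of the box
        "maximum": max(values),            # upper slender line
        "mean": statistics.fmean(values),  # dashed line
    }


severities = [2, 4, 4, 6, 14]              # hypothetical severity values
summary = box_plot_summary(severities)
print(summary["median"], summary["mean"])  # 4.0 6.0
```

In this sample a single large value pulls the mean (6.0) above the median (4.0), which is the kind of skew the dashed and thick lines make visible in the plot.
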
23. void cube_def_coords(cube_t* c, cube_cartesian* cart, cube_thread* thrd, long int* cc)
       Maps a thread onto a Cartesian coordinate.
    void cube_set_sev(cube_t* c, cube_metric* met, cube_cnode* cnode, cube_thread* thrd, double value)
       Assigns the severity value to the point (met, cnode, thrd). Can only be used after metric, cnode, and thread definitions are complete. Note that you can only use either the region or the cnode form of these calls, but not both at the same time.
    double cube_get_sev(cube_t* c, cube_metric* met, cube_cnode* cnode, cube_thread* thrd)
       Returns the severity of the point (met, cnode, thrd).
    void cube_set_sev_reg(cube_t* c, cube_metric* met, cube_region* reg, cube_thread* thrd, double value)
       Assigns the severity value to the point (met, reg, thrd). Can only be used after metric, region, and thread definitions are complete. Note that you can only use either the region or the cnode form of these calls, but not both at the same time.
    void cube_add_sev(cube_t* c, cube_metric* met, cube_cnode* cnode, cube_thread* thrd, double value)
       Adds the severity value to the present value at point (met, cnode, thrd).
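
The difference between assigning a severity (cube_set_sev) and adding to the present value (cube_add_sev) can be modelled with a small Python class. This is a conceptual sketch, not a binding to the C library: the class `SeverityStore`, its method names, and the string keys are hypothetical, and reading an unset point as zero is an assumption borrowed from the guide's remark that unset values default to zero when written.

```python
class SeverityStore:
    """Toy model of severity values indexed by (metric, cnode, thread)."""

    def __init__(self):
        self._values = {}

    def set_sev(self, met, cnode, thrd, value):
        # Assignment replaces whatever was stored before.
        self._values[(met, cnode, thrd)] = value

    def add_sev(self, met, cnode, thrd, value):
        # Addition accumulates on top of the present value (zero if unset).
        key = (met, cnode, thrd)
        self._values[key] = self._values.get(key, 0.0) + value

    def get_sev(self, met, cnode, thrd):
        # Assumption: points that were never set read as zero.
        return self._values.get((met, cnode, thrd), 0.0)


store = SeverityStore()
store.set_sev("time", "main", 0, 1.5)
store.add_sev("time", "main", 0, 2.0)
print(store.get_sev("time", "main", 0))   # 3.5
print(store.get_sev("time", "main", 1))   # 0.0
```

Accumulating with the add form suits measurements collected incrementally, while the set form suits values that are already final when they are written.
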
24. limited number of performance metrics. The full hierarchy of performance metrics is then created during post-processing using the cube3_remap tool. Typically, it is automatically called by the scalasca -examine command, but it can also be executed manually.
    Usage: cube3_remap [-o output] [-h] <cube>
      -o  Name of the output file (default: remap.cube.gz)
      -h  Help; output a brief help message
    1.4.9 Score
    Classifies program regions by type and generates aggregated data for them. In addition, the cube3_score tool can be used to estimate trace buffer requirements based on a given CUBE file (typically from a previous summary experiment). Regions are classified into the categories ANY (aggregate of all regions), MPI (pure MPI functions), OMP (pure OpenMP functions/regions), USR (pure user regions, not containing MPI or OpenMP), and COM ("combined" user regions calling MPI/OpenMP, directly or indirectly).
    The metric(s) to be displayed can be specified via a command-line option. The default is to calculate the absolute value as well as the percentage of the total time and the maximum trace buffer requirements across all processes. Metrics can be any of those defined in the CUBE file, or two special metrics:
    1. The total_tbc metric provides an estimate of the total size of trace data in bytes, aggregated across all processes.
    2. The max_tbc metric provides an estimate for the trace buffer capacity in bytes that is required to store all events that woul
25. machine mach in the CUBE object. Returns NULL if the CUBE object does not contain the given machine.
Node* get_node(Node& node) const
Searches for the node node in the CUBE object. Returns NULL if the CUBE object does not contain the given node.
2.1.1.4 Virtual Topologies
Virtual topologies are used to describe adjacency relationships among machines, SMP nodes, processes, or threads. A topology usually consists of a single class of entities, such as threads or processes. The CUBE API provides a set of functions to create Cartesian topologies and to define the mappings of machines, SMP nodes, processes, or threads onto coordinates. Note that the definition of virtual topologies is optional.
Cartesian* def_cart(long ndims, const std::vector<long>& dimv, const std::vector<bool>& periodv)
Defines a new Cartesian topology. ndims and dimv specify the number of dimensions and the size of each dimension. periodv specifies the periodicity for each dimension. Currently, the maximum value for ndims is three.
void def_coords(Cartesian* cart, Sysres* sys, const std::vector<long>& coordv)
Maps a specific system resource onto a Cartesian coordinate. The system resource sys may be a machine, an SMP node, a process, or a thread. It is not recommended to map a mixe
26. maximal value on the same hierarchy depth. The value modes Peer percent and Peer distribution fall into this category. Finally, the External percent value mode relates the severity values to severities from another, external CUBE file (see below for the explanation). Depending on the type and position of the tree, the following value modes may be available:
1. Absolute (default): Available for all trees. The displayed values are the severity values as read from the cube file, in units of measurement (e.g., seconds). Note that these values can also be negative, i.e., the expression "absolute" is not used in its mathematical sense here.
2. Own root percent: Available for all trees. The displayed node values are the percentage of their absolute values with respect to the absolute value of their root node in collapsed state.
3. Metric root percent: Available for trees on the right-hand side of the metric tree. The displayed node values are the percentage of their absolute values with respect to the absolute value of the collapsed metric root node. If there are several metric roots, the root of the selected metric node is taken. Note that multiple selection in the metric tree is possible within one root's subtree only; thus there is always a unique metric root for this mode.
4. Metric selection percent: Available for trees on the right-hand side of the metric tree. The displayed
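The percentage-based value modes above all reduce to the same arithmetic: a node's absolute value divided by some reference value. The following sketch is illustrative only; the function name `percent_of` is hypothetical, and returning 0 for a zero reference is an assumption, since the guide does not say how CUBE treats that case.

```cpp
// Percentage of a node's absolute value relative to a reference value,
// for example the collapsed root node's value in "Own root percent" mode.
// Assumption: a zero reference yields 0 instead of a division by zero.
double percent_of(double value, double reference) {
    if (reference == 0.0) return 0.0;
    return 100.0 * value / reference;
}
```

Because absolute values may be negative, the result may be negative as well.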
27. maximum, while the bottom and top of the box mark the lower quartile (Q1) and upper quartile (Q3). Within the box, the bold horizontal line represents the median (Q2) and the dashed line the mean value. To see the statistics as numeric values in a separate window, use <left-mouse click> inside the chart. Zooming into the boxplot is done with <left-mouse drag> from top to bottom, and is reset with a <middle-mouse click> inside the chart.
1.3.2.7 Topology Display
In many parallel applications, each process (or thread) communicates only with a limited number of processes. The parallel algorithm divides the application domain into smaller chunks known as sub-domains. A process usually communicates with processes owning sub-domains adjacent to its own. The mapping of data onto processes and the neighborhood relationship resulting from this mapping is called a virtual topology. Many applications use one or more virtual topologies specified as multi-dimensional Cartesian grids. Another kind of topology is the physical topology, which reflects the hardware structure on which the application was run. A typical three-dimensional physical topology is given by the hardware nodes in the first dimension and the arrangement of cores/processors on nodes in the other two dimensions. The CUBE display supports multi-dimensional Cartesian grids, where grids with high dimensionality can be sliced or folded down to two
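The boxplot quantities named at the start of this passage (minimum, Q1, median, Q3, maximum, and mean) can be computed as in the sketch below. It is illustrative only: `BoxStats`, `quantile`, and `box_stats` are hypothetical names, and the quantile convention (linear interpolation between closest ranks) is an assumption, since the guide does not state which convention CUBE uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary values shown by a boxplot.
struct BoxStats {
    double min, q1, median, q3, max, mean;
};

// Quantile by linear interpolation between closest ranks (assumed convention).
// `sorted` must be non-empty and sorted in ascending order; 0 <= p <= 1.
double quantile(const std::vector<double>& sorted, double p) {
    const double pos = p * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(pos);
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Computes the boxplot summary of a non-empty set of values.
BoxStats box_stats(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const double mean =
        std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
    return BoxStats{v.front(), quantile(v, 0.25), quantile(v, 0.5),
                    quantile(v, 0.75), v.back(), mean};
}
```

Other quantile conventions give slightly different Q1 and Q3 for small samples, so values from this sketch need not match the CUBE display exactly.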
28. n is the total number of processes. MPI applications may use the rank in MPI_COMM_WORLD. The process runs on the node node.
Thread* def_thrd(const std::string& name, int rank, Process* proc)
Defines a new thread which has the name name and the rank rank. The rank is a number from 0 to n-1, where n is the total number of threads spawned by a process. OpenMP applications may use the OpenMP thread number. The thread belongs to the process proc.
const std::vector<Sysres*>& get_sysv() const
Returns a vector with all system resources (e.g., node, thread, process) available in the CUBE object.
const std::vector<Machine*>& get_machv() const
Returns a vector with all machines in the CUBE object.
const std::vector<Node*>& get_nodev() const
Returns a vector with all nodes of all machines in the CUBE object.
const std::vector<Process*>& get_procv() const
Returns a vector with all processes in the CUBE object.
const std::vector<Thread*>& get_thrdv() const
Returns a vector with all threads in the CUBE object.
Machine* get_mach(Machine& mach) const
Searches for the
29. other trees on its right to display values for that selection. For the example of Figure 1.1, the metric tree displays the total metric values over all call tree and system nodes, the call tree displays values for the Execution metric over all system entities, and the system tree displays values for the Execution metric and the sweep call tree node. Briefly, a tree is always an aggregation over all selected nodes of its neighboring trees to the left. Collapsed nodes with a subtree that is not shown are marked by a [+] sign, expanded nodes with a visible subtree by a [-] sign. You can expand or collapse a node by left-clicking on the corresponding sign. Collapsed nodes have inclusive values, i.e., their
[Figure: screenshot of the CUBE display window with metric tree, call tree, and system tree; caption not recoverable]
30. point is identified by a tuple (met, cnode, thrd). The value should be inclusive with respect to the metric, but exclusive with respect to the call tree node; that is, it should not cover the node's children. The default severity value for data points left undefined is zero. Thus, users only need to define non-zero data points.
void set_sev(Metric* met, Cnode* cnode, Thread* thrd, double value)
Assigns the value value to the point (met, cnode, thrd).
void add_sev(Metric* met, Cnode* cnode, Thread* thrd, double value)
Adds the value value to the present value at point (met, cnode, thrd).
The previous two methods, set_sev and add_sev, are intended to be used when the program dimension contains a call tree and not a flat profile. As the flat profile does not require the definition of call tree nodes, the following two functions should be used instead:
void set_sev(Metric* met, Region* region, Thread* thrd, double value)
Assigns the value value to the point (met, region, thrd).
void add_sev(Metric* met, Region* region, Thread* thrd, double value)
Adds the value value to the present value at point (met, region, thrd).
double get_sev(Metric* met, Cnode* cnode, Thread* thrd) Re
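The distinction between exclusive and inclusive values with respect to the call tree can be made concrete with a short sketch. The `CallNode` struct and the `inclusive` function are hypothetical and not part of the CUBE library; they only show how an inclusive value follows from the exclusive values that set_sev and add_sev expect.

```cpp
#include <string>
#include <vector>

// Hypothetical call-tree node; not a CUBE class.
struct CallNode {
    std::string name;
    double exclusive = 0.0;          // value for this node only, children not covered
    std::vector<CallNode> children;  // callees of this call path
};

// Inclusive value: the node's own exclusive value plus the inclusive
// values of all of its children.
double inclusive(const CallNode& node) {
    double sum = node.exclusive;
    for (const CallNode& child : node.children) {
        sum += inclusive(child);
    }
    return sum;
}
```

Storing exclusive values means each measurement is recorded exactly once; the inclusive view can always be derived by summation.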
31. status bar. The three resizable panes offer different views: the metric, the call, and the system pane. You can switch between the different tabs of a pane by left-clicking on the desired tab at the top of the pane. Note that the order of the panes can be changed; see the description of the menu item Display > Dimension order in Section 1.3.2.1. The metric pane provides only the metric tree browser. The call pane offers a call tree browser and a flat call profile. The system pane has a system tree browser, a boxplot statistics view, and possibly several topology views if corresponding topology data is defined in the CUBE file. Tree browsers also provide a context menu.
1.3.2.1 Menu Bar
The menu bar consists of four menus: a file menu, a display menu, a topology menu, and a help menu. Some menu functions also have a keyboard shortcut, which is written beside the menu item's name in the menu. For example, you can open a file with Ctrl+O without going into the menu. A short description of a menu item is shown in the status bar if you hover the mouse over the item for a short while.
1. File: The file menu offers the following functions:
a) Open (Ctrl+O): Offers a selection dialog to open a CUBE file. If a file is already open, it is closed before the new file is opened. If a file was opened successfully, it is added to the top of the recent files list (see below). If it was alrea
32.
1. Collapse all: For all trees. Collapses all nodes in the tree.
2. Collapse subtree: For all trees. Enabled only if there is a reference node. Collapses all nodes in the subtree of the reference node, including the reference node.
3. Collapse peers: For system trees only. Enabled only if there is a reference node. Collapses all peer nodes of the reference node, i.e., all nodes at the same hierarchy level.
4. Expand all: For all trees. Expands all nodes in the tree.
5. Expand subtree: For all trees. Enabled only if there is a reference node. Expands all nodes in the subtree of the reference node, including the reference node.
6. Expand peers: For system trees only. Enabled only if there is a reference node. Expands all peer nodes of the reference node, i.e., all nodes at the same hierarchy level.
7. Expand largest: For all trees. Enabled only if there is a reference node. Starting at the reference node, expands its child with the largest inclusive value (if any) and continues recursively with that child until it finds a leaf. It is recommended to collapse all nodes before using this function, in order to be able to see the path along the largest values.
8. Dynamic hiding: Not available for metric trees. This menu item activates dynamic hiding. All currently hidden nodes are shown. You are asked to define a percentage threshold between 0.0 and 100.0. All nodes whose color position on the color scale (in percent) is below this
33. threshold are hidden. If you right-clicked on a node, the color percentage position of that reference node is suggested as the default value; otherwise, the default value is the last threshold. The hiding is called dynamic because, upon value changes (caused, for example, by changing the node selection), hiding is re-computed for the new values. In other words, value changes may change the visibility of the nodes.
a) Redefine threshold: This menu item is enabled if dynamic hiding is already activated. This function lets you redefine the dynamic hiding threshold as described above.
During dynamic hiding, for expanded nodes with some hidden children and for nodes with all of their children hidden, the displayed (exclusive) value includes the hidden children's inclusive value. The percentage of the hidden children is shown in brackets next to this aggregate value.
9. Static hiding: Not available for metric trees. This menu item activates static hiding. All currently hidden nodes stay hidden. Additionally, you can hide and show nodes using the now enabled sub-items.
a) Static hiding of minor values: Enabled only in the static hiding mode. As described under dynamic hiding, you are asked for a hiding threshold. All nodes whose current color position on the color scale is below this percentage threshold are hidden. However, in contrast to dynamic hiding, these h
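The threshold rule used by dynamic hiding can be sketched as follows. This is an illustration, not CUBE's implementation: the function names are hypothetical, and both the linear color scale between a minimum and a maximum value and the handling of a degenerate scale are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Position of a value on a color scale spanning [min_v, max_v], in percent.
// Assumptions: the scale is linear, and a degenerate scale (max_v <= min_v)
// maps every value to position 0.
double color_position(double v, double min_v, double max_v) {
    if (max_v <= min_v) return 0.0;
    return 100.0 * (v - min_v) / (max_v - min_v);
}

// Returns one flag per value: true if the node would be hidden because its
// color position lies below the threshold (given in percent, 0.0 to 100.0).
std::vector<bool> hidden_flags(const std::vector<double>& values,
                               double min_v, double max_v,
                               double threshold_percent) {
    std::vector<bool> hidden(values.size(), false);
    for (std::size_t i = 0; i < values.size(); ++i) {
        hidden[i] = color_position(values[i], min_v, max_v) < threshold_percent;
    }
    return hidden;
}
```

Dynamic hiding corresponds to calling `hidden_flags` again whenever the values change; static hiding would evaluate it once and keep the result.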
34. [Figure 1.8: Topology Displays]
2. Info: By right-clicking on a grid element, an information widget appears with information about the system item assigned to it. The information contains
• the coordinate of the grid point in each topology dimension,
• the hardware node to which the attached system item belongs,
• the system item's name,
• its MPI rank,
• its identifier,
• and its value, followed by the percentage of this value on the scale between the minimal and maximal topology values.
3
35. 2.1, menu Topology, for further topology-specific coloring settings. For example, the upper topology in Figure 1.8 is drawn without lines, and the one below with black lines and topology line anti-aliasing. If the selected system item (or the first selected one in case of multiple selection) occurs in the topology, it is marked by an additional frame and by additional lines at the side of the plane which contains the corresponding grid point, so that the selected item's position is also visible if the corresponding plane is not completely visible. Besides the functions offered by the topology toolbar (see 1.3.2.2), the following functionality is supported:
1. Item selection: You can change the current system selection by left-clicking on a grid element which has a system item assigned to it, resulting in the selection of that system item.
[Figure: screenshot of the CUBE display window with topology views; caption not recoverable]
36. [Remaining columns of the preceding cube3_stat output tables; the column layout is not recoverable from the extracted text]
1.4 Performance Algebra and Tools
[Continuation of the cube3_stat region table; the visits column is not recoverable]
Region            NumberOfCalls  ExclusiveTime  InclusiveTime  time      mpi
inner_            4              0.034985       142.337220     0.034985  0.000000
inner_auto_       4              0.024373       142.361593     0.024373  0.000000
task_init_        4              0.014327       0.568882       0.014327  0.000000
read_input_       4              0.000716       0.101781       0.000716  0.000000
octant_           416            0.000581       0.000581       0.000581  0.000000
global_real_max_  48             0.000441       1.374712       0.000441  0.000000
global_int_sum_   48             0.000298       5.978850       0.000298  0.000000
global_real_sum_  32             0.000108       0.030815       0.000108  0.000000
barrier_sync_     12             0.000105       0.383007       0.000105  0.000000
bcast_int_        12             0.000068       0.189395       0.000068  0.000000
timers            2              0.000044       0.000044       0.000044  0.000000
initgeom_         4              0.000042       0.000042       0.000042  0.000000
initsnc_          4              0.000038       0.000050       0.000038  0.000000
task_end_         4              0.000013       0.088803       0.000013  0.000000
bcast_real_       4              0.000010       0.000065       0.000010  0.000000
decomp_           4              0.000005       0.000005       0.000005  0.000000
timers_           2              0.000004       0.000048       0.000004  0.000000
Usage: cube3_stat h p m metric metric r routine ro
37. BUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Contents
1 Cube 3.4 User Guide
  1.1 Abstract
  1.2 [title illegible in source]
  1.3 Using the GUI
  1.4 Performance Algebra and Tools
2 CUBE3 API
  2.1 Creating CUBE Files
Bibliography
3 Appendix
  3.1 File format of statistics files
1 Cube 3.4 User Guide
1.1 Abstract
CUBE is a presentation component suitable for displaying performance data for parallel programs, including MPI and OpenMP applications. Program performance is represented in a multi-dimensional space including various program and system resources. The tool allows the interactive exploration of this space in a scalable fashion and browsing the different kinds of performance behavior with ease. CUBE also includes a library to read and write performance data, as well as operators to compare, integrate, and sum mar
38. [Figure 1.12: CUBE display window with a selected metric and a context menu called on the same metric in a special call path, showing the Max severity in trace browser menu item]
menu of the call tree to select the Max severity in trace b
39. can be mapped onto a number representing the actual measurement for metric m while the control flow of process/thread s was executing call path c. This mapping is called the severity of the performance space.
Each dimension of the performance space is organized in a hierarchy. First, the metric dimension is organized in an inclusion hierarchy, where a metric at a lower level is a subset of its parent. For example, communication time is a subset of execution time. Second, the program dimension is organized in a call-tree hierarchy. However, sometimes it can be advantageous to abstract away from the hierarchy of the call tree, for example if one is interested in the severities of certain methods independently of the position of their invocations. For this purpose, CUBE also supports flat call profiles, which are represented as a flat sequence of all methods. Finally, the system dimension is organized in a multi-level hierarchy consisting of the levels machine, SMP node, process, and thread.
CUBE also provides a library to read and write instances of the previously described data model in the form of an XML file. The file representation is divided into a metadata part and a data part. The metadata part describes the structure of the three dimensions plus the definitions of various program and system resources. The data part contains the actual severity numbers to be mapped onto the different elements of the perfor
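The data model described above can be pictured with a small sketch. The following C++ fragment is illustrative only and is not part of the CUBE library; the names `Point`, `PerformanceSpace`, `set`, `add`, and `get` are hypothetical, and using strings for the metric and the call path is a simplification. It stores one severity value per (metric, call path, thread) triple and treats points that were never defined as zero.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical illustration of the data model; not the CUBE API.
// A point of the performance space is a (metric, call path, thread) triple.
using Point = std::tuple<std::string, std::string, int>;

class PerformanceSpace {
public:
    // Assigns a severity value to a point.
    void set(const Point& p, double value) { sev_[p] = value; }

    // Adds to the value already stored at a point (zero if undefined).
    void add(const Point& p, double value) { sev_[p] += value; }

    // Looks up a severity; points that were never defined count as zero.
    double get(const Point& p) const {
        const auto it = sev_.find(p);
        return it == sev_.end() ? 0.0 : it->second;
    }

private:
    std::map<Point, double> sev_;  // sparse storage: only defined points
};
```

A sparse map fits this model well when most points of the space carry no measurement.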
40. cube
// Specify mirrors (optional)
cube.def_mirror("http://icl.cs.utk.edu/software/kojak
// Specify information related to the file (optional)
cube.def_attr("experiment time", "September 2
cube.def_attr("description", "a simple exampl
// Build metric tree
Metric* met0 = cube.def_met("Time", "Time
    "mirror patterns 2 1 html execution", "root node", NULL); // u
Metric* met1 = cube.def_met("User time", "Use
    "http://www.cs.utk.edu/usr.html", "2nd level", met0); // w
Metric* met2 = cube.def_met("System time", "S
    "http://www.cs.utk.edu/sys.html", "2nd level", met0); // w
// Build call tree
string mod = "ICL CUBE example c
Region* regn0 = cube.def_region("main", 21, 100,
41. d be generated by a single process. If an unknown metric name is given, a list of metrics available in the input file is printed. An example of the output is presented below.
    user@host: cube3_score experiment.cube
    Reading experiment.cube... done.
    Estimated aggregate size of event trace (total_tbc): 5775744 bytes
    Estimated size of largest process trace (max_tbc): 1444008 bytes
    (When tracing set ELG_BUFFER_SIZE larger than this to avoid intermediate flushes
    or reduce requirements using file listing names of USR regions to be filtered.)

    flt  type  max_tbc  time    %       region
         ANY   1444008  143.20  100.00  (summary) ALL
         MPI   960072   62.53   43.67   (summary) MPI
         USR   3048     3.48    2.43    (summary) USR
         COM   480888   77.19   53.90   (summary) COM
Usage: cube3_score [-r] [-f filename] [-m metric[,metric...]] <cube>
  -r  Print metrics for each region
  -f  File containing names of regions to filter
  -m  List of metrics that should be displayed (default: max_tbc, time)
  -s  Sort by region names rather than first metric
  -h  Help; output a brief help message
1.4.10 Statistics
Extracts statistical information from the CUBE files.
    user@host: cube3_stat -m time,mpi -p remapped.cube
    Metric  Routine       Count  Sum         Mean
    time    INCL(MAIN__)  4      143.199101  35.799775
    time    EXCL(MAIN__)  4      0.078037    0.019509
    time    task_init_    4      0.568882    0.142221
    time    read_input_   4      0.101781    0.025445
    time    decomp_       4      0.000005    0.000001
    time    inn
42. d set of entities onto one topology (e.g., machines and threads located in the same topology). The parameter cart must have been defined by the def_cart method above.
void Cartesian->set_name(const std::string&
Names a given virtual topology inside the cube object.
const std::string& Cartesian->get_name()
Returns a topology's name.
bool Cartesian->set_namedims(std::vector<std::string>
Labels the dimensions (the axis labels) of one Cartesian topology. Different topologies of the same CUBE object may or may not have dimension names, but within one specific topology either all dimensions are named or none.
const std::vector<std::string>& Cartesian->get_namedims()
Returns a vector of strings with the given topology's dimension names. If there are none, it returns a zero-sized vector.
const std::vector<Cartesian*>& get_cartv() const
Returns a vector of all Cartesian topologies available in the CUBE object.
const Cartesian* get_cart(int i)
Returns the i-th topology in the CUBE object.
2.1.1.5 Severity Mapping
After the establishment of the performance space, users can assign severity values to points of the space. Each
43. d to a trace browser, you can select the Max severity in trace browser menu item of the metric tree, so that all connected trace browsers are zoomed to the globally most severe instance of the selected pattern. A more sophisticated feature is the ability to zoom to the most severe instance of a pattern in a selected call path. This can be done by selecting a metric in the metric tree, which will highlight the most severe call paths in the call tree. You can then use the context
[Figure: screenshot of the CUBE display window with an open context menu; caption not recoverable in this passage]
44. directory. As mentioned above, when using the -d or -n command-line options, a numbered list of the current topologies will appear, showing the topology names, their dimension names (when existing), and the number of coordinates in each dimension, as well as the total number of threads. This is an example of the usage:
    cube3_topoassist -n topo.cube.gz
    Reading topo.cube.gz. Please wait... Done.
    Processes are ordered by rank. For more information about this file, use cube3_info -S <cube experiment>
    This CUBE has 3 topologie(s).
    0. <Unnamed topology>, 3 dimensions: x: 3, y: 1, z: 4. Total = 12 threads.
    1. Test topology, 1 dimensions: dim_x: 12. Total = 12 threads.
    2. <Unnamed topology>, 3 dimensions: 3, 1, 4. Total = 12 threads. <Dimensions are not named>
    Topology to [re]name? 1
    New name: Hardware topology
    Topology successfully [re]named.
    Writing topo.cube.gz ... done.
The process is similar for [re]naming dimensions within a topology. One characteristic is that either all dimensions are named or none.
One could easily create a script to generate the coordinates according to some algorithm or equation and feed this to the assistant as input. The only requirement is to answer the questions in the order they appear and, after that, feed the coordinates. Coordinates are asked for in rank order and, inside every rank, in thread order. The sequence of questions made by the assistant when creating a n
45. dy in the list, it is moved to the top.
b) Close (Ctrl+W): Closes the currently opened CUBE file. Disabled if no file is opened.
c) Open external: Opens a file for the external percentage value mode (see Section 1.3.2.3).
d) Close external: Closes the current external file and removes all corresponding data. Disabled if no external file is opened.
e) Connect to trace browser: This menu item is only visible if a CUBE file with a corresponding statistics file containing information about the most severe instances of certain performance patterns is open and CUBE was configured for remote trace browsing. In this case, it offers to connect to a trace browser (i.e., Vampir or Paraver) to examine the behaviour of the program around the most severe pattern instances. For an in-depth explanation of this feature, see subsection 1.3.3.
2. Settings: This menu item offers the saving, loading, and deletion of settings. You can save several settings under different names. Settings with the name <default> are loaded automatically after the application is started. A separate menu item allows you to save the current settings as default. On the one hand, settings store the appearance of the application, such as the widget sizes, color and precision settings, the order of panes, etc. On the other hand, settings can also store which data is loaded, which tree nodes are expanded, etc. When saving a setting, the appearance is always
46. e see Figure 3.1) are simply text files which contain the necessary data. The first line is always ignored, but it should look similar to that in the example, as it simplifies the understanding for the human reader. All values in a statistic file are simply separated by an arbitrary number of spaces.
    PatternName Count Mean Median Minimum Maximum Sum Variance Quartil25 Quartil75
    mpi_latebroadcast 4 0.010 0.000031 0.000004 0.042856 0.042 0.000459
    - cnode: 5 enter: 0.245877 exit: 0.256608 duration: 0.042856
    mpi_barrier_wait 20 0.018 0.006477 0.000002 0.065293 0.369 0.000698 0.000040
    - cnode: 14 enter: 0.192332 exit: 0.192378 duration: 0.000100
    - cnode: 12 enter: 0.326120 exit: 0.335651 duration: 0.065293
    mpi_barrier_completion 20 0.000 0.000005 0.000002 0.000018 0.000 0.000000 0.000003
    - cnode: 14 enter: 0.192332 exit: 0.192378 duration: 0.000009
    - cnode: 12 enter: 0.159321 exit: 0.165005 duration: 0.000018
    omp_ibarrier_wait 144 0.001 0.000027 0.000001 0.028451 0.212 0.000028 0.000002
    - cnode: 11 enter: 0.297292 exit: 0.297316 duration: 0.000057
    - cnode: 10 enter: 0.322577 exit: 0.332093 duration: 0.028451
Figure 3.1: An example of a statistic file
For each pattern there is a line which contains at least its unique name and the count of how many instances of the pattern exist, as an integer. If more values are provided, there have to be the mean value, median, minimum, and maximum, as well as the sum, all as floating-point numbers
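Reading one pattern line of such a file can be sketched as shown below. This is not CUBE's own parser: `PatternStats` and `parse_pattern_line` are hypothetical names, and the sketch handles only the summary line of a pattern (name, count, and any floating-point values that follow), not the cnode detail lines.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one pattern line of a statistics file.
struct PatternStats {
    std::string name;            // unique pattern name
    long count = 0;              // number of pattern instances
    std::vector<double> values;  // mean, median, minimum, maximum, sum, ... in file order
};

// Parses one whitespace-separated pattern line.
// Returns false if the mandatory name or count is missing.
bool parse_pattern_line(const std::string& line, PatternStats& out) {
    std::istringstream in(line);
    PatternStats parsed;
    if (!(in >> parsed.name >> parsed.count)) return false;
    double v = 0.0;
    while (in >> v) {
        parsed.values.push_back(v);  // optional statistics, as many as are present
    }
    out = parsed;
    return true;
}
```

Stream extraction skips any run of whitespace, which matches the rule that values are separated by an arbitrary number of spaces.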
47. ellow hue.
Find next: For all trees. Changes the current found node to the next found node. If you have not started a search yet, you are asked for the regular expression to search for.
Clear found items: For all trees. Removes the background markings of the preceding Find items.
Define subset: Only for the system tree. Uses the currently selected system resources (e.g., from a preceding Find items) to create a new subset of all system resources (typically threads) with the provided name. This is added to the combobox at the bottom of the system tree and boxplot statistics panes, and becomes the currently active subset for which statistics are calculated.
Info: For all trees; for call trees, under Called region. Gives some short information about the reference node. Disabled if there is no reference node or if no information is available for the reference node.
Online description: For metric trees and flat call profiles; for call trees, see under Called region. Shows some (usually more extensive) online description for the reference node. For example, metrics might point to online documentation explaining their semantics, or regions representing library functions might point to the corresponding library documentation. Disabled if there is no reference node or if no online information is available.
Location: For flat profiles only. Disabled if there is no reference node
48. eparated by spaces
    0 0 0
    0 0 1
    0 0 2
    Writing topo.cube.gz ... done.
So a possible input file for this cube experiment could be:
    Test topology
    3
    y
    torque
    2000
    y
    rotation
    1500
    n
    period
    50
    n
    0 0 0
    0 0 1
    0 0 2
    (the remaining coordinates)
And then call the topology assistant:
    cube3_topoassist -c cubefile.cube < input.txt
2 CUBE3 API
2.1 Creating CUBE Files
The CUBE data format is an XML instance [10]. The CUBE library provides an interface to create CUBE files. It is a simple class interface and includes only a few methods. This section first describes the CUBE API and then presents a simple C++ program as an example of how to use it.
2.1.1 CUBE API
The class interface defines a class Cube. The class provides a default constructor and forty methods. The methods are divided into four groups. The first three groups are used to define the three dimensions of the performance space, and the last group is used to enter the actual data. In addition, an output operator << to write the data to a file is provided.
2.1.1.1 Metric Dimension
This group refers to the metric dimension of the performance space. It consists of a single method used to build metric trees. Each node in the metric tree represents a performance metric. Metrics have different units of measurement. The unit can be either sec (i.e., seconds) for time-based
49. [Figure 1.3: Modified pane order via the menu Display > Dimension order.]

colors at positions between the 5 data points specified above. With the upper spin box below the coloring methods, you can define a threshold percentage value between 0.0 and 100.0, below which colors are lightened. The nearer to the left end of the color scale, the stronger the lightening (with linear increase). With the spin box at the bottom of the dialog, you can define a threshold percentage value between 0.0 and 100.0, below which values should be colored white.

Precision: Activating this menu item opens a dialog for precision settings (see Figure 1.5). Besides Ok and Cancel, the dialog offers an Apply button that applies the current dialog settings to the display. Pressing Cancel undoes all changes due to the dialog, even if you already pressed Apply previously, and closes the dialog. Ok applies the settings a
50. er_auto_ 4 142 361593 35 590398 time task_end_ 4 0 088803 0 022201 pi INCL MAIN_ 4 62 530811 15 632703 pi EXCL MAIN__ 4 0 000000 0 000000 pi task_init_ 4 0 304931 0 076233 pi read_input_ 4 0 101017 0 025254 pi decomp_ 4 0 000000 0 000000 pi inner_auto_ 4 62 037503 15 509376 pi task_end_ 4 0 087360 0 021840 user host cube3_stat t33 remapped cube p Region NumberOfCalls ExclusiveTime InclusiveTime sweep_ 48 76 438435 PI_Recv 39936 36 632249 PI_Send 39936 17 684986 PI_Allreduce 128 7 383530 source_ 48 3 059890 PI_Barrier 12 0 382902 flux_err_ 48 0 380047 TRACING 8 0 251017 PI_Bcast 16 0 189381 PI_Init 4 0 170402 snd_real_ 39936 0 139266 PI_Finalize 4 0 087360 initialize_ 4 0 084858 initxs_ 4 0 083242 AIN 4 0 078037 rcv_real_ 39936 0 077341 Variance 001783 000441 001802 000622 000000 000609 000473 O OOO NO O O 190396 000000 001438 000633 000000 194255 000473 S NO O O ON 130 972847 36 632249 17 684986 383530 059890 382902 754759 251017 189381 419989 824251 088790 168192 083242 143 199101 36 709590 pa O SS O OM ow gt Minimum 35 0 0 0 0 35 0 76 36 Ls Pa 3 0 0 0 0 0 0 0 0 0 0 0 759769 001156 102174 000703 000001 566589 000468 607989 000000 040472 000034 000000 478049 000108 m time mpi visits time 438435 632249 684986 383
51. ew the call tree hierarchy is replaced with a source code hierarchy consisting of two levels: regions and their subroutines. Any subroutines are displayed as a single child node labeled Subroutines. A subroutine node represents all regions directly called from the region above. In this way, you are able to see which fraction of a metric is associated with a region exclusively, that is, without the regions called from it.

Tree displays are controlled by the left and right mouse buttons and some keyboard keys. The left mouse button is used to select or expand/collapse a node. You can expand/collapse a node by left-clicking on the attached sign, and select it by left-clicking elsewhere in the node's line. To select multiple items, Ctrl+<left mouse click> can be used. Selection without the Ctrl key deselects all previously selected nodes and selects the clicked node. In single-selection mode you can also use the up/down arrows to move the selection one node up/down. The right mouse button is used to pop up a context menu with node-specific information, such as online documentation (see the description of the context menu below).
52. ew topology the -c switch is

• New topology's name
• Number of dimensions
• Will the above dimensions be named? (Y/N)
• If yes, asks for the name. (Empty is not valid.)
• Number of coordinates in that dimension
• Asks if this dimension is periodic or not (Y/N)
• Repeat the previous three steps for every dimension
• After that, it expects the coordinates for each thread in this topology, separated by spaces, in the order described above.

This is a sample session of the assistant:

cube3_topoassist -c experiment.cube.gz
Reading experiment.cube.gz. Please wait... Done.
Processes are ordered by rank. For more information about this file, use cube3_info -S <cube experimel
So far, only cartesian topologies are accepted.
Name for new topology: Test topology
Number of Dimensions: 3
Do you want to name the dimensions (axis) of this topology (Y/N): y
Name for dimension 0: torque
Number of elements for dimension 0: 2000
Is dimension 0 periodic: y
Name for dimension 1: rotation
Number of elements for dimension 1: 1500
Is dimension 1 periodic: n
Name for dimension 2: period
Number of elements for dimension 2: 50
Is dimension 2 periodic: n
Alert: The number of possible coordinates (150000000) is bigger than the number of threads on the specified cube file (12). Some positions will stay empty.
Topology on THREAD level.
Thread 0's (rank 0) coordinates in 3 dimensions, s
53. formance algebra that can be used to merge, subtract, and average the data from different experiments and view the results in the form of a single "derived" experiment. Using the same representation for derived experiments and original experiments provides access to the derived behavior based on familiar metaphors and tools, in addition to an arbitrary and easy composition of operations. The algebra is an ideal tool to verify and locate performance improvements and degradations alike. The algebra includes three operators (diff, merge, and mean) provided as command-line utilities, which take two or more CUBE files as input and generate another CUBE file as output. The operations are closed in the sense that the operators can be applied to the results of previous operations. Note that although all operators are defined for any valid CUBE data sets, not all possible operations actually make sense. For example, whereas it can be very helpful to compare two versions of the same code, computing the difference between entirely different programs is unlikely to yield any useful results.

1.4.1 Difference

Changing a program can alter its performance behavior. Altering the performance behavior means that different results are achieved for different metrics. Some might increase while others might decrease. Some might rise in certain parts of the program only, while they drop off in other parts. Finding the reason for a gain or loss in overall performance
54. gy

1.3.2.11 Status Bar

The status bar displays status information, such as the state of execution for longer procedures and hints for the menu items the mouse is pointing at.

1.3.3 Features enabled through statistic files

In this section we explain two features that are only available if a statistic file for the currently opened CUBE file is present: the display of statistical information about performance patterns, which represent performance problems, and the display of the most severe instances of these patterns in a trace browser. Such a statistic file can be generated by the Scalasca trace analyzer. The file format of statistic files is described in Appendix 3.1.

For CUBE to recognize the statistic file, it must be placed in the same directory as the CUBE file. The basename of the statistic file should be identical to that of the CUBE file, but with the suffix .stat. For example, when the CUBE file is called trace.cube.gz, the corresponding statistic file is called trace.stat.

1.3.3.1 Statistical information about performance patterns

If a statistic file is provided, you can view statistical information about one or multiple patterns, for example in order to compare them. This is done by selecting the desired metrics in the metric tree and then selecting the Statistics menu item in the context menu. This brings up the box plot window as shown in Figure 1.10. The box plot shows a graphical representation of the statistical
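The naming rule for statistic files given in Section 1.3.3 can be illustrated with a few lines of C++. This is only a sketch and not part of CUBE: the helper name stat_file_name is hypothetical, and treating an uncompressed .cube file the same way as .cube.gz is an assumption, since the guide only gives the trace.cube.gz example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the CUBE library): derive the name of the
// statistic file that is expected next to a given CUBE file.
// Rule from the guide: same basename, suffix ".stat"
// (trace.cube.gz -> trace.stat).
// Assumption: a plain ".cube" file is handled the same way.
std::string stat_file_name(const std::string& cube_file)
{
    std::string base = cube_file;

    // Remove `suffix` from the end of `base` if it is present.
    auto strip = [&base](const std::string& suffix) {
        if (base.size() >= suffix.size() &&
            base.compare(base.size() - suffix.size(), suffix.size(), suffix) == 0) {
            base.erase(base.size() - suffix.size());
        }
    };

    strip(".gz");    // optional compression suffix
    strip(".cube");  // CUBE file suffix
    return base + ".stat";
}
```

Because any directory part of the path is kept unchanged, the result also reflects the requirement that the statistic file sits in the same directory as the CUBE file. For instance, assert(stat_file_name("trace.cube.gz") == "trace.stat") holds.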
55. h dimension T. One element with index (0, 0, 1, 3) has been selected by clicking on it with the right mouse button. All elements inside the black rectangle around the selection belong to Z index one. The gray lines divide the rectangle into four elements, which correspond to the elements of dimension T with indices 0 to 3.

1.3.2.9 Selected value info

Below each pane there is a selected value information widget. If no data is loaded, the widget is empty. Otherwise, the widget displays more extensive and precise information about the selected values in the tree above. This information widget and the topologies may have different precision settings than the trees, so that more precise information can be displayed here than in the trees (see Section 1.3.2.1, menu Display > Precision).

The widget has a 3-line display. The first line displays at most 4 numbers. The leftmost number shows the smallest value in the tree (or 0.0 in any percentage value mode for trees, or the user-defined minimal value for coloring if activated), and the rightmost number shows the largest value in the tree (or 100.0 in any percentage value mode in trees, or the user-defined maximal value for coloring if activated). Between these two numbers, the current value of the selected node is displayed, if it is defined. Additionally, in the absolute value mode it is followed by the percentage of the selected value on the scale between the minimal and maximal values, sh
56. hese values may be hard to interpret, and in such cases other value modes can be applied. Basically, there are three categories of additional value modes.

• The first category presents all severities in the tree as percentage of a reference value. The reference value can be the absolute value of a selected or a root node from the same tree or in one of the trees on the left-hand side. For example, in the Own root percent value mode the severity values are presented as percentage of the own root's (inclusive) severity value. This way you can see how the severities are distributed within the tree. All the value modes from Own root percent to System selection percent fall into this category.

All nodes of trees on the left-hand side of the metric tree have undefined values. (Basically, we could compute values for them, but this would sum up the severities over all metrics, which have different meanings and usually even different units, and thus those values would not have much expressiveness.) Since we cannot compute percentage values based on undefined reference values, such value modes are not supported. For example, if the call tree is on the left-hand side and the metric tree is in the middle, then the metric tree does not offer the Call root percent mode.

• The second category is available for system trees only and shows the distribution of the values within hierarchy levels. E.g., the Peer percent value mode displays the severities as percentage of the
57. idings are static: even if, after some value changes, the color position of a hidden node gets above the threshold, the node stays hidden.

(b) Hide this: Enabled only in the static hiding mode if there is a reference node. Hides the reference node.

(c) Show children of this: Enabled only in the static hiding mode if there is a reference node. Shows all hidden children of the reference node, if any.

Like for dynamic hiding, for expanded nodes with some hidden children and for nodes with all of their children hidden, the displayed (exclusive) value includes the hidden children's inclusive value. The percentage of the hidden children is shown in brackets next to this aggregate value.

No hiding: Not available for metric trees. This menu item deactivates any hiding and shows all hidden nodes.

Find items: For all trees. Opens a dialog to get a regular expression from the user. If the user called the context menu over an item, the default text is the name of the reference node; otherwise it is the last regular expression that was searched for. If "select items" is checked, items matching the regular expression also become selected. If "select items" is unchecked, all non-hidden nodes whose names contain the given text are marked with a yellow background, and all collapsed nodes whose subtree contains such a non-hidden node are marked by a light yellow background. The current found node, which is initialized to the first found node, is marked by a distinguished y
58. ill be opened upon program start. After a brief description of the basic principles, different components of the GUI will be described in detail.

1.3.1 Basic Principles

The CUBE QT display has three tree browsers, each of them representing a dimension of the performance space (Figure 1.1). By default, the left tree displays the metric dimension, the middle tree displays the program dimension, and the right tree displays the system dimension. The nodes in the metric tree represent metrics. The nodes in the program dimension can have different semantics depending on the particular view that has been selected. In Figure 1.1, they represent call paths forming a call tree. The nodes in the system dimension represent machines, nodes, processes, or threads, from top to bottom.

Each node is associated with a value, which is called the severity and is displayed simultaneously using a numerical value as well as a colored square. Colors enable the easy identification of nodes of interest even in a large tree, whereas the numerical values enable the precise comparison of individual values. The sign of a value is visually distinguished by the relief of the colored square: a raised relief indicates a positive sign, a sunken relief indicates a negative sign.

Users can perform two basic types of actions: selecting a node or expanding/collapsing a node. In the metric tree in Figure 1.1, the metric Execution is selected. Selecting a node in a tree causes the
59. ims = 2;
vector<long> dimv;
vector<bool> periodv;
for (int i = 0; i < ndims; i++) {
  dimv.push_back(5);
  if (i % 2 == 0)
    periodv.push_back(true);
  else
    periodv.push_back(false);
}
Cartesian* cart = cube.def_cart(ndims, dimv, periodv);
cart->set_name("Bi dimensional topology");
cart->set_namedims(namedims);
vector<long> coord0, coord1;
coord0.push_back(0);
coord0.push_back(0);
coord1.push_back(3);
coord1.push_back(3);
// map the two threads onto the above 2 coordinates
cube.def_coords(cart, thrd0, coord0);
cube.def_coords(cart, thrd1, coord1);
// Severity mapping
cube.set_sev(met0, cnode0, thrd0, 4);
cube.set_sev(met0, cnode0, thrd1, 4);
cube.set_sev(met0, cnode1, thrd0, 4);
cube.set_sev(met0, cnode1, thrd1, 4);
cube.set_sev(met0, cnode2, thrd0, 4);
cube.set_sev(met0, cnode2, thrd1, 4);
cube.set_sev(met1, cnode0, thrd0, 1);
cube.set_sev(met1, cnode0, thrd1, 1);
cube.set_sev(met1, cnode1, thrd0, 1);
cube.set_sev(met1, cnode1, thrd1, 1);
cube.set_sev(met1, cnode2, thrd0, 1);
cube.set_sev(met1, cnode2, thrd1, 1);
cube.set_sev(met2, cnode0, thrd0, 1);
cube.set_sev(met2, cnode0, thrd1, 1);
cube.set_sev
60. in arbitrary format. If one of these values is provided, all have to be provided. The next optional value is the variance, also as a floating-point number. The last two optional values, of which both or none have to be provided, are the 25% and the 75% quantile, also as floating-point numbers. If any of these values is omitted, all following values have to be omitted, too. If, for example, the variance is not provided, the lower and the upper quartile must not be provided either.

In the subsequent lines (there can be an arbitrary number), the information of the most severe instances is provided. Each of these lines has to begin with the text cnode, followed by the integer identifier of this cnode in the CUBE file, and then enter, exit, and duration, each with floating-point numbers for the respective times in seconds. The beginning of the next pattern is indicated by a blank line.
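To make the layout of a severe-instance line more concrete, the following C++ sketch parses one such line. It is an illustration only: the struct SevereInstance and the function parse_instance are hypothetical names, and the exact token layout (keyword, optionally followed by a colon, then the number, all separated by whitespace) is an assumption, since the description above names the fields but not the precise punctuation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One "most severe instance" entry of a statistic file (hypothetical type).
struct SevereInstance {
    long   cnode_id;    // identifier of the cnode in the CUBE file
    double enter_time;  // seconds
    double exit_time;   // seconds
    double duration;    // seconds
};

// Hypothetical parser. Assumed line layout (not confirmed by the guide):
//   cnode: <id> enter: <t> exit: <t> duration: <t>
// A trailing ':' after each keyword is accepted but not required.
// Returns true and fills `out` on success; returns false otherwise.
bool parse_instance(const std::string& line, SevereInstance& out)
{
    std::istringstream in(line);

    // Read the next token and check that it is the expected keyword.
    auto expect = [&in](const std::string& keyword) -> bool {
        std::string token;
        if (!(in >> token)) return false;
        if (token.back() == ':') token.pop_back();
        return token == keyword;
    };

    SevereInstance tmp{};
    if (!expect("cnode")    || !(in >> tmp.cnode_id))   return false;
    if (!expect("enter")    || !(in >> tmp.enter_time)) return false;
    if (!expect("exit")     || !(in >> tmp.exit_time))  return false;
    if (!expect("duration") || !(in >> tmp.duration))   return false;

    out = tmp;  // only modify the output once the whole line was valid
    return true;
}
```

A reader of a whole statistic file would call such a function for every line after a pattern's summary line until a blank line marks the beginning of the next pattern.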
61. ir, you have to specify the host name and port of the Vampir server you want to connect to, and the path of the trace file you want to load. This will launch the Vampir client (if it is correctly configured) and load the specified trace file. To configure Vampir so that it can be started automatically by CUBE, a service file com.gwt.vampir.service describing the path to your Vampir client executable must be placed under /usr/share/dbus-1/services or $HOME/.local/share/dbus-1/services. This service file must be exactly as shown below, with the exception that Exec should point to your Vampir client executable:

[D-BUS Service]
Name=com.gwt.vampir
Exec=/private/utils/bin/vng

An example of the com.gwt.vampir.service file

For Paraver, you have to specify a configuration file, which is used to initialize the Paraver window that is opened when zooming, as well as the path of the desired trace file. This will launch Paraver, which will directly open the correct trace file. In order for CUBE to be able to launch Paraver, the executable directory of Paraver must be in your path.

It is also possible to connect to multiple trace browsers, so that you can view a trace file in Paraver and Vampir simultaneously, but due to limitations of the Vampir client you can only have two Vampir clients running at the same time. All trace browsers will be zoomed simultaneously if you select a zoom command as described below. Once CUBE is connecte
62. ize data from different experiments. This user manual provides instructions on how to use the CUBE display, how to use the operators, and how to write CUBE files. The CUBE3 implementation has an API and a file format that are incompatible with preceding versions.

1.2 Introduction

CUBE (CUBE Uniform Behavioral Encoding) is a presentation component suitable for displaying a wide variety of performance data for parallel programs, including MPI [1] and OpenMP [2] applications. CUBE allows interactive exploration of the performance data in a scalable fashion. Scalability is achieved in two ways: hierarchical decomposition of individual dimensions and aggregation across different dimensions. All metrics are uniformly accommodated in the same display and thus provide the ability to easily compare the effects of different kinds of program behavior.

CUBE has been designed around a high-level data model of program behavior called the CUBE performance space. The CUBE performance space consists of three dimensions: a metric dimension, a program dimension, and a system dimension. The metric dimension contains a set of metrics, such as communication time or cache misses. The program dimension contains the program's call tree, which includes all the call paths onto which metric values can be mapped. The system dimension contains the items executing in parallel, which can be processes or threads, depending on the parallel programming model. Each point (m, c, s) of the space
63. mance space. The display component can load such a file and display the different dimensions of the performance space using three coupled tree browsers (Figure 1.1). The browsers are connected in such a way that you can view one dimension with respect to another dimension. The connection is based on selections: in each tree you can select one or more nodes. For example, in Figure 1.1 the Execution metric, the sweep call path node, and Process 0 are selected. For each tree, the selections in the trees on its left-hand side (if any) restrict the considered data: the metric nodes aggregate data over all call path nodes and all system items, the call tree aggregates data for the Execution metric over all system nodes, and each node of the system tree shows the severity for the Execution metric of the sweep call path node for this system node.

If the CUBE file contains topological information, the distribution of the performance metric across the topology can be examined using the topology view. Furthermore, the display is augmented with a source-code display that shows the position of a call site in the source code.

As performance tuning of parallel applications usually involves multiple experiments to compare the effects of certain optimization strategies, CUBE includes a feature designed to simplify cross-experiment analysis. The CUBE algebra [4] is an extension of the framework for multi-execution performance tuning by Karavanic and Miller [3] and of
64. metrics, such as execution time, or occ (i.e., occurrences) for event-based metrics, such as floating-point operations. During the establishment of a metric tree, a child metric is usually more specific than its parent, and both of them have the same unit of measurement. Thus, a child performance metric has to be a subset of its parent metric (e.g., system time is a subset of execution time).

Metric* def_met(const std::string& disp_name,
                const std::string& uniq_name,
                const std::string& dtype,
                const std::string& uom,
                const std::string& val,
                const std::string&,
                const std::string& descr,
                Metric* parent);

Returns a metric with display name disp_name, unique name uniq_name, and description descr. dtype specifies the data type, which can either be INTEGER or FLOAT. uom is the unit of measurement, which is either sec for seconds or occ for number of occurrences. val specifies whether there is any data available for this particular metric. It can either be VOID (no data available, metric will not be shown in CUBE) or an empty string (metric will be shown and data is present). parent is a previously created metric which will be the new metric's parent. To define a root node, use NULL instead. u
65. [Figure 1.9: 4-dimensional example.]

Alternatively, the folding mode can be activated by clicking on the fold button. This mode is available for topologies with four to six dimensions and allows all elements to be displayed by folding two dimensions into one. Every dimension appears in a box, which can be dragged into one of the three container boxes for the displayed Cartesian dimensions x, y, and z. In folding mode, the color of the inner borders is changed to gray. The black-bordered rectangles show the element borders of each of the three displayed dimensions. The right image in Figure 1.9 shows the folding of dimension Z wit
66. nd closes the dialog. It consists of two parts: precision settings for the tree displays, and precision settings for the selected value info widgets and the topology displays. For both formats, three values can be defined:

1. Number of digits after the decimal point: As the name suggests, you can specify the precision for the fraction part of the values. E.g., the number 1.234 is displayed as 1.2 if you set this precision to 1, as 1.234 if you set it to 3, and as 1.2340 if you set it to 4.

2. Exponent representation above 10^x with x: Here you can define above which threshold scientific notation should be used. E.g., the value 1000 is displayed as 1000 if this value is larger than 3 and in scientific notation otherwise.

[Figure 1.4: The color dialog opened via the menu Display > General coloring.]

3. Display zero values below 10^-x with x: Due to inexact floating-point representation, it often happens that users wish to round down values very near zero t
67. node values are the percentage of their absolute values with respect to the selected metric node's absolute value in its current collapsed/expanded state. In case of multiple selection, the sum of the selected metrics' values is taken for the percentage computation.

Call root percent: Available for trees on the right-hand side of the call tree. Similar to the metric root percent, but the call tree root instead of the metric tree root is considered. In case of multiple selection with different call roots, the sum of those root values is considered.

Call selection percent: Available for trees on the right-hand side of the call tree. Similar to the metric selection percent; the percentage is computed with respect to the selected call node's value in its current collapsed/expanded state. In case of multiple selections, the sum of the selected call values is considered.

System root percent: Available for trees on the right-hand side of the system tree. Similar to the call root percent; the sum of the inclusive values of all roots of selected system nodes is considered for percentage computation.

System selection percent: Available for trees on the right-hand side of the system tree. Similar to the call selection percent; the percentage is computed with respect to the selected system node(s) in the current collapsed/expanded state.

Peer percent: For the system tree only. The peer percentage mode shows the percentage of the nodes
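The arithmetic behind the selection percent modes described above is small enough to sketch in C++. The function names are hypothetical and this is not CUBE's implementation; in particular, reporting a zero reference value as NaN is an assumption made here to mirror the guide's notion of undefined values.

```cpp
#include <cassert>
#include <limits>
#include <numeric>
#include <vector>

// Reference value for a "selection percent" mode. With multiple selected
// nodes, the guide states that the sum of the selected values is used.
double selection_reference(const std::vector<double>& selected_values)
{
    return std::accumulate(selected_values.begin(), selected_values.end(), 0.0);
}

// Severity expressed as a percentage of a reference value.
// Assumption: a zero reference yields NaN, standing in for "undefined".
double percent_of(double value, double reference)
{
    if (reference == 0.0)
        return std::numeric_limits<double>::quiet_NaN();
    return 100.0 * value / reference;
}
```

With two selected nodes of values 1.0 and 3.0, the reference is 4.0, so a node with value 3.0 is shown as 75 percent. The root percent modes follow the same pattern, with the inclusive root value (or the sum of several root values) as the reference.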
68. o be undefined (denoted by a minus sign).

General coloring: Opens a dialog where different color settings can be changed. The dialog is shown in Figure 1.4. The Ok button applies the settings to the display and closes the dialog, the Apply button applies the settings to the display, and Cancel cancels all changes since the dialog was opened (even if Apply was pressed in between) and closes the dialog.

At the top of the dialog you see a color legend with some vertical black lines showing the position of the color scale start, the colors cyan, green, and yellow, and the color scale end. These lines can be dragged with the left mouse button, or their position can be changed by typing values between 0.0 (left end) and 1.0 (right end) into the corresponding spin boxes below the color legend. The different coloring methods offer different functions to interpolate the
69. o zero. Here you can define the threshold below which this rounding should take place. E.g., the value 0.0001 is displayed as 0.0001 if this value is larger than 3 and as zero otherwise.

(d) Trees: This menu item offers two sub-items:

i. Font: Here you can specify the font, the font size (in pt), and the line spacing for the tree displays (see Figure 1.6). The Ok button applies the settings to the display and closes the dialog, the Apply button applies the settings to the display, and Cancel cancels all changes since the dialog was opened (even if Apply was pressed in between) and closes the dialog.

• Selection marking: Here you can specify if selected items in trees should be marked by a blue background or by a frame.

ii. Optimize width: Under this menu item CUBE offers widget rescaling such that the amount of information shown is maximized, i.e., CUBE optimally distributes the available space between its components. You
70. often requires considering the performance change as a multidimensional structure. With CUBE's difference operator, a user can view this structure by computing the difference between two experiments and rendering the derived result experiment like an original one. The difference operator takes two experiments and computes a derived experiment whose severity function reflects the difference between the minuend's severity and the subtrahend's severity.

A possible output is presented below.

user@host: cube3_diff scout.cube remapped.cube -o result.cube
Reading scout.cube ... done.
Reading remapped.cube ... done.
Diff operation begins
INFO: Merging metric dimension ... done.
INFO: Merging program dimension ... done.
INFO: Merging system dimension ... done.
INFO: Mapping severities ... done.
INFO: Adding topologies ... Topology retained in experiment. done.
INFO: Diff operation ... done.
Diff operation ends successfully
Writing result.cube ... done.

Usage: cube3_diff [-o output] [-c|-C] [-h] minuend subtrahend
  -o  Name of the output file (default: diff.cube)
  -c  Do not collapse system dimension, if experiments are incompatible
  -C  Collapse system dimension
  -h  Help; output a brief help message

1.4.2 Merge

The merge operator's purpose is the integration of performance data from different sources. Of
71. ogy tool bar that can be removed or repositioned,
• three value mode combo boxes,
• three resizable panes each containing some tabs,

[Figure 1.2: CUBE display window with expanded metric node Execution.]

• three selected value information widgets,
• a color legend, and
• a
72. or three dimensions for presentation. If the currently opened cube file defines one or more such topologies, separate tabs are available for each, using the topology name when one is provided. The topology display shows performance data mapped onto the Cartesian topology of the application. The corresponding grid is specified by the number of dimensions and the size of each dimension. Threads/processes are attached to the grid elements, as specified by the CUBE file. Not all system items have to be attached to a grid element, and not every grid element has a system item attached. Examples of a two- and of a three-dimensional topology are shown in Figure 1.8. Note that the topology toolbar is enabled when a topology is available to be displayed.

The Cartesian grid is presented by planes stacked on top of each other in a three-dimensional projection. The number of planes depends on the number of dimensions in the grid. Each plane is divided into tiles (typically shown as rhombi). The number of tiles depends on the dimension size. Each tile represents a system resource (e.g., a process) of the application and has a coordinate associated with it.

The current value of each grid element, with respect to the selections on the left-hand side and to the current value mode, is represented by coloring the grid element. Coloring is based on a value scale from 0.0 to 100.0. Grid elements that have no system item attached are colored gray. See Section 1.3
73. own in brackets. Note that the values of expanded non-leaf system nodes and of nodes of trees on the left-hand side of the metric tree are not defined. If the value mode is not the absolute value mode, the second line displays similar information for the absolute values in a light gray color. In case of multiple selection, the information refers to the sum of all selected values. In case of multiple selection in system trees in the peer distribution and peer percent modes, this sum carries no meaningful information but is displayed for consistency. If the widget is not wide enough to display all numbers in the given precision, part of the number display is cut off and an indicator shows that not all digits could be displayed. Below these numbers, in the third line, a small color bar shows the position of the color of the selected node in the color legend. In case of undefined values, the legend is filled with a gray grid.
1.3.2.10 Color legend
By default, the colors are taken from a spectrum ranging from blue over cyan, green, and yellow to red, representing the whole range of possible values. You can change the color settings in the menu Display > General coloring (see Section 1.3.2.1). Exact zero values are represented by the color white; in topologies you can decide whether you would like to use white or the minimal color (see Section 1.3.2.1, menu Topolo
74. pens a dialog with some basic information on the usage of CUBE.
ii. Mouse and keyboard control: Lists mouse and keyboard controls as given in Section 1.3.4.
iii. What's this?: Here you can get more specific information on parts of the CUBE GUI. If you activate this menu item, you switch to the "What's this?" mode. If you now click on a widget, an appropriate help text is shown. The mode is left when help is given or when you press Esc. Another way to ask the question is to move the focus to the relevant widget and press Shift+F1.
iv. About: Opens a dialog with release information.
v. Selected metric description: Opens a new window showing the description of the currently selected metric (equivalent to Online description in the metric tree context menu). Disabled if online information is unavailable.
vi. Selected region description: Opens a new window showing the description of the currently selected region (equivalent to Online description in the call tree context menu). Disabled if online information is unavailable.
1.3.2.2 Toolbar
As already mentioned, the system pane may contain topology displays if corresponding data is specified in the CUBE file. For the topology displays, see Section 1.3.2.7. Basically, a topology display draws a two- or three-dimensional grid in the form of some planes placed one above the other. Each plane consists of a two-dimensional grid of processes or threads. The toolbar is enabled only if the s
75. rl is a link to an HTML page describing the new metric in detail. If you want to mirror the page at several locations, you can use the mirror macro as a prefix, which will be replaced by an available mirror defined using def_mirror.
const std::vector<Metric*>& get_metv() const
Returns a vector with all metrics in the CUBE object.
const std::vector<Metric*>& get_root_metv() const
Returns a vector with all roots of the metric dimension in the CUBE object.
Metric* get_met(const std::string uniq_name) const
Returns a metric with the given uniq_name. Returns NULL if the CUBE object doesn't contain a metric with this name.
Metric* get_root_met(Metric* met)
Returns the root metric for the given metric met.
2.1.1.2 Program Dimension
This group refers to the program dimension of the performance space. The entities presented in this dimension are region, call site, and call-tree node (i.e., call paths). A region can be a function, a loop, or a basic block. Each region can have multiple call sites from which the control flow of the program enters a new region. Although we use the term call site here, any place that causes the program to enter a new region can be represented as a call site, including loop entries. Correspondingly, the region entered from a call site i
76. [Figure 1.13: Location of the worst Late Broadcast instance shown in the timeline display of Vampir. It can be seen that some processes enter the MPI_Bcast operation earlier than the root process, which leads to a wait state.]
1.3.4 Keyboard and mouse control
1.3.4.1 General control
Shift+F1: Help: What's this?
Ctrl+O: Shortcut for menu File > Open
Ctrl+W: Shortcut for menu File > Close
Ctrl+Q: Shortcut for menu File > Quit
<left mouse click>: over menu/tool bar: activate menu/function; over value mode combo: select value mode; over tab: switch to tab; in tree: select/deselect/expand/collapse items; in topology: select item
<right mouse click>: in tree: context menu; in topology
77. rowser menu item (see Figure 1.12 for illustration). This menu item will then zoom all connected trace browsers to the most severe instance of the selected pattern with respect to the chosen call path (see Figure 1.13).
[Figure: CUBE 3.4 display window (tabs: Metric tree, Call tree, Flat view, System tree, Box Plot, Topology 0) with the tree context menu open]
78. s called callee, which might as well be a loop. Every call tree node points to a call site. The actual call path represented by a call tree node can be derived by following all the call sites, starting at the root node and ending at the particular node of interest. The user can choose among three ways of defining the program dimension:
1. Call tree with line numbers
2. Call tree without line numbers
3. Flat profile
A call tree with line numbers is defined as a tree whose nodes point to call sites. A call tree without line numbers is defined as a tree whose nodes point to regions (i.e., the callees). A flat profile is simply defined as a set of regions; that is, no tree has to be defined.
Region* def_region(const std::string name, long begln, long endln, const std::string& url, const std::string& descr, const std::string& mod)
Returns a new region with region name name and description descr. The region is located in the module mod and extends from line begln to line endln. url is a link to an HTML page describing the new region in detail. For example, if the region is a library function, the url can point to its documentation. If you want to mirror the page at several locations, you can use the macro mirror as a prefix, which will be replaced by an available mirror defined using def_mirror see
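The call-path derivation described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: CallNode and call_path are hypothetical names, not classes or functions from the CUBE headers, and the "/" separator is an arbitrary choice.

```cpp
// Illustrative sketch only. CallNode and call_path are hypothetical
// stand-ins and are not part of the CUBE API.
#include <cassert>
#include <string>
#include <vector>

struct CallNode {
    std::string callee;      // name of the region entered at this node
    const CallNode* parent;  // nullptr for a root node
};

// Collect callee names while walking up to the root, then emit them
// in root-to-node order, separated by '/'.
std::string call_path(const CallNode* node) {
    assert(node != nullptr);
    std::vector<std::string> names;
    for (const CallNode* n = node; n != nullptr; n = n->parent) {
        names.push_back(n->callee);
    }
    std::string path;
    for (auto it = names.rbegin(); it != names.rend(); ++it) {
        if (!path.empty()) {
            path += "/";
        }
        path += *it;
    }
    return path;
}
```

For a node bar whose parent is foo, whose parent in turn is the root main, call_path yields main/foo/bar. This mirrors reading all labels from the root down to the node of interest.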
79. saved. While saving, you will be asked whether you would also like to save the data-related settings. If you load a setting that also stores data settings, the corresponding data is loaded as well. In the dialog for loading settings, you are offered the list of all available settings. For the settings with data, the name of the corresponding cube file is displayed in braces. Note that settings with data store only the name of the cube file from which to load the data, not the data itself. Thus, if the cube file is no longer available, CUBE cannot load the data settings. CUBE also performs some basic tests on the data to check whether it could have changed since the setting was saved. For example, if the number of items does not match the number at the time of saving, it does not load the data either.
Dynamic loading threshold: This menu item is only available if CUBE was configured for dynamic loading. By default, CUBE always loads all data when you open a CUBE file. However, CUBE can also load only the data needed for the current display. More precisely, the data for the selected metric(s) and, if a selected metric is expanded, the data for its children are loaded. If you change the metric selection, any new data needed for the display is loaded dynamically on demand. Currently unneeded data gets unloaded. This functionality is useful mostly for large files
80. sev(met2, cnode1, thrd0, 1);
    cube.set_sev(met2, cnode1, thrd1, 1);
    cube.set_sev(met2, cnode2, thrd0, 1);
    cube.set_sev(met2, cnode2, thrd1, 1);
    // Output to a cube file
    ofstream out;
    out.open("example.cube");
    out << cube;
[Figure 2.1: Display of example.cube]
Bibliography
[1] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, June 1995. http://www.mpi-forum.org
[2] OpenMP Architecture Review Board. OpenMP Fortran Application Program Interface, Version 2.5, May 2000. http://www.openmp.org
[3] K. L. Karavanic and B. Miller. A Framework for Multi-Execution Performance Tuning. Parallel and Distributed Computing Practices, 4(3), September 2001.
[4] F. Song, F. Wolf, N. Bhatia, J. Dongarra, and S. Moore. An Algebra for Cross-Experiment Performance Analysis. Proc. of ICPP 2004, 63
81. ten a certain combination of performance metrics cannot be measured during a single run. For example, certain combinations of hardware events cannot be counted simultaneously due to hardware resource limits. Or the combination of performance metrics requires using different monitoring tools that cannot be deployed during the same run. The merge operator takes an arbitrary number of CUBE experiments with a different or overlapping set of metrics and yields a derived CUBE experiment with a joint set of metrics. The possible output is presented below.
    user@host: cube3_merge scout.cube remapped.cube -o result.cube
    Merge operation begins
    Reading scout.cube done
    Reading remapped.cube done
    INFO: Merging metric dimension done
    INFO: Merging program dimension done
    INFO: Merging system dimension done
    INFO: Mapping severities done
    INFO: Merge operation Topology retained in experiment Topology retained in experiment done
    Merge operation ends successfully
    Writing result.cube done
Usage: cube3_merge [-o output] [-c] [-C] [-h] cube ...
    -o  Name of the output file (default: merge.cube)
    -c  Do not collapse system dimension if experiments are incompatible
    -C  Collapse system dimension
    -h  Help; output a brief help message
1.4.3 Mean
The mean operator is intended to smoo
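To make the notion of a "joint set of metrics" concrete, the following C++ sketch forms the union of several metric-name lists while keeping the order of first appearance. It is a simplified illustration, not the algorithm used by cube3_merge: MetricSet and merge_metrics are hypothetical names, and real experiments also carry program and system dimensions that this sketch ignores.

```cpp
// Illustrative sketch only. MetricSet and merge_metrics are hypothetical
// and do not reflect the internals of cube3_merge.
#include <cassert>
#include <set>
#include <string>
#include <vector>

using MetricSet = std::vector<std::string>;

// Return every metric name that occurs in at least one experiment,
// in order of first appearance and without duplicates.
MetricSet merge_metrics(const std::vector<MetricSet>& experiments) {
    assert(!experiments.empty());
    MetricSet joint;
    std::set<std::string> seen;
    for (const MetricSet& experiment : experiments) {
        for (const std::string& name : experiment) {
            if (seen.insert(name).second) {  // true only on first sight
                joint.push_back(name);
            }
        }
    }
    return joint;
}
```

Merging an experiment that measured time and visits with one that measured visits and bytes would give the joint list time, visits, bytes; the overlapping metric appears once.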
82. const char* descr, …
Returns a new region.
cube_cnode* cube_def_cnode_cs(cube_t* c, cube_region* callee, const char* mod, … cube_cnode* parent)
Returns a new call tree node structure with line numbers.
cube_cnode* cube_def_cnode(cube_t* c, cube_region* callee, cube_cnode* parent)
Returns a new call tree node structure without line numbers.
cube_machine* cube_def_mach(cube_t* c, const char* name, const char* desc)
Returns a new machine.
cube_node* cube_def_node(cube_t* c, const char* name, cube_machine* mach)
Returns a new node.
cube_process* cube_def_proc(cube_t* c, const char* name, int rank, cube_node* node)
Returns a new process.
cube_thread* cube_def_thrd(cube_t* c, const char* name, int rank, cube_process* proc)
Returns a new thread.
cube_cartesian* cube_def_cart(cube_t* c, long ndims, long int* dimv, …
Defines a new Cartesian topology.
83. th the effects of random errors introduced by unrelated system activity during an experiment or to summarize across a range of execution parameters. You can conduct several experiments and create a single average experiment from the whole series. The mean operator takes an arbitrary number of arguments. The possible output is presented below.
    user@host: cube3_mean scout1.cube scout2.cube scout3.cube scout4.cube -o mean.cube
    Mean operation begins
    Reading scout1.cube done
    Merging metric dimension
    Merging program dimension
    Merging system dimension
    Mapping severities done
    Adding topologies done
    Mean operation done
    Reading scout2.cube done
    Merging metric dimension
    Merging program dimension
    Merging system dimension
    Mapping severities done
    Adding topologies done
    Mean operation done
    Reading scout3.cube done
    Merging metric dimension
    Merging program dimension
    Merging system dimension
    Mapping severities done
    Adding topologies done
    Mean operation done
    Reading scout4.cube done
    Merging metric dimension
    Merging program dimension
    Merging system dimension
    Mapping severities done
    Adding topologies done
    Mean operation done
    Mean operation ends
    Writing mean.cube
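The averaging idea behind the mean operator can be sketched as an element-wise mean over several runs. The sketch below assumes that every run has already been reduced to a severity vector of the same length and ordering; how cube3_mean actually aligns the metric, program, and system dimensions is not covered here, and mean_severities is a hypothetical name.

```cpp
// Illustrative sketch only. mean_severities is hypothetical; it assumes
// all runs provide severity values in the same order and number.
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise arithmetic mean over several equally sized runs.
std::vector<double> mean_severities(
        const std::vector<std::vector<double>>& runs) {
    assert(!runs.empty());
    std::vector<double> mean(runs.front().size(), 0.0);
    for (const std::vector<double>& run : runs) {
        assert(run.size() == mean.size());
        for (std::size_t i = 0; i < run.size(); ++i) {
            mean[i] += run[i];
        }
    }
    for (double& value : mean) {
        value /= static_cast<double>(runs.size());
    }
    return mean;
}
```

With two runs holding the values (1, 3) and (3, 5), the averaged experiment would hold (2, 4), which smooths out a deviation that affects only one of the runs.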
84. the position of the node's value between 0.0 and a maximal value. This maximal value is the maximal value in the tree for the absolute value mode, or 100.0 otherwise. See the menu item Display > General coloring in Section 1.3.2.1 and the context menu item Min/max values in the context menu description below for color settings. A label in the metric tree shows the metric's name. A label in the call tree shows the last callee of a particular call path. If you want to know the complete call path, you must read all labels from the root down to the particular node you are interested in. After switching to the flat profile view (see below), labels in the flat call profile denote methods or program regions. A label in the system tree shows the name of the system resource it represents, such as a node name or a machine name. Processes and threads are usually identified by a rank number, but it is possible to give them specific names when creating a CUBE file. The thread level of single-threaded applications is hidden. Multiple root nodes are supported. After opening a data set, the middle panel shows the call tree of the program. However, a user might wish to know which fraction of a metric can be attributed to a particular region (e.g., method) regardless of where it was called from. In this case, you can switch from the call tree view (default) to the flat profile view (Figure 1.7). In the flat profile vi
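The relation between the call tree view and the flat profile view can be illustrated with a short C++ sketch: the exclusive value of every call tree node is added to the total of its region, regardless of the call path. The types and function names are hypothetical and chosen for illustration; they do not show how CUBE computes its flat profile internally.

```cpp
// Illustrative sketch only. CallTreeEntry, accumulate, and flat_profile
// are hypothetical names, not part of CUBE.
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct CallTreeEntry {
    std::string callee;                   // region entered at this node
    double exclusive_value;               // value without the children
    std::vector<CallTreeEntry> children;  // callees reached from here
};

// Add the exclusive value of each node to the total of its region.
void accumulate(const CallTreeEntry& node,
                std::map<std::string, double>& flat) {
    flat[node.callee] += node.exclusive_value;
    for (const CallTreeEntry& child : node.children) {
        accumulate(child, flat);
    }
}

// Collapse a call tree into per-region totals.
std::map<std::string, double> flat_profile(const CallTreeEntry& root) {
    std::map<std::string, double> flat;
    accumulate(root, flat);
    assert(!flat.empty());  // the root region is always present
    return flat;
}
```

If bar is called once from main and once from foo, the flat profile contains a single bar entry whose value is the sum over both call paths.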
85. tion about the instances of the selected metric in the form of a box plot. For an in-depth explanation of this feature, see subsection 1
23. Max severity in trace browser: Only available for metric and call trees, and only if a statistics file providing information about the most severe instance(s) of the selected metric is present. If CUBE is already connected to a trace browser (via File > Connect to trace browser), the timeline display of the trace browser is zoomed to the position of the occurrence of the most severe pattern so that the cause of the pattern can be examined further. For a more detailed explanation of this feature, see subsection 1.3.3.2.
24. Sort by value (descending): For flat call profiles only. Sorts the nodes by their current values in descending order. Note that if an item is expanded, its exclusive value is taken for sorting; otherwise, its inclusive value is taken.
25. Sort by name (ascending): For flat call profiles only. Sorts the nodes alphabetically by name in ascending order.
1.3.2.6 Boxplot Statistics Display
The boxplot statistics display shows a box-and-whisker distribution of metric severity values for the currently active subset of system resources (typically threads). The active subset is changed via the combobox menu at the bottom of the pane, and the y-axis scale is adjusted via the display mode combobox at the top of the pane. The vertical whisker ranges from the smallest value (minimum) to the largest value
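As a rough illustration of the numbers behind a box-and-whisker plot, the C++ sketch below computes a five-number summary (minimum, lower quartile, median, upper quartile, maximum) for a set of severity values. The quartile convention used here (linear interpolation between sorted values) is an assumption; this text does not state which convention CUBE applies. BoxStats, quantile, and box_stats are hypothetical names.

```cpp
// Illustrative sketch only. The interpolation rule for quartiles is an
// assumption and may differ from what CUBE displays.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct BoxStats {
    double min, q1, median, q3, max;
};

// Quantile of an already sorted, non-empty vector using linear
// interpolation between the two neighbouring values.
double quantile(const std::vector<double>& sorted, double q) {
    const double pos = q * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Five-number summary; the vector is taken by value so it can be sorted.
BoxStats box_stats(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    return BoxStats{values.front(), quantile(values, 0.25),
                    quantile(values, 0.5), quantile(values, 0.75),
                    values.back()};
}
```

The whisker described above would span from the min field to the max field of the result; the remaining fields would describe the box under the stated quartile assumption.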
86. turns the value for the point (met, cnode, thrd).
2.1.1.6 Miscellaneous
Often users may want to define some information related to the CUBE file itself, such as the creation date, experiment platform, and so on. For this purpose, CUBE allows the definition of arbitrary attributes in every CUBE data set. An attribute is simply a key-value pair and can be defined using the following method:
void def_attr(const std::string& key, …
Assigns the value value to the attribute key.
CUBE allows using multiple mirrors for the online documentation associated with metrics and regions. The url expression supplied as an argument for def_metric and def_region can contain a prefix mirror. When the online documentation is accessed, CUBE can substitute all mirrors defined for the prefix until a valid one has been found. If no valid online mirror can be found, CUBE will substitute the doc directory of the installation path for mirror.
void def_mirror(const std::string mirror)
Defines the mirror mirror as potential substitution for the URL prefix mirror.
std::string get_attr(const std::string key) const
Returns the attribute in the CUBE object stored for the given key.
const std::map<std::string, std::string> get_attrs
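The mirror lookup described above can be sketched as follows. Everything in this example is an assumption made for illustration: the literal prefix spelling "@mirror@", the function resolve_url, and the validity callback are hypothetical and are not taken from the CUBE sources. The sketch only shows the order of the fallbacks: try each defined mirror in turn, then fall back to a local documentation directory.

```cpp
// Illustrative sketch only. The prefix spelling, resolve_url, and the
// is_valid callback are hypothetical and not part of the CUBE API.
#include <cassert>
#include <functional>
#include <string>
#include <vector>

const std::string kMirrorPrefix = "@mirror@";  // assumed spelling

// Replace the mirror prefix by the first mirror for which is_valid
// returns true; otherwise fall back to the local doc directory.
std::string resolve_url(
        const std::string& url,
        const std::vector<std::string>& mirrors,
        const std::string& local_doc_dir,
        const std::function<bool(const std::string&)>& is_valid) {
    assert(static_cast<bool>(is_valid));
    if (url.compare(0, kMirrorPrefix.size(), kMirrorPrefix) != 0) {
        return url;  // no prefix, nothing to substitute
    }
    const std::string rest = url.substr(kMirrorPrefix.size());
    for (const std::string& mirror : mirrors) {
        const std::string candidate = mirror + rest;
        if (is_valid(candidate)) {
            return candidate;
        }
    }
    return local_doc_dir + rest;
}
```

The callback keeps the sketch independent of any network code: a caller could plug in a real reachability check, while a test can simply accept or reject URLs by pattern.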
87. ty values of (met, cnode) for all threads to the given file. This can be used instead of cube_write_sev_matrix to incrementally write parts of the severity matrix.
void cube_write_finish(cube_t* c, FILE* fp)
Writes the end tags to a file. Must be called at the very end, before closing the file, but only when the severity matrix is written incrementally with the function described above. When using cube_write_sev_matrix to write the severity matrix in one chunk, calling this function is not needed.
2.1.2 Typical Usage
A simple C++ program is given to demonstrate how to use the CUBE write interface. Figure 2.1 shows the corresponding CUBE display. The source code of the target application is provided below.
    1    void foo() …
    10
    11   void bar() …
    20
    21   int main(int argc, …
    60     foo();
    80     bar();
    100
    // A C++ example using CUBE write interface
    #include <cube3/Cube.h>
    #include <string>
    #include <fstream>
    using namespace std;
    using namespace cube;
    int main(int argc, char
88. utine cubefile OR cube3_stat [-h] [-p] [-m metric[,metric...]] -t topN cubefile
    -h  Display this help message
    -p  Pretty-print statistics (instead of CSV output)
        Provide statistics about process/thread metric values
    -m  List of metrics (default: time)
    -r  List of routines (default: main)
    -t  Number for topN regions (flat profile)
1.4.11 Conversion from TAU profile format to CUBE3
Converts a profile generated by the TAU Performance System [11] into the CUBE format. Currently, only 1-level, 2-level, and full call-path profiles are supported.
Usage: tau2cube [tau-profile-dir] [-o cube]
1.4.12 Topology Assistant
Topology assistant is a tool to handle topologies in CUBE files. It is able to add or edit a topology.
Usage: cube3_topoassist [OPTION] cubefile
The currently available options are:
To create a new topology in an existing CUBE file,
To [re]name an existing virtual topology, and
To [re]name the dimensions of a virtual topology.
The command-line switches for this utility are:
    -c  creates a new topology in a given CUBE file.
    -n  displays a numbered list of the existing topologies in the given CUBE file and lets the user choose one to be named or renamed.
    -d  displays the existing topologies and lets the user name the dimensions of one of them.
The resulting CUBE file is named topo.cube.gz in the current
89. v() const
Returns a vector with all call tree nodes in the CUBE object.
Cnode* get_cnode(Cnode& cn) const
Searches for a call tree node cn. Returns NULL if the CUBE object does not contain the given call tree node.
2.1.1.3 System Dimension
This group refers to the system dimension of the performance space. It reflects the system resources that the program is using at runtime. The entities present in this dimension are machine, node, process, and thread, which populate four levels of the system hierarchy in the given order. That is, the first level consists of machines, the second level of nodes, and so on. Finally, the last (i.e., leaf) level is populated only by threads. The system tree is built in a top-down way, starting with a machine. Note that even if every process has only one thread, users still need to define the thread level.
Machine* def_mach(const std::string name, const std…
Returns a new machine with the name name and description desc.
Node* def_node(const std::string& name, Machine* mach)
Returns a new SMP node which has the name name and which belongs to the machine mach.
Process* def_proc(const std::string& name, int rank, Node* node)
Returns a new process which has the name name and the rank rank. The rank is a number from 0 to n-1, where
90. you can click somewhere in the column of the desired metric, which will yield a small window, as shown in the top right corner of Figure 1.10, displaying the exact values of the statistics.
1.3.3.2 Display of most severe pattern instances using a trace browser
If a statistic file also contains information about the most severe instances of certain patterns, CUBE can be connected to a trace browser (currently Vampir [8, 9] and Paraver [6, 7] are supported) in order to view the state of the program being analyzed at the time this most severe pattern instance occurred. For collective operations, the most severe instance is the one with the largest sum of the waiting times of all processes, which is not necessarily the one with the largest maximal waiting time of each individual process. To use this feature, you first have to connect to a trace browser by using the Connect to trace browser menu item of the File menu, which offers to connect to Vampir as well as to Paraver. This will open one of the two dialog windows shown in Figure 1.11.
[Figure 1.11: The dialog windows for a connection to Vampir (fields: Host, Port, File) and to Paraver (fields: Configuration file, Trace file)]
For Vamp
91. ystem pane shows a topology display, and it offers functions to manipulate the display of the above grid planes. The toolbar can be labeled by icons, by text, or it can be hidden; see menu Topology > Toolbar in Section 1.3.2.1. The toolbar buttons have tool tips, i.e., a short description pops up if the toolbar is enabled and you move the mouse above a button. The functions are the following, listed from left to right in the topology toolbar:
Move left: Moves the whole topology to the left.
Move right: Moves the whole topology to the right.
Move up: Moves the whole topology upwards.
Move down: Moves the whole topology downwards.
Increase plane distance: Increases the distance between the planes of the topology.
Decrease plane distance: Decreases the distance between the planes of the topology.
Zoom in: Enlarges the topology.
Zoom out: Scales down the topology.
Reset: Resets the display. It scales the topology such that it fits into the visible rectangle and transforms it into a default position.
Scale into window: Scales the topology such that it fits into the visible rectangle, without transformations.
Set minimum/maximum values for coloring: Similarly to the functions offered in the context menu of trees (see Section 1.3.2.5), you can activate and deactivate the application of user-defined minimal and maximal values for the color extremes, i.e., the values
92. ystem resource subsets: By default, all system resources (typically threads) are included when determining box plot statistics. Other defined subsets can be chosen from the combobox below the box plot, such as "Visited" threads, which are only those threads that visited the currently selected call path. The current subset is retained until another is explicitly chosen or a new subset is defined. Additional subsets are defined from the system tree with the Define subset context menu, using the currently selected threads (via multiple selection with Ctrl+<left mouse click> or with the Find Items context menu selection option).
1.3.2.5 Tree browsers
A tree browser displays different hierarchical data structures in the form of trees. Currently supported tree types are metric trees, call trees, flat call profiles, and system trees. The structure of the displayed data is common to all trees. The indentation of the tree nodes reflects the hierarchical structure. Expandable nodes, i.e., nodes with non-hidden children, are equipped with a sign that distinguishes collapsed from expanded nodes. Furthermore, all nodes have a color icon, a value, and a label. The value of a node is computed as explained earlier, based on the current selections in the trees on the left-hand side and on the current value mode. The precision of the value display in trees can be modified; see the menu item Display > Precision in Section 1.3.2.1. The color icon reflects
