Debugging and Profiling Workshop June 17, 2014
Contents
1. Debuggers (text): Valgrind
Valgrind output for the buggy code (process 24512):
  Invalid read of size 4
    at 0x4012E2: Sparse_CG (cg_buggy.c:53)
    by 0x401D33: main (cg_buggy.c:182)
  Address 0x5528e7c is 4 bytes before a block of size 53,744 alloc'd
    at 0x4C267BA: malloc (vg_replace_malloc.c:263)
    by 0x401162: Sparse_CG (cg_buggy.c:31)
    by 0x401D33: main (cg_buggy.c:182)
  Invalid read of size 4
    at 0x4015A0: Sparse_CG (cg_buggy.c:83)
    by 0x401D33: main (cg_buggy.c:182)
  Address 0x555050c is 4 bytes before a block of size 53,744 alloc'd
    at 0x4C267BA: malloc (vg_replace_malloc.c:263)
    by 0x4011AA: Sparse_CG (cg_buggy.c:34)
    by 0x401D33: main (cg_buggy.c:182)
Code (buggy):
  oldx = (float*) malloc(n*sizeof(float));
  r    = (float*) malloc(n*sizeof(float));
  oldr = (float*) malloc(n*sizeof(float));
  for (k=K1; k<K2+1; k++)
    sum += AA[k]*oldx[JA[k]-1];    1-based / 0-based confusion
The code was assuming 1-based indices, but the input is 0-based.
Code (fixed):
  oldx = (float*) malloc(n*sizeof(float));
  r    = (float*) malloc(n*sizeof(float));
  oldr = (float*) malloc(n*sizeof(float));
  p    = (float*) malloc(n*sizeof(float));
  oldp = (float*) malloc(n*sizeof(float));
  for (k=K1; k<K2+1; k++)
    sum += AA[k]*oldx[JA[k]];      It was oldx[JA[k]-1], now fixed
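The indexing fix above can be sketched as a standalone routine. This is a minimal illustration, not the workshop's Sparse_CG: the function name csr_matvec and the assumption of a square matrix with 0-based IA and JA are ours. Note that the slide's loop bounds (K2 = IA[i+1]-1, k < K2+1) are equivalent to k < IA[i+1].

```c
#include <assert.h>

/* y = A*x for an n-by-n sparse matrix stored in CSR form.
 * Assumption (hypothetical helper, not from the slides): IA and JA are
 * 0-based, so row i owns entries IA[i] .. IA[i+1]-1 and JA[k] is used
 * directly as the column index, without subtracting 1. */
void csr_matvec(int n, const int *IA, const int *JA,
                const float *AA, const float *x, float *y)
{
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int k = IA[i]; k < IA[i + 1]; k++) {
            /* Catches a 1-based input early instead of reading x[-1]. */
            assert(JA[k] >= 0 && JA[k] < n);
            sum += AA[k] * x[JA[k]];
        }
        y[i] = sum;
    }
}
```

With a 1-based input file, the assertion fails on the first out-of-range column instead of silently reading 4 bytes before the block, which is the error Valgrind reported.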
2. Profilers (GUI): TAU
The labels are scaled to the exclusive time spent, but they can be enlarged using the mouse to read the function names.
Profilers (GUI): TAU, Statistics Table
[Screenshot: TAU ParaProf Statistics for node 0. Columns: Name, Exclusive TIME, Inclusive TIME, Calls, Child Calls. Rows include CG, INITIALIZE_MPI, MPI_Init(), MPI_Bcast(), MPI_Comm_rank(), MPI_Comm_size(), MPI_Finalize(), MPI_Barrier(), MPI_Reduce(), MAKEA, PRINT_RESULTS, SETUP_SUBMATRIX_INFO, TIMER_START, TIMER_STOP and several "Loop: CG" entries.]
3. Debuggers (GUI): DDT
[Screenshot: DDT source view of cg_buggy.c with a tooltip on variable i (Name: i, Type: int, Value: 3363).]
• Move the mouse over variables to see their value.
• Double click on line 49 to create a breakpoint, or right click and select from the menu.
Debuggers (GUI): DDT
[Screenshot: Allinea DDT v3.2 showing the Sparse_CG loop, with the Current Line(s) panel listing i = 3363.]
4. Debuggers (text): Valgrind, leak categories
... all the children will be "indirectly lost". If you fix the "definitely lost" leaks, the "indirectly lost" leaks should go away.
• "possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes. Use --show-possibly-lost=no if you don't want to see these reports.
• "still reachable" means your program is probably OK: it didn't free some memory it could have. This is quite common and often reasonable. Don't use --show-reachable=yes if you don't want to see these reports.
• "suppressed" means that a leak error has been suppressed. There are some suppressions in the default suppression files. You can ignore suppressed errors.
Debuggers (GUI): DDT
Allinea DDT: www.allinea.com/products/ddt
• A commercial debugger with a GUI. PACE has a single-user license with up to 32 procs.
• Heavily builds on GDB: does everything GDB does, and more
• Supports memory debugging and data structure visualization
• Supports Mvapich2, OpenMPI and also custom MPI stacks
• Supports GNU, Intel & PGI compilers, and more
• Distributed debugging with focus on scalability
Debuggers (GUI): DDT
• We will use the same buggy CG code
• Starting the DDT debugger (always on a compute node, use msub -I):
    msub -I -X -q iw-shared-6 -l nodes=1:ppn=8,pmem=2gb
    module load gcc mvapich2    (whichever compiler/MPI)
5.  module load ddt
    ddt
DDT Configuration Wizard, MPI/UPC Implementation: select "Auto-Detect (none)".
[Screenshot: Welcome to the DDT Configuration Wizard. Choose "Create a new configuration file" or "Import an existing configuration file". If you do not know which MPI/UPC implementation you are using, select the Auto-Detect option, which should work for most implementations.]
Debuggers (GUI): DDT
DDT Configuration Wizard, Job Scheduling: "Do you want to configure DDT to submit jobs using a job scheduler or queue (LoadLeveler, Portable Batch System, Sun Grid Engine, etc.)?" See section 2.3, Integrating DDT With Queuing Systems, of the user guide for details.
Your decision really, but I usually skip this step and run things interactively.
DDT Configuration Wizard, Site-Wide Configuration: "Do you want to see instructions for creating a site-wide configuration for all users?"
This is for admins; you can also skip this step.
Debuggers (GUI): DDT
[Screenshot: DDT Welcome dialog.]
6. [gcov annotated source listing: execution counts next to lines 126-132, where a loop over i copies r and p into oldr and oldp.]
Profilers (API): PAPI
PAPI: Performance Application Programming Interface, http://icl.cs.utk.edu/papi
• A profiling API for C/C++, Fortran, Java, and a collection of tools
• Supports a large variety of architectures (Intel, AMD, Power)
• Used by many profiling packages (TAU, OpenSpeedshop, etc.)
• No longer requires a modified kernel for hardware counter support (starting with 2.6.39)
[Diagram: PAPI layers. Portable Layer: PAPI Low Level. Machine Specific Layer: PAPI Machine Dependent Substrate, Kernel Extension, Operating System, Hardware Performance Counters.]
Profilers (API): PAPI
Preset Events: can be a single hardware event, or derived using multiple events. E.g.:
• Single: PAPI_TOT_CYC: total number of cycles (single event)
• Derived: PAPI_L1_TCM: total L1 misses = L1 data misses + L1 instr. misses
Support for Preset Events depends on the architecture. The number and types of Preset Events that can be counted concurrently are also architecture dependent.
Usage on PACE Clusters (for both API and tools):
    module load papi
Profilers (API): PAPI
Getting the list of supported events: papi_avail
    papi_avail
    Available events and hardware information.
    Vendor string and code : AuthenticAMD (2)
    Model string and code  : AMD Opteron(tm) Processor ...
7. [Screenshot residue: TAU ParaProf statistics text (MAKEA, VECSET, MPI_Send(), INITIALIZE_MPI, Loop: CG entries) and the Windows menu (3D Communication Matrix, Communication Matrix, Bar Chart).]
Profilers (GUI): TAU
• Packing all profiling data into a single package:
    cd bin
    paraprof --pack tau_results.ppk
  then, on any system with TAU installed:
    paraprof tau_results.ppk
• Dynamic instrumentation for codes that are not compiled with TAU:
    mpirun -np 8 tau_exec ./cg.W.8
  TAU will do its best to profile the code.
• Text-based paraprof: pprof
    pprof profile.0.0.0
    pprof profile.1.0.0
  Separate runs for each thread/process.
Thank You!
• Your feedback will be appreciated: mehmet.belgin@oit.gatech.edu
• Give it to me straight, criticism is welcome
• We might send you a survey later, and any comment will help
8. ... to compare two metrics on the same plot (Height value and Color value). We have only TIME here, so a 3D viz is not that meaningful.
[Screenshot: 3D Visualizer controls: Scales, Plot, Axes, Color, Render; Orientation; Show Axes; Font Size; Label Length.]
Profilers (GUI): TAU
Show Source Code: right click on the green bar (function SPRNVC) and select "Show Source Code". Other entries in the same menu: Show Function Bar Chart, Show Function Histogram, Assign Function Color, Reset to Default Color, Rename.
[Dialog: "Looking for cg.f: ParaProf could not find cg.f, would you like to add the containing directory to the search list?"]
You might need to tell TAU where the source codes are, if they are not in the same directory as the executables.
Profilers (GUI): TAU
[Screenshot: TAU ParaProf Source Browser showing cg.f around lines 1728-1739.]
You will not see the "Show Source Code" option for functions that do not come fr...
9. Profilers (GUI): TAU
[Screenshot: TAU ParaProf 3D Communication Matrix (Windows menu: ParaProf Manager, 3D Visualization, 3D Communication Matrix, Communication Matrix). Axes: Sender, Receiver. Height value: number of calls. Color value: mean message size (bytes).]
We can easily identify two kinds of messages; the second kind is "less frequent but large" (the label for the first kind is not legible in the source).
Profilers (GUI): TAU
[Screenshot: TAU ParaProf Call Graph for n,c,t 6,0,0, opened from the Windows menu (Bar Chart, Statistics Text, Statistics Table, Call Graph, Call Path Relations, Function Legend, Group Legend, User Event Legend, Group Changer, Close All Sub-Windows).]
10. Profilers (text): Gprof
• Requires compilation with -g -pg (both in the Makefile):
    DEBUGOPTS = -g -pg -O2 -fprofile-arcs -ftest-coverage
    make clean
    make all
• Nothing extra on the command line. Just run the code (cg this time):
    ./cg bayer10.mtx.csr
    NOT CONVERGED at iteration 1001
    Elapsed time: 0.551763 sec
• A file named gmon.out appears in the working directory
• To see the profiling information, run:
    gprof cg > gprof.out
Profilers (text): Gprof
Flat profile (each sample counts as 0.01 seconds):
    %time   cumulative s   self s   calls   self ms/call   total ms/call   name
    100.10  0.55           0.55     1       550.54         550.54          Sparse_CG
      0.00  0.55           0.00     2         0.00           0.00          rtc
Call graph (explanation follows); granularity: each sample hit covers 2 byte(s) for 1.82% of 0.55 seconds:
    index  %time   self   children   called   name
                   0.55   0.00       1/1          main [2]
    [1]    100.0   0.55   0.00       1        Sparse_CG [1]     (current function)
                                                  <spontaneous>
    [2]    100.0   0.00   0.55                main [2]          (current function)
                   0.55   0.00       1/1          Sparse_CG [1]
                   0.00   0.00       2/2          rtc [3]
                   0.00   0.00       2/2          main [2]
    [3]      0.0   0.00   0.00       2        rtc [3]           (current function)
Profilers (text): Gprof
• The [1], [2], [3] are tables for each function, sorted by the exclusive time spent
• Gprof output is verbose, use -b to omit definitions
• Total % might be >100.0 due to accumulated sampling errors
• "self" means this function alone
• "cumulative" means this function plus all listed above it
• parents/children means time pr...
11. [Screenshot: DDT Welcome dialog, "What would you like to do?": Run and Debug a Program, Manually Launch a Program, Attach to a Running Program, Open Core Files (open and debug a coredump), Restore a Checkpoint.]
• "Run & Debug" is for running and debugging the code interactively
• "Manually Launch a Program" is for runs started with DDT's command line tools
• "Attach" to any running processes which you own
  • Displays running processes and allows you to pick any subset
  • Allows you to selectively attach (e.g. only 32 procs of 128 total)
• DDT can also analyze coredumps
Debuggers (GUI): DDT
[Screenshot: DDT Run dialog. Application: .../codes/cg_buggy. Arguments: bayer10.mtx.csr. Input File: (empty). Working Directory: .../codes. Sections: MPI, OpenMP, CUDA, Memory Debugging, Environment Variables, Plugins.]
• The input matrix is an argument, NOT an input file, since it is not redirected into the code
• Run the code with no MPI
• Environment Variables: CG_MAXITER=100
Debuggers (GUI): DDT
[Screenshot: DDT main window, Project Files panel.]
12. ... show only the variables on that line. Click and drag between lines 13 and 1138 in the source code to show all the variables in that region.
[Screenshot: Allinea DDT v3.1 on startmpi_c.c; the Locals panel lists my_rank, source, tag, test, x, y and others, with y = 4251280.]
FIX:
1. y = 4251280, which cannot be the number of arguments
2. Fix on line 117: for (y=0; y<argc; y++)
DDT Parallel case: startmpi_c.c
Now try with 5 procs:
    mpirun -np 5 startmpi_c a b c
CRASH! Open DDT again.
1. Try clicking on the boxes representing processes 0 to 4: how do the values in the stack change?
2. Can you spot the problem? (hint: check the screenshot)
[Screenshot: Allinea DDT v3.1 with five processes.]
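Returning to the fix on line 117: the point is that the loop counter is initialized and bounded by argc. A minimal sketch under our own assumptions (total_arg_chars is a hypothetical helper, not code from startmpi_c.c, whose original buggy loop is not shown on the slides):

```c
#include <assert.h>
#include <string.h>

/* Sums the lengths of all command-line arguments.
 * y starts at 0 and stops at argc, mirroring the corrected loop
 * "for (y=0; y<argc; y++)"; an uninitialized or unbounded y
 * (such as the 4251280 seen in DDT) would index far past argv. */
size_t total_arg_chars(int argc, char **argv)
{
    assert(argc >= 0 && argv != NULL);
    size_t total = 0;
    for (int y = 0; y < argc; y++)
        total += strlen(argv[y]);
    return total;
}
```

Calling it as total_arg_chars(argc, argv) from main touches exactly the argc entries the runtime guarantees to exist.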
13. ... slower in parallel
• run fast up to N processors, but stop scaling for >N
Profilers can tell us:
• time consumed by functions, loops and even lines, for each thread/process
• the location of a code's bottleneck (Pareto Principle, 80/20 rule)
• event counts (instruction/data cache misses, memory access stalls, etc.)
• call graphs (which functions call which functions)
• communication matrices
Our Arsenal (including, but not limited to)
• Debuggers: text-based: GDB, valgrind; GUI: DDT
• Profilers: text-based: Gprof, Gcov, PAPI; GUI: TAU
Registration (single step)
Run (case sensitive):
    pace_register_classes
and pick this class from the list. This command:
• Includes your username, name and email in the registration list
• Moves the course material, including codes, files and this presentation, to ~/data/PACE_Debugging_Profiling_Class
• Registering multiple times is OK, but overwrites this directory and everything in it
• Alternatively: http://pace.gatech.edu/workshop/DebuggingProfiling.pdf
Course Materials
Files of interest in ~/data/PACE_Debugging_Profiling_Class/codes:
• cg.c: Sequential Conjugate Gradient (CG) Solver
• cg_buggy.c: Buggy sequential Conjugate Gradient (CG) Solver
• MPI_DDT: MPI codes for parallel debugging with DDT
  • startmpi_c.c (startmpi_f.f90): Buggy MPI code
  • cpi.c: Another buggy MPI code
• NPB3.3-MPI: MPI para...
14. Debugging and Profiling Workshop, June 17, 2014
Mehmet (Memo) Belgin, PhD, Scientific Computing Consultant
Georgia Tech, OIT-ART/PACE
www.pace.gatech.edu
mehmet.belgin@oit.gatech.edu
Debugging and Profiling Workshop
• A look at available debuggers and profilers on PACE clusters (text/GUI)
  • Debuggers: GDB, Valgrind, DDT
  • Profilers: Gprof, Gcov, PAPI, TAU
• Hands-on examples
• Run pace_register_classes and pick this class in the list to register and copy the class materials in ~/data/PACE_Debugging_Profiling_Class
• This includes everything you need to follow/replay the tutorial
• Slides are designed to be self-contained (yes, they are crowded)
Path (from boring to interesting)
• Debuggers: w/ text, w/ GUI
• Profilers: w/ text, w/ GUI
Overview: Debugging
Codes can (and will):
• crash with errors (e.g. segmentation faults)
• hang with no output (w/ or w/o using CPU)
• work on one system and fail on another
• run to completion, but produce inaccurate results
Debuggers can tell us:
• the source code or libraries that are causing problems
• where inside the code problems arise
• values for variables at any given instance
• where a variable is assigned an incorrect/unexpected value
• which arrays are leaking memory (allocation/deallocation errors)
• which functions are called, and in what order
Overview: Profiling
Codes can (and will):
• run very, very slow
• run even ...
15. GDB test case: Buggy CG
Step 1: Pinpoint the problem (run, backtrace, list)
    (gdb) show environment CG_MAXITER
    Environment variable "CG_MAXITER" not defined.         (we found the first problem)
    (gdb) set environment CG_MAXITER 100                   (environment variables can be manipulated inside GDB)
    (gdb) run                                              (no need for input arguments if you are running again)
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /nv/pf2/mbelgin3/PaceWorkshop/codes/cg_buggy bayer10.mtx.csr
    Program received signal SIGSEGV, Segmentation fault.   (we found a second problem)
    0x00000000004013e5 in Sparse_CG (AA=0x7ffff7f62010, b=0x60d4d0, x=0x61a640, IA=0x60a040, JA=0x7ffff7f05010, n=13436, nnz=94926, delta=9.9999999999999995e-08) at cg_buggy.c:53
    53        sum += AA[k]*oldx[JA[k]-1];
    (gdb) bt                                               (backtrace)
    #0  Sparse_CG (AA=0x7ffff7f62010, b=0x60d4d0, x=0x61a640, IA=0x60a040, JA=0x7ffff7f05010, n=13436, nnz=94926, delta=9.9999999999999995e-08) at cg_buggy.c:53
    #1  0x0000000000401e17 in main (argc=2, argv=0x7fffffffe128) at cg_buggy.c:182
    (gdb) list 53
    48    for (i=0; i<n; i++) {
    49      K1 = IA[i];
    50      K2 = IA[i+1]-1;
    51
    52      for (k=K1; k<K2+1; k++)
    53        sum += AA[k]*oldx[JA[k]-1];
    54    }
    55    oldr[i] = sum;
    56    sum = 0.0;
GDB test case: Buggy CG
Step 2: Dig deeper (place conditional breakpoints and print variables in the stack)
Breakpoint Cheatsheet
• info breakpoints: list existing
• clear line ...
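The first problem in Step 1 was a missing environment variable. The slides do not show how cg_buggy.c reads CG_MAXITER, so the following is only a sketch of one defensive way to read such a setting; env_int_or is a hypothetical helper name, and the fallback value is chosen by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns the positive integer stored in environment variable `name`,
 * or `fallback` when the variable is unset, empty, non-numeric,
 * non-positive or out of int range. */
int env_int_or(const char *name, int fallback)
{
    assert(name != NULL);
    const char *s = getenv(name);      /* NULL when the variable is not defined */
    if (s == NULL || *s == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}
```

A call such as env_int_or("CG_MAXITER", 100) then keeps running when the variable is undefined, instead of depending on the shell setup.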
16. [Residue of a PAPI code listing that prints miss counts from a values[] array and ends with #endif.]
TAU: Tuning and Analysis Utilities, http://www.cs.uoregon.edu/research/tau/home.php
Profilers (GUI): TAU
• A profiling GUI for C/C++, Fortran, Java, Python (paraprof)
• For sequential and parallel (distributed and multithreaded) codes
• Supports both dynamic instrumentation and recompilation of code (via compiler wrappers)
• Collects and visualizes profiling data (including data by other packages)
• Function- and loop-level granularity (nothing at line level so far)
• Supports 2D and 3D visualizations
• Supports instrumentation using PDT (Program Database Toolkit)
• Utilizes PAPI for HW counters
• Provides a text-based interface (pprof) as well
Profilers (GUI): TAU
• Usage on PACE Clusters:
    msub -I -X -q iw-shared-6 -l nodes=1:ppn=8,pmem=2gb    (-X for X11 forwarding)
    module load gcc mvapich2                                (whichever compiler/MPI)
    module load tau/2.22.1
    module list
    Currently Loaded Modulefiles:
      1) gcc/4.4.5(default)   3) mvapich2/1.6(default)   5) pdt/3.18
      2) hwloc/1.2(default)   4) papi/5.0.1              6) tau/2.22.1
• Code re-compilation requires a specific Makefile provided by TAU. The TAU module on PACE automatically defines it in your environment:
    echo $TAU_MAKEFILE
    /usr/local/packages/tau/2.22.1/mvapich2-1.6/gcc-4.4.5/x86_64/lib/Makefile.tau-papi-mpi-pdt-openmp
• We will use the NAS Parallel Benchmark Suite for the TAU demonstration: http://www.nas.nasa.gov/publications/npb.html
• NAS Suite comes with an MPI CG solver, which we ...
17. [End of the tau_runtime_env.sh listing:]
    TAU_METRICS value: PAPI_L1_DCM:PAPI_L1_DCA:PAPI_FP_OPS:TIME
    Loop-level granularity (TAU selective instrumentation file, ending with END_INSTRUMENT_SECTION)
    Create a callpath with a max depth of 100:
      export TAU_CALLPATH_DEPTH=100
    TAU options file:
      export TAU_OPTIONS="-optTauSelectFile=~/data/PaceWorkshop/codes/NPB3.3-MPI/bin/select.tau -optVerbose"
Profilers (GUI): TAU
DON'T run this script, source it! (Source exports all env. variables to the shell.)
    msub -I -X -q iw-shared-6 -l nodes=1:ppn=8,pmem=2gb    (if not in a compute node)
    module purge                                            (in case you have loaded modules)
    cd ~/data/PaceWorkshop/codes/NPB3.3-MPI
    source tau_runtime_env.sh
    echo $TAU_METRICS                                       (check if sourcing worked fine)
    PAPI_L1_DCM:PAPI_L1_DCA:PAPI_FP_OPS:TIME                (Good!)
Recompile and run the code (required due to the new TAU configurations):
    make clean
    make cg NPROCS=8 CLASS=W
    cd bin
    mpirun -np 8 ./cg.W.8
You will notice new directories named MULTI__PAPI_X_Y:
    ls
    MULTI__PAPI_L1_DCA  MULTI__PAPI_FP_OPS  MULTI__PAPI_L1_DCM  MULTI__TIME
Run paraprof in the bin directory:
    paraprof
Profilers (GUI): TAU
See the Height and Color Metrics. Can you tell which loops are FP_OPS heavy?
[Screenshot: TAU ParaProf 3D Visualizer (Triangle Mesh, Bar Plot, Scatter Plot, Topology Plot). Height Metric: Exclusive; Color Metric: Exclusive PAPI_FP_OPS; axes: Function, Thread.]
18. Debuggers (GUI): DDT
[Screenshot: DDT Breakpoints tab with the condition field filled in.]
• Enter the condition (there is a typo in the screenshot, it should be IA[i+1]-1 > nnz)
• Select the Breakpoints tab and enter the breakpoint condition "IA[i+1]-1 > nnz"
• Hit Play again
Debuggers (GUI): DDT
[Screenshot: Allinea DDT v3.2. "Thread 1 stopped at breakpoint in Sparse_CG (cg_buggy.c:49)." The Locals panel shows, among others, delta = 9.9999999999999995e-08, i = 3363, k = 21656, K1 = 21653.]
19. GDB test case: Buggy CG (printing variables at the crash)
    47    sum = 0.0;
    48    for (i=0; i<n; i++) {
    49      K1 = IA[i];
    50      K2 = IA[i+1]-1;
    51
    52      for (k=K1; k<K2+1; k++)
    53        sum += AA[k]*oldx[JA[k]-1];
    (gdb) print i
    $5 = 3363
    (gdb) print nnz
    $6 = 94926
    (gdb) print IA[i]
    $7 = 21656
    (gdb) print IA[i+1]
    $8 = 1065353216
IA[i+1] cannot be larger than nnz, so this value is garbage!
GDB test case: Buggy CG
Step 4: The Fix
Check cg_buggy.c for the location where IA is allocated and used:
    160   JA = (int*) malloc(nnz*sizeof(int));
    161   IA = (int*) malloc(n+1*sizeof(int));
This is n + 4 = 13436 + 4 = 13440 bytes. 13440 bytes can hold 3360 integers, not 13436, which is consistent with i=3363 where the code crashed.
    169   for (i=0; i<n+1; i++)
    170     fscanf(fn, "%d", &IA[i]);
IA must hold (n+1)*4 = 53748 bytes.
FIX:
    160   JA = (int*) malloc(nnz*sizeof(int));
    161   IA = (int*) malloc((n+1)*sizeof(int));     Fixed by adding the missing parentheses
• GDB was able to tell us where the problem occurs
• But GDB cannot tell us the size of dynamic arrays at run time:
    (gdb) print sizeof(IA)
    $11 = 8                 This is the size of the IA pointer, not the array
• The same symptoms could still arise if the input file included garbage values (IA[i]=21656, IA[i+1]=1065353216): IA could be allocated large enough, but filled with garbage values
There is more to GDB
• Watchpoints: breakpoints on variables instead of functions or lines
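The arithmetic behind the Step 4 fix can be checked directly. In the sketch below, buggy_bytes, fixed_bytes and alloc_row_pointers are hypothetical names introduced for illustration; only the two size expressions come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes requested by the buggy expression n+1*sizeof(int):
 * multiplication binds tighter, so this is n + sizeof(int). */
size_t buggy_bytes(size_t n) { return n + 1 * sizeof(int); }

/* Bytes requested by the fixed expression (n+1)*sizeof(int). */
size_t fixed_bytes(size_t n) { return (n + 1) * sizeof(int); }

/* Allocates the n+1 row pointers a CSR matrix with n rows needs.
 * The caller owns the result and must free() it. */
int *alloc_row_pointers(size_t n)
{
    int *IA = (int *)malloc((n + 1) * sizeof *IA);
    return IA;              /* NULL on allocation failure */
}
```

With n = 13436 and a 4-byte int, buggy_bytes gives 13440 and fixed_bytes gives 53748, the two numbers on the slide. Writing the size as sizeof *IA rather than sizeof(int) is a design choice that keeps the expression correct if the element type ever changes.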
20. [Screenshot: DDT Breakpoints panel listing the breakpoint at cg_buggy.c:49 in Sparse_CG with its condition.]
• It stopped exactly when the condition is met, and we can browse all variables
• No need for print
Debuggers (GUI): DDT
[Screenshot: DDT Multi-Dimensional Array Viewer. Array Expression: IA[$i]; Distributed Array Dimensions: None; Range of $i; tabs: Data Table, Statistics; buttons: Goto, Visualize, Export.]
• Right click on IA from the Current Line(s) or Locals panel on the right and select "View Array"
• Enter 0 and n as the Range and click on Visualize
• We expect IA to gradually increase, but the graph shows a drastic spike around 3000 (remember i=3363?)
• Using visualization, it only takes a single glance to recognize problems
Parallel Debugging with DDT
• Not so different from sequential debugging (which cannot be said for text-based debuggers)
• Process- and thread-level debugging, with the ability to see and compare the stack for each process/thread
• Powerful Cross-Process/Thread Comparison tool to compare the stack in different processes/threads
Hands-on Examples (if there is time)
• Warmup: startmpi_c.c (startmpi_f.f90)
• Deadlock: cpi.c
21. DDT Parallel case: startmpi_c.c
1. cd ~/data/PACE_Debugging_Profiling_Class/codes/MPI_DDT
2. Load the modules (source)
3. make
4. First try with no args:
     mpirun -np 4 startmpi_c
   No problem.
5. Try with args:
     mpirun -np 4 startmpi_c a b c
   CRASH!
6. Open DDT (ddt), change the mpirun arguments to "a b c" and start the code in DDT (see screenshot)
[Screenshot: DDT Run dialog. Application: .../codes/MPI_DDT/startmpi_c; Arguments: a b c; MPI: 4 processes, mvapich 2; Implementation: mvapich 2 MPI, no queue; OpenMP, CUDA, Memory Debugging, Environment Variables: none; Plugins: none.]
DDT Parallel case: startmpi_c.c
• Hit the Play button to run. When it crashes, hit Pause.
• Click on the "main" directly above the print_arg function in the Stack View. This takes you to main, which lets you see where that arg value comes from.
• Now click on the Locals tab on the right-hand side of the GUI: you are seeing all the local variables
• Click on the Current Line tab to simplify and ...
22. (The copy to cg_fixed.c is optional, if you would like to keep the fixed code.)
    cp cg_buggy.c.org cg_buggy.c
    make clean
    make all
Restart DDT. It will remember previous settings (configuration is stored in ~/.ddt).
[Screenshot: Allinea DDT message: "Process 0 (Thread 1) stopped in Sparse_CG (cg_buggy.c:53) with signal SIGSEGV (Segmentation fault). Reason/Origin: address not mapped to object (attempt to access invalid address). Your program will probably be terminated if you continue. You can use the stack controls to see what the ..." Option: "Always show this window for signals". Stacks panel: main (cg_buggy.c:182).]
Debuggers (GUI): DDT
[Screenshot: Allinea DDT v3.2, cg_buggy.c with the Locals, Current Line(s) and Current Stack panels.]
23. Debuggers (text): Valgrind
    for (k=K1; k<K2+1; k++)
      sum += AA[k]*p[JA[k]];       It was p[JA[k]-1], now fixed
The code runs correctly, but Valgrind still reports leaks:
    LEAK SUMMARY:
      definitely lost: 1,243,108 bytes in 11 blocks      (Another problem!)
      indirectly lost: 0 bytes in 0 blocks
        possibly lost: 0 bytes in 0 blocks
      still reachable: 16,404 bytes in 2 blocks
           suppressed: 0 bytes in 0 blocks
    Rerun with --leak-check=full to see details of leaked memory      (This is what we will do)
More problems? Definitely YES! Trust Valgrind on this.
Debuggers (text): Valgrind
Full Leak Check: shows all sources for leaking memory
    valgrind --leak-check=full <exe> <args>
    valgrind --leak-check=full ./cg_buggy bayer10.mtx.csr
    ==24935== Memcheck, a memory error detector
    ==24935== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==24935== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==24935== Command: ./cg_buggy bayer10.mtx.csr
    NOT CONVERGED at iteration 101
    Elapsed time: 3.315764 sec
    ==24935== HEAP SUMMARY:
    ==24935==   in use at exit: 1,259,512 bytes in 13 blocks
    ==24935==   total heap usage: 14 allocs, 1 frees, 1,260,080 bytes allocated
    ==24935== 53,744 bytes in 1 blocks are definitely lost in loss record 3 of 13
    ==24935==   at 0x4C267BA: malloc ...
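The 53,744-byte blocks match n*sizeof(float) for n = 13436, that is, work vectors such as oldx, r, oldr, p and oldp allocated inside Sparse_CG. One way to make sure such vectors are released is to pair every allocation with a single cleanup routine. The sketch below is an assumption-laden illustration: the cg_work struct and both function names are hypothetical, not taken from cg_buggy.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Work vectors of a CG solve, grouped so they are freed together. */
typedef struct {
    float *oldx, *r, *oldr, *p, *oldp;
} cg_work;

/* Releases every vector; safe on partially allocated structs
 * because free(NULL) is a no-op. */
void cg_work_free(cg_work *w)
{
    assert(w != NULL);
    free(w->oldx); free(w->r); free(w->oldr); free(w->p); free(w->oldp);
    w->oldx = w->r = w->oldr = w->p = w->oldp = NULL;
}

/* Allocates five zero-initialized vectors of n floats.
 * Returns 0 on success, -1 (with nothing left allocated) on failure. */
int cg_work_alloc(cg_work *w, size_t n)
{
    assert(w != NULL);
    w->oldx = (float *)calloc(n, sizeof(float));
    w->r    = (float *)calloc(n, sizeof(float));
    w->oldr = (float *)calloc(n, sizeof(float));
    w->p    = (float *)calloc(n, sizeof(float));
    w->oldp = (float *)calloc(n, sizeof(float));
    if (!w->oldx || !w->r || !w->oldr || !w->p || !w->oldp) {
        cg_work_free(w);
        return -1;
    }
    return 0;
}
```

A solver would call cg_work_alloc once on entry and cg_work_free on every exit path; rerunning Valgrind with --leak-check=full is the way to confirm that the "definitely lost" count actually drops.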
24. Profilers (GUI): TAU
• This profiling data only includes TIME. Double click on it.
• Then double click on any of the blue bars.
[Screenshots: TAU ParaProf Manager (Applications > Standard Applications > Default App > Default Exp > .../NPB3.3-MPI/bin > TIME) and the main ParaProf window (Metric: TIME, Value: Exclusive).]
Profilers (GUI): TAU
• Function-specific view for the selected metric (TIME) for each process/thread
• Function name: MPI_Init
• Bars are sorted by time spent in the function for each thread/process, including min, max, mean and std. dev.
• The Windows menu is identical for all views and not specific to functions. Explore!
[Screenshot: TAU ParaProf Function Data Window.]
Profilers (GUI): TAU
[Screenshot: TAU ParaProf 3D Visualizer (Triangle Mesh, Bar Plot, Scatter Plot, Topology Plot). Height Metric: Exclusive TIME; Color Metric: Exclusive TIME.]
3D viz allows us ...
25. make clean
    make cg NPROCS=8 CLASS=W
• NPROCS is the number of processors, CLASS=W defines the size
• NPROCS and CLASS are NAS-specific, they have nothing to do with TAU
• You can ignore the message that says:
    /usr/bin/ld: warning: libpfm.so.3, needed by /usr/local/packages/papi/5.0.1/lib/libpapi.so, may conflict with libpfm.so.4
• Now find the executable named cg.W.8 in the bin directory:
    cd bin
    ls
    cg.W.8
• Run the benchmark as usual:
    mpirun -np 8 ./cg.W.8
Profilers (GUI): TAU
• You will notice new profiling files named profile.x.y.z for each processor:
    ls
    cg.W.8  profile.0.0.0  profile.2.0.0  profile.4.0.0  profile.6.0.0  ...
• Run the TAU GUI (paraprof) in the same directory:
    paraprof
[Screenshot: TAU ParaProf Manager. The trial under Default App > Default Exp contains only TIME; the right-hand table lists trial fields such as Name, Application ID, Experiment ID, Trial ID, Command Line, CPU Type, Memory Size, Node Name, OS Version, TAU Architecture and TAU Config.]
26. Debuggers (GUI): DDT
[Screenshot: DDT source view of main in cg_buggy.c (annotated source code): b and x are allocated with malloc(n*sizeof(float)); AA, JA and IA are read with fscanf; delta is set to 0.0000001; Sparse_CG(AA, b, x, IA, JA, n, nnz, delta) is timed with rtc(); "Elapsed time" is printed; AA, IA, JA, b and x are freed; main returns 0.]
[Screenshot: the Locals panel shows <value optimized out> for most variables.]
If you see <value optimized out>, turn off optimizations.
Debuggers (GUI): DDT
Turn off the optimizations in the Makefile:
    DEBUGOPTS = -g -pg -O0 -fprofile-arcs -ftest-coverage
    cp cg_buggy.c cg_fixed.c
27. es
• watch <var>: stop on writes to var
• rwatch <var>: stop on reads of var
• awatch <var>: stop on writes/reads of var
• info breakpoints: listing and manipulation of watchpoints
• Other useful commands:
  step: continue to the next line (entering function calls)
  next: continue to the next line, skipping over function calls
  cont: run until the next breakpoint, or to completion if there is none
  print sizeof(var): returns the size of a variable
  whatis var: returns the type of the variable
  ptype var: similar to whatis, but more detailed (e.g. shows structs)
  set var var=value: sets or replaces a variable at runtime, e.g. (gdb) set var i=2.5
• Running GDB in parallel
  • mpirun -np 4 xterm -e gdb your_mpi_exe (well, good luck with that!)
  • Use GUI debuggers

Debuggers Text Valgrind
Valgrind (http://valgrind.org)
• A CPU simulator with hierarchical memory support
• All requests for memory allocation/deallocation are captured and analyzed
• Subtle errors that do not crash the code can also be identified
• Slow (up to 50x), so small test cases should be preferred
• Six different tools:
  • a memory error detector (default)
  • two thread error detectors
  • a cache and branch-prediction profiler
  • a call-graph generating cache and branch-prediction profiler
  • a heap profiler

Debuggers Text Valgrind
Usage on PACE
Sequential:
  module load valgrind    (Very important! Don't use the system default)
  valgrind <exe> <args>
Para
28. ication Matrix
[Screenshot: ParaProf Windows menu with the entries 3D Visualization, 3D Communication Matrix, Communication Matrix, Bar Chart, Function Legend, Group Legend, User Event Legend, Statistics Text, Statistics Table and Call Graph]

Profilers GUI TAU
Statistics Text
[Screenshot: TAU ParaProf Statistics Text window for node 0. Metric: TIME, sorted by: exclusive, units: seconds. Columns: %Total Time, Exclusive, Inclusive, #Calls, #Child Calls, Inclusive/Call, Name. The listing is a call-path profile of the CG benchmark (CG => MAKEA => SPRNVC, RANDLC, ICNVRT; MPI_Wait; MPI_Init; CG loops)]
29. llel CG solver from NAS Benchmark Suite
  config/make.def: Makefile definitions for parallel NAS Benchmarks
  bin: Executables for NAS Benchmarks
  CG: NAS Benchmark source codes for parallel CG
input
  bayer10.mtx.csr: An example sparse matrix in CSR format for sequential CG solver runs
tau_runtime_env.sh: Environment variables required to run the TAU profiler
DebuggingProfiling.pdf: Course slides

PART I: DEBUGGERS

Debuggers Text GDB
GNU Project Debugger (gdb): http://www.gnu.org/software/gdb
Quoting from the GDB website:
"GDB allows you to see what is going on inside a program while it executes, or what a program was doing at the moment it crashed. GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
• Start your program, specifying anything that might affect its behavior.
• Make your program stop on specified conditions.
• Examine what has happened, when your program has stopped.
• Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another."

GDB test case: Buggy CG
CG: Conjugate Gradient Solver
• An iterative Krylov subspace solver
• Requires positive definite sparse matrices
• Sparse matrix-vector multiply (SpMV) at each iteration
cg.c: Source code without a bug
cg_buggy.c: Source code with a bug
Make:
  cd data/PaceWork
30. llel:
  module load gcc mvapich2 valgrind
  mpirun -np <cores> valgrind <exe> <args>
Alternatively, to write the output of each process to a separate file:
  mpirun -np <cores> valgrind --log-file=valgrind.out.%p <exe> <args>
  valgrind.out.27025 valgrind.out.27026 valgrind.out.27027 valgrind.out.27028

Debuggers Text Valgrind
valgrind output for the buggy CG run:
  module load valgrind
  export CG_MAXITER=100
  valgrind cg_buggy bayer10.mtx.csr
  ==9428== Invalid write of size 4
  ==9428==    at 0x5625A20: _IO_vfscanf (in /lib64/libc-2.12.so)
  ==9428==    by 0x563354A: __isoc99_fscanf (in /lib64/libc-2.12.so)
  ==9428==    by 0x401D28: main (cg_buggy.c:170)
  ==9428==  Address 0x5a22c60 is 0 bytes after a block of size 13,440 alloc'd
  ==9428==    at 0x4C267BA: malloc (vg_replace_malloc.c:263)
  ==9428==    by 0x401BF2: main (cg_buggy.c:161)
The operation on line 170 is an invalid write, on the variable that was allocated on line 161. Note the block size: 13,440 / 4 = 3360 (!)

buggy CG source code:
  161  IA = (int *) malloc(n+1 * sizeof(int));
  162  b = (float *) malloc(n * sizeof(float));
  163  x = (float *) malloc(n * sizeof(float));
  164
  165  for (i=0; i<nnz; i++)
  166      fscanf(fn, "%f", &AA[i]);
  167  for (i=0; i<nnz; i++)
  168      fscanf(fn, "%d", &JA[i]);
  169  for (i=0; i<n+1; i++)
  170      fscanf(fn, "%d", &IA[i]);

Debuggers Text Valgrind
But wait! Looks like there is more, which GDB did not complain about.
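The block size reported by Valgrind (13,440 bytes for a matrix with n = 13436 rows) is what results when the size expression on line 161 has no parentheses around n+1, so that only sizeof(int) bytes are added to n. The arithmetic can be sketched as follows, assuming a 4-byte int; both helper names are hypothetical and do not appear in cg_buggy.c.

```c
#include <stddef.h>

/* Hypothetical helper: the size expression as written without parentheses.
 * Operator precedence makes this n + (1 * elem_size), not (n + 1) * elem_size. */
size_t row_ptr_bytes_buggy(size_t n, size_t elem_size)
{
    return n + 1 * elem_size;
}

/* Hypothetical helper: the intended size, room for n + 1 row pointers. */
size_t row_ptr_bytes_fixed(size_t n, size_t elem_size)
{
    return (n + 1) * elem_size;
}
```

With n = 13436 and 4-byte elements, the first expression gives 13,440 bytes (room for 3360 ints) and the second gives 53,748 bytes, so a loop that reads n+1 row pointers writes far past the end of the smaller block.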
31. nt_chooser PRESET PAPI_L1_DCM PAPI_L1_DCA PAPI_L2_DCM PAPI_L2_DCA PAPI_TOT_CYC
  Event PAPI_L2_DCA can't be counted with others -8 (supported, but cannot count with others)

Profilers API PAPI
Compilation with PAPI
• Use of #ifdef blocks is recommended to easily turn PAPI on/off in the code:
  #ifdef PAPI
  #endif
• Load the PAPI module:
  module load papi
• Add the PAPI and PFM libraries in the Makefile, and -DPAPI for the #ifdef blocks:
  PAPILIB = -L$(PAPIDIR)/lib -lpfm -lpapi
  PAPI = $(PAPILIB) -DPAPI
  cg: cg.c
      $(CC) -o cg cg.c $(DEBUGOPTS) $(PAPI) $(LIBS)

Profilers API PAPI
Embedding PAPI in the code (see cg.c for a working example)
• Include the PAPI header and define the number of concurrent events:
  #ifdef PAPI
  #include <papi.h>
  #define NUMEVENTS 2
  #endif
• Initialize PAPI and start counters:
  #ifdef PAPI
  /* Initialize PAPI */
  int events[NUMEVENTS] = {PAPI_L2_DCM, PAPI_L2_DCA};      /* Two events will be counted */
  /* Start counters */
  int errorcode = PAPI_start_counters(events, NUMEVENTS);
  if (errorcode != PAPI_OK) {
      /* Error handling goes here */
  }
  #endif
• Read from counters and print out the results:
  /* Do some work here */
  #ifdef PAPI
  long long values[NUMEVENTS];      /* Use long long since the number of events may get too large */
  errorcode = PAPI_read_counters(values, NUMEVENTS);      /* This function resets the counters */
  fprintf(stderr, "L2 Access: %lld\n", values[1]);
  fprintf(stderr, "L2
32. om packages compiled without debugging enabled (-g)
[Screenshot: source browser showing lines 1740-1761]
• The function selected with blue text background. E.g. try right-clicking on the blue bar for MPI_Init
• Do not hope to see line-by-line metrics. The finest granularity is loops (and it needs to be enabled)

Profilers GUI TAU
Not impressed yet? Let's do more!
• Throw more metrics in the mix, e.g. number of cycles and cache events
• Use 3D visualization features to compare two different metrics at a glance
• Derive new metrics using the already counted events
• Check MPI communication patterns
• Create a call graph
• Get detailed counts/statistics in table and text formats

Profilers GUI TAU
• TAU configuration is done using environment variables. Using a script is recommended. See data/PaceWorkshop/tau_runtime_env.sh:
  #!/bin/bash
  # Sets up runtime TAU instrumentation parameters
  module purge
  module load gcc
  module load mvapich2
  module load tau/2.22.1-beta
  # The directory where profiling takes place
  export PROFILEDIR=data/PaceWorkshop/codes/NPB3.3-MPI/bin
  # Required for visualizing the communication matrix for MPI
  export TAU_COMM_MATRIX=1
  # Enable tracking for message communication
  export TAU_TRACK_MESSAGE=1
  # PAPI events: which hardware counters to count
  export TAU_METRICS
33. opagated into this function by its children
• Add -l -A for annotated output (NOT line-by-line; only shows the number of calls for each function)

Profilers Text Gprof
  gprof cg -l -A > annotated_gprof.out
In annotated_gprof.out:
            void output_vector(char *label, float *a, int n);
            double rtc()
       2 -> {                                  (called twice)
                struct timeval time;
                gettimeofday(&time, NULL);
                return (double)(time.tv_sec * 1000000 + time.tv_usec) / 1000000;
            }

  Top 10 Lines:
       Line      Count
         24          2
         32          1

  Execution Summary:
         86   Executable lines in this file
          3   Lines executed
       3.49   Percent of the file executed
          3   Total number of line executions
       0.03   Average executions per line

Profilers Text Gcov
Gcov
• Shows which parts of the code were executed
• Can be regarded as a debugger or a profiler, depending on the usage
• Code must be compiled with -fprofile-arcs -ftest-coverage. In the Makefile:
  DEBUGOPTS = -g -pg -O2 -fprofile-arcs -ftest-coverage
  make clean
  make all
• gcov <exe> creates <source>.c.gcov, the annotated source code:
  File 'cg.c'
  Lines executed: 93.07% of 101
  cg.c: creating 'cg.c.gcov'

Profilers Text Gcov
  gcov cg
In cg.c.gcov:
         -:  118:                                      (blank)
      1000:  119:  criteria = 0.0;
  13437000:  120:  for (i=0; i<n; i++)                 (executed 13437000x)
  13436000:  121:      criteria += r[i] * r[i];
      1000:  123:  if (sqrt(criteria) < delta) {
     #####:  124:      printf("Converged at iter %d\n", iter);      (not executed)
     #####:  125:      break;
34. p Have More Time Repeat
35. r 6168 9
Number Hardware Counters : 4
Max Multiplex Counters   : 64

Name           Code         Avail  Deriv  Description (Note)
PAPI_L1_DCM    0x80000000   Yes    No     Level 1 data cache misses
PAPI_L1_ICM    0x80000001   Yes    No     Level 1 instruction cache misses
PAPI_L2_DCM    0x80000002   Yes    No     Level 2 data cache misses
PAPI_L2_ICM    0x80000003   Yes    No     Level 2 instruction cache misses
PAPI_L3_DCM    0x80000004   No     No     Level 3 data cache misses
PAPI_VEC_SP    0x80000069   No     No     Single precision vector/SIMD instructions
PAPI_VEC_DP    0x8000006a   No     No     Double precision vector/SIMD instructions
PAPI_REF_CYC   0x8000006b   No     No     Reference clock cycles
Of 108 possible events, 40 are available, of which 8 are derived.
avail.c PASSED

Profilers API PAPI
Choose events to count concurrently: papi_event_chooser
USAGE: papi_event_chooser (Buggy! Safe to ignore the message "PAPI Error: Didn't close all events")
Usage: papi_event_chooser NATIVE|PRESET evt1 evt2 ...
Q: Can we count L2 Data Misses (PAPI_L2_DCM) and Accesses (PAPI_L2_DCA) together?
  papi_event_chooser PRESET PAPI_L2_DCM PAPI_L2_DCA
  Event Chooser: Available events which can be added with given events.
Q: How about L2 Data Misses (PAPI_L2_DCM) and L3 Data Misses (PAPI_L3_DCM) together?
  papi_event_chooser PRESET PAPI_L2_DCM PAPI_L3_DCM
  Event PAPI_L3_DCM can't be counted with others -7 (not supported, or no such cache exists)
Q: PAPI_L1_DCM, PAPI_L1_DCA, PAPI_L2_DCM, PAPI_L2_DCA, PAPI_TOT_CYC?
  papi_eve
36. [Screenshot: DDT session on an MPI program, with the process group controls (Focus on current Group / Process / Thread), the Project Files tree, the source view stopped near line 108, the Locals / Current Line(s) / Current Stack panels, and the Stacks view listing processes, threads and functions]

Debuggers GUI DDT
Parallel case
  cd codes/MPI_DDT
  source load_modules
  make
First try with 4 procs:
  mpirun -np 4 cpi      (No problem)
Try with 10 procs:
  mpirun -np 10 cpi     (No problem)
Try with 8 procs:
  mpirun -np 8 cpi      (CRASH! But why? Homework)
Hint: It's a deadlock!

PART II: PROFILERS

Profilers Text Gprof
Gprof (part of the GNU binutils package): http://www.gnu.org/software/binutils
• Turn on the optimizations, e.g.
37. shop/codes
  module purge        # remove all modules in your environment
  module load gcc     # load required modules
  make clean          # clean existing objects, executables, etc.
  make all            # make both executables (cg and cg_buggy); ignore the /usr/bin/ld warning, if any
Test run:
  cg_buggy bayer10.mtx.csr
PROBLEM: Segmentation fault (core dumped)

GDB test case: Buggy CG
• Requires -g in the compilation for source code association
• No optimization (-O0) is preferred. In the Makefile:
  DEBUGOPTS = -g -pg -O0 -fprofile-arcs -ftest-coverage
• Initiate gdb: gdb <executable_name>
  gdb cg_buggy        (no arguments/inputs, just the executable)
  (gdb) run bayer10.mtx.csr
  Starting program: /nv/pf2/mbelgin3/PaceWorkshop/codes/cg_buggy bayer10.mtx.csr
  Program received signal SIGSEGV, Segmentation fault.
  0x00007ffff72c8122 in ____strtoll_l_internal () from /lib64/libc.so.6
  (gdb) bt            (bt is backtrace)
  #0  0x00007ffff72c8122 in ____strtoll_l_internal () from /lib64/libc.so.6
  #1  0x00007ffff72c4ec0 in atoi () from /lib64/libc.so.6
  #2  0x000000000040124c in Sparse_CG (AA=0x7ffff7f62010, b=0x617240, x=0x624440, IA=0x60a040, JA=0x7ffff7f05010, n=13436, nnz=94926, delta=9.9999999999999995e-08) at cg_buggy.c:29
  #3  0x0000000000401e37 in main (argc=2, argv=0x7fffffffdff8) at cg_buggy.c:182
  (gdb) list 29       (list the source code around line 29)
  27      double criteria, product;
  28
  29      int MAXITER = atoi(getenv("CG_MAXITER"));
  30
GD
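The backtrace points at atoi(), called on line 29 with the result of getenv("CG_MAXITER"). getenv returns NULL when the variable is not set, and passing NULL to atoi is undefined behavior, which is consistent with the crash seen here. A defensive variant could look like the sketch below. It assumes that falling back to a default iteration limit is acceptable; the function name and the default are hypothetical and are not taken from the workshop sources.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: read a positive iteration limit from the environment.
 * Returns default_iters if the variable is unset, empty, not a number,
 * not positive, or too large for an int. */
int max_iter_from_env(const char *name, int default_iters)
{
    const char *text = getenv(name);
    if (text == NULL)
        return default_iters;            /* variable not set: do not call atoi */

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (errno != 0 || end == text || *end != '\0')
        return default_iters;            /* overflow, or not a clean integer */
    if (value <= 0 || value > INT_MAX)
        return default_iters;            /* outside the usable range */
    return (int)value;
}
```

In the solver this would replace the direct atoi(getenv(...)) call, for example `int MAXITER = max_iter_from_env("CG_MAXITER", 1000);`, where 1000 is only a placeholder default.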
38. [Screenshot residue: 3D visualization control panel (height value, color value, scales, axes, color scale and font size options)]

Profilers GUI TAU
Deriving your own metrics using collected data, e.g. L1_MISS_RATE (%)
[Screenshot: TAU ParaProf Manager with the "Show Derived Metric Panel" option and the collected metrics PAPI_L1_DCA, PAPI_FP_OPS, TIME and PAPI_L1_DCM, next to a bar chart of the exclusive L1_MISS_RATE value on nodes 0-7 for SETUP_PROC_INFO (cg.f)]
SETUP_PROC_INFO on node 6 experienced a 10.718% L1 miss rate.
Use the Derived Metric Panel to create your own:
  L1_MISSRATE = 100 * PAPI_L1_DCM / PAPI_L1_DCA
  Exclusive L1_MISS_RATE: 10.718 counts
  Inclusive L1_MISS_RATE: 10.718 counts

Profilers GUI TAU
3D Communication Matrix
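The derived L1 miss rate is plain arithmetic on two counter values: 100 times the L1 data cache misses divided by the L1 data cache accesses. The same expression can be sketched in C as below, for use with raw counter values read in the code. The function name is hypothetical, and the guard against a zero access count is an addition that is not part of the ParaProf expression.

```c
/* Hypothetical helper mirroring L1 miss rate = 100 * PAPI_L1_DCM / PAPI_L1_DCA.
 * Returns the miss rate in percent, or 0.0 when no accesses were counted. */
double l1_miss_rate_percent(long long l1_misses, long long l1_accesses)
{
    if (l1_accesses <= 0)
        return 0.0;                      /* avoid dividing by zero */
    return 100.0 * (double)l1_misses / (double)l1_accesses;
}
```

The conversion to double happens before the division, so the result is not truncated the way an integer division of the two counters would be.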
39. (vg_replace_malloc.c:263)
     by 0x401B4C: main (cg_buggy.c:162)
  53,744 bytes in 1 blocks are definitely lost in loss record 4 of 13
     at 0x4C267BA: malloc (vg_replace_malloc.c:263)
     by 0x401B61: main (cg_buggy.c:163)
  53,744 bytes in 1 blocks are definitely lost in loss record 5 of 13
     at 0x4C267BA: malloc (vg_replace_malloc.c:263)
     by 0x401192: Sparse_CG (cg_buggy.c:31)

Debuggers Text Valgrind
In Sparse_CG(), add to the end:
  free(oldx); free(r); free(oldr); free(p); free(oldp); free(q);
In main(), add to the end:
  free(AA); free(IA); free(JA); free(b); free(x);
  valgrind --leak-check=full cg_buggy bayer10.mtx.csr
  ==26027== HEAP SUMMARY:
  ==26027==     in use at exit: 16,628 bytes in 2 blocks
  ==26027==   total heap usage: 14 allocs, 12 frees, 1,260,304 bytes allocated
  ==26027== LEAK SUMMARY:
  ==26027==    definitely lost: 0 bytes in 0 blocks      (Finally!)
  ==26027==    indirectly lost: 0 bytes in 0 blocks
  ==26027==      possibly lost: 0 bytes in 0 blocks
  ==26027==    still reachable: 16,628 bytes in 2 blocks
  ==26027==         suppressed: 0 bytes in 0 blocks
  ==26027== Reachable blocks (those to which a pointer was found) are not shown.

Debuggers Text Valgrind
Valgrind FAQ 5.2
• "definitely lost" means your program is leaking memory; fix those leaks!
• "indirectly lost" means your program is leaking memory in a pointer-based structure (e.g. if the root node of a binary tree is "definitely lost"
40. will use

Profilers GUI TAU
• Change directory to PaceWorkshop/codes/NPB3.3-MPI:
  cd data/PaceWorkshop/codes/NPB3.3-MPI
• Check the config directory for Makefile definitions:
  cd config
  ls -al
  lrwxrwxrwx 1 mbelgin3 pace-admins   12 Feb 11 14:17 make.def -> make.def.tau
  -rw...     1 mbelgin3 pace-admins 7264 Feb 11 14:13 make.def.org
  -rw...     1 mbelgin3 pace-admins 7337 Feb 12 16:41 make.def.tau
• make.def.org is the original definitions file that comes with the suite
• make.def.tau includes the modifications needed for TAU
• Currently make.def is linked to make.def.tau; switch between these two as you wish

Profilers GUI TAU
Let's check the differences between the two Makefile definition files:
  diff make.def.org make.def.tau
  < MPIF77 = mpif77
  > #MPIF77 = mpif77
  > MPIF77 = tau_f77.sh
  79c79,80
  < MPICC = mpicc
  > #MPICC = mpicc
  > MPICC = tau_cc.sh -lpfm
  < CC = cc -g
  > #CC = cc -g
  > CC = tau_cc.sh -lpfm
• The only difference is replacing the compiler with the TAU-provided wrapper.
• On our system there is a default libpfm (/usr/lib64/libpfm.so) which is not compatible with TAU, so we need to use the one that comes with PAPI. However, this is not correctly defined in the TAU Makefile (TAU_MAKEFILE). Until this is resolved, we need to add -lpfm.

Profilers GUI TAU
• Make the Parallel CG Suite:
  cd data/PaceWorkshop/codes/NPB3.3-MPI
  make cl
41. z clear breakpoint at lineZ
• disable <breakpoint>: skip the breakpoint, but keep it in the list
• ignore <breakpoint> <N>: skip the breakpoint for the first N times
• condition <breakpoint> <condition>: stop at the breakpoint only if the condition is met

  (gdb) list 53
  48      for (i=0; i<n; i++) {
  49          K1 = IA[i];
  50          K2 = IA[i+1]-1;
  51
  52          for (k=K1; k<K2+1; k++) {
  53              sum += AA[k] * oldx[JA[k]-1];
  54          }
  55          oldr[i] = sum;
  56          sum = 0.0;
  57      }
The relationship between i and k is: i -> IA[i] -> K1, K2 -> k
  (gdb) print k
  $1 = 95230
  (gdb) print K1
  $2 = 21655
  (gdb) print K2          (K2 = IA[i+1]-1)
  $3 = 1065353214         (Suspiciously high! Should not be > nnz; nnz is the number of nonzeros in the matrix)
  (gdb) print nnz
  $4 = 94926
  (gdb) break 49          (we want to stop at line 49)
  Breakpoint 1 at 0x401343: file cg_buggy.c, line 49.
  (gdb) condition 1 IA[i+1]-1 > nnz       (stop at bp 1 (#49) ONLY when this condition is met)

GDB test case: Buggy CG
Step 3: Locate the problem
  (gdb) info breakpoints
  Num  Type        Disp  Enb  Address             What
  1    breakpoint  keep  y    0x0000000000401343  in Sparse_CG at cg_buggy.c:49
       stop only if IA[i+1]-1 > nnz
  (gdb) run
  Breakpoint 1, Sparse_CG (AA=0x7ffff7f62010, b=0x60d4d0, x=0x61a6d0, IA=0x60a040, JA=0x7ffff7f05010, n=13436, nnz=94926, delta=9.9999999999999995e-08) at cg_buggy.c:49
  49          K1 = IA[i];
  (gdb) list
  44 ord Xii
  45
  46      Calculate Residual (r) with initi
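The conditional breakpoint stops when a row pointer exceeds nnz. The same invariant can also be checked in code before the solver runs. The sketch below assumes 0-based CSR indexing, a square n x n matrix, row pointers IA of length n+1 and column indices JA of length nnz. The function name is hypothetical and the check is not part of the workshop sources.

```c
#include <stddef.h>

/* Hypothetical validator for a 0-based CSR structure of a square matrix.
 * Returns 1 if IA and JA are consistent with n rows and nnz nonzeros, else 0. */
int csr_is_consistent(const int *IA, const int *JA, int n, int nnz)
{
    if (IA == NULL || JA == NULL || n < 0 || nnz < 0)
        return 0;
    if (IA[0] != 0 || IA[n] != nnz)      /* first and last row pointers are fixed */
        return 0;
    for (int i = 0; i < n; i++) {
        if (IA[i] > IA[i + 1])           /* row pointers must not decrease */
            return 0;
    }
    for (int k = 0; k < nnz; k++) {
        if (JA[k] < 0 || JA[k] >= n)     /* column index must be inside the matrix */
            return 0;
    }
    return 1;
}
```

A value such as the 1065353214 printed for K2 would fail the IA[n] or the monotonicity check immediately, and a 1-based column index equal to n would fail the JA range check, which is the kind of 0-based versus 1-based confusion this test case revolves around.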