CUDA-GDB User Manual
Contents
   Display CUDA kernel events
   Examples of events displayed
   Checking memory errors
   GPU Error Reporting
   Multi-GPU debugging
3. Installation and Debug Compilation
   Installation Instructions
   Compiling for Debugging
   Compiling/Debugging for Fermi and Tesla GPUs
   Compiling for Fermi GPUs
   Compiling for Fermi and Tesla GPUs
4. cuda-gdb Walk-through
   bitreverse.cu Source Code
   Walking Through the Code
Appendix A. Supported Platforms
   Host Platform Requirements
▶ Supporting an initialization file
▶ Pausing CUDA execution at any function symbol or source file line number
▶ Single-stepping individual warps
▶ Displaying device memory in the device kernel
▶ Switching to any coordinate or kernel
▶ cuda-gdb info commands
▶ Breaking into running applications
▶ Displaying context and kernel events
▶ Checking memory errors
▶ GPU Error Reporting
▶ Multi-GPU debugging

Getting Help

For more information, use the cuda-gdb help with the "help cuda" and "help set cuda" commands.

Debugging CUDA applications on GPU hardware in real time

The goal of cuda-gdb is to provide developers with a mechanism for debugging a CUDA application on actual hardware in real time. This enables developers to verify program correctness without the potential variations introduced by simulation and emulation environments.

Extending the gdb debugging environment

GPU memory is treated as an extension to host memory, and CUDA threads and blocks are treated as extensions to host threads. Furthermore, there is no difference between cuda-gdb and gdb when debugging host code. The user can inspect either a specific host thread or a specific CUDA thread.
the exact instance to which the breakpoint will be resolved cannot be controlled. With the driver API, the user can control the instance to which the breakpoint will be resolved by setting the breakpoint right after its module is loaded.

INSTALLATION AND DEBUG COMPILATION

Included in this chapter are instructions for installing cuda-gdb and for using NVCC, the NVIDIA CUDA compiler driver, to compile CUDA programs for debugging.

Installation Instructions

Follow these steps to install NVIDIA cuda-gdb:

1. Visit the NVIDIA CUDA Zone download page: http://www.nvidia.com/object/cuda_get.html
2. Select the appropriate Linux operating system. (See Host Platform Requirements.)
3. Download and install the 3.2 CUDA Driver.
4. Download and install the 3.2 CUDA Toolkit. This installation should point the environment variable LD_LIBRARY_PATH to /usr/local/cuda/lib and should also include /usr/local/cuda/bin in the environment variable PATH.
5. Download and install the 3.2 CUDA Debugger.

Compiling for Debugging

NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for cuda-gdb to work properly. The -g -G option pair must be passed to NVCC when an application is compiled in order to debug with cuda-gdb; for example:
18     array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
19                          ((0x55555555 & array[threadIdx.x]) << 1);
20
21     idata[threadIdx.x] = array[threadIdx.x];
22 }
23
24 int main(void) {
25     void *d = NULL; int i;
26     unsigned int idata[N], odata[N];
27
28     for (i = 0; i < N; i++)
29         idata[i] = (unsigned int)i;
30
31     cudaMalloc((void**)&d, sizeof(int)*N);
32     cudaMemcpy(d, idata, sizeof(int)*N,
33                cudaMemcpyHostToDevice);
34
35     bitreverse<<<1, N, N*sizeof(int)>>>(d);
36
37     cudaMemcpy(odata, d, sizeof(int)*N,
38                cudaMemcpyDeviceToHost);
39
40     for (i = 0; i < N; i++)
41         printf("%u -> %u\n", idata[i], odata[i]);
42
43     cudaFree((void*)d);
44     return 0;
45 }

Walking Through the Code

1. Begin by compiling the bitreverse.cu CUDA application for debugging by entering the following command at a shell prompt:

   nvcc -g -G bitreverse.cu -o bitreverse

   This command assumes the source file name to be bitreverse.cu and that no additional compiler flags are required for compilation. See also Compiling for Debugging.

2. Start the CUDA debugger by entering the following command at a shell prompt:

   cuda-gdb bitreverse

3. Set breakpoints. Set both the host (main) and GPU (bitreverse) breakpoints here. Also set a breakpoint at a particular line in the device function (bitreverse.cu:18):
  Scope: Warp error.
  This occurs when any thread within a warp has executed an illegal instruction.

CUDA_EXCEPTION_5 — Warp Out-of-range Address
  Precision: Not precise. Scope: Warp error.
  This occurs when any thread within a warp accesses an address that is outside the valid range of local or shared memory regions.

CUDA_EXCEPTION_6 — Warp Misaligned Address
  Precision: Not precise. Scope: Warp error.
  This occurs when any thread within a warp accesses an address in the local or shared memory segments that is not correctly aligned.

CUDA_EXCEPTION_7 — Warp Invalid Address Space
  Precision: Not precise. Scope: Warp error.
  This occurs when any thread within a warp executes an instruction that accesses a memory space not permitted for that instruction.

CUDA_EXCEPTION_8 — Warp Invalid PC
  Precision: Not precise. Scope: Warp error.
  This occurs when any thread within a warp advances its PC beyond the 40-bit address space.

CUDA_EXCEPTION_9 — Warp Hardware Stack Overflow
  Precision: Not precise. Scope: Warp error.
  This occurs when any thread in a warp triggers a hardware stack overflow. This should be a rare occurrence.

CUDA_EXCEPTION_10 — Device Illegal Address
  Precision: Not precise. Scope: Global error.
  This occurs when a thread accesses an illegal (out-of-bounds) global address. For increased precision, set cuda memcheck on in cuda-gdb.
Display CUDA kernel events

(cuda-gdb) set cuda kernel_events 1
Display CUDA kernel events.
(cuda-gdb) set cuda kernel_events 0
Do not display CUDA kernel events.

Examples of events displayed

The following are examples of context events displayed:

Context Create of context 0xad2fe60 on Device 0
Context Pop of context 0xad2fe60 on Device 0
Context Destroy of context 0xad2fe60 on Device 0

The following are examples of kernel events displayed:

Launch of CUDA Kernel 1 (kernel3) on Device 0
Termination of CUDA Kernel 1 (kernel3) on Device 0

Checking memory errors

The CUDA MemoryChecker feature allows detection of global memory violations and misaligned global memory accesses. This feature is off by default and can be enabled with the following variable in cuda-gdb before the application is run:

set cuda memcheck on

Once CUDA memcheck is enabled, global memory violations and misaligned global memory accesses are detected only in run or continue mode, not while single-stepping through the code. You can also run the CUDA memory checker as a standalone tool: cuda-memcheck.

GPU Error Reporting

With improved GPU error reporting in cuda-gdb, application bugs are now easier to identify and fix.
6. Verify the CUDA thread of focus with the "info cuda threads" command, and switch between the host thread and the CUDA threads:

(cuda-gdb) info cuda threads
<<<(0,0),(0,0,0)>>> ... <<<(0,0),(255,0,0)>>> bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9

The above output indicates that there is one CUDA block with 256 threads executing, and all the threads are on the same pc.

(cuda-gdb) bt
#0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9

7. Switch to the host thread:

(cuda-gdb) thread
[Current thread is 2 (Thread 140609798666000 (LWP 4153))]
(cuda-gdb) thread 2
[Switching to thread 2 (Thread 140609798666000 (LWP 4153))]
#0 0x0000000000400e7d in main () at bitreverse.cu:35
35 bitreverse<<<1, N, N*sizeof(int)>>>(d);
(cuda-gdb) bt
#0 0x0000000000400e7d in main () at bitreverse.cu:35

8. Switch to the CUDA kernel:

(cuda-gdb) info cuda kernels
* 0 Device 0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
(cuda-gdb) cuda kernel 0
[Switching to CUDA Kernel 0 <<<(0,0),(0,0,0)>>>]
#0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
9 unsigned int *idata = (unsigned int*)data;
(cuda-gdb) bt
#0 bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
The following table shows the new errors that are reported on GPUs with sm_20 and higher. Continuing the execution of your application after these errors could lead to application termination or indeterminate results.

Table 2-1. CUDA Exception Codes

CUDA_EXCEPTION_0 — Device Unknown Exception
  Precision: Not precise. Scope: Global error on the GPU.
  This is a global GPU error caused by the application which does not match any of the listed error codes below. This should be a rare occurrence.

CUDA_EXCEPTION_1 — Lane Illegal Address
  Precision: Precise (requires memcheck on). Scope: Per-lane (thread) error.
  This occurs when a thread accesses an illegal (out-of-bounds) global address.

CUDA_EXCEPTION_2 — Lane User Stack Overflow
  Precision: Precise. Scope: Per-lane (thread) error.
  This occurs when a thread exceeds its stack memory limit.

CUDA_EXCEPTION_3 — Device Hardware Stack Overflow
  Precision: Not precise. Scope: Global error on the GPU.
  This occurs when the application triggers a global hardware stack overflow. The main cause of this error is large amounts of divergence in the presence of function calls.

CUDA_EXCEPTION_4 — Warp Illegal Instruction
  Precision: Not precise.
GTX
Quadro FX 5600
GeForce 8800 Ultra
Tesla C870
Quadro Plex 1000 Model IV
Tesla D870
Quadro Plex 2100 Model S4
Tesla S870

KNOWN ISSUES

The following are known issues with the current release:

▶ X11 cannot be running on the GPU that is used for debugging, because the debugger effectively makes the GPU look hung to the X server, resulting in a deadlock or crash. Two possible debugging setups exist:
  • remotely accessing a single GPU (using VNC, ssh, etc.)
  • using two GPUs, where X11 is running on only one

  Note: The CUDA driver automatically excludes the device used by X11 from being picked by the application being debugged. This can change the behavior of the application.

▶ The debugger enforces blocking kernel launches.
▶ Device memory allocated via cudaMalloc() is not visible outside of the kernel function.
▶ Not all illegal program behavior can be caught in the debugger.
▶ On GPUs with sm_type less than sm_20, it is not possible to step over a subroutine in the device code.
▶ Device allocations larger than 100 MB on Tesla GPUs, and larger than 32 MB on Fermi GPUs, may not be accessible in the debugger.
▶ Breakpoints in divergent code may not behave as expected.
▶ Debugging applications using textures is not supported on GPUs with sm_type less than sm_20. cuda-gdb may output the following error message when setting breakpoints in kernels using textures: "Cannot access memory at address 0x0".
      LN 27/32 pc=0x00000000000002b8 thread (27,0,0)
      LN 28/32 pc=0x00000000000002b8 thread (28,0,0)
      LN 29/32 pc=0x00000000000002b8 thread (29,0,0)
      LN 30/32 pc=0x00000000000002b8 thread (30,0,0)
      LN 31/32 pc=0x00000000000002b8 thread (31,0,0)

info cuda lane

This command displays information per thread level, if you are not interested in the warp-level information for every thread:

(cuda-gdb) info cuda lane
DEV 0:
  Device Type gt200, SM Type sm_13, SM/WP/LN (30/32/32), Regs/LN 128
  SM 0/30 valid warps: 00000000000000ff
    WP 0/32 valid/active/divergent lanes: 0xffffffff/0xffffffff/0x00000000, block (0,0)
      LN 0/32 pc=0x00000000000001b8 thread (0,0,0)

info cuda kernels

This command displays the list of current active kernels and the device on which they run.
▶ To switch focus to a host thread, use the "thread N" command.
▶ To switch focus to a CUDA thread, use the "cuda device sm warp lane kernel grid block thread" command.

Note: It is important to use the "cuda device 0" or "cuda kernel 0" command to switch to the required device or kernel before using any of the other CUDA thread commands.

Supporting an initialization file

cuda-gdb supports an initialization file. The standard .gdbinit file used by standard versions of gdb (6.6) has been renamed to .cuda-gdbinit. This file accepts any cuda-gdb command or extension as input to be processed when the cuda-gdb command is executed.

Pausing CUDA execution at any function symbol or source file line number

cuda-gdb supports setting breakpoints at any host or device function residing in a CUDA application by using the function symbol name or the source file line number. This can be accomplished in the same way for either host or device code. For example, if the kernel's function name is mykernel_main, the break command is as follows:

(cuda-gdb) break mykernel_main

The above command sets a breakpoint at a particular device location (the address of mykernel_main) and forces all resident GPU threads to stop at this location. There is currently no method to stop only certain threads or warps at a given breakpoint.
In the following output example, the * indicates the current kernel (only one right now); the first number is the kernel id and the second number is the device id:

(cuda-gdb) info cuda kernels
* 0 Device 0 acos_main <<<(240,1),(128,1,1)>>> (parms={arg = 0x5100000, res = 0x5100200, n = 5}) at acos.cu:367

Breaking into running applications

cuda-gdb provides support for debugging kernels that appear to be hanging or looping indefinitely. The CTRL+C signal freezes the GPU and reports back the source code location. The current thread focus will be on the host; you can use "cuda kernel <n>" to switch to the device kernel you need. At this point, the program can be modified and then either resumed or terminated at the developer's discretion.

This feature is limited to applications running within the debugger. It is not possible to break into and debug applications that have been previously launched.

Displaying context and kernel events

"Kernel" refers to your device code that executes on the GPU, while "context" refers to the virtual address space on the GPU for your kernel. Beginning with cuda-gdb version 3.2, you can turn ON or OFF the display of CUDA context and kernel events to review the flow of the active contexts and kernels.

Display CUDA context events

(cuda-gdb) set cuda context_events 1
Display CUDA context events.
(cuda-gdb) set cuda context_events 0
Do not display CUDA context events.
NVIDIA CUDA-GDB
NVIDIA CUDA Debugger
DU-05227-001_V3.2

TABLE OF CONTENTS

1. Introduction
   cuda-gdb: The NVIDIA CUDA Debugger
   What's New in Version 3.2
2. cuda-gdb Features and Extensions
   Getting Help
   Debugging CUDA applications on GPU hardware in real time
   Extending the gdb debugging environment
   Supporting an initialization file
   Pausing CUDA execution at any function symbol or source file line number
   Single-stepping individual warps
   Displaying device memory in the device kernel
   Variable Storage and Accessibility
   Switching to any coordinate or kernel
   Inspecting the coordinates or kernel
   Changing the coordinate or kernel focus
   cuda-gdb info commands
   Breaking into running applications
   Displaying context and kernel events
   Display CUDA context events
info cuda device

This command displays the device information, with an SM header in addition to the device header, per GPU. The SM header lists all the SMs that are actively running CUDA blocks, with the valid warp mask in each SM:

(cuda-gdb) info cuda device
DEV 0:
  Device Type gt200, SM Type sm_13, SM/WP/LN (30/32/32), Regs/LN 128
  SM 0/30 valid warps: 00000000000000ff

info cuda devices

This command, similar to info cuda kernels, displays the list of devices currently running a kernel, in device index order. The device currently in focus is indicated with a * character. The first field is the device ID, the second field is the kernel ID, and the third and last field is the kernel invocation (kernel name and arguments). Example:

(cuda-gdb) info cuda devices
* 0 Kernel 1 kernel0 <<<(1,1),(1,1,1)>>> (...)
  1 Kernel 2 kernel1 <<<(1,1),(1,1,1)>>> (...)

info cuda sm

This command displays the warp header, in addition to the SM and the device headers, for every active SM. The warp header lists all the warps, with valid, active, and divergent lane mask information for each warp. The warp header also includes the block index within the grid to which it belongs. The example below lists eight warps.
Variable Storage and Accessibility

Depending on the variable type and usage, variables can be stored either in registers or in local, shared, const, or global memory. You can print the address of any variable to find out where it is stored and directly access the associated memory.

The example below shows how the variable array, which is of type @shared int*, can be directly accessed in order to see what the stored values are in the array:

(cuda-gdb) p &array
$1 = (@shared int (*)[0]) 0x20
(cuda-gdb) p array[0]@4
$2 = {0, 128, 64, 192}

You can also access the shared memory indexed into the starting offset to see what the stored values are:

(cuda-gdb) p *(@shared int*)0x20
$3 = 0
(cuda-gdb) p *(@shared int*)0x24
$4 = 128
(cuda-gdb) p *(@shared int*)0x28
$5 = 64

The example below shows how to access the starting address of the input parameter to the kernel:

(cuda-gdb) p &data
$6 = (const @global void * const @parameter *) 0x10
(cuda-gdb) p *(@global void * const @parameter *) 0x10
$7 = (@global void * const @parameter) 0x110000

Switching to any coordinate or kernel

To support CUDA thread and block switching, new commands have been introduced to inspect or change the logical coordinates (kernel, grid, block, thread) and the physical coordinates (device, sm, warp, lane).
Variables stored in more than one register can now be accessed.

▶ Multi-GPU support — cuda-gdb now supports the inspection, execution control, and placement of breakpoints with multiple CUDA-capable devices. See Multi-GPU debugging for more information.

▶ New cuda-gdb commands:
  • Added a command to display the list of devices currently running a kernel, info cuda devices, similar to the info cuda kernels command. See info cuda devices.
  • Added commands to display context and kernel events: set cuda context_events and set cuda kernel_events. See Displaying context and kernel events. ("Kernel" refers to your device code that executes on the GPU, while "context" refers to the virtual address space on the GPU for your kernel.)

CUDA-GDB FEATURES AND EXTENSIONS

Just as programming in CUDA C is an extension to C programming, debugging with cuda-gdb is a natural extension to debugging with gdb. cuda-gdb supports debugging CUDA applications that use the CUDA driver APIs in addition to runtime APIs, and supports debugging just-in-time (JIT) compiled PTX kernels.

The following sections describe the cuda-gdb features that facilitate debugging CUDA applications:

▶ Getting Help
▶ Debugging CUDA applications on GPU hardware in real time
▶ Extending the gdb debugging environment
Single-stepping individual warps

cuda-gdb supports stepping GPU code at the finest granularity of a warp. This means that typing "next" or "step" from the cuda-gdb command line, when in the focus of device code, advances all threads in the same warp as the current thread of focus. In order to advance the execution of more than one warp, a breakpoint must be set at the desired location and then the application execution continued.

A special case is the stepping of the thread barrier call __syncthreads(). In this case, an implicit breakpoint is set immediately after the barrier and all threads are continued to this point.

On GPUs with sm_type less than sm_20 it is not possible to step over a subroutine in the device code; instead, cuda-gdb always steps into the device function. On GPUs with sm_20 and higher you can step in, over, or out of device functions, by explicitly using the __noinline__ keyword on that function or compiling the code with -arch sm_20.

Displaying device memory in the device kernel

The gdb print command has been extended to decipher the location of any program variable, and can be used to display the contents of any CUDA program variable, including:

▶ allocations made via cudaMalloc()
▶ data that resides in various GPU memory regions, such as shared, local, and global memory
▶ special CUDA runtime variables, such as threadIdx
CUDA_EXCEPTION_11 — Lane Misaligned Address
  Precision: Precise (requires memcheck on). Scope: Per-lane (thread) error.
  This occurs when a thread accesses a global address that is not correctly aligned.

Multi-GPU debugging

Multi-GPU debugging is not much different from single-GPU debugging, except for a few additional cuda-gdb commands that let you switch between the GPUs. Any GPU hitting a breakpoint will pause all the GPUs running CUDA on that system. Once paused, you can use info cuda kernels to view all the active kernels and the GPUs they are running on. When any GPU is resumed, all the GPUs are resumed.

All CUDA-capable GPUs can run the same or different kernels. To switch to an active kernel, use "cuda kernel <n>" or "cuda device <n>" to switch to the desired GPU, where n is the id of the kernel or GPU retrieved from info cuda kernels. Once you are on an active kernel and a GPU, the rest of the process is the same as single-GPU debugging.

Note: The same module, and therefore the same kernel, can be loaded and used by different contexts and devices at the same time. When a breakpoint is set in such a kernel, by either name or file name and line number, it will be resolved arbitrarily to only one instance of that kernel. With the runtime API,
▶ Debugging multiple contexts in a GPU: Debugging applications with multiple CUDA contexts running on the same GPU is not supported on any GPU.

▶ Debugging device functions: On GPUs with sm_20, if you are debugging code in device functions that get called by multiple kernels, then setting a breakpoint in the device function will insert the breakpoint in only one of the kernels.

▶ If the CUDA environment variable CUDA_VISIBLE_DEVICES=<index> is used to target a particular GPU, then make sure the X server is not running on any of the GPUs. If X is running, then reduce the <index> count by one more, since the GPU running X is not visible to the application when running under the debugger.

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable.
   GPU Requirements
Appendix B. Known Issues

INTRODUCTION

This document introduces cuda-gdb, the NVIDIA CUDA debugger, and describes what is new in version 3.2.

cuda-gdb: The NVIDIA CUDA Debugger

cuda-gdb is an extension to the standard i386/AMD64 port of gdb, the GNU Project debugger, version 6.6. It is designed to present the user with a seamless debugging environment that allows simultaneous debugging of GPU and CPU code. Standard debugging features are inherently supported for host code, and additional features have been provided to support debugging CUDA code. cuda-gdb is supported on 32-bit and 64-bit Linux.

Note: All information contained within this document is subject to change.

What's New in Version 3.2

In this latest cuda-gdb version, the following improvements and changes have been made:

▶ GPU error reporting — cuda-gdb now catches and reports all hardware errors on GPUs with sm_type sm_20 and higher. These are hardware errors caused by the application, and get reported as thread-level, warp-level, and global errors. Please note that not all errors are precise. See GPU Error Reporting for more information.

▶ Double register support.
nvcc -g -G foo.cu -o foo

Using this line to compile the CUDA application foo.cu:

▶ forces -O0 (mostly unoptimized) compilation
▶ makes the compiler include symbolic debugging information in the executable

Compiling/Debugging for Fermi and Tesla GPUs

Compiling for Fermi GPUs

If you are using the latest Fermi board, add the following flags to target Fermi output when compiling the application:

-gencode arch=compute_20,code=sm_20

Compiling for Fermi and Tesla GPUs

If you are targeting both Fermi and Tesla GPUs, include these two flags:

-gencode arch=compute_20,code=sm_20 -gencode arch=compute_10,code=sm_10

CUDA-GDB WALK-THROUGH

This chapter presents a cuda-gdb walk-through of eleven steps, based on the source code bitreverse.cu, which performs a simple 8-bit bit reversal on a data set.

bitreverse.cu Source Code

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 // Simple 8-bit bit reversal Compute test
 5
 6 #define N 256
 7
 8 __global__ void bitreverse(void *data) {
 9     unsigned int *idata = (unsigned int*)data;
10     extern __shared__ int array[];
11
12     array[threadIdx.x] = idata[threadIdx.x];
13
14     array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
15                          ((0x0f0f0f0f & array[threadIdx.x]) << 4);
16     array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
17                          ((0x33333333 & array[threadIdx.x]) << 2);
However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA, the NVIDIA logo, NVIDIA nForce, GeForce, NVIDIA Quadro, NVDVD, NVIDIA Personal Cinema, NVIDIA Soundstorm, Vanta, TNT2, TNT, RIVA, RIVA TNT, VOODOO, VOODOO GRAPHICS, WAVEBAY, Accuview Antialiasing, Detonator, Digital Vibrance Control, ForceWare, NVRotate, NVSensor, NVSync, PowerMizer, Quincunx Antialiasing, Sceneshare, See What You've Been Missing, StreamThru, SuperStability, T-BUFFER, The Way It's Meant to be Played Logo, TwinBank, TwinView, and the Video & Nth Superscript Design Logo are registered trademarks or trademarks of NVIDIA Corporation in the United States and/or other countries. Other company and product names may be trademarks or registered trademarks of the respective owners with which they are associated.

Copyright © 2007–2010 NVIDIA Corporation. All rights reserved.

www.nvidia.com
info cuda warp

This command additionally displays the lane header for each warp, listing every valid lane with its program counter and thread index:

(cuda-gdb) info cuda warp
DEV 0:
  Device Type gt200, SM Type sm_13, SM/WP/LN (30/32/32), Regs/LN 128
  SM 0/30 valid warps: 00000000000000ff
    WP 0/32 valid/active/divergent lanes: 0xffffffff/0xffffffff/0x00000000, block (0,0)
      LN 0/32 pc=0x00000000000002b8 thread (0,0,0)
      LN 1/32 pc=0x00000000000002b8 thread (1,0,0)
      LN 2/32 pc=0x00000000000002b8 thread (2,0,0)
      LN 3/32 pc=0x00000000000002b8 thread (3,0,0)
      LN 4/32 pc=0x00000000000002b8 thread (4,0,0)
      LN 5/32 pc=0x00000000000002b8 thread (5,0,0)
      LN 6/32 pc=0x00000000000002b8 thread (6,0,0)
      ...
The difference between grid and kernel is that grid is a unique identifier for a kernel launch on a given device (a per-device launch id), whereas kernel is a unique identifier for a kernel launch across multiple devices.

Inspecting the coordinates or kernel

To see the current selection, use the cuda command followed by a space-separated list of parameters.

Example: determining the coordinates:

(cuda-gdb) cuda device sm warp lane block thread
Current CUDA focus: device 0, sm 0, warp 0, lane 0, block (0,0), thread (0,0,0).

Example: determining the kernel focus:

(cuda-gdb) cuda kernel
Current CUDA kernel 0 (device 0, sm 0, warp 0, lane 0, grid 1, block (0,0), thread (0,0,0)).

Changing the coordinate or kernel focus

To change the focus, specify a value for the parameter you want to change. For example:

To change the physical coordinates:

(cuda-gdb) cuda device 0 sm 1 warp 2 lane 3
New CUDA focus: device 0, sm 1, warp 2, lane 3, grid 1, block (10,0), thread (67,0,0).

To change the logical coordinates (thread parameter):

(cuda-gdb) cuda thread (15,0,0)
New CUDA focus: device 0, sm 0, warp 0, lane 15, grid 1, block (0,0), thread (15,0,0).
25. id 1 9186 1 0 aasad 3 0 0 Note If the specified set of coordinates is incorrect cuda gdb will try to find the lowest set of valid coordinates If cuda thread selection is set to logical the lowest set of valid logical coordinates will be selected If cuda thread selection is set to physical the lowest set of physical coordinates will be selected Use set cuda thread_selection to switch the value To change the kernel focus specify a value to the parameter you want to change To change the kernel focus cuda gdb cuda kernel 0 Switching to CUDA Kernel 0 lt lt lt 0 0 0 0 0 gt gt gt 0 acos main lt lt lt 240 1 128 1 1 gt gt gt parms arg 0x5100000 res OS MNO m SC OS M GU 167 367 int totalThreads gridDim x blockDim x CUDA GDB NVIDIA CUDA Debugger DU 05227 001 V3 2 8 Chapter 02 CUDA GDB FEATURES AND EXTENSIONS cuda gdb info commands Note The command info cuda state is no longer supported info cuda system This command displays system information that includes the number of GPUs in the system with device header information for each GPU The device header includes the GPU type compute capability of the GPU number of SMs per GPU number of warps per SM number of threads lanes per warp and the number of registers per thread Example cuda gdb info cuda system umber of devices 1 N DEV O 1 Device Type gt200 SM Type sm 13
…point at a particular line in the device function (bitreverse.cu:18).
(cuda-gdb) b main
Breakpoint 1 at 0x400db0: file bitreverse.cu, line 25.
(cuda-gdb) b bitreverse
Breakpoint 2 at 0x40204f: file bitreverse.cu, line 8.
(cuda-gdb) b 21
Breakpoint 3 at 0x40205b: file bitreverse.cu, line 21.

4. Run the CUDA application; it executes until it reaches the first breakpoint (main) set in step 3.
(cuda-gdb) r
Starting program: /old/ssalian-local/src/rel/gpgpu/toolkit/r3.1/bin/x86_64_Linux_debug/bitreverse
[Thread debugging using libthread_db enabled]
[New process 4153]
[New Thread 140609798666000 (LWP 4153)]
[Switching to Thread 140609798666000 (LWP 4153)]

Breakpoint 1, main () at bitreverse.cu:25
25      void *d = NULL; int i;

At this point, commands can be entered to advance execution or to print the program state. For this walkthrough, continue to the device kernel.
(cuda-gdb) c
Continuing.
Breakpoint 3 at 0x1e30910: file bitreverse.cu, line 21.
[Launch of CUDA Kernel 0 on Device 0]
[Switching to CUDA Kernel 0 (<<<(0,0),(0,0,0)>>>)]

Breakpoint 2, bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:9
9       unsigned int *idata = (unsigned int*)data;

cuda-gdb has detected that a CUDA device kernel has been reached, so it prints the current CUDA thread of focus.
…valid warps with 32 active threads each. There is no thread divergence on any of the valid active warps.

(cuda-gdb) info cuda sm
DEV 0:
Device Type: gt200  SM Type: sm_13  SM/WP/LN: 30/32/32  Regs/LN: 128
SM 0/30: valid warps: 00000000000000ff
WP 0/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 1/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 2/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 3/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 4/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 5/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 6/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)
WP 7/32: valid/active/divergent lanes: 0xffffffff 0xffffffff 0x00000000  block (0,0)

info cuda warp

This command takes the detailed information one level deeper by displaying lane information for all the threads in the warps. The lane header includes all the active threads per warp: it shows the program counter in addition to the thread index within the block to which each lane belongs. The example below lists the 32 active lanes on the first active warp (index 0).
(cuda-gdb) p/x array[0]@12
$8 = {0x0, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0, 0x10, 0x90, 0x50, 0xd0}
(cuda-gdb) p &data
$9 = (@global void * @parameter *) 0x10
(cuda-gdb) p *(@global void * @parameter *) 0x10
$10 = (@global void * @parameter) 0x100000

11. Delete the breakpoints and continue the program to completion.
(cuda-gdb) delete b
Delete all breakpoints? (y or n) y
(cuda-gdb) continue
Continuing.
Program exited normally.
(cuda-gdb)

This concludes the cuda-gdb walkthrough.

SUPPORTED PLATFORMS

The general platform and GPU requirements for running NVIDIA cuda-gdb are described in this section.

Host Platform Requirements

NVIDIA supports cuda-gdb on the 32-bit and 64-bit Linux distributions listed below:
> Red Hat Enterprise Linux 4.8 & 5.5
> Fedora 13
> Novell SLED 11 SP1
> openSUSE 11.1
> Ubuntu 10.04

GPU Requirements

Debugging is supported on all CUDA-capable GPUs with a compute capability of 1.1 or later. Compute capability is a device attribute that a CUDA application can query; for more information, see the latest NVIDIA CUDA Programming Guide on the NVIDIA CUDA Zone web site: http://www.nvidia.com/object/cuda_home.html

The following GPUs have a compute capability of 1.0 and are not supported:
GeForce 8800 GTS
Quadro FX 4600
GeForce 880…
9       unsigned int *idata = (unsigned int*)data;

The above output indicates that the host thread of focus has LWP ID 9146 and the current CUDA thread has block coordinates (0,0) and thread coordinates (0,0,0).

7. Corroborate this information by printing the block and thread indices:
(cuda-gdb) print blockIdx
$1 = {x = 0, y = 0}
(cuda-gdb) print threadIdx
$2 = {x = 0, y = 0, z = 0}

8. The grid and block dimensions can also be printed:
(cuda-gdb) print gridDim
$3 = {x = 1, y = 1}
(cuda-gdb) print blockDim
$4 = {x = 256, y = 1, z = 1}

9. Since thread (0,0,0) reverses the value of 0, switch to a different thread to show more interesting data:
(cuda-gdb) cuda thread (170,0,0)
Switching to CUDA Kernel 0 (device 0, sm 0, warp 5, lane 10, grid 1, block (0,0), thread (170,0,0))

10. Advance kernel execution and verify some data:
(cuda-gdb) n
12      array[threadIdx.x] = idata[threadIdx.x];
(cuda-gdb) n
14      array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) | ((0x0f0f0f0f & array[threadIdx.x]) << 4);
(cuda-gdb) n
16      array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) | ((0x33333333 & array[threadIdx.x]) << 2);
(cuda-gdb) n
18      array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) | ((0x55555555 & array[threadIdx.x]) << 1);
(cuda-gdb) n

Breakpoint 3, bitreverse <<<(1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:21
21      idata[threadIdx.x] = array[threadIdx.x];
(cuda-gdb) p array[0]@12
$7 = {0, 128, 64, 192, 32, 160, 96, 224, 16, 144, 80, 208}