Home

cuda-gdb V2.3 beta 22 pages ()

1. a Multi GPU applications are not supported CUDA GDB can debug only CUDA applications that use one GPU and the CUDA driver supports this by letting only one GPU remain visible to an application that is being debugged Note Because the CUDA driver starting with CUDA 2 3 Beta automatically excludes all but one GPU when an application is being debugged the application s behavior could be affected PG 00000 004_ V2 3 17 NVIDIA NVIDIA CUDA Debugger CUDA GDB 18 The debugger enforces blocking kernel launches Device memory allocated via cudaMalloc is not visible outside of the kernel function Host memory allocated with cudaMallocHost is not visible in CUDA GDB Not all illegal program behavior can be caught in the debugger examples include out of bounds memory accesses or divide by zero situations It is not currently possible to step over a subroutine in device code Debugging using the device driver API is not supported PG 00000 004 V2 3 NVIDIA
2. Q makes the compiler include symbolic debugging information in the executable Note It is currently not possible to generate debugging information when compiling with the cubin option PG 00000 004 V2 3 9 NVIDIA CHAPTER CUDA GDB Walkthrough This chapter presents a CUDA GDB walk through of twelve steps based on the following source code bitreverse cu which performs a simple 8 bit bit reversal on a data set 1 include lt stdio h gt 2 include lt stdlib h gt 3 4 Simple 8 bit bit reversal Compute test 5 6 define N 256 7 8 global void bitreverse unsigned int data 9 10 unsigned int idata data 11 12 unsigned int x idata threadIdx x 13 14 Oxf0f0fO0fO amp x gt gt 4 0x0fofofof amp x lt lt 4 15 x Oxcccccccc amp x gt gt 2 0x33333333 amp x lt lt 2 16 Oxaaaaaaaa amp x gt gt 1 0x55555555 amp x lt lt 1 1 7 18 idata threadIdx x x PG 00000 004 V2 3 00 00 o NVIDIA CHAPTER 4 CUDA GDB Walkthrough 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 int main void unsigned int d NULL int i unsigned int idata N odata N for 1 0 i lt N itt idata i unsigned int i cudaMalloc void amp d sizeof int N cudaMemcpy d idata sizeof int N cudaMemcpyHostToDevice bitreverse lt lt lt 1 N gt gt gt d c
3. NVIDIA CUDA Debugger CUDA GDB Displaying CUDA blocks and threads The CUDA GDB command info cuda threads displays a summary of all CUDA threads that are currently resident on the GPU CUDA threads are specified using the syntax described in Switching to any CUDA block or thread on page 7 and are summarized by grouping all contiguous threads that are stopped at the same program location A sample display is shown below lt lt lt 0 0 0 0 0 gt gt gt lt lt lt 0 0 31 0 0 gt gt gt GPUBlackScholesCallPut at blackscholes cu 73 lt lt lt 0 0 32 0 0 gt gt gt lt lt lt 119 0 0 0 0 gt gt gt GPUBlackScholesCallPut at blackscholes cu 72 The above example shows that 32 threads a warp have been advanced to line 73 of blackscholes cu and that the remainder of the resident threads stopped at line 72 Since this summary only shows thread coordinates for the start and end range it may be unclear how many threads or blocks are actually within the displayed range This can be checked by printing the dimension values gridDim and blockDim CUDA GDB also has the ability to display a full list of each individual thread that is currently resident on the GPU by using the command info cuda threads all 6 PG 00000 004 V2 3 NVIDIA CHAPTER 2 CUDA GDB Features and Extensions Switching to any CUDA block or thread To support CUDA thread and block switching CUDA GDB provides an ext
4. NVIDIA CUDA DEBUGGER CUDA GDB User Manual PG 00000 004 V2 3 June 2009 CUDA GDB PG 00000 004 V2 3 Published by NVIDIA Corporation 2701 San Tomas Expressway Santa Clara CA 95050 Notice ALL NVIDIA DESIGN SPECIFICATIONS REFERENCE BOARDS FILES DRAWINGS DIAGNOSTICS LISTS AND OTHER DOCUMENTS TOGETHER AND SEPARATELY MATERIALS ARE BEING PROVIDED AS IS NVIDIA MAKES NO WARRANTIES EXPRESSED IMPLIED STATUTORY OR OTHERWISE WITH RESPECT TO THE MATERIALS AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE Information furnished is believed to be accurate and reliable However NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use No license is granted by implication or otherwise under ane patent or patent rights of NVIDIA Corporation Specifications mentioned in this publication are subject to change without notice This publication supersedes and replaces all information previously supplied NVIDIA ee oa tare products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation Trademarks NVIDIA the NVIDIA logo CUDA GeForce Tesla NVIDIA Quadro and Quadro are trademarks or registered trademarks of NVIDIA Corporation in the U
5. This enables developers to verify program correctness without the potential variations introduced by simulation and emulation environments Extending the GDB debugging environment GPU memory is treated as an extension to host memory and GPU threads and blocks are treated as extensions to host threads Furthermore there is no difference between CUDA GDB and GDB when debugging host code Supporting an initialization file CUDA GDB supports an initialization file which must reside in your home directory cuda gdbinit This file accepts any CUDA GDB command or extension as input to be processed when the cuda gdb command is executed It is just like the gabinit file used by standard versions of GDB only renamed Pausing CUDA execution at any function symbol or source file line number CUDA GDB supports setting breakpoints at any host or device function residing in a CUDA application by using the function symbol name or the source file line number This can be accomplished in the same way for either host or device code For example if the kernel s function name is mykernel main the break command is as follows cuda gdb break mykernel main The above command sets a breakpoint at a particular device location the address of mykernel main and forces all resident GPU threads to 4 PG 00000 004 V2 3 NVIDIA CHAPTER 2 CUDA GDB Features and Extensions stop at this location There is currently no method to stop only
6. Red Hat Enterprise Linux 5 x Q Red Hat Enterprise Linux 4 x Q Fedora 10 Q Novell SLED 11 Q Novell SLED 10 SP2 Q openSUSE 11 1 Q Ubuntu 9 04 a Ubuntu 8 10 PG 00000 004 V2 3 S D NVIDIA NVIDIA CUDA Debugger CUDA GDB GPU Reguirements 16 Debugging is supported on all CUDA capable GPUs with a compute capability of 1 1 or later Compute capability is a device attribute that a CUDA application can query about for more information see the latest NVIDIA CUDA Programming Guide on the NVIDIA CUDA Zone Web site http www nvidia com object cuda_home htmlf These GPUs have a compute capability of 1 0 and are not supported GeForce 8800 GTS Quadro FX 4600 GeForce 8800 GTX Quadro FX 5600 GeForce 8800 Ultra Tesla C870 Quadro Plex 1000 Model IV Tesla D870 Quadro Plex 2100 Model 54 Tesla S870 PG 00000 004 V2 3 NVIDIA APPENDIX Known lssues The following are known to be issues with the current release a X11 cannot be running on the GPU that is used for debugging because the debugger effectively makes the GPU look hung to the X server resulting in a deadlock or crash Two possible debugging setups exist remotely accessing a single GPU using VNC ssh etc using two GPUs where X11 is running on only one Note Starting with CUDA 2 2 Beta the CUDA driver automatically excludes the device used by X11 from being picked by the application being debugged This can change the behavior of the application
7. DA GDB Walkthrough 200 cece eee eee eee 10 A Supported Platforms c eee eee 15 Host Platform Requirements soso cane ela LE ERE fee eee fhe a eee nen a 15 GRUREGUITEMEMES ioiii cine stan ded aiaiai OR AAA Ge a aA HOM Cae ded edb Reed donde OL vaha s 16 hoe eee eee ee ee ee ee ee er er 1 er ee ii aia 17 PG 00000 004 V2 3 v NVIDIA NVIDIA CUDA Debugger CUDA GDB vi NVIDIA PG 00000 004 V2 3 CHAPTER Introduction CUDA GDB the NVIDIA CUDA debugger is introduced and what is new in the 2 3 Beta version is described CUDA GDB The NVIDIA CUDA Debugger CUDA GDB is an extension to the standard 1386 AMD64 port of GDB the GNU Project debugger version 6 6 It is designed to present the user with an all in one debugging environment capable of debugging native host code as well as CUDA code Standard debugging features are inherently supported for host code and additional features have been provided to support debugging CUDA code Starting with the 2 2 Beta release CUDA GDB is supported on 32 bit and 64 bit Linux Note All information contained within this document is subject to change PG 00000 004_V2 3 1 NVIDIA NVIDIA CUDA Debugger CUDA GDB What s New in the 2 3 Beta Version Inthis latest CUDA GDB version the following improvements have been made Q The number of supported Linux platforms has been increased See Host Platform Requirements on page 15 for the cu
8. certain threads or warps at a given breakpoint Single stepping individual warps CUDA GDB supports stepping GPU code at the finest granularity of a warp This means that typing next or step from the CUDA GDB command line when in the focus of device code advances all threads in the same warp as the current thread of focus In order to advance the execution of more than one warp a breakpoint must be set at the desired location A special case is the stepping of the thread barrier call __syncthreads In this case an implicit breakpoint is set immediately after the barrier and all threads are continued to this point It is important to note that it is not currently possible to step over a device subroutine Since all device subroutines are implicitly inlined CUDA GDB always steps into a device subroutine Displaying device memory in the device kernel The GDB print command has been extended to decipher the location of any program variable and can be used to display the contents of any CUDA program variable including a allocations made via cudaMalloc Q data that resides in various GPU memory regions such as shared local and global memory a special CUDA runtime variables such as threadIdx Displaying CUDA state information CUDA GDB provides a command info cuda state that displays information such as the current GPU being used and memory that has been allocated with cudaMalloc PG 00000 004 V2 3 5 NVIDIA
9. cuda gdb next Current CUDA Thread lt lt lt 0 0 170 0 0 gt gt gt bitreverse at bitreverse cu 12 unsigned int x idata threadIdx x PG 00000 004_V2 3 13 NVIDIA NVIDIA CUDA Debugger CUDA GDB 11 12 cuda gdb next Current CUDA Thread lt lt lt 0 0 170 0 0 gt gt gt bitreverse at bitreverse cu 14 x OxfOfOfOfO amp x gt gt 4 OxOfofofof amp x lt lt 4 cuda gdb print x 5 170 cuda gdb print x x 6 Oxaa This verifies thread 170 0 0 is working on the correct data 170 Use the last breakpoint set at bit reverse cu 18 to verify that the logic is correct to reverse the original data cuda gdb continue Continuing Current CUDA Thread lt lt lt 0 0 170 0 0 gt gt gt Breakpoint 3 bitreverse at bitreverse cu 18 idata threadIdx x x cuda gdb print x 7 85 cuda gdb print x x 8 0x55 Delete the breakpoints and continue the program to completion cuda gdb delete b Delete all breakpoints y or n y cuda gdb continue Continuing Program exited normally cuda gdb This concludes the CUDA GDB walkthrough 14 PG 00000 004 V2 3 NVIDIA APPENDIX Supported Platforms The general platform and GPU requirements for running NVIDIA CUDA GDB are described in this section Host Platform Requirements NVIDIA supports CUDA GDB on the 32 bit and 64 bit Linux distributions listed below a
10. ension to the GDB thread command that uses the CUDA syntax as follows thread lt lt lt BX BY TX TY TZ gt gt gt This extension supports multiple variations Q Providing fewer coordinates for the CUDA thread or block than are indicated sets the specified coordinates and clears all others to 0 The command thread lt lt lt 0 1 gt gt gt switches to the CUDA block with X coordinate 0 and Y coordinate 0 and to the CUDA thread with X coordinate 1 and Y and Z coordinates 0 This is the same as the command thread lt lt lt 0 0 1 0 0 gt gt gt Q Providing only the CUDA thread coordinates maintains the current block of focus while switching to the specified CUDA thread The command thread lt lt lt 10 gt gt gt maintains the current CUDA block and switches to the CUDA thread with X coordinate 10 and Y and Z coordinates 0 It is a shorthand version of the previous command thread lt lt lt 0 1 gt gt gt and only works for specifying threads within a current block Breaking into running applications CUDA GDB provides support for debugging kernels that appear to be hanging or looping indefinitely The CTRL C signal freezes the GPU and reports back the source code location At this point the program can be modified and then either resumed or terminated at the developer s discretion This feature is limited to applications running within the debugger It is not possible to break into and debu
11. g applications that have been previously launched PG 00000 004 V2 3 7 NVIDIA A a 9 ee i CHAPTER Installation and Debug Compilation Included in this chapter are instructions for installing CUDA GDB and for using NVCC the NVIDIA CUDA compiler driver to compile CUDA programs for debugging Installation Instructions Follow these steps to install NVIDIA CUDA GDB 1 Visit the NVIDIA CUDA Zone download page http www nvidia com object cuda_get html 2 Select the appropriate Linux operating system See Host Platform Requirements on page 15 3 Download and install the 2 3 Beta CUDA Driver Download and install the 2 3 Beta CUDA Toolkit This installation should point the environment variable LD_LIBRARY_PATH to usr local cuda lib and should also include usr local cuda bin in the environment variable PATH 5 Download and install the 2 3 Beta CUDA Debugger PG 00000 004 V2 3 8 NVIDIA CHAPTER 3 Installation and Debug Compilation Compiling for Debugging NVCC the NVIDIA CUDA compiler driver provides a mechanism for generating the debugging information necessary for CUDA GDB to work properly The g G option pair must be passed to NVCC when an application is compiled in order to debug with CUDA GDB for example nvcc g G foo cu o foo Using this line to compile the CUDA application foo cu Q forces 00 mostly unoptimized compilation which spills all variables to local memory
12. nited States and other countries Other company and product names may be trademarks of the respective companies with which they are associated Copyright 2007 2009 by NVIDIA Corporation All rights reserved NVIDIA Corporation a ON Kot OCCUR jopa fee ee wave SEERE SES ERNE ses ana aaa alee te Pap 1 CUDA GDB The NVIDIA CUDA Debugger 0000 cece eee 1 Whats Newin the 2 3 Beta Version erai cn aa kd todd tae ie wae ali nl Eid 2 2 CUDA GDB Features and Extensions 2000ce cece eee eee eee eee eee 3 Debugging CUDA applications on GPU hardware in real time saanunna 4 Extending the GDB debugging environment 0 00000 cece 4 Supporting an imitialization ME cernes tunes Rae RRR TERR Re eR MERE eR NEGER 4 Pausing CUDA execution at any function symbol or source file line number 4 Single stepping individual warpS KAR dre de DD ee 5 Displaying device memory in the device kernel 1 0 2 00 ee 5 Displaying CUDA state information oc ds0 cs6 2 Fone eatin ede eee ee ego a Bie dae 3 5 Displaying CUDA blocks and threads 1 ee 6 Switching to any CUDA block or thread s aususest ee eee 7 Breaking Into running applications siiniga deb eee ba ke aralt khk a baka wees 7 3 Installation and Debug Compilation 0 000 cece eee eee 8 installation MStMMctlOns a asc a2 cnen eden AEA de SOE jka ad BRNO dod we Reo Pe Sete a 8 Compiling for DEbUGOING 20 cups bdea hee SAME ARR ea eed 9 4 CU
13. ontinue to the device kernel cuda gdb continue Continuing Current CUDA Thread lt lt lt 0 0 0 0 0 gt gt gt Breakpoint 2 bitreverse at bitreverse cu 10 unsigned int idata data CUDA GDB has detected that a CUDA device kernel has been reached so it prints the current CUDA thread of focus 12 PG 00000 004 V2 3 NVIDIA CHAPTER 4 CUDA GDB Walkthrough 6 Verify the CUDA thread of focus with the thread command cuda gdb thread Current Thread 2 Thread 1584864 LWP 9146 Current CUDA Thread lt lt lt 0 0 0 0 0 gt gt gt The above output indicates that the host thread of focus has LWP ID 9146 and the current CUDA thread has block coordinates 0 0 and thread coordinates 0 0 0 7 Corroborate this information by printing the block and thread indices cuda gdb print blockIdx 1 x 0 y 0 cuda gdb print threadIdx 2 x 0 y 0 z 0 8 The grid and block dimensions can also be printed cuda gdb print gridDim 3 x 1 y 1 cuda gdb print blockDim 4 x 256 y 1 1 9 Since thread 0 0 0 reverses the value of 0 switch to a different thread to show more interesting data cuda gdb thread lt lt lt 170 gt gt gt Switching to lt lt lt 0 0 170 0 0 gt gt gt bitreverse at bitreverse cu 10 unsigned int idata data 10 Advance the execution to verify the data value that thread 170 0 0 should be working on
14. rrent list Q Scope shadowing is supported If a variable is introduced in an inner scope and has the same name as a variable in an outer scope the inner scope variable s value can now be seen a CUDA GDB is now integrated into the toolkit installer PG 00000 004 V2 3 NVIDIA CHAPTER CUDA GDB Features and Extensions Just as the CUDA programming model provides a seamless mechanism for programming host and GPU code CUDA GDB provides a model for seamlessly debugging both host and GPU code CUDA GDB provides a number of features to facilitate debugging CUDA applications Q Debugging CUDA applications on GPU hardware in real time on page 4 Q Extending the GDB debugging environment on page 4 a Supporting an initialization file on page 4 a Pausing CUDA execution at any function symbol or source file line number on page 4 a Single stepping individual warps on page 5 Q Displaying device memory in the device kernel on page 5 Q Displaying CUDA state information on page 5 Q Displaying CUDA blocks and threads on page 6 Q Switching to any CUDA block or thread on page 7 Q Breaking into running applications on page 7 PG 00000 004_V2 3 BS NVIDIA NVIDIA CUDA Debugger CUDA GDB Debugging CUDA applications on GPU hardware in real time The goal of CUDA GDB is to provide developers a mechanism for debugging a CUDA application on actual hardware in real time
15. udaMemcpy odata qd sizeof int N cudaMemcpyDeviceToHost for 1 0 i lt N itt printf u gt Su n idata i odatali cudaFree void d return 0 1 Begin by compiling the bitreverse cu CUDA application for debugging by entering the following command ata shell prompt S nvcc g G bitreverse cu o bitreverse This command assumes the source file name to be bitreverse cu and that no additional compiler flags are required for compilation See also Compiling for Debugging on page 9 PG 00000 004 V2 3 NVIDIA 11 NVIDIA CUDA Debugger CUDA GDB 2 Start the CUDA debugger by entering the following command at a shell prompt cuda gdb bitreverse 3 Set breakpoints Set both the host main and GPU bit reverse breakpoints here Also set a breakpoint at a particular line in the device function bitreverse cu 18 cuda gdb break main Breakpoint 1 at 0x805le8c file bitreverse cu line 23 cuda gdb break bitreverse Breakpoint 2 at 0x805b4f6 file bitreverse cu line 10 cuda gdb break bitreverse cu 18 Breakpoint 3 at 0x805b4fb file bitreverse cu line 18 4 Run the CUDA application and it executes until it reaches the first breakpoint main set in step 3 cuda gdb run Breakpoint 1 main at bitreverse cu 23 unsigned int d NULL int i 5 At this point commands can be entered to advance execution or to print the program state For this walkthrough c

cuda-gdb V2.3 beta 22 pages ()

Contents

Download Pdf Manuals

Related Search

Related Contents