(cuda-gdb) c
Continuing.
Reading symbols for shared libraries .. done
Reading symbols for shared libraries .. done
[Context Create of context 0x80f200 on Device 0]
[Launch of CUDA Kernel 0 (bitreverse<<<(1,1,1),(256,1,1)>>>) on Device 0]
Breakpoint 3 at 0x8667b8: file bitreverse.cu, line 21.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Breakpoint 2, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x110000) at bitreverse.cu:9
9           unsigned int *idata = (unsigned int *)data;
cuda-gdb has detected that a CUDA device kernel has been reached, so it prints the current CUDA thread of focus.
6. Verify the CUDA thread of focus with the info cuda threads command, and switch between the host thread and the CUDA threads:
(cuda-gdb) info cuda threads
  BlockIdx ThreadIdx To BlockIdx ThreadIdx Count         Virtual PC      Filename Line
Kernel 0
*  (0,0,0)   (0,0,0)     (0,0,0) (255,0,0)   256 0x0000000000866400 bitreverse.cu    9
(cuda-gdb) thread
[Current thread is 1 (process 16738)]
(cuda-gdb) thread 1
[Switching to thread 1 (process 16738)]
#0  0x000019d5 in main () at bitreverse.cu:34
34          bitreverse<<<1, N, N*sizeof(int)>>>(d);
(cuda-gdb) bt
#0  0x000019d5 in main () at bitreverse.cu:34
(cuda-gdb) info cuda kernels
Kernel Dev Grid SMs Mask   GridDim  BlockDim       Name          Args
     0   0    1 0x00000001 (1,1,1) (256,1,1) bitreverse data=0x110000
CUDA-GDB DU-05227-001_V4.0
(const @global void * @parameter *) 0x10
(cuda-gdb) print *(@global void * const @parameter *) 0x10
$7 = (@global void * @parameter) 0x110000

Inspecting Textures
To inspect a texture, use the print command while de-referencing the texture recast to the type of the array it is bound to. For instance, if texture tex is bound to array A of type float*, use:
(cuda-gdb) print *(@texture float *)tex
All the array operators, such as [], can be applied to (@texture float *)tex:
(cuda-gdb) print ((@texture float *)tex)[2]
(cuda-gdb) print ((@texture float *)tex)[2]@4

Info CUDA Commands
These are commands that display information about the GPU and the application's CUDA state. The available options are:
> devices: information about all the devices
> sms: information about all the SMs in the current device
> warps: information about all the warps in the current SM
> lanes: information about all the lanes in the current warp
> kernels: information about all the active kernels
> blocks: information about all the active blocks in the current kernel
> threads: information about all the active threads in the current kernel
A filter can be applied to every info cuda command. The filter restricts the scope of the command. A filter is composed of one or more restrictions. A restriction can be any of the following:
> device n
> sm n
> warp n
> lane n
> kernel n
> grid n
(cuda-gdb) p &data
$9 = (const @global void * @parameter *) 0x10
(cuda-gdb) p *(@global void * @parameter *) 0x10
$10 = (@global void * @parameter) 0x100000
The resulting output depends on the current content of the memory location.

10. Since thread (0,0,0) reverses the value of 0, switch to a different thread to show more interesting data:
(cuda-gdb) cuda thread 170
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (170,0,0), device 0, sm 0, warp 5, lane 10]
11. Delete the breakpoints and continue the program to completion:
(cuda-gdb) delete b
Delete all breakpoints? (y or n) y
(cuda-gdb) continue
Continuing.
Program exited normally.
(cuda-gdb)
This concludes the cuda-gdb walkthrough.

SUPPORTED PLATFORMS
The general platform and GPU requirements for running NVIDIA cuda-gdb are described in this section.

Host Platform Requirements
Mac OS
CUDA-GDB is supported on 32-bit and 64-bit Mac OS X 10.6.5.
Linux
CUDA-GDB is supported on 32-bit and 64-bit editions of the following Linux distributions:
> Red Hat Enterprise Linux 4.8, 5.5, and 6.0
> Fedora 13
> Novell SLED 11 SP1
> OpenSUSE 11.2
> Ubuntu 10.10

GPU Requirements
Debugging is supported on all CUDA-capable GPUs with a compute capability of 1.1 or later.
0x000000000000008c   (0,0,0)
 1 active 0x000000000000008c   (1,0,0)
 2 active 0x000000000000008c   (2,0,0)
 3 active 0x000000000000008c   (3,0,0)
 4 active 0x000000000000008c   (4,0,0)
 5 active 0x000000000000008c   (5,0,0)
 6 active 0x000000000000008c   (6,0,0)
 7 active 0x000000000000008c   (7,0,0)
 8 active 0x000000000000008c   (8,0,0)
 9 active 0x000000000000008c   (9,0,0)
10 active 0x000000000000008c  (10,0,0)
11 active 0x000000000000008c  (11,0,0)
12 active 0x000000000000008c  (12,0,0)
13 active 0x000000000000008c  (13,0,0)
14 active 0x000000000000008c  (14,0,0)
15 active 0x000000000000008c  (15,0,0)
16 active 0x000000000000008c  (16,0,0)

info cuda kernels
This command displays all the active kernels on the GPU in focus. It prints the SM mask, kernel ID, and grid ID for each kernel, with the associated dimensions and arguments. The kernel ID is unique across all GPUs, whereas the grid ID is unique per GPU. This command supports filters, and the default is kernel all.
(cuda-gdb) info cuda kernels
Kernel Dev Grid SMs Mask  GridDim   BlockDim      Name Args
     1   0    … …         (240,1,1) (128,1,1) acos_main …

info cuda blocks
This command displays all the active (running) blocks for the kernel in focus. Results are grouped per kernel. This command supports filters, and the default is kernel current block all. The outputs are coalesce
Compute capability is a device attribute that a CUDA application can query; for more information, see the latest NVIDIA CUDA Programming Guide on the NVIDIA CUDA Zone web site: http://developer.nvidia.com/object/gpucomputing.html
These GPUs have a compute capability of 1.0 and are not supported: GeForce 8800 GTS, GeForce 8800 GTX, GeForce 8800 Ultra, Quadro Plex 1000 Model IV, Quadro Plex 2100 Model S4, Quadro FX 4600, Quadro FX 5600, Tesla C870, Tesla D870, Tesla S870.

KNOWN ISSUES
The following are known issues with the current release:
> Device memory allocated via cudaMalloc() is not visible outside of the kernel function.
> On GPUs with sm type less than sm_20, it is not possible to step over a subroutine in the device code.
> Device allocations larger than 100 MB on Tesla GPUs, and larger than 32 MB on Fermi GPUs, may not be accessible in the debugger.
> Debugging applications with multiple CUDA contexts running on the same GPU is not supported on any GPU.
> On GPUs with sm_20, if you are debugging code in device functions that get called by multiple kernels, then setting a breakpoint in the device function will insert the breakpoint in only one of the kernels.
> In a multi-GPU debugging environment on Mac OS X with Aqua running, you may experience some visible delay while single-stepping the application.
> Setting a breakpoint on a line within a
INSPECTING PROGRAM STATE

Memory and Variables
The GDB print command has been extended to decipher the location of any program variable, and can be used to display the contents of any CUDA program variable, including:
> data allocated via cudaMalloc()
> data that resides in various GPU memory regions, such as shared, local, and global memory
> special CUDA runtime variables, such as threadIdx

Variable Storage and Accessibility
Depending on the variable type and usage, variables can be stored either in registers or in local, shared, const, or global memory. You can print the address of any variable to find out where it is stored and directly access the associated memory.
The example below shows how the variable array, which is of type @shared int*, can be directly accessed in order to see what the stored values are in the array:
(cuda-gdb) print &array
$1 = (@shared int (*)[0]) 0x20
(cuda-gdb) print array[0]@4
$2 = {0, 128, 64, 192}
You can also access the shared memory indexed into the starting offset, to see what the stored values are:
(cuda-gdb) print *(@shared int *)0x20
$3 = 0
(cuda-gdb) print *(@shared int *)0x24
$4 = 128
(cuda-gdb) print *(@shared int *)0x28
$5 = 64
The example below shows how to access the starting address of the input parameter to the kernel:
(cuda-gdb) print &data
$6 = (const
 BlockIdx ThreadIdx         Virtual PC Dev SM Wp Ln Filename Line
Kernel 1
* (0,0,0)   (0,0,0) 0x000000000088f88c   0  0  0  0  acos.cu  376
  (0,0,0)   (1,0,0) 0x000000000088f88c   0  0  0  1  acos.cu  376
  (0,0,0)   (2,0,0) 0x000000000088f88c   0  0  0  2  acos.cu  376
  (0,0,0)   (3,0,0) 0x000000000088f88c   0  0  0  3  acos.cu  376
  (0,0,0)   (4,0,0) 0x000000000088f88c   0  0  0  4  acos.cu  376
  (0,0,0)   (5,0,0) 0x000000000088f88c   0  0  0  5  acos.cu  376
  (0,0,0)   (6,0,0) 0x000000000088f88c   0  0  0  6  acos.cu  376
  (0,0,0)   (7,0,0) 0x000000000088f88c   0  0  0  7  acos.cu  376
  (0,0,0)   (8,0,0) 0x000000000088f88c   0  0  0  8  acos.cu  376
  (0,0,0)   (9,0,0) 0x000000000088f88c   0  0  0  9  acos.cu  376

CONTEXT AND KERNEL EVENTS
Kernel refers to your device code that executes on the GPU, while context refers to the virtual address space on the GPU for your kernel. You can turn ON or OFF the display of CUDA context and kernel events to review the flow of the active contexts and kernels.
Display CUDA context events:
> (cuda-gdb) set cuda context_events 1
  Display CUDA context events.
> (cuda-gdb) set cuda context_events 0
  Do not display CUDA context events.
Display CUDA kernel events:
> (cuda-gdb) set cuda kernel_events 1
  Display CUDA kernel events.
> (cuda-gdb) set cuda kernel_events 0
  Do not display CUDA kernel events.

Examples of events displayed
The following are e
CUDA thread focus, has been retired. Instead, the user should use the cuda thread command.

GETTING STARTED
Included in this chapter are instructions for installing cuda-gdb and for using NVCC, the NVIDIA CUDA compiler driver, to compile CUDA programs for debugging.

Installation Instructions
Follow these steps to install cuda-gdb.
1. Visit the NVIDIA CUDA Zone download page: http://www.nvidia.com/object/cuda_get.html
2. Select the appropriate operating system, Mac OS or Linux (see Host Platform Requirements).
3. Download and install the CUDA Driver.
4. Download and install the CUDA Toolkit.

Setting up the debugger environment
Linux
Set up the PATH and LD_LIBRARY_PATH environment variables:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
Mac
Set up the PATH and DYLD_LIBRARY_PATH environment variables:
export PATH=/usr/local/cuda/bin:$PATH
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
Also, if you are unable to execute cuda-gdb, or if you hit the "Unable to find Mach task port for processid" error, try resetting the correct permissions with the following commands:
sudo chgrp procmod /usr/local/cuda/bin/cuda-binary-gdb
sudo chmod 2755 /usr/local/cuda/bin/cuda-binary-gdb
sudo chmod 755 /usr/local/cuda/bin/cuda-gdb
Focus
To inspect the current focus, use the cuda command followed by the coordinates of interest:
(cuda-gdb) cuda device sm warp lane block thread
block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0
(cuda-gdb) cuda kernel block thread
kernel 1, block (0,0,0), thread (0,0,0)
(cuda-gdb) cuda kernel
kernel 1

Switching Focus
To switch the current focus, use the cuda command followed by the coordinates to be changed:
(cuda-gdb) cuda device 0 sm 1 warp 2 lane 3
[Switching focus to CUDA kernel 1, grid 1, block (8,0,0), thread (67,0,0), device 0, sm 1, warp 2, lane 3]
374         int totalThreads = gridDim.x * blockDim.x;
If the specified focus is not fully defined by the command, the debugger will assume that the omitted coordinates are set to the coordinates in the current focus, including the subcoordinates of the block and thread.
(cuda-gdb) cuda thread (15)
[Switching focus to CUDA kernel 1, grid 1, block (8,0,0), thread (15,0,0), device 0, sm 1, warp 0, lane 15]
374         int totalThreads = gridDim.x * blockDim.x;
The parentheses for the block and thread arguments are optional:
(cuda-gdb) cuda block 1 thread 3
[Switching focus to CUDA kernel 1, grid 1, block (1,0,0), thread (3,0,0), device 0, sm 3, warp 0, lane 3]
374         int totalThreads = gridDim.x * blockDim.x;

PROGRAM EXECUTION
Applications are launched the same way in cuda-gdb as they are with GDB, by using the run command.
GDB always steps into the device function. On GPUs with sm type sm_20 and higher, you can step in, over, or out of the device functions, as long as they are not inlined. To force a function not to be inlined by the compiler, the __noinline__ keyword must be added to the function declaration.

BREAKPOINTS
There are multiple ways to set a breakpoint on a CUDA application. Those methods are described below. The commands to set a breakpoint on the device code are the same as the commands used to set a breakpoint on the host code.
If the breakpoint is set on device code, the breakpoint will be marked pending until the ELF image of the kernel is loaded. At that point, the breakpoint will be resolved and its address will be updated.
When a breakpoint is set, it forces all resident GPU threads to stop at this location when they hit the corresponding PC. There is currently no method to stop only certain threads or warps at a given breakpoint. When a breakpoint is hit by one thread, there is no guarantee that the other threads will hit the breakpoint at the same time. Therefore the same breakpoint may be hit several times, and the user must be careful to check which thread(s) actually hit the breakpoint.

Symbolic breakpoints
To set a breakpoint at the entry of a function, use the break command followed by the name of the function or method.
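For instance, using the bitreverse sample that appears later in this manual, a symbolic breakpoint and a file:line breakpoint could be set as follows (a sketch only; the exact session output is not reproduced here):

```
(cuda-gdb) break bitreverse
(cuda-gdb) break bitreverse.cu:21
```

Both forms behave as described above: if the kernel's ELF image is not yet loaded, the breakpoint is reported as pending and resolved later.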
      Display CUDA context events
      Display CUDA kernel events
      Examples of events displayed
10. Checking Memory Errors
      Checking Memory Errors
      GPU Error Reporting
11. Walk-through Example
      bitreverse.cu Source Code
      Walking Through the Code
Appendix A. Supported Platforms
      Host Platform Requirements
         Mac OS
         Linux
      GPU Requirements
Appendix B. Known Issues

INTRODUCTION
This document introduces cuda-gdb, the NVIDIA CUDA debugger, and describes what is new in version 4.0.

What is cuda-gdb?
CUDA-GDB is the NVIDIA tool for debugging CUDA applications running on
> CUDA_EXCEPTION_6, Warp Misaligned Address: not precise; warp error. This occurs when any thread within a warp accesses an address in the local or shared memory segments that is not correctly aligned.
> CUDA_EXCEPTION_7, Warp Invalid Address Space: not precise; warp error. This occurs when any thread within a warp executes an instruction that accesses a memory space not permitted for that instruction.
> CUDA_EXCEPTION_8, Warp Invalid PC: not precise; warp error. This occurs when any thread within a warp advances its PC beyond the 40-bit address space.
> CUDA_EXCEPTION_9, Warp Hardware Stack Overflow: not precise; warp error. This occurs when any thread in a warp triggers a hardware stack overflow. This should be a rare occurrence.
> CUDA_EXCEPTION_10, Device Illegal Address: not precise; global error. This occurs when a thread accesses an illegal (out of bounds) global address. For increased precision, set cuda memcheck on in cuda-gdb.
> CUDA_EXCEPTION_11, Lane Misaligned Address: precise (requires memcheck on); per-lane (thread) error. This occurs when a thread accesses a global address that is not correctly aligned.

WALK-THROUGH EXAMPLE
This chapter presents a walk-through of cuda-gdb by debugging a sample application, called bitreverse, that performs a simple 8-bit reversal on a data set.
existing GDB commands are unchanged. Every new CUDA command or option is prefixed with the cuda keyword. As much as possible, cuda-gdb command names will be similar to the equivalent GDB commands used for debugging host code. For instance, the GDB commands to display the host threads and switch to host thread 1 are, respectively:
(cuda-gdb) info threads
(cuda-gdb) thread 1
To display the CUDA threads and switch to CUDA thread 1, the user only has to type:
(cuda-gdb) info cuda threads
(cuda-gdb) cuda thread 1

Getting Help
As with GDB commands, the built-in help for the CUDA commands is accessible from the cuda-gdb command line by using the help command:
(cuda-gdb) help cuda name_of_the_cuda_command
(cuda-gdb) help set cuda name_of_the_cuda_option
(cuda-gdb) help info cuda name_of_the_info_cuda_command

Initialization File
The initialization file for cuda-gdb is named .cuda-gdbinit and follows the same rules as the standard .gdbinit file used by GDB. The initialization file may contain any CUDA-GDB command. Those commands will be processed in order when cuda-gdb is launched.

GUI Integration
Emacs
CUDA-GDB works with GUD in Emacs and XEmacs. No extra step is required besides pointing to the right binary. To use cuda-gdb, the gud-gdb-command-name variable must be set to "cuda-gdb --annotate=3". Use M-x customize-variable to set the variable. Ensure that
new option, set cuda break_on_launch [none | application | system | all], allows the user to decide if the debugger should stop at the entrance of every system or application kernel.

Textures
The debugger now supports the debugging of kernels using textures. Functionality has been added to allow textures to be read.

Improved info cuda commands
Their names have changed to be more consistent with the other info commands. The info cuda commands are now: devices, sms, warps, lanes, kernels, blocks, and threads. The output format has been streamlined for readability.

Filters
The info cuda commands support focus filters, where only the selected devices, SMs, warps, lanes, kernels, blocks, and threads are considered. This allows the user to filter the output data to the unit of interest.

MI support
The info cuda and cuda commands are now available as MI commands as well.

Fermi disassembly
Kernel code running on Fermi (sm_20) can now be disassembled, using the x/i command. Before this release, only Tesla code (sm_10) could be disassembled.

Conditional Breakpoints
The break command now supports conditional breakpoints on device code, as long as there is no function call in the conditional statement. Built-in variables such as threadIdx and blockIdx can also be used.

Deprecated Commands
The deprecated command thread <<<(x,y),(x,y,z)>>>, used to switch
supports debugging kernels that have been compiled for specific CUDA architectures, such as sm_10 or sm_20, but also supports debugging kernels compiled at runtime, referred to as just-in-time compilation, or JIT compilation for short.

About this document
This document is the main documentation for CUDA-GDB and is organized more as a user manual than a reference manual. The rest of the document will describe how to install and use CUDA-GDB to debug CUDA kernels, and how to use the new CUDA commands that have been added to GDB. Some walk-through examples are also provided. It is assumed that the user already knows the basic GDB commands used to debug host applications.

RELEASE NOTES
The following features have been added for the 4.0 release.

Three-Dimensional Grid Support
Starting with the Release 270 driver, grids can be three-dimensional. With 4.0, cuda-gdb supports the new Z dimension.

C++ Debugging Support
The debugger now supports the debugging of C++ applications, including templates, named namespaces (no aliases), virtual functions, classes and methods, and overloading. In particular, breakpoints set on lines within a templatized function will create multiple breakpoints, one per instance of the template.

Mac Debugging Support
CUDA-GDB now runs on 32-bit and 64-bit Mac OS X 10.6.5 systems and supports all the same features as its Linux counterpart.

Automatic breakpoint on kernel launches
A
      Mac
   Compiling the application
      Debug Compilation
      Compiling for Fermi GPUs
      Compiling for Fermi and Tesla GPUs
   Using the debugger
      Single GPU Debugging
      Multi GPU Debugging
      Remote Debugging
4. cuda-gdb Extensions
   Command Naming Convention
   Getting Help
   Initialization File
   GUI Integration
      Emacs
      DDD
5. Kernel Focus
   Software Coordinates vs. Hardware Coordinates
   Current Focus
   Switching Focus
6. Program Execution
   Interrupting the Application
(cuda-gdb) cuda kernel 0
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
9           unsigned int *idata = (unsigned int *)data;
(cuda-gdb) bt
#0  bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x110000) at bitreverse.cu:9
7. Corroborate this information by printing the block and thread indices:
(cuda-gdb) print blockIdx
$1 = {x = 0, y = 0}
(cuda-gdb) print threadIdx
$2 = {x = 0, y = 0, z = 0}
8. The grid and block dimensions can also be printed:
(cuda-gdb) print gridDim
$3 = {x = 1, y = 1}
(cuda-gdb) print blockDim
$4 = {x = 256, y = 1, z = 1}
9. Advance kernel execution and verify some data:
(cuda-gdb) n
12          array[threadIdx.x] = idata[threadIdx.x];
(cuda-gdb) n
14          array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) | ((0x0f0f0f0f & array[threadIdx.x]) << 4);
(cuda-gdb) n
16          array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) | ((0x33333333 & array[threadIdx.x]) << 2);
(cuda-gdb) n
18          array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) | ((0x55555555 & array[threadIdx.x]) << 1);
(cuda-gdb) n
Breakpoint 3, bitreverse<<<(1,1,1),(256,1,1)>>> (data=0x100000) at bitreverse.cu:21
21          idata[threadIdx.x] = array[threadIdx.x];
(cuda-gdb) p array[0]@12
$7 = {0, 128, 64, 192, 32, 160, 96, 224, 16, 144, 80, 208}
(cuda-gdb) p/x array[0]@12
$8 = {0x0, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0, 0x10, 0x90, 0x50, 0xd0}
(cuda-gdb)
NVIDIA CUDA-GDB
NVIDIA CUDA Debugger for Linux and Mac
4.0 Release

TABLE OF CONTENTS
1. Introduction
   What is cuda-gdb?
   Supported features
   About this document
2. Release Notes
   Three-Dimensional Grid Support
   C++ Debugging Support
   Mac Debugging Support
   Automatic breakpoint on kernel launches
   Textures
   Improved info cuda commands
   Filters
   MI support
   Fermi disassembly
   Conditional Breakpoints
   Deprecated Commands
3. Getting Started
   Installation Instructions
   Setting up the debugger environment
      Linux
Compiling the application
Debug Compilation
NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for cuda-gdb to work properly. The -g -G option pair must be passed to NVCC when an application is compiled in order to debug with cuda-gdb; for example:
nvcc -g -G foo.cu -o foo
Using this line to compile the CUDA application foo.cu:
> forces -O0 (mostly unoptimized) compilation
> makes the compiler include symbolic debugging information in the executable

Compiling for Fermi GPUs
For Fermi GPUs, add the following flags to target Fermi output when compiling the application:
-gencode arch=compute_20,code=sm_20
This compiles the kernels specifically for the Fermi architecture once and for all. If the flag is not specified, then the kernels must be recompiled at runtime every time.

Compiling for Fermi and Tesla GPUs
If you are targeting both Fermi and Tesla GPUs, include these two flags:
-gencode arch=compute_20,code=sm_20 -gencode arch=compute_10,code=sm_10

Using the debugger
Debugging a CUDA GPU involves pausing that GPU. When the graphics desktop manager is running on the same GPU, debugging that GPU freezes the GUI and makes the desktop unusable. To avoid this, use cuda-gdb in the following system configurations.

Single GPU Debugging
info cuda warps
This command takes you one level deeper and prints all the warps information for the SM in focus. This command supports filters, and the default is device current sm current warp all. The GPU warps information can be used to map to the application's CUDA blocks using the BlockIdx.
(cuda-gdb) info cuda warps
Wp Active Lanes Mask Divergent Lanes Mask   Active Physical PC Kernel BlockIdx
Device 0, SM 0
* 0       0xffffffff           0x00000000 0x000000000000001c      0  (0,0,0)
  1       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)
  2       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)
  3       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)
  4       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)
  5       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)
  6       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)
  7       0xffffffff           0x00000000 0x0000000000000000      0  (0,0,0)

info cuda lanes
This command displays all the lanes (threads) for the warp in focus. This command supports filters, and the default is device current sm current warp current lane all. In the example below, you can see that all the lanes are at the same physical PC, and these can be mapped to the application's CUDA threadIdx.
(cuda-gdb) info cuda lanes
Ln State  Physical PC ThreadIdx
Device 0, SM 0, Warp 0
 0 active
Linux and Mac. CUDA-GDB is an extension to the i386/AMD64 port of GDB, the GNU Project debugger. The tool provides developers with a mechanism for debugging CUDA applications running on actual hardware. This enables developers to debug applications without the potential variations introduced by simulation and emulation environments.
CUDA-GDB runs on Linux and Mac OS X, 32-bit and 64-bit. The Linux edition is based on GDB 6.6, whereas the Mac edition is based on GDB 6.3.5.

Supported features
CUDA-GDB is designed to present the user with a seamless debugging environment that allows simultaneous debugging of both GPU and CPU code within the same application. Just as programming in CUDA C is an extension to C programming, debugging with CUDA-GDB is a natural extension to debugging with GDB. The existing GDB debugging features are inherently present for debugging the host code, and additional features have been provided to support debugging CUDA device code.
CUDA-GDB supports C and C++ CUDA applications. All the C++ features supported by the NVCC compiler can be debugged by CUDA-GDB.
CUDA-GDB allows the user to set breakpoints and single-step CUDA applications, as well as to inspect and modify the memory and variables of any given thread running on the hardware.
CUDA-GDB supports debugging all CUDA applications, whether they use the CUDA driver API, the CUDA runtime API, or both.
CUDA-GDB
Figure 3-1 shows deviceQuery output for a system with two GPUs: Device 0, a Quadro FX 4800 (CUDA capability 1.3, 1610285056 bytes of global memory, 24 multiprocessors x 8 cores/MP = 192 cores, 1.20 GHz), and Device 1, a GeForce 8800 GT (CUDA capability 1.1, 536674304 bytes of global memory, 14 multiprocessors x 8 cores/MP = 112 cores, 1.50 GHz). For each device it also reports the CUDA driver and runtime versions (3.20), the constant and shared memory sizes, register counts, warp size, maximum block and grid dimensions, maximum memory pitch, texture alignment, and the compute mode (Default: multiple host threads can use the device simultaneously).

Figure 3-1. deviceQuery Output

Remote Debugging
To remotely debug an application, use SSH or VNC from the host system to connect to the target system. From there, cuda-gdb can be launched in console mode.

CUDA-GDB EXTENSIONS

Command Naming Convention
The
BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information, or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA, the NVIDIA logo, NVIDIA nForce, GeForce, NVIDIA Quadro, NVDVD, NVIDIA Personal Cinema, NVIDIA Soundstorm, Vanta, TNT2, TNT, RIVA, RIVA TNT, VOODOO, VOODOO GRAPHICS, WAVEBAY, Accuview Antialiasing, Detonator, Digital Vibrance Control, ForceWare, NVRotate, NVSensor, NVSync, PowerMizer, Quincunx Antialiasing, Sceneshare, See What You've Been Missing, StreamThru, SuperStability, T-BUFFER, The Way It's Meant to be Pl
> block (x,y), or block x,y
> thread (x,y,z), or thread x,y,z
where x, y, z are integers, or one of the following special keywords: current, any, and all. current indicates that the corresponding value in the current focus should be used; any and all indicate that any value is acceptable.

info cuda devices
This command enumerates all the GPUs in the system, sorted by device index. A * indicates the device currently in focus. This command supports filters; the default is device all. It prints "No CUDA Devices" if no GPUs are found.
(cuda-gdb) info cuda devices
Dev Description SM Type SMs Warps/SM Lanes/Warp Max Regs/Lane Active SMs Mask
*  0       gt200   sm_13  24       32         32           128     0x00ffffff

info cuda sms
This command shows all the SMs for the device and the associated active warps on the SMs. This command supports filters, and the default is device current sm all. A * indicates the SM in focus. Results are grouped per device.
(cuda-gdb) info cuda sms
SM Active Warps Mask
Device 0
*  0 0xffffffffffffffff
   1 0xffffffffffffffff
   2 0xffffffffffffffff
   3 0xffffffffffffffff
   4 0xffffffffffffffff
   5 0xffffffffffffffff
   6 0xffffffffffffffff
   7 0xffffffffffffffff
of that kernel. With the runtime API, the exact instance to which the breakpoint will be resolved cannot be controlled. With the driver API, the user can control the instance to which the breakpoint will be resolved by setting the breakpoint right after its module is loaded.

CUDA-GDB DU-05227-001_V4.0

Chapter 03 GETTING STARTED

Multi-GPU Debugging in Console Mode

CUDA-GDB allows simultaneous debugging of applications running CUDA kernels on multiple GPUs. In console mode, cuda-gdb can be used to pause and debug every GPU in the system. You can enable console mode as described above for the single-GPU case.

Multi-GPU Debugging with the Desktop Manager Running

This can be achieved by running the desktop GUI on one GPU and CUDA on the other GPU, to avoid hanging the desktop GUI.

On Linux

The CUDA driver automatically excludes the GPU used by X11 from being visible to the application being debugged. This can change the behavior of the application, since if there are n GPUs in the system, only n-1 GPUs will be visible to the application.

On Mac OS X

The CUDA driver exposes every CUDA-capable GPU in the system, including the one used by the Aqua desktop manager. To determine which GPU should be used for CUDA, run the deviceQuery app from the CUDA SDK samples. The output of deviceQuery, as shown in Figure 3-1, lists all the GPUs in the system. For example, if you have two GPUs you will see Device0: "GeForce xxxx" and
Logo, TwinBank, TwinView, and the Video & Nth Superscript Design Logo are registered trademarks or trademarks of NVIDIA Corporation in the United States and/or other countries. Other company and product names may be trademarks or registered trademarks of the respective owners with which they are associated.

Copyright © 2007-2011 NVIDIA Corporation. All rights reserved.

www.nvidia.com
cuda-gdb is present in the Emacs/XEmacs $PATH.

DDD

CUDA-GDB works with DDD. To use DDD with cuda-gdb, launch DDD with the following command:

    ddd --debugger cuda-gdb

cuda-gdb must be in your $PATH.

KERNEL FOCUS

A CUDA application may be running several host threads and many device threads. To simplify the visualization of information about the state of the application, commands are applied to the entity in focus. When the focus is set to a host thread, the commands apply only to that host thread (unless the application is fully resumed, for instance). On the device side, the focus is always set to the lowest granularity level: the device thread.

Software Coordinates vs. Hardware Coordinates

A device thread belongs to a block, which in turn belongs to a kernel. Thread, block, and kernel are the software coordinates of the focus. A device thread runs on a lane. A lane belongs to a warp, which belongs to an SM, which in turn belongs to a device. Lane, warp, SM, and device are the hardware coordinates of the focus. Software and hardware coordinates can be used interchangeably and simultaneously, as long as they remain coherent.

Another software coordinate is sometimes used: the grid. The difference between a grid and a kernel is the scope. The grid ID is unique per GPU, whereas the kernel ID is unique across all GPUs. Therefore, there is a 1:1 mapping between a kernel and a (grid, device) tuple.

Current
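To illustrate the two coordinate systems, the focus can be queried or switched with the cuda command (a sketch; the coordinate values are arbitrary examples, and the exact output is omitted):

```
(cuda-gdb) cuda kernel block thread           # show the current software coordinates
(cuda-gdb) cuda block (1,0,0) thread (3,0,0)  # switch focus by software coordinates
(cuda-gdb) cuda device 0 sm 1 warp 2 lane 3   # switch focus by hardware coordinates
```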
Device1: "GeForce xxxx". Choose the Device<index> that is not rendering the desktop on your connected monitor. If Device0 is rendering the desktop, then choose Device1 for running and debugging the CUDA application. This exclusion of the desktop can be achieved by executing this command:

    export CUDA_VISIBLE_DEVICES=1

Figure 3-1. deviceQuery Output (the value reported for each field was lost in extraction)

    There are 2 devices supporting CUDA

    Device 0: "Quadro FX 4800"
      CUDA Driver Version / CUDA Runtime Version
      CUDA Capability Major/Minor version number
      Total amount of global memory
      Multiprocessors x Cores/MP = Cores
      Total amount of constant memory
      Total amount of shared memory per block
      Total number of registers available per block
      Warp size
      Maximum number of threads per block
      Maximum sizes of each dimension of a block
      Maximum sizes of each dimension of a grid
      Maximum memory pitch
      Texture alignment
      Clock rate
      Concurrent copy and execution
      Run time limit on kernels
      Integrated
      Support host page-locked memory mapping
      Compute mode
      Concurrent kernel execution
      Device has ECC support enabled
      Device is using TCC driver mode

    Device 1: "GeForce 8800 GT"
      (same fields; truncated in the original)
The output is coalesced by default:

    (cuda-gdb) info cuda blocks
      BlockIdx To BlockIdx Count State
    Kernel 1
    * (0,0,0)  (191,0,0)  192   running

Coalescing can be turned off as follows, in which case more information on the Device and the SM is displayed:

    (cuda-gdb) set cuda coalescing off
    Coalescing of the CUDA commands output is off
    (cuda-gdb) info cuda blocks
      BlockIdx State   Dev SM
    Kernel 1
    * (0,0,0)  running 0   0
      (1,0,0)  running 0   3
      (2,0,0)  running 0   6
      (3,0,0)  running 0   9
      (4,0,0)  running 0   12
      (5,0,0)  running 0   15
      (6,0,0)  running 0   18
      (7,0,0)  running 0   21
      (8,0,0)  running 0   24

info cuda threads

This command displays the application's active CUDA blocks and threads, with the total count of threads in those blocks. Also displayed are the virtual PC and the associated source file and line number information. The results are grouped per kernel. The command supports filters, with the default being kernel current block all thread all. The output is coalesced by default, as follows:

    (cuda-gdb) info cuda threads
      BlockIdx ThreadIdx To BlockIdx ThreadIdx Count Virtual PC         Filename Line
    Device 0 SM 0
    * (0,0,0) (0,0,0)   (0,0,0)    (31,0,0)   32    0x000000000088f88c acos.cu  376
      (0,0,0) (32,0,0)  (191,0,0)  (127,0,0)  24544 0x000000000088f800 acos.cu  374

Coalescing can be turned off as follows, in which case more information is associated with the output:

    (cuda-gdb) info cuda threads
Setting a breakpoint on a __device__ or __global__ function before its module is loaded may result in the breakpoint being temporarily set on the first line of a function below it in the source code. As soon as the module for the targeted function is loaded, the breakpoint will be reset properly. In the meantime, the breakpoint may be hit, depending on the application. In those situations, the breakpoint can be safely ignored and the application can be resumed.
> The scheduler locking cannot be set to on.
> Stepping again after stepping out of a kernel results in undetermined behavior. It is recommended to use the continue command instead.
> Kernels containing printf function calls where some arguments are missing cannot be debugged.
> CUDA-GDB can leave temporary files behind in case of an abnormal termination, which can cause consecutive runs of cuda-gdb to hang. To fix this hang, please delete the cuda-gdb temporary files from the /tmp directory.
> Any CUDA application that uses OpenGL interoperability requires an active windows server; such applications will fail to run under console mode debugging on both Linux and Mac. If an X server is running on Linux, the render GPU will not be enumerated when debugging, so the application could still fail unless the application uses the OpenGL device enumeration to access the render GPU.

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS,
Single Stepping ....................................... 15
7. Breakpoints ........................................ 17
    Symbolic Breakpoints .............................. 17
    Line Breakpoints .................................. 18
    Address Breakpoints ............................... 18
    Kernel Entry Breakpoints .......................... 18
    Conditional Breakpoints ........................... 19
8. Inspecting Program State ........................... 20
    Memory and Variables .............................. 20
    Variable Storage and Accessibility ................ 20
    Inspecting Textures ............................... 21
    Info CUDA Commands ................................ 21
    info cuda devices ................................. 22
    info cuda sms ..................................... 22
    info cuda warps ................................... 22
    info cuda lanes ................................... 23
    info cuda kernels ................................. 23
    info cuda blocks .................................. 24
    info cuda threads ................................. 24
9. Context and Kernel Events .......................... 26
    Graphics Driver
Single GPU Debugging

In a single-GPU system, cuda-gdb can be used to debug CUDA applications only if no X11 server (on Linux) or no Aqua desktop manager (on Mac OS X) is running on that system. On Linux, you can stop the X11 server by stopping the gdm service. On Mac OS X, you can log in with >console as the user name in the desktop UI login screen. This allows CUDA applications to be executed and debugged in a single-GPU configuration.

Multi-GPU Debugging

Multi-GPU debugging is not much different from single-GPU debugging, except for a few additional cuda-gdb commands that let you switch between the GPUs. Any GPU hitting a breakpoint will pause all the GPUs running CUDA on that system. Once paused, you can use info cuda kernels to view all the active kernels and the GPUs they are running on. When any GPU is resumed, all the GPUs are resumed.

All CUDA-capable GPUs can run the same or different kernels. To switch to an active kernel, use cuda kernel <n> or cuda device <n> to switch to the desired GPU, where n is the ID of the kernel or GPU retrieved from info cuda kernels. Once you are on an active kernel and a GPU, the rest of the process is the same as single-GPU debugging.

Note: The same module, and therefore the same kernel, can be loaded and used by different contexts and devices at the same time. When a breakpoint is set in such a kernel, by either name or file name and line number, it will be resolved arbitrarily to only one instance
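The switching steps above can be sketched as a short session (the kernel and device IDs are arbitrary examples; output is omitted):

```
(cuda-gdb) info cuda kernels   # list active kernels and the GPUs they run on
(cuda-gdb) cuda kernel 1       # switch focus to the kernel with ID 1
(cuda-gdb) cuda device 1       # or switch focus to the GPU with ID 1
```

From that point on, breakpoints, stepping, and state inspection behave as in the single-GPU case.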
CUDA applications are launched the same way in cuda-gdb as they are with GDB, by using the run command. This chapter describes only how to interrupt and single-step CUDA applications.

Interrupting the Application

If the CUDA application appears to be hanging or stuck in an infinite loop, it is possible to manually interrupt the application by pressing CTRL+C. When the signal is received, the GPU is suspended and the cuda-gdb prompt will appear. At that point, the program can be inspected, modified, single-stepped, resumed, or terminated at the user's discretion.

This feature is limited to applications running within the debugger. It is not possible to break into and debug applications that have been previously launched.

Single Stepping

Single-stepping device code is supported. However, unlike host code, single-stepping device code works at the warp granularity. This means that single-stepping a device kernel advances all the threads in the warp currently in focus. In order to advance the execution of more than one warp, a breakpoint must be set at the desired location and then the application must be fully resumed.

A special case is single-stepping over a thread barrier call: __syncthreads(). In this case, an implicit temporary breakpoint is set immediately after the barrier and all threads are resumed until the temporary breakpoint is hit.

On GPUs with sm_type less than sm_20, it is not possible to step over a subroutine in the device code. Instead, CUDA-GDB always steps into the device function.
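A sketch of the warp-granularity behavior described above (commands only; the breakpoint line number is an arbitrary example, and output is omitted):

```
(cuda-gdb) next        # advances every thread of the warp in focus by one source line
(cuda-gdb) step        # likewise; on sm_1x devices this always steps into subroutines
(cuda-gdb) break 42    # to advance more than one warp, set a breakpoint ...
(cuda-gdb) continue    # ... and fully resume the application
```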
Symbolic Breakpoints

To set a breakpoint at a symbolic name, use the break command on a function or method name:

    (cuda-gdb) break my_function
    (cuda-gdb) break my_class::my_method

For templatized functions and methods, the full signature must be given:

    (cuda-gdb) break int my_templatized_function<int>(int)

The mangled name of the function can also be used. To find the mangled name of a function, you can use the following command:

    (cuda-gdb) info function my_function_name

Chapter 07 BREAKPOINTS

Line Breakpoints

To set a breakpoint on a specific line number, use the following syntax:

    (cuda-gdb) break my_file.cu:185

If the specified line corresponds to an instruction within templatized code, multiple breakpoints will be created, one for each instance of the templatized code. Be aware that, at this point, those multiple breakpoints cannot be saved from one run to the next and will be deleted when the application is run again. The user must then manually set those breakpoints again.

Address Breakpoints

To set a breakpoint at a specific address, use the break command with the address as argument:

    (cuda-gdb) break *0x1afe34d0

The address can be any address on the device or the host.

Kernel Entry Breakpoints

To break on the first instruction of every launched kernel, set the break_on_launch option to application:

    (cuda-gdb) set cuda break_on_launch application

Possible options are:
> application: any kernel launched by
Chapter 011 WALK-THROUGH EXAMPLE

bitreverse.cu Source Code

    1   #include <stdio.h>
    2   #include <stdlib.h>
    3
    4   // Simple 8-bit bit reversal Compute test
    5
    6   #define N 256
    7
    8   __global__ void bitreverse(void *data) {
    9       unsigned int *idata = (unsigned int*)data;
    10      extern __shared__ int array[];
    11
    12      array[threadIdx.x] = idata[threadIdx.x];
    13
    14      array[threadIdx.x] = ((0xf0f0f0f0 & array[threadIdx.x]) >> 4) |
    15                           ((0x0f0f0f0f & array[threadIdx.x]) << 4);
    16      array[threadIdx.x] = ((0xcccccccc & array[threadIdx.x]) >> 2) |
    17                           ((0x33333333 & array[threadIdx.x]) << 2);
    18      array[threadIdx.x] = ((0xaaaaaaaa & array[threadIdx.x]) >> 1) |
    19                           ((0x55555555 & array[threadIdx.x]) << 1);
    20
    21      idata[threadIdx.x] = array[threadIdx.x];
    22  }
    23
    24  int main(void) {
    25      void *d = NULL; int i;
    26      unsigned int idata[N], odata[N];
    27
    28      for (i = 0; i < N; i++)
    29          idata[i] = (unsigned int)i;
    30
    31      cudaMalloc((void**)&d, sizeof(int)*N);
    32      cudaMemcpy(d, idata, sizeof(int)*N,
    33                 cudaMemcpyHostToDevice);
    34
    35      bitreverse<<<1, N, N*sizeof(int)>>>(d);
    36
    37      cudaMemcpy(odata, d, sizeof(int)*N,
    38                 cudaMemcpyDeviceToHost);
    39
    40      for (i = 0; i < N; i++)
    41          printf("%u -> %u\n", idata[i], odata[i]);
    42
    43      cudaFree((void*)d);
    44      return 0;
    45  }

Walking Through the Code

1. Begin by compiling the bitreverse.cu CUDA application for debugging by entering the following command at a shell prompt:

    nvcc -g -G bitreverse.cu -o bitreverse

This command assumes the source file name to be bitreverse.cu and that no additional compiler flags are required for compilation. See also "Compiling for Debugging" on page 20.

2. Start the CUDA debugger by entering the following command at a shell prompt:

    cuda-gdb bitreverse

3. Set breakpoints. Set both the host (main) and GPU (bitreverse) breakpoints here. Also set a breakpoint at a particular line in the device function (bitreverse.cu:21):

    (cuda-gdb) b main
    Breakpoint 1 at 0x18e1: file bitreverse.cu, line 25.
    (cuda-gdb) b bitreverse
    Breakpoint 2 at 0x18a1: file bitreverse.cu, line 8.
    (cuda-gdb) b 21
    Breakpoint 3 at 0x1ede: file bitreverse.cu, line 21.

4. Run the CUDA application. It executes until it reaches the first breakpoint (main) set in step 3:

    (cuda-gdb) r
    Starting program: /Users/CUDA_User1/docs/bitreverse
    Reading symbols for shared libraries .......... done

    Breakpoint 1, main () at bitreverse.cu:25
    25      void *d = NULL; int i;

5. At this point, commands can be entered to advance execution or to print the program state. For this walkthrough, continue to the device kernel.
the driver, such as memset
> all: any kernel, application and system
> none: no kernel, application or system

Those automatic breakpoints are not displayed by the info breakpoints command and are managed separately from individual breakpoints. Turning off the option will not delete other individual breakpoints set at the same address, and vice versa.

Conditional Breakpoints

To make the breakpoint conditional, use the optional if keyword or the cond command:

    (cuda-gdb) break foo.cu:23 if threadIdx.x == 1 && i < 5
    (cuda-gdb) cond 3 threadIdx.x == 1 && i < 5

Conditional expressions may include all the variables, including built-in variables such as threadIdx and blockIdx. Function calls are not allowed in conditional expressions.

Note that conditional breakpoints are always hit and evaluated, but the debugger reports the breakpoint only if the conditional statement evaluates to true. The process of hitting the breakpoint and evaluating the corresponding conditional statement is time-consuming. Therefore, running applications while using conditional breakpoints may slow down the debugging session. Moreover, if the conditional statement is always evaluated to false, the debugger may appear to be hanging or stuck, although it is not the case. You can interrupt the application with CTRL+C to verify that progress is being made.
Exception code / Precision of the Error / Scope of the Error / Description

CUDA_EXCEPTION_0: Device Unknown Exception
  Precision: Not precise
  Scope: Global error on the GPU
  Description: This is a global GPU error caused by the application which does not match any of the listed error codes below. This should be a rare occurrence.

CUDA_EXCEPTION_1: Lane Illegal Address
  Precision: Precise (requires memcheck on)
  Scope: Per-lane/thread error
  Description: This occurs when a thread accesses an illegal (out-of-bounds) global address.

CUDA_EXCEPTION_2: Lane User Stack Overflow
  Precision: Precise
  Scope: Per-lane/thread error
  Description: This occurs when a thread exceeds its stack memory limit.

CUDA_EXCEPTION_3: Device Hardware Stack Overflow
  Precision: Not precise
  Scope: Global error on the GPU
  Description: This occurs when the application triggers a global hardware stack overflow. The main cause of this error is large amounts of divergence in the presence of function calls.

CUDA_EXCEPTION_4: Warp Illegal Instruction
  Precision: Not precise
  Scope: Warp error
  Description: This occurs when any thread within a warp has executed an illegal instruction.

CUDA_EXCEPTION_5: Warp Out-of-range Address
  Precision: Not precise
  Scope: Warp error
  Description: This occurs when any thread within a warp accesses an address that is outside the valid range of local or shared memory regions.
38. ugh the Code 1 Begin by compiling the bitreverse cu CUDA application for debugging by entering the following command at a shell prompt nvcc g G bitreverse cu o bitreverse This command assumes the source file name to be bitreverse cu and that no additional compiler flags are required for compilation See also Compiling for Debugging on page 20 2 Start the CUDA debugger by entering the following command at a shell prompt cuda gdb bitreverse 3 Set breakpoints Set both the host main and GPU bitreverse breakpoints here Also set a breakpoint at a particular line in the device function bitreverse cu 18 cuda gdb main Breakpoint at 0x18el file bitreverse cu line 25 at 0x18al file bitreverse cu line 8 21 at Ozlede hile pitreverselcu Ene air Breakpoint cuda gdb b il cuda gdb b bitreverse 2 b Breakpoint 3 CUDA GDB DU 05227 001_V4 0 32 Chapter 011 WALK THROUGH EXAMPLE 4 Run the CUDA application and it executes until it reaches the first breakpoint main set in step 3 cuda gdb r Starting program Users CUDA Userl docs bitreverse Reading symbols for shared libraries GIS OOOO gua Bro BUD GO Oro DOE COO OOO OF GS qro beso dan 66660 Gs bp dao done Breakpoint 1 main at bitreverse cu 25 25 Wiesel cl NUET alinie alg 5 At this point commands can be entered to advance execution or to print the program state For this walkthrough continue to the device kernel
The following are examples of context events displayed:

> Context Create of context 0xad2fe60 on Device 0
> Context Pop of context 0xad2fe60 on Device 0
> Context Destroy of context 0xad2fe60 on Device 0

The following are examples of kernel events displayed:

> Launch of CUDA Kernel 1 (kernel3) on Device 0
> Termination of CUDA Kernel 1 (kernel3) on Device 0

CHECKING MEMORY ERRORS

Checking Memory Errors

The CUDA MemoryChecker feature allows detection of global memory violations and misaligned global memory accesses. This feature is off by default and can be enabled using the following variable in cuda-gdb before the application is run:

    set cuda memcheck on

Once CUDA memcheck is enabled, global memory violations and misaligned global memory accesses are detected only in run or continue mode, not while single-stepping through the code.

You can also run the CUDA memory checker as a standalone tool: cuda-memcheck.

GPU Error Reporting

Chapter 010 CHECKING MEMORY ERRORS

With improved GPU error reporting in cuda-gdb, application bugs are now easier to identify and fix. The following table shows the new errors that are reported on GPUs with sm_20 and higher. Continuing the execution of your application after these errors could lead to application termination or indeterminate results.

Table 10-1. CUDA Exception Codes
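As a sketch, the standalone checker simply wraps the application from the shell (the binary name app is an arbitrary example):

```
$ cuda-memcheck ./app
```

Running under the standalone tool does not require starting a cuda-gdb session.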