CUFFT Library User's Guide

1. ... 29
3.9.3. Functions cufftExecC2R and cufftExecZ2D 30
3.10. Function cufftSetStream 31
3.11. Function cufftGetVersion 31
3.12. Function cufftSetCompatibilityMode 32
3.13. Parameter cufftCompatibility 32
3.14. CUFFT Types 32
3.14.1. Parameter cufftType 33
3.14.2. Parameters for Transform Direction 33
3.14.3. Other CUFFT Types 33
3.14.3.1. cufftHandle 33
3.14.3.2. cufftReal 33
3.14.3.3. cufftDoubleReal 33
3.14.3.4. cufftComplex 33
3.14.3.5. cufftDoubleComplex 34
Chapter 4. CUFFT Code Examples 35
4.1. 1D Complex-to-C
2. www.nvidia.com CUFFT Library User's Guide DU-06707-001_v5.5 | 4
Using the CUFFT API
- cufftPlan1d(), cufftPlan2d(), cufftPlan3d(): create a simple plan for a 1D, 2D, or 3D transform, respectively.
- cufftPlanMany(): creates a plan supporting batched input and strided data layouts.
Among the plan creation functions, cufftPlanMany() allows use of more complicated data layouts and batched executions. Execution of a transform of a particular size and type may take several stages of processing. When a plan for the transform is generated, CUFFT derives the internal steps that need to be taken. These steps may include multiple kernel launches, memory copies, and so on. In addition, all the intermediate buffer allocations (on CPU/GPU memory) take place during planning. These buffers are released when the plan is destroyed.
In the worst case, the CUFFT Library allocates space for 8*batch*n[0]*...*n[rank-1] cufftComplex or cufftDoubleComplex elements for single- and double-precision transforms, respectively. Here batch denotes the number of transforms that will be executed in parallel, rank is the number of dimensions of the input data (see Multidimensional transforms), and n[] is the array of transform dimensions. Depending on the configuration of the plan, less memory may be used. In some specific cases, the temporary space allocations can be as low as 1*batch*n[0]*...*n[rank-1] cufftComplex or cufftDoubleComplex elements. This temporary space is
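The worst-case bound quoted above is simple arithmetic, and a small host-side helper can make it concrete. The sketch below is only an illustration of that formula: the function names are hypothetical (they are not part of the CUFFT API), and the element sizes of 8 and 16 bytes assume that cufftComplex holds two floats and cufftDoubleComplex holds two doubles.

```c
#include <stddef.h>

/* Hypothetical helper, not a CUFFT function: worst-case number of
 * temporary elements, 8 * batch * n[0] * ... * n[rank-1]. */
size_t worst_case_work_elements(int rank, const int *n, int batch)
{
    size_t total = 8u * (size_t)batch;
    int d;
    for (d = 0; d < rank; ++d)
        total *= (size_t)n[d];
    return total;
}

/* Same bound in bytes. Assumes 8-byte single-precision and 16-byte
 * double-precision complex elements. */
size_t worst_case_work_bytes(int rank, const int *n, int batch,
                             int double_precision)
{
    size_t elem_size = double_precision ? 16u : 8u;
    return worst_case_work_elements(rank, n, batch) * elem_size;
}
```

For a batch of 10 transforms of length 256 this gives 20480 elements, which is an upper bound rather than a prediction of what a particular plan will allocate.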
3. 3.4.2. Function cufftEstimate2d()
cufftResult cufftEstimate2d(int nx, int ny, cufftType type, size_t *workSize);
During plan execution, CUFFT requires a work area for temporary storage of intermediate results. This call returns an estimate for the size of the work area required, given the specified parameters and assuming default plan settings. Note that changing some plan settings, such as compatibility mode, may alter the size required for the work area.
Input
nx: The transform size in the x dimension (number of rows)
ny: The transform size in the y dimension (number of columns)
type: The transform data type (e.g., CUFFT_C2R for single precision complex to real)
Output
Return Values
CUFFT_INVALID_SIZE: Either or both of the nx or ny parameters is not a supported size.
3.4.3. Function cufftEstimate3d()
cufftResult cufftEstimate3d(int nx, int ny, int nz, cufftType type, size_t *workSize);
During plan execution, CUFFT requires a work area for temporary storage of intermediate results. This call returns an estimate for the size of the work area required, given the specified parameters and assuming default plan settings. Note that changing some plan settings, such as compatibility mode, may alter the size required for the work area.
type: The transform data type (e.g., CUFFT_R2C for single precisio
4. Function cufftEstimateMany 22
3.5. CUFFT Refined Estimated Size of Work Area 23
3.5.1. Function cufftGetSize1d 23
3.5.2. Function cufftGetSize2d 24
3.5.3. Function cufftGetSize3d 25
3.5.4. Function cufftGetSizeMany 25
3.6. Function cufftGetSize 27
3.7. CUFFT Caller Allocated Work Area Support 27
3.7.1. Function cufftSetAutoAllocation 27
3.7.2. Function cufftSetWorkArea 28
3.8. Function cufftDestroy 28
3.9. CUFFT Execution 29
3.9.1. Functions cufftExecC2C and cufftExecZ2Z 29
3.9.2. Functions cufftExecR2C and cufftExecD2Z
5. Thread Safety
Starting with CUFFT version 4.1, the CUFFT Library is thread safe and its functions can be called from multiple host threads, even with the same plan (cufftHandle). The only requirement is that the output data (memory intervals) are disjoint.
2.9. Accuracy and Performance
A general DFT can be implemented as a matrix-vector multiplication that requires $O(N^2)$ operations. However, the CUFFT Library employs the Cooley-Tukey algorithm to reduce the number of required operations and to optimize the performance of particular transform sizes. This algorithm expresses a DFT recursively in terms of smaller DFT building blocks. The CUFFT Library implements the following DFT building blocks: radix-2, radix-3, radix-5, and radix-7. Hence the performance of any transform size that can be factored as $2^a \times 3^b \times 5^c \times 7^d$ (where a, b, c, and d are non-negative integers) is optimized in the CUFFT library. There are also radix-m building blocks for other primes m whose value is < 128. When the length cannot be decomposed as multiples of powers of primes from 2 to 127, Bluestein's algorithm is used. The accuracy of the Bluestein implementation degrades with larger sizes compared to the pure Cooley-Tukey implementation, specifically in single-precision mode, due to the accumulation of floating-point operation inaccuracies. The pure Cooley-Tukey implementation has excellent accuracy, with the relative error growing proportionally to $\log N$, where N is t
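The factorization rule described in this section can be checked on the host before choosing a transform size. The following is a minimal sketch that mirrors the wording of the text only; it does not reproduce the library's internal dispatch logic, and the enum and function names are hypothetical.

```c
/* Illustrative classification of an FFT length, following the text.
 * These names are hypothetical and are not part of the CUFFT API. */
typedef enum {
    FFT_PATH_INVALID,       /* n == 0 */
    FFT_PATH_RADIX_2357,    /* n = 2^a * 3^b * 5^c * 7^d */
    FFT_PATH_SMALL_PRIMES,  /* every prime factor is below 128 */
    FFT_PATH_BLUESTEIN      /* some prime factor exceeds 127 */
} fft_path;

/* Divide out every power of p from n. */
static unsigned long strip_factor(unsigned long n, unsigned long p)
{
    while (n % p == 0)
        n /= p;
    return n;
}

fft_path classify_fft_size(unsigned long n)
{
    unsigned long p;
    if (n == 0)
        return FFT_PATH_INVALID;
    n = strip_factor(n, 2);
    n = strip_factor(n, 3);
    n = strip_factor(n, 5);
    n = strip_factor(n, 7);
    if (n == 1)
        return FFT_PATH_RADIX_2357;
    /* Odd composites in this range never divide n here, because their
     * prime factors were already removed by earlier iterations. */
    for (p = 11; p <= 127; p += 2)
        n = strip_factor(n, p);
    return (n == 1) ? FFT_PATH_SMALL_PRIMES : FFT_PATH_BLUESTEIN;
}
```

A caller that is free to pad its data might use such a check to prefer lengths in the first category, since those are the sizes the text describes as optimized.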
6. cufftExecC2R() / cufftExecZ2D(): complex-to-real inverse transform for single/double precision.
Each of those functions demands a different input data layout (see Data Layout for details).
2.4. Data Layout
In the CUFFT Library, data layout depends strictly on the configuration and the transform type. In the case of a general complex-to-complex transform, both the input and output data shall be a cufftComplex/cufftDoubleComplex array in single- and double-precision modes, respectively. In C2R mode, an input array $(x_1, x_2, \ldots, x_{\lfloor N/2 \rfloor + 1})$ of only non-redundant complex elements is required. The output array $(X_1, X_2, \ldots, X_N)$ consists of cufftReal/cufftDoubleReal elements in this mode. Finally, R2C demands an input array $(X_1, X_2, \ldots, X_N)$ of real values and returns an array $(x_1, x_2, \ldots, x_{\lfloor N/2 \rfloor + 1})$ of non-redundant complex elements.
In real-to-complex and complex-to-real transforms, the size of input data and the size of output data differ. For out-of-place transforms, a separate array of appropriate size is created. For in-place transforms, the user can specify one of two supported data layouts: native or padded. The first is used for best performance and the latter for FFTW compatibility.
In the padded layout, output signals begin at the same memory addresses as the input data. In other words, input data for real-to-complex and output data for complex-to-real must be padd
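To make the size mismatch between real and complex data concrete, the sketch below computes the number of non-redundant complex elements for a 1D transform and the length of a real row that is large enough to hold them in place. The padded length of 2*(N/2+1) reals follows the FFTW convention for in-place real transforms; treating it as the padded layout meant here is an assumption, and the function names are hypothetical.

```c
#include <stddef.h>

/* Number of non-redundant complex elements for a length-n real signal:
 * floor(n/2) + 1 (integer division performs the floor). */
size_t r2c_complex_count(size_t n)
{
    return n / 2 + 1;
}

/* Length, in real elements, of one padded row for an in-place R2C or
 * C2R transform, assuming the FFTW convention: each complex element
 * occupies the space of two reals. */
size_t r2c_padded_real_count(size_t n)
{
    return 2 * r2c_complex_count(n);
}
```

For N = 256 the complex array has 129 elements and the padded real row has 258, so an in-place buffer needs two more reals per signal than the unpadded input.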
7. via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist. All arrays are assumed to be in CPU memory.
Input
rank: Dimensionality of the transform (1, 2, or 3)
n: Array of size rank, describing the size of each dimension
inembed: Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL, all other advanced data layout parameters are ignored.
istride: Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension
idist: Indicates the distance between the first element of two consecutive signals in a batch of the input data
onembed: Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL, all other advanced data layout parameters are ignored.
ostride: Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension
odist: Indicates the distance between the first element of two consecutive signals in a batch of the output data
type: The transform data type (e.g., CUFFT_R2C for single precision real to complex)
batch: Batch size for this transform
Output
Return Values
CUFFT_SUCCESS: CUFFT successfully returned the size of the work space.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan fail
8. Dims | FFT type | Input data size | Output data size
1D | R2C | $N_1$ cufftReal | $\lfloor N_1/2 \rfloor + 1$ cufftComplex
2D | C2C | $N_1 N_2$ cufftComplex | $N_1 N_2$ cufftComplex
2D | C2R | $N_1 (\lfloor N_2/2 \rfloor + 1)$ cufftComplex | $N_1 N_2$ cufftReal
2D | R2C | $N_1 N_2$ cufftReal | $N_1 (\lfloor N_2/2 \rfloor + 1)$ cufftComplex
3D | C2C | $N_1 N_2 N_3$ cufftComplex | $N_1 N_2 N_3$ cufftComplex
3D | C2R | $N_1 N_2 (\lfloor N_3/2 \rfloor + 1)$ cufftComplex | $N_1 N_2 N_3$ cufftReal
3D | R2C | $N_1 N_2 N_3$ cufftReal | $N_1 N_2 (\lfloor N_3/2 \rfloor + 1)$ cufftComplex
For example, a static declaration of a three-dimensional array for the output of an out-of-place real-to-complex transform will look like this:
    cufftComplex odata[N1][N2][N3/2+1];
2.6. Advanced Data Layout
The advanced data layout feature allows transforming only a subset of an input array, or outputting to only a portion of a larger data structure. It can be set by calling the function:
    cufftResult cufftPlanMany(cufftHandle *plan, int rank, int *n,
                              int *inembed, int istride, int idist,
                              int *onembed, int ostride, int odist,
                              cufftType type, int batch);
Passing inembed or onembed set to NULL is a special case and is equivalent to passing n for each. This is the same as the basic data layout, and other advanced parameters such as istride are ignored.
If the advanced parameters are to be used, then all of the advanced interface parameters must be specified correctly. Advanced parameters are defined in units of the relevant data type (cufftReal, cufftDoubleReal, cufftComplex, or cufftDoubleComplex).
Advanced layout can be perceived as an add
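One way to picture the advanced layout parameters is as a mapping from a logical element (batch index plus per-dimension indices) to an offset in memory. The sketch below is an assumption built from the parameter descriptions in this guide (a stride within the innermost dimension, embedding sizes for the inner dimensions, and a distance between batches); it is not taken from the library source, and the helper name is hypothetical. The same function applies to the output side with onembed, ostride, and odist.

```c
#include <stddef.h>

/* Hypothetical helper, not a CUFFT function. Returns the offset, in
 * units of the transform element type, of element idx[0..rank-1] of
 * batch b. nembed[0] is never used as a multiplier, which matches the
 * statement that the outermost embedding size is effectively ignored. */
size_t advanced_layout_offset(int rank, const int *idx, const int *nembed,
                              int stride, int dist, int b)
{
    size_t linear = 0;
    int d;
    for (d = 0; d < rank; ++d) {
        if (d == 0)
            linear = (size_t)idx[0];
        else
            linear = linear * (size_t)nembed[d] + (size_t)idx[d];
    }
    return (size_t)b * (size_t)dist + linear * (size_t)stride;
}
```

For rank 2 this reduces to b*idist + (x*inembed[1] + y)*istride, so a sub-array of a larger allocation can be addressed by choosing inembed larger than n.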
9. .2.2. Function cufftPlan2d()
cufftResult cufftPlan2d(cufftHandle *plan, int nx, int ny, cufftType type);
Creates a 2D FFT plan configuration according to specified signal sizes and data type.
Input
nx: The transform size in the x dimension (number of rows)
ny: The transform size in the y dimension (number of columns)
type: The transform data type (e.g., CUFFT_C2R for single precision complex to real)
Output
plan: Contains a CUFFT 2D plan handle value
Return Values
CUFFT_SUCCESS: CUFFT successfully created the FFT plan.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR: An internal driver error was detected.
CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
CUFFT_INVALID_SIZE: Either or both of the nx or ny parameters is not a supported size.
3.2.3. Function cufftPlan3d()
cufftResult cufftPlan3d(cufftHandle *plan, int nx, int ny, int nz, cufftType type);
Creates a 3D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftPlan2d() except that it takes a third size parameter nz.
Input
plan: Pointer to a cufftHandle object
nx: The transform size in the x dimension
ny: The transform size in the y dimension
nz: The transform size in the z dimension
type: The transform data type (e.g., CUFFT_R2C for single precision real to complex)
Outp
10. FFTW compatibility: both CUFFT_COMPATIBILITY_FFTW_PADDING and CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC. Refer to the FFTW online documentation for detailed FFTW data layout specifications. The default mode is CUFFT_COMPATIBILITY_FFTW_PADDING.
2.5. Multidimensional transforms
A multidimensional DFT transforms a $d$-dimensional array $x_{\mathbf{n}}$, where $\mathbf{n} = (n_1, n_2, \ldots, n_d)$, into its frequency-domain array given by
$$X_{\mathbf{k}} = \sum_{\mathbf{n}=0}^{\mathbf{N}-1} x_{\mathbf{n}} \, e^{-2\pi i \, \mathbf{k} \cdot \frac{\mathbf{n}}{\mathbf{N}}}$$
where $\frac{\mathbf{n}}{\mathbf{N}} = \left(\frac{n_1}{N_1}, \frac{n_2}{N_2}, \ldots, \frac{n_d}{N_d}\right)$ and the summation denotes the set of nested summations
$$\sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} \cdots \sum_{n_d=0}^{N_d-1}$$
CUFFT supports one-dimensional, two-dimensional, and three-dimensional transforms, which can all be called by the same cufftExec* functions (see Fourier Transform Types).
Similar to the one-dimensional case, the frequency-domain representation of real-valued input data satisfies Hermitian symmetry, defined as
$$X_{(n_1, n_2, \ldots, n_d)} = X^{*}_{(N_1 - n_1, N_2 - n_2, \ldots, N_d - n_d)}$$
C2R and R2C algorithms take advantage of this fact by operating only on half of the elements of the signal array, namely on $X_{\mathbf{n}}$ for $\mathbf{n} \in \{1, \ldots, N_1\} \times \cdots \times \{1, \ldots, N_{d-1}\} \times \{1, \ldots, \lfloor N_d/2 \rfloor + 1\}$.
The general rules of data alignment described in Data Layout apply to higher-dimensional transforms. The following table summarizes input and output data sizes for multidimensional DFTs:
Dims | FFT type | Input data size | Output data size
1D | C2C | $N_1$ cufftComplex | $N_1$ cufftComplex
1D | C2R | $\lfloor N_1/2 \rfloor + 1$ cufftComplex | $N_1$ cufftReal
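The sizes in the table follow one rule: only the last (innermost) dimension is halved, and all other dimensions keep their full length. The sketch below applies that rule for any rank; the function name is hypothetical and the code is only an illustration of the table, not part of CUFFT.

```c
#include <stddef.h>

/* Hypothetical helper: number of non-redundant complex elements for a
 * real-to-complex (or complex-to-real) transform with dimensions
 * n[0..rank-1]: n[0] * ... * n[rank-2] * (floor(n[rank-1]/2) + 1). */
size_t nonredundant_complex_count(int rank, const int *n)
{
    size_t count = 1;
    int d;
    for (d = 0; d < rank - 1; ++d)
        count *= (size_t)n[d];
    /* Integer division performs the floor for positive sizes. */
    count *= (size_t)(n[rank - 1] / 2 + 1);
    return count;
}
```

A 4 x 6 real input, for instance, yields 4 * (6/2 + 1) = 16 complex outputs rather than 24.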
11. allocated separately for each individual plan when it is created (i.e., temporary space is not shared between the plans).
The next step in using the library is to call an execution function such as cufftExecC2C() (see Parameter cufftType), which will perform the transform with the specifications defined at planning.
One can create a CUFFT plan and perform multiple transforms on different data sets by providing different input and output pointers. Once the plan is no longer needed, the cufftDestroy() function should be called to release the resources allocated for the plan.
2.3. Fourier Transform Types
Apart from the general complex-to-complex (C2C) transform, CUFFT efficiently implements two other types: real-to-complex (R2C) and complex-to-real (C2R). In many practical applications the input vector is real-valued. It can be easily shown that in this case the output satisfies Hermitian symmetry ($X_k = X^{*}_{N-k}$, where the star denotes complex conjugation). The converse is also true: for complex-Hermitian input the inverse transform will be purely real-valued. CUFFT takes advantage of this redundancy and works only on the first half of the Hermitian vector.
Transform execution functions for single and double precision are defined separately as:
- cufftExecC2C() / cufftExecZ2Z(): complex-to-complex transforms for single/double precision.
- cufftExecR2C() / cufftExecD2Z(): real-to-complex forward transform for single/double precision.
-
12. and general signal processing. The CUFFT library provides a simple interface for computing FFTs on an NVIDIA GPU, which allows users to quickly leverage the floating-point power and parallelism of the GPU in a highly optimized and tested FFT library.
The CUFFT product supports a wide range of FFT inputs and options efficiently on NVIDIA GPUs. This version of the CUFFT library supports the following features:
- Algorithms highly optimized for input sizes that can be written in the form $2^a \times 3^b \times 5^c \times 7^d$
- An $O(n \log n)$ algorithm for every input data size
- Complex and real-valued input and output
  - C2C: Complex input to complex output
  - R2C: Real input to complex output
  - C2R: Symmetric complex input to real output
- 1D, 2D, and 3D transforms
- Execution of multiple 1D, 2D, and 3D transforms simultaneously
- Single-precision (32-bit floating point) and double-precision (64-bit floating point)
- In-place and out-of-place transforms
- FFTW compatible data layouts
- Arbitrary intra- and inter-dimension element strides (strided layout)
- Streamed execution, enabling asynchronous computation and data movement
- Transform sizes up to 128 million elements in single precision and up to 64 million elements in double precision in any dimension, limited by the available GPU memory
- Thread-safe API that can be called from multiple
13. of the plan to be destroyed
Return Values
CUFFT_SUCCESS: CUFFT successfully destroyed the FFT plan.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
3.9. CUFFT Execution
3.9.1. Functions cufftExecC2C() and cufftExecZ2Z()
cufftResult cufftExecC2C(cufftHandle plan, cufftComplex *idata, cufftComplex *odata, int direction);
cufftResult cufftExecZ2Z(cufftHandle plan, cufftDoubleComplex *idata, cufftDoubleComplex *odata, int direction);
cufftExecC2C() (cufftExecZ2Z()) executes a single-precision (double-precision) complex-to-complex transform plan in the transform direction specified by the direction parameter. CUFFT uses the GPU memory pointed to by the idata parameter as input data. This function stores the Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform.
Input
plan: The cufftHandle object for the plan to be executed
idata: Pointer to the complex input data (in GPU memory) to transform
odata: Pointer to the complex output data (in GPU memory)
direction: The transform direction: CUFFT_FORWARD or CUFFT_INVERSE
Output
odata: Contains the complex Fourier coefficients
Return Values
CUFFT_SUCCESS: CUFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE: At least one of the parameters idata, odata, and direction is not valid.
CUFFT fail
14. of total elements in the least significant dimension of the input array is then istride*inembed[rank-1].
The inembed[0] or onembed[0] corresponds to the most significant (that is, the outermost) dimension and is effectively ignored, since the idist or odist parameter provides this information instead. Note that the size of each dimension of the transform should be less than or equal to the inembed and onembed values for the corresponding dimension, that is $n[i] \le inembed[i]$ and $n[i] \le onembed[i]$, where $i \in \{0, \ldots, rank-1\}$.
The idist and odist parameters indicate the distance between the first element of two consecutive batches in the input and output data. One can derive the total input data size as idist*batch, in units of transform elements (e.g., cufftComplex in a C2C single-precision transform).
2.7. Streamed CUFFT Transforms
Every CUFFT plan may be associated with a CUDA stream. Once so associated, all launches of the internal stages of that plan take place through the specified stream. Streaming of CUFFT execution allows for potential overlap between transforms and memory copies. (See the NVIDIA CUDA Programming Guide for more information on streams.) If no stream is associated with a plan, launches take place in stream(0), the default CUDA stream, and no overlap will be possible. Note that many plan executions require multiple kernel launches.
2.8.
15. plan settings that may have been made.
Input
nx: The transform size (e.g., 256 for a 256-point FFT)
type: The transform data type (e.g., CUFFT_C2C for single precision complex to complex)
Output
Return Values
3.5.2. Function cufftGetSize2d()
cufftResult cufftGetSize2d(cufftHandle plan, int nx, int ny, cufftType type, size_t *workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimate2d(), given the specified parameters and taking into account any plan settings that may have been made.
Input
nx: The transform size in the x dimension (number of rows)
ny: The transform size in the y dimension (number of columns)
type: The transform data type (e.g., CUFFT_C2R for single precision complex to real)
Output
Return Values
CUFFT_SUCCESS: CUFFT successfully returned the size of the work space.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR: An internal driver error was detected.
CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
CUFFT_INVALID_SIZE: Either or both of the nx or ny parameters is not a supported size.
3.5.3. Function cufftGetSize3d()
cufftResult cufftGetSize3d(cufftHandle plan,
16. plan parameter is not a valid handle.
CUFFT_INVALID_VALUE: At least one of the parameters idata and odata is not valid.
CUFFT_INTERNAL_ERROR: An internal driver error was detected.
CUFFT_EXEC_FAILED: CUFFT failed to execute the transform on the GPU.
3.9.3. Functions cufftExecC2R() and cufftExecZ2D()
cufftResult cufftExecC2R(cufftHandle plan, cufftComplex *idata, cufftReal *odata);
cufftResult cufftExecZ2D(cufftHandle plan, cufftDoubleComplex *idata, cufftDoubleReal *odata);
cufftExecC2R() (cufftExecZ2D()) executes a single-precision (double-precision) complex-to-real, implicitly inverse, CUFFT transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. The input array holds only the non-redundant complex Fourier coefficients. This function stores the real output values in the odata array. The idata and odata pointers are both required to be aligned to the cufftComplex data type in single-precision transforms and to the cufftDoubleComplex type in double-precision transforms. If idata and odata are the same, this method does an in-place transform.
Input
plan: The cufftHandle object for the plan to be executed
idata: Pointer to the complex input data (in GPU memory) to transform
odata: Pointer to the real output data (in GPU memory)
Output
odata: Contains the real output data
Return Values
CUFFT_SUCCESS: CUFFT successfully executed the FFT plan.
The plan pa
17. = 3,  // No longer used
    CUFFT_INVALID_VALUE = 4,  // User specified an invalid pointer or parameter
    CUFFT_INTERNAL_ERROR = 5,  // Driver or internal CUFFT library error
    CUFFT_EXEC_FAILED = 6,  // Failed to execute an FFT on the GPU
    CUFFT_SETUP_FAILED = 7,  // The CUFFT library failed to initialize
    CUFFT_INVALID_SIZE = 8,  // User specified an invalid transform size
    CUFFT_UNALIGNED_DATA = 9,  // No longer used
    CUFFT_INVALID_DEVICE = 10,  // Plan creation and execution are on different device
    CUFFT_NO_WORKSPACE = 11  // Workspace not initialized
} cufftResult;
Users are encouraged to check return values from CUFFT functions for errors, as shown in CUFFT Code Examples.
3.2. CUFFT Basic Plans
3.2.1. Function cufftPlan1d()
cufftResult cufftPlan1d(cufftHandle *plan, int nx, cufftType type, int batch);
Creates a 1D FFT plan configuration for a specified signal size and data type. The batch input parameter tells CUFFT how many 1D transforms to configure.
Input
nx: The transform size (e.g., 256 for a 256-point FFT)
type: The transform data type (e.g., CUFFT_C2C for single precision complex to complex)
Output
plan: Contains a CUFFT 1D plan handle value
Return Values
CUFFT_SUCCESS: CUFFT successfully created the FFT plan.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.
3
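When checking return values, it can help to print a symbolic name rather than a bare number. The helper below is a hypothetical convenience function, not something CUFFT provides. It takes a plain int so that it compiles without cufft.h; the names for codes 4 through 11 come from the listing above, while the names for codes 0 through 3 are an assumption based on the cufft.h header rather than on this excerpt.

```c
/* Hypothetical helper, not part of CUFFT: map a cufftResult value
 * (passed as int) to its symbolic name. */
const char *cufft_result_name(int code)
{
    switch (code) {
    case 0:  return "CUFFT_SUCCESS";         /* assumed from cufft.h */
    case 1:  return "CUFFT_INVALID_PLAN";    /* assumed from cufft.h */
    case 2:  return "CUFFT_ALLOC_FAILED";    /* assumed from cufft.h */
    case 3:  return "CUFFT_INVALID_TYPE";    /* assumed; no longer used */
    case 4:  return "CUFFT_INVALID_VALUE";
    case 5:  return "CUFFT_INTERNAL_ERROR";
    case 6:  return "CUFFT_EXEC_FAILED";
    case 7:  return "CUFFT_SETUP_FAILED";
    case 8:  return "CUFFT_INVALID_SIZE";
    case 9:  return "CUFFT_UNALIGNED_DATA";  /* no longer used */
    case 10: return "CUFFT_INVALID_DEVICE";
    case 11: return "CUFFT_NO_WORKSPACE";
    default: return "UNKNOWN_CUFFT_RESULT";
    }
}
```

In real code the argument would be the value returned by a call such as cufftPlan1d(), and the string would typically go to stderr alongside the name of the failing call.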
18. NVIDIA CUFFT LIBRARY USER'S GUIDE
TABLE OF CONTENTS
Chapter 1. Introduction 1
Chapter 2. Using the CUFFT API 3
2.1. Accessing CUFFT 4
2.2. Fourier Transform Setup 4
2.3. Fourier Transform Types 5
2.4. Data Layout 6
2.4.1. FFTW Compatibility Mode 7
2.5. Multidimensional transforms 7
2.6. Advanced Data Layout 8
2.7. Streamed CUFFT Transforms 10
2.8. Thread Safety 10
2.9. Accuracy and Performance 10
Chapter 3. CUFFT API Reference 12
3.1. Return value cufftResult 12
3.2. CUFFT Basic Plans
19. FFTW compatible modes. When desired, FFTW compatibility can be configured for padding only, for asymmetric complex inputs only, or for full compatibility. If the cufftSetCompatibilityMode() API fails, later cufftExecute calls are not guaranteed to work.
Input
plan: The cufftHandle object for the plan to configure
mode: The cufftCompatibility option to be used:
    CUFFT_COMPATIBILITY_NATIVE
    CUFFT_COMPATIBILITY_FFTW_PADDING (default)
    CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
    CUFFT_COMPATIBILITY_FFTW_ALL
Return Values
CUFFT_SUCCESS: CUFFT successfully set compatibility mode.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
3.13. Parameter cufftCompatibility
The CUFFT Library defines FFTW compatible data layouts using the following enumeration of values. See FFTW Compatibility Mode for more details.
    typedef enum cufftCompatibility_t {
        // Compact data in native format (highest performance)
        CUFFT_COMPATIBILITY_NATIVE = 0,
        // FFTW-compatible alignment (the default value)
        CUFFT_COMPATIBILITY_FFTW_PADDING = 1,
        // Waives the C2R symmetry requirement input
        CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC = 2,
        CUFFT_COMPATIBILITY_FFTW_ALL =
            CUFFT_COMPATIBILITY_FFTW_PADDING |
            CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
    } cufftCompatibility;
3.14. CUFFT Types
3.14.1. Parameter cufftType
The CUFFT library s
20. MakePlan calls, CUFFT does not allocate the work area. This is the preferred sequence for callers wishing to manage work area allocation.
Input
plan: Pointer to a cufftHandle object
autoAllocate: Boolean to indicate whether to allocate work area
Return Values
CUFFT_SUCCESS: CUFFT successfully allows user to manage work area.
CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
3.7.2. Function cufftSetWorkArea()
cufftResult cufftSetWorkArea(cufftHandle plan, void *workArea);
cufftSetWorkArea() overrides the work area pointer associated with a plan. If the work area was auto-allocated, CUFFT frees the auto-allocated space. The cufftExecute calls assume that the work area pointer is valid and that it points to a contiguous region in device memory that does not overlap with any other work area. If this is not the case, results are indeterminate.
Input
Output
Return Values
CUFFT_SUCCESS: CUFFT successfully allows user to override workArea pointer.
CUFFT_INTERNAL_ERROR: An internal driver error was detected.
CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
3.8. Function cufftDestroy()
cufftResult cufftDestroy(cufftHandle plan);
Frees all GPU resources associated with a CUFFT plan and destroys the internal plan data structure. This function should be called once a plan is no longer needed, to avoid wasting GPU memory.
Input
plan: The cufftHandle object
21. W has many plans and a single execute function, while CUFFT has fewer plans but multiple execute functions. The CUFFT execute functions determine the precision (single or double) and whether the input is complex or real-valued. The following table shows the relationship between the two interfaces.
FFTW function | CUFFT function
fftw_plan_dft_1d(), fftw_plan_dft_r2c_1d(), fftw_plan_dft_c2r_1d() | cufftPlan1d()
fftw_plan_dft_2d(), fftw_plan_dft_r2c_2d(), fftw_plan_dft_c2r_2d() | cufftPlan2d()
fftw_plan_dft_3d(), fftw_plan_dft_r2c_3d(), fftw_plan_dft_c2r_3d() | cufftPlan3d()
fftw_plan_dft(), fftw_plan_dft_r2c(), fftw_plan_dft_c2r() | cufftPlanMany()
fftw_plan_many_dft(), fftw_plan_many_dft_r2c(), fftw_plan_many_dft_c2r() | cufftPlanMany()
fftw_execute() | cufftExecC2C(), cufftExecZ2Z(), cufftExecR2C(), cufftExecD2Z(), cufftExecC2R(), cufftExecZ2D()
fftw_destroy_plan() | cufftDestroy()
Chapter 6. FFTW INTERFACE TO CUFFT
NVIDIA provides FFTW3 interfaces to the CUFFT library. This allows applications using FFTW to use NVIDIA GPUs with minimal modifications to program source code. To use the interface, first do the following two steps:
- It is recommended that you replace the include file fftw3.h with cufftw.h.
- Instead of linking with the double/single precision libraries such as the fftw3/fftw3f libraries, link with both the CUFFT and CUFFTW libraries.
After an
22. an parameter is not a valid handle.
CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR: An internal driver error was detected.
CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
CUFFT_INVALID_SIZE: One or more of the nx, ny, or nz parameters is not a supported size.
3.3.5. Function cufftMakePlanMany()
cufftResult cufftMakePlanMany(cufftHandle plan, int rank, int *n, int *inembed, int istride, int idist, int *onembed, int ostride, int odist, cufftType type, int batch, size_t *workSize);
Following a call to cufftCreate(), makes an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch input parameter tells CUFFT how many transforms to configure. With this function, batched plans of 1, 2, or 3 dimensions may be created.
The cufftPlanMany() API supports more complicated input and output data layouts via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist.
All arrays are assumed to be in CPU memory.
Input
rank: Dimensionality of the transform (1, 2, or 3)
n: Array of size rank, describing the size of each dimension
inembed: Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL, all other advanced data layout parameters are ignored.
istride: Indicates t
23. application is working using the FFTW3 interface, users may want to modify their code to move data to and from the GPU and use the routines documented in the FFTW Conversion Guide for the best performance.
The following tables show which components and functions of FFTW3 are supported in CUFFT.
Section in FFTW manual | Supported / Unsupported
Complex numbers | Supported: fftw_complex, fftwf_complex types
Precision | Supported: double (fftw3), single (fftwf3). Unsupported: long double (fftw3l) and quad precision (fftw3q) are not supported, since CUDA functions operate on double and single precision floating-point quantities
Memory Allocation | fftw_malloc, fftw_free, fftw_alloc_real, fftw_alloc_complex, fftwf_alloc_real, fftwf_alloc_complex
Multi-threaded FFTW | Unsupported: fftw3_threads, fftw3_omp are not supported (note that CUFFT is already multithreaded)
Distributed-memory FFTW with MPI | Unsupported: fftw3_mpi, fftw3f_mpi are not supported
Note that for each of the double precision functions below there is a corresponding single precision version with the letters fftw replaced by fftwf.
Section in FFTW manual | Supported / Unsupported
Using Plans | Supported: fftw_execute, fftw_destroy_plan, fftw_cleanup, fftw_print_plan. Unsupported: fftw_cost, fftw_flops exist but are not functional
Complex DFTs | Supported: fftw_plan_dft_1d, fftw_plan_dft_2d, fftw_plan_dft_3d, fftw_plan_dft
Planner Flags | Planner flag
24. atch of the input data
onembed: Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL, all other advanced data layout parameters are ignored.
ostride: Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension
odist: Indicates the distance between the first element of two consecutive signals in a batch of the output data
type: The transform data type (e.g., CUFFT_R2C for single precision real to complex)
Output
Return Values
CUFFT_INVALID_SIZE: One or more of the parameters is not a supported size.
3.5. CUFFT Refined Estimated Size of Work Area
The cufftGetSize*() routines give a more accurate estimate of the work area size required for a plan than the cufftEstimate*() routines, as they take into account any plan settings that may have been made. As discussed in the section CUFFT Estimated Size of Work Area, the workSize value(s) returned may be conservative, especially for values of n that are not multiples of powers of 2, 3, 5, and 7.
3.5.1. Function cufftGetSize1d()
cufftResult cufftGetSize1d(cufftHandle plan, int nx, cufftType type, int batch, size_t *workSize);
This call gives a more accurate estimate of the work area size required for a plan than cufftEstimate1d(), given the specified parameters and taking into account any
25. double-precision, floating-point real data type.

    typedef double cufftDoubleReal;

3.14.3.4 cufftComplex

A single-precision, floating-point complex data type that consists of interleaved real and imaginary components.

    typedef cuComplex cufftComplex;

3.14.3.5 cufftDoubleComplex

A double-precision, floating-point complex data type that consists of interleaved real and imaginary components.

    typedef cuDoubleComplex cufftDoubleComplex;

Chapter 4 CUFFT CODE EXAMPLES

This chapter provides six simple examples of complex and real 1D, 2D, and 3D transforms that use CUFFT to perform forward and inverse FFTs.

4.1 1D Complex-to-Complex Transforms

In this example a one-dimensional complex-to-complex transform is applied to the input data. Afterwards an inverse transform is performed on the computed frequency domain representation.

    #define NX 256
    #define BATCH 10

    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);
    if (cudaGetLastError() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to allocate\n");
        return;
    }

    if (cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: Plan creation failed");
        return;
    }

    /* Note: Identical pointers to input and output arrays impli
26. ecute the plan multiple times without recalculation of the configuration. This model works well for CUFFT because different kinds of FFTs require different thread configurations and GPU resources, and the plan interface provides a simple way of reusing configurations.

Computing a number BATCH of one-dimensional DFTs of size NX using CUFFT will typically look like this:

    #define NX 256
    #define BATCH 10

    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);
    cudaThreadSynchronize();
    cufftDestroy(plan);
    cudaFree(data);

2.1 Accessing CUFFT

The CUFFT and CUFFTW libraries are available as shared libraries. They consist of compiled programs ready for users to incorporate into applications with the compiler and linker. CUFFT can be downloaded from http://developer.nvidia.com/cufft. By selecting Download CUDA Production Release, users are able to install the package containing the CUDA Toolkit, SDK code samples and development drivers. The CUDA Toolkit contains CUFFT and the samples include simpleCUFFT.

The Linux release for simpleCUFFT assumes that the root install directory is /usr/local/cuda and that the locations of the products are contained there as follows. Modify the Makefile as appropria
27. ed An internal driver error was detected.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: One or more of the parameters is not a supported size.

3.6 Function cufftGetSize

    cufftResult cufftGetSize(cufftHandle plan, size_t *workSize);

Once plan generation has been done, either with the original API or the extensible API, this call returns the actual size of the work area required to support the plan. Callers who choose to manage work area allocation within their application must use this call after plan generation, and after any cufftSet*() calls subsequent to plan generation, if those calls might alter the required work space size.

Input
    plan: Pointer to a cufftHandle object.

Output

Return Values
    CUFFT_SUCCESS: CUFFT successfully returned the size of the work space.
    CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.

3.7 CUFFT Caller Allocated Work Area Support

3.7.1 Function cufftSetAutoAllocation

    cufftResult cufftSetAutoAllocation(cufftHandle plan, bool autoAllocate);

cufftSetAutoAllocation indicates that the caller intends to allocate and manage work areas for plans that have been generated. The CUFFT default behavior is to allocate the work area at plan generation time. If cufftSetAutoAllocation has been called with autoAllocate set to false prior to one of the cufft
28. ed In the native layout, no padding is required and both input and output data are formed as arrays of adequate types and sizes. Sizes of input/output data for all types of transforms are summarized in the table below:

    FFT type    input data size                              output data size
    C2C         $x$ cufftComplex                             $x$ cufftComplex
    C2R         $\lfloor x/2 \rfloor + 1$ cufftComplex       $x$ cufftReal
    R2C         $x$ cufftReal                                $\lfloor x/2 \rfloor + 1$ cufftComplex

(The total transform size is limited to 2 [exponent lost in extraction] elements in in-place R2C single-precision native transforms; see Introduction.)

The real-to-complex transform is implicitly a forward transform. For an in-place real-to-complex transform where FFTW-compatible output is desired, the input size must be padded to $2(\lfloor N/2 \rfloor + 1)$ real elements. For out-of-place transforms, input and output strides match the logical transform size $N$ and the non-redundant size $\lfloor N/2 \rfloor + 1$, respectively.

The complex-to-real transform is implicitly inverse. For in-place complex-to-real FFTs where FFTW-compatible output is selected (default padding mode; see Parameters for Transform Direction for details), the input stride is assumed to be $\lfloor N/2 \rfloor + 1$ cufftComplex elements. For out-of-place transforms, input and output strides match the logical transform non-redundant size $\lfloor N/2 \rfloor + 1$ and size $N$, respectively.

Starting with CUFFT version 4.1, transforms with advanced data layout are supported through the cufftPlanMany function. In th
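The size rules in the table above reduce to a few integer expressions. The following is a minimal host-side sketch in plain C, assuming nothing from the CUFFT headers: the `xform_kind` enum and all function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for illustration only; these are NOT the cufftType values. */
typedef enum { XFORM_C2C, XFORM_R2C, XFORM_C2R } xform_kind;

/* Non-redundant number of complex elements for a real transform of n points:
   floor(n/2) + 1 (integer division already rounds down). */
static size_t nonredundant_size(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;
}

/* Element count of the input array, in units of the input type
   (complex for C2C and C2R, real for R2C). */
static size_t input_elements(xform_kind kind, size_t n)
{
    return kind == XFORM_C2R ? nonredundant_size(n) : n;
}

/* Element count of the output array, in units of the output type
   (complex for C2C and R2C, real for C2R). */
static size_t output_elements(xform_kind kind, size_t n)
{
    return kind == XFORM_R2C ? nonredundant_size(n) : n;
}

/* Real elements to reserve for an in-place R2C transform with
   FFTW-compatible padding: 2 * (floor(n/2) + 1). */
static size_t padded_inplace_r2c_reals(size_t n)
{
    return 2 * nonredundant_size(n);
}
```

For a 256-point real-to-complex transform this gives 129 complex outputs, and 258 real elements for a padded in-place buffer.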
29. ed to execute the transform on the GPU.
    CUFFT_INTERNAL_ERROR: An internal driver error was detected.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.

3.9.2 Functions cufftExecR2C and cufftExecD2Z

    cufftResult cufftExecR2C(cufftHandle plan, cufftReal *idata, cufftComplex *odata);
    cufftResult cufftExecD2Z(cufftHandle plan, cufftDoubleReal *idata, cufftDoubleComplex *odata);

cufftExecR2C (cufftExecD2Z) executes a single-precision (double-precision) real-to-complex, implicitly forward, CUFFT transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the non-redundant Fourier coefficients in the odata array. Pointers to idata and odata are both required to be aligned to the cufftComplex data type in single-precision transforms and to the cufftDoubleComplex data type in double-precision transforms. If idata and odata are the same, this method does an in-place transform. Note the data layout differences between in-place and out-of-place transforms as described in Parameter cufftType.

Input
    plan: The cufftHandle object for the plan to be executed.
    idata: Pointer to the real input data (in GPU memory) to transform.
    odata: Pointer to the complex output data (in GPU memory).

Output
    odata: Contains the complex Fourier coefficients.

Return Values
    CUFFT_SUCCESS: CUFFT successfully executed the FFT plan.
    CUFFT_INVALID_PLAN: The
30. ed up to eight times that of a similarly sized power of 2. These routines return estimated workSize values which may still be smaller than the actual values needed, especially for values of n that are not multiples of powers of 2, 3, 5 and 7. More refined values are given by the cufftGetSize*() routines, but these values may still be conservative.

3.4.1 Function cufftEstimate1d

    cufftResult cufftEstimate1d(int nx, cufftType type, int batch, size_t *workSize);

During plan execution, CUFFT requires a work area for temporary storage of intermediate results. This call returns an estimate for the size of the work area required, given the specified parameters, and assuming default plan settings. Note that changing some plan settings, such as compatibility mode, may alter the size required for the work area.

Input
    nx: The transform size (e.g., 256 for a 256-point FFT).
    type: The transform data type (e.g., CUFFT_C2C for single-precision complex to complex).

Output

Return Values
    CUFFT_SUCCESS: CUFFT successfully returned the size of the work space.
    CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.
    CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
    CUFFT_INTERNAL_ERROR: An internal driver error was detected.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: The nx parameter is not a supported size.
31. int nx, int ny, int nz, cufftType type, size_t *workSize);

This call gives a more accurate estimate of the work area size required for a plan than cufftEstimate3d, given the specified parameters, and taking into account any plan settings that may have been made.

Input
    nx, ny, nz: The transform sizes in the x, y and z dimensions.
    type: The transform data type (e.g., CUFFT_R2C for single-precision real to complex).
    workSize: Pointer to the size of the work space.

Output

Return Values
    CUFFT_SUCCESS: CUFFT successfully returned the size of the work space.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: One or more of the nx, ny, or nz parameters is not a supported size.

3.5.4 Function cufftGetSizeMany

    cufftResult cufftGetSizeMany(cufftHandle plan, int rank, int *n,
                                 int *inembed, int istride, int idist,
                                 int *onembed, int ostride, int odist,
                                 cufftType type, int batch, size_t *workSize);

This call gives a more accurate estimate of the work area size required for a plan than cufftEstimateMany, given the specified parameters, and taking into account any plan settings that may have been made.

The batch input parameter tells CUFFT how many transforms to configure. With this function, batched plans of 1, 2, or 3 dimensions may be created. The cufftPlanMany API supports more complicated input and output data layouts
32. efine IX (NX+2)
    #define IY (NY+1)
    #define OX (NX+3)
    #define OY (NY+4)
    #define IDIST (IX*IY*ISTRIDE)
    #define ODIST (OX*OY*OSTRIDE)

    cufftHandle plan;
    cufftComplex *idata, *odata;
    int isize = IDIST * BATCH;
    int osize = ODIST * BATCH;
    int n[NRANK] = {NX, NY};
    int inembed[NRANK] = {IX, IY};
    int onembed[NRANK] = {OX, OY};

    cudaMalloc((void**)&idata, sizeof(cufftComplex)*isize);
    cudaMalloc((void**)&odata, sizeof(cufftComplex)*osize);
    if (cudaGetLastError() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to allocate\n");
        return;
    }

    /* Create a batched 2D plan */
    if (cufftPlanMany(&plan, NRANK, n,
                      inembed, ISTRIDE, IDIST,
                      onembed, OSTRIDE, ODIST,
                      CUFFT_C2C, BATCH) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT Error: Unable to create plan\n");
        return;
    }

    /* Execute the transform out-of-place */
    if (cufftExecC2C(plan, idata, odata, CUFFT_FORWARD) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT Error: Failed to execute plan\n");
        return;
    }

    if (cudaThreadSynchronize() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to synchronize\n");
        return;
    }

    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);

Chapter 5 FFTW CONVERSION GUIDE

CUFFT differs from FFTW in that FFT
33. es in-place transformation */

    if (cufftExecC2C(plan, data, data, CUFFT_FORWARD) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: ExecC2C Forward failed");
        return;
    }

    if (cufftExecC2C(plan, data, data, CUFFT_INVERSE) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: ExecC2C Inverse failed");
        return;
    }

    /* Divide by number of elements in data set to get back original data */

    if (cudaThreadSynchronize() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to synchronize\n");
        return;
    }

    cufftDestroy(plan);
    cudaFree(data);

4.2 1D Real-to-Complex Transforms

In this example a one-dimensional real-to-complex transform is applied to the input data.

    #define NX 256
    #define BATCH 10

    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex)*(NX/2+1)*BATCH);
    if (cudaGetLastError() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to allocate\n");
        return;
    }

    if (cufftPlan1d(&plan, NX, CUFFT_R2C, BATCH) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: Plan creation failed");
        return;
    }

    /* Use the CUFFT plan to transform the signal in place. */
    if (cufftExecR2C(plan, (cufftReal*)data, data) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: ExecR2C Forward failed");
        return;
    }

    if (cudaThreadSynchronize() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to synchronize
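The 4.1 listing notes that the data must be divided by the number of elements to recover the original signal after a forward and inverse transform, but it does not show that step. The following is a minimal host-side sketch of that division, assuming the result has already been copied back from the GPU and that each of the batch signals is scaled by its own transform length nx. The `complex_f32` struct and the function name are hypothetical stand-ins, not CUFFT types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an interleaved single-precision complex value
   (real part followed by imaginary part); not the CUFFT type itself. */
typedef struct { float re, im; } complex_f32;

/* Scale every element of a batch of round-tripped signals by 1/nx.
   Assumption: the data holds `batch` signals of `nx` points each, stored
   contiguously in host memory. */
static void normalize_roundtrip(complex_f32 *data, size_t nx, size_t batch)
{
    assert(data != NULL && nx > 0);
    const float scale = 1.0f / (float)nx;
    for (size_t i = 0; i < nx * batch; ++i) {
        data[i].re *= scale;
        data[i].im *= scale;
    }
}
```

On the device the same scaling would normally be done in a small kernel rather than on the host; the loop above only illustrates the arithmetic.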
34. he distance between two successive input elements in the least significant (i.e., innermost) dimension.
    idist: Indicates the distance between the first element of two consecutive signals in a batch of the input data.
    onembed: Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL, all other advanced data layout parameters are ignored.
    ostride: Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension.
    odist: Indicates the distance between the first element of two consecutive signals in a batch of the output data.
    type: The transform data type (e.g., CUFFT_R2C for single-precision real to complex).

Output

Return Values
    CUFFT_SUCCESS: CUFFT successfully created the FFT plan.
    CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
    CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.

3.4 CUFFT Estimated Size of Work Area

During plan execution, CUFFT requires a work area for temporary storage of intermediate results. The cufftEstimate*() calls return an estimate for the size of the work area required, given the specified parameters, and assuming default plan settings. Some problem sizes require much more storage than others. In particular, powers of 2 are very efficient in terms of temporary storage. Large prime numbers, however, use different algorithms and may ne
35. he transform size in points. For sizes handled by the Cooley-Tukey code path, the most efficient implementation is obtained by applying the following constraints (listed in order from the most generic to the most specialized constraint, with each subsequent constraint providing the potential of an additional performance improvement).

- Restrict the size along all dimensions to be representable as $2^a \times 3^b \times 5^c \times 7^d$. The CUFFT library has highly optimized kernels for transforms whose dimensions have these prime factors.
- Restrict the size along each dimension to use fewer distinct prime factors. For example, a transform of size $3^n$ will usually be faster than one of size $2^i \times 3^j$ even if the latter is slightly smaller.
- Restrict the power-of-two factorization term of the x dimension to be a multiple of either 256 for single-precision transforms or 64 for double-precision transforms. This further aids with memory coalescing.
- Restrict the x dimension of single-precision transforms to be strictly a power of two, either between 2 and 8192 for Fermi-class, Kepler-class, and more recent GPUs, or between 2 and 2048 for earlier architectures. These transforms are implemented as specialized hand-coded kernels that keep all intermediate results in shared memory.
- Use native compatibility mode for in-place complex-to-real or real-to-complex tra
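The first constraint above, a size with no prime factors other than 2, 3, 5 and 7, is easy to check before creating a plan. The following is a minimal sketch in plain C. Both function names are hypothetical, and searching upward for the next suitable size is only one possible policy for applications that are free to choose their transform length.

```c
#include <assert.h>

/* Returns 1 if n can be written as 2^a * 3^b * 5^c * 7^d, otherwise 0. */
static int has_only_small_prime_factors(unsigned long n)
{
    static const unsigned long primes[] = {2, 3, 5, 7};
    assert(n > 0);
    for (int i = 0; i < 4; ++i) {
        while (n % primes[i] == 0) {
            n /= primes[i];   /* strip every occurrence of this prime */
        }
    }
    return n == 1;            /* anything left over is a larger prime factor */
}

/* Smallest size >= n whose only prime factors are 2, 3, 5 and 7. */
static unsigned long next_friendly_size(unsigned long n)
{
    assert(n > 0);
    while (!has_only_small_prime_factors(n)) {
        ++n;
    }
    return n;
}
```

For example, 257 is prime, and the next size that satisfies the constraint is 270 (2 x 3^3 x 5). The later constraints in the list (fewer distinct primes, power-of-two sizes) may favor a different choice such as 512.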
36. independent host threads. The CUFFTW library provides the FFTW3 API to facilitate porting of existing FFTW applications.

Chapter 2 USING THE CUFFT API

This chapter provides a general overview of the CUFFT library API. For more complete information on specific functions, see CUFFT API Reference. Users are encouraged to read this chapter before continuing with more detailed descriptions.

The Discrete Fourier transform (DFT) maps a complex-valued vector $x_n$ (time domain) into its frequency domain representation given by

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i \frac{kn}{N}}$$

where $X_k$ is a complex-valued vector of the same size. This is known as a forward DFT. If the sign on the exponent of e is changed to be positive, the transform is an inverse transform. Depending on N, different algorithms are deployed for the best performance.

The CUFFT API is modeled after FFTW, which is one of the most popular and efficient CPU-based FFT libraries. CUFFT provides a simple configuration mechanism called a plan that pre-configures internal building blocks such that the execution time of the transform is as fast as possible for the given configuration and the particular GPU hardware selected. Then, when the execution function is called, the actual transform takes place following the plan of execution. The advantage of this approach is that once the user creates a plan, the library retains whatever state is needed to ex
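Because the inverse transform differs from the forward one only in the sign of the exponent, with no $1/N$ factor in either direction, applying both in sequence returns the input scaled by $N$. A short derivation from the definition above, using $y_m$ as an illustrative name for the round-trip result:

```latex
\begin{aligned}
y_m &= \sum_{k=0}^{N-1} X_k \, e^{+2\pi i \frac{km}{N}}
     = \sum_{k=0}^{N-1} \left( \sum_{n=0}^{N-1} x_n \, e^{-2\pi i \frac{kn}{N}} \right) e^{+2\pi i \frac{km}{N}} \\
    &= \sum_{n=0}^{N-1} x_n \sum_{k=0}^{N-1} e^{2\pi i \frac{k(m-n)}{N}}
     = N \, x_m ,
\end{aligned}
```

since the inner geometric sum equals $N$ when $n = m$ and $0$ otherwise. Dividing the round-trip result by $N$ therefore recovers the original data.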
37. 3.2.1 Function cufftPlan1d
3.2.2 Function cufftPlan2d
3.2.3 Function cufftPlan3d
3.2.4 Function cufftPlanMany
3.3 CUFFT Extensible Plans
3.3.1 Function cufftCreate
3.3.2 Function cufftMakePlan1d
3.3.3 Function cufftMakePlan2d
3.3.4 Function cufftMakePlan3d
3.3.5 Function cufftMakePlanMany
3.4 CUFFT Estimated Size of Work Area
3.4.1 Function cufftEstimate1d
3.4.2 Function cufftEstimate2d
3.4.3 Function cufftEstimate3d
3.4.4
38. is applied to the input data.

    #define NX 64
    #define NY 128
    #define NZ 128
    #define BATCH 10
    #define NRANK 3

    cufftHandle plan;
    cufftComplex *data;
    int n[NRANK] = {NX, NY, NZ};

    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*NY*NZ*BATCH);
    if (cudaGetLastError() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to allocate\n");
        return;
    }

    /* Create a 3D FFT plan. */
    if (cufftPlanMany(&plan, NRANK, n,
                      NULL, 1, NX*NY*NZ,   /* inembed, istride, idist */
                      NULL, 1, NX*NY*NZ,   /* onembed, ostride, odist */
                      CUFFT_C2C, BATCH) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: Plan creation failed");
        return;
    }

    /* Use the CUFFT plan to transform the signal in place. */
    if (cufftExecC2C(plan, data, data, CUFFT_FORWARD) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: ExecC2C Forward failed");
        return;
    }

    if (cudaThreadSynchronize() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to synchronize\n");
        return;
    }

    cufftDestroy(plan);
    cudaFree(data);

4.5 2D Advanced Data Layout Use

In this example a two-dimensional complex-to-complex transform is applied to the input data arranged according to the requirements of the advanced layout.

    #define NX 128
    #define NY 256
    #define BATCH 10
    #define NRANK 2

    /* Advanced interface parameters, arbitrary strides */
    #define ISTRIDE 2
    #define OSTRIDE 1
    d
39. is mode, the developer can define strides between each element as well as between the signals in a batch (see Advanced Data Layout).

2.4.1 FFTW Compatibility Mode

For some transform sizes, FFTW requires additional padding bytes between rows and planes of real-to-complex (R2C) and complex-to-real (C2R) transforms of rank greater than 1. (For details, please refer to the FFTW online documentation.)

One can disable FFTW-compatible layout using cufftSetCompatibilityMode. Setting the input parameter to CUFFT_COMPATIBILITY_NATIVE disables padding and ensures compact data layout for the input/output data for real-to-complex and complex-to-real transforms. Disabling padding using CUFFT native mode can provide significant speed-up, especially in power-of-two sized transforms.

The FFTW compatibility modes are as follows:

- CUFFT_COMPATIBILITY_NATIVE mode disables FFTW compatibility and achieves the highest performance.
- CUFFT_COMPATIBILITY_FFTW_PADDING supports FFTW data padding by inserting extra padding between packed in-place transforms for batched transforms (default).
- CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC guarantees FFTW-compatible output for non-symmetric complex inputs for transforms with power-of-2 size. This is only useful for artificial (that is, random) data sets, as actual data will always be symmetric if it has come from the real plane. Enabling this mode can significantly impact performance.
- CUFFT_COMPATIBILITY_FFTW_ALL enables full
40. itional layer of abstraction above the access to input/output data arrays. An element of coordinates [z][y][x] in signal number b in the batch will be associated with the following addresses in the memory:

- 1D
    input[ b * idist + x * istride ]
    output[ b * odist + x * ostride ]
- 2D
    input[ b * idist + (x * inembed[1] + y) * istride ]
    output[ b * odist + (x * onembed[1] + y) * ostride ]
- 3D
    input[ b * idist + ((x * inembed[1] + y) * inembed[2] + z) * istride ]
    output[ b * odist + ((x * onembed[1] + y) * onembed[2] + z) * ostride ]

The istride and ostride parameters denote the distance between two successive input and output elements in the least significant (that is, the innermost) dimension, respectively. In a 1D transform, if every input element is to be used in the transform, istride should be set to 1; if every other input element is to be used in the transform, then istride should be set to 2. Similarly, in a 1D transform, if it is desired to output final elements one after another compactly, ostride should be set to 1; if spacing is desired between the least significant dimension output data, ostride should be set to the distance between the elements.

The inembed and onembed parameters define the number of elements in each dimension in the input array and the output array, respectively. The inembed[rank-1] contains the number of elements in the least significant (innermost) dimension of the input data excluding the istride elements; the number
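The three address formulas above follow one pattern: fold the coordinates together using the embedding sizes, multiply by the stride, and add the batch offset. The following is a minimal sketch in plain C of that index computation. The function name `layout_offset` and its parameter order are hypothetical, and the coordinates are passed most significant first (x, then y, then z, as in the formulas).

```c
#include <assert.h>
#include <stddef.h>

/* Element offset of coordinate `coord` in signal `b` of a batch, following
   the 1D/2D/3D formulas of the advanced data layout.
   Assumptions for this illustration: coord[0] is the most significant
   coordinate (x in the formulas), nembed has `rank` entries, and nembed[0]
   is never read, exactly as in the formulas. */
static size_t layout_offset(int rank, const int *coord, const int *nembed,
                            int stride, int dist, int b)
{
    assert(rank >= 1 && rank <= 3);
    size_t idx = (size_t)coord[0];
    for (int d = 1; d < rank; ++d) {
        /* Horner-style fold: ((x * nembed[1] + y) * nembed[2] + z) */
        idx = idx * (size_t)nembed[d] + (size_t)coord[d];
    }
    return (size_t)b * (size_t)dist + idx * (size_t)stride;
}
```

The same function serves for input and output addressing: pass (inembed, istride, idist) for one and (onembed, ostride, odist) for the other.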
41. n");
        return;
    }

    cufftDestroy(plan);
    cudaFree(data);

4.3 2D Complex-to-Real Transforms

In this example a two-dimensional complex-to-real transform is applied to the input data arranged according to the requirements of the native compatibility mode.

    #define NX 256
    #define NY 128
    #define NRANK 2
    #define BATCH 1   /* assumption: BATCH is used below but not defined in this listing */

    cufftHandle plan;
    cufftComplex *data;
    int n[NRANK] = {NX, NY};

    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*(NY/2+1));
    if (cudaGetLastError() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to allocate\n");
        return;
    }

    /* Create a 2D FFT plan. */
    if (cufftPlanMany(&plan, NRANK, n,
                      NULL, 1, 0,
                      NULL, 1, 0,
                      CUFFT_C2R, BATCH) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT Error: Unable to create plan\n");
        return;
    }

    if (cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_NATIVE) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT Error: Unable to set compatibility mode to native\n");
        return;
    }

    /* In-place transform: the real output overwrites the complex input. */
    if (cufftExecC2R(plan, data, (cufftReal*)data) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT Error: Unable to execute plan\n");
        return;
    }

    if (cudaThreadSynchronize() != cudaSuccess){
        fprintf(stderr, "Cuda error: Failed to synchronize\n");
        return;
    }

    cufftDestroy(plan);
    cudaFree(data);

4.4 3D Complex-to-Complex Transforms

In this example a three-dimensional complex-to-complex transform
42. n a batch of the input data.
    onembed: Pointer of size rank that indicates the storage dimensions of the output data in memory. If set to NULL, all other advanced data layout parameters are ignored.
    ostride: Indicates the distance between two successive output elements in the output array in the least significant (i.e., innermost) dimension.
    odist: Indicates the distance between the first element of two consecutive signals in a batch of the output data.
    type: The transform data type (e.g., CUFFT_R2C for single-precision real to complex).

Output

Return Values
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: One or more of the parameters is not a supported size.

3.3 CUFFT Extensible Plans

This API separates handle creation from plan generation. This makes it possible to change plan settings, which may alter the outcome of the plan generation phase, before the plan is actually generated. The same cufftExecute calls are used to execute all plans, whether generated with this API or with the original API.

3.3.1 Function cufftCreate

    cufftResult cufftCreate(cufftHandle *plan);

Creates only an opaque handle, and allocates small data structures on the host. The cufftMakePlan*() calls actually do the plan generation. It is recommended that cufftSet*() calls, such as cufftSetCompatibilityMode, that may require a plan to be broken down and re-generated, sh
43. n real to complex).

Output

Return Values
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: One or more of the nx, ny, or nz parameters is not a supported size.

3.4.4 Function cufftEstimateMany

    cufftResult cufftEstimateMany(int rank, int *n,
                                  int *inembed, int istride, int idist,
                                  int *onembed, int ostride, int odist,
                                  cufftType type, int batch, size_t *workSize);

During plan execution, CUFFT requires a work area for temporary storage of intermediate results. This call returns an estimate for the size of the work area required, given the specified parameters, and assuming default plan settings. Note that changing some plan settings, such as compatibility mode, may alter the size required for the work area.

The cufftPlanMany API supports more complicated input and output data layouts via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist. All arrays are assumed to be in CPU memory.

Input
    rank: Dimensionality of the transform (1, 2, or 3).
    n: Array of size rank, describing the size of each dimension.
    inembed: Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL, all other advanced data layout parameters are ignored.
    istride: Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension.
    idist: Indicates the distance between the first element of two consecutive signals in a b
44. nfiguration according to specified signal sizes and data type.

Input
    nx: The transform size in the x dimension (number of rows).
    ny: The transform size in the y dimension (number of columns).
    type: The transform data type (e.g., CUFFT_C2R for single-precision complex to real).

Output
    plan: Contains a CUFFT 2D plan handle value.

Return Values
    CUFFT_SUCCESS: CUFFT successfully created the FFT plan.
    CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
    CUFFT_ALLOC_FAILED: The allocation of GPU resources for the plan failed.
    CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
    CUFFT_INTERNAL_ERROR: An internal driver error was detected.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: Either or both of the nx or ny parameters is not a supported size.

3.3.4 Function cufftMakePlan3d

    cufftResult cufftMakePlan3d(cufftHandle plan, int nx, int ny, int nz, cufftType type);

Following a call to cufftCreate, makes a 3D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftPlan2d except that it takes a third size parameter nz.

Input
    nx: The transform size in the x dimension.
    ny: The transform size in the y dimension.
    nz: The transform size in the z dimension.
    type: The transform data type (e.g., CUFFT_R2C for single-precision real to complex).

Output
    plan: Contains a CUFFT 3D plan handle value.

Return Values
    CUFFT_SUCCESS: CUFFT successfully created the FFT plan.
    CUFFT_INVALID_PLAN: The pl
45. nsforms. This scheme reduces the write/read of padding bytes, hence helping with coalescing of the data.

Starting with version 3.1 of the CUFFT Library, the conjugate symmetry property of real-to-complex output data arrays and complex-to-real input data arrays is exploited when the power-of-two factorization term of the x dimension is at least a multiple of 4. Large 1D sizes (powers of two larger than 65,536), 2D, and 3D transforms benefit the most from the performance optimizations in the implementation of real-to-complex or complex-to-real transforms.

Chapter 3 CUFFT API REFERENCE

This chapter specifies the behavior of the CUFFT library functions by describing their input/output parameters, data types, and error codes. The CUFFT library is initialized upon the first invocation of an API function, and CUFFT shuts down automatically when all user-created FFT plans are destroyed.

3.1 Return value cufftResult

All CUFFT Library return values except for CUFFT_SUCCESS indicate that the current API call failed and the user should reconfigure to correct the problem. The possible return values are defined as follows:

    typedef enum cufftResult_t {
        CUFFT_SUCCESS      = 0,  // The CUFFT operation was successful
        CUFFT_INVALID_PLAN = 1,  // CUFFT was passed an invalid plan handle
        CUFFT_ALLOC_FAILED = 2,  // CUFFT failed to allocate GPU or CPU memory
        CUFFT_INVALID_TYPE = 3
46. omplex Transforms
4.2 1D Real-to-Complex Transforms
4.3 2D Complex-to-Real Transforms
4.4 3D Complex-to-Complex Transforms
4.5 2D Advanced Data Layout Use
Chapter 5 FFTW Conversion Guide
Chapter 6 FFTW Interface to CUFFT

Chapter 1 INTRODUCTION

This document describes CUFFT, the NVIDIA CUDA Fast Fourier Transform (FFT) product. It consists of two separate libraries: CUFFT and CUFFTW. The CUFFT library is designed to provide high performance on NVIDIA GPUs. The CUFFTW library is provided as a porting tool to enable users of FFTW to start using NVIDIA GPUs with a minimum amount of effort.

The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets. It is one of the most important and widely used numerical algorithms in computational physics
47. ould be made after cufftCreate and before one of the cufftMakePlan*() calls.

Input
    plan: Pointer to a cufftHandle object.

Output
    plan: Contains a CUFFT plan handle value.

Return Values
    CUFFT_SUCCESS: CUFFT successfully created the FFT plan.
    CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.
    CUFFT_ALLOC_FAILED: The allocation of resources for the plan failed.
    CUFFT_INVALID_VALUE: One or more invalid parameters were passed to the API.
    CUFFT_INTERNAL_ERROR: An internal driver error was detected.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE: The nx parameter is not a supported size.

3.3.2 Function cufftMakePlan1d

    cufftResult cufftMakePlan1d(cufftHandle plan, int nx, cufftType type, int batch);

Following a call to cufftCreate, makes a 1D FFT plan configuration for a specified signal size and data type. The batch input parameter tells CUFFT how many 1D transforms to configure.

Input
    nx: The transform size (e.g., 256 for a 256-point FFT).
    type: The transform data type (e.g., CUFFT_C2C for single-precision complex to complex).
    batch: Number of transforms of size nx.

Output
    plan: Contains a CUFFT 1D plan handle value.

Return Values

3.3.3 Function cufftMakePlan2d

    cufftResult cufftMakePlan2d(cufftHandle plan, int nx, int ny, cufftType type);

Following a call to cufftCreate, makes a 2D FFT plan co
48. rameter is not a valid handle.
    CUFFT_INVALID_VALUE: At least one of the parameters idata and odata is not valid.
    CUFFT_INTERNAL_ERROR: An internal driver error was detected.
    CUFFT_EXEC_FAILED: CUFFT failed to execute the transform on the GPU.
    CUFFT_SETUP_FAILED: The CUFFT library failed to initialize.

3.10 Function cufftSetStream

    cufftResult cufftSetStream(cufftHandle plan, cudaStream_t stream);

Associates a CUDA stream with a CUFFT plan. All kernel launches made during plan execution are now done through the associated stream, enabling overlap with activity in other streams (e.g., data copying). The association remains until the plan is destroyed or the stream is changed with another call to cufftSetStream.

Input
    plan: The cufftHandle object to associate with the stream.
    stream: A valid CUDA stream created with cudaStreamCreate(); 0 for the default stream.

Status Returned
    CUFFT_SUCCESS: The stream was associated with the plan.
    CUFFT_INVALID_PLAN: The plan parameter is not a valid handle.

3.11 Function cufftGetVersion

    cufftResult cufftGetVersion(int *version);

Returns the version number of CUFFT.

Input
    version: Pointer to the version number (input).

Output
    version: Pointer to the version number (output).

Return Values
    CUFFT_SUCCESS: CUFFT successfully returned the version number.

3.12 Function cufftSetCompatibilityMode

    cufftResult cufftSetCompatibilityMode(cufftHandle plan, cufftCompatibility mode);

Configures the layout of CUFFT output in
49. s are ignored and the same plan is returned regardless.

Real-data DFTs
  Supported:   fftw_plan_dft_r2c_1d, fftw_plan_dft_r2c_2d, fftw_plan_dft_r2c_3d, fftw_plan_dft_r2c, fftw_plan_dft_c2r_1d, fftw_plan_dft_c2r_2d, fftw_plan_dft_c2r_3d, fftw_plan_dft_c2r

Real-data DFT Array Format
  Unsupported: Not supported

Real-to-Real Transform
  Unsupported: Not supported

Real-to-Real Transform Kinds
  Unsupported: Not supported

Advanced Interface

Advanced Complex DFTs
  Supported:   fftw_plan_many_dft with multiple 1D, 2D, 3D transforms
  Unsupported: fftw_plan_many_dft with 4D or higher transforms or a 2D or higher batch of embedded transforms

Advanced Real-data DFTs
  Supported:   fftw_plan_many_dft_r2c, fftw_plan_many_dft_c2r with multiple 1D, 2D, 3D transforms
  Unsupported: fftw_plan_many_dft_r2c, fftw_plan_many_dft_c2r with 4D or higher transforms or a 2D or higher batch of embedded transforms

Advanced Real-to-Real Transforms
  Unsupported: Not supported

Guru Interface

Interleaved and split arrays
  Supported:   Interleaved format
  Unsupported: Split format

Guru vector and transform sizes
  Supported:   fftw_iodim struct

Guru Complex DFTs
  Supported:   fftw_plan_guru_dft, fftw_plan_guru_dft_r2c, fftw_plan_guru_dft_c2r with multiple 1D, 2D, 3D transforms
  Unsupported: fftw_plan_guru_dft, fftw_plan_guru_dft_r2c, fftw_plan_guru_dft_c2r with 4D or higher transforms or a 2D or higher batch of transforms

Guru E Not supported Tran
50. sforms

64-bit Guru Interface
  Unsupported: Not supported

New-array Execute Functions
  Supported:   fftw_execute_dft, fftw_execute_dft_r2c, fftw_execute_dft_c2r with interleaved format
  Unsupported: Split format and real-to-real functions

Wisdom
  Unsupported: fftw_export_wisdom_to_file and fftw_import_wisdom_from_file exist but are not functional. Other wisdom functions do not have entry points in the library.

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life suppor
51. t devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright

2007-2013 NVIDIA Corporation. All rights reserved.
52. te for your system:

  Product           Location and name            Include file
  nvcc compiler     bin/nvcc
  CUFFT library     {lib, lib64}/libcufft.so     inc/cufft.h
  CUFFTW library    {lib, lib64}/libcufftw.so    inc/cufftw.h

The most common case is for developers to modify an existing CUDA routine (for example, filename.cu) to call CUFFT routines. In this case the include file cufft.h should be inserted into the filename.cu file and the library included in the link line. A single compile and link line might appear as

  > /usr/local/cuda/bin/nvcc [options] filename.cu -I/usr/local/cuda/inc -L/usr/local/cuda/lib -lcufft

Of course there will typically be many compile lines, and the compiler g++ may be used for linking so long as the library path is set correctly.

Users of the FFTW interface (see FFTW Interface to CUFFT) should include cufftw.h and link with both the CUFFT and CUFFTW libraries.

For the best performance, input data should reside in device memory. Therefore programs in the CUFFT library assume that the data is in GPU memory. For example, if one of the execution functions is called with data in host memory, the program will return CUFFT_EXEC_FAILED. Programs in the CUFFTW library assume that the input data is in host memory, since this library is a porting tool for users of FFTW. If the data resides in GPU memory, the program will abort.

2.2. Fourier Transform Setup

The first step in using the CUFFT Library is to create a plan using one of the following
53. upports complex and real-data transforms. The cufftType data type is an enumeration of the types of transform data supported by CUFFT.

  typedef enum cufftType_t {
      CUFFT_R2C = 0x2a,  // Real to complex (interleaved)
      CUFFT_C2R = 0x2c,  // Complex (interleaved) to real
      CUFFT_C2C = 0x29,  // Complex to complex (interleaved)
      CUFFT_D2Z = 0x6a,  // Double to double-complex (interleaved)
      CUFFT_Z2D = 0x6c,  // Double-complex (interleaved) to double
      CUFFT_Z2Z = 0x69   // Double-complex to double-complex (interleaved)
  } cufftType;

3.14.2. Parameters for Transform Direction

The CUFFT library defines forward and inverse Fast Fourier Transforms according to the sign of the complex exponential term.

  #define CUFFT_FORWARD -1
  #define CUFFT_INVERSE  1

CUFFT performs un-normalized FFTs; that is, performing a forward FFT on an input data set followed by an inverse FFT on the resulting set yields data that is equal to the input, scaled by the number of elements. Scaling either transform by the reciprocal of the size of the data set is left for the user to perform as seen fit.

3.14.3. Other CUFFT Types

3.14.3.1. cufftHandle

A handle type used to store and access CUFFT plans. The user receives a handle after creating a CUFFT plan and uses this handle to execute the plan.

  typedef unsigned int cufftHandle;

3.14.3.2. cufftReal

A single-precision, floating-point real data type.

  typedef float cufftReal;

3.14.3.3. cufftDoubleReal

A
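The un-normalized convention described under Parameters for Transform Direction can be checked on the host without any GPU. The following is a minimal sketch in plain C, not CUFFT code: it uses a 4-point DFT, whose twiddle factors are exactly 1, i, -1 and -i, so that no math library is needed. The names cplx, dft4, scale, DFT_FORWARD and DFT_INVERSE are hypothetical and only mirror the sign convention of the direction macros (-1 for forward, +1 for inverse).

```c
#include <stddef.h>

/* Hypothetical stand-ins that mirror the sign convention of the
 * CUFFT direction macros; they are not part of the CUFFT API. */
#define DFT_FORWARD (-1)
#define DFT_INVERSE 1

/* Interleaved complex element: x is the real part, y the imaginary part. */
typedef struct { float x, y; } cplx;

/* Multiply v by i^p, for any integer p (negative values included). */
static cplx mul_i_pow(cplx v, int p)
{
    cplx r;
    switch (((p % 4) + 4) % 4) {
    case 0:  r.x =  v.x; r.y =  v.y; break;  /* times  1 */
    case 1:  r.x = -v.y; r.y =  v.x; break;  /* times  i */
    case 2:  r.x = -v.x; r.y = -v.y; break;  /* times -1 */
    default: r.x =  v.y; r.y = -v.x; break;  /* times -i */
    }
    return r;
}

/* Un-normalized 4-point DFT:
 *   out[k] = sum_j in[j] * exp(sign * 2*pi*i*j*k / 4)
 * For N = 4, exp(sign * 2*pi*i*j*k / 4) equals i^(sign*j*k). */
static void dft4(const cplx in[4], cplx out[4], int sign)
{
    for (int k = 0; k < 4; ++k) {
        cplx acc = {0.0f, 0.0f};
        for (int j = 0; j < 4; ++j) {
            cplx t = mul_i_pow(in[j], sign * j * k);
            acc.x += t.x;
            acc.y += t.y;
        }
        out[k] = acc;
    }
}

/* Scaling step that is left to the caller: multiply every element by factor. */
static void scale(cplx *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; ++i) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}
```

Calling dft4 with DFT_FORWARD and then with DFT_INVERSE returns the input multiplied by 4; calling scale(data, 4, 0.25f) afterwards recovers the original values. With a real CUFFT plan the same reciprocal scaling would be applied to the device data, for example in a small kernel.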
54. ut    Contains a CUFFT 3D plan handle value

Return Values
  CUFFT_INTERNAL_ERROR    An internal driver error was detected.
  CUFFT_SETUP_FAILED      The CUFFT library failed to initialize.
  CUFFT_INVALID_SIZE      One or more of the nx, ny, or nz parameters is not a supported size.

3.2.4. Function cufftPlanMany

  cufftResult cufftPlanMany(cufftHandle *plan, int rank, int *n, int *inembed, int istride, int idist, int *onembed, int ostride, int odist, cufftType type, int batch);

Creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch input parameter tells CUFFT how many transforms to configure. With this function, batched plans of 1, 2, or 3 dimensions may be created.

The cufftPlanMany() API supports more complicated input and output data layouts via the advanced data layout parameters: inembed, istride, idist, onembed, ostride, and odist.

All arrays are assumed to be in CPU memory.

Input
  rank      Dimensionality of the transform (1, 2, or 3)
  n         Array of size rank, describing the size of each dimension
  inembed   Pointer of size rank that indicates the storage dimensions of the input data in memory. If set to NULL, all other advanced data layout parameters are ignored.
  istride   Indicates the distance between two successive input elements in the least significant (i.e., innermost) dimension
  idist     Indicates the distance between the first element of two consecutive signals i
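As a rough illustration of the istride and idist definitions for the 1D case, the sketch below computes where element x of signal b sits in a batched buffer, as b * idist + x * istride. It is plain host-side C and does not call CUFFT. It assumes that both distances are counted in elements rather than bytes; layout_index_1d and gather_signal are hypothetical helper names, not library functions.

```c
#include <stddef.h>

/* Position of element x of signal b in a batched 1D buffer.
 * stride: distance between two successive elements of one signal.
 * dist:   distance between the first elements of consecutive signals.
 * Assumption: both are measured in elements, not bytes. */
static size_t layout_index_1d(size_t b, size_t x, size_t stride, size_t dist)
{
    return b * dist + x * stride;
}

/* Copy signal b (n elements) out of a strided batch into a contiguous array. */
static void gather_signal(const float *base, float *dst, size_t n,
                          size_t b, size_t stride, size_t dist)
{
    for (size_t x = 0; x < n; ++x) {
        dst[x] = base[layout_index_1d(b, x, stride, dist)];
    }
}
```

Two common layouts fall out of the same formula. Signals stored back to back use stride 1 and dist equal to the signal length n, while signals interleaved element by element use stride equal to the batch count and dist 1.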
