Home

Floating Point Fast Fourier Transform v2.1

1. Floating Point Fast Fourier Transform v2 1 User manual Introduction The Fast Fourier Transform FFT is an efficient algorithm for computing the Discrete Fourier Transform DFT This Intellectual Property IP core was designed to offer very fast transform times while keeping a floating point accuracy at all computational stages Sundance s core is the fastest and the most efficient available in the FPGA world Based on a radix 32 architecture it also saves memory resources compared to other floating point cores available on the market Features e This IP core targets the following devices gt Xilinx Virtex II Virtex II Pro Spartan 3 and Virtex 4 e Forward and inverse complex FFT e Transform sizes 2 with m 8 to 20 256 512 1024 IM points e Arithmetic type floating point e Data formats gt IEEE 754 gt 24 bit mantissa 8 bit exponent 2 s complement gt 14 bit mantissa 8 bit exponent 2 s complement gt Any mantissa and exponent precision upon request e Configurable on the fly forward or inverse operation e Configurable on the fly transform length e Fully functional VHDL testbench and the related Matlab functions delivered along the FFT IFFT core for simulation purposes and specific performance characterization Fast Fourier Transform product manual May 2005 www sundance com l Floating Point Fast Fourier Transform v2 1 Functional description The Discrete Fourier Tran
2. Reall The data output file 1s coded in float and is organized as follow Linel RealO Line2 ImagQ Line3 Reall Line4 Imagl Line5 Real2 The UseFFTw program can be modified and recompiled by users using Microsoft Visual Cre Fast Fourier Transform product manual May 2005 www sundance com 12 Floating Point Fast Fourier Transform v2 1 Waveforms Start Figure 4 Start Figure 3 shows how the FFT core must be started The start signal is driven high for one clock cycle The first address for the data and twiddles is generated after 7 clock cycles The user then fetches the twiddles and data in the memory and drives the signal tw din valid high A new data and twiddle are then expected every new clock cycle Fast Fourier Transform product manual May 2005 www sundance com 13 Floating Point Fast Fourier Transform v2 1 Data input memory bank swap Mame Value di 1 A0 i AE i H4 AR NBS ASO H2 ASA ABA Figure 5 Memory bank swap When the core requires a new pass to be computed it needs to get the results data from the previous pass as input data A pass transition is indicated by an inversion of the din_bank signal This signal can be used to multiplex the memory banks connected to the core during processing Fast Fourier Transform product manual May 2005 www sundance com 14 Floating Point Fast Fourier Transform v2 1 Continuous processing between two consecutive passes
3. the address generators It is only asserted once for continuous data processing the core will restart automatically every time a transform is complete A Fast Fourier Transform product manual May 2005 www sundance com 3 Floating Point Fast Fourier Transform v2 1 es ee core and discard the transform that was currently computed stop l Input FFT stop signal active high Stop is asserted for one clock cycle to indicate that the current transform being computed is the last one The core will not restart automatically a new transform after a stop pulse is received done Output FFT done signal active high A done pulse indicates that the ma results of the current transform are ready The done pulse is active on the first active cycle of the result_valid signal section of this document for more details Lee pp FFT direction High lt FFT Low IFFT This signal is registered inside the core on a start pulse empty pipeline 1 Input Empty the core pipeline before processing the next FFTAFFT pass If High this signal will force the core to wait for all the data of an FFT IFFT pass to be output before the next pass can be started This is useful in a configuration where the processing memory banks are swapped every new pass If Low the FFT core will start reading the data from the memory before the core has completed the calculations for the previous pass This signal is registered inside the core on a start pulse tw din addr
4. May 2005 www sundance com 4 Floating Point Fast Fourier Transform v2 1 dout 2 Mbw 2 Ebw or 32 for IEEE 754 result valid Table 2 Ports definition Transform length Output Data output memory bank This signal indicates which data memory bank is used as the output bank Data output This bus should be connected to the output data bank currently used for processing The data decomposition is as follows Real mantissa bits Mbw 1 down to 0 Imag mantissa bits 2 Mbw 1 down to Mbw Real exponent bits 2 Mbw Ebw 1 down to 2 Mbw Imag exponent bits 2 Mbw 2 Ebw 1 down to 2 Mbw Ebw Data out valid strobe This signal indicates that the data on the dout bus are valid and can be written to a memory bank for further processing Result valid strobe This signal indicates that the data on the dout bus are the final results of the transform and must be written to the results memory bank The FFT transform length is a parameter fed to the core This parameter can be either constant or can be changed on the fly in order to perform an FFT or Inverse FFT with a different transform length The FFT length parameter as well as the FFT direction FFT nIFFT is registered when a start pulse is sent to the core In the case the FFT transform length is a constant parameter passed to the core it is recommended to match the address bit width addr width with the length N of the transform addr width log2 N This will yield the best s
5. Point Fast Fourier Transform v2 1 Results Hame Y alye 1 84 34 1 456 1 5435 1 S440 1 442 1 Gbdd G44 Figure 8 Results When the last pass of the algorithm is processed the data coming out of the core are the results of the transform These results are in a non sequential order and must be written in memory at the addresses given on the dout addr bus The transform results are stored in memory in a bit reversed order Fast Fourier Transform product manual May 2005 www sundance com 17 Floating Point Fast Fourier Transform v2 1 Done Mame Y alye 25 50 1 15 52 1 125 54 1 125 36 1 125 36 1 12540 i 12542 12 Figure 9 Done After the last result data has been output from the core the done signal is high for one clock cycle indicating the completion of the transform A new transform is then processed through the core Fast Fourier Transform product manual May 2005 www sundance com 18
6. agram below shows a configuration that uses as few memory banks as possible Please note that a system using dual port memory or QDR SRAM will only require one data bank Input and processing bank Data Bank A Floating pol nt Twiddle factors FFT core Processing bank Figure 1 Minimum memory usage architecture Fast Fourier Transform product manual May 2005 www sundance com 6 Floating Point Fast Fourier Transform v2 1 The output data bank is either A or B The number of passes through the core will help to determine which one is the output data bank Table 3 shows the number of passes in function of the transform length If the number is odd for a given transform length the FFT results will be in data bank B If even the results will be stored in data bank A Streaming IO architecture A streaming IO architecture is presented below for continuous data processing Please note that a system using Dual Port Memory or QDR SRAM will only require two data banks Input bank Data Bank A Twiddle factors Floating Data Bank B point FFT core Processing banks Output bank Data Data Bank C Bank D Figure 2 Streaming IO memory architecture Streaming IO processing with concurrent data input and data output requires 5 memory banks to be connected to the FFT core In this type of architectures the maximum continuous throughput depends on the number of passes through the FFT engine and the cl
7. din_bank eee na ceemenee eee eeeeemeneneeeseeeemeneneeesseeeseneneaeaseneoeneneaessenecenmneaedheneceseneamenssscesseeamenenscesMnseeseeenmeneneeeseeeemenneeee eee eemennneae nee eeseaneeaean sees eanneGees eee eSmeGOMee HOSES GHOOOEnE HOSE AS CROSS OHOO EAS SGOESO BOOS SOOESSESOROOE SESE SHES REOE SOS ESSHEOREOOES CESS ERO SEOOES ESOS SSOROOOEOE ESOS SOREOOEEOEOOE EO SUOESOEOOESOOSROEEOMOOE SOSH SOEOOMEOESOCHSSEEOMEGESOeHSSenMmenaEeeeaeE E m tw E5E99B frresssnnnnnunnnnunununnnnnnnnnnnnsnunnnnnnnnsnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn annnm rier reer Tete reee etre renee treet eer ree etree eter rere ree etree eee errr ree terete eee reer ee ener terete rere ree eter ree teeter eee errr eee eee eee eee ener eet eee eee enero eee eee reer amp din 1 00 C 4 n Seine E EAE AEAEE A E sees EAEAP AE EEE IE N AEE min E AAE EE A E E meine man s S I O E mn BETE TE E PAE EEE EE E E E wmaie we E RE E AEE OE nape EE PE AE EEE PE A EE AE ET EE EE E E AE E ERE E E P see we EE EAE NERE AEE OEE E PAE EE E TEE EEE A P E ER TAE AE EE PAE E EE RE E EE E P eases EE ERE EE cee een eens ce eee ceeememeneeeeeeeemeeee eee ence eeeaeee eee ee eeneeeee ences eeemeeeae senna seeemameneneeeemamenenacens Mee eeeeeeeeneaeeeeeeeeeeaeeeeeeeeeeeeeeeaeaeeeeeeeeeaeeeeeeeeeeeeeeeeaaeeeeeeeeeGeeeeeceeeeemeeeeeeee eee eeenaaeeeeeeeeeneeeeeeeeeeeeeeeeeaeeeeeeeeenSaeeeeeeeeeeGeeeenaeee eee eenaaeeeeeeeeeeeeeeeeeeeeeeeeeeaeeeeeeeeeeeeeeeeeeeeeeeeeeeaeeeeeeeeeneeaeeeeeeeeeeeeeea
8. eeeeeaeeeeemeeeneeeee dout_addr valid E dout_addr AEE nent aostaccneersn starr ananscai AA A AAAA E A N A E A EAA EA A E E A AA Sih A A A T TA EAA EA AA AA E AA AE EA A A A A A A AA pecan easton tenenenes A A E tin seactwasterer sic tur A T E A E A A A sree see ferrsssnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn E AE AE et EEE A wea ae ete AEA DEPA AEEA AE wea arson fa AEA AAAA AAEE EPE AENA AAE AEAN E EAE ADEE AIEE EAE E AE EAE E AENA EAE AE AAEE EIE AA AEE ESE EPE A AAAS AEAEE EA E ATEI EPEA OAE A EA EEA E EENE E E a E E AE A E EE A AEEA N EEE EEEE E E EEEE EE E EE AAEE EEE EE Figure 6 empty_pipeline low When a pass transition occurs the din bank and dout bank signals are inverted However due to the core latency the dout bank signal is inverted after the din _ bank signal when all the data for the previous pass have been processed through the core Forcing the empty pipeline signal low when starting the core will enable to continuously process data through the core without pausing between two consecutive passes As a result the core will need to access the same memory bank for read and write operations simultaneously Therefore if this mode is used the processing memory banks connected to the core must be dual port Fast Fourier Transform product manual May 2005 www sundance com 15 Floating Point Fast Fourier Transform v2 1 Halted processing betwee
9. he core see Data format The data to be input to the FFT core and the twiddle factors are saved in a text format respectively in the fftcore data _in txt and fftcore_twiddle txt files Analyse FFT_results Matlab m This Matlab program reads the output result file fftcore results txt from the FFT core calculates the expected results with the fft Matlab function and returns the Signal to Noise Ratio The data used for the transform calculation by the Matlab fft function come from the FFT_test m program Analyse FFT_results FFTw m This Matlab program reads the output result file from the FFT core reads the FFT results from the FFTw results file and returns the Signal to Noise Ratio UseFFTw This directory contains the source files and executables of the UseFFTw program that reads the data input for the FFT core fftcore data _in txt and calculates the FFT results using the FFT w functions http www fftw org Three parameters are expected when executing the program FFT length 256 512 or 1048576 Data input file name fftcore data in Data output file name fftw_results Fast Fourier Transform product manual May 2005 www sundance com 11 Floating Point Fast Fourier Transform v2 1 The data input file is coded in integer format for the mantissa exponent and is organized as follow Linel Mantissa RealO Line2 Mantissa Imag0 Line3 Exponent RealO Line4 Exponent Imag0O Line5 Mantissa
10. he core to be connected to much less memory for the same processing performances than designs with radix 2 butterflies implemented in parallel The following table shows how much memory is required to perform an FFT in both configurations Letiongth in Mbytes Ginbytes FFT length in Mbytes in Mbytes 256 Table 5 Radix 32 vs Radix 2 memory usage Data throughput maximum data throughput as shown in Table 7 Using a radix 32 architecture substantially reduces the number of memory resources required The main benefit is seen at the system level A single width PMC module used to perform long transforms with Sundance s FFT core achieves the same level of processing performances as a radix 2 implementation spread over two 6U CompactPCI boards bundled with multiple FPGAs and memory devices Fast Fourier Transform product manual May 2005 www sundance com 8 Floating Point Fast Fourier Transform v2 1 Resources usage and performances The following table summarizes the resources usage and performances of a 24 bit mantissa 8 bit exponent floating point FFT IFFT core Device Slices Multipliers 18x18 POOR AMS Fmax yon yy 12394 40 36 200 2 MHz Virtex I Pro 12293 40 36 175 MHz Spartans 12835 40 36 105 3 MHz Table 6 Core resources usage The FFT IFFT processing time with an FPGA internal clock running at 200MHz is shown in the table below FFT IFFT transform size Processing time Sustained throughput in n o
11. his core when used in combination with Sundance s float converter is compliant to the IEEE standard 754 for Binary floating point arithmetic Other data formats available for this core are coded in 2 s complement for both the mantissa and exponent The 8 bit exponent ranges from 128 to 127 The 24 bit mantissa ranges from 8388608 to 8388607 a Top enea eng that require a different bit width p values will range from 2 to Deeb The exponent bit width is noted Ebw The mantissa bit width is noted Mbw Parameters and Ports definitions Parameter name Description addr width integer gt 8and Address width This parameter also noted Abw lt 20 indicates the width of the address bus for twiddle factors and data If N 1s the maximum transform length used for computing the FFT then Abw log2 N Please note that the transform length can be changed on the fly by assigning a new FFT length when restarting the core However this new transform length cannot be larger than 2 Assigning the smallest address width as possible is recommended for achieving higher clock frequencies during synthesis Table 1 Parameters definition cke 1 Input Clock enable active high When low the clock inside the core is disabled If forced low the cke signal must be remain low for at least 4 clock cycles to ensure proper operation of the core FFT start signal active high Start is asserted for one clock cycle to start the core and
12. n two consecutive passes Hame Value I 45 35 I 45 40 I 45 45 I 45 50 I 43 55 I 45 60 I 45 65 I h oo m PEELE ELLE PEELE Eee eee eee ee ee PEPE Eee ELLE CELL Prereree eee reer reer err rr rrr rrr rrrr rr rr errertrer errr rer rere rrr errr rrr errr iret etree rrr rr ere treet reer Peree rere errr rrr reer reer ere eee rrr errr rer er rere rr errr rr err rr reer errr reer errr ere reer err reer treet ri reer errr reer reer errr ere errr errr iret er rrr r rere eee eee ee eee PEPE eee ELLE LLL CEE PEELE LEE ELL EEL Prerereeeer rere rere reer err rrrr errr errr rere reer er errr errr reer reer eri reer rte terre err rrr err rere rrr rir erer reer eee rrr err re eee rete reer rrr errr errr errr rr err rr rere eer reer rrr ere rrr er reer reer ri reer errr errr rere re rier reer rer eri rete reer rrr errr eer eee PEPE Eee EEL LCL ELLE LLL CEE Eee eee ee eee a PEELE LLL ELE ELLE LL CELE CU ER EER ER EE Figure 7 empty_pipeline high When the empty pipeline signal is driven high the core will pause the processing between two consecutive passes in order to empty the data pipeline As shown on the waveform above a new pass is started only when all the data from the previous pass have been processed through the core and written to memory This mode should be used when the data processing memory banks are single port Fast Fourier Transform product manual May 2005 www sundance com 16 Floating
13. ock rate is it running at The table below shows how the memory banks are used when performing several transforms in a row Bank Pass 1 Pass 2 Pass 3 Pass 1 Pass 2 Pass 3 Pass 1 Pass 2 Pass 3 FFT 1 FFT 1 FFT 1 FFT 2 FFT 2 FFT 2 FFT 3 FFT 3 FFT 3 Data A A Write input data for FFT 2 FFT FFT read FFT write FFT read FFT write FFT read FFT write Read output results of FFT 2 Read output results of FFT 1 Write input data for FFT 4 Read output results of FFT 0 Write input data for FFT 3 FFT read Twidales Table 4 Memory banks for a streaming IO architecture Fast Fourier Transform product manual May 2005 www sundance com 7 Floating Point Fast Fourier Transform v2 1 Memory latency The FFT core generates the addresses for twiddles factors data input and data output The memory latency is calculated as the number of clock cycles it takes between the address 1s valid on the core address bus and the twiddle factors or data are available at the input of the FFT core This latency can be up to 15 clock cycles The FFT core expects the latency to be the same for the twiddle factors and the data input and to remain the same during the transform computation This latency is automatically calculated inside the FFT core by monitoring the tw_din valid signal driven high by the user few clock cycles after tw_din_ addr valid goes high Radix 32 vs Radix 2 Sundance s radix 32 butterfly architecture allows t
14. s oe Table 7 Core performances Fast Fourier Transform product manual May 2005 www sundance com 9 Floating Point Fast Fourier Transform v2 1 The following graph displays the Signal to Noise Ratio of a Fast Fourier Transform performed over a 1024 points random vector with a 24 bit wide mantissa and 8 bit wide exponent The software Discrete Fourier Transform was calculated using the FFTw function with a float accuracy http www fftw org SNR 20log Signal Naise 100 200 300 400 500 600 700 800 900 1000 Frequency components Figure 3 FFT SNR Fast Fourier Transform product manual May 2005 www sundance com 10 Floating Point Fast Fourier Transform v2 1 Testbench and Matlab programs The FFT core package comprises a VHDL testbench three Matlab programs and a C program implementing the FFTw functions fftcore TB vhd This testbench is designed to work with the FFT core It reads the twiddle factors from a file fftcore_twiddle txt and stores them in the twiddle factors memory bank connected to the core The input data are also read from a file fftcore data_in txt and stored in a memory bank that will be accessed by the core once started Upon the transform completion the results available in one of the processing memory banks are written to a file fftcore results txt FFT _test m This Matlab program generates data and twiddle factors in the floating point format expected by t
15. sform DFT of length N N 2 calculates the sampled Fourier transform of a discrete time sequence with N points evenly distributed The forward DFT with N points of a sequence x n can be written as follows N I aan XW do ME withk 0 1 N 1 Equation 1 DFT The inverse DFT is given by the following equation 1 av Dae x n dX Me withn 0 1 N 1 Equation 2 Inverse DFT Algorithm The FFT core uses a decomposition of radix 2 butterflies for computing the DFT With 5 different stages the processing of the transform requires log32 N stages To maintain an optimal signal to noise ratio throughout the transform calculation the FFT core uses a floating point architecture with 8 bit exponent for the real and imaginary part of each complex sample This FFT core employs the decimation in frequency DIF method This FFT core is designed for FFT computation larger or equal to 1k points and up to 1M points Since FPGAs memory resources are limited and relatively small the memory banks used for the processing of the transform are not integrated in the core External memory such as QDR SRAM ZBT RAM DDR SDRAM or SDRAM is most suited for transforms larger than 16384 points For shorter transforms memory banks can likely be implemented inside the FPGA depending on which device is used Fast Fourier Transform product manual May 2005 www sundance com 2 Floating Point Fast Fourier Transform v2 1 Data format T
16. valid l Output Address valid strobe This signal indicates that the current tna O ON tee one adr and din adar mevali o o Twiddle factors address bus This busgives the address in the memory where the twiddle factors must be read from Data input address bus This bus gives the address in the memory where the input data must be read from Data input memory bank This ee a nl indicates which data memory bank is used as the ee a nl bank tw 2 Mbw 2 Ebw Input Twiddle factors input This bus should be connected to the or 32 for memory containing the twiddle factors The data IEEE 754 decomposition is as follows Real mantissa bits Mbw 1 down to 0 Imag mantissa bits 2 Mbw 1 down to Mbw Real exponent bits 2 Mbw Ebw 1 down to 2 Mbw Imag exponent bits 2 Mbw 2 Ebw 1 down to 2 Mbw Ebw 2 Mbw 2 Ebw Input Data input This bus should be connected to the input data or 32 for bank currently used for processing The data decomposition is IEEE 754 as follows Real mantissa bits Mbw 1 down to 0 Imag mantissa bits 2 Mbw 1 down to Mbw Real exponent bits 2 Mbw Ebw 1 down to 2 Mbw Imag exponent bits 2 Mbw 2 Ebw 1 down to 2 Mbw Ebw tw din valid Input Twiddle factors data input valid This signal should be asserted high when the data input din and twiddle factors tw are valid current address on the dout addr bus 1 1S valid the memory where the output data dout must be written to Fast Fourier Transform product manual
17. ynthesis results and guarantee an optimal clock frequency for this implementation In any other idth case jaddr_wid must be bigger or equal to the longest transform length N The following table shows the FFTlength code for a given transform length Transform length FFTlength Number of passes code through the core Table 3 FFTlength codes Fast Fourier Transform product manual May 2005 www sundance com 5 Floating Point Fast Fourier Transform v2 1 Twiddle factors The twiddle factors used during the transform computation must be stored in a memory accessible by the FFT core The twiddle factors for a forward FFT of length N are given by the following equation j2ak Tw k withk 0 1 N 1 Equation 3 Twiddle factors DFT The inverse FFT twiddle factors can be calculated as follows Tw k or with k 0 1 N 1 Equation 4 Twiddle factors IDFT The FFT core package comprises a Matlab program FFT test m and subroutines that generate the twiddles factors and write them to a file fftcore twiddle in the floating point format required Memory The memory banks are external to the FFT core Two banks are dedicated to data processing The signals din bank and dout bank indicate which bank is used for input and which bank is used for output Every new pass the banks are swapped as the FFT core needs to access the data calculated from the previous pass Minimal memory usage architecture The block di

Floating Point Fast Fourier Transform v2.1

Contents

Download Pdf Manuals

Related Search

Related Contents