"TMS320C54x DSP Library User's Guide"

1. 3.5 Where to Find Sample Code
3.6 How DSPLIB is Tested: Allowable Error
3.7 How DSPLIB Deals With Overflow and Scaling Issues
3.8 Where DSPLIB Goes from Here

3.1 DSPLIB Data Types

DSPLIB handles the following fractional data types:
- Q.15 (DATA): A Q.15 operand is represented by a short data type (16 bit) that is predefined as type DATA in the dsplib.h header file.
- Q.31 (LDATA): A Q.31 operand is represented by a long data type (32 bit) that is predefined as type LDATA in the dsplib.h header file.
- Q.3.12: Contains 3 integer bits and 12 fractional bits.

Unless specifically noted, DSPLIB operates on Q15 fractional data type elements. Appendix A presents an overview of fractional Q formats.

3.2 DSPLIB Arguments

TI DSPLIB functions typically operate over vector operands for greater efficiency. Even though these routines can be used to process short arrays or even scalars (unless a minimum size requirement is noted), they will be slower in those cases.
- Vector stride is always equal to 1: vector operands are composed of vector elements held in consecutive memory locations.
- Complex elements are assumed to be stored in Re-Im format.
- In-place computation is allowed unless specifically noted. Source operand can b
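As an illustration of the Q.15 format described in excerpt 1, the following is a minimal sketch in portable C, not DSPLIB code. The type name `q15_t` and the helpers `q15_from_double`, `q15_to_double`, and `q15_mul` are hypothetical names introduced here; the sketch assumes a 16-bit two's-complement value whose real value is the integer divided by 32768.

```c
#include <stdint.h>

typedef int16_t q15_t;  /* hypothetical stand-in for a 16-bit Q15 operand */

/* Convert a real value to Q15 with rounding; values outside
 * [-1, 1) are clamped to the representable range. */
q15_t q15_from_double(double v)
{
    double scaled = v * 32768.0;
    if (scaled >= 32767.0)  return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (q15_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a Q15 value back to its real value. */
double q15_to_double(q15_t q)
{
    return (double)q / 32768.0;
}

/* Q15 x Q15 -> Q15. The 32-bit product is in Q30, so it is shifted
 * right by 15. This assumes an arithmetic right shift for negative
 * values, which is what mainstream compilers provide. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    if (p > INT16_MAX) p = INT16_MAX;  /* only (-1) * (-1) reaches this */
    return (q15_t)p;
}
```

For example, 0.5 maps to 16384, and multiplying 0.5 by 0.5 in this representation gives 8192, that is, 0.25.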
2. Pointer to input vector of size nx
Pointer to input matrix of size row1*col1
row1: number of rows in matrix 1
col1: number of columns in matrix 1
Pointer to input matrix of size row2*col2
row2: number of rows in matrix 2
col2: number of columns in matrix 2
r[row1*col2]: Pointer to output matrix of size row1*col2
nx: Length of input data vector

Description: Returns the minimum element of a vector x.

Multiply input matrix A (M by N) by input matrix B (N by P) using nested loops:
for i = 1 to M
  for k = 1 to P
    temp = 0
    for j = 1 to N
      temp = temp + A(i,j) * B(j,k)
    C(i,k) = temp

Overflow Handling Methodology: Not applicable
Special Requirements: Verify that the dimensions of input matrices are legal.
Implementation Notes: none
Example: See examples/minval subdirectory
Benchmarks: Cycles: Core: row1 7 11 6 col1 col2; Overhead: 71. Code size (in 16-bit words): 65

mtrans: Transpose

Function: short oflag = mtrans (DATA x, DATA r, ushort nx); (defined in mtrans.asm)

Arguments:
x[row*col]: Pointer to input matrix. In-place processing is not allowed.
row: Number of rows in matrix
col: Number of columns in matrix
r[row*col]: Pointer to output data vector of size nx containing

Description: This function transposes matrix x.

Algorithm:
for i = 1 to M
  for j = 1 to N
    C(j,i) = A(i,j)

Overflow Handling Methodology: Scaling implemented for overflow prevention (user selectable)
Special Req
3. The 40-bit C54x accumulators provide eight guard bits to allow up to 256 consecutive MAC operations before an accumulator overrun, a very useful feature when implementing, for example, FIR filters. There are four specific ways DSPLIB deals with overflow, as reflected in each function description:
- Scaling implemented for overflow prevention: In this type of function, DSPLIB scales the intermediate results to prevent overflow. Overflow should not occur as a result. Precision is affected, but not significantly. This is the case of the FFT functions, in which scaling is used after each FFT stage.
- No scaling implemented for overflow prevention: In this type of function, DSPLIB does not scale to prevent overflow, due to the potentially strong effect on data output precision or on the number of cycles required. This is the case, for example, of the MAC-based operations like filtering, correlation, or convolution. The best solution in those cases is to design your system (for example, your filter coefficients) with a gain less than 1 to prevent overflow. In this case, overflow could happen unless you input-scale or you design for no overflow.
- Saturation implemented for overflow handling: In this type of function, DSPLIB has enabled the C54x 32-bit saturation mode (OVM bit = 1). This is the case of certain basic math functions that require the saturation mode to be enabled to work.
- Not applicable: In this type of function, due to the natu
4. low (beginning of biquad 1)
d1(n-2) high
d1(n-1) low
d1(n-1) high
d1(n) low
d1(n) high
d2(n-2) low (beginning of biquad 2)
d2(n-2) high
d2(n-1) low
d2(n-1) high
d2(n) low
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(3*nbiq).

nbiq: Number of biquads
nx: Number of elements of input and output vectors
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes a cascaded IIR filter of nbiq biquad sections using 32-bit coefficients and 32-bit delay buffers. The input data is assumed to be single precision (16 bits).

Each biquad section is implemented using Direct Form II. All biquad coefficients (5 per biquad) are stored in vector h. The real data input is stored in vector a. The filter output result is stored in vector r.

This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks. This function is more efficient for block-by-block filter implementation
5. r[nr]: Pointer to real output vector containing the first nr elements of the positive side of the autocorrelation function of vector x. r must be different from x (in-place computation is not allowed).
nx: Number of real elements in vector x
nr: Number of real elements in vector r
type: Autocorrelation type selector. Types supported:
- If type = raw, r will contain the raw autocorrelation of x.
- If type = bias, r will contain the biased autocorrelation of x.
- If type = unbias, r will contain the unbiased autocorrelation of x.
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes the first nr points of the positive side of the autocorrelation of the real vector x and stores the results in real output vector r. Notice that the full-length autocorrelation of vector x will have 2*nx-1 points with even symmetry around the lag-0 point, r[0]. This routine provides only the positive half of this for memory and computational savings.

Algorithm:
Raw autocorrelation: $r[j] = \sum_{k=0}^{nx-j-1} x[j+k]\,x[k], \quad 0 \le j < nr$
Biased autocorrelation: $r[j] = \frac{1}{nx} \sum_{k=0}^{nx-j-1} x[j+k]\,x[k], \quad 0 \le j < nr$
Unbiased autocorrelation: $r[j] = \frac{1}{nx-|j|} \sum_{k=0}^{nx-j-1} x[j+k]\,x[k], \quad 0 \le j < nr$

Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: none
Implementation Notes:
- Special debugging consideration: This function is im
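The three autocorrelation variants in excerpt 5 can be sketched as a floating-point reference in portable C. This is not the DSPLIB routine: the function name `acorr_ref` and the `enum acorr_type` selector are hypothetical, and the sketch works on `double` values rather than Q15 data, so it does not model overflow or the oflag return value.

```c
#include <stddef.h>

/* Hypothetical selector mirroring the raw / bias / unbias types. */
enum acorr_type { ACORR_RAW, ACORR_BIAS, ACORR_UNBIAS };

/* Reference autocorrelation: fills r[j] for lags 0 <= j < nr.
 * Lags j >= nx have no overlapping samples and are left untouched. */
void acorr_ref(const double *x, size_t nx, double *r, size_t nr,
               enum acorr_type type)
{
    for (size_t j = 0; j < nr && j < nx; j++) {
        double sum = 0.0;
        for (size_t k = 0; k + j < nx; k++)   /* k = 0 .. nx-j-1 */
            sum += x[j + k] * x[k];
        if (type == ACORR_BIAS)
            sum /= (double)nx;                /* divide by nx */
        else if (type == ACORR_UNBIAS)
            sum /= (double)(nx - j);          /* divide by nx - |j| */
        r[j] = sum;
    }
}
```

With x = {1, 2, 3}, the raw result is {14, 8, 3} and the unbiased result divides each lag by the number of overlapping samples, giving 4 at lag 1 and 3 at lag 2.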
6. neg: Vector Negate

Function: short oflag = neg (DATA x, DATA r, ushort nx); (defined in neg.asm)

Arguments:
x[nx]: Pointer to input data vector 1 of size nx. In-place processing allowed (r can be x).
r[nx]: Pointer to output data vector of size nx. In-place processing allowed. Special cases:
- If x[i] = -1 = -32768, then r[i] = 1 = 32767 with oflag = 1.
- If x[i] = 1 = 32767, then r[i] = -1 = -32768 with oflag = 1.
nx: Number of elements of input and output vectors. nx >= 4.
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
This should be taken as a warning: overflow in negation of a Q15 number can happen naturally when negating -1.

Description: This function negates each of the elements of a vector (fractional values).

Algorithm: for (i = 0; i < nx; i++) r(i) = -x(i)

Overflow Handling Methodology: Saturation implemented for overflow handling
Special Requirements: none
Implementation Notes: none
Example: See examples/neg subdirectory
Benchmarks: Cycles: Core: 85 bsize nh 18 bsize nb nx; Overhead: 20. Code size (in 16-bit words): 21

neg32: Vector Negate (double precision)

Function: short oflag = neg32 (LDATA x, LDATA r, ushort nx); (defined in neg32.asm)

Arguments:
x[nx]: Pointer to input data vector 1 of size nx. In-place processing allowed (r can be x).
r[nx]: Pointer to output data vector of
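The saturating behavior described for negation in excerpt 6 can be illustrated with a short sketch in portable C. This is an assumption-laden model, not the library's assembly: `q15_neg_sat` and `neg_ref` are hypothetical names, and the sketch models only the first special case (negating -1, stored as -32768), where two's-complement negation cannot be represented in 16 bits.

```c
#include <stdint.h>

/* Negate one Q15 value with saturation. Negating -32768 (Q15 -1)
 * is not representable, so the result saturates to 32767 and the
 * overflow flag is raised. */
int16_t q15_neg_sat(int16_t x, short *oflag)
{
    if (x == INT16_MIN) {
        *oflag = 1;
        return INT16_MAX;
    }
    return (int16_t)(-x);
}

/* Vector form: r[i] = -x[i]; returns 1 if any element saturated.
 * In-place use (r == x) is safe because each element is read
 * before it is written. */
short neg_ref(const int16_t *x, int16_t *r, unsigned short nx)
{
    short oflag = 0;
    for (unsigned short i = 0; i < nx; i++)
        r[i] = q15_neg_sat(x[i], &oflag);
    return oflag;
}
```

The flag here is a warning rather than an error, in line with the note above that overflow arises naturally when negating -1.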
7. Example: See examples/cfir subdirectory
Benchmarks: Cycles: Core: nx 13 8 nh; Overhead: 49. Code size (in 16-bit words): 66

cfft: Forward Complex FFT

Function: void cfft (DATA x, nx, short scale); (defined in cfft#.asm, where # = nx)

Arguments:
x[2*nx]: Pointer to input vector containing nx complex elements (2*nx real elements) in bit-reversed order. On output, vector x contains the nx complex elements of the FFT(x). Complex numbers are stored in Re-Im format.
nx: Number of complex elements in vector x. nx must be a constant number (not a variable) and can take the following values: nx = 8, 16, 32, 64, 128, 256, 512, 1024.
scale: Flag to indicate whether or not scaling should be implemented during computation:
if (scale == 0) scale factor = 1; else scale factor = nx; end

Description: Computes a radix-2 complex DIT FFT of the nx complex elements stored in vector x in bit-reversed order. The original content of vector x is destroyed in the process. The nx complex elements of the result are stored in vector x in normal order.

Algorithm (DFT): $y[k] = \frac{1}{\text{scale factor}} \sum_{i=0}^{nx-1} x[i] \left( \cos\frac{-2\pi i k}{nx} + j \sin\frac{-2\pi i k}{nx} \right)$

Overflow Handling Methodology: Scaling implemented for overflow prevention
Special Requirements:
- Special linker command file sections required: sintab (containing the twiddle table). For sintab section size, refer to the benchmark information below. T
8. DATA dbuffer, ushort nh2, ushort nx)

Arguments:
x[nx]: Pointer to real input vector of nx real elements.
r[nx]: Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
dbuffer[2*nh2]: Delay buffer of size nh = 2*nh2.
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nh2).
nx: Number of real elements in vector x (input samples).
nh2: Half the number of coefficients of the filter (due to symmetry, there is no need to provide the other half).
oflag: Overflow error flag: 1 if a 32-bit data overflow has occurred in an intermediate or final result; 0 if no 32-bit data overflow has occurred in an intermediate or final result.

Description: Computes a real FIR filter (direct form) using the nh2 coefficients stored in the program location pointed to by the TI_LIB_COEFFS global label. The filter is assumed to have a symmetric impulse response, with the first half of the filter coefficients stored in locations pointed to by TI_LIB_COEFFS. The real data input is stored in vector x. The filter output result is stored in vector r. This functi
9. - macrosi.asm: contains all macros used for this code
- sintab.q15: contains twiddle table section sintab
- unpacki.asm: contains code for unpacking results

Memory alignment: Although there is no memory alignment requirement for this function, you need to align input data if you use this function with function cbrev (see page 4-13).

Implemented as a complex IFFT of size nx/2 followed by an unpack stage to unpack the real IFFT results. Therefore, the Implementation Notes for the cfft function apply to this case.

Notice that normally an IFFT of a real sequence of size N produces a complex sequence of size N (or 2*N real numbers) that will not fit in the input sequence. To accommodate all the results without requiring extra memory locations, the output reflects only half of the spectrum (complex output). This still provides the full information because an IFFT of a real sequence has even symmetry around the center or Nyquist point (N/2).

Special debugging consideration: This function is implemented as a macro that invokes different IFFT routines according to the size. As a consequence, instead of the rifft symbol being defined, multiple rifft# symbols are (where # = nx = IFFT real size).

When scale = 1, this routine prevents overflow by scaling by 2 at each IFFT intermediate stage and at the unpacking stage.

Example: See examples/rifft subdirectory
Benchmarks: 8 cycles (butterfly core only). Code Size (wo
10. Source code is provided to allow you to modify the functions to match your specific needs.

The routines included within the library are organized into eight different functional categories:
- FFT
- Filtering and convolution
- Adaptive filtering
- Correlation
- Math
- Trigonometric
- Miscellaneous
- Matrix

1.2 Features and Benefits
- Hand-coded assembly optimized routines
- C-callable routines fully compatible with the TI C54x compiler
- Support also provided for C54x devices with extended program memory addressing (Far mode)
- Fractional Q15-format operand supported
- Complete set of examples on use provided
- Benchmarks (time and code) provided
- Tested against Matlab scripts

1.2.1 DSPLIB: Quality Freeware That You Can Build on and Contribute to

DSPLIB is a free-of-charge product. You can use, modify, and distribute TI C54x DSPLIB for use on TI C54x DSPs with no royalty payments. Refer to Appendix C, Texas Instruments License Agreement for DSP Code, and to section 3.8, Where DSPLIB Goes from Here, for details.

Chapter 2: Installing DSPLIB
Topics:
2.1 DSPLIB Content
2.2 How to Install DSPLIB
2.3 How to Rebuild DSPLIB

2.1 DSPLIB Content

The TI DSPLIB software consists of four parts:
1) A header file for C programmers: dsplib.h
2) Two object libraries for the two different memory m
11. n 1 i n N 1 1 e n keln e n 1 i N N 1 1 yn edn e n

Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: none
Implementation Notes: none
Example: See examples/iirlat subdirectory
Benchmarks: Cycles: Core: nx 3 nh 14; Overhead: 48. Code size (in 16-bit words): 49

firlat: Lattice Forward FIR Filter

Function: short oflag = firlat (DATA x, DATA h, DATA r, DATA d, int nx, int nh)

Arguments:
x[nx]: Pointer to real input vector of nx real elements.
h[nh]: Pointer to lattice coefficient vector of size nh in normal order: h = b0 b1 b2 b3 ... Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx]: Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
d[nh]: Delay buffer.
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
nx: Number of real elements in vector x (input samples)
nh: Number of coefficients
oflag: Overflow error f
12. Sample Code
3.6 How DSPLIB is Tested: Allowable Error
3.7 How DSPLIB Deals With Overflow and Scaling Issues
3.8 Where DSPLIB Goes from Here
4 Function Descriptions
4.1 Arguments and Conventions Used
4.2 DSPLIB Functions
13. $X_{norm} = 1 + D_{xy}$, or $D_{xy} = Y_m(\text{old}) \cdot X_{norm} - 1$ (c3)
where $D_{xy}$ = error in calculation.

Assume that $Y_m(\text{new})$ and $Y_m(\text{old})$ are related as follows:
$Y_m(\text{new}) = Y_m(\text{old}) - D_y$ (c4)
where $D_y$ = difference in values.

Substituting (c2) and (c4) into (c3):
$Y_m(\text{old}) \cdot X_{norm} = Y_m(\text{new}) \cdot X_{norm} + D_{xy}$
$(Y_m(\text{new}) + D_y) \cdot X_{norm} = Y_m(\text{new}) \cdot X_{norm} + D_{xy}$
$Y_m(\text{new}) \cdot X_{norm} + D_y \cdot X_{norm} = Y_m(\text{new}) \cdot X_{norm} + D_{xy}$
$D_y \cdot X_{norm} = D_{xy}$
$D_y = D_{xy} \cdot \frac{1}{X_{norm}}$ (c5)

Assume that $1/X_{norm}$ is approximately equal to $Y_m(\text{old})$:
$D_y = D_{xy} \cdot Y_m(\text{old})$ (approx.) (c6)

Substituting (c6) into (c4):
$Y_m(\text{new}) = Y_m(\text{old}) - D_{xy} \cdot Y_m(\text{old})$ (c7)

Substituting for $D_{xy}$ from (c3) into (c7):
$Y_m(\text{new}) = Y_m(\text{old}) - (Y_m(\text{old}) \cdot X_{norm} - 1) \cdot Y_m(\text{old})$
$Y_m(\text{new}) = Y_m(\text{old}) - Y_m(\text{old})^2 \cdot X_{norm} + Y_m(\text{old})$
$Y_m(\text{new}) = 2 \cdot Y_m(\text{old}) - Y_m(\text{old})^2 \cdot X_{norm}$ (c8)

If after each calculation we equate $Y_m(\text{old})$ to $Y_m(\text{new})$:
$Y_m(\text{old}) = Y_m(\text{new}) = Y_m$
then equation (c8) evaluates to:
$Y_m = 2 \cdot Y_m - Y_m^2 \cdot X_{norm}$ (c9)

If we start with an initial estimate of $Y_m$, then equation (c9) will converge to a solution very rapidly (typically 3 iterations for 16-bit resolution).

The initial estimate can be obtained from a look-up table, from choosing a midpoint, or simply from linear interpolation. The method chosen for this problem is the latter. This is simply accomplished by taking the complement of the least significant bits of the Xnorm value.

Appendix C Texa
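The iteration in equation (c9) of excerpt 13 can be tried out with a small floating-point sketch in portable C. This is not the library's fixed-point code: `recip_step` and `recip_ref` are hypothetical names, the sketch assumes Xnorm has been normalized to the range 0.5 to 1, and the initial estimate is an assumed straight line through (0.5, 2) and (1, 1) rather than the bit-complement trick mentioned in the text.

```c
/* One refinement of Ym toward 1/Xnorm, as in equation (c9):
 * Ym = 2*Ym - Ym^2 * Xnorm. */
double recip_step(double ym, double xnorm)
{
    return 2.0 * ym - ym * ym * xnorm;
}

/* Approximate 1/xnorm for 0.5 <= xnorm <= 1 (assumed range). */
double recip_ref(double xnorm, int iterations)
{
    /* Assumed initial estimate: linear interpolation between the
     * exact reciprocals at the two ends of the range. */
    double ym = 3.0 - 2.0 * xnorm;
    for (int i = 0; i < iterations; i++)
        ym = recip_step(ym, xnorm);
    return ym;
}
```

Each step squares the relative error (1 - Xnorm*Ym), which is why three iterations from this starting point are enough to go well below 16-bit resolution.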
14. a decimating real FIR filter (direct form) using coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx = 1).

Algorithm: $r[j] = \sum_{k=0}^{nh-1} h[k]\,x[jD-k], \quad 0 \le j \le nx$

Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: none
Implementation Notes: none
Example: See examples/decim subdirectory
Benchmarks: Cycles: nx D 12 nh 4 D 1; Overhead: 86. Code size (in 16-bit words): 67

firinterp: Interpolating FIR Filter

Function: short oflag = firinterp (DATA x, DATA h, DATA r, DATA dbuffer, ushort nh, ushort nx, ushort I); (defined in interp.asm)

Arguments:
x[nx]: Pointer to real input vector of nx real elements.
h[nh]: Pointer to coefficient vector of size nh in normal order: h = b0 b1 b2 b3 ... Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx*I]: Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
dbuffer[nh]: Delay buffer.
- In the case of multiple-buffering schemes, this array should be initial
15. array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nbiq).
nbiq: Number of biquads
nx: Number of elements of input and output vectors
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is implemented using Direct Form II. All biquad coefficients (4 per biquad) are stored in vector h. The real data input is stored in vector a. The filter output result is stored in vector r.

This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks. This function is more efficient for block-by-block filter implementation due to the C calling overhead. However, it can be used for sample-by-sample filtering (nx = 1).

Algorithm (for each biquad):
$d(n) = x(n) - a_1 \cdot d(n-1) - a_2 \cdot d(n-2)$
$y(n) = d(n) + b_1 \cdot d(n-1) + b_2 \cdot d(n-2)$

Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: none
Implementation Notes: none
Example: See exa
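The Direct Form II recurrence in excerpt 15 can be sketched as a floating-point cascade in portable C. This is an illustration only: `iir_df2_ref` is a hypothetical name, the coefficient order per biquad (a1, a2, b1, b2) and the two-element linear state per biquad are assumptions made for the sketch, and they do not reflect the library's actual coefficient layout or its circular delay buffer.

```c
#include <stddef.h>

/* Cascade of nbiq Direct Form II biquads with 4 coefficients each.
 * h: assumed order a1, a2, b1, b2 for each biquad (hypothetical).
 * d: state, 2 values per biquad: d(n-1), d(n-2); zero it before
 *    the first block and keep it between consecutive blocks. */
void iir_df2_ref(const double *x, const double *h, double *r,
                 double *d, size_t nbiq, size_t nx)
{
    for (size_t n = 0; n < nx; n++) {
        double in = x[n];
        for (size_t b = 0; b < nbiq; b++) {
            const double *c = &h[4 * b];
            double *s = &d[2 * b];
            /* d(n) = x(n) - a1*d(n-1) - a2*d(n-2) */
            double dn = in - c[0] * s[0] - c[1] * s[1];
            /* y(n) = d(n) + b1*d(n-1) + b2*d(n-2) */
            in = dn + c[2] * s[0] + c[3] * s[1];
            s[1] = s[0];   /* shift the delay line */
            s[0] = dn;
        }
        r[n] = in;         /* output of the last biquad */
    }
}
```

Because the state array persists across calls, the same function covers block-by-block processing and, with nx = 1, sample-by-sample processing.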
16. elements of vector x using Taylor series.

Algorithm: for (i = 0; i < nx; i++) y(i) = log2 x(i), where -1 <= x(i) <= 1

Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: none
Implementation Notes:
y = 1.4427 * ln(x) with x = M(x) * 2^P(x) = M * 2^P
y = 1.4427 * (ln(M) + ln(2) * P)
y = 1.4427 * (ln(2*M) + (P-1) * ln(2))
y = 1.4427 * (ln((2*M-1) + 1) + (P-1) * ln(2))
y = 1.4427 * (f(2*M-1) + (P-1) * ln(2)) with f(u) = ln(1+u)

We use a polynomial approximation for f(u):
f(u) = (((((C6*u + C5)*u + C4)*u + C3)*u + C2)*u + C1)*u + C0, for 0 <= u <= 1

The polynomial coefficients Ci are as follows:
C0 = 0.000001472
C1 = 0.999847766
C2 = -0.497373368
C3 = 0.315747760
C4 = -0.190354944
C5 = 0.082691584
C6 = -0.017414144

The coefficients Bi used in the calculation are derived from the Ci as follows:
B0 Q30 1581d 0062Dh
B1 Q14 16381d 03FFDh
B2 Q15 -16298d 0C056h
B3 Q16 20693d 050D5h
B4 Q17 -24950d 09E8Ah
B5 Q18 21677d 054ADh
B6 Q19 -9130d 0DC56h

Example: See examples/log_2 subdirectory
Benchmarks: Cycles: Core: 60 nx; Overhead: 56. Code size (in 16-bit words): 85

log_10: Base 10 Logarithm

Function: short oflag = log_10 (DATA x, LDATA r, ushort nx); (defined in log_10.asm)

Arguments:
x[nx]: Pointer to input vector of size nx.
r[nx]: Pointer to output data vector (Q31 format) of size nx.
nx: Length of inp
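The mantissa/exponent decomposition and polynomial in the implementation notes of excerpt 16 can be checked with a floating-point sketch in portable C. This is not the fixed-point library routine: `log2_poly` is a hypothetical name, the coefficient signs follow the alternating series of ln(1+u), and the sketch accepts any positive `double` rather than only Q15 inputs.

```c
/* Coefficients C0..C6 of the approximation f(u) ~ ln(1 + u). */
static const double LOG2_C[7] = {
     0.000001472,  0.999847766, -0.497373368,  0.315747760,
    -0.190354944,  0.082691584, -0.017414144
};

/* Approximate log2(x) for x > 0 using x = M * 2^P, 0.5 <= M < 1. */
double log2_poly(double x)
{
    int p = 0;
    double m = x;
    while (m >= 1.0) { m *= 0.5; p++; }   /* bring M below 1 */
    while (m < 0.5)  { m *= 2.0; p--; }   /* bring M up to 0.5 */

    double u = 2.0 * m - 1.0;             /* 0 <= u < 1 */
    double f = LOG2_C[6];
    for (int i = 5; i >= 0; i--)          /* Horner evaluation of f(u) */
        f = f * u + LOG2_C[i];

    /* y = 1.4427 * (f(2M-1) + (P-1) * ln 2) */
    return 1.4427 * (f + (double)(p - 1) * 0.693147181);
}
```

For x = 0.75 the sketch evaluates to about -0.41504, which agrees with log2(0.75) to roughly five decimal places; most of the remaining error comes from the rounded constant 1.4427.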
17. for losses or damages; 2) Loss of, or damage to, your records or data; or 3) Economic consequential damages (including lost profits or savings) or incidental damages, even if we are informed of their possibility. Some jurisdictions do not allow these limitations or exclusions, so they may not apply to you.

We do not warrant uninterrupted or error-free operation of the Program. We have no obligation to provide service, defect correction, or any maintenance for the Program. We have no obligation to supply any Program updates or enhancements to you even if such are or later become available.

IF YOU DOWNLOAD OR USE THIS PROGRAM YOU AGREE TO THESE TERMS.

THERE ARE NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you.

You may terminate this license at any time. We may terminate this license if you fail to comply with any of its terms. In either event, you must destroy all your copies of the Program.

You are responsible for the payment of any taxes resulting from this license.

You may not sell, transfer, assign, or subcontract any of your rights or obligations under this license. Any attempt to do so is void.

Neither of us may bring a legal action more than two years after the cause of action arose.

This license is governed by the laws of the State
18. in quadrant 1 (0 to pi/2):
sin(x) = c1*x + c2*x^2 + c3*x^3 + c4*x^4 + c5*x^5
c1 = 3.140625
c2 = 0.02026367
c3 = -5.3251
c4 = 0.5446778
c5 = 1.800293
The angle x in other quadrants is calculated by using symmetries that map the angle x into quadrant 1.

Example: See examples/sine subdirectory
Benchmarks: Cycles: Core: 20 nx (worst case), 18 nx (best case); Overhead: 23. Code size (in 16-bit words): 41 in program space, 6 in data space

sqrt_16: Square Root of a 16-bit Number

Function: short oflag = sqrt_16 (DATA x, DATA r, short nx); (defined in sqrtv.asm)

Arguments:
x[nx]: Pointer to input vector of size nx.
r[nx]: Pointer to output vector of size nx containing the sqrt(x). In-place operation is allowed (r can be equal to x).
nx: Number of elements of input and output vectors
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Calculates the square root for each element in input vector x, storing results in output vector r.

Algorithm: for (i = 0; i < nx; i++) $r[i] = \sqrt{x(i)}, \quad 0 \le i \le nx$

Overflow Handling Methodology: Not applicable
Special Requirements: none
Implementation Notes: none
Example: See examples/sine subdirectory
Benchmarks: Cycles: Core: 42 nx; Overhead: 41. Code size (in 16-bit words): 68

sub: Vector Subtract

Function: short oflag = sub (DATA x, DATA y, DATA r, ushor
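The quadrant-1 polynomial in excerpt 18 can be evaluated with a short floating-point sketch in portable C. Two things here are assumptions rather than statements from the guide: the input is taken as a fraction of pi (so 0.5 corresponds to pi/2), which is what makes the listed coefficients produce sin values, and c3 is taken as negative. The names `sin_poly_q1` and `sin_poly` are hypothetical.

```c
/* Quadrant-1 approximation: x in [0, 0.5], angle = x * pi (assumed). */
double sin_poly_q1(double x)
{
    static const double c1 = 3.140625, c2 = 0.02026367, c3 = -5.3251,
                        c4 = 0.5446778, c5 = 1.800293;
    /* Horner form of c1*x + c2*x^2 + c3*x^3 + c4*x^4 + c5*x^5 */
    return ((((c5 * x + c4) * x + c3) * x + c2) * x + c1) * x;
}

/* Extend to x in [-1, 1] using symmetries that map the angle
 * back into quadrant 1. */
double sin_poly(double x)
{
    if (x < 0.0)
        return -sin_poly(-x);          /* odd symmetry */
    if (x > 0.5)
        return sin_poly_q1(1.0 - x);   /* sin(pi - a) = sin(a) */
    return sin_poly_q1(x);
}
```

Under these assumptions the polynomial gives about 0.70710 at x = 0.25 and about 1.00004 at x = 0.5, close to sin(pi/4) and sin(pi/2).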
19. 5 DSPLIB Benchmarks and Performance Issues
5.1 What DSPLIB Benchmarks are Provided
5.2 Performance Considerations
6 Licensing, Warranty, and Support
6.1 Licensing and Warranty
6.2 DSPLIB Software Updates
6.3 DSPLIB Customer Support
A Overview of Fractional Q Formats
A.1 Q3.12 Format
A.2 Q.15 Format
A.3 Q.31 Format
20. nx = FFT complex size
- This routine prevents overflow by scaling by 2 at each IFFT intermediate stage.

Example: See examples/cifft32 subdirectory
Benchmarks: 37 cycles (butterfly core only)

FFT size   Cycles (see note)   Code size (words, text section)   Data size (words, sintab section)
8          288                 389                               0
16         779                 407                               26
32         2059                429                               74
64         5170                452                               170
128        12500               475                               362
256        29446               498                               764
512        142724              521                               1514
1024       361469              544                               3050

Note: Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle table reads and instruction fetches (provided linker command file reflects those conditions).

convol: Convolution

Function: short oflag = convol (DATA x, DATA h, DATA r, ushort nr, ushort nh)

Arguments:
x[nr+nh-1]: Pointer to real input vector a of nr+nh-1 real elements.
h[nh]: Pointer to real input vector h of nh real elements.
r[nr]: Pointer to real output vector r of nr real elements.
nr: Number of real elements in vector r
nh: Number of elements in vector h
oflag: Overflow error flag: 1 if a 32-bit data overflow has occurred in an intermediate or final result; 0 if no 32-bit data overflow has occurred in an intermediate or final result.

Description: Computes the real convolution (positive) of 2 vectors a and h and places the results in vector r. Typically used for block-by-block FIR filter computation without any need of
21. scale)
void cifft32 (LDATA x, nx, short scale): 32-bit inverse complex FFT
void rfft (DATA x, nx, short scale): Radix-2 real forward FFT (MACRO)
void rifft (DATA x, nx, short scale): Radix-2 real inverse FFT (MACRO)
void cbrev (DATA a, DATA r, ushort n): Complex bit-reverse function
Descriptions for the rows preceding cifft32: Radix-2 complex forward FFT (MACRO); Radix-2 complex inverse FFT (MACRO); 32-bit forward complex FFT

(b) Filtering and Convolution
short fir (DATA x, DATA h, DATA r, DATA dbuffer, ushort nx, ushort nh): FIR direct form
short firs (DATA x, DATA r, DATA dbuffer, ushort nh2, ushort nx): Symmetric FIR direct form (optimized routine)
short int firs2 (DATA x, DATA h, DATA r, DATA dbuffer, ushort nh2, ushort nx): Symmetric FIR direct form (generic routine)

Table 4-2. DSPLIB Function Summary Table (Continued)

(b) Filtering and Convolution (Continued)
short firdec (DATA x, DATA h, DATA r, DATA dbuffer, ushort nh, ushort nx, ushort D)
short firinterp (DATA x, DATA h, DATA r, DATA dbuffer, ushort nh, ushort nx, ushort I)
short cfir (DATA x, DATA h, DATA r, DATA dbuffer, ushort nh, ushort nx)
short convol (DATA a, DATA h, DATA r, ushort na, ushort nh)
short hilb16 (DATA x, DATA h, DATA r, DATA db, ushort nh, ushort nx)
short iircas4 (DATA x, DATA h, DATA r, DATA dbuff
22. using circular addressing or restricted data alignment. This function can be used for both block-by-block and sample-by-sample filtering (nr = 1).

Algorithm: $r[j] = \sum_{k=0}^{nh} h[k]\,x[j-k], \quad 0 \le j \le nr$

Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: none
Implementation Notes: none
Example: See examples/convol subdirectory
Benchmarks: Cycles: Core: nr nh 4; Overhead: 35. Code size (in 16-bit words): 43

corr: Correlation (full-length)

Function: short oflag = corr (DATA x, DATA y, DATA r, ushort nx, ushort ny, type); (defined in craw.asm, cbias.asm, cubias.asm)

Arguments:
x[nx]: Pointer to real input vector of nx real elements.
y[ny]: Pointer to real input vector of ny real elements.
r[nx+ny-1]: Pointer to real output vector containing the full-length correlation (nx+ny-1 elements) of vector x with y. r must be different from both x and y (in-place computation is not allowed).
nx: Number of real elements in vector x
ny: Number of real elements in vector y
type: Correlation type selector. Types supported:
- If type = raw, r will contain the raw correlation.
- If type = bias, r will contain the biased correlation.
- If type = unbias, r will contain the unbiased correlation.
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes the full length
23. 4.1 Arguments and Conventions Used

Table 4-1 lists the convention followed when describing the arguments for each individual function.

Table 4-1. Arguments and Conventions

Argument      Description
x, y          Argument reflecting input data vector
r             Argument reflecting output data vector
nx, ny, nr    Arguments reflecting the size of vectors x, y, and r, respectively. In functions where nx = ny = nr, only nx has been used.
h             Argument reflecting filter coefficient vector (filter routines only)
nh            Argument reflecting the size of vector h
DATA          Data type definition equating a short, a 16-bit value representing a Q15 number. Use of DATA instead of short is recommended to increase future portability across devices.
LDATA         Data type definition equating a long, a 32-bit value representing a Q31 number. Use of LDATA instead of long is recommended to increase future portability across devices.
ushort        Unsigned short (16 bit). You can use this data type directly because it has been defined in dsplib.h.

4.2 DSPLIB Functions

The routines included within the library are organized into 8 different functional categories:
- FFT
- Filtering and convolution
- Adaptive filtering
- Correlation
- Math
- Trigonometric
- Miscellaneous
- Matrix functions

Table 4-2. DSPLIB Function Summary Table

(a) FFT
void cfft (DATA x, nx, short scale)
void cifft (DATA x, nx, short scale)
void cfft32 (LDATA x, nx, short
24. 170

FFT size   Cycles    Code size (words)   Data size (words)
128        12749     475                 362
256        30059     498                 764
512        144205    521                 1514
1024       371312    544                 3050

Note: Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle table reads and instruction fetches (provided linker command file reflects those conditions).

cifft: Inverse Complex FFT

Function: void cifft (DATA x, nx, short scale); (defined in cfft#.asm, where # = nx)

Arguments:
x[2*nx]: Pointer to input vector containing nx complex elements (2*nx real elements) in bit-reversed order, representing the complex FFT of a signal. On output, vector x contains the nx complex elements of the IFFT(x), or the signal itself. Complex numbers are stored in Re-Im format. x must be aligned at a 2*nx boundary, where nx = IFFT size. The log(nx) + 1 LSBits of address x must be zero.
nx: Number of complex elements in vector x. nx must be a constant number (not a variable) and can take the following values: nx = 8, 16, 32, 64, 128, 256, 512, 1024.
scale: Flag to indicate whether or not scaling should be implemented during computation:
if (scale == 0) scale factor = 1; else scale factor = nx; end

Description: Computes a radix-2 complex DIT IFFT of the nx complex elements stored in vector x in bit-reversed order. The original content of vector x is destroyed in the process. The nx complex elements of the result are stored in vector x in normal order.

Algorithm (IDFT): $y[k] = \frac{1}{\text{scale factor}} \sum_{w=0}^{nx-1} X(w) \left( \cos\frac{2\pi w k}{nx} + j \sin\frac{2\pi w k}{nx} \right)$
25. 343 367 19049 391 750 42098 439 1517 Note Assumes all data is in on chip dual access RAM and that there is no bus conflict due to twiddle table reads and instruction fetches provided linker com mand file reflects those conditions 32 Bit Forward Complex FFT void cfft32 LDATA x nx short scale defined in c asm where nx x 2 nx nx scale Pointer to input vector containing nx complex elements 2 nx real elements in bit reversed order On output vector a contains the nx complex elements of the FFT x Complex numbers are stored in Re Im x must be aligned at 2 nx boundary where nx FFT size The log nx 1 LSBits of address x must be zero Number of complex elements in vector x nx must be a constant number not a variable and can take the following values nx 8 16 32 64 128 256 512 1024 Flag to indicate whether or not scaling should be implemented during computation if scale 0 scale factor nx else scale factor 1 end Function Descriptions 4 17 cfft32 Description Algorithm Computes a 32 bit Radix 2 complex DIT FFT of the nx complex elements stored in vector x in bit reversed order The original content of vector x is de stroyed in the process The nx complex elements of the result are stored in vector x in normal order DFT nx 1 yik an als cos 2 4 Brak 4 jin 2 a 4 Overflow Handling Methodology Scaling implemented for overflow prevention Special Re
26. 7d 054ADh B6 Q19 9130d ODC56h See examples logn subdirectory Cycles Core 39 nx Overhead 56 Code size in 16 bit words 67 Index of the Maximum Element of a Vector short r maxidx DATA x ushort nx defined in maxidx asm x nx Pointer to input vector of size nx r Index for vector element with maximum value nx Length of input data vector nx gt 6 Returns the index of the maximum element of a vector x In case of multiple maximum elements r contains the index of the last maximum element found Not applicable Overflow Handling Methodology Not applicable Special Requirements none Implementation Notes none Example See examples maxidx subdirectory Benchmarks Cycles Core 27 3 nx if n even approx 31 3 nx Overhead 27 Code size in 16 bit words 66 maxval Maximum Value of a Vector Function short r maxval DATA x ushort nx defined in maxval asm Arguments x nx Pointer to input vector of size nx r Maximum value of a vector nx Length of input data vector Description Returns the maximum element of a vector x Algorithm Not applicable Overflow Handling Methodology Not applicable Special Requirements none Implementation Notes none Example See examples maxval subdirectory Benchmarks Cycles Core 2 nx Overhead 16 Code size in 16 bit words 13 COLDS ndex of the Minimum Element of a Vector Function short r minidx DATA x ushort nx defined in minidx asm Arguments x nx
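The tie-breaking rule stated for maxidx (with several equal maxima, r holds the index of the last one found) can be illustrated with a small portable C reference. This is only a sketch for checking results on a host machine: the name maxidx_ref is hypothetical and the code is not the DSPLIB assembly implementation.

```c
#include <assert.h>

/* Hypothetical host-side reference for maxidx; not the DSPLIB routine. */
short maxidx_ref(const short *x, unsigned short nx)
{
    unsigned short best = 0;
    unsigned short i;

    assert(nx > 0);                 /* an empty vector has no maximum */
    for (i = 1; i < nx; i++) {
        /* ">=" (rather than ">") keeps the LAST of several equal maxima */
        if (x[i] >= x[best]) {
            best = i;
        }
    }
    return (short)best;
}
```

Replacing ">=" with ">" would return the first maximum instead, which is the usual source of mismatches when comparing against such a reference.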
27. 9 equivalent to 0 2 0 2 0 0 2 0 2 float x 0x1999 Ox3dcc Ox7ffff Ox3dcc c234 equivalent to 0 2 0 4828 1 0 4828 0 4828 float atan2_16 y x r 4 should give r 0x2000 0x1000 0x0 Oxf000 0x7000 equivalent to 0 25 0 125 0 0 125 0 875 pi Algorithm For j 0 j lt nx j r i atan2 q 1 Overflow Handling Methodology Not applicable Special Requirements Linker command file you must allocate data section for polynomial coefficients Implementation Notes none Example See examples arct2 subdirectory Benchmarks Cycles Core 107 nx Overhead 47 Code size in 16 bit words 143 words 6 words of 16 bit data CEST Eoc Exponent Implementation Function short maxexp bexp DATA x ushort nx Arguments maxexp Return value max exponent that may be used in scaling x nx Pointer to input vector of size nx Function Descriptions 4 11 cbrev Description Algorithm nx Number of elements of input and output vectors oflag Overflow flag Lj If oflag 1 a 32 bit overflow has occurred _j If oflag 0 a 32 bit overflow has not occurred Computes the exponents number of extra sign bits of all values in the input vector and returns the minimum exponent This will be useful in determining the maximum shift value that may be used in scaling a block of data for short j 0 j lt nx j temp exp xli if temp lt maxexp maxexp temp return maxexp Overflow Handling Methodolo
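The block-exponent computation described for bexp (count the extra sign bits of every element and return the minimum) can be sketched in portable C as follows. The names sign_bits16 and bexp_ref are hypothetical, and the treatment of 0 and -1 as having 15 redundant sign bits is an assumption made for this sketch, not a statement about the C54x EXP instruction.

```c
#include <assert.h>

/* Redundant sign bits of a 16-bit value:
 * 0x4000 -> 0, 0x0001 -> 14, 0 and -1 -> 15 (assumed convention). */
static short sign_bits16(short v)
{
    /* Fold negative values onto positive ones so only leading zeros are counted. */
    unsigned short u = (unsigned short)(v < 0 ? ~v : v);
    short n = 0;

    while (n < 15 && (u & 0x4000u) == 0) {
        u = (unsigned short)(u << 1);
        n++;
    }
    return n;
}

/* Hypothetical host-side reference for bexp: the largest left shift that
 * can be applied to the whole block without overflowing any element. */
short bexp_ref(const short *x, unsigned short nx)
{
    short maxexp = 15;
    unsigned short j;

    assert(nx > 0);
    for (j = 0; j < nx; j++) {
        short temp = sign_bits16(x[j]);
        if (temp < maxexp) {
            maxexp = temp;
        }
    }
    return maxexp;
}
```

The returned value is limited by the element with the largest magnitude, which is why the minimum exponent over the block is the safe shift.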
28. ATA x LDATA r ushort nx defined in logn asm x nx Pointer to input vector of size nx r nx Pointer to output data vector Q31 format of size nx nx Length of input and output data vectors oflag Overflow flag Lj If oflag 1 a 32 bit overflow has occurred _j If oflag 0 a 32 bit overflow has not occurred Computes the log base e of elements of vector x using Taylor series for i 0 i lt nxi y logn x where 1 lt x i lt 1 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Implementation Notes y 0 4343 In x with x M P x 24P x M 24P y 0 4348 In M In 2 P y 0 4343 In 2 M P 1 In 2 y 0 4343 In 2 M 1 1 P 1 In 2 y 0 4343 f 2 M 1 P 1 In 2 with f u In 1 u Function Descriptions 4 51 maxidx Example Benchmarks Function Arguments Description Algorithm 4 52 We use a polynomial approximation for f u f u C6 u C5 u C4 u C3 u C2 u C1 u C0 for 0 lt u lt 1 The polynomial coefficients Ci are as follows CO 0 000 001 472 C1 0 999 847 766 C2 0 497 373 368 C3 0 315 747 760 C4 0 190 354 944 C5 0 082 691 584 C6 0 017 414 144 The coefficients Bi used in the calculation are derived from the Ci as follows BO Q30 1581d 0062Dh B1 Q14 16381d O3FFDh B2 Q15 16298d 0CO56h B3 Q16 20693d 050D5h B4 Q17 24950d O9E8Ah B5 Q18 2167
29. TMS320C54x DSP Library Programmer's Reference

Literature Number: SPRU518
April 2001

Texas Instruments

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgment, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Customers are responsible for their applications using TI components.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either e
30. DIT IFFT of the nx complex elements stored in vector x in bit reversed order The original content of vector x is de stroyed in the process The nx complex elements of the result are stored in vector x in normal order IDFT pezi yik Gea rel cos 2 4 Bra Ik 4 jin 2 a 4 Overflow Handling Methodology Scaling implemented for overflow prevention Special Requirements Implementation Notes Example 4 22 Lj Special linker command file sections required sintab containing the twiddle table For sintab section size refer to the benchmark information below Y This function requires the inclusion of two other files during assembling automatically included E cifft_32 asm contains all functions used for this code NM sintab q31 contains twiddle table section sintab Y This is an IFFT optimized for time Space consumption is high due to the usage of a separate sine table in each stage This reduces MIPS count but also increases twiddle table data space Y The first 2 IFFT stages are implemented as a radix 4 Last stage is also unrolled for optimization Twiddle factors are built in and provided in the sintab q31 that is automatically included during the assembly process OU Special debugging consideration This function is implemented as a mac ro that invokes different IFFT routines according to the size As a conse quence instead of the cifft32 symbol being defined multiple cifft82_ symbols are where
31. Pointer to input vector of size nx
    r             Index for vector element with minimum value
    nx            Length of input data vector (nx > 6)

Description: Returns the index of the minimum element of a vector x. In case of multiple minimum elements, r contains the index of the last minimum element found.
Algorithm: Not applicable
Overflow Handling Methodology: Not applicable
Special Requirements: none
Implementation Notes: The implementation differs from maxidx because the cmps instruction cannot be used with min.
Example: See the examples/minidx subdirectory
Benchmarks: Cycles: Core 4 5 NX, Overhead 18. Code size (in 16-bit words): 22

minval        Minimum Value of a Vector

Function: short r = minval (DATA *x, ushort nx) (defined in minval.asm)
Arguments:
    x[nx]         Pointer to input vector of size nx
    r             Minimum value of a vector
    nx            Length of input data vector
Description: Returns the minimum element of a vector x.
Algorithm: Not applicable
Overflow Handling Methodology: Not applicable
Special Requirements: none
Implementation Notes: none
Example: See the examples/minval subdirectory
Benchmarks: Cycles: Core 2 nx, Overhead 16. Code size (in 16-bit words): 13

mmul          Matrix Multiplication

Function: short oflag = mmul (DATA *x1, short row1, short col1, DATA *x2, short row2, short col2, DATA *r)
Arguments: x1[row1*col1], row1, col1, x2[row2*col2], row2, col2, r[row1*col2], nx
32. TA x DATA y DATA r ushort nx ushort ny type e Trigonometric Functions Short sine DATA x DATA r ushort nx Short atan2_16 DATA q DATA i DATA r ushort nx Short atan16 DATA x DATA r ushort nx 1 Math Functions short add DATA x DATA y DATA r ushort nx ushort scale short expn DATA x DATA r ushort nx short Idiv16 LDATA x DATA y DATA r DATA exp ushort nx short logn DATA x LDATA r ushort nx short log_2 DATA x LDATA r ushort nx Short log_10 DATA x LDATA r ushort nx Description LMS FIR delayed version Normalized delayed LMS implementation Normalized Block LMS implementation Description Auto correlation positive side only MACRO Correlation full length MACRO Description sine of a vector 4 Quadrant Inverse Tangent of a vector Arctan of a vector Description Optimized vector addition Exponent of a vector Signed vector divide Natural log of a vector Log base 2 of a vector Log base 10 of a vector Function Descriptions 4 5 DSPLIB Functions Table 4 2 DSPLIB Function Summary Table Continued f Math Continued Functions short maxidx DATA x ushort nx short maxval DATA x ushort nx short minidx DATA x ushort nx short minval DATA x ushort nx short mul32 LDATA x LDATA y LDATA r ushort nx short neg DATA x DATA r ushort nx short neg32 LDATA x LDATA r ushort nx sho
33. Table A 3 Q 31 Low Memory Location Bit Fields Ese ee aes es ee pane ors ow ow ow os oe a a Table A 4 Q 31 High Memory Location Bit Fields fe Hee ae e aes e ee e oo po 0007 oie A 2 Appendix B Calculating the Reciprocal of a Q15 Number The most optimal method for calculating the inverse of a fractional number Y 1 X is to normalize the number first This limits the range of the number as follows 0 5 lt Xnorm lt 1 1 lt Xnorm lt 0 5 1 The resulting equation becomes Y 1 Xnorm 2 n or Y 24n Xnorm 2 where n 1 2 3 14 15 Letting Ye 24n Ye 2 n 3 Substituting 3 into equation 2 Y Ye 1 Xnorm 4 Letting Ym 1 Xnorm Ym 1 Xnorm 5 Substituting 5 into equation 4 Y Ye Ym 6 For the given range of Xnorm the range of Ym is 1 lt Ym lt 2 2 lt Ym lt 1 7 To calculate the value of Ym various options are possible Y Taylor Series Expansion Y 2nd 3rd 4th Order Polynomial Line Of Best Fit B 1 Calculating the Reciprocal of a Q15 Number B 2 Lj Successive Approximation The method chosen in this example is c Successive approximation yields the most optimum code versus speed versus accuracy option The method outlined below yields an accuracy of 15 bits Assume Ym new exact value of 1 Xnorm Ym new 1 Xnorm c1 or Ym new X 1 c2 Assume Ym old estimate of value 1 X Ym old
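The normalize-then-refine approach of Appendix B can be sketched in floating-point C. The split Y = Ye * Ym with Ye = 2^n and Ym = 1/Xnorm follows the text; the refinement step shown here is a standard Newton-Raphson iteration, which is an assumption about what "successive approximation" refers to, and the starting guess of 1.5 and the iteration count are illustrative choices. The function name recip_norm is hypothetical, and a Q15 implementation would need fixed-point scaling that this sketch omits.

```c
#include <assert.h>

/* Hypothetical floating-point sketch of the Appendix B reciprocal method. */
double recip_norm(double x)
{
    double xn = x;      /* becomes Xnorm */
    double ye = 1.0;    /* becomes Ye = 2^n */
    double ym;          /* estimate of Ym = 1/Xnorm */
    int i;

    assert(x != 0.0 && x > -1.0 && x < 1.0);

    /* Normalize so that 0.5 <= Xnorm < 1 or -1 <= Xnorm < -0.5. */
    while (xn < 0.5 && xn >= -0.5) {
        xn *= 2.0;
        ye *= 2.0;
    }

    /* |Ym| lies between 1 and 2, so start in the middle of that range. */
    ym = (xn > 0.0) ? 1.5 : -1.5;

    /* Newton-Raphson for 1/Xnorm (assumed form of the successive
     * approximation): the relative error is squared on every pass. */
    for (i = 0; i < 5; i++) {
        ym = ym * (2.0 - xn * ym);
    }

    return ye * ym;     /* Y = Ye * Ym */
}
```

Because normalization confines Xnorm to a narrow range, a fixed starting guess converges in a small, predictable number of iterations, which is the practical benefit of normalizing first.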
34. ait state memory, external memory for program and data, data allocation to on-chip DARAM, no pipeline hits. A linker command file showing the memory allocation used during testing and benchmarking in the TI C54x EVM is included under the example subdirectory.

Remember, execution speed in a system is dependent on where the different sections of program and data are located in memory. Be sure to account for such differences when trying to explain why a routine is taking more time than the reported DSPLIB benchmarks.

Chapter 6 Licensing, Warranty, and Support

Topic
    6.1 Licensing and Warranty
    6.2 DSPLIB Software Updates
    6.3 DSPLIB Customer Support

6.1 Licensing and Warranty

C54x DSPLIB is distributed as a free-of-charge product under the generic Texas Instruments License Form presented in Appendix C.

BETA RELEASE SPECIAL DISCLAIMER: This DSPLIB software release is preliminary (Beta). It is intended for evaluation only. Testing and characterization have not been fully completed. Production release will typically follow a month after the Beta release, but no explicit guarantees are placed on that date.

6.2 DSPLIB Software Updates

C54x DSPLIB software updates will be periodically released, incorporating product enhancements and fixes. DSPLIB software updates will be p
35. Calculating the Reciprocal of a Q15 Number (Appendix B)
Texas Instruments License Agreement for DSP Code (Appendix C)

Tables
    4-1 Arguments and Conventions
    4-2 DSPLIB Function Summary Table
    A-1 Q3.12 Bit Fields
    A-2 Q.15 Bit Fields
    A-3 Q.31 Low Memory Location Bit Fields
    A-4 Q.31 High Memory Location Bit Fields

Chapter 1 Introduction

The TMS320C54x DSPLIB is an optimized DSP Function Library for C programmers on TMS320C54x devices. It includes over 50 C-callable, assembly-optimized, general-purpose signal processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. By using these routines, you can achieve execution speeds considerably faster than equivalent code written in standard ANSI C language. In addition to providing ready-to-use DSP functions, TI DSPLIB can significantly shorten your DSP application development time.

Topic
    1.1 DSP Routines

1.1 DSP Routines

The TI DSPLIB includes commonly used DSP routines
36. ale factor nx nx Overflow Handling Methodology Scaling implemented for overflow prevention Special Requirements Implementation Notes Example 4 20 m m E Special linker command file sections required sintab containing the twiddle table For sintab section size refer to the benchmark information below This function requires the inclusion of two other files during assembling automatically included E macrosi asm contains all macros used for this code ME sintab q15 contains twiddle table section sintab Memory alignment Although there is no memory alignment request for this function you need to align input data if you use this function with func tion cbrev see page 4 13 This is an IFFT optimized for time Space consumption is high due to the use of a separate sine table in each stage This reduce MIPS count but also increases twiddle table data space First 2 IFFT stages implemented are implemented as a radix 4 Last stage is also unrolled for optimization Twiddle factors are built in and provided in the sintab q15 that is automatically included during the assembly pro cess Special debugging consideration This function is implemented as a mac ro that invokes different IFFT routines according to the size As a conse quence instead of the cifft symbol being defined multiple cifft symbols are where nx IFFT complex size This routine prevents overflow by scaling by 2 at each IFFT inter
37. ar buffer and must start in a k bit boundary that is the k LSBs of the starting address must be zeros where k log2 nh Number of real elements in vector x input samples Number of coefficients Overflow error flag 1 if a 32 bit data overflow has occurred in an intermediate or final result 0 if no 32 bit data overflow has occurred in an intermediate or final result Computes a real FIR filter direct form using coefficient stored in vector h The real data input is stored in vector x The filter output result is stored in vector r This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks This func tion can be used for both block by block and sample by sample filtering nx 1 l X Ad O lt jsmx k 0 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Function Descriptions 4 29 firdec Implementation Notes You can also use the convolution function for filtering by having an input buffer x padded with nh 1 zeros at the beginning of the x buffer However having an fir filter implementation that uses a totally independent delay buffer dbuffer gives you more control in the relocation in memory of your data buffers in the case of a dual buffering filtering scheme Example See examples fir subdirectory Benchmarks Cycles Core 4 nx 4 nh Overhead 34 Code size in 16 bit w
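A direct-form FIR filter that keeps its history in a separate delay buffer between calls, as described for fir, can be sketched in portable C. This is a host-side illustration only: the name fir_ref is hypothetical, the delay line is shifted linearly instead of using the circular addressing the text describes, and the truncating Q15 result with no saturation is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side FIR reference. d[0..nh-1] is the delay line,
 * newest sample at d[0]; zero it before the first block only. */
void fir_ref(const int16_t *x, const int16_t *h, int16_t *r,
             int16_t *d, unsigned nh, unsigned nx)
{
    unsigned j, k;

    assert(nh > 0);
    for (j = 0; j < nx; j++) {
        int32_t acc = 0;

        /* Shift the history by one sample and insert the new input. */
        for (k = nh - 1; k > 0; k--) {
            d[k] = d[k - 1];
        }
        d[0] = x[j];

        /* r[j] = sum over k of h[k] * x[j - k]; Q15 * Q15 gives Q30. */
        for (k = 0; k < nh; k++) {
            acc += (int32_t)h[k] * d[k];
        }

        /* Back to Q15 by truncation (assumes an arithmetic right shift). */
        r[j] = (int16_t)(acc >> 15);
    }
}
```

Because x[j] is copied into the delay line before r[j] is written, calling the sketch with r equal to x (in-place) is safe, and calling it repeatedly with the same d processes consecutive blocks as one continuous stream.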
38. ath of the 54xdsplib directory path to your C_DIR environment variable. For example, if you run the 54xdsplib.exe self-extracting file in c:\54xdsplib and your TI DSP development tools were installed in c:\dsptools, add this line to your c:\autoexec.bat file:

    set C_DIR=c:\54xdsplib;c:\dsptools

This allows the C54x compiler/linker to find the C54x DSPLIB object libraries, 54xdsp.lib or 54xdspf.lib.

2.3 How to Rebuild DSPLIB

2.3.1 For Full Rebuild of 54xdsp.lib and/or 54xdspf.lib

    To rebuild 54xdsp.lib, simply execute blt54x.bat. Warning: This will overwrite the existing 54xdsp.lib.
    To rebuild 54xdspf.lib, simply execute blt54xf.bat. Warning: This will overwrite the existing 54xdspf.lib.

2.3.2 For Partial Rebuild of 54xdsp.lib and/or 54xdspf.lib (Modification of a Specific DSPLIB Function, for example fir.asm)

    1. Extract the source for the selected function from the source archive:
           ar500 x 54xdsp.src fir.asm
    2. Reassemble your new fir.asm assembly source file:
           asm500 -g fir.asm
    3. Replace the object fir.obj in the dsplib.lib object library with the newly formed object:
           ar500 r 54xdsp.lib fir.obj

Chapter 3 Using DSPLIB

Topic
    3.1 DSPLIB Data Types
    3.2 DSPLIB Arguments
    3.3 Calling a DSPLIB Function from C
    3.4 Calling a DSPLIB Function from Assembly
39. correlation of vectors x and y and stores the result in vector r using time domain techniques Algorithm Raw correlation nr j 1 f Y H O lt j lt nr nx ny 1 k 0 Biased correlation nr j 1 M h gt M AWK Osj lt nr nxt ny 1 k 0 Unbiased correlation nr j 1 1 ig Mota 2 A WA Osjsnr nx ny 1 k 0 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Implementation Notes 1 Special debugging consideration This function is implemented as a mac ro that invokes different correlation routines according to the type se lected As a consequence the corr symbol is not defined Instead the corr_raw corr_bias corr_unbias symbols are defined Y Correlation is implemented using time domain techniques Example See examples cbias examples cubias examples craw subdirectories Benchmarks Cycles Raw Core 41 16 na 3 na 2 17 na 3 14 no na 1 na 2 8 Overhead 36 Unbias Core 26 na 3 53 na 3 na 2 38 no na 1 11 na 2 Overhead 51 Bias Core 59 2 na 3 12 na 3 na 2 2 nb na 1 12 na 2 Overhead 51 Function Descriptions 4 25 dims Code size in 16 bit words Raw 105 Unbias 255 Bias 132 COS Adaptive Delayed Ims Filter Function Arguments 4 26 short oflag dims DATA x DATA h DATA r DATA d DATA des DATA step ushort nh ushort nx defi
40. due to the C call ing overhead However it can be used for sample by sample filtering nx 1 for biquad a n x n at d n 1 a2 d n 2 y n b0 a n b1 a n 1 b2 a n 2 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Implementation Notes none Example Benchmarks 4 40 See examples iir32 subdirectory Cycles Core 4 nx 12 48 nbiq Overhead 58 Code size in 16 bit words 110 iircas4 ME Cascaded I R Direct Form II Using 4 Coefs per Biquad Function short oflag iircas4 DATA x DATA h DATA r DATA dbuffer ushort nbiq ushort nx defined in iir4cas4 asm Arguments x nx h 4 nbiq r nx dbuffer 2 nbiq nbiq nx oflag Pointer to input data vector of size nx Pointer to filter coefficient vector with the following format h al1 a21 b21 b11 a1l a2l b21 b11 where is the biquad index i e a21 is the a2 coefficient of biquad 1 Pole recursive coefficients a Zero non recursive coefficients b Pointer to output data vector of size nx r can be equal than x Pointer to address of delay line d Each biquad has 2 delay line elements separated by nbiq locations in the following format d1 n 1 d2 n 1 di n 1 d1 n 2 d2 n 2 di n 2 where is the biquad index i e d2 n 1 is the n 1 th delay element for biquad 2 J Inthe case of multiple buffering schemes this
41. e TMS320C54x DSPLIB is an optimized DSP Function Library for C programmers on TMS320C54x devices. It includes over 50 C-callable, assembly-optimized, general-purpose signal processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. By using these routines, you can achieve execution speeds considerably faster than equivalent code written in standard ANSI C language. In addition to providing ready-to-use DSP functions, TI DSPLIB can significantly shorten your DSP application development time.

Related Documentation

    The MathWorks, Inc. Matlab Signal Processing Toolbox User's Guide. Natick, MA: The MathWorks, Inc., 1996.
    Lehmer, D.H. "Mathematical Methods in large-scale computing units." Proc. 2nd Sympos. on Large-Scale Digital Calculating Machinery, Cambridge, MA, 1949. Cambridge, MA: Harvard University Press, 1951.
    Oppenheim, Alan V. and Ronald W. Schafer. Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 1989.
    Digital Signal Processing with the TMS320 Family (SPRO12)
    TMS320C54x DSP CPU and Peripherals Reference Set, Volume 1 (SPRU131)
    TMS320C54x Optimizing C Compiler User's Guide (SPRU103)

Trademarks

TMS320, TMS320C54x, and C54x are trademarks of Texas Instruments. Matlab is a registered trademark of The MathWorks, Inc.

Acknowledgments

DSPLIB includes code contributed by
42. e advantage of the C54x LMS instruction nh 1 FIR portion qi gt b k Xi k 0O lt i lt nx k 0 Adaptation using the previous error and the previous sample e i des i ri bk i 1 bk 2 u efi 1 xi k 1 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Implementation Notes Delayed version implemented to take advantage of the C54x LMS instruction Example Benchmarks Function Arguments Effect on covergence minimum For reference following is the algorithm for the regular LMS non delayed nh 1 FIR portion qi bD bik xli k 0 lt is nx k 0 Adaptation using the previous error and the previous sample e des i ri bk i 1 bk N 2 wu e i xX i k See examples dims subdirectory Cycles Core nx 18 2 nh 2 nx 14 2 nh Overhead 45 Code size in 16 bit words 62 Exponential Base e short oflag expn DATA x DATA r ushort nx defined in expn asm x nx Pointer to input vector of size nx x contains the numbers normalized between 1 1 in q15 format Function Descriptions 4 27 fir r nx Pointer to output data vector Q3 12 format of size nx r can be equal to x nx Length of input and output data vectors oflag Overflow flag Lj If oflag 1 a 32 bit overflow has occurred _j If oflag 0 a 32 bit overflow has not occurred Description Computes the exponent of elem
43. e equal to destination operand to conserve memory Calling a DSPLIB Function from C 3 3 Calling a DSPLIB Function from C In addition to correctly installing the DSPLIB software to include a DSPLIB function in your code you have to YU Include the dsplib h include file 3 Link your code with one of the two DSPLIB object code libraries 54xdsp ib or 54xdspf lib depending on whether you need far mode Y Use a correct linker command file describing the memory configuration available in your C54x board For example the following code contains a call to the acorr q 15tofland fltoq15 routines in DSPLIB User s Guide example include dsplib h float xf 3 0 1 float yf 3 0 short x 3 short y 3 short i 3 3 main for i 0 1 lt 3 i y i x i 0 fltogl5 xf x 3 acorr x y 3 3 t1aw ql5tofl y y 3 In this example the fltoq15 and q 15tof DSPLIB functions are used to convert between floating point fractional values to Q15 fractional values However in many applications your data is always maintained in Q15 format so that the conversion between float and Q15 is not required The above code ug c is available under the doc code subdirectory To com pile and link this code with 54xdsp lib simply issue the following command c1500 pk g 03 i ug c z v0O 54x cmd 54xdsp lib m ug map oug out or c1500 v548 mf pk g 03 i ug c z v0 54x cmd 54xdsp lib m ug map o
44. ector r This function retains the address of the delay filter memory d containing the pre vious delayed values to allow consecutive processing of blocks This function can be used for both block by block and sample by sample filtering nx 1 ii ARM K osjsnx k 0 where h is symmetric for example h hO h1 h2 h2 h1 hO where nh2 3 Only ho h1 h2 are stored in data memory Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Function Descriptions 4 35 fltoq15 Implementation Notes Although this routine is slower than the symmetric filter routine firs included Example Benchmarks Function Arguments Description Algorithm in DSPLIB it does not impose any restrictions in the location of the coefficient vector or in the use of multiple filtering routines in the same executable See examples firs2 subdirectory Cycles Core nx 15 2 nh2 Overhead 43 Code size in 16 bit words 58 Float to q15 Conversion short errorcode fltoq15 float x DATA r ushort nx defined in fltoq15 asm x nx Pointer to floating point input vector of size nx x should contain the numbers normalized between 1 1 The erro code returned value will reflect if that condition is not met r nx Pointer to output data vector of size nx containing the q15 equivalent of vector x nx Length of input and output data vectors errorcode The function returns the following err
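The float-to-Q15 conversion described for fltoq15 can be illustrated with a portable C sketch. The name float_to_q15_ref, the return convention (0 when every input is in range, 1 when at least one value had to be saturated), and truncation toward zero are all assumptions made for illustration; the actual DSPLIB error codes are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side float-to-Q15 conversion.
 * Returns 0 if all inputs were in [-1, 1), or 1 if any was saturated. */
int float_to_q15_ref(const float *x, int16_t *r, unsigned nx)
{
    int clipped = 0;
    unsigned i;

    assert(x != 0 && r != 0);
    for (i = 0; i < nx; i++) {
        if (x[i] >= 1.0f) {
            r[i] = 32767;                         /* largest Q15 value */
            clipped = 1;
        } else if (x[i] < -1.0f) {
            r[i] = -32768;                        /* smallest Q15 value */
            clipped = 1;
        } else {
            r[i] = (int16_t)(x[i] * 32768.0f);    /* truncates toward zero */
        }
    }
    return clipped;
}
```

Scaling by 32768 maps the interval [-1, 1) onto the full 16-bit range, so +1.0 itself is not representable and is treated as out of range in this sketch.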
45. ed delay buffer oflag Overflow flag Lj If oflag 1 a 32 bit overflow has occurred _j If oflag 0 a 32 bit overflow has not occurred ndims Normalized Delayed LMS NDLMS Block FIR implementation using coeffi cients stored in vector h Coefficients are updated after each sample based on the LMS algorithm The real data input is stored in vector a The filter output result is stored in vector r LMS algorithm is used but adaptation using the previous error and the pre vious sample delayed to take advantage of the C54x LMS instruction Restrictions This version does not allow consecutive calls to this routine in a dual buffering fashion For a more detailed description of the algorithm refer to 4 FIR portion nh 1 l Y A i Osis nx k 0 Adaptation using the previous error and the previous sample e i des i ri vari 1 vari 1 6 abs x 1 cutoff _ bk 2 u eh 1 xi k 1 var i A 2 bk i 1 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Function Descriptions 4 61 neg Implementation Notes Delayed version implemented to take advantage of the C54x LMS instruction Effect on covergence minimum Example Benchmarks Function Arguments Description Algorithm See examples ndlms subdirectory Cycles Core 85 bsize nh 18 bsize nb nx Overhead 88 Code size in 16 bit words
46. ents of vector x using Taylor series Algorithm for i 0 i lt nx i y e where 1 lt x lt 1 Overflow Handling Methodology Not applicable Special Requirements Linker command file you must allocate data section for polynomial coeffi cients Implementation Notes Computes the exponent of elements of vector x lt uses the following Taylor series exp x c1 x c2 x 2 C3 X 3 04 x 4 c5 x 5 where c1 0 0139 c2 0 0348 c3 0 1705 c4 0 4990 c5 1 0001 Example See examples expn subdirectory Benchmarks Cycles Core 12 nx Overhead 32 Code size in 16 bit words 36 TN 18 Filter Function oflag short fir DATA x DATA h DATA r DATA dbuffer ushort nh ushort nx Arguments x nx Pointer to real input vector of nx real elements 4 28 Description Algorithm h nh r nx dbuffer nh nx nh oflag fir Pointer to coefficient vector of size nh in normal order h bO b1 b2 b3 Memory alignment this is a circular buffer and must start in a k bit boundary that is the k LSBs of the starting address must be zeros where k log2 nh Pointer to real input vector of nx real elements In place computation r x is allowed Delay buffer J Inthe case of multiple buffering schemes this array should be initialized to 0 for the first block only Between consecutive blocks the delay buffer preserves the previous r output elements needed LJ Memory alignment this is a circul
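The polynomial evaluation behind expn can be sketched with Horner's rule in floating-point C. The sketch assumes that the five listed values pair with the powers of x so that 1.0001 multiplies x and 0.0139 multiplies x^5, and it adds an assumed constant term of 1.0; that pairing is chosen because it is the one that approximates e^x. The name expn_poly is hypothetical, and the Q15 input and Q3.12 output scaling of the real routine are not modeled.

```c
#include <assert.h>

/* Hypothetical floating-point sketch of the expn polynomial. */
double expn_poly(double x)
{
    /* c[k] multiplies x^k. The constant term and the pairing of
     * values to powers are assumptions (see the lead-in). */
    static const double c[6] = {1.0, 1.0001, 0.4990, 0.1705, 0.0348, 0.0139};
    double y = c[5];
    int k;

    assert(x >= -1.0 && x <= 1.0);

    /* Horner's rule: ((((c5*x + c4)*x + c3)*x + c2)*x + c1)*x + c0 */
    for (k = 4; k >= 0; k--) {
        y = y * x + c[k];
    }
    return y;
}
```

Horner's rule needs one multiply and one add per coefficient, which maps naturally onto a multiply-accumulate loop.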
47. er ushort nbiq ushort nx short iircas5 DATA x DATA h DATA r DATA dbuffer ushort nbiq ushort nx short iircas51 DATA x DATA h DATA r DATA dbuffer ushort nbiq ushort nx short iir32 DATA x LDATA h DATA r LDATA dbuffer ushort nbiq ushort nx short iirlat DATA x DATA h DATA r DATA d ushort nh ushort nx short firlat DATA x DATA h DATA r DATA d ushort nx ushort nh Description Decimating FIR filter Interpolating FIR filter Complex FIR direct form Convolution 16 bit fir Hilbert Transformer IIR cascade Direct Form 2 4 coefficients per biquad IIR cascade Direct Form 2 5 coefficients per biquad IIR cascade Direct Form 1 5 coefficients per biquad 32 bit IIR cascade Direct Form 2 5 coefficients per biquad Lattice inverse IIR filter Lattice forward FIR filter 4 4 DSPLIB Functions Table 4 2 DSPLIB Function Summary Table Continued c Adaptive Filtering Functions short dims DATA x DATA h DATA r DATA d DATA des DATA step ushort nh ushort nx short ndlms DATA x DATA h DATA r DATA dbuffer DATA des ushort nh ushort nx int _tau int cutoff int gain DATA norm_d short nblms DATA x DATA h DATA r DATA dbuffer DATA des ushort nh ushort nx ushort nb DATA norm_e int _tau int cutoff int gain d Correlation Functions short acorr DATA x DATA r ushort nx ushort nr type short corr DA
48. ether or not scaling should be implemented during computation If scale 0 scale factor 1 else scale factor nx end Computes a Radix 2 real DIT FFT of the nx real elements stored in vector x in bit reversed order The original content of vector x is destroyed in the pro cess The first nx 2 complex elements of the FFT x are stored in vector x in normal order DFT yk nx 1 1 eee ae in scale factor e IK y jsi scale factor n xli cos PX sin i 0 Overflow Handling Methodology Scaling implemented for overflow prevention See section 6 3 Special Requirements 4 68 OU Special linker command file sections required sintab containing the twiddle table For sintab section size refer to the benchmark information below Y This function requires the inclusion of two other files during assembling automatically included E macros asm contains all macros used for this code MW sintab q15 contains twiddle table section sintab E unpack asm containing code to for unpacking results UY Memory alignment Although there is no memory alignment request for this function you need to align input data if you use this function with func tion cbrev see page 4 13 Implementation Notes Example Benchmarks d d rfft Implemented as a complex FFT of size nx 2 followed by an unpack stage to unpack the real FFT results Therefore implementation Notes for the cfft function apply to th
49. ffi cients stored in vector h Coefficients are updated after each sample based on the LMS algorithm The real data input is stored in vector a The filter output result is stored in vector r LMS algorithm is used but adaptation uses the previous error and the previous sample delayed and takes advantage of the C54x LMS instruction Restrictions This version does not allow consecutive calls to this routine in a dual buffering fashion Algorithm For a more detailed description of the algorithm refer to 4 nh 1 FIR portion qi gt bik Xi k 0O lt i lt nx k 0 Adaptation using the previous error and the previous sample el a y i error var i 1 B van 21 B Labs x i cutoff signal power estimate for f O lt nb j a _ bkii 2 u e x i k bkj i 1 var A 2 Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements Linker command file you must allocate ebuffer section for polynomial coeffi cients Function Descriptions 4 59 ndims Implementation Notes Delayed version implemented to take advantage of the C54x LMS instruction Example Benchmarks Function Arguments 4 60 Effect on covergence minimum For reference following is the algorithm for the regular LMS non delayed FIR portion nh 1 f Y bakti Osis nx k 0 Adaptation using the current error and the current sample e de
50. Description: Computes a cascade IIR filter of nbiquad biquad sections. Each biquad section is implemented using Direct Form I. All biquad coefficients (5 per biquad) are stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r.

The use of 5 coefficients instead of 4 facilitates the design of filters with a unit gain of less than one (for overflow avoidance), typically achieved by filter coefficient scaling.

Algorithm (for each biquad): $y(n) = b_0\,x(n) + b_1\,x(n-1) + b_2\,x(n-2) - a_1\,y(n-1) - a_2\,y(n-2)$

Overflow Handling Methodology: No scaling implemented for overflow prevention.

Special Requirements: none

Implementation Notes: This implementation does not use circular addressing for the delay buffer. Instead, it takes advantage of the 54x DELAY instruction. For this reason, the delay buffer pointer will always point to the top between successive block calls.

Example: See examples/iircas51 subdirectory.

Benchmarks: Cycles: Core nx 13 8 nbiq, Overhead 44. Code size (in 16-bit words): 58

iirlat: Lattice Inverse IIR Filter

Function: short oflag = iirlat (DATA *x, DATA *h, DATA *r, DATA *d, int nh, int nx)

Arguments:
x[nx]: Pointer to real input vector of nx real elements
h[nh]: Poin
51. g
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes a cascade IIR filter of nbiquad biquad sections. Each biquad section is implemented using Direct Form II. All biquad coefficients (5 per biquad) are stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r.

This function retains the address of the delay filter memory d, containing the previous delayed values, to allow consecutive processing of blocks. This function is more efficient for block-by-block filter implementation due to the C calling overhead. However, it can be used for sample-by-sample filtering (nx = 1).

The use of 5 coefficients instead of 4 facilitates the design of filters with a unit gain of less than one (for overflow avoidance), typically achieved by filter coefficient scaling.

Algorithm (for each biquad):
$d(n) = x(n) - a_1\,d(n-1) - a_2\,d(n-2)$
$y(n) = b_0\,d(n) + b_1\,d(n-1) + b_2\,d(n-2)$

Overflow Handling Methodology: No scaling implemented for overflow prevention.

Special Requirements: none

Implementation Notes: none

Example: See examples/iircas5 subdirectory.

Benchmarks: Cycles: Core nx 11 5 nbiq, Overhead 40. Code size (in 16-bit words): 51

iircas51: Cascaded IIR Direct Form I (5 Coefs per Biquad)

Function: short oflag = iircas51 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nbiq, ushort n
52. gy: Not applicable

Special Requirements: none

Implementation Notes: none

Example: See examples/bexp subdirectory.

Benchmarks: Cycles: Core 9 nx, Overhead 28. Code size (in 16-bit words): 29 words

cbrev: Complex Bit Reverse

Function: void cbrev (DATA *x, DATA *r, ushort nx); defined in cbrev.asm

Arguments:
x[2*nx]: Pointer to complex input vector x
r[2*nx]: Pointer to complex output vector r
nx: Number of complex elements of vectors x and r
- To bit-reverse the input of a complex FFT, nx should be the complex FFT size.
- To bit-reverse the input of a real FFT, nx should be half the real FFT size.

Description: This function bit-reverses the position of elements in complex vector x into output vector r. In-place bit-reversing is allowed. Use this function in conjunction with FFT routines to provide the correct format for the FFT input or output data. If you bit-reverse a linear-order array, you obtain a bit-reversed-order array. If you bit-reverse a bit-reversed-order array, you obtain a linear-order array.

Algorithm: Not applicable

Overflow Handling Methodology: Not applicable

Special Requirements: Memory alignment: input data x must be aligned at a 2*nx boundary. The log2(nx) + 1 LSBs of address x must be zero.

Implementation Notes:
- x is read with bit-reversed addressing and r is written in normal linear addressing
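The bit-reversed reordering that cbrev is described as performing can be illustrated with a small portable C model. This is only a sketch under stated assumptions: the names `bitrev_index` and `cbrev_ref` are hypothetical, the model works out of place only (the DSPLIB routine also allows in-place use), and it relies on plain index arithmetic rather than the C54x bit-reversed addressing mode, so it has no alignment requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of index i (hypothetical helper). */
static unsigned bitrev_index(unsigned i, unsigned bits)
{
    unsigned j = 0;
    for (unsigned b = 0; b < bits; b++) {
        j = (j << 1) | (i & 1u);
        i >>= 1;
    }
    return j;
}

/* Out-of-place reference model: complex element i of x (stored Re, Im)
   is written to complex element bitrev(i) of r. nx must be a power of 2. */
static void cbrev_ref(const int16_t *x, int16_t *r, unsigned nx)
{
    unsigned bits = 0;

    assert(nx != 0 && (nx & (nx - 1)) == 0); /* power of two only */
    assert(x != r);                          /* this sketch is out of place */

    while ((1u << bits) < nx)
        bits++;

    for (unsigned i = 0; i < nx; i++) {
        unsigned j = bitrev_index(i, bits);
        r[2 * j]     = x[2 * i];     /* real part */
        r[2 * j + 1] = x[2 * i + 1]; /* imaginary part */
    }
}
```

Applying `cbrev_ref` twice (with the buffers swapped) returns the original ordering, which matches the statement that bit-reversing a bit-reversed array yields a linear-order array.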
53. his function requires the inclusion of two other files during assembly (automatically included):
  - macros.asm: contains all macros used for this code
  - sintab.q15: contains the twiddle table section sintab
- Memory alignment: Although there is no memory alignment requirement for this function, you need to align the input data if you use this function with function cbrev (see page 4-13).

Implementation Notes:
- This is an FFT optimized for time. Space consumption is high due to the use of a separate sine table in each stage. This reduces the MIPS count but also increases twiddle table data space.
- The first 2 FFT stages are implemented as a radix-4 stage. The last stage is also unrolled for optimization. Twiddle factors are built in and provided in the sintab.q15 file, which is automatically included during the assembly process.
- Special debugging consideration: This function is implemented as a macro that invokes different FFT routines according to the size. As a consequence, instead of the cfft symbol being defined, multiple cfft symbols are defined (where nx = FFT complex size).
- This routine prevents overflow by scaling by 2 at each FFT intermediate stage.

Example: See examples/cfft subdirectory.

Benchmarks: 8 cycles (butterfly core only)

FFT size   Cycles (see note)   Code size (words), text section   Data size (words), sintab section
8          149                 109                               0
16         322                 151                               11
32         733                 199                               34
64         1672                247                               81
128        3795                295                               176
256        8542
54. in rifft asm where nx

Arguments:
x[nx]: Pointer to input vector x containing nx real elements in bit-reversed order, as shown below for nx = 8 (where y = fft(x)):
  y(0)Re      y(nx/2)Im   -> DC and Nyquist
  y(2)Re      y(2)Im
  y(1)Re      y(1)Im
  y(nx/2)Re   y(nx/2)Im
On output, the vector x contains nx complex elements corresponding to IFFT(x), or the signal itself. Complex numbers are stored in Re-Im format.
nx: Number of real elements in vector x. nx must be a constant number (not a variable) and can take the following values: nx = 16, 32, 64, 128, 256, 512, 1024
scale: Flag to indicate whether or not scaling should be implemented during computation:
if (scale == 0) scale factor = 1; else scale factor = nx

Description: Computes a radix-2 real DIT IFFT of the nx real elements stored in vector x in bit-reversed order. The original content of vector x is destroyed in the process. The first nx/2 complex elements of IFFT(x) are stored in vector x in normal order.

Algorithm (IDFT): $y(k) = \frac{1}{\text{scale factor}} \sum_{i=0}^{nx-1} x(i)\left[\cos\left(\frac{2\pi i k}{nx}\right) + j\,\sin\left(\frac{2\pi i k}{nx}\right)\right]$

Overflow Handling Methodology: Scaling implemented for overflow prevention.

Special Requirements:
- Special linker command file sections required: sintab (containing the twiddle table). For sintab section size, refer to the benchmark information below.
- This function requires the inclusion of two other files during assembly (automatically included)
55. is case.
- Notice that normally an FFT of a real sequence of size N produces a complex sequence of size N (or 2*N real numbers) that will not fit in the input sequence. To accommodate all the results without requiring extra memory locations, the output reflects only half of the spectrum (complex output). This still provides the full information because the FFT of a real sequence is conjugate-symmetric around the center, or Nyquist, point (N/2).
- Special debugging consideration: This function is implemented as a macro that invokes different FFT routines according to the size. As a consequence, instead of the rfft symbol being defined, multiple rfft symbols are defined (where nx = FFT real size).
- When scale = 1, this routine prevents overflow by scaling by 2 at each FFT intermediate stage and at the unpacking stage.

Example: See examples/rfft subdirectory.

Benchmarks: 8 cycles (butterfly core only)

FFT size   Cycles (see note)   Code size (words), text section   Data size (words), sintab section
16         264                 171                               11
32         541                 213                               34
64         1160                261                               81
128        2516                309                               176
256        5470                357                               367
512        11881               405                               750
1024       25716               453                               1517

Note: Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle table reads and instruction fetches (provided the linker command file reflects that).

rifft: Inverse Real FFT (in place)

Function: void rifft (DATA x, nx, short scale); defined
56. it overflow has occurred
- If oflag = 0, a 32-bit overflow has not occurred.

Description: This function calculates the power (sum of products) of a vector.

Algorithm: power = 0; for (i = 0; i < nx; i++) power += x(i) * x(i)

Overflow Handling Methodology: No scaling implemented for overflow handling.

Special Requirements: none

Implementation Notes: none

Example: See examples/power subdirectory.

Benchmarks: Cycles: Core nx 4, Overhead 18. Code size (in 16-bit words): 18

q15tofl: Q15 to Float Conversion

Function: void q15tofl (DATA *x, float *r, ushort nx); defined in q152fl.asm

Arguments:
x[nx]: Pointer to Q15 input vector of size nx
r[nx]: Pointer to floating-point output data vector of size nx containing the floating-point equivalent of vector x
nx: Length of input and output data vectors

Description: Converts the Q15 numbers stored in vector x to IEEE floating-point numbers stored in vector r.

Algorithm: Not applicable

Overflow Handling Methodology: Saturation implemented for overflow handling.

Special Requirements: none

Implementation Notes: none

Example: See examples ug subdirectory.

Benchmarks: Cycles: Core 11 36 nx, Overhead 15. Code size (in 16-bit words): 56

rand16init: Initialize Random Number Generator

Function: void rand16init (void); defined in rand16i.asm

Arguments: none

Description: Initializes the seed for the 16-bit random number generation routine.

Algorithm: Not applicable

Overflow Handling Methodo
57. ized to 0 for the first block only Between consecutive blocks the delay buffer preserves the previous r output elements needed U Memory alignment this is a circular buffer and must start in a k bit boundary that is the k LSBs of the starting address must be zeros where k log2 nh nx Number of real elements in vector x and r nh Number of coefficients Interpolation factor For example an 2 means you will add one sample result for every sample oflag Overflow error flag 1 if a 32 bit data overflow has occurred in an intermediate or final result 0 if no 32 bit data overflow has occurred in an intermediate or final result Computes an interpolating real FIR filter direct form using coefficient stored in vector h The real data input is stored in vector x The filter output result is stored in vector r This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks This function can be used for both block by block and sample by sam ple filtering nx 1 ff S nm O lt j lt nr Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none 4 32 Implementation Notes none Example Benchmarks firs See examples decimate subdirectory Cycles Core nx 6 1 1 17 nh 1 Overhead 88 Code size in 16 bit words 74 U Symmetric FIR Filter short oflag int firs DATA x DATA r
58. lab. Test utilities have been added to our test main drivers to automate this checking process. Notice that a maximum absolute error value (MAXERROR) is passed to the test function to set the trigger point for flagging a functional error.

We consider this testing methodology a good first-pass approximation. Further characterization of the quantization error ranges for each function (under random input), as well as testing against a set of fixed-point C models, is planned for future releases. We welcome any suggestions you, as a user, may have in this respect.

3.7 How DSPLIB Deals With Overflow and Scaling Issues

One of the inherent difficulties of programming for fixed-point processors is determining how to deal with overflow. Overflow occurs as a result of addition and subtraction operations when the dynamic range of the resulting data is larger than what the intermediate and final data types can contain.

The methodology used to deal with overflow should depend on the specifics of your signal, the type of operation in your functions, and the DSP architecture used. In general, overflow handling methodologies can be classified in five categories: saturation, input scaling, fixed scaling, dynamic scaling, and system design considerations.

It is important to note that a C54x architectural feature that makes overflow easier to deal with is the presence of guard bits in both C54x accumulators
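The interaction between guard bits and saturation described above can be illustrated with a portable C model. This is a sketch under stated assumptions, not DSPLIB code: `mac_q15` and `saturate32` are hypothetical names, an `int64_t` stands in for the 40-bit C54x accumulator (32 bits plus 8 guard bits), and the flag here reports only a final result that does not fit in 32 bits, whereas the DSPLIB oflag is documented as also covering intermediate results.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide accumulator to the signed 32-bit range.
   Sets *oflag to 1 when clamping was needed. */
static int32_t saturate32(int64_t acc, int *oflag)
{
    if (acc > INT32_MAX) { *oflag = 1; return INT32_MAX; }
    if (acc < INT32_MIN) { *oflag = 1; return INT32_MIN; }
    return (int32_t)acc;
}

/* Multiply-accumulate two Q15 vectors. The wide accumulator plays the
   role of the guard bits: partial sums may exceed 32 bits temporarily
   and still produce a correct final result if the total fits. */
static int32_t mac_q15(const int16_t *x, const int16_t *y, unsigned n,
                       int *oflag)
{
    int64_t acc = 0;

    assert(x && y && oflag);
    *oflag = 0;
    for (unsigned i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)y[i]; /* each product fits in 32 bits */
    return saturate32(acc, oflag);
}
```

With 8 guard bits, up to 256 worst-case Q15 products can be accumulated before the 40-bit range itself is exceeded; the 64-bit stand-in used here is simply wider than that.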
59. lag 1 if a 32 bit data overflow has occurred in an intermediate or final result 0 if no 32 bit data overflow has occurred in an intermediate or final result Computes a real lattice FIR filter implementation using coefficient stored in vector h The real data input is stored in vector x The filter output result is stored in vector r This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks This function can be used for both block by block and sample by sam ple filtering nx 1 en ej nl xn eln eln hie n 1 i 1 2 N poor e n heln e n 1 i 1 2 N yin e n entr Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements none Implementation Notes none Example Benchmarks See examples firlat subdirectory Cycles Core nx 3 nh 18 Overhead 61 Code size in 16 bit words 64 EM 256 2 Logarithm Function Arguments 4 48 short oflag log_2 DATA x LDATA r ushort nx defined in log_2 asm x nx Pointer to input vector of size nx Description Algorithm log 2 r nx Pointer to output data vector Q31 format of size nx nx Length of input and output data vectors oflag Overflow flag Lj If oflag 1 a 32 bit overflow has occurred _j If oflag 0 a 32 bit overflow has not occurred Computes the log base 2 of
60. logy: No scaling implemented for overflow handling.

Special Requirements: Allocation of the bss section is required in the linker command file.

Implementation Notes: This function initializes a global variable rndnum in global memory to be used for the 16-bit random number generation routine (rand16).

Example: See examples/rand subdirectory.

Benchmarks: Cycles: Total 7. Code size (in 16-bit words): 5

rand16: Random Vector Generation

Function: short oflag = rand16 (DATA *x, ushort nx); defined in rand16.asm

Arguments:
x[nx]: Pointer to input data vector 1 of size nx
nx: Number of elements of input and output vectors
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes a vector of 16-bit random numbers.

Algorithm: Linear congruential method

Overflow Handling Methodology: Not applicable

Special Requirements: none

Implementation Notes: none

Example: See examples/rand16 subdirectory.

Benchmarks: Cycles: Core 13 nx 4, Overhead 10. Code size (in 16-bit words): 28

recip16: 16-bit Reciprocal Function

Function: void recip16 (DATA *x, DATA *r, DATA *rexp, ushort nx); defined in recip16.asm

Arguments:
x[nx]: Pointer to input data vector 1 of size nx
r[nx]: Pointer to output data buffer
rexp[nx]: Pointer to exponent buffer for output values. These exponent values are in integer format.
nx: Number of eleme
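The linear congruential method named as the rand16 algorithm can be sketched in portable C as follows. Everything specific here is an assumption made for illustration: the multiplier and increment are placeholder constants, not values taken from the DSPLIB source, and `lcg_state`, `lcg16_init`, and `lcg16_fill` are hypothetical names that only mirror the roles of the seed variable, rand16init, and rand16.

```c
#include <assert.h>
#include <stdint.h>

#define LCG_MULT 31821u /* illustrative multiplier (assumption) */
#define LCG_INC  13849u /* illustrative increment (assumption) */

/* Plays the role of the global seed variable set up by the init routine. */
static uint16_t lcg_state;

static void lcg16_init(uint16_t seed)
{
    lcg_state = seed;
}

/* Fill x with nx pseudo-random 16-bit values using
   state = (MULT * state + INC) mod 2^16. */
static void lcg16_fill(int16_t *x, unsigned nx)
{
    assert(x != 0 || nx == 0);
    for (unsigned i = 0; i < nx; i++) {
        lcg_state = (uint16_t)((uint32_t)LCG_MULT * lcg_state + LCG_INC);
        /* Reinterpreted as a signed 16-bit word; values above 32767 wrap
           to negative on two's-complement targets. */
        x[i] = (int16_t)lcg_state;
    }
}
```

Because the state is a single global, two independent sequences cannot be generated at once with this structure; that matches the documented design in which one global variable is shared between the init and generation routines.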
61. mat number. Thus, in Q.15 format, the binary point is placed immediately to the right of the sign bit. The fractional portion to the right of the sign bit is stored in regular two's-complement format.

Topic:
A.1 Q3.12 Format
A.2 Q.15 Format
A.3 Q.31 Format

A.1 Q3.12 Format

Q3.12 format places the binary point after the fourth binary digit from the left (the sign bit followed by 3 integer bits), and the next 12 bits contain the two's-complement fractional component. The approximate allowable range of numbers in Q3.12 representation is (-8, 8), and the finest fractional resolution is $2^{-12} = 2.441 \times 10^{-4}$.

Table A-1. Q3.12 Bit Fields

A.2 Q.15 Format

Q.15 format places the sign bit at the leftmost binary digit, and the next 15 leftmost bits contain the two's-complement fractional component. The approximate allowable range of numbers in Q.15 representation is (-1, 1), and the finest fractional resolution is $2^{-15} = 3.05 \times 10^{-5}$.

Table A-2. Q.15 Bit Fields

A.3 Q.31 Format

Q.31 format spans two 16-bit memory words. The 16-bit word stored in the lower memory location contains the 16 least significant bits, and the higher memory location contains the most significant 15 bits and the sign bit. The approximate allowable range of numbers in Q.31 representation is (-1, 1), and the finest fractional resolution is $2^{-31} = 4.66 \times 10^{-10}$.
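The ranges and resolutions quoted for Q3.12, Q.15, and Q.31 all follow from one rule: a raw two's-complement word with n fractional bits represents raw / 2^n. A minimal C sketch of that rule is shown below; the helper names `q_to_double` and `q_resolution` are hypothetical and are not part of DSPLIB.

```c
#include <assert.h>
#include <stdint.h>

/* Real value of a raw two's-complement word that has frac_bits
   fractional bits: 12 for Q3.12, 15 for Q.15, 31 for Q.31. */
static double q_to_double(int32_t raw, unsigned frac_bits)
{
    assert(frac_bits <= 31);
    return (double)raw / (double)((uint32_t)1 << frac_bits);
}

/* Finest fractional resolution of the format: 2^-frac_bits. */
static double q_resolution(unsigned frac_bits)
{
    assert(frac_bits <= 31);
    return 1.0 / (double)((uint32_t)1 << frac_bits);
}
```

For example, the 16-bit word 0x4000 reads as 0.5 in Q.15 but as 4.0 in Q3.12, which is why the format of each operand has to be tracked by the programmer rather than by the data type.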
62. mediate stages.

Example: See examples/cfft subdirectory.

Benchmarks: 8 cycles (butterfly core only)

FFT size   Cycles (see note)   Code size (words), text section   Data size (words), sintab section
8          149                 109                               0
16         322                 151                               11
32         733                 199                               34
64         1672                247                               81
128        3795                295                               176
256        8542                343                               367
512        19049               391                               750
1024       42098               439                               1517

Note: Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle table reads and instruction fetches (provided the linker command file reflects those conditions).

cifft32: 32-Bit Inverse Complex FFT

Function: void cifft32 (LDATA x, nx, short scale); defined in ci asm where nx

Arguments:
x[2*nx]: Pointer to input vector containing nx complex elements (2*nx real elements) in bit-reversed order. On output, vector x contains the nx complex elements of IFFT(x). Complex numbers are stored in Re-Im format. x must be aligned at a 2*nx boundary, where nx = IFFT size. The log2(nx) + 1 LSBs of address x must be zero.
nx: Number of complex elements in vector x. nx must be a constant number (not a variable) and can take the following values: nx = 8, 16, 32, 64, 128, 256, 512, 1024
scale: Flag to indicate whether or not scaling should be implemented during computation:
if (scale != 0) scale factor = nx; else scale factor = 1

Description: Computes a 32-bit radix-2 complex
63. mplementation Notes Example Benchmarks Function Arguments 4 10 Linker command file you must allocate data section for polynomial coeffi cients _j atan x with O lt x lt 1 output scaling factor Pl _j Uses a polynomial to compute the arctan x for x lt 1 For x gt 1 you can express the number x as a ratio of 2 fractional numbers and use the atan2_ 16 function See examples atant subdirectory Cycles Core 11 nx Overhead 39 Code size in 16 bit words 32 Arctangent 2 Implementation short oflag atan2_16 DATA q DATA i DATA r ushort nx defined in arct2 asm q nx Pointer to quadrature input vector in Q15 format of size nx i nx Pointer to in phase input vector in Q15 format of size nx r nx Pointer to output data vector in Q15 format number representation of size nx containing In place processing allowed r can be equal to x On output r contains the arctangent of q l 1 Pl bexp nx Number of elements of input and output vectors oflag Overflow flag Lj If oflag 1 a 32 bit overflow has occurred _j If oflag 0 a 32 bit overflow has not occurred Description This function calculates the arc tangent of the ratio q l where 1 lt atan2_16 Q l lt 1 representing an actual range of PI lt atan2_16 Q I lt Pl The result is placed in the resultant vector r Output scale factor correction PI For example if y 0x1999 0x1999 0x0 Oxe667 0x199
64. mples/iircas4 subdirectory.

Benchmarks: Cycles: Core nx 11 4 nbiq, Overhead 40. Code size (in 16-bit words): 50

iircas5: Cascaded IIR Direct Form II (5 Coefs per Biquad)

Function: short oflag = iircas5 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nbiq, ushort nx); defined in iircas5.asm

Arguments:
x[nx]: Pointer to input data vector of size nx
h[5*nbiq]: Pointer to filter coefficient vector with the following format:
  h = a11 a21 b21 b01 b11 ... a1i a2i b2i b0i b1i
  where i is the biquad index (i.e., a21 is the a2 coefficient of biquad 1). Pole (recursive) coefficients = a. Zero (non-recursive) coefficients = b.
r[nx]: Pointer to output data vector of size nx. r can be equal to x.
dbuffer[2*nbiq]: Pointer to address of delay line d. Each biquad has 2 delay line elements separated by nbiq locations, in the following format:
  d1(n-1), d2(n-1), ..., di(n-1), d1(n-2), d2(n-2), ..., di(n-2)
  where i is the biquad index (i.e., d2(n-1) is the (n-1)th delay element for biquad 2).
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nbiq).
nbiq: Number of biquads
nx: Number of elements of input and output vectors
oflag: Overflow fla
65. ned in dlms.asm

Arguments:
x[nx]: Pointer to input vector of size nx
h[nh]: Pointer to filter coefficient vector of size nh
- h is stored in reversed order: h(n-1), ..., h(0), where h(n) is at the lowest memory address.
- Memory alignment: h is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx]: Pointer to output data vector of size nx. r can be equal to x.
dbuffer[nh]: Pointer to location containing the address of the delay buffer. Memory alignment: the delay buffer is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
des[nx]: Pointer to expected output array
step: Scale factor to control learning curve rate = 2*mu
nh: Number of filter coefficients. Filter order = nh - 1. nh >= 3
nx: Length of input and output data vectors
oflag: Overflow flag
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Adaptive delayed LMS (least mean square) FIR filter using coefficients stored in vector h. Coefficients are updated after each sample based on the LMS algorithm, using a constant step = 2*mu. The real data input is stored in vector x. The filter output result is stored in vector r. The LMS algorithm is used, but adaptation uses the previous error and the previous sample (delayed) to tak
66. nts of input and output vectors

Description: This routine returns the fractional and exponential portions of the reciprocal of a Q15 number. Since the reciprocal is always greater than 1, it returns an exponent such that r[i] * rexp[i] = true reciprocal in floating point.

Algorithm: See Appendix, Calculating a reciprocal of a Q15 number.

Overflow Handling Methodology: none

Special Requirements: none

Implementation Notes: none

Example: See examples/recip16 subdirectory.

Benchmarks: Cycles: Core 4 nx 54, Overhead 24. Code size (in 16-bit words): 77 words + 15 words of data space

rfft: Forward Real FFT (in place)

Function: void rfft (DATA x, nx, short scale); defined in rfft asm where nx

Arguments:
x[nx]: Pointer to input vector containing nx real elements in bit-reversed order. On output, vector x contains the first half (nx/2 complex elements) of the FFT output. The real FFT is a symmetric function around the Nyquist point, and for this reason only half of the FFT(x) elements are required. On output, x will contain FFT(x) = y in the following format:
  y(0)Re      y(nx/2)Im   -> DC and Nyquist
  y(1)Re      y(1)Im
  y(2)Re      y(2)Im
  ...
  y(nx/2)Re   y(nx/2)Im
Complex numbers are stored in Re-Im format.
nx: Number of real elements in vector x. nx must be a constant number (not a variable) and can take the following values: nx = 16, 32, 64, 128, 256, 512, 1024
scale: Flag to indicate wh
67. o DSPLIB, let us know. We will review and test your routine and make sure to include it in the next DSPLIB software release. Your contribution will be fully acknowledged and recognized by TI in the DSPLIB Application Report Acknowledgment Section. Use this opportunity to make your name known to your DSP industry peers. Simply email your contribution to dsph@ti.com and we will get in contact with you.

- Improved testing methodology and function characterization. See section 3.6, How DSPLIB is Tested (Allowable Error).
- Increased code portability. DSPLIB looks to enhance code portability across different TMS320-based platforms. It is our goal to provide similar DSP libraries for other TMS320 devices that, working in conjunction with C54x compiler intrinsics, make C development easier for fixed-point devices. However, it is anticipated that a 100% portable library across TMS320 devices may not be possible due to normal device architectural differences. TI will continue monitoring DSP industry standardization activities in terms of DSP function libraries. In the event of the endorsement by the DSP community of a standard DSP library spec, TI will take the necessary steps to evolve DSPLIB into industry compliance.

Chapter 4: Function Descriptions

Topic:
4.1 Arguments and Conventions Used
4.2 DSPLIB Functions

4.1 Arguments and Conventions Used
68. odels supported by TI compilers: 54xdsp.lib for standard short-call mode (16 bit) and 54xdspf.lib for far-call mode (24 bits)
3. One source library to allow function customization by the end user (54xdsp.src)
4. Example programs and linker command files used under the 54x_test subdirectory

2.2 How to Install DSPLIB

Read the README.1ST file for specific details of the release.

First Step: De-archive DSPLIB

DSPLIB is distributed in the form of an executable self-extracting ZIP file (54xdsplib.exe) that will automatically restore the DSPLIB individual components in the same directory you execute the self-extracting file from. Following is an example of how to install DSPLIB. Just type:

  54xdsplib.exe -d

The DSPLIB directory structure and content you will find is as follows:

54xdsplib (dir)
  54xdsp.lib: use for standard short-call mode
  54xdspf.lib: use for far-call mode
  blt54x.bat: regenerates 54xdsp.lib based on 54xdsp.src
  blt54xf.bat: regenerates 54xdspf.lib based on 54xdsp.src
  examples (dir): contains one subdirectory for each routine included in the library, where you can find complete test cases
  include (dir)
    dsplib.h: include file with data types and function prototypes
    tms320.h: include file with type definitions to increase TMS320 portability
  doc (dir)
  code (dir): contains the examples shown in the application report

Second Step: Update Your C_DIR Environment Variable

Append the full p
69. of Texas
70. on retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks This function can be used for both block by block and sample by sample filtering nx 1 i X AKMt K 0 lt j lt nx k 0 where his symmetric for example h h0 h1 h2 h2 h1 hO where nh2 3 Only ho h1 h2 are stored in program memory pointed by the TI_LIB_COEFFS global la bel Overflow Handling Methodology No scaling implemented for overflow prevention Special Requirements Filter coefficients must be provided in program space with a global label called TI_LIB_COEFFS pointing to the start of the coefficient table Implementation Notes Although this routine is faster than the generic symmetric filter routine firs2 Example Benchmarks included in DSPLIB it is restrictive in that the address for the coefficients is hard coded to the global label TI_LIB_COEFFS in program memory This could be a problem in the event you want to use multiple filtering routines with different coefficient values If that is the case use the firs2 routine See examples firs subdirectory Cycles Core nx 16 nh Overhead 35 Code size in 16 bit words 56 CY Symmetric FIR Filter generic Function Arguments 4 34 short oflag int firs2 DATA x DATA h DATA r DATA dbuffer ushort nh2 ushort nx x nx Pointer to real input vector of nx real elements r nx Pointer to real input vector of nx
71. ook like this h 0 876 0 0 324 0 0 002 Implementation Notes You can also use the convolution function for filtering by having an input buffer Example Benchmarks Function Arguments 4 38 x padded with nh 1 zeros at the beginning of the x buffer However having an fir filter implementation that uses a totally independent delay buffer dbuffer gives you more control in the relocation in memory of your data buffers in the case of a dual buffering filtering scheme See examples fir subdirectory Cycles Core nx 4 nh Overhead 53 Code size in 16 bit words 42 Double precision IIR Filter short oflag iir32 DATA x LDATA h DATA r LDATA dbuffer ushort nbiq ushort nx x nx Pointer to input data vector of size nx h 5 nbiq r nx dbuffer 3 nbiq iir32 Pointer to the 32 bit filter coefficient vector with the following format For example for nbiq 2 h is equal to b21 high beginning of biquad 1 b21 low b11 high b11 low b01 high b01 low a21 high a21 low a11 high a11 low b22 high beginning of biquad 2 coefs b22 low b12 high b12 low b02 high b02 low a22 high a22 low a12 high a12 low Pointer to output data vector of size nx r can be equal than x Pointer to address of 32 bit delay line dbuffer Each biquad has 3 consecutive delay line elements For example for nbiq 2 d1 n 2
72. or codes:
1: If any element is too large to represent in Q15 format
2: If any element is too small to represent in Q15 format
3: Both conditions 1 and 2 were encountered

Description: Converts the IEEE floating-point numbers stored in vector x into Q15 numbers stored in vector r. The function returns the error codes if any element x[i] is not representable in Q15 format.

All values that exceed the size limit will be saturated to a Q15 1 or -1, depending on sign (0x7fff if the value is positive, 0x8000 if the value is negative). All values too small to be correctly represented will be truncated to 0.

Algorithm: Not applicable

Overflow Handling Methodology: Saturation implemented for overflow handling.

Special Requirements: none

Implementation Notes: none

Example: See examples/expn subdirectory.

Benchmarks: Cycles: Core 19 40 nx, Overhead 43. Code size (in 16-bit words): 60

hilb16: FIR Hilbert Transformer

Function: short oflag = hilb16 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh, ushort nx)

Arguments:
x[nx]: Pointer to real input vector of nx real elements
h[nh]: Pointer to coefficient vector of size nh in normal order: h = b0 b1 b2 b3 b4 ...
- Every odd-valued filter coefficient has to be 0 (i.e., b1 = b3 = ... = 0 and h = b0 0 b2 0 b4 0 ...).
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx]: Pointer to real input vector of nx
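The saturation and error-code behaviour described for the float-to-Q15 conversion can be sketched as a portable C reference model. This is an assumption-laden illustration, not the DSPLIB routine: `fltoq15_ref` is a hypothetical name, the thresholds chosen for "too large" (outside [-1, 1)) and "too small" (nonzero magnitude below 2^-15) are my reading of the text, and rounding is plain truncation toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Convert nx floats to Q15. Returns 0 on success, or the bitwise OR of
   1 (some value too large, saturated) and 2 (some nonzero value too
   small, flushed to 0), so both conditions together give 3. */
static int fltoq15_ref(const float *x, int16_t *r, unsigned nx)
{
    int err = 0;

    assert((x && r) || nx == 0);
    for (unsigned i = 0; i < nx; i++) {
        float v = x[i];
        float mag = v < 0.0f ? -v : v;

        if (v >= 1.0f) {                 /* above largest Q15 value */
            r[i] = INT16_MAX;            /* 0x7fff */
            err |= 1;
        } else if (v < -1.0f) {          /* below -1.0 */
            r[i] = INT16_MIN;            /* 0x8000 */
            err |= 1;
        } else if (mag != 0.0f && mag < 1.0f / 32768.0f) {
            r[i] = 0;                    /* below one Q15 LSB */
            err |= 2;
        } else {
            r[i] = (int16_t)(v * 32768.0f); /* truncates toward zero */
        }
    }
    return err;
}
```

Using OR-ed bit flags makes code 3 fall out naturally as "both conditions", which is consistent with the three error codes listed above.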
73. ords: 42

firdec: Decimating FIR Filter

Function: short oflag = firdec (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh, ushort nx, ushort D); defined in decimate.asm

Arguments:
x[nx]: Pointer to real input vector of nx real elements
h[nh]: Pointer to coefficient vector of size nh in normal order: h = b0 b1 b2 b3 ...
  Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx/D]: Pointer to real output vector of nx/D real elements. In-place computation (r = x) is allowed.
dbuffer[nh]: Delay buffer
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
nx: Number of real elements in vector x
nh: Number of coefficients
D: Decimation factor. For example, D = 2 means you drop every other sample. Ideally, nx should be a multiple of D. If not, the trailing samples will be lost in the process.
oflag: Overflow error flag
- 1 if a 32-bit data overflow has occurred in an intermediate or final result
- 0 if no 32-bit data overflow has occurred in an intermediate or final result

Description: Computes
74. osted as they become available in the same location where you downloaded this information. Source code for previous releases will be kept public to prevent any customer problem in case we decide to discontinue or change the functionality of one of the DSPLIB functions. Make sure to read the readme.1st file available in the root directory of every release.

6.3 DSPLIB Customer Support

If you have questions or want to report problems or suggestions regarding the C54x DSPLIB, contact Texas Instruments at dsph@ti.com. We encourage the use of the software report form (report.txt) contained in the DSPLIB doc directory to report any problem associated with the C54x DSPLIB.

Appendix A: Overview of Fractional Q Formats

Unless specifically noted, DSPLIB functions use Q15 format or, to be more exact, Q0.15. In a Qm.n format, there are m bits used to represent the two's-complement integer portion of the number and n bits used to represent the two's-complement fractional portion. m + n + 1 bits are needed to store a general Qm.n number. The extra bit is needed to store the sign of the number in the most significant bit position. The representable integer range is specified by $(-2^m, 2^m)$ and the finest fractional resolution is $2^{-n}$.

For example, the most commonly used format is Q.15. Q.15 means that a 16-bit word is used to express a signed number between positive and negative one. The most significant binary digit is interpreted as the sign bit in any Q for
75. plemented as a macro that invokes different autocorrelation routines according to the type selected. As a consequence, the acorr symbol is not defined. Instead, the acorr_raw, acorr_bias, and acorr_unbias symbols are defined.
- Autocorrelation is implemented using time-domain techniques.

Example: See the examples/abias, examples/aubias, and examples/araw subdirectories.

Benchmarks:
Cycles:
Abias: Core: na 1 na 2 nlags 13 26; Overhead: 68
Araw: Core: 19 nr 10 na 2 na 3; Overhead: 61
Aubias: Core: 4 nr 2 37 na 1 na 2; Overhead: 68
Code size (in 16-bit words): Abias: 95 words; Araw: 79 words; Aubias: 94 words

add: Vector Add

Function: short oflag = add (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale) (defined in add.asm)

Arguments:
x[nx]: Pointer to input data vector 1 of size nx. In-place processing allowed (r can be = x = y).
y[nx]: Pointer to input data vector 2 of size nx.
r[nx]: Pointer to output data vector of size nx containing:
- (x + y) if scale = 0
- (x + y)/2 if scale = 1
nx: Number of elements of input and output vectors. nx >= 4
scale: Scale selection.
- Scale = 1: divide the result by 2 to prevent overflow.
- Scale = 0: does not divide by 2.
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: This function adds two vectors, element by element.

Algorithm: for (i = 0; i < nx; i++)
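The effect of the scale argument described above can be seen in a small reference model. This is a sketch in portable C, not the DSPLIB assembly routine: the name add_q15_ref is hypothetical, saturating the unscaled sum is an assumption, and the returned flag only reports 16-bit clipping in this model, whereas the DSPLIB oflag reports 32-bit accumulator overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: r = x + y (scale = 0) or (x + y) / 2 (scale = 1).
 * Returns 1 if any result had to be clipped to the 16-bit range, else 0. */
static int add_q15_ref(const int16_t *x, const int16_t *y, int16_t *r,
                       unsigned nx, unsigned scale)
{
    int clipped = 0;
    unsigned i;
    assert(scale <= 1u);
    for (i = 0; i < nx; i++) {
        /* The 32-bit intermediate cannot overflow for 16-bit inputs. */
        int32_t sum = (int32_t)x[i] + (int32_t)y[i];
        if (scale == 1u) {
            sum /= 2; /* halving guarantees the result fits in 16 bits */
        }
        if (sum > INT16_MAX) { sum = INT16_MAX; clipped = 1; }
        if (sum < INT16_MIN) { sum = INT16_MIN; clipped = 1; }
        r[i] = (int16_t)sum;
    }
    return clipped;
}
```

Adding 0.5 + 0.5 in Q15 (16384 + 16384) clips with scale = 0 but yields exactly 16384 with scale = 1, which is why scaling is offered as the overflow-prevention option.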
76. processor performance for a specific function, you should keep in mind that the generic nature of DSPLIB might add extra cycles not required for customer-specific use.

Topic
5.1 What DSPLIB Benchmarks are Provided
5.2 Performance Considerations

5.1 What DSPLIB Benchmarks are Provided

DSPLIB documentation includes benchmarks for instruction cycles and memory consumption. The following benchmarks are typically included:
- Calling and register initialization overhead.
- Number of cycles in the kernel code: typically provided in the form of an equation that is a function of the data size parameters. We consider the kernel (or core) code to be the instructions contained between the _start and _end labels that you can see in each of the functions.
- Memory consumption: typically, program size in 16-bit words is reported. For functions requiring significant internal data allocation, data memory consumption is also provided. When stack usage for local variables is minimal, that data consumption is not reported.

For functions in which it is difficult to determine the number of cycles in the kernel code as a function of the data size parameters, we have included direct cycle counts for specific data sizes.

5.2 Performance Considerations

Benchmark cycles presented assume best-case conditions, typically assuming 0 w
77. quirements
- Special linker command file sections required: sintab (containing the twiddle table). For the sintab section size, refer to the benchmark information below.
- This function requires the inclusion of two other files during assembly (automatically included):
  - cfft_32.asm (contains all functions used for this code)
  - sintab.q31 (contains the twiddle table section sintab)

Implementation Notes
- This is an FFT optimized for time. Space consumption is high due to the usage of a separate sine table in each stage. This reduces the MIPS count but also increases the twiddle table data space.
- The first 2 FFT stages are implemented as a radix-4. The last stage is also unrolled for optimization. Twiddle factors are built in and provided in the sintab.q31 file that is automatically included during the assembly process.
- Special debugging consideration: this function is implemented as a macro that invokes different FFT routines according to the size. As a consequence, instead of the cfft32 symbol being defined, multiple cfft32_ symbols are defined, one for each FFT complex size nx.
- This routine prevents overflow by scaling by 2 at each intermediate FFT stage.

Example: See the examples/cfft32 subdirectory.

Benchmarks: 37 cycles (butterfly core only)

FFT size   Cycles (see note)   Code size in words (text section)   Data size in words (sintab section)
8          297                 389                                 0
16         796                 407                                 26
32         2097                429                                 74
64         5263                452
78. rameters to be passed on the stack in reverse order, except for the first argument, which is passed in the C54x accumulator A. Refer to the TMS320C54x Optimizing C Compiler User's Guide if a more in-depth explanation is required.

Keep in mind that the TI DSPLIB is not an optimal solution for assembly-only programmers. Even though DSPLIB functions can be invoked from an assembly program, the result might not be optimal due to unnecessary C-calling overhead.

3.5 Where to Find Sample Code

You can find examples of how to use every function in DSPLIB in the examples subdirectory. This subdirectory contains one subdirectory for each function. For example, the examples/araw subdirectory contains the following files:
- araw_t.c: main driver for testing the DSPLIB acorr (raw) function.
- test.h: contains input data (a) and expected output data (yraw) for the acorr (raw) function. This test.h file is generated by using Matlab scripts.
- test.c: contains the function used to compare the output of the araw function with the expected output data.
- abias.cmd: an example of a linker command file you can use for this function (C541 EVM specific).

3.6 How DSPLIB is Tested: Allowable Error

Version 1.0 of DSPLIB is tested against Matlab scripts. Expected data output has been generated from Matlab, which uses double-precision (64-bit) floating-point operations (default precision in Mat
79. rds Data Size (words)

FFT size   Cycles (see note)   text section   sintab section
16         264                 171            11
32         541                 213            34
64         1160                261            81
128        2516                309            176
256        5470                357            367
512        11881               405            750
1024       25716               453            1517

Note: Assumes all data is in on-chip dual-access RAM and that there is no bus conflict due to twiddle table reads and instruction fetches (provided the linker command file reflects that).

sine: Sine

Function: short oflag = sine (DATA *x, DATA *r, ushort nx) (defined in sine.asm)

Arguments:
x[nx]: Pointer to input vector of size nx. x contains the angle in radians between [-pi, pi], normalized between (-1, 1) in Q15 format: x = xrad/pi. For example, 45 degrees = pi/4 is equivalent to x = 1/4 = 0.25 = 0x2000 in Q15 format.
r[nx]: Pointer to output vector containing the sine of vector x in Q15 format.
nx: Number of elements of input and output vectors. nx >= 4
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes the sine of elements of vector x. It uses the following Taylor series to compute the angle x in quadrant 1 (0 to pi/2).

Algorithm: for (i = 0; i < nx; i++) y(i) = sin(x(i)), where x(i) = xrad/pi

Overflow Handling Methodology: Not applicable

Special Requirements: Linker command file: the data section must be allocated.

Implementation Notes: Computes the sine of elements of vector x. It uses the following Taylor series to compute the angle x
80. re of the function operations there is no overflow to worry about.

A couple of additional DSPLIB features relate to overflow and scaling handling:
- DSPLIB reporting of overflow conditions (overflow flag): Due to the sometimes unpredictable overflow risk, most DSPLIB functions have been written to return an overflow flag (oflag) as an indication of a potentially dangerous 32-bit overflow. However, keep in mind that, due to the guard bits, the C54x is capable of dealing with intermediate 32-bit overflows and still producing the correct final result. Therefore, the oflag parameter should be taken as a warning, not as a definitive error.
- Functions for handling of scaling and data block exponent: DSPLIB includes a bexp function that returns the maximum exponent (extra sign bits) of a vector to allow determination of correct input scaling.

As a final note, DSPLIB is also provided in source format to allow customization of DSPLIB functions to your specific system needs.

3.8 Where DSPLIB Goes from Here

We anticipate DSPLIB to improve in future releases in the following areas:
- Increased number of functions: We anticipate that the number of functions in DSPLIB will grow over time. We welcome user-contributed code. If, during the process of developing your application, you develop a DSP routine that seems like a good fit t
81. real elements. In-place computation (r = x) is allowed.
h[nh2]: Pointer to vector containing the first half of the filter coefficients. It assumes that the filter has a symmetric impulse response (filter coefficients). The total number of filter coefficients is 2*nh2. For example, if the filter coefficients are b0 b1 b1 b0, then nh2 = 2 and h = b0 b1.
dbuffer[2*nh2]: Delay buffer of size nh = 2*nh2.
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nh2).
nx: Number of real elements in vector x (input samples).
nh2: Half the number of coefficients of the filter (due to symmetry, there is no need to provide the other half).
oflag: Overflow error flag. 1 if a 32-bit data overflow has occurred in an intermediate or final result; 0 if no 32-bit data overflow has occurred in an intermediate or final result.

Description: Computes a real FIR filter (direct form) using the nh2 coefficients stored in array h (data memory). The filter is assumed to have a symmetric impulse response, so array h stores only the first half of the filter coefficients. The real data input is stored in vector x. The filter output result is stored in v
82. real elements. In-place computation (r = x) is allowed.
dbuffer[nh]: Delay buffer.
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
nx: Number of real elements in vector x (input samples).
nh: Number of coefficients.
oflag: Overflow error flag. 1 if a 32-bit data overflow has occurred in an intermediate or final result; 0 if no 32-bit data overflow has occurred in an intermediate or final result.

Description: Computes a real FIR filter (direct form) using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx = 1).

Algorithm: $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \quad 0 \le j \le nx$

Overflow Handling Methodology: No scaling implemented for overflow prevention.

Special Requirements: Every odd-valued filter coefficient has to be 0. This is a requirement for the Hilbert transformer. For example, a 5-tap filter may l
83. ros, where k = log2(2*nh).
r[2*nx]: Pointer to complex output vector of nx complex elements (Re-Im in consecutive locations). In-place computation (r = x) is allowed.
dbuffer[2*nh]: Delay buffer.
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nh).
nx: Number of complex elements in vector x (input samples).
nh: Number of complex coefficients.
oflag: Overflow error flag. 1 if a 32-bit data overflow has occurred in an intermediate or final result; 0 if no 32-bit data overflow has occurred in an intermediate or final result.

Description: Computes a complex FIR filter (direct form) using the coefficients stored in vector h. The complex data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx = 1).

Algorithm: $r[j] = \sum_{k=0}^{nh-1} h[k]\, x[j-k], \quad 0 \le j \le nx$

Overflow Handling Methodology: No scaling implemented for overflow prevention.

Special Requirements: none

Implementation Notes: none

Example
84. rt power (DATA *x, LDATA *r, ushort nx): Sum of squares of a vector (power)
short rand16 (DATA *x, ushort nx): Random number vector generator
void rand16init (void): Random number generator initialization
void recip16 (DATA *x, DATA *r, DATA *rzexp, ushort nx): Vector reciprocal
short sqrt_16 (DATA *x, DATA *r, short nx): Square root of a vector
short sub (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale): Vector subtraction

Description column entries for the functions preceding power: Index for maximum magnitude in a vector; Maximum magnitude in a vector; Index for minimum magnitude in a vector; Minimum element in a vector; 32-bit vector multiply; 16-bit vector negate; 32-bit vector negate.

(g) Matrix Functions
short mmul (DATA *x1, short row1, short col1, DATA *x2, short row2, short col2, DATA *r): Matrix multiply
short mtrans (DATA *x, DATA *r, ushort nx): Matrix transpose

(h) Miscellaneous Functions
short bexp (DATA *x, ushort nx): Max exponent (extra sign bits) of a vector, to allow determination of correct input scaling
void fltoq15 (float *x, DATA *r, ushort nx): Float to Q15 conversion
void q15tofl (DATA *x, float *r, ushort nx): Q15 to float conversion

acorr: Autocorrelation

Function: short oflag = acorr (DATA *x, DATA *r, ushort nx, ushort nr, type) (defined in araw.asm, abias.asm, aubias.asm)

Arguments:
x[nx]: Pointer to real input vector of nx real elements. nx >= nr
85. s Instruments License Agreement for DSP Code

IF YOU DOWNLOAD OR USE THIS PROGRAM, YOU AGREE TO THESE TERMS.

Texas Instruments Incorporated grants you a license to use the Program only in the country where you acquired it. The Program is copyrighted and licensed (not sold). We do not transfer title to the Program to you. You obtain no rights other than those granted you under this license.

Under this license, you may:
1. Use the Program on one or more machines at a time;
2. Make copies of the Program for use or backup purposes within your enterprise;
3. Modify the Program and merge it into another program; and
4. Make copies of the original file you downloaded and distribute it, provided that you transfer a copy of this license to the other party. The other party agrees to these terms by its first use of the Program.

You must reproduce the copyright notice and any other legend of ownership on each copy or partial copy of the Program.

You may NOT:
1. Sublicense, rent, lease, or assign the Program; and
2. Reverse assemble, reverse compile, or otherwise translate the Program.
3. Use it in non-TI DSPs.

We do not warrant that the Program is free from claims by a third party of copyright, patent, trademark, trade secret, or any other intellectual property infringement.

Under no circumstances are we liable for any of the following:
1. Third-party claims against you
86. s i ri bk i 1 bki 2 u ef xi k

Example: See the examples/ndlms subdirectory.

Benchmarks:
Cycles: Core: 85 bsize nh 18 bsize nb nx; Overhead: 88
Code size (in 16-bit words): 144

ndlms: Normalized Delayed LMS Filter

Function: short oflag = ndlms (DATA *x, DATA *h, DATA *r, DATA **dbuffer, DATA *des, ushort nh, ushort nx, int _tau, int cutoff, int gain, DATA *norm_d) (defined in ndlms.asm)

Arguments:
x[nx]: Input data vector of size nx (reference input).
h[nh]: Pointer to filter coefficient vector of size nh.
- h is stored in reversed order: h(n-1) ... h(0), where h(n) is at the lowest memory address.
- Memory alignment: h is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx]: Pointer to output data vector of size nx. r can be equal to x.
dbuffer[nh]: Pointer to location containing the address of the delay buffer. Memory alignment: the delay buffer is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
des[nx]: Pointer to expected output array.
nh: Number of filter coefficients. Filter order = nh - 1. nh >= 3
nx: Length of input and output data vectors.
_tau: Decay constant for long-term filtering of the power estimate.
cutoff: The lowest allowed value for the power estimate.
gain: Step size constant 2 beta beta1 abs_power 2 gain abs_power
norm_d: Pointer to normaliz
87. size nx. In-place processing allowed. Special cases:
- if x = -1 = -32768 * 2^16, then r = 1 = 32767 * 2^16 with oflag = 1
- if x = 1 = 32767 * 2^16, then r = -1 = -32768 * 2^16 with oflag = 1
nx: Number of elements of input and output vectors. nx >= 4
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.
This should be taken as a warning: overflow in the negation of a Q31 number can happen naturally when negating -1.

Description: This function negates each of the elements of a vector (fractional values).

Algorithm: for (i = 0; i < nx; i++) x(i) = -x(i)

Overflow Handling Methodology: Saturation implemented for overflow handling.

Special Requirements: none

Implementation Notes: none

Example: See the examples/neg32 subdirectory.

Benchmarks:
Cycles: Core: 4 nx 4; Overhead: 18
Code size (in 16-bit words): 19

power: Vector Power

Function: short oflag = power (DATA *x, LDATA *r, ushort nx) (defined in power.asm)

Arguments:
x[nx]: Pointer to input data vector 1 of size nx. In-place processing allowed (r can be = x = y).
r[1]: Pointer to output data vector element in Q31 format. Special cases:
- if x = -1 = -32768 * 2^16, then r = 1 = 32767 * 2^16 with oflag = 1
- if x = 1 = 32767 * 2^16, then r = -1 = -32768 * 2^16 with oflag = 1
nx: Number of elements of input vectors. nx >= 4
oflag: Overflow flag.
- If oflag = 1, a 32 b
88. ssing
- In-place bit reversing (x = r) is much more cycle consuming compared with off-place bit reversing (x <> r). However, this is at the expense of doubling the data memory requirements.

Example: See the examples/cfft and examples/rfft subdirectories.

Benchmarks:
Cycles: Core: 2 3 nx (off-place), 13 nx 26 (in-place); Overhead: 21
Code size (in 16-bit words): 50 (includes support for both in-place and off-place bit reverse)

Note: The C54x is capable of doing an off-place bit reverse in 2*n by using the following code:

    stm   #N, ar0
    stm   #INPUT, ar2       ; source address of data
    rpt   #(N*2-1)          ; looping 2*N times
    mvdk  *ar2+0b, #DATA

The drawback of this implementation is the hard coding of the destination address with the label DATA. The cbrev DSPLIB implementation has chosen a more generic solution at the expense of one extra cycle (3*nx).

cfir: Complex FIR Filter

Function: short oflag = cfir (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh, ushort nx)

Arguments:
x[2*nx]: Pointer to complex input vector of nx complex elements (Re-Im in consecutive locations).
h[2*nh]: Pointer to coefficient vector of size 2*nh (nh complex elements with Re-Im in consecutive locations) in normal order. For example, if nh = 3, h = b0re b0im b1re b1im b2re b2im. Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be ze
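The off-place reordering discussed in the cbrev notes above can be modeled in a few lines of portable C. This is only a sketch of the index mapping, not the DSPLIB routine, which relies on the C54x bit-reversed addressing mode; the names bit_reverse and cbrev_ref are hypothetical, and nx is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the lowest `bits` bits of index i. */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned out = 0;
    while (bits--) {
        out = (out << 1) | (i & 1u);
        i >>= 1;
    }
    return out;
}

/* Hypothetical reference model: off-place bit reversal of nx complex
 * elements stored as Re, Im pairs. x and r must not overlap. */
static void cbrev_ref(const int16_t *x, int16_t *r, unsigned nx)
{
    unsigned bits = 0;
    unsigned i;
    assert(nx != 0 && (nx & (nx - 1)) == 0); /* power of two only */
    while ((1u << bits) < nx) {
        bits++;
    }
    for (i = 0; i < nx; i++) {
        unsigned j = bit_reverse(i, bits);
        r[2 * j]     = x[2 * i];     /* real part      */
        r[2 * j + 1] = x[2 * i + 1]; /* imaginary part */
    }
}
```

For nx = 8, element 1 (binary 001) lands at position 4 (binary 100) and element 3 (011) lands at position 6 (110). An in-place variant would instead swap each pair once, which is where the extra cycles mentioned above come from.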
89. t nx, ushort scale) (defined in sub.asm)

Arguments:
x[nx]: Pointer to input data vector 1 of size nx. In-place processing allowed (r can be = x = y).
y[nx]: Pointer to input data vector 2 of size nx.
r[nx]: Pointer to output data vector of size nx containing:
- (x - y) if scale = 0
- (x - y)/2 if scale = 1
nx: Number of elements of input and output vectors. nx >= 4
scale: Scale selection.
- Scale = 1: divide the result by 2 to prevent overflow.
- Scale = 0: does not divide by 2.
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: This function subtracts two vectors, element by element.

Algorithm: for (i = 0; i < nx; i++) z(i) = x(i) - y(i)

Overflow Handling Methodology: Scaling implemented for overflow prevention (user selectable).

Special Requirements: none

Implementation Notes: none

Example: See the examples/sub subdirectory.

Benchmarks:
Cycles: Core: 12 3 nx 2; Overhead: 30
Code size (in 16-bit words): 39

Chapter 5: DSPLIB Benchmarks and Performance Issues

All functions in the DSPLIB are provided with execution time and code size benchmarks. While developing the included functions, we tried to compromise between speed, code size, and ease of use. However, with few exceptions, the highest priority was given to optimizing for speed and ease of use, and last for code size.

Even though DSPLIB can be used as a first estimation of
90. ter to lattice coefficient vector of size nh in normal order: h = b0 b1 b2 b3. Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx]: Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
d[nh]: Delay buffer.
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: this is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
nx: Number of real elements in vector x (input samples).
nh: Number of coefficients.
oflag: Overflow error flag. 1 if a 32-bit data overflow has occurred in an intermediate or final result; 0 if no 32-bit data overflow has occurred in an intermediate or final result.

Description: Computes a real lattice IIR filter implementation using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory d containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx = 1).

Algorithm: ealn xin eln e n hie
91. the following people: Aaron Aboagye, Jeff Axelrod, Karen Baldwin, Philippe Cavalier, Pascal Dorster, Allison Frantz, Pedro Gelabert, Mike Hannah, Jeff Hayes, Natalie Messine, Jelena Nikolic, Greg Peake, Rosemarie Piedra, Cesar Ramirez, Alex Tessarolo, Carol Chow, Pierre Ponce, Julius Kusuma.

Contents

1 Introduction
  1.1 DSP Routines
  1.2 Features and Benefits
    1.2.1 DSPLIB: Quality Freeware That You Can Build on and Contribute to
2 Installing DSPLIB
  2.1 DSPLIB Content
  2.2 How to Install DSPLIB
    First Step: De-archive DSPLIB
    Second Step: Update Your C_DIR Environment Variable
  2.3 How to Rebuild DSPLIB
    2.3.1 For Full Rebuild of 54xdsp.lib and/or 54xdspf.lib
    2.3.2 For Partial Rebuild of 54xdsp.lib and/or 54xdspf.lib (Modification of a Specific DSPLIB Function, for example fir.asm)
3 Using DSPLIB
  3.1 DSPLIB Data Types
  3.2 DSPLIB Arguments
  3.3 Calling a DSPLIB Function from C
  3.4 Calling a DSPLIB Function from Assembly
  3.5 Where to Find
92. ug out

Note: The examples presented in this application report have been tested using the Texas Instruments C54x EVM containing a C541. Therefore, the linker command file used reflects the memory configuration available on that board. Customization may be required to use it with a different board. No overlay mode is assumed (default after C54x device reset). Refer to the TMS320C54x Optimizing C Compiler User's Guide if a more in-depth explanation is required.

Note: DSPLIB routines modify the 54x FRCT bit. This can cause problems for users of versions of the compiler (cl500) prior to version 3.1 if interrupt service routines (ISRs) are implemented in C. Versions prior to 3.1 do not preserve the FRCT bit on ISR entry; therefore, the FRCT bit may be corrupted and not restored, which will lead to incorrect results. One solution is to implement the ISRs in assembly and preserve the FRCT bit. Users with version 3.1 and above need not worry about this.

3.4 Calling a DSPLIB Function from Assembly

The C54x DSPLIB functions were written to be used from C. Calling the functions from assembly language source code is possible as long as the calling function conforms with the Texas Instruments C54x C compiler calling conventions. This means that the DSPLIB functions expect pa
93. uirements: none

Implementation Notes: none

Example: See the examples/mtrans subdirectory.

Benchmarks:
Cycles: Core: 5 col 6; Overhead: 44
Code size (in 16-bit words): 34

mul32: 32-bit Vector Multiply

Function: short oflag = mul32 (LDATA *x, LDATA *y, LDATA *r, ushort nx) (defined in mul32.asm)

Arguments:
x[nx]: Pointer to input data vector 1 of size nx. In-place processing allowed (r can be = x = y).
y[nx]: Pointer to input data vector 2 of size nx.
r[nx]: Pointer to output data vector of size nx.
nx: Number of elements of input and output vectors. nx >= 4
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: This function multiplies two 32-bit Q31 vectors, element by element, and produces a 32-bit Q31 vector.

Algorithm: for (i = 0; i < nx; i++) z(i) = x(i) * y(i)

Overflow Handling Methodology: Scaling implemented for overflow prevention (user selectable).

Special Requirements: none

Implementation Notes: none

Example: See the examples/add subdirectory.

Benchmarks:
Cycles: Core: 7 nx 4; Overhead: 29
Code size (in 16-bit words): 35

nblms: Normalized Block LMS Block Filter

Function: short oflag = nblms (DATA *x, DATA *h, DATA *r, DATA **dbuffer, DATA *des, ushort nh, ushort nx, ushort nb, DATA *norm_e, int _tau, int cutoff, int gain) (defined in nblms.asm)

Arg
94. uments

x[nx]: Input data vector of size nx (reference input).
h[nh]: Pointer to filter coefficient vector of size nh.
- h is stored in reversed order: h(n-1) ... h(0), where h(n) is at the lowest memory address.
- Memory alignment: h is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
r[nx]: Pointer to output data vector of size nx. r can be equal to x.
dbuffer[nh]: Pointer to location containing the address of the delay buffer. Memory alignment: the delay buffer is a circular buffer and must start on a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
des[nx]: Pointer to expected output array.
nh: Number of filter coefficients. Filter order = nh - 1. nh >= 3
nx: Length of input and output data vectors.
nb: Number of blocks.
bsize: Block size: number of coefficients to be updated for each input sample. Note: nh (number of coefficients) = nb * bsize.
norm_e: Pointer to normalized error buffer.
tau: Decay constant for long-term filtering of the power estimate.
cutoff: The lowest allowed value for the power estimate.
gain: Step size constant 2 beta beta1 abs_power 2 gain abs_power
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Normalized Delayed LMS (NDLMS) Block FIR implementation using coe
95. ut and output data vectors.
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes the log base 10 of elements of vector x using Taylor series.

Algorithm: for (i = 0; i < nx; i++) y(i) = log10(x(i)), where -1 <= x(i) <= 1

Overflow Handling Methodology: No scaling implemented for overflow prevention.

Special Requirements: none

Implementation Notes:

$y = 0.4343 \ln(x)$ with $x = M(x) \cdot 2^{P(x)} = M \cdot 2^{P}$
$y = 0.4343\,(\ln(M) + \ln(2) \cdot P)$
$y = 0.4343\,(\ln(2M) + (P-1)\ln(2))$
$y = 0.4343\,(\ln((2M-1)+1) + (P-1)\ln(2))$
$y = 0.4343\,(f(2M-1) + (P-1)\ln(2))$ with $f(u) = \ln(1+u)$

We use a polynomial approximation for f(u):

$f(u) = (((((C_6 u + C_5)u + C_4)u + C_3)u + C_2)u + C_1)u + C_0$ for $0 \le u \le 1$

The polynomial coefficients Ci are as follows:
C0 = 0.000 001 472
C1 = 0.999 847 766
C2 = -0.497 373 368
C3 = 0.315 747 760
C4 = -0.190 354 944
C5 = 0.082 691 584
C6 = -0.017 414 144

The coefficients Bi used in the calculation are derived from the Ci as follows:
B0 (Q30) = 1581d = 0062Dh
B1 (Q14) = 16381d = 03FFDh
B2 (Q15) = -16298d = 0C056h
B3 (Q16) = 20693d = 050D5h
B4 (Q17) = -24950d = 09E8Ah
B5 (Q18) = 21677d = 054ADh
B6 (Q19) = -9130d = 0DC56h

Example: See the examples/log_10 subdirectory.

Benchmarks:
Cycles: Core: 55 nx; Overhead: 56
Code size (in 16-bit words): 82

logn: Base e Logarithm (natural logarithm)

Function: short oflag = logn (D
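The implementation notes above can be checked numerically with a short floating-point model. This is a sketch only, not the fixed-point DSPLIB routine: the names ln1p_poly and log10_ref are hypothetical, the mantissa and exponent are extracted with a simple normalization loop, and the signs of C2, C4, and C6 are taken as negative, which is what the two's complement hex values of B2, B4, and B6 imply.

```c
#include <assert.h>

/* Coefficients C0..C6 of the polynomial approximation of f(u) = ln(1 + u). */
static const double C[7] = {
    0.000001472, 0.999847766, -0.497373368, 0.315747760,
    -0.190354944, 0.082691584, -0.017414144
};

/* Hypothetical helper: Horner evaluation of f(u), valid for 0 <= u <= 1. */
static double ln1p_poly(double u)
{
    double f = C[6];
    int i;
    for (i = 5; i >= 0; i--) {
        f = f * u + C[i];
    }
    return f;
}

/* Hypothetical reference model: log10(x) for x > 0,
 * using x = M * 2^P with 0.5 <= M < 1. */
static double log10_ref(double x)
{
    double m = x;
    int p = 0;
    assert(x > 0.0);
    while (m >= 1.0) { m /= 2.0; p++; } /* scale down into [0.5, 1) */
    while (m < 0.5)  { m *= 2.0; p--; } /* scale up into [0.5, 1)   */
    /* y = 0.4343 * (f(2M - 1) + (P - 1) * ln(2)) */
    return 0.4343 * (ln1p_poly(2.0 * m - 1.0) + (p - 1) * 0.693147181);
}
```

Because 0.5 <= M < 1, the polynomial argument u = 2M - 1 always falls in the interval where the approximation is specified. The constant 0.4343 is the rounded value of 1/ln(10) used in the notes, so results agree with the true logarithm only to roughly four to five decimal places.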
96. x) (defined in iircas51.asm)

Arguments:
x[nx]: Pointer to input data vector of size nx.
h[5*nbiq]: Pointer to filter coefficient vector with the following format: h = b01 b11 b21 a11 a21 ... b0i b1i b2i a1i a2i, where i is the biquad index (i.e., a21 is the a2 coefficient of biquad 1). Pole (recursive) coefficients = a. Zero (non-recursive) coefficients = b.
r[nx]: Pointer to output data vector of size nx. r can be equal to x.
dbuffer[4*nbiq]: Pointer to address of delay line dbuffer. Each biquad has 4 delay line elements stored consecutively in memory in the following format: x1(n-1), x1(n-2), y1(n-1), y1(n-2), ... xi(n-1), xi(n-2), yi(n-1), yi(n-2), where i is the biquad index (i.e., x1(n-1) is the (n-1)th delay element for biquad 1).
- In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous r output elements needed.
- Memory alignment: no need for memory alignment.
nbiq: Number of biquads.
nx: Number of elements of input and output vectors.
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is implemented using Direct Form I. All biquad coe
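The coefficient and delay-line layouts listed above can be illustrated with a floating-point model of a Direct Form I biquad cascade. This is a sketch, not the DSPLIB fixed-point routine: the name iircas51_ref is hypothetical, and the sign convention for the recursive coefficients (a1 and a2 are subtracted) is an assumption that the excerpt does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model of a Direct Form I biquad cascade.
 * h: 5 coefficients per biquad, ordered b0 b1 b2 a1 a2.
 * d: 4 delay elements per biquad, ordered x(n-1) x(n-2) y(n-1) y(n-2);
 *    zero it before the first block and leave it untouched between blocks.
 * Assumed convention: y = b0*x + b1*x(n-1) + b2*x(n-2) - a1*y(n-1) - a2*y(n-2). */
static void iircas51_ref(const double *x, const double *h, double *r,
                         double *d, size_t nbiq, size_t nx)
{
    size_t n, i;
    assert(x && h && r && d);
    for (n = 0; n < nx; n++) {
        double in = x[n];
        for (i = 0; i < nbiq; i++) {
            const double *c = h + 5 * i; /* coefficients of biquad i */
            double *s = d + 4 * i;       /* delay line of biquad i   */
            double out = c[0] * in + c[1] * s[0] + c[2] * s[1]
                       - c[3] * s[2] - c[4] * s[3];
            s[1] = s[0]; s[0] = in;      /* shift input history  */
            s[3] = s[2]; s[2] = out;     /* shift output history */
            in = out;                    /* feeds the next biquad */
        }
        r[n] = in;
    }
}
```

Because the delay line is kept by the caller, filtering one block of four samples gives the same output as filtering two consecutive blocks of two samples, which is the multiple-buffering behavior described in the dbuffer notes.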
97. xpress or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, license, warranty, or endorsement thereof.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Representation or reproduction of this information with alteration voids all warranties provided for an associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible nor liable for any such use.

Resale of TI's products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible nor liable for any such use.

Also see: Standard Terms and Conditions of Sale for Semiconductor Products, www.ti.com/sc/docs/stdterms.htm

Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265

Copyright 2001, Texas Instruments Incorporated

About This Manual

Preface: Read This First

Th
98. z(i) = x(i) + y(i)

Overflow Handling Methodology: Scaling implemented for overflow prevention (user selectable).

Special Requirements: none

Implementation Notes: none

Example: See the examples/add subdirectory.

Benchmarks:
Cycles: Core: 12 3 nx 2; Overhead: 30
Code size (in 16-bit words): 39

atan16: Arctangent Implementation

Function: short oflag = atan16 (DATA *x, DATA *r, ushort nx) (defined in atant.asm)

Arguments:
x[nx]: Pointer to input data vector of size nx. x contains the tangent of r, where |x| < 1.
r[nx]: Pointer to output data vector of size nx containing the arctangent of x in the range [-pi/4, pi/4] radians. In-place processing allowed (r can be equal to x). For example, atan(1.0) = 0.7854 or 6478h.
nx: Number of elements of input and output vectors.
oflag: Overflow flag.
- If oflag = 1, a 32-bit overflow has occurred.
- If oflag = 0, a 32-bit overflow has not occurred.

Description: This function calculates the arctangent of each of the elements of vector x. The result is placed in the resultant vector r and is in the range -pi/2 to pi/2 in radians. For example, if x = [0x7fff, 0x3505, 0x1976, 0x0] (equivalent to tan(pi/4), tan(pi/8), tan(pi/16), 0 in float), atan16(x, r, 4) should give r = [0x6478, 0x3243, 0x1921, 0x0] (equivalent to pi/4, pi/8, pi/16, 0).

Algorithm: for (i = 0; i < nx; i++) r(i) = atan(x(i))

Overflow Handling Methodology: Not applicable

Special Requirements: I