
"TMS320C4x General-Purpose Applications

1. TITLE: DMA AUTOINITIALIZATION WITH COMMUNICATION PORT ICRDY
THIS EXAMPLE SETS UP DMA CHANNEL 0 TO WAIT FOR COMMUNICATION PORT TO INPUT THE INITIALIZATION VALUE. THE DMA AUTOINITIALIZATION AND TRANSFER ARE BOTH DRIVEN BY ICRDY 0 FLAG. AFTER DMA AUTOINIT IS COMPLETED, THE DMA CHANNEL STARTS TRANSFERRING DATA FROM COMM PORT INPUT REGISTER TO INTERNAL RAM WITH ICRDY 0 READ SYNCHRONIZATION.
THE VALUES IN COMM PORT 0 INPUT FIFO SHOULD BE:
SEQUENCE | VALUE
1 | 00C40047H (STOP AFTER TRANSFER COMPLETED) OR 00C4054BH (REPEAT AFTER TRANSFER COMPLETED)
2 | 00100041H
3 | 0H
4 | 20H
5 | 002FF800H
6 | 1H
7 | 00100041H
          .data
DMA0      .word 001000A0H     ; DMA channel 0 map address
DMA_INIT  .word 0004054BH     ; DMA initialization control word
LINK      .word 00100041H     ; Comm port input register address
DMA_START .word 00C4054BH     ; DMA start control word
          .text
START     LDP  DMA0           ; Load data page pointer
          LDA  @DMA0,AR0      ; Point to DMA channel 0 registers
          LDI  @DMA_INIT,R0   ; Initialize DMA control register
          STI  R0,*AR0
          LDI  @LINK,R0       ; Initialize DMA link pointer
          STI  R0,*+AR0(6)
          LDI  @DMA_START,R0  ; Start DMA channel 0 transfer
          STI  R0,*AR0
          LDI  01H,DIE        ; Enab
2. TITLE: TABLE WITH TWIDDLE FACTORS FOR A 64-POINT FFT
FILE TO BE LINKED WITH THE SOURCE CODE FOR A 64-POINT, RADIX-2 DIF COMPLEX FFT OR A RADIX-4 DIF COMPLEX FFT.
SINE TABLE LENGTH = 5*FFTSIZE/4
         .globl SINE
         .sect  "sintab"
_SINE    .float 0.000000
         .float 0.098017
         .float 0.195090
         .float 0.290285
         .float 0.382683
         .float 0.471397
         .float 0.555570
         .float 0.634393
         .float 0.707107
         .float 0.773010
         .float 0.831470
         .float 0.881921
         .float 0.923880
         .float 0.956940
         .float 0.980785
         .float 0.995185
_COSINE  .float 1.000000
         .float 0.995185
         .float 0.980785
         .float 0.956940
         .float 0.923880
         .float 0.881921
         .float 0.831470
         .float 0.773010
         .float 0.707107
         .float 0.634393
         .float 0.555570
         .float 0.471397
         .float 0.382683
         .float 0.290285
         .float 0.195090
         .float 0.098017
         .float 0.000000
         .float -0.098017
         .float -0.195090
         .float -0.290285
         .float -0.382683
         .float -0.471397
         .float -0.555570
         .float -0.634393
         .float -0.707107
         .float -0.773010
         .float -0.831470
         .float -0.881921
         .float -0.923880
3. MPYF3 ARO IRO AR4 RA 7R4 X I4 SIN MPYF3 ARO AR3 R1 RL X 13 SIN MPYF3 ARO IR1 AR4 RO RO X I4 COS MPYF3 ARO AR3 RO RO X I3 COS HI SUBF3 R1 RO R3 R3 X I3 SIN X 14 C08 MPYF3 ARO IRO ARA RO E ADDF3 RO R4 R2 R2 X I3 COS X I4 SIN SUBF3 AR2 R3 R4 R4 R3 X I2 RPTBD IN BLK ADDF3 AR2 R3 R4 R4 R3 X I2 STF R4 AR3 SX I3 SUBF3 R2 AR1 R4 R4 X I1 R2 STF R4 AR4 X I4 lt ADDF3 AR1 R2 R4 R4 X I1 R2 STF R4 AR2 PX I2 LDF ARO IR1 R3 MPYF3 AR4 R3 R4 STF RA AR1 X I1 lt MPYF3 AR3 R3 R1 MPYF3 ARO AR3 RO SUBF3 R1 RO R3 MPYF3 ARO IRO ARA RO ADDF3 RO R4 R2 SUBF3 AR2 R3 R4 ADDF3 AR2 R3 R4 STF R4 AR3 SUBF3 R2 AR1 R4 STF R4 AR4 IN BLK ADDF3 AR1 R2 R4 STF R4 AR2 LDF ARO IR1 R3 MPYF3 AR4 R3 R4 STF R4 AR1 MPYF3 AR3 R3 R1 MPYF3 ARO AR3 RO SUBF3 R1 RO R3 LDA R6 IR1 ADDF3 RO R4 R2 SUBF3 AR2 R3 R4 ADDF3 AR2 R3 R4 HI STF R4 AR3 IR1 SUBF3 R2 AR1 R4 E STF R4 AR4 IR1 ADDF3 AR1 R2 R4 HI STF R4 AR2 IR1 STF R4 AR1 IR1 SUBI3 AR5 AR1 RO CMPI FFT_SIZE RO BLTD INLOP LOOP BACK TO THE INNER LOOP LDA SINE_TABLE ARO ARO POINTS TO SIN COS TABLE Applications Oriented Operations 6 71 Fast Fourier Transfor
4. Perform Bit Reversing On Odd Locations 1st Half Only LDI FFT_SIZE RC LSH L RO LDA RC IRO LDA DEST_ADDR ARO LDA ARO AR1 ADDI 1 ARO ADDI TRO AR1 LSH 1 RC LDA RC IRO SUBI 2 RC RPTBD BITRV3 STE R1 AR2 LDF ARO RO LDF AR1 R1 LDF ARO IR1 RO STF RO AR1 IRO B BITRV3 LDF AR1 R1 STF R1 ARO IR1 STF RO AR1 STF R1 ARO BR DIVISION r Check Data Source Locations F If SourceAddr DestAddr Then do nothing If SourceAddr lt gt DestAddr Then move data F MOVE_DATA LDI SOURCE_ADDR RO CMPI DEST_ADDR RO BEQ DIVISION LDI QFFT SIZE RO SUBI 2 R0 LDA SOURCE_ADDR ARO LDA DEST_ADDR AR1 LDF ARO R1 RPTS RO LDF ARO R1 STF RI AR1 STF R1 AR1 DIVISION LDA 2 IRO LDI GFFT SIZE RO FLOAT RO exp LOG SIZE PUSHF RO 32 MSB S saved POP RO NEGI RO Neg exponent PUSH RO POPF RO RO 1 FFT SIZE LDA DEST_ADDR AR1 LDI FFT_SIZE RC LSH 1 RE SUBI 2 RC RPTBD LAST_LOOP LDA DEST_ADDR AR2 NOP AR2 6 84 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued MPYF3 RO AR1 R1 lst location MPYF3 RO AR2 R2 2nd 4th 6th location M STF R1 AR1 IRO LAST LOOP MPYF3 RO AR1 R1 Srd 5th 7th location M STF R2 AR2 IRO MPYF3 RO AR2 R2 last location BI STF R1 AR1 STF R2 AR2 Return to C environment POP DP Restore C enviro
5. TITLE: INVERSE OF A FLOATING-POINT NUMBER WITH 32-BIT MANTISSA ACCURACY
SUBROUTINE INVF
THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE COMPUTATION IS COMPLETED, 1/v IS STORED IN R1.
TYPICAL CALLING SEQUENCE:
        LAJU  INVF
        LDF   v,R0
        NOP            ; <- can be other non-pipeline-break
        NOP            ;    instructions
ARGUMENT ASSIGNMENTS:
ARGUMENT | FUNCTION
R0 | v = NUMBER TO FIND THE RECIPROCAL OF (UPON THE CALL)
R1 | 1/v (UPON THE RETURN)
REGISTER USED AS INPUT: R0
REGISTERS MODIFIED: R1, R2
REGISTER CONTAINING RESULT: R1
REGISTER FOR SUBROUTINE CALL: R11
CYCLES: 7 (not including subroutine overhead)
WORDS: 8 (not including subroutine overhead)
        .global INVF
INVF    RCPF   R0,R1       ; Get x[0], the estimate of 1/v (R0 = v)
        MPYF3  R1,R0,R2
        SUBRF  2.0,R2
        MPYF   R2,R1       ; End of first iteration (16 bits accuracy)
        BUD    R11         ; Delayed return to caller
        MPYF3  R1,R0,R2
        SUBRF  2.0,R2
        MPYF   R2,R1       ; End of second iteration (32 bits accuracy)
                           ; R1 = 1/v, return to caller
        .end
3.6 Calculating a Square Root
In many applications, normalization of data values is necessary. Often, the normalizing factor is the square root of another quantity. For example, given a vector, the unit vector in the same direction as the original vector can be
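The INVF listing in excerpt 5 refines a reciprocal estimate with two Newton-Raphson steps of the form x = x * (2 - v * x); each MPYF3/SUBRF/MPYF group computes one such step. The following is a minimal C sketch of that iteration. The helper `rcp_seed` is a hypothetical stand-in for the RCPF instruction, and its linear seed is an assumption of this sketch that is cruder than the hardware estimate, so two steps here reach a relative error on the order of 1e-5 rather than the 32-bit mantissa accuracy the listing states.

```c
/* Hypothetical stand-in for RCPF: a crude estimate of 1/v.
   Assumes v is finite and in the normal float range; returns 0 for
   v == 0, which is an arbitrary choice made only for this sketch. */
static float rcp_seed(float v)
{
    float sign = (v < 0.0f) ? -1.0f : 1.0f;
    float m = sign * v;              /* |v| */
    float scale = 1.0f;

    if (m == 0.0f)
        return 0.0f;

    /* Bring m into [0.5, 1) while tracking the power of two applied. */
    while (m >= 1.0f) { m *= 0.5f; scale *= 0.5f; }
    while (m < 0.5f)  { m *= 2.0f; scale *= 2.0f; }

    /* Linear approximation of 1/m on [0.5, 1): relative error <= 1/17. */
    return sign * scale * (48.0f / 17.0f - (32.0f / 17.0f) * m);
}

/* Two Newton-Raphson refinements, mirroring the two iterations in INVF.
   The relative error is squared by each step. */
float invf(float v)
{
    float x = rcp_seed(v);           /* x[0]: estimate of 1/v */
    x = x * (2.0f - v * x);          /* end of first iteration */
    x = x * (2.0f - v * x);          /* end of second iteration */
    return x;
}
```

Because each step squares the relative error, the quality of the seed decides how many steps are needed; the listing's "16 bits" then "32 bits" progression reflects a better starting estimate than the one assumed here.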
6. 10.2.1 Tool-Activated ZIF PGA Socket (TAZ)
10.2.2 Handle-Activated ZIF PGA Socket (HAZ)
10.3 Part Order Information
10.3.1 Nomenclature
10.3.2 Device and Development Support Tools
11 XDS510 Emulator Design Considerations
Describes the JTAG emulator cable. Tells you how to construct a 14-pin connector on your target system and how to connect the target system to the emulator.
11.1 Designing Your Target System's Emulator Connector (14-Pin Header)
11.2 Bus Protocol
11.3 IEEE 1149.1 Standard
11.4 JTAG Emulator Cable Pod Logic
11.5 JTAG Emulator Cable Pod Signal Timing
11.6 Emulation Timing Calculations
11.7 Connections Between the Emulator and the Target System
11.7.1 Buffering Signals
11.7.2 Using a Target System Clock
11.7.3 Configuring Multiple Processors
11.8 Mechanical Dimensions for the 14-Pin Emulator Connector
7. TITLE: C4x TO IEEE CONVERSION (WITHIN BLOCK MEMORY TRANSFER)
PROGRAM ASSUMES THAT OUTPUT FIFO OF COMMUNICATION PORT 0 IS EMPTY. EIGHT DATA WORDS ARE TRANSFERRED FROM INTERNAL RAM BLOCK 0 TO COMMUNICATION PORT 0, AND THE DATA FORMAT IS CONVERTED FROM C4x FLOATING-POINT FORMAT TO IEEE FORMAT.
        LDI    @CP0_OUT,AR0     ; Load comm port0 output FIFO address
        LDI    @RAM0,AR1        ; Load internal RAM block 0 address
        TOIEEE *AR1++(1),R0     ; Convert first data
        RPTS   6
        TOIEEE *AR1++(1),R0     ; Convert next data
||      STF    R0,*AR0          ; Store previous data
        STF    R0,*AR0          ; Store last data
Chapter 4 Memory Interfacing
The C4x's advanced interface design can be used to implement a wide variety of system configurations. Its two external buses and DMA capability provide a flexible parallel 32-bit interface to byte- or word-wide devices. This chapter describes how to use the C4x's memory interfaces to connect to various external devices. Specific discussions include implementation of a parallel interface to devices with and without wait states and implementing system control functions.
4.1 System Configuration
4.2 External Interfacing
4.3 Global and Local Bus Interfaces
4.4 Zero Wait-State Interfacing to RAMs
4.5 Wait States and Ready Generation
4.6 Parallel Processing Through S
8. I 2 SOURCE_AD for the twiddle factors is The sine cosine tabl the following format SINE_TABLE 0 gt SINE TABLE FFT SIZE 2 1 gt NOTE The table is th passing are supported 3 The SOURCE A are zero th service routines Sections needed in linker command file DDR must be aligned such that the first LOG DR FFT SIZE sin 0 2 pi FFT SIZI sin 1 2 pi FFT SIZI 1 The output data array will contain only real values Bit reversal optionally implemented at the end of the function gt I 1 xpected to be s E EE is upplied in sin FF _SIZI E 2 2 2 pi FE sin FF _SIZI _SIZI EE E 2 1 2 pi FF first half period of a sine wave E 1 Calling C program can be compiled using either large or Both calling conventions methods stack or register for I fttxt EEE Eftdat fft is is not checked by the program 1 R2 R3 R4 ARO AR1 AR2 AR3 AA FF F FF F FF HF FF A HF FF FF HF FF ACA A HF FF FF FF F FF A AA FF F FF FF X FH REGISTERS USED RO R IRO RC R DP MEMORY REQUIREMENTS IR1 S RE Program 322 Data 7 R5 R6 R7 AR4 AR5 AR6 AR7 Words approximately Words _SIZI ES small model parameter code data SIZE bits DP initialized only once in the program Be wary with interrupt Ens
9. Examples
64-Bit Addition
64-Bit Subtraction
32-Bit by 32-Bit Multiplication
IEEE to C4x Conversion Within Block Memory Transfer
C4x to IEEE Conversion Within Block Memory Transfer
PLD Equations for Ready Generation
Exchanging Objects in Memory
Optimizing Loop
Allocating Large Array Objects
μ-Law Compression
μ-Law Expansion
A-Law Compression
A-Law Expansion
FIR Filter
IIR Filter (One Biquad)
IIR Filter (N > 1 Biquads)
Adaptive FIR Filter (LMS Algorithm)
Inverse Lattice Filter
Lattice Filter
Matrix Times a Vector Multiplication
Complex Radix-2 DIF FFT
10. 6.5 Fast Fourier Transforms (FFTs)
6.5.1 Complex Radix-2 DIF FFT
6.5.2 Complex Radix-4 DIF FFT
6.5.3 Faster Complex Radix-2 DIT FFT
6.5.4 Real Radix-2 FFT
6.6 C4x Benchmarks
Programming the DMA Coprocessor
Provides examples for programming the TMS320C4x's on-chip peripherals.
7.1 Hints for DMA Programming
7.2 When a DMA Channel Finishes a Transfer
7.3 DMA Assembly Programming Examples
7.4 DMA C-Programming Examples
Using the Communication Ports
Describes how to interface with the TMS320C4x communication ports.
8.1 Communication Ports
8.2 Signal Considerations
8.3 Interfacing With a Non-C4x Device
8.4 Terminating Unused Communication Ports
8.5 Design Tips
8.6 Commport to Host Interface
8.6.1 Simplified Hardware Interface for C40 (PG ≥ 3.3) or C44 devices
8.6.2 Improved Drive and Sense Amplifiers
8.6.3 How the Circuit Works
8.6.4 The Interface Software
11. int a[100000]; /* BAD */
a[i] = 10;
Good Method:
int *a = (int *) malloc(100000); /* GOOD */
a[i] = 10;
5.2 Hints for Optimizing Assembly Language Code
Each program has particular requirements. Not all possible optimizations make sense in every case. The suggestions presented in this section can be used as a checklist of available software tools.
- Use delayed branches. Delayed branches execute in a single cycle; regular branches execute in four. The three instructions that follow the delayed branch are executed whether the branch is taken or not. If fewer than three instructions are used, use the delayed branch and append NOPs. Machine cycles (time) are still being saved.
- Use delayed subroutine call and return. Regular subroutine CALL and RETS execute in four cycles. You can implement a delayed subroutine call by using link-and-jump (LAJ) and delayed-branch-with-R11 register mode (BUD R11) instructions. Both LAJ and BUD instructions execute in a single cycle. Guidelines for using the LAJ instruction are the same as for delayed branches.
- Use the repeat single/block construct. This method produces loops with no overhead. Nesting such constructs will not normally increase efficiency, so try to use the feature on the most often performed loop. The RPTBD is a single-cycle instruction, and the RPTS and RPTB are four-cycle instructions. RPTBD and delayed branches are used
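Excerpt 11 contrasts a fixed-size 100000-element array with one obtained from malloc. The following is a minimal C sketch of the heap-based form. `make_array` is a hypothetical helper introduced only for illustration; the `sizeof` scaling and the NULL check are additions for hosts where an int is not a single addressable unit, and are not something the excerpt shows.

```c
#include <stdlib.h>

/* Allocate n ints on the heap and set every element to fill.
   Returns NULL if the allocation fails; the caller frees the result. */
int *make_array(size_t n, int fill)
{
    int *a = malloc(n * sizeof *a);
    size_t i;

    if (a == NULL)
        return NULL;
    for (i = 0; i < n; ++i)
        a[i] = fill;
    return a;
}
```

A caller would write `int *a = make_array(100000, 10);`, check the result against NULL, and release it with `free(a)` when done.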
12. A 5-ns 16L8 PLD decodes lines A15-A13. These lines, along with STRB0, select each of the four pages in this circuit. With the PAGESIZE field of STRB0 of the global memory interface control register set to 0Ch, the pages are selected on even 8K-word boundaries, starting at location zero in external memory space. This circuit cannot be implemented without page switching because the data outputs' turn-on and turn-off delays cause bus conflicts, and full-speed accesses do not allow enough time for chip-select decoding for the four pages. Here, the propagation delay of the 16L8 is involved only during page switches, where there is sufficient time between cycles to allow new chip selects to be decoded. The timing of this circuit for read operations with page switching is shown in Figure 4-9. When a page switch occurs, the page address on address lines A30-A13 is updated during the extra H1 cycle while STRB0 is high. Then, after chip-select decodes have stabilized and the previously selected page has disabled its outputs, STRB goes low for the next read cycle. Further accesses occur at full speed with the normal bus timings as long as another page switch is not necessary. Write cycles do not require page switching because of the inherent address setup provided in their timings. This timing is summarized in Table 4-2.
Figure 4-9. Timing for Read Operations Using
13. 6 82 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued BITRV1 Perform Bi BITRV2 LSH LDA LDI LDA LDA LDA OP OP DF DF SUBI z E QUE MPI E E DF TF DF TE PI E ME OE TF TF un at 3 J ct DI un T DA DDI DDI DA DA un m UBI Oc tg U i EOuEpuorPEPPEEEOQDQO mPPEZzaZzupmbEbpPEEE DFGT DFGT DFGT DFGT DFGT DFGT PTBD 2 IRO 2 IR1 QFFT 2 RC 3 RC DEST_ADDR ARO ARO AR1 ARO AR2 AR1 IRO B AR2 IRO B ARO IR1 RO AR1 R1 BITRV1 AR1 ARO RO R1 AR1 IR0 B R1 ARO IR1 RO RO ARO AR1 R1 R1 AR2 IR0 B AR1 ARO RO R1 AR1 IRO B RO RO ARO R1 AR2 SIZE RC Reversing Odd Locations QFFT SIZE RC 1 RC DEST_ADDR ARO RC ARO 1 ARO ARO AR1 ARO AR2 1 RC 3 R AR1 IRO B AR2 IRO B ARO IR1 RO AR1 R1 BITRV2 AR1 ARO RO R1 AR1 IRO B R1 4 AR0 IR1 RO RO ARO AR1 R1 R1 AR2 IR0 B AR1 ARO RO R1 AR1 IRO B RO RO ARO IRO Quarter FFT size Xchange Locations only if ARO lt xAR1 2nd Half Only Xchange Locations only if ARO lt xAR1 z SIF R1 AR2 later Applications Oriented Operations 6 83 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued
14. Because the communications configuration is fixed, no token transfer is needed; this allows the CREQ and CACK pins of all processors to be individually pulled up to 5 volts through 22-kΩ resistors. In all cases, each CSTRB should be individually buffered to ensure that line reflections do not corrupt each received CSTRB signal. The data pins (CD7-0) of intercommunicating C4x devices can be tied together. In general, for fewer than three receivers and distances shorter than six inches, data skew relative to CSTRB is not a problem and data buffering is not needed. However, if more than three receivers must be driven by a single transmitter, or the distance is more than six inches, both the CSTRB and CD7-0 lines must be buffered. The CRDY signal input is generated by ORing the RDY outputs of all of the receiver communication ports. The transmitter should not receive a RDY signal until the receiver has received all data. In addition, to ensure that the dedicated receiver C4x devices do not try to arbitrate for the communication port bus, you should halt the output ports of the receiver C4x devices by setting bit four of their communication port control registers to one.
Figure 8-9. Message Broadcasting by One C4x to Many C4x Devices
15. The DMA transfer can be synchronized with external interrupts, communication port ICRDY/OCRDY signals, and timer interrupts. To enable this feature, the SYNCH MODE field (bits 6-7) of the DMA control register must be configured to a proper value, and the corresponding bits of the DMA interrupt enable (DIE) register must be set. Example 7-2 sets up DMA channel 4 read synchronization with the communication port 4 ICRDY signal. The DMA continuously transfers data from the communication port input register until the START field (bits 22-23) of the DMA control register is changed by the CPU.
Example 7-2. DMA Transfer With Communication Port ICRDY Synchronization
TITLE: DMA TRANSFER WITH COMMUNICATION PORT ICRDY SYNCHRONIZATION
THIS EXAMPLE SETS UP DMA CHANNEL 4 TO TRANSFER DATA FROM COMMUNICATION PORT INPUT REGISTER TO INTERNAL RAM WITH ICRDY SIGNAL READ SYNCHRONIZATION. THE TRANSFER MODE OF THE DMA IS SET TO 00; THEREFORE, THE TRANSFER WON'T STOP UNTIL THE START BITS OF THE DMA CONTROL REGISTER ARE CHANGED.
         .data
DMA4     .word 001000E0H     ; DMA channel 4 map address
CONTROL  .word 00C00040H     ; DMA register initialization data
SOURCE   .word 00100081H
SRC_IDX  .word 0
COUN
16. MI
        SUBF   R2,R2,R2               ; Initialize R2
        SUBI   2,RC                   ; Set RC = N - 2
; DOT PRODUCT (1 <= i < N)
        RPTS   RC                     ; Setup the repeat single
        MPYF3  *AR0++(1),*AR1++(1),R0 ; a(i) * b(i) -> R0
||      ADDF3  R0,R2,R2               ; a(i-1) * b(i-1) + R2 -> R2
        ADDF3  R0,R2,R0               ; a(N-1) * b(N-1) + R2 -> R0
; RETURN SEQUENCE
        POP    RE
        POP    RS
        POP    RC                     ; Restore RC
        POP    AR1                    ; Restore AR1
        POP    AR0                    ; Restore AR0
        POPF   R2                     ; Restore top 32 bits of R2
        POP    R2                     ; Restore bottom 32 bits of R2
        POP    ST                     ; Restore ST
        RETS                          ; Return
; end
        .end
2.1.2 Zero-Overhead Subroutine Calls
Two instructions, link and jump (LAJ) and link and jump conditional (LAJcond), implement zero-overhead subroutine calls on the C4x. Unlike CALL and CALLcond, which put the value of PC + 1 onto the software stack, LAJ and LAJcond put the value of PC + 4 into extended-precision register R11. Three instructions following LAJ or LAJcond are executed before going to the subroutine. The restriction that applies to these three instructions is the same as that of the three instructions following a delayed branch. At the end of the subroutine, you can use a delayed branch conditional (BcondD) in the register addressing mode with R11 as source to perform a zero-overhead subroutine return.
For comparison, the same dot product example with a zero-overhead subroutine call is given in the following example program.
Example 2-2. Zero-Overhead Subroutine C
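The repeat-single loop in excerpt 16 accumulates a(i) * b(i) into R2, with the multiply for element i running in parallel with the add for element i - 1. The following is a minimal C sketch of the arithmetic that loop performs, leaving out the pipelining. The function name `dot_product` and its signature are assumptions made for illustration and are not taken from the manual.

```c
/* Sum of a[i] * b[i] for i = 0 .. n-1, accumulated in the same
   left-to-right order as the assembly loop. */
float dot_product(const float *a, const float *b, int n)
{
    float acc = 0.0f;   /* plays the role of R2 */
    int i;

    for (i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

On the C4x the loop body is a single parallel MPYF3 || ADDF3 instruction, which is why the assembly version primes the pipeline with one multiply before RPTS and finishes with one extra ADDF3 afterward.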
17. PIEDRA TI Houston C40 port version of regis Ck Ck ck ck Ck ck KKK ck A A Ck ZA Ck ZA K AXA Ck ck ck KA KA Ck ck ck K ck ck ck Sk ck KA KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK ing started from C30 forward real FFT 2 0 Expanded calling conventions to the use ters for parameter passing FF FF FF FF FF A FF HF F FF FF FF FF FF FF FF FF FF FF FF F ox X SYNOPSIS int ffft rl FFT_SIZE LOG_SIZE SOURCE_ADDR DEST_ADDR SINE_TABLE BIT_REVERSE ar2 r2 r3 rc rs re int FFT SIZE 64 128 256 512 1024 int LOG SIZE 6 Ty 8 9 LO ace float SOURCE ADDR Points to location of source data float DEST ADDR Points to where data will be operated on and stored float SINE TABLE Points to the SIN COS table int BIT REVERSE 0 Bit Reversing is disabled lt gt 0 Bit Reversing is enabled NOTE 1 If SOURCE ADDR DEST ADDR then in place bit reversing is performed if enabled more processor intensive 2 FFT SIZE must be gt 64 this is not checked 6 56 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued KKK KKK ck ck ck ck AA ce ck KKK A XK AA X KKK KKK KKK KKK KKK ck X A KK KKK KKK KKK KKK AA AAA KAZ A kk ko A KKK KKK k 0X FF 0X FF F FF 0X 0X 0X F 0X 0X FF F FF FF FF FF FF FF FF FF AAA FF F Xo X koX A KKK KK KKK KKK KK
18. sect Clffttat USH FP H R4 H R5 H R6 HF R6 H R7 R7 H ARA H AR5 H AR6 H AR7 USH DP DPFFT SIZE if REGPAR E FO ka o flv FU FO FU FO FU FU o FU El tU ENE ua unnan anu nu I hz DI SP FP Entry execution point Reserve memory for arguments Preserve C environment Initialize DP pointer arguments passed in stack Applications Oriented Operations 6 75 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne r f LDA LDI LDI LDI LDI LDI endif TI TI TI TI TI TI U U U U 0 U FP 2 AR2 FP 3 R2 FP 4 R3 FP 5 RC FP 6 RS FP 7 RE RC DEST_ADDR RS SINE_TABLE RE BIT_REVERSE Perform Last FFT loops first loop 2 onwards AR1 A gt AR2 gt AR3 gt X 11 X 12 COS X 13 X 14 SIN C gt D gt AR4 gt X 11 X 12 SIN X 13 AR1 gt LOOP 1st 2nd VV x Il 0 O lt X I1 X 13 X I1 1st ji 1 lt X I1 X I2 X I1 2nd 2 2 k x I1 3rd 3 3 X 2 8 d6 lt X TZ 0672 X I2 3rd 13 29 X I2 2nd 14 30 X I2 1st 15 31 lt X 14 X I3 Xx 13 qd 32 lt X ri xX r3 X I3 1st 173 3 lt X I3 2
19. Table 11-2. Emulator Cable Pod Timing Parameters
No. | Reference | Description | Min | Max | Units
1 | tc(TCK) | TCK_RET period | 35 | 200 | ns
2 | tw(TCKH) | TCK_RET high-pulse duration | 15 | | ns
3 | tw(TCKL) | TCK_RET low-pulse duration | 15 | | ns
4 | td(TMS) | Delay time, TMS/TDI valid from TCK_RET low | 6 | 20 | ns
5 | tsu(TDO) | TDO setup time to TCK_RET high | 3 | | ns
6 | th(TDO) | TDO hold time from TCK_RET high | 12 | | ns
11.6 Emulation Timing Calculations
The following examples help you calculate emulation timings in your system. For actual target timing parameters, see the appropriate device data sheets.
Assumptions:
tsu(TTMS) | Target TMS/TDI setup to TCK high | 10 ns
td(TTDO) | Target TDO delay from TCK low | 15 ns
td(bufmax) | Target buffer delay, maximum | 10 ns
td(bufmin) | Target buffer delay, minimum | 1 ns
t(bufskew) | Target buffer skew between two devices in the same package: [td(bufmax) - td(bufmin)] x 0.15 | 1.35 ns
t(TCKfactor) | Assume a 40/60 duty cycle clock | 0.4 (40%)
Given in Table 11-2:
td(TMSmax) | Emulator TMS/TDI delay from TCK_RET low, maximum | 20 ns
tsu(TDOmin) | TDO setup time to emulator TCK_RET high, minimum | 3 ns
There are two key timing paths to consider in the emulation design:
- The TCK_RET-to-TMS/TDI path, called tpd(TCK_RET-TMS/TDI)
- The TCK_RET-to-TDO path, called tpd(TCK_RET-TDO)
Of the following two cases, the worst-case path delay is calcula
20. Topic
6.1 Companding
6.2 FIR, IIR, and Adaptive Filters
6.3 Lattice Filters
6.4 Matrix-Vector Multiplication
6.5 Fast Fourier Transforms (FFTs)
6.6 C4x Benchmarks
6.1 Companding
In telecommunications, one of the primary concerns is to conserve the channel bandwidth and, at the same time, to preserve high speech quality. This is achieved by quantizing the speech samples logarithmically. It has been demonstrated that an 8-bit logarithmic quantizer produces speech quality equivalent to that of a 13-bit uniform quantizer. The logarithmic quantization is achieved by companding (COMpress/exPANDing). Two international standards have been established for companding: the μ-law (used in the United States and Japan) and the A-law (used in Europe). Detailed descriptions of μ-law and A-law companding are presented in an application report on companding routines included in the book Digital Signal Processing Applications with the TMS320 Family (literature number SPRA012A). During transmission, logarithmically compressed data in sign-magnitude form are transmitted along the communications channel. If any processing is necessary, these data should be expanded to a 14-bit (for μ-law) or 13-bit (for A-law) linear for
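The logarithmic quantization described in excerpt 20 can be illustrated with a short C sketch of μ-law compression and expansion. This is a sketch under stated assumptions, not the manual's routine: it uses the widely used formulation that takes a 16-bit sample (a 14-bit value scaled by 4), with bias 0x84 and clip level 32635, and the names `mulaw_compress` and `mulaw_expand` are hypothetical.

```c
#include <stdint.h>

#define MULAW_BIAS 0x84    /* assumed bias for 16-bit input */
#define MULAW_CLIP 32635   /* assumed clip level so bias cannot overflow */

/* Compress a 16-bit linear sample to an 8-bit code:
   sign (1 bit), segment/exponent (3 bits), mantissa (4 bits), inverted. */
uint8_t mulaw_compress(int16_t pcm)
{
    int sign = (pcm < 0) ? 0x80 : 0x00;
    int mag = (pcm < 0) ? -(int)pcm : (int)pcm;
    int exponent = 7;
    int mask;

    if (mag > MULAW_CLIP)
        mag = MULAW_CLIP;
    mag += MULAW_BIAS;

    /* Find the segment: position of the highest set bit, from bit 14 down. */
    for (mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    return (uint8_t)~(sign | (exponent << 4) | ((mag >> (exponent + 3)) & 0x0F));
}

/* Expand an 8-bit code back to a linear sample at the segment midpoint. */
int16_t mulaw_expand(uint8_t code)
{
    int u = ~code & 0xFF;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int mag = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;

    return (int16_t)((u & 0x80) ? -mag : mag);
}
```

The step size doubles with each segment, so small samples keep fine resolution while large ones are quantized coarsely, which is how 8 bits can approach the perceived quality of a much wider uniform quantizer.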
21. 3) Reimer, J.B., P.E. Nixon, E.B. Boles, and G.A. Frantz, "Audio Customization of a DSP IC," Digest of Technical Papers for 1988 International Conference on Consumer Electronics, June 8-10, 1988.
Medical:
1) Knapp and Townshend, "A Real-Time Digital Signal Processing System for an Auditory Prosthesis," Proceedings of ICASSP 88, USA, Volume A, page 2493, April 1988.
2) Morris, L.R. and P.B. Barszczewski, "Design and Evolution of a Pocket-Sized DSP Speech Processing System for a Cochlear Implant and Other Hearing Prosthesis Applications," Proceedings of ICASSP 88, USA, Volume A, page 2516, April 1988.
Development Support:
1) Mersereau, R., R. Schafer, T. Barnwell, and D. Smith, "A Digital Filter Design Package for PCs and TMS320," MIDCON 84 Electronic Show and Convention, USA, 1984.
2) Simar Jr., R. and A. Davis, "The Application of High-Level Languages to Single-Chip Digital Signal Processors," Proceedings of ICASSP 88, USA, Volume 3, pages 1678-1681, April 1988.
If You Need Assistance
If you want to:
- Request more information about Texas Instruments Digital Signal Processing (DSP) products
- Order Texas Instruments documentation
- Ask questions about product operation or report suspected problems
- Obtain the source code in this user's guide
- Visit TI online, including TI&ME, your own customized web page
- Report mistakes or make comments about this or any oth
22. Chapter 3 Logical and Arithmetic Operations
The C4x instruction set supports both integer and floating-point arithmetic and logical operations. The basic functions of such instructions can be combined to form more complex operations. This chapter contains the following operations examples:
- Bit manipulation
- Block moves
- Byte and half-word manipulation
- Bit-reversed addressing
- Integer and floating-point division
- Square root
- Extended-precision arithmetic
- Floating-point format conversion between IEEE and C4x formats
Topic
3.1 Bit Manipulation
3.2 Block Moves
3.3 Byte and Half-Word Manipulation
3.4 Bit-Reversed Addressing
3.5 Integer and Floating-Point Division
3.6 Calculating a Square Root
3.7 Extended-Precision Arithmetic
3.8 Floating-Point Format Conversion: IEEE to/From C4x
3.1 Bit Manipulation
Instructions for logical operations, such as AND, OR, NOT, ANDN, and XOR, can be used together with shift instructions for bit manipulation. A special instruction, TSTB, tests bits. TSTB does the same operation as AND
23. X I3 SIN X I4 COS Applications Oriented Operations 6 69 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued X I3 2nd 18 34 X I3 3rd 19 35 i E gt X I4 24 48 lt X I4 H D gt i r E X I4 3rd 29 61 H X I4 2nd 30 62 a E AR4 gt X I4 1st 31 63 lt X 12 X 13 SIN X I4 COS H 32 64 7 AR1 gt 33 65 i r XI F LDA GFFT SIZE IRO LSH 2 IRO STI IRO GSEPARATION LSH 2 IR LDI DARO LDI 3 R7 LDI 16 R6 LDA DEST_ADDR AR5 LDA DEST_ADDR AR1 LSH 1 IRO LSH 1 R7 LOOP ADDI 1 R7 LSH 1 R6 LDA AR1 AR4 ADDI R7 AR1 AR1 points at A LDA AR1 AR2 ADDI 2 AR2 AR2 points at B ADDI R6 AR4 SUBI R7 AR4 AR4 points at D LDA AR4 AR3 SUBI 2 AR3 AR3 points at C LDA SINE_TABLE ARO ARO points at SIN COS table LDA R7 IR1 LDI R7 RC INLOP ADDF3 AR1 IR1 AR2 IR1 RO RO X I1 X 13 SUBF3 AR3 IR1 AR1 R1 R1 X I1 X 13 NEGF ARA R2 RZ X 14 N STF RO AR1 X 11 lt STF R1 AR2 X 13 lt STF R2 AR4 IR1 PX 14 LDA SEPARATION IR1 IR1 SEPARATION BETWEEN SIN COS TABLES SUBI 3 RC 6 70 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued
24. The DMA current component (i_dma) and communication port current component (i_cp) should be included in the calculation of Kkpys if they are used in the operations.
9.4.1 Local or Global Bus
The current due to bus writes varies with write cycle time. As discussed in the previous section, to obtain accurate current values, you must first determine the rate and timing for write cycles to external buses by analyzing program activity, including any pipeline conflicts that may exist. To do this, you can use information from the TMS320C4x emulator or simulator as well as the TMS320C4x User's Guide. In your analysis, you must account for effects from the use of cache, because use of cache can affect whether or not instructions are fetched from external memory.
When evaluating external write activity in a given program segment, you must consider whether or not a particular level of external write activity constitutes significant activity. If writes are being performed at a slow enough rate, they do not impact supply current requirements significantly and can be ignored. This is the case, however, only if writes are being performed at very slow rates on either the local or global bus.
When bus write cycle timing has been established, Figure 9-5 can be used to determine the contribution to supply current due to bus activity. Figure 9-5 shows values of current contribution from the local or global bus for various transfer rates. This data was gathe
25. $f(i,n) - k(i)\,b(i-1,n-1)$
$b(i,n) = b(i-1,n-1) + k(i)\,f(i-1,n)$
Initial conditions: $f(p,n) = x(n)$, $b(i,n-1) = 0$ for $i = 1, \ldots, p$
Final conditions: $y(n) = f(0,n)$
The data memory organization is identical to that of the inverse filter, shown in Figure 6-5. Example 6-10 shows the implementation of the lattice filter on the C4x.
Figure 6-6. Structure of the Forward Lattice Filter
Example 6-10. Lattice Filter
TITLE: LATTICE FILTER
SUBROUTINE LATTICE
TYPICAL CALLING SEQUENCE: LAJU LATTICE, LOAD AR0, LOAD AR1, LOAD RC
ARGUMENT ASSIGNMENTS:
ARGUMENT | FUNCTION
R2 | F(P,N) = E(N) = EXCITATION
AR0 | ADDRESS OF FILTER COEFFICIENTS (K(P))
AR1 | ADDRESS OF BACKWARD PROPAGATION VALUES (B(P-1,N-1))
RC | RC = P - 2
REGISTERS USED AS INPUT: R2, AR0, AR1, RC
REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
REGISTER CONTAINING RESULT: R2 (f(0,n))
BENCHMARKS:
CYCLES: 1 + 5P (not including subroutine overhead)
PROGRAM SIZE: 11 words (not including subroutine overhead)
        .global LATTICE
LATTICE RPTBD  LOOP          ; Setup the delayed repeat block loop
        MPYF3  *AR0,*AR1,R0  ; K(P) * B(P-1,N-1)
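The recurrences in excerpt 25 can be sketched in C as one call per input sample. This is a minimal sketch under stated assumptions: the function name `lattice_step`, the array layout, and the update b(0,n) = f(0,n) are not given in the excerpt and follow the usual all-pole lattice convention. The array `k` is indexed 1..p (element 0 unused), and `b[i]` holds b(i,n-1) on entry and b(i,n) on return for i = 0..p-1.

```c
/* Process one input sample x(n) through a forward lattice filter of order p.
   k[1..p] : reflection coefficients (k[0] is unused)
   b[0..p-1]: backward values; b(i,n-1) on entry, b(i,n) on return
   Returns y(n) = f(0,n). */
float lattice_step(float x, const float *k, float *b, int p)
{
    float f = x;                     /* f(p,n) = x(n) */
    int i;

    for (i = p; i >= 1; --i) {
        /* f(i-1,n) = f(i,n) - k(i) * b(i-1,n-1) */
        f = f - k[i] * b[i - 1];
        /* b(i,n) = b(i-1,n-1) + k(i) * f(i-1,n); b(p,n) is never needed.
           b[i] was already read as b(i,n-1) in the previous pass, so it is
           safe to overwrite it here. */
        if (i < p)
            b[i] = b[i - 1] + k[i] * f;
    }
    b[0] = f;                        /* assumed: b(0,n) = f(0,n) */
    return f;
}
```

With p = 1 the routine reduces to y(n) = x(n) - k(1) y(n-1), a single-pole recursive filter, which is a convenient sanity check of the indexing.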
26. j AI j BI KKK KKK KKK KKK KK KKK KKK KKK KKK KK KKK A K AXA AXA KKK KK KKK KKK KKK AXA AXA AA K KKK KKK A A KKK KK KKK tabl starting ending point for benchmarks Save dedicat lower 32 bits upper 32 bits d registers stack is used for parameter passing points input R1I0 N R9 holds the points where registers ar data remain stage number FFT result should move to used for parameter passing 6 28 Fast Fourier Transforms FFTs Example 6 12 Complex Radix 2 DIF FFT Continued STARTB LDI 1 R8 Initialize repeat counter of first loop LSH3 1 R10 IRO IR0 2 N1 because of real imag LSH3 2 R10 IR1 IR1 N 4 pointer for SIN COS table LDI 1 AR5 Initialize IE index AR5 IE LSH 1 R10 SUBI3 1 R8 RC RC should be one less than desired f Outer loop LOOP RPTBD BLK1 Setup for first loop LSH 1 R10 N2 N2 2 LDI AR2 ARO ARO points to X T ADDI R10 ARO0 AR6 AR6 points to X L First loop ADDF ARO AR6 RO RO X I 4X L SUBF AR6 AR0 R1 R1 X I X L ADDF AR6 ARO R2 R2 Y I Y L SUBF AR6 ARO R3 R32Y 1 Y L STF R2 ARO0 SY I R2 and B STF R3 AR6 PY L R3 BLK1 STF RO AR0 IRO X I RO and a STF R1 AR6 IRO X L R1 and AR0 2 AR0 2 2 n If this is the last stage you are done SUBI 1 R9 BZD ENDB x main inner loop LDI 2 AR1
27. 2320, Monday through Friday from 8:30 a.m. to 5:00 p.m. central time.
- Fax: (713) 274-2324 (US DSP Hotline), +33-1-3070-1032 (European DSP hotline)
- Electronic Mail: 4389750@mcimail.com
To ask about third-party applications and algorithm development packages, contact the third party directly. Refer to the TMS320 Third-Party Support Reference Guide (SPRU052) for addresses and phone numbers.
Extensive DSP documentation is available; this includes data sheets, user's guides, and application reports. Contact the hotline for information on literature that you can request from the Literature Response Center, (800) 477-8924.
The DSP hotline does not provide pricing information. Contact the nearest TI Field Sales Office for prices and availability of TMS320 devices and support tools.
10.1.3 The Bulletin Board Service (BBS)
The TMS320 DSP Bulletin Board Service (BBS) is a telephone-line computer service that provides information on TMS320 devices, specification updates for current or new devices and development tools, silicon and development tool revisions and enhancements, new DSP application software as it becomes available, and source code for programs from any TMS320 user's guide.
You can access the BBS via:
- Modem (300-, 1200-, or 2400-bps): dial (713) 274-2323. Set your modem to 8 data bits, 1 stop bit, no parity.
To find out more about
29. 7.1 Hints for DMA Programming

The following hints will help you improve your DMA programming and avoid unexpected results:

- Reset the DMA register before starting it. This clears any previously latched interrupt that may no longer exist. Also, set the DIE register (enabling interrupts for sync transfer) after starting the DMA channel.
- Take care in selecting the priority used to arbitrate between the CPU and DMA, and also between DMA channels. If a DMA channel fails to finish a block transfer, it may have lower priority in a conflicting environment and not be granted access to the resource. CPU/DMA rotating priority is considered a safe first choice. Depending on CPU/DMA execution load, selection of other priority schemes could result in faster code. Fine tuning may be needed.
- Ensure that each interrupt is received when you use interrupt synchronization; otherwise, the DMA will never complete the block transfer.
- For faster execution, avoid memory/resource access conflicts between the CPU and DMA. Carefully allocate the different sections of the program in memory. Use the same care with DMA autoinitialization values in memory.
- Try to use read/write synchronization when reading from or writing to communication ports. This avoids a peripheral-bus halt during a read from an empty input FIFO or a write to a full output FIFO.
- Choose between DMA read and write synchronizati
30. In CMOS devices, the internal gates swing completely from one supply rail to the other. The voltage change on the gate capacitance requires a charge transfer and therefore causes power consumption. The required charge for a gate's capacitance is calculated by the following equation:

$Q_{gate} = V_{DD} \times C_{gate}$ (coulombs)

where $Q_{gate}$ is the gate's charge, $V_{DD}$ is the supply voltage, and $C_{gate}$ is the gate's capacitance. Since current is coulombs per second, the current can then be obtained from:

$I = V_{DD} \times C_{gate} \times \text{Frequency}$ (coulombs/s)

where $I$ is the current. For example, the current consumed by an 80-pF capacitor being driven by a 10-MHz CMOS-level square wave is calculated as follows:

$I = 5\ \text{V} \times 80 \times 10^{-12}\ \text{F} \times 10 \times 10^{6}\ \text{charges/s} = 4\ \text{mA at 10 MHz}$

Furthermore, if the total number of gates in a device is known, the effective total capacitance can be used to calculate the current for any voltage and frequency. For a given CMOS device, the total number of gates is probably not known, but you can solve for a current at a particular frequency and supply voltage and later use this current to calculate for any supply voltage and operating frequency:

$I_{device} = V_{DD} \times C_{total} \times f_{CLK}$

where $I_{device}$ is the current consumed by the device, $C_{total}$ is the total capacitance, and $f_{CLK}$ is the clock frequency. Solving for power ($P = V \times I$), the equation becomes:

$P_{device} = V_{DD}^{2} \times C_{total} \times f_{CLK}$
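The relationships above can be checked numerically. The following C sketch is an illustration only: the function names cmos_current_amps and cmos_power_watts are hypothetical and do not come from any TI library, and the power function simply applies P = V x I to the dynamic current.

```c
/* Illustrative sketch only: function names are hypothetical, not from TI. */

/* Dynamic current of a capacitance switched at a given frequency:
   I = V_DD * C * f, in amperes. */
double cmos_current_amps(double vdd_volts, double cap_farads, double freq_hz)
{
    return vdd_volts * cap_farads * freq_hz;
}

/* Corresponding power, P = V_DD * I, in watts. */
double cmos_power_watts(double vdd_volts, double cap_farads, double freq_hz)
{
    return vdd_volts * cmos_current_amps(vdd_volts, cap_farads, freq_hz);
}
```

Calling cmos_current_amps(5.0, 80e-12, 10e6) reproduces the 4 mA of the worked example (80 pF driven at 10 MHz from a 5-V supply); the same inputs give 20 mW from cmos_power_watts.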
31.     (opcode truncated) *AR0++(1),*AR1++(1),R0  ; a(i) * b(i) -> R0
||      ADDF3   R0,R2,R2        ; a(i-1) * b(i-1) + R2 -> R2
        ADDF3   R0,R2,R0        ; a(N-1) * b(N-1) + R2 -> R0 = y
*
* RETURN SEQUENCE
*
        POP     RE
        POP     RS
        POP     RC              ; Restore RC
        POP     AR1             ; Restore AR1
        POP     AR0             ; Restore AR0
        BUD     R11             ; Return
        POPF    R2              ; Restore top 32 bits of R2
        POP     R2              ; Restore bottom 32 bits of R2
        POP     ST              ; Restore ST
        .end

2.2 Stacks and Queues

The C4x provides a dedicated stack pointer (SP) for building stacks in memory. Also, the auxiliary registers can be used to build user stacks and a variety of more general linear lists. This section discusses the implementation of the following types of linear lists:

- Stack: A linear list for which all insertions and deletions are made at one end of the list.
- Queue: A linear list for which all insertions are made at one end of the list and all deletions are made at the other end.
- Dequeue: A double-ended queue; a linear list for which insertions and deletions are made at either end of the list.

2.2.1 System Stacks

A stack in the C4x fills from a low memory address to a high memory address, as shown in Figure 2-1. A system stack stores addresses and data during subroutine calls, traps, and interrupts. The stack pointer (SP) is a 32-bit register that contains the address of the top of the system stack. The SP always points to the last element pushed onto the
32. /*
 * Commport to commport transfer:
 *   DMA3 prim channel sends 4 words from memory (0x02ffc00) to commport 3 output FIFO.
 *   DMA3 aux channel receives 8 words from commport 3 input FIFO to memory (0x02ffd00).
 *   DMA3 prim channel uses OCRDY3 write sync; DMA3 aux channel uses ICRDY3 read sync.
 * In this program, DMA3 aux channel expects data in commport 3 being sent by
 * another processor/device. Otherwise, no aux channel transfer will occur.
 */
#include "dma.h"
#define DMAADDR  0x001000d0
#define CTRLREG  0x03cdc0d5  /* DMA aux/prim send interrupt to CPU when transfer finishes (TC = 1), */
                             /* DMA-CPU rotating priority, read/write sync transfer */
#define DIEVAL   0x24000     /* set ICRDY3/OCRDY read/write sync */
#define DST      0x02ffd00   /* auxiliary channel settings */
#define DST_IDX  0x1
#define ACOUNTER 0x08
#define SRC      0x02ffc00   /* primary channel settings */
#define SRC_IDX  0x1
#define COUNTER  0x04

DMASPLIT *dma = (DMASPLIT *)DMAADDR;
int dieval = DIEVAL;

main()
{
    dma->src      = (void *)SRC;      /* primary channel */
    dma->src_idx  = SRC_IDX;
    dma->counter  = COUNTER;
    dma->dst      = (void *)DST;      /* auxiliary channel */
    dma->dst_idx  = DST_IDX;
    dma->acounter = ACOUNTER;
    dma->ctrl     = (void *)CTRLREG;
    asm(" ldi @_dieval,die");
    SPLIT_WAIT_DMA((volatile int *)dma);
33. [Figure: SPL secondary JTAG scan path connections (TRST, TMS, TCK, TDO, and the SPL's DTDIn, DTDOn, and DTMSn signals)]

The TRST signal from the main scan path drives all devices, even those on the secondary scan paths of the SPL. The TCK signal on each target device on the secondary scan path of an SPL is driven by the SPL's DTCK signal. The TMS signal on each device on the secondary scan path is driven by the respective DTMS signals on the SPL.

DTDO on the SPL is connected to the TDI signal of the first device on the secondary scan path. DTDI on the SPL is connected to the TDO signal of the last device in the secondary scan path. Within each secondary scan path, the TDI signal of a device is connected to the TDO signal of the device before it.

If the SPL is on a backplane, its secondary JTAG scan paths are on add-on boards; if signal degradation is a problem, you may need to buffer both the TRST and DTCK signals. Although less likely, you may also need to buffer the DTMSn signals for the same reasons.

11.9.2 Emulation Timing Calculations for SPL

The following examples help you to calculate the emulation timings in the SPL secondary scan path of your system. For actual target timing parameters, see the appropriate device data sheets.

Assumptions: t_su(TTMS), t_d(TTDO), t_d(bufmax), t_d(bufmin), t_(bufskew), t_(TCKfactor) Target TMS/TDI
34. * FILENAME    : CR4DIF.ASM
* DESCRIPTION : COMPLEX, RADIX-4 DIF FFT FOR TMS320C40 (C callable)
* DATE        : 6/29/93
* VERSION     : 4.0
*
* VERSION  DATE     COMMENTS
* 1.0      10/87    PANNOS PAPAMICHALIS (TI Houston): Original release
* 2.0      1/91     DANIEL CHEN (TI Houston): C40 porting
* 3.0      7/1/91   ROSEMARIE PIEDRA (TI Houston): made it C callable
* 4.0      6/29/93  ROSEMARIE PIEDRA (TI Houston): added support for in-place bit reversing
*
* SYNOPSIS: int cr4dif(SOURCE_ADDR, FFT_SIZE, LOGFFT, DST_ADDR)
*                       ar2          r2        r3      rc
*   float *SOURCE_ADDR ; input address
*   int   FFT_SIZE     ; 64, 256, 1024, ...
*   int   LOGFFT       ; log (base 4) of FFT_SIZE
*   float *DST_ADDR    ; destination address
*
* - The computation is done in place.
* - Sections to be allocated in linker command file: ffttxt (FFT code), fftdat (FFT data)
* - If SOURCE_ADDR = DST_ADDR, then in-place bit reversing is performed.
*
* DESCRIPTION: Generic program for a radix-4 DIF FFT computation using the TMS320C4x family. The computation is done in place and the
36. IRO RO R4 t IRO R2 R4 t IRO ERE t IRO R2 ES R3 R4 R3 R4 RO R4 R2 R4 lt X I1 X I3 lt X I1 X I3 12 lt X 14 Applications Oriented Operations 6 65 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued So Ne Ne Se So Se LOOP4_A Part B Ie Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne e AR1 zopuouutbzwuugutiEmestEEHE nnn 1 tj ARO AR1 AR2 AR4 AR3 ARO T l1 amp T DEST ADDR AR1 AR1 AR2 AR1 AR3 8 AR2 12 AR3 16 IRO QFFT SIZE RC 4 RC 2 RC LOOP4 A AR2 AR1 R1 AR2 AR1 R2 AR3 R3 AR2 IRO RO RO AR1 R1 RO AR1 R2 AR3 R3 R2 AR1 RI AR2 R3 AR3 I1 3rd I1 2nd LIE ists l2 list I2 2nd _ I2 3rd T3 3rd I3 2nd Z 1st I4 1st I4 2nd I4 3rd 16 NI R2 AR1 IRO RI AR2 IRO R3 AR3 IRO RO X 13 R1 R2 R3 XA X 13 X I4 0 1 lt X I1 2 3 4 5 6 s 7 lt X I1 8 Y X LL 10 11 L2 13 14 jb UC 17 X I1 X I3 4 X I1 X I3 X 14 z E Peay E X I3 COS X 14 SIN X I3 COS X 14 SIN X I3 SIN X I4 COS X I3 SIN X I4 COS 6 66 Fast Fourier Transf
37. Example 1-1. Processor Initialization Example (Continued)

* Initialize global memory interface control
        ldi     @mctrla,ar0
        LDI     @gctrl,R0
        STI     R0,*AR0
* Initialize local memory interface control
        LDI     @lctrl,R0
        STI     R0,*+AR0(4)
* Initialize the stack pointer
        LDI     @stacka,SP
* Enable timer interrupt (this is equivalent to ldi 1,iie)
        LDI     @ieval,IIE
* Clear/enable cache and enable global interrupts
        OR      3800H,ST        ; Global interrupt enable
        BR      begin           ; Branch to the beginning of the application
begin   <this is your application code>
trap0   <this is your trap0 trap code>
        reti
tint0   <this is your tint0 interrupt service routine>
        reti
        .end

Example 1-2. Linker Command File for Linking the Previous Example

MEMORY
{
    EPROM: org = 0x80000000, len = 0x10     /* EPROM reset vector location */
    RAM:   org = 0x40000000, len = 0x100    /* extend RAM */
}
/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */
SECTIONS
{
    rst_sect: > EPROM
    myvect:   > RAM
    mystack:  > RAM
    .text:    > RAM
    mytrap:   > RAM
}

Processor initialization under C language

If you are running under a C environment, your initialization routine is typically boot.asm from the RTS40.LIB library that comes with the floating-point compiler
38. *   R0 | v = NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1
* REGISTER CONTAINING RESULT: R0
*
* BENCHMARKS: CYCLES: 14 (not including the BUD instruction)
*             WORDS:  15 (not including the BUD instruction)
*
        .global MUCMPR
MUCMPR  LSH3    -6,R0,R1        ; Save sign of number
        ABSI    R0,R0
        CMPI    1FDEH,R0        ; If R0 > 0x1FDE,
        LDIGT   1FDEH,R0        ; saturate the result
        ADDI    33,R0           ; Add bias
        FLOAT   R0              ; Normalize: (seg+5)0WXYZx...x
        MPYF    0.03125,R0      ; Adjust segment number by 2**(-5)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
        BUD     R11             ; Delayed return
        AND     080H,R1         ; Set sign bit
        ADDI    R1,R0           ; R0 = compressed number
        NOT     R0              ; Reverse all bits for transmission

Example 6-2. µ-Law Expansion

* TITLE µ-LAW EXPANSION
*
* SUBROUTINE MUXPND
*
* TYPICAL CALLING SEQUENCE:
*       LAJU    MUXPND
*       LDI     v,R0
*       NOP     <- can be other non-pipeline-break
*       NOP     <- instructions
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   R0       | v = NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2
* REGISTER CONTAINING RESULT: R0
*
* BENCHMARKS: CYCLES
39. [Figure labels: Send Active Circuit Element; Receive Active Circuit Element]

8.10 Parallel Processing Through Communication Ports

The C4x communication ports are key to parallel processing design flexibility. Many processors can be linked together in a wide variety of network configurations. In this section, Figure 8-8 illustrates C4x parallel processing connectivity networks that are used to fulfill many signal processing system needs.

Figure 8-8. C4x Parallel Connectivity Networks

- Pipelined Linear Array: For convolution and correlation and other pipelined operations in graphics and modem applications.
- 2-D Array: Excellent for image processing.
- Tree Structures (parent/children): Supports broadcasting and data searches for speech and image recognition applications.
- Bidirectional Ring: Clockwise and counterclockwise data flow. Group port for more I/O. Very effective for neural networks.
- 3-D Grid: For hierarchical processing such as image understanding and finite element analysis.
- Hexagonal Grid: 6-nearest-neighbors connection. Useful in numerical analysis and image processing.
- 4-D H
40. X I3 COS X I4 COS X I3 COS X I4 COS X I3 COS X I4 COS COS 2 pi 8 SIN 2 pi 8 So So Se Se Se Se Initialize table pointers R7 AR7 RO R5 R2 COS 2 pi 8 COS 2 pi 8 X I3 COS X I4 COS X I3 COS X I4 COS 6 64 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued NE Ne NE Ne Ne Ne Ne e Me Ne e Ne e e Ne Ne e Ne Ne Ne e UBF 3 UBF 3 nn PTBD DDF3 TE UBF3 TF DDF3 TF PYF3 B 3 i DF3 YES BF3 BF3 DF 3 JUEGGUU Hej BF3 F DF 3 TF PYF3 F DDF3 UBF3 UBF 3 D E 1 E U 1 1 DF3 F BF3 TF DDF3 TF TE 3 Es ANP NNNAPNNPNHZENPHNNHNPHUH A HNA HNN N H E RO R1 R3 R3 R4 ARI LOOP3 AR1 R3 R4 R4 A R2 A R4 A ARO R4 A AR3 R4 A RO R1 AR7 RO R1 AR1 AR1 R4 A R2 A R4 A ARO R4 A AR3 R4 A RO R1 RO R1 AR1 AR1 R4 A R2 A R4 A ARO R4 A R4 A Perform Fourth FFT Loop Part A AR1 gt AR2 AR3 pes DH OY O1 S NR H B R24 R34 R14 RO R24 R34 R14 RO R2 R3 R1 RO RO R4 R2 R4 R7 R1 R3 X I3 COS X I4 COS X IRO t IRO t IRO Ss Se We Rs Wel as Ba ewe M H l Co Il t IRO R2 AR2 IRO RO R3 R3 R4 R3 R4 t
41. arl 1 r0 idf ar0 1 r1 stf r 4ar 0 1 stf rl arl 1 nop arl 2 nop ar0 1r0 b environment
        POP     DP
        POP     AR7
        POP     AR6
        POP     AR5
        POP     AR4
        POP     AR3
        POPF    R7
        POP     R7
        POPF    R6
        POP     R6
        POP     R5
        POP     R4
        RETS
        .end

Example 6-16. Bit-Reversed Sine Table

* SINTAB.ASM: Bit-reversed sine table for a 64-point FFT.
* File to be linked with the source code for a 64-point, radix-2 DIT FFT.
* Sine table length = FFT size / 2
        .global _SINE
        .sect   "sintab"
_SINE   .float  1.000000
        .float  0.000000
        .float  0.707107
        .float  0.707107
        .float  0.923880
        .float  0.382683
        .float  0.382683
        .float  0.923880
        .float  0.980785
        .float  0.195090
        .float  0.555570
        .float  0.831470
        .float  0.831470
        .float  0.555570
        .float  0.195090
        .float  0.980785
        .float  0.995185
        .float  0.098017
        .float  0.634393
        .float  0.773010
        .float  0.881921
        .float  0.471397
        .float  0.290285
        .float  0.956940
        .float  0.956940
        .float  0.290285
        .float  0.471397
        .float  0.881921
        .float  0.773010
        .float  0.634393
        .float  0.098017
        .float  0.995185
        .end
42. n tg CTIONS N EDED IN R S UBF THE TWIDDL OF N 2 N WITH GLOBAL FACTORS AR STOR MMAND FI ED INSID UR BUTTERFLY R IS ON Ck Ck ck ck Ck ck KKK KKK KK KKK KKK KKK KKK KK KKK KKK KR KKK KKK KKK KKK KKK KKK KKK KKK KA K KKK KKK KKK KK KKK POI TS BECAUSE OF TH TH LOOP JY USED FO SINCE THE R A LOAD IN KKK KK KKK KKK KKK KKK KK KKK KKK KK KKK A XK A AAA AXA KKK KKK KEK KKK KKK KKK KEK KKK KKK A AA AA KKK k LINKER CO ffttxt fft code fftdat fft data Ck Ck ck ck Ck ck ck kk ck A KKK KKK KKK KKK ck ck A KK KKK KKK ck ck ck ck ck ck ck ck ck A ZA KA KA AX K A KX A Ck ck A kk ck kk ck kk Sk kk ko ko KK KKK D IN BIT R EV ERSED ORDER AND WITH A TABL E ENGTH EFT JENGTH IH S IN ABL IS PROVIDED IN A SEPARATE FIL LABEL _SIN POINTING O TH B EGINNING OF THE TABLE 6 42 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued COS 2 PI n N j SIN 2 PI n N COEFFICIENT COS 2 PI 0 32 1 SIN 2 PI 0 32 0 COS 2 PI 4 32 0 707 SIN 2 PI 4 32 0 707 COS 2 PI 3 32 0 831 SIN 2 PI 3 32 0 556 COS 2 PI 7 32 0 195 SIN 2 PI 7 32
43. single-access RAM (SARAM): Memory that can be read from or written to only once in a single CPU cycle.

single-precision floating-point format: A 32-bit representation of a floating-point number with a 24-bit mantissa and an 8-bit exponent.

single-precision integer format: A twos-complement 32-bit format for integer data.

single-precision unsigned-integer format: A 32-bit unsigned format for integer data.

software interrupt: An interrupt caused by the execution of a TRAP instruction.

split mode: A mode of operation of the DMA coprocessor. This mode allows one DMA channel to service both the receive and transmit portions of a communication port.

ST: See status register.

stack: A block of memory reserved for storing and retrieving data on a first-in, last-out basis. It is usually used for storing return addresses and for preserving register values.

status register: A register in the CPU register file that contains global information related to the CPU.

Timer: A programmable peripheral that can be used to generate pulses or to time events.

Timer-Period Register: A 32-bit memory-mapped register that specifies the period for the on-chip timer.

trap vector table (TVT): An ordered list of addresses which each correspond to an interrupt; when a trap is executed, the processor executes a branch to the address stored in the corresponding location in the trap vector ta
44. 0 981 AR BR TI AR AI BR BI XX OX A o Ok oko oko FF F FF F FF FF F FF F OX OX F X CRC KKK KKK KKK KK KKK KKK KKK ck Ck KKK KKK KKK A KA KK KKK KKK KKK AX KA KK KKK KKK KKK KKK KK KKK KKK KKK KK KKK TR j AI AR j AI A j BI COS j SIN BR j BI d BR COS BI SIN BI COS BR SIN AR TR AI TI AR TR AI TI WN THIS ENGTH OF 1024 THE TABLE IS FOR ALL FFT CAN ARE GENERATED BY USING E REALIZED VERY EASY BY EXAMPLE SHOWN FOR N 32 WN n ADDRESS 0 R WN 1 I W 2 R W 3 I W 12 R W 13 I W 14 R W 15 I W WHEN GENERATED FOR A FF LENGTH LESS OR EQUAL AVAILABLE HE MISSING TWIDDLE FACTORS W HE SYMMETRY WN N 4 n j WN n CHANGING REAL AND IMAGINARY PART OF TH NEGATING THE NEW REAL PART E TWIDDLE FACTORS AND BY Ck ck ck Ck ck ck Ck ck A A AK KKK KKK KKK ck X A ck kk KA Sk ck K ck ck AA ck X A ck ck A X AA Ck ck ck ck ck KKK ck kk KKK KK KKK KKK KKK ko ko ko ko global cr2dit Entry execution point global SINE Sine table pointer global STARTB ENDB starting ending point for given benchmarks sect 1 Te ent fg Space dl is FFT SIZE fg2 space 1 is FFT SIZE 2 fg4m2 spa
45. 8.7 An I/O Coprocessor-C4x Interface
8.8 Implementing a Token Forcer
8.9 Implementing a CSTRB Shortener Circuit
8.10 Parallel Processing Through Communication Ports
8.11 Broadcasting Messages From One C4x to Many C4x Devices

9 C4x Power Dissipation
Explains the current consumption of the C4x and also provides information about current consumption by components.
9.1 Capacitive and Resistive Loading
9.2 Basic Current Consumption
9.2.1 Current Components
9.2.2 Current Dependency
9.2.3 Algorithm Partitioning
9.2.4 Test Setup Description
9.3 Current Requirement of Internal Components
9.3.1 Quiescent
9.3.2 Internal Operations
9.3.3 Internal Bus Operations
9.4 Current Requirement of Output Driver Components
9.4.1 Local or Global Bus
9.4.2 DMA
9.4.3 Communication Port
9.4.4 Data Dependency
9.4.5 Capacitive Loading Dependence
9.5 Calculation of Total Supply Current
9.5.1 Combining Supply Current Due
46. 11 10 worst best not including subroutine overhead WORDS 11 not including subroutine overhead global MUXPND MUXPND NOT RO RO Complement bits AND3 OFH RO R1 Isolate quantization bin LSH 1 R1 ADDI 336 RL Add bias to introduce 1xxxx1 LSH3 4 R0 Isolate segment cod TSTB 08H RO Test sign BZD R11 If positive delayed return AND 7 R0 LSH3 RO R1 RO Shift and put result in RO SUBI 33 R0 Subtract bias BUD R11 Delayed return NEGI RO Negate if a negative number NOP NOP 6 4 Companding Example 6 3 A Law Compression 8 0x FF 0 FF 0X 0X FF FF FF FF F X HF X SUBROUTINE ACMPR ITLE A LAW COMPRESSION TYPICAL CALLING SEQUENCE LAJ ACMPR LDI v RO NOP lt can be other non pipeline break NOP lt instructions ARGUMENT ASSIGNMENTS ARGUMEN FUNCTION RO v NUMBER TO BE CONVERTED REGISTERS USED AS INPUT RO REGISTERS MODIFIED RO R1 REGISTER CONTAINING RESULT RO BENCHMARKS global ACMPR LSH3 ABSI CMPI BLED PI H am 3 j H LOAT mK Hj SHF as NOQCHNUVENYU T OZO UUU jq A a a UV UH zrno x lt O W CYCLES ACMPR 5 R0 RI RO RO 1FH RO END OFFFH RO OFFFH RO 1 R0 RO 0 125 R0 1 R0 RO RO 20 R0 R11 080H R1 R1 RO OD5H RO 16 10 worst best not including subroutine overhead WORD
47. [Figure: C4x external interface signals (global bus, local bus, interrupt and I/O flags, reset and ROM control, clocks, communication port interface, timer interface, and emulation interface)]

Note: n = 0 for communication port 0, n = 1 for communication port 1, etc.

The global and local buses implement the primary memory-mapped interfaces to the device. These interfaces allow external devices such as DMA controllers and other microprocessors to share resources with one or more C4x devices through a common bus.

4.3 Global and Local Bus Interfaces

The C4x uses the global and local buses to access the majority of its memory-mapped locations. Since these two memory interfaces are identical in every way except for their positions in the memo
48. Init loop counter for inner loop LDI SINTAB AR4 Initialize IA index AR4 IA ADDI AR5 AR4 IA IA IE AR4 points to cosine ADDI AR2 AR1 ARO 7 X I Y I pointer SUBI TRE RG RC should be one less than desired INLOP RPTBD BLK2 Setup for second loop ADDI R10 ARO AR6 X L Y L pointer ADDI 2 AR1 LDF AR4 R6 RO SIN Second loop SUBF AR6 ARO R2 R2 X I X L SUBF AR6 ARO R1 R1 Y 1 Y L MPYF R2 R6 RO RO R2 SIN and l ADDF AR6 ARO R3 R3 Y 1 Y L MPYF R1 AR4 IR1 R3 R3 R1 COS and Hi STF R3 ARO Y I Y I Y L SUBF RO R3 R4 RA R1 COS R2 SIN MPYF R1 R6 RO RO R1 SIN and BI ADDF AR6 ARO R3 R3 X I X L MPYF R2 AR4 IR1 R3 R3 RZ COS and B STF R3 AR0 IRO 7X 1 X 1I X L and ARO ARO 2 N1 ADDF RO R3 R5 R5 R2 COS R1 SIN BLK2 STF R5 AR6 IRO X L R2 COS R1 SIN incr AR6 and Applications Oriented Operations 6 29 Fast Fourier Transforms FFTs Example 6 12 Complex Radix 2 DIF FFT Continued H t PI EAF DI DI BI H NANnNGUUEA H NEHFWrHNPPrPwaAN Iw H UBI3 ENDB cmpi bead nop tai subi rptbd ldi INPLACE This bit reversal ok ck ck ck ck ck kk ck kk kk KKK KKK R4 AR6 R10 AR1 INLOP AR5 ARA AR2 AR1 ARO 1 R8 RC 1 R8 LOOP 1 AR5 R10 IRO 1 R8 RC Y L R1 COS R2 SIN Loop back to the inner loop IA IA IE AR4 points to cosin
49. Perform first and second FFT loops A ARI gt I1 0 lt X I1 X I3 2 X 12 A AR2 gt I2 1 lt X 11 X 13 2 X I2 AR3 gt I3 2 lt X 11 X I3 2 X 14 A ARA gt I4 3 lt X Il1 X 13 2 X I4 H AR1 gt 4 i E VI LDA SOURCE_ADDR AR1 LDA AR1 AR2 LDA AR1 AR3 LDA AR1 AR4 ADDI 1 AR2 ADDI 2 AR3 ADDI 3 ARA LDA 4 IRO LDI QFFT SIZE RC LSH 2 RC SUBI 2 RG LDF AR4 R6 R6 X I4 LDF AR2 R7 R7 X I2 B LDF AR1 R1 R1 X I1 MPYF 2 0 R6 R6 2 X 14 MPYF 2 0 R7 R7 2 X I2 SUBF3 R6 AR3 R5 R5 X I3 2 X I4 SUBF3 R5 R1 R4 RA X I1 X I3 42X I4 SUBF3 R7 AR3 R5 R5 X I3 2 X 12 ii STF R4 AR4 IRO X I4 lt ADDF3 R5 R1 R3 R3 X 11 X 13 2X 12 ADDF3 R6 AR3 R4 RA X I3 2 X 14 Lj STF R3 AR2 IRO X I2 lt RPTBD LOOP1_2 7 SUBF3 R4 R1 R4 R4 X I1 X I3 2X I4 ADDF3 R7 AR3 RO RO X I3 2 X I2 BI STF R4 AR3 IRO X I3 lt ADDF3 RO R1 RO RO X 11 X 13 2X 12 LDF AR4 R6 R6 X I4 M STF RO AR1 IRO X 11 lt MPYF 2 0 R6 R6 2 X 14 LDF AR2 R7 R7 X I2 M LDF AR1 R1 R1 X I1 MPYF 2 0 R7 gt R7 2 X I2 SUBF3 R6 AR3 R5 R5 X I3 2 X 14 SUBF3 R5 R1 R4 RA X I1 X I3 42X I4 SUBF3 R7 AR3 R5 RB X I3 2 X 12 BI STF R4 AR4 IRO X I4 lt ADDF3 R5 R1 R3 R3 X I1 X I3 2X I2 ADDF3 R6 AR3 R4 RA
50. 2 Program Control
Provides examples for initializing the processor and discusses program control features.
2.1 Subroutines
2.1.1 Regular Subroutine Calls
2.1.2 Zero-Overhead Subroutine Calls
2.2 Stacks and Queues
2.2.1 System Stacks
2.2.2 User Stacks
2.2.3 Queues and Double-Ended Queues
2.3 Interrupt Examples
2.3.1 Correct Interrupt Programming
2.3.2 Software Polling of Interrupts
2.3.3 Using One Interrupt for Two Services
2.3.4 Nesting Interrupts
2.4 Context Switching in Interrupts and Subroutines
2.5 Repeat Modes
2.5.1 Block Repeat
2.5.2 Delayed Block Repeat
2.5.3 Single-Instruction Repeat
2.6 Computed GOTOs to Select Subroutines at Runtime

3 Logical and Arithmetic Operations
51. TR rl BI COS AI r4 r5 AR TR BR r2 r5 BI SIN AR r5 r2 TI r0 rl r0 BR COS r3 AI TI 14 AI TI BI r3 r3 TR r0 r5 r0 BR SIN r2 AR TR rl BI COS AI r4 r3 AR TR BR r2 6 50 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued mpy stf sub mpy add sub stf sub mpy sub mpy stf add sti mpy stf add mpy add sub r3 stf sub mpy sub mpy add clear pipe sti stf add add stf sub SE stf bf2end KKK KKK KKK KKK KK KKK KKK KKK KK KKK KKK A KAZ KKK KKK KK KKK KKK KKK AXA AA KA A AA A kk ko ko Sk KKK KKK KKK Ck Ck ck ck KKK KR KKK KKK KKK A KA KKK KKK KK KKK KKK AX AA KK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK ko ko ko KKK ldi ldi ldi ldi ldi ldi ldi Fh Fh FH FH Fh Fh Fh FH Fh Fh Fh Fh FH FH bh Fh Fh Fh Fh Fh Fh arl r7 r5 r3 ar2 rl r0 rYr2 arl r6 r0 2 ar r3 r2 ar0O ir0 r4 r3 ar3 ir0 rO 15 13 a arlde r7 2 0 r3 ar 0 rZ arl 126 11 r4 ar2 ir0 xar0 r3 r5 r2 ar3 arl r7 r5 r5 ar2 rl r r2 ard ro 0 xr2 ar0 r3 r2 ar0 r4 r3 ar3 r0 r5 r3 KAP R67 0 3 ar0 r2 arl ir0O r6 rl arQ 3 563 ine r2 ar3 r4 ar2 Fill 7205 62 r2 ar r3 r3 ar2 r2 ar0 r4 r3 ar3 r4 ar2 LAST STAGE r5 r2 ro
52. X 13 2 X 14 Applications Oriented Operations 6 81 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued Check Bit Reversing Mode STF R3 AR2 SUBF3 R4 R1 R4 ADDF3 R7 AR3 STF R4 AR3 LOOP1 2 ADDF3 RO R1 RO STF RO AR1 IRO X I2 lt R4 X I1 X I13 2X I14 RO RO X I3 2 X I2 IRO s ALA RO X 11 X 13 2X 12 LAST X I1 lt on or off E If SourceAddr lt gt DestAddr Bit reversing Type Se Se So Ne Ne Then Standard Bit Reversing BIT_REVERSING 0 then OFF no bit reversing BIT_REVERSING lt gt 0 Then ON NDB LDI BIT_REVERSE RO BZ MOVE_DATA Check Bit Reversing Type i If SourceAddr DestAddr Then In Place Bit Reversing i LDI SOURCE_ADDR RO CMP I DEST_ADDR RO BEO IN PLACE From Source to Destination In Place Bit Reversing Bit Reversing On IN PLACE LDA B Even Locations GFFT SIZI NOTE abs SOURCE ADDR DEST ADDR must be FFT SIZE this is not checked LDI QFFT SIZE RO SUBI 2 R0 LDA GFFT SIZE IRO LSH 1 IRO IRO Half FFT size LDA GSOURCE ADDR ARO LDA DEST_ADDR AR1 LDF ARO R1 RPTS RO LDF ARO R1 B STF R1 AR1 IRO B STF R1 AR1 IRO B BR DIVISION lst Half Only E IRO
53. board with four C4xs directly connected to each other via their communication ports. Each C4x has 64K-word SRAM and 8K-byte EPROM as local memory, and they all share a 128K-word global SRAM. See the TMS320C4x Parallel Processing Development System Technical Reference (SPRU075) for detailed information about the PPDS.

The emulation porting kit (EPK) enables you to integrate emulation technology directly into your system without the need of an XDS510 board. This product is intended to be used by third parties and high-volume board manufacturers and requires a licensing agreement with Texas Instruments.

10.1.1 Third-Party Support

The TMS320 family is supported by products and services from more than 100 independent third-party vendors and consultants. These support products take various forms (both as software and hardware), from cross-assemblers, simulators, and DSP utility packages to logic analyzers and emulators. The expertise of those involved in support services ranges from speech encoding and vector quantization to software/hardware design and system analysis.

See the TMS320 Third-Party Support Reference Guide (literature number SPRU052) for a more detailed description of services and products offered by third parties.

10.1.2 The DSP Hotline

For answers to TMS320 technical questions on device problems, development tools, documentation, upgrades, and new products, you can contact the DSP hotline via: Phone: (713) 274-
54. device ready signal generation 0027 two_waita Pin 15 internal flip flop signal for first of the two 0028 wait states for 2 wait state devices 0029 two_waitb Pin 16 internal flip flop signal for second 0030 of the two wait states for 2 wait 0031 state devices 0032 0033 name substitutions for test vectors 0034 C H L X Cir Lybra 0035 0036 0037 state bits 0038 outstate one wait two waita two waitb 0039 0040 idle b111 0041 wait one b011 0042 wait twoa b101 0043 wait twob b110 0044 0045 0046 state diagram outstate 0047 0048 state idle 0049 if reset 4 ahi2 strb syn then wait one 0050 else if reset amp ahi3 amp strb syn then wait twoa 4 16 Wait States and Ready Generation Example 4 1 PLD Equations for Ready Generation Continued 0051 else idle 0052 0053 0054 state wait_one 0055 GOTO idle 0056 0057 state wait_twoa 0058 if reset_ then wait_twob 0059 else idle 0060 0061 state wait twob 0062 GOTO idle 0063 0064 eguations 0065 Irdy0 reset ahil amp strb0_ 4 one wait two waitb 0066 0067 page 0068 Test lst level global arbitration logic 0069 test_vectors 0070 h3 ahil ahi2 ahi3 strbO strb syn reset outstate rdy0 0071 c X X X X X L gt idle H A 0072 Gu L H L L L H gt wait one L 0073 c X X X X X L
55. r4 r3 r0 rl 55 r5 r2 ro r4 r3 r0 el r2 r3 r4 BI COS AR r3 TI r0 r1 BR SIN r3 AI TI AI TI BI r3 TR r5 r0 BR COS r2 AR TR BI SIN AI r4 AR TR BR r2 BI COS AR r5 TI r0 rl BR SIN r3 AI TI AI TI y L BI TR r5 r0 BR COS r2 AR TR BI SIN r3 AR TR BR r2 AI r4 TI r0 rl AI TI AR r3 AI TI BI r3 AI r4 inputp ar0 ar0 ar2 finputp2 arl arl ar3 sintp2 ar7 3 ir0 f g4m2 rce fill pipeline upper output lower output pointer to twiddle factors group offset Applications Oriented Operations 6 51 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued 1 butterfly w 0 addf ar0 arl r6 AR r6 AR BR subf arl ar0 17 BR r7 AR BR addf ar0 arl r4 AI r4 AI BI subf arl ir0 ar0 1r0 r5 BI r5 AI BI 2 butterfly w M 4 addf arl ar0O r3 AR r3 AR BI Lat ar 7 rl rl 0 for inner loop BI ldf arl r0 r0 BR for inner loop rptbd bflend Setup for loop bflend subf arl ir0 arO r2 BR r2 AR BI stf r6 ar2 AR r6 MI stf 17 ar3 BR r7 stf r5b ar3 ir0 BI r
56. the TMS320C4x is not driving the data bus; this eliminates a significant component of the output buffer current. Furthermore, in many typical cases, only a few address lines are changing, or the whole address bus is static. Under these conditions, an insignificant amount of supply current is consumed. Therefore, when no external writes are being performed, or when writes are performed infrequently, current due to output buffer circuitry can be ignored.

When external writes are being performed, the current required to supply the output buffers depends on several considerations:

- Data pattern being transferred
- Rate at which transfers are being made
- Number of wait states implemented (because wait states affect the rates at which bus signals switch)
- External bus DC and capacitive loading

External bus operations involve external writes to the device and constitute a major power supply current component. The power supply current for the external buses, made up of four components, is summarized in the following equation:

$$I_{xbus} = I_{base\,local} + I_{local} + I_{base\,global} + I_{global}$$

where $I_{base\,local}$ and $I_{base\,global}$ are the currents consumed by the internal driver and pin capacitance, $I_{local}$ is the local bus current component, and $I_{global}$ is the global bus current component.

The remainder of this section describes in detail the calculation of external bus current requirements.

Current Requirement of Output Driver Components

Note
57. wait state definition A 8 Index wait states 4 5 4 11 4 15 consecutive reads then write 4 6 consecutive writes then read 4 7 full speed 4 5 logic 4 14 memory device timing See memory device tim ing wait state generator definition A 8 workshops 10 5 write cycles RAM requirements 4 6 XDS510 emulator JTAG cable See emulation zero fill definition A 8 zero overhead subroutine call example 2 5 ZIF PGA socket handle activated diagram 10 8 tool activated diagram 10 7 Index 9 Index 10
58. 2.5.3 Single-Instruction Repeat

Example 2-9 shows an application of the repeat-single construct. In this example, the sum of the products of two arrays is computed. The arrays are not necessarily different. If the arrays are a(i) and b(i), and if each is of length N = 512, register R2 contains the following quantity:

$$a(1)\,b(1) + a(2)\,b(2) + \cdots + a(N)\,b(N)$$

The value of the repeat counter (RC) is specified to be 511 in the instruction.

Example 2-9. Loop Using Single Repeat

*       TITLE LOOP USING SINGLE REPEAT
*
        LDI     @ADDR1,AR0              ; AR0 points to array a(i)
        LDI     @ADDR2,AR1              ; AR1 points to array b(i)
*
        LDF     0.0,R2                  ; Initialize R2
*
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute first product
*
        RPTS    511                     ; Repeat 512 times
*
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute next product and
||      ADDF3   R1,R2,R2                ; accumulate the previous
*
        ADDF    R1,R2                   ; One final addition

2.6 Computed GOTOs to Select Subroutines at Runtime

Occasionally, it is convenient to select during runtime (not during assembly) the subroutine to be executed. The C4x's computed GOTO supports this selection. You can implement the computed GOTO by using the CALLcond instruction in the register-addressing mode. This instruction uses the contents of the register as the address of the call. Example 2-10 shows the case of a task controller. Ex
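The product-then-accumulate pipeline of the single-repeat loop can be modeled in a few lines of Python. This is only a behavioral sketch, not C4x code: the function name `pipelined_dot` and the variables `acc` and `prod` (standing in for R2 and R1) are illustrative assumptions. In this model the loop body runs `len(a) - 1` times, because one product is formed before the loop and one addition follows it.

```python
def pipelined_dot(a, b):
    """Sum of products, structured like a multiply/accumulate pipeline.

    Hypothetical model: `acc` plays the role of R2 and `prod` the role of R1.
    """
    acc = 0.0                     # accumulator is cleared first
    prod = a[0] * b[0]            # first product, computed before the loop
    for i in range(1, len(a)):
        # Both right-hand sides use the old values, mimicking the parallel
        # pair: compute the next product, accumulate the previous one.
        acc, prod = acc + prod, a[i] * b[i]
    return acc + prod             # one final addition


print(pipelined_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # prints 32.0
```

The tuple assignment is the design point: the accumulate step always sees the product from the previous iteration, which is why a final addition is needed after the loop.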
59. 10 dk Pe O O BO n O ONO Ol k GOOD Oo T 1041 10 2 10 3 11 1 11 2 11 3 11 4 11 5 11 6 11 7 11 8 11 9 11 10 11 11 11 12 11 18 Figures Local Global Bus Current Versus Transfer Rate and Wait States 9 14 Local Global Bus Current Versus Transfer Rate at Zero Wait States 9 15 DMA Bus Current Versus Clock Rate ees 9 16 Communication Port Current Versus Clock Rate 0c cece eee teens 9 17 Local Global Bus Current Versus Data Complexity 00 cc eeee eee eee 9 18 Pin Current Versus Output Load Capacitance 10 MHz 000ee eee eee 9 19 Current Versus Frequency and Supply Voltage 9 21 Change in Operating Temperature 50 2 eee eee 9 22 Load GUNES coords ars ED OR ecc Bee d b e O e qe coo d bed aye 9 25 Tool Activated ZIF Socket s 10 7 Handle Activated ZIF Socket cece teen ees 10 8 Device Nomenclature n 10 10 14 Pin Header Signals and Header Dimensions cece ee 11 2 JTAG Emulator Cable Pod Interface ees 11 4 JTAG Emulator Cable Pod Timings 2 eese 11 5 Target System Generated Test Clock 000 cece eee eee eee eee eee 11 10 Multiprocessor Connections lt en 11 11 Pod Connector Dimensions 0 ccc cc teen tne teens 11 12 14 Pin Connector Dimensions nte tne eens 11 13 Connecting a Secondary JTAG Scan Path to an SPL 0c eee ee eee 11 15 EMUO Configurati
60. 2 IRO IRO N FOR i 0 i lt K i LOOP OVER THE ROWS RPTBD DOT Setup multiply a row by a column Set loop counter LDF 0 0 R2 Initialize R2 MPYF3 ARO 1 AR1 1 RO m i 0 v 0 gt RO NOP FOR j 1 j lt N j DO DOT PRODUCT OVER COLUMNS MPYF3 ARO 1 AR1 1 RO m i j v j gt RO ADDF3 RO R2 R2 m i j 1 v 3 1 R2 gt R2 DBD AR3 ROWS counts the number of rows left

Example 6-11. Matrix Times a Vector Multiplication (Continued)

ADDF RO R2 last accumulate STF R2 AR2 1 result gt p i NOP AR1 IRO set AR1 to point to v 0 DELAYED BRANCH HAPPENS HERE RETURN SEQUENCE RETS return end end

6.5 Fast Fourier Transforms (FFTs)

Fourier transforms are an important tool often used in digital signal processing systems. The transform converts information from the time domain to the frequency domain. The inverse Fourier transform converts information back to the time domain from the frequency domain. Implementations of Fourier transforms that are computationally efficient are known as fast Fourier transforms (FFTs). The theory of FFTs can be found in books such as DFT/FFT and Convolution Algorithms by C. S. Burrus and T. W. Parks (John Wiley, 1985) and Digital Signal Processing Applica
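As a reference point for the time-domain/frequency-domain relationship described above, the transform pair can be written directly from its definition. The sketch below is a naive O(N²) discrete Fourier transform in Python, not one of the FFT algorithms from this chapter; the function name `dft` and its `inverse` flag are illustrative assumptions. An FFT produces the same values with far fewer operations.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1           # the inverse uses the conjugate kernel
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)   # the 1/N scaling is applied on the inverse
    return out


signal = [1.0, 0.0, 0.0, 0.0]             # unit impulse in the time domain
spectrum = dft(signal)                    # flat spectrum in the frequency domain
recovered = dft(spectrum, inverse=True)   # back to the time domain
```

A unit impulse is a convenient check because its spectrum is constant and the inverse transform must return the original samples to within rounding error.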
61. 23 LSBs of x(frac) = 0.

2. If v < 0, then x(exp) = −v(exp) − 1 and x(man) = 2/v(man). For the special case in which the 10 MSBs of v(man) = 10.00000000b, then x(man) = −1 − 2^−8 = 10.11111111b. In both cases, the 23 LSBs of x(frac) = 0.

3. If v = 0 (v(exp) = −128), then x(exp) = 127 and x(man) = 01.1111111111111111111111111111111b. In other words, if v = 0, then x becomes the largest positive number representable in the extended-precision floating-point format. The overflow flag (V) is set to 1.

4. If v(exp) = 127, then x(exp) = −128 and x(man) = 0. The zero flag (Z) is set to 1.

The Newton-Raphson algorithm is:

$$x[n+1] = x[n]\,(2.0 - v\,x[n])$$

In this algorithm, v is the number for which the reciprocal is desired. x[0] is the seed for the algorithm and is given by RCPF. At every iteration of the algorithm, the number of bits of accuracy in the mantissa doubles. Using RCPF, accuracy starts at eight bits. With one iteration, accuracy increases to 16 bits in the mantissa, and with the second iteration, accuracy increases to 32 bits in the mantissa. Example 3-8 shows the program for implementing this algorithm on the C4x.

Example 3-8. Inverse of a Floating-Point Number With 32-Bit Mantissa Accuracy
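The Newton-Raphson refinement described above is easy to try numerically. The following Python sketch is an illustration only: the function name `reciprocal` and the hand-picked seed are assumptions standing in for the RCPF estimate, and Python floats are used instead of the C4x floating-point format. It shows how the relative error is squared on every pass, which is the doubling of accurate bits described in the text.

```python
def reciprocal(v, seed, iterations=2):
    """Refine an estimate of 1/v with x[n+1] = x[n] * (2.0 - v * x[n])."""
    x = seed
    for _ in range(iterations):
        x = x * (2.0 - v * x)
    return x


v = 3.0
seed = 0.33                       # hypothetical coarse estimate of 1/3 (about 1% error)
for n in range(3):
    estimate = reciprocal(v, seed, iterations=n)
    print(n, estimate, abs(1.0 - v * estimate))   # relative error after n iterations
```

With a seed that is about 1% off, the relative error falls to roughly 1e-4 after one iteration and 1e-8 after two, since each step replaces the error e with e squared.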
62. 3 and each set of d values i e d i n i 2 0 N 1 should begin at an address that is a multiple of 4 the last two bits zero as stated in the case of a single biquad Applications Oriented Operations 6 11 FIR IIR and Adaptive Filters Example 6 7 IIR Filter N gt 1 Biquads i TITLE IIR FILTER N BIQUADS SUBROUTINE IIR2 EQUATIONS y 0 n x n FOR i 0 i lt N i d d i n a2 i d i n 2 al i d i n 1 y i 1 n y i n b2 i d i n 2 bl i d i n 1 bO i d i n 3 y n y N 1 n TYPICAL CALLING SEQUENCE load R2 load ARO load ARI load IRO LAJU IIR2 load IR1 load BK load RC ARGUMENT ASSIGNMENT ARGUMENT FUNCTION sto pe SSC Area Dr ro rt R2 INPUT SAMPLE x n ARO ADDRESS OF FILTER COEFFICIENTS a2 0 AR1 ADDRESS OF DELAY NODE VALUES d 0 n 2 BK BK 3 IRO IRO 4 IR1 IR1 4 N 4 RC NUMBER OF BIOUADS N 2 REGISTERS USED AS INPUT R2 ARO AR1 IRO IR1 BK RC REGISTERS MODIFIED RO R1 R2 ARO AR1 RC REGISTERS CONTAINING RESULT RO BENCHMARKS CYCLES 2 6N not including subroutine overhead E WORDS 15 not including subroutine overhead global IIR2 IIR2 MPYF3 ARO AR1 RO a2 0 d 0 n 2 RO MPYF3 ARO 1 AR1 1
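The cascade equations listed in the comment header above can be exercised with a short Python model. This is a behavioral sketch under stated assumptions, not a translation of the IIR2 routine: the function name `iir_cascade` and the tuple layout `(a1, a2, b0, b1, b2)` are illustrative choices, the first biquad is assumed to take x(n) as its input, and the feedback terms are added exactly as the equations are written. Circular addressing and memory alignment are not modeled.

```python
def iir_cascade(x, stages):
    """Run samples through cascaded biquads.

    stages: list of (a1, a2, b0, b1, b2) tuples, one per biquad (hypothetical layout).
    """
    delays = [[0.0, 0.0] for _ in stages]    # [d(i,n-1), d(i,n-2)] for each biquad
    out = []
    for sample in x:
        y = sample                           # input to the first biquad
        for (a1, a2, b0, b1, b2), d in zip(stages, delays):
            d0 = a2 * d[1] + a1 * d[0] + y           # d(i,n)
            y = b2 * d[1] + b1 * d[0] + b0 * d0      # y(i,n), fed to the next biquad
            d[1], d[0] = d[0], d0                    # age the delay line
        out.append(y)                        # output of the last biquad
    return out


# One stage with a single feedback coefficient: impulse response 1, 0.5, 0.25
print(iir_cascade([1.0, 0.0, 0.0], [(0.5, 0.0, 1.0, 0.0, 0.0)]))  # prints [1.0, 0.5, 0.25]
```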
63. 3 1 Provides examples for performing logical and arithmetic operations 3d Bit Manipulation zx temene ii ERREUR GERE diana ape DR RR S seals 3 2 9 2 JBIOCK MOVES z ndn Ree ette ebbe dal dy ies uh desides eL ba treu 3 3 39 3 Byte and Half Word Manipulation 2 eens 3 4 3 4 Bit Reversed Addressing cece eee eee eet n 3 6 3 4 44 CPU Bit Reversed Addressing 00 0 c cece eee e eee eee 3 6 3 4 2 DMA Bit Reversed Addressing cece eee eee teen eens 3 7 3 5 Integer and Floating Point Division 2 2 II 3 9 3 5 4 Integer Division 00 6 eet aE a RE 3 9 3 5 2 Computation of Floating Point Inverse and Division 3 12 3 6 Calculating a Square Root 20 cece aeaaea 3 15 3 7 Extended Precision Arithmetic 0 0 ec eee meh 3 17 3 8 Floating Point Format Conversion IEEE to From C4x 0c eee 3 19 Memory Interfacing lt 4 44 eee eee 4 1 Provides examples for TMS320C4x System Configuration Memory Interfaces and Reset 4 1 System Configuration 0 0 eee e 4 2 42 External Interfacing ee E 4 3 4 3 Global and Local Bus Interfaces 2 sse 4 4 4 4 Zero Wait State Interfacing to RAMS 2 esser 4 5 4 4 1 Consecutive Reads Followed by a Write Interface Timing 4 6 4 4 2 Consecutive Writes Followed by a Read Interface Timing 4 7 4 4 8 RAM Interface Using One
64. 6 12 to 6 15 See also IIR filters FIR filter adaptive 6 15 benchmarks 6 8 FIR filters 6 7 6 14 circular addressing 6 7 example 6 7 features 6 7 FIX instruction 3 9 FLOAT instruction 3 9 floating point conversion to from IEEE 3 19 formats 3 19 IEEE 3 20 pop and push 2 8 floating point reciprocal 3 12 example 3 16 floating point division 3 12 floating point number inverse example 3 14 Index 4 formats floating point 3 19 forward lattice filter example 6 19 FRIEEE instruction 3 19 fully connected network 8 19 GIE 2 11 2 13 globalbus 4 3 control signals 4 11 global memory interface See memory interface half word manipulation 3 4 hardware interrupt definition A 3 header 14 pin 11 2 dimensions 14 pin 11 2 hexagonal grid 8 19 hit definition A 3 hotline 10 3 TACK definition A 4 ICFULL interrupt example 8 2 ICRDY communication port 7 7 ICRDY interrupt example 8 2 IEEE 1149 1 specification bus slave device rules 11 3 IEEE Customer Service address 11 3 IEEE standard 11 3 IIE See internal interrupt enable register IIF See IIOF flag register IOF flag register IIF 7 5 definition A 4 IIR filters 6 7 6 9 6 9 benchmarks 6 10 6 12 to 6 15 index registers definition A 4 initialization boot asm 1 9 initialization routine 1 6 input port 8 16 integer division 3 9 example 3 11 interface SRAM 4 8 two strobes 4 10 interfaces external See external interfacing parallel processing 8 18 shared bus
65. Figure 9-11. Current Versus Frequency and Supply Voltage

[Figure: incremental I_DD (mA) versus operating frequency (MHz), with curves for V_DD = 5.5 V and V_DD = 5.25 V]

Power supply current consumption does not vary significantly with operating temperature. However, you can use a scale factor of 2% normalized I_DD per 50°C change in operating temperature to derate current within the specified range noted in the TMS320C4x data sheet.

Figure 9-12. Change in Operating Temperature (°C)

[Figure: normalized I_DD versus operating temperature (°C)]

This temperature dependence is shown graphically in Figure 9-12. Note that a temperature scale factor of 1.0 corresponds to current values at 25°C, which is the temperature at which all other references in the document are made.

9.5.3 Design Equation

The procedure for determining the power supply current requirement can be summarized in the following equation:

$$I_{total} = (I_{qidle} + I_{iops} + I_{ibus} + I_{xbusglobal} + I_{xbuslocal} + I_{DMA} + I_{cp}) \times F \times V \times T$$

where F is a scale factor for frequency, V is a scale factor for supply voltage, and T is a scale factor for operating temperature.

Table 9-2 describes the symbols used in the power supply current equation and gives the value and the number from which the value is obtained.
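The design equation is a sum of current components multiplied by three scale factors, which the following Python sketch makes concrete. The function name, the dictionary keys, and every numeric value are hypothetical placeholders chosen for illustration; real component currents and scale factors must come from the figures and tables of this chapter and from the data sheet.

```python
def total_supply_current(components_ma, f_scale=1.0, v_scale=1.0, t_scale=1.0):
    """Sum the component currents (mA) and apply the F, V and T scale factors."""
    return sum(components_ma.values()) * f_scale * v_scale * t_scale


# Placeholder numbers only, NOT data-sheet values.
components = {
    "quiescent": 100.0,
    "internal_ops": 50.0,
    "internal_bus": 20.0,
    "xbus_global": 30.0,
    "xbus_local": 30.0,
    "dma": 10.0,
    "comm_ports": 10.0,
}
print(total_supply_current(components, f_scale=1.0, v_scale=1.0, t_scale=1.02))
```

Keeping the scale factors as separate arguments mirrors the equation: the component sum is evaluated at the reference conditions, and frequency, voltage, and temperature are applied afterwards.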
66. Chapter 2 Program Control Chapter 1 Processor Initialization Chapter 2 Program Control Chapter 5 Programming Tips Chapter 4 Memory Interfacing Chapter 11 XDS510 Emulator Design Considerations

This document uses the following conventions:

- Program listings, program examples, file names, and symbol names are shown in a special font. Examples use a bold version of the special font for emphasis. Here is a sample program listing segment:

ARO RO ARO RO AR0 1 ARO 1 RO LOOP1 RPTB MAX CMPF MAX LDF LT B NEXT LOOP2 RPTB MIN CMPF MIN LDF LT NEXT Compare number to the maximum If greater this is a new max RO Compare number to the minimum If smaller this is new minimum

- Throughout this book, MSB indicates the most significant bit and LSB indicates the least significant bit. MS indicates the most significant byte and LS indicates the least significant byte.

Information About Cautions and Warnings

This book may contain cautions and warnings.

This is an example of a caution statement. A caution statement describes a situation that could potentially damage your software or equipment.

This is an example of a warning statement. A warning statement describes a situation that could potentially cause harm to you.

The information in a caution or a warning is provided for your protection. Please read each caution and warning carefully.

Related Documentation From Texas I
67. D D D D O E You need to allocate the section addresses using a linker command file see the TMS320 Floating Point DSP Assembly Language Tools User s Guide book for more information about linker command files as shown in Example 1 2 How to Initialize the Processor Example 1 1 Processor Initialization Example E E r T Fr F Fr Fr reset word init RESET vector _myvect sect myvect Named section for int vectors Space2 Reserved space word tintO Timer 0 ISR address _mytrap sect mytrap named section for trap vectors _mystack usect mystack 500 reserve 500 locations for Stack text stacka word mystack address of mystack section ivta word _myvect address of myvect section tvta word _mytrap address of mytrap section ieval word 1 IE register value gctrl word target board specific lctrl word target board specific mctrla word 100000h address of the global memory init Create Reset Vector Sect rst sect Named section for RESET vector Create Interrupt Vector Table Create Trap Vector Table word trap0 Trap 0 subroutine address Create Stack control register Initialize the DP Register ldp stacka Set Expansion Register IVTP LDI ivta ARO iDPE ARO IVTP T Set Expansion Register TVTP LDI tvta ARO iDPE ARO TVTP
68. Data Into Four-Byte-Wide Data Array

*       TITLE USE OF UNPACKING 32-BIT DATA INTO FOUR-BYTE-WIDE DATA ARRAY
*
*       THIS EXAMPLE ASSUMES THAT THE 32-BIT DATA CONTAINS FOUR 8-BIT UNSIGNED DATA
*
        LDI     size-1,RC               ; Load array size
        LDI     @input_adr,AR0          ; Load input address
        LDI     @array1,AR1             ; Load output data array 1 address
        RPTBD   UNPACK
        LDI     @array2,AR2             ; Load output data array 2 address
        LDI     @array3,AR3             ; Load output data array 3 address
        LDI     @array4,AR4             ; Load output data array 4 address
*       >>>>>>>>>>>>>>>> Loop starts here
        LBU0    *AR0,R8                 ; Unpack first byte
        STI     R8,*AR1++(1)
        LBU1    *AR0,R8                 ; Unpack second byte
        STI     R8,*AR2++(1)
        LBU2    *AR0,R8                 ; Unpack third byte
        STI     R8,*AR3++(1)
        LBU3    *AR0++(1),R8            ; Unpack fourth byte
UNPACK  STI     R8,*AR4++(1)

3.4 Bit-Reversed Addressing

The C4x can implement fast Fourier transforms (FFTs) with bit-reversed addressing. If the data to be transformed is in the correct order, the final result of the FFT is scrambled in bit-reversed order. To recover the frequency-domain data in the correct order, certain memory locations must be swapped. The bit-reversed addressing mode makes swapping unnecessary. The next time data is accessed, the access is bit-reversed rather than sequential. In C4x this bit reverse
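The unpacking loop above splits each 32-bit word into four unsigned bytes and appends each byte to its own output array. A minimal Python model of that data movement follows; the function name `unpack_bytes` is an illustrative assumption, and byte 0 is taken to be the least significant byte of the word.

```python
def unpack_bytes(words):
    """Split each 32-bit word into four unsigned bytes, one output array per byte position."""
    arrays = [[], [], [], []]
    for word in words:
        for k in range(4):
            # Byte k occupies bits 8k+7 .. 8k of the word.
            arrays[k].append((word >> (8 * k)) & 0xFF)
    return arrays


array1, array2, array3, array4 = unpack_bytes([0x44332211, 0x88776655])
print([hex(v) for v in array1])   # prints ['0x11', '0x55']
print([hex(v) for v in array4])   # prints ['0x44', '0x88']
```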
69. Example 3-12 shows the implementation of a 32-bit × 32-bit multiplication.

Example 3-12. 32-Bit by 32-Bit Multiplication

*       TITLE 32-BIT x 32-BIT MULTIPLICATION
*
*       MULTIPLIES 2 32-BIT NUMBERS, PRODUCING A 64-BIT RESULT. THE TWO
*       NUMBERS (R0 AND R1) ARE MULTIPLIED, RESULTING IN W = [R3][R2].
*
*       R0 x R1 = [R3][R2]
*
        MPYI3   R0,R1,R2
        MPYSHI3 R0,R1,R3

3.8 Floating-Point Format Conversion: IEEE to/From C4x

In fixed-point arithmetic, the binary point that separates the integer from the fractional part of the number is fixed at a certain location. Therefore, if the binary point of a 32-bit number is fixed after the most significant bit (which is also the sign bit), only a fractional number (a number with an absolute value less than 1) can be represented. In other words, there is a number with 31 fractional bits. All operations assume that the binary point is fixed at this location. The fixed-point system, although simple to implement in hardware, imposes limitations on the dynamic range of the represented number. This causes scaling problems in many applications. You can avoid this difficulty by using floating-point numbers.

A floating-point number consists of a mantissa m multiplied by base b raised to an exponent e:

$$m \times b^{e}$$

In current hardware implementations, the mantissa is typically a normalized num
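The split of a 64-bit product into two 32-bit registers in Example 3-12 can be checked with ordinary integer arithmetic. The Python sketch below is an illustration under assumptions: the function name `mul32x32` is hypothetical, the operands are treated as signed 32-bit integers, and the two halves are returned as unsigned 32-bit bit patterns (upper word first, corresponding to R3 and R2 in the example).

```python
MASK32 = 0xFFFFFFFF


def mul32x32(a, b):
    """Return (high, low) 32-bit halves of the signed 64-bit product a * b."""
    product = a * b                    # Python integers do not overflow
    low = product & MASK32             # lower 32 bits of the product
    high = (product >> 32) & MASK32    # upper 32 bits of the product
    return high, low


high, low = mul32x32(0x10000, 0x10000)   # 2**16 * 2**16 == 2**32
print(hex(high), hex(low))               # prints 0x1 0x0
```

Masking after an arithmetic right shift yields the two's-complement bit pattern for negative products, so `mul32x32(-1, 1)` returns `0xFFFFFFFF` in both halves.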
70. Figure 4-5 shows the C4x's local bus interfaced to eight Integrated Device Technology IDT71258 20-ns 64K × 4-bit CMOS static RAMs with zero wait states, using chip-enable-controlled write cycles. The SRAMs are arranged to implement the first 64K 32-bit words in external memory, located at addresses 00000h through 0FFFFh (internal ROM is assumed to be disabled). If these 64K words of SRAM are the only memory controlled by LSTRB0, the LSTRB ACTIVE field of the local memory interface control register (LMICR) should be set to its minimum value of 01111₂, allowing LSTRB0 to be active for only the first 64K words of the C4x's memory space. In addition, if this memory is the only memory interfaced to LSTRB0, LSTRB0 requires only one page, and the PAGESIZE field of the LMICR should be set to 01111₂. Also note that in Figure 4-5, the LRDY0 input is tied low, selecting zero wait states for all LSTRB0 accesses on the local bus. With all of the zero-wait-state memory controlled by LSTRB0, LSTRB1 can be used to control accesses to slower read-only memory devices or other types of memory.

Figure 4-5. C4x Interface to Eight Zero-Wait-State SRAMs

[Figure: C4x local bus connected to IDT71258 SRAMs]

In this circuit implementation, no external logic is necessary to interface the C4x to the memory device. Typically, memory devices must be held inactive (CS inactive) d
71. For registration information, pricing, or to enroll, call 800 336 5236, ext. 3904.

10.2 Sockets

Table 10-1 contains available sockets that accept the 325-pin C40 pin grid array (PGA) and the 304-pin C44 Plastic Quad Flatpack (PQF). Table 10-2 lists the phone numbers of the manufacturers listed in Table 10-1.

Table 10-1. Sockets that Accept the 325-pin C40 and the 304-pin C44

Manufacturer                Type                                Part Number
Advanced Interconnections   C40 wire-wrap socket                3919
AMP                         C40 tool-activated ZIF socket       AMP 382533 9
AMP                         Actuation tool for AMP382533 9      AMP 854234 1
AMP                         C40 handle-activated ZIF socket     AMP 382320 9
AMP                         C40 PGA ZIF                         AMP 55291 2
Emulation Technology        C40 logic analyzer socket           BZ6 325 H6A35 TMS320C40Z
Emulation Technology        C40 wire-wrap socket                AB 325 H6A35Z P13 M
Mark Eyelet                 C40 wire-wrap socket                MP325 73311D16
Yamaichi                    TMS320C44 PDB Socket, 304 pins      ic201 3044 004

Table 10-2. Manufacturer Phone Numbers

Manufacturer                Phone Number
AMP                         717 564 0100
Advanced Interconnections   401 823 5200
Emulation Technology        408 982 0660
Mark Eyelet                 203 756 8847
Yamaichi                    408 456 0797

The remainder of this section describes two available sockets that accept the C4x pin grid array (PGA). Both sockets feature zero insertion force (ZIF):

- A tool-activated ZIF socket (TAZ)
- A handle-activated ZIF socket (HAZ)

The sockets described
72. INLOP ADDI AR7 R10 ADDI BK ARO AR1 CMPI R9 R11 BZD SPCL ADDI BK AR1 AR2 ADDI BK AR2 AR3 SUBI3 1 R8 RC LDI R10 AR4 ADDI SINTAB1 AR4 ADDI AR4 R10 AR5 SUBI 1 AR5 RPTBD BLK2 ADDI R10 AR5 AR6 SUBI 1 AR6 LDF AR2 R7 SECOND LOOP BLK2 ADDF R7 ARO R3 ADDF AR3 AR1 R5 ADDF R5 R3 R6 SUBF R7 ARO RA SUBF R5 R3 ADDF AR2 ARO R1 ADDF AR3 AR1 R5 MPYF R3 AR5 IR1 R6 HI STF R6 ARO ADDF R5 R1 RO SUBF AR2 ARO R2 SUBF R5 R1 MPYF R1 AR5 RO HI STF RO AR0 IRO SUBF RO R6 SUBF AR3 AR1 R5 MPYF R1 AR5 IR1 RO HI STF R6 AR1 MPYF R3 AR5 R6 ADDF RO R6 ADDF R5 R2 R1 SUBF R5 R2 SUBF AR3 AR1 R5 SUBF R5 R4 R3 ADDF R5 R4 MPYF R3 AR4 IR1 R6 HI STF R6 AR1 IRO MPYF R1 AR4 RO SUBF RO R6 MPYF R1 AR4 IR1 R6 HI STF R6 AR2 init IAl index Init loop counter for inner loop X I Y I pointer Increment inner loop counter IAl IA1 IE X T1 Y I1 pointer If LPCNT JT go to special butterfly X I2 Y I2 pointer X I3 Y I3 pointer RC should be one less than desired Create cosine index AR4 IA2 IA1 IA1 1 Setup loop BLK2 IA3 IA2 IA1 1 R7 Y 12 R3 Y 1 12 R5 Y 11 Y 13 R6 R3 R5 R4 Y I Y 12 R3 R3 R5 R1 X I X I2 R5 X I1 X 13 R6 R3 CO2 Y I R3 R5 RO R1 R5 R2 X I X 12 R1 R1 R5 RO R1 SI2 X I R1 R5 R6 R3 CO2 R1 SI2 R5 Y I1 Y I3 RO R1 C02 Y Il R3 CO2 R1
73. KKK KKK KKK KKK KKK KK KKK KKK KK KKK KKK KKK A AA KKK KK KKK KKK KKK KKK KKK KKK KKK DESCRIPTION Generic function to do a radix 2 FFT computation on the C40 The input data array is FFT SIZE long with only real data The output is stored in the same locations in place with real and imaginary points R and I as follows DEST_ADDR 0 gt R 0 R 1 R 2 R 3 R FFT SIZE 2 I FFT SIZE 2 1 I 2 DEST ADDR FFT SIZE 1 gt I 1 The program is based on the FORTRAN program in the paper by Sorensen et al June 1987 issue of Trans on ASSP Bit reversal is optionally implemented at the beginning of the function The sine cosine table for the twiddle factors is expected to be supplied in the following format SINE_TABLE 0 gt sin 0 2 pi FFT SIZI sin 1 2 pi FFT SIZI 1 E sin FFT_SIZE 2 2 2 pi FFT_SIZE SINE TABLE FFT SIZE 2 1 gt sin FFT SIZE 2 1 2 pi FFT SIZE NOTE The table is the first half period of a sine wave NOTES 1 Calling C program can be compiled with large or small model Both calling conventions methods stack or register for parameter passing are supported 2 Sections needed in linker command file ffttxt fft code fftdat fft data 3 The DEST ADDR must be aligned such that the first LOG SIZE bits are zero this is not checked by the program Caution DP initialized only once in the pr
74. MPYF3 STF 1 lt I lt N AR0 1 AR1 RO RO R2 R2 AR1 1 5 R4 R1 R1 ARO 1 tmuerr gt R1 h n N 1 i x n N 1 i gt RO ultiply and add operation x n N N 1 i tmuerr gt R1 Ri gt h n 1 N 1 1 1

Example 6-8. Adaptive FIR Filter (LMS Algorithm) (Continued)

LOOP end ADDF3 BUD ADDF3 STF end ARO 1 R1 R1 R11 RO R2 R0 R1 ARO 1 h n N 1 i x n N 1 i tmuerr gt R1 Delayed return Add last product ih n 0 x n tmuerr gt h n 1 0

6.3 Lattice Filters

The lattice form is an alternative way of implementing digital filters; it has applications in speech processing, spectral estimation, and other areas. In this discussion, the notation and terminology from speech processing applications are used.

If H(z) is the transfer function of a digital filter that has only poles, A(z) = 1/H(z) will be a filter having only zeros, and it will be called the inverse filter. The inverse lattice filter is shown in Figure 6-4. These equations describe the filter in mathematical terms:

$$f(i,n) = f(i-1,n) + k(i)\,b(i-1,n-1)$$

$$b(i,n) = b(i-1,n-1) + k(i)\,f(i-1,n)$$

Initial conditions: $f(0,n) = b(0,n) = x(n)$

Final conditions: $y(n) = f(p,n)$

In the above equation, f(i,n) is
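The inverse lattice recursion can be followed sample by sample in a short Python model. This is a sketch under stated assumptions rather than the routine from this manual: the function name `inverse_lattice` is hypothetical, the reflection coefficients k(1)..k(p) are passed as a list, both recursions use a plus sign, and all stored backward errors start at zero.

```python
def inverse_lattice(x, k):
    """All-zero (inverse) lattice filter with reflection coefficients k(1)..k(p)."""
    p = len(k)
    b_delayed = [0.0] * p        # b_delayed[i] holds b(i, n-1) for i = 0..p-1
    y = []
    for sample in x:
        f = sample               # f(0,n) = x(n)
        b = sample               # b(0,n) = x(n)
        for i in range(p):
            f_next = f + k[i] * b_delayed[i]    # f(i+1,n)
            b_next = b_delayed[i] + k[i] * f    # b(i+1,n)
            b_delayed[i] = b                    # keep b(i,n) for the next sample
            f, b = f_next, b_next
        y.append(f)              # y(n) = f(p,n)
    return y


print(inverse_lattice([1.0, 0.0, 0.0], [0.5]))  # prints [1.0, 0.5, 0.0]
```

With a single coefficient of 0.5 the impulse response is 1, 0.5, 0, which corresponds to A(z) = 1 + 0.5 z^-1, a filter with zeros only.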
75. N1 N2 CONT BRD LOOP Next FFT stage delayed LSH 2 BK N2 N2 4 LSH3 1 BK R9 ADDI 2 R9 JT N2 2 2 ENDB ck ck 0k ck KKK KKK KKK KKK KKK KKK A KA KKK KK AX KA KKK KKK KKK KKK KKK AKA AAA KA AKA KA AKA KA A KA KKK KKK BIT REVERSAL This bit reversal section assumes input and output in Re Im Re Im format ck ck ck ck ck 0k ck Ck ck A KA AAA KA A KA AXA A KA X A AXA AX KA KA AKA KA A KA XK AZ KA AKA ZA AKA AKA KA AKA KA A KA KKK ko ko LDI INPUTP ar CMPI GOUTPUTP ar0 BEOD INPLACE LDI OUTPUTP arl arl DST_ADDR LDI RFFTSIZ irO 1r0 FFT_SIZE SUBI 2 190 re CC FFT_SIZE 2 RPTBD bitrvl LDI 2 irl jirl 2 LDF ar0 1 r0 read first Im value OP LDF ar0 ir0 b r1 B STF r0 arl 1 bitrvl LDF ar0 1 r0 M STF rl arl irl BUD END LDF ar0 ir0 b rl STF rO arl 1 NOP STF rl arlINPLACE RPTBD BITRV2 NOP arl 2 OP ar0 1r0 b NOP CMPI arl ar0 BGEAT CONT2 Applications Oriented Operations 6 39 Fast Fourier Transforms FFTs Example 6 14 Complex Radix 4 DIF FFT Continued CONT2 BITRV2 END arl r0 ar rl r 0 ar rl arl skal i ar0 1 r0 tar el ar arl ar0 AR7 r r 0 1 1 1 2 ir0 b 0 1 Restore the register values and return 6 40 Fast Fourier Transforms FFTs 6 5 3 Faster Complex Radix 2 DIT FFT Example 6 1
76. Provided that STRB is included in chip-select decodes, this causes all devices selected by that STRB to be disabled during this period. The next page of devices is not enabled until STRB and PAGE go low again.

If the high-order address lines remain constant during a read cycle, the memory access time with page switching is the same as the memory access time without page switching. In addition, page switching is not required during writes, because these write cycles exhibit an inherent one-half H1 cycle setup of address information before STRB goes low. Thus, when you use page switching for read/write devices, a minimum of half of one H1 cycle of address setup is provided for all accesses outside a page boundary. Therefore, large amounts of memory can be implemented without wait states or extra hardware required for isolation between pages. Also note that access time for cycles during page switching is the same as that of cycles without page switching, and, accordingly, full-speed accesses may still be accomplished within each page.

The circuit shown in Figure 4-8 illustrates page switching with the CY7B185 15-ns 8K × 8 BiCMOS static RAM. This circuit implements 32K 32-bit words of memory with full-speed, zero-wait-state accesses within each page.

Figure 4-8. Page Switching for the CY7B185

[Figure: Bank 0, 4 x CY7B185]
77. R1 AR1 first and second FFT loops DA DA DA DA DDI DDI DDI DA SH UBI DF DF DDF3 UBF3 UBF3 WubBbbUuObrbb PPP Ze lt P Zo X I1 X I1 X I1 X I3 H CO B UDN H O V DEST_ADDR AR1 AR1 AR2 AR1 AR3 AR1 AR4 1 AR2 2 AR3 3 AR4 4 IRO GFFT SIZE RC 2 RC 2 RC AR2 RO AR3 R1 R1 AR4 R4 R1 AR4 IRO R5 RO AR1 R6 Petit X 12 X 12 X 14 So Se Se Se Se X I2 DestAddr Then move data X 13 X 13 RO R4 R5 R6 X I4 X I4 6 62 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued NE Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne ne npr pn APNPNNnNNHNNPHE PZ PNDHEP EE Br butt PTBD LOOP1_2 DDF3 RO AR14 4 IRO R7 DDF3 R7 R4 R2 UBF3 R4 R7 R3 DF AR2 IRO RO DF AR3 IRO R1 DDF 3 R1 AR4 R4 TF R3 AR3 IRO UBF3 R1 AR4 IR0 R5 TF R5 AR4 IRO UBF3 RO AR1 R6 TF R6 AR2 IRO DDF3 RO AR1 IRO R7 TF R2 AR1 IRO DDF3 R7 R4 R2 UBF3 R4 R7 R3 TF R3 AR3 TF R5 AR4 IRO TF R6 AR2 TF R2 AR1 IRO hird FFT Loop Rl gt Lu 30 Bl TZ 2 3 R2 gt _ I3 4 lt 5 R3 gt _ 14 6 7 Rl gt 8 9 NI DA DEST ADDR AR1 DA AR1 AR2 DA AR1 AR3 DDI 4 AR2 DDI 6 AR3 DA 8 IRO DI GFFT SIZE RC SH 3 RC UBI 2 RG PTBD LOOP3 A
78. R4 R3 ARO IR1 RO AR3 R4 R3 R4 AR3 R2 R2 AR1 R2 ARO IR1 R4 R3 AR2 R6 IR1 R4 R1 R3 R2 ARO R1 R3 AR4 IR1 R1 RO R4 AR1 IR1 R2 R4 AR3 IR1 AR5 AR1 RO FFT_SIZE RO INLOP AR2 IR1 R7 IR1 R7 RC l R5 LOG SIZI LOOP SOURCE ADDR AR1 1 IRO 1 R7 El RO Perform Third FFT loop NENE Ne Ne Ne Ne Mese Ne e Ne Ne e Ne e Me Ne e Ne Ne Ne Ne Ne Ne SNe e R3 X I4 X I3 R2 X I3 X I4 R3 X I4 X I3 R2 X I3 X I4 X 11 R4 R2 COS X I2 lt R3 R3 SIN R2 COS 4 F Rl R2 SIN X I4 lt R4 R3 COS R2 SIN R3 X I1 X I2 R2 X 11 X 12 Rl R3 SIN X 13 R4 X I4 RO R3 COS X I1 n R4 R2 COS X I2 lt 7 Get prepared for the next R3 R3 SIN R2 COS 4 R1 R2 SIN X I4 lt R4 R3 COS R2 SIN DUMMY X I3 LOOP BACK TO THE INNER LOOP DUMMY next stage if any left double step in sine table 6 78 Example 6 18 Real Inverse Radix 2 FFT Continued Fast Fourier Transforms FFTs Part A AR1 gt I1 O lt X 11 X I3 i d s AR2 I2 2 lt 2 X I2 i o ee H AR3 gt I3 4 lt XL XLS i oe EES gt AR4 gt __ 14 6 lt 2 X T4 i 7 H AR1 8 9 i i V LDA SOURCE_ADDR AR
79. SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. INCLUSION OF TI PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY AT THE CUSTOMER'S RISK.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, warranty, or endorsement thereof.

Copyright 1999, Texas Instruments Incorporated

Preface

Read This First

About This Manual

This user's guide serves as an applications reference book for the TMS320C40 and TMS320C44 digital signal processors (DSPs). Throughout the book, all references to the TMS320C4x apply to both devices (exceptions are noted). Specifically, this book complements the TMS320C4x User's Guide by providing information to assist managers and hardware/software engineers in application
80. SI2 R6 R3 SI2 R6 R1 CO2 R3 SI2 R1 R2 R5 R2 R2 R5 R5 X I1 X I3 R3 R4 R5 R4 R4 R5 R6 R3 CO1 X I1 R1 CO2 R3 SI2 RO R1 SI1 R6 R3 CO1 R1 SI1 R6 R1 CO1 Y I2 R3 CO1 R1 SI1 Me Ne Ne Ne Ne e Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne SNe e Applications Oriented Operations 6 37 Fast Fourier Transforms FFTs Example 6 14 Complex Radix 4 DIF FFT Continued PYF R3 AR4 RO ADDF RO R6 PYF R4 AR6 IR1 R6 STF R6 AR2 IRO PYF R2 AR6 RO SUBF RO R6 PYF R2 AR6 IR1 R6 STF R6 AR3 PYF R4 AR6 RO ADDF RO R6 BLK2 STF R6 AR3 IRO LDF AR2 R7 CMP I R11 BK BPD INLOP LDI R11 ARO ADDI QINPUTP ARO ADDI 2 R11 BRD CONT LSH 2 R8 LSH 2 AR7 LDI BK IRO SPECIAL BUTTERFLY FOR W J SPCL RPTBD BLK3 LSH 1 IR1 AR4 ADDI SINTAB AR4 LDF AR2 R7 SPCL LOOP BLK3 ADDF R7 ARO R1 ADDF AR2 ARO R3 SUBF AR2 ARO R4 ADDF AR3 AR1 R5 SUBF R1 R5 R6 ADDF R5 R1 ADDF AR3 AR1 R5 SUBF R5 R3 R0 ADDF R5 R3 SUBF R7 ARO R2 STF R3 ARO LDF AR3 R7 STF RI ARO IRO SUBF AR3 AR1 R3 SUBF R7 AR1 R1 STF R6 AR1 ADDF R3 R2 R5 SUBF R2 R3 R2 SUBF R1 R4 R3 ADDF R1 R4 SUBF R5 R3 R1 MPYF R1 AR4 R1 STF RO AR1 IRO ADDF R5 R3 MPYF R3 AR4 R3 STF R1 AR2 SUBF R4 R2 R1 MPYF R1 AR4 R1 RO R3 S11 R6 R1 CO1 R3 SI1 R6 RA CO3 X 12 R
81. TAB tape encapsulated 324 pad TAB tape bare die 324 pad TAB tape encapsulated 324 pad TAB tape encapsulated 324 pad TAB tape bare die 324 pad TAB tape encapsulated Known Good Die Known Good Die Known Good Die Known Good Die Development Support and Part Order Information 10 11 Part Order Information Table 10 4 Development Support Tools Part Numbers Development Tool C Compiler Assembler Linker C Compiler Assembler Linker C Compiler Assembler Linker Assembler Linker Simulator C language Simulator C language Tartan Floating Point Library Tartan Floating Point Library Digital Filter Design Package C Source Debugger Conversion Software C Source Debugger Conversion Software Emulation Porting Kit C3x C4x Tartan C C Compiler Assembler Linker C3x C4x Tartan C C Compiler Assembler Linker C3x C4x Tartan C C Compiler Assembler Linker Simulator C3x C4x Tartan C C Compiler Assembler Linker Simulator C3x C4x Tartan C C XDS510 Debugger C3x C4x Tartan C C XDS510 Debugger XDS510 Emulatort XDS510WS Emulator PC Sparc JTAG Emulation Cable Parallel Processing Development System Reguires licensing agreement Part Number TMDS3243855 02 TMDS3243255 08 TMDS3243555 08 TMDS3243850 02 TMDS3244851 02 TMDS3244551 09 320FLO PC C40 320FLO SUN C40 DFDP TMDS3240140 TMDS3240640 TMDX3240040 TAR CCM PC TAR CCM SP TAR SIM PC TAR SIM SP TAR DEG XDS PC TAR DEG XDS SP
82. TCK cycle setup to the next device's TDI signal. This type of timing scheme minimizes race conditions that would occur if both TDO and TDI were timed from the same TCK edge. The penalty for this timing scheme is a reduced TCK frequency.

The IEEE 1149.1 specification does not provide rules for bus master (emulator) devices. Instead, it states that it expects a bus master to provide bus slave compatible timings. The XDS510 provides timings that meet the bus slave rules.

11.3 IEEE 1149.1 Standard

For more information concerning the IEEE 1149.1 standard, contact IEEE Customer Service:

Address: IEEE Customer Service, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331
Phone: (800) 678-IEEE in the US and Canada; (908) 981-1393 outside the US and Canada
FAX: (908) 981-9667
Telex: 833233

XDS510 Emulator Design Considerations 11-3

11.4 JTAG Emulator Cable Pod Logic

Figure 11-2 shows a portion of the emulator cable pod. These are the functional features of the pod:

- Signals TDO and TCK_RET can be parallel-terminated inside the pod if required by the application. By default, these signals are not terminated.
- Signal TCK is driven with a 74LVT240 device. Because of the high current drive (32 mA IOL/IOH), this signal can be parallel-terminated. If TCK is tied to TCK_RET, then you can use the parallel terminator in the pod.
- Signals TMS and TDI can be generated from the falling edge of TCK_RE
83. Table 6-2. FFT Timing Benchmarks (Cycles)

Columns: Points; Complex Radix-2 (Example 6-12); Complex Radix-4 (Example 6-14); Complex Radix-2 (Example 6-15); Real Forward (Example 6-17); Real Inverse (Example 6-18)

64    22904   17451   14251   7521    10121
128   51791   3336    1683    2269
256   115881  9216t   76551   38141   50861
512   256771  17302   86331   113431
1024  56411   47237t  38945t  19404t  25120t

Assumptions:
† The data is in on-chip RAM1. Program (ffttxt) and reserved data (fftdat) are in on-chip RAM0. The sine/cosine table is in on-chip RAM0. Bit reversing is not considered. The cache is enabled.
‡ The data is in on-chip RAM. Program (ffttxt) and reserved data (fftdat) are in local/global bus RAM with 0 wait states. Bit reversing is not considered. The sine/cosine table is on the global/local bus. The cache is enabled.

Chapter 7
Programming the DMA Coprocessor

The C4x DMA (direct memory access) coprocessor is a C4x peripheral module. With its six channels, the DMA maximizes sustained CPU performance by relieving the CPU of burdensome I/O. Any of the six DMA channels can transfer data to and from anywhere in the C4x's memory map for maximum flexibility.

Topic                                              Page
7.1 Hints for DMA Programming                      7-2
7.2 When a DMA Channel Finishes a Transfer         7-3
7.3 DMA Assembly Programming Examples              7-4
7.4 DMA C-Programming Examples                     7-9
84. Table 9-2. Current Equation Typical Values (Fc = 40 MHz)

Symbol: values. Note. Reference.
Iqidle2: 20 µA, 50 µA. Idle2, shutdown. Figure 9-2.
Iqidle: min 130 mA, typical 130 mA, max 130 mA. Internal idle. Figure 9-2.
Iiops: min 60 mA, typical 60 mA, max 60 mA. Branch to self, internal. Figure 9-2.
Iibus: min 0 mA, typical 50 mA, max 190 mA. Data dependent. Figure 9-3, Figure 9-4.
Ixbusglobal: min 0 mA, typical 50 mA, max 280 mA. Data and Cload dependent. Figure 9-5, Figure 9-6, Figure 9-9.
Ixbuslocal: min 0 mA, typical 50 mA, max 280 mA. Data and Cload dependent. Figure 9-5, Figure 9-6, Figure 9-9.
IDMA: min 0 mA, typical 50 mA, max 300 mA. Data and source/destination dependent. Figure 9-7.
ICP: min 0 mA, typical 50 mA, max 250 mA. Data dependent. Figure 9-8.

Notes:
1) All values are scaled by frequency and supply voltage. The nominal tested frequency is 40 MHz.
2) Externally driven signals are capacitive-load dependent.
3) It is unrealistic to add all of the maximum values, since it is impossible to run at those levels.

9.5.4 Average Current

Over the course of an entire program, some segments typically exhibit significantly different levels of current for different durations. For example, a program may spend 80% of its time performing internal operations and draw a current of 250 mA; it may spend the remaining 20% of its time performing writes at full speed to both buses and drawing 790 mA. While knowledge of peak current levels is important in order to establish power supply requirements, some applications require information about average current. This is particularly
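The 80/20 example above amounts to a time-weighted average of the segment currents. The following is a minimal sketch of that arithmetic, not taken from the manual; the `segment_t` type and the function name are hypothetical and introduced only for illustration.

```c
#include <stddef.h>

/* One program segment (hypothetical type for this sketch): the fraction
 * of run time spent in the segment and the supply current drawn in it. */
typedef struct {
    double time_fraction;  /* 0.0 to 1.0; fractions are assumed to sum to 1.0 */
    double current_ma;     /* supply current during the segment, in mA */
} segment_t;

/* Time-weighted average supply current in mA. */
double average_current_ma(const segment_t *seg, size_t count)
{
    double avg = 0.0;
    for (size_t i = 0; i < count; i++) {
        avg += seg[i].time_fraction * seg[i].current_ma;
    }
    return avg;
}
```

With the two segments quoted in the text (0.8 at 250 mA and 0.2 at 790 mA), the weighted sum is 0.8 x 250 + 0.2 x 790 = 358 mA.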
85. Table With Twiddle Factors for a 64-Point FFT  6-32
Complex Radix-4 DIF FFT  6-34
Faster Version Complex Radix-2 DIT FFT  6-42
Bit-Reversed Sine Table  6-55
Real Forward Radix-2 FFT  6-56
Real Inverse Radix-2 FFT  6-73
Array Initialization With DMA  7-4
DMA Transfer With Communication Port ICRDY Synchronization  7-5
DMA Split-Mode Transfer With External-Interrupt Synchronization  7-6
DMA Autoinitialization With Communication Port ICRDY  7-7
Single-Interrupt-Driven DMA Transfer  7-8
Unified-Mode DMA Using Read Sync  7-10
Unified-Mode DMA Using Autoinitialization (Method 1)  7-11
Unified-Mode DMA Using Autoinitialization (Method 2)  7-12
Split-Mode Auxiliary DMA Using Read Sync  7-13
Split-Mode Auxiliary and Primary Channel DMA  7-14
Split-Mode DMA Using Autoinitialization  7-15
Include File for All C Examples (dma.h)  7-17
Read Data from Communication Port With CPU ICFULL Interrupt  8-3
Write Data to Communication Port With Polling Method  8-4

Chapter 1
Processor Initialization

Before you execute a DSP algorithm, it is necessary to initialize the
86. Timing for Read Operations Using Bank Switching  4-20
C4x Shared/Distributed-Memory Networks  4-21
Data Memory Organization for an FIR Filter  6-7
Data Memory Organization for a Single Biquad  6-9
Data Memory Organization for N Biquads  6-11
Structure of the Inverse Lattice Filter  6-17
Data Memory Organization for Inverse Lattice Filters  6-18
Structure of the Forward Lattice Filter  6-19
Data Memory Organization for Matrix-Vector Multiplication  6-21
Impedance Matching for C4x Communication-Port Design  8-5
Better Commport Signal Splitter  8-11
Improved Interface Circuit  8-12
A C32 to C4x Interface  8-14
A Token Forcer Circuit (Output)  8-15
Communication-Port Driver Circuit (Input)  8-16
CSTRB Shortener Circuit  8-17
C4x Parallel Connectivity Networks  8-18
Message Broadcasting by One C4x to Many C4x Devices  8-21
Test Setup  9-6
Internal and Quiescent Current Components  9-8
Internal Bus Current Versus Transfer Rate  9-9
Internal Bus Current Versus Data Complexity Derating Curve  9
87. a portion of the current used to calculate system power dissipation due to VDD at 5 volts. Power dissipation is defined as

P = V x I

where P is power, V is voltage, and I is current. If device outputs are driving any DC load to a logic high level, only a minor contribution is made to power dissipation, because CMOS outputs typically drive to a level within a few tenths of a volt of the power supply rails. If this is the case, subtract these current components out of the TMS320C4x supply current value and calculate their contribution to system power dissipation separately (see Figure 9-13).

Figure 9-13. Load Currents (diagram: with the device output driven high, IDD = IOUT and ISS = 0; with the device output driven low, IDD = 0 and ISS = IOUT)

Furthermore, external loads draw supply current (IDD) only when outputs are driven high, because when outputs are in the logic zero state, the device is sinking current through VSS, which is supplied from an external source. Therefore, the power dissipation due to this component will not contribute through IDD but will contribute to power dissipation with a magnitude of

P = VOL x IOL

where VOL is the low-level output voltage and IOL is the current being sunk by the output, as shown in Figure 9-13. The power dissipation component due to outputs being driven low should be calculated and added to the total power dissipation. When
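The two products in this passage can be sketched as follows. This is an illustrative sketch only: the function names and the idea of multiplying by a count of identical low-driven outputs are assumptions, and any voltages or currents passed in are the caller's own figures, not data-sheet values.

```c
/* P = V x I: power in watts for a voltage in volts and a current in amperes. */
double power_w(double volts, double amps)
{
    return volts * amps;
}

/* Power dissipated in the device by outputs that are driven low.
 * Each such output sinks iol_amps at an output voltage of vol_volts.
 * Assumption for this sketch: all n_outputs see the same VOL and IOL. */
double low_output_power_w(double vol_volts, double iol_amps, int n_outputs)
{
    return power_w(vol_volts, iol_amps) * (double)n_outputs;
}

/* Total: supply-side dissipation (VDD x IDD) plus the low-output component. */
double total_power_w(double vdd_volts, double idd_amps,
                     double vol_volts, double iol_amps, int n_outputs)
{
    return power_w(vdd_volts, idd_amps)
         + low_output_power_w(vol_volts, iol_amps, n_outputs);
}
```

For instance, with hypothetical values of 5 V and 0.35 A on the supply side and eight outputs each sinking 2 mA at 0.4 V, the low-output component adds 6.4 mW to the 1.75 W computed from VDD x IDD.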
88. address generation logic: Logic circuitry that generates the addresses for data memory reads and writes. This circuitry can generate one address per machine cycle. See also program address generation logic.

data-page pointer: A seven-bit register used as the seven MSBs in addresses generated using direct addressing.

decode phase: The phase of the pipeline in which the instruction is decoded.

DIE: See DMA interrupt enable register.

Glossary

DMA coprocessor: A peripheral that transfers the contents of memory locations independently of the processor (except for initialization).

DMA controller: See DMA coprocessor.

DMA interrupt enable register (DIE): A register (in the CPU register file) that controls which interrupts the DMA coprocessor responds to.

DP: See data-page pointer.

dual-access RAM: Memory that can be accessed twice in a single clock cycle. For example, your code can read from and write to a dual-access RAM in one clock cycle.

external interrupt: A hardware interrupt triggered by a pin.

extended-precision floating-point format: A 40-bit representation of a floating-point number with a 32-bit mantissa and an 8-bit exponent.

extended-precision register: A 40-bit register used primarily for extended-precision floating-point calculations. Floating-point operations use bits 39-0 of an extended-precision register. Integer operations, however, use only bits 31-0.

FIFO buffer: First-in, first-out buffer. A portion o
89. and less than 10 µs. Software will set the EMUx OUT pin to a high state.
2) To enable the open-collector driver and pullup resistor on EMU1 to provide rising/falling edges of less than 25 ns, the modification shown in this figure is suggested. Rising edges slower than 25 ns can cause the emulator to detect false edges during the RUNB command or when the external counter selected from the debugger analysis menu is used.

Emulation Design Considerations

If it is not important that the devices on one target board are stopped by devices on another target board via the EMU0/1 signals, then the circuit in Figure 11-12 can be used. In this configuration, the global-stop capability is lost. It is important not to overload EMU0/1 with more than 16 devices.

Figure 11-12. EMU0/1 Configuration Without Global Stop (diagram labels: Target Board 1 through Target Board m, each with a pullup resistor on the EMU0/1 lines of Device 1 through Device n; the EMU0/1 lines run to the emulator)

Note: The open-collector driver and pullup resistor on EMU1 must be able to provide rising/falling edges of less than 25 ns. Rising edges slower than 25 ns can cause the emulator to detect false edges during the RUNB command or when the external counter selected from the debugger analysis menu is used. If this condition cannot be met, then the EMU0/1 signals from the individual boards should be ANDed together (as shown in
90. applications in digital signal processing, a filter must be adapted over time to keep track of changing conditions. The book Theory and Design of Adaptive Filters by Treichler, Johnson, and Larimore (Wiley-Interscience, 1987) presents the theory of adaptive filters. Although in theory both FIR and IIR structures can be used as adaptive filters, the stability problems and the local optimum points that IIR filters exhibit make them less attractive for such an application. Hence, until further research makes IIR filters a better choice, only FIR filters are used in adaptive algorithms of practical applications.

In an adaptive FIR filter, the filtering equation takes this form:

y(n) = h(n,0) x(n) + h(n,1) x(n-1) + ... + h(n,N-1) x(n-(N-1))

The filter coefficients are time dependent. In a least-mean-squares (LMS) algorithm, the coefficients are updated by an equation in this form:

h(n+1,i) = h(n,i) + b x(n-i),   i = 0, 1, ..., N-1

b is a constant for the computation. The updating of the filter coefficients can be interleaved with the computation of the filter output so that it takes 3 cycles per filter tap to do both. The updated coefficients are written over the old filter coefficients.

Example 6-8 shows the implementation of an adaptive FIR filter on the C4x. The memory organization and the positioning of the data in memory should follow the sam
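The filtering equation and the coefficient update can be sketched in C as one step per input sample. This is a minimal sketch under stated assumptions, not the manual's Example 6-8: the function name and array layout (`x[i]` holds x(n-i)) are hypothetical, the constant is passed in by the caller as `beta`, and the output is computed with the old coefficients before they are overwritten.

```c
#include <stddef.h>

/* One LMS step for an N-tap adaptive FIR filter.
 *   h[i]  : coefficient h(n,i); overwritten in place with h(n+1,i)
 *   x[i]  : delay line, x[i] = x(n-i), so x[0] is the newest sample
 *   beta  : the constant used in the update for this step (supplied by the caller)
 * Returns y(n), computed with the coefficients as they were on entry. */
double lms_step(double *h, const double *x, size_t n_taps, double beta)
{
    double y = 0.0;

    /* y(n) = sum over i of h(n,i) * x(n-i) */
    for (size_t i = 0; i < n_taps; i++) {
        y += h[i] * x[i];
    }

    /* h(n+1,i) = h(n,i) + beta * x(n-i), written over the old coefficients */
    for (size_t i = 0; i < n_taps; i++) {
        h[i] += beta * x[i];
    }

    return y;
}
```

The two loops are kept separate here for readability; the text notes that on the C4x the update is interleaved with the output computation.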
91. approach to power reduction is to reduce clock speed wherever possible during periods of inactivity.

Also, the appropriate choice of clock-generation approach will ensure minimum system power dissipation. The use of an external oscillator rather than the on-chip oscillator can result in lower device and system power dissipation levels. As described previously, the internal oscillator can require as much as 10 mA when operating at 40 MHz. If you use an external oscillator that requires less than 10 mA for clock generation, overall system power is reduced.

When considering switching rates of signals other than the system clock, the main consideration is to minimize switching. Specifically, any unnecessary switching should be avoided. Outputs or inputs that are unused should be disabled, tied high, or grounded, whichever is appropriate. Additionally, outputs connected to external circuitry should drive other power-dissipating elements only when absolutely necessary.

C4x Power Dissipation 9-29

9.7.2 Capacitive Loading of Signals

Current requirements are also directly proportional to capacitive loading. Therefore, all capacitive loading should be minimized. This is especially significant for device outputs. The approaches to minimizing capacitive loading are consistent with efficient PC board layout and construction practices. Specifically, signal runs should be as short as possible, especially for s
92. bus protocol 11-3
byte manipulation 3-4

C

C code compiler, efficient usage 5-2
C compiler 10-2
C examples, include file 7-17
cable, target system to emulator 11-1 to 11-24
cable pod 11-4, 11-5
cache
  enabling 1-9
  optimization of code 5-5
CALL instruction 2-7
CALLcond instruction 2-2, 2-21
calls
  example code 2-2
  zero overhead 2-4
carry bit, definition A-2
central processing unit (CPU), definition A-2
chip enable (CE) controls 4-5
circular addressing, definition A-2
code generation tools 10-2
code optimization 5-5
  BUD instruction 5-5
  delayed branches 5-5
  internal memory 5-6
  LAJ instruction 5-5
  parallel instruction set 5-5
  pipeline conflicts 5-6
  registers 5-5
  RPTB and RPTBD instructions 5-5
  RPTS instruction 5-5
communication port, ICRDY synchronization 7-5
communication ports 8-1, 8-18
  CSTRB shortener 8-17
  hardware design guidelines 8-9
  impedance matching 8-5
  message broadcasting 8-20
  software applications 8-2
  termination 8-8
  token forcer 8-15
  word transfer 8-7
companding 6-2
companding standards 6-2
compiler 10-2
  constructs 5-2 to 5-5
computed GOTO, example 2-21
computed GOTOs 2-21
configuration, multiprocessor 11-11
connector
  14-pin header 11-2
  dimensions, mechanical 11-12
  DuPont 11-3
consecutive reads 4-6
consecutive writes, diagram 4-7
context restore, example 2-15 to 2-18
context save, example 2-15 to 2-18
context save/restore, definition A-2
context switching 2-14
conversion of format, IEEE to/from C4
93. by 178 mA (from Figure 9-3) yields 159 mA due to internal bus usage. Therefore, an algorithm running under these conditions requires about 349 mA of power supply current (130 + 60 + 159).

Since a statistical knowledge of the data may not be readily available, a nominal scale factor may be used. The median between the minimum and maximum values at 50% relative data complexity yields a value of 0.93 and can be used as an estimate of a nominal scale factor. Therefore, this nominal data scale factor of 93% can be used for internal bus data dependency, adding 165.5 mA to 130 mA (quiescent) and 60 mA (internal operations) to yield 355.5 mA. As an upper bound, assume worst-case conditions of three accesses of alternating data every cycle, adding 178 mA to 130 mA (quiescent) and 60 mA (internal operations) to yield 368 mA.

9.4 Current Requirement of Output Driver Components

The output driver circuits on the TMS320C4x are required to drive significantly higher DC and capacitive loads than internal device logic drivers. Because of this, output drivers impose higher supply current requirements than other sections of circuitry in the device.

Accordingly, the highest values of supply current are exhibited when external writes are being performed at high speed. During read cycles, or when the external buses are not being used
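The internal-current arithmetic in this passage is a sum of the quiescent and internal-operation terms plus the maximum internal bus current scaled by a data-dependency factor. A minimal sketch follows; the function and parameter names are hypothetical, and the figures used in the usage note are the ones quoted in the text.

```c
/* Estimated internal supply current in mA:
 *   quiescent + internal operations + (maximum internal bus current x data scale factor).
 * data_scale is the data-dependency scale factor, 0.0 to 1.0. */
double internal_current_ma(double quiescent_ma, double iops_ma,
                           double ibus_max_ma, double data_scale)
{
    return quiescent_ma + iops_ma + ibus_max_ma * data_scale;
}
```

With 130 mA quiescent, 60 mA internal operations, and 178 mA maximum bus current, a scale factor of 0.93 gives about 355.5 mA and a scale factor of 1.0 gives the 368 mA upper bound.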
94. by POP. It is important to perform the integer and floating-point PUSH and POP in the above order, since POPF forces the last eight bits of the extended-precision registers to zero.

Figure 2-1. System Stack Configuration (diagram: from low memory to high memory: bottom of stack, top of stack, free)

2.2.2 User Stacks

User stacks can be built to store data from low to high memory or from high to low memory. Two cases for each type of stack are shown. You can build stacks by using the preincrement/decrement and postincrement/decrement modes of modifying the auxiliary registers (AR).

You can implement stack growth from high to low memory in two ways:

Case 1: Store to memory using *--ARn to push data onto the stack, and read from memory using *ARn++ to pop data off the stack.

Case 2: Store to memory using *ARn-- to push data onto the stack, and read from memory using *++ARn to pop data off the stack.

Figure 2-2 illustrates these two cases. The only difference is that in case 1, the AR always points to the top of the stack, and in case 2, the AR always points to the next free location on the stack.

Figure 2-2. Implementations of High-to-Low Memory Stacks (diagram: in both cases the stack runs from free space at low memory, through top of stack, to bottom of stack at high memory; in case 1, ARn points to the top of stack; in case 2, ARn points to the free location just below it)

You can implement stack growth from low to high memory in two ways: C
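The two high-to-low stack variants can be mimicked with C pointer arithmetic. This is a sketch, not C4x code: the `user_stack_t` type and function names are hypothetical, and the pre/post update choice in each case is an assumption inferred from where the AR is said to point (top of stack in case 1, next free location in case 2).

```c
#include <stdint.h>

/* A user stack that grows from high memory toward low memory.
 * 'ar' plays the role of the auxiliary register ARn. */
typedef struct {
    int32_t *ar;
} user_stack_t;

/* Case 1: ar always points at the top-of-stack element.
 * Push predecrements, pop postincrements. */
void push_case1(user_stack_t *s, int32_t value) { *--s->ar = value; }
int32_t pop_case1(user_stack_t *s)              { return *s->ar++; }

/* Case 2: ar always points at the next free location.
 * Push postdecrements, pop preincrements. */
void push_case2(user_stack_t *s, int32_t value) { *s->ar-- = value; }
int32_t pop_case2(user_stack_t *s)              { return *++s->ar; }
```

For an 8-word buffer `mem`, a case 1 stack would start with `ar = mem + 8` (one past the highest word), while a case 2 stack would start with `ar = mem + 7` (the highest word itself, which is the first free location).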
95. ******************************************************************
*
* VERSION   DATE      COMMENTS
* -------   ----      --------
* 1.0       10/87     PANNOS PAPAMICHALIS (TI Houston): Original Release
* 2.0       1/91      DANIEL CHEN (TI Houston): C40 porting
* 3.0       7/1/92    ROSEMARIE PIEDRA (TI Houston): made it C-callable
* 4.0       6/29/93   ROSEMARIE PIEDRA (TI Houston): added support for in-place bit reversing
*
******************************************************************
*
* SYNOPSIS: int cr2dif(SOURCE_ADDR, FFT_SIZE, LOGFFT, DST_ADDR)
*                      ar2          r2        r3      rc
*
*   float *SOURCE_ADDR ; input address
*   int   FFT_SIZE     ; 64, 128, 256, 512, 1024, ...
*   int   LOGFFT       ; log (base 2) of FFT_SIZE
*   float *DST_ADDR    ; destination address
*
* The computation is done in place.
* Sections to be allocated in linker command file:
*   ffttxt : FFT code
*   fftdat : FFT data
* If SOURCE_ADDR = DST_ADDR, then in-place bit reversing is performed.
*
******************************************************************
*
* DESCRIPTION:
*
* Generic program for a radix-2 DIF FFT computation using the TMS320C4x family. The computation is done in place, and the result is bit-reversed. The program is from the Burrus and Parks book, p. 111. The input data array is 2*FFT_SIZE long, with real and imaginary data in consecutive memory locations: Re-Im-Re
96. extended-precision register R11, respectively, to save the return address. The following subsections use example programs to explain how this works.

2.1.4 Regular Subroutine Calls

The C4x has a 32-bit program counter (PC) and a virtually unlimited software stack. The CALL and CALLcond subroutine calls increment the stack pointer and store the contents of the next value of the PC counter on the stack. At the end of the subroutine, RETScond performs a conditional return.

Example 2-1 illustrates the use of a subroutine to determine the dot product of two vectors. Given two vectors of length N, represented by the arrays a(0), a(1), ..., a(N-1) and b(0), b(1), ..., b(N-1), the dot product is computed from the expression

d = a(0) b(0) + a(1) b(1) + ... + a(N-1) b(N-1)

Processing proceeds in the main routine to the point where the dot product is to be computed. It is assumed that the arguments of the subroutine have been appropriately initialized. At this point, a CALL is made to the subroutine, transferring control to that section of the program memory for execution, then returning to the calling routine via the RETS instruction when execution has completed. Note that for this particular example, it would suffice to save the register R2. However, a larger number of registers are saved for demonstration purposes. The saved registers are stored on the system stack, which should be large enough to accommodate the maximum anticipated
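For reference, the dot-product expression that the subroutine evaluates can be written as a short C function. This is an illustrative sketch, not the assembly of Example 2-1; the function name and the use of `float` are assumptions.

```c
#include <stddef.h>

/* d = a(0) b(0) + a(1) b(1) + ... + a(N-1) b(N-1) */
float dot_product(const float *a, const float *b, size_t n)
{
    float d = 0.0f;
    for (size_t i = 0; i < n; i++) {
        d += a[i] * b[i];
    }
    return d;
}
```

In C, the call and return that the text describes (saving the return address, preserving registers on the stack) are handled by the compiler's calling convention rather than written out by hand.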
97. found by normalizing the original vector by its length. This involves a division by a square root.

The C4x single-cycle instruction RSQRF generates an estimate of the reciprocal of the square root of a positive floating-point number. This estimate has the correct exponent, and the mantissa is accurate to the eighth binary place (the error of the mantissa is < 2^-8).

Three rules apply to this algorithm:

1) If v.exp is even, then x.exp = -(v.exp / 2) - 1 and x.man = 2 / sqrt(v.man). For the special case where the 10 MSBs of v.man = 01.00000000b, x.man = 2 - 2^-8 = 01.11111111b. In both cases, the 23 LSBs of x.frac = 0.

2) If v.exp is odd, then x.exp = -((v.exp - 1) / 2) - 1 and x.man = sqrt(2 / v.man). The 23 LSBs of x.frac = 0.

3) If v = 0 (v.exp = -128), then x.exp = 127 and x.man = 01.1111111111111111111111111111111b. In other words, if v = 0, then x becomes the largest positive number representable in the extended-precision floating-point format. The overflow flag (V) is set to 1.

If you need larger precision than the RSQRF instruction gives for the estimate of the reciprocal of the square root, you can use the Newton-Raphson algorithm to further extend the precision of the mantissa. The algorithm is:

x[n+1] = x[n] (1.5 - (v/2) x[n] x[n])

In this equation, v is the number for which the reciprocal of the square root is desired. x[0] is the seed for the algorithm and is given by RSQRF. At every iteration of the algorithm, the n
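The Newton-Raphson refinement can be sketched in C. Since RSQRF is a C4x instruction with no C equivalent, this sketch assumes the caller supplies the seed x[0]; the function name and parameters are hypothetical.

```c
/* Refine an estimate of 1/sqrt(v) with Newton-Raphson iterations:
 *   x[n+1] = x[n] * (1.5 - (v / 2) * x[n] * x[n])
 * 'seed' stands in for the estimate that RSQRF would provide on the C4x. */
double rsqrt_refine(double v, double seed, int iterations)
{
    double x = seed;
    for (int i = 0; i < iterations; i++) {
        x = x * (1.5 - (v / 2.0) * x * x);
    }
    return x;
}
```

For v = 4 and a seed of 0.49 (about 2% below the true value 0.5), three iterations bring the estimate to within 1e-7 of 0.5, which illustrates how quickly the relative error shrinks once the seed is reasonably accurate.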
98. -> R0
SUBF3 R0 R2 R2            Assume F(P,N) -> R2
NOP                       F(P,N) - K(P) * B(P-1,N-1) = F(P-1,N) -> R2
2 < I < P
Repeat block loop start here
MPYF3 AR0 R2 R1           K(I) * F(I-1,N) -> R1
MPYF3 AR0 1 AR1 1 R0      K(I-1) * B(I-1-1,N-1) -> R0
ADDF3 AR1 1 R1 R3         B(I-1,N-1) + K(I) * F(I-1,N) = B(I,N) -> R3
STF R3 AR1 2              B(I,N) -> B(I,N-1)
LOOP SUBF3 R0 R2 R2       F(I-1,N) - K(I-1) * B(I-1-1,N-1) = F(I-1-1,N) -> R2
I = 1 (CLEANUP)
BUD R11                   Delayed return
MPYF AR0 R2 R1            K(1) * F(0,N) -> R1
ADDF3 AR1 R1 R3           B(0,N-1) + K(1) * F(0,N) = B(1,N) -> R3
STF R3 AR1 1              B(1,N) -> B(1,N-1)
STF R2 AR1                F(0,N) -> B(0,N-1)
end

6.4 Matrix-Vector Multiplication

In matrix-vector multiplication, a K x N matrix of elements m(i,j), having K rows and N columns, is multiplied by an N x 1 vector to produce a K x 1 result. The multiplier vector has elements v(j), and the product vector has elements p(i). Each one of the product-vector elements is computed by the following expression:

p(i) = m(i,0) v(0) + m(i,1) v(1) + ... + m(i,N-1) v(N-1),   i = 0, 1, ..., K-1

This is essentially a dot product, and the matrix-vector multiplication contains, as a special case, the dot product presented in Example 2-1 on page 2-3 and Example 2-2 on page 2-5. In pseudo-C format, the computation of the matrix multiplication is expressed by

for (i = 0; i < K;
99. herein are manufactured by AMP Incorporated.

10.2.1 Tool-Activated ZIF PGA Socket (TAZ)

Figure 10-1. Tool-Activated ZIF Socket (drawing dimensions: 0.350 in. max, 2.061 in. max)

Description:
AMP part number: 382533-9
Pin positions: 325
Solder-tail length: 0.170 in. for PC boards 0.125 in. thick (other tail lengths available)
Actuator tool: 354234-1

Features:
- Slightly larger than a PGA device
- Easy package loading because of large funnel entry
- Zero insertion force
- Contact wiping action during insertion ensures clean contact points
- Spring-loaded cover ensures proper loading
- Can be used with robotic insertion and removal
- Horizontal vs. vertical socket forces prevent damage to the device

10.2.2 Handle-Activated ZIF PGA Socket (HAZ)

Figure 10-2. Handle-Activated ZIF Socket (drawing dimensions: 2.700 in. max, 0.350 in. max, 0.650)

Description:
AMP part number: 382320-9
Pin positions: 325
Solder-tail length: 0.170 in. for PC boards 0.125 in. thick (other tail lengths available)

Features:
- Can be used for test and burn-in
- Spring contacts are normally closed
- Easy package loading because of large funnel entry
- Zero insertion force
- Contact wiping action during socket closing ensures clean contact points
- Maximum operating temperature is 160°C to allow burn-in capability

10.3 Part Order Information

Thi
100. i++) {
    p(i) = 0;
    for (j = 0; j < N; j++)
        p(i) = p(i) + m(i,j) * v(j);
}

Figure 6-7 shows the data memory organization for matrix-vector multiplication, and Example 6-11 shows the C4x assembly code that implements it. Note that in Example 6-11, K (number of rows) should be greater than 0, and N (number of columns) should be greater than 1.

Figure 6-7. Data Memory Organization for Matrix-Vector Multiplication (diagram: three memory blocks running from low address to high address: the input matrix m(0,0), m(0,1), ..., m(0,N-1), m(1,0), m(1,1), ...; the input vector v(0), v(1), ..., v(N-1); and the result vector p(0), ..., p(K-1))

Example 6-11. Matrix Times a Vector Multiplication

* TITLE MATRIX TIMES A VECTOR MULTIPLICATION
*
* SUBROUTINE MAT
*
* MAT == MATRIX TIMES A VECTOR OPERATION
*
* TYPICAL CALLING SEQUENCE:
*   load AR0
*   load AR1
*   load AR2
*   load AR3
*   load RC
*   CALL MAT
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT   FUNCTION
*   --------   --------
*   AR0        ADDRESS OF M(0,0)
*   AR1        ADDRESS OF V(0)
*   AR2        ADDRESS OF P(0)
*   AR3        NUMBER OF ROWS - 1 (K - 1)
*   RC         NUMBER OF COLUMNS - 2 (N - 2)
*
* REGISTERS USED AS INPUT: AR0, AR1, AR2, AR3, RC
* REGISTERS MODIFIED: R0, R2, AR0, AR1, AR2, AR3, IR0, RC
*
* MATRIX-VECTOR BENCHMARKS:
*   CYCLES: 1 + 7K + KN = 1 + K(N + 7) (not including subroutine overhead)
*   PROGRAM SIZE: 10 words (not including subroutine overhead)
*
global MAT
SETUP ADDI3 RC
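The pseudo-C loop for the matrix-vector product can be turned into compilable C as follows. This is a sketch rather than a translation of Example 6-11: the function name is hypothetical, and the matrix is assumed to be stored row by row in one contiguous array, so that m(i,j) sits at index i*N + j.

```c
#include <stddef.h>

/* p(i) = m(i,0) v(0) + m(i,1) v(1) + ... + m(i,N-1) v(N-1), for i = 0 .. K-1.
 * m : K x N matrix stored row by row (assumption), m(i,j) at m[i * n + j]
 * v : input vector of length N
 * p : result vector of length K */
void mat_vec(const float *m, const float *v, float *p, size_t k, size_t n)
{
    for (size_t i = 0; i < k; i++) {
        p[i] = 0.0f;
        for (size_t j = 0; j < n; j++) {
            p[i] = p[i] + m[i * n + j] * v[j];
        }
    }
}
```

Each pass of the inner loop is the dot product of one matrix row with the vector, which is why the text describes the operation as containing the dot product as a special case (K = 1).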
101. in similar ways. Note that RPTS is not interruptible, and the executed instruction is not refetched for execution. This frees the buses for operands.

- Use parallel instructions. You can have a multiply in parallel with an add (or subtract) and stores in parallel with any multiply or ALU operation. This increases the number of operations executed in a single cycle. For maximum efficiency, observe the addressing modes used in parallel instructions and arrange the data appropriately. You can have loads in parallel with any multiply or add (or subtract). The result of a multiply by one or an add of zero is the same as a load. Therefore, to implement parallel instructions with a data load, you can substitute a multiply or an add instruction, with one extra register containing a one or zero, in place of the load instruction.

- Maximize the use of registers. The registers are an efficient way to access scratch-pad memory. Extensive use of the register file facilitates the use of parallel instructions and helps avoid pipeline conflicts when you use register addressing.

- Use the cache. The cache speeds instruction fetches and enables single-cycle access even with slow external memory. The cache is transparent to the user, so make sure that it is enabled.

- Use internal memory instead of external memory. The internal memory (2K x 32 bits RAM and 4K x 32 bits ROM) is conside
102. (figure labels: 22-kΩ resistors; CRDY5; C5D(7-0))

Using the Communication Ports 8-21

Chapter 9
C4x Power Dissipation

The power supply current requirements (IDD) of the C4x vary with the specific application and the device program activity. The maximum power dissipation of a device can be calculated by multiplying IDD by VDD (the power supply voltage requirement). Both parameters are provided in the C4x data sheet. Additionally, due to the inherent characteristics of CMOS technology, the current requirements depend on clock rates, output loadings, and data patterns.

This chapter presents the information you need to determine power supply current requirements for the C4x under various operating conditions. After you make this determination, you can then calculate the device power dissipation and, in turn, thermal management requirements.

Topic                                                       Page
9.1 Capacitive and Resistive Loading                        9-2
9.2 Basic Current Consumption                               9-4
9.3 Current Requirement of Internal Components              9-7
9.4 Current Requirement of Output Driver Components         9-12
9.5 Calculation of Total Supply Current                     9-20
9.6 Example Supply Current Considerations                   9-27
9.7 Design Considerations                                   9-29

9.1 Capacitive and Resistive Loading
103. mapped version of the printer port in the PC. The printer port used to test this circuit was the DSP 550 from STB Systems, but there are other bidirectional printer ports on the market. Using the STB card in the bidirectional mode requires that a jumper be set (see your manual). Then, if a 1 is written to bit 5 or 7 of the control register (this depends on your printer port), data can be read back from the data register.

Simplified Hardware Interface for C40 (PG 3.3) or C44 Devices

Figure 8-2 shows a simplified commport signal splitter that splits each commport control signal into a simple drive-and-sense pair of signals. Simplified, in this case, means that though the circuit is easy to follow functionally and will operate, it is not the preferred solution (see the improved driver in Figure 8-3). The signals in this circuit can be easily buffered without risk of driver conflicts.

However, keep a few things in mind about the simplified design:

- Due to commport control signal restrictions in earlier silicon revisions, this circuit will not work with the TMS320C40 PG 3.0 or lower.
- This circuit requires a bidirectional printer port.
- Standard printer port cables often do not provide clean signals.
- A high value is needed for the isolation resistor in order to keep the current levels during signal opposition to a minimum. But a low value is needed for the isolation resistor in order to ensure reasonably fast rise and fall
104. mode bit should be reset (OVM = 0) so that the accumulator results are not loaded with the saturation values.

Example 3-10 and Example 3-11 show 64-bit addition and 64-bit subtraction, respectively. The first operand is stored in the registers R0 (low word) and R1 (high word). The second operand is stored in registers R2 and R3, respectively. The result is stored in R0 and R1.

Example 3-10. 64-Bit Addition

* TITLE 64-BIT ADDITION
*
* TWO 64-BIT NUMBERS ARE ADDED TO EACH OTHER, PRODUCING A 64-BIT RESULT.
* THE NUMBERS X (R1,R0) AND Y (R3,R2) ARE ADDED, RESULTING IN W (R1,R0).
*
*     R1 R0
*   + R3 R2
*   -------
*     R1 R0
*
ADDI R2,R0
ADDC R3,R1

Logical and Arithmetic Operations 3-17

Example 3-11. 64-Bit Subtraction

* TITLE 64-BIT SUBTRACTION
*
* TWO 64-BIT NUMBERS ARE SUBTRACTED FROM EACH OTHER, PRODUCING A 64-BIT RESULT.
* THE NUMBERS X (R1,R0) AND Y (R3,R2) ARE SUBTRACTED, RESULTING IN W (R1,R0).
*
*     R1 R0
*   - R3 R2
*   -------
*     R1 R0
*
SUBI R2,R0
SUBB R3,R1

When two 32-bit numbers are multiplied, a 64-bit product results. To do this, the C4x provides a 32-bit x 32-bit multiplier and two special instructions, MPYSHI (multiply signed integer and produce 32 MSBs) and MPYUHI (multiply unsigned integer and produce 32 MSBs)
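The add-with-carry and subtract-with-borrow pattern of Examples 3-10 and 3-11 can be mirrored in C with explicit 32-bit halves. This is a sketch for illustration: the `u64_pair_t` type and function names are hypothetical, unsigned arithmetic is assumed, and the carry and borrow are derived by comparison because C has no carry flag. The last function is only an analogue of what MPYUHI is described as producing (the 32 MSBs of an unsigned product).

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, like (R1,R0) in the examples. */
typedef struct {
    uint32_t lo;  /* low word  (R0 or R2) */
    uint32_t hi;  /* high word (R1 or R3) */
} u64_pair_t;

/* W = X + Y: add the low words, then add the high words plus the carry. */
u64_pair_t add64(u64_pair_t x, u64_pair_t y)
{
    u64_pair_t w;
    w.lo = x.lo + y.lo;
    uint32_t carry = (w.lo < x.lo) ? 1u : 0u;  /* wraparound means a carry out */
    w.hi = x.hi + y.hi + carry;
    return w;
}

/* W = X - Y: subtract the low words, then the high words minus the borrow. */
u64_pair_t sub64(u64_pair_t x, u64_pair_t y)
{
    u64_pair_t w;
    uint32_t borrow = (x.lo < y.lo) ? 1u : 0u;
    w.lo = x.lo - y.lo;
    w.hi = x.hi - y.hi - borrow;
    return w;
}

/* Upper 32 bits of the unsigned 64-bit product of two 32-bit values. */
uint32_t mul_high_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

Adding 1 to 0x00000000FFFFFFFF, for example, wraps the low word to zero and carries into the high word, which is the case the ADDI/ADDC pair exists to handle.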
105. n-1) + x(n)
y(n) = b2 d(n-2) + b1 d(n-1) + b0 d(n)

Figure 6-2 shows the memory organization for this two-equation approach to the implementation of a single biquad on the C4x.

Figure 6-2. Data Memory Organization for a Single Biquad (diagram: from low address to high address, the filter coefficients and the delay node values; the delay node values form a circular queue running from the newest delay to the oldest delay)

As in the case of FIR filters, the address for the start of the values d must be a multiple of 4; that is, the last two bits of the beginning address must be zero. The block-size register BK must be initialized to 3.

Example 6-6. IIR Filter (One Biquad)

* TITLE IIR FILTER
*
* SUBROUTINE IIR1
*
* IIR1 == IIR FILTER (ONE BIQUAD)
*
* EQUATIONS:
*   d(n) = a2 * d(n-2) + a1 * d(n-1) + x(n)
*   y(n) = b2 * d(n-2) + b1 * d(n-1) + b0 * d(n)
* OR
*   y(n) = a1 * y(n-1) + a2 * y(n-2) + b0 * x(n) + b1 * x(n-1) + b2 * x(n-2)
*
* TYPICAL CALLING SEQUENCE:
*   load R2
*   LAJU IIR1
*   load AR0
*   load AR1
*   load BK
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT   FUNCTION
*   --------   --------
*   R2         INPUT SAMPLE X(N)
*   AR0        ADDRESS OF FILTER COEFFICIENTS (A2)
*   AR1        ADDRESS OF DELAY NODE VALUES (D(N-2))
*   BK         BK = 3
*
* REGISTERS USED AS INPUT: R2, AR0, AR1, BK
* REGISTE
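The two-equation form of the biquad can be sketched in C as a per-sample step function. This is not the C4x routine IIR1: the `biquad_t` type and function name are hypothetical, and the delay node values are held in two plain variables instead of the circular queue addressed through BK.

```c
/* State and coefficients of one biquad section (hypothetical type). */
typedef struct {
    double a1, a2;      /* feedback coefficients */
    double b0, b1, b2;  /* feedforward coefficients */
    double d1, d2;      /* delay node values d(n-1) and d(n-2) */
} biquad_t;

/* Process one input sample x(n) and return y(n):
 *   d(n) = a2 d(n-2) + a1 d(n-1) + x(n)
 *   y(n) = b2 d(n-2) + b1 d(n-1) + b0 d(n) */
double biquad_step(biquad_t *f, double x)
{
    double d = f->a2 * f->d2 + f->a1 * f->d1 + x;
    double y = f->b2 * f->d2 + f->b1 * f->d1 + f->b0 * d;

    /* Age the delay line: d(n-1) becomes d(n-2), d(n) becomes d(n-1). */
    f->d2 = f->d1;
    f->d1 = d;

    return y;
}
```

Shifting two variables per sample is the simplest way to express the delay line in C; the circular queue described in the text achieves the same effect by moving a pointer instead of moving data.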
106. obtained above as a function of actual load capacitance if the load capacitance presented to the buses is less than 80 pF. In the previous example, if the load capacitance is 20 pF instead of 80 pF, the actual pin current would be 1.66 mA. While the slope of the line in Figure 9-10 can be used to interpolate scale factors for loads greater than 80 pF, the TMS320C4x is specified to drive output loads less than 80 pF; interface timings cannot be guaranteed at higher loads.

With data dependency and capacitive load scale factors applied to the current values for local and global buses, the total supply current required for the device for a particular application can be calculated as described in the next section.

[Figure 9-10. Pin Current Versus Output Load Capacitance (10 MHz). Vertical axis: incremental IDD (mA); horizontal axis: output load capacitance, 0 to 80 pF]

9.5 Calculation of Total Supply Current

The previous sections have discussed currents contributed by different sources on the TMS320C4x. Because determinations of actual current values are unique and independent for each source, each current source was discussed separately. In an actual application, however, the sum of the independent contributions determines the total current requirement for the device. This total current value is exhibited as the total current supplied to
107. pins. When the bus is not being driven explicitly, it is left floating, which can cause excessively high currents to be drawn on the input buffer section of all 64 bits of the bus. In this case, because all 64 data bus bits are normally used independently in most applications, each data bus pin should be pulled up with a separate resistor for minimum power.

Chapter 10. Development Support and Part Order Information

This chapter provides development support information, socket descriptions, device part numbers, and support tool ordering information for the C4x.

Each C4x support product is described in the TMS320 Family Development Support Reference Guide (literature number SPRU011). In addition, more than 100 third-party developers offer products that support the TI TMS320 family. For more information, refer to the TMS320 Third-Party Reference Guide (literature number SPRU052).

For information on pricing and availability, contact the nearest TI Field Sales Office or authorized distributor. See the list at the back of this book.

Topic
10.1 Development Support
10.2 Sockets
10.3 Part Order Information

10.1 Development Support

Texas Instruments offers an extensive line of development tools for the TMS320C4x generation of DSPs, including tools to evaluate the performance of the
108. range of linear operation. This can cause the input buffer circuit to draw a significant DC current directly from VDD to ground. Therefore, any unused device inputs should be pulled up to VDD via a resistor pullup of nominally 20 kΩ, or driven high with an unused gate. Input-only pins that are not used can be pulled up in parallel with other inputs of the same type with a single gate or resistor to minimize system component count. In this case, up to 15 or more standard device inputs can be pulled up with a single resistor.

Any device I/O pins that are unused should be selected as outputs. This avoids the requirement for pullups to ensure that the I/O input stage is not biased in the linear region, and therefore eliminates an unnecessary current component.

For any device output, any DC load present is directly reflected in the system's power supply current. Therefore, DC loading of outputs should be reduced to a minimum. If DC currents are being sourced from the address bus outputs, the address bus should be set to a level that minimizes the current through the external load. This can be accomplished by performing a dummy read from an external address.

For I/O pins that must be used in both the input and output modes, individual pullup resistors of nominally 20 kΩ should be used to ensure minimum power dissipation if these pins are not always driven to a valid logic state. This is particularly true of the data bus
109. result is bit reversed. The program is taken from the Burrus and Parks book, p. 117. The input data array is 2*FFT_SIZE long, with real and imaginary data in consecutive memory locations: Re-Im-Re-Im. The twiddle factors are supplied in a table put in a section with a global label SINE pointing to the beginning of the table. This data is included in a separate file to preserve the generic nature of the program. The sine table size is 5*FFT_SIZE/4.

In order to have the final results in bit-reversed order, the two middle branches of the radix-4 butterfly are interchanged during storage. Note the difference when comparing with the program on p. 117 of the Burrus and Parks book.

Example 6-14. Complex Radix-4 DIF FFT (Continued)

Note: Sections needed in the linker command file:
    ffttxt : FFT code
    fftdat : FFT data

WARNING: For optimization purposes, the LDF instruction that reads through AR1 into R0 (marked 1 in the listing) will fetch memory outside the input buffer range during the first loop execution (RC = 0). Even though the read value (R0) is not used in the code, this could cause a halt situation if AR1 po
110. runs in real time.

4. If the program does not run in real time:
   - Use the -o2 or -o3 option when compiling.
   - Use registers to pass parameters (-mr compiling option).
   - Use inlining (-x compiling option).
   - Remove the -g option when compiling.
   - Follow some of the efficient code generation tips listed below.

5. Identify places where most of the execution time is spent, and optimize these areas by writing assembly language routines that implement the functions.

The efficiency of the code generated by the floating-point compiler depends to a large extent on how well you take advantage of the compiler strengths described above when writing your C code. There are specific constructs that can vastly improve the compiler's effectiveness:

- Use register variables for often-used variables. This is particularly true for pointer variables. Example 5-1 shows a code fragment that exchanges one object in memory with another.

Example 5-1. Exchanging Objects in Memory

    do
    {
        temp = *++src;
        *src = *++dest;
        *dest = temp;
    } while (--n);

- Pre-compute subexpressions, especially array references in loops. Assign commonly used expressions to register variables where possible.

- Use *++ to step through arrays, rather than using an index to recalculate the address each time through a loop.

As an example of the previous 2 points, consider the loops in Example 5-2. Exam
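The register-variable and pointer-stepping hints can be combined in one small routine. The following is a sketch only: `exchange_objects` is a hypothetical name, it uses a post-increment variant of the exchange loop so that the pointers can start at the first element, and whether `register` actually changes the generated code depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange n words between two non-overlapping objects.
 * Pointers and the temporary are declared register and stepped with ++,
 * so no index has to be rescaled into an address on each pass. */
void exchange_objects(int *a, int *b, size_t n)
{
    register int *src = a;
    register int *dest = b;
    register int temp;

    assert(a != NULL && b != NULL && n > 0);
    do {
        temp = *src;        /* save the word from the first object */
        *src++ = *dest;     /* overwrite it with the word from the second */
        *dest++ = temp;     /* complete the exchange and advance */
    } while (--n);
}
```

An index-based version (`a[i]`, `b[i]`) computes the same result; the pointer form simply states the address stepping explicitly, which is what the hint asks for.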
111. s resistive loss (see section 9.1, Capacitive and Resistive Loading).

[Figure 9-1. Test Setup: TMS320C40 with bus groups labeled 32 LD, 31 LA, 32 D, 31 A, and control pins, loaded by CLOAD]

9.3 Current Requirement of Internal Components

The power supply current requirement for internal circuitry consists of three components: quiescent, internal operations, and internal bus operations. Quiescent and internal operations are constants, whereas the internal bus operations component varies with the rate of internal bus usage and the data values being transferred.

9.3.1 Quiescent

The quiescent requirement for the TMS320C4x is 130 mA while in IDLE. Quiescent refers to the baseline supply current drawn by the TMS320C4x during minimal internal activity. Examples of quiescent current include:

- Maintaining timer and oscillator
- Executing the IDLE instruction
- Holding the TMS320C4x in reset

9.3.2 Internal Operations

Internal operations include register-to-register multiplication, ALU operations, and branches, but not external bus usage or significant internal bus usage. Internal operations add a constant 60 mA above the quiescent requirement, so that the total contribution of quiescent and internal operation is 190 mA. Note, however, that internal and/or external program operations executed via an RPTS in
112. setup to TCK high: 10 ns

t_d(TTDO)       Target TDO delay from TCK low                          15 ns
t_d(bufmax)     Target buffer delay, maximum                           10 ns
t_d(bufmin)     Target buffer delay, minimum                           1 ns
t_(bufskew)     Target buffer skew between two devices in the
                same package: [t_d(bufmax) - t_d(bufmin)] x 0.15       1.35 ns
t_(TCKfactor)   Assume a 40/60 duty cycle clock                        0.4 (40%)

Given in the SPL data sheet:

t_d(DTMSmax)    SPL DTMS/DTDO delay from TCK low, maximum              31 ns
t_su(DTDLmin)   DTDI setup time to SPL TCK high, minimum               7 ns
t_d(DTCKHmin)   SPL DTCK delay from TCK high, minimum                  2 ns
t_d(DTCKLmax)   SPL DTCK delay from TCK low, maximum                   16 ns

There are two key timing paths to consider in the emulation design:

- The TCK-to-DTMS/DTDO path, called t_pd(TCK-DTMS)
- The TCK-to-DTDI path, called t_pd(TCK-DTDI)

Of the following two cases, the worst-case path delay is calculated to determine the maximum system test clock frequency.

Case 1: Single processor, direct connection, DTMS/DTDO timed from TCK low.

t_pd(TCK-DTMS) = [t_d(DTMSmax) + t_d(DTCKHmin) + t_su(TTMS)] / t_(TCKfactor)
               = (31 ns + 2 ns + 10 ns) / 0.4
               = 107.5 ns (9.3 MHz)

t_pd(TCK-DTDI) = [t_d(TTDO) + t_d(DTCKLmax) + t_su(DTDLmin)] / t_(TCKfactor)
               = (15 ns + 16 ns + 7 ns) / 0.4
               = 95 ns (10.5 MHz)

In this case, the TCK-to-DTMS/DTDL path is the limiting factor.

Case 2: Single/multiprocessor, DTMS/DTDO/TCK buffered input, DTDI buffered output, DTMS/DTDO timed from TCK low.

t_pd(TCK-DTMS) = t_d(DTMSmax) t_d(DTCKHmin) t_su
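The Case 1 arithmetic can be reproduced with a few lines of C. This is a sketch for checking the numbers, not a design tool: `path_delay_ns` and `max_tck_mhz` are hypothetical helper names, and the inputs are the Case 1 values quoted above.

```c
#include <assert.h>

/* Path delay = (sum of the three timing terms) / TCK duty-cycle factor. */
double path_delay_ns(double t1_ns, double t2_ns, double t3_ns, double tck_factor)
{
    assert(tck_factor > 0.0 && tck_factor <= 1.0);
    return (t1_ns + t2_ns + t3_ns) / tck_factor;
}

/* The slower (longer) of the two paths sets the maximum test clock frequency. */
double max_tck_mhz(double delay_a_ns, double delay_b_ns)
{
    double worst_ns = (delay_a_ns > delay_b_ns) ? delay_a_ns : delay_b_ns;
    assert(worst_ns > 0.0);
    return 1000.0 / worst_ns;   /* a period in ns converts to MHz as 1000 / period */
}
```

For Case 1, `path_delay_ns(31, 2, 10, 0.4)` gives 107.5 ns and `path_delay_ns(15, 16, 7, 0.4)` gives 95 ns, so the longer TCK-to-DTMS path limits the clock to about 9.3 MHz.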
113. significant if periods of high peak current are short in duration. You can obtain average current by performing a weighted sum of the current due to the various independent program segments over time. You can calculate the average current for the example in the previous paragraph as follows:

0.8 x 250 mA + 0.2 x 790 mA = 358 mA

Using this approach, you can calculate average current for any number of program segments.

9.5.5 Thermal Management Considerations

Heating characteristics of the TMS320C4x are dependent upon power dissipation, which, in turn, is dependent upon power supply current. When making thermal management calculations, you must consider the manner in which power supply current contributes to power dissipation and to the TMS320C4x package thermal characteristics time constant.

Depending on the sources and destinations of current on the device, some current contributions to IDD do not constitute a component of power dissipation at 5 volts. That is to say, the TMS320C4x may be acting only as a switch, in which case the voltage drop is across a load and not across the C4x. If the total current flowing into VDD is used to calculate power dissipation at 5 volts, erroneously large values for package power dissipation will be obtained. The error occurs because the current resulting from driving a logic high level into a DC load appears only as
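The weighted sum generalizes to any number of program segments. The sketch below assumes each segment is described by the fraction of time it runs and its supply current; `average_current_ma` is a hypothetical helper name, and the fractions are expected to sum to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Time-weighted average of the supply current over independent program segments. */
double average_current_ma(const double *time_fraction,
                          const double *current_ma,
                          size_t segments)
{
    double total_fraction = 0.0;
    double average = 0.0;

    for (size_t i = 0; i < segments; i++) {
        assert(time_fraction[i] >= 0.0);
        total_fraction += time_fraction[i];
        average += time_fraction[i] * current_ma[i];
    }
    /* The fractions must cover the whole period (allowing for rounding). */
    assert(total_fraction > 0.999999 && total_fraction < 1.000001);
    return average;
}
```

With fractions {0.8, 0.2} and currents {250, 790} mA, the function returns the 358 mA figure worked out above.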
114. stack. A push performs a preincrement, and a pop performs a postdecrement of the SP. Provisions should be made to accommodate your software's anticipated storage requirements. The stack pointer (SP) can be read from as well as written to; multiple stacks can be created by updating the SP. The SP is not initialized by the hardware during reset; it is important to remember to initialize its value so that it points to a predetermined memory location. Example 1-1 on page 1-7 shows how to initialize the SP. You must initialize the stack to a valid free memory space. Otherwise, use of the stack could corrupt data or program memory.

The program counter is pushed onto the system stack on subroutine calls, traps, and interrupts. It is popped from the system stack on returns. The PUSH, POP, PUSHF, and POPF instructions push and pop the system stack. The stack can be used inside of subroutines as a place of temporary storage of registers, as is the case shown in Example 2-1 on page 2-3.

Two instructions, PUSHF and POPF, are for floating-point numbers. These instructions can pop and push floating-point numbers to registers R0-R11. This feature is very useful for saving the extended-precision registers (see Example 2-1 and Example 2-2). PUSH saves the lower 32 bits of an extended-precision register, and PUSHF saves the upper 32 bits. To recover this extended-precision number, execute a POPF followed
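The preincrement-on-push and postdecrement-on-pop behavior can be modeled in a few lines of C. This is an illustrative model only: `stack_model`, `stack_init`, `stack_push`, and `stack_pop` are hypothetical names, the stack lives in a small array rather than device memory, and the bounds checks stand in for the care the programmer must take on the real device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A stack that grows toward higher addresses, as described for the SP. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t sp;                 /* index of the current top of stack */
} stack_model;

/* The SP is not initialized at reset, so software must set it explicitly. */
void stack_init(stack_model *s)
{
    s->sp = 0;
}

/* Push: increment the SP first, then store at the new location. */
void stack_push(stack_model *s, uint32_t value)
{
    assert(s->sp + 1 < STACK_WORDS);   /* guard against overrunning the stack area */
    s->mem[++s->sp] = value;
}

/* Pop: read the top of stack, then decrement the SP. */
uint32_t stack_pop(stack_model *s)
{
    assert(s->sp > 0);                 /* guard against popping an empty stack */
    return s->mem[s->sp--];
}
```

Because the stack is last-in, first-out, values saved in one order must be restored in the reverse order, which is why a register saved with PUSH and then PUSHF is recovered with POPF first.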
115. storage require ments Other methods of saving registers could be used equally well Subroutines Example 2 1 Regular Subroutine Call Dot Product TITLE REGULAR SUBROUTINE CALL DOT PRODUCT e MAIN ROUTINE THAT CALLS THE SUBROUTINE DOT TO COMPUTE THE DOT PRODUCT OF TWO VECTORS LDI b1k0 ARO ARO points to vector a LDI b1k1 AR1 AR1 points to vector b LDI N RC RC contains the number of elements CALL DOT SUBROUTINE DOT E EQUATION d a 0 b 0 a 1 b 1 a N 1 b N 1 THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER RO N MUST BE GREATER THAN OR EQUAL TO 2 e ARGUMENT ASSIGNMENTS ARGUMENT FUNCTION K ml e ei IY IECIT P PITE TEES ARO ADDRESS OF a 0 AR1 ADDRESS OF b 0 m RC LENGTH OF VECTORS N e REGISTERS USED AS INPUT ARO AR1 RC REGISTER MODIFIED RO REGISTER CONTAINING RESULT RO global DOT DOTPUSH ST Save status register PUSH R2 Use the stack to save R2 s PUSHF R2 bottom 32 and top 32 bits PUSH ARO Save ARO PUSH AR1 Save AR1 PUSH RC Save RC PUSH RS PUSH RE Initialize RO MPYF3 ARO AR1 RO a 0 b 0 gt RO Program Control 2 3 Subroutines Example 2 1 Regular Subroutine Call Dot Product Continued
116. the NMI occurs.

2.3.8 Using One Interrupt for Two Services

The IVTP can be changed to point to alternate interrupt vector tables. This relocatable feature of the table allows you to use a single interrupt signal for more than one service.

In Example 2-4, the IVTP is reset in the external INT0 interrupt service routines, EINT0A and EINT0B. After the value of the IVTP is changed, the CPU goes to a different interrupt service routine when the same interrupt signal reoccurs.

Example 2-4. Use of One Interrupt Signal for Two Different Services

*  TITLE USE OF ONE INTERRUPT SIGNAL FOR TWO DIFFERENT SERVICES
*
*  IN THIS EXAMPLE, THE ADDRESSES OF EINT0A AND EINT0B ARE IN MEMORY
*  LOCATIONS 03H AND 1003H, RESPECTIVELY. ASSUME THE IVTP HAS NOT BEEN
*  CHANGED AFTER DEVICE RESET AND THE EXTERNAL INTERRUPT IIOF0 IS ENABLED.
*  WHEN THE FIRST IIOF0 INTERRUPT SIGNAL COMES IN, THE EINT0A ROUTINE WILL
*  BE EXECUTED, AND WHEN THE NEXT IIOF0 INTERRUPT SIGNAL OCCURS, THE EINT0B
*  ROUTINE WILL BE EXECUTED, AND SO ON. THE EINT0A AND EINT0B ROUTINES WILL
*  TAKE TURNS TO BE EXECUTED WHEN THE IIOF0 INTERRUPT SIGNAL OCCURS.
*
*  External IIOF0 interrupt service routine A
*
        .global EINT0A
EINT0A  LDI     1000H,R0        ; Change IVTP to poi
117. the forward error, b(i,n) is the backward error, k(i) is the i-th reflection coefficient, x(n) is the input, and y(n) is the output signal. The order of the filter (that is, the number of stages) is p. In the linear predictive coding (LPC) method of speech processing, the inverse lattice filter is used during analysis, and the forward lattice filter is used during speech synthesis.

[Figure 6-4. Structure of the Inverse Lattice Filter]

Figure 6-5 shows the data memory organization of the inverse lattice filter on the C40.

Figure 6-5. Data Memory Organization for Inverse Lattice Filters

                  reflection        backward
                  coefficients      propagation terms
low address       k(1)              b(0,n-1)
                  k(2)              b(1,n-1)
                  ...               ...
high address      k(p)              b(p-1,n-1)

Example 6-9. Inverse Lattice Filter

*  TITLE INVERSE LATTICE FILTER
*
*  SUBROUTINE LATINV
*
*  LATINV == LATTICE FILTER (LPC INVERSE FILTER, ANALYSIS)
*
*  TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       LAJU    LATINV
*       load    AR0
*       load    AR1
*       load    RC
*
*  ARGUMENT ASSIGNMENTS:
*
*  ARGUMENT | FUNCTION
*  ---------+----------------------------------------------------
*  R2       | f(0,n) = x(n)
*  AR0      | ADDRESS OF FILTER COEFFICIENTS (k(1))
*  AR1      | ADDRESS OF BACKWARD PROPAGATION VALUES (b(0,n-1))
*  RC       | RC = p -
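The per-sample recursion of the inverse lattice filter can be sketched in C. This is an illustration under stated assumptions: it uses the recursion f(i,n) = f(i-1,n) + k(i) b(i-1,n-1) and b(i,n) = b(i-1,n-1) + k(i) f(i-1,n) with f(0,n) = b(0,n) = x(n), stores the coefficients and backward terms in two plain arrays indexed from 0, and `lattice_inverse_step` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Run one input sample through a p-stage inverse (analysis) lattice filter.
 * k[i]      holds the reflection coefficient k(i+1), for i = 0..p-1.
 * b_prev[i] holds b(i,n-1) on entry and b(i,n) on return, for i = 0..p-1.
 * Returns y(n) = f(p,n). */
float lattice_inverse_step(const float *k, float *b_prev, size_t p, float x)
{
    assert(k != NULL && b_prev != NULL && p > 0);

    float f = x;   /* f(0,n) = x(n) */
    float b = x;   /* b(0,n) = x(n) */

    for (size_t i = 0; i < p; i++) {
        float f_next = f + k[i] * b_prev[i];   /* forward error of the next stage */
        float b_next = b_prev[i] + k[i] * f;   /* backward error of the next stage */
        b_prev[i] = b;                         /* keep b(i,n) for the next sample */
        f = f_next;
        b = b_next;
    }
    return f;
}
```

For two stages with k = {0.5, 0.25} and zero initial state, an impulse input yields 1, 0.625, 0.25, which matches the expected polynomial 1 + (k1 + k1 k2) z^-1 + k2 z^-2.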
118. times of the commport control signals when they are inputs This conflict can be overcome by carefully picking the correct resistor values or by adding additional biasing Commport to Host Interface Figure 8 2 Better Commport Signal Splitter Vcc ere ji B Comm Port Port R P LS32 0 RESET Rs CREQ sns Busy R CREQ drv X SLCTIN e e E CREQ Rg ACK lt CACK_sns R CACK_d x INIT o D b N CACK Rs SABER hae L CSTRB sns CSTRB dv Rx 2 AUTOFD CSTRB Rs CRDY_sns SLCT CRDY_drv Rx STROBE e CRDY DO DO D7 D7 Ry Legend Rp 470 ohms Ry 180 ohms Rs 47 ohms Ry 220 ohms Using the Communication Ports 8 11 Commport to Host Interface 8 6 2 Improved Drive and Sense Amplifiers Two improvements are suggested for the interface described above The improvements are described in Figure 8 3 Figure 8 3 Improved Interface Circuit sense sense NVV R4 Clock Pe Rs C4x Parallel V Port 4 comm port R p A drive Bs e R R RS 232 driver A R2 j C4 Legend Rp 470 ohms Ro 10 K ohms C4 100 pF R4 1Kohms Rg 50 ohms The first improvement is that the signals going to and from the printer port are synchronized using a clock and a simple data latch By taking samples in time noise which may be able to corrupt the first sample of a tr
119. transfer operation is more complex than the word transfer operation. It requires tri-stating of pins after different events. Sections 8.6 and 8.7 offer examples on how to handle token transfers with non-C4x devices. The word transfer operation is much simpler. The following sequence describes the word transfer operation.

Word transfer operation

CASE I: The non-C4x has the token and transmits data. The C4x receives data.

1. The non-C4x device drives the first byte (byte 0) into the CD data lines and then drops CSTRB low, indicating new data. There is no need to meet the maximum timing requirements, but the data should be valid before CSTRB goes low.

2. The non-C4x device waits for the C4x to respond with CRDY low, and then can immediately drive the next data byte and bring CSTRB high.

3. The non-C4x device waits for CRDY to be high; then steps 1, 2, and 3 repeat for bytes 1-3.

4. After byte 3 is transmitted, the non-C4x device can leave the byte 3 value in the CD lines until a new word is sent.

5. In C4x device revisions lower than 3.0, CSTRB should go high after receiving CRDY low no later than one C4x H1/H3 cycle between word boundaries. See Section 8.9, Implementing a CSTRB Shortener Circuit, on page 8-17 for an implementation of a CSTRB shortener circuit. In C4x device revisions 3.0 or higher, no CSTRB width restriction exists.

6. The non-C4x device can drive CSTRB low for the next word at any t
120. where P_device is the power consumed by the device. In this case, C_total includes both internal and external capacitances. C_total can be effectively reduced by minimizing power-consuming internal operations and external bus cycles. Bipolar devices, pullup resistors, and other devices consume DC power that adds a constant offset unaffected by f_CLK. The effect of these DC losses depends on data, not frequency. This document assumes an all-CMOS approach in which these effects are minimal.

Another source of power consumption is the current consumed by a CMOS gate when it is biased in the linear region. Typically, if a gate is allowed to float, it can consume current. Pullups and pulldowns of unused pins are therefore recommended.

9.2 Basic Current Consumption

Generally, power supply current requirements are related to the system: for example, operating frequency, supply voltage, temperature, and output load. In addition, because the current requirement for a CMOS device depends on the charging and discharging of node capacitance, factors such as clocking rate, output load capacitance, and data values can be important.

9.2.1 Current Components

The power supply current has four basic components:

- Quiescent
- Internal operations
- Internal bus operations
- External bus operations

9.2.2 Current Dependency

The power supply current consumption depends on many fact
121. 0

You can use the OR of the two ready signals to implement wait states for devices that require more wait states than internal logic can implement (up to seven). This feature is useful, for example, if a system contains some fast and some slow devices. In this case:

- Fast devices can generate ready externally with a minimum of logic. When fast devices are accessed, the external hardware responds promptly with ready, which terminates the cycle.

- Slow devices can use the internal wait counter for larger numbers of wait states. When slow devices are accessed, the external hardware does not respond, and the cycle is appropriately terminated after the internal wait count.

The OR of the two ready signals can also terminate the bus cycle before the number of wait states implemented with external logic allows termination. In this case, a shorter wait count is specified internally than the number of wait states implemented with the external ready logic, and the bus cycle is terminated after the wait count. Also, this feature can be used as a safeguard against inadvertent accesses to nonexistent memory that would never respond with ready and would therefore lock up the C4x.

If the OR of the two ready signals is used, however, and the internal wait-state count is less than the number of wait states implemented externally, the external ready generation logic must be able to reset its sequencing to allow a new cycle to begin immediately follo
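The effect of ORing the two ready sources can be captured in a tiny behavioral model: the bus cycle ends at whichever source asserts ready first. The sketch below is an assumption-laden illustration, with `wait_states_with_or` and `NEVER_READY` as hypothetical names and the internal count limited to seven as stated in the text.

```c
#include <assert.h>
#include <limits.h>

/* Marker for external hardware that never asserts ready (for example, nonexistent memory). */
#define NEVER_READY UINT_MAX

/* Number of wait states actually inserted when the external ready signal
 * is ORed with the internal wait-state counter: whichever fires first wins. */
unsigned wait_states_with_or(unsigned external_ready_after, unsigned internal_count)
{
    assert(internal_count <= 7);   /* the internal counter supports up to seven */
    return (external_ready_after < internal_count) ? external_ready_after
                                                   : internal_count;
}
```

A fast device that responds immediately gives `wait_states_with_or(0, 7) == 0`, while an access to a location that never responds still ends after the internal count, for example `wait_states_with_or(NEVER_READY, 5) == 5`.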
122. 0), LBU(3,2,1,0), LH(1,0), LHU(1,0), LWL(0,1,2,3), LWR(0,1,2,3), MB(3,2,1,0), and MH(1,0), is available on the C4x. In an application such as image processing, it is often important to be able to manipulate packed data. For example, the pixels in color images are often represented by four 8-bit unsigned quantities (red, green, blue, and alpha), which are packed into a single 32-bit word. The byte and half-word instructions make it very easy to manipulate this packed data.

Example 3-4 shows the packing of data from a half-word FIFO to 32-bit data memory, and Example 3-5 shows the unpacking of a 32-bit data array into a 4-byte-wide data array, assuming the 32-bit data array contains four 8-bit unsigned numbers.

*  TITLE USE OF PACKING DATA FROM HALF-WORD FIFO TO 32-BIT DATA MEMORY
*
*  IN THIS EXAMPLE, EVERY TWO INPUT 16-BIT DATA HAVE BEEN PACKED INTO ONE
*  32-BIT DATA MEMORY. THE LOOP SIZE USED HERE IS ARRAY SIZE, NOT THE INPUT
*  DATA LENGTH.
*
        LDI     size 1 RC       ; Load array size
        RPTBD   PACK
        LDI     @fifo_adr,AR1   ; Load fifo address
        LDI     @array,AR2      ; Load data array address
        NOP
*  >>>> Loop starts here
        WLO     AR1 R9          ; Pack 16 LSBs
        WLI     AR1 R9          ; Pack 16 MSBs
PACK    STI     R9 AR2 1        ; Store the data

Example 3-5. Use of Unpacking 32-Bit
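The packing and unpacking that the byte and half-word instructions perform can be expressed in C with shifts and masks. This is a sketch of the data layout only, not of the C4x instruction semantics: `pack_halfwords` and `unpack_bytes` are hypothetical names, and byte 0 is assumed to be the least significant byte of the word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit inputs into one 32-bit word: first input in the 16 LSBs,
 * second input in the 16 MSBs. */
uint32_t pack_halfwords(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* Unpack a 32-bit word into four unsigned 8-bit values, byte 0 = least significant. */
void unpack_bytes(uint32_t word, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)((word >> (8 * i)) & 0xFFu);
}
```

For a pixel stored as four 8-bit channels in one word, `unpack_bytes` recovers the individual channels; which channel sits in which byte is an application convention not fixed by the text.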
123. 0 r3 TR rl r0 r0 BR COS MI subf r rl r3 mpyf arl r6 rl rl BI SIN r2 AR TR subf r3 ar0 r2 addf ar0 r3 r5 rb AR TR BR r2 stf r2 ar3 SECOND BUTTERFLY TYPE TR BI COS BR SIN TI BI SIN BR COS AR AR TR AI AI TI BR AR TR BI AI TI loop bfly2 mpyf tard E65 rb BI COS AR r5 stf r5 ar2 addf ri r0 r2 r2 TI r0 rl mpyf arl r6 r0 r0 BR SIN r3 AI TI addf r2 ar r3 subf r2 ar0 r4 r4 AI TI BI 13 stf r3 ar3 subf 0 5 3 TR r3 r5 0 mpyf arl r7 r0 r0 BR COS r2 AR TR subf ES ar0 22 mpyf arl r6 rl rl BI SIN AI r4 stf r4 ar2 bfly2 addf ar0O r3 r5 r5 AR TR BR r2 stf r2 ar3 clear pipeline 6 48 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued addf rl rf0 r2 r2 TI r0 rl addf r2 ar0 r r3 AI TI im stf r5 ar2 AR r5 cmpi ar6 ar4 bned gruppe do following 3 instructions subf r2 ar0 ir1 r4 v4 AI TI BI r3 stf r3 ar3 irl ldf tar r7 r COS 1 stf r4 ar2 irl AI r4 nop arl irl branch here end of this butterflygroup cmpi 4 ir0 jump out after ld n 3 stage bnzaf stufe ldi sintab ar7 pointer to twi
124. 000101000 00000000000000000000000000011010 00000000000000000000000000100001 00000000000000000000000000101000 00000000000000000000000000011010 00000000000000000000000000011011 00000000000000000000000000101000 Negative difference J 00000000000000000000000000110110 y l Remainder Quot Dividend Divisor aligned 1st SUBC com mand New Dividend Quotient Divisor Difference 20 2nd SUBC command New Dividend Quotient Divisor Difference gt 0 3rd SUBC command New Dividend Quotient Divisor 4th SUBC command Final Result When the SUBC command is used both the dividend and the divisor must be positive Example 3 7 shows a realization of the integer division in which the sign of the quotient is properly handled The last instruction before returning modifies the condition flag in case subsequent operations depend on the sign of the result Integer and Floating Point Division Example 3 7 Integer Division TITLE INTEGER DIVISION SUBROUTINE DIVI INPUTS SIGNED INTEGER DIVIDEND IN RO SIGNED INTEGER DIVISOR IN R1 dl OUTPUT R0 R1 into RO REGISTERS USED RO R3 IRO IR1 OPERATION 1 NORMALIZE DIVISOR WITH DIVIDEND 2 REPEAT SUBC 3 QUOTIENT IS IN LSBs OF RESULT m CYCLES 31 62 DEPENDS ON AMOUNT OF NOR
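The conditional-subtract division can be sketched in C. This is a model written under assumptions rather than a transcription of subroutine DIVI: `subc_step` models one conditional-subtract step (subtract and shift in a 1 if the difference is non-negative, otherwise just shift), `divi_sketch` is a hypothetical wrapper that aligns the divisor, repeats the step, and fixes the sign of the quotient, and the most negative 32-bit value is excluded because its magnitude does not fit the positive range that the step requires.

```c
#include <assert.h>
#include <stdint.h>

/* One conditional-subtract step on positive operands. */
uint32_t subc_step(uint32_t dst, uint32_t src)
{
    return (dst >= src) ? (((dst - src) << 1) | 1u) : (dst << 1);
}

/* Index of the most significant set bit of a nonzero value. */
static int top_bit(uint32_t x)
{
    int i = 0;
    while (x >>= 1)
        i++;
    return i;
}

/* Signed integer division built from the conditional-subtract step.
 * The quotient is truncated toward zero. */
int32_t divi_sketch(int32_t dividend, int32_t divisor)
{
    assert(divisor != 0);
    assert(dividend != INT32_MIN && divisor != INT32_MIN);

    int negative = (dividend < 0) != (divisor < 0);   /* sign of the quotient */
    uint32_t n = (uint32_t)(dividend < 0 ? -dividend : dividend);
    uint32_t d = (uint32_t)(divisor < 0 ? -divisor : divisor);

    if (n < d)
        return 0;

    /* 1. Normalize: align the divisor's MSB with the dividend's MSB. */
    int k = top_bit(n) - top_bit(d);
    uint32_t aligned = d << k;

    /* 2. Repeat the conditional subtract k+1 times. */
    uint32_t acc = n;
    for (int i = 0; i <= k; i++)
        acc = subc_step(acc, aligned);

    /* 3. The quotient is in the k+1 LSBs of the result. */
    uint32_t q = acc & ((1u << (k + 1)) - 1u);
    return negative ? -(int32_t)q : (int32_t)q;
}
```

Dividing 33 by 5 aligns the divisor to 40 and takes four steps, leaving 54 (binary 110110): the low four bits give the quotient 6 and the upper bits give the remainder 3.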
125. 1 IR1 MPYF3 AR2 IRO R5 R4 STF R4 ARO IR1 MPYF3 AR3 IRO R5 R1 MPYF3 AR7 AR3 RO ADDF3 RO R1 R2 MPYF3 AR6 ARA RO 6 68 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued UBF 3 R4 RO R3 UBF 3 AR1 IR0 R3 R4 DDF3 AR1 R3 R4 TF R4 AR2 UBF 3 R2 ARO IRO R4 TF R4 AR3 DDF3 ARO R2 R4 TF R4 AR1 PYF3 4 AR3 R6 R1 TF R4 ARO DDF 3 RO R1 R2 PYF3 AR5 ARA IRO RO UBF 3 RO R1 R3 S S A S S S A S M S A M S SUBF3 AR1 R3 R4 ADDF3 AR1 R3 R4 STF R4 AR2 SUBF3 R2 ARO R4 STF R4 AR3 ADDF3 ARO R2 R4 STF R4 AR1 MPYF3 AR2 R7 R4 STF R4 ARO MPYF3 AR3 R7 R1 MPYF3 AR5 AR3 RO ADDF3 RO R1 R2 SUBF3 R4 RO R3 SUBF3 AR1 R3 R4 ADDF3 AR1 R3 R4 STF R4 AR2 SUBF3 R2 ARO R4 STF R4 AR3 ADDF3 ARO R2 R4 STF R4 AR1 STF R4 ARO Perform Remaining FFT loops LOOP lst E X I1 0 AR1 gt X I1 1st 1 X I1 2nd 2 H x I1 3rd 3 A gt A X I2 8 B gt H X I2 3rd 13 X I2 2nd 14 s AR2 X I2 1st 15 X L3 16 H AR3 X I3 1st 17 loop 4 onwards 2nd 0 lt X I1 X I1 X 13 1 X I3 COS X 14 SIN 2 3 16 29 30 31 lt X 11 X I3 COS X I4 SIN Sg lt MTT XT ES 33 lt X I2
126. 1 LDA AR1 AR2 LDA AR1 AR3 LDA AR1 AR4 ADDI 2 AR2 ADDI 4 AR3 LDI FFT SIZE RC LSH 3 R SUBI 1 RC RPTBD LOOP3 A ADDI 6 AR4 LDA 8 IRO LDA SINE_TABLE ARO ARO points at SIN COS table LDF AR3 R3 ADDF3 R3 AR1 RO RO X I1 X I3 SUBF3 R3 AR1 R1 R1 X I1 X I3 LDF ARA R2 n STF RO AR1 IRO 7 Xe TT lt MPYF 2 0 R2 R2 2 X I4 LDF AR2 R3 BI STF R1 AR3 IRO X 13 lt E MPYF 2 0 R3 R3 2 X I2 LOOP3 A STF R3 AR2 IRO RX 02 eee E B STF R2 AR4 IRO p X 14 Part B 0 A AR1 gt I1 1 lt X 11 X I2 i 2 AR2 gt I2 3 lt X I4 X I3 i A A AR3 gt I3 5 X I1 X I2 COS X I3 X I4 SIN D 6 E ARA gt IA 7 lt X I1 X I2 SIN X I3 X I4 COS 8 r A AR1 gt 9 NOTE COS 2 pi 8 SIN 2 pi 8 Applications Oriented Operations 6 79 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued H H i NI Z LDA SOURCE_ADDR AR1 LDA AR1 AR2 LDA AR1 AR3 LDA AR1 AR4 ADDI 1 AR1 ADDI 3 AR2 ADDI 5 AR3 ADDI 7 AR4 LDA SINE_TABLE AR7 AR7 points at SIN COS table LDI QF
127. 1 9 Emulation Design Considerations ccs 11 14 11 9 1 Using Scan Path Linkers ococccocccccccocoonnrrr 11 14 11 9 2 Emulation Timing Calculations for SPL 11 16 11 9 8 Using Emulation PINS cce esee RE E Re ner mh md aaah s 11 18 11 9 4 Performing Diagnostic Applications lt lt 11 23 A Glossary scimesca t A 1 Contents xxi Figures PTT 00 YO OI Lg dog BON Lo od tod BOD NO G N 00 YO G Te er bu nl pl R cl m eee oe oe clas cl cm Co IN xxii Reset CIRCUIT ue inve n RR ba bod A E d irse edd dide d Ud UE en hae E 1 3 Voltage on the RESET Pin eee n 1 4 System Stack Configuration sss siraga taa aiaia eaaa a a nh 2 8 Implementations of High to Low Memory Stacks 0 00 cece aeaaaee 2 9 Implementations of Low to High Memory Stacks 2 2 9 DMA Bit Reversed Addressing 2 tenes 3 8 Possible System Configurations ccc eee tenes 4 2 External Interfaces 4 hh 4 3 Consecutive Reads Followed by a Write 0 c cece eee eee 4 6 Consecutive Writes Followed by a Read 0000 cece eect eect eens 4 7 C4x Interface to Eight Zero Wait State SRAM 0 00 c cece cece eee 4 8 C4x Interface to Zero Wait State SRAMs Two Strobes 00 0 c eee eee 4 10 Logic for Generation of 0 1 or 2 Wait States for Multiple Devices 4 14 Page Switching for the CY7B185 2 en 4 19
128. 1 CO1 R3 S11 RO R2 SI3 R6 R1 CO3 R2 SI3 R6 R2 CO3 Y I3 RA CO3 R2 SI3 RO RA SI3 RO R2 CO3 R4 S13 x 13 R2 CO3 R4 SI3 Load next Y I2 LOOP BACK TO THE INNER LOOP p X I Y I pointer Increment inner loop counter Increment repeat counter for next time IE 4 IF N1 N2 Setup loop BLK3 Point to SIN 45 Create cosine index AR4 CO21 R7 X 12 R1 X I X 12 R3 Y I Y 12 R4 Y I Y I2 R5 X I1 X 13 R6 R5 R1 R1 R1 R5 R5 Y 11 Y 13 RO R3 R5 7R3 R3 R5 R2 X I X 12 Y I R3 R5 R7 X 13 X I R1 R5 R3 Y 11 Y 13 R1 X 11 X 13 Y 11 R5 R1 R5 R2 R3 R2 R2 R3 R3 R4 R1 R4 R4 R1 R1 R3 R5 RI SR1 CO21 PX I1 R3 R5 7R3 R3 R5 R3 R3 CO21 Y I2 R3 R5 CO21 Rl R1 R2 R4 R1 C021 6 38 Fast Fourier Transforms FFTs Example 6 14 Complex Radix 4 DIF FFT Continued NI STF R3 AR2 IRO X I2 R3 R5 CO21 ADDF R4 R2 R2 R2 R4 MPYF3 R2 AR4 R2 R2 R2 CO21 B STF R1 AR3 Y 13 R4 R2 CO21 BLK3 LDF AR2 R7 Load next X I2 BI STF R2 AR3 IRO X 13 R4 R2 CO21 CMPI R11 BK BPD INLOP Loop back to the inner loop LDI R11 ARO ADDI INPUTP ARO 7 X I Y 1I pointer ADDI 2 R11 lncrement inner loop counter LSH 2 R8 Increment repeat counter for next time LSH 2 AR7 IE A IE LDI BK IRO
129. 2 REGISTERS USED AS INPUT R2 ARO AR1 RC REGISTERS MODIFIED RO R1 R2 R3 RS RE RC ARO ARI REGISTER CONTAINING RESULT R2 f p n BENCHMARKS CYCLES 3 3p not including subroutine overhead PROGRAM SIZE 9 WORDS not including subroutine overhead global LATINV i LATINV RPTBD LOOP Setup the delayed repeat block loop MPYF3 ARO AR1 RO 7k 1 b 0 n 1 gt RO Assume f 0 n gt R2 LDF R2 R3 Put b 0 n 0 n gt R3 MPYF3 AR0 1 R2 R1 k 1 0 n gt R1 Lattice Filters Example 6 9 Inverse Lattice Filter Continued 2 lt i lt p end MPYF3 ADDF3 DDF3 TF np MPYF3 UD DDF3 p W ADDF3 STF NOP end EANUP Repeat block loop start here ARO AR1 1 RO k i b i 1 n 1 gt RO R2 R0 R2 i 1 1 n k i 1 b i 1 1 n 1 f i 1 n gt R2 b i 1 1 n 1 k i 1 f i 1 1 n AR1 1 R1 R3 b i 1 n gt R3 R3 AR1 1 b i 1 1 n b i 1 1 n 1 ARO 1 R2 R1 Pk i f i 1 n gt R1 R11 Delayed return R2 R0 R2 E p1 n k p b p 1 n 1 f p n gt R2 AR1 R1 R3 b p 1 n 1 k p f p 1 n i b p n gt R3 R3 AR1 b p 1 n gt b p 1 n 1 The structure of the forward lattice filter shown in Figure 6 6 is similar to that of the inverse filter also shown in the figure These corresponding equations describe the lattice filter f i 1 n
130. 2
        SUBRF   1.5,R2
        MPYF    R2,R1           ; End of first iteration (16 bits accuracy)
        MPYF3   R1,R1,R2        ; Second iteration
        BRD     R11             ; Delayed return to caller
        MPYF    R0,R2
        SUBRF   1.5,R2
        MPYF    R2,R1           ; End of second iteration (32 bits accuracy)
*                               ; R1 = 1/SQRT(v); return to caller
        .end

You can find the square root by a simple multiplication: sqrt(v) = v x[n], in which x[n] is the estimate of 1/sqrt(v) as determined by the Newton-Raphson algorithm or another algorithm.

3.7 Extended-Precision Arithmetic

The C4x offers 32 bits of precision for integer arithmetic and 24 bits of precision in the mantissa for floating-point arithmetic. For higher precision in floating-point operations, the twelve extended-precision registers, R0 to R11, contain eight more bits of accuracy. Because no comparable extension is available for fixed-point arithmetic, this section discusses how to achieve fixed-point double precision. The technique consists of performing the arithmetic by parts and is similar to the way in which longhand arithmetic is done.

The instructions ADDC (add with carry) and SUBB (subtract with borrow) use the status carry bit for extended-precision arithmetic. The carry bit is affected by the arithmetic operations of the ALU and by the rotate and shift instructions. You can also manipulate it directly by setting the status register to certain values. For proper operation, the overflow
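The relation sqrt(v) = v x[n] mentioned above can be checked with a short C sketch of the Newton-Raphson iteration for 1/sqrt(v). This is a minimal illustration, assuming the update x[n+1] = x[n] (1.5 - (v/2) x[n]^2); the function names and the caller-supplied initial estimate `x0` are hypothetical, and the iteration only converges when `x0` is positive and reasonably close to the true value.

```c
#include <assert.h>

/* Newton-Raphson estimate of 1/sqrt(v), refined a fixed number of times. */
double inv_sqrt_newton(double v, double x0, int iterations)
{
    assert(v > 0.0 && x0 > 0.0 && iterations >= 0);

    double half_v = 0.5 * v;
    double x = x0;
    for (int i = 0; i < iterations; i++)
        x = x * (1.5 - half_v * x * x);   /* x[n+1] = x[n] (1.5 - (v/2) x[n]^2) */
    return x;
}

/* sqrt(v) obtained by one extra multiplication: sqrt(v) = v * (1/sqrt(v)). */
double sqrt_newton(double v, double x0, int iterations)
{
    return v * inv_sqrt_newton(v, x0, iterations);
}
```

Each iteration roughly doubles the number of correct bits, so a handful of iterations from a coarse starting estimate is enough for double precision in this sketch.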
131. 2 and Example 6 14 provide an easy understanding of the FFT algorithm functions However those examples are not optimized for fast ex ecution of the FFT Example 6 15 shows a faster version of a radix 2 DIT FFT algorithm This program uses a different twiddle factor table than the previous examples The twiddle factors are stored in bit reversed order and with a table length of N 2 N FFT length as shown in Example 6 16 For instance if the FFT length is 32 the twiddle factor table should be Address Coefficient 0 R WN 0 COS 2 PI 0 32 1 1 MWN 0 SIN 2 PI 0 32 0 2 R WN 4 COS 2 PI 4 32 0 707 3 MWN 4 SIN 2 PI 4 32 0 707 12 R WN 3 COS 2 PI 3 32 0 831 13 MWN 3 SIN 2 PI 3 32 0 556 14 R WN 7 GOS 2 PI 7 32 0 195 15 I WN 7 SIN 2 PI 7 32 0 981 Applications Oriented Operations 6 41 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT k k A A AA FF FF F FF F FF FF FF HF FF HF FF HF FF FF CACA 0X 0X FF FF X OX 0E X FF F FF F F X X FILENAM D ESCRIPTION m DAT VERSION ERSION Wh oo 4 0 CR2DIT ASM KKK KK KKK KKK KKK KK KKK KKK KK KKK KKK KKK KKK A KA KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KK KKK KKK KK COMPLEX RADIX 2 DIT FFT FOR TMS320C40 6 29 93 4 0 DATI 1 91 7 1 92 6 29 93 Iu co NTS nal ve Ck Ck ck ck Ck ck ck kk Ak A A A ZA AK Z KKK KKK KKK KK KK
132. 4 r2 ro r2 r2 ro r3 r6 7 r4 BI r2 r7 r6 r7 r7 CI r3 KKK KKK KK KKK AZ HA AZ K KKK KKK KKK AX KA AX KA KA KAZ A KA AX ck ck ck KA KK KKK KKK KKK AKA KK KKK KKK KKK KK KK KKK KKK KKK KK KKK KR KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KK KK KKK 6 46 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued ldi fg2 irl subi 1 ir0 ar5 ldi 1 ar6 ldi sintab ar7 ldi 0 ar4 ldi inputp ar0 stufe ldi ar0 ar2 addi ir0 ar0 ar3 ldi ar3 arl lsh 1 ar6 lsh 2 ar5 lsh 1 ar5 lsh 1 ir0 lsh 1 151 addi 1 irl Lat arl r6 I ldf katy gruppe fill pipeline ar0 upper real butterfly input arl lower real butterfly input ar2 upper real butterfly output ar3 lower real butterfly output the imaginary part has to follow ldf ar7 r6 mpyf arl i6 ek pd addf x ar4 r0 r3 mpy f FAC rd EO ldi ar5 rc rptbd bflyl mpyf Rat Say tard 1 addf O cl 43 mpyf Harker ey 7 ed IBI subf r3 ar0 r2 addf ar 4 r3 rb5 IN stf r2 ar3 5 FIRST BUTTERFLY TYPE TR BR COS BI SIN TI BR SIN BI COS AR AR TR E AI AI TI BR AR TR BI AI TI Es loop bflyl Seo Se Ne Ne Ne Ne e Se Se Ne e Se Se Ne e pointer to twiddle factor group counter upper real butterfly output lower real butterfly o
134. 4x to signal the completion of a bus access for external wait-state generation. Because RDY0 is sampled on the falling edge of H1, the H3 output clock is used as the PLD clock input. Example 4-1 shows the ready logic equations for programming the 16R4 PLD. The PLD language used is ABEL.

STRB0 is an input into the PLD that indicates that a valid C4x bus cycle is occurring. Also, a delayed version of STRB0, synchronized with H1 going high, is provided as the strb_syn input signal. This delayed signal is needed to avoid problems with a race condition that may exist between STRB0 going low and H1 rising. RESET can be used to bring the state machine back to the idle state.

Notice that the RDY0 output of the PLD is not registered. An asynchronous RDY0 signal is necessary to generate a ready signal for zero-wait-state devices. When a zero-wait-state device is selected (ahi1 high in Example 4-1) and STRB0 is low, the PLD asserts RDY0 low within 7 ns. Hence, RDY0 goes active fast enough to satisfy the 20-ns setup time of RDY0 low before H1 low.

For generation of RDY0 for one and two wait states, the device-select address bits and strb_syn are delayed one and two cycles, respectively, by the PLD before RDY0 is brought active low. The one-H3-cycle delay required for one-wait-state device ready generation corresponds to state wait_one in Example 4-1, and the two-H3-cycle delay required for two-wait-state devices corresponds to state wait_twoa
135. 5 3 to M butterfly loop bflend ldf ar r7 r7 COS AI r4 stf r4 ar2 1r0 ldf ar7tt r6 r6 SIN BR r2 stf r2 ar3 mpyf arl r6 r5 rb BI SIN AR r3 stf r3 ar2 addf fl r0 r2 r2 TI r0 rl mpyf arl r4 r0 r0 BR COS r3 AI TI addf r2 ar 0 r3 subf r2 ar0 1r0 r4 r4 AI TI BI 13 stf r3 ar3 ir0 addf r0 rb r3 r3 TR r0 r5 mpyf arl r6 r0 r0 BR SIN r2 AR TR subf r3 ar0 r2 mpyf arltt ir0 r7 r1 rl BI COS AI r4 stf r4 ar2 1r0 addf ar0 13 13 r3 AR TR BR r2 SEE r2 ar3 mpyf arl 7 2r5 rb BI COS AR r3 stf r3 ar2 subf El r0 ra2 r2 TI r0 rl mpyf arl r6 r0 r0 BR SIN r3 AI TI addf r2 ar r3 subf r2 ar0 1r0 r4 14 AI TI BI 13 stf r3 ar3 ir0 subf r 0 rb5 r3 13 TR r0 r5 mpyf arl r7 r r0 BR COS r2 AR TR subf rt3 ar0 r2 bflend mpyf arl ir0 r6 r1 rl BI SIN r3 AR TR addf ar0 r3 r3 clear pipeline 6 52 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued ENDB ldi cmpi ldi ldi subi ldi ldf nop ldf IN stf bitrvl BI stf t INPLACE bead END OF FFT ck ck kk kk OR kk OR AA ck c kck ck ck ck ck ckck c A A A A A A A A ck kckck ck kckck ck ck ck ck ck ck ck ck ck ck oko stf r2 ar3 BR r2 A
137. 8.1 Communication Ports

To provide simple processor-to-processor communication, the C4x has six parallel bidirectional communication ports. Because these ports have port arbitration units to handle the ownership of the communication-port data bus between the processors, you should concentrate only on the internal operation of the communication ports. For software, these communication ports can be treated as 32-bit on-chip data I/O FIFO buffers. Reading data from or writing data to a communication port is simple:

        LDI     @comm_port0_input,R0    ; Read data from comm port 0
or
        STI     R0,@comm_port0_output   ; Write data to comm port 0

If the CPU or DMA reads from or writes to the communication port I/O FIFO and the I/O FIFO is either empty (on a read) or full (on a write), the read/write execution is extended either until data is available in the input FIFO (for a read) or until space is available in the output FIFO (for a write). Sometimes you can use this feature to synchronize the devices. However, this can slow down the processing speed and even hang up the processor. Avoid such situations by synchronizing the CPU/DMA accesses with the following flags that indicate the status of the port:

ICRDY (input channel ready)
    = 0: the input channel is empty and not ready to be read
    = 1: the input channel contains data and is ready to read
ICFULL (input channel full)
    = 0: the input channel is not full
    = 1: t
138.
        80H,R2                  ; Test sign bit
        BZAT    R11             ; If positive, delayed return and
                                ; annul next three instructions
        NEGI    R0              ; Negate if a negative number
        NOP
        NOP
        BU      R11             ; Return

6.2 FIR, IIR, and Adaptive Filters

Digital filters are a common requirement for digital signal processing systems. There are two types of digital filters: finite impulse response (FIR) and infinite impulse response (IIR). Each of these types can have either fixed or adaptable coefficients. In this section, the fixed-coefficient filters are presented first, and then the adaptive filters are discussed.

6.2.1 FIR Filters

If the FIR filter has an impulse response h(0), h(1), ..., h(N-1), and x(n) represents the input of the filter at time n, the output y(n) at time n is given by this equation:

    y(n) = h(0) x(n) + h(1) x(n-1) + ... + h(N-1) x(n-(N-1))

Two features of the C4x that facilitate the implementation of FIR filters are parallel multiply/add operations and circular addressing. The first permits the performance of a multiplication and an addition in a single machine cycle, while the second makes a finite buffer of length N sufficient for the data x.

Figure 6-1 shows the arrangement of the memory locations to implement circular addressing, while Example 6-5 presents the C4x assembly code for an FIR filter.

Figure 6-1. Data Memory Organization for an FIR Filter
[Figure not reproduced; it shows the impulse response alongside the initial and final positions of the input samples.]
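The FIR equation and the role of the length-N circular buffer can be illustrated with a small Python model. This is a sketch, not a translation of Example 6-5: the class name CircularFir and its fields are hypothetical, and the modulo arithmetic plays the part that circular addressing plays in hardware.

```python
class CircularFir:
    """FIR filter y(n) = sum of h(i) * x(n-i) using a length-N circular buffer."""

    def __init__(self, h):
        self.h = list(h)
        self.n = len(self.h)
        self.x = [0.0] * self.n   # holds only the last N input samples
        self.newest = 0           # buffer index of the most recent sample

    def step(self, sample):
        # Overwrite the oldest sample; no data is ever shifted in memory.
        self.newest = (self.newest - 1) % self.n
        self.x[self.newest] = sample
        acc = 0.0
        for i in range(self.n):
            # x[(newest + i) mod N] is x(n - i)
            acc += self.h[i] * self.x[(self.newest + i) % self.n]
        return acc


fir = CircularFir([1.0, 2.0, 3.0])
impulse_response = [fir.step(v) for v in [1.0, 0.0, 0.0, 0.0]]
```

Feeding a unit impulse returns the coefficients themselves, which is a quick sanity check that the index wrap-around is correct.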
139. Addressing in which several bits of an address are reversed in order to speed processing of algorithms such as Fourier transforms.

BK: See block-size register.

block-size register: A register used for defining the length of a program block to be repeated in repeat mode.

bootloader: A built-in segment of code that transfers code from an external memory or from a communication port to RAM at power-up.

carry bit: A bit in status register ST1 used by the ALU for extended arithmetic operations and accumulator shifts and rotates. The carry bit can be tested by conditional instructions.

circular addressing: An addressing mode in which an auxiliary register is used to cycle through a range of addresses to create a circular buffer in memory.

context save/restore: A save/restore of system status (status registers, accumulator, product register, temporary register, hardware stack, auxiliary registers, etc.) when the device enters/exits a subroutine such as an interrupt service routine.

CPU: Central processing unit. The unit that coordinates the functions of a processor.

CPU cycle: The time it takes the CPU to go through one logic phase (during which internal values are changed) and one latch phase (during which the values are held constant).

cycle: See CPU cycle.

D0-D31: External data bus pins that transfer data between the processor and external data/program memory or I/O devices. See also LD0-LD31.

data
140. [Timing diagram for bank switching not reproduced. Signals shown: A30-13 (valid), STRB, SEL0, SEL1, and D31-0 (bank 0 on bus, then bank 1 on bus); intervals t1 to t6.]

Table 4-2. Page Switching Interface Timing

Time Interval   Event                                Time Period
t1              H1 falling to address/STRB valid     7 ns
t2              STRB to select delay                 5 ns
t3              Memory disable from select           8 ns
t4              H1 falling to STRB                   7 ns
t5              STRB to select delay                 5 ns
t6              Memory output enable delay           3 ns

4.6 Parallel Processing Through Shared Memory

The C4x's two memory interfaces allow flexibility in designing shared-memory interfaces for parallel processing. Many processors can be linked together in a wide variety of network configurations through these ports. Figure 4-10 illustrates C4x shared-memory networks that you can use to fulfill many signal processing system needs.

Figure 4-10. C4x Shared/Distributed Memory Networks
[Figure not reproduced; it shows C4x devices connected through their global and local buses in a shared-memory architecture.]

4.6.1 Shared Global-Memory Interface

One of the most common multiprocessor configurations is the sharing of memory by all processors in a system. Shared memory is typically implemented by tying the processors' data and address lines together. However, the shared-memory interface must guarantee that no more than one processor is driving the s
141.
*   RC = LENGTH OF FILTER - 2 (N - 2)
*   BK = LENGTH OF FILTER (N)
*
*   REGISTERS USED AS INPUT: AR0, AR1, RC, BK
*   REGISTERS MODIFIED: R0, R2, AR0, AR1, RC
*   REGISTER CONTAINING RESULT: R0
*
*   BENCHMARKS: CYCLES: 3 + N (not including subroutine overhead)
*               WORDS:  6 (not including subroutine overhead)
*
        .global FIR
*
FIR     RPTBD   CONV                        ; Set up the repeat cycle
*                                           ; Initialize R0:
        MPYF3   *AR0++(1),*AR1++(1)%,R0     ; h(N-1) * x(n-(N-1)) -> R0
        LDF     0.0,R2                      ; Initialize R2
        NOP
*
*   FILTER (1 <= i < N)
*
CONV    MPYF3   *AR0++(1),*AR1++(1)%,R0     ; h(N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                    ; Multiply and add operation
*
        BUD     R11                         ; Delayed return
        ADDF    R0,R2,R0                    ; Add last product
        NOP
        NOP
        .end

6.2.2 IIR Filters

The transfer function of an IIR filter has both poles and zeros. Its output depends on both the input and the past output. As a rule, IIR filters need less computation than an FIR filter with similar frequency response, but they have the drawback of being sensitive to coefficient quantization. Most often, IIR filters are implemented as a cascade of second-order sections called biquads. Example 6-6 and Example 6-7 show the implementation for one biquad and for any number of biquads, respectively.

    y(n) = a1 y(n-1) + a2 y(n-2) + b0 x(n) + b1 x(n-1) + b2 x(n-2)

However, the following two equations are more convenient and have smaller storage requirements:

    d(n) = a2 d(n-2) + a1 d
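The excerpt breaks off in the middle of the two-equation form, so the following Python sketch rests on an assumption: it uses the standard direct form II pair, d(n) = a2 d(n-2) + a1 d(n-1) + x(n) and y(n) = b2 d(n-2) + b1 d(n-1) + b0 d(n), which may or may not match the truncated original term for term. The function names are illustrative. The sketch shows that this pair gives the same output as the single difference equation while storing two delay values instead of four.

```python
def biquad_direct(x, a1, a2, b0, b1, b2):
    """Single difference equation: keeps two past inputs and two past outputs."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = a1 * y1 + a2 * y2 + b0 * xn + b1 * x1 + b2 * x2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def biquad_delay_line(x, a1, a2, b0, b1, b2):
    """Assumed two-equation form: keeps only d(n-1) and d(n-2)."""
    y = []
    d1 = d2 = 0.0
    for xn in x:
        dn = a2 * d2 + a1 * d1 + xn            # d(n)
        yn = b2 * d2 + b1 * d1 + b0 * dn       # y(n)
        d2, d1 = d1, dn
        y.append(yn)
    return y


coeffs = dict(a1=0.5, a2=-0.25, b0=1.0, b1=2.0, b2=1.0)
impulse = [1.0] + [0.0] * 7
out_direct = biquad_direct(impulse, **coeffs)
out_delay = biquad_delay_line(impulse, **coeffs)
```

Both functions start from zero initial conditions; with nonzero initial state the two forms would need matching state conversions to agree.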
142. C30 Expanded calling conventions to registers for parameter passing CK ck ck Ck ck ck Ck ck KA AA K Sk ck Ck Sk A Ck ck ck ck A Ck kk kk ck AAA KA kk ck A KKK SYNOPSIS int ifft rl FFT SIZE LOG SIZE SOURCE ADDR DEST ADDR SINE TABLE BIT REVERSE ara r2 ES re rs re int FFT SIZE 64 128 256 512 1024 int LOG_SIZE 7 6 Ws 8 9 10 float SOURCE ADDR Points to where data is originated and operated on float DEST ADDR Points to where data will be stored float SINE TABLE Points to the SIN COS table int BIT REVERSE 0 Bit Reversing is disabled lt gt 0 Bit Reversing is enabled NOTE 1 If SOURCE ADDR DEST ADDR then in place bit reversing is kk ck Ck ck ck Ck Ck KKK ck Ck ck KKK KK KKK KKK KKK KK KKK DESCRIPTION FF 0X FF FFF 0X 0X 0X FF FF F FF F OX OX FFF FF OR F FF FF Ro oo o FF HF ox I as follows performed if enabled more processor intensive 2 FFT_SIZE must be gt 64 this is not checked Generic function to do an inverse radix 2 FFT computation on the C40 The input data array is FFT_SIZE long with real and imaginary points R and KKK KKK ck Ck KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK Applications Oriented Operations 6 73 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued SOURCE_ADDR 0 KKK KK NOTI CAUTION
143. C32 is an inexpensive and flexible solution. Some of the advantages of using an I/O coprocessor include:

- An I/O coprocessor can help with data processing.
- An I/O coprocessor allows for error correction and recovery from C4x commport interface problems.
- An I/O coprocessor can buffer data, allowing faster C4x data throughput.

Figure 8-4 shows the C32-to-C4x interface. Through the interface, a C4x commport is memory-mapped to the C32 external memory bus. The interface uses four C32 I/O pins to drive the commport control signals.

Figure 8-4. A C32-to-C4x Interface
[Figure not reproduced. Labels shown: VCC, C4x output comm port, CREQ, CACK, CRDY, CSTRB, D0-D7.]

Pullup resistors in the XF0, XF1, TCLK0, and TCLK1 lines are used to prevent undesired glitches due to temporary high-impedance conditions. Series resistors are also used on the same pins for better impedance matching.

The interface software drivers and a more detailed explanation of the interface can be obtained from the TI BBS (filename: 4xaic.exe). Token transfer and word transfer drivers are included with the software.

8.8 Implementing a Token Forcer

After system reset, half of the communication channels associated with a particular C4x have token ownership (communication ports 0, 1, 2) and the other half (communication ports 3, 4, 5) do not. If, because of system configuration requireme
144. DMA transfer is set up to have priority over a CPU operation and to generate an interrupt flag (DMA INT2) after the transfer is completed. The DMA control register is set to 00C4 0007h.

Example 7-1. Array Initialization With DMA

*  TITLE ARRAY INITIALIZATION WITH DMA
*
*  THIS EXAMPLE INITIALIZES A 128-ELEMENT ARRAY TO ZERO. THE DMA TRANSFER IS
*  SET UP TO HAVE HIGHER PRIORITY OVER CPU OPERATION. THE DMA INT2 INTERRUPT
*  FLAG IS SET TO 1 AFTER THE TRANSFER IS COMPLETED.
*
        .data
DMA2    .word   001000C0H       ; DMA channel 2 map address
CONTROL .word   00C40007H       ; DMA register initialization data
SOURCE  .word   ZERO
SRC_IDX .word   0
COUNT   .word   128
DESTIN  .word   ARRAY
DES_IDX .word   1
ZERO    .float  0.0             ; Array initialization value 0.0
        .bss    ARRAY,128
        .text
START   LDP     DMA2            ; Load data page pointer
        LDA     @DMA2,AR0       ; Point to DMA channel 2 registers
        LDI     @SOURCE,R0      ; Initialize DMA source register
        STI     R0,*+AR0(1)
        LDI     @SRC_IDX,R0     ; Initialize DMA source index register
        STI     R0,*+AR0(2)
        LDI     @COUNT,R0       ; Initialize DMA count register
        STI     R0,*+AR0(3)
        LDI     @DESTIN,R0      ; Initialize DMA destination register
        STI     R0,*+AR0(4)
        LDI     @DES_IDX,R0     ; Initialize DMA destination index register
        STI     R0,*+AR0(5)
        LDI     @CONTROL,R0     ; Start DMA channel 2 transfer
        STI     R0,*AR0
        .end
145. DSP (literature number SPRA031). This application note covers unprocessed partitioned FFT implementation for large FFTs. The source code is also available on the TI DSP Bulletin Board under the filename C40PFFT.EXE.

6.5.1 Complex Radix-2 DIF FFT

Example 6-12 shows a simple implementation of a complex radix-2 DIF FFT on the C4x. The code is generic and can be used with any transform length. However, for the complete implementation of an FFT, a table of twiddle factors (sines/cosines) is needed, and this table depends on the size of the transform. To retain the generic form of Example 6-12, the table with the twiddle factors (containing 1-1/4 complete cycles of a sine) is presented separately in Example 6-13 for the case of a 64-point FFT. A full cycle of a sine should have a number of points equal to the FFT size. If the table with the twiddle factors and the FFT code are kept in separate files, they should be connected at link time.

Example 6-12. Complex Radix-2 DIF FFT

*  FILENAME    : CR2DIF.ASM
*  DESCRIPTION : COMPLEX, RADIX-2 DIF FFT FOR TMS320C40 (C callable)
*  DATE        : 6/29/93
*  VERSION     : 4.0
146. E ARO ARO points at SIN COS table LSH 2 IR1 SUBI 3 RC SUBF3 AR2 AR1 R3 R3 X I1 X I2 ADDF3 AR1 AR2 R2 R2 X I1 X I2 PYF3 R3 ARO IRO R1 R1 R3 SIN LDF AR4 R4 R4 X I4 PYF3 R3 ARO IR1 RO RO R3 COS SUBF3 AR3 R4 R3 R3 X I4 X 13 ADDF 3 R4 AR3 R2 R2 X 13 X 14 STF R2 AR1 XL e t PYF3 R2 ARO IR1 R4 R4 R2 COS STF R3 AR2 X I2 RPTBD IN BLK ADDF3 R4 R1 R3 R3 R3 SIN R2 COS 4 F PYF3 R2 ARO R1 R1 R2 SIN STF R3 AR4 X I4 lt SUBF3 R1 RO R4 R4 R3 COS R2 SIN SUBF3 AR2 AR1 R3 R3 X I1 X I2 ADDF3 AR1 AR2 R2 R2 X I1 X I2 MPYF3 R3 ARO IRO R1 R1 R3 SIN STF RA AR3 X 13 LDF AR4 R4 R4 X I4 MPYF3 R3 ARO IR1 RO RO R3 COS Applications Oriented Opera tions 6 77 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued So Ne Ne Ne IN_BLK U D 3 3 GHVUU nj Duuousg pusgu tu D P ST LD E SU AD S PEW jw 3 1 worrrzrmouuzuu E BF3 DF3 YF3 E DF3 YF3 BF3 BF3 DF3 YE3 F F YF3 BF3 DF3 LS LS E U D AR3 R4 R3 R4 AR3 R2 R2 AR1 R2 AR0 IR1 R4 R3 AR2 R4 R1 R3 R2 ARO R1 R3 AR4 R1 RO R4 AR2 AR1 R3 AR1 AR2 R2 R3 ARO IRO R1 R4 AR3 AR4
147. E ROUTINE CODE GOES HERE global RESTR CONTEXT RESTORE AT TH END OF A SUBROUTINE CALL OR INTERRUPT ESTR RESTORE THE REST REGISTERS FROM THE REGISTER FILE POP RC Restore repeat counter POP RE Restore repeat end address POP RS Restore repeat start address POP DIE Restore DMA interrupt enable register POP IIF Restore interrupt flag register POP IIE Restore interrupt enable register POP BK Restore block size register POP TRI Restore index register IRl POP IRO Restore index register IRO POP DP Restore data page pointer RESTORE THE AUXILIARY REGISTERS POP AR7 Restore AR7 POP AR6 Restore AR6 POP AR5 Restore AR5 POP AR4 Restore AR4 POP AR3 Restore AR3 POP AR2 Restore AR2 POP AR1 Restore ARI POP ARO Restore ARO Context Switching in Interrupts and Subroutines Example 2 6 Context Save and Context Restore Continued R ECISION REGISTERS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi r CS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi E CS tore the upp lower 32 bi CS ESTORE TH EXTENDED PR POP
148. Example 6-13. Table With Twiddle Factors for a 64-Point FFT (Continued)

        .float  -0.956940
        .float  -0.980785
        .float  -0.995185
        .float  -1.000000
        .float  -0.995185
        .float  -0.980785
        .float  -0.956940
        .float  -0.923880
        .float  -0.881921
        .float  -0.831470
        .float  -0.773010
        .float  -0.707107
        .float  -0.634393
        .float  -0.555570
        .float  -0.471397
        .float  -0.382683
        .float  -0.290285
        .float  -0.195090
        .float  -0.098017
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185

6.5.2 Complex Radix-4 DIF FFT

The radix-2 algorithm has tutorial value because it is relatively easy to understand how the FFT algorithm functions. However, radix-4 implementations can increase the speed of execution by reducing the overall arithmetic required. Example 6-14 shows the generic implementation of a complex DIF FFT in radix 4. A companion table like the one in Example 6-13 should be used to provide the twiddle factors.

Example 6-14. Complex Radix-4 DIF FFT
149.
        POPF    R11             ; Restore the upper 32 bits and
        POP     R11             ; the lower 32 bits of R11
        POPF    R10             ; Restore the upper 32 bits and
        POP     R10             ; the lower 32 bits of R10
        POPF    R9              ; Restore the upper 32 bits and
        POP     R9              ; the lower 32 bits of R9
        POPF    R8              ; Restore the upper 32 bits and
        POP     R8              ; the lower 32 bits of R8
        POPF    R7              ; Restore the upper 32 bits and
        POP     R7              ; the lower 32 bits of R7
        POPF    R6              ; Restore the upper 32 bits and
        POP     R6              ; the lower 32 bits of R6
        POPF    R5              ; Restore the upper 32 bits and
        POP     R5              ; the lower 32 bits of R5
        POPF    R4              ; Restore the upper 32 bits and
        POP     R4              ; the lower 32 bits of R4
        POPF    R3              ; Restore the upper 32 bits and
        POP     R3              ; the lower 32 bits of R3
        POPF    R2              ; Restore the upper 32 bits and
        POP     R2              ; the lower 32 bits of R2
        POPF    R1              ; Restore the upper 32 bits and
        POP     R1              ; the lower 32 bits of R1
        POPF    R0              ; Restore the upper 32 bits and
        POP     R0              ; the lower 32 bits of R0
        POP     ST              ; Restore status register
*
*  RESTORE IS COMPLETE
*
        RETI

2.5 Repeat Modes

2.5.1 Block Repeat

The RPTB, RPTBD, and RPTS instructions support looping without overhead. Loop execution parameters are specified by three registers, as can be seen in the following examples:

RS    Repeat start address
RE    Repeat end address
RC    Repeat counter

In principle, it is possible to nest repeat blocks. However, there is only one set of control registers: RS, RE, and RC. It is therefore necessary to save these registers before entering an inside loop and to restore these registers after completing the inside loop. It ta
150. FT SIZE RC LSH 3 RC LDA RC IR1 SUBI 2 R LDF AR2 R6 R6 X I2 LDF AR3 RO RO X 13 ADDF3 R6 AR1 R5 RS X 11 X 12 SUBF3 R6 AR1 R4 R4 X 11 X 12 l SUBF 3 RO R4 R3 R3 X I1 X I2 X 13 ADDF3 RO R4 R2 R2 X I1 X I2 4X I3 SUBF 3 RO AR4 R1 R1 X 14 X 13 STF R5 AR1 IRO X I1 lt r RPTBD LOOP3 B ADDF3 R2 AR4 R5 RS X I1 X I2 X I3 X 14 STF R1 AR2 IRO X I2 lt MPYF3 R5 AR7 IR1 R1 R1 R5 SIN SUBF3 AR4 R3 R2 R2 X I1 X I2 X I3 X I4 MPYF3 R2 ART7 RO0 RO R2 SIN STF R1 AR4 IRO X I4 lt Fr LDF AR2 R6 R6 X I2 STF RO AR3 IRO X I3 ADDF3 R6 AR1 R5 RS X 11 X 12 LDF AR3 RO RO X 13 SUBF 3 R6 AR1 R4 R4 X I1 X I2 SUBF3 RO R4 R3 R3 X I1 X I2 X I3 ADDF3 RO R4 R2 R2 X 11 X 12 X 13 SUBF 3 RO AR4 R1 R1 X I4 X 13 STF R5 AR1 IRO s MIL lt ADDF3 R2 AR4 R5 RS X I1 X I2 X I3 X 14 STF R1 AR2 IRO X I2 lt 4 MPYF3 R5 AR7 R1 R1 R5 SIN lt SUBF3 AR4 R3 R2 R2 X I1 X I2 X I3 X I4 LOOP3 B MPYF3 R2 AR7 RO RO R2 SIN STF R1 AR4 IRO X I4 lt STF RO AR3 X 13 6 80 Fast Fourier Transforms FFTs Example 6 18 Real Inverse Radix 2 FFT Continued r
151. Figure 11-11 to produce an EMU0/1 signal for the emulator.

11.9.4 Performing Diagnostic Applications

For systems that require built-in diagnostics, it is possible to connect the emulation scan path directly to a TI ACT8990 test bus controller (TBC) instead of the emulation header. The TBC is described in the Texas Instruments Advanced Logic and Bus Interface Logic Data Book (literature number SCYD001). Figure 11-13 shows the scan path connections of n devices to the TBC.

Figure 11-13. TBC Emulation Connections for n JTAG Scan Paths
[Figure not reproduced. TBC pins shown: TCKI, TDO, TMS0, TMS1, TMS2/EVNT0, TMS3/EVNT1, TMS4/EVNT2, TMS5/EVNT3, TCKO, TDI0, TDI1. Target signals shown: TDI, TMS, EMU0, EMU1, TRST, TCK, TDO.]

In the system design shown in Figure 11-13, the TBC emulation signals TCKI, TDO, TMS0, TMS2/EVNT0, TMS3/EVNT1, TMS5/EVNT3, TCKO, and TDI0 are used, and TMS1, TMS4/EVNT2, and TDI1 are not connected. The target devices' EMU0 and EMU1 signals are connected to VCC through pullup resistors and tied to the TBC's TMS2/EVNT0 and TMS3/EVNT1 pins, respectively. The TBC's TCKI pin is connected to a clock generator. The TCK signal for the main JTAG scan path is driven by the TBC's TCKO pin.

On the TBC, the TMS0 pin drives the TMS pins o
152.
        HO RO IIF               ; Enable INT2
        OR      ENABLE,ST       ; Enable all interrupts
*
*  MAIN PROCESSING SECTION FOR ISR2
*
        XOR     ENABLE,ST       ; Disable all interrupts
        POPF    R1              ; Restore upper 32 bits and
        POP     R1              ; lower 32 bits of R1
        POPF    R0              ; Restore upper 32 bits and
        POP     R0              ; lower 32 bits of R0
        POP     IIF
        POP     IIE             ; Restore interrupt enable register
        POP     DP              ; Restore data page register
        POP     ST              ; Restore status register
        RETI                    ; Return and enable interrupts

2.4 Context Switching in Interrupts and Subroutines

Context switching is commonly required when a subroutine call or interrupt is processed. It can be extensive or simple, depending on system requirements. For the C4x, the program counter is automatically pushed onto the stack. Important information in other C4x registers, such as the status, auxiliary, or extended-precision registers, must be saved on the stack with PUSH/PUSHF and recovered later with POP/POPF instructions.

You need to preserve only the registers that are modified inside your subroutine or interrupt/trap service routine and that could potentially affect the previous context environment.

Note: The status register should be saved first and restored last to preserve the processor status without any further change caused by other context-switching instructions.

If the previous c
153. I r4 stf r4 ar2 ir0 addf r1 r0 r2 r2 TI r0 rl addf r2 ar0 r3 r3 AI 4 TI AR r3 stf r3 ar2 subf EL aros r4 AI TI BI 53 stf r3 ar3 stf r4 ar2 SAI r4 Ck Ck ck ck ck ck ck ck ck ck AA ck ck KKK A XK AXA ck KKK KKK KKK KKK A ck ck ck ck KKK KKK KKK KKK ck kk AA AAA kk A kk ck A KKK KKK k BITR rptbd Ck Ck ck ck Ck ck ck kk ce KA K KA Ck ZA K A ck KA ck KA KK KKK A KA ck ck ck KK KKK KKK KKK ck MA ck kk KA A ck ck A kk AAA kk Sk kk VERSAL Qinputp arO QGoutputp arO0 INPLACE outputp arl fg ird 2 ir0 rc bitrvl 2 eL ar0 1 r0 ar0 ir0 b r1 r0 arl 1 ar0 1 r0 rl arl iel end ar0 ir0 b rl rO arl 1 rl arl Return to C environment rptbd BITRV2 nop t rl 2 nop ar0 ir0 b nop ar1 DSR_AD This bit reversal section assume input and output in Re Im Re Im format KEK KK KKK KK KKK A AZ A KA AX ck ck AXA A KA AX AZ A AXA A KA AXA AKA AX KA kc AKA AAA KA AAA AA A KA AAA ko ck ck ck ck Mk Sk kv kx Sk kA kx ko zirO FFT SIZE 7 CC FFT_SIZI irl 2 read first Im value Applications Oriented Operations 6 53 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued CONT BITRV2 end Return to C cmpi arl ar0 bgeat CONT ldf arl r0 ldf ar0 rl stf r ar stf rl arl ldf
154. IR0 is set to 512 instead of 256.

Example 3-6. CPU Bit-Reversed Addressing

*  TITLE BIT-REVERSED ADDRESSING
*
*  THIS EXAMPLE MOVES THE RESULT OF THE 512-POINT FFT COMPUTATION, POINTED AT
*  BY AR0, TO A LOCATION POINTED AT BY AR1. REAL AND IMAGINARY POINTS ARE
*  ALTERNATING.
*
        LDI     511,RC              ; Repeat 511 + 1 times
        RPTBD   LOOP
        LDI     512,IR0             ; Load FFT size
        LDI     2,IR1
        LDF     *+AR0(1),R1         ; Load first imaginary point
*
        LDF     *AR0++(IR0)B,R0     ; Load real value (and point to next
||      STF     R1,*+AR1(1)         ; location) and store the imaginary value
LOOP    LDF     *+AR0(1),R1         ; Load next imaginary point and store
||      STF     R0,*AR1++(IR1)      ; previous real value

3.4.2 DMA Bit-Reversed Addressing

In DMA bit-reversed addressing, two bits in the DMA control register enable bit-reversed addressing on DMA reads (READ BIT REV) and DMA writes (WRITE BIT REV). The source address index register and destination address index register define the size of the bit-reversed addressing. Their function is similar to that of the CPU index register IR0, described in the previous subsection.

Two DMA block transfers are required when the DMA is used for bit-reversed transfer of complex numbers: one to transfer the real parts and one to transfer the imaginary parts.

Figure 3-1 illustrates the DMA settings required for a DMA operation equivalent to Example 3-6. Unified autoin
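The address sequence that bit-reversed addressing produces can be modeled in Python. This is a sketch under assumptions, not the manual's code: it treats a plain array of N words (N a power of two) and steps the read address by N/2 using reverse-carry addition, which is how the index register is described as defining the size of the bit-reversed addressing. The function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(addr, step, bits):
    """Add `step` to `addr` with the carry propagating toward the LSB."""
    mask = (1 << bits) - 1
    total = (bit_reverse(addr, bits) + bit_reverse(step, bits)) & mask
    return bit_reverse(total, bits)


def unscramble(data):
    """Read `data` in bit-reversed address order and write it out linearly."""
    n = len(data)
    bits = n.bit_length() - 1
    out = []
    addr = 0
    for _ in range(n):
        out.append(data[addr])
        addr = reverse_carry_add(addr, n // 2, bits)   # step = half the size
    return out


scrambled = [0, 4, 2, 6, 1, 5, 3, 7]   # FFT output order for N = 8
ordered = unscramble(scrambled)
```

In Example 3-6 the real and imaginary values alternate in memory, so the block being addressed is twice the FFT length, which is consistent with the remark that IR0 is loaded with 512 rather than 256.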
155. Im The twiddle factors are supplied in a table put in a section with a global label SINE pointing to the beginning of the table This data is included in a Separate file to preserve the generic nature of the program The sine table size is 5 FFT SIZE 4 Note Sections needed in the linker command file ffttxt FFT code fftdat FFT data ACA FF ob ook OX ox FF FF FFF FF obo oko FF FF FF FF oko FF FHF HF FH Applications Oriented Operations 6 27 Fast Fourier Transforms FFTs Example 6 12 Complex Radix 2 DIF FFT Continued KKK KK KKK KKK KKK KK KKK KKK KKK A KA KKK KKK KK A KA KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KK KKK KKK KK AR j AI N x N E i BR j BI AR AR BR AI AI BI BR AR BR COS BI AI BI COS globl SINE globl _cr2dif globl STARTB ENDB sect tftdat SINTAB word _SINE OUTPUTP space 1 FFTSIZE space 1 sect UStELEXL _cr2dif LDI SP ARO PUSH DP PUSH R4 PUSH R5 PUSH R6 PUSHF R6 PUSH AR4 PUSH ARS PUSH AR6 PUSH R8 LDP SINTAB Jaf REGPARM LDI ARO 1 AR2 LDI ARO 2 R10 LDI ARO 3 R9 LDI ARO 4 RC else LDI R2 R10 LDI R3 R9 endif STI RC OUTPUTP STI R10 FFTSIZE AI BI SIN AR BR SIN 0 COS Address of sine cosin Entry point for execution j SIN BR AR
156. In Place Bit Reversing If SourceAddr lt gt DestAddr Then Standard Bit Reversing Bit reversing Type 1 NOTE abs SOURCE_ADDR DEST ADDR LDI SOURCE_ADDR RO CMP I DEST_ADDR RO BEQ IN_PLACE From Source to Destination must be gt FFT_SIZE this is not checked Applications Oriented Operations 6 59 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued LDI GFFT SIZE RO SUBI 2 R0 LDA QFFT SIZE IRO LSH 1 IRO IRO Half FFT size LDA SOURCE_ADDR ARO LDA DEST_ADDR AR1 LDF ARO R1 RPTS RO LDF ARO R1 N STF R1 AR1 IRO B STF R1 AR1 IRO B BR STARTB r In Place Bit Reversing Bit Reversing On Even Locations 1st Half Only IN_PLACE LDA FFT_SIZE IRO LSH 2 IRO0 IRO Quarter FFT size LDA 2 IR1 LDI GFFT SIZE RC LSH 2RC SUBI 3 RC LDA DEST_ADDR ARO LDA ARO AR1 LDA ARO AR2 NOP AR1 IRO B NOP AR2 IRO B LDF ARO IR1 RO LDF AR1 R1 RPTBD BITRV1 CMPI AR1 ARO Xchange Locations only if ARO lt ARl LDFGT RO R1 LDFGT AR1 IRO B R1 LDF ARO IR1 RO N STF RO ARO LDF AR1 R1 STF R1 AR2 IRO B CMPI AR1 ARO LDFGT RO R1 BITRV1 LDFGT AR1 IRO B RO STF RO ARO STF R1 AR2 A Perform Bit Reversing Odd Locations 2nd Half Only La 6 60 Example 6 17 Fast Fourier Transforms FF
157. In addition to initializing global variables, boot.asm initializes the DP register (pointing to the .bss section) and the SP register (pointing to the .stack section). You need to enable the cache (as shown in Example 1-3) and set up your interrupts inside your main routine before you enable interrupts. See the application report Setting Up TMS320 DSP Interrupts in C (SPRA036) for more information.

Example 1-3. Enabling the Cache

main()
{
    asm(" or 1800h,st");    /* enable cache */
    asm(" or 3800h,st");    /* enable cache and interrupts */
}

Chapter 2. Program Control

Several C4x instructions provide program control and facilitate high-speed processing. These instructions directly handle:

- Regular and zero-overhead subroutine calls
- Software stack
- Interrupts
- Delayed branches
- Single- and multiple-instruction loops without overhead

Topic
2.1 Subroutines
2.2 Stacks and Queues
2.3 Interrupt Examples
2.4 Context Switching in Interrupts and Subroutines
2.5 Repeat Modes
2.6 Computed GOTOs to Select Subroutines at Runtime

2.1 Subroutines

The C4x provides two ways to invoke subroutine calls: regular calls and zero-overhead calls. The regular and zero overhead subroutine calls use the software stack and
158. Interrupt Vector Table, IVTP, Interrupt Priorities
2.3.1 Correct Interrupt Programming
For interrupts to work properly, you need to execute the following sequence of steps, as shown in Example 1-1:
   1. Set the interrupt vector table on a 512-word boundary.
   2. Initialize the IVTP register.
   3. Create a software stack.
   4. Enable the specific interrupt.
   5. Enable global interrupts.
   6. Generate the interrupt signal.
2.3.2 Software Polling of Interrupts
The interrupt flag register can be polled, and action can be taken depending on whether an interrupt has occurred. This is true even when maskable interrupts are disabled, and it can be useful when an interrupt-driven interface is not implemented. Example 2-3 shows the case in which a subroutine is called when external interrupt 1 has not occurred.
Example 2-3. Use of Interrupts for Software Polling
        .title "INTERRUPT POLLING"
        TSTB   40H,IIF        ; Test if interrupt 1 has occurred
        CALLZ  SUBROUTINE     ; If not, call subroutine
When interrupt processing begins, the program counter is pushed onto the stack and the interrupt vector is loaded into the program counter. Interrupts are disabled when GIE is cleared to 0, and the program continues from the address loaded in the program counter. Because all maskable interrupts are disabled, interrupt processing can proceed without further interruption unless the interrupt service routine re-enables interrupts or
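The polling pattern of Example 2-3 (test one flag bit, call a routine when the bit is clear) can be sketched in Python as follows. The mask 40H comes from the example; the function name and the idea of passing the register value as an integer are assumptions made purely for illustration.

```python
FLAG1_MASK = 0x40  # mask used by TSTB 40H,IIF in Example 2-3


def poll_interrupt_1(iif_value, subroutine):
    """Call `subroutine` when the interrupt 1 flag is clear (TSTB followed by CALLZ).

    `iif_value` stands in for the current contents of the IIF register.
    Returns True if the subroutine was called.
    """
    if (iif_value & FLAG1_MASK) == 0:
        subroutine()
        return True
    return False
```

Note the parentheses around the mask test: in Python, `==` binds more tightly than `&`.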
159. K KKK KKK KK KKK KKK KKK KKK AX KA KX A KX A kk KKK KKK k KKK KK KKK rsion D MEY ER KARL SCHWARZ L TUHL F UER NACHRICHTENTECHNIK due RSITAE I ERLANGEN NUERNBERG CA ERSTRASSI E 7 D 8520 ERLANGEN FRG DANI CH TI HOUSTON C40 porting ROSEMARI D IEDRA TI HOUSTON made it C callable of the oper faster exec and implemented changes in the order ands for some mpyf instructions for ution when sine table is off chip ROSEMARIE P IEDRA TI Houston Added support for in plac bit reversing Ck Ck ck ck Ck ck ck kk ce A A KK KKK KKK KKK KKK KK KKK KKK KKK KK ck ck kk A kk ck ck ck ZA AXA A ck kk K kk kk ck kk ko kk kk KKK KKK KKK SYNOPSIS int cr2dit SOURCE ADDR FFT SIZE DST ADDR ar2 re 3 float SOURCE ADDR Points to where data is originated and operated on int FFT SIZE 64 128 256 512 1024 float DST ADDR Points to where FFT results should be COMP THIS P UTATION IS ROGRAM H P ST AGES THIS R PARA RS WO D lt MOH DONI IN PL moved ACE ASSES AR R NIM ALI S NO U FFT LENGTH IS 32 T ED Z TIPLIES AR H O I U A RALLEL WI TRIVIAL CHECK AS A FO TH MULTIPLI TH AN ADDF B
160. KA ck ck ck ck ck AX KA ck ck ck ck ck ck ck ck ck ck AKA AAA KA ck kk kk AKA kv ko ko KKK FP set AR3 global ffft rl Entry execution point global STARTB ENDB FFT SIZE usect fftdat 1 Reserve memory for arguments LOG SIZE usect oLitdat l SOURCE ADDR usect tetcdat l DEST ADDR usect Eftdat l SINE TABLE usect eLttdat l BIT REVERSE usect itticdat l SEPARATION usect ftftdat l Initialize C Function sect L TEXT ffft rl PUSH FP Preserve C environment 6 58 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued Se Se Ne Se Se Se Se Se Se Se Se Se Me Ne Ne Ne Ne LDI SP FP PUSH R4 PUSH R5 PUSH R6 PUSHF R6 PUSH R7 PUSHF R7 PUSH AR4 PUSH AR5 PUSH AR6 PUSH AR7 PUSH DP LDP FFT SIZE T REGPARM LDA FP 2 AR2 EDI FP 3 R2 LDI FP 4 R3 LDI FP 5 RC LDI FP 6 RS LDI FP 7 RE endif STI AR2 GFFT SIZI STI R2 Q0LOG SIZE STI R3 GSOURCE A STI RC DEST_ADD STI RS SINE_TAB STI RE BIT_REVERS Check Bit Reversing Mode on or off BIT_REVERSING 0 then OFF BIT_REVERSING lt gt 0 Then ON LDI BIT_REVERS BZ MOVE_DATA Check Bit Reversing Type E RO Initialize DP pointer arguments passed in stack no bit reversing If SourceAddr DestAddr Then
161. L RAM AFTER THE SECOND TRANSFER IS COMPLETED THE DMA IS RE INITIALIZED TO FIRST DMA TRANSFER SETUP data DMA5 word 001000FO0H DMA channel 5 map address DMA INIT word 0000004BH DMA initialization control word LINK word DMA1 lst DMA link list address DMA START word 00C0004BH DMA start control word DMA1 word 00C0004BH lst dummy DMA transfer link list word 002FF800H word 00000000H word 00000001H word 002FF800H word 00000000H word DMA2 DMA2 word 00C4000BH The desired DMA transfer link word 00400000H List word 00000001H word 00000040H word 002FF800H word 00000001H word DMA1 text START LDP DMA5 Load data page pointer LDA DMA5 ARO Point to DMA channel 5 registers LDI DMA_INIT RO Initialize DMA control register STI RO ARO LDI LINK RO Initialize DMA link pointer STI RO ARO 6 LDI DMA_START RO Start DMA channel 5 transfer STI RO ARO LDI 01H IIE Configure INTO as interrupt pins LDHI 0800H DIE Enable INT 0 read sync for DMA channel 5 end DMA C Programming Examples 7 4 DMA C Programming Examples Example 7 6 to Example 7 11 includes DMA programing examples from C These examples cover unified and Split mode DMA autoinitialization and DMA synchronization operations Descriptions of the examples presented are as follows i Example 7 6 Unified mode DMA transfers data between commports us ing read sync Example 7 7 Unified mo
162. Local Strobe
4.4 RAM Interface Using Both Local Strobes
4.5 Wait States and Ready Generation
   4.5.1 ORing of the Ready Signals (STRBx SWW = 10)
   4.5.2 ANDing of the Ready Signals (STRBx SWW = 11)
   4.5.3 External Ready Generation
   4.5.4 Ready Control Logic
   4.5.5 Example Circuit
   4.5.6 Page Switching Techniques
4.6 Parallel Processing Through Shared Memory
   4.6.1 Shared Global Memory Interface
   4.6.2 Shared Memory Interface Design Example
5 Programming Tips
Provides hints for writing more efficient C and assembly language code.
5.1 Hints for Optimizing C Code
5.2 Hints for Optimizing Assembly Language Code
6 Applications-Oriented Operations
Describes common algorithms and provides code for implementing them.
6.1 Companding
6.2 FIR, IIR, and Adaptive Filters
   6.2.1 FIR Filters
   6.2.3 Adaptive Filters (LMS Algorithm)
6.3 Lattice Filters
6.4 Matrix-Vector Multiplication
163. MALIZATION globl DIVI SIGN set R2 EMPF set R3 EMP set IRO COUNT set IR1 DIVI SIGNED DIVISION DIVI z DETERMINE SIGN OF RESULT GET ABSOLUTE VALUE OF OPERANDS XOR RO R1 SIGN Get the sign ABSI RO ABSI R1 CMPI RO R1 Divisor gt dividend BGTD ZERO If so return 0 NORMALIZE OPERANDS USE DIFFERENCE IN EXPONENTS AS SHIFT COUNT FOR DIVISOR AND AS REPEAT COUNT FOR SUBC FLOAT RO TEMPF Normalize dividend PUSHF TEMPF PUSH as float POP COUNT POP as int LSH 24 COUNT Get dividend exponent FLOAT R1 TEMPF Normalize divisor PUSHF EMPE PUSH as float POP EMP POP as int LSH 24 TEMP Get divisor exponent SUBI EMP COUNT Get difference in exponents LSH COUNT R1 Align divisor with dividend DO COUNT 1 SUBTRACT amp SHIFTS RPTS COUNT SUBC R1 RO Logical and Arithmetic Operations 3 11 Integer and Floating Point Division Example 3 7 Integer Division Continued MASK OFF THE LOWER COUNT 1 BITS OF RO SUBRI 31 COUNT Shift count ts 32 COUNT 1 LSH COUNT RO Shift left NEGI COUNT LSH COUNT RO Shift right to get result i CHECK SIGN AND NEGATE RESULT IF NECESSARY NEGI RO R1 Negate result ASH 31 SIGN Check sign LDINZ R1 RO If set use negativ
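The steps named in the comments of the division listing above (determine the sign, take absolute values, return 0 if the divisor exceeds the dividend, align the divisor with the dividend, perform COUNT + 1 conditional subtracts, mask off the quotient bits, and negate if needed) can be sketched in Python. This is an illustration only: `subc` models the conditional-subtract step as it is commonly described, `bit_length()` stands in for the FLOAT/exponent trick used to normalize the operands, and both function names are hypothetical.

```python
def subc(src, dst):
    """One conditional-subtract step: subtract if possible, then shift left.

    A quotient bit of 1 is shifted in when the subtraction succeeds.
    """
    diff = dst - src
    return (diff << 1) | 1 if diff >= 0 else dst << 1


def divi(dividend, divisor):
    """Signed integer division, truncating toward zero."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    negative = (dividend < 0) != (divisor < 0)   # sign of the result
    r0, r1 = abs(dividend), abs(divisor)
    if r1 > r0:                                   # divisor > dividend
        return 0
    count = r0.bit_length() - r1.bit_length()     # difference in exponents
    r1 <<= count                                  # align divisor with dividend
    for _ in range(count + 1):                    # COUNT + 1 subtract-and-shift steps
        r0 = subc(r1, r0)
    quotient = r0 & ((1 << (count + 1)) - 1)      # keep the lower COUNT + 1 bits
    return -quotient if negative else quotient
```

For 100 / 7, for example, the divisor is shifted left by 4 bits and five conditional subtracts leave the quotient 14 in the low five bits, with the remainder 2 in the bits above them.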
164. TMDS development support tools have been fully characterized, and the quality and reliability of the device has been fully demonstrated. Texas Instruments' standard warranty applies to these products.
Note: It is expected that prototype devices (TMX or TMP) have a greater failure rate than standard production devices. Texas Instruments recommends that these devices not be used in any production system because their expected end-use failure rate is still undefined. Only qualified production devices should be used.
Part Order Information
TI device nomenclature also includes the device family name and a suffix. This suffix indicates the package type (for example, N, FN, or GB) and temperature range (for example, L). Figure 10-3 provides a legend for reading the complete device name for any TMS320 family member.
Figure 10-3. Device Nomenclature
Example: TMS 320 C 40 GF L
   PREFIX: SMJ = ceramic QML; TMX = experimental device; TMP = prototype device; TMS = qualified device; SMQ = plastic QML
   DEVICE FAMILY: 320 = TMS320 family
   TECHNOLOGY: C = CMOS; E = CMOS EPROM
   TEMPERATURE RANGE (ambient): A = -40 to 85 C; H = 0 to 50 C; L = 0 to 70 C; M = -55 to 125 C; S = -55 to 100 C
   PACKAGE TYPE: FD = ceramic leadless CC; FN = plastic leaded CC; FZ = ceramic CER-QUAD; GB = 181-pin ceramic PGA; GE = 181 pin ce
10.3.2 Device and Development Support Tools
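Reading a device name against the Figure 10-3 legend can be sketched as a small Python parser. The tables below contain only the codes that survive in the legend above (the package list is cut off, so a code such as GF is reported as not listed); the negative temperature limits assume that minus signs were lost in extraction, and the function name and returned keys are hypothetical.

```python
import re

PREFIXES = {
    "SMJ": "ceramic QML",
    "TMX": "experimental device",
    "TMP": "prototype device",
    "TMS": "qualified device",
    "SMQ": "plastic QML",
}
TECHNOLOGIES = {"C": "CMOS", "E": "CMOS EPROM"}
# Ambient temperature ranges in degrees C (minus signs are an assumption).
TEMPERATURE_RANGES_C = {
    "A": (-40, 85), "H": (0, 50), "L": (0, 70), "M": (-55, 125), "S": (-55, 100),
}
# Only the package codes legible in the legend excerpt.
PACKAGES = {
    "FD": "ceramic leadless CC",
    "FN": "plastic leaded CC",
    "FZ": "ceramic CER-QUAD",
    "GB": "181-pin ceramic PGA",
}

_PATTERN = re.compile(r"(SMJ|TMX|TMP|TMS|SMQ)(320)([CE])(\d+)([A-Z]{1,2})([AHLMS])")


def parse_device_name(name):
    """Split a name such as 'TMS 320 C 40 GF L' into its legend fields."""
    match = _PATTERN.fullmatch(name.replace(" ", "").upper())
    if match is None:
        raise ValueError("unrecognized device name: %r" % name)
    prefix, family, technology, device, package, temperature = match.groups()
    return {
        "prefix": PREFIXES[prefix],
        "family": "TMS" + family,
        "technology": TECHNOLOGIES[technology],
        "device": device,
        "package_code": package,
        "package": PACKAGES.get(package, "not listed in this excerpt"),
        "temperature_range_c": TEMPERATURE_RANGES_C[temperature],
    }
```

For the legend's own example, `parse_device_name("TMS 320 C 40 GF L")` reports a qualified CMOS device 40 with the 0 to 70 C range.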
165. Mg HI HI MET butterfly addf subf addf subf butterfly addf ldf ldf rptbd subf stf stf stf to M butterfly loop bf2end ldf stf ldf stf mpy f stf addf mpyf addf subf stf addf mpyf subf mpyf stf addf stf mpyf stf subf mpyf addf subf stf addf mpyf subf mpyf stf addf stf w M 4 w M 4 ar0O arl rb arl ar0 r4 arl ar0 r6 arl ar0 17 arl ar0 r3 ar7 rl arl r0 bf2end arl ir0 ar0 r2 La ar2 17 ar3 r6 ar3 ar r7 r4 ar2 ar7tt r6 r2 ar3 arl r6 r5 r3 ar2 El r0 r2 arl r7 r0 r2 ar 0 r3 r2 ar0 1r0 r4 r3 ar3 1r0 r rb r3 ari r6 r r3 ar0 r2 earl e761 r4 ar2 1r0 ar0 r3 r5 r2 ar3 arl r6 r5 r5 ar2 r1 r0 r2 arl ET r0 r2 ar r3 r2 ar0 r4 r3 ar3Tt r0 r5 r3 arl r6 r0 r3 ar r2 arltt ir0 r7 r1 r4 ar2 arD r3 r3 r2 ar3 Se Se 5 Ne So So Ne Ne Ne Ne Ne Ne AR r5 AR BI AI r4 AI BR BI r6 AI BR BR r7 AR BI AR r3 AR BI rl 0 for inner loop r0 BR for inner loop Setup for loop bf2end BR r2 AR BI AR r5 ER a BI r6 r7 COS AI r4 r6 SIN BR r2 r5 BI SIN AR r3 r2 TI r0 rl r0 BR COS r3 AI TI 14 AI TI BI r3 r3 TR r0 r5 rO BR SIN r2 AR
166. 0 4 DES_IDX R0 Initialize DMA aux destination index register R0 AR0 5 AUC_CNT R0 Initialize DMA auxiliary count register R0 AR0 7 CONTROL R0 Start DMA channel 1 transfer R0 AR0 01100H IIF Configure INT2 and INT3 as interrupt pins 0A0H DIE Enable INT2 read and INT3 write sync
An advantage of the C4x DMA is the autoinitialization feature. This allows you to set up the DMA transfer in advance and makes the DMA operation completely independent from the CPU. When the DMA operates in autoinitialization mode, the link pointer and auxiliary link pointer initialize the registers that control the DMA operation. The link pointer can be incremented (AUTOINIT STATIC = 0) during autoinitialization or held constant (AUTOINIT STATIC = 1) during autoinitialization. This option allows autoinitialization values to be stored in sequential memory locations or in stream-oriented devices such as the on-chip communication ports or external FIFOs. When DMA sync mode is enabled, the DMA autoinitialization operation can be configured to synchronize with the same signal.
Example 7-4 sets up DMA channel 0 to wait for the communication port to input the initialization value. After DMA autoinitialization is complete, the DMA channel starts transferring data from the communication port input register to internal RAM.
Example 7-4. DMA Autoinitialization With Communication Port ICRDY
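The difference between an incrementing and a static link pointer during autoinitialization can be modeled with a short Python sketch. It is a behavioral illustration only, not a description of the hardware timing: the register order follows the link lists in the examples (control, source, source index, count, destination, destination index, link pointer), while the function names and the use of a callback for memory reads are assumptions.

```python
from collections import deque

REGISTER_NAMES = (
    "control", "source", "source_index", "count",
    "destination", "destination_index", "link_pointer",
)


def autoinitialize(read_word, link_pointer, autoinit_static):
    """Load one set of DMA register values starting at `link_pointer`.

    With autoinit_static False the address advances after every word
    (values stored in sequential memory). With autoinit_static True the
    address is held constant (values streamed from a FIFO-like device).
    """
    registers = {}
    address = link_pointer
    for name in REGISTER_NAMES:
        registers[name] = read_word(address)
        if not autoinit_static:
            address += 1
    return registers


def fifo_reader(words, seen_addresses):
    """Model a stream device: every read returns the next queued word."""
    fifo = deque(words)

    def read_word(address):
        seen_addresses.append(address)
        return fifo.popleft()

    return read_word
```

With a memory table the seven values are fetched from seven consecutive addresses; with the FIFO reader the same seven values arrive through a single address, which is the situation the AUTOINIT STATIC = 1 setting is meant for.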
167. R1 b2 0 d 0 n 2 gt R1 RPTBD LOOP Set loop for 1 lt i lt nm MPYF3 ARO 1 AR1 RO al 0 D 0 n 1 gt RO ADDF RO R2 R2 First sum term of d 0 n 6 12 FIR IIR and Adaptive Filters Example 6 7 IIR Filter N gt 1 Biquads Continued FINAL SUMMATION ADDF3 R1 R2 R0 BRD R11 ADDF RO R2 NOP AR1 IR1 NOP AR1 1 end end MPYF3 AR0 1 FARI 1 RO ADDF3 RO R2 R2 MPYF3 ARO 1 R2 R2 STF R2 AR1 1 LOOP STARTS HERE MPYF3 ARO 1 AR1 IRO RO ADDF3 RO R2 R2 PYF3 AR0 1 AR1 1 R1 ADDF3 R1 R2 R2 PYF3 ARO 1 AR1 RO ADDF3 R0 R2 R2 PYF3 AR0 1 AR1 1 R0 b1 i ADDF3 RO R2 R2 LOOP PYF3 ARO 1 R2 R2 STF R2 AR1 1 b1 0 d 0 n 1 gt RO Second sum term of d 0 n b0 0 d 0 n gt R2 Store d 0 n point to d 0 n 2 a2 i d i n 2 gt RO First sum term of y i 1 n Pipeline hit on previous instruction b2 i D i n 2 gt R1 Second sum term of y i 1 n al i d i n 1 RO First sum term of d i n d i n 1 RO Second sum term of d i n bO i d i n gt R2 Store d i n point to d i n 2 Second sum term of y n 1 n Delayed return First sum term of y n 1 n Return to first biquad Point to d 0 n 1 6 2 3 Adaptive Filters LMS Algorithm In some
168. R3 X I3 SIN X I4 COS RA X I2 R3 R4 X 12 R3 SX I3 4 F FRA X 11 B X 14 lt R4 X I1 R2 PIX IZ2 lt D l iX Il lt Applications Oriented Operations 6 67 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued ADDF3 RO R1 R2 MPYF3 AR7 AR4 IR1 RO SUBF3 R4 RO R3 SUBF3 AR1 R3 R4 RPTBD LOOP4_B ADDF3 AR1 R3 R4 STF R4 AR2 IR1 SUBF3 R2 ARO R4 STF R4 AR3 IR1 ADDF3 ARO R2 R4 STF R4 AR1 IR1 MPYF3 AR2 IRO R5 R4 STF R4 AR0 IR1 MPYF3 AR3 IRO R5 R1 MPYF3 AR7 AR3 RO ADDF3 RO R1 R2 MPYF3 AR6 AR4 RO SUBF3 R4 RO R3 SUBF3 AR 1 IRO R3 R4 ADDF3 AR1 R3 R4 STF R4 AR2 SUBF3 R2 ARO IRO R4 STF R4 AR3 ADDF3 ARO R2 R4 STF R4 AR1 MPYF3 AR3 R6 R1 STF R4 ARO ADDF3 RO R1 R2 MPYF3 AR5 AR4 IRO RO SUBF3 RO R1 R3 SUBF3 AR1 R3 R4 ADDF3 AR1 R3 R4 STF R4 AR2 SUBF3 R2 ARO R4 STF R4 AR3 ADDF3 ARO R2 R4 STF R4 AR1 MPYF3 AR2 R7 R4 STF R4 ARO MPYF3 AR3 R7 R1 MPYF3 AR5 AR3 RO ADDF3 RO R1 R2 MPYF3 AR7 AR4 IR1 RO SUBF3 R4 R0 R3 SUBF3 AR1 R3 R4 ADDF3 AR1 R3 R4 STF R4 AR2 IR1 SUBF3 R2 ARO R4 STF R4 AR3 IR1 LOOP4_B ADDF3 ARO R2 R4 STF R4 AR
169. REGISTERS MODIFIED: R0, R1, R2, AR0, AR1
REGISTER CONTAINING RESULT: R0
BENCHMARKS: CYCLES 7 (not including subroutine overhead), WORDS 7 (not including subroutine overhead)
        .global IIR1
IIR1    MPYF3 AR0 AR1 R0          ; a2 * d(n-2) -> R0
        MPYF3 AR0 1 AR1 1 R1      ; b2 * d(n-2) -> R1
        MPYF3 AR0 1 AR1 R0        ; a1 * d(n-1) -> R0
||      ADDF3 R0 R2 R2            ; a2 * d(n-2) + x(n) -> R2
        MPYF3 AR0 1 AR1 1 R0      ; b1 * d(n-1) -> R0
||      ADDF3 R0 R2 R2            ; a1 * d(n-1) + a2 * d(n-2) + x(n) -> R2
        BUD R11                   ; Delayed return
        MPYF3 AR0 1 R2 R2         ; b0 * d(n) -> R2
||      STF R2 AR1 1              ; Store d(n) and point to d(n-1)
        ADDF R0 R2                ; b1 * d(n-1) + b0 * d(n) -> R2
        ADDF R1 R2 R0             ; b2 * d(n-2) + b1 * d(n-1) + b0 * d(n) -> R0
        .end
Generally, the IIR filter contains N > 1 biquads. The equations for its implementation are given by the following pseudo-C language code:
   y[-1,n] = x[n]
   for (i = 0; i < N; i++) {
      d[i,n] = a2[i] d[i,n-2] + a1[i] d[i,n-1] + y[i-1,n]
      y[i,n] = b2[i] d[i,n-2] + b1[i] d[i,n-1] + b0[i] d[i,n]
   }
   y[n] = y[N-1,n]
Figure 6-3 shows the memory organization, and Example 6-7 shows the corresponding C4x assembly language code.
Figure 6-3. Data Memory Organization for N Biquads (columns: filter coefficients; initial delay node values; final delay node values; low address at the top; the delay node values are held in circular queues)
The block size register BK should be initialized to
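The biquad equations above translate directly into a few lines of Python. This sketch is meant only to make the data flow explicit (each section's output feeds the next section's input, and each section keeps two delay-node values); it uses ordinary lists rather than the circular queues of Figure 6-3, and the function name and coefficient tuple layout are hypothetical.

```python
def iir_cascade(samples, coefficients):
    """Filter `samples` through a cascade of biquads.

    `coefficients` is a list with one (a1, a2, b0, b1, b2) tuple per biquad,
    using the sign convention of the equations above:
        d(n) = a2*d(n-2) + a1*d(n-1) + input
        y(n) = b2*d(n-2) + b1*d(n-1) + b0*d(n)
    """
    delays = [[0.0, 0.0] for _ in coefficients]   # [d(n-1), d(n-2)] per biquad
    output = []
    for sample in samples:
        y = sample                                # input to the first biquad
        for (a1, a2, b0, b1, b2), d in zip(coefficients, delays):
            d_n = a2 * d[1] + a1 * d[0] + y
            y = b2 * d[1] + b1 * d[0] + b0 * d_n  # becomes the next biquad's input
            d[1], d[0] = d[0], d_n                # age the delay nodes
        output.append(y)
    return output
```

As a quick check, a single section with a1 = 0.5, b0 = 1 and all other coefficients zero turns a unit impulse into 1, 0.5, 0.25, and so on.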
170. S 16 not including subroutine overhead E y f Eliminate rightmost bit Normalize seg 3 0WXYZx x Save sign of number If R0 lt 0x20 do linear coding If RO gt 0xFFF saturate the result Adjust segment number by 2 3 seg WXYZx x Treat number as integer Right justify Delayed return Set sign bit RO compressed number Invert even bits for transmission Applications Oriented Operations 6 5 Companding Example 6 4 A Law Expansion ITLE A LAW EXPANSIO SUBROUTINE AXPND TYPICAL CALLING SEQUENCE LAJU AXPND LDI v RO m OP 235 can be other non pipeline break NOP instructions ARGUMENT ASSIGNMENTS ARGUMENT FUNCTION RO v NUMBER TO BE CONVERTED REGISTERS USED AS INPUT RO REGISTERS MODIFIED RO R1 REGISTER CONTAINING RESULT BENCHMARKS CYCLES 15 13 worst best not including subroutine overhead WORDS 15 not including subroutine overhead global AXPND AXPND XOR OD5H RO R2 Invert even bits ASH3 4 R2 RO Store for bit sign AND 7 RO Isolate segment cod BZD SKIP1 AND3 OFH R2 R1 Isolate quantization bin LSH 1 R1 ADDI 1 R1 Create Oxxxxl ADDI 32 R1 Or 1xxxxl SUBI 1 R0 SKIP1 LSH3 RO R1 RO Shift and put result in RO TSTB
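The expansion steps listed in the AXPND comments above (invert the even bits, isolate the segment code, isolate the quantization bin, form 0xxxx1 or 1xxxx1, then shift by the segment number) can be sketched in Python. The tail of the listing is cut off, so the sign handling here is an assumption: because the XOR mask 0D5H also flips bit 7, a set bit 7 after the XOR is treated as a negative sample. The function name is hypothetical.

```python
def alaw_expand(code):
    """Expand an 8-bit A-law code to a signed linear value (magnitude <= 4032)."""
    value = (code & 0xFF) ^ 0xD5               # invert even bits (and the sign bit)
    segment = (value >> 4) & 0x7               # isolate segment code
    magnitude = ((value & 0x0F) << 1) + 1      # quantization bin -> 0xxxx1
    if segment:
        magnitude = (magnitude + 32) << (segment - 1)   # 1xxxx1, shifted by segment - 1
    # Assumption: bit 7 set after the XOR marks a negative sample.
    return -magnitude if value & 0x80 else magnitude
```

Segment 0 is the linear region, where the bin is used without the leading 1 and without any shift.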
171.
COUNT    .word  0               ; Transfer counter is set to largest value
DESTIN   .word  002FF800H
DES_IDX  .word  1
         .text
START    LDP    DMA4            ; Load data page pointer
         LDA    @DMA4,AR0       ; Point to DMA channel 4 registers
         LDI    @SOURCE,R0      ; Initialize DMA source register
         STI    R0,*+AR0(1)
         LDI    @SRC_IDX,R0     ; Initialize DMA source index register
         STI    R0,*+AR0(2)
         LDI    @COUNT,R0       ; Initialize DMA count register
         STI    R0,*+AR0(3)
         LDI    @DESTIN,R0      ; Initialize DMA destination register
         STI    R0,*+AR0(4)
         LDI    @DES_IDX,R0     ; Initialize DMA destination index register
         STI    R0,*+AR0(5)
         LDI    @CONTROL,R0     ; Start DMA channel 4 transfer
         STI    R0,*AR0
         LDHI   010H,DIE        ; Enable ICRDY 4 read sync
         .end
If external interrupt signals are used for DMA transfer synchronization, then pins IIOF0-3 must be configured as interrupt pins.
The C4x DMA split mode is another way, besides the memory-map address, to transfer data from or to the communication port. When the split-mode bit of the DMA control register is set, the DMA is separated into primary and auxiliary channels. The primary channel transfers data from memory to the communication port output register, and the auxiliary channel transfers data from the communication port to memory. The communication port number is selected in bits 15-17 of the DMA control register.
Example 7-3 shows how to set up DMA channel 1 in split mode. The DMA primary channel transfers data from internal RAM to communication port 3
172. T according to the IEEE 1149.1 bus slave device timing rules. Signals TMS and TDI are series-terminated to reduce signal reflections. A 10.368-MHz test clock source is provided. You may also provide your own test clock for greater flexibility.
Figure 11-2. JTAG Emulator Cable Pod Interface (schematic not reproduced; header signals shown: TMS pin 1, TRST pin 2, TDI pin 3, GND pins 4, 6, 8, 10, and 12, PD (VCC) pin 5, TDO pin 7, TCK_RET pin 9, TCK pin 11, EMU0 pin 13, EMU1 pin 14; pod components shown: 74F175, 74LVT240, 74AS1034, 74AS1004, TL7705A, and the 10.368-MHz clock source)
The emulator pod uses TCK_RET as its clock source for internal synchronization. TCK is provided as an optional target system test clock source.
11.5 JTAG Emulator Cable Pod Signal Timing
Figure 11-3 shows the signal timings for the emulator cable pod. Table 11-2 defines the timing parameters. These timing parameters are calculated from values specified in the standard data sheets for the emulator and cable pod and are for reference only. Texas Instruments does not test or guarantee these timings.
Figure 11-3. JTAG Emulator Cable Pod Timings (waveform not reproduced; signals shown: TCK_RET with a 1.5-V reference level, TMS/TDI)
173. Texas Instruments. TMS320C4x General-Purpose Applications User's Guide. 1999. Digital Signal Processing Solutions. Printed in U.S.A., May 1999. SPRU159A.
IMPORTANT NOTICE
Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.
TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.
CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE PROPERTY OR ENVIRONMENTAL DAMAGE ("CRITICAL APPLICATIONS"). TI
174. TMDS3260140 TMDS3260640 TMDS3080001 TMDX3261040 Platform PC DOS OS 2 VAX VMS SPARC Sun OS PC DOS PC DOS Windows SPARC Sun OS PC DOS SPARC Sun OS PC DOS PC XDS510 Sun XDS510WS PC DOS SPARC PC DOS SPARC PC DOS Windows SPARC Sun OS PC DOS OS 2 Windows Sun SPARC SCSI XDS510 XDS510WS XDS510 XDS510WS Includes XDS510WS box SCSI cable power supply and JTAG cable TMDS3240640 C source debugger software not included Includes XDS510 board and JTAG cable TMDS3240140 C source debugger software not included 10 12 Chapter 11 XDS510 Emulator Design Considerations This chapter explains the design reguirements of the XDS510 emulator with respect to JTAG designs and discusses the XDS510 cable manufacturing part number 2617698 0001 This cable is identified by a label on the cable pod marked JTAG 3 5V and supports both standard 3 volt and 5 volt target system power inputs The term JTAG as used in this book refers to TI scan based emulation which is based on the IEEE 1149 1 standard Topic Page 11 1 Designing Your Target System S 4 44 11 2 Emulator Connector 14 Pin Header We CNS MA nocaodaaooodndnodaconanndooaooononnanodnaioac 11 3 11 3 IEEE 1149 1 Standard co rer rre 11 3 11 4 JTAG Emulator Cable Pod Logic sees 11 4 11 5 JTAG Emulator Cable Pod Signal Timing 11 5 11 6 Emulation Timing Cal
175. t_pd(TCK-DTMS) = (31 ns + 2 ns + 10 ns + 1.35 ns) / 0.4 = 110.9 ns (9.0 MHz)
t_pd(TCK-DTDI) = [t_d(TTDO) + t_d(DTCKLmax) + t_su(DTDLmin) + t(bufskew)] / t_TCKfactor = 120 ns (8.3 MHz), with operand values printed as 15 ns, 15 ns, 7 ns, 10 ns, and a divisor of 0.4
In this case, the TCK-to-DTDI path is the limiting factor.
11.9.3 Using Emulation Pins
The EMU0/1 pins of TI devices are bidirectional, three-state output pins. When in an inactive state, these pins are at high impedance. When the pins are active, they function in one of the two following output modes:
   - Signal Event. The EMU0/1 pins can be configured via software to signal internal events. In this mode, driving one of these pins low can cause devices to signal such events. To enable this operation, the EMU0/1 pins function as open-collector sources. External devices such as logic analyzers can also be connected to the EMU0/1 signals in this manner. If such an external source is used, it must also be connected via an open-collector source.
   - External Count. The EMU0/1 pins can be configured via software as totem-pole outputs for driving an external counter. These devices can be damaged if the output of more than one device is configured for totem-pole operation. The emulation software detects and prevents this condition. However, the emulation software has no control over external sources
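The arithmetic behind the first timing result above can be checked with a short Python sketch: the path delays are summed, divided by the TCK factor to obtain the minimum clock period, and inverted to obtain the maximum test clock frequency. The delay values (31 ns, 2 ns, 10 ns, 1.35 ns) and the factor 0.4 are the ones that survive in the calculation; the function name is hypothetical.

```python
def max_tck_frequency(delays_ns, tck_factor=0.4):
    """Return (minimum TCK period in ns, maximum TCK frequency in MHz).

    The summed path delay must fit within `tck_factor` of one clock period.
    """
    period_ns = sum(delays_ns) / tck_factor
    frequency_mhz = 1000.0 / period_ns
    return period_ns, frequency_mhz


period, frequency = max_tck_frequency([31.0, 2.0, 10.0, 1.35])
print("%.1f ns, %.1f MHz" % (period, frequency))
```

Whichever path yields the longest period sets the usable test clock rate, which is why the slower TCK-to-DTDI path is called the limiting factor.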
176. Ts Real Forward Radix 2 FFT Continued BITRV2 Perform Bit R BITRV3 LDI FFT_SIZE RC LSH 1 RC LDA DEST_ADDR ARO ADDI RC ARO ADDI 1 AR0 LDA ARO AR1 LDA ARO AR2 LSH L RC SUBI IRE OP AR1 IRO B OP AR2 IRO B LDF ARO IR1 RO LDF AR1 R1 RPTBD BITRV2 CMPI AR1 ARO LDFGT RO R1 LDFGT AR1 IRO B R1 LDF ARO IR1 RO STF RO ARO LDF AR1 R1 STF R1 AR2 IRO B CMP I AR1 ARO LDFGT RO R1 LDFGT AR1 IRO B RO STF RO ARO STF R1 AR2 eversing Odd Locations LDI GFFT SIZE RC LSH 1 RC LDA RC IRO LDA DEST_ADDR ARO LDA ARO AR1 ADDI 1 ARO ADDI TRO AR1 LSH 1 RC LDA RC IRO SUBI 24 RG RPTBD BITRV3 OP LDF ARO RO LDF AR1 R1 LDF ARO IR1 RO STF RO AR1 IRO B LDF AR1 R1 STF R1 ARO IR1 STF RO AR1 STF R1 ARO BR STARTB Xchange Locations only if ARO ARI 1st Half Only Note could be instruction Applications Oriented Operations 6 61 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued io MeN Se So e OVE DATA Perform AR1 AR2 AR3 AR4 AR1 TARTB If SourceAddr If SourceAddr lt gt LDI PI EQ DI UBI DA DA DF PTS DF STF STF PDE PEP newa Check Data Source Locations DestAddr Then do nothing SOURCE_ GDEST STARTB GFFT 2 R0 SOURCE ADDR ARO DEST ADDR AR1 ARO R1 RO ARO R1 R1 AR1
177. UBF3 AR2 AR1 R1 DDF3 AR2 AR1 R2 EGF AR3 R3 DF AR2 IRO RO lt X 11 X I1 lt X 14 R7 X 11 X 12 R2 R7 R4 R3 R7 R4 X I3 lt 4 gt X I4 lt X I2 lt ZD S 5 1 F X I3 X I3 RO X I3 Applications Oriented Operations 6 63 Fast Fourier Transforms FFTs Example 6 17 Real Forward Radix 2 FFT Continued LOOP3_A Part B Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne 5e S S S A S N S j S TF EGF RO gt RI gt R2 gt LDI LSH LDA UBI DA DA wn iw D DA DA DDI DDI DDI DDI iw D pp pbpesppPEEEELE DF MPYF 3 MPYF3 ADDF3 MPYF3 RO AR1 R1 RO AR1 R2 AR3 R3 R2 AR1 R1 AR2 R3 AR3 1 T 13 LOB Oo 00 UIO O1 4S C0 IND ES O XE R2 AR1 IRO RI AR2 IRO R3 AR3 IRO lt X I1 lt IL lt X 12 lt X I2 NOTE QFFT SIZE RC 3 RC RC IR1 3 RC 8 IRO DEST_ADDR ARO ARO AR1 ARO AR2 ARO AR3 1 ARO 3 AR1 5 AR2 7 AR3 QSINE TABLE AR7 AR7 I R1 R7 AR7 AR2 RO AR3 R7 RO R1 R2 AR7 AR2 IRO RO R1 So NG Ne Ne Ne Ne Se Se Ne R1 XII X I3 XII X I3 X I4 4 lt lt P X I3 COS X I4 COS
178. Examples
   Processor Initialization Example
   Linker Command File for Linking the Previous Example
   Enabling the Cache
   Regular Subroutine Call (Dot Product)
   Zero-Overhead Subroutine Call (Dot Product)
   Use of Interrupts for Software Polling
   Use of One Interrupt Signal for Two Different Services
   Interrupt Service Routine
   Context Save and Context Restore
   Use of Block Repeat to Find a Maximum or a Minimum
   Loop Using Delayed Block Repeat
   Loop Using Single Repeat
   Computed GOTO
   Use of TSTB for Software-Controlled Interrupt
   Copy a Bit from One Location to Another
   Block Move Under Program Control
   Use of Packing Data From Half-Word FIFO to 32-Bit Data Memory
   Use of Unpacking 32-Bit Data Into Four-Byte-Wide Data Array
   CPU Bit-Reversed Addressing
   Integer Division
   Inverse of a Floating-Point Number With 32-Bit Mantissa Accuracy
   Reciprocal of the Square Root of a Positive Floating Point
180. voltage levels when running multiple C4x devices or any other CMOS device from different power supplies. This can create a CMOS latch-up that can permanently damage your device. Adding serial resistors to C4x communication ports connecting devices in different boards marginally helps to protect communication port drivers. It is recommended that all C4x devices in the system remain in reset until power supplies are stable.
   - Sometimes it is beneficial to keep the line impedance as high as possible. This helps when interfacing to external cables. Typical ribbon cable impedance is about 100 ohms. Because it is sometimes difficult to route high-impedance lines, especially long ones, in a circuit board, use an external ribbon cable to jump over the length of a board. In this case, only two headers should be installed in the circuit board.
   - Use an alternating signal and ground scheme. This helps control differential signal coupling and impedance variation. For quality signals, use a 26-wire ribbon: (4 control + 8 data + 1 shield) x 2 = 26. The shield is needed for the signal that is otherwise on the edge.
   - Do not route signals on top of each other. When it is necessary to cross traces on adjacent layers, cross them at right angles to reduce coupling.
Note: Because the C4x communication ports are very high-speed data transmission circuits, signal quality is very important. A poor-quality signal can cause the missing or slipping of a b
181. all Dot Product Subroutines TITLE ZERO OVERHEAD SUBROUTINE CALL DOT PRODUCT e MAIN ROUTINE THAT CALLS THE SUBROUTINE DOT TO COMPUTE THE DOT PRODUCT OF TWO VECTORS LAJ DOT LDI C b1k0 ARO ARO points to vector a LDI b1k1 AR1 AR1 points to vector b LDI N RC RC contains the number of elements e SUBROUTINE DOT EQUATION d a 0 b 0 a 1 b 1 a N 1 b N 1 THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER RO N MUST BE GREATER THAN OR EQUAL TO 2 ARGUMENT ASSIGNMENTS ARGUMEN FUNCTION ecc ct CE ARO ADDRESS OF a 0 AR1 ADDRESS OF b 0 RC LENGTH OF VECTORS N e REGISTERS USED AS INPUT ARO AR1 RC REGISTER MODIFIED RO REGISTER CONTAINING RESULT RO global DOT DOT PUSH SEL Save status register PUSH R2 Use the stack to save R2 s PUSHF R2 bottom 32 and top 32 bits PUSH ARO Save ARO PUSH AR1 Save AR1 PUSH RC Save RC PUSH RS PUSH RE Program Control Subroutines Example 2 2 Zero Overhead Subroutine Call Dot Product Continued Initialize RO MPYF3 ARO AR1 RO a 0 b 0 gt RO SUBF R2 R2 R2 Initialize R2 SUBI 2 RC Set RC N 2 DOT PRODUCT 1 lt i lt N RPTS RC Setup the repeat single MPYF3
182. ample 2 10 Computed GOTO ROUTINE CONTROLS THE ORDER OF TASK EXECUTION LLED TITLE COMPUTED GOTO TASK CONTROLLER THIS MAIN 6 TASKS IN THE PRESENT EXAMPLE THE NAMES OF SUBROUTINES TO BE CA IN ORDER TASKO TASK1 TASKS m OCCURS T ud PROCESSOR IDLE INSTRUCTION TASK FOR THE CURRENT CYCLE CALLS AND BRANCHES BACK TO THE IDLE INS EXECUTION RO HOLDS THE OFFSET FRO TASK TO BE EXECUTED BIT 15 SET COND BIT ST SHOULD BE SET TO 1 LDI 5 IRO LDI GADDR ARI WAIT IDLE ADDI AR1 IRO R1 SUBI 1 IRO LDILT 5 IRO CALLU R1 BR WAIT TSKSEO word TASK5 word TASK4 word TASK3 word TASK2 word TASK1 word TASKO ADDR word TSKSEO Initialize AR1 holds the base address IRO of the table Wait for the next interrupt TASKO THROUGH TASK5 ARE THEY ARE EXECUTED WHEN AN INTERRUPT HE INTERRUPT SERVICE ROUTINE IS EXECUTED CONTINUES WITH THE INSTRUCTION FOLLOWING THE THIS ROUTINE SELECTS THE APPROPRIATE THE TASK AS A SUBROUTINE PRUCTION TO WAIT FOR THE NEXT SAMPLE INTERRUPT WHEN THE SCHEDULED TASK HAS COMPLETED THE BASE ADDRESS OF THE OF STATUS REGISTER AND THE Add base address to the table entry number Decrement IRO If IRO lt O reinitialize it to 5 Execute appropriate task Address of Address of Address of Address of Address of Address of TASK5 TASK4 TASK3 TASK2 TASK1 TASKO
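The task-controller idea in the computed GOTO listing above (a table of task addresses, an index that starts at 5, is decremented after every interrupt, and is reinitialized to 5 when it goes negative) can be sketched in Python with a list of callables. The table is stored in reverse order, as TSKSEQ is in the listing, so that the tasks run in the order TASK0 through TASK5. The function names are hypothetical, and a plain function call stands in for waking from IDLE on an interrupt.

```python
def make_task_controller(tasks):
    """Return a function that runs the next task each time it is called.

    `tasks` is given in execution order (TASK0 first). It is stored
    reversed, like the TSKSEQ table, and indexed from the top down.
    """
    table = list(reversed(tasks))          # TASK5 ... TASK0
    state = {"index": len(table) - 1}      # plays the role of IR0

    def on_interrupt():
        task = table[state["index"]]       # base address + entry number
        state["index"] -= 1                # decrement the index
        if state["index"] < 0:             # if negative, reinitialize it
            state["index"] = len(table) - 1
        return task()                      # execute the appropriate task

    return on_interrupt
```

Calling the returned function eight times with six tasks runs tasks 0 through 5 and then starts again with tasks 0 and 1.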
183. and wait twob. This 16R4 PLD-based design can be used to implement different numbers of wait states for multiple devices. More devices can be selected with C4x address lines, and a higher number of wait states can be produced with PLD logic. Furthermore, this approach can be used in conjunction with the C4x's internal wait-state generator.
Example 4-1. PLD Equations for Ready Generation
module ready_generation
title ready generation logic for 0, 1, and 2 wait-state devices interfaced to TMS320C4x

    C40u5 device P16R4

" inputs
    h3          Pin 1

" The following are TMS320C40 address bits used to select the
" different speed devices. More can be used if necessary. In this
" example, a zero-wait-state, a one-wait-state, and a two-wait-state
" device are decoded with these three address bits.

    ahi1        Pin 2    " when high, selects zero-wait-state device
    ahi2        Pin 3    " when high, selects one-wait-state device
    ahi3        Pin 4    " when high, selects two-wait-state device
    strb0_      Pin 5    " indicates valid TMS320C40 bus cycle
    reset_      Pin 6    " reset signal from TMS320C40
    strb_syn_   Pin 7    " reset strb0_ synchronized with H1 rising edge

" output
    rdy0_       Pin 12   " ready signal to TMS320C40

    one_wait    Pin 14   " internal flip-flop signal for 1 wait state
184. transition will probably not be enough to corrupt the next sample. By adding a hysteresis loop made from resistors R1 and R2, the noise immunity is improved further. Capacitor C1 is an additional analog filter that rejects high-frequency noise.
The next major improvement is the use of a current driver in place of the isolation resistor. In this case, an RS232 driver is used; this driver can drive beyond the supply rails of the DSP and has a built-in current limit of about 20 mA. Diodes D1 and D2, along with R3, clamp the resulting signal to the supply rails of the DSP and latch to prevent excessive overdrive. The DSP and latch both have internal clamping diodes, but it is not recommended that you rely on them, as the internal clamp diodes are not intended for this purpose.
8.6.3 How the Circuit Works
The PC can drive any value on the control lines independent of the returned status. If a logic 1 is driven into the drive side of the isolation resistor and a logic 0 is observed on the sense side, the C4x commport signal in question is without a doubt an output.
By then driving levels and polling the returned status, it is possible to synchronize a host processor to the state machine of the C4x commport. The advantage of this design is that it can be easily ported to any smart processor with any basic I/O capability. For example, TMS320C31/32 devices have been used as slave devices that are bootloaded from a
185. ase 3: Store to memory using *++ARn to push data onto the stack, and read from memory using *ARn-- to pop data off the stack.
Case 4: Store to memory using *ARn++ to push data onto the stack, and read from memory using *--ARn to pop data off the stack.

Figure 2-3 shows these two cases. In case 3, the AR always points to the top of the stack. In case 4, the AR always points to the next free location on the stack.

[Figure 2-3. Implementations of Low-to-High Memory Stacks. Both panels run from low memory (bottom of stack) toward high memory (top of stack). Case 3: ARn points to the top of the stack. Case 4: ARn points to the next free location.]

2.2.3 Queues and Double-Ended Queues

The implementation of queues and double-ended queues is based on the manipulation of the auxiliary registers for user stacks.

For queues, two auxiliary registers are used: one to mark the front of the queue, from which data is popped, and the other to mark the rear of the queue, to which data is pushed. For double-ended queues, two auxiliary registers are also necessary. One register marks one end of the double-ended queue, and the other register marks the other end. Data can be popped from or pushed onto either end.

2.3 Interrupt Examples

When using interrupts, you must consider several issues. This section offers examples of several interrupt-related topics:
  Interrupt Service Routines
  Context Switching
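The two low-to-high stack conventions can be sketched in C as a rough illustration. This sketch assumes case 3 uses preincrement to push and postdecrement to pop, and case 4 uses postincrement to push and predecrement to pop. The type and function names (UserStack, push3, pop3, push4, pop4) are hypothetical, and an array index stands in for ARn. No overflow or underflow checks are made.

```c
#define STACK_WORDS 16

/* Hypothetical model of a user stack: 'ar' plays the role of ARn,
 * expressed as an index into 'mem' rather than a machine address. */
typedef struct {
    int mem[STACK_WORDS];
    int ar;
} UserStack;

/* Case 3: ar always designates the top of the stack.
 * Start with ar = -1 (one below the first location). */
void push3(UserStack *s, int v) { s->mem[++s->ar] = v; }   /* preincrement, then store */
int  pop3(UserStack *s)         { return s->mem[s->ar--]; } /* read, then postdecrement */

/* Case 4: ar always designates the next free location.
 * Start with ar = 0 (the first location). */
void push4(UserStack *s, int v) { s->mem[s->ar++] = v; }   /* store, then postincrement */
int  pop4(UserStack *s)         { return s->mem[--s->ar]; } /* predecrement, then read */
```

After two pushes, ar equals 1 in case 3 (the index of the last value stored) and 2 in case 4 (the index of the next free word), which matches the description of what the register points to in each case.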
186. ber with an absolute value between 1 and 2, and the base is b = 2. Although the mantissa is represented as a fixed-point number, the actual value of the overall number floats the binary point because of the multiplication by a power of b. The exponent e is an integer whose value determines the position of the binary point in the number. IEEE has established a standard format for the representation of floating-point numbers.

To achieve higher efficiency in the hardware implementation, the C4x uses a floating-point format that differs from the IEEE standard. However, the C4x has two single-cycle instructions, TOIEEE and FRIEEE, for the format conversion. These two instructions can also be used with the STF instruction, which allows the data format to be converted within a memory-to-memory transfer. Here are descriptions of both formats and an example program to convert between them.

C4x floating-point format:

[Format diagram: exponent e (8 bits), sign s (1 bit), mantissa (23 bits).]

In a 32-bit word representing a floating-point number, the first 8 bits correspond to the exponent, expressed in twos-complement format. One bit is for the sign, and 23 bits are for the mantissa. The mantissa is expressed in twos-complement form, with the binary point after the most significant nonsign bit. Because this bit is the complement of the sign bit s, it is suppressed. In other words, the mantissa actually has 24 bits. One special case occurs when
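As a hedged illustration of the field layout just described, the following C sketch decodes a 32-bit word laid out as exponent (8 bits), sign (1 bit), and mantissa fraction (23 bits) into a host double. The function name c4x_to_double is hypothetical. Treating an exponent of -128 as zero is an assumption made here to cover the special case, which the excerpt does not spell out. This is a host-side model, not a substitute for the TOIEEE and FRIEEE instructions.

```c
#include <stdint.h>

/* Decode a word laid out as: bits 31-24 exponent (twos complement),
 * bit 23 sign, bits 22-0 fraction. The suppressed bit is the complement
 * of the sign bit, so the mantissa is 01.f when s = 0 and 10.f when s = 1. */
double c4x_to_double(uint32_t word)
{
    int      e = (int)(word >> 24);            /* raw exponent field, 0..255 */
    uint32_t s = (word >> 23) & 1u;            /* sign bit */
    uint32_t f = word & 0x007FFFFFu;           /* 23 fraction bits */
    double   m;

    if (e >= 128)
        e -= 256;                              /* sign-extend the 8-bit exponent */
    if (e == -128)
        return 0.0;                            /* assumption: reserved encoding for zero */

    /* Twos-complement mantissa with the implied bit restored. */
    m = (s ? -2.0 : 1.0) + (double)f / 8388608.0;   /* 8388608 = 2^23 */

    while (e > 0) { m *= 2.0; e--; }           /* scale by 2^e without libm */
    while (e < 0) { m *= 0.5; e++; }
    return m;
}
```

For example, a word with all fields zero decodes to 1.0, because the suppressed bit supplies the leading 1 of the mantissa.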
187. ble

trap vector table pointer (TVTP): A register in the CPU expansion-register file that contains the address of the beginning of the trap vector table.
TVTP: See trap vector table pointer.
unified mode: A mode of operation of the DMA coprocessor. The mode is used mainly for memory-to-memory transfers. This is the default mode of operation for a DMA channel. See also split mode.
wait state: A period of time that the CPU must wait for external program, data, or I/O memory to respond when reading from or writing to that external memory. The CPU waits one extra cycle for every wait state.
wait-state generator: A program that can be modified to generate a limited number of wait states for a given off-chip memory space (lower program, upper program, data, or I/O).
zerofill: Fill the low- or high-order bits with zeros when loading a number into a larger field.

Index

14-pin connector, dimensions 11-13
14-pin header
    header signals 11-2
    JTAG 11-2
2-D array 8-18
3-D grid 8-19
4-D hypercube 8-19
64-bit addition, example 3-17
A-law compression, expansion 6-2
A/D converter, definition A-1
A0-A30, definition A-1
adaptive filters 6-13
ADDC instruction 3-17
ADDI instruction 3-17
address
    definition A-1
    generation 3-6
address pins, external A-5
addressing mode, definition A-1
algorithm, LMS 6-13
ALU. See arithmetic logic unit
ANSI C programs 5-2
applications, hardware 4-1
applications-oriented operations, introduction 6-1
ARAU. See aux
188. but the result of the TSTB is used only to set the condition flags and is not written anywhere. Example 3-1 and Example 3-2 demonstrate the use of several instructions for bit manipulation and testing.

Example 3-1. Use of TSTB for Software-Controlled Interrupt

*  TITLE USE OF TSTB FOR SOFTWARE-CONTROLLED INTERRUPT
*
*  IN THIS EXAMPLE, ALL INTERRUPTS HAVE BEEN DISABLED BY RESETTING THE GIE
*  BIT OF THE STATUS REGISTER. WHEN AN INTERRUPT ARRIVES, IT IS STORED IN
*  THE IF REGISTER. THE PRESENT EXAMPLE ACTIVATES THE INTERRUPT SERVICE
*  ROUTINE INTR WHEN IT DETECTS THAT INT2 HAS OCCURRED.
*
        TSTB    4,IIF            ; Check if bit 2 of IF is set,
        CALLNZ  INTR             ; and, if so, call subroutine INTR

Example 3-2. Copy a Bit from One Location to Another

*  TITLE COPY A BIT FROM ONE LOCATION TO ANOTHER
*
*  BIT I OF R1 NEEDS TO BE COPIED TO BIT J OF R2. AR0 POINTS TO A LOCATION
*  HOLDING I, AND IT IS ASSUMED THAT THE NEXT MEMORY LOCATION HOLDS THE
*  VALUE J.
*
        LDI     1,R0
        LSH     *AR0,R0          ; Shift 1 to align it with bit I
        TSTB    R1,R0            ; Test the I-th bit of R1
        BZD     CONT             ; If bit = 0, branch delayed
        LDI     1,R0
        LSH     *+AR0(1),R0      ; Align 1 with J-th location
        ANDN    R0,R2            ; If bit = 0, reset J-th bit
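The bit-copy operation of Example 3-2 can be modeled in C as a rough sketch. The function name copy_bit is hypothetical. The excerpt ends at the ANDN instruction, so the step that sets bit J when bit I is 1 is an assumption about how the listing continues.

```c
#include <stdint.h>

/* Copy bit i of src into bit j of dst and return the updated dst.
 * Both bit positions are assumed to be in the range 0..31. */
uint32_t copy_bit(uint32_t src, unsigned i, uint32_t dst, unsigned j)
{
    uint32_t mask_i = (uint32_t)1 << i;   /* 1 aligned with bit I */
    uint32_t mask_j = (uint32_t)1 << j;   /* 1 aligned with bit J */

    dst &= ~mask_j;                       /* like ANDN: reset the J-th bit */
    if (src & mask_i)                     /* like TSTB: test the I-th bit */
        dst |= mask_j;                    /* assumed continuation: set the J-th bit */
    return dst;
}
```

Calling copy_bit(0x4, 2, 0x0, 5) copies the set bit 2 of the source into bit 5 of the destination and returns 0x20.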
189. case of a 32-bit positive dividend with i significant bits (and 32 - i sign bits) and a 32-bit positive divisor with j significant bits (and 32 - j sign bits). The repetition of the SUBC command i - j + 1 times produces a 32-bit result in which the lower i - j + 1 bits are the quotient and the upper 31 - i + j bits are the remainder of the division.

SUBC implements binary division in the same manner as long division. The divisor (assumed to be smaller than the dividend) is shifted left i - j times to align with the dividend. Then, using SUBC, the shifted divisor is subtracted from the dividend. For each subtract that does not produce a negative answer, the dividend is replaced by the difference. It is then shifted to the left, and the LSB is set to 1. If the difference is negative, the dividend is simply shifted left by one. This operation is repeated i - j + 1 times.

As an example, consider the division of 33 by 5 using both long division and the SUBC method. In this case, i = 6, j = 3, and the SUBC operation is repeated 6 - 3 + 1 = 4 times.

LONG DIVISION:
    Divisor     00000000000000000000000000000101
    Dividend    00000000000000000000000000100001
    Quotient    00000000000000000000000000000110
    Remainder   11

SUBC METHOD:
    00000000000000000000000000100001    (dividend)
    00000000000000000000000000101000    (divisor shifted left; negative difference)
    00000000000000000000000000100001
    00000000000000000000000
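The procedure described above can be traced with a short C model. This is a sketch under stated assumptions, not the instruction's exact definition: subc_step models one conditional subtract-and-shift as the text describes it, and significant_bits and subc_divide are hypothetical helper names. The model assumes 0 < divisor <= dividend < 2^31.

```c
#include <stdint.h>

/* One step as described in the text: if the difference is not negative,
 * keep it, shift left, and set the LSB; otherwise just shift left. */
uint32_t subc_step(uint32_t src, uint32_t dst)
{
    if (dst >= src)
        return ((dst - src) << 1) | 1u;
    return dst << 1;
}

/* Number of significant bits in a positive value (i or j in the text). */
unsigned significant_bits(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Divide using i - j + 1 repeated steps.
 * Assumes 0 < divisor <= dividend < 2^31. */
void subc_divide(uint32_t dividend, uint32_t divisor,
                 uint32_t *quotient, uint32_t *remainder)
{
    unsigned i = significant_bits(dividend);
    unsigned j = significant_bits(divisor);
    unsigned steps = i - j + 1;
    uint32_t aligned = divisor << (i - j);     /* align divisor with dividend */
    uint32_t acc = dividend;
    unsigned k;

    for (k = 0; k < steps; k++)
        acc = subc_step(aligned, acc);

    *quotient  = acc & (((uint32_t)1 << steps) - 1u);  /* lower i - j + 1 bits */
    *remainder = acc >> steps;                         /* upper bits */
}
```

For 33 divided by 5, the aligned divisor is 40 and the accumulator takes the values 66, 53, 27, and 54 over the four steps. The lower 4 bits of 54 give the quotient 6, and the upper bits give the remainder 3.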
190. ce 1 is FFT SIZE 4 2 fg8m2 space 1 is FFT SIZE 8 2 sintab word _ SINE pointer to sine table sintp2 word _SINE 2 pointer to sine table 2 inputp2 space 1 pointer to input 2 inputp space 1 pointer to source address outputp Space 1 pointer to dst address Applications Oriented Operations 6 43 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued cr2dit FF AX A X X Sect DI FU U FU U fU av tU Deyo c aaa aan ana nu mn uv I H ar AR 4 arl BR d mr2 CR 4 ar3 DR 4 ar4 AR aro BR ar6 DR Initialize C Function T LELExXL AR6 AR7 DP REGPARM 0 ARO 1 AR2 ARO 2 R2 ARO 3 R3 fg R2 fg 1 R2 AR2 inputp 2 AR2 R0 RO inputp2 R3 outputp R2 fg2 z 1 R2 2 R2 R0 RO f g4m2 1 R2 2 R2 R0 RO Bfg8m2 Se 5 5 e GT E ER Ho CI ar7 first twiddle factor arguments passed in stack src address FFT size dst address Initialize DP pointer fg FFT SIZE R2 FFT SIZE 2 inputp SOURCE ADDR inputp2 SOURCE ADDR 2 output DST ADDR fg2 nhalb FFT size 2 fg4m2 NVIERT 2 FFT SIZE 4 2 1 6 44 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued STARTB ldi ldi ldi addi addi addi l
191. ces the CPU into a subroutine called an interrupt service routine. This signal can be triggered by an external device, an on-chip peripheral, or an instruction (TRAP, for example).

interrupt acknowledge (IACK): A signal that indicates that an interrupt has been received and that the program counter is fetching the interrupt vector location.
interrupt vector table (IVT): An ordered list of addresses, each of which corresponds to an interrupt; when an interrupt occurs and is enabled, the processor executes a branch to the address stored in the corresponding location in the interrupt vector table.
interrupt vector table pointer (IVTP): A register in the CPU expansion-register file that contains the address of the beginning of the interrupt vector table.
ISR: Interrupt service routine. A module of code that is executed in response to a hardware or software interrupt.
IVTP: See interrupt vector table pointer.
LA0-LA30: External address pins for data/program memory or I/O devices. These pins are on the local bus. See also A0-A30.
LD0-LD31: External data bus pins that transfer data between the processor and external data/program memory or I/O devices. See also D0-D31.
LSB: Least significant bit. The lowest-order bit in a word.
machine cycle: See CPU cycle.
mantissa: A component of a floating-point number consisting of a fraction and a sign bit. The mantissa represents a normalized fraction whose binary point is shifted b
192. commport 3 to commport 0. DMA3 source sync with ICRDY3 is used. This example is functionally equivalent to Example 7-7. In this program, DMA3 expects data in commport 3 being sent by another processor/device. Otherwise, no transfer will occur.

#include "dma.h"

#define DMAADDR   0x001000d0
#define CTRLREG   0x0309c091   /* DMA Aux sends interrupt to CPU when transfer
                                  finishes (TC = 1), DMA-CPU rotating priority */
#define DST       0x00100042   /* dst = commport 3 output fifo */
#define DST_IDX   0x0          /* dst address does not increment */
#define DIEVAL    0x4000       /* set ICRDY3 auxiliary read sync */
#define ACOUNTER  0x08         /* auxiliary channel counter */

DMASPLIT *dma = (DMASPLIT *)DMAADDR;
int dieval = DIEVAL;

main()
{
    dma->dst      = (void *)DST;
    dma->dst_idx  = DST_IDX;
    dma->acounter = ACOUNTER;
    dma->ctrl     = (void *)CTRLREG;
    asm(" ldi @_dieval,die");
    AUX_WAIT_DMA((volatile int *)dma);
}

Example 7-10. Split-Mode Auxiliary and Primary Channel DMA

EXAMPLE: Split mode, AUX and PRIMARY both running
193. commport and then used as serial ports with internal memory and additional processing capabilities. Complicated and risky ASIC designs are not required, and the solution is fully programmable.

You must include current-limiting circuitry when designing any C4x interface. If the current is not limited, it can exceed 100 mA per pin, which can damage a device.

8.6.4 The Interface Software

The interface software for this host interface is available through the TI BBS (filename M4x 2 exe). This file contains not only the low-level software drivers but also extra code for the M4x, a multiprocessor C4x communication kernel applications note. The following files are contained in this application:

  M4X           Debugger (no source code)
  MEMVIEW       Memory and communications matrix view and edit utility
  MANDEL40      Multiprocessor Mandelbrot demonstration program
  M4X.ASM       Multiprocessor TMS320C4x communications kernel
  DRIVER.CPP    Higher-level system functions
  TARGET.CPP    getmem, putmem, run, stop, and singlestep commands
  OBJECT.CPP    Source code for using the printer port interface

8.7 An I/O Coprocessor-C4x Interface

This section presents a software-based interface that provides a C4x with a flexible bidirectional interface to a TMS320C32. The C32 acts as a smart I/O coprocessor that can provide AIC interfacing and data preprocessing, among others. The
194. culations 11-6
11.7 Connections Between the Emulator and the Target System 11-8
11.8 Mechanical Dimensions for the 14-Pin Emulator Connector 11-12
11.9 Emulation Design Considerations 11-14

11.1 Designing Your Target System's Emulator Connector (14-Pin Header)

JTAG target devices support emulation through a dedicated emulation port. This port is a superset of the IEEE 1149.1 standard and is accessed by the emulator. To communicate with the emulator, your target system must have a 14-pin header (two rows of seven pins) with the connections that are shown in Figure 11-1. Table 11-1 describes the emulation signals.

[Figure 11-1. 14-Pin Header Signals and Header Dimensions. Signal pairs, by row: TMS, TRST; TDI, GND; PD (VCC), no pin (key); TDO, GND; TCK_RET, GND; TCK, GND; EMU0, EMU1. Header dimensions: pin-to-pin spacing 0.100 in. (X,Y); pin width 0.025-in. square post; pin length 0.235 in. nominal.]

While the corresponding female position on the cable connector is plugged to prevent improper connection, the cable lead for pin 6 is present in the cable and is grounded, as shown in the schematics and wiring diagrams in this document.

Table 11-1. 14-Pin Header Signal Descriptions

  Signal   Description                              Emulator State   Target State
  TMS      Test mode select                         O                I
  TDI      Test data input                          O                I
  TDO      Test data output                         I                O
  TCK      Test clock. TCK is a 10.368-MHz clock    O                I
195. cumstances, because wait-state devices are inherently slow and often require complex select decoding. If RDY is high between accesses, zero-wait-state devices, which tend to be inherently fast, can usually respond immediately with a ready indication. Wait-state devices can simply delay their select signals appropriately to generate a ready. Typically, this approach results in the most efficient implementation of ready control logic. Figure 4-7 shows a circuit of this type, which can be used to generate 0, 1, or 2 wait states for multiple devices in a system.

[Figure 4-7. Logic for Generation of 0, 1, or 2 Wait States for Multiple Devices. A 16R4 PLD takes the address bus bits for device selection, STRB0, RESET, strb_syn, and H3 as inputs and drives RDY0 to the C4x.]

4.5.5 Example Circuit

Figure 4-7 shows how a single 7-ns 16R4 programmable logic device (PLD) can be used to generate 0, 1, and 2 wait states for multiple devices that are interfaced to a C4x. In this example, distinct address bits are used to select the different wait-state devices. Here, each of the three address lines input to the 16R4 corresponds to a different-speed device. For a single 16R4 implementation, up to nine different address bits can be used to select different-speed devices. The single output (4Q) of the PLD is connected directly to the RDY0 input of the C
196. cutive reads followed by a write. For consecutive reads, LSTRB0 stays active (low) and LR/W stays high as long as read cycles continue. For back-to-back reads, the C4x requires zero-wait-state memories to have an address-valid-to-data-valid time of less than 21 ns. For most memory devices, this time is the same as the memory access time, which is t4 20 ns. Thus, memories with access times of 25 ns or more cannot meet this timing.

Memory device timing is not as critical for zero-wait-state write cycles as for nonzero-wait-state write cycles, because of the two-H1-cycle writes of the C4x. The extra cycle gives LSTRB0 enough time to frame LR/W, preventing memories that go into high impedance slowly at the end of a read cycle from driving the bus during the subsequent write cycle. For the memory device used in this design (Figure 4-3), the data lines are guaranteed to go into high impedance within 10 ns after CS goes inactive, which gives more than 23 ns of margin before the C4x starts driving the bus with write data. Also, the extra cycle with LSTRB0 inactive prevents writes to random locations in memory while the address is changing between consecutive writes.

For the write cycles shown in Figure 4-3 and Figure 4-4, the RAM requires 15 ns of write data setup before CS goes high, and this design provides at least 24 ns (t3). A data hold time of 0 ns (t4) is required by the RAM, and this design provides greater than 13 ns. Finally, the RAM's 20 ns
197. d XDS emulator both provide benchmarking and timing capabilities that help you determine bus usage.

[Figure 9-3. Internal Bus Current Versus Transfer Rate. Vertical axis: incremental IDD (mA), 0 to 280. Horizontal axis: transfer rate (MHz), 0 to 50.]

The current resulting from internal bus usage varies linearly with transfer rates. Figure 9-3 shows internal bus current requirements for transferring alternating data (AAAA AAAAh to 5555 5555h) at several frequencies. Note that transfer rates greater than the TMS320C4x's MIPS rating are possible because of internal parallelism.

The data set AAAA AAAAh to 5555 5555h exhibits the maximum internal bus current for data transfer operations. The current required for transferring other data patterns may be derated accordingly, as described later in this subsection.

As the transfer rate decreases (that is, as the transfer cycle time increases), the incremental IDD approaches 0 mA. This figure represents the incremental IDD due to internal bus operations and is added to the quiescent and internal operations current values.

For example, the maximum transfer rate corresponds to three accesses every cycle (one program fetch and two data transfers), or an effective one-third H1 transfer cycle time. At this rate, 178 mA is added to the quiescent (130 mA) and internal operation (60 mA) current values for a total of 368 mA.
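The worked total above is a plain sum of three components, which can be written out as a small C helper for checking other operating points. The function name total_supply_current_ma is hypothetical, and the helper only adds the three terms named in the text. It does not model the figure's curve or any derating.

```c
/* Sum the three components named in the text, all in milliamps:
 * quiescent current, internal operations current, and the incremental
 * current due to internal bus operations (read from Figure 9-3). */
int total_supply_current_ma(int quiescent_ma,
                            int internal_ops_ma,
                            int internal_bus_ma)
{
    return quiescent_ma + internal_ops_ma + internal_bus_ma;
}
```

With the values quoted for the maximum transfer rate (130 mA quiescent, 60 mA internal operations, 178 mA internal bus), the helper returns the 368 mA total given in the text.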
198. d addressing can be implemented through both the CPU and DMA. For correct CPU or DMA bit-reversed operation, the base address of bit-reversed addressing must be located on a boundary of the size of the table. To clarify this point, assume an FFT of size N = 2^n. When real and imaginary data are stored in separate arrays, the n LSBs of the base address must be zero (0), and IR0 must be initialized to 2^(n-1) (half of the FFT size). When real and imaginary data are stored in consecutive memory locations (Re-Im-Re-Im), the n + 1 LSBs of the base address must be zero (0), and IR0 must be equal to 2^n = N (FFT size).

3.4.1 CPU Bit-Reversed Addressing

One auxiliary register (AR0, in this case) points to the physical location of a data value. When you add IR0 to the auxiliary register by using bit-reversed addressing, addresses are generated in a bit-reversed fashion (reverse carry propagation). The largest index (IR0, in this case) for bit reversing is 00FF FFFFh.

Example 3-6 illustrates how to move a 512-point complex FFT from the place of computation (pointed at by AR0) to a location pointed at by AR1. Reads are executed in a bit-reversed fashion and writes in a linear fashion. In this example, real and imaginary parts XR(i) and XI(i) of the data are not stored in separate arrays, but they are interleaved with XR(0), XI(0), XR(1), XI(1), ..., XR(N-1), XI(N-1). Because of this arrangement, the length of the array is 2N instead of N, and
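Reverse carry propagation can be modeled in C to see the address sequence it produces. This is a minimal sketch: the function name reverse_carry_add is hypothetical, the index is assumed to be a single power of two (as it is when IR0 holds half the FFT size or the FFT size), and the address is treated as an offset from a suitably aligned base.

```c
/* Add 'index' to 'offset' with the carry propagating toward the LSB
 * instead of toward the MSB. 'index' is assumed to be a power of two. */
unsigned reverse_carry_add(unsigned offset, unsigned index)
{
    unsigned bit = index;

    /* While the current bit is already set, clear it and carry
     * into the next lower bit. */
    while (bit != 0 && (offset & bit) != 0) {
        offset ^= bit;
        bit >>= 1;
    }
    /* Set the first clear bit found; if the carry fell off the LSB,
     * bit is 0 and the offset has wrapped around to 0. */
    return offset | bit;
}
```

For an 8-point table with separate real and imaginary arrays, the index is 4. Starting from offset 0, repeated calls yield 4, 2, 6, 1, 5, 3, 7 and then wrap back to 0, which is the bit-reversed ordering of 1 through 7.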
199. d applied to the CREQ line. This results in an oscillation until the synchronizer period has timed out.

[Figure 8-5. A Token Forcer Circuit (Output). Components shown: VCC, CLK, U1, U2, Rp = 10 kΩ, Rs = 470 Ω; signals CREQ and CACK.]

Forcing a communication port to become an input port

Figure 8-6 shows a circuit that forces a communication port to become an input port. In this circuit, driving the CREQ line with an inverted CACK reconfigures an input port as an output. If CREQ is an input, it is held low through Rs whenever CACK is high (or floating high because of Rp). The port then responds to this request by driving CACK low, which in turn drives CREQ high, finishing the token acknowledge. As in Figure 8-5, synchronizer delays mimic the response of another C4x communication port to prevent oscillation.

[Figure 8-6. Communication Port Driver Circuit (Input). Components shown: VCC, CLK, Rp = 10 kΩ, Rs = 470 Ω, inverter; signal CREQ.]

Note that after the port has been reconfigured as an input port, the CREQ line is active high while the output of the inverter is low. This causes a constant current flow from CREQ to the inverter.

8.9 Implementing a CSTRB Shortener Circuit

In C40 device revisions lower than 3.0, the width of the CSTRB low pulse between word boundar
200. d held by the bus until the bus cycle is complete. Since the CPU may not require that bus again for some time, the CPU is free to perform operations on other buses until a conflict occurs. Conflicts include DMA, a second write, or a read to the bus.

In Figure 9-5, the upper line is applicable when STI || STI is not dominated by execution of internal NOPs and the external wait state is equal to zero. The lower line shows when STI || STI is internally stalled while waiting for the external bus to go ready because of wait states. The addition of NOPs between successive STI || STI operations contributes to internal bus current and therefore does not result in the lowest possible current.

[Figure 9-6. Local/Global Bus Current Versus Transfer Rate at Zero Wait States. Vertical axis: IDD (mA), 140 to 420. Horizontal axis: transfer rate (Mword/second).]

To further illustrate the relationship of current and write cycle time, Figure 9-6 shows the characteristics of current for various numbers of cycles between writes for zero wait states. The information on this graph can be used to obtain more precise values of current whenever zero wait states are used. Table 9-1 lists the number of cycles used for software-generated wait states.

Table 9-1. Wait-State Timing Table

  Wait State   Read Cycles   Write Cycles
  0            1             2
  1            2             3
  2            3             4
  3            4             5

Once a current value has been obtained from Figu
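The rows of Table 9-1 follow a simple pattern: each wait state adds one cycle, reads start at one cycle, and writes start at two. The C sketch below encodes that pattern. The function names are hypothetical, and extending the formula beyond the four tabulated rows (0 to 3 wait states) is an assumption.

```c
/* Cycles per read access for a given number of software-generated
 * wait states, following the pattern of Table 9-1. */
int read_cycles(int wait_states)
{
    return wait_states + 1;
}

/* Cycles per write access; writes take one cycle more than reads. */
int write_cycles(int wait_states)
{
    return wait_states + 2;
}
```

For two wait states, the helpers return 3 read cycles and 4 write cycles, matching the third row of the table.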
201. ddle factor ldi 0 ar4 group counter ldi Qinputp arO ck ck 0k ck ck 0k ck Sk ck ck KA AKA KA A KA KAZ A KA XK ck ck ck A KA KA AKA A KA KA AKA AAA KA KAZ ck ck AKA AAA K AKA kk A K kA kx ko SECOND LAST STAGE ck ck ck ck ck ck ke ck ck ck ck A AZ A ck ck ek ck ck ck AXA A A kk ck KKK KKK KKK KKK AXA A KKK KKK KKK KKK X AXA AX X AA AX ck ko A Sk ko k ko ko ko ko ko ldi Ginputp arl ldi ar0 ar2 upper output addi ir0 ar0 arl lower input ldi arl ar3 lower output ldi sintp2 ar7 pointer to twiddle faktor ldi 5 ir0 distance between two groups ldi Qfg8m2 rc fill pipeline 1 butterfly w 0 addf ar0 arl r2 AR r2 AR BR subf arl ar0 13 BR r3 AR BR addf karo arel 20 AI r0 AI BI subf arl ar0O rl BI rl AI BI 2 butterfly w 0 addf ar0 arl r6 AR r6 AR BR subf arl arO r7 BR r7 AR BR addf ar0 arl r4 AI r4 AI BI subf arl ir0 ar0 ir0 r5 BI r5 AI BI stf r2 ar2 AR r2 B stf r3 ar3 BR r3 st r0 ar2 AI r0 IE stf rl ar3 BI rl stf ro ar2 AR r6 M stf r7 ar3 BR r7 StL r4 ar2 ir0 AI r4 stf r5 ar3 ir0 BI r5 Applications Oriented Operations 6 49 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued e 3a Xo
202. de DMA uses autoinitialization method 1 to transfer 2 data blocks.
  Example 7-8: Unified-mode DMA uses autoinitialization method 2 to transfer 2 data blocks.
  Example 7-9: Split-mode auxiliary DMA transfers data between comm ports using read sync.
  Example 7-10: Split-mode auxiliary and primary channels send/receive data to and from commport.
  Example 7-11: Split-mode DMA autoinitializes both auxiliary and primary channels (auxiliary transfers 1 block and primary transfers 2 blocks).

Example 7-12 is the include file for all examples (dma.h).

Example 7-6. Unified-Mode DMA Using Read Sync

/*
 * EXAMPLE: Unified mode, commport-to-commport transfer. DMA3 in unified
 * mode transfers 8 words from commport 3 to commport 0. DMA3 source sync
 * with ICRDY3 is used.
 *
 * Note: Writes cannot be synchronized with OCRDY0 because a DMA i can
 * only be synchronized with signals coming from commport i. You could
 * sync on ICRDY3 or on OCRDY0 (not both); the choice depends on the
 * specific application, to avoid deadlock.
 *
 * In this program, DMA3 expects data in commport 3 being sent by another
 * processor/device. Otherwise, no transfer will occur.
 */
#include "dma.h"
#define DMAADDR 0x001000d0
de
203. development. It includes example code and hardware connections for various applications.

The guide shows how to use the instruction set, the architecture, and the C4x interface. It presents examples for frequently used applications and discusses more involved examples and applications. It also defines the principles involved in many applications and gives the corresponding assembly language code for instructional purposes and for immediate use. Whenever the detailed explanation of the underlying theory is too extensive to be included in this manual, appropriate references are given for further information.

How to Use This Manual

The following table summarizes the information contained in this user's guide:

  If you are looking for information about:   Turn to these chapters:
  Arithmetic                                  Chapter 3, Logical and Arithmetic Operations
  Communication Ports                         Chapter 8, Using the Communication Ports
  Companding                                  Chapter 6, Applications-Oriented Operations
  Development Support                         Chapter 10, Development Support and Part Order Information
  DMA Coprocessor                             Chapter 7, Programming the DMA Coprocessor
  FFTs                                        Chapter 6, Applications-Oriented Operations
  Filters                                     Chapter 6, Applications-Oriented Operations
  Ordering Parts                              Chapter 10, Development Support and Part Order Information
  Repeat Modes
  Reset
  Stacks
  Tips
  Wait States
  XDS510 Emulator

Style and Symbol Conventions
204. di ldi ldi ldi lsh subi KKK KKK KKK KKK KK KKK KKK KK KKK KKK KKK KKK KKK AX AXA KK KKK KKK KKK KKK KKK AAA AKA KKK AZ A KKK kc kc k ERE LY KKK KKK KKK KK KKK KKK KKK KK KKK KKK KKK A KA KKK KK KKK KKK KKK KK KKK KKK KKK KKK KK KKK KKK KKK KK KKK fill pipeline add sub add sub add mpy sub add stf sub stf add mpy sub rpt add stf subf stf addf Fh Fh Eh Fh Fh Fh Fh Fh h Fh O Hh Fh Fh Qfg2 ir0 sintab ar7 ar2 ar0 ir0 ar0 arl ir0 arl ar2 ir0 ar2 ar3 ar0 ar4 arl ar5 ar3 ar6 2 irl 43r 2 1r 0 rc FIRST 2 STAGES AS RADIX 4 BUTTI ar2 ar 0 r4 ar2 ar0 r5 arl ars r6 arl ar3 r7 r6 r4 r0 kar ariel r6 r4 r3 rl arl r0 r0 ar44 Fl arl4 r3 ar54 r1 r55 rf2 ar ar2 rl rl rb5 r3 b1k1 Fl amo re r2 ar2 irl 166 3 ar6 r0 r2 r4 1 Ne Ne Ne Ne Ne Ne Ne Ne Ne Ne Se So Se Ne Ne Ne Se iro ar7 ar0 arl ar2 ar3 ar4 ars ar6 ir0 r4 r5 r6 r7 AR rl ro rl CR r1 Setup for radix 4 butterfly loop r2 r6 n 2 offset between SOURCE ADDRs points to twiddle factor 1 points to AR points to BR points to CR points to DR points to AR points to BR points to DR addressoffset n 4 number of R4 butterflies AR CR AR CR DR BR DR BR r0 r4 r6 DI BI BI r2 r5 rl CI AI AI r4 r2 r0 BR x3 r4
205. e X I Y I pointer Increment loop counter for next time Next FFT stage delayed IE 2 IE N1 N2 OUTPUTP ar2 INPLACE RFFTSIZE irO 2 LEO EGC BITRV 2 ET OUTPUTP arl ar2 1 r0 ar2 ir0O b rl EQ arl L ar2 1 r0 lo arl 1el END ar2 ir0 b r1 r tar1 1 ri arl BITRV2 ar2 arl arl 2 ar2 ir0 b arl ar2 CONT arl r0 ar2 rl 0 tarz sl rani arl 1 r0 ar2 1 r1 ir0 FFT SIZI iro FFT SIZE Z SRC different from DST ar2 SRC ADDR EE jirl 2 jarl DST ADDR read first Im value in place bit reversing KKK KKK KKK KK KKK ck X A KK KKK KKK KK KKK A KA KKK A KA KKK KKK KK KKK KKK AXA A AA KKK AA A KKK KKK KK BITREVERSAL section assume input and output in Re Im Re Im format KKK KKK KKK KKK KKK KKK KKK KKK KA KA AKA AKA KA A KA AA KA KKK KAKA KKK KKK 6 30 Fast Fourier Transforms FFTs Example 6 12 Complex Radix 2 DIF FFT Continued StL r0 tar2 1 n stf r1 arl 1 CONT nop arl 2 BITRV2 nop ar2 1r0 b Return to C environment END POP R8 POP AR6 Restore the register values and return POP AR5 POP ARA POPF R6 POP R6 POP R5 POP R4 POP DP RETS end Applications Oriented Operations 6 31 Fast Fourier Transforms FFTs Example 6 13 Table With Twiddle Factors for a 64 Point FFT
206. e; however, wait-state operation on the local bus is the same as on the global bus, so this discussion pertains equally well to both local and global. Also, the local and global buses each have two sets of control signals (R/W0, STRB0, RDY0, PAGE0, CE0 and R/W1, STRB1, RDY1, PAGE1, CE1), with each set of control signals having its own ready signal, providing for more flexibility in support of external devices with different speeds. Since both strobes' ready signals share the same electrical characteristics, the following discussion focuses on one of the global bus's sets of control signals.

Wait states are generated by:
  The internal wait-state generator
  The external ready inputs (RDY0 or RDY1)
  The logical AND or OR of the two ready signals

When enabled, internally generated wait states affect all external cycles, regardless of the address accessed. If different numbers of wait states are required for various external devices, the external RDY input can be used to customize wait-state generation to specific system requirements.

If either the logical OR (or electrical AND, since the signals are true low) of the external and wait-count ready signals is selected, the earlier of the two signals will generate a ready condition and allow the cycle to be completed. It is not required that both signals be present.

4.5.1 ORing of the Ready Signals STRBx SWW 1
207. e required even though the 60 mA is omitted.
3. If significant internal bus operations are being performed (see subsection 9.3.2, Internal Bus Operations, on page 9-8), add the calculated current value.
4. If external writes are being performed at high speed (see Section 9.4, Current Requirements of Output Driver Components, on page 9-12), add the values calculated for local and global bus current components.
5. Add DMA and communication port current requirements if they are used.

The current value resulting from summing these components is the total device current requirement for a given program activity.

9.5.2 Supply Voltage, Operating Frequency, and Temperature Dependencies

Three additional factors that affect current requirements are supply voltage level, operating temperature, and operating frequency. However, these considerations affect total supply current, not specific components (that is, internal or external bus operations). Note that supply voltages, operating temperature, and operating frequency must be maintained within required device specifications.

The scale factor for these dependencies is applied in the same manner as discussed in previous sections, once the total current for a particular program segment has been determined. Figure 9-11 shows the relative scale factors to be applied to the supply current values as a function of both VDD and operating frequency. Figure
208. e result
        CMPI    0,R0             ; Set status from result
        RETS
*  RETURN ZERO
ZERO    LDI     0,R0
        RETS
        .end

If the dividend is less than the divisor and you want fractional division, you can perform a division after you determine the desired accuracy of the quotient in bits. If the desired accuracy is k bits, start by shifting the dividend left by k positions. Then apply the algorithm described above and replace i with i + k. It is assumed that i + k is less than 32.

3.5.2 Computation of Floating-Point Inverse and Division

When you use the RCPF (reciprocal of a floating-point number) instruction to generate an estimate of the reciprocal of a floating-point number, you can also use the Newton-Raphson algorithm to extend the precision of the mantissa of the reciprocal that the instruction generates. The floating-point division can be obtained by multiplying the dividend and the reciprocal of the divisor.

The input to RCPF is assumed to be v = v(man) x 2^v(exp). The output is x = x(man) x 2^x(exp). The value v(man) (or x(man)) is composed of three fields: the sign bit v(sign), an implied nonsign bit, and the fraction field v(frac).

Four rules apply to generating the reciprocal of a floating-point number:

1. If v > 0, then x(exp) = -v(exp) - 1 and x(man) = 2 / v(man). For the special case in which the 10 MSBs of v(man) = 01.00000000b, then x(man) = 2 - 2^-8 = 01.11111111b. In both cases, the
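The Newton-Raphson refinement mentioned above can be sketched in C. The update x <- x * (2 - v * x) is the standard Newton-Raphson step for a reciprocal; the excerpt does not show the form the manual uses, so treat this as an assumption. The function names are hypothetical, and the initial estimate stands in for the value that RCPF would supply.

```c
/* Refine an estimate x of 1/v with the Newton-Raphson step
 * x <- x * (2 - v * x). Each step roughly doubles the number of
 * correct bits, provided the initial estimate is reasonably close. */
double refine_reciprocal(double v, double x, int iterations)
{
    int k;
    for (k = 0; k < iterations; k++)
        x = x * (2.0 - v * x);
    return x;
}

/* Division as described in the text: multiply the dividend by the
 * refined reciprocal of the divisor. */
double divide_by_reciprocal(double dividend, double divisor,
                            double initial_estimate, int iterations)
{
    return dividend * refine_reciprocal(divisor, initial_estimate, iterations);
}
```

Starting from the estimate 0.3 for 1/3, the relative error shrinks from 0.1 to 0.01, then to 1e-4, and then to 1e-8 over three steps, because each step squares the error term 1 - v * x.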
209. e rules as the above FIR filter with fixed coefficients.

Example 6-8. Adaptive FIR Filter (LMS Algorithm)

*  TITLE ADAPTIVE FIR FILTER (LMS ALGORITHM)
*
*  SUBROUTINE LMS
*
*  LMS == LMS ADAPTIVE FILTER
*
*  EQUATIONS:  y(n) = h(n,0)*x(n) + h(n,1)*x(n-1) + ... + h(n,N-1)*x(n-(N-1))
*              FOR (i = 0; i < N; i++)
*                  h(n+1,i) = h(n,i) + tmuerr*x(n-i)
*
*  TYPICAL CALLING SEQUENCE:
*          load   R4
*          load   AR0
*          LAJU   LMS
*          load   AR1
*          load   RC
*          load   BK
*
*  ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+-----------------------------
*   R4       | SCALE FACTOR (2*mu*err)
*   AR0      | ADDRESS OF h(n,N-1)
*   AR1      | ADDRESS OF x(n-(N-1))
*   RC       | LENGTH OF FILTER - 2 (N-2)
*   BK       | LENGTH OF FILTER (N)
*
*  REGISTERS USED AS INPUT: R4, AR0, AR1, RC, BK
*  REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
*  REGISTER CONTAINING RESULT: R0
*
*  BENCHMARKS:   CYCLES: 4 + 3N (not including subroutine overhead)
*                PROGRAM SIZE: 9 words (not including subroutine overhead)
*
*  SETUP (i = 0)
*
        .global LMS
LMS     RPTBD   LOOP               ; Setup the delayed repeat block
*  Initialize R0:
        MPYF3   *AR0,*AR1,R0       ; h(n,N-1) * x(n-(N-1)) -> R0
        SUBF3   R2,R2,R2           ; Initialize R2
*  Initialize R1:
        MPYF3   *AR1++(1)%,R4,R1   ; x(n-(N-1)) * tmuerr -> R1
        ADDF3   *AR0++(1),R1,R1    ; h(n,N-1) + x(n-(N-1)) * tmuerr -> R1
*
*  FILTER AND UPDATE
*
*  Filter:
*  UPDATE:
        MPYF3
        ADDF3
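The filter and update equations in the listing header can be sketched in C as a reference model. The function name lms_step and the array layout (x[i] holds x(n-i)) are hypothetical choices made here. The scale factor tmuerr is passed in, as it is in the assembly routine's R4 argument. The sketch computes the output with the current coefficients and then updates them, which may differ from the interleaved ordering of the assembly loop.

```c
/* One LMS iteration.
 *   h[i]    coefficient h(n,i); updated in place to h(n+1,i)
 *   x[i]    delayed input sample x(n-i)
 *   n_taps  filter length N
 *   tmuerr  scale factor 2*mu*err supplied by the caller
 * Returns the filter output y(n) computed with the coefficients h(n,i). */
float lms_step(float *h, const float *x, int n_taps, float tmuerr)
{
    float y = 0.0f;
    int i;

    /* Filter: y(n) = sum over i of h(n,i) * x(n-i) */
    for (i = 0; i < n_taps; i++)
        y += h[i] * x[i];

    /* Update: h(n+1,i) = h(n,i) + tmuerr * x(n-i) */
    for (i = 0; i < n_taps; i++)
        h[i] += tmuerr * x[i];

    return y;
}
```

With h = {0.5, 0.25}, x = {2, 4}, and tmuerr = 0.5, the call returns y = 2 and leaves h = {1.5, 2.25}.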
210. e set first. This method does not cause any extra CPU/DMA access conflict, but its drawback when using split mode is that you cannot differentiate whether the primary or auxiliary channel has finished.

- The transfer counter has a zero value. This option is sometimes not reliable because the DMA channel could be in the middle of an autoinitialization sequence.
- The TCINT (or AUX TCINT) flag is set to 1. This option is reliable, but the CPU polls the flag via the peripheral bus, potentially causing a CPU/DMA access conflict if the DMA is operating to/from the peripheral bus. This is a good option if you do not foresee any problem with the additional access delay.
- The START (or AUX START) bits in the DMA channel control register are set to 10 (binary). This option can also cause a CPU/DMA access conflict.

7.3 DMA Assembly Programming Examples

The DMA coprocessor is a memory-mapped peripheral that you can easily program from C as well as from assembly. Example 7-1 through Example 7-5 show how to program the DMA coprocessor using assembly language. Example 7-6 through Example 7-11 show how to program it from C. The source code for Example 7-6 through Example 7-11 can be found in the TI BBS self-extracting file C4xdmaex.exe.

Example 7-1 shows one way of setting up DMA channel 2 to initialize an array to zero. This
211. ed memory 4 21 short floating point format definition A 7 short integer format definition A 7 short unsigned integer format definition A 7 signal descriptions 14 pin header 11 2 signal quality 8 5 Index 7 Index signals buffered 11 9 buffering for emulator connections 11 8 to 11 11 description 14 pin header 11 2 timing 11 5 sign extend definition A 7 single access RAM SARAM definition A 7 single precision floating point format definition A 7 single precision integer format definition A 7 single precision unsigned integer format defini tion A 7 slave devices 11 3 slow devices OR 4 12 sockets 10 6 325 pin C40 304 pin C44 10 6 software development tools assembler linker 10 2 Ccompiler 10 2 digital filter design package 10 2 general 10 12 linker 10 2 simulator 10 2 software interrupt definition A 7 software polling interrupts example 2 11 software stack 2 2 2 11 split mode definition A 7 split mode DMA 7 5 split mode 7 13 7 14 square root calculation 3 15 ST See status register stack 2 7 definition A 7 stack pointer 2 7 stack pointer SP application 2 7 stacks growth 2 8 high to low memory diagram 2 9 low to high memory diagram 2 9 user 2 8 status register definition A 7 straight unshrouded 14 pin 11 3 STRBx SWW 4 12 strobes 4 9 wait states 4 7 SUBB instruction 3 17 3 18 Index 8 SUBC instruction 3 9 SUBI instruction 3 18 subroutine 2 21 subroutines 2 4 2 14 calls See call
212. emory interface See memory interface local memory interface control register LMICR LSTRB ACTIVE field 4 9 loop delayed block repeat example 2 19 loop optimization example 5 3 loops 2 18 single repeat 2 20 LSB definition A 5 LWLct LWRct instructions 3 4 machine cycle See CPU cycle mantissa definition A 5 maskable interrupt definition A 5 matrix vector multiplication data memory organiza tion 6 21 MBct MHct instructions 3 4 memory object exchange example 5 2 memory device timing 4 6 memory interface 4 12 global 4 4 local 4 4 ready generation 4 11 shared global 4 21 strobes 4 7 two banks 4 8 wait states 4 11 Index 5 Index memory interface local global RAM zero wait states 4 7 shared bus 4 22 memory interface control registers 4 12 LSTRB ACTIVE field 4 8 PAGESIZE field 4 8 4 18 memory interfacing introduction 4 1 memory map 4 4 memory mapped register definition A 5 message broadcasting 8 20 communication ports 8 21 MFLOPS definition A 5 microcomputer mode definition See microproces sor mode microprocessor mode definition See microcomput er mode MIPS definition A 5 miss definition A 5 MPYIS instruction 3 18 MPYSHI3 instruction 3 18 MSB definition A 5 mu law compression expansion 6 2 conversion linear 6 2 multiplication matrix vector 6 21 multiplier definition A 5 networks distributed memory 4 21 parallel connectivity 8 18 Newton Raphson algo
213. en more communication port channels are used. Similar to the DMA bus current consumption, adding communication ports eventually saturates the peripheral bus as more channels are added.

Figure 9-8. Communication Port Current Versus Clock Rate
[Figure: incremental IDD (mA) versus fCLK (MHz)]

Note that since the communication ports are intended to communicate with other TMS320C4x communication ports over short distances, no additional capacitive loading was added. In this case, the transmission distance is about 6 inches without additional 80-pF loads. Note that communication port current is superimposed over the Iibus value.

9.4.4 Data Dependency

Data dependency of the current for the local and global buses is expressed as a scale factor that is a percentage of the maximum current exhibited by either of the two buses.

Figure 9-9. Local/Global Bus Current Versus Data Complexity
[Figure: weighting factor versus relative operation complexity]

Figure 9-9 shows normalized weighting factors that can be used to scale current requirements on the basis of patterns in data being written on the external buses. The range of possible weighting factors forms a trapezoidal pattern bounded by extremes of data values. As the
214. enerate gmce0 and gmce1 to prevent these signals from going low (active) if all the processors' busenable signals are high (inactive). The busenable signal is shown in the PLD equations in the Global Bus Interface Logic section of the TMS320C4x Parallel Processing Development System Technical Reference. The gmce0 and gmce1 signals are shown in the Global Memory Control section of the same book.

Chapter 5. Programming Tips

Programming style is highly personal and reflects each individual's preferences and experiences. The purpose of this chapter is not to impose any particular style. Instead, it emphasizes some of the features of the C4x that can help in producing faster and/or shorter programs. The tips in this chapter cover both C and assembly language programming.

Topic
5.1 Hints for Optimizing C Code
5.2 Hints for Optimizing Assembly Language Code

5.1 Hints for Optimizing C Code

The C4x's large register file, software stack, and large memory space easily support the C4x C compiler. The C compiler translates standard ANSI C programs into assembly language source. It also increases the portability and decreases the porting time of applications.

The suggested methodology for developing your application follows five steps:

1. Write the application in C.
2. Debug the program.
3. Estimate if the program
215. er 32 bits of R7
        PUSHF  R7      ; and the upper 32 bits
        PUSH   R8      ; Save the lower 32 bits of R8
        PUSHF  R8      ; and the upper 32 bits
        PUSH   R9      ; Save the lower 32 bits of R9
        PUSHF  R9      ; and the upper 32 bits
        PUSH   R10     ; Save the lower 32 bits of R10
        PUSHF  R10     ; and the upper 32 bits
        PUSH   R11     ; Save the lower 32 bits of R11
        PUSHF  R11     ; and the upper 32 bits
;
; SAVE THE AUXILIARY REGISTERS
;
        PUSH   AR0     ; Save AR0
        PUSH   AR1     ; Save AR1
        PUSH   AR2     ; Save AR2
        PUSH   AR3     ; Save AR3
        PUSH   AR4     ; Save AR4
        PUSH   AR5     ; Save AR5
        PUSH   AR6     ; Save AR6
        PUSH   AR7     ; Save AR7
;
; SAVE THE REST OF THE REGISTERS FROM THE REGISTER FILE
;
        PUSH   DP      ; Save data page pointer
        PUSH   IR0     ; Save index register IR0
        PUSH   IR1     ; Save index register IR1
        PUSH   BK      ; Save block size register
        PUSH   IIE     ; Save interrupt enable register
        PUSH   IIF     ; Save interrupt flag register
        PUSH   DIE     ; Save DMA interrupt enable register
        PUSH   RS      ; Save repeat start address
        PUSH   RE      ; Save repeat end address
        PUSH   RC      ; Save repeat counter
;
; SAVE IS COMPLETE
;
; YOUR INTERRUPT SERVIC
216. er TI documentation

Do this:

Write to: Texas Instruments Incorporated, Market Communications Manager, MS 736, P.O. Box 1443, Houston, Texas 77251-1443
Call the TI Literature Response Center: (800) 477-8924
Contact the DSP hotline: Phone: (713) 274-2320; FAX: (713) 274-2324; Electronic Mail: 4389750@mcimail.com
Call the TI BBS: (713) 274-2323
Ftp from ftp.ti.com: log in as user ftp, cd to /mirrors/tms320bbs
Point your browser at http://www.ti.com
Send electronic mail to: comments@books.sc.ti.com
Send printed comments to: Texas Instruments Incorporated, Technical Publications Mgr., MS 702, P.O. Box 1443, Houston, Texas 77251-1443

Trademarks

MS is a registered trademark of Microsoft Corp.
MS-Windows is a registered trademark of Microsoft Corp.
MS-DOS is a registered trademark of Microsoft Corp.
OS/2 is a trademark of International Business Machines Corp.
Sun and SPARC are trademarks of Sun Microsystems, Inc.
VAX and VMS are trademarks of Digital Equipment Corp.

Contents

1 Processor Initialization
  Provides examples for initializing the processor.
  1.1 Reset Process
  1.2 Reset Signal Generation
  1.3 Multiprocessing System Reset Considerations
  1.4 How to Initialize the Processor
217. er User's Guide (literature number SPRU034) describes the TMS320 floating-point C compiler. This C compiler accepts ANSI standard C source code and produces TMS320 assembly language source code for the C3x and C4x generations of devices.

TMS320C4x C Source Debugger User's Guide (literature number SPRU054A) tells you how to invoke the C4x emulator and simulator versions of the C source debugger interface. This book discusses various aspects of the debugger interface, including window management, command entry, code execution, data management, and breakpoints. It also includes a tutorial that introduces basic debugger functionality.

TMS320C4x Technical Brief (literature number SPRU076) gives a condensed overview of the C4x DSP and its development tools. It also lists TMS320C4x third parties.

TMS320 Family Development Support Reference Guide (literature number SPRU011) describes the 320 family of digital signal processors and the various products that support it. This includes code-generation tools (compilers, assemblers, linkers, etc.) and system integration and debug tools (simulators, emulators, evaluation modules, etc.). This book also lists related documentation, outlines seminars and the university program, and gives factory repair and exchange information.

TMS320 Third-Party Support Reference Guide (literature number SPRU052) alphabetically lists over 100 third parties that supply various products that serve the family o
218. -128. In this case the number is interpreted as zero, independently of the values of s and f (which are by default set to zero). To summarize, the values of the represented numbers in the C4x floating-point format are as follows:

    2^e x (01.f)    if s = 0
    2^e x (10.f)    if s = 1
    0               if e = -128

IEEE floating-point format:

    s (1 bit) | e (8 bits) | f (23 bits)

The IEEE floating-point format uses sign-magnitude notation for the mantissa. In a 32-bit word representing a floating-point number, the first bit is the sign bit. The next 8 bits correspond to the exponent, expressed in an offset-by-127 format (the actual exponent is e - 127). The following 23 bits represent the absolute value of the mantissa with the most significant 1 implied. The binary point is fixed after this most significant 1. In other words, the mantissa actually has 24 bits. Several special cases are summarized below.

These are the values of the represented numbers in the IEEE floating-point format:

    (-1)^s x 2^(e-127) x (01.f)    if 0 < e < 255

Special cases:

    (-1)^s x 0.0                 if e = 0 and f = 0 (zero)
    (-1)^s x 2^(-126) x (0.f)    if e = 0 and f <> 0 (denormalized)
    (-1)^s x infinity            if e = 255 and f = 0 (infinity)
    NaN (not a number)           if e = 255 and f <> 0

The C4x performs the conversion according to these definitions of the formats. It assumes that the source data for the IEEE format is in memory only and that the source data for the C4x floating-point format is in ei
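The C4x value rules summarized above can be checked with a small host-side decoder. This is a sketch under a stated assumption: the field layout (exponent in bits 31-24 as a twos-complement value, sign in bit 23, fraction in bits 22-0) is not given in this excerpt and is assumed here; the function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>

/*
 * Decode a 32-bit C4x single-precision word into a host double.
 * Assumed layout: bits 31-24 = e (twos complement), bit 23 = s, bits 22-0 = f.
 */
double c4x_to_double(uint32_t word)
{
    int e = (int)(word >> 24);
    if (e >= 128) {
        e -= 256;                          /* sign-extend the 8-bit exponent */
    }
    if (e == -128) {
        return 0.0;                        /* e = -128 is interpreted as zero */
    }
    unsigned s = (unsigned)((word >> 23) & 1u);
    double frac = (double)(word & 0x7FFFFFu) / 8388608.0;   /* 0.f, f / 2^23 */
    double mantissa = s ? (-2.0 + frac)    /* 10.f in twos complement */
                        : (1.0 + frac);    /* 01.f */
    return ldexp(mantissa, e);             /* mantissa * 2^e */
}
```

Under the assumed layout, the word 0x00000000 decodes to 1.0 and 0xFF800000 to -1.0, which shows how the implied bit is the complement of the sign bit rather than a fixed 1 as in the IEEE format.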
219. f 320 digital signal processors: software and hardware development tools, speech recognition, image processing, noise cancellation, modems, etc.

TMS320 DSP Designer's Notebook: Volume 1 (literature number SPRT125) presents solutions to common design problems using C2x, C3x, C4x, C5x, and other TI DSPs.

Related Articles and Books

A wide variety of related documentation is available on digital signal processing. These references fall into one of the following application categories:

- General-Purpose DSP
- Graphics/Imagery
- Speech/Voice
- Control
- Multimedia
- Military
- Telecommunications
- Automotive
- Consumer
- Medical
- Development Support

In the following list, references appear in alphabetical order according to author. The documents contain beneficial information regarding designs, operations, and applications for signal-processing systems; all of the documents provide additional references. Texas Instruments strongly suggests that you refer to these publications.

General-Purpose DSP

1) Antoniou, A., Digital Filters: Analysis and Design, New York, NY: McGraw-Hill Company, Inc., 1979.
2) Brigham, E.O., The Fast Fourier Transform, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1974.
3) Burrus, C.S., and T.W. Parks, DFT/FFT and Convolution Algorithms, New York, NY: John Wiley and Sons, Inc., 1984.
4) Chassaing, R., Horning, D.W., Digital Signal Processing with Fixed and Float
220. f memory in which data is stored and then retrieved in the same order in which it was stored. Thus, the first word stored in this buffer is retrieved first. The C4x's communication ports each have two FIFOs: one for transmit operations and one for receive operations.

hardware interrupt: An interrupt triggered through physical connections with on-chip peripherals or external devices.

hit: A condition in which, when the processor fetches an instruction, the instruction is available in the cache.

IACK: Interrupt acknowledge signal. An output signal that indicates that an interrupt has been received and that the program counter is fetching the interrupt vector that will force the processor into an interrupt service routine.

IIE: See internal interrupt enable register.

IIF: See IIOF flag register.

IIOF flag register (IIF): Controls the function (general-purpose I/O or interrupt) of the four external pins (IIOF0 to IIOF3). It also contains timer/DMA interrupt flags.

index registers: Two registers (IR0 and IR1) that are used by the ARAU for indexing an address.

internal interrupt: A hardware interrupt caused by an on-chip peripheral.

internal interrupt enable register: A register (in the CPU register file) that determines whether or not the CPU will respond to interrupts from the communication ports, the timers, and the DMA coprocessor.

interrupt: A signal sent to the CPU that (when not masked) for
221. f the C4x's four strobes (two each on the local and global buses), four different banks of memory can be decoded. In addition, you can change the address decoding under program control by changing the LSTRB ACTIVE field (bits 24-28) of the LMICR or the global memory interface control register (GMICR). If you must decode more than four banks of memory, or if the chosen memory device cannot meet the read cycle timing requirements for the C4x at zero wait states, you should use page switching (discussed in subsection 4.5.6) to add an extra cycle to read accesses outside the current bank boundary.

Figure 4-6. C4x Interface to Zero-Wait-State SRAMs, Two Strobes
[Figure: C4x local bus (LSTRB1, LRW1, LRDY1, LD(31-0)) connected to two banks of 8 x HM6708 SRAM]

4.5 Wait States and Ready Generation

Using wait states can greatly increase a system's flexibility and reduce its hardware requirement. The C4x is capable of generating wait states on either the global bus or the local bus, and both buses have independent sets of ready control logic. The buses' wait-state configuration is determined by the SWW and WTCNT fields of the local and global bus interface control registers. This section discusses ready generation from the perspective of the global bus interfac
222. figure shows, the minimum current occurs when all zeros are written, while the maximum current occurs when alternating 5555 5555h and AAAA AAAAh are written. This condition results in a weighting factor of 1, which corresponds to using the values from Figure 9-5 and/or Figure 9-6 directly.

As with internal bus operations, data dependencies for the external buses are well defined, but accurate prediction of data patterns is often either impossible or impractical. Therefore, unless you have precise knowledge of data patterns, you should use an estimate of a median or average value for the scale factor. Assuming that data will be neither 5s and As nor all 0s, and will be varying randomly, a value of 0.80 is appropriate. Otherwise, if you prefer a conservative approach, you can use a value of 1.0 as an upper bound.

Regardless of the approach taken for scaling, once you determine the scale factor for the buses, apply this factor to the current values you determined with the graphs in section 9.4.1, Local or Global Bus. For example, if a nominal scale factor of 0.80 for the buses is assumed, the current contribution from the two buses is as follows:

    Local or Global: 0.80 x 133 mA = 106.4 mA

9.4.5 Capacitive Loading Dependence

Once cycle timing and data dependencies have been accounted for, capacitive loading effects should be calculated and applied. Figure 9-10 shows the current values
223. fine CTRLREG  0x00c40045  /* DMA sends interrupt to CPU when transfer finishes (TC = 1), DMA-CPU rotating priority */
#define SRC      0x00100071  /* src: commport 0 input fifo */
#define SRC_IDX  0x0         /* src address does not increment */
#define COUNTER  0x08        /* number of words to transfer */
#define DST      0x00100042  /* dst: commport 3 output fifo */
#define DST_IDX  0x0         /* dst address does not increment */
#define DIEVAL   0x4000      /* set ICRDY3 read sync */

DMAUNIF *dma = (DMAUNIF *)DMAADDR;
int dieval = DIEVAL;

main()
{
    dma->src     = (void *)SRC;
    dma->src_idx = SRC_IDX;
    dma->counter = COUNTER;
    dma->dst     = (void *)DST;
    dma->dst_idx = DST_IDX;
    dma->ctrl    = (void *)CTRLREG;
    asm(" ldi @_dieval,die");
    PRIM_WAIT_DMA((volatile int *)dma);
}

Example 7-7. Unified-Mode DMA Using Autoinitialization (Method 1)

#include "dma.h"
#define DMAADDR   0x001000a0
/* 1st transfer settings */
#define CTRLREG1  0x00c00009  /* DMA-CPU rotating priority and DMA autoinitializes when transfer counter = 0 */
#define SRC1      0x002ffc00  /* src address */
#define SRC1_IDX  0x1         /* src address increment */
#define COUNTER1  0x08        /* number of words to transfer */
#define DST1      0x002ffd00  /* dst address r
224. fine Motion Control," Motion Control Magazine, December 1993.
15) Phillips, C., and H. Nagle, Digital Control System Analysis and Design, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1984.

Multimedia

1) Reimer, J., "DSP-Based Multimedia Solutions Lead Way Enhancing Audio Compression Performance," Dr. Dobbs Journal, December 1993.
2) Reimer, J., G. Benbassat, and W. Bonneau Jr., "Application Processors: Making PC Multimedia Happen," Silicon Valley PC Design Conference, July 1991.

Military

1) Papamichalis, P., and J. Reimer, "Implementation of the Data Encryption Standard Using the TMS32010," Digital Signal Processing Applications, 1986.

Telecommunications

1) Ahmed, I., and A. Lovrich, "Adaptive Line Enhancer Using the TMS320C25," Conference Records of Northcon/86, USA, 14/3/1-10, September/October 1986.
2) Casale, S., R. Russo, and G. Bellina, "Optimal Architectural Solution Using DSP Processors for the Implementation of an ADPCM Transcoder," Proceedings of GLOBECOM '89, pages 1267-1273, November 1989.
3) Cole, C., A. Haoui, and P. Winship, "A High-Performance Digital Voice Echo Canceller on a SINGLE TMS32020," Proceedings of ICASSP 86, USA, Catalog Number 86CH2243-4, Volume 1, pages 429-432, April 1986.
4) Cole, C., A. Haoui, and P. Winship, "A High-Performance Digital Voice Echo Canceller on a Single TMS32020," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, USA, 1986.
225. fine SRC2_IDX  0x2
#define COUNTER2  0x02
/* Auxiliary channel autoinitialization 1 */
#define DST1      0x02ffd00
#define DST1_IDX  0x1
#define ACOUNTER1 0x04

DMASPLIT *dma = (DMASPLIT *)DMAADDR;
int dieval = DIEVAL;
DMAPRIM autoini1, autoini2;
DMAAUX autoiniaux;

main()
{
    /* PRIMARY CHANNEL: 1st autoinitialization values */
    autoini1.ctrl    = (void *)CTRLREG1;
    autoini1.src     = (void *)SRC1;
    autoini1.src_idx = SRC1_IDX;
    autoini1.counter = COUNTER1;
    autoini1.linkp   = &autoini2;

    /* PRIMARY CHANNEL: 2nd autoinitialization values */
    autoini2.ctrl    = (void *)CTRLREG2;
    autoini2.src     = (void *)SRC2;
    autoini2.src_idx = SRC2_IDX;
    autoini2.counter = COUNTER2;

    /* AUXILIARY CHANNEL: 1st autoinitialization values */
    autoiniaux.ctrl     = (void *)CTRLREG2;
    autoiniaux.dst      = (void *)DST1;
    autoiniaux.dst_idx  = DST1_IDX;
    autoiniaux.acounter = ACOUNTER1;

    /* initialize DMA */
    dma->linkp    = &autoini1;
    dma->alinkp   = &autoiniaux;
    dma->counter  = 0;
    dma->acounter = 0;
    dma->ctrl     = (void *)CTRLREG1;
    asm(" ldi @_dieval,die");

    /* wait for DMA to finish transfer */
    SPLIT_WAIT_DMA((volatile int *)dma);
}

Example 7-12. Include File for All C Examples (dma.h)

typedef ty
226. g enough time to ensure the stabilization of the system oscillator upon powerup.

Note: Reset does not have internal Schmitt hysteresis. To ensure proper reset operation, avoid slow rise and fall times. Rise/fall time should not exceed one CLKIN cycle.

1.3 Multiprocessing System Reset Considerations

If synchronization of multiple C4x DSPs is required, all processors should be provided with the same input clock and the same reset signal. After powerup, when the clock has stabilized, set RESET high for a few H1/H3 cycles and then set it low to synchronize their H1/H3 clock phases. Following the falling edge, RESET should remain low for at least ten H1 cycles and then be driven high. The circuit in Figure 1-1 can be used for RESET generation.

Pullup resistors are recommended at each end of the connection to avoid unintended triggering after reset when RESET going low is not received on all C4x devices at the same time.

It is recommended that you power up the system with RESET low. This prevents C4x asynchronous signals from driving unknown values before RESET goes low, which could create bus contention in communication port pins, resulting in damage to the device.

1.4 How to Initialize the Processor

After reset, the C4x jumps to the address stored in the reset vector location and starts execut
227. g the number of wait states for devices that already have external ready logic implemented but require additional wait states under certain unique circumstances.

4.5.3 External Ready Generation

The optimum technique for implementing external ready generation hardware depends on the specific characteristics of the system, including the relative number of wait-state and non-wait-state devices in the system and the maximum number of wait states required for any one device. The approaches discussed here are intended to be general enough for most applications and are easily modifiable to accommodate many different system configurations.

In general, ready generation involves the following three functions:

1. Segmentation of the address space to distinguish fast and slow devices
2. Generation of properly timed ready indications
3. Logical ORing of all the separate ready timing signals together to connect to the physical ready input

Segmentation of the address space is required to obtain a unique indication of each particular area within the address space that requires wait states. This segmentation is commonly implemented in the form of chip-select generation. Chip-select signals can initiate wait states in many cases; however, occasionally chip-select decoding considerations may provide signals that do not allow ready input timing requirements to be met. In this case, you can segment coarse address space on the basis of a small number of address l
228. gher current requirement during the write portion, where the external bus is being used significantly.

The processing portion of the algorithm is 95% of the total algorithm. During this portion, the power supply current is required for the internal circuitry only. Data is processed in several loops that make up the majority of the algorithm. During these loops, two operands are transferred on every cycle. The current required for internal bus usage, then, is 60 mA (from Figure 9-3). The data is assumed to be random. A data value scale factor of 0.93 is used (from Figure 9-4). This value scales 60 mA, yielding 55.8 mA for internal bus operations. Adding 55.8 mA to the quiescent current requirement and internal operations current requirement yields a current requirement of 245.8 mA for the major portion of the algorithm:

    Iq + Iiops + Iibus = 130 mA + 60 mA + (60 mA)(0.93) = 245.8 mA

The portion of the algorithm corresponding to writing out data is approximately 5% of the total algorithm. Again, the data that is being written is assumed to be random. From Figure 9-4 and Figure 9-10, scale factors of 0.93 and 0.8 are used for derating due to data value dependency for internal and local buses, respectively. During the data dump portion of the code, a load and a store are performed every cycle; however, the parallel load/store instruction is in an RPTS loop. Therefore, there is no contribution due to internal operations because the inst
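The processing-portion arithmetic above can be written as a small helper, which makes it easy to try other scale factors. This is a sketch only: the function names are hypothetical, and the inputs are the values quoted in the text (130 mA quiescent, 60 mA internal operations, 60 mA internal bus, scale factor 0.93).

```c
/* Scale a nominal current (mA) by a data-dependency weighting factor. */
double derate_ma(double nominal_ma, double scale_factor)
{
    return nominal_ma * scale_factor;
}

/* Sum quiescent, internal-operations, and derated internal-bus current (mA). */
double internal_current_ma(double iq_ma, double iiops_ma,
                           double iibus_ma, double data_scale)
{
    return iq_ma + iiops_ma + derate_ma(iibus_ma, data_scale);
}
```

Calling internal_current_ma(130.0, 60.0, 60.0, 0.93) reproduces the 245.8 mA figure; using a scale factor of 1.0 instead gives the conservative upper bound of 250 mA.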
229. hared Memory

4.1 System Configuration

Figure 4-1 illustrates an expanded configuration of a C4x system with different types of external devices and the interfaces to which they are connected.

Figure 4-1. Possible System Configurations
[Figure: a C4x connected through its local bus, global bus, interrupt interface, communication ports, timer interfaces, external flags, and system control to memory, peripherals, other C4x devices, I/O devices, and a clock/reset generator]

In your design, you can use any subset or superset of the illustrated components.

4.2 External Interfacing

The C4x interfaces connect to a wide variety of device types. Each of these interfaces is tailored to a particular type of device, such as memory, DMA, parallel and serial peripherals, and I/O. In addition, C4x devices can interface directly with each other, without external logic, through their communication ports or their external flag pins (IIOF0-3). Each interface comprises one or more signal lines, which transfer information and control its operation. Figure 4-2 shows the signal groups for these interfaces.

Figure 4-2. External Interfaces
[Figure: C4x signal groups, including data, address, and data-enable lines for the global and local buses]
230. hared bus at any one time; it must also allow all processors sharing the bus to have a chance to access shared resources.

The C4x supports shared-memory multiprocessing with its identical global and local port interfaces. Both interfaces have four status output signals, (L)STAT3-0, which identify what type of access is beginning on the bus. These signals identify whether the C4x port is idle, a DMA read is occurring, a STRB1 write is occurring, a LOCKed access to memory is pending, etc. The signals can be interpreted by the interface to issue single-access or locked-access bus requests to a shared bus arbiter.

The (L)CE, (L)AE, and (L)DE input signals support shared address, control, and data lines. When the signals are disabled (high), they put the port's control signals, address lines, and data lines, respectively, in the high-impedance state. These bus-enable lines are asynchronous inputs to the C4x, which can quickly turn off bus drivers when another processor is accessing a shared resource. However, these signals asynchronously turn off the C4x's local and global buses without memory accesses being suspended. To ensure that data written is seen externally and data read is valid, you should use the external (L)RDY signal for wait-state generation in shared-memory designs. An (L)RDY signal should not be sent to the C4x until the processor has regained access
231. he actual measured power supply current.

9.7 Design Considerations

Designing systems for minimum power dissipation involves reducing device operating current requirements due to signal switching rate, capacitive loading, and other effects. Selective consideration of these effects makes it possible to optimize system performance while minimizing power consumption. This section describes current reduction techniques based on operating current dependencies of the device, as discussed in previous sections of this document.

9.7.1 System Clock and Signal Switching Rates

Since current (and therefore power) requirements of CMOS devices are directly proportional to switching frequency, one potential approach to minimizing operating power is to minimize system clock frequency and signal switching rates. Although performance is often directly proportional to system clock and signal switching rates, tradeoffs can be made in both areas to achieve an optimal balance between power usage and performance in the design of a system.

If reducing power is a primary goal, and a given system design does not have particularly demanding performance requirements, the system clock rate can be reduced with the corresponding savings in power. Minimum power is realized when system clock rates are only as fast as necessary to achieve required system performance. Additionally, if overall system clock rates cannot be reduced, an alternative
232. he input channel is full.

OCRDY (output channel ready):
    0 = the output channel is full and not ready to be written
    1 = the output channel is not full and ready to be written
OCEMPTY (output channel empty):
    0 = the output channel is not empty
    1 = the output channel is empty

Example 8-1 shows the reading of data from the communication port, eight data words at a time, using the CPU ICFULL interrupt. Example 8-2 shows the writing of data to a communication port, one datum at a time, using the polling method. Both examples show DMA reads/writes. DMA is discussed in subsection 7.3, DMA Assembly Programming Examples.

Example 8-1. Read Data from Communication Port With CPU ICFULL Interrupt

* TITLE READ DATA FROM COMMUNICATION PORT WITH CPU ICFULL INTERRUPT
*
* THIS EXAMPLE ASSUMES THE ICFULL 0 INTERRUPT VECTOR IS SET IN THE CPU
* INTERRUPT VECTOR TABLE. THE EIGHT DATA WORDS ARE READ IN
* WHENEVER THE DATA IS FULL IN COMM PORT 0 INPUT FIFO.
*
        LDA    @COMM_PORT0_CTL,AR2     ; Load comm port 0 control reg address
        LDA    @COMM_PORT0_INPUT,AR0   ; Load comm port 0 input FIFO address
        LDA    @INTERNAL_RAM,AR1       ; Load internal RAM address
        AND3   0F7H,*AR2,R9            ; Unhalt comm port 0 input channel
        STI    R9,*AR2
        OR     04H,IIE                 ; Enable ICRDY 0 interrupt
        OR     02000H,ST               ; Enab
233. heral bus: A bus that the CPU uses to communicate with the DMA coprocessor, communication ports, and timers.

pipeline: A method of executing instructions in an assembly-line fashion.

program counter: A register that contains the address of the next instruction to be fetched.

RC: See repeat counter register.

read/write (R/W) pin: This memory control signal indicates the direction of transfer when communicating to an external device.

register file: A bank of registers.

repeat counter register: A register (in the CPU register file) that specifies the number of times minus one that a block of code is to be repeated when a block repeat is performed.

repeat mode: A zero-overhead method for repeating the execution of a block of code.

reset: A means to bring the central processing unit (CPU) to a known state by setting the registers and control bits to predetermined values and signaling execution to fetch the reset vector.

reset pin: This pin causes the device to reset.

ROMEN: ROM enable. An external pin that determines whether or not the on-chip ROM is enabled.

R/W: See read/write pin.

short floating-point format: A 16-bit representation of a floating-point number with a 12-bit mantissa and a 4-bit exponent.

short integer format: A twos-complement 16-bit format for integer data.

short unsigned integer format: A 16-bit unsigned format for integer data.

sign extend: Fill the high-order bits of a number with the sign bit.
234. application, the TCK signal is left unconnected.
Figure 11-4. Target-System-Generated Test Clock (Greater Than 6 Inches)
[Figure: JTAG device and emulator header with signals EMU0, EMU1, TRST, TMS, TDI, TDO, TCK, TCK_RET, PD (VCC), and GND; the system test clock is shown as a separate source.]
Note: When the TMS/TDI lines are buffered, pullup resistors should be used to hold the buffer inputs at a known level when the emulator cable is not connected.
There are two benefits to having the target system generate the test clock:
- The emulator provides only a single 10.368-MHz test clock. If you allow the target system to generate your test clock, you can set the frequency to match your system requirements.
- In some cases, you may have other devices in your system that require a test clock when the emulator is not connected. The system test clock also serves this purpose.
11.7.3 Configuring Multiple Processors
Figure 11-5 shows a typical daisy-chained multiprocessor configuration, which meets the minimum requirements of the IEEE 1149.1 specification. The emulation signals in this example are buffered to isolate the processors from the emulator and to provide adequate signal drive for the target system. One of the benefits of this type of interface is that you can generally slow down the test clock to eliminate timing problems. You should follow these guidelines for multipr
235. idle H
0074  c  L L H L L H  ->  wait_twoa  H
0075  c  X X X X X L  ->  idle       H
0076  c  L L H L L H  ->  wait_twoa  H
0077  c  L L H L L H  ->  wait_twob  L
0078  c  X X X X X L  ->  idle       H
0079  c  H L L L L H  ->  idle       L
0080  c  H L L L L H  ->  idle       L
0081  c  L L L L L H  ->  idle       H
0082  c  L H L L L H  ->  wait_one   L
0083  c  X X X X X H  ->  idle       H
0084  c  L L H L L H  ->  wait_twoa  H
0085  c  L L H L L H  ->  wait_twob  L
0086  c  H L L L L H  ->  idle       L
0087  c  X X X H H H  ->  idle       H
0088  c  X X X H H H  ->  idle       H
0089  end ready_generation
4.5.6 Page Switching Techniques
The C4x's programmable page-switching feature can greatly ease system design when large amounts of memory or slow external peripheral devices are required. This feature provides a time period for disabling all device selects. During this interval, slow devices are allowed time to turn off before other devices have the opportunity to drive the data bus, thus avoiding bus contention.
When page switching is enabled, any time a portion of the high-order address lines changes, as defined by the contents of the STRB0 and STRB1 PAGESIZE fields in the global and local memory interface control registers, the corresponding STRB and PAGE go high for one full H1 cycle.
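The page-switching condition described above can be sketched in C. This is an illustration only: `page_switch_needed` and its `page_bits` parameter (the number of low-order address bits that fall inside one page) are hypothetical names, not register fields, and the sketch models only the comparison of high-order address bits, not the STRB or PAGE timing.

```c
#include <stdint.h>

/* Returns 1 when two consecutive accesses fall in different pages,
   that is, when any address bit above the page offset changes.
   page_bits is a hypothetical parameter (valid range 0..31); it is not
   the encoded value of the PAGESIZE field. */
int page_switch_needed(uint32_t prev_addr, uint32_t next_addr, unsigned page_bits)
{
    return (prev_addr >> page_bits) != (next_addr >> page_bits);
}
```

With a page offset of 16 bits, for instance, moving from 0x0000FFFF to 0x00010000 crosses a page boundary, while moving from 0x00010000 to 0x0001FFFF does not.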
236. ies should not exceed 1.0 H1/H3 at the receiving end. A CSTRB low beyond the synchronization period on a word boundary can be recognized as a new valid CSTRB, resulting in an extra byte reception (byte slippage). For a short distance between two communicating C4x devices, byte slippage is not a problem. In C40 device revisions 3.0 or higher, or in any revision of the C44, no CSTRB width restriction exists.
The circuit shown in Figure 8-7 can reduce the width of CSTRB for very long distances when you are using C4x device revisions lower than 3.0. The circuit has buffers for CSTRB and CRDY on the transmitting end and two S-R flip-flops on the receiving end. On the receiving end, a low incoming STRB signal causes the Q signal of S-R flip-flop U1 to go low, forcing the CSTRB pin to go low. When CRDY responds with a low signal, S-R flip-flop U2 drives the RDY signal low. Because RDY is also tied to the S input of U1, and S has precedence over R in an S-R flip-flop, Q in U1 goes high. Also, STRB is inverted and drives the S input of U2. In this way, the width of the local CSTRB is shortened regardless of the channel length. When the STRB signal goes back high, the S-R flip-flop pair is ready to receive another CSTRB.
Figure 8-7. CSTRB Shortener Circuit
[Figure: C4x transmitter (CSTRB, CRDY) connected to C4x receiver (CSTRB, CRDY); the same circuit is used on both sides.]
237. signals with high switching rates. Also, signals should not run long distances across PC boards to edge connectors unless absolutely necessary.
Note that buffering device outputs that must drive high-capacitance loads reduces supply current for the TMS320C40, but this current is transferred to the buffering device. Whether or not this is a valid tradeoff must be determined at the system level. The two main considerations are (1) whether the power required by the buffers is more or less than the power required from the C40 to drive the load in question, and (2) whether or not off-loading the power to the buffers has any implications with respect to system power-down modes. It may be desirable to use buffers to drive high-capacitance loads even though they may require more current than the TMS320C40, especially in cases where part of the system may be powered down but the TMS320C40 is still required to interface to other low-capacitance loads.
9.7.3 DC Component of Signal Loading
To achieve the lowest device current requirements, the internal and external DC load component of device input and output signal loading must also be minimized. Any device inputs that are unused and left floating may cause excessively high DC current to be drawn by their input buffer circuitry. This occurs because, if an input is left unconnected, the voltage on the input may float to a level that causes the input buffer to be biased at a point within its
239. time after receiving CRDY high from the last byte. There is no reason to wait for the internal C4x synchronizer between CRDY low and CSTRB low for the next word to finish.
CASE II: The C4x has the token and transmits data. The non-C4x device receives data.
1. After receiving CSTRB low from the C4x, indicating that new data is valid, the non-C4x device can immediately read the data byte and then drive CRDY low, indicating that the byte has been read. There is no maximum time limit between these two events.
2. The non-C4x device then waits to receive CSTRB high and can immediately drive CRDY high, ending the byte-transfer operation.
8.4 Terminating Unused Communication Ports
To avoid unintended communication port triggering, you can terminate unused communication port control lines in one of the following ways:
- Use pullup resistors in all the communication port control lines. Pullups in data lines of input communication ports are optional, but they lower power consumption. Pullups in data lines of output communication ports are not required; if used, they increase power consumption.
- Tie the control lines together on the same communication port, that is, CSTRB to CRDY and CREQ to CACK. This holds the control inputs high without using external pullup resistors.
8.5 Design Tips
Be careful with different volt
240. lines where simpler gating allows signals to be generated more quickly. In either case, the signal that indicates that a particular area of memory is being addressed also normally initiates the ready or wait-state signal.
When the address space to be accessed has been established, a timing circuit is normally used to provide a ready indication to the processor at the appropriate point in the cycle to satisfy each device's unique requirements.
Finally, since indications of ready status from multiple devices are typically present, you should logically OR the signals by using a single gate to drive the RDY input.
4.5.4 Ready Control Logic
You can take one of two basic approaches to implementing ready control logic, depending on the state of the ready input between accesses. If RDY is low between accesses, the processor is always ready unless a wait state is required; if RDY is high between accesses, the processor will always enter a wait state unless a ready indication is generated.
If RDY is low between accesses, control of devices that are zero-wait-state at full speed is straightforward; no action is necessary because ready is always active unless otherwise required. Devices requiring wait states, however, must drive ready high fast enough to meet the input timing requirements. Then, after an appropriate delay, a ready indication must be generated. This can be difficult in many cir
242. ints to a no-ready external memory
        .globl  _SINE                   ; Address of sine/cosine table
        .globl  _cr4dif                 ; Entry point for execution
        .globl  STARTB,ENDB             ; starting/ending point for benchmarks
        .sect   "fftdat"
FFTSIZ  .space  1
SINTAB  .word   _SINE
SINTAB1 .word   _SINE-1
INPUTP  .space  1
OUTPUTP .space  1
        .sect   PEEUXL
_cr4dif LDI     SP,AR0
        PUSH    DP
        PUSH    R4                      ; Save dedicated registers
        PUSH    R5
        PUSH    R6                      ; lower 32 bits
        PUSHF   R6                      ; upper 32 bits
        PUSH    R7                      ; lower 32 bits
        PUSHF   R7                      ; upper 32 bits
        PUSH    AR3
        PUSH    AR4
        PUSH    AR5
        PUSH    AR6
        PUSH    AR7
        PUSH    R8
        .if     .REGPARM == 0
        LDI     *-AR0(1),AR2            ; points to input data
        LDI     *-AR0(2),R10            ; R10 = N
        LDI     *-AR0(3),R9             ; R9 holds the remain stage number
        LDI     *-AR0(4),RC             ; points to where FFT result should move to
        .else
        LDI     R2,R10
        LDI     R3,R9
        .endif
        LDP     FFTSIZ                  ; Command to load data page pointer
        STI     AR2,@INPUTP
        STI     RC,@OUTPUTP
        STI     R10,@FFTSIZ
Example 6-14. Complex Radix-4 DIF FFT (Continued)
STARTB  LDI     @FFTSIZ,BK
        LSH3    1,BK,IR0                ; IR0 = 2*N1 (because of real/imag)
        LSH3    -2,BK,IR1               ; IR1 = N/4, pointer for SIN/COS table
        LDI     1,AR7                   ; Initialize IE index
        LDI     1,R8                    ; Initialize repeat co
243. ion from that point. The RESET vector normally contains the address of the system initialization routine. The initialization routine should typically perform several tasks:
- Set the DP register
- Set the stack pointer
- Set the interrupt vector table
- Set the trap vector table
- Set the memory control register
- Clear/enable cache
Note: When running under microcomputer mode (ROMEN = 1), the address stored in the reset vector location points to the beginning of the bootloader code. The on-chip bootloader automatically initializes the memory control register values from the bootloader table.
The following examples illustrate how to initialize the C4x when using assembly language and when using C.
Processor initialization under assembly language
If you are running under an assembly-only environment, Example 1-1 provides a basic initialization routine. This example shows code for initializing the C4x to the following machine state:
- Timer 0 interrupt is enabled.
- Trap 0 is initialized.
- The program cache is enabled.
- The DP is initialized to point to the .text section.
- The stack pointer is initialized to the beginning of the mystack section.
- The memory control registers are initialized.
- The C4x is initialized to run in microprocessor mode with the reset vector located at address 08000 0000h (RESETLOC(1,0) = 1,0).
- The program has already been loaded into memory at address 0x4000 0000.
245. itialization mode and bit-reversed read are used. For more detailed information about DMA operation, refer to The DMA Coprocessor in the TMS320C4x User's Guide.
Figure 3-1. DMA Bit-Reversed Addressing
[Figure: DMA register contents (control register 00C0 1009h, src address AR0, src index IR0, counter 512, dst address AR1, dst index, link pointer) and the memory block at label (00C0 1005h, AR0, 1, IR0, 512, AR1, 1).]
3.5 Integer and Floating-Point Division
You can use the single-cycle instruction RCPF to generate an estimate of the reciprocal of a floating-point number. This estimate has the correct exponent, and the mantissa is accurate to the eighth binary place (the error of the mantissa is < 2^-9). Often, this is a satisfactory estimate of the reciprocal of a floating-point number. In other cases, this estimate can be used as a seed for an algorithm that computes the reciprocal to even greater accuracy. The Newton-Raphson algorithm described later is one such case.
Although it provides no special instruction for integer division, the instruction set can perform an efficient division routine. Additionally, the FLOAT, RCPF, and FIX instructions can produce a rough estimate.
3.5.1 Integer Division
You can implement division on the C4x by repeating SUBC, a special conditional subtract instruction. Consider the
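The seed-and-refine idea for floating-point reciprocals can be sketched in C. This is a minimal illustration under stated assumptions, not the manual's routine: `rcpf_estimate` is a hypothetical stand-in for RCPF that truncates an IEEE-754 single-precision reciprocal to 8 mantissa bits (the C4x uses its own floating-point format), and the refinement is the standard Newton-Raphson reciprocal update x <- x(2 - vx), which roughly doubles the number of accurate bits per step.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RCPF: correct exponent, about 8 good mantissa bits.
   Assumes float is IEEE-754 single precision (23-bit mantissa field). */
static float rcpf_estimate(float v)
{
    float r = 1.0f / v;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;            /* clear the low 15 mantissa bits, keep 8 */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Newton-Raphson refinement of the seed: x[n+1] = x[n] * (2 - v * x[n]). */
float reciprocal(float v, int iterations)
{
    float x = rcpf_estimate(v);
    for (int i = 0; i < iterations; ++i)
        x = x * (2.0f - v * x);
    return x;
}
```

Starting from an 8-bit seed, two iterations are enough in this sketch to reach single-precision accuracy.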
246. takes four cycles of overhead to save and restore these registers. Hence, it may sometimes be more economical to implement a nested loop by the more traditional method of using a register as a counter and then using a delayed branch, rather than by using the nested repeat-block approach. Often, implementing the outer loop as a counter and the inner loop as an RPTB/RPTBD instruction produces the fastest execution.
Example 2-7 shows the use of the block repeat to find the maximum or the minimum value of 147 numbers. The elements of the array are either all positive or all negative numbers. Because the loop cannot be predetermined, the RPTBD instruction is not suitable here.
Example 2-7. Use of Block Repeat to Find a Maximum or a Minimum
* TITLE: USE OF BLOCK REPEAT TO FIND A MAXIMUM OR A MINIMUM
* THIS ROUTINE FINDS MAXIMUM OR MINIMUM OF N = 147 NUMBERS
        LDI     146,RC          ; Initialize repeat counter to 147 - 1
        LDI     @ADDR,AR0       ; AR0 points to beginning of array
        LDF     *AR0++(1),R0    ; Initialize MAX or MIN to first value
        BLT     LOOP2           ; If negative array, find minimum
LOOP1   RPTB    MAX
        CMPF    *AR0,R0         ; Compare number to the maximum
MAX     LDFLT   *AR0++,R0       ; If greater, this is a new maximum
        B       NEXT
LOOP2   RPTB    MIN
        CMPF    *AR0++(1),R0    ; Compare number to the minimum
MIN     LDFLT   *-AR0(1),R0     ; If smaller, this is new minimum
NEXT
2.5.2 Delayed Block Repeat
Example 2-8 shows an application of the delayed block repeat construc
247. le CPU global interrupt
ICFULL0 PUSH    ST
        PUSH    RS
        PUSH    RE
        PUSH    RC
        LDI     *AR0,R10                ; Read data from comm port 0 input
        RPTS    6                       ; Set up for loop READ
READ    LDI     *AR0,R10                ; Read data from comm port 0 input
||      STI     R10,*AR1++(1)           ; Store data into internal RAM
        STI     R10,*AR1++(1)           ; Store data into internal RAM
        POP     RC
        POP     RE
        POP     RS
        POP     ST
        RETI
Example 8-2. Write Data to Communication Port With Polling Method
* TITLE: WRITE DATA TO COMMUNICATION PORT WITH POLLING METHOD
* THE BIT 8 OF COMMUNICATION PORT 0 CONTROL REGISTER WILL BE SET ONLY
* WHEN THE OUTPUT FIFO IS FULL. THIS EXAMPLE CHECKS THIS BIT TO MAKE
* SURE THERE IS SPACE AVAILABLE IN OUTPUT FIFO.
        LDA     @COMM_PORT0_CTL,AR2     ; Load comm port 0 control reg address
        LDA     @COMM_PORT0_OUTPUT,AR0  ; Load comm port 0 output FIFO address
        LDA     @INTERNAL_RAM,AR1       ; Load internal RAM address
        AND3    0EFH,*AR2,R9            ; Unhalt comm port 0 output channel
        STI     R9,*AR2
        LDI     0100H,R9                ; Load mask for bit 8
WAIT    TSTB    *AR2,R9                 ; Check if output FIFO is full
        BZD     WAIT                    ; If yes, check again
WRITE_COMM
        LDI     *AR1++(1),R10           ; Read data from internal RAM
        STI     R10,*AR0                ; Store data into comm port 0 output
        NOP
8.2 Signal Considerations
Because of the bidirec
248. le ICRDY 0 read sync
        .end
The DMA autoinitialization and transfer continue executing if the DMA autoinitialization is still enabled. Therefore, a DMA setup like the one in Example 7-4 can make it possible for an external device to control the DMA operation through the communication port.
With the autoinitialization feature, the C4x DMA coprocessor can support a variety of DMA operations without slowing down CPU computation. A good example is a DMA transfer triggered by one interrupt signal. Usually, this is implemented by starting a DMA activity with a CPU interrupt service routine, but this uses CPU time. However, as shown in Example 7-5, you can set up a single-interrupt-driven dummy DMA transfer with autoinitialization. When the interrupt signal is set, the DMA completes the dummy DMA transfer and starts the autoinitialization for the desired DMA transfer.
Example 7-5. Single-Interrupt-Driven DMA Transfer
* TITLE: SINGLE-INTERRUPT-DRIVEN DMA TRANSFER
* THIS EXAMPLE SETS UP A DUMMY DMA TRANSFER FROM INTERNAL RAM TO THE SAME
* MEMORY WITH EXTERNAL INT 0 SYNCHRONIZATION AND AUTOINITIALIZATION FOR
* TRANSFERRING 64 DATA FROM LOCAL MEMORY TO INTERNA
249. format. This operation occurs when data is received at the digital signal processor. After processing, and in order to continue transmission, the result is compressed back to 8-bit format and transmitted through the channel.
Example 6-1 and Example 6-2 show μ-law compression and expansion (that is, linear-to-μ-law and μ-law-to-linear conversion), while Example 6-3 and Example 6-4 show A-law compression and expansion. For expansion, using a look-up table is an alternative approach. It trades memory space for speed of execution. Because the compressed data is 8 bits long, a table with 256 entries can be constructed to contain the expanded data. If the compressed data is stored in register AR0, the following two instructions put the expanded data in register R0:
        ADDI    @TABL,AR0       ; @TABL = BASE ADDRESS OF TABLE
        LDI     *AR0,R0         ; PUT EXPANDED NUMBER IN R0
The same look-up table approach could be used for compression, but the required table length would then be 16,384 words for μ-law or 8,192 words for A-law. If this memory size is not acceptable, you should use the subroutines presented in Example 6-1 or Example 6-3.
Example 6-1. μ-Law Compression
* TITLE: U-LAW COMPRESSION
* SUBROUTINE MUCMPR
* TYPICAL CALLING SEQUENCE:
*       LAJU    MUCMPR
*       LDI     v,R0
*       NOP                     ; <- can be other non-pipeline-break
*       NOP                     ; <- instructions
* ARGUMENT ASSIGNMENTS:
* ARGUMENT | FUNCTION
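The 256-entry look-up table for expansion can be sketched in C. This is an illustration under stated assumptions rather than the manual's routine: it uses the common G.711 μ-law convention (inverted code bits, bias of 0x84, 16-bit linear output), which may differ in scaling from Example 6-2, and the names `mulaw_expand`, `build_expand_table`, and `expand_lookup` are hypothetical.

```c
#include <stdint.h>

static int16_t expand_table[256];

/* Direct mu-law to linear conversion (G.711 convention, assumed here). */
static int16_t mulaw_expand(uint8_t code)
{
    uint8_t u = (uint8_t)~code;            /* code bits are stored inverted */
    int exponent = (u >> 4) & 0x07;        /* segment number */
    int mantissa = u & 0x0F;               /* step within the segment */
    int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)((u & 0x80) ? -magnitude : magnitude);
}

/* Fill the table once; afterwards each expansion is a single indexed load. */
void build_expand_table(void)
{
    for (int i = 0; i < 256; ++i)
        expand_table[i] = mulaw_expand((uint8_t)i);
}

/* Equivalent of adding the table base address to the code and loading. */
int16_t expand_lookup(uint8_t code)
{
    return expand_table[code];
}
```

The table costs 256 words of memory, which is the space-for-speed trade described in the text.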
250. [Figure: data memory organization for the FIR filter]
                impulse response   initial input samples        final input samples
low address     h(N-1)             oldest input x(n-(N-1))      x(n)
                h(N-2)             x(n-(N-2))                   x(n-(N-1))
                ...                ... (circular queue)         ...
                h(1)               x(n-1)                       x(n-2)
high address    h(0)               newest input x(n)            x(n-1)
To set up circular addressing, initialize the block-size register BK to block length N. Also, the locations for signal x should start from a memory location whose address is a multiple of the smallest power of 2 that is greater than N. For instance, if N = 24, the first address for x should be a multiple of 32 (the lower 5 bits of the beginning address should be zero). For details, see Circular Addressing in the TMS320C4x User's Guide.
In Example 6-5, the pointer to the input sequence x is incremented and is assumed to be moving from an older input to a newer input. At the end of the subroutine, AR1 will point to the position for the next input sample.
Example 6-5. FIR Filter
* TITLE: FIR FILTER
* SUBROUTINE FIR
* EQUATION: y(n) = h(0)*x(n) + h(1)*x(n-1) + ... + h(N-1)*x(n-(N-1))
* TYPICAL CALLING SEQUENCE:
*       LOAD    AR0
*       LAJU    FIR
*       LOAD    AR1
*       LOAD    RC
*       LOAD    BK
* ARGUMENT ASSIGNMENTS:
* ARGUMENT | FUNCTION
* AR0      | ADDRESS OF h(N-1)
* AR1      | ADDRESS OF x(N-1)
* R
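The alignment rule for the circular buffer can be checked with a few lines of C. This is a minimal sketch; the function names are hypothetical, and it models only the address arithmetic stated in the text (a base address that is a multiple of the smallest power of 2 greater than N), not the hardware addressing itself.

```c
#include <stdint.h>

/* Smallest power of two strictly greater than n (n = 24 gives 32). */
uint32_t circ_alignment(uint32_t n)
{
    uint32_t p = 1;
    while (p <= n)
        p <<= 1;
    return p;
}

/* Returns 1 if base is a legal start address for a circular buffer of
   length n, i.e. its low-order bits below the alignment are all zero. */
int circ_base_ok(uint32_t base, uint32_t n)
{
    return (base & (circ_alignment(n) - 1u)) == 0u;
}

/* Advance an index inside the buffer, wrapping from n-1 back to 0. */
uint32_t circ_next(uint32_t index, uint32_t n)
{
    return (index + 1u == n) ? 0u : index + 1u;
}
```

For N = 24 the check requires the lower 5 address bits to be zero, matching the example in the text.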
251. Example 6-17. Real Forward Radix-2 FFT (Continued)
        LDA     R7,IR1
        LDI     R7,RC
        ADDI    1,R5
        CMPI    @LOG_SIZE,R5
        BLED    LOOP
        LDA     @DEST_ADDR,AR1
        LSH     1,IR0
        LSH     1,R7
* Return to C environment
ENDB    POP     DP                      ; Restore C environment variables
        POP     AR7
        POP     AR6
        POP     AR5
        POP     AR4
        POPF    R7
        POP     R7
        POPF    R6
        POP     R6
        POP     R5
        POP     R4
        POP     FP
        RETS
        .end                            ; No more
Example 6-18. Real Inverse Radix-2 FFT
* FILENAME    : IFFT_RL.ASM
* DESCRIPTION : INVERSE FFT FOR TMS320C40
* DATE        : 1/19/93
* VERSION     : 2.0
*
* VERSION  DATE     COMMENTS
* 1.0      2/18/92  DANIEL MAZZOCCO (TI Houston): Original release (C30 version).
*                   Started from forward real FFT routine written by Alex
*                   Tessarolo, rev 2.0.
* 2.0      1/19/93  ROSEMARIE PIEDRA (TI Houston): C40 porting started from
*                   C30 inverse real FFT version 1.0.
252. on the connector is 0.100 inches in both the X and Y planes. The cable pod box is nonconductive plastic with four recessed metal screws.
Figure 11-6. Pod Connector Dimensions
[Figure: emulator cable pod, short jacketed cable, and connector; dimensions shown are 2.70, 4.50, 9.50, and 0.90; refer to Figure 11-7 for the connector.]
Note: All dimensions are in inches and are nominal dimensions, unless otherwise specified.
Figure 11-7. 14-Pin Connector Dimensions
[Figure: cable connector side view and front view; 0.100 pin spacing; key at pin 6; pins 1, 3, 5, 7, 9, 11, 13 in one row and pins 2, 4, 6, 8, 10, 12, 14 in the other.]
Note: All dimensions are in inches and are nominal dimensions, unless otherwise specified.
11.9 Emulation Design Considerations
This section describes the scan path linker (SPL), which can simultaneously add all four secondary JTAG scan paths to the main scan path. It also describes how to use the emulation pins and configure multiple processors.
11.9.1 Using Scan Path Linkers
You can use the TI ACT8997 scan path linker (SPL) to divide the JTAG emulation scan path into smaller, logically connected groups of 4 to 16 devices. As described in the Advanced Logic and Bus Interface Logic Data Book (literature number SCYD001), the SPL is compatible with
253. on each device on the main JTAG scan path.
- TDO on the TBC connects to TDI on the first device on the main JTAG scan path.
- TDI0 on the TBC is connected to the TDO signal of the last device on the main JTAG scan path.
- Within the main JTAG scan path, the TDI signal of a device is connected to the TDO signal of the device before it.
- TRST for the devices can be generated either by inverting the TBC's TMS5/EVNT3 signal for software control or by logic on the board itself.
Appendix A. Glossary
A0-A30: External address pins for data/program memory or I/O devices. These pins are on the global bus. See also LA0-LA30.
address: The location of program code or data stored in memory.
addressing mode: The method by which an instruction interprets its operands to acquire the data it needs.
ALU: See arithmetic logic unit.
analog-to-digital (A/D) converter: A successive-approximation converter with internal sample-and-hold circuitry used to translate an analog signal to a digital signal.
ARAU: See auxiliary register arithmetic unit.
arithmetic logic unit (ALU): The part of the CPU that performs arithmetic and logic operations.
auxiliary registers (ARn): A set of registers used primarily in address generation.
auxiliary register arithmetic unit (ARAU): A 16-bit arithmetic logic unit (ALU) used to calculate indirect addresses using the auxiliary registers as inputs and outputs.
bit-reversed addressing
254. then it transfers 4 words from 0x02ffe00 (index 4) to 0x02fff00 (index 1). No DMA sync transfer is used. Autoinitialization method 1 requires N autoinitialization memory blocks to transfer N blocks and starts with a DMA transfer counter equal to 0.
Example 7-8. Unified-Mode DMA Using Autoinitialization Method 2
/* EXAMPLE: Unified mode (autoinitialization method 2).
   DMA0 in unified mode transfers 8 words from 0x02ffc00 (index 1) to
   0x02ffd00 (index 1), and then it transfers 4 words from 0x02ffe00
   (index 4) to 0x02fff00 (index 1). No DMA sync transfer is used.
   Autoinitialization method 2 requires N-1 autoinitialization memory
   blocks to transfer N blocks and starts with a DMA transfer counter
   different from 0. */
#include "dma.h"
#define DMAADDR   0x001000a0
/* 1st transfer settings */
#define CTRLREG1  0x00c00009   /* DMA-CPU rotating priority, and DMA autoinitializes when transfer counter = 0 */
#define SRC1      0x002ffc00   /* src address */
#define SRC1_IDX  0x1          /* src address increment */
#define COUNTER1  0x08         /* number of words to transfer */
#define DST1      0x002ffd00   /* dst address */
255. [Figure: data-array index map for the X(I3) and X(I4) terms and the COS table; layout not recoverable from the extracted text.]
Example 6-18. Real Inverse Radix-2 FFT (Continued)
STARTB  LDA     1,IR0                   ; step between two consecutive sines
        LDI     4,R5                    ; stage number from 4 to M
        LDI     @FFT_SIZE,R7
        LSH     -2,R7                   ; R7 is FFT_SIZE/4-1 (i.e., 15 for 64 pts)
        SUBI    1,R7                    ; and will be used to point at A & D
        LDI     @FFT_SIZE,R6            ; R6 will be used to point at D
        LSH     1,R6
        LDA     @SOURCE_ADDR,AR5
        LDA     @SOURCE_ADDR,AR1
LOOP    LSH     -1,R6                   ; R6 is FFT_SIZE at the 1st loop
        LDA     AR1,AR4
        ADDI    R7,AR1                  ; AR1 points at A
        LDA     AR1,AR2
        ADDI    2,AR2                   ; AR2 points at B
        ADDI    R6,AR4
        SUBI    R7,AR4                  ; AR4 points at D
        LDA     AR4,AR3
        SUBI    2,AR3                   ; AR3 points at C
        LDA     R7,IR1
        LDI     R7,RC
INLOP   ADDF3   AR1 IR1, AR3 IR1, R0    ; R0 = X(I1) + X(I3)
        SUBF3   AR3, AR1, R1            ; R1 = X(I1) - X(I3)
        LDF     AR4, R2
||      STF     R0, AR1                 ; X(I1) <-
        MPYF    2.0,R2                  ; R2 = 2*X(I4)
        LDF     AR2, R3
||      STF     R1, AR3                 ; X(I3) <-
        MPYF    2.0,R3                  ; R3 = 2*X(I2)
        STF     R3, AR2 IR1             ; X(I2) <-
||      STF     R2, AR4 IR1             ; X(I4) <-
        LDA     @FFT_SIZE,IR1           ; IR1 = SEPARATION BETWEEN SIN/COS TBLS
        LDA     SINE_TABL
256. ne. In microcomputer mode, the reset vector is initialized automatically by the processor to point to the beginning of the on-chip boot loader code. No user action is required. In microprocessor mode, the reset vector is typically stored in an EPROM. Example 1-1 shows how you can initialize that vector.
Apply a low level to the RESET input. See section 1.2.
Table 1-1. RESET Vector Locations in the C40 and C44
RESETLOC1  RESETLOC0  Reset vector address (hex)  Bus
0          0          00000 0000                  Local
0          1          07FFF FFFF†                 Local
1          0          08000 0000                  Global
1          1          0FFFF FFFF†                 Global
† This corresponds to the 32-bit address that the processor accesses. However, in the C44, only the 24 LSBs of the reset address are driven on pins A0-A23 and pins LA0-LA23. The corresponding LSTRBx pins are also activated.
1.2 Reset Signal Generation
Several aspects of C4x system hardware design are critical to overall system operation. One such aspect is reset signal generation.
The reset input controls initialization of internal C4x logic and execution of the system initialization software. For proper system initialization, the RESET signal must be applied for at least ten H1 cycles, that is, 400 ns for a C4x operating at 50 MHz. Upon power-up, however, it can take 20 ms or more before the system oscillator reaches a stable operating state. Therefore, the power-up reset circuit should generate a lo
257. environment variables
        POP     AR7
        POP     AR6
        POP     AR5
        POP     AR4
        POPF    R7
        POP     R7
        POPF    R6
        POP     R6
        POP     R5
        POP     R4
        POP     FP
        RETS
        .end                            ; No more
6.6 C4x Benchmarks
Table 6-1 provides benchmarks for common DSP operations. Table 6-2 summarizes the FFT execution time required for FFT lengths between 64 and 1024 points for the four algorithms in Example 6-12, Example 6-14, Example 6-17, Example 6-18, and Example 6-15.
The benchmarks are given in cycles (the H1 internal processor cycle). To get the benchmark time, multiply the number of cycles by the processor's internal clock period. For example, for a 50-MHz C4x, multiply by 40 ns.
Table 6-1. C4x Application Benchmarks
Application: Words, Cycles
Inverse of a float (32-bit mantissa accuracy): 7 7
Double-precision integer multiply: 2 2
Square root (32-bit mantissa accuracy): 11 11
Vector dot product†: 6 N 4
Matrix times a vector: 10 1 R C 7
FIR filter: 6 3 N
IIR filter (one biquad): 7 7
IIR filter (N > 1 biquads): 15 2 6N
LMS lattice filter: 11 1 5P
Inverse LPC lattice filter: 9 3 3P
Mu-law/A-law compression: 15 16 14 16 10
Mu-law/A-law expansion: 11 15 11 10 15 13
† Based on a modification of the matrix-times-a-vector benchmark.
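The cycle-to-time conversion is simple arithmetic; a minimal C sketch follows. The function names are hypothetical, and the relation used for the H1 period (two input-clock periods per H1 cycle) is an assumption inferred from the 50-MHz, 40-ns figure quoted in the text.

```c
/* H1 cycle period in nanoseconds for a device clock given in MHz.
   Assumption: one H1 cycle spans two input-clock periods (50 MHz -> 40 ns). */
double h1_period_ns(double clock_mhz)
{
    return 2000.0 / clock_mhz;
}

/* Execution time in nanoseconds for a benchmark expressed in H1 cycles. */
double benchmark_time_ns(unsigned long cycles, double clock_mhz)
{
    return (double)cycles * h1_period_ns(clock_mhz);
}
```

For example, the 7-cycle floating-point inverse listed in Table 6-1 would take `benchmark_time_ns(7, 50.0)`, or 280 ns, on a 50-MHz device.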
258. 6.5.4 Real Radix-2 FFT
Most often, the data to be transformed is a sequence of real numbers. In this case, the FFT demonstrates certain symmetries that permit the computational load to be reduced even further. Example 6-17 and Example 6-18 show the generic implementation of a real-valued radix-2 FFT (forward and inverse). For such an FFT, the total storage required for a length-N transform is only N locations; in a complex FFT, 2N are necessary. Recovery of the rest of the points is based on the symmetry conditions. A companion table (Example 6-13) should be used to provide the twiddle factors.
Example 6-17. Real Forward Radix-2 FFT
* FILENAME    : FFFT_RL.ASM
* DESCRIPTION : REAL, RADIX-2 DIF FFT FOR TMS320C40
* DATE        : 1/19/93
* VERSION     : 3.0
*
* VERSION  DATE     COMMENTS
* 1.0      7/18/91  ALEX TESSAROLO (TI Australia): Original release (C30 version).
* 2.0      7/23/92  ALEX TESSAROLO (TI Australia): Most stages modified (C30 version).
*                   Minimum FFT size increased from 32 to 64.
*                   Faster in-place bit-reversing algorithm.
*                   Program size increased by about 100 words.
*                   One extra data word required.
* 3.0      1/19/93  ROSEMARI
259. nstruments

The following books describe the TMS320 floating-point devices and related support tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) 477-8924. When ordering, please identify the book by its title and literature number.

TMS320C4x User's Guide (literature number SPRU063) describes the C4x 32-bit floating-point processor, developed for digital signal processing as well as parallel processing applications. Covered are its architecture, internal register structure, instruction set, pipeline, specifications, and operation of its six DMA channels and six communication ports.

TMS320C4x Parallel Processing Development System Technical Reference (literature number SPRU075) describes the TMS320C4x parallel processing system, a system with four C4xs with shared and distributed memory.

Parallel Processing with the TMS320C4x (literature number SPRA031) describes parallel processing and how the C4x can be used in parallel processing. Also provides sample parallel processing applications.

TMS320C3x/C4x Assembly Language Tools User's Guide (literature number SPRU035) describes the assembly language tools (assembler, linker, and other tools used to develop assembly language code), assembler directives, macros, common object file format, and symbolic debugging directives for the C3x and C4x generations of devices.

TMS320 Floating-Point DSP Optimizing C Compil
260. nt to 1000H
        LDPE    R0,IVTP
        RETI                    ; Return and enable interrupts
*
*  External IIOF0 interrupt service routine
*
        .global EINT0B
EINT0B  LDI     0,R0            ; Change IVTP to point to 0
        LDPE    R0,IVTP
        RETI                    ; Return and enable interrupts

2.3.4 Nesting Interrupts

In Example 2-5, the interrupt service routine for INT2 temporarily modifies the interrupt enable register (IIE) and interrupt flag register (IIF) to permit interrupt processing when an interrupt to INT0 or NMI (but no other interrupt) occurs. When the routine finishes processing, the IIE register is restored to its original state. Notice that the RETIcond instruction not only pops the next program counter address from the stack but also restores the GIE and CF bits from the PGIE and PCF bits. This re-enables all interrupts that were enabled before the INT2 interrupt was serviced.

Example 2-5. Interrupt Service Routine

        .title  "INTERRUPT SERVICE ROUTINE"
        .global ISR2
ENABLE  .set    2000h
MASK    .set    9h
*
*  INTERRUPT PROCESSING FOR EXTERNAL INTERRUPT INT2
*
ISR2    PUSH    ST              ; Save status register
        PUSH    DP              ; Save data page pointer
        PUSH    IIE             ; Save interrupt enable register
        PUSH    IIF
        PUSH    R0              ; Save lower 32 bits and
        PUSHF   R0              ; upper 32 bits of R0
        PUSH    R1              ; Save lower 32 bits and
        PUSHF   R1              ; upper 32 bits of R1
        LDI     0,IIE           ; Unmask all internal interrupts
        LDI     MASK,R0         ; M
261. nts communication port direction must be changed, the circuits shown in Figure 8-5 and Figure 8-6 can be used. The circuits force the token to be passed and the communication port direction to remain changed.

Even though these circuits are intended to force a change of the original communication port direction after reset, they can also be used to maintain the original direction. However, this can be achieved more conveniently using pullups on CACK and CREQ. The pullups prevent any damage to the communication ports in the event of a program error that writes into a port configured as an input.

Forcing a communication port to become an output port

Figure 8-5 shows a circuit that forces a communication port to become an output port. In this circuit, driving the CACK line with the CREQ line reconfigures an input port as an output port. When a word is written to the FIFO, CREQ is driven low, indicating a token request. After a synchronizer delay of 1 to 2 cycles (U1 and U2), CACK is driven low, indicating a token acknowledge. CREQ then goes active high and is then held high by Rp as the line switches to an input. The CLK signal can be any clock with a frequency equal to or lower than the H1/H3 clock.

The synchronizer delay is important. If no delay is provided, the CREQ line will not be ready to change to an input high condition. As a result, the CACK line, which at this point is a delayed version of CREQ, is inverted an
262. ocessor support.

- The processor TMS, TDI, TDO, and TCK signals should be buffered through the same physical package for better control of timing skew.
- The input buffers for TMS, TDI, and TCK should have pullup resistors connected to VCC to hold these signals at a known value when the emulator is not connected. A resistor value of 4.7 kΩ or greater is suggested.
- Buffering EMU0 and EMU1 is optional but highly recommended to provide isolation. These are not critical signals and do not have to be buffered through the same physical package as TMS, TCK, TDI, and TDO. Unbuffered and buffered signals are shown in this section (page 11-8 and page 11-9).

Figure 11-5. Multiprocessor Connections (two JTAG devices connected to the emulator header: EMU0, EMU1, TRST, TMS, TDI, TDO, TCK, TCK_RET)

11.8 Mechanical Dimensions for the 14-Pin Emulator Connector

The JTAG emulator target cable consists of a 3-foot section of jacketed cable, an active cable pod, and a short section of jacketed cable that connects to the target system. The overall cable length is approximately 3 feet 10 inches. Figure 11-6 and Figure 11-7 (page 11-13) show the mechanical dimensions for the target cable pod and short cable. Note that the pin-to-pin spacing o
263. of R2
        OR      R0,R2           ; If bit = 1, set Jth bit of R2
CONT

3.2 Block Moves

Because the C4x directly addresses a large amount of memory, blocks of data or program code can be stored off-chip in slow memories and then loaded on-chip for faster execution. Data can also be moved from on-chip memory to off-chip memory for storage or for multiprocessor data transfers.

The DMA can transfer data efficiently in parallel with CPU operations. Alternatively, you can use the load and store instructions in a repeat mode to perform data transfers under program control. Example 3-3 shows how to transfer a block of 512 floating-point numbers from external memory to block 1 of on-chip RAM.

Example 3-3. Block Move Under Program Control

        .title  "BLOCK MOVE UNDER PROGRAM CONTROL"
extern  .word   01000H
block1  .word   02FFC00H

        LDI     @extern,AR0     ; Source address
        LDI     @block1,AR1     ; Destination address
        LDF     *AR0++,R0       ; Load the first number
        RPTS    510             ; Repeat following instruction 511 times
        LDF     *AR0++,R0       ; Load the next number, and
||      STF     R0,*AR1++       ; store the previous one
        STF     R0,*AR1         ; Store the last number

3.3 Byte and Half-Word Manipulation

Example 3-4. Use of Packing Data From Half-Word FIFO to 32-Bit Data Memory

A set of instructions for byte and half-word accessibility, such as LB 3 2 1
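The load-next/store-previous pattern of the block move in Example 3-3 can be mirrored in C to make the loop count easier to follow. This is a rough sketch, not the manual's code: the function name `block_move` is hypothetical, and a C compiler will not necessarily emit the parallel LDF/STF pair. For 512 numbers the loop body runs 511 times, which matches the repeat count stated in the example's comments.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n floats from src to dst using the same schedule as the
 * assembly example: one load up front, then (n - 1) iterations that
 * load the next value while storing the previous one, then a final store.
 * The function name is hypothetical, chosen for this sketch only. */
static void block_move(const float *src, float *dst, size_t n)
{
    assert(n >= 1);                  /* at least one element to move */

    float r0 = *src++;               /* load the first number */
    for (size_t i = 0; i < n - 1; i++) {
        float next = *src++;         /* load the next number, and */
        *dst++ = r0;                 /* store the previous one */
        r0 = next;
    }
    *dst = r0;                       /* store the last number */
}
```

Calling `block_move(src, dst, 512)` performs one initial load, 511 paired load/store steps, and one final store.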
264. of the internal counter, EMU0 becomes a processor-halted signal. During a RUNB or other external analysis count, the EMU0/1-IN signal to all boards must remain in the high (disabled) state. You must provide some type of external input (XCNT_ENABLE) to the PAL to disable the PAL from driving EMU0/1-IN to a low state.

- If sources other than TI processors (such as logic analyzers) are used to drive EMU0, their signal lines must be isolated by open-collector drivers and be inactive during RUNB and other external analysis counts.
- You must connect the EMU0/1-OUT signals to the emulation header or directly to a test bus controller.

Figure 11-10. Suggested Timings for the EMU0 and EMU1 Signals (waveforms: EMU0/1-OUT, EMU0/1-IN)

Figure 11-11. EMU0/1 Configuration With Additional AND Gate to Meet Timing Requirements (target boards 1 to m with pullup resistors and open-collector drivers, backplane, PAL with XCNT_ENABLE, EMU0/1-IN and EMU0/1-OUT; circuitry required for >25 ns rising/falling edge modification)

Notes: 1) The low time on EMUx-IN should be at least one TCK cycle
265. ogram. Be wary with interrupt service routines. Make sure interrupt service routines save the DP pointer.

Example 6-17. Real Forward Radix-2 FFT (Continued)

*****************************************************************
*
*  REGISTERS USED: R0, R1, R2, R3, R4, R5, R6, R7
*                  AR0, AR1, AR2, AR3, AR4, AR5, AR6, AR7
*                  IR0, IR1
*                  RC, RS, RE
*                  DP
*
*  MEMORY REQUIREMENTS: Program = 405 Words (approximately)
*                       Data    = 7 Words
*                       Stack   = 12 Words
*
*****************************************************************
*
*  BENCHMARKS: Assumptions - Program in RAM0
*                          - Reserved data in RAM0
*                          - Stack on Local/Global Bus RAM
*                          - Sine/Cosine tables in RAM0
*                          - Processing and data destination in RAM1
*                          - Local/Global Bus RAM, 0 wait state
*
*  FFT Size   Bit Reversing   Data Source   Cycles (C40)
*  1024       OFF             RAM1          19404 approx.
*
*  Note: This number does not include the C-callable overheads.
*        This benchmark is the number of cycles between labels
*        STARTB and ENDB.
*
*  NOTE: If ffttxt is located off-chip, enable cache for faster
*        performance.
*
*****************************************************************
266. on  11-19
Suggested Timings for the EMU0 and EMU1 Signals  11-21
EMU0/1 Configuration With Additional AND Gate to Meet Timing Requirements  11-21
EMU0/1 Configuration Without Global Stop  11-22
TBC Emulation Connections for n JTAG Scan Paths  11-23

Tables

RESET Vector Locations in the C40 and C44  1-2
Local/Global Bus Control Signals  4-4
Page Switching Interface Timing  4-20
C4x Application Benchmarks  6-86
FFT Timing Benchmarks (Cycles)  6-87
Wait State Timing Table  9-15
Current Equation Typical Values (FCLK = 40 MHz)  9-23
Sockets that Accept the 325-pin C40 and the 304-pin C44  10-6
Manufacturer Phone Numbers  10-6
Device Part Numbers  10-11
Development Support Tools Part Numbers  10-12
14-Pin Header Signal Descriptions  11-2
Emulator Cable Pod Timing Parameters  11-5
267. on the EMU0/1 signal. Therefore, all external sources must be inactive when any device is in the external count mode.

TI devices can be configured by software to halt processing if their EMU0/1 pins are driven low. This feature, in combination with the use of the signal event output mode, allows one TI device to halt all other TI devices on a given event for system-level debugging.

If you route the EMU0/1 signals between boards, they require special handling because these signals are more complex than normal emulation signals. Figure 11-9 shows an example configuration that allows any processor in the system to stop any other processor in the system. Do not tie the EMU0/1 pins of more than 16 processors together in a single group without using buffers. Buffers provide the crisp signals that are required during a RUNB (run benchmark) debugger command or when the external analysis counter feature is used.

Figure 11-9. EMU0/1 Configuration (target boards 1 to m with pullup resistors and open-collector drivers, backplane, PAL with XCNT_ENABLE, EMU0/1-IN and EMU0/1-OUT, TCK to emulator EMU0)

Notes: 1) The low time on EMUx-IN should be at least one TCK cycle and less than 10 µs. Software will set the EMUx-OUT pin to a high sta
268. on when using a DMA channel to transfer from one communication port to another. The C4x does not allow synchronization of DMA channel reads/writes with ICRDYi/OCRDYj signals coming from two different communication ports (i ≠ j).

When your application requires initializing the primary (or auxiliary) DMA channel while the auxiliary (or primary) channel may still be running, halt the running channel by writing a halt signal to the START (or AUX START) bits. Before proceeding, check the STATUS (or AUX STATUS) bits of the running channel to ensure that it has halted. This is necessary because the DMA halt takes place on read/write boundaries (depending on the type of halt issued), and the channel must wait for any ongoing read or write cycles to complete. When reinitializing this channel, be especially careful to restore its previous status exactly. For an example of how to deal with this situation, refer to the Designer Notebook Page "Split-mode DMA re-initialization", available through the DSP hotline.

7.2 When a DMA Channel Finishes a Transfer

Many applications require that you perform certain tasks after a DMA channel has finished a block transfer. You can program the DMA to interrupt the CPU when this happens (TCC or AUX TCC bits). You can also achieve this by polling if:

- The corresponding IIF DMA INTx bit is set to 1 (interrupt polling). This requires that the DMA control register TCC (or AUX TCC) bit b
269. ontext environment was in C, then your program must perform one of two tasks:

- If the program is in a subroutine, it must preserve the dedicated C registers:

    Save as integers          Save as floating point
    R4, R5                    R6, R7
    AR4, AR5, AR6, AR7
    FP
    DP (small model only)
    SP
    R8 (C4x only)

- If the program is in an interrupt service routine, it must preserve all of the C4x registers, as Example 2-6 shows.

If the previous context environment was in assembly language, you need to determine which registers you must save based on the operations of your assembly language code.

Example 2-6. Context Save and Context Restore

        .global ISR1
*
*  TOTAL CONTEXT SAVE ON INTERRUPT
*
ISR1    PUSH    ST              ; Save status register
*
*  SAVE THE EXTENDED PRECISION REGISTERS
*
        PUSH    R0              ; Save the lower 32 bits of R0
        PUSHF   R0              ; and the upper 32 bits
        PUSH    R1              ; Save the lower 32 bits of R1
        PUSHF   R1              ; and the upper 32 bits
        PUSH    R2              ; Save the lower 32 bits of R2
        PUSHF   R2              ; and the upper 32 bits
        PUSH    R3              ; Save the lower 32 bits of R3
        PUSHF   R3              ; and the upper 32 bits
        PUSH    R4              ; Save the lower 32 bits of R4
        PUSHF   R4              ; and the upper 32 bits
        PUSH    R5              ; Save the lower 32 bits of R5
        PUSHF   R5              ; and the upper 32 bits
        PUSH    R6              ; Save the lower 32 bits of R6
        PUSHF   R6              ; and the upper 32 bits
        PUSH    R7              ; Save the low
270. or DIF, Complex or real FFTs, FFTs of different lengths, etc. The following C-callable FFT code examples are provided in this section:

- Complex radix-2 DIF FFT: subsection 6.5.1
- Complex radix-4 DIF FFT: subsection 6.5.2
- Faster Complex radix-2 DIT FFT: subsection 6.5.3
- Real radix-2 DIF FFT: subsection 6.5.4

Code for these different FFTs can be found in the DSP Bulletin Board Service under the filename C40FFT.EXE. This file includes code, input data and sine table examples, and batch files for compiling and linking. For instructions on how to access the BBS, see subsection 10.1.3, The Bulletin Board Service (BBS).

To use these FFT codes, you need to perform two steps:

- Provide a sine table in the format required by the program. This sine table is FFT-size specific, with the exception of the sine table required for the Complex radix-2 DIT and the real radix-2 DIF FFT programs, as noted in Example 6-18.
- Align the input data buffer on a 2^(n+1) memory boundary, i.e., the (n+1) LSBs of the input buffer base address must be zero (n = log2(FFT_SIZE)).

For most applications, the C4x quickly executes FFT lengths of up to 1024 points (complex) or 2048 points (real) because it can do so almost entirely in on-chip memory. For FFTs larger than 1024 (complex), see the application report Parallel 1-D FFT Implementation with the TMS320C4x DSPs in the book Parallel Processing Applications with the TMS320C4x
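The buffer-alignment requirement above can be checked with a few lines of C. This is a sketch under two stated assumptions: the requirement is read as "the (n + 1) least significant bits of the base address are zero, with n = log2(FFT_SIZE)" (the operator is lost in the extracted text, so the plus sign is an assumption), and the helper names `log2_u32` and `fft_buffer_aligned` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return log2(fft_size) for a power-of-two FFT size.
 * Hypothetical helper for this sketch. */
static unsigned log2_u32(uint32_t fft_size)
{
    /* must be a nonzero power of two */
    assert(fft_size != 0 && (fft_size & (fft_size - 1)) == 0);
    unsigned n = 0;
    while (fft_size > 1) {
        fft_size >>= 1;
        n++;
    }
    return n;
}

/* Return nonzero when the (n + 1) LSBs of base_addr are zero,
 * where n = log2(fft_size). Assumes the "n + 1" reading of the text. */
static int fft_buffer_aligned(uint32_t base_addr, uint32_t fft_size)
{
    unsigned n = log2_u32(fft_size);
    assert(n + 1 < 32);                              /* keep the shift defined */
    uint32_t mask = ((uint32_t)1 << (n + 1)) - 1u;   /* (n + 1) low bits set */
    return (base_addr & mask) == 0;
}
```

For a 1024-point FFT, n is 10, so the low 11 bits must be zero: under this reading, a base address of 002FF800H passes the check, while 002FFC00H does not.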
271. orms FFTs Example 6 17 Real Forward Radix 2 FFT Continued Pe PrrrrrryrPrPHrnrree E NPNNNPHUNEPEREEE Zenu unun p un E o nz DI un d DA DA UBI DA DA DA DA DA DDI DDI DDI DDI DDI DA DF LDA LDF DA DF DA PYF3 PYF3 PYF3 PYF3 DDF3 PYF3 UBF 3 UBF 3 DDF3 TE UBF 3 TF DDF 3 TF PYF3 FE DF3 YF3 BF3 BF3 DF3 OGG UOH E BF3 i E TE PYF3 PYF3 PYF3 FFT_SIZE RC 4 RC RC IR1 2 IRO 3 RC DEST_ADDR ARO ARO AR1 ARO AR2 ARO AR3 ARO AR4 1 ARO 7 AR1 9 AR2 15 AR3 11 AR4 SINE TABLE AR7 AR7 IR1 R7 AR7 AR6 AR6 IR1 R6 AR6 AR5 AR5 IR1 R5 16 IR1 AR7 AR4 RO AR2 IRO R5 R4 AR3 IRO R5 R1 AR7 AR3 RO RO R1 R2 AR6 AR4 RO R4 RO R3 AR1 IR0 R3 R4 AR1 R3 R4 R4 AR2 R2 ARO IR0 R4 R4 AR3 ARO R2 R4 R4 AR1 AR3 R6 R1 R4 ARO RO R1 R2 AR5 ARA IRO RO RO R1 R3 AR1 R3 R4 AR1 R3 R4 R4 AR2 R2 ARO R4 R4 AR3 ARO R2 R4 R4 AR1 AR2 R7 R4 R4 ARO AR3 R7 R1 AR5 AR3 RO R7 SIN 1 2 pi 16 AR7 COS 3 2 pi 16 R6 SIN 2 2 pi 16 AR6 COS 2 2 pi 16 R5 SIN 3 2 pi 16 AR5 COS 1 2 pi 16 RO X 13 COS 3 R4 X I3 SIN 3 R1 X I4 SIN 3 RO X I4 COS 3 R2 X I3 COS X 14 SIN
272. ors. Four are system related:

- Operation frequency
- Supply voltage
- Operating temperature
- Output load

Several others are related to TMS320C4x operation:

- Duty cycle of operations
- Number of buses used
- Wait states
- Cache usage
- Data value

You can calculate the total power supply current requirement for a C4x device by using the equation below, which comprises the four basic power supply current components and three system-related dependencies described above:

$$I_{total} = (I_q + I_{iops} + I_{ibus} + I_{xbus}) \times F \times V \times T$$

where:

- $I_{total}$ is the total supply current,
- $I_q$ is the quiescent current component,
- $I_{iops}$ is the current component due to internal operations,
- $I_{ibus}$ is the current component due to internal bus usage, including data value and cycle time dependency,
- $I_{xbus}$ is the current component due to external bus usage, including data value, wait state, cycle time, and capacitive load dependency,
- $F$ is a scale factor for frequency,
- $V$ is a scale factor for supply voltage, and
- $T$ is a scale factor for operating temperature.

This report describes in detail the application of this equation and determination of all the dependencies. The power dissipation measurements in this report were taken using a C40 PG 3.x running at speeds up to 50 MHz and at a voltage level of 5 V.

The minimum power supply current requirement is 130 mA. The typical current consumption for most algorithms is 350 mA, as de
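The supply-current equation above can be evaluated as in the C sketch below. It assumes the equation is the sum of the four current components multiplied by the three scale factors, which is how the text describes it. The struct and function names are hypothetical, and the numbers in the usage note are made-up placeholders, not measured C4x values.

```c
#include <assert.h>
#include <stddef.h>

/* Inputs to the total-supply-current equation.
 * All names are hypothetical; currents are in milliamperes,
 * scale factors are dimensionless. */
typedef struct {
    double iq_ma;     /* quiescent current component */
    double iiops_ma;  /* component due to internal operations */
    double iibus_ma;  /* component due to internal bus usage */
    double ixbus_ma;  /* component due to external bus usage */
    double f;         /* scale factor for frequency */
    double v;         /* scale factor for supply voltage */
    double t;         /* scale factor for operating temperature */
} c4x_current_terms;

/* Total current = (sum of the four components) * F * V * T. */
static double total_supply_current_ma(const c4x_current_terms *c)
{
    assert(c != NULL);
    return (c->iq_ma + c->iiops_ma + c->iibus_ma + c->ixbus_ma)
           * c->f * c->v * c->t;
}
```

With placeholder components of 100, 50, 25, and 25 mA and all scale factors at 1.0, the function returns 200 mA. Halving the frequency scale factor to 0.5 halves the result to 100 mA, which shows that the scale factors act on the whole sum rather than on a single component.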
273. ot always possible because interconnections and gating for chip-enable generation can cause them.

In addition, if you choose a memory device with an output enable, the output enable must become active quickly enough to ensure that the memory can meet the data-valid timing requirements of the C4x. For memories with 20-ns access times, the output-enable-active-to-data-valid timing parameter is typically less than 10 ns.

Currently available RAMs without output enable (OE) control lines include the 1-bit-wide organized RAMs and most of the 4-bit-wide RAMs. Those with OE controls include the byte-wide and a few of the 4-bit-wide RAMs. Many of the fastest RAMs do not provide OE control; they use chip-enable (CE) controlled write cycles to ensure that data outputs do not turn on for write operations. In CE-controlled write cycles, the write control line (WE) goes low before CE goes low, and internal logic holds the outputs disabled until the cycle is completed. Using CE-controlled write cycles is an efficient way to interface fast RAMs without OE controls to the C4x at full speed.

Note: You can find timing parameters for CLKIN, H1, H3, and memory in the TMS320C40 and TMS320C44 data sheets.

4.4.1 Consecutive Reads Followed by a Write Interface Timing

Figure 4-3 shows the timing of conse
274. ould be avoided.

Figure 8-1. Impedance Matching for C4x Communication Port Design (pin as an output and pin as an input; 10-kΩ pullups to VCC; series resistors Rs = 22-33 Ω; line impedance Z0 = 50-100 Ω)

Even though pullup resistors do not help with impedance matching, they are recommended at each end to avoid unintended triggering after reset, when RESET going low is not received on all C4x devices at the same time. A pulldown resistor is not desirable because it increases power consumption, does not protect the device from a fault condition, and can cause token loss and byte slippage on reset.

For jumps to other boards or for long distances, a unidirectional data flow with buffering is the preferred method. In this case, use buffers with hysteresis for CSTRB and CRDY at each end, with delays greater than those in the data bus. This has two advantages: it cleans up the signals and helps eliminate glitches that can be erroneously perceived as valid control, and it allows the data bits to settle before the receiver sees CSTRB going low.

8.3 Interfacing With a Non-C4x Device

To guarantee a correct word transfer operation between a C4x communication port and a non-C4x device, the non-C4x device should mimic the handshaking operation between CSTRB and CRDY (word transfer) and between CREQ and CACK (token transfer). The token
275. outputs with DC loads are being switched, the power dissipation components from outputs being driven high and outputs being driven low should be averaged and added to the total device power dissipation. Power components due to DC loading of the outputs should be calculated separately for each program segment before average power is calculated.

Note that unused inputs that are left unconnected may float to a voltage level that will cause the input buffer circuits to remain in the linear region and therefore contribute a significant component to power supply current. Accordingly, if you want absolute minimum power dissipation, you should make any unused inputs inactive by either grounding or pulling them high. If several unused inputs must be pulled high, they can be pulled high together through one resistor to minimize component count and board space.

When you use power dissipation values to determine thermal management considerations, use the average power unless the time duration of individual program segments is long. The thermal characteristics of the TMS320C40 in the 325-pin PGA package are exponential in nature, with a time constant on the order of minutes. Therefore, when subjected to a change in power, the temperature of the device package will require several minutes or more to reach thermal equilibrium. If the duration of program segments exhibiting high po
276. pedef

typedef struct dmaunif {
    volatile void   *ctrl;      /* control register         */
    volatile void   *src;       /* source address           */
    volatile int     src_idx;   /* source address index     */
    volatile int     counter;   /* transfer counter         */
    volatile void   *dst;       /* dest address             */
    volatile int     dst_idx;   /* dest address index       */
    struct dmaunif  *linkp;     /* link pointer             */
} DMAUNIF;

typedef struct dmaprim {
    volatile void   *ctrl;      /* control register         */
    volatile void   *src;       /* prim src address         */
    volatile int     src_idx;   /* prim index               */
    volatile int     counter;   /* prim transfer counter    */
    struct dmaprim  *linkp;     /* link pointer             */
} DMAPRIM;

typedef struct dmaaux {
    volatile void   *ctrl;      /* control register         */
    volatile void   *dst;       /* aux dst address          */
    volatile int     dst_idx;   /* aux index                */
    volatile int     acounter;  /* aux transfer counter     */
    struct dmaaux   *alinkp;    /* aux link pointer         */
} DMAAUX;

typedef struct {
    volatile void   *ctrl;      /* control register         */
    volatile void   *src;       /* prim src address         */
    volatile int     src_idx;   /* prim index               */
    volatile int     counter;   /* prim transfer counter    */
    volatile void   *dst;       /* aux dst address          */
    volatile int     dst_idx;   /* aux index                */
    struct dmaprim  *linkp;     /* link pointer             */
    volatile int     acounter;  /* aux transfer counter     */
    struct dmaaux   *alinkp;    /* aux link poin
} DMASPLIT;

#define PRIM_WAIT_DMA(x)   while
#define AUX_WAIT_DMA(x)    while
#define SPLIT_WAIT_DMA(x)  while
277. ple 5-2. Optimizing a Loop

/* loop 1 */
main()
{
    float a[10], b[10];
    int i;
    for (i = 0; i < 10; ++i)
        a[i] = (a[i] * 20) + b[i];
}

/* loop 2 */
main()
{
    float a[10], b[10];
    int i;
    register float *p = a, *q = b;
    for (i = 0; i < 10; ++i)
        *p++ = (*p * 20) + *q++;
}

Loop 1 executes in 19 cycles. Loop 2, which is the equivalent of loop 1, executes in 12 cycles.

- Use structure assignments to copy blocks of data. The compiler generates very efficient code for structure assignments, so nest objects within structures and use simple assignments to copy them.

- Avoid large local frames, and declare the most often used local variables first. The compiler uses indirect addressing with an 8-bit offset to access local data. To access objects on the local frame with offsets greater than 255, the compiler must first load the offset into an index register. This causes 1 extra instruction and incurs 2 cycles of pipeline delay.

- Avoid the large model. The large model is inefficient because the compiler reloads the data-page pointer (DP) before each access to a global or static variable. If you have large array objects, use malloc() to dynamically allocate them and access them via pointers rather than declaring them globally. Example 5-3 illustrates two methods for allocating large array objects.

Example 5-3. Allocating Large Array Objects

Bad Method
278. processor

Initialization brings the processor to a known state. Generally, initialization takes place any time after the processor is reset. This chapter reviews the concepts explained in the user's guide and provides examples.

Topic                                                    Page
1.1  Reset Process                                       1-2
1.2  Reset Signal Generation                             1-3
1.3  Multiprocessing System Reset Considerations         1-5
1.4  How to Initialize the Processor                     1-6

1.1 Reset Process

After RESET is applied, the C4x jumps to the address stored in the reset vector location and starts execution from that point. In order to reset the C4x correctly, you need to comply with several hardware and software requirements:

- Select the reset vector location. The RESET vector of the C4x can be mapped to one of four different locations that are controlled by the value of the RESETLOC(1,0) pins at RESET. Table 1-1 shows possible reset vectors for the C40 and C44. If the DSP is in microcomputer mode (ROMEN pin = 1), RESETLOC(1,0) must be equal to 0,0 for the boot loader to operate correctly. If the DSP is in microcomputer mode, set the IIOFx pins as discussed in the bootloader chapter of the TMS320C4x User's Guide so that the bootloader works properly.

- Provide the correct reset vector value. The RESET vector normally contains the address of the system initialization routi
279. processor through external interrupt INT2 synchronization and bit-reversed addressing. The DMA auxiliary channel transfers data from communication port 3 to internal RAM via external interrupt INT3 synchronization and linear addressing.

Example 7-3. DMA Split-Mode Transfer With External-Interrupt Synchronization

*
*  TITLE: DMA SPLIT-MODE TRANSFER WITH EXTERNAL-INTERRUPT SYNCHRONIZATION
*
*  THIS EXAMPLE SETS UP DMA CHANNEL 1 TO SPLIT MODE. THE PRIMARY CHANNEL
*  TRANSFERS DATA FROM INTERNAL RAM TO COMM PORT 3 OUTPUT REGISTER WITH
*  EXTERNAL INTERRUPT INT2 SYNCHRONIZATION AND BIT-REVERSED ADDRESSING.
*  THE AUXILIARY CHANNEL TRANSFERS DATA FROM COMMUNICATION PORT 3 INPUT
*  REGISTER TO INTERNAL RAM WITH EXTERNAL INTERRUPT INT3 SYNCHRONIZATION
*  AND LINEAR ADDRESSING.
*
        .data
DMA1    .word   001000B0H       ; DMA channel 1 map address
        .word   03CDD0D4H       ; DMA register initialization data
SOURCE  .word   002FFC00H
SRC_IDX .word   08H             ; The same value as IR0 for bit-reversed
COUNT   .word   8
DESTIN  .word   002FF800H
        .word   1
        .word   8
        .text
        LDP     DMA1            ; Load data page pointer
        LDA     @DMA1,AR0       ; Point to DMA channel 1 registers
        LDI     @SOURCE,R0      ; Initialize DMA primary source register
        STI     R0,*+AR0(1)
        LDI     @SRC_IDX,R0     ; Initialize DMA primary source index register
        STI     R0,*+AR0(2)
        LDI     @COUNT,R0       ; Initialize DMA primary count register
        STI     R0,*+AR0(3)
        LDI     @DESTIN,R0      ; Initialize DMA aux destination register
        STI     R0,*+AR
280. processors, generate code, develop algorithm implementations, and fully integrate and debug software and hardware modules. The following products support the development of C4x applications:

Code Generation Tools

- The optimizing ANSI C compiler translates ANSI C language directly into highly optimized assembly code. You can then assemble and link this code with the TI assembler/linker, which is shipped with the compiler. It supports both C3x and C4x assembly code. This product is currently available for PCs (DOS, DOS extended memory, OS/2), VAX/VMS, and SPARC workstations. See the TMS320 Floating-Point DSP Optimizing C Compiler User's Guide (SPRU034) for detailed information about this tool.

- The assembler/linker converts source mnemonics to executable object code. It supports both C3x and C4x assembly code. This product is currently available for PCs (DOS, DOS extended memory, OS/2). The C3x/C4x assembler for the VAX/VMS and SPARC workstations is only available as part of the optimizing C3x/C4x compiler. See the TMS320 Floating-Point DSP Assembly Language Tools User's Guide (SPRU035) for detailed information about available assembly language tools.

- The digital filter design package helps you design digital filters.

System Integration and Debug Tools

- The simulator simulates (via software) the operation of the C4x and can be used in C and assembly software development. This product is currently available fo
281. r dual access RAM definition A 3 DuPont connector 11 3 EMUO 1 configuration 11 19 11 21 11 22 emulation pins 11 18 INsignals 11 18 rising edge modification 11 21 EMU0 1 signals 11 2 11 5 11 6 11 11 11 16 emulation JTAG cable 11 1 timing calculations 11 6 to 11 7 11 16 to 11 24 emulator connection to target system JTAG mechanical dimensions 11 12 to 11 24 designing the JTAG cable 11 1 emulation pins 11 18 signal buffering 11 8 to 11 11 target cable header design 11 2 to 11 3 emulator pod JTAG timings 11 5 extended precision registers 3 17 extended precision floating point format defini tion A 3 extended precision register 2 2 definition A 3 external flag pins 4 3 external interfacing 4 3 example 4 3 external interrupt definition A 3 external logic 4 12 external ready generation 4 13 Index 3 Index fast devices OR 4 12 fast Fourier transforms 3 6 6 56 DIF decimation in frequency 6 24 DIT decimation in time 6 24 6 42 to 6 54 inverse 6 73 fast Fourier transforms FFT 6 24 benchmarks 6 87 complex radix 2 DIF 6 26 DIF decimation in frequency 6 27 to 6 31 6 33 6 34 to 6 41 DIT decimation in time 6 55 DIT decimations in time 6 41 DMA 6 24 real radix 2 6 56 theories references 6 24 twiddle factors 6 32 twiddle table 6 41 types of 6 24 FFT See fast Fourier transforms FIFO buffer definition A 3 filters adaptive 6 7 digital See digital filters example 6 10 FIR 6 14 See also FIR filters IIR
282. r PCs (DOS, Windows) and SPARC workstations. See the TMS320C4x C Source Debugger User's Guide (SPRU054) for detailed information about the debugger.

- The XDS510 emulator performs full-speed in-circuit emulation with the C4x, providing access to all registers as well as to internal and external memory of the device. It can be used in C and assembly software development and has the capability to debug multiple processors. This product is currently available for PCs (DOS, Windows, OS/2) and SPARC workstations. This product includes the emulator board (emulator box, power supply, and SCSI connector cables in the SPARC version), the C4x C Source Debugger, and the JTAG cable.

  Because C3x and C5x XDS510 emulators also come with the same emulator board (or box) as the C4x, you can buy the C4x C Source Debugger Software as a separate product called C4x C Source Debugger Conversion Software. This enables you to debug C3x/C4x applications with the same emulator board. The emulator cable that comes with the C3x XDS510 emulator cannot be used with the C4x. A JTAG emulation conversion cable (see Section 10.3) is needed instead. The emulator cable that comes with the C5x XDS510 emulator can also be used for the C4x without any restriction. See the TMS320C4x C Source Debugger User's Guide (SPRU054) for detailed information about the C4x emulator.

- The parallel processing development system (PPDS) is a stand-alone
283. r is greater than six inches. Emulation signals TMS, TDI, TDO, and TCK_RET are buffered through the same package.

(Figure: buffered connections between the JTAG device and the emulator header, distance greater than 6 inches; signals EMU0, EMU1, TRST, TMS, TDI, TDO, TCK, TCK_RET, PD, GND)

- The EMU0 and EMU1 signals must have pullup resistors connected to VCC to provide a signal rise time of less than 10 µs. A 4.7-kΩ resistor is suggested for most applications.
- The input buffers for TMS and TDI should have pullup resistors connected to VCC to hold these signals at a known value when the emulator is not connected. A resistor value of 4.7 kΩ or greater is suggested.
- To have high-quality signals (especially the processor TCK and the emulator TCK_RET signals), you may have to employ special care when routing the PWB trace. You also may have to use termination resistors to match the trace impedance. The emulator pod provides optional internal parallel terminators on TCK_RET and TDO. TMS and TDI provide fixed series termination.
- Since TRST is an asynchronous signal, it should be buffered as needed to ensure sufficient current to all target devices.

11.7.2 Using a Target-System Clock

Figure 11-4 shows an application with the system test clock generated in the target system. In this appl
284. r6 DI AR r0 DI BR r3 DR r3 r5 rl CI CR r2 CI DR r3 Applications Oriented Operations 6 45 Fast Fourier Transforms FFTs Example 6 15 Faster Version Complex Radix 2 DIT FFT Continued radix 4 butterfly loop mpyf ar ar2 r0 subf r0 r2 r2 mpyf ar arlt rl addf r6 r3 addf rO ar0O r4 A stf r4 ar4 subf r ar tt rb 3 otf r2 arb subf ET EGET addf rl ar3 r6 A Sti r7 ar6t subf rl ar3 r7 stf r3 ar2 addf r6 r4 r0 H mpyf ar7 ar3tt rl subf r6 r4 3 addf ritari rO z stf r0 ar4 subf rl arl rl A stf r3 ar5 addf ELE 62 mpy f ar2 aE yel subf el 5 43 addf rl ar 0 r2 H stf r2 ar2 irl subf r1 ar0 r6 z stt r3 ar6 blki addf O 2 54 A clear pipeline subf EO 2 2 addf r r6 r3 3 stf r4 ard 3 B stf r2 r5 subf r7 r6 r J z stf r7 ar6 B stf E3 a kz R A ae THIRD TO LAST 2 STAGE ro rI r4 PS DIT r6 7 AR rl ro rl CR r2 r6 AI BI QI AI DI DI rl CR BI r2 r2 r0 BR CI r3 r6 r7 AR CR AI r4 AR CR BI r2 r7 r6 r7 DR BR DI r7 DR BR CI r3 r0 r4 r6 DI BR r3 r4 r6 BI DI AR r0 BI DI BR r3 r2 r5 rl CI DR r3 r5 rl AI CI CR r2 AI CI DR r3 r
285. rably faster to access than external memory. In a single cycle, two operands can be brought from internal memory. You can maximize performance if you use the DMA coprocessor in parallel with the CPU to transfer the data you want to operate on to internal memory.

- Avoid pipeline conflicts. For time-critical operations, make sure that cycles are not missed because of pipeline conflicts. If there is no problem with program speed, ignore this suggestion.
- Plan your linker command file in advance. Memory allocation for code and data sections can have a big impact on your algorithm performance. One of the C4x's strengths is its sustained bandwidth, achieved by having two external buses. By carefully dividing data and program between the two buses, you can minimize pipeline conflicts. You need to apply the same concept to minimize DMA/CPU access conflicts.

The above checklist is not exhaustive, and it does not address some features in detail. To learn how to exploit the full power of the C4x, carefully study its architecture, hardware configuration, and instruction set, which are all described in the TMS320C4x User's Guide (SPRU063).

Chapter 6 Applications-Oriented Operations

The C4x architecture and instruction set features facilitate the solution of numerically intensive problems. This chapter presents examples of applications that use these features, such as companding, filtering, matrix arithmetic, and fast Fourier transforms (FFT
286. ramic PGA

Package type designators:
  GF    325-pin ceramic PGA
  HFH   352-leaded CER QFP
  J     ceramic DIP
  JD    ceramic DIP, side-brazed
  N     plastic DIP
  TA    tape-automated bonding, encapsulated
  TB    tape-automated bonding, bare die
  KGD   known good die
  PDB   304-pin plastic quad flatpack

Device designators:
  C1x DSP   10, 14, 15, 16, 17
  C2x DSP   25, 26, 28
  C3x DSP   30, 31, 32
  C4x DSP   40, 44
  C5x DSP   50, 51, 52, 53

Table 10-3 lists C4x device part numbers. Table 10-4 lists the development support tools available for the C4x DSP, their part numbers, and the platform on which they run.

Table 10-3. Device Part Numbers (voltage: 5 V)

  Device Part Number   Operating Frequency   Package
  TMS320C40GFL         50 MHz / 40 ns        325-pin ceramic PGA
  TMS320C40GFL60       60 MHz / 33 ns        325-pin ceramic PGA
  TMS320C44PDB50       50 MHz / 40 ns        304-pin PQFP
  TMS320C44PDB60       60 MHz / 33 ns        304-pin PQFP
  SMJ320C40GFM40       40 MHz / 50 ns        325-pin ceramic PGA
  SMJ320C40GFM50       50 MHz / 40 ns        325-pin ceramic PGA
  SMJ320C40HFHM40      40 MHz / 50 ns        352-lead ceramic PGA
  SMJ320C40HFHM50      50 MHz / 40 ns        352-lead ceramic PGA
  SMJ320C40TAM40       40 MHz / 50 ns        324 pad
  SMJ320C40TBM40       40 MHz / 50 ns
  TMS320C40TAL50       50 MHz / 40 ns
  SMJ320C40TAM50       50 MHz / 40 ns
  SMJ320C40TBM50       50 MHz / 40 ns
  TMS320C40TAL60       60 MHz / 33 ns
  SMJ320C40KGDM40      40 MHz / 50 ns
  SMJ320C40KGDM50      50 MHz / 40 ns
  TMS320C40KGDL50      50 MHz / 40 ns
  TMS320C40KGDL60      60 MHz / 33 ns
287. re 9-5 or Figure 9-6; this value can be scaled by a data-dependency factor if necessary, as described on page 9-16. This scaled value is then summed along with several other current terms to determine the total supply current.

9.4.2 DMA

Using DMA to transfer data consumes power that is data dependent. The current resulting from DMA bus usage (I_DMA) varies linearly with the transfer rate. Figure 9-7 shows DMA bus current requirements for transferring alternating data (AAAA AAAAh to 5555 5555h) at several transfer rates; it also shows that current consumption increases when more DMA channels are used. However, as more DMA channels are used, the incremental change in current diminishes as the internal DMA bus becomes saturated. Note that DMA current is superimposed over the internal bus value (I_ibus).

[Figure 9-7. DMA Bus Current Versus Clock Rate: incremental I_DD (mA) versus f_clk (MHz)]

9.4.3 Communication Port

Communication port operations add a data-dependent term to the equation for the current requirement. The current resulting from communication port operation (I_cp) varies linearly with the transfer rate. Figure 9-8 shows communication port operation current requirements for transferring alternating data (AAAA AAAAh to 5555 5555h) at several transfer rates; it also shows that current consumption increases wh
288. red when alternating values of 5555 5555h and AAAA AAAAh were written at a capacitive load of 80 pF per output signal line. This condition exhibits the highest current values on the device. The values presented in the figure represent the incremental current contributed by the local or global bus output driver circuitry under the given conditions. Current values obtained from this graph are later scaled and added to several other current terms to calculate the total current for the device. As indicated in the figure, the lower limit base (I_q + I_iops + I_ibus) is essentially koza for transfer rates less than 1 Mword/second.

[Figure 9-5. Local/Global Bus Current Versus Transfer Rate and Wait States: I_DD (mA) versus transfer rate (Mword/second), C_load = 80 pF with maximum data complexity. Curve annotations: STI || STI is dominated by execution of internal NOPs; STI || STI is internally stalled while waiting for bus ready. NOTE: Upper line is caused by execution of internal NOPs.]

Figure 9-5 demonstrates a feature of the C4x's external bus architecture known as a posted write. In general, data is written to a latch (or a one-deep FIFO) an
290. rogramming Examples

Example 7-11. Split-Mode DMA Using Autoinitialization

/*
 * EXAMPLE: Split mode (AUX and PRIMARY both running). Autoinitialization example.
 *   DMA3 aux channel autoinitializes and THEN receives 4 words from
 *   commport 3 input FIFO to memory 0x02ffd00.
 *   DMA3 pri channel sends 4 words from memory 0x02ffc00 to commport 3
 *   output FIFO and THEN other 2 words from memory 0x02ffc10 (with index 2)
 *   to commport 3 output FIFO.
 *   DMA3 prim channel uses OCRDY3 write sync.
 *   DMA3 aux channel uses ICRDY3 read sync.
 *   Autoinitialization method 1 is used in all cases.
 *   In this program, DMA3 aux channel expects data in commport 3 being sent
 *   by another processor/device. Otherwise, no aux channel transfer will occur.
 */
#include <dma.h>

#define DMAADDR   0x001000d0
#define CTRLREG1  0x03cdc0e9   /* DMA aux-prim send interrupt to CPU when
                                  transfer finishes (TC = 1), DMA-CPU rotating
                                  priority, read/write sync transfer */
#define CTRLREG2  0x03cdc0d5   /* same as above but transfer finishes */
#define DIEVAL    0x24000      /* set ICRDY3/OCRDY read/write sync */

/* Primary Channel */
#define SRC1      0x02ffc00    /* autoinitialization 1 */
#define SRC1_IDX  0x1
#define COUNTER1  0x04
#define SRC2      0x02ffc10    /* autoinitialization 2 */
#de
291. rt 3 output fifo */
#define DST1_IDX  0x1          /* dst address increment */

/* 2nd transfer settings */
#define CTRLREG2  0x00c40005   /* DMA sends interrupt to CPU when transfer
                                  finishes (TC = 1), DMA-CPU rotating priority,
                                  and DMA stops after transfer completes */
#define SRC2      0x002ffe00   /* src address */
#define SRC2_IDX  0x4          /* src address increment */
#define COUNTER2  0x4          /* number of words to transfer */
#define DST2      0x002fff00   /* dst address */
#define DST2_IDX  0x1          /* dst address increment */

DMAUNIF *dma = (DMAUNIF *)DMAADDR;
DMAUNIF autoini2;

main()
{
    /* initialize 2nd set of autoinitialization values */
    autoini2.src     = (void *)SRC2;
    autoini2.src_idx = SRC2_IDX;
    autoini2.counter = COUNTER2;
    autoini2.dst     = (void *)DST2;
    autoini2.dst_idx = DST2_IDX;
    autoini2.ctrl    = (void *)CTRLREG2;

    /* initialize DMA with 1st set of autoinitialization values */
    dma->src     = (void *)SRC1;
    dma->src_idx = SRC1_IDX;
    dma->counter = COUNTER1;
    dma->dst     = (void *)DST1;
    dma->dst_idx = DST1_IDX;
    dma->linkp   = &autoini2;
    dma->ctrl    = (void *)CTRLREG1;

    /* wait for DMA to finish transfer */
    PRIM_WAIT_DMA((volatile int *)dma);
}

Example 7-9. Split-Mode Auxiliary DMA Using Read Sync

/*
 * EXAMPLE: Split mode (AUX only). Commport-to-commport transfer.
 *   DMA 3 auxiliary channel transfers 8 words from
293. ruction is fetched only once. The only internal contributions are due to quiescent and internal bus operations. Figure 9-5 indicates a 23-mA current contribution due to writes every available cycle. Therefore, the total contribution due to this portion of the code is I_q + I_ibus + I_xbus, or

$$130\ \text{mA} + (60\ \text{mA})(0.93) + 85\ \text{mA} + (23\ \text{mA})(0.8) = 289.2\ \text{mA}$$

9.6.3 Average Current

The average current is derived from the two portions of the algorithm. The processing portion took 95% of the time and required about 245.8 mA; the data dump portion took the other 5% and required about 289.2 mA. The average is calculated as

$$I_{avg} = (0.95)(245.8\ \text{mA}) + (0.05)(289.2\ \text{mA}) = 247.97\ \text{mA}$$

From the thermal characteristics specified in the TMS320C4x User's Guide, it can be shown that this current level corresponds to a case temperature of 28°C. This temperature meets the maximum device specification of 85°C and hence requires no forced-air cooling.

9.6.4 Experimental Results

A photograph of the power supply current for the FFT, using a 40-MHz system clock, is shown in Appendix A. During the FFT processing, the current varied between 190 and 220 mA. The current during external writes had a peak of 230 mA, and the average current requirement as measured on a digital multimeter was 205 mA. Scaling those results to the 50-MHz calculations yielded results that were close to t
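The time-weighted average used in Section 9.6.3 can be sketched in C as follows. This is a minimal illustration, not code from the guide: the struct and function names are hypothetical, and the numeric terms are the ones quoted in the calculation above.

```c
#include <stddef.h>

/* One program segment: the fraction of run time it occupies and the
 * supply current (mA) estimated for it. Hypothetical type for illustration. */
struct segment {
    double time_fraction;
    double current_ma;
};

/* Time-weighted average supply current over all segments, in mA.
 * The time fractions are expected to sum to 1. */
double average_current_ma(const struct segment *seg, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += seg[i].time_fraction * seg[i].current_ma;
    return total;
}

/* Current for the data-dump segment, summing the four terms quoted in
 * the text: 130 mA, 60 mA scaled by 0.93, 85 mA, and 23 mA scaled by 0.8. */
double data_dump_current_ma(void)
{
    return 130.0 + 60.0 * 0.93 + 85.0 + 23.0 * 0.8;
}
```

Calling average_current_ma() with the two segments {0.95, 245.8} and {0.05, data_dump_current_ma()} reproduces the average quoted in the text. The same function extends to any number of segments once an algorithm has been partitioned by bus usage.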
294. ry map, each example in this memory interface section focuses on only one of the two interfaces. However, all of the examples are applicable to either the local or global bus. The buses have identical but mutually exclusive sets of control signals.

Table 4-1. Local/Global Bus Control Signals

  Global Bus        Local Bus
  STRB0, STRB1      LSTRB0, LSTRB1
  CE0, CE1          LCE0, LCE1
  RDY0, RDY1        LRDY0, LRDY1
  AE                LAE
  DE                LDE
  PAGE0, PAGE1      LPAGE0, LPAGE1
  R/W0, R/W1        LR/W0, LR/W1

While both the global bus and the local bus can interface to a wide variety of devices, they most commonly interface to memories.

4.4 Zero-Wait-State Interfacing to RAMs

A memory read access time is normally defined as the time between address valid and data valid. This time can be determined by:

$$\text{Read access time} = t_{c(H)} - \left(t_{d(H1L-A)} + t_{su(D)R}\right)$$

where:
  t_c(H)      = H1/H3 cycle time
  t_d(H1L-A)  = H1 low to address valid
  t_su(D)R    = data valid before next H1 low (read)

For a full-speed, zero-wait-state interface to any device, a 50-MHz C4x (40-ns instruction cycle time) requires a read access time of 21 ns from address stable to data valid. For most memories, the access time from chip enable is the same as the access time from address; thus, it is possible to use 20-ns memories at full speed with a 50-MHz C4x. However, to use 20-ns memories properly, you must avoid long delays between the processor and the memories. Avoiding these delays is n
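As a rough sketch of the read-access-time relation above, the two helper functions below compute the H1 cycle time from the clock frequency and the access time a zero-wait-state device must meet. The function names are hypothetical. The individual delay values are parameters because this section quotes only the 40-ns cycle time and the resulting 21-ns requirement; take the address delay and data setup figures from the device data sheet.

```c
/* H1 cycle time in ns for a given input clock in MHz. The text pairs
 * 50 MHz with a 40-ns cycle, that is, two clock periods per H1 cycle. */
double h1_cycle_ns(double clock_mhz)
{
    return 2000.0 / clock_mhz;
}

/* Longest read access time (ns) a zero-wait-state device may have:
 * the cycle time less the address delay and the data setup time. */
double read_access_time_ns(double tc_h_ns, double td_h1l_a_ns,
                           double tsu_d_r_ns)
{
    return tc_h_ns - (td_h1l_a_ns + tsu_d_r_ns);
}
```

For example, with a 40-ns cycle and placeholder values of 9 ns for the address delay and 10 ns for the data setup (assumed here only because they are consistent with the 21-ns figure in the text), read_access_time_ns() returns 21 ns, which leaves 1 ns of margin for a 20-ns memory.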
296. s section describes the part numbers of C4x devices, development support hardware, and software tools.

10.3.1 Nomenclature

To designate the stages in the product development cycle, Texas Instruments assigns prefixes to the part numbers of all TMS320 devices and support tools. Each TMS320 device has one of three prefixes: TMX, TMP, or TMS. Each support tool has one of two possible prefix designators: TMDX or TMDS. These prefixes represent evolutionary stages of product development, from engineering prototypes (TMX/TMDX) through fully qualified production devices and tools (TMS/TMDS). This development flow is defined below.

Device Development Evolutionary Flow:
  TMX   The part is an experimental device that is not necessarily representative of the final device's electrical specifications.
  TMP   The part is a device from a final silicon die that conforms to the device's electrical specifications but has not completed quality and reliability verification.
  TMS   The part is a fully qualified production device.

Support Tool Development Evolutionary Flow:
  TMDX  The development support product has not yet completed Texas Instruments internal qualification testing.
  TMDS  The development support product is a fully qualified development support product.

TMX and TMP devices and TMDX development support tools are shipped with the following disclaimer: "Developmental product is intended for internal evaluation purposes." TMS devices and T
297. scribed in the TMS320C4x data sheet unless excessive data output is being performed. The maximum current requirement for a C4x running at 50 MHz is 850 mA and occurs only under worst-case conditions: writing alternating data (AAAA AAAAh to 5555 5555h) out of both external buses simultaneously, every cycle, with 80-pF loads.

9.2.3 Algorithm Partitioning

Each part of an algorithm has its own pattern with respect to internal and external bus usage. To analyze the power supply current requirement, you must partition an algorithm into segments with distinct concentrations of internal or external bus usage. Analyze each program segment to determine its power supply current requirement. You can then calculate the average power supply current requirement from the requirements of each segment of the algorithm.

9.2.4 Test Setup Description

All TMS320C4x supply current measurements were performed on the test setup shown in Figure 9-1. The test setup consists of a TMS320C40 with capacitive loads on all data and address lines but no resistive loads. A Tektronix digital multimeter measures the power supply current. Unless otherwise specified, all measurements are made at a supply voltage of 5 V, an input clock frequency of 50 MHz, a capacitive load of 80 pF, and an operating temperature of 25°C.

Note that the current consumed by the oscillator and pullup resistors does not flow through the current meter. This current is considered part of the system
298. setup and 0 ns hold times for address with respect to CS high ensure a clear margin.

[Figure 4-3. Consecutive Reads Followed by a Write: timing of LR/W0, LSTRB0, LD(31-0), and LA(30-0)]

[Figure 4-4. Consecutive Writes Followed by a Read: timing of LR/W0, STRB0, LD(31-0) (valid write data, valid write data, valid data), and LA(30-0) (valid write address, valid write address, read address)]

4.4.2 Consecutive Writes Followed by a Read Interface Timing

Figure 4-4 shows the timing of consecutive writes followed by a read. Notice that between consecutive writes, LR/W stays low, but STRB0 goes inactive to frame the write cycles. Although C4x zero-wait-state writes take two H1 cycles, writes appear to take one cycle internally (from the perspective of the CPU and DMA) if no access to the interface is already in progress.

In the read cycle following the writes in Figure 4-4, the C4x requires zero-wait-state memories to have an LSTRB-active-to-data-valid time of less than 21 ns (one H1 cycle minus the sum of the H1-low-to-LSTRB-active delay and the data setup time before H1 low). For most memory devices, this time is the same as the memory access time, which is t_a = 20 ns in this design. Thus, a margin of only 1 ns exists, leaving little allowance for STRB gating, if desired.

4.4.3 RAM Interface Using One Local Strobe
299. source from the emulation cable pod. This signal can be used to drive the system test clock.

  Signal     Description                                             State
  TRST       Test reset                                              O / I
  EMU0       Emulation pin 0                                         I / I/O
  EMU1       Emulation pin 1                                         I / I/O
  PD (Vcc)   Presence detect. Indicates that the emulation cable     I / O
             is connected and that the target is powered up.
             PD should be tied to Vcc in the target system.
  TCK_RET    Test clock return. Test clock input to the emulator.    I / O
             May be a buffered or unbuffered version of TCK.
  GND        Ground

I = input; O = output.

Do not use pullup resistors on TRST: it has an internal pulldown device. In a low-noise environment, TRST can be left floating. In a high-noise environment, an additional pulldown resistor may be needed. (The size of this resistor should be based on electrical current considerations.)

Although you can use other headers, recommended parts include a straight, unshrouded header, DuPont Connector Systems part numbers 65610-114, 65611-114, 67996-114, and 67997-114.

11.2 Bus Protocol

The IEEE 1149.1 specification covers the requirements for the test access port (TAP) bus slave devices and provides certain rules, summarized as follows:

- The TMS/TDI inputs are sampled on the rising edge of the TCK signal of the device.
- The TDO output is clocked from the falling edge of the TCK signal of the device.

When these devices are daisy-chained together, the TDO of one device has approximately a half
300. Figure 9-3 shows the internal bus current requirement when transferring As followed by 5s for various transfer rates. Figure 9-4 shows the data dependence of the internal bus current requirement when the data is other than As followed by 5s. The trapezoidal region bounds all possible data values transferred. The lower line represents the scale factor for transferring the same data. The upper line represents the scale factor for transferring alternating data (all 0s to all Fs, or all As to all 5s, etc.).

The possible permutation of data values is quite large. The term relative data complexity refers to a relative measure of the extent to which data values are changing and the extent to which the number of bits are changing state. Therefore, relative data complexity ranges from 0, signifying minimal variation of data, to a normalized value of 1, signifying greatest data variation.

[Figure 9-4. Internal Bus Current Versus Data Complexity Derating Curve: normalized I_DD versus operation complexity (0 to 1)]

If a statistical knowledge of the data exists, Figure 9-4 can be used to determine the exact power supply requirement on the basis of internal bus usage. For example, Figure 9-4 indicates an 89.5% scale factor when all Fs (FFFF FFFFh) are moved internally every cycle with two accesses per cycle (80 Mbytes per second). Multiplying this scale factor
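The text does not give a formula for relative data complexity. One possible proxy, offered here only as an assumption and not as the measure used to derive Figure 9-4, is the fraction of bus bits that change state between consecutive 32-bit words. The C sketch below (function names are hypothetical) yields 0 for repeated data and 1 for fully alternating data such as AAAA AAAAh followed by 5555 5555h, matching the two ends of the range described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that are set in a 32-bit word. */
static unsigned popcount32(uint32_t w)
{
    unsigned n = 0;
    while (w) {
        w &= w - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Fraction of bus bits that change state between consecutive words:
 * 0.0 when every word repeats, 1.0 when every bit flips on every
 * transfer. An assumed proxy for relative data complexity. */
double toggle_fraction(const uint32_t *words, size_t count)
{
    unsigned long toggles = 0;
    if (count < 2)
        return 0.0;
    for (size_t i = 1; i < count; i++)
        toggles += popcount32(words[i] ^ words[i - 1]);
    return (double)toggles / (32.0 * (double)(count - 1));
}
```

A value computed this way only locates a data stream on the horizontal axis of a derating curve; the scale factor itself still has to be read from the figure.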
301. struction do not contribute an internal operations power supply current component. During an RPTS instruction, program fetch activity other than the instruction being repeated is suspended; therefore, power supply current is related only to the data operations performed by the instruction being executed.

[Figure 9-2. Internal and Quiescent Current Components: incremental I_DD (mA) versus f_clk (MHz), including the IDLE curve]

9.3.3 Internal Bus Operations

The internal bus operations include all operations that utilize the internal buses extensively, such as internal RAM accesses every cycle. No distinction is made between internal reads or writes, such as instruction or operand fetches from internal memory, because internally they are equal. Significant use of internal buses adds a data-dependent term to the equation for the power supply current requirement. Recall that switching requires more current. Hence, changing data at high rates requires higher power supply current.

Pipeline conflicts, use of cache, fetches from external wait-state memory, and writes to external wait-state memory all affect the internal and external bus cycles of an algorithm executing on the TMS320C4x. Therefore, you must determine the algorithm's internal bus usage in order to accurately calculate power supply current requirements. The TMS320C4x software simulator an
302. t In this example, an array of 64 elements is flipped over by exchanging the elements that are equidistant from the end of the array. In other words, if the original array is

  a(1), a(2), ..., a(31), a(32), ..., a(63), a(64)

then the final array after the rearrangement is

  a(64), a(63), ..., a(32), a(31), ..., a(2), a(1)

Because the exchange operation is performed on two elements at the same time, it requires 32 operations. The repeat counter (RC) is initialized to 31. In general, if RC contains the number N, the loop is executed N + 1 times. In the example, the loop begins at the fourth instruction following the RPTBD instruction and ends at the EXCH label. RC should not be initialized in the next three instructions following the RPTBD.

Example 2-8. Loop Using Delayed Block Repeat

*  TITLE LOOP USING DELAYED BLOCK REPEAT
*
*  THIS CODE SEGMENT EXCHANGES THE VALUES OF ARRAY ELEMENTS THAT ARE
*  SYMMETRIC AROUND THE MIDDLE OF THE ARRAY.
*
        LDI     31,RC           ; Initialize repeat counter
        RPTBD   EXCH            ; Repeat RC+1 times between START and EXCH
        LDI     @ADDR,AR0       ; AR0 points to beginning of array
        LDI     AR0,AR1
        ADDI    63,AR1          ; AR1 points to the end of the array
*  The loop starts here
START   LDI     *AR0,R0         ; Load one memory element in R0,
||      LDI     *AR1,R1         ; and the other in R1
EXCH    STI     R1,*AR0++(1)    ; Then exchange their locations
||      STI     R0,*AR1
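For readers more comfortable with C, the exchange performed by the delayed block repeat loop can be sketched as below. This is an illustrative equivalent under the assumption that one pointer walks up from the start of the array while the other walks down from the end, as AR0 and AR1 do in the assembly; the function name is hypothetical, and the code is not a translation produced by the C4x compiler.

```c
#include <stddef.h>

/* Reverse an array in place by swapping the elements that are
 * equidistant from the two ends. Returns the number of exchanges
 * performed, which is count / 2 (32 for a 64-element array). */
size_t reverse_in_place(int *a, size_t count)
{
    size_t swaps = 0;
    if (count < 2)
        return 0;

    int *lo = a;               /* plays the role of AR0 */
    int *hi = a + count - 1;   /* plays the role of AR1 */
    while (lo < hi) {
        int tmp = *lo;
        *lo++ = *hi;           /* store, then step toward the middle */
        *hi-- = tmp;
        swaps++;
    }
    return swaps;
}
```

For a 64-element array the loop body runs 32 times, which corresponds to loading RC with 31 in the assembly version.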
303. t 3 output fifo */
#define DST1_IDX  0x1          /* dst address increment */

/* 2nd transfer settings */
#define CTRLREG2  0x00c40005   /* DMA sends interrupt to CPU when transfer
                                  finishes (TC = 1), DMA-CPU rotating priority,
                                  and DMA stops after transfer completes */
#define SRC2      0x002ffe00   /* src address */
#define SRC2_IDX  0x4          /* src address increment */
#define COUNTER2  0x4          /* number of words to transfer */
#define DST2      0x002fff00   /* dst address */
#define DST2_IDX  0x1          /* dst address increment */

DMAUNIF *dma = (DMAUNIF *)DMAADDR;
DMAUNIF autoini1;
DMAUNIF autoini2;

main()
{
    /* initialize 1st set of autoinitialization values */
    autoini1.src     = (void *)SRC1;
    autoini1.src_idx = SRC1_IDX;
    autoini1.counter = COUNTER1;
    autoini1.dst     = (void *)DST1;
    autoini1.dst_idx = DST1_IDX;
    autoini1.linkp   = &autoini2;
    autoini1.ctrl    = (void *)CTRLREG1;

    /* initialize 2nd set of autoinitialization values */
    autoini2.src     = (void *)SRC2;
    autoini2.src_idx = SRC2_IDX;
    autoini2.counter = COUNTER2;
    autoini2.dst     = (void *)DST2;
    autoini2.dst_idx = DST2_IDX;
    autoini2.ctrl    = (void *)CTRLREG2;

    /* initialize DMA link pointer, pointing to 1st set of autoinit values */
    dma->linkp   = &autoini1;
    dma->counter = 0;
    dma->ctrl    = (volatile void *)CTRLREG1;

    /* wait for DMA to finish transfer */
    PRIM_WAIT_DMA((volatile int *)dma);
}

/*
 * Unified Mode: Autoinitialization (method 1).
 *   DMA0 in unified mode transfers 8 words from 0x02ffc00 (index 1) to
 *   0x02ffd00 (index 1) and the
304. te 2 To enable the open-collector driver and pullup resistor on EMU1 to provide rising/falling edges of less than 25 ns, the modification shown in this figure is suggested. Rising edges slower than 25 ns can cause the emulator to detect false edges during the RUNB command or when the external counter selected from the debugger analysis menu is used.

These seven important points apply to the circuitry shown in Figure 11-9 and the timing shown in Figure 11-10:

- Open-collector drivers isolate each board. The EMU0/1 pins are tied together on each board.
- At the board edge, the EMU0/1 signals are split to provide IN/OUT. This is required to prevent the open-collector drivers from acting as a latch that can be set only once.
- The EMU0/1 signals are bused down the backplane. Pullup resistors are installed as required.
- The bused EMU0/1 signals go into a PAL device, whose function is to generate a low pulse on the EMU0/1-IN signal when a low level is detected on the EMU0/1-OUT signal. This pulse must be longer than one TCK period to affect the devices but less than 10 µs to avoid possible conflicts or retriggering once the emulation software clears the device's pins.
- During a RUNB debugger command or other external analysis count, the EMU0/1 pins on the target device become totem-pole outputs. The EMU1 pin is a ripple carry out
305. ted to determine the maximum system test clock frequency.

Case 1: Single processor, direct connection, TMS/TDI timed from TCK_RET low.

$$t_{pd(TCK\_RET-TMS/TDI)} = \frac{t_{d(TMSmax)} + t_{su(TTMS)}}{t_{TCKfactor}} = \frac{20\ \text{ns} + 10\ \text{ns}}{0.4} = 75\ \text{ns}\ (13.3\ \text{MHz})$$

$$t_{pd(TCK\_RET-TDO)} = \frac{t_{d(TTDO)} + t_{su(TDOmin)}}{t_{TCKfactor}} = \frac{15\ \text{ns} + 3\ \text{ns}}{0.4} = 45\ \text{ns}\ (22.2\ \text{MHz})$$

In this case, the TCK_RET-to-TMS/TDI path is the limiting factor.

Case 2: Single/multiprocessor, TMS/TDI/TCK buffered input, TDO buffered output, TMS/TDI timed from TCK_RET low.

$$t_{pd(TCK\_RET-TMS/TDI)} = \frac{t_{d(TMSmax)} + t_{su(TTMS)} + t_{bufskew}}{t_{TCKfactor}} = \frac{20\ \text{ns} + 10\ \text{ns} + 1.35\ \text{ns}}{0.4} = 78.4\ \text{ns}\ (12.7\ \text{MHz})$$

$$t_{pd(TCK\_RET-TDO)} = \frac{t_{d(TTDO)} + t_{su(TDOmin)} + t_{d(bufmax)}}{t_{TCKfactor}} = \frac{15\ \text{ns} + 3\ \text{ns} + 10\ \text{ns}}{0.4} = 70\ \text{ns}\ (14.3\ \text{MHz})$$

In this case, the TCK_RET-to-TMS/TDI path is the limiting factor.

In a multiprocessor application, it is necessary to ensure that the EMU0/1 lines can go from a logic low level to a logic high level in less than 10 µs. This can be calculated as follows:

$$t_r = 5\,(R_{pullup} \times N_{devices} \times C_{load\_per\_device}) = 5\,(4.7\ \text{k}\Omega \times 16 \times 15\ \text{pF}) = 5.64\ \mu\text{s}$$

11.7 Connections Between the Emulator and the Target System

It is extremely important to provide high-quality signals between the emulator and the JTAG target system. Depending upon the si
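The timing calculations above follow one pattern: add the delays on a path, divide by the TCK factor to get the minimum clock period, and invert to get the maximum test clock frequency. The C sketch below captures that pattern together with the EMU0/1 rise-time estimate. The function and parameter names are hypothetical; the numeric values to pass in are the ones quoted in the text.

```c
/* Minimum TCK period (ns) for one path: the sum of the path delays
 * divided by the fraction of the clock period available to the path
 * (the TCK factor, 0.4 in the text). Pass 0 for extra_ns when the
 * path has no buffer term. */
double min_tck_period_ns(double delay_ns, double setup_ns,
                         double extra_ns, double tck_factor)
{
    return (delay_ns + setup_ns + extra_ns) / tck_factor;
}

/* Maximum test clock frequency (MHz) for a given minimum period (ns). */
double max_tck_mhz(double period_ns)
{
    return 1000.0 / period_ns;
}

/* EMU0/1 rise time in seconds, taken as five RC time constants of the
 * pullup resistor against the combined load of all devices. */
double emu_rise_time_s(double r_pullup_ohm, unsigned n_devices,
                       double c_load_per_device_farad)
{
    return 5.0 * r_pullup_ohm * (double)n_devices * c_load_per_device_farad;
}
```

Evaluating both paths for a given configuration and taking the longer period gives the limiting factor; in both cases above that is the TCK_RET-to-TMS/TDI path. The rise-time result should stay below the 10-µs limit stated in the text.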
306. ter A 0x00c00000 amp 0x03000000 amp 0x03c00000 amp x 0x00800000 x 0x02000000 x 0x02800000

Chapter 8 Using the Communication Ports

The C4x communication ports are very high-speed data transmission circuits. Their speed and the close proximity of multiple data lines create special challenges. General design rules that are applicable to high-speed (< 10 ns) memory interface design are appropriate for C4x communication port interconnections. This chapter provides guidelines for designing communication port interfaces.

Topic
8.1  Communication Ports
8.2  Signal Considerations
8.3  Interfacing With a Non-C4x Device
8.4  Terminating Unused Communication Ports
8.5  Design Tips
8.6  Commport to Host Interface
8.7  An I/O Coprocessor-C4x Interface
8.8  Implementing a Token Forcer
8.9  Implementing a CSTRB Shortener Circuit
8.10 Parallel Processing Through Communication Ports
8.11 Broadcasting Messages From One C4x to Many C4x Devices
308. the BBS, refer to the TMS320 Family Development Support Reference Guide (literature number SPRU011).
10.1.4 Internet Services
Texas Instruments offers two Internet-accessible services for DSP support: an ftp site and a www site.
- World wide web. Point your browser at http://www.ti.com to access TI's web site. At the site, you can follow links to find product information, online literature, an online lab, and the 320 Hotline online.
- FTP. Use anonymous ftp to ti.com (Internet port address 192.94.94.1) to access copies of the files found on the BBS. The BBS files are located in the subdirectory called mirrors.
10.1.5 Technical Training Organization (TTO) TMS320 Workshops
C4x DSP Design Workshop. This workshop is tailored for hardware and software design engineers and decision makers who will be designing and utilizing the C4x generation of DSP devices. Hands-on exercises throughout the course give participants a rapid start in developing C4x design skills. Microprocessor/assembly language experience is required. Experience with digital design techniques and C language programming experience is desirable. These topics are covered in the C4x workshop:
- C4x architecture/instruction set
- Use of the PC-based software simulator
- Use of the C3x/C4x assembler/linker
- C programming environment
- System architecture considerations
- Memory and I/O interfacing
- Development support
309. the JTAG emulation scanning. The SPL is capable of adding any combination of its four secondary scan paths into the main scan path. A system of multiple secondary JTAG scan paths has better fault tolerance and isolation than a single scan path. Since an SPL has the capability of adding all secondary scan paths to the main scan path simultaneously, it can support global emulation operations, such as starting or stopping a selected group of processors.
TI emulators do not support the nesting of SPLs (for example, an SPL connected to the secondary scan path of another SPL). However, you can have multiple SPLs on the main scan path.
Although the ACT8999 scan path selector is similar to the SPL, it can add only one of its secondary scan paths at a time to the main JTAG scan path. Thus, global emulation operations are not assured with the scan path selector. For this reason, scan path selectors are not supported.
You can insert an SPL on a backplane so that you can add up to four device boards to the system without the jumper wiring required with nonbackplane devices. You connect an SPL to the main JTAG scan path in the same way you connect any other device. Figure 11-8 shows you how to connect a secondary scan path to an SPL.
Figure 11-8. Connecting a Secondary JTAG Scan Path to an SPL (figure not reproduced)
310. the device through all of the VDD inputs and returned through the VSS connections. Note that numerous VDD and VSS pins on the device are routed to a variety of internal connections, not all of which are common. Externally, however, all of these pins should be connected in parallel to 5-V and ground planes providing very low impedance.
As mentioned previously, because of the inherent differences in operations between program segments, it is usually appropriate to consider current for each of the segments independently. In this way, peak current requirements are readily obtained. Further, you can make average current calculations to use in determining heating effects of power dissipation. These effects, in turn, can be used to determine thermal management considerations.
Combining Supply Current Due to All Components
To determine the total supply current requirements for any given program activity, calculate each of the appropriate components and combine them in the following sequence:
1. Start with the 130 mA quiescent current requirement.
2. Add 60 mA for internal operations unless the device is dormant, such as when executing IDLE or using an RPTS instruction to perform internal and/or external bus operations (see the Internal Operations section on page 9-7). Internal or external bus operations executed via RPTS do not contribute an internal operations power supply current component. Therefore, current components in the next two steps may still b
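The first two steps of the sequence above can be sketched in Python. This is only an illustrative sketch: the function name and the `dormant` flag are assumptions made for the example, and the bus-operation components from the later steps are left out because they are not given in this excerpt.

```python
# Values quoted in the text above (milliamps).
QUIESCENT_MA = 130      # step 1: quiescent current requirement
INTERNAL_OPS_MA = 60    # step 2: internal operations component

def base_supply_current_ma(dormant):
    """Sum of steps 1 and 2 only.

    `dormant` is a hypothetical flag: True when the device is executing IDLE
    or an RPTS-driven bus operation, in which case the internal operations
    component is not added.
    """
    return QUIESCENT_MA + (0 if dormant else INTERNAL_OPS_MA)

print(base_supply_current_ma(dormant=False))
print(base_supply_current_ma(dormant=True))
```

Any bus-operation current from the remaining steps would be added on top of this base figure.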
311. ther memory or an extended-precision register. The destination for both conversions must be in an extended-precision register. In the case of block memory transfer, the no-penalty data format conversion can be executed by parallel instruction with STF. Example 3-13 and Example 3-14 show the data format conversion within the data transformation between communication port and internal RAM.
Example 3-13. IEEE-to-C4x Conversion Within Block Memory Transfer
* TITLE IEEE TO C4x CONVERSION WITHIN BLOCK MEMORY TRANSFER
*
* PROGRAM ASSUMES THAT INPUT FIFO OF COMMUNICATION PORT 0 IS FULL OF
* IEEE FORMAT DATA. EIGHT DATA WORDS ARE TRANSFERRED FROM COMMUNICATION
* PORT 0 TO INTERNAL RAM BLOCK 0 AND THE DATA FORMAT IS CONVERTED FROM
* IEEE FORMAT TO C4x FLOATING-POINT FORMAT
*
        LDI     @CP0_IN,AR0     ; Load comm port0 input FIFO address
        LDI     @RAM0,AR1       ; Load internal RAM block 0 address
        FRIEEE  *AR0,R0         ; Convert first data
        RPTS    6
        FRIEEE  *AR0,R0         ; Convert next data
||      STF     R0,*AR1++(1)    ; Store previous data
        STF     R0,*AR1++(1)    ; Store last data
Example 3-14. C4x-to-IEEE Conversion Within Block Memory Transfer
312. tional high-speed protocol used in the C4x communication ports, signal quality is extremely important. Poor-quality signals can potentially cause both ends of a communication port link to become a master. If this occurs and one communication port drives a signal request, no response is received from the other communication port, and the link hangs. This condition remains until both C4x devices are reset. If this is not corrected, the communication port drivers can be damaged.
If poor-quality signals are a problem, use circuits to improve impedance matching. Because the C4x communication port output buffer impedance can change during signal switching, a conventional parallel termination does not help. Serial matching resistors can be added at each end of all communication port lines (see Figure 8-1). Serial resistors help match the output buffer impedance to the line impedance and protect against signal contention caused by any potential fault condition. The resistor value plus buffer output impedance should match the line impedance. Results have shown that a lower-than-optimal serial resistor value provides better performance. A resistor value of 22 to 33 Ω is usually a reasonable start. Some experimentation may be needed to reduce ringing effects. A good received signal should have an undershoot of 0.5 to 1.0 V or less. A resistor value that is too high results in an under damped falling edge that does not cross the zero logic level and sh
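As a rough illustration of the matching rule (series resistor plus buffer output impedance should equal the line impedance), the sketch below computes the ideal series value and then picks a candidate at or below it, reflecting the observation that a lower-than-optimal value tends to work better. The 50 Ω line, the 20 Ω buffer impedance, and the candidate resistor list are hypothetical numbers chosen for the example, not C4x specifications.

```python
# Hypothetical candidate values spanning the suggested 22 to 33 ohm starting range.
CANDIDATES_OHMS = (22.0, 27.0, 33.0)

def ideal_series_resistor(line_ohms, buffer_ohms):
    """Series value that makes resistor + buffer impedance equal the line impedance."""
    return max(line_ohms - buffer_ohms, 0.0)

def starting_resistor(line_ohms, buffer_ohms, candidates=CANDIDATES_OHMS):
    """Largest candidate not above the ideal value; the smallest one if none qualifies."""
    ideal = ideal_series_resistor(line_ohms, buffer_ohms)
    at_or_below = [r for r in candidates if r <= ideal]
    return max(at_or_below) if at_or_below else min(candidates)

# Assumed example: 50 ohm line, 20 ohm buffer output impedance.
print(starting_resistor(50.0, 20.0))
```

The result is only a starting point; as the text notes, some experimentation on the actual board may still be needed to reduce ringing.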
313. tions With the TMS320 Family.
C4x features that increase efficient implementation of numerically intensive algorithms are particularly well suited for FFTs. The high speed of the C4x (40-ns cycle time) makes the implementation of real-time algorithms easier, while the floating-point capability eliminates the problems associated with dynamic range. The powerful indexing scheme in indirect addressing facilitates the access of FFT butterfly legs that have different spans. The repeat block implemented by the RPTB or RPTBD instruction reduces the looping overhead in algorithms heavily dependent on loops, such as the FFTs. This gives the efficiency of in-line coding with the form of a loop.
Since the output of the FFT is in scrambled (bit-reversed) order when the input is in regular order, it must be restored to the proper order. This rearrangement does not require extra cycles. The device has a special form of indirect addressing (bit-reversed addressing mode) that can be used when the FFT output is needed. The C4x can implement the bit-reversed addressing mode on either the CPU or DMA. This mode makes it possible to access the FFT output in the proper order. If the DMA transfer with bit-reversed addressing mode is used, there is no overhead for data input and output.
There are several types of FFT examples in this section:
- Radix-2 and radix-4 algorithms, depending on the size of the FFT butterfly
- Decimation in time or frequency (DIT
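To make the bit-reversed ordering concrete, here is a small software sketch in Python. It does not model the C4x bit-reversed addressing hardware; it only shows which index each output position maps to, and the helper names are assumptions made for the example.

```python
def bit_reverse(index, bits):
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def reorder(data):
    """Read a power-of-two-length sequence in bit-reversed index order."""
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]

# For an 8-point transform, scrambled position k holds element bit_reverse(k, 3).
print(reorder(list(range(8))))
```

Applying `reorder` twice returns the original order, which is why the same access pattern can be used either to scramble or to unscramble the data.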
314. to All Components
9.5.2 Supply Voltage, Operating Frequency, and Temperature Dependencies
9.5.3 Design Equation
9.5.4 Average Current
9.5.5 Thermal Management Considerations
9.6 Example Supply Current Calculations
9.6.1 Processing
9.6.2 Data Output
9.6.3 Average Current
9.6.4 Experimental Results
9.7 Design Considerations
9.7.1 System Clock and Signal Switching Rates
9.7.2 Capacitive Loading of Signals
9.7.3 DC Component of Signal Loading
10 Development Support and Part Order Information
Describes C4x support available from TI and third-party vendors.
10.1 Development Support
10.1.1 Third-Party Support
10.1.2 The DSP Hotline
10.1.3 The Bulletin Board Service (BBS)
10.1.4 Internet Services
10.1.5 Technical Training Organization (TTO) TMS320 Workshops
10.2 SOCKETS MD
315. to the bus (CE, AE, DE enabled) and has had enough time to complete its access. Hence, with bus enable and status signals, the C4x flexible bus interfaces easily implement high-speed shared-bus configurations.
4.6.2 Shared Memory Interface Design Example
For an example of a C4x shared memory interface, see the TMS320C4x Parallel Processing Development System Technical Reference (SPRU075). In the example in that text, four C4x devices share SRAM with their global buses tied together. A bus arbitrator, implemented as a programmable logic device, provides a fair scheme for processor access to the shared bus. The design uses high-speed parts but employs a fully asynchronous handshake protocol that allows C4x devices of various speeds, and also processors other than C4x devices, to be added to this bus configuration.
The shared memory interface in the PPDS works for C4x devices running at a speed of up to 32 MHz. For higher speeds, the arbitrator incorrectly takes away bus master privileges from a C4x between back-to-back reads to the same page (the page size is determined by the page size field in the global bus control register). The default page size for the PPDS global memory is 64K. If this occurs while two or more C4x devices are requesting the bus to perform write cycles, random shared memory locations can be corrupted. To fix this problem for higher speeds, the bus-enable signal of each C4x local interface can be used to g
316. tuation, you must supply the correct signal buffering, test clock inputs, and multiple-processor interconnections to ensure proper emulator and target system operation. Signals applied to the EMU0 and EMU1 pins on the JTAG target device can be either input or output (I/O). In general, these two pins are used as both input and output in multiprocessor systems to handle global run/stop operations. EMU0 and EMU1 signals are applied only as inputs to the XDS510 emulator header.
11.7.1 Buffering Signals
If the distance between the emulation header and the JTAG target device is greater than six inches, the emulation signals must be buffered. If the distance is less than six inches, no buffering is necessary. The following illustrations depict these two situations.
- No signal buffering. In this situation, the distance between the header and the JTAG target device should be no more than six inches.
(Figure not reproduced: emulator header wired directly to the JTAG device over 6 inches or less, showing the EMU0, EMU1, TRST, TMS, TDI, TDO, TCK, TCK_RET, PD, VCC, and GND connections.)
The EMU0 and EMU1 signals must have pullup resistors connected to VCC to provide a signal rise time of less than 10 µs. A 4.7 kΩ resistor is suggested for most applications.
- Buffered transmission signals. In this situation, the distance between the emulation header and the processo
317. umber of bits of accuracy in the mantissa doubles. Using RSQRF, accuracy starts at eight bits. With one iteration, accuracy increases to 16 bits, and with the second iteration, accuracy increases to 32 bits in the mantissa. Example 3-9 shows the program for implementing this algorithm on the C4x.
Example 3-9. Reciprocal of the Square Root of a Positive Floating Point
* TITLE RECIPROCAL OF THE SQUARE ROOT OF A POSITIVE FLOATING POINT
*
* SUBROUTINE RCPSQRF
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE COMPUTATION
* IS COMPLETED, 1/SQRT(v) IS STORED IN R1.
*
* TYPICAL CALLING SEQUENCE:
*       LDF   v,R0
*       LAJU  RCPSQRF
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   R0       | v = NUMBER TO FIND THE RECIPROCAL OF (UPON THE CALL)
*   R1       | 1/sqrt(v) (UPON THE RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R1, R2
* REGISTER CONTAINING RESULT: R1
* REGISTER FOR SUBROUTINE CALL: R11
*
* CYCLES: 10 (not including subroutine overhead)
* WORDS: 10 (not including subroutine overhead)
*
        .global RCPSQRF
RCPSQRF RSQRF   R0,R1           ; Get x[0], the estimate of 1/sqrt(v); R0 = v
        MPYF    0.5,R0          ; R0 = v/2
        MPYF3   R1,R1,R2        ; First iteration
        MPYF    R0,R
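The way accuracy doubles per pass can be tried out with a short Python sketch. It assumes the refinement is the usual Newton-Raphson step for 1/sqrt(v), x[n+1] = x[n] * (1.5 - (v/2) * x[n]^2), which matches the v/2 and "first iteration" comments in the listing but is not spelled out in this excerpt. `rough_rsqrt` is a hypothetical stand-in for the RSQRF seed, not a model of the instruction itself.

```python
import math

def rough_rsqrt(v, bits=8):
    """Hypothetical stand-in for RSQRF: 1/sqrt(v) with the mantissa cut to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / math.sqrt(v))
    scale = 1 << bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)

def rcp_sqrt(v, iterations=2):
    """Refine the coarse seed with Newton-Raphson steps (assumed form of the iteration)."""
    x = rough_rsqrt(v)
    half_v = 0.5 * v
    for _ in range(iterations):
        x = x * (1.5 - half_v * x * x)
    return x

# Relative error after 0, 1, and 2 iterations for v = 2.0.
for n in range(3):
    print(n, abs(rcp_sqrt(2.0, n) * math.sqrt(2.0) - 1.0))
```

Each pass roughly squares the relative error, which is the behavior the text describes as the number of accurate mantissa bits doubling.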
318. unter of first loop
        ADDI 2 IR1 R9 R9 JT
        LSH 1 BK BK N2
* OUTER LOOP
LOOP    LDI   INPUTP AR0        ; AR0 points to X(I)
        SUBI3 1 R8 RC           ; RC should be one less than desired
        ADDI  BK AR0 AR1        ; AR1 points to X(I1)
        RPTBD BLK1              ; Setup loop BLK1
        ADDI  BK AR1 AR2        ; AR2 points to X(I2)
        ADDI  BK AR2 AR3        ; AR3 points to X(I3)
        LDF   AR1 R0            ; R0 = Y(I1)
* FIRST LOOP BLK1
        ADDF  R0 AR3 R3         ; R3 = Y(I1) + Y(I3)
        ADDF  AR0 AR2 R1        ; R1 = Y(I) + Y(I2)
        ADDF  R3 R1 R6          ; R6 = R1 + R3
        SUBF  AR2 AR0 R4        ; R4 = Y(I) - Y(I2)
        LDF   AR2 R5            ; R5 = X(I2)
        STF   R6 AR0            ; Y(I) = R1 + R3
        SUBF  R3 R1 R1          ; R1 = R1 - R3
        ADDF  AR3 AR1 R3        ; R3 = X(I1) + X(I3)
        ADDF  R5 AR0 R1         ; R1 = X(I) + X(I2)
        STF   R1 AR1            ; Y(I1) = R1 - R3
        ADDF  R3 R1 R6          ; R6 = R1 + R3
        SUBF  R5 AR0 R2         ; R2 = X(I) - X(I2)
        STF   R6 AR0 IR0        ; X(I) = R1 + R3
        SUBF  R3 R1 R1          ; R1 = R1 - R3
        SUBF  AR3 AR1 R6        ; R6 = X(I1) - X(I3)
        SUBF  R0 AR3 R3         ; R3 = -Y(I1) + Y(I3)
        STF   R1 AR1 IR0        ; X(I1) = R1 - R3
        SUBF  R6 R4 R5          ; R5 = R4 - R6
        ADDF  R6 R4 R4          ; R4 = R4 + R6
        STF   R5 AR2            ; Y(I2) = R4 - R6
        STF   R4 AR3            ; Y(I3) = R4 + R6
        SUBF  R3 R2 R5          ; R5 = R2 - R3
        ADDF  R3 R2 R2          ; R2 = R2 + R3
        STF   R2 AR3 IR0        ; X(I3) = R2 + R3
BLK1    STF   R5 AR2 IR0        ; X(I2) = R2 - R3
        LDF   AR1 R0            ; R0 = Y(I1)
* IF THIS IS THE LAST STAGE, YOU ARE DONE
        CMPI  IR1 R8
        BZD   ENDB
Example 6-14. Complex Radix-4 DIF FFT (Continued)
* MAIN INNER LOOP
        LDI   1 R10
        LDI   2 R11
        LDI   R11 AR0
        ADDI  INPUTP AR0
        ADDI  2 R11
319. ure interrupt service routines save DP pointer
Example 6-18. Real Inverse Radix-2 FFT (Continued)
FP              .set    AR3
                .global TEBE EL
                .global STARTB, ENDB
FFT_SIZE        .usect  ".ifftdat",1
LOG_SIZE        .usect  ".ifftdat",1
SOURCE_ADDR     .usect  ".ifftdat",1
DEST_ADDR       .usect  ".ifftdat",1
SINE_TABLE      .usect  ".ifftdat",1
BIT_REVERSE     .usect  ".ifftdat",1
SEPARATION      .usect  ".ifftdat",1
Stack: 12 words
BENCHMARKS
Assumptions:
- Program in RAM0
- Reserved data in RAM0
- Stack on Local/Global Bus RAM
- Sine/Cosine tables in RAM0
- Processing and data destination in RAM1
- Local/Global Bus RAM, 0 wait state
FFT Size    Bit Reversing    Data Source    Cycles (C30)
1024        OFF              RAM1           25120 (approx.)
Note: This number does not include the C-callable overheads. This benchmark is the number of cycles between labels STARTB and ENDB.
NOTE: If ffttxt is located in external SRAM, enable cache for faster performance.
Initialize C Function
320. uring changes in WE; this avoids undesired memory accesses while the address changes. The C4x ensures this glueless interface because LSTRB always frames changes in LR/W.
4.4.4 RAM Interface Using Both Local Strobes
Figure 4-6 shows the C4x's local bus interfaced to HM6708 20-ns 64K x 4-bit CMOS static RAMs with zero wait states, using CS-controlled write cycles.
These RAMs are arranged to allow 128K 32-bit words of local memory, which are implemented as two 64K x 32-bit banks. One bank is controlled by each of the two sets of control signals on the local bus. To map these memory devices properly in the C4x's memory space, you must use the local memory interface control register (LMICR) to define which part of the local bus's memory space is mapped to each of the two strobes. In this implementation, with internal ROM disabled, LSTRB0 is mapped to the first 64K words of the local space (addresses 0h through 0FFFFh), and LSTRB1 is mapped to the rest of the local space (addresses 10000h through 7FFF FFFFh). For this memory configuration, the LSTRB ACTIVE field of the local memory interface control register (LMICR) should be set to 01111 (binary). Also, each LSTRB requires only one page. The PAGESIZE field of the LMICR should be set to 01111 (binary). Note that in Figure 4-6, the LRDY inputs are tied low, selecting zero wait states for all accesses on the local bus. Hence, through the use o
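The address split described above can be expressed as a small lookup. This Python sketch only mirrors the two ranges quoted in the text (0h through 0FFFFh and 10000h through 7FFF FFFFh); the function and constant names are assumptions made for the example, and the sketch does not model the LMICR itself.

```python
LSTRB0_LAST = 0x0000FFFF   # last address of the first 64K words
LOCAL_LAST = 0x7FFFFFFF    # last local-space address quoted in the text

def local_strobe(address):
    """Return which local strobe frames an access in the configuration above."""
    if not 0 <= address <= LOCAL_LAST:
        raise ValueError("address is outside the local space described here")
    return "LSTRB0" if address <= LSTRB0_LAST else "LSTRB1"

for addr in (0x0, 0xFFFF, 0x10000, 0x7FFFFFFF):
    print(hex(addr), local_strobe(addr))
```

With two 64K banks fitted, only the first 64K words of the LSTRB1 range would be backed by RAM in this design.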
321. utput; lower real butterfly input; double group count; half butterfly count; clear LSB; half step from upper to lower real part; step from old imaginary to new real value; dummy load, only for address update; r7 COS; r6 SIN; r1 BI SIN; dummy addf for counter update; r0 BR COS; Setup for loop bfly1; r3 TR r0 r1; r0 BR SIN; r1 BI COS; r2 AR TR; r5 AR TR; BR r2
Example 6-15. Faster Version Complex Radix-2 DIT FFT (Continued)
mpyf ar1 r6 r5 r5 BI SIN AR r5 stf r5 ar2 subf el 40 12 r2 TI r0 r1 mpyf ar1 r7 r0 P0 BR COS r3 AI TI addf r2 ar r3 subf r2 ar0 r4 r4 AI TI BI r3 stf r3 ar3 addf 0 E57 ES r3 TR r0 r5 mpyf ar1 r6 r0 r0 BR SIN r2 AR TR subf r3 ar0 r2 mpyf ar1 r7 r1 r1 BI COS AI r4 stf r4 ar2 bfly1 addf ar0 r3 r5 r5 AR TR BR r2 stf r2 ar3
switch over to next group
subf Fi FO e2 r2 TI r0 r1 addf r2 ar0 r3 r3 AI TI AR r5 stf r5 ar2 subf r2 ar0 ir1 r4 r4 AI TI BI r3 stf r3 ar3 ir1 nop ar1 ir1 address update mpyf ar1 r7 r1 r1 BI COS AI r4 sti r4 ar2 ir1 mpyf ar1 r6 r0 r0 BR SIN ldi ar5 rc rptbd bfly2 Setup for loop bfly2 mpyf ar ar1 r
322. w pulse on the RESET pin for 100 to 200 ms. Once a proper reset pulse has been applied, the processor fetches the reset vector from location zero, which contains the address of the system initialization routine. Figure 1-1 shows a circuit that will generate an appropriate power-up or push-button reset signal.
Figure 1-1. Reset Circuit (figure not reproduced: a 5-V supply, R1 = 100 kΩ, C1 = 4.7 µF, and a 74ALS34 buffer driving the TMS320C4x RESET pin)
The voltage on the RESET pin is controlled by the R1C1 network. After a reset, this voltage rises exponentially according to the time constant R1C1, as shown in Figure 1-2. In Figure 1-1, the 74ALS34 provides a clean RESET signal to the C4x.
Figure 1-2. Voltage on the RESET Pin (figure not reproduced: voltage versus time, rising from 0 toward VCC and reaching V1 at time t1)
The duration of the low pulse on the RESET pin is approximately t1, which is the time it takes for the capacitor C1 to be charged to 1.5 V. This is approximately the voltage at which the reset input switches from a logic 0 to a logic 1. The capacitor voltage is expressed as
$V = V_{CC}\left(1 - e^{-t/\tau}\right)$   (5)
where $\tau = R_1 C_1$ is the reset circuit time constant. Solving (5) for t results in
$t = -R_1 C_1 \ln\left(1 - \frac{V}{V_{CC}}\right)$   (6)
Setting the following: R1 = 100 kΩ, C1 = 4.7 µF, VCC = 5 V, V = V1 = 1.5 V, results in t = 167 ms. Therefore, the reset circuit of Figure 1-1 provides a low pulse for a lon
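The reset-pulse estimate above can be checked numerically. The Python sketch below evaluates the time for an RC network charging toward the supply to reach the switching threshold, using the component values quoted in the text; the function name is an assumption made for the example.

```python
import math

def reset_low_time(r_ohms, c_farads, vcc, v_threshold):
    """Time for a capacitor charging toward vcc through r_ohms to reach v_threshold."""
    tau = r_ohms * c_farads                      # reset circuit time constant
    return -tau * math.log(1.0 - v_threshold / vcc)

# Values from the text: 100 kilohm, 4.7 microfarad, 5 V supply, 1.5 V threshold.
t1 = reset_low_time(100e3, 4.7e-6, 5.0, 1.5)
print(f"{t1 * 1e3:.1f} ms")
```

The result lands at roughly 0.17 s, in line with the 167 ms figure quoted above and inside the 100 to 200 ms window.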
323. wer dissipation values is short (on the order of a few seconds) in comparison to the package thermal characteristics time constant, use average power calculated in the same manner as average current described in the previous section. Otherwise, calculate maximum device temperature on the basis of the actual time required for the program segments involved. For example, if a particular program segment lasts for 7 minutes, the device essentially reaches thermal equilibrium due to the total power dissipation during the period of device activity.
Note that the average power should be determined by calculating the power for each program segment (including all considerations described above) and performing a time average of these values, rather than simply multiplying the average current by VDD, as determined in the previous subsection.
Calculate specific device temperature by using the TMS320C4x thermal impedance characteristics included in the TMS320C4x data sheet.
9.6 Example Supply Current Calculations
An FFT represents a typical DSP algorithm. The FFT code used in this calculation processes data in the RAM blocks. The entire algorithm consists mainly of internal bus operations and hence includes quiescent and, in general, internal operations. At the end of the processing, the results are written out on the global and local bus. Therefore, the algorithm exhibits a hi
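The time-averaging step described above amounts to a duration-weighted mean of per-segment power. The Python sketch below shows that calculation; the function name and the segment figures are made up for the example and are not measured C4x values.

```python
def average_power(segments):
    """Duration-weighted mean power.

    `segments` is a sequence of (power_watts, duration_seconds) pairs,
    one per program segment.
    """
    segments = list(segments)
    total_time = sum(duration for _, duration in segments)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(power * duration for power, duration in segments) / total_time

# Hypothetical example: 2 s at 1.0 W followed by 6 s at 0.5 W.
print(average_power([(1.0, 2.0), (0.5, 6.0)]))
```

Each segment's power would first be computed from that segment's own current, so the weighting happens on power rather than on a single averaged current.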
324. wing the end of the internal wait count. Also, the consecutive cycles must be from independently decoded areas of memory or from different pages in memory. Otherwise, the external ready generation logic may lose synchronization with bus cycles and generate improperly timed wait states.
4.5.2 ANDing of the Ready Signals (STRBx SWW = 11)
If the logical AND (electrical OR) of the wait count and external ready signals is selected, the later of the two signals will control the internal ready signal, but both signals must be asserted. Accordingly, external ready control must be implemented for each wait-state device, and the wait count ready signal must be enabled.
This feature is useful if devices in a system are equipped to provide a ready signal but cannot respond quickly enough to meet the C4x's timing requirements. If these devices normally indicate a ready condition and, when accessed, respond with a wait until they become ready, the logical AND of the two ready signals can be used to save hardware in the system. In this case, the internal wait counter can provide wait states initially, and then the external ready can provide wait states after the external device has had time to send a not-ready indication. The internal wait counter then remains ready until the external device also becomes ready, which terminates the cycle.
Additionally, the AND of the two ready signals can be used for extendin
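The interaction of the two ready sources can be illustrated with a toy cycle-by-cycle model in Python. This is a simplified sketch under stated assumptions: one sample of the external ready line per cycle, and a wait counter that is "ready" once the cycle index reaches the programmed count. It is not a timing-accurate model of the C4x bus.

```python
def cycles_until_ready(wait_count, external_ready):
    """Index of the first cycle in which both ready sources are asserted.

    wait_count     -- number of cycles the internal counter holds off (assumed model)
    external_ready -- per-cycle samples of the device's ready line (True = ready)
    Returns None if the access never completes within the samples given.
    """
    for cycle, ext in enumerate(external_ready):
        counter_ready = cycle >= wait_count
        if counter_ready and ext:          # logical AND of the two ready signals
            return cycle
    return None

# A slow device that looks ready at first, then signals not-ready, then ready.
samples = [True, True, False, False, True]
print(cycles_until_ready(2, samples))   # counter covers the device's slow response
print(cycles_until_ready(0, samples))   # without the counter, the stale "ready" ends the cycle early
```

The second call shows why the internal counter is needed here: without it, the access would terminate on the device's initial ready indication before the device has had time to signal not-ready.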
325. x instructions. See TOIEEE and FRIEEE instructions
CPU cycle, definition A-2
CPU registers, stack pointer (SP) 2-7
CSTRB shortener 8-17
cycle. See CPU cycle
D0-D31, definition. See LD0-LD31
data-address generation logic, definition. See program address generation logic
data-page pointer, definition A-2
debugger. See emulation
decode phase, definition A-2
dequeues, stack 2-9
development tools 10-2
device: nomenclature 10-10; part numbers 10-11
diagnostic applications 11-23
DIE. See DMA interrupt enable register
digital filters: FIR 6-7; IIR (see IIR filters); lattice 6-17
dimensions: 12-pin header 11-18; 14-pin header 11-12; mechanical, 14-pin header 11-12
division: floating point 3-9; integer 3-9
DMA 3-3, 3-6, 7-13, 7-14, 8-2: autoinitialization 7-6; C-programming examples 7-9; example 7-11, 7-12; interrupts, example 7-8; split-mode autoinitialization 7-15; split mode 7-6; unified mode 7-10
DMA autoinitialization 7-7
DMA channel, finished transfer 7-3
DMA controller. See DMA coprocessor
DMA coprocessor: array initialization 7-4; autoinitialization 7-7, example 7-8; definition A-3; interrupts 7-4; link pointer register, example 7-7; operation examples 7-4; programming 7-4; programming hints 7-2; split mode, example 7-5; transfer description 7-4
DMA interrupt enable register (DIE), definition A-3
DMA programming 7-2
DMA transfer 7-4; communication port 7-5
documentation 10-3
double precision, fixed point 3-17
DP. See data page pointe
326. y the exponent.
maskable interrupt: A hardware interrupt that can be enabled or disabled through software.
memory-mapped register: One of the on-chip registers mapped to addresses in memory. Some of the memory-mapped registers are mapped to data memory, and some are mapped to input/output memory.
MFLOPS: Millions of floating-point operations per second. A measure of floating-point processor speed that counts the number of floating-point operations made per second.
microcomputer mode: A mode in which the on-chip ROM is enabled. This mode is selected via the MP/MC pin. See also MP/MC pin; microprocessor mode.
microprocessor mode: A mode in which the on-chip ROM is disabled. This mode is selected via the MP/MC pin. See also MP/MC pin; microcomputer mode.
MIPS: Million instructions per second.
miss: A condition in which, when the processor fetches an instruction, it is not available in the cache.
MSB: Most significant bit. The highest-order bit in a word.
multiplier: A device that generates the product of two numbers.
NMI: See nonmaskable interrupt.
nonmaskable interrupt (NMI): A hardware interrupt that uses the same logic as the maskable interrupts but cannot be masked. It is often used as a soft reset.
overflow flag (OV) bit: A status bit that indicates whether or not an arithmetic operation has exceeded the capacity of the corresponding register.
PC: See program counter.
perip
327. ypercube
- Fully Connected Network: A more general-purpose structure.
According to memory interface, C4x parallel system architecture can be classified in three basic groups:
- Shared Memory Architecture: shares global memory among processors.
- Distributed Memory Architecture: each processor has its own private local memory. Interprocessor communication is via C4x communication ports.
- Shared and Distributed Memory Architecture: each processor has its own local memory but also shares a global memory with other processors.
Figure 8-8 shows examples of these basic groups.
8.11 Broadcasting Messages From One C4x to Many C4x Devices
Message broadcasting from one C4x to many C4x devices requires a simple interface. However, try to avoid signal analog delays caused by distance differences between the C4x master and the C4x slave processor. These delays could create bus contention in the CSTRB and CRDY lines.
Figure 8-9 shows the block diagram of a multiple-processor system. In this design, one C4x is the dedicated transmitter, and three C4x devices are dedicated receivers. No reset circuitry is needed because the transmitter is communication port 0 and the receivers are communication ports 3, 4, and 5. At reset, C4x communication ports 0, 1, and 2 are output ports, and communication ports 3, 4, and 5 are input ports.
328. yte. If this happens, the only solution is a C4x reset. Because at reset communication ports 0, 1, and 2 are transmitters and 3, 4, and 5 are receivers, a safe reset requires resetting of every C4x connected to the C4x with the faulty condition. Global reset becomes a necessity.
8.6 Commport to Host Interface
A host interface between a C4x commport and a PC's bidirectional printer port has many advantages, including freeing up the DSP bus and treating the host PC as a virtual C4x node within a system of C4x devices.
This interface uses a bidirectional PC printer port interface. Logic circuits, buffers, and resistors convert logic control levels driven from the printer port into C4x commport control signals. Signals driven from the C4x are converted into status signals, which can be polled in software by the PC. In addition, the PC's printer port provides the byte-wide data path into and out of the PC.
You can use this I/O interface for host data communication, bootloading, and debug operations. With proper buffering and software control, it is also possible to build long and reliable links. The speed is primarily dependent on the speed of the host. When using a PC as the host, the speed is limited by the PC's I/O channel speed. If higher rates are needed, use a memory

"TMS320C4x General-Purpose Applications
