
The SHARC in the C


Contents

1. 
    MOVE.L #4*3, D0          ; Offset scale to 12 bytes
    MOVE.L (A0, D0), D1
    I4 = data;               /* 21K */
    M4 = 3;
    R1 = dm(M4, I4);
Note that the offset of element data[3] from the start of the array data[] is 12 bytes on the 68K but three words on the 21K. (www.circuitcellar.com/online) You'll have to get deep into the 21K User Manual to discover the advantages of having three 21K words 12 bytes long under some circumstances and 16 bytes long under others.
These simple code sequences also indicate the differences in the coding conventions adopted for volatile register usage in the two development environments. When program flow requires a subroutine call, it is important that key values remain undisturbed in registers for reuse when the subroutine exits. Volatile registers can be used in a subroutine without saving every register to slow memory. With the SDS 68K compiler, there are two volatile data registers (D0 and D1) defined, and the 21K registers R0 and R1 have a similar usage. However, there are four additional SHARC volatile data registers: R2, R4, R8, and R12. This strange choice is not arbitrary; it meets the requirement to have volatile registers available to use with the 21K super scalar operations.
Both 68K and 21K coding conventions allow for two volatile address registers. However, while the 68K volatile address registers are the obvious A0 and A1, the 21K equivalent index registers are I4 and I12. This c
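The 12-byte versus three-word offset noted in excerpt 1 comes down to addressing granularity. The following is a minimal C sketch of that arithmetic, assuming a byte-addressed machine for the 68K case and modelling the 21K as a machine on which each 32-bit element occupies one address. The function names are hypothetical and do not come from the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-addressed model (68K-style): every 32-bit element spans
   sizeof(int32_t) consecutive addresses. */
size_t byte_addressed_offset(size_t index)
{
    return index * sizeof(int32_t);
}

/* Word-addressed model (21K-style, as an assumption of this sketch):
   every 32-bit element occupies exactly one address. */
size_t word_addressed_offset(size_t index)
{
    return index;
}
```

With index 3, the first function gives the scaled offset that the 68K code loads into D0, and the second gives the value that the 21K code loads into M4.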
2. 68K address registers (A0-A7). However, the similarity ends there. A main limitation of the 68K for DSP operations is the frequent conflicts between data fetches and instruction fetches on a single data bus. The 21K's Harvard architecture removes this problem by having both a program memory data bus for instructions and a data memory data bus. And the SHARC's large on-chip fast-access memory provides more speed.
Even with the Harvard architecture, there will be data/data conflicts when a large amount of data is being manipulated within a tight DSP loop. On the SHARC, this problem is overcome by storing instructions within an instruction cache and allowing data fetches to occur simultaneously down both the program memory data bus and the data memory data bus. This architectural feature is handled by special C extensions:
    int dm data[200];
    int pm coeffs[200];
The dm syntax indicates that the data[] array is stored in data memory for fast access through the data memory data bus. The pm syntax indicates that the coeffs[] array is stored in program memory for fast access through the program memory data bus.
Table 1 (fragment): 68K Processor. Integer data registers: D0-D7. Floating point registers. Subroutine return value. Subroutine parameters. Address registers. Index registers: I8-I15 (program memory). Modify registers. Length registers. Base registers. Volatile registers: I4, M4, I12, M12. Alternate r
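To see why excerpt 2 places the two arrays in different memories, consider a filter-style inner loop that needs one sample and one coefficient on every step. The sketch below compiles on an ordinary host only because it defines the dm and pm qualifiers away. On the SHARC compiler they are the real extensions the excerpt describes. TAPS, the array contents, and fir_sum() are illustrative assumptions, not code from the article.

```c
/* Host-only stand-ins: a standard C compiler has no dm/pm qualifiers,
   so this sketch erases them. This is an assumption for portability. */
#define dm
#define pm

#define TAPS 4

/* Samples would live in data memory, coefficients in program memory,
   so that both operands of each multiply can be fetched together. */
static int dm data[TAPS]   = { 1, 2, 3, 4 };
static int pm coeffs[TAPS] = { 4, 3, 2, 1 };

/* One sample fetch and one coefficient fetch per loop iteration. */
long fir_sum(void)
{
    long acc = 0;
    for (int k = 0; k < TAPS; k++) {
        acc += (long)data[k] * coeffs[k];
    }
    return acc;
}
```

On the host the two fetches are ordinary loads. The point of the sketch is only the access pattern: two independent operand streams per iteration, which is what the dual data buses are meant to serve.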
3. must be handled differently from a subroutine. In particular, volatile registers must be saved/recovered, and an RTI rather than an RTS instruction is needed at the end of the routine. During the 68K main() function, the starting address for the ISR routine must be placed at the appropriate location in the vector table. Then the interrupt must be activated.
The equivalent of these events must be present with the SHARC chip and compiler. However, the implementation details are different. Unlike the 68K with its interrupt vector table, the starting addresses of the SHARC interrupt service routines begin at a fixed location in memory. Each interrupt is provided with a fixed number of instructions within this area. (April 2000) For longer routines, a jump must occur to code elsewhere in memory.
There is no 21K #pragma interrupt preprocessor command to designate an ISR as something other than a subroutine. Instead, there are three different approaches within the SHARC C code that can be used to link an interrupt to a specific ISR routine. The following code links the IRQ1 interrupt with the C subroutine (or C-compatible assembly subroutine) DoSomething():
    #include <signal.h>
    interrupt(SIG_IRQ1, DoSomething);
The call interrupt() modifies a lookup table used to inform the interrupt handler that IRQ1 interrupts will require that all possible registers (Rx, Ix, Bx, Lx, Mx, and others) be saved to external m
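The registration pattern in excerpt 3, where a call at run time binds a handler to an event instead of a pragma marking the function, has a close analogue in standard hosted C. The sketch below uses the standard signal() and raise() functions, not the SHARC interrupt() call, and SIGINT stands in for SIG_IRQ1 purely as an assumption. The irq_count variable and install_and_trigger() are hypothetical names.

```c
#include <signal.h>

/* Counter updated by the handler; sig_atomic_t is the type the C
   standard allows a signal handler to write safely. */
static volatile sig_atomic_t irq_count = 0;

/* Plays the role of DoSomething(): an ordinary C function that the
   runtime calls when the event fires. */
static void DoSomething(int sig)
{
    (void)sig;
    irq_count++;
}

/* Bind the handler, simulate one event, and report how many times
   the handler ran. Returns -1 if the handler could not be installed. */
int install_and_trigger(void)
{
    if (signal(SIGINT, DoSomething) == SIG_ERR) {
        return -1;
    }
    raise(SIGINT);   /* software-generated event; the handler runs before raise() returns */
    return (int)irq_count;
}
```

As with the SHARC call described in the excerpt, the handler itself is written as a normal function. The register saving and the special return sequence are left to the runtime that dispatches it.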
4. the speed-improving hardware features impose restrictions that can't be handled through a standard C programming model.
In this article, I look at the assembly code of C and assembly code interfaces on Analog Devices' 21061 SHARC. These are compared to those found with the Software Development System's (SDS) Motorola 68K CISC C compiler. The SHARC assembly language is C-like in format, which makes the comparison relatively straightforward. Only C functions calling assembly code functions will be considered. There is little advantage in going in the opposite direction because the whole point of switching to assembly code from a C subroutine is speed.
REGISTER COMPARISON
You must understand the function and uses of the processor registers before trying to tackle a link between assembly code and C. The available registers (programmer's model) for the 68K and 21K processors are shown in Table 1. There are as many similarities as there are differences between these processors. The sixteen 21K data registers (R0-R15) have essentially the same functionality as the eight 68K data registers (D0-D7). However, there are many hidden differences. Th
Figure 2: This is the Data Address Generation Block (DAG1) on the SHARC 2106X DSP processor. The modulus logic in DAG1 permits hardware circular buffer operations to occur with zero overhead but poses problems for the C programmer.
is much wider than
5. up address modify constant */
    /* Calculate the sum */
    R1 = dm(I4, M4); R0 = R0 + R1;    /* Fetch and add */
    R1 = dm(I4, M4); R0 = R0 + R1;    /* Fetch and add */
    /* I4 is incremented and then adjusted by the circular buffer hardware to point back to VALUES[0] */
    R1 = dm(I4, M4); R0 = R0 + R1;    /* Fetch and add */
    /* I4 pointing to VALUES[1] */
prevent surprises, avoid introducing a C stack with circular buffer characteristics. Remember to keep length registers L6 and L7 at zero, especially when interrupts are active.
The final touch is to provide some C-specific instructions to the SHARC instruction set. CJUMP is used when getting into a C-callable subroutine and RFRAME when leaving. You'll have to search the SHARC Technical Reference Guide because these instructions are not detailed in the standard SHARC user manual.
The SHARC CJUMP instruction, together with other instructions hidden in the branch delay slots, is equivalent to the combined 68K instructions JSR with LINK #0, FP. The technical guide states, "CJUMP combines a direct or PC-relative jump with register transfer operations that save the frame and stack pointers." However, the registers are not saved on the stack. There are some unfortunate things happening with register R2 in this instruction, but this should not cause a problem if you remember that R2 is designated as a volatile register. Nothing useful should be sto
6. CIRCUIT CELLAR
In a recent project, Mike set out to develop DSP algorithms suitable for producing an improved sound stage for headphones. Using the Analog Devices 21061 SHARC, he modified the phase and amplitude of the audio signal before it is sent to the ear, thus creating virtual speakers that give the effect of listening via room speakers.
www.circuitcellar.com/online
FEATURE ARTICLE
Michael Smith
The SHARC in the C
In a recent project that I did, I became interested in developing DSP algorithms suitable for producing an improved sound stage for headphones. Listening to music through a standard headset (see Figure 1) leaves the listener with the impression that the music is inside their head, a different feeling than listening to the same music while using speakers. However, the sound stage of the headphones can be drastically changed if the phase and amplitude of the audio signal are modified before being sent to the ear. For example, the perceived position of a mono sound signal can be altered by modifying the relative time of arrival of the same sound at the left and right ear.
Creating the effect of a series of virtual speakers with room reverberation can be handled using extensive DSP techniques such as implementing a series of finite impulse response (FIR) and Comb filters (a cross between FIR and infinite-duration impulse (IIR) filters). Sampling
7. frequencies must be 44 to 48 kHz for good sound quality. A specialized architecture (DSP processor or sound card) is needed to process one sound bite before the next bite arrives.
For efficient code development, you need to make the appropriate language choice for the various system components. One line of debugged code takes roughly the same time and effort in any language. This means that, other things being equal, developing modules using assembly language should be avoided when higher-level languages are available.
There are a number of different components needed for this sound stage project. Standard C for the GUI interface is used to modify speaker and room characteristics. The DSP components are best handled as independent interrupts using hardware circular buffers and other custom memory addressing to take advantage of special processor architectural features. Setting up the hardware might require calling an assembly code routine directly from the higher-level language.
You must become aware of the interaction between assembly code and the C environment to handle coding in such an embedded environment. This interaction for a CISC processor was detailed in the article I
Figure 1: The sound source is perceived in the center of the head if exactly the same sound comes from both left and right earphones. If the sound is delayed before being sent to the left earphone, then the perceived posi
8. by permission. For subscription information, call (860) 875-2199, subscribe@circuitcellar.com, or www.circuitcellar.com/subscribe.htm.
9. e first three elements of a 10-element array.
C code segment:
    int values[10];
    int sum = 0;
    int *pt = values;
    for (count = 0; count < 3; count++)
        sum = sum + *pt++;
68K code segment:
    section data
    VALUES: DS.L 40
    section code
    MOVE.L #0, D0            ; Clear the sum variable
    MOVE.L #VALUES, A0       ; Set up the pointer to the array
                             ; Calculate the sum
    ADD.L (A0)+, D0          ; Fetch, add, and increment
    ADD.L (A0)+, D0
    ADD.L (A0)+, D0
                             ; A0 pointing to VALUES[3]
21K code segment:
    .segment/dm seg_dmda;
    .var VALUES[10];
    .endseg;
    .segment/pm seg_pmco;
    R0 = 0;                           /* Clear the sum variable */
    I4 = VALUES;                      /* Set up the pointer to the array */
    M4 = 1;                           /* Set up address modify constant */
                                      /* Calculate the sum */
    R1 = dm(I4, M4); R0 = R0 + R1;    /* Fetch and add */
Listing 2: There will be interesting surprises in the C code operation if the 21K circular buffer hardware is left activated.
C code segment:
    int values[10];
    int sum = 0;
    int *pt = values;
    for (count = 0; count < 3; count++)
        sum = sum + *pt++;
21K code segment:
    .segment/dm seg_dmda;
    .var VALUES[10];
    .endseg;
    .segment/pm seg_pmco;
    /* Somewhere else in the code, the 21K circular buffer hardware gets left activated */
    B4 = VALUES;
    L4 = 2;
    /* Same code as before */
    R0 = 0;                           /* Clear the sum variable */
    I4 = VALUES;                      /* Set up the pointer to the array */
    M4 = 1;                           /* Set
10. 
                                68K Processor     21K Processor
    Integer data registers      D0-D7             R0-R15
    Floating point registers    -                 F0-F15 (same as R registers)
    Subroutine return value     D0                R0
    Subroutine parameters       On the stack      R4, R8, R12
    Address registers           A0-A7             -
    Index registers             -                 I0-I7 (data memory), I8-I15 (program memory)
    Modify registers            -                 M0-M7, M8-M15
    Length registers            -                 L0-L7, L8-L15
    Base registers              -                 B0-B7, B8-B15
    Volatile registers          D0, D1, A0, A1    R0, R1, R2, R4, R8, R12, I4, M4, I12, M12
    Alternate register banks    -                 For Rx, Ix, Mx, Lx, Bx
    Frame pointer convention    A6                I6 (L6 = 0), M5, M6, M7
    C stack pointer             A7                I7 (L7 = 0), M5, M6, M7
Table 1: Programmer's model of the main registers on the 68K CISC processor and the 21K SHARC DSP processor.
At the assembly code level, the first bank of 21K Data Address Generator (DAG1) index registers (I0-I7) allows access to dm memory in parallel with access to the pm memory using the DAG2 index registers (I8-I15). Access to arrays stored in memory can be accomplished using either set of DAG registers because of the SHARC's onboard memory organization. This can be confusing for the inexperienced developer because the introduced bus conflicts are handled transparently by the 21K. The conflict results in additional bus cycles being introduced rather than the expected high-speed parallel memory operations.
MODIFY AND VOLATILE REGISTERS
The 21K modify registers (M0-M7 and M8-M15) can be used in conjunction with the index registers to access elements in an array. This is equivalent to using a 68K data register in conjunction with an address register. Access to the 32-bit array element data[3] will be programmed on the two processors as follows:
    MOVE.L #data, A0         ; 68K
11. emory. Then the table is further modified to ensure that the DoSomething() routine is called as a subroutine from within the IRQ1 ISR routine. This adds 250 cycles to the interrupt overhead. The overhead is less than expected because the SHARC super scalar capability is brought to bear.
Note there are programming quirks that occur when saving the index and other DAG1 registers. Because of architectural constraints, the DAG1 registers can't be saved directly to the C stack in data memory using instructions involving DAG1 registers.
The IRQ1 interrupt is automatically activated by calling the interrupt() routine. Calling interruptf() rather than interrupt() changes the lookup table so that a faster interrupt register-saving routine will become active. In this case, 60 cycles are added to the interrupt overhead because only the volatile registers are saved and recovered. There is also an even smaller interrupt overhead option, interrupts().
SUMMING UP
This article is directed towards the developer planning to use low-level assembly code in conjunction with C code. I covered a brief overview of the C programming environment for the Analog Devices SHARC 2106X DSP processor. Many of the similarities and differences of C coding on the 68K processor using the SDS development environment were discussed.
The SHARC has many interesting architectural features designed to optimize DSP algorithms. Th
12. ere is an alternate set of 21K data registers available for fast interrupt handling, whereas the 68K registers must be saved to slow external memory.
The 21K data registers can be used for both integer (R0-R15) and floating-point (F0-F15) operations. The 32-bit register-to-register addition operation REG0 = REG0 + REG1 is written on the two processors as:
    ADD.L D1, D0             ; 68K
    R0 = R0 + R1;            /* 21K */
The 21K assembler format has a number of other advantages in addition to its C-like characteristics. First, there is no hidden source register like in the 68K syntax. At the end of a long day working on the 68K, I might start wondering if D0 was meant to be added to D1 and stored into D1, or was D1 supposed to be added to D0 and stored in D0?
Note that the use of the semicolon to signal the end of an assembly instruction permits a single 21K instruction to be written across many lines of code. This free formatting also allows documentation of the instructions describing parallel operations to multiple registers and memory accesses in a single cycle. Invocation of the 21K processor's super scalar capability requires syntax in the form of:
    F0 = F1 * F4, F2 = F8 + F12;    /* 21K */
the 16 bits of the 68K data bus.
MEMORY ACCESS
Figure 2 shows the Data Address Generator (DAG1) Block from the SHARC 2106X processor. The eight 21K index registers (I0-I7) play roughly the same role as the eight
13. ese include hardware stacks for loop and subroutine handling with zero-overhead circular buffer capability. You must understand the consequences of activating these features from within an assembly code routine called from C.
For more information on the SHARC internal register operations, use SHARC Navigator LIVE or contact Talib Alukaidey at T.Alukaidey@herts.ac.uk.
Look for my next article in the August issue (121) of Circuit Cellar magazine, where Laurence Turner and I will look at how to get the best performance out of your processors when handling DSP algorithms.
Michael Smith is a professor at the University of Calgary, Canada, where he teaches about standard and advanced microprocessors. He spent the last two years creating lessons and laboratories using ten SHARC 21061 development systems obtained from Con Korikis, Analog Devices University Support Program. You may reach him at smithmr@ucalgary.ca.
REFERENCES
[1] E. Bessinger, "Localization of Sound Using Headphones," M.Sc. Thesis, University of Calgary, Canada.
Analog Devices, ADSP-2106X SHARC User's Manual, Analog Devices.
Analog Devices, ADSP-21065L SHARC Technical Reference Guide.
Analog Devices, C Compiler Guide and Reference for the ADSP-2106X Family DSPs.
Analog Devices University Support Program, www.analog.com/dsp
Circuit Cellar, the Magazine for Computer Applications. Reprinted
14. hoice matches the need to access both program and data memory.
The volatile 21K data registers, unlike the volatile 68K data registers, can't be used in conjunction with the index registers to step through an array. Specific volatile 21K modify registers (M4 and M12) are needed for this purpose.
Note that there are both PREMODIFY and POSTMODIFY memory accessing modes on the 21K:
    M4 = 4;
    R1 = dm(M4, I4);         /* PREMODIFY */
    R2 = dm(I4, M4);         /* POSTMODIFY */
The value at memory location (I4 + M4) is fetched during the premodify operation, with index register I4 left unchanged. In the postmodify operation, the memory at location I4 is fetched and then the index register is autoincremented so that I4 = I4 + M4. The postmodify operation can be described in a few bits, which allows parallel postmodify operations to be described in a single opcode to program and data memory.
LENGTH AND BASE REGISTERS
There are two sets of 21K registers that have no equivalent in the 68K architecture: the base and length registers. The length register significantly affects accessing data arrays within a C/C++ program. This effect is hidden in the code sequences which use the C default settings of the length and base registers.
Listing 1 shows the C code and assembly language needed to calculate the sum of the first three elements in a 10-element array using an autoincrementing pointer for both 68K and 21K processors. The onl
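The difference between the two addressing modes in excerpt 14 is easy to lose in prose, so here is a small C model of it. This is a sketch under stated assumptions: memory is an ordinary array, the index and modify registers are plain integers, and the dag_pair type and both function names are hypothetical.

```c
#include <stdint.h>

/* Software stand-in for one index/modify register pair (for example I4 and M4). */
typedef struct {
    int32_t i;   /* index register: current address */
    int32_t m;   /* modify register: offset or stride */
} dag_pair;

/* Models R1 = dm(M4, I4): fetch from I + M, leave I unchanged. */
int32_t premodify_read(const int32_t *mem, const dag_pair *r)
{
    return mem[r->i + r->m];
}

/* Models R2 = dm(I4, M4): fetch from I, then update I = I + M. */
int32_t postmodify_read(const int32_t *mem, dag_pair *r)
{
    int32_t value = mem[r->i];
    r->i += r->m;
    return value;
}
```

Only the postmodify form changes the index register, which is why it is the one used to step through an array and the one affected by the length register.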
15. nto the C stack above the return address. By contrast, the first three subroutine parameters are passed via 21K data registers R4, R8, and R12. Even pointer values (e.g., int *pt) are passed through the data registers. This can invoke a string of error messages from the 21K.
The 21K registers can't be used for both address and data purposes as many RISC general registers can. The 21K pointer value must be moved from the data register parameter into a volatile index register. Even with this complication, passing parameters via registers still saves considerable time over putting things on and off an external memory stack. As stated earlier, I'll ignore the complications arising from situations where subroutines call other subroutines.
HARDWARE AND SOFTWARE
Things get interesting with the SHARC's C programming model when the programmer attempts to pass the fourth parameter to the subroutine. On a register-windowed processor such as the SPARC or the 29K RISC, this wouldn't be a problem. The 29K has 128 registers for parameter passing. However, the SHARC has no other allocable volatile registers, and the fourth parameter must be passed another way.
It will do you no good to try to do the 68K trick, passing the parameter above the return address on the stack. The standard place for the 21K return address is on a specialized high-speed PC stack. This hardware is provided without access to
16. red there during a subroutine call.
The RFRAME instruction is used as part of the return sequence from an assembly language routine back to a C calling program. This is equivalent to the 68K UNLK instruction, including that instruction's memory access.
Fortunately, the necessary instructions for returning to a C calling function from assembly code are always the same and can be cut and pasted into a macro, DO_MAGIC. In this macro, the return address from the C stack is fetched. As part of the indirect jump using a program data memory volatile index register, the return address is tweaked by an instruction to get it pointing in the proper direction. Finally, the RFRAME instruction causes the C top of stack to be copied from the frame pointer and the old frame pointer recovered from the C stack during the branch delay slots.
    #define DO_MAGIC \
        I12 = dm(-1, I6);      /* GET RETURN ADDRESS */ \
        JUMP (1, I12) (DB);    /* ADJUST ADDRESS A LITTLE */ \
        nop; \
        RFRAME                 /* ADJUST FRAME AND TOP OF STACK POINTERS */
INTERRUPT HANDLING
In this article, I am more interested in the C programming side of interrupt handling on the SHARC rather than specific hardware details. However, it is not possible to completely separate the two.
With the SDS 68K compiler, the statement #pragma interrupt must be added before the code for the C interrupt service routine (ISR). This informs the compiler that the ISR
17. the data registers and can only hold a few values.
There are a number of other problems as well. A characteristic of the C program is the presence of subroutines calling other subroutines. The innermost subroutine in a program could easily be nested 8 or 10 subroutines deep inside the main() function. This type of operation cannot be handled with a shallow hardware stack using the standard SHARC subroutine CALL and RTS instructions. Also, there's the problem of all the variables and arrays declared inside the C function, or interrupt handling, when copious material must go onto the stack.
The approach taken on the SHARC brings back fond memories of my early days developing embedded systems with microprogrammable chips. If you didn't have the required instructions, you either added 'em or faked 'em.
In the SHARC C programming model, index registers I6 and I7 are set up as a frame pointer and a C top-of-stack pointer, respectively. These registers function essentially in the same way as the 68K frame pointer (A6) and stack pointer (A7). However, the SHARC I7 register points to the next empty stack location, whereas the 68K A7 register points to the current valid value on the stack.
Several modify registers are initialized to values 0, 1, and -1 to speed address/stack operations on their way in the SHARC in the C. Of course, to
Listing 1: A comparison of C code, 68K, and 21K assembly code to calculate the sum of th
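The two stack-pointer conventions contrasted in excerpt 17 differ only in whether the pointer moves before or after the store. The C sketch below models both. It assumes a stack that grows toward lower addresses in both cases, which the excerpt does not state, and the word_stack type and all function names are hypothetical.

```c
#include <stdint.h>

#define STACK_WORDS 8

typedef struct {
    int32_t mem[STACK_WORDS];
    int     sp;               /* stack pointer, as an index into mem */
} word_stack;

/* 68K-style (A7): the pointer rests on the last valid value. */
void init_68k(word_stack *s)            { s->sp = STACK_WORDS; }
void push_68k(word_stack *s, int32_t v) { s->mem[--s->sp] = v; }   /* move, then store */
int32_t top_68k(const word_stack *s)    { return s->mem[s->sp]; }

/* SHARC-style (I7): the pointer rests on the next empty location. */
void init_21k(word_stack *s)            { s->sp = STACK_WORDS - 1; }
void push_21k(word_stack *s, int32_t v) { s->mem[s->sp--] = v; }   /* store, then move */
int32_t top_21k(const word_stack *s)    { return s->mem[s->sp + 1]; }
```

The store-then-move form is a postmodify access with a stride of -1, which fits the excerpt's remark that modify registers are preloaded with 0, 1, and -1 to speed stack operations.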
18. tion of the sound source shifts to the right side of the head. Furthermore, modeling of the audio channel using FIR and IIR filters can move the perceived sound out to a virtual speaker in front or back of the listener's head [1].
wrote, "Some Assembly Required" (Circuit Cellar 101). However, interfacing between assembly code and C functions is different on newer DSP chips because of the new architectural features in DSP processors.
On the positive side, additional data and address registers are available. Specialized hardware allows fast switching between subroutines or handling zero-overhead loops.
On the negative side, many of the new processor features are not directly describable using the standard C language. What's the syntax for accessing an array using the bit-reversed circular buffer address register operations? Some of
Simultaneous multiplication and addition in a single cycle is available only within certain 21K register banks. This is a limitation of the number of bits available on the 21K program data bus, even though this, at 48 bits,
[Figure 2 labels: bit reverse, from instruction, PM address bus, DM address bus, PM data bus, DM data bus, program memory]
19. y difference is the need to perform a 21K load (memory-to-register) operation before performing the ADD. The 21K doesn't have the complex addressing capabilities of the 68K CISC architecture. Mind you, a 40-MHz 21K performs the fetch-and-add operation in two clock cycles, while a 40-MHz 68K takes 16 cycles.
Listing 2 has a hidden kick in its operation, although it looks similar to Listing 1. The length (L4) and base (B4) registers establish a two-word-long circular buffer. The first two memory fetches work as expected. However, in the second postmodify operation, index register I4 is incremented to point to VALUES[2] and then modified by the SHARC circular buffer hardware to point back to VALUES[0] (see Figure 2).
For standard C array handling, the length register (Lx) associated with index register Ix must remain 0 or be returned to 0. Pity the poor 21K code developer who has to try maintaining C code where the base and length registers are unintentionally left modified by an interrupt service routine that is infrequently invoked.
PASSING PARAMETERS
Both registers and the C stack are used for parameter passing on the 68K and 21K processors. One register is typically designated for returning values from a function (D0 on the 68K and R0 on the 21K), with occasional assistance from other registers. With the limited number of 68K registers available, parameters are typically passed to subroutines by pushing them o
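The circular-buffer surprise described in excerpt 19 can be reproduced in plain C by modelling the postmodify access together with the base and length registers. This is a behavioural sketch, not the hardware's exact modulus logic: the dag_regs type and both function names are hypothetical, and addresses are modelled as array indices.

```c
#include <stdint.h>

/* Software stand-in for one DAG register set (for example I4, M4, B4, L4). */
typedef struct {
    int32_t i;   /* index: current address */
    int32_t m;   /* modify: stride */
    int32_t b;   /* base: start of the circular buffer */
    int32_t l;   /* length: 0 disables wraparound */
} dag_regs;

/* Postmodify fetch with wraparound whenever the length is nonzero. */
int32_t fetch_postmodify(const int32_t *mem, dag_regs *r)
{
    int32_t value = mem[r->i];
    r->i += r->m;
    if (r->l != 0) {
        if (r->i >= r->b + r->l) {
            r->i -= r->l;          /* ran past the end: wrap back */
        } else if (r->i < r->b) {
            r->i += r->l;          /* ran before the start: wrap forward */
        }
    }
    return value;
}

/* The article's loop: add the first three elements through the pointer. */
int32_t sum_first_three(const int32_t *values, int32_t length)
{
    dag_regs r = { 0, 1, 0, length };
    int32_t sum = 0;
    for (int count = 0; count < 3; count++) {
        sum += fetch_postmodify(values, &r);
    }
    return sum;
}
```

With the length at 0, the loop reads elements 0, 1, and 2. With a length of 2 left behind, the third fetch wraps and reads element 0 again, so the same C loop silently returns a different sum.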
