2. Blackfin Overview
Contents
1. Unmapped sections can be mapped simply by dragging them to an appropriate memory segment.
   [Screenshot: Expert Linker window for CPP Test.ldf. The Input Sections pane lists sections such as program, constdata, cplb_code, ctor, noncache_code, and voldata. The Memory Map pane lists segments such as MEM_SDRAM0, MEM_ASYNC0 to MEM_ASYNC3, MEM_L1_DATA_A_CACHE, MEM_L1_DATA_B_STACK, MEM_L1_CODE, MEM_L1_SCRATCH, MEM_SYS_MMRS, and MEM_SDRAM0_HEAP, with Start Address and End Address columns.]
2. Analog Devices Confidential Information
   [Block diagrams of two Blackfin processors, one with a processor core running at up to 400 MHz and one at up to 500 MHz. Both diagrams show:
   • System Control Blocks: Event Controller, Emulation and Test Control, Watchdog Timer
   • Memory: up to 64KB instruction SRAM, up to 64KB data SRAM, 4KB scratchpad, 16-bit external memory
   • Peripheral Blocks, attached through the System Interface Unit: 10/100 Ethernet MAC, SPORT0, SPORT1, UART0-1, SPI0, PPI, and GPIO (16 GPIO and 32 GPIO groups)]
   Blackfin Operating System Support [logo slide: CROSSCORE, Metrowerks, and other vendors; remaining names not recoverable]
   Basic Needs, Limited Budget: FREE with VisualDSP. Media / Web centered Em
3. Predicated Instruction Support
   The Blackfin predicated instruction support takes the form of: IF (CC) reg = reg.
   • Much faster than a conditional branch (1 cycle), but limited.
   • Help the compiler to see the opportunity. For instance, consider speculative execution:
       if (A) X = EXPR1; else X = EXPR2;
     can be written as
       X = EXPR1; IF (!A) X = EXPR2;
     or
       X = EXPR1; Y = EXPR2; if (!A) X = Y;

   Loops: the inner loop
   • The optimizer focuses on the inner loop, because this is where most programs spend most of their time.
   • It is considered a good trade-off to slow down the loop prologue and epilogue in order to speed up the loop.
   • Make sure your program spends most of its time in the inner loop.

   The optimizer works by:
   • Unrolling loops
   • Vectorization
   • Software pipelining
   Therefore: do not unroll loops yourself, avoid loop-carried dependencies, avoid aliases, and do not rotate loops yourself.

   Software Pipelining
   What is software pipelining?
   • Technique used to schedule loops and functional units
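   The speculative-execution rewrite above can be sketched in C as follows. This is a minimal illustration, not compiler output: the function names are hypothetical, and whether a given compiler actually emits a predicated move for the second form depends on its optimizer.

```c
#include <stdint.h>

/* Branching form: the compiler may emit a conditional jump. */
int32_t select_branch(int a, int32_t expr1, int32_t expr2)
{
    int32_t x;
    if (a)
        x = expr1;
    else
        x = expr2;
    return x;
}

/* Speculative form: X is assigned unconditionally, then overwritten
   only when A is false. This shape maps naturally onto a single
   "IF CC reg = reg" style conditional move. It is only safe when
   evaluating both expressions has no side effects. */
int32_t select_speculative(int a, int32_t expr1, int32_t expr2)
{
    int32_t x = expr1;
    if (!a)
        x = expr2;
    return x;
}
```

   Both functions return the same value for every input; the second simply presents the choice to the compiler as a conditional overwrite rather than as two branches.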
4. Configurable L1 Memory Selections
   [Table: cache or SRAM configuration options for L1 Instruction, L1 Data A, L1 Data B, and Scratchpad memory on the BF533; cell layout not recoverable.]
   • Using instruction cache will improve performance for most applications.
   • Data cache may or may not improve performance.
   • Max bandwidth into L1 memory is available with cache enabled.
   • Trade-offs must be made on code control and peak short-term performance.

   Core L1 Memory Registers
   • General control: IMEM_CONTROL (Instruction Memory), DMEM_CONTROL (Data Memory)
   • Cache and protection properties (n = 0 to 15): ICPLB_DATAn, ICPLB_ADDRn, DCPLB_DATAn, DCPLB_ADDRn
   • Test functionality: ITEST_COMMAND, ITEST_DATA, DTEST_COMMAND, DTEST_DATA

   BF533 L1 Instruction Memory
   • Instruction Bank C: BF531 / BF532 / BF533: 16KB SRAM/CACHE
   • Instruction Bank B: BF531: 16KB SRAM; BF532: 32KB SRAM; BF533: 32KB SRAM
   • Instruction Bank A: BF531: 32KB ROM; BF532: 32KB ROM; BF533: 32KB SRAM
   [Diagram: Instruction Bank C, 16KB cache or SRAM; Instruction Bank B, 32KB SRAM]
5. The link process is controlled by a linker command language. The LDF provides a complete specification of the mapping between the linker's input files and its output. It controls:
   • input files
   • output file
   • target memory configuration
   Preprocessor support is available.

   LDF structure
   • Global Commands: defines the architecture or processor, directory search paths, and libraries and object files to include
   • Memory Description: defines memory segments
   • Link Project Commands: mapping of input sections to memory segments, output file name, link against object file list

   Global Commands & Memory Description (example)
   ARCHITECTURE(ADSP-BF533), SEARCH_DIR($ADI_DSP/Blackfin/lib), and $OBJECTS = $COMMAND_LINE_OBJECTS are the global commands. The MEMORY block declares four segments: seg_data_a, seg_data_b, seg_data_scr, and seg_prog. The legible ranges are START(0xFF900000) END(0xFF903FFF), START(0xFFB00000) END(0xFFB00FFF), and START(0xFFA00000) END(0xFFA03FFF), each with TYPE(RAM) and WIDTH(8).

   Link Commands: PROCESSOR, OUTPUT($COMMAND_LINE_OUTPUT_FILE), SECTION
6. Two types of transfers are available on the ADSP-BF533/BF561:
   Descriptor-based DMA transfers
   • Require a set of parameters stored within memory to initiate a DMA sequence. These parameters are transferred to the DMA control registers upon the start of a DMA transfer.
   • Support chaining of multiple DMA transfers.
   Register-based DMA transfers
   • Allow the user to program the DMA control registers directly to define and initiate a DMA sequence.
   • Upon DMA completion, depending on certain bits within the Configuration Register, either the control registers are automatically updated with their original setup values (Autobuffer Mode) or the DMA channel gracefully shuts off (Stop Mode).

   Descriptor Array Mode (byte offsets)
     0x0   Start_Addr[15:0]     (Descriptor Block 1)
     0x2   Start_Addr[31:16]
     0x4   DMA_Config
     0x6   X_Count
     0x8   X_Modify
     0xA   Y_Count
     0xC   Y_Modify
     0xE to 0x1A                (Descriptor Block 2, same layout)
     0x1C  Start_Addr[15:0]     (Descriptor Block 3)
     0x1E  Start_Addr[31:16]
     0x20  DMA_Config

   Descriptor List (Small Model) Mode
   Each block holds Next_Desc_Ptr[15:0], Start_Addr[15:0], Start_Addr[31:16], DMA_Config, X_Count, X_Modify, Y_Count, Y_Modify.
   Descriptor List (Large Model) Mode
   Each block holds Next_Desc_Ptr[15:0], Next_Desc_Ptr[31:16], Start_Addr[15:0], Start_Addr[31:16]
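   The Descriptor Array Mode layout above can be mirrored by a C structure. The sketch below is illustrative only: the type and helper names are hypothetical, and the signedness chosen for the modify fields is an assumption. What it takes from the text is the field order and the 14-byte block size, which puts Descriptor Block 3 at offset 0x1C.

```c
#include <stddef.h>
#include <stdint.h>

/* One Descriptor Array Mode block: seven 16-bit words, no next pointer.
   Type and field names are hypothetical; order follows the offset table. */
typedef struct {
    uint16_t start_addr_lo;  /* 0x0: Start_Addr[15:0]  */
    uint16_t start_addr_hi;  /* 0x2: Start_Addr[31:16] */
    uint16_t dma_config;     /* 0x4: DMA_Config        */
    uint16_t x_count;        /* 0x6: X_Count           */
    int16_t  x_modify;       /* 0x8: X_Modify (signed here by assumption) */
    uint16_t y_count;        /* 0xA: Y_Count           */
    int16_t  y_modify;       /* 0xC: Y_Modify (signed here by assumption) */
} dma_array_desc_t;

/* Split a 32-bit buffer address across the two 16-bit start words. */
void dma_desc_set_start(dma_array_desc_t *d, uint32_t addr)
{
    d->start_addr_lo = (uint16_t)(addr & 0xFFFFu);
    d->start_addr_hi = (uint16_t)(addr >> 16);
}

/* Reassemble the 32-bit start address from the two words. */
uint32_t dma_desc_get_start(const dma_array_desc_t *d)
{
    return ((uint32_t)d->start_addr_hi << 16) | d->start_addr_lo;
}
```

   Declaring an array of `dma_array_desc_t` places the blocks one after the other in memory, which is the arrangement Descriptor Array Mode expects.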
7. DCPLB_DATAn bit fields (n = 15 to 0)
   CPLB_L1_AOW: valid only if write-through cacheable (CPLB_VALID = 1, CPLB_WT = 1)
   • 0: Allocate cache lines on reads only
   • 1: Allocate cache lines on reads and writes
   CPLB_WT: operates only in cache mode
   • 0: Write back
   • 1: Write through
   CPLB_L1_CHBL: clear this bit when L1 memory is configured as SRAM
   • 0: Non-cacheable in L1
   • 1: Cacheable in L1
   CPLB_DIRTY: valid only if write-back cacheable (CPLB_VALID = 1, CPLB_WT = 0, and CPLB_L1_CHBL = 1)
   • 0: Clean
   • 1: Dirty
   A protection violation exception is generated on a store to this page when this bit is 0. The state of this bit is modified only by writes to this register. The exception service routine must set this bit.
   Bits 17:16, Page Size[1:0]: same as the ICPLB register
   CPLB_VALID
   • 0: Invalid (disabled) CPLB entry
   • 1: Valid (enabled) CPLB entry
   CPLB_LOCK: can be used by software in CPLB replacement algorithms
   • 0: Unlocked, CPLB entry can be replaced
   • 1: Locked, CPLB entry should not be replaced
   CPLB_USER_RD
   • 0: User mode read access generates a protection violation exception
   • 1: User mode read access permitted
   CPLB_USER_WR
   • 0: User mode write access generates a protection violation exception
   • 1: User mode write access permitted
   CPLB_SUPV_WR
   • 0: Supervisor mode write access generates a protection violation exception
   • 1: Supervisor mode write access permitted
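   As a rough sketch of how these fields combine into one DCPLB_DATAn word, the C fragment below builds a value from flags. Only the page-size position (bits 17:16) comes from the text above. The other bit positions and the page-size encoding order are assumptions recalled from typical Blackfin header definitions and should be checked against the hardware reference before use. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bit masks: ASSUMED positions, verify against the hardware reference. */
#define CPLB_VALID    0x00000001u
#define CPLB_LOCK     0x00000002u
#define CPLB_USER_RD  0x00000004u
#define CPLB_USER_WR  0x00000008u
#define CPLB_SUPV_WR  0x00000010u
#define CPLB_DIRTY    0x00000080u
#define CPLB_L1_CHBL  0x00001000u
#define CPLB_WT       0x00004000u
#define CPLB_L1_AOW   0x00008000u

/* Page size field, bits 17:16 (position from the text; the
   1K/4K/1M/4M encoding order is an assumption). */
typedef enum { PAGE_1KB = 0, PAGE_4KB = 1, PAGE_1MB = 2, PAGE_4MB = 3 } cplb_page_t;

/* Combine property flags and a page size into one data word. */
uint32_t dcplb_data(uint32_t flags, cplb_page_t page)
{
    return flags | ((uint32_t)page << 16);
}

/* CPLB_DIRTY is only meaningful for a write-back cacheable page:
   CPLB_VALID = 1, CPLB_WT = 0, CPLB_L1_CHBL = 1. */
int dcplb_is_write_back(uint32_t word)
{
    return (word & CPLB_VALID) && !(word & CPLB_WT) && (word & CPLB_L1_CHBL);
}
```

   For example, `dcplb_data(CPLB_VALID | CPLB_USER_RD | CPLB_SUPV_WR | CPLB_L1_CHBL | CPLB_WT, PAGE_1MB)` describes a valid, write-through cacheable 1MB page that user code may read and supervisor code may write.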
8. LDF and the Heap
   • Four library functions can be used to allocate or free memory to/from the heap: malloc, calloc, realloc, free.
   • Other C library functions implicitly use these four functions and ALSO require the heap (memmove, memcopy, etc.).
   • The heap is initialized by constants defined in the LDF: ldf_heap_space, ldf_heap_length, ldf_heap_end.
   • Multiple heaps are possible. They can be defined at link time or at run time (see the compiler manual).

   LDF Heap Setup (C Compiler Only)
   The output section heap calculates the LDF heap initializers from the heap memory segment description:
     #ifdef USE_CACHE
       heap {
         // Allocate a heap for the application
         ldf_heap_space = .;
         ldf_heap_end = ldf_heap_space + MEMORY_SIZEOF(MEM_SDRAM0) - 1;
         ldf_heap_length = ldf_heap_end - ldf_heap_space;
       } > MEM_SDRAM0
     #else
       heap {
         // Allocate a heap for the application
         ldf_heap_space = .;
         ldf_heap_end = ldf_heap_space + MEMORY_SIZEOF(MEM_L1_DATA_A_CACHE) - 1;
         ldf_heap_length = ldf_heap_end - ldf_heap_space;
       } > MEM_L1_DATA_A_CACHE
     #endif // USE_CACHE

   Expert Linker: Using the LDF Wizard
   Expert Linker Features
   Expert Linker is a graphical tool that can:
   • Use wizards to create LDF
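   The four heap functions named above are standard C. The following sketch, with a hypothetical helper name, shows a typical pattern: grow a zero-initialized buffer with realloc while keeping the old pointer valid if the heap is exhausted, which matters on a target where the heap is a small LDF-defined segment.

```c
#include <stdlib.h>
#include <string.h>

/* Grow an int buffer from old_n to new_n elements and zero the new tail.
   Returns the (possibly moved) buffer, or NULL if the heap cannot satisfy
   the request; in that case the original buffer is left untouched. */
int *grow_buffer(int *buf, size_t old_n, size_t new_n)
{
    int *bigger = realloc(buf, new_n * sizeof *bigger);
    if (bigger == NULL)
        return NULL;
    if (new_n > old_n)
        memset(bigger + old_n, 0, (new_n - old_n) * sizeof *bigger);
    return bigger;
}
```

   A caller would allocate with `calloc(4, sizeof(int))`, call `grow_buffer(p, 4, 8)`, check the result for NULL, and eventually release the memory with `free`.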
9. When the MDMA counter decrements from 1 to 0, the next available MDMA stream is selected.
   • If the MDMA period is set to 0, then MDMA is scheduled by fixed priority.
   • If the MDMA period is set to a nonzero value n, the two MDMA streams are granted bus access in alternate bursts of up to n data transfers.

   Traffic Control (cont'd)
   [Diagram: transfers in and out of SDRAM without traffic control and with traffic control. Arrows represent transfers in and out of SDRAM.]
   • 2 reads and 2 writes are more efficient with traffic control.
   • An important register allows the definition of transfer sizes in a given direction on the DMA busses. Max values usually yield the best performance, but this is application dependent.

   Two-Dimensional (2D) DMA
   • Supports arbitrary row and column sizes up to 64K x 64K elements.
   • X_Count = row size and Y_Count = column size.
   • X_COUNT must be 2 or greater.
   [Diagram: 2D array traversal labelled with X_MODIFY and Y_COUNT]
   X_Modify
   • is the byte address increment applied after each transfer that
10. [Memory map figure. Internal memory: address markers at 0xFF80 0000, 0xFF80 4000, 0xFF80 6000, 0xFF80 8000, 0xFF90 0000, 0xFF90 4000, 0xFF90 6000, and 0xFF90 8000, with regions labelled Data Bank A SRAM/Cache, Data Bank B SRAM/Cache, and Reserved. External memory: async banks and SDRAM; remaining addresses not recoverable.]

   Memory Hierarchy on the BF533
   As processor speeds increase (300 MHz to 1 GHz), it becomes increasingly difficult to have large memories running at full speed. The BF53x uses a memory hierarchy with the primary goal of achieving memory performance similar to that of the fastest memory (i.e. L1) with an overall cost close to that of the least expensive memory (i.e. L2).
   • CORE (registers)
   • L1 Memory: internal, smallest capacity, single-cycle access
   • L2 Memory: external, largest capacity, highest latency

   [Diagram: Core Clock (CCLK) Domain and System Clock (SCLK) Domain, External Access Bus (EAB)]
11. Example: signed int / 16
      R3 = [I1];        // load dividend
      CC = R3 < 0;      // check if negative
      R1 = 15;          // add 2^n - 1 to dividend
      R2 = R3 + R1;
      IF CC R3 = R2;    // if dividend negative, use addition result
      R3 >>>= 4;        // do the divide as a shift
   • Ensure the compiler has visibility.
   • Divisor must be unambiguous.

   Division can be created by for-loops
   Sometimes the compiler will calculate the number of iterations:
      for (I = start; I < finish; I += step)
   The compiler plants code to calculate: iterations = (finish - start) / step

   Division Trick 1: Multiply by Reciprocal
   [Code example: float recip_NUM_SAMPS = 1.0 / NUM_SAMPS; followed by nested loops over i and j (0 to NC) and k (0 to NUM_SAMPS) that accumulate a float sum from Input and store into Cover[i*NC + j]; the loop body is only partly legible.]
   Replace division by multiplication by the reciprocal:
   • helps when the divisor is locally constant
   • the answer may be slightly different. Is this OK?

   Use the laws of algebra
   • Original customer benchmark compares ratios, coded as: if (X/Y > A/B)
   • Recode as: if (X*B > A*Y)
   • Another way to lose divisions.
   • Problem: possible overflow in fixed point.
   • The compiler does not know anyt
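   The two division tricks in this excerpt can be sketched in portable C. The function names are hypothetical. The first function mirrors the shift sequence shown for signed division by 16 and assumes that `>>` on a negative `int32_t` is an arithmetic shift, which the C standard leaves implementation-defined. The second hoists a division out of a loop by multiplying by a reciprocal, which can change results slightly for divisors whose reciprocal is not exactly representable.

```c
#include <stdint.h>

/* Signed division by 16 (2^4) using a shift: add 2^n - 1 to negative
   inputs first so the result rounds toward zero like C's '/'.
   Assumes arithmetic right shift for negative values. */
int32_t div16_signed(int32_t x)
{
    if (x < 0)
        x += 15;          /* 2^4 - 1 */
    return x >> 4;
}

/* Divide every element by a loop-invariant divisor with one division
   outside the loop and one multiplication per element inside it. */
void scale_by_reciprocal(float *v, int n, float divisor)
{
    float recip = 1.0f / divisor;
    for (int i = 0; i < n; ++i)
        v[i] *= recip;
}
```

   For an unsigned dividend the adjustment is unnecessary and a plain shift suffices, which is why unsigned division by a power of two is the cheaper case.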
12. Addressing Operations are Fully Efficient
   [Code example: a loop of the form for (j = 0; j < N; j++) that indexes A[j] and B[N-j-1] is rewritten with pointers pA, pB, pP, and pQ. Annotations: the loop becomes a zero-overhead loop; D is loop invariant and is loaded once outside the loop; the B element is loaded once and reused.]
   You can count on the optimizer to do this transformation.

   How can we improve on the compiler's effort?
   Getting Started
   • Find out where the program spends its time (the 80/20 rule).
   • Measure. Intuition is notoriously bad here: instrument, and use the profiler and the cycle-accurate simulator.
   • Loops are always a good place to look.
   • Even a trivial operation can have a significant cost if it is done often enough.

   Use the Statistical Profiler
   • Statistical profiling samples the program counter of the running application and builds up a picture of where it spends its time.
   • Completely non-intrusive: no tracing code is added.
   • Completely a
13. [Screenshot: Project Options dialog. Tree entries include Project, General, Compile (General 1, General 2, Preprocessor, Processor 1, Processor 2, Warning, Workarounds), Assemble, Link (General, LDF Preprocessing, Elimination, Processor), Load (Options, Splitter), Pre-build, and Post-build. The Post-build page has a description field and a command field.]

   Sessions
   Sessions define debug environments.
   • Choose Sessions List and select a session to activate, or define a new session from the Session List.
   • Select New Session from the Sessions pull-down menu.
   • Configure the session as required, e.g. Debug target: ADSP-BF53x Family Simulator; Platform: ADSP-BF53x Single Processor Simulator; Session name: ADSP-BF533 ADSP-BF53x Single.
   • Click OK. The session name will appear in the Session List.
   • Click Activate. The IDDE session will open.
   [Screenshot: Session List showing entries such as ADSP-BF537 ADSP-BF5xx Single Processor Simulator, ADSP-BF561 Dual Processor Simulator, ADSP-BF561 EZ-KIT Lite, HPUSB-ICE, and ADSP-TS201 Rev 1.0 Single Processor Simulator, and a New Session dialog with Debug target: ADSP-BF5xx Blackfi]
14. The Blackfin instruction set includes a number of operations to support fractional (or fract) data. The instructions include:
   • saturating MAC/ALU/SHIFT instructions
   • MAC shift correction for fractional inputs
   The compiler and libraries provide support for fractional types:
   • fractional builtins
   • fract types fract16 and fract32
   • ETSI
   • C++ fract class
   Fractional arithmetic is a hundred times faster than floating point.

   ETSI Builtins: fully optimised
   Fractional arithmetic to a standard specification. The European Telecommunications Standards Institute's fract functions are carefully mapped onto the compiler built-ins:
   add, sub, abs_s, shl, shr, mult, mult_r, negate, round, L_add, L_sub, L_abs, L_negate, L_shl, L_shr, L_mult, L_mac, L_msu, saturate, extract_h, extract_l, L_deposit_l, L_deposit_h, div_s, norm_s, norm_l, L_Extract, L_Comp, Mpy_32, Mpy_32_16
   • Immediate optimisation of ETSI standard codecs.
   • Highly recommended.

   Pointers or Arrays
   Arrays are easier to analyse:
      void va_ind(int a[], int b[], int out[], int n) { int i; for (i = 0; i < n; ++i) out[i] = a[i] + b[i]; }
   Pointers are closer to the hardware:
      void
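   To make the fractional operations concrete, here is a portable C emulation of two of the listed ETSI-style primitives on 1.15 fractional data: a saturating 16-bit add and a fractional multiply. This is a sketch of the arithmetic only, with hypothetical function names; it is not the compiler's builtin implementation, and on Blackfin the builtins map to single instructions rather than to code like this.

```c
#include <stdint.h>

/* Saturating 16-bit add: results outside [-32768, 32767] clamp
   instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Fractional (1.15) multiply: the 32-bit product of two 1.15 values is
   2.30, so shift right by 15 to return to 1.15, then saturate. The only
   overflow case is -1.0 * -1.0. Assumes arithmetic right shift for
   negative products. */
int16_t mult_q15(int16_t a, int16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) >> 15;
    if (prod > INT16_MAX) return INT16_MAX;
    return (int16_t)prod;
}
```

   In 1.15 format 16384 represents 0.5, so `mult_q15(16384, 16384)` yields 8192, which represents 0.25.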
15. [Screenshots of the Project Options dialog for fir_533:
   • Compile page options include: exceptions and RTTI, allow multi-line character strings, pointers to const may point to non-const data, do not treat FP operations as associative, non-standard circular buffer idiom, disable hardware circular buffers, and Additional options.
   • Assembler Property Page: additional include directories, generate verbose output, save temporary files, generate debug information, output listing file, skip preprocessing, preprocessor definitions, and Additional options.
   • Link Property Page: generate object trace, strip debug symbols, strip all symbols, Additional Output.]
16. DMA Config, X Count, X Modify, Y Count, Y Modify
   Descriptor List (Large Model) Mode
   [Diagram: chained descriptor blocks. Each block holds Next_Desc_Ptr[15:0], Next_Desc_Ptr[31:16], Start_Addr[15:0], Start_Addr[31:16], DMA_Config, X_Count, X_Modify, Y_Count, Y_Modify.]

   Transfer Modes
   The transfer mode is controlled by 3 bits, called the FLOW[2:0] bits, within the DMA Configuration Register.
   • Stop Mode (FLOW = 0x0): when the current DMA transfer completes, the DMA channel stops automatically, after signaling an interrupt (if enabled).
   • Autobuffer Mode (FLOW = 0x1): DMA is performed in a continuous circular-buffer fashion based on user-programmed DMAx MMR settings. On completion of the DMA transfer, the Parameter registers are reloaded into the Current registers and DMA resumes immediately with zero overhead. Autobuffer mode is stopped by a user write of 0 to the DMA enable bit in the DMAx DMA Config Register.
   • Descriptor Array Mode (FLOW = 0x4): in this mode the Descriptor Block does not include the NEXT_DESC_PTR parameter. Descriptor Blocks are placed one after the other within memory, like an array.
   • Descriptor List (Small Model) Mode (FLOW = 0x6): in this mode the Descriptor Block d
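   The FLOW encodings above lend themselves to a small C enumeration. In the sketch below, the values for Stop, Autobuffer, Descriptor Array, and Small Model modes come from the text. The Large Model value 0x7 is not shown in this excerpt and is included as an assumption to verify against the hardware reference. The word counts describe a full descriptor block as drawn in the diagrams; all names are hypothetical.

```c
/* FLOW[2:0] transfer modes. FLOW_DESC_LIST_LARGE = 0x7 is an ASSUMPTION;
   the other values are taken from the text. */
typedef enum {
    FLOW_STOP            = 0x0,
    FLOW_AUTOBUFFER      = 0x1,
    FLOW_DESC_ARRAY      = 0x4,
    FLOW_DESC_LIST_SMALL = 0x6,
    FLOW_DESC_LIST_LARGE = 0x7
} dma_flow_t;

/* Number of 16-bit words in one full descriptor block:
   seven parameter words plus 0, 1, or 2 next-pointer words.
   Register-based modes use no descriptor, so they return 0. */
int dma_desc_words(dma_flow_t flow)
{
    switch (flow) {
    case FLOW_DESC_ARRAY:      return 7;      /* no NEXT_DESC_PTR      */
    case FLOW_DESC_LIST_SMALL: return 7 + 1;  /* Next_Desc_Ptr[15:0]   */
    case FLOW_DESC_LIST_LARGE: return 7 + 2;  /* both pointer halves   */
    default:                   return 0;      /* Stop, Autobuffer      */
    }
}
```

   Multiplying the result by two gives the block size in bytes, for example 14 bytes for Descriptor Array Mode.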
17. Debugging
   • Simulation / Emulation / EZ-KIT
   • Run / Step / Halt
   • Breakpoints / Watchpoints
   • Advanced plotting and profiling capabilities
   • Pipeline and cache viewers

   VisualDSP
   What comes with VisualDSP?
   • Integrated Development and Debugger Environment (IDDE)
   • Compiler, Assembler, Linker
   • VDK
   • Emulation and simulation support
   • On-line help and documentation
   Part #: VDSP-BLKFN-FULL. Floating license Part #: VDSP-BLKFN-PCFLOAT.
   VisualDSP is a common development environment for all ADI processor families:
   • Blackfin: ADSP-BF5xx
   • TigerSharc: ADSP-TSxxx
   • Sharc: ADSP-21xxx
   • Each processor family requires a separate [illegible]

   Features of VisualDSP 4.0
   • Integrated Development and Debugger Environment (IDDE): multiple workspaces, projects, and project groups
   • Project Wizard: create and configure a DSP project
   • High-level language support, including C and C++
   • Expert Linker: graphical support for managing linker description files; code profiling support
   • Easy-to-use online help
   • BTC (Background Telemetry Channel) support: data streaming and logging
   • Easy to test and verify applications with scripts (TCL, VB, Java)
   • VisualDSP RTOS kernel / scheduler (VDK)
   • Integrated source code control
   • Device Dri[illegible]
18. [Ap]plications
   • PDA: Internet audio
   • Digital still camera
   • Video camera: video conferencing, MPEG2, DVD
   • Digital printing
   • Audio: MP3 audio, digital car radios
   • Modems: ADSL, VoIP phone solutions, cable modems, RAS modems, wireless modems
   • Mobile phones: GSM mobile phones, 3G data terminals
   • Internet appliances

   ADI Blackfin Performance Leadership
   [Chart: Price ($, 10 kU) against benchmark score (BDTImark2000 / BDTIsimMark2000) for VLIW DSPs, DSP-enhanced RISC processors, and conventional DSPs, including Texas Instruments C64 series (C6416) and C55 series (C5502, C5509) parts.]

   Blackfin Competitive Performance Advantage
   Blackfin has a higher clock rate and more than 2x the signal processing performance.
   [Charts: Frequency (MHz) and BDTImark2000 / BDTIsimMark2000 scores, 2002 to 2003, for TMS320C55x, Intel PXA2xx, and Blackfin.]
   The BDTImark2000 / BDTIsimMark2000 provide a summary measure of DSP speed. For more info and scores see www.BDTI.com. Scores © 2002-2003 BDTI.
19. IMEM_CONTROL register (reset = 0x0000 0001)
   LRUPRIORST (LRU Priority Reset)
   • 0: LRU priority functionality is enabled
   • 1: All cached LRU priority bits (LRUPRIO) are cleared
   ENICPLB (Instruction CPLB Enable)
   • 0: CPLBs disabled, minimal address checking only
   • 1: CPLBs enabled
   IMC (L1 Instruction Memory Configuration)
   • 0: Upper 16 KB of L1 instruction memory configured as SRAM
   • 1: Upper 16 KB of L1 instruction memory configured as cache
   ILOC[3:0] (Cache Way Lock)
   • 0000: All Ways not locked
   • 0001: Way0 locked; Way1, Way2, and Way3 not locked
   • ...
   • 1111: All Ways locked

   L1 Data Memory
   [Diagram: Data Bank B built from 4KB sub-banks with tags, served by DAG0 and DAG1 load/store paths.
   • Victim Buffers: victimized write-back cached data to external memory
   • Write Buffer: write-through and non-cached data to external memory]
20. Example Protection Operation
   • Set up CPLBs to define regions and properties. Default hardware CPLBs are present for MMRs and scratchpad memory.
   • CPLBs must be configured for L1 Data and L1 Instruction memory as non-cacheable.
   • Disable all memory other than the desired memory space.
   • Execute code. If code tries to access memory that has been disabled or protected, a protection violation occurs as an exception.

   Example CPLB Setup
   [Diagram: L1 Instruction, non-cacheable; Instruction CPLB setup with 1MB pages.]
   Memory management handles exceptions and redefines external memory pages as required for external memory. Examples will be provided to customers.

   Accessing the Cache Directly
   • Once L1 memory is configured as cache, it can't be accessed via DMA or from a core read.
   • The ITEST_COMMAND and ITEST_DATA memory-mapped registers do allow direct access to instruction memory tags and lines.
   • Analogous registers exist for the data cache.
   • Can be useful for invalidating cache lines directly.

   Data Cache Control Instructions
   • Prefetch: causes data cache to prefet
21. Map Input Sections to Memory Segments
   BF533 default LDF segment names and their use:
   • MEM_L1_CODE: code storage
   • MEM_L1_CODE_CACHE: code storage if not cache
   • MEM_L1_DATA_A: used for default compiler data sections
   • MEM_L1_DATA_A_CACHE: if not used as cache, it becomes heap space
   • MEM_L1_DATA_B: used for default compiler data sections
   • MEM_L1_DATA_B_CACHE: if not used as cache, it is used for data
   • MEM_L1_DATA_B_STACK: dedicated stack space
   • MEM_L1_SCRATCH: dedicated 4 Kbyte data scratchpad
   • MEM_SDRAM0_HEAP: if L1 Data A is used as cache, the heap is external
   • MEM_SDRAM0: external SDRAM bank
   • MEM_ASYNCx (x = 0, 1, 2, 3): 1MB async banks

   LDF and the Stack
   • The C/C++ run-time environment depends upon the initialization of FP and SP.
   • The variables are initialized by constants defined in the LDF: ldf_stack_space and ldf_stack_end.
   • The variables used to initialize FP and SP are declared and initialized in the assembly file basiccrt.s.

   LDF Stack Setup (C/C++ Compiler Only)
   The linker calculates the LDF stack initializing constants from the stack memory segment description:
      stack {
        ldf_stack_space = .;
        ldf_stack_end = ldf_stack_space + MEMORY_SIZEOF(MEM_L1_DATA_B_STACK) ...
      } > MEM_L1_DATA_B_STACK

   LDF and the Heap
22. Ptr Register increments as each descriptor element is read in. For Descriptor Array Mode, the Curr_Desc_Ptr Register must be programmed, not the Next_Desc_Ptr.

   Current Address Register (DMAx_CURR_ADDR / MDMA_yy_CURR_ADDR)
   Bits 31:16 hold Current Address[31:16] and bits 15:0 hold Current Address[15:0]. Reset = 0x0000 0000.
   Contains the current DMA transfer address. At the start of a DMA transfer, the Curr_Addr Register is loaded from the Start_Addr Register, and it is incremented as each transfer occurs.

   Current X Count Register (DMAx_CURR_X_COUNT / MDMA_yy_CURR_X_COUNT)
   Bits 15:0 hold CURR_X_COUNT[15:0].
   This register is loaded by X_Count at the beginning of each DMA transfer. It is decremented each time an element is transferred. For 2D DMA, Curr_X_Count is reloaded after the end of each row. Expiration of the count in this register signifies that DMA is complete. In 2D DMA, this register is 0 only when the transfer is complete.

   Current Outer Loop Count Register (DMAx_CURR_Y_COUNT / MDMA_yy_CURR_Y_COUNT)
23. [Table residue: default DMA channel assignments, ending with UART RX (re-assignable), UART TX (re-assignable), and fixed memory DMA channels.]

   Peripheral Map Register (DMAx_PERIPHERAL_MAP / MDMA_yy_PERIPHERAL_MAP)
   Channel Type (read only)
   • 0: Peripheral DMA
   • 1: Memory DMA
   PMAP[3:0] (peripheral mapped to this channel)
   • 0000: PPI
   • 0001: SPORT0 RX
   • 0010: SPORT0 TX
   • 0011: SPORT1 RX
   • 0100: SPORT1 TX
   • 0101: SPI
   • 0110: UART RX
   • 0111: UART TX
   The Peripheral Map Register allows the user to map a peripheral to a specific channel, thus programming the

   Initialization
   To initiate a DMA transfer, certain parameters need to be defined before the DMA engine can start a DMA sequence. These parameters are:
   • Configuration: describes certain characteristics of the DMA transfer, such as data size, transfer direction, etc.
   • Start Address: specifies the address where the DMA transfer will start from.
   • Count: specifies the number of elements the DMA engine will transfer.
   • Modify: specifies the address increment after every element transfer.
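   The PMAP[3:0] encoding above can be captured in a C enumeration with a small lookup helper. This is an illustrative sketch: the enumerator and function names are hypothetical, only the numeric values and peripheral names come from the table, and the position of the PMAP field inside the register is deliberately left out because the excerpt does not show it legibly.

```c
#include <stddef.h>

/* PMAP[3:0] field values from the Peripheral Map Register table. */
typedef enum {
    PMAP_PPI       = 0x0,
    PMAP_SPORT0_RX = 0x1,
    PMAP_SPORT0_TX = 0x2,
    PMAP_SPORT1_RX = 0x3,
    PMAP_SPORT1_TX = 0x4,
    PMAP_SPI       = 0x5,
    PMAP_UART_RX   = 0x6,
    PMAP_UART_TX   = 0x7
} dma_pmap_t;

/* Return a printable peripheral name for a PMAP value,
   or NULL for values the table does not define. */
const char *pmap_name(unsigned pmap)
{
    static const char *const names[] = {
        "PPI", "SPORT0 RX", "SPORT0 TX", "SPORT1 RX",
        "SPORT1 TX", "SPI", "UART RX", "UART TX"
    };
    if (pmap >= sizeof names / sizeof names[0])
        return NULL;
    return names[pmap];
}
```

   A debug print of a channel's mapping could then call `pmap_name(field_value)` after extracting the field from the register.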
24. are improving
   [Chart: EDN benchmark improvement (%) in the last 2 years for TigerSharc, Blackfin, TI C62x, TI C55, SC140, and Sharc / 219x; vertical axis 0 to 70%.]

   • There is a massive effect from optimization on a DSP platform, much more than on RISC chips.
   • Non-optimised code is up to 20 times slower.
   • There is a sliding scale from control code to the DSP inner loop.
   • Non-optimized code is only for debugging the algorithm.
   • You can also perform limited debugging of optimized code with -O -g, which gives access to global variables, function names, and line numbers.

   Un-Optimized Code for Blackfin
   [Slide: the source code, a for loop over i from 0 to 150 that accumulates products of a[i] and b[i], shown next to its unoptimised assembly (annotated: loop control, increment, test and exit; load A[i]; load B[i]; Sum += A[i] * B[i]) and its optimised assembly built around an LSETUP hardware loop. The unoptimised assembly is easier to understand. Instruction text not recoverable.]
25. decrements Curr_X_Count
   • is not applied when the inner loop (row) count is ended by decrementing Curr_X_Count from 1 to 0
   Y_Modify
   • is the byte address increment applied after each decrement of Curr_Y_Count
   • is not applied to the last element in the array, on which the outer loop (column) count Curr_Y_Count also expires by decrementing from 1 to 0
   After the last transfer completes:
   • Curr_Y_Count = 1
   • Curr_X_Count = 0
   • Curr_Addr is equal to the last item's address plus X_Modify
   In Autobuffer Mode, these registers are reloaded from X_Count, Y_Count, and Start_Addr upon the first data transfer.

   DMA Registers
   Parameter Registers
   • DMAx_START_ADDR: start address of DMA buffer
   • DMAx_DMA_CONFIG: DMA configuration register
   • DMAx_X_COUNT: inner loop count
   • DMAx_X_MODIFY: inner loop address increment, in bytes
   • DMAx_Y_COUNT: outer loop count (2D DMA only)
   • DMAx_Y_MODIFY: outer loop address increment, in bytes
   Current Registers
   • DMAx_CURR_DESC_PTR: current descriptor pointer
   • DMAx_CURR_ADDR: current DMA address
   • DMAx_CURR_X_COUNT: current count (1D) or intra-row X count (2D)
   • DMAx_CURR_Y_COUNT: current row count (2D DMA only)
   Control and Status Registers
   • DMAx_IRQ_STATUS: interrupt status register, contains completion and error interrupt status information
   • DMAx_PERIPHERAL_MAP: priority mapping register
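   The X_Modify and Y_Modify rules above can be checked with a small software model of the address sequence. The sketch below is a simulation under the rules as stated in this excerpt, not a description of the hardware implementation; the function name and parameter types are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model the addresses touched by a 2D DMA transfer.
   - X_Modify is added after each transfer inside a row.
   - At the end of a row (except the last), Y_Modify is added instead.
   - After the very last element, X_Modify is added once more, so the
     returned Curr_Addr equals the last item's address plus X_Modify.
   Up to out_len addresses are stored in out[]. */
uint32_t dma2d_walk(uint32_t start, uint16_t x_count, uint16_t y_count,
                    int16_t x_modify, int16_t y_modify,
                    uint32_t *out, size_t out_len)
{
    uint32_t addr = start;
    size_t n = 0;

    for (uint16_t y = 0; y < y_count; y++) {
        for (uint16_t x = 0; x < x_count; x++) {
            int last_in_row = (x == x_count - 1);
            int last_row = (y == y_count - 1);

            if (n < out_len)
                out[n] = addr;
            n++;

            if (last_in_row && !last_row)
                addr += (uint32_t)(int32_t)y_modify;
            else
                addr += (uint32_t)(int32_t)x_modify;
        }
    }
    return addr;
}
```

   With X_Count = 4, Y_Count = 3, X_Modify = 2, and Y_Modify = 10, the rows start at byte offsets 0, 16, and 32: Y_Modify replaces, rather than adds to, the X_Modify step at each row boundary.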
26. files
   • Define a DSP's target memory map
   • Drag and drop object sections into the memory map
   • Graphically highlights code elimination of unused objects
   • Profile object sections in memory

   [Screenshot: VisualDSP IDDE with project "CPP Test" (Source Files, Linker Files, Header Files, Generated Files) and the Create LDF wizard: "Welcome to the Create LDF Wizard. This wizard will guide you through the creation of a new LDF file. To continue, click Next."]
   [Screenshot: Expert Linker window for CPP Test.ldf. The Memory Map pane lists Segment/Section, Start Address, and End Address; legible rows include MEM_ASYNC0 0x20000000 to 0x200fffff, MEM_ASYNC1 0x20100000 to 0x201fffff, and MEM_ASYNC2 0x20200000 to 0x202fffff, followed by MEM_ASYNC3, MEM_L1_DATA_A, and MEM_L1_DATA_A_CACHE.]
27. [Diagram residue: instruction packing, showing 16-bit, 32-bit, and 64-bit instruction formats.]

   Linker Description File
   Software Development Flow, Step 1: Compiling & Assembling
   [Flow diagram: Source Files (.C and .ASM) go through the Compiler & Assembler to Object Files (.DOJ); the Linker, driven by the Linker Description File (.LDF), produces the Executable (.DXE); the executable goes to the Debugger (in-circuit emulator, simulator, or EZ-KIT) or through the Loader / Splitter to Boot Code / Boot Image (.LDR).]
   Software Development Flow, Step 2: Linking
   [Same flow diagram with the linking stage highlighted.]
   Step 2: Linking
   [Diagram: object sections from the object files (.DOJ) are mapped through output sections in the LDF into the executable (.DXE).]
28. processor can continue running without having to wait for the data to be written back to L2 memory.
   • The victim buffer is comprised of a 4-deep FIFO, each entry 64 bits in width (similar to the fill buffer).
   • There is no data forwarding support from the victim buffer.

   Cacheability Protection Lookaside Buffers (CPLBs)
   Memory Management Unit
   • Cacheability and Protection Look-Aside Buffers (CPLBs)
   • Cache/protection properties are determined on a per-memory-page basis (1K, 4K, 1M, 4M byte sizes)
   • 32 CPLBs total: 16 CPLBs for instruction memory, 16 CPLBs for data memory
   • User/Supervisor access protection
   • Read/Write access protection
   • Cacheable or non-cacheable

   Using CPLBs
   • Cache enabled: CPLBs must be used to define cacheability.
   • Cache disabled: CPLBs can be used to protect pages of memory.
   • If CPLBs are enabled, a valid CPLB must exist before an access to a specific memory location is attempted. Otherwise, an exception will be generated.
   • User and Supervisor mode protection is available without using CPLBs.

   Cacheability Prote
29. remaining in the DEB traffic period
30. [...]ure project options
   • Define the target processor and set up your project options (or accept the default settings) before adding files to the project.
   • The Project Options dialog box provides access to project configuration options, which enable the corresponding build tools to process the project's files correctly.
   [Screenshot: Project Options for fir_533. Tool Chain: Compiler (C/C++ Compiler for Blackfin), Assembler (Blackfin Family Assembler), Linker (Blackfin Linker), Loader (Blackfin Family Loader), Splitter.]
   • Enable building for a specific revision of silicon. There is no need to specify the -si-revision switch.
   • "Automatic" will attempt to determine the revision of the attached target; alternatively, specify a specific revision level (e.g. 0.3).

   Compiler Property Page
   [Screenshot: Project Options for fir_533, Compile page. Code Generation: enable optimization, generate debug information, generate assembly code annotations. Language Dialect: disable builtin functions, disable keyword extensions.]
31. va ptr int a int b int out int n for i 0 i < n i out b Which produces the fastest code? Pointers or Arrays (2): Often there is no difference. Sometimes one version does better for a given algorithm, and it is not always the same style that wins. Start with array notation, as it is easier to understand. Array notation can also be better for alias analysis, helping to ensure that accesses do not overlap. If performance is unsatisfactory, try using pointers. Outside critical loops, stay with array notation. Tricks (useful transformations): Avoid division. There are no divide instructions, just supporting instructions. Floating-point or integer division is very costly. Remember that modulus also implies division. Get division out of loops wherever possible. Exception: division by powers of 2. Division by a power of 2 is rendered as a right shift, which is very efficient. Unsigned divisor: one cycle (a division call costs 35 cycles). Signed divisor: more expensive. Could cast to unsigned? 2 x 0 2 1 >> n Consider
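The power-of-2 exception above can be sketched in portable C. This is a minimal illustration, not the compiler's actual code generation; the helper names `udiv_pow2` and `sdiv_pow2` are hypothetical. It shows why the unsigned case is a single shift while the signed case needs an extra bias step to match C's round-toward-zero division.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned x / 2^n: a single logical right shift gives the exact quotient. */
static uint32_t udiv_pow2(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x >> n;
}

/* Signed x / 2^n with C's round-toward-zero semantics.
 * A plain arithmetic shift rounds toward minus infinity, so negative
 * inputs get a bias of 2^n - 1 before shifting.
 * Assumption: >> on a negative int32_t is an arithmetic shift (true on
 * mainstream compilers, but implementation-defined in ISO C). */
static int32_t sdiv_pow2(int32_t x, unsigned n)
{
    assert(n < 31);
    int32_t bias = (x < 0) ? (int32_t)((UINT32_C(1) << n) - 1u) : 0;
    return (x + bias) >> n;
}
```

Without the bias, -1 shifted right by 2 would give -1 rather than the 0 that -1 / 4 produces in C, which is the extra cost the signed case carries.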
32. ya yb cli xc c i 1 yc OK to unroll outer loops. Bad: scalar dependency. for i=0 i n x a[i] x. A value is used from the previous iteration, so iterations cannot be overlapped. Bad: array dependency. for i=0 i n i a[i] b i. A value may come from a previous iteration, so iterations cannot be overlapped. Resolvable dependencies. Good: a reduction. for i=0 i n i a[i]. The operation is associative, so iterations can be reordered to calculate the same result. Good: induction variables. for i 0 i < i 1 4 b i a[i]. Addresses vary by a fixed amount on each iteration, so the compiler can see there is no data dependence. Avoid aliases. Is there a loop-carried dependence in this loop? void fn int a int b int n for i 0 < i a[i] b[i]. Yes, if a and b point at the same array. Write your code so they do not point at the same array. The -ipa switch may help the compiler find out that this is so. Do not rotate loops yourself. A common DSP idiom: to rotate loops so loads can be execu
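The dependency categories above can be illustrated with three small loops. This is a sketch under stated assumptions: the function names are hypothetical, the loop bodies are not the ones from the original slides, and the C99 `restrict` qualifier is used here as one standard way to promise the compiler that arrays do not overlap (the slides themselves mention only writing non-aliasing code and the -ipa switch).

```c
#include <assert.h>
#include <stddef.h>

/* Good: a reduction. Addition is associative, so the iterations can be
 * reordered or overlapped and still produce the same sum. */
static int sum_reduce(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Bad: array dependency. Each iteration reads the element written by the
 * previous one, so iterations cannot be overlapped. */
static void running_sum(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; ++i)
        a[i] += a[i - 1];
}

/* No loop-carried dependence, provided the arrays do not alias.
 * restrict states that promise explicitly; calling this with
 * overlapping arrays would be undefined behaviour. */
static void vec_add(int *restrict out, const int *restrict a,
                    const int *restrict b, size_t n)
{
    assert((out != NULL && a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```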
33. 0 simulated only not verified on hardware Analog Devices Confidential Information DA cs Price Performance Comparison BENCHMARK Signal Processing Performance PARTNER 6000 9 ADSP BF533 x Medi Processing Po ADI BLACKfin 6411 X TI c55x 2000 1 Baseband TI C64xx Processing PXA250 _ 5502 5501 TI 5510 f 750 5509 intel XSCALE n SH3 DSP Q Hitachi SH3 DSP 100 50 10 5 Price 10 kU Arnalog Devices Confidential Inforrration ANALOG The Worid Leader in High Performance Signal Processing Solutions Blackfin Products at a Glance ADSP BF535 Blackfin DSP Available Now System Control Blocks Emulator Real Event Watchdog Memory Controllers Timers DMA Time PLL Control Clock Blackfin Core To 350 MHz 16KB 32KB Inst Data SRAM Cache System Interface Unit 256 KB SRAM GPIO TIMERS UARTO UART ES Analog Devices Confidential Information Peripheral Blocks 32 bit External Bus Interface DRAM Ctrl High Speed I O PCI v2 2 Master Slave USB v 1 1 ANALOG DEVICES Blackfin ADSP BF533 Available Now System Control Blocks Voltage Clock Memory Regulation PLL DMA Processor Core To 750MHz 80KB 32KB External A 4KB Instruction Instruction nace Scratchpad Interface RO
34. 0 0000 _ gt 3 OxFF80 8000 I ASRAMCache ged os OxFF80 4000 00 00 gt emes oom pamen iaa 020000000 0 0800 0000 Internal Memory External Memory Analog Devices Confidential Information re ADSP BF532 Memory OxFFEO 0000 Core MMR System MMR OxFFCO 0000 gt OxFFBO 1000 O OxFFBO 0000 gt OxFFA1 4000 ese OxFFA1 0000 SRAM Cache OxFFAO C000 gt Instruction SRAM carro oomo e prc 0000 gt OxFF90 8000 25e OxFF90 6000 Data Bank B SRAM Cache OxFF90 4000 Data Bank B SRAM Cache OxFF90 0000 gt DEF Data Bank A SRAM Cache Od rene gt Data Bank A SRAM Cache OxFF80 4000 gt 222 227 didnt 0000 gt 0x2040 0000 3 ss gt 0000 p arena gt 0000 Analog Devices Confidential Information Internal Memory External Memory ANALOG DEVICES ADSP BF531 Memory Core MMR 1 OxFFEO 0000 29 8 5 OxFFCO 0000 gt OxFFBO 1000 2 gt OxFFA1 4000 ___________ OxFFA1 0000 gt Instruction SRAM Cache
35. 0016 26 0 0 0000018 0 0 000001 0 PC 0xf000001c _ Enabled Linear Profiler is also available for the simulator Analog Devices Confidential Information ANALOG DEVICES 4 H Deed DUE Look closely at cycles in critical areas Cycle Accurate Simulator e Step through the code identified by the Statistical profiler Watch the Cycle counter Pipeline Viewer Close on causes of stalls with the pipeline viewer Analog Devices Confidential Information VDSP Pipeline Viewer Accessed through View gt Debug Windows gt Pipeline Viewer a simulator session not available in emulator Pipeline e Cycle DECODE ADDRESS Details for stage EX1 cycle 31 E COMMIT 23 R1 L RILL ISET PO Address Invalid IB I 0 24 PO RO L 10 1 25 Pl PO R1 L stall mE I1 H 26 R2 L Pl R1 L Cause Dagreg RAW hazard R3 L 27 2 1 Po Details Stall in stage AC due to stage EX3 PO R3 L 28 RO L R3 R2 L P1 29 RO L R3 R2 I P1 30 RO L R3 R2 L P1 31 RO L R3 P1 32 1 R3 R2 33 PO R1 L RO L R3 34 Pl PO R1 L RO Analog Dev
36. 3 12 11 10 9 8 Reset 00000 CURR_Y_COUNT[15:0]: This register is loaded by Y_COUNT at the beginning of each 2D DMA transfer. It is not used for 1D DMA. It is decremented each time the CURR_X_COUNT register expires during 2D DMA (a 1 to X_COUNT or 1 to 0 transition), signifying completion of an entire row transfer. After the 2D DMA is complete, CURR_Y_COUNT = 1 and CURR_X_COUNT = 0. Interrupt Status Register (IRQ_STATUS). DMA_RUN (DMA Channel Running, RO): this bit is set to 1 automatically when the DMA_CONFIG register is written. 0 = this DMA channel is disabled, or it is enabled but paused; 1 = this DMA channel is enabled and operating, either transferring data or fetching a DMA descriptor. DFETCH (DMA Descriptor Fetch, RO): this bit is set to 1 automatically when the DMA_CONFIG register is written with FLOW = 0x4 to 0x7. 0 = this DMA channel is disabled, or it is enabled but stopped; 1 = this DMA channel is enabled and presently fetching a DMA descriptor. Reset = 0x0000. DMA_DONE (DMA Completion Interrupt Status, W1C): 0 = no interrupt is being asserted for this channel; 1 = the DMA transfer has completed and this DMA channel's interrupt is being asserted. ERR DMA Error I
37. ION 163 Linker: Generates a complete executable DSP program (.dxe). Resolves all external references. Assigns addresses to relocatable code and data spaces. Generates an optional memory map. Output is in ELF format, used by downstream tools such as the loader, simulator and emulator. Controlled by linker commands contained in a linker description file (LDF): an LDF is required for each project; typically you modify a default one to suit the target application. [Screenshot: Project Options dialog for ex1; visible options include Strip debug symbols, Save temporary files, Search directories, Runtime initialization, Individually map functions and data items, and Additional options.]
38. Information DEVICES 7 Control Status Registers MMR Name SO S1 DO Ir 15 7 N Onyor ation Description DESC Link pointer to next descriptor MDMA yy START ADDR Start address of DMA buffer MDMA yy DMA CONFIG DMA configuration register yy X COUNT Inner loop count MDMA yy MODIFY yy Y COUNT Inner loop address increment in bytes Outer loop count 2D DMA only MDMA yy MODIFY Outer loop address increment in bytes MDMA yy CURR DESC PTR Current Descriptor Pointer yy ADD Current DMA Address yy IRQ STATUS Interrupt Status Register contains completion and error interrupt status MDMA yy PERIPHERAL MA register read only X COUNT yy CURR Y COUNT Current count 1D or intra row X count 2D Current row count 2D DMA only Analog Devices Confidential Information gt rameter Registers Current Registers ANALOG DEVICES Next Descriptor Pointer Register DMAx NEXT DESC PTH CMDMA NEXT DESC PTR Next Descriptor Pointer 31 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Next Descriptor Pointer 15 0 Reset 0x0000 0000 Specifies the location of the Next Descript
39. L1 data a L1 data b bsz bsz init constdata cplb cplb code cplb data ctor datal noncache_code program voldata vtbl 6 8 MEM L1 DATA B STACK MEM L1 DATA B MEM L1 CODE MEM L1 CODE CACHE MEM L1 SCRATCH MEM SYS MMRS Ei E E E ED E2 BB BB oe 8 6 8 8 E E9 E E9 E E 2 This is a memory map view of the generated Idf file In this mode each section s start and end address are shown in a list format Arnalog Devices Confidential Inforrration e Expert Linker CPP Test ldf Input Sections 0 MEM 8000000 20000000 code 20400000 data a data 000000 800000 constdata 2808000 cplb MEM 11 DATA 902000 cplb data Eos 908000 ctor 00000 MEM LI CODE 22800000 3 datat B E 2 8 B E B ES E3 E9 E9 E E9 E9 E E E9 E noncache code 14000 program B voldata 22500000 MEM L1 SCRATCH 22500000 D 22501000 gom fc00000 SYS MMRS fc00000 00000 This is a graphical view of the memory map Double click on the section to zoom in Analog Devices Confidential Inforrration Expert Input Sections Linker CPP Test Idf Memory
40. L3 8 LCO P1 Pl Pl P2 R3 W P1 H Bil P1L2 R3 Rl EE PN 1 RO H RO H 0 RO L RO H IS FP 16 R7 RO L 11 mire 8 Increment I bows ccnl R3 1 8 R3 JUMP PlLl P1L3 Repeat Loop. The Optimizer Looks at Each Operation. Try not to do it at all: perhaps it is not actually needed; calculate at compile time; re-use a previous calculation. Do it more cheaply: avoid storing in memory. Do it more efficiently: use special resources; do more than one thing at a time. Loops get special attention: biggest savings of all. The compiler is your partner: you can count on certain optimizations being done. Compiler command line options: -O: optimize; -Oa: optimize with auto-inlining; -Os: optimize space-sensitively; -Ov: optimize with user control of the balance between size and speed; -ipa: whole-program analysis; -save-temps: preserves compiler output(s). Leave the low-level concerns to the compiler. Leave basic operations to the compiler. 1 2 3 12 cycles
41. M RAM SDRAM Ctrl f High Speed System Interface Unit is Vo Parallel Peripheral SPORTO Interface eripheral Blocks ANALOG Analog Devices Confidential Information DEVICES Blackfin ADSP BF532 Available Now System Control Blocks Voltage Clock Memory Regulation PLL DMA Processor Core To 400MHz 48KB 32KB External A 4KB Instruction Instruction nace Scratchpad Interface ROM RAM SDRAM Ctrl f High Speed System Interface Unit is Vo Parallel Peripheral SPORTO Interface eripheral Blocks ANALOG Analog Devices Confidential Information DEVICES Blackfin ADSP BF531 Available Now System Control Blocks Voltage Clock Memory Regulation PLL DMA Processor Core To 400MHz 32KB 32KB External 4KB Instruction Instruction nace Scratchpad Interface ROM RAM SDRAM Ctrl f High Speed System Interface Unit is Vo Parallel Peripheral SPORTO Interface eripheral Blocks ANALOG Analog Devices Confidential Information DEVICES E INN ADSP BF561 Dual Core Blackfin Available Now 32 bit Emulator External amp Test Voltage Event atchdog Memory PLL Bus Control Timers DMA BAM 1 2 32KB 64KB Inst Data Inst Data SRAM Cache SRAM Cachi System Interface Unit 128 KB SRAM
42. OLLER DMA External DEB Peripheral Access Bus PAB External Port Bus EPB DMA Access Bus DAB L2 Memory. Configurable Memory: The best system performance is achieved when executing code or fetching data out of L1 memory. Two methods can be used to fill L1 memory, caching and dynamic downloading, and the Blackfin processor supports both. Microcontrollers have typically used caching, as they have large programs that often reside in external memory and determinism is not as important. DSPs have typically used dynamic downloading, as they need direct control over which code runs in the fastest memory. The Blackfin processor allows the programmer to choose one or both methods to optimize system performance. Why Do Blackfin Processors Have Cache? To allow users to take advantage of single-cycle memory without having to move instructions and/or data manually. L2 memory can be used to hold large programs and data sets. The paths to and from L1 memory are optimized to perform with cache enabled. Caching automatically optimizes code that reuses recently used or nearby data. Internal L1 memory: smallest capacity, single-cycle access. External L2 memory: largest capacity, highest latency.
43. Performance Signal Processing Solutions. Reference Material: Memory. Data Byte Ordering: The ADSP-BF533 architecture supports little-endian byte ordering. For example, if the hex value 0x76543210 resides in register r0 and the pointer register p0 contains address 0x00ff0000, then the instruction [p0] = r0 would cause the data to be written to addresses 0x00ff0000, 0x00ff0001, 0x00ff0002 and 0x00ff0003, least significant byte first. When loading a byte, half word or word from memory to a register, the LSB (bit 0) of the data word is always loaded into the LSB of the destination register. Instruction Packing: The instruction set is tuned for compact code. Multi-length instructions: 16-, 32- and 64-bit opcodes (the 64-bit format is the multi-op format). Limited multi-issue instructions. No memory alignment restrictions for code (16-bit wide memory), giving maximum code density and minimum system memory cost. Instruction Fetching: A 64-bit instruction line can fetch between 1 and 4 instructions. One 64 bit instruction One 32 bit instruction One 32 bit instruction One 16 bit instruction One 16 bit instruction one 16 bit instruction One 16 bit instruction One 32 bit
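The byte-ordering example above can be reproduced on any host with a short C sketch. The helpers `store_le32` and `load_le32` are hypothetical names; they model in software what a little-endian 32-bit store and load do, and are not Blackfin library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit word the way a little-endian core lays it out in memory:
 * the least significant byte goes to the lowest address. */
static void store_le32(uint8_t *dst, uint32_t value)
{
    assert(dst != 0);
    for (unsigned i = 0; i < 4; ++i)
        dst[i] = (uint8_t)(value >> (8u * i));
}

/* Reassemble the word; byte 0 lands in bits 7..0 of the result, matching
 * the rule that the LSB of the data is loaded into the LSB of the register. */
static uint32_t load_le32(const uint8_t *src)
{
    assert(src != 0);
    uint32_t value = 0;
    for (unsigned i = 0; i < 4; ++i)
        value |= (uint32_t)src[i] << (8u * i);
    return value;
}
```

For the value 0x76543210, the four bytes at increasing addresses come out as 0x10, 0x32, 0x54 and 0x76.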
44. S M Hmm sec data a INPUT_SECTIONS OBJECTS data a > seg data sec data b INPUT_SECTIONS OBJECTS data b > seg data b sec data scr INPUT_SECTIONS OBJECTS data scr > seg data scr sec prog INPUT_SECTIONS OBJECTS prog seg prog. Linker Description File for Programming Memory Description: Define memory segments. Map input section names (produced by the compiler) to memory segments. Run-time stack supported: the stack is used for branching, local variables and arguments; the LDF defines the stack size and location. Run-time heap supported: used for memory management protocols (malloc, free, etc.); the LDF defines the heap size, location and name, for multiple-heap support. Compiler-Generated Memory Section Names: The compiler uses default section names that are mapped appropriately by the linker through the LDF. program: contains all program instructions. data1: contains all global and static data. constdata: contains all data declared as const. ctor: C++ constructor initializations. cplb_code: code CPLB config tables. cplb_data: data CPLB config tables. Memory Descriptions: Define Memory Segments In LDF For e Code Data Stack
45. State History window operations status bar VDK Status window about adding a marshalled messac adding boot threads adding dependent event bits adding device drivers adding device flags adding eventbits adding events adding imported projects adding interrupts adding memory pools adding round robin priorities adding routing threads to pro adding semaphores adding thread types changing boot thread order configuration data cursor deleting boot O objects deleting boot threads deleting device drivers deleting device flags deleting event bits deleting events deleting interrupts deleting memory pools deleting round robin priorities deleting semaphores 3 Display On Line Help Example Home Print Options VDK State History Window Operations Thread 2 KMyThreadTyp ISemanhorePosted 1 Value 3 Status Bar The status bar bottom of plot of the State History page of the VDK State History window shows the event s details and thread status when the data cursor is enabled Event details include the event type the tick when the event occurred and an event value The value for a thread switched event indicates the thread being switched in or out The status bar indicates thread status for the active location Data Cursor Analog Devices Confidential Information ANALOG DEVICES What 15 2 is kernel not an operating system VDK c
46. The World Leader in High Performance Signal Processing Solutions. Blackfin Overview. Srinives K A A Shailendra Mglani. Agenda (Day 1, Day 2): Introduction; Introduction to LDF; VisualDSP features; DMA; Coding guidelines for achieving optimal C performance on Blackfin; VDK and uClinux; Architecture and pipeline; Memory; Assembly-level optimization; Q&A session. Blackfin Introduction: Blackfin DSP is the architectural base for a whole new family of DSPs from ADI. It is built upon the Micro Signal Architecture (MSA) core, developed jointly with Intel Corporation. Blackfin DSPs incorporate the industry's highest-performance 16-bit DSP architecture. Dynamic Power Management capabilities deliver the lowest power consumption. Blackfin DSPs are optimized for processing data, communications and video streams, for penetration into new market spaces. Blackfin Features and Benefits: High performance for real-time video signal processing. Easily programmed to support complex new standards. Handles the DSP and Control cod
47. a b c d 9 R2 b R3 c 1 R2 R3 R1 1 R6 f R4 R1 R6 d R4 R2 b R7 9 R1 R2 R7 a ralog Devices Confidential Informatio Value of a can be used directly from register eliminate load from memory New value assigned to a so value stored at 1 is not used eliminate the store to already in R2 memory 2 b R3 c R1 R2 R6 f R4 R1 R6 d R4 R7 g 9 cycles Leave scheduling to the compiler 1 a b c Take advantage of hardware parallelism consider dispatching 2 multiple instructions in one cycle 3 a b g Optimized code Scheduled code fe tel R2 Rl E R3 Bc 9 9 T 1 82 f 5 gt R6 f R4 R1 RG 191 d R4 R2 R7 d R7 g9 a R1 R1 R2 R7 a R1 Analog Devices Confidential Infonration e KI Compilers understand Loops Simple counted loop C and D don t change Use zero overhead loop during loop mechanism Load them into registers outside for j20 j lt N j P j A j B N j 1 D QIj Aj C BIN j 1 1 D ND Combine reference with incrementing pointer Use post modify addressing COMPILER DOES THE LOW LEVEL WORK
48. ation Register A write of 0x0 to the entire register will always terminate gracefully without DMA Abort. FLOW = 0x4, 0x6, 0x7 (Array/List mode): set the final DMA_CONFIG register with a FLOW = 0x0 setting to gracefully stop the DMA channel. If the DMA_CONFIG parameter is not included within the descriptor block, use the FLOW = 0x1 method above to end the DMA. Memory DMA (MemDMA): Allows memory-to-memory DMA transfers between the various ADSP-BF533 memory spaces. A single MemDMA transfer requires a pair of DMA channels: one to specify the source block of memory and one to specify the destination block of memory. The ADSP-BF533 has four MemDMA channels, which allows two memory-to-memory DMA transfers to be set up at the same time: two source DMA channels, used to read from memory, and two destination DMA channels, used to write to memory. The source and destination DMA channels share an 8-entry, 16-bit FIFO (32-bit FIFOs on the BF561): the source DMA channel fills the FIFO and the destination DMA channel empties it. Each DMA transfer sequence requires two sets of descriptor blocks within memory, one for the source DMA channel and one for the destination DMA channel. Both sets of Descriptor Blocks must be con
49. bedded XML 1 TCP IP Stack in World OSEK Compliant Safety Critical Performance Driven Minimal Code Size De facto Std in Academic World Broad User Community Free Connotation Comprehensive Product Portfolio beyond Kernel Comprehensive CPU coverage for easy switch JAnaiog Devices Conhdental Tnfomration Broad Coverage and Highly Integrated Consumer Media Audio Video Network Connected Automotive Telematics Consumer Media STB PC amp Peripheral Traditional MCU From Desktop to Embedded Devices Consumer Telecomm Industrial Networking DEVICES Operating Systems Real Time Operating Systems e VDK from ADI em A gt A e Unicoi Fusion RTOS e Nucleus PLUS LiveDevices unicz 2 e ThreadX cux e Live Devices e Operating Systems e Embedded Linux BF535 BF531 2 3 in development Networking Stacks e Kadak Kwik Net e Unicoi Fusion Net e Net X Blackfin ANALOG Analog Devices Confidential Information DEVICES The Worid Leader in High Performance Signal Processing Solutions Section 2 Introduction to MssualDSP 44 VisualDSP 4 0 VisualDSP is an integrated development environment that enables efficient management of projects e Key Features Include Editing Building e Compiler assembler linker
50. ccurate shows all effects including stalls Don t assume you know where an application spends its time profile it Analog Devices Confidential Information ANALOG DEVICES VDSP Statistical Profiler The profiler is very useful C C mode because it makes it easy to benchmark a system module by module 1 function Assembly or optimised code appears as individual instructions a Profiling Re Histogram Execution Unit Z rIine C Documents and Sett 8 funcA int 1 int 6 main 2 int funcB int 3 funcB int 3 6 0 0000106 4 int main 4 6 PC O0xf0000102 5 int 1 1 0 0000120 o 6 int b 2 i 0 000011 7 int contr 0 0 0000020 4 8 for cntr 0 cn 0 0 000043 3 3 funcA b 0 0 000043 2 10 b funcB a 0 0 00003 8 3 11 funcA a 0 0 00003 6 2 12 funcA a 0 0 00003 2 13 0 00003 4 7 14 o PC 0xf00003fe 15 gt PC O0xf0000400 16 0 PC O0xf00003fc 17 0 00003 18 int funcA int argA 0 000006 19 int locA 1 PC 0xf 0000000 20 return argAd locA 0 PC 0xf 0000008 ri aR PC O0xf000000a 22 PC O0xf000000c 23 int funcB int argB 4 0 0 000000 24 int locB 1 0 0 0000012 25 return argB locB 0 0 000
51. ch line associated with address in P register. Causes the line to be fetched if it is not currently in the cache and the location is cacheable; otherwise it behaves like a nop. prefetch [p2]; prefetch [p2++] (post-increment by cache line size). FLUSH: Causes the data cache to synchronize the specified cache line with higher levels of memory. If the line is dirty, it is written out and marked clean. flush [p2]; flush [p2++] (post-increment by cache line size). FLUSHINV: Causes the data cache to invalidate a specific line in the cache. If the line is dirty, it is written out. flushinv [p2]. Instruction Cache Control Instructions. IFLUSH: Causes the instruction cache to invalidate a specific line in the cache. iflush [p2]; iflush [p2++] (post-increment by cache line size). Care must be taken when memory that is defined as cacheable is modified by an outside source, such as the DMA controller (data or descriptors). The cache is not aware of these changes, so some mechanism must be set up to handle them. Simple memory polling will not work: the cache must be invalidated before accessing the changed L2 memory. L1 Cache, L2 Memory, External Device.
52. ction Lookaside Buffers (CPLBs): Divide the entire Blackfin memory map into regions (i.e. pages) that have cacheability and protection properties. 16 pages in instruction memory plus 16 pages in data memory. Page sizes: 1 KB, 4 KB, 1 MB, 4 MB. Each CPLB has 2 associated registers: a 32-bit start address (ICPLB_ADDRn, DCPLB_ADDRn) and cache/protection properties (ICPLB_DATAn, DCPLB_DATAn). Note: n equals 15:0. ICPLB_DATAn Register fields: PAGE_SIZE[1:0]: 00 = 1 KB page size, 01 = 4 KB page size, 10 = 1 MB page size, 11 = 4 MB page size. L1 cacheability bit: 0 = non-cacheable in L1, 1 = cacheable in L1; clear this bit whenever L1 memory is configured as SRAM. CPLB_VALID: 0 = invalid (disabled) CPLB entry, 1 = valid (enabled) CPLB entry. CPLB_LRUPRIO: 0 = low importance, 1 = high importance. CPLB_LOCK (can be used by software in CPLB replacement algorithms): 0 = unlocked, CPLB entry can be replaced; 1 = locked, CPLB entry should not be replaced. CPLB_USER_RD: 0 = user mode read access generates a protection violation exception; 1 = user mode read access permitted. Note: n equals 15:0. DCPLB Datan Register 15 14 13 12 11 10 9 8 76 5 4
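The PAGE_SIZE encoding listed above maps naturally onto a small lookup. The sketch below is illustrative only: `cplb_page_bytes` and `cplb_page_base` are hypothetical helper names, and the functions take the 2-bit field value directly rather than assuming where the field sits inside the DATA register.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the 2-bit PAGE_SIZE field value (00, 01, 10, 11) into a page
 * size in bytes: 1 KB, 4 KB, 1 MB or 4 MB. */
static uint32_t cplb_page_bytes(unsigned page_size_field)
{
    static const uint32_t bytes[4] = {
        UINT32_C(1) << 10,  /* 00: 1 KB */
        UINT32_C(1) << 12,  /* 01: 4 KB */
        UINT32_C(1) << 20,  /* 10: 1 MB */
        UINT32_C(1) << 22   /* 11: 4 MB */
    };
    assert(page_size_field <= 3u);
    return bytes[page_size_field];
}

/* Start address of the page containing addr, assuming pages are aligned
 * to their own size (all four sizes are powers of two). */
static uint32_t cplb_page_base(uint32_t addr, unsigned page_size_field)
{
    return addr & ~(cplb_page_bytes(page_size_field) - 1u);
}
```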
53. de Generation System Verification Software Verification. Software tools can be configured and called by the IDDE. Software tools are configured via property pages. The IDDE calls the software tools it needs to complete the build; it is a GUI front end to a command-line utility. Software tools can also be invoked from a command line. C compiler: ccblkfn sourcefile switch switch. Assembler: easmblkfn sourcefile switch switch. Linker: linker object object switch switch. Loader: elfloader executable switch switches. For the complete list of switches, see the appropriate tools manual. Environment (IDDE) Features: The IDDE allows one to manage the project build. The user configures the project and the development tools via property pages. Project property pages configure the project: Project Property Page, General Property Page, Pre-Build Property Page, Post-Build Property Page. Development tools property pages are used to configure the development tools: Assembler Property Page, Compiler Property Page, Linker Property Page, Loader Property Page. Create a project: All development in VisualDSP occurs within a project. The pro
54. e Buffer 0 DAG1 non cacheable fetches Enable 0 CPLBs disabled Minimal 1 DAG1 non cacheable fetches use port B PORT PREFO DAGO Port address checking only 1 CPLBs enabled DMC 1 0 L1 Data Memory Preference Configure 0 DAGO non cacheable fetches 1 For ADSP BF533 use port A 00 Both data banks are 1 DAGO non cacheable fetches SRAM use port B 01 Reserved DCBS L1 Data Cache Bank Select 5 B Valid only when DMC 1 0 11 for ADSP BF532 5 16 ADSP BF533 Determines whether Address bit A 14 or A 23 i d l heL1 d 11 Both data banks are it A 14 or A 23 is used to select the lata lower 16 KB SRAM upper cache bank 2 16 KB cache. Data Cacheability. Cache Mode. What is Cache? In a hierarchical memory system, cache is the first level of memory reached once the address leaves the core (i.e. L1). If the instruction/data word (8, 16, 32 or 64 bits) that corresponds to the address is in the cache, there is a cache hit and the word is forwarded to the core from the cache. If the word that corresponds to the address is not in the cache, there is a cache miss. This causes a fetch of a fixed-size block which contains the requested word f
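The hit/miss behaviour described above can be modelled with a deliberately tiny software cache. This is a teaching sketch, not a model of the Blackfin L1 caches: it is direct-mapped (the real caches are set-associative), and the 32-byte line size, the 8-line capacity and all names (`toy_cache`, `toy_cache_access`) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LINE_BYTES 32u  /* assumed fixed-size block fetched on a miss */
#define TOY_NUM_LINES   8u  /* tiny on purpose, to make evictions easy to see */

typedef struct {
    bool     valid[TOY_NUM_LINES];
    uint32_t tag[TOY_NUM_LINES];
} toy_cache;

/* Returns true on a cache hit. On a miss, the whole line containing addr
 * is (notionally) fetched and replaces whatever occupied that slot. */
static bool toy_cache_access(toy_cache *c, uint32_t addr)
{
    assert(c != 0);
    uint32_t line  = addr / TOY_LINE_BYTES;   /* which block of memory */
    uint32_t index = line % TOY_NUM_LINES;    /* which cache slot it maps to */
    uint32_t tag   = line / TOY_NUM_LINES;    /* identifies the block in that slot */

    if (c->valid[index] && c->tag[index] == tag)
        return true;                          /* hit: word forwarded from cache */

    c->valid[index] = true;                   /* miss: fill the line */
    c->tag[index]   = tag;
    return false;
}
```

Two accesses that fall in the same 32-byte block hit after the first miss, while two blocks that map to the same slot evict each other, which is the conflict that set associativity is meant to reduce.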
55. e symbol map Generate Save temporary files Search directories Additional options Loader Property Page PK Warm once on undefined symbol Runtime initialization None Ip Project E General Compie General 1 Eh General 2 f Preprocessor fih Processor 1 f Processor 2 Eh Warning f Workarounds Bh Assemble 8 Link 858 General f LDF Preproces E Elimination 1 Processor Load aT f Kernel Splitter BootMode Flash PROM Osr OSPiSlave Boot Format Intel hex O Binary Use default start address Verbose Initialization file Linker Property Page Output Width amp bit O 16 bit Qutputfile Additional options Analog Devices Confidential Information Project Compie Eh General 1 fh General 2 f Preprocessor 158 Processor 1 f Processor 2 f Warning 18 Workarounds Bh Assemble Link General f LDF Preproces Elimination Th Processor 8 Load f Options f Kernel f Spitter Pre Build Property Page Project Options for fir 533 Intermediate file Debug Outputfiles Debug Post Build Property Page Project Options for fir_533 Project Pre 5 General 1 Eh General 2 5 Splitter smm
56. e with equal efficiency Maximizes work and minimizes energy per cycle High Performance Blackfin offers 600 MACs today with a roadmap for 2G MACs Low Power Consumption e Blackfin DSP enables significant power savings by dynamically varying both voltage and frequency Ease to use e Blackfin DSP combines attributes of both high performance DSP and microcontrollers into a single RISC ISA Analog Devices Confidential Information ANALOG DEVICES BLACKfin Embed MCU Features Arbitrary bit and bit field manipulation insertion and extraction Integer operations on 8 16 32 bit data types Memory protection and separate user and supervisor stack pointers Scratch SRAM for context switching Population and leading digit counting Byte addressing DAGs Compact Code Density Analog Devices Confidential Information ANALOG DEVICES Integrated Blackfin Features Typically Found in a Microcontroller A RISC Instruction Set and Data Movement Arithmetic LD ST 8 16 32 bits gt gt gt Negate Memory management Unsigned Sign extend 2 and 3 operand instructs Register moves P D DAG Push Pop Push Popmult Logical CC2 dreg etc AND OR XOR NOT BI Ttst set tgl clr CC ops Event control Addressing Modes lt lt gt gt Auto incr Auto decr Pre decr store on SP Video Indirect Indexed w immed offset Post incr w non
57. ecision accumulators. Specialized architectural features: if not well modeled by C, you lose portability and efficiency. Examples: zero-overhead loop, good; fractional arithmetic, a problem (mathematical focus is historically not C's orientation). Features which the compiler must exploit: efficient load/store operations in parallel; multiple data paths (SISD, SIMD, MIMD operations); minimized memory utilization. C provides a common computational model: portability; higher level. The compiler's job is to map this to a particular machine: it tries for optimal use of instructions, supplemented by instruction sequences or library calls. The optimizer improves performance: do things less often and more cheaply; try to utilize resources fully. An optimizing compiler has limited scope: it will not make global changes; it will not substitute a different algorithm; it will not significantly rearrange data or use different types; correctness as defined in the language is the priority. Overview of Compilation: The compiler (1) makes a straightforward translation, fully sequential, each individual step as written; this form provides the clearest debugging; (2) then improves it (optimization): transforms it into an equivalent one, hopefully fast
58. ee ee 28 1 NEC e te 1 Saag 1 BUFFER 1 ETT 88 M MEMORY 12 RNAL 439 INSTRUCTION FETCH 64 BIT TO MMR ACCESS PROCESSOR gt CORE Analog Devices Confidential Information e Bank 16 KB SRAM Four 4KB single ported sub banks Allows simultaneous core and DMA accesses to different banks EAB 16 KB cache 4 way set associative with arbitrary locking of ways and lines LRU replacement No DMA access ANALOG Analog Devices Confidential Information DEVICES PN 2 go n 4 M AS Features of L1 Instruction Memory Unit Instruction Alignment Unit handles alignment of 16 32 and 64 bit instructions that are to be sent to the execution unit Cacheability and Protection Look aside Buffer CPLB Provides cacheability control and protection during instruction memory accesses 256 bit cache Line Fill Buffer uses four 64 bit word burst transfers to copy cache lines from external memory Memory test interface Provides software with indirect access to tag and data memory arrays Analog Devices Confidential Information e L1 Instruction Memory Control Register IMEM CONTROL 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 Lo fo o o o fo o o o o o Reset 0
59. el, BUT missing operations are provided by software emulation (slow); for example, C provides floating-point arithmetic everywhere • C is more machine-dependent than you might think; for example, is a short 16 or 32 bits? (more later). The machine's characteristics will determine your success. C programs can be ported with little difficulty, but if you want high efficiency you can't ignore the underlying hardware. Evaluate Algorithm against Hardware. What's the native arithmetic support? • Can we use floating-point hardware? • How wide is the integer arithmetic? Doing 64-bit arithmetic on a 32-bit unit is slow; doing 16-bit arithmetic on a 32-bit part is awkward • Can we use packed data operations? 2x16 arithmetic might be ideal for your application: more computation per cycle, less memory usage. There are implications for data types, memory layout, and algorithms. What is the computational bandwidth and throughput? • What are the key operations required by your algorithm (MACs, loads, stores)? • How fast can the computer perform them? DSPs Present Some Unique Problems. Special aspects of digital signal processors: • Reduced memory • Extended pr
60. DMA Register Setup. To start DMA operation, some or all of the DMA Parameter Registers must first be initialized, depending on the Next Descriptor Size (NDSIZE) and FLOW bits in the DMA Configuration Register. After initialization, DMA operation begins by writing a 1 to the DMA Enable bit in the DMA Configuration Register. FLOW = 0x0 (Stop Mode), NDSIZE = 0x0: initialize all of the following: START_ADDR, X_COUNT, X_MODIFY, Y_COUNT (if 2D DMA), Y_MODIFY (if 2D DMA), DMA_CONFIG. FLOW = 0x1 (Autobuffer Mode), NDSIZE = 0x0: initialize all of the following: START_ADDR, X_COUNT, X_MODIFY, Y_COUNT (if 2D DMA), Y_MODIFY (if 2D DMA), DMA_CONFIG. FLOW = 0x4 (Descriptor Array Mode), NDSIZE = 0x0 to 0x7: initialize at least CURR_DESC_PTR[31:16] and CURR_DESC_PTR[15:0]. FLOW = 0x6 (Small Descriptor List Mode), NDSIZE = 0x0 to 0x8: initialize at least NEXT_DESC_PTR[31:16] and NEXT_DESC_PTR[15:0]. FLOW = 0x7 (Large Descriptor List Mode), NDSIZE = 0x0 to 0x9: initialize at least NEXT_DESC_PTR[31:16] and NEXT_DESC_PTR[15:0]. How to Stop Transfers. FLOW = 0x0 (Stop Mode): • DMA stops automatically after the DMA transfer is complete. FLOW = 0x1 (Autobuffer Mode): • Write a 0 to the DMA Enable bit in the DMA Configur
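The Stop Mode setup order described above can be sketched in C. This is a minimal host-side sketch, not the vendor's register API: `dma_regs_t` and `dma_setup_stop_1d` are hypothetical names, the struct is an ordinary in-memory stand-in for the memory-mapped parameter registers, and the DMA_CONFIG bit positions are assumptions that should be checked against the processor's hardware reference before use.

```c
#include <stdint.h>

/* Hypothetical stand-in for one DMA channel's parameter registers.
 * On real hardware these would be volatile memory-mapped registers. */
typedef struct {
    uint32_t start_addr;
    uint16_t x_count;
    int16_t  x_modify;
    uint16_t y_count;   /* used only for 2D DMA */
    int16_t  y_modify;  /* used only for 2D DMA */
    uint16_t config;
} dma_regs_t;

/* Assumed DMA_CONFIG bit layout (verify against the hardware reference). */
enum {
    DMA_EN        = 0x0001, /* DMA Enable                      */
    DMA_WNR       = 0x0002, /* Transfer Direction: memory write */
    DMA_WDSIZE_16 = 0x0004, /* Transfer Word Size: 16-bit       */
    DMA_FLOW_STOP = 0x0000  /* FLOW = 0x0, NDSIZE = 0x0         */
};

/* Stop Mode, 1D: initialize every parameter register first,
 * then set the enable bit as the final step. */
uint16_t dma_setup_stop_1d(dma_regs_t *ch, uint32_t buffer, uint16_t count)
{
    ch->start_addr = buffer;
    ch->x_count    = count;
    ch->x_modify   = 2;  /* byte stride for 16-bit elements */
    ch->config     = DMA_FLOW_STOP | DMA_WDSIZE_16 | DMA_WNR;
    ch->config    |= DMA_EN;  /* writing 1 to DMA Enable starts the transfer */
    return ch->config;
}
```

Writing the enable bit last mirrors the rule that DMA operation begins only once DMA Enable is set in the configuration register.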
61. er and smaller; must get same answers • Simple guiding principle • Avoid work • Reduce generality • Do things in parallel. Summary: How to go about increasing performance. 1. Work at high level first (most effective, maintains portability): • improve algorithm • make sure it's suited to hardware architecture • check on generality and aliasing problems. 2. Look at machine capabilities: • may have specialized instructions (library/portable) • check handling of DSP-specific demands. 3. Non-portable changes last: • in [illegible] • in assembly language • always make sure simple C models exist for verification. The compiler will improve with each release. Choose Optimized C or Out-of-the-Box C. OTB, or out-of-the-box, C is portable code. But most platforms allow some elaboration of the source: pragmas; compiler-specific assertions; builtin functions; memory qualifiers (const, restrict, volatile, bank). These can specify alignment, cycle iteration count, SIMD, and memory type, or access specific machine instructions one-to-one. Optimized C can go very much faster than out-of-the-box C. OTB C compilers
62. fer consists of a four 64-bit word read burst. The instruction memory unit requests the target instruction word first; once it has returned the target word, the IMU requests the next three words in sequential address order, wrapping around. Target word and fetching order for the next three words: WD0, then WD1, WD2, WD3; WD1, then WD2, WD3, WD0; WD2, then WD3, WD0, WD1; WD3, then WD0, WD1, WD2. Cache Line Fill Buffer. The cache line fill buffer allows the core to access the data from the new cache line as the line is being retrieved from external memory, rather than having to wait until the line has been completely written to the 4KB memory block. The line fill buffer organization is shown below. [Figure: line fill buffer; labels: ADDRESS TAG, VALID BITS] The line fill buffer is also used to support non-cacheable accesses. A non-cacheable access consists of a single 64-bit transfer on the instruction memory unit's external read port. A non-cacheable access includes external accesses when instruction memory is configured as SRAM, or accesses to non-cacheable pages. Cache Line Replacement. The cache line replacement unit first checks for invalid entries. If a single invalid entry is found, then that entry is selected for the new cache
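The wrap-around fetch order above is a modulo-4 rotation starting at the target word. A minimal sketch in C, assuming a 32-byte line made of four 8-byte words so that address bits 4:3 select the word; `target_word` and `line_fill_order` are illustrative names, not part of any Blackfin library.

```c
#include <stdint.h>

enum { WORDS_PER_LINE = 4 };  /* four 64-bit words per 32-byte cache line */

/* Index (0..3) of the 64-bit word that holds the requested address. */
unsigned target_word(uint32_t addr)
{
    return (addr >> 3) & 3u;
}

/* Fill order: target word first, then the next three words in
 * sequential address order, wrapping around within the line. */
void line_fill_order(unsigned target, unsigned order[WORDS_PER_LINE])
{
    for (unsigned i = 0; i < WORDS_PER_LINE; i++)
        order[i] = (target + i) % WORDS_PER_LINE;
}
```

For a target of WD2 this yields WD2, WD3, WD0, WD1, matching the third row of the fetch-order table.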
63. fficiently Reorganizing the loop in such a way that each iteration of software pipelined code is made from instructions of different iterations of the original loop Simple Dot Product load multiply accumulate CYCLE 1 2 3 4 5 6 100 Fl 1 Al F2 M2 A2 F3 M3 A3 F4 M4 A4 e The pipeline gives more instructions to be done per cycle Analog Devices Confidential Information Effects of Vectorization and Software Pipelining on Blackfin Simple code generation 1 iteration in 4 instructions LSETUP RO L W I1 R1 L 10 1 RO L R1 L Vectorized and unrolled once 2 iterations in 2 instructions RO 11 R1 10 1 RO H R1 H 0 RO L R1 L IS Software pipeline 2 iterations in 1 instruction 1 RO H RO H 0 RO L RO H IS RO L W I1 RO H W I0 LSETUP P1L2 P1L3 8 LCO P1 align 8 PIL2 1 RO H RO H 0 RO L RO H IS RO L W I1 RO H W IO JPlL3 Analog Devices Confidential Information DAS 40 Los T 2 fir Ww i Do not unroll inner loops yourself Good compiler unrolls to use both compute blocks for i 0 i lt n i b i ali Bad compiler leaves on a single compute block for i 0 lt 1 2 xb b i yb b i 1 ali ya 1 xc xa xb yc
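The advice above is to write the inner loop plainly and leave unrolling, vectorization, and software pipelining to the optimizer. A minimal sketch of the simple dot product (load, multiply, accumulate) in portable C follows; `dot16` is an illustrative name, and the choice of 16-bit inputs with a 32-bit accumulator is an assumption made to match the 16-bit multiplies shown on the slide.

```c
#include <stdint.h>

/* Plain dot product: one load pair, one multiply, one accumulate per
 * iteration. No manual unrolling, so the optimizer is free to
 * vectorize and software-pipeline the loop itself. */
int32_t dot16(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}
```

Whether a given compiler actually reaches two iterations per instruction on this loop depends on alignment and aliasing information it can prove, which is why the slides also recommend avoiding aliases and loop-carried dependencies.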
64. figured to have the same transfer count and data size, but they can have different modify values • The DMA Configuration Register for the source channel must be written before the DMA Configuration Register for the destination channel. When the destination DMA Configuration Register is written, MemDMA operation starts after a latency of 3 SCLK cycles. It is preferable to activate interrupts on only one channel • This eliminates ambiguity when trying to identify the channel (either source or destination) that requested the interrupt. Prioritization and Traffic Control. Traffic can be independently controlled for each of the three buses (DAB, DCB, and DEB) with simple counters • Alternation of transfers between MDMA streams can also be controlled. Using the traffic control features, the DMA system preferentially grants data transfers on the DAB or memory buses (DCB and DEB) which are going in the same read/write direction as the previous transfer, until either the traffic control counter times out or traffic stops or changes direction on its own. When a count field in TC_CNT expires, it is automatically reloaded with the appropriate value programmed in TC_PER (i.e., the period value). When a DAB, DEB, or DCB counter decrements from 1 to 0, the opposite-direction DAB, DCB, or DEB access is preferred • This may result in a direction change
65. [Screenshot: VisualDSP++ output window and disassembly; contents not recoverable] Section 11: Direct Memory Access. BF533 Overview. The ADSP-BF533 DMA controller allows data transfer operations without processor intervention • Core sets up registers or descriptors • Core responds to interrupts when data is available. Types of data transfers: Internal or External Memory <-> Internal or External Memory; Internal or External Memory <-> Serial Peripheral Interface (SPI); Internal or External Memory <-> Serial Port; Internal or External Memory <-> UART Port; Internal or External Memory <-> Parallel Port Interface (PPI).
66. hing about the real data precision. The programmer must decide. For instance, two 12-bit precision inputs are quite safe (24 bits max on multiplication). Replace Conditionals with Min, Max, Abs. Simple bounded decrement (programming trick): in assembly, a decrement of R0 followed by R0 = MAX (R1, R0). Avoiding jump instruction latencies and simplifying control flow helps optimisation. The compiler will often do this automatically for you, but not always in 16-bit cases. BF ISA note: Min and Max are for signed values only. Removing Conditionals 2. Pipelined architecture problem: a loop over I that tests a KeyArray entry on every iteration and either adds buffer[I + 10] * 64 to sum or subtracts it. Better solution, which removes the conditional branch (multiplication is fast): let KeyArray hold +64 or -64, so the loop body becomes a single accumulation of buffer[I + 10] * KeyArray[val1][10 - k + I]. The compiler is not able to make this kind of global change. Removing Conditionals 3. Duplicate small loops rather than have a conditional in a small loop. Example: for if else
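Both tricks above can be sketched in portable C. This is an illustration under stated assumptions, not the slide's original listing: all function names are hypothetical, the key table is flattened to one dimension, and the +64/-64 weights follow the "let KeyArray hold 64" suggestion.

```c
/* Bounded decrement, branchy form. */
int bounded_decrement_branchy(int k, int floor)
{
    k = k - 1;
    if (k < floor)
        k = floor;
    return k;
}

/* Same result expressed as a max, which can map to a MAX instruction
 * (signed values only, per the ISA note). */
int bounded_decrement_max(int k, int floor)
{
    int d = k - 1;
    return d > floor ? d : floor;
}

/* Conditional inside the loop: key[i] == 1 means add, otherwise subtract. */
int sum_branchy(const short *buf, const signed char *key, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (key[i] == 1)
            sum += buf[i] * 64;
        else
            sum -= buf[i] * 64;
    }
    return sum;
}

/* Branch removed: the table already holds +64 or -64. */
int sum_weighted(const short *buf, const signed char *weight, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += buf[i] * weight[i];
    return sum;
}
```

The second rewrite changes the data representation, which is why it has to come from the programmer rather than the optimizer.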
67. How about the pipeline? Deep pipeline processors: • pipelines do badly on conditionally branching code, and also on table lookup • sometimes branches can be avoided by using other techniques. Is there a latency associated with computations (results not ready on the next cycle)? • latency can be hidden within a loop • hiding latencies involves loop setup overhead, a problem if iteration counts are low. The compiler will do its best, but inherent hardware limitations will always influence the outcome • The pipeline is FULLY interlocked and interruptable. Blackfin Pipeline Latencies 1. 1. Multiply/video operation latencies: one stall (a write to R0 followed directly by a multiply that reads R0.H). 2. Load to DAG latencies: three stalls (P3 loaded via SP, then P3 used to produce R0). 3. Sub-bank access collision: one stall (a multiply of R4.L and R5.H issued with loads of R3 and R4 through I0 and I1). Blackfin Pipeline Latencies 2. 4. Instruction flow dependencies: correctly predicted branch, 4 stalls; incorrectly predicted branch, 8 stalls. 5. Store buffer load collision: a store of R0 through W[P0] followed by a load into R1 stalls. 6. Hardware loop latencies • example is instructions between lsetup and loop top: LSETUP t
68. Both DAG units can access Data Banks A & B. If an address conflict is detected, Data Bank priority is as follows: 1. System (highest priority); 2. DAG Unit 0; 3. DAG Unit 1 (lowest priority). Parallel DAG accesses can occur to the same Data Bank as long as the references are to different sub-banks OR they access 2 words of different 32-bit address polarity (address bit 2 is different). A2 = 1: odd; A2 = 0: even. A dual access to an odd and an even quad address location can be performed in a single cycle. A dual access to two odd or two even locations will result in an extra cycle (1 stall) of delay. L1 Scratchpad Memory. Dedicated 4KB block of data SRAM. Operates at CCLK rate. Cannot be configured as cache. Cannot be accessed by DMA. Typical use is for User and Supervisor stacks, to do fast context switching during interrupt handling. L1 Data Memory Control Register (DMEM_CONTROL): bits 31-16 read as 0; Reset = 0x0000 0001. Bits 15-0: PORT_PREF1 (DAG1 Port Preference), Protection Lookasid
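The dual-access rule above reduces to a two-part address test. A minimal sketch in C, assuming sub-banks are 4KB and 4KB-aligned so that `addr >> 12` identifies the sub-bank; `dual_access_stalls` is an illustrative name, and the model deliberately ignores DMA contention and every other stall source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when two parallel DAG accesses would cost an extra cycle:
 * both fall in the same 4KB sub-bank AND address bit 2 is the same
 * (two odd or two even 32-bit locations). */
bool dual_access_stalls(uint32_t addr0, uint32_t addr1)
{
    bool same_sub_bank = (addr0 >> 12) == (addr1 >> 12);
    bool same_polarity = ((addr0 ^ addr1) & 0x4u) == 0;
    return same_sub_bank && same_polarity;
}
```

A practical consequence is that two arrays read in lock-step are best placed in different sub-banks, or offset so that paired elements differ in address bit 2.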
69. ject file (.DPJ) stores your program's build information: source files list and development tools option settings. A project group file (.DPG) contains a list of projects that make up an application (e.g., an ADSP-BF561 dual-core application). [Screenshot: VisualDSP++ window, target ADSP-BF533 (ADSP-BF5xx Single Processor); File menu items: Load Program, Load Symbols, Reload Program, Recent Files, Recent Projects, Recent Programs, Recent Project Groups, Recent Scripts, Recent Workspaces; Disassembly window starting at FFA00000; Project window showing Project Group Talkthrough_TDM_C (5 projects) with folders Source Files, Linker Files, Header Files and projects CoreA, CoreB, sml2, sml3] Talkthrough TDM C Project Talkthrough_TDM_C dpj Config
70. line. If multiple invalid entries are found, the replacement entry for the new cache line is selected based on the following priority: way 0 first, way 1 next, way 2 next, way 3 last. When no invalid entries are found, the cache replacement logic uses a 6-bit LRU algorithm to select the entry for the new cache line. For instruction cache, the LRUPRIO bit is also considered. Instruction Cache Locking By Line. LRUPRIO prevents the cached line from being replaced. The CPLB_LRUPRIO bits in the ICPLB_DATAx registers define the priority for that page. The cache line importance level (LRUPRIO) is saved in the TAG and used by the replacement policy logic. Cache line replacement policy with LRUPRIO: • With no invalid entries, a high-priority line will replace a low-priority line, or a high-priority line if all 4 ways contain high-priority lines • The LRU (least recently used) policy is used to determine which one of the lines that have the same priority will be replaced. Setting the IMEM_CONTROL LRUPRIORST bit clears all LRUPRIO bits in the TAGs. Instruction Cache Locking By Way. Each 4KB way of the instruction cache can be locked individually to ensure placement of performance-critical code. Controlled by the ILOC<3:0> bits in the IMEM_CONTROL registe
71. ment in the inner loop, up to but not including the last element in each inner loop. After the last element in each inner loop, Y_MODIFY is applied instead. Outer Loop Count Register (DMAx_Y_COUNT / MDMA_yy_Y_COUNT), bits 15-0, field Y_COUNT[15:0]. For 2D DMA, the Y Count Register contains the outer loop count. This register contains the number of rows in the outer loop of a 2D DMA sequence. It is not used in 1D DMA. Outer Loop Address Increment Register (DMAx_Y_MODIFY / MDMA_yy_Y_MODIFY), bits 15-0, field Y_MODIFY[15:0]. This register contains a 2's complement byte address increment. In 2D DMA, this increment is applied after each decrement of Curr_Y_Count, except for the last item in the 2D array, on which Curr_Y_Count also expires. The value is the offset between the last word of one row and the first word of the next row. Current Descriptor Pointer register, bits 31-16 and 15-0; Reset = 0x0000 0000. Contains the memory address of the next descriptor element to be loaded. Curr Desc
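The interaction of X_MODIFY and Y_MODIFY in 2D DMA can be made concrete by generating the address sequence in software. A minimal sketch in C under the rules stated above; `dma2d_addresses` is an illustrative name, and the special case where a count of 0 means 65,536 elements is deliberately not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the byte address of every element of a 2D transfer into out[]
 * (up to cap entries) and returns the total element count.
 * X_MODIFY is applied after each element except the last in a row;
 * Y_MODIFY is applied after the last element of each row except the
 * final one, so it is the offset from the last word of one row to the
 * first word of the next. */
size_t dma2d_addresses(uint32_t start,
                       uint16_t x_count, int16_t x_modify,
                       uint16_t y_count, int16_t y_modify,
                       uint32_t *out, size_t cap)
{
    uint32_t addr = start;
    size_t n = 0;

    for (uint32_t y = 0; y < y_count; y++) {
        for (uint32_t x = 0; x < x_count; x++) {
            if (n < cap)
                out[n] = addr;
            n++;
            if (x + 1 < x_count)
                addr += (uint32_t)(int32_t)x_modify;
            else if (y + 1 < y_count)
                addr += (uint32_t)(int32_t)y_modify;
        }
    }
    return n;
}
```

With start 0x1000, three 16-bit elements per row (X_MODIFY = 2), two rows, and Y_MODIFY = 10, the sequence is 0x1000, 0x1002, 0x1004, 0x100E, 0x1010, 0x1012.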
72. n Family Simulators; Platform: ADSP-BF5xx Single Processor Simulator; Session name: My BF532 Sim Session. [Screenshot: New Session dialog; processor list includes ADSP-BF534, ADSP-BF536, ADSP-BF537, ADSP-BF538, ADSP-BF539] Debug Features. Single Step; Run; Halt; Set Breakpoints; Register Viewing; Memory • Viewing • Plotting • Dump/Fill. Code Optimization Utilities • Profiling • Pipeline Viewer • Cache Viewer. Compiled Simulation. High-Level Language debug support • Mixed mode. Online Help. Fully searchable and indexed online help. Includes quick overviews on using VisualDSP and all of its features. Excellent supplement to the manual for things that are better represented visually, such as what various plot windows should look like. Customizable by using the Favorites window. [Screenshot: VisualDSP 4.0 Help window] Type in the keyword to find State History window openin
73. Block is multi-ported when: accessing different sub-banks, OR accessing one odd and one even location (address bit 2 different) within the same sub-bank. When used as SRAM: allows simultaneous dual DAG and DMA access. When used as cache: each bank is 2-way set associative; no DMA access; allows simultaneous dual DAG access. [Figure: Data Bank B sub-bank addresses 0xFF90 0000, 0xFF90 1000, 0xFF90 2000, 0xFF90 3000, 0xFF90 4000, 0xFF90 5000, 0xFF90 6000, 0xFF90 7000; label: CONFIGURABLE] L1 configurable data memory can be: both banks A & B as SRAM; Bank A as cache, bank B as SRAM; both banks as cache. [Figure: Data Bank, single-bank variant; label: CONFIGURABLE] L1 configurable data memory can be: Bank A as SRAM; Bank A as cache.
74. Overview (cont.). The ADSP-BF533 system includes 6 DMA-capable peripherals, including the Memory DMA controller (MemDMA), with 12 DMA channels and bus masters that support these devices: SPORT0 RCV DMA channel; SPORT1 RCV DMA channel; SPORT0 XMT DMA channel; SPORT1 XMT DMA channel; SPI DMA channel; UART RCV channel; UART XMT channel; PPI DMA channel; 4 Memory DMA channels (equates to 2 DMA streams). BF533 Buses. The DMA Access Bus (DAB) provides a means for DMA channels to be accessed by the peripherals. The DMA External Bus (DEB) provides a means for DMA channels to gain access to off-chip memory • The core processor has priority over the DEB on the External Port Bus (EPB) for off-chip memory. The DMA Core Bus (DCB) provides a means for DMA channels to gain access to on-chip memory • The DCB has priority over the core processor on arbitration into L1 memory configured as SRAM. BF533 DMA Priority. The ADSP-BF533 processor uses the following priority arbitration policy on the DAB (each entry re-assignable): PPI; SPORT0 RX; SPORT0 TX; SPORT1 RX; SPORT1 TX; SPI; UART
75. nterrupt Status (W1C): 0 = no DMA error has occurred; 1 = a DMA error has occurred and the global DMA error interrupt is being asserted. DMA Traffic Control Counter Period Register (TC_PER). MDMA_ROUND_ROBIN_PERIOD[4:0]: maximum length of MDMA round-robin bursts. If not zero, any MDMA stream which receives a grant is allowed up to that number of DMA transfers, to the exclusion of the other MDMA streams. DCB_TRAFFIC_PERIOD[3:0]: 000 = no DCB bus transfer grouping performed; other = preferred length of unidirectional bursts on the DCB bus between the DMA and internal L1 memory. DEB_TRAFFIC_PERIOD[3:0]: 000 = no DEB bus transfer grouping performed; other = preferred length of unidirectional bursts on the DEB bus between the DMA and external memory. DAB_TRAFFIC_PERIOD[2:0]: 000 = no DAB bus transfer grouping performed; other = preferred length of unidirectional bursts on the DAB bus between the DMA and the peripherals. DMA Traffic Control Counter Register (TC_CNT, read-only). MDMA_ROUND_ROBIN_COUNT[4:0]: current cycle count remaining in the MDMA round-robin period. DAB_TRAFFIC_COUNT[2:0]: current cycle count remaining in the DAB traffic period. DCB_TRAFFIC_COUNT[3:0]: current cycle count remaining in the DCB traffic period. DEB_TRAFFIC_COUNT[3:0]: current cycle count
76. oes not include the upper 16 bits of the NEXT_DESC_PTR parameter. The upper 16 bits are taken from the upper 16 bits of the NEXT_DESC_PTR register, thus confining all descriptors to a specific 64K page in memory • Descriptor List (Large Model) Mode (FLOW = 0x7): in this mode, the Descriptor Block includes all 32 bits of the NEXT_DESC_PTR parameter, thus allowing maximum flexibility in locating descriptors in memory. Descriptor Block Structures. Depending on the Descriptor Mode used, the following lists the order of the Descriptor Block parameters stored within memory, by descriptor offset, as Descriptor Array Mode / Small Descriptor List Mode / Large Descriptor List Mode: 0x0: START_ADDR[15:0] / NEXT_DESC_PTR[15:0] / NEXT_DESC_PTR[15:0]; 0x2: START_ADDR[31:16] / START_ADDR[15:0] / NEXT_DESC_PTR[31:16]; 0x4: DMA_CONFIG / START_ADDR[31:16] / START_ADDR[15:0]; 0x6: X_COUNT / DMA_CONFIG / START_ADDR[31:16]; 0x8: X_MODIFY / X_COUNT / DMA_CONFIG; 0xA: Y_COUNT / X_MODIFY / X_COUNT; 0xC: Y_MODIFY / Y_COUNT / X_MODIFY; 0xE: (none) / Y_MODIFY / Y_COUNT; 0x10: (none) / (none) / Y_MODIFY. NOTE: Not all of the parameters need to be initialized within the Descriptor Block, depending on the NDSIZE value within the DMA Configuration Register. The NDSIZE value is the numb
77. omprises • VDK libraries • VDK-specific .ldf files • Include files • Template files. Overheads • Memory overhead • Minimum memory requirement is platform dependent • Footprint is one of the most important metrics for an RT kernel • MIPS overhead. Coding Guidelines for Achieving Optimal C Performance on Blackfin. Strategic Objective: make C as fast as assembler. Advantages: C is much cheaper to develop; C is much cheaper to maintain; C is comparatively portable. Disadvantages: ANSI C is not designed for DSP; DSP processor designs usually expect assembly in key areas; DSP applications continue to evolve. Pillars of Effective Programming. Understand underlying hardware capabilities. Discover what the compiler can provide. Design the program effectively: • general choice of algorithm • choice of data representation • finer low-level programming decisions. Usually the process of performance tuning is a specialisation of the program for particular hardware. It may grow larger or more complex, and is less portable. Analog C Compiler VDSP 4.0: S
78. op bottom LC0 P0 (3 stalls) P0 R0 top. Latency affects programming style. Take care with structure depth • p->q->z is inefficient to access, and hard on pointer analysis: what data does this reference? Take care with table lookup. Data types. Native C Data Types on Blackfin: char, 8-bit signed; unsigned char, 8-bit unsigned; short, 16-bit signed integer; unsigned short, 16-bit unsigned integer; int, 32-bit signed integer; unsigned int, 32-bit unsigned integer; long, 32-bit signed integer; unsigned long, 32-bit unsigned integer; float, 32-bit; double, 32-bit. long long (64-bit) and unsigned long long (64-bit) are not supported by the hardware. An Efficient Floating-Point Emulation. Measurement in cycles, smaller is better (two columns; column headings lost in extraction): Multiply 330 / 95; Add 163 / 108; Subtract 195 / 145; Divide 655 / 246; Sine 5341 / 2164; Cos 5942 / 2029; Square Root 5836 / 316. And then add in the MHz advantage. Note: our square root uses a better algorithm.
79. or Block when the current DMA transfer finishes. Used only in Small and Large Descriptor List Modes. Contents of this register are copied into the Curr_Desc_Ptr Register at the start of a descriptor block fetch. Disregarded in Stop, Autobuffer, and Descriptor Array Modes. Configuration Register (DMAx_CONFIG / MDMA_yy_CONFIG). DMA Buffer Clear: 0 = retain FIFO data between DMA transfers; 1 = discard DMA FIFO before beginning DMA transfer. DMA Mode: 0 = linear; 1 = 2D DMA. Transfer Word Size: 00 = 8-bit transfers; 01 = 16-bit transfers; 10 = 32-bit transfers; 11 = reserved. Transfer Direction: 0 = memory read; 1 = memory write. DMA Enable: 0 = disabled; 1 = enabled. Bit 1 cannot be modified for some peripherals and MemDMA. DMA Configuration Register (cont.) (DMAx_CONFIG / MDMA_yy_CONFIG). Interrupt Timing Select: 1 = interrupt after completing each row (inner loop), 2D only. FLOW (Next Operation): 0x0 = Stop; 0x1 = Autobuffer Mode; 0x4 = Descriptor Array; 0x6 = Descriptor List (small model); 0x7 = Descriptor List (large model). Interrupt Enable: 0 = do not allow completion of DMA transfer to generate an interrupt. NDSIZE (Next Descriptor Size): 0000 = required if Sto
80. p/Autobuffer Mode; 0001-1001 = descriptor size; 1010-1111 = reserved. Interrupt Enable: 1 = allow completion of DMA transfer to generate an interrupt. Start Address register, bits 31-16 and 15-0 (DMA Start Address); Reset = 0x0000 0000. Specifies the address of the data buffer currently targeted for DMA. Contents of the Start_Addr_Ptr Register are copied into the Curr_Start_Addr Register at the start of a DMA transfer. X Count Register (DMAx_X_COUNT / MDMA_yy_X_COUNT), bits 15-0, field X_COUNT[15:0]; Reset 0x0001. For 2D DMA, the X Count Register contains the inner loop count. For 1D DMA, it specifies the number of elements (8-, 16-, or 32-bit) to read in. A value of 0x0 in X_COUNT corresponds to 65,536 elements. X Address Increment Register (DMAx_X_MODIFY / MDMA_yy_X_MODIFY), bits 15-0, field X_MODIFY[15:0]; Reset 0x000. This register contains a signed, 2's complement byte address increment. In 1D DMA, this increment is the stride that is applied after transferring each element. In 2D DMA, this increment is applied after transferring each ele
81. r. Data Cache Mode. [Figure: data cache address layout] • Four 4KB sub-banks (16KB total) • Each sub-bank has 2 ways (2KB for each way) • Each way has 64 lines • Each line is 32 bytes • If both Data Bank A and B are set for cache, bit 14 or 23 is used to determine which data bank. Write-Through • A cache write policy where write data is written to the cache line and to the source memory. Write-Back • A cache write policy where write data is written only to the cache line. The modified cache line is written to source memory only when it is replaced. Dirty/Clean (applies to write-back mode only) • State of the cache line indicating whether the data in the cache has changed since it was copied from source memory. A performance trade-off is required between write-through and write-back to determine the best policy to use for an application. Data Cache Victim Buffer. The victim buffer is used to read a dirty cache line, either being flushed or replaced by a cache line fill, and then to initiate a burst write operation on the bus to perform the line copyback to the system. The
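The trade-off between the two write policies can be seen by counting writes to source memory. The following is a toy model of a single cache line in C, written only to illustrate the definitions above; `line_model_t` and its functions are hypothetical and do not model the real cache hardware, burst sizes, or the victim buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cached word, its copy in source memory,
 * a dirty flag, and a counter of writes that reach source memory. */
typedef struct {
    uint32_t cached;
    uint32_t memory;
    bool     dirty;
    unsigned mem_writes;
} line_model_t;

/* Write-through: data goes to the cache line and to source memory. */
void write_through(line_model_t *l, uint32_t value)
{
    l->cached = value;
    l->memory = value;
    l->mem_writes++;
}

/* Write-back: data goes only to the cache line, which becomes dirty. */
void write_back(line_model_t *l, uint32_t value)
{
    l->cached = value;
    l->dirty  = true;
}

/* Replacement: a dirty line is copied back to source memory once. */
void evict(line_model_t *l)
{
    if (l->dirty) {
        l->memory = l->cached;
        l->mem_writes++;
        l->dirty = false;
    }
}
```

Three write-back stores followed by one eviction cost a single memory write in this model, where three write-through stores cost three; the price of write-back is that source memory is stale until the line is replaced or flushed.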
82. re Clock (CCLK) Domain; System Clock (SCLK) Domain. [Figure: core/system bus interface diagram; recoverable labels: DMA Core Bus (DCB), DMA Ext Bus (DEB), DMA Access Bus (DAB), Peripheral Access Bus (PAB), External Port Bus (EPB), External Access Bus (EAB), DMA Controller, EBIU, Watchdog and Timers, Event Controller, Power Management, Programmable flags, UART (IrDA), SPORTs, SPI, PPI, internal Boot ROM] Blackfin Internal SRAM. [Figure: per-device memory blocks for ADSP-BF531, ADSP-BF532, ADSP-BF533; recoverable labels: 84KB Total, 116KB Total, 148KB Total, 16KB Data SRAM/Cache, 32KB Data SRAM, 4KB Scratchpad] ADSP-BF533 Memory Map. [Figure: memory map; recoverable labels: 0xFFA1 4000, 0xFF90 8000, 0xFF90 6000, Data Bank SRAM/Cache, Instruction SRAM/Cache, Instruction SRAM] 0xFF9
83. rom the main memory. Blackfin allows the user to specify which regions (i.e., pages) of main memory are cacheable and which are not, through the use of CPLBs (more on this later) • If a page is cacheable, the block (i.e., the cache line, containing 32 bytes) is stored in the cache after the requested word is forwarded to the core • If a page is non-cacheable, the requested word is simply forwarded to the core. ADSP-BF533 Instruction Cache. [Figure: instruction cache organization; recoverable labels: LINE ADDRESS TAG, LINE ADDRESS INDEX, LINE OFFSET, WAY2, WAY3, VALID, TAG, 32-BYTE LINE 0 through 32-BYTE LINE 5, DATA <64>; shaded boxes across each way constitute a set] Cache Line • A 32-byte contiguous block of memory. Set • A group of cache lines in the cache • Selected by the Line Address Index. Way • One of several places in a set that a cache line can be stored • 1 of 4 for instructions • 1 of 2 for data. Cache Tag • Upper address bits stored with the cache line • Used to identify the specific address in source memory that the cached line represents. Address: • Four 4KB sub-banks, 16KB
84. tate-of-the-art optimizer • Provides flexibility • Ease of adding architecture-specific optimizations. Exploitation of explicit parallelism in the architecture • Vectorization (exploiting wide load capabilities) • Recognizing SIMD opportunities • Software pipelining. Whole Program Analysis • A wider view enables the optimizer to be more aggressive. Optimizer improvements in VDSP 4.0. Intelligent vectorization • More flexible, heuristic-based vectorization. Unroll and Jam • Unroll outer loop and combine resulting copies of inner loop. Minimising call overhead • Can supply a list of registers altered by a function. Other new features with VDSP 3.5: long long support (64-bit integer support); enhanced GNU compatibility features; compiler built-ins added for Blackfin video operations; ADSP-BF561 support; multiple heap support; improved cache support; exception handling; profile-guided optimization. Understanding Underlying Hardware. Isn't C supposed to be portable & machine independent? • Yes, but at a price • Uniform computational mod
85. ted at same time as computation. Introduces loop-carried dependencies. Makes code less easy to read. The compiler can do it for itself. Just don't do it. The original loop (good): float ss(float *a, float *b, int n) { float sum = 0.0f; int i; for (i = 0; i < n; i++) { sum += a[i] + b[i]; } return sum; } A rotated loop (bad): float ss(float *a, float *b, int n) { float ta, tb; float sum = 0.0f; int i = 0; ta = a[i]; tb = b[i]; for (i = 1; i < n; i++) { sum += ta + tb; ta = a[i]; tb = b[i]; } sum += ta + tb; return sum; } Experiment with Loop Structure. Unify inner and outer loops • May make the loop too complex, but the optimiser is better focused. Loop inversion: reverse nested loop order. Unify sequential loops • Reduces memory accesses; can be crucial when dealing with external memory. Section 6: Blackfin ADSP-BF533 Memory. [Figure: core and bus diagram; recoverable labels: Core Timer, Performance Monitor, JTAG/Debug, Core Processor, DMA Mastered bus, Core D0 bus, Core D1 bus, Core I bus, DA0 bus, DA1 bus] Co
86. total. Each sub-bank has 4 ways (1 KB for each way). Each way has 32 lines. Each line is 32 bytes. Cache Hits and Misses: A cache hit occurs when the address for an instruction fetch request from the core matches a valid entry in the cache. A cache hit is determined by comparing the upper 18 bits, and bits 11 and 10, of the instruction fetch address to the address tags of valid lines currently stored in a cache set. Only valid cache lines (i.e., cache lines with their valid bits set) are included in the address-tag compare operation. When a cache hit occurs, the target 64-bit instruction word is sent to the instruction alignment unit, where it is stored in one of two 64-bit instruction buffers. When a cache miss occurs, the instruction memory unit generates a cache line fill access to retrieve the missing cache line from memory external to the core. Instruction Fill from L2 Memory: [Diagram: a 32-byte cache line fill delivered as a burst of four 64-bit transfers.] Cache Line Fills: A cache line fill consists of fetching 32 bytes of data from memory external to the core (i.e., L2 memory). A line read data trans
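The tag comparison described in excerpt 86 can be sketched as a bit-field decode of a 32-bit fetch address. The tag composition (upper 18 bits plus bits 11 and 10) and the 32-line, 32-byte geometry come from the excerpt. Treating bits 9:5 as the line index and bits 4:0 as the byte offset follows from that geometry, while using bits 13:12 as the sub-bank select is an assumption based on the bits left over. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;       /* address bits 31:14 concatenated with bits 11:10 */
    uint32_t sub_bank;  /* address bits 13:12 (assumed sub-bank select)    */
    uint32_t line;      /* address bits 9:5, one of 32 lines per way       */
    uint32_t offset;    /* address bits 4:0, byte within the 32-byte line  */
} icache_addr_t;

/* Split an instruction fetch address into its cache fields. */
icache_addr_t icache_decode(uint32_t addr)
{
    icache_addr_t d;
    uint32_t upper18 = addr >> 14;
    uint32_t bits11_10 = (addr >> 10) & 0x3u;
    d.tag = (upper18 << 2) | bits11_10;
    d.sub_bank = (addr >> 12) & 0x3u;
    d.line = (addr >> 5) & 0x1Fu;
    d.offset = addr & 0x1Fu;
    return d;
}

/* Rebuild the address from its fields (inverse of icache_decode). */
uint32_t icache_encode(icache_addr_t d)
{
    assert(d.tag < (1u << 20));
    assert(d.sub_bank < 4u && d.line < 32u && d.offset < 32u);
    return ((d.tag >> 2) << 14) | (d.sub_bank << 12) |
           ((d.tag & 0x3u) << 10) | (d.line << 5) | d.offset;
}

/* A fetch hits a valid line when the line's stored tag equals the
 * tag derived from the fetch address; invalid lines never match. */
int icache_is_hit(uint32_t stored_tag, int line_valid, uint32_t fetch_addr)
{
    return line_valid && stored_tag == icache_decode(fetch_addr).tag;
}
```

For instance, two fetch addresses that differ only in bits 4:0 decode to the same tag and line, so the second fetch hits the line filled by the first.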
87. unity stride. Byte addressable. Program control: BRCC/JUMP. SAA, Byteops: residual calc. Supervisor/user modes. Spatial interpolation, spatial filter. Cache control: prefetch, flush. Wide range of peripherals. Blackfin Processors Simplify Programmer's Model. Traditional MCU: compiler generates dense control code, BUT much larger and slower DSP code. Traditional DSP: compiler generates good DSP algorithm code, BUT much larger control code. Architecture and compiler work together to deliver dense control code and fast DSP code. Enhanced Dynamic Power Management Increases Battery Life. [Chart residue: power consumption for video processing and audio processing at operating points from 600 MHz down to 200 MHz, comparing frequency-only scaling at 1.2 V with combined voltage and frequency scaling.] Variable Frequency: programmable PLL (1x to 63x) combined with CCLK and SCLK dividers enables low-latency changes in system performance and power consumption profile. Variable Voltage: on-chip voltage regulator generates core voltage from an externally supplied 2.25 V to 3.6 V input. Core voltage programmable from 0.7 V to 1.2 V in 50 mV increments. System Cost Reduction. Blackfin Target
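The benefit of scaling voltage together with frequency, which excerpt 87 shows only as a chart, can be estimated with the usual first-order CMOS approximation that dynamic power is proportional to f·V². That approximation is general background, not a figure from the slides, and it ignores static leakage. The function names below are hypothetical; the 0.7 V to 1.2 V range and 50 mV step are taken from the excerpt.

```c
/* Dynamic power of one operating point relative to another, using the
 * first-order approximation P ~ f * V^2 (an assumption, not slide data). */
double relative_dynamic_power(double f_hz, double v_core,
                              double f_ref_hz, double v_ref)
{
    return (f_hz / f_ref_hz) * (v_core * v_core) / (v_ref * v_ref);
}

/* Number of programmable core-voltage levels between 0.7 V and 1.2 V
 * in 50 mV steps, computed in millivolts to avoid rounding issues. */
int core_voltage_level_count(void)
{
    const int min_mv = 700;
    const int max_mv = 1200;
    const int step_mv = 50;
    return (max_mv - min_mv) / step_mv + 1;
}

/* Nonzero when a requested core voltage lies on one of those steps. */
int core_voltage_is_valid_mv(int mv)
{
    return mv >= 700 && mv <= 1200 && (mv - 700) % 50 == 0;
}
```

Under this model, dropping from 600 MHz at 1.2 V to 200 MHz at the same 1.2 V leaves about one third of the dynamic power, while dropping to 200 MHz at 0.7 V leaves roughly 11 percent, which is why combined scaling matters for battery life.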