Plugging into High-volume Consumer Products

1. Inconsistent results, 25 MHz to 60 MHz; 9 hours place-and-route runtime; 147 MHz; consistently exceeding 63 MHz; 3 hours place-and-route runtime; 172 MHz; design time 3 weeks; 160 MHz; design time 3 days; 178 MHz.
Table 1: Customer designs demonstrate a boost in performance and productivity with PlanAhead software.
Second Quarter 2005

Xilinx Events and Tradeshows
Xilinx participates in numerous trade shows and events throughout the year. This is a perfect opportunity to meet our silicon and software experts, ask questions, see demonstrations of new products, and hear other customer success stories. For more information and the current schedule, visit www.xilinx.com/events.

Worldwide Events Schedule
North America
- April 21: AdvancedTCA Seminar, San Jose, CA
- April 21 to May 19: Low Cost Joint Seminar with Synplicity, various locations
- May 22-25: Wind River 2005 Worldwide User Conference, Orlando, FL
- June 7-9: SUPERCOMM 2005, Chicago, IL
- July 11-15: NSREC 2005, Seattle, WA
Europe, Middle East, Africa (EMEA)
- May 11: Mentor Graphics EDA Tech Forum, Dresden, Germany
- May 11-12: The MathWorks Model Based Design Conference, Warwickshire, UK
- May 25: TI Developer Conference, Milan, Italy
- May 30: TI Developer Conference, Tel Aviv, Israel
- June 2: JTAG Technologies DFT Boundary scan Seminar, Barcelona, Spain
- The MathWorks Model Based Design C
2. cation are in an

Low-Cost FPGAs Change the Equation
Now that we've reviewed current trends in the ASIC market, we can look at how low-cost FPGAs can change the landscape to one more favorable for designers. Designers can now apply the traditional strengths of programmable logic to their system solutions: the low cost of entry, off-the-shelf availability, short design cycles, reprogrammability itself, and a known and working architecture. Plus, they can use FPGAs as test vehicles to examine and prove out different types of semiconductor intellectual property (SIP) with an ease that is impossible in standard-cell or SoC markets. ASIC designers can test out several different types of SIP using FPGAs in the same time it took just to try out one type of SIP using the traditional standard-cell or SoC route. This in itself is a great aid to ASIC designers in arriving at the best possible silicon solution.

However, the road for FPGAs does not end here; this is merely the start of the story in terms of aiding ASIC designers. Previously, when FPGA silicon was more costly, ASIC designers could successfully prototype with FPGAs and enter into limited production with their FPGA solution. But at some point, usually between 30,000 and 100,000 units, depending on the price sensitivity of the end application, the design would change over to a full-blown ASIC solution. At this

Increase i
3. potential for FPGAs in high-volume, low-cost systems. In 1998, Xilinx released the very first FPGA designed for these systems: the Spartan device family. Today, the Spartan series of FPGAs is broadly adopted in high-volume consumer applications, with more than 100 million units shipped and more than $1 billion in cumulative revenue. With the latest release of the Spartan-3E family, Xilinx has reduced the cost of the Spartan device 30X since its introduction while offering more platform features that enable higher levels of integration.

Xilinx recognizes that an important part of high-volume PLDs is the non-volatile CPLD solution. The Xilinx portfolio of CPLD products is one of the most comprehensive in

Contents (partial):
- Encoding High-Resolution Ogg/Theora Video with Reconfigurable FPGAs
- Implementing DSP Algorithms Using Spartan-3 FPGAs
- Designing a Spartan-3 FPGA DDR Memory Interface
- Signal Processing Capability with the Nu Horizons Spartan-3 Development Board
- CoolRunner-II CPLDs Offer New Features
- Reducing Bill of Materials Cost Using Logic Consolidator
- Using CoolRunner-II Silicon Features to Reduce Cost
- ... Spartan-3/3E Features to Area-Optimize Your Design
- Low-Cost EasyPath FPGAs Offer Promise to ASSP Compani
4. d by Lonn Fiance, Director, Strategic Alliances, Synopsys, lonn@synopsys.com
Robert Vallelunga, Product Manager, Synopsys, robertv@synopsys.com

With the latest process technologies, complex high-performance systems, once the exclusive domain of ASIC and custom chips, can now be created with FPGAs. Although this benefits both consumers and companies, it also creates significant new design and verification challenges that must be addressed to meet time-to-market requirements.

As a designer, you use system-level analysis and simulation to evaluate different architectural alternatives. The result of this architectural exploration is the RTL (register transfer language) specification, which forms the definition of your device's functionality. Depending on the product requirements, you can realize this functionality in a number of forms using Xilinx devices: as FPGAs designed into the final product, or as prototypes for verification of complex ASICs.

Xcell Journal, Second Quarter 2005

The RTL description defines the functionality, so maintaining equivalent behavior for all implementations under all circumstances is critical. As design sizes have increased, the ability to prove equivalence by exhaustive simulation has disappeared. ASIC designers recognized this issue several years ago and turned to equivalence checking (EC) to maintain functionality. When you design one of today's high-comple
5. [Figure 4: Integrated Poseidon and Xilinx tool flow]

Control Your Designs with the PlanAhead Hierarchical Design and Analysis Tool
You can improve performance and reduce design closure times through quick analysis feedback and a block-based flow.

by Salil Raje, salil.raje@xilinx.com
Xilinx, Inc., david.knol@xilinx.com
Xilinx, Inc., brian.jackson@xilinx.com
Mark Goosman, Product Marketing Manager, mark.goosman@xilinx.com

If you are pushing the limits of performance and density in an FPGA, then you are also likely facing problems that threaten your design deadlines. Long place-and-route runtimes and repeatability of results are two common sources of anxiety. Also, traditional FPGA tools provide a flat, push-button implementation approach: a flow that provides few clues about how to improve your design goals and even fewer opportunities to make the necessary changes.

If you can relate to these challenges, Xilinx offers a new tool that will put you in control of your design. The PlanAhead hierarchical design and analysis tool allows you to employ ASIC design methodologies for complex FPGAs. PlanAhead software easily integrates into your existing flow between synthesis and place and route, as shown in Figure 1. You can analyze, detect, and correct many pot
6. Figure 1: Most industry experts agree on the explosive growth projections for flat panel displays.

The Spartan-3 Feature Set
Spartan-3 FPGAs offer various features that are very useful for FPD system designers. These include embedded multipliers that enable very efficient DSP implementation; shift register functionality that enables high performance and reduces resource utilization when implementing pipelined or multi-channel functions; large memory resources; and built-in support for popular differential I/O standards used in the display market. Table 1 lists the various Spartan-3 features and their use in DSP implementation.

Table 1, column "Spartan-3 Silicon Features": Embedded Multipliers; Shift Register Logic; Block RAM and Distributed RAM; Built-In Support for RSDS.

A Spartan-3 device can be used in FPD designs in various ways. Let's look at some typical FPGA usage within the FPD system.

Front-End Pre-Processing
The digital RGB signal will typically require some kind of pre-processing before it is fed into the main image processing engine. This pre-processing can be anything from a discrete cosine transform to decryption or interlacing/de-interlacing. This is how an FPGA is generally used in conjunction with the customer's image processing ASIC or ASSP. One common use is to assist the ASSP in image scaling and de-interlacing.

Core Processing
Although core image processing is gener
7. PCB area An evaluation board is available Input Range 3 0V to 20V 3 0V to 18V Output Load 750 mA For free samples evaluation boards or more information visit power national com or call 1 800 272 9959 Output Range 0 8V to 18V 1 25V to 16V Internal References 0 8V 2 1 25V 2 Operating Frequency 550 kHz 1 6 MHz 3 MHz National Semiconductor The Sight amp Sound of Information Second Quarter 2005 Xcell Journal 109 KE POWER PLAY Tl Power Solutions 5 V_INPUT T dl E Vecaux 3A IN2 sWi ri BUCK des z TR E m Shy i EN3 e DGND DGND i DGND d 15 T 158 dr nF T Vecint 12V 2A 100 uF Veco 33V 2A 61 9k I 100 uF 36 5k Vecaux 2 5 V O 300 mA 10 uF Highly integrated triple supply powers Spartan 3 core 1 0 and Vecaux rails 110 Features e Two 95 efficient 3A buck controllers and one 300 mA LDO e Adjustable output voltages From 1 2V for bucks From 1 0V for LDO e Input voltage range of 2 2V to 6 5V e Independent softstart for all three power supplies e UDO stable with small ceramic output capacitor e Independent enable for each supply for flexible sequencing e 4 5 mm x 3 5 mm x 0 9 mm 20 pin QFN package e 1 ku price 1 90 Applications e DSL modems e Set top boxes e Plasma TV display panels e DVD players Xcell Journal The TPS75003 power management IC for Xilinx Spartan II Spartan HIE and Spartan 3 FPGAs integrates multiple func
8. P just stops its work. We achieve speed-ups of 12 on the 16 cluster entities, which means that Hydra now examines 36 million nodes per second.

At a certain level of branching, the remaining problems are small enough that we can solve them with the help of a "Configware" coprocessor, benefiting from the fine-grain parallelism inside the application. We have a complete chess program on-chip, consisting of modules for the search, the evaluation, generating moves, and executing or taking back moves. At present we use 67 block RAMs, 9,879 slices, 5,308 TBUFs,

[Figure 3: The gen modules form the move generator.
GenVictim (To-Square): 1. Occupied squares send a signal in GenVictim. 2. Free squares forward these signals. 3. All squares receiving a signal are potential to-squares. 4. Comparator tree selects most attractive to-square.
GenAggressor (From-Square): 1. Winner square generates signal of a super piece. 2. Free squares forward the signals. 3. Squares occupied by own pieces are potential from-squares. 4. Comparator tree selects most attractive from-square.]

taking moves move generator consists of two 8 x 8 chess boards, as shown in Figure 3. The GenAggressor and GenVictim modules instantiate 64 square instances each. Both determine to which neighbor square incoming signals must be forwarded. The square instances will send piece signals if there is a piece on that square.

534 flip-flops and 1
9. Prove your design with high-speed FPGA hardware emulation plugged directly into your PCIe system. Here are 4.5 million gates to emulate your ASIC and kill the RTL bugs before you cut masks. This board will let you test your software and increase your chances that the first spin will be the last. The DN6000K10PCIe is packed with the features you need:
- 1-, 4-, and 8-lane versions
- Six Virtex-II Pro FPGAs (2vp100s, the big ones)
- 10 DDR (64M x 16) and 4 SSRAMs (2M x 36) external to the FPGAs
- Expansion capability to customize your application
- Synplicity Certify models for quick and easy partitioning
Like all our products, this new PCI Express bus board will help you get your ASIC to market on time and in budget. Call The Dini Group today. PCIe is already here.
The Dini Group, 1010 Pearl Street, Suite 6, La Jolla, CA 92037; 858.454.3419; Email: sales@dinigroup.com

The Hydra Project
Hydra, currently the strongest chess program in the world, is a cutting-edge application that combines cluster computing with the fine-grain parallel FPGA world.

by Chrilly Donninger, Programmer, Nimzo Werkstatt DEG, donninger@wavenet.at

The International Computer Games Association (ICGA) regula
10. As FPGA vendors continue to adopt smaller process geometries as they become available, the ASIC industry will see a further extension of FPGAs into applications where they can deliver a solid solution at the necessary price points. Designs and end-system solutions that might previously have been cancelled because a prohibitively long design cycle made them miss that ever-important market window now have a much better chance of succeeding.

The introduction of low-cost, full-featured FPGAs has changed the systems landscape for the better. Today, ASIC designers or system architects can use a programmable logic solution with the knowledge that an FPGA can be used farther into the production cycle than was previously possible.

- Increased visibility with FPGA dynamic probe
- Intuitive Windows XP Pro user interface
- Accurate and reliable probing with soft-touch connectorless probes
- 16900 Series logic analysis system prices starting at $21,000
Agilent Direct: Get a quick quote and/or FREE CD-ROM with video demos showing how you can reduce your development time. U.S. 1-800-829-4444, Ad# 7909; Canada 1-877-894-4414, Ad# 7910. www.agilent.com/find/new16900, www.agilent.com/find/new16903quickquote
© Agilent Technologies, Inc. 2004. Windows is a U.S. registered trademark of Microsoft Corporation.
Now you can see inside
11. $B[1..m-1]$, plus the score of substituting, $\mathrm{sub}(A[n], B[m])$
- The similarity of $A[1..n-1]$ and $B[1..m]$, plus the score of deleting, $\mathrm{del}(A[n])$
- The similarity of $A[1..n]$ and $B[1..m-1]$, plus the score of inserting, $\mathrm{ins}(B[m])$

To solve the problem with this recurrence, the algorithm builds an $(n+1) \times (m+1)$ matrix $M$, where each $M[i,j]$ represents the similarity between sequences $A[1..i]$ and $B[1..j]$. The first row and the first column represent alignments of one sequence with spaces. $M[0,0]$ represents the alignment of two empty sequences and is set to zero. All other entries are computed with the following formula:

$$M[i,j] = \min \{\, M[i-1,j-1] + \mathrm{sub}(A[i], B[j]),\; M[i-1,j] + \mathrm{del}(A[i]),\; M[i,j-1] + \mathrm{ins}(B[j]) \,\}$$

Because there are $(m+1) \times (n+1)$ positions to compute and each takes a constant amount of work, this algorithm has a time complexity of $O(n^2)$ for sequences of comparable length. Clearly, it also has quadratic space complexity, because it needs to keep the entire matrix in memory.

FPGA Architectures and Networking Tools
The FPGA network hardware comprises Nallatech FPGA computing cards containing between one and seven FPGAs per card. Several such cards can be plugged into the PCI slots of a desktop server and networked using Nallatech's DIMEtalk networking software, greatly increasing the available computing capability. Let's briefly describe the different FPGA network components.

High-Capacity Motherb
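The recurrence in this excerpt can be sketched in software as follows. This is a minimal illustration, not the article's FPGA implementation: the function name `edit_matrix` and the unit cost functions (0 for a match, 1 for a substitution, deletion, or insertion) are assumptions chosen for the example. It also follows the minimizing form of the formula as printed here, not the maximizing local-alignment form usually associated with Smith-Waterman.

```python
from typing import Callable, List


def edit_matrix(
    a: str,
    b: str,
    sub: Callable[[str, str], int] = lambda x, y: 0 if x == y else 1,  # assumed unit cost
    dele: Callable[[str], int] = lambda x: 1,                          # assumed unit cost
    ins: Callable[[str], int] = lambda y: 1,                           # assumed unit cost
) -> List[List[int]]:
    """Fill the (n+1) x (m+1) matrix M described in the text."""
    n, m = len(a), len(b)
    M = [[0] * (m + 1) for _ in range(n + 1)]  # M[0][0] = 0: two empty sequences

    # First column and first row: one sequence aligned against spaces only.
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] + dele(a[i - 1])
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] + ins(b[j - 1])

    # Every other cell is the cheapest of substitute, delete, insert.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = min(
                M[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                M[i - 1][j] + dele(a[i - 1]),
                M[i][j - 1] + ins(b[j - 1]),
            )
    return M
```

The bottom-right cell, `edit_matrix(a, b)[-1][-1]`, holds the score for the two full sequences. The two nested loops make the quadratic time and space cost visible: every one of the (n+1) x (m+1) cells is stored and visited once.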
12. Back-seat entertainment applications require two video inputs and video outputs, such as two logiCVC and two UltiWIN

[Footnote 2: Xylon is an Ultimodule Inc. technology partner. Xylon has an open license to tailor UltiMEM and UltiWIN IP source code to customer needs.]

and a UMA memory architecture fits into a smaller Spartan-3 XC3S200 device. We reduced costs further by using the MicroBlaze core, which eliminates the need for a separate external CPU. The IP core integration and system configuration for the MicroBlaze core is provided by the Xilinx Embedded Development Kit (EDK).

End Applications
The Multimedia FPGA Platform is primarily aimed at the automotive market, including applications such as navigation, infotainment, backseat entertainment, and driving assistance.

Example infotainment applications include such automotive displays as VCR videos, game consoles, rear parking cameras, and navigation systems. Figure 2 shows a backseat entertainment prototype system displaying a contemporary game console on one display and a DVD movie on another display.

Driving assistance is another example of an automotive application using multimedia platform IPs. The system comprises synchronized stereo video camera inputs, a DSP algorithm for image processing, and CAN IP for communication with the dashboard or other human interfaces.

Xylon is in the process of designing an automotive reference board whic
13. By Farzad Zarrintar, VP of Worldwide Sales & Marketing, Poseidon Systems, Farzad.Zarrintar@poseidon-systems.com
Bill Salefski, VP of Engineering, Poseidon Systems, bills@poseidon-systems.com
Stephen Simon, Director of Sales & Business Development, Poseidon Systems, ssimon@poseidon-systems.com

The introduction of embedded processors for FPGAs, like the immersed PowerPC and the Xilinx MicroBlaze soft processor core, has greatly expanded the benefits of platform FPGAs. These advantages include lower FPGA power consumption, lower device cost, and more scalable performance, particularly when balancing your application's execution between what will run in the fabric of the FPGA versus the embedded processor(s).

Designing efficient processor-based system architectures and optimizing overall system performance for a specific application is challenging. For example, effectively partitioning the hardware and software early in the design is difficult, yet critical to the development of the architecture. Also, architectures must be verified early in the design cycle to maximize the benefits of processor-based designs. You cannot wait until RTL development to discover that your architecture does not support the system requirements. This redesign loop can easily delay the overall design cycle.

Newer and better alternatives exist to partition, analyze design bottlenecks in, implement, and ver
14. Clk1 = 20 MHz and Clk2 = 23 MHz, change to Clk1 = 20 MHz and Clk2 = 25 MHz.

- Are the I/O constraints correct? Excessive input or output delay can wreak havoc with synthesis. If the critical path includes an I/O and one level of logic, there is not much that synthesis tools can do about it. Turn on I/O register packing with the syn_useioff switch.
- Ensure that pipelining is enabled.
- Try a run with retiming on. We have found that this gives a 5% performance improvement for Spartan-3 devices. It is also a very good idea to surgically apply the retiming attribute to registers on the critical path with the global switch off. This can increase performance while keeping the area increase to a minimum.
- Add all timing exceptions (false paths and multi-cycle paths). Just adding these constraints can make a huge difference in performance.
- Does the critical path start or end inside a black box or CoreGen core (.edn, .ngc, .ngo)? Is it possible to re-code it in generic code? If not, add the .edn file to the Synplify project. The Synplify tool will optimize the logic around these cores. If you have .ngc files, use ngc2edf.exe to convert the core to .edn and add it to the Synplify project.
- Provide I/O types. Remember to specify the speed and type for I/Os.

HIGH VOLUME SOLUTIONS
Pull quote: "We hope these guidelines and hints will help you quickly achieve the performance required for your FPGAs while minimizing the logic resources."
15. Synplify software needs this timing information to optimize the driving/driven logic.

- Are there gated clocks in the design? Turn on the gated clock converter in the Synplify Pro tool. This engine will push the clock from the register input pin onto the clock line while maintaining the same functionality. This will dramatically increase the performance of this path.
- Turn off resource sharing.

Are you now meeting timing? If not, you may want to try Amplify FPGA physical synthesis, which generates detailed placement and performs physical optimizations for additional performance and timing predictability.

How to Reduce Area
Here are a few ways to reduce area for Spartan-3 FPGAs. These techniques will have an impact on the timing.

- Use as many of the dedicated resources as possible. Spartan-3 devices have plenty of resources that you can tap into to reduce LUT usage. Here are some suggestions:
- Try to keep black boxes (CoreGen) to a minimum. Wherever possible, replace these components with generic code. Synplify software is capable of optimizing around black boxes, but not the boxes themselves. Significant effort has gone into dedicated resource management for each Xilinx architecture. With generic code, the Synplify product can remove redundant logic, merge identical logic, and pack into the dedicated resources.
- Synplify software may pack logic into registers for performance reasons. If this is
16. algorithm in VHDL
- Creation of the FPGA network

The user VHDL design maps the Smith-Waterman algorithm to a systolic array structure. As shown in Figure 1, the array structure comprises computational units known as processing elements (PEs) that have local interconnections between them. Each element of the query sequence is hard-coded into a PE to achieve an area-efficient realization of approximately four slices. Note that the maximum length of the query sequence corresponds to the maximum number of PEs that can fit inside a given Virtex-II Pro device.

A C-based routine instantiates the VHDL code of a PE multiple times to create the array of interconnected PEs. Additional VHDL code is then inserted to interface with the FPGA network.

DIMEtalk networks are defined early in the application development phase using the supplied network design software tool, DTdesign. This allows the structure of the network to be outlined across the FPGAs in a system. Having defined the network, Smith-Waterman blocks of design VHDL (or VHDL-wrapped source) are imported using the custom component insertion wizard and interconnected to the network in the DTdesign workspace. FPGA I/O ports for the whole design can be mapped to the device pins using the high-level drag-and-drop device editor

[Figure 1: Systolic array of Smith-Waterman processing elements]
17. ally done using standard ASSPs or custom ASICs, FPGAs are sometimes used for custom processing. Xilinx, along with its AllianceCORE and XPERTS partners, provides a range of intellectual property cores that can ease the design of these functions. Visit the Xilinx IP Center (www.xilinx.com/ipcenter) for more details.

Table 1, column "Value in FPD Applications":
- Efficient implementation of MAC, FIR, and other DSP functions
- Efficient implementation of multi-channel functions
- Video line buffers, cache tag memory, scratch-pad memory for DSP coefficients, packet buffers, FIFOs
- RSDS support without termination resistors or other special design considerations
Table 1: Spartan-3 features allow for an area-efficient design in FPD systems.

Examples of such functions include gamma correction, image scaling, edge detection, sharpness, contrast, and frame buffer. Given the range of features that Spartan-3 devices provide, you can implement many of these functions with minimal silicon area. Some of the intellectual property cores utilized by our customers include:
- Color Space Converter
- JPEG Codec
- Discrete Cosine Transform
- MPEG Video Encoder/Decoder
Figure 2 shows the various functions in a typical FPD that you can implement in a Spartan FPGA.

Interfacing
Many times the FPGA is used for interfacing to the screen or to the external memory. Spartan-3 support for low-swing differential I/O standards such as
18.
        when "1000" => data_out <= store8_data;
        when "1001" => data_out <= store9_data;
        when "1010" => data_out <= store10_data;
        when "1011" => data_out <= store11_data;
        when "1100" => data_out <= store12_data;
        when "1101" => data_out <= store13_data;
        when "1110" => data_out <= store14_data;
        when "1111" => data_out <= store15_data;
        when others => data_out <= A;
      end case;
    end if;
  end process;

Low-Cost EasyPath FPGAs Offer Promise to ASSP Companies

by Gokul Krishnan, Senior Marketing Manager, Xilinx, Inc., gokul.krishnan@xilinx.com

As the semiconductor industry continues its march towards smaller process geometries, it has to deal with exponentially increasing mask set costs. Customers who want to spin a 90 nm ASIC must pay upwards of a million dollars in NRE. In addition, they must incur significant engineering and design expenses, increasing the overall cost of product development to prohibitive levels.

ASSP companies have come under increasing pressure to weigh more carefully the risk/reward tradeoffs for introducing new products. ASSP vendors are being forced to develop only those blockbuster products that will allow a reasonable return on their large up-front investments.

Recent innovations such as Xilinx Spartan-3 FPGAs and EasyPath technology wh
19. 135 Msps Plug in evaluation modules from Intersil allow you to utilize digital to analog converters with 8 10 12 14 bit resolution and through put rates of 130 to 260 MHz for use in applications such as medical imaging 3G 4G base stations and spectral analysis The Spartan 3 family from Xilinx has all of the features necessary for todays most demanding high performance applications Nu Horizons Spartan 3 development platforms provide the exact hard ware to develop and test to these requirements and meet your time to market expectations 114 Xcell Journal Features e Based on the Xilinx Spartan 3 FPGA e Development boards available X 3S1500 4 FG676 X 3S2000 4 FG676 e ISSI 64 Mb SDRAM 1542516400 e Two IMx 16 x 4 SDRAM 16 MB e Footprint for 1M x 32 SSRAM provided e ST 32 Mb flash M29W320DB e ST highspeed CAN 2 0B PHY interface L961 6 e 2 x 24 character LCD interface e Sharp LCD panel connector F e SMSC 10 100 Ethernet MAC and PHY LAN91C111 e ICS 10 100 Ethernet PHY ICS1893BF e Linear Technology two channel A D and D A converters e 1101654 D A 14 bit e TC1865L A D 16 bit e ST audio CODEC STW5093 e PS2 port interface e Intersil 2 RS232 serial ports 1CL3237 e 16 bit LVDS I F with control signals e Parallel Cable Ill and IV JTAG configuration support e CS PLL system clock generator 165511 1 S8422 Additional plug in evaluation modules available e linear Technology high sp
20. DSP division at Xilinx collaborated in an effort to combine AccelChip's tools with System Generator. The result allows you to mix language-based algorithm design in MATLAB and graphical block-oriented design in Simulink in a novel, unified electronic system-level (ESL) design environment (Figure 1).

AccelChip's DSP Synthesis tool augments System Generator by providing a seamless integration path for algorithm developers, enabling the rapid creation of IP blocks directly from M-files that enhance the Xilinx block set in System Generator. In addition, AccelChip has optional AccelWare toolkits that complement System Generator with additional IP cores optimized for Xilinx cores. AccelWare toolkits include mathematical building blocks, signal processing, communications, and advanced math to implement linear algebra functions.

Kalman Filter Design Example
To illustrate this approach, let's take an advanced algorithm written in MATLAB, use AccelChip to synthesize the design, and then integrate it into a System Generator model. Our example is a Kalman filter, a recursive adaptive filter well suited to combining multiple noisy signals into a clearer signal (for details on the topic, see Arthur Gelb's book, Applied Optimal Estimation).

Kalman filters embed a mathematical model of an object, such as a commercial aircraft being tracked by ground-based radar, and use
21. DTdesign then automatically generates VHDL code, user constraint files, and a VHDL test bench and configuration script files.

DIMEtalk components include:
- Routers that direct data around the network and interconnect all other component types
- Bridges that move data between different physical FPGA devices
- Nodes serving as the user interface to the network, which can be connected to user FPGA designs through node interfaces such as block memories (BRAMs), queues (FIFOs), and memory maps
- Edges serving as the user interface to the host computer

Note that the sizes of memory elements such as BRAMs and FIFOs depend on the application requirements and are easily interchangeable in the DIMEtalk GUI environment. Figure 2 shows a block diagram of components within the FPGA.

The entire network, including the Smith-Waterman application, is then synthesized using the Xilinx ISE 6.2i flow, and the place-and-route timing netlist is generated. Functional and timing simulations verify the operation of the design. Thereafter, the bitstreams for the entire multiple-FPGA network are generated within the DIMEtalk environment and placed in corresponding work directories. Figure 3 shows a block diagram of a multi-FPGA network using five XC2V6000 devices, as designed in DIMEtalk.

The bit files are then downloaded to the designated FPGAs using FUSE C API calls. In runtime, dedicated Tcl/Tk soft
22. [Figure 2: Logic Consolidator helped Wipro Technologies understand the cost benefits of using a CPLD in place of multiple discrete logic devices.]

Using CoolRunner-II Silicon Features to Reduce Cost
CoolRunner-II CPLDs offer features that lower total PCB costs through component count reduction, smaller board space, and lower manufacturing costs.

by Steve Prokosch, Product Solutions Marketing Manager, Xilinx, Inc., sprokosch@xilinx.com

Xilinx introduced the CoolRunner-II family of CPLDs in 2001, continuing t
23. EE x jealy 2 YO v D E a D O10 a a a A aaa a OA aa a Ha EK e WO Oo 0 QU oO W o COM Co ees co COM CON CON CO Un oi 3 1 lt Ul Ul Ul Ul Ul Ul Ul Ul Ul Ul Ul Ul Ul Ul 5 oe A Ka y N N Pl LA N Pl LA N Pl LA D hd DI e DI A Oo A O N Q 0 Bb N Q 0 Bb N Q La o o E co o E 0 co E X gt S ES gt D x D a Kl We lt lt CH hes lt o c S 00S6 X 1X00S6 X AX00S6 X z A Duttoon paads sainjea4 NEEN oi O I 49SN pue suondo obeyed S31136S 00656 XIAJPIA UOI1339 9S5 PINPOAd Second Quarter 2005 Intersil Switching Regulators Intersil High Performance Analog i Selecting the Right gt Switching Regulator for Spartan 3 FPGA Based System Power Solutions for Gg do you need FPGA Design Has o A N eve r B e e n S O d Sy kees size use Intersil ISL6528 For highest efficiency use Intersil ISL6539 Intersil offers a wide range of Power Supply solutions for the latest e One generations of Xilinx FPGA based systems ecc lito Reduce your Spartan 3 board size and increase efficiency with Intersil s multiple output to 0 54 use Intersil ISL6410 and DC DC Switching Regulators or Switching Regulators with Integrated MOSFETs ISL6410A Switching Regulators with Integrated FETs For ease of use and Integration from 5 VIN 1A to 8A use Intersil EL75XX Switching Regulators with Integrated FETs For most flexibility up 20A use Rei Intersil ISL6526 OCSET RS UGATE Q1 g VecinT
24. FS_START. Otherwise, we enter the quiescence search, which starts with the evaluation inspection. In the quiescence search, we only consider capture and check-evasion moves.

The search stack (not shown) is realized by six blocks of dual-port RAM, organized as 16-bit-wide RAMs. Thus, we can write two 16-bit words into the RAM, or one 32-bit word, at one point of time. A depth variable controlled by the search FSM controls the data flow. Various tables capture different local variables of the recursive search.

[Figure 4: Simplified flow chart for the 56-state FSM that operates the Alphabeta algorithm]

Conclusion
We are quite optimistic that Hydra already plays better chess than anybody else. Nevertheless, we must now show this in a series of matches. At the same time, we want to maintain the distance between Hydra and other computer players and even increase it. Therefore, in future versions of Hydra we plan to switch to newer generations of Xilinx FPGAs, increase the number of processors further, and fine-tune the evaluation function.

For more information, visit www.hydrachess.com and www.chessbase.com.

History of Modern Computer Chess
- 1940-1970
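The hand-off from the main search to a quiescence search can be sketched in software as below. This is only an illustration of the general technique, written in negamax, fail-hard form; it is not Hydra's hardware FSM. The `Node` type, its `score` field (a static evaluation from the side to move's point of view), and the toy game tree are all hypothetical, and check evasions are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical toy position: static score plus (is_capture, child) moves."""
    score: int
    moves: List[Tuple[bool, "Node"]] = field(default_factory=list)


def quiescence(node: Node, alpha: int, beta: int) -> int:
    # Start with the evaluation ("stand pat") before trying any move.
    stand_pat = node.score
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    # Only capture moves are searched here.
    for is_capture, child in node.moves:
        if not is_capture:
            continue
        value = -quiescence(child, -beta, -alpha)
        if value >= beta:
            return beta  # fail-hard cutoff
        alpha = max(alpha, value)
    return alpha


def alphabeta(node: Node, depth: int, alpha: int, beta: int) -> int:
    if depth <= 0:
        return quiescence(node, alpha, beta)  # enter the quiescence search
    if not node.moves:
        return node.score
    for _, child in node.moves:
        value = -alphabeta(child, depth - 1, -beta, -alpha)
        if value >= beta:
            return beta
        alpha = max(alpha, value)
    return alpha
```

In a toy tree where a quiet move wins one point and a capture appears to win five but allows a strong recapture, a depth-1 search with this quiescence step scores the capture no better than the quiet move, which is the purpose of searching captures until the position is quiet.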
25. [Schematic residue: PHASE; OUT 1.5V, 5A to 20A; linear regulator, 3.3V, 120 mA]
For a detailed application note on using Intersil Power Management Solutions in Xilinx FPGAs, go to www.intersil.com/xfpgas. The document includes Intersil's recommended power supply solutions, IC schematics, functional block diagrams, BOMs, selection tables, Xilinx FPGA power tables, DDR memory power solutions, sequencers and supervisory circuits, and layout considerations.
Download iSim PE, Intersil's personal edition offline design simulation tool, at www.intersil.com/iSim. Find the power management device that matches your design requirements at www.intersil.com/pmps.
Intersil's Xilinx Spartan-3 Power Solutions
Peak Iccint Current (A) Requirement, two values per device (the first table row and the column sub-headings are illegible):
XC3S400: 2.3, 1.1
XC3S1000: 4.2, 2.0
XC3S1500: 6.6, 3.2
XC3S2000: 9.6, 4.6
XC3S4000: 15.3, 7.2
XC3S5000: 17.6, 8.4
Recommended Intersil Power Solutions (columns Vin 5V and Vin 3.3V; the row alignment is not recoverable): ISL6528, EL75XX Family, EL75XX Family or ISL6521 or ISL6526; ISL6521 or ISL6539, ISL6526.
© 2005 Intersil Americas Inc. All rights reserved. The following are trademarks or service marks owned by Intersil Corporation or one of its subsidiaries, and may be registered in the USA and/or other countries: Intersil (and design) and i (and design). Intersil Switching Regulat
26. Staff Signal Integrity Design Engineer, Xilinx, Inc., mike.degerstrom@xilinx.com
Every multigigabit backplane, trace, and cable distorts the signals passing through it. This degradation may be slight or devastating, depending on the conductor geometry, materials, length, and type of connectors used. Because they spend their lives working with sine waves, communications engineers like to characterize this distortion in the frequency domain. Figure 1 shows the channel gain, also called the frequency response, of a perfectly terminated, typical 50 ohm stripline (or 100 ohm differential stripline). This stripline acts like a low-pass filter, attenuating high-frequency sine waves more than lower frequency waves. Figure 2 illustrates the degradation inherent to a digital signal passing through 20 inches (0.5 meters) of FR-4 stripline. The dielectric and skin-effect losses in the trace reduce the amplitude of the incident pulse and disperse its rising and falling edges. We like to call the received pulse, much smaller than normal, a runt pulse. In a binary communication system, any runt pulse that fails to cross the receiver threshold by a sufficient margin causes a bit error. For the purposes of this discussion, three things degrade the amplitude of the runt pulse in a high-speed serial link: losses in the traces or cables, reflections due to connectors and other signal transitions, and the limited
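The runt-pulse effect described above can be illustrated with a deliberately crude model. The sketch below stands in for the channel with a first-order low-pass filter; this is an assumption made only for illustration (a real stripline has frequency-dependent dielectric and skin-effect loss, not a single pole), and the oversampling ratio, smoothing factor, and threshold are hypothetical values, not figures from the article.

```python
def low_pass(samples, alpha):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

SAMPLES_PER_BIT = 8   # hypothetical oversampling ratio
ALPHA = 0.15          # hypothetical smoothing factor standing in for channel loss
THRESHOLD = 0.5       # hypothetical receiver threshold, halfway between 0 and 1

def received_peak(bits):
    """Peak amplitude at the receiver for an ideal 0/1 transmit waveform."""
    wave = [float(b) for b in bits for _ in range(SAMPLES_PER_BIT)]
    return max(low_pass(wave, ALPHA))

# An isolated 1 surrounded by 0s has no time to charge the channel fully,
# so it arrives as a runt pulse; a long run of 1s reaches nearly full swing.
isolated_one = received_peak([0, 0, 0, 1, 0, 0, 0])
long_run = received_peak([0, 1, 1, 1, 1, 1, 0])

print(f"isolated 1: peak {isolated_one:.3f}, margin {isolated_one - THRESHOLD:.3f}")
print(f"run of 1s:  peak {long_run:.3f}, margin {long_run - THRESHOLD:.3f}")
```

In this toy model the isolated bit still crosses the threshold, but with less than half the margin of the long run; a lossier channel (smaller `ALPHA`) or a shorter bit time shrinks that margin further.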
27. Xilinx CPLDs, you get choices on I/O and you can use them for other functions. Thus, they are a better choice when you need level-translation versatility and some logic, extra I/Os, or other system integration functions. See Table 1 for features and package types.
Additional Low-Cost Packages
With the migration of CPLDs into non-traditional markets, package form factor and cost have become growing concerns. The Xilinx CPLD group has listened to customer needs and found a better small-form-factor package. This package is a low-cost alternative to more expensive chip-scale BGA packages with similar I/O counts. The MLF (micro lead frame) package, shown in Figure 1, is small and packs a lot of I/O. Xilinx has introduced two new MLF packages for the CoolRunner-II family of CPLDs: a 32-pin QFG (quad flat no-lead, Pb-free) package with 21 I/O for the 32-macrocell device, and a 48-pin QFG package with 37 I/O for the 64-macrocell device. These additions bridge the gap between more expensive chip-scale BGA and low-cost thin quad flat pack packages. MLF packages, also known as QFN (quad flat no-lead), offer small sizes and high I/O counts but, unlike their BGA chip-scale cousins, are easy to assemble and easy to probe. For assembly, they are like regular 0.5 mm pin-spacing thin quad flat pack (TQFP) packages. This makes pin alignment and solder reflow very easy compared to BGA packages. For debug MLF pa
28. Xilinx provides many tools to implement customized DDR memory interfaces.
by Rufino Olay, Marketing Manager, Spartan Solutions, Xilinx, Inc., rufino.olay@xilinx.com
Karthikeyan Palanisamy, Staff Engineer, Memory Applications Group, Xilinx, Inc., karthi.palanisamy@xilinx.com
Memory speed is a crucial component of system performance. Currently, the most common form of memory used is synchronous dynamic random access memory (SDRAM). The late 1990s saw major jumps in SDRAM memory speeds and technology because systems required faster performance and larger data storage capabilities. By 2002, double data rate (DDR) SDRAM became the standard to meet this ever-growing demand, with DDR266 (initially), DDR333, and recently DDR400 speeds. DDR SDRAM is an evolutionary extension of single data rate SDRAM and provides the benefits of higher speed, reduced power, and higher density components. Data is clocked into or out of the device on both the rising and falling edges of the clock. Control signals, however, still change only on the rising clock edge. DDR memory is used in a wide range of systems and platforms and is the computing memory of choice. You can use Xilinx Spartan-3 devices to implement a custom DDR memory controller on your board.
Interfacing Spartan-3 Devices with DDR SDRAMs
Spartan-3 platform FPGAs offer an ideal connectivity solut
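Because DDR SDRAM transfers data on both clock edges, its peak transfer rate is twice the clock frequency. The sketch below works through that arithmetic for DDR400. The 200 MHz clock follows from the DDR400 name (400 million transfers per second); the 16-bit bus width is an assumption chosen only for illustration, and the function name is hypothetical.

```python
def peak_bandwidth_mbytes(clock_mhz, bus_width_bits, double_data_rate):
    """Peak bandwidth in MB/s: transfers per clock x clock x bytes per transfer."""
    transfers_per_clock = 2 if double_data_rate else 1
    return clock_mhz * transfers_per_clock * bus_width_bits / 8

CLOCK_MHZ = 200       # DDR400: 400 MT/s on a 200 MHz clock
BUS_WIDTH_BITS = 16   # assumed data bus width, for illustration only

sdr = peak_bandwidth_mbytes(CLOCK_MHZ, BUS_WIDTH_BITS, double_data_rate=False)
ddr = peak_bandwidth_mbytes(CLOCK_MHZ, BUS_WIDTH_BITS, double_data_rate=True)

print(f"SDR at {CLOCK_MHZ} MHz, {BUS_WIDTH_BITS}-bit bus: {sdr:.0f} MB/s")
print(f"DDR at {CLOCK_MHZ} MHz, {BUS_WIDTH_BITS}-bit bus: {ddr:.0f} MB/s")
```

These are peak figures only; sustained throughput is lower once refresh, row activation, and the single-edge control signals are accounted for.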
29. across multiple time domains. The ATC2 software debug core has also been enhanced for automatic setup, allowing the logic analyzer to automatically find which ATC2 FPGA pins are connected to which logic analyzer pod signals, making setup faster and easier. ATC2 can automatically find the ideal sampling point of each ATC2 FPGA pin in both phase and voltage offset. ATC2 also offers an increased number of multiplexer bank inputs (from 32 to 64), and you can now start up your ATC2 debug with a known bank selected.
Conclusion
Nothing in FPGA or ASIC verification approaches what ChipScope Pro tools offer: real-time verification at or near system operating speeds, with a minimum impact on design space and I/O pins, and the ability to capture any signal inside the FPGA while it is running on the board, interacting with the system. Add to that the advantage of FPGA reprogrammability, and you can identify problems and change your design in a matter of minutes or hours, not the days, weeks, or months required with other offerings. To order your ChipScope Pro 7.1i copy today, contact your Xilinx sales representative.
Integrating MATLAB Algorithms
by Eric Cigan, Product Marketing Manager, AccelChip, Inc., eric.cigan@accelchip.com
Narinder Lall, Senior DSP Marketing Manager, Xilinx, Inc., narinder.lall@xilinx.com
There are two kinds of people in electronic design: those who think in terms of words and those who think i
30. amukheri@uncc.edu
Arun Ravindran, Assistant Professor, University of North Carolina at Charlotte, aravindr@uncc.edu
In the past decade, there has been an explosive growth of biological data, including genome projects, proteomics, protein structure determination, cellular regulatory mechanisms, and the rapid expansion in digitization of patient biological data. This has led to the emergence of a new area of research, bioinformatics, where powerful computational techniques are used to store, analyze, simulate, and predict biological information. Although raw computational power, as predicted by Moore's Law, has led to the number of transistors that can be integrated doubling every 18 months, the genomic data at GenBank (the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences) is doubling every six months. Proteomic and cellular imaging data is expected to grow even faster. Post-genomic era bioinformatics will require high-performance computing power of the order of several hundreds of teraflops or more. In recent years, FPGAs have emerged as high-performance computing accelerators capable of implementing fine-grained, massively parallelized versions of computationally intensive algorithms. The reprogrammability of FPGAs enables algorithm-specific computing architectures to be implemented using the same
31. as well. The goal of the Hydra Project is literally one of world dominance in the computer chess world: a final, widely accepted victory over human players. Indeed, we are convinced that in 2005, a computer will be the strongest chess entity, a world first. Four programs stand out as serious contenders for the crown:
• Shredder, by Stefan Meyer-Kahlen, the dominating program over the last decade
• Fritz, by Frans Morsch, the most well-known program
• Junior, by Amir Ban and Shay Bushinsky, the current Computer Chess World Champion
• Our program, Hydra, in our opinion the strongest program at the moment
These four programs scored more than 95% of the points against their opponents in the 2003 World Championship. Computational speed as well as sophisticated chess knowledge are the two most important features of chess programs. FPGAs play an important role in Hydra by harnessing the demands on speed and program sophistication. Additionally, FPGAs provide these benefits:
• Implementing more knowledge requires additional space, but nearly no additional time.
• FPGA code can be debugged and changed like software, without long ASIC development cycles. This is an important feature of FPGAs, because the evolution of a chess program never ends, and the dynamic progress in computer chess enforces short development cycles. Therefore, flexibility is at least as important as speed
32. bandwidth and performance. The controlling body for the PCI specification, the PCI-SIG, has ratified PCI Express as the next-generation PCI. PCI Express-based products are now becoming available; shipments are expected to achieve high volume as early as 2006. Figure 1 shows the adoption forecast for PCI Express. PCI Express uses serial I/O technology to create point-to-point connections and is backward compatible with PCI, preserving many original PCI advantages. It scales from a single-lane (1x) to a 32-lane (32x) architecture, offering a bandwidth of 2.5 Gbps per lane. PCI 32/33 has a bandwidth of 1 Gbps, while PCI 64/66 has a bandwidth of 4 Gbps. The 1x PCI Express implementation matches up very well with PCI 32/33, the most commonly used PCI interface across all markets. A two-lane implementation (5 Gbps) is an incremental improvement over PCI 64/66. At the high end, a 32-lane PCI Express implementation supports a total of 80 Gbps, providing more than enough bandwidth to support the vast majority of next-generation applications.
Implementation Details
PCI Express is a three-layer specification: physical (PHY), logical, and transport, all defining separate functionalities. Also included in the specification are advanced features for hardware error recovery and system power management. For more information about PCI Express, visit www.pcisig.com. Since 2000, Xilinx has offered a line of PCI 32 an
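The bandwidth figures quoted above follow from simple arithmetic, sketched here as a quick check. The function names are hypothetical. The parallel-bus figures are computed as bus width times clock rate, which is an assumption about how the article's rounded 1 Gbps and 4 Gbps values were derived.

```python
LANE_RATE_GBPS = 2.5  # per-lane rate quoted in the article

def pcie_bandwidth_gbps(lanes):
    """Aggregate PCI Express bandwidth for a given lane count."""
    return lanes * LANE_RATE_GBPS

def pci_bandwidth_gbps(bus_bits, clock_mhz):
    """Parallel PCI bandwidth: bus width (bits) x clock (MHz), in Gbps."""
    return bus_bits * clock_mhz / 1000

for lanes in (1, 2, 32):
    print(f"PCI Express {lanes}x: {pcie_bandwidth_gbps(lanes):.1f} Gbps")

print(f"PCI 32/33: {pci_bandwidth_gbps(32, 33):.3f} Gbps")
print(f"PCI 64/66: {pci_bandwidth_gbps(64, 66):.3f} Gbps")
```

The single-lane rate of 2.5 Gbps comfortably covers PCI 32/33, and two lanes (5 Gbps) edge past PCI 64/66, which matches the article's comparison.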
33. consumption. Series termination creates a voltage plateau that persists until a reflection is received back from the end of the line, so series terminators do not work properly from a timing standpoint unless the receiver ICs are clustered near the end of the net, as shown in Figure 4. There are several ways to terminate at a junction or star connection. One way is to have a series termination at every driver. This has the advantage of reducing settling time at the receiver while consuming a minimum amount of power. Several conditions must be met for a single series termination strategy to be effective. Each branch must be close to the same length; otherwise, reflections coming back from each branch are not in sync and end up bleeding from branch to branch. For a discussion regarding termination component placement, see the "Using Spartan-3 FPGAs to Optimize Termination Component Placement" sidebar.
Conclusion
Spartan-3 devices, with built-in terminators for both single-ended and differential signaling and support for LVTTL, LVCMOS, SSTL, HSTL, GTL
Figure 4: Series terminators do not work properly from a timing standpoint unless the receiver ICs are clustered near the end of the net, because series termination works by creating a voltage pl
34. contains links to analysis, Xilinx web-based power tools, information on XPower (ISE-integrated power analysis software), white papers, design examples, and links to power-centric partner products and software.
Web-Based Power Tools
Web-based power estimation is the quickest and easiest way to get an idea of device power consumption early in the design flow. Xilinx offers a complete set of web-based power tools on Power Central. A new version is released every quarter, so information is current and no installation or downloading is required; you need just an Internet connection and a web browser. You can specify design parameters and save and load design settings, eliminating the need to re-enter design parameters with iterative use. Just an estimate of design behavior and a target device will get you started. Version 4.1 of Xilinx web-based Power Tools was released in March 2005. This new release is designed to help customers quickly and easily estimate the power consumption of their target Xilinx device and, in particular, includes important new Virtex-4 LX, SX, and FX family enhancements (Figure 1):
[Figure 1: Web Power Tools]
• Power results are now affected by changes in ambient temperature (one of the only power tools able to consider ambient temperature).
• Iccintq and Iccauxq have been updated with new, more accurate figu
35. defined radio, finite impulse response (FIR) filters, image processing functions, mathematical operators, A/D and delta-sigma D/A conversion, and all-in-one silicon for widely used applications. Using System Generator within the Simulink design environment from The MathWorks, you have unrestricted access to many blocks. You can select both Xilinx and third-party blocks, drag and drop them to the Simulink workspace, connect them, and simulate a system within minutes. The Xilinx System Generator block is used to select various implementation options such as FPGA device, package, speed, system clock, synthesis options, and HDL. The need to write HDL is eliminated, as the tool creates the proper language for you; however, it can write HDL if you prefer.
[Figure 3: Implementation summary]
Simulink also enables you to integrate blocks from many different libraries. Commonly used DSP block sets include math functions, signal management, a variety of filters, transforms, encoders, decoders, and linear feedback shift registers. For example, you can create a fast Fourier transform (FFT) or FIR core with the easy-to-use GUI in System Generator for DSP, customize the core as per your application, and run the Xilinx ISE tool in the background to build your signal processor system. This flow can synthesize, place, route, and generate h
36. device power consumption at a high level of accuracy, customized to your specific design information. XPower is integrated directly into ISE software and gives hierarchical and detailed net power displays, detailed summary reports, and a power wizard that makes it easy for new users to run XPower. XPower can accept simulated design activity data and runs in both GUI and batch mode (Figure 2). XPower considers each net and logic element in the design. The ISE design files provide exact resource use; XPower cross-references routing information with characterized capacitance data. Physical resources are then characterized for capacitance. Design characterization is continuous and ongoing for newer devices to provide the most accurate results. XPower uses net toggle rates as well as output loading. XPower then computes power and junction temperature, and can display individual net power data as well. Available to CPLD and FPGA users, the XPower design wizard allows for easy data entry to XPower and is an ideal tool for beginners. The wizard helps you enter all the data required by XPower to get a good power estimate, including design files, simulation data, operating environment, output loading, and activity rates. The wizard makes XPower results more accurate the first time and eliminates costly and time-consuming learning curves (Figure 3). Power Central also contains reference information
37. • We can use a lot of fine-grain parallelism.
Technical Description
The key feature that enables computer chess programs to play as strong as, or stronger than, the best human players is their search algorithm. The programs perform a forecast: given a certain position, what can I do, what can my opponent do next, and what can I do thereafter? Modern programs use some variant of the Alphabeta algorithm to examine the resulting game tree. This algorithm is optimal in the sense that in most cases it will examine only $O(b^{t/2})$ many leaves instead of $b^t$ many leaves, assuming a game tree depth of $t$ and a uniform branching of $b$. With the help of upper and lower bounds, the algorithm uses information that it collects during the search process to keep the remaining search tree small. This makes it a sequential procedure that is difficult to parallelize, and naive approaches waste resources. Although the Alphabeta algorithm is efficient, we cannot compute true values for all positions in games like chess. The game tree is simply far too large. Therefore, we use tree search as an approximation procedure. First, we select a partial tree rooted near the top of the complete game tree for examination; usually we select it with the help of a maximum depth parameter. We then assign heuristic values (such as "one side has a queen more, so that side will probably win") to the artificial leaves of the pre-selected partial tree. We propaga
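The bound-based pruning described above can be sketched in a few lines. This is a textbook depth-limited alpha-beta search over a toy tree, not Hydra's FPGA implementation; the nested-list tree encoding, the `leaf_count` counter, and the placeholder heuristic are all assumptions made for illustration.

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, leaf_count):
    """Depth-limited alpha-beta search over a toy tree of nested lists.

    Interior nodes are lists of children; leaves are integer heuristic values.
    leaf_count is a one-element list used to count evaluated leaves.
    """
    children = node if isinstance(node, list) and depth > 0 else []
    if not children:
        leaf_count[0] += 1
        # Placeholder heuristic: a real program would evaluate the position here.
        return node if isinstance(node, int) else 0
    if maximizing:
        value = -math.inf
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, leaf_count))
            alpha = max(alpha, value)   # raise the lower bound
            if alpha >= beta:           # opponent will never allow this line
                break
        return value
    value = math.inf
    for child in children:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, leaf_count))
        beta = min(beta, value)         # lower the upper bound
        if alpha >= beta:
            break
    return value

tree = [[3, 5], [2, 9]]   # depth-2 toy tree, branching factor 2
visited = [0]
best = alphabeta(tree, 2, -math.inf, math.inf, True, visited)
print(best, visited[0])   # 3 3: the leaf 9 is never evaluated
```

After the first branch establishes a lower bound of 3, the leaf 2 in the second branch already proves that branch is worse, so its remaining leaf is skipped. The same dependence on earlier results is what makes the algorithm sequential and hard to parallelize.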
38. environment for algorithm development and debugging. In addition to the IP functions provided in MATLAB, the MATLAB language is uniquely adept with vector- and array-based waveform data at the core of algorithms in applications such as wireless communications, radar, infrared tracking, and image processing.
[Figure 1: System Generator/AccelChip interface. Labels: MATLAB; Simulink; System Generator; floating- to fixed-point; DSP IP cores; RTL; synthesis, simulation, and implementation; complete system verification using hardware in the loop; language-based flow; AccelChip; vector/matrix operations; design space exploration.]
The AccelChip DSP Synthesis tool was developed specifically for algorithm developers and DSP architects who have embraced a language-based flow. With AccelChip, you begin with MATLAB M-files to perform stimulus creation, algorithm evaluation, and post-processing. You can load the M-files into the AccelChip tool; they become the golden source for a design flow that ultimately produces optimized implementations in Xilinx FPGAs.
Unifying Words and Pictures
In the past, design teams looked on System Generator and AccelChip DSP Synthesis as mutually exclusive design tool options, but wished for access to the best aspects of each tool. In response, AccelChip and the
39. experience and legacy code are two of the many reasons to use such devices time and time again. The tendency is to build the software topology in logically partitioned function blocks that meet the ECU specification. But only when these function blocks are compiled and targeted at a single device will we see the real-time issues associated with sequential code. Even though these functions were built discretely, they will run sequentially based on a pre-defined prioritization structure. You may not even see these timing issues immediately. Any interrupt collision issues will not be caught in simulation, because the tendency is to only test what you know should work, and it is almost impossible to test for indeterminate states or real-time fault conditions. Any function that runs off of an interrupt can only really be tested after the system is built up and run in real time. At this point, we can catch more of the errors, but only when the product hits the road will every conceivable interrupt state be tried and tested. We have to hope that these are not major failures and that they can be fixed with a software patch at the next scheduled vehicle service.
"The CPLD can interface to the microcontroller through a simple serial bus and provide low-cost inputs and outputs to the devices that need to be controlled."
Function Interdependency
With programmable logic, you can build up a sys
40. find comprehensive information on power consumption and solving key power challenges with Xilinx devices, tools, IP, and solutions at www.xilinx.com/power.
The Intersil ISL6521 Simplifies Power Solutions for Xilinx FPGAs
Features
• Provides one regulated voltage: switching regulator capable of 20A, and three linear regulators capable of 120 mA, or up to 3A with an external transistor
• Externally resistor-adjustable outputs
• Simple single-loop control design: voltage-mode PWM control
• Fast PWM converter transient response: high-bandwidth error amplifier, full 0% to 100% duty ratio
• Excellent output voltage regulation: all outputs ±2% over temperature
• Overcurrent fault monitors: switching regulator does not require an extra current-sensing element, using instead the MOSFET's Rds(on)
• Small converter size: 300 kHz constant-frequency operation, small external component count
• Commercial and industrial temperature range support
• Pb-free available (RoHS compliant)
Applications
• FPGA- and PowerPC-based boards
• General-purpose low-voltage power supplies
High integration with full features enables space and cost savings. The Intersil ISL6521 quad regulator IC solution includes one switching regulator for loads as high as 20A and three linear regulators that each drive 120 mA, or 3A with external transistors. High integration brings
41. free downloadable WebPACK software could handle the XC3S1000 (the largest device available in the small FT256 package), I knew it was the time to switch to the Spartan-3 device.
[Figure 1: Camera system block diagram. Labels: Sensor Board (304, 314, 317, 318); CPU/Compressor Board (333); LAN.]
The Camera Hardware
The new Model 333 camera (Figure 1) uses the same Linux-optimized CPU (ETRAX100LX, by Axis Communications) as the earlier Model 313, but with increased system memory: 32 MB of SDRAM and 16 MB of Flash. The second major upgrade is the use of 32 MB of DDR SDRAM as a dedicated frame buffer that works in tandem with the FPGA, supplementing its processing power with high capacity and I/O bandwidth. The Spartan-3 DDR I/O functionality made it possible to increase the memory bandwidth without increasing board size; the complete system still fits on a 1.5 x 3.5 inch four-layer board (see Figure 2). The actual board area is even smaller, as the new one is designed to fit the sealed RJ45 connectors for outdoor applications. For the camera circuit design, the goals include combining high computational performance with small size (that also simplifies preserving high-speed signal integrity on the PCB) and providing the flexibility for the reconfigurable FPGA on the system level. For the latter, I decided to split the camera circuitry into two boards: one main
42. given computational problem and an FPGA network configuration. Figure 5 shows a system-level view of mapping an enormous sequence-matching application to a multi-device FPGA computing platform.
Conclusion
Looking forward, using Nallatech's scalable FPGA networks will help realize high-performance computing capability for bioinformatics applications at a low cost. A practical system will be evaluated on:
• Scalability to execute diverse, complex algorithms
• Ease of implementation and integration
• Price/performance demands
Our prototype implementation of the Smith-Waterman sequence alignment algorithm provides a computational speed improvement of more than two orders of magnitude (200X) while running on a three-FPGA network. To find out more about Nallatech's scalable FPGA computing architectures and networking tools, visit www.nallatech.com. To discuss how Nallatech can support your FPGA acceleration and systems development needs, please contact Nallatech at contact@nallatech.com or call 1-877-44-NALLA.
• Partial [...] computation limited [...]: the PE network behaves as a parallel system. Note that the use of large on-board memories relieves the I/O bottleneck. For example, the Nallatech BenDATA-II module and BenNUEY PCI 4E motherboard adds 1.5 GB of on-board memory. In addition the
[Figure residue: Gene Core N; IQI > IDI; Host Computer; Can Be One or More Stacks of FPGA Boards]
Figure 5 Resource a
43. ideal for DDR memory interfacing, as they include DualEDGE triggered registers, a global clock divider, and voltage-referenced I/O standards, including SSTL_2. These features provide the capability to interface between a microprocessor and high-speed memory devices such as DDR SDRAM. To make implementing this function easier, Xilinx offers a reference design and application note, XAPP384, "Interfacing to DDR SDRAM with CoolRunner-II CPLDs," for downloading at the Xilinx website, www.xilinx.com/literature. This reference design is particularly useful if your chosen microprocessor does not support HSTL or SSTL memory interfaces.
Conclusion
Automotive design has always been a challenging environment when faced with small-form-factor ECU specifications, wide temperature ranges, high quality and reliability requirements, and low cost points. Today's designs also need to be flexible, upgradeable, and easy to test fully. Programmable logic is a viable alternative to microprocessors and microcontrollers because it offers true design interdependency and parallel processing. CPLDs can be used to great effect where microcontrollers have run out of I/O or processing power, or simply for memory interfacing or voltage-level shifting.
[Figure residue: DDR RAM; Xilinx CPLD; Memory]
Xilinx Automotive Devices
Designing electronics systems for automobiles has always been challenging. With automotive electro
44. ment that unobtrusively plugs into a Xilinx EDK design flow, enabling you to quickly make dramatic improvements in the performance and power consumption of your application. Triton Builder generates the RTL for the complete accelerator and the driver required to invoke the accelerator, and inserts the driver into the proper place in the original application. The RTL can be generated in either Verilog or VHDL. This integral generation process ensures that necessary hardware and software components are exactly matched for trouble-free design. The tool also generates an RTL test bench and SystemC model for verifying the new hardware.
Design Architecture of FPGA-Based Accelerator
The builder tools create a design architecture ideally matched to FPGA design. Using Triton Builder, you can quickly create a peripheral using the C code that runs on the processor. The tool moves the computationally intensive portions of the code into a hardware accelerator. This acceleration hardware is connected directly onto the processor bus and implemented on the FPGA fabric. The block diagram of a typical accelerated architecture is shown in Figure 2.
[Figure 1: Bus activity graph]
If, while executing the application code, the processor hits a section of code that has been moved to hardware, control is passed to the a
45. model of the filter. The model has a coefficient width of 12, a coefficient binary point value of 12, a data width value of 10, a data binary point value of 8, and a sampling frequency of 44.1 kHz. Scope shots of the filtered output are shown in Figure 2. An implementation summary displaying the use of RAM and multiplier resources is shown in Figure 3. This design achieved 125 MHz performance in a -4 speed grade of the Spartan-3 device.
The NuHorizons Spartan-3 Board
Xilinx, its distributors, and third-party companies offer several boards for prototyping or emulating a DSP-based system. A low-cost prototyping platform from Nu Horizons Electronics Corp. is the Spartan-3 development board, HW-AFX-SP3-2000-DB (Figure 4). The board comprises these elements (Figure 5):
• Xilinx XC3S2000-4FG676 Spartan-3 device
• XCF08 Flash PROM for configuration
• 4 x 24 character LCD display
• Graphical LCD interface
• 64 Mb SDRAM (1 Mb x 16 x 4)
• 32 Mb Flash (2 Mb x 16)
• 50 MHz clock oscillator
• PLL clock multiplier
• CAN 2.0B transceiver
• RS232 interface
• PS2 interface
• Audio CODEC
• Two-channel A/D and D/A converter
• 10/100 Ethernet MAC
• 10/100 Ethernet PHY
• Flash memory interface
• SDRAM memory interface
• LED push buttons
• JTAG configuration header for programming
• 16-bit LVDS I/F with clock and control
• Test point headers for debugging
This fully loaded Sparta
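The fixed-point parameters quoted for the filter model (coefficient width 12 with binary point 12, data width 10 with binary point 8) determine both the resolution and the representable range of each signal. The sketch below shows that relationship under stated assumptions: signed two's-complement values, round-to-nearest, and saturation on overflow. The article does not say which rounding and overflow modes the model uses, and the `quantize` helper is hypothetical, not a System Generator API.

```python
def quantize(value, width, binary_point):
    """Quantize a real value to a signed fixed-point format.

    width        -- total number of bits (assumed signed two's complement)
    binary_point -- number of fractional bits
    Rounds to nearest and saturates on overflow (both assumptions).
    """
    scale = 1 << binary_point
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale

# Coefficients: 12 bits, 12 fractional bits -> step 1/4096, range [-0.5, 0.5)
coeff = quantize(0.1, 12, 12)
coeff_max = quantize(0.75, 12, 12)   # saturates at the largest positive code

# Data: 10 bits, 8 fractional bits -> step 1/256, range [-2, 2)
sample = quantize(1.0, 10, 8)
sample_max = quantize(5.0, 10, 8)    # saturates at the largest positive code

print(coeff, coeff_max, sample, sample_max)
```

Under these assumptions, a 12-bit coefficient with 12 fractional bits cannot represent magnitudes of 0.5 or more, so filter coefficients would need to be scaled into that range before quantization.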
46. ns drivers produce clean receiver waveforms. The faster 2.5 ns and 1.0 ns drivers, however, produce reflections and ringing on the yellow and red receiver waveforms.
Topology, Signal Integrity, and Timing
Signal integrity problems tend to disappear when nets are short relative to how fast they are driven, as reflections settle much more quickly. From the fastest (1.0 ns) waveform in Figure 2, the reflections eventually smooth out at a half-inch trace length. Although academically instructive, a health-conscious engineer certainly would not want to specify more than a few carefully planned high-speed nets with a half-inch maximum length requirement. Sometimes departures from good-practice routing can actually be a key to resolving signal integrity problems. Consider the case of a clock with multiple receivers, each of which is skew-sensitive: the clock must arrive at each receiver at close to the same time. In this case, a daisy-chain route may not be ideal, because it delivers the signal to each receiver serially, inherently creating skew. Here, a superior scheme may be a star pattern, in which each receiver (or small subsets of receivers) has its own routing branch. Each receiver can be placed at approximately the same delay length from the driver, and each receiver is considerably more isolated from other receivers than on a daisy chai
47. on or in-rush current is not a viable option for their system designs. Attention to detail when designing the Spartan-3 configuration logic has yielded devices where the maximum quiescent power alone is guaranteed to be sufficient to power up the device. Spartan-3 devices have no in-rush current or power specification. When using these low-cost devices, you can focus on the product features and design without worrying about increased system cost because of high in-rush power requirements.
Power Management Tools
Web Power Tools are pre-implementation tools that estimate a design's power consumption based on the expected utilization of device resources, operating frequencies, and toggle rates. Once you have implemented your design in the Xilinx software tools, you can use XPower to accurately estimate the power consumption. Actual power consumption must be determined in-circuit under the appropriate operating conditions.
Web Power Tools
The intuitive interface guides you through the steps of the data entry process and ensures the most accurate estimates possible. The equations and values used by Web Power Tools are based on device characterizations for the family. Web Power Tools are available for the Virtex-4, Virtex-II Pro, Virtex-II, Virtex, Virtex-E, and Spartan-3 FPGA families, as well as CoolRunner-II CPLDs.
XPower
XPower is the first power analysis software available for programmable logic design, allowing the a
48. place and route results serve as a timing sanity check: if the trial PBlock does not meet timing, you are guaranteed that the logic will not meet timing in the global context. It is easy to pinpoint and address the potential show-stoppers in your design. Bottom-Up Flow. Once a PBlock has run through place and route and its results look promising, you can use the placement results for the PBlock in a powerful team-based design flow. For example, the design manager could divide up the logic into PBlocks and assign one or more to each team member (Figure 4). Team members focus on achieving satisfactory results for each of their assigned PBlocks and return those results to the manager. The manager then stitches the design back together by importing each PBlock result into PlanAhead software as it arrives. Because PlanAhead design tools do not lock down routing, the design manager will need to launch a final route run to complete the design. Figure 4: Block-based team design. IP Re-Use. PlanAhead software has the ability to export any IP module, which can then be run through place and route. Once you are satisfied with performance, you can save the module's placement in your team's IP library. Another team member can import the IP and its placement into an entirely new design. PlanAhead design tools also allow IPs to be moved around on the device; the relative placement of all logic elemen
49. points. Figure 1: Logic cone with compare point. 2. Match: using a variety of techniques, the equivalence checker maps corresponding compare points between the implementation and reference designs (Figure 2). The fastest of these techniques is name-based matching, but differences between the RTL and gate-level representations can lead to highly dissimilar names, making advanced matching algorithms necessary. Registers may have been duplicated, merged, or otherwise optimized away, such that the logic cones between design versions may be quite different, making the matching process far from simple. 3. Verification: using mathematical techniques, each set of matched compare points is proven to be functionally equivalent or non-equivalent. 4. Debug: when non-equivalent logic cones have been identified, graphical debug techniques are available to isolate the logic causing functional deviations. Static verification has two primary benefits: it is orders of magnitude faster than dynamic verification, and it provides 100% verification coverage. EC can prove that different versions of a design are equivalent (or not) in a matter of minutes, rather than the hours or days required for dynamic simulation. Equival
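To make the name-based matching step concrete, here is a minimal sketch of how compare points might be paired by normalized name. It is not the algorithm used by any particular equivalence checker; the naming conventions (hierarchy separators flattened to underscores, a trailing `_reg` suffix added by synthesis) and all signal names are hypothetical.

```python
def normalize(name):
    """Reduce a register name to a comparable key.

    Assumed conventions (hypothetical): RTL uses '/' or '.' as hierarchy
    separators, while the gate-level netlist flattens them to '_' and
    appends '_reg' to register names.
    """
    key = name.lower().replace("/", "_").replace(".", "_")
    if key.endswith("_reg"):
        key = key[: -len("_reg")]
    return key

def match_by_name(reference, implementation):
    """Pair compare points whose normalized names agree.

    Returns (matched, unmatched_reference, unmatched_implementation).
    Unmatched points are the ones that would need more advanced matching.
    """
    impl_by_key = {normalize(n): n for n in implementation}
    matched = {}
    unmatched_reference = []
    for ref in reference:
        key = normalize(ref)
        if key in impl_by_key:
            matched[ref] = impl_by_key.pop(key)
        else:
            unmatched_reference.append(ref)
    return matched, unmatched_reference, sorted(impl_by_key.values())

# Hypothetical compare points from an RTL reference and a gate-level netlist.
reference = ["top/ctrl/state", "top/ctrl/count", "top/data/valid"]
implementation = ["top_ctrl_state_reg", "top_ctrl_count_reg", "top_data_vld_reg_rep1"]

matched, left_ref, left_impl = match_by_name(reference, implementation)
print(matched)
print(left_ref, left_impl)
```

The renamed and replicated register in the example falls through name matching, which is the situation in which the text says more advanced matching techniques become necessary.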
50. popular in high-volume consumer electronics applications, where they store system parameters or code for an embedded processor. SPI flash PROMs also offer a low-cost, pin-saving configuration solution for Spartan-3E FPGAs. SPI flash memory is multi-sourced, and multiple densities are available within the same package footprint. Parallel NOR flash is the preferred solution for FPGA applications with an embedded processor, such as the MicroBlaze soft-core processor. At power-on, the FPGA configures from one end of the parallel flash. After configuration, the MicroBlaze processor either executes directly from the opposite end of memory or shadows the code to external SDRAM. Table 2: Required SPI device sizes and estimated cost. the embedded processing and consumer electronics markets. Many modern microcontrollers use this interface, as well as a wide variety of third-party peripherals. SPI Flash Suppliers. Although SPI is a standard four-wire interface, the various available SPI flash PROMs use different command protocols. Spartan-3E FPGAs support up to four different command protocols. Currently, SPI flash PROMs from five vendors (Atmel, NexFlash, Programmable Microelectronics Corporation (PMC), Silicon Storage Technology (SST), and ST Microelectronics) are tested and supported, as shown in Table 1. Devices from other SPI suppliers are being tested a
51. s The Theory and Practice of Modem Design. Working together, the DFE, TX pre-emphasis, and RX linear equalizer provide an incredibly rich array of possible adjustments. Conclusion. For any channel with as much as 6 dB of runt-pulse degradation, a simple pre-emphasis adjustment easily doubles the length at which your link operates. If you anticipate more than 6 dB of runt-pulse degradation, we strongly suggest that you simulate your system in detail before making the final equalizer adjustments. Contact your local Xilinx customer support office or visit the Xilinx website to obtain the necessary RocketIO models and associated design kits for modeling your channel. The modeling effort is well worth it, as equalization can substantially extend the reach of your circuits. Howard Johnson, PhD, is the author of High-Speed Digital Design and High-Speed Signal Propagation. He frequently conducts technical workshops for digital engineers at Oxford University and other sites worldwide. For more information, visit www.sigcon.com or e-mail howie03@sigcon.com. Figures 1, 3, 4, and 9 are adapted with permission from Johnson and Graham, High-Speed Signal Propagation: Advanced Black Magic, Prentice Hall, 2003. BUSINESS VIEWPOINTS. Changing the Systems Landscape with Low-Cost FPGAs. The advent of low-cost FPGAs signals the dawn of a new era. By Richard Wawrzyniak, Sr. Market Analys
52. Figure 1: PCI Express adoption forecast. Figure 2: PIPE interface between a Spartan FPGA and an external PHY (signals shown: TxDetectRx, PhyStatus, TxData, TxDataK, RxPolarity, TxCompliance, TxElecIdle, RxElecIdle, RxData, RxDataK, RxValid, Status, PowerDown; external PHY vendors shown: Genesys Logic, Philips Semiconductor, Texas Instruments, others). Figure 3: Single-lane PCI Express implementation options. see the PCI Express Core IP sidebar for details on Northwest Logic's product, and www.xilinx.com/pciexpress for details on PCI Express IP from our other IP partners. Figure 2 shows the implementation of a PIPE interface using a Spartan FPGA and external PHY. Figure 3 illustrates a range of options to implement a single-lane PCI Express interface. The cost of a standard product option is
53. static power of other 90 nm FPGAs. You can also obtain a 20X power reduction using the available Virtex-4 embedded functions. How has Xilinx been able to reduce power in its Virtex-4 FPGA? Triple oxide is the Virtex-4 process innovation that has reduced leakage current. Two oxide thicknesses were used for Virtex-II Pro devices and all past families; Virtex-4 FPGAs add a third, middle-thickness oxide transistor, thereby reducing static power. The Virtex-4 architecture has also enabled dynamic power per CLB to be reduced by 50% at comparable frequencies. Even at a 50% higher operating frequency, Virtex-4 devices reduce dynamic power by 20%. An overall increase in power demand also emphasizes average junction temperature and thermal power consumption. Exceeding the allowable junction temperature leads to reduced system performance, forces reduced device utilization, and reduces device reliability. The more information you have about junction temperature and how your design will affect it, the more informed your decision will be on package types and thermal design considerations, as well as where you can make design changes to help reduce potential thermal problems. Power consumption is design-dependent and affected by output loading, system performance (switching frequency), design density (number of interconnects), design activity (the percentage of interconnects switching), logic block and interconnect struct
54. test bench from the MATLAB source; this test bench is applied at the RTL level within AccelChip and can be applied in System Generator to verify the correctness of the design. Once verified in the System Generator environment, you can verify the AccelChip-generated block using System Generator's supported methods, including HDL co-simulation and hardware-in-the-loop, to accelerate hardware-level simulation 10 to 100 times. Conclusion. The integration of AccelChip with Xilinx System Generator is the first solution to unite MATLAB-based algorithmic synthesis, favored by algorithm developers, with the graphical design flow used by system architects and hardware designers. It uses the rich MATLAB language and its companion toolboxes to create System Generator IP blocks of complex DSP algorithms. By using these tools together, design teams can employ the most productive means of modeling hardware for implementation, fully involving algorithm developers in the FPGA design process and completing higher quality designs more quickly. For more information on AccelChip and its interface to Xilinx System Generator, visit www.accelchip.com. For more information on System Generator, visit www.xilinx.com/products/design_resources/dsp_central/grouping/index.htm. Complex FPGAs Require Equivalence Checking. Synopsys Formality equivalence checker proves identical functionality.
55. the traditional box shipment or downloading from our secure delivery system. ISE 7.1i software became available via download just six days after its product announcement, so if you need your product shipment, you no longer have to wait for the box to arrive. ESD also offers a cleaner delivery and registration system. Because your registration data is obtained and confirmed upon login, no product ID step is required. Product downloads and registration keys are available together online in a speedy and secure environment. Your ISE update can now be on your system and running faster than ever before. Xilinx is one of the only PLD vendors offering all of their logic design tools through electronic download. Conclusion. ISE software sets the programmable design bar even higher with the release of 7.1i design tools. With faster performance, new design tools, advanced HDL simulation, expanded device and operating system support, and electronic download, we believe no competitor offers you more, whether you're making the next handheld CPLD-based PDA or high-speed serial, high-density FPGA-based blade server. From logic development with ISE Foundation software, ISE BaseX, or ISE WebPACK software; embedded design with Platform Studio; system design with System Generator for DSP; real-time verification with ChipScope Pro analyzer; or floorplanning with PlanAhead design tools, Xilinx and ISE software have your project
56. tions to significantly reduce the number of external components required and simplify your designs. Combining increased design flexibility with cost-effective voltage conversion, the device includes programmable soft-start for in-rush current control and independent enables for sequencing the three channels. The TPS75003 meets all Xilinx startup profile requirements, including monotonic ramp and minimum ramp times. For more information about the complete line of TI power management solutions for Xilinx FPGAs, including a library of reference designs, schematics, and BOMs, visit www.ti.com/xilinxfpga. For questions, samples, or an evaluation module, e-mail fpgasupport@list.ti.com. TEXAS INSTRUMENTS. Design Example. Check the specs for yourself at realistic operating temperatures. Different logic architecture or dielectric just won't do it. No competing FPGA comes close to Virtex-4 for total power savings: 73% lower static power, up to 86% lower dynamic power, 94% lower inrush current. Take it to the lab and see. UNIQUE TRIPLE-OXIDE TECHNOLOGY & EMBEDDED IP. At the 90 nm technology node, power is the next big challenge for system-level designers. An inferior device can suffer leakage, dramatic surges in static power
57. to automate the time correlation of each instrument's capture to the other and import scope waveforms into the logic analyzer, it becomes much easier to track down elusive cause-and-effect problems, whether the source is logic or signal integrity. For information about logic analyzers and oscilloscopes from Agilent, visit www.agilent.com/find/logic and www.agilent.com/find/scopes. DESIGN TOOLS. Real-Time Debug That Dominates. Debug your design with the ChipScope Pro system. By Lee Hansen, Sr. Product Marketing Manager, Xilinx, Inc., lee.hansen@xilinx.com; Brent Przybus, Sr. Product Marketing Manager, Xilinx, Inc., brent.przybus@xilinx.com. Verification continues to be one of the most time-consuming and time-critical phases of the design flow. The fixed nature of an ASIC or structured ASIC makes identifying and finding problems on the board prohibitively expensive and massively time-consuming. HDL verification and timing analysis tools offer a good first line of defense in predicting design behavior, but we believe no design tool offering beats the advantage that the Xilinx ChipScope Pro system brings to FPGA real-time verification. Faster FPGA Verification. The ChipScope Pro system, available as a separately purchased option to ISE software, allows logic and embedded software designers to debug their FPGAs in real time. You can find design problems quickly while the chip is ru
58. to make decisions and provide control. Typical applications include active video surveillance, robotic arms motion, measurement of points and distances, and autonomous guided vehicles. Image Warping Theory. Digital image warping deals with techniques of geometric spatial transformations. The pixels in an image are spatially represented by a pair of Cartesian coordinates (x, y). To apply a geometric spatial transformation to the image, it is convenient to switch to homogeneous coordinates, which allow us to express the transformation by a single matrix operation. Usually this is done by adding a third coordinate with value 1: (x, y, 1). In general, such a transformation is represented by a non-singular 3 x 3 matrix H and applied through a matrix-vector multiplication to the pixel homogeneous coordinates: $(x', y', w')^T = H \, (x, y, 1)^T$. The matrix H, called homography or collineation, is defined up to a scale factor (it has 8 degrees of freedom). The transformation is linear in projective (or homogeneous) coordinates but non-linear in Cartesian coordinates. The formula implies that to obtain Cartesian coordinates of the resulting pixel we have to perform a division, an operation quite onerous in terms of time and area consumption on an FPGA. For this reason we considered a class of spatial transformations called affine transformations, that is a partic
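The difference between a general homography and an affine transformation can be sketched in a few lines. This is an illustrative software model only, not the FPGA implementation described in the article; the matrices and the pixel are made-up values, and the symbol names are assumptions. It relies on the standard fact that an affine transformation is a homography whose last row is (0, 0, 1), so the third homogeneous coordinate stays 1 and no division is needed.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 matrix H using homogeneous coordinates.

    The result must be divided by the third coordinate w to return to
    Cartesian coordinates; this is the costly division mentioned in the text.
    """
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

def apply_affine(A, x, y):
    """Map pixel (x, y) through a 2x3 affine matrix A.

    Equivalent to a homography whose last row is (0, 0, 1): w is always 1,
    so only multiplications and additions remain.
    """
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])

# Hypothetical transformation: scale by 2, then translate by (3, -1).
A = [[2, 0, 3],
     [0, 2, -1]]
H = A + [[0, 0, 1]]

# H is defined up to a scale factor: 2*H maps every pixel to the same place.
H_scaled = [[2 * value for value in row] for row in H]

print(apply_affine(A, 4, 5))
print(apply_homography(H, 4, 5))
print(apply_homography(H_scaled, 4, 5))
```

All three calls land on the same pixel, which shows both the scale invariance of H and why the affine case avoids the divider.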
59. tool setup to understand these logic optimization transformations. Fortunately, Synopsys Design Compiler FPGA and Xilinx PAR tools write out an automated setup file for Formality. This file contains information telling Formality which optimizations were performed in particular areas of the design, minimizing the manual setup information that you might otherwise need to provide. Formality uses this setup information to understand the differences between the two versions and, after validating the information, uses it to more quickly complete the equivalence check. Linking multiple tools through the setup file helps you achieve the fastest possible time to results and eliminate errors that might be introduced in a manual setup process. Complex FPGA devices are increasingly being designed using modular or hierarchical flows. Formality provides an ideal way to verify each individual module and ensure that they have been correctly stitched together at the upper levels of your design. Formality is a mandatory step in the EasyPath methodology. EasyPath devices are not reprogrammable, so ensuring that the device is 100% equivalent to the reference RTL is essential. Formality's static analysis techniques prove functional equivalence, minimizing the risk of implementing incorrect functionality, to help you reach volume production sooner. Conclusion. Today's FPGAs have achieved the same level of functional comple
60. undesirable, either loosen the timing constraint or force the logic into RAMs/ROMs/multipliers. Please see the Synplify online documentation for more information. • There are times when the Synplify tool will not pack logic into the dedicated resources. Common occurrences are: RAMs: either the address bus or the data output must be registered. Reset or preset is synchronous/asynchronous, causing failure to map. Enable is synchronous/asynchronous, causing failure to map (our online help has examples). Timing constraints are too tight: logic is mapped to registers for timing reasons. ROMs are less than half populated: use the syn_romstyle attribute to force. • Clock tree management: Clock management is extremely design-dependent, so it is difficult to offer generic advice. The idea is to pack each clock quadrant of the chip with as much logic as possible. The Synplify tool will build feedback logic for registers without resets. This extra MUX can add to the LUT count. Reset every register in the design if possible. If certain paths/modules are clocked by the critical clock but are not critical themselves, either supply a multicycle path or give them another clock line off the DCM. The perfect solution is to have as little high-frequency critical logic on the chip as possible. Experiment with different state machine implementations. Try a run with the Synplify Pro product's F
61. you choose your pin package. Multiple Floorplans. PlanAhead software has the ability to open multiple device views on the same netlist and create a different floorplan in each of the device views. You can analyze each floorplan for utilization and performance characteristics and ultimately run through place and route. This gives you a powerful what-if mechanism for design space exploration. Figure 2: The PlanAhead analysis environment. Statistics. You can generate a detailed statistical report for any block in your design that includes information on clocks, resource utilization, RPMs, carry chains, clock regions, and the number of internal versus external nets. This could help you narrow down problems in your design to specific blocks or modules that you could then re-code or re-synthesize with different options. Schematic Viewer. A comprehensive schematic viewer can help you navigate timing paths, trace through cones of logic, or determine the floorplanned block connectivity. Some simple net and pin tracing capabilitie
62. [Table: Virtex-4 FPGA product selection matrix (I/O resources, RocketIO serial transceivers, PowerPC processor blocks, Ethernet MACs), printed rotated in the original. Cell values are not recoverable from the extracted text.]
63. 11 hours 24 minutes. Figure 4: The BenNUEY PCI-4E motherboard configured with three BenDATA-II modules. Where |D| is the number of elements in a given database sequence D, |Q| is the length of the query sequence Q (equal to the number of processing elements), T is the number of nucleotide sequences in the database, N is the number of FPGAs in the network, and $t_{clk}$ is the clock period in the FPGA network. A truly parallel FPGA network would have a processing time given by $t_p = t_{clk} \times T \times (|D| + |Q|) / N$, while a serial FPGA network would have a processing time given by $t_s = t_{clk} \times T \times (|D| + |Q|)$. In tasks where the processing is I/O limited (|D| > |Q|), the FPGA network acts as a serial system, whereas in tasks where the processing is [...] The BenNUEY PCI-4E motherboard has 4 GB Ethernet ports and eight Xilinx RocketIO ports, allowing direct connection to storage area networks (SANs), networked memory devices, and interlacing of MGTs between boards. These Nallatech boards, which became available in early 2005, would be ideally suited for real-world bioinformatics applications (Figure 4). Automatically evolving computational resource partitioners are currently being developed at UNC Charlotte for optimal resource allocation for a
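The two processing-time expressions can be compared numerically. This is a rough sketch under stated assumptions: the operator between |D| and |Q| is not legible in the extracted text, so a sum is assumed here; the symbol names and every input value below are hypothetical and chosen only to show the scaling.

```python
def serial_time(t_clk, num_sequences, db_len, query_len):
    """Processing time for a serial FPGA network.

    Assumes the form t_clk * T * (|D| + |Q|); the '+' is an assumption.
    """
    return t_clk * num_sequences * (db_len + query_len)

def parallel_time(t_clk, num_sequences, db_len, query_len, num_fpgas):
    """Processing time for a truly parallel network of num_fpgas devices."""
    return serial_time(t_clk, num_sequences, db_len, query_len) / num_fpgas

# Hypothetical workload: 100 MHz clock, 1,000 database sequences of
# 10,000 elements each, a 500-element query, and 4 FPGAs.
T_CLK = 10e-9
t_serial = serial_time(T_CLK, 1000, 10_000, 500)
t_parallel = parallel_time(T_CLK, 1000, 10_000, 500, 4)

print("serial network (s):", t_serial)
print("parallel network (s):", t_parallel)
print("speedup:", t_serial / t_parallel)
```

Under this model the ideal speedup equals the number of FPGAs, which is the gap that an I/O-limited workload fails to realize.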
64. • Seven-segment LEDs • Push switch • DIP switch • Four channels, 10-bit analog-to-digital converter • Four channels, 10-bit digital-to-analog converter • 32 Mb flash ROM (2M x 16 bits) • 64 Mb SDRAM (4M x 16 bits) • Interfaces: parallel port, serial port, JTAG, expansion connector (20 pin, 12 signals). The TS101 is an ideal platform for evaluating Xilinx Spartan-3 FPGAs. Part Number: TD-BD-TS101. Deliverables: board, sample RTL source code, user's manual, information for FPGA mapping (UCF file), AC adapter, configuration cable. The Circuit Design Learning Tool TD-BD-TS101 is a toolkit for efficiently learning how to design circuits using HDL. The toolkit is based on a Xilinx Spartan-3 XC3S400-4PQ208 mounted on board as the main unit. The main unit carries an RS-232C connector for communication with an external device. By using the toolkit in combination with your PC, you can easily check the board's I/O operations to learn circuit designing. This product comprises the main unit, AC adapter, documents, and CD-ROMs. The on-board four channels of AD/DA converter, 32 Mb of flash ROM, and 64 Mb of SDRAM extend the use of the evaluation board for several kinds of circuit design. You can program your designed circuit (test circuit) into on-board PROM through the parallel cable fro
65. 5 ISSUE 53. Xcell Journal. INDUSTRY EXPERT: Extend Your Reach Using RocketIO [...]. BUSINESS VIEWPOINTS: Changing the Systems Landscape with Low-Cost FPGAs. HIGH-VOLUME SOLUTIONS: Leading the High-Volume Programmable Revolution; Spartan-3E FPGAs Introduce a New Era in Low-Cost Programmable Logic; Implementing New Configuration Options for the Spartan-3E [...]; Encoding High-Resolution Ogg Theora Video with Reconfigurable FPGAs; Implementing DSP Algorithms Using Spartan-3 FPGAs; Designing a Spartan-3 FPGA DDR Memory Interface; Signal Processing Capability with the Nu Horizons Spartan-3 Development Board; Reducing Bill of Materials Cost Using Logic [...]; Using CoolRunner-II Silicon Features to Reduce Cost; Using Spartan-3/3E Features to Area-Optimize Your Design; Low-Cost EasyPath FPGAs Offer Promise to ASSP [...]; Applications of the Spartan Family in Flat Panel Displays; Meeting Timing and Reducing Area with the Synplify Pro [...]; The Changing Face of Automotive ECU Design; A Multimedia Platform for Automotive and Consumer Markets; A Low-Cost [...]. DESIGN TOOLS: Design Leadership From Xilinx Introduction; Design Performance Leaps
66. [Table: Virtex-4 FPGA product selection matrix, continued (Virtex-4 LX logic, SX signal processing, and FX embedded processing and serial connectivity platforms; www.xilinx.com). Cell values are not recoverable from the extracted text.]
67. 8,403 LUTs. An upper bound for the number of cycles per search node is nine cycles. We estimate that a pure software solution would require at least 6,000 Pentium cycles. The longest path consists of 51 logic levels, and the design runs at 30 MHz on a Virtex I 1000. We have just ported the design to a Virtex XC2VP70-5, so that we can now run the program at 50 MHz. In software, a move generator is usually implemented as a quad loop: one loop over all piece types, an inner loop over pieces of that type, a second inner loop for all directions where that piece can move, and the innermost loop for the squares to which the piece can move under consideration of the current direction. This is quite a sequential procedure, especially when we consider that piece-taking moves should be promoted to the front of the move list. In a fine-grain parallel design, however, we have a fast, small move generator, which works very differently. In principle, the [...] respectively forwarding the signals of far-reaching pieces to neighbor square instances. Additionally, each square can output the signal "victim found." Then we know that this square is a victim (a to-square of a legal move). The collection of all "victim found" signals is input to an arbiter (a comparator tree) that selects the most attractive not-yet-examined victim. The GenAggressor module takes the arbiter's output as input and sends the signal of a super-piece a com
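The software quad loop described above can be sketched as follows. This is a deliberately reduced illustration, not the authors' program: it models only rooks, bishops, and knights on an empty-rules board (no pawns, kings, checks, or castling), and the board representation and names are assumptions made for the example. Capturing moves are placed at the front of the returned list, as the text suggests.

```python
# Direction vectors (file, rank) per piece type; a small subset of chess.
DIRECTIONS = {
    "R": [(1, 0), (-1, 0), (0, 1), (0, -1)],
    "B": [(1, 1), (1, -1), (-1, 1), (-1, -1)],
    "N": [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)],
}
SLIDERS = {"R", "B"}  # far-reaching pieces keep moving along a direction

def generate_moves(own, enemy):
    """Return pseudo-legal moves as (piece_type, from_square, to_square).

    own:   dict mapping piece type to a list of (file, rank) squares
    enemy: set of squares occupied by enemy pieces
    """
    own_squares = {sq for squares in own.values() for sq in squares}
    captures, quiet = [], []
    for piece_type in own:                              # loop 1: piece types
        for (f, r) in own[piece_type]:                  # loop 2: pieces of that type
            for (df, dr) in DIRECTIONS[piece_type]:     # loop 3: directions
                tf, tr = f + df, r + dr
                while 0 <= tf < 8 and 0 <= tr < 8:      # loop 4: target squares
                    if (tf, tr) in own_squares:
                        break                           # blocked by own piece
                    move = (piece_type, (f, r), (tf, tr))
                    if (tf, tr) in enemy:
                        captures.append(move)           # capture ends the ray
                        break
                    quiet.append(move)
                    if piece_type not in SLIDERS:
                        break                           # knights take one step
                    tf, tr = tf + df, tr + dr
    return captures + quiet                             # captures first

# Hypothetical position: rook on a1, knight on b1, one enemy piece on a4.
moves = generate_moves({"R": [(0, 0)], "N": [(1, 0)]}, {(0, 3)})
for move in moves:
    print(move)
```

The four nested loops run strictly one after another, which is the sequential behavior that the parallel per-square hardware design replaces.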
68. [Table: CoolRunner CPLD package options and user I/O counts (including XPLA3 devices), printed rotated in the original. Cell values are not recoverable from the extracted text.]
69. AGING EDITOR: Charmaine Cooper Hussain. XCELL ONLINE EDITOR: Tom Pyles, tom.pyles@xilinx.com, 720-652-3883. ADVERTISING SALES: Dan Teie, 1-800-493-5551. ART DIRECTOR: Scott Blair. Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124-3400. Phone: 408-559-7778. FAX: 408-879-4780. www.xilinx.com/xcell. © 2005 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc. PowerPC is a trademark of IBM, Inc. All other trademarks are the property of their respective owners. The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby. LETTER FROM THE EDITOR. What's Both High and Low and Used Everywhere? This issue of the Xcell Journal focuses on high-volume, low-cost consumer applications and the design tools used to implement them. Business Viewpoints. Our Business Viewpoints column offers an independent business perspective by Rich Wawrzyniak, Senior Analyst, ASICs Services, for Semico R
70. [Table: Spartan FPGA product selection matrix with package options and user I/O counts, printed rotated in the original. Cell values are not recoverable from the extracted text.]
71. Figure 4: AccelChip exports a cycle-accurate System Generator block that supports both simulation and RTL code generation. Throughout this flow, AccelChip maintains a uniform verification environment through a self-checking test bench: the input/output vectors that were generated when verifying the fixed-point MATLAB design are used to verify the generated RTL. The RTL verification step also gives AccelChip the information necessary to compute the throughput and latency of the Kalman filter. This is essential information to assess whether the design meets specifications and is critical for achieving cycle-accurate simulation. Exporting from AccelChip to System Generator. Although RTL verification is a key step in the design flow, designers want to see algorithms running in hardware. System Generator's hardware-in-the-loop co-simulation interfaces make this a push-button flow, allowing you to bring the full power of MATLAB and Simulink analysis functions to hardware verification. Now that you have run RTL verification in AccelChip, you are ready to export the AccelChip design to System Generator by going to the Export pull-down menu in the AccelChip GUI and selecting System Generator. Ac
72. CPLD to perform some system function that requires power management, integrating discrete logic functions, or implementing fixes to standard products. For more information regarding OTF, see http://www.xilinx.com/bvdocs/appnotes/xapp388.pdf. If you're designing a small form factor device, consider using a new package option for Xilinx CoolRunner-IIA CPLDs. The package is commonly referred to as micro lead frame or quad flat no-lead, and is extremely small yet priced similarly to standard thin quad flat package devices. This package is an attractive way to use minimal board space and still get the great low-power operation of CoolRunner-II CPLDs.
Conclusion
Perhaps you will be able to apply some of these cost-saving ideas to your next design. If you need more information on any of these topics, there is a wealth of information on the Xilinx CPLD web site at www.xilinx.com/cpld/index.htm. If you need something more substantial on these features, an excellent application note exists at http://www.xilinx.com/bvdocs/appnotes/xapp378.pdf. With these innovative features and your imagination, design problems will seem more manageable and, hopefully, your design will be more successful.
[Advertisement] Embedded FPGA made easy. Increase your embedded software performance by using Nucleus PLUS on your processor-based Xilinx designs. Integrating with Xilinx's MLD technology, Nucleus PLUS is configured to your system design automat
73. EW running on Windows. This allows us to leverage the development skills of our existing LabVIEW developers and migrate code between targets. Furthermore, because LabVIEW has always been a parallel language and was originally designed with programming hardware in mind, its parallelism is an ideal match for programming parallel FPGA hardware. To implement a LabVIEW block diagram onto the FPGA, we generate a set of state machines, in addition to the functionality described in the blocks of the diagram, that behave as an enable chain to control the flow of data. We also make data from the LabVIEW FPGA VI available to external programmers very intuitively by representing this data as controls and indicators on the front panel. You can treat the VI running on the FPGA just as if it were a VI on any other processing engine. We then provide a small set of functions to access the LabVIEW FPGA data from a LabVIEW host VI (Figure 4). To speed up development and take advantage of Xilinx compiler tools, we decided to generate an intermediate netlist representation from the LabVIEW block diagram in a hardware description language, in our case VHDL. This allows us to represent some control logic at higher levels of abstraction, which speeds up the development of our code generators and makes debugging much easier. And because LabVIEW is a very generic programming environment for the host, we developed the module generators the
74. First and foremost, use the Synplify or Synplify Pro products. Some designers are unaware of the impact of a quality synthesis tool. If you don't have Synplify or Synplify Pro and are not meeting timing, or you want to reduce area, you can download a full-featured evaluation copy from the Synplicity website for free. Is the Synplify software reporting positive slack in the log file (.srr)? If the Synplify product's estimates are incorrect, usually because of excessive routing delays caused by congestion, the logic optimizations will be affected. The Synplify solution has to work on the same critical areas as reported by place and route. If there are excessive routing delays, apply the route constraint to either the clock or the path (see the online documentation for more information). Do not use the global frequency box on the front panel of the Synplify tool unless you need to constrain combinatorial paths. Specify all the clocks in the constraints editor (SCOPE). Ensure that unrelated clocks are put into different clock groups. If you have clocks that are related and in the same clock group, make sure the periods have a common multiple. If the clocks are slightly different, this can cause very small clock-to-clock setup delays, making timing impossible. This can again have an effect on logic optimizations. For example, if you have two clocks
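The clock-group advice in excerpt 74 can be illustrated with a small calculation. The sketch below is not Synplify's algorithm; it is a simplified model that assumes a timing tool looks for the smallest gap between a launch edge and the next capture edge over the common period of two related clocks. The function names and the example periods are hypothetical.

```python
from math import gcd


def common_period_ps(launch_ps, capture_ps):
    """Least common multiple of two clock periods, in picoseconds."""
    return launch_ps * capture_ps // gcd(launch_ps, capture_ps)


def smallest_edge_gap_ps(launch_ps, capture_ps):
    """Smallest time from any launch edge to the next capture edge.

    Both clocks are assumed to rise together at t = 0. Edges repeat
    after the common period, so only that window is searched.
    """
    best = None
    for t in range(0, common_period_ps(launch_ps, capture_ps), launch_ps):
        # First capture edge strictly after the launch edge at time t.
        next_capture = (t // capture_ps + 1) * capture_ps
        gap = next_capture - t
        best = gap if best is None else min(best, gap)
    return best


# A 100 MHz clock paired with a 50 MHz clock leaves a full 10 ns.
print(smallest_edge_gap_ps(10_000, 20_000))
# Periods of 10.0 ns and 10.1 ns leave only 0.1 ns for the worst pair of edges.
print(smallest_edge_gap_ps(10_000, 10_100))
```

In this model the worst-case window equals the greatest common divisor of the two periods, which is why two nearly equal periods in the same clock group can produce a requirement that no netlist can meet.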
75. Forward with ISE 7.1i Software
Debugging with Combined Oscilloscope and Logic Analyzer Measurements
[entry illegible]
Integrating MATLAB Algorithms into FPGA Designs
Complex FPGAs Require Equivalence Checking
[entry illegible]
Increase Performance and Lower Cost Through System Design
Control Your Designs with the PlanAhead Hierarchical Design and Analysis Tool
Building High-Performance Measurement and Control Systems with FPGAs
GENERAL
The Hydra Project [remainder of title illegible]
Accelerating Time to Market with Embedded Processing QuickStart
Bioinformatics Algorithms on Nallatech Configurable Multi-FPGA Systems
POWER PLAY
Conquering the Three Challenges of Power Consumption
BOARD [remainder illegible]
REFERENCE
INDUSTRY EXPERT
Extend Your Reach
RocketIO transceivers included in the Virtex-4 FPGA family incorporate highly flexible equalization circuits that significantly extend the range and performance of high-speed serial links. Xcell Journal. by Howard Johnson, Ph.D., President, Signal Consulting, Inc., howie03@sigcon.com; Mike Degerstrom, Senior
76. [Product selection tables for Xilinx CPLDs (CoolRunner series and XC9500 series: features, package options, and user I/O), printed rotated on the page; the table bodies did not survive extraction. A legible note states that area dimensions for lead-frame products are inclusive of the leads.] Xcell Journal
77. I/BPI configuration memory
• PCI 64/66 and PCI-X support
• DDR 333 memory interface
• Mini-LVDS/RSDS
• DCM clock frequency input range expanded down to 5 MHz (ideal for video)
• 325 MHz multipliers aligned with block RAM for low-cost DSP
One of the most significant new features available in Spartan-3E devices is support for low-cost serial commodity Flash configuration memory. With Spartan-3E FPGAs, you can use generic low-cost serial EPROMs or byte-wide Flash devices, available from multiple vendors, to configure the device. You can use the configuration memory for other system functions and even reprogram it under the control of the Spartan-3E FPGA, adding tremendous flexibility for system designers.
Conclusion
With the arrival of the Spartan-3E FPGA family, there are now even more reasons to consider programmable logic in the production of low-cost systems. With prices starting under $2 and a substantially lower cost configuration solution, Spartan-3E devices will serve as the production solution for increasing numbers of low-cost, high-volume, and consumer applications.
[Figure 2: I/O versus logic curves for Spartan-3 and Spartan-3E devices. Vertical axis: logic cells, 0 to 80,000.]
78. Issue 53, Second Quarter 2005. Xcell Journal: The Authoritative Journal for Programmable Logic Users. Plugging into High-Volume Consumer Products. DESIGN TOOLS: New ISE 7.1i Software; Control Your Designs; Extend Your Reach.
[Advertisement] Over 1 Million Units Shipped. Xilinx is the only FPGA supplier in the world to have achieved high-volume 90nm production, resulting in the lowest cost FPGAs in the industry. Our leadership in 90nm products gives you all the performance and features you need, at the lowest price points ever. Both our Spartan and Virtex product lines, the world's most widely adopted FPGAs, are shipping on our optimized 90nm process now. Contact your Xilinx rep today, and let's ramp up for success together. XILINX, The Programmable Logic Company. www.xilinx.com/spartan3. Pb-free devices available now. © 2004 Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124. Europe +44-870-7350-600; Japan +81-3-5321-7711; Asia Pacific +852-2-424-5200. Xilinx is a registered trademark, Spartan and Virtex are trademarks, and The Programmable Logic Company is a service mark of Xilinx, Inc.
EDITOR IN CHIEF: Carlis Collins, editor@xilinx.com, 408-879-4519. MANAGING EDITOR: Forrest Couch, forrest.couch@xilinx.com, 408-879-5270. ASSISTANT MAN
79. [Advertisement: Innovative Integration, real-time solutions, www.innovative-dsp.com, 805-520-3300. Legible features include private and global DDR SDRAM, 64-bit/66 MHz CompactPCI, and two mezzanine sites; the rest of the feature list is illegible.]
Applications of the Spartan Family in Flat Panel Displays
A full feature set and low prices make Spartan-3 devices attractive for use within the flat panel display market.
by Danny Mok, Product Marketing Manager, High Volume, Xilinx Hong Kong, danny.mok@xilinx.com
With its combination of low cost and full feature set, the Xilinx Spartan-3 family is uniquely positioned to enable various digital consumer systems. One market segment where Spartan-3 devices are being widely accepted is the flat panel display (FPD) market. Flat panel displays are the fastest growing segment of a new breed of consumer televisions. There are three types of flat panel displays on the market: LCD, plasma, and projection (DLP). All three types are poised to dramatically increase unit shipments (see Figure 1, from iSuppli Research).
[Figure 1: flat panel display unit shipments by display type, 2002 to 2007. Source: Television Systems Market Tracker, Q4 2003, iSuppli Corporation.]
80. OS you can easily and single-handedly navigate all device functions with a jog wheel, buttons, and soft keys. Several project functions embedded in CARLOS help you manage various measuring units simultaneously. Because the system is based on LabVIEW and Virtex-II FPGA technology, you can reconfigure its functionalities far beyond its factory settings to meet a wide variety of testing needs.
Conclusion
Xilinx and NI have collaborated to help measurement and control engineers build high-performance custom I/O devices, leveraging flexible off-the-shelf hardware powered by Xilinx FPGAs, at a considerable cost and time savings. While LabVIEW can be used to define FPGAs quickly, the use of Xilinx Virtex-II FPGAs provides reconfigurable customization for many measurement and control platforms from NI. Within hours, you can turn an off-the-shelf I/O board into a control device that exactly meets your needs, without ever knowing VHDL or other low-level hardware design tools. By simply changing the block diagram of the LabVIEW FPGA code, you can recreate hardware with a completely different personality and functionality. This rapidly implemented, reconfigurable I/O approach is unprecedented in the measurement and control industry and continues to expand the capabilities of virtual instrumentation. For more information about LabVIEW FPGA, please visit www.ni.com/fpga/.
[Advertisement] Prototype Your PCIe ASIC HERE
81. [Figure 2: Forward mapping with a scaling factor greater than one. The pixel grids and the worked interpolation examples in the figure did not survive extraction.]
Backward Mapping
INPUT: source image I
For every v from 1 to height(Iw)
  For every u from 1 to width(Iw)
    Calculate x
    Calculate y
    Calculate Iw(u, v) as the bilinear interpolation of the four pixel values surrounding (x, y), such as
    $I_w(u,v) = (1-h)(1-k)\,p_{00} + h(1-k)\,p_{10} + (1-h)\,k\,p_{01} + h\,k\,p_{11}$
    where $p_{00}$, $p_{10}$, $p_{01}$, and $p_{11}$ are the four source pixels surrounding (x, y) and h and k are the fractional offsets of (x, y) from $p_{00}$ [the printed definitions are illegible]
OUTPUT: warped image Iw
Figure 3: Backward mapping with a scaling factor greater than one
Conclusion
The challenge is to design efficient, effective, and reliable vision modules with the highest possible reliability. Ultimodule, a Xilinx XPERTS partner, and the VIPS Laboratory at the Università di Verona have defined a foundation platform for computer vision using Ultimodule's system-on-module family. The platform provides a stereovision system for real-time extraction of three-dimensional data and a real-time image processing engine implementing most of the algorithms required when an application relies on vision to make decisions and provide control. The platform supports applications that require high-performance and robust vision ana
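The backward-mapping loop of Figure 3 can be sketched in a few lines of Python. This is a minimal illustration, not the Ultimodule implementation: it assumes a pure scaling transform, a grayscale image stored as a list of rows, zero-based indices (the figure counts from 1), and clamping at the image border. The names `backward_map_scale`, `h`, and `k` are choices made for this sketch, with `h` and `k` taken as the fractional offsets in x and y.

```python
import math


def backward_map_scale(src, scale):
    """Scale a grayscale image by `scale` using backward mapping.

    Every output pixel (u, v) is mapped back to a real-valued source
    position (x, y); its value is the bilinear interpolation of the
    four source pixels that surround that position.
    """
    src_h, src_w = len(src), len(src[0])
    out_h, out_w = int(src_h * scale), int(src_w * scale)
    out = [[0.0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            # Inverse transform, clamped to the last valid row/column.
            x = min(u / scale, src_w - 1)
            y = min(v / scale, src_h - 1)
            x0, y0 = int(math.floor(x)), int(math.floor(y))
            x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
            h, k = x - x0, y - y0  # fractional offsets from p00
            p00, p10 = src[y0][x0], src[y0][x1]
            p01, p11 = src[y1][x0], src[y1][x1]
            out[v][u] = ((1 - h) * (1 - k) * p00 + h * (1 - k) * p10
                         + (1 - h) * k * p01 + h * k * p11)
    return out


warped = backward_map_scale([[0, 10], [20, 30]], 2)
for row in warped:
    print(row)
```

Because the loop runs over output pixels, every pixel of the warped image receives exactly one value, which is the usual reason for preferring backward mapping when the scaling factor is greater than one.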
82. Programmers attempt to mimic human chess style, but resulting programs are weak.
• 1970s: Chess 4.5 is the first strong program, emphasizing tree search. It is the first program to win a tournament against humans (the Minnesota Open, 1977).
• 1983: Belle becomes National Master with 2,100 ELO.
• 1988: HiTech wins for the first time against a Grandmaster.
• 1988: IBM's Deep Thought plays on Grandmaster level.
• 1992: The ChessMachine, a conventional PC program by Ed Schröder, becomes World Champion.
• 1997: IBM's Deep Blue beats Kasparov in a six-game match.
• 2003: Hydra's predecessor wins a human Grandmaster tournament with 9 out of 11 points, reaching 2,768 ELO.
• February 2004: Hydra achieves 1st rank in the International Paderborn Computer Chess Championship.
• April 2004: Hydra reaches 2,920 ELO on the ChessBase chess server.
• August 2004: Hydra scores 5.5:2.5 against Shredder and 3.5:0.5 against a 2,650 ELO Grandmaster.
• October 2004: Hydra is crowned Machine World Team Champion against the human team, performing 2,950 ELO again.
Accelerate Time to Market with Embedded Processing QuickStart
Embedded Processing QuickStart will give you the essential training to start and complete an embedded FPGA design.
by Jonathan Trotter, Titanium Business Development Manager, Xilinx, Inc., jonathan.trotter@xilinx.com
Co-designing across processor and logic domains can present a significant challen
83. [Figure 1: Spartan-3 staggered I/O compared to Spartan-3E inline I/O]
The Gate-Centric FPGA
Spartan-3E devices feature an optimized inline I/O ring for the absolute lowest cost. Figure 1 shows the inline I/O ring of Spartan-3E FPGAs versus the staggered I/O pad approach used by Spartan-3 devices. Inline I/Os are more efficient for smaller densities and allow us to add more logic for a given I/O count, as shown in Figure 2. To further reduce die size, we refined the I/O layout, removed some less common I/O standards, and resized the output buffers. Even a small area reduction in each I/O adds up to a relatively large savings because it is repeated for each I/O pad. Many of the decisions in the Spartan-3E architecture were driven by direct feedback from our high-volume and low-cost customers. Spartan-3E FPGAs have a lower CPL compared to Spartan-3 devices and are the lowest cost FPGAs for gate-centric designs. Correspondingly, Spartan-3 devices are well suited for I/O-intensive FPGA designs. Together, Spartan-3 and Spartan-3E FPGAs will continue to meet customer needs in low-cost system design.
New Features in Spartan-3E Devices
Besides the optimized inline I/O ring to lower costs, Spartan-3E FPGAs also have a host of new features that include:
• Support for low-cost commodity Flash memory SP
84. Implementing New Configuration Options for the Spartan-3E Family
The new Spartan-3E family supports serial and parallel flash memory for configuration.
by Steve Knapp, Applications Engineering Manager, Xilinx, Inc., steve.knapp@xilinx.com; [first name illegible] Williams, Marketing Manager [affiliation and e-mail illegible]; Kirk Owyang, Sr. Solutions Marketing Manager, Xilinx, Inc., kirk.owyang@xilinx.com
The new Xilinx Spartan-3E FPGA family reduces your total system cost in many ways, including new low-cost configuration memory options. You can select the configuration memory solution that best suits your specific application requirements. For configuration memory, you can choose between industry-standard commodity serial peripheral interface (SPI) or parallel NOR flash PROMs, competitively priced [word illegible] Platform Flash, or other low-cost memories with a microcontroller.
[The following passages were printed in parallel columns and are interleaved in the extraction.]
Nearly all of the Spartan-3E configuration pins become available as user I/Os after configuration. Consequently, you can leverage any remaining space in the co
[...] accessible, byte-addressable, read/write memory. If the application requires additional space, upgrade to the next larger PROM. Most SPI and parallel flash PROMs are available in a common footprint across multiple densities.
What is SPI?
The serial peripheral interface (SPI) interface is a four-wire synchronous
85. SB controller that allows you to program the FPGA and/or the on-board flash device through the USB port on a PC. In addition to the new Spartan-3E Evaluation Kit, Avnet offers a complete portfolio of other Spartan series design kits. These kits deliver a variety of low-cost, feature-rich platforms to develop and test designs targeted at Spartan-3 and Spartan-3L devices. The kits are available as either evaluation or development kits, depending on the resale price and mix of on-board peripherals. Please refer to the table for a complete list of part numbers and prices (all kits listed are available and shipping today). In addition to the Spartan series design kits described here, Avnet offers a comprehensive portfolio of other design kits that support additional Xilinx FPGA families such as Virtex-4 and Virtex-II Pro/Virtex-II devices. It also offers a variety of kits targeted at specific technology domains such as connectivity, embedded processing, and DSP.
[Table, as far as legible; columns are featured device, Avnet part number, price]
Spartan-3E design kit: XC3S100E, [part number illegible], $69
Spartan-3 design kits: XC3S400, [part number illegible], $399; XC3S1500, ADS-XLX-SP3-EVL1500, $499; XC3S1500, ADS-XLX-SP3-DEV1500, $749; XC3S2000, ADS-XLX-SP3-DEV2000, $849
Spartan-3L design kit: XC3S1000L, ADS-XLX-SP3-EVL1000L, $599
[Advertisement] TED Evaluation Board Using Spartan-3 FPGAs. Inrevium. Features:
• Xilinx XC3S400-4PQ208
• Xilinx XCFO2SVO0
86. SM Explorer. This is a timing-driven state machine engine that will increase run time but can often find a better FSM solution for your particular design.
• Turn on resource sharing. Resource sharing tells the tool to share logic, such as adders and multipliers, whenever possible. This may have an impact on timing, but usually results in fewer resources used.
Conclusion
Spartan-3 devices are increasingly being used in higher volume applications. But along with higher volumes comes the requirement that part cost is as low as possible. We hope these guidelines and hints will help you quickly achieve the performance required for your FPGAs while minimizing the logic resources, and therefore cost, of your next design project. For more information, please contact your local Synplicity sales office, which you can find on our website, www.synplicity.com.
The Changing Face of Automotive ECU Design
Why are automotive electronic control unit designers moving away from microcontrollers and microprocessors and looking at programmable logic devices (PLDs) instead?
by Karen Parnell, Automotive Product Marketing Manager, Xilinx, Inc., karen.parnell@xilinx.com
Microcontrollers and microprocessors have long been the natural choice to implement control functions in instrument clusters, door modules, and other control-based electronic control units (ECUs). Design
87. Spartan-3/3E Features to Area-Optimize Your Design
Spartan-3/3E devices include features that were typically the domain of only high-end FPGAs (distributed memory, embedded multipliers, and others) that can yield significant optimization.
by Suhel Dhanani, Senior Marketing Manager, Spartan Products, Xilinx, Inc., suhel.dhanani@xilinx.com; Ken Chapman, Staff Engineer, General Products Division, Xilinx, Inc., ken.chapman@xilinx.com
[The following passages were printed in parallel columns and are interleaved in the extraction.]
The Xilinx Spartan-3 series is the first low-cost FPGA that does not compromise on features; it offers fast embedded mul [...] mally configured to form shift registers, distributed memory, and large amounts of
[...] rolled out even more logic and I/O at a lower price. With the release of the Spartan-3 series, we did the same, but also added embedded features that let you integrate even more functionality in your favorite Spartan device. Table 1 shows the advantages of these features.
Features That Optimize Area
The Spartan-3/3E architecture is very memory intensive. Not only does this architecture have embedded 18 Kb block RAMs, but half the look-up tables (LUT) can be configured as memory (Figure 1).
[Table 1, as far as legible] Spartan-3/3E Features To Reduce Area Utilization
Embedded Multipliers: area-efficient implementation of various DSP functions
Up to Eight Digital Clock Managers (DCM): clock management functions such as clock multiplication, division, and phase alignment can be integrated within the FPGA
Dist
88. [Product selection table for Xilinx Spartan FPGAs with a list of reference links (devices, packaging, configuration and storage, software, development boards, IP, and services on xilinx.com), printed rotated on the page; the table body and footnotes did not survive extraction.]
89. a specification change or design error. Traditionally, designers have had to forgo this advantage as they move from FPGAs to an inflexible custom solution like standard cell or structured ASIC. However, recent developments by Xilinx in customer-specific FPGAs now preserve some of the flexibility of the FPGA while also maintaining a low-cost approach. Xilinx Spartan-3 and Virtex-4 EasyPath FPGAs enable you to prototype two different designs in a standard FPGA and then move to an EasyPath device that supports both those designs simultaneously in production. For example, you can use one bitstream to perform system diagnostics on the entire system and the other to load the second application-specific bitstream. Alternatively, you can choose to implement designs for two different products in a single device and take advantage of a common part number for inventory management purposes. In addition, with Spartan-3 and Virtex-4 EasyPath FPGAs, you can change LUTs and I/Os even after these devices have been deployed in the field (see Figure 3). As a result, you can now fix minor bugs or tweak certain parameters to further optimize your designs. For instance, a line card in a router might need to have the drive strength and slew rate adjusted a notch or two depending on what load it encounters. You can implement a range of drive strengths that are available ... Figure 3: EasyPath FPGAs allow ASSP customers t
90. accelerate processor-based architectures. With Triton Tuner's SystemC simulation environment, you can develop robust, efficient processor architectures. With Triton Builder, you can add sophisticated hardware acceleration to your processor-based systems. The Triton tool suite enables design architects to achieve higher system throughput, reduced power consumption and cost, as well as shorter design cycles. For more information, visit our website at www.poseidon-systems.com or contact the sales department at 925-292-1670. To be qualified for a free tool evaluation of the Poseidon Builder and Tuner and a free white paper, e-mail farzad.zarrinfar@poseidon-systems.com.
[Figure residue: Poseidon Triton Tool Suite diagram. Legible labels: ANSI C application code and architecture description; system simulation; performance analysis; accelerator RTL; modified C code; accelerator model; RTL test bench; embedded hardware/software design; instruction set simulator; BSP; compiler; debugger; memory controller; custom logic; Xilinx Platform Studio; co-verification; PowerPC; APU; auxiliary processor controller; bridge; UART; GPIO; FSL (optional); Triton Builder; HW/SW partitioning; acceleration; embedded processor subsystem design; XPS; SWIFT toolkit; design entry; simulator; on-chip debug; ISE implementation; ChipScope.]
91. and system interfaces. However, when it comes to signal processing, designers traditionally purchase a fixed-algorithm standard product for high-speed multiply and accumulate (MAC) functions. This requirement demands additional design resources, verification time, system components, and more board space. With the explosion of the wireless communication market and proliferation of low-cost 90 nm processes, Spartan-3 devices with plenty of logic, memory, and as many as 104 18 x 18 embedded hard multipliers are the ideal solution for many signal processing needs. The challenge for today's digital processing systems is their large memory requirements and the very fast MACs needed for rapid mathematical operations. Every multimedia system contains an external DSP processor and memory component that reduces system performance and increases component costs. Once you have uncovered a solution that integrates a parallel or semi-parallel system, you can then focus on improving the overall DSP design to eliminate performance bottlenecks. As a result of digital processing challenges, many companies have focused their efforts on developing the system-on-chip (SOC) concept by adding feature sets to bring additional functionality to a single piece of silicon. Customized ASICs have become very costly solutions in today's competitive landscape. Traditional DSP processors are capable of carryin
92. and thermal runaway. That's why we designed our Virtex-4 FPGAs with Triple-Oxide Technology, embedded IP, and power-saving configuration circuitry. Now you can meet your performance goals while staying within the power budget. Visit www.xilinx.com/virtex4/lowpower today, and get the right solution on board before your power issues start heating up. XILINX www.xilinx.com/virtex4/lowpower. View the [name illegible] seminar today.
[Advertisement footnotes, partly illegible: a design example with a 200 MHz target frequency, worst-case process, 20K flip-flops, on-chip RAM, 64 DSP blocks, and 128 2.5V I/Os.]
Spartan-3E Evaluation Kit from Avnet
AVNET electronics marketing
Features:
• Xilinx XC3S100E-TQ144 FPGA
• Cypress CY7C68013 USB 2.0 controller
• ST 4 MB SPI serial flash
• TI TPS75003 triple voltage regulator
• Windows USB download utility
Now shipping: Avnet design kits for Spartan FPGAs get you started fast. Avnet is now shipping the newest member of its Spartan series design kits, the Spartan-3E Evaluation Kit, featuring the XC3S100E FPGA. This design kit, offered at $69, is an affordable and robust tool for becoming familiar with the new features of the Spartan-3E family. It features a Cypress U
93. ar focus on interfacing to a Micron MT46V32M16TG-6T DDR SDRAM. This and other application notes illustrate the theory of operations, key challenges, and implementations of a Spartan-3 FPGA-based memory controller. DDR memories use non-free-running strobes and edge-aligned read data (Figure 1). For 333 Mbps data speeds, the memory strobe must be used for higher margins. Using local clocking resources, a delayed strobe can be centered in the data window for data capture. To maximize resources within the FPGA, you can explore design techniques such as using the LUTs as RAMs for data capture, while at the same time minimizing the use of global clock buffers (BUFGs) and digital clock managers (DCMs), as explained in the Xilinx application notes. Results are given with respect to the maximum data width per FPGA side for either right and left or top and bottom implementations. Implementation challenges such as these are mitigated with the new Memory Interface Generator. Xilinx created the Memory Interface Generator (MIG 007) to take the guesswork out of designing your own controller. To create the interface, the [a screenshot of the tool is interleaved here and is illegible]
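The phrase "centered in the data window" in excerpt 93 comes down to simple arithmetic. The sketch below is an idealized calculation, not taken from the Xilinx application notes: it ignores skew, jitter, and setup/hold margins, and assumes the data rate is quoted per pin with two bits transferred per clock cycle. The function name and dictionary keys are hypothetical.

```python
def ddr_strobe_timing(data_rate_mbps):
    """Ideal timing figures for an edge-aligned DDR read interface.

    Read data changes on both strobe edges, so each bit is valid for
    half a clock period. Delaying the strobe by half a bit time (a
    quarter of the clock period) places its edges mid-window.
    """
    bit_time_ns = 1000.0 / data_rate_mbps       # width of one data bit
    clock_mhz = data_rate_mbps / 2.0            # two bits per clock cycle
    clock_period_ns = 1000.0 / clock_mhz
    strobe_delay_ns = bit_time_ns / 2.0         # center of the data window
    return {
        "clock_mhz": clock_mhz,
        "clock_period_ns": clock_period_ns,
        "bit_time_ns": bit_time_ns,
        "strobe_delay_ns": strobe_delay_ns,
    }


timing = ddr_strobe_timing(333.0)
for name, value in timing.items():
    print(f"{name}: {value:.3f}")
```

At 333 Mbps the ideal window is about 3 ns wide, so the strobe delay is roughly 1.5 ns before any real-world margins are subtracted, which is why the capture clocking needs careful treatment at this speed.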
94. ardware configuration files. The Simulink environment allows you to verify the functionality of each block or subsystem created, with scopes and graphs to view images or observe data. Digital filters are among the most significant components in digital signal processing applications. The function of a filter is to eliminate undesirable parts of the signal (random noise) or to extract signals in a particular frequency range. Basic FIR filters are used extensively in video broadcasting and wireless communications. A mathematical expression of a basic FIR filter is
$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$
It consists of an input sample, output sample, and coefficients. Imagine x is a continuous stream of input signal and y is the resulting filtered stream of output signal. The n and k in the equation correspond to a particular instant in time, so to compute y(n) at time n, a group of input samples at N different points in time is required, or numerically: x(n), x(n-1), x(n-2), ..., x(n-(N-1)). The group of N input samples is multiplied by N different coefficients h(k) and summed together to form the result y(n). This design example implements a 43-tap FIR filter with a MAC engine and a dual-port RAM used for data and coefficient storage. The filter is a low-pass filter with a cut-off frequency of 6 kHz. The sampling frequency is 44.1 kHz. Figure 1 represents the
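The multiply-accumulate structure described in excerpt 94 can be sketched as a short software model. This is not the article's hardware design and the article's coefficients are not given: the sketch assumes a Hamming-windowed sinc design for the 43-tap, 6 kHz low-pass filter at 44.1 kHz, and the function names are hypothetical. The inner loop plays the role of the MAC engine, one multiply and one add per tap.

```python
import math


def lowpass_taps(num_taps=43, cutoff_hz=6000.0, fs_hz=44100.0):
    """Hamming-windowed sinc low-pass coefficients (assumed design)."""
    fc = cutoff_hz / fs_hz                # cutoff as a fraction of fs
    mid = (num_taps - 1) / 2.0
    taps = []
    for k in range(num_taps):
        t = k - mid
        if t == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]      # normalize to unity gain at DC


def fir_mac(samples, taps):
    """y(n) = sum over k of h(k) * x(n - k), with x(n) = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0                          # the accumulator of the MAC
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


taps = lowpass_taps()
steady = fir_mac([1.0] * 100, taps)[-1]
print(f"response to a constant input: {steady:.6f}")
```

A single MAC engine needs 43 cycles per output sample, which is easily affordable at a 44.1 kHz sample rate; that trade of time for area is what makes the sequential structure attractive in a small FPGA.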
95. Board-Level Functions
As designs become more complex, more peripheral functions are controlled by 8-bit microcontrollers, but they may not have sufficient I/O. There are a number of solutions to this issue, including stepping up to a more costly 16-bit microcontroller. Another is to partition the design over a low-cost 8-bit microcontroller and low-cost CPLD. Here's a good example: a system has multiple ADC inputs used as data inputs to help control and position multiple stepper motors, such as in an instrument cluster, plus other I/O for LED control and switch inputs. The low-cost microcontroller is ideal for the sequential signal processing but has limited I/O to interface to off-chip devices. The CPLD can interface to the microcontroller through a simple serial bus and provide low-cost inputs and outputs to the devices that need to be controlled. Although this ap
[From the adjacent column:] design with but offer extremely low power consumption, thus eliminating the need to compromise power or performance. The low-cost XC9500XL devices are ideal if you require 5V-compatible I/O. Figure 1 shows a typical example of where a CPLD can be used alongside a microcontroller, providing extra I/O to connect to multiple indicator LEDs and also provide I/O to control multiple motors. The CPLD in this example also provides simple PWM functions and ADC
[Figure 1: I/O expansion example. Legible labels: microprocessor, serial bus, TTL to LEDs.]
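The I/O expansion idea in excerpt 95 can be modeled in software to show how few microcontroller pins it needs. The article does not specify the serial protocol, so the sketch below assumes a three-wire, shift-register style bus (data, clock, latch), which is one common way to build such an expander in a CPLD. The class and function names are hypothetical.

```python
class SerialIoExpander:
    """Behavioral model of a serial-in, parallel-out expander in a CPLD."""

    def __init__(self, width=8):
        self.width = width
        self.mask = (1 << width) - 1
        self.shift_reg = 0   # fills one bit per serial clock
        self.outputs = 0     # pins driving LEDs or motor drivers

    def clock_in(self, bit):
        """Model one rising edge of the serial clock."""
        self.shift_reg = ((self.shift_reg << 1) | (bit & 1)) & self.mask

    def latch(self):
        """Model the latch strobe: copy the shift register to the pins."""
        self.outputs = self.shift_reg


def write_outputs(expander, value):
    """Microcontroller side: send `value` MSB first, then latch it."""
    for i in reversed(range(expander.width)):
        expander.clock_in((value >> i) & 1)
    expander.latch()


expander = SerialIoExpander(width=8)
write_outputs(expander, 0xA5)
print(f"output pins: {expander.outputs:08b}")
```

Because the outputs change only on the latch strobe, the LEDs or motor drivers never see the intermediate values while bits are still shifting in, and the expander can be widened without using more microcontroller pins.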
96. arting a new project. Design Summary shows you the key core information about your project, such as project location and target device. As you proceed through the design flow, Design Summary adds and updates more information, like logic utilization, performance summary, and design constraint results (Figure 2). Design Summary also contains a list of links that take you to more detailed implementation reports so that you can easily jump to more information. And Design Summary is right up front in ISE software, easily accessible as a selectable tab in the Project Navigator display window. Message Filtering is a new ISE option that works with design reporting tools to capture the information that is exactly right for your design project. Message Filtering lets you select the report information that you deem non-critical to your design and suppress it from future reports. Message Filtering lets you get more streamlined and pertinent reports and quickly see the data necessary to your project, making debug and verification quicker and easier. Lower Project Costs. ISE 7.1i software also offers new cost-savings options, including new Spartan-3E device support, Virtex-4 device support in ISE WebPACK software and ISE BaseX, expanded Linux operating system support, and new electronic software delivery. Expanded Device Support. Spartan-3E devices are the newest low-cost FPGA offerings from Xilinx, further 66 Xce
97. at need to change, thus delivering faster reimplementation compile times. PlanAhead wraps area groups, incremental design, and modular design into a single ASIC-strength floorplanner. See the article "Control Your Designs with the PlanAhead Hierarchical and Design Analysis Tool" in this issue of Xcell for more detailed information. Ease of Use Slashes Design Time. ISE 7.1i software includes a host of new tools and capabilities that you can't find anywhere else, including Technology Viewer, Message Filtering, Design Summary, ISE Simulator, and ModelSim Xilinx Edition III. In addition to the advanced spectrum of ISE implementation tools, this collection of new technology slashes design times, getting you through even advanced high-density design flows faster and cutting project time and costs. New, More Powerful HDL Simulation Choices. Two new advanced HDL simulation offerings are now available with ISE 7.1i software: ISE Simulator and ModelSim Xilinx Edition III. These state-of-the-art HDL simulation tools give you more performance and design capacity than previous versions of ISE software and include new integration into the design flow. ISE Simulator is a new full-featured HDL simulator integrated within ISE software. It offers you the ability to simulate directly from the ISE Project Navigator process window; test benches and stimulus can be generated automatically, and graphing is
98. ateau that persists until a reflection is received back from the end of the line. Each branch must also be the same impedance, or close, or it will be impossible to choose an effective single resistor value. If branches are longer than three-quarters of an inch, it makes sense to make their parallel impedances equal to the inbound line impedance from the driver. You can also terminate at the junction itself by changing the trace impedance or by using parallel DC termination, both damping reflections quickly and attenuating the signal. The most appropriate choice depends on network topology and signal direction. For nets that have complex routing patterns, it may be difficult to find a termination scheme that works, even in theory. This is where a "what if" simulation tool like HyperLynx can be an indispensable ally in comparing alternatives. LVDS and RSDS are key enablers for hardware engineers seeking high-speed technology at a reasonable price. But the increased capabilities of modern devices force today's engineers to shoulder the burden of proactively resolving signal integrity, timing, and EMC problems. With the money you'll save by manufacturing with Spartan-3 FPGAs, consider adding signal integrity analysis software to your toolbox. Several features are important in good analysis software, including the ability to recommend termination strategies and run "what if" simulations early in th
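The branch-impedance rule in the excerpt above (make the parallel impedance of the branches equal to the inbound line impedance) reduces to simple arithmetic. The sketch below is only an illustration under idealized assumptions: lossless traces, identical branches, and hypothetical function names and values (a 50-ohm inbound line is not taken from the article).

```python
def parallel_impedance(branch_impedances):
    """Combined impedance seen at a junction feeding several branches."""
    return 1.0 / sum(1.0 / z for z in branch_impedances)


def matched_branch_impedance(line_impedance, branch_count):
    """Impedance each of `branch_count` identical branches needs so that
    their parallel combination equals the inbound line impedance."""
    return line_impedance * branch_count


# Hypothetical example: a 50-ohm inbound line splitting into two branches.
branch_z = matched_branch_impedance(50.0, 2)
junction_z = parallel_impedance([branch_z, branch_z])
print(branch_z, junction_z)
```

In this idealized model, two 100-ohm branches present the same impedance at the junction as the 50-ohm inbound line, so the junction itself does not cause a reflection. A real board would still need simulation to confirm the choice.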
99. atform is an FPGA design based on hardware modules in the form of IP cores. You can add additional features using third-party IP cores or by designing a customer-specific circuit. The main components of the Multimedia FPGA Platform are: logiCVC, a compact video controller for display driving; logiBITBLK for 2D graphics acceleration; logiCAN and logiUART for communication; UltiWIN for frame grabbing (video input); and UltiMEM, a multi-port SDRAM/DDRAM memory controller. The Spartan-3 hardware resources (dedicated multipliers, digital clock managers (DCMs), and block RAMs) ensure that IP cores operate at higher speeds and consume less area of the FPGA. The video input scaling circuits take advantage of the dedicated 18 x 18 multipliers, while the high-capacity block RAMs used in all Xylon IP cores contribute to more efficient sharing of memory bandwidth and better overall system performance. In addition, the DCMs provide finer clock generation and eliminate the need for costly clock generation ASSPs, while digital termination and DDR support enable better and lower cost support for SDRAM or DDRAM memories. Figure 1 shows the Multimedia FPGA Platform block schematic configured for an automotive backseat entertainment application. This example is significant because it has the capability to show a live video stream on any display screen. We accomplished this by instantiating more than on
100. avor of discrete logic was that it was cheap and easily available. Yet discrete logic was not the optimal choice from a board area, power consumption, or programmability standpoint. The introduction of programmable logic devices provided programmability, flexibility, and higher levels of integration. 34 Xcell Journal. However, it also came at a cost premium over discrete devices. Improvements in manufacturing processes and packaging technologies in the last 10 years have significantly lowered the cost of programmable logic devices. Today, CPLD unit costs have been driven down to a level such that they can replace discrete logic devices yet maintain advantages such as higher performance, design flexibility, re-programmability, and reliability. [Figure 1: Logic Consolidator] To help customers quickly and accurately estimate their costs and area savings by using a single CPLD instead of multiple discrete devices, Xilinx offers a tool called Logic Consolidator. This is a Microsoft Excel-based tool that estimates the number of discrete logic devices that can be integrated into a single CPLD. It compares Xilinx XC9500XL and CoolRunner-II 10,000-unit pricing to the 10,000-unit average resale price of 7400 discrete logic products in the lowest cost plastic package. Also, based on particular quotes, you can specify your own unit pricing for accurate comparison. Logic Consolidator
101. bandwidth of the driver and receiver. A classic test of dispersion appears in Figure 3. This particular waveform, adjusted so that the long flat portions of the test signal represent the worst-case longest runs of ones or zeros available in your data code, displays the runt pulse amplitude. In the absence of reflections, crosstalk, or other noise, this single waveform as measured at the receiver represents a worst-case test of channel dispersion. Longer traces introduce progressively more dispersion, eventually causing receiver failure at (in this example) a length of 1.5 meters. One measure of signal quality at the receiver is voltage margin. This number equals the minimum distance, in volts, between the signal amplitude and the receiver threshold at the instant sampling occurs. In a system with zero reflections, crosstalk, or other noise, you could theoretically operate with a very small voltage margin and still expect the system to operate perfectly. In a practical system, however, you must maintain a healthy noise margin, sufficient to soak up the maximum amplitude of all reflections, crosstalk, and other noise in the system while still keeping the received signal sufficiently above the threshold to account for the limited bandwidth and noise inherent to the receiver. Following the example in Figure 4, a runt pulse amplitude equal to 85% of the amplitude exceeds the receiver threshold nominal low frequency
102. bination of all possible pieces. For example, if a rook-move signal hits a rook of our own, we will find an aggressor (a from-square of a legal move). Thus many legal moves are generated in parallel. These moves must be sorted, and we have to mask those already examined. The moves are sorted with the help of another comparator tree, and the winner is determined within six levels of the tree. Sorting criteria are based on the value of attacked pieces and whether or not a move is a killer move. Figure 4 shows the finite state machine for the recursive Alphabeta algorithm. On the left side of the figure you can see the states of the normal search, including the nullmove heuristic. The right side shows the states of the quiescence search. The main difficulty is controlling the timing, as some of the sub-logic (the evaluation and the move generator) may need two or three cycles instead of one. We enter the search at FS_INIT. If there is anything to do, and if nullmove is not applicable, we come to the start of the full search. After possibly increasing the search depth (not shown in Figure 4), we enter the states FS_VICTIM and thereafter FS_AGGR, which give us the next legal move as described previously. Reaching the state FS_DOWN corresponds to a recursive call of the Alphabeta algorithm with a 1-point window (α, α+1). If the remaining search depth is greater than zero, we look for a move in the state
103. board and a second containing just a sensor with minimal related components. On the main board, the FPGA I/O pins go directly to the inter-board connector, so it is possible to change the pin functions, including polarity, to match the particular sensor boards. A similar solution allowed the earlier Model 313 camera to support different types of sensors; most became available after the board design. It even works in our 11-megapixel Model 323 cameras without any PCB modifications. Selecting the Video Encoding Technique. After the prototype camera was ready, it took just a couple of weeks to modify the code developed for the Spartan-IIE-based ... connected directly to the processor I/O pins, so I could not use the software that comes with Xilinx configuration hardware. The JTAG instruction register is six bits wide, not five as it was in the Spartan-IIE devices with which I was familiar. After some trial and error, I figured that out and found that the same code could run at 125 MHz instead of 90 MHz in the previous model and used just 36% (not 98% as before) of available slices: plenty of room for more challenging tasks. Of course, I had some challenging tasks in mind, as motion JPEG is not a really good option for high-resolution, high-frame-rate cameras because the amount of data to be transferred or stored is huge. It is a waste of network bandwidth or hard disk space when recording such video st
104. capture phase using CORE Generator software or insert cores directly into the design netlist using Core Inserter. These cores are then synthesized and instantiated into your FPGA design, allowing you to view any internal signal within the FPGA. For engineers designing embedded processor systems using the Virtex-II Pro or Virtex-4 FX FPGA families, the ChipScope Pro system enables debug of embedded processor buses, including the IBM CoreConnect processor local bus or on-chip peripheral bus supporting the IBM PowerPC 405 processor. You can also view and debug embedded processor signals for the MicroBlaze soft processor core, supporting all leading Xilinx FPGA device families. Signals are captured at or near operating system speed and then brought out through the programming interface, freeing up pins for your design, not gobbling them up for debug. The ChipScope Pro real-time debug tool is one of the only tools that allow you to change probe points without having to re-synthesize and re-route your design. Using the ISE FPGA Editor, you can change signal probe points and then quickly reprogram your FPGA and debug a whole new set of signals in a matter of minutes. Integrated Logic Analysis. You can analyze captured signals through the ChipScope Pro software logic analyzer (Figure 1). The ChipScope Pro logic analyzer is an advanced display and debug tool that makes logic and bus analysis easy. The Ch
105. ccelerator through the inserted driver, which then performs the accelerated function. The accelerator runs independently from the processor, freeing it to perform other tasks. When the accelerated task is completed, the results are passed back to the application program. You can implement multiple independent accelerators within the same system. To create a complete accelerator peripheral, you need more than just a C synthesis tool. Poseidon Builder includes not just a C-to-RTL synthesizer that creates the compute core, but it also generates the communication and control hardware. The Poseidon tool creates a complete accelerator peripheral, the block diagram of which is shown in Figure 3. In addition to the compute core, the accelerator includes a multi-channel DMA controller, bus interface, local memories, and other logic to create a plug-and-play peripheral. Interfacing to Xilinx Tools. Triton Tools are extremely flexible, allowing you to use either Triton Tuner or Triton Builder independently or together as an integrated suite. These tools were developed to enhance productivity in developing processor-based designs and vastly increase their capability. The Triton tools link seamlessly to EDK tools. A typical system design flow is usually an iterative process where you would analyze the system performance, determine inefficiencies, modi
106. celChip then generates a cycle-accurate System Generator block that supports both simulation and RTL code generation. At this point you transition the design flow to System Generator, where a new block for the Kalman filter is available in the Simulink library browser. You need only select the Kalman filter block and drag it into the destination model to incorporate the AccelChip-generated Kalman filter into a System Generator design. Figure 4 shows the resulting diagram for the Kalman filter. In the center of the System Generator diagram is the Kalman filter, and around it are the gateway blocks needed for a System Generator design. Once the AccelChip-generated block is included in the System Generator design, you can perform a complete system-level simulation of cycle-accurate, bit-true models to verify that the system meets specifications. You can use AccelChip-generated blocks for System Generator in conjunction with the Xilinx block set. Once this system-level verification step is completed, the next step in the System Generator flow is to move on to design implementation. The Generate step in System Generator compiles the design into hardware. Verification Options. All design files generated by AccelChip, including exported System Generator files, are verified back to the original golden source MATLAB M-file. AccelChip's verification approach is based on the generation of a
107. ckages have external pins that can be probed just as easily as a thin quad flat pack package. These features make it a great addition to the CoolRunner-II family of CPLDs for consumer products. These MLF packages also offer better ... sons. You get additional logic capability, more package choices, lower power, free JTAG testability, I/O options, advanced security, and cost savings. Current users will benefit from the same pin-out as well as the new I/O banking feature. Novice users can find out how to use these new CPLDs at www.xilinx.com/cpld/. Don't forget that the design software is always free and easy to use. Download your copy of ISE WebPACK software at www.xilinx.com/products/design_resources/design_tool/index.htm to start your design today. With these design tools and world-class support, Xilinx CPLDs make a great choice for logic solutions and replacement of discrete logic devices. The Compact DSP & FPGA Solution. The micro-line C6713Compact is a high-performance single-board DSP/FPGA solution offering exceptional capabilities and flexibility. Measuring only 67 x 120 mm, it combines a Xilinx Virtex-II FPGA with Texas ... micro-line C6713Compact embedded DSP/FPGA board: Virtex-II FPGA (250k, 500k, or 1M gates); TMS320C6713 DSP, up to 1800 MIPS / 1350 MFLOPS; IEEE 1394 FireWire, 400 Mbit/s (IIDC/DCAM, SBP2); optional Ethernet 10/100BaseT (TCP/IP, UDP, ICMP, IGMP, Telnet, HTTP, SMTP, POP3, FTP); Ins
108. come viable high-volume ASSP alternatives. The use of triple-oxide technology, for example, has allowed Xilinx to selectively vary gate oxide thicknesses to optimize specific regions for low power or high performance. As a result, and contrary to industry trends, static power consumption in the Xilinx 90 nm Virtex-4 family is about 50% lower than in 0.13 µm FPGA devices. On the performance front, the use of specialized embedded hard macros for processor cores and DSP slices allows Xilinx to provide operating frequencies on par with ASICs. Even on the cost front, the move to smaller process nodes and the development of clever techniques to improve yields have allowed FPGA vendors to shift the crossover point between FPGAs and ASICs to higher and higher volumes. ASSP vendors are being squeezed from both ends, by market uncertainties and risk on one side and rising product development costs on the other. EasyPath FPGAs offer a way out. They eliminate the risk of a new product introduction by significantly reducing the total cost of development and the time to production. ASSP vendors can now address many more markets and segments that might otherwise have been uneconomical.
109. covered. All ISE configurations, ChipScope Pro analyzer, PlanAhead software, MXE III, and ISE Simulator are available for purchase from the Xilinx online store, Xilinx distributors, or by calling 1-800-888-FPGA (3742). Debugging with Combined Oscilloscope and Logic Analyzer Measurements. With an automated logic analyzer to scope cross-trigger, de-skew, and scope waveform input, one plus one can be gen ... by Brad Frieden, Logic Applications Specialist, Agilent Technologies, brad frieden agilent com. With higher speeds in today's digital designs, it's not uncommon to encounter challenges during board turn-on that relate to timing or signal integrity. Most of the work is distinguishing whether it's a logic problem or a signal integrity problem. Spending too much time sorting out such things can lengthen time to market and create even more time pressures, when what you'd really like to do is design. Many designers find a real-time oscilloscope and logic analyzer invaluable when it comes time to turn on, validate, and debug the hardware in their system. But they are often used independently: the oscilloscope for signal integrity or timing issues and the logic analyzer for problems with the logic. But it turns out that these tools are most powerful when used together. You can significantly reduce debug challenges by taking advantage of tools that auto
110. critical for FPGA systems. An evaluation board is available. LM2734/36 1A SOT-23 Buck Regulators. [LM2734 typical application diagram, 3V to 18V] Features: complete, easy-to-use switcher solution has the smallest footprint and highest power density in the industry; choice of switching frequencies allows designers to trade off efficiency against solution size and EMI; current-mode control improves phase margin, line regulation, and rejection of transients; internal soft-start circuitry, cycle-by-cycle thermal shutdown, and over-voltage protection. The LM2734 and LM2736 are monolithic, high-frequency (550 kHz and 1.6 MHz) PWM step-down DC-DC converters in tiny six-pin thin SOT-23 packaging. They provide local DC-DC conversion for Xilinx FPGAs with currents up to 1A. Both regulators need no external compensation and are supported by WEBENCH, National's online design tool. The ability to drive up to 1A loads with an internal 300 mΩ NMOS switch using state-of-the-art 0.5 µm BiCMOS technology results in the best power density available. The world-class control circuitry supports exceptionally high-frequency conversion over the entire 3V to 20V input operating range, down to the minimum output voltage of 0.8V. Even though the operating frequencies are very high, efficiencies as high as 90% are easy to achieve. Additionally, a current-mode control loop provides fast transient response and accurate regulation in the smallest possible
111. ctions. Hence we can classify the structure around each pixel by observing the eigenvalues α and β of H. No structure: α ≈ β ≈ 0. Edge: α ≈ 0, β >> 0. Corner: α >> 0, β >> 0. Figure 1: Feature points extracted from an image captured by a camera. Using the Benedetti and Perona approximation, we can choose the corners without computing the eigenvalues. We have realized an algorithm that, compared to the original method, doesn't require any floating-point operations. Although this algorithm can be implemented either in hardware or software, by implementing it in FPGA technology we can achieve real-time performance. Input: an 8-bit gray-level image of known size (up to 512 x 512 pixels); the expected number of feature points, wf. Output: a list of selected features, FL. The type of the output is a 3 x N matrix whose first row contains the degrees of confidence for each feature in the list, second row contains the x coordinates of the feature points, and third row contains the y coordinates of the feature points. Semantics of the Algorithm. In order to determine if a pixel (i, j) is a feature point (corner), we followed Tomasi and Kanade's method. First we calculate the gradient of the image. Hence the 2 x 2 symmetric matrix G = [a b; b c] is computed, whose entries derive from the gradient values in a patch around the pixel (i, j). If the minimum eigenvalue of G is greater than a
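The minimum-eigenvalue test described in the excerpt above can be sketched as follows. This is the floating-point reference form of the Tomasi-Kanade criterion, using the closed-form eigenvalues of a 2 x 2 symmetric matrix; it is not the integer-only Benedetti-Perona approximation the article implements in the FPGA. The function names, sample matrix entries, and threshold are hypothetical.

```python
import math


def min_eigenvalue(a, b, c):
    """Smaller eigenvalue of the symmetric matrix G = [[a, b], [b, c]]."""
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace - radius


def is_corner(a, b, c, threshold):
    """Tomasi-Kanade style test: both eigenvalues must be large,
    which holds exactly when the smaller one exceeds the threshold."""
    return min_eigenvalue(a, b, c) > threshold


# Hypothetical patches: strong gradients in both directions (corner)
# versus a strong gradient in one direction only (edge).
corner_like = is_corner(2.0, 0.0, 2.0, threshold=1.0)
edge_like = is_corner(4.0, 0.0, 0.0, threshold=1.0)
print(corner_like, edge_like)
```

The square root is the costly step here, which is presumably why an approximation that avoids computing eigenvalues is attractive for hardware.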
112. ctors.philips.com; Texas Instruments, www.ti.com/pciexpress. Xcell Journal 61. STOP OVER-THE-WALL ENGINEERING. FINALLY, ALGORITHM DEVELOPERS, DSP SYSTEM ARCHITECTS, AND HARDWARE DESIGNERS ARE ALL ON THE SAME PAGE. Export to Xilinx System Generator for AccelChip DSP Synthesis. Complete system verification using hardware in the loop. The AccelChip DSP Synthesis Export to System Generator option accelerates time to first silicon. www.accelchip.com. Company partnerships don't usually generate a lot of interest, but a partnership that yields team-building technology does. For example, the industry's first DSP design flow from a mixed MATLAB/Simulink environment to a verified FPGA. It's based on a tight integration between AccelChip DSP Synthesis and Xilinx System Generator for DSP that accelerates development of signal processing systems. Work more collaboratively and get designs to market faster by eliminating the need to recapture in HDL. Readily incorporate AccelWare and Xilinx IP cores and verify the entire system using hardware-in-the-loop simulation, further speeding development. How's that for teamwork? Stop your over-the-wall engineering. Download free whitepapers at accelchip.com/papers or call 408-943-0700. Design Leadership from Xilinx. The s
113. customer needs. Continued improvement is an ongoing goal. Toward this end, Xilinx has extended the features of its 32- and 64-macrocell CPLD devices by adding a feature currently only available on higher density parts that will help ease design difficulties and reduce total system costs. In conjunction with added features, Xilinx is also offering new packages to decrease the cost per I/O for small-footprint packages. These new packages will help Xilinx continue to penetrate low-cost battery-operated devices. Xilinx is achieving CPLD market share growth through both traditional applications, such as computing, data processing, networking, and telecommunications, and non-traditional applications, such as consumer products (set-top boxes and large-screen TVs, both plasma and LCD) and key handheld markets (PDAs, handsets, and other battery-operated products). With new market success and continued traditional market use, CPLDs will continue to show growth in system logic solutions. New Banking for CoolRunner-II CPLDs. Xilinx has re-introduced its CoolRunner-II 32- and 64-macrocell devices with an added feature known as I/O banking. I/O ba
114. d HW S3 SL361. Conclusion. With the popularity of DDR memory increasing in system designs, it is only natural that designers use Spartan-3 FPGAs as memory controllers. Implementing the controller need not be difficult. For more information about the application notes, GUI, and development board, please visit www.xilinx.com/products/design_resources/mem_corner/index.htm. Signal Processing Capability with the Nu Horizons Spartan Development Board. Spartan-3 platform provides a low-cost FPGA-based solution to perform digital signal processing requirements. by Zulfigar Ali Zamindar, Field Application Engineer, Nu Horizons Electronics Corp., azamindar@nuhorizons.com. To meet their system design goals, designers today must prototype a new idea and integrate its product features into the lowest cost silicon, complete with versatile functionality. Specifically, two of the biggest challenges are to get the design correct in the first place and to fix a problem rapidly. Immediate design solutions are essential for meeting time-to-market pressures, keeping up with changing industry standards, and prototyping quickly. Even after the design is completed, a need may exist for field upgradeability if a bug is found or a new functionality is available. The Xilinx Spartan FPGA has been a great low-cost programmable platform for low-density control logic
115. d 64-bit Spartan series FPGAs. The most logical successor is a PCI Express solution using an external PHY chip paired with a Spartan-3 or Spartan-3E device. The PCI Express specification defines an interface to hook a PHY chip up to a separate device that houses the logical and transport layers, called a PIPE interface; a white paper about this is available from Intel. In the two-chip solution, the physical layer resides in a dedicated PHY chip and the logical and transport layers reside in a Spartan FPGA. A broad range of PHY devices are available from manufacturers such as Genesys Logic, Philips Semiconductor, and Texas Instruments. PHY pricing will be less than $10 for high volumes (250,000 units per year). See the sidebar "PHY Vendors" for contact information. Xilinx has collaborated with Philips Semiconductor and delivered this solution to our customers. To implement the interface, Xilinx and several of our IP partners, including Eureka, GDA, and Northwest Logic, provide PIPE IP cores for Spartan-3 and Spartan-3E devices. A single-lane PCI Express controller requires approximately 500,000 gates (50% of a Spartan XC3S1000) for the logical and transport layer core, leaving the rest of the FPGA available for the user application
116. d with Theora to deliver the video content. The bitstream format is stable enough and supported by multiple players running on different operating systems. Like JPEG and MPEG, it uses a two-dimensional 8 x 8 DCT. FPGA Implementation. The code for the Elphel Model 333 camera FPGA is written in Verilog HDL (Figure 3). It is designed around the 8-channel SDRAM controller that uses the Spartan-3 DDR capabilities. The structure of the memory accesses and specially organized data mapping both serve the same goal: optimizing memory bandwidth that otherwise would be a system bottleneck. [Figure 3 block labels: Sensor I/O, Bypass, Buffer, Bayer to YCbCr 4:2:0 Converter, DCT, Quantizer, Dequantizer, Synchronization] The rest of the code, which currently uses two-thirds of the general FPGA resources (slices) and 20 of 24 block RAM modules, includes video compression modules and sensor and system interfaces. A detailed description of the camera code is available together with the source code at SourceForge (sourceforge.net/projects/elphel). Conclusion. High-performance reconfigurable FPGAs made it possible to build a fast, high-resolution, low-bit-rate network camera capable of running 30 fps at a resolution of 1280 x 1024 pixels (12 fps at a resolution of 2048 x 1536). Many of the new features of the Spartan-3 devices proved to be very useful in this design
117. [Figure 2 block labels: INTERRUPT, INTERRUPT_ACK, Interrupt Control, Constants, INSTRUCTION[17:0], Operational Control & Instruction Decoding, Program ROM/RAM (1024 Words), ADDRESS[9:0], Program Flow Control, Program Counter Stack (32 x 10 Bits), Implemented Using Distributed Memory, Implemented Using 18 KB Block RAM] Figure 2: The 8-bit PicoBlaze microcontroller takes only 192 logic cells by making effective use of Spartan-3/3E memory resources. The FIFO is a common component of many system designs. The normal way to construct a FIFO is by using a memory (Figure 5). Because of the need to write at the same time as reading, dual-port memory is required. Two address counters are also needed to point at the write address and the read address. Additional comparator logic is then required to determine the state of the FIFO: full, empty, or half full. In Figure 6, a FIFO is implemented using one SRL16E for each bit of the data path width. The SRL16E gives a dual-port operation but in half the space of the real dual-port memory. We also only need one up/down counter instead of two up counters, which again is half the space. As a final bonus, the counter status tells you precisely how many words are stored in the FIFO, with the most significant bit naturally providing a most useful half-full flag. The embedded multipli
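The shift-register FIFO described in the excerpt above can be modeled behaviorally in Python. This is a rough software sketch, not the article's hardware design: the class name `SrlFifo`, the 16-word depth (one SRL16E stage per data bit), and the half-full threshold of 8 are assumptions for illustration, and simultaneous read and write in one clock cycle is not modeled.

```python
class SrlFifo:
    """Behavioral model of a FIFO built from addressable shift registers."""

    DEPTH = 16  # assumed: an SRL16E holds 16 bits, so the FIFO is 16 words deep

    def __init__(self):
        self.shift_register = [0] * self.DEPTH  # index 0 holds the newest word
        self.count = 0                          # the single up/down counter

    def write(self, word):
        if self.count == self.DEPTH:
            raise OverflowError("FIFO full")
        # Every write shifts all stored words one position deeper.
        self.shift_register = [word] + self.shift_register[:-1]
        self.count += 1  # counter counts up on write

    def read(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        # The counter doubles as the read address: the oldest word
        # sits at position count - 1.
        word = self.shift_register[self.count - 1]
        self.count -= 1  # counter counts down on read
        return word

    @property
    def half_full(self):
        return self.count >= self.DEPTH // 2


fifo = SrlFifo()
for value in (10, 20, 30):
    fifo.write(value)
first, second = fifo.read(), fifo.read()
print(first, second, fifo.count, fifo.half_full)
```

The point the model illustrates is that one counter serves as both the occupancy count and the read address, so no separate write pointer or comparator logic is needed.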
118. design with new technology because the time used for training is time not used for designing. The cost of using our flexible, scalable, powerful processing solution is time. And while you're learning the features of our platform FPGA, your competitors are coming up to speed as well. If time to market is critical for you, the risk of your competitors successfully completing their projects before yours can be catastrophic. The difference between a good design and a great design is time. The difference between a novice design team and a productive, proficient designing machine is simply time. Time yields experience and familiarity. If your team has experience and is comfortable designing with a new technology, they will finish faster. When they encounter obstacles, they will know what steps are needed to proceed. Embedded Processing QuickStart Features. With Embedded Processing QuickStart, Xilinx provides one week of on-site assistance by a trained Xilinx applications engineer. Regardless of your team's competence or skill level, the engineer will set up and customize EDK software for you. Included during that week of on-site support is a two-day training course in embedded system development. [Figure labels: Basic Understanding of C Programming Required; Fundamentals of FPGA Design; Systems Development; Advanced Features and Technologies of Embedded Systems Development] The course covers the bene
119. direct in the ISE Project Navigator display window, as shown in Figure 1. Figure 1: ISE Simulator graphing window. ISE Simulator supports VHDL and Verilog, and functional and timing simulation. ISE Simulator is also licensed through the existing Xilinx software registration ID process. Two different versions of ISE Simulator are available. ISE Simulator Lite comes included with ISE BaseX and ISE Foundation software at no charge. The full ISE Simulator is a design option that you can purchase as an upgrade to your existing ISE Foundation seat or as part of a new ISE Foundation seat. Xilinx and Mentor Graphics recently announced an extended OEM agreement that has led to the release of the new ModelSim Xilinx Edition III HDL simulator. The free MXE III starter now offers designers a 50% faster HDL simulation environment and 20 times more design capacity than the previous version. For larger designs, you can order the MXE III full version, which provides five times the design capacity and 30% faster performance than the MXE III starter. MXE III enables you to simulate devices with up to 2 million system gates, depending on coding. Built on the industry-leading ModelSim 6.0a HDL simulation environment, MXE III has enhanced GUIs that give designers easy access to capabilities and faster debugging. With the MXE III full ver
120. [Figure 1 labels: drivers, connectors, user IP (MPEG decoder, MOST control), OPB CoreConnect FPGA IP interconnection bus, platform IP cores, user IP cores, ASSPs, MicroBlaze.] Figure 1: Multimedia FPGA platform, automotive back-seat entertainment. logiCVC for multiple display driving and instantiating more than one UltiWIN for simultaneous video input streaming. A DVD, VCR, or camera CVBS output is used as the first video source, while a MOST network is used as the second. The CVBS analog signal is converted to an ITU656 digital signal by a composite video signal decoder, an ASSP component. The MOST MAC and MPEG decoder are customer-added IP cores. The DC and GPIO IP cores are used for the keyboard and touch controller interface. The UltiMEM IP core is configured for unified memory architecture and six access ports. The 32-bit DDRAM assures 800 MBps bandwidth for display refresh, video streaming, graphics acceleration, and CPU program execution. The logiCAN and logiUART local interconnect network (LIN) IP cores are used for network connections with in-car body electronics. Memory Bandwidth Requirements. The selection of memory bandwidth is important for system performance. The six IP cores share common memory, as shown in Figure 1. Two logiCVC driving displays of 400 x 234 resolution and 16 bpp color depth require 16 MBps each. Two UltiWIN proc
121. e low NRE and high flexibility of programmable logic. Many mainstream applications using Spartan-3E FPGAs will have an ASIC crossover point of 250,000 units, meaning that the total cost favors Spartan-3E devices over ASICs for the first quarter million units of production. ce available for under 2 The Spartan-3E Family. The Spartan-3E family is our newest low-cost FPGA family and further reduces the price points for low-cost FPGAs to unprecedented levels. Through 90 nm process technology, 300 mm wafers, and application-driven architecture choices, Xilinx has extended FPGAs into volumes and applications previously reserved for mask-programmed ASICs. Spartan-3E devices offer one of the lowest costs per logic (CPL) of any FPGA. Spartan-3E FPGAs have been architected for digital consumer applications, and all high-volume, low-cost applications will benefit from their advanced features and capabilities. The five-member family ranges from the 100,000-gate XC3S100E through the 1.6 million-gate XC3S1600E, adding features such as 64/66 PCI, mini-LVDS, and faster embedded multipliers for low-cost DSP, all to better serve low-cost applications. Table 1 highlights the key features of the Spartan-3E family. [Table 1 residue: multipliers; max single-ended I/O; max differential I/O pairs; FG484 (23 x 23 mm).] Table 1: Spartan-3E family key features matrix.
122. [Figure 9 plot residue: signal amplitude at far end of line (receiver) versus time, 1 bit per tick (400 ps).] Figure 9: A pre-emphasis circuit at least doubles the length of channel over which you may safely operate. [Figure 10 plot residue: typical FR-4 material curve versus frequency (Hz).] Figure 10: The linear equalizer in the receiver may be set to one of four distinct response curves, preprogrammed to match the response of various lengths of FR-4 PCB trace. standards, meeting exact transmitted signal specifications, and at the same time adding receiver-based equalization to keep your system working at the peak of performance. Decision Feedback Equalizer. As a last defense against the slings and arrows of uncertain channel performance, the RocketIO transceiver includes a manually adjustable six-tap decision feedback equalizer (DFE). This device is integrated into the slicer circuit at the receiver. The DFE is particularly useful with poor-quality legacy channels not initially designed to handle high serial data rates. It has the remarkable property of accentuating the incoming signal without exacerbating crosstalk. Those of you familiar with signal processing will recognize that a DFE inserts poles into the equalization network, while a TX pre-emphasis circuit creates zeros. A very accessible book about digital equalization, including DFE circuits, is John A. C. Bingham
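The decision feedback idea described above can be sketched in a few lines of Python. This is a generic textbook-style DFE, not the RocketIO circuit: the two-tap feedback filter (the transceiver's is six-tap), the channel coefficients, and the function names `channel`, `slicer`, and `dfe` are all assumptions chosen to keep the example small.

```python
def channel(symbols, taps):
    """Model a channel whose output is the current symbol plus trailing echoes (ISI)."""
    received = []
    for n in range(len(symbols)):
        received.append(
            sum(t * symbols[n - k] for k, t in enumerate(taps) if n - k >= 0)
        )
    return received


def slicer(value):
    """Decide whether a sample represents +1 or -1."""
    return 1 if value >= 0 else -1


def dfe(received, feedback_taps):
    """Subtract the ISI predicted from earlier decisions, then slice."""
    decisions = []
    for n, sample in enumerate(received):
        isi = sum(
            b * decisions[n - 1 - k]
            for k, b in enumerate(feedback_taps)
            if n - 1 - k >= 0
        )
        decisions.append(slicer(sample - isi))
    return decisions


# Hypothetical channel: main cursor 1.0 followed by two strong post-cursor echoes.
symbols = [1, 1, -1, 1, -1, -1, 1, 1]
rx = channel(symbols, [1.0, 0.7, 0.5])

plain = [slicer(v) for v in rx]   # slicing the raw samples
equalized = dfe(rx, [0.7, 0.5])   # feedback taps matched to the echoes

print("raw slicer errors:", sum(a != b for a, b in zip(plain, symbols)))
print("DFE errors:       ", sum(a != b for a, b in zip(equalized, symbols)))
```

Because the feedback path works from already-sliced decisions rather than from the noisy input, it cancels trailing interference without amplifying whatever else arrives at the receiver, which is consistent with the crosstalk remark in the text.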
123. e at www.xilinx.com/pciexpress. PCI Express IP. PCI Express IP cores are available from multiple vendors, including Xilinx and our partners. One such core, from Northwest Logic, is featured below. Northwest Logic's PCI Express Core is specifically designed for low-cost Spartan-3 FPGAs. A Spartan-3-based PCI Express design uses the Spartan-3 device with a low-cost Physical Interface for PCI Express (PIPE) compatible PHY chip. The PHY chip implements the low-level PCI Express physical layer, while the device takes care of the upper-level data link and transaction layers. Another version of the PCI Express Core uses the internal MGTs in Virtex-II Pro and Virtex-4 FX FPGAs to provide a fully integrated PCI Express solution. Northwest Logic's PCI Express Core is one of the smallest PCI Express cores available, enabling you to target the smallest and consequently lowest cost FPGA. The core is provided with a comprehensive verification suite and expert support to ensure rapidly developed and validated designs. Also available is a PCI Express Development Board for quickly prototyping a complete PCI Express system. A demo GUI, drivers, and PCI Express FPGA reference design are also included. For more information, including pricing and core size for a particular FPGA family, visit the Northwest Logic website at www.nwlogic.com. PHY Vendors: Genesys Logic, www.genesysamerica.com; Philips Semiconductor, www.semicondu
124. e design cycle. Critical for post-layout analysis are both interactive and whole-board batch simulation, where violations are flagged and recommendations are made for an entire PCB. Armed with the latest Xilinx technology, the three Ts, and the appropriate simulation software, you will not only be ready for today's technology but ready for what's coming down the road as well. For more information, visit www.mentor.com/hyperlynx. HIGH VOLUME SOLUTIONS. Using Spartan-3 FPGAs to Optimize Termination Component Placement. Ideally, terminators on a PCB are located at precisely the position they're designed to terminate: exactly at the last receiver IC for parallel termination, or exactly at the driver for source termination. Unfortunately, in the real world of dense PCBs, ideal placement is often impossible. Therefore, it's common to find terminators that are located a stub length away from their ideal positions. But how long can the stub be before the terminator fails? Series termination will start to fail when the stub is longer than about 10% of the driver switching time. We can simulate this with a series resistor placed within this guideline and compare the result to a series terminator placed an inch away. For parallel termination, if the terminator is located upstream from the last receiver on the net, the stub length can be as long as about 15% of the driver switching time. For
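The stub guidelines above can be turned into a quick length estimate. The sketch below assumes the rule refers to the stub's one-way propagation delay as a fraction of the driver switching time, and it assumes a trace delay of 170 ps per inch as a round figure for FR-4; neither number comes from the article, so substitute the delay of your own stackup. The function name `max_stub_length_in` is hypothetical.

```python
SERIES_FRACTION = 0.10    # series termination: stub delay up to about 10% of switching time
PARALLEL_FRACTION = 0.15  # parallel termination: stub delay up to about 15% of switching time


def max_stub_length_in(switching_time_ps, fraction, delay_ps_per_inch=170.0):
    """Return the longest stub, in inches, whose delay stays within the guideline.

    delay_ps_per_inch is an assumed propagation delay, not a value from the article.
    """
    if switching_time_ps <= 0 or delay_ps_per_inch <= 0:
        raise ValueError("switching time and trace delay must be positive")
    return fraction * switching_time_ps / delay_ps_per_inch


# Example: a driver with a 1 ns (1000 ps) switching time.
for name, fraction in (("series", SERIES_FRACTION), ("parallel", PARALLEL_FRACTION)):
    length = max_stub_length_in(1000.0, fraction)
    print(f"{name} termination: stub up to about {length:.2f} in")
```

Under these assumptions, a series terminator placed a full inch from a driver with a 1 ns edge already exceeds the guideline, which matches the kind of comparison the text proposes to simulate.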
125. e pushed ASSP vendors down to 0.13 µm and 90 nm, NRE costs and design complexity have increased significantly. Xilinx Market Analysis. Table 1: EasyPath FPGAs offer significant advantages in time to market and total cost of ownership when compared to structured ASICs. A study in International Business Strategies' Fourth Quarter 2003 report estimates the product development cost of a 0.13 µm ASIC to be greater than $10 million, including the cost of design, verification, and prototyping. This implies that an ASSP vendor has to sell at least 250,000 units of a $40 ASP device (see Figure 1) just to recoup the development cost. Clearly, if other sales and marketing costs are included and a reasonable ROI is expected, the lifetime volume potential of a device must be significantly higher. Customer-specific FPGAs, on the other hand, significantly reduce the mar ... RTL, nor spend multiple weeks in verification and simulation, to benefit from low-cost, high-volume solutions. Time to Market. In today's world of complex design cycles, short product life cycles, and quickly changing market needs, time to market is of paramount importance. Unfortunately, the complexity of deep sub-micron designs increases the time required to develop and debug a design while increasing the risk of re-spins. Because ASIC re-spins can set back a schedule at least three months, ASSP v
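The break-even estimate in the text above is a single division, shown here as a short Python sketch. Like the article's figure, it counts revenue only (no per-unit manufacturing cost, sales cost, or margin), so it gives a lower bound on the required volume. The function name `break_even_units` is hypothetical.

```python
import math


def break_even_units(development_cost, average_selling_price):
    """Units that must be sold for revenue alone to equal the development cost."""
    if average_selling_price <= 0:
        raise ValueError("average selling price must be positive")
    return math.ceil(development_cost / average_selling_price)


# $10 million development cost recovered with a $40 ASP device.
print(break_even_units(10_000_000, 40))  # 250000
```

Halving the selling price doubles the required volume, which is why low-ASP consumer parts need very large lifetime volumes to justify an ASIC development.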
126. eed A/D converters • 10/12/14-bit, 10 to 135 Msps ADCs • Intersil high-speed D/A interface • 8/10/12/14-bit, 130 to 260 Msps DACs. Applications: • MicroBlaze soft processor development • DSP system development • Industrial systems development • Data communications/telecommunications • Universal prototyping platform. Memec Spartan-3E Board Solutions. Features: • XC3S100E-4TQ144 FPGA • IDT 512K x 8 SRAM • Atmel 2M x 8 flash memory • Atmel 16 Mb serial data flash memory • System ACE interface/user I/O header • XCF01S Platform Flash ISP PROM • Silicon Labs USB-UART interface • Texas Instruments TPS75003 regulator • Low cost. [Board photo labels: Atmel serial data flash, JTAG port, seven-segment LED display, user LEDs, IDT SRAM 512K x 8, Atmel flash 2M x 8, 50 MHz clock, USB-UART port, power supply.] The Memec Spartan-3E LC Development Kit offers flexible configuration prototyping in a low-cost evaluation platform. The Memec Spartan-3E LC Development Kit for the Xilinx Spartan-3E FPGA family provides a versatile prototype platform for evaluating the numerous configuration options available in Spartan-3E devices. In addition, the demo board pro
127. egin operation than they do during normal operation, thereby placing demands on system power supplies. In a consumer system with very tightly controlled power supply size and cost, ensuring that in-rush power is not more than normal operating power is a key design goal. Higher power levels can affect both manufacturers and end customers alike in four key areas: • Performance: Higher power levels in a chip can limit device and end-system performance by forcing a lower system clock rate to stay within the system power budget. • Reliability: As power goes up, so does the threat of brown-out and latch-up from high power-on surge. In addition, higher failures in time (FIT) rates will be expected due to higher device operating temperatures. • Cost: As mentioned previously, higher power equals higher cost in the system because of larger, more expensive power supplies and thermal management components such as fans and heat sinks. • End-customer operating expenses: Higher power also impacts end users in the form of higher power bills, which can be significant for large systems, and shorter battery life for portable products. This is enabled with industry-leading technologies such as 90 nm triple-oxide technology, high-performance embedded IP, and power-saving configuration circuitry. Xilinx also provides comprehensive tools for power system design: Virtex-4 datasheet and user g
128. elop designs that are either feature-rich or segmented by end usage without any additional effort. The Spartan-3 family also offers an extensive library of encryption cores such as AES, DES, and TDES for applications where security is important. In addition, Spartan-3 EasyPath FPGAs support various SSTL and HSTL I/Os for interfacing to different high-speed memories such as SRAM and DRAM, as well as reduced swing differential signaling (RSDS), with applications in LCD TVs and other display products where power consumption is critical. Finally, for applications that require computational capability, like image/video processing, the Spartan-3 EasyPath FPGA MicroBlaze soft processor cores bring the MIPS cost down below $0.02 per DMIP while at the same time keeping the core area to about 1% in an XC3S5000. This provides you with low-cost processing power while leaving enough room for other peripherals. For example, at a unit cost of $12.95 (the 50K unit price for an EasyPath XC3S1500 device), you can prototype with confidence using Spartan-3 standard FPGAs and convert to EasyPath FPGAs for high volume. Conclusion. Power, performance, and cost have ranked high among the reasons why ASSP vendors have historically migrated to ASIC solutions over FPGAs. But recent advances in process technologies have significantly mitigated these concerns to a point where solutions like EasyPath FPGAs have be
129. embedded multipliers for DSP functions, advanced digital clock management, DDR I/O functions, an increased number of global clock networks for the DDR SDRAM controller, and large block RAM modules for the various tables and buffers in the camera. [Figure 3 block labels: gamma correction, FPN correction, overlay, coefficient encoder, DC predictor.] Figure 3: Block diagram of the FPGA code (Xilinx Spartan-3 1000K-gate FPGA). The free video encoder Theora and completely open implementation of the camera (all software and Verilog code is provided under the GNU General Public License) make the second most important function of Elphel products possible. You can use these cameras not only as finished products, but also as universal development platforms demonstrating the power and flexibility of the Spartan-3 family. It is possible to add your own code, rerun the tools (both for the FPGA code and the C language camera software), and immediately try the new camera with advanced image processing implemented. For more information, visit www.elphel.com, https://sourceforge.net/projects/elphel, and www.theora.org. [Figure residue: 0 Data from Sensor; 1 FPN Correction/Overlay; 2 20 x 20 Pixel Tiles to Compressor; 3 PIO SDRAM Access; 4 Reference Frame Write; 5 Reference Frame Read; 6 Com
130. en mapping, global optimization, design re-timing, and FPGA physical synthesis. Together, this ProActive Timing Closure technology leads to higher overall design performance, with timing closure reached faster and less time spent in the overall design flow. In addition to the advanced spectrum of ISE implementation tools, this collection of new technology slashes design times, getting you through even advanced high-density design flows faster and cutting project time and costs. PlanAhead Extends Performance Advantages. If you need even more design performance, the PlanAhead hierarchical floorplanner from Xilinx can help. PlanAhead software is a separately purchased tool option to ISE software that is ideal for high-density Virtex-4 or Spartan-3E designs. PlanAhead implements a block-based hierarchical design approach that can analyze, detect, and correct potential implementation problems earlier in the design cycle, leading to greater consistency, faster place and route cycles, fewer design iterations, tighter utilization control, and quicker incremental design changes. Customer-based benchmarks confirm that PlanAhead software can as much as double your out-of-the-box ISE design performance. You can also lock placement results for individual blocks that already meet timing, so that subsequent place and route iterations only affect the blocks th
131. en used in combination with a wide IP portfolio allow ASSP vendors to address many more opportunities in a cost-effective way while assuming minimal risks. In this article, we will examine how Spartan-3 FPGAs, together with their EasyPath counterparts, can alleviate some of the challenges faced by consumer electronics ASSP vendors. Customer-Specific FPGAs. Customer-specific FPGAs, like those in the EasyPath program, are identical to standard FPGA offerings but use patented testing techniques to significantly improve yield. Once your design is frozen, Xilinx develops custom test patterns that specifically test only the resources used by that particular design. You reap the benefits of these [Table 1 residue: Selection Criteria; Structured ASICs; EasyPath FPGAs; Time to Prototype Samples; Total Time to Volume Production; 8 Weeks; Vendor NRE/Mask Costs; 75K; Design Cost for Conversion; Additional Cost of Tools for Conversion; Unit Costs: Low, Low; Risk: Low; Flexibility to Make Changes: In System.] ket financial risk element. You can now have multiple variations of a particular design, or indeed multiple designs catering to different market segments, and go to market with all of them at the same time. This is because the NRE charges are minuscule compared to ASICs. You no longer need to spend valuable engineering resources t
132. ence checking also eliminates the need to create extensive sets of test patterns in an attempt to demonstrate equivalence, by automatically identifying compare points and then applying sophisticated verification techniques. Dynamic simulation remains an important part of your verification strategy, but its primary use is to ensure proper RTL behavior. [Figure 2 labels: implementation design; logic cones.] Figure 2: Matched or mapped compare points between design versions. Today's very large programmable devices may contain hundreds of thousands of logic cells, making equivalency checking a mandatory component of the modern competitive design flow. Equivalence Checking in FPGA Flows. Because the value of EC is well understood by ASIC designers, its initial application to FPGA prototyping is an obvious choice. In the prototyping context, you want to prove that your FPGA prototype implements the RTL functionality that will be used to create an expensive ASIC. [Flow diagram labels: FPGA implementation; synthesis (DC FPGA or other); auto setup; Formality.] The FPGA verification usually happens in two phases: first, by proving the equivalence of the reference RTL to the post-synthesis netlist, and second, by proving the equivalence between the post-synthesis netlist and the post-place and route (PAR) netlist. You can perform a similar check between the reference RTL and the ASIC im
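To make the idea of comparing a reference against an implementation at a compare point concrete, here is a toy Python sketch. It checks one small logic cone by exhaustive enumeration of its inputs; commercial equivalence checkers use formal techniques that scale far beyond this, so treat it purely as an illustration. The function `equivalent` and the example cones are hypothetical.

```python
from itertools import product


def equivalent(reference, implementation, n_inputs):
    """Compare two Boolean functions over every input combination.

    Returns (True, None) if they always agree, otherwise (False, counterexample).
    """
    for bits in product((False, True), repeat=n_inputs):
        if reference(*bits) != implementation(*bits):
            return False, bits
    return True, None


def rtl_cone(a, b, c):
    """Reference logic cone as written in the RTL: (a AND b) OR (a AND c)."""
    return (a and b) or (a and c)


def netlist_cone(a, b, c):
    """The same cone after synthesis factored it: a AND (b OR c)."""
    return a and (b or c)


def broken_cone(a, b, c):
    """A faulty implementation that dropped the c input."""
    return a and b


print(equivalent(rtl_cone, netlist_cone, 3))
print(equivalent(rtl_cone, broken_cone, 3))
```

The second call returns a counterexample input, which is the kind of result that lets you debug a mismatch without writing a test pattern by hand.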
133. endors take a tremendous risk of being late to market. Some studies have shown, as in the IBS 2003 report, that just a three-month slip in the schedule can cause a product revenue reduction of about 15% (see Figure 2). [Figure 1 plot residue: required life cycle volume (units) and development cost per design versus process geometry (0.25, 0.18, 0.13, 0.09 µ), with ASP and design cost series.] Figure 1: The increasing cost of developing ASSPs as process nodes shrink means that companies are forced to go after high-volume blockbuster products to get a reasonable return on investment. [Figure 2 plot residue: revenue reduction versus project delay (3, 6, 9, and 12 months) for fast, medium, and slow series.] Figure 2: The impact of time to market on product revenue in fast, medium, and slow-moving market segments (IBS, 2003). In the extreme case of seasonal products, widely prevalent in the consumer electronics industry, schedule slippage can mean that an ASIC product will miss customer demand altogether. Customer-specific FPGAs allow ASSP vendors to postpone spending decisions unt
134. ening? The answer is: not much. In fact, simply looking at the activity indicators, before even taking a trace, it becomes obvious that the transmit state machine is locked up. Acquiring data through bank 0 shows that the state machine is stuck at 01H (the idle state), and transaction IDs (TIDs) that should be counting are stuck at 0DH. If we change our probe points to signals of interest over to the receive side through measurement bank 1, things are hung up there too, at 02H (the receive idle state). Smarter Logic Analyzer Triggering. Now that we know the condition of the transmit state machine once it hung up, we can redefine the trigger point to be "not that value," which should allow us to view state machine activity leading up to this hung condition. The acquisition reveals what looks like a normal state machine cycle just before the hang-up. Similarly, by switching probe points with the FPGA Dynamic Probe over to the receive side and defining our trigger as "not idle," there is also a normal state machine progression. So we might try next to come at this problem with the oscilloscope. The Oscilloscope Insight. A point of particular interest for an oscilloscope measurement is to view activity on the serial channel and look for anything unusual. We saw the serial channel between the ... the eye diagram on the data; it looks fine. No glitches, closing eye, or noise. Trying again from b
135. ential problems before place and route. And with its block-based incremental capabilities, you can divide and conquer your design one block at a time. Analysis. PlanAhead design tools take a synthesized EDIF netlist as input and one or more UCF files for constraints. The powerful and easy-to-use graphical environment displays the Xilinx device that you have targeted (see Figure 2). At this point, you have several tools available to explore your design space. Timing Analysis. TimeAhead is a flexible timing analyzer built into PlanAhead software. It allows you to estimate route delays before running place and route. Using the PlanAhead block-based approach, the accuracy of timing estimates will improve as blocks in the design are implemented through place and route. Figure 1: PlanAhead software fits into your existing design flow. TimeAhead also allows you to analyze and identify performance bottlenecks in small sections of the design. You can use this feedback to improve results through re-synthesis or better floorplanning. Device Selection. It is easy to explore different devices within compatible families for a netlist, because PlanAhead tools can open multiple device views (floorplans) on the same netlist. In combination with TimeAhead, this enables you to decide on the speed grade that matches your target performance and the optimal size of your Xilinx part. It can also help
136. ers available in the Spartan-3 fabric are useful in a variety of DSP applications. Each multiplier can replace as many as 450 logic cells that would normally be required to implement a multiply function. Many implementations of finite impulse response (FIR) filters, base stations, digital video systems, wireless LANs, xDSL, and cable modems use multiply-accumulate functions. You can implement high-performance FIR filters that use multiple MACs with minimal area penalty within the Spartan-3 fabric. Such MAC-intensive functions would ordinarily take significant logic resources in competing FPGAs that lack embedded multipliers. Figure 3: Half of the LUTs in the Spartan-3 FPGA are configurable as a 16-bit shift register (SRL16E). Conclusion. Spartan-3/3E features were designed to save costs by allowing the design to be implemented using the smallest, and thereby the lowest cost, silicon. Spartan devices utilize the world's most advanced manufacturing process and incorporate features that allow you to integrate even more functionality in the smallest FPGA. Figure 5: Traditional FIFO requiring dual-port RAM and two counters. Figure 6: SRL16E mode with a single counter can im
137. es the industry, ranging from lowest cost XC9500XL devices to the ultra-low-power CoolRunner-II device family. This ... ket for 16 consecutive annualized quarters, propelling Xilinx to the number two position in the CPLD market. Xilinx conceived of and developed the market for high-volume programmable logic products. In this issue of the Xcell Journal, you'll find articles about products and applications within this market, with many contributions from satisfied customers. [Contents entries: Applications of the Spartan Family in Flat Panel Displays; Meeting Timing and Reducing Area with the Synplify Pro; The Changing Face of Automotive ECU Design; A Multimedia Platform for Automotive and Consumer Markets.] Spartan-3E FPGAs Introduce a New Era in Low-Cost Programmable Logic. The Spartan-3E XC3S100E FPGA is the first 100,000-gate devi by Richard Terrill, Senior Manager, High Volume Products Marketing, Xilinx, Inc., richard.terrill@xilinx.com. First introduced in 1998, the Spartan family was the world's first FPGA series tailored for low-cost applications. With the introduction of the Spartan-3E family, Xilinx now has seven families of Spartan FPGAs in production, with more than 100 million
138. esearch Corporation. His article, "Changing the Systems Landscape with Low-Cost FPGAs," discusses how advanced processing technology is driving down the costs of FPGAs and changing the ASSP/ASIC crossover point. High-Volume Solutions: Approximately one year ago, Xilinx engineers were asked to develop the lowest cost, highest density FPGA family on the market. This section features articles using the new Spartan-3E devices in several high-volume applications. Their low cost makes these devices successful in applications previously reserved for ASIC and gate array technologies. Industry Expert: In "Extend Your Reach," well-known signal integrity expert Howard Johnson discusses the Virtex-4 RocketIO transceiver, which incorporates three forms of equalization to stretch the reach and performance of serial links. Design Tools: Software support for the Spartan-3E family is available with the newly announced ISE 7.1i tools. This section has articles from Xilinx and some of its partners describing the use of their design tools to implement a complete solution. Board Room: The Board Room section describes some of the Spartan-3 hardware development boards and other design solutions to help you determine which platform is best for your application and design task. Power Play: The new Power Play section features technical descriptions of our partners' power solutions. In Steve Sharp's lead article, "Conquering the Three Challen
139. essing CVBS video sources require 54 MBps each. The Xilinx MicroBlaze soft processor core running at 100 MHz requires 400 MBps, while the logiBITBLK processing 16 bpp images and running at 50 MHz requires 100 MBps. This gives a total bandwidth requirement of 2 x 16 MBps + 2 x 54 MBps + 400 MBps + 100 MBps = 640 MBps. The overall memory bandwidth for a 32-bit DDRAM running at 100 MHz is 800 MBps. This leaves 160 MBps of spare memory bandwidth that can be used for efficient memory arbitration and access to memory without stalls. System Gate Count. Table 1 shows the size of each IP core and the total number of slices required for the back-seat entertainment FPGA design. Every single IP is compact with a low gate count, resulting in a lower cost solution. The complete back-seat entertainment system is a complex multimedia application that fits into a Spartan-3 XC3S1000 device, leaving approximately 2,000 slices for extra FPGA circuits. A separate display controller application driving a single display, including 2D graphics acceleration [Table 1 residue, cell values not recoverable: MicroBlaze; logiCVC; logiBITBLK; Xilinx OPB INT Control; logiCAN (Xylon CAN 2.0B Control); Xilinx OPB I2C Control; logiUART (Xylon FIFO UART Control, LIN Bus); UltiWIN (Ultimodule Versatile Video Input).] Table 1: IP core size.
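The bandwidth budget above can be checked with a few lines of Python. The per-core figures are the ones quoted in the article; the peak-bandwidth formula (bus width in bytes, times clock rate, times two transfers per clock for double data rate) is the standard DDR calculation and reproduces the quoted 800 MBps. The dictionary keys and the function name `ddr_peak_mbps` are illustrative only.

```python
# Bandwidth consumers in MBps, as quoted in the article.
consumers_mbps = {
    "2 x logiCVC display refresh (16 MBps each)": 2 * 16,
    "2 x UltiWIN CVBS video input (54 MBps each)": 2 * 54,
    "MicroBlaze at 100 MHz": 400,
    "logiBITBLK at 50 MHz, 16 bpp": 100,
}


def ddr_peak_mbps(bus_width_bits, clock_mhz):
    """Peak DDR bandwidth: bytes per transfer x clock rate x 2 transfers per clock."""
    return (bus_width_bits // 8) * clock_mhz * 2


required = sum(consumers_mbps.values())
available = ddr_peak_mbps(32, 100)
spare = available - required

print(f"required {required} MBps, available {available} MBps, spare {spare} MBps")
```

The spare 160 MBps is 20% of the peak figure; it is the margin the article relies on for arbitration overhead, since real DDR memory rarely sustains its theoretical peak.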
140. evep synplicity com. Being the first to deliver a product in a given market greatly increases its chance of success. Effective use of chip resources allows for less expensive parts and a less expensive overall solution. Synplify Pro software has many timing closure features that increase your chances of using slower speed grade parts without compromising on quality. The Synplify Pro tool offers a perfect fit for the high-volume, low-cost Spartan series of Xilinx FPGAs. Identifying Critical Areas. Usually, only a few sections or modules in a design fail timing. It is often a good idea to recognize these structures while coding your design. During coding, the designer usually knows what device the design is targeted for, but not the speed grade. Bumping up to the next speed grade can add unbudgeted costs to the project. In this article, we'll outline how to design with performance and area in mind, along with a few tricks of the trade for reducing area. You should know roughly how many levels of logic the requirements will tolerate before failing timing. Keep this in mind while coding counters, state machines, decoding, and data path logic. When the logic is nearing the first draft, synthesize and place and route; this will give you a glimpse at the problems ahead. If you meet timing, then read further to the area-saving section. If you don't meet timing, here is a checklist you can go through:
141. fairly high (> $40), making it tenuous for high-volume, low-cost applications. The Spartan options drop that cost substantially and add the flexibility of programmable logic to integrate and implement other system capabilities. In 250K quantities (reasonable for typical consumer applications), the Spartan-3E version will cost approximately $17. Conclusion. In addition to reducing total costs, the Spartan FPGA PHY option gives you substantial flexibility to build PCI Express-to-anything bridges and integrate other circuit elements. As most systems have a range of bandwidth requirements, preserving flexibility is important so that you can add lanes without dramatically changing the layout. Spartan-3 and Spartan-3E FPGAs are available in a wide range of densities and preserve migration up and down in overall bandwidth. And because FPGAs are fully reprogrammable post-deployment, they eliminate the risks associated with first-generation ASSPs and ASICs. If you are currently using PCI for your interconnect standard and are architecting your next-generation designs, you should consider the PCI Express option from Xilinx. We encourage you to find out how Spartan-3 and Spartan-3E FPGAs will help you meet your current and future design requirements. More information about Spartan-3 and Spartan-3E FPGAs, PCI Express IP, and compatible PHY devices is availabl
142. fits of implementing an 8-bit PicoBlaze or a 32-bit MicroBlaze soft processor core within the Spartan-3 architecture and highlights the advantages of the embedded IBM PowerPC core within Virtex-II Pro and Virtex-4 FPGAs. With the Embedded Development Kit (EDK), you can create any of the combinations that fit your design specifications by applying the techniques taught in the training course. After the two-day course, the engineer remains on site to assist the design team in transferring what they learned in the course to a practical design environment. Most engineering teams will have quite a few lingering questions after such an in-depth course. With the trainer on site for several days afterwards, these questions can be answered. Another major feature of Embedded Processing QuickStart is assistance with system partitioning. Some applications are better suited for a processor, and others are better suited for FPGA fabric. To obtain optimum performance, your design team will need to determine which functions to implement in logic and which functions should be executed by the CPU. This is critical for any embedded FPGA design. With proper hardware/software partitioning, your design can achieve superior performance. The on-site engineer leaves behind a training plan tailored specifically to the needs of your design team, which can include system partitioning, CPU programming, C programming, debugg
143. functionality, and you may have to buy additional parts to completely solve the problem. The more parts you use, the higher the chance of picking a part that may be discontinued, and the more purchase orders and stocking requirements you generate. Plus, if you need to assemble more parts, the board assembly cost will probably increase. But if a simple AND gate is all you need, a discrete logic device is probably the right choice. Let's look at the case where you need more than just a simple gate. CPLDs have been continually cost-reduced and are now on par with some discrete logic devices. If you are using multiple discrete logic devices, the benefits of a CPLD become even more appealing. This trend opens up a whole new level of integration at a low price point. For the first time in history, CPLDs can compete with discrete solutions, add functionality, and cost less. For instance, a typical voltage translation device usually costs from $0.50 to $3.50, depending on volume, package type, the number of I/Os to be used, and the process technology of the device. CPLD devices are below $1.00 for the smallest density and lowest cost package. But you get a whole lot more within that single CPLD than any discrete function logic device. With the addition of I/O banks on CoolRunner-II 32- and 64-macrocell CPLDs, voltage and standard I/O translation can occur in very small, cost-competitive CPLDs. When u
144. fy the system and check the resultant performance. When the resultant architecture performs to the desired level, the system is then transferred back into the Xilinx tool chain.
Figure 2: Accelerated system architecture
Figure 3: Accelerator block diagram
The Triton tools accelerate the process of identifying problem areas and establish an integrated flow that allows you to move between the tools to develop the optimal architecture. A typical flow, shown in Figure 4, comprises these steps:
• The designer develops the target architecture using selected Xilinx tools.
• The architecture description is read from the microprocessor hardware specification file (.mhs) from EDK, and the ANSI C application source code is read into the Triton tools.
• Triton Tuner profiles the ANSI C code, reveals bottlenecks in the code or architecture, and eliminates inefficiencies.
• Triton Builder partitions compute-intensive algorithms in ANSI C into hardware and generates a hardware accelerator.
• Triton Tuner verifies that the new system performs to the desired level.
• RTL, test bench, modified C code, driver, and architecture are exported back into the Xilinx environment.
Conclusion
Poseidon's Triton Tool suite enables you to rapidly and predictably analyze, optimize, and
145. g out high-speed MAC operations but have bandwidth limitations. FPGA technology has made tremendous progress in recent years by offering a growing number of intellectual property cores that reduce the cost of silicon development in various markets. This is accomplished by optimizing architectures, using leading process technologies, and adding IP cores. Some of the typical applications for digital signal processing are digital cameras, phones, 3G wireless, video conferencing systems, and high-definition digital televisions. Having a signal processing capability inside an FPGA is the perfect design innovation: the stepping stone to system-on-chip in an FPGA without the high cost of complex customized chip development.
System Generator for DSP
Xilinx expanded its features in the Spartan-3 FPGA by adding embedded multipliers in the architecture. This technological innovation is similar to embedded block memory, clock management, and multiple standards for high-speed I/O circuits, all standard characteristics of the Xilinx Spartan-3 and Virtex-II Pro families. Time to market remains critical for companies developing both system hardware and software. With Xilinx System Generator (SysGen), you can simultaneously create behavioral-level hardware blocks and simulate the entire system with just a few tool clicks. The design environment allows you to create block-based systems like digital QAM modulators for software
146. g the entire system's power distribution plan. In line-powered consumer products, the goal is usually to use the smallest and least expensive power supply possible to keep costs under control. Exceeding the capabilities of a particular model power supply by only a few percent can necessitate the use of a larger, more expensive supply, and this might be unacceptable in light of total system cost. Designers would rather design in more features to differentiate the product than use a larger power supply. In portable consumer products, the overwhelming goal is to extend battery life for as long as possible. For these products, longer battery life, both in active and standby modes, is a significant competitive advantage. With all of these challenges, it's no wonder that power issues are sounding the alarm bells for system designers today. iSuppli's Selburn continues: On the customer side, chip designers can consider architectural approaches such as parallel processing at reduced clock speeds to reduce dynamic power, or gated clocks that essentially turn off entire sections of the chip when they are not needed. Despite these techniques, power consumption remains a serious issue for a large portion of the core silicon market, an issue that is becoming worse, not better, with time.
System Design Challenges
There are three key areas of powe
147. g variations in board stackup, trace width variations, ball grid array breakouts, stubs, vias, loads, connector transitions, or large power plane discontinuities. The significance of a reflection is impacted by several factors, including the impedance difference, the length over which the impedance difference occurs relative to the overall length of the transmission path, and your technology's tolerance for noise. Some reflections, if significant enough, cannot be resolved using the three Ts. For this reason, careful pre-layout impedance planning with a tool like HyperLynx's Stackup Planning (shown in Figure 1) is a critical part of a proactive impedance control process.
The First T: Triage by Technology
Several strategies exist for dealing with non-ideal routing. The first is to know which nets can accommodate poor routing and which cannot. A technology triage strategy works well, dividing nets into those that are:
• Signal integrity critical: clocks, strobes, and signals that require clean edges between OK strength termination strategy and signal quality
• Timing critical: address, data, and signals that can have non-ideal edges but must align with timing requirements
• Signals with driver edge rates faster than 5 ns
A quick look at the effect of fast driver edge rates is instructive. Figure 2 shows the effect of increasing driver edges on the same 5-inch trace. The 10 ns and 5 0
148. ge for both embedded software engineers and logic designers. To take advantage of the embedded capabilities within the hardware, designers should learn how to properly utilize the Xilinx Embedded Development Kit (EDK) and how to apply the best methodology to implement its functions. With time, any design team can overcome these hurdles and successfully complete an embedded system. In the amount of time it takes for learning and training, the competition could release their product first. A productive and effective design team will enable you to ship to your customers before your competitors do. The Embedded Processing QuickStart solution delivers individualized service that includes a QuickStart application engineer at your site for a week. This Xilinx expert will train your team on creating embedded systems and teach them how to optimize supporting FPGA features. The assistance provided will help your team exceed expectations and achieve desired design results. Embedded Processing QuickStart includes:
• Configuration of the Xilinx design environment
• An instructor-led embedded systems development course
• Design architecture implementation consultation and guidance
• Guidance with system partitioning
• Initial design techniques to enable faster and more effective debug and verification
• A comprehensive training plan
Benefits
If you had to choo
149. ges of Power Consumption, he writes, "To conquer the key challenges of power consumption, it takes a combination of good product design, proper device technology, and tools that let you take control of system power management."
And It's Time to Re-Subscribe
Periodically we must clean our mailing database. As of January 1, 2005, you must re-subscribe to continue receiving the Xcell Journal FREE. If you subscribed after January 1, 2005, you do not have to re-subscribe. If you subscribed before this date, please visit our website at www.xilinx.com/xcell/subscribe and take a minute to renew your FREE subscription and ensure uninterrupted delivery.
Forrest Couch, Managing Editor
This section comprises articles from Xilinx and its partners describing the use of the newly announced ISE 7.1i software, as well as other design tools. This article by well-known signal integrity expert Howard Johnson discusses the Virtex-4 RocketIO transceiver, which incorporates three forms of equalization to stretch the reach and performance of serial links. High-Volume Solutions: The latest FPGAs from Xilinx set new records in capacity, capability, performance, power efficiency, and value. The new Power Play section features technical descriptions of our partners' power solutions. This section describes some of the Spartan-3 hardware development boards and other design solutions. SECOND QUARTER 200
150. gin above the threshold to only 35% instead of the nominal 50%.
Runt-Pulse Degradation
On the left side of Figure 4 is a sine wave with a period of two baud. To the extent that the runt-pulse pattern (101) looks somewhat like this sine wave, you should be able to infer the runt-pulse amplitude from a frequency-domain plot of channel attenuation. Let's try it. In Figure 4, the data waveform has a baud rate of 2.5 Gbps. One-half this frequency, the equivalent sine wave frequency, equals 1.25 GHz. According to Figure 5, the half-meter curve gives you 4.5 dB of attenuation at 1.25 GHz. The same curve also shows 1.5 dB of attenuation at 1/10th this frequency, corresponding roughly to the lowest frequency of interest in an 8B10B-coded data transmission system. The difference between these two numbers, 3 dB, approximates the ratio of runt-pulse amplitude to low-frequency signal amplitude at the receiver. With only 3 dB of degradation, the system satisfies our 70% frequency-domain criterion for solid link performance, precisely explaining why time-domain waveforms look so good at a half meter. Looking closely at Figure 4, the actual runt-pulse amplitude in the time domain is 85%, not quite as bad as the 3 dB predicted by our quick frequency-domain approximation. This discrepancy arises partly from the harmonic construction of a square wave, where the funda the amplitude of the square wave signal men
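The arithmetic in the runt-pulse estimate above can be checked with a few lines of Python. This is a minimal sketch, assuming the two attenuation figures in the excerpt read as 4.5 dB and 1.5 dB and that the dB values describe voltage (amplitude) ratios, so the conversion is 10^(-dB/20). The function and variable names are illustrative and do not come from the article.

```python
def db_to_amplitude_ratio(loss_db: float) -> float:
    """Convert an attenuation in dB to a voltage (amplitude) ratio."""
    return 10 ** (-loss_db / 20)

# Values read from the excerpt (assumed to be 4.5 dB and 1.5 dB).
atten_at_runt_freq_db = 4.5   # attenuation at 1.25 GHz, half-meter curve
atten_at_low_freq_db = 1.5    # attenuation at 1/10th of that frequency

# Relative degradation of the runt pulse versus the low-frequency signal.
diff_db = atten_at_runt_freq_db - atten_at_low_freq_db
ratio = db_to_amplitude_ratio(diff_db)

print(f"difference: {diff_db:.1f} dB, amplitude ratio: {ratio:.3f}")
```

A 3 dB difference corresponds to an amplitude ratio of roughly 0.71, which is where the 70% criterion in the text comes from.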
151. h provides the ability to prototype or evaluate the Multimedia FPGA Platform by connecting displays, input devices, vehicle communication buses, and video signal sources.
Figure 2: Backseat entertainment prototype
Other multimedia applications are equally suitable, such as consumer, medical, and measurement instrumentation or factory automation. Today, automotive OEM and first-tier manufacturers have used Xilinx FPGAs and Xylon IPs to develop infotainment systems.
Conclusion
The value of the Multimedia FPGA Platform from Xylon is in the high number of different displays tested, the broad feature set, and the ability to configure for performance, price, low development and production cost, all with virtually no obsolescence. Xilinx FPGAs, and in particular the Spartan-3 family with its low cost per gate and rich architecture, provided an excellent base on which to build. Combining Spartan-3 devices with Xylon logicBRICKS IP cores provided the ideal combination of feature set, form factor, performance, and time-to-market capabilities for the development of the Multimedia FPGA Platform. For more information about the Multimedia FPGA Platform and logicBRICKS Xylon IP library, please contact us at sales@logicbricks.com or visit www.logicbricks.com.
Multimedia FPGA Platform Features
• Displays supported: wide resolution range (128 x 64 to 1280 x 1024 pixels), wide range
152. hardware resource across a range of algorithms. In this article, we'll describe the design flow, test results, and system optimization for implementing a Smith-Waterman algorithm using a scaleable Nallatech FPGA computing architecture.
Sequence Matching Overview
The objective of bioinformatic sequence matching is to identify similarities between subsequences of strings as far as possible. The dissimilarity of two sequences of nucleotides or amino acids can be defined as the minimum change required in one to get the other one, among all possible alignments between them; this is the edit distance of the pair of sequences. Note that it depends on the choice of scoring scheme. For instance, a possible scoring scheme can assign 0 values to matches of nucleotides at certain positions in the sequences, 1 to gaps (insertions or deletions) at those positions, and 2 to mismatches (substitutions) at those positions. The Smith-Waterman algorithm computes the local alignments, or similarities, between substrings of two sequences A and B of lengths n and m, respectively, using a dynamic programming approach. Dynamic programming is a strategy of building a solution gradually using simple recurrences. The key observation for the alignment problem is that the similarity between sequences A[1..n] and B[1..m] can be computed by taking the maximum of the three following values:
• The similarity of A[1..n-1] and
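The recurrence described above can be sketched in software as follows. This is a minimal Python illustration of Smith-Waterman local alignment scoring, not the FPGA implementation the article describes. The scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for the example; they are a similarity scheme and differ from the 0/1/2 dissimilarity scheme quoted in the text. The function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Return the best local alignment score between a and b.

    Each cell takes the maximum of three predecessor values (diagonal,
    up, left) plus the local-alignment floor of zero.
    """
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols      # current row; column 0 stays 0
        for j in range(1, cols):
            # Align a[i-1] with b[j-1] (match or substitution).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b (skip a character of a).
            up = prev[j] + gap
            # Gap in a (skip a character of b).
            left = curr[j - 1] + gap
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Only two rows are kept in memory because each cell depends on its left, upper, and upper-left neighbors. That same dependency pattern lets cells along an anti-diagonal be computed in parallel, which is why the algorithm is commonly mapped to hardware.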
153. hortest path from design start to project finish is through Xilinx 7.1i design tools.
by Bruce Talley, Vice President, Design Software Division, Xilinx, Inc. bruce.talley@xilinx.com
It's a good time to be an FPGA or CPLD designer. The high cost of ASIC-based projects and the high risk of structured ASIC designs are enabling more advanced features with every new generation of silicon. But without easy-to-use, productive design tools, all of that power is still just silicon waiting to be unleashed. Xilinx ISE software has long been known to deliver some of the fastest logic in programmable design, but the ISE design suite also delivers ease of use at the lowest cost possible. With the addition of advanced options like PlanAhead design and analysis software and ChipScope Pro real-time FPGA debug, the complete ISE system delivers solutions to design bottlenecks and gets you to project completion faster. In this issue of the Xcell Journal, we present a series of articles about design tools and design applications to help
Table of Contents
Design Performance Leaps Forward with ISE 7.1i Software
Debugging with Combined Oscilloscope and Logic Analyzer Measurements
Real-Time Debug That Dominates
Integrating MATLAB Algorithms into FPGA Designs
Complex FPGAs Require Equivalence Checking
Complete FPGA and CPLD Power Analysis
Increase Performance and Lower Cost Through System Design
Control Your Designs with the PlanAhead Hierarchical Design and A
154. ical Block, a physical design entity that contains one or more logical (RTL) or structural (EDIF) hierarchical instances. Simply create a PBlock, assign a region for it, export its netlist and relevant constraints for place and route, import its placement results, and verify its timing. PBlock technology enables several key methodologies.
Flexible Block Partitioning
PBlocks can comprise any number of hierarchical instances or logic elements (LUTs, FFs). PBlocks need not adhere to your original logical hierarchy and can be dynamically redefined whenever necessary throughout your flow. Let's explore the advantages of this technology.
Performance-Based Floorplanning
PBlocks can group logic together to improve performance on your design. Corralling critical paths into smaller regions on the device can potentially improve performance through reduced route delays. You can use any of PlanAhead's analysis features to drive PBlock creation and floorplan generation. Once defined, a floorplan can absorb minor to moderate changes to the netlist to help keep your place and route results predictable.
Block Analysis
If design analysis reveals that your critical paths are contained by a few hierarchical instances, you might consider the following technique: Create a PBlock with these instances and export the PBlock to place and route. PlanAhead software will also export timing constraints relevant to those instances. Your
155. icallygs software integration with your FPGA design easy. The compact design of Nucleus PLUS can run from on-chip memory to help minimize power dissipation and deliver increased performance. This, combined with a wealth of middleware including the Nucleus NET TCP/IP solution supporting Xilinx Ethernet IP, makes it ideal for products targeted at the networking, telecommunications, datacoms, and consumer markets. Discover how Nucleus software can make your embedded development for Xilinx FPGAs easy. Go to our dedicated Web page for more articles and free downloads at www.acceleratedtechnology.com/xilinx.
Nucleus EDGE, our new embedded development environment based on Eclipse, offers a complete tools solution for developing designs with Xilinx PowerPC cores. Nucleus EDGE provides you with a Builder, Editor, and Project Manager utilizing optimizing compilers, comprehensive embedded debuggers, and RTOS profiling tools. For more information on the Nucleus complete development solution, from UML through to embedded RTOS, go to www.acceleratedtechnology.com.
Nucleus: Embedded made easy. Accelerated Technology, A Mentor Graphics Division. info@AcceleratedTechnology.com www.AcceleratedTechnology.com 2005 Mentor Graphics Corporation. All rights reserved. Mentor Graphics is a trademark of Mentor Graphics Corporation.
Using
156. ifferences. The principal idea behind these features is to integrate functions and simplify the designer's task. The first and most basic is clock generation. For most designs, a single clock source may not generate all of the necessary frequencies needed in the design. Also, if the PCB is larger than a hand-held device, clock skew may come into play. The design team considered some of these clocking design problems and created solutions internal to the CPLD.
Clock Doubler and Divider
Two features included with CoolRunner-II devices are DualEDGE and, in 128-macrocell and higher devices, a clock divider. DualEDGE flip-flops offer a performance boost for sequential operations, while the clock divider can reduce power consumption. But that's just the tip of the iceberg. Let's start with a simple clock doubling scheme, where we can generate a frequency double that of the incoming clock. This yields two clock sources from a single clock input. So if you happen to need a faster clock to improve a serial data flow, DualEDGE flip-flops are a perfect fit. It also works well for improved pulse-width modulation or higher resolution of timers/counters. Thus, you need not use another clock source for portions of your design that need to run faster. Another key point is the fact that you can preserve a clock input pin. So with DualEDGE flip-flops, you get a free clock without using an input pin on the CPLD. Figure 1 shows a simple diagram
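The clock doubling idea can be illustrated with a small behavioral model. This is a hedged Python sketch, not CPLD design code: it only shows that a register updating on both clock edges performs twice as many updates per input clock period as one updating on rising edges alone. The function names are hypothetical and not part of any Xilinx tool.

```python
from typing import Iterator

def clock_edges(periods: int) -> Iterator[str]:
    """Yield the sequence of edges for a number of full clock periods."""
    for _ in range(periods):
        yield "rise"
        yield "fall"

def count_updates(periods: int, dual_edge: bool) -> int:
    """Count how often a counter register advances over the given periods.

    dual_edge=False models an ordinary rising-edge flip-flop;
    dual_edge=True models a flip-flop that responds to both edges.
    """
    count = 0
    for edge in clock_edges(periods):
        if dual_edge or edge == "rise":
            count += 1
    return count

if __name__ == "__main__":
    print(count_updates(8, dual_edge=False))  # single-edge register
    print(count_updates(8, dual_edge=True))   # dual-edge register
```

For the same input clock, the dual-edge counter advances twice as often, which is the effect the text relies on for faster serial data flow and finer timer or PWM resolution.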
157. ify system architectures for Xilinx FPGAs with embedded hard or soft processors. Electronic system level (ESL) tools and predefined cores can help you make critical architectural decisions, simplify your design tasks, and reduce your overall design cycle. In this article, we'll show how Poseidon's new ESL tools can help you realize these benefits.
Triton Tool Suite
Poseidon's Triton Tool suite is a system design and acceleration environment that enables you to quickly develop, analyze, and optimize system architectures. With these ESL tools, the abstraction level of the design is above RTL; you can quickly address system issues without having to solve the detailed issues surrounding RTL implementation. Thus, you can quickly perform architectural what-if analysis and accelerate time to market. The tool suite was created specifically for processor-based systems that require efficient, robust architectures with the need to optimize performance, power, and cost. Triton comprises two main tools:
• Triton Tuner, a system and software analysis tool
• Triton Builder, a hardware/software partitioning tool
Triton Tuner is a simulation and analysis environment based on SystemC. Simulation is performed at the transaction level from models of both the processor and surrounding buses and peripherals. Using Tuner, you co-simulate the hardware with the application software. During simulation, the tool collects b
158. igners and system architects to have an end product with increased performance and a lower BOM. For systems manufacturers, this leads to increased profitability, a lower overall system cost, or both. At face value, this situation seems rather straightforward: lower device costs equal lower system costs; everyone is a winner. However, the real answer to this question goes much deeper than lower costs. To gain a better perspective on this issue, we must explore some of the background of the FPGA market and the ASIC industry as a whole.
A Difficult Design Environment
In today's market environment, a series of trends is adversely influencing the ASIC design community. One of the trends most often discussed is the rise in NRE, the costs associated with the design of ASICs. Figure 1 shows how these costs have increased by process geometry. NRE costs have risen in large part because of rising device complexity. As the possible gate counts have increased at each process node, so too has the task of assembling these gates into a design that meets the needs of the market. With its low-to-no NRE costs, programmable logic is a great aid to designers who need to prove out their designs in the shortest amount of time to meet a moving market window but do not have the engineering or financial resources necessary to commission a new system-on-chip design. Figure 2 shows the relationship between design cycle t
159. il market uncertainties are removed. Once a design is completed, you no longer need to plan three to four months in advance and hope that the silicon you get at the end of the day works to your specifications. With one of the highest levels of reliability and shortest lead times in the industry, customer-specific FPGAs require ASSP vendors to incur little to no risk in the conversion process. Furthermore, with just eight to ten weeks to production, ASSP vendors can go to market with confidence and hit that prime market window.
Technical Challenges
A common challenge that many ASSP vendors have to deal with is the changing landscape of industry standards. This is particularly true in wired telecom and wireless applications, where the lack of a coordinated effort to specify various telecommunications protocols and interface standards can result in incompatibilities between different vendors' products. Ratifying standards is a time-consuming process, and vendors often have to freeze their designs much earlier to get first-mover advantages. ASSP vendors must either gamble that their final specifications will get adopted in a standard, implement a superset of all possible future combinations, or delay the design phase, running the risk of being late to market. These options are far from optimal. One of the major advantages of FPGAs over ASICs is the flexibility to make design changes in case of
160. ile FPGA Power Solutions from National Semiconductor
LM2743/44 Low-Voltage Synchronous Buck Controllers
Figure: LM2743 typical application diagram (input power 1V to 16V; bias 3V to 6V; Vout 0.6V to 0.85 Vin; up to 25A loads; output power good)
• Highly efficient 2A to 25A solution
• Input voltage from 1V to 16V
• Adjustable output voltage as low as 0.6V
• Power good flag and output enable
• 1.5% reference accuracy over temperature
• Current limit without sense resistor
• Programmable softstart
• Switching frequency from 50 kHz to 1 MHz
• Available in small TSSOP-14 packaging
The LM2743 and LM2744 are high-speed synchronous buck regulator controllers that drive external MOSFETs to supply as much as 25A of current. They can provide simple down-conversion to output voltages as low as 0.6V. Although the control sections of the ICs are rated for 3 to 6V, the driver sections are designed to accept input supply rails as high as 16V. The use of adaptive non-overlapping MOSFET gate drivers helps avoid potential shoot-through problems while maintaining high efficiency. A wide range of switching frequencies from 50 kHz to 1 MHz gives Xilinx FPGA and system power supply designers the flexibility to make better trade-offs between component size, cost, and efficiency. A versatile softstart and tracking pin allows ratiometric, coincidental, or offset tracking, which are
161. ilt-in functionality for timing, triggering, and advanced I/O handling, synchronization, and event counting. However, in some applications the timing model or built-in functionality breaks down. In these cases, providing hardware with a user-programmable Virtex-II FPGA becomes advantageous because you can define the device to serve the specific needs of your system and application. With NI RIO hardware, you also save time and money. Rather than building multiple boards for different applications, you can reduce costs by using only one device and reprogramming it for different application needs. Additionally, because code execution is implemented within the Virtex-II device, the hardware responds faster than software, which is very beneficial in applications requiring high-speed control and tight triggering and synchronization. When implementing control loops within Virtex-II devices, digital loops can run up to 20 MHz and analog loops up to 125 kHz. These rates are not achievable in implementations using only system-level software.
Figure 3: FPGA hardware architecture on NI reconfigurable I/O hardware
Implementation of LabVIEW FPGA
One of our key objectives during the implementation of LabVIEW FPGA was to preserve the same semantics, level of abstraction, and ease of use between the hardware implementation using LabVIEW FPGA and the desktop version of LabVI
162. imes, gate count, and product life cycles. Designers are under pressure to hit a market window that is shrinking, while at the same time their designs are becoming more complex. These trends are at odds with product life cycles, which are also shrinking. The net effect of these trends is to make the design environment more challenging for ASIC designers. Even before a design is complete, the market at which it is aimed could have changed, necessitating changes in the design itself. This quickly becomes a no-win situation for all concerned. Furthermore, as the cost to create a complex ASIC solution increases, so too must the size of the market towards which the solution is targeted. If the design costs for a complex ASIC are in the $30 million range, the size of its target market must be many times larger than the design costs just to recover the initial investment in the silicon. Admittedly, there are not many applications today that can command such high unit volumes to guarantee the recovery of this initial investment. An additional detrimental effect is the decline in ASIC design starts because of rising design costs, decreasing product life cycles, and shrinking market windows. Several of these trends and ASIC design start numbers for various ASIC product types, including FPGAs and PLDs, by end appli detailed upcoming report from Semico titled ASIC Design Starts: An Industry in Transition
163. in succession. Figure 10 presents the set of four possible frequency response curves attainable with this receiver equalization architecture. Each section of the equalizer is tuned to approximate the channel response of a typical PCB channel with an attenuation of about 3 dB at 2.5 GHz. With all stages on, you get a little more than 9 dB of boost at 2.5 GHz. Because the response keeps rising all the way to 5 GHz, this equalizer is useful for data rates up to and beyond 10 Gbps. When setting up the equalizer, first select the number of sections of the RX linear equalizer that best match your overall channel response. Then fine-tune the overall pulse response using the 5-bit programmable coefficients in the transmit pre-emphasis circuit to obtain the lowest ISI, the lowest jitter, or a combination of both. After building the circuit, a clock phase adjustment internal to the receiver helps you map out bit error rate (BER) bathtub curves so you can corroborate the correctness of your equalizer settings. The flexibility provided by these two forms of equalization lets you interoperate with an amazing array of serial link
Figure 8 (channel and equalizer response, 5 dB/div, versus frequency in Hz): Composing the pre-emphasis circuit with the channel produces an overall response much flatter than either curv
164. ing and verification. This plan helps prevent schedule slips later in the project by ensuring that your team is skilled in the required disciplines. It also helps maintain a more effective and highly motivated team that is properly equipped and confident to handle any obstacle.
Conclusion
Software engineers must design across logic domains, and hardware engineers must learn how to design in the software domain. The effect of these challenges lengthens the development schedule and can affect time to market. The Embedded Processing QuickStart solution offers an unprecedented level of on-site design support and training for the critical initial design phase of your project. This service will not only show your team where to begin, it will also empower them to complete the project on time and on budget. For more information about Embedded Processing QuickStart, contact your Xilinx representative or go to www.xilinx.com/epg.
Implementing Bioinformatics Algorithms on Nallatech Configurable Multi-FPGA Systems
A computational acceleration of 200X is impressive, but the future of bioinformatics requires more.
by Keith Regester, Systems Applications Engineer, Nallatech, k.regester@nallatech.com
Jong Ho Byun, Research Assistant, University of North Carolina at Charlotte, jbyun@uncc.edu
Arindam Mukherjee, Assistant Professor, University of North Carolina at Charlotte
165. ion for low-cost systems, providing the system-level building blocks necessary to successfully interface to the latest generation of DDR memories. Included in all Spartan-3 FPGA input/output blocks (IOB) are three pairs of storage elements. The storage element pair on either the output path or the three-state path can be used together with a special multiplexer to produce DDR transmission. This is accomplished by taking data synchronized to the clock signal's rising edge and converting it to bits synchronized on both the rising and falling edge. The combination of two registers and a multiplexer is referred to as double-data-rate D-type flip-flop (FDDR).
Memory Controllers Made Fast and Easy
Xilinx has created many tools to get designers quickly through the process of building and testing memory controllers for Spartan devices. These tools include reference designs and application notes, the Memory Interface Generator (MIG), and more recently, a hardware test platform. Xilinx application note XAPP454, "DDR2 SDRAM Memory Interface for Spartan-3 FPGAs," describes the use of a Spartan-3 FPGA as a memory controller
Figure 1 (DQS; internally or externally delayed DQS to capture DQ; phase-shifted DCM output to capture DQ): Read operation timing diagram with particul
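The two-registers-plus-multiplexer arrangement described above can be modeled behaviorally. The sketch below is an assumption-laden Python illustration, not HDL and not the Xilinx FDDR primitive itself: it assumes both registers are loaded with a pair of bits once per clock period, and that the multiplexer drives the first register's bit during the high half of the clock and the second register's bit during the low half. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

def ddr_output(bit_pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Serialize per-clock bit pairs into a double-data-rate bit stream.

    Each tuple holds the two bits presented in one clock period.
    The result contains one bit per clock edge (rising, then falling).
    """
    stream: List[int] = []
    for d_rise, d_fall in bit_pairs:
        # Model the two storage elements capturing this period's data.
        reg_rise, reg_fall = d_rise, d_fall
        # Multiplexer select follows the clock level:
        stream.append(reg_rise)   # clock high: output the first register
        stream.append(reg_fall)   # clock low: output the second register
    return stream

if __name__ == "__main__":
    # Three clock periods of data produce six output bits.
    print(ddr_output([(1, 0), (1, 1), (0, 1)]))
```

The output carries two bits per clock period, which is the doubling of data rate the text describes; real designs also have to manage the clock phases that load each register, which this model leaves out.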
166. ipScope Pro logic analyzer supports multiple window views and bus plotting in either data-versus-time or data-versus-data formats. Capture mode lets you compare data captured after multiple trigger events; signal filtering lets you ignore data that is not critical to your analysis, saving you memory and time. Using the listing viewer, you can import bus token files and view instructions in the order they occur. To facilitate processor system debug environments that use software debuggers in addition to ChipScope Pro tools, you can share the JTAG connection to the FPGA with the ChipScope Pro analyzer. In addition to providing data capture capabilities, the ChipScope Pro system also includes the Virtual I/O console, the interface to one of the industry's first real-time virtual input/output cores. Through the Virtual I/O console, you can control virtual inputs and pulse trains and view output activity. The ChipScope Pro system now runs on Windows, Solaris, and Linux Red Hat Enterprise 3.
New Capabilities
ChipScope Pro 7.1i software offers new capabilities that take real-time debug to the next level. Leading the way is the new remote debug capability: ChipScope Pro tools can now run in server/client mode over a TCP/IP connection. You can sit in your office while debugging a board next door in the lab or on the other side of the world. You can share a single board system in the lab with other debug engineers on your team, or allo
167. itch over to the receive state machine with the FPGA Dynamic Probe, the state machine also looks okay, at least initially (Figure 3). But with a change of time/division for a macro view, we see that the state machine stalls 12 packets after the serial channel glitch. The transaction IDs and acknowledge IDs increment for a time but then stop. That's not acceptable, and it points to the possibility of invalid data being input to the state machine, somehow related to this narrow pulse on the serial channel. Time-correlated measurements have helped show that these are interrelated events. The next step would be to isolate which one causes the other: correcting the logic causing a serial channel narrow pulse, or fixing the narrow pulse causing incorrect state machine logic.
Cross-Triggering in the Other Direction
We've determined that the first time we have a narrow pulse, it appears to coincide with problems in our state machines. But does a narrow pulse always exist in the serial channel when the state machines lock up? One approach to determine the answer is to have the logic analyzer cross-trigger the oscilloscope and have the logic analyzer trigger on the lock-up. Fortunately, switching the direction of the cross-trigger simply requires checking a box in the user interface; the application has already determined the cross-trigger delay from the calibration performed ear
168. ith a thin oxide in the core and thicker oxide in the I/O area. Virtex-4 devices add a third, medium-thick oxide transistor used for certain functions in the FPGA. The result is 50% lower static power than that of Virtex-II Pro FPGAs. Other FPGA vendors have gone the other way when migrating to a 90 nm process, with static power increasing more than 2X compared to 130 nm devices.
- Dynamic power: New and existing Virtex-4 embedded functions lower dynamic power by 5 to 20x compared to Virtex-II Pro FPGAs. This results in as much as 86% lower dynamic power than that of other 90 nm FPGAs. Note these specific examples:
  PowerPC: as much as 86% power reduction
  Block RAM: as much as 82% power reduction
  DSP: as much as 23% reduction with XtremeDSP slice
  Ethernet MAC: as much as 83% power reduction
  Logic: although Virtex-4 devices consume similar dynamic power per logic cell when compared to other FPGAs, the embedded IP blocks often allow fewer general-purpose logic cells to be used. For example, when building a source-synchronous I/O (SSIO) interface, the new ChipSync block reduces the number of logic cells used.
- In-rush power: Other high-performance 90 nm FPGAs have exhibited levels of in-rush power more than four times that of Virtex-4 FPGAs. In Virtex-4 devices, by spending considerable time designing very power-efficient configu
169. lier. Because we know that the receive state machine locks up on a value of 02H (idle) immediately following 10H (TID), we can set up the logic analyzer to trigger when it sees TID followed by idle for greater than one clock cycle. Doing this, importing the related scope capture from the serial channel, and making multiple runs, we see that indeed there is a direct relation between the first narrow pulse and the state machines locking. We can search through the serial stream capture on the oscilloscope and find the location of narrow pulses with the E2681A EZJIT jitter analysis software package, validating that there are always multiple narrow pulses in the vicinity of the lock-up. We've made an important step to narrow down the possible causes of the problem.
Figure 3: Scope trigger on glitch, cross-trigger to logic analyzer, and capture of internal receive state machine, initially looking okay
Figure 4: Zoom out reveals relationship of glitch to receive state machine stalling after 12 packets
Conclusion
With complex interactions between asynchronous digital circuitry and the fast data rates that are susceptible to signal integrity issues, it makes sense to pair up your logic analyzer and real-time oscilloscope. With the ability
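The trigger condition described in this snippet (TID, 10H, followed by idle, 02H, for more than one clock cycle) can also be applied offline to a captured state trace. The following Python sketch assumes the trace is a list with one state value per clock; the function name `find_lockup` and the extra state code 0x20 in the usage lines are hypothetical and do not come from the article or from any Agilent software.

```python
TID = 0x10   # transaction-ID state, from the article
IDLE = 0x02  # idle state, from the article


def find_lockup(states, min_idle_cycles=2):
    """Return the index of the first TID sample that is followed by at least
    min_idle_cycles consecutive idle samples, or None if there is none.

    With one sample per clock, "idle for greater than one clock cycle"
    corresponds to min_idle_cycles=2.
    """
    for i, state in enumerate(states):
        if state != TID:
            continue
        run = 0
        for following in states[i + 1:]:
            if following != IDLE:
                break
            run += 1
        if run >= min_idle_cycles:
            return i
    return None


# 0x20 stands in for any other state (hypothetical value).
trace = [0x10, 0x02, 0x20, 0x10, 0x02, 0x02, 0x02]
lockup_index = find_lockup(trace)
```

In the usage lines, the first TID is followed by only a single idle sample and is skipped; the second TID is followed by a sustained idle run, which is the lock-up signature the article triggers on.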
170. little to silicon area but brings many advantages. With input hysteresis, the problem of noisy environments was eliminated and, as a side benefit, power was reduced. This power reduction is subtle but nonetheless an advantage that other CPLDs do not include. So how does it reduce cost? Because the hysteresis is programmable on a pin-by-pin basis, it eliminates the need for external Schmitt-trigger devices. Although these components may be cheap, the extra insertion cost could prove expensive if added after the initial board fabrication. Better to be safe than sorry when it comes to reliability. One other cost that is commonly overlooked is PCB routing and the associated cost of extra layers of circuit board material. Fewer layers are always less expensive.
I/O Banking, Voltage, and I/O Standard Translation
With today's vast selection of components, the chances of picking parts that all have the same voltage input levels are slim. Consider the time-to-market demand on today's design engineers. If you cannot reuse some portion of the previous design, chances of making the current product release cycle are reduced. For example, a design that uses legacy ASSP devices may not line up with the same voltage or I/O structure of the new processor you just specified. With the simple addition of a CPLD, the voltage problem goes away, and you get extra logic for all of those ne
171. reducing price points for FPGAs, they are now supported in ISE 7.1i software. Spartan-3E devices extend the reach of FPGAs into production volumes by reducing costs while preserving the low NRE and high flexibility of programmable logic. Mainstream applications using Spartan-3E FPGAs and ISE 7.1i design tools will see lower ASIC crossover points and lower total project costs. Virtex-4 FPGAs are now supported in more configurations of ISE 7.1i software as well. ISE WebPACK software adds Virtex-4 LX15 and LX25 support, while ISE BaseX now includes Virtex-4 LX15, LX25, SX25, and FX12 devices. You can choose the Xilinx device and ISE configuration that best fits your budget and design demands.
Figure 2: Design summary in ISE Project Navigator
ISE 7.1i software also expands Linux operating system support. ISE Foundation software is available in both 32-bit and 64-bit Red Hat Enterprise 3 versions; both CDs are included in product update shipments. ISE WebPACK software also now runs on Linux Red Hat Enterprise 3, letting you use free downloadable ISE software on the platform that best fits your needs.
Electronic Software Delivery: Xilinx is also now offering ISE Foundation software and ISE BaseX via electronic download to all in-maintenance customers. ESD gives you the option of getting your update ISE copy by
172. ll Journal, you'll reach more than 70,000 engineers, designers, and engineering managers worldwide. The Xilinx Xcell Journal is an award-winning publication dedicated specifically to helping programmable logic users, and it works. We offer affordable advertising rates and a variety of advertisement sizes to meet any budget. Call today: (800) 493-5551.
Figure 2: A typical block diagram of functions found within a display application
Figure 3: The large number of differential I/Os allows even small and inexpensive Spartan-3 devices to drive large displays (diagram labels: 4 LVDS Channels; XC3S100E T144; 52 DIFF I/O Channels)
cost digital consumer systems. In flat-panel display systems, the Spartan-3/3E architecture is even more useful because it allows high-performance DSP to be implemented in a fraction of the area consumed by competing FPGAs. It also has built-in support for low-swing display I/O standards such as RSDS.
- Timing control and panel RSDS drivers
- Image compression/decompression
Spartan-3 FPGAs have been used to implement these functions successfully within flat-panel display systems. The programmable nature of an FPGA solution reduces the inherent development risk in a new system design. With its host of other features, such as multiple I/O banks, on-chip digital clock managers, and exte
173. llocation in a multi-FPGA network.
Conquering th... of Power Consum...
by Steve Sharp, Sr. Manager, Corporate Solutions Marketing, Xilinx, Inc., steve.sharp@xilinx.com
As chip technology progresses to 90 nm and below, power becomes a burning issue in system design. At this node, leakage plays a more major role in total power; smaller interconnect geometries with new dielectric materials affect dynamic power as well. According to Jordan Selburn of market research firm iSuppli, leakage current, essentially insignificant at the 0.35-micron node and earlier, has become a major issue as transistors become increasingly leakier. Studies have shown that at the 90 nm node, leakage power can equal dynamic power consumption, and even exceed it at the 65 nm node.
Another factor facing system designers is the tighter power budgets around which they must design. This is not limited to any single type of system, but does affect most designers. Large systems with many boards or modules, as well as portable and consumer products, all face power budgeting issues. In large systems, power budgeting is typically done for the total system as well as distributed power regulation on a per-board or per-module basis. With multiple power supplies now on every board, it is not a simple task to increase the power budget for one board without affectin
174. lysis in real time, both in qualitative and computational terms, including active video surveillance, robotic arm motion and control, autonomous vehicle navigation, test and measurement, and hazard detection. The platform provides modules with all required system control logic, memory, and processing hardware, together with the application software. Interconnecting modules allow fast development of a complex architecture. The platform leverages Xilinx Spartan-3 devices, which are an optimal choice for image processing IP cores because of their flexibility, high performance, and DSP-oriented targeting. The Spartan-3 family provides a valid programmable alternative to ASICs. This characteristic, coupled with its low-cost structure, adds considerable value when time to market is crucial.
For more information about feature extraction, you can e-mail the authors at paolo.giacon@students.univr.it or saul.saggin@students.univr.it. For more information about image warping, you can e-mail matteo.busti@students.univr.it or giovanni.tommasi@students.univr.it.
We are grateful for the support from our advisor, Professor Murino, in the Vision, Image Processing, and Sound (VIPS) Laboratory in the Dipartimento di Informatica at the Università di Verona, and contributions from Marco Monguzzi, Roberto Marzotto, and Alessandro Negrente.
Designing a Spartan FPGA DDR Memory Interface
175. m your PC. After the data is written into PROM, re-supply the power or push the configuration button. Transfer test circuit data from the PROM to the FPGA for configuring the FPGA and execute the test circuit. Also, you can directly configure the bit file to the FPGA.
Nu Horizons Spartan-3 2000 Development Platform
The ideal platform to evaluate Spartan-3 1500 and 2000 FPGAs
The Nu Horizons Electronics Corp. Xilinx Spartan-3 1500/2000 development platforms are complete solutions for evaluating Xilinx Spartan-3 XC3S1500 and XC3S2000 FPGAs. Each board is a complete development environment, allowing you to evaluate multiple system designs on a single platform. For industrial and automotive applications, the system board includes the ST Microelectronics L9616 CAN 2.0B physical layer device, which, when coupled with the Bosch-certified CAN core available from Nu Horizons, will get you well on your way to developing applications with the MicroBlaze soft processor core. The external Ethernet controller allows you to create and evaluate CAN-to-Ethernet gateways. With the addition of plug-in evaluation modules from Linear Technology, you can utilize analog-to-digital converters with 10-, 12-, or 14-bit resolution and sampling rates from 10 to
176. makes life easy for designers as well as managers. For purchasing managers, the easy-to-use Excel tool enables them to accurately list the number of near-obsolete discrete logic devices that can be replaced by the most optimal Xilinx CPLD, and the cost savings justifying such an upgrade. For designers, it saves precious design time that is usually spent estimating the right PLD to replace a range of discrete logic devices. In this article, we'll present a real-life case study where Logic Consolidator helped a customer analyze the real cost benefits provided by Xilinx CPLDs over discrete logic devices, saving thousands of dollars in the process.
Customer Case Study
The Logic Consolidator tool was instrumental in convincing an initially skeptical customer to integrate all of the discrete logic on their board into a single Xilinx CPLD. Wipro Technologies, one of the biggest hardware design services companies in India and a Xilinx XPERTS partner, was working on a project aimed at upgrading an existing telecom product, mainly to cut manufacturing costs and include some enhanced features. During the initial stage, CG CoreEl's Xilinx [...] Consolidator to the customer and helped them identify the a
Figure 1: The costs of a CPLD have come down dramatically because of lower die sizes (legend: Die Cost, Test Cost, Package Cost; 10 times smaller die size through semiconductor process technology)
177. mate the integration of scopes and logic analyzers. In this article, we'll describe these tools and how you can more quickly understand digital design problems by capturing the analog characteristics of key digital signals simultaneously with a functional view.
An Example Situation
Consider a packet communication system where something is going wrong at boot-up. This system, shown in Figure 1, should be passing packet data across a serial channel to a monitor, but the data is not being received. We can take a variety of approaches to figure out what exactly is causing the problem. One approach is to view the condition of state machines in the design using a logic analyzer, with the possibility of seeing some improper states. Another approach is to probe onto the serial channel with an oscilloscope in an attempt to find some improper signals, particularly with respect to signal integrity. The best approach is to use both the oscilloscope and logic analyzer to better understand the exact target condition under which the error is occurring.
When Working Independently
When approaching the problem with a logic analyzer, a number of signals are of particular interest. We will certainly want to see what is happening with the transmit state machine drivi
178. Figure 2: The effect of driver edge rates on a 5-inch trace. The 10 ns and 5 ns edges look fine, with reflections and ringing effects appearing on the receiver waveforms for the faster 2.5 ns and 1 ns drivers. For sub-nanosecond driver switching speeds, receiver waveforms get progressively worse.
Figure 3: The HyperLynx oscilloscope and transmission-line schematic show a driver that is too weak for the parallel, low-impedance star topology. It would be better in this case to use a stronger 24 mA buffer.
parallel DC terminating resistors pulled to opposing power supplies to achieve a specific DC bias for Thevenin termination. Here are some general termination guidelines:
- Source termination is useful in point-to-point, one-directional connections.
- Far-end termination is useful in multipoint connections.
- Distributed termination can be helpful if you have a plug-in system with variable configuration.
Each of these techniques has advantages and disadvantages. Parallel DC termination is the simplest, both from the standpoint of component count (only one, built into Spartan-3 FPGAs) and the choice of value (equal to the line impedance). However, it burns the most power and may unacceptably load the driver IC. AC termination requires an additional component (more expensive, extra board space) and more engineering work (finding the optimal capacitor value) but reduces power
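For the Thevenin termination mentioned in this snippet, the two resistor values follow from two standard conditions: their parallel combination should equal the line impedance, and the divider they form should sit at the desired DC bias. The Python sketch below works that arithmetic through. The function name and the 50-ohm, 3.3 V, mid-rail example values are assumptions chosen for illustration; they are not taken from the article.

```python
def thevenin_termination(z0_ohms, vcc_volts, vbias_volts):
    """Return (r_pullup, r_pulldown, static_power_watts) for a Thevenin terminator.

    Conditions solved for:
      r_pullup || r_pulldown == z0_ohms
      vcc_volts * r_pulldown / (r_pullup + r_pulldown) == vbias_volts
    """
    if not 0 < vbias_volts < vcc_volts:
        raise ValueError("bias must lie strictly between ground and Vcc")
    r_pullup = z0_ohms * vcc_volts / vbias_volts
    r_pulldown = z0_ohms * vcc_volts / (vcc_volts - vbias_volts)
    # Power burned in the divider with no signal applied.
    static_power = vcc_volts ** 2 / (r_pullup + r_pulldown)
    return r_pullup, r_pulldown, static_power


# Hypothetical example: 50-ohm line, 3.3 V supply, bias at mid-rail.
r_up, r_down, p_static = thevenin_termination(50.0, 3.3, 1.65)
```

With these example values both resistors come out at about 100 ohms, and the divider dissipates roughly 54 mW even when the line is idle, which illustrates why DC termination schemes burn the most power.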
179. mentation. These systems provide you with user-defined I/O functionality, speed, and resolution not available from traditional box instrumentation. When implemented with high-end modular instrumentation and LabVIEW graphical programming, virtual instrumentation costs less and performs better than other commercial off-the-shelf (COTS) solutions. The NI LabVIEW FPGA module extends the capabilities of LabVIEW software to target FPGAs on NI Reconfigurable I/O (RIO) hardware. LabVIEW FPGA and RIO hardware devices bring custom timing, triggering, synchronization, and advanced counter I/O to instrumentation end users (Figure 1). Within the constraints of a COTS hardware platform, LabVIEW provides a high-level application programming interface (API) for software-to-hardware and hardware-to-hardware interconnections, so that you don't need to generate your own customized API. With NI LabVIEW and RIO hardware, you can rapidly define custom control logic and I/O lines in a LabVIEW program, called a virtual instrument (VI), without prior knowledge of low-level development tools. Simply draw your hardware design in LabVIEW, and the tool chain synthesizes the gates directly into the Xilinx FPGA (Figure 2).
Figure 1: LabVIEW block diagrams download directly to Xilinx FPGAs on NI CompactRIO
Opportunity
Cust
180. mselves in LabVIEW, enabling us to increase our productivity even further. Once we had met the objectives of creating identical semantics between LabVIEW implemented on an FPGA and LabVIEW implemented on other processing engines, we turned our attention to enhancing the language constructs to take further advantage of the fact that we are running natively on hardware. We optimized our timing with single-cycle timed loops, as the strict use of the enable chain had forced us to have a cycle delay for each function. Using the single-cycle timed loop, we can execute complete loops within one FPGA clock cycle. Customers can also integrate legacy VHDL or Verilog code.
Figure 4: LabVIEW FPGA block diagram and front panel with corresponding LabVIEW host block diagram
Automotive Application
NI alliance partner GÖPEL Electronic, a leading German PCB and electronic device testing company, created a portable measuring unit for onboard vehicle analysis and diagnostics. They used LabVIEW FPGA and CompactRIO, the latest member of the RIO hardware family, to create a portable, rugged, and extremely versatile handheld data recorder for use in test drives, in laboratories, environmental chambers, wind tunnels, and on proving grounds. The system, called CARLOS (in-CAR LOgging System), is also ideal for endurance and long-term tests, as well as vehicle calibration and diagnostics. With CARL
181. n. Underscoring the interrelationship and trade-offs between the three Ts, however, star routing introduces several new problems. Multiple branches present the driver IC with a low impedance, requiring it to dynamically sink and source significant current. In practice, you may need to use a stronger driver technology for this topology, such as a Xilinx Spartan-3 LVCMOS33_F_24mA driver instead of an LVCMOS33_F_8mA, as shown in Figure 3.
The Third T: Termination
As a general rule, any line with an edge rate faster than 5 ns on nets running longer than an inch should be considered a termination candidate. Although reducing costs (see sidebar, "Conserve Board Space with Spartan-3 FPGAs") is important, the associated signal quality benefits are critical, impacting whether the product works at all. Let's review some termination strategies for various trace topologies and design requirements.
Termination Types
The classic methods of terminating digital transmission lines are well known. You can terminate the source, the far end, or both; you can employ distributed terminations at several locations; or you can use two
Figure 1: Careful pre-layout impedance planning using the HyperLynx Stackup Planning tool helps eliminate signal reflections.
Ti
182. n 3 development board is priced at $449, which includes all of the features necessary to prototype a DSP-based system design. The board comes with a user's manual, power supply, documents, and design files. The option of getting the board bundled with ISE Foundation software, System Generator, and ChipScope Pro software is also available at an additional cost. Besides DSP designs, the Spartan-3 platform is a great tool to implement many other Xilinx reference designs. Several designs written by Nu Horizons field application engineers cover topics such as memory controllers, embedded processors, hardware-in-the-loop with digital signal processing, and system monitor design using the ADC and DAC on the board. The ADC and DAC are very powerful attributes of our low-cost board and two of the many competitive board features. The Spartan-3 platform can be expanded with the add-on ADC board from Linear Technology.
Figure 4: Spartan-3 board (board label: 1Mx16x4 SDRAM, 16 MB, ISS142S16400)
Conclusion
With fast multipliers and lower cost FPGAs, engineers now have the ideal solution to their signal and image processing requirements. With tools like System Generator, anyone can implement a powerful parallel or semi-parallel customized DSP system-on-chip design within days. The Spartan-3 board from Nu Horizons is a perfect solution for prototyping signal processing using Spartan-3 FPGAs. The b
183. n SoC NRE Costs
Figure 1: NRE costs by process geometry (process nodes: 250 nm, 180 nm, 130 nm, 90 nm; plotted: gate count and NRE charges in dollars). Source: Semico Research Corp.
Figure 2: Comparison of design and market trends (chart title: Ever-Tightening Circle, Design Time vs. Market Window; plotted in months: design complexity, design time, shrinking product life cycle). Source: Semico Research Corp.
Figure 3: Deeper volume penetration by FPGAs (unit volumes from 5,000 to 2,000,000; plotted: higher cost FPGA, low-cost FPGA, standard cell). Source: Semico Research Corp.
point, many of the gains seen through the use of programmable logic were given up; designers were back on a more traditional ASIC path, with all the problems and pitfalls previously mentioned. As shown in Figure 3, low-cost FPGAs have pushed out the point at which a standard cell or SoC solution must be employed to arrive at the absolute lowest cost. This chart also takes into consideration the total cost to create the ASIC solution (NRE charges and mask set costs, for example) as compared to the same cost to create the silicon solution using programmable logic. The benefits are numerous, particularly the ability to use a silicon solution that fits into market windows and is at the right price point.
Conclusion
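As a rough illustration of the crossover point discussed in this snippet, the volume at which an ASIC becomes cheaper than an FPGA can be estimated by equating total costs: up-front cost plus unit cost times volume. The Python sketch below does that arithmetic. All dollar figures in the usage lines are hypothetical placeholders, not data from Semico Research or Xilinx, and the function name is an assumption for illustration.

```python
def crossover_volume(asic_nre, asic_unit_cost, fpga_nre, fpga_unit_cost):
    """Unit volume at which total ASIC cost equals total FPGA cost.

    total_asic(n) = asic_nre + n * asic_unit_cost
    total_fpga(n) = fpga_nre + n * fpga_unit_cost
    Below the returned volume the FPGA is cheaper overall; above it the ASIC is.
    """
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("no crossover: the FPGA unit cost is not higher than the ASIC unit cost")
    return (asic_nre - fpga_nre) / (fpga_unit_cost - asic_unit_cost)


# Hypothetical numbers: $1M ASIC NRE and mask costs, $5 ASIC unit cost, no FPGA NRE.
higher_cost_fpga = crossover_volume(1_000_000, 5.0, 0, 25.0)  # $25 FPGA
low_cost_fpga = crossover_volume(1_000_000, 5.0, 0, 10.0)     # $10 FPGA
```

With these placeholder values, dropping the FPGA unit price from $25 to $10 moves the crossover from 50,000 to 200,000 units, which is the qualitative effect the article attributes to low-cost FPGAs.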
184. n terms of pictures. This dichotomy is most apparent in the world of DSP. Some designers prefer to develop algorithms in language form, while others choose to draw out block diagrams showing data flows. Until now, different sets of tools were required for each method, but why should you have to choose? Xilinx System Generator for DSP is well established as a productive tool for creating DSP designs using graphical methods. With a visual programming environment that leverages The MathWorks Simulink tool and its predefined Xilinx block set of DSP functions, System Generator meets the needs of both system architects to integrate design components and hardware designers to optimize implementations. For more details, see "Implementing DSP Algorithms in FPGAs" in the Winter 2004 issue of the Xcell Journal.
Figure: The integration of AccelChip DSP Synthesis and Xilinx System Generator provides a seamless path from (diagram labels: graphical-based flow; fixed-point system-level design; scalar processing; debug; verification; HDL generation)
Many DSP algorithm developers have found that the MATLAB language best meets their preferred development style. With more than 1,000 built-in functions, as well as toolbox extensions for signal processing, communications, and wavelet processing, MATLAB offers a rich
185. nalysis Tool
Building High-Performance Measurement and Control Systems with FPGAs
you explore the new offerings behind our 7.1i design tools release, from a general overview of ISE 7.1i software to detailed third-party explorations of logic analyzer setup and debug for FPGAs.
Design Performance Leaps Forward with ISE 7.1i Software
Programmable design has never been [...]er or more cost-effective.
by Lee Hansen, Sr. Product Marketing, Xilinx, Inc., lee.hansen@xilinx.com
Mike Weir, North American Direct Marketing Manager, Xilinx, Inc., mike.weir@xilinx.com
Some of the fastest tools in programmable logic have gotten even faster. Xilinx ISE 7.1i software includes new technology, new device support, and new speed files that get you through the design flow faster with the highest device performance possible. ISE 7.1i software includes the new Virtex-4 -12 speed grade, making the industry's leading-edge Virtex-4 FPGA-based designs even faster. ISE 7.1i software in combination with Virtex-4 devices can deliver up to 70% faster design performance than the nearest competing solution. More high-performance technology is packed directly into ISE software than any other design offering, including core capabilities like timing-driv
186. nalysis of total device power, power per net, routed, partially routed, or un-routed designs.
Power Management Hardware
Tools for managing power are not just of the software variety. Power management chips from National Semiconductor, Intersil, Texas Instruments, and Linear Technology are available to make the job of supplying the multiple supply voltages needed by today's FPGAs easier, and they can be valuable companions to Virtex-4 or Spartan-3 devices. Their individual capabilities are highlighted in this issue of the Xcell Journal in the following pages.
Conclusion
To conquer the key challenges of power consumption, it takes a combination of good product design, proper device technology, and tools that let you take control of system power management. Xilinx is an industry leader in power management and now offers many advantages within its programmable solutions:
- Virtex-4 FPGAs consume 1 to 5 W less power than competing 90 nm FPGAs.
- Spartan-3 FPGAs are one of the only low-cost FPGAs in the industry to eliminate power-on surge. In these devices, the maximum quiescent power alone is sufficient to guarantee device power-up.
- CoolRunner-II CPLDs are the world's lowest power CPLDs, ideal for even the most power-critical portable applications.
- Xilinx offers a comprehensive suite of power management tools, from Web Power Tools to the XPower analysis tool integrated into the ISE environment. You can
187. nce the load dynamically with the help of the work-stealing concept. First, a special processor P0 gets the search problem and starts performing the forecast algorithm as if it would act sequentially. At the same time, the other processors send requests for work to other randomly chosen processors. When Pi, a processor that is already supplied with work, catches such a request, it checks whether or not there are unexplored parts of its search tree ready for evaluation. These unexplored parts are all rooted at the right siblings of the nodes of Pi's search stack. Pi sends back either a message that it cannot perform work, or it sends a work packet (a chess position with bounds) to the requesting processor Pj. Thus, Pi becomes a master itself, and Pj starts a sequential search on its own. The processors can be master and worker at the same time. The relationship dynamically changes during computation. When Pj has finished its work (possibly with the help of other processors), it sends an answer message to Pi. The master-worker relationship between Pi and Pj is released, and Pj becomes idle. It again starts sending requests for work into the network. When processor Pi finds out that it has sent a wrong αβ window to one of its workers Pj, it makes a window message follow to Pj. Pj stops its search, corrects the window, and starts its old search from the beginning. If the message contained a cutoff, which indicates superfluous work
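The request-and-reply pattern described in this snippet can be illustrated with a heavily simplified, single-threaded simulation. This Python sketch is not the authors' search engine: it replaces the game-tree search with a flat list of independent work items, models messages as direct queue operations, and omits answer, window, and cutoff messages entirely. The function name and the round-based scheduling are assumptions made for illustration.

```python
import random
from collections import deque


def run_work_stealing(tasks, n_workers, seed=0):
    """Simulate idle workers requesting work from randomly chosen workers.

    Worker 0 plays the role of P0 and starts with the whole problem.
    Returns one list per worker holding the items that worker processed.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(tasks)
    done = [[] for _ in range(n_workers)]
    while any(queues):
        for i in range(n_workers):
            if queues[i]:
                # Busy worker: process the next item of its own work.
                done[i].append(queues[i].popleft())
            else:
                # Idle worker: send a request to a randomly chosen worker.
                victim = rng.randrange(n_workers)
                if victim != i and len(queues[victim]) > 1:
                    # The victim hands over an unexplored item from the far end
                    # of its queue (standing in for a right sibling on its stack).
                    queues[i].append(queues[victim].pop())
                # Otherwise the reply is "no work" and the worker asks again next round.
    return done


result = run_work_stealing(list(range(20)), n_workers=4)
```

The sketch preserves two properties of the scheme: every work item is processed exactly once, and any worker that has received work can in turn be asked for part of it, so the master and worker roles shift during the run.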
188. nd will be supported in the near future.
Memory Requirements
A single Spartan-3E FPGA requires roughly 600K to 6M bits for configuration, a small amount of memory relative to the large SPI capacities available today. Table 2 lists the amount of configuration memory required by each Spartan-3E device type, the minimum required SPI device size, and the extra memory that is available for other purposes. Note that it is not necessary to have a dedicated configuration memory device for each FPGA. Multiple FPGAs can share a single SPI flash PROM in a configuration daisy chain.
Interfacing to SPI
Figure 2 shows the typical connections between a Spartan-3E FPGA and SPI flash memory, as well as the FPGA I/O pins used to control the configuration procedure. When the FPGA is in SPI flash configuration mode, three dual-pur
Figure 2: Interface between Spartan-3E FPGA and SPI flash memory (diagram labels: Spartan-3E SPI Flash Interface; SPI Flash Configuration Mode; SPI Flash Vendor Select; Optional JTAG Programming Interface; 3.3V SPI Flash; pull-up resistor on SS_B only required if HSWAP_EN = 1)
Example: Reusing SPI Interface (diagram labels: DDR SDRAM; MicroBlaze; Spartan-3E; SPI flash memory; SPI peripherals: A/D converter, D/A converter)
- FPGA configures from SPI flash memory
- After configuration, the FPGA copies MicroBlaze code from SPI flash
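The sizing question raised under Memory Requirements (how large an SPI flash a daisy chain needs, and how much is left over for other data) comes down to simple arithmetic. The Python sketch below assumes that SPI flash densities come in powers of two from 512 Kbit to 128 Mbit; that list, the function name, and the bitstream sizes in the usage lines are illustrative assumptions. Actual per-device configuration sizes are the ones listed in Table 2, which is not reproduced here.

```python
# Assumed density list: powers of two from 512 Kbit (2**19) to 128 Mbit (2**27).
SPI_DENSITIES_BITS = [2 ** k for k in range(19, 28)]


def pick_spi_flash(config_bits_per_fpga, densities=SPI_DENSITIES_BITS):
    """Return (flash_size_bits, spare_bits) for the smallest flash that holds
    the configuration data of every FPGA sharing it in a daisy chain."""
    total = sum(config_bits_per_fpga)
    for size in sorted(densities):
        if size >= total:
            return size, size - total
    raise ValueError("no listed density is large enough")


# Placeholder bitstream sizes at the two ends of the range quoted in the text.
small = pick_spi_flash([600_000])
large = pick_spi_flash([6_000_000])
chain = pick_spi_flash([600_000, 6_000_000])  # two FPGAs sharing one flash
```

The spare capacity returned alongside the flash size is the space that could hold application data such as processor code, which is the reuse idea the article goes on to describe.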
189. nfiguration memory interface (Figure 1). SPI was originally pioneered as a serial communications interface between CPUs, MCUs, peripherals, and other devices supporting this protocol. Now SPI is popular in
for application data such as code for an embedded MicroBlaze processor, serial numbers, or Ethernet MAC IDs. SPI and parallel flash PROMs additionally offer random
Xilinx Platform Flash still provides an excellent solution for stand-alone, two-chip programmable logic solutions (FPGA and dedicated configuration PROM) with competitive cost-per-megabit pricing. Platform Flash also excels with features such as JTAG in-system programmability and patented compression technology. SPI flash PROMs are
Figure 1: SPI physical interface (Master, Slave)
- SPI is a four-wire synchronous serial interface
- SPI Master device communicates to one or more Slaves
- SPI Master controls all timing via the SCLK clock signal
- SPI Master selects a Slave using an active-Low select signal, SS
- All connected SPI devices share a common serial data input, output, and clock signal
Table 1: SPI flash memory families supported by Spartan-3E FPGAs (columns: SPI Serial Flash Vendor; Tested SPI Flash Family). Note: SSTZ5VExx supports only the READ (0x03) SPI read command; all others support the FAST_READ (0x0B) command.
Table 2 column headings: Spartan-3E Device; Configuration Bits; Minimum SPI Device Capacity
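The two read commands named in the Table 1 note differ in what the master clocks out before data starts to return. The Python sketch below builds the command bytes under the framing commonly used by 25-series serial flash: a one-byte opcode, a 24-bit address sent most significant byte first, and, for the fast read, one dummy byte. The 24-bit addressing, the single dummy byte, and the function name are assumptions here; the data sheet of the specific flash device is the authority.

```python
READ = 0x03       # opcode from the Table 1 note
FAST_READ = 0x0B  # opcode from the Table 1 note


def spi_read_command(address, fast=True):
    """Return the bytes a master would clock out to start a read at `address`.

    Assumes 24-bit addressing and one dummy byte after the address for the
    fast read (both assumptions; check the flash data sheet).
    """
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    addr_bytes = address.to_bytes(3, "big")
    if fast:
        return bytes([FAST_READ]) + addr_bytes + b"\x00"
    return bytes([READ]) + addr_bytes


slow_cmd = spi_read_command(0x012345, fast=False)
fast_cmd = spi_read_command(0x012345, fast=True)
```

After these bytes, the master keeps the select line low and continues clocking while the flash shifts out data starting at the given address.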
190. ng packet data generation, along with the data and related transaction IDs. We would probably also like to see what is happening over on the receive side of the design, where packets should be collected. Signals of interest there include the state machine, data received, and acknowledge IDs, which get sent back to the transmitter to let it know that data was received and to keep the data coming. [Figure 1: Probe points of interest brought out through the Agilent trace core. Diagram labels: monitor, serial packets, serial ACKs, Xilinx FPGA, Agilent trace core.] ... transmitter and receiver earlier in Figure 1. Because this link is differential and running at 400 Mbps, we used a 1.5 GHz differential active probe and accessories to let us plug in to the 0.1-inch connectors on the board. As we attach to this link and get a view of ... This design is implemented with a Xilinx Virtex-II FPGA, so we have the option to use a measurement core inside the FPGA to move our probe points around to various locations, collecting the data on the Agilent logic analyzer by using the logic analyzer-based FPGA Dynamic Probe application. We use a measurement core with four signal banks and 16 FPGA pins for visibility (Figure 1). Because the data is not being received, a reasonable starting point is to probe these points of interest, set up the logic analyzer to trigger on anything, and just see what's happ
191. nics comprising as much as 90% of the functional innovation in new vehicles, it has become even more important to choose the right devices to meet temperature, quality, and technological requirements. The Xilinx Automotive (XA) range of FPGAs, CPLDs, and configuration devices not only offers the design flexibility and time-to-market advantage associated with programmable logic devices, but also meets the extended temperature requirements and quality standards required in today's advanced automotive electronics systems. The XA device range is offered in both the extended temperature Q grade (-40°C to +125°C) and industrial I grade (-40°C to +85°C), and is qualified to the industry-recognized AEC-Q100 standard (for more information, visit www.aecouncil.com). The XA range includes the lowest power CPLD devices available (CoolRunner-II CPLDs), the lowest cost FPGAs (Spartan-3 devices), and Platform Flash configuration memory devices (Table 1). For more information, visit www.xilinx.com/automotive. [Table 1: The Xilinx Automotive family of CPLDs, FPGAs, and configuration memory devices. Column headings include density range, I/O standards supported, multipliers, block RAM (Kbits), and packages.] A Multimedia Platform for Automotive and Consumer Markets: Xilinx FPGAs and Xylon IP cores shorten design cycles and lower production costs for multimedia applica
192. nking is useful in systems that use different supply voltage levels on the same printed circuit board. Usually a mismatch of voltage levels occurs between a system processor and the devices with which it communicates. This communication may be as simple as a serial-to-parallel conversion or as integral as a processor interface to a display. Another example is the connection of a processor to an external memory card such as CF (compact flash) or SD (secure digital). Because of competition and constant price pressures, high-profile devices such as processors must continue to use advanced process technologies. These process technologies continue to lower voltage swings because of the physical properties of wafer geometries. Yet standards organizations in markets like memory cards, wireless communications, and bus architectures go through specification changes more slowly. To bridge the gap, voltage and I/O translation is a necessary component in today's architectures. Care must be taken to match input and output voltage switching thresholds for that particular voltage standard. Designers face the challenge of pasting together many logic-level standards on a variety of part types. What device can do the job and yet leave room for design upgrades or changes? Discrete devices can usually address this mismatch, but may add cost, power consumption, printed circuit board layers, and area to a design. They may have limited
193. nning on the board, interacting with the rest of the system. Then, leveraging the FPGA's reprogrammability, you can implement design changes quickly and send them back to the device on board in a matter of minutes through the FPGA programming cable. [Figure 1: ChipScope Pro logic analyzer] Debug Through Software Cores: The ChipScope Pro package of tools includes software debug cores, Core Inserter, CORE Generator software, and the integrated logic analyzer. The ChipScope Pro system is a separate option that plugs in directly to ISE design tools and comes in a 60-day evaluation version. Four software debug cores are included with ChipScope Pro tools: • ILA, for accessing and capturing logic signals up to 315 MHz (50% faster than the previous ChipScope Pro release and among the fastest debug cores available) • IBA, for embedded processor bus signal capture, protocol detection, debugging, and verifying control, address, and data buses • VIO, for setting virtual inputs like external switches, mimicking output devices like LEDs, or simulating external logic • ATC2, the advanced Agilent Technologies Trace Core 2, for linking the ChipScope Pro system to your Agilent logic analyzer and FPGA Dynamic Probe. You can quickly and easily configure and insert these low-profile cores into your FPGA logic during the design
194. nsive block and distributed memory. Spartan-3 devices can also efficiently implement many control/glue logic functions, effectively reducing the size, complexity, and cost of your system. • SD/HD color space conversion • Digital RGB to ... • USB card reader function. Managing Signal Quality: Making trade-offs at ... By Bill Hargin, Product Manager, HyperLynx, Mentor Graphics, bill_hargin@mentor.com. Driven by advances in lithography, IC switching speeds continue their progressively faster march. At the same time, escalating clock speeds result in much less forgiving timing margins. The techniques for managing high-speed effects can be broken into three broad categories, sometimes referred to as the three Ts: • Technology: selecting driver technology fast enough to meet your functional needs, but as slow as possible • Topology: selecting topologies that meet timing requirements while minimizing the impact of signal reflections • Termination: managing signal reflections using passive components. Sounds easy, right? The problem is that there are thousands of such choices to be made when designing a PCB, and you must balance these against timing requirements and electromagnetic compliance (EMC). Impedance Mismatches: Reflections can occur at any impedance discontinuity, includin
195. nsive linear algebra capabilities. Although such an algorithm could be constructed as a block diagram, doing so would obscure the algorithm structure so readily apparent in MATLAB. With AccelChip, a first step in synthesizing a complete algorithm is to generate any major cores that are referenced, in this case the matrix inverse indicated by the function call inv P_cap_est R. But you can implement a matrix inverse in many ways; the choice of which method to use depends on the size, structure, and values of the matrix. Using the matrix inverse IP core from the AccelWare toolkit, you can choose from micro-architectures designed for different applications. These micro-architectures can be optimized for speed, area, power, or noise. In this case, the most suitable approach is to use the AccelWare QR matrix inverse core. Synthesizing RTL with AccelChip: With the MATLAB M-file loaded into AccelChip, the next step is to simulate the floating-point design to establish a baseline. You would then use AccelChip to convert the design to fixed-point math, verifying it in MATLAB as shown in Figure 3. AccelChip offers an array of tools to help you trim bits from the design and verify fixed-point design effects like saturation and rounding. AccelChip aids in this process by propagating bit growth throughout the design and letting you use directives to set constraints on bit width. This algorithmic design exploration allows y
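To make the fixed-point effects mentioned above (rounding and saturation under a bit-width constraint) concrete, here is a small Python sketch. It is not AccelChip's method or API; the function `quantize`, its parameters, and the round-half-up choice are all assumptions made for illustration. It models a signed two's-complement format with `word_bits` total bits, of which `frac_bits` are fractional.

```python
import math


def quantize(value, word_bits, frac_bits):
    """Quantize a real value to a signed fixed-point format.

    Hypothetical helper for illustration only.
    Returns (quantized_value, saturated_flag).
    """
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))       # most negative integer code
    highest = (1 << (word_bits - 1)) - 1   # most positive integer code
    code = math.floor(value * scale + 0.5)  # round half up
    saturated = code < lowest or code > highest
    code = max(lowest, min(highest, code))  # clamp instead of wrapping
    return code / scale, saturated


if __name__ == "__main__":
    for sample in (0.3, 3.0, -3.0):
        print(sample, quantize(sample, word_bits=8, frac_bits=6))
```

Trimming `frac_bits` increases rounding error, while trimming integer bits makes saturation more likely, which is the area-versus-accuracy trade-off the paragraph describes.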
196. o lower power consumption through process technology improvements. This is what you would expect from a technology-driven company. But differences in this family exist that have never been employed in a reprogrammable logic device. These features deliver even lower power, higher speed, and most importantly, integrated functions that reduce system cost. The CoolRunner-II design team looked at methods to add functionality while preserving the low-cost aspect of the current CPLD market. To this end, clock doublers, clock dividers, input hysteresis, and I/O banking were designed in from a cost-per-feature standpoint. This led to extremely creative ways to design features in the smallest amount of silicon. If the feature did not reduce system cost enough to merit the die size increase, that feature was not designed into the device. With this balance of features and cost, CoolRunner-II CPLDs offer a unique advantage when compared to other devices. Cost-Reducing Features: Above and beyond the normal process technology shrinks that every silicon component goes through, CoolRunner-II CPLDs have very special features that help most designers lower the total components in their design. These component reductions can be very simple or very complex, depending on how creative you are as a designer. They range from clocking features to integration of simple logic functions to voltage and I/O level d
197. o convert an FPGA design into an ASIC ... consequently higher yields in the form of lower costs. Conversion-Free Design Conversion from Prototype to Production: With EasyPath FPGAs, you can realize a 30-80% reduction in unit prices compared to equivalent standard FPGAs when you move to high volume. Because customer-specific FPGAs are the same piece of silicon as standard FPGAs, except cheaper, they offer a one-for-one match of all the features offered in a standard FPGA. As a result, almost no involvement is required from customer engineering teams; the conversion process from design completion to production is completely seamless. Furthermore, the lead times from the time a design is frozen to volume production in customer-specific FPGAs can be as short as 8-10 weeks, an advantage of almost 12 weeks when compared to ASIC methodologies. Table 1 shows a comparison of EasyPath FPGAs versus structured ASIC methodologies, illustrating why EasyPath FPGAs are a better overall solution. Addressing ASSP Market Needs: Historically, ASSP vendors have used FPGAs as prototyping vehicles, moving to custom ASICs in high volume to take advantage of their low unit costs. This was a reasonable strategy at 0.18-micron and larger process nodes because lower NRE costs allowed companies to take the financial/market risk and go to market with multiple products. As the demands of system integration, performance, and power hav
198. o fix minor design bugs by changing LUTs and I/Os even after they are in high-volume production ... for certain I/Os. This new level of customer-specific flexibility offers a unique blend of FPGA-like features at ASIC-like prices. Another important issue is the availability of qualified, reliable IP. ASSP vendors can take advantage of the vast portfolio of IP offered by FPGA vendors during the prototyping phase. When the design is ready for production, you can use the same IP without any additional charges or qualification effort. ASIC solutions, on the other hand, require ASSP vendors to either maintain their own portfolio of state-of-the-art IP or incur large expenditures in acquiring and qualifying standard IP blocks. Case Study: Consumer Electronics. Companies developing consumer electronic products increasingly find themselves having to incorporate more and more functionality while maintaining lower costs. In no other area are these trends more evident than in home networking, which involves the aggregation and distribution of content within a home through wireless modems, cable, xDSL, and satellite set-top boxes, among others. In these applications, the ability of chipset vendors to integrate various standards and encrypt/decrypt data streams is critical. Xilinx Spartan-3 FPGAs and their EasyPath analogs support multiple I/O standards such as Ethernet, IEEE 1394, USB 2.0, and PCI, allowing you to dev
199. oard: BenNUEY. The BenNUEY motherboard features a Xilinx Virtex-II FPGA and module sites for as many as six additional FPGAs. The PCI control and low-level drivers abstract the PCI interfacing, resulting in a simplified design process for designs/applications. Virtex-II Expansion Module: BenBlue II. The BenBlue II DIME II module provides a substantial logic resource, ideal for implementing applications that have a large number of processing elements. Through support for as many as two on-board Xilinx XC2V8000 FPGAs, the BenBlue II can provide more than 200,000 logic cells on a single module. A Virtex-II Pro version of this module is also available, providing two XC2VP100 devices. Multi-FPGA Management: DIMEtalk. To manage the large silicon resource pool provided by the hardware, the DIMEtalk tool accelerates the design flow for creating a reconfigurable data network by providing a communications channel between FPGAs and the host user environment. FUSE Tcl/Tk Control and C APIs: FUSE is a reconfigurable operating system that allows flexible and scalable control of the FPGA network directly from applications using the C development API, which is complemented by a Tcl/Tk toolset for scripting-based control. [Figure residue: sequence data GCAGTTGCA; port labels Data In, Data Out, INIT] Merging Algorithms and Networks: The design flow consists of two interdependent tasks: • Capture of the Smith-Waterman
200. oard has all of the interfaces necessary to create designs for wireless and digital imaging applications if you want to: • Integrate your logic and signal processing capability on a single chip • Prototype a configurable DSP system-on-chip • Reduce the cost of conventional serially implemented external signal processors • Improve system performance. Nu Horizons is also in the process of releasing a Virtex-4 platform board for customers requiring higher density logic, more memory, and hard MACs running up to 500 MSPS for high-performance interfaces to video and imaging applications. For all of the designs and related documentation for the Spartan-3 board, visit the Nu Horizons website at www.nuhorizons.com/sp3. [Figure 5: Spartan-3 XC3S1500/2000 board block diagram. Labels include: 10/100 Ethernet PHY (ICS1893BF), 10/100 Ethernet I/F (SMSC91C111), test LEDs, push buttons, CAN I/F (STL9616).] CoolRunner ... Marketing Manager, Xilinx, Inc. ... 64-macrocell CPLDs are now available ... Since joining the CPLD market in 1992, Xilinx has gone from being the new kid on the block to the second-largest CPLD supplier and is closing the gap on the number one position in both volume and sales dollars. Growth has come from listening to customer input and creating innovative ideas to satisfy
201. of color depths, 1 to 24 bits per pixel (bpp); TFT, STN, and other flat panel displays; 8- to 32-bit memory interface; overlays; multiple display support • Acceleration: ROP2 operation, block fill, transparency, color expansion, pattern operations • Memory interface: flexible multi-port memory controller; unified memory architectures for cost-sensitive applications; priority, round-robin, and mixed port arbitration policy; configurable data width for access port and configurable memory bus • Video input: YUV, RGB, and de-interlace; picture position; picture cropping; picture scaling using multi-pass bi-linear interpolation; multiple video input source support • Communication: CAN controller and UART improved for LIN bus. A Low-Cost PCI Express Solution: Spartan FPGAs are ideal for next-generation PCI applications and systems. By Abhijit Athavale, Product Marketing Manager, Xilinx, Inc., abhijit.athavale@xilinx.com. PCI has been the most widely used bus standard in the PC, server, and embedded markets for the past decade. Because PCI is limited by its shared, central arbitration-based architecture and system-synchronous clocking scheme, current and next-generation processors are outstripping its ability to keep up. PCI's emerging replacement is PCI Express, a new connectivity standard that preserves the flexibility and familiarity of PCI while dramatically increasing
202. of what DualEDGE flip-flops look like. The clock divider feature (Figure 1) is another way to eliminate extra clock sources while again preserving valuable I/O pins. The neat thing about the CoolRunner-II clock divider circuit is that it actually improves duty cycle. For instance, if you have a 60/40 duty cycle on an incoming clock source, the internal divider circuitry actually performs duty-cycle correction, so internal to the CPLD the divided clock will have a 50/50 duty cycle. This may help circumvent clock skew problems and again eliminate multiple oscillator inputs. The clock divider is not simply a divide-by-two; it has eight different divide-by settings (see Figure 2). So if you use the clock divider in conjunction with DualEDGE flip-flops, you can generate multiple combinations of clock frequencies. And for those troublesome odd clock frequency problems, just use an even divide-by setting that yields an odd number once DualEDGE clocking doubles the rate (for example, 6, 10, or 14) to get even more clock choices. [Figure 1: Xilinx CoolRunner-II clock doubler (DualEDGE register)] [Figure 2: Xilinx CoolRunner-II clock divider] Input Pin Features: Because of the low-power nature of CoolRunner-II CPLDs, the design team believed that there may be portable applications that go into noisy environments. This led to the idea of integrating hysteresis on input pins. This single feature, available on all CoolRunner-II CPLDs, adds
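The arithmetic behind the odd-frequency remark can be checked with a short Python sketch. The list of eight divide-by settings (2 through 16 in steps of 2) is an assumption here, since the excerpt only says there are eight and gives 6, 10, and 14 as examples; the function name is hypothetical. DualEDGE clocking acts on both clock edges, so the effective divide ratio seen by the logic is half the divider setting.

```python
from fractions import Fraction

# Assumed divide-by settings; the excerpt confirms only that there are
# eight of them and that 6, 10, and 14 are among them.
DIVIDE_SETTINGS = (2, 4, 6, 8, 10, 12, 14, 16)


def effective_divisors(dual_edge=True):
    """Effective divide ratio for each setting.

    With dual_edge=True the registers act on both clock edges,
    which halves the effective divide ratio.
    """
    factor = 2 if dual_edge else 1
    return [Fraction(setting, factor) for setting in DIVIDE_SETTINGS]


if __name__ == "__main__":
    input_mhz = 60  # illustrative input clock, not from the article
    for setting, ratio in zip(DIVIDE_SETTINGS, effective_divisors()):
        print(f"divide by {setting:2d} with DualEDGE -> "
              f"effective /{ratio} = {float(input_mhz / ratio):.2f} MHz")
```

Under these assumptions, settings 6, 10, and 14 give effective ratios of 3, 5, and 7, which matches the examples in the text.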
203. [Figure 2: Using NI LabVIEW saves time and money by eliminating HDL coding complexity and providing an easy-to-use GUI interface. Diagram labels: Challenges (...om measurement and control hardware, HDL, integration); Goal: ease development using COTS hardware and LabVIEW.] Xilinx FPGAs for Virtual Instrumentation: Several measurement and control instrumentation vendors already use FPGAs because of their performance, fast development time when compared to other processing technologies, and vendor reconfigurability. However, until recently this functionality has been out of reach for many instrumentation users. Most of the growth experienced by Xilinx FPGAs in the measurement and ... The intuitive graphical nature of LabVIEW also makes it a unique tool for prototyping Xilinx FPGA designs. You can iteratively change your design blocks in LabVIEW and see these changes quickly converted into actual FPGA behavior. Additionally, you can integrate previously developed VHDL or Verilog code within the LabVIEW block diagram using an HDL input node. Using the HDL input node, you can develop low-level algorithms in HDL and represent each algorithm as a single function block within the LabVIEW environment. This function block can be reused throughout the application or saved for other designs. LabVIEW also provides a thin driver-level API to interface RIO hardware to the rest of a measurement and control system, simplifying driver devel
204. onference, Munich, Germany; The MathWorks Model-Based Design Conference, Paris, France; June 21: Mentor Graphics EDA Tech Forum, Reading, UK; June 23: Mentor Graphics EDA Tech Forum, Paris, France. (Dates listed without a matching event in the extracted text: June 8-9, June 13-14.) ... By Mike Trimborn, LabVIEW Embedded Product Manager, National Instruments, mike.trimborn@ni.com. National Instruments has customized off-the-shelf FPGA hardware using graphical development tools. NI has leveraged industry-standard computers, the Internet, and cutting-edge technology to help engineers and scientists build test, measurement, and control solutions. We are a pioneer and leader in virtual instrumentation, a concept that has changed the way engineers and scientists approach measurement. Virtual instrumentation increases pro ... integrate software such as the NI LabVIEW graphical development environment and modular hardware such as PCI and PXI modules for data acquisition, instrument control, and machine vision. NI and Xilinx have collaborated to take virtual instrumentation one step further by bringing the flexibility and performance of FPGAs to measurement and control I/O. NI LabVIEW for Virtual Instrumentation: Virtual instrumentation merges software tools with modular hardware to bring you flexible, programmable, and configurable instru
205. only 192 logic cells (LC) within the Spartan-3/3E architecture. As shown in Figure 2, this is possible only because the register stack, scratchpad memory, program counter stack, and program ROM are constructed using the FPGA memory resources. A 4-input LUT in a SLICEM can also be configured in the SRL16E mode. This provides a 16-bit shift register with clock enable in addition to the dedicated flip-flop, as shown in Figure 3. When writing to this shift register, the new data is always placed in location 0, and all other data moves along by one location. However, data can be read from any location using the address inputs A[3:0]. The SRL16E provides a very cost-efficient way to implement a delay line. For example, Figure 4 shows how one LUT can be used to generate a 5-cycle delay for the data. Simply apply the data to the SRL16E data input, hard-code the address pins at 0100, and read the data out from that address. Synthesis tools can implement this technique automatically for anything up to a 16-cycle delay in each LUT. [Block diagram labels: 16 registers; scratchpad memory (64 bytes); 18-bit instruction word; port address control; IN_PORT[7:0], OUT_PORT[7:0], PORT_ID[7:0], READ_STROBE, WRITE_STROBE; ALU operations ADD, SUB, shift, rotate, AND, OR, XOR; ZERO and CARRY flags; interrupt and shadow flags; 8-bit data path; 8-bit port address; 10-bit program a ...]
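The delay-line behavior described above can be modeled in a few lines of Python. This is a behavioral sketch only, not HDL and not a Xilinx library model; the class and function names are hypothetical. It follows the description in the text: new data always enters location 0, older data moves along by one location per enabled clock edge, and any of the 16 locations can be read through a 4-bit address.

```python
class SRL16E:
    """Behavioral model of a 16-location shift register with clock enable."""

    def __init__(self):
        self.cells = [0] * 16

    def clock(self, data_in, enable=True):
        # On an enabled clock edge, new data lands in location 0 and
        # everything else moves along by one location.
        if enable:
            self.cells = [data_in] + self.cells[:15]

    def read(self, address):
        # Addressed read of any location, A[3:0].
        return self.cells[address & 0xF]


def latency_in_edges(address):
    """Count clock edges until a value on the data input is visible
    at the output for a fixed address."""
    srl = SRL16E()
    srl.clock(1)          # first edge captures the marker value
    edges = 1
    while srl.read(address) != 1:
        srl.clock(0)
        edges += 1
    return edges


if __name__ == "__main__":
    print(latency_in_edges(0b0100))  # address used in the text's example
    print(latency_in_edges(0b1111))  # deepest location
```

In this model a value becomes visible at address A after A + 1 clock edges, so the hard-coded address 0100 corresponds to the 5-cycle delay in the text and address 1111 to the 16-cycle maximum.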
206. ons, all using even fewer pins on the FPGA. The advanced technology contained within the ATC2 core and FPGA Dynamic Probe isn't available in other FPGA or ASIC real-time verification solutions. The FPGA Dynamic Probe measures new groups of internal FPGA signals in seconds, without forcing you to recompile your design and with no impact on design timing. It achieves wider internal visibility over a fixed number of pins, with 64 internal probe points for every pin, helping to conserve FPGA resources for your design. It eliminates error-prone and time-consuming tasks with features like automated signal/bus labeling from FPGA design to logic analyzer and mapping FPGA pins from board layout to logic analysis channels. Figure 2 shows an example setup using an Agilent logic analyzer, FPGA Dynamic Probe, and ChipScope Pro software in a target system. [Figure 2: ATC2 core, FPGA Dynamic Probe, and ChipScope Pro system. Diagram labels: FPGA, Xilinx Parallel Cable.] ATC2 can support 2, 4, 8, 16, 32, or 64 banks, with each bank supporting signal widths from 4 to 128. Using optional 2X TDM in state debug mode, signal counts can be doubled, offering as many as 8,192 signals for debug in one system. And you can change input banks on the fly using the JTAG programming interface. ATC2 also operates in either state mode, for functional debug in one time domain with the widest signal capture, or timing mode, for making measurements
207. oot up, we see a flurry of activity for a second or so, where there appear to be unexpected glitches or narrow pulses in infinite persistence mode. Fortunately, we can trigger on a narrow pulse and set up the scope to do that for a more detailed view (Figure 2). The resultant capture upon boot-up is a single-shot view of a narrow pulse. This is highly interesting and reveals a definite problem in the system, but we're still left scratching our heads as to what's going on. How About a Little Cooperation? What if we cross-trigger the logic analyzer when we catch this narrow pulse on the oscilloscope? We could see what condition the logic is in when this situation presents itself on the serial channel. It turns out this is a relatively easy thing to do because of application software in the logic analyzer, enabled by the E5850A time correlation fixture, created to automate the process. By time-correlating the logic analyzer and oscilloscope and pulling the oscilloscope capture into the logic analyzer, we should be able to get a very helpful view to track down the problem. [Figure 2: Scope trigger on a serial channel glitch and single-shot capture right out of reset] Time-Correlating the Instruments: Time correlation is accomplished by double-probing a common reference signal with both the logic analyzer and oscilloscope. Common BNC cables are placed between one instrument's trigger out connector and
208. opment. Therefore, you can easily integrate existing instrumentation, motion, vision, or other COTS hardware into one complete system. ... control industry has been among instrumentation vendors rather than their customers. NI has taken the next logical step, bringing FPGA technology to instrumentation users by using Xilinx Virtex-II FPGAs on NI RIO hardware to merge LabVIEW's rapid development time with the performance and flexibility of Virtex-II devices (Figure 3). An obstacle LabVIEW overcomes is that HDL software knowledge, while common among hardware designers, is not as common among measurement and control users who need flexible architectures for custom I/O, timing, and control logic. Chip-to-chip interconnects require knowledge of low-level timing and communication protocols between components. Furthermore, software and hardware systems often require the design of low-level drivers and APIs to interface a custom FPGA board to the rest of a measurement and control system. This requirement forces you away from the GUI approach used for most other tasks. Such cumbersome custom development can require weeks or months and cost in excess of $100,000. With the combined use of LabVIEW and Virtex-II FPGAs, you can successfully avoid these delays and extra costs. Another consideration is that typical NI data acquisition products address a broad range of bu
209. ories can be fully verified using memory-specific verification tools. Using Formality in a Xilinx Design Flow: Formality, the proven equivalency checking tool from Synopsys, allows you to verify designs 10-100X faster than simulation-based techniques, and to identify and correct logic errors 5-10X faster because of its graphical schematic debug capabilities. This performance advantage enables you to run Formality at every stage of the implementation flow and catch errors when they are first introduced, greatly reducing the cost of correcting them. Functional verification using Formality is the clear winner in terms of performance and error isolation. Today's very large programmable devices may contain hundreds of thousands of logic cells, making equivalency checking a mandatory component of the modern competitive design flow. Synopsys and Xilinx have created the Formality EC flow for Xilinx FPGA designs, shown in Figure 4. [Figure 4: Formality, DC FPGA, and Xilinx verification flow. Labels: Xilinx ISE; Formality-compatible libraries; Formality step 1 of 2 (RTL vs. post-synthesis); Formality step 2 of 2 (post-synthesis vs. post-P&R).] Design discrepancies can result from the numerous transformations that take place during synthesis and in ISE software (NGDbuild, MAP, and PAR). These transformations may change the design to improve timing
210. ors for precise power delivery. What will happen when your FPGA and PCB designs meet? Integrated Systems: The potential for disaster is enormous if your designers work independently of one another. Mentor Graphics eliminates that potential with the only integrated systems design solution in EDA. We have superior design solutions in both PCB and FPGA, so it stands to reason that we'd create the only truly integrated system flow. Our unique class of tools empowers designers to concurrently design FPGAs and PCBs. Increase your system performance and design team productivity. Get the systems integration white paper at www.mentor.com/techpapers or call 800.547.3000. © 2004 Mentor Graphics Corporation. All Rights Reserved. Mentor Graphics is a registered trademark of Mentor Graphics Corporation. Introducing the new Spartan-3E family, the world's lowest-cost FPGAs. Amazing product. Incredible price. An industry first! Wow! A 90nm platform FPGA loaded with features for just $2.00. The new Spartan-3E devices, optimized for gate-centric designs, are perfect for consumer digital apps and so much more. Reducing the previous unit cost benchmark by over 30%, Spartan-3E FPGAs make it easier than ever to replace your ASIC with a more flexible, faster-to-market solution. Get the most for your money: With Spartan-3E FPGAs, two dollars buys a lot: 00K gates, the lowest cost per logic cell, Ea
211. ortant stage in many applications of image analysis, as well as some common applications of computer vision, such as view synthesis, image mosaicing, and video stabilization in a real-time system. In this article, we'll present an FPGA implementation of these algorithms. Feature Extraction Theory: In many computer vision tasks we are interested in finding significant feature points, or more exactly, the corners. These points are important because if we measure the displacement between features in a sequence of images seen by the camera, we can recover information both on the structure of the environment and on the motion of the viewer. Figure 1 shows a set of feature points extracted from an image captured by a camera. Corner points usually show a significant change of the gradient values along the two directions (x and y). These points are of interest because they can be uniquely matched and tracked over a sequence of images, whereas a point along an edge can be matched with any number of other points on the edge in a second image. The Feature Extraction Algorithm: The algorithm employed to select good features is inspired by Tomasi and Kanade's method, with the Benedetti and Perona approximation, considering the eigenvalues α and β of the image gradient covariance matrix. The gradient covariance matrix is given by $C = \begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix}$, where $I_x$ and $I_y$ denote the image gradients in the x and y dire
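As a software illustration of the eigenvalue test described above, the following Python sketch builds the gradient covariance matrix over a small window and keeps a point only if the smaller eigenvalue exceeds a threshold, which is the Tomasi and Kanade style criterion. It is not the article's FPGA implementation and does not reproduce the Benedetti and Perona approximation; the window size, the central-difference gradients, the threshold, and all function names are assumptions chosen for illustration.

```python
import math


def gradient_covariance(image, row, col, half=1):
    """Sum Ix^2, Ix*Iy, Iy^2 over a (2*half+1)^2 window around (row, col).

    Gradients use central differences (an assumption of this sketch).
    The caller must keep the window at least one pixel inside the image.
    """
    a = b = c = 0.0
    for r in range(row - half, row + half + 1):
        for q in range(col - half, col + half + 1):
            ix = (image[r][q + 1] - image[r][q - 1]) / 2.0
            iy = (image[r + 1][q] - image[r - 1][q]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c


def eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    dev = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + dev, mean - dev


def is_corner(image, row, col, threshold, half=1):
    """Accept the point if the smaller eigenvalue exceeds the threshold."""
    _, smallest = eigenvalues(*gradient_covariance(image, row, col, half))
    return smallest > threshold


if __name__ == "__main__":
    # 10x10 test image: a bright square occupying rows and columns 5..9.
    img = [[10 if r >= 5 and q >= 5 else 0 for q in range(10)]
           for r in range(10)]
    print(is_corner(img, 5, 5, threshold=50))  # at the square's corner
    print(is_corner(img, 7, 5, threshold=50))  # along a straight edge
```

Along a straight edge only one gradient direction is active, so one eigenvalue stays near zero and the point is rejected; at a corner both eigenvalues are large.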
212. oth hardware and software performance data. The environment then provides tools to visualize and analyze the data, reducing the effort required to identify inefficiencies in the design. Figure 1 shows a bus activity graph, a visualization tool to help identify bus congestion and bottlenecks in the system design. Triton Tuner also links hardware and software events, providing a key tool in determining causal relationships between hardware and software systems. This feature greatly simplifies the task of evaluating and optimizing system performance. Tuner also provides tools to optimize the memory structure. This can greatly reduce the need for costly caches and other high-speed memories. Triton Builder is an automated partitioning tool, which simplifies the task of accelerating the performance of processor-based algorithms. With Triton Builder, you can easily offload compute-intensive loops and functions from software to hardware. The tool automatically creates a DMA (direct memory access)-based hardware accelerator to perform the offloaded task. All hardware and source code modifications are also created to greatly simplify the repartitioning process. With the builder tool, you can control a number of key parameters with which to optimize the solution to your system requirements. Together, these tools provide a system design and optimization environ
213. otprint (30 mm x 65.5 mm) • Complete System On A Module • Supports MicroBlaze Processor • 400K Gate Spartan-3 FPGA • 10/100 Ethernet MAC and PHY • SRAM and Flash Memory • 76 User I/O. Small Footprint (30 mm x 65.5 mm) • Complete System On A Module • Supports PowerPC Processor • Based on the Virtex-4 FX12 FPGA • 10/100/1000 Ethernet Port • DDR and Flash Memory • 76 User I/O. Visit www.memec.com/xilinx/minimodules. Copyright 2005 Memec, LLC. All rights reserved. Logos are owned by their proprietors and used by Memec with permission. All company and product names may be trademarks of their respective companies.
214. ou to attain the ideal quantization that minimizes bit widths while managing overflows or underflows, allowing early trade-offs of silicon area versus performance metrics. [Figure 3 panels: True signals; Signals with additive noise; Filtered signals.] Figure 3: The Kalman filter constructs estimates of the three signals based on input signals corrupted by noise. Once you have attained suitable quantization, the next step is to generate RTL for your target Xilinx device. At this point, you can use the AccelChip GUI to set constraints on the design using the following design directives to achieve further optimizations: • Rolling/unrolling of FOR loops • Expansion of vector and matrix additions and multiplications • RAM/ROM memory mapping of vectors and two-dimensional arrays • Pipeline insertion • Shift register mapping. Using these directives constitutes hardware-based design exploration, allowing the design team to further improve quality of results. In synthesizing the RTL, AccelChip evaluates the entire design and schedules the entire algorithm, performing necessary boundary optimization in the process.
215. output current by using the voltage drop across the upper MOSFET's rDS(ON). The linear regulator outputs are monitored through the FB pins for undervoltage events. For a copy of Power Management Application Guide for Xilinx FPGAs, visit www.intersil.com/Xilinx/index.asp. Intersil: High Performance Analog. POWER PLAY: Spartan-3 FPGA Power Management Solutions. [Figure: dual-output regulator schematic and plot of efficiency versus load current.] Linear Technology DC/DC converters power Spartan-3 devices. Linear Technology offers a broad range of regulators to simplify the design and selection of DC/DC converters to power the Xilinx Spartan-3 family of FPGAs. Our converters range from low-dropout linear regulators to sophisticated dual-output switching controllers that offer high efficiency and on-chip power supply tracking. Linear's free SwitcherCAD software allows quick and simple simulation of power supply designs. To download SwitcherCAD software, please visit www.nuhorizons.com/linear. www.linear.com. POWER PLAY: Versat
216. plement a FIFO. The data path would normally be several LUTs wide. Synthesis Tools Support SRL16E Mode. The following code describes the full functionality of an SRL16E, and yet it will be automatically reduced to an SRL16E by the synthesis tools. The entire code is thus implemented using one look-up table.
    Storage: process (clk)
    begin
      if clk'event and clk = '1' then
        -- enable only applies to the input path
        if input_enable = '1' then
          store0_data  <= data_in;
          store1_data  <= store0_data;
          store2_data  <= store1_data;
          store3_data  <= store2_data;
          store4_data  <= store3_data;
          store5_data  <= store4_data;
          store6_data  <= store5_data;
          store7_data  <= store6_data;
          store8_data  <= store7_data;
          store9_data  <= store8_data;
          store10_data <= store9_data;
          store11_data <= store10_data;
          store12_data <= store11_data;
          store13_data <= store12_data;
          store14_data <= store13_data;
          store15_data <= store14_data;
        end if;
        -- select only applies to the output path
        case output_select is
          when "0000" => data_out <= store0_data;
          when "0001" => data_out <= store1_data;
          when "0010" => data_out <= store2_data;
          when "0011" => data_out <= store3_data;
          when "0100" => data_out <= store4_data;
          when "0101" => data_out <= store5_data;
          when "0110" => data_out <= store6_data;
          when "0111" => data_out <= store7_data
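The behavior of the listing above can be checked with a small software model. The following Python sketch is illustrative only: the class name and method are hypothetical, the full 16-entry output select is assumed (the listing is cut off after "0111"), and the output is modeled as registered because the case statement sits inside the clocked process.

```python
class ShiftRegister16:
    """Behavioral model of a 16-stage shift register with an addressable tap.

    Hypothetical helper for illustration; it mirrors the VHDL listing, where
    input_enable gates only the shift and output_select picks the tap.
    """

    DEPTH = 16

    def __init__(self):
        self.stages = [0] * self.DEPTH  # store0_data .. store15_data
        self.data_out = 0

    def clock(self, data_in, input_enable, output_select):
        """Apply one rising clock edge and return the new data_out."""
        # VHDL signal semantics: every right-hand side is sampled before
        # any signal is updated, so the tap sees the pre-edge contents.
        selected = self.stages[output_select]
        if input_enable:
            self.stages = [data_in] + self.stages[:-1]
        self.data_out = selected
        return self.data_out


srl = ShiftRegister16()
for value in (1, 2, 3, 4, 5):
    srl.clock(value, input_enable=True, output_select=0)
held = srl.clock(0, input_enable=False, output_select=2)
```

After five enabled clocks the stages hold 5, 4, 3, 2, 1 from the newest entry down; a sixth clock with the enable low leaves them untouched and returns the entry at tap 2.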
217. plementation netlist(s) to ensure that your final ASIC matches the functionality proven in the FPGA prototype, as shown in Figure 3. The benefits of EC also apply when you intend to use the FPGA as the final implementation of your design. The ability to quickly and exhaustively prove the correspondence between your reference RTL, the post-synthesis netlist, and the post-PAR netlist significantly reduces simulation time or multiple programming iterations, neither of which can conclusively prove the equivalence between various versions of your design. [Figure 3 labels: ASIC Implementation; Synthesis (DC or other); Auto Setup; Physical Design. Automated setup supported in DC, DC FPGA, and Xilinx tools.] Figure 3: Verifying an ASIC implementation against an FPGA-prototyped implementation. Block-level design elements like digital clock managers (DCMs) and block RAMs, inherent features of the Xilinx architecture, pose interesting challenges in FPGA designs. When they are instantiated, as is common in FPGA prototyping flows, they are treated as black boxes for synthesis and EC purposes. The contents of a black box are not verified, but the signals at the periphery of the black-boxed element are proven to be equivalent. RAMs can also be inferred from the RTL. These inferred RAMs can be directly verified by Formality as long as the inferred RAM does not become excessively large. Large mem
218. polynomial. To store and read the gradient values, we use a buffer implemented using a Spartan-3 block RAM. 2. Calculation of the characteristic polynomial value. This value is important to sort the features related to the specific pixel. We implemented the multiplications used for the characteristic polynomial calculus employing the embedded multipliers on Spartan-3 devices. 3. Feature sorting. We store computed feature values in block RAM and sort them step by step by using successive comparisons. 4. Enforce minimum distance. This is done to keep a minimum distance between features; otherwise we get clusters of features heaped around the most important ones. This is implemented using block RAMs, building a non-detect area around each most important feature, where other features will not be selected. Spartan-3 Theoretical Performance. The algorithm is developed for gray-level images at different resolutions, up to 512 x 512 at 100 frames per second. The resources estimated by Xilinx System Generator are: • 1,576 slices • 15 block RAMs • 224 LUTs • 11 embedded multipliers. The embedded multipliers and extensive memory resources of the Spartan-3 fabric allow for an efficient logic implementation. Applications of Feature Extraction. Feature extraction is used in the front end for any system employed to solve practical control problems, such as autonomous navigation and systems that could rely on vision
219. ppropriate CPLD for replacing multiple discrete devices on board ... sales team introduced Logic ... The tool is very intuitive. It saves a lot of time in finding the right CPLD to replace all of the discrete logic. Figure 2 shows a screenshot of the Logic Consolidator tool as used by the customer. In this case, Wipro was able to identify 73 discrete logic devices that could be replaced by a single Xilinx CPLD. They could choose either an XC9500XL device or a CoolRunner-II device, depending on the supply voltage available. Both options offered an immediate bill of materials cost savings ($0.74 with the XC95288XL device and $1.81 with the XC2C256 device), with about 40% more room available in the CPLD to add more logic for any feature enhancement. When Wipro engineers actually implemented the design, because of optimizations performed by the Xilinx ISE software tool, the design fitted in the next smallest Xilinx device, the XC95144XL-10TQ144C, with 144 macrocells. With a listed price of $6.45 for the XC95144XL device, it provided further cost savings for the customer. On the PCB front, the potential savings are approximately more than 70% in board real estate, 60% or more in board routing, and a 50% reduction in board design time. Add to this the benefits of programmability, reduced board complexity, and increased reliability through reduced power consumption and lo
220. [Block diagram labels: Compressed Tokens Write; Compressed Tokens Read; Bitstream Packager; DMA Buffer Controller; CPU Status; Data; Command; Decode Tables Write; Oscillators; Programming Interface.] HIGH-VOLUME SOLUTIONS. Implementing DSP Algorithms Using Spartan-3 FPGAs. This article presents two case studies of FPGA implementations for commonly used image processing algorithms: feature extraction and digital image warping. By Paolo Giacon, Graduate Student, Università di Verona, Italy, paolo.giacon@students.univr.it; Saul Saggin, Undergraduate Student, Università di Verona, Italy, saul.saggin@students.univr.it; Giovanni Tommasi, Undergraduate Student, Università di Verona, Italy, giovanni.tommasi@students.univr.it; Matteo Busti, Graduate Student, Università di Verona, Italy, matteo.busti@students.univr.it. Computer vision is a branch of artificial intelligence that focuses on equipping computers with the functions typical of human vision. In this discipline, feature tracking is one of the most important pre-processing tasks for several applications, including structure from motion, image registration, and camera motion retrieval. The feature extraction phase is critical because of its computationally intensive nature. Digital image warping is a branch of image processing that deals with techniques of geometric spatial transformations. Warping images is an imp
221. proach may increase the board component count, the combined cost of an 8-bit microcontroller and CPLD is lower than that of a 16-bit microprocessor. This approach also allows the board to be adapted to suit many customer configurations through the CPLD, providing the ability to interface to many different devices under control. The Xilinx CoolRunner-II family of CPLDs are not only easy to ... interfacing to aid with motor control, which can enable you to use a lower cost basic microcontroller without PWMs. Interfacing and Bridging. With the increasing number of bus protocols supported by ASSP and microcontroller vendors, interface translation is a growing problem that requires an inexpensive and easy solution. Offering the lowest cost method for translating one bus protocol into another, CPLD devices can support interface bridging applications, including voltage level shifting (3.3V in to 1.8V out), bus translation applications (translating a proprietary language to an industry-standard language), serial-to-parallel and parallel-to-serial bus conversions, and DDR memory interfacing. Figure 2: DDR memory interfacing and level shifting using CPLDs. Figure 2 shows how CPLDs can be used to interface to high-speed memory and also shows an example of a CPLD level shifting between different voltage levels. CoolRunner-II CPLDs are
222. r second with 512 x 512 images. Theoretical results show a boundary of 360 fps in a Spartan-3-based system. Applications of Image Warping. Image warping is typically used in many common computer vision applications, such as view synthesis, video stabilization, and image mosaicing. Image mosaicing deals with the composition of a sequence or collection of images after aligning all of them with respect to a common reference frame. These geometrical transformations can be seen as simple relations between coordinate systems. By applying the appropriate transformations through a warping operation and merging the overlapping regions of a warped image, we can construct a single panoramic image covering the entire visible area of the scene. Image mosaicing provides a powerful way to create detailed three-dimensional models and scenes for virtual reality scenarios based on real imagery. It is employed in flight simulators, interactive multi-player games, and medical image systems to construct true scenic panoramas or limited virtual environments. [Figure: Forward Mapping, from source image I to warped image Iw.]
    INPUT: source image I
    For every y from 1 to height(I)
      For every x from 1 to width(I)
        Calculate x'; u = round(x')
        Calculate y'; v = round(y')
        If 1 <= u <= width(Iw) and 1 <= v <= height(Iw)
          Copy I(x, y) to Iw(u, v)
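The forward-mapping loop above translates almost line for line into software. The Python sketch below is a minimal illustration under stated assumptions: images are lists of rows, indices are 0-based rather than the 1-based indices of the pseudocode, the function and parameter names are hypothetical, and rounding is done with floor(x + 0.5) to avoid Python's round-half-to-even behavior.

```python
import math


def forward_map(source, transform, out_width, out_height, fill=0):
    """Warp `source` by pushing each source pixel to its transformed location.

    `transform(x, y)` returns the real-valued destination (x', y').
    Destination pixels that no source pixel lands on keep the `fill` value.
    """
    warped = [[fill] * out_width for _ in range(out_height)]
    for y, row in enumerate(source):
        for x, pixel in enumerate(row):
            x_prime, y_prime = transform(x, y)
            u = math.floor(x_prime + 0.5)  # nearest integer column
            v = math.floor(y_prime + 0.5)  # nearest integer row
            if 0 <= u < out_width and 0 <= v < out_height:
                warped[v][u] = pixel
    return warped


# Example: shift a 2 x 2 image one pixel to the right.
image = [[1, 2],
         [3, 4]]
shifted = forward_map(image, lambda x, y: (x + 1, y), out_width=3, out_height=2)
```

Forward mapping can leave holes in the destination when the transformation enlarges the image, which is visible here as the untouched first column.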
223. r usage and control challenging system designers today: static power, dynamic power, and in-rush power. Each presents different issues and requires different methods to calculate and manage power. Static power is the power consumed by a device when it is in its quiescent condition, with no input signals being exercised. It is also referred to as steady-state or standby power. In today's 90 nm technology devices, leakage currents in the transistors are the biggest contributors to static power. This is usually the key parameter of concern to designers of portable equipment because of its effect on battery life, especially for devices that spend large amounts of time in a standby condition waiting for input from the outside world. Dynamic power is the power consumed during normal operation. It is also referred to as operating power. Dynamic power is dependent on operating signal frequency, interconnect capacitance, and operating voltage. Because the voltage dependency is a square function, the reduction in voltage when moving to 90 nm devices has substantially reduced operating power in many devices. However, for large, high-performance systems with high operating frequencies, dynamic power is still a significant component of total system power. In-rush power is the power required at device power-up. It is also referred to as power-up or start-up power, or power-on surge power or current. Some devices require many times more power to b
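The square-law dependence on voltage is easy to quantify with the commonly used first-order CMOS relation P = C x V^2 x f. The Python sketch below is a worked example, not data from the article: the capacitance, the frequency, and the 1.5 V and 1.2 V core voltages are assumed values chosen only to illustrate the scaling.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power estimate in watts: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Assumed, illustrative operating point (not taken from the article).
SWITCHED_CAPACITANCE = 1e-9  # 1 nF of effective switched capacitance
CLOCK_FREQUENCY = 100e6      # 100 MHz

power_at_1v5 = dynamic_power(SWITCHED_CAPACITANCE, 1.5, CLOCK_FREQUENCY)
power_at_1v2 = dynamic_power(SWITCHED_CAPACITANCE, 1.2, CLOCK_FREQUENCY)
ratio = power_at_1v2 / power_at_1v5  # depends only on (1.2 / 1.5) squared
```

With capacitance and frequency held constant, lowering the supply from 1.5 V to 1.2 V scales dynamic power by (1.2 / 1.5)^2 = 0.64, a 36% reduction from a 20% drop in voltage.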
224. ration logic, Xilinx has been able to keep in-rush power within 15-20% of the static power requirements and below typical operating power. This removes the need to use a larger power supply just to address in-rush current. CoolRunner-II CPLDs. When Xilinx designed the CoolRunner-II family of low-power CPLDs, our goal was to deliver one of the industry's lowest power levels for a programmable logic device. These devices have standby current requirements of less than 20 µA, making them ideal for battery-powered portable devices. Other CPLDs claiming to be low power have standby power 100 to 1000x higher, affecting battery life so significantly that they are unsuitable for portable applications. The static RealDigital technology used in the logic of CoolRunner-II devices does away with power-hungry sense amplifiers and delivers low dynamic power as good as any other device available today. In addition to these advantages in the basic circuit design and process technology, CoolRunner-II devices also offer power management features unique to the CPLD industry, including a DataGate feature to reduce effective logic usage in the device, and clock management and input hysteresis features to reduce internal operating frequencies and dynamic power. Spartan-3 FPGAs. Our customers have told us that in today's cost-conscious consumer products, being forced to put in a bigger supply just to supply a high power
225. reams, as fixed-view cameras in most cases have very little difference between consecutive frames. Something like MPEG-2 could make a difference; that was the standard I was planning to implement in the camera. But as soon as I got some books on MPEG-2 and started combing through online resources, I found another fundamental difference between MPEG and JPEG, not just that it can use the similarity between consecutive frames. Contrary to JPEG, MPEG-2 requires you to pay licensing fees for using the encoders based on this standard. The fee is small compared to the cost of the hardware, but it still could be a hassle and does not provide freedom for implementation. Figure 2: Camera system board. ... camera and to implement motion JPEG compression. Half of that time was spent trying to figure out how to configure the new FPGA with the generated bitstream. In the camera, JTAG pins of the device are ... It did not take long to find a perfect alternative: Theora, based on the VP3 codec developed by On2 Technologies (www.on2.com) and released as open-source software for royalty-free use and modifications (see www.theora.org/svn.htm). Theora is an advanced video codec that competes with MPEG-4 and other similar low-bit-rate video compression schemes. It is now supported by the Xiph.org Foundation, along with Ogg, the transport layer use
226. reduce area or power, better match the architectural features of the Xilinx device, or meet design rules. Transformations such as combinatorial reductions, sequential optimizations, retiming, FSM re-encoding, register merging or duplication, and place and route optimizations increase the risk of unintended functional changes. EC tools consider two versions of a design independently, without knowing how they were created. Therefore, Formality has no awareness of the transformations that occurred during design implementation. These transformations include changes to the logic, in net and instance names, in hierarchy, and to the number or meaning of state elements. These changes may impact the names of compare points, preventing the use of efficient name-based matching techniques and increasing the use of advanced but slower matching techniques. Additionally, any changes to a bounding element of a logic cone result in a change to the logic cone itself. For example, if you optimize away a register during synthesis, you remove the end point of a logic cone, forcing that logic cone to be expanded until another end point is found. The effect is that the matching of compare points may no longer be possible; even if a match is made, the logic between corresponding cones is no longer equivalent and verification will fail. Tight Linkage Between Tools Improves Usability. In the past, EC tools required extensive
227. reduced-swing differential signaling (RSDS), as well as other single-ended standards such as SSTL and HSTL, allows it to be used in such applications. The FPGA can also be used to implement the timing control unit (TCON) to control the horizontal and vertical pixel display format. TCON is more or less a vertical and horizontal display counter. The large number of flip-flops in the Spartan-3 architecture and a built-in carry chain allow for an efficient implementation of this function. Also, with abundant differential I/O channels, Spartan-3 FPGAs have an efficient architecture for designs that are more I/O-intensive. Figure 3 shows how even a small Spartan-3E device has enough differential I/O resources to drive a large LCD/plasma screen. In many designs, customers require an interface to high-speed DDR external memory. Xilinx provides reference designs and application notes that allow you to build an efficient memory controller within the Spartan-3 fabric. Conclusion. With some of the industry's lowest price points and full-featured capabilities, Spartan-3/3E FPGAs are ideal for various low ... [Figure 3 block labels: Input PHY; Digital Image Processing; CODEC; System Control; Display Driver; Output; Spartan-3 FPGA.] Let Xilinx help you get your message out to thousands of programmable logic users worldwide. That's right, by advertising your product or service in the Xilinx Xce
228. res • The Virtex-4 XtremeDSP block power has been updated with new measurements at low, medium, and high toggle rates • Latest thermal data updates for all supported devices. In addition, new updates that affect both Virtex-4 FPGAs and other Xilinx device families include: • VccInt can now be varied for Virtex-4, Virtex-II, Virtex-II Pro, and Spartan-3 devices • More accurate power estimation for all open-drain outputs • New and improved online help, updated with all features available for Virtex-4 FPGAs and other families; all web spreadsheets now have clickable header links that take you directly to online help. Figure 3: XPower new design wizard. XPower: Integrated Design-Specific Power Analysis. XPower, a free part of all Xilinx ISE design tool configurations, allows you to get a much more detailed estimate of your design-based power requirements. XPower estimates device power based on a mapped or placed-and-routed design. XPower calculates an estimate of power with an average design suite error of less than 10% for mature, in-production FPGAs and CPLDs. It considers device data along with your design files and reports estimated
229. embedded block RAMs. All of these features let you design for the lowest possible area, lowering the size and thereby the cost of the FPGA in ... design. With more than 100 million units shipped, Spartan devices have quickly become the world's most accepted low-cost FPGA architecture, familiar to thousands of engineers. With every generation of the Spartan architecture, Xilinx has ... Table 1: Features included in the Spartan-3/3E family of devices. ributed RAM: small FIFOs, LIFO buffers, scratch pad memory, register banks. 18 Kb Block RAM: large FIFOs, buffers, cache memory, program storage for processors. LUT Configured as Shift Register (SRL16E): 16-bit FIFOs, delay lines, state machines, logic/register tradeoff in regular intensive designs. Differential Signaling Support (LVDS, RSDS): lower number of pins, reduced power consumption, reduced EMI, high noise immunity. [Figure 1 labels: 16 x 4 bit; 32 x 2 bit; 64 x 1 bit; 2 SliceM; 2 SliceL.] Figure 1: The LUTs in two slices of the CLB are configurable as a 16 x 4, 32 x 2, or 64 x 1 memory. The advantage of having distributed memory is that you can very efficiently implement any design that requires a large number of smaller size memory structures without running out of registers or block RAMs, offering the potential for unrivaled data bandwidth. A good example of this is the 8-bit PicoBlaze microcontroller, which takes
230. rigorous time-domain simulation, but it can greatly improve your understanding of link behavior. A channel with less than 1 dB of runt-pulse degradation works great with just about any ordinary CMOS logic family, assuming that you solve the clock skew problem, either with low-skew clock distribution or by using a clock recovery unit at the receiver. A channel with as much as 3 dB degradation requires nothing more sophisticated than a good differential architecture with tightly placed, well-controlled receiver thresholds. A channel with 6 dB of degradation requires equalization. Transmit Pre-Emphasis. The Xilinx Virtex-4 RocketIO transceiver incorporates three forms of equalization that extend your reach on deeply degraded channels. The first is transmit pre-emphasis. [Figure labels: Stripline trace; 1/2 oz Cu; Z0 = 50 Ω.] ... familiar with calculus, you can think of the first difference waveform as a kind of derivative operation. On every edge, the difference waveform creates a big kick. The transmit pre-emphasis circuit adds together a certain proportion of the main signal and the first difference waveform to superimpose the big kick at the beginning of every transition. As viewed by the receiver, each kick boosts the amplitude of the runt pulses without enlarging low-frequency portions of your signal, which are already too big. The first difference idea help
231. rly organizes computer chess world championships. For quite a long time, large mainframe computers won these championships. Since 1992, however, only PC programs have been world chess champions. They have dominated the world, increasing their playing strength by about 30 ELO per year. ELO is a statistical measure, with a 100-point difference corresponding to a 64% winning chance. A certain number of ELO points determines levels of expertise: Beginner, 1,000 ELO; International Master, 2,400; Grandmaster, 2,500; World Champion, 2,830. Today, the computer chess community is highly developed, with special machine rooms using virtual reality, and closed and open tournament rooms. Anybody can play against grandmasters or strong machines through the Internet. Hydra. The Hydra Project is internationally driven, financed by PAL Computer Systems in Abu Dhabi, United Arab Emirates. The core team comprises programmer Chrilly Donninger in Austria; researcher Ulf Lorenz from the University of Paderborn in Germany; chess grandmaster Christopher Lutz in Germany; and project manager Muhammad Nasir Ali in Abu Dhabi. FPGAs from Xilinx are provided on PCI cards from AlphaData in the United Kingdom. The compute cluster is built by Megware in Germany, supported by the Paderborn Center for Parallel Computing. Taiwan is involved
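The 64% figure quoted for a 100-point gap can be reproduced with the standard logistic form of the Elo expected-score formula. The Python sketch below assumes that form, which is the widely used convention rather than something stated in the article; the function name is hypothetical.

```python
def expected_score(rating_difference):
    """Expected score of the stronger side for a given Elo rating advantage.

    Assumes the standard logistic Elo model:
    E = 1 / (1 + 10 ** (-difference / 400)).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_difference / 400.0))


even_match = expected_score(0)         # equal ratings give 0.5
hundred_points = expected_score(100)   # about 0.64, matching the text
gm_vs_beginner = expected_score(2500 - 1000)
```

Under the same assumption, the 1,500-point gap between the Grandmaster and Beginner levels listed above gives an expected score within a fraction of a percent of 1.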
232. rollers (MCUs) and digital signal processors (DSPs) • Many application-specific standard products (ASSPs). After configuration, the FPGA user application can remain as the SPI bus master and communicate with the attached SPI flash PROM and any attached SPI peripherals. All of the attached SPI peripherals share common serial input, serial output, and clock signals. Each SPI peripheral has a separate select input. The SPI flash PROM used for configuration is selected through the FPGA's CSO_B pin. Each additional SPI peripheral is selected by a separate user I/O pin. Parallel NOR Flash. Spartan-3E FPGAs also provide a parallel configuration interface, primarily designed to configure the FPGA from industry-standard parallel NOR flash. The FPGA's asynchronous memory interface is flexible and can connect to a variety of memory devices, such as EEPROM, OTP EPROM, masked ROM, NVRAM, or even asynchronous SRAM. The memory requirements are fairly simple. During configuration, the FPGA provides up to 24 address lines and receives byte-wide data. The FPGA has four control lines driving the memory's chip select, output enable, write enable, and an optional byte enable for high-density flash PROMs with a x8/x16 mode control (BYTE). The memory must be 3.3V, 2.5V, or 1.8V and must have a read access time of 200 ns or faster. Conclusion. In addition to low device costs, the new low-cost Spartan-3E FPGA family lowers overall system co
233. rt the placement of your block or entire design and make changes to the placement locations. PlanAhead design tools give you very easy-to-use cross-probing functionality between the netlist tree display, schematic, placement, timing paths, and user floorplanned blocks. Figure 3 shows the post-place-and-route environment. The PlanAhead analysis features work together to make it easy for you to comprehend the effect of physical implementation on performance. Metric Maps. With metric maps, introduced in the latest PlanAhead release, you can detect problem areas such as a cluster of negative slack instances within a certain hierarchical module on the placement surface. Congested areas or utilization issues are also detectable. Better, faster, and smarter visualization puts you firmly in control of the FPGA and leads to faster turnaround times in fixing problems. Block-Based Design. If you've ever struggled to get a giant FPGA out the door on time, you might have wondered how to break up your design into smaller, more manageable blocks. Imagine a flow that lets you focus your optimization efforts on a critical block, implement it, and verify its performance before tackling the rest of the design. You may have also wished for a flow that enables you to divide up a design between team members. PlanAhead software was built from scratch to be your hierarchical design tool. At its core lies the PBlock, or Phys
234. s $\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$ (2). Affine transformations include several planar transformation classes, such as rotation, translation, scaling, and all possible combinations of these. We can summarize the affine transformation as every planar transformation where the parallelism is pre ... tions where a personal computer or workstation is required. A hardware implementation requires further work to achieve efficiency constraints on an FPGA. Essentially, the process can be divided in two parts: transformation and interpolation. We implemented the first as a matrix-vector multiplication (2) with four multipliers and four adders. The second is an approximation of the real result of the interpolation: we weighted the four pixel values, approximating the results of the transformation with two bits after the binary point. Instead of performing the calculations given by the formula, we used a LUT to obtain the final pixel value, since we divided possible results of the interpolation into a set of discrete values. Spartan-3 Theoretical Performance. We designed the algorithm using System Generator for DSP, targeting a Spartan-3 device. We generated the HDL code and synthesized it with ISE design software, obtaining a resource utilization of: • 744 slices • 1,107 LUTs • 164 SRL16 • 4 embedded multipliers. The design can process up to 46 fps (frames pe
235. Figure 2: Using MIG 007 to automatically create a DDR memory controller. Table 1: Device utilization for a DDR 64-bit interface in an XC3S1500 FPGA. Number of Slices: 2,277 out of 13,312. Number of External IOBs: 147 out of 487. ... tool requires you to input data, including FPGA device, frequency, data width, and banks to use. The interactive GUI (Figure 2) generates the RTL, EDIF, SDC, UCF, and related document files. As an example, we created a DDR 64-bit interface for a Spartan XC3S1500-5FG676 using MIG. The results in Table 1 show that the implementation would use 17% of the slices, leaving more than 80% of the device free for data processing functions. Testing Out Your Designs. The last sequence in a design is the verification and debug in actual hardware. After using MIG 007 to create your customized memory controller, you can implement your design on the Spartan-3 Memory Development Kit (HW-S3-SL361), as shown in Figure 3. The $995 kit is based on a Spartan-3 1.5M-gate FPGA (the XC3S1500) and includes additional features such as: • 64 MB of DDR SDRAM (Micron MT5VDDT1672HG-335), with an additional 128 MB DDR SDRAM DIMM for future expansion • Two-line LCD • 166 MHz oscillator • Rotary switches • Universal power supply (85V-240V, 50-60 Hz). Figure 3: Spartan-3 memory development boar
236. s within the schematic viewer can save you hours wading through VHDL/Verilog files. Connectivity Display. PlanAhead software bundles all nets connecting any two floorplanned blocks into a single line, with thickness and color reflecting the number of such nets. This bundled connectivity shows the flow of data through the design and is extremely useful in arriving at the right floorplan. Constraints Editing. You can view, edit, add, and delete any of your constraints and see the effect of timing constraint changes by re-running TimeAhead. This eliminates the need to re-run place and route, a lengthy process, just to verify constraint changes. If you decide that certain paths are not important, you can false-path them, decide that certain clocks need to be tightened, or add a MAX_DELAY constraint on a long, meandering net. Design Rule Checks (DRCs). PlanAhead software sports a comprehensive set of design rule checks to flag potential problems before launching a time-consuming place and route run: • SSO limit violations • I/O bank rule violations • Clock region rules • Carry chain height • DSP48 internal register optimization. Figure 3: Analyzing placement with TimeAhead and the schematic viewer. Placement. You can impo
237. s you see how pre-emphasis works, but that is not how it is built. The actual circuit sums not two but three delayed terms, called the pre-cursor, cursor, and post-cursor. [Figure 6 labels: Binary waveform; First difference, x[n] - x[n-1]; Sum, a·x[n] + b·(x[n] - x[n-1]).] Figure 6: The transmit pre-emphasis circuit creates a big kick at the beginning of every transition. [Figure 7 axes: Equalizer Transfer Function (dB) versus Frequency (Hz).] Figure 7: Over the critical range from DC to 1.25 GHz, the pre-emphasis response rises smoothly. This architecture gives you the capacity to realize both first and second differences by adjusting the coefficients associated with these three terms. Programmable 5-bit multiplying DACs control the three coefficients. The first and third amplitudes are always inverted with respect to the main center term, a trick that is accomplished by using the NOT-Q outputs of the first and third flip-flops. As an example, Figure 7 plots the frequency response corresponding to the particular coefficient set (0.056, 0.716, 0.228). Over the critical range from DC to 1.25 GHz, the pre-emphasis response rises smoothly, just the opposite of the plummeting curves drawn in Figure 5. The response peaks at 1.25 GHz. ... Equalized Output ... The overshoot at each ... If you clock this pre-emphasis circuit at a higher data rate
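Both the first-difference idea and the three-term circuit can be explored numerically. The Python sketch below is a rough illustration under stated assumptions: the three terms are treated as a three-tap FIR filter with one bit period between taps, the data rate is taken as 2.5 Gb/s so that 1.25 GHz is the Nyquist frequency, and the first and third coefficients are negated as the text describes. The function names are hypothetical.

```python
import cmath
import math


def first_difference_pre_emphasis(samples, a, b, initial=0.0):
    """y[n] = a*x[n] + b*(x[n] - x[n-1]), the two-term form in Figure 6."""
    output = []
    previous = initial
    for x in samples:
        output.append(a * x + b * (x - previous))
        previous = x
    return output


def fir_gain(taps, frequency_hz, bit_rate_hz):
    """Magnitude response of an FIR filter whose taps are one bit period apart."""
    omega = 2.0 * math.pi * frequency_hz / bit_rate_hz
    return abs(sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(taps)))


# Coefficient set from the text, with the first and third terms inverted.
TAPS = (-0.056, 0.716, -0.228)
BIT_RATE = 2.5e9  # assumed data rate, chosen so Nyquist falls at 1.25 GHz

kicked = first_difference_pre_emphasis([1, 1, -1, -1], a=1.0, b=0.5, initial=1)
dc_gain = fir_gain(TAPS, 0.0, BIT_RATE)
mid_gain = fir_gain(TAPS, 0.625e9, BIT_RATE)
nyquist_gain = fir_gain(TAPS, 1.25e9, BIT_RATE)
boost_db = 20.0 * math.log10(nyquist_gain / dc_gain)
```

Under these assumptions the gain rises from 0.432 at DC to 1.0 at 1.25 GHz, a boost of roughly 7.3 dB, and the first-difference output shows the extra kick only on the sample where the data changes.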
238. se between five individual devices or five individual functions on one device, you would probably choose the latter. The more dynamic design elements on one device, the more flexibility and options you have. One or all embedded processing solutions, such as the UltraController, PicoBlaze soft-core processor, MicroBlaze soft-core processor, and the PowerPC 405, are located or can be implemented on one powerful device. These system components were once separate, discrete, and alone on the board. In the future, there will be a wave of programmable embedded platforms. Why not reunite the lonely CPU with some logic, block RAM, DSP, and connectivity? Instead of all of that functionality being spread across a whole board, it can now be self-contained in a single device. The Xilinx embedded solution does not just help you with a single application; it will increase the system performance of all of your applications. A CPU is better equipped to handle some functions than traditional FPGAs. A good example of this is arithmetic functions; CPUs are designed for this purpose. So why not let the processor do what it does best and leave the FPGA to concentrate on the tasks at which it excels? With the Xilinx embedded solution, you can do just that. By adding a processing solution to your design, you can create a more advanced system and provide more features to your customers. Challenges: Customers are often reluctant to
239. sequences are first written across the DIMEtalk network into the 16K read FIFOs of the individual FPGAs. Note that a single XC2V6000 device is capable of as many as 1.26 trillion cell updates per second. At the end of the processing, the final edit distance is written into a write FIFO. Table 1 gives the runtimes for the different GenBank databases in the case of a single XC2V6000 FPGA and two XC2V6000 FPGAs in the network. The databases used were GBUNA (3 MB, 1,276 sequences in one database file), GBPRI20 (187 MB, 2,036 sequences with each sequence in a separate database file), and GBROD10 (245 MB, 1,200 sequences with each sequence in a separate database file), all from the GenBank database. The results are compared with those obtained using a SunFire 280R (two UltraSPARC III processors clocking at 1.05 GHz with 8 MB L2 cache and 8 GB memory) running Solaris 9. Each XC2V6000 FPGA holds PEs corresponding to a 5,000-nucleotide-base-long query sequence, besides a 10% overhead for the interface logic. We achieved an overall FPGA utilization of more than 80%. Systems Optimization: The time required to compare a query sequence against a database sequence for the individual FPGA configuration is calculated as [equation not recoverable from the source]. [Table 1, partially recovered. Columns: Computation Platform, Time (GBUNA), Time (GBPRI20), Time (GBROD10). Surviving values, in source order: 120 seconds, 294 seconds, 220 seconds, 265 seconds.]
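To make the notion of a "cell update" concrete, here is a minimal software sketch of an edit distance computation. It is not the FPGA design: it assumes unit costs for insertion, deletion, and substitution (the scoring used on the hardware is not given in this excerpt), and the function names are hypothetical. Each inner-loop iteration fills one cell of the dynamic-programming matrix, which is the unit of work a processing element performs.

```python
def edit_distance(query, target):
    """Levenshtein distance, keeping only one previous row of the DP matrix."""
    prev = list(range(len(target) + 1))  # distance from the empty query prefix
    for i, q in enumerate(query, start=1):
        curr = [i]  # distance from query[:i] to the empty target prefix
        for j, t in enumerate(target, start=1):
            cost = 0 if q == t else 1
            curr.append(min(prev[j] + 1,           # delete q from the query
                            curr[j - 1] + 1,       # insert t into the query
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def cell_updates(query_len, target_len):
    """Number of DP cells filled for one query/target comparison."""
    return query_len * target_len


if __name__ == "__main__":
    print(edit_distance("ACGTTGCA", "ACGTGGCA"))
    print(cell_updates(5000, 100000))
```

A software loop fills these cells one at a time; a linear array of processing elements, one per query base, can instead fill many cells in the same clock cycle, which is where cell-update rates in the trillions per second come from.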
240. siest to use in the industry. PLUS embedded multipliers for high-performance, low-cost DSP, tons of RAM, digital clock managers, and all the I/O support you need. With a density range up to 1.6 million gates and our production-proven 90 nm process, now is the time to make your money work smarter. MAKE IT YOUR ASIC. XILINX, The Programmable Logic Company. For more information, visit www.xilinx.com/spartan3e. PN 0010864
241. signal by only 35% instead of the nominal 50%. A smaller runt pulse, with amplitude 75% of the normal size, would reduce the voltage margin by half: a huge hit to your noise budget, but still workable. For generic binary communication using no equalization, we would like to see the runt pulse arrive with amplitude never smaller than 70% of the low-frequency pulse amplitude. [Figure 1: The effective channel gain associated with a long PCB trace depends on the trace width, dielectric materials, length, and type of connectors used. Plot of gain (dB) versus frequency (Hz) for FR-4 stripline, 1/2-oz Cu, w = 152 µm (6 mil), Z0 = 50 Ω, no connectors.] [Figure 2: Long traces reduce the amplitude of the input pulse and disperse its rising and falling edges.] [Figure 3: This test waveform displays the worst-case runt pulse amplitude. Signal amplitude at receiver versus time, 1 bit per tick (400 ps), with the receiver threshold marked.] [Figure 4 (plot labels: sine wave at half baud rate, signal amplitude at receiver, receiver threshold, 400 ps/div, 0.5 m trace): A runt pulse amplitude equal to 85% of the nominal low-frequency signal amplitude reduces the voltage mar
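The margin figures quoted above follow from simple arithmetic if you normalize the full low-frequency swing to 1.0 and place the receiver threshold at the midpoint. The sketch below works through that reading; the midpoint threshold and the function names are assumptions made for illustration.

```python
def voltage_margin(runt_fraction, threshold=0.5):
    """Distance by which the runt pulse clears the threshold, as a fraction
    of the full low-frequency swing (threshold assumed at the midpoint)."""
    return runt_fraction - threshold


def margin_loss(runt_fraction, threshold=0.5):
    """Fraction of the nominal margin given up by the runt pulse."""
    nominal = 1.0 - threshold
    return 1.0 - voltage_margin(runt_fraction, threshold) / nominal


if __name__ == "__main__":
    for runt in (1.00, 0.85, 0.75, 0.70):
        print(f"runt {runt:.2f}: margin {voltage_margin(runt):.2f}, "
              f"loss {margin_loss(runt):.0%}")
```

With these assumptions, an 85% runt pulse clears the threshold by 35% of the swing rather than 50%, and a 75% runt pulse gives up half of the nominal margin, which matches the numbers in the text.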
242. sing a CoolRunner-II CPLD, other benefits also include the following: security; input hysteresis; optional output control; low-power operation; multiple package options with varying I/O; IEEE 1149.1 Boundary Scan test; adjustable from 1.5V to 3.3V. [Table 1: CoolRunner-II family overview. The table body is not recoverable from the source; surviving entries include Advanced Security (Yes) and the VQ44 package (10 x 10 mm, leaded, 33 maximum user I/O).] electrical characteristics. With their small size, bulk capacitance and inductance are lower than TQFP packages. [Figure 1: MLF package comparison (Lowest Cost Small Form Factor Packaging). Surviving board-area values, in source order: 25 mm, 36 mm, 49 mm, 144 mm, 255 mm.] Conclusion: Many product choices are available when considering voltage translation, but CPLDs are the best choice for many rea [...] With these features, CPLDs offer more than voltage translation and can do a lot more. Because discrete level shifters perform a specific task, you may be stuck with a specific I/O choice. With
243. sion, you now have the option to license your software with a USB dongle as well as the traditional methods of parallel and Ethernet ID. Technology Viewer: ISE 7.1i software now includes the new Technology Viewer, which lets you view your post-synthesis HDL-based design at the block level in a schematic-like display to get an early picture of how your design will be represented during the implementation flow. Technology Viewer is built on the same graphic interface as RTL Viewer, so there are no new tools or menus to learn. Full hierarchy is represented, so you can easily push down into the design as far as necessary, highlighting and identifying critical elements in your design by pin, net, or instance name. ISE software gives you more ways to view the progress of HDL-based design implementation as you move through the programmable design flow: RTL Viewer displays pre-synthesis design elaboration, Technology Viewer shows the post-synthesis block-level design, and FPGA Editor lets you see and edit post-place-and-route FPGA results. Design Summary and Message Filtering: ISE Project Navigator includes a new Design Summary view that takes the most commonly sought-after design information and places it in one easy-to-use, automated display, eliminating the need to search through multiple tool reports and outputs to find exactly what you need. Upon st
244. st in many ways, including new support for inexpensive SPI and parallel flash memory for configuration. Currently, a number of flash memory families are supported, and more memory vendors will announce their support for Spartan-3E devices in the coming months. Please check the Spartan-3E pages on the Xilinx website for a current list of supported SPI flash: www.xilinx.com/spartan3e/. Encoding High-Resolution Ogg Theora Video with Reconfigurable FPGAs. Once the traditional application area of custom ASICs, high-performance video encoding can now be handled by modern FPGAs. ing 2003 issue of the Xcell Journal, in which my article about Spartan-IIE-based Elphel Model 313 cameras appeared ("How to Use Free Software in FPGA Embedded Designs"), was dedicated to the Xilinx Spartan-3 FPGA. I began to think about using these devices in our new generation of Elphel network cameras, but it wasn't until last year that I was finally able to start working with them. One of the factors that slowed my company's adoption of this new technology was the fact that at first I could not find appropriate software that could handle the devices selected, as it is essential that our end users can modify our products without expensive software development tools. When I visited the Xilinx website in Summer 2004 and found that the current version of the
245. t ASIC and SoC, Semico Research Corp., richw@semico.com. We are living in very exciting times, with compelling new consumer applications springing up seemingly out of nowhere. These exciting times apply specifically to FPGAs, as we continue to see growth and widespread acceptance of the concept of programmable logic by the marketplace at large. The wind in the sails of the FPGA industry is our old friend Moore's Law, which states that semiconductor devices will see a doubling of available transistors every 18 to 24 months. The increase in transistor budgets available to designers and the movement to ever-tighter process geometries translate directly into devices with rising performance levels and smaller die areas. A good example of this is the recent introduction of FPGAs produced using 90 nm process geometries. Smaller geometries have allowed FPGAs with more logic elements and embedded memory, while at the same time decreasing the die area needed for the solution. A part with a given performance level using 130 nm process geometries can now be produced with a higher performance level in a smaller die area at 90 nm. Die area and yield have a great impact on device costs, so FPGA manufacturers can offer these new 90 nm families at decreasing price points. Why is this important, and how does it change the systems landscape? Lower price points for a given performance level allow system des
246. tal amplitude exceeds from which it is extracted, and partly from the natural fuzziness inherent to any quick rule-of-thumb translation between the time and frequency domains. The simple frequency-domain criterion conservatively estimates these factors. If your data code permits longer runs of zeros or ones than 8B10B coding, then you must use a correspondingly lower frequency as your lowest frequency of interest. In the time domain, you will see the received signal creep closer to the floor or ceiling of its maximum range before the runt pulse occurs, making it even more difficult for the worst-case runt pulse to cross the threshold. [Figure 5: Channel gain (dB) versus frequency (Hz). The difference between high-frequency and low-frequency channel gain in this 2.5 Gbps system equals 3 dB.] As a rule of thumb, we look at the difference between the channel attenuation at the highest frequency of operation (the 101010 pattern) and the lowest frequency of operation (determined by your data coding run length) to quickly estimate the degree of runt-pulse amplitude degradation at the receiver. This simple frequency-domain method only crudely estimates link performance. It cannot substitute for [...] Figure 6 illustrates a simple binary waveform x[n] and the related first-difference waveform x[n] - x[n-1]. If you are
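One way to turn the rule of thumb into a number is to convert the gain difference from decibels into an amplitude ratio. The sketch below does that conversion; treating the resulting ratio as the runt-pulse amplitude fraction is an assumption made here for illustration, in the same crude spirit as the rule itself, and the function names are hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a voltage (amplitude) ratio."""
    return 10.0 ** (db / 20.0)


def estimated_runt_fraction(gain_high_db, gain_low_db):
    """Crude estimate: ratio of the channel gain at the highest frequency of
    operation to the gain at the lowest frequency of operation."""
    return db_to_amplitude_ratio(gain_high_db - gain_low_db)


if __name__ == "__main__":
    # A channel that is 3 dB weaker at the high frequency than at the low one.
    print(f"{estimated_runt_fraction(-3.0, 0.0):.3f}")
```

A 3 dB difference works out to a ratio of roughly 0.71, which sits close to the 70 figure the article uses as its lower bound for an unequalized binary link.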
247. te these values up to the root of the tree as if they were true ones (Figure 1). The key observation over the last 40 years of computer chess data is that the game tree acts as an error filter. The larger the tree that we can examine, and the more sophisticated its shape, the better its error-filter property. Therefore, what we need is speed. [Figure 1: Game tree search in the blue part leads to an approximation procedure.] The Hardware Architecture: Hydra uses the ChessBase/Fritz GUI running on a Windows XP PC. It connects over the Internet, using ssh, to our Linux cluster, which itself comprises eight dual-PC server nodes able to handle two PCI buses simultaneously. Each PCI bus contains one FPGA accelerator card. One message passing interface (MPI) process is mapped onto each of the processors, and one of the FPGAs is associated with it as well. A Myrinet network interconnects the server nodes (Figure 2). [Figure 2: A cluster of dual PCs, supplied with two FPGA cards each, is connected to a GUI via the Internet. Diagram label: four dual PCs with Myrinet interconnection network.] The Software Architecture: The software is partitioned into two parts: the distributed search algorithm running on the Pentium nodes of the cluster, and the soft co-processor on the Xilinx FPGAs. The basic idea behind our parallelization is to decompose the search tree in order to search parts of it in parallel and to bala
248. tem in the same way as with a microprocessor or microcontroller, by implementing discrete functional blocks. But unlike microcode, these functions are totally independent; instantiated in hardware as opposed to software, they will run in real time and are not susceptible to errors or interrupt states. Each function will operate on its own and is guaranteed to run as described, every time, under every condition. Another benefit is that you can simulate every state before building any hardware, so the risks of operations or tasks not functioning as required, or happening out of sequence, are dramatically reduced. For example, when building an instrument cluster, you may need to display a telltale based on the driver pushing a switch, respond to a CAN packet dictating the position of the tachometer, and handle an error message from the braking system that must be displayed urgently to alert the driver, all at the same time. In software-based systems, these tasks run sequentially. You must give priority to the error message, but which task runs when, and in what order, depends on when the interrupt hits the micro. In a PLD-based system, all of these functions are handled in parallel and displayed as soon as the message hits the device. The parallel processing available in FPGAs and CPLDs offers a great advantage in applications where instant, uninterrupted signals must be processed reliably.
249. the model to predict future behavior. Kalman filters then use measured signals, such as the signature of the aircraft returned to the radar receiver, to periodically correct the prediction. [Figure 2: Kalman filter example M-file (function simple_kalman). The listing is not recoverable from the source; its surviving comments mark an estimate step and a correction step.] Figure 2 shows the MATLAB M-file describing the Kalman filter. The algorithm defines matrices R and Q that describe the statistics of the measured signal and the predicted behavior. The last nine lines of the algorithm are the code that predicts and corrects the estimate. This algorithm illustrates the flexibility and conciseness of the MATLAB language. Common operators such as addition and subtraction operate on variables like the two-dimensional arrays A or P_cap without having to write loops, as you would in languages like C. Multiplication of two-dimensional arrays is automatically performed as matrix multiplication, without any special annotation. MATLAB operators such as matrix transposition allow the MATLAB code to be compact and easily readable. And complex operations like matrix inversion are completed using MATLAB's exte
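The predict-then-correct structure described above can be illustrated with a scalar filter. This is a minimal sketch and not the article's M-file, which did not survive extraction: it assumes a one-dimensional state that is modeled as constant, and the names `kalman_step`, `run_filter`, and the default noise values are hypothetical. In the matrix form used by the article, the divisions below become matrix inversions and the products become matrix multiplications.

```python
def kalman_step(p, P, z, Q=0.01, R=1.0):
    """One estimate/correct cycle for a scalar, constant-state model.

    p: current state estimate      P: current estimate variance
    z: new measurement             Q: process noise variance (assumed)
    R: measurement noise variance (assumed)
    """
    # Estimate step: the model predicts the state stays put; uncertainty grows.
    p_est = p
    P_est = P + Q
    # Correction step: blend the prediction and the measurement.
    K = P_est / (P_est + R)            # Kalman gain
    p_new = p_est + K * (z - p_est)    # corrected state estimate
    P_new = (1.0 - K) * P_est          # corrected variance
    return p_new, P_new


def run_filter(measurements, p0=0.0, P0=1.0, Q=0.01, R=1.0):
    """Run the filter over a sequence and return the estimate after each step."""
    p, P = p0, P0
    history = []
    for z in measurements:
        p, P = kalman_step(p, P, z, Q, R)
        history.append(p)
    return history


if __name__ == "__main__":
    estimates = run_filter([5.2, 4.9, 5.1, 5.0, 4.8, 5.3])
    print([round(e, 3) for e in estimates])
```

The gain K is what balances the two statistics the text mentions: a large R (noisy measurements) keeps the estimate close to the prediction, while a large Q (unreliable model) lets each measurement pull the estimate harder.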
250. the other instrument's trigger input. The instruments are also connected together via the LAN, which allows the logic analyzer to control the oscilloscope and pull out its data. A calibration routine that runs on the logic analyzer makes repetitive measurements in both the logic-analyzer-triggers-scope direction and the scope-triggers-logic-analyzer direction to determine the delay in the cross trigger for each direction. Once these delays are known, they can be applied to the oscilloscope data after importing it to the logic analyzer, time-aligning the logic analyzer and oscilloscope data. A Better Look: Now we're going to trigger the oscilloscope on the narrow pulse occurring on the serial channel, cross-trigger the logic analyzer, and bring both captured scope and logic analyzer data together on one screen. The goal is to see exactly what state machine activity is occurring inside the FPGA when the narrow pulse happens on the serial channel. This cross trigger is set up through the time-correlation application. All we have to set up is the scope trigger on the narrow pulse; the rest is taken care of. Rebooting the system, we observe the capture on the logic analyzer, which fortunately is very revealing. The trigger point for the oscilloscope, where the narrow glitch occurs, is visible, and we can also see, time-correlated to this event, the transmit states of the state machine. That actually looks okay. As we sw
251. the peak shifts correspondingly higher, always appearing just where you want it: at a frequency equal to half the data rate. Figure 8 overlays the pre-emphasis response with the channel response at 1 meter, showing a composite result (the equalized channel) that appears much flatter than either curve alone. In very simplistic terms, a flatter composite channel response should make a better-looking signal in the time domain. The time-domain benefits of pre-emphasis appear in Figure 9. At shorter distances, the signal appears over-equalized. The overshoot at each transition works fine in a binary system, assuming that the receiver has ample headroom to avoid saturation with the maximum-sized signal. At 1 meter, the signal looks quite nice, with very little runt-pulse degradation visible and, if you look closely, very little jitter. The 1.5-meter waveform now just meets the 70% criterion for runt-pulse success. Compared to a simple differential architecture, the pre-emphasis circuit has at least doubled the length of channel over which you may safely operate. Linear Receive Equalizer: In addition to the pre-emphasis circuit, the RocketIO transceiver also incorporates a sophisticated 6-zero, 9-pole receive-based linear equalizer. This circuit precedes the data slicer. It comprises three cascaded stages of active analog equalization that may be individually enabled, turning on zero, one, two, or all three stages
252. these four regulators together with full protection, power-on reset, and soft-start features in one space- and cost-saving IC. Applications for this new IC range broadly, providing power solutions for FPGAs, ASICs, POL, embedded systems, and I/O across medical, industrial, computing, and telecom. The ISL6521 combines multiple switchers and/or linears in a single 16-lead SOIC package, delivering the integration and flexibility FPGA-based designers need. This single-IC solution increases available board space while reducing costs and the number of required external components. The ISL6521 PWM controller is intended to regulate the low-voltage supply that requires the greatest amount of current, usually the core voltage for the FPGA, ASIC, or processor, with a synchronous-rectified buck converter. The linears are intended to regulate other system voltages, such as I/O and memory circuits. Both the switching regulator and linear voltage reference provide 2% of static regulation over line, load, and temperature ranges. All outputs are user-adjustable by means of an external resistor divider. All linear controllers can supply up to 120 mA with no external pass devices. Employing bipolar NPNs for the pass transistors, the linear regulators can achieve output currents of 3A or higher with proper device selection. The ISL6521 monitors all output voltages. The PWM controller's adjustable overcurrent function monitors the
253. threshold, then the pixel (i, j) is a corner point. The minimum eigenvalue is computed using an approximation to avoid the square root operation, which is expensive for hardware implementations. The corner detection algorithm can be summarized as follows. The image gradient is computed by means of convolution of the input image with a predefined mask. The size and the values of this mask depend on the image resolution; a typical size of the mask is 7 x 7. For each pixel (i, j), loop: compute [equations garbled in the source: sums over the patch of products of the gradient components], where N is the number of pixels in the patch and the summed terms are the components of the gradient at pixel k inside the patch; compute P(i, j) [equation garbled in the source], where t is a fixed integer parameter; if [condition garbled in the source], then retain pixel (i, j); discard any pixel that is not a local maximum of P. End loop. Sort the feature list FL in decreasing order of the degree-of-confidence values and take only the first [count garbled in the source] items. Implementation: With its high-speed embedded multipliers, the Xilinx Spartan-3 architecture meets the cost/performance characteristics required by many computer vision systems that could take advantage of this algorithm. The implementation is divided into four fundamental tasks: 1. Data acquisition: Take in two gradient values along the x and y axes and compute, for each pixel, three coefficients used by the characteristic
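Because the equations in this passage did not survive, the sketch below shows one standard way to test a minimum eigenvalue against a threshold without a square root; whether it matches the article's exact approximation is an assumption. For the symmetric 2 x 2 matrix [[a, b], [b, c]] built from patch sums of gradient products, both eigenvalues exceed t exactly when a > t and (a - t)(c - t) - b^2 > 0. The function names are hypothetical, and `min_eigenvalue` is included only as the square-root reference for comparison.

```python
import math


def patch_sums(gradients):
    """Sum gradient products over a patch of (gx, gy) pairs."""
    a = sum(gx * gx for gx, gy in gradients)
    b = sum(gx * gy for gx, gy in gradients)
    c = sum(gy * gy for gx, gy in gradients)
    return a, b, c


def min_eigenvalue(a, b, c):
    """Exact smaller eigenvalue of [[a, b], [b, c]] (uses a square root)."""
    return 0.5 * ((a + c) - math.sqrt((a - c) ** 2 + 4.0 * b * b))


def exceeds_threshold(a, b, c, t):
    """Square-root-free test that the smaller eigenvalue is greater than t."""
    return a > t and (a - t) * (c - t) - b * b > 0


if __name__ == "__main__":
    a, b, c = patch_sums([(3, 0), (0, 2), (1, 1)])
    print(a, b, c, min_eigenvalue(a, b, c), exceeds_threshold(a, b, c, 4))
```

The square-root-free form needs only multiplications, subtractions, and comparisons on integers, which maps naturally onto embedded multipliers.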
254. tions by Davor Kovacec, CEO, Xylon d.o.o., dkova xylon hr. Market leadership in the fast-moving electronics market continuously demands more innovative and cost-effective products. With an ever-shrinking time-to-market window, design tools and prefabricated components are important elements of success. But the right development platform can make all the difference between success and failure, harmonizing the contradictory requirements of being standard and highly configurable at the same time. Combining Xilinx FPGAs with the Xylon logicBRICKS IP cores library, Xylon's feature-rich Multimedia FPGA Platform is ideal for addressing the time-to-market and flexibility needs of the high-volume customer base. You can quickly turn system designs running on this generic FPGA development platform into specialized products. Such a design approach retains a large portion of design reuse through different hardware IP cores and software modules. You can reuse these same modules in many system designs for different applications. Multimedia FPGA Platform: The basic functional blocks of the Multimedia FPGA Platform are output to displays, with inputs provided from video, human-machine interface, and communication interfaces. These functional blocks support a variety of different displays, video input types, input devices, or communication interfaces. The Multimedia FPGA Pl
255. to external DDR SDRAM; SPI flash available for non-volatile data or parameter storage; additional SPI peripherals require a single select pin per device. [Figure 3: Connecting SPI flash memory to multiple devices.] pose pins, called VS[2:0], define the type of attached SPI flash and operate as follows: these pins are only activated in SPI configuration mode (M[2:0] = 001); the VS pins are sampled when INIT_B goes high; the VS pins are reusable as user I/O after FPGA configuration completes (DONE goes high). Then the FPGA issues the command sequence appropriate for the selected SPI flash. After Configuration: After configuration, all of the pins connected to the SPI flash PROM are available as user I/O pins. If not using the SPI flash PROM after configuration, drive the FPGA's CSO_B pin high to disable the PROM, freeing the FPGA's MOSI, DIN, and CCLK pins as user I/O. If large enough, the SPI flash PROM can also contain non-volatile application data, such as MicroBlaze processor code, or data such as serial numbers and Ethernet MAC IDs. Figure 3 shows an example of using SPI flash memory for multiple purposes. Third-Party Peripherals: In addition to flash memory, many peripherals utilize the same SPI. These include: memories (EEPROM, EPROM); analog-to-digital converters (ADCs); digital-to-analog converters (DACs); thermal management; display drivers; microprocessors, microcont
256. to help you understand and reduce power across your entire PLD project. Application notes and user guides with specific device and design examples go through everything from power distribution considerations to minimizing power in hand-held CPLD applications. There is also key information from power supply partners like Intersil and Texas Instruments on third-party power-based products and offerings. Conclusion: The Xilinx Power Central website delivers everything you need to analyze your FPGA- or CPLD-based design, with tools like XPower in ISE software, Web Power Tools supporting the leading Xilinx FPGA and CPLD product families, and complete partner and reference data. FREE Multimedia Demo: Seamless FPGA for Xilinx Virtex-II Pro. During this demo, you will learn the following: simplifying the embedded processor verification flow; easily importing designs from Xilinx Platform Studio (EDK) to Seamless FPGA; verification with Seamless FPGA, accelerating the simulation process; performance profiling and analysis. Watch the demo today: http www mentor com products fy hwsw_covenficotien seamiess_fpga dema_request cfm. For additional details about Seamless FPGA, visit us at www.seamlessfpga.com. Increase Performance and Lower Cost Through System Design: Poseidon's breakthrough ESL tools shorten the system-level design cycle by using Xilinx embedded processors
257. truments' most powerful floating-point DSP processor, the TMS320C6713, as well as with up to 64 MBytes of SDRAM, 8 MBytes of FLASH ROM, FireWire and optional Ethernet communications, and analog I/O. It is suitable for stand-alone operation or as a mezzanine daughter card, and has extensive digital I/O capabilities for easy integration with the on-board FPGA, DSP, and FireWire resources. Optional Analog I/O: 12/14/16-bit A/D/A via FPGA I/O pins or DSP EMIF. Download the datasheet and learn more at www.traquair.com/ads/xcell/c6713compact.html. Traquair Data Systems, Inc. Tel: 607 266 6000. Email: sales@traquair.com. Web: www.traquair.com. Reducing Bill of Materials Cost Using Logic Consolidator. Logic Consolidator helps analyze the total cost of discrete logic on the board and the potential savings to be gained by using a Xilinx [...] by Monita Chan, Product Marketing Manager, High Volume, Xilinx Hong Kong, monita.chan@xilinx.com; Ajay Panicker, Field Applications Engineer, CG-CoreEl, India, ajay cq coreel com. System enhancements and cost reduction are becoming major trends for upgrading existing system designs. To be competitive, companies need to enhance their existing feature sets and lower their total cost, providing more for less. Traditional systems have always used discrete logic devices for implementing the simple glue and control logic that every design requires. The main argument in f
258. ts within the IP is maintained, and the net route delays should see minimal variation. The time spent to meet performance on your IP modules need not be repeated for each design using this IP. Incremental Design: Xilinx ISE software's incremental guided implementation flow in place and route requires you to floorplan your design. This guided flow will reuse your previous implementation results, in conjunction with the floorplan, to help preserve results in the new implementation. You can use PlanAhead software to easily generate the necessary floorplan for the guided incremental flow. With larger FPGA design sizes, more users are complaining about unpredictability of results and long place and route runtimes. PlanAhead block-based flows put you in control of your larger designs and their increasing complexities. Table 1 offers several examples of customers who have had marked results after adopting the PlanAhead methodology. Conclusion: Even though FPGAs continue to grow in size and sophistication, performance requirements and time-to-market pressures remain the status quo for FPGA designers. Existing design flows and methodologies are struggling to keep pace, but PlanAhead software provides a revolutionary boost to your current flow. Intuitive and powerful, its design analysis and block-based hierarchical design capabilities put you in control. To learn more about PlanAhead design tools, visit www.xilinx.com/planahead/
259. tunately, with Spartan-3 devices, these terminators are located on the FPGA itself, removing the need for placement gyrations and allowing you to focus on using the three Ts discussed in this article to address signal quality, timing, and EMC. Conserve Board Space with Spartan-3 FPGAs: Spartan-3 FPGAs have built-in termination resistors, allowing hardware designers to save significant board space that would have previously been allocated to surface-mount resistors. Traditional external terminations require two traces for every chip-to-chip connection: a trace for each resistor lead. On-chip termination using Spartan-3 devices cuts this in half, requiring less routing area and possibly reduced layer count. To get an idea of the benefit, consider a 4 x 6 in., 6-layer PCB with three Spartan-3 FPGAs on it, including a combined routable area of 77 square inches. With previous FPGA technologies, external terminating resistors would consume about 10,000 square mils of board space (40 x 250 mils) each. A board of 4,000 nets, 1,333 of them with termination, would result in a 13.33-square-inch board space savings and a 17% reduction in manufacturing costs. Meeting Timing and Reducing Area with the Synplify Pro Tool, by Steve Elzinga, Senior Product Applications Engineer, Xilinx, Inc., steve.elzinga@xilinx.com; Steve Pereira, Senior Technical Marketing Manager, Synplicity, Inc., st
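The board-space figure in the example above can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text (1,333 terminated nets, 40 x 250 mils per resistor footprint, 77 square inches of routable area); the function name is hypothetical. Note that the text quotes 17 as a reduction in manufacturing costs, while the calculation here gives the fraction of routable area recovered; that the two coincide is an observation, not something the text states.

```python
SQ_MILS_PER_SQ_INCH = 1000 * 1000  # 1 inch = 1000 mils


def termination_savings(terminated_nets, footprint_sq_mils, routable_sq_in):
    """Return (area saved in square inches, fraction of routable area)."""
    saved_sq_in = terminated_nets * footprint_sq_mils / SQ_MILS_PER_SQ_INCH
    return saved_sq_in, saved_sq_in / routable_sq_in


if __name__ == "__main__":
    footprint = 40 * 250  # square mils per external terminating resistor
    saved, fraction = termination_savings(1333, footprint, 77.0)
    print(f"{saved:.2f} sq in saved, {fraction:.1%} of routable area")
```

Running the numbers gives 13.33 square inches, or a little over 17 percent of the 77 square inches of routable area.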
260. [Table residue: package options (PQFP, TQFP, VQFP, chip-scale, and wire-bond BGA packages, with lead or ball spacing) for the XC9500, XC9500XL, and XC9500XV CPLD families. The table was rotated in the source and its values are not recoverable.]
261. uide, a web-based power estimator, and XPower, included in ISE software. Higher power levels in a chip can limit device and end-system performance by forcing a lower system clock rate to stay within the system power budget. How Xilinx Helps Manage System Power. Virtex-4 FPGAs: With a significant reduction in power consumption over that of the competition, the new Virtex-4 platform FPGAs offer significant benefits for system design, including reduced thermal concerns, easier power supply design, lower cost power supply, and higher system reliability. Virtex-4 FPGAs dramatically reduce power consumption when compared to other FPGAs in all three key power areas: as much as 73 percent lower static power with the industry's first triple-oxide technology; as much as 86 percent lower dynamic power, enabled by embedded IP blocks; negligible in-rush current with unique power-saving configuration circuitry. Virtex-4 devices handle the three types of power usage and control in the following ways. Static power: As process geometries shrink to 90 nm and lower, the industry expects higher leakage and higher static power when channel length decreases. Working with fab partner United Microelectronics Corp., Xilinx solved this problem by using triple-oxide technology in the Virtex-4 90 nm process, which reduces leakage current significantly. Two oxide thicknesses are widely used in the industry today, w
262. ular specialization of homography. This allows us to avoid the division and obtain good observational results. [Equations garbled in extraction: the mapping of image coordinates (x, y) through the homography H and its affine specialization A.] As observed, six parameters are required to define an affine transformation. Image Warping Algorithms: There are two common ways to warp an image: • Forward mapping • Backward mapping. Using forward mapping, the source image is scanned line by line and the pixels are copied to the resulting image in the position given by the result of the linear system shown in equation 2. This technique is subject to several problems, the most important being the presence of holes in the final image in the case of significant modification of the image, such as rotation or a scaling by a factor greater than 1 (Figure 2). The backward mapping approach gives better results. Using the inverse transformation A^-1, we scan the final image pixel by pixel and transform the coordinates. The result is a pair of non-integer coordinates in the source image. Using a bilinear interpolation of the four pixel values identified in the source image, we can find a value for the final image pixel (see Figure 3). This technique avoids the problem of holes in the final image, so we adopted it as our solution for the hardware implementation. Implementation: Software implementations of this algorithm are well known and widely used in applica
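The backward-mapping procedure described in excerpt 262 can be sketched in software as follows. This is a minimal illustration only, not the article's hardware implementation: the function names, the tuple layout of the six inverse affine parameters, and the choice to leave out-of-range pixels at a fill value are all assumptions made for the example.

```python
def bilinear(img, x, y):
    """Sample img (a list of rows) at non-integer (x, y) from its four neighbors."""
    h, w = len(img), len(img[0])
    x0, y0 = int(x), int(y)                 # top-left neighbor (x, y are >= 0 here)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0                 # fractional offsets inside the cell
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_backward(src, inv, out_w, out_h, fill=0.0):
    """Backward mapping: scan the destination and look each pixel up in the source.

    inv = (a, b, c, d, e, f) is the assumed layout of the inverse affine
    transform: destination (x, y) maps to source (a*x + b*y + c, d*x + e*y + f).
    """
    a, b, c, d, e, f = inv
    h, w = len(src), len(src[0])
    dst = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx = a * x + b * y + c
            sy = d * x + e * y + f
            if 0 <= sx <= w - 1 and 0 <= sy <= h - 1:
                dst[y][x] = bilinear(src, sx, sy)
    return dst


# Scale a 2x2 image by a factor of 2: the inverse transform halves the coordinates.
src = [[0.0, 10.0], [20.0, 30.0]]
out = warp_backward(src, (0.5, 0.0, 0.0, 0.0, 0.5, 0.0), 3, 3)
```

Because every destination pixel is visited exactly once, no holes can appear in the output, which is the property the excerpt gives as the reason for choosing backward mapping.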
263. umption can affect everything from cost and longevity of the device in your project to system performance and battery life in hand-held applications. Xilinx design tools have expanded over the last few releases to offer you more ways to generate accurate estimates of your device power consumption. The Power Essentials: There are two main components to FPGA chip power consumption: • Dynamic power is largely determined by the switching power of the core and the switching speed of the I/O. Dynamic power is largely affected by capacitive load, supply voltage, and switching frequency. • Quiescent power is dominated largely by transistor leakage current and by DC current from a few specialized FPGA circuits. The drive to lower FPGA costs drives transistor size down, which tends to raise quiescent power consumption. Design demands also continue to force design performance faster, leading to higher switching rates and higher dynamic power consumption. Xilinx Virtex-4 FPGAs can deliver as much as 70% more performance than the nearest competing FPGA offering. Normally this trend might raise dynamic power consumption. Designs are also getting denser year to year; again, in normal circumstances this tends to raise static and dynamic power. However, the Virtex-4 family offers dramatic static and dynamic power reduction through architecture design and process, offering as much as 1/10th the
264. units shipped to date. The Spartan series features the world's most accepted low-cost FPGA architecture and is familiar to thousands of engineers. Moore's law allows Xilinx to offer ever lower prices in the Spartan series of FPGAs. The Spartan-3E family is our third Spartan family in 90 nm and gives us the lowest possible manufacturing costs. These low costs make programmable logic successful in high-volume and low-cost production applications, an area previously reserved for ASIC and gate array technologies. The Spartan-3 family, introduced in 2003, is optimized for I/O-centric designs and is ideally suited for systems that have large I/O requirements. The Spartan-3E family is optimized for gate-centric designs and is well suited for designs that require a relatively higher gate-to-I/O ratio. The older Spartan-II/IIE and Spartan-XL families remain suitable candidates for legacy designs or for systems with higher core voltages. Spartan-3 FPGAs have found remarkable success in the production of systems that typically would have used an ASIC or gate array. For example, many flat-panel display systems employ Spartan-3 devices to manage the display driver and control functions. The ability to modify the design after layout and adapt the system to changing market conditions makes FPGAs highly desirable. Spartan-3E devices extend the reach of FPGAs into production volumes by further reducing costs while preserving th
265. ure and power supply voltage levels. You can perform power calculations at three distinct phases of the design cycle: • Concept phase: calculating a rough estimate of power based on estimates of logic capacity and frequency of operation. Supported using Xilinx Web Power Tools. • Detailed design phase: calculating power more accurately based on detailed information about how the design is implemented in the FPGA. Supported through XPower and included with all copies of ISE software. • System integration phase: measuring power using benchtop instrumentation (board-level measurements). Xilinx Power Tools utilize activity rate to reach accurate power estimates. Clocks and other input signals have an absolute frequency. Synchronous logic signals use a percentage activity rate relative to the associated clock. An activity rate of 100% indicates that a net is expected to change state on every clock cycle. Activity rates can be set globally (12% of the nets switch each clock cycle), on groups of signals, or on individual signals. Activity rate allows you to adjust clock frequency in XPower and see the effect on power consumption in your design results. The frequency of logic elements displayed by XPower is a function of their output signals. Power Tools: The Xilinx Power Central website for FPGA and CPLD power (www.xilinx.com/power)
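To make the activity-rate idea in excerpt 265 concrete, here is a small sketch of how a percentage activity rate and a clock frequency combine into a dynamic power figure. It assumes the textbook CMOS approximation of one half C·V² of energy per state change; the excerpt does not describe XPower's internal model, and the function names, capacitance, and voltage values below are hypothetical.

```python
def net_dynamic_power(cap_farads, vdd_volts, clock_hz, activity_pct):
    """Estimate dynamic power of one net, in watts.

    activity_pct is the percentage of clock cycles on which the net changes
    state (100 means a state change on every clock cycle).
    Assumption: each state change costs 0.5 * C * V^2 joules.
    """
    transitions_per_second = clock_hz * activity_pct / 100.0
    return 0.5 * cap_farads * vdd_volts ** 2 * transitions_per_second


def design_dynamic_power(nets, vdd_volts, clock_hz, global_activity_pct=12.0):
    """Sum net power; nets is a list of (cap_farads, activity_pct or None).

    A net with activity None falls back to the global rate, mirroring the
    global versus per-signal settings mentioned in the excerpt.
    """
    total = 0.0
    for cap, activity in nets:
        rate = global_activity_pct if activity is None else activity
        total += net_dynamic_power(cap, vdd_volts, clock_hz, rate)
    return total


# Hypothetical example: two 10 pF nets at 1.2 V and 100 MHz; one uses the
# global 12% rate, the other is overridden to toggle on every clock cycle.
example_nets = [(10e-12, None), (10e-12, 100.0)]
p_100mhz = design_dynamic_power(example_nets, 1.2, 100e6)
p_200mhz = design_dynamic_power(example_nets, 1.2, 200e6)
```

Under this model, expressing activity relative to the clock means that changing the clock frequency rescales every net at once, so doubling the clock doubles the estimate.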
266. urnal The Spartan-3E FPGA Starter Kit ($149), available in Q3CY05. Complete solution includes: • Xilinx Spartan-3E 500,000-gate platform FPGA (XC3S500E-4FG320) • 32 Mb parallel flash • 8 Mb SPI flash • 32 MB of DDR SDRAM • Board interfaces • Ethernet 10/100 PHY • USB 2.0 PHY controller • 3-bit, eight-color VGA display port • Nine-pin RS-232 serial port • PS/2-style mouse/keyboard port • Three 40-pin expansion connection ports. Additional features: • Two-line LCD • Four slide switches • Eight individual LED outputs • Two momentary-contact push-button switches • 90 MHz crystal clock oscillator • Universal power supply 100-240V AC, 50/60 Hz • JTAG cable • Spartan-3/3E resource CD • ISE Foundation evaluation CD • EDK evaluation CD. Part numbers: • DO-SPAR3E-DK • DO-SPAR3E-DK-J (Japanese version). The Spartan-3E FPGA Starter Kit includes a full-featured development board based on the Spartan-3E platform FPGA family. Memec Mini Modules: The Ultimate Programmable System. What if it became easier to integrate your favorite Xilinx FPGA onto your circuit board? The job has already been done for you. Memec Mini Modules are complete systems on a module, which pack all the necessary functions needed for an embedded processor system onto a tiny footprint the size of your thumb. Spartan-3 Mini Module Features, Virtex-4 Mini Module Features: Small Fo
267. vides the basic interface and support functions required by a low-cost platform for general-purpose prototype use. The kit bundles a Spartan-3E-based demo board with a power supply, user guide, and reference designs. Xilinx ISE™ WebPACK software or BaseX software, along with a low-cost JTAG download cable, are available as kit options. Xilinx FPGA Starter Kits: The Spartan-3 FPGA Starter Kit ($99). Complete solution includes: • Xilinx Spartan-3 200,000-gate platform FPGA (XC3S200-4FT256C) • Xilinx 2 Mb Platform Flash configuration PROM (XCF02S) • 1 MB of fast asynchronous SRAM (512K x 16 or 256K x 32) • 3-bit, eight-color VGA display port • Nine-pin RS-232 serial port • PS/2-style mouse/keyboard port • Four-character, seven-segment LED display • Eight slide switches • Eight individual LED outputs • Four momentary-contact push-button switches • 50 MHz crystal clock oscillator • Three 40-pin expansion connection ports • Universal power supply 100-240V AC, 50/60 Hz • JTAG cable • Spartan-3 resource CD • ISE Foundation evaluation CD • EDK evaluation CD. Part numbers: • DO-SPAR3-DK • DO-SPAR3-DK-J (Japanese version). The Spartan-3 FPGA Starter Kit gives you instant access to the complete platform capabilities of the Xilinx Spartan-3 family. The kit brings high-volume designs to reality quicker, at a lower cost, and on schedule.
268. w features that marketing dreamed up for the next generation of products. But that's not all, because you still need to consider if the I/O standards match. CoolRunner-II CPLDs do that for you as well. What does this have to do with cost savings? If we look at the cost of discrete devices versus low-cost Xilinx CPLDs, you can achieve the same function for less money and get a whole lot more. Xilinx has a free downloadable tool called Logic Consolidator to show you just how much you can save, recently updated with even more parts to compare. You can download it at www.xilinx.com/products/cpldsolutions/logic_tool.htm. There are even associated documents to explain how we did the comparison. That's still not all, because there is more to cost than the silicon. In fact, if you look at the cost of discrete devices and the associated stocking, shipping, assembly, and routing costs, in most cases a single integrated chip costs less. Also note the CoolRunner-II CPLD's added reliability, versatility, and low power. Other Cost-Reducing Ideas: Still other features on CoolRunner-II CPLDs may be overlooked. OTF (on-the-fly) reprogramming is one of those features that can make your CPLD design do double duty for no extra cost. Here's one good example of using OTF: on board power-up, use a CPLD to configure the Xilinx FPGA devices on board. Once that is complete, reprogram the
269. w help desk personnel to debug a problem remotely at a customer site. The ChipScope Pro system supports the newest low-cost FPGA offering from Xilinx, the Spartan-3E family. The ATC2 software debug core has also been enhanced for automatic setup, allowing the logic analyzer to automatically find which ATC2 FPGA pins are connected to which logic analyzer pod signals, making setup faster and easier. ChipScope Pro cores have also been performance enhanced. System debug can now occur at clock speeds greater than 315 MHz, making them among the fastest verification cores available. Core Inserter can now be used across multiple netlists from one or more sources, allowing system integrators to debug entire designs rather than one section at a time. ChipScope Pro software also supports the newest Xilinx USB platform programming cable. Linking to Agilent Logic Analyzers: The ChipScope Pro system also links internal FPGA debug to Agilent Technologies bench-top logic analyzers using the included ChipScope Pro ATC2 core. This core synchronizes the ChipScope Pro system to Agilent's FPGA Dynamic Probe software, an optionally purchased plug-in to your Agilent 1680, 1690, or 16900 logic analyzer. [Figure labels: PC board; works with Xilinx Virtex-4, Spartan-3, Virtex-II, and Virtex-II Pro FPGAs.] This unique partnership between Xilinx and Agilent delivers deeper trace memory, faster clock speeds, and more trigger opti
270. ware API functions provide runtime host connectivity. This enables direct communication from the host system to all nodes within the network. [Figure 2: Screen capture of network components and Smith-Waterman algorithm using DIMEtalk. Figure 3: Screen capture of a multi-FPGA network designed using DIMEtalk.] The database sequences downloaded from the GenBank database are formatted using a C routine for input to the FPGA network. The formatting consists of coding the individual nucleotide bases using 4 bits (A = ux00, C = ux01, G = ux10, T = ux11), where the bit denoted by u is undefined and the one denoted by x is set to 1 for the starting nucleotide base of a sequence and set to 0 otherwise. The sequences are then packed into groups of eight to fully use the 32-bit DIMEtalk networks. An individual sequence can be well over 200,000 nucleotide bases long. Test Results: The initial test utilizes three of the five FPGAs from the network. The resulting multi-FPGA network comprises a BenNUEY motherboard with a Virtex-II XC2V3000-4 FPGA and an attached BenBLUE-II daughterboard with two Virtex-II XC2V6000-4 FPGAs. The query sequences are hard-coded into the FPGAs, while the individual database sequence files are loaded in the main memory. The database
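The 4-bit nucleotide coding and eight-per-word packing described in excerpt 270 can be sketched as below. The article's formatter is a C routine that is not shown, so this Python version is only an illustration under stated assumptions: the undefined bit u is written as 0, the first base of each group goes in the most significant nibble, and a short final group is padded with zero nibbles. None of those three choices is specified in the excerpt.

```python
# 2-bit base codes taken from the excerpt: A = 00, C = 01, G = 10, T = 11.
BASE_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_sequence(seq):
    """Return one 4-bit code per base, laid out as u x b1 b0.

    u (bit 3) is undefined in the excerpt and is left at 0 here (assumption).
    x (bit 2) is 1 only for the first base of the sequence.
    """
    codes = []
    for index, base in enumerate(seq):
        start_flag = 1 if index == 0 else 0
        codes.append((start_flag << 2) | BASE_CODE[base])
    return codes


def pack_words(codes):
    """Pack eight 4-bit codes into each 32-bit word.

    Assumptions: the first code lands in the most significant nibble, and a
    final partial group is padded with zero nibbles.
    """
    words = []
    for start in range(0, len(codes), 8):
        chunk = codes[start:start + 8]
        chunk += [0] * (8 - len(chunk))     # zero-pad the last group
        word = 0
        for code in chunk:
            word = (word << 4) | code
        words.append(word)
    return words


words = pack_words(encode_sequence("ACGTACGTA"))
```

Note that with zero padding a padded nibble is indistinguishable from a non-starting A, so a real formatter would also need to carry the sequence length or use another terminator.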
271. wer EMI, and it was apparent to Wipro that the Xilinx CPLD was the obvious choice. After using Logic Consolidator for this card, Wipro was so impressed with its efficiency and usefulness that they decided to use it for all of the other cards in their telecom product upgrade project and are planning to use it in future projects as well. Conclusion: Logic Consolidator is a powerful tool that allows customers to analyze the costs of different alternatives for themselves. Because it is built using Microsoft Excel, it is easy to learn and use. To get a copy of Logic Consolidator, contact your Xilinx sales representative. And for more information about, or to download, Logic Consolidator, visit www.xilinx.com/products/cpldsolutions/logic_tool.htm. A new version now includes more logic I/O device choices. [Screenshot: 7400 Conversion Calculator, listing 7400-series gates, registers, and decoders by category, device, description, and total; cell contents not recoverable from the extracted text.]
272. [Rotated product selection table (device families, package types, lead and ball spacing, I/O standards); cell contents not recoverable from the extracted text.]
273. xity FPGA devices, whether used in the final product or as an ASIC prototype, you are clearly facing these same challenges. As a result, EC is quickly becoming a mandatory verification methodology in FPGA design flows. This methodology becomes even more important if you have selected the EasyPath low-cost FPGA solution for your production needs. In this article, we'll provide an overview of EC technology and the benefits it brings to FPGA design. Using the Synopsys Formality equivalence checker in your FPGA design flow enables you to quickly verify the implementation of your device, giving you the freedom to focus your efforts on other design tasks. Introduction to Equivalence Checking: EC is a formal static verification technology that uses mathematical techniques to determine if two versions of the same design, at different stages of development, are functionally equivalent. The power of this technique is its ability to compare between: • Two RTL versions • An RTL description and a gate-level netlist • Two gate-level netlists. EC flows consist primarily of four stages: 1. Read: the EC tool reads RTL descriptions or netlists for the reference and implementation designs and segments the logic in each design into smaller components called logic cones (Figure 1). Logic cones are simply small groups of logic bordered by registers, ports, and black boxes. The end points for each logic cone are known as compare
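As a toy illustration of what "functionally equivalent" means for a single logic cone in excerpt 273, the sketch below compares a reference function against an implementation on every input vector. This is not how Formality works: the excerpt says only that EC uses mathematical techniques, and production tools rely on formal engines rather than enumeration, which only scales to very small cones. All function names here are hypothetical.

```python
from itertools import product


def cones_equivalent(ref, impl, n_inputs):
    """Compare two single-output cones on all 2**n_inputs input vectors.

    Returns (True, None) if they always agree, otherwise (False, vector)
    where vector is the first input combination on which they differ.
    """
    for bits in product((0, 1), repeat=n_inputs):
        if ref(*bits) != impl(*bits):
            return False, bits
    return True, None


def mux_ref(sel, a, b):
    """Reference ("RTL-style") cone: a 2-to-1 multiplexer."""
    return a if sel else b


def mux_gates(sel, a, b):
    """Implementation ("gate-level") cone: the same mux built from AND/OR/NOT."""
    return (sel & a) | ((1 - sel) & b)


def mux_buggy(sel, a, b):
    """A faulty implementation: the inverter on sel is missing."""
    return (sel & a) | (sel & b)


good = cones_equivalent(mux_ref, mux_gates, 3)
bad = cones_equivalent(mux_ref, mux_buggy, 3)
```

The counterexample returned for the faulty cone plays the role of the discrepancy report that lets a designer isolate where the RTL and the implementation diverge.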
274. xity as some ASICs. The ability to realize very complex functionality in FPGAs has created tremendous challenges in the FPGA verification process. Formality helps you: • Achieve 100% coverage • Reduce runtimes 10-100X versus traditional dynamic verification • Quickly isolate discrepancies between the RTL and implementation • Reduce or eliminate the time spent re-simulating after a design change • Verify that changes did not unintentionally impact other functionality. Synopsys and Xilinx have worked together to create a proven FPGA static verification solution centered on Formality, Design Compiler FPGA, and Xilinx ISE implementation tools. Formality provides a fast, thorough functional verification methodology for proving equivalence between multiple representations of your design. For more information on Formality or DC FPGA, visit www.synopsys.com or contact your Synopsys sales representative. Complete FPGA and CPLD Power Analysis: The Power Central website contains everything you need to accurately predict FPGA or CPLD power consumption. by Lee Hansen, Sr. Product Marketing Manager, Xilinx, Inc., lee.hansen@xilinx.com; Tony Thomas, Technical Marketing Engineer, Xilinx, Inc., tony.thomas@xilinx.com. Device power consumption has become one of the leading design issues facing FPGA and CPLD engineers today. Device cons
275. your FPGA designs in a way that will save days of development time. The FPGA dynamic probe, when combined with an Agilent 16900 Series logic analysis system, allows you to access different groups of signals to debug inside your FPGA without requiring design changes. You'll increase visibility into internal FPGA activity by gaining access to up to 64 internal signals with each debug pin. You'll also be able to speed up system analysis with the 16900's hosted power mode, which enables you and your team to remotely access and operate the 16900 over the network from your fastest PCs. The intuitive user interface makes the 16900 easy to get up and running. The touch screen or mouse makes it simple to use, with prices to fit your budget. Optional soft touch connectorless probing solutions provide unprecedented reliability, convenience, and the smallest probing footprint available. Contact Agilent Direct today to learn more. Agilent Technologies: dreams made real. HIGH VOLUME SOLUTIONS: Leading the High-Volume Programmable Revolution. Xilinx pioneered and leads the adoption of PLDs in low-cost systems. by Sandeep Vij, Vice President, Worldwide Marketing, Xilinx, Inc., sandeep.vij@xilinx.com. Table of Contents: Spartan-3E FPGAs Introduce a New Era in Low-Cost Programmable Logic, 14; Implementing New Configuration Options for the Spartan-3E Family. Xilinx was the first FPGA company to recognize the market
