Functional block | Logic elements (LE) | Block memory 512 bit | Block memory 4k | Block memory 4k x 144 | DSP blocks | PLL
L1 trigger ZSupp | 9000 | 80 | 40 | 1 | 80 | 1
HLT link | 700 | 0 | 2 | 0 | 0 | 0
L1B | 3000 | 12 | 12 | 0 | 0 | 1
Synchronization | 1000 | 0 | 12 | 0 | 0 | 0
Data generator | 200 | 0 | 0 | 0 | 0 | 0
ECS slave | 400 | 0 | 0 | 0 | 0 | 0
ECS registers | 500 | 0 | 0 | 0 | 0 | 0
ADC clock gen | 200 | 0 | 0 | 0 | 0 | 1
Total | 15000 | 92 | 68 | 1 | 80 | 3
Available in EP1S20 | 18460 | 194 | 82 | 2 | 80 | 6
Available in EP1S25 | 25660 | 224 | 138 | 2 | 80 | 6

Table 4: Estimation of the needed resources on the PP-FPGA. These are estimated numbers under the assumption that the processing is done at 80 MHz and that the zero suppression for the HLT is done on the SyncLink FPGA, which reduces the resources needed on the PP-FPGA. The 16 clocks per PP-FPGA for the ADC sampling can be generated using one PLL plus additional logic. For sub-detectors not contributing to the L1 trigger, the logic resources on the chip are available for other tasks.

7 SyncLink FPGA

This FPGA is used to distribute control signals, to interface the TTCrx and the FEM, to link the cluster fragments from the whole board, to perform the zero suppression for the HLT and to send the data to the RO-TxCard. The cluster collection uses FIFO-based interfaces from the PP-FPGAs to the SyncLink FPGA. The links used to collect the clusters for the L1T interface and the HLT are chosen to be 16 bit wide.

7.1 L1T fragment link

To clarify the use of
the upper 3U is reserved for the power backplane. The optical and electrical connectors for the TTC, ECS, L1T, HLT and Throttle are plugged manually from the back, which is accessible since there are no transition modules.

16.1 Cut-outs

For the two optical receivers, two cut-outs are made to allow a maximal height of more than the stack height of 11 mm. The allowed height of the receiver therefore increases by the board thickness, which is 2.2 mm. The A-RxCards also need cut-outs in the region of the front panel connector. The two plus four cut-outs superimposed lead to the final front-side shape given in appendix D.

Figure 14: This preliminary board layout shows the arrangement of the mezzanine cards (A-RxCard for 16 ADC channels, Glue Card, O12-RxCard for 12 x 1.6 Gbit/s links, RO-TxCard with Ethernet), the connectors and all major components on the board.

16.2 LEDs

The mechanical constraints do not allow LEDs placed on the motherboard to be visible on the front panel. There are several status LEDs visible on the back. Visible LEDs on the front panel are implemented on the optical receiver card.

17 FPGA implementation guidelines

To allow several groups to work on the software and firmware development for the TELL1, it is necessary to define the interfaces of the board, of the chips and of the functional blocks on the chips. To keep the interfaces as simple as possible and robust against changes of clock domains, the use of on-chip real dual-port
6.3 Data synchronization for the O-RxCards 11
6.4 Event synchronization for the Velo 11
6.5 Event synchronization for the O-RxCard 12
6.6 L1 buffer memory access organization 12
6.7 L1T zero suppression 16
6.8 I/O signals and resources 16
7 SyncLink FPGA 18
7.1 L1T fragment link 18
7.2 HLT fragment link 19
7.3 Cluster formats for fragment links 19
7.4 L1T data path on the SyncLink FPGA 21
7.5 HLT data path on the SyncLink FPGA 21
7.6 RO-TxCard interface 21
7.7 TTC interfaces 22
7.8 L0 and L1 Throttle 23
7.9 FEM for Beetle based read out 23
7.10 I/O signals and resources 24
8 ECS interface 25
8.1 JTAG 25
8.2 I2C 25
8.3 Parallel local bus 26
9 Resets 27
10 Clock distribution and signal termination on the board 28
11 FPGA configuration 28
12 FPGA technology 29
12.1 Altera 29
12.2 Xilinx VirtexII 30
12.3 Device choice 31
13 L1 trigger and HLT interface RO-Tx 31
14 Testing 31
14.1 JTAG boundary scan 31
14.2 Test points 31
14.3 Analyzer connector 31
222
16k x 36 / 512 kbit RAM | - | 212
18 x 18 multiplier | 105 | 206

Table 8: Xilinx VirtexII speed grade 5 (second fastest out of 3) compared to Altera Stratix speed grade 7 (slowest out of 3). (http://www.altera.com, http://www.xilinx.com)

12.3 Device choice

Several reasons have driven the decision to use Altera Stratix devices on the board:

- Migration: The migration between devices in the low-density region of the Stratix family allows a relatively low-cost migration to higher-density devices. The VirtexII family devices with equivalent size are in the high-density region of that family and tend to get very expensive.
- Memory: With three different memory block sizes, the memory can be used more efficiently in our application.
- DDR SDRAM interface: Dedicated read-data clock (DQS) delay circuits for DDR SDRAM.
- PLL vs. DLL: PLLs are more suitable for clock distribution since they do not suffer from additional jitter after each frequency translation step.
- Cost and speed: The slowest speed grade Stratix device is sufficiently fast.

13 L1 trigger and HLT interface RO-Tx

The interface to the HLT and the L1T is implemented as a mezzanine card. A two-gigabit-Ethernet card is under development and is considered as the baseline to interface the DAQ system. The card provides two copper GBE links [6].

14 Testing

14.1 JTAG boundary scan

All devices supporting JTAG boundary scan are chained together. For production testing the
In most cases on this PCB a series resistor of 33 Ohm is appropriate. Simple parallel termination cannot be applied due to the lack of driving strength of LVTTL and the too high power dissipation. All signals driven by the FPGAs can be terminated by programming the I/O cell to use the on-chip termination option. For the DDR SDRAM, the SSTL-2 I/O standard developed for memory bus systems makes use of parallel termination and is fully integrated in the memory and in the I/O cells of the FPGA. With the use of the SSTL-2 I/O standard and the TLK2501 using 50 Ohm transmission between the optical receiver and the de-serializer, all signal layers on the board are chosen to be 50 Ohm.

The clock distribution on the board is accomplished with PLL circuits on the FPGAs for de-skewing and multiplying the clock signals, see figure 13. The Clock40Des1 40 MHz clock from the TTCrx is taken as the reference for all circuits using the LHC system clock and is connected to the SyncLink FPGA. For distribution to the various circuits on the board, PLLs on the SyncLink FPGA are used. This allows the clock phase to be adjusted individually for each external circuit and ensures the proper timing between them. In addition to the 40 MHz system clock, a x3 multiplied 120 MHz clock is used for the DDR memory access on the PP-FPGAs; this clock is also used for the link interfaces for the L1T and the HLT. With this distribution scheme no external clock buffers are needed and a maximal flexibility
Figure 3: Data flow to and in the PP-FPGA with the optical receiver cards. The diagram shows the data flow for 6 optical links, one quarter of the board (BCnt/EvCnt information at 160 MHz for accepted events).

Level 0 throttle is set and sent to the readout supervisor to stop accepting events at Level 0. This mechanism can only prevent buffer overflows if the link buffer is large enough to store the remaining, already accepted events. After a second aggregation on the SyncLink FPGA, the board-wide event fragment is stored in the multi-event packet (MEP) buffer for the L1T. This buffer is implemented as an on-chip RAM on the SyncLink FPGA. Complete MEPs are framed into Ethernet packets and the IP header is added according to the specification in [2] before they are sent to the RO-TxCard.

The L1-accepted event rate of 40 kHz allows the zero suppression either to be done on each PP-FPGA or the data to be transferred to the central SyncLink FPGA. The main advantages of a central implementation of the zero suppression are that it leaves more resources in the PP-FPGA for the L1 trigger pre-processor, and that a unified scheme for all sub-detectors can be used. The low event rate of 40 kHz allows the events to be time-multiplexed for the whole board for sub-detectors that perform a channel-independent zero suppression.

The readout of events accepted by the L1 trigger starts with the L1T decision distributed over the TTC broadcast command, which is interpreted on
Figure 16: Sub-detector specific blocks on the SyncLink FPGA are indicated with the dashed boxes (PP-FPGA links, 64 KByte on-chip buffer, 100 MHz SyncLink FPGA).

framework can serve as an example implementation that guides the specific implementations for the other sub-detectors. All interfaces and data formats need to be defined and documented in a framework user guide.

References

[1] Aurelio Bay, Guido Haefeli, Patrick Koppenburg: LHCb VeLo Off Detector Electronics Preprocessor and Interface to the Level 1 Trigger. LHCb Note 2001-043.
[2] B. Jost, N. Neufeld: Raw data transport format. LHCb Note 2003-063.
[3] Raymond Frei, Guido Gagliardi: A long analog transmission line for the VELO read out. LHCb Note 2001-072.
[4] Niels van Bakel, Daniel Baumeister, Jo van den Brand, Martin Feuerstack-Raible, Neville Harnew, Werner Hofmann, Karl-Tasso Knoepfle, Sven Loechner, Michael Schmelling, Edgar Sexauer, Nigel Smale, Ulrich Trunk, Hans Verkooijen: The Beetle Reference Manual.
Prentice Hall, 1993.
[5] Mike Koratzinos: The Vertex Detector Trigger Data Model. LHCb Note 89-070.
[6] Hans Muller, Francois Bal, Angel Guirao: GiGabit Ethernet mezzanines for DAQ and Trigger links of LHCb. LHCb Note 2003-021.
[7] Jorgen Christiansen: Requirements to the L1 front end electronics. LHCb Note 2003-078.
[8] Niels Tuning, unpublished.
[9] J. Christiansen, A. Marchioro, P. Moreira and T. Toifl: TTCrx Reference Manual. CERN EP/MIC, Geneva, Swi
Figure 13: Overview of the clock distribution on the TELL1 (125 MHz Gigabit Ethernet clock, ECS and TTC clocks). Only clock signals are drawn.

for the SyncLink FPGA. The EEPROM used, the EPC16, is a 16 Mbit flash memory manufactured by Sharp. The minimal number of erase cycles is 100,000 (for the smaller EPC devices it is significantly lower, 100).

12 FPGA technology

12.1 Altera

The evolution of FPGA technology has driven the devices to higher density, faster on-chip processing and faster I/O capability. The development is mostly driven by the telecommunication industry, which is also doing multi-channel processing on FPGAs. There is nevertheless a major difference in the demand on I/O performance: for this board only single-ended 40 MHz to 160 MHz interconnect signals are used, whereas the standards currently supported by FPGA families are e.g. 840 Mbit/s or 3.125 Gbit/s. This circumstance should not mislead to the conclusion that these chips are overkill for this use. Price investigations for high-density FPGA devices for the present and the near future show that the most recent device families will cost less than e.g. the Altera Apex devices. This can be explained by the miniaturization of the silicon process to 0.13 um, which reduces the production cost. An incomplete list of features shows the advantages of the Stratix devices over the Apex. For details see the specification and application notes on the
This is done for all channels.

6.5 Event synchronization for the O-RxCard

For all detectors, the 24-bit EvCnt (L0 EvID) and the 12-bit BCnt are added to the header. For each L0-accepted event, the L0 EvCnt and BCnt available on the SyncLink FPGA are written into a local L0 de-randomizer, which is a FIFO that can store at least 16 L0 EvCnt and 16 BCnt values. The L0 EvCnt and BCnt are transmitted over a 6-bit wide bus within 6 clock cycles (4 words L0 EvCnt and 2 words BCnt) to the PP-FPGAs. The synchronization circuit on the PP-FPGA sets a synchronization acknowledge signal as soon as the header with the corresponding identification is detected. This acknowledge is sent to the L0 de-randomizer on the SyncLink FPGA to read out its next value.

6.6 L1 buffer memory access organization

A block diagram of the principle of the L1 buffer controller is shown in figure 7.

Figure 7: Schematic L1B controller block diagram (120 MHz L1B controller, 2M x 32 bit / 64 Mbit needed per input channel, data to DAQ_PPLink, read/write access for ECS from the SyncLink FPGA). For the hardware implementation, additional FIFOs on the input and output of the core are used to improve the timing.

The data coming from all synchronization channels are written to the L1B by one L1B controller. Its arbiter schedules the required transactions. It checks the state of the InFifo, indicated with the
external boundary scan cable device can be connected to this chain with a 10-pin connector. Since the JTAG chain over the whole board is very long (75 cm), the TCK and TMS signals are distributed point-to-point to avoid signal integrity problems. Boundary scan can also be performed with the JTAG from the Glue Card.

14.2 Test points

For debugging purposes many signals are required to be accessible with a scope probe. It is foreseen to route unused I/O pins from the FPGAs to test points and connectors.

14.3 Analyzer connector

To provide a simple connection for a logic analyzer to configurable GPIOs of the FPGAs, a 20-pin connector for the PP0 FPGA and one for the SyncLink FPGA are foreseen.

14.4 Lemo connectors

Three Lemo connectors, one for the PLL 40 MHz input and two for general I/O to the SyncLink FPGA, are implemented. This allows the board to be clocked without the TTC interface.

15 Power requirements

A list of all power supplies and their estimated currents is given in table 9. For the FPGAs a power calculation spreadsheet has been used for the estimation. To avoid too high a voltage drop, the low-voltage power supplies (1.5 V / 5 A and 2.5 V / 10 A) have to be generated on the motherboard using PWM power supplies. These work typically at 85% efficiency and use a 48 V input voltage, which leads to 1.5 A per board. The 5 V and 3.3 V are distributed on the backplane. Separated from the digital supplies, the analog +5 V, -5 V and analog ground are distributed on the power
the SyncLink FPGA, see figure 4. Over a serial link, the L0 EvCnt and the trigger type for accepted events is transmitted to the PP-FPGAs.

Figure 4: Overview of the SyncLink FPGA data and control signal path (QDR clock, clk_40, long and short Brcst, TTCrxRst, BCnt/EvCnt, SyncData and SyncAck, L1A generator and DataValid broadcast to the PP-FPGAs, shared data paths for the 2-channel RO-TxCard at 100 MHz, 4 KByte QDR SRAM and 64 KByte on-chip buffers). The data flow for linking and sending the data to the RO-TxCard consists of first buffering the input link data, zero suppression, event linking, multi-event buffering and framing. The two data paths for L1T and HLT appear to be identical except that for the HLT an external RAM is needed.

The arbiter reads the requested events stored at the start address given by the L0 EvCnt. The L0 EvCnt is also stored in the event header and can therefore be checked. The events are collected in a dedicated block called HLT_PPLink. At this stage a first aggregation of the headers within one PP-FPGA is done: the L0 EvCnt and BCnt are checked, the processing error flags are OR-ed among all channels, and 2 user-specific header words per channel are
the access to registers, on-chip memories, the L1B and the external MEP buffers. It is used in 32-bit multiplexed mode, running typically at 10 MHz or 20 MHz depending on the performance needed and obtained. The frequency can be changed with the PLL settings on the SyncLink FPGA, and the phase can be adjusted to ensure valid setup and hold times for accesses to the 40 MHz clock domain. (See the documentation of the PLX9030 for the functionality of the local bus, http://www.plxtech.com.)

The 28-bit address space is divided into two sections: the lower half (128 MByte) for the access of registers, FIFOs, RAMs and ROMs on the FPGAs, and the upper half for the L1 buffer. Since 128 MByte is only one quarter of the size of the whole buffer on the board, the address space of the upper half has to be implemented as a swapped address space, see figure 12.

Figure 12: The ECS address space is a swapped address space. The different spaces are enumerated with the page address (8-bit swap page versus the non-swapped 28-bit address space).

The swap page number is set in an 8-bit register on the SyncLink FPGA and is distributed to the PP-FPGAs via dedicated data lines. 8 bits allow the address space to be extended by 6 bits to cope with an increase of the L1 buffer. To access registers of different sizes (8 bit, 16 bit and 32 bit), the two lowest-order bits of the address contain the binary-encoded byte enable signals. A more
3 clock cycles, a counter on the SyncLink FPGA is implemented to make it always available. The event counter is also transmitted on the SyncData bus, over four clock cycles. The EvCnt on the TTCrx is called L0 EvCnt in LHCb.

Brcst: The TTCrx is set up such that the broadcast command data signals are all synchronous to Clock40Des1; the appropriate settings are made in the control registers of the TTCrx. The broadcast command is used to decode the LHCb L1-accepted events.

Footnote 16: The chip is packaged in a 144-pin FBGA package, 13 mm x 13 mm.
Footnote 17: Signals whose names end in Str are strobe signals and are used to latch the corresponding data bus.
Footnote 18: At overflow the value of the counter remains at 0xffff.

L1 accept: The TTCrx signal called L1Accept is named L0AcceptLHCb to avoid any problems with the LHCb naming convention. It is used for the Velo FEM and is also used to generate the L0 EvCnt independently of the TTC BCnt, which allows the correct synchronization to be verified.

DOut: The type of data available on this data bus is indicated with the DQ signals.

DQ: Data qualifier bits going with the DOut data bus.

DOutStr: Indicates valid data on the data bus.

SubAddr: Used to output sub-address contents.

For the optical receiver the Agilent HFBR-2316T is used, which is recommended for use with the TTC system in the counting room [1]. This receiver can be replaced, with some small modifications of the layout, by the TrueLight TRR-1B43, which will
4x1 | L1A / EvID |
4 | Clock 40 MHz distribution |
4 | Clock 160 MHz distribution | 1.5V HSTL
2 | Processing mode |
4x2 | L1T processing sync |
4x2 | HLT processing sync |
6x1 | Initialization done |
2x(16+32) | QDR memory for HLT MEP buffering | 1.5V HSTL
4 | Resets |
4x4 | GPIO from PP-FPGAs |
48 | Analyzer interface / TP |
2 | ECS I2C | 3.3V LVTTL
3 | Device address |
3x2 | Reference voltages 0.75 V |
8x2 | Termination resistor reference R |
606 | Total |

Table 11: The number of I/Os used for the SyncLink FPGA with the proposed partitioning of the board into 4 PP-FPGAs.

B Signal tables

Option | Signal name | Pins | I/O | Standard
All | Digital GND | Cu plate | pwr |
JTAG | JTAG_TCK | 1 | out | 3.3V LVTTL
JTAG | JTAG_TMS | 1 | out | 3.3V LVTTL
JTAG | JTAG_TDI | 1 | out | 3.3V LVTTL
JTAG | JTAG_TDO | 1 | in | 3.3V LVTTL
JTAG | JTAG_RESET | 1 | out | 3.3V LVTTL
A-RxCard (16 x 8 bit) | Digital 3.3V | Cu plate | pwr |
A-RxCard (16 x 8 bit) | Data | 128 | input | 3.3V / 2.5V LVTTL
A-RxCard (16 x 8 bit) | Clk | 16 | output | 3.3V / 2.5V LVTTL
A-RxCard (16 x 8 bit) | I2C RxScl | 1 | out | 3.3V LVTTL
A-RxCard (16 x 8 bit) | I2C RxSda | 1 | inout | 3.3V LVTTL
A-RxCard (16 x 8 bit) | I2C RxAddr | 2 | const | 0 or 3.3V
A-RxCard (16 x 10 bit) | Digital 3.3V | Cu plate | pwr |
A-RxCard (16 x 10 bit) | Data | 160 | input | 3.3V LVTTL
A-RxCard (16 x 10 bit) | Clk | 16 | output | 3.3V LVTTL
A-RxCard (16 x 10 bit) | I2C RxScl | 1 | out | 3.3V LVTTL
A-RxCard (16 x 10 bit) | I2C RxSda | 1 | inout | 3.3V LVTTL
A-RxCard (16 x 10 bit) | I2C RxAddr | 2 | const | 0 or 3.3V
O-RxCard (6 inputs) | Digital 2.5V | Cu plate | pwr |
O-RxCard (6 inputs) | Data / Dv / Er | 108 | input | 2.5V LVTTL
O-RxCard (6 inputs) | LoopEn | 6 | output | 2.5V LVTTL
O-RxCard (6 inputs) | PrbsEn | 6 | output | 2.5V LVTTL
O-RxCard (6 inputs) | nLckRef | 6 | output | 2.5V LVTTL
Figure 19: Second half of the pin-out of the O-RxCard signal connector (pins 97-200: the 16-bit data buses, RxEr, RxDv, LckRef, LoopEn/Enable and RxClk signals of channels 3-5, the front-panel connections FP-Con, ground pins, RxAddr, RxSDA/RxSCL and the JTAG TDI/TDO lines). The optical receiver card with 12 links uses two connectors; the signals on the second connector are identical except for the LED and front panel connections. The second connector connects LED bits 2, 4, 6 and 8 and FP-Con 9, 11, 13 and 15.

Power plates on the Rx connector (plate name, A-RxCard, O-RxCard):
Altera web site.

- On-chip memory: Fast and flexible embedded memory block structure with block sizes of 512 bit, 4 kbit and 512 kbit.
- Power and I/O: Low power consumption due to the low core voltage; support of a wide range of current signaling standards at its I/Os.
- Fast: The slowest speed grade is fast enough for this application.
- Termination: Termination of the interconnect traces on the PCB is possible on the chip. This significantly increases the allowed density of fast signals.
- PLL: Allows flexible clock management and replaces clock buffers on the board.
- DSP blocks: Embedded multiply-accumulate blocks make the processing less speed-critical and significantly reduce the number of needed LEs.

12.2 Xilinx VirtexII

The Xilinx VirtexII family is also suitable for the needs of the TELL1; devices with the necessary resources are available. The architectural differences between Stratix and VirtexII are given by the size of the embedded RAM blocks, the width and modularity of the DSP multiplier blocks, and DLLs instead of PLLs. To compare the two device families, a table of performance in maximal frequencies is shown in table 8.

Function | Xilinx VirtexII (MHz) | Altera Stratix (MHz)
16-bit adder, reg to reg | 239 | 239
32:1 MUX, reg to reg | 323 | 216
64 x 8 distributed RAM / 32 x 18 M512 block RAM | 294 | 242
1k x 9 block RAM / 128 x 18 M4K block RAM | 250 |
Power connector for RxCards:

Figure 20: Power plate signal definition for the RxCard connectors (Analog +5V / -5V, AGND, Analog 2.5V, D3V3, D2V5, and the FEM signals FEM_SCL, FEM_SDA, FEMCLK, FEMRESET, FEML0ACCEPT, FEMDATAVALID, FEMDATA[3:0], FEMFIFOFULL).

Figure 21: The backplane connector is used only for power (A+5V, A-5V and AGND rails).

D Dimensions

Figure 22: Dimensions of the A-RxCard.

Figure 23: Dimensions of the O-RxCard (bottom view, connector side).

Figure 24: Dimensions of the mother board (top view, component side).

Dimensions are also given in the database for the project. Guido Haefeli (iphe.unil.ch) can give more detailed information.
Figure 2: Data flow to and in the PP-FPGA for the Velo read-out. Only one A-RxCard and PP-FPGA is shown. The FIFO data buffers on the input and output of the logic blocks are indicated as small dark rectangles.

of I/O pins and the resource usage on the PP-FPGA, only one L1B controller with its SDRAM controller is implemented per PP-FPGA. The required bandwidth can be obtained using a 48-bit wide SDRAM bank built from 3 memory chips, using a double data rate interface clocked at 120 MHz.

The main processing part of the PP-FPGA is the L1 trigger zero suppression. To cope with several imperfections of the front-end chip, e.g. pedestals and baseline variation, an advanced zero suppression scheme is necessary. At an event rate of 1.11 MHz all processing is done pipelined. For a detailed study of the pre-processing envisaged for the Velo see [1]. The zero-suppressed data and header are aggregated on the PP-FPGA and transferred to the SyncLink FPGA. Special care has to be taken to avoid buffer overflows at the linking stages due to the restricted bandwidth to the readout network. A large buffer of 64 KByte is inserted at the output stage of the Level 1 trigger link on each PP-FPGA. In case the buffer fill state exceeds a certain level, a throttle signal

Figure 3 (block diagram): half of a 12-way optical receiver card, DDR SDRAM, synchronization blocks, 120 MHz 48-bit data and 13-bit address interface, 40 MHz PP-FPGA clock domain.
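The throttle condition just described (assert a throttle when the L1T output buffer fill state exceeds a certain level) can be modelled in a few lines of Python. This is an illustrative sketch only: the 64 KByte buffer size matches the text above, but the throttle threshold and the fragment sizes are assumed values, not figures from this specification.

```python
# Illustrative model of the L1T output-buffer throttle on one PP-FPGA.
# Assumptions (not from the specification): threshold and fragment sizes.

BUFFER_BYTES = 64 * 1024        # 64 KByte output buffer, as in the text
THROTTLE_LEVEL = 48 * 1024      # assumed fill level at which the throttle is asserted

class L1TOutputBuffer:
    def __init__(self):
        self.fill = 0
        self.throttle = False

    def write_fragment(self, nbytes):
        """Store a zero-suppressed event fragment; returns False on overflow."""
        if self.fill + nbytes > BUFFER_BYTES:
            return False                               # overflow must never happen
        self.fill += nbytes
        self.throttle = self.fill > THROTTLE_LEVEL     # raise throttle above threshold
        return True

    def drain(self, nbytes):
        """Model the read-out towards the SyncLink FPGA."""
        self.fill = max(0, self.fill - nbytes)
        self.throttle = self.fill > THROTTLE_LEVEL

buf = L1TOutputBuffer()
for size in (1200, 8000, 50000):                       # assumed fragment sizes in bytes
    buf.write_fragment(size)
    print(buf.fill, buf.throttle)
```

The point of the model is that the throttle must be asserted while there is still enough headroom for the events that were already accepted and are still in flight; the real threshold has to be chosen from the L0 latency and the worst-case fragment size.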
Clk | 6 | input | 2.5V LVTTL
LED / TP / Debug | | | see O-RxCard
Total (maximal) | 185 | |

Table 12: Signals on the digital signal connector for the RxCard.

Signal name | Number of pins | Remark
Analog GND | 4 |
Digital 3.3V | 2 | only used by the O-RxCard
Analog +5V | 2 | used for the ADCs; used as digital 5 V on the O-RxCard
Analog -5V | | used for the ADCs
Analog 2.5V | | used by the SERDES
Total | 12 |

Table 13: Signals on the power connector for the RxCard.

Signal name | Pins | I/O seen from the FPGAs | Comment
EcsAD<31:0> | 32 | InOut | Multiplexed address/data
ECSClk | 1 | In | The SyncLink FPGA drives the clock
ECSnADS | 1 | In | Address strobe
ECSnBlast | 1 | In | Burst last
ECSnCS1 | 1 | In | Chip select
ECSWnR | 1 | In | Write not Read
ECSnReady | 1 | Out | Asserted by the slave when ready
ECSGPIO | 1 | | ECS GPIO, goes only to the SyncLink FPGA
ECSGPIO | 1 | | ECS GPIO, goes only to the SyncLink FPGA
ECSALE | 1 | | Address latch enable (not used)
ECSnBE<3:0> | 4 | | Byte enable (not used)
ECSnRD | 1 | | Read strobe (not used)
ECSnWR | 1 | | Write strobe (not used)
Total to SyncLink FPGA | 41 | |
Total to PP-FPGA | 38 | | no reset
ECSnCS2 | 1 | | for the RO-Tx
ECSnCS3 | 1 | | for the external control interface

Table 14: PLX 9030 local parallel bus, used in multiplexed 32-bit mode, slave only. The given signals are used to access the FPGAs on the board. In addition, 2 more chip select signals are
FEMScl | I2C | | 1 | 3.3V LVTTL
FEMSda | I2C | | 1 | 3.3V LVTTL
Total | | | 11 |

Table 16: FEM signals.

C Pin-out for connectors on the board

Figure 17: Pin-out for the A-RxCard signal connector (ADC0 to ADC15 data bits, ADC clock lines, RxAddr, RxSda/RxScl and the JTAG signals on the 200-pin connector).

Figure 18: First half of the connector pin-out of the 200-pin O-RxCard signal connector (channels 0-2: 16-bit data, RxEr, RxDv, LckRef, LoopEn/PrbsEn/Enable and RxClk signals, plus ground pins).
UsedWords signal, and performs the read-out of the L1-accepted events only if the InFifo does not risk overflowing (see the cycle budget for the L1B, table 2). A possible sequence over 5 event cycles can be seen in figure 8. With a clock frequency of 120 MHz, enough cycles are available for arbitration and refreshing. In table 2 the necessary cycle count for each task on the SDRAM is given. The chosen SDRAM frequency leads to a sufficiently high memory bandwidth and allows the InFifos to be kept as small as 6 events, as is demonstrated in the following section. Upgrading the L1B space can be achieved by replacing the currently chosen 256 Mbit chips by the next-generation 512 Mbit version without any changes to the PCB.

Rate | Task | Total cycles | Average cycles/event | Remark
Each event | Write CH0 | 2.5 + 36 | 38.5 | Data transfer is 2 words per cycle
Each event | Write CH1 | 36 | 36 | Performed after CH0 writing
Every 2 events | Active | 3 | 1.5 | Activate the row (open)
Every 2 events | Precharge | 3 | 1.5 | Deactivate the row (close)
Every 25 events | Read CH0 | 6 + 2.5 + 36 | 1.8 |
Every 25 events | Read CH1 | 36 | 1.44 | Performed after CH0 reading
Every 8 events | Refresh | 10 | 1.25 | Refresh once per 7.8 us
Cycles available | | 108 | | 900 ns / 8.3 ns
Average cycles used | | | 82.76 |

Table 1: This table gives the number of cycles used for each access to the DDR SDRAM running with CL = 2.5 latency. Half clock cycles appear due to the 2.5 CL latency. The data
added unchanged. For more detail on the detector-specific data format, a user manual for the board will be available. The data from all PP-FPGAs are linked, zero-suppressed and encapsulated on the SyncLink FPGA. At this stage a final aggregation of the header takes place. The data from several events are packed into a multi-event packet (MEP), which is the aggregation of several events, stored in an external QDR-based MEP buffer and finally sent to the RO-TxCard.

The slow control of the FPGAs is done with a 32-bit wide, address and data multiplexed parallel interface. This interface is generated by the PCI bridge on the Glue Card and is called the local bus of the PLX9030. The ECS interface allows access to a board-wide 28-bit address space containing registers, constant tables and the L1 buffers. In order to handle the L1B memory space, an 8-bit paging system is applied. The TTCrx, the A-RxCards and the FEM are connected to individual I2C buses for direct ECS access.

5 Receiver cards

At present, optical and analog electrical receiver daughter card implementations are foreseen to be plugged onto the motherboard. This is necessary due to the different data transmission systems from the cavern to the counting room. For the Velo, the receiver card is used to digitize the data transferred over analog copper links [3] and is therefore mainly an analog circuit with pre-amplifier, line equalizer and ADC. This card is called A-RxCard. The optical receiver
Fragment link | 1000 | 0 | 8 | 0 | 0 | 0
HLT MEP processing | 2000 | 0 | 16 | 0 | 0 | 2
HLT zero suppression | 9000 | 90 | 16 | 0 | 80 | 0
Control generators | 2000 | 0 | 16 | 0 | 0 | 2
RO-Tx interface | 3000 | 0 | 16 | 0 | 0 | 1
Total | 21000 | 90 | 88 | 0 | 0 | 6
Available in EP1S25 | 25660 | 224 | 138 | 2 | 80 | 6

Table 7: Estimation of the needed resources on the SyncLink FPGA.

8 ECS interface

With the use of the LHCb-specific CC-PC and its adaption board, the so-called Glue Card, three interfaces are provided.

8.1 JTAG

JTAG is used for three different purposes on the board. Each of the three available JTAG chains on the LHCb connector is assigned a separate task:

- Programming the EEPROM (only one) containing the firmware for the FPGAs.
- Configuring the FPGAs directly.
- Boundary scan for production testing.

8.2 I2C

All four independent buses provided by the Glue Card are used and even shared among different destinations. The usage is shown in figure 10.

Figure 10: Overview of the 4 I2C buses and their address spaces defined by hardwired pins on the motherboard (binary address ranges for the SyncLink FPGA, the RxCards, the FPGA I2C and the TTC I2C, with the board-identification EEPROM at b1010000).

To control the A-RxCards one I2C bus is used. The two highest address bits (bits 6 and 7) are defined on the motherboard.

- I2C for the RxCards: RxSda, RxScl.
card is used by all other sub-detectors, such as ST, TT, Veto, OT, Muon and Calorimeter. It uses an optical receiver and a de-serializer, which results in a mainly digital design except for the optical receiver part. This card is called O-RxCard.

The signal connection from the receiver cards to the motherboard is split up into 4 separate connectors. The physical placement is chosen such that 2 or 4 mezzanine cards can be inserted, giving flexibility for the receiver card design. Table 12 shows the number of digital signals on the signal connector for different implementations. The connector chosen provides massive copper plates for the GND and Vcc connections and ensures very good signal integrity properties. In addition to the signal connectors, separate power connectors are used to supply the RxCard with all necessary analog power, see table 13.

5.1 A-RxCard

The motherboard is designed to allow for 64 analog links being digitized with 8 or 10 bit. This leads to 16 x 10 data signals plus 16 clock signals per analog card, each card using one quarter of the board. Note that the clock source is on the PP-FPGA and the signal standard is 3.3 V LVTTL. On the FPGA a clock generator is programmed, generating 16 clock signals with individual phase adjustment. The timing of the signals is specified by the ADC used on the receiver card, which is the Analog Devices AD9057. No data valid signal is available, since continuous sampling is performed. For controlling the reference voltage DACs
assumed. Its power is taken from a 48 V source. The 1.5 A on the 5 V is not counted in the power consumption, since two power options for the RO-TxCard are provided.

Table 9: Table of estimated currents for all components on the board.

16 Physical aspects of the TELL1

The board is designed to be as far as possible compliant with the mechanical specification of a 9U VME board given by IEEE standard 1101.1. Not conformant to the standard is the RO-TxCard position, which sticks 5 cm out over the board edge, using part of the transition module space. This simplifies the layout significantly and improves the signal integrity at the PL3 interface by keeping the trace length short. Since neither a backplane nor a transition module is used, this can be tolerated. The width is one slot (20 mm) with the standard rail position. The J1 position (top connector) is used for a custom power-only backplane.

The component placement on the board is driven by two major constraints:

- The A-RxCard needs to have a maximum width to allow a reasonable analog circuit layout. No other connectors can be allowed on the same panel.
- All other interfaces and the power supply have to be squeezed onto the other side of the board.

The approach taken is the following: the data signals are connected to the front panel. For the optical receiver cards the optical fibres take little space. The analog signals are connected with 44-pin DSUB connectors, 4 per TELL1. On the back side
available.

Table 15 (TTC signals, I/O seen from the SyncLink FPGA):

Signal name | Pins | I/O | Comment
BCnt<11:0> | 12 | Input | BCnt, EvCntL, EvCntH
BCntRes | 1 | Input | BCnt reset
BCntStr | 1 | Input | BCnt strobe
Brcst<7:2> | 6 | Input | Broadcast command data
BrcstStr1 | 1 | Input | Broadcast strobe 1
BrcstStr2 | 1 | Input | Broadcast strobe 2
Clock40 | 1 | Input | Non-de-skewed clock
Clock40Des1 | 1 | Input | De-skewed clock 1
Clock40Des2 | 1 | Input | De-skewed clock 2
DbErrStr | 1 | Input | Double error strobe
EvCntHStr | 1 | Input | EvCnt high strobe
EvCntLStr | 1 | Input | EvCnt low strobe
EvCntRes | 1 | Input | EvCnt reset
L1Accept | 1 | Input | L1 accept (L0AcceptLHCb)
Reset_b | 1 | Output | Chip reset
DOut | 8 | Input | Data bus
DQ | 4 | Input | Data qualifier for DOut
DOutStr | 1 | Input | Strobe for DOut
SubAddr | 8 | Input | Address bus
SinErrStr | 1 | Input | Single error strobe
TTCReady | 1 | Input | Ready signal
TTCSda | 1 | I/O | I2C data
TTCScl | 1 | I/O | I2C clock
Total | 56 | |

Table 15: TTC signals. All but the I2C bus signals are connected to the SyncLink FPGA.

Signal name | Use | I/O | Pins | Standard
FEMData<3:0> | SyncLink | In | 4 | 3.3V LVTTL
FEMDataValid | SyncLink | In | 1 | 3.3V LVTTL
FEMClk | Clock | Out | 1 | 3.3V LVTTL
FEMRst | Reset | Out | 1 | 3.3V LVTTL
FEML0Accept | Trigger | Out | 1 | 3.3V LVTTL
FEMFifoFull | Status | In | 1 | 3.3V LVTTL
backplane.

Summary: the power supplies needed from the crate are:

- 8 A D3.3V
- 2 A D5V
- 1.5 A D24V
- DGND
- 2.5 A A+5V
- 1 A A-5V
- AGND

For a crate with 21 boards, an estimation of the power according to these requirements results in 1900 W.

15.1 Power backplane

The power in the crate is distributed over a custom power backplane, which is available from Wiener. It uses the standard 96-pin VME connector and is fixed at the upper 3U in the J1 position. Only five of the 8 possible power rails are used; for the pin-out see appendix C.

15.2 Power supply fuses

Each individual power supply on the board is protected against over-current by a fuse, as recommended by TIS.

Estimated currents per component (table 9): 2 x O12-RxCard: 1.2 A, 0.6 A (A2V5 derived from A5V); 4 x A-RxCard: 3.9 A, 2.4 A, 1 A (option); 4 x PP-FPGA: 4 A, 3 A, 1.5 A (power calc); 12 x DDR SDRAM: 1.4 A; 4 x DDR termination: 1.2 A; 1 x QDR SRAM: 0.4 A; 1 x QDR termination: 1 A; 1 x SyncLink FPGA: 1 A, 0.7 A, 0.7 A (power calc); 1 x EEPROM: ~0 A (used for FPGA configuration only); 1 x TTCrx: 100 mA; 1 x optical receiver: 9 mA; 1 x FEM (Beetle): 100 mA (estimate); 1 x RO-TxCard: 2.3 A, 1.5 A; 1 x CC-PC: 0.5 A, 0.5 A (estimate); 1 x Glue Card: 0.2 A (estimate).

An efficiency of 85% for the 1.5 V and 2.5 V supplies is
the O-RxCards generates the clock, the data valid and an error signal, which are used to synchronize the data at the input of the PP-FPGA. The multiplexed data is de-multiplexed and written to an input FIFO to allow the change of clock domain for the following processing stages. Detectors that do not assert the data valid signal at the transmitter only for valid data have to provide an additional synchronization mechanism; this is the case for the ST, where it is done with the FEM. All other sub-detectors assert the correct data valid signal at the transmitter.

6.4 Event synchronization for the Velo

The data valid signal available on the Velo FE chip (Beetle) is not transmitted to the read-out electronics over the analog links. The principle used to select the valid data from the continuously sampled signals is based on the data valid signal regenerated by the local reference Beetle on the FEM. The data further has to be synchronized to a common clock domain, which is done by the use of FIFOs on the input stage. All further data processing on the PP-FPGA is done with a multiple of clk_40. The synchronization is illustrated in figure 6 (bottom). In addition to the pipeline column number (PCN), the Beetle also sends fixed start and stop bits, which, in addition to the data enable signal of the front-end emulator, can be used for synchronization. After the valid data is identified, the header words representing the pipeline column number (8-bit PCN) can be verified.
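As an illustration of this event-synchronization check, the sketch below compares the PCN decoded from the data header against the PCN expected from the FEM reference. It is a hedged sketch only: the function name, the error counter and the pipeline depth constant are assumptions, and the real check runs in FPGA firmware, not software.

```python
# Hedged sketch: compare the 8-bit PCN from a Beetle data header with the
# PCN regenerated by the FEM reference. Names and constants are illustrative.

PCN_DEPTH = 187   # assumed Beetle pipeline depth, used only to wrap the counter

def check_pcn(header_pcn: int, fem_pcn: int, error_counter: int = 0):
    """Return (sync_ok, updated_error_counter)."""
    sync_ok = (header_pcn % PCN_DEPTH) == (fem_pcn % PCN_DEPTH)
    if not sync_ok:
        error_counter += 1      # would be exposed as an ECS status register
    return sync_ok, error_counter

ok, errors = check_pcn(header_pcn=42, fem_pcn=42)
print(ok, errors)   # -> True 0
```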
the availability of relatively low-cost optical links, the increase of the L1 buffer latency and the possible contribution to the L1 trigger, all sub-detectors except the Velo have chosen to use optical transmission of the L0-accepted data to the L1 buffer board described in this document. The transmission scheme for the Velo has been decided to be based on analog electrical links. Driven by the use of the same read-out chip (Velo, ST, TT and Veto), the development of a common read-out board started already in the early prototyping phase. To cope with the two different link systems used, optical and analog, the receiver part of the board is implemented as mezzanine cards, where for the Velo the receiver side digitizes the analog signals and for the optical links the data serialized with the CERN Gigabit Optical Transmitter (GOL) is de-serialized with the TLK2501 high-speed transceiver from Texas Instruments. The common interface on the receiver side opens the possibility to use the same board for all sub-detectors.

For synchronization, L1 buffering, L1T zero suppression and HLT zero suppression, several large FPGAs are employed on the board, allowing the adaptation to sub-detector specific data processing. Even though the zero suppression has to be developed specifically for each sub-detector, a framework for the FPGA firmware development as well as a common test environment including all interfaces of the board can be developed, sharing resources and
detailed description of the access will be documented in the development phase of the FPGA code. For the RO-TxCard, LAD[15:0], LA[12:2] and two chip select signals are connected.

9 Resets

A general reset signal, which issues a reset of all registers and FIFOs on the FPGAs, is distributed on the card. It can be issued with a push button, with a GPIO of the ECS, and also with an ECS access to the SyncLink FPGA. In addition, three dedicated resets are distributed from the SyncLink FPGA to the PP-FPGAs for defined reset procedures, as specified for the L0 front-end and L1 front-end resets. All resets concerning the event identification only need to be available on the SyncLink FPGA. For the power-on reset of the CC-PC, a power supervisor circuit holds it in reset during 200 ms after power-good is indicated.

10 Clock distribution and signal termination on the board

Special care has to be taken for the clock and fast signal distribution on the board. The typical rise/fall time for fast signals from and to the FPGAs and ASICs such as the TLK2501 is 1 ns. This leads to a maximal trace length of 2.4 cm that can be considered electrically short, applying the generally accepted 1/6 rule [10]. All longer signals have to be terminated in an appropriate fashion. The preferred termination scheme for LVTTL signals is to use point-to-point interconnects with source termination. The value of the series resistor depends on the driver's impedance and the trace impedance.
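The choice of the series resistor follows the usual source-termination relation R_series = Z0 - Z_driver. The short sketch below is illustrative only; the driver output impedance used in the example is an assumed value, not a figure from this specification.

```python
# Source (series) termination: the driver impedance plus the series resistor
# should roughly match the trace impedance Z0.
def series_termination(z_trace_ohm: float, z_driver_ohm: float) -> float:
    """Return the series resistor value for source termination."""
    return max(0.0, z_trace_ohm - z_driver_ohm)

# 50 Ohm traces (as chosen for this board) and an assumed ~17 Ohm LVTTL driver
# output impedance give a value close to the 33 Ohm resistors used on this PCB.
print(series_termination(50.0, 17.0))   # -> 33.0
```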
passed to the HLT without any changes or aggregations.

Figure 5: In general, the data format after synchronization consists of 4 header words and 32 data words. The first two header words are used for the event identification with L0 EvCnt and BCnt. The third and fourth words are user defined, i.e. specific to each sub-detector. All user-defined words are stored in the L1B and passed to the HLT readout. As an example, the definition for the Velo is given. (R = reserved.)

6.2 Data synchronization for the Velo

The analog signal transmission over 40 m twisted-pair copper links suffers from a skew among channels on the same cable of order 5 ns, which has to be compensated by using channel-individual phase-adjustable clocks for sampling the signals. The phase-adjustable clocks are generated with a small design block on the FPGA, using a PLL to generate a fast clock and shift registers for clock division. This allows 16 phase-shifted clock signals to be generated, from which each ADC clock can be chosen.

Figure 6: Synchronization of the input data for the optical links (upper part) and for the analog links (lower part). Shown are the Brcst, SyncAck, TTCrxRst and reset signals, and the EvCnt/BCnt (and PCN) multiplexed between the SyncLink FPGA and the PP-FPGAs.

6.3 Data synchronization for the O-RxCards

The TLK2501 SERDES chip used on
refer to the TTCrx user manual [9]. The following synchronization tasks are implemented on the SyncLink FPGA using the TTCrx signals:

TTCrx reset: All resets on the board are distributed from the SyncLink FPGA, see the section on resets.

TTCrx status: DbErrStr and SinErrStr are counted with saturating 16-bit counters and are accessible in the ECS registers TTCErrCntReg. TTCReady is accessible in BoardStatReg.

Clock: Clock40Des1 is used for the board-wide 40 MHz clock, called clk_40. The PLL-based clock management circuit on the SyncLink FPGA allows the system clock to be distributed to all necessary locations without external clock buffers. Clock40 and Clock40Des2 are also connected to the SyncLink FPGA but are not used yet.

BCnt: The bunch counter is available on the BCnt bus while BCntStr is high, synchronized to Clock40Des1 and reset by BCntRes. Since the BCnt bus becomes erroneous for L0/L1 accepts spaced less than 3 clock cycles apart, a counter on the SyncLink FPGA is implemented to make the BCnt always available. The bunch counter is transmitted to the PP-FPGAs via the 6-bit wide SyncData bus and therefore has to be multiplexed over two clock cycles.

EvCnt: The low part (12 bit) of the event counter is available on the BCnt bus while EvCntLStr is high, and the high part while EvCntHStr is high. The BCnt signals are synchronized to Clock40Des1 and reset by EvCntRes. Since the EvCnt bus becomes erroneous for L0/L1 accepts spaced less than
- I2C for the TTCrx: TTCSda, TTCScl. The serial EPROM for the board identification is also connected to this bus.
- I2C for the front-end emulator (Beetle chip): FEMSda, FEMScl.
- I2C FPGA: all FPGAs are connected to I2C for debugging purposes: FPGASda, FPGAScl.

8.3 Parallel local bus

The local bus generated by the PLX9030 PCI bridge provides a simple parallel bus. Three chip select signals are made available via the Glue Card. The chip selects are used in the following way, see figure 11.

Figure 11: Overview of the 3 local parallel chip selects (nCS) and their address spaces (nCS1: SyncLink FPGA and PP-FPGAs in h'0000000-h'7FFFFFF, L1B swapped region h'8000000-h'FFFFFFF; nCS2 and nCS3: h'000-h'FFF each). The local parallel bus is used in multiplexed mode with 28-bit addresses and 32-bit data. For the RO-TxCard, part of the non-multiplexed address bus is connected to obtain a maximal number of supported configuration schemes for the MAC chip on the mezzanine card.

- nCS1 is shared among the SyncLink FPGA and the PP-FPGAs to access registers, FIFOs, RAM, the L1B and the MEP output buffers.
- nCS2 is used for the local bus to the RO-Tx.
- nCS3 is reserved for a second device on the RO-Tx card.

The parallel local bus is used to access the resources controlled by the FPGAs, including
LHCb 2003-007
IPHE 2003-02
October 10, 2003

TELL1
Specification for a common read out board for LHCb

Guido Haefeli (a), Aurelio Bay (a), Federica Legger (a), Laurent Locatelli (a), Jorgen Christiansen (b), Dirk Wiedner (c)

(a) Institut de Physique des Hautes Energies, Universite de Lausanne
(b) CERN, Geneva
(c) Physikalisches Institut, University of Heidelberg

Abstract

This document specifies the TELL1 readout board used by essentially all sub-detectors in LHCb. It defines the interface to the optical or analog front-end receiver cards specific to each sub-detector, as well as the data synchronization, the buffering and the interface to the L1 trigger and the higher level trigger. The FPGA-based board interfaces to standard Gigabit Ethernet network equipment, providing up to four Gigabit Ethernet links. TELL1 accepts 24 optical links running at 1.6 GHz and provides, for the analogue option, 64 10-bit ADC channels sampling at 40 MHz.

E-mail: Guido Haefeli (iphe.unil.ch), Aurelio Bay (iphe.unil.ch), Jorgen Christiansen (cern.ch), Dirk Wiedner (physi.uni-heidelberg.de)

Contents

1 Introduction 1
2 Shortcuts 2
3 Requirements 2
4 Overview of the data flow on the board 4
5 Receiver cards 9
5.1 A-RxCard 9
5.2 O-RxCard 9
6 PP-FPGA 10
6.1 Data synchronization and event synchronization 10
6.2 Data synchronization for the Velo 10
size of the FPGA, a detailed count of the I/O is listed in table 10. The number of data signals plus the I/O pins used for the reference voltage of the SSTL-2 and for the reference impedance of the source termination are also included in this calculation. The calculated number of I/Os is supported by several packages and devices of the Altera Stratix FPGAs. To allow migration between different devices, the necessary number of I/Os has to be available on all desired devices.

Altera Stratix device | 672-pin FineLine BGA | 780-pin FineLine BGA | Comment
EP1S10 | 341 | 422 | not enough I/O
EP1S20 | 422 | 582 |
EP1S25 | 469 | 593 |
EP1S30 | | 593 |

Table 3: The 780-pin FBGA package allows migration between several devices. The low-cost Stratix devices, called Cyclone, are only produced with a maximum I/O count of 301 and are not suitable for this application, see table 10.

In table 4 an overview of the estimated resources is given for the implementation of the so-called LCMS algorithm (linear common mode suppression), also described in [1]. The implementation has been optimized for the Altera APEX20K FPGA architecture but also allows the resources used in another FPGA to be estimated. Using the Altera Stratix FPGA devices allows the MAC (multiply-accumulate) operations to be implemented with the embedded DSP blocks. This significantly reduces the number of LEs (logic elements) used for the design.
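To make the kind of processing behind these resource estimates concrete, the following NumPy sketch shows a strongly simplified zero-suppression chain: pedestal subtraction, faulty-channel masking, a linear common-mode correction and a fixed-threshold hit detection. It is not the LCMS implementation of [1]; the thresholds, array sizes and random test data are assumptions for illustration only.

```python
import numpy as np

# Simplified, illustrative zero-suppression chain for one 32-channel link.
# The real LCMS algorithm and its fixed-point FPGA implementation are
# described in [1]; all values below are assumptions.

N_CH = 32
HIT_THRESHOLD = 12.0      # assumed ADC counts above baseline

def zero_suppress(raw, pedestal, faulty_mask):
    """raw, pedestal: arrays of N_CH ADC values; faulty_mask: True = bad channel."""
    sig = raw.astype(float) - pedestal          # pedestal subtraction
    sig[faulty_mask] = 0.0                      # faulty channel masking
    # linear common-mode suppression: fit a straight line through the
    # channels and subtract it
    ch = np.arange(N_CH)
    good = ~faulty_mask
    slope, offset = np.polyfit(ch[good], sig[good], deg=1)
    sig -= slope * ch + offset
    hits = np.flatnonzero(sig > HIT_THRESHOLD)  # hit detection
    return [(int(c), float(sig[c])) for c in hits]   # cluster encoding left out

rng = np.random.default_rng(0)
raw = rng.normal(512, 2, N_CH) + 3.0 * np.arange(N_CH) / N_CH   # common-mode slope
raw[7] += 40                                                     # one injected hit
pedestal = np.full(N_CH, 512.0)
print(zero_suppress(raw, pedestal, np.zeros(N_CH, dtype=bool)))
```

In the FPGA the multiplications of such a fit and of the pedestal follower map naturally onto the embedded multiply-accumulate (DSP) blocks, which is exactly why the Stratix estimate in table 4 needs far fewer logic elements than the APEX implementation.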
manpower.

2 Shortcuts

A-RxCard: Analog Receiver Card
O-RxCard: Optical Receiver Card
RxCard: Receiver Card, stands for A-RxCard and O-RxCard
PP-FPGA: Pre-Processor FPGA
SyncLink FPGA: Synchronization and Link FPGA
TELL1: Trigger ELectronics and L1 board, the board described in this note
FEM: Front End Emulator
L1B: L1 Buffer
L1T: L1 Trigger
L1A: L1 Accept
DAQ: Data acquisition for L1T and HLT trigger data
HLT: High Level Trigger
HLT ZSup: HLT zero suppression processing block
TTC: Timing and Trigger Control for LHC
TTCrx: TTC receiver chip
ECS: Experiment Control System
TLK2501: Texas Instruments SERDES chip
GOL: CERN implementation of a radiation-hard 1.6 Gbit/s serializer
RO-TxCard: Read Out Transmitter Card
DDR SDRAM: Double Data Rate Synchronous Dynamic RAM
SERDES: Serializer and de-serializer circuit
L1T ZSup: L1 trigger zero suppression
OSI: Open Systems Interconnect model
MAC: Medium Access Controller (Gigabit Ethernet terminology)
MEP: Multi Event Packet, term used for an aggregation of several events into one packet in order to achieve maximal performance on the Gigabit-Ethernet-based read-out network
PHY: Physical layer device (Gigabit Ethernet terminology)
POS-PHY Level 3 (PL3): Saturn-compatible Packet Over Sonet interface, level 3, used for 1 Gigabit Ethernet
GMII: Gigabit Medium Independent Interface, 8-bit parallel PHY interface

3 Requirements

TELL1 is used by several
more complicated access. The bandwidth of the memory is supposed to be sufficient to deal with more advanced read-out cycles.

(This macro cycle is only used to verify proper operation under worst-case conditions, which means it does not have to be coded in the FPGA.)

Task | Cycle count | Remark
Write CH0 | 5 x (2.5 + 36) |
Write CH1 | 5 x 36 |
Active | 5 x 3 | Open row, each event
Precharge | 5 x 3 | Close row, each event
Read CH0 | 6 + 2.5 + 36 | 5 times the L1 accept rate
Read CH1 | 36 | 5 times the L1 accept rate
Refresh | 10 | does not need to be done as frequently
Cycles available | 540 |
Cycles used | ~95% |

Table 2: SDRAM cycle count for one macro cycle of 5 events. This table shows that all memory accesses can be done during the time of 5 events. The consequence is that an input FIFO of 5+1 events is sufficient at the input to the buffer. Cycles for arbitration and address generation for the different accesses are included.

6.7 L1T zero suppression

Detailed studies of the implementation of the zero suppression for the L1T, called L1PPI, have been done in [1] to estimate the amount of logic gates and memory needed on the PP-FPGA. The processing foreseen for the Velo can be split into the following steps:

- pedestal subtraction
- faulty channel masking
- common mode suppression
- hit detection
- cluster encoding
- cluster encapsulation

6.8 I/O signals and resources

To determine the package
fragment link is 768 bit, or 48 16-bit words. This value is an upper limit to the necessary bandwidth for binary hit encoding.

7.2 HLT fragment link

To link and transfer the HLT fragments, a maximum time of 20 us is allowed (with a L1 accept rate of 40 kHz). The 16-bit wide links permit the event fragments to be transferred without the need for deep FIFO buffers. In figure 9 the event fragment format is given for both links. The link format is supposed to be the same for all detectors. The front-end chip specific header information is either part of the common header (marked user defined) or is implemented as data and will be transparent for the link.

Figure 9: Link format for L1T and HLT between the PP-FPGA and the SyncLink FPGA (L1TFragmentFormat and HLTFragmentFormat, 32 bit wide: user-defined header words A and B for channels 0-5, N = number of data clusters per fragment, N <= 128, R = reserved; the HLT format has a fixed length of 15 header and 6 x 32 data words). Note that the format is given on a 32-bit wide basis. This is done because the DDR interface for the L1T and HLT appears on the chip as a 32-bit interface.

7.3 Cluster formats for fragment links

As an example, the Velo cluster format for the L1T link is given in table 5. The HLT

Bit | Description
0 | Cluster size: 0 for clusters of one hit, 1 for two hits
<M:1> | Strip number
become the standard TTC receiver at LHC. With the connection of all signals of the TTCrx to the SyncLink FPGA, not only the short but also the long broadcast commands can be used. The final definition of the commands to be interpreted by the SyncLink FPGA is not yet fixed but will be available during the next few months.

7.8 L0 and L1 Throttle

To allow for feedback to the read-out supervisor, two separate throttles, one for L0 and one for L1 throttling, are generated and put on an RJ45 connector using LVDS signals. The throttle signals of a whole TELL1 board crate are OR-ed on a Throttle-OR module, which ORs the L0 and L1 throttle signals for the crate and generates an optical output (to avoid ground loops between crates). This module is placed in the TELL1 crate.

7.9 FEM for Beetle-based read-out

The FEM used by the sub-detectors with the Beetle FE chip [4] is controlled with I2C and interfaced to the SyncLink FPGA. Its task is to generate the DataValid signal, which is not transmitted with the detector data. In addition, the PCN is extracted to check the synchronization between the FEM and the data from the FE. The available status signals from the Beetle are also connected to the SyncLink FPGA and made available in a register for status monitoring. In table 16 in the appendix the signals of the FEM interface are given. The PCN is available on the FEMData bus and has to be sampled with respect to the FEMDataValid signal. The timing can be found in the Beetle
14.4 Lemo connectors 32
15 Power requirements 32
15.1 Power backplane 32
15.2 Power supply fuses 32
16 Physical aspects of the TELL1 34
16.1 Cut-outs 34
16.2 LEDs 36
17 FPGA implementation guidelines 36
17.1 Development 36
A I/O Tables 39
B Signal tables 41
C Pin-out for connectors on the board 45
D Dimensions 50

1 Introduction

This document describes an off-detector electronics acquisition readout board for LHCb called TELL1. It serves for the readout of optical or analogue data from the front-end electronics. The FPGA-based board is used for event synchronization, buffering during the trigger latency and pre-processing, including common mode correction and zero suppression. For the data acquisition the board interfaces to standard Gigabit Ethernet network equipment, providing up to four Gigabit Ethernet links. TELL1 accepts 24 optical links running at 1.6 GHz and provides, for the analogue option, 64 8-bit ADC channels sampling at 40 MHz.

Driven by the high cost of optical links and the fact that the data was not used for the L1 trigger decision, several sub-detectors of LHCb had planned to do the L1 buffering in the cavern, close to the detector. With
total worst case of 32 x 512 bytes = 16 KByte that has to be expected after a L0 throttle. The necessary buffering is provided by the internal MEP output buffer of 64 KByte, which allows 4 worst-case MEPs to be stored. This allows for the scenario where a worst-case event is framed and a worst-case MEP is assembled at the same time. The buffer is implemented on the write side as a 64-bit wide RAM, using one of the two on-chip large RAM blocks. The necessary bandwidth can be achieved with a data transfer frequency of 80 MHz on the write side. The read-side clock domain can be adapted to the RO-TxCard interface clock by using the RAM in dual-clock mode.

7.5 HLT data path on the SyncLink FPGA

In principle the same type of data path as for the L1T is used for the HLT. The zero suppression is custom to each sub-detector; in the case of the Velo it is a replication of the L1T zero suppression that can in addition be adapted to the requirements of the HLT (e.g. special thresholds). The MEP buffering for the HLT makes an external memory necessary, as can be seen from the calculated necessary buffer depth: using a worst-case event size of 4 KByte and a multi-event packing factor of 16, a 64 KByte MEP results. As already for the L1T, a minimum of 2 MEPs needs to be stored in the buffer, which leads to a buffer size of at least 128 KByte. The chosen memory is a high-bandwidth dual-port memory of the type Quad Data Rate (QDR), which allows simultaneous read and write operations.
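The HLT buffer-depth argument above is plain arithmetic; the short sketch below just makes it explicit. Only the worst-case event size and the packing factor come from the text; the helper function itself is an illustrative assumption.

```python
# Worst-case MEP sizing for the HLT path, using the figures quoted above.
def mep_size_bytes(worst_event_bytes: int, packing_factor: int) -> int:
    return worst_event_bytes * packing_factor

worst_event = 4 * 1024                                    # 4 KByte worst-case fragment
mep = mep_size_bytes(worst_event, packing_factor=16)      # -> 64 KByte per MEP
buffer_needed = 2 * mep                                   # at least 2 MEPs in flight
print(mep // 1024, "KByte per MEP;", buffer_needed // 1024, "KByte buffer minimum")
```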
overview of the building blocks on the board. Independent receiver mezzanine cards (A-RxCard or O-RxCard) can be plugged onto the motherboard. The signals of the receiver cards are directly connected to the PP-FPGAs, which are the main processing units on the board, and each PP-FPGA is connected to its own independent L1 buffer. After the zero suppression for the level 1 trigger (L1T) and the HLT, the data is linked and encapsulated on the SyncLink FPGA. The same FPGA is also used to process the TTCrx, ECS and FE emulator information, to issue resets, to synchronize the 4 pre-processor FPGAs (PP-FPGA) and to distribute the clocks and the L1T decisions. Event data is sent to the event building network of the DAQ system via the read-out transmitter (RO-Tx).

(Two receiver card types are foreseen, one for the Velo using the analog electrical readout and the other for the optical readout. The analog receiver part of the board is split into 4 mezzanine cards, the optical receiver part into 2.)

A more detailed representation of the data path from the RxCard to the PP-FPGA is given in figure 2 for the Velo with the analog receiver card and in figure 3 for the optical read-out. To reduce the number

Figure 2 (block diagram): analog electrical link from the FE, A-RxCard, DDR SDRAM (3 x 256 Mbit x4 chips, 120 MHz, 48-bit data, 13-bit address), PP-FPGA with 16:1 data rate reduction, synchronization at 160 MHz, BCnt/EvCnt information for accepted events.
path is one of the critical designs on the chip.

Macro cycle: To ensure the correct operation of the buffer under all possible read and write scenarios, a macro cycle of 5 events (5 x 900 ns = 4.5 µs) is analyzed. Within one macro cycle all necessary transactions to and from the buffer can be performed. In other words, 5 events are written to the buffer, one is read and one refresh can be performed in 4.5 µs. This principle is illustrated in figure 8.

Simple addressing: The start address of each event block in the memory is defined by the 24-bit event counter (L0 EvCnt). The address length required to address 64k events is 16 bit only. Addressing in SDRAMs is always performed in two steps: in a so-called ACTIVE cycle the row address is issued on the address lines (e.g. 13 bit), and during the actual read or write cycle the column and bank address is applied (9 bit + 2 bit). This allows addressing the whole 256 Mbit SDRAM chip and allows upgrading to 512 Mbit chips without any hardware changes, by using 10 bit + 2 bit for the column address as foreseen in the migration path of these chips. The memory arbitration has been implemented on a FPGA using the DDR SDRAM core from Altera. The RTL simulation is in good agreement with the calculation done in table 2.

Even though this is a very convenient way to use the memory space, it leaves half the memory unused. In a more advanced state of the board the memory access can be redefined, with the drawback of a
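The two-step SDRAM addressing just described can be illustrated with a small sketch. The field widths follow the text; how the L0 event counter is folded into the flat address is design-specific and not spelled out here, so the example call uses an arbitrary address.

```python
# Illustrative two-step SDRAM addressing: an ACTIVE command carries the row
# address, the following READ or WRITE command carries bank and column.
# Field widths as quoted above: 13-bit row, 2-bit bank, 9-bit column for the
# 256 Mbit parts (a 10-bit column would cover the 512 Mbit migration path).

ROW_BITS, BANK_BITS, COL_BITS = 13, 2, 9

def split_address(flat_address: int):
    col = flat_address & ((1 << COL_BITS) - 1)
    bank = (flat_address >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (flat_address >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col

print(split_address(0x0ABCDE))   # -> (row, bank, column) for an arbitrary flat address
```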
r need to be used to choose from the two options. The header and data format received by this card depends on the sub-detector. We can nevertheless assume that all sub-detectors use 4 header words followed by 32 data words.

6 PP FPGA

With a long list of tasks, this FPGA demands a high amount of resources.

6.1 Data synchronization and event synchronization

For a better understanding of the synchronization mechanism on the board it is useful to distinguish between data and event synchronization. In this context, data synchronization comprises the generation of the sampling clock of the ADC for the A RxCard, selecting the valid data out of the continuously sampled analog signals, and changing the clock domain on the input stage of the FPGA to the on-chip common clock domain. For the optical receiver card the data synchronization is given by the interface of the deserializer. The event synchronization is a second step and performs consistency checks between the transmitted event identification in the header and the local reference; a sketch of such a consistency check is given below. This separation can be understood as the Physical layer and Data Link layer of the OSI model. The data format after data and event synchronization is given in figure 5.

Figure 5: (general synchronized data format and Velo synchronized data format: 32-bit words, user-defined header words A and B for CH 0, 36 words per event; R = Reserved, NU = Not used)

The two user-defined header words are pass
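A minimal sketch of the event-synchronization consistency check mentioned above, assuming the header carries a bunch counter and an event counter that are compared against the local TTC-derived reference; the field names, counter widths and return convention are illustrative, not the board's actual implementation.

```python
# Minimal sketch of an event-synchronization consistency check (illustrative):
# the counters decoded from the data header are compared with the local
# reference counters derived from the TTC broadcasts.

from dataclasses import dataclass

@dataclass
class EventHeader:           # hypothetical header fields after data synchronization
    bcnt: int                # bunch counter transmitted by the front-end
    evcnt: int               # event counter transmitted by the front-end

def check_event_sync(header: EventHeader, local_bcnt: int, local_evcnt: int) -> bool:
    """Return True if the transmitted identifiers match the local reference."""
    bcnt_ok = header.bcnt == (local_bcnt & 0xFFF)        # 12-bit bunch counter assumed
    evcnt_ok = header.evcnt == (local_evcnt & 0xFFFFFF)  # 24-bit event counter assumed
    return bcnt_ok and evcnt_ok

# Example: a mismatch would be flagged to the error/monitoring logic.
hdr = EventHeader(bcnt=0x5A3, evcnt=0x000124)
print(check_event_sync(hdr, local_bcnt=0x5A3, local_evcnt=0x000124))  # True
```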
rom two physical channels a data stream of 32 bit wide at 40 MHz is multiplexed into one memory. This is taken into account by having read and write access to CH0 and CH1. The clock cycle count shows that a sufficient margin on the bandwidth can be achieved using the chosen memory configuration. The clock cycles used for arbitration are not included.

To predict the behavior of the L1 buffer and fix the size of the input buffer, the following precautions have been taken:

Double memory space: The actual memory size of the SDRAM is twice the minimal size specified by the L1 front-end electronics requirements. The memory space is used such that the data of each event (2 input channels x 36 words x 32 bit) can be placed completely in one column of the memory. This can only be ensured by reserving nearly twice the space, 2 x 64 x 32 bit = 512 Byte, per event. Some of the unused words can be dedicated to integrity checks in an advanced state of the project. From the point of view of memory size it is also possible to attach the L1T pre-processing information to the data stored in the L1B. Nevertheless, the feasibility of this has to be verified during the development of the FPGA firmware, since the L1B data

Footnote 10: With a clock frequency of 120 MHz the data transfer to the memory is at 240 MHz.

Figure 8: Example of how the arbiter schedules all required transactions of the memory (Write, Read, Refresh) during one macro cycle of 4.5 µs (5 event cycles of 900 ns).
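A toy model of the figure 8 schedule is sketched below, assuming nothing beyond the stated budget of five event writes, one event read and one refresh per 4.5 µs macro cycle; which event is read, and the ordering of the slots within the cycle, are arbitrary choices here.

```python
# Toy model of the L1B arbiter macro cycle (illustrative only): within one
# macro cycle of 5 x 900 ns = 4.5 us, five events are written, one event is
# read and one refresh is issued.

MACRO_CYCLE_EVENTS = 5
EVENT_PERIOD_NS = 900

def macro_cycle_schedule(cycle_index: int):
    """Return the list of transactions performed in one macro cycle."""
    base_event = cycle_index * MACRO_CYCLE_EVENTS
    schedule = [("write", base_event + i) for i in range(MACRO_CYCLE_EVENTS)]
    schedule.append(("read", base_event))    # which event is read is arbitrary here
    schedule.append(("refresh", None))
    return schedule

for op, ev in macro_cycle_schedule(0):
    print(op, ev)
print("macro cycle length:", MACRO_CYCLE_EVENTS * EVENT_PERIOD_NS, "ns")  # 4500 ns
```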
rt FIFOs is preferable. The development of the FPGA code (firmware) can be divided into one part common to all sub-detectors and another part with sub-detector-specific firmware. In figures 15 and 16 the blocks that are specific to sub-detectors, and therefore not part of the commonly developed framework, are indicated with the dashed boxes. In the SyncLink FPGA only the DAQ zero suppression is significantly different for each sub-detector. The event data collection, multi-event packaging and Ethernet framing, including the PL3 interface to the RO TxCard, will be kept identical for all users.

Figure 15: Sub-detector specific blocks on the PP FPGA are indicated with the dashed boxes. (Figure content: half of a 12-way optical receiver card at 40 MHz, PP FPGA with data rate up to 160 MHz, DDR SDRAM of 3 x 256 Mbit x4 chips at 120 MHz with 48-bit data and 13-bit address; synchronization information: BCnt, EvCnt, accepted events.)

17.1 Development

Already in the early debugging phase a common framework needs to be developed that includes all firmware for a specific detector. In the next phase the common frame

Figure 16: (SyncLink FPGA data path: QDR_Clk, Clk, BCnt, EvCnt, SyncData and SyncAck broadcast to the PP FPGAs; shared data paths for the 2-channel RO TxCard at 100 MHz; QDR SRAM at 100 MHz; connections to the RO Tx and from the PP FPGAs.)
the notion of Cluster and Event Fragment in this context, a definition shall be given:

Cluster: A cluster is formed when one or multiple neighboring detector channels carry a signal. The proposed cluster size for the L1T is one 16-bit word. The cluster size for the HLT is variable, depending on the number of hits in the cluster, but is transmitted in 16-bit words.

Event fragment: All clusters for one event on one PP FPGA are called an event fragment.

With a transfer rate of 160 MHz and a cluster size of 16 bit (see table 5), the data transfer has to be restricted to 128 clusters (256 bytes plus header) per event, leaving a margin of 8 cycles for start, stop and verification of the transfer (footnote 14). Additional hits or clusters have to be discarded to allow the linking to be performed with a fixed latency of 900 ns; a sketch of this truncation is given below. Fixing the event linking is only one solution to prevent buffer overflows on the PP FPGAs: a large link de-randomizer buffer can be used such that the L0 throttle signal is set in case the buffer fill state exceeds a certain level. Fragments with discarded clusters are flagged as such in the ErrorFlag word of the event. This way of truncation can be reproduced in the HLT and is not dependent on the fill state of some buffers caused by previous events. The corresponding bandwidth of one of the four links is 16 bit x 160 MHz = 2.5 Gbit/s, which corresponds to the full bandwidth of the RO TxCard.

Velo: To find the most appropriate cl
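The cluster-count truncation described above can be sketched as follows; the limit of 128 clusters per PP FPGA and the ErrorFlag marking are taken from the text, while the data structures and the flag bit position are placeholders.

```python
# Sketch of the per-event cluster truncation on a PP FPGA (illustrative):
# at most 128 clusters (16-bit words) are forwarded so that the fragment can
# be linked within the fixed 900 ns budget; excess clusters are discarded and
# the fragment is marked in its ErrorFlag word.

MAX_CLUSTERS_PER_PP_FPGA = 128
TRUNCATED_BIT = 0x0001        # hypothetical bit position in the ErrorFlag word

def build_fragment(clusters: list[int], error_flag: int = 0):
    if len(clusters) > MAX_CLUSTERS_PER_PP_FPGA:
        clusters = clusters[:MAX_CLUSTERS_PER_PP_FPGA]
        error_flag |= TRUNCATED_BIT          # flag that clusters were discarded
    return clusters, error_flag

kept, flag = build_fragment(list(range(200)))
print(len(kept), hex(flag))   # 128 0x1
```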
tions. With a bus width of 16 bit at double data rate and a 100 MHz clock frequency, a bandwidth of 3.2 Gbit/s is reached. The chosen QDR memory chip provides a depth of 1 Mbyte and is the smallest available.

7.6 RO TxCard interface

The interface is defined as two independent POS-PHY Level 3 (PL3) interfaces. This allows for a maximal data transfer rate of 2 x 2.4 Gbit/s to the mezzanine card [6]. The PL3-compliant interface is used in 32 bit @ 100 MHz mode. Firmware on the SyncLink FPGA can either be developed specifically for our application or use the PL3-to-Atlantic interface FPGA core from Altera. This FIFO-like interface facilitates the implementation, since registers can be inserted to improve the timing of the design.

7.7 TTCrx interface

The TTC receiver chip synchronization signals are connected to the SyncLink FPGA (table 15). The distribution of clock, trigger and event synchronization signals is done with point-to-point links to each PP FPGA. The clocks can be individually phase adjusted to ensure the correct clock phase between the FPGAs on the board. The configuration registers can be loaded over an ECS-driven I2C bus. For production testing the JTAG boundary scan interface is connected to the overall JTAG chain. The use of a TTC configuration EEPROM is not foreseen; its configuration registers have to be loaded at each power-up. The TTCrx is directly mounted on the board to reduce cost and board space.

Footnote 16: For further documentation r
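Returning to the bandwidth figures quoted above for the QDR memory interface and, in the previous subsection, for the fragment links: they follow directly from bus width, transfers per clock and clock frequency, as the small sketch below restates (nothing beyond the numbers in the text is implied).

```python
# Link bandwidth arithmetic quoted in the text (illustrative cross-check).

def bandwidth_gbit_s(bus_width_bits: int, clock_mhz: float,
                     transfers_per_clock: int = 1) -> float:
    return bus_width_bits * clock_mhz * transfers_per_clock / 1000.0

# QDR SRAM interface: 16 bit, double data rate, 100 MHz -> 3.2 Gbit/s
print(bandwidth_gbit_s(16, 100, transfers_per_clock=2))   # 3.2

# PP FPGA to SyncLink fragment link: 16 bit at 160 MHz -> ~2.5 Gbit/s
print(bandwidth_gbit_s(16, 160))                          # 2.56
```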
tle user manual [4]. The FEMData bits have to be re-ordered and are multiplexed and distributed to the PP FPGAs over the SyncData bus. The generated strobe signal SyncPCNStr is used by the PP FPGA to latch the data on the SyncData bus. The PCN is transmitted over the SyncData bus to the PP FPGAs and has to be multiplexed on the SyncLink FPGA.

Footnote: http://literature.agilent.com/litweb/pdf/5988-2576EN.pdf
Footnote: http://www.truelight.com.tw/datalist/TRR/TRR-1B43-000.pdf
Footnote: It is not clear yet if the two receivers can be placed on the board with two soldering options.

7.10 I/O signals and resources

The functionality foreseen at present to be implemented is not very well known in terms of resource usage. For the Velo, two or four channels of HLT zero suppression are needed. This will take about the same amount of resources as the L1T zero suppression on the PP FPGA. An estimation is given in table 7.

Altera Stratix device   780-pin FineLine BGA   1020-pin FineLine BGA   Comment
EP1S20                  582                    n/a
EP1S25                  593                    706
EP1S30                  593                    726

Table 6: The 1020-pin FBGA package allows migration between several devices (see table 11).

Functional block          LE     Block memory   Block memory   Block memory   DSP blocks   PLL
                                 512 bit        4k             4k x 144
L1T fragment link         1000   0              16             0              0            1
L1T MEP processing        2000   0              0              1              0            0
L1T location conversion   1000   0              0              1              0            0
HLT fr
two I2C chains are used. The definition of the header bits and the data format can be found in [4]. The structure of the events sent is 4 header words and 32 data words. Since the header words are decoded pseudo-digitally, the header can be reduced to 16 bit.

5.2 O RxCard

With the use of the TLK2501 de-serializer from Texas Instruments, the data transmitted with the GOL transmitter is de-serialized to a 16-bit wide multiplexed bus signal clocked at 80 MHz. Two control signals, the data valid and an error signal, are available to synchronize and verify the proper operation of the receiver. Each optical channel de-serialized is accompanied by its clock generated by the TLK2501, which means that the PP FPGA is the clock receiver. All signals use the 2.5 V LVTTL standard. To cope with the two different I/O voltages on the two different receiver cards, power jumpe

Footnote: For the digital signal connectors a 200-pin, 0.643 mm pitch connector has been chosen, see http://www.samtec.com/ftppub/pdf/QTS.PDF and http://www.samtec.com/ftppub/pdf/QSS.PDF.
Footnote: Using a 10-bit ADC is optional for the A RxCard and therefore needs to be supported. The further processing will be done with 8-bit resolution.
Footnote 8: Each bit is transferred as an analog low or high value. The analog signal is sampled with 8-bit ADCs and a threshold is applied.
Footnote: Since in several applications the data valid signal is constantly asserted (e.g. ST), the data valid signal cannot be used for event synchronization.
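A minimal sketch of how the two TLK2501 status signals could be used to verify the proper operation of a received channel; the signal names and the health-check convention are invented for illustration, and (as noted in the footnote above) the data-valid flag is not necessarily usable for event synchronization.

```python
# Illustrative per-channel check of the TLK2501 status signals: the data-valid
# and error flags are only used here to monitor the health of the link, not
# for event synchronization.

def link_ok(rx_dv: bool, rx_er: bool) -> bool:
    """Hypothetical health check for one de-serialized optical channel."""
    return rx_dv and not rx_er

channels = [(True, False), (True, True), (False, False)]   # (rx_dv, rx_er) samples
status = [link_ok(dv, er) for dv, er in channels]
print(status)   # [True, False, False]
```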
tzerland.
Howard W. Johnson, Martin Graham, High-Speed Digital Design: A Handbook of Black Magic.
LHCb 2001-046.

A I/O Tables

Signals     Purpose                            I/O standard
16x11       RxCard                             3.3V / 2.5V LVTTL
28 13 48    DDR SDRAM 16 bit                   2.5V SSTL_2
32 6        ECS                                3.3V LVTTL
8           Swap Page                          3.3V LVTTL
2           Throttle
1 1         L1A EvID                           LVTTL
16 4        PP HLT link                        1.5V HSTL
16 2        PP L1T link                        1.5V HSTL
6 2         Event synchronization
2           Clock
2           Processing mode
2           L1T processing sync
2           HLT processing sync
1           Initialization done
4           Resets
4           GPIO to SyncLink FPGA              LVTTL
38          Analyzer interface                 TP
2           ECS I2C                            3.3V LVTTL
3           Device address
6           Reference voltages 1.25V
2           Reference voltages 0.75V
8x2         Termination resistor reference R
439         Total

Table 10: The number of I/Os used for the PP FPGA with the proposed partitioning of the board with 4 PP FPGAs. The high pin count makes the use of low-cost FPGAs, which are only available in smaller packages, impossible.

Signals     Purpose                            I/O standard
4x 16 4     HLT link interface                 1.5V HSTL
4x 16 2     L1T link interface                 1.5V HSTL
4x 6 2      SyncData link to PP FPGAs
2x 32 18    To RO Tx POS-PHY L3                LVTTL
32 4 9      ECS                                3.3V LVTTL
8           Swap Page                          3.3V LVTTL
12          FEM interface                      3.3V LVTTL
54          TTCrx interface                    3.3V LVTTL
4x2         Throttle in
2           Throttle out                       3.3V LVTTL
umber     Unique strip number per board
13        Second threshold: if one of the hits in the cluster exceeded the second threshold level, this bit is set
<15:14>   Unused
Total     1 word

Table 5: Velo cluster format for the L1 trigger. Remark that the cluster size is fixed to 16 bit.

event fragments do have a more complicated structure and shall be defined during the implementation phase. It is assumed that this does not affect the hardware implementation of the board.

7.4 L1T data path on the SyncLink FPGA

In figure 4 the data path is illustrated. Special care has to be taken to avoid buffer overflows caused by exceptionally big events. For the L1T data path this can be handled in the following way: each PP FPGA applies a cut on the maximal number of clusters per event, so the data transfer can be accomplished within 900 ns. In a last linking stage on the SyncLink FPGA the maximum number of clusters per event is restricted again, to 256 clusters per board. This leads to a worst-case event size of 512 bytes plus header. Since up to this point a data-push architecture has been assumed, the necessary buffering to prevent buffer overflows due to the limited bandwidth of the RO TxCard needs to be done in the next buffering stage. With the assumption that, after the L0 throttle has been raised due to a high buffer level, 24 events from the L0 de-randomizer and link system and 8 events from the zero suppression have to be managed. This leads to a t
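Based on the bit assignments visible in table 5, a cluster word could be packed as sketched below; only bit 13 (second threshold) and bits 15:14 (unused) are legible here, so the width and position of the strip-number field (and any further fields such as a cluster-size bit) are assumptions.

```python
# Illustrative packing of a 16-bit Velo L1T cluster word. Known from table 5:
# bit 13 flags that a hit exceeded the second threshold, bits 15:14 are unused.
# The strip-number field below (assumed to occupy bits 12:0) is a placeholder.

SECOND_THRESHOLD_BIT = 13

def pack_cluster(strip_number: int, second_threshold: bool) -> int:
    word = strip_number & 0x1FFF                 # assumed 13-bit strip-number field
    if second_threshold:
        word |= 1 << SECOND_THRESHOLD_BIT
    return word                                  # bits 15:14 left at zero (unused)

print(hex(pack_cluster(strip_number=1234, second_threshold=True)))   # 0x24d2
```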
urces for the data processing.
- The synchronization of the sampled data needs a local front-end emulator to generate a data valid signal, since no data valid signal is transmitted along with the data.
- 64 individually phase-adjustable clocks need to be generated for the ADCs.
- The board needs to provide the connectivity for 24 de-serialized optical links running at 1.6 GHz, carrying the information of 24 x 128 strips sampled with 8 bit in total. The data is transferred on multiplexed 16-bit wide data buses running at 80 MHz. In addition, at least the receiver clock, the data valid and the error control signals must be connected.
- With 24 optical links the L1 buffer (L1B) needs to be designed in size and bandwidth for this data input, which is 50% more than for the Velo.
- TT must provide information to the L1 trigger.
- With 24 optical links the same requirements as for the ST apply for the OT. The high occupancy of this detector imposes a high bandwidth on the whole readout path.

Others
- The sub-detectors not mentioned have optical interfaces identical to the ST and OT and do not demand higher bandwidth, more memory or more programmable logic on the FPGAs.

4 Overview of the data flow on the board

In figure 1 a block diagram of the board is given. The blocks indicate the partitioning into the different daughter cards, FPGAs and external interfaces. Four or two

(Block diagram labels: HLT, L0 and L1 Throttle, SyncLink FPGA.)
Figure 1: An
uster encoding schema, the distribution of 1, 2 or multiple hit clusters has been simulated and the most appropriate data model has been discussed [5]. With an expected occupancy of the order of 0.6%, or an average of 15 clusters per board and event [8], the most reasonable cluster encoding is the following:
- One-hit clusters are marked as of size 1 and their strip number is transmitted.
- Two-hit clusters are marked as size 2 and the strip number of only the first strip is transmitted.
- Clusters with three and more hits are split up into clusters of size one and two (a code sketch of these rules is given further below).

Footnote 13: For this interface the double data rate registers in the I/O cells of the FPGA are used. The clock frequency is therefore 80 MHz.
Footnote 14: Available cycles: 900 ns / 6.25 ns = 144 cycles; not used: 144 - 8 - 128 = 8.

To allow a flexible limitation of the readout data, the maximal number of clusters sent to the L1T can be limited at the two linking stages. The limits can be set per PP FPGA and, for the whole TELL1, on the SyncLink FPGA.

ST TT: As for the Velo.

OT: With 6 optical links per PP FPGA, a total of 3072 channels are processed on the TELL1. The restriction to 128 hits per event and PP FPGA allows reading out an average occupancy of 128 hits / 768 = 16.7%. With a zero suppression that allows encoding multiple hits in one cluster, a significant data reduction can be obtained. With the assumption that only binary information per hit needs to be sent to the L1T, the non-zero-suppressed information on the fragme
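Referring back to the Velo cluster encoding rules listed above, the sketch below operates on an already-found contiguous run of hit strips and emits (first strip, size) pairs; the particular order in which a larger cluster is split into size-2 and size-1 pieces is an assumption, only the three rules themselves are from the text.

```python
# Sketch of the Velo L1T cluster encoding rules listed above (illustrative):
# a contiguous run of hit strips is emitted as size-1 and size-2 clusters,
# where only the first strip number of a size-2 cluster is transmitted.

def encode_run(first_strip: int, n_hits: int):
    """Split one contiguous run of n_hits strips into (first_strip, size) clusters."""
    clusters = []
    strip = first_strip
    remaining = n_hits
    while remaining > 0:
        size = 2 if remaining >= 2 else 1    # split order for >2 hits is an assumption
        clusters.append((strip, size))
        strip += size
        remaining -= size
    return clusters

print(encode_run(100, 1))   # [(100, 1)]
print(encode_run(100, 2))   # [(100, 2)]
print(encode_run(100, 5))   # [(100, 2), (102, 2), (104, 1)]
```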
veral sub-detectors in LHCb. Special requirements are given by the different sub-detectors concerning interconnection and synchronization. In most aspects the Velo imposes the strongest requirements and is therefore taken to guide the implementation. In the following list, important aspects for the various sub-detectors are summarized to give a general overview of the most demanding aspects of each sub-detector.

Velo ST OT
- The L1 front-end electronics interface is analog and digitization must be done on the receiver part (A RxCard). A total of 64 analog links, each carrying the information of 32 strips, can be sampled with 8 or 10 bit.
- To accommodate the required number of analog channels, the space available on the mezzanine card has to be maximized.
- Special care has to be taken concerning the board layout in order not to disturb the sensitive analog part with the high-speed digital logic on the motherboard.
- The number of input data and clock signals is higher than for other sub-detectors. This is caused by the fact that the analog receiver card operates as a digitizer and data is transferred to the motherboard at 40 MHz on 32-bit wide buses. For the optical receiver card a multiplexed bus running at 80 MHz is used.
- The Velo must provide information to the L1 trigger. An advanced common-mode suppression algorithm (a simplified sketch is given below) is foreseen to be implemented for the L1 trigger pre-processor and the HLT interface, which requires a high amount of reso
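As a purely illustrative placeholder for the advanced common-mode suppression mentioned in the last item above (the actual Velo algorithm is not described here), the simplest form of such a correction subtracts a per-link baseline estimate:

```python
# Simplistic common-mode correction for one analog link of 32 strips
# (illustrative only; the advanced algorithm foreseen for the Velo is more
# involved and is not specified in this part of the document).

def common_mode_correct(adc_values: list[int]) -> list[float]:
    """Subtract the mean of the 32 strip values as a common-mode estimate."""
    common_mode = sum(adc_values) / len(adc_values)
    return [v - common_mode for v in adc_values]

link = [128, 130, 127, 200, 129] + [128] * 27        # one strip with signal
corrected = common_mode_correct(link)
print(round(corrected[3], 1))                         # the signal strip stands out
```

A real implementation would at least exclude strips carrying signal from the baseline estimate; the mean here is only meant to convey the idea.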
y can be achieved. Even though the ECS local parallel bus is running at 10 MHz only, care has to be taken that no fast signal edges cause overshoot and undershoot that can destroy the devices on the bus [2]. Signal integrity simulations have been done in order to ensure its proper functioning. The ECS parallel bus, which has a total length of about 40 cm, is RC-terminated on both sides of the bus.

11 FPGA configuration

For the configuration of the Altera Stratix FPGAs one enhanced configuration device (EPC16) is sufficient, with the assumption that the PP FPGAs have identical firmware. Having only one firmware for all PP FPGAs is a big advantage for the compilation time of the board firmware, which can take hours per design. To distinguish the five FPGAs on an ECS access, a hardwired chip address has been given to each. The EPC16 device is programmed over JTAG, controlled by the ECS. Optionally, a connector is available on the motherboard to download the firmware directly to the PP FPGAs.

Footnote 22: Remark that the PLX9030 is one of the drivers of the local bus. Because the local bus is specified to operate at a frequency of up to 60 MHz, the edges of the local bus can be much faster than needed for the 10 MHz operation.

(Clock distribution figure labels: RxCard mezzanine card; "Only for A RxCard"; "for O RxCard"; 16 clocks are inputs to the PP FPGA; Clk_40; QDR CLK 100 MHz; ECSCLK 40 MHz.)