
i960 CA/CF Microprocessor User's Manual

1. MACHINE LANGUAGE INSTRUCTION REFERENCE
Table E-2. REG Format Instruction Encodings (Sheet 1 of 2)
Field layout (bits): 31-24 opcode (first two hex digits), 23-19 src/dst, 18-14 src2, 13 M3, 12 M2, 11 M1, 10-7 opcode (last hex digit), 6 S2, 5 S1, 4-0 src1 (bitpos for the bit instructions). For the 58x rows, bits 31-24 are 0101 1000.
Opcode Mnemonic
580 notbit, 581 and, 582 andnot, 583 setbit, 584 notand, 586 xor, 587 or, 588 nor, 589 xnor, 58A not, 58B ornot, 58C clrbit, 58D notor, 58E nand, 58F alterbit
590 addo, 591 addi, 592 subo, 593 subi, 598 shro, 59A shrdi, 59B shri, 59C shlo, 59D rotate, 59E shli
5A0 cmpo, 5A1 cmpi, 5A2 concmpo, 5A3 concmpi, 5A4 cmpinco, 5A5 cmpinci, 5A6 cmpdeco
2. Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic Description cc cc cc Opcode Mach Instruction Result nif om of 2 1 0 p tfp em te Events Modes T Y Format Type issue Latency Conditional Compare Int ncmpi srcl 5 conemp reg lit sfr reg lit sfr if AC cc2 0 0 U M 5A 3 REG R 0 5 1 1 if src1 lt src2 010 else AC cc lt 001 concmpo 7 74 7 T7 I I U M 5A REG R 0 5 1 1 else 001 Divide Integer divi srcl sre2 dst reg lit sfr reg lit sfr reg sfr IO i I U 74 B REG 3 37 if src 0 Arithmetic Zero Divide fault ZD dst lt quotient srcZ src1 src2 src and dst are 32 bits Divide Ordinal divo src dst reg lit sfr i sfr reg sfr I I U ZD M 70 REG m 3 35 36 if src 0 Arithmetic Zero Divide fault dst lt quotient srcl src2 src and dst are 32 bits Extended Divide iv srcl sre2 dst ed g lit sfr g lit sfr reg sfr if src1 0 Arithmetic Zero Divide fault I U ZD 67 1 REG 3 35 36 dst lt remainder src2 src1 dst 1 lt quotient srcZ src1 src2 is 64 bits srcl dst and dst 1 are 32 bits Extended Multiply emul srcl src2
3. [Figure B-23. DRAM System Write Waveform]
B.3.13 Design Example: Burst DRAM with Distributed CAS-Before-RAS Refresh Using READY Control
This example illustrates a DRAM system design that uses CAS-before-RAS refresh and READY control. CAS-before-RAS refresh uses the internal refresh address generation capabilities of modern DRAMs. The design does not use a DMA channel for refresh. READY must be generated by the DRAM controller to indicate that a data transfer is complete. The controller must arbitrate between access requests and refresh requests, and control the address multiplexer and RAS precharge time. The internal wait state generator is not used. The DRAM controller must be designed with information about processor and DRAM speed. The memory system block diagram (Figure B-24) is similar to the schematic for the previous example, except for the absence of the DMA controller connection. The refresh timer indicates it is time to refresh the DRAM
4. 7 9 7 6 4 Parallel Faults tats eee teniente cate 7 9 vi intel CONTENTS 7 6 5 Faults in One Parallel Instruction 7 10 7 6 6 Faults in Multiple Parallel Instructions 7 10 7 6 7 Fault Record for Parallel Faults 2 7 10 7 7 FAULT HANDLING PROCEDURES eene 7 12 7 7 1 Possible Fault Handling Procedure Actions 2 7 12 7 7 2 Program Resumption Following a Fault e 7 12 7 7 3 Returning to the Point in the Program Where the Fault Occurred 7 18 7 7 4 Returning to a Point in the Program Other Than Where the Fault Occurred 7 13 7 7 5 Fault e ee e d 7 14 7 8 FAULT HANDLING ACT ION see erc 7 14 7 8 1 Local Fault a e a RD ERE En es 7 15 7 8 2 System LocakFa lt Ca ll snapsin ar er a eee 7 16 7 8 3 System Supervisor Fault 7 16 7 8 4 Faults and Interrupts 5 itte ce ege este re ies 7 17 7 9 PRECISE AND IMPRECISE FAULTS eene eene 7 17 7 9 1 Precise Faults oii tct em ida dae 7 18 7 9 2 Imprecise Faults HR rd 7 18 7 9 3 Asynchronous Faults 5 2 rope ue e indt 7 18 7 9 4 No Imprecise Faults NIF Bit 7 18 7 9 5 Controlling Fault Precision oti er ct etr ee ta i leet ede 7 19 7 10 FAULT
5. a ie ei dit tete ipee tee 7 20 7 10 1 nates na eee Ab eae 7 21 7 10 2 Constraint eA ee ee 7 22 7 10 3 Operation eee ete en ede 7 23 7 10 4 Parallel Faults vives Renee Ianue 7 24 7 10 5 Protection Faults 4 5 2 et t e n esce toad cen tha dde i aed es 7 25 7 10 6 Trace Faults c E YE ERE EYE ERES 7 26 7 10 7 Type Faultss 7 28 CHAPTER 8 TRACING AND DEBUGGING 8 1 TRAGE GONTROLS i iet erectus td ipud nes eni fedes eres 8 1 8 1 1 Trace Controls TC Register 2 8 2 8 1 2 Trace Enable Bit and Trace Fault Pending 8 3 8 1 3 Trace Control on Supervisor Calls eee 8 3 8 2 TRAGE MODES iia meti ero iet ii E 8 4 8 2 1 Inistr ction Trace eie eee ge nee desti 8 4 8 2 2 Branch Trace rm 8 4 8 2 3 a tes 8 4 8 2 4 Return Trace cass ot ect iet te Do t a oth nals Seth 8 4 8 2 5 eR RE eret 8 5 8 2 6 S pervisor TIEace uec irt ir ade ect re c c o da ge ofla etg 8 5 8 2 7 Breakpoint Trace err e i d Ee ee HP erat 8 5 vii 5 intel i
6. B 2 Non Pipelined SRAM Read B 4 Non Pipelined SRAM Write B 5 Chip Enable State enn B 7 A3 2 Address Generation State 1 B 8 Pipelined Read Address and 0 B 10 Pipelined SRAM Interface Block B 11 Pipelined Read 4 B 13 Pipelined Read Chip Enable State B 13 5 Figure 10 Figure B 11 Figure B 12 Figure B 13 Figure B 14 Figure B 15 Figure B 16 Figure B 17 Figure B 18 Figure B 19 Figure B 20 Figure B 21 Figure B 22 Figure B 23 Figure B 24 Figure B 25 Figure B 26 Figure B 27 Figure B 28 Figure B 29 Figure B 30 Figure B 31 Figure B 32 Figure D 1 Figure F 1 Figure F 2 Figure F 3 Figure F 4 Figure F 5 Figure F 6 Figure F 7 Figure F 8 Figure F 9 Figure F 10 Figure F 11 Figure F 12 Figure F 13 Figure F 14 Figure F 15 Figure F 16 XX Pipelined Read PA3 2 State Machine Diagram B 14 Nibble Mode Read B 16 Fast Page Mode DRAM Read B 17 Static Column Mode DRAM Read B 18 RAS only DRAM Refresh B 19 CAS before RAS DRAM Refresh B 19 Address Multiplexer Inputs B 20 DRAM System with DMA Refresh B 22 DRAM Address Generation State Machi
7. 12 2 12 2 1 Interrupt Controller Modes 2 12 3 12 2 1 1 Dedicated Mode hr eee eti 12 4 intel 3 CONTENTS 12 2 1 2 Expanded Mode three eee ede peli teet 12 5 12 2 1 3 Mode cante tbh ees 12 7 12 2 2 Non Maskable Interrupt 12 7 12 2 3 Saving the Interrupt 12 7 12 3 EXTERNAL INTERFACE 12 8 12 3 1 Pin D scriptiOns s tee frere re rernm eee eae 12 9 12 3 2 Interrupt Detection Options 2 mener 12 9 12 3 3 Programmer s Interface 12 11 12 3 4 Interrupt Control Register 12 11 12 3 5 Interrupt Mapping Registers IMAPO IMAP2 12 12 12 3 6 Interrupt Mask and Pending Registers IMSK IPND 12 14 12 3 7 Default and Reset Register Values 12 15 12 3 8 Setting Up the Interrupt Controller 12 16 12 3 9 Irnplemientatlon DD ben 12 16 12 3 10 Interrupt Service Latency emere 12 17 12 3 11 Optimizing Interrupt Performance emere 12 19 12 3 12 Vector Caching Option 2 12 20 12 313 DMA Suspension on Interrupts 12 21 12 3 14 Caching Interrupt Handling Procedures 12 21 CHAPTER 13 DMA CONTROLLER 13 1 OVERVIEW e Sedna nt ri
8. e ee ed e ERE ae LER 6 6 6 5 1 Postirig Interr pts ice RETRO RUBER RR AERE 6 6 6 5 2 Posting Interrupts Directly to the Interrupt Table 6 7 6 6 SYSTEM CONTROL INSTRUCTION 6 8 6 7 INTERRUPT STACK AND INTERRUPT RECORD seen 6 9 6 8 INTERRUPT SERVICE ROUTINES sess nere 6 10 6 9 INTERRUPT CONTEXT SWITQOH ssseseseeeeeeenenneeeneennen nennen nnne 6 11 6 9 1 Executing State 8 6 12 6 9 2 Interrupted State Interrupt 2 6 13 7 FAULTS 7 1 FAULT HANDLING FACILITIES 7 1 7 2 EAUIT VY A etaim 7 2 7 8 FAULT TABEE tpe eee e BER RR Ed 7 4 7 4 STACK USED IN FAULT 7 6 7 5 FAULT REGCORD 3 2 Etre ER ne eee x i Ree CERT 7 6 7 5 1 Fa lt Record Data i t eae cas 7 6 7 5 2 Return Instruction Pointer RIP 7 7 7 5 3 Fault Record tte tit 7 8 7 6 MULTIPEE AND PARALLEL FAULTS tr edente eere 7 9 7 6 1 Multiple die 7 9 7 6 2 Multiple Trace Fault Conditions 7 9 7 6 3 Multiple Trace Fault Conditions with Other Fault Conditions
9. F 17 Previous Frame Pointer Register PFP 0 F 18 Process Controls PC Register F 18 Trace Controls TC Register 4 4 4 011 F 19 Process Control Block Configuration Words eene F 20 xxi 5 xxii TABLES Table 1 1 Table 2 1 Table 2 2 Table 2 3 Table 2 4 Table 2 5 Table 2 6 Table 2 7 Table 2 8 Table 3 1 Table 3 2 Table 3 3 Table 3 4 Table 3 5 Table 4 1 Table 4 2 Table 4 3 Table 4 4 Table 4 5 Table 5 1 Table 5 2 Table 5 3 Table 7 1 Table 7 2 Table 9 1 Table 9 2 Table 9 3 Table 9 4 Table 9 5 Table 10 1 Table 10 2 Table 11 1 Table 11 2 Table 11 3 Table 11 4 Table 12 1 Table 12 2 Table 13 1 Table 13 2 Register Terminology Conventions 2 1 8 Registers and Literals Used as Instruction Operands 2 3 Allowable Register 2 6 Data Structure 2 8 Alignment of Data Structures in the Address Space 2 11 Condition Codes for True or False 2 16 Condition Codes for Equality and Inequality Conditions 2 16 Condition Codes for Carry Out and Overflow 2 2 2 2 17 Supervisor Only Operations and Faults Generated in User Mode
10. 4 20 4 3 2 1 Request Interr pt 2 n ertet e rente eee 4 21 4 3 2 2 Invalidate Instruction eee 4 21 4 3 2 3 Configure Instruction Cache 4 21 4 3 2 4 Reinitialize PrOCOeSSOt itcr tenir en tete tier dede 4 22 4 3 2 5 Load Control Registers 2 4 23 CHAPTER 5 PROCEDURE CALLS 5 1 OVERVIEW nior ERE eene EE E ee TU ER E AER 5 1 5 2 CALL AND RETURN MECHANISM 5 2 5 2 1 Local Registers the Procedure 5 2 5 2 2 Local Register and Stack 5 4 5 2 2 1 Frame Pointer gt ne ere ae TRE 5 4 5 2 2 2 Stack Polriter itas it eH e 5 4 5 2 2 3 Previous Frame Pointer 84 0 5 4 5 2 2 4 Ret rn Type Field ette Eee M e DO eR ders 5 4 5 2 2 5 Return Instruction Pointer 5 5 5 2 3 Call and Return Action 5 5 5 2 3 1 Call Operation abire ke d tae AR ARE RISO 5 5 5 2 3 2 Return Operatlor s dede BEER ia SERE 5 6 5 2 4 Caching of Local Register Sets 5 6 5 2 5 Mapping Local Registers to the Procedure Stack 5 9 5 3 PARAMETER PASSING 1 rent
11. Reset 1 0 0 I U 243 bus 243 bus ase 4 Load Control Register Group ee I I U M 42 bus 42 bus default Operation Invalid Operand fault gt 7 7 gt gt gt 3 3 alo 5 7 5 return j ji 2 i li H 1 H d a System Control 1 srcl sre2 sre3 sysct reg lit sfr reg lit sfr reg lit 65 9 REG u 80960CF i and Oxff gt gt 8 switch i ase 0 Post an Interrupt ee bred I I U M 37 bus 37 bus 1 Purge the Instruction Cache ES mS Ec et gt U M 38 38 case 2 Configure the Instruction Cache 4 Kbyte cache enabled 522 U I I I IM 52 52 2 Kbyte cache disabled 48 48 load and lock 2 Kbytes break 3653 bus 3653 bus pues ie Bata Reset 0 0 0 0 0 0 31 2 1 O I 0 I U M 265 bus 265 bus 4 Load Control Register Group E E bia I I U M 42 bus 42 bus default Operation Invalid Operand fault a 919159 5 59 3 7 3 gt gt 7 5 return T E 3 1 i H X t E 2 i March 1994 Page 17 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Instruction
12. COBR 1994 Page 7 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Instruction Execution Opcode Mach Instruction Result Type Issue Latency Mnemonic Description nif om of 50 p tip em te Events Modes T 1 Format Compare and Increment Integer cmpinci srel 5 dst reg lit sfr reg lit sfr reg sfr I 5 5 REG R 0 5 1 1 if srel lt src2 lt 100 else if 1 src2 010 else AC cc lt 001 dst src2 1 overflow is ignored Compare and Increment Ordinal cmpinco sre P regilit sfr dst reg sfr I U M 5A REG R 0 5 1 1 if 1 lt src2 AC cc lt 100 else if src src2 lt 010 else AC cc lt 001 dst lt src2 1 Compare Ordinal cmpo srcl sre2 reg lit sfr reg lit sfr I U M 5A 0 REG R 0 5 1 1 if 1 lt sre2 lt 100 else if src1 src2 AC cc lt 010 else AC cc lt 001 Compare Ordinal and Branch If Equal m srcl sre2 targ cmpobe reg lit sfr reg i 5 1 IB IB M 32 COBR C 1 3 1 3
13. M 92 MEM Morp a memory_word dst lt src Store Integer Byte OP i sre dst 05 1 stib fogli 1 r u 10 M C2 25 memory byte dst lt src truncated to 8 bits Store Integer Short OP i sre dst 05 1 suis nein E I r U 10 M CA MEM memory short dst lt src truncated to 16 bits OC Store Long OP 1 src dst 05 14 st 1 11 1 MEM Mor u memory long dst lt src sre 1 Store Ordinal Byte OP stob sre dst 0 5 1 reg lit mem I U M 82 M oru efa memory_byte dst lt src truncated to 8 bits Store Ordinal Short OP sre dst 05 1 stos s I r u M MEM memory short dst src truncated to 16 bits OC Store Quad OP src dst 05 14 stq I r U M B2 MEM Morn memory quad dst lt src src l sre 2 src 3 OC Store Triple OP sre dst 05 1 sit reg lit 1 U M memory triple dsr src sre 1 src 2 Subtract Ordinal With Ca subc srcl src2 dst reg lit sfr reg lit sfr reg sfr 0 2 2 I I U M 5B2 REG R 0 5 1 1 dst lt src2 srcl not AC cc1 0 integer overflow AC ccl out Subtract Integer subi srel src2 dst reg lit sfr reg lit sfr
14. reghsr I I R 05 1 1 dst lt src rotate left len mod 32 March 1994 Page 13 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Mnemonic Description Arithmetic Controls Process Controls Trace Controls Faults nif om of cc cc 2 1 te Events Modes A Opcode Opcode Format Instruction Execution Mach Type Instruction Result Issue Latency scanbit Scan for Bit sre dst reg lit sfr reg sfr if sre 0 dst lt Oxffff ffff lt 000 else for i 31 src and 2 i 0 i lt 1 1 dst i AC cc lt 010 0 641 scanbyte setbit Scan Byte Equal if 1 and 0x0000 0044 src2 and 0x0000 OOff or 1 and 0x0000 ff00 rc2 0x0000 ff00 or 1 and 0 00 0000 c2 and OxOOff 0000 or 1 and Oxff00 0000 src2 and Oxff00 0000 AC cc lt 010 else AC cc 000 0 0 M 5A C REG R 0 5 1 1 Setup DMA Channel srcl reg lit sfr src3 reg lit ima channel control lt src2 if not chaining mode DMA_RAM src1 lt sre3 lse src src3 quad store word store M 63 0 REG 22 40 for back to back SDMA s 22 40 for back to back SDMA s
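The scanbit and scanbyte actions listed in the snippet above can be sketched in Python as follows. This is a minimal model of the pseudo-code only, not processor-accurate code: the function names and the (dst, cc) return convention are illustrative assumptions, and operands are treated as 32-bit ordinals.

```python
MASK32 = 0xFFFFFFFF  # operands are modeled as 32-bit ordinals (assumption)


def scanbit(src: int) -> tuple[int, int]:
    """Return (dst, AC.cc): index of the most significant set bit of src."""
    src &= MASK32
    if src == 0:
        # No bit set: dst <- 0xffffffff, AC.cc <- 000
        return MASK32, 0b000
    # Highest set bit found: dst <- i, AC.cc <- 010
    return src.bit_length() - 1, 0b010


def scanbyte(src1: int, src2: int) -> int:
    """Return AC.cc: 010 if any of the four byte lanes match, else 000."""
    for shift in (0, 8, 16, 24):
        if ((src1 >> shift) & 0xFF) == ((src2 >> shift) & 0xFF):
            return 0b010
    return 0b000
```

For instance, `scanbit(0x10)` reports bit 4, and `scanbyte(0x11223344, 0xAABB33CC)` reports a match because byte lane 1 holds 0x33 in both operands.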
15. ... followed by a branch-if instruction is equivalent to a compare-integer-and-branch instruction. The latter method of comparing and branching produces more compact code; however, the former method can result in faster running code if used to take advantage of pipelining in the architecture. The same is true for cmpo and the compare-ordinal-and-branch instructions.
Action: if src1 < src2, AC.cc ← 100₂; else if src1 = src2, AC.cc ← 010₂; else AC.cc ← 001₂
Faults: Type.Mismatch: non-supervisor reference of an sfr.
Example:
    cmpo r9, 0x10    # compares the value in r9 with 0x10 and sets AC.cc to indicate the result
    bg xyz           # branches to xyz if the value of r9 was greater than 0x10
Opcode: cmpi 5A1H REG; cmpo 5A0H REG
See Also: COMPARE AND BRANCH, cmpdeci, cmpdeco, cmpinci, cmpinco, concmpi, concmpo
9.3.18 cmpinci, cmpinco
Mnemonic: cmpinci Compare and Increment Integer; cmpinco Compare and Increment Ordinal
Format: cmpinc* src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Compares src2 and src1 values and sets the condition code according to comparison results. src2 is then incremented by one and the result is stored in dst. The following table shows condition code settings for the three possible comparison results.
Condition Code    Comparison
100₂              src1 < src2
010₂              src1 = src2
001₂
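The compare action and the cmpo/bg example above can be modeled with a short Python sketch. It is an illustration under stated assumptions, not the manual's own code: the function names are hypothetical, `bg_taken` assumes that bg tests the 001 bit of AC.cc, and `cmpinco` wraps the incremented value to 32 bits.

```python
def compare(src1: int, src2: int) -> int:
    """Return the 3-bit AC.cc value set by cmpo/cmpi."""
    if src1 < src2:
        return 0b100
    if src1 == src2:
        return 0b010
    return 0b001


def cmpinco(src1: int, src2: int) -> tuple[int, int]:
    """Compare, then return (AC.cc, dst) with dst = src2 + 1 (32-bit wrap)."""
    return compare(src1, src2), (src2 + 1) & 0xFFFFFFFF


def bg_taken(cc: int) -> bool:
    """Assumed branch-if-greater test: taken when the 001 bit is set."""
    return bool(cc & 0b001)


# Mirrors "cmpo r9, 0x10" followed by "bg xyz" with r9 = 0x20.
cc = compare(0x20, 0x10)
branch = bg_taken(cc)
```

With r9 = 0x20 the compare yields 001 and the branch is taken; with r9 = 0x10 it yields 010 and the branch falls through.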
16. ... pg. 14-5. The FAIL pin signals errors in either the internal self test or the bus confidence self test. FAIL is asserted (low) for each self test (Figure 14-1). If the test fails, the pin remains asserted and the processor attempts to stop at the point of failure. If the test passes, FAIL is deasserted. When the internal self test is disabled with the STEST pin, FAIL still toggles at the point where the internal self test would occur, even though the internal self test is not executed. FAIL is deasserted after the bus confidence test passes. In Figure 14-1, all transitions on the FAIL pin are relative to PCLK2:1, as shown in the 80960CA or CF data sheets.
[Figure 14-1. FAIL Timing: RESET, internal self test and bus test phases, each with pass and fail outcomes on FAIL; cycle annotations: 102; 60,000 on the 80960CA; 280,000 on the 80960CF]
14.2.3 On-Circuit Emulation
On-circuit emulation aids board-level testing. This feature allows a mounted i960 Cx processor to electrically remove itself from a circuit board. In ONCE mode, the processor presents a high impedance on every pin, nearly eliminating the processor's power demands on the circuit board. Once the processor is electrically removed, a functional tester can take the place of (emulate) the mounted processor
17. 5 intel 3 197141 Pin Description 6 aeter cer etes 13 30 13 11 2 Demand Mode Request Acknowledge Timing eene 13 31 13 11 3 End Of Process Terminal Count Timing 13 32 13 11 4 Block Mode Transfers 0 8 4 13 33 13 11 5 DMA Bus Request Pili aec at E c D eas 13 33 13 11 6 DMA Controller em 13 34 13 11 7 and User Program 13 34 19 11 8 Controller Unit 5 3 retenti tenenmemtteae eei 13 35 13 11 9 Controller Logic 13 35 13 11 10 DMA Performance eie Rr HERE 13 36 1311 11 DMA Thro gfhput rete e Rr Dre qn Lee 13 38 19 14 12 DMA Latency iere tette tee cte o teet 13 40 CHAPTER 14 INITIALIZATION AND SYSTEM REQUIREMENTS 14 1 OVERVIEW iine ee ET ER EID CENE 14 1 14 2 INITIALEIZATIQN e dic enirn 14 2 14 2 1 Reset Operation ode tte ER 14 2 14 2 2 Self Test Function STEST FAIL 002 200 000000000000 14 4 14 2 3 OnsGircuit Emulation Jac cien a a 14 5 14 2 4 Initial Memory Image 14 5 14 2 5 Initialization Boot Record IBR 14 5 14 2 6 Process Control Block 14 8 14 3 REQUIRED DATA 5 14 11 14 3 1 Reinitializ
18. [Figure 11-18. Operation of the Bus Backoff Function: signals ADS, BLAST, READY, BOFF, A31:2, SUP, DMA, BE3:0, WAIT, DEN, DT/R and D31:0 (writes); BOFF may be asserted to suspend a request between Begin Request and End Request and may not be asserted outside that window. Note: READY/BTERM must be enabled.]
[Figure 11-19. Example Application of the Bus Backoff Function: Processor System A requests access to Processor B local memory (high bus grant priority); Processor System B requests access to another device on the bus, a slave memory or peripheral (low bus grant priority).]
CHAPTER 12 INTERRUPT CONTROLLER
This chapter contains interrupt controller information that is of particular importance to the system implementor. The method for handling interrupt requests from user code is described in CHAPTER 6, INTERRUPTS. Specifically, this chapter describes the i960 Cx processors' facilities for requesting and posting interrupts, the programmer's interface to the on-chip interrupt controller, implementation, latency, and how to optimize interrupt performance.
19. [Figure 10-5. Summary of Aligned-Unaligned Transfers for Little Endian Regions (continued): triple-word and quad-word load/store requests broken into byte, short, word and double-word requests]
10.5 INTERNAL DATA RAM
The i960 Cx processor contains 1 Kbyte of user-visible internal data RAM, which is mapped into the first 1 Kbyte of the address space (addresses 00H-3FFH). Internal data RAM is accessed only by loads, stores or DMA transfers. Instruction fetches directed to these addresses cause an operation-unimplemented fault to occur. A portion of this internal data RAM is optionally used to store DMA status, cached interrupt vectors and, in some applications, cached local registers. The remaining data RAM can be used by application software. Refer to section 2.5.4, Internal Data RAM (pg. 2-12). Internal data RAM interfaces directly to an internal 128-bit bus. This bus
20. [Figure A-5. Data Cache Organization]
A.1.8.2 Bus Configuration
Certain data accesses are implicitly non-cacheable. All DMA and atomic (atmod, atadd) accesses are non-cacheable. User settings in the memory region configuration registers (MCON0-15) determine which data accesses are cacheable or non-cacheable. Registers MCON0-15 divide memory into 16 blocks whose characteristics are programmed through sysctl instructions. Refer to section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19). Micro-flow execution breaks unaligned accesses into aligned accesses; cacheability is determined as described in the preceding paragraph. Data objects that cross programming boundaries may be only partially in the data cache, resulting in a combination of cache hits and misses. Upon reset or initialization, the processor clears all valid bits to zero to ensure that accesses are not made to a cache line that may contain invalid data.
A.1.8.3 Global Control of the Cache
The data cache is globally enabled or disabled by a bit in the DMA Command Register. The following example code shows how to disable the cache. Setting the data cache disable bit does not take effect until the second clock after the setbit instruction is executed. Any load/store issued in parallel with setbit or on the following clock will be directed to the data cache
21. ... a logic 0 (low impedance to Vss) or float (present a high impedance to Vcc and Vss). Each can drive an appreciable external load. The i960 CA/CF microprocessor data sheets describe each pin's drive capability and provide timing and derating information to calculate output delays based on pin loading. Output drivers on the i960 Cx processor are specially designed to provide a uniform drive current over the entire range of operating temperatures and voltages. This feature eliminates excess noise produced by output drivers under adverse operating conditions.
14.4.5.2 Input Pins
i960 Cx processor inputs are designed to detect TTL thresholds, providing compatibility with the vast amount of available random logic and peripheral devices that use TTL outputs. Most i960 Cx processor inputs are synchronous inputs (Table 14-3). A synchronous input pin must have a valid level (TTL logic 0 or 1) when the value is used by internal logic. If the value is not valid, it is possible for a metastable condition to be produced internally. The metastable condition is avoided by qualifying the synchronous inputs with the rising edge of PCLK2:1 or a derivative of PCLK2:1. The i960 CA/CF microprocessor data sheets specify input valid setup and hold times relative to PCLK for the synchronized inputs.
Table 14-3. i960 Cx Processor Input Pins (columns: Synchronous Inputs, Asynchronous Inputs, ...)
22. if src lt src2 AC cc lt 100 else if src src2 AC cc lt 010 IP lt targ else AC cc lt 001 Compare Ordinal and Branch If Greater cmpob srel src2 targ pobg reg lit sfr reg IB U M 31 COBR C 1 3 1 3 if 1 lt sre2 lt 100 else if src1 src2 AC cc lt 010 else AC cc amp 001 IP lt targ Compare Ordinal and Branch If Less src2 targ cmpobl Lm it sfr reg IB M 34 1 3 1 3 if src lt src2 AC cc lt 100 lt targ if src1 src2 AC cc lt 010 else lt 001 Compare Ordinal and Branch If Less Or Equal cmpoble srel src2 targ P reg lit sfr IB U M 36 COBR C 1 3 1 3 if src lt src2 AC cc lt 100 IP lt targ else if src src2 AC cc lt 010 IP lt targ else lt 001 Compare Ordinal and Branch If Not Equal cmpobne srel src2 targ P reg lit sfr reg IB U M 35 COBR C 1 3 1 3 if src1 lt src2 AC cc lt 100 IP lt targ else if src src2 AC cc lt 010 else lt 001 IP lt targ March 1994 Page 8 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference
23. ... ram[i] ← status of channel i
Example:
    udma                     # update status to DMA RAM
    ldq Channel_3_ram, r4    # read current pointers and byte count for DMA channel 3
Opcode: udma 631H REG
See Also: sdma
9.3.61 xnor, xor
Mnemonic: xnor Exclusive Nor; xor Exclusive Or
Format: xnor src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
        xor src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Performs a bitwise XNOR (xnor instruction) or XOR (xor instruction) operation on the src2 and src1 values and stores the result in dst.
Action: xnor: dst ← not (src2 xor src1); xor: dst ← src2 xor src1
Faults: Type.Mismatch: non-supervisor reference of an sfr.
Example:
    xnor r3, r9, r12    # r12 ← r9 XNOR r3
    xor g1, g7, g4      # g4 ← g7 XOR g1
Opcode: xnor 589H REG; xor 586H REG
See Also: and, andnot, nand, nor, not, notand, notor, or, ornot
CHAPTER 10 THE BUS CONTROLLER
This chapter serves as a guide for a software developer when configuring the bus controller. It overviews bus controller capabilities and implementation and describes how to program the bus controller. System designers should reference CHAPTER 11, EXTERNAL BUS DESCRIPTION, for a functional description of the bus controller.
10.1 OVERVIEW
The bus controller supports a synchronous, 32-bit wide, demultiplexed external bus which
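The xor and xnor actions above reduce to two one-line bit operations. The sketch below is a minimal Python model, assuming 32-bit operands; the function names simply reuse the mnemonics and the argument order follows the assembler operand order (src1, src2).

```python
MASK32 = 0xFFFFFFFF  # results are truncated to 32 bits (assumption)


def xor(src1: int, src2: int) -> int:
    """dst <- src2 xor src1"""
    return (src2 ^ src1) & MASK32


def xnor(src1: int, src2: int) -> int:
    """dst <- not (src2 xor src1)"""
    return ~(src2 ^ src1) & MASK32
```

Because Python integers are unbounded, the mask is what turns the bitwise complement into a 32-bit result.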
24. 5 REG R 0 5 1 1 dst lt src 32 bits Move Long movl sre dst reg lit sfr reg sfr 5D C R 0 5 1 1 dst lt src 64 bits Move Qi mov dst ovg reg lit sfr reg sfr I 5 REG u 2 2 dst lt src 128 bits Move Triple movt sre dst reg lit sfr reg sfr I SE C REG u 2 2 dst lt src 96 bits Multiply Integer muli srel sre2 dst reg lit sfr reg lit sfr reg sfr I 74 1 REG R 0 5 1 2 3 4 5 dst lt src2 src 5 dst src2 srcl are 32 bits Multiply Ordinal mulo srcl src2 dst reg lit sfr reg lit sfr reg sfr I 70 1 REG R 0 5 1 2 3 4 5 src 1 dst src2 srcl are 32 bits nand srcl sre2 dst reg lit sfr reg lit sfr reg sfr I 58 E REG R 0 5 1 1 dst lt not src2 and src1 Nor nor srcl sre2 dst reg lit sfr reg lit sfr reg sfr I 58 8 REG R 0 5 1 1 dst lt not src2 or srcl Not not sre dst reg lit sfr reg sfr I 58 4 REG R 0 5 1 1 dst lt not src Not And notan srel sre2 dst otang 584 REG R 05 1 1 dst not 2 and sre March 1994 Page 12 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic Description cc cc cc O
25. 9 67 EE 9 68 SHIEI 9 69 spanbit oer Cat e PE 9 72 UNDIS M up es ee 9 73 Ca EE EE EE 9 75 SUD SUDO sic 9 76 D ERR T 9 77 sysctl 80960Cx Processor Only 9 78 See 9 81 80960 Processor 9 83 KOM eo iste celeb ele hee uem 9 84 5 intel 9 CHAPTER 10 THE BUS CONTROLLER 10 1 QVERVIEW ae eR ee 10 1 10 2 MEMORY REGION 10 2 10 2 1 Data Bus Width vies e reet ne a imo vere es 10 3 10 2 2 Burst and Pipelined Read ACCESSES 10 3 10 2 3 Wait States de hi aia RI One nie ed A 10 3 10 2 4 Byte Ordering tini beet ilies tedio ene inc desta dg e eden 10 5 10 3 PROGRAMMING THE BUS 10 5 10 3 1 Memory Region Configuration Registers MCON 0 15 10 6 10 3 2 Bus Configuration Register 10 8 10 3 3 Configuring the Bus Controller meme 10 9 10 4 DATA ALIGNMENT iiri ect eee eve e e e ee 10 9 10 5 INTERNAL DA ARAM ie eite et eret
26. Figure B 19 DRAM Controller State Machine STATE 0 RAS is CAS3 0 is COL ADR 1 6 IF Idle not asserted not asserted not asserted memory access ADS amp amp DRAM CS amp amp DACKO B 26 BUS INTERFACE EXAMPLES THE the next state is STATE 1 ELSE IF refresh access ADS amp amp DRAM CS amp amp DACKO THE the next state is STATE 5 ELSE the next state is STATE 0 STATE Assert RAS RAS is asserted CAS3 0 is not asserted COL_ADR is not asserted IF WRITE write THE the next state is STATE_3 ELSE read the next state is STATE_2 STATE Static Column Mode Read Assert CAS RAS is asserted CAS3 0 is asserted COL_ADR is asserted IF BLAST THEN the next state is STATE_0 ELSE the next state is STATE_2 STATE Select Column Address RAS is asserted CAS3 0 is not asserted COL_ADR is asserted the next state is STATE_4 STATE_ Assert CAS RAS is asserted COL_ADR is asserted CASO BEO CAS1 CAS2 BE2 CAS3 BE3 IF WAIT amp amp BLAST THEN the next state is STATE_3 ELSE IF BLAST THEN the next state is STATE_0 ELSE the next state is STATE_4 the next state is STATE_6 STATE_5 REFRESH CYCLE RAS ONLY REFRESH B 27 BUS INTERFACE EXAMPLES intel P RAS is CAS3 0 is COL ADR is STATE
27. ... pg. 13-18.
DMA: DMA Bus Request (output). This pin indicates that a bus request is issued by the DMA controller. The pin is active during a bus request originating from the DMA controller and inactive during all other bus requests. The DMA pin value is indeterminate during idle bus cycles. The DMA pin is not active when chaining descriptors are loaded from memory.
13.11.2 Demand Mode Request/Acknowledge Timing
Demand mode transfers require that the DMA request (DREQ3:0) signal is asserted before the transfer is started. Demand mode transfers should satisfy two requirements:
1. After the transfer is requested, the DMA controller must be fast in responding to the requesting device. This characteristic is referred to as latency.
2. The requesting device must be given enough time to deassert the request signal to prevent an unwanted DMA transfer.
The timing for demand mode transfers is described in the following sections. Latency characteristics of a transfer are described in section 13.11.10, DMA Performance (pg. 13-36). An external device initiates a demand mode transfer by asserting (active low) one of the DMA request pins. The acknowledge pin is driven active by the DMA controller during the bus request issued to access the DMA requestor. Figure 13-14 shows DACK3:0 output timings. To start a demand mode DMA, DREQ3:0 must be held asserted until the acknowledge bus request is started. EOP3:0 pins do not require external
28. 2 21 Supported Integer Sizes 3 2 Supported Ordinal SIzes cte ie ccs a tdt ec eh 3 3 Memory Contents For Little and Big Endian 3 5 Byte Ordering for Little and Big Endian 3 5 Memory Addressing emnes 3 6 i9609 Cx Microprocessor Instruction Set Summary 4 4 Arithmetic Operations 4 7 System Control Message Types and Operand 4 20 Cache Configuration 4 22 Control Register Table and Register Group 4 24 PRCB Cache Configuration Word and Internal Data 5 9 Encodings of Entry Type Field in System Procedure Table 5 14 Encoding of Return Status 5 17 i9609 Cx Processor Fault Types and Subtypes 7 3 Fault 1 95 07 5 5 dece eoe n ee p be ot e epe ge 7 15 Abbreviations in nennen 9 5 Pseudo code Symbol Definitions eese 9 6 Fault Types and Subtypes esses enne 9 6 Common Possible Faulting Conditions
29. 9.3.11 call
Mnemonic: call Call
Format: call targ (disp)
Description: Calls a new procedure. The targ operand specifies the IP of the called procedure's first instruction. When using the Intel i960 processor assembler, targ must be a label. In executing this instruction, the processor performs a local call operation as described in section 5.4, LOCAL CALLS (pg. 5-12). As part of this operation, the processor saves the set of local registers associated with the calling procedure and allocates a new set of local registers and a new stack frame for the called procedure. The processor then goes to the instruction specified with targ and begins execution. targ can be no farther than -2^23 to (2^23 - 4) bytes from the current IP.
Action: wait for any uncompleted instructions to finish;
    temp ← (SP + 0xf) andnot 0xf    # round to next boundary
    memory[FP] ← r0:15              # these accesses are cached in the local register cache
    RIP ← next IP
    PFP ← FP
    PFP.rt ← 000₂
    FP ← temp
    SP ← temp + 64
    IP ← IP + displacement
Faults: Trace: Instruction, Call. Breakpoint, Instruction and Call Trace Events are signaled after instruction completion. Trace fault is generated if PC.te = 1 and TC.i or TC.c = 1. Operation.Unimplemented: execution from on-chip data RAM.
Example:
    call xyz    # IP ← xyz
Opcode: call 09H CTRL
See Also: bal, calls, callx
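The register updates in the call action can be traced with a small Python sketch. It is an illustrative model only: the function name, the dictionary result and the 4-byte instruction length used to form "next IP" are assumptions, and the register save to memory[FP] is not modeled.

```python
def call_action(ip: int, fp: int, sp: int, displacement: int,
                instr_len: int = 4) -> dict[str, int]:
    """Return the register values after a local call, per the action list."""
    # targ must lie within -2**23 .. 2**23 - 4 bytes of the current IP.
    if not -(2 ** 23) <= displacement <= 2 ** 23 - 4:
        raise ValueError("displacement out of range for call")
    temp = (sp + 0xF) & ~0xF  # round SP up to the next 16-byte boundary
    return {
        "RIP": ip + instr_len,                  # RIP <- next IP (assumed 4-byte instruction)
        "PFP": fp,                              # PFP <- FP
        "PFP.rt": 0b000,                        # return type: local call
        "FP": temp,                             # FP <- temp
        "SP": temp + 64,                        # SP <- temp + 64
        "IP": (ip + displacement) & 0xFFFFFFFF, # IP <- IP + displacement
    }


state = call_action(ip=0x1000, fp=0x2000, sp=0x2044, displacement=0x100)
```

Starting from SP = 0x2044, the new frame begins at 0x2050 and the new SP is 0x2090, which leaves 64 bytes at the start of the frame for the sixteen local registers.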
30. ... IF Done (BLAST) THEN next state is STATE_0, ELSE next state is STATE_3.
B.1.5 Trade-offs and Alternatives
The SRAM example just described demonstrates a burst SRAM memory interface. If a non-burst interface is desired, the address generation section of the state machine PLD may be removed. The design is also easily expanded to accommodate multiple banks of SRAM. The i960 Cx processors' integrated bus controller simplifies external memory system design. The internal wait state generator decouples the memory speed from the memory controller. The memory control PLD does not use any of the memory access parameters, so operation of the memory control PLD is independent of memory access times. Memory access parameters are entered into the memory region configuration table via software.
B.2 PIPELINED SRAM READ INTERFACE
The following example illustrates the implementation of a pipelined-read SRAM system. A zero wait state pipelined-read memory system will have a 20 percent improvement in read data bandwidth over a non-pipelined memory system using the same memory devices. The pipelined-read memory system is similar in design to the burst memory system. A pipelined-read memory system is the highest performance memory system that can be interfaced to the i960 Cx processors. The address cycle of consecutive accesses is overlapped with the data cycle of the previous access. This result
31. [Figure A-14. CTRL Pipeline for Branches to Branches]

Figures A-15, A-16 and A-17 show the IS issue stage and the CTRL pipeline for each case of possible IS branch lookahead detection. Assuming that the IS can see four instructions every clock from the instruction cache, the branch can be in the first, second or third group of instructions seen. An executable group of instructions is a group of sequential instructions in the currently visible quad word which can be issued in the same clock. See section A.2, PARALLEL INSTRUCTION PROCESSING (pg. A-14).

Figure A-15 shows the cases where a branch, when first seen by the IS, is in the first executable group of instructions. The IS issues the branch immediately along with the first one or two instruction(s) ahead of it. Since the branch takes two clocks in the CTRL pipeline to execute, a one-clock break in the IS's ability to issue instructions occurs. On the next clock, the IS issues a new group of instructions from the branch target.

In the figure, two other instructions were issued simultaneously with the branch. Hence, the branch could be said to have taken one clock to execute. When the branch is the first instruction in the group (the branch is a branch target), no other instructions are issued in parallel wit
32. [Figure 14-2. Initial Memory Image (IMI) and Process Control Block (PRCB)]

When the processor reads the IMI during initialization, it must know the bus characteristics of external memory where the IMI is located. This bus configuration is read from the IBR's first three words. At initialization, the processor performs loads from the lower-order byte of the IBR's first three words. These three bytes are combined and loaded into the memory region 0 configuration register (MCON0) to program the initial bus characteristics for the system. The byte in IBR word 0 is loaded into the lowest byte position of the MCON0 register; the next two bytes, from word 1 and word 2, are loaded into successively higher byte positions. The byte in IBR word 4 is reserved and must be set to 00H. This byte is not loaded at initialization. See section 10.2, MEMORY REGION CONFIGURATION (pg. 10-2).

When initialization begins, the region configuration table valid bit (BCON.ctv) is cleared. This means that every bus request issued takes configuration information from the MCON0 register, regardless of the memory region associated with the request. The MCON0 register is initially set by microcode to a value which allows the bus configuration data in the IBR to be loaded regardless of actual memory configuration. This is done by configuring the external bus with i
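The byte packing described for the initial MCON0 value can be illustrated with a short C sketch. The function name `initial_mcon0` is hypothetical, and the sketch only models the packing of the three low-order bytes; it says nothing about what the individual MCON0 bits mean.

```c
#include <stdint.h>

/*
 * Combines the low-order byte of IBR words 0, 1 and 2 into one value:
 * word 0 supplies the lowest byte position, words 1 and 2 the next two
 * higher byte positions. The top byte is left at zero in this sketch.
 */
static uint32_t initial_mcon0(const uint32_t ibr_words[3])
{
    uint32_t b0 = ibr_words[0] & 0xFFu;  /* low-order byte of IBR word 0 */
    uint32_t b1 = ibr_words[1] & 0xFFu;  /* low-order byte of IBR word 1 */
    uint32_t b2 = ibr_words[2] & 0xFFu;  /* low-order byte of IBR word 2 */

    return b0 | (b1 << 8) | (b2 << 16);
}
```

Only the low-order byte of each word contributes; the upper three bytes of each IBR word are ignored by this model.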
33. The DMA controller uses the DMA command register (DMAC) and the setup DMA instruction (sdma) to configure and control the four DMA channels. The update DMA instruction (udma) monitors the status of an in-progress DMA operation.

The DMAC register is a special function register (sf2). This register enables or disables each channel and holds frequently accessed status and control bits for the DMA controller, including idle or active status and termination status for a channel.

sdma configures each channel. sdma specifies source address, destination address, byte count, transfer type, and chained or non-chained operation. When a channel is set up using sdma, an eight-word (32-byte) block of internal data RAM is allocated for the channel. Channel state is stored in this section of data RAM when operation is preempted by another DMA channel.

The user can access the current status for any active or idle DMA operation by examining data RAM assigned to a channel. This status includes the current source and destination addresses and the remaining byte count. udma copies the state of an active DMA channel to internal RAM.

These actions are usually taken to set up and start a DMA operation on the i960 Cx processors:
1. The channel is set up using the sdma instruction.
2. The DMAC register is modified to enable the DMA.
3. The DMAC register is then read to monitor the activity of the DMA operation.
4. udma can be issued and DMA data RAM exa
34. These were correctly defined in the i960 CF Microprocessor Reference Manual Supplement and unintentionally omitted from this manual REGISTER AND DATA STRUCTURES Transfer Type Field 8 to 8 bits 01 8 to 16 bits 02H reserved 8 to 32 bits 04H 16 to 8 bits 05H 16 to 16 bits 06H reserved 07H 16 32 bits O8H 8 bits fly by 09H 16 bits fly by OAH 128 bits fly by quad OBH 32 bits fly by OCH 32 to 8 bits ODH 32 to 16 bits OEH 128 to 128 bits quad OFH 32 to 32 bits Destination Addressing 0 increment 1 hold Source Addressing 0 increment 1 hold Synchronization Mode Bit 0 source synchronized 1 destination synchronized Synchronization Select Bit 0 block non synchronized 1 demand synchronized EOP TC Select Bit 0 End Of Process 1 Terminal Count Destination Chaining Select Bit 0 no chaining 1 chained destination Source Chaining Select Bit 0 no chaining 1 chained source Interrupt on chaining buffer Select Bit 0 no interrupt 1 interrupt Chaining Wait Select Bit 0 Wait function disabled 1 Wait function enabled 31 28 24 20 16 12 8 4 0 DMA Control Word instruction operand for SDMA instruction Reserved Initialize to 0 F 068 Figure F 13 DMA Control Word Section 13 10 3 Control Word pg 13 25 F 13 REGISTER AND DATA STRUCTURES intel IL bx D
35. Unimplemented: An attempt to execute any instruction fetched from internal data RAM causes an operation unimplemented fault.

Table 9-4. Common Possible Faulting Conditions

Fault Type    Subtype          Description
Type          Mismatch         Any instruction that references a special function register while not in supervisor mode causes a type mismatch fault.
Type          Mismatch         Any instruction that attempts to write to internal data RAM while not in supervisor mode causes a type mismatch fault.
Operation     Unimplemented    Any instruction that causes an unaligned memory access causes an operation unimplemented fault if unaligned faults are not masked in the Processor Control Block (PRCB).

9.2.7 Example

The Example section gives an assembly language example of an application of the instruction.

9.2.8 Opcode and Instruction Format

The Opcode and Instruction Format section gives the opcode and instruction encoding format for each instruction, for example:

    subi    593H    REG

The opcode is given in hexadecimal format. The instruction encoding format is one of four possible formats: REG, COBR, CTRL and MEM. Refer to APPENDIX D, MACHINE-LEVEL INSTRUCTION FORMATS for more information on the formats.

9.2.9 See Also

The See Also section gives the mnemonics of related instructions which are also alphabetically listed in this chapter.

9.3 INSTRUCTIONS

This section contains refer
36. • Avoid closed loops in signal paths (Figure 14-9). Closed loops cause excessive current and create inductive noise, especially in the circuitry enclosed by a loop.

[Figure 14-9. Avoid Closed-Loop Signal Paths]

ESI is caused by the capacitive coupling of two adjacent conductors. The conductors act as the plates of a capacitor; a charge built up on one induces the opposite charge on the other. The following steps reduce ESI:
• Separate signal lines so that capacitive coupling becomes negligible.
• Run a ground line between two lines to cancel the electrostatic fields.

APPENDIX A
INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION

This appendix describes the i960 Cx processors' core architecture and core features which enhance the processors' performance and parallelism. This appendix also describes assembly language techniques for achieving the highest instruction stream performance.

The i960 core architecture defines the programming environment, basic interrupt mechanism and fault mechanism for all members of the i960 microprocessor family. The C-series core is a high-performance, highly parallel implementation. The i960 Cx processors integrate a bus controller, DMA controller and interrupt controller around the core architecture (Figure A-1).

[Figure A-1; recoverable labels: DMA Controller, C-Series Core]
37. 1 and lost bit 1 temp lt temp 1 dst lt temp Shift Right Integer shri len sre dst reg lit sfr reg lit sfr reg sfr if len gt 32 i lt 32 else i lt len temp amp src while temp 31 temp 30 and i 0 I U M 59 REG R 05 1 1 temp lt temp gt gt 1 temp bit31 lt temp bit30 ici 1 dst lt temp Shift Right Ordinal hri len Snro reg lit sfr dst reg sfr I U M 598 REG R 0 5 1 1 if len lt 32 dst lt src gt gt len else dst lt 0 Span Over Bit spanbit sre dst reg lit str reg sfr dst tfft AC cc lt 000 0 5 0 1 U 64 0 REG u 2 2 else for i 31 sre and 271 0 i i 1 dst lt AC cc lt 010 1994 Page 15 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic Description cc cc cc Opcode Mach Instruction Result nif om of 2 1 0 p tip em te Events Modes T Y Format Type issus Latency i Store 7 OP 05 14 sre ds 5 7 reg lit meni I
38. 1 m I U IB CTRL u 1 2 taken Constraint Range fault Fault If Less ue 99 if fault if AC cc and 100 0 U 1C CTRL 1 2 taken Constraint Range fault Fault If Less Or Equal 99 if fault faultle if AC cc and 110 0 I U IE CTRL u 1 2 taken Constraint Range fault Fault If Not Equal 99 if fault faultne if AC cc and 101 0 1 U E ID CTRL u 1 2 iken Constraint Range fault Fault If Not Ordered TE 1 99 if fault faultno cro E T 18 CTRL p 1 2 taken Constraint Range fault Fault If Ordered ES 99 if fault faulto ID 20 IF CTRL 1 2 taken Constraint Range fault Flush Local Registers 24 24 flushreg Write all cached local register sets to memory 1 U 66 D REG frames frames Invalidate all local register cache locations Force Mark 10 75 IBR 66 C REG p PC tfp lt 1 TC bte 1 Trace Breakpoint fault Load OP 1 f E o efa Ed E 1 mT Uu 90 1 efa i dst memory word src i Load Address Ida sre dst OP 05 14 mem reg 1 fel 8C MEM Moru iu 1 efa dst lt efa src Load Integer Byte OP i sre ist 8 1 efa Idib mein 1 CO 1 dst lt memory byte src sign extended OC Load Integer Short OP we i sre dst S I ue C8
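The Fault If entries above test the condition code against a three-bit mask. A minimal C sketch of that test follows. The masks for faultl (100), faultle (110) and faultne (101) are taken from the table; the remaining masks are an assumption that follows the same less/equal/greater bit convention, and the names are hypothetical.

```c
#include <stdbool.h>

/* Three-bit masks: bit 2 = less, bit 1 = equal, bit 0 = greater (assumed convention). */
enum fault_mask {
    MASK_FAULTNO = 0,  /* not ordered: special case, see below */
    MASK_FAULTG  = 1,  /* assumed */
    MASK_FAULTE  = 2,  /* assumed */
    MASK_FAULTGE = 3,  /* assumed */
    MASK_FAULTL  = 4,  /* from the table: AC.cc and 100 */
    MASK_FAULTNE = 5,  /* from the table: AC.cc and 101 */
    MASK_FAULTLE = 6,  /* from the table: AC.cc and 110 */
    MASK_FAULTO  = 7   /* assumed */
};

/* Returns true when a constraint range fault would be raised. */
static bool fault_if_raises(unsigned mask, unsigned cc)
{
    cc &= 7u;                 /* the condition code is three bits wide */
    if (mask == MASK_FAULTNO) {
        return cc == 0u;      /* faultno: fault only when the code is 000 */
    }
    return (mask & cc) != 0u; /* all others: fault when mask AND cc is non-zero */
}
```

For example, a condition code of 010 (equal) raises the fault for faultle but not for faultl or faultne.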
39. 12.1 OVERVIEW

The interrupt controller's primary functions are to provide a flexible, low-latency means for requesting and posting interrupts and to minimize the core's interrupt handling burden. The interrupt controller handles the posting of interrupts requested by hardware and software sources. The interrupt controller, acting independently from the core, compares the priorities of posted interrupts with the current process priority, off-loading this task from the core.

The interrupt controller provides the following features for managing hardware-requested interrupts:
• Low-latency, high-throughput handling
• Support of up to 248 external sources
• Eight external interrupt pins, one non-maskable interrupt pin, four internal DMA sources for detection of hardware-requested interrupts
• Edge or level detection on external interrupt pins
• Debounce option on external interrupt pins

The user program interfaces to the interrupt controller with four control registers and two special function registers. The interrupt control register (ICON) and interrupt map control registers (IMAP0-IMAP2) provide configuration information. The interrupt pending (IPND) special function register posts hardware-requested interrupts. The interrupt mask (IMSK) special function register selectively masks hardware-requested interrupts.

12.2 MANAGING INTERRUPT REQUESTS

The i960 processor architecture provides a c
40. 14.1 OVERVIEW

During the time that the RESET pin is asserted, the processor is in a quiescent reset state: external pins are inactive and the internal processor state is forced to a known condition. The processor begins initialization when the RESET pin is deasserted.

When initialization begins, the processor uses an Initial Memory Image (IMI) to establish its state. The IMI includes:
• Initialization Boot Record (IBR): contains the addresses of the first instruction of the user's code and the PRCB.
• Process Control Block (PRCB): contains pointers to system data structures; also contains information used to configure the processor at initialization.
• System data structures: several data structure pointers are cached internally at initialization.

The i960 Cx processors may be reinitialized by software. When a reinitialization takes place, a new PRCB and reinitialization instruction pointer are specified. Reinitialization is useful for relocating data structures from ROM to RAM after initialization.

The processor supports several facilities to assist in system testing and startup diagnostics. The ONCE mode electrically removes the processor from a system. This feature is useful for system-level testing where a remote tester exercises the processor system. During initialization, the processor performs an internal functional self test and external bus self test. These features are useful for system diagnostics to ensure basic CPU
41. Bus Request Pending

A bus access starts with an address cycle. The address cycle is defined by the assertion of address strobe (ADS). Address and byte enables (A31:2 and BE3:0) are also presented in the address cycle.

After the address cycle, extra clock cycles called wait states may be inserted to accommodate the access time for external memory or peripherals. For write accesses, the data lines are driven during wait states. For read accesses, data lines float. Wait states are discussed in section 11.2.1, Wait States (pg. 11-4).

A data cycle follows wait states. For write accesses, the data cycle is the last clock cycle in which valid data is driven onto the data bus. For read accesses, external memory must present valid data on the rising edge of PCLK2:1 during the data cycle. Setup and hold time for input data is specified in the 80960 CA and CF data sheets.

A bus access may be either non-burst or burst. A non-burst access ends after one data cycle to a single memory location. A burst access involves two to four data cycles to consecutive memory locations. BLAST, the burst last signal, is asserted to indicate the last data cycle of an access. Section 10.2.2, Burst and Pipelined Read Accesses (pg. 10-3) explains how to configure the bus controller for burst or non-burst accesses.

Read accesses may be pipelined. Write accesses are not pipelined.
42. [Figure B-24. Memory System Block Diagram]

B.3.14 DRAM Controller State Machine

The state machine in Figure B-25 is more complicated than the state machine in the previous example. This is because the controller works without the help of the internal wait state generator. There are two advantages of this design over the previous example: a DMA channel is not used, and the refresh cycle does not require the processor bus. Not using a DMA channel for DRAM refresh makes the DMA channel available for other applications within the system. CAS-before-RAS refresh mode does not require the bus or any processor intervention; therefore, DRAM refresh occurs autonomously.

The DRAM controller state machine described here assumes 80 ns static column mode DRAM with a 33 MHz clock (PCLK). This DRAM controller does not require the internal wait state generator; as a result, all wait state parameters can be programmed to zero (0).

[State machine diagram; recoverable signal labels: RAS, CAS, RDY, READY, BLAST, COL ADR, WE, WRITE, REF REQ, ADS, CS]
43. [Figure 8-1. Trace Controls (TC) Register]

The TC register contains mode bits and event flags. Mode bits define a set of tracing conditions that the processor can detect. For example, when the call trace mode bit is set, the processor generates a trace event when a call or branch-and-link operation executes. See section 8.2 (pg. 8-4). The processor uses event flags to monitor which trace events are generated.

A special instruction, the modify trace controls (modtc) instruction, allows software to modify the TC register. On initialization, all TC register bits and flags are cleared. modtc can then be used to set or clear trace mode bits as required. Software can also access event flags using modtc; however, this is generally not necessary. The processor automatically sets and clears these flags as part of its trace handling mechanism.

TC register bits 0, 8 through 16, and 28 through 31 are reserved. Software must initialize these bits to zero and not modify them afterwards.

8.1.2 Trace Enable Bit and Trace Fault Pending Flag

The PC register trace enable bit and the trace fault pending flag, located in the process controls register, control tracing. The trace enable bit enables the processor's tracing facilities; when set, the processor generates trace faults on all trace events.

Typically, software selects the trace modes to be used through the TC register. It then sets the trace
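The reserved-bit rule for the TC register (bits 0, 8 through 16, and 28 through 31) can be expressed as a mask. The following C sketch derives that mask and clears the reserved bits from a candidate value before it would be written; the helper names are hypothetical and the sketch does not perform any register access.

```c
#include <stdint.h>

/* Builds a mask with bits lo..hi (inclusive) set; expects 0 <= lo <= hi <= 31. */
static uint32_t bit_range(unsigned lo, unsigned hi)
{
    uint64_t width_mask = (1ull << (hi - lo + 1u)) - 1ull;
    return (uint32_t)(width_mask << lo);
}

/* Reserved TC bits as listed in the text: 0, 8 through 16, 28 through 31. */
static uint32_t tc_reserved_mask(void)
{
    return bit_range(0u, 0u) | bit_range(8u, 16u) | bit_range(28u, 31u);
}

/* Forces every reserved bit of a candidate TC value to zero. */
static uint32_t tc_clear_reserved(uint32_t value)
{
    return value & ~tc_reserved_mask();
}
```

The resulting mask is 0xF001FF01, so only bits 1 through 7 and 17 through 27 survive the clearing step.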
44. NWAD: Number of wait cycles for Write Address-to-Data. The number of wait states that data is held after the address cycle and before the first write data cycle. NWAD can be programmed for 0-31 wait states.

NWDD: Number of wait cycles for Write Data-to-Data. The number of wait states that data is held between consecutive data cycles of a burst write. NWDD can be programmed for 0-3 wait states.

NXDA: Number of wait cycles for X (read or write) Data-to-Address. The minimum number of wait states between the last data cycle of a bus request and the address cycle of the next bus request. NXDA applies to read and write requests. NXDA can be programmed for 0-3 clocks.

NRAD and NWAD describe address-to-data wait states. NRDD and NWDD specify the number of wait states between consecutive data when burst mode is enabled. NRDD and NWDD are not used in non-burst memory regions.

NXDA describes the number of wait states between consecutive bus requests. NXDA is the bus turnaround time. An external device's ability to relinquish the bus on a read request (read deasserted to data float) determines the number of NXDA cycles.

NOTE: NXDA states are only inserted after the last data transfer of a bus request. Therefore, for requests composed of multiple accesses, NXDA states do not appear between each access. For example, on an 8-bit burst bus, NXDA states are inserted only after the fourth byte of a word request rather than after every byte. See Figure 11-2. E
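To make the role of each parameter concrete, the following C sketch counts clocks for one bus access. The counting model is an assumption, not a statement from the manual: one clock for the address cycle, the address-to-data wait states, one clock per data cycle, the data-to-data wait states before each additional burst data cycle, and the NXDA states after the last data cycle. The struct and function names are hypothetical.

```c
/* Hypothetical container for the five wait state parameters. */
typedef struct {
    unsigned nrad;  /* read address-to-data */
    unsigned nrdd;  /* read data-to-data (burst only) */
    unsigned nwad;  /* write address-to-data */
    unsigned nwdd;  /* write data-to-data (burst only) */
    unsigned nxda;  /* read or write data-to-address turnaround */
} wait_params;

/*
 * Assumed model: clocks = 1 (address) + N_AD + 1 (first data)
 *                       + (data_cycles - 1) * (N_DD + 1) + N_XDA.
 * data_cycles is 1 for a non-burst access and 2 to 4 for a burst.
 */
static unsigned access_clocks(const wait_params *p, int is_write, unsigned data_cycles)
{
    unsigned n_ad = is_write ? p->nwad : p->nrad;
    unsigned n_dd = is_write ? p->nwdd : p->nrdd;
    unsigned clocks = 1u + n_ad + 1u;

    if (data_cycles > 1u) {
        clocks += (data_cycles - 1u) * (n_dd + 1u);
    }
    return clocks + p->nxda;  /* NXDA is added once, after the last data cycle */
}
```

Under this model a non-burst access never uses the data-to-data parameters, and the turnaround states are counted once per access rather than once per data cycle, matching the NOTE above.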
45. Sheet 2 of 2 Groups 2 5 Bus Configuration Registers DEFAULT Region 0 DEFAULT Region 1 DEFAULT Region 2 DEFAULT Region 3 DEFAULT Region 4 DEFAULT Region 5 DEFAULT Region 6 DEFAULT Region 7 DEFAULT Region 8 DEFAULT Region 9 DEFAULT Region 10 DEFAULT Region 11 DEFAULT Region 12 I O Region 13 DRAM Region 14 FLASH Region 15 Group 6 Breakpoint Trace and Bus Control Registers 0 Reserved 0 BPCON Register reserved by monitor 0 Trace Controls 1 BCON Register Region config valid Example 14 5 Initialization Boot Record File ibr c Sheet 1 of 2 include init h NOTE The ibr must be located at OxFFFFFF00 Use the linker to locate this structure The boot configuration is almost always region F since the IBR must be located there define CONFIGFLASH extern void start ip extern unsigned rom prcb extern unsigned checksum define CS 6 int amp checksum value calculated in linker 14 21 INITIALIZATION AND SYSTEM REQUIREMENTS intel Example 14 5 Initialization Boot Record File rom ibr c Sheet 2 of 2 const IBR init boot record BYTE N 0 BOOT CONFIG 0 0 0 BYTE N
46. 's complement number in the range -2^21 to 2^21 - 1. The processor ignores the ret instruction's displacement field. The T bit performs the same prediction function for CTRL instructions as it does for COBR instructions.

D.5 MEM FORMAT

The MEM format is used for instructions that require a memory address to be computed. These instructions include the load, store and lda instructions. Also, the extended versions of the branch, branch-and-link and call instructions (bx, balx and callx) use this format.

There are two MEM format encodings: MEMA and MEMB. MEMB can optionally add a 32-bit displacement (contained in a second word) to the instruction. Bit 12 of the instruction's first word determines whether MEMA (clear) or MEMB (set) is used.

The opcode field is eight bits long for either encoding. The src/dst field specifies a global or local register. For load instructions, src/dst specifies the destination register for a word loaded into the processor from memory or, for operands larger than one word, the first of successive destination registers. For store instructions, this field specifies the register or group of registers that contain the source operand to be stored in memory.

The mode field determines the address mode used for the instruction. Table D-2 summarizes the addressing modes for the two MEM format encodings. Fields used in these addressing modes are described i
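The MEMA/MEMB selection on bit 12 can be sketched as a small decoder in C. Only the bit 12 rule and the eight-bit opcode width come from the text above; placing the opcode in bits 31-24 and the src/dst field in bits 23-19 is an assumption made for this sketch, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of the first word of a MEM-format instruction. */
typedef struct {
    unsigned opcode;   /* eight-bit opcode (assumed to sit in bits 31-24) */
    unsigned src_dst;  /* register number (assumed to sit in bits 23-19) */
    int      is_memb;  /* 0 = MEMA (bit 12 clear), 1 = MEMB (bit 12 set) */
} mem_fields;

static mem_fields decode_mem(uint32_t word)
{
    mem_fields f;

    f.opcode  = (unsigned)(word >> 24) & 0xFFu;
    f.src_dst = (unsigned)(word >> 19) & 0x1Fu;
    f.is_memb = (int)((word >> 12) & 1u);  /* bit 12 selects the encoding */
    return f;
}
```

A decoder built this way would check `is_memb` before interpreting the remaining fields, because only the MEMB encoding may be followed by a second word holding a 32-bit displacement.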
47. seen 9 7 Cache Configuration emen 9 79 MCONO 15 Programmable eene 10 8 BCON Register Bit 10 9 Bus Controller Pins aient tire eek dine 11 3 Byte Enable 11 11 Burst Transfers and Bus Widths sss 11 15 Byte Ordering on Bus Transfers 11 26 Location of Cached Vectors in Internal 12 20 Cache Configuration 12 22 Transter Type Options coeunt eee en io 13 4 DMA Configuration and Byte Count Alignment 13 10 intel 3 CONTENTS Table 13 3 Table 13 4 Table 13 5 Table 13 6 Table 13 7 Table 13 8 Table 14 1 Table 14 2 Table 14 3 Table A 1 Table A 2 Table 1 3 Table A 4 Table A 5 Table A 6 Table A 7 Table A 8 Table A 9 Table A 10 Table A 11 Table A 12 Table A 13 Table A 15 Table A 14 Table A 16 Table A 17 Table A 18 Table A 19 Table A 20 Table A 21 Table D 1 Table D 2 Table D 3 Table E 1 Table E 2 Table E 3 Table E 4 Table E 5 DMA Transfer Alignment Requirements 13 11 Rotating Channel Priority 13 20 DMA Transfer Clocks Nyfer 13 39 Base Values of Worst case DMA Throughput used for DMA Latency Calculation 13 42 DMA Latency Components 13 43 Values of DMA Latency Components 13 43 Pin Reset St
48. une eet D 3 D 4 CT RE FORMAT tito adu ri o cedi wees abated cote acd D 3 D 5 MEM FORMAT ceo ree EUR RERO A EORR EE ES D 3 D 5 1 MEMA Format 2 4 4 440 D 4 D 5 2 MEMB Format Addressing acir necne A eterne D 5 APPENDIX E MACHINE LANGUAGE INSTRUCTION REFERENCE E 1 INSTRUCTION REFERENCE BY E 1 APPENDIX F REGISTER AND DATA STRUCTURES F 1 Data Structures rH erri nt n Ted P ee F 2 F 2 Registers a a F 10 GLOSSARY INDEX xvi intel FIGURES Figure 1 1 Figure 2 1 Figure 2 2 Figure 2 3 Figure 2 4 Figure 2 5 Figure 2 6 Figure 3 1 Figure 3 2 Figure 4 1 Figure 4 2 Figure 5 1 Figure 5 2 Figure 5 3 Figure 5 4 Figure 5 5 Figure 6 1 Figure 6 2 Figure 6 3 Figure 6 4 Figure 7 1 Figure 7 2 Figure 7 3 Figure 7 4 Figure 7 5 Figure 8 1 Figure 8 2 Figure 8 3 Figure 8 4 Figure 10 1 Figure 10 2 Figure 10 3 Figure 10 4 Figure 10 5 Figure 10 6 Figure 11 1 Figure 11 2 Figure 11 3 CONTENTS i9609 CA CF Superscalar Microprocessor Architecture 1 2 19609 Cx Microprocessor Programming 2 2 Controllable asset oc ira e ces ete C EE en tih enn 2 7 Address Space iunii deii nope e cone acta eed 2 9 Arithmetic Controls AC 2 15 Process Controls PC Regis
49. 10 3 memory request 11 1 Micro flows atomic instructions A 42 bit and bit field instructions 39 branch instructions A 40 call and return instructions A 41 comparison instructions 40 data movement instructions 38 debug instructions 42 definition A 15 execution A 38 fault instructions A 42 invocation 37 processor management instructions A 42 modac 9 48 modi 9 49 modify 9 50 modify trace controls modtc instruction 8 2 modpc 9 51 modtc 9 52 mov 9 53 mov movl movt movq 9 53 move instructions 9 53 mul 9 54 muli mulo 9 54 multiple fault conditions 7 9 Multiply Divide Unit MDU A 7 A 22 intel N nand 9 55 NIF bit 7 14 7 19 NMI 12 7 no imprecise faults NIF bit 7 14 non maskable interrupt NMI 6 3 12 7 12 9 nor 9 56 not notand 9 57 notbit 9 58 notor 9 59 NgAp 10 4 11 5 Ngpp 10 4 11 5 Nwap 10 4 11 5 Nwpp 10 4 11 5 NxpA 10 4 11 5 ONCE 14 1 14 5 on circuit emulation ONCE 14 1 14 5 one X mode 14 26 or ornot 9 60 ordinals 3 3 sign and sign extension 3 3 sizes 3 3 output pins 14 28 P parallel instruction execution overview 1 1 Parallel Issue A 14 parallel processing A 14 parallel execution A 15 parameter passing 5 10 argument list 5 10 by reference 5 10 by value 5 10 PC register see Process Controls PC register 2 17 pending interrupts 6 5 encoding 6 5 interrupt procedure pointer 6 5 pending priorities field 6 5 INDEX PFP Previous Frame
50. 1254(g6)    # word beginning at offset 1254 + (g6) ← g2
Opcode:
    st      92H    MEM
    stob    82H    MEM
    stos
    stib    C2H    MEM
    stis    CAH    MEM
    stl     9AH    MEM
    stt     A2H    MEM
    stq     B2H    MEM
See Also: LOAD, MOVE

9.3.55 subc

Mnemonic: subc    Subtract Ordinal With Carry
Format: subc src1,        src2,        dst
             reg/lit/sfr  reg/lit/sfr  reg/sfr
Description: Subtracts src1 from src2, then subtracts not(AC.cc1) and stores the result in dst. If the ordinal subtraction results in a carry, AC.cc1 is set to 1; otherwise it is set to 0.

This instruction can also be used for integer subtraction. Here, if integer subtraction results in an overflow, condition code bit 0 is set.

subc does not distinguish between ordinals and integers: it sets condition code bits 0 and 1 regardless of data type.

Action:
    dst ← src2 - src1 - not(AC.cc1);
    AC.cc ← 0CV₂;
    # V is 1 if integer subtraction would have generated an overflow, 0 otherwise
    # C is the carry out of the ordinal addition of src2 to not(src1) and AC.cc1
Faults: Type Mismatch: Non-supervisor reference of a sfr.
Example: subc g5, g6, g7    # g7 ← g6 - g5 - not(Carry Bit)
Opcode: subc    5B2H    REG
See Also: addc, addi, addo, subi, subo

9.3.56 subi, subo

Mnemonic: subi    Subtract Integer
          subo    Subtract Ordinal
Format: sub* src1,        src2,        dst
             reg/lit/sfr  reg/lit/sfr  reg/sfr
Description: Subtracts src1 from src2
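The subc Action can be modeled in C by performing the ordinal addition of src2, not(src1) and the incoming carry in 64 bits, so that the carry out is visible. This is a sketch under stated assumptions: the names are hypothetical, and the overflow test uses the usual signed-subtraction rule (operands of different sign and a result whose sign differs from src2), which the text does not spell out.

```c
#include <stdint.h>

/* Hypothetical result bundle for the subc model. */
typedef struct {
    uint32_t dst;       /* result value */
    unsigned carry;     /* C: becomes condition code bit 1 */
    unsigned overflow;  /* V: becomes condition code bit 0 */
} subc_result;

/* carry_in is the current value of AC.cc1 (0 or 1). */
static subc_result model_subc(uint32_t src1, uint32_t src2, unsigned carry_in)
{
    subc_result r;
    /* src2 + not(src1) + carry equals src2 - src1 - not(carry) modulo 2^32 */
    uint64_t sum = (uint64_t)src2 + (uint64_t)(uint32_t)~src1 + (uint64_t)(carry_in & 1u);

    r.dst = (uint32_t)sum;
    r.carry = (unsigned)(sum >> 32) & 1u;
    /* assumed signed-overflow rule for a subtraction */
    r.overflow = (unsigned)(((src2 ^ src1) & (src2 ^ r.dst)) >> 31);
    return r;
}
```

With the carry bit set, the model computes a plain src2 - src1; with it clear, one more is subtracted, which is what allows multi-word subtraction to chain from the low word upward.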
51. 16
    else if PFP.rt = 010₂ or PFP.rt = 011₂    # return to non-supervisor procedure
        PC.te ← PFP.rt0;
        PC.em ← user;
    else if PFP.rt = 000₂    # return from local
    else Operation Unimplemented fault;
    r0:15 ← memory(FP);    # these accesses are cached in the local register cache
Faults:
    Trace: Instruction, Return, Pre-Return. Instruction, Return and Pre-Return Trace Events are signaled after instruction completion. A trace fault is generated if PC.te = 1 and (TC.i or TC.r or TC.p) = 1.
    Operation Unimplemented: Execution from on-chip data RAM.
    Unimplemented: Reserved return type encountered.
Example: ret    # program control returns to context of calling procedure
Opcode: ret    0AH    CTRL
See Also: call, calls, callx

9.3.47 rotate

Mnemonic: rotate    Rotate
Format: rotate len,         src,         dst
               reg/lit/sfr  reg/lit/sfr  reg/sfr
Description: Copies src to dst and rotates the bits in the resulting dst operand to the left (toward higher significance). Bits shifted off the left end of the word are inserted at the right end of the word. The len operand specifies the number of bits that the dst operand is rotated.

This instruction can also be used to rotate bits to the right. Here, the number of bits the word is to be rotated right is subtracted from 32 to get
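The rotate description, including the trick of rotating right by subtracting from 32, can be sketched in C as follows. The function names are hypothetical, and reducing len modulo 32 is an assumption of this sketch rather than something stated in the passage.

```c
#include <stdint.h>

/* Rotates src left by len bit positions; bits leaving the top re-enter at the bottom. */
static uint32_t rotate_left(uint32_t len, uint32_t src)
{
    len %= 32u;          /* assumed: only the rotation count modulo 32 matters */
    if (len == 0u) {
        return src;      /* also avoids an undefined 32-bit shift in C */
    }
    return (src << len) | (src >> (32u - len));
}

/* Rotates right by n using a left rotation of (32 - n), as the description suggests. */
static uint32_t rotate_right(uint32_t n, uint32_t src)
{
    return rotate_left((32u - (n % 32u)) % 32u, src);
}
```

For instance, rotating 0x0000000F right by 4 is the same as rotating it left by 28, which moves the four set bits to the top of the word.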
52. 3 1 3 2 State Machine PED 5 n a bete B 3 B 1 3 3 Write Enable Generation Logic 2 eem 3 1 3 4 Chip Select Generation B 3 B 1 4 Wavetorms 2 lai akin dh oe genre ie B 4 B 1 4 1 Wait State Selection 4 8 440 B 5 B 1 4 2 Output Enable and Write Enable B 6 B 1 4 3 State Machine Descriptions B 6 B 1 5 Trade offs and Alternatives B 10 B 2 PIPELINED SRAM READ B 10 B 2 1 Block Diagram ete Ree an iEn HERE B 11 B 2 1 1 Address Lath ette ettet ated qe dettes B 12 B 2 1 2 State Machine PED 5 iid efie diede eade B 12 B 2 1 3 Write Enable redet ep es ee imd ete B 12 xiv intel 3 CONTENTS B 2 2 WAaVefortms serit nete aede Pagi eee e P peret B 13 B 2 2 1 ee eee nete irem oie eim Mere EH dee B 13 B 2 3 Trade offs and Alternatives 2 B 15 B 3 INTERFACING TO DYNAMIC B 15 B 3 1 DRAM Access Modes necem HR e et east eia dace B 15 B 3 1 1 Nibble Mode DRAM sseeeeeeneneeeeneenmee merenti enne nnne neret B 16 B 3 1 2 Fast Page Mode DRAM sess B 17 B 3 1 3 Static Column Mode B 18 B 3 2 DRAM Refresh Modes treaa ArT AEA 18 B 3 3 Address Multiplexer Input Con
53. 32 bits                Multi-Cycle
Word (32 bits)         Word (32 bits)         Fly-by
Quad Word (128 bits)   Quad Word (128 bits)   Multi-Cycle
Quad Word (128 bits)   Quad Word (128 bits)   Fly-by

For a multi-cycle transfer, source data is first loaded into on-chip DMA registers before it is stored to the destination. The processor effectively buffers the data for each transfer. When a DMA transfer is configured for destination synchronization, the DMA controller buffers source data, waiting for the request-active (DREQ3:0) signal from the destination requestor. This operation reduces latency. The initial DMA request, however, still requires the source data to be loaded before the request is acknowledged. Source data buffering is shown in Figure 13-1.

The DMA controller does not perform multi-cycle transfers atomically. A DMA transfer does not cause the processor's LOCK output to be asserted. A bus hold request may also be acknowledged between the bus requests which make up a multi-cycle transfer.

[Figure 13-1. Source Data Buffering for Destination Synchronized DMAs]

13.4.2 Fly-By Single-Cycle Transfers

Fly-by transfers are executed with only
54. 4 8 2 Instruction Address Breakpoint Registers 1 8 6 Data Address Breakpoint Registers DABO 1 8 6 Hardware Breakpoint Control Register 8 7 MCON 0 15 Registers Configure External 10 6 Memory Region Configuration Register MCON 0 15 10 7 Bus Configuration Register BCON ssee menn 10 8 Summary of Aligned Unaligned Transfers for Little Endian Regions 10 11 Summary of Aligned Unaligned Transfers for Little Endian Regions cont 10 12 Bus Controller Block 10 14 Internal Programmable Wait 11 6 Quad word Read from 32 bit Non burst Memory 11 8 Bus Request with READY and BTERM 11 9 xvii 5 Figure 11 4 Figure 11 5 Figure 11 6 Figure 11 7 Figure 11 8 Figure 11 9 Figure 11 10 Figure 11 11 Figure 11 12 Figure 11 13 Figure 11 14 Figure 11 15 Figure 11 16 Figure 11 17 Figure 11 18 Figure 11 19 Figure 12 1 Figure 12 2 Figure 12 3 Figure 12 4 Figure 12 5 Figure 12 6 Figure 12 7 Figure 12 8 Figure 12 9 Figure 13 1 Figure 13 2 Figure 13 3 Figure 13 4 Figure 13 5 Figure 13
55. 4.2.6.1 Compare and Conditional Compare

These instructions compare two operands, then set the condition code bits in the AC register according to the results of the comparison:

    cmpi       compare integer
    cmpo       compare ordinal
    concmpi    conditional compare integer
    concmpo    conditional compare ordinal
    chkbit     check bit

These all use the REG format and can specify literals or local, global or special function registers. The condition code bits are set to indicate whether one operand is less than, equal to or greater than the other operand. See section 2.6.2, Arithmetic Controls (AC) Register (pg. 2-15) for a description of the condition codes for conditional operations.

cmpi and cmpo simply compare the two operands and set the condition code bits accordingly. concmpi and concmpo first check the status of condition code bit 2:
• If not set, the operands are compared as with cmpi and cmpo.
• If set, no comparison is performed and the condition code flags are not changed.

The conditional compare instructions are provided specifically to optimize two-sided range comparisons to check if A is between B and C (i.e., B ≤ A ≤ C). Here, a compare instruction (cmpi or cmpo) checks one side of the range (e.g., A ≥ B) and a conditional compare instruction (concmpi or concmpo) checks the other side (e.g., A ≤ C) according to the result of the first comparison. The condition codes following the conditional comparison directly reflect the results
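The two-sided range check can be simulated in C. This is a sketch under assumptions: the condition code values (100 for less, 010 for equal, 001 for greater), the conditional compare results (010 when src1 is less than or equal to src2, otherwise 001) and the operand ordering in `range_check` are not spelled out in this passage, and the function names are hypothetical stand-ins for the instructions.

```c
#include <stdint.h>

enum { CC_GREATER = 1, CC_EQUAL = 2, CC_LESS = 4 };

/* Model of cmpi: sets exactly one condition code bit (assumed encoding). */
static unsigned model_cmpi(int32_t src1, int32_t src2)
{
    if (src1 < src2)  return CC_LESS;
    if (src1 == src2) return CC_EQUAL;
    return CC_GREATER;
}

/* Model of concmpi: does nothing when bit 2 is already set. */
static unsigned model_concmpi(unsigned cc, int32_t src1, int32_t src2)
{
    if (cc & CC_LESS) {
        return cc;  /* bit 2 set: no comparison, flags unchanged */
    }
    return (src1 <= src2) ? CC_EQUAL : CC_GREATER;
}

/*
 * Checks low <= a <= high with one compare and one conditional compare.
 * Result: 010 = in range, 100 = a above high, 001 = a below low.
 */
static unsigned range_check(int32_t a, int32_t low, int32_t high)
{
    unsigned cc = model_cmpi(high, a);   /* bit 2 set when high < a */
    return model_concmpi(cc, low, a);    /* only evaluated if a <= high */
}
```

Because the second comparison is skipped when the first one has already failed, a single conditional branch on the final condition code distinguishes all three outcomes.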
56. 8 2 7 1 Software Breakpoints 8 5 8 2 7 2 Hardware Breakpoints 8 5 8 3 SIGNALING A TRACE 8 7 8 4 HANDLING MULTIPLE TRACE 5 8 8 8 5 TRACE FAULT HANDLING 8 8 8 6 TRAGE HANDLING AGTION ir Een aa tala tina dip est ette eti 8 9 8 6 1 Normal Handling of Trace Events eene 8 9 8 6 2 Prereturn Trace Handling 8 9 8 6 3 Tracing and Interrupt Procedures 8 9 CHAPTER 9 INSTRUCTION SET REFERENCE 9 1 N RODU C TON e ere nte Un Cede eee decet 9 1 9 2 NOTATIONS et m t oap mide setas 9 1 9 2 1 Alphabetic Reference ope eon ae de due ile pe ie 9 2 9 2 2 Ml rop erm 9 2 9 2 3 Heri ES 9 3 9 2 4 DeSCripltlon iioc t c deett peius baec b tiec cH ab d fcd e 9 3 9 2 5 ei EE 9 4 9 2 6 FANS fe 9 6 9 2 7 Example rae e See ee ER YER 9 7 9 2 8 Opcode and Instruction Format eese 9 7 9 2 9 See satin in te etie aa ette n eii da evi er teret 9 7 9 3 INSTR UGHMONS ait ny ae teer a a ied e ci ee 9 7 9 8 1 addo eese Renee dee 9 8 9 3 2 addi addo rd dat eec m er ed c tene in a reda 9 9 9 3 3 ee RE dee PE ee 9 10 9 3 4 and andrnot EL te Pd adde 9 11 9 3
57. (g3), g2 # g2 is also destination; load must wait for addo to complete. However, the following instructions can be issued in parallel: addo g0, g1, g2 # g0 is a source for both instructions; st g0, (g3); or addo g0, g1, g2 # g0 is source for addo and ld (g3), g0 # destination for load. In all cases of parallel instruction issue, the IS ensures that the program operates as if the instructions were actually issued sequentially. Two conditions can delay the execution of one or more of the instructions that the scheduler attempted to issue: a scoreboarded register or a scoreboarded resource. A.2.3.1 Register Scoreboarding. If an instruction's source or destination register is the destination of a prior incomplete multi-clock instruction, such as a load, the instruction is delayed. The scheduler attempts to reissue the instruction every clock until the scoreboarded register is updated and the delayed instruction can be executed. Table 1-3 summarizes conditions which cause a delay due to a scoreboarded register. Table 1-3. Scoreboarded Register Conditions. Condition: src busy. Description: One or both of the registers specified as a source for the instruction was referenced as a destination of a prior instruction which has not completed. Condition: dst busy. Description: The destination referenced by the instruction was referenced as a destination of a prior instruction which has no
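The two scoreboarded-register conditions can be expressed as a simple check. The sketch below is illustrative only: the dictionary representation of an instruction and the function name are assumptions made for this example, not part of the manual.

```python
def blocked_by_scoreboard(instr, busy_regs):
    """Return the reason an instruction is delayed, or None if it can issue.

    instr is a dict with 'srcs' and 'dsts' lists of register names
    (a hypothetical representation). busy_regs is the set of registers that
    are destinations of prior multi-clock instructions not yet completed.
    """
    if any(reg in busy_regs for reg in instr["srcs"]):
        return "src busy"
    if any(reg in busy_regs for reg in instr["dsts"]):
        return "dst busy"
    return None


# A load into g2 is still outstanding, so g2 is scoreboarded.
pending = {"g2"}
addo = {"srcs": ["g0", "g1"], "dsts": ["g2"]}   # addo g0, g1, g2
store = {"srcs": ["g0", "g3"], "dsts": []}      # st g0, (g3)
reasons = (blocked_by_scoreboard(addo, pending),
           blocked_by_scoreboard(store, pending))
```

In a fuller model the scheduler would retry a blocked instruction every clock until the register leaves `busy_regs`, as the text describes.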
58. 9.3.25 extract. Mnemonic: extract (Extract). Format: extract bitpos, len, src/dst (reg/lit/sfr, reg/lit/sfr, reg). Description: Shifts a specified bit field in src/dst right and zero fills bits to left of shifted bit field. bitpos value specifies the least significant bit of the bit field to be shifted; len value specifies bit field length. Action: src/dst ← (src/dst >> (bitpos mod 32)) and (2^len - 1). Faults: Type.Mismatch: Non-supervisor reference of a sfr. Example: extract 5, 12, g4 # g4 ← g4 with bits 5 through 16 shifted right. Opcode: extract 651H REG. See Also: modify. 9.3.26 FAULT IF. Mnemonic: faulte{.t|.f} Fault If Equal; faultne{.t|.f} Fault If Not Equal; faultl{.t|.f} Fault If Less; faultle{.t|.f} Fault If Less Or Equal; faultg{.t|.f} Fault If Greater; faultge{.t|.f} Fault If Greater Or Equal; faulto{.t|.f} Fault If Ordered; faultno{.t|.f} Fault If Not Ordered. Format: fault*{.t|.f}. Description: Raises a constraint-range fault if the logical AND of the condition code and opcode's mask-part is not zero. For faultno (unordered), fault is raised if condition code is equal to 000₂. Optional .t or .f suffix may be appended to the mnemonic. Use .t to speed up execution when these instructions usually fault; use .f to speed up execution when these instructions usually do not fault. If a suffix is not provided, the assembler is free to
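The extract action and the fault-if condition are both small bit manipulations, sketched here under stated assumptions: registers are modeled as 32-bit unsigned values, and the function names and the `not_ordered` flag are hypothetical conveniences for this example.

```python
MASK32 = 0xFFFFFFFF


def extract(bitpos, length, value):
    """Model of the extract action: shift the field right, then mask to length bits."""
    return ((value & MASK32) >> (bitpos % 32)) & ((1 << length) - 1)


def fault_if(cc, mask, not_ordered=False):
    """Return True when a constraint-range fault would be raised.

    cc is the 3-bit condition code and mask is the opcode's mask part.
    not_ordered=True models the faultno case, which faults when cc is 000.
    """
    if not_ordered:
        return cc == 0
    return (cc & mask) != 0


# Bits 5 through 16 of 0x1FFE0 are all ones, so a 12-bit field is extracted.
field = extract(5, 12, 0x1FFE0)
```

The `bitpos % 32` step mirrors the "mod 32" in the action, so an out-of-range bit position wraps rather than shifting everything out.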
59. A31:2, BE3:0, SUP, DMA and All of these pipelined signals are invalid during the last cycle of a pipelined read. Address delay time for the pipelined read is the output valid time of the address latch or the PA3:2 generation PLD. Minimizing address delay maximizes access time. B.2.2 Waveforms. Figure B-8. Pipelined Read Waveform. B.2.2.1 State Machines. Chip enable (CE) is controlled by a simple state machine. The state machine is normally in the idle state and CE is not asserted. When ADS and PSRAM_CS are asserted, the CE state machine goes to the active state. CE remains active until BLAST is asserted. Figure B-9. Pipelined Read Chip Enable State Machine. The PA3:2 state machine latches the A3:2 address bits on read and generates the low address bit for writes. During read, PA3:2 is a latched version of A3:2. If a write access occurs, the state machine generates the proper PA3:2 addresses. Figure B-10. Pipelined Read PA3:2 State Machine Diagram. In the READ STATE, the state machine simply latches A3:2 and ou
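The chip-enable behavior described in the prose can be sketched as a two-state machine. This is only a reading of the text, not of the original figure, whose transition labels are not recoverable here; every input is modeled as True when the signal is asserted, regardless of its electrical polarity, and the function name is hypothetical.

```python
def ce_next_state(active, ads, psram_cs, blast):
    """Advance the chip-enable state machine by one clock.

    Returns True when the machine is in the active state (CE asserted).
    """
    if not active:
        # Idle: start an access when ADS and PSRAM_CS are both asserted.
        return ads and psram_cs
    # Active: CE stays asserted until BLAST marks the last data cycle.
    return not blast


# Walk through one access: start, two data cycles, then the final cycle.
state = False
trace = []
for ads, cs, blast in [(True, True, False), (False, True, False),
                       (False, True, False), (False, True, True)]:
    state = ce_next_state(state, ads, cs, blast)
    trace.append(state)
```

A design that must handle a new ADS arriving in the same clock as BLAST would need an extra transition, which this sketch leaves out.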
60. Figure 13-14. DMA Request and Acknowledge Timing (all modes; Case 1 and Case 2 show DREQx driven high to prevent the next bus cycle request). Notes: 1. Case 1: DREQ must deassert before DACK deasserts. This applies to all Fly-By modes, source-synchronized packing modes and destination-synchronized unpacking modes. 2. Case 2: DREQ must be deasserted by the second clock (rising edge) after DACK is driven high. This applies to all other DMA transfers. 3. DACKx is asserted for the duration of a DMA bus request. The request may consist of multiple bus accesses (defined by ADS and BLAST). 13.11.3 End Of Process/Terminal Count Timing. EOP/TC3:0 can be programmed as an input (EOP3:0) or output (TC3:0) for each channel. EOP/TC3:0 pins are configured when a channel is set up using sdma. TC3:0 is asserted when byte count reaches zero (0) for a chained or non-chained DMA. A TC3:0 pin for a channel is driven active during the last acknowledge bus request for a non-chained DMA, or during the last acknowledge bus request of each buffer for a chained DMA. TC3:0 pins have the same timing as DACK3:0. EOP3:0 pins are asserted to terminate a DMA. EOP3:0 pins are active-level detected. For proper internal detection, EOP3:0 pins must be asserted for a minimum of two and maximum of 17 PCLK2:1 cycles. See Figure 13-15
61. Figure F-5. Storage of Interrupt Record on the Interrupt Stack: the interrupt record lies between the current frame (FP) and the new frame (NFP) and holds a stack padding area, optional data (not implemented for i960 Cx processor), the saved Process Controls Register (NFP-16), the saved Arithmetic Controls Register (NFP-12) and a reserved area (Section 6.7, INTERRUPT STACK AND INTERRUPT RECORD, pg. 6-9). Figure F-6. Interrupt Table: Pending Priorities at 000H; Pending Interrupts from 004H through 020H; vector entries from Vector 8 at 024H, Vector 9 at 028H and Vector 10 at 02CH up to Vector 243 at 3D0H, Vector 244 at 3D4H and Vector 255 at 400H. Each vector entry holds an Instruction Pointer and a two-bit Entry Type: 00 Normal, 01 Reserved, 10 Target in Cache, 11 Reserved. Reserved fields: initialize to 0 (Section 6.4, INTERRUPT TABLE, pg. 6-3). Procedure stack figure: Current Register Set with Previous Frame Pointer (PFP) r0, Stack Pointer (SP) r1, Return Instruction Pointer (RIP) r2 through r15; Frame Pointer (FP) g15; user allocated stack; padding area; register area reserved for RIP r2; user allocated stack
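The interrupt-table offsets listed for Figure F-6 (Vector 8 at 024H, Vector 243 at 3D0H, Vector 255 at 400H) are consistent with one four-byte entry per vector starting at 024H. The helper below is a sketch built on that inference; its name is hypothetical and the formula is derived from the listed offsets rather than quoted from the manual.

```python
FIRST_VECTOR = 8          # lowest vector with a table entry, per the figure
FIRST_ENTRY_OFFSET = 0x24 # offset of Vector 8 in the interrupt table
ENTRY_SIZE = 4            # one 32-bit word per vector entry


def vector_entry_offset(vector):
    """Return the byte offset of a vector's entry within the interrupt table."""
    if not FIRST_VECTOR <= vector <= 255:
        raise ValueError("vector must be in the range 8..255")
    return FIRST_ENTRY_OFFSET + ENTRY_SIZE * (vector - FIRST_VECTOR)


offsets = [hex(vector_entry_offset(v)) for v in (8, 9, 10, 243, 244, 255)]
```

The computed values agree with every offset that survives in the figure text, which is a useful cross-check on the inferred formula.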
62. IPB1. Figure F-16, Interrupt Control (ICON) Register: Section 12.3.4, Interrupt Control Register (ICON) (pg. 12-11); F-14. Figure F-17, Interrupt Map (IMAP0-IMAP2) Registers: Section 12.3.5, Interrupt Mapping Registers 0-2; F-15. Figure F-18, Interrupt Mask (IMSK) and Interrupt Pending (IPND) Registers: Section 12.3.6, Interrupt Mask and Pending Registers (IMSK, IPND) (pg. 12-14); F-16. Figure F-19, Memory Region Configuration Register (MCON 0-15): Section 10.3.1, Memory Region Configuration Registers (MCON 0-15) (pg. 10-6); F-17. Figure F-20, Previous Frame Pointer Register (PFP) (r0): Section 5.8, RETURNS (pg. 5-16); F-18. Figure F-21, Process Controls (PC) Register: Section 2.6.3.1, Initializing and Modifying the PC Register (pg. 2-19); F-18. Figure F-22, Trace Controls (TC) Register: Section 8.1.1, Trace Controls (TC) Register (pg. 8-2); F-19. Figure F-23, Process Control Block Configuration Words: Section 14.3, REQUIRED DATA STRUCTURES (pg. 14-11); F-20. F.1 Data Structures. Figure F-1. Control Table (Section 2.3, CONTROL REGISTERS, pg. 2-6). Figure F-2. Fault Record: reserved and optional data fields, with words at NFP-20, NFP-16, NFP-12, NFP-8 and NFP-4 (Section 7.5.1, Fault Record Data, pg. 7-6). Fault Table 0 ... 48H
63. Instruction Set Quick Reference. Columns: Mnemonic and Description; Arithmetic Controls (nif, om, of, cc2, cc1, cc0); Process Controls (p, tfp, em, te); Trace Controls (Events, Modes); Faults; Opcode; Format; Mach Type; Instruction Execution (Issue, Result Latency). addc (Add Ordinal with Carry) src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); opcode 5B0, REG: dst ← src2 + src1 + AC.cc1; AC.cc0 ← integer overflow; AC.cc1 ← carry out. addi (Add Integer) src1, src2, dst; opcode 591, REG: dst ← src2 + src1. addo (Add Ordinal) src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); opcode 590, REG: dst ← src2 + src1. alterbit (Alter Bit) bitpos, src, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); opcode 58F, REG: if AC.cc1 = 1, dst ← src or 2^(bitpos mod 32); else dst ← src and not 2^(bitpos mod 32). and (And) src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); opcode 581, REG: dst ← src2 and src1. andnot (And Not) src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); opcode 582, REG: dst ← src2 and not src1. atadd (Atomic Add Ordinal) src/dst, src, dst (reg, reg/lit/sfr, reg/sfr); opcode 612, REG: dst ← memory(src/dst and not 0x3)
64. Register bypassing logic constantly monitors all register addresses being written and read. If a register is being read and written in the same clock, bypass logic routes incoming data from the write port directly to the read port. A.2.3.4 Additional Scoreboarded Resources Due to the Data Cache. In general, when the scheduler issues a group of instructions, the targeted parallel processing units immediately acknowledge receipt of instructions and the scheduler begins considering the next four unexecuted words of the instruction stream. There are, however, two conditions in which the execution of one or more of the instructions that the scheduler attempted to issue would be delayed. These conditions are a scoreboarded register or a scoreboarded resource. Because of the addition of the data cache to the MEM side, there are a few additional scoreboarded resources on the i960 CF processor. A scoreboarded resource thwarts the scheduler's attempt to issue an instruction. A resource is scoreboarded when it is needed to execute the instruction but is not available. The parallel processing units are the resources. To maintain cache coherency, the IS does not issue any user process stores to the BCU until all pending cacheable loads have returned to the data cache. The BCU is scoreboarded for user process stores when its queues contain one or more cacheable loads. DMA stores are allowed to be issued to the BCU. See section 1.8.8, BCU Queues and Cache Cohe
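The store-issue rule for the BCU can be written as a single predicate. This sketch assumes a hypothetical queue representation (a list of dicts with "kind" and "cacheable" keys) purely for illustration; the manual does not describe the queues at this level.

```python
def can_issue_store(bcu_queue, is_dma_store):
    """Decide whether a store may be issued to the BCU.

    bcu_queue: list of pending entries, each a dict with 'kind'
    ('load' or 'store') and 'cacheable' (bool). Hypothetical layout.
    is_dma_store: True for DMA stores, which are always allowed.
    """
    if is_dma_store:
        return True
    # User process stores wait until no cacheable load is pending.
    return not any(entry["kind"] == "load" and entry["cacheable"]
                   for entry in bcu_queue)


queue = [{"kind": "load", "cacheable": True},
         {"kind": "store", "cacheable": False}]
user_store_ok = can_issue_store(queue, is_dma_store=False)
dma_store_ok = can_issue_store(queue, is_dma_store=True)
```

Holding user stores back in this way keeps a store from passing a load whose data has not yet reached the data cache, which is the coherency concern the text names.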
65. cmpibg (Compare Integer and Branch If Greater) src1, src2, targ (reg/lit/sfr, reg); opcode 39, COBR: if src1 < src2, AC.cc ← 100; else if src1 = src2, AC.cc ← 010; else AC.cc ← 001, IP ← targ. cmpibl (Compare Integer and Branch If Less) src1, src2, targ (reg/lit/sfr, reg); opcode 3C, COBR: if src1 < src2, AC.cc ← 100, IP ← targ; else if src1 = src2, AC.cc ← 010; else AC.cc ← 001. cmpible (Compare Integer and Branch If Less Or Equal) src1, src2, targ (reg/lit/sfr, reg); opcode 3E, COBR: if src1 < src2, AC.cc ← 100, IP ← targ; else if src1 = src2, AC.cc ← 010, IP ← targ; else AC.cc ← 001. cmpibne (Compare Integer and Branch If Not Equal) src1, src2, targ (reg/lit/sfr, reg); opcode 3D, COBR: if src1 < src2, AC.cc ← 100, IP ← targ; else if src1 = src2, AC.cc ← 010; else AC.cc ← 001, IP ← targ. cmpibno (Compare Integer and Branch If Not Ordered) src1, src2, targ (reg/lit/sfr, reg); opcode 38, COBR: if src1 < src2, AC.cc ← 100; else if src1 = src2, AC.cc ← 010; else AC.cc ← 001. cmpibo (Compare Integer and Branch If Ordered) src1, src2, targ (reg/lit/sfr, reg): if src1 < src2, AC.cc ← 100; else if src1 = src2, AC.cc ← 010; else AC.cc ← 001, IP ← targ
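The compare-and-branch entries above all set the condition code the same way and differ only in which outcomes branch. The sketch below captures that with a per-mnemonic mask; the mask table is this example's own encoding of the listed actions (which outcomes show IP ← targ), and the function name is hypothetical.

```python
# Outcomes that take the branch, encoded with the same bits as the condition code.
BRANCH_MASK = {
    "cmpibg": 0b001,   # branch when src1 > src2
    "cmpibl": 0b100,   # branch when src1 < src2
    "cmpible": 0b110,  # branch when src1 <= src2
    "cmpibne": 0b101,  # branch when src1 != src2
    "cmpibno": 0b000,  # integer compares are always ordered: never taken
}


def cmpib(mnemonic, src1, src2):
    """Return (condition_code, branch_taken) for a compare-integer-and-branch."""
    if src1 < src2:
        cc = 0b100
    elif src1 == src2:
        cc = 0b010
    else:
        cc = 0b001
    return cc, bool(cc & BRANCH_MASK[mnemonic])
```

For example, `cmpib("cmpible", 3, 3)` sets the condition code to 010 and takes the branch, while `cmpib("cmpibne", 3, 3)` sets the same code and falls through.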
66. collision-free bus arbitration schemes such as VME and MULTIBUS I. A collision occurs when multiple processors begin a bus access simultaneously and a conflict for control of one of the processor's local memory occurs. Figure 11-19 illustrates a bus collision. In this system, several processors share a common bus. Each processor has local memory which is connected directly to that processor's address, data and control lines. Each processor can access another processor's local memory over the bus. Processor A has highest priority and Processor B has lowest priority for use of the bus. Processor A and B simultaneously request an access over the bus. Processor A attempts to access Processor B's local memory and Processor B attempts to access another memory on the bus. Use of the bus is granted to Processor A because it is the highest priority. For Processor A to complete its access, the local bus for Processor B must be relinquished (floated). This is accomplished by asserting the BOFF pin for Processor B. When BOFF is asserted, external memory is responsible for gracefully cancelling the current access. This means that the memory control state machine should cancel write cycles and return to an idle state after BOFF is asserted. The processor ignores read data after BOFF is asserted.
67. enable bit to begin tracing. This bit is also altered as part of some call and return operations that the processor performs, as described in section 8.6.3, Tracing and Interrupt Procedures (pg. 8-9). The trace-fault-pending flag allows the processor to track when a trace event is detected for an enabled trace condition. The processor uses this flag as follows: 1. When the processor detects a trace event and tracing is enabled, it sets the flag. 2. Before executing an instruction, the processor checks the flag. 3. If the flag is set and tracing is enabled, it signals a trace fault. By providing a means to record trace event occurrences, the trace-fault-pending flag allows the processor to service an interrupt or handle a fault other than a trace fault before handling the trace fault. Software should not modify this flag. 8.1.3 Trace Control on Supervisor Calls. The trace-control bit allows tracing to be enabled or disabled when a call-system instruction (calls) executes which results in a switch to supervisor mode. This action occurs independent of whether or not tracing is enabled prior to the call. A supervisor call is a calls instruction that references an entry in the system procedure table with an entry type 010₂. When a supervisor call executes, the processor: 1. Saves current PC register trace-enable bit status in the PFP register return-type field bit 0. 2. Sets the PC register trace-enable bit to the value of the trace contr
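The three-step use of the trace-fault-pending flag can be modeled with a small class. This is a sketch of the described sequence only: the class and method names are hypothetical, and what happens to the flag after the fault is signaled is not covered by this excerpt, so the model leaves it untouched.

```python
class TraceState:
    """Minimal model of the trace-enable bit and trace-fault-pending flag."""

    def __init__(self, trace_enable=False):
        self.trace_enable = trace_enable
        self.trace_fault_pending = False

    def on_trace_event(self):
        """Step 1: record a detected trace event if tracing is enabled."""
        if self.trace_enable:
            self.trace_fault_pending = True

    def before_instruction(self):
        """Steps 2 and 3: return True if a trace fault should be signaled."""
        return self.trace_fault_pending and self.trace_enable


enabled = TraceState(trace_enable=True)
enabled.on_trace_event()
disabled = TraceState(trace_enable=False)
disabled.on_trace_event()
```

Because the event is only recorded, an interrupt or another fault can be handled between `on_trace_event` and the next `before_instruction` check, which is the deferral the text describes.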
68. except interrupts directed to the locked cache come from external memory. See section 12.3.14, Caching Interrupt-Handling Procedures (pg. 12-21) for more details on the cache load-and-lock feature. sysctl is issued with a configure-instruction-cache message type to select the load-and-lock mechanism. When the lock option is selected, an address is specified which points to a memory block to be loaded into the locked cache. 2.5.6 Data Cache (80960CF Only). The i960 CF processor has a 1 Kbyte direct-mapped data cache which enhances performance by reducing the number of load and store accesses to external memory. The data cache can return up to a quad word (128 bits) to the register file in a single clock cycle on a cache hit. Section A.1.8, Data Cache (80960CF Only) (pg. A-8) fully describes the data cache. 2.6 PROCESSOR-STATE REGISTERS. The architecture defines four 32-bit registers that contain status and control information: Instruction Pointer (IP) register; Arithmetic Controls (AC) register; Process Controls (PC) register; Trace Controls (TC) register. 2.6.1 Instruction Pointer (IP) Register. The IP register contains the address of the instruction currently being executed. This address is 32 bits long; however, since instructions are required to be aligned on word boundaries in memory, the IP's two least-significant bits are always 0 (zero). i960 processor instructions
69. followed by a micro-flow. The sequence executes one clock faster when the branch target is long-word aligned. The reason for the extra clock is described in section A.2.6, Micro-flow Execution (pg. A-36). Since optimization can save one clock under such circumstances, it could be worthwhile in small, frequently executed loops. Example A-6. Align Branch Targets: two side-by-side listings, each preceded by an .align 2 directive, in which b target transfers control to a target sequence containing lda 0xffffffff, g2; scanbit g3, g4; and addo g5, g6; each listing is accompanied by a clock-by-clock execution table with REGop, MEMop and CTRLop columns, the first running to clock 7 and the second to clock 6. A.2.7.9 Replacing Straight-Line Code and Calls. bal takes three or four clocks to execute and does not cause a frame spill to memory. Replacing calls with branch-and-link instructions is an obvious optimization. However, a not-so-obvious but equally beneficial optimization is to use branches and bal to reduce a critical procedure's code size. When porting optimized algorithms originally written for other processors, the software engineer often expands the code in a straight-line fashion due to branch speed penalties of the original target and the lack of o
70. generation. This instruction is used, in part, to load and control the i960 Cx microprocessor's breakpoint registers. 4.2.11 Atomic Instructions. Atomic instructions perform read-modify-write operations on operands in memory. They allow a system to ensure that, when an atomic operation is performed on a specified memory location, the operation completes before another agent is allowed to perform an operation on the same memory. These instructions are required to enable synchronization between interrupt handlers and background tasks in any system. They are also particularly useful in systems where several agents (processors, coprocessors or external logic) have access to the same system memory for communication. The atomic instructions are atomic add (atadd) and atomic modify (atmod). atadd causes an operand to be added to the value in the specified memory location. atmod causes bits in the specified memory location to be modified under control of a mask. Both instructions use the REG format and can specify literals or local, global or special function registers. 4.2.12 Processor Management. These instructions control processor-related functions: modpc (modify the process controls register), flushreg (flush cached local register sets to memory), modac (modify the AC register), sysctl (perform system control function), sdma (set up a DMA controller channel), udma (copy current DMA pointers to internal data RAM). All use the REG format and can specify
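The read-modify-write behavior of atadd and atmod can be illustrated with a lock standing in for the bus-level atomicity. This is a software analogy, not a description of the hardware: the `Memory` class is hypothetical, words are modeled as 32-bit unsigned values, and returning the previous memory value is an assumption of this sketch.

```python
import threading

MASK32 = 0xFFFFFFFF


class Memory:
    """Toy word-addressed memory whose atomic operations hold a lock."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()

    def read(self, addr):
        with self._lock:
            return self._words.get(addr, 0)

    def atadd(self, addr, value):
        """Atomically add value to the word at addr; return the old word."""
        with self._lock:
            old = self._words.get(addr, 0)
            self._words[addr] = (old + value) & MASK32
            return old

    def atmod(self, addr, mask, src):
        """Atomically replace the masked bits of the word at addr with src's bits."""
        with self._lock:
            old = self._words.get(addr, 0)
            self._words[addr] = ((old & ~mask) | (src & mask)) & MASK32
            return old


mem = Memory()
first = mem.atadd(0x100, 5)    # memory word becomes 5
second = mem.atadd(0x100, 3)   # memory word becomes 8
before = mem.atmod(0x100, 0x0F, 0xF3)  # low nibble replaced by 3
```

The lock plays the role the text assigns to the atomic instructions: no other agent can observe or change the word between the read and the write.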
71. processor executes loop with two instructions per clock; 2) bus is saturated with quad operations; 3) no registers are left; 4) loop does not fit in the cache. Order for parallelism: Alternate REG-side instructions with MEM-side instructions so they may be issued in parallel. Migrate the operation: To enable parallelism, move EU and MDU operations to the AGU or vice versa. Use branch prediction: Set prediction bits correctly in conditional instructions. Align branch targets: Align branch targets of critical loops on an even word or quad-word boundary. Compress code to fit: If loop does not fit in cache, use branches, branch-and-links or calls to compress code size so it fits. Use code size optimization instructions (e.g., cmpobe) where possible. Use data RAM: Use high-bandwidth data RAM space for performance-critical and/or latency-critical variables. APPENDIX B BUS INTERFACE EXAMPLES. This appendix describes how to interface the processor to external memory systems. Also discussed are non-pipelined and pipelined burst SRAM, non-pipelined burst DRAM, slow 8-bit memory systems and high performance pipelined burst EPROM. All issues discussed in each example are independent of operating frequency. Design examples, state machines and pseudo-code are for example only; refer to the 80960 Evaluation Platform User's Guide (order numb
72. registers are sf0, sf1, sf2. However, when programming the registers in user-generated code, make sure to use the instruction operand. i960 microprocessor compilers recognize only the instruction operands listed in Table 1-1. Throughout this manual, the registers' descriptive names, numbers, operands and acronyms are used interchangeably, as dictated by context. Groups of bits and single bits in registers and control words are called either bits, flags or fields. These terms have a distinct meaning in this manual. bit: Controls a processor function; programmed by the user. flag: Indicates status. Generally set by the processor; certain flags are user programmable. field: A grouping of bits (bit field) or flags (flag field). Table 1-1. Register Terminology Conventions (Register Descriptive Name; Register Number; Instruction Operand; Acronym): Global Registers; g0-g15; g0-g14. Frame Pointer; g15; fp; FP. Local Registers; r0-r15; r3-r15. Previous Frame Pointer; r0; pfp; PFP. Stack Pointer; r1; sp; SP. Return Instruction Pointer; r2; rip; RIP. Interrupt Pending Register; sf0; sf0; IPND. Interrupt Mask Register; sf1; sf1; IMSK. DMA Command Register; sf2; sf2; DMAC. Specific bits, flags and fields in registers and control words are usually referred to by a register abbreviation (in upper case) followed by a bit, flag or field name (in lower case). These items are separated with a period. A position number desi
73. starting 13 31 end of process 13 32 execution 13 1 fixed addresses 13 10 Fly by transfers 13 5 latency 13 40 multi cycle transfers 13 3 overview 13 1 pin definitions 13 30 setup and control 13 2 terminal count 13 32 termination 13 2 transfer 13 3 transfer type 13 3 unaligned DMA transfers 13 10 DMA command register DMAC 13 21 DMA controller Block mode DMAs 13 33 overview 1 4 DMAC register 13 20 E ediv 9 36 effective address efa A 25 effective address calculations 26 electromagnetic interference EMI 14 32 electrostatic interference ESI 14 32 intel emul 9 37 eshro 9 38 EU execution unit 7 executable group A 15 A 29 Execution Unit EU A 7 A 20 explicit calls 5 1 extended addressing instructions 4 14 extended register set C 5 external interrupt pins XINT7 0 12 9 external memory requirements 2 10 external system requirements C 6 extract 9 39 F fault conditions 7 1 fault handling data structures 7 1 during program execution 7 2 fault record 7 2 7 6 fault table 7 2 7 4 fault type and subtype numbers 7 2 fault types 7 4 local calls 7 2 multiple fault conditions 7 9 return instruction pointer RIP 7 7 returning to an alternate point in the program 7 14 stack frame alignment 7 8 stack usage 7 6 supervisor stack 7 2 system procedure table 7 2 system local calls 7 2 system supervisor calls 7 2 user stack 7 2 fault handling procedure invocation 7 6 fault
74. unused stack; stack growth (toward higher addresses). Figure F-7. Procedure Stack Structure and Local Registers (Section 5.2.1, Local Registers and the Procedure Stack, pg. 5-2). Figure F-8. System Procedure Table: table offsets run from 000H to 43CH and include the supervisor stack pointer base, whose low-order bit is the Trace Control Bit, followed by procedure entries 0 through 259. Each procedure entry holds a procedure address and a two-bit Entry Type: 00 Local, 10 Supervisor. Reserved fields: initialize to 0; other fields are preserved (Section 5.5.1.1, Procedure Entries, pg. 5-14). F.2 Registers. Figure F-9. Arithmetic Controls Register (AC): No Imprecise Faults Bit, AC.nif (0 = some faults are imprecise, 1 = all faults are precise); Integer Overflow Mask Bit, AC.om (0 = no mask, 1 = mask); Integer Overflow Flag, AC.of (0 = no overflow, 1 = overflow); Condition Code Bits, AC.cc; reserved bits, initialize to 0 (Section 2.6.2, Arithmetic Controls (AC) Register, pg. 2-15). Bus Configuration Register (BCON): Configuration Table Valid (0 = table not valid, 1 = table valid); Internal RAM Protection Enabled, BCON.irp (0 = protection OFF, 1 = protection ON); reserved bits, initialize to 0. Figure F
75. 10. Bus Configuration Register (BCON) (Section 10.3.2, Bus Configuration Register, pg. 10-8). Figure F-11. Data Address Breakpoint Registers: 32-bit Data Address (Section 8.2.7, Breakpoint Trace, pg. 8-5). Figure F-12. DMA Command Register (DMAC): Channel Enable Bits, DMAC.ce (0 = suspend, 1 = enable); Channel Terminal Count Flags, DMAC.ctc (0 = non-zero byte count, 1 = zero byte count; software must reset); Channel Active Flags, DMAC.ca (0 = idle, 1 = active); Channel Done Flags, DMAC.cd (0 = not done, 1 = done; software must reset); Channel Wait Bits, DMAC.cw (0 = read next descriptor, 1 = descriptor has been read); Priority Mode Bit, DMAC.pm (0 = fixed, 1 = rotating); Throttle Bit, DMAC.t (0 = 4:1 DMA to user clock (max), 1 = 1:1 DMA to user clock (max)); Data Cache Global Disable, DMAC.dcgd (0 = Enabled, 1 = Disabled); Data Cache Invalidate, DMAC.dci (0 = Enabled, 1 = Invalidate); reserved bits, initialize to 0 (Section 13.10.1, Command Register (DMAC), pg. 13-21). ERRATA (7/11/94): DMA Command Register bits 30 (Data Cache Global Disable) and 31 (Data Cache Invalidate) not defined in Figure 13-9 or in the text that follows the figure.
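The errata note places Data Cache Global Disable at bit 30 and Data Cache Invalidate at bit 31 of the DMA Command Register. The sketch below shows how a 32-bit register image could be edited with those two positions; the helper names are hypothetical and the code manipulates a plain integer rather than the real register.

```python
MASK32 = 0xFFFFFFFF
DMAC_DCGD = 1 << 30  # Data Cache Global Disable: 0 = enabled, 1 = disabled
DMAC_DCI = 1 << 31   # Data Cache Invalidate: 1 = invalidate


def set_data_cache_disabled(dmac, disabled):
    """Return a DMAC image with the data cache globally disabled or enabled."""
    if disabled:
        return (dmac | DMAC_DCGD) & MASK32
    return dmac & ~DMAC_DCGD & MASK32


def request_data_cache_invalidate(dmac):
    """Return a DMAC image with the invalidate bit set."""
    return (dmac | DMAC_DCI) & MASK32


image = set_data_cache_disabled(0x0000000F, True)
image = request_data_cache_invalidate(image)
```

Both helpers leave every other bit of the image unchanged, which matters because the same register also carries the channel enable, done and terminal-count fields listed in the figure.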
76. 14, certain sequences of machine-type instructions can be executed in parallel, such as REG-MEM, REG-MEM-CTRL and MEM-CTRL. In Example A-4, the checksum loop is repeated. Another clock is eliminated by reordering code for parallel issue. Example A-4. Order for Parallelism (Checksum): side-by-side listings of the original checksum loop and the reordered loop (opt_loop), both built from ldob, addo, cmpinco and bl.t instructions with exit paths ending in ret; each listing is accompanied by a clock-by-clock execution table with REGop, MEMop and CTRLop columns. A.2.7.6 Alternating from Side to Side. The i960 Cx processor can sustain execution of two instructions per clock. To maximize this capability, try to start instructions in two of the three pipelines each clock. To increase parallelism, move an instruction from a unit which has become a critical path to a unit with available clocks. The AGU performs shifts, additions and moves that can replace EU operations. Literal addressing mode, in combination with EU or AGU operations, provides some freedom in deciding which side loads cons
77. Figure A-11. Data RAM Execution Pipeline: an instruction sequence including addo g4, g5, g6 together with ldt and ldq loads, traced through the Instruction Scheduler and the EU pipeline stages (read src1/src2, then execute). Table A-8. Data RAM Instructions: Load Latency = 1 clock; Store Latency = 1 clock; instructions: ld, st, ldob, stob, ldib, stib, ldos, stos, ldis, stis, ldl, stl, stt, ldq, stq. A.2.4.4 Address Generation Unit (AGU). The AGU contains a 32-bit parallel shifter-adder to speed memory address calculations. It also directly executes the lda instruction. The AGU receives instructions over the MEM machine bus and offset and displacement values over the address-out bus from the IS. The AGU reads the global and local registers over the 32-bit base-bus register port and writes the registers over the 128-bit load bus. The AGU calculates an effective address (efa) which is either written to a destination register (in the case of an lda instruction) or used as a memory address (in the case of loads, stores, extended branches or extended calls). When an lda instruction is issued, the AGU returns the efa to the destination register in the following clock for most addressing modes. An instruction which immediately follows the lda and references the lda destination is not issued in the same clock as the lda. As shown in Figure A-12, it is issued in the clock in which lda is w
78. 2 1 Add Subtract Multiply and Divide 2 4 7 4 2 2 2 Extended Arithmetic 040 4 8 4 2 2 3 Remainder and emen 4 8 4 2 2 4 Shiftand Rotate ete n e PR 4 9 4 2 8 Eee 4 10 4 2 4 Bit arid Bit Field emen Ye Ren creer d 4 10 4 2 4 1 Bit Operatlons 6i ee ehe De ERI 4 10 4 2 4 2 Bit Field Operations sosiaa rv e Tee e Ye eere rers 4 11 intel g CONTENTS 4 2 5 Byte Operations Elem 4 12 4 2 6 pai REDDERE 4 12 4 2 6 1 Compare and Conditional 2 4 12 4 2 6 2 Compare and Increment or Decrement 4 13 4 2 6 3 Test Condition Codes 2 2 4 13 4 2 7 Branch ics scie eie 4 13 4 2 7 1 Unconditional Branch 4 14 4 2 7 2 Conditional Branch ten emm eee HER 4 15 4 2 7 3 Compare and Branch 4 15 4 2 8 Caland Return 23 in deter rede n ED ee dee eter een 4 16 4 2 9 Conditional Faults srahna 2 dite htt aes te e bet eet 4 17 4 2 10 bte ete tee deer ba oed e e eos 4 17 4 2 11 Atomic InstructlOlls x nes eto v EE RE ee 4 18 4 2 12 Processor Management nnne nennen 4 18 4 3 SYSTEM CONTROL FUNCTIONS esee emen nennen nnne nere 4 19 4 3 1 sysctl Instruction Syntax 4 19 4 3 2 System Control Messages
79. 2.3 Call Trace. When the call-trace mode is enabled, the processor generates a call-trace event when a call instruction (call, callx or calls) or a branch-and-link instruction (bal or balx) executes. An implicit call, such as the action used to invoke a fault-handling or an interrupt-handling procedure, also causes a call-trace event to be generated. When the processor detects a call-trace event, it sets the prereturn-trace flag (PFP register bit 3) in the new frame created by the call operation or, if a branch-and-link operation was performed, it sets this flag in the current frame. The processor uses this flag to determine when to signal a prereturn-trace event on a ret instruction. 8.2.4 Return Trace. When the return-trace mode is enabled, the processor generates a return-trace event any time a ret instruction executes. 8.2.5 Prereturn Trace. The prereturn-trace mode causes the processor to generate a prereturn-trace event prior to ret execution, providing the PFP register prereturn-trace flag is set. Prereturn tracing cannot be used without enabling call tracing. The processor sets the prereturn-trace flag whenever it detects a call-trace event, as described above for call-trace mode. This flag performs a prereturn-trace-pending function. If another trace event occurs at the same time as the prereturn-trace event, the processor generates a fault on the non-prereturn-trace event first
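The interaction between call tracing and prereturn tracing comes down to one bit in the PFP register. The sketch below models that bit on a plain integer; the bit position (bit 3) is from the text, while the function names are hypothetical.

```python
PRERETURN_TRACE_FLAG = 1 << 3  # PFP register bit 3, per the text


def on_call_trace_event(pfp):
    """Return the PFP value with the prereturn-trace flag set."""
    return pfp | PRERETURN_TRACE_FLAG


def prereturn_event_on_ret(pfp, prereturn_mode_enabled):
    """Return True if a ret should first generate a prereturn-trace event."""
    return prereturn_mode_enabled and bool(pfp & PRERETURN_TRACE_FLAG)


# A frame created while call tracing was active carries the flag ...
traced_pfp = on_call_trace_event(0x00001000)
# ... while a frame created without a call-trace event does not.
untraced_pfp = 0x00001000
```

This also shows why prereturn tracing depends on call tracing: without a call-trace event the flag is never set, so `prereturn_event_on_ret` cannot return True.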
80. 2 Local Register and Stack Management. Global register g15 (FP) and local registers r0 (PFP), r1 (SP) and r2 (RIP) contain information to link procedures together and link local registers to the procedure stack (Figure 5-1). The following subsections describe this linkage information. 5.2.2.1 Frame Pointer. The frame pointer is the current stack frame's first byte address. It is stored in global register g15, the frame pointer (FP) register. The FP register is always reserved for the frame pointer; do not use g15 for general storage. In the i960 Cx processors, frames are aligned on 16-byte boundaries (Figure 5-1). When the processor creates a new frame on a procedure call, if necessary, it adds a padding area to the stack so that the new frame starts on a 16-byte alignment boundary. Stack frame alignment is defined for each implementation of the i960 processor family. This alignment boundary is calculated from the relationship SALIGN × 16. For the i960 Cx processors, SALIGN = 1 and stacks are aligned on 16-byte boundaries. 5.2.2.2 Stack Pointer. The stack pointer is the byte-aligned address of the stack frame's next unused byte. The stack pointer value is stored in local register r1, the stack pointer (SP) register. The procedure stack grows upward (i.e., toward higher addresses). When a stack frame is created, the processor automatically adds 64 to the frame pointer value and stores the result in the SP register. This action creates the regi
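The frame alignment and initial stack pointer described above reduce to two lines of arithmetic. This is a minimal sketch assuming the new frame starts at the first aligned address at or above the caller's stack pointer; the function name is hypothetical, while SALIGN = 1 and the 64-byte offset come from the text.

```python
SALIGN = 1                 # value given for the i960 Cx processors
FRAME_REGISTER_AREA = 64   # the processor sets SP to FP + 64 on frame creation


def new_frame(prev_sp, salign=SALIGN):
    """Return (fp, sp) for a frame created above the caller's stack pointer."""
    boundary = salign * 16
    # Round up to the alignment boundary; the skipped bytes are the padding area.
    fp = (prev_sp + boundary - 1) & ~(boundary - 1)
    sp = fp + FRAME_REGISTER_AREA
    return fp, sp


fp, sp = new_frame(0x1004)
padding = fp - 0x1004
```

For a caller stack pointer of 0x1004 the new frame starts at 0x1010, leaving 12 bytes of padding, and the stack pointer starts 64 bytes above that.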
81. 3. Double Word: 8-bit bus 2, 4, 8; 16-bit bus 1, 4, 4; 32-bit bus 1, 2, 2. Word: 8-bit bus 1, 4, 4; 16-bit bus 1, 2, 2; 32-bit bus 1, 1, 1. Short: 8-bit bus 1, 2, 2; 16-bit bus 1, 1, 1; 32-bit bus 1, 1, 1. Byte: 8-bit bus 1, 1, 1; 16-bit bus 1; 32-bit bus 1, 1, 1. Burst accesses increase bus bandwidth over non-burst accesses. The i960 Cx processor's burst access allows up to four consecutive data cycles to follow a single address cycle. Compared to non-burst memory systems, burst-mode memory systems achieve greater performance out of slower memory. SRAM, interleaved SRAM, Static Column Mode DRAM and Fast Page Mode DRAM may be easily designed into burst-mode memory systems. A burst read or write access consists of a single address cycle, 0 to 31 address-to-data wait states (N_RAD or N_WAD) and one to four data cycles, separated by zero to three data-to-data wait states (N_RDD or N_WDD). If READY/BTERM control is enabled in the region, N_WAD, N_RDD and N_WDD wait states may all be extended by not asserting READY. BTERM may be used to break a burst access into smaller accesses. The address's two least significant bits automatically increment after each burst data cycle. This is true for 8-, 16- and 32-bit wide data buses. When a memory region is configured for a 32-bit data bus width, address pins A3:2 increment. For a 16-bit memory region, BE1 is encoded as A1 and address pins A2:1 increment. When a memory region is configured for an 8-bit data bus width, BE0 and BE1, acting as the lower two bits of
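The three numbers in each row of the table fragment above match a simple calculation: the number of burst accesses, the data cycles per burst, and the total data cycles for a request of a given size on a given bus width. The sketch below reproduces that reading; the column interpretation is an inference from the surviving rows, the function name is hypothetical, and aligned requests are assumed.

```python
MAX_DATA_CYCLES_PER_BURST = 4  # a burst allows up to four data cycles


def burst_plan(request_bytes, bus_width_bits):
    """Return (bursts, data_cycles_per_burst, total_data_cycles)."""
    bus_bytes = bus_width_bits // 8
    # Ceiling division: a byte on a 32-bit bus still needs one data cycle.
    total = -(-request_bytes // bus_bytes)
    bursts = -(-total // MAX_DATA_CYCLES_PER_BURST)
    per_burst = min(total, MAX_DATA_CYCLES_PER_BURST)
    return bursts, per_burst, total


# Double word (8 bytes), word (4), short (2) and byte (1) on an 8-bit bus.
rows_8bit = [burst_plan(size, 8) for size in (8, 4, 2, 1)]
```

The double-word case on an 8-bit bus is the only row that needs two bursts, since eight data cycles exceed the four-cycle limit of a single burst.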
82. 4 21 bus snooping 2 13 configuration 2 13 A 33 disabling 2 13 enabling and disabling 14 10 instruction buffer 2 14 invalidation 2 13 load and lock mechanism 2 14 sysctl 2 14 overview 1 1 size 2 13 A 33 Instruction Fetch Unit A 34 Instruction flow A 4 decode stage A 5 execute stage A 5 issue stage A 5 instruction formats 4 3 assembly language format 4 1 branch prediction 4 2 instruction encoding format 4 2 Instruction Pointer IP register 2 15 Instruction Scheduler IS 12 22 A 3 instruction cache A 4 instruction fetch unit A 4 microcode A 4 Instruction Set implementation specific instructions C 4 instruction timing C 4 intel INDEX instruction set instruction set continued add 9 9 ediv 9 36 addc 9 8 emul 9 37 addi addo 9 9 eshro 9 38 alterbit 9 10 extract 9 39 and andnot 9 11 fault 9 40 atadd 9 12 faulte faultne faultl faultle 9 40 atmod 9 13 faultg faultge 9 40 b 9 19 faulto faultno 9 40 b bx 9 14 flushreg 9 42 bal balx 9 15 fmark 9 43 bb 9 17 Id 9 44 bbc bbs 9 17 Id Idi Idt Idq 9 44 be bg bge 9 19 Ida 9 46 bl ble bne 9 19 Idib Idis 9 44 bo bno 9 19 Idob Idos 9 44 call 9 21 mark 9 47 calls 9 22 modac 9 48 callx 9 24 modi 9 49 chkbit 9 26 modify 9 50 clrbit 9 27 modpc 9 51 cmp 9 29 modtc 9 52 cmpdec 9 28 mov 9 53 cmpdeci cmpdeco 9 28 mov movl movt movq 9 53 cmpi cmpo 9 29 mul 9 54 cmpib cmpob 9 31 muli mulo 9 54 cmpibe cmpibne cmpibl cmpible 9 31 nand 9 55 cmpibg
83. 5 atadd 9-12
9.3.6 atmod 9-13
9.3.7 b, bx 9-14
9.3.8 bal, balx 9-15
9.3.9 bbc, bbs 9-17
9.3.10 BRANCH IF 9-19
9.3.11 call 9-21
9.3.12 calls 9-22
9.3.13 callx 9-24
9.3.14 chkbit 9-26
9.3.15 clrbit 9-27
9.3.16 cmpdeci, cmpdeco 9-28
9.3.17 cmpi, cmpo 9-29
9.3.18 cmpinci, cmpinco 9-30
9.3.19 COMPARE AND BRANCH 9-31
9.3.20 concmpi, concmpo 9-34
9.3.21 divi, divo 9-35
9.3.22 ediv 9-36
9.3.23 to 9.3.61
84. 5 Interrupt Mapping Registers 0-2, pg. 12-12

F-16 REGISTER AND DATA STRUCTURES

Figure F-18. Interrupt Mask (IMSK) and Interrupt Pending (IPND) Registers
Interrupt Pending Register (IPND, SF0):
    External Interrupt Pending Bits (IPND.xip): 0 = no interrupt, 1 = pending interrupt
    DMA Interrupt Pending Bits (IPND.dip): 0 = no interrupt, 1 = pending interrupt
Interrupt Mask Register (IMSK, SF1):
    External Interrupt Mask Bits (IMSK.xim): 0 = masked, 1 = not masked
    DMA Interrupt Mask Bits (IMSK.dim): 0 = masked, 1 = not masked
Reserved bits: initialize to 0.
Section 12.3.6, Interrupt Mask and Pending Registers (IMSK, IPND), pg. 12-14

Memory Region Configuration Register (MCON 0-15)
    Burst Enable: 0 = disabled, 1 = enabled
    READY/BTERM Enable: 0 = disabled, 1 = enabled
    Read Pipelining Enable: 0 = disabled, 1 = enabled
    N_RAD Wait States: 0-31 wait states
    N_RDD Wait States: 0-3 wait states
    N_XDA Wait States: 0-3 wait states
    N_WAD Wait States: 0-31 wait states
    N_WDD Wait States: 0-3 wait states
    Bus Width: 00 = 8-bit bus, 01 = 16-bit bus, 10 = 32-bit bus, 11 = reserved
    Byte Order: 0 = little endian, 1 = big endian
    Reserved bits: initialize to 0.
    Data Cache Enable: 1 = enabled; i960 CF processor
85. 56 INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION

Variables which are used heavily over short periods of time, or are used heavily by one procedure, should be dynamically allocated. Such variables could be DMA descriptors for the currently active packets or coefficients for filters which process large images on command. Dynamically allocated data RAM space would be loaded from main memory at the onset of intense processing and restored to main memory as the activity subsides.

Global allocation of DR space should be saved for storing variables which are heavily used by a variety of procedures over a long period of time, or for storing variables needed by latency-critical activities. For example, the programmer may wish to allocate space in data RAM for coefficients of a continuously operating filter or for standard DMA descriptor templates from which run-time descriptors are built.

A.2.9 Summary

Table A-21 summarizes code optimization tactics presented in the previous sections. Optimizing compilers for the i960 processor family are designed to exploit most of these techniques. Advanced compilers also incorporate profiling features to automate much of the experimentation process.

Table A-21. Code Optimization Summary
Tactic                    Description
Advance long operations   Separate comparisons, loads, stores and MDU operations from the instructions that use their results.
Unroll loops              Unroll time-consuming loops until 1
86. 6 RAS is CAS3 0 is COL ADR is IF BLAST THEN ELSE not asserted not asserted asserted REFRESH CYCLE Assert RAS asserted not asserted asserted the next state is 5 0 the next state is STATE_6

B.3.9 DRAM Refresh Request and Timer Logic

DRAM refresh request and timer logic is responsible for generating DMA requests at an appropriate interval and for removing the DMA request after receiving DMA acknowledge. A typical DRAM must be refreshed every 4 ms; refresh cycles must be performed on all 256 rows during this 4 ms interval. If a distributed refresh method is chosen, a refresh cycle must be performed every 15 us. The time base can be generated from a counter connected to PCLK, a timer/counter chip, or any other time base. DMA request and acknowledge signals are shown in Figure B-20.

[Figure B-20. DMA Request and Acknowledge Signals]

B-28 BUS INTERFACE EXAMPLES

B.3.10 DMA Programming for Refresh

DMA should be programmed to perform 32-bit, fly-by, source-synchronized demand mode transfers with source chaining. The chaining must be set up to perform an infinite loop of transfers. When all transfers are complete and all rows are refreshed, the cycle begins again. See Figure B-21 for chaining description. 0xC 0x8 0x4 0x0 ADR 0 NEXT PTR DESTINATION ADR SOUR
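The 15 us figure for distributed refresh follows from dividing the 4 ms refresh period across 256 rows. The sketch below works through that arithmetic and shows how a counter clocked by PCLK might be sized. The PCLK frequencies used are example values, not taken from this section, and both function names are hypothetical.

```python
def distributed_refresh_interval_us(refresh_period_ms=4.0, rows=256):
    """Longest allowed spacing between row refreshes, in microseconds."""
    return refresh_period_ms * 1000.0 / rows


def timer_terminal_count(pclk_hz, interval_us=15):
    """Whole PCLK cycles that fit in interval_us (rounded down, so the
    timer never runs slower than the requested interval)."""
    return (pclk_hz * interval_us) // 1_000_000


if __name__ == "__main__":
    limit = distributed_refresh_interval_us()
    print(f"upper bound per row: {limit} us")            # 4 ms / 256 rows
    print(f"256 rows at 15 us: {256 * 15} us total")      # must stay under 4000 us
    for pclk in (25_000_000, 33_000_000):                 # assumed example clocks
        print(pclk, "Hz ->", timer_terminal_count(pclk), "PCLK cycles")
```

Rounding the exact bound down to 15 us leaves a small margin: all 256 rows are covered in 3.84 ms.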
87. 7 DATA TYPES AND MEMORY ADDRESSING MODES

Example 3-1. Addressing Mode Mnemonics

    st    g4, xyz               # absolute; word from g4 stored at memory
                                # location designated with label xyz
    ldob  (r3), r4              # register indirect; ordinal byte from memory
                                # location given in r3 loaded into register
                                # r4 and zero extended
    stl   g6, xyz(g5)           # register indirect with displacement; double
                                # word from g6, g7 stored at memory location
                                # xyz + g5
    ldq   (r8)[r9*4], r4        # register indirect with index; quad word
                                # beginning at memory location r8 + (r9
                                # scaled by 4) loaded into r4 through r7
    st    g3, xyz(g4)[g5*2]     # register indirect with index and
                                # displacement; word in g3 stored to memory
                                # location g4 + xyz + (g5 scaled by 2)
    ldis  xyz[r12*1], r13       # index with displacement; load short integer
                                # at memory location xyz + r12 into r13,
                                # sign extended
    st    r4, xyz(IP)           # IP with displacement; store word in r4 at
                                # memory location IP + xyz + 8

Example 3-2. Use of Index Plus Scaled Index Mode

    array_op:
        mov     g0, r4                  # pointer to array is moved to r4
        subi    1, g1, r3               # calculate index for the last array
        b       .I33                    # element to be filled
    .I34:
        st      g2, (r4)[r3*4]          # fill array at index
        st      g2, 0x30(r4)[r3*4]      # fill array at index + constant offset
        subi    1, r3, r3               # decrement index
    .I33:
        cmpible 0, r3, .I34             # store next array elements if
        ret                             # index is not 0

3-8

CHAPTER 4 INSTRUCTION SET SUMMARY

This chapter overviews the i960 microprocessor family's
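The addressing modes in Example 3-1 all reduce to a sum of up to three terms: a displacement, a base register value and a scaled index. The Python sketch below models that sum with 32-bit wraparound. It is a reading aid, not an assembler: the function name effective_address is hypothetical, the permitted scale factors (1, 2, 4, 8, 16) are an assumption not stated in this extract, and the IP-with-displacement mode (which adds 8 to IP) is left out.

```python
MASK32 = 0xFFFFFFFF
ALLOWED_SCALES = (1, 2, 4, 8, 16)   # assumed set of scale factors


def effective_address(disp=0, abase=0, index=0, scale=1):
    """Effective address = disp + abase + index * scale, truncated to 32 bits."""
    if scale not in ALLOWED_SCALES:
        raise ValueError(f"unsupported scale factor: {scale}")
    return (disp + abase + index * scale) & MASK32


if __name__ == "__main__":
    r4, r3 = 0x0000_1000, 3
    # Register indirect with index, like  st g2, (r4)[r3*4]
    print(hex(effective_address(abase=r4, index=r3, scale=4)))
    # Register indirect with index and displacement, like  st g2, 0x30(r4)[r3*4]
    print(hex(effective_address(disp=0x30, abase=r4, index=r3, scale=4)))
```

In Example 3-2 the same two address forms are evaluated on every pass through the loop while r3 counts down, so each pass touches the word at the current index and the word 0x30 bytes beyond it.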
88. A 9 coherency A 10 BCU queues A 12 DMA operation A 13 1 0 and bus masters A 13 data fetch policy A 10 global control A 9 hits and misses 8 organization A 8 subblock placement 8 write policy A 10 data control peripheral units C 6 data fetch policy A 10 data movement instructions 4 5 load address instruction 4 6 load instructions 4 5 move instructions 4 6 data packing unit 10 15 Data RAM A 24 Data RAM DR A 7 data structure locations 2 10 INDEX Index 3 INDEX data structures control table 2 1 2 6 2 8 fault table 2 1 2 8 initialization boot record 2 1 2 8 interrupt stack 2 1 2 8 interrupt table 2 1 2 8 local stack 2 1 processor control block 2 1 2 8 supervisor stack 2 1 2 8 system procedure table 2 1 2 8 user stack 2 8 data types bits and bit fields 3 3 data alignment 3 4 integers 3 2 ordinals 3 3 triple and quad words 3 4 debug overview 8 1 debug instructions 4 17 decoupling capacitors 14 28 Demand mode transfers DMA 13 1 design considerations high frequency 14 29 interference 14 31 latchup 14 31 line termination 14 30 performance 1 die stepping information location 2 3 Direct Memory Access DMA Controller 13 1 div 9 35 divi divo 9 35 Index 4 m intel DMA block mode 13 2 block mode transfers 13 1 byte count alignment 13 10 data assembly 13 9 data chaining 13 13 data disassembly 13 9 data RAM 13 27 demand mode 13 2 demand mode transfers 13 1 13 8 demand mode
89. Also: chkbit, COMPARE AND BRANCH, BRANCH IF

9-18 INSTRUCTION SET REFERENCE

9.3.10 BRANCH IF

Mnemonic:
    be{.t|.f}     Branch If Equal/True
    bne{.t|.f}    Branch If Not Equal
    bl{.t|.f}     Branch If Less
    ble{.t|.f}    Branch If Less Or Equal
    bg{.t|.f}     Branch If Greater
    bge{.t|.f}    Branch If Greater Or Equal
    bo{.t|.f}     Branch If Ordered
    bno{.t|.f}    Branch If Unordered/False

Format:
    b*{.t|.f}     targ (disp)

Description: Branches to the instruction specified with the targ operand according to the AC register condition code state. An optional .t or .f suffix may be appended to the mnemonic. Use .t to speed up execution when these instructions usually take the branch; use .f to speed up execution when these instructions usually do not take the branch. If a suffix is not provided, the assembler is free to provide one.

For all branch-if instructions except bno, the processor branches to the instruction specified with targ if the logical AND of the condition code and the mask part of the opcode is not zero. Otherwise it goes to the next instruction.

For bno, the processor branches to the instruction specified with targ if the condition code is zero. Otherwise it goes to the next instruction.

For instance, bno (unordered) can be used as a branch-if-false instruction when coupled with chkbit. For bno, the branch is taken if the condition code equals 000. be can be used as a branch-if-true instruction.

NOTE: bo and bno are used by implementations that include floating point coprocessor for br
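The branch rule described above (AND the condition code with the opcode mask, with bno as the special case) can be sketched as follows. The mask values in the table are not listed in this section; they are filled in here on the assumption that condition code bit 2 means less, bit 1 means equal and bit 0 means greater, and should be checked against the opcode tables before being relied on. The function name branch_taken is hypothetical.

```python
# Assumed 3-bit masks (bit 2 = less, bit 1 = equal, bit 0 = greater).
BRANCH_MASKS = {
    "bno": 0b000,
    "bg":  0b001,
    "be":  0b010,
    "bge": 0b011,
    "bl":  0b100,
    "bne": 0b101,
    "ble": 0b110,
    "bo":  0b111,
}


def branch_taken(mnemonic, condition_code):
    """Return True if the branch-if instruction would take its branch."""
    mask = BRANCH_MASKS[mnemonic]
    cc = condition_code & 0b111
    if mnemonic == "bno":
        # Special case: taken only when the condition code is zero.
        return cc == 0
    return (cc & mask) != 0


if __name__ == "__main__":
    cc_after_compare = 0b100            # e.g. a compare that found "less"
    for name in ("bl", "ble", "be", "bg", "bno"):
        print(name, branch_taken(name, cc_after_compare))
```

Under these masks a condition code of 000 makes every branch except bno fall through, which is why bno serves as the branch-if-false partner of be after chkbit.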
90. B-27 Two-Way Interleaved Memory System

Memory interleaving can be applied to SRAM, DRAM and even EPROM memory systems. Interleaved SRAM and EPROM memory systems overlap access times for consecutive accesses to improve memory system performance. The i960 Cx processors' pipelined read mode can be used on SRAM and EPROM systems to further increase memory system performance. However, pipelined read mode is not appropriate for DRAM memory systems that require N_XDA states or READY control. Interleaved DRAM memory systems can overlap the memory access time RAS precharge time of consecutive accesses and

B-39 BUS INTERFACE EXAMPLES

[Figure B-28. Two-Way Interleaved Read Waveforms. Signals shown: ADS, ADR, Latched ADR, Even Bank ADR, Even Bank Data, Odd Bank ADR, Odd Bank Data, Data, SELECT, Latched A2]

B-40

B.5 INTERFACING TO SLOW PERIPHERALS USING THE INTERNAL WAIT STATE GENERATOR

This section illustrates how easy it is to interface slow peripherals to the i960 Cx processor. This example shows the interface to an Intel 82C54-2 Timer/Counter and an Intel 82510 UART. The integrated internal wait state generator, programmable data bus width, and data transceiver control signals simplify the logic required to implement the interface. A system may require several slower speed peripher
91. Breakpoint 0 (DAB0), group 00H

Offset   Register                                    Group
04H      IP Breakpoint Register 0 (IPB0)
08H      IP Breakpoint Register 1 (IPB1)
0CH      Data Address Breakpoint 1 (DAB1)
10H      Interrupt Map Register 0 (IMAP0)            01H
14H      Interrupt Map Register 1 (IMAP1)
18H      Interrupt Map Register 2 (IMAP2)
1CH      Interrupt Control Register (ICON)
20H      Memory Region 0 Configuration (MCON0)       02H
24H      Memory Region 1 Configuration (MCON1)
28H      Memory Region 2 Configuration (MCON2)
2CH      Memory Region 3 Configuration (MCON3)
30H      Memory Region 4 Configuration (MCON4)       03H
34H      Memory Region 5 Configuration (MCON5)
38H      Memory Region 6 Configuration (MCON6)
3CH      Memory Region 7 Configuration (MCON7)
40H      Memory Region 8 Configuration (MCON8)       04H
44H      Memory Region 9 Configuration (MCON9)
48H      Memory Region 10 Configuration (MCON10)
4CH      Memory Region 11 Configuration (MCON11)
50H      Memory Region 12 Configuration (MCON12)     05H
54H      Memory Region 13 Configuration (MCON13)
58H      Memory Region 14 Configuration (MCON14)
5CH      Memory Region 15 Configuration (MCON15)
60H      Reserved                                    06H
64H      Breakpoint Control Register (BPCON)
68H      Trace Controls Register (TC)
6CH      Bus Configuration Control (BCON)

CHAPTER 5 PROCEDURE CALLS

This chapter describes mechanisms for making procedure calls, which include branch-and-link instructions, the built-in call and return mechanism, call instructions (call, callx, calls), the return instruction (ret), and call actions caused by interrupts and faults.
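In the control table listing above, each group number marks a block of four consecutive words, so a register's group appears to be its byte offset divided by 16. The sketch below records the offsets that are legible in the listing and derives the group from them. The dictionary and function names are hypothetical, and the offset-to-group relation is an observation from the table, not a rule quoted from the manual.

```python
# Byte offsets read from the control table listing (breakpoint
# registers omitted because the first row of the table is truncated).
CONTROL_TABLE_OFFSETS = {
    "IMAP0": 0x10,
    "IMAP1": 0x14,
    "IMAP2": 0x18,
    "ICON":  0x1C,
    "BPCON": 0x64,
    "TC":    0x68,
    "BCON":  0x6C,
}
# MCON0 through MCON15 occupy consecutive words starting at 20H.
CONTROL_TABLE_OFFSETS.update({f"MCON{n}": 0x20 + 4 * n for n in range(16)})


def register_group(name):
    """Group number of a control register: four words (16 bytes) per group."""
    return CONTROL_TABLE_OFFSETS[name] >> 4


if __name__ == "__main__":
    for reg in ("ICON", "MCON0", "MCON7", "MCON15", "BCON"):
        print(f"{reg:7s} offset {CONTROL_TABLE_OFFSETS[reg]:#04x} group {register_group(reg)}")
```

Grouping matters when a single register has to be reloaded, because the load control register message is addressed by register group.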
92. C-Series Core and Peripherals

State-of-the-art silicon technology and innovative microarchitectural constructs achieve high performance due to these features:

• Parallel instruction decoding allows sustained, simultaneous execution of two instructions in every clock cycle.
• Most instructions execute in a single clock cycle.
• Multiple independent execution units enable multi-clock instructions to execute in parallel.
• Resource and register scoreboarding provide efficient and transparent management for parallel execution.
• Branch look-ahead and branch prediction features enable branches to execute in parallel with other instructions.
• A local register cache permits fast calls, returns, interrupts and faults to be implemented.
• A 1 Kbyte (80960CA) or 4 Kbyte (80960CF) two-way set associative instruction cache is integrated on-chip.
• A 1 Kbyte direct-mapped data cache is integrated on-chip (80960CF only).
• 1 Kbyte of static data RAM is integrated on-chip.

A.1 INTERNAL PROCESSOR STRUCTURE

The i960 Cx processor core contains the following main functional units:

• Instruction Scheduler (IS)
• Multiply/Divide Unit (MDU)
• Register File (RF)
• Address Generation Unit (AGU)
• Execution Unit (EU)
• Data RAM and Local Register Cache

Figure A-2 shows the i960 Cx processor block diagram. The IS and RF are the heart of the processor. Other core functional units referre
93. [Figure 7-2. Fault Table and Fault Table Entries]

7-5 FAULTS

As indicated in Figure 7-2, two fault table entry types are allowed: local-call entry and system-call entry. Each is two words in length. The entry type field (bits 0 and 1 of the entry's first word) and the value in the entry's second word determine the entry type.

local-call entry (type 00, binary)
    Provides an instruction pointer for the fault-handling procedure. The processor uses this entry to invoke the specified procedure by means of an implicit local-call operation. The second word of a local procedure entry is reserved; it must be set to zero when the fault table is created and not accessed after that.

system-call entry (type 10, binary)
    Provides a procedure number in the system procedure table. This entry must have an entry type of 10 (binary) and a value in the second word of 0000 027FH. Using this entry, the processor invokes the specified fault-handling procedure by means of an implicit call-system operation similar to that performed for the calls instruction. A fault-handling procedure in the system procedure table can be called with a system-local call or a system-supervisor call, depending on the entry type in the system procedure table.

To summarize, a fault-handling procedure can be invoked through the fault table in any of three ways: a local call, a system local call or a syste
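The two entry formats can be told apart with a small classifier, sketched below. It reads the entry type from bits 0 and 1 of the first word and checks the second word as the text requires. The binary type values 00 and 10 are an interpretation of the garbled type fields in this extract, and the function and constant names are hypothetical.

```python
ENTRY_TYPE_MASK = 0b11           # bits 0 and 1 of the first word
LOCAL_CALL_TYPE = 0b00           # assumed encoding for a local-call entry
SYSTEM_CALL_TYPE = 0b10          # assumed encoding for a system-call entry
SYSTEM_CALL_SECOND_WORD = 0x0000027F


def classify_fault_entry(first_word, second_word):
    """Return 'local-call', 'system-call' or 'invalid' for a two-word entry."""
    entry_type = first_word & ENTRY_TYPE_MASK
    if entry_type == LOCAL_CALL_TYPE and second_word == 0:
        return "local-call"
    if entry_type == SYSTEM_CALL_TYPE and second_word == SYSTEM_CALL_SECOND_WORD:
        return "system-call"
    return "invalid"


if __name__ == "__main__":
    print(classify_fault_entry(0x0000_4000, 0x0000_0000))   # word-aligned IP, reserved word zero
    print(classify_fault_entry(0x0000_0012, 0x0000_027F))   # type bits 10, magic second word
    print(classify_fault_entry(0x0000_0012, 0x0000_0000))   # mismatched pair
```

A table-building tool could run such a check over every entry before the fault table is placed in memory, since the reserved second word of a local-call entry must never be anything but zero.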
94. Cycle 2 4 4 6 6 10 10 7 7 11 11 16 to 32 Multi Cycle 4 9 12 11 8 20 20 17 14 26 26 32 to 8 Multi Cycle 4 22 22 13 13 35 35 26 23 48 45 32 to 16 Multi Cycle 2 10 11 8 8 18 19 14 13 24 24 32 to 32 Multi Cycle aligned 4 4 4 6 6 10 10 7 7 11 11 2 4 6 6 6 6 12 12 9 9 15 15 128 to 128 Multi Cycle 16 6 7 9 9 15 16 10 10 16 17 8 bit Fly by 1 3 3 3 3 6 6 4 4 7 7 16 bit Fly by 2 3 3 3 3 6 6 4 4 7 7 32 bit Fly by 4 3 3 3 3 6 6 4 4 7 7 128 bit Fly by 16 3 3 6 6 9 9 6 6 9 9 The columns in Table 13 5 labeled DMA Process and User Process show the number of clock cycles allocated to either these processes during a single DMA transfer Equation 13 4 provides the minimum fraction of processor bandwidth remaining for the user process during DMA transfer 13 40 intel 3 DMA CONTROLLER Minimum User Process Bandwidth 100 Equation 13 4 XFER Transfer types that do not perform assembly or disassembly always have NTggo equal to Nxpggg For example for a 32 to 32 bit multi cycle transfer has a value of 4 which means that each transfer moves 4 bytes Bggo is 4 which indicates that 4 bytes of data is transferred every DMA request This means that every DMA request will cause a transfer Throughput per DMA request found by using Equation 13 3 is equal to Nypggg 10 clocks In some cases a transfer does not occur every DMA request For example a source synchron
95. FP 64 Configure status amp control registers AC lt 0 lt 0 4 Supervisor lt Interrupted PC p 4 lt 31 Load control registers with data in the control table Execute user code branch to start IP Set up bus controller Load byte at FFFF FFOOH into byte 0 of MCONO Load byte at FFFF into byte 1 and FFFF FFO8H into byte 2 of MCONO Assert FAIL pin Compute checksum for bus confidence self test Load words FFFF FF10H through FFFF FF2CH start IP is word at FFFF FF10H Rs Y ES m 0077 Deassert FAIL pin Figure 14 4 Processor Initialization Flow 14 13 INITIALIZATION AND SYSTEM REQUIREMENTS intel 14 3 3 Startup Code Example After initialization is complete user startup code typically copies initialized data structures from ROM to RAM reinitializes the processor sets up the first stack frame changes the execution state to non interrupted and calls the main routine This section presents an example startup routine and associated header file This simplified startup file can be used as a basis for more complete initialization routines The MON960 debug monitor s source code serves as an example of a more complete initialization The examples in this section are useful for creating and evaluating startup code The following lists the example s number name and page e Example 14 2 Startup Routine init s p
96. [Figure 4-2. Source Operands for sysctl. SRC1 holds Field 2, Message Type and Field 1; SRC2 holds Field 3; SRC/DST (used as SRC) holds Field 4]

Table 4-3. System Control Message Types and Operand Fields

                         SRC1                                      SRC2                        SRC3
Message                  Type   Field 1                 Field 2    Field 3                     Field 4
Request Interrupt        00H    vector                  unused     unused                      unused
Invalidate Cache         01H    unused                  unused     unused                      unused
Configure Cache          02H    mode (see Table 4-4)    unused     cache load address          unused
Reinitialize             03H    unused                  unused     first instruction address   PRCB address
Load Control Register    04H    register group          unused     unused                      unused

NOTE: The processor ignores unused sources and fields.

4.3.2 System Control Messages

The five system control messages, defined in the following subsections, are:

    request interrupt              causes an interrupt to be serviced or posted
    configure instruction cache    disables or locks instructions in a portion of the instruction cache
    invalidate instruction cache   causes the contents of the instruction cache to be purged
    reinitialize                   restarts the processor
    load control register          loads the on-chip control registers

4-20

4.3.2.1 Request Interrupt

Executing sysctl with a message type of 00H causes an interrupt to be requested. Field 1 of the instruction specifies the vector number of the interrupt requested. The remaining fields are not defined. Requesting an interrupt with sysctl
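To make the operand layout concrete, the sketch below packs a message type and Field 1 into an SRC1 word. The bit positions are an assumption: the figure in this extract shows only the order Field 2, Message Type, Field 1 from bit 31 down to bit 0, so the code assumes Field 1 in bits 0-7, the message type in bits 8-15 and Field 2 in bits 16-31. The type code for the configure cache message is inferred from the ordering of the table, and all names are hypothetical.

```python
MESSAGE_TYPES = {
    "request_interrupt":     0x00,
    "invalidate_cache":      0x01,
    "configure_cache":       0x02,   # inferred; the table cell is not legible
    "reinitialize":          0x03,
    "load_control_register": 0x04,
}


def sysctl_src1(message, field1=0, field2=0):
    """Pack the SRC1 operand under the assumed bit layout."""
    if not 0 <= field1 <= 0xFF:
        raise ValueError("Field 1 is assumed to be 8 bits wide")
    if not 0 <= field2 <= 0xFFFF:
        raise ValueError("Field 2 is assumed to be 16 bits wide")
    return (field2 << 16) | (MESSAGE_TYPES[message] << 8) | field1


if __name__ == "__main__":
    # Request an interrupt on vector 18 (Field 1 carries the vector number).
    print(hex(sysctl_src1("request_interrupt", field1=18)))
    # Reload control register group 2 (Field 1 carries the register group).
    print(hex(sysctl_src1("load_control_register", field1=2)))
```

Only SRC1 is modelled here; the address operands used by the cache and reinitialize messages travel in the other two source operands.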
97. However, only interrupt procedures can be locked into the cache.

The cache locking scheme is improved on the i960 CF processor and has fewer restrictions. Any section of code can be locked into half of the instruction cache, not just the interrupt procedures. When the i960 CF processor executes sysctl (modes 100, 110 binary) with a command to lock the instruction cache, one way of the two-way set associative cache is pre-loaded and locked from the specified address. The other half of the instruction cache now functions as a 2 Kbyte direct-mapped instruction cache, except for those instructions that sysctl locks. The unlocked portion of the i960 CA processor's instruction cache functions as two-way set associative. This mode of operation continues until the cache mode is changed by the next sysctl instruction. As on the i960 CA processor, the invalidate instruction cache sysctl message invalidates both the locked and unlocked halves of the cache.

The instruction scheduler checks all ways of the cache for every instruction fetched. If an instruction is not found, it is fetched from external memory and loaded into the unlocked portion of the instruction cache.

A-33

Table A-12. Cache Configuration Modes
Mode Field (binary)   Mode Description                                 CA        CF
000                   normal cache enabled                             1 Kbyte   4 Kbytes
XX1                   full cache disabled                              1 Kbyte   4 Kbytes
100                   Load and lock full cache (execute off-chip)      1 Kbyte   4
98. Kbytes
110                   Load and lock half the cache; remainder is normal cache enabled   512 bytes   2 Kbytes
010                   Reserved                                                          1 Kbyte     4 Kbytes
NOTES:
1. On the CA, only interrupt procedures can execute in the locked portion of the cache.
2. On the CF, interrupt procedures and other code can operate in the locked portion of the cache.

A.2.5.2 Fetch Strategy

When any of the three or four words presented to the scheduler are invalid, a cache miss is signaled and an instruction fetch is issued. The Instruction Fetch Unit makes the fetch and prefetch decisions. Since the cache supports two-word and quad-word replacement within a line, instruction fetches can be issued in either size. The conditions of the cache miss determine which fetch is issued. Table A-13 describes the fetch decision.

Table A-13. Fetch Strategy
Words Provided To Scheduler     Fetch Initiated
IP     IP+4   IP+8   IP+12      A3:2 of requested IP = 0X    A3:2 of requested IP = 1X
Hit    Hit    Hit    Hit        no fetch                     no fetch
Hit    Miss   Hit    Hit        fetch two words at IP        fetch two words at IP
Miss   Hit    Hit    Hit
Miss   Miss   Hit    Hit
Hit    Hit    Hit    Miss       fetch two words at IP+8      fetch two words at IP+8
Hit    Hit    Miss   Hit
Hit    Hit    Miss   Miss
All other cases                 fetch four words at IP       fetch two words at IP and four words at IP+8

A.2.5.3 Fetch Latency

The Instruction Fetch Unit initiates an instruction fetch by requesting quad-word or long-word loads from the BCU. Th
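The fetch decision in Table A-13 can be expressed as a short function, sketched below. The function name and the way fetches are returned (a list of start-offset and word-count pairs) are choices made for this example; the decision rules themselves are transcribed from the table.

```python
def fetch_decision(hits, a3_set):
    """Decide which fetches follow a cache lookup.

    hits   -- four booleans: hit status of the words at IP, IP+4, IP+8, IP+12
    a3_set -- True when A3:2 of the requested IP is 1X, False when it is 0X
    Returns a list of (start, words) tuples; an empty list means no fetch.
    """
    first_pair_ok = hits[0] and hits[1]
    second_pair_ok = hits[2] and hits[3]
    if first_pair_ok and second_pair_ok:
        return []                           # everything hit
    if second_pair_ok:
        return [("IP", 2)]                  # miss confined to IP / IP+4
    if first_pair_ok:
        return [("IP+8", 2)]                # miss confined to IP+8 / IP+12
    if a3_set:
        return [("IP", 2), ("IP+8", 4)]     # "all other cases", A3:2 = 1X
    return [("IP", 4)]                      # "all other cases", A3:2 = 0X


if __name__ == "__main__":
    print(fetch_decision((True, False, True, True), a3_set=False))
    print(fetch_decision((True, True, False, False), a3_set=True))
    print(fetch_decision((False, True, True, False), a3_set=True))
```

Only the last table row depends on A3:2; whenever the miss is confined to one word pair, the same two-word fetch is issued for both address cases.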
99. Mode
Supervisor-Only Operation               User-Mode Fault
modpc (modify process controls)         type mismatch
sysctl (system control)                 constraint privileged
sdma (setup DMA)                        constraint privileged
SFR as instruction operand              type mismatch
Protected internal data RAM write       type mismatch

2.7.2 Using the User-Supervisor Protection Model

A program switches from user mode to supervisor mode by making a system-supervisor call (also referred to as a supervisor call). A system-supervisor call is a call executed with the call-system instruction (calls). With the calls instruction, the IP for the called procedure comes from the system procedure table. An entry in the system procedure table can specify an execution mode switch to supervisor mode when the called procedure is executed. The calls instruction and the system procedure table thus provide a tightly controlled interface to procedures which can execute in supervisor mode. Once the processor switches to supervisor mode, it remains in that mode until a return is performed to the procedure that caused the original mode switch.

Interrupts and some faults also cause the processor to switch from user to supervisor mode. When the processor handles an interrupt, it automatically switches to supervisor mode. However, it does not switch to the supervisor stack. Instead, it switches to the interrupt stack.

Figure 2-6 shows a system which implements the user-supervisor protection model to protect kerne
100. [Figure B-25. DRAM State Machine]

B-34 BUS INTERFACE EXAMPLES

The refresh request timer generates the refresh request signal (REF_REQ), indicating that it is time to refresh the DRAM. The controller gives preference to refresh requests over access requests. This ensures that the entire memory remains refreshed.

The access request signal (ACC_REQ) shown on the state diagram is a latched signal. ACC_REQ is asserted when ADS and DRAM_CS are both asserted. ACC_REQ is deasserted when BLAST is asserted. It is necessary to latch the access request because the controller could be in a refresh or RAS precharge state when the processor accesses the DRAM.

The pseudo-code description below is provided only to describe the state machine diagram. It is not intended to be used directly as PLD equations.

    STATE_0 (Idle):
        RAS is not asserted
        CAS3:0 is not asserted
        COL_ADR is not asserted
        READY is not asserted
        WE = W/R
        IF REF_REQ THEN the next state is STATE_7 (Refresh)
        ELSE IF ADS && DRAM_CS (ACC_REQ) THEN the next state is STATE_1 (Access)
        ELSE the next state is STATE_0
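The idle-state behaviour and the latched access request can be modelled in Python as below. This is a behavioural sketch for reasoning about priorities, not PLD equations: the class and function names are hypothetical, signals are treated as simple active-true booleans, and giving BLAST priority over a simultaneous ADS is an assumption.

```python
class AccessRequestLatch:
    """Latched access request: set by ADS and DRAM_CS, cleared by BLAST."""

    def __init__(self):
        self.pending = False

    def clock(self, ads, dram_cs, blast):
        if blast:                   # assumed priority: BLAST clears the latch
            self.pending = False
        elif ads and dram_cs:
            self.pending = True
        return self.pending


def next_state_from_idle(ref_req, acc_req):
    """STATE_0 transition: refresh requests win over access requests."""
    if ref_req:
        return "STATE_7"            # Refresh
    if acc_req:
        return "STATE_1"            # Access
    return "STATE_0"                # stay idle


if __name__ == "__main__":
    latch = AccessRequestLatch()
    acc = latch.clock(ads=True, dram_cs=True, blast=False)
    # A refresh request arriving together with an access is served first.
    print(next_state_from_idle(ref_req=True, acc_req=acc))
    # The latched access is still pending once the refresh is done.
    print(next_state_from_idle(ref_req=False, acc_req=latch.pending))
```

The second call shows why the latch is needed: the processor's access was made while the controller was busy, yet it is still visible when the controller returns to idle.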
101. Not changed or generated by the instruction C Call 0 Set to 0 by the instruction under any condition S Supervisor 1 Set to 1 by the instruction under any condition BR Breakpoint May be set or cleared by the instruction R Return 0 be cleared but is never set by the instruction P Prereturn be set but is never cleared by the instruction U Unimplemented T Trace Fault Type OP Invalid Operand Operation Fault OC Invalid Opcode A Arithmetic Fault Type IO Integer Overflow C Constraint Fault Type 2 Zero Divide P Privilege Fault Type M Machine Y Type Fault Type R Range Instruction Undefined B Branch March 1994 Page 1 of 18 Order Number 272220 002 Opcode Instruction Format Machine Type i9609 Cx Microprocessor User s Guide Instruction Set Quick Reference The instruction s opcode The encoding format of the instruction Appendix D details the complete encoding for each instruction Indicates which group of parallel processing units are used to execute the instruction Table 2 lists the possible machine types Table 2 Machine Type Shorthand Symbol Definition R Register The instruction is executed in parallel by a processing unit on the Register side of the processor M Memory The instruction is executed in parallel by a processing unit on the Memory side of the processor C Control The instruction is executed by the Instruction Scheduler in pa
102. Offset 3124 23 19 18 14 19002 12 11 10 97 65 40 Opcode src dst ABASE Mode Scale 00 Index Displacement Effective Address efa offset Opcode dst 0 0 offset offset reg Opcode dst reg 1 0 offset reg Opcode dst reg 0 1 0 0 00 disp 8 IP Opcode dst 0 1 0 1 00 displacement reg1 reg2 scale Opcode dst regi 0 1 1 1 scale 00 reg2 disp Opcode dst 1 1 0 0 00 displacement disp reg Opcode dst reg 1 1 0 1 00 displacement disp reg scale AOpcode dst 1 1 1 0 scale 00 reg displacement disp reg1 reg2 scale dst reg1 1 1 1 1 scale 00 reg2 displacement

Opcode   Mnemonic     Opcode   Mnemonic
80       ldob         98       ldl
82       stob         9A       stl
84       bx           A0       ldt
85       balx         A2       stt
86       callx        B0       ldq
88       ldos         B2       stq
8A       stos         C0       ldib
8C       lda          C2       stib
90       ld           C8       ldis
92       st           CA       stis

E-6

APPENDIX F REGISTER AND DATA STRUCTURES

This appendix is a compilation of all register and data structure figures described throughout the manual. Section F.1, Data Structures (pg. F-2) contains diagrams of the memory-resident data structures, listed in order of importance. Section F.2, Registers (pg. F-10) lists all registers alphabetically. Following each figure is a reference that indicates the section that discusses the figure.

Fig.   Register / Data Structure   Where defined in the manual
103. PD A 42 A 2 7 Coding Optimizations ic ie ds eie e e be e repertus A 43 A 2 7 1 6ads and SfOtes onde epit E PD me A 44 A 2 7 2 Multiplication and Division emn A 45 A 2 7 3 Advancing mener A 46 A 2 7 4 Unrolling LOOPS ette tte tret cete ei drei A 46 A 2 7 5 Enabling Constant Parallel Issue sem A 48 A 2 7 6 Alternating from Side to Side A 49 A 2 7 7 Branch Prediction nee tt e P HERE E HERI ER REN ER SER A 53 A 2 7 8 Branch Target Alignment A 53 A 2 7 9 Replacing Straight Line Code and Calls see A 54 A 2 8 Utilizing On chip Storage seem A 55 A 2 8 1 Instruction Cache c a 55 2 8 2 Data Cache 1960 CF Processor Only A 55 A 2 8 3 Register Cache nee ede ee A 56 A 2 8 4 Data RAM idee ED C RE RR EU A 56 A 2 9 S miTIary ashen ee iw ae dn a d em n ER ene A 57 APPENDIX B BUS INTERFACE EXAMPLES B 1 NON PIPELINED BURST SRAM B 1 B 1 1 Background 5o eth bet ORO edi c m DS e iet ei B 1 B 1 2 Implementation 5 eR Deae Umm B 1 B 1 3 Block rect ea e rent iet B 2 B 1 3 1 Chip Select Foglio 5 Rete enden
104. Page
Fig.   Register / Data Structure                                      Where defined in the manual                                          Page
F-1    Control Table                                                  Section 2.3, CONTROL REGISTERS (pg. 2-6)                             F-2
F-2    Fault Record                                                   Section 7.5.1, Fault Record Data (pg. 7-6)                           F-3
F-3    Fault Table and Fault Table Entries                            Section 7.3, FAULT TABLE (pg. 7-4)                                   F-4
F-4    Initial Memory Image (IMI) and Process Control Block (PRCB)    Section 14.2.5, Initialization Boot Record (IBR) (pg. 14-5)          F-5
F-5    Storage of an Interrupt Record on the Interrupt Stack          Section 6.7, INTERRUPT STACK AND INTERRUPT RECORD (pg. 6-9)          F-6
F-6    Interrupt Table                                                Section 6.4, INTERRUPT TABLE (pg. 6-3)                               F-7
F-7    Procedure Stack Structure and Local Registers                  Section 5.2.1, Local Registers and the Procedure Stack (pg. 5-2)     F-8
F-8    System Procedure Table                                         Section 5.5.1.1, Procedure Entries (pg. 5-14)                        F-9
F-9    Arithmetic Controls Register (AC)                              Section 2.6.2, Arithmetic Controls (AC) Register (pg. 2-15)          F-10
F-10   Bus Configuration Register (BCON)                              Section 10.3.2, Bus Configuration Register (BCON) (pg. 10-8)         F-10
F-11   Data Address Breakpoint Registers                              Section 8.2.7, Breakpoint Trace (pg. 8-5)                            F-11
F-12   DMA Command Register (DMAC)                                    Section 13.10.1, DMA Command Register (DMAC) (pg. 13-21)             F-11
F-13   DMA Control Word                                               Section 13.10.3, DMA Control Word (pg. 13-25)                        F-12
F-14   Hardware Breakpoint Control Register (BPCON)                   Section 8.2.7, Breakpoint Trace (pg. 8-5)                            F-13
F-15   Instruction Address Breakpoint Registers IPB0                  Section 8.2.7, Breakpoint Trace (pg. 8-5)                            F-13
105. Record Data pg 7-6) for the remaining parallel faults is then written to the fault record's optional data field, and the fault-handling procedure for parallel faults is invoked.

The first word in the fault record's optional data field (NFP-20) contains information about the parallel faults. The byte at offset NFP-18 contains 00H (encoding for the parallel fault type); the byte at NFP-20 contains the number of parallel faults. The optional data field also contains a 32-byte parallel fault record for each additional fault. These parallel fault records are stored incrementally in the fault record starting at byte offset NFP-97. The fault record for each additional fault contains only the fault type, fault subtype and address-of-faulting-instruction field. AC and PC register values are not given for these faults; these are given in the fault record for the first fault.

7.7 FAULT HANDLING PROCEDURES

The fault-handling procedures can be located anywhere in the address space. Each procedure must begin on a word boundary. The processor can execute the procedure in user mode or supervisor mode, depending on the type of fault table entry.

To resume work on a program at the point where a fault occurred (following the recovery action of the fault-handling procedure), the fault-handling procedure must be executed in supervisor mode. The reason for this requirement is described in section 7.7.3, Returning to the Point in the Program Where
106. The MDU is a REG coprocessor which performs integer and ordinal multiply, divide, remainder and modulo operations. The MDU detects integer overflow and divide-by-zero errors. The MDU is optimized for multiplication, performing extended multiplies (32 by 32) in four to five clocks. The MDU performs multiplies and divides in parallel with the main execution unit.

A.1.6 Address Generation Unit (AGU)

The AGU is a MEM coprocessor which computes the effective addresses for memory operations. It directly executes the load address instruction (lda) and calculates addresses for loads and stores based on the addressing mode specified in these instructions. Address calculations are performed in parallel with the main execution unit (EU).

A.1.7 Data RAM and Local Register Cache

The data RAM and local register cache are part of a 1.5 Kbyte block of on-chip Static RAM (SRAM). One Kbyte of this SRAM is mapped into the i960 Cx processors' address space from location 00000000H to 000003FFH. A portion of the remaining 512 bytes is dedicated to the local register cache. This part of internal SRAM is not directly visible to the user. Loads and stores, including quad-word accesses, to the internal data RAM are typically performed in only one clock. The complete local register set, therefore, can be moved to the local register cache in only four clocks.

A-7

A.1.8 Data Cache (80960CF Only)

196
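The four-clock figure for saving the local register set follows from the one-clock quad-word access. The sketch below works through that arithmetic and also checks whether an address falls inside the mapped data RAM range. It assumes a local register set of sixteen 32-bit registers and a 16-byte quad word; the function names are hypothetical.

```python
DATA_RAM_FIRST = 0x00000000
DATA_RAM_LAST = 0x000003FF       # 1 Kbyte mapped into the address space


def in_internal_data_ram(address):
    """True if the byte address lies in the mapped on-chip data RAM."""
    return DATA_RAM_FIRST <= address <= DATA_RAM_LAST


def clocks_to_move_local_registers(registers=16, bytes_per_register=4,
                                   bytes_per_access=16, clocks_per_access=1):
    """Clocks needed to move the register set with quad-word accesses."""
    total_bytes = registers * bytes_per_register
    accesses = -(-total_bytes // bytes_per_access)    # ceiling division
    return accesses * clocks_per_access


if __name__ == "__main__":
    print(in_internal_data_ram(0x000003FC))    # last word of data RAM
    print(in_internal_data_ram(0x00000400))    # first address past it
    print(clocks_to_move_local_registers())    # 64 bytes in quad-word steps
```

Sixteen registers of four bytes make 64 bytes, which is four quad words and therefore four one-clock accesses.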
107. To summarize the information presented in the previous sections, the processor signals a trace event when it detects any of the following conditions:
• An instruction included in a trace mode group executes (or is about to execute, in the case of a prereturn trace event) and the trace mode for that instruction is enabled.
• An implicit call operation executed and the call trace mode is enabled.
• A mark instruction executed and the breakpoint trace mode is enabled.
• An fmark instruction executed.
• The processor is executing an instruction at an IP matching an enabled instruction address breakpoint register.
• The processor has issued a memory access matching the conditions of an enabled data address breakpoint register.
When the processor detects a trace event and the PC register trace enable bit is set, the processor performs the following action:
1. The processor sets the appropriate TC register trace event flag. If a trace event meets the conditions of more than one of the enabled trace modes, a trace event flag is set for each trace mode condition that is met.
2. The processor sets the PC register trace fault pending flag.
The processor may set a trace event flag and trace fault pending flag before completing execution of the instruction that caused the event. However, the processor only handles trace events between instruction executions. If, when the processor detects a trace event, the PC register trace en
108. Vectors in Internal RAM
Vector Number (Binary)   Vector Number (Decimal)   Internal RAM Address
(NMI)                    248                       0000H
0001 0010                18                        0004H
0010 0010                34                        0008H
0011 0010                50                        000CH
0100 0010                66                        0010H
0101 0010                82                        0014H
0110 0010                98                        0018H
0111 0010                114                       001CH
1000 0010                130                       0020H
1001 0010                146                       0024H
1010 0010                162                       0028H
1011 0010                178                       002CH
1100 0010                194                       0030H
1101 0010                210                       0034H
1110 0010                226                       0038H
1111 0010                242                       003CH
12.3.13 DMA Suspension on Interrupts
Core resources required to execute a DMA operation may impact interrupt latency. A DMA operation may be temporarily suspended to reduce the effects of the DMA when interrupt response time is critical. The DMA suspension option is programmed in the ICON register. When the option is selected, the core suspends DMA processing while executing a call to an interrupt-handling procedure for a hardware-requested interrupt. Once the core begins executing the interrupt procedure, it restores DMA processing. To improve interrupt throughput, DMA processing can be suspended until the execution of an interrupt-handling procedure is complete. To accomplish this, the interrupt procedure must explicitly suspend DMA operation by clearing the DMA command register's channel enable field. See section 13.10.1, DMA Command Register (DMAC) (pg. 13-21) for more information.
12.3.14 Caching Interrupt H
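The vector table at the top of this excerpt follows a regular pattern: every cacheable vector has 0010 in its low four bits, and its internal RAM address is four bytes times the value of its high four bits, with NMI (vector 248) at 0000H. The following is a minimal Python sketch of that mapping, derived only from the table rows shown; the function name is hypothetical and is not part of any Intel tool.

```python
NMI_VECTOR = 248  # NMI entry from the table, cached at internal RAM address 0000H


def cached_vector_address(vector: int) -> int:
    """Return the internal RAM byte address for a cacheable vector number.

    Derived from the table rows only: vectors of the form xxxx0010 (binary)
    map to 4 * xxxx, and the NMI vector maps to address 0.
    """
    if vector == NMI_VECTOR:
        return 0x0000
    if not 18 <= vector <= 242 or (vector & 0x0F) != 0b0010:
        raise ValueError(f"vector {vector} is not listed in the table")
    return (vector >> 4) * 4


if __name__ == "__main__":
    for v in (248, 18, 130, 242):
        print(f"vector {v:3d} -> {cached_vector_address(v):04X}H")
```

Vectors outside the listed set raise an error rather than returning a guess, since the excerpt does not say where such vectors are fetched from.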
109. a DR-based link target. These instructions may be issued in parallel with a machine-type R instruction.
Table A-17. bx and balx Performance
The following instructions consume n issue clocks before target code is issued, where n for each addressing mode is as follows:
Mnemonic: bx, balx. Addressing-mode column headings: disp, offset, (reg), disp(reg), offset(reg), (reg)[reg*scale], disp[reg*scale], disp(reg)[reg*scale], disp(IP). Values of n: 4, 4, 6.
A.2.6.6 Call and Return
Procedure call, return and system procedure call instructions are implemented as micro-flows. call consumes four issue clocks when the target is cached and a register cache location is available. When a frame spill is required, an additional 22 issue clocks are consumed (in a zero wait state system) before the target code begins execution. The worst-case memory activity for a call with a frame spill and a cache miss is one quad-word instruction fetch followed by four quad-word stores. Wait states in the instruction fetch directly impact call speed, while wait states in the frame stores are decoupled from internal execution by the BCU queues. ret consumes four issue clocks when the target and the previous register set are both cached. When a frame fill is required, an additional 38 issue clocks are consumed (in a zero wait state system) before the target code begins execution. The worst-case memory activity for a return with a frame f
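The call and ret figures quoted in section A.2.6.6 can be collected into a small estimator. This is only a sketch under the conditions the text states (target cached, zero wait state system); the function and constant names are hypothetical, and wait states or instruction cache misses are not modeled.

```python
CALL_BASE_CLOCKS = 4    # call: target cached, register cache location available
CALL_SPILL_EXTRA = 22   # additional issue clocks when a frame spill is required
RET_BASE_CLOCKS = 4     # ret: target and previous register set both cached
RET_FILL_EXTRA = 38     # additional issue clocks when a frame fill is required


def call_issue_clocks(frame_spill: bool) -> int:
    """Issue clocks consumed by call before target code is issued."""
    return CALL_BASE_CLOCKS + (CALL_SPILL_EXTRA if frame_spill else 0)


def ret_issue_clocks(frame_fill: bool) -> int:
    """Issue clocks consumed by ret before target code is issued."""
    return RET_BASE_CLOCKS + (RET_FILL_EXTRA if frame_fill else 0)


if __name__ == "__main__":
    print("call, no spill:", call_issue_clocks(False))
    print("call, spill:   ", call_issue_clocks(True))
    print("ret, no fill:  ", ret_issue_clocks(False))
    print("ret, fill:     ", ret_issue_clocks(True))
```

The asymmetry between the spill and fill penalties suggests that deep call chains which overflow the register cache cost more on the way back up than on the way down.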
110. a read access following a write is not pipelined NOTE When pipelining is enabled in a region READY and BTERM are ignored for read and write cycles These must be disabled in regions that use pipelining For pipelined reads the bus controller uses a value of zero for the parameter regardless of the parameter s programmed value A non zero value defeats the purpose of pipelining The programmed value of is used for write accesses to pipelined memory regions Address Y PCLK Address Latch Pipeline Interface Memory Array Data lt 1 1 1 1 1 1 1 1 1 1 1 1 1 Latched 1 1 1 1 1 Figure 11 12 Pipelined Read Memory System 11 21 EXTERNAL BUS DESCRIPTION Function fe Bit 31 23 Value PCLK ADS A31 4 SUP DMA LOCK D31 0 WAIT BLAST DT R 1 2 Z o Byte Order Bus Pipe Exeral wa ee X X X 0 ON i XX 00000 1 x XXXXX 22 21 eg reserved Und G request pelir pipelined reads conclude pipelined reads begin requests begi CX035A 11 22 Figure 11 13 Non Burst Pipelined Read Waveform EXTERNAL BUS DESCRIPTI
111. a special register; do not use r2 to hold operand values. Since interrupts and faults trigger an implicit call action, the RIP register may be written at any time with the return pointer associated with the interrupt or fault event.
5.2.3 Call and Return Action
To clarify how procedures are linked and how the local register and stack are managed, the following sections describe a general call and return operation and the operations performed with the FP, SP, PFP and RIP registers described in the preceding sections. The events for call and return operations are given in a logical order of operation. The i960 Cx processors can execute independent operations in parallel; therefore, many of these events execute simultaneously. For example, to improve performance, the processors often begin prefetch of the target instruction for the call or return before the operation is complete.
5.2.3.1 Call Operation
When a call instruction is executed or an implicit call is triggered:
1. The processor stores the instruction pointer for the instruction following the call in the current stack's RIP register (r2).
2. The current local registers, including the PFP, SP and RIP registers, are saved, freeing these for use by the called procedure. Because saved local registers are cached on the i960 Cx processors, the registers are always saved in the on-chip local register cache at this time.
3. The frame pointer (g15) for the calling procedure is stored in
112. allows up to four consecutive columns within a selected row to be read or written at a high data rate A read or write cycle starts by asserting RAS Strobing CAS accesses the consecutive column data The input address is ignored after the first column access 1 1 RAS CAS 1 1 1 1 1 1 1 1 1 1 DATA Hz F CA111A Figure 11 Nibble Mode Read intel e BUS INTERFACE EXAMPLES B 3 1 2 Fast Page Mode DRAM Fast page mode DRAM Figure B 12 is similar to nibble mode DRAM except fast page mode allows any column within a selected row to be read or written at a high data rate A read or write cycle starts by asserting RAS Strobing CAS accesses the selected column data During reads the CAS falling edge latches the address internal to the DRAM and enables the output The processor s four word burst bus can easily take advantage of the faster column access times provided by fast page mode DRAM 1 n i Row m F CA112A Figure B 12 Fast Page Mode DRAM Read B 17 BUS INTERFACE EXAMPLES intel P B 3 1 3 Static Column Mode DRAM Static column mode DRAM write accesses Figure B 13 are similar to fast page mode writes Static column read
113. and Big Endian nsus 31 24 23 16 15 8 7 0 32 bit 00 1st dd 01 1st dd 10 1st 11 1st dd 16 bit X0 1st X1 1st 8 bit XX 1st dd 11 4 ATOMIC MEMORY OPERATIONS The LOCK Signal LOCK output assertion indicates that the processor is executing an atomic read modify write operation Atomic instructions atadd atmod require indivisible memory access That is another bus agent must not access the target of the atomic instruction between read and write cycles LOCK can be used to implement indivisible accesses to memory 11 26 3 EXTERNAL BUS DESCRIPTION Atomic instructions consist of a load and store request to the same memory location LOCK is asserted in the first address cycle of the load request and deasserted in the cycle after the last data transfer of the store request The LOCK is not active during the Nxpa states for the store request When implementing a locked memory subsystem consider the interaction that the following mechanisms may have with the system A system must account for these conditions during locked accesses e HOLD requests are acknowledged while LOCK is asserted e atomic load or store may be suspended using the BOFF input e DMA request may occur between the atomic load and store requests LOCK indicates that other agents should not write data to any address falling within the quad word boundary of the address on the
114. and Return
The processor offers an on-chip call/return mechanism for making procedure calls. Refer to section 5.2, CALL AND RETURN MECHANISM (pg. 5-2). These instructions support this mechanism:
call    call
callx   call extended
calls   call system
ret     return
call and ret use the CTRL machine instruction format. callx uses the MEM format and can specify local or global registers. calls uses the REG format and can specify local, global or special function registers. call and callx make local calls to procedures. A local call is a call that does not require a switch to another stack. call and callx differ only in the method of specifying the target procedure's address. The target procedure of a call is determined at link time and is encoded in the opword as a signed displacement relative to the call IP. callx specifies the target procedure as an absolute 32-bit address calculated at run time using any one of the addressing modes. For both instructions, a new set of local registers and a new stack frame are allocated for the called procedure. calls is used to make calls to system procedures: procedures that provide a kernel or system-executive services. This instruction operates similarly to call and callx, except that it gets its target procedure address from the system procedure table. An index number included as an operand in the instruction provides an entry point into the procedure table. Depen
115. and clear for dedicated and expanded RAM mode interrupts Vector Cache Enable ICON vce 0 Fetch From External Memory 1 Fetch From Internal RAM Sampling Mode ICON sm 0 debounce 1 fast DMA Suspension ICON dmas 0 run on interrupt 1 suspend on interrupt d m a 5 Interrupt Control Register ICON CA053A Reserved Initialize to 0 Figure F 16 Interrupt Control ICON Register Section 12 3 4 Interrupt Control Register ICON pg 12 11 F 15 REGISTER AND DATA STRUCTURES External Interrupt 0 Field IMAPO xO External Interrupt 1 Field IMAPO x1 External Interrupt 2 Field IMAPO x2 External Interrupt 3 Field IMAPO x3 31 28 24 20 16 Interrupt Map Register 0 IMAPO i External Interrupt 4 Field IMAP1 x4 External Interrupt 5 Field IMAP1 x5 External Interrupt 6 Field IMAP1 x6 External Interrupt 7 Field IMAP1 x7 x 5 4 4 31 28 24 20 16 1 Interrupt Map Register 1 IMAP1 DMA Interrupt 0 Field 2 00 DMA Interrupt 1 Field IMAP2 d1 DMA Interrupt 2 Field IMAP2 d2 DMA Interrupt 3 Field IMAP2 d3 d 1 4 4 31 28 24 20 16 1 Interrupt Register 2 IMAP2 Reserved Initialize to 0 054 Figure F 17 Interrupt Map IMAPO IMAP2 Registers Section 12 3
116. are either one or two words long. The IP gives the address of the lowest-order byte of the first word of the instruction. The IP register cannot be read directly. However, the IP-with-displacement addressing mode allows the IP to be used as an offset into the address space. This addressing mode can also be used with the lda (load address) instruction to read the current IP value. When a break occurs in the instruction stream due to an interrupt, procedure call or fault, the IP of the next instruction to be executed is stored in local register r2, which is usually referred to as the return IP or RIP register. Refer to CHAPTER 5, PROCEDURE CALLS for further discussion of this operation.
2.6.2 Arithmetic Controls (AC) Register
The AC register (Figure 2-4) contains condition code flags, integer overflow flag, mask bit and a bit that controls faulting on imprecise faults. Unused AC register bits are reserved.
Figure 2-4. Arithmetic Controls (AC) Register
No Imprecise Faults Bit (AC.nif): 0 = Some Faults are Imprecise, 1 = All Faults are Precise
Integer Overflow Mask Bit (AC.om): 0 = No Mask, 1 = Mask
Integer Overflow Flag (AC.of): 0 = No Overflow, 1 = Overflow
Condition Code Bits (AC.cc)
Reserved bits: initialize to 0
2.6.2.1 Initializing and Modifying the AC Register
At initialization, the AC register is loaded from the Initial AC image fie
117. asserted WE X the next state is STATE_0 Return to idle Refresh assert CAS RAS is not asserted CAS3 0 is asserted COL_ADR X READY is not asserted WE is not asserted the next state is STATE_8 Refresh assert RAS RAS is asserted CAS3 0 is asserted COL_ADR X READY is not asserted WE is not asserted the next state is STATE_8 STATE_9 Refresh Hold RAS RAS is asserted CAS3 0 is asserted B 36 3 BUS INTERFACE EXAMPLES STATE 0 Idle COL ADR X READY is not asserted WE X the next state is STATE 10 STATE 10 Refresh Hold RAS RAS is asserted CAS3 0 is asserted COL ADR X READY is not asserted WE is not asserted the next state is STATE 5 RAS Precharge B 4 INTERLEAVED MEMORY SYSTEMS Interleaving memory can provide a significant improvement in memory system performance Interleaved memory systems overlap accesses to consecutive addresses this results in higher performance with slower memory Two way memory interleaving is accomplished by dividing the memory into banks one bank for even word addresses one for odd word addresses The least significant address bit A2 is used to select a bank The two banks are read in parallel and the data is put onto the data bus by a multiplexer This can allow the wait states of the second access to be overlapped with the data transfer of the first access Figure B 26 shows the access overlap for
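The bank-selection rule for two-way interleaving described in section B.4 (address bit A2 picks the bank, so even word addresses go to one bank and odd word addresses to the other) can be illustrated with a short Python sketch. The function name is hypothetical, and the assignment of bank 0 to even words is an assumption made for the illustration.

```python
def interleave_bank(byte_address: int) -> int:
    """Return 0 or 1: the bank selected by address bit A2.

    Words are 4 bytes, so A2 is the least significant word-address bit.
    Bank 0 is assumed here to hold even word addresses.
    """
    return (byte_address >> 2) & 1


if __name__ == "__main__":
    # A four-word burst alternates between the two banks, which is what lets
    # the second access overlap with the data transfer of the first.
    for addr in range(0x100, 0x110, 4):
        print(f"{addr:03X}H -> bank {interleave_bank(addr)}")
```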
118. beginning at NFP 1 See Figure 7 4 processor gets the IP for the first instruction of the fault handling procedure from the system procedure table using the index provided in the fault table entry e processor stores the fault return code 0015 in the PFP register return type field If the fault is not a trace fault it copies the state of the system procedure table trace control flag byte 12 bit 0 into the PC register trace enable bit If the fault is a trace fault the trace enable bit is cleared On a return from the fault handling procedure the processor performs the action described in section 5 2 3 2 Return Operation pg 5 6 with the following exceptions 7 16 ntel FAULTS e fault record arithmetic controls field is copied into the AC register the processor is in supervisor mode prior to the return from the fault handling procedure which it should be the fault record process controls field is copied into the PC register f the PC register resume flag is set the processor reads the resumption record from the stack Restoring the PC register restores the trace fault pending flag and trace enable bit values to their pre fault values the processor was in user mode when the fault occurred the mode is set back to user mode otherwise the processor remains in supervisor mode e processor switches back to the stack it was using when the fault occurred If the process
119. bit 16 in the instruction cache configuration word is set, the instruction cache is disabled and all instruction fetches are directed to external memory. Disabling the instruction cache is useful for tracing execution in a software debug environment. The instruction cache remains disabled until one of two operations is performed:
• The processor is reinitialized with a new value in the instruction cache configuration word.
• sysctl is issued with the configure-instruction-cache message type and a cache configuration mode other than disable cache.
The register cache configuration word specifies the number of register sets cached on-chip. The integrated procedure call mechanism saves the local register set when a call executes. Local registers are saved to the local register cache. When this cache is full, the oldest set of local registers is flushed to the stack in external memory. The register cache configuration word's least significant four bits specify the number of local register sets internally cached. The number programmed in this word specifies from 0 to 15 register sets. When more than five register sets are selected, space is taken from internal data RAM for the register cache. See section 5.2, CALL AND RETURN MECHANISM (pg. 5-2) for a complete description of the register caching mechanism.
14.3 REQUIRED DATA STRUCTURES
Several data structures are typically included as part of the IMI because va
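The two configuration-word rules stated in this excerpt (bit 16 set disables the instruction cache; the low four bits of the register cache word give the number of cached register sets, and more than five sets borrows internal data RAM) can be decoded as in the sketch below. The helper names are hypothetical, and the text does not say how much data RAM each extra set consumes, so that amount is not modeled.

```python
ICACHE_DISABLE_BIT = 16   # bit 16 of the instruction cache configuration word
REGISTER_SETS_MASK = 0xF  # least significant four bits of the register cache word
DEDICATED_SETS = 5        # more than five sets takes space from internal data RAM


def instruction_cache_disabled(icache_config_word: int) -> bool:
    """True when bit 16 of the instruction cache configuration word is set."""
    return bool((icache_config_word >> ICACHE_DISABLE_BIT) & 1)


def cached_register_sets(regcache_config_word: int) -> int:
    """Number of local register sets cached on-chip (0 to 15)."""
    return regcache_config_word & REGISTER_SETS_MASK


def borrows_data_ram(regcache_config_word: int) -> bool:
    """True when the selected set count exceeds the five dedicated sets."""
    return cached_register_sets(regcache_config_word) > DEDICATED_SETS


if __name__ == "__main__":
    print(instruction_cache_disabled(0x0001_0000), cached_register_sets(0x8),
          borrows_data_ram(0x8))
```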
120. bit 15 for short words and bit 7 for bytes.
Table 3-3. Memory Contents For Little and Big Endian Example
ADDRESS   DATA
100H      12H
101H      34H
102H      56H
103H      78H
Table 3-4. Byte Ordering for Little and Big Endian Accesses
Access          Example           Little Endian   Big Endian
Byte at 100H    ldob 0x100, r3    12H             12H
Short at 102H   ldos 0x102, r3    7856H           5678H
Word at 100H    ld 0x100, r3      78563412H       12345678H
Figure 3-2. Data Placement in Registers
NOTES: D's are data transferred to/from memory. X's are 0's for ordinal data. X's are sign-bit extensions for integer data.
3.3 MEMORY ADDRESSING MODES
The processor provides nine modes for addressing operands in memory. Each addressing mode is used to reference a byte in the processor's address space. Table 3-5 shows the memory addressing modes, a brief description of each mode's address elements and assembly code syntax.
Table 3-5. Memory Addressing Modes
Mode                          Description                  Assembler Syntax
Absolute offset               offset                       exp
Absolute displacement         displacement                 exp
Register Indirect             abase                        (reg)
  with offset                 abase + offset               exp (reg)
  with displacement           abase + displacement         exp (reg)
  with index                  abase + (index * scale)      (reg) [reg * scale]
  with index and displacement abase + (index
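The byte, short and word values in the little and big endian example tables can be reproduced with a few lines of Python. This is a sketch that models only byte ordering using the four memory bytes from the example; the `load` helper is hypothetical and does not model the ldob, ldos or ld instructions beyond that.

```python
# Memory contents from the example: address -> byte value
MEMORY = {0x100: 0x12, 0x101: 0x34, 0x102: 0x56, 0x103: 0x78}


def load(address: int, size: int, byteorder: str) -> int:
    """Assemble `size` consecutive bytes starting at `address`.

    byteorder is "little" or "big", as accepted by int.from_bytes.
    """
    raw = bytes(MEMORY[address + i] for i in range(size))
    return int.from_bytes(raw, byteorder)


if __name__ == "__main__":
    for label, addr, size in (("Byte at 100H", 0x100, 1),
                              ("Short at 102H", 0x102, 2),
                              ("Word at 100H", 0x100, 4)):
        little = load(addr, size, "little")
        big = load(addr, size, "big")
        print(f"{label}: little={little:X}H big={big:X}H")
```

A single byte reads the same either way, which is why only the short and word rows differ between the two columns.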
121. bit clears all valid bits in the data cache array. This provides a quick method of invalidating all the data cache. The same restrictions apply to setting the data cache invalidate bit that apply to the data cache global disable bit: the data cache is not actually disabled until the second clock following execution of the instruction which sets this bit. The cache invalidate logic transparently manages the case where multiple pending cacheable loads are in the Bus Controller Unit (BCU) queues when the data cache is invalidated. The logic continually invalidates the data cache until all loads have returned from the BCU. This ensures that loads issued before the cache is invalidated are not written to the data cache. The data cache invalidate bit remains set until all pending loads have returned and the cache is invalidated. At that time, the bit is cleared.
ERRATA (7/11/94): DMA Command Register bits 30 (Data Cache Global Disable) and 31 (Data Cache Invalidate) are not defined in Figure 13-9 or in the text that follows the figure. These were correctly defined in the i960 CF Microprocessor Reference Manual Supplement and unintentionally omitted from this manual.
13.10.2 Set Up DMA Instruction (sdma)
sdma configures a DMA channel. The sdma instruction has the following format:
sdma op1, op2, op3 (op1: reg/lit/sfr; op2: reg/lit/sfr; op3: reg)
The three operands are described in Figure 13-10 and in the following text: op
122. bits: g0 ← g2 + g0 + Carry Bit
addc g1, g3, g1 # add high-order 32 bits: g1 ← g3 + g1 + Carry Bit
# 64-bit result is in g0, g1
Opcode: addc 5B0H REG
See Also: addi, addo, subc, subi, subo
9.3.2 addi, addo
Mnemonic: addi (Add Integer), addo (Add Ordinal)
Format: add* src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Adds src2 and src1 values and stores the result in dst. The binary results from these two instructions are identical. The only difference is that addi can signal an integer overflow.
Action: dst ← src2 + src1
Faults: Type Mismatch: non-supervisor reference of an sfr. Arithmetic Integer Overflow: result too large for destination register (addi only). If overflow occurs and AC.om = 1, the fault is suppressed and AC.of is set to 1. The least significant 32 bits of the result are stored in dst.
Example: addi r4, g5, r9 # r9 ← g5 + r4
Opcode: addi 591H REG; addo 590H REG
See Also: addc, subi, subo, subc
9.3.3 alterbit
Mnemonic: alterbit (Alter Bit)
Format: alterbit bitpos, src, dst (bitpos: reg/lit/sfr; src: reg/lit/sfr; dst: reg/sfr)
Description: Copies src value to dst with one bit altered. The bitpos operand specifies the bit to be changed; the condition code determines the value to which the bit is set. If condition code bit 1 = 1, the selected bit is set; otherwise, it is cleared.
Action: if A
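The double-precision addc sequence at the top of this excerpt (low-order words added first, then high-order words plus the carry bit) can be modeled as follows. This is a sketch under a simplifying assumption: only the carry out of each 32-bit add is tracked, and the other condition code effects of addc are ignored. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def addc(src1: int, src2: int, carry_in: int) -> tuple[int, int]:
    """Model dst <- src2 + src1 + carry; return (32-bit result, carry out)."""
    total = (src1 & MASK32) + (src2 & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32


def add64(a: int, b: int) -> tuple[int, int]:
    """Add two 64-bit ordinals with two addc steps; return (result, carry)."""
    low, carry = addc(a & MASK32, b & MASK32, 0)            # carry cleared first
    high, carry = addc((a >> 32) & MASK32, (b >> 32) & MASK32, carry)
    return (high << 32) | low, carry


if __name__ == "__main__":
    result, carry = add64(0x00000001_FFFFFFFF, 1)
    print(f"{result:016X} carry={carry}")
```

The carry must be cleared before the first addc, which is why the low-order step passes 0 explicitly.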
123. bus. When the alternate bus master has completed its accesses, BOFF is deasserted and the suspended request is resumed upon assertion of ADS on the following clock cycle (Figure 11-18). The backoff function differs from the bus hold mechanism. The backoff function suspends a bus request which has already started. The request is later resumed when the pin is deasserted. The bus hold mechanism allows another bus master to control the bus only after all executing bus requests have completed. Backoff can only be used for requests to regions which have the READY/BTERM inputs enabled, with the N_RAD, N_RDD, N_WAD and N_WDD parameters programmed to 0. BOFF may only be asserted during a bus access. Recall that a bus access includes and is bounded by the clock cycle in which ADS is valid and the clock cycle in which BLAST is valid and the READY input is asserted. External logic responsible for asserting BOFF must ensure that the signal is not asserted during idle bus cycles or during bus turnaround cycles. Unpredictable behavior may occur if BOFF is subsequently deasserted during an idle bus or turnaround cycle. It is possible for HOLD and BOFF to be asserted in the same clock cycle. In this case, BOFF takes precedence. The bus is relinquished to a hold request only after the current request is complete. Bus backoff is intended for use with special multiprocessor designs or bus architectures that do not implement
124. bus when LOCK was asserted. LOCK is deasserted after the write portion of an atomic access. It is the responsibility of external arbitration logic to monitor the LOCK pin and enforce its meaning for atomic memory operations. See Figure 11-16.
Figure 11-16. LOCK Signal
11.5 EXTERNAL BUS ARBITRATION
i960 Cx processors provide a shared bus protocol to allow another bus master to access the processor's bus. The processor enters the hold state when an external bus master is granted bus control. In the hold state, the processor's data, address and control lines are floated (high Z) to allow the external bus master to control the bus and memory interface. The HOLD input signal is asserted to indicate that another processor or peripheral is attempting to control the bus. The HOLDA (Hold Acknowledge) output signal acknowledges that the i960 Cx processor has relinquished the bus. Bus pins float on the same clock cycle in which the hold request is granted (HOLDA asserted). When the i960 Cx processors need to access the bus, they use the bus request signal (BREQ) to signal the other processor or peripheral. When the HOLD signal is asserted, the i960 Cx processor grants the hold request (asserts HOLDA) and relinquishes control as follows:
• If the bus is in the idle state, the hold request is granted im
125. cache does not detect modification to program memory by loads, stores or actions of other bus masters. Several situations may require program memory modification, such as uploading code at initialization or uploading code from a backplane bus or a disk. To achieve cache coherence, instruction cache contents can be invalidated after code modification is complete. The sysctl instruction is used to invalidate the instruction cache for the i960 Cx component. sysctl is issued with an invalidate-instruction-cache message type. See section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19). The user program is responsible for synchronizing a program with code modification and cache invalidation. In general, a program must ensure that modified code space is not accessed until modification and cache invalidation are completed. The instruction cache can be turned off, causing all instruction fetches to be directed to external memory. Disabling the instruction cache is useful for debugging or monitoring a system at the instruction prefetch level. To disable the instruction cache, sysctl is executed with the configure-instruction-cache message. When the cache is disabled, the processor depends on a 16-word instruction buffer to provide decoding instructions. The instruction buffer is organized as two sets of two-way set associative cache with a four-word line size. When the main cache is disabled, small loops of code may still execut
126. cacheable load The machine does cacheable load not issue any user process stores until all cacheable loads have returned to the data cache BCU busy The BCU can only support one access on every clock If the BCU is processing a load from a cache miss on the previous cycle it cannot process an instruction on the current cycle The IS stalls issuance of the instruction to the BCU for one clock in this case Data cache busy The data cache is a resource which is shared between returning loads from the BCU and the IS issuing loads stores The IS stalls issuance of a load or store for one clock so the returning load form the BCU can be written into the data cache A 2 4 Processing Units Once the IS issues a group of instructions the appropriate processing units begin instruction execution in parallel with all other processor operations The following sections describe each unit s pipelines and execution times of the instructions they process A 2 4 1 Execution Unit EU The EU performs arithmetic logical move comparison bit and bit field operations The EU receives its instructions over the REG machine bus and receives source operands over the src and src2 buses and returns its result over the dst bus A 20 intel The EU pipeline is shown in Figure A 8 In the clock in which an EU instruction is issued the EU latches the source operands and begins performing the operation In the following clock the instruction com
127. case DMA Throughput used for DMA Latency Calculation Base worst case throughput per request PCLK2 1 cycles Source Synchronized Destination Synchronized Transfer Type M Nro chain Nro chain source to dest data length TO first no EOP with EOP 8 to 8 Multi Cycle 15 22 61 63 85 84 8 to 16 Multi Cycle aligned 17 32 63 71 95 92 unaligned 20 32 62 69 98 92 8 to 32 Multi Cycle aligned 18 53 63 90 96 113 unaligned 18 53 60 90 96 113 16 to 8 Multi Cycle aligned 20 23 69 62 108 81 unaligned 20 23 62 60 108 81 16 to 16 Multi Cycle aligned 20 24 90 89 114 112 unaligned 35 50 112 117 129 138 16 to 32 Multi Cycle aligned 35 42 104 103 150 127 unaligned 55 73 123 136 170 158 32 to 8 Multi Cycle aligned 21 25 92 64 87 83 unaligned 21 28 63 65 87 86 32 to 16 Multi Cycle aligned 20 26 93 89 110 110 unaligned 52 66 120 129 142 150 32 to 32 Multi Cycle aligned 24 33 92 74 94 95 unaligned 30 52 118 93 114 114 128 to 128 Multi Cycle 19 29 63 68 67 75 8 bit Fly by 27 27 59 59 88 80 16 bit Fly by 27 27 59 59 88 80 32 bit Fly by 27 27 59 59 88 80 128 bit Fly by 27 27 59 59 88 80 Additional components of worst case DMA latency depend on DMA controller configuration These components are defined in Table 13 7 and their values are given in Table 13 8 13 43 DMA CONTROLLER intel Table 13 7 DMA Latency Components Set up Des
128. could lock out the DMA store indefinitely. This condition would be compounded by the fact that two of the three BCU queues are assigned to the user process when DMA is active. Allowing DMA stores to be unconditionally issued makes the DMA process more deterministic, but it poses one potential data cache coherency issue. When the user process has a cacheable load pending in the BCU and the DMA issues a store to the same address, stale data can end up in the cache. It is up to the user to synchronize such operations in software. There are three possible solutions:
• Wait for the entire DMA transfer to complete before reading the data.
• Check the DMA destination address to ensure that the DMA has progressed beyond the address in question before reading it.
• Disable the data cache for the memory region while the DMA operation is underway.
A.1.8.10 External I/O and Bus Masters and Cache Coherency
The i960 CF processor implements a single-processor coherency mechanism. There is no hardware mechanism, such as bus snooping, to support multiprocessing. If another bus master can change shared memory, there is no guarantee that the data cache contains the most recent data. The user must manage such data coherency issues in software. Users typically program the MCON0-15 registers such that I/O regions are non-cacheable. Partitioning the system this way eliminates I/O as a source of coherency problems.
129. current IP. Here, the target operand is an effective address, which allows the full range of addressing modes to be used to specify the target instruction's IP. The IP-with-displacement addressing mode allows the instruction to be IP-relative. Indirect branching can be performed by placing the target address in a register, then using a register-indirect addressing mode. Refer to section 3.3, MEMORY ADDRESSING MODES (pg. 3-5) for a complete discussion of the addressing modes.
Action: b: IP ← IP + displacement; resume execution at new IP. bx: IP ← targ; resume execution at new IP.
Faults: Trace (Instruction, Branch): Instruction and Branch Trace Events are signaled after instruction completion. Trace fault is generated if PC.te = 1 and TC.i or TC.b = 1. Operation Unimplemented: execution from on-chip data RAM. Operation Operand: invalid operand value encountered (bx only). Operation Opcode: invalid operand encoding encountered (bx only).
Example: b xyz # IP ← xyz
bx 1332 (ip) # IP ← IP + 8 + 1332; this example uses IP-relative addressing
Opcode: b 08H CTRL; bx 84H MEM
See Also: bal, balx, BRANCH IF, COMPARE AND BRANCH, bbc, bbs
9.3.8 bal, balx
Mnemonic: bal (Branch and Link), balx (Branch and Link Extended)
Format: bal targ (targ: disp)
balx targ, dst (targ: mem; dst: reg)
Description: Stores the address of the instruction following bal or balx in a register, then branches to the instruction specified with the targ operand. The bal and balx instructions are used to call leaf
130. ... 10-13
10.6 BUS CONTROLLER 10-13
10.6.1 Bus Queues 10-14
10.6.2 Data Packing 10-15
10.6.3 Bus Translation Unit and Sequencer 10-15
CHAPTER 11 EXTERNAL BUS DESCRIPTION
11.1 OVERVIEW 11-1
11.1.1 Terminology: Requests and Accesses 11-1
11.1.1.1 Request 11-1
11.1.1.2 Access 11-2
11.1.2 Configuration 11-2
11.2 BUS OPERATION 11-2
11.2.1 Wait States 11-4
11.2.2 Bus ... 11-10
11.2.3 Non-Burst Requests 11-12
11.2.4 Burst Accesses 11-13
11.2.5 Pipelined Read Accesses 11-21
11.3 LITTLE OR BIG ENDIAN MEMORY ... 11-24
11.4 ATOMIC MEMORY OPERATIONS (The LOCK Signal) 11-26
11.5 EXTERNAL BUS ARBITRATION 11-28
11.5.1 Bus Backoff Function (BOFF) 11-29
CHAPTER 12 INTERRUPT CONTROLLER
12.1 OVERVIEW 12-1
12.2 MANAGING INTERRUPT
131. enabled.
Faults: Trace: Instruction and Breakpoint Trace Events are signaled after instruction completion. Trace fault is generated if PC.te = 1 and TC.i or TC.br = 1. Operation Unimplemented: execution from on-chip data RAM.
Example: # Assume that the breakpoint trace mode is enabled.
ld xyz, r4
addi r4 5
mark # Breakpoint trace event is generated at this point in the instruction stream.
Opcode: mark 66BH REG
See Also: fmark, modpc, modtc
9.3.32 modac
Mnemonic: modac (Modify AC)
Format: modac mask, src, dst (mask: reg/lit/sfr; src: reg/lit/sfr; dst: reg/sfr)
Description: Reads and modifies the AC register. src contains the value to be placed in the AC register; mask specifies bits that may be changed. Only bits set in mask are modified. Once the AC register is changed, its initial state is copied into dst.
Action: temp ← AC; AC ← (src and mask) or (AC andnot mask); dst ← temp
Faults: Type Mismatch: non-supervisor reference of an sfr.
Example: modac g1, g9, g12 # AC ← g9, masked by g1; g12 ← initial value of AC
Opcode: modac 645H REG
See Also: modpc, modtc
9.3.33 modi
Mnemonic: modi (Modulo Integer)
Format: modi src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Divides src2 by src1, where both are integers, and stores the modulo remainder of the result in dst. If the result is nonzero, dst has the same sign as sr
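The modac Action (save the old AC, merge src into AC under mask, return the old value in dst) is a standard masked read-modify-write. The Python sketch below mirrors the three steps on plain 32-bit integers; the function name is hypothetical, and no AC bit positions are assumed beyond what the mask expresses.

```python
MASK32 = 0xFFFFFFFF


def modac(ac: int, mask: int, src: int) -> tuple[int, int]:
    """Return (new AC value, dst), following the modac Action:

    temp <- AC
    AC   <- (src and mask) or (AC andnot mask)
    dst  <- temp
    """
    temp = ac & MASK32
    new_ac = ((src & mask) | (ac & ~mask)) & MASK32
    return new_ac, temp


if __name__ == "__main__":
    # Change only the low three bits, leaving every other AC bit untouched.
    new_ac, old_ac = modac(ac=0x00001005, mask=0x7, src=0xFFFFFFF2)
    print(f"new AC={new_ac:08X}H, dst={old_ac:08X}H")
```

Passing a mask of 0 turns the instruction into a pure read of the AC register, since no bit is allowed to change.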
132. ext state is STATE IF LAST && WAIT n ELSE B next state is STATE ELSE next state is STATE 2 Generate address 11 DRAM ADR0 1 DRAM ADR1 1 IF BLAST THEN the next state is STATE 0 ELSE next state is STATE 3

B.3.8 DRAM Controller State Machine
Figure B-19 is a state machine that describes DRAM control logic. The state machine shown, or subsets thereof, may be implemented in a variety of ways depending on the application's requirements. PLD implementations are the easiest, and the design may fit into a variety of high-speed PLDs.
Signals going into the DRAM control logic are ADS, PCLK, W/R, BLAST, WAIT and BE3:0 from the bus controller; DACK0, the DMA acknowledge signal; and DRAM CS, a system-generated chip select that indicates a DRAM access. The DRAM control logic generates RAS, CAS3:0, WE and COL ADR. COL ADR is the control signal for the address multiplexer.
The controller logic relies on the wait state region table and the DMA controller. Programming these on-chip peripherals is described later. DMA acknowledge (DACK0) indicates a DRAM refresh cycle. The DRAM WE signal is generated with combinatorial logic: WE = W/R.
[Figure B-19 (F_CA119A): DRAM controller state diagram; transition labels include ADS & DRAM CS & DACK0 and W/R, selecting READ ACCESS or WRITE ACCESS, and BLAST]
133. first instruction. See section 14.2, INITIALIZATION (pg. 14-2) for a complete description of the processor reinitialization steps.
The reinitialize message is useful for changing the Initial Memory Image. For example, at initialization the interrupt table is moved to RAM so that interrupts may be posted in the table's pending interrupts and priorities fields. In this case, the reinitialize message specifies a new PRCB which contains a pointer to the new interrupt table in RAM. See section 14.3.1, Reinitializing and Relocating Data Structures (pg. 14-11).

4.3.2.5 Load Control Registers
Executing sysctl with message type 04H causes the on-chip control registers to be loaded with data from external memory. Each sysctl invocation causes four words from the Control Register Table in external memory to be read and then placed in their respective internal control registers. Field 1 must contain the number of the register group to be loaded. Table 4-5 shows the register group number and the registers represented in the Control Register Table.
At initialization, or when the processor is reinitialized, all groups in the control table are automatically loaded into the on-chip control registers.

Table 4-5. Control Register Table and Register Group Numbers
Group | Byte Offset in Table | Control Register Loaded
00H Data Address
134. follow do not target registers already in use, the processor can execute those instructions before the prior instruction execution completes. A common application of this feature is to execute one or more single-cycle instructions concurrently with a multi-cycle instruction (e.g., multiply or divide). Example 2-1 shows a case where register scoreboarding prevents a subsequent instruction from executing. It also illustrates overlapping instructions which do not have register dependencies.
Register scoreboarding is implemented for global and local registers but not for SFRs. When an SFR is the destination of a multi-cycle instruction, the programmer must prevent access to the SFR until the multi-clock instruction returns a result to the SFR.

Example 2-1. Register Scoreboarding
# r6 is scoreboarded; add must wait for the previous multiply to complete
# r10 is scoreboarded; and instruction is executed concurrently with multiply

2.2.5 Literals
The architecture defines a set of 32 literals which can be used as operands in many instructions. These literals are ordinal (unsigned) values that range from 0 to 31 (5 bits). When a literal is used as an operand, the processor expands it to 32 bits by adding leading zeros. If the instruction requires an operand larger than 32 bits, the processor zero-extends the value to the operand size. If a literal is used in an instruction that requires integer operand
135. for system data structures and initial configuration information for the core and integrated peripherals (see Figure 14-2). The base address pointers are cached in internal registers at initialization. The base addresses are accessed from these internal registers until the processor is reset or reinitialized.
The initial configuration information is programmed in the arithmetic controls (AC) initial image, the register cache configuration word, the fault configuration word and the instruction cache configuration word. Figure 14-3 shows these configuration words.
[Figure 14-3. Process Control Block Configuration Words (F_CR076A). Fields shown:
AC Register Initial Image: Condition Code Bits (AC.cc); Integer Overflow Flag (AC.of): 0 = no overflow, 1 = overflow; Integer Overflow Mask Bit (AC.om): 0 = enable overflow faults, 1 = mask overflow faults; No Imprecise Faults Bit (AC.nif): controls whether imprecise fault conditions are allowed.
Fault Configuration Word: Mask Non-Aligned Bus Request Fault: 0 = enable the fault, 1 = mask the fault.
Instruction Cache Configuration Word: Disable Instruction Cache: 0 = enable cache, 1 = disable cache.
Register Cache Configuration Word: Number of cached register sets (0 to 15).
Reserved fields: initialize to 0.]
136. fp
lda _user_stack, g1 # new pfp
call move_frame
ldconst 0x001f2403, r3 # PC mask
ldconst 0x000f0003, r4 # PC value
modpc r3, r3, r4 # out of interrupted state
call main # to main routine
terminated:
fmark # cause breakpoint trace fault
b terminated

# move_frame: g0 = new frame pointer, g1 = new previous frame pointer (PFP).
# This routine switches stacks. It should be called using a local call.
# The new stack pointer (SP) is calculated by finding the relative offset
# between the old FP and old SP, then adding this offset to the new FP.

Example 14-2. Startup Routine (init.s) (Sheet 6 of 6)
move_frame:
andnot 0xf, pfp, r3 # old FP
mov g0, r6 # new FP
flushreg
ld 4(r3), r4 # old SP
subo r3, r4, r5 # old SP offset from FP
1:
ldq (r3), r8 # from old frame
addo 16, r3, r3
stq r8, (r6) # to new frame
addo 16, r6, r6
cmpobl r3, r4, 1b
addo g0, r5, r4 # new SP
st g1, (g0) # store new PFP in new frame
st r4, 4(g0) # store new SP in new frame
mov g0, pfp # new
ret

.globl _intr_stack
.globl _user_stack
.globl _supervisor_stack
.bss _user_stack, 0x0200, 6 # default application stack
.bss _intr_stack, 0x0200, 6 # interrupt stack
.bss _supervisor_stack, 0x0600, 6 # fault (supervisor) stack

fault handler:
ldconst F 40
call QUO
ret
default syspro
137. group of registers. These instructions use the REG format:
mov move word
movl move long word
movt move triple word
movq move quad word

4.2.1.3 Load Address
The Load Address instruction (lda) computes an effective address in the address space from an operand presented in one of the addressing modes. lda is commonly used to load a constant into a register. This instruction uses the MEM format and can operate upon local or global registers.
On the i960 Cx processors, lda is useful for performing simple arithmetic operations. The processor's parallelism allows lda to execute in the same clock as another arithmetic or logical operation.

4.2.2 Arithmetic
Table 4-2 lists arithmetic operations and data types for which the i960 Cx processors provide instructions. An X in this table indicates that the microprocessor provides an instruction for the specified operation and data type. The extended shift right operation is an i960 Cx processor-specific extension to the i960 processor family's instruction set.
All arithmetic operations are carried out on operands in registers. Refer to section 4.2.11, Atomic Instructions (pg. 4-18) for instructions which handle specific requirements for in-place memory operations. Arithmetic instructions use the REG format and can operate on local, global or special function registers. The following subsections describe arithmetic instructions for ordinal and integer data types.
138. in the i960 Cx architecture. In a pipelined access, the data cycle and address cycle of two accesses overlap. This is possible because address and data lines are not multiplexed: a valid address can be presented on the address bus while a previous access ends with a data transfer on the data bus. Section 10.2.2, Burst and Pipelined Read Accesses (pg. 10-3) explains how to configure the bus for pipelined accesses.
W/R is a status signal which discerns between a write request (store) and a read request (load or prefetch).
The DT/R and DEN pins are used to control data transceivers. Data transceivers may be used in a system to isolate a memory subsystem or to control loading on data lines. DT/R is used to control transceiver direction; the signal is low for read requests and high for write requests. DT/R is valid on the falling PCLK2:1 edge during the address cycle. DEN is used to enable the transceivers; it is asserted on the rising PCLK2:1 edge following the address cycle. DT/R and DEN timings ensure that DT/R does not change when DEN is asserted.
D/C, DMA and SUP provide information about the source of a bus request. D/C indicates whether the current request is a data access or a code fetch. DMA indicates that the current request is a DMA access. SUP indicates that the current request was originated by a supervisor-mode process. When used with a logic analyzer, these signals aid in software debugging. D/C may also be used to implement separate external data and
139. instruction's opcode.
Instruction Format: The encoding format of the instruction. Appendix D details the complete encoding for each instruction.
Machine Type: Indicates which group of parallel processing units is used to execute the instruction. Table A-2 lists the possible machine types.

Table A-2. Machine Type Shorthand
Symbol | Definition
R | Register: The instruction is executed in parallel by a processing unit on the Register side of the processor.
M | Memory: The instruction is executed in parallel by a processing unit on the Memory side of the processor.
C | Control: The instruction is executed by the Instruction Scheduler in parallel with other R- or M-type instructions.
u | Micro-flow: The processor performs this instruction by issuing a sequence of R-, M- and/or C-type instructions stored in its internal ROM.

The instruction's execution time is listed in two ways: Instruction Issue and Result Latency. These times are not additive; they represent a range, from minimum to maximum, within which actual execution time will fall.
Instruction Issue time is the number of clocks the instruction uses when there are no register or resource dependencies to slow it down. Back-to-back instructions with no dependencies execute at the instruction issue rate.
Result Latency is the length of time that an instruction uses to complete once it begins. Back-to-back instructions which are dependent upon each other execute at the Result Laten
140. instruction set and i960 Cx processor-specific instruction set extensions. Also discussed are the assembly language and instruction encoding formats, various instruction groups and each group's instructions. CHAPTER 9, INSTRUCTION SET REFERENCE describes each instruction, including assembly language syntax, the action taken when the instruction executes, and examples of how to use the instruction.

4.1 INSTRUCTION FORMATS
Instructions described in this manual are in two formats: assembly language and instruction encoding. The following subsections briefly describe these formats.

4.1.1 Assembly Language Format
Throughout this manual, instructions are referred to by their assembly language mnemonics. For example, the add ordinal instruction is referred to as addo. Examples use Intel 80960 assembler assembly language syntax, which consists of the instruction mnemonic followed by zero to three operands separated by commas. In the following assembly language statement example for addo, ordinal operands in global registers g5 and g9 are added together; the result is stored in g7:
addo g5, g9, g7 # g7 = g9 + g5
In the assembly language listings in this chapter, registers are denoted as:
g global register
r local register
sf special function register
# pound sign precedes a comment
All numbers used as literals or in address expressions are assumed to be decimal. Hexadecimal numbers are denoted with a 0x prefix (e.g., 0xffff0
141. interrupt stack adjacent to the new frame that is created for the interrupt handling procedure. It includes the state of the AC and PC registers at the time the interrupt was received and the interrupt procedure pointer number used. Referenced to the new frame pointer address (designated NFP), the saved AC register is located at address NFP-12; the saved PC register is located at address NFP-16.
[Figure 6-3. Storage of an Interrupt Record on the Interrupt Stack. The figure shows the current stack (local, supervisor or interrupt stack) with the FP of the current frame, and the interrupt stack holding an optional padding area, optional data (not implemented for the i960 Cx processor), the interrupt record (saved Process Controls Register at NFP-16, saved Arithmetic Controls Register at NFP-12) and the new frame.]

6.8 INTERRUPT SERVICE ROUTINES
An interrupt handling procedure performs a specific action that is associated with a particular interrupt procedure pointer. For example, one interrupt handler task might be to initiate a DMA transfer.
The interrupt handler procedures can be located anywhere in the non-reserved address space. Since instructions in the i960 processor family architecture must be word aligned, each procedure must begin on a word boundary.
When an interrupt handling procedure is called, the processor allocates a new frame on the interrupt stack and a set of local registers for the procedure. If not already in supervisor mode, the p
142. intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx

Example 14-2. Startup Routine (init.s) (Sheet 4 of 6)
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
.word intx, intx, intx, intx, intx, intx, intx, intx
wor
143. is similar to a system supervisor call. Here, the processor obtains pointers to the interrupt procedures through the interrupt table. The processor always switches to supervisor mode on an interrupt procedure call.
A call to a fault procedure is similar to a system call. Fault procedure calls can be local calls or supervisor calls. The processor obtains pointers to fault procedures through the fault table and, optionally, through the system procedure table.
When a fault call or interrupt call is made, a fault record or interrupt record is placed in the newly generated stack frame for the call. These records hold the machine state and information to identify the fault or interrupt. When a return from an interrupt or fault is executed, machine state is restored from these records. See CHAPTER 7, FAULTS and CHAPTER 6, INTERRUPTS for more information on the structure of the fault and interrupt records.

5.8 RETURNS
The return (ret) instruction provides a generalized return mechanism that can be used to return from any procedure that was entered by call, calls, callx, an interrupt call or a fault call. When ret executes, the processor uses the information from the return type field in the PFP register (Figure 5-5) to determine the type of return action to take.
[Figure 5-5 (F_CA014A). Previous Frame Pointer register fields: Return Status; Return Type Field (PFP.rt); Pre-Return Trace Flag (PFP.p); Previous Frame Pointer Address (PFP.a)]
144. is the pathway between registers and data RAM. Because of the wide internal path, a quad-word read or write is usually performed in a single clock.

10.6 BUS CONTROLLER IMPLEMENTATION
The bus controller consists of four units (see Figure 10-6):
- bus queue
- data packing unit
- translation unit
- sequencer
The i960 Cx processors' instruction fetch unit, execution unit and DMA unit all pass memory requests to the bus controller unit, which arbitrates, queues and executes these requests.
[Figure 10-6. Bus Controller Block Diagram (F_CA032A). Blocks and paths shown include the Queue Unit with store data and load data paths, 32-bit data and address paths, the address bus, data bus and control lines, and the 16-entry Memory Region Configuration Table (MCON 0-15).]

10.6.1 Bus Queue
The bus controller has a queue which contains entries for up to three bus requests. Each queue entry consists of a 32-bit address, up to 128 bits of data (four words) and control information. The bus queue decouples the high-bandwidth (128-bit-wide) internal data buses from the lower-bandwidth (32-bit-wide) external data bus.
Two of these queue entries are reserved for bus requests generated from user code. The third queue entry is used by the DMA controller. If no DMA channels are set up, the third slot is also used by user code. Use
145. literals or local, global or special function registers.
modpc provides a method of reading and modifying PC register contents. Only programs operating in supervisor mode may modify the PC register; however, any program may read it.
The processor provides a flush local registers instruction (flushreg) to save the contents of the cached local registers to the stack. The flush local registers instruction automatically stores the contents of all the local register sets, except the current set, in the register save area of their associated stack frames.
The modify arithmetic controls instruction (modac) allows the AC register contents to be copied to a register and/or modified under the control of a mask. The AC register cannot be explicitly addressed with any other instruction; however, it is implicitly accessed by instructions that use the condition codes or set the integer overflow flag.
sysctl is an i960 Cx processor-specific extension to the i960 family's instruction set which is used to configure the on-chip bus controller, interrupt controller, breakpoint registers and instruction cache. It also permits software to signal an interrupt or cause a processor reset and reinitialization. sysctl may only be executed by programs operating in supervisor mode.
sdma and udma are i960 Cx processor-specific extensions to the i960 family's instruction set which configure and monitor the on-chip DMA control
146. load followed by a byte store. DMA throughput is increased; however, the DMA makes more bus requests to transfer the same amount of data.

13.4.5 Data Alignment
The DMA controller can handle fully unaligned DMA transfers under most circumstances. When both the source and destination address increment, there are no alignment requirements for byte, short or word transfers. The byte count may also be unaligned, or any value. Addresses for all quad-word request transfer modes must always be quad-word aligned, and the byte count must always be evenly divisible by 16.
To interface to external DMA devices, the source or destination address may be set up as fixed. Fixed addresses must always be aligned to the request length boundary. The byte count for the fixed addressing mode must be evenly divisible by the width of the fixed transfer. For example, a 32-bit to 16-bit transfer with a fixed destination address must have the byte count evenly divisible by two. Table 13-3 summarizes the alignment requirements for all DMA transfers.
The byte count alignment depends on DMA controller configuration (see Table 13-2). For proper operation, the byte count must be evenly divisible by the byte count alignment value. For example, the byte count for a 32-bit fly-by transfer must be evenly divisible by four.

Table 13-2. DMA Configuration and Byte Count Alignment
Configuration | Byte Count Alignment
Multi-cycle block mode with byte, short-word or word-long source or dest
147. loads and stores to non-aligned addresses. Therefore, code which generates non-aligned addresses may not be compatible with all i960 processor implementations. The i960 CA/CF processors automatically handle non-aligned load and store requests in microcode. See section 10.4, DATA ALIGNMENT (pg. 10-9).
The address boundaries on which an operand begins can impact processor performance. Operands that span more word boundaries than necessary suffer a cost in speed due to extra bus cycles. In particular, an operand that spans a 16-byte (quad-word) boundary suffers a large cost in speed.
Alignment of architecturally defined data structures in memory is implementation dependent. See section 2.4, ARCHITECTURE-DEFINED DATA STRUCTURES (pg. 2-8). Code which relies on specific alignment of data structures in memory is not portable to every i960 processor type.
For each i960 processor type, stack frame alignment is defined according to an SALIGN parameter. The alignment boundary is calculated from the relationship SALIGN x 16. In the i960 Cx processors, SALIGN = 1, so stack frames are aligned on 16-byte boundaries. The low-order bits of the Frame Pointer are ignored and are always interpreted to be zero. The N parameter is defined by the following expression: SALIGN x 16 = 2^N. Thus, for the i960 Cx processors, N is 4.

C.3 RESERVED LOCATIONS IN REGISTERS AND DATA STRUCTURES
Some register and data st
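The SALIGN relationship above can be checked with a short Python sketch. The helper names `frame_alignment` and `effective_fp` are hypothetical, and reading "the low-order bits" as the low-order N bits of the Frame Pointer is an assumption drawn from the expression SALIGN x 16 = 2^N rather than something the text states outright.

```python
def frame_alignment(salign):
    """Return (boundary_in_bytes, N) such that salign * 16 == 2**N."""
    boundary = salign * 16
    n = boundary.bit_length() - 1
    if 1 << n != boundary:
        raise ValueError("SALIGN * 16 must be a power of two")
    return boundary, n

def effective_fp(fp, salign=1):
    """Frame Pointer value with its low-order N bits read as zero (assumed)."""
    boundary, _ = frame_alignment(salign)
    return fp & ~(boundary - 1) & 0xFFFFFFFF

print(frame_alignment(1))             # i960 Cx case: SALIGN = 1
print(hex(effective_fp(0x1234567B)))  # hypothetical FP value
```

Portable code would take SALIGN as a parameter in this way instead of hard-coding the 16-byte boundary of the i960 Cx processors.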
148. normally not possible. Imprecise faults are described in section 7.9, PRECISE AND IMPRECISE FAULTS (pg. 7-17). Unless imprecise faults are disallowed, a parallel fault handling procedure generally does not attempt to recover from the faults but instead calls a debug monitor to analyze the faults. If recovery from every parallel fault is possible, the RIP allows the processor to resume executing the program when the fault handling has completed.
Even though multiple faults can be generated by multiple instructions executing in parallel, only one fault is ordinarily generated per instruction, as described in section 7.6.1, Multiple Faults (pg. 7-9).

7.6.7 Fault Record for Parallel Faults
Figure 7-5 shows the structure of the fault record for parallel faults.
[Figure 7-5. Fault Record for Parallel Faults (F_CA022A). For fault n, the record occupies NFP-8-((n+1) x 32) through NFP-4-((n+1) x 32), the latter holding the Address of Faulting Instruction; for fault 2 these are NFP-104 and NFP-100. Remaining fields are reserved.]
To calculate byte offsets, n indicates the fault number. Thus, for the second fault recorded (n = 2), the relationship NFP-4-((n+1) x 32) reduces to NFP-100. For the i960 Cx processors, the number of parallel faults allowed is 2 or 3.
When multiple parallel faults occur, the processor selects one of the faults and records it in the first 16 bytes of the fault record, as described in section 7.5.1, Fault
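The byte-offset relationship for parallel fault records can be evaluated with the sketch below. It assumes the relationship reads NFP - 4 - ((n + 1) x 32), which is the reading that reproduces the NFP-100 value quoted for n = 2; the function name and the sample NFP value are hypothetical.

```python
def faulting_instruction_slot(nfp, n):
    """Address of the 'Address of Faulting Instruction' word for fault n.

    Assumes the relationship NFP - 4 - ((n + 1) * 32) given for Figure 7-5.
    """
    if n < 2:
        raise ValueError("the first fault is held in the first 16 bytes of the record")
    return nfp - 4 - (n + 1) * 32

nfp = 0x00008000  # hypothetical new frame pointer
print(nfp - faulting_instruction_slot(nfp, 2))  # distance below NFP for fault 2
print(nfp - faulting_instruction_slot(nfp, 3))  # distance below NFP for fault 3
```

Each additional parallel fault moves the slot a further 32 bytes below NFP, so a handler can walk the record with a fixed stride.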
149. operations. The MDU receives its instructions over the REG machine bus and source operands over the src1 and src2 buses, and returns its result over the dst bus. Once the IS issues an MDU instruction, the MDU performs its operations in parallel with all other execution.
The MDU pipeline for the 32x32 mulo instruction is shown in Figure A-9. In the clock in which the multiply is issued, the MDU latches the source operands and begins the operation. The multiply completes and the result is written to the destination register in the fifth clock following the clock in which the instruction was issued.
When an instruction immediately follows a multiply and references the multiply's destination, the instruction is not issued until the clock in which the multiply result is returned. For example, an addo which follows a multiply and references the destination of the multiply is delayed until the fourth clock after the processor issues the multiply. This five-clock multiply latency is easily hidden: four to eight instructions could be placed between the multiply and the add without increasing the total number of processor clocks used.
addo g0, g1, g2
mulo g3, g4, g5
addo g5, g6, g7
[Figure A-9. MDU Execution Pipeline. Rows shown: Instruction Scheduler issue (addo, mulo, addo); EU pipeline (read src1/src2 g0, g1 and g5, g6; write g2 and g7); MDU pipeline (read src1/src2 g3, g4; write dst g5).]
150. placed on lines D31:0.
[Figure 11-4. Data Width and Byte Enable Encodings (F_CA034A)]
The four byte enable signals are encoded in each region to generate proper address signals for 8-, 16- or 32-bit memory systems:
- 8-bit region: BE0 is address line A0; BE1 is address line A1.
- 16-bit region: BE1 is address line A1; BE3 is the byte high enable signal (BHE); BE0 is the byte low enable signal (BLE).
- 32-bit region: byte enables are not encoded. Byte enables BE3:0 select byte 3 to byte 0, respectively.
Address lines A31:2 provide the most significant portion of the address. See Table 11-2.
For regions configured for 8- and 16-bit bus widths, data is repeated on the upper data lines for aligned store operations. When storing a value to an 8-bit bus region, the processor drives the same byte-wide data onto lines D7:0, D15:8, D23:16 and D31:24 simultaneously. When storing a value to memory in a 16-bit bus region, the processor drives the same short-word data onto lines D15:0 and D31:16 simultaneously.

Table 11-2. Byte Enable Encoding
8-Bit Bus Width
BYTE | BE3 (X) | BE2 (X) | BE1 (A1) | BE0 (A0)
0 | X | X | 0 | 0
1 | X | X | 0 | 1
2 | X | X | 1 | 0
3 | X | X | 1 | 1
16-Bit Bus Width
BYTE | BE3 (BHE) | BE2 (X) | BE1 (A1) | BE0 (BLE)
0 1 X 0 0 2 3 X 1 0 0 1 X 0 0 1 0
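The data replication described for 8-bit and 16-bit regions can be modeled with the following Python sketch. The function names are hypothetical, and the models cover only the aligned store case and the 8-bit address decoding named in the text.

```python
def replicate_store_data(value, bus_width):
    """32-bit pattern driven on D31:0 for an aligned store to a region of the given width."""
    if bus_width == 8:
        return (value & 0xFF) * 0x01010101    # same byte on D7:0, D15:8, D23:16, D31:24
    if bus_width == 16:
        return (value & 0xFFFF) * 0x00010001  # same short word on D15:0 and D31:16
    if bus_width == 32:
        return value & 0xFFFFFFFF             # no replication on a 32-bit bus
    raise ValueError("bus width must be 8, 16 or 32")

def byte_address_8bit(be1, be0):
    """Byte number selected in an 8-bit region, where BE1 acts as A1 and BE0 as A0."""
    return (be1 << 1) | be0

print(hex(replicate_store_data(0xAB, 8)))
print(hex(replicate_store_data(0x1234, 16)))
print(byte_address_8bit(1, 0))
```

Because every byte lane carries the same data, a narrow memory can be wired to any lane without extra multiplexing for stores.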
151. previous stack frame. Although fill is a function of bus, fill equals 36 when the stack is in external zero-wait-state memory.
frames: The number of register sets flushed to memory.
fixup: When the shrdi instruction concludes, a four-clock micro-flow executes if any bits shifted out were set and the source operand was negative. Fixup is four clocks for this case. Fixup is zero clocks for positive operands and for negative operands in which only zeros are shifted out.

i960 Microprocessor Instruction Set Quick Reference
March 1994, Order Number 272220-002
i960 Cx Microprocessor User's Guide: Instruction Set Quick Reference
For each instruction, this quick reference lists mnemonic, name, assembler syntax, action, opcode, instruction format, machine type and execution time.
Mnemonic: The acronym recommended for use by i960 processor assemblers.
Name: A descriptive name for the instruction.
Assembler Syntax: The recommended operand ordering and syntax for i960 processor assemblers.
Action: An abbreviated algorithmic description of the action that the instruction performs, including modification to the AC, PC or TC registers. Any possible faults generated are also listed. Table 1 describes the meaning of the shorthand used in the register and fault sections of the reference.

Table 1. Action Shorthand
Symbol | Definition | Symbol | Definition
152. procedure is not available in the cache, these local registers must be retrieved from the procedure stack in memory. This operation is referred to as a frame fill. Figure 5-3 illustrates a return operation with and without a frame fill.
Register cache size is specified at initialization by the register cache configuration word value in the PRCB. The i960 Cx register cache size is adjustable to hold from 1 to 14 sets of local registers. See section 14.2.6, Process Control Block (PRCB) (pg. 14-8) for more information about initialization and the PRCB.
[Figure 5-2. Frame Spill (F_CA011A). Shows a call with no frame spill and a call with frame spill: the procedure stack (0 = main; successive numbers indicate nested procedure level), the local register cache (default depth: 5 sets), the current local register set, user stack space, and the space reserved on the procedure stack in which local register set n is stored.]
[Figure 5-3. Frame Fill (F_CA012A). Shows a return with no frame fill and a return with frame fill, using the same elements as Figure 5-2.]
Up to five local register sets are cached by default with no im
153. procedures (procedures that do not call other procedures). The IP saved in the register provides a return IP that the leaf procedure can branch to, using a b or bx instruction, to perform a return from the procedure. Note that these instructions do not use the processor's call and return mechanism, so the calling procedure shares its local register set with the called leaf procedure.
With bal, the address of the next instruction is stored in register g14. The targ operand value can be no farther than -2^23 to (2^23 - 4) bytes from the current IP. When using the Intel i960 processor assembler, targ must be a label which specifies the target instruction's IP.
balx performs the same operation as bal, except that the next instruction address is stored in dst, allowing the return IP to be stored in any available register. With balx, the target instruction can be farther than -2^23 to (2^23 - 4) bytes from the current IP. Here, the target operand is a memory type, which allows the full range of addressing modes to be used to specify the target IP. The IP + displacement addressing mode allows the instruction to be IP-relative. Indirect branching can be performed by placing the target address in a register and then using a register-indirect addressing mode.
Refer to section 3.3, MEMORY ADDRESSING MODES (pg. 3-5) for a complete discussion of addressing modes available with memory-type operands.
Action: bal: g14 <- IP + 4 (next IP; destination is always g14); IP <- IP + displacement (resume execu
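The bal displacement limit can be expressed as a simple range check. This sketch assumes the limit is -2^23 to (2^23 - 4) bytes, and it adds a word-alignment test because instructions in this architecture are word aligned; the function name and the sample addresses are hypothetical.

```python
BAL_MIN = -(2 ** 23)     # assumed lower displacement limit, in bytes
BAL_MAX = 2 ** 23 - 4    # assumed upper displacement limit, in bytes

def bal_reachable(ip, target):
    """True if target can be reached from ip with bal; otherwise balx is needed."""
    displacement = target - ip
    return displacement % 4 == 0 and BAL_MIN <= displacement <= BAL_MAX

print(bal_reachable(0x00001000, 0x00001000 + BAL_MAX))   # at the upper limit
print(bal_reachable(0x00001000, 0x00001000 + 2 ** 23))   # one word too far
```

An assembler or linker would make the same decision when choosing between bal and the memory-type target of balx.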
154. program to synchronize with a completed chained buffer transfer. With either mechanism, an interrupt is generated when the chained buffer is complete. The distinctions between the mechanisms are:
1. DMA operation continues with no delay on the next chaining buffer. The interrupt service routine may process the data transferred for the completed buffer.
2. DMA waits until the user program processes the first chaining buffer and sets up the next buffer transfer by modifying the chaining descriptors. DMA continues with the next buffer transfer when a bit in the DMA control register (DMAC) is cleared.
These options are selected when the DMA channel is set up with the sdma instruction.
[Figure 13-8. Synchronizing to Chained Buffer Transfers (F_CA065A). Shows the two mechanisms side by side: chaining buffers (Buffer 1, Buffer 2) and the interrupt procedure for each; in the second case the interrupt procedure executes a clrbit on sf2.]

13.8 TERMINATING A DMA
A DMA operation normally ends when one of the following events is encountered:
- DMA byte count reaches zero (0) for a non-chained DMA mode.
- EOP3:0 pin, programmed as an input, becomes active for a channel that is non-chained, source-only chained or destination-only chained.
- EOP3:0 pin, programmed as an input, becomes active during the last buffer transfer for a channel which is s
155. provide eight external interrupt pins and one non-maskable interrupt pin for detecting external interrupt requests. The eight external pins can be configured as dedicated inputs, where each pin is capable of requesting a single interrupt. The external pins can also be configured in an expanded mode, where the value asserted on the external pins represents an interrupt vector number. In this mode, up to 248 values can be directly requested with the interrupt pins. The external interrupt pins can also be configured in mixed mode. In this mode, some pins are dedicated inputs and the remaining pins are used in expanded mode.

12.3.1 Pin Descriptions
The interrupt controller provides nine interrupt pins:
XINT7:0 External Interrupt (input). These eight pins cause interrupts to be requested. Pins are software configurable for three modes: dedicated, expanded, mixed. Each pin can be programmed as an edge- or level-detect input. Also, a debounce sampling mode for these pins can be selected under program control.
NMI Non-Maskable Interrupt (input). This edge-activated pin causes a non-maskable interrupt event to occur. NMI is the highest priority interrupt recognized. A debounce sampling mode for NMI can be selected under program control.
These pins are internally synchronized.

12.3.2 Interrupt Detection Options
XINT7:0 pins can be programmed for level-low or falling-edge detection when used as dedicated i
156. reg sfr i I U IO M 59 3 REG R 0 5 1 1 dst src2 srcl Subtract Ordinal subo srcl src2 dst reg lit sfr reg lit sfr reg sfr I I U 592 REG R 0 5 1 1 dst lt src2 srcl March 1994 Page 16 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic Description nif om of 3 va p p em te Events Modes T Opcode Format pun jestuction 7 Synchronize Faults synef AC nif s 1 U 66F REG u 4 4 Wait until no imprecise fault could occur associated with instructions which have begun but are not completed System Control srcl sre2 sre3 sysctl reg lit sfr reg lit sfr reg lit 65 9 REG u 80960CA sre and Oxff gt gt 8 switch i ase 0 Post an Interrupt c I U M 37 bus 37 bus ase T Purge the Instruction Cache DNE I U M 38 38 ase 2 Configure the Instruction Cache T Kbyte cache enabled I Z U M 52 52 1 Kbyte cache disabled 48 48 Toad and lock 1 Kbyte 2078 bus 2078 bus Toad and lock 512 bytes 1103 bus 1103 bus break
157. registers. For ldl, dst must specify an even-numbered register (e.g., g0, g2 or r0, r2). For ldt and ldq, dst must specify a register number that is a multiple of four (e.g., g0, g4, g8 or r0, r4, r8). Results are unpredictable if registers are not aligned on the required boundary or if data extends beyond register g15 or r15 for ldl, ldt or ldq. Action: ld: dst <- memory_word(src); ldob: dst <- memory_byte(src), zero-extended to 32 bits; ldos: dst <- memory_short(src), zero-extended to 32 bits; ldib: dst <- memory_byte(src), sign-extended to 32 bits; ldis: dst <- memory_short(src), sign-extended to 32 bits; ldl: dst <- memory_long(src); ldt: dst <- memory_triple(src); ldq: dst <- memory_quad(src). Faults: Operation Unaligned: An unaligned src was referenced and bit 30 of the Fault Configuration Word is 0. Invalid Operand: Invalid operand value encountered. Opcode: Invalid opcode encoding encountered. Example: ldl 2450(r3), r10 # r10, r11 <- r3 + 2450 in memory. Opcode: ld 90H MEM; ldob 80H MEM; ldos 88H MEM; ldib C0H MEM; ldis C8H MEM; ldl 98H MEM; ldq B0H MEM. See Also: MOVE, STORE. 9.3.30 lda. Mnemonic: lda (Load Address). Format: lda src, dst (src: mem or efa; dst: reg). Description: Computes the effective addr
158. requests. The bus controller executes these memory processes as it would a load, store or prefetch request from the user process. External bus access is shared equally between the user and DMA process. The bus controller executes bus requests by each process in alternating fashion. The DMA controller is configurable to best exploit the core's processing capabilities and external bus performance. Source and destination request lengths are programmed for each DMA channel. Based on request length, the DMA controller optimizes transfer performance between source and destination with different external data bus widths. A DMA can be programmed for quad-word transfers, taking best advantage of external bus burst capabilities. The DMA controller can also efficiently execute transfers of unaligned data. A single-cycle "fly-by" transfer mode gives the highest performance transfers for a DMA. In this mode, a single bus request executes a transfer of data from source to destination. A data-chaining mode simplifies several commonly performed DMA operations, such as scatter or gather. Data-chained DMAs are configured with a series of descriptors in memory. Each descriptor describes the transfer of a single buffer or portion of the entire DMA. These descriptors can be dynamically changed as the chained DMA progresses. DMA setup and control is simple and efficient. The setup DMA (sdma) instruction sets up a DMA operation. sdma specifi
159. scale) + displacement; exp (reg)[reg*scale]. Index with displacement: (index*scale) + displacement; exp [reg*scale]. IP with displacement: IP + displacement + 8; exp (IP). NOTE: reg is register and exp is an expression or symbolic label. 3.3.1 Absolute. Absolute addressing modes allow a memory location to be referenced directly as an offset from address 0H. At the instruction encoding level, two absolute addressing modes are provided: absolute offset and absolute displacement, depending on offset size. For the absolute offset addressing mode, the offset is an ordinal number ranging from 0 to 4095. The absolute offset addressing mode is encoded in the MEMA machine instruction format. For the absolute displacement addressing mode, the offset is an integer (a displacement) ranging from -2^31 to 2^31 - 1. The absolute displacement addressing mode is encoded in the MEMB format. Encoding-level addressing modes and instruction formats are described in CHAPTER 9, INSTRUCTION SET REFERENCE. At the assembly language level, the two absolute addressing modes are combined into one. Both modes use the same syntax. Typically, development tools allow absolute addresses to be specified through arithmetic expressions (e.g., x + 44) or symbolic labels. After evaluating an address specified with the absolute addressing mode, the assembler converts the address into an offset or displacement and selects the appropriate instruction e
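The format selection described above for absolute addresses can be sketched as follows. This is a minimal illustration in Python, not assembler source: the function name `select_absolute_format` is hypothetical, and the only rules it applies are the two ranges quoted in the text (0 to 4095 for MEMA, a signed 32-bit displacement for MEMB).

```python
def select_absolute_format(address: int) -> str:
    """Pick the machine format an assembler could use for an absolute address.

    Hypothetical helper: it applies only the two ranges stated in the text.
    """
    if 0 <= address <= 4095:
        # Absolute offset: an ordinal from 0 to 4095, encoded in MEMA.
        return "MEMA"
    if -2**31 <= address <= 2**31 - 1:
        # Absolute displacement: a signed 32-bit integer, encoded in MEMB.
        return "MEMB"
    raise ValueError("address does not fit in a 32-bit displacement")


if __name__ == "__main__":
    for addr in (0, 4095, 4096, -1):
        print(addr, select_absolute_format(addr))
```

Small non-negative addresses take the MEMA form; anything else that fits in 32 bits falls through to MEMB.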
160. sfr. Faults: Arithmetic Zero Divide: The src1 operand is 0. Integer Overflow: Result is too large for destination register (remi only). If overflow occurs and AC.om = 1, the fault is suppressed and AC.of is set to 1. The least significant 32 bits of the result are stored in dst. Example: remo r4, r5, r6 # r6 <- r5 rem r4. Opcode: remi 748H REG; remo 708H REG. See Also: modi. 9.3.46 ret. Mnemonic: ret (Return). Format: ret. Description: Returns program control to the calling procedure. The current stack frame (i.e., that of the called procedure) is deallocated and the FP is changed to point to the calling procedure's stack frame. Instruction execution is continued at the instruction pointed to by the RIP in the calling procedure's stack frame, which is the instruction immediately following the call instruction. As shown in the action statement below, the return-status field and prereturn-trace flag determine the action that the processor takes on the return. These fields are contained in bits 0 through 3 of register r0 of the called procedure's local registers. Refer to section 5.2.3, Call and Return Action (pg. 5-5), for discussion of ret. Action: wait for any uncompleted instructions to finish; case return_type is: if (PFP.rt = 001₂ or PFP.rt = 111₂) # return from fault or interrupt handler: AC <- memory(FP - 12); if (PC.em = supervisor) PC <- memory(FP
161. significant byte at the highest byte address in memory. So, if a big-endian ordered word is stored at address 600, the least significant byte is stored at address 603 and the most significant byte at address 600. The i960 Cx processors use little-endian byte ordering internally for data in registers and data in internal data RAM. Data in memory (except for internal data RAM) can be stored in either little- or big-endian order. A bit in the region table entry for a memory region determines the type of byte ordering used in that region. Data and instructions can be located in either big- or little-endian regions. Both byte ordering methods are supported for short-word and word data types. Table 11-4 shows how word, half-word and byte data types are transferred on the bus according to the type of byte ordering used for the selected memory region and bus width (32, 16 or 8 bits). All transfers shown in the table are aligned memory accesses. For the word data type, assume that a hexadecimal value of aabbccddH is stored in an internal i960 Cx processor register, where aa is the word's most significant byte and dd is the least significant byte. Table 11-4 shows how this word is transferred on the bus to either a little-endian or big-endian region of memory. For the half-word data type, assume that a hexadecimal value of ccddH is stored in one of the i960 Cx processor's internal registers. Note that the half wo
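The two byte orderings can be illustrated with the word aabbccddH used in the text. The sketch below is a plain Python model, not processor code; the helper name `memory_image` and the base address 600 are chosen only for illustration.

```python
def memory_image(value: int, size: int, base: int, big_endian: bool) -> dict:
    """Return {address: byte} for a value stored at `base`.

    Hypothetical helper that models byte placement only; it does not model
    the bus or the region table.
    """
    order = "big" if big_endian else "little"
    data = value.to_bytes(size, order)
    return {base + i: b for i, b in enumerate(data)}


if __name__ == "__main__":
    word = 0xAABBCCDD
    for big in (False, True):
        image = memory_image(word, 4, 600, big)
        label = "big-endian   " if big else "little-endian"
        print(label, {addr: f"{b:02x}" for addr, b in image.items()})
```

In the big-endian image the most significant byte (aa) lands at address 600 and the least significant byte (dd) at 603, matching the example in the text; the little-endian image is the reverse.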
162. that causes the fault; during instruction execution; immediately following execution. When the fault occurs before the faulting instruction is executed, the faulting instruction may be re-executed upon return from the fault-handling procedure. When a fault occurs during or after execution of the faulting instruction, the fault may be accompanied by a program state change such that program execution cannot be resumed after the fault is handled. For example, when an integer overflow fault occurs, the overflow value is stored in the destination. If the destination register is the same as one of the source registers, the source value is lost, making it impossible to re-execute the faulting instruction. In general, resumption of program execution with no changes in the program's control flow is possible with the following fault types or subtypes: Operation Subtypes; Arithmetic Zero Divide; Constraint Subtypes; Trace Subtypes; Length. Resumption of the program may or may not be possible with the following fault subtype: Integer Overflow. The effect of specific fault types on a program is defined in section 7.10, FAULT REFERENCE (pg. 7-20), under the heading Program State Changes. 7.7.3 Returning to the Point in the Program Where the Fault Occurred. As described in section 7.7.2, Program Resumption Following a Fault (pg. 7-12), most faults can be handled such that program control flow
163. the DMA process can take up to four clocks for every one user process clock. The effect of the throttle bit on DMA performance is fully described in section 13.11.10, Performance (pg. 13-36). Data cache global disable bit (bit 30) controls the global enabling and disabling of the data cache. After each region is configured as either cacheable or non-cacheable through the Region Table entries, the data cache must still be globally enabled. Set this bit to 0 to enable the data cache; set it to 1 to globally disable the data cache. Setting this bit only disables the data cache; it does not invalidate any of the entries. When the data cache is disabled, all loads and stores are treated as non-cacheable. Data is not written into the cache for either a load or store. After reset, the data cache is initially disabled and invalidated, with this bit set to 1. For implementation reasons, the data cache is not actually disabled until the second clock following execution of the instruction which sets this bit. Any load or store issued in parallel or in the clock after this instruction is still directed to the data cache. The following code can be used to dynamically disable the data cache: setbit 30, sf2, sf2 # set the bit to dynamically disable data cache; mov g0, g0 # wait two clocks before executing any code; mov g0, g0 # which accesses the data cache. Data cache invalidate bit (bit 31) is set to invalidate the entire data cache. Setting this
164. the address increment. Maximum burst size is four data transfers per access. For an 8- or 16-bit bus, this means that some bus requests may result in multiple burst accesses. For example, a quad-word (16-byte) request to an 8-bit memory results in four 4-byte burst accesses. Each burst access is limited to four byte-wide data transfers. Burst accesses on a 32-bit bus are always aligned to even-word boundaries. Quad-word and triple-word accesses always begin on quad-word boundaries (A3:2 = 00); double-word transfers always begin on double-word boundaries (A2 = 0); single-word transfers occur on single-word boundaries. See Figure 11-7. [Figure 11-7. 32-Bit-Wide Data Bus Bursts: quad-word, triple-word and double-word bursts] Burst accesses for a 16-bit bus are always aligned to even short-word boundaries. A four short-word burst access always begins on a four short-word boundary (A2 = 0, A1 = 0). Two short-word burst accesses always begin on an even short-word boundary (A1 = 0). Single short-word transfers occur on single short-word boundaries (see Figure 11-8). For a 16-bit bus, data is transferred on data pins D15:0. Data is also driven on upper data lines D31:16. Burst accesses for an 8-bit bus are always aligned to even byte boundaries. Four-byte burst accesses always begin on a 4-byte boundary (A1 = 0, A0 = 0). Two-byte burst a
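The arithmetic behind the quad-word example can be sketched as a small Python calculation. It assumes an aligned request and applies only the stated limit of four data transfers per burst access; the function name `burst_accesses` is hypothetical, and alignment-induced splitting is not modelled.

```python
def burst_accesses(request_bytes: int, bus_width_bits: int) -> int:
    """Number of burst accesses needed for one aligned bus request.

    Assumption: the request is aligned, so the only limit is the stated
    maximum of four data transfers per burst access.
    """
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("bus width must be 8, 16 or 32 bits")
    bytes_per_transfer = bus_width_bits // 8
    # Ceiling division: transfers needed to move the whole request.
    transfers = -(-request_bytes // bytes_per_transfer)
    # Ceiling division again: at most four transfers per burst access.
    return -(-transfers // 4)


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(f"16-byte request on {width}-bit bus:", burst_accesses(16, width))
```

For a 16-byte request this gives four accesses on an 8-bit bus, in agreement with the example in the text, and fewer on the wider buses.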
165. the current stack's PFP register (r0). The return-type field in the PFP register is set according to the call type which is performed. See section 5.8, RETURNS (pg. 5-16). 4. A new stack frame is allocated by using the stack pointer value saved in step 3. This value is first rounded to the next 16-byte boundary to create a new frame pointer, then stored in the FP register. Next, 64 bytes are added to create the new frame's register save area. This value is stored in the SP register. 5. The instruction pointer is loaded with the address of the first instruction in the called procedure. The processor gets the new instruction pointer from the call, the system procedure table, the interrupt table or the fault table, depending on the type of call executed. Upon completion of these steps, the processor begins executing the called procedure. 5.2.3.2 Return Operation. A return from any call type (explicit or implicit) is always initiated with a return (ret) instruction. On a return, the processor performs these operations: 1. The current stack frame and local registers are deallocated by loading the FP register with the value of the PFP register. 2. The local registers for the return target procedure are retrieved. The registers are usually read from the local register cache; however, in some cases, these registers have been flushed from register cache to memory and must be read directly from the save area i
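Step 4 of the call operation (frame allocation) reduces to two address computations, sketched below in Python. The function name `allocate_frame` is hypothetical, and the sketch assumes that a stack pointer already on a 16-byte boundary is left unchanged by the rounding.

```python
def allocate_frame(saved_sp: int) -> tuple:
    """Model step 4 of the call operation: derive the new FP and SP.

    Assumption: "rounded to the next 16-byte boundary" means rounding up,
    with an already aligned value left as it is.
    """
    new_fp = (saved_sp + 15) & ~15   # round up to a 16-byte boundary
    new_sp = new_fp + 64             # reserve the 64-byte register save area
    return new_fp, new_sp


if __name__ == "__main__":
    fp, sp = allocate_frame(0x1004)
    print(f"FP = {fp:#x}, SP = {sp:#x}")
```

The 64 bytes correspond to the save area for the sixteen 32-bit local registers of the new frame.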
166. the len operand. Faults: Type Mismatch: Non-supervisor reference of a sfr. Example: rotate 13, r8, r12 # r12 <- r8 with bits rotated 13 bits to left. Opcode: rotate 59DH REG. See Also: SHIFT, eshro. 9.3.48 scanbit. Mnemonic: scanbit (Scan For Bit). Format: scanbit src, dst (src: reg/lit/sfr; dst: reg/sfr). Description: Searches src value for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit number is stored in dst and the condition code is set to 010₂. If src value is zero, all 1's are stored in dst and the condition code is set to 000₂. Action: tempsrc <- src; if (tempsrc = 0) { dst <- 0xFFFFFFFF; AC.cc <- 000₂; } else { i <- 31; while ((tempsrc and 2^i) = 0) { i <- i - 1; } dst <- i; AC.cc <- 010₂; }. Faults: Type Mismatch: Non-supervisor reference of a sfr. Example: # assume g8 is nonzero; scanbit g8, g10 # g10 <- bit number of most significant set bit in g8; AC.cc <- 010₂. Opcode: scanbit 641H REG. See Also: spanbit, setbit. 9.3.49 scanbyte. Mnemonic: scanbyte (Scan Byte Equal). Format: scanbyte src1, src2 (src1: reg/lit/sfr; src2: reg/lit/sfr). Description: Performs a byte-by-byte comparison of src1 and src2 and sets the condition code to 010₂ if any two corresponding bytes are equal. If no corresponding bytes are equal, the condition code is set to 000₂. Action: tmpsrc1 <- src1; tmpsrc2 <- src2; if ((tmpsrc1 and 000000FFH) = (tmpsrc2 and 000000FFH) or (tmpsrc1 and 0000FF0
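The scanbit behaviour described above can be modelled in a few lines of Python. This is an illustrative model of the stated result and condition code only, not the processor's implementation; the function returns a (dst, cc) pair, a convention chosen for the sketch.

```python
def scanbit(src: int) -> tuple:
    """Model of scanbit: return (dst, condition_code) for a 32-bit src."""
    src &= 0xFFFFFFFF
    if src == 0:
        # No set bit: dst is all ones and the condition code is 000.
        return 0xFFFFFFFF, 0b000
    # Walk down from bit 31 until a set bit is found.
    i = 31
    while (src & (1 << i)) == 0:
        i -= 1
    return i, 0b010


if __name__ == "__main__":
    for value in (0x00000000, 0x00000001, 0x00012000, 0x80000000):
        dst, cc = scanbit(value)
        print(f"src={value:#010x} dst={dst:#x} cc={cc:03b}")
```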
167. trace faults (except the prereturn trace fault), because the events that can cause a trace fault occur after the faulting instruction is completed. As a result, the faulting instruction cannot be re-executed upon returning from the fault-handling procedure. Since the prereturn trace fault is generated before ret executes, a change in the program's control flow does not accompany this fault; the faulting instruction can be executed upon returning from the fault-handling procedure. 7.10.7 Type Faults. Fault Type: AH, Type. Fault Subtype (number: name): 0H: Reserved; 1H: Type Mismatch; 2H-FH: Reserved. Function: Indicates a program or procedure attempted to perform an illegal operation on an architecture-defined data type or a typed data structure. A type mismatch fault is generated when attempts are made to: modify the PC register with modpc while the processor is in user mode; write to on-chip data RAM while the processor is in user mode; access a special function register while the processor is in user mode. RIP: No defined value. Program State Changes: These faults may be imprecise when executing with the NIF bit cleared. A change in the program's control flow does not accompany the type mismatch fault, because the fault occurs before execution of the faulting instruction. CHAPTER 8 TRACING AND DEBUGGING. This chapter describes the i960 Cx process
168. word data is moved directly between memory and a register with no sign extension or data truncation. 3.1.2 Ordinals. Ordinals (unsigned integer data types) are stored and operated on as positive binary values. Table 3-2 shows the supported ordinal sizes. Table 3-2. Supported Ordinal Sizes (ordinal size: descriptive name): 8-bit: byte ordinals; 16-bit: short ordinals; 32-bit: ordinals; 64-bit: long ordinals. The large number of instructions that perform logical, bit manipulation and unsigned arithmetic operations reference 32-bit ordinal operands. When ordinals are used to represent Boolean values, 1 = TRUE and 0 = FALSE. Several extended arithmetic instructions reference the long ordinal data type. Only load and store instructions reference the byte and short ordinal data types. Sign and sign extension are not a consideration when ordinal loads and stores are performed; the values may, however, be zero-extended or truncated. A short-word or byte load to a register causes the value loaded to be zero-extended to 32 bits. A short-word or byte store to memory may cause an ordinal value in a register to be truncated to fit its destination in memory. No overflow condition is signalled in this case. 3.1.3 Bits and Bit Fields. The processor provides several instructions that perform operations on individual bits or bit fields within register operands. An individual bit i
169. 0 enable (1) or suspend (0) a DMA after a channel is set up. Bits 0 through 3 enable or disable channels 0 through 3, respectively. If an enable bit for a channel is cleared when a channel is active, the DMA is suspended after pending DMA requests for the channel are completed and all bus activity for the pending request is complete. The channel active bits indicate the channel is suspended. DMA operation resumes at the point it was suspended when the channel enable bit is set. To ensure that a DMA channel does not start immediately after it is set up, the enable bit for the channel must be cleared by software before sdma is issued. This is necessary because the DMA controller does not explicitly clear the enable bit after a DMA has completed. ERRATA (7/1/94): DMA Command Register bits 30 (Data Cache Global Disable) and 31 (Data Cache Invalidate) are not defined in Figure 13-9 or in the text that follows the figure. These were correctly defined in the i960 CF Microprocessor Reference Manual Supplement and unintentionally omitted from this manual. The channel terminal count flags (bits 7-4) are set when a DMA has stopped because: byte count has reached zero for a non-chained DMA, or a null pointer in a chaining descriptor is encountered in data chaining mode. Flags 4 through 7 indicate terminal count for channels 0 through 3, respectively. A terminal count flag is set only after the last requ
170. 0 CA processor only allows interrupt handlers to be locked in the cache. The interrupt vector's two least significant bits must be set to 10₂ to cause the processor to fetch the interrupt procedure from locked cache rather than the normal memory-cache hierarchy. The interrupt procedure executes from the locked cache until a miss occurs in the locked section. The cache remains locked until the cache mode is changed by the next sysctl instruction. The invalidate-instruction-cache sysctl message invalidates both the locked and unlocked halves of the cache. Refer to section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19), for details on using the sysctl instruction to configure the instruction cache. ERRATA (06/14/94): Page 12-22, Table 12-2. For the CF, Mode 100₂ was incorrectly shown as locking 4 Kbytes; it now correctly shows 2 Kbytes. This erratum also occurs on page 4-22. CHAPTER 13 DMA CONTROLLER. This chapter describes the i960 Cx processor's integrated Direct Memory Access (DMA) Controller: its operation modes, setup, external interface and DMA controller implementation. 13.1 OVERVIEW. The DMA controller concurrently manages up to four independent DMA channels. Each channel supports memory-to-memory transfers, where the source and destination can be any combination of internal data RAM or external memory. The DMA mechanism provides two unique methods for performing DMA transfers:
171. 0 CF processor has a 1-Kbyte direct-mapped data cache, which enhances performance by reducing the number of load and store accesses to external memory. The data cache can return up to a quad word (128 bits) to the register file in a single clock cycle on a cache hit. External memory is configured as cacheable or non-cacheable on a region-by-region basis, using special bits in the memory region configuration registers (MCON0-15). This makes it easy to partition a system into cacheable regions (local memory) and non-cacheable regions. The i960 CF processor implements a simple coherency mechanism. The data cache can also be enabled, disabled or invalidated on a global basis through programming. A.1.8.1 Data Cache Organization. The data cache has a four-word line size (see Figure A-5). Each of the 64 cache lines has an associated cache tag containing the 22 most significant bits of the address and a valid bit. Each line is further subdivided into single-word blocks, each with its own valid bit. This subblock placement technique reduces latency on cache misses. Data accesses result in cache hits and misses. Accesses that match valid address tags and word(s) marked as valid are cache hits; other data accesses are misses. [Figure A-5: each cache line (Line 0, Line 1, Line 2, ...) holds a valid bit, a cache tag (address bits 31:10) and four words (word offsets 0 to 3), each with its own valid bit]
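The organization above implies a fixed split of a 32-bit address into tag, line index and word offset. The Python sketch below derives that split from the stated sizes (64 lines of four words, 22-bit tag); the field boundaries for the line index (bits 9:4) and word offset (bits 3:2) are inferred from those sizes rather than quoted from the manual, and the function name `split_address` is hypothetical.

```python
LINE_BYTES = 16   # four 32-bit words per line
NUM_LINES = 64    # 64 lines x 16 bytes = 1 Kbyte


def split_address(addr: int) -> tuple:
    """Split a 32-bit address into (tag, line_index, word_offset).

    Inferred layout: tag = bits 31:10, line index = bits 9:4,
    word offset = bits 3:2. Bits 1:0 select a byte within the word.
    """
    addr &= 0xFFFFFFFF
    tag = addr >> 10                              # 22 most significant bits
    line_index = (addr // LINE_BYTES) % NUM_LINES
    word_offset = (addr >> 2) & 0x3
    return tag, line_index, word_offset


if __name__ == "__main__":
    for a in (0x00000000, 0x000003FF, 0x00000400, 0x12345678):
        tag, line, word = split_address(a)
        print(f"addr={a:#010x} tag={tag:#x} line={line} word={word}")
```

Two addresses that differ only in their tag map to the same line, which is what makes the cache direct mapped.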
172. 00 dst src2 M3 M2 M1 1011 52 51 Src1 0101 1000 dst src M3 M2 M1 1100 S2 S1 bitpos 0101 1000 dst src2 M3 M2 M1 1101 S2 S1 Src1 0101 1000 dst src2 M3 M2 1 1110 S2 S1 Src1 0101 1000 dst src M3 M2 M1 1111 52 S1 bitpos 0101 1001 dst src2 M3 M2 M1 0000 52 S1 Src1 0101 1001 dst src2 M3 M2 1 0001 52 51 Src1 0101 1001 dst src2 M3 M2 1 0010 S2 S1 Src1 0101 1001 dst src2 M3 M2 1 0011 52 51 Src1 0101 1001 dst SIC M3 M2 M1 1000 S2 S1 len 0101 1001 dst src M3 M2 M1 1010 52 51 len 0101 1001 dst src M3 M2 M1 1011 52 51 len 0101 1001 dst src M3 M2 M1 1100 S2 51 len 0101 1001 dst src M3 M2 M1 1101 52 51 len 0101 1001 dst src M3 M2 M1 1110 52 51 len 0101 1010 src2 M3 M2 M1 0000 S2 61 Src1 0101 1010 src2 M3 M2 M1 0001 2 1 src1 0101 1010 src2 M3 M2 M1 0010 2 1 src1 0101 1010 src2 M3 M2 M1 0011 2 1 src1 0101 1010 dst src2 M3 M2 1 0100 S2 S1 Src1 0101 1010 dst src2 M3 M2 1 0101 52 51 Src1 0101 1010 dst src2 M3 M2 1 0110 S2 S1 Src1 intel Opcode 5A 7 5A E 5B 0 5B 2 5C C 5D 8 5D C 5E C 5F C 63 0 63 1 64 0 64 1 64 5 65 0 65 1 65 4 65 5 65 9 66 0 66 66 66 D 66 F 67 0 67 1 70 1 70 8 70 B 74 1 74 8 74 9 74 B MACHINE LANGUAGE INSTRUCTION REFERENCE Table E 2 REG Format Instruction Encodings Sheet 2 of 2 Mnemoni
173. 012 Several assembly language instruction statement examples follow. Additional assembly language examples are given in section 3.3.5, Addressing Mode Examples (pg. 3-7). Further information about assembly language syntax can be found in Intel's i960 Processor Assembler User's Guide (order 485276). subi r3, r5, r6 # r6 <- r5 - r3; setbit 13, g4, g5 # g5 <- g4 with bit 13 set; lda 0xfab3, r12 # r12 <- 0xfab3; ld (r4), g3 # g3 <- memory location that r4 points to; st g10, (r6)[r7*2] # g10 -> memory location that r6 + 2*r7 points to. 4.1.2 Branch Prediction. Branch prediction is an implementation-specific feature of the i960 Cx processors. Not every implementation of the i960 architecture uses the branch prediction bit. Since branch instruction actions depend on the result of a previous comparison, the architecture allows a programmer to predict the likely result of the branch operation for increased performance. The programmer's prediction is encoded in one bit of the machine language instruction. 80960 assemblers encode the prediction with a mnemonic suffix: .t (true) or .f (false). Use the .t suffix to speed up execution when an instruction usually takes a branch; use the .f suffix when an instruction usually does not take a branch. Because test and conditional fault instructions also use condition codes, prediction suffixes are also implemented on these instructions. Refer to s
174. 02 XX01H; 2H Unimplemented: XX02 XX02H; 3H Unaligned: XX02 XX03H; 4H Invalid Operand: XX02 XX04H. 3H Arithmetic: 1H Integer Overflow: XX03 XX01H; 2H Arithmetic Zero Divide: XX03 XX02H. 4H Reserved (Floating Point). 5H Constraint: 1H Constraint Range: XX05 XX01H; 2H Privileged: XX05 XX02H. 6H Reserved. 7H Protection: Bit 1 Length: XX07 XX01H. 8H Reserved. 9H Reserved. AH Type: 1H Type Mismatch: XX01H. BH-FH Reserved. NOTE 1: The operation unaligned fault is an i960 Cx processor-specific extension. In Table 7-1: The first (left-most) column contains the fault type numbers in hexadecimal. The second column shows the fault type name. The third column gives the fault subtype number as either (1) a hexadecimal number or (2) a bit position in the fault record's 8-bit fault subtype field. The bit position method of indicating a fault subtype is used for certain faults (such as trace faults) in which two or more fault subtypes may occur simultaneously. The fourth column gives the fault subtype name. For convenience, individual faults are referred to in this manual by their fault subtype name. Thus, an operation invalid operand fault is referred to as simply an invalid operand fault; an arithmetic integer overflow fault is referred to as an integer overflow fault. The fifth column shows the encoding of the word in the fault record that contains the fault type and fault subtype numbers O
175. 0CF is 272187. To obtain updates and errata, call Intel's FaxBack data-on-demand system (1-800-628-2283 or 916-356-3105). For information on other i960 processor family products or the architecture in general, refer to Intel's Solutions960 catalog (order number is 270791). It lists all current i960 microprocessor family-related documents, support components, boards, software development tools, debug tools and more. Other information can be obtained from Intel's technical BBS (916-356-3600). This manual is organized in three parts; each part comprises multiple chapters and/or appendices. The following briefly describes each part: Part I, Programming the i960 Cx Microprocessor (Chapters 2-9), details the programming environment for the i960 Cx devices. Described here are the processor's registers, instruction set, data types, addressing modes, interrupt mechanism, external interrupt interface and fault mechanism. Part II, System Implementation (Chapters 10-14), identifies requirements for designing a system around the i960 Cx components, such as external bus interface, interrupt controller and integrated DMA controller. Also described are programming requirements for the DMA controller, bus controller and processor initialization. Part III, Appendices, includes quick references for hardware design and programming. Appendices are also provided which describe the internal architecture, how to write assembly-level code to exploit the parall
176. 0H) = (tmpsrc2 and 0000FF00H) or (tmpsrc1 and 00FF0000H) = (tmpsrc2 and 00FF0000H) or (tmpsrc1 and FF000000H) = (tmpsrc2 and FF000000H)) AC.cc <- 010₂; else AC.cc <- 000₂. Faults: Type Mismatch: Non-supervisor reference of a sfr. Example: # assume r9 = 0x11AB1100; scanbyte 0x00AB0011, r9 # AC.cc <- 010₂. Opcode: scanbyte 5ACH REG. 9.3.50 sdma (80960Cx Processor Only). Mnemonic: sdma (Setup DMA Channel). Format: sdma src1, src2, src3 (src1: reg/lit/sfr; src2: reg/lit/sfr; src3: reg/lit). Description: The DMA channel specified by src1 is set up using the control word in src2. Dedicated data RAM for the specified DMA channel is written with the src3 value. The first two bits of src1 specify the channel. src2 specifies the DMA control word as a literal or single 32-bit register. src3 specifies a single 32-bit register if the channel is data chaining; this register contains the address of the first chaining descriptor in memory. src3 must specify a register with a register number divisible by four. If the channel is not data chaining, src3 specifies a triple word contained in registers src3, src3+1 and src3+2: src3 contains the byte count for the DMA, src3+1 contains the source address and src3+2 contains the destination address. Action: control for channel (src1 mod 4) <- src2; if (not chaining mode) ram[src1 mod 4] <- src3 # triple-word store; else ram[src1 mod 4] <- src3 # word store; start channel (src1 mod 4). Faults: Constr
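The scanbyte comparison can be modelled as a loop over the four byte lanes. The Python sketch below illustrates the described condition-code result only; the function name and the test values are chosen for illustration.

```python
def scanbyte(src1: int, src2: int) -> int:
    """Model of scanbyte: return the condition code for two 32-bit operands.

    The result is 0b010 if any pair of corresponding bytes is equal,
    otherwise 0b000.
    """
    for shift in (0, 8, 16, 24):
        if ((src1 >> shift) & 0xFF) == ((src2 >> shift) & 0xFF):
            return 0b010
    return 0b000


if __name__ == "__main__":
    print(f"{scanbyte(0x11AB1100, 0x22AB3344):03b}")  # byte 2 matches (AB)
    print(f"{scanbyte(0x01020304, 0x04030201):03b}")  # no lane matches
```

Only bytes in the same position are compared, so two operands that contain the same byte values in different positions do not set the condition code.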
177. 1 The other possible entry types are reserved and must not be used. 6.4.2 Pending Interrupts. The pending interrupts section comprises the interrupt table's first 36 bytes, divided into two fields: pending priorities (byte offset 0 through 3) and pending interrupts (4 through 35). Each of the 32 bits in the pending priorities field indicates an interrupt priority. When the processor posts a pending interrupt in the interrupt table, the bit corresponding to the interrupt's priority is set. For example, if an interrupt with a priority of 10 is posted in the interrupt table, bit 10 is set. Each of the pending interrupts field's 256 bits represents an interrupt procedure pointer. Byte offset 5 is for vectors 8 through 15, byte offset 6 is for vectors 16 through 23, and so on. Byte offset 4, the first byte of the pending interrupts field, is reserved. When an interrupt is posted, its corresponding bit in the pending interrupt field is set. This encoding of the pending priority and pending interrupt fields permits the processor to first check if there are any pending interrupts with a priority greater than the current program and then determine the vector number of the interrupt with the highest priority. 6.4.3 Caching Portions of the Interrupt Table. The architecture allows all or part of the interrupt table to be cached internally to the processor. The purpose of caching these fields is to reduce interrupt latency by allowing the processor access to c
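The layout of the pending-interrupts section can be sketched as a small Python model of posting an interrupt. Several details here are assumptions rather than statements from the text: priority is taken as vector number divided by 8 (consistent with eight vectors per byte), the bit within a byte is taken as vector mod 8, and the pending priorities word is treated as little-endian. The function name `post_pending` is hypothetical.

```python
def post_pending(table: bytearray, vector: int) -> None:
    """Mark `vector` as pending in a model of the interrupt table header.

    Assumptions (not stated in the text): priority = vector // 8,
    bit-in-byte = vector % 8, pending priorities word is little-endian.
    """
    if not 8 <= vector <= 255:
        raise ValueError("vector must be in the range 8..255")
    priority = vector // 8

    # Pending priorities: 32-bit field at byte offsets 0..3.
    priorities = int.from_bytes(table[0:4], "little")
    priorities |= 1 << priority
    table[0:4] = priorities.to_bytes(4, "little")

    # Pending interrupts: bytes 4..35, byte offset 5 covers vectors 8..15.
    table[4 + vector // 8] |= 1 << (vector % 8)


if __name__ == "__main__":
    header = bytearray(36)          # first 36 bytes of the interrupt table
    post_pending(header, 84)        # vector 84 -> priority 10 under the assumption
    print(header.hex())
```

With these assumptions, posting vector 84 sets bit 10 of the pending priorities field, in line with the priority-10 example in the text, and one bit in byte offset 14 of the table.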
178. 1 3 1 Idob Idib Idos Idis Idi 1 4 2 1 5 3 Idq 1 6 4 1 2 stob stib stos stis stl 1 N A 3 stt 1 N A 519 1 To allow programs to issue load requests before the data is needed thus decouple memory speeds from instruction execution the BCU contains three queue entries Each entry stores all the information needed for a memory request e For loads the BCU contains the source address destination register number and load type e For stores BCU contains the destination address store type and the store data If a stq is executed all four registers are written to the BCU queue in one clock The BCU performs the actual bus request without taking any further clocks from instruction execution BCU queues maintain memory requests in order The requests are executed on the bus in the order that they are issued from the instruction stream A 27 INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION intel ld g0 gl ld 92 93 Te 94 95 addo g 1 g 6 g7 Instruction I Scheduler ssue 10 d d addo Address Out bus Stbus 9 g2 g4 BCU External 90 02 04 Pipeline Address Bus External Data Bus go 92 95 LD Bus 91 lt 90 93 lt 92 95 lt 94 Read src1 src2 1 86 EU 91 9 Pipeline xecute an Write dst 97 lt 91 96 Figure 13 Back to Back BCU Accesses When the DMA controller is enabled one of the thr
179. 1 BOOT CONFIG 0 0 0 BYTE N 2 BOOT CONFIG 0 0 0 BYTE N 3 BOOT CONFIG 05 070 start ip amp rom prcb 2 0 0 0 0 CS 6 Example 14 6 Linker Directive File init d Sheet 1 of 2 LE RIN init ld p xsv Enough space must be reserved in ROM after the text Section to hold the initial values of the data section o 0xffff0000 1 0xfc00 rom_dat o 0xfffffc00 1 0x0300 placeholder for data image ibr o 0xffffff00 1 0x00ff data 0 0000000 1 0 1000 14 22 tel INITIALIZATION AND SYSTEM REQUIREMENTS Example 14 6 Linker Directive File init ld Sheet 2 of 2 5 ibr rom ibr o gt ibr text text ALIGN 0x10 gt rom data gt bss gt data rom_data _checksum move 50 move 50 move 50 map 50 mkimage quit Etext section initial values move section right after the used in init s as source of command places the data ROM960 data text section rom prcb start ip text 0 ibr OxFFOO 0 0 ima Example 14 7 Makefile Sheet 1 makef ile 14 23 INITIALIZATIO
180. 1 CORE ARCHITECTURE

i960 microprocessor family products are based on the core architecture definition. An i960 processor can be thought of as consisting of two parts: the core architecture implementation and implementation-specific features. The core architecture defines the following mechanisms and structures:

• Programming environment: global and local registers, literals, processor state registers, data types, memory addressing modes, etc.
• Implementation-independent instruction set
• Procedure call mechanism
• Mechanism for servicing interrupts, and the interrupt and process priority structure
• Mechanism for handling faults, and the implementation-independent fault types and subtypes

Implementation-specific features are one or all of:

• Additions to the instruction set beyond the instructions defined by the core architecture
• Extensions to the register set beyond the global, local and processor state registers which are defined by the core architecture
• On-chip program or data memory
• Integrated peripherals which implement features not defined explicitly by the core architecture

Code is directly portable (object-code compatible) when it does not depend on implementation-specific instructions, mechanisms or registers. The aspects of this microprocessor which are implementation dependent are described below. Those aspects not described below are part of the core architecture.

CONSIDERATIONS FOR WRITING PORTABLE C
181. 1 efa dst lt memory short src sign extended OC i March 1994 Page 10 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic Description cc cc cc Opcode Mach Instruction Result nif om of 2 1 0 p te Events Modes T Format Type Issue Latency idi Load Long OP 1 src ds r mem reg U 98 MEM 1 efa bus dst dst 1 lt memory long src oc Load Ordinal Byte OP I sre dst l4 efa d b ie 1 4 ge 80 MEM Morg 1 efa d dst memory byte src zero extended OC Load Ordinal Short OP 1 sre dst 1 mE icai d E 1 eu or 88 MEM Morp 1 efa un dst lt memory short src zero extended OC Load Quad OP Id src dst 2 efa 4 mem 1 0 Morg 1 efa 1 dst dst 1 dst 2 dst 3 lt memory quad src OC Load Triple OP sre dst 1 efa Idt d I pfu fe A0 MEM 1 efa e dst dst 1 dst 2 lt memory triple src i Mark mark if PC te 1 and TC btm 1 2 PC tfp I IBR IBR U 66 B REG u 17 17 TC bte lt 1 Trace Breakpoint fault Modify AC modac mask
182. 110 M3 M2 M1 1101 2 1 0110 0110 M3 M2 M1 1111 52 S1 0110 0111 dst src2 M3 M2 M1 0000 S2 S1 Src1 0110 0111 dst src2 M3 M2 M1 0001 2 S1 src1 0111 0000 dst src2 M3 M2 1 0001 S2 S1 src 0111 0000 dst src2 M3 M2 1 1000 S2 S1 src 0111 0000 dst src2 M3 M2 1 1011 S2 51 src 0111 0100 dst src2 M3 M2 1 0001 S2 51 src 0111 0100 dst src2 M3 M2 M1 1000 S2 S1 src 0111 0100 dst src2 M3 M2 1 1001 52 51 src 0111 0100 dst src2 M3 M2 M1 1011 S2 S1 src1 E 3 MACHINE LANGUAGE INSTRUCTION REFERENCE 4 testno testg teste testge testl testne testle testo bbc cmpobg cmpobe cmpobge cmpobl cmpobne cmpoble bbs cmpibno cmpibg cmpibe cmpibge cmpibl cmpibne cmpible cmpibo Table E 3 COBR Format Instruction Encodings 3 st 5 M 4 5 a 8 E 9 E 2 a BA 24 2319 18 14 139 2 1 0 0010 0000 dst M1 T S2 0010 0001 dst M1 T S2 0010 0010 dst M1 T 52 0010 0011 dst M1 T S2 0010 0100 dst M1 T S2 0010 0101 dst M1 T S2 0010 0110 dst M1 T 52 00100111 dst M1 T S2 0011 0000 bitpos src M1 targ 52 0011 0001 5 1 src2 M1 targ 52 0011 0010 5 1 src2 M1 targ T 52 0011 0011 src1 src2 M1 targ 52 0011 0100 src1 src2 M1 targ 52 0011 0101 src1 src2 M1 targ 52 0011 0110 src1 src2 M1 targ 52 00110111 b
183. [Figure A-7. Issue Paths: the instruction fetch unit and the two-way set-associative instruction cache present 4 words per clock (the instruction window) to the instruction scheduler, which drives the parallel issue paths to the execution pipelines. NOTE: Instruction cache size: CA 1 Kbyte; CF 4 Kbyte.]

A.2.3 Scoreboarding

When the scheduler issues a group of instructions, the targeted parallel processing units immediately acknowledge receipt of the instructions, and the scheduler begins considering the next four unexecuted words of the instruction stream. The scheduler checks for register dependencies between instructions before issuing them. The scheduler does not issue a group of instructions if:

1. a register is specified as a destination more than once, or
2. a register is specified as a destination in one instruction and as a source in a subsequent instruction.

A single register may, however, be specified as a source in multiple instructions, or as a source in one instruction and a destination in a subsequent instruction. The six-port register set supports these cases. For example, the following instructions cannot be issued in parallel due to register dependencies:

    g0, g1, g2    # g2 is a destination
    g3            # g2 is a source; store must wait for addo to complete
    g0, g1, g2    # g2 is a destination
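The two issue rules above can be sketched as a small check over a candidate group. This is only an illustration: the insn_t record, the NO_REG marker and the function name are hypothetical, each instruction is reduced to two sources and one destination, and the real scheduler tracks more state than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for "operand not used". */
#define NO_REG (-1)

/* Hypothetical, simplified instruction record: register indices only. */
typedef struct {
    int src1;
    int src2;
    int dst;
} insn_t;

/* Returns true when the group, taken in program order, has neither of
 * the two register dependencies that block parallel issue. */
bool can_issue_together(const insn_t *group, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (group[i].dst == NO_REG)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            /* Rule 1: the same register is a destination more than once. */
            if (group[j].dst == group[i].dst)
                return false;
            /* Rule 2: a destination here is a source in a later instruction. */
            if (group[j].src1 == group[i].dst || group[j].src2 == group[i].dst)
                return false;
        }
    }
    return true;
}
```

A register that is a source first and a destination later passes the check, which matches the case the six-port register set is said to support.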
184. 15 src1 gt src2 cmpibl 1005 lt src2 1015 src2 cmpible 1105 lt src2 cmpibo 1115 Any Condition cmpobg 0015 src gt src2 cmpobe 0105 src src2 cmpobge 0115 Src1 gt src2 cmpobl 1005 lt src2 cmpobne 1015 src2 cmpoble 1102 lt src2 NOTE cmpibo always branches cmpibno never branches Action if src1 lt src2 lt 1005 else if src src2 AC cc lt 010 else AC cc lt 001 if and 0005 IP lt IP displacement else IP lt IP 4 Faults Trace TC i or TC br 1 Operation Type Example assume g3 lt 99 resume execution at the new IP resume execution at the next IP Instruction Branch if taken Instruction and Branch Trace Events are signaled after instruction completion Trace fault is generated if PC te 1 and Unimplemented Execution from on chip data RAM Mismatch Non supervisor reference of a sfr cmpibl g3 49 xyz assume 19 gt r7 cmpobge 19 r7 xyz 9 32 g9 is compared with g3 IP lt xyz 19 is compared with r7 lt xyz intel Opcode See Also cmpibe cmpibne cmpibl cmpible cmpibg cmpibge cmpibo cmpibno cmpobe cmpobne cmpobl cmpoble cmpobg cmpobge 3AH 3DH 3CH 3EH 39H 3BH 3FH 38H 32H 35H 34H 36H 31H 33H INSTRUCTION SET REFERENCE COBR COBR COBR COBR COBR COBR COBR COBR COBR COBR
185. [Figure 12-4. Implementation of Expanded Mode Sources (priority encoder)]

INTERRUPT CONTROLLER

12.2.1.3 Mixed Mode

In mixed mode, pins XINT0 through XINT4 are configured for expanded mode. These pins are encoded for the five most significant bits of an expanded mode vector number; the three least significant bits of the vector number are set internally to 010₂. Pins XINT5 through XINT7 are configured for dedicated mode.

IMSK register bit 0 is a global mask for the expanded mode interrupts; bits 5 through 7 mask the dedicated interrupts from pins XINT5 through XINT7, respectively. IMSK register bits 1-4 must be set to 0 in mixed mode. The IPND register posts interrupts from the dedicated mode pins XINT7:5. IPND register bits that correspond to expanded mode inputs are not used.

CAUTION: When setting IMSK register bits in mixed mode, make sure IMSK register bits 1-4 are set to 0.

12.2.2 Non-Maskable Interrupt (NMI)

The NMI pin generates an interrupt for implementation of critical interrupt routines. NMI provides an interrupt that cannot be masked and that has a higher priority than priority-31 interrupts and priority-31 process priority. The interrupt vector for NMI resides in the interrupt table as vector number 2
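As a rough illustration of the mixed-mode rules above, the following sketch forms a vector number from the five expanded-mode bits and checks an IMSK value before it is written. The function names are hypothetical, and the sketch takes the already-decoded 5-bit field as input because the mapping of individual XINT pins to bit positions is not stated in this passage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mixed mode: five most significant bits come from the expanded-mode
 * pins; the three least significant bits are fixed at binary 010. */
uint8_t mixed_mode_vector(uint8_t upper5)
{
    return (uint8_t)(((upper5 & 0x1Fu) << 3) | 0x2u);
}

/* IMSK bits 1-4 must be 0 in mixed mode. Bit 0 (global expanded-mode
 * mask) and bits 5-7 (dedicated pins) may take any value. */
bool imsk_valid_for_mixed_mode(uint32_t imsk)
{
    return (imsk & 0x1Eu) == 0;
}
```

For example, an encoded field of all ones yields vector 250 (11111 010 in binary), and an IMSK value of 0xE1 passes the check because only bits 0 and 5-7 are set.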
186. 248 XINT pins Y set corresponding pending bits in interrupt table Y Y SIPR interrupt priority Y NOTE 1 ICON gie 212 continue normal operation is int prio gt or 31 YES continue normal operation signal core to process interrupt YES software interrupt Y in interrupt table read pending interrupt bits clear pending interrupt bits Y update SIPR with next highest priority 1 SP interrupt stack pointer FP SP aligned to next 16 byte boundary 16 Y store interrupt record at FP 16 Y New PC clear trace fault pending bit TC tfp clear trace enable bit TC te state interrupted PC s 1 mode supervisor PC em 1 Y get interrupt procedure pointer SP FP 64 IP interrupt procedure pointer NOTES 1 Is ICON Register global interrupts enable bit set to 1 If yes external interrupt pins are enabled 2 Is interrupt priority greater than process priority or equal to 31 3 Is PC Register state bit set to 1 if yes processor is interrupted if no processor is executing 6 14 Figure 6 4 Flowchart for Worst Case Interrupt Latency intel FAULTS intel CHAPTER 7 FAULTS This chapter describes t
187. 3 37 Mnemonic Format Description Action Faults Example Opcode See Also INSTRUCTION SET REFERENCE MOVE mov Move movl Move Long movt Move Triple movq Move Quad mov STC dst reg lit sfr reg sfr Copies the contents of one or more source registers specified with src to one or more destination registers specified with dst For movl movt and movq src and dst specify the first lowest numbered register of several successive registers src and dst registers must be even numbered e g g0 g2 or 10 12 or sf0 sf2 for movl and an integral multiple of four e g g0 g4 or 10 r4 sfO sf4 for movt and The moved register values are unpredictable when 1 the src and dst operands overlap 2 registers are not properly aligned dst src Type Mismatch Non supervisor reference of a sfr movt g8 r4 r4 r5 r6 lt g8 g9 410 mov 5CCH REG movl 5DCH REG movt 5ECH REG movq 5FCH REG LOAD STORE Ida 9 53 INSTRUCTION SET REFERENCE intel 9 3 38 Format Description Action Faults Example Opcode See Also 9 54 muli mulo muli mulo mul Multiply Integer Multiply Ordinal srcl Src2 dst reg lit sfr reg lit sfr reg sfr Multiplies the src2 value by the src value and stores the result in dst The binary results from these two instructions are identical The only difference is tha
188. 48

During initialization, the core caches the vector for NMI on-chip to reduce NMI latency. The NMI vector is cached in a location of internal data RAM. The core immediately services NMI requests. While servicing NMI, the core does not respond to any other interrupt requests, even another NMI request, until it returns from the NMI handling procedure. An interrupt request on the NMI pin is always falling-edge detected.

12.2.3 Saving the Interrupt Mask

The IMSK register is automatically saved in register r3 when a hardware-requested interrupt is serviced. After the mask is saved, the IMSK register is optionally cleared. This allows all interrupts except NMIs to be masked while an interrupt is being serviced. Since the IMSK register value is saved, the interrupt procedure can restore the value before returning. The option of clearing the mask is selected by programming the ICON register as described in section 12.3.4, Interrupt Control Register (ICON) (pg. 12-11). Several options are provided for interrupt mask handling:

1. Mask is unchanged
2. Clear for dedicated-mode sources only
3. Clear for expanded-mode sources only
4. Clear for all hardware-requested interrupts (dedicated and expanded mode)

Options 2 and 3 are used in mixed mode, where both dedicated-mode and expanded-mode inputs are allowed. DMA interrupts are always dedicated-mode interrupts.

NOTE: If the same
189. [Figure 11-10. 32-Bit Bus, Burst, Non-Pipelined Read Request with Wait States]

EXTERNAL BUS DESCRIPTION

[Figure 11-11. 32-Bit Bus, Burst, Non-Pipelined Write Request without Wait States. Signals shown: PCLK, ADS, A31:4, SUP, DMA, BE3:0, LOCK, W/R, BLAST, WAIT, D31:0.]

11.2.5 Pipelined Read Accesses

Pipelined read accesses provide the maximum data bandwidth. For pipelined reads, the next address is output during the current data cycle. This effectively removes the address cycle from consecutive pipelined accesses.

A pipelined read memory system is implemented by adding an address latch to the design (see Figure 11-12). The address latch holds the address for the current read access while the processor outputs the address for the next access. This allows the next address to be available during the data cycle of the current access. Overlapping address and data cycles improves data bandwidth.

Write accesses to a pipelined region act the same as writes to a non-pipelined region. This means that the address for a write access is not pipelined. Similarly the address for
190. 5.1 OVERVIEW

The i960 architecture supports two methods for making procedure calls:

• A RISC-style branch-and-link: a fast call, best suited for calling procedures that do not call other procedures.
• An integrated call and return mechanism: a more versatile method for making procedure calls, providing a highly efficient means for managing a large number of registers and the program stack.

On a branch-and-link (bal, balx), the processor branches and saves a return IP in a register. The called procedure uses the same set of registers and the same stack as the calling procedure. On a call (call, callx, calls), or when an interrupt or fault occurs, the processor also branches to a target instruction and saves a return IP. Additionally, the processor saves the local registers and allocates a new set of local registers and a new stack frame for the called procedure. The saved context is restored when the return instruction (ret) executes.

In many RISC architectures, a branch-and-link instruction is used as the base instruction for coding a procedure call. The user program then handles register and stack management for the call. Since the i960 architecture provides a fully integrated call and return mechanism, coding calls with branch-and-link is not necessary. Additionally, the integrated call is much faster than typical RISC-coded calls. The branch-and-link instruction in the i960 processor family, therefore, is used primarily for calling leaf procedu
191. 6 Figure 13 7 Figure 13 8 Figure 13 9 Figure 13 10 Figure 13 11 Figure 13 12 Figure 13 13 Figure 13 14 xviii intel Data Width and Byte Enable Encodings 11 10 Basic Read Request Non Pipelined Non Burst Wait States 11 12 Read Write Requests Non Pipelined Non Burst No Wait States 11 14 32 Bit Wide Data Bus Bursts 11 16 16 Bit Wide Data Bus Bursts 11 17 8 Bit Wide Data Bus Bursts 11 17 32 Bit Bus Burst Non Pipelined Read Request with Wait States 11 19 32 Bit Bus Burst Non Pipelined Write Request without Wait States 11 20 Pipelined Read Memory System 11 21 Non Burst Pipelined Read Waveform 11 22 Burst Pipelined Read Waveform 11 23 Pipelined to Non Pipelined Transitions 11 24 The LOCK Signal 11 27 HOLD HOLDA Bus Arbitration 11 29 Operation of the Bus Backoff Function 11 31 Example Application of the Bus Backoff Function 11 32 Interrupt Controller 12 3 Dedicated Mode 12 4 Expanded Mode 12 5 Implementation of Expanded Mode Sources 12 6 Interrupt Sampling 12 10 Interrupt Control ICON Register 12 11 Interrupt Mapping IMAPO IMAP2 Registers 12 13 Interrupt Mask IMSK and Interrupt Pending IPND Registers 12 15 Calculation of Worst Case Interrupt Latency int 12 19 Source Data Buffering for Destination Synchronized DMAs 13 5 Example of Source Synchronized Fly by DMA 13 6 Source Synchronized DMA Loads from an 8 bit Non burst Non pipelined Memory Region 13 7 Byte to Word Assembly 13 9 Optimizat
192. 6 bit Multi cycle Word Byte Short Byte Word to Word 32 32 bit Multi cycle Word Byte Word Byte Fly by Word Word N A N A Quad to Quad 128 128 bit Multi cycle Quad Quad Quad Quad Fly by Quad Quad N A N A

DMA CONTROLLER

Unaligned transfers are best utilized for block-mode memory-to-memory transfers. However, the synchronizing modes can also be fully unaligned, given the restrictions in Table 13-3. These optimized unaligned transfers are executed by performing byte requests until alignment is enforced. At this time, aligned source and destination requests are executed. At the end of the transfer, the DMA may revert to byte transfers to complete the DMA.

While aligning the addresses, the same location may be read more than once. Also, the synchronizing device may be required to supply fewer or more bytes per transfer. For example, in 32-to-32-bit destination-synchronized demand mode, the destination could be written with 1 to 7 bytes per DREQ. When unaligned, the number of DREQs required to complete a transfer is very difficult to calculate, given the large number of permutations. It may be greater than the byte count divided by the transfer width. Each DREQ will generate a single DACK. This makes it much easier for external hardware to assert DREQ based on the DACK output.

This alignment mechanism is shown in Figure 13-5. This is an example of a 32-to-32-bit source-synchronized transfer with source at 0x201 destination at 0x303 and
193. [Figure: i960 CA/CF Microprocessor Block Diagram. Note: Instruction cache size: CA 1 Kbyte; CF 4 Kbyte.]

A.1.1 Instruction Scheduler (IS)

The IS decodes the instruction stream and drives the decoded instructions onto the machine bus, which is the major control bus. The IS can decode up to three instructions at a time, one from each of three different classes of instructions: one REG-format, one MEM-format and one CTRL-format instruction. The IS directly executes the CTRL-format instructions (branches), manages the instruction pipeline, and keeps track of which instructions are in the pipeline so faults can be detected. The IS is assisted by three associated functional blocks: instruction fetch unit, instruction cache and microcode ROM.

The instruction fetch unit provides the IS with up to four words of instructions each cycle. It extracts instructions from the instruction cache, microcode ROM and its instruction fetch queue for presentation to the scheduler. The instruction fetch unit requests external fetch operations from the bus controller whenever a cache miss occurs.

The instruction cache is 1 Kbyte (80960CA) or 4 Kbyte (80960CF), two-way set associative. This cache delivers to the IS up to four instructions per clock. The cache allows many inner loops of code to execute with no externa
194. 9 37 eshro 80960 Processor Only 9 38 QA CHEER 9 39 EAUET IE xx UNUM Ua IINE 9 40 flushireg 9 42 fmark ee AARAU deutet 9 43 LOAD ac etd toe n eset Mts hie 9 44 URL hu MIT 9 46 marc Ai hoi cate 9 47 ere EISE 9 48 TMOG 3 fn RIS 9 49 modify AEE A AE ed oie Repl ee ets 9 50 eens aut nen ee A ees 9 51 modtc eee eee oe inda 9 52 oio etes ete fat atiis 9 53 muli sx oor e nn MN 9 54 tossed teri nc nae M MEE ae Sas 9 55 9 56 not notand 222 224244 040142 000000000 9 57 a uM m ede m A IUD Mt 9 58 MOTOR cT 9 59 E ee A E ee es 9 60 remi 420 2222000 0 00001000000000000000000 9 61 9 62 jr dM eR 9 64 Scanbit 5 nen Rea eiii deeem 9 65 9 66 80960 Processor
195. A 1 7 Data RAM and Local Register Cache A 7 A 1 8 Data Cache 80960CF A 8 A 1 8 1 Data Cache Organization 8 A 1 8 2 Bus Configuration teet entre ibunt dee 9 1 8 3 Global Control of the A 9 A 1 8 4 Data Fetch Poligy 10 1 8 5 Wite PollGy 10 1 8 6 Data Gache Coherency misiri iinn e a a aea a a es A 10 A 1 8 7 BCU Pipeline and Data Cache Interaction A 11 A 1 8 8 BCU Queues and Cache Coherenoy A 12 A 1 8 9 DMA Operation and Data 2 13 1 8 10 External I O and Bus Masters and Cache Coherency 13 2 PARALLEL INSTRUCTION 14 A 2 1 Parallel ISsug 5 3 eit eit cct ph ee oM iq dH ied A 14 A 2 2 Parallel Execution eint ede iet A 15 A 2 3 ScoreboardiFig rid eder eut netten ada ede dat e tea ds A 17 A 2 3 1 Register Scoreboarding 22 A 18 A 2 3 2 Resource Scoreboarding 2 A 18 2 3 3 Prevention of Pipeline Stalls 22 18 2 3 4 Additional Scoreboarded Resources Due to the Data Cache A 19 A 2 4 Processing eR e EO A 20 A 2 4 1 Execu
196. [Figure 2-6. Example Application of the User-Supervisor Protection Model]

CHAPTER 3 DATA TYPES AND MEMORY ADDRESSING MODES

3.1 DATA TYPES

The instruction set references or produces several data lengths and formats. The i960 architecture defines the following data types:

• Integer (8, 16, 32 and 64 bits)
• Ordinal (unsigned integer; 8, 16, 32 and 64 bits)
• Triple Word (96 bits)
• Quad Word (128 bits)
• Bit
• Bit Field

Figure 3-1 shows i960 architecture data types and the length and numeric range of each.

Class              Data Type       Length        Range
Numeric (Integer)  Byte Integer    8 bits        -2^7 to 2^7 - 1
                   Short Integer   16 bits       -2^15 to 2^15 - 1
                   Integer         32 bits       -2^31 to 2^31 - 1
                   Long Integer    64 bits       -2^63 to 2^63 - 1
Numeric (Ordinal)  Byte Ordinal    8 bits        0 to 2^8 - 1
                   Short Ordinal   16 bits       0 to 2^16 - 1
                   Ordinal         32 bits       0 to 2^32 - 1
                   Long Ordinal    64 bits       0 to 2^64 - 1
Non-Numeric        Bit             1 bit         N/A
                   Bit Field       1-32 bits     N/A
                   Triple Word     96 bits       N/A
                   Quad Word       128 bits      N/A

Figure 3-1. Data Types and Ranges

3.1.1 Integers

Integers are signed whole numbers which are stored and operated on in two's complement format by the integer instructions. Most integer ins
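The ranges in Figure 3-1 follow directly from the bit lengths. The sketch below, with hypothetical helper names, computes them for any length from 1 to 64 bits using the standard two's complement and unsigned formulas; it is an illustration of the arithmetic, not code from the manual.

```c
#include <stdint.h>

/* Smallest value of a two's complement integer of the given length:
 * -2^(bits-1). Valid for 1 <= bits <= 64. */
int64_t integer_min(unsigned bits)
{
    if (bits >= 64)
        return INT64_MIN;
    return -((int64_t)1 << (bits - 1));
}

/* Largest value of a two's complement integer: 2^(bits-1) - 1. */
int64_t integer_max(unsigned bits)
{
    if (bits >= 64)
        return INT64_MAX;
    return ((int64_t)1 << (bits - 1)) - 1;
}

/* Largest value of an ordinal (unsigned integer): 2^bits - 1. */
uint64_t ordinal_max(unsigned bits)
{
    if (bits >= 64)
        return UINT64_MAX;
    return ((uint64_t)1 << bits) - 1;
}
```

The 64-bit cases are handled separately because shifting a 64-bit value by 64 positions is undefined in C.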
197. A.2 PARALLEL INSTRUCTION PROCESSING

At the center of the i960 Cx processor core is a set of parallel processing units capable of executing multiple single-clock instructions in every clock. To support this rate, the IS can issue up to three new instructions in every clock. Each processing unit has access to the multiple ports of the chip's six-ported register file; therefore, each processing unit can execute instructions independently and in parallel.

In general, the register file, instruction scheduler, cache and fetch unit keep the parallel processing units busy, given the typical diversity of instructions found in a rolling quad-word group of instructions. To achieve highly optimized performance for critical code sequences, the user must understand how instructions execute on the processor. The following section describes instruction execution on the i960 Cx processors with the goal of instruction stream optimization in mind. See section A.2.7, Coding Optimizations (pg. A-43) for specific optimization techniques applicable to the i960 Cx processors.

A.2.1 Parallel Issue

The IS looks at a rolling quad-word group of unexecuted instructions every clock and issues all instructions which can be executed in that clock. The scheduler can issue up to three instructions every clock to the processing units and can sustain an issue rate of two instructions per clock. To achieve parallelism the IS detects to which machin
198. [Figure 13-3. Source Synchronized DMA Loads from an 8-bit, Non-burst, Non-pipelined Memory Region]

The request length selected for a DMA operation (byte, short-word, word or quad-word) should not be confused with external data bus width or other characteristics programmed in the memory region configuration table. Request length dictates the type of bus request issued by DMA controller microcode, while the region configuration of a DMA's source and destination memory controls how that bus request is executed on the external bus.

As an example, consider a system in which a DMA source memory region is configured for 8-bit non-burst accesses and a word source request length is selected. DMA microcode issues word loads (identical to the ld instruction) to DMA addresses in the source region. Since the source memory region is configured as 8 bits, the bus controller handles the word loads as four 8-bit accesses in that region. To contrast this example, if the DMA is configured for a byte source request length, DMA microcode issues byte loads (identical to the ldob instruction) to DMA addresses in the source region. The byte load to this region is executed as a single 8-bit access.

CHAPTER 11, EXTERNAL BUS DESCRIPTION, fully describes bus configuration and how the bus controller execut
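The distinction between request length and bus width can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions: the request is aligned, the region is non-burst, and the function name is hypothetical. It reproduces the two cases in the text (a word load to an 8-bit region takes four accesses; a byte load takes one).

```c
/* Number of external bus accesses needed for one aligned request of
 * request_bytes to a region whose data bus is bus_width_bits wide.
 * Returns 0 for a bus width other than 8, 16 or 32 bits. */
unsigned bus_accesses(unsigned request_bytes, unsigned bus_width_bits)
{
    if (bus_width_bits != 8 && bus_width_bits != 16 && bus_width_bits != 32)
        return 0;

    unsigned bus_bytes = bus_width_bits / 8u;

    /* Round up: a request narrower than the bus still takes one access. */
    return (request_bytes + bus_bytes - 1u) / bus_bytes;
}
```

Under the same assumptions, a quad-word request (16 bytes) to a 32-bit region comes out as four accesses.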
199. Action dst lt not src2 or src Faults Type Mismatch Non supervisor reference of a sfr Example notor gl2 g3 g6 46 lt NOT g3 OR 412 Opcode notor 58DH REG See Also and andnot nand nor not notand or ornot xnor xor 9 59 INSTRUCTION SET REFERENCE intel 9 3 44 Format Description Action Faults Example Opcode See Also 9 60 or ornot or Or ornot Or Not or srcl src2 dst reg lit sfr reg lit sfr reg sfr ornot srcl src2 dst reg lit sfr reg lit sfr reg sfr Performs a bitwise OR or instruction or ornot instruction operation on the src2 and src values and stores the result in dst or dst lt src2 or srcl ornot dst lt src2 or not src1 Type Mismatch Non supervisor reference of a sfr or 14 g9 g3 g3 lt g9 OR 14 ornot r3 r8 rill rll lt r8 OR NOT r3 or 587H REG ornot 58BH REG and andnot nand nor not notand notor xnor xor intel INSTRUCTION SET REFERENCE 9345 lemi remo Mnemonic remi Remainder Integer remo Remainder Ordinal Format rem srcl src2 dst reg lit sfr reg lit sfr reg sfr Description Divides src2 by src1 and stores the remainder in dst The sign of the result if nonzero is the same as the sign of src2 Action if src120 Arithmetic Zero Divide fault dst lt src2 src2 srel srcl 3 srcl src2 and dst are 32 bits Faults Type Mismatch Non supervisor reference of a
200. Bytes iuris 7 000 0000H Region 14 Ene 256 MBytes Entry 15 F000 0000H Region 15 FFFF FFFFH 256 MBytes F CA027A Figure 10 1 MCON 0 15 Registers Configure External Memory 10 6 lel THE BUS CONTROLLER Burst Enable 0 disabled 1 enabled READY BTERM Enable 0 disabled 1 enabled Read Pipelining Enable 0 disabled 1 enabled Wait States 0 31 wait states Wait States 0 3 wait states Nxpa Wait States 0 3 wait states Nwap Wait States 0 31 wait states Nwpp Wait States 0 3 wait states Y Y 31 28 24 20 Y vov 16 12 8 4 0 Reserved Bus Width Initialize To 0 00 8 bit bus 01 16 bit bus 10 32 bit bus 11 reserved Byte Order 0 little endian 1 big endian Data Cache Enable i960 CF processor only l 0 disabled Memory Region Configuration 1 enabled Register MCON 0 15 F CA028A Figure 10 2 Memory Region Configuration Register MCON 0 15 10 7 THE BUS CONTROLLER intel Table 10 1 MCONO 15 Programmable Bits Entry Name Bit 4 Definition Burst Enable 0 Enables or disables burst accesses for the region READY BTERM Enables or disables region s READY and BTERM inputs If disabled READY and 1 Enable BTERM are ignored Read Pipelining 2 Enables or disables address pipelining of region s read accesses READY and Enable BTERM are ignored during pipe
201. C cc1 0 dst lt src andnot 2 bitpos mod 32 else dst lt src or 2 bitpos mod 32 Type Mismatch Non supervisor reference of a sfr assume AC cc 0105 alterbit 24 g4 g9 g9 lt g4 with bit 24 set alterbit 58FH REG chkbit clrbit notbit setbit intel INSTRUCTION SET REFERENCE 934 andnot Mnemonic and And andnot And Not Format and srcl src2 dst reg lit sfr reg lit sfr reg sfr andnot srcl 5 2 reg lit sfr reg lit sfr reg sfr Description Performs a bitwise AND and or AND NOT andnot operation on src2 and src values and stores result in dst Note in the action expressions below src2 operand comes first so that with andnot the expression is evaluated as src2 andnot src rather than srcl andnot src2 Action and dst src2 and srcl andnot dst lt src2 andnot src1 Faults Type Mismatch Non supervisor reference of a sfr Example and 0 17 48 g2 g2 lt 48 AND 0x17 andnot r3 r12 r9 r9 lt 12 AND NOT r3 Opcode and 581H REG andnot 582H REG See Also nand nor not notand notor or ornot xnor xor 9 11 INSTRUCTION REFERENCE intel 9 3 5 Format Description Action Faults Example Opcode See Also 9 12 atadd atadd Atomic Add atadd src dst STC dst reg sfr reg lit sfr reg sfr addr Adds src value full word to value in the memory location specified with src dst operand Initial va
202. C source:

    int a, b[10];
    a = proc1(a, 1, 'x', &b[0]);

assembles to:

            mov     r3, g0          # value of a
            ldconst 1, g1           # value of 1
            ldconst 120, g2         # value of 'x'
            lda     0x40(fp), g3    # reference to b[10]
            call    _proc1
            mov     g0, r3          # save return value in a

    _proc1:
            movq    g0, r4          # save parameters
                                    # other instructions in procedure
                                    # and nested calls
            mov     r3, g0          # load return parameter
            ret

PROCEDURE CALLS

5.4 LOCAL CALLS

A local call does not cause a stack switch. A local call can be made two ways:

• with the call and callx instructions, or
• with a system-local call as described in section 5.5, SYSTEM CALLS (pg. 5-12).

call specifies the address of the called procedure as the IP plus a signed, 24-bit displacement (i.e., -2^23 to 2^23 - 4). callx allows any of the addressing modes to be used to specify the procedure address. The IP-with-displacement addressing mode allows full 32-bit IP-relative addressing.

When a local call is made with call or callx, the processor performs the same operation as described in section 5.2.3.1, Call Operation (pg. 5-5). The target IP for the call is derived from the instruction's operands and the new stack frame is allocated on the current stack.

5.5 SYSTEM CALLS

A system call is a call made via the system procedure table. It can be used to make a system-local call (similar to a local call made with call and callx) or a system-supervisor call. A system call is initiated with calls wh
203. [Figure B-21. DMA Chaining Description]

B.3.11 Memory Ready

The memory ready input to the i960 Cx processors (READY) indicates the completion of a DRAM read or write cycle. READY must be generated by the DRAM controller and must satisfy the setup and hold times specified in the data sheet. If multiple memory systems are using READY, ready signals from these memory systems must be logically ORed together.

B.3.12 Region Table Programming

Region table programming is critical to DRAM operation. NRAD and NWAD wait states must satisfy RAS, CAS and address valid times for the DRAM. NRDD and NWDD times must satisfy the column address to data access times. The NXDA time must satisfy RAS precharge time. Figures B-22 and B-23 show typical system waveforms for this design. Note that RAS is not asserted until the end of the address cycle; this delay contributes to RAS precharge time.

In some DRAM designs, it may be possible to remove RAS before the access is complete. This is especially true for static column reads and multiple word accesses. If RAS can be removed early in the access, RAS precharge can occur during the access.

BUS INTERFACE EXAMPLES

[Figure B-22. DRAM System Read Waveform]

[Waveform signals: CLK, ADS, RAS, DRAM ADR, CAS, DATA, BLAST]
204. COBR COBR COBR COBR BRANCH IF cmpi cmpo bal balx 9 33 INSTRUCTION SET REFERENCE intel 9 3 20 Format Description Action Faults Example Opcode See Also 9 34 concmpi concmpo concmpi Conditional Compare Integer concmpo Conditional Compare Ordinal concmp srcl src2 reg lit sfr reg lit sfr Compares src2 and src values if condition code bit 2 is not set If comparison is performed condition code is set according to comparison results Otherwise condition codes are not altered These instructions are provided to facilitate bounds checking by means of two sided range comparisons e g is A between B and C They are generally used after a compare instruction to test whether a value is inclusively between two other values The example below illustrates this application by testing whether g3 value is between g5 and g6 values where g5 is assumed to be less than First a comparison cmpo of g3 and g6 is performed If g3 is less than or equal to 56 1 condition code is either 0105 or 0015 a conditional comparison concmpo of g3 and g5 is then performed If g3 is greater than or equal to 5 indicating that g3 is within the bounds of g5 and g6 condition code is set to 0105 otherwise it is set to 001 if AC cc2 0 if src lt src2 lt 0105 else AC cc lt 001 Type Mismatch Non supervisor reference of a sfr 46 93 comp
205. Cx processors the programmable on-chip interrupt controller transparently manages all interrupt requests. Interrupts are generated by hardware (external events) or software (the user program). Hardware requests are signaled on the 8-bit external interrupt port (XINT7:0), the non-maskable interrupt pin (NMI) or the four DMA controller channels. Software interrupts are signaled with the sysctl instruction with post-interrupt message type.

Posting Interrupts

When an interrupt is requested, the interrupt is either serviced immediately or saved for later service, depending on the interrupt's priority. Saving the interrupt for later service is referred to as posting. An interrupt, once posted, becomes a pending interrupt. Hardware and software interrupts are posted differently:

• hardware interrupts are posted by setting the interrupt's assigned bit in the interrupt pending (IPND) special function register
• software interrupts are posted by setting the interrupt's assigned bit in the interrupt table's pending priorities and pending interrupts fields

Checking Pending Interrupts

Interrupts posted for later service must be compared to the current process priority. If process priority changes, posted interrupts of higher priority are then serviced. Comparing the process priority to posted interrupt priority is handled differently for hardware and software interrupts. Each hardware interrupt is assigned a speci
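The service-or-post decision can be sketched as a single comparison. The rule used here is the one stated in the notes to Figure 6-4 (an interrupt is processed when its priority is greater than the process priority or equal to 31); the function name is hypothetical and the sketch ignores NMI and the global interrupt enable.

```c
#include <stdbool.h>

/* Returns true when an interrupt of the given priority is serviced
 * immediately; false means it is posted and becomes pending. */
bool should_service_now(unsigned int_priority, unsigned process_priority)
{
    return int_priority > process_priority || int_priority == 31u;
}
```

A priority-31 interrupt is therefore serviced even when the process itself runs at priority 31, while an interrupt at the same priority as the process at any lower level is posted.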
206. D(ws) ((ws > -1 && ws < 32) ? ((ws) << 3) : 0) /* ws can be 0-31 */
#define NRDD(ws) ((ws > -1 && ws < 4) ? ((ws) << 8) : 0) /* ws can be 0-3 */
#define NXDA(ws) ((ws > -1 && ws < 4) ? ((ws) << 10) : 0) /* ws can be 0-3 */
#define NWAD(ws) ((ws > -1 && ws < 32) ? ((ws) << 12) : 0) /* ws can be 0-31 */
#define NWDD(ws) ((ws > -1 && ws < 4) ? ((ws) << 17) : 0) /* ws can be 0-3 */
/* Bus configuration */
#define DEFAULT (BUS_WIDTH(8) | READY(0) | BURST(0) | BIG_ENDIAN(0) | PIPELINE(0) | NRAD(8) | NRDD(0) | NXDA(1) | NWAD(8) | NWDD(0))
#define I_O (BUS_WIDTH(8) | READY(0) | BURST(0) | BIG_ENDIAN(0) | PIPELINE(0) | NRAD(13) | NRDD(0) | NXDA(3) | NWAD(13) | NWDD(0))
Example 14-8. Initialization Header File (init.h) (Sheet 2 of 2)
#define DRAM (BUS_WIDTH(32) | READY(1) | BURST(1) | BIG_ENDIAN(0) | PIPELINE(0) | NRAD(2) | NRDD(1) | NXDA(1) | NWAD(2) | NWDD(1))
#define FLASH (BUS_WIDTH(8) | READY(0) | BURST(0) | BIG_ENDIAN(0) | PIPELINE(0) | NRAD(4) | NRDD(0) | NXDA(1) | NWAD(4) | NWDD(0))
14.4 SYSTEM REQUIREMENTS
The following sections discuss generic hardware requirements for a system built around the i960 Cx processor. This section describes electrical characteristics of the i960 Cx processor's interface to the external circuit. The CLKIN, RESET, STEST, FAIL, ONCE, VSS and VCC pins are described in deta
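The region-configuration macros in init.h each place one field at a fixed bit position and the results are combined into a single word. The sketch below restates that packing as a C function so the resulting values are easy to inspect. The struct and the function name `region_config` are hypothetical; the bit positions (burst 0, ready 1, pipeline 2, NRAD 3, NRDD 8, NXDA 10, NWAD 12, NWDD 17, bus width 19, byte order 22) are the ones that appear in the header file, and combining the fields with a bitwise OR is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one memory region's settings. */
struct region {
    unsigned bus_width;                           /* 8, 16 or 32 */
    unsigned ready, burst, big_endian, pipeline;  /* 0 or 1 */
    unsigned nrad, nrdd, nxda, nwad, nwdd;        /* wait states */
};

/* Pack the settings into one region-configuration word. */
uint32_t region_config(const struct region *r)
{
    assert(r != NULL);
    assert(r->nrad < 32 && r->nwad < 32);               /* 5-bit fields */
    assert(r->nrdd < 4 && r->nxda < 4 && r->nwdd < 4);  /* 2-bit fields */

    uint32_t word = 0;
    if (r->burst)      word |= UINT32_C(1) << 0;
    if (r->ready)      word |= UINT32_C(1) << 1;
    if (r->pipeline)   word |= UINT32_C(1) << 2;
    word |= (uint32_t)r->nrad << 3;
    word |= (uint32_t)r->nrdd << 8;
    word |= (uint32_t)r->nxda << 10;
    word |= (uint32_t)r->nwad << 12;
    word |= (uint32_t)r->nwdd << 17;
    if (r->bus_width == 16) word |= UINT32_C(1) << 19;
    if (r->bus_width == 32) word |= UINT32_C(2) << 19;  /* 8 is the default, 0 */
    if (r->big_endian) word |= UINT32_C(1) << 22;
    return word;
}
```

With the DRAM settings from the header (32-bit bus, ready and burst enabled, NRAD 2, NRDD 1, NXDA 1, NWAD 2, NWDD 1) this packing yields 0x122513.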
207. D PERFORMANCE OPTIMIZATION
[Figure A-19. Micro-flow Invocation: instruction cache, rolling quad-word instruction window, instruction fetch unit, instruction scheduler, execution pipelines]
When micro-flows execute, they consume the instruction scheduler's activity. From the first clock through the last clock of a micro-flow, the IS is typically issuing two instructions per clock. MEM-side micro-flows, such as loads and stores, can be issued in parallel with REG-side instructions. Performance of micro-flowed instructions is measured by the number of clocks taken to issue instructions.
A.2.6.2 Data Movement
Data movement instructions supported as micro-flows include the triple- and quad-word register move instructions and the lda, load and store instructions which use complex addressing modes. movt and movq each take two clocks to execute. lda takes two clocks to execute for the (reg) + [reg * scale] and disp + (reg) + [reg * scale] addressing modes and can be issued in parallel with an instruction of machine type REG. lda using the disp + (IP) addressing mode takes four clocks to execute and can be issued in parallel with a machine-type REG instruction. The AGU executes lda directly for all other addressing modes.
Load and store instructions are summarized in Table A 14
208. • Demand mode: transfers synchronized to external hardware. Typically used for transfers between an external device and memory. In demand mode, external hardware signals for each channel are provided to synchronize DMA transfers with external requesting devices.
• Block mode: transfers non-synchronized. Typically used to move blocks of data within memory.
To perform a DMA operation, the DMA controller uses microcode, the core's multi-process resources, the bus controller and internal hardware dedicated to the DMA controller. Loads and stores execute in DMA microcode to perform each transfer. The bus controller, directed by DMA microcode, handles data transactions in external memory. DMA controller hardware synchronizes transfers with external devices or memory, provides the programmer's interface to the DMA controller and manages the priority for servicing the four DMA channels.
The DMA controller uses multi-process resources designed into the core to enable DMA operations to execute in microcode concurrently with the user's program. This sharing of core resources is accomplished with hardware-implemented processes for each of the four DMA channels (the DMA processes) and a separate process for the user's program (the user process). Alternating between DMA processes and the user process enables a user's program and up to four DMA operations (one per channel) to run at the same time. To execute a DMA operation, a DMA process issues memory load or store
209. E The interrupt source, as shown in Figure 12-4, must remain asserted until the processor services the interrupt and explicitly clears the source. External interrupt pins in expanded mode are always active low and level-detect.
The interrupt controller ignores vector numbers 0 through 7. The output of the external priority encoders in Figure 12-4 can use the 0 vector to indicate that no external interrupts are pending.
IMSK register bit 0 provides a global mask for all expanded interrupts. The remaining bits (1-7) should be set to 0 in expanded mode. The mask bit can optionally be saved and cleared when an expanded-mode interrupt is serviced. This allows other hardware-requested interrupts to be locked out until the mask is restored. IPND register bits 0-7 in expanded mode have no function, since external logic is responsible for posting interrupts.
[Figure 12-3. Expanded Mode: IMAP control registers, hard-wired vector offset, highest selected vector number, XINT7:0]
[Figure residue, truncated: cascaded priority encoders, interrupt sources (up to 63 lines), i960 processor's INT pins]
210. [Figure 4-1. Machine-Level Instruction Formats: field layouts for the COBR, CTRL, MEMA and MEMB formats]
4.1.4 Instruction Operands
This section identifies and describes operands that can be used with the instruction formats.
Format: REG. Operands: src1, src2, src/dst. Description: src1 and src2 can be global registers, local registers, special function registers or literals. src/dst is either a global, local or special function register.
Format: CTRL. Operand: displacement. Description: CTRL format is used for branch and call instructions. The displacement value indicates the target instruction of the branch or call.
Format: COBR. Operands: src1, src2, displacement. Description: src1 and src2 indicate values to be compared; displacement indicates the branch target. src1 can specify a global register, local register or a literal. src2 can specify a global, local or special function register. See section 2.2.3, Special Function Registers (SFRs) (pg. 2-4).
Format: MEM. Operands: src/dst, efa. Description: Specifies source or destination register and an effective address (efa) formed by using the processor's addressing modes described in section 3.3, MEMORY ADDRESSING MODES (pg. 3-5). Registers specified in a MEM format instruction must be either a global or local register.
4.2 INSTRUCTION GROUPS
i960 processor instruction s
211. EG u 4 4 Copy DMA working registers to on-chip DMA RAM.
Exclusive Nor: xnor src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); I U M; opcode 58 9; REG; R 0 5 1 1; dst <- not (src2 or src1) or (src2 and src1)
Exclusive Or: xor src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr); I U M; opcode 58 6; REG; R 0 5 1 1; dst <- (src2 or src1) and not (src2 and src1)
March 1994, Page 18 of 18, Order Number 272220-002
212. EOP3:0 pins do not require external synchronization; however, to guarantee detection on a particular PCLK2:1 cycle, setup and hold requirements must be satisfied. The maximum pulse width requirement for the EOP3:0 pin prevents more than one buffer transfer from terminating in the source/destination chaining mode. EOP3:0 inputs adhere to the same timing requirements as DREQ3:0 for arbitration of the next DMA transfer.
[Figure 13-15. EOP3:0 Timing. Note: EOP has the same timing requirements as DREQ to prevent unwanted DMA requests. EOP is NOT edge triggered. EOP must be held for a minimum of 2 clock cycles, then deasserted within 15 clock cycles.]
13.11.4 Block Mode Transfers
Block mode DMAs require no synchronization with a source or a destination device. DREQ3:0 inputs are ignored during block mode DMAs. The acknowledge signal (DACK3:0) is driven active when the source is accessed. EOP/TC3:0 pins have the same function as described in section 13.11.3, End Of Process/Terminal Count Timing (pg. 13-32).
13.11.5 DMA Bus Request Pin
The DMA request pin (DMA) indicates that the DMA controller initiated a bus access. The pin is asserted low for any DMA load or store bus request. DMA is deasserted high for other bus requests. The DMA pin has the same timing as the W/R pin. T
213. Execution Opcode Mach Instruction Result Type Issue Latency Mnemonic Description nif om of p tip em te Events Modes PIX Opcode 1 Format
Test For Equal: teste dst (reg/sfr); I I U M; opcode 22; COBR; u 1 2 1 2; if ((AC.cc and 010₂) != 0) dst <- 1; else dst <- 0
Test For Greater: testg dst (reg/sfr); U; opcode 21; COBR; u 1 2 1 2; if ((AC.cc and 001₂) != 0) dst <- 1; else dst <- 0
Test For Greater Or Equal: testge dst (reg/sfr); U M; opcode 23; COBR; u 1 2 1 2; if ((AC.cc and 011₂) != 0) dst <- 1; else dst <- 0
Test For Less: testl dst (reg/sfr); I I U M; opcode 24; COBR; u 1 2 1 2; if ((AC.cc and 100₂) != 0) dst <- 1; else dst <- 0
Test For Less Or Equal: testle dst (reg/sfr); I I U M; opcode 26; COBR; u 1 2 1 2; if ((AC.cc and 110₂) != 0) dst <- 1; else dst <- 0
Test For Not Equal: testne dst (reg/sfr); I U M; opcode 25; COBR; u 1 2 1 2; if ((AC.cc and 101₂) != 0) dst <- 1; else dst <- 0
Test For Not Ordered: testno dst (reg/sfr); I I U M; opcode 20; COBR; u 1 2 1 2; if (AC.cc = 000₂) dst <- 1; else dst <- 0
Test For Ordered: testo dst (reg/sfr); I I U; opcode 27; COBR; u 1 2 1 2; if ((AC.cc and 111₂) != 0) dst <- 1; else dst <- 0
Update DMA Channel: udma; I U P; 63 1 R
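All of the test instructions listed in this snippet follow one pattern: AND the condition code with a 3-bit mask and store 1 or 0. A minimal C model of that pattern is sketched below. The function name `test_cc` is hypothetical, and treating mask 000 as the "not ordered" case (true only when the condition code is 000) is taken from the testno action.

```c
#include <assert.h>

/* Masks for each test instruction, as given in the action column. */
enum {
    MASK_TESTNO = 0x0, MASK_TESTG  = 0x1, MASK_TESTE  = 0x2,
    MASK_TESTGE = 0x3, MASK_TESTL  = 0x4, MASK_TESTNE = 0x5,
    MASK_TESTLE = 0x6, MASK_TESTO  = 0x7
};

/* Returns the value a test instruction would write to dst (1 or 0)
   for a given 3-bit condition code and instruction mask. */
unsigned test_cc(unsigned ac_cc, unsigned mask)
{
    assert(ac_cc <= 7 && mask <= 7);
    if (mask == MASK_TESTNO)
        return ac_cc == 0;          /* testno: true only when cc is 000 */
    return (ac_cc & mask) != 0;     /* all other tests: any masked bit set */
}
```

For example, after a compare that left the "equal" pattern 010 in the condition code, testge yields 1 and testl yields 0.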
214. FFF FFFFH must equal 0.
After the checksum is computed, initialization continues. This includes caching various fields from the PRCB, caching the NMI vector entry, caching the supervisor stack pointer and computing the frame pointer and stack pointer.
As part of initialization, the processor loads the remainder of the memory region configuration table from the external control table. The Bus Configuration (BCON) register is also loaded at this time. The control table valid (BCON.ctv) bit can be set in the control table to validate the region table after it is loaded. In this way, the bus controller is completely configured during initialization. See section 10.2, MEMORY REGION CONFIGURATION (pg. 10-2) for a discussion of memory regions and section 10.3, PROGRAMMING THE BUS CONTROLLER (pg. 10-5) for information about configuring the bus controller.
Example 14-1. Algorithm for Computing the Checksum
x[] <- memory[FFFF FF10H]   # read 8 words from physical address FFFF FF10H
chksum <- FFFFFFFFH add_with_carry x[0]
chksum <- chksum add_with_carry x[1]
chksum <- chksum add_with_carry x[2]
chksum <- chksum add_with_carry x[3]
chksum <- chksum add_with_carry x[4]
chksum <- chksum add_with_carry x[5]
chksum <- chksum add_with_carry x[6]
chksum <- chksum add_with_carry x[7]
14.2.6 Process Control Block (PRCB)
The PRCB contains base addresses
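The chained add-with-carry in Example 14-1 can be written out in C as follows. This is a sketch under stated assumptions, not a description of the processor's internal sequence: the carry is assumed to be clear before the first addition, the carry out of each step is fed into the next, and the carry out of the last step is ignored. The names `ibr_checksum` and `ibr_checksum_ok` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold the 8 words read from the boot record into a 32-bit checksum,
   starting from FFFFFFFFH and chaining the carry between additions. */
uint32_t ibr_checksum(const uint32_t x[8])
{
    assert(x != NULL);
    uint32_t chksum = UINT32_C(0xFFFFFFFF);
    uint64_t carry = 0;   /* assumption: carry is clear before the first add */

    for (int i = 0; i < 8; i++) {
        uint64_t wide = (uint64_t)chksum + x[i] + carry;
        chksum = (uint32_t)wide;   /* low 32 bits become the running sum */
        carry = wide >> 32;        /* carry out feeds the next addition */
    }
    return chksum;
}

/* The boot record is accepted when the final sum equals 0. */
int ibr_checksum_ok(const uint32_t x[8])
{
    return ibr_checksum(x) == 0;
}
```

Under these assumptions, a set of words passes only if the last addition wraps the running sum to exactly zero, so the check words have to be chosen with the carry chain in mind.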
215. G coprocessors interface to the RF with two 64-bit source buses and a single 64-bit destination bus. The source and result from different REG coprocessors can access the RF simultaneously using this bus structure. The 64-bit source and destination buses allow the eshro, mov and movl instructions to execute in a single cycle.
To manage register dependencies during parallel register accesses, register bypassing (result forwarding) is implemented. The register bypassing mechanism is activated whenever an instruction's source register is the same as the previous instruction's destination register. The instruction pipeline allows no time for the contents of a destination register to be written before it is read again by another instruction. Because of this, the RF forwards the result data from the return bus directly to the source bus without reading the source register.
A.1.4 Execution Unit (EU)
The EU is the i960 Cx processor core's 32-bit arithmetic and logic unit. The EU can be viewed as a self-contained REG coprocessor with its own instruction set. As such, the EU is responsible for executing or supporting the execution of all integer and ordinal arithmetic instructions, logic and shift instructions, move instructions, bit and bit-field instructions and compare operations. The EU performs any arithmetic or logical instructions in a single clock.
A.1.5 Multiply/Divide Unit (MDU)
216. HINE LEVEL INSTRUCTION FORMATS
[Figure D-1. Instruction Formats]
The src/dst field can specify a source operand, a destination operand or both, depending on the instruction. Here again, mode bit M3 determines how this field is used. Table D-1 shows this relationship.
Table D-1. Encoding of SRC/DST Field in REG Format
M3 = 0: SRC/DST g0-g15, r0-r15; SRC Only g0-g15, r0-r15; DST Only g0-g15, r0-r15
M3 = 1: SRC/DST Not Allowed; SRC Only Literal; DST Only sf0-sf31
If M3 is clear, the src/dst operand is a global or local register that is encoded as shown in Table D-1. If M3 is set, the src/dst operand can be used as: (1) a source-only operand that is a literal, or (2) a destination-only operand that is a special function register.
D.3 COBR FORMAT
The COBR format is used primarily for compare-and-branch instructions. test-if instructions also use the COBR format. The COBR opcode field is eight bits (two hexadecimal digits). The src1 and src2 fields specify source operands for the instruction. The src1 field can specify either a global or local register or a literal, as determined by mode bit M1. The src2 field can specify a glo
217. ITIALIZATION AND SYSTEM REQUIREMENTS
The AC initial image is loaded into the on-chip AC register during initialization. The AC initial image allows the initial value of the overflow mask, no-imprecise-faults bit and condition code bits to be selected at initialization.
The AC initial image condition code bits can be used to specify the source of an initialization or reinitialization when a single instruction entry point to the user startup code is desirable. This is accomplished by programming the condition code in the AC initial image to a different value for each different entry point. The user startup code can detect the condition code values, and thus the source of the reinitialization, by using the compare or compare-and-branch instructions.
The fault configuration word allows the operation-unaligned fault to be masked when a non-aligned memory request is issued. See section 10.4, DATA ALIGNMENT (pg. 10-9) for a description of non-aligned memory requests. If bit 30 in the fault configuration word is set, a fault is not generated when a non-aligned bus request is issued. The i960 Cx processor, in this case, automatically performs the required sequence of aligned bus requests. An application may elect to generate a fault to detect unwanted non-aligned accesses by initializing bit 30 to 0, thus enabling the fault.
The instruction cache configuration word allows the instruction cache to be enabled or disabled at initialization. If
218. Idle STATE 1 Assert RAS RAS is asserted CAS3 0 is not asserted COL ADR is not asserted READY is not asserted WE W R the next state is STATE 2 STATE 2 MUX the address RAS is asserted CAS3 0 is not asserted COL ADR is asserted READY is not asserted WE W R STATE 3 Assert CAS write is ready read is not RAS is asserted CAS3 0 BE3 0 COL ADR is asserted READY W R WI W R I pl pd W R amp amp BLAST Write access not done THEN B 35 BUS INTERFACE EXAMPLES STATE 0 STATE 4 STATE 5 STATE 6 STATE 7 STATE 8 Idle the next state is STATE 2 remove CAS ELSE IF W R amp amp BLAST the next state is STATE 5 Write Finished RAS Precharge ELSE W R Read the next state is STATE 4 Read Read data ready RAS is asserted CAS3 0 BE3 0 COL ADR is asserted READY is asserted WE W R IF BLAST read not Done THE the next state is STATE 3 Remove READY ELSE BLAST Read Done the next state is STATE_5 RAS Precharge the next state is STATE 3 RAS Precharge RAS is not asserted CAS3 0 is not asserted COL ADR X READY is not asserted WE X the next state is STATE_6 More RAS Precharge RAS is not asserted CAS3 0 is not asserted COL_ADR X READY is not
219. M mory 26 2 2 2 Internal Data RAM 4 2 2 3 Instruction Cache 2 2 4 Data Cache 80960 Processor C 3 2 5 Data and Data Structure C 3 RESERVED LOCATIONS IN REGISTERS AND DATA 5 C 4 6 4 Mages 4 C 4 1 Instruction Timing x rire Dg Re LX CD EH de HERR Hd iced C 4 C 4 2 Implementation Specific Instructions 2 C 4 6 5 EXTENDED REGISTER SET 12 ee ERE Rie ge niu be e eas 5 6 6 INITIALIZATION sia ii th etta d etus 5 XV 5 intel S Ce INTERRUPTS eee etre rt t een te erp i e e ten 5 OTHER i960 CA CF PROCESSOR IMPLEMENTATION SPECIFIC FEATURES C 6 C 8 1 Data Control Peripheral Units ene C 6 C 8 2 Fault Impletnientation ccrte tede dfe te re Arad C 6 9 BREAKPOINTS etti meo aec ere ien be um e C 6 10 BOCK PIN sie er ate c E C 6 C 10 1 External System Requirements C 6 APPENDIX D MACHINE LEVEL INSTRUCTION FORMATS D 1 GENERAL INSTRUCTION D 1 0 2 REG FORMAT gt nna b ipee d ete D 1 D3 COBR FORMAT
220. N AND SYSTEM REQUIREMENTS Example 14 7 Makefile Sheet 2 of 2 LDFILE init FINALOBJ init OBJS init o ctltbl o initmain o IBR rom ibr o LDFLAGS ACA Fcoff TS LDFILE m ASFLAGS V CCFLAGS ACA Fcoff V c init ima FINALOBJ rom960 S LDFILE FINALOBJ init OBJS IBR gid960 S LDFLAGS o lt OBJS 5 05 960 S ASFLAGS lt gcc960 S CCFLAGS 14 24 INITIALIZATION AND SYSTEM REQUIREMENTS Example 14 8 Initialization Header File init h Sheet 1 of 2 Kf RR ae aa E oe ee Ee ae TTE init h n M fime fe e PM define BYTE_N n data unsigned data gt gt n 8 amp OxFF typedef struct unsigned charbus_byte_0 unsigned charreserved 0 3 unsigned charbus byte 1 unsigned charreserved 1 3 unsigned charbus byte 2 unsigned charreserved 2 3 unsigned charbus byte 3 unsigned charreserved 3 3 void first inst unsigned prcb ptr int check sum 6 IBR define BURST on 20x1 0 define READY on 20 2 0 define PIPELINE on 20x4 0 define BIG ENDIAN on on 0x1 22 0 Bus Width can be 8 16 or 32 default to 8 define BUS WIDTH bw bw 16 1 19 0 Wait States define NRA
221. N REFERENCE BY OPCODE This section lists the instruction encoding for each 19609 microprocessor instruction Instructions are grouped by instruction format and listed by opcode within each format Table E 1 Miscellaneous Instruction Encoding Bits M3 M2 M1 52 S1 T Description REG Format src1is a global or local register srci is a literal src is a special function register reserved Src2 is a global or local register src2is a literal Src2 is a special function register reserved src dstis a global or local register src dstis a literal when used as a source or a special function register when used as a destination M3 may not be 1 when src dst is used both as a source and destination in an instruction atmod modify extract modpc COBR Format 0 0 X src src2 and dst global or local registers 1 0 X src is a literal src2 and dst are global or local registers 0 1 X Src1 is a global or local register src2 and dst are special function registers 1 1 0 Src1 is a literal src2 and dst are special function registers COBR Format and CTRL Format X X 0 Outcome of conditional test is predicted to be true X X 1 Outcome of conditional test is predicted to be false x XxX Ol
222. ODE
C.2 ADDRESS SPACE RESTRICTIONS
Address space properties that are implementation-specific to this microprocessor are described in the subsections that follow.
C.2.1 Reserved Memory
Addresses in the range FF00 0000H to FFFF FFFFH are reserved by the i960 architecture. Any uses of reserved memory are implementation-specific. The i960 Cx processor uses a section of the reserved address space for the initialization boot record; see section 14.2.5, Initialization Boot Record (IBR) (pg. 14-5). The initialization boot record may not exist or may be structured differently for other implementations of the i960 architecture. Code which relies on structures in reserved memory is not portable to all i960 processor implementations.
C.2.2 Internal Data RAM
Internal data RAM, an i960 Cx processor implementation-specific feature, is mapped to the first 1 Kbyte of the processor's address space (0000H to 03FFH). High-performance, supervisor-protected data space and the locations assigned for DMA and interrupt functions are special features which are implemented in internal data RAM. Code which relies on these special features is not directly portable to all i960 processor implementations.
C.2.3 Instruction Cache
The i960 architecture allows instructions to be cached on-chip in a non-transparent fashion. This means that the cache may not detect modification of the program memory by loads, stores or alteration by external agents. Each impleme
223. [Figure 11-14. Burst Pipelined Read Waveform. Signals shown: PCLK, ADS, A31:4, SUP, DMA, D/C, BE3:0, LOCK, W/R, D31:0, WAIT, BLAST, DT/R; 32-bit bus. Events marked: non-pipelined request concludes, pipelined reads begin, pipelined reads conclude, non-pipelined requests begin.]
[Figure 11-15. Pipelined to Non-Pipelined Transitions]
11.3 LITTLE OR BIG ENDIAN MEMORY CONFIGURATION
The bus controller supports big-endian and little-endian byte ordering for memory operations. Byte ordering determines how data is read from or written to the bus and, ultimately, how data is stored in memory.
Little-endian systems store a word's least significant byte at the lowest byte address in memory. For example, if a little-endian ordered word is stored at address 600, the least significant byte is stored at address 600 and the most significant byte at address 603. Big endian systems store the least
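The byte placement described in this snippet can be illustrated with a short C sketch. It only models where each byte of a 32-bit word lands in memory; it is not bus-controller code, and the function name `store_word` and its `big_endian` flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit word into four consecutive bytes at mem, using either
   little-endian or big-endian byte ordering. */
void store_word(uint8_t *mem, uint32_t word, int big_endian)
{
    assert(mem != NULL);
    for (int i = 0; i < 4; i++) {
        /* i = 0 selects the least significant byte of the word */
        uint8_t byte = (uint8_t)(word >> (8 * i));
        /* little-endian: LSB at the lowest address;
           big-endian: LSB at the highest address */
        mem[big_endian ? 3 - i : i] = byte;
    }
}
```

For the word 11223344H stored at address 600 in little-endian order, address 600 receives 44H and address 603 receives 11H; in big-endian order the two are swapped.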
224. ON SET SUMMARY
Table 4-2. Arithmetic Operations
Arithmetic operations (data types: Integer, Ordinal): Add with Carry, Subtract, Subtract with Carry, Multiply, Extended Multiply, Divide, Extended Divide, Remainder, Modulo, Shift Left, Shift Right, Extended Shift Right, Shift Right Dividing Integer
NOTE: i960 processor-specific extension to the 80960 instruction set.
4.2.2.1 Add, Subtract, Multiply and Divide
These instructions perform add, subtract, multiply or divide operations on integers and ordinals:
addi add integer
addo add ordinal
subi subtract integer
subo subtract ordinal
muli multiply integer
mulo multiply ordinal
divi divide integer
divo divide ordinal
addi, subi, muli and divi generate an integer overflow fault if the result is too large to fit in the 32-bit destination. divi and divo generate a zero-divide fault if the divisor is zero.
4.2.2.2 Extended Arithmetic
These instructions support extended-precision arithmetic, i.e., arithmetic operations on operands greater than one word in length:
addc add ordinal with carry
subc subtract ordinal with carry
emul extended multiply
ediv extended divide
addc adds two word operands (literals or contained in registers) plus the AC Register condition co
225. Pointer r0 5 4 5 16 Pipeline Stalls register bypassing A 19 register scoreboarding 18 pipelined read accesses 10 3 11 21 posting interrupts 6 6 atomic modify operations 6 7 external agents 6 7 hardware requested interrupts 6 6 software requested interrupts 6 6 sysctl 6 7 power and ground planes 14 27 PRCB 14 8 prereturn trace mode 8 5 Previous Frame Pointer PFP 5 4 5 16 location 2 3 priority 31 interrupts 6 3 12 8 procedure calls branch and link 5 1 call and return mechanism 5 1 leaf procedures 5 1 overview 1 3 procedure stack 5 3 growth 5 3 Process Controls PC register 2 17 execution mode flag 2 17 initialization 2 19 modification 2 19 modpc 2 19 priority field 2 18 processor state flag 2 18 trace enable bit 2 19 trace fault pending flag 2 19 Index 11 INDEX Processing Units A 20 Address Generation Unit A 25 Bus Control Unit A 26 Data RAM A 24 Execution Unit A 20 Multiply Divide Unit A 22 processor control block 2 1 2 8 processor initialization 14 1 processor management instructions 4 18 processor state registers 2 1 2 14 Arithmetic Controls AC register 2 15 Instruction Pointer IP register 2 15 Process Controls PC register 2 17 Trace Controls TC register 2 20 R r0 Previous Frame Pointer 5 16 register addressing and alignment 2 5 register bypassing A 7 register cache 2 1 5 9 size 5 9 Register File RF A 6 CTRL units 15 MEM A 6 MEM units A 15 REG A 6 REG units A 15 Registe
226. RIP is unpredictable.
The processor provides two mechanisms for controlling the circumstances under which faults are generated: the AC register no-imprecise-faults bit (NIF bit) and the synchronize-faults instruction (syncf). Faults are categorized as precise, imprecise and asynchronous. The following subsections describe each.
7.9.1 Precise Faults
Precise faults are those intended to be software recoverable. For any instruction that can generate a precise fault, the processor does not execute the instruction if an unfinished prior instruction will fault and does not execute subsequent out-of-order instructions that will fault. Also, the RIP points to an instruction where the processor can resume program execution without breaking program control flow. Two faults are always precise: trace faults and protection faults.
7.9.2 Imprecise Faults
Imprecise faults are those where the architecture does not guarantee that sufficient information is saved in the fault record to allow recovery from the fault. For imprecise faults, the faulting instruction address is correct, but the state of execution of instructions surrounding the faulting instruction may be unpredictable. Also, the architecture allows imprecise faults to be generated out of order, which means that the RIP may not be of any value for recovery. Faults that the architecture allows to be imprecise include:
• operation
• arithmetic constraint
• Ref
227. RMANCE OPTIMIZATION
For other addressing modes, information in section A.2.6, Micro-flow Execution (pg. A-36) still applies.
For each instruction that requires multiple reads on the external bus, such as ldq, the BCU buffers the return data until all data is returned from the bus. This optimization reduces the internal load bus overhead to a minimum and allows the processor to access the MEM side while external loads are in progress.
If instructions are issued back-to-back with no register dependencies and hit the cache, execution can proceed at the rate of one instruction per clock. For cache misses, the processor issues instructions until the cache is full. Subsequent back-to-back execution proceeds at bus bandwidth.
Table A-1. BCU Instructions for the i960 CF Processor
Back to Back Issue Mnemonic Clocks Throughput Clocks Throughput Clocks Hits Hits Misses Misses
ld 1 1 1 4 2
ldob ldib ldos ldis 1 1 1 5 3
ldt 1 1 1 6 4
ldq 1 1 1 5 51 1 2 2 stob stib stos stis
stl 1 N/A 3 N/A
stt N/A 4 N/A
stq N/A 5 N/A
A.1.8.8 BCU Queues and Cache Coherency
The bus control unit is implemented as a coprocessor. Many clock cycles can pass after a cacheable load instruction is issued before data is returned to the data cache and registers. Because of this delay, the BCU was modified to support data cache operation. The pro
228. RS AND LITERALS AS INSTRUCTION OPERANDS
The i960 Cx processors use only simple load and store instructions to access memory; all operations take place at the register level. The processors use 16 global registers, 16 local registers, three special function registers and 32 literals (constants 0-31) as instruction operands.
The global register numbers are g0 through g15; local register numbers are r0 through r15; special function registers are sf0, sf1 and sf2. Several of these registers are used for a dedicated function. For example, register r0 is the previous frame pointer, sometimes referred to as pfp. Some assemblers and compilers only recognize one form of a register operand. i960 processor compilers recognize only the instruction operands listed in Table 2-1. Throughout this manual, the registers' descriptive names, numbers, operands and acronyms are used interchangeably, as dictated by context.
[Figure 2-1. i960 Cx Microprocessor Programming Environment: address space 0000 0000H to FFFF FFFFH; architecturally defined data structures; fetch, load and store paths; instruction cache; instruction stream; instruction execution; sixteen 32-bit global registers; sixteen 32-bit local registers; register cache; three special function registers (sf0 to sf2); processor state registers (instruction pointer, arithmetic controls, process controls, trace controls)]
2.2.1 Glob
229. STATE 1: Enable Selected Chip, Hold Off Write or Read
STATE 0
CE UART A3 TC A3 RD is not asserted W R is not asserted
the next state is state 2
STATE 2: Enable Selected Chip, Hold Off Write or Read
CE UART A3 TC A3 RD is not asserted W R is not asserted
the next state is state 3
STATE 3: Enable Selected Chip, Hold Off Write or Read
CE UART A3 TC A3 RD is not asserted W R is not asserted
IF READ (read) THEN next state is STATE 4
ELSE (write) next state is STATE 5
STATE 4: Read asserted to selected peripheral
CE UART A3 TC A3 RD is asserted W R is not asserted
IF BLAST (Done) THEN next state is STATE 0
ELSE write next state is STATE 4
STATE 5: Write asserted to selected peripheral
CE UART A3 TC A3 RD is not asserted W R WAIT
IF BLAST (Done) THEN next state is STATE 0
ELSE write next state is STATE 5
APPENDIX C CONSIDERATIONS FOR WRITING PORTABLE CODE
This appendix describes the aspects of the microprocessor that are implementation-dependent. The following information is intended as a guide for writing application code that is directly portable to other i960 architecture implementations. C
230. Ser Bit bitpos sre reg lit sfr reg lit sfr dst reg sfr dst lt src or 2 bitpos mod 32 58 3 R 0 5 1 1 shli Shift Left Integer len reg lit sfr dst reg sfr if len gt 32 i 32 else i lt len temp src s sign lt temp bit31 while temp bit31 temp lt temp lt lt 1 i dst lt temp ign and i 0 IO 59 E REG R 05 1 1 1994 Page 14 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic Description cc cc cc Opcode Mach Instruction Result nif om of 2 1 0 p tfp em te Events Modes T Y Format Type em Latency Shift Left Ordinal shlo len sre dst reg lit sfr reg lit sfr reg sfr U M 59 C REG R 05 1 1 if len lt 32 dst lt src lt lt len else dst lt 0 Shift Right Dividing Inte hrdi len sre dst shrd reg lit sfr reg lit sfr reg sfr if len gt 32 i lt 32 else i lt len temp lt src s sign lt temp bit31 lost bit 0 while i 0 I U M 59 5 REG m 3 3 lost bit lt lost bit or temp bit0 temp lt temp gt gt 1 temp bit31 lt temp bit30 ici 1 if s_sign
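Two of the actions in this quick-reference snippet, set bit (dst <- src or 2^(bitpos mod 32)) and shift left ordinal (dst <- src << len if len < 32, else 0), translate directly into C. The sketch below is an illustration of those two formulas only; the function names mirror the mnemonics but are otherwise hypothetical, and `check_examples` is just a small usage demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Set bit: OR in a single bit selected by bitpos modulo 32. */
uint32_t setbit(uint32_t bitpos, uint32_t src)
{
    return src | (UINT32_C(1) << (bitpos % 32));
}

/* Shift left ordinal: shift counts of 32 or more produce 0. */
uint32_t shlo(uint32_t len, uint32_t src)
{
    return (len < 32) ? (src << len) : 0;
}

/* Usage demonstration for the two helpers above. */
void check_examples(void)
{
    assert(setbit(35, 0) == 8);              /* 35 mod 32 = 3, so bit 3 is set */
    assert(shlo(4, 1) == 16);
    assert(shlo(32, UINT32_C(0xFFFFFFFF)) == 0);
}
```

The explicit test on the shift count matters in C because shifting a 32-bit value by 32 or more is undefined behavior, whereas the instruction defines the result as 0.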
231. TION SET SUMMARY
4.2.7.2 Conditional Branch
With conditional branch (BRANCH IF) instructions, the processor checks the AC register condition code flags. If these flags match the value specified with the instruction, the processor jumps to the target IP. These instructions use the displacement-plus-IP method of specifying the target IP:
be{.t|.f} branch if equal/true
bne{.t|.f} branch if not equal
bl{.t|.f} branch if less
ble{.t|.f} branch if less or equal
bg{.t|.f} branch if greater
bge{.t|.f} branch if greater or equal
bo{.t|.f} branch if ordered
bno{.t|.f} branch if unordered/false
All use the CTRL format. bo and bno are used with real numbers. Refer to section 2.6.2.2, Condition Code (pg. 2-16) for a discussion of the condition code for conditional operations.
4.2.7.3 Compare and Branch
These instructions compare two operands, then branch according to the comparison result. Three instruction subtypes are compare integer, compare ordinal and branch on bit:
cmpibe{.t|.f} compare integer and branch if equal
cmpibne{.t|.f} compare integer and branch if not equal
cmpibl{.t|.f} compare integer and branch if less
cmpible{.t|.f} compare integer and branch if less or equal
cmpibg{.t|.f} compare integer and branch if greater
cmpibge{.t|.f} compare integer and branch if greater or equal
cmpibo{.t|.f} compare integer and branch if ordered
cmpibno{.t|.f} compare integer and branch if unordered
cmpobe{.t|.f}
232. The pseudo code following the figure is provided only to describe the state machine diagram It is not intended for direct use as PLD equations State BLAST amp IWAIT amp 1 3 amp 1 2 E i ress Multiplexer IBLAST amp amp amp A2 IF ICOL ADR amp DRAM_ADR 3 2 ADR 3 2 D BLAST IF COL ADR DRAMADRS2 ADR 12 11 1 DRAM ADR 32 0 1 2 DRAM ADR 322 1 0 3 DRAM ADR 3 2 1 1 F CA118A Figure B 18 DRAM Address Generation State Machine B 23 BUS INTERFACE EXAMPLES STATE 0 STATE 1 STATE 2 STATE 3 Multiplexer Emulation DRAM ADR2 DRAM ADR3 IF WAIT amp amp COI L ADR amp amp 2 COI L_ADR 66 1 L ADR 66 COI L ADR amp amp address generation BLAST amp amp COL ADR amp amp ADR3 THEN next s ELSE IF WAIT amp amp BLAST amp amp COL ADR amp amp ADR3 1 amp amp ADR2 THEN next s ELSE next s 0 amp amp ADR2 0 tate is STATE 0 tate is STATE tate is STATE 0 Generate address 01 DRAM ADR2 1 DRAM ADR3 0 IF LAST THE ELSE IF LAST amp amp WAIT B N next state is STATE B HE next state is STATE ELSE next state is STATE 1 Generate address 10 DRAM ADR2 0 DRAM ADR3 1 BLAST
233. Then, on a return from that fault handler, it generates a fault on the prereturn trace event. The prereturn trace is the only trace event that can cause two successive trace faults to be generated between instruction boundaries.

8.2.6 Supervisor Trace

When supervisor trace mode is enabled, the processor generates a supervisor trace event when:

• a call-system instruction (calls) executes, where the procedure table entry is for a system-supervisor call, or
• a ret instruction executes and the return-type field is set to 010₂ or 011₂ (i.e., return from supervisor mode).

When these procedures are called with supervisor calls, this trace mode allows a debugging program to determine kernel procedure call boundaries within the instruction stream.

8.2.7 Breakpoint Trace

Breakpoint trace mode allows trace events to be generated at places other than those specified with the other trace modes. This mode is used in conjunction with mark and fmark.

8.2.7.1 Software Breakpoints

mark and fmark allow breakpoint trace events to be generated at specific points in the instruction stream. When breakpoint trace mode is enabled, the processor generates a breakpoint trace event any time it encounters a mark. fmark causes the processor to generate a breakpoint trace event regardless of whether breakpoint trace mode is enabled.

8.2.7.2 Hardware Breakpoints

The hardware breakpoint registers are provided to enable generation of trace events and trac
234. [Figure F-3. Fault Table and Fault Table Entries (F_CA019A). Labels recovered from the figure: Type Fault Entry, 50H, FCH; Local-Call Entry: fault handler procedure address in bits 31:2; System-Call Entry: fault handler procedure number in bits 31:2, 0000 027FH; Reserved (initialize to 0).] See section 7.3, FAULT TABLE (pg. 7-4).

[Figure F-4. Initial Memory Image (IMI) and Process Control Block (PRCB) (F_CA135A).] See section 14.2.5, Initialization Boot Record (IBR) (pg. 14-5).

Fixed data structures: Initialization Boot Record
  FFFFFF00H                Initial bus configuration (least significant byte of each word)
  FFFFFF10H                First instruction pointer
  FFFFFF14H                PRCB pointer
  FFFFFF18H to FFFFFF2CH   6 check words for bus confidence self-test

Relocatable data structures: user code, Process Control Block (PRCB), control table, interrupt table, system procedure table, and other architecturally defined data structures (not required as part of the IMI).

Process Control Block (PRCB)
  00H   Fault table base address
  04H   Control table base address
  08H   AC register initial image
  0CH   Fault configuration word
  10H   Interrupt table base address
  14H   System procedure table base address
  18H   Reserved
  1CH   Interrupt stack pointer
  20H   Instruction cache configuration word
  24H   Register cache configuration word

[Figure: Current Stack (local, supervisor or interrupt stack), bits 31 to 0
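To make the PRCB layout above concrete, the following Python sketch unpacks the ten PRCB words from a raw memory image. It is a minimal illustration under stated assumptions: the field names are hypothetical labels invented here, and the image is assumed to be stored little-endian; only the offsets come from the figure.

```python
import struct

# Hypothetical field names; the order follows the PRCB offsets 00H to 24H.
PRCB_FIELDS = (
    "fault_table_base",             # 00H
    "control_table_base",           # 04H
    "ac_initial_image",             # 08H
    "fault_config_word",            # 0CH
    "interrupt_table_base",         # 10H
    "system_procedure_table_base",  # 14H
    "reserved",                     # 18H
    "interrupt_stack_pointer",      # 1CH
    "icache_config_word",           # 20H
    "register_cache_config_word",   # 24H
)


def prcb_offset(name: str) -> int:
    """Byte offset of a PRCB field (each field is one 32-bit word)."""
    return 4 * PRCB_FIELDS.index(name)


def parse_prcb(image: bytes, base: int = 0) -> dict:
    """Unpack ten 32-bit words starting at `base` (little-endian assumed)."""
    words = struct.unpack_from("<10I", image, base)
    return dict(zip(PRCB_FIELDS, words))
```

A host-side tool could use `parse_prcb` to sanity-check a boot image before it is programmed, for example by confirming that the reserved word is zero.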
235. Table 10-2. BCON Register Bit Definitions

Entry Name                  Bit   Definition
Configuration Table Valid   0     When the BCON.ctv bit is clear, all memory is accessed as defined by MCON 0. When the BCON.ctv bit is set, MCON 0-15 are used.
Internal RAM Protection     1     Enables supervisor write protection for internal data RAM at address 100H to 3FFH.

10.3.3 Configuring the Bus Controller

The bus controller is configured automatically when the processor initializes. All MCON 0-15 values are loaded from the control table, and the BCON.ctv bit is set (table valid) before the first instruction of application code executes. The user only has to supply the correct values in the control table in external memory. See CHAPTER 14, INITIALIZATION AND SYSTEM REQUIREMENTS, for more details on the processor's actions at initialization.

MCON 0-15 values may be altered after initialization by use of the sysctl instruction. Avoid altering an enabled MCON register while a bus access to that region is in progress. It is acceptable, however, to write the same data to an enabled MCON register while a bus access to that region is in progress. This consideration is especially important for MCON 0 when it is the master entry (BCON.ctv = 0).

10.4 DATA ALIGNMENT

Aligned bus requests generate an address that occurs on a data type's natural boundary. Quad words and triple words are aligned on 16-byte boundaries, double words on 8-by
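The effect of the BCON.ctv bit on region selection can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the region number is taken to be the upper four address bits (A31:28), as the manual states in its description of memory region configuration.

```python
def region_of(address: int) -> int:
    """Memory region number selected by the upper four address bits A31:28."""
    return (address >> 28) & 0xF


def active_mcon_index(address: int, bcon_ctv: bool) -> int:
    """Index of the MCON register that governs an access.

    While BCON.ctv is clear, MCON 0 acts as the master entry for all
    memory; once it is set, each region uses its own MCON register.
    """
    return region_of(address) if bcon_ctv else 0
```

With this model, an access to F000 0000H uses MCON 15 after initialization has set BCON.ctv, but MCON 0 while the bit is still clear.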
236. [Figure B-5. A3:2 Address Generation State Machine (F_CA105A): states for access 00 (idle), 01, 10 and 11; transitions occur on the first or next access and on access completed, qualified by ADS, CS, A3, A2, WAIT and BLAST.]

Pseudo-code key: signal is asserted low; equality test; clocked assignment; & logical AND; value assignment; logical OR; X = don't care.

STATE 0: BA3:2 = 00
    IF (access 01 OR next access) ADS & SRAM_CS & A3:2 = 01, or CE & WAIT & BLAST
        next state is STATE 1
    ELSE IF (access 10) ADS & SRAM_CS & A3:2 = 10
        next state is STATE 2
    ELSE IF (access 11) ADS & SRAM_CS & A3:2 = 11
        next state is STATE 3
    ELSE (idle or access 00)
        next state is STATE 0
STATE 1: BA3:2 = 01
    IF (next access) CE & WAIT & BLAST THEN next state is STATE 2
    ELSE IF (done) BLAST THEN next state is STATE 0
    ELSE (just wait) next state is STATE 1
STATE 2: BA3:2 = 10
    IF (next access) CE & WAIT & BLAST THEN next state is STATE 3
    ELSE IF (done) BLAST THEN next state is STATE 0
    ELSE (just wait) next state is STATE 2
STATE 3: BA3:2 = 11
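The behavior of the A3:2 burst address counter can be modeled in Python for quick experimentation. The sketch below abstracts the signal-level conditions into three hypothetical inputs (`start` for a new access with ADS and the SRAM chip select asserted, `next_access` and `done`), so it says nothing about signal polarity. The handling of STATE 3 is an assumption, since the transitions for that state are not shown in this excerpt.

```python
def next_state(state: int, *, start: bool = False, a32: int = 0,
               next_access: bool = False, done: bool = False) -> int:
    """Return the next value of the BA3:2 counter (0 to 3).

    start       -- a new access begins; a32 holds address bits A3:2
    next_access -- the current data transfer finished and more follow
    done        -- the last data transfer of the access finished
    """
    if state == 0:
        if next_access or (start and a32 == 1):
            return 1
        if start and a32 in (2, 3):
            return a32
        return 0                      # idle, or an access to 00
    if state in (1, 2):
        if next_access:
            return state + 1
        if done:
            return 0
        return state                  # just wait
    # STATE 3 (assumed): return to idle when done, otherwise hold.
    return 0 if done else 3
```

Stepping the model through a four-word burst that starts at address 00 visits states 0, 1, 2 and 3 and then returns to 0 when the access completes.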
237. X 0 1 1 X 1 0 0 X 1 1

32-Bit Bus Width

BYTE          BE3   BE2   BE1   BE0
0, 1, 2, 3    0     0     0     0
2, 3          0     0     1     1
0, 1          1     1     0     0
0             1     1     1     0
1             1     1     0     1
2             1     0     1     1
3             0     1     1     1

11.2.3 Non-Burst Requests

A basic request (non-burst, non-pipelined; see Figure 11-5) is an address cycle followed by a single data cycle, including any optional wait states associated with the request. Wait states may be generated internally by the wait state generator or externally using the i960 Cx processor's READY input.

[Figure 11-5. Basic Read Request, Non-Pipelined, Non-Burst, Wait States (F_CX027A). Signals shown: PCLK, A31:2, BE3:0, W/R, BLAST, DT/R, DEN, DMA, D/C, SUP, LOCK, WAIT.]

Non-burst accesses and non-pipelined reads are the most basic form of memory access. Non-burst regions may be used to memory-map peripherals and memory that cannot support burst accesses. Ready control may be enabled or disabled for the region. The NRAD, NWAD and NXDA wait state fields of a region table entry control basic accesses. NRAD specifies the number of wait states between address and data cycles for read accesses. NWAD specifies the number of wait states between add
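The 32-bit byte-enable table above follows a simple rule: a byte-enable signal is 0 for each byte lane that takes part in the access and 1 otherwise. A small Python sketch of that rule is shown here; the function name is hypothetical, and the active-low interpretation is inferred from the table (all four bytes selected gives 0 0 0 0).

```python
def byte_enables(byte_lanes) -> tuple:
    """Return (BE3, BE2, BE1, BE0) for a 32-bit bus; 0 means asserted."""
    lanes = set(byte_lanes)
    if not lanes or not lanes <= {0, 1, 2, 3}:
        raise ValueError("byte lanes must be a non-empty subset of 0..3")
    return tuple(0 if lane in lanes else 1 for lane in (3, 2, 1, 0))
```

For instance, `byte_enables([2, 3])` reproduces the second table row, and `byte_enables([1])` reproduces the row for byte 1.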
238. Transceiver direction control is connected directly to the DT/R signal of the i960 Cx devices. Data transceiver usage is optional; it is used here to reduce capacitive loading on the data bus. The i960 Cx processors can drive substantial capacitive loads; however, high-speed SRAM may have limited drive capabilities. If high-speed SRAM is on the data bus, it may be necessary to buffer the slower peripherals.

[Figure B-29. 8-bit Interface Schematic (F_CA129A). Blocks shown: i960 microprocessor, interface logic, 82510 UART, 82C54-2 counter, data transceiver (DATA_8_EN); address lines A2 and A1:0.]

B.5.3 Waveforms

The Timer Counter and UART have long address setup times to read or write. They also have long read and write recovery times. This design uses a PLD to implement a state machine that delays the read or write signal. Delaying the read or write signal satisfies command recovery times. Using the internal wait state generator to determine the length of the overall read or write cycle adds flexibility and simplifies the state machine.

[Figure B-30. Read Waveforms (F_CA130A)]

Data lines are not driven during NXDA wait states. This requires gating the W/R signal with the WAIT signal so that W/R goes high while the data is still asserted. There is a relative timing for output data hold after WAIT goes high. The data hold require
239. [Figure 11-1. Internal Programmable Wait States (CA033A). Read and write waveforms for A31:2, ADS, BLAST, WAIT and D31:0. The read example is labeled NRAD = 3; the write example is labeled NWAD = 3, NWDD = 2 and NXDA = 3.]

For pipelined read accesses, the bus controller uses a value of zero for the NXDA parameter, regardless of the programmed value for the parameter. A non-zero value defeats the purpose of pipelining. The programmed value of NXDA is used for wri
240. a burst access.

[Figure B-26. Two-Way Interleaved Read Access Overlap (CA126A): one-wait-state, burst, pipelined memory system; data cycles alternate between the even bank and the odd bank.]

Figure B-27 is a simple schematic of a two-way interleaved pipelined memory system. The design is similar to that of a non-interleaved pipelined memory, with the following exceptions:

• an output data multiplexer is used to prevent data contention
• the write data buffers isolate the memory data buses for writes
• the low address bit to the memory devices is A3

The A2 address determines which bank (even or odd word) is selected. Figure B-28 shows the read waveform.

Figure B-28 illustrates a memory system that interleaves read accesses. Write interleaving requires latching the written data and controlling memory access with the READY signal. Write interleaving provides less performance improvement than read interleaving. Write data must come from the processor; this means a write-interleaved system must queue data. The i960 Cx processor bus controller queues all accesses; therefore, write interleaving does not significantly benefit most applications.

[Figure B-27 (F_CA127A): control logic, address paths, OE and WE3:0 signals, data bus and the 80960Cx processor.] Figure
241. a byte count of 12. It takes five DREQs to complete this transfer, with DACK asserted for every access to the source. Alignment overhead occurs at the beginning and end of the DMA operation and, depending on the DMA byte count, may be negligible. For short-to-short, short-to-word and word-to-short multi-cycle transfers, the DMA performs byte requests when a memory block is unaligned.

[Figure 13-5. Optimization of an Unaligned DMA (F_CA062A). Source memory region: 0000 0200H to 0000 020CH. Destination memory region: 0000 0300H to 0000 030CH. Bus operations listed: byte load 0201, byte store 0303, word load 0200, word load 0204, word store 304, word load 0208, word store 308, word load 020C, byte store 30C, byte store 30D, byte store 30E.]

13.5 DATA CHAINING

Data chaining can generate complex DMAs by linking together multiple transfer operations. It is accomplished by using memory-based chaining descriptors to describe the component parts of a more complex DMA operation. The component parts of the chained DMA are referred to as chaining buffers. To describe a single DMA chaining buffer, a chaining descriptor (Figure 13-6) supplies the source address (SA), destination address (DA) and byte count (BC). Chaining buffers are linked together with the value of the next point
242. a cycle of the current read access. Pipelining makes the address cycle invisible for back-to-back read accesses.

10.2.3 Wait States

A wait state generator within the bus controller generates wait states for a memory access. For many memory interfaces, the internal wait state generator eliminates the need to generate a memory-ready signal externally to indicate a valid data transfer. Typically, extra clock cycles (wait states) are associated with each data cycle. Wait states provide the required access times for external memory or peripherals. Five parameters, programmed for each region, define wait state generator operation:

NRAD   Number of wait cycles for Read Address-to-Data. The number of wait states between the address cycle and the first read data cycle. Programmable for 0-31 wait states.
NRDD   Number of wait cycles for Read Data-to-Data. The number of wait states between consecutive data cycles of a burst read. Programmable for 0-3 wait states.
NWAD   Number of wait cycles for Write Address-to-Data. The number of wait states that data is held after the address cycle and before the first write data cycle. Programmable for 0-31 wait states.
NWDD   Number of wait cycles for Write Data-to-Data. The number of wait states that data is held between consecutive data cycles of a burst write. Programmable for 0-3 wait states.
NXDA   Number of wait cycles for X (read or write) Data-to-Addre
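As a worked illustration of how the wait state parameters combine, the sketch below estimates the clock count of one bus request. It is an approximation under stated assumptions: the address cycle and each data cycle are taken to occupy one clock each, the external READY input is not used, pipelining is not modeled, and the function and argument names are hypothetical.

```python
def access_clocks(data_cycles: int, n_ad: int, n_dd: int, n_xda: int) -> int:
    """Estimate clocks for one request from its wait state parameters.

    data_cycles -- number of data cycles in the request (1 for non-burst)
    n_ad        -- address-to-data wait states (NRAD or NWAD, 0-31)
    n_dd        -- data-to-data wait states (NRDD or NWDD, 0-3)
    n_xda       -- wait states after the last data cycle (NXDA)
    """
    if data_cycles < 1:
        raise ValueError("a request has at least one data cycle")
    if not 0 <= n_ad <= 31 or not 0 <= n_dd <= 3:
        raise ValueError("wait state value outside its programmable range")
    address_cycle = 1
    return (address_cycle + n_ad + data_cycles
            + (data_cycles - 1) * n_dd + n_xda)
```

Under these assumptions a zero-wait-state, non-burst access takes two clocks, and a four-word burst with NRAD = 3, NRDD = 2 and NXDA = 3 takes 1 + 3 + 4 + 6 + 3 = 17 clocks.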
243. a destination register; otherwise, a FALSE is stored. These instructions use the COBR format and can operate on local, global and special function registers.

Since test instruction actions depend on a comparison, the architecture allows a programmer to predict the likely result of the operation for higher performance. The programmer's prediction is encoded in one bit of the opword. Intel 80960 assemblers encode the prediction with a mnemonic suffix of .t for true and .f for false. Refer to section A.2.7.7, Branch Prediction (pg. A-53).

4.2.7 Branch

Branch instructions allow program flow direction to be changed by explicitly modifying the IP. The processor provides three branch instruction types:

• unconditional branch
• conditional branch
• compare and branch

Most branch instructions specify the target IP by specifying a signed displacement to be added to the current IP. Other branch instructions specify the target IP's memory address using one of the processor's addressing modes. This latter group of instructions is called extended addressing instructions (e.g., branch extended, branch and link extended).

Since branch instruction actions depend on the result of a previous comparison, the architecture allows a programmer to predict the likely result of the branch operation for higher performance. The programmer's prediction is encoded in one bit of the opword. The Intel 80960 assembler encodes the predic
244. a on data bus pins.

When burst bus mode is enabled, a single address cycle can be followed by up to four data cycles. This mode enables very high-speed data bus transfers. When it is disabled, accesses appear as one data cycle per address cycle. The burst bus mode can be enabled or disabled on a region-by-region basis.

A programmable wait state generator inserts a programmed number of wait states into a memory access. These wait states, independently programmable by region, can be specified between:

• address and data cycles
• consecutive data cycles of burst accesses
• the last data cycle and the address cycle of the next request

An external memory-ready input permits the user's hardware to insert wait states into any memory cycle. This pin works with the wait state generator and is enabled or disabled on a region-by-region basis.

Pipelined read mode provides the highest data bandwidth for reads and instruction fetches. When a region is programmed for pipelined reads, the next read's address cycle overlaps the current read's data cycle.

The bus controller supports big- and little-endian byte ordering for memory operations. Byte ordering determines how data is read from or written to the bus and, ultimately, how data is stored in memory.

10.2 MEMORY REGION CONFIGURATION

Programmable memory region configurations simplify external memory system designs and reduce system parts count. Certain bus
245. a single load or store request. Source data is not buffered internally; instead, the data passes directly between source and destination via the external data bus. This makes fly-by the fastest DMA transfer type. Fly-by transfers are commonly used for high-performance peripheral-to-memory transfers.

The fly-by mechanism is best described with an example of a source-synchronized demand-mode DMA (Figure 13-2). In the example, a peripheral at a fixed address is the source of a DMA and memory is the destination. Each transfer is synchronized with the source. The source requests a transfer by asserting the request pin (DREQ3:0). When the request is serviced, a store is issued to the destination memory while the requesting device is selected by the DMA acknowledge pin (DACK3:0). The source device, when selected, must drive the data bus for the store instead of the processor. The processor floats the data bus for a fly-by transfer.

[Figure 13-2. Example of Source Synchronized Fly-by (F_CA059A). A 32-bit source device and a 32-bit memory share the external bus with the i960 CA/CF microprocessor; the processor floats the data bus and the source drives it during each word store, with one word store per DREQx request.]

If the destination of a fly-by is the requestor (destination synchronization), a load is issued to th
246. a switch from user mode to supervisor mode occurs, and it clears the flag on a return from supervisor mode. User and supervisor modes are described in section 2.7, USER SUPERVISOR PROTECTION MODEL (pg. 2-20).

[Figure 2-5. Process Controls (PC) Register (F_CR005A)]

Trace Enable Bit (PC.te)          0 = no trace faults; 1 = generated trace faults
Execution Mode Flag (PC.em)       0 = user mode; 1 = supervisor mode
Trace Fault Pending (PC.tfp)      0 = no fault pending; 1 = fault pending
State Flag (PC.s)                 0 = executing; 1 = interrupted
Priority Field (PC.p)             0-31, process priority
Reserved                          Do not modify

The PC register state flag (bit 13) indicates processor state: executing (0) or interrupted (1). If the processor is servicing an interrupt, its state is interrupted. Otherwise, the processor's state is executing. While in the interrupted state, the processor can receive and handle additional interrupts. When nested interrupts occur, the processor remains in the interrupted state until all interrupts are handled, then switches back to the executing state on the return from the initial interrupt procedure.

The PC register priority field (bits 16 through 20) indicates the processor's current executing or interrupted priority. The architecture defines a mechanism for prioritizing execution of code, servicing interrupts and servicing other implementation-depen
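The two PC register fields whose bit positions are given above (state flag in bit 13, priority in bits 16 through 20) can be extracted with simple shifts and masks. The following Python sketch shows one way to do that; the function names are hypothetical, and the other PC fields are left out because their bit positions are not stated in this passage.

```python
PC_STATE_BIT = 13         # state flag: 0 = executing, 1 = interrupted
PC_PRIORITY_SHIFT = 16    # priority field occupies bits 16 through 20
PC_PRIORITY_MASK = 0x1F   # five bits, values 0-31


def pc_state(pc: int) -> str:
    """Decode the PC register state flag."""
    return "interrupted" if (pc >> PC_STATE_BIT) & 1 else "executing"


def pc_priority(pc: int) -> int:
    """Extract the 5-bit priority field from a PC register value."""
    return (pc >> PC_PRIORITY_SHIFT) & PC_PRIORITY_MASK
```

A debugger front end could apply these helpers to a PC image read from the target to display the current state and priority.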
247. able bit is clear, the processor sets the appropriate event flags but does not set the PC register trace fault pending flag.

8.4 HANDLING MULTIPLE TRACE EVENTS

If the processor detects multiple trace events, it records one or more of them based on the following precedence, where 1 is the highest precedence:

1. Supervisor trace event
2. Breakpoint (from a mark or fmark instruction or from a breakpoint register), branch, call or return trace event
3. Instruction trace event

When multiple trace events are detected, the processor may not signal each event; however, it always signals the one with the highest precedence.

8.5 TRACE FAULT HANDLING PROCEDURE

The processor calls the trace fault handling procedure when it detects a trace event. See section 7.7, FAULT HANDLING PROCEDURES (pg. 7-12) for general requirements for fault handling procedures.

The trace fault handling procedure is invoked in a specific way and is handled differently than other faults. A trace fault handler must be invoked with an implicit system-supervisor call. When the call is made, the PC register trace enable bit is cleared. This disables trace faults while the trace fault handler is executing. Recall that, for all other implicit or explicit system-supervisor calls, the trace enable bit is replaced with the system procedure table trace control bit. The exceptional handling of trace enable for trace faults ensures t
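The precedence rule for simultaneous trace events can be expressed compactly. The Python sketch below is illustrative only: the event names are hypothetical labels, and because the text does not say how events at the same precedence level are ordered, the function returns every detected event at the highest level rather than choosing one.

```python
# Precedence levels from the list above (1 is the highest).
TRACE_PRECEDENCE = {
    "supervisor": 1,
    "breakpoint": 2, "branch": 2, "call": 2, "return": 2,
    "instruction": 3,
}


def top_precedence_events(events) -> set:
    """Return the detected events that share the highest precedence."""
    events = set(events)
    if not events:
        return set()
    top = min(TRACE_PRECEDENCE[e] for e in events)
    return {e for e in events if TRACE_PRECEDENCE[e] == top}
```

For example, when an instruction trace event and a branch trace event occur together, only the branch event is at the top level.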
248. about to be issued. The IS issues group A, then issues the branch and group B simultaneously. Since the branch takes two clocks in the CTRL pipeline to execute, there is no break in the IS's ability to issue instructions. On the clock following the issuance of group B, the IS issues a new group of instructions from the branch target.

        b     x
    x:  addo  g0, g1, g2
        lda   2(g3), g4
        lda   2(g5), g6
        b     y
    y:  addo  g7, g8, g9
        lda   2(g10), g11

[Figure A-16. Branch in Second Executable Group. Pipeline rows: instruction scheduler (groups A, B), CTRL pipeline, EU pipeline (g2 ← g0 + g1; g9 ← g7 + g8) and AGU pipeline (g4 ← g3 + 2; g6 ← g5 + 2; g11 ← g10 + 2).]

        b     x
    x:  lda   2(g3), g4
        addo  g0, g1, g2
        addo  g5, g6, g7
        b     y
    y:  addo  g8, g9, g10
        lda   2(g11), g12

[Figure A-17. Branch in Third Executable Group. Pipeline rows: instruction scheduler (groups A, B, C), CTRL pipeline, EU pipeline (g2 ← g0 + g1; g7 ← g5 + g6; g10 ← g8 + g9) and AGU pipeline (g4 ← g3 + 2; g12 ← g11 + 2).]

A.2.4.9 Conditional Br
249. access characteristics may be programmed. This programmed bus scheme allows accesses made to different areas (or regions) in memory to have different characteristics. For example, one area in memory can be configured for slow 8-bit accesses; this is optimal for peripherals. Another area in memory can be configured for 32-bit-wide burst accesses; this is optimal for fast DRAM interfaces. Bus function in each region is determined by the memory region configuration.

The following bus characteristics are selected for each region:

• Selectable 8-, 16- or 32-bit-wide data bus
• Programmable high-performance burst access
• Five wait state parameters
• Memory ready and burst cycle terminate for dynamic access control
• Programmable pipelined reads
• Big- or little-endian byte order

These characteristics can be programmed independently for accesses made to each of 16 different regions in memory. The value of the memory address's upper four bits (A31:28) determines the selected region. Memory region configuration affects all accesses to the addressed memory region. Loads, stores, DMA transfers and instruction fetches all use the parameters defined for the region.

Programming region characteristics is accomplished by setting values in the memory region configuration registers. A separate register allows the user to program the characteristics for each of the 16 memory regions. Memory region configuration register
250. ache Interaction

During the first issue clock, the data cache receives the first load address and recognizes a cache hit. The following clock is an execute clock; the cache returns data to the register file over the LD bus. In the next issue clock, the cache receives the second load address and recognizes a miss. It is passed on to the BCU in the following clock. The BCU then processes the load as if there were no data cache. Note that the following load quad instruction is scoreboarded for a single clock while the previous cache miss is issued to the BCU. The load quad instruction is determined to be a hit in the third issue clock, and its full 128 bits of data are returned to the register file in the following execute clock.

The i960 CF microprocessor scoreboards the store instruction until the pending load returns data to the cache. The processor writes the data to the register file and the cache in the same clock, updating the cache tag and valid bits. In the next clock, the store instruction is issued. For the store, the processor writes unconditionally into the cache during the issue clock.

When using the i960 CF processor, refer to Table A-13 and Table A-10 for a listing of the single-clock load and store instructions. The table is valid when offset, displacement or indirect addressing modes are used over an external bus with the following characteristics: NXAD = NXDD = NXDA = 0, Burst On, Pipelining On, Ready Disabled.

INSTRUCTION EXECUTION AND PERFO
251. ad the following subsections to understand notations that are specific to this chapter.

9.2.1 Alphabetic Reference

Instructions are listed alphabetically by assembly language mnemonic. If several instructions are related and fall together alphabetically, they are described as a group on a single page.

The instruction's assembly language mnemonic is shown in bold at the top of the page (e.g., subc). Occasionally it is not practical to list all mnemonics at the page top. In these cases, the name of the instruction group is shown in capital letters (e.g., BRANCH IF or FAULT IF).

The i960 Cx processor-specific extensions to the i960 microprocessor instruction set are indicated with a box around the instruction's alphabetic reference. The following i960 Cx processor instructions are such extensions: eshro, sdma, sysctl, udma. Instruction set extensions are generally not portable to other i960 processor family implementations.

9.2.2 Mnemonic

The Mnemonic section gives the mnemonic (in boldface type) and instruction name for each instruction covered on the page, for example: subi, Subtract Integer.

CTRL and COBR format instructions also allow the programmer to specify optional .t or .f mnemonic suffixes for branch prediction:

• .t indicates to the processor that the condition the instruction is testing for is likely to be true
• .f indicates that the condition is l
252. aint Privileged: attempt to execute while not in supervisor mode.

Example:      ldconst 3, r6                     # set channel
              ldconst Channel_3_Modes, r7       # load controls
              ldq     Channel_3_transfer, r8    # load pointers and byte count from memory
              sdma    r6, r7, r8                # configure dma channel 3

Opcode:       sdma    630H    REG

See Also:     udma

9.3.51 setbit

Mnemonic:     setbit    Set Bit

Format:       setbit    bitpos,        src,           dst
                        reg/lit/sfr    reg/lit/sfr    reg/sfr

Description:  Copies the src value to dst with one bit set. bitpos specifies the bit to be set.

Action:       dst ← src or 2^(bitpos mod 32)

Faults:       Type Mismatch: non-supervisor reference of an sfr.

Example:      setbit 15, r9, r1     # r1 ← r9 with bit 15 set

Opcode:       setbit    583H    REG

See Also:     alterbit, chkbit, clrbit, notbit

9.3.52 SHIFT

Mnemonic:     shlo     Shift Left Ordinal
              shro     Shift Right Ordinal
              shli     Shift Left Integer
              shri     Shift Right Integer
              shrdi    Shift Right Dividing Integer

Format:       sh*      len,           src,           dst
                       reg/lit/sfr    reg/lit/sfr    reg/sfr

Description:  Shifts src left or right by the number of bits indicated with the len operand and stores the result in dst. Bits shifted beyond the register boundary are discarded. For values of len greater than 32, the processor interprets the value as 32.

shlo shifts zeros in from the least significant bit; shro shifts zeros in from the most significant bit. These instructions are equivalent to m
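The setbit action and the ordinal shift rules above translate directly into integer operations. The Python sketch below models them on 32-bit values; it is a behavioral illustration rather than a description of the hardware, and the argument order (operands first, result returned) is a convenience chosen here.

```python
MASK32 = 0xFFFFFFFF


def setbit(bitpos: int, src: int) -> int:
    """dst <- src or 2^(bitpos mod 32)."""
    return (src | (1 << (bitpos % 32))) & MASK32


def shlo(length: int, src: int) -> int:
    """Shift left ordinal: zeros enter at the least significant bit."""
    length = min(length, 32)          # len greater than 32 is treated as 32
    return (src << length) & MASK32   # bits past bit 31 are discarded


def shro(length: int, src: int) -> int:
    """Shift right ordinal: zeros enter at the most significant bit."""
    length = min(length, 32)
    return (src & MASK32) >> length
```

Because bitpos is taken modulo 32, `setbit(47, 0)` sets the same bit as `setbit(15, 0)`; a shift by 32 or more yields zero for both shlo and shro.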
253. ait memory src dst and not 0x3 lt dst src. LOCK is asserted during the read and deasserted after the write is completed.

atmod    Atomic Modify
         atmod src, mask, src/dst    (reg/lit/sfr, reg/lit/sfr, reg)
         temp ← memory(src and not 0x3)
         memory(src and not 0x3) ← (src/dst and mask) or (temp and not mask)
         src/dst ← temp
         LOCK is asserted during the read and deasserted after the write is completed.
         Opcode 61:0, REG format

b        Branch
         b targ
         IP ← targ
         Opcode 08, CTRL format

March 1994. Page 4 of 18. Order Number 272220-002. i960 Cx Microprocessor User's Guide, Instruction Set Quick Reference.

Table columns: Mnemonic; Description; Arithmetic Controls; Process Controls; Trace Controls; Faults; Opcode; Opcode Format; Instruction Execution (Mach Type, Instruction Issue, Result Latency).

bal      Branch and Link
         bal targ
         g14 ← next IP; IP ← targ
         Opcode 0B, CTRL format

balx     Branch and Link Extended
         balx targ, dst    (mem, reg)
         dst ← next IP; IP ← targ
         Opcode 85, MEM format

bbc      Check Bit and Branch If Clear
         bbc bitpos, src, targ    (reg/lit, sfr/reg)
         Tests bit (bitpos mod 32) of src and sets IP ← targ if the bit is clear.
         Opcode 30, COBR format

bbs      Check Bit and Branch If Set
         bbs bitpos, src, targ    (re
254. al

Format:       cmpdec*   src1,          src2,          dst
                        reg/lit/sfr    reg/lit/sfr    reg/sfr

Description:  Compares the src2 and src1 values and sets the condition code according to the comparison results. src2 is then decremented by one and the result is stored in dst. The following table shows the condition code setting for the three possible results of the comparison.

              Condition Code    Comparison
              100₂              src1 < src2
              010₂              src1 = src2
              001₂              src1 > src2

              These instructions are intended for use in ending iterative loops. For cmpdeci, integer overflow is ignored to allow looping down through the minimum integer values.

Action:       if (src1 < src2) AC.cc ← 100₂
              else if (src1 = src2) AC.cc ← 010₂
              else AC.cc ← 001₂
              dst ← src2 - 1    # overflow suppressed for cmpdeci instruction

Faults:       Type Mismatch: non-supervisor reference of an sfr.

Example:      cmpdeci 12, g7, g1    # compares g7 with 12 and sets AC.cc to indicate the result; g1 ← g7 - 1

Opcode:       cmpdeci   5A7H    REG
              cmpdeco   5A6H    REG

See Also:     cmpinco, cmpo, cmpi, cmpinci, COMPARE AND BRANCH

9.3.17

Mnemonic:     cmpi    Compare Integer
              cmpo    Compare Ordinal

Format:       cmp*    src1,          src2
                      reg/lit/sfr    reg/lit/sfr

Description:  Compares the src2 and src1 values and sets the condition code according to the comparison results. The following table shows the condition code settings for the three possible comparison results.

              Condition Code    Comparison
              100₂              src1 < src2
              010₂
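The compare-and-decrement action can be modeled in Python as shown below. This is a behavioral sketch only: the functions return the condition code and the dst value as a tuple, the ordinal form treats its operands as unsigned 32-bit values, and the integer form wraps on overflow to mirror the statement that overflow is ignored for cmpdeci. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _condition_code(src1: int, src2: int) -> int:
    """100 = src1 < src2, 010 = src1 = src2, 001 = src1 > src2."""
    if src1 < src2:
        return 0b100
    if src1 == src2:
        return 0b010
    return 0b001


def _to_signed32(value: int) -> int:
    """Wrap a Python integer to a 32-bit two's-complement value."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def cmpdeco(src1: int, src2: int):
    """Ordinal form: returns (AC.cc, dst) with dst = src2 - 1 mod 2^32."""
    return _condition_code(src1 & MASK32, src2 & MASK32), (src2 - 1) & MASK32


def cmpdeci(src1: int, src2: int):
    """Integer form: returns (AC.cc, dst); the decrement wraps silently."""
    return _condition_code(src1, src2), _to_signed32(src2 - 1)
```

In a count-down loop, the returned condition code would feed a conditional branch while dst carries the decremented counter into the next iteration.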
255. al Registers

Global registers are general-purpose 32-bit data registers that provide temporary storage for a program's computational operands. These registers retain their contents across procedure boundaries. As such, they provide a fast and efficient means of passing parameters between procedures.

Table 2-1. Registers and Literals Used as Instruction Operands

Instruction Operand   Register Name (number)    Function                      Acronym
g0 - g14              global (g0 - g14)         general purpose
fp                    global (g15)              frame pointer                 FP
pfp                   local (r0)                previous frame pointer        PFP
sp                    local (r1)                stack pointer                 SP
rip                   local (r2)                return instruction pointer    RIP
r3 - r15              local (r3 - r15)          general purpose
sf0                   special function 0        interrupt pending             IPND
sf1                   special function 1        interrupt mask                IMSK
sf2                   special function 2        DMA command                   DMAC
0 - 31                literals

The i960 architecture supplies 16 global registers, designated g0 through g15. Register g15 is reserved for the current Frame Pointer (FP), which contains the address of the first byte in the current (topmost) stack frame. See section 5.2, CALL AND RETURN MECHANISM (pg. 5-2) for a description of the FP and procedure stack.

After the processor is reset, register g0 contains die stepping information. Software must read the value of g0 before any action is taken to modify this register. The Stepping Register Information section in the 80960CA and CF data sheets de
256. al instructions are specific to the i960 Cx processors. These instructions are either functional extensions to the instruction set or instructions which control implementation-specific functions. CHAPTER 9, INSTRUCTION SET REFERENCE, denotes each implementation-specific instruction. These instructions are:

• eshro    extended shift right ordinal
• sdma     set up DMA controller
• udma     update DMA data RAM
• sysctl   system control

Application code using implementation-specific instructions is not directly portable to the entire i960 processor family.

C.5 EXTENDED REGISTER SET

The i960 architecture defines a way to address an extended set of 32 registers in addition to the 16 global and 16 local registers. Some or all of these registers may be implemented on a specific i960 processor. Since the use of the extended register set is not defined, code which addresses these registers is not functionally compatible with all implementations of the i960 architecture. On the i960 Cx processors, three extended registers are implemented as special function registers, which are designated by bits 5 and 6 of REG format instructions.

C.6 INITIALIZATION

The i960 architecture does not define an initialization mechanism. The way that an i960-based product is initialized is implementation dependent. Code which accesses locations in initialization data structures is not portable to other i960 processor implemen
257. all saved local registers. In this way, call history may be traced back through nested procedures.

flushreg is also used when implementing task switches in multitasking kernels. The procedure stack is changed as part of the task switch. To change the procedure stack, flushreg is executed to update the current procedure stack and invalidate all entries in the local register cache. Next, the procedure stack is changed by directly modifying the FP and SP registers and executing a call operation. After flushreg executes, the procedure stack may also be changed by modifying the previous frame in memory and executing a return operation.

NOTE: When a set of local registers is assigned to a new procedure, the processor may or may not clear or initialize these registers. Therefore, initial register contents are unpredictable. Also, the processor does not initialize the local register save area in the newly created stack frame for the procedure; its contents are equally unpredictable.

5.3 PARAMETER PASSING

Parameters are passed between procedures in two ways:

value       Parameters are passed directly to the calling procedure as part of the call and return mechanism. This is the fastest method of passing parameters.
reference   Parameters are stored in an argument list in memory and a pointer to the argument list is passed in a global register.

When passing parameters by value, the calling procedure stores the parameters t
258. alphabetically by assembly language mnemonic. Format and notation used in this chapter are defined in section 9.2, NOTATION (pg. 9-1). 9.1 INTRODUCTION Information in this chapter is oriented toward programmers who write assembly language code for the i960 Cx processors. The information provided for each instruction includes: alphabetic listing of all instructions; assembly language mnemonic, name and format; description of the instruction's operation; action (or algorithm) and other side effects of executing an instruction; faults that can occur during execution; assembly language example; opcode and instruction encoding format; related instructions. Additional information about the instruction set can be found in the following chapters and appendices in this manual: CHAPTER 4, INSTRUCTION SET SUMMARY: Summarizes the instruction set by group and describes the assembly language instruction format. APPENDIX D, MACHINE-LEVEL INSTRUCTION FORMATS: Describes instruction set opword encodings. APPENDIX E, MACHINE LANGUAGE INSTRUCTION REFERENCE: A quick reference listing of instruction encodings assists debug with a logic analyzer. INSTRUCTION SET QUICK REFERENCE (order 272220), included as an addendum to this manual: A tabular quick reference of each instruction's operation. 9.2 NOTATION In general, notation in this chapter is consistent with usage throughout the manual; however, there are a few exceptions. Re
259. als other peripherals may use the interface described here B 5 1 Implementation Both the 82C54 2 Timer Counter and 82510 UART have address read write and chip enable inputs and an 8 bit bidirectional data bus The slow peripherals example considers only the memory mapped interface to chip control registers The 82C54 2 and 82510 are memory mapped into a memory region programmed for non burst non pipelined reads and an 8 bit data bus The RD high to data float time dictates the number of wait states required Recovery time between reads or writes requires special treatment The following example assumes a 33 MHz bus The issues are the same at other operating frequencies B 5 2 Schematic The interface consists of chip select logic a registered PLD with at least two combinatorial outputs and a data transceiver Chip select logic is the same as in previous examples simple demultiplexer is based only on the address The PLD that controls access qualifies this signal with the address strobe ADS The state machine PLD generates chip enable read and write signals for the UART and Timer Counter It also generates the data enable control for the data transceiver The A3 address signal determines which peripheral is enabled The data transceiver is enabled by the PLD The transceiver is activated when both the CS and DEN signals are asserted The equation is DATA 8 EN CS DEN Equation B 3 B 41 BUS INTERFACE E
260. ance, the 32x32 multiply operation (a single instruction) takes less than five clocks to execute (150 ns or less at 33 MHz). Furthermore, the multiplier is a parallel unit; this allows instructions that follow a multiply to execute before the multiplication is complete. In fact, if several unrelated instructions follow a multiply, the multiplication consumes only one clock of execution. 1.1.4 Integrated Priority Interrupt Model The i960 Cx microprocessors provide a priority-based mechanism for servicing interrupts. The mechanism transparently manages up to 248 distinct sources with 31 levels of priority. Interrupt requests may be generated from external hardware, internal hardware or software. The interrupt mechanism is managed by hardware which operates in parallel with program execution. This reduces interrupt latency and overhead and provides flexible interrupt handling control. 1.1.5 Complete Fault Handling and Debug Capabilities To aid in program development, the i960 Cx processors detect faults (exceptions). When a fault is detected, the processors make an implicit call to a fault handling routine. Information collected for each fault allows a program developer to quickly correct faulting code. The processors also allow automatic recovery from most faults. To support system debug, the i960 architecture provides a mechanism for monitoring processor activities through a software tracing facility. The i960 Cx proc
261. anch operations involving real numbers. bno can be used as a branch-if-false instruction when used after chkbit; be can be used as a branch-if-true instruction when following chkbit. The targ operand value can be no farther than -2^23 to (2^23 - 4) bytes from the current IP.

The following table shows the condition code mask for each instruction. The mask is in opcode bits 0-2.

Instruction   Mask   Condition
bno           000₂   Unordered
bg            001₂   Greater
be            010₂   Equal
bge           011₂   Greater or equal
bl            100₂   Less
bne           101₂   Not equal
ble           110₂   Less or equal
bo            111₂   Ordered

Action: For all instructions except bno: if ((mask and AC.cc) ≠ 000₂) IP ← IP + displacement (resume execution at new IP); else resume execution at next IP. bno: if (AC.cc = 000₂) IP ← IP + displacement (resume execution at new IP); else resume execution at next IP.
Faults: Trace: Instruction, Branch (if taken), Breakpoint. Instruction and Branch Trace Events are signaled after instruction completion. Trace fault is generated if PC.te = 1 and TC.i or TC.b = 1. Operation: Unimplemented: Execution from on-chip data RAM.
Example: # assume (AC.cc AND 100₂) ≠ 0
bl xyz # IP ← xyz
Opcode: bno 10H, bg 11H, be 12H, bge 13H, bl 14H, bne 15H, ble 16H, bo 17H (all CTRL format)
See Also: b, bx, bbc, bbs, COMPARE AND BRANCH, bal, balx BRANCH IF
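The mask test in the Action above can be sketched in C. This is a minimal illustration under stated assumptions, not the processor's implementation: `branch_taken` and `next_ip` are hypothetical helper names, the mask values are the table entries read as 3-bit binary numbers, and the fall-through address is assumed to be IP + 4 (one instruction word).

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition-code masks read from the table (opcode bits 0-2). */
enum cond_mask {
    MASK_BNO = 0x0, /* unordered        */
    MASK_BG  = 0x1, /* greater          */
    MASK_BE  = 0x2, /* equal            */
    MASK_BGE = 0x3, /* greater or equal */
    MASK_BL  = 0x4, /* less             */
    MASK_BNE = 0x5, /* not equal        */
    MASK_BLE = 0x6, /* less or equal    */
    MASK_BO  = 0x7  /* ordered          */
};

/* Hypothetical helper: returns true when the conditional branch is taken.
 * bno is the special case: it branches only when AC.cc is 000. */
bool branch_taken(uint8_t mask, uint8_t ac_cc)
{
    mask &= 0x7;
    ac_cc &= 0x7;
    if (mask == MASK_BNO)
        return ac_cc == 0;
    return (mask & ac_cc) != 0;
}

/* Hypothetical helper: IP after the branch instruction completes.
 * displacement is a byte displacement; IP + 4 is the assumed next IP. */
uint32_t next_ip(uint32_t ip, int32_t displacement, uint8_t mask, uint8_t ac_cc)
{
    if (branch_taken(mask, ac_cc))
        return ip + (uint32_t)displacement;
    return ip + 4u;
}
```

For instance, with AC.cc holding 100 (less), `branch_taken(MASK_BL, 0x4)` and `branch_taken(MASK_BLE, 0x4)` both report a taken branch, while `branch_taken(MASK_BE, 0x4)` does not.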
262. anches. When the IS sees a conditional branch instruction, the condition codes are sometimes not yet determined. For example, a conditional branch which immediately follows a compare instruction cannot be allowed to complete execution until the result of the comparison is known. However, the processor begins to execute the branch based upon the branch prediction bit set by the programmer for that branch. When one or more executable instruction groups separate the conditional instruction from the instruction that changed the condition code, the condition code will have already settled in the pipeline by the time the prefetch mechanism sees the conditional instruction. This situation allows the branch to execute in zero clock cycles, as described in Figure A-17. If the conditional instruction and the instruction that sets the condition codes are in the same executable group or in consecutive groups, the condition code is not valid when the IS sees the branch; a guess is required. If the prediction turns out to be correct, the branch executes in its normal amount of time, as described in the previous section. If the prediction is wrong, the pipeline is flushed. Any erroneously started single- or multiple-cycle instructions are killed and the branch executes as if there had been no lookahead or prediction. In other words: branch takes two clocks out of the IS's issue stage if it
263. and Debug Capabilities 1-4
1.2 SYSTEM INTEGRATION 1-4
1.2.1 Pipelined Burst Bus Control Unit 1-4
1.2.2 Flexible DMA Controller 1-4
1.2.3 Priority Interrupt Controller 1-5
1.3 ABOUT THIS MANUAL 1-5
1.4 NOTATION AND ... 1-6
1.4.1 Reserved and Preserved 1-6
1.4.2 Specifying Bit and Signal Values 1-7
1.4.3 Representing Numbers 1-7
1.4.4 Register Names 1-7
2 PROGRAMMING ENVIRONMENT
2.1 OVERVIEW 2-1
2.2 REGISTERS AND LITERALS AS INSTRUCTION OPERANDS 2-1
2.2.1 Global Registers 2-2
2.2.2 Local Registers 2-3
2.2.3 Special Function Registers (SFRs) 2-4
2.2.4 Register Scoreboarding 2-4
2.2.5 ... 2-5
2.2.6 Register and Literal Addressing and Alignment 2-5
2.3 CONTROL REGISTERS 2-6
2.4 ARCHITECTURE-DEFINED DATA STRUCTURES 2-8
2.5 MEMORY ADDRESS SPACE
264. and Table A 15 The number of clocks shown is the additional number of issue clocks consumed for address calculation prior to the load or store being issued to the BCU or DR These instructions can be issued in parallel with a machine type REG instruction To find the result latency of the BCU or DR see the appropriate section earlier in this appendix Table A 14 Load Micro flow Instruction Issue Clocks The following load instructions consume n additional issue clocks for address calculation before initiating a load request to the BCU or DR where n for each addressing mode is as follows disp reg reg reg scale disp IP Mnemonic offset reg disp reg reg scale disp reg scale Id Idob Idib Idos Idis Idl Idt 1 2 4 Idq NOTE offset disp and reg memory addressing modes incur no address calculation overhead A 2 6 3 Bit and Bit Field scanbit spanbit extract and modify are executed as micro flows Table A 16 lists their execution times For these instructions the IS issues n clocks of instructions in place of the single word 1960 Cx processor instruction encoding where n is shown in the table Table A 15 Store Micro flow Instruction Issue Clocks The following store instructions consume n additional issue clocks for address calculation prior to initiating a store request to the BCU or DR where n for each addressing mode is as follows disp reg reg reg scale disp IP Mnemoni
265. and loads the new frame pointer (NFP) in global register g15; switches to the interrupted state; sets the state flag in its internal process controls to interrupted, its execution mode to supervisor and its priority to the priority of the interrupt. Setting the processor's priority to that of the interrupt ensures that lower priority interrupts cannot interrupt the servicing of the current interrupt; clears the trace fault pending and trace enable flags in its internal process controls. Clearing these flags allows the interrupt to be handled without trace faults being raised; sets the frame return status field associated with the PFP in register r0 to 111₂; performs a call operation as described in CHAPTER 5, PROCEDURE CALLS. The address for the called procedure is specified in the interrupt table for the specified interrupt procedure pointer. Once the processor completes the interrupt procedure, it performs the following return actions: copies the arithmetic controls field and the process controls field from the interrupt record into the arithmetic controls register and process controls, respectively. It also returns the trace fault pending flags and trace enable bit to their values before the interrupt occurred; deallocates the current stack frame and interrupt record from the interrupt stack and switches to the local stack or the supervisor stack (the one it was using when it was interrupted); performs a return operatio
266. and stores the result in dst. The binary results from these two instructions are identical. The only difference is that subi can signal an integer overflow. Action: dst ← src2 - src1. Faults: Type: Mismatch: Non-supervisor reference of a sfr. Arithmetic: Integer Overflow: Result too large for destination register (subi only). Result's least significant 32 bits are stored in dst. If overflow occurs and AC.om = 1, the fault is suppressed and AC.of is set to 1. The least significant 32 bits of the result are stored in dst. Example: subi g6, g9, g12 # g12 ← g9 - g6. Opcode: subi 593H REG; subo 592H REG. See Also: addi, addo, subc, addc. 9.3.57 syncf Mnemonic: syncf (Synchronize Faults). Format: syncf. Description: Waits for all faults to be generated that are associated with any prior uncompleted instructions. Action: if AC.nif = 1 wait until no imprecise faults can occur associated with instructions which have begun but are not completed. Faults: Example: ld 2, g6; addi r6, r8, r8; syncf; and g6, 0xFFFF, g8 # the syncf instruction ensures that any faults that may occur during the execution of the ld and addi instructions occur before the and instruction is executed. Opcode: syncf 66FH REG. See Also: mark, fmark. 9.3.58 sysctl (... Processor Only) Mnemonic: sysctl (System Control). Format: sysctl src1, src2, s
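The statement that subi and subo produce identical binary results, with only subi able to signal overflow, can be sketched in C as follows. This is an illustration under assumptions, not the hardware algorithm: the function names mirror the mnemonics but are hypothetical helpers, the overflow flag stands in for the Arithmetic Integer Overflow fault (or AC.of when the fault is masked), and both functions return the raw 32-bit result pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ordinal subtract: dst = src2 - src1, modulo 2^32, never faults. */
uint32_t subo(uint32_t src1, uint32_t src2)
{
    return src2 - src1;
}

/* Integer subtract: same low 32 bits as subo, plus an overflow report.
 * *overflow models the overflow condition described in the Faults text. */
uint32_t subi(int32_t src1, int32_t src2, bool *overflow)
{
    /* Compute in 64 bits so the true result is always representable. */
    int64_t wide = (int64_t)src2 - (int64_t)src1;

    *overflow = (wide > INT32_MAX) || (wide < INT32_MIN);

    /* Conversion to uint32_t keeps the least significant 32 bits. */
    return (uint32_t)wide;
}
```

Returning the bit pattern as `uint32_t` avoids relying on implementation-defined signed narrowing; a caller that wants the signed view can reinterpret the value when no overflow was reported.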
267. and system bus functionality. The processor is designed to minimize the requirements of its external system. The processor requires an input clock (CLKIN) and clean power and ground connections (VSS and VCC). Since the processor can operate at a high frequency, the external system must be designed with considerations to reduce induced noise on signals, power and ground. 14.2 INITIALIZATION Initialization describes the mechanism that the processor uses to establish its initial state and begin instruction execution. Initialization begins when RESET is deasserted. At this time, the processor automatically configures itself with information specified in the IMI and performs its built-in self test. The processor then branches to the first instruction of user code. The objective of the initialization sequence is to provide a complete, working initial state when the first user instruction executes. The user's startup code has only to perform several base functions to place the processor in a configuration for executing application code. 14.2.1 Reset Operation The RESET pin, when asserted (active low), causes the processor to enter the reset state: external signals go to a defined state (Table 14-1), internal logic is initialized and certain registers are set to defined values (Table 14-2). When the RESET pin is deasserted, the processor begins initialization as described later in this ch
268. andling Procedures Fetching the first instructions of an interrupt handling procedure from external memory impacts interrupt latency and throughput. The controller eliminates the fetch time by providing a mechanism to lock procedures or portions of procedures in the processor's instruction cache. Using this cache locking feature, particular interrupt handlers can always be fetched from on-chip instruction cache, eliminating the latency incurred from fetching the handlers from external memory. Paragraphs that follow describe cache locking of interrupt procedures. All, half or none of the instruction cache can be pre-loaded and locked. Typically, one half is used as normal instruction cache and the other half for locking instructions. The i960 CA processor allows only interrupt procedures to be locked in the cache. An improved mechanism on the i960 CF processor has fewer restrictions: any section of code can be locked into the cache, not just interrupt procedures. sysctl provides the mechanism for locking sections of procedures into the cache. Instructions to be locked must first be linked to a contiguous block in external memory. The block must be aligned to a quad word address. Next, sysctl is issued with the configure instruction cache message type. The starting address of the block in memory is specified as an operand of the instruction. The i960 CA processor supports 512 bytes or 1 Kbytes of lock
269. ansfer type is specified when a channel is set up using sdma. Transfer type specifies source/destination request length for operation and whether transfer is performed as a multiple-cycle transfer or as a fly-by (1 bus cycle) transfer. Multi-cycle transfer is performed with two or more bus requests; fly-by transfer with a single bus request. Fly-by and multi-cycle transfers are described in the following sections. 13.4.1 Multi-Cycle Transfers Multi-cycle DMA transfer comprises two or more bus requests. For these multi-cycle transfers, loads from a source address are followed by stores to a destination address. To execute the transfer, DMA microcode issues the proper combination of bus requests. For example, a typical multi-cycle DMA transfer could appear as a single byte load request followed by a single byte store request.

Table 13-1. Transfer Type Options

Source Request Length   Destination Request Length   Transfer Type
Byte (8 bits)           Byte (8 bits)                Multi-Cycle
Byte (8 bits)           Byte (8 bits)                Fly-by
Byte (8 bits)           Short (16 bits)              Multi-Cycle
Byte (8 bits)           Word (32 bits)               Multi-Cycle
Short (16 bits)         Byte (8 bits)                Multi-Cycle
Short (16 bits)         Short (16 bits)              Multi-Cycle
Short (16 bits)         Short (16 bits)              Fly-by
Short (16 bits)         Word (32 bits)               Multi-Cycle
Word (32 bits)          Byte (8 bits)                Multi-Cycle
Word (32 bits)          Short (16 bits)              Multi-Cycle
Word (32 bits)          Word
270. apter. RESET is a level-sensitive, asynchronous input. The RESET pin must be asserted when power is applied to the processor. The processor then stabilizes in the reset state. This power-up reset is referred to as cold reset. To ensure that all internal logic has stabilized in the reset state, a valid input clock (CLKIN) and VCC must be present and stable for a specified time before RESET can be deasserted. The processor may also be cycled through the reset state after execution has started. This is referred to as warm reset. For a warm reset, the RESET pin must be asserted for a minimum number of clock cycles. Specifications for a cold and warm reset can be found in the i960 CA/CF microprocessor data sheets. The reset state cannot be entered under direct control from a program. No reset instruction, or other condition which forces a reset, exists on the i960 Cx processors. The RESET pin must be asserted to enter the reset state. The processor does, however, provide a means to reenter the initialization process. See section 14.3.1, Reinitializing and Relocating Data Structures (pg. 14-11). INITIALIZATION AND SYSTEM REQUIREMENTS

Table 14-1. Pin Reset State

Pins    Reset State       Pins    Reset State
A31:2   Floating          DMA     Floating
D31:0   Floating          SUP     Floating
BE3:0   High (inactive)   FAIL    Low (active)
W/R     Low (read)        DACK3   High (inactive)
ADS     High (inactive)   DACK2   High (inactive)
WAIT    H
271. ar instruction or instruction type in the instruction stream. The processor recognizes seven different trace events: instruction, branch, call, return, prereturn, supervisor, breakpoint. It detects these events only if the TC register mode bit is set for the event. If the PC register trace enable bit is also set, the processor generates a fault when a trace event is detected. A trace fault is generated following the instruction that causes a trace event, or prior to the instruction for the prereturn trace event. The following trace modes are available: Instruction: Generates a trace event following every instruction. Branch: Generates a trace event following any branch instruction when the branch is taken (a branch trace event does not occur on branch-and-link or call instructions). Call: Generates a trace event following any call or branch-and-link instruction, or any implicit procedure call (i.e., fault or interrupt call). Return: Generates a trace event following a ret. Prereturn: Generates a trace event prior to any ret instruction, providing the PFP register prereturn trace flag is set (the processor sets the flag automatically when prereturn tracing is enabled). Supervisor: Generates a trace event following any calls instruction that references a supervisor procedure entry in the system procedure table and on a return from a supervisor procedure where the return status type in the PFP reg
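The two-level gating described above (a TC mode bit enables detection of an event; the PC trace enable bit turns a detected event into a fault) can be sketched in C. This is a minimal model under assumptions: the helper names are hypothetical, and the bit positions chosen for the mode flags are illustrative placeholders, not taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* One flag per trace event; bit positions are assumed for illustration. */
enum trace_mode {
    TRACE_INSTRUCTION = 1u << 1,
    TRACE_BRANCH      = 1u << 2,
    TRACE_CALL        = 1u << 3,
    TRACE_RETURN      = 1u << 4,
    TRACE_PRERETURN   = 1u << 5,
    TRACE_SUPERVISOR  = 1u << 6,
    TRACE_BREAKPOINT  = 1u << 7
};

/* An event is detected only if its mode bit is set in the TC register. */
bool trace_event_detected(uint32_t tc_mode_bits, enum trace_mode event)
{
    return (tc_mode_bits & (uint32_t)event) != 0;
}

/* A fault is generated only if the event is detected AND PC.te is set. */
bool trace_fault_generated(uint32_t tc_mode_bits, bool pc_trace_enable,
                           enum trace_mode event)
{
    return pc_trace_enable && trace_event_detected(tc_mode_bits, event);
}
```

With only the branch mode bit set, a taken branch is detected as an event, but `trace_fault_generated` stays false until the trace enable argument is also true.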
272. ares 96 93 and sets AC cc concmpo g5 43 rf AG ccA T Iy g5 is compared with g3 Opcode: concmpi 5A3H REG; concmpo 5A2H REG. See Also: cmpo, cmpi, cmpdeci, cmpdeco, cmpinci, cmpinco, COMPARE AND BRANCH. 9.3.21 divi, divo Mnemonic: divi (Divide Integer), divo (Divide Ordinal). Format: div* src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr). Description: Divides src2 value by src1 value and stores the result in dst. Remainder is discarded. For divi, an integer overflow fault can be signaled. Action: dst ← quotient (src2 / src1); src2, src1 and dst are 32 bits. Faults: Type: Mismatch: Non-supervisor reference of a sfr. Arithmetic: Zero Divide: The src1 operand is 0. Integer Overflow: Result too large for destination register (divi only). If overflow occurs and AC.om = 1, fault is suppressed and AC.of is set to 1. Result's least significant 32 bits are stored in dst. Example: divo r3, r8, r13 # r13 ← r8 / r3. Opcode: divi 74BH REG; divo 70BH REG. See Also: ediv, mulo, muli, emul. 9.3.22 ediv Mnemonic: ediv (Extended Divide). Format: ediv src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr). Description: Divides src2 by src1 and stores result in dst. The src2 value is a long ordinal (64 bits) contained in two adjacent registers. src2 specifies the lower numbered register which contains
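The divi/divo Action and Faults entries can be modeled with a short C sketch. It is illustrative only: the status enum and function names are hypothetical, the returned status stands in for the Arithmetic faults named above, and C's truncating integer division is assumed to match the quotient the text describes.

```c
#include <stdint.h>

/* Hypothetical status codes standing in for the Arithmetic faults. */
typedef enum {
    DIV_OK,
    DIV_ZERO_DIVIDE,      /* src1 operand is 0          */
    DIV_INTEGER_OVERFLOW  /* divi only: result too large */
} div_status;

/* Divide Ordinal: dst = src2 / src1 on unsigned 32-bit values. */
div_status divo(uint32_t src1, uint32_t src2, uint32_t *dst)
{
    if (src1 == 0)
        return DIV_ZERO_DIVIDE;
    *dst = src2 / src1;   /* remainder is discarded */
    return DIV_OK;
}

/* Divide Integer: dst = src2 / src1 on signed 32-bit values. */
div_status divi(int32_t src1, int32_t src2, int32_t *dst)
{
    if (src1 == 0)
        return DIV_ZERO_DIVIDE;
    if (src2 == INT32_MIN && src1 == -1) {
        /* True quotient is 2^31; its least significant 32 bits
         * are 0x80000000, which is INT32_MIN as a signed value. */
        *dst = INT32_MIN;
        return DIV_INTEGER_OVERFLOW;
    }
    *dst = src2 / src1;   /* remainder is discarded */
    return DIV_OK;
}
```

Mirroring the manual's example, `divo(3, 8, &q)` corresponds to `divo r3, r8, r13` with r3 = 3 and r8 = 8, and leaves 2 in `q`.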
273. ata Address 0 Breakpoint Enable 0 Reserved 00 disable Initialize to 0 11 enable DABO Mode See Note Data Address 1 Breakpoint Enable BPCON e1 00 disable 11 enable Note DAB1 Mode See Note Data Address Breakpoint DABO DAB1 Modes Break on 00 store only CA026A 01 data only load or store 10 data or instruction fetch 11 any access Figure F 14 Hardware Breakpoint Control Register BPCON Section 8 2 7 Breakpoint Trace pg 8 5 Instruction Address Breakpoint Enable IPB e 00 disable 11 enable Instruction Address F 024 Figure F 15 Instruction Address Breakpoint Registers IPBO IPB1 Section 8 2 7 Breakpoint Trace pg 8 5 F 14 intel REGISTER AND DATA STRUCTURES Interrupt Mode ICON im 00 dedicated 01 expanded 10 mixed 11 reserved Errata 12 06 94 SRB Signal Detection Mode ICON sdm Vector Cache Enable 0 level low activated bits ICON vce 1 falling edge activated incorrectly defined Global Interrupts Enable ICON gie Bit 0 was debounce 0 enabled it now is correctly 1 disabled defined as Fetch From Mask Operation ICON mo External Memory 00 move to R3 mask unchanged Bit 1 was Fast is now 01 move to R3 and clear for dedicated mode interrupts correctly defined as 10 move to R3 and clear for expanded mode interrupts Fetch From Internal 11 move to R3
274. ate 14 3 Register Values After Reset 14 3 19609 Cx Processor Input Pins 14 29 BCU Instructions for the i960 CF Processor A 12 Machine Type Sequences Which Can Be Issued In Parallel A 16 Scoreboarded Register Conditions A 18 Scoreboarded Resource Conditions A 19 Scoreboarded Resource Conditions Due to the Data Cache A 20 EU Instructions A 21 MDU Instructions A 23 Data RAM Instructions A 24 AGU Instructions A 25 BCU Instructions for the i960 CA Processor A 27 CTRL Instructions A 29 Cache Configuration Modes A 34 Fetch Strategy A 34 Store Micro flow Instruction Issue Clocks A 39 Load Micro flow Instruction Issue Clocks A 39 Bit and Bit Field Micro flow Instructions A 40 bx and balx Performance A 40 callx Performance A 41 sysctl Performance A 43 Creative Uses for the Ida Instruction A 49 Code Optimization Summary A 57 Encoding of SRC DST Field in REG Format D 2 Addressing Modes for MEM Format Instructions D 4 Encoding of Scale Field D 5 Miscellaneous Instruction Encoding Bits E 1 REG Format Instruction Encodings E 2 COBR Format Instruction Encodings E 4 CTRL Format Instruction Encodings E 5 MEM Format Instruction Encodings E 6 xxiii intel INTRODUCTION intel CHAPTER 1 INTRODUCTION The i960 CA and CF superscalar microprocessors represent Intel s commitment to provide a spectrum of reliable cost effective high performance processors that satisfy the requirements of today s innovative microprocessor based pr
275. ation DRAM address generation logic speeds burst accesses for static column mode and fast page mode DRAM This is accomplished by reducing the time required to present the consecutive column addresses during a burst access If the address generator is not present the address valid delay time consists of the worst case address valid delay time plus the worst case propagation delay through the input address multiplexer DRAM address generation logic must control the DRAM address two least significant bits During the initial DRAM access address generation logic acts like a multiplexer During column accesses within a burst address generation logic generates consecutive addresses Therefore DRAM address generation logic is designed to function as a multiplexer and an address generator If an address generator is used address valid delay time is equal to address generation time Address generation delay time consists of the clock to feedback and feedback to output delays for the selected device Figure B 18 illustrates the requirements for address generation logic Signals into the DRAM logic are ADR2 ADR3 ADR12 ADR13 WAIT and BLAST from the processor and COL ADR from the DRAM controller logic COL indicates if DRAM controller is requesting the row address COL ADR not asserted or column address _ ADR asserted Signals output from DRAM address generation logic are the DRAM address two least significant bits DRAM_ADR2 3
276. bal, local or special function register as determined by special purpose bit S2. Complete encodings of these fields is shown in Table E-1, Miscellaneous Instruction Encoding Bits. The T bit supports 80960Cx processors' branch prediction for conditional instructions. If T is set to 0, the condition being tested is likely to be true; if set to 1, the condition is likely to be false. The displacement field contains a signed two's complement number that specifies word displacement. The processor uses this value to compute the address of a target instruction to which the processor goes as a result of a comparison. The displacement field's value can range from -2^10 to 2^10 - 1. To determine the target instruction's IP, the processor converts the displacement value to a byte displacement (i.e., multiplies the value by 4). It then adds the resulting byte displacement to the IP of the current instruction. For the test-if instructions, only the src1 field is used. Here this field specifies a destination global or local register is ignored. D.4 CTRL FORMAT The CTRL format is used for instructions that branch to a new IP, including the branch, branch-if, bal and call instructions. ret also uses this format. The CTRL opcode field is eight bits (two hexadecimal digits). A branch target address is specified with the displacement field in the same manner as COBR format instructions. The displacement field specifies a word displacement as a signed two
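The target computation described for COBR format instructions (word displacement multiplied by 4, then added to the IP of the current instruction) can be sketched in C. This is a sketch under assumptions: `cobr_target_ip` is a hypothetical helper, and the displacement range is taken to be -2^10 through 2^10 - 1 words, which is one reading of the range stated in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed COBR word-displacement limits: -2^10 .. 2^10 - 1. */
#define COBR_DISP_MIN (-1024)
#define COBR_DISP_MAX (1023)

/* Hypothetical helper: computes the branch target for a COBR instruction.
 * Returns false if the word displacement does not fit the field. */
bool cobr_target_ip(uint32_t ip, int32_t word_disp, uint32_t *target)
{
    if (word_disp < COBR_DISP_MIN || word_disp > COBR_DISP_MAX)
        return false;

    /* Convert words to bytes, then add to the current instruction's IP.
     * Unsigned arithmetic wraps, which handles negative displacements. */
    int32_t byte_disp = word_disp * 4;
    *target = ip + (uint32_t)byte_disp;
    return true;
}
```

A word displacement of -1 from IP 0x1000 therefore yields 0x0FFC, the instruction word immediately preceding the branch.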
277. be completely overlapped with the execution of other instructions. Store instruction execution would proceed as does the load, except that there would be no return clock and no instructions could be stalled due to a scoreboarded register. Table A-10 lists instructions that the i960 CA processor's BCU executes directly. For each instruction that requires multiple reads, such as ldq, the BCU buffers the return data until all data is returned. This optimization reduces the internal load bus overhead to the minimum, giving more clocks to the processor to access the DR and perform lda operations while external loads are in progress. The table is valid when offset, displacement or indirect memory addressing modes are used over an external bus with the following characteristics: Nxap Nxpp Nxpa 0; Burst On; Pipelining On; Ready Disabled. For other addressing modes, see section A.2.6, Micro-flow Execution (pg. A-36). INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION If instructions listed in the table are issued back-to-back with no register dependencies, they will execute at a rate of one instruction per clock until the BCU queues are full. Once the queues are full, further back-to-back BCU instructions execute at the bus bandwidth. Figure A-13 shows back-to-back loads being executed. Table A-10. BCU Instructions for the i960 CA Processor Mnemonic Issue Clocks Result Latency Clocks Se Throughput ld
278. bled PC em lt supervisor Trace enable bit of the supervisor PC te lt temp FP T stack pointer is written to PC te intel INSTRUCTION SET REFERENCE Action These accesses are cached in the local register cache memory FP lt r0 15 lt lt temp rt lt temp FP SP lt temp FP 64 IP lt temp entry andnot 0x3 Faults Trace Instruction Call Supervisor Breakpoint Instruction Call and Supervisor Trace Events are signaled after instruction completion Trace fault is generated if PC te 1 and TC i TC c or TC s 1 Operation _ Unimplemented Execution from on chip data RAM Type Mismatch Non supervisor reference of a sfr Protection Length Specifies a system procedure number greater than 259 Example calls r12 IP lt value obtained from procedure table for procedure number given in r12 Opcode calls 660H REG See Also bal call callx 9 23 INSTRUCTION REFERENCE intel 9 3 13 Format Description Action Faults 9 24 callx callx Call Extended callx targ mem Calls new procedure targ specifies IP of called procedure s first instruction In executing callx the processor performs a local call as described in section 5 4 LOCAL CALLS pg 5 12 As part of this operation the processor allocates a new set of local registers and a new stack frame for the called procedure Processor then goes to the instruct
279. bus controller operation. The description of the bus modes and burst bus operation is simplified by defining these terms. 11.1.1.1 Request The terms request, bus request or memory request describe interaction between the core and bus controller. The bus controller is designed to decouple, as much as possible, bus activity from instruction execution in the core. When a load or store instruction or instruction prefetch is issued, the core delivers a bus request to the bus controller unit. The bus controller unit independently processes the request and retrieves data from memory for load instructions and instruction prefetches. The bus controller delivers data to memory for store instructions. The i960 architecture defines byte, short word, word, double word, triple word and quad word data lengths for load and store instructions. When a load or store instruction is encountered, the core issues to the bus controller a bus request of the appropriate data length: for example, ldq requests that four words of data be retrieved from memory; stob requests that a single byte is delivered to memory. The processor fetches instructions using double or quad word bus requests. Its microcode issues load and store requests to perform DMA transfers. 11.1.1.2 Access The terms access, bus access or memory access describe the mechanism for moving data or instructions between the bus controller and memory. An access is bou
280. by the symbol Nyggg are provided in the two boldface columns These columns show transfer clocks for the DMA throttle bit set to 1 1 and 4 1 configuration Transfer clocks are given in pairs separated by a the number on the left is the value for source synchro nized demand mode transfers the number on the right is the value for destination synchronized demand mode transfers 13 39 DMA CONTROLLER intel Table 13 5 also shows the number of bytes per transfer This is the number of bytes which are transferred in Nxpgg clock cycles Bytes per transfer is denoted by the symbol throughput is calculated as shown in Equation 13 3 where NXFER BREQ BXFER B _REQ Equation 13 3 number of PCLK2 1 cycles per transfer number of bytes transferred per DMA request number of bytes per DMA transfer Table 13 5 DMA Transfer Clocks Nyrgn Transfer Clocks in PCLK2 1 cycles Source Sync Destination Sync Transfer Type Bytes per Throttle 4 1 Throttle 1 1 source to destination Transfer DMA User User data length Process Process NxrER Process 8 to 8 Multi Cycle 1 4 4 6 6 10 10 7 7 11 11 8 to 16 Multi Cycle 2 11 11 10 11 21 22 18 19 29 30 8 to 32 Multi Cycle 4 23 25 16 15 39 40 30 29 53 54 16 to 8 Multi Cycle 2 10 10 8 8 18 18 14 13 24 23 16 to 16 Multi
281. c ret intx ldconst I 40 call CO 14 19 INITIALIZATION AND SYSTEM REQUIREMENTS intel Example 14 3 High Level Startup Code initmain c unsigned componentid 0 main system or board specific code goes here this code is called by init s system or board specific output routine goes here Example 14 4 Control Table ctltbl c Sheet 1 of 2 n iei hei Rag e pn Pr PR or E m etitblec et ey ieee I include init h typedef struct unsigned control_reg 28 CONTROL_TABLE const CONTROL_TABLE boot_control_table Group 0 Breakpoint Registers 0 0 0 0 Group 1 Interrupt Registers 0 0 0 Interrupt Map Regs set by code as needed Oxc3bc ICON dedicated mode enabled sdm 0 falling edge actived sdm 1 falling edge actived sdm 2 falling edge actived sdm 3 falling edge actived sdm 4 level low activated sdm 5 falling edge actived sdm 6 falling edge actived sdm 7 falling edge actived mask unchanged not cached fast DMA suspended X 14 20 intel INITIALIZATION AND SYSTEM REQUIREMENTS Example 14 4 Control Table ctltbl c
282. c cmpdeci scanbyte chkbit addc subc mov eshro movl movt movq sdma udma spanbit scanbit modac modify extract modtc modpc sysctl calls mark fmark flushreg syncf emul ediv mulo remo divo muli remi modi divi ST Bly 8 3 3 or 9 o c o BE 5 z go B 24 23 19 18 14 13 12 11 10 6 5 4 4 0 0101 1010 dst src2 M3 M2 M1 0111 S2 S1 src1 0101 1010 src2 M3 M2 M1 1100 2 1 src1 0101 1010 src M3 M2 M1 1110 2 S1 bitpos 0101 1011 dst src2 M3 M2 M1 0000 52 61 Src1 0101 1011 dst src2 M3 M2 M1 0010 2 S1 src1 0101 1100 dst M3 M2 M1 1100 S2 S1 src 0101 1101 dst src2 M3 M2 M1 1000 2 1 src1 0101 1101 dst M3 M2 1 1100 52 S1 src 0101 1110 dst M3 M2 M1 1100 52 51 src 0101 1111 dst M3 M2 M1 1100 2 1 src 0110 0011 src3 src2 M3 M2 M1 0000 S2 S1 Src1 0110 0011 0001 0110 0100 dst M3 M2 M1 0000 S2 S1 SIC 0110 0100 dst M3 M2 M1 0001 S2 51 SIC 0110 0100 mask src M3 M2 M1 0101 2 61 dst 01100101 src dst src M3 M2 M1 0000 52 61 mask 01100101 src dst len M3 M2 M1 0001 52 61 bitpos 0110 0101 mask src M3 M2 M1 0100 52 61 dst 01100101 src dst mask M3 M2 M1 0101 52 61 src 0110 0101 src3 src2 M3 M2 M1 1001 52 61 src1 0110 0110 M3 M2 M1 0000 S2 61 src 0110 0110 M3 M2 1 1011 52 S1 0110 0110 M3 M2 M1 1100 2 1 0110 0
283. c

               Clocks   Latency   Throughput    Throughput
                                  (AC.om = 1)   (AC.om = 0)
  muli 32x32   1        5         4             5
       16x32   1        3         2             3
  mulo 32x32   1        5         4             4
       16x32   1        3         2             3
  emul 32x32   1        6         5             6
       16x32   1        3         2             3
  divi         13 37 36 36
  divo         3 36 35 35
  ediv         36 35
  remi remo modi

A.2.4.3 Data RAM (DR)

On-chip data RAM (DR), described in section 2.5.4, Internal Data RAM (pg. 2-12), is single-ported and 128 bits wide to support accesses of up to one quad load or quad store per clock. The DR receives instructions over the MEM machine bus, stores addresses over the 32-bit Address Out bus, and stores data over the 128-bit store bus. The DR returns data over the 128-bit load bus.

The one-clock DR pipeline for reads is shown in Figure A-11. When the IS issues a load from the DR, load data is written to the destination register in the following clock. An instruction that immediately follows a load from the DR and references the load destination cannot execute in the same clock as the load. As shown in the figure, the instruction is issued in the clock in which the load data is returning.

Table A-8 lists the instructions executed directly in most addressing modes, without micro-flow execution, using the DR. As seen in Figure A-11, if these instructions are issued back to back, they execute at a one-clock sustained rate, with or without register dependencies. addo 416 40 40
284. c offset reg disp reg reg scale disp reg scale st stob stib stos stis stl stt stq

NOTE: offset, disp and (reg) memory addressing modes incur no address calculation overhead. 1 2 4

Table A-16. Bit and Bit Field Micro-flow Instructions

  Mnemonic   Execution Clocks
  scanbit    1
  spanbit    2
  extract    4
  modify     3

A.2.6.4 Comparison

test instructions are executed as micro-flows. Execution time depends upon condition code validity and prediction bit settings. When condition codes are valid or the prediction bit is set correctly, a test instruction takes one issue clock if its correct result is a 1 and two issue clocks if its correct result is a 0. Otherwise, the test instruction takes three issue clocks to execute.

A.2.6.5 Branch

Compare-and-branch, extended branch, branch-and-link and extended branch-and-link instructions are implemented with micro-flows. cmpib and cmpob instructions take one issue clock if the prediction was correct and two issue clocks if the prediction was incorrect, assuming a cached branch target. bal takes two issue clocks to execute, assuming a cache hit. bx and balx are summarized in Table A-17. The number of clocks shown is the total number of issue clocks consumed by the instruction prior to the code at the branch target being issued. Times shown assume instruction cache hits and
285. case interrupt latency value does not account for interaction of faults and interrupts. It is assumed that faults are not signaled in a stable system.

Because of the processor's instruction mix and the nature of the on-chip register cache, typical interrupt latency is derived assuming that the interrupt occurs under the following constraints, in addition to those listed above:

- Interrupts a single-cycle RISC instruction
- Frame flush does not occur
- Bus queue is empty

The value for typical interrupt latency is:

  N_L_int (typical) = 30 PCLK2:1 cycles        (Equation 12-2)

[Figure 12-9. Calculation of Worst-Case Interrupt Latency, N_L_int. Flowchart beginning at "Start Here" with the decision nodes "DIVO Instruction With Destination to Local Register?", "Software Interrupts Used?" and "CALLS Instruction Used?".]

12.3.11 Optimizing Interrupt Performance

The i960 Cx processor has several features aimed at reducing the time required to respond to and service interrupts. The following section describes three methods for reducing interrupt latency:

- caching interrupt vectors on chip
- DMA suspension while servicing interrupts
- caching of interrupt handling procedure code

Figure 12-9 shows that controll
286. causes the following actions to occur:

- The core performs an atomic write to the interrupt table and sets the bits in the pending interrupts and pending priorities fields that correspond to the requested interrupt. This action posts the software-requested interrupt.
- The core updates the software priority register with the value of the highest pending priority from the interrupt table. This may be the priority of the interrupt that was just posted. This action causes the interrupt to be serviced if its priority is greater than the current process priority or equal to 31.

Requesting an interrupt with a priority of 0 causes the interrupt table to be checked for posted interrupts. See section 6.5, REQUESTING INTERRUPTS (pg. 6-6), for information about software-requested interrupts.

4.3.2.2 Invalidate Instruction Cache

Executing sysctl with a message type of 01H invalidates all cache entries. This mode clears all valid cache bits. After the operation, the cache is updated normally as misses occur. The mode is provided to allow a program to load or modify program space; it ensures that instructions are fetched from the modified space and not from the cache.

4.3.2.3 Configure Instruction Cache

The i960 CA processor contains an instruction cache that supports pre-load and lock of either none, half or all of the instruction cache. However, only interrupt procedures can be locked into the cache. The i960 CF processor cache locking scheme has
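The servicing rule stated in the list above (a posted interrupt is serviced if its priority is greater than the current process priority or equal to 31) can be sketched in C. This is only a model of that one sentence; the function name `interrupt_is_serviced` and the constant `PRIORITY_MAX` are hypothetical and are not part of the processor's programming interface.

```c
#include <assert.h>
#include <stdbool.h>

#define PRIORITY_MAX 31u  /* highest priority named in the text */

/* Returns true when a posted interrupt would be serviced, per the rule
 * "priority greater than the current process priority or equal to 31".
 * Both arguments are assumed to lie in the range 0..31. */
bool interrupt_is_serviced(unsigned posted_priority, unsigned process_priority)
{
    assert(posted_priority <= PRIORITY_MAX);
    assert(process_priority <= PRIORITY_MAX);
    return posted_priority == PRIORITY_MAX ||
           posted_priority > process_priority;
}
```

Under this model a priority-0 request is never serviced by itself, which is consistent with the text's statement that such a request only causes the interrupt table to be checked.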
287. ccesses always begin on an even byte boundary 0 0 (see Figure 11-9). For an 8-bit bus, data is transferred on data pins D7:0. Data is also driven on the upper bytes of the data bus: D15:8, D23:16 and D31:24.

[Figure 11-8. 16-Bit-Wide Data Bus Bursts (4 short-word burst; 2 short-word burst, 2 short-word burst).]

[Figure 11-9. 8-Bit-Wide Data Bus Bursts (4 byte burst; 2 byte burst, 2 byte burst).]

Figure 11-10 shows a quad-word read on a 32-bit bus; Figure 11-11 shows a write. A burst access begins by asserting the proper address and status signals (ADS, A31:2, BE3:0, SUP, D/C, DMA, W/R). This is done on the rising edge that begins the address cycle in the figures. A word read asserts all byte enable signals (BE3:0). ADS assertion indicates the beginning of the access.

DT/R is driven on the clock's next falling edge to ensure that DT/R does not change while DEN is asserted. DEN is asserted on the clock's next rising edge, the rising edge that ends the address cycle. ADS is deasserted on this clock edge. DEN is used to control external data transceivers. DEN and DT/R remain asserted throughout the burst access.

The cycles that follow the address cycle are wait states. WAIT is asserted while the internal wait state generator is counting. If READY/BTERM control is enab
288. ccurs. If the prediction bit is incorrect and no fault occurs, the instructions require two issue clocks. The time it takes to enter a fault handler varies greatly depending upon the state of the processor's parallel processing units.

A.2.6.8 Debug

mark and fmark are implemented with micro-flows. mark takes one issue clock if no trace fault is signaled. If a trace fault is signaled or fmark is executed, the processor performs an implicit call to the trace fault handler. As with conditional faults, the time required to enter a fault handler varies greatly.

A.2.6.9 Atomic

Atomic instructions are implemented with micro-flows. atadd takes seven issue clocks and atmod takes eight issue clocks to execute with an idle bus in a zero wait state system. Memory wait states directly affect execution speed.

A.2.6.10 Processor Management

Processor management instructions implemented as micro-flows include modpc, modac, modtc, syncf, flushreg, sdma, udma and sysctl.

modpc requires 17 clocks to execute if process priority is changed and 12 clocks if process priority is not changed. modac requires 9 clocks. modtc requires 15 clocks.

syncf takes 4 issue clocks if there are no possible outstanding faults. Otherwise, the instruction locks the IS until it is certain that no prior instruction will fault.

flushreg requires 24 clocks for each frame that is flushed. This translates to 120 cycles to flush five frames. Wait states in the memory being writt
289. ce while maintaining controller independence with respect to memory speed and processor clock frequency. One of the four on-chip integrated DMA channels is used for DRAM refresh. The region table, DMA and the i960 Cx processor bus signals are used to develop a transparent DRAM controller that does not require any information about the memory subsystem.

Figure B-17 shows the DRAM system design. The DRAM is configured as a single byte-accessible, 32-bit-wide bank. RAS is common to the entire bank. CAS3:0 serve as byte selects within the bank. WE is common to all the DRAM. The byte-accessible bank can be built from four 8-bit-wide DRAM modules, eight 4-bit-wide DRAM modules, eight 4-bit-wide DRAM chips or 32 1-bit-wide DRAM devices.

[Figure B-17. DRAM System with DMA Refresh (block diagram: address MUX with column addresses, x8 DRAM, refresh request, DRAM control).]

Control logic is divided into three logical blocks: DRAM control logic, DRAM address generation logic and refresh request timer logic. DRAM control logic is the main controller. It controls the address multiplexer and all DRAM control lines during normal and refresh accesses. Address generation logic serves as a multiplexer and an address generator. The refresh request timer logic generates the periodic refresh request to the DMA unit.

B.3.7 DRAM Address Gener
290. cedure. This discussion is provided for those readers who wish to know the details of the fault handling mechanism.

7.8.1 Local Fault Call

When the selected fault handler entry in the fault table is an entry type 000 (local procedure), the processor operates as described in section 5.2.3.1, Call Operation (pg. 5-5), with the following exceptions:

- A new frame is created on the stack that the processor is currently using. The stack can be the user stack, supervisor stack or interrupt stack.
- The fault record is copied into the area allocated for it in the stack, beginning at NFP-1. See Figure 7-4.
- The processor gets the IP for the first instruction in the called fault handling procedure from the fault table.
- The processor stores the fault return code (001 binary) in the PFP return type field.

If the fault handling procedure is not able to perform a recovery action, it performs one of the actions described in section 7.7.2, Program Resumption Following a Fault (pg. 7-12).

If the handler action results in recovery from the fault, a ret instruction in the fault handling procedure allows processor control to return to the program that was pending when the fault occurred. Upon return, the processor performs the action described in section 5.2.3.2, Return Operation (pg. 5-6), except that the arithmetic controls field from the fault record is copied into the AC register. Since the call made is local t
291. cessor clocks to processes are specified in section 13.11.10, Performance (pg. 13-36).

13.11.8 Bus Controller Unit

The bus controller unit (BCU) accesses memory and devices which are the source and destination of a transfer. When the DMA process is active, DMA microcode issues load or store requests to the bus controller to perform DMA data transfers.

The DMA and user processes equally share access to the bus on a request-by-request basis. If both processes attempt to flood the bus controller with memory requests, the bus is shared equally; this prevents lockout of either process. If either process does not require the bus, the bus controller resource may be used entirely by the other process.

The BCU contains a queue which accepts up to three pending requests for bus transactions (Figure 13-16). When a DMA channel is set up, the queue is divided such that one slot is dedicated to DMA process requests and two slots are dedicated to user process requests. DMA and core entries are arranged in such a way that, when both a user slot and a DMA slot are filled, bus request servicing alternates between requests issued by the user and DMA processes.

13.11.9 DMA Controller Logic

DMA controller logic manages the execution of DMA operations independently from the core. It:

- Synchronizes DMA transfers with external request/acknowledge signals
- Provides the program interface to set up each of the four DMA channels
- Provides the program interface to moni
292. cessor scoreboards all stores when cacheable loads are present in the BCU queue. Consider the following case:

  ld   xyz, r0     # load from address xyz misses the data cache
  st   r4, xyz     # store is issued to the same address

The load instruction misses the data cache and is then issued to the bus control unit. It can take several clocks before data is actually written to r0 and the data cache. If the store were issued before the load returns data, an inconsistency would result: external memory would receive correct data from the store, but the data cache would contain incorrect data from the load. The processor prevents this inconsistency by stalling the store until the load returns data.

Since typical programs are not rich in store instructions, the policy of scoreboarding stores on outstanding cacheable loads decreases overall processor performance by less than one percent.

A.1.8.9 Operation and Data Coherency

The policy of scoreboarding stores on cacheable loads does not apply to stores that the DMA controller generates. A DMA store is issued to the BCU regardless of any cacheable loads in the bus request queues.

DMA processes and user processes share core resources. The core alternates CPU cycles between the DMA processes and the user processes. If a DMA store waited for all cacheable loads to complete, large numbers of sequential cacheable loads from a user process
293. channel. Channel priority can be programmed in one of two modes: fixed priority or rotating priority mode. The mode is selected with the priority mode bit in the DMAC register.

When fixed mode is selected, each channel has a set priority. Channel 0 has the highest priority, followed by Channels 1, 2 and 3. Channel 3 has the lowest priority. In this mode, low-priority DMAs assigned to Channels 1-3 can be locked out while a time-critical DMA assigned to Channel 0 receives all of the DMA controller's attention.

When rotating priority is selected, a channel's priority depends on the last channel serviced (see Table 13-4). After a channel is serviced, the priority of that channel is automatically changed to the lowest channel priority. The priority of the remaining enabled channels is increased, with a new channel becoming the highest priority. Rotating mode ensures that no single channel is locked out for an extended period of time.

Table 13-4. Rotating Channel Priority

  Last Channel   Priority
  Serviced       Lowest             Highest
  0              0      3      2      1
  1              1      0      3      2
  2              2      1      0      3
  3              3      2      1      0

Rotating priority is useful for producing a uniform latency for every DMA channel. When rotating mode is selected, the maximum latency for a single channel is the total of all latencies associated with all enabled channels. When fixed mode is enabled, latency for any channel is dependent on the activity of all channels of higher priority.

13.10 CHANNEL SETUP, STATUS AND CONTROL
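The rotating priority rule of Table 13-4 can be modeled with a short C sketch. It assumes the table columns run from lowest to highest priority, which matches the statement that the channel just serviced drops to the lowest priority. The names `rotating_priority_order`, `next_channel` and `enabled_mask` are hypothetical helpers for illustration, not registers or routines of the DMA controller.

```c
#include <assert.h>

#define DMA_CHANNELS 4u

/* Fill order[] from lowest (order[0]) to highest (order[3]) priority,
 * given the channel that was serviced last (0..3), as in Table 13-4. */
void rotating_priority_order(unsigned last_serviced,
                             unsigned order[DMA_CHANNELS])
{
    assert(last_serviced < DMA_CHANNELS);
    for (unsigned i = 0; i < DMA_CHANNELS; i++) {
        /* Slot 0 is the channel just serviced; each higher slot holds the
         * next lower-numbered channel, wrapping from 0 back to 3. */
        order[i] = (last_serviced + DMA_CHANNELS - i) % DMA_CHANNELS;
    }
}

/* Return the highest-priority channel whose bit is set in enabled_mask
 * (bit n = channel n), or -1 when no channel is enabled. */
int next_channel(unsigned last_serviced, unsigned enabled_mask)
{
    unsigned order[DMA_CHANNELS];
    rotating_priority_order(last_serviced, order);
    for (int i = (int)DMA_CHANNELS - 1; i >= 0; i--) {
        if (enabled_mask & (1u << order[i]))
            return (int)order[i];
    }
    return -1;
}
```

For example, with all four channels enabled and Channel 0 just serviced, `next_channel(0, 0xF)` selects Channel 1, the rightmost entry in the first row of the table.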
294. chnique carefully and only in situations where the fault handling procedure is closely coupled with the application program. Also, a return of this type can only be performed if the processor is in supervisor mode prior to the return.

7.7.5 Fault Controls

Certain fault types and subtypes employ mask bits or flags that determine whether or not a fault is generated when a fault condition occurs. Table 7-2 summarizes these flags and masks, the data structures in which they are located, the fault subtypes they affect, and where more information about each can be found.

The integer overflow mask bit inhibits the generation of integer overflow faults. The use of this mask is discussed in section 7.10, FAULT REFERENCE (pg. 7-20).

The no imprecise faults (NIF) bit controls the synchronizing of faults for a category of faults called imprecise faults. The function of this bit is described in section 7.9, PRECISE AND IMPRECISE FAULTS (pg. 7-17).

The TC register trace mode bits and the PC register trace enable bit support trace faults. Trace mode bits enable trace modes; the trace enable bit enables trace fault generation. The use of these bits is described in the trace faults description in section 7.10, FAULT REFERENCE (pg. 7-20). Further discussion of these flags is provided in CHAPTER 8, TRACING AND DEBUGGING.

The unaligned fault mask bit is located in the process control block (PRCB), which is read during initialization. It controls whether unaligned memor
295. cl

Action:       if (src1 = 0) Arithmetic Zero Divide fault;
              dst <- src2 - ((src2 / src1) * src1);
              if ((src2 * src1 < 0) and (dst != 0)) dst <- dst + src1;
              # src1, src2 and dst are 32 bits
Faults:       Type Mismatch             Non-supervisor reference of a sfr.
              Arithmetic Zero Divide    The src1 operand is 0.
Example:      modi r9, r2, r5           # r5 <- modulo (r2/r9)
Opcode:       modi    749H
See Also:     divi, divo, remi, remo

9.3.34 modify

Mnemonic:     modify    Modify
Format:       modify    mask,          src,           src/dst
                        reg/lit/sfr    reg/lit/sfr    reg
Description:  Modifies selected bits in src/dst with bits from src. The mask operand selects the bits to be modified: only bits set in the mask operand are modified in src/dst.
Action:       src/dst <- (src and mask) or (src/dst andnot mask);
Faults:       Type Mismatch             Non-supervisor reference of a sfr.
Example:      modify g8, g10, r4        # r4 <- g10 masked by g8
Opcode:       modify    650H    REG
See Also:     alterbit, extract

9.3.35 modpc

Mnemonic:     modpc     Modify Process Controls
Format:       modpc     src,           mask,          src/dst
                        reg/lit/sfr    reg/lit/sfr    reg
Description:  Reads and modifies the PC register as specified with mask and src/dst. The src/dst operand contains the value to be placed in the PC register; the mask operand specifies bits that may be changed. Only bits set in the mask are modified. Once the PC register is changed, its initial value is copied into src/dst. The src opera
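The Action descriptions for modify and modi above can be expressed as a minimal C sketch. It models only the arithmetic; fault signaling is not modeled, and a zero divisor is treated here as a precondition rather than an Arithmetic Zero Divide fault. The function names `i960_modify` and `i960_modi` are hypothetical, and the modi formula follows the Action text as reconstructed from the garbled source, so it should be checked against the original manual.

```c
#include <assert.h>
#include <stdint.h>

/* modify: src/dst <- (src and mask) or (src/dst andnot mask).
 * Only bits set in mask are taken from src; all others are kept. */
uint32_t i960_modify(uint32_t mask, uint32_t src, uint32_t src_dst)
{
    return (src & mask) | (src_dst & ~mask);
}

/* modi: dst <- src2 - (src2 / src1) * src1, then dst is adjusted by src1
 * when the operands have opposite signs and the remainder is non-zero,
 * so a non-zero result takes the sign of the divisor src1. */
int32_t i960_modi(int32_t src1, int32_t src2)
{
    assert(src1 != 0);          /* the processor would signal a fault here */
    if (src1 == -1)
        return 0;               /* avoids INT32_MIN % -1, undefined in C */
    int32_t dst = src2 % src1;  /* C99 '%' truncates toward zero */
    if (dst != 0 && ((src2 < 0) != (src1 < 0)))
        dst += src1;
    return dst;
}
```

With this model, `i960_modi(3, -7)` yields 2 where a plain C remainder would yield -1, which is the difference between a modulo and a remainder operation.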
296. cmpibge cmpibo cmpibno 9 31 nor 9 56 cmpinc 9 30 not notand 9 57 cmpinci cmpinco 9 30 notbit 9 58 cmpobe cmpobne 9 31 notor 9 59 cmpobg cmpobge 9 31 or ornot 9 60 cmpobl cmpoble 9 31 remi remo 9 61 concmp 9 34 ret 9 62 concmpi concmpo 9 34 rotate 9 64 div 9 35 scanbit 9 65 divi divo 9 35 scanbyte 9 66 Index 7 INDEX instruction set continued sdma 9 67 setbit 9 68 sh 9 69 shli shri shrdi 9 69 shlo shro 9 69 spanbit 9 72 st 9 73 st stl stt stq 9 73 stib stis 9 73 stob stos 9 73 sub 9 76 subc 9 75 subi subo 9 76 sysctl 9 78 synct 9 77 test 9 81 teste testne testl testle 9 81 testg testge 9 81 testo testno 9 81 udma 9 83 xnor xor 9 84 instruction set functional groups 4 4 Instruction Trace Event 9 6 instructions parallel execution 1 1 parallel issue A 14 parallel processing 14 scoreboarding A 17 instruction trace mode 8 4 integers 3 2 data truncation 3 2 sign extension 3 2 Index 8 m intel internal data RAM 2 12 10 13 local register cache 2 12 location 2 12 modification 2 13 reserved areas 2 12 write protection 2 13 interrupt controller 12 1 configuration 12 16 interrupt pins 12 9 overview 1 5 12 1 program interface 12 1 programmer interface 12 11 setup 12 16 interrupt handling procedures 6 10 AC and PC registers 6 10 address space 6 11 global registers 6 11 instruction cache 6 11 interrupt stack 6 10 local registers 6 10 location 6 10
297. compare ordinal and branch if equal
  cmpobne{.t|.f}    compare ordinal and branch if not equal
  cmpobl{.t|.f}     compare ordinal and branch if less
  cmpoble{.t|.f}    compare ordinal and branch if less or equal
  cmpobg{.t|.f}     compare ordinal and branch if greater
  cmpobge{.t|.f}    compare ordinal and branch if greater or equal
  bbs{.t|.f}        check bit and branch if set
  bbc{.t|.f}        check bit and branch if clear

These instructions use the COBR machine instruction format and can specify literals and local, global and special function registers as operands.

With compare ordinal and branch and compare integer and branch instructions, two operands are compared and the condition code bits are set as described in section 4.2.6, Comparison (pg. 4-12). A conditional branch is then executed as with the conditional branch (BRANCH IF) instructions.

With check bit and branch instructions, one operand specifies a bit to be checked in the other operand. The condition code flags are set according to the state of the specified bit: 010 (true) if the bit is set and 000 (false) if the bit is clear. A conditional branch is then executed according to the condition code bit settings.

These instructions optimize execution time. When it is not possible to separate adjacent compare and branch instructions with other unrelated instructions, replacing the two instructions with a single compare and branch instruction increases performance.

4.2.8 Call
298. consists of 30 address lines, four byte enables, 32 data lines, two clock outputs, and control and status signals.

The bus controller manages instruction fetches, data loads and stores, and DMA transfer requests. Bus management is accomplished by queuing bus requests; this effectively decouples instruction execution speed from external memory access time.

Load and store instructions, the program's interface to the bus controller, work on ordinal (unsigned) or integer (signed) data. A single load or store instruction can move from 1 to 16 bytes of data. The bus controller also handles instruction fetches, which read either 8 bytes (two words) or 16 bytes (four words).

The bus controller divides the flat 4-Gbyte memory space into 16 regions; each region has independent, software-programmable parameters that define data bus width, ready control, number of wait states, pipeline read mode, byte ordering and burst mode. These parameters are stored in the memory region configuration registers (MCON 0-15). Each memory region is 2^28 bytes (256 Mbytes). The purpose of configurable memory regions is to provide system hardware interface support. Regions are transparent to the software. The upper four address bits (A31:28) indicate which region is enabled.

A data bus width parameter in each MCON register configures the external data bus as an 8-, 16- or 32-bit bus for a region. This parameter determines byte enable signal encoding and the physical location of dat
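The region decoding described above (16 regions of 2^28 bytes each, selected by address bits A31:28) amounts to a single shift. The C sketch below illustrates the mapping; the names `mcon_region` and `region_base` are hypothetical and are used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_COUNT 16u
#define REGION_SHIFT 28u   /* each region spans 2^28 bytes (256 Mbytes) */

/* Index (0..15) of the memory region, and hence of the MCON register,
 * that covers a 32-bit address: the value of address bits A31:28. */
unsigned mcon_region(uint32_t addr)
{
    return (unsigned)(addr >> REGION_SHIFT);
}

/* First byte address of a region. */
uint32_t region_base(unsigned region)
{
    assert(region < REGION_COUNT);
    return (uint32_t)region << REGION_SHIFT;
}
```

For example, address 0x30001234 falls in region 3, whose base address is 0x30000000.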
299. cribes how the processor handles each fault subtype.
RIP:                    Describes the value saved in the RIP register of the stack frame that the processor was using when the fault occurred. In the RIP definitions, "next instruction" refers to (1) the instruction directly after the faulting instruction or (2) an instruction to which the processor can logically return when resuming program execution.
Program State Changes:  Describes the effect(s) that a fault subtype causes in a program's control flow.

7.10.1 Arithmetic Faults

Fault Type:     3H
Fault Subtype:  Number    Name
                0H        Reserved
                1H        Integer Overflow
                2H        Arithmetic Zero Divide
                3H-FH     Reserved
Function:       Indicates a problem with an operand or the result of an arithmetic instruction.
                An integer overflow fault is generated when the result of an integer instruction overflows its destination and the AC register integer overflow mask is cleared. Here, the result's n least significant bits are stored in the destination, where n is the destination size. Instructions that generate this fault are: addi, subi, stib, shli, muli, divi.
                An arithmetic zero divide fault is generated when the divisor operand of an ordinal or integer divide instruction is zero. Instructions that generate this fault are: divo, divi, ediv, remi, remo.
RIP:            IP for the next executed instruction if a fault had not occurred.
Program State Changes:  Faults may be imprecise when executing with the NIF bit cleared. An integer overflow fault may
300. cribes the time required for microcode to complete channel setup after Nsetup DMA sdma executes This latency component may be ignored if the channel is channel enabled Neetup clock cycles after sdma is executed Swap Time required for a higher priority channel to preempt a lower priority channel Newap DMA and the time required to copy the associated DMA working registers to internal channel data RAM If only one channel is enabled a system then Ngwap equals 0 Lower Latency of lower priority channels which are preempted when a DMA for the Niower Priority highest priority channel is requested A transfer on the lower priority channel Channels must complete before the higher priority channel is serviced Interrupt Latency caused by servicing an interrupt with the suspend DMA mode enabled Nint porum Latency Ni is the same as the worst case interrupt latency for the system Table 13 8 Values of DMA Latency Components Lateney Condition Value Notes Component PCLK2 1 Cycles Non chained DMA modes 36 Chained DMA modes 44 N setup Channel enable delayed from sdma execution by gt 36 clock cycles in non chaining mode or 0 gt 44 clock cycles in chained DMA mode Single DMA channel enabled No channel 0 preemption swap Multiple DMA channels enabled Preempt 5 of channels preempted lower priority channels Single DMA channel enabled No channel 0 preemption lower Multiple DMA channels enabled Preempt 1 lower priority c
301. croprocessor data sheets for AC specifications.

[Figure 13-13. DMA External Interface: the i960 CA/CF microprocessor connected to Peripherals 0-3. Data passes over the system bus (address, data, control); each channel has dedicated control lines DREQ, DACK and EOP/TC.]

13.11.1 Pin Description

DREQ3:0      DMA Request (input). DMA request pins are individual asynchronous channel request inputs used by peripheral circuits to obtain DMA service. In fixed priority mode, DREQ0 has the highest priority and DREQ3 has the lowest priority. A request is generated by asserting the DREQ3:0 pin for a channel.

DACK3:0      DMA Acknowledge (output). Notifies an external DMA device that a transfer is taking place. The pin is active during the bus request issued to the DMA device.

EOP/TC3:0    End of Process (input, EOP3:0) or Terminal Count (output, TC3:0). As an output, the pin is driven active (low) during the last transfer for a DMA and has the same timing as the DACK3:0 signals. TC3:0 pins are asserted when the byte count reaches zero for a chained or non-chained DMA. As an input, an asynchronous active-low signal on the pin for a minimum of two clock cycles causes the DMA to terminate as described in section 13.8, TERMINATING DMA
302. ction:      if (PC.te = 1) { PC.tfp <- 1; TC.bte <- 1; Trace Breakpoint trace fault; }
Faults:       Trace        Instruction, Breakpoint    Instruction and Breakpoint Trace Events are signaled after instruction completion. A trace fault is generated if PC.te = 1.
              Operation    Unimplemented              Execution from on-chip data RAM.
Example:      ld xyz, r4
              addi r4, r5, r6
              fmark
              # Breakpoint trace event is generated at this point in the instruction stream.
Opcode:       fmark    66CH    REG
See Also:     mark

9.3.29 LOAD

Mnemonic:     ld      Load
              ldob    Load Ordinal Byte
              ldos    Load Ordinal Short
              ldib    Load Integer Byte
              ldis    Load Integer Short
              ldl     Load Long
              ldt     Load Triple
              ldq     Load Quad
Format:       ld*     src,    dst
                      mem     reg
Description:  Copies a byte or byte string from memory into a register or group of successive registers.
              The src operand specifies the address of the first byte to be loaded. The full range of addressing modes may be used in specifying src. Refer to section 3.3, MEMORY ADDRESSING MODES (pg. 3-5).
              dst specifies a register or the first (lowest numbered) register of successive registers.
              ldob and ldib load a byte, and ldos and ldis load a half word, and convert it to a full 32-bit word. Data being loaded is sign-extended during integer loads and zero-extended during ordinal loads.
              ld, ldl, ldt and ldq instructions copy 4, 8, 12 and 16 bytes, respectively, from memory into successive
303. ctl could be executed periodically to guarantee recognition of pending interrupts which were posted in the table by the external agent.

An external I/O agent or a coprocessor posts interrupts to a processor's interrupt table in memory in the same manner described above, provided it has the capability to perform atomic operations on memory. When interrupts are posted in this manner, pending interrupts and pending priorities must be modified in a specific order, and the agent must not allow access by the processor or other external agents during the atomic modify operations.

The processor automatically checks the memory-based interrupt table when the processor posts an interrupt using sysctl with a post-interrupt message type. When the processor finds a pending interrupt, it handles it as if it had just received the interrupt. If the processor finds two pending interrupts at the same priority, it services the interrupt with the highest vector number first.

Example 6-2. Modifying Pending Interrupts

  # set pending interrupt bit
  atomic modify pending interrupts (vector number 8)
  # set pending priority bit
  atomic modify pending priorities

6.6 SYSTEM CONTROL INSTRUCTION (sysctl)

sysctl is typically used to request an interrupt in a program (see Example 6-3). The request-interrupt message type (00H) is selected and the interrupt procedure pointer number is specified in the least significant byte of the instructio
304. cy rate. In the Instruction Execution column, a range of numbers (e.g., 0.5-1) indicates either the degree of parallel instruction issue achieved or conditions specific to the instruction's run-time execution, such as branch taken or not taken. Table A-3 describes the shorthand for additive factors that appear in the execution time columns.

Table A-3. Execution Times: Shorthand for Additive Factors

efa     The time for effective address calculation.

        For the lda instruction:

          efa clocks    Addressing Mode
          0             offset
          0             disp
          0             (reg)
          0             offset(reg)
          0             disp(reg)
          0             disp[reg*scale]
          1             (reg)[reg*scale]
          1             disp(reg)[reg*scale]
          3             disp+8(IP)

        For all other references:

          efa clocks    Addressing Mode
          0             offset
          0             disp
          0             (reg)
          1             offset(reg)
          1             disp(reg)
          1             disp[reg*scale]
          2             (reg)[reg*scale]
          2             disp(reg)[reg*scale]
          4             disp+8(IP)

bus     The time necessary to perform the external memory operations associated with the instruction. The additive factor bus equals 0 when memory operations associated with the instruction are in the on-chip data RAM. bus is also equal to zero for branches and calls where the target is in the instruction cache.

spill   The time required to write one cached register set to its reserved frame on the stack. Although spill is a function of bus, spill equals 36 when the stack is in external zero wait state memory.

fill    The time required to read one register set from the
305. cycles start by asserting RAS. Accesses to any column within the selected row may be treated as static RAM, using CAS as an output enable. The fastest DRAM read accesses are achieved with static column DRAM.

The i960 Cx processor's four-word burst bus can easily take advantage of the fast column access times provided by nibble mode, fast page mode or static column mode DRAM.

[Figure B-13. Static Column Mode DRAM Read (timing diagram).]

B.3.2 DRAM Refresh Modes

All DRAMs require periodic refresh to retain data. DRAMs may be refreshed in one of two ways: RAS-only refresh or CAS-before-RAS refresh.

RAS-only refresh (Figure B-14) is realized by asserting a row address on the address pins and asserting RAS. CAS is not asserted. A single RAS-only refresh cycle refreshes all columns within the selected row.

CAS-before-RAS refreshes (Figure B-15) do not require an address to be generated. The DRAM generates the row address with an internal counter.

[Figure B-14. Timing diagram for RAS-only refresh (signals: ADR carrying the row address, RAS).]

[Figure B-15. CAS-before-RAS DRAM Refresh (timing diagram).]

DRAM may be refr
306. d in an ordinal operand. All use the REG format and can specify literals or local, global or special function registers.

4.2.4.1 Bit Operations

These instructions operate on a specified bit:

setbit      set bit
clrbit      clear bit
notbit      not bit
alterbit    alter bit
scanbit     scan for bit
spanbit     span over bit

setbit, clrbit and notbit set, clear or complement (toggle) a specified bit in an ordinal.

alterbit alters the state of a specified bit in an ordinal according to the condition code. If the condition code is 010, the bit is set; if the condition code is 000, the bit is cleared.

chkbit, described in section 4.2.6, Comparison (pg. 4-12), can be used to check the value of an individual bit in an ordinal.

scanbit and spanbit find the most significant set bit or clear bit, respectively, in an ordinal.

4.2.4.2 Bit Field Operations

The two bit field instructions are extract and modify.

extract converts a specified bit field, taken from an ordinal value, into an ordinal value. In essence, this instruction shifts right a bit field in a register and fills in the bits to the left of the bit field with zeros. (eshro also provides the equivalent of a 64-bit extract of 32 bits.)

modify copies bits from one register, under control of a mask, into another register. Only unmasked bits in the destination register are modified. modify is equivalent to a bit field move.

An application that uses littl
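As an illustration only, the bit and bit-field behavior described in this fragment can be modeled in a few lines of Python. This is a sketch under stated assumptions, not processor code: the function names are hypothetical, registers are modeled as plain 32-bit integers, bit positions are assumed to be taken mod 32, alterbit is assumed to examine only bit 1 of the condition code (consistent with the 010 and 000 cases in the text), and extract follows the quick-reference formula (src/dst >> (bitpos mod 32)) and (2^(len mod 32) - 1).

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as unsigned 32-bit values


def setbit(bitpos, src):
    """Return src with the selected bit set."""
    return (src | (1 << (bitpos % 32))) & MASK32


def clrbit(bitpos, src):
    """Return src with the selected bit cleared."""
    return src & ~(1 << (bitpos % 32)) & MASK32


def notbit(bitpos, src):
    """Return src with the selected bit toggled."""
    return (src ^ (1 << (bitpos % 32))) & MASK32


def alterbit(bitpos, src, cc):
    """Set the bit when the condition code is 010, clear it when 000.

    Assumption: only bit 1 of the 3-bit condition code is examined.
    """
    return setbit(bitpos, src) if cc & 0b010 else clrbit(bitpos, src)


def extract(bitpos, length, src_dst):
    """Shift the field down to bit 0 and zero everything to its left."""
    field_mask = (1 << (length % 32)) - 1
    return (src_dst >> (bitpos % 32)) & field_mask
```

For instance, extracting the 4-bit field that starts at bit 8 of 0x12345678 leaves the nibble 0x6 in the low bits of the result.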
307. d intx intx intx intx intx intx intx intx
    .word intx intx intx intx intx intx intx intx
    .word intx intx intx intx intx intx intx intx
    .word intx intx intx intx intx intx intx intx
    .word intx intx intx intx intx intx intx intx
    .word intx intx intx intx intx 1 intx

/* START - Processor starts execution here after reset */
    .text
    .globl  start_ip
    .globl  _reinit
start_ip:
    mov     0, g14              /* g14 must be 0 for ic960 C compiler */

/* Copy the data area into RAM. It has been packed in the ROM after the
 * code area. If the copy is not needed (RAM-based monitor), the symbol
 * rom_data can be defined as 0 in the linker directives file. */
    lda     rom_data, g1        # load source of copy
    cmpobe  0, g1, 1f
    lda     Bdata, g2           # load destination
    lda     Edata, g3
init_data:
    ldq     (g1), r4
    addo    16, g1, g1
    stq     r4, (g2)
    addo    16, g2, g2
    cmpobl  g2, g3, init_data
1:

Example 14-2. Startup Routine (init.s) (Sheet 5 of 6)

/* Initialize the BSS area of RAM */
    lda     Bbss, g2            # start of bss
    lda     Ebss, g3            # end of bss
    movq    0, r4
bss_fill:
    stq     r4, (g2)
    addo    16, g2, g2
    cmpobl  g2, g3, bss_fill

/* Save initial value of g0; it contains the stepping number */
    st      g0, _componentid

_reinit:
    ldconst 0x300, r4           # reinitialize sys control
    lda     1f, r5
    lda     rom_prcb, r6
    sysctl  r4, r5, r6
1:
    mov     0, g14
    lda     user_stack, g0      # new
308. d priority is checked, rather than the memory-based interrupt table, when modpc changes a process priority. The internal priority value is updated each time an interrupt is posted using sysctl.

9.3.36 modtc

Mnemonic:     modtc    Modify Trace Controls

Format:       modtc    mask,          src,           dst
                       reg/lit/sfr    reg/lit/sfr    reg/sfr

Description:  Reads and modifies the TC register as specified with mask and src. The src operand contains the value to be placed in the TC register; the mask operand specifies bits that may be changed. Only bits set in mask are modified. mask must not enable modification of reserved bits. Once the TC register is changed, its initial state is copied into dst.

              The changed trace controls may take effect immediately or may be delayed. If delayed, the changed trace controls may not take effect until after the first non-branching instruction is fetched from memory or after four non-branching instructions are executed.

              For more information on the trace controls, refer to CHAPTER 7, FAULTS and CHAPTER 8, TRACING AND DEBUGGING.

Action:       temp <- TC;
              TC <- (mask and src) or (temp andnot mask);
              dst <- temp;

Faults:       Type Mismatch: Non-supervisor reference of a sfr.

Example:      modtc g12, g10, g2    # trace controls <- g10 masked by g12;
                                    # previous trace controls stored in g2

Opcode:       modtc    654H    REG

See Also:     modac, modpc

intel 9
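The modtc Action above is a masked read-modify-write, and a short Python model may make the data flow easier to follow. This is a minimal sketch, not a definitive description of the hardware: the function name and the tuple return value are illustrative assumptions, the TC register is passed in as an ordinary 32-bit integer, and the delayed-effect behavior and fault checks described in the text are not modeled.

```python
MASK32 = 0xFFFFFFFF  # the TC register is modeled as an unsigned 32-bit value


def modtc(tc, mask, src):
    """Return (new_tc, dst) for a modtc with the given operands.

    Only bits set in mask are taken from src; all other TC bits keep
    their previous value. dst receives the TC value from before the change.
    """
    temp = tc & MASK32
    new_tc = ((mask & src) | (temp & ~mask)) & MASK32
    return new_tc, temp
```

With tc = 0x000000F0, mask = 0x0000000F and src = 0xFFFFFFFF, the model returns a new TC value of 0x000000FF and a dst of 0x000000F0. Passing a mask of 0 leaves TC unchanged, which makes the call a plain read of the current trace controls.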
309. d to as coprocessors, interface to the IS and RF, connecting to either the register (REG) side or the memory side of the processors. IS issues directives via the REG and MEM interfaces which target a specific coprocessor. That coprocessor then executes an express function, virtually decoupled from the IS and the other coprocessors. The REG and MEM data buses transfer data between the common RF and the coprocessors.

The i960 Cx processors are designed to allow application-specific coprocessors to interface to the IS in the same way as core-defined coprocessors. The integrated peripherals (bus controller, interrupt controller and DMA controller) interface to the i960 Cx processors' REG and MEM sides.

[Block diagram. Labeled blocks include: Interrupt Port; Four-Channel DMA Port; DMA Controller; Instruction Prefetch Queue; Instruction Cache (Two-Way Set Associative); Memory Region Configuration; Bus Controller; Bus Request Queues; 128-Bit Cache Bus; Programmable Parallel Interrupt Controller; Multiply/Divide Unit; Direct Mapped Data Cache; 1 Kbyte Data RAM; Execution Unit; Register Cache (5 to 15 Sets); Six-Port Register File; Address Generation Unit; Memory-Side and Register-Side Machine Bus; 64-Bit SRC1 Bus; 64-Bit SRC2 Bus; 32-Bit Base Bus; 128-Bit Load Bus]
310. d when a dedicated interrupt is serviced. This allows other hardware-generated interrupts to be locked out until the mask is restored. See section 12.3.3, Programmer's Interface (pg. 12-11) for a further description of the IMSK, IPND and IMAP registers.

Interrupt vectors are assigned to DMA inputs in the same way external pins are assigned dedicated-mode vectors. The DMA interrupts are always dedicated-mode interrupts.

Figure 12-2. Dedicated Mode

12.2.1.2 Expanded Mode

In expanded mode, up to 248 interrupts can be requested from external sources. Multiple external sources are externally encoded into the 8-bit interrupt vector number. This vector number is then applied to the external interrupt pins (Figure 12-3), with the XINT0 pin representing the least significant bit and XINT7 the most significant bit of the number. Note that external interrupt pins are active low; therefore, the inverse of the vector number is actually applied to the pins.

In expanded mode, external logic is responsible for posting and prioritizing external sources. Typically, this scheme is implemented with a simple configuration of external priority encoders. As shown in Figure 12-4, simple combinational logic can handle prioritization of the external sources when more than one expanded interrupt is pending.

NOT
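Because the expanded-mode pins are active low, the levels that external logic drives are the bitwise inverse of the vector number. The sketch below shows that encoding in Python. It is illustrative only: the function name and the dictionary of pin levels are assumptions made for this example, and the check on the vector range reflects only that the value must fit in the 8 pins described above.

```python
def expanded_mode_pin_levels(vector):
    """Return the logic level (0 or 1) to drive on each of XINT0..XINT7.

    XINT0 carries the least significant bit and XINT7 the most significant
    bit. The pins are active low, so each level is the inverted vector bit.
    """
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector number must fit in 8 bits")
    inverted = ~vector & 0xFF
    return {f"XINT{bit}": (inverted >> bit) & 1 for bit in range(8)}
```

For vector number 0x12 (binary 00010010), the inverted pattern is 0xED, so XINT1 and XINT4 are driven low and the remaining six pins are driven high.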
311. ddo r9 0 40 mov g2 r6 lda g2 r6 cmpdeco 2 r7 r7 cmpdeco2 r7 r7 subo 10 90 40 lda 2 41 41 addo 2 41 41 subo 10 90 40 bg next j bg t new next j ret ret Execution from DR new loop Execution from DR loop Clock REGop MEMop CTRLop Clock REGop MEMop CTRLop 1 subo 1 addo Idob 2 Idob 2 addo 3 addo 3 Idob 4 Idob 4 addo Ida 5 addo 5 addo Idob A 51 INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION intel A 52 Clock REGop MEMop CTRLop Clock REGop MEMop CTRLop 6 shlo 6 addo 7 addo 7 addo Idob 8 Idob 8 addo Ida 9 addo 9 addo Idob 10 addo 10 addo Ida 11 Idob 11 addo Idob 12 addo 12 addo Ida 13 shlo 13 addo Idob 14 addo 14 addo 15 Idob 15 addo Idob 16 addo 16 addo Ida 17 shlo 17 addo Idob 18 addo 18 addo 19 Idob 19 shro 20 shlo 20 cmpdeco stob bg t 21 addo 21 subo 22 addo 22 addo Idob 23 Idob 24 addo 25 addo 26 Idob 27 addo 28 shlo 29 addo 30 Idob 31 addo 32 shro 33 stob 34 addo bg t 35 cmpdeco 36 subo intel INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION A 2 7 7 Branch Prediction Conditional branches execute faster if the branch direction is correctly predicted using the branch prediction bits on conditional instructions This is particularly true when a comparison cannot be separated from the test in a conditional instructio
312. APPENDIX A
INSTRUCTION SET QUICK REFERENCE KEY

For each instruction, this quick reference lists mnemonic, name, assembler syntax, action, opcode, instruction format, machine type and executi
313. de bit 1 (used here as a carry bit). If the result has a carry, bit 1 of the condition code is set; otherwise, it is cleared. This instruction's description in CHAPTER 9, INSTRUCTION SET REFERENCE gives an example of how this instruction can be used to add two long-word (64-bit) operands together.

subc is similar to addc, except it is used to subtract extended-precision values. Although addc and subc treat their operands as ordinals, the instructions also set bit 0 of the condition codes if the operation would have resulted in an integer overflow condition. This facilitates a software implementation of extended integer arithmetic.

emul multiplies two ordinals (each contained in a register), producing a long ordinal result (stored in two registers). ediv divides a long ordinal by an ordinal, producing an ordinal quotient and an ordinal remainder (stored in two adjacent registers).

4.2.2.3 Remainder and Modulo

These instructions divide one operand by another and retain the remainder of the operation:

remi    remainder integer
remo    remainder ordinal
modi    modulo integer

The difference between the remainder and modulo instructions lies in the sign of the result. For remi and remo, the result has the same sign as the dividend; for modi, the result has the same sign as the divisor.

4.2.2.4 Shift and Rotate

These shift instructions shift an operand a specified number of bits left or right:

shlo    shift l
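The sign rule that separates the remainder and modulo instructions in section 4.2.2.3 can be checked with a small Python model. This is a sketch under stated assumptions rather than a description of the hardware: the function names and the (dividend, divisor) argument order are chosen for readability and are not the assembler operand order, values are unbounded Python integers rather than 32-bit registers, and the zero-divide fault is represented only by Python's own exception.

```python
def remi_model(dividend, divisor):
    """Remainder whose sign follows the dividend (truncating division)."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return dividend - quotient * divisor


def modi_model(dividend, divisor):
    """Modulo whose sign follows the divisor.

    Python's % operator already returns a result with the divisor's sign.
    """
    return dividend % divisor
```

The two models agree whenever the operands have the same sign and differ otherwise: for a dividend of -7 and a divisor of 3, the remainder model gives -1 and the modulo model gives 2.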
314. dent tasks or events. This mechanism defines 32 priority levels, ranging from 0 (the lowest priority level) to 31 (the highest). The priority field always reflects the current priority of the processor. Software can change this priority by use of the modpc instruction.

The processor uses the priority field to determine whether to service an interrupt immediately or to post the interrupt. The processor compares the priority of a requested interrupt with the current process priority. When the interrupt priority is greater than the current process priority or equal to 31, the interrupt is serviced; otherwise it is posted. When an interrupt is serviced, the process priority field is automatically changed to reflect interrupt priority. See CHAPTER 6, INTERRUPTS.

PC register trace enable bit (bit 0) and trace fault pending flag (bit 10) control the tracing function. The trace enable bit determines whether trace faults are to be generated (1) or not generated (0). The trace fault pending flag indicates that a trace event has been detected (1) or not detected (0).

2.6.3.1 Initializing and Modifying the PC Register

Any of the following three methods can be used to change bits in the PC register:

• Modify process controls instruction (modpc)
• Alter the saved process controls prior to a return from an interrupt handler
• Alter the saved process controls prior to a return from a fault handler

modpc directly read
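The service-or-post rule stated earlier in this fragment reduces to one comparison plus the priority-31 special case. The following Python sketch encodes just that rule. The function name and the string return values are illustrative assumptions, and nothing else about interrupt handling (masking, pending-interrupt storage, the priority update on service) is modeled.

```python
def interrupt_decision(interrupt_priority, process_priority):
    """Return "service" or "post" for a requested interrupt.

    An interrupt is serviced when its priority is greater than the current
    process priority, or when its priority is 31; otherwise it is posted.
    """
    for value in (interrupt_priority, process_priority):
        if not 0 <= value <= 31:
            raise ValueError("priorities range from 0 to 31")
    if interrupt_priority > process_priority or interrupt_priority == 31:
        return "service"
    return "post"
```

Note the asymmetry at the top level: a priority-31 interrupt is serviced even when the process is already running at priority 31, whereas an interrupt at any lower priority that merely equals the process priority is posted.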
315. ding on the type of entry being pointed to in the system procedure table, calls can cause either a system-supervisor call or a system-local call to be executed. A system-supervisor call is a call to a system procedure that also switches the processor to supervisor mode and the supervisor stack. A system-local call is a call to a system procedure that does not cause an execution mode or stack change. Supervisor mode is described throughout CHAPTER 5, PROCEDURE CALLS.

ret performs a return from a called procedure to the calling procedure (the procedure that made the call). ret obtains its target IP (return IP) from linkage information that was saved for the calling procedure. ret is used to return from all calls, including local and supervisor calls, and from implicit calls to interrupt and fault handlers.

4.2.9 Conditional Faults

Generally, the processor generates faults automatically as the result of certain operations. Fault handling procedures are then invoked to handle various fault types without explicit intervention by the currently running program. These conditional fault instructions permit a program to explicitly generate a fault according to the state of the condition code flags:

faulte{.t|.f}     fault if equal
faultne{.t|.f}    fault if not equal
faultl{.t|.f}     fault if less
faultle{.t|.f}    fault if less or equal
faultg{.t|.f}     fault if greater
faultge{.t|.f}    fault if greater or equal
faulto{.t|.f}     fault if ordered
faultno{.t|.f}    fault if u
316. ditions, the processor sets the flags as shown in Table 2-5. To show equality and inequalities, the processor sets the condition code flags as shown in Table 2-6.

Table 2-5. Condition Codes for True or False Conditions

Condition Code    Condition
010₂              true
000₂              false

Table 2-6. Condition Codes for Equality and Inequality Conditions

Condition Code    Condition
000₂              unordered
001₂              greater than
010₂              equal
100₂              less than

Some i960 architecture implementations provide integrated floating-point processing. The terms ordered and unordered are used when comparing floating-point numbers. If, when comparing two floating-point values, one of the values is a NaN (not a number), the relationship is said to be unordered. The i960 Cx processors do not implement on-chip floating-point processing.

To show carry out and overflow, the processor sets the condition code flags as shown in Table 2-7.

Table 2-7. Condition Codes for Carry Out and Overflow

Condition Code    Condition
01X₂              carry out
0X1₂              overflow

Certain instructions, such as the branch-if instructions, use a 3-bit mask to evaluate the condition code flags. For example, the branch-if-greater-or-equal instruction (bge) uses a mask of 011₂ to determine if the condition code is set to either greater-than or equal. These masks cover the additional conditions of greater-or-e
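The mask evaluation described for bge can be expressed as a bitwise AND of the 3-bit mask with the condition code from Table 2-6. The Python sketch below is a minimal model of that check. The names are hypothetical, and it covers only non-zero masks such as the 011 mask quoted in the text; the all-zero (unordered) mask is handled as a special case by the instruction definitions and is left out here.

```python
# Condition code values from Table 2-6.
CC_UNORDERED = 0b000
CC_GREATER = 0b001
CC_EQUAL = 0b010
CC_LESS = 0b100

MASK_BGE = 0b011  # greater-or-equal: matches either "greater than" or "equal"


def condition_holds(mask, cc):
    """Return True when any condition selected by a non-zero mask is set."""
    if mask == 0:
        raise ValueError("the all-zero mask needs separate handling")
    return (mask & cc) != 0
```

With the bge mask, the check succeeds for the greater-than (001) and equal (010) codes and fails for the less-than (100) and unordered (000) codes.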
317. dst reg lit sfr reg lit sfr reg sfr EN I E I U I m 67 0 REG R 05 1 4 srcl and src are 32 bits dst is 64 bits Extended Shift Right Ordinal srcl src2 dst reg lit sfr reg lit sfr reg sfr I U 5D 8 REG R 0 5 1 1 dst lt src2 gt gt srcl mod 32 src2 is 64 bits Extract extract bitpos len src dst reg lit sfr reg lit sfr reg Es I I U 1 MI 65 1 REG u 4 4 src dst lt src dst gt gt bitpos mod 32 and 2 len mod 32 1 Fault If Equal 99 if f if fault faulte A Cccand 010 2 0 1 I U R 1A CTRL 1 2 taken Constraint Range fault March 1994 Page 9 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution i ipti Mach 71 Resul Mnemonic Description om la 3 ig p p tip em te Events Modes T Opcode Format Fault If Greater 99 if fault faultg C AC cc and 001 0 0 19 CTRL u 1 2 taken Constraint Range fault Fault If Greater Or Equal 99 if fault faultge if AC cc and 011 0
318. e The MDU incorporates a one-clock pipeline unless integer overflow faults are enabled. The IS can issue a new MDU instruction one clock before the previous result is written. For example, back-to-back 32x32 multiply throughput is four clocks per multiply, versus a five-clock multiply latency. Figure A-10 shows the execution pipeline for back-to-back multiplies in which adjacent instructions do not have a register dependency between them.

    addo g0, g1, g2
    mulo g2, g3, g4
    mulo g5, g6, g7
    addo g8, g9, g10

Figure A-10. MDU Pipelined Back-To-Back Operations

The MDU directly executes instructions listed in Table A-7. The scheduler issues an MDU instruction in one clock. The table also shows the length of the execution stage (latency) for each instruction. Subsequent instructions not dependent upon MDU results are issued and executed in parallel with the MDU. If instructions in the table are issued back-to-back and they have no register dependency between them, the MDU pipeline improves throughput by one clock per instruction.

Table A-7. MDU Instructions
319. e side (REG, MEM or CTRL) each instruction in the current quad-word group belongs. When the IS issues a group of instructions, the appropriate parallel processing units acknowledge receipt and begin execution. However, register and resource dependencies can delay instruction execution. The processor transparently manages these interactions through register scoreboarding and register bypassing.

To maximize the IS's ability to issue instructions in parallel, the instruction cache is organized to provide three or four instructions per clock to the scheduler. To minimize the cost of a cache miss, the instruction fetch unit constantly checks whether a cache miss will occur on the next clock. If a miss is imminent, an instruction fetch is issued.

The following discussions assume that instructions are always available from the instruction cache. For a discussion of cache organization and the impact of cache misses, see section A.2.5, Instruction Cache And Fetch Execution (pg. A-33).

A.2.2 Parallel Execution

Six parallel processing units are attached to the six-ported register file.

MEM-side: Three units are attached to the machine's memory side. MEM-side instructions are dispatched over the MEM machine bus.

BCU (Bus Control Unit) executes memory reads and writes for instructions which reference an operand in external memory.

DR (Data RAM) handles memory reads and write
320. e Disabling the cache does not invalidate any of its entries.

    setbit 30, sf2, sf2    # set the bit to dynamically disable data cache
    mov    g0, g0          # wait two clocks before executing
    mov    g0, g0          # code which the data cache

The DMA Control Register's data cache invalidate bit can be set to quickly invalidate the entire cache. This invalidation clears all the individual valid bits in the data cache array. The effect of changing this bit is also delayed by two clocks. If multiple cacheable loads are pending in the BCU queues when the cache is invalidated, the processor continuously invalidates the cache until the loads are finished. Once all cacheable loads are complete and all valid bits have been cleared, the data cache invalidate bit reverts to 0.

Upon reset or initialization, the data cache is globally disabled and invalidated to ensure that accesses are not made to a cache line that may contain invalid data.

A.1.8.4 Data Fetch Policy

Data fetch policy determines what happens for a load that misses the cache. The i960 CF processor employs a natural fetch policy. Word, double-word, triple-word and quad-word loads are issued to the bus control logic in their original widths. Byte and short-word loads are promoted to word bus requests. Because most applications have 32-bit data buses, there is seldom a bandwidth penalty for promoting a byte or short word
321. e source, while the destination is selected with the acknowledge pin. The destination, when selected, reads the load data; the processor ignores the data from the load.

NOTE: Fly-by mode may not access internal data RAM.

A fly-by DMA in block mode is started by software, as is any block-mode operation. Request pins (DREQ3:0) are ignored in block mode. Fly-by block mode DMAs can be used to implement high-performance memory-to-memory transfers where source and destination addresses are fixed at block boundaries. In this case, the acknowledge pin must be used in conjunction with external hardware to uniquely address the source and destination for the transfer.

13.4.3 Source/Destination Request Length

Source and destination request length is selected when a DMA channel is configured. Request length determines bus request types that the DMA microcode issues. Byte, short-word or quad-word bus requests are issued by the DMA controller microcode. Figure 13-3 illustrates source-synchronized DMA loads.

[Figure 13-3: timing diagrams for two source request lengths (8 bits and 32 bits), showing ADS, DACKx and the D7:0 data lines carrying Byte 0 through Byte 3]
322. e depth maximizes on-chip data RAM for variable caching.

Some situations exist where flushreg can optimize register cache usage. When an application crosses the boundary between non-real-time processing and real-time processing, it might be desirable to flush the register set. Flushing the register set at the beginning of a routine saves time that would otherwise be spent on frame spills later in the routine. However, this approach may actually result in a greater number of spills occurring than would otherwise have occurred without the premature flush.

This technique may be used to control interrupt latency within sections of background code. For example, it may be advantageous to execute a flush at the beginning of a routine which executes many loads from very slow memory. This reduces interrupt latency within that code section, since there is no possibility of the interrupt's frame spill being impeded by slow memory operations.

A.2.8.4 Data RAM

On every clock, 128 bits of data can be loaded from or stored to the data RAM. This rate is sustained simultaneously with single-clock arithmetic operations executing from the independent REG-side register ports. Allocated correctly, this resource dramatically increases performance of critical application algorithms.

If data RAM space is scarce, locations can be dynamically allocated. If data RAM space is plentiful, locations can be globally allocated to achieve minimum latency to critical variables. A
323. e endian memory regions may need to access 32-bit big-endian data. The i960 Cx processors do not have a byte-swap instruction; however, a byte swap can be performed in five clocks by use of the modify and rotate instructions. Example 4-1 shows assembly language instructions that can be used in assembly language programs or in programs written in high-level languages that support in-line assembly code, such as the GNU960 and Intel C tools C compilers.

Example 4-1. Byte Swap

    # Assume g0 contains value to swap; result written to r3
    rotate  16, g0, r3
    ldconst 0xff00ff00, r4
    modify  r4, g0, r3
    rotate  8, r3, r3

For example, if register g0 contains 0x12345678, the final result in r3 should be 0x78563412 after the byte swap. The following shows how each instruction works in this example:

Instruction                 g0            r3
rotate 16, g0, r3           0x12345678    0x56781234
ldconst 0xff00ff00, r4      0x12345678    0x56781234
modify r4, g0, r3           0x12345678    0x12785634
rotate 8, r3, r3            0x12345678    0x78563412

4.2.5 Byte Operations

scanbyte performs a byte-by-byte comparison of two ordinals to determine if any two corresponding bytes are equal. The condition code is set based on the results of the comparison. scanbyte uses the REG format and can specify literals or local, global or special function registers.

4.2.6 Comparison

The processor provides several types of instructions for comparing two operands, as described in the following subsections.
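The byte-swap sequence of Example 4-1 can be traced step by step with a small Python model. This is an illustrative sketch only: the helper names are hypothetical, rotate is assumed to rotate left by the given count, and modify is modeled as copying the bits selected by the mask from src into src/dst, which is consistent with the intermediate values listed for the example.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as unsigned 32-bit values


def rotate(count, src):
    """Rotate a 32-bit value left by count bits (assumed direction)."""
    count %= 32
    return ((src << count) | (src >> (32 - count))) & MASK32


def modify(mask, src, src_dst):
    """Copy the bits of src selected by mask into src_dst."""
    return ((src & mask) | (src_dst & ~mask)) & MASK32


def byte_swap(g0):
    """Follow the four instructions of the byte-swap example."""
    r3 = rotate(16, g0)        # rotate  16, g0, r3
    r4 = 0xFF00FF00            # ldconst 0xff00ff00, r4
    r3 = modify(r4, g0, r3)    # modify  r4, g0, r3
    return rotate(8, r3)       # rotate  8, r3, r3
```

Calling byte_swap(0x12345678) passes through the intermediate values 0x56781234 and 0x12785634 and returns 0x78563412, matching the worked example. Applying the function twice returns the original value, which is a convenient sanity check for any byte-swap routine.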
324. e entirely within the instruction buffer.

The processor can be directed to load a block of instructions into the cache and then disable all normal updates to this load cache portion. This cache load-and-lock mechanism is provided to optimize interrupt latency and throughput. The first instructions of time-critical interrupt routines are loaded into the locked cache. The interrupt, when serviced, is directed to the locked cache portion. No external accesses are required for these instructions when the interrupt is serviced.

Only interrupts can be directed to fetch instructions from the instruction cache's locked portion. Other causes of program redirection always fetch from the normal memory hierarchy, even if the target address of the redirection is represented in the locked cache. When bit 1 of an interrupt vector is set to 1, the interrupt is fetched from the instruction cache's locked portion. Execution continues from the locked cache until a miss occurs, such as a branch, call or return to code outside of the locked space. If an interrupt directed to the locked cache results in a miss, the targeted instruction is fetched from the normal memory hierarchy.

Either the full cache or half the cache can be configured to load and lock. When only half of the cache is loaded and locked, the other half acts as a normal two-way set associative cache. Normally, an application locks only half the cache. Locking the full cache means that all instruction fetches
325. e faults on instruction addresses and data access addresses.

Breakpoint trace events can be generated when the processor executes an instruction with an IP that matches one of the addresses programmed into the two instruction breakpoint registers (IPB0, IPB1). Each instruction address breakpoint may be enabled or disabled individually by programming the two least significant bits in IPB0 or IPB1. Figure 8-2 describes the instruction address breakpoint registers.

Figure 8-2. Instruction Address Breakpoint Registers (IPB0 - IPB1)
(Fields: Instruction Address; Instruction Address Breakpoint Enable, where 00 = disable and 11 = enable)

Breakpoint trace events may also be generated when a memory access is issued which matches conditions programmed in one of two data address breakpoint registers (DAB0, DAB1; see Figure 8-3). Each breakpoint register is programmed to fault when the address of an access matches the breakpoint register and the access is one of four types: (1) any store, (2) any load or store, (3) any data load or store or any instruction fetch, or (4) any memory access.

Figure 8-3. Data Address Breakpoint Registers (DAB0 - DAB1)
(Field: Data Address, bits 31 through 0)

The programmer configures the BPCON register to set the data address breakpoint mode, which corresponds to one of these access types (Figure 8-4). Each data address breakp
326. e is stored. The following table shows the condition code mask for each instruction. The mask is in bits 0-2 of the opcode.

Instruction    Mask    Condition
testno         000₂    Unordered
testg          001₂    Greater
teste          010₂    Equal
testge         011₂    Greater or equal
testl          100₂    Less
testne         101₂    Not equal
testle         110₂    Less or equal
testo          111₂    Ordered

The optional .t or .f suffix may be appended to the mnemonic. Use .t to speed up execution when these instructions usually store a true (1) condition in dst. Use .f to speed up execution when these instructions usually store a false (0) condition in dst. If a suffix is not provided, the assembler is free to provide one.

Action:       For all instructions except testno:
                  if ((mask and AC.cc) != 000₂) dst <- 0x1;
                  else dst <- 0x0;
              testno:
                  if (AC.cc = 000₂) dst <- 0x1;
                  else dst <- 0x0;

Faults:       Type Mismatch: Non-supervisor reference of a sfr.

Example:      # assume AC.cc = 100₂
              testl g9        # g9 <- 0x00000001

Opcode:       teste     22H    COBR
              testne    25H    COBR
              testl     24H    COBR
              testle    26H    COBR
              testg     21H    COBR
              testge    23H    COBR
              testo     27H    COBR
              testno    20H    COBR

See Also:     cmpi, cmpdeci, cmpinci

9.3.60 udma (80960Cx Processor Only)

Mnemonic:     udma    Update DMA-Channel RAM

Format:       udma

Description:  The current status of the DMA channels is written to the dedicated DMA RAM.

Action:       for i = 0 to 3
327. e word specified with the src/dst operand until operation completes. Memory location in src is the modified word's first byte (LSB) address. Address is automatically aligned to a word boundary.

Action:       tempa <- src andnot 0x3;        # force alignment to word boundary
              temp <- memory_word(tempa);     # LOCK asserted at beginning of memory read
              memory_word(tempa) <- (src/dst and mask) or (temp andnot mask);
                                              # LOCK deasserted after the memory write completes
              src/dst <- temp;

Faults:       Type Mismatch: Non-supervisor reference of a sfr and/or non-supervisor attempt to write to internal data RAM.

Example:      atmod g5, g7, g10    # g5 <- g5 masked by g7, where g5 specifies
                                   # the address of a word in memory;
                                   # g10 <- initial value stored at address g5 in memory

Opcode:       atmod    610H    REG

See Also:     atadd

9.3.7 b, bx

Mnemonic:     b     Branch
              bx    Branch Extended

Format:       b     targ
                    disp
              bx    targ
                    mem

Description:  Branches to the specified target.

              With the b instruction, IP specified with targ operand can be no farther than -2^23 to (2^23 - 4) bytes from current IP. When using the Intel i960 processor assembler, targ operand must be a label which specifies target instruction's IP.

              bx performs the same operation as b except the target instruction can be farther than -2^23 to (2^23 - 4) bytes from
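The reach of the b instruction quoted above follows from a signed, word-aligned displacement, and a quick range check shows when bx is needed instead. The Python sketch below is illustrative only: the function name is hypothetical, addresses are plain integers, and the word-alignment check is an assumption based on instructions being word aligned in memory.

```python
B_MIN_DISPLACEMENT = -(2 ** 23)     # farthest backward reach, in bytes
B_MAX_DISPLACEMENT = 2 ** 23 - 4    # farthest forward reach, in bytes


def b_can_reach(current_ip, target_ip):
    """Return True when target_ip is within reach of a b instruction."""
    displacement = target_ip - current_ip
    if displacement % 4 != 0:
        return False  # assumption: targets are word aligned
    return B_MIN_DISPLACEMENT <= displacement <= B_MAX_DISPLACEMENT
```

A target exactly 2^23 bytes ahead of the current IP falls just outside the range, so b_can_reach returns False and the branch would have to use bx.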
328. ecoding and optimized lookahead logic makes the micro-flow invocation more efficient than a branch instruction. While the IS is issuing one group of instructions, parallel decode circuitry checks to see if the next executable instruction is a complex instruction (Figure A-19). If so, the opcode words presented to the IS in the next clock come from the on-chip ROM location that contains the micro-flow for the detected complex instruction. The IS actually never attempts to issue a complex encoding. The processor detects the encoding when the instruction is fetched, then traps during the clock in which the instruction is presented to the IS.

Generally, no clocks are lost when switching to a micro-flow. However, two conditions can defeat the lookahead logic:

• branches to REG, CTRL or COBR format instructions which are implemented as micro-flows, or
• cache misses from straight-line code execution.

Under these conditions, the switch to on-chip ROM causes a one-clock break in the IS's ability to issue instructions.

Complex instructions encoded with the MEM format do not require lookahead detection to trap to the ROM without overhead. Therefore, MEM format instructions of machine type do not see a one-clock performance loss, even when lookahead logic is defeated. Furthermore, micro-flows return to general execution with no overhead; back-to-back micro-flows do not incur the one-clock defeated-lookahead penalty.
329. ection A.2.7.7, Branch Prediction (pg. A-53).

4.1.3 Instruction Encoding Formats

Instructions are encoded in one 32-bit machine language instruction (also known as an opword) which must be word aligned in memory. An opword's most significant eight bits contain the opcode field. The opcode field determines the instruction to be performed and how the remainder of the machine language instruction is interpreted. Instructions are encoded in opwords in one of four formats (see Figure 4-1).

Instruction Type      Format    Description
register              REG       Most instructions are encoded in this format. Used primarily for instructions which perform register-to-register operations.
compare and branch    COBR      An encoding optimization which combines compare and branch operations into one opword. Other compare and branch operations are also provided as REG and CTRL format instructions.
control               CTRL      Used for branches and calls that do not depend on registers for address calculation.
memory                MEM       Used for referencing an operand which is a memory address. Load and store instructions, and some branch and call instructions, use this format. MEM format has two encodings: MEMA or MEMB. Usage depends upon the addressing mode selected. MEMB-formatted addressing modes use the word in memory immediately following the instruction opword as a 32-bit constant.

[Figure 4-1 residue. REG format fields: OPCODE, SRC/DST, SRC2, OPCODE, SRC1]
330. ecution mode from user mode to supervisor mode. When the processor switches to supervisor mode, it also switches to a new stack, the supervisor stack. (See section 5.5, SYSTEM CALLS, pg. 5-12.)
interrupt table: Contains vectors (pointers) to interrupt handling procedures. When an interrupt is serviced, a particular interrupt table entry is specified. A separate interrupt stack is provided to ensure that interrupt handling does not interfere with application programs. (See section 6.4, INTERRUPT TABLE, pg. 6-3.)
fault table: Contains pointers to fault handling procedures. When the processor detects a fault, the processor selects a particular entry in the fault table. The architecture does not require a separate fault handling stack. Instead, a fault handling procedure uses the supervisor stack, user stack or interrupt stack, depending on processor execution mode when the fault occurred and type of call made to the fault handling procedure. (See section 7.3, FAULT TABLE, pg. 7-4.)
control table: Contains on-chip control register values. Control table values are moved to on-chip registers at initialization or with sysctl. (See section 2.3, CONTROL REGISTERS, pg. 2-6, and section 4.3, SYSTEM CONTROL FUNCTIONS, pg. 4-19.)
The i960 Cx processors define two initialization data structures: initialization boot record (IBR) and processor control block (PRCB). These structures provide initialization data and pointers to other data structures in mem
331. ed cache. The i960 CF processor, with larger instruction cache, supports 2 Kbytes or 4 Kbytes of locked cache. As indicated in Table 12-2, the mode field of the sysctl instruction specifies the size of locked cache.
Table 12-2. Cache Configuration Modes
Mode Field / Mode Description / 80960CA / 80960CF
000₂ / normal cache enabled / 1 Kbyte / 4 Kbytes
1₂ / full cache disabled / 1 Kbyte / 4 Kbytes
100₂ / Load and lock half cache (execute off-chip) / 1 Kbyte / 2 Kbytes
110₂ / Load and lock half the cache; remainder is normal cache enabled / 512 bytes / 2 Kbytes
010₂ / Reserved / 1 Kbyte / 4 Kbytes
NOTES: 1. On the CA, only interrupt procedures can execute in the locked portion of the cache. 2. On the CF, interrupt procedures and other code can operate in the locked portion of the cache.
When sysctl executes mode 110 with a command to lock half of the instruction cache, one way of the i960 CF processor's two-way set associative cache is preloaded and locked from the specified address. The other half of the instruction cache functions as a 2 Kbyte direct-mapped instruction cache. On the i960 CA processor, the instruction cache's unlocked portion functions as a 512-byte two-way set associative cache. The i960 CF processor's instruction scheduler checks both ways of the cache for every instruction fetched. If an instruction is not found in either way, it is fetched from external memory and cached in the unlocked way. The 196
332. ee queue entries is dedicated for DMA operations. This improves DMA performance and latency at the expense of loads and stores. See CHAPTER 13, DMA CONTROLLER. A.2.4.7 Control Pipeline. The IS directly executes program flow control instructions. Branches take two clocks to execute in the CTRL pipeline; however, the IS is able to see branches as many as four instructions ahead of the current instruction pointer. This allows the scheduler to issue the branch early and, in most cases, execute the branch without inserting a dead clock in the REG and MEM instruction streams. Table A-11 lists the instructions that the IS executes directly, without the aid of micro-flows. For information on other control flow instructions, see section A.2.6, Micro-flow Execution (pg. 36). A.2.4.8 Unconditional Branches. Figure A-15 shows the IS issue stage and the CTRL pipeline for the case where the branch target is another branch, disabling the IS's ability to look ahead. The IS issues the branch in one clock; the branch is executed in the next clock. The branch target is another branch, which the scheduler issues immediately. Hence, branch instructions have a two-clock sustained rate when issued back to back. Table A-11. CTRL Instructions. Mnemonic / Issue Clocks / Latency Clocks / Back-to-Back Throughput Clocks: bbe 1 2 2 bne bl ble bg bge bo bno WwW O X
333. eft ordinal; shro: shift right ordinal; shli: shift left integer; shri: shift right integer; shrdi: shift right dividing integer; rotate: rotate left; eshro: extended shift right ordinal. Except for rotate, these instructions discard bits shifted beyond the register boundary. shlo shifts zeros in from the least significant bit; shro shifts zeros in from the most significant bit. These instructions are equivalent to mulo and divo by the power of 2, respectively. shli shifts zeros in from the least significant bit. If a shift of the specified places would result in an overflow, an integer overflow fault is generated (if enabled). The destination register is written with the source shifted as much as possible without overflow and an integer overflow fault is signaled. shri performs a conventional arithmetic shift right operation by shifting the sign bit in from the most significant bit. However, when this instruction is used to divide a negative integer operand by the power of 2, it may produce an incorrect quotient. Discarding the bits shifted out has the effect of rounding the result toward negative infinity. shrdi is provided for dividing integers by the power of 2. With this instruction, 1 is added to the result if the bits shifted out are non-zero and the operand is negative, which produces the correct result for negative operands. shli and shrdi are equivalent to muli and divi by the power of 2, respectively. rotate rotates operand bits to the left towa
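The difference between shri and shrdi described in item 333 is easiest to see on a negative operand. The following Python sketch models only the rounding behavior stated in the text; the function names are illustrative, and shift counts outside 0 to 31 are deliberately rejected rather than modeled, because the chunk above does not describe them.

```python
def shri(src: int, count: int) -> int:
    """Arithmetic shift right: the sign bit is shifted in, so the result
    rounds toward negative infinity (Python's >> behaves the same way)."""
    if not 0 <= count <= 31:
        raise ValueError("this sketch only models shift counts 0..31")
    return src >> count


def shrdi(src: int, count: int) -> int:
    """Shift right dividing integer: add 1 when the operand is negative and
    any shifted-out bit is non-zero, so the result rounds toward zero."""
    result = shri(src, count)
    shifted_out = src & ((1 << count) - 1)
    if src < 0 and shifted_out != 0:
        result += 1
    return result


print(shri(-7, 1), shrdi(-7, 1))
```

With these definitions, shrdi agrees with a truncating integer division by a power of 2, while shri does not for negative operands with a non-zero remainder.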
334. eg sfr if rcl lt sre2 lt 100 _ 1 U 5 7 REG R 05 1 1 else if src src2 lt 010 else AC cc lt 001 dst src2 1 overflow is ignored Compare and Decrement Ordinal m c src2 dst cmpdeco reg lit sfr reg lit sfr reg sfr if srel lt src2 lt 100 pese m I U 5 6 REG 0 5 1 1 else if src src2 lt 010 else AC cc lt 001 dst lt src2 1 March 1994 Page 6 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Mnemonic Description Arithmetic Con rols Process Controls Trace Controls Faults nif om of 2 1 te Events Modes A Opcode Opcode Instruction Execution Mach Instruction Result Format Type Issue Latency cmpi Compare Integer srcl src2 reg lit sfr reg lit sfr if srel lt src2 AC cc lt 100 else if src2 src2 AC cc lt 010 else AC cc lt 001 SA 1 REG R 0 5 1 1 cmpibe Compare Integer and Branch If Equal src sre2 targ reg lit sfr reg if sre1 lt sre2 lt 100 else if src src2 AC cc lt 010 IP targ else AC cc lt 001 IB IB 3A COBR
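The compare entries in item 334 all follow the same pattern for the condition code: 100 when src1 is less than src2, 010 when they are equal, and 001 otherwise. The Python sketch below models that rule and the decrement step of cmpdeco. The function names are illustrative, and the 32-bit wraparound of the decrement is an assumption made for the model.

```python
MASK32 = 0xFFFFFFFF


def compare_cc(src1: int, src2: int) -> int:
    """Condition code as listed in the quick reference:
    100 if src1 < src2, 010 if src1 == src2, 001 otherwise."""
    if src1 < src2:
        return 0b100
    if src1 == src2:
        return 0b010
    return 0b001


def cmpdeco(src1: int, src2: int) -> tuple[int, int]:
    """Compare ordinals, then dst <- src2 - 1.
    Assumption: the decrement wraps modulo 2**32."""
    cc = compare_cc(src1 & MASK32, src2 & MASK32)
    dst = (src2 - 1) & MASK32
    return cc, dst


print(cmpdeco(5, 0))
```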
335. eived from the system, i.e., one each clock. If the fetch request is the result of a prefetch decision, the IS is not stalled unless it needs an instruction from the prefetch request. If the processor is executing straight-line code which always misses the cache, the IS is only able to issue instructions at a one-instruction-per-clock rate. It is never able to see multiple instructions in one clock. The bus bandwidth of the memory subsystem containing the code limits the application's performance. [Figure A-18. Fetch Execution: a cache miss on an addo/subo pair, showing the instruction scheduler, CTRL pipeline, BCU pipeline, external address and data buses, and EU pipeline.] A.2.5.4 Cache Replacement. Data fetched as a result of a cache miss is written to the cache when, and if, the fetched data is requested by the IS. This optimization keeps unexecuted prefetched data from taking up valuable cache space. As the fetches come in from the BCU, the fetch unit stores incomplete fetch blocks in a queue. If the IS requests one or more instructions which are
336. [Figure 12-7. Interrupt Mapping (IMAP0-IMAP2) Registers; reserved bits: initialize to 0.] [pg. 12-13, INTERRUPT CONTROLLER] 12.3.6 Interrupt Mask and Pending Registers (IMSK, IPND). The IMSK and IPND registers (Figure 12-8) are special function registers sf1 and sf0, respectively. Bits 0 through 7 of these registers are associated with the external interrupt pins (XINT7:0) and bits 8 through 11 are associated with the DMA interrupt inputs (DMA3:0). Bits 12 through 31 are reserved and should be set to 0 at initialization. The IPND register posts dedicated-mode interrupts originating from the eight external dedicated sources (when configured in dedicated mode) and the four DMA sources. Asserting one of these inputs causes a 1 to be latched into its associated bit in the IPND register. In expanded mode, bits 0 through 7 of this register are not used and should not be modified; in mixed mode, bits 0 through 4 are not used and should not be modified. The IMSK register provides a mechanism for masking individual bits in the IPND register. An interrupt source is disabled if its associated mask bit is set to 0. IMSK register bit 0 has two functions: it masks interrupt pin XINT0 in the dedicated mode and it globally masks all expanded-mode interrupts in the expanded and mixed modes. In expanded mode, bits 1 through 7 are not used and should only contain zeros in
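The IMSK/IPND relationship in item 336 amounts to a bitwise AND over the twelve defined bits. The following Python sketch models dedicated mode only, with bits 0 to 7 for XINT7:0 and bits 8 to 11 for DMA3:0 as the text describes. The function name and the sample register values are hypothetical.

```python
# Bit positions as described in the text: 0..7 external pins, 8..11 DMA inputs.
SOURCES = [f"XINT{n}" for n in range(8)] + [f"DMA{n}" for n in range(4)]


def serviceable_sources(ipnd: int, imsk: int) -> list[str]:
    """Return the pending sources whose mask bit is 1 (a mask bit of 0
    disables the source). Reserved bits 12..31 are ignored."""
    active = ipnd & imsk & 0xFFF
    return [name for bit, name in enumerate(SOURCES) if active & (1 << bit)]


# Hypothetical register contents: XINT3 and DMA1 pending, only DMA1 unmasked.
ipnd = (1 << 3) | (1 << 9)
imsk = 1 << 9
print(serviceable_sources(ipnd, imsk))
```

Expanded and mixed modes reuse some of these bits differently, so this model should not be applied to them as written.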
337. elism of the processor and considerations for writing software which is portable among all members of the i960 microprocessor family. [INTRODUCTION] 1.4 NOTATION AND TERMINOLOGY. This section defines terminology and textual conventions that are used throughout the manual. 1.4.1 Reserved and Preserved. Certain fields in registers and data structures are described as being either reserved or preserved. A reserved field is one that may be used by other i960 architecture implementations. Correct treatment of reserved fields ensures software compatibility with other i960 processors. The processor uses these fields for temporary storage; as a result, the fields sometimes contain unusual values. A preserved field is one that the processor does not use. Software may use preserved fields for any function. Reserved fields in certain data structures should be set to 0 (zero) when the data structure is created. Set reserved fields to 0 when creating the Control Table, Initialization Boot Record, Interrupt Table, Fault Table, System Procedure Table and Process Control Block. Software should not modify or rely on these reserved field values after a data structure is created. When the processor creates the Interrupt or Fault Record data structure on the stack, software should not depend on the value of the reserved fields within these data structures. Some bits or fields in data structures and registers are shown as requiring specific encoding. T
338. en affect this instruction's performance. sdma executes in 22 clocks. In the case of back-to-back sdma instructions, 40 clocks are required. udma requires 4 clocks. sysctl: Timings shown in Table A-19 assume a zero wait state memory system.
Table A-19. sysctl Performance
Message / Message Type / Issue Clocks
Request Interrupt / 00H / 37 + bus wait states
Invalidate Cache / 01H / 38
Configure Cache / 02H / 52 with 1 Kbyte cache enabled; 48 with 1 Kbyte cache disabled; 2078 + bus wait states with load and lock 1 Kbyte; 1103 + bus wait states with load and lock 512 bytes
Reinitialize / 03H / 243 + bus wait states
Load Control Register Group / 04H / 42 + bus wait states
A.2.7 Coding Optimizations. Embedded applications often benefit from hand-optimized interrupt handlers and critical primitives. This section reviews coding optimizations which arise due to the microarchitecture of the i960 Cx instruction set processor. The examples in this section are constructed to illustrate particular optimization tricks. In general, every example could be further optimized by applying several techniques instead of one. A.2.7.1 Loads and Stores. Separate load instructions from instructions that use load data. Remember that store instructions can also be reordered. Although it returns no results to a
339. ence information on the processor's instructions. It is arranged alphabetically by instruction or instruction group. [pg. 9-7, INSTRUCTION REFERENCE] 9.3.1 addc
Mnemonic: addc (Add Ordinal With Carry)
Format: addc src1, src2, dst (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Adds src2 and src1 values and condition code bit 1 (used here as a carry in) and stores the result in dst. If ordinal addition results in a carry, condition code bit 1 is set; otherwise, bit 1 is cleared. If integer addition results in an overflow, condition code bit 0 is set; otherwise, bit 0 is cleared. Regardless of addition results, condition code bit 2 is always set to 0. addc can be used for ordinal or integer arithmetic. addc does not distinguish between ordinal and integer source operands. Instead, the processor evaluates the result for both data types and sets condition code bits 0 and 1 accordingly. An integer overflow fault is never signaled with this instruction.
Action: dst ← src2 + src1 + AC.cc1; AC.cc ← 0CV, where C is carry from ordinal addition and V is 1 if integer addition would have generated an overflow.
Faults: Type Mismatch: non-supervisor reference of a sfr.
Example: # Example of double-precision arithmetic. # Assume 64-bit source operands in g0,g1 and g2,g3. cmpo 1, 0 # clears Bit 1 (carry bit) of the AC.cc; addc g0, g2, g0 # add low order 32
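The double-precision use of addc in item 339 can be modeled in a few lines of Python. This is a sketch of the behavior stated in the description (condition code bit 1 as carry, bit 0 as integer overflow, bit 2 always 0), not a description of the hardware. The helper names `addc` and `add64` are illustrative.

```python
MASK32 = 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def addc(src1: int, src2: int, carry_in: int) -> tuple[int, int]:
    """Return (dst, cc). cc bit 1 = carry out of the ordinal addition,
    cc bit 0 = integer overflow, cc bit 2 = 0."""
    total = (src1 & MASK32) + (src2 & MASK32) + carry_in
    carry = total >> 32
    signed_sum = as_signed32(src1 & MASK32) + as_signed32(src2 & MASK32) + carry_in
    overflow = int(not -(1 << 31) <= signed_sum <= (1 << 31) - 1)
    return total & MASK32, (carry << 1) | overflow


def add64(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int, int]:
    """Add two 64-bit values held as (low, high) word pairs. The carry starts
    cleared, which is what 'cmpo 1, 0' achieves in the manual's example."""
    lo, cc = addc(a_lo, b_lo, 0)
    hi, cc = addc(a_hi, b_hi, (cc >> 1) & 1)
    return lo, hi, cc


print(add64(0xFFFFFFFF, 0, 1, 0))
```

The carry from the low-order addition propagates into the high-order addition through condition code bit 1, which is why the carry must be cleared before the first addc.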
340. er (NPTR) field in the chaining descriptor. NPTR contains the chaining descriptor address which describes the next part of the chained DMA operation. DMA operation ends when a NPTR of 0 (null pointer) is encountered. [Figure 13-6. DMA Chaining Operation: an internal register holds the first descriptor pointer; each linked descriptor in memory holds Byte Count (BC), Source Address (SA), Destination Address (DA) and Next Descriptor Pointer (NPTR) for one buffer transfer; the Nth descriptor's NPTR is 0H (null pointer).] [pg. 13-14, DMA CONTROLLER] A chained DMA operation is started by specifying a pointer to the first chaining descriptor when sdma is used to configure the DMA channel. Initial source address, destination address and byte count are taken from the first chaining descriptor. Chained DMAs are configured such that subsequent buffer transfers use either source, destination or both of these addresses to continue the chained DMA. These modes are referred to as source chaining, destination chaining or source/destination chaining. For example, if a channel is configured for source chaining (Figure 13-7), the source address for the DMA operation is updated to the value specified in each new descriptor. The destination address is continually incremented from the address specified in the DA field of the first descriptor or is held fixed at that addre
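The linked-descriptor structure in item 340 behaves like a singly linked list terminated by a null NPTR. The Python sketch below walks such a chain in source/destination chaining style, where every descriptor supplies both addresses. The class name, the dictionary standing in for memory, and the sample addresses are all hypothetical; the loop check is a safety measure for the model and is not something the text attributes to the hardware.

```python
from dataclasses import dataclass


@dataclass
class ChainDescriptor:
    byte_count: int   # BC
    source: int       # SA
    destination: int  # DA
    next_ptr: int     # NPTR; 0 is the null pointer that ends the chain


def walk_chain(memory: dict[int, ChainDescriptor], first_ptr: int) -> list[tuple[int, int, int]]:
    """Return (source, destination, byte_count) for each buffer transfer,
    following NPTR until a null pointer is reached."""
    transfers = []
    visited = set()
    ptr = first_ptr
    while ptr != 0:
        if ptr in visited:
            raise ValueError("descriptor chain loops back on itself")
        visited.add(ptr)
        descriptor = memory[ptr]
        transfers.append((descriptor.source, descriptor.destination, descriptor.byte_count))
        ptr = descriptor.next_ptr
    return transfers


# Hypothetical two-descriptor chain.
memory = {
    0x100: ChainDescriptor(64, 0x2000, 0x8000, 0x120),
    0x120: ChainDescriptor(32, 0x3000, 0x9000, 0),
}
print(walk_chain(memory, 0x100))
```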
341. er 272456 for actual programmable logic equations. B.1 NON-PIPELINED BURST SRAM INTERFACE. This appendix uses a simple SRAM design to demonstrate how the bus and control signals are used. The design also demonstrates the internal wait state generator. The basic SRAM interface provides the basic information needed to design most I/O and memory interfaces. The design supports burst and non-burst bus accesses. The SRAM interface is important for shared memory systems; variations can be used to communicate with external memory-mapped peripherals. B.1.1 Background. SRAM devices are available in a wide variety of packages and densities. SRAM address pins are always dedicated as inputs. Data pins may be configured in two ways: (a) each pin can be dedicated as an input or an output; (b) a set of data pins may be used for both data in and data out. Control signals usually found on SRAM include Chip Enable (CE), Output Enable (OE) and Write Enable (WE). The following example deals with a SRAM that has CE, OE and WE control signals, address inputs and data input/output pins. Memory is read when CE and OE are asserted and WE is not asserted. Memory is written when CE and WE are asserted. The OE input becomes "don't care" when WE is asserted. However, it is recommended that OE is not asserted at the beginning or end of a write cycle; this can lead to bus contention. B.1.2 Implementation. Figure B-1 illustrates a 32-bit burst access SRAM interface. The design
342. er 13-1
13.2 DEMAND AND BLOCK MODE 13-2
13.3 SOURCE AND DESTINATION ADDRESSING 13-3
13.4 TRANSFERS 13-3
13.4.1 Multi-Cycle Transfers 13-3
13.4.2 Fly-By Single-Cycle Transfers 13-5
13.4.3 Source/Destination Request 13-6
13.4.4 Assembly and Disassembly 13-9
13.4.5 Data Alignment 13-10
13.5 DATA CHAINING 13-13
13.6 DMA SOURCED 13-16
13.7 SYNCHRONIZING PROGRAM CHAINED BUFFER TRANSFERS 13-17
13.8 TERMINATING A DMA 13-18
13.9 CHANNEL PRIORITY 13-20
13.10 CHANNEL SETUP STATUS AND 13-20
13.10.1 DMA Command Register 13-21
13.10.2 Set Up DMA Instruction (sdma) 13-24
13.10.3 13-25
13.10.4 DMA Data 13-27
13.10.5 Channel Setup Examples 13-29
13.11 DMA EXTERNAL 13-30
xi
343. er to section 7.10, FAULT REFERENCE (pg. 7-20) to determine which faults are precise. 7.9.3 Asynchronous Faults. Asynchronous faults are those whose occurrence has no direct relationship to the instruction pointer. The i960 architecture does not define any faults in this category. 7.9.4 No Imprecise Faults (NIF) Bit. The NIF bit controls imprecise fault generation. When this bit is set, all faults generated are precise. This means the following conditions hold true: (a) All faults are generated in order. (b) A precise fault record is provided for each fault: the faulting instruction address is correct and the RIP provides a valid re-entry point into the program. [pg. 7-18, FAULTS] When the NIF bit is clear, imprecise faults are allowed to be generated in parallel, out of order and with an imprecise RIP. Here, the following conditions hold true: (a) When an imprecise fault occurs, the faulting instruction address in the fault record is valid, but the saved IP is unpredictable. (b) If instructions are executed out of order and parallel faults occur, recovery from some faults may not be possible because the faulting instruction's source operands may be modified when subsequent instructions are executed out of order. 7.9.5 Controlling Fault Precision. The syncf instruction forces the processor to complete execution of all instructions that occur prior to syncf and to generate all faults before it begins work on instructions that occur a
344. erate interrupt signals to the processor. These specifications must also be used to calculate the minimum signal width, as shown in Figure 12-5. [Figure 12-5. Interrupt Sampling: PCLK and XINT7:0 waveforms with sampling clock edges marked; interrupt pins are sampled one time for every 2 PCLK cycles.] 12.3.3 Programmer's Interface. The programmer's interface to the interrupt controller is through four control registers and two special function registers, all described in this section: ICON control register, IMAP0-IMAP2 control registers, IMSK special function register (sf1) and IPND special function register (sf0). 12.3.4 Interrupt Control Register (ICON). The ICON register (Figure 12-6) is a 32-bit control register that sets up the interrupt controller. Software can load this register using the sysctl instruction. The ICON register is also automatically loaded at initialization from the control table in external memory.
Interrupt Mode (ICON.im): 00 = dedicated; 01 = expanded; 10 = mixed; 11 = reserved
Signal Detection Mode (ICON.sdm): 0 = level-low activated; 1 = falling-edge activated
Global Interrupts Enable (ICON.gie): 0 = enabled; 1 = disabled
Mask Operation (ICON.mo): 00 = move to R3 mas
345. errupt, it uses a vector number that accompanies the interrupt request to locate the vector entry in the interrupt table. From that entry, it gets an address to the first instruction of the selected interrupt procedure. The processor then makes an implicit call to that procedure. When the interrupt call is made, the processor uses a dedicated interrupt stack. A new frame is created for the interrupt on this stack and a new set of local registers is allocated to the interrupt procedure. The interrupted program's current state is also saved. Upon return from the interrupt procedure, the processor restores the interrupted program's state, switches back to the stack that the processor was using prior to the interrupt and resumes program execution. Since interrupts are handled based on priority, requested interrupts are often saved for later service rather than being handled immediately. The mechanism for saving the interrupt is referred to as interrupt posting. The mechanism the i960 Cx processors use for posting interrupts is described in section 12.2, MANAGING INTERRUPT REQUESTS (pg. 12-2). [pg. 6-1, INTERRUPTS] On the i960 Cx processors, interrupt requests may originate from external hardware sources, internal DMA sources or from software. External interrupts are detected with the chip's 8-bit interrupt port and with a dedicated NMI input. Interrupt requests originate from software by the sysctl instruction, which signals interrupts. To mana
346. ers avoids numerous cache fills and spills for most applications and does not use any of the data RAM, which is available for general data storage. It is recommended to configure the cache for a minimum of five register sets. 5.2.5 Mapping Local Registers to the Procedure Stack. Each local register set is mapped to a register save area of its respective frame in the procedure stack (Figure 5-1). Saved local register sets are frequently cached on-chip rather than saved to memory. This caching is performed non-transparently. Local register set contents are not saved automatically to the save area in memory when the register set is cached. This would cause a significant performance loss for call operations. Also, no automatic update policy is implemented for the register cache. If the register save area in memory for a cached register set is modified, there is no guarantee that the modification will be reflected when the register set is restored. The set must be written (or flushed) to memory, because of a frame spill, prior to the modification for the modification to be valid. flushreg causes the contents of all cached local register sets to be written (flushed) to their associated stack frames in memory. The register cache is then invalidated, meaning that all flushed register sets are restored from their save areas in memory. The current set of local registers is not written to memory. flushreg is commonly used in debuggers or fault handlers to gain access to
347. ertain interrupt procedure pointers and to the pending interrupt information without having to make memory accesses. The microprocessor caches the following: (a) The value of the highest priority posted in the pending priorities field. (b) A predefined subset of interrupt procedure pointers (entries from the interrupt table). (c) Pending interrupts received from external interrupt pins and on-chip DMA controller (hardware-requested interrupts). This caching mechanism is non-transparent; in other words, the processor may modify fields in a cached interrupt table without modifying the same fields in the interrupt table itself. Vector caching is described in section 12.3.12, Vector Caching Option (pg. 12-20). 6.5 REQUESTING INTERRUPTS. On the i960 Cx microprocessors, interrupt requests may originate from external hardware sources, internal DMA sources or from software. External interrupts are detected with the chip's 8-bit interrupt port and with a dedicated NMI input. Interrupt requests originate from software by the sysctl instruction, which signals interrupts. To manage and prioritize all possible interrupts, the microprocessor integrates an on-chip programmable interrupt controller. The configuration and operation of the integrated interrupt controller is described in section 12.2, MANAGING INTERRUPT REQUESTS (pg. 12-2). Interrupts may be requested directly by a user's program. This mechanism is often useful for requ
348. ervice an interrupt depends on the processor state when the interrupt is received. If the processor is executing a background task when an interrupt request is to be serviced, the interrupt context switch must change stacks to the interrupt stack. This is called an executing-state interrupt. If the processor is already executing an interrupt handler, no stack switch is required, since the interrupt stack will already be in use. This is called an interrupted-state interrupt. The following subsections describe interrupt handling actions for executing-state and interrupted-state interrupts. In both cases, it is assumed that the interrupt priority is higher than that of the processor and thus is serviced immediately when the processor receives it. 6.9.1 Executing-State Interrupt. When the processor receives an interrupt while in the executing state (i.e., executing a program), it performs the following actions to service the interrupt. This procedure is the same regardless of whether the processor is in user or supervisor mode when the interrupt occurs. The processor: 1. switches to the interrupt stack (as shown in Figure 6-3). The interrupt stack pointer becomes the new stack pointer for the processor. 2. saves the current state of process controls and arithmetic controls in an interrupt record on the interrupt stack. The processor also saves the interrupt procedure pointer number. 3. allocates a new frame on the interrupt stack
349. es addressing, transfer type and DMA modes. A special function register, the DMA command register (DMAC), is an interface for commonly used command and status functions for each channel. Flexibility and a high degree of programmability for a DMA operation create a number of options for balancing DMA and processor performance and DMA latency. This flexibility enables the programmer to select the best DMA configuration for a particular application. 13.2 DEMAND AND BLOCK MODE DMA. A channel can be configured as a demand-mode or block-mode DMA channel. Demand-mode DMAs move data between memory and an external I/O device; block-mode DMAs typically move blocks of data from memory to memory. When a channel is configured for demand mode, an external device requests a DMA transfer with a DMA request input (DREQ3:0). The DMA controller acknowledges the requesting device with a DMA acknowledge signal (DACK3:0). The DACK3:0 signal is asserted during the bus request which the DMA controller makes to the requesting device. Specific DREQ3:0 and DACK3:0 signal relationships are described in section 13.11, DMA EXTERNAL INTERFACE (pg. 13-30). After a DMA channel is configured, the channel must be enabled by software through the DMA command register (DMAC). The DMA operation continues until it: (a) is terminated by an external source with EOP; (b) is suspended by software; (c) ends because of a zero byte count. An interrupt may be generated to detect any of the
350. es bus requests. In demand-mode transfers, DREQ3:0 is asserted to request a DMA transfer. DACK3:0 is asserted during the bus request issued in response to the DMA request. Continuing the example started above, if the DMA controller is set up for source-synchronized demand mode, DREQ3:0 causes a word (ld) request to be issued when source request length equals word and causes a byte (ldob) request to be issued when the source request length equals byte. DACK3:0 is asserted for the duration of the bus request for each case. For demand-mode transfers, the request length is typically selected to match the external bus width of the external DMA device. If request length is greater than bus width, the DMA device must be designed to support multiple data cycles for each DMA transfer requested. This may be accomplished by using a small FIFO and an external circuit to load and unload the FIFO. This method reduces bus loading by the DMA process. For block-mode transfers, source and destination request lengths are typically selected to match external data bus width. This configuration uses the external bus most efficiently and also reduces latency for bus requests issued by the user process. In instances where source and destination bus widths are different, DMA performance may be increased by setting up the DMA with matching source and destination request lengths. This configuration reduces DMA microcode overhead required to pack or unpack data between unequa
351. es the instruction and calculates the next instruction address. This could be a macro- or micro-instruction address. It is either the next sequential address or the target of a branch. For conditional branches, the IS uses condition codes or internal hardware flags to determine which way to branch. If branch conditions are not valid when the IS sees a branch, the processor guesses the branch direction using the branch prediction specified in the instruction. If the guess was wrong, the IS cancels the instructions on the wrong path and begins fetching along the correct path. In the issue stage, instructions are emitted, or issued, to the rest of the machine via the machine bus. The machine bus consists of three parts: REG format instruction portion, MEM format instruction portion and CTRL format portion. Each part of the machine bus goes to the coprocessor that executes the appropriate instruction. The RF supplies operands and stores results for REG and MEM format instructions. For this reason, the RF is connected to both the REG and MEM portions of the machine bus. The CTRL portion stays within the instruction sequencer, since it directly executes the branch operations. Several events occur when an instruction is issued: 1. The information is driven onto the machine bus. 2. IS reads the source operands and checks that all resources needed to execute the instruction are available. 3. The instruction is cancelled if any resource that the instructio
352. escription cc cc cc Opcode Mach Instruction Result nif om of 2 1 0 p tfp em te Events Modes T Y Format Type issue Latency Call all targ 9 disp RIP next IP temp lt sp 0x10 and not Oxf memory lt 10 15 acces IC 09 CTRL u 4 spil 4 spill in local PFP PFP rt lt 000 FP lt temp SP lt FP 64 IP lt targ Call System cells 38 564 38 56 z ICS ICS U L M 660 REG u p if src gt 259 Protection length fault spill spill if local call or PC em 1 Perform Local Call using SPT else Perform Supervisor Call using SPT Call Extended callx larg mem lt next IP OP temp lt sp 0x10 and not Oxf 7 94 7 94 memory fp lt 10 15 accesses are cached ES IC IC U gt 86 u local register cache OC spill spill lt PFP rt lt 000 FP lt temp SP lt fp 64 IP lt targ Check Bit hkbi bitpos src chkbit reg lit sfr reg lit sfr 1 0 2 1 0 I I U M 5 REG R 05 1 1 if src and 2 bitpos mod 32 0 AC ccl lt 0 else AC ccl lt 1 Clear Bit Irbi bitpos sre dst rows 1 1 U I IM S amp C REG R 05 1 1 dst lt src and not 2 bitpos mod 32 Compare and Decrement Integer m i src src2 dst reg lit sfr reg lit sfr r
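The chkbit and clrbit entries in item 352 both select a single bit with 2^(bitpos mod 32). The following Python sketch models the two result rules as they can be read from the entry: chkbit reports whether the selected bit is set (the value written to condition code bit 1), and clrbit computes dst ← src and not 2^(bitpos mod 32). The function names and sample values are illustrative.

```python
MASK32 = 0xFFFFFFFF


def bit_mask(bitpos: int) -> int:
    """Single-bit mask 2**(bitpos mod 32), as in the quick-reference entries."""
    return 1 << (bitpos % 32)


def chkbit(bitpos: int, src: int) -> int:
    """Return the value for condition code bit 1: 0 if the selected bit of
    src is clear, 1 if it is set."""
    return 0 if (src & bit_mask(bitpos)) == 0 else 1


def clrbit(bitpos: int, src: int) -> int:
    """Return src with the selected bit cleared, kept to 32 bits."""
    return src & ~bit_mask(bitpos) & MASK32


print(chkbit(3, 0b1000), hex(clrbit(3, 0xFF)))
```

Because the position is taken modulo 32, a bitpos of 35 selects the same bit as a bitpos of 3.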
353. ese fetches differ from actual instruction stream loads in two ways: load destination and load data buffering. First, the load destination of an instruction fetch is the instruction fetch buffer, not the register file. Since fetch data goes directly from the BCU to the instruction fetch buffer and IS, the scheduler can issue fetched instructions during the clock after they are read from external memory. Second, to reduce fetch latency, the BCU buffers fetch data differently than a regular load instruction. Instead of buffering four words of instructions before sending data to the fetch unit, the BCU sends each word as it is received over the bus. If the fetches are from 8- or 16-bit memory, the BCU collects 32 bits before sending the word to each fetch unit. Figure A-18 shows the execution of a two-word fetch that resulted from a cache miss. The fetch unit detects the cache miss at the end of the clock in which instructions would be issued had a hit occurred. The fetch unit issues the instruction fetch in the following clock. Assuming that the BCU is not busy with another operation, the request begins on the external bus in the next clock. The first word of the fetch is returned to the fetch unit in the clock in which it is received from the memory system; the IS attempts to issue the instruction to an execution unit in that same clock. The remaining words of a fetch are returned as they are rec
354. eshed in either a distributed or a burst manner. Burst refresh does not refer to the burst access bus. The term simply means that all memory rows are sequentially accessed when the refresh interval time expires. Distributed refresh implies that refresh cycles are distributed within the refresh interval required by the memory.
BUS INTERFACE EXAMPLES
Distributed refresh cycles are spread out over the refresh interval, reducing possible access latency. Burst refreshing may lock the processor out of the DRAM for a longer period of time; it may be inappropriate for some applications. Burst refreshing, however, guarantees that no refresh activity occurs between refresh intervals. Some applications may take advantage of this to burst refresh the DRAM during a time it will not be accessed, making refresh invisible to the application.
B.3.3 Address Multiplexer Input Connections
Address multiplexer inputs can be ordered such that 256 Kbyte through 4 Mbyte DRAM can be supported. Interleaving the upper address signals provides compatibility with all these memory densities. Figure B-16 illustrates this arrangement. Availability of DRAM modules with standard pinouts makes this an attractive way to ensure future memory expansion.
PROCESSOR ADDRESS
DRAM ADR | COL | ROW
0 | A2 | A11
1 | A3 | A12
2 | A4 | A13
3 | A5 | A14
4 | A6 | A15
5 | A7 | A16
6 | A8 | A17
7 | A9 | A18
8 | A10 | A19
9 | A20 | A21
10 | A22 | A23
Density markers shown in the figure: 256K, 1M.
Figure B-16. Address Multi
355. esign
INITIALIZATION AND SYSTEM REQUIREMENTS
14.4.7 Line Termination
Input voltage level violations are usually due to voltage spikes that raise input voltage levels above the maximum limit (overshoot) and below the minimum limit (undershoot). These voltage levels can cause excess current on input gates, resulting in permanent damage to the device. Even if no damage occurs, many devices are not guaranteed to function as specified if input voltage levels are exceeded. Signal lines are terminated to minimize signal reflections and prevent overshoot and undershoot. Terminate the line if the round-trip signal path delay is greater than signal rise or fall time. If the line is not terminated, the signal reaches its high or low level before reflections have time to dissipate, and overshoot or undershoot occurs. For the i960 Cx processor, two termination methods are attractive: AC and series. An AC termination damps the signal at the end of the line; series termination compensates for excess current before the signal travels down the line. Series termination decreases current flow in the signal path by adding a series resistor, as shown in Figure 14-7. The resistor increases signal rise and fall times so that the change in current occurs over a longer period of time. Because the amount of voltage overshoot and undershoot depends on the change in current over time (V = L di/dt), the increased time reduces overshoot and undershoot. Place t
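The termination rule and the V = L di/dt relationship above can be checked numerically. The sketch below is illustrative only: the function names and the sample inductance, current and timing values are assumptions, not figures from the manual.

```python
def needs_termination(round_trip_delay_ns, edge_time_ns):
    """Rule from the text: terminate when the round-trip signal path delay
    is greater than the signal rise or fall time."""
    return round_trip_delay_ns > edge_time_ns


def induced_voltage(inductance_h, delta_current_a, delta_time_s):
    """V = L * di/dt, with di/dt approximated as delta_current / delta_time."""
    return inductance_h * delta_current_a / delta_time_s


# Hypothetical values: 10 nH of line inductance, 20 mA current change.
fast_edge = induced_voltage(10e-9, 0.02, 2e-9)   # 2 ns edge
slow_edge = induced_voltage(10e-9, 0.02, 4e-9)   # series resistor slows edge to 4 ns
```

With these assumed numbers, doubling the edge time halves the induced voltage, which is the effect the series resistor is relied on to produce.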
356. ess specified with src and stores it in dst. The src address is not checked for validity. Any addressing mode may be used to calculate efa. An important application of this instruction is to load a constant longer than 5 bits into a register. (To load a register with a constant of 5 bits or less, mov can be used with a literal as the src operand.)
Action: dst <- efa(src)
Faults: Operation. Operand: Invalid operand value encountered. Opcode: Invalid opcode encoding encountered.
Example: lda 58(g9), g1 # g1 <- g9 + 58
lda 0x749, r8 # r8 <- 0x749
Opcode: lda 8CH MEM
INSTRUCTION SET REFERENCE
9.3.31 mark
Mnemonic: mark (Mark)
Format: mark
Description: Generates a breakpoint trace event if breakpoint trace mode is enabled. Breakpoint trace mode is enabled if the PC register trace enable bit (bit 0) and the TC register breakpoint trace mode bit (bit 7) are set. When a breakpoint trace event is detected, the PC register trace fault pending flag (bit 10) and the TC register breakpoint trace event flag (bit 23) are set. Then, before the next instruction is executed, a breakpoint trace fault is generated. If breakpoint trace mode is not enabled, mark behaves like a no-op. For more information on trace fault generation, refer to CHAPTER 8, TRACING AND DEBUGGING.
Action: if (PC.te = 1) and (TC.br = 1): PC.tfp <- 1; TC.bte <- 1
Faults: Trace: Breakpoint trace fault. Trace: Instruction Breakpoint if
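The choice between mov with a literal and lda for loading a constant can be expressed as a small selection rule. This Python sketch only formats the assembly text; the helper name is hypothetical, and the 0 to 31 range is derived from the 5-bit literal size stated above.

```python
def load_constant(value, reg):
    """Pick an instruction for loading a non-negative constant into a register.

    Constants that fit in 5 bits (0..31) can be encoded as a literal with mov;
    longer constants use lda, as described in the text.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("sketch only handles 32-bit unsigned constants")
    if value < 32:  # fits in a 5-bit literal
        return f"mov {value}, {reg}"
    return f"lda {value:#x}, {reg}"
```

For instance, load_constant(0x749, "r8") yields the same lda form used in the manual's example.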
357. essors can be configured to detect as many as seven different trace events, including breakpoints, branches, calls, supervisor calls, returns, prereturns and the execution of each instruction (for single-stepping through a program). The processors also provide four breakpoint registers that allow break decisions to be made based upon instruction or data addresses.
1.2 SYSTEM INTEGRATION
The i960 Cx microprocessors are based on the C-series core, which is object code compatible with the 32-bit i960 microprocessor core architecture. Additionally, the i960 Cx devices integrate three peripherals around the core: bus control unit, DMA controller and interrupt controller.
1.2.1 Pipelined Burst Bus Control Unit
The i960 Cx processors integrate a 32-bit high-performance bus controller for interfacing to external memory and peripherals. The bus control unit incorporates full wait state logic and bus width control to provide high system performance with minimal system design complexity. The bus control unit features a maximum transfer rate of 132 Mbytes per second (at 33 MHz). Internally programmable wait states and 16 separately configurable memory regions allow the processor to interface with a variety of memory subsystems with minimum complexity and maximum performance.
1.2.2 Flexible DMA Controller
A four-channel DMA controller provides high-speed DMA data transfers. Source and destination can be any combination of internal RAM, external memory or per
358. est for the channel is serviced and all bus activity for that request is complete. A channel's terminal count flag must be cleared by software before the DMA channel is enabled. This is because the DMA controller does not explicitly clear the terminal count flags after a DMA has completed; this action must be performed by software. The terminal count flags indicate status only. Modifying these bits by software has no effect on a DMA operation. The channel active flags (bits 11-8) indicate that a channel is either idle (0) or active (1). Bits 8 through 11 indicate active channels 0 through 3, respectively. For demand mode, the active bit is set when the DMA request is recognized by internal hardware and remains set until all bus activity for that request is complete. In block mode, the channel active bit remains set for the duration of the block mode DMA. Channel active flags indicate status only. These flags cannot be modified by software; attempts to modify these bits by software have no effect on a DMA operation. The channel done flags (bits 15-12) indicate that a channel's DMA has finished. Bits 12 through 15 indicate a completed DMA on channels 0 through 3, respectively. The DMA controller sets a channel done flag when a DMA operation has finished in one of three ways:
• byte count reached zero in non-chaining mode
• null pointer reached in a chaining mode
• EOP3:0 signal is asserted, which ends the DMA operation
DMA controller channel do
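The bit assignments described above for the channel active flags (bits 11-8) and channel done flags (bits 15-12) can be unpacked as follows. This is a minimal sketch: the function name and the returned dictionary layout are assumptions, and the terminal count flag positions are left out because this excerpt does not state them.

```python
def decode_dma_status(word):
    """Split a DMA status word into per-channel active and done flags.

    Bits 8..11 -> channels 0..3 active; bits 12..15 -> channels 0..3 done.
    """
    active = [bool((word >> (8 + ch)) & 1) for ch in range(4)]
    done = [bool((word >> (12 + ch)) & 1) for ch in range(4)]
    return {"active": active, "done": done}


# Hypothetical status value: channel 1 active, channel 0 done.
status = decode_dma_status((1 << 9) | (1 << 12))
```

Since the text says these flags indicate status only, a decoder like this would be read-only by nature.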
359. esting and prioritizing low-level tasks in a real-time application. Software can request interrupts in the following two ways: with the sysctl instruction, or by posting an interrupt in the interrupt table's pending interrupts and pending priorities fields.
6.5.1 Posting Interrupts
For the i960 Cx processors, only software-requested interrupts are posted in the interrupt table; hardware-requested interrupts are posted in the interrupt pending (IPND) register. This register and the mechanism for requesting and posting hardware interrupts are described in section 12.3.6, Interrupt Mask and Pending Registers (IMSK, IPND) (pg. 12-14). Software posting of interrupts in the interrupt table can assist an application in prioritizing processing demands as follows:
• By posting interrupt requests in the interrupt table, the application can delay the servicing of low-priority tasks which were signaled by a higher-priority interrupt.
In systems with more than one processor, both processors can post and service interrupts from a shared interrupt table. This interrupt table sharing allows processors to share the interrupt-handling load or provide a communication mechanism between the processors. To post a pending interrupt in the memory-resident interrupt table, the processor performs an atomic read/write operation that locks the interrupt table until the posting operation has completed (see Example 6-1).
INTERRUPTS
Example 6-1. At
360. et can be categorized into the following functional groups:
• Data Movement
• Arithmetic (Ordinal and Integer)
• Logical
• Bit, Bit Field and Byte
• Comparison
• Branch
• Call/Return
• Fault
• Debug
• Atomic
• Processor Management
Table 4-1 shows the instructions in these groups. The actual number of instructions is greater than those shown in this list because, for some operations, several unique instructions are provided to handle various operand sizes, data types or branch conditions. The following sections briefly overview the instructions in each group.
Table 4-1. i960 Cx Microprocessor Instruction Set Summary
Data Movement: Load; Store; Move; Load Address
Arithmetic: Add; Subtract; Multiply; Divide; Add with carry; Subtract with carry; Extended Multiply; Extended Divide; Remainder; Modulo; Shift; Extended Shift; Rotate
Logical: AND; NOT AND; AND NOT; OR; Exclusive OR; NOT OR; OR NOT; NOT; Exclusive NOR; NOR; NAND
Bit / Bit Field / Byte: Set Bit; Clear Bit; Not Bit; Alter Bit; Scan For Bit; Span Over Bit; Extract; Modify; Scan Byte For Equal
Comparison: Compare; Conditional Compare; Check Bit; Compare and Increment; Compare and Decrement; Test Condition Code
Branch: Unconditional Branch; Conditional Branch; Compare and Branch
Call/Return: Call; Call Extended; Call System; Return; Branch and Link
Fault: Conditional Fault; Synchronize Faults
Debug: Modify Trace Controls; Mark; Force Mark
Processor:
Atomic: Atomic Add; Atomic Mod
361. etic Controls Register fields:
AC.cc: Condition Code flags (AC.cc2:0)
AC.cc0: Condition Code Bit 0
AC.cc1: Condition Code Bit 1
AC.cc2: Condition Code Bit 2
AC.nif: No Imprecise Faults flag
AC.of: Integer Overflow flag
AC.om: Integer Overflow Mask Bit
Process Controls Register fields:
PC.em: Execution Mode flag
PC.s: State Flag
PC.tfp: Trace Fault Pending flag
PC.p: Priority Field (PC.p5:0)
PC.te: Trace Enable Bit
Trace Controls Register fields:
TC.i: Instruction Trace Mode Bit
TC.c: Call Trace Mode Bit
TC.p: Pre-return Trace Mode Bit
TC.br: Breakpoint Trace Mode Bit
TC.b: Branch Trace Mode Bit
TC.r: Return Trace Mode Bit
TC.s: Supervisor Trace Mode Bit
TC.if: Instruction Trace Event flag
TC.cf: Call Trace Event flag
TC.pf: Pre-return Trace Event flag
TC.brf: Breakpoint Trace Event flag
TC.bf: Branch Trace Event flag
TC.rf: Return Trace Event flag
TC.sf: Supervisor Trace Event flag
PFP: Previous Frame Pointer (r0)
PFP.add: Address (PFP.add31:4)
PFP.rt: Return Type Field (PFP.rt2:0)
PFP.p: Pre-return Trace flag
sp: Stack Pointer (r1)
fp: Frame Pointer (g15)
rip: Return Instruction Pointer (r2)
SPT: System Procedure Table
SPT.base: Supervisor Stack Base Address
SPT(targ): Address of SPT Entry targ
Table 9-2. Pseudo-code Symbol Definitions
<- : Assignment
=, != : Comparison: equal, not equal
<, > : less than, greater than
<=, >= : less than or equal to, g
362. fa For all other references efa clocks Addressing Mode clocks Addressing Mode 0 offset 0 offset 0 disp 0 disp 0 reg 0 reg 0 offset reg 1 offset reg 0 disp reg 1 disp reg 0 disp reg scale 1 disp reg scale 1 reg reg scale 2 reg reg scale 1 disp reg reg scale 2 disp reg reg scale 3 disp 8 IP 4 disp 8 IP
bus: The time necessary to perform the external memory operations associated with the instruction. The additive factor bus equals 0 when memory operations associated with the instruction are in the on-chip data RAM. Bus is also equal to zero for branches and calls where the target is in the instruction cache.
spill: The time required to write one cached register set to its reserved frame on the stack. Although spill is a function of bus, spill equals 36 when the stack is in external zero-wait-state memory.
fill: The time required to read one register set from the previous stack frame. Although fill is a function of bus, fill equals 36 when the stack is in external zero-wait-state memory.
frames: The number of register sets flushed to memory.
fixup: When the shrdi instruction concludes, a four-clock micro-flow executes if any bits shifted out were set and the source operand was negative. Fixup is four clocks for this case. Fixup is zero clocks for positive operands and for negative operands in which only zeros are shifted out.
March 1994 Page 3 of 18 Order Number 272220-002 i960 Cx Microprocessor User's Guide
363. fewer restrictions: any section of code can be locked into half of the instruction cache, not just the interrupt procedures. Executing sysctl with a message type of 02H selects cache mode. One of four cache modes is selected with the configure instruction cache message:
• normal cache
• load and lock half the cache
• load and lock entire cache
• cache disabled
The sysctl field 1 value determines which configure cache operation is performed (see Table 4-4). Field 3 is a word-aligned 32-bit address when a load and lock mode is selected; otherwise this field is ignored. Text following the table further defines the modes.
INSTRUCTION SET SUMMARY
Table 4-4. Cache Configuration Modes
Mode Field | Mode Description | 80960CA | 80960CF
000₂ | normal cache enabled | 1 Kbyte | 4 Kbytes
XX1₂ | full cache disabled | 1 Kbyte | 4 Kbytes
100₂ | Load and lock full cache (execute off-chip) | 1 Kbyte | 4 Kbytes
110₂ | Load and lock half the cache; remainder is normal cache enabled | 512 bytes | 2 Kbytes
010₂ | Reserved | 1 Kbyte | 4 Kbytes
NOTES:
1. On the CA, only interrupt procedures can execute in the locked portion of the cache.
2. On the CF, interrupt procedures and other code can operate in the locked portion of the cache.
Mode 000 configures the cache as two-way set associative. Mode XX1 completely disables the cache. Either of these cache configurations can be specified when the processor initializes b
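The mode field values in Table 4-4 can be decoded with a few comparisons. The sketch below is a hedged illustration in Python: the function name is hypothetical, and masking field 1 down to its low three bits is an assumption made for the sketch, not something the excerpt states.

```python
def decode_cache_mode(field1):
    """Map a sysctl configure-instruction-cache mode field to its description.

    Assumption: only the low three bits of field 1 carry the mode.
    """
    mode = field1 & 0b111
    if mode & 0b001:           # XX1: any value with bit 0 set
        return "cache disabled"
    if mode == 0b000:
        return "normal cache enabled"
    if mode == 0b100:
        return "load and lock full cache"
    if mode == 0b110:
        return "load and lock half the cache"
    return "reserved"          # 010 in Table 4-4
```

Testing bit 0 first mirrors the XX1 row of the table, where the upper two bits are don't-care.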
364. fic priority when the processor is configured. The priority of all posted hardware interrupts is continually compared to the current process priority. Software interrupts are posted in the interrupt table in external memory. The highest priority posted in this table is also saved in an on-chip software priority register; this register is continually compared to the current process priority.
Servicing Interrupts
If the process priority falls below that of any posted interrupt, the interrupt is serviced. The comparator signals the core to begin a microcode sequence to perform the interrupt context switch and branch to the first instruction of the interrupt routine. Figure 12-9 illustrates interrupt controller function. For best performance, the interrupt flow for hardware interrupt sources is implemented entirely in hardware. The comparator only signals the core when a posted interrupt is a higher priority than the process priority. Because the comparator function is implemented in hardware, microcode cycles are never consumed unless an interrupt is serviced.
12.3.10 Interrupt Service Latency
The time required to perform an interrupt task switch is referred to as interrupt service latency. Latency is the time measured between activation of an interrupt source and execution of the first instruction for the accompanying interrupt-handling procedure. In the following discussion, interrupt service latency is derived in number of PCLK2:1 cycles. The es
365. for SDMA Instruction. Reserved, Initialize To 0.
Figure 13-11. DMA Control Word
DMA CONTROLLER
The transfer type field (bits 3-0) specifies the request length of bus requests issued by the DMA controller and selects between multi-cycle and fly-by transfers. The source/destination addressing bits (bits 4-5) determine if the source or destination address for a channel is held fixed (1) or incremented (0) during a DMA. Bit 5 controls the source address and bit 4 controls the destination address. The source addressing bit (bit 5) controls address increment and hold for fly-by transfers. The synchronization mode bit (bit 6) specifies that a multi-cycle demand mode transfer is synchronized with the source (0) or the destination (1). In fly-by mode, the bit specifies whether fly-by stores (0) or fly-by loads (1) are performed. Fly-by stores are source synchronized; fly-by loads are destination synchronized. For non-fly-by block mode transfers, this bit is ignored. The synchronization select bit (bit 7) determines whether a transfer is demand (1) or block mode (0). The EOP/TC select bit (bit 8) selects EOP/TC3:0 pin function. If the EOP/TC3:0 select bit is cleared (0), the pins are configured as end-of-process inputs (EOP3:0). If set (1), the pin is configured as a terminal count output (TC3:0). The following bits in the DMA control word control data chaining. If chaining mode is not used, the source desti
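The bit fields named above (bits 3-0 through bit 8) can be unpacked from a control word as shown in this minimal Python sketch. The function and key names are hypothetical, and the chaining-related bits are omitted because this excerpt breaks off before describing them.

```python
def decode_dma_control(word):
    """Unpack the low nine bits of a DMA control word as described in the text."""
    return {
        "transfer_type": word & 0xF,                 # bits 3-0
        "destination_hold": bool((word >> 4) & 1),   # bit 4: 1 = held fixed, 0 = incremented
        "source_hold": bool((word >> 5) & 1),        # bit 5: 1 = held fixed, 0 = incremented
        "sync_mode_bit": (word >> 6) & 1,            # bit 6: 0 = source, 1 = destination
        "demand_mode": bool((word >> 7) & 1),        # bit 7: 1 = demand, 0 = block
        "tc_output": bool((word >> 8) & 1),          # bit 8: 1 = TC output, 0 = EOP input
    }


# Hypothetical control word 0x1A5 = 1 1010 0101 in binary.
fields = decode_dma_control(0x1A5)
```

The sample value decodes to transfer type 5, source held fixed, destination incremented, source-synchronized demand mode, with the pin configured as a terminal count output.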
366. for each word in a line. The write policy is write-through and write-allocate. With respect to data accesses, on a region-by-region basis, external memory is configured as either cacheable or non-cacheable. A bit in the memory region table entry defines whether or not data accesses are cacheable. This makes it very easy to partition a system into non-cacheable regions (for I/O or shared data in a multiprocessor system) and cacheable regions (local system memory) with no external hardware logic. To maintain data cache coherency, the i960 CF processor implements a simple single-processor coherency mechanism. Also, by software control, the data cache can be globally enabled, globally disabled or globally invalidated. A data access is either:
• explicitly defined as cacheable or non-cacheable through the memory region table
• implicitly defined as non-cacheable by the nature of the access; all DMA accesses and atomic accesses (atmod, atadd) are implicitly defined as non-cacheable data accesses
The data cache indirectly supports unaligned accesses. Micro-flows break unaligned accesses into aligned accesses, which are cacheable or non-cacheable according to the same rules as aligned accesses. An unaligned access could be only partially in the data cache and be a combination of hits and misses. The data cache supports both big-endian and little-endian data types.
C.2.5 Data and Data Structure Alignment
The i960 architecture does not define how to handle
367. fset new j offset b next j new next i next i new next j subo r9 gO0 40 next j first mask row first mask row addo 1 41 41 ldob g0 r5 ldob g0 r5 addo 1 40 40 addo 1 90 40 ldob 00 r4 ldob g0 r4 addo 1 90 40 addo 1 40 40 shlo 1 r4 r4 lda r4 2 r4 addo x54 5 addo d BI ldob g0 r4 ldob g0 r4 addo EA dx5y sho addo Ed dX Dus SID addo r8 0 40 addo r8 0 40 second mask row second mask row ldob g0 r4 ldob g0 r4 A 50 intel INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION addo 1 g0 40 addo 1 90 40 shlo 1 r4 r4 addo 4 addo r4 x5 lda ed c 2 4 ldob 00 r4 ldob 00 r4 addo 1 90 40 addo 1 0 40 shlo 2 r4 r4 lda r4 4 r4 addo E5 addo r4 r5 r5 ldob 00 r4 ldob 00 r4 shlo 1 r4 r4 addo r8 90 40 addo r4 5 5 lda r4 2 addo r8 90 90 r4 r5 r5 third mask row third mask row ldob 00 r4 ldob 00 r4 addo 1 90 40 1 90 90 rA r5 5 addo r4 r5 r5 ldob 00 r4 ldob 00 r4 addo 1 40 40 addo 1 40 40 shlo 1 r4 r4 lda r4 2 r4 addo r4 r5 r5 addo r4 x5 x5 ldob g0 r4 ldob g0 r4 addo rA x5 ES addo r4 x5 345 shro A r5 r5 shro 4 r5 stob r5 91 cmpdeco2 r6 r6 addo 1 gl gl stob r5 gl subo r9 90 40 update pointers update pointers cmpdeco2 r6 r6 bg t new next i bg next i a
368. fter syncf. This instruction has two uses:
• forces faults to be precise when the NIF bit is clear
• ensures that all instructions are complete and all faults are generated in one block of code before executing another block of code
Compiled code should execute with the NIF bit clear, using syncf where necessary to ensure that faults occur in order. In this mode, imprecise faults are considered as catastrophic errors from which recovery is not needed. The NIF bit should be set if recovery from one or more imprecise faults is required. For example, the NIF bit should be set if a program needs to handle and recover from unmasked integer overflow faults and the fault-handling procedure cannot be closely coupled with the application to perform imprecise fault recovery.
FAULTS
7.10 FAULT REFERENCE
This section describes each fault type and subtype and gives detailed information about what is stored in the various fields of the fault record. The section is organized alphabetically by fault type. The following paragraphs describe the information that is provided for each fault type.
Headings: Fault Type and Subtype; Function; RIP; Program State Changes
Fault Type and Subtype: Gives the number which appears in the fault record fault type field when the fault is generated. The fault subtype section lists fault subtypes and the number associated with each fault subtype.
Function: Describes the purpose of fault type and fault subtype. It also des
369. full cache execute off chip 1 Kbyte 4 Kbytes 1105 Load and lock half the cache remainder is normal cache enabled 212 byles 2 Kbytes 0105 Reserved 1 Kbyte 4 Kbytes Action temp lt srcl tmpmessage lt temp and Oxf0 gt gt 8 switch tmpmessage case 0 Signal an Interrupt post_interrupt temp and Oxf break case 1 Invalidate the Instruction Cache invalidate_instruction_cache break case 2 Configure Instruction Cache tmptype lt src and Oxff if tmptype bitO 1 disable instruction cache else if tmptype 0x0 enable instruction cache else if tmptype 0 4 Load and freeze 1k cache instr cache lt memory Ik src2 load 1k bytes freeze instruction cache else if tmptype 0 x 6 Load and freeze 512 bytes of cache instr cache lt memory_512 src2 load 512 bytes freeze 512 instruction cache else Reserved break 9 79 INSTRUCTION SET REFERENCE intel Faults Example Opcode case 3 case 4 default ldconst Clear cache g6 sysctl r6 r7 r8 be uploaded code sysctl Software Reset temp lt src2 load PRCB pointed to by src3 IP lt temp break Load One Group of Control Registers from the Control Table temp 0 3 lt memory_quad Control Table Base group offset for 1 lt 0 1 lt 3 1 lt 1 1 control reg i lt temp i break Operation invalid o
370. g 14 14 e Example 14 3 High Level Startup Code initmain c pg 14 20 e Example 14 5 Initialization Boot Record File rom ibr c pg 14 21 e Example 14 6 Linker Directive File init ld pg 14 23 e Example 14 7 Makefile pg 14 24 e Example 14 8 Initialization Header File init h pg 14 25 Example 14 2 Startup Routine init s Sheet 1 of 6 Roe seo SHR Bee ae SIE Sa GENES 7 inrt s if M initial PRCB globl rom prcb align 4 rom prcb word boot flt table 0 Fault Table word boot control table 4 Control Table word 0x00001000 8 AC reg mask overflow fault word 0x40000001 12 Flt CFG Allow Unaligned word boot intr table 16 Interrupt Table word rom sys proc table 20 System Procedure Table word 0 24 Reserved word intr stack 28 Interrupt Stack Pointer word 0x00000000 32 Inst Cach enable cach word 5 36 Reg Cache 5 sets cached 14 14 intel INITIALIZATION AND SYSTEM REQUIREMENTS Example 14 2 Startup Routine init s Sheet 2 of 6 ROM system procedure table supervisor proc align 6 rom sys proc table space 12 word Supervisor stack space 32 word default sysproc word default sysproc word _default_sysproc word _default_sysproc word default sysproc word default sysproc word default sysproc w
371. g A 36 for a description of how other addressing modes are handled.
A.2.4.6 Bus Control Unit (BCU)
The BCU executes memory operations for load and store instructions, instruction fetches, micro-flows and DMA operations. It executes memory load requests in two clocks (zero wait states) and returns a result on the third clock. Using address pipelining and on-chip request queuing, the BCU can accept a load or store from the IS every clock and return load data every clock. The BCU receives instructions over the MEM machine bus, stores addresses over the 32-bit address-out bus and stores data over the 128-bit store bus. The BCU returns data over the 128-bit load bus. The BCU receives a load address during the issue clock. The address is placed on the system bus during the next clock (the first BCU execute stage). The system returns data at the end of the following clock (the second BCU execute stage). On the next clock, the BCU writes the data to the destination register. This write is bypassed to the REG-side and MEM-side source buses, and the scoreboarded instruction is issued in the same clock. The zero-wait-state load causes a two-clock execution delay of the next instruction because the load data is referenced immediately after the load is issued. If the memory system has wait states, the load data delay would be longer. If the load is advanced in the code such that it is separated from the instruction which uses the data, the load delay could
372. g from on-chip cache; therefore, it is possible that bus requests may be posted in the queue after the hold request is granted. In this case, BREQ can be used to relinquish the hold request when the processor needs the bus.
EXTERNAL BUS DESCRIPTION
HOLD and HOLDA arbitration can also function during the reset state. The bus controller acknowledges HOLD while RESET is asserted. If RESET is asserted while HOLDA is asserted (the processor has acknowledged the HOLD), the processor remains in the HOLDA state. The processor does not go into the reset state until HOLD is removed and the processor removes HOLDA.
Figure 11-17. HOLD/HOLDA Bus Arbitration (timing diagram of two word read requests; signals shown include ADS, A31:2, SUP, WAIT, DEN, DT/R, BLAST, HOLD and HOLDA)
11.5.1 Bus Backoff Function (BOFF pin)
The bus backoff input (BOFF) suspends a bus request already in progress and allows another bus master to temporarily take control of the bus. The BOFF pin causes the current bus request to be suspended. When BOFF is asserted, the processor's address, data and status pins are floated on the following clock cycle. At this time, an alternate bus master may take control of the local system
373. g lit sfr reg AC ccl lt sre and 2 bitpos mod 32 if 1 1 IP lt targ 0 0 IB 37 COBR
be | Branch If Equal | targ | if (AC.cc and 010) != 0: IP <- targ | IB IB | 12 | CTRL
bg | Branch If Greater | targ | if (AC.cc and 001) != 0: IP <- targ | IB IB | 11 | CTRL
bge | Branch If Greater Or Equal | targ | if (AC.cc and 011) != 0: IP <- targ | IB IB | 13 | CTRL
bl | Branch If Less | targ | if (AC.cc and 100) != 0: IP <- targ | IB IB | 14 | CTRL
ble | Branch If Less Or Equal | targ | if (AC.cc and 110) != 0: IP <- targ | IB IB | 16 | CTRL
bne | Branch If Not Equal | targ | if (AC.cc and 101) != 0: IP <- targ | IB IB | 15 | CTRL
bno | Branch If Not Ordered | targ | if AC.cc = 000: IP <- targ | IB IB | 10 | CTRL
bo | Branch If Ordered | targ | if AC.cc != 0: IP <- targ | IB IB | 17 | CTRL
bx | Branch Extended | targ (mem) | IP <- targ | IB IB | 84 | MEM u
March 1994 Page 5 of 18 Order Number 272220-002 i960 Cx Microprocessor User's Guide
Instruction Set Quick Reference
Arithmetic Controls Process Controls Trace Controls Faults Opcode Instruction Execution Mnemonic D
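The conditional branch entries above all test the 3-bit condition code against a mask. A minimal Python sketch of that test follows; the table and function names are illustrative assumptions, and the masks are the ones listed in the quick reference rows.

```python
# Mask applied to AC.cc for each conditional branch mnemonic.
BRANCH_MASKS = {
    "bg": 0b001,
    "be": 0b010,
    "bge": 0b011,
    "bl": 0b100,
    "bne": 0b101,
    "ble": 0b110,
}


def branch_taken(mnemonic, cc):
    """Return True if the branch would be taken for condition code cc (0..7)."""
    if mnemonic == "bno":          # not ordered: condition code is 000
        return cc == 0
    if mnemonic == "bo":           # ordered: condition code is non-zero
        return cc != 0
    return (cc & BRANCH_MASKS[mnemonic]) != 0
```

Note how bne works without a dedicated "not equal" bit: its mask 101 matches either the less-than or the greater-than bit.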
374. ge and prioritize all possible interrupts, the microprocessor integrates an on-chip programmable interrupt controller. Integrated interrupt controller configuration and operation is described in CHAPTER 12, INTERRUPT CONTROLLER. The i960 architecture defines two data structures to support interrupt processing (see Figure 6-1): the interrupt table and interrupt stack. The interrupt table contains 248 vectors for interrupt-handling procedures and an area for posting software-requested interrupts. The interrupt stack prevents interrupt-handling procedures from overwriting the stack in use by the application program. It also allows the interrupt stack to be located in a different area of memory than the user and supervisor stack (e.g., fast SRAM).
Figure 6-1. Interrupt Handling Data Structures
6.2 SOFTWARE REQUIREMENTS FOR INTERRUPT HANDLING
To use the processor's interrupt-handling facilities, software must provide the following items in memory:
• Interrupt Table
• Interrupt Handler Routines
• Interrupt Stack
These items are generally established in memory as part of the initialization procedure. Once these items are present in memory and pointers to them have been entered in the appropriate system data structures, the processor handles interrupts automatical
375. gnates individual bits in a field. For example, the return type (rt) field in the previous frame pointer (PFP) register is designated as PFP.rt. The least significant bit of the return type field is then designated as PFP.rt0.
CHAPTER 2 PROGRAMMING ENVIRONMENT
This chapter describes the i960 Cx microprocessors' programming environment, including global and local registers, special function registers, control registers, literals, processor state registers and address space.
2.1 OVERVIEW
The i960 architecture defines a programming environment for program execution, data storage and data manipulation. Figure 2-1 shows the programming environment elements, which include a 4 Gbyte (2^32 byte) flat address space, an instruction cache, global and local general-purpose registers, a set of literals, special function registers, control registers and a set of processor state registers. A register cache saves the 16 procedure-specific local registers. The processor defines several data structures located in memory as part of the programming environment. These data structures handle procedure calls, interrupts and faults and provide configuration information at initialization. These data structures are:
• interrupt stack
• control table
• system procedure table
• local stack
• fault table
• process control block
• supervisor stack
• interrupt table
• initialization boot record
2.2 REGISTE
376. > src2. These instructions are intended for use in ending iterative loops. For cmpinci, integer overflow is ignored to allow looping up through the maximum integer values.
Action: if (src1 < src2) AC.cc <- 100₂; else if (src1 = src2) AC.cc <- 010₂; else AC.cc <- 001₂; dst <- src2 + 1 # overflow suppressed for cmpinci instruction
Faults: Type Mismatch: Non-supervisor reference of a sfr.
Example: cmpinco r8, g2, g9 # compares the values in g2 and r8 and sets AC.cc to indicate the result; g9 <- g2 + 1
Opcode: cmpinci 5A5; cmpinco 5A4
See Also: cmpdeco, cmpo, cmpi, cmpdeci, COMPARE AND BRANCH
9.3.19 COMPARE AND BRANCH
Mnemonic:
cmpibe{.t|.f} Compare Integer and Branch If Equal
cmpibne{.t|.f} Compare Integer and Branch If Not Equal
cmpibl{.t|.f} Compare Integer and Branch If Less
cmpible{.t|.f} Compare Integer and Branch If Less Or Equal
cmpibg{.t|.f} Compare Integer and Branch If Greater
cmpibge{.t|.f} Compare Integer and Branch If Greater Or Equal
cmpibo{.t|.f} Compare Integer and Branch If Ordered
cmpibno{.t|.f} Compare Integer and Branch If Not Ordered
cmpobe{.t|.f}, cmpobne{.t|.f}, cmpobl{.t|.f}, cmpoble{.t|.f}, cmpobg{.t|.f}, cmpobge{.t|.f}: Compare Ordinal and Branch If Equal; Compare Ordinal and Branch If Not Equal; Compare Ordinal and Branch If Less; Compare Ordina
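The compare-and-increment action above can be modeled in a few lines. This Python sketch follows the ordinal form (cmpinco); the function name is an assumption, and wrapping the incremented result to 32 bits is a modeling choice for the sketch rather than a statement from the excerpt.

```python
def cmpinco(src1, src2):
    """Model of compare-and-increment (ordinal).

    Returns (cc, dst): the 3-bit condition code and the incremented src2.
    """
    if src1 < src2:
        cc = 0b100
    elif src1 == src2:
        cc = 0b010
    else:
        cc = 0b001
    dst = (src2 + 1) & 0xFFFFFFFF  # assumption: result wraps at 32 bits
    return cc, dst
```

In a counting loop, src2 would be the loop counter and src1 the limit, so the condition code reports whether the counter has passed the limit while dst advances it.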
377. h fault
7.5.3 Fault Record Location
The fault record is stored in the stack that the processor uses to execute the fault-handling procedure. As shown in Figure 7-4, this stack can be the user stack, supervisor stack or interrupt stack. The fault record begins at byte address NFP-1. NFP refers to the new frame pointer, which is computed by adding the memory size allocated for padding and the fault record to the new stack pointer (NSP). The processor automatically determines the number of bytes required for the fault record and increments the FP by that amount, rounding it off to the next highest 16-byte boundary. Fault record size is variable, based on the size of the optional fault data portion of the fault record. Stack frame alignment is defined for each implementation of the i960 architecture. This alignment boundary is calculated from the relationship (SALIGN*16). For example, if SALIGN is selected to be 4, stack frames are aligned on 64-byte boundaries. In the i960 Cx processors, SALIGN = 1.
Figure 7-4 (diagram of the current stack, which may be the user, supervisor or interrupt stack, showing the current frame, NSP, padding area, fault record, NFP, new frame and direction of stack growth)
NOTES:
1. If the call to the fault handler procedure does not require a stack switch, the new stack pointer (NSP) is the same as SP.
2. If the processor is in user mode and the fault handler procedure is called with a system superviso
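The alignment arithmetic in the paragraph above (boundary = SALIGN*16, and rounding up to the next boundary) can be sketched as follows. The function names are hypothetical, and treating an already aligned address as unchanged is an assumption of this sketch.

```python
def frame_alignment(salign):
    """Stack frame alignment boundary in bytes: SALIGN * 16."""
    return salign * 16


def align_up(address, boundary):
    """Round address up to the next multiple of boundary.

    Assumption: an address that is already aligned is returned unchanged.
    """
    return (address + boundary - 1) // boundary * boundary
```

With SALIGN = 4 the boundary is 64 bytes, matching the example in the text; with SALIGN = 1, as on the i960 Cx processors, it is 16 bytes.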
378. h index and displacement addressing mode adds both a scaled index and a displacement to the address base. There is only one version of this addressing mode at the instruction encoding level; it is encoded in the MEMB instruction format.
3.3.3 Index with Displacement
A scaled index can also be used with a displacement alone. Again, the index is contained in a register and multiplied by a scaling constant before displacement is added.
3.3.4 IP with Displacement
This addressing mode is used with load and store instructions to make them IP relative. IP with displacement addressing mode references the next instruction's address plus the displacement plus a constant of 8. The constant is added because, in a typical processor implementation, the address has incremented beyond the next instruction address at the time of address calculation. The constant simplifies IP with displacement addressing mode implementation.
3.3.5 Addressing Mode Examples
The following examples show how i960 addressing modes are encoded in assembly language. Example 3-1 shows addressing mode mnemonics. Example 3-2 illustrates the usefulness of scaled index and scaled index plus displacement addressing modes. In this example, a procedure named array_op uses these addressing modes to fill two contiguous memory blocks separated by a constant offset. A pointer to the top of the block is passed to the procedure in g0, the block size is passed in g1, and the fill data in g2. 3
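As a rough illustration of the address arithmetic described in sections 3.3.3 and 3.3.4, the sketch below computes effective addresses in Python. The function and parameter names are hypothetical, the 32-bit wrap is an assumption, and the IP-relative form simply follows the wording above (next instruction address plus displacement plus 8).

```python
MASK32 = 0xFFFFFFFF  # addresses modeled as 32-bit values (assumption)


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Base + scaled index + displacement; omit base for index-with-displacement."""
    return (base + index * scale + displacement) & MASK32


def ip_with_displacement(next_instruction_address, displacement):
    """IP-relative form as worded in the text: address + displacement + 8."""
    return (next_instruction_address + displacement + 8) & MASK32


if __name__ == "__main__":
    # Element 3 of a word array at 0x1000, plus a constant offset of 8 bytes.
    print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=8)))
```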
379. h the branch and it takes a full two clocks to execute, as seen in Figure A-15.
[Figure A-15: Branch in First Executable Group. Pipeline diagram showing the instruction scheduler issuing addo, b and lda, with the CTRL pipeline, the EU pipeline (read src1/src2, execute) and the AGU pipeline (read over AGU base bus, execute and write over Ldbus).]
Figure A-16 shows the case where a branch, when first seen by the IS, is in the second executable group (B) of instructions in the rolling quad word, not the first executable group (A), which is about to be issued. The IS issues the branch immediately, along with the first group of instructions ahead of it (A). Since the branch takes two clocks in the CTRL pipeline to execute, there is no break in the IS's ability to issue instructions. On the next clock, the IS issues a new group of instructions from the branch target. In the figure, two other instructions were issued simultaneously with the branch and one instruction was issued during the clock in which the branch was executing. Hence, it can be said that this branch takes zero clocks to execute. Figure A-17 shows the case where a branch, when first seen by the IS, is in the third executable group (C) of instructions of the rolling quad word, not the first executable group (A), which is
380. hannel E N DMA suspend on interrupt disabled Wi int orst case D DMA suspend on interrupt enabled Interrupt Latency
NOTES: 1. is the sum of maximum latencies of all channels which may be preempted by the requesting channel. For example, with four DMA channels enabled and rotating priority mode, a channel request may be required to preempt three other channels with pending requests. In this case, the component is the sum of all of these latencies.
As shown in Equations 13-7 and 13-8, worst-case DMA latency is finally calculated as the sum of the individual latency components plus the worst-case throughput condition.
Non-chaining modes: Ni worst case max NT Nr first Nsetup Nswap Niower Nin (Equation 13-7)
Chaining modes: worst case Nt chain Nsetup Nswap Nint (Equation 13-8)
CHAPTER 14 INITIALIZATION AND SYSTEM REQUIREMENTS
This chapter describes the steps that the i960 Cx processors take during initialization. Discussed are the RESET pin, the reset state, built-in self test (BIST) features and on-circuit emulation function (ONCE). The chapter also describes the processor's basic system requirements, including power, ground and clock, and concludes with some general guidelines for high-speed circuit board design
381. hat tracing is turned off when a trace fault handling procedure is being executed. This is necessary to prevent an endless loop of trace fault handling calls.
8.6 TRACE HANDLING ACTION
Once a trace event is signaled, the processor determines how to handle the trace event according to the PC register trace enable bit and trace fault pending flag settings, and to other events that might occur simultaneously with the trace event, such as an interrupt or non-trace fault. Sub-sections that follow describe how the processor handles trace events for various situations.
8.6.1 Normal Handling of Trace Events
Before the processor executes an instruction:
1. The processor checks the state of the trace fault pending flag. If clear, the processor begins execution of the next instruction. If set, the processor performs the following actions.
2. The processor checks the PC register trace enable bit state. If clear, the processor clears any trace event flags that are set prior to executing the next instruction. If set, the processor signals a trace fault and begins fault handling action as described in section 7.7, FAULT HANDLING PROCEDURES (pg. 7-12).
8.6.2 Prereturn Trace Handling
The processor handles a prereturn trace event the same as described above, except when it occurs at the same time as a non-trace fault. In this case, the non-trace fault is handled first. On returning from the fault handler for the non-trace fault, the proces
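The two-step check in section 8.6.1 can be summarized as a small decision function. This is a minimal sketch of the documented sequence only; the function name, the string results and the use of a Python set for the trace event flags are assumptions made for illustration.

```python
def before_instruction(trace_fault_pending, trace_enable, event_flags):
    """Return (action, remaining_event_flags) following steps 1 and 2 of section 8.6.1."""
    # Step 1: no pending trace fault, so just execute the next instruction.
    if not trace_fault_pending:
        return "execute", set(event_flags)
    # Step 2: pending, but tracing is disabled: clear the event flags, then execute.
    if not trace_enable:
        return "execute", set()
    # Step 2: pending and tracing enabled: signal a trace fault.
    return "trace_fault", set(event_flags)


if __name__ == "__main__":
    print(before_instruction(True, True, {"branch"}))
```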
382. he i960 Cx processors' fault handling facilities. Subjects covered include the fault handling data structures and fault handling mechanism. See section 7.10, FAULT REFERENCE (pg. 7-20) for detailed information on each fault type.
7.1 FAULT HANDLING FACILITIES OVERVIEW
The i960 processor architecture defines various conditions in code and/or the processor's internal state that could cause the processor to deliver incorrect or inappropriate results, or that could cause it to head down an undesirable control path. These are called fault conditions. For example, the architecture defines faults for divide-by-zero and overflow conditions on integer calculations, for inappropriate operand values, and for invalid opcodes and addressing modes. As shown in Figure 7-1, the architecture defines a fault table, a system procedure table, a set of fault handling procedures and a stack (user stack, supervisor stack or both) to handle processor-generated faults.
[Figure 7-1: Fault Handling Data Structures (processor, fault table, system procedure table, fault handling procedures, user stack, supervisor stack).]
The fault table contains pointers to fault handling procedures. The system procedure table optionally provides an interface to any fault handling procedure and allows faults to be handled in supervisor mode. Stack frames for fault handling procedures are created on either the user or supervisor stac
383. he DMA pin is not active when chaining descriptors are fetched from memory.
13.11.6 DMA Controller Implementation
The i960 Cx processors' DMA functions are implemented primarily in microcode. Processor clock cycles are required to set up and execute a DMA operation. DMA features, including data chaining, data alignment, byte assembly and disassembly, are implemented in microcode. DMA hardware arbitrates channel requests, handles the DMA external hardware interface and interfaces to microcode for most efficient use of core resources. When considering whether to use the DMA controller, two questions generally arise:
1. When a DMA transfer is executing, how many internal processor clock cycles does the DMA operation consume?
2. When a DMA transfer is executing, how much of the total bus bandwidth is consumed by the DMA bus operations?
These questions are addressed in the following sections.
13.11.7 DMA and User Program Processes
The i960 Cx processors allow DMA operations to be executed in microcode while providing core bandwidth for the user's program. This sharing of core resources is accomplished by implementing separate hardware processes for each DMA channel and for the user's program. Alternating between the DMA and the user process enables the user code and up to four DMA processes (one per channel) to run concurrently. The environments for the DMA and user processes are implemented entire
384. he process controls field from the fault record is not copied back to the PC register.
7.8.2 System Local Fault Call
When the fault handler selects an entry for a local procedure in the system procedure table (entry type 10₂), the processor performs the same action as is described in the previous section for a local fault call or return. The only difference is that the processor gets the fault handling procedure's address from the system procedure table rather than from the fault table.
7.8.3 System Supervisor Fault Call
When the fault handler selects an entry for a supervisor procedure in the system procedure table, the processor performs the same action described in section 5.2.3.1, Call Operation (pg. 5-5), with the following exceptions:
• If in user mode when the fault occurs, the processor switches to supervisor mode, reads the supervisor stack pointer from the system procedure table and switches to the supervisor stack. A new frame is then created on the supervisor stack.
• If in supervisor mode when the fault occurs, the processor creates a new frame on the current stack. If the processor is executing a supervisor procedure when the fault occurred, the current stack is the supervisor stack; if it is executing an interrupt handler procedure, the current stack is the interrupt stack. (The processor switches to supervisor mode when handling interrupts.)
• The fault record is copied into the area allocated for it in the new stack frame
385. he series resistor as close as possible to the signal source. Series termination, however, reduces signal rise and fall times, so it should not be used when these times are critical. AC termination is effective in reducing signal reflection (ringing). This termination is accomplished by adding an RC combination at the signal's destination (Figure 14-8). While the termination provides no DC load, the RC combination damps signal transients. Selection of termination methods and values is dependent upon many variables, such as output buffer impedance, board trace impedance and length, and timings that must be met.
[Figure 14-7: Series Termination]
[Figure 14-8: AC Termination]
14.4.8 Latchup
Latchup is a condition in a CMOS circuit in which Vcc becomes shorted to Vss. Intel's CHMOS IV process is immune to latchup under normal operation conditions. Latchup can be triggered when the voltage limits on I/O pins are exceeded, causing internal PN junctions to become forward biased. The following guidelines help prevent latchup:
• Observe the maximum rating for input voltage on I/O pins.
• Never apply power to an i960 Cx processor pin, or to a device connected to an i960 Cx processor pin, before applying power to the i960 Cx processor itself.
• Prevent overshoot and undershoot on I/O pins by adding line terminati
386. here available and static caching data RAM
A.2.8.1 Instruction Cache
If an algorithm fits into the instruction cache, it generally executes faster than if it did not fit. This is true even if the compressed code contains more comparisons and branches than uncompressed code contains. If a loop fits in the cache but is not capable of executing two instructions per clock due to memory or resource dependencies, keep unrolling the loop and pipelining operations until the cache is full. To increase performance of loops with multiple iterations and memory operations, unroll the loops until all registers are used or the cache is full. If the system is interrupt intensive, consider locking interrupt service routines into the cache. On the i960 CF microprocessor, cache locking is extended to any frequently executed code segments. Some experimentation may be necessary to determine if cache locking impacts performance of remaining non-locked code. Finally, as mentioned in a previous section on branches, aligning branch targets can improve performance. While long-word aligned branch targets improve the scheduler's lookahead ability in the first clock of the branch, quad-word aligned branch targets reduce the number of long-word instruction fetches issued. Although the long-word fetch is implemented to reduce cache miss latency, for many cases the quad-word instruction fetch is more efficient for system throughput.
A.2.8.2 Data Cache (i960 CF Processor O
387. hese fields should be treated as if they were reserved fields. They should be set to the specified value when the data structure is created or when the register is initialized, and software should not modify or rely on the value after that. Reserved bits in the Special Function Registers and Arithmetic Controls (AC) register can be set to 0 after initialization to ensure compatibility with future implementations. Reserved bits in the Process Controls (PC) register and Trace Controls (TC) register should not be initialized. When the AC, PC and TC registers are modified using modac, modpc or modtc instructions, the reserved locations in these registers must be masked. Certain areas of memory may be referred to as reserved memory in this reference manual. Reserved, when referring to memory locations, implies that an implementation of the i960 architecture may use this memory for some special purpose. For example, memory-mapped peripherals would be located in reserved memory areas on future implementations. Programs may use reserved memory just like any other memory unless it is specifically documented otherwise. The i960 Cx processors' Initialization Boot Record must be located in reserved memory at address FFFF FF00H. System designers typically map the entire boot ROM into the reserved memory to reduce the complexity of the select decoding.
1.4.2 Specifying Bit and Signal Values
The terms set and clear in this ma
388. his bit performs the same function as clearing the mask register. The mask operation field (bits 11-12) determines the operation the core performs on the mask register when a hardware-generated interrupt is serviced. On an interrupt, the IMSK register is either unchanged, cleared for dedicated mode interrupts, cleared for expanded mode interrupts, or cleared for both dedicated and expanded mode interrupts. The vector cache enable bit (bit 13) determines whether interrupt table vector entries are fetched from the interrupt table or from internal data RAM. Only vectors with four least significant bits equal to 0010₂ may be cached in internal data RAM. The sampling mode bit (bit 14) determines whether dedicated inputs and the NMI pin are sampled using debounce sampling or fast sampling. Expanded mode inputs are always detected using debounce mode. The DMA suspension bit (bit 15) determines whether DMA continues running or is suspended while an interrupt procedure is being called. Bits 16 through 31 are reserved and must be set to 0 at initialization.
12.3.5 Interrupt Mapping Registers (IMAP0-IMAP2)
The IMAP registers (Figure 12-7) are three 32-bit registers (IMAP0 through IMAP2). These registers' bits are used to program the vector number associated with the interrupt source when the source is connected to a dedicated mode input. IMAP0 and IMAP1 contain mapping information for the external interrupt pins (four bits per pin). IMAP2 contains mappi
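The ICON bit positions listed above (bits 11 through 15) lend themselves to a small field extractor. The sketch below is illustrative: the dictionary keys and function names are hypothetical, and the meaning of each field value is not modeled because it is not given in this excerpt. Only the bit positions and the 0010₂ vector rule come from the text.

```python
def decode_icon_upper_fields(icon):
    """Extract the ICON fields described in the text (bits 11 through 15)."""
    return {
        "mask_operation": (icon >> 11) & 0b11,   # bits 11-12
        "vector_cache_enable": (icon >> 13) & 1,  # bit 13
        "sampling_mode": (icon >> 14) & 1,        # bit 14
        "dma_suspension": (icon >> 15) & 1,       # bit 15
    }


def vector_is_cacheable(vector):
    """Only vectors whose four least significant bits are 0010 (binary) may be cached."""
    return (vector & 0xF) == 0b0010


if __name__ == "__main__":
    print(decode_icon_upper_fields((2 << 11) | (1 << 13) | (1 << 15)))
```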
389. i ≠ 0
temp ← temp << 1; i ← i - 1; dst ← temp
if (len > 32) i ← 32; else i ← len
temp ← src
while (i ≠ 0)
  temp ← temp >> 1 (shift temp right one bit)
  temp.bit31 ← temp.bit30 (extend temp's sign bit)
  i ← i - 1
dst ← temp
i ← len; if (i > 32) i ← 32
temp ← src; s_sign ← temp.bit31; lost_bit ← 0
while (i ≠ 0)
  lost_bit ← lost_bit or temp.bit0
  temp ← temp >> 1 (shift temp right one bit)
  temp.bit31 ← temp.bit30 (extend temp's sign bit)
  i ← i - 1
if (s_sign = 1 and lost_bit = 1) temp ← temp + 1
dst ← temp
Faults: Type Mismatch: Non-supervisor reference of a sfr. Integer Overflow: Result is too large for the destination register (shli only). If overflow occurs and AC.om is a 1, the fault is suppressed and AC.of is set to a 1. After an overflow, dst will equal src shifted left as much as possible without overflowing.
Example: 4 r6 46 ← g4 shifted left 13 bits
Opcode: shlo 59CH REG; shro 598H REG; shli 59EH REG; shri 59BH REG; shrdi 59AH REG
See Also: divi, muli, rotate, eshro
Spanbit
spanbit Span Over Bit
Format: spanbit src, dst (src: reg/lit/sfr; dst: reg/sfr)
Description: Searches src value for the most significant clear bit (0 bit). If a most significant 0 bit is found, its bit number is stored i
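Returning to the shift actions at the start of this entry: the two right-shift loops differ only in the final correction step, which makes the second one round toward zero for negative values. The Python sketch below models both under that reading of the pseudocode; the function names mirror the shri and shrdi mnemonics, and the 32-bit signed conversion helper is an assumption of the sketch.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def shri(src, length):
    """Arithmetic right shift; the shift count is capped at 32 as in the action text."""
    i = min(length, 32)
    return to_signed32(src) >> i  # Python's >> on ints sign-extends


def shrdi(src, length):
    """Arithmetic right shift, then add 1 if src was negative and any 1 bits were shifted out."""
    i = min(length, 32)
    temp = to_signed32(src)
    lost_bit = (temp & ((1 << i) - 1)) != 0
    negative = temp < 0
    temp >>= i
    if negative and lost_bit:
        temp += 1
    return temp


if __name__ == "__main__":
    print(shri(-7, 1), shrdi(-7, 1))
```

With -7 shifted right by one bit, the plain arithmetic shift yields -4 while the corrected form yields -3, matching truncating division by two.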
390. i960® Microprocessor User's Manual, March 1994, Order Number 270710-003
Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors which may appear in this document, nor does it make a commitment to update the information contained herein. Intel retains the right to make changes to these specifications at any time, without notice. Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order. MDS is an ordering code only and is not used as a product name or trademark of Intel Corporation. Intel Corporation and Intel's FASTPATH are not affiliated with Kinetics, a division of Excelan, Inc., or its FASTPATH trademark or products. Other brands and names are the property of their respective owners. Additional copies of this document or other Intel literature may be obtained from: Intel Corporation, Literature Sales, P.O. Box 7641, Mt. Prospect, IL 60056-7641, or call 1-800-879-4683. INTEL CORPORATION 1994
CONTENTS
CHAPTER 1 INTRODUCTION
1.1 i960® MICROPROCESSOR ARCHITECTURE
1.1.1 Parallel Instruction Execution
1.1.2 Full Procedure Call Model
1.1.3 Versatile Instruction Set and Addressing
1.1.4 Integrated Priority Interrupt
1.1.5 Complete Fault Handling
391. ich requires a procedure number operand. The procedure number provides an index into the system procedure table, where the processor finds IPs for specific procedures. Using an i960 processor language assembler, a system procedure is directly declared using the sysproc directive. At link time, the optimized call directive callj is replaced with a calls when a system procedure target is specified. Refer to current i960 processor assembler documents for a description of the sysproc and callj directives. The system call mechanism offers two benefits. First, it supports application software portability. System calls are commonly used to call kernel services. By calling these services with a procedure number rather than a specific IP, applications software does not need to be changed each time the implementation of the kernel services is modified. Only the entries in the system procedure table must be changed. Second, the ability to switch to a different execution mode and stack with a system supervisor call allows kernel procedures and data to be insulated from applications code. This benefit is further described in section 2.7, USER SUPERVISOR PROTECTION MODEL (pg. 2-20).
5.5.1 System Procedure Table
The system procedure table is a data structure for storing IPs to system procedures. These can be procedures which software can access through (1) a system call or (2) fault handling procedures which the pr
392. ification of DMA data RAM for an active or idle channel may cause unpredictable DMA controller operation Conversely executing sdma may cause previously stored data to be overwritten in the data RAM 13 29 DMA CONTROLLER intel 13 10 5 Channel Setup Examples Example 13 1 Simple Block Mode Setup Block mode setup mov 0xc g4 Byte count 12 ldconst 0 src addr g5 Source address for channel 0 ldconst 0 dest addr g6 Destination addr for channel O0 ldconst Oxf g3 DMA ctl word 32 32 std source inc dest inc block sdma 0 43 44 Setup channel 0 Other instructions optional setbit enable channel O0 Example 13 2 Chaining Mode Setup Chaining mode setup ldconst ptrl1 g4 Initial descriptor pointer ldconst Oxla6f g3 DMA ctl word 32 32 std source hold dest inc demand source sync dest chain channel wait interrupt on buffer complete Setup channel 1 Other instructions optional setbit enable channel 1 Descriptor list in memory for chaining word 0x100 0 src addr bl dest addr ptr3 ptr2 word 0x200 0x0 0 dest addr 0x0 word 0 100 0 0 b2 dest addr ptr2 13 30 intel 13 11 DMA CONTROLLER DMA EXTERNAL INTERFACE DMA signal characteristics DACK3 0 DREQ3 0 EOP TC3 0 and DMA and DMA transfer timing requirements are described in the following sections Figure 13 13 illustrates the external interface Refer to the 1960 Cx mi
393. ify Flush Local Registers Modify Arithmetic Controls Modify Process Controls System Control DMA Control
NOTE: Asterisk (*) denotes instructions that are i960 Cx processor-specific extensions to the i960 processor family's instruction set.
4.2.1 Data Movement
These instructions are used to move data from memory to global and local registers, from global and local registers to memory, and data among local, global and special function registers. Rules for register alignment must be followed when using load, store and move instructions that move 8, 12 or 16 bytes at a time. See section 2.5, MEMORY ADDRESS SPACE (pg. 2-9) for alignment requirements for code portability across implementations.
4.2.1.1 Load and Store Instructions
Load instructions, listed below, copy bytes or words from memory to local or global registers or to a group of registers. Each load instruction has a corresponding store instruction to copy to memory bytes or words from a selected local or global register or group of registers. All load and store instructions use the MEM format.
ld load word; st store word
ldob load ordinal byte; stob store ordinal byte
ldos load ordinal short; stos store ordinal short
ldib load integer byte; stib store integer byte
ldis load integer short; stis store integer short
ldl load long; stl store long
ldt load triple; stt store triple
ldq load quad; stq store quad
ld c
394. igh (inactive); DACK1: High (inactive)
BLAST: High (inactive); DACK0: High (inactive)
DT/R: Low (receive); EOP/TC3: Floating (input)
DEN: High (inactive); EOP/TC2: Floating (input)
LOCK: High (inactive); EOP/TC1: Floating (input)
BREQ: Low (inactive); EOP/TC0: Floating (input)
D/C: Floating
NOTE: 1. Pin states shown assume HOLD and ONCE pins are not asserted. If HOLD is asserted during reset, the hold is acknowledged by asserting HOLDA and the processor pins are configured in the Hold Acknowledge state. See CHAPTER 10, THE BUS CONTROLLER. If the ONCE pin is asserted, the processor pins are all floated.
Table 14-2. Register Values After Reset
Register | Value after cold reset | Value after warm reset
AC | AC initial image in PRCB | AC initial image in PRCB
PC | C01F2002H | C01F2002H
TC | TC initial image in PRCB | TC initial image in PRCB
FP (g15) | interrupt stack base | interrupt stack base
r0 | undefined | undefined
SP (r1) | interrupt stack base + 64 | interrupt stack base + 64
RIP (r2) | undefined | undefined
IPND (sf0) | undefined | value before warm reset
IMSK (sf1) | 00H | 00H
DMAC (sf2) | 00H | 00H
NOTE: 1. All control registers not listed are configured with their respective values from the control table after reset.
14.2.2 Self Test Function (STEST, FAIL)
As part of initialization, the i960 Cx processors execute a bus confidence self test and o
395. ikely to be false. The processor uses the programmer's prediction to prefetch and decode instructions along the most likely execution path when the actual path is not yet known. If the prediction was wrong, all actions along the incorrect path are undone and the correct path is taken. For further discussion, see section A.2.7.7, Branch Prediction (pg. A-53). When the programmer provides no suffix with an instruction which supports a suffix, the assembler makes its own prediction. When an instruction supports prediction, the mnemonic listing includes the notation {.t|.f} to indicate the option; for example: be{.t|.f} Branch If Equal.
9.2.3 Format
The Format section gives the instruction's assembly language format and allowable operand types. Format is given in two or three lines. The following is a two-line format example:
sub* src1, src2, dst
     reg/lit/sfr reg/lit/sfr reg/sfr
The first line gives the assembly language mnemonic (boldface type) and operands (italics). When the format is used for two or more instructions, an abbreviated form of the mnemonic is used. An asterisk (*) at the end of the mnemonic indicates a variable: in the above example, sub* is either subi or subo. Operand names are designed to describe operand function (e.g., src, len, mask). The second line shows allowable entries for each operand. Notation is as follows:
reg: Global (g0 ... g15) or local (r0 ... r15) register
li
396. il Specific signal functions for the external bus signals, DMA signals and interrupt inputs are discussed in their respective sections in this manual.
14.4.1 Input Clock (CLKIN)
The clock input (CLKIN) determines processor execution rate and timing. The clock input is internally divided by two or used directly to produce the external processor clock outputs, PCLK1 and PCLK2. CLKMODE pin state determines whether the input clock is in 2x or 1x mode. When CLKMODE is tied to ground or left floating, the CLKIN input is internally divided by two to produce PCLK2:1 (2x mode). When CLKMODE is pulled to a logic 1 (high), the CLKIN input is used to create PCLK2:1 at the same frequency, using an internal phase-locked loop circuit (1x mode). Refer to the i960 CA/CF microprocessor data sheets for CLKIN specifications. The clock input is designed to be driven by most common TTL crystal clock oscillators. The clock input must be free of noise and conform with the specifications listed in the data sheet. CLKIN input capacitance is minimal; for this reason, it may be necessary to terminate the CLKIN circuit board trace at the processor to prevent overshoot and undershoot. Additionally, a series damping resistor may be required to damp ringing on the input.
14.4.2 Power and Ground Requirements (Vcc, Vss)
The large number of Vcc and Vss pins effectively reduces the impedance of
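The CLKMODE behavior in section 14.4.1 reduces to a one-line frequency rule, sketched below in Python. The function and parameter names are hypothetical; consult the data sheet for the actual CLKIN limits, which are not modeled here.

```python
def pclk_frequency_hz(clkin_hz, clkmode_high):
    """PCLK2:1 frequency: CLKIN directly in 1x mode, CLKIN / 2 in 2x mode.

    clkmode_high is True when CLKMODE is pulled to logic 1 (1x mode) and
    False when it is tied to ground or left floating (2x mode).
    """
    return clkin_hz if clkmode_high else clkin_hz / 2


if __name__ == "__main__":
    print(pclk_frequency_hz(66_000_000, clkmode_high=False))
```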
397. ile
2.5.1 Memory Requirements
2.5.2 Data and Instruction Alignment in the Address Space
2.5.3 Byte, Word and Bit Addressing
2.5.4 Internal Data RAM
2.5.5 Instruction Cache
2.5.6 Data Cache (80960CF Only)
2.6 PROCESSOR STATE
2.6.1 Instruction Pointer (IP) Register
2.6.2 Arithmetic Controls (AC) Register
2.6.2.1 Initializing and Modifying the AC Register
2.6.2.2 Condition
2.6.3 Process Controls (PC) Register
2.6.3.1 Initializing and Modifying the PC Register
2.6.4 Trace Controls (TC)
2.7 USER SUPERVISOR PROTECTION MODEL
2.7.1 Supervisor Mode Resources
2.7.2 Using the User Supervisor Protection Model
CHAPTER 3 DATA TYPES AND MEMORY ADDRESSING MODES
3.1 DATA TYPES
3.1.1 Integers
398. ill and a cache miss is four quad-word reads followed by one quad-word fetch. Wait states in the instruction fetch or the frame fill directly impact return speed.
calls consumes up to 56 issue clocks if the call is to a supervisor procedure. If the call is to a non-supervisor procedure, calls takes 38 issue clocks. These times assume an available register cache location and a cached target. During calls execution, the processor accesses the system procedure table with a single word read and a long-word read. The presence of several wait states in these reads directly affects the instruction's performance. The impact of non-cached target code or a frame spill on the calls instruction is identical to the impact on the call instruction.
callx timing is similar to call instruction timing, with the exception of issue clocks. Table A-18 shows total issue clocks for callx.
Table A-18. callx Performance
The following instruction consumes n issue clocks before target code is issued, where n for each addressing mode is as follows:
Addressing modes: disp offset reg disp reg offset reg reg reg scale disp reg scale disp reg reg scale disp IP
callx: 7 9 9
Times shown assume instruction cache hits.
A.2.6.7 Conditional Faults
fault instructions are implemented with micro-flows and require one issue clock if the prediction bit is correct and no fault o
399. imum time delay measured between the assertion of DREQ3:0 and the assertion of the corresponding DACK3:0 pin. In this section, latency is derived in number of PCLK2:1 cycles. This value is denoted by the symbol $N_{latency}$. The established measure of DMA latency in units of seconds is derived with this equation:
$$\text{DMA Latency (seconds)} = \frac{N_{latency}}{f_c} \qquad \text{(Equation 13-2)}$$
where $N_{latency}$ = latency (PCLK2:1 cycles) and $f_c$ = PCLK2:1 frequency.
[Figure 13-17: DMA Throughput and Latency. Timing of DREQ3:0 and DACK3:0 showing latency (seconds) and throughput (bytes/second). Legend: $N_{latency}$ = number of latency clocks; $N_{TREQ}$ = number of clocks per DMA request; $B_{REQ}$ = number of bytes per DMA request; $f_c$ = PCLK2:1 frequency.]
13.11.11 DMA Throughput
DMA throughput ($N_{TREQ}$) for a particular system is governed by the following factors:
• DMA transfer type
• Memory system configuration
• Bus activity generated by the user process
• DMA throttle bit value
$N_{TREQ}$ is derived from the transfer clocks ($N_{XFER}$) provided in Table 13-5. Values in this table are derived assuming:
• No bus activity is generated by the user process
• DMA transfer source and destination memory are zero wait states or internal data RAM
Table 13-5 provides the number of PCLK2:1 cycles required for each unit DMA transfer. Transfer clock values denoted
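Equation 13-2 converts a latency expressed in PCLK2:1 cycles into seconds. A minimal Python sketch follows; the function name and the sample numbers are illustrative assumptions, not values from the manual.

```python
def dma_latency_seconds(n_latency_cycles, pclk_hz):
    """Equation 13-2: latency in seconds = latency in PCLK2:1 cycles / PCLK2:1 frequency."""
    if pclk_hz <= 0:
        raise ValueError("PCLK2:1 frequency must be positive")
    return n_latency_cycles / pclk_hz


if __name__ == "__main__":
    # Hypothetical figures: 33 latency clocks at a 33 MHz PCLK2:1.
    print(dma_latency_seconds(33, 33_000_000))
```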
400. in the queue, the fetch unit satisfies the queue request. If the queue entry that the scheduler requests contains a full group (two words) of instructions, the valid groups in the queue are also written to the cache in the same clock that they are given to the scheduler. The least recently used set is updated.
A.2.6 Micro-flow Execution
The i960 Cx processors' parallel processing units directly execute about half of the processor's instructions. The processor services the remaining complex instructions by executing a sequence of simple instructions from an on-chip ROM. Complex instructions are detected in the clock in which they are fetched. This information becomes part of the instruction encoding stored in the instruction fetch unit queue and/or instruction cache.
Micro-flow instruction sequences are written to enable the parallel processing units to perform the required function as fast as possible. Micro-flows use instructions described in prior sections of this appendix (machine types REG and CTRL) and some special parallel circuitry to carry out the complex instructions. An instruction which cannot be directly issued to a parallel processing unit is said to have the machine type µ.
A.2.6.1 Invocation and Execution
Invoking a micro-flow can be considered analogous to the processor's execution of an unconditional branch into the on-chip ROM. However, pre d
401. ination request length 1
quad-word transfers: 16
Multi-cycle mode, at least one address fixed: smallest fixed transfer length (bytes)
Multi-cycle mode, both addresses incrementing: 1
All fly-by mode transfers: transfer length (bytes)
Multi-cycle DMAs to aligned memory blocks perform better than DMAs to unaligned memory blocks. Additional microcode cycles are required to access the unaligned memory. Most unaligned DMA transfers, however, use the external bus almost as efficiently as aligned DMAs. Multi-cycle DMA configurations which use the bus efficiently when memory blocks are unaligned are:
• Word to Word
• Byte to Short
• Byte to Word
• Short to Byte
• Word to Byte
Table 13-3. DMA Transfer Alignment Requirements (Boundary Alignment Requirements)
Transfer Type (Source to Destination) | Multi-cycle or Fly-by | Source Address, Fixed | Source Address, Incr | Destination Address, Fixed | Destination Address, Incr
Byte to Byte (8 to 8 bit) | Multi-cycle | Byte | Byte | Byte | Byte
Byte to Byte (8 to 8 bit) | Fly-by | Byte | Byte | N/A | N/A
Byte to Short (8 to 16 bit) | Multi-cycle | Byte | Byte | Short | Byte
Byte to Word (8 to 32 bit) | Multi-cycle | Byte | Byte | Word | Byte
Short to Byte (16 to 8 bit) | Multi-cycle | Short | Byte | Byte | Byte
Short to Short (16 to 16 bit) | Multi-cycle | Short | Byte | Short | Byte
Short to Short (16 to 16 bit) | Fly-by | Short | Short | N/A | N/A
Short to Word (16 to 32 bit) | Multi-cycle | Short | Byte | Word | Byte
Word to Byte (32 to 8 bit) | Multi-cycle | Word | Byte | Byte | Byte
Word to Short 32 1
402. ing and Relocating Data Structures 14 11 14 3 2 Initialization FOW rec Eee CHEER 14 12 14 3 3 Startup Code Example 14 14 14 4 SYSTEM REQUIREMENTS vfi eL ib pat cedebat debet n dy etta 14 26 14 4 1 Input Glock CEKIN oerte rra IR eie eed inu 14 26 14 4 2 Power and Ground Requirements Voc 14 27 14 4 3 Power and Ground Planes esee nennen nnne 14 27 14 4 4 Decoupling Capacitors 1 ipae e EI Ee pde dos 14 28 14 4 5 Pin Characteristics entend Pene ti Perinde eee ER ie 14 28 14 4 5 1 Output PINS 1 5 cedi thi mec o eid ene eee 14 28 14 4 5 2 Ut PINS e 14 29 14 4 6 High Frequency Design Considerations 2 14 29 14 4 7 Line Termination reete eite D erede etd 14 30 14 4 8 EIE 14 81 14 4 9 Interference 20 eese t eoe ue Lise Pe toe d nien edel Heels 14 31 xii intel 3 CONTENTS APPENDIX A INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION A 1 INTERNAL PROCESSOR A 2 A 1 1 Instruction Scheduler 15 A 3 A 1 2 Instruction Flow 4 A 1 3 Register File HE UH EPA THERME 6 A 1 4 Execution Unit El ascites eere itte ide et A 7 A 1 5 Multiply Divide Unit 7 A 1 6 Address Generation Unit AGU A 7
403. ing examples describe how the interrupt controller can be dynamically configured after initialization.

Example 12-2 sets up the interrupt controller for expanded mode operation. Here, a value which selects expanded mode operation is loaded into the ICON register. The sysctl instruction is issued with the load control register message type (04H) and selecting group number 01H from the control table. Group 01H contains the ICON and IMAP registers. Note that the IMAP registers, as well as the ICON register, are reloaded with this operation.

Modifying the control table implies that the table, or part of it, must reside in RAM. If the control registers are modified after initialization, the control table must be relocated to RAM by reinitializing. See section 14.3.1, Reinitializing and Relocating Data Structures (pg. 14-11).

Example 12-2. Programming the Interrupt Controller for Expanded Mode

    # Example expanded mode setup
    mov     0, sf1                  # clear IMSK register (mask all interrupts)
    ldconst 0x01, g0
    st      g0, ctrl_table_ICON     # store mode information to control table
    ldconst 0x401, r4               # create operand for sysctl:
                                    #   selects load control register message type
                                    #   selects register group 1
                                    # load control register
                                    # unmask expanded interrupts

12.3.9 Implementation

The interrupt controller, microcode and core resources handle all stages of interrupt service. Interrupt service is handled in the following stages: Requesting Interrupts In the i960
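The sysctl operand used in Example 12-2 can be illustrated with a short sketch. It assumes, based only on the constant 0x401 in the example, that the message type (04H) occupies bits 8-15 of the operand and the register group number (01H) occupies bits 0-7. The function name is hypothetical and is not part of any Intel tool.

```python
def sysctl_load_control_operand(group: int, message_type: int = 0x04) -> int:
    """Pack a sysctl operand for the load control register message.

    Assumed layout (inferred from the constant 0x401 in Example 12-2):
    bits 8-15 hold the message type, bits 0-7 hold the register group.
    """
    if not 0 <= group <= 0xFF:
        raise ValueError("group number must fit in 8 bits")
    if not 0 <= message_type <= 0xFF:
        raise ValueError("message type must fit in 8 bits")
    return (message_type << 8) | group


# Group 01H contains the ICON and IMAP registers.
operand = sysctl_load_control_operand(group=0x01)
print(hex(operand))
```

With group 01H and message type 04H, the packed value is the same constant that the example loads into r4.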
404. ing the use of long instructions may also be used to optimize interrupt performance.

12-19 INTERRUPT CONTROLLER

12.3.12 Vector Caching Option

To reduce interrupt latency, the i960 Cx processors allow some interrupt table vector entries to be cached in internal data RAM. When the vector cache option is enabled and an interrupt request is serviced which has a cached vector, the controller fetches the associated vector from internal RAM rather than from the interrupt table in memory.

Interrupts with a vector number whose four least significant bits equal 0010₂ can be cached. The vectors that can be cached coincide with the vector numbers that are selected with the mapping registers and assigned to dedicated mode inputs.

The vector caching option is selected when programming the ICON register; software must explicitly store the vector entries in internal RAM. Since the internal RAM is mapped directly to the address space, this operation can be performed using the core's store instructions. Table 12-1 shows the required vector mapping to specific locations in internal RAM. For example, the vector entry for vector number 18 must be stored at RAM location 04H, and so on.

The NMI vector is also shown in Table 12-1. This vector is always cached in internal data RAM at location 0000H. The processor automatically loads this location at initialization with the value of vector number 248 in the interrupt table.

Table 12-1 Location of Cached
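The mapping rule described above can be sketched as follows. The text states only two locations explicitly (vector 18 at 04H and the NMI vector at 0000H); the linear progression for the remaining cacheable vectors is an assumption drawn from the phrase "and so on", and the function names are hypothetical.

```python
NMI_VECTOR = 248  # the NMI vector is always cached at location 0000H


def is_cacheable(vector: int) -> bool:
    """True if the four least significant bits of the vector equal 0010."""
    return 18 <= vector <= 242 and (vector & 0xF) == 0b0010


def cached_vector_location(vector: int) -> int:
    """Return the internal data RAM location for a cached vector entry.

    Assumption: one 4-byte word per cacheable vector, indexed by the upper
    four bits of the vector number (vector 18 -> 04H as stated in the text).
    """
    if vector == NMI_VECTOR:
        return 0x00
    if not is_cacheable(vector):
        raise ValueError(f"vector {vector} cannot be cached")
    return (vector >> 4) * 4


for v in (NMI_VECTOR, 18, 34, 242):
    print(v, hex(cached_vector_location(v)))
```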
405. instruction memories. SUP can be used to protect hardware from accesses while the processor is not in user mode.

The bus is in the idle state between bus requests. Idle bus state begins after cycles and ends when ADS is asserted.

The bus controller aligns all bus accesses; non-aligned accesses are translated into a series of smaller aligned accesses. Alignment is described in section 10.4, DATA ALIGNMENT (pg. 10-9).

11.2.1 Wait States

In non-burst mode, it is possible to insert wait states between the address and data cycle. In a burst mode access, it is possible to insert wait states between the address cycle and data cycle and between subsequent data cycles for a burst access. It is also possible to insert wait states between bus accesses which occur back to back.

EXTERNAL BUS DESCRIPTION

The i960 Cx processors' bus controller provides an internal counter for automatically inserting wait states. The bus controller provides control of five different wait state parameters (NRAD, NRDD, NWAD, NWDD and NXDA). Figure 11-1 and the following text describe each parameter.

NRAD    Number of wait cycles for Read Address-to-Data. The number of wait states between the address cycle and first read data cycle. NRAD can be programmed for 0-31 wait states.

NRDD    Number of wait cycles for Read Data-to-Data. The number of wait states between consecutive data cycles of a burst read. NRDD can be programmed for 0-3 wait states
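As a rough illustration of how the two read parameters combine, the sketch below validates the programmable ranges given above and counts the clocks for a read access. It assumes one clock for the address cycle and one clock for each data cycle; those figures, along with the class and method names, are assumptions for illustration and are not taken from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadWaitStates:
    """Read wait-state settings for one memory region (illustrative names)."""

    n_rad: int  # read address-to-data wait states, 0-31
    n_rdd: int  # read data-to-data wait states within a burst, 0-3

    def __post_init__(self) -> None:
        if not 0 <= self.n_rad <= 31:
            raise ValueError("read address-to-data wait states must be 0-31")
        if not 0 <= self.n_rdd <= 3:
            raise ValueError("read data-to-data wait states must be 0-3")

    def read_clocks(self, data_cycles: int = 1) -> int:
        """Clocks for one read access with the given number of data cycles.

        Assumes a one-clock address cycle and one-clock data cycles.
        """
        if data_cycles < 1:
            raise ValueError("an access needs at least one data cycle")
        address_cycle = 1
        waits_between_data = (data_cycles - 1) * self.n_rdd
        return address_cycle + self.n_rad + data_cycles + waits_between_data


region = ReadWaitStates(n_rad=2, n_rdd=1)
print(region.read_clocks(data_cycles=4))  # four-word burst read
```

Under these assumptions a non-burst read costs two clocks plus the address-to-data wait states, and each additional burst data cycle adds one clock plus the data-to-data wait states.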
406. instructions 9 40 INDEX fault record 7 6 address of faulting instruction field 7 6 fault subtype field 7 6 fault type field 7 6 location 7 6 7 8 optional data fields 7 7 size 7 8 structure 7 6 fault table 2 1 2 8 7 4 local call entry 7 6 location 7 4 system call entry 7 6 fault type and subtype numbers 7 2 fault types 7 4 fault 9 40 faulte faultne faultl faultle 9 40 faultg faultge 9 40 faulto faultno 9 40 faults C 6 NIF bit 7 19 overview 1 4 precision syncf 7 19 fetch latency A 34 fetch strategy A 34 field definition 1 7 flag definition 1 7 flushreg 5 9 9 42 fly by transfer mode 13 2 13 5 fmark 9 43 FP Frame Pointer g15 5 4 frame fills 5 6 Frame Pointer FP 5 4 location 2 3 frame spills 5 6 Index 5 INDEX G global registers 2 1 2 2 overview 1 7 IBR see initialization boot record ICON register 12 11 IMAP registers 12 12 IMI 14 5 implementation specific features C 1 implicit calls 5 1 7 2 IMSK register 12 7 12 14 Index with Displacement 3 7 indivisible access 2 10 initial memory image IMI 14 5 initialization 14 1 14 2 CLKIN 14 26 data structure locations 14 11 hardware requirements 14 26 power and ground 14 27 Initialization Boot Record IBR 2 1 2 8 14 5 initialization mechanism C 5 instruction buffer 2 14 Instruction Cache cache replacement A 36 fetch latency A 34 fetch strategy A 34 locking A 33 Index 6 m intel instruction cache 2 1 2 13
407. int TC brf Hardware Breakpoint Event Flags Instruction Breakpoint 0 TC iOf Reserved Instruction Address Breakpoint 1 TC i1f Data Address Breakpoint 0 TC dOf Data Address Breakpoint 1 TC d1f F_CA 023A Figure F 22 Trace Controls TC Register Section 8 1 1 Trace Controls TC Register pg 8 2 F 20 intel REGISTER AND DATA STRUCTURES AC Register Initial Image Condition Code Bits AC cc Integer Overflow Flag AC of 0 no overflow 1 overflow Integer Overflow Mask Bit AC om 0 enable overflow faults 1 mask overflow faults No Imprecise Faults Bit AC nif 0 allow imprecise fault conditions 1 prevent imprecise fault conditions 31 28 24 20 16 n 12 Fault Configuration Word Must be set to 1 31 28 24 20 16 12 8 4 Initialize to 0 0 Mask Non Aligned Bus Request Fault 0 enable the fault 1 mask the fault Instruction Cache Configuration Word Disable Instruction Cache 0 enable cache 1 disable cache 31 28 24 20 16 12 8 0 Register Cache Configuration Word Number of cached register sets 0 15 1 51 28 24 20 16 12 8 4 0 Reserved CR076A Figure F 23 Process Control Block Configuration Words Section 14 3 REQUIRED DATA STRUCTURES pg 14 11 F 21 intel INDEX intel A Absolute absolute displacement 3 6 absolute offset 3 6 AC registe
408. integer, bit or bit field instruction.

3.1.5 Data Alignment

Data in registers and memory must adhere to specific alignment requirements:

• Align long-word operands in registers to double-register boundaries.
• Align triple- and quad-word operands in registers to quad-register boundaries.

For the i960 Cx processors, data alignment in memory is not required. Unaligned memory accesses, by programmable option, can either cause a fault or be handled automatically. Refer to section 2.5.2, Data and Instruction Alignment in the Address Space (pg. 2-11) for a complete description of alignment requirements for data and instructions.

3.2 BYTE ORDERING

i960 Cx processors can be programmed to use little or big endian byte ordering for memory accesses. Byte ordering refers to how data items larger than one byte are assembled:

• For little endian byte order, the byte with the lowest address in a multi-byte data item has the least significance.
• For big endian byte order, the byte with the lowest address in a multi-byte data item has the most significance.

For example, Table 3-3 shows 4 bytes of data in memory. Table 3-4 shows the differences between little and big endian accesses for byte, short and word data. Figure 3-2 shows the resultant data placement in registers.

Once data is read into registers, byte order is no longer relevant. The least significant bit is always bit 0. The most significant bit is always bit 31 for words
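The two byte orders can be demonstrated with a minimal sketch. The four byte values and the helper name are hypothetical and are not the contents of Table 3-3; only the ordering rule comes from the text above.

```python
# Hypothetical memory contents at addresses 0..3 (not taken from Table 3-3).
memory = bytes([0x12, 0x34, 0x56, 0x78])


def read_item(mem: bytes, address: int, size: int, byte_order: str) -> int:
    """Assemble a byte (1), short (2) or word (4) from memory.

    byte_order is "little" (lowest address is least significant)
    or "big" (lowest address is most significant).
    """
    return int.from_bytes(mem[address:address + size], byteorder=byte_order)


for size, name in ((1, "byte"), (2, "short"), (4, "word")):
    little = read_item(memory, 0, size, "little")
    big = read_item(memory, 0, size, "big")
    print(f"{name:5s} little={little:#x} big={big:#x}")
```

A single byte reads the same either way; the difference appears only for items larger than one byte.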
409. integer stores cannot be represented correctly in the destination width, an Arithmetic Integer Overflow fault is signaled.

st, stl, stt and stq copy 4, 8, 12 and 16 bytes, respectively, from successive registers to memory. For stl, src must specify an even-numbered register (e.g., g0, g2 or r0, r2). For stt and stq, src must specify a register number that is a multiple of four (e.g., g0, g4, g8 or r0, r4, r8).

Operation:
    st:    memory_word (dst) ← src
    stob:  memory_byte (dst) ← src truncated to 8 bits
    stib:  memory_byte (dst) ← src truncated to 8 bits
    stos:  memory_short (dst) ← src truncated to 16 bits
    stis:  memory_short (dst) ← src truncated to 16 bits
    stl:   memory_long (dst) ← src
    stt:   memory_triple (dst) ← src
    stq:   memory_quad (dst) ← src

Faults:
    Operation    Unaligned          An unaligned dst was referenced and bit 30 of the Fault Configuration Word is 0.
                 Invalid Operand    Invalid operand value encountered.

9-73 INSTRUCTION SET REFERENCE

                 Invalid Opcode     Invalid opcode encoding encountered.
    Arithmetic   Integer Overflow   Result is too large for destination (stib and stis only). If overflow occurs and AC.om = 1, the fault is suppressed and AC.of is set to 1. After an overflow, destination contains the least significant n bits of the store, where n is the transfer width (8 or 16 bits).
    Type         Mismatch           Non-supervisor attempt to write to internal data RAM.

Example: st g2
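The truncation and overflow behavior of the narrow stores can be modeled in a few lines. This is a behavioral sketch only: the function names are hypothetical, register values are modeled as Python integers, and fault delivery is reduced to a returned flag.

```python
def store_ordinal(src: int, bits: int) -> int:
    """Model stob/stos: keep the least significant `bits` bits of src."""
    return src & ((1 << bits) - 1)


def store_integer(src: int, bits: int) -> tuple[int, bool]:
    """Model stib/stis for a signed src value.

    Returns (stored_bits, overflow). Overflow is flagged when src cannot be
    represented as a signed value of the destination width; the destination
    still receives the least significant `bits` bits, as the text describes.
    """
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    overflow = not (low <= src <= high)
    return src & ((1 << bits) - 1), overflow


print(hex(store_ordinal(0x12345678, 8)))  # ordinal byte store
print(store_integer(-1, 8))               # fits in a signed byte
print(store_integer(200, 8))              # too large for a signed byte
```

Whether the overflow flag results in a fault or only sets AC.of depends on the overflow mask bit, which this sketch does not model.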
410. interrupt is requested simultaneously by a dedicated and an expanded mode source, the interrupt is considered an expanded mode interrupt and the IMSK register is handled accordingly.

The IMSK register must be saved and cleared when expanded mode inputs request a priority 31 interrupt. Priority 31 interrupts are interrupted by other priority 31 interrupts. In expanded mode, the interrupt pins are level-activated. For level-activated interrupt inputs, instructions within the interrupt handler are typically responsible for causing the source to deactivate. If these priority 31 interrupts are not masked, another priority 31 interrupt will be signaled and serviced before the handler is able to deactivate the source. The first instruction of the interrupt handling procedure is never reached unless the option is selected to clear the IMSK register on entry to the interrupt.

Another use of the mask is to lock out other interrupts when executing time-critical portions of an interrupt handling procedure. All hardware-generated interrupts are masked until software explicitly replaces the mask. The processor does not restore r3 to the IMSK register when the interrupt return is executed. If the IMSK register is cleared, the interrupt handler must restore the IMSK register to enable interrupts after return from the handler.

12.3 EXTERNAL INTERFACE DESCRIPTION

This section describes the physical characteristics of the interrupt inputs. The i960 Cx processors
411. ion of an Unaligned DMA 13 13 DMA Chaining Operation 13 14 Source Chaining 13 15 Synchronizing to Chained Buffer Transfers 13 17 DMA Command Register DMAC 13 22 Setup DMA sdma Instruction Operands 13 25 DMA Control Word 13 26 DMA Data RAM 13 28 DMA External Interface 13 30 DMA Request and Acknowledge Timing 13 32 intel Figure 13 15 Figure 13 16 Figure 13 17 Figure 14 1 Figure 14 2 Figure 14 3 Figure 14 4 Figure 14 5 Figure 14 6 Figure 14 7 Figure 14 8 Figure 14 9 Figure A 1 Figure A 2 Figure A 3 Figure A 4 Figure A 5 Figure A 6 Figure A 7 Figure A 8 Figure A 9 Figure A 10 Figure A 11 Figure A 12 Figure A 13 Figure A 14 Figure A 15 Figure A 16 Figure A 17 Figure A 18 Figure A 19 Figure B 1 Figure B 2 Figure B 3 Figure B 4 Figure B 5 Figure B 6 Figure B 7 Figure B 8 Figure B 9 CONTENTS EOP30 Timing cu ei teet e te ebbe ete teh e eds 13 33 DMA and User Requests in the Bus 13 36 DMA Throughput and 13 38 14 4 Initial Memory Image IMI and Process Control Block PROB 14 6 Process Control Block Configuration Words eene 14 9 Processor Initialization 14 13 Lowpass Filter 14 27 Reducing Characteristic 14 28 Series Te
412. ion register locking, the processors use a 33rd bit in each register to indicate whether the register is available or locked. This bit is called the scoreboard bit. There is a scoreboard bit for each of the 32 registers.

Table A-4. Scoreboarded Resource Conditions

Condition         Description

BCU Queue Full    Bus Controller queues are full and the scheduler is attempting to issue a memory request.

MDU Busy          The Multiply/Divide Unit is busy executing a previously issued instruction and the scheduler is attempting to issue another instruction for which the MDU is responsible.

DR Busy           On-chip data RAM can support one 128-bit load or store every clock. However, the data RAM has no queues for storing requests. The unit stalls execution if a new request is issued to it when it has not been allowed to return data from a prior instruction. For example, if the DR and BCU attempt to return results over the load bus in the same clock, the BCU wins the arbitration. This delays the DR result by one clock. If, simultaneously, the IS is attempting to issue another instruction to the data RAM, the DR stalls the processor for one clock.

Register bypassing eliminates a pipeline stall that would otherwise occur when one parallel processing unit is returning a result to a register over one port while, in the same clock, another unit is accessing the same register over a different port
413. ion specified with targ and begins execution of new procedure callx performs the same operation as call except the target instruction can be farther than 223 to 223 4 bytes from current IP The operand is a memory type which allows the full range of addressing modes to be used to specify the IP of the target instruction The IP displacement addressing mode allows the instruction to be IP relative Indirect calls can be performed by placing the target address in a register and then using one of the register indirect addressing modes Refer to CHAPTER 3 DATA TYPES AND MEMORY ADDRESSING MODES for a complete discussion of addressing modes wait for any uncompleted instructions to finish temp lt SP 0x10 andnot 0xf round to next boundary RIP lt next IP memory FP lt r0 15 these accesses are cached in local register cache lt FP PFP rt lt 000 FP lt temp SP lt temp 64 IP lt targ Trace Instruction Call Instruction and Call Trace Events are signaled after instruction completion Trace fault is generated if PC te 1 and TC i or 1 Operation Unimplemented Execution from on chip data Operand Invalid operand value encountered Opcode Invalid operand encoding encountered intel INSTRUCTION SET REFERENCE Example 11 95 IP lt 45 where the address in g5 is the address of the new procedure Opcode callx 86H MEM See Als
414. ipherals. DMA channels perform single-cycle or multi-cycle transfers and can perform data packing and unpacking between peripherals and memory with varying bus widths. Also provided are block transfers, in addition to source- or destination-synchronized transfers. The DMA supports various transfer types, such as high-speed fly-by, quad-word transfers and data chaining with the use of linked descriptor lists. The high-performance fly-by mode is capable of transfer speeds of up to 59 Mbytes per second at 33 MHz.

1-4 INTRODUCTION

1.2.3 Priority Interrupt Controller

The interrupt controller provides full programmability of 248 interrupt sources into 31 priority levels. The interrupt controller handles prioritization of software interrupts, hardware interrupts and process priority. In addition, it also manages four internal sources from the DMA controller and a single non-maskable interrupt input.

1.3 ABOUT THIS MANUAL

This i960 CA/CF Microprocessor User's Manual provides detailed programming and hardware design information for the i960 Cx microprocessors. It is written for programmers and hardware designers who understand the basic operating principles of microprocessors and their systems.

This manual does not provide electrical specifications such as DC and AC parametrics, operating conditions and packaging specifications. Such information is found in the 80960 microprocessor data sheets: 80960CA order number is 270727 8096
415. is in the same executable group as the instruction which modified the condition codes, or
• the branch takes one clock if it is in the executable group adjacent to the group that modifies the condition codes.

A.2.5 Instruction Cache And Fetch Execution

The instruction cache provides three or four consecutive opcode words to the IS on every clock. This capability allows the processor to dispatch instructions from the processor's sequential instruction stream to multiple independent parallel processing units. When a cache miss occurs, or is about to occur, the Instruction Fetch Unit issues instruction fetch requests to the BCU.

A.2.5.1 Instruction Cache Organization

The i960 Cx processors' instruction cache is a two-way set associative cache, organized in two sets of eight-word lines:

• The i960 CA processor cache is 1 KByte, organized as two sets of 16 eight-word lines.
• The i960 CF processor cache is 4 KBytes, organized as two sets of 64 eight-word lines.

Each line is composed of four two-word blocks which can be replaced independently. On every clock, the cache accesses one or two lines and multiplexes the correct three or four words to the IS. Three words are valid if the requested address is for an odd word in memory (A2 = 1). Four words are valid if the requested address is for an even word of memory (A2 = 0).

The i960 CA processor's instruction cache supports pre-loading and locking of none, half or all of the instruction cache
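The cache sizes quoted above follow directly from the organization, and the three-or-four-word rule depends only on address bit 2. The sketch below checks both; the function names are hypothetical, and a word is taken to be 4 bytes.

```python
WORD_BYTES = 4      # one opcode word is 32 bits
WORDS_PER_LINE = 8  # eight-word lines


def cache_size_bytes(sets: int, lines_per_set: int) -> int:
    """Total cache capacity for the given organization."""
    return sets * lines_per_set * WORDS_PER_LINE * WORD_BYTES


def valid_words(address: int) -> int:
    """Words delivered to the IS: 3 for an odd word (A2 = 1), 4 for an even word."""
    a2 = (address >> 2) & 1
    return 3 if a2 else 4


print(cache_size_bytes(sets=2, lines_per_set=16))  # i960 CA organization
print(cache_size_bytes(sets=2, lines_per_set=64))  # i960 CF organization
print(valid_words(0x1004), valid_words(0x1008))
```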
416. is not affected. In this case, the processor allows work on a program to be resumed at the point where the fault occurred, following a return from a fault handling procedure (initiated with a ret instruction). The resumption mechanism used here is similar to that provided for returning from an interrupt handler.

To use this mechanism, the fault handling procedure must be invoked using a supervisor call. This method is required because, to resume work on the program at the point where the fault occurred, the saved process controls in the fault record must be copied back into the PC register upon return from the fault handling procedure. The processor only performs this action if the processor is in supervisor mode when the return is executed.

7.7.4 Returning to a Point in the Program Other Than Where the Fault Occurred

A fault handling procedure can also return to a point in the program other than where the fault occurred. To do this, the fault procedure must alter the RIP.

7-18 FAULTS

To predictably perform a return from a fault handling procedure to an alternate point in the program, the fault handling procedure should perform the following four steps:

1. Flush the local register sets to the stack with the flushreg instruction.
2. Modify the RIP in the previous frame.
3. Clear the trace fault pending flag in the fault record's process controls field before the return.
4. Execute a return with the ret instruction.

Use this te
417. ister is 010₂ or 011₂.

Breakpoint: Generates a trace event following any processor action that causes a breakpoint condition (such as a mark or fmark instruction, or a match of the instruction address breakpoint register or the data address breakpoint register).

Trace fault subtype and fault subtype field bits are associated with each mode. Multiple fault subtypes can occur simultaneously; the fault subtype bit is set for each subtype that occurs.

When a fault type other than a trace fault is generated during execution of an instruction that causes a trace event, the non-trace fault is handled before the trace fault. An exception is the prereturn trace fault, which occurs before the processor detects a non-trace fault and is handled first.

Similarly, if an interrupt occurs during an instruction that causes a trace event, the interrupt is serviced before the trace fault is handled. Again, the prereturn trace fault is an exception. Since it is generated before the instruction, it is handled before any interrupt that occurs during instruction execution.

The address of the faulting instruction field in the fault record contains the IP for the instruction that causes the trace event. For the prereturn trace fault, this field has no defined value. IP for the instruction that would have executed next if the fault had not occurred. This fault type is always precise, regardless of the NIF bit value. A change in the program's control flow accompanies all
418. ities indivisible and atomic access are required only when multiple processors or other external agents, such as DMA or graphics controllers, share a common memory.

indivisible access    Guarantees that a processor reading or writing a set of memory locations completes the operation before another processor or external agent can read or write the same location. The processor requires indivisible access within an aligned 16-byte block of memory.

atomic access         A read-modify-write operation. Here, the external memory system must guarantee that once a processor begins a read-modify-write operation on an aligned 16-byte block of memory, it is allowed to complete the operation before another processor or external agent is allowed access to the same location. An atomic memory system can be implemented by using the LOCK signal to qualify hold requests from external bus agents. LOCK is asserted for the duration of an atomic memory operation.

The upper 16 Mbytes of the address space (addresses FF00 0000H through FFFF FFFFH) are reserved for implementation-specific functions. In general, programs can access this address space section unless an implementation specifically uses the memory or forbids access. This address range is termed reserved so future i960 architecture implementations may use these addresses for special functions such as mapped registers or data structures. Therefore, to ensure complete object-level compatibility po
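Two address checks implied by this passage can be written down as a small sketch: whether an access stays inside one aligned 16-byte block, and whether an address falls in the reserved upper 16 Mbytes of the 32-bit address space (which begins at FF00 0000H, since 16 Mbytes is 2^24 bytes). The function names are hypothetical.

```python
ADDRESS_SPACE_TOP = 0xFFFF_FFFF
RESERVED_BYTES = 16 * 1024 * 1024                       # upper 16 Mbytes
RESERVED_BASE = ADDRESS_SPACE_TOP - RESERVED_BYTES + 1  # 0xFF00_0000


def in_reserved_space(address: int) -> bool:
    """True if the address lies in the reserved upper 16 Mbytes."""
    return RESERVED_BASE <= address <= ADDRESS_SPACE_TOP


def within_aligned_16_byte_block(address: int, length: int) -> bool:
    """True if [address, address + length) stays inside one aligned 16-byte block."""
    if length < 1:
        raise ValueError("length must be at least one byte")
    return (address >> 4) == ((address + length - 1) >> 4)


print(hex(RESERVED_BASE))
print(within_aligned_16_byte_block(0x1000, 16))  # exactly one aligned block
print(within_aligned_16_byte_block(0x100C, 8))   # crosses a 16-byte boundary
```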
419. itpos src M1 targ T 52 0011 1000 5 1 src2 M1 targ 52 0011 1001 src1 src2 M1 targ 52 0011 1010 5 1 src2 M1 targ 52 0011 1011 5 1 src2 M1 targ 52 0011 1100 src1 src2 M1 targ 52 0011 1101 src1 src2 M1 targ 52 00111110 src1 src2 M1 targ 52 00111111 src1 src2 M1 targ 52 Table E 4 CTRL Format Instruction Encodings Opcode 09 0A 0B 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1 call ret bal bno bg be bge bl bne ble bo faultno faultg faulte faultge faultl faultne faultle faulto MACHINE LANGUAGE INSTRUCTION REFERENCE t g 8 5 o 2 S 2 24 2 1 0 0000 1000 targ T 0 0000 1001 targ T 0 0000 1010 T 0 0000 1011 targ T 0 0001 0000 targ T 0 0001 0001 targ T 0 0001 0010 targ T 0 0001 0011 targ T 0 0001 0100 targ T 0 0001 0101 targ T 0 0001 0110 targ T 0 0001 0111 targ T 0 0001 1000 T 0 0001 1001 T 0 0001 1010 T 0 0001 1011 T 0 0001 1100 T 0 0001 1101 T 0 0001 1110 T 0 0001 1111 T 0 E 5 MACHINE LANGUAGE INSTRUCTION REFERENCE Table E 5 MEM Format Instruction Encodings i 24 23 19 18 14 1312 110 Opcode src dst ABASE Mode
420. ix When sysctl executes, the load control register message type and group number are specified; sysctl moves the quad-word group of register values from the control table in memory and writes the values to on-chip registers. See section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19).

At initialization, the control table is automatically loaded into the on-chip control registers. This action simplifies the user's startup code by providing a transparent setup of the processor's peripherals at initialization. See CHAPTER 14, INITIALIZATION AND SYSTEM REQUIREMENTS.

2-6 PROGRAMMING ENVIRONMENT

[Figure 2-2. Control Table]

2.4 ARCHITECTURE DEFINED DATA STRUCTURES

The architecture defines a set of data structures including stacks, interfaces to system procedures, interrupt handling procedures and fault handling procedures. Table 2-3 defines the data structures and references other sections of this manual where detailed information can be found.

Table 2-3. Data Structure Descriptions

Structure (see also)                                    Description

user stack                                              The processor uses this stack when executing application code.
(section 5.6, USER AND SUPERVISOR STACKS, pg. 5-15)

system procedure table                                  Contains pointers to system procedures. Application code uses
(section 2.7, USER SUPERVISOR PROTECTION MODEL,         supervisor call switches ex
pg. 2-20)
421. ized 8 to 32 bit transfer requires 4 DMA requests before the transfer is complete In this case Bxpggg 4 Nxrgg739 clocks but Bggo 1 Brag because the source of this source synchronized transfer is only 8 bits 1 byte wide This leads to a of 39 4 clocks By changing this example from source synchronized to destination synchronized becomes 39 clocks This is due to the fact that the destination is 32 bits 4 bytes wide and a complete transfer occurs every DMA request 13 11 12 DMA Latency DMA latency in a system depends on the following factors e DMA transfer type and subsequently the worst case throughput value calculated for that transfer e Number of channels enabled and the priority of the requesting channel e Status of the suspend DMA on interrupt bit in the Interrupt Control register ICON dmas DMA latency is the sum of the worst case throughput for the channel plus added components which are dependent on the configuration of the DMA controller DMA latency is denoted as Ni ATENCY in the following discussion and is measured in number of PCLK2 1 cycles Table 13 6 shows the values for worst case throughput Nrggo NT first and chain describe DMA throughput Nrggo derived in Equation 13 3 describes the average DMA throughput measured for a transfer which is in progress NT first chain represent boundary conditions of throughput for the following conditions 7 7 First DMA transfer in no
422. k depending on the mode in which the fault is handled. Once these data structures and the code for the fault procedures are established in memory, the processor handles faults automatically and independently from application software.

The processor can detect a fault at any time while executing instructions, whether from a program, interrupt handling procedure or fault handling procedure. If a fault occurs during program execution, the processor determines the fault type and selects a corresponding fault handling procedure from the fault table. It then invokes the fault handling procedure by means of an implicit call. As described later in this chapter, the fault handler call can be:

• a local call (call-extended operation)
• a system-local call (local call through the system procedure table)
• a system-supervisor call (also through the system procedure table)

As part of the implicit call to the fault handling procedure, the processor creates a fault record on the stack that the fault handling procedure is using. This record includes information on the fault and the processor's state when the fault was generated.

After the fault record is created, the processor executes the selected fault handling procedure. If the fault handling procedure recovers from the fault, the processor then restores itself to its state prior to the fault and resumes program execution with no break in program control flow. If the fault handling procedure cannot recover f
423. k cycle, a processor must decode multiple instructions in parallel and simultaneously issue these instructions to parallel processing units. The various processing units must then be able to independently access instruction operands in parallel from a common register set.

The on-chip instruction cache enables parallel decode by constantly providing the next four unexecuted instructions to the processor's instruction scheduler. In a single clock cycle, the scheduler inspects all four instructions and issues one, two or three of these instructions in the same clock cycle.

1. Throughout this manual, i960 Cx refers to both the i960 CA and CF microprocessors. Information that is specific to each is clearly indicated.

[Block diagram: Four-Channel DMA Port and DMA Controller; Instruction Prefetch Queue; Instruction Cache (two-way set associative); Memory Region Configuration and Bus Controller with Bus Request Queues; Interrupt Port and Programmable Interrupt Controller; Parallel Instruction Scheduler; 1 Kbyte Direct Mapped Data Cache; Multiply/Divide Unit; Data RAM; Execution Unit; Register Cache (5 to 15 sets) and Register File; Address Generation Unit; Register Side and Memory Side Machine Buses; 64-bit SRC1, SRC2 and DST buses; 32-bit Base Bus; 128-bit Load and Store Bu
424. k unchanged
    01 = move to R3 and clear for dedicated mode interrupts
    10 = move to R3 and clear for expanded mode interrupts
    11 = move to R3 and clear for dedicated and expanded mode interrupts
Vector Cache Enable (ICON.vce): 0 = fetch from external memory; 1 = fetch from internal RAM
Sampling Mode (ICON.sm): 0 = debounce; 1 = fast
DMA Suspension (ICON.dmas): 0 = run on interrupt; 1 = suspend on interrupt
Reserved: initialize to 0

Figure 12-6. Interrupt Control (ICON) Register

Errata (12-06-94 SRB): Vector Cache Enable bits (ICON.vce) incorrectly defined. Bit 0 was "debounce"; it now is correctly defined as "Fetch From External Memory". Bit 1 was "Fast"; it is now correctly defined as "Fetch From Internal RAM".

The interrupt mode field (bits 0 and 1) determines the operation mode for the external interrupt pins (XINT7:0): dedicated, expanded or mixed.

The signal detection mode bits (bits 2-9) determine whether the signals on the individual external interrupt pins (XINT7:0) are level-low activated or falling-edge activated. Expanded mode inputs are always level-detected; the NMI input is always edge-detected, regardless of the bit's value.

The global interrupts enable bit (bit 10) globally enables or disables the external interrupt pins and DMA inputs. It does not affect the NMI pin. T
425. PROCEDURE CALLS

available number of return argument registers, the calling procedure passes a pointer to an argument list on its stack where the remaining return values will be placed. Example 5-1 illustrates parameter passing by value and reference.

Local registers are automatically saved when a call is made. Because of the local register cache, they are saved quickly and with no external bus traffic. The efficiency of the local register mechanism plays an important role in two cases when calls are made:

1. When a procedure is called which contains other calls, global parameter registers are moved to working local registers at the beginning of the procedure. In this way, parameter registers are freed and nested calls are easily managed. The register move instruction necessary to perform this action is very fast; the working parameters, now in local registers, are saved efficiently when nested calls are made.

2. When other procedures are nested within an interrupt or fault procedure, the procedure must preserve all normally non-preserved parameter registers. This is necessary because the interrupt or fault occurs at any point in the user's program and a return from an interrupt or fault must restore the exact processor state. The interrupt or fault procedure can move non-preserved global registers to local registers before the nested call.

Example 5-1. Parameter Passing Code Example

Example of parameter passing
426. l This operand is the channel number (0-3) which is set up with sdma. Values other than the valid channel numbers are reserved and can cause unpredictable results if used. op2: This operand is the DMA control word for the channel. The control word selects the modes and options for a DMA. The value of this operand is described in section 13.10.3, DMA Control Word (pg. 13-25). op3: This operand is used differently depending on the DMA configuration and must be a quad-aligned register (r4, r8, r12, g0, g4, g8 or g12). Non-chaining multi-cycle DMAs: op3 is the first of three consecutive 32-bit registers. The first register must be programmed with the byte count, the second with the source address, the third with the destination address. Non-chained fly-by DMAs: op3 is the first of two consecutive 32-bit registers. The first register must be programmed with the byte count, the second with the fly-by address. All chained DMAs: op3 is a single 32-bit register. op3 must be programmed with a pointer to the first chaining descriptor. See section 13.5, DATA CHAINING (pg. 13-13) for more information on chaining descriptors. The channel setup mechanism started with the sdma instruction is two-part (sdma is a multi-cycle instruction). When sdma is issued: 1. the instruction executes, reading the register operands for the DMA operation, then completes, freeing these registers for use by other instructions; 2. a DMA setup process is triggered to c
427. l request lengths. Packing and unpacking are handled more efficiently by the bus controller unit. Matching the request lengths may increase latency for bus requests issued by the user process. Quad-word source and destination request lengths are used for highest DMA performance. Quad transfers use the external bus most efficiently when the source or destination memory regions support burst accesses. Since the request length for quad-word transfers is always greater than the bus width, DMA devices must support multiple data cycles for each requested DMA transfer. Using quad-word request lengths may increase bus latency for loads, stores and instruction fetches that the user's program generates. In cases where the source address, destination address or byte count are unaligned, requests shorter than the selected request length are issued to align the transfers. Refer to section 13.4.5, Data Alignment (pg. 13-10). 13.4.4 Assembly and Disassembly. The DMA controller internally assembles or disassembles data between different source and destination request lengths. Assembly refers to the packing of narrow data into wider data. Disassembly refers to the unpacking of wide data into narrow data. Assembly and disassembly are performed automatically when a channel is set up with different source and destination request lengths. Assembly and disassembly are performed for all aligned transfers configured with combinations of by
428. l and Branch If Less Or Equal; Compare Ordinal and Branch If Greater; Compare Ordinal and Branch If Greater Or Equal. Format: cmpib*{.t|.f} src1, src2, targ (src1: reg/lit; src2: reg/sfr; targ: disp); cmpob*{.t|.f} src1, src2, targ (src1: reg/lit; src2: reg/sfr; targ: disp). Description: Compares src2 and src1 values and sets the AC register condition code according to the comparison results. If the logical AND of the condition code and the mask part of the opcode is not zero, the processor branches to the instruction specified with targ; otherwise, the processor goes to the next instruction. An optional .t or .f suffix may be appended to the mnemonic. Use .t to speed up execution when these instructions usually take the branch. Use .f to speed up execution when these instructions usually do not take the branch. If a suffix is not provided, the assembler is free to provide one. targ can be no farther than -2^12 to (2^12 - 4) bytes from the current IP. When using the Intel i960 processor assembler, targ must be a label which specifies the target instruction's IP. The following table shows the condition code mask for each instruction. The mask is in bits 0-2 of the opcode. Functions these instructions perform can be duplicated with a cmpi or cmpo followed by a branch-if instruction, as described in section 9.3.17, cmpi (pg. 9-29). Instruction / Mask (binary) / Branch Condition: cmpibno / 000 / No Condition; cmpibg / 001 / src1 > src2; cmpibe / 010 / src1 = src2; 01
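The branch rule described above (branch when the condition code ANDed with the opcode mask is non-zero) can be sketched as a small model. This is an illustrative sketch only, not processor microcode: the function names are hypothetical, and the bit assignment for "less than" (100) is an assumption based on the usual i960 condition-code encoding, since the excerpt only shows the masks for "greater" (001) and "equal" (010).

```python
# Condition-code bit values. GREATER and EQUAL follow the masks quoted in the
# table above; LESS = 0b100 is an assumption (not visible in the excerpt).
CC_GREATER = 0b001
CC_EQUAL = 0b010
CC_LESS = 0b100

# Opcode masks quoted in the table above.
MASK_CMPIBNO = 0b000  # "No Condition": the AND is always zero, never branches
MASK_CMPIBG = 0b001   # branch if src1 > src2
MASK_CMPIBE = 0b010   # branch if src1 = src2


def compare(src1: int, src2: int) -> int:
    """Return the condition code that comparing src1 with src2 would set."""
    if src1 < src2:
        return CC_LESS
    if src1 == src2:
        return CC_EQUAL
    return CC_GREATER


def compare_and_branch(src1: int, src2: int, mask: int) -> bool:
    """Return True when the branch is taken: (condition code AND mask) != 0."""
    return (compare(src1, src2) & mask) != 0
```

Combining mask bits gives the compound conditions: for example, a mask of 011 (greater OR equal) branches whenever src1 >= src2 under this model.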
429. l code and data. The code and data structures in the shaded areas can only be accessed in supervisor mode. In this example, kernel procedures are accessed through the system procedure table with system supervisor calls. These procedures execute in supervisor mode. Some application procedures are also called through the system procedure table using a system local call. Fault procedures are executed in supervisor mode by directing the faults through the system procedure table. Interrupt procedures, which are likely to modify SFRs, process controls or use other supervisor operations, are executed in supervisor mode. The interrupt stack and supervisor stack are insulated from the user stack in this system. If an application does not require the user-supervisor protection mechanism, the processor can always execute in supervisor mode. At initialization, the processor is placed in supervisor mode prior to executing the first instruction of the application code. The processor then remains in supervisor mode indefinitely, as long as no action is taken to change execution mode to user mode. The processor does not need a user stack in this case. [Figure: block diagram showing the Application Program, System Procedure Table, System Exec, Fault Table, Fault Handlers, Interrupt Table, Interrupt Handlers, Supervisor Stack and Interrupt Stack; shading indicates data structures in protected memory.] C
430. l instruction fetches; this maximizes the core's performance. i960 Cx processors use a microcode engine to implement complex instructions and functions. This includes implicit and explicit calls, returns, DMA assists and initialization sequences. Microcode provides a method for implementing complex instructions in the processor's RISC environment. Unlike conventional microcode, i960 Cx processor microcode uses a RISC subset of the instruction set in addition to specific micro-instructions. Microcode, therefore, can be thought of as a RISC program containing operational routines for complex instructions. When the instruction pointer references a microcoded instruction, the instruction fetch unit automatically branches to the appropriate microcode routine. The i960 Cx processors perform this microcode branch in 0 clocks. A.1.2 Instruction Flow. Most instructions flow through a three-stage pipeline (Figure A-3). The decode stage calculates the address used to fetch the next instruction from the instruction cache. Additionally, this stage starts decoding the instruction. The issue stage completes instruction decode and sends it to the appropriate execution unit. During the execute stage, the operation is performed and the result is returned to the RF. Figure A-3. Instruction Pipeline. In the decode stage, the IS decod
431. l synchronization; however, to guarantee detection on a particular PCLK2:1 cycle, setup and hold requirements must be satisfied. At the end of the acknowledge bus request, DREQ3:0 may be held active to initiate further DMA transfers, or DREQ3:0 may be driven inactive to prevent further transfers. Depending on DMA mode, arbitration for the next DMA transfer begins: Case 1: On the PCLK2:1 cycle in which DACK3:0 is deasserted. This timing applies to demand mode fly-by transfers and multi-cycle packing or unpacking modes with adjacent request loads or adjacent request stores. Case 2: Two PCLK2:1 cycles after DACK3:0 is deasserted. This timing applies to demand mode multi-cycle transfers with alternating request loads and stores. When DMA operation is destination-synchronized, the next load access is performed even if the request input is deasserted. This prefetch is implemented to increase performance. If the following DMA cycle is prevented, prefetch data is saved internally and stored when the next transfer is requested. The entire DMA cycle is not repeated. [Timing diagram: start and end of the DMA bus request, with the ADS, READY and DMA acknowledge signals.]
432. ld in the Process Control Block. Reserved bits are set to 0 in the AC Register Initial Image. Refer to CHAPTER 14, INITIALIZATION AND SYSTEM REQUIREMENTS. After initialization, software must not modify or depend on the AC register's reserved locations. The modify arithmetic controls (modac) instruction can be used to examine and/or modify any of the register bits. This instruction provides a mask operand that can be used to limit access to the register's specific bits or groups of bits, such as the reserved bits. The processor automatically saves and restores the AC register when it services an interrupt or handles a fault. The processor saves the current AC register state in an interrupt record or fault record, then restores the register upon returning from the interrupt or fault handler. 2.6.2.2 Condition Code. The processor sets the AC register's condition code flags (bits 0-2) to indicate the results of certain instructions, usually compare instructions. Other instructions, such as conditional branch instructions, examine these flags and perform functions as dictated by the state of the condition code. Once the processor sets the condition code flags, the flags remain unchanged until another instruction executes that modifies the field. Condition code flags show true/false conditions, inequalities (greater than, equal or less than conditions) or carry and overflow conditions for the extended arithmetic instructions. To show true or false con
433. led in this region and READY and BTERM are not asserted after the wait state generator has finished counting, wait states continue to be inserted until READY is asserted. If BTERM is asserted, READY is ignored. Data is then read and a new address cycle is generated. The data cycle is followed by Nppp wait states. These wait states separate burst data cycles and can be used to extend data access time of reads and data setup and hold times for writes. BLAST assertion indicates the end of data transfer cycles for this access. At this time, DEN is deasserted. NXDA wait states (turnaround wait states) are inserted after the last access of a bus request. NXDA wait states follow BLAST only when BLAST is asserted for the last access of a bus request. A new address cycle may start after the NXDA cycles have expired. NXDA wait states allow slow devices to get off the bus.
434. ler. These instructions may only be executed by programs operating in supervisor mode. Refer to CHAPTER 9, INSTRUCTION SET REFERENCE and CHAPTER 13, DMA CONTROLLER for a description of these instructions. 4.3 SYSTEM CONTROL FUNCTIONS. System control functions are a group of operations specific to the i960 Cx processor. These operations are performed by issuing the system control (sysctl) instruction. sysctl is a general-purpose instruction which performs a variety of functions; the message type field, an operand of the instruction, determines which function is performed. The system control functions include posting interrupts, configuring the instruction cache, invalidating the instruction cache, software reinitialization and loading control registers. 4.3.1 sysctl Instruction Syntax. sysctl instruction syntax is generalized because the function of the operands differs depending on message type selection. As shown in Figure 4-2, the instruction takes three source operands. The message type field is always the second byte (bits 15-8) of the SRC1 operand. The instruction's generalized operand fields, designated as fields 1 through 4, are interpreted differently or may not be used, depending on the function selected in the message type field (see Table 4-3). sysctl is a supervisor-only instruction. Executing this instruction while in user mode generates the type mismatch fault. [Figure 4-2: SRC1 operand divided into bits 31-16, 15-8 and 7-0.]
435. lined reads. Field / Bits / Description: NRAD Wait States / 3-7 / Number of Read Address-to-Data wait states in the region; programmed for 0-31 wait states. NRDD Wait States / 8-9 / Number of Read Data-to-Data wait states in the region; programmed for 0-3 wait states. NXDA Wait States / 10-11 / Number of X (read or write) Data-to-Address wait states in the region; programmed for 0-3 wait states. NXDA wait states are only inserted at the end of a bus request. NWAD Wait States / 12-16 / Number of Write Address-to-Data wait states in the region; programmed for 0-31 wait states. NWDD Wait States / 17-18 / Number of Write Data-to-Data wait states in the region; programmed for 0-3 wait states. Bus Width / 19-20 / Determines the region's data bus width; affects encoding of the byte enable signals (BE3:0). Byte Ordering / 22 / Selects the region's byte ordering: little endian or big endian. 10.3.2 Bus Configuration Register (BCON). The Bus Configuration (BCON) register (Figure 10-3) is a 32-bit register that controls MCON 0-15 and internal data RAM protection. Table 10-2 defines the BCON register's programmable bits. Configuration Table Valid (BCON.ctv): 0 = table not valid, 1 = table valid. Internal RAM Protection Enabled (BCON.irp): 0 = protection OFF, 1 = protection ON. Reserved bits: initialize to 0. Figure 10-3. Bus Configuration Register. THE B
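The region-configuration bit fields listed above lend themselves to a small pack/unpack helper. The following is a minimal sketch, assuming only the bit ranges given in the field list (3-7, 8-9, 10-11, 12-16, 17-18, 19-20 and 22); the dictionary keys and function names are hypothetical, and bits not described in this excerpt (for example bits 0-2 and 21) are simply left at zero.

```python
# (low bit, high bit) for each field, taken from the field list above.
# The key names are illustrative labels, not identifiers from the manual.
MCON_FIELDS = {
    "nrad": (3, 7),         # read address-to-data wait states, 0-31
    "nrdd": (8, 9),         # read data-to-data wait states, 0-3
    "nxda": (10, 11),       # data-to-address (turnaround) wait states, 0-3
    "nwad": (12, 16),       # write address-to-data wait states, 0-31
    "nwdd": (17, 18),       # write data-to-data wait states, 0-3
    "bus_width": (19, 20),  # region data bus width selector
    "byte_order": (22, 22), # region byte ordering selector
}


def encode_mcon(**fields: int) -> int:
    """Pack the named field values into a 32-bit register image."""
    value = 0
    for name, field_value in fields.items():
        low, high = MCON_FIELDS[name]
        width = high - low + 1
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name}={field_value} does not fit in {width} bits")
        value |= field_value << low
    return value


def decode_mcon(value: int) -> dict:
    """Extract every known field from a 32-bit register image."""
    decoded = {}
    for name, (low, high) in MCON_FIELDS.items():
        width = high - low + 1
        decoded[name] = (value >> low) & ((1 << width) - 1)
    return decoded
```

A round trip such as `decode_mcon(encode_mcon(nrad=2, nxda=1))` returns the same field values with all other fields reading as zero, which is a convenient check when building a control table image by hand.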
436. ll to the selected procedure. The action that the processor performs is the same as described in section 5.2.3.1, Call Operation (pg. 5-5). The call's target IP is taken from the system procedure table and the new stack frame is allocated on the current stack. The calls algorithm is described in section 9.3.12, calls (pg. 9-22). 5.5.3 System Call to a Supervisor Procedure. When a calls instruction references an entry in the system procedure table with an entry type of 010, the processor executes a system supervisor call to the selected procedure. The call's target IP is taken from the system procedure table. The processor performs the same action as described in section 5.2.3.1, Call Operation (pg. 5-5), with the following exceptions: If the processor is in user mode, it switches to supervisor mode. The new frame for the called procedure is placed on the supervisor stack. If a mode switch occurs, the state of the trace enable bit in the PC register is saved in the return type field in the PFP register. The trace enable bit is then loaded from the trace control bit in the system procedure table. When the processor switches to supervisor mode, it remains in that mode and creates new frames on the supervisor stack until a return is performed from the procedure that caused the original switch to supervisor mode. While in supervisor mode, either the local call instructions (call and callx) or calls can be used to call procedu
437. load to a full word bus operation. A.1.8.5 Write Policy. Write policy determines what happens on cacheable store operations. The write policy for the i960 CF processor is write-through and write-allocate. For cacheable stores, data is written into both the cache and external memory simultaneously, regardless of whether the write is a hit or miss. This maintains coherency between the data cache and external memory. For cacheable stores that are equal to or greater than a word in length, cache tags and appropriate valid bits are updated whenever data is written into the cache. Consider a word store as an example. The tag is always updated and its valid bit is set. The appropriate valid bit for that word is always set and the other three valid bits are always cleared. Cacheable stores that are less than a word in length are handled differently. Byte and short-word stores that hit the cache (i.e., are contained in valid words within valid cache lines) do not change the tag and valid bits. The processor writes the data into the cache and external memory as usual. A byte or short-word store to an invalid word within a valid cache line leaves the word valid bit cleared, because the rest of the word is still invalid. In all cases, the processor simultaneously writes the data into the cache and the external memory. A.1.8.6 Data Cache Coherency. DMA cycles and atomic accesses from the atmod and atadd instructions are implicitly non-cacheable. Otherwise enti
438. lock (PRCB) and system data structures. Figure 14-2 shows the IMI components. The IBR is fixed in memory; the other components are referenced directly or indirectly by pointers in the IBR and the PRCB. 14.2.5 Initialization Boot Record (IBR). The IBR is the primary data structure required to initialize the i960 Cx processor. The IBR is a 12-word structure which must be located at address FFFF FF00H (see Figure 14-2). The IBR is made up of four components: the initial bus configuration data, the first instruction pointer, the PRCB pointer and the self-test checksum data. [Figure 14-2 contents] Fixed data structures: Initialization Boot Record at FFFFFF00H, holding the initial bus configuration (least significant byte of each word) from FFFFFF00H, the first instruction pointer at FFFFFF10H, the PRCB pointer at FFFFFF14H, and 6 check words for the bus confidence self-test from FFFFFF18H through FFFFFF2CH. Relocatable data structures: user code; the Process Control Block (PRCB), holding the fault table base address, control table base address, AC register initial image, fault configuration word, interrupt table base address, system procedure table base address, a reserved word, interrupt stack pointer, instruction cache configuration word and register cache configuration word; the control table; the interrupt table; the system procedure table; and other architecturally defined data structures (not required as part of the IMI). 00H 04H 08H
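The IBR layout described above can be written out as a simple address map. This is a rough sketch for checking the arithmetic, not a boot ROM generator: the entry names are hypothetical, and the assumption that the bus configuration data occupies the first four words is inferred from the addresses in the figure (FFFFFF00H up to the first instruction pointer at FFFFFF10H).

```python
IBR_BASE = 0xFFFFFF00   # fixed IBR address given in the text
WORD_BYTES = 4


def ibr_layout(base: int = IBR_BASE) -> dict:
    """Return a name -> address map for the 12 words of the IBR.

    Assumption: four bus-configuration words precede the first instruction
    pointer, with only the least significant byte of each word being used.
    """
    layout = {}
    for i in range(4):
        layout[f"bus_config_word_{i}"] = base + i * WORD_BYTES
    layout["first_instruction_pointer"] = base + 0x10
    layout["prcb_pointer"] = base + 0x14
    for i in range(6):
        layout[f"check_word_{i}"] = base + 0x18 + i * WORD_BYTES
    return layout


if __name__ == "__main__":
    for name, address in ibr_layout().items():
        print(f"{address:08X}H  {name}")
```

The map contains exactly 12 entries, which matches the 12-word size stated for the IBR, and the last check word lands at FFFFFF2CH as shown in the figure.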
439. ls to leaf procedures. Leaf procedures typically call no other procedures. Branch-and-link is the fastest way to make a call, providing the calling procedure does not require its own registers or stack frame. CHAPTER 6 INTERRUPTS. This chapter describes how a programmer uses the processor's interrupt mechanism, defines data structures used for interrupt handling and describes actions that the processor takes when handling an interrupt. CHAPTER 12, INTERRUPT CONTROLLER describes the mechanism for signaling and posting interrupts; it is best suited for a system implementor. 6.1 OVERVIEW. An interrupt is an event that causes a temporary break in program execution so the processor can handle another chore. Interrupts commonly request I/O services or synchronize the processor with some external hardware activity. For interrupt handler portability across the i960 processor family implementations, the architecture defines a consistent interrupt state and interrupt priority handling mechanism. To manage and prioritize interrupt requests in parallel with processor execution, the i960 Cx processors provide an on-chip programmable interrupt controller. Requests for interrupt service come from many sources. These requests are prioritized so that instruction execution is redirected only if an interrupt request is of higher priority than that of the executing task. When the processor is redirected to service an int
440. lue from memory is stored in dst. Memory read and write are done atomically (i.e., other processors must be prevented from accessing the word of memory containing the word specified by the src/dst operand until the operation completes). The memory location in src/dst is the word's first byte (LSB) address. The address is automatically aligned to a word boundary. Note that the src/dst operand maps to the src1 operand of the REG format. Action: tempa ← src/dst andnot 0x3 (force alignment to word boundary); temp ← memory.word[tempa] (LOCK asserted at beginning of read); memory.word[tempa] ← temp + src (ordinal addition; LOCK deasserted after memory write); dst ← temp. Faults: Type Mismatch: non-supervisor reference of an sfr and/or non-supervisor attempt to write to internal data RAM. Example: atadd r8, r2, r11 (the word at the address in r8 ← r2 + that word, where r8 specifies the address of a word in memory; r11 ← initial value stored at address r8 in memory). Opcode: atadd, 612H, REG. 9.3.6 atmod. Mnemonic: atmod (Atomic Modify). Format: atmod src, mask, src/dst (src: reg/sfr, an address; mask: reg/lit/sfr; src/dst: reg/sfr). Description: Copies the selected bits of the src/dst value into the memory location specified in src. Bits set in the mask operand select the bits to be modified in memory. The initial value from memory is stored in src/dst. Memory read and write are done atomically (i.e., other processors must be prevented from accessing the quad word of memory containing th
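The atadd action sequence above, and the atmod description that follows it, can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, a Python `threading.Lock` stands in for the LOCK bus signal, memory is a sparse dictionary of 32-bit words, and word alignment of the atmod address is assumed by analogy with atadd.

```python
import threading

WORD_MASK = 0xFFFFFFFF


class WordMemory:
    """Sparse model of word-addressed memory with atomic read-modify-write."""

    def __init__(self) -> None:
        self._words = {}
        self._lock = threading.Lock()  # stands in for the LOCK signal

    def read_word(self, address: int) -> int:
        return self._words.get(address & ~0x3, 0)

    def write_word(self, address: int, value: int) -> None:
        self._words[address & ~0x3] = value & WORD_MASK

    def atadd(self, src_dst: int, src: int) -> int:
        """Add src to the word at src_dst; return the initial value (dst)."""
        tempa = src_dst & ~0x3                 # force alignment to word boundary
        with self._lock:                       # held across the read and the write
            temp = self._words.get(tempa, 0)
            self._words[tempa] = (temp + src) & WORD_MASK  # ordinal addition
        return temp

    def atmod(self, src: int, mask: int, src_dst: int) -> int:
        """Copy the masked bits of src_dst into the word at address src.

        Returns the initial memory value, which the instruction stores back
        into src/dst.
        """
        tempa = src & ~0x3                     # alignment assumed, as for atadd
        with self._lock:
            temp = self._words.get(tempa, 0)
            self._words[tempa] = ((src_dst & mask) | (temp & ~mask)) & WORD_MASK
        return temp
```

For instance, after `mem.write_word(0x1000, 10)`, calling `mem.atadd(0x1002, 5)` returns the initial value 10 and leaves 15 in the word at 0x1000, mirroring how r11 receives the initial value in the manual's example.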
441. lues in these data structures are accessed by the processor during initialization. These data structures are usually programmed in the system's boot ROM, located in memory region 15 of the address space. The required data structures are: the PRCB, the system procedure table, the IBR, the control table and the interrupt table. At initialization, the processor loads the supervisor stack pointer from the system procedure table and caches the pointer in an internal register. The supervisor stack pointer is located in the preamble of the system procedure table at byte offset 12 from the base address. The system procedure table base address is programmed in the PRCB. See section 5.5.1, System Procedure Table (pg. 5-13). The control table is the data structure that contains the on-chip control register values. It is automatically loaded during initialization and must be completely constructed in the IMI. See section 2.3, CONTROL REGISTERS (pg. 2-6) for a description of the control table. At initialization, the NMI vector is loaded from the interrupt table and saved at location 0000H of the internal data RAM. The interrupt table is typically programmed in the boot ROM and then relocated to RAM by reinitializing the processor. See CHAPTER 6, INTERRUPTS for a description of NMI and the interrupt table. The remaining data structures which an application may need are the fault table, user stack, supervisor stack and interrupt stack. The necessary stacks must be located i
442. ly and independently from software. 6.3 INTERRUPT PRIORITY. Each interrupt procedure pointer is eight bits in length, which allows up to 256 unique procedure pointers to be defined. Each procedure pointer's priority is defined by dividing the procedure pointer number by eight. Thus, at each priority level there are eight possible procedure pointers (e.g., procedure pointers 8 through 15 have a priority of 1 and procedure pointers 248 through 255 have a priority of 31). Procedure pointers 0 through 7 cannot be used: since 0 is the lowest priority, a priority 0 interrupt will never successfully stop execution of a program of any priority. The processor compares its current priority with the interrupt request priority to determine whether to service the interrupt immediately or to delay service. The interrupt is serviced immediately if the interrupt request priority is higher than the processor's current priority (the priority of the program or interrupt the processor is executing). If the interrupt priority is less than or equal to the processor's current priority, the processor does not service the request. When multiple interrupt requests are pending at the same priority level, the request with the highest vector number is serviced first. Priority 31 interrupts are handled as a special case. Even when the processor is executing at priority level 31, a priority 31 interrupt will interrupt the processor. On the i960 C
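The priority rules in this section reduce to a few lines of arithmetic, sketched below. The function names are hypothetical. One point goes beyond what the excerpt states and is labeled as an assumption: when pending requests span several priority levels, the sketch picks the highest priority first; the text only specifies the tie-break within a single level (highest vector number first).

```python
from typing import Iterable, Optional


def vector_priority(vector: int) -> int:
    """Priority of a vector number: the number divided by eight (integer)."""
    if not 8 <= vector <= 255:
        raise ValueError("vectors 0-7 cannot be used; valid range is 8-255")
    return vector // 8


def is_serviced_immediately(request_priority: int, current_priority: int) -> bool:
    """Apply the servicing rule, including the priority-31 special case."""
    if request_priority == 31:
        return True  # a priority-31 request interrupts even priority 31
    return request_priority > current_priority


def next_to_service(pending: Iterable[int], current_priority: int) -> Optional[int]:
    """Pick the pending vector to service now, or None if all must wait.

    Assumption: across priority levels the highest priority wins. Because
    priority is vector // 8, the largest eligible vector number is both the
    highest-priority request and the highest vector within that level.
    """
    eligible = [
        vector
        for vector in pending
        if is_serviced_immediately(vector_priority(vector), current_priority)
    ]
    return max(eligible) if eligible else None
```

With the processor running at priority 3, pending vectors 9, 12 and 40 resolve to vector 40 (priority 5), while vectors 9 and 12 alone would both stay pending.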
443. ly in internal hardware, as well as the mechanism for switching between processes. This hardware implementation enables the i960 Cx processors to switch processes on clock boundaries; no instruction overhead is necessary to switch the process. With this switching mechanism, DMA microcode and the user program can frequently alternate execution with absolutely no performance loss caused by the process switching. A process switch from user process to DMA process occurs as a result of a DMA event. A DMA event is signaled when a DMA channel requires service or is in the process of setting up a channel. Signaling the DMA event is controlled by DMA logic. After a DMA event is signaled, the DMA process takes a certain number of clock cycles and then the user process is restored. The maximum ratio of DMA to user cycles is 4:1. This means that, at most, the DMA process takes four clock cycles to every single user process clock. The ratio of DMA to user cycles can also be selected as 1:1 to increase execution speed of the user process while a DMA is in progress. The user-to-DMA cycle ratio is controlled by the throttle bit in the DMA command register (DMAC.t). A DMA rarely uses the maximum available cycles for the DMA process. Actual cycle allocation between user process and DMA process depends on the type of DMA operation performed, DMA channel activity and external bus loading and performance. Maximum allocation of internal pro
444. m State Changes 2H Number Name 0H Reserved 1H Invalid Opcode 2H Unimplemented Reserved 3H 4H Invalid Operand 5H FH Reserved. Indicates the processor cannot execute the current instruction because of invalid instruction syntax or operand semantics. An invalid opcode fault is generated when the processor attempts to execute an instruction containing an undefined opcode or addressing mode. An unimplemented fault is generated when the processor attempts to execute an instruction fetched from on-chip data RAM. An unaligned fault is generated when the following conditions are present: (1) the processor attempts to access an unaligned word or group of words in memory, and (2) the fault is enabled by the unaligned fault mask bit in the PRCB fault configuration word. The i960 Cx processors handle unaligned accesses to little endian regions of memory in microcode and carry out the access regardless of the unaligned fault mask bit setting. The processors do not support unaligned accesses to big endian regions; such attempts result in incoherent data in memory. Enabling the unaligned fault when using big endian byte ordering provides a means of detecting unsupported unaligned accesses. When an unaligned fault is signaled, the effective address of the unaligned access is placed in the fault record's optional data section, beginning at address NFP-24. This address is useful to debug a program that is making unintentional unaligned accesses. A
445. m supervisor call. 7.4 STACK USED IN FAULT HANDLING. The architecture does not define a dedicated fault handling stack. Instead, to handle a fault, the processor uses either the user, interrupt or supervisor stack, whichever is active when the fault is generated, with one exception: if the user stack is active when a fault is generated and the fault handling procedure is called with an implicit supervisor call, the processor switches to the supervisor stack to handle the fault. 7.5 FAULT RECORD. When a fault occurs, the processor records information about the fault in a fault record in memory. The fault handling procedure uses the information in the fault record to correct or recover from the fault condition and, if possible, resume program execution. The fault record is stored on the stack that the fault handling procedure will use to handle the fault. 7.5.1 Fault Record Data. Figure 7-3 shows the fault record's structure. In this record, the fault's type number is stored in the fault type field and the fault's subtype number (or bit positions for multiple subtypes) is stored in the fault subtype field. The address of faulting instruction field contains the IP of the instruction which caused the processor to fault. When a fault is generated, the existing PC and AC register contents are stored in their respective fault record fields. The processor uses this information to resume program execution after the fault i
446. may be simplified if burst access modes are not required; it is easily modified for 8- or 16-bit buses. WAIT, generated by the internal wait state generator, is used to generate write strobes at the proper place in the write cycle. WAIT is used in the address generation circuit to generate mid-burst addresses. External address generation improves performance in burst accesses. B.1.3 Block Diagram. The 32-bit burst SRAM interface consists of chip select logic, a state machine Programmable Logic Device (PLD) and write enable logic. Figure B-1. Non-Pipelined Burst SRAM Interface. B.1.3.1 Chip Select Logic. Chip select logic is a simple asynchronous data selector; it can be implemented with an external state machine or PLD. Chip select (CS) is based only on the address and is not qualified with any other signals. The state machine PLD qualifies CS with ADS. See section B.2.2, Waveforms (pg. B-13) for a more in-depth discussion of chip select generation. B.1.3.2 State Machine PLD. The SRAM state machine PLD generates the CE and OE signals to the SRAM. This PLD also contains the next-address generation logic; this logic improves burst access performance. The improvement occurs because the i960 Cx processors' worst-case address valid delay is longer than the PLD's worst-case delay. B.1.3.3 Write Enable Ge
447. mediately. If a bus request is being serviced, the hold request is granted at the end of the current bus request. If the processor is in the backoff state (BOFF pin asserted), the hold request is granted after BOFF is deasserted and the resumed request has completed. The hold request may be acknowledged between internal DMA load and store operations and atomic requests (read-modify-write accesses that assert LOCK). When the HOLD signal is removed, HOLDA is deasserted on the following PCLK2:1 cycle and the bus and control signals are driven. The HOLD signal is a synchronous input. Setup and hold times for this input are given in the 80960CA and 80960CF data sheets. BREQ indicates that the bus controller queue contains one or more pending bus requests. The bus controller can queue up to three bus requests (refer to section 10.6.1, Bus Queue (pg. 10-14) for a complete description of the bus queue). When the bus queue is empty, the BREQ pin is deasserted. BREQ determines bus queue state during a hold state or before the hold state is requested. It may be useful to use BREQ to qualify hold requests and optimize the processor's use of the bus when shared by external masters. Because the hold request is granted between bus requests, the bus controller queue may contain one or more entries when the request is granted. BREQ may be used to delay a hold request until all pending bus requests are complete. The processor may continue executin
448. ment of the peripheral and the delay time to gate the write signal with WAIT determines if this is an appropriate solution. The state machine simply delays the read or write signal so that back-to-back commands to the peripheral satisfy the peripheral's command recovery time. When the write state is entered, the W R output of the PLD is a gated version of the WAIT signal. This guarantees that the peripheral's write data hold time is satisfied. Figure B-31. Write Waveforms (signals shown: CLK, ADS, W R, WAIT, DEN, CS, WR, DATA). Figure B-32. State Machine Diagram (states include IDLE, Assert READ and Assert WRITE). This pseudo-code example is provided only to describe the state machine diagram shown in Figure B-32. It is not intended for direct use as PLD equations. STATE 0 (idle): UART is not asserted is not asserted RD is not asserted W R is not asserted. IF selected (ADS & CS) THEN next state is STATE 1, ELSE next state is STATE 0
449. mined for the current DMA status. 13.10.1 DMA Command Register (DMAC). The DMA command register (Figure 13-9) is a 32-bit special function register (SFR), specified as sf2 in assembly language. Bits 21-0 are used for DMA status and configuration; the remaining bits (bits 31-22) are reserved. These reserved bits should be programmed to zero (0) at initialization and not modified thereafter. These reserved bits are not implemented on the i960 Cx processors; clearing these bits at initialization is only required for portability to other i960 processor family products. Channel Enable Bits (DMAC.ce): 0 = suspend, 1 = enable. Channel Terminal Count Flags (DMAC.ctc): 0 = non-zero byte count, 1 = zero byte count (software must reset). Channel Active Flags (DMAC.ca): 0 = idle, 1 = active. Channel Done Flags (DMAC.cd): 0 = not done, 1 = done (software must reset). Channel Wait Bits (DMAC.cw): 0 = read next descriptor, 1 = descriptor has been read. Priority Mode Bit (DMAC.pm): 0 = fixed, 1 = rotating. Throttle Bit (DMAC.t): 0 = 4 DMA to 1 user clock (max), 1 = 1 DMA to 1 user clock (max). Data Cache Global Disable (DMAC.dcgd): 0 = Enabled, 1 = Disabled. Data Cache Invalidate (DMAC.dci): 0 = Enabled, 1 = Invalidate. Reserved bits: initialize to 0. Figure 13-9. DMA Command Register (DMAC). The channel enable bits (bits 3
450. mixed mode, bits 1 through 4 are not used and should only contain zeros.

Software can read and write the IPND and IMSK registers using any instruction that can use special function registers as operands. When the core handles a pending interrupt, it attempts to clear the bit that is latched for that interrupt in the IPND register before it begins servicing the interrupt. If that bit is associated with an interrupt source that is programmed for level detection and the true level is still present, the bit remains set. Because of this, the interrupt routine for a level-detected interrupt should clear the external interrupt source and explicitly clear the IPND bit before the return from the handler is executed.

An alternative method of posting interrupts in the IPND register, other than through the external interrupt pins and DMA interrupt inputs, is to set bits in the register directly using an instruction such as a move instruction. This operation has the same effect as requesting an interrupt through the external interrupt pins or DMA interrupt inputs. The bit set in the IPND register must be associated with an interrupt source that is programmed for dedicated-mode operation.

[Figure: Interrupt Pending register fields. External Interrupt Pending Bits (IPND.xip): 0 = no interrupt, 1 = pending interrupt. DMA Interrupt Pending Bits (IPND.dip): 0 = no interrupt, 1 = pending interrupt.]

Interrupt Pending Registers IPND sf0 Inte
451. n. When the prediction is correct, branches generally execute in parallel with other execution. If prediction is not correct, the worst-case branch time for cached execution is still two clocks.

Although prediction bits are most likely set to gain maximum throughput, different strategies can be used for setting the prediction bits. A code sequence dominated by comparisons and conditional branches might see large differences between the execution times of the fastest path and the slowest path. Prediction bits can be set to provide the best average throughput, to ensure the fastest worst-case execution, or to minimize deviation between slowest and fastest times.

A.2.7.8 Branch Target Alignment

Branch target code executes with more parallelism in the first clock if the branch target is long-word or quad-word aligned. Quad-word alignment is preferable for prefetch efficiency. The IS sees four words in a clock when the requested IP is long-word aligned and three words when the requested IP is not on a long-word boundary. Aligned branch targets give the scheduler another word to examine on the first clock following a branch. However, there are only a few cases where this optimization pays off. The IS takes advantage of seeing four words on the first clock after a branch when the fourth word is a branch or micro-flow and all three previous opcodes are executable in one clock. Example A-6 shows a three-word executable group: add followed by lda with 32-bit constant
453. n a system's RAM. The fault table is typically located in boot ROM. If it is necessary to locate the fault table in RAM, the processor must be reinitialized.

14.3.1 Reinitializing and Relocating Data Structures

Reinitialization can reconfigure the processor and change pointers to data structures. The processor is reinitialized by issuing the sysctl instruction with the reinitialize processor message type. See section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19) for a description of sysctl. The reinitialization instruction pointer and a new PRCB pointer are specified as operands to the sysctl instruction. When the processor is reinitialized, the fields in the newly specified PRCB are loaded as described in section 14.2.6, Process Control Block (PRCB) (pg. 14-8).

Reinitialization is useful for relocating data structures to RAM after initialization. The interrupt table must be located in RAM: to post software-generated interrupts, the processor writes to the pending priorities and pending interrupts fields in this table. It may also be necessary to relocate the control table to RAM; it must be in RAM if the control register values are to be changed by the user program. In some systems, it is necessary to relocate other data structures (fault table and system procedure table) to RAM because of poor load performance from ROM.

After initialization, the user program is responsible for copying data
454. n an opposite manner in memory. The block's most significant byte is stored at the base address and the less significant bytes are stored at successively higher addresses. This byte-ordering scheme, referred to as big endian, applies to data blocks which are short words or words. For more about byte ordering, see section 10.4, DATA ALIGNMENT (pg. 10-9).

When loading a byte, half-word or word from memory to a register, the block's least significant bit is always loaded in register bit 0. When loading double words, triple words and quad words, the least significant word is stored in the base register. The more significant words are then stored at successively higher numbered registers.

Bits can only be addressed in data that resides in a register: bit 0 in a register is the least significant bit; bit 31 is the most significant bit.

2.5.4 Internal Data RAM

Internal data RAM is mapped to the lower 1 Kbyte (0000H to 03FFH) of the address space. Loads and stores with target addresses in internal data RAM operate directly on the internal data RAM; no external bus activity is generated. Data RAM allows time-critical data storage and retrieval without dependence on external bus performance. The lower 1 Kbyte of memory is data memory only. Instructions cannot be fetched from the internal data RAM. Instruction fetches directed to the data RAM cause a type mismatch fault to occur.

Some internal data RAM locations are reserved for functions other tha
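The big-endian ordering described above for words (most significant byte at the base address) can be illustrated with a minimal sketch. The helper name `store_word` is hypothetical and models only the byte layout in memory, not any processor behavior.

```python
def store_word(value, big_endian):
    """Return the 4 bytes of a 32-bit word in ascending address order.

    Hypothetical helper: index 0 of the result is the byte at the base
    address, index 3 is the byte at base address + 3.
    """
    order = "big" if big_endian else "little"
    return (value & 0xFFFFFFFF).to_bytes(4, order)


# Big endian: the most significant byte (0x12) sits at the base address.
big = store_word(0x12345678, big_endian=True)
# Little endian: the least significant byte (0x78) sits at the base address.
little = store_word(0x12345678, big_endian=False)
```

In both cases a load places the block's least significant bit in register bit 0; only the order of bytes in memory differs.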
455. n as described in CHAPTER 5, PROCEDURE CALLS. This causes the processor to switch back to the local or supervisor stack (whichever it was using before the interrupt), switch to the executing state, and resume work on the program, if there are no pending interrupts to be serviced or trace faults to be handled.

6.9.2 Interrupted State Interrupt

If the processor receives an interrupt while it is servicing another interrupt, and the new interrupt has a higher priority than the interrupt currently being serviced, the current interrupt handler routine is interrupted. Here the processor performs the same interrupt servicing action as is described in section 6.9.1, Executing State Interrupt (pg. 6-12) to save the state of the interrupted interrupt handler routine. The interrupt record is saved on the top of the interrupt stack prior to the new frame that is created for use in servicing the new interrupt.

On the return from the current interrupt handler to the previous interrupt handler, the processor deallocates the current stack frame and interrupt record and stays on the interrupt stack.

[Flowchart: interrupt handling paths for dedicated, expanded, software and non-maskable interrupts. Dedicated interrupt: set bit in IPND, get vector from IMAP register.]
456. n chained modes.

Np first is the throughput of the first transfer of a non-chained DMA operation. After the setup microcode completes, additional microcode is required to start the first DMA transfer.

First DMA transfer of chained DMA buffer

N chain is the throughput between chained buffers (chaining mode only): the time required to arbitrate another buffer transfer in chaining mode, read the next chaining descriptor from memory and acknowledge the first transfer of the new buffer. Two values are given in Table 13-6 for chain to account for differences in throughput for EOP chaining mode. EOP chaining occurs when the DMA controller is configured for both source and destination chaining, the EOP/TC3:0 pins are configured as inputs, and EOP3:0 is asserted by the external system to cause chaining to the next buffer transfer.

NT first and NT chain are calculated as shown in Equations 13-5 and 13-6.

  Nr first Nro first Nro first 0 6 throttle    (Equation 13-5)
  Nr chain Nro chain Nto_first 0 6 throttle    (Equation 13-6)

where throttle = 0 for 4:1 throttle mode, 1 for 1:1 throttle mode.

The factor of 0.6 is used to characterize the effect on the worst-case base throughput value of disabling the throttle mode. For determination of NTgpo, Table 13-5 provides separate measurements with the throttle bit both enabled and disabled.

Table 13 6 Base Values of Worst
457. n-chip caching. On the i960 Cx processors, branches are virtually free in cached programs, and cached program execution is dramatically faster than non-cached execution. Therefore, branches and the branch-and-link instruction should be used to compress algorithms into the cache. For example, the previous low-pass filter routine could be modified to use coefficients from registers instead of literals. A short code piece could then sequence different filter coefficients through the registers and branch, using bal, to the filter loop. The entire routine would fit in the instruction cache and could perform a chain of linear filters without a procedure call.

A.2.8 Utilizing On-chip Storage

The processor has the ability to consume instructions and execute quad-word memory operations in parallel with arithmetic operations every clock. The instruction cache, data cache (i960 CF processor only), register cache and on-chip data RAM are valuable resources for sustaining such optimized execution.

Compiler experimentation is an important aid to maximize utilization of on-chip storage resources. Compiler optimization is not limited to instruction caching. In particular, execution profiling will automate assignment of frequently used data to the data RAM. Availability of data RAM provides more options for partitioning data between context-based storage (register cache), general storage (data cache) w
458. n dst and condition code is set to 010. If src value is all 1's, all 1's are stored in dst and condition code is set to 000.

Action:
  if (src = FFFFFFFFH)
      dst ← FFFFFFFFH; AC.cc ← 000
  else
      i ← 31
      while ((src and 2^i) ≠ 0) i ← i - 1
      dst ← i; AC.cc ← 010

Faults: Type Mismatch. Non-supervisor reference of an SFR.

Example:
  # assume r2 is not 0xffffffff
  spanbit r2, r9    # r9 ← bit number of most significant clear bit in r2; AC.cc ← 010

Opcode: spanbit 640H REG

See Also: scanbit

9.3.54 STORE

Mnemonic:
  st    Store
  stob  Store Ordinal Byte
  stos  Store Ordinal Short
  stib  Store Integer Byte
  stis  Store Integer Short
  stl   Store Long
  stt   Store Triple
  stq   Store Quad

Format: st* src, dst (src: reg; dst: mem)

Description: Copies a byte or group of bytes from a register or group of registers to memory. src specifies a register or the first (lowest numbered) register of successive registers. dst specifies the address of the memory location where the byte, or first byte of a group of bytes, is to be stored. The full range of addressing modes may be used in specifying dst. Refer to section 3.3, MEMORY ADDRESSING MODES (pg. 3-5) for a complete discussion.

stob and stib store a byte, and stos and stis store a half-word, from the src register's low-order bytes. Data for ordinal stores is truncated to fit the destination width. If the data for
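Returning to the spanbit Action listed earlier on this page, the search for the most significant clear bit can be modeled in a few lines. This is a minimal sketch in Python, not processor microcode; the function name `spanbit` and the tuple return value are illustrative choices.

```python
def spanbit(src):
    """Model the spanbit Action for a 32-bit source value.

    Returns (dst, cc): dst is the bit number of the most significant
    clear bit, cc is the 3-bit condition code. If every bit is set,
    dst is all 1's and cc is 000.
    """
    src &= 0xFFFFFFFF
    if src == 0xFFFFFFFF:
        return 0xFFFFFFFF, 0b000
    i = 31
    # Walk down from bit 31 until a clear bit is found.
    while src & (1 << i):
        i -= 1
    return i, 0b010
```

For example, `spanbit(0xFFFF0FFF)` scans past bits 31 to 16, which are set, and stops at bit 15.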
459. n general data storage (Figure 2-4). When the DMA controller is active, 32 bytes of data RAM are reserved for each channel in use. Additionally, 64 bytes of data RAM may be used to cache specific interrupt vectors. The word at location 0000H is always reserved for the cached NMI vector. With the exception of the cached NMI vector, other reserved portions of the data RAM can be used for data storage when the alternate function is not used.

As described in section 14.2.6, Process Control Block (PRCB) (pg. 14-8), local register cache size is specified by the value of the Process Control Block's Register Cache Configuration Word. The first five local register sets are cached internally; if more than five sets are to be cached, the local register cache can be extended into the internal data RAM. Up to ten more sets, occupying up to 640 bytes of data RAM, can be used. When the local register cache is extended, each new register set consumes 16 words of internal data RAM beginning at the highest data RAM address. The user program is responsible for preventing corruption to the internal RAM areas set aside for the register cache. See CHAPTER 5, PROCEDURE CALLS.

Internal RAM's first 256 bytes (0000H to 00FFH) are user-mode write protected. This data RAM can be read while executing in user or supervisor mode; however, RAM can only be modified in supervisor mode. Writes to these locations while in user mode cause a ty
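The register cache sizing rules above (five sets cached internally, up to ten more at 16 words each, allocated from the highest data RAM address) reduce to simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the extension grows downward from the top of the 1-Kbyte data RAM (address 0400H, one past 03FFH).

```python
DATA_RAM_END = 0x0400      # one past the last data RAM byte (03FFH)
INTERNAL_SETS = 5          # sets cached without using data RAM
MAX_EXTRA_SETS = 10        # additional sets that may live in data RAM
BYTES_PER_SET = 16 * 4     # 16 words of 4 bytes each


def register_cache_ram_usage(total_sets):
    """Return (bytes_used, lowest_address) for a register cache size.

    Assumption: extra sets are packed downward from the top of data RAM.
    """
    if not 0 <= total_sets <= INTERNAL_SETS + MAX_EXTRA_SETS:
        raise ValueError("register cache supports 0 to 15 sets")
    extra = max(0, total_sets - INTERNAL_SETS)
    used = extra * BYTES_PER_SET
    return used, DATA_RAM_END - used
```

With all 15 sets configured, 640 bytes are consumed, and under the stated assumption user data should stay below the returned lowest address.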
460. n invalid operand fault is generated when the processor attempts to execute an instruction that has one or more operands having special requirements which are not satisfied. A fault is caused by specifying a non-existent SFR or non-defined sysctl, and/or references to an unaligned long, triple or quad register group.

No defined value.

Faults may be imprecise when executing with the NIF bit cleared. A change in the program's control flow does not accompany operation faults; faults occur before instruction execution.

7.10.4 Parallel Faults

Fault Type: See section 7.6.4, Parallel Faults (pg. 7-9).

Fault Subtype: None; see Figure 7-5, Fault Record for Parallel Faults (pg. 7-11).

Function: Indicates that one or more faults occurred when the processor was executing instructions in parallel in different execution units. This fault type can occur only when the AC register NIF bit is cleared. If parallel faults occur, the number of parallel faults field in the fault record is a non-zero value which indicates the number of parallel faults recorded. This field is located in the fault record at location NFP-20. A fault record is saved for each parallel fault detected. Information contained in these records is the same as described in this section for specific fault types.

RIP: IP of instruction that would execute next if faults were not generated.

Program State Changes: Precision of faults reco
461. n operand. See section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19) for a complete discussion of sysctl.

Example 6-3. Using sysctl to Request an Interrupt
  ldconst 0x53, g5      # Vector number 53H is loaded into byte 0 of register g5
                        # and the value is zero-extended into byte 1 of the register
  sysctl  g5, g5, g5    # Vector number 53H is posted

A literal can be used to post an interrupt with a vector number from 8 to 31. Here the required value of 00H in the second byte of a register operand is implied.

The action of the core when it executes the sysctl instruction is as follows:

1. The core performs an atomic write to the interrupt table and sets bits in the pending interrupts and pending priorities fields that correspond to the requested interrupt.
2. The core updates the internal software priority register with the value of the highest pending priority from the interrupt table. This may be the priority of the interrupt that was just posted.

The interrupt controller continuously compares the following three values: software priority register, current process priority, and priority of the highest pending hardware-generated interrupt. When the software priority register value is the highest of the three, the following actions occur:

1. The interrupt controller signals the core that a software-generated interrupt is to be serviced.
2. The core checks the interrupt table in memory, determines the vect
462. n requires is busy. The resource is busy if a previous incomplete instruction reserved it or the resource is already working on an instruction.
4. The IS then attempts to re-issue the instruction on the next clock; the same sequence of events is repeated.

This processor resource management mechanism is called resource scoreboarding. A specific form of resource scoreboarding is register scoreboarding. When an instruction's computation stage takes more than one clock, the result registers are scoreboarded. A subsequent operation needing that particular register is delayed until the multi-clock operation completes. Instructions which do not use the scoreboarded registers can execute in parallel.

The execute stage performs the instruction. This stage is handled by the coprocessors which connect to the REG-side and MEM-side buses. In this stage, the coprocessor has received operands from the RF and recognized the opcode, which tells the coprocessor which instruction to execute. Execution begins and, for single-clock instructions, a result is returned in this stage.

The execute stage is a single- or multi-clock pipeline stage, depending on the operation performed and the coprocessor targeted. For single-clock coprocessors, such as the integer execution unit, the result of an operation is always returned immediately. Because of the three-stage pipeline construction and the register bypassing mechanism, no conflicts between source access and result return can
463. n the stack frame.
3. The processor sets the instruction pointer to the value of the RIP register.

Upon completion of these steps, the processor executes the procedure to which it returns.

5.2.4 Caching of Local Register Sets

The i960 architecture provides a local register cache to improve call and return performance. Local registers are typically saved to and restored from the local register cache when calls and returns are executed. For the i960 Cx microprocessors, movement of a local register set between local registers and cache takes only four clock cycles. Other overhead associated with a call or return is performed in parallel with this data movement.

When the number of nested procedures exceeds local register cache size, local register sets must at times be saved to or restored from their associated save areas in the procedure stack. Because these operations require access to external memory, this local cache miss impacts call and return performance.

When a call is made and the register cache is full, a register set in the cache must be saved to external memory to make room for the current set of local registers in the cache. This action is referred to as a frame spill. The oldest set of local registers stored in the cache is spilled to the associated local register save area in the procedure stack. Figure 5-2 illustrates a call operation with and without a frame spill.

Similarly, when a return is made and the local register set for the target
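The frame spill behavior described above can be pictured with a small simulation. This is a conceptual sketch only: the class and method names are hypothetical, register sets are represented by opaque labels, and the return path assumes that a set missing from the cache is reloaded from its save area in the procedure stack (a frame fill).

```python
from collections import deque


class LocalRegisterCache:
    """Toy model of the local register cache (not cycle accurate)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = deque()   # oldest cached set on the left
        self.save_area = []    # sets spilled to the procedure stack
        self.spills = 0
        self.fills = 0

    def call(self, caller_set):
        """Save the caller's local registers; spill the oldest set if full."""
        if len(self.cache) == self.capacity:
            self.save_area.append(self.cache.popleft())
            self.spills += 1
        self.cache.append(caller_set)

    def ret(self):
        """Restore the target's local registers, filling from memory on a miss."""
        if self.cache:
            return self.cache.pop()
        self.fills += 1
        return self.save_area.pop()


cache = LocalRegisterCache(capacity=2)
for frame in ["A", "B", "C", "D"]:
    cache.call(frame)
restored = [cache.ret() for _ in range(4)]
```

With a two-set cache and four nested calls, the two oldest sets are spilled on the way down and filled again on the way back up, which is the external-memory cost the text refers to.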
464. n the following sections.

Table D-2. Addressing Modes for MEM Format Instructions

  Format  Mode  Address Computation
  MEMA    00    offset
          10    (abase) + offset
  MEMB    0100  (abase)
          0101  (IP) + displacement + 8
          0110  reserved
          0111  (abase) + (index) * 2^scale
          1100  displacement
          1101  (abase) + displacement
          1110  (index) * 2^scale + displacement
          1111  (abase) + (index) * 2^scale + displacement

NOTE: In these address computations, a field in parentheses, e.g. (abase), indicates that the value in the specified register is used in the computation. Usage of a reserved encoding causes generation of an invalid opcode fault.

D.5.1 MEMA Format Addressing

The MEMA format provides two addressing modes: absolute offset, and register indirect with offset. The offset field specifies an unsigned byte offset from 0 to 4096. The abase field specifies a global or local register that contains an address in memory.

For the absolute offset addressing mode (mode 00), the processor interprets the offset field as an offset from byte 0 of the current process address space; the abase field is ignored. Using this addressing mode along with the lda instruction allows a constant in the range 0 to 4096 to be loaded into a register.

For the register indirect with offset addressing mode (mode 10), the offset field value is added to the address in the abase register. Setting the offset value to ze
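The address computations of Table D-2 can be expressed as a single lookup. The sketch below is a hedged illustration: the function name, keyword parameters and exception type are hypothetical, register contents are passed in as plain integers, and results wrap to 32 bits on the assumption that addresses are 32 bits wide.

```python
class InvalidOpcodeFault(Exception):
    """Stand-in for the fault raised by a reserved mode encoding."""


def effective_address(mode, *, offset=0, abase=0, index=0, scale=0,
                      displacement=0, ip=0):
    """Compute a MEM-format effective address for a Table D-2 mode string."""
    scaled = index << scale  # (index) * 2^scale
    modes = {
        "00": offset,                            # MEMA absolute offset
        "10": abase + offset,                    # MEMA register indirect with offset
        "0100": abase,
        "0101": ip + displacement + 8,
        "0111": abase + scaled,
        "1100": displacement,
        "1101": abase + displacement,
        "1110": scaled + displacement,
        "1111": abase + scaled + displacement,
    }
    if mode not in modes:
        raise InvalidOpcodeFault(f"reserved or unknown mode {mode!r}")
    return modes[mode] & 0xFFFFFFFF
```

For instance, mode 1111 with abase 2000H, index 3, scale 2 and displacement 8 yields 2014H.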
465. nation chaining select bits (bits 9 and 10) must be set to 0.

The source/destination chaining select bits (bits 9-10) are set to enable data chaining mode. Setting bit 9 enables destination chaining; setting bit 10 enables source chaining. Setting bits 9 and 10 enables source/destination chaining. Non-chaining mode is selected if both bits are clear.

The interrupt-on-chaining-buffer select bit (bit 11) is set to cause an interrupt to be generated when the byte count for a chained buffer reaches 0. This bit is ignored in a non-chaining mode.

The chaining wait select bit (bit 12) is set to enable the channel wait function. When the wait enable function is selected, DMAC register channel wait bits must be cleared before a chaining descriptor is read. This channel wait function, together with the interrupt-on-buffer-complete function, allows chaining descriptors to be dynamically changed during the course of a chained DMA operation. This bit is ignored when a non-chaining mode is selected. See section 13.5, DATA CHAINING (pg. 13-13).

13.10.4 DMA Data RAM

The DMA controller uses up to 32 words of internal data RAM to swap service between active channels. When a channel is set up, the DMA controller dedicates 8 words of data RAM to that channel (see Figure 13-12). When channel service swaps from one channel to another, the active channel's state is saved in data RAM. The state is retrieved when the channel is again serviced. DMA data RAM for a channel i
466. ncoding format and addressing mode.

3.3.2 Register Indirect

Register indirect addressing modes use a register's 32-bit value as a base for address calculation. The register value is referred to as the address base (designated abase in Table 3-5). Depending on the addressing mode, an optional scaled index and offset can be added to this address base.

Register indirect addressing modes are useful for addressing elements of an array or record structure. When addressing array elements, the abase value provides the address of the first array element; an offset (or displacement) selects a particular array element.

In register indirect with index addressing mode, the index is specified using a value contained in a register. This index value is multiplied by a scale factor; allowable factors are 1, 2, 4, 8 and 16.

The two versions of register indirect with offset addressing mode at the instruction encoding level are register indirect with offset and register indirect with displacement. As with absolute addressing modes, the mode selected depends on the size of the offset from the base address.

At the assembly language level, the assembler allows the offset to be specified with an expression or symbolic label, then evaluates the address to determine whether to use register indirect with offset (MEMA format) or register indirect with displacement (MEMB format) addressing mode.

Register indirect wit
467. nction registers and literals are used directly as instruction operands. Table 2-2 lists instruction operands for each machine-level instruction format and positions which can be filled by each register or literal.

Table 2-2. Allowable Register Operands

Operand 1 Instruction Local Global Extended 3 Encoding Register Register Register SFR Src1 X X X X src2 x x X X src DST as src X X X src DST as DST X X X Src DST as both X X 2 MEM src DST X X abase X X index X X COBR 5 1 X X Src2 X X X DST X 3 X 3 X 3

NOTES:
1. "X" denotes the register can be used as an operand in a particular instruction field.
2. Extended registers cannot be addressed in the src/dst field of REG format instructions in which this field is used as both source and destination (e.g., extract and modify).
3. The COBR destination operands apply only to TEST instructions.

2.3 CONTROL REGISTERS

Control registers are used to configure on-chip peripherals (DMA controller, interrupt controller and bus controller). A program cannot access control registers directly as instruction operands. Instead, control registers are loaded from a data structure called the control table (see Figure 2-2). The system control (sysctl) instruction moves control table values to on-chip control registers. The control table comprises seven quad-word groups; each group is assigned a group number from zero to s
468. nd is a dummy operand that should specify a literal or the same register as the mask operand. The processor must be in supervisor mode to use this instruction with a non-zero mask value. If mask = 0, this instruction can be used to read the process controls without the processor being in supervisor mode. If the action of this instruction results in processor priority being lowered, the interrupt table is checked for pending interrupts. Changing the PC register reserved fields can lead to unpredictable behavior, as described in section 2.6.3, Process Controls (PC) Register (pg. 2-17).

Action:
  if (mask ≠ 0)
      if (PC.em ≠ supervisor) Type mismatch fault
      temp ← PC
      PC ← (mask and src/dst) or (PC andnot mask)
      src/dst ← temp
      if (temp.p > PC.p) check pending interrupts
  else
      src/dst ← PC

Faults:
  Type Mismatch: Non-supervisor reference of an SFR.
  Type Mismatch: Attempted to execute instruction with non-zero mask value while not in supervisor mode.

Example:
  modpc g9, g9, g8    # process controls ← g8 masked by g9

Opcode: modpc 655H REG

See Also: modac, modtc

When the modify process controls (modpc) instruction causes a program's priority to be lowered, other i960 processor family members check for pending interrupts in the memory-based interrupt table; the i960 Cx devices internally store the priority of the highest pending interrupt found in the interrupt table's pending interrupts field. To improve performance the store
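The modpc Action described on this page can be sketched as a pure function over 32-bit values. This is an illustration rather than a processor model: the names are hypothetical, and the position of the priority field (bits 20-16 of the PC register) is an assumption of the sketch that is not stated on this page.

```python
class TypeMismatchFault(Exception):
    """Stand-in for the type mismatch fault."""


PRIORITY_SHIFT = 16   # assumed position of the PC priority field
PRIORITY_MASK = 0x1F  # assumed 5-bit priority


def priority(pc):
    return (pc >> PRIORITY_SHIFT) & PRIORITY_MASK


def modpc(pc, mask, src_dst, supervisor):
    """Return (new_pc, new_src_dst, check_pending_interrupts)."""
    if mask == 0:
        # Read-only use: allowed in user mode, PC is unchanged.
        return pc, pc, False
    if not supervisor:
        raise TypeMismatchFault("non-zero mask outside supervisor mode")
    temp = pc
    # Bits selected by mask come from src/dst; the rest keep their PC value.
    new_pc = ((mask & src_dst) | (pc & ~mask)) & 0xFFFFFFFF
    # A lowered priority triggers a check for pending interrupts.
    return new_pc, temp, priority(temp) > priority(new_pc)
```

Returning the old PC value in the src/dst slot mirrors the Action's use of temp, so a caller can later restore the previous process controls with a second modpc.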
469. nded by the assertion of ADS (address strobe) and BLAST (burst last) signals, which are outputs from the processor. ADS indicates that a valid memory address is present and an access has started. BLAST indicates that the next data transferred is the end of the access.

The bus controller can be configured to initiate burst, non-burst or pipelined accesses. A burst access begins with ADS, followed by two to four data transfers. The last data transfer is indicated by assertion of BLAST. Non-burst accesses begin with assertion of ADS, followed by a single data transfer. Pipelined accesses begin on the same clock cycle in which the previous cycle completes. This is accomplished by asserting ADS and a valid address during the last data transfer of the previous cycle. Pipelined accesses may also be burst or non-burst. The bus controller can be configured for various modes to optimize interfaces to external memory. Access type (burst, non-burst or pipelined) is selected when the bus controller is configured.

11.1.2 Configuration

The bus controller can be configured in various ways. Bus width and access type can be set based on external memory system requirements. For example, peripheral devices commonly have slow, non-burst, 8-bit buses. The bus controller can be configured to make memory accesses to these 8-bit non-burst devices. Each memory access to the peripheral begins with assertion of ADS and a valid address. BLAST is asserted and after
470. ndicates the access is complete. CE is the output of the state register; therefore, the CE output delay is the clock-to-output time of the PLD. Minimizing CE delay provides more memory access time.

The A3:2 address generation state machine (Figure B-5) generates consecutive addresses for multiple-word burst accesses. The address generation state machine is not necessary if the memory region is defined in the region configuration table as non-burst. The burst address outputs (BA3:2) correspond to registers within the PLD. Address generation time then corresponds to the clock-to-output time of the PLD. The BA3:2 signals are forced to 0 when BLAST is asserted.

The pseudo-code descriptions that follow the figures are provided only to describe the state machine diagrams. They are not intended to be PLD equations. A trailing indicates a signal is asserted low.

In the pseudo-code description, the assertion of ADS and SRAM CS indicates the beginning of an access. The state machine jumps to the proper state based on A3:2. The assertion of CE indicates that an access is underway. The assertion of CE, WAIT and BLAST indicates that the current transfer is complete and it is time to generate the next address. The assertion of BLAST indicates the access is complete.

[Figure B-4. Chip Enable State Machine. Transition shown: ADS & CS, Assert CE.]

06 amp CS amp amp A2 CE amp
471. ne
DRAM Controller State Machine
DMA Request and Acknowledge Signals
DMA Chaining Description
DRAM System Read Waveform
DRAM System Write Waveform
Memory System Block Diagram
DRAM State Machine
Two-Way Interleaved Read Access Overlap
Two-Way Interleaved Memory System
Two-Way Interleaved Read Waveforms
8-bit Interface Schematic
Read Waveforms
Write Waveforms
State Machine Diagram
Instruction Formats
Control Table
Fault Record
Fault Table and Fault Table Entries
Initial Memory Image (IMI) and Process Control Block (PRCB)
Storage of an Interrupt Record on the Interrupt Stack
Interrupt Table
Procedure Stack Structure and Local Registers
System Procedure Table
Arithmetic Controls Register (AC)
Bus Configuration Register (BCON)
Data Address Breakpoint Registers
DMA Command Register (DMAC)
DMA Control Word
Hardware Breakpoint Control Register (BPCON)
Instruction Address Breakpoint Registers (IPB0, IPB1)
Interrupt Control (ICON) Register
Interrupt Map (IMAP0-IMAP2) Registers
Interrupt Mask (IMSK) and Interrupt Pending (IPND) Registers
Memory Region Configuration Register MCON 0-15
472. ne flags are not cleared when a channel is set up or enabled. This action must be performed by software. Channel done flags indicate status only; modifying these flags does not affect DMA controller operation.

The channel wait bits (bits 19-16) signal that a chaining descriptor was read and optionally enable a read of the next chaining descriptor in memory. Channel wait bits only enable the descriptor read when the channel is set up with the channel wait function enabled. See section 13.10.2, Set Up DMA Instruction (sdma) (pg. 13-24). This function provides synchronization for programs which dynamically change chaining descriptors when a DMA is in progress. The DMA controller sets a channel wait bit when a chaining descriptor is read from memory. If the channel wait function is enabled, the DMA controller waits for the channel wait bit to be cleared by software before the next descriptor is read. See section 13.5, DATA CHAINING (pg. 13-13).

The priority mode bit (bit 20) selects fixed (0) or rotating (1) priority mode. The priority mode determines the order in which DMA channels are serviced if more than one request is pending. See section 13.9, CHANNEL PRIORITY (pg. 13-20).

The throttle bit (bit 21) selects the maximum ratio of DMA process time to user process time. If the throttle bit is set, the DMA process can take up to one clock for every one clock of the user process. If the bit is clear
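The DMAC fields whose positions are stated above (channel wait bits 19-16, priority mode bit 20, throttle bit 21) can be decoded as follows. This is a minimal sketch with a hypothetical function name. It assumes that channel n maps to bit 16 + n, and it takes the 4:1 ratio for a clear throttle bit from the DMAC register figure.

```python
def decode_dmac_upper(dmac):
    """Decode the wait, priority and throttle fields of a DMAC value."""
    return {
        # Assumption: channel n uses bit 16 + n of the register.
        "channel_wait": [bool((dmac >> (16 + ch)) & 1) for ch in range(4)],
        "priority_mode": "rotating" if (dmac >> 20) & 1 else "fixed",
        # Set: up to 1 DMA clock per user clock; clear: up to 4 per user clock.
        "throttle": "1:1" if (dmac >> 21) & 1 else "4:1",
    }


status = decode_dmac_upper(0x00350000)
```

A program that changes chaining descriptors on the fly would test the relevant `channel_wait` entry, update the descriptor, and then clear the bit so the controller reads the next descriptor.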
473. nections
B.3.4 Series Damping Resistors
B.3.5 System Loading
B.3.6 Design Example: Burst DRAM with Distributed RAS-Only Refresh Using DMA
B.3.7 DRAM Address Generation
B.3.8 DRAM Controller State Machine
B.3.9 DRAM Refresh Request and Timer Logic
B.3.10 Programming for Refresh
B.3.11 Memory Ready
B.3.12 Region Table
B.3.13 Design Example: Burst DRAM with Distributed CAS-Before-RAS Refresh Using READY Control
B.3.14 DRAM Controller State Machine
B.4 INTERLEAVED MEMORY SYSTEMS
B.5 INTERFACING TO SLOW PERIPHERALS USING THE INTERNAL WAIT STATE GENERATOR
B.5.1 Implementation
B.5.2 Schematic
B.5.3 Waveforms
APPENDIX C CONSIDERATIONS FOR WRITING PORTABLE CODE
C.1 ARCHITECTURE
C.2 ADDRESS SPACE RESTRICTIONS
C.2.1 Reserved
474. nel's integrity. For example, it allows system debugging software or a system monitor to be accessed even if an application's program destroys its own stack.

• When an instruction executed in supervisor mode causes a bus access, an external supervisor pin (SUP) is asserted for loads, stores and instruction fetches. Hardware protection of system code or data can be implemented by using the supervisor pin to qualify write accesses to the protected memory.

In supervisor mode, the processor is allowed access to a set of supervisor-only functions and instructions. For example, the processor uses supervisor mode to handle interrupts and trace faults. Operations which can modify DMA or interrupt controller behavior, or reconfigure bus controller characteristics, can only be performed in supervisor mode. These functions include modification of SFRs, control registers or internal data RAM which is dedicated to the DMA and interrupt controllers. A fault is generated if supervisor-only operations are attempted while the processor is in user mode. Table 2-8 lists supervisor-only operations and the fault which is generated if the operation is attempted in user mode.

The PC register execution mode flag specifies processor execution mode. The processor automatically sets and clears this flag when it switches between the two execution modes.

Table 2-8. Supervisor-Only Operations and Faults Generated in User
475. neration Logic
The write enable generation logic generates the WE signal to the SRAM. WE signals are conditioned on the i960 Cx processor byte enables (BE3:0), the write/read signal (W/R) and the wait signal (WAIT). There is a write enable signal (WE3:0) for each byte position, corresponding to the byte enable signals (BE3:0); this allows byte, short-word and word-wide writes. Read accesses to this memory system always result in word reads. For byte or short-word reads, the i960 Cx devices read the data from the correct place on the data bus.

B.1.3.4 Chip Select Generation
ADS assertion during the PCLK rising edge indicates the address is valid. Address setup time to this clock edge is the PCLK period ($T_{PP}$) minus the address output delay ($T_{OV}$). The CS signal generation time ($CS_{gen}$) must fit within that interval while still satisfying the input setup time of the State Machine PLD ($T_{PLD\,setup}$). Therefore:

$CS_{gen} \le T_{PP} - T_{OV} - T_{PLD\,setup}$ (Equation B-1)

B.1.4 Waveforms
Figure B-2 shows a Non-Pipelined SRAM Read Waveform. Figure B-3 shows a Non-Pipelined SRAM Write Waveform.

Figure B-2. Non-Pipelined SRAM Read Waveform

ERRATA (10/31/94, SRB): On pg. B-5, Fig. B-3, the ADS signal incorrectly showed a deassertion in the 6th cycle and the 3rd deassertion in the 11th cycle. It now correctly
476. ng information for the DMA interrupt inputs, four bits per input. Each set of four bits contains a vector number's four most significant bits; the four least significant bits are always 0010₂. In other words, each source can be programmed for a vector number of PPPP 0010₂, where P indicates a programmable bit. For example, IMAP0 bits 4 through 7 contain mapping information for the pin. If these bits are set to 0110₂, the pin is mapped to vector number 0110 0010₂ (or vector number 98).

Software can load the mapping registers using the sysctl instruction. The mapping registers are also automatically loaded at initialization from the control table in external memory. Note that bits 16 through 31 of each register are reserved and should be set to 0 at initialization.

Interrupt Map Register 0 (IMAP0):
External Interrupt 0 Field (IMAP0.x0)
External Interrupt 1 Field (IMAP0.x1)
External Interrupt 2 Field (IMAP0.x2)
External Interrupt 3 Field (IMAP0.x3)

Interrupt Map Register 1 (IMAP1):
External Interrupt 4 Field (IMAP1.x4)
External Interrupt 5 Field (IMAP1.x5)
External Interrupt 6 Field (IMAP1.x6)
External Interrupt 7 Field (IMAP1.x7)

DMA Interrupt 0 Field (IMAP2.d0)
DMA Interrupt 1 Field (IMAP2.d1)
DMA Interrupt 2 Field (IMAP2.d2)
DMA Interrupt 3 Fi
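The vector mapping in chunk 476 (four programmable bits followed by a fixed 0010) can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the helper names are hypothetical, and the layout of one 4-bit field per input, with field n at bits 4n through 4n+3, is inferred from the "bits 4 through 7" example.

```python
FIXED_LOW_NIBBLE = 0b0010  # the four least significant vector bits are always 0010


def imap_field(imap: int, index: int) -> int:
    """Extract 4-bit mapping field `index` (0-3) from an IMAP register value."""
    if not 0 <= index <= 3:
        raise ValueError("each IMAP register holds four fields in bits 15-0")
    return (imap >> (4 * index)) & 0xF


def imap_vector(field: int) -> int:
    """Combine the programmable upper nibble with the fixed lower nibble."""
    if not 0 <= field <= 0xF:
        raise ValueError("mapping field is four bits wide")
    return (field << 4) | FIXED_LOW_NIBBLE
```

With bits 4 through 7 set to 0110, `imap_vector(imap_field(0x60, 1))` gives 98, matching the example in the text.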
477. nly
The i960 CF microprocessor has a 1 Kbyte direct-mapped data cache. The effect of data caching on performance is usually not as great as the effect of instruction caching, because the processor often accesses data in a random, occasional pattern compared to the repetitive looping pattern commonly seen with instruction execution.

The data cache behaves like SRAM for cache hits, delivering data in a single clock. Data cache misses require BCU interaction, as do all stores to external memory addresses.

Data caching can be enabled for particular memory regions. In most cases, programmers will use this function only to distinguish non-cacheable memory-mapped I/O space from ordinary data memory. Once the data cache is enabled, its operation is transparent, as there are no further programming options.

2.8.3 Register Cache
The register cache can be thought of as a data cache which selectively caches only that data related to procedure context. Section 5.2, CALL AND RETURN MECHANISM (pg. 5-2) describes the i960 Cx processors' register cache.

The register cache / data RAM partition is programmable. Therefore, the user can determine the trade-off between procedural context caching and static caching of procedure variables in the on-chip data RAM. Experiments can be run to measure the sensitivity of system performance to register cache depth of a fixed program. Minimizing register cach
478. nordered
All use the CTRL format. Since the actions of these instructions are dependent upon the result of a previous comparison, the architecture allows a programmer to predict the likely result of the conditional fault instructions for higher performance. The programmer's prediction is encoded in one bit of the opword. The Intel 80960 assembler encodes the prediction with a mnemonic suffix of .t for true and .f for false.

4.2.10 Debug
The processor supports debugging and monitoring of program activity through the use of trace events. These instructions support these debugging and monitoring tools:

modpc   modify process controls
modtc   modify trace controls
mark    mark
fmark   force mark

These all use the format. Trace functions are controlled with bits in the Trace Control (TC) register, which enable or disable various types of tracing. Other TC register flags indicate when an enabled trace event is detected. Refer to CHAPTER 8, TRACING AND DEBUGGING.

modpc can enable or disable trace fault generation; modtc permits trace controls to be modified. mark causes a breakpoint trace event to be generated if breakpoint trace mode is enabled. fmark generates a breakpoint trace independent of the state of the breakpoint trace mode bits.

The i960 processor-specific sysctl instruction, described in section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19), also provides control over breakpoint trace event
479. not be recoverable because the result is stored in the destination before the fault is generated (e.g., the faulting instruction cannot be re-executed if the destination register was also a source register for the instruction). An arithmetic zero-divide fault is generated before execution of the faulting instruction.

7.10.2 Constraint Faults

Fault Type:
  Number  Name
  5H      Constraint

Fault Subtype:
  Number  Name
  0H      Reserved
  1H      Range
  2H      Privileged
  3H-FH   Reserved

Function: Indicates the program or procedure violated an architectural constraint. A constraint range fault is generated when a fault-if instruction is executed and the AC register condition code field matches the condition required by the instruction. A privileged fault is also generated when the program or procedure attempts to use a privileged (supervisor-mode only) instruction while the processor is in user mode. Privileged instructions for the i960 Cx processor are: sdma, sysctl.

RIP: No defined value.

Program State Changes: These faults may be imprecise when executing with the NIF bit cleared. No changes in the program's control flow accompany these faults. A constraint range fault is generated after the fault-if instruction executes; the program state is not affected. A privileged fault is generated before the faulting instruction executes.

7.10.3 Operation Faults

Fault Type Fault Subtype Function RIP Progra
480. nputs
All dedicated inputs plus the NMI pin are programmed globally for fast sampling or debounce sampling. Expanded mode inputs are always sampled in debounce mode. Pin detection and sampling options are selected by programming the ICON register.

• When a pin is programmed for falling-edge detection, the corresponding pending bit in the IPND register is set when a high-to-low transition is detected.
• When a pin is programmed for low-level detection, the corresponding pending bit in the IPND register is set when a low level is detected.

Even for the level-detect mode, the pending bits are sticky and remain set after the interrupt source removes the active level from the interrupt pin. The processor attempts to clear the pending bit on entry into the interrupt handler.

Edge- and level-detect modes are distinguished by the way software must deal with the external interrupt source on entry into the handler.

• For the edge-detect mode, the pending bit is cleared when the handler is entered. In this mode, software is not required to clear the interrupt source.
• In the level-detect mode, the pending bit remains set if the external source is still active. This means that software must explicitly clear the interrupt source before returning from the interrupt handler. Otherwise, the handler is re-entered after the return is executed.

Example 12-1 demonstrates how a level-detect interrupt is typically handled. The example assumes that the ld fr
481. nsitioning; it now correctly shows that the signal is asserted high throughout.

Figure 11-2. Quad-word Read from 32-bit Non-burst Memory

Figure 11-3. Bus Request with READY and BTERM Control
Note: BLAST is asserted in the last data transfer when WAIT is deasserted. BLAST stays asserted until the end of the data cycle.

11.2.2 Bus Width
Each region's data bus width is programmed in the memory region configuration table. The i960 Cx processors allow an 8-, 16- or 32-bit-wide data bus for each region. The i960 Cx processors place 8- and 16-bit data on low-order data pins. This simplifies interface to external devices. As shown in Figure 11-4, 8-bit data is placed on lines D7:0; 16-bit data is placed on lines D15:0; 32-bit data is
482. nt during burst accesses. The A3:2 signals are pipelined; they must be latched for read accesses. Write accesses are not pipelined; therefore, the logic must latch A3:2 on reads and pass A3:2 through on burst writes. The A3:2 generation is implemented as a state machine to achieve minimum address delay out of the PLD. The PA3:2 (pipelined address 3:2) outputs are also the state bits of the PLD. This ensures that the address delay is only the clock-to-output time for the PLD.

B.2.1.3 Write Enable Logic
Write enable logic uses the byte enable signals (BE3:0), the WAIT signal and a latched version of the W/R signal (OE). Therefore:

WE = OE & WAIT & BE, or
WE0 = OE & WAIT & BE0
WE1 = OE & WAIT & BE1
WE2 = OE & WAIT & BE2
WE3 = OE & WAIT & BE3

DEN remains asserted as long as consecutive pipelined read accesses continue. DEN and DT/R are related to the data, not the address; therefore, DEN and DT/R are not pipelined and retain the same timing for pipelined and non-pipelined reads.

In the pipelined read mode, a series of non-burst accesses results in ADS remaining asserted for several clock cycles. Similarly, BLAST remains asserted for several clock cycles.

W/R behaves slightly differently for pipelined reads than for non-pipelined reads. W/R is not valid for the last cycle of a pipelined read. This requires that W/R be latched for pipelined reads, similar to A31:2.

The following signals are pipelined during pipelined read accesses
483. ntation of the i960 architecture which uses an integrated instruction cache provides a mechanism to purge the cache, or some other method that forces consistency between external memory and internal cache. This mechanism is implementation dependent. Application code which supports modification of the code space must use this implementation-specific feature and therefore is not object-code portable to all i960 processor implementations.

The CA has a 1 Kbyte instruction cache; the CF has a 4 Kbyte instruction cache. This instruction cache is purged using the system control (sysctl) instruction, which may not be available on other i960 processors. The instruction cache supports locking interrupt procedures into none, half or all of the cache. The unlocked portion functions as a two-way set associative cache. The CF instruction cache supports locking any code section into half of the cache. The unlocked portion functions as a direct-mapped cache. Refer to section 2.5.5, Instruction Cache (pg. 2-13) for a description of cache configuration.

C.2.4 Data Cache (80960CF Processor Only)
The i960 CF processor's 1 Kbyte direct-mapped data cache can return up to a quad word (128 bits) to the register file in a single clock cycle on a cache hit. In this sense, the data cache has the same bandwidth as the data RAM for cache hits. The data cache has a four-word line size with a separate valid bit
484. nual refer to bit values in register and data structures. If a bit is set, its value is 1; if the bit is clear, its value is 0. Likewise, setting a bit means giving it a value of 1 and clearing a bit means giving it a value of 0.

The terms assert and deassert refer to the logically active or inactive value of a signal or bit, respectively. A signal is specified as an active 0 signal by an overbar. For example, the input is active low and is asserted by driving the signal to a logic 0 value.

1.4.3 Representing Numbers
Numbers in this manual can be assumed to be base 10 unless designated otherwise. In text, binary numbers are sometimes designated with a subscript 2 (for example, 001₂). If it is obvious from the context that a number is a binary number, the 2 subscript may be omitted.

Hexadecimal numbers are designated in text with the suffix H (for example, FFFF FF5AH). In pseudo-code action statements in the instruction reference section, and occasionally in text, hexadecimal numbers are represented by adding the C-language convention 0x as a prefix. For example appears as in the pseudo-code.

1.4.4 Register Names
Special function registers and several of the global and local registers are referred to by their generic register names as well as descriptive names which describe their function. The global register numbers are g0 through g15; local register numbers are r0 through r15; special function
485. o call calls bal

9.3.14 chkbit
Mnemonic: chkbit   Check Bit
Format: chkbit bitpos, src
        bitpos: reg/lit/sfr; src: reg/lit/sfr
Description: Checks bit in src designated by bitpos and sets condition code according to value found. If bit is set, condition code is set to 010₂; if bit is clear, condition code is set to 000₂.
Action: if ((src and 2^(bitpos mod 32)) = 0) AC.cc <- 000₂; else AC.cc <- 010₂;
Faults: Type Mismatch: Non-supervisor reference of a sfr.
Example: chkbit 13, g8   # checks bit 13 in g8 and sets AC.cc according to the result
Opcode: chkbit 5
See Also: alterbit, clrbit, notbit, setbit, cmpi

9.3.15 clrbit
Mnemonic: clrbit   Clear Bit
Format: clrbit bitpos, src, dst
        bitpos: reg/lit/sfr; src: reg/lit/sfr; dst: reg/sfr
Description: Copies src value to dst with one bit cleared. bitpos operand specifies bit to be cleared.
Action: dst <- src andnot 2^(bitpos mod 32);
Faults: Type Mismatch: Non-supervisor reference of a sfr.
Example: clrbit 23, g3, g6   # g6 <- g3 with bit 23 cleared
Opcode: clrbit 58CH REG
See Also: alterbit, chkbit, notbit, setbit

9.3.16 cmpdeci, cmpdeco
Mnemonic: cmpdeci   Compare and Decrement Integer
          cmpdeco   Compare and Decrement Ordin
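The Action statements for chkbit and clrbit in chunk 485 translate directly into bit arithmetic. The following is a minimal model, not processor code: the function names mirror the mnemonics, registers are modeled as 32-bit unsigned integers, and returning the condition code as a plain integer is an illustrative choice.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit unsigned values


def chkbit(bitpos: int, src: int) -> int:
    """Return the condition code chkbit would set: 0b010 if the bit is set, else 0b000."""
    return 0b010 if src & (1 << (bitpos % 32)) else 0b000


def clrbit(bitpos: int, src: int) -> int:
    """Return src with bit (bitpos mod 32) cleared, as clrbit writes to dst."""
    return src & ~(1 << (bitpos % 32)) & MASK32
```

Because bitpos is taken mod 32, `chkbit(45, x)` tests the same bit as `chkbit(13, x)`.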
486. o be passed in global registers. Since the calling procedure and the called procedure share the global registers, the called procedure has direct access to the parameters after the call.

When a procedure needs to pass more parameters than will fit in the global registers, they can be passed by reference. Here, parameters are placed in an argument list and a pointer to the argument list is placed in a global register. The argument list can be stored anywhere in memory; however, a convenient place to store an argument list is in the stack for a calling procedure. Space for the argument list is created by incrementing the SP register value. If the argument list is stored in the current stack, the argument list is automatically deallocated when no longer needed.

A procedure receives parameters from, and returns values to, other calling procedures. To do this successfully and consistently, all procedures must agree on the use of the global registers. Parameter registers pass values into a function. Up to 12 parameters can be passed by value using the global registers. If the number of parameters exceeds 12, additional parameters are passed using the calling procedure's stack; a pointer to the argument list is passed in a pre-designated register. Similarly, several registers are set aside for return arguments, and a return argument block pointer is defined to point to additional parameters. If the number of return arguments exceeds the
487. oad is executed by the BCU on the MEM side. Furthermore, a branch can be issued in the same clock as the add and load since the IS executes it directly (three instructions per clock).

The IS does not exploit every possible combination of three instruction types in four consecutive words. Table A-2 summarizes the sequences of instruction machine types that can be issued in parallel. A group of one or more instructions which can be issued in the same clock is referred to in this appendix as an executable group of instructions. Figure 7 shows the paths that the IS has available for dispatching each word of the rolling quad word to the three machine sides.

Table A-2. Machine Type Sequences Which Can Be Issued In Parallel

Sequence  Description
RMxx      REG-side followed immediately by a MEM-side instruction
RMCx      REG-side followed immediately by a MEM-side followed immediately by a CTRL instruction
RMxC      REG-side followed immediately by a MEM-side followed by a CTRL instruction in the same rolling quad word
RCxx      REG-side followed immediately by a CTRL instruction
RxCx      REG-side followed by a CTRL instruction in the same rolling quad word
RxxC
MCxx      MEM-side followed immediately by a CTRL instruction
MxCx      MEM-side followed by a CTRL instruction in the same rolling quad word
MxxC

rolling quad word: Word (IP), Word (IP + 4), Word (IP + 8), Word (IP
488. obl.t g0, g6, loop   addi g4, g5, g5   bl.t opt_loop

[Table: Execution from DR for the two loops; columns Clock, REGop, MEMop, CTRLop]

A.2.7.3 Advancing Comparisons
Where possible, instructions which change condition codes should be separated from instructions that use condition codes. Although correct branch prediction gives the same performance as separating the compare from the branch, prediction is statistical while separation is deterministic.

In the previous example, optimized code advanced the comparison enough that branch prediction is not being relied upon to keep the branch-true path executing at nine clocks. Furthermore, the branch-false path does not take extra clocks since the condition codes are known when the branch is encountered.

In a situation where the comparison and a branch cannot be separated to achieve a performance advantage, use the combined compare-and-branch instructions. This is likely to lead to faster execution since the two instructions are encoded in a single word. This code economy frees another location in the cache, and the IS may be able to see the branch earlier because the branch is encoded in the same opcode word.

A.2.7.4 Unrolling Loops
Expand small loops in
489. occur. For multi-clock coprocessors such as the multiply/divide unit, the coprocessor must arbitrate access to the return path.

A.1.3 Register File (RF)
The RF contains the 16 local and 16 global registers and has six ports (Figure A-4). This allows several of the core's coprocessors to access the register set in parallel. This parallel access results in an ability to execute one simple logic or arithmetic instruction, one memory operation (LOAD/STORE) and one address calculation per clock.

Figure A-4. Six-Port Register File

MEM coprocessors interface to the RF with a 128-bit-wide load bus and a 128-bit-wide store bus. An additional 32-bit port allows the Address Generation Unit to simultaneously fetch an address or address reduction operand. These wide load and store data paths:

• enable up to four words of source data and four words of destination data to simultaneously pass between the RF and a MEM coprocessor in a single clock.
• provide a high-bandwidth path between data RAM, data cache (80960CF only) and local register cache to implement high-speed data operations.
• provide a highly efficient means for moving load, store and fetch data between the bus controller and the RF.

RE
490. ocessor can access through its fault handling mechanism. Using the system procedure table to store IPs for fault handling is described in section 7.1, FAULT HANDLING FACILITIES OVERVIEW (pg. 7-1).

Figure 5-4 shows the system procedure table structure. It is 1088 bytes in length and can have up to 260 procedure entries. At initialization, the processor caches a pointer to the system procedure table. This pointer is located in the PRCB. The following subsections describe this table's fields.

Figure 5-4. System Procedure Table
Entry Type: 00 = Local, 10 = Supervisor. Reserved fields: initialize to 0. The last entry, procedure entry 259, is at offset 43CH.

5.5.1.1 Procedure Entries
A procedure entry in the system procedure table specifies a procedure's location and type. Each entry is one word in length and consists of an address (or IP) field and a type field. The address field gives the address of the first instruction of the target procedure. Since all instructions are word aligned, only the entry's 30 most significant bits are used for the address. The entry's two least significant bits specify entry type. The procedure entry type field indicates call type: system-local call or system-supervisor call (Table 5-2). On a system call, the processor perfo
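The procedure entry layout in chunk 490 (30 address bits plus a 2-bit type field) can be sketched as a small decoder. Treat this as an illustration only: the helper names are hypothetical, and the entry 0 offset of 30H is derived from the stated table size (1088 bytes = 48 + 260 × 4), which agrees with the 43CH offset shown for entry 259.

```python
ENTRY0_OFFSET = 0x30   # derived: 1088 bytes total minus 260 four-byte entries
MAX_ENTRIES = 260
ENTRY_TYPES = {0b00: "local", 0b10: "supervisor"}  # other encodings treated as reserved


def entry_offset(index: int) -> int:
    """Byte offset of procedure entry `index` within the system procedure table."""
    if not 0 <= index < MAX_ENTRIES:
        raise ValueError("procedure number must be in the range 0-259")
    return ENTRY0_OFFSET + 4 * index


def decode_entry(word: int) -> tuple:
    """Split a one-word entry into (procedure address, entry type)."""
    address = word & 0xFFFFFFFC      # upper 30 bits; instructions are word aligned
    entry_type = ENTRY_TYPES.get(word & 0x3, "reserved")
    return address, entry_type
```

For example, an entry word of 0x00001002 would decode as a supervisor procedure at address 0x1000.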
491. oducts. The i960 Cx processors are designed for applications which require greater performance on a single chip than is usually found in an entire embedded system. The sheer speed of the i960 Cx processors enriches traditional embedded applications and makes many new functions possible at a reduced cost. These embedded processors are versatile; they are found in diverse products such as laser printers, X-terminals, bridges, routers, PC add-in cards and server motherboards.

Figure 1-1 identifies the processors' most notable features, including the multiple-instruction-per-clock C-series core, two-way set associative instruction cache, programmable register cache, on-chip data RAM, multi-mode programmable bus controller for its demultiplexed bus, four-channel 59 Mbyte-per-second DMA controller and high-speed interrupt controller.

1.1 i960 MICROPROCESSOR ARCHITECTURE
The i960 architecture provides a high-performance computing model. The architecture profits from reduced instruction set computer (RISC) concepts and, through superscalar implementations, includes refinements for execution of more than one instruction per clock. The architecture provides a high-speed procedure call/return model, a powerful instruction set suited to parallelism, and integrated interrupt and fault handling models appropriate in a parallel execution environment.

1.1.1 Parallel Instruction Execution
To sustain execution of multiple instructions in each cloc
492. of both comparison operations. Therefore, only one conditional branch instruction is required to act upon the range check; otherwise, two branches would be needed.

chkbit checks a specified bit in a register and sets the condition code flags according to the bit state. The condition code is set to 010₂ if the bit is set and 000₂ otherwise.

4.2.6.2 Compare and Increment or Decrement
These instructions compare two operands, set the condition code bits according to the results, then increment or decrement one of the operands:

cmpinci   compare and increment integer
cmpinco   compare and increment ordinal
cmpdeci   compare and decrement integer
cmpdeco   compare and decrement ordinal

These all use the REG format and can specify literals or local, global or special function registers. They are an architectural performance optimization which allows two register operations (e.g., compare and add) to execute in a single cycle. These are intended for use at the end of iterative loops.

4.2.6.3 Test Condition Codes
These test instructions allow the state of the condition code flags to be tested:

teste    test for equal
testne   test for not equal
testl    test for less
testle   test for less or equal
testg    test for greater
testge   test for greater or equal
testo    test for ordered
testno   test for unordered

If the condition code matches the instruction-specified condition, these cause a TRUE (01H) to be stored
493. oint may also be enabled or disabled individually by programming the BPCON enable bits.

The instruction address breakpoint, data address breakpoint and breakpoint control registers are on-chip control registers. These are loaded from the control table in memory at initialization or may be modified using sysctl. Control registers are described in section 2.3, CONTROL REGISTERS (pg. 2-6). sysctl is further described in section 4.3, SYSTEM CONTROL FUNCTIONS (pg. 4-19).

A breakpoint trace event is signalled when the processor attempts an access which is set for detection (instruction or data breakpoint). Breakpoint trace is enabled by setting the appropriate field in the IPB0, IPB1 and BPCON registers. If breakpoint trace is enabled, the appropriate TC register hardware breakpoint trace event flags are set. If tracing is enabled, a trace fault is generated after the faulting instruction completes execution.

Figure 8-4. Hardware Breakpoint Control Register (BPCON)
Data Address 0 Breakpoint Enable (BPCON.e0): 00 = disable, 11 = enable
DAB0 Mode: see Note 1
Data Address 1 Breakpoint Enable (BPCON.e1): 00 = disable, 11 = enable
Reserved: initialize to 0
NOTE 1: Data Address Breakpoint (DAB0, DAB1) Modes. Break on:
  00  store only
  01  data only (load or store)
  10  data or instruction fetch
  11  any access

8.3 SIGNALING A TRACE EVENT
494. ol bit. The processor gets the trace control bit from bit 0 of the supervisor stack pointer, which is cached during the reset initialization sequence. When the trace control bit is set, tracing is enabled on supervisor calls; when cleared, tracing is disabled on supervisor calls. Upon return from the supervisor procedure, the PC register trace enable bit is restored to the value saved in the PFP register return type field.

8.2 TRACE MODES
This section defines trace modes enabled through the TC register. These modes can be enabled individually or several modes can be enabled at once. Some modes overlap, such as call trace mode and supervisor trace mode. Section 8.4, HANDLING MULTIPLE TRACE EVENTS (pg. 8-8) describes processor function when multiple trace events occur.

• Instruction trace
• Branch trace
• Breakpoint trace
• Prereturn trace
• Call trace
• Return trace
• Supervisor trace

8.2.1 Instruction Trace
When the instruction trace mode is enabled, the processor generates an instruction trace event each time an instruction executes. A debug monitor can use this mode to single-step the processor.

8.2.2 Branch Trace
When the branch trace mode is enabled, the processor generates a branch trace event when a branch instruction executes and the branch is taken. A branch trace event is not generated for conditional branch instructions that do not branch, or for branch-and-link, call or return instructions.

8
495. om address timer_0 deactivates the interrupt input.

Example 12-1. Return from a Level-detect Interrupt

    # Clear level-detect interrupts before return from handler
            ld      timer_0, g0     # Get timer value and clear XINT0
    wait:   clrbit  0, sf0, sf0     # Attempt to clear bit
            bbs     0, sf0, wait    # Retry if not clear
            ret                     # Return from handler

The debounce sampling mode provides a built-in filter for noisy or slow-falling inputs. The debounce sampling mode requires that a low level is stable for approximately 6 PCLK2:1 periods before the interrupt input is detected. Expanded mode interrupts are always sampled using the debounce sampling mode. This mode provides time for interrupts to trickle through external priority encoders.

Figure 12-5 shows how a signal is detected in each mode (debounce and fast sample mode). The debounce sampling option adds several clocks to an interrupt's latency due to the multiple clocks of sampling.

Interrupt pins are asynchronous inputs and are synchronized internally by the processor. If the input width is sufficient, the input is detected correctly regardless of setup and hold time relative to PCLK2:1. The interrupt inputs are internally sampled once every two PCLK2:1 falling edges.

Setup and hold specifications are provided in the data sheet which guarantee detection of the interrupt on particular edges of PCLK2:1. These specifications are useful in designs which use synchronous logic to gen
496. omic Read/Write Operation

    # x, z are temporary registers
    x <- atomic_read(pending_priorities);             # assert LOCK pin
    z <- read(pending_interrupts[vector_number/8]);
    x[vector_number/8] <- 1;
    z[vector_number mod 8] <- 1;
    write(pending_interrupts[vector_number/8]) <- z;
    atomic_write(pending_priorities) <- x;            # deassert LOCK

The LOCK pin can be used to prevent other agents on the bus from accessing the interrupt table during the posting operation. On the i960 Cx microprocessor, posting software interrupts is performed by sysctl.

6.5.2 Posting Interrupts Directly to the Interrupt Table
i960 Cx processors, or an external agent that is sharing memory with the microprocessor (such as an I/O processor or another i960 Cx processor), can post pending interrupts directly in the interrupt table by setting the appropriate bits in the pending priorities and pending interrupts fields. This action, however, does not ensure that the core will handle the interrupt immediately, nor does it cause the core to update the value in the software priority register. To do this, the sysctl instruction should be used as described in the preceding sections.

sysctl can be used at any time to explicitly force the core to check the interrupt table for pending interrupts. This is done by specifying an invalid vector number in the range of 0 to 7. For example, when an external agent is posting interrupts to a shared interrupt table, sys
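The posting sequence in chunk 496 can be modeled in software to see which bits change for a given vector. This is a sketch only: the function name is hypothetical, the pending-interrupts field is modeled as a byte array indexed by vector_number/8, and the LOCK-protected atomic read and write are not modeled at all.

```python
def post_interrupt(pending_priorities: int,
                   pending_interrupts: bytearray,
                   vector: int) -> int:
    """Set the pending bits for `vector`; return the new pending_priorities word.

    Vectors 0-7 are rejected because the text describes them as invalid
    vector numbers used only to force a check of the interrupt table.
    """
    if not 8 <= vector <= 255:
        raise ValueError("vector number must be in the range 8-255")
    priority = vector // 8
    pending_interrupts[priority] |= 1 << (vector % 8)  # z[vector mod 8] <- 1
    return pending_priorities | (1 << priority)         # x[vector / 8] <- 1
```

Posting vector 98, for instance, sets bit 12 of the pending priorities word and bit 2 of byte 12 in the pending interrupts field.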
497. omplete the channel setup. The setup process runs concurrently with the execution of the user's program. After the setup process is started, it is possible to enable a channel through the DMAC register before the setup completes. In this case, the DMA controller waits for the setup to complete before the DMA operation begins. The result is the potential for additional latency on the first DMA request. To decrease this additional latency, issue the sdma instruction well in advance of enabling the DMA channel.

A second sdma can be issued before a previously issued DMA setup event completes. The second sdma must wait for the first event to complete, preventing other instructions from executing. If the segment of code which issues the sdma is time critical, it may be beneficial to overlap operations other than sdma with the setup event and space the sdma instructions in the code instead of issuing them back to back. A waiting sdma instruction is interruptible; therefore, back-to-back sdma instructions do not adversely increase interrupt latency.

DMA CONTROLLER

Internal Register   non-chained multi-cycle DMA   non-chained fly-by DMA   any chained DMA
op 1                Channel No. (0-3)             Channel No. (0-3)        Channel No. (0-3)
op 2                DMA Control Word              DMA Control Word         DMA Control Word
op 3                Byte Count                    Byte Count               Pointer To 1st Descriptor
Source Address Destination Address Note must be quad aligned regi
498. on and by designing to reduce noise and reflection on signal lines.

14.4.9 Interference

Interference is the result of electrical activity in one conductor that causes transient voltages to appear in another conductor. Interference increases with the following factors:
• Frequency: Interference is the result of changing currents and voltages. The more frequent the changes, the greater the interference.
• Closeness of two conductors: Interference is due to electromagnetic and electrostatic fields whose effects are weaker further from the source.

INITIALIZATION AND SYSTEM REQUIREMENTS

Two types of interference must be considered in high frequency circuits: electromagnetic interference (EMI) and electrostatic interference (ESI).

EMI (also called crosstalk) is caused by the magnetic field that exists around any current-carrying conductor. The magnetic flux from one conductor can induce current in another conductor, resulting in transient voltage. Several precautions can minimize EMI:
• Run ground lines between two adjacent lines wherever they traverse a long section of the circuit board. The ground line should be grounded at both ends.
• Run ground lines between the lines of an address bus or a data bus if either of the following conditions exist: the bus is on an external layer of the board; or the bus is on an internal layer but not sandwiched between power and ground planes that are at most 10 mils away
499. on time

Name: A descriptive name for the instruction.
Mnemonic: The acronym recommended for use by i960 processor assemblers.
Assembler Syntax: The recommended operand ordering and syntax for i960 processor assemblers.
Action: An abbreviated algorithmic description of the action that the instruction performs, including modification to the AC, PC or TC registers. Any possible faults generated are also listed. Table A-1 describes the meaning of the shorthand used in the register and fault sections of the reference.

Table A-1. Action Shorthand

Register columns (Symbol, Definition):
   Not changed or generated by the instruction
0  Set to 0 by the instruction under any condition
1  Set to 1 by the instruction under any condition
   May be set or cleared by the instruction
   May be cleared but is never set by the instruction
   May be set but is never cleared by the instruction

Trace columns (Symbol, Definition): C Call; S Supervisor; BR Breakpoint; R Return; P Prereturn; U Unimplemented

Fault columns: T Trace Fault Type OP Invalid Operand Operation Fault OC Invalid Opcode A Arithmetic Fault Type IO Integer Overflow Constraint Fault Type 2 Zero Divide P Privilege Fault Type M Machine Y Type Fault Type R Range Instruction Undefined B Branch

INSTRUCTION SET QUICK REFERENCE KEY

Opcode, Instruction Format, Machine Type, Execution Time: The
500. only 0 disabled

Figure F-19. Memory Region Configuration Register (MCON 0-15)
(Section 10.3.1, Memory Region Configuration Registers (MCON 0-15), pg. 10-6)

REGISTER AND DATA STRUCTURES

Return Status: Return Type Field (PFP.rt)
Pre-Return Trace Flag (PFP.p)
Previous Frame Pointer: Address (PFP.a)

Figure F-20. Previous Frame Pointer Register (PFP) (r0)
(Section 5.8, RETURNS, pg. 5-16)

Trace Enable Bit (PC.te): 0 = no trace faults, 1 = generate trace faults
Execution Mode Flag (PC.em): 0 = user mode, 1 = supervisor mode
Trace Fault Pending (PC.tfp): 0 = no fault pending, 1 = fault pending
State Flag (PC.s): 0 = executing, 1 = interrupted
Priority Field (PC.p): 0-31 = process priority
Reserved (Do Not Modify)

Figure F-21. Process Controls (PC) Register
(Section 2.6.3.1, Initializing and Modifying the PC Register, pg. 2-19)

Trace Mode Bits: Instruction Trace Mode (TC.i), Branch Trace Mode (TC.b), Call Trace Mode (TC.c), Return Trace Mode (TC.r), Pre-Return Trace Mode (TC.p), Supervisor Trace Mode (TC.s), Breakpoint Trace Mode (TC.br)
Trace Event Flags: Instruction (TC.if), Branch (TC.bf), Call (TC.cf), Return (TC.rf), Pre-Return (TC.pf), Supervisor (TC.sf), Breakpo
501. onous Inputs: column headings "sampled by PCLK2:1" and "sampled by RESET"; inputs listed: RESET, D31:0, XINT7:0, STEST, READY, NMI, ONCE, BTERM, DREQ3:0, CLKMODE, HOLD.

i960 Cx processor inputs which are considered asynchronous are internally synchronized to the rising edge of PCLK2:1. Since they are internally synchronized, the pins only need to be held long enough for proper internal detection. In some cases, it is useful to know if an asynchronous input will be recognized on a particular PCLK2:1 cycle or held off until a following cycle. The i960 CA/CF microprocessor data sheets provide setup and hold requirements relative to PCLK2:1 which ensure recognition of an asynchronous input on a particular clock. The data sheets also supply hold times required for detection of asynchronous inputs.

The ONCE, CLKMODE and STEST inputs are asynchronous inputs. These signals are sampled and latched on the rising edge of the RESET input instead of PCLK2:1.

14.4.6 High Frequency Design Considerations

At high signal frequencies and/or with fast edge rates, the transmission line properties of signal paths in a circuit must be considered. Reflections, interference and noise become significant in comparison to the high-frequency signals. These errors can be transient and therefore difficult to debug. In this section, some high-frequency design issues are discussed; for more information, consult a reference book on high frequency d
502. onsistent interrupt model, as required for interrupt handler compatibility between various implementations of the i960 processor family. The architecture, however, leaves the interrupt request management strategy to the specific i960 processor family implementations. In the i960 Cx processors, the programmable on-chip interrupt controller transparently manages all interrupt requests (Figure 12-1). These requests originate from:
• 8-bit external interrupt pins (XINT7:0)
• four DMA controller channels
• non-maskable interrupt pin (NMI)
• sysctl instruction execution

External interrupt pins can be programmed to operate in three modes:
1. dedicated mode: the pins may be individually mapped to interrupt vectors.
2. expanded mode: the pins may be interpreted as a bit field which can request any of the 248 possible interrupts that the i960 processor family supports.
3. mixed mode: five pins operate in expanded mode and can request thirty-two different interrupts, and three pins operate in dedicated mode.

Dedicated-mode requests are posted in the Interrupt Pending Register (IPND). The processor does not post expanded-mode requests.

The NMI pin allows a highest-priority, non-maskable and non-interruptible interrupt to be requested. NMI is always a dedicated-mode input.

Each of the four DMA channels has an associated interrupt request to allow the application to synchronize with the DMA operations of each channel. DMA interrupt requests are alway
503. operand's least significant bits. src2 must be an even-numbered register (i.e., r0, r2, r4, ... or g0, g2, ... or sf0, sf2, ...). The src1 value is a normal ordinal (i.e., 32 bits).

The result consists of a one-word remainder and a one-word quotient. Remainder is stored in the register designated by dst; quotient is stored in the next highest numbered register. dst must be an even-numbered register (i.e., r0, r2, r4, ... or g0, g2, ... or sf0, sf2, ...).

This instruction performs ordinal arithmetic. If this operation overflows (quotient or remainder do not fit in 32 bits), no fault is raised and the result is undefined.

Action:   dst <- src2 - (src2/src1) * src1;    # remainder
          dst + 1 <- src2/src1;                # quotient
          # src2 is 64 bits; src1, dst and dst + 1 are 32 bits
Faults:   Type Mismatch: Non-supervisor reference of a sfr.
          Arithmetic Zero Divide: The src1 operand is 0.
Example:  ediv g3, g4, g10    # g10 <- remainder of g4,g5/g3
                              # g11 <- quotient of g4,g5/g3
Opcode:   ediv 671H REG
See Also: emul, divi, divo

INSTRUCTION SET REFERENCE

9.3.23 emul

Mnemonic:    emul Extended Multiply
Format:      emul src1, src2, dst
                  reg/lit/sfr reg/lit/sfr reg/sfr
Description: Multiplies src2 by src1 and stores the result in dst. Result is a long ordinal (64 bits) stored in two adjacent registers. dst specifies lower numbered register, which receives the result's least significant bits. dst must be an even numbered registe
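The ediv action described above can be modeled with ordinary integer arithmetic. The following is a sketch in Python, assuming unsigned (ordinal) operands. The function name `ediv` and its return order (remainder for dst, then quotient for dst + 1) are illustrative choices. Because the manual leaves the overflow result undefined, the 32-bit masking applied here is only a placeholder for that case.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def ediv(src1, src2):
    """Model of ediv: divide 64-bit ordinal src2 by 32-bit ordinal src1.

    Returns (remainder, quotient), i.e. the values written to dst and dst + 1.
    """
    divisor = src1 & MASK32
    dividend = src2 & MASK64
    if divisor == 0:
        # Stands in for the Arithmetic Zero Divide fault.
        raise ZeroDivisionError("Arithmetic Zero Divide: src1 is 0")
    quotient, remainder = divmod(dividend, divisor)
    # If the quotient does not fit in 32 bits the real result is undefined;
    # masking is an arbitrary choice made for this sketch.
    return remainder & MASK32, quotient & MASK32
```

A handler-level model of the fault would replace the Python exception with whatever fault dispatch the surrounding simulation uses.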
504. operation 5-6
calls 5-2, 9-22
call-trace mode 8-4
callx 5-2, 9-24
chkbit 9-26
clock input (CLKIN) 14-26
clrbit 9-27
cmp 9-29
cmpdec 9-28
cmpdeci, cmpdeco 9-28
cmpi, cmpo 9-29
cmpib, cmpob 9-31
cmpibe, cmpibne, cmpibl, cmpible 9-31
cmpibg, cmpibge, cmpibo, cmpibno 9-31
cmpinc 9-30
cmpinci, cmpinco 9-30
cmpobe, cmpobne 9-31
cmpobg, cmpobge 9-31
cmpobl, cmpoble 9-31
Coding Optimizations
    branch prediction A-53
    branch target alignment 53
    comparison and branching 46
    compressing algorithms using branching 54
    data cache A-55
    data RAM A-55
    instruction cache 55
    loads and stores A-44
    loop expansion 46
    maximizing instruction execution A-49
    multiplication and division A-45
    on-chip storage A-55
    register cache A-56
    reordering code for parallel issue 48
cold reset 12-15, 14-2
comparison instructions 4-12, 9-31
    compare and conditional compare instructions 4-12
    compare and increment or decrement instructions 4-13
    test condition instructions 4-13
concmp 9-34
concmpi, concmpo 9-34
conditional fault instructions 4-17
configurable memory regions (see also MCON) 10-1
Control Pipeline
    conditional branches A-32
    unconditional branches A-28
control registers 2-1, 2-7
control table 2-1, 2-6, 2-8
    initialization 14-11
core architecture A-1
core architecture mechanisms C-1
D
data alignment 3-4
data alignment in external memory 2-11
data cache 2-14, A-8
    BCU interaction A-11
    bus configuration
505. opies 4 bytes from memory into successive registers; ldl copies 8 bytes; ldt copies 12 bytes; ldq copies 16 bytes. st copies 4 bytes from successive registers into memory; stl copies 8 bytes; stt copies 12 bytes; stq copies 16 bytes.

For ld, ldob, ldos, ldib and ldis, the instruction specifies a memory address and register; the memory address value is copied into the register. The processor automatically extends byte and short (half-word) operands to 32 bits according to data type. Ordinals are zero-extended; integers are sign-extended.

For st, stob, stos, stib and stis, the instruction specifies a memory address and register; the register value is copied into memory. For byte and short instructions, the processor automatically reformats the source register's 32-bit value for the shorter memory location. For stib and stis, this reformatting can cause integer overflow if the register value is too large for the shorter memory location. When integer overflow occurs, either an integer-overflow fault is generated or the integer-overflow flag in the AC register is set, depending on the integer-overflow mask bit setting in the AC register.

INSTRUCTION SET SUMMARY

For stob and stos, the processor truncates the operand and does not create a fault if truncation resulted in the loss of significant bits.

4.2.1.2 Move

Move instructions copy data from a local, global, special function register or group of registers to another register or
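The extension and truncation rules for the byte forms described above can be sketched in Python. This is an illustration under stated assumptions: register values are passed as Python integers (signed for the integer forms), the function names simply mirror the mnemonics, and the overflow outcome is returned as a flag instead of modeling the AC register mask bit.

```python
def ldob(memory_byte):
    """Load ordinal byte: zero-extend an 8-bit value to 32 bits."""
    return memory_byte & 0xFF


def ldib(memory_byte):
    """Load integer byte: sign-extend an 8-bit value (returned as a signed int)."""
    value = memory_byte & 0xFF
    return value - 0x100 if value & 0x80 else value


def stob(register_value):
    """Store ordinal byte: truncate silently, never faulting."""
    return register_value & 0xFF


def stib(register_value):
    """Store integer byte: return (stored_byte, overflow).

    overflow is True when the signed value does not fit in 8 bits; the
    processor would then fault or set the AC overflow flag, depending on
    the integer-overflow mask bit.
    """
    overflow = not -128 <= register_value <= 127
    return register_value & 0xFF, overflow
```

The short (half-word) forms follow the same pattern with 16-bit masks and the range -32768 to 32767.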
506. or

9.3.41 not, notand

Mnemonic:    not Not
             notand Not And
Format:      not src, dst
                 reg/lit/sfr reg/sfr
             notand src1, src2, dst
                    reg/lit/sfr reg/lit/sfr reg/sfr
Description: Performs a bitwise NOT (not instruction) or NOT AND (notand instruction) operation on the src2 and src1 values and stores the result in dst.
Action:      not: dst <- not (src);
             notand: dst <- (not (src2)) and src1;
Faults:      Type Mismatch: Non-supervisor reference of a sfr.
Example:     not g2, g4           # g4 <- NOT g2
             notand r5, r6, r7    # r7 <- NOT r6 AND r5
Opcode:      not 58AH REG
             notand 584H REG
See Also:    and, andnot, nand, nor, notor, or, ornot, xnor, xor

9.3.42 notbit

Mnemonic:    notbit Not Bit
Format:      notbit bitpos, src, dst
                    reg/lit/sfr reg/lit/sfr reg/sfr
Description: Copies the src value to dst with one bit toggled. The bitpos operand specifies the bit to be toggled.
Action:      dst <- src xor 2^(bitpos mod 32);
Faults:      Type Mismatch: Non-supervisor reference of a sfr.
Example:     notbit r3, r12, r7   # r7 <- r12 with the bit specified in r3 toggled
Opcode:      notbit 580H REG
See Also:    alterbit, chkbit, clrbit, setbit

9.3.43 notor

Mnemonic:    notor Not Or
Format:      notor src1, src2, dst
                   reg/lit/sfr reg/lit/sfr reg/sfr
Description: Performs a bitwise NOTOR operation on src2 and src1 values and stores result in dst
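The Action lines for not, notand and notbit above translate directly into 32-bit integer operations. The sketch below is in Python. The function names are illustrative (`i960_not` avoids the Python keyword `not`), and operands are assumed to be unsigned 32-bit register values.

```python
MASK32 = 0xFFFFFFFF


def i960_not(src):
    """dst <- not (src), limited to 32 bits."""
    return ~src & MASK32


def notand(src1, src2):
    """dst <- (not (src2)) and src1."""
    return (~src2 & MASK32) & (src1 & MASK32)


def notbit(bitpos, src):
    """dst <- src xor 2^(bitpos mod 32): toggle one bit of src."""
    return (src ^ (1 << (bitpos % 32))) & MASK32
```

Note the operand order in `notand`: the complement applies to src2, not src1, which matches the example comment "r7 <- NOT r6 AND r5" for `notand r5, r6, r7`.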
507. or number of the highest priority pending interrupt and clears the pending interrupts and pending priorities bits in the table that correspond to that interrupt.
3. The core detects the interrupt with the next highest priority which is posted in the interrupt table (if any) and writes that value into the software priority register.
4. The core services the highest priority interrupt.

If more than one pending interrupt is posted in the interrupt table at the same interrupt priority, the core handles the interrupt with the highest vector number first.

The software priority register is an internal register and, as such, is not visible to the user. The core only updates this register's value when sysctl requests an interrupt or when a software-generated interrupt is serviced.

6.7 INTERRUPT STACK AND INTERRUPT RECORD

The interrupt stack can be located anywhere in the non-reserved address space. The processor obtains a pointer to the base of the stack during initialization. The interrupt stack has the same structure as the local procedure stack described in section 5.2.1, Local Registers and the Procedure Stack (pg. 5-2). As with the local stack, the interrupt stack grows from lower addresses to higher addresses.

The processor saves the state of an interrupted program, or an interrupted interrupt procedure, in a record on the interrupt stack. Figure 6-3 shows the structure of this interrupt record. The interrupt record is always stored on the
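The selection rule described above (highest priority first, then highest vector number within that priority) can be sketched in Python. The layout is an assumption made for this illustration: pending priorities is a 32-bit `int`, pending interrupts is a 32-byte sequence indexed by priority, and a vector's priority is its number divided by 8. The function name `highest_pending` is hypothetical.

```python
def highest_pending(pending_priorities, pending_interrupts):
    """Return the vector number the core would service next, or None."""
    if pending_priorities == 0:
        return None
    # Highest set bit of the pending priorities word is the top priority.
    priority = pending_priorities.bit_length() - 1
    bits = pending_interrupts[priority]
    if bits == 0:
        return None  # priority bit set but no vector posted in that byte
    # Within one priority, the highest vector number is handled first.
    return priority * 8 + bits.bit_length() - 1
```

With vectors 16 and 18 pending at priority 2 and vector 41 pending at priority 5, the function returns 41; once priority 5 is cleared it returns 18.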
508. or was in user mode when the fault occurred, this operation causes a switch from the supervisor stack to the user stack.
• If the trace-fault-pending flag and trace-enable bit are set, the trace fault is also handled at this time.

PC register restoration causes any changes to the process controls caused by the fault handling procedure to be lost. In particular, if the ret instruction from the fault handling procedure caused the PC register trace-fault-pending flag to be set, this setting would be lost upon return.

7.8.4 Faults and Interrupts

If an interrupt occurs during:
• an instruction that will fault, or
• an instruction that has already faulted, or
• fault handling procedure selection,
the processor handles the interrupt in the following way: it completes the selection of the fault handling procedure, then services the interrupt just prior to executing the first instruction of the fault handling procedure. The fault is handled upon return from the interrupt. Handling the interrupt before the fault reduces interrupt latency.

7.9 PRECISE AND IMPRECISE FAULTS

As described in section 7.6.4, Parallel Faults (pg. 7-9), the i960 architecture, to support parallel and out-of-order instruction execution, allows some faults to be generated together and not in sequence. When this situation occurs, it may be impossible to recover from some faults because the state of the instructions surrounding the faulting instruction has changed or the
509. ord fault handler word default sysproc space 251 4
Fault Table
syscall 2 equ fault proc 7 text align 4 2
Reserved
Supervisor stack pointer supervisor proc
Preserved
sysproc 0 sysproc 1 sysproc 2 sysproc 3 sysproc 4 sysproc 5 sysproc 6 sysproc 7 sysproc 8 sysproc 9

Example 14-2. Startup Routine (init.s) (Sheet 3 of 6)

boot flt table
word fault proc 2 syscall 0      Parallel Fault
word 0 27
word fault proc 2 syscall 1      Trace Fault
word 0 27
word fault proc 2 syscall 2      Operation Fault
word 0 27
word fault proc 2 syscall 3      Arithmetic Fault
word 0 27
word fault proc 2 4 syscall 4    Reserved
word 0 27
word fault proc 2 syscall 5      Constraint Fault
word 0 27
word fault proc 2 syscall 6      Reserved
word 0 27
word fault proc 2 syscall 7      Protection Fault
word 0 27
word fault proc 2 4 syscall 8    Reserved
word 0 27
word fault proc 2 syscall 9      Reserved
word 0 27
word fault proc 2 syscall 0xa    Type Fault
word 0 27
space 21 8                       reserved

Boot Interrupt Table
text boot intr table
word 0
word 0 305 0 20 0 Op 2090
word intx intx intx intx intx intx intx intx
word intx intx intx intx intx intx intx
510. ore lines. This effect is illustrated in Figure 14-6, which shows that two lines in parallel have half the impedance of one. To reduce impedance even further, add more lines. Ideally, a plane (an infinite number of parallel lines) results in the lowest impedance. Fabricate ground planes with a minimum of 2 oz copper; power and ground pins must be connected to a plane. Ideally, the i960 Cx processor should be located at the center of the board to take full advantage of these planes, simplify layout and reduce noise.

Figure 14-6. Reducing Characteristic Impedance

14.4.4 Decoupling Capacitors

Decoupling capacitors placed across the device between VCC and VSS reduce voltage spikes by supplying the extra current needed during switching. Place these capacitors close to their devices because connection line inductance negates their effect. Also, for this reason, the capacitors should be low inductance. Chip capacitors (surface mount) exhibit lower inductance and require less board space than conventional leaded capacitors.

14.4.5 I/O Pin Characteristics

The i960 Cx processor interfaces to its system through its pins. This section describes the general characteristics of the input and output pins.

14.4.5.1 Output Pins

All output pins on the i960 Cx processor are three-state outputs. Each output can drive a logic 1 (low impedance to
511. ors facilities for runtime activity monitoring.

The i960 architecture provides facilities for monitoring processor activity through trace event generation. A trace event indicates a condition where the processor has just completed executing a particular instruction or type of instruction, or where the processor is about to execute a particular instruction. When the processor detects a trace event, it generates a trace fault and makes an implicit call to the fault handling procedure for trace faults. This procedure can, in turn, call debugging software to display or analyze the processor state when the trace event occurred. This analysis can be used to locate software or hardware bugs or for general system monitoring during program development.

Tracing is enabled by the process controls (PC) register trace-enable bit and a set of trace-mode bits in the trace controls (TC) register. Alternatively, the mark and fmark instructions can be used to generate trace events explicitly in the instruction stream.

The i960 Cx processors also provide four hardware breakpoint registers that generate trace events and trace faults. Two registers are dedicated to trapping on instruction execution addresses, while the remaining two registers can trap on the addresses of various types of data accesses.

8.1 TRACE CONTROLS

To use the architecture's tracing facilities, software must provide trace fault handling procedures, perhaps interfaced with a debugging monitor. Soft
512. ort two types of procedure calls: an integrated call-and-return mechanism and a RISC-style branch-and-link instruction. The integrated call-and-return mechanism automatically saves local registers when a call instruction executes and restores them when a ret (return) instruction executes. The RISC-style branch-and-link is a fast call that does not save any of the registers. These mechanisms result in high performance and reduced code size while maintaining assembly-level compatibility.

To attain the highest performance for procedure calls and returns, the processors integrate a programmable-depth register cache. The register cache internally saves the local registers for procedure calls, rather than actually writing the data to the external procedure stack. This caching greatly reduces the external bus traffic associated with procedure context saving and restoring.

1.1.3 Versatile Instruction Set and Addressing

The i960 Cx microprocessors offer a full set of load, store, move, arithmetic, shift, comparison and branch instructions and support operations on both integer and ordinal data types. They also provide a complete set of Boolean and bit-field instructions to simplify manipulation of bits and bit strings.

Most instructions are typical RISC operations. However, several commonly used complex instructions are also part of the instruction set. Performance is optimized by implementing these commonly used functions with parallel hardware. For inst
513. ory. When the processor is initialized, these pointers are read from the initialization data structures and cached for internal use. Pointers to the system procedure table, interrupt table, interrupt stack, fault table and control table are specified in the processor control block. Supervisor stack location is specified in the system procedure table. User stack location is specified in the user's startup code. Of these structures, the system procedure table, fault table, control table and initialization data structures may be in ROM; the interrupt table and stacks must be in RAM. The interrupt table must be in RAM because the processor sometimes writes to it.

PROGRAMMING ENVIRONMENT

2.5 MEMORY ADDRESS SPACE

Address space is byte-addressable with addresses running contiguously from 0 to $2^{32} - 1$. Some of this address space is reserved or assigned special functions as shown in Figure 2-3.

Address                      Offset   Use
0000 0000H                   0        NMI Vector
0000 0004H to 0000 003FH              Internal Data RAM (optional interrupt vectors)
0000 0040H to 0000 00BFH     64       Internal Data RAM (optional DMA registers)
0000 00C0H                   192
0000 0100H to 0000 03FFH     256
0000 0400H to FEFF FFFFH     1024     Code/Data/Architecturally Defined Data Structures (external memory)
FF00 0000H to FFFF FEFFH              Reserved Memory
FFFF FF00H to FFFF FF2CH              Initialization Boot Record (IBR)
FFFF FF2DH to FFFF FFFFH              Reserved Memory (top of address space: $2^{32} - 1$, 4 Gbytes)

Figure 2-3. Addre
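The address map in Figure 2-3 can be expressed as a lookup table. The Python sketch below is one reading of the figure, not an authoritative map. The region labels are paraphrased, the range 0000 00C0H to 0000 03FFH is labeled simply as internal data RAM (an assumption, since the figure text for it did not survive), and the name `region_of` is hypothetical.

```python
# (first address, last address, label); ranges follow one reading of Figure 2-3.
REGIONS = [
    (0x00000000, 0x00000003, "NMI vector"),
    (0x00000004, 0x0000003F, "internal data RAM (optional interrupt vectors)"),
    (0x00000040, 0x000000BF, "internal data RAM (optional DMA registers)"),
    (0x000000C0, 0x000003FF, "internal data RAM"),  # label assumed
    (0x00000400, 0xFEFFFFFF, "external memory"),
    (0xFF000000, 0xFFFFFEFF, "reserved memory"),
    (0xFFFFFF00, 0xFFFFFF2C, "initialization boot record"),
    (0xFFFFFF2D, 0xFFFFFFFF, "reserved memory"),
]


def region_of(address):
    """Return the label of the region containing a 32-bit byte address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    for first, last, label in REGIONS:
        if first <= address <= last:
            return label
    raise AssertionError("regions are expected to cover the whole space")
```

Because the table is contiguous, a linear scan is enough; a real tool would more likely generate this from the linker configuration.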
514. ory speed affect chaining latency. Chaining latency is defined as the time required for the DMA controller to access the next descriptor plus the time required to set up for the next buffer transfer. Chaining latency is reduced by placing descriptors in internal data RAM or fast memory.

13.6 DMA-SOURCED INTERRUPTS

Each DMA channel is the source for one interrupt. When a DMA channel signals an interrupt, the DMA interrupt-pending bit corresponding to that channel is set in the interrupt pending (IPND) register. Each channel's interrupt can be selectively masked in the interrupt mask (IMSK) register or handled as a dedicated hardware-requested interrupt. Refer to CHAPTER 6, INTERRUPTS for a complete description of hardware-requested interrupts.

The interrupt-pending bit for a DMA channel is set for the following conditions:
1. A non-chained DMA terminates because byte count reaches zero (0), or a chained DMA terminates because the null chaining pointer is reached.
2. The EOP3:0 pin is programmed as an input and asserted after a sdma is performed.
3. For a chained DMA, the interrupt-on-buffer-complete function is enabled and the end of a chaining buffer is reached.

13.7 SYNCHRONIZING A PROGRAM TO CHAINED BUFFER TRANSFERS

When any of the conditions listed above occur, the current DMA request is completed before the pending bit in the IPND register is set. Two mechanisms, illustrated in Figure 13-8, enable a
515. ource/destination chained
• The null chaining pointer is encountered in any chaining mode.

The DMA takes the following actions when any one of these events occur:
• DMAC register channel done flag is set.
• DMAC register channel terminal count flag is set only if the byte count has reached zero (0) (non-chained) or the null chaining pointer is reached (chaining).
• DMAC register channel active bit is reset after all channel activity has completed.
• IPND register channel interrupt pending bit is set. If the corresponding bit in the is cleared, an interrupt is signaled.

When a chained DMA channel is set up for source/destination chaining, the EOP3:0 inputs are designed to terminate only the current chaining buffer. The DMA controller continues normally with the next buffer transfer. The DMA ends as described above if the EOP3:0 pin is asserted during the last buffer transfer.

When EOP3:0 is asserted, the entire DMA bus request completes before the DMA terminates. For example, assume the DMA is programmed for quad-word transfers. If EOP3:0 is asserted, the entire quad word is transferred before the DMA terminates.

The DMA controller may be configured to generate an interrupt when a DMA terminates. A program may determine how a DMA has ended by reading the DMAC register channel terminal count and channel done flag values:
• If a channel's terminal count flag and done flag are set, the DMA has ended due to a byte coun
516. pact to the processor's available resources. When the cache is configured for 6 to 14 sets, part of the internal data RAM is used to expand the cache. Data RAM usage begins at the highest address of internal RAM (03FFH) and grows downward. As indicated in Table 5-1, the programmed value of the cache configuration word (CCW) in the PRCB determines the number of register sets cached and the amount of internal data RAM used.

PROCEDURE CALLS

Table 5-1. PRCB Cache Configuration Word and Internal Data RAM

CCW Value          # of Cached Sets   Internal Data RAM Used (bytes)
0                  1                  0
1 <= CCW <= 5      CCW                0
6 <= CCW <= 15     CCW - 1            (CCW - 5) x 16

Register cache cannot be disabled. Register cache size equals 1 when the cache configuration word is programmed to a value of 0. Also, a value of 5 or 6 produces the same number of cache sets; however, when programmed to 6, 16 bytes of internal data RAM is used; when programmed to 5, no internal data RAM is used.

The user program is responsible for preventing any corruption to the areas of internal RAM which are used for the register cache.

In a typical program, most procedure calls and returns cause procedure depth to oscillate a few levels around a median call depth. The cache tends to be partially filled at the median call depth. Cache flushes occur when oscillations around the median depth are larger than the cache size can accommodate. Configuring local register cache to hold five sets of local regist
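The relationship in Table 5-1 between the cache configuration word, the number of cached register sets and the internal data RAM consumed can be written as a small function. This Python sketch follows the table as read from the text above (including the note that CCW values of 5 and 6 give the same number of sets); the function name `register_cache_config` is hypothetical and the formulas should be checked against the printed table.

```python
def register_cache_config(ccw):
    """Return (cached_sets, internal_ram_bytes) for a CCW value of 0 to 15."""
    if not 0 <= ccw <= 15:
        raise ValueError("CCW is expected to be in the range 0 to 15")
    if ccw == 0:
        return 1, 0            # the register cache cannot be disabled
    if ccw <= 5:
        return ccw, 0          # held entirely on chip, no data RAM used
    return ccw - 1, (ccw - 5) * 16
```

For instance, a CCW of 15 corresponds to 14 cached sets, the upper end of the "6 to 14 sets" range that spills into internal data RAM.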
517. pcode Mach Instruction Result nif om of 2 1 0 p te Events Modes T A Format Type Issue Latency

Not Bit: notbit bitpos, src, dst. Opcode 580, REG format.
    dst <- src xor 2^(bitpos mod 32);
Not Or: notor. Opcode 58D, REG format.
    dst <- (not (src2)) or src1;
Or: or src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr). Opcode 587, REG format.
    dst <- src2 or src1;
Or Not: ornot src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr). Opcode 58, REG format.
    dst <- src2 or not (src1);
Remainder Integer: remi src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr). Opcode 748.
    if (src1 = 0) Arithmetic Zero Divide fault;
    dst <- remainder (src2/src1);    # src2, src1 and dst are 32 bits
Remainder Ordinal: remo src1, src2, dst (reg/lit/sfr, reg/lit/sfr, reg/sfr). Opcode 708, REG format.
    if (src1 = 0) Arithmetic Zero Divide fault;
    dst <- remainder (src2/src1);    # src2, src1 and dst are 32 bits
Return: ret. Opcode 0A, CTRL format.
    if PFP rt 001 or PFP rt 111 ret from fault or interrupt handler memory 12 if PC em Supervisor PC memory FP 16 IR else if PFP rt 010 or PFP rt 011 0 IRPS PS u 4 fill 4 4 fill ret to user procedure PC te lt 10 PC em lt User 1 FP 0 15 lt accesses are cached in lt RIP local register cache
Rotate len sre dst
518. pe mismatch fault to be generated. This feature provides supervisor protection for DMA and Interrupt functions which use internal RAM. See section 2.7, USER SUPERVISOR PROTECTION MODEL (pg. 2-20). User mode write protection is optionally selected for the rest of the data RAM (0100H to 03FFH) by setting the Bus Configuration Register (BCON) RAM protection bit.

2.5.5 Instruction Cache

The instruction cache enhances performance by reducing the number of instruction fetches from external memory. The cache provides fast execution of cached code and loops of code in the cache and also provides more bus bandwidth for data operations in external memory. The i960 Cx processors' instruction cache is a two-way set associative cache, organized in two sets of eight-word lines. Each line is composed of four two-word blocks which can be replaced independently.
• i960 CA processor cache is 1 Kbyte, organized as two sets of 16 eight-word lines.
• i960 CF processor cache is 4 Kbytes, organized as two sets of 64 eight-word lines.

To optimize cache updates when branches or interrupts execute, each word in the line has a separate valid bit. Cache misses cause the processor to issue either double- or quad-word fetches to update the cache. Refer to APPENDIX A, INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION for a thorough discussion of the instruction cache operation.

Bus snooping is not implemented with the i960 Cx processors' cache. The
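The two cache sizes quoted above follow from the organization: sets times lines per set times words per line times bytes per word. A short Python check, assuming 4-byte words (the name `cache_bytes` is illustrative):

```python
def cache_bytes(sets, lines_per_set, words_per_line=8, bytes_per_word=4):
    """Total instruction cache capacity in bytes for a given organization."""
    return sets * lines_per_set * words_per_line * bytes_per_word


ca_cache = cache_bytes(sets=2, lines_per_set=16)  # two sets of 16 eight-word lines
cf_cache = cache_bytes(sets=2, lines_per_set=64)  # two sets of 64 eight-word lines
```

Here `ca_cache` evaluates to 1024 bytes (1 Kbyte) and `cf_cache` to 4096 bytes (4 Kbytes), matching the figures in the text.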
519. perand fault.
          Unimplemented: Attempted to execute unimplemented command.
Example:  # set clear-cache message
          # execute cache invalidation; r7, r8 are dummies here
          # branch to code which was uploaded
Opcode:   659H REG

NOTE: When the modify process controls (modpc) instruction causes a program's priority to be lowered, other i960 processor family members check for pending interrupts in the memory-based interrupt table; the i960 Cx device internally stores the priority of the highest pending interrupt found in the interrupt table's pending interrupts field. To improve performance, the stored priority is checked, rather than the memory-based interrupt table, when modpc changes a process priority. The internal priority value is updated each time an interrupt is posted using sysctl.

9.3.59 TEST

Mnemonic:    teste{.t|.f}  Test For Equal
             testne{.t|.f} Test For Not Equal
             testl{.t|.f}  Test For Less
             testle{.t|.f} Test For Less Or Equal
             testg{.t|.f}  Test For Greater
             testge{.t|.f} Test For Greater Or Equal
             testo{.t|.f}  Test For Ordered
             testno{.t|.f} Test For Not Ordered
Format:      test*{.t|.f} dst
                          reg/sfr
Description: Stores a true (01H) in dst if the logical AND of the condition code and opcode mask-part is not zero. Otherwise, the instruction stores a false (00H) in dst. For testno (Unordered), a true is stored if the condition code is 000 (binary); otherwise a fals
520. pletes and the result is written to the destination register. When an instruction immediately follows an EU operation that references the EU's destination register, the new instruction is issued in the clock following the EU operation.

INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION

The EU directly executes the instructions listed in Table A-6. The EU is pipelined such that back-to-back EU operations execute at a one-clock sustained rate. The EU returns its result to the destination register in the clock following the clock in which the instruction was issued. If a fixup is needed during shrdi execution, the processor executes a four-clock micro-flow. See section A.2.6, Micro-flow Execution (pg. 36).

    addo g0, g1, g2
    shlo g3, g4, g5
    subo g5, g6, g7
    shro g8, g9, g10

Figure A-8. EU Execution Pipeline (the scheduler issues addo, shlo, subo and shro in successive clocks; the EU pipeline reads src1 and src2, then executes and writes dst: g2 <- g0 + g1, g5 <- g4 << g3, g7 <- g6 - g5, g10 <- g9 >> g8)

Table A-6. EU Instructions
addo, shlo, mov, and, addi, shro, movl, andnot, addc, shri, cmpo, notand, subo, shli, cmpi, nand, subi, shrdi, cmpdeco, or, subc, eshro, cmpdeci, nor, ornot, setbit, alterbit, scanbyte, notor, clrbit, chkbit, xnor, notbit, xor, not, rotate

A.2.4.2 Multiply/Divide Unit (MDU)

The MDU performs multiplication, division, remainder and modulo
521. plexer Inputs

B.3.4 Series Damping Resistors

Series damping resistors are recommended on all DRAM control and address inputs. Series damping resistors prevent overshoot and undershoot on input lines. Damping is required because of the large capacitive load present when many DRAMs are connected together, combined with circuit board trace inductance. Damping resistor values are typically between 15 and 100 Ohms, depending on the load: the lower the load, the higher the required damping resistor value. If the damping resistor value is too high, the signal will be overdamped, extending memory cycle time. If the damping resistor value is too low, overshoot or undershoot is not sufficiently damped.

B.3.5 System Loading

i960 Cx processors can drive a large capacitive load. However, systems with many DRAM banks may require data buffers (and, for interleaved designs, multiplexers) to isolate the DRAM load from the i960 Cx processors or other system components with less drive capability (e.g., high speed SRAM). RAS and CAS inputs to the DRAM should also be designed with consideration for capacitive load. When many DRAMs are connected to common RAS and CAS signals, the capacitive load becomes considerable.

B.3.6 Design Example: Burst DRAM with Distributed RAS-Only Refresh Using DMA

The goal of this design is to illustrate a DRAM interface controller that provides good memory performan
522. power and ground connections to the chip and reduces transient noise induced by current surges. The i960 Cx processor is implemented in CHMOS IV technology. Unlike NMOS processes, power dissipation in the CHMOS process is due to capacitive charging and discharging on-chip and in the processor's output buffers; there is almost no DC power component. The nature of this power consumption results in current surges when capacitors charge and discharge. The processor's power consumption depends mostly on frequency. It also depends on voltage and capacitive bus load (see the i960 CA/CF microprocessor data sheets).

To reduce clock skew on later versions of the i960 Cx processor, the pin for the Phase Lock Loop (PLL) circuit is isolated on the pinout. The lowpass filter, as shown in Figure 14-5, reduces CLKIN to PCLK2:1 skew in system designs. This circuit is compatible with those i960 Cx processor versions which do not implement isolated PLL power.

[Figure 14-5. VCCPLL Lowpass Filter: the board VCC plane feeds the processor's VCCPLL pin through the filter; a 22 uF capacitor is shown.]

14.4.3 Power and Ground Planes

Power and ground planes must be used in i960 Cx processor systems to minimize noise. Justification for these power and ground planes is the same as for multiple power and ground pins. Power and ground lines have inherent inductance and capacitance, and therefore an impedance $Z = \sqrt{L/C}$. Total characteristic impedance for the power supply can be reduced by adding m
523. provide one. faulto and faultno are provided for use by implementations with a floating point coprocessor. They are used for compare and branch or fault operations involving real numbers. The following table shows the condition code mask for each instruction. The mask is opcode bits 0-2.

  Instruction   Mask    Condition
  faultno       000₂    Unordered
  faultg        001₂    Greater
  faulte        010₂    Equal
  faultge       011₂    Greater or equal
  faultl        100₂    Less
  faultne       101₂    Not equal
  faultle       110₂    Less or equal
  faulto        111₂    Ordered

Action: For all instructions except faultno: if (mask AND AC.cc) ≠ 000₂, Constraint Range fault. For faultno: if AC.cc = 000₂, Constraint Range fault.

Faults: Constraint Range: if condition being tested is true.

Example:
  # assume (AC.cc AND 110₂) ≠ 000₂
  faultle     # Constraint Range Fault is generated

Opcode (all CTRL format): faulte 1AH; faultne 1DH; faultle 1; faultg 19H; faultge 1BH; faulto 1FH; faultno 18H

See Also: BRANCH IF, TEST

9.3.27 flushreg

Mnemonic: flushreg   Flush Local Registers

Format: flushreg

Description: Copies the contents of every cached register set, except the current set, to its associated stack frame in memory. The entire register cache is then marked as purged (or invalid). On a re
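The mask test described for the fault instructions can be sketched in a few lines of Python. This is only an illustrative model of the rule stated in the table (mask in opcode bits 0-2, ANDed with the AC condition code); the names `FAULT_MASKS` and `raises_constraint_range` are hypothetical and do not come from the manual.

```python
# Condition-code masks copied from the fault instruction table (opcode bits 0-2).
FAULT_MASKS = {
    "faultno": 0b000,  # Unordered
    "faultg": 0b001,   # Greater
    "faulte": 0b010,   # Equal
    "faultge": 0b011,  # Greater or equal
    "faultl": 0b100,   # Less
    "faultne": 0b101,  # Not equal
    "faultle": 0b110,  # Less or equal
    "faulto": 0b111,   # Ordered
}


def raises_constraint_range(mnemonic: str, ac_cc: int) -> bool:
    """Return True if the instruction would signal a Constraint Range fault.

    Hypothetical helper: ac_cc is the 3-bit condition code from the AC register.
    """
    cc = ac_cc & 0b111
    if mnemonic == "faultno":
        # faultno is the special case: it faults when the condition code is 000.
        return cc == 0
    # Every other instruction faults when (mask AND cc) is non-zero.
    return (FAULT_MASKS[mnemonic] & cc) != 0
```

For instance, with the condition code set to "equal" (010₂), `raises_constraint_range("faultle", 0b010)` is true while `raises_constraint_range("faultg", 0b010)` is false.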
524. ption Action

calls

Mnemonic: calls   Call System

Format: calls targ (reg/lit)

Description: Calls a system procedure. targ specifies the called procedure's number. For calls, the processor performs the system call operation described in section 5.5, SYSTEM CALLS (pg. 5-12). targ provides an index to a system procedure table entry from which the processor gets the called procedure's IP. The called procedure can be a local or supervisor procedure, depending on system procedure table entry type. If it is a supervisor procedure, the processor switches to supervisor mode (if not already in this mode). The processor also allocates a new set of local registers and a new stack frame for the called procedure. If the processor switches to supervisor mode, the new stack frame is created on the supervisor stack.

Action:
  if targ > 259: Protection Length fault
  wait for any uncompleted instructions to finish
  temp.entry ← memory.word(SPT(targ))     # SPT(targ) is the address of the system procedure table entry targ
  RIP ← next IP
  if (temp.entry.type = local) or (PC.em = supervisor):
      # no stack switch required
      temp.FP ← (SP + 0x10) andnot 0xf    # round to next boundary
      temp.rt ← 000₂                       # return type is local
  else:
      # stack switch to supervisor stack required
      temp.FP ← memory.word(cached SPT)   # read supervisor stack pointer
      # set return type to supervisor
      if PC.te = 0: temp.rt ← 010₂        # with trace disabled
      else: temp.rt ← 011₂                # with trace ena
525. ptionally an internal self-test program. The self-test (STEST) pin enables or disables internal self-test. The FAIL pin indicates whether the self-tests passed or failed. Internal self-test checks basic functionality of internal data paths, registers and memory arrays on-chip. Internal self-test is not intended for a full validation of the processor's functionality. Internal self-test detects catastrophic internal failures and complements a user's system diagnostics by ensuring a confidence level in the processor before any system diagnostics are executed.

Internal self-test is disabled with the STEST pin. Internal self-test can be disabled if the initialization time needs to be minimized or if diagnostics are simply not necessary. The STEST pin is sampled on the rising edge of the RESET input. If asserted (high), the processor executes the internal self-test; if deasserted, the processor bypasses internal self-test. The external bus confidence test is always performed, regardless of STEST pin value.

The external bus confidence self-test checks external bus functionality: it reads eight words from the Initialization Boot Record (IBR) and performs a checksum on the words and the constant FFFF FFFFH. If the processor calculates a sum of zero (0), the test passes. The external bus confidence test can detect catastrophic bus failures such as shorted address, data or control lines in the external system. See section 14.2.4 Initial Memory Image
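The bus confidence checksum can be illustrated with a short Python sketch. It assumes a plain 32-bit modular sum in which carries out of bit 31 are discarded; this excerpt does not say how carries are handled, so treat that as an assumption. The helper names `bus_confidence_passes` and `make_check_word` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # the constant FFFF FFFFH, also used to wrap sums to 32 bits


def bus_confidence_passes(ibr_words):
    """Sum eight IBR words plus FFFF FFFFH; the test passes if the sum is zero.

    Assumption: carries out of bit 31 are discarded (modulo 2**32 arithmetic).
    """
    if len(ibr_words) != 8:
        raise ValueError("the test reads exactly eight words")
    total = MASK32
    for word in ibr_words:
        total = (total + (word & MASK32)) & MASK32
    return total == 0


def make_check_word(seven_words):
    """Hypothetical helper: choose an eighth word that makes the sum zero."""
    partial = (MASK32 + sum(w & MASK32 for w in seven_words)) & MASK32
    return (-partial) & MASK32
```

Under that assumption, a boot image builder would compute the last checked word with something like `make_check_word`, so that the eight words and the constant sum to zero.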
526. qual (011₂), less or equal (110₂), and not equal (101₂). The mask is part of the instruction opcode; the instruction performs a bitwise AND of the mask and condition code.

The AC register integer overflow flag (bit 8) and integer overflow mask bit (bit 12) are used in conjunction with the arithmetic integer overflow fault. The mask bit disables fault generation. When the fault is masked and integer overflow is encountered, the processor, instead of generating a fault, sets the integer overflow flag. If the fault is not masked, the fault is allowed to occur and the flag is not set. Once the processor sets this flag, it never implicitly clears it; the flag remains set until the program clears it. Refer to the discussion of the arithmetic integer overflow fault in CHAPTER 7, FAULTS for more information about the integer overflow mask bit and flag.

The no imprecise faults bit (bit 15) determines whether or not faults are allowed to be imprecise. If set, all faults are required to be precise; if clear, certain faults can be imprecise. See section 7.9, PRECISE AND IMPRECISE FAULTS (pg. 7-17) for more information.

2.6.3 Process Controls (PC) Register

The PC register (Figure 2-5) is used to control processor activity and show the processor's current state. PC register execution mode flag (bit 1) indicates that the processor is operating in either user mode (0) or supervisor mode (1). The processor automatically sets this flag on a system call when
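As a rough illustration of the AC register bits named above, the following Python sketch decodes the fields and models the masked versus unmasked overflow behaviour. The bit positions (condition code in bits 0-2, overflow flag bit 8, overflow mask bit 12, no-imprecise-faults bit 15) come from the text and the fault instruction description; the function names and the dictionary layout are hypothetical.

```python
OVERFLOW_FLAG_BIT = 8   # integer overflow flag
OVERFLOW_MASK_BIT = 12  # integer overflow mask bit
NIF_BIT = 15            # no imprecise faults bit


def decode_ac(ac: int) -> dict:
    """Split a 32-bit AC register value into the fields discussed in the text."""
    return {
        "cc": ac & 0b111,
        "overflow_flag": bool((ac >> OVERFLOW_FLAG_BIT) & 1),
        "overflow_mask": bool((ac >> OVERFLOW_MASK_BIT) & 1),
        "nif": bool((ac >> NIF_BIT) & 1),
    }


def on_integer_overflow(ac: int):
    """Return (new_ac, fault_raised) when an integer overflow is encountered.

    Masked: the sticky flag is set and no fault is generated.
    Unmasked: the fault occurs and the flag is left unchanged.
    """
    if (ac >> OVERFLOW_MASK_BIT) & 1:
        return ac | (1 << OVERFLOW_FLAG_BIT), False
    return ac, True
```

Because the flag is sticky, a program that masks the fault would typically clear bit 8 itself before a computation and inspect it afterwards.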
527. r IP with Displacement 3 7 IPND register 12 14 IS instruction scheduler A 3 L latency interrupt servicing 12 17 Id 9 44 Id Idl Idt Idq 9 44 Ida 9 46 Idib Idis 9 44 Idob Idos 9 44 leaf procedures 5 1 literal addressing and alignment 2 5 literals 2 1 2 5 little endian 2 12 11 24 Load and store instructions 10 1 load instructions 4 5 9 44 load and lock mechanism 2 14 4 22 local calls 5 2 5 12 7 2 call 5 2 callx 5 2 Local Register Cache 5 3 A 7 Index 9 INDEX local registers 2 1 5 2 allocation 2 3 5 2 management 2 3 overview 1 7 usage 5 2 local stack 2 1 LOCK pin C 6 logical instructions 4 10 M mark 9 47 MCON external bus width 10 1 15 registers 10 1 10 6 I O configuration A 13 memory region configuration 10 3 MDU multiply divide unit A 7 memory access 11 2 non burst and non pipelined 11 13 memory address space 2 1 external memory requirements 2 10 atomic access 2 10 byte ordering big endian 2 12 little endian 2 12 data alignment 2 11 data block sizes 2 11 data block storage 2 12 indivisible access 2 10 instruction alignment in external memory 2 11 reserved memory 2 10 location 2 9 management 2 0 Index 10 m intel memory addressing modes Absolute 3 6 examples 3 7 Index with Displacement 3 7 IP with Displacement 3 7 Register Indirect 3 6 memory region configuration MCON table 10 1 memory region control registers MCON 0 15 10 6 memory regions A31 28
528. r (i.e., r0, r2, r4 or g0, g2 or sf0, sf2). This instruction performs ordinal arithmetic.

Action: dst ← src2 * src1; src1 and src2 are 32 bits; dst is 64 bits.

Faults: Type Mismatch: Non-supervisor reference of a sfr.

Example: emul r4, r5, g2     # g2,g3 ← r4 * r5

Opcode: emul 670H REG

See Also: ediv, muli, mulo

9.3.24 eshro (80960Cx Processor Only)

Mnemonic: eshro   Extended Shift Right Ordinal

Format: eshro src1 (reg/lit/sfr), src2 (reg/lit/sfr), dst (reg/sfr)

Description: Shifts src2 right by (src1 mod 32) places and stores the result in dst. Bits shifted beyond the least-significant bit are discarded. The src2 value is a long ordinal (i.e., 64 bits) contained in two adjacent registers. The src2 operand specifies the lower numbered register, which contains the operand's least significant bits. The src2 operand must be an even numbered register (i.e., r0, r2, r4 or g0, g2 or sf0, sf2). The src1 operand is a single 32-bit register, literal or sfr where the lower 5 bits specify the number of places that the src2 operand is to be shifted. The shift operation result's least significant 32 bits are stored in dst.

Action: dst ← src2 >> (src1 mod 32); src2 is 64 bits; src1 and dst are 32 bits.

Faults: Type Mismatch: Non-supervisor reference of a sfr.

Example: eshro g3, g4, g11   # g11 ← g4,5 shifted right by (g3 MOD 32)

Opcode: eshro 5D8H REG

See Also: SHIFT, extract
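The eshro action can be modelled directly from the description: form the 64-bit value from the register pair, shift right by the low 5 bits of src1, and keep the low 32 bits. The Python function below is a minimal sketch of that action; the function name and the choice to pass the pair as two separate words are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def eshro(src1: int, src2_lo: int, src2_hi: int) -> int:
    """Model of: dst <- src2 >> (src1 mod 32), keeping the low 32 bits.

    src2_lo is the even-numbered (lower) register holding the least
    significant word; src2_hi is the adjacent register with the upper word.
    """
    shift = src1 % 32  # only the lower 5 bits of src1 are used
    value = ((src2_hi & MASK32) << 32) | (src2_lo & MASK32)
    return (value >> shift) & MASK32
```

For example, `eshro(4, 0x12345678, 0x9ABCDEF0)` shifts the 64-bit value 9ABCDEF012345678H right by four places and keeps the low word, 01234567H.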
529. r initial image 14 10 see also Arithmetic Controls AC register add 9 9 addc 9 8 addi addo 9 9 Address Generation Unit AGU A 7 A 25 Address Space Restrictions data cache C 3 data structure alignment C 3 instruction cache C 2 internal data RAM C 2 reserved memory C 2 stack frame alignment C 3 addressing registers and literals 2 5 AGU address generation unit A 7 alignment of registers and literals 2 5 alterbit 9 10 and andnot 9 11 argument list 5 10 Arithmetic Controls AC register 2 15 condition code flags 2 16 initialization 2 16 integer overflow flag 2 17 no imprecise faults bit 2 17 arithmetic instructions 4 6 add subtract multiply or divide 4 7 extended precision instructions 4 8 remainder and modulo instructions 4 8 shift and rotate instructions 4 9 arithmetic operations and data types 4 7 atadd 9 12 INDEX atmod 9 13 atomic access 2 10 Atomic instructions LOCK signal 4 18 11 26 B b 9 19 b bx 9 14 bal balx 9 15 bb 9 17 bbc bbs 9 17 BCU see Bus Control Unit be bg bge 9 19 big endian 2 12 11 24 bit definition 1 7 bit bit field and byte instructions 4 10 bit field instructions 4 11 bit instructions 4 10 byte instructions 4 12 bits and bit fields 3 3 bl ble bne 9 19 Block mode transfers DMA 13 1 bo bno 9 19 branch instructions 4 13 9 19 compare and branch instructions 4 15 conditional branch instructions 4 15 unconditional branch instructions 4 14 branch predictio
530. r Indirect 3 6 register indirect with displacement 3 7 register indirect with index 3 7 register indirect with index and displacement 3 7 register indirect with offset 3 7 register scoreboarding 2 4 A 5 A 18 common application 2 4 example 2 5 implementation 2 4 Index 12 intel registers naming conventions 1 7 remi remo 9 61 reserved locations C 4 reserved memory 1 6 resource scoreboarding 5 A 18 ret 9 62 Return Instruction Pointer RIP 5 5 location 2 3 return operation 5 6 return type field 5 4 RF register file A 6 RIP Return Instruction Pointer r2 5 4 rotate 9 64 S SALIGN C 3 scanbit 9 65 scanbyte 9 66 Scoreboarding A 17 register scoreboarding 18 resource scoreboarding A 18 sdma 9 67 13 24 setbit 9 68 SFRs see special function registers 2 1 sh 9 69 shift instructions 9 69 shli shri shrdi 9 69 shlo shro 9 69 six port register file A 6 SP Stack Pointer r1 5 4 spanbit 9 72 intel special function registers SFRs 2 1 2 4 9 4 access to 2 4 data cache CF only 2 4 overview 1 7 reading or modifying 2 4 usage 2 4 SRAM interface B 1 SRAM A 7 st 9 73 St stl stt stq 9 73 stack frame allocation 5 2 Stack Pointer SP 5 4 location 2 3 Static RAM 7 stib stis 9 73 stob stos 9 73 store instructions 4 5 9 73 sub 9 76 subc 9 75 subi subo 9 76 supervisor calls 5 2 supervisor mode resources 2 20 supervisor pin 2 20 supervisor stack 2 1 2 8 supervisor trace mo
531. r Register (PFP) 0

PROCEDURE CALLS

return type field: indicates the type of call which was made. Table 5-3 shows the return type field encoding for the various calls: local, supervisor, interrupt and fault.

trace-on-return flag (PFP.rt0, or bit 0 of the return type field): stores the trace enable bit value when a system-supervisor call is made from user mode. When the call is made, the PC register trace enable bit is saved as the trace-on-return flag and then replaced by the trace controls bit in the system procedure table. On a return, the trace enable bit's original value is restored. This mechanism allows instruction tracing to be turned on or off when a supervisor mode switch occurs.

prereturn trace flag (PFP.p): is used in conjunction with call-trace and prereturn-trace modes. If call-trace mode is enabled when a call is made, the processor sets the prereturn trace flag; otherwise it clears the flag. Then, if this flag is set and prereturn-trace mode is enabled, a prereturn trace event is generated on a return, before any actions associated with the return operation are performed. See section 8.2, TRACE MODES (pg. 8-4) for a discussion of interaction between call-trace and prereturn-trace modes with the prereturn trace flag.

Table 5-3. Encoding of Return Status Field
Columns: Return Status Field; Call Type; Return Action
  p000; Return Action: Local return; Call Type: Local call, system-local call or system-superviso
532. r call, the processor switches to the supervisor stack.

Figure 7-4. Storage of the Fault Record on the Stack

7.6 MULTIPLE AND PARALLEL FAULTS

Multiple fault conditions can occur during a single instruction execution and during multiple instruction execution when the instructions are executed by parallel execution units within the processor. The following sections describe how faults are handled under these conditions.

7.6.1 Multiple Faults

Multiple fault conditions can occur during a single instruction execution. For example, an instruction can have an invalid operand and unaligned address. When this situation occurs, the processor is required to recognize and generate at least one of the fault conditions. The processor may not detect all fault conditions and may not report all detected faults.

In a multiple fault situation, the reported fault condition is left to the implementation. The architecture, however, does define the criteria for determining which fault to report when trace fault conditions are one or more of the fault conditions.

7.6.2 Multiple Trace Fault Conditions Only

Multiple trace fault conditions generated by a single instruction execution are reported in a single trace fault. To support this multiple fault reporting, the trace fault uses bit positions in the fault-subtype field to indicate occurrences of multiple faults of the same type (Table 7-1). For example, when instruc
533. r call | return to local stack; no mode switch (made from supervisor mode)

  Return Status Field   Call Type                               Return Action
  p001                  Fault call                              Fault return
  p01t                  System-supervisor call from user mode   Supervisor return: return to user stack, mode switch to user mode; trace enable bit is replaced with the t bit stored in the PFP register on the call
  p100                  reserved
  p101                  reserved
  p110                  reserved
  p111                  Interrupt call                          Interrupt return

NOTE: p is the prereturn trace flag; t denotes the trace-on-return flag, used only for system-supervisor calls which cause a user-to-supervisor mode switch.

5.9 BRANCH-AND-LINK

A branch-and-link is executed using either the branch-and-link instruction (bal) or branch-and-link-extended instruction (balx). When either instruction executes, the processor branches to the first instruction of the called procedure (the target instruction), while saving a return IP for the calling procedure in a register. The called procedure uses the same set of local registers and stack frame as the calling procedure. For bal, the return IP is automatically saved in global register g14; for balx, the return IP is saved in a register specified by one of the instruction's operands.

A return from a branch-and-link is generally carried out with a bx (branch extended) instruction, where the branch target is the address saved with the branch-and-link instruction. The branch-and-link method of making procedure calls is recommended for cal
534. r requests are serviced in a first-in, first-out (FIFO) manner. The DMA does not issue back-to-back requests; therefore, the CPU is guaranteed access to the external bus between DMA accesses, thus allowing the user and DMA processes to execute concurrently while sharing the external bus.

Queue depth affects bus request and interrupt latency. Queued requests must be serviced before the pending request can be serviced. If an interrupt occurs when all three bus queue entries are full, the three outstanding requests must be serviced before the first interrupt instruction may be fetched from memory.

THE BUS CONTROLLER

10.6.2 Data Packing Unit

The data packing unit handles data movement between queues and external bus. It controls data alignment and data packing:

  - Data is unpacked when data store request width exceeds physical bus width.
  - Data is packed when data load request width exceeds physical bus width.

If a word load is issued to an 8-bit bus, the bus controller issues four 1-byte reads and the data packing unit assembles incoming data into a single word. If a quad word store is issued to an 8-bit bus, the bus controller issues four 4-byte writes and the data packing unit unpacks the outgoing data.

10.6.3 Bus Translation Unit and Sequencer

The bus translation unit is responsible for looking up the memory configuration in the region table. The look-up is based on the bus request's address. The bus request and region table da
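The word-load case on an 8-bit bus can be pictured as four byte reads assembled into one word, and the reverse for a store. The Python sketch below assumes a little-endian region, so the first byte read is the least significant; that ordering, and the function names, are assumptions made only for illustration.

```python
def pack_word_from_bytes(byte_reads):
    """Assemble four 1-byte bus reads into one 32-bit word.

    Assumption: little-endian region, so byte_reads[0] is bits 7:0.
    """
    if len(byte_reads) != 4:
        raise ValueError("a word load on an 8-bit bus takes four byte reads")
    word = 0
    for index, value in enumerate(byte_reads):
        word |= (value & 0xFF) << (8 * index)
    return word


def unpack_word_to_bytes(word):
    """Split a 32-bit word into four bytes for an 8-bit bus (same ordering)."""
    return [(word >> (8 * index)) & 0xFF for index in range(4)]
```

Here `pack_word_from_bytes([0xDD, 0xCC, 0xBB, 0xAA])` gives AABBCCDDH, and `unpack_word_to_bytes` reverses the operation.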
535. rallel with other R or M type instructions.

u (Micro-flow): The processor performs this instruction by issuing a sequence of R, M and/or C type instructions stored in its internal ROM.

Execution Time

The instruction's execution time is listed in two ways: Instruction Issue and Result Latency. These times are not additive; they represent a range from minimum to maximum within which actual execution time will fall.

Instruction Issue time is the number of clocks the instruction uses when there are no register or resource dependencies to slow it down. Back-to-back instructions with no dependencies execute at the instruction issue rate.

Result Latency is the length of time that an instruction uses to complete once it begins. Back-to-back instructions which are dependent upon each other execute at the Result Latency rate.

In the Instruction Execution column, a range of numbers (e.g., 0.5-1) indicates either the degree of parallel instruction issue achieved or conditions specific to the instruction's run-time execution, such as branch taken or not taken. Table 3 describes the shorthand for additive factors that appear in the execution time columns.

March 1994. Page 2 of 18. Order Number 272220-002. i960 Cx Microprocessor User's Guide Instruction Set Quick Reference

Table 3. Execution Times: Shorthand for Additive Factors
  efa   The time for effective address calculation. For the lda instruction, e
536. rc3 (reg/lit/sfr, reg/lit/sfr, reg/lit), message type

Description: Processor control function specified by the message field of src1 is executed. The type field of src1 is interpreted depending upon the command. Remaining src1 bits are reserved. The src2 and src3 operands are also interpreted depending upon the command. The src1 operand is interpreted as follows:

  src1 bits:   31-16      15-8            7-0
  src1:        FIELD 2    MESSAGE TYPE    FIELD 1

The following table lists i960 Cx processor commands. Type, Field 1 and Field 2 are carried in src1; Field 3 is src2; Field 4 is src3.

  Message                 Type   Field 1                                Field 2   Field 3              Field 4
  Request Interrupt       00H    Vector Number                          N/U       N/U                  N/U
  Invalidate Cache        01H    N/U                                    N/U       N/U                  N/U
  Configure Cache         02H    Cache Mode Configuration (see table)   N/U       Cache load address   N/U
  Reinitialize            03H    N/U                                    N/U       1st Inst address     PRCB address
  Load Control Register   04H    Register Group Number                  N/U       N/U                  N/U

NOTE: Sources and fields which are not used (designated N/U) are ignored.

When executing a sysctl instruction to load and lock either half or all of the cache, it is necessary to provide a cache load address. The last two bits of the cache load address must be 10₂ for the cache locking mechanism to work properly.

Table 9-5. Cache Configuration Modes
  Mode Field   Mode Description        CA        CF
  000          normal cache enabled    1 Kbyte   4 Kbytes
               cache disabled          1 Kbyte   4 Kbytes
  100          Load and lock
537. rd goes out on different data lines on a 32-bit bus, depending on whether address line A1 is odd or even. Table 11-4 also shows that the i960 Cx processors handle byte data types the same, regardless of byte ordering type.

Multiple word bus requests (bursts) to a big endian region are handled as individual words. Bytes in each word are stored in big endian order. Big endian data types that exceed 32 bits are not supported and must be handled by software.

EXTERNAL BUS DESCRIPTION

[Table 11-4. Byte Ordering on Bus Transfers. Sections: Word Data Type, Half-Word Data Type, Byte Data Type. Columns: Bus Width (32-bit, 16-bit, 8-bit); Addr Bits 1:0; transfer order (1st to 4th); Bus Pins (Data Lines 31:0) grouped as 31:24, 23:16, 15:8, 7:0 for Little Endian and for Big Endian. Example word bytes are labelled aa, bb, cc, dd.]
538. rd higher significance by a specified number of bits. Bits shifted beyond the register's left boundary (bit 31) appear at the right boundary (bit 0).

eshro is an i960 Cx processor-specific extension to the i960 processor family's instruction set. This instruction performs an ordinal right shift of a source register pair (64 bits) by as much as 32 bits and stores the result in a single (32-bit) register. This instruction is equivalent to an extended divide by a power of 2 which produces no remainder. The instruction is also the equivalent of a 64-bit extract of 32 bits.

INSTRUCTION SET SUMMARY

4.2.3 Logical

These instructions perform bitwise Boolean operations on the specified operands:

  and      src2 AND src1
  notand   (NOT src2) AND src1
  andnot   src2 AND (NOT src1)
  xor      src2 XOR src1
  or       src2 OR src1
  nor      NOT (src2 OR src1)
  xnor     src2 XNOR src1
  not      NOT src1
  notor    (NOT src2) OR src1
  ornot    src2 OR (NOT src1)
  nand     NOT (src2 AND src1)

These all use the REG format and can specify literals or local, global or special function registers.

The processor provides logical operations in addition to and, or and xor as a performance optimization. This optimization reduces the number of instructions required to perform a logical operation and reduces the number of registers and instructions associated with bitwise mask storage and creation.

4.2.4 Bit and Bit Field

These instructions perform operations on a specified bit or bit fiel
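Because several of these mnemonics differ only in where the NOT applies, a small executable model can help. The Python sketch below treats operands as 32-bit values; the grouping of NOT for notand, andnot, notor, ornot, nor and nand is inferred from the mnemonic names and is an assumption here, as is the `LOGICAL_OPS` table itself.

```python
MASK32 = 0xFFFFFFFF


def _not(value: int) -> int:
    """32-bit bitwise complement."""
    return ~value & MASK32


# Each entry maps a mnemonic to a function of (src1, src2) returning dst.
LOGICAL_OPS = {
    "and": lambda s1, s2: s2 & s1,
    "notand": lambda s1, s2: _not(s2) & s1,
    "andnot": lambda s1, s2: s2 & _not(s1),
    "xor": lambda s1, s2: s2 ^ s1,
    "or": lambda s1, s2: s2 | s1,
    "nor": lambda s1, s2: _not(s2 | s1),
    "xnor": lambda s1, s2: _not(s2 ^ s1),
    "not": lambda s1, s2: _not(s1),
    "notor": lambda s1, s2: _not(s2) | s1,
    "ornot": lambda s1, s2: s2 | _not(s1),
    "nand": lambda s1, s2: _not(s2 & s1),
}
```

With src1 = 1100₂ and src2 = 1010₂, `LOGICAL_OPS["notand"]` yields 0100₂ while `LOGICAL_OPS["nand"]` yields FFFFFFF7H, which shows why the two are distinct instructions.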
539. rded in a parallel fault record depends on the fault types detected. A change in the program's control flow may or may not accompany parallel faults, depending on fault types detected.

FAULTS

7.10.5 Protection Faults

Fault Type: 7H

Fault Subtype:
  Number    Name
  0H-1H     Reserved
  2H        Length
  3H        Reserved
  4H        SRAM Protection
  5H-FH     Reserved

Function: Indicates a program or procedure is attempting to perform an illegal operation that the architecture protects against. A length fault is generated when the index operand used in a calls instruction points to an entry beyond the extent of the system procedure table. SRAM protection is generated when a write to the on-chip SRAM is attempted while in user mode.

RIP: Same as the address-of-faulting-instruction field.

Program State Changes: This fault type is always precise, regardless of the NIF bit value. A change in the program's control flow does not accompany a length fault; a fault is generated before the faulting instruction

7.10.6 Trace Faults

Fault Type:

Fault Subtype:
  Number   Name
  Bit 0    Reserved
  Bit 1    Instruction Trace
  Bit 2    Branch Trace
  Bit 3    Call Trace
  Bit 4    Return Trace
  Bit 5    Prereturn Trace
  Bit 6    Supervisor Trace
  Bit 7    Breakpoint Trace

Function: Indicates the processor detected one or more trace events. The event tracing mechanism is described in CHAPTER 8, TRACING AND DEBUGGING. A trace event is the occurrence of a particul
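Since the trace fault subtype is a set of bit flags rather than a single number, decoding it amounts to testing each bit. The following Python sketch uses the bit assignments from the subtype table; the names `TRACE_SUBTYPE_BITS` and `decode_trace_subtype` are hypothetical.

```python
# Bit positions from the trace fault subtype table (bit 0 is reserved).
TRACE_SUBTYPE_BITS = {
    1: "instruction",
    2: "branch",
    3: "call",
    4: "return",
    5: "prereturn",
    6: "supervisor",
    7: "breakpoint",
}


def decode_trace_subtype(subtype: int):
    """Return the names of all trace events flagged in a fault subtype byte."""
    return [
        name
        for bit, name in sorted(TRACE_SUBTYPE_BITS.items())
        if (subtype >> bit) & 1
    ]
```

A subtype of 06H would therefore decode to both an instruction trace and a branch trace event reported in one fault.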
540. re 3 2 3 1 2 Ordirials 3 3 3 1 3 Bits and Bit Fields titre aan tener tid 3 3 3 1 4 Triple and Quad Words esee nantes inniti daten 3 4 3 1 5 Data Alignment ied t e i ccna ene 3 4 3 2 ND E REP ene mede 3 4 3 3 MEMORY ADDRESSING 3 5 3 8 1 Absolute teo hee Seba E eee nito t e diee 3 6 3 3 2 Register Indirect rettet e e e 3 6 3 3 3 Index with Displacement 3 7 3 3 4 IP with Displacement 3 7 3 3 5 Addressing Mode Examples 2 040 3 7 CHAPTER 4 INSTRUCTION SET SUMMARY 4 1 INSTRUCTION 2 4 1 4 1 1 Assembly Language 4 1 4 1 2 Branch Prediction 5a trot tu o eire tte tas 4 2 4 1 3 Instruction Encoding Formats 4 2 4 1 4 Instruction Operands tert oer tre nen 4 3 4 2 INSTRUCTION GROUPS iier ede Yet eee eet emper 4 4 4 2 1 Data Movement ie dece Ed 4 5 4 2 1 1 Load and Store Instructions seem 4 5 4 2 1 2 te nah ee HAE 4 6 4 2 1 3 Load Address aiite gine ii te ert ttes 4 6 4 2 2 ArithiTigtio cue rte Me cr rer Mee e E s ceu LOR 4 6 4 2
541. re memory regions would have to be programmed as non-cacheable to support routine DMA and semaphore operations. Whenever the cacheability of a region is changed, cache coherency becomes an immediate issue. The coherency mechanism solves this issue directly. The processor compares a non-cacheable store to the relevant tag in the data cache. If the store address matches the tag, the processor invalidates the entire cache line. In a single-processor system, this guarantees that the data cache never contains stale data.

When the data cache is globally disabled, all stores are non-cacheable and the processor invalidates relevant tags whenever addresses match.

INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION

A.1.8.7 BCU Pipeline and Data Cache Interaction

The BCU's interaction with the data cache affects overall bus throughput. Figure A-6 shows how the BCU and data cache process a series of hits and misses for cacheable loads and stores.

  ld (g0), g1     # data cache hit
  ld (g2), g4     # data cache miss
  ld (g3), g8     # data cache hit
  st g1, (g0)     # store is scoreboarded

[Figure A-6. BCU Data C. Rows: Instruction Scheduler; Bus Pipeline (Address Out Bus, ST Bus, LD Bus); External Address Bus; External Data Bus; Cache Pipeline (Address Out Bus, Hit/Miss/Hit, ST Bus).]
542. reater than or equal to
  <<, >>             Logical Shift
                     Exponentiation
  and, or, not, xor  Bitwise Logical Operations
  mod                Modulo
                     Addition
                     Subtraction
                     Multiplication (Integer or Ordinal)
                     Division (Integer or Ordinal)
  #                  Comment delimiter
  memory.byte|short|word|long|triple|quad   Memory access of specified width
  memory             Width implied by context

9.2.6 Faults

The Faults section lists faults that can be signaled as a direct result of instruction execution. Table 9-3 shows two possible faulting conditions that are common to the entire instruction set and could directly result from any instruction. These fault types are not included in the instruction reference. Table 9-4 shows three possible faulting conditions that are common to large subsets of the instruction set. Other instructions can generate faults in addition to those shown in the following tables. If an instruction can generate a fault, it is noted in that instruction's Faults section.

Table 9-3. Fault Types and Subtypes
Columns: Fault Type; Subtype; Description
  Trace: An Instruction Trace Event is signaled after instruction completion. A Trace fault is generated if both PC.te and TC.i = 1.
  Trace: A Breakpoint Trace Event is signaled after completion of an instruction for which there is a hardware breakpoint condition match and TC.br is set. A Trace fault is generated if PC.te and TC.br are both 1.
  Operation
543. register, a poorly placed store in front of a critical load slows down the load. Reorder to issue the load first. Example A-1 shows a simple change that saved one clock from a five-clock loop.

Example A-1. Overlapping Loads (Checksum)

  loop:                        opt_loop:
    ldob    (g0), g1             ldob    (g0), g1
    addo    g1, g2, g2           cmpinco g0, g3, g0
    cmpinco g0, g3, g0           addo    g1, g2, g2
    bl.t    loop                 bl.t    opt_loop

  Execution (loop)                     Execution (opt_loop)
  Clock  REGop    MEMop  CTRLop        Clock  REGop    MEMop  CTRLop
  1               ldob                 1               ldob
  2                                    2      cmpinco
  3                                    3                      bl.t
  4      addo            bl.t          4      addo
  5      cmpinco                       5               ldob
  6               ldob

INSTRUCTION EXECUTION AND PERFORMANCE OPTIMIZATION

A.2.7.2 Multiplication and Division

Begin multiply and divide instructions several cycles before instructions that use their results. MDU instructions consume less than one clock if they are sufficiently separated from the instructions that use their results. Also remember to use shift instructions to replace multiplication and division by powers of two. Example A-2 shows overlapping pointer math and a comparison with the 32x32 multiply time in a simple multiply-accumulate loop.

Example A-2. Overlapping MDU Operations (Multiply-Accumulate)

  loop:                        opt_loop:
    ld      (g0), g2             ld      (g0), g2
    ld      (g1), g3             ld      (g1), g3
    muli    g2, g3, g4           muli    g2, g3, g4
    addi    g4, g5, g5           addo    4, g0, g0
    addo    4, g0, g0            cmpo    g0, g6
    addo    4, g1, g1            addo    4, g1, g1
    cmp
544. rency (pg. A-12).

A cacheable load is checked to see if it hits the data cache in the issue stage of the instruction pipeline and returns data to the register file in the execute stage. If the load missed the data cache, it must be issued to the BCU to fetch the data from external memory. The missed load is not issued to the BCU until the execute stage, which is also the issue stage for the next instruction in the pipeline. Because only one instruction can be issued to the BCU at a time, the BCU is scoreboarded for one clock cycle. Any instruction directed at the BCU in the clock cycle immediately following a data cache miss is scoreboarded for one cycle.

For cache misses, the data cache is a multi-clock processor which must interact with the BCU to return the load to the data cache. Because the data cache is a multi-clock processor, it must arbitrate for access to the return path to the data cache. There is a conflict between any new load or store being issued to the data cache and a load returning to the data cache from the BCU. In this case, the IS stalls for one clock while the returning load is written into the data cache.

Table A-5 summarizes the additional scoreboarded resources due to the i960 CF processor's data cache unit.

Table A-5. Scoreboarded Resource Conditions Due to the Data Cache
Columns: Condition; Description
  BCU queues contain: One or more of the BCU queues contains a
545. rer 5-10
5.4 LOCAL CALLS 5-12
5.5 SYSTEM CALLS 5-12
5.5.1 System Procedure 5-13
5.5.1.1 Procedure Entries 5-14
5.5.1.2 Supervisor Stack Pointer 5-14
5.5.1.3 Trace Control Bit 5-14
5.5.2 System Call to a Local Procedure 5-15
5.5.3 System Call to a Supervisor Procedure 5-15
5.6 USER AND SUPERVISOR STACKS 5-15
5.7 INTERRUPT AND FAULT CALLS 5-16
5.8 RETURNS 5-16
5.9 BRANCH AND LINK 5-18

CHAPTER 6 INTERRUPTS
6.1 OVERVIEW 6-1
6.2 SOFTWARE REQUIREMENTS FOR INTERRUPT HANDLING 6-2
6.3 ... 6-3
6.4 INTERRUPT TABLE 6-3
6.4.1 ... 6-4
6.4.2 Pending Interrupts 6-5
6.4.3 Caching Portions of the Interrupt Table 6-5
6.5 REQUESTING
546. res The user-supervisor protection model and its relationship to the supervisor call are described in section 2.7, USER-SUPERVISOR PROTECTION MODEL (pg. 2-20).

5.6 USER AND SUPERVISOR STACKS

When using the user-supervisor protection mechanism, the processor maintains separate stacks in the address space. One of these stacks, the user stack, is for procedures executed in user mode; the other stack, the supervisor stack, is for procedures executed in supervisor mode. The user and supervisor stacks are identical in structure (Figure 5-1).

The base stack pointer for the supervisor stack is automatically read from the system procedure table and cached internally at initialization or when the processor is reinitialized with sysctl. Each time a user-to-supervisor mode switch occurs, the cached supervisor stack pointer base is used for the starting point of the new supervisor stack. The base stack pointer for the user stack is usually created in the initialization code. See section 14.2, INITIALIZATION (pg. 14-2). The base stack pointers must be aligned to a 16-byte boundary; otherwise, the first frame pointer in the stack is rounded up to the next 16-byte boundary.

5-15  PROCEDURE CALLS

5.7 INTERRUPT AND FAULT CALLS

The architecture defines two types of implicit calls that make use of the call and return mechanism: interrupt-handling procedure calls and fault-handling procedure calls. A call to an interrupt procedure
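The rounding rule for base stack pointers described in section 5.6 can be illustrated with a short Python sketch. It assumes 32-bit addresses, and the helper name align_up_16 is hypothetical; the sketch models only the arithmetic stated in the text, not the processor's internal behavior.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses are 32 bits wide


def align_up_16(addr: int) -> int:
    """Round a 32-bit address up to the next 16-byte boundary.

    An address that is already 16-byte aligned is returned unchanged.
    (Hypothetical helper, used for illustration only.)
    """
    return ((addr + 15) & ~15) & MASK32


if __name__ == "__main__":
    for base in (0x1000, 0x1004, 0x100F):
        print(hex(base), "->", hex(align_up_16(base)))
```

A base stack pointer of 1004H, for example, would give a first frame pointer of 1010H under this rule.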
547. res Leaf procedures call no other procedures. They are called leaf procedures because they reside at the leaves of the call tree.

In the i960 architecture, the integrated call and return mechanism is used in two ways:
• explicit calls to procedures in a user's program
• implicit calls to interrupt and fault handlers

The remainder of this chapter explains the generalized call mechanism used for explicit and implicit calls and call and return instructions.

The processor performs two call actions:

local       When a local call is made, execution mode remains unchanged and the stack frame for the called procedure is placed on the local stack. The local stack refers to the stack of the calling procedure.

supervisor  When a supervisor call is made, execution mode is switched to supervisor and the stack frame for the called procedure is placed on the supervisor stack.

Explicit procedure calls can be made using several instructions. Local call instructions call and callx perform a local call action. With call and callx, the called procedure's IP is included as an operand in the instruction.

A system call is made with calls. This instruction is similar to call and callx, except that the processor obtains the called procedure's IP from the system procedure table. A system call, when executed, is directed to perform either the local or supervisor call action. These calls are referred to as system-local and system-supe
548. ress and data cycle for write accesses. NXDA specifies the number of wait states between data cycle and next address cycle. Data-to-data wait states (NRDD, NWDD) are not used if burst accesses are not enabled.

A read access begins by asserting the proper address and status signals (ADS, A31:2, BE3:0, SUP, D/C, DMA, W/R) on the rising clock edge that begins the address cycle (marked on the figures). Assertion of ADS indicates the beginning of an access. DT/R is driven on the clock's next falling edge. This signal is asserted early to ensure that DT/R does not change while DEN is asserted. DEN is asserted on the clock's next rising edge, the rising edge in which ADS is deasserted and the address cycle ends. DEN can be used to control external data transceivers.

The cycles that follow are NRAD wait states. WAIT is asserted while the internal wait state generator is counting. If READY/BTERM control is enabled in this region and READY is not asserted after the wait state generator has finished counting, wait states continue to be inserted until READY is asserted.

BLAST assertion indicates the end of data transfer cycles for this access. DEN is deasserted. NXDA wait states (turnaround wait states) follow BLAST. A new address cycle may start after the NXDA cycles expire. NXDA states allow time for slow devices to get off the bus. For this figure, this access is the last access of a bus request because wait states are in
549. ries A vector entry contains a specific interrupt handler's address. When an interrupt is serviced, the processor branches to the address specified by the vector entry.

Each interrupt is associated with an 8-bit vector number which points to a vector entry in the interrupt table. The vector entry section contains 248 one-word entries. Vector numbers 8 through 243 and 252 through 255 and their associated vector entries are used for conventional interrupts. Vector numbers 244 through 247 and 249 through 251 are reserved. Vector number 248 and its associated vector entry are used for the non-maskable interrupt (NMI).

6-4  INTERRUPTS

Vector entry 248 contains the NMI handler address. When the processor is initialized, the NMI vector located in the interrupt table is automatically read and stored in location 0H of internal data RAM. The NMI vector is subsequently fetched from internal data RAM to improve this interrupt's performance.

Vector entry structure is given at the bottom of Figure 6-2. Each interrupt procedure must begin on a word boundary, so the processor assumes that the vector's two least significant bits are 0. Bits 0 and 1 of an entry indicate entry type: type 00 (binary) indicates that the interrupt procedure should be fetched normally; type 10 (binary) indicates that the interrupt procedure should be fetched from the locked partition of the instruction cache. Refer to section 12.3.14, Caching Interrupt-Handling Procedures pg 12 2
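The vector number ranges and the entry-type bits described above can be summarized in a short Python sketch. The function names are hypothetical. Two points are assumptions rather than statements from the text: vector numbers 0 through 7 are treated as having no vector entry (inferred from the 248-entry table), and the two-bit entry types are read as 00 (normal fetch) and 10 (locked cache partition) in binary.

```python
def classify_vector(vector: int) -> str:
    """Classify an 8-bit interrupt vector number (hypothetical helper)."""
    if not 0 <= vector <= 255:
        raise ValueError("vector number must fit in 8 bits")
    if vector == 248:
        return "nmi"
    if 8 <= vector <= 243 or 252 <= vector <= 255:
        return "conventional"
    if 244 <= vector <= 247 or 249 <= vector <= 251:
        return "reserved"
    return "no-entry"  # assumption: vectors 0..7 have no vector entry


def decode_vector_entry(entry: int):
    """Split a one-word vector entry into (handler address, fetch type).

    Bits 0 and 1 carry the entry type; the handler address is the entry
    with those two bits taken as 0 (procedures are word aligned).
    """
    fetch_types = {0b00: "normal", 0b10: "locked-cache"}
    address = entry & 0xFFFFFFFC
    # Encodings other than 00 and 10 are not described in this excerpt.
    return address, fetch_types.get(entry & 0b11, "not-described")


if __name__ == "__main__":
    print(classify_vector(248), classify_vector(20), classify_vector(245))
    print(decode_vector_entry(0x00001002))
```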
550. riting the destination register. Table A-9 lists the lda addressing mode combinations that the AGU executes directly. As seen in the figure, if lda instructions are issued back-to-back using one of the addressing modes in the table, the instructions execute at a one-clock sustained rate with or without register dependencies.

Figure A-12. The lda Pipeline (pipeline diagram for a back-to-back sequence of addo and lda instructions; the diagram itself is not recoverable from this text)

Table A-9. AGU Instructions

    Mnemonic   Issue Clocks   Addressing Mode      Result Latency Clocks
    lda        1              offset               1
                              disp
                              (reg)
                              offset(reg)
                              disp(reg)
                              disp[reg*scale]

A.2.4.5 Effective Address (efa) Calculations

The AGU calculates the efa for instructions which require one. When the addressing mode specified by an instruction is the offset, disp or (reg) mode, the AGU generates the efa in parallel with the instruction's issuance. As shown in the previous pipeline figure for the DR (Figure A-11), load and store instructions begin immediately for these addressing modes, with no delay for address generation. See section A.2.6, Micro-flow Execution p
551. rmination 14-30
AC Termination 14-31
Avoid Closed-Loop Signal 14-32
C-Series Core and A-1
i960 CA/CF Microprocessor Block Diagram A-3
Instruction Pipeline A-4
Six-Port Register A-6
Data Cache Organization A-8
BCU and Data Cache A-11
Issue A-16
EU Execution Pipeline A-21
MDU Execution Pipeline A-22
MDU Pipelined Back-To-Back A-23
Data RAM Execution A-24
The lda Pipeline A-25
Back-to-Back BCU Accesses A-28
CTRL Pipeline for Branches to Branches A-29
Branch in First Executable A-30
Branch in Second Executable A-31
Branch in Third Executable A-32
Fetch Execution A-36
Micro-flow Invocation A-38
Non-Pipelined Burst SRAM
552. rms different actions depending on the type of call selected.

Table 5-2. Encodings of Entry Type Field in System Procedure Table

    Encoding   Call Type
    00         System-Local Call
    01         Reserved
    10         System-Supervisor Call
    11         Reserved

5.5.1.2 Supervisor Stack Pointer

When a system-supervisor call is made, the processor switches to a new stack called the supervisor stack, if not already in supervisor mode. The processor gets a pointer to this stack from the supervisor stack pointer field in the system procedure table (Figure 5-4) during the reset initialization sequence and caches the pointer internally. Only the 30 most significant bits of the supervisor stack pointer are given. The processor aligns this value to the next 16-byte boundary to determine the first byte of the new stack frame.

5.5.1.3 Trace Control Bit

The trace control bit (byte 12, bit 0) specifies the new value of the trace enable bit in the PC register (PC.te) when a system-supervisor call causes a switch from user mode to supervisor mode. Setting this bit to 1 enables tracing in the supervisor mode; setting it to 0 disables tracing. The use of this bit is described in section 8.1.2, Trace Enable Bit and Trace-Fault-Pending Flag (pg. 8-3).

5.5.2 System Call to a Local Procedure

When a calls instruction references an entry in the system procedure table with an entry type of 00, the processor executes a system-local ca
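The entry-type encodings and the supervisor stack pointer alignment can be sketched in Python as follows. The function names are hypothetical. The entry type is passed in as an already-extracted two-bit value because this excerpt does not state where the field sits within a procedure entry, and the encodings are read as the binary values 00, 01, 10 and 11.

```python
# Entry-type encodings from Table 5-2 (01 and 11 are reserved).
ENTRY_TYPE_ACTIONS = {
    0b00: "system-local call",
    0b10: "system-supervisor call",
}


def call_action(entry_type: int) -> str:
    """Return the call action for a two-bit entry type (hypothetical helper)."""
    if entry_type not in (0b00, 0b01, 0b10, 0b11):
        raise ValueError("entry type is a two-bit field")
    if entry_type not in ENTRY_TYPE_ACTIONS:
        raise ValueError("reserved entry type encoding")
    return ENTRY_TYPE_ACTIONS[entry_type]


def first_supervisor_frame(ssp_field: int) -> int:
    """Model the start of the first supervisor stack frame.

    Only the 30 most significant bits of the field are used as the pointer;
    the result is then aligned up to the next 16-byte boundary.
    """
    base = ssp_field & 0xFFFFFFFC
    return (base + 15) & 0xFFFFFFF0


if __name__ == "__main__":
    print(call_action(0b10))
    print(hex(first_supervisor_frame(0x2005)))
```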
553. rnal Interrupt Mask Bits IMSK.xim (0 = masked, 1 = not masked); DMA Interrupt Mask Bits IMSK.dim (0 = masked, 1 = not masked); Interrupt Mask Register (IMSK), SF1; Reserved: initialize to 0.

Figure 12-8. Interrupt Mask (IMSK) and Interrupt Pending (IPND) Registers

12.3.7 Default and Reset Register Values

The ICON and IMAP2:0 control registers are loaded from the control table in external memory when the processor is initialized or reinitialized. The control table is described in section 2.3, CONTROL REGISTERS (pg. 2-6).

The IMSK register is set to 0 when the processor is initialized (RESET is deasserted). The IPND register value is undefined after a power-up initialization (cold reset). The user is responsible for clearing this register before any mask register bits are set; otherwise, unwanted interrupts may be triggered. For a reset while power is on (warm reset), the pending register value is retained.

12-15  INTERRUPT CONTROLLER

12.3.8 Setting Up the Interrupt Controller

This section provides several examples of setting up the interrupt controller. Recall that the IMAP and ICON registers are control registers. When the entire control table is automatically read at initialization, the ICON and IMAP registers are loaded with the values pre-programmed in the table. In many applications, setting these register values in the initial control table is the only setup required. The follow
554. ro creates a register-indirect addressing mode; however, this operation can generally be carried out faster by using the MEMB version of this addressing mode.

D.5.2 MEMB Format Addressing

The MEMB format provides the following seven addressing modes:
• absolute displacement
• register indirect
• register indirect with displacement
• register indirect with index
• register indirect with index and displacement
• index with displacement
• IP with displacement

The abase and index fields specify local or global registers, the contents of which are used in address computation. When the index field is used in an addressing mode, the processor automatically scales the index register value by the amount specified in the scale field. Table D-3 gives the encoding of the scale field. The optional displacement field is contained in the word following the instruction word. The displacement is a 32-bit signed two's complement value.

Table D-3. Encoding of Scale Field

    Scale         Scale Factor (Multiplier)
    000           1
    001           2
    010           4
    011           8
    100           16
    101 to 111    Reserved

NOTE: Usage of a reserved encoding causes generation of an invalid opcode fault.

For the IP with displacement mode, the value of the displacement field plus eight is added to the address of the current instruction.

D-5  MACHINE LANGUAGE INSTRUCTION REFERENCE

APPENDIX E MACHINE LANGUAGE INSTRUCTION REFERENCE

1 INSTRUCTIO
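The scale-field encoding and two of the MEMB address computations can be sketched in Python. The function names are hypothetical and the sketch assumes 32-bit wrap-around arithmetic; it models only the rules stated above (index scaled by the scale factor, and displacement plus eight added to the current instruction's address for the IP with displacement mode).

```python
MASK32 = 0xFFFFFFFF  # assumption: effective addresses wrap at 32 bits

# Table D-3: scale field encoding -> scale factor (101 to 111 are reserved).
SCALE_FACTORS = {0b000: 1, 0b001: 2, 0b010: 4, 0b011: 8, 0b100: 16}


def scale_factor(scale_field: int) -> int:
    """Decode the 3-bit scale field; reserved encodings are rejected."""
    if scale_field not in SCALE_FACTORS:
        raise ValueError("reserved scale encoding (invalid opcode fault)")
    return SCALE_FACTORS[scale_field]


def efa_index_disp(abase_value: int, index_value: int,
                   scale_field: int, displacement: int = 0) -> int:
    """Register indirect with index and displacement (hypothetical helper)."""
    scaled = index_value * scale_factor(scale_field)
    return (abase_value + scaled + displacement) & MASK32


def efa_ip_disp(instruction_address: int, displacement: int) -> int:
    """IP with displacement: displacement plus eight, added to the
    address of the current instruction."""
    return (instruction_address + displacement + 8) & MASK32


if __name__ == "__main__":
    print(hex(efa_index_disp(0x2000, 3, 0b010, 4)))
    print(hex(efa_ip_disp(0x1000, 0x20)))
```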
555. rocessor always switches to supervisor mode while an interrupt is being handled. It also saves the states of the AC and PC registers for the interrupted program. The interrupt procedure shares the remainder of the execution environment resources (namely, the global registers, special function registers and the address space) with the interrupted program.

CAUTION: Interrupt procedures must preserve and restore the state of any resources shared with a non-cooperating program. For example, an interrupt procedure which uses a global register which is not permanently allocated to it should save the register's contents before it uses the register and restore the contents before returning from the interrupt handler.

To reduce interrupt latency to critical interrupt routines, interrupt handlers may be locked into the instruction cache. See section 12.3.14, Caching Interrupt-Handling Procedures (pg. 12-21) for a complete description.

6.9 INTERRUPT CONTEXT SWITCH

When the processor services an interrupt, it automatically saves the state of the interrupted program or interrupt procedure and calls the interrupt-handling procedure associated with the new interrupt request. When the interrupt handler completes, the processor automatically restores the interrupted program state. The method that the processor uses to s
556. rom the fault, the fault handler can call a debug monitor or perform an action such as resetting the processor.

This procedure call mechanism handles faults that occur:
• while the processor is servicing an interrupt
• while the processor is working on another fault-handling procedure

7.2 FAULT TYPES

The i960 architecture defines a basic set of faults which are categorized by type and subtype. Each fault has a unique type and subtype number. When the processor detects a fault, it records the fault type and subtype numbers in a fault record. It then uses the type number to select a fault-handling procedure.

7-2  FAULTS

The fault-handling procedure can optionally use the subtype number to select a specific fault-handling action. The i960 Cx processor recognizes i960 architecture-defined faults and a new fault subtype for detecting unaligned memory accesses. Table 7-1 lists all faults that the i960 Cx processor detects, arranged by type and subtype. Text that follows the table gives column definitions.

Table 7-1. i960 Cx Processor Fault Types and Subtypes

    Fault Type           Fault Subtype                    Fault Record
    Number   Name        Number   Name
    1H       Trace       Bit 1    Instruction Trace       XX01 XX02H
                         Bit 2    Branch Trace            XX01 XX04H
                         Bit 3    Call Trace              XX01 XX08H
                         Bit 4    Return Trace            XX01 XX10H
                         Bit 5    Prereturn Trace         XX01 XX20H
                         Bit 6    Supervisor Trace        XX01 XX40H
                         Bit 7    Breakpoint Trace        XX01 XX80H
    2H       Operation   1H       Invalid Opcode          XX
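The trace fault rows of Table 7-1 can be decoded with a short Python sketch. The function names are hypothetical. The field positions are an assumption inferred from the XX01 XX02H pattern in the table: the fault type is taken from bits 16 through 23 and the subtype from bits 0 through 7, with the XX bytes ignored.

```python
# Trace fault subtype bits from Table 7-1 (bit position -> subtype name).
TRACE_SUBTYPE_BITS = {
    1: "Instruction Trace",
    2: "Branch Trace",
    3: "Call Trace",
    4: "Return Trace",
    5: "Prereturn Trace",
    6: "Supervisor Trace",
    7: "Breakpoint Trace",
}


def decode_fault_word(word: int):
    """Return (fault type, fault subtype) from a fault record word.

    Assumption: type in bits 16..23, subtype in bits 0..7.
    """
    return (word >> 16) & 0xFF, word & 0xFF


def trace_subtypes(subtype: int):
    """List the trace subtypes whose bits are set in a subtype byte."""
    return [name for bit, name in sorted(TRACE_SUBTYPE_BITS.items())
            if subtype & (1 << bit)]


if __name__ == "__main__":
    fault_type, subtype = decode_fault_word(0x00010082)
    print(fault_type, trace_subtypes(subtype))
```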
557. rs (SFRs) are provided as an extension to the architectural register model. These registers are designated sf0, sf1, sf2 (see Table 2-1). Registers sf3 through sf31 are not implemented on the i960 Cx processors. Reading or modifying unimplemented registers causes the operation invalid opcode fault to occur.

SFRs provide a means to configure and monitor the interrupt controller and DMA controller status; for the i960 CF processor, SFRs are used to control the data cache.

The processor provides a mechanism which allows only privileged access to SFRs. These registers can only be accessed while the processor is in supervisor execution mode. See section 2.7, USER-SUPERVISOR PROTECTION MODEL (pg. 2-20). A type mismatch fault occurs if an instruction with an SFR operand is executed in user mode.

SFRs are not used as operands for instructions whose machine-level instruction format is of type MEM or CTRL. Such instructions include loads, stores and those which cause program redirection (call, return and branches). APPENDIX D, MACHINE-LEVEL INSTRUCTION FORMATS describes machine-level encoding for operands. Table 2-2 summarizes the use of SFRs as instruction operands.

2.2.4 Register Scoreboarding

Register scoreboarding allows concurrent execution of sequential instructions. When an instruction executes, the processor sets a register scoreboard bit to indicate that a particular register or group of registers is being used in an operation. If the instructions that
558. rtable code must not access or depend on values in this region. As shown in Figure 2-3, the initialization boot record is located in the i960 Cx processors' reserved memory.

The i960 Cx processors require some special consideration when using the lower 1 Kbyte of address space (addresses 0000H to 03FFH). Loads and stores directed to these addresses access internal memory; instruction fetches from these addresses are not allowed for the i960 Cx processors. See section 2.5.4, Internal Data RAM (pg. 2-12).

2-10  PROGRAMMING ENVIRONMENT

2.5.2 Data and Instruction Alignment in the Address Space

Instructions, program data and architecturally defined data structures can be placed anywhere in non-reserved address space while adhering to these alignment requirements:
• Align instructions on word boundaries.
• Align all architecturally defined data structures on the boundaries specified in Table 2-4.
• Align instruction operands for the atomic instructions (atadd, atmod) to word boundaries in memory.

The i960 Cx processors do not require that load and store data be aligned in memory. They can handle a non-aligned load or store request by either of two methods:
• They can automatically service a non-aligned memory access with microcode assistance, as described in section 10.4, DATA ALIGNMENT (pg. 10-9).
• They can generate an operation unaligned fault when a non-aligned access is detected.

The method for handling non aligned accesses i
559. ructure fields are defined as reserved locations. A reserved field may be used by future implementations of the i960 architecture. For portability and compatibility, code should initialize reserved locations to 0. When an implementation uses a reserved location, the implementation-specific feature is activated by a value of 1 in the reserved field. Setting the reserved locations to 0 guarantees that the features are disabled.

C.4 INSTRUCTION SET

The i960 architecture defines a comprehensive instruction set. Code which uses only the architecturally defined instruction set is object-level portable to other implementations of the i960 architecture. Some implementations may favor a particular code ordering to optimize performance. This special ordering, however, is never required by an implementation. The following subsections describe the implementation-dependent instruction set properties.

C.4.1 Instruction Timing

An objective of the i960 architecture is to allow microarchitectural advances to translate directly into increased performance. The architecture does not restrict parallel or out-of-order instruction execution, nor does it define the time required to execute any instruction or function. Code which depends on instruction execution times, therefore, is not portable to all i960 processor architecture implementations.

C.4.2 Implementation-Specific Instructions

Most of the processor's instruction set is defined by the core architecture. Sever
560. rvisor calls, respectively. A system-supervisor call is also referred to as a supervisor call.

5.2 CALL AND RETURN MECHANISM

At any point in a program, the i960 processor has access to the global registers, a local register set and the procedure stack. A subset of the stack allocated to the procedure is called the stack frame.
• When a call executes, a new stack frame is allocated for the called procedure. The processor also saves the current local register set, freeing these registers for use by the newly called procedure. In this way, every procedure has a unique stack and a unique set of local registers.
• When a return executes, the current local register set and current stack frame are deallocated. The previous local register set and previous stack frame are restored.

5.2.1 Local Registers and the Procedure Stack

The processor automatically allocates a set of 16 local registers for each procedure. Since local registers are on-chip, they provide fast-access storage for local variables. Of the 16 local registers, 13 are available for general use; r0, r1 and r2 are reserved for linkage information to tie procedures together.

NOTE: The processor does not always clear or initialize the set of local registers assigned to a new procedure. Therefore, initial register contents are unpredictable. Also, because the processor does not initialize the local register save area in the newly created stack frame for the procedure, its contents are eq
561. s

Figure 1-1. i960 CA/CF Superscalar Microprocessor Architecture

Parallel decode also speeds conditional operations such as branches. These instructions are decoded and executed ahead of the current instruction pointer while maintaining the logical control flow of the sequential program.

Once the scheduler issues an instruction or group of instructions, one of six parallel processing units begins to execute each instruction. Each parallel unit handles a different subset of the instruction set, enabling multiple instructions to be issued and executed every clock cycle. Each unit executes its instructions in parallel with other processor operations.

The i960 Cx processors' 32 general-purpose 32-bit registers are each six-ported to allow unimpeded parallel access to independent processing units. To maintain the logical integrity of sequential instructions which are being executed in parallel, the processor implements register scoreboarding and resource scoreboarding interlocks.

INTRODUCTION

The superscalar i960 Cx processors can decode multiple instructions at once and issue them to independent processing units, where they are executed in parallel. As a result, the processors deliver sustained execution of multiple instructions per clock from a sequential instruction stream.

1.1.2 Full Procedure Call Model

These processors supp
562. s handled as dedicated-mode interrupt requests.

The application program may use the sysctl instruction to request interrupt service. The vector that sysctl requests is serviced immediately or posted in the interrupt table's pending interrupts section, depending upon the current processor priority and the request's priority. The interrupt controller caches the priority of the highest priority interrupt posted in the interrupt table.

The interrupt controller continuously compares the priorities of the highest posted software interrupt and the highest pending hardware interrupt to the processor's priority. The core is interrupted when a pending interrupt request is higher than the processor priority or has a priority of 31. In the event that both hardware- and software-requested interrupts are posted at the same level, the hardware interrupt is serviced before the software interrupt when the priority is 1 to 30. At priority 31, the software interrupt is serviced first.

[Figure: interrupt controller block diagram. Recoverable labels: Service Interrupts; Check Pending Interrupts; priority resolver; Interrupt Mask Register (IMSK); Pending Priorities and Pending Interrupts Fields; Post Interrupts; Interrupt Table; Interrupt Controller]
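The priority comparison described above can be modeled with a short Python sketch. The function name select_interrupt and its argument layout are hypothetical; the sketch encodes only the rules stated in the text (a request interrupts the core when its priority exceeds the processor priority or equals 31; on a tie, hardware wins except at priority 31, where software is serviced first).

```python
def select_interrupt(processor_priority, hw_priority=None, sw_priority=None):
    """Pick the interrupt to service, or None if the core is not interrupted.

    hw_priority / sw_priority are the highest pending hardware and highest
    posted software priorities, or None when nothing is pending.
    (Hypothetical model for illustration only.)
    """
    pending = [p for p in (hw_priority, sw_priority) if p is not None]
    if not pending:
        return None
    best = max(pending)
    # Interrupt only if higher than the processor priority, or priority 31.
    if not (best > processor_priority or best == 31):
        return None
    if hw_priority == best and sw_priority == best:
        # Tie: hardware first, except at priority 31 where software is first.
        source = "software" if best == 31 else "hardware"
    elif hw_priority == best:
        source = "hardware"
    else:
        source = "software"
    return source, best


if __name__ == "__main__":
    print(select_interrupt(10, hw_priority=12, sw_priority=12))
    print(select_interrupt(31, hw_priority=31, sw_priority=31))
```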
563. s the processor treats the literal as a positive integer value.

2.2.6 Register and Literal Addressing and Alignment

Several instructions operate on multiple-word operands. For example, the load long instruction loads two words from memory into two consecutive registers. The register for the less significant word is specified in the instruction; the more significant word is automatically loaded into the next higher-numbered register.

In cases where an instruction specifies a register number and multiple consecutive registers are implied, the register number must be even if two registers are accessed (e.g., g0, g2) and an integral multiple of 4 if three or four registers are accessed (e.g., g0, g4). If a register reference for a source value is not properly aligned, the source value is undefined and an operation invalid operand fault is generated. If a register reference for a destination value is not properly aligned, the registers to which the processor writes and the values written are undefined. The processor then generates an operation invalid operand fault. The assembly language code in Example 2-2 shows an example of correct and incorrect register alignment.

Example 2-2. Register Alignment

    movl g3, g8    # INCORRECT ALIGNMENT - resulting value in registers g8
                   # and g9 is unpredictable (non-aligned source)
    movl g4, g8    # CORRECT ALIGNMENT

Global registers local registers special fu
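The register alignment rule in section 2.2.6 reduces to a simple check, sketched here in Python. The function name is hypothetical, and registers are identified by their number within the register set (for example, 3 for g3).

```python
def register_aligned(register_number: int, register_count: int) -> bool:
    """Check the alignment rule for multiple-register operands.

    Two registers require an even register number; three or four
    registers require a multiple of 4. (Hypothetical helper.)
    """
    if register_count == 1:
        return True
    if register_count == 2:
        return register_number % 2 == 0
    if register_count in (3, 4):
        return register_number % 4 == 0
    raise ValueError("operands span one to four registers")


if __name__ == "__main__":
    # movl g3, g8: the source pair starting at g3 is not aligned.
    print(register_aligned(3, 2))
    # movl g4, g8: both register pairs are aligned.
    print(register_aligned(4, 2), register_aligned(8, 2))
```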
564. s and modifies the PC register. The processor must be in supervisor mode to execute this instruction; a type mismatch fault is generated if modpc is executed in user mode. As with modac, modpc provides a mask operand that can be used to limit access to specific bits or groups of bits in the register.

In the latter two methods, the interrupt or fault handler changes process controls in the interrupt or fault record that is saved on the stack. Upon return from the interrupt or fault handler, the modified process controls are copied into the PC register. The processor must be in supervisor mode prior to return for modified process controls to be copied into the PC register.

When process controls are changed as described above, the processor recognizes the changes immediately except for one situation: if modpc is used to change the trace enable bit, the processor may not recognize the change before the next four instructions are executed.

After initialization (hardware reset), the process controls reflect the following conditions:
• priority = 31
• execution mode = supervisor
• trace enable = off
• state = interrupted

When the processor is reinitialized via the system control instruction and reinitialize message, the PC register reflects the same conditions, except that the processor retains the same priority as before reinitialization.

The reserved bits indicated in Figure 2-5 should never be set to zero; user software should not depend on the val
565. s are described in section 10.3, PROGRAMMING THE BUS CONTROLLER (pg. 10-5). The following subsections describe the i960 Cx processors' programmable bus characteristics.

10.2.1 Data Bus Width

Each region's data bus width is programmed in the memory region configuration table. The i960 Cx processors allow an 8-, 16- or 32-bit-wide data bus for each region. Byte enable signals encoded in each region provide the proper address for 8-, 16- or 32-bit memory systems. The i960 Cx processors use the lower-order data lines when reading and writing to 8- or 16-bit memory.

10.2.2 Burst and Pipelined Read Accesses

To improve bus bandwidth, the i960 Cx devices provide a burst access and pipelined read access. These burst and pipelining modes are separately enabled or disabled for each memory region by programming the memory region configuration table.

When burst access is enabled, the bus controller generates an address (the burst address) followed by one to four data transfers. The lower two address bits (A3:2) are incremented for each consecutive data transfer. Burst accesses facilitate the interface to fast page-mode DRAM; wait states following the address cycle and wait states between data cycles can be controlled independently. Data cycle time is typically a fraction of address cycle time. This provides an optimal wait state profile for fast page-mode DRAM.

When address pipelining is enabled, the next read address is asserted in the last dat
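The address sequence of a burst access, as described in section 10.2.2, can be sketched in Python. The function name is hypothetical. The sketch assumes a 32-bit-wide region (one word per data transfer) and assumes that a burst never carries out of address bits A3:2; neither assumption is stated in this excerpt.

```python
def burst_addresses(burst_address: int, transfers: int):
    """List the word addresses of one burst access (hypothetical model).

    Only address bits A3:2 change from one data transfer to the next.
    """
    if not 1 <= transfers <= 4:
        raise ValueError("a burst has one to four data transfers")
    if burst_address & 0x3:
        raise ValueError("assumption: word-aligned burst address")
    first = (burst_address >> 2) & 0x3
    if first + transfers > 4:
        # Assumption: the burst stays within one 16-byte block.
        raise ValueError("burst would carry out of A3:2")
    upper = burst_address & ~0xF
    return [upper | ((first + i) << 2) for i in range(transfers)]


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x1000, 4)])
```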
566. s for instructions which reference on-chip data RAM.

AGU        Address Generation Unit: executes the lda, callx, bx and balx instructions and assists address calculation for all loads and stores.

REG-side   Two units are attached to the register side. REG-side instructions are dispatched over the REG machine bus.

MDU        Multiply/Divide Unit: executes the multiply, divide, remainder, modulo and extended multiply and divide instructions.

EU         Execution Unit: executes all other arithmetic, logical, shift, comparison, bit, bit field, move instructions and the scanbyte instruction.

CTRL-side  One unit is on the control side.

IS         Instruction Scheduler: directly executes control instructions by modifying the next instruction pointer given to the instruction cache.

The processor uses on-chip ROM to execute instructions not directly executed by one of the parallel processing units. This ROM contains a sequence of RISC instructions for each complex instruction not directly executable in one of the parallel processing units. When the scheduler encounters a complex instruction, the appropriate sequence of RISC instructions is issued for execution. This sequence of instructions is called a micro-flow.

The IS can issue multiple instructions in every clock when the instructions decoded in that clock can be executed by different machine sides. For example, an add can begin in the same clock as a load, since the addition is performed by the EU on the REG side while the l
567. s handled. In the case of parallel instruction execution, these fields contain register states that were pending when the processor completed execution of all parallel and out-of-order instructions.

Figure 7-3. Fault Record (recoverable labels: bit range 31 to 0; Optional Data; Reserved; word addresses NFP-20, NFP-16, NFP-12, NFP-8, NFP-4)

Optional data fields are defined for certain faults. These fields contain additional information about the faulting conditions, usually to assist resumption. The i960 Cx processor uses these optional data fields for two fault types only: parallel faults and operation unaligned faults. The processor can generate parallel faults when instructions are executed in parallel; section 7.6.1, Multiple Faults (pg. 7-9) describes optional data field usage for parallel faults; section 7.10.3, Operation Faults (pg. 7-23) describes optional data field usage for operation unaligned faults. All unused bytes in the fault record are reserved.

7.5.2 Return Instruction Pointer (RIP)

When a fault-handling procedure is called, a return instruction pointer (RIP) is saved in the RIP register (r2). The RIP points to an instruction where program execution can be resumed with no break in the program's control flow. It generally points to the faulting instruction or to the next instruction to be executed. In some instances, however, the RIP is undefined; section 7.10, FAULT REFERENCE (pg. 7-20) defines the RIP content for eac
568. s in the maximum bandwidth utilization of the bus. See Figure B-6.

Figure B-6. Pipelined Read Address and Data

B-10  BUS INTERFACE EXAMPLES

B.2.1 Block Diagram

The same SRAM used in a non-pipelined read memory system can be used in a pipelined read memory system. Figure B-7 shows a 32-bit-wide burst read pipelined memory system. Burst mode is used to speed write accesses.

The design of a pipelined read SRAM interface is very similar to the design of a non-pipelined SRAM interface. The difference is that an address latch and a W/R latch have been added. Chip select logic is a simple asynchronous data selector. Chip select (CS) is based only on the address and is not qualified with any other signals. See section B.1, NON-PIPELINED BURST SRAM INTERFACE (pg. B-1) for more information on chip select generation.

Figure B-7. Pipelined SRAM Interface Block Diagram

B.2.1.1 Address Latch

During pipelined reads, the i960 Cx processors output the next address during the last data cycle of the current access. This requires either an address latch or memory devices that are designed to work with the pipelined bus.

B.2.1.2 State Machine PLD

The state machine PLD contains logic to control CE and address signals A3:2. CE is controlled by a simple state machine. A3:2 automatically increme
569. s only updated when service swaps to another channel or udma executes. Channel swapping occurs when channel priority for a pending DMA request is higher than that of the currently active or last serviced channel.
[Figure 13-12. DMA Data RAM. Internal SRAM starting at 0000 0000H holds the DMA working registers; each channel has a 32-byte setup area (channel 1 at 0000 0060H, channel 3 at 0000 00A0H) with Byte Count at offset 0H, Source Address at 4H, Destination Address at 8H and, in chaining mode, Next Pointer at CH.]
udma flushes the state of a currently executing channel to data RAM. Additional DMA transfers can occur between the time that udma executes and a program reads the locations in data RAM. The channel may be suspended before udma executes to ensure coherence between the values read from data RAM and actual DMA progress. DMA data RAM is 128 bytes of internal RAM located at 0000 0040H to 0000 00BFH. See Figure 13-12. This memory is read/write in supervisor mode and read-only in user mode. This supervisor protection prevents errant modification of the DMA data RAM by a program. DMA data RAM for any channel can be used for general-purpose storage when the channel is not in use. A program, however, must not modify data RAM dedicated to a channel which is already set up and awaiting activity. In general any mod
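The address arithmetic implied by this excerpt can be sketched in Python. This is only an illustration, not part of the manual: the function names are hypothetical, the 32-byte stride and the field offsets are read from the flattened Figure 13-12, and the base addresses of channels 0 and 2 are assumed to follow the same 32-byte spacing as the channel 1 and channel 3 addresses that survive in the text.

```python
# Sketch of the DMA data RAM layout described above (assumptions labeled).
DMA_RAM_FIRST = 0x0000_0040   # first byte of DMA data RAM (from the text)
DMA_RAM_LAST = 0x0000_00BF    # last byte of DMA data RAM (from the text)
CHANNEL_AREA_BYTES = 32       # per-channel setup area (from Figure 13-12)

# Field offsets inside one channel's setup area, as read from the figure.
FIELD_OFFSETS = {
    "byte_count": 0x0,
    "source_address": 0x4,
    "destination_address": 0x8,
    "next_pointer": 0xC,      # used in chaining mode
}

def channel_base(channel: int) -> int:
    """Return the assumed base address of a channel's setup area."""
    if channel not in (0, 1, 2, 3):
        raise ValueError("channel must be 0..3")
    return DMA_RAM_FIRST + channel * CHANNEL_AREA_BYTES

def field_address(channel: int, field: str) -> int:
    """Return the assumed address of one field in a channel's setup area."""
    return channel_base(channel) + FIELD_OFFSETS[field]
```

Four 32-byte areas account for exactly the 128 bytes between 0000 0040H and 0000 00BFH, which is why the even spacing is a reasonable reading of the figure.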
570. s selected at initialization based on the value of the Fault Configuration Word in the Process Control Block. See section 14.2.6, Process Control Block (PRCB) (pg. 14-8).
Table 2-4. Alignment of Data Structures in the Address Space
Data Structure               Alignment
System Procedure Table       4-byte
Interrupt Table              4-byte
Fault Table                  4-byte
Control Table                16-byte
User Stack                   16-byte
Supervisor Stack             16-byte
Interrupt Stack              16-byte
Process Control Block        16-byte
Initialization Boot Record   Fixed at FFFF FF00H
2.5.3 Byte, Word and Bit Addressing
The processor provides instructions for moving data blocks of various lengths from memory to registers (LOAD) and from registers to memory (STORE). Allowable sizes for blocks are bytes, half-words (2 bytes), words (4 bytes), double words, triple words and quad words. For example, stl (store long) stores an 8-byte (double word) data block in memory. The most efficient way to move data blocks longer than 16 bytes is to move them in quad-word increments, using quad-word instructions ldq and stq. When a data block is stored in memory, normally the block's least significant byte is stored at a base memory address and the more significant bytes are stored at successively higher byte addresses. This method of ordering bytes in memory is referred to as little endian ordering. The i960 Cx processors also provide the option for ordering bytes i
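As an illustration of the little endian ordering described in this excerpt, the following Python sketch shows how a word and an 8-byte double word (the size that stl stores) would be laid out in memory. The helper name is hypothetical and the values are arbitrary; the sketch models byte order only, not the processor's store instructions.

```python
def store_little_endian(value: int, size: int) -> bytes:
    """Return the bytes of `value` as they would sit in memory, with the
    least significant byte at the base (lowest) address."""
    return value.to_bytes(size, "little")

# A 4-byte word: byte 0x78 lands at the base address, 0x12 at base + 3.
word_image = store_little_endian(0x12345678, 4)

# An 8-byte double word, the block size stored by stl.
long_image = store_little_endian(0x0102030405060708, 8)
```

Reading the bytes back with `int.from_bytes(word_image, "little")` recovers the original value, which is a quick way to check a memory dump against the expected ordering.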
571. s specified for a bit operation by giving its bit number and register. The least significant bit of a 32-bit register is bit 0; the most significant bit is bit 31. A bit field is a contiguous sequence of bits within a register operand. Bit fields do not span register boundaries. A bit field is defined by giving its length in bits (0-31) and the bit number of its lowest numbered bit (0-31). In other words, the bit field is any contiguous group of bits, up to 31 bits long, in a 32-bit register. Loading and storing of bit and bit field data is normally performed using the ordinal load and store instructions. Integer load and store instructions operate on two's complement numbers. Depending on the value, a byte or short integer load can result in sign extension of data in a register. A byte or short store can signal an integer overflow condition.
3.1.4 Triple and Quad Words
Triple and quad words refer to consecutive words in memory or in registers. Triple- and quad-word loads, stores and move instructions use these data types. These instructions facilitate data block movement. No data manipulation (sign extension, zero extension or truncation) is performed in these instructions. Triple- and quad-word data types can be considered a superset of, or as encompassing, the other data types described. The data in each word subset of a quad word is likely to be the operand or result of an ordinal
572. scribes the die stepping information contained in g0.
2.2.2 Local Registers
The i960 architecture provides a separate set of 32-bit local data registers (r0 through r15) for each active procedure. These registers provide storage for variables that are local to a procedure. Each time a procedure is called, the processor allocates a new set of local registers for that procedure and saves the calling procedure's local registers on the procedure stack. The processor performs local register management; a program need not explicitly save and restore these registers. r3 through r15 are general-purpose registers; r0 through r2 are reserved for special functions: r0 contains the Previous Frame Pointer (PFP), r1 contains the Stack Pointer (SP) and r2 contains the Return Instruction Pointer (RIP). These are discussed in CHAPTER 5, PROCEDURE CALLS.
NOTE: The processor does not always clear or initialize the set of local registers assigned to a new procedure. Therefore, initial register contents are unpredictable. Also, because the processor does not initialize the local register save area in the newly created stack frame for the procedure, its contents are equally unpredictable.
2.2.3 Special Function Registers (SFRs)
The i960 architecture provides a mechanism to expand its architecture-defined register set with up to 32 additional 32-bit registers. On the i960 Cx microprocessor, three special function registe
573. se three cases.
13.3 SOURCE AND DESTINATION ADDRESSING
When a DMA operation is set up, it is described with a source address, destination address and byte count. For each channel, an address is either held fixed or incremented after each transfer. A fixed address is used for addressing external I/O devices; an address which increments is used for the memory side of a DMA transfer. When a channel is set up, address increment or hold is selected separately for the source and destination address. Source and destination address and byte count are 32-bit values. Source and destination are byte-addressable over the entire address space. DMA operation length can be up to 4 Gbytes (2^32 bytes). Source and destination address and byte count are specified when sdma executes.
13.4 DMA TRANSFERS
The following sections explain DMA transfer characteristics, especially those transfer characteristics affected by channel setup. Intelligent selection of transfer characteristics works to balance DMA performance and functionality with the performance of the user's program. Source/destination request length selects the bus request types which the DMA microcode issues when executing a DMA transfer. To perform a transfer, combinations of byte, short-word, word and quad-word load and store requests are issued. Refer to section 11.2, BUS OPERATION (pg. 11-2), for a detailed description of bus requests. As indicated in Table 13-1, tr
574. sembler is free to provide one. For bbc, if the selected bit in src is clear, the processor sets the condition code to 000 (binary) and branches to the instruction specified with targ; otherwise it sets the condition code to 010 and goes to the next instruction. For bbs, if the selected bit is set, the processor sets the condition code to 010 and branches to targ; otherwise it sets the condition code to 000 and goes to the next instruction. targ can be no farther than -2^12 to (2^12 - 4) bytes from the current IP. When using the Intel i960 processor assembler, targ must be a label which specifies the target instruction's IP.
Action:
  bbc: if ((src and 2^(bitpos mod 32)) = 0)
         AC.cc <- 000; IP <- IP + displacement; resume execution at new IP
       else
         AC.cc <- 010; resume execution at next IP
  bbs: if ((src and 2^(bitpos mod 32)) = 1)
         AC.cc <- 010; IP <- IP + displacement; resume execution at new IP
       else
         AC.cc <- 000; resume execution at next IP
Faults:
  Trace: Instruction, Branch (if taken). Instruction and Branch Trace Events are signaled after instruction completion. A Trace fault is generated if PC.te = 1 and TC.i or TC.b = 1.
  Operation Unimplemented: Execution from on-chip data RAM.
  Type Mismatch: Non-supervisor reference of a sfr.
Example:
  # assume bit 10 of r6 is clear
  bbc 10, r6, xyz    # bit 10 of r6 is checked and found clear;
                     # AC.cc <- 000; IP <- xyz
Opcode: bbc 30H COBR; bbs 37H COBR
See
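The Action description above can be restated as a small Python model. It is a sketch only: the function names and the returned tuple are illustrative choices, and the model covers just the condition-code and branch decision, not IP arithmetic, tracing or faults.

```python
def bbc(bitpos: int, src: int):
    """Model of bbc: returns (condition_code, branch_taken)."""
    bit_is_clear = (src & (1 << (bitpos % 32))) == 0
    if bit_is_clear:
        return 0b000, True    # AC.cc <- 000, branch to targ
    return 0b010, False       # AC.cc <- 010, continue at next IP

def bbs(bitpos: int, src: int):
    """Model of bbs: returns (condition_code, branch_taken)."""
    bit_is_set = (src & (1 << (bitpos % 32))) != 0
    if bit_is_set:
        return 0b010, True    # AC.cc <- 010, branch to targ
    return 0b000, False       # AC.cc <- 000, continue at next IP
```

With r6 holding a value whose bit 10 is clear, `bbc(10, r6)` reports condition code 000 and a taken branch, matching the manual's example.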
575. serted and DEN is deasserted.
11.2.4 Burst Accesses
A burst access is an address cycle followed by two to four data cycles. The two least significant address signals automatically increment during a burst access. Maximum burst size is four data cycles. This maximum is independent of bus width: a byte-wide bus has a maximum burst size of four bytes; a word-wide bus has a maximum of four words. If a quad-word load request (e.g., ldq) is made to an 8-bit data region, it results in four 4-byte burst accesses. See Table 11-3.
[Figure 11-6. Read/Write Requests, Non-Pipelined, Non-Burst, No Wait States]
Table 11-3. Burst Transfers and Bus Widths
Request       Bus Width   Number of Burst Accesses   Number of Transfers/Burst   Number of Transfers
Quad Word     8-bit       4                          4-4-4-4                     16
              16-bit      2                          4-4                         8
              32-bit      1                          4                           4
Triple Word   8-bit       3                          4-4-4                       12
              16-bit      2                          4-2                         6
              32-bit      1                          3
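The rows of Table 11-3 follow from the four-transfer burst limit, and a short Python sketch can reproduce them. This is a simplified model under assumptions not stated in the excerpt: the request is assumed to be aligned so that no burst is split for alignment reasons, and the function name is hypothetical.

```python
MAX_TRANSFERS_PER_BURST = 4   # maximum burst size, independent of bus width

def burst_plan(request_bytes: int, bus_width_bits: int) -> list:
    """Return the number of data transfers in each burst access needed to
    move `request_bytes` over a bus of the given width (8, 16 or 32 bits)."""
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("bus width must be 8, 16 or 32 bits")
    bytes_per_transfer = bus_width_bits // 8
    # Round up so a partial final transfer still counts as one transfer.
    transfers = -(-request_bytes // bytes_per_transfer)
    plan = []
    while transfers > 0:
        in_this_burst = min(MAX_TRANSFERS_PER_BURST, transfers)
        plan.append(in_this_burst)
        transfers -= in_this_burst
    return plan
```

A quad word is 16 bytes and a triple word is 12 bytes, so `burst_plan(16, 8)` and `burst_plan(12, 16)` correspond to the first and fifth rows of the table.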
576. shows NO deassertion in the 6th cycle and the last deassertion in the 10th cycle (2nd deassertion removed, 3rd deassertion shifted left 1 cycle).
[Figure B-3. Non-Pipelined SRAM Write Waveform (1 wait state, burst write; signals ADS, A31:4, DATA, W/R, BLAST). Errata annotation: ADS incorrectly goes low in the sixth and eleventh cycles; it now correctly goes low in the first and tenth cycles.]
B.1.4.1 Wait State Selection
The i960 Cx processors incorporate an internal wait state generator; wait state selection is dictated by the memory system. The number of NRAD wait states required is a function of output enable access time, chip enable access time or address access time. NRAD must be selected so the wait states and data cycle accommodate the longest of these times. It is important to consider PLD output delay. The number of NRDD wait states required is a function of address access time. NRDD must be selected so that the wait states and data cycle accommodate the memory system's address-to-data time. If the memory system is using the burst addresses provided by the i960 Cx processors, it is important to consider address output delay from the i960 Cx devices. If external address generation is used, PLD delay is important. The number of NWAD and N
577. sor and execute a test of the i960 Cx processor system. The on-circuit emulation mode is entered by asserting (low) the ONCE pin while the i960 Cx processor is in the reset state. The ONCE pin value is latched on the RESET signal's rising edge. The ONCE pin should be left unconnected in an i960 Cx processor system. The pin is connected to VCC through an internal pull-up resistor, causing the unconnected pin to remain in the inactive state. To enter on-circuit emulation mode, an external tester simply drives the ONCE pin low (overcoming the pull-up resistor) and initiates a reset cycle. To exit on-circuit emulation mode, the reset cycle must be repeated with the ONCE pin deasserted prior to the rising edge of RESET. See the i960 CA/CF microprocessor data sheets for specific timing of the ONCE pin and the characteristics of the on-circuit emulation mode.
14.2.4 Initial Memory Image (IMI)
The IMI comprises the minimum set of data structures that the processor needs to initialize its system. The IMI performs three functions for the processor:
• it provides initial configuration information for the core and integrated peripherals;
• it provides pointers to the system data structures and the first instruction to be executed after the processor's initialization;
• it provides checksum words that the processor uses in its self-test routine at startup.
The IMI is made up of three components: the initialization boot record (IBR), process control b
578. sor checks the PFP register prereturn trace flag. If set, the processor generates a prereturn trace event, then handles it as described in section 8.6.1, Normal Handling of Trace Events (pg. 8-9).
8.6.3 Tracing and Interrupt Procedures
When the processor invokes an interrupt-handling procedure to service an interrupt, it disables tracing. It does this by saving the PC register's current state, then clearing the PC register trace enable bit and trace fault pending flag. On returning from the interrupt-handling procedure, the processor restores the PC register to the state it was in prior to handling the interrupt, which restores the trace enable bit and trace fault pending flag states. If these two flags were set prior to calling the interrupt procedure, a trace fault is signaled on return from the interrupt procedure.
NOTE: On a return from an interrupt-handling procedure, the trace fault pending flag is restored. If this flag was set as a result of the interrupt procedure's ret instruction (i.e., indicating a return trace event), the detected trace event is lost. This is also true on a return from a fault handler, when the fault handler is called with an implicit supervisor call.
CHAPTER 9 INSTRUCTION SET REFERENCE
This chapter provides detailed information about each instruction available to the i960 Cx processors. Instructions are listed
579. special function registers 6 11 supervisor mode 6 10 interrupt latency 12 18 interrupt mask saving 12 7 interrupt pins dedicated mode 12 2 expanded mode 12 2 mixed mode 12 2 interrupt posting 6 1 interrupt procedure pointer 6 5 interrupt record 6 9 location 6 9 interrupt request management 12 2 intel interrupt requests interrupt controller 6 6 origination 6 6 sysctl 6 8 interrupt service latency 12 17 interrupt servicing mechanism C 5 interrupt stack 2 1 2 8 6 9 structure 6 9 interrupt table 2 1 2 8 6 3 alignment 6 3 caching mechanism 6 6 initialization 14 11 location 6 3 LOCK pin 6 7 locking 6 6 pending interrupts 6 5 vector entries 6 4 interrupts checking pending 12 17 clearing the source 12 5 dedicated mode 12 4 dedicated mode posting 12 4 definition 6 1 DMA operations 12 2 DMA suspension 12 21 expanded mode 12 5 function 6 1 internal RAM 12 20 interrupt context switch 6 11 interrupt handling procedures 6 10 interrupt record 6 9 interrupt stack 6 9 interrupt table 6 3 masking hardware interrupts 12 8 mixed mode 12 7 non maskable 12 7 non maskable interrupt NMI 6 3 interrupts continued optimizing performance 12 19 physical characteristics 12 8 posting 6 1 6 6 12 17 priority handling 12 2 priority 31 interrupts 6 3 12 8 programmable options 12 9 requesting 12 16 restoring r3 12 8 servicing 6 3 12 17 sysctl 12 2 vector caching 12 20 INDEX IP register see Instruction Pointer IP registe
580. sre dst reg lit sfr reg lit sfr reg sfr i 2 2 2 2 I I U 64 5 REG u 9 9 temp lt AC lt src and mask or AC and not mask dst temp Modulo Integer modi sre IO i I U ZD 74 9 REG R 3 36 if src2 0 Arithmetic Zero Divide fault dst lt src2 mod src src2 src and dst are 32 bits Modify mask sre sre dst modify reg lit sfr reg lit sfr reg I I U Dii 65 0 REG u 3 3 src dst src and mask or sre dst and not mask Modify PC sre mask sre dst modpc reg lit sfr reg lit sfr reg if mask 0 and PC em Supervisor y Type Mismatch fault e I I U 65 5 u 12 17 12 17 temp lt PC lt mask and src dst or PC and not mask src dst temp March 1994 Page 11 of 18 Order Number 272220 002 19609 Cx Microprocessor User s Guide Instruction Set Quick Reference Arithmetic Con Process Controls Trace Controls Instruction Execution e Opcode Mnemonic Description Opcode Mach Instruction Result em te Events Format Type Issue Latency Modify TC m mask sre dst pete reg lit sfr reg lit sfr reg sfr LN 65 4 REG u 15 15 temp lt TC TC lt mask and src or TC and not mask dst lt temp Move mov sre dst regilivstr
581. ss. Addresses may be incremented or held fixed for any DMA operation. Each buffer transfer is handled as if it were a single non-chained DMA. Data alignment requirements for each buffer are identical to the requirements for any other DMA. See section 13.4.5, Data Alignment (pg. 13-10). Since each buffer is considered a single DMA, data is never internally buffered when moving from one buffer to another for unaligned DMAs.
[Figure 13-7. Source Chaining. An internal register holds the first descriptor pointer; the user loads the descriptors; each descriptor points to a source buffer and, through NPTR, to the next descriptor. Legend: BC = Byte Count, SA = Source Address, DA = Destination Address, NPTR = Next Pointer.]
Depending on DMA channel configuration and the chaining mode selected, certain fields in the chaining descriptor are ignored but must be set to zero for future compatibility:
1. When a channel is source chained, the DA field of the first descriptor specifies the destination address; the DA field in subsequent descriptors is ignored.
2. When a channel is destination chained, the SA field of the first descriptor specifies the source address; the SA field in subsequent descriptors is ignored.
3. When a channel is configured for chained fly-by mode, the SA field always contains the fly-by address; the DA field is ignored.
When descriptors are read from external memory, bus latency and mem
582. ss: The minimum number of wait states between the last data cycle of a bus request and the address cycle of the next bus request; applies to read and write requests. Programmable for 0-3 clocks.
NRAD and NWAD describe address-to-data wait states. NRDD and NWDD specify the number of wait states between consecutive data when burst mode is enabled. NRDD and NWDD are not used in non-burst memory regions. NXDA describes the number of wait states between consecutive bus requests. NXDA is the bus turnaround time. An external device's ability to relinquish the bus on a read access (read deasserted to data float) determines the number of NXDA cycles.
NOTE: For pipelined read accesses, the bus controller uses a value of zero (0) for NXDA, regardless of the parameter's programmed value. A non-zero value defeats the purpose of pipelining. The programmed value of NXDA is used for write requests to pipelined memory regions, as the i960 Cx processor does not support pipelined write accesses.
The ready (READY) and burst terminate (BTERM) inputs dynamically control bus accesses. These inputs are enabled or disabled for each memory region. READY extends accesses by forcing wait states. BTERM allows a burst access to be broken into multiple accesses with no lost data. The memory region registers are programmed to enable or disable these inputs for each region. READY and BTERM work
583. ss Space
Address space can be mapped to read-write memory, read-only memory and memory-mapped I/O. The architecture does not define a dedicated, addressable I/O space. There are no subdivisions of the address space such as segments. For memory management, an external memory management unit (MMU) may subdivide memory into pages or restrict access to certain areas of memory to protect a kernel's code, data and stack. However, the processor views this address space as linear. An address in memory is a 32-bit value in the range 0H to FFFF FFFFH. Depending on the instruction, an address can reference in memory a single byte, half-word (2 bytes), word (4 bytes), double word (8 bytes), triple word (12 bytes) or quad word (16 bytes). Refer to load and store instruction descriptions in CHAPTER 9, INSTRUCTION SET REFERENCE, for multiple-byte addressing information.
2.5.1 Memory Requirements
The architecture requires that external memory has the following properties:
• Memory must be byte-addressable.
• Memory must support burst transfers (i.e., transfer blocks of up to 16 contiguous bytes).
• memory is mapped at reserved addresses which are specifically used by an implementation.
• Memory must guarantee indivisible access (read or write) for addresses that fall within 16-byte boundaries.
• Memory must guarantee atomic access for addresses that fall within 16-byte boundaries.
The latter two capabil
584. sses. For example, if BTERM is asserted after the first word of a quad-word burst, the bus controller initiates another access by asserting ADS. The accompanying address is the address of the second word of the burst access (A3:2 = 01 binary). The bus controller then bursts the remaining three words. The BLAST (burst last) signal indicates the last data transfer of the access. Read data is accepted on the clock edge that asserts BTERM; write data is assumed written. BTERM effectively overrides the memory ready (READY) signal when it is asserted. In this way, no data is lost when the current access is terminated. When BTERM is asserted, READY is ignored until after the address cycle which resumes the burst. As with READY, BTERM is ignored when pipelining is enabled in a region, regardless of how the region is programmed. For proper operation, the BTERM inputs should be disabled in regions that have pipelining enabled.
Figure residue: bus waveform with region configuration fields (signals PCLK, SUP, DMA, D/C, BE3:0, LOCK, WAIT). Errata 10/31/94 (SRB): WAIT signal incorrectly shown as tra
585. ster (r4, r8, r12, g0, g4, g8, g12), Fly-By Address
[Figure 13-10. Setup DMA (sdma) Instruction Operands]
13.10.3 DMA Control Word
The DMA control word (Figure 13-11) specifies DMA modes and options. The control word is operand 2 of the sdma instruction. Paragraphs that follow the figure define the register's bit and field settings.
Transfer Type Field:
  00H  8 to 8 bits
  01H  8 to 16 bits
  02H  reserved
  03H  8 to 32 bits
  04H  16 to 8 bits
  05H  16 to 16 bits
  06H  reserved
  07H  16 to 32 bits
  08H  8 bits fly-by
  09H  16 bits fly-by
  0AH  128 bits fly-by (quad)
  0BH  32 bits fly-by
  0CH  32 to 8 bits
  0DH  32 to 16 bits
  0EH  128 to 128 bits (quad)
  0FH  32 to 32 bits
Destination Addressing: 0 = increment, 1 = hold
Source Addressing: 0 = increment, 1 = hold
Synchronization Mode Bit: 0 = source synchronized, 1 = destination synchronized
Synchronization Select Bit: 0 = block (non-synchronized), 1 = demand (synchronized)
EOP/TC Select Bit: 0 = End Of Process, 1 = Terminal Count
Destination Chaining Select Bit: 0 = no chaining, 1 = chained destination
Source Chaining Select Bit: 0 = no chaining, 1 = chained source
Interrupt-on-chaining-buffer Select Bit: 0 = no interrupt, 1 = interrupt
Chaining Wait Select Bit: 0 = Wait function disabled, 1 = Wait function enabled
(bits 31-0) DMA Control Word Instruction Operand
586. ster save area in the stack frame for the local registers. The user must modify the SP register value when data is stored or removed from the stack. The i960 architecture does not provide an explicit push or pop instruction to perform this action. This is typically done by adding the size of all pushes to the stack in one operation.
5.2.2.3 Previous Frame Pointer
The previous frame pointer is the previous stack frame's first byte address. This address's upper 28 bits are stored in local register r0, the previous frame pointer register. The four least significant bits of the PFP are used to store the return-type field.
5.2.2.4 Return Type Field
PFP register bits 0 through 3 contain return type information for the calling procedure. When a procedure call is made, either explicit or implicit, the processor records the call type in the return-type field. The processor then uses this information to select the proper return mechanism when returning to the calling procedure. The use of this information is described in section 5.8, RETURNS (pg. 5-16).
5.2.2.5 Return Instruction Pointer
When a call is made, the processor saves the address of the instruction after the call, providing a re-entry point when the return instruction is executed. This address is automatically stored in local register r2 of the calling frame. Register r2 is referred to as the return instruction pointer (RIP) register. The RIP register is
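The split of the PFP register into a frame address and a return-type field can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch deliberately assigns no meaning to individual return-type values, since those encodings are defined in the section on returns rather than in this excerpt.

```python
FRAME_ADDRESS_MASK = 0xFFFFFFF0   # upper 28 bits: previous frame address
RETURN_TYPE_MASK = 0x0000000F     # bits 0 through 3: return-type field

def split_pfp(pfp: int):
    """Split a 32-bit PFP value into (previous_frame_address, return_type)."""
    return pfp & FRAME_ADDRESS_MASK, pfp & RETURN_TYPE_MASK

def make_pfp(previous_frame_address: int, return_type: int) -> int:
    """Combine a 16-byte-aligned frame address with a 4-bit return type."""
    if previous_frame_address & RETURN_TYPE_MASK:
        raise ValueError("frame address must have its low four bits clear")
    if not 0 <= return_type <= 0xF:
        raise ValueError("return type must fit in four bits")
    return previous_frame_address | return_type
```

Because only 28 address bits are stored, the previous frame address always has its four low bits clear, which is what frees those bits for the return-type field.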
587. structures from ROM into RAM. The processor is then reinitialized with a new PRCB, which contains the base addresses of the new data structures in RAM. Reinitialization is required to relocate any of several data structures, since the processor caches the pointers to the structures. The processor caches the following pointers during its initialization:
• Interrupt Table Address
• System Procedure Table Address
• Supervisor Stack Pointer
• Interrupt Stack Pointer
• Fault Table Address
• Control Table Address
• PRCB Address
14.3.2 Initialization Flow
This section summarizes initialization by presenting a flow of the steps that the processor takes during initialization (Figure 14-4). The entry point for reinitialization is also shown.
Flowchart box labels (flattened): Hardware Reset (reset state); Software Reset (executing program); RESET asserted? (YES/NO); SYSCTL reinitialize? (YES/NO); Assert FAIL pin; Get PRCB pointer and start IP from SYSCTL operands; enable faults; Process PRCB: cache data structure pointers, read configuration words and configure processor; asserted on rising edge of reset? (YES/NO); Cache NMI vector from vector location 248 in interrupt table; Perform internal self-test; STOP; Cache supervisor stack pointer from offset 12 in system procedure table; Internal self-test pass? (YES/NO); Deassert FAIL pin; FP interrupt stack pointer SP
588. t     Literal of the range 0-31
sfr    Special Function Register (sf0-sf2)
disp   Signed displacement of range [...]
mem    Address defined with the full range of addressing modes
NOTE: For future implementations, the i960 architecture will allow up to 32 Special Function Registers (SFRs). However, sf0, sf1 and sf2 are the only SFRs implemented on the i960 Cx processors.
In some cases, a third line is added to show register or memory location contents. For example, it may be useful to know that a register is to contain an address. The notation used in this line is as follows:
addr   Address
efa    Effective Address
9.2.4 Description
The Description section is a narrative description of the instruction's function and operands. It also gives programming hints when appropriate.
9.2.5 Action
The Action section gives an algorithm, written in a pseudo-code, that describes direct effects and possible side effects of executing an instruction. Algorithms document the instruction's net effect on the programming environment; they do not necessarily describe how the processor actually implements the instruction. For example, shli requires seven lines of pseudo-code to completely describe its function. Although it might appear from the algorithm that the instruction should take multiple clocks to execute, the i960 Cx processors execute the instruction in a single clock. The following is an example of the ac
589. t completed; cc busy: AC register condition codes are not valid. Correct branch prediction eliminates dead clocks due to condition code dependencies.
A.2.3.2 Resource Scoreboarding
A scoreboarded resource also defeats the scheduler's attempt to issue an instruction. A resource is scoreboarded when it is needed to execute the instruction but is not available. The parallel processing units are the resources. Table A-4 lists cases which cause an instruction to be delayed due to a scoreboarded resource. Text that follows the table describes what happens to an instruction once it is issued to a processing unit.
A.2.3.3 Prevention of Pipeline Stalls
To maintain the logical intent of the sequential instruction stream, the i960 Cx processors implement register scoreboarding and register bypassing. Examples of each are demonstrated in the descriptions and examples in this appendix. These mechanisms eliminate possible pipeline stalls due to parallel register access dependencies. It is not necessary to perform any code optimizations to take advantage of this parallel support hardware. Register scoreboarding maintains register coherency by preventing parallel execution units from accessing registers for which there is an outstanding operation. When the IS issues an instruction which requires multiple clocks to return a result, the instruction's destination register is locked to further accesses until it is updated. To manage this destinat
590. t muli can signal an integer overflow.
Action: dst <- src2 * src1 (src1, src2 and dst are 32 bits)
Faults:
  Type Mismatch: Non-supervisor reference of a sfr.
  Arithmetic Integer Overflow: Result is too large for destination register (muli only). If overflow occurs and AC.om = 1, the fault is suppressed and AC.of is set to 1. Result's least significant 32 bits are stored in dst.
Example: muli r3, r4, r9    # r9 <- r4 TIMES r3
Opcode: muli 741H REG; mulo 701H REG
See Also: emul, ediv, divi, divo

9.3.39 nand
Mnemonic: nand (Nand)
Format: nand src1, src2, dst
        (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Performs a bitwise NAND operation on the src2 and src1 values and stores the result in dst.
Action: dst <- not (src2 and src1)
Faults: Type Mismatch: Non-supervisor reference of a sfr.
Example: nand g5, r3, r7    # r7 <- r3 NAND g5
Opcode: nand 58EH REG
See Also: and, andnot, nor, not, notand, notor, or, ornot, xnor, xor

9.3.40 nor
Mnemonic: nor (Nor)
Format: nor src1, src2, dst
        (src1: reg/lit/sfr; src2: reg/lit/sfr; dst: reg/sfr)
Description: Performs a bitwise NOR operation on the src2 and src1 values and stores the result in dst.
Action: dst <- not (src2 or src1)
Faults: Type Mismatch: Non-supervisor reference of a sfr.
Example: nor g8, 28, r5    # r5 <- 28 NOR g8
Opcode: nor 588H REG
See Also: and, andnot, nand, not, notand, notor, or, ornot, xnor, x
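The Action lines for nand, nor and muli can be checked against a small Python model of 32-bit registers. This is a sketch under stated assumptions: the function names mirror the mnemonics for readability only, muli is modeled as returning the low 32 bits of the product together with an overflow flag, and fault delivery and the AC.om/AC.of bits are not modeled.

```python
MASK32 = 0xFFFFFFFF

def nand(src1: int, src2: int) -> int:
    """dst <- not (src2 and src1), truncated to 32 bits."""
    return ~(src2 & src1) & MASK32

def nor(src1: int, src2: int) -> int:
    """dst <- not (src2 or src1), truncated to 32 bits."""
    return ~(src2 | src1) & MASK32

def muli(src1: int, src2: int):
    """Signed multiply: returns (low 32 bits of the product, overflow flag).

    Inputs are ordinary Python integers standing in for signed 32-bit values.
    """
    product = src2 * src1
    overflow = not (-2**31 <= product <= 2**31 - 1)
    return product & MASK32, overflow
```

The overflow case shows the behavior the text describes: the least significant 32 bits of the result are still produced even when the full product does not fit in the destination.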
591. t of 0 (non-chaining) or a null chaining pointer reaching 0 (chaining).
• If only the done flag is set for the channel, DMA has ended because of an active EOP3:0 input. For source/destination chained DMAs, an interrupt is generated by asserting EOP3:0 to terminate the current chaining buffer.
NOTE: An interrupt is generated when EOP3:0 is asserted or when a buffer transfer is complete and the interrupt-on-buffer-complete mode is enabled. There is no way in software to distinguish between these two interrupt sources. If this distinction is necessary, the EOP3:0 pin may be connected to a dedicated external interrupt source.
A DMA operation can be suspended at any time by clearing the DMAC register channel enable bit. It may be necessary to synchronize software to the completion of a channel's bus activity after the enable bit is cleared. This is accomplished by polling the DMA channel active bit, as shown in the following assembly code segment:
        clrbit  0, sf2, sf2     # disable channel 0
self:   bbs     4, sf2, self    # wait for channel activity to complete
DMA operation is restarted by setting the channel enable bit. A channel may be suspended to allow a section of time-critical user code to execute with the maximum core and bus resources available. To reduce interrupt latency, all DMAs can be suspended when an interrupt is serviced. This option is set in the Interrupt Control (ICON) register. When
592. ta are passed to the bus sequencer when the external bus is available. The sequencer then breaks the request into a set of bus accesses; this generates the signals on the external bus pins.
CHAPTER 11 EXTERNAL BUS DESCRIPTION
This chapter discusses the bus pins, bus transactions and bus arbitration. It shows waveforms to illustrate some common bus configurations. This chapter serves as a guide for the hardware designer when interfacing memory and peripherals to the i960 Cx processors. For further details on external bus operation, refer to APPENDIX B, BUS INTERFACE EXAMPLES. For information on bus controller configuration, refer to CHAPTER 10, THE BUS CONTROLLER. For pin descriptions, refer to the 80960CA and CF data sheets.
11.1 OVERVIEW
The i960 Cx processors' integrated bus controller and external bus provide a flexible, easy-to-use interface to memory and peripherals. All bus transactions are synchronized with the processor clock outputs (PCLK2:1); therefore, most memory system control logic can be implemented as state machines. The internal programmable wait state generator, external ready control signals, bus arbitration signals, data transceiver control signals and programmable bus width parameters all combine to reduce system component count and ease the design task.
11.1.1 Terminology: Requests and Accesses
The terms request and access are used frequently when referring to
593. tablished measure of interrupt service latency, in units of seconds, is derived with the following equation:

$$\text{Interrupt Service Latency (in seconds)} = \frac{N_{L\_int}}{f_C} \qquad \text{(Equation 12-1)}$$

where:
f_C = PCLK2:1 frequency (Hz)
N_L_int = number of PCLK2:1 cycles

For real-time applications, worst-case interrupt latency must be considered for critical handling of external events. For example, an interrupt from a FIFO buffer may need service to prevent the FIFO from an overrun condition. For many applications, typical interrupt latency must be considered in determining overall system performance. For example, a timer interrupt may frequently trigger a task switch in a multi-tasking kernel.

The flowchart in Figure 12-9 can be used to determine worst-case interrupt latency. Flowchart values are based on the assumption that the interrupt controller is configured in the following way:
• Hardware interrupt: an interrupt is requested (XINT7:0 pins or NMI).
• Fast sample mode: fast sample mode is selected (ICON.sm = 1).
• Cached interrupt vector: the interrupt vector is fetched from internal data RAM. This is automatic for the NMI vector or is selected in the ICON register (ICON.vce = 1).
• Cached interrupt handler: cache hit for the interrupt call target.
• DMA suspended on interrupt: DMA suspend on interrupt is enabled (ICON.dmas = 1).
• Minimum bus latency: all memory is configured as zero wait state and burst access mode.

NOTE: The worst
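Equation 12-1 is a plain division of a cycle count by the clock frequency. The following is a minimal sketch of that arithmetic in Python; the function name and the sample cycle count and frequency are hypothetical values chosen for illustration, not figures taken from the flowchart.

```python
def interrupt_latency_seconds(n_l_int, f_c_hz):
    """Equation 12-1: latency in seconds = N_L_int / f_C.

    n_l_int -- number of PCLK2:1 cycles (hypothetical input here)
    f_c_hz  -- PCLK2:1 frequency in Hz
    """
    if f_c_hz <= 0:
        raise ValueError("PCLK2:1 frequency must be positive")
    return n_l_int / f_c_hz


# Hypothetical example: 33 cycles at an assumed 33 MHz clock.
latency = interrupt_latency_seconds(33, 33_000_000)
print(f"{latency * 1e6:.3f} microseconds")
```

The same function can be applied to each path through the worst-case flowchart once its cycle count is known.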
594. tants into registers. Remember to use addressing modes that the AGU executes directly (machine type M, not µ). Table A-20 lists several conversions that can move an instruction to the AGU from either the EU or MDU. Example A-5 exploits the lda instruction to increase a 3x3 lowpass filter's performance by approximately 30 percent.

Table A-20. Creative Uses for the lda Instruction

Operation                          Description              Equivalent lda instruction
addo 5, g0, g1                     constant addition        lda 5(g0), g1
shlo 2, g1, g2                     shifts by a constant     lda [g1*4], g2
mov 31, g0                         constant load            lda 31, g0
shlo 2, g1, g2 ; addo 5, g2, g2    shift-add combination    lda 5[g1*4], g2
mov g0, g1                         register move            lda (g0), g1

Example A-5. Change the Type of Instruction Used (3x3 Lowpass Mask)

Y = Σ X · M, with the mask

    1/16  2/16  1/16
    2/16  4/16  2/16
    1/16  2/16  1/16

Initial values (first listing):
g0 points to 0,0
g1 points to Y 1,1
g2 contains imax
r4 load temp
r5 accumulator
r6 imax i count temp
r7 jmax 3 count temp
r8 imax 1, new mask row offset
r9 2 imax 2, new i offset
r10 is 2 imax 1, new j of

Initial values (second listing):
g0 points to 0,0
g1 points to 1,0
g2 contains imax
r4 load temp
r5 accumulator
r6 imax i count temp
r7 j count temp
r8 imax 1, new mask row offset
r9 2 imax 2, new i offset
r10 is 2 imax 1, new j of
595. tations. The i960 Cx processors use an initialization boot record (IBR) and a process control block to hold initial configuration and a first instruction pointer.

C.7 INTERRUPTS

The i960 architecture defines the interrupt servicing mechanism. This includes priority definition, interrupt table structure and interrupt context switching, which occurs when an interrupt is serviced. The core architecture does not define the means for requesting interrupts (external pins, software, etc.) or for posting interrupts (i.e., saving pending interrupts).

The method for requesting interrupts depends on the implementation. The i960 Cx processors have an interrupt controller that manages nine external interrupt pins and four internal DMA sources. The organization of these pins and the registers of the interrupt controller are implementation specific. Code which configures the interrupt controller is not directly portable to other i960 implementations.

On the i960 Cx processors, interrupts may also be requested in software with the sysctl instruction. This instruction and the software request mechanism are implementation specific.

Posting interrupts is also implementation specific. Different implementations may optimize interrupt posting according to interrupt type and interrupt controller configuration. A pending priorities and pending interrupts field is provided in the interrupt table for interrupt posting. However, the i960 Cx processors post hardware reques
596. te, short-word and word request lengths. Quad-word DMA transfers require that source and destination request lengths equal quad-word; therefore, data assembly and disassembly are not applicable to this DMA mode.

Figure 13-4 shows a typical demand mode configuration in which an 8-bit device is the source requestor for a DMA and 32-bit memory is the destination. If byte source and word destination request length is selected for this DMA, data from four source requests is buffered before a load to the 32-bit memory is executed. This configuration represents an optimal use of bus resources for a DMA between an 8-bit device and 32-bit memory.

[Figure 13-4. Byte to Word Assembly: an 8-bit source device (with DREQx and DACKx) and 32-bit destination memory on the external bus of the i960 CA/CF microprocessor; four byte loads from the source fill a buffer, followed by one word store to the destination.]

Microcode algorithms which perform assembly and disassembly are less efficient than algorithms which perform transfers between source and destination with equal request lengths. DMA controller assembly and disassembly is provided for convenience and for most efficient external bus usage. For example, the system shown in Figure 13-4 functions the same when source and destination request lengths are both byte long. In this case, each transfer is performed with a byte
597. te boundaries; words on 4-byte boundaries; short words (half words) on 2-byte boundaries; bytes on 1-byte boundaries. Unaligned bus requests do not occur on these natural boundaries. Any unaligned bus request to a little endian memory region is executed; however, unaligned requests to big endian regions are supported only if software adheres to particular address alignment restrictions.

The processor handles all unaligned bus requests to little endian memory regions. It executes unaligned little endian requests as several aligned requests. This method of handling an unaligned bus request results in some performance loss compared to aligned requests: microcode uses CPU cycles to generate aligned requests, and more bus cycles are used to transfer unaligned data.

The processor may generate an operation-unaligned fault when any unaligned request is encountered. This fault can be masked with the PRCB fault configuration word.

When the processor encounters an unaligned request, microcode breaks the unaligned request into a series of aligned requests. For example, if a read request is issued to read a little endian word from address XXXX XXX1H (unaligned), a byte request followed by a short request followed by a byte request is executed. Figure 10-4 and Figure 10-5 show how aligned and unaligned bus transfers are carried out for memory regions that use little endian byte ordering. If the unaligned fault is not masked
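The byte, short, byte sequence described above for a word read at address XXXX XXX1H can be reproduced with a simple greedy rule: at each address, issue the largest naturally aligned request that still fits. The following Python sketch is only an illustration of that rule, not the processor's microcode. The function name is hypothetical, the largest single request is assumed to be one word, and burst accesses are not modeled.

```python
def split_unaligned(address, length):
    """Break a request into naturally aligned byte/short/word requests.

    Returns a list of (address, size_in_bytes) tuples. Assumption: the
    largest single request is a 4-byte word; bursts are not modeled.
    """
    requests = []
    while length > 0:
        # Try word, then short, then byte; a byte request always fits.
        for size in (4, 2, 1):
            if address % size == 0 and size <= length:
                break
        requests.append((address, size))
        address += size
        length -= size
    return requests


# Word read from an address ending in 1H: byte, short, byte.
for addr, size in split_unaligned(0x1001, 4):
    print(hex(addr), size)
```

Under the same rule, a double-word access at byte offset 1 becomes byte, short, word, byte, which agrees with the sequence listed for that case in Figure 10-4.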
598. te requests to pipelined memory regions.

The processor asserts the WAIT signal when N_WAD, N_RDD or N_WDD wait states are inserted. WAIT can be used as a read or write strobe for the external memory system.

Wait states can also be controlled with READY and BTERM. These inputs are enabled or disabled in a region by programming the memory region configuration table. Refer to section 10.2.3, Wait States (pg. 10-3) for details on setting up the bus controller for wait states.

When enabled, READY indicates to the processor that read data on the bus is valid or that a write data transfer has completed. The READY pin value is ignored until the N_RAD, N_RDD, N_WAD and N_WDD wait states expire. At this time, if READY is deasserted (high), wait states continue to be inserted until READY is asserted (low). N_XDA wait states cannot be extended by READY. The READY input is ignored during the idle cycles, the address cycle and N_XDA cycles. READY is also ignored in memory regions where pipelining is enabled, regardless of memory region programming. For proper operation, the READY inputs should be disabled in regions that have pipelining enabled.

The burst terminate signal (BTERM) breaks up a burst access. Asserting BTERM (low) for one clock cycle completes the current data transfer and invokes another address cycle. This allows a burst access to be dynamically broken into smaller accesses. The resulting accesses may also be burst acce
599. te to implementation-specific features. For example, the i960 Cx microprocessors provide an operation-unaligned fault for detecting non-aligned memory accesses. Future i960 processor implementations which generate this fault are expected to assign the same fault type and subtype number to the fault.

C.9 BREAKPOINTS

Breakpoint registers are not defined in the i960 architecture.

C.10 LOCK PIN

The LOCK pin is not defined in the i960 architecture. Bus control logic and protocol associated with this pin may vary among i960 processor implementations.

C.10.1 External System Requirements

External system requirements are not defined by the architecture. The external bus, RESET pin, clock input and output, power and ground requirements, testability features and characteristics are all specific to the i960 microprocessor implementation.

APPENDIX D MACHINE-LEVEL INSTRUCTION FORMATS

This appendix describes the encoding format for instructions used by the i960® processors. Included is a description of the four instruction formats and how the addressing modes relate to these formats. Refer also to APPENDIX E, MACHINE LANGUAGE INSTRUCTION REFERENCE.

D.1 GENERAL INSTRUCTION FORMAT

The i960 architecture defines four basic instruction encoding formats, as shown in Figure D-1: REG, COBR, CTRL and MEM. Each instruction uses one of these formats, which is defined by the ins
600. ted interrupts internally in the IPND register instead. Code which requests interrupts by setting bits in the pending priorities and pending interrupts field of the interrupt table is not portable. Also, application code which expects interrupts to be posted in the interrupt table is not object-code portable to all i960-based products.

The i960 Cx processors do not store a 16-byte resumption record for suspended instructions in the interrupt or fault record. Portable programs must tolerate interrupt stack frames with and without these resumption records.

C.8 OTHER i960 CA/CF PROCESSOR IMPLEMENTATION-SPECIFIC FEATURES

Subsections that follow describe additional implementation-specific features of the i960 Cx processors. These features do not relate directly to application code portability.

C.8.1 Data Control Peripheral Units

The DMA controller, bus controller and interrupt controller are implementation-specific extensions to the core architecture. Operation, setup and control of these units is not a part of the core architecture. Other implementations of the i960 architecture are free to augment or modify such system integration features.

C.8.2 Fault Implementation

The architecture defines a subset of fault types and subtypes which apply to all implementations of the architecture. Other fault types and subtypes may be defined by implementations to detect errant conditions which rela
601. ter 2 18 Example Application of the User Supervisor Protection Model 2 22 Data Types and enne 3 1 Data Placement in 3 5 Machine Level Instruction 4 3 Source Operands for 4 20 Procedure Stack Structure and Local 5 3 c M 5 7 ie ER Der Dei E UR a EOD ER EDS 5 8 System Procedure 5 13 Previous Frame Pointer Register PFP 0 4 5 16 Interrupt Handling Data 6 2 Interrupt Tables rte een e 6 4 Storage of an Interrupt Record on the Interrupt 6 10 Flowchart for Worst Case Interrupt Latency 6 14 Fault Handling Data Structures 7 1 Fault Table and Fault Table Entries 7 5 Fault Recordi xiii E REED e 7 7 Storage of the Fault Record on the Stack 7 8 Fault Record for Parallel 7 11 Trace Controls TC Register 400
602. the processor continues executing instructions until each execution unit instruction and all out-of-order instructions are executed. For example, if an integer overflow occurs during the addition in the following code example, the fault is detected before the multiply has completed execution. Before invoking the integer overflow fault handling procedure, the processor waits for the multiply to complete.

        muli    g2, g4, g6
        addi    g8, g9, g10     # results in integer overflow

7.6.6 Faults in Multiple Parallel Instructions

When executing instructions in parallel, it is possible for faults to occur in more than one currently executing instruction. In the code sequence above, for example, an integer overflow fault could occur for both the muli and addi instructions, with the fault from the addi instruction being recognized by the processor first.

To report multiple parallel faults, the architecture provides the parallel fault type. In these parallel fault situations, the processor saves the fault type and subtype of the second and subsequent faults detected in the optional data field of the fault record. The fault handling procedure for parallel faults can then analyze the fault record and handle the faults. The fault record for parallel faults is described in the next section.

The existence of multiple parallel faults is often catastrophic. Multiple parallel faults are generated as imprecise faults, which means that recovery from the faults is
603. the Fault Occurred (pg. 7-13).

7.7.1 Possible Fault Handling Procedure Actions

The processor allows easy recovery from many faults that occur. When fault recovery is possible, the processor's fault handling mechanism allows the processor to automatically resume work on the program or interrupt pending when the fault occurred. Resumption is initiated with a ret instruction in the fault handling procedure.

If recovery from the fault is not possible or not desirable, the fault handling procedure can take one of the following actions, depending on the nature and severity of the fault condition (or conditions, in the case of multiple faults):
• Return to a point in the program or interrupt code other than the point of the fault.
• Call a debug monitor.
• Explicitly write the processor state and fault record into memory and perform processor or system shutdown.
• Perform processor or system shutdown without explicitly saving the processor state or fault information.

When working with the processor at the development level, a common fault handling procedure action is to save the fault and processor state information and make a call to a debugging device such as a debugging monitor. This device can then be used to analyze the fault information.

7.7.2 Program Resumption Following a Fault

Because of the i960 Cx processors' multi-stage execution pipeline, faults can occur before execution of the faulting instruction (i.e., the instruction
604. the bus controller executes the unaligned access the same as it does when the fault is masked and signals an operation-unaligned fault. The unaligned access fault can be used as a debug feature. Removing unaligned memory accesses from an application increases performance.

NOTE: When an unsupported unaligned bus request to a big endian region is attempted, the bus controller handles the transfer exactly the same as it does for little endian regions; that is, it treats the data as little endian data. Thus, the data is not stored coherently in memory.

[Figure 10-4. Summary of Aligned-Unaligned Transfers for Little Endian Regions: for short-word, word and double-word loads and stores at byte offsets 0 to 3, the figure shows which transfers are single aligned requests (one double-word burst in the aligned double-word case) and which are carried out as sequences of byte, short and word requests.]

[Figure residue: byte offsets 12 to 24, word offsets 0 to 6; one three-word request (aligned)
605. the desired number of wait states, eight bits of data are transferred. A peripheral device is accessed as described above regardless of which bus request type is issued. For example, if a program includes a ld (word load) instruction from the peripheral, the load is executed as four 8-bit accesses to the peripheral.

11.2 BUS OPERATION

As described in Table 11-1, the i960 Cx processor bus consists of 30 address signals, four byte enables, 32 data lines and various control and status signals. Some signals are referred to as status signals. A status signal is valid for the duration of a bus request. Other signals are referred to as control signals. Control signals are used to define and manage a bus request. This chapter defines the bus pins and pin function.

Table 11-1. Bus Controller Pins

Pin Name      Description                  Input/Output
PCLK2:1       Processor Output Clocks      O
D31:0         Data Bus                     I/O
A31:2         Address Bus                  O
Control Signals
BE3:0         Byte Enables                 O
ADS           Address Strobe               O
WAIT          Wait States                  O
BLAST         Burst Last                   O
READY         Memory Ready                 I
BTERM         Burst Terminate              I
DEN           Data Enable                  O
Status Signals
W/R           Write/Read                   O
DT/R          Data Transmit/Receive        O
D/C           Data/Code Request            O
DMA           DMA Request                  O
SUP           Supervisor Mode Request      O
Bus Arbitration
HOLD          Hold Request                 I
HOLDA         Hold Acknowledge             O
LOCK          Locked Request               O
606. the option is selected, all DMA operations are suspended during the time that the core processes the interrupt context switch. DMAs are restarted before the interrupt procedure's first instruction is encountered. This option reduces interrupt latency by providing full processor resources to the interrupt context switch.

DMA operations can be suspended by user code in an interrupt procedure to increase procedure throughput. This is accomplished by clearing the DMAC register channel enable field. See section 13.10.1, DMA Command Register (DMAC) (pg. 13-21). The interrupt procedure should re-enable all suspended channels before returning.

Issuing sdma for an active channel causes the current DMA transfer to abort. The current DMA operation is terminated and the channel is set up with the newly issued sdma instruction. Do not terminate a DMA operation with sdma; this instruction causes a non-graceful termination of a DMA transfer. In other words, the transfer may be aborted between a source and destination access, potentially losing part of the source data. Additionally, status information for the terminated DMA is lost when the new sdma instruction reconfigures the channel. The channel done bit is not set when sdma terminates a DMA.

13.9 CHANNEL PRIORITY

Each DMA channel is assigned a priority. When more than one DMA channel is enabled, channel priority determines the order in which transfers execute for each
607. ther i960 processor family members may provide extensions that recognize additional fault conditions. Fault type and subtype encoding allows all faults to be included in the fault table: those which are common to all i960 processors and those which are specific to one or more family members. The fault types are used consistently for all family members. For example, Fault Type 4 is reserved for floating point faults. Any i960 processor with floating point operations uses Entry 4 to store the pointer to the floating point fault handling procedure.

7.3 FAULT TABLE

The fault table (Figure 7-2) is the processor's pathway to the fault handling procedures. It can be located anywhere in the address space. The processor obtains a pointer to the fault table during initialization.

The fault table contains one entry for each fault type. When a fault occurs, the processor uses the fault type to select an entry in the fault table. From this entry, the processor obtains a pointer to the fault handling procedure for the type of fault that occurred. Once called, a fault handling procedure has the option of reading the fault subtype or subtypes from the fault record when determining the appropriate fault recovery action.

[Figure 7-2. Fault Table: entries include the Parallel, Trace, Operation, Arithmetic, Constraint, Protection and Type fault entries; reserved fields are initialized to 0.]
608. times and low cost per bit. DRAM is available in a wide variety of packages, making it easy to pack a lot of memory into a small space. DRAM features described here are provided as general information. See specific data sheets for detailed information.

The i960 Cx processors' burst-mode bus is well suited to the high-speed multiple-column access modes found in DRAM. Nibble, fast page and static column modes of DRAM can easily be exploited to improve i960 Cx processor memory system performance.

DRAMs have a multiplexed address bus, a write enable input (WE) and two address strobes: row address strobe (RAS) and column address strobe (CAS). Some DRAMs also have an output enable input (OE). DRAMs are accessed by placing a valid row address on the address input pins and asserting RAS; then the column address is driven onto the DRAM address pins and CAS is asserted. The write enable (WE) input on the DRAM determines whether the access is a read or write. The output enable input (OE), found on some DRAMs, controls the DRAM output buffers and can be useful for multibanked and interleaved designs.

B.3.1 DRAM Access Modes

The modes discussed in the following subsections are:
• section B.3.1.1, Nibble Mode DRAM (pg. B-16)
• section B.3.1.2, Fast Page Mode DRAM (pg. B-17)
• section B.3.1.3, Static Column Mode DRAM (pg. B-18)

B.3.1.1 Nibble Mode DRAM

Nibble mode DRAM (Figure B-11
609. tion Unit EU iae A 20 A 2 4 2 Multiply Divide Unit A 22 A 2 4 3 Data RAM DR tette tette tete emt ie dt te etel A 24 A 2 4 4 Address Generation Unit A 25 2 4 5 Effective Address efa Calculations 2 A 26 A 2 4 6 Bus Control Unit BCU Ret ie ed ertet A 26 A 2 4 7 Conttrol Pipeline etri re E ed EY A 28 A 2 4 8 Unconditional Branches 2 28 2 4 9 Conditional Branches mesinin n ee eee reete A 32 A 2 5 Instruction Cache And Fetch Execution A 33 A 2 5 1 Instruction Cache Organization A 33 A 2 5 2 Fetch Strategy eee iterom eei dio e ER 34 A 2 5 3 Fetch Latency eiecit D eR Euer dn laden diate 34 2 5 4 Cache Replacement 4 A 36 A 2 6 S 36 A 2 6 1 Invocation and 37 xiii 5 intel 3 A 2 6 2 Data Movement entere rib e iin A A 38 A 2 6 3 Bit and Bit Field erronee recette Mte inde A 39 A 2 6 4 Comparison iau atte ate Ae uetus A 40 A 2 6 5 Branch uh aureae PARE PER e TREE MERI DRE EVER A 40 A 2 6 6 Call and Return eec pit ete b e ie e ettet A 41 A 2 6 7 Conditional Faults 5 5 pen OR Ear diete 42 2 6 8 DO DUG ccm A 42 A 2 6 9 ATOMIC 2 avian RE e a E p A 42 A 2 6 10 Processor Management aye ete Ue e
610. tion algorithm for the alterbit instruction:

if (AC.cc and 010₂) = 0
    dst ← src andnot 2^(bitpos mod 32)
else
    dst ← src or 2^(bitpos mod 32)

Here 2^(bitpos mod 32) denotes 2 raised to the power (bitpos mod 32).

Table 9-1 defines each abbreviation used in the instruction reference pseudo-code. Table 9-2 explains the symbols used in the pseudo-code.

Since special function registers (sfr) may change independent of instruction execution, the following distinctions are important when interpreting the algorithm of any instruction which references a sfr:
1. When a source operand is a sfr and is referenced more than once in an algorithm, the operand's value at every reference is the same as at the first reference. In other words, the instruction operates as if the sfr was actually read only once at the beginning of the instruction.
2. When the same sfr is specified as the source for multiple operands of the same instruction, the instruction operates as if the source sfr was actually read only once at the beginning of the instruction. When either source operand appears in the action algorithm, the single operand value is used.
3. When a sfr is specified as a destination and the algorithm indicates more than one modification of the destination, the instruction operates as if the sfr were written only once at the end of the instruction.

Table 9-1. Abbreviations in Pseudo-code

Arithm
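The alterbit action algorithm quoted at the start of this passage can be modeled in a few lines. The Python sketch below is illustrative only: it assumes that the tested condition is bit 1 of the AC condition code (the 010 binary pattern) and that registers are 32 bits wide; the function and constant names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit values


def alterbit(bitpos, src, ac_cc):
    """Model of the alterbit action algorithm.

    Assumption: the instruction tests bit 1 of the 3-bit AC condition
    code. If that bit is 0 the selected bit is cleared in the result,
    otherwise it is set.
    """
    bit = 1 << (bitpos % 32)          # 2^(bitpos mod 32)
    if (ac_cc & 0b010) == 0:
        return src & ~bit & MASK32    # dst <- src andnot bit
    return (src | bit) & MASK32       # dst <- src or bit


print(hex(alterbit(3, 0xFF, 0b000)))  # bit 3 cleared
print(hex(alterbit(3, 0x00, 0b010)))  # bit 3 set
```

Because bitpos is reduced modulo 32, a bit position of 35 selects the same bit as a bit position of 3.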
611. tion at new IP
balx:   dst ← IP + inst_length    (instruction length is 4 or 8 bytes)
        IP ← targ                 (resume execution at the new IP)

Faults:
Trace        Instruction and Branch Trace Events are signaled after instruction completion. A Trace fault is generated if PC.te = 1 and TC.i or TC.br = 1.
Operation    Unimplemented: execution from on-chip data RAM.
             Operand: invalid operand value encountered.
             Opcode: invalid operand encoding encountered.

Example:
        bal     xyz             # IP ← xyz
        balx    (g2), g4        # IP ← (g2); address of return instruction
                                # is stored in g4; example of indirect addressing

Opcode:
bal     0BH     CTRL
balx    85H     MEM

See Also: b, bx, BRANCH IF, COMPARE AND BRANCH, bbc, bbs

9.3.9 bbc, bbs

Mnemonic:
bbc{.t|.f}    Check Bit and Branch If Clear
bbs{.t|.f}    Check Bit and Branch If Set

Format:
bb*{.t|.f}    bitpos,    src,       targ
              reg/lit    reg/sfr    disp

Description: Checks the bit in src designated by bitpos and sets the AC register condition code according to the src value. The processor then performs a conditional branch to the instruction specified with targ, based on the condition code state. An optional .t or .f suffix may be appended to the mnemonic. Use .t to speed up execution when these instructions usually take the branch; use .f to speed up execution when these instructions usually do not take the branch. If a suffix is not provided, as
612. tion tracing is enabled, an instruction trace fault condition is detected on each instruction that is executed, along with other trace fault conditions that are enabled (e.g., a call trace fault or a branch trace fault). The processor generates a trace fault after each instruction and sets the appropriate bit or bits in the fault subtype field to indicate the instruction trace fault and any other trace fault subtypes that occurred.

7.6.3 Multiple Trace Fault Conditions with Other Fault Conditions

The execution of a single instruction can create one or more trace fault conditions in addition to multiple non-trace fault conditions. When this occurs, the processor generates at least two faults: a non-trace fault and a trace fault. The non-trace fault is handled first, and the trace fault is triggered immediately after executing the return instruction (ret) at the end of the non-trace fault handler.

7.6.4 Parallel Faults

The i960 Cx processors exploit the architecture's tolerance of parallel and out-of-order instruction execution by issuing instructions to multiple independent execution units on the chip. The following subsections describe how the processor handles faults in this environment.

7.6.5 Faults in One Parallel Instruction

When a fault occurs during the execution of a particular instruction, it is not possible to suspend other instructions that are already executing in other execution units. To handle the fault
613. tion with a mnemonic suffix of .t for true and .f for false.

4.2.7.1 Unconditional Branch

These instructions are used for unconditional branching:

b       Branch
bx      Branch Extended
bal     Branch and Link
balx    Branch and Link Extended

b and bal use the CTRL format. bx and balx use the MEM format and can specify local or global registers as operands.

b and bx cause program execution to jump to the specified target IP. These two instructions perform the same function; however, their determination of the target IP differs. The target IP of a b instruction is specified at link time as a relative displacement from the current IP. The target IP of the bx instruction is the absolute address resulting from the instruction's use of a memory addressing mode during execution.

bal and balx store the next instruction's address in a specified register, then jump to the specified target IP. For bal, the RIP is automatically stored in register g14; for balx, the RIP location is specified with an instruction operand.

As described in section 5.9, BRANCH-AND-LINK (pg. 5-18), branch-and-link instructions provide a method of performing procedure calls that do not use the processor's integrated call/return mechanism. Here, the saved instruction address is used as a return IP. Branch-and-link is generally used to call leaf procedures (that is, procedures that do not call other procedures). bx and balx can make use of any memory addressing mode. INSTRUC
614. to larger loops which fill the cache, use more registers and pipeline their memory operations. The strategy is to begin accessing the memory system as soon as the routine is entered and to make the best use of the bus. Less bus bandwidth is used for the same operations if the algorithm is implemented with quad loads and/or stores. The large register set allows an unrolled loop to have multiple sets of working temporary registers for operations in various stages. For example, the previous checksum example is repeated in Example A-3. The loop is unrolled to perform checksums nearly twice as fast as the simple loop.

Example A-3. Unrolling Loops (Checksum)

Simple loop:

        # initialize
loop:
        ldob    (g0), g1
        addo    g1, g2, g2
        cmpinco g0, g3, g0
        bl.t    loop
        ret

Unrolled loop:

        # initialize
opt_loop:
        ldob    (g0), g1
        cmpinco g0, g3, g0
        addo    g4, g2, g2
        bge.f   exit1
        ldob    (g0), g4
        cmpinco g0, g3, g0
        addo    g1, g2, g2
        bl.t    opt_loop
exit2:  addo    g4, g2, g2
        ret
exit1:  addo    g1, g2, g2
        ret

Execution, simple loop:

Clock   REGop     MEMop     CTRLop
1                 ldob
2
3
4       addo                bl.t
5       cmpinco
6                 ldob

Execution, unrolled loop:

Clock   REGop     MEMop     CTRLop
1                 ldob g1
2       cmpinco             bge.f
3       addo g4
4                 ldob g4
5       cmpinco             bl.t
6       addo g1
7                 ldob g1

A.2.7.5 Enabling Constant Parallel Issue

As described in section A.2.1, Parallel Issue (pg.
615. tor the status of the four channels.
• Arbitrates requests between multiple DMA channels by managing channel priority.
• Produces a DMA event which causes DMA microcode to execute.

[Figure 13-16. DMA and User Requests in the Bus Queue: the user program and the DMA issue requests to a request queue, from which user program requests and DMA requests are serviced.]

13.11.10 Performance

DMA performance is characterized by two values: throughput and latency (Figure 13-17). Throughput measurement is needed as a measure of the DMA transfer bandwidth. Worst-case latency is required to determine if the DMA is fast enough in responding to transfer requests from DMA devices.

Throughput describes how fast data is moved by DMA operations. In this discussion, throughput is the measure of how often DMA requests are serviced. DMA throughput, denoted as N_TREQ, is measured in PCLK2:1 clocks per DMA request. As Figure 13-17 shows, N_TREQ is the time measured between adjacent assertions of DACK3:0. The established measure of throughput, in units of bytes/second, is derived with the following equation:

$$\text{Throughput (bytes/second)} = \frac{B_{REQ} \times f_C}{N_{TREQ}} \qquad \text{(Equation 13-1)}$$

where:
N_TREQ = throughput, in clocks per DMA request (PCLK2:1 cycles)
B_REQ = bytes per DMA request
f_C = PCLK2:1 frequency

Latency is defined as the max
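Equation 13-1 converts a per-request clock count into a bandwidth figure. The following Python sketch works one case through; the function name and the sample request size, clock frequency and clock count are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def dma_throughput_bytes_per_second(b_req, f_c_hz, n_treq):
    """Equation 13-1: throughput = (B_REQ * f_C) / N_TREQ.

    b_req  -- bytes moved per DMA request
    f_c_hz -- PCLK2:1 frequency in Hz
    n_treq -- PCLK2:1 clocks between adjacent DMA requests
    """
    if n_treq <= 0:
        raise ValueError("clocks per DMA request must be positive")
    return b_req * f_c_hz / n_treq


# Hypothetical example: word (4-byte) requests, an assumed 32 MHz clock,
# and 8 clocks between adjacent requests.
rate = dma_throughput_bytes_per_second(4, 32_000_000, 8)
print(f"{rate / 1e6:.1f} Mbytes/second")
```

Halving the clocks per request, or doubling the bytes per request, doubles the result, which is why larger request lengths improve DMA bandwidth when the clock count per request grows more slowly than the request size.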
616. tputs them as PA3:2. On a write, the state machine jumps to the appropriate state based on the value of A3:2. When in a write state, the state machine will advance to the next write state if WAIT and BLAST are not asserted. The state machine can advance from any write state to the READ STATE.

B.2.3 Trade-offs and Alternatives

The example described above demonstrates a burst, pipelined-read SRAM memory interface. Burst mode is used to improve write performance. If write performance is not critical (i.e., if the region is used only for code), the next-address generation PLD can be removed. The design is easily expanded to accommodate multiple SRAM banks.

B.3 INTERFACING TO DYNAMIC RAM

This section provides an overview of DRAM and DRAM access modes and describes an i960 Cx processor-specific DRAM interface. Two specific design examples are also included: one design uses the integrated DMA unit to refresh the DRAM; the other example uses the CAS-before-RAS method of refresh. Both designs illustrate the advantage of the i960 Cx processors' burst bus and the fast column address access times available on many modern DRAMs.

The burst bus and memory region configuration tables simplify DRAM interface to the i960 Cx processors. DRAM systems can be designed in many ways: there are memory access options, memory system configuration options and refresh mode options. DRAM offers high data density, fast access
617. truction's opcode field. Instructions are one word long and begin on word boundaries. MEM format instructions are encoded in one of two sub-formats: MEMA or MEMB. MEMB permits an optional second word to hold a displacement value. The following sections describe each format's instruction word fields.
D.2 REG FORMAT
REG format is used for operations performed on data contained in global, special function or local registers. Most of the i960 processor family's instructions use this format. The opcode for the REG instructions is 12 bits long (three hexadecimal digits) and is split between bits 7 through 10 and bits 24 through 31. For example, the addi opcode is 591H. Here, 59H is contained in bits 24 through 31; 1H is contained in bits 7 through 10. The src1 and src2 fields specify the instruction's source operands. Operands can be global or local registers, special function registers or literals. Mode bits (M1 for src1 and M2 for src2), special-purpose bits (S1 for src1, S2 for src2) and the instruction type determine what an operand specifies. If a mode bit and its associated special-purpose bit are set to 0, the respective src1 or src2 field specifies a global or local register. If the mode bit is set to 1 and the special-purpose bit is set to 0, the field specifies a literal in the range of 0 to 31. If the mode bit is set to 0 and the special-purpose bit is set to 1, the field specifies a special function register. MAC
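The opcode split and the operand rules above can be sketched as follows. This is a minimal illustration, not an assembler: the function names are hypothetical, and the combination of mode bit 1 with special-purpose bit 1 is left unclassified because this excerpt does not describe it.

```python
def reg_opcode(word):
    """Rebuild the 12-bit REG opcode from a 32-bit instruction word.

    Bits 24-31 hold the two high hexadecimal digits, bits 7-10 the low digit.
    """
    high = (word >> 24) & 0xFF
    low = (word >> 7) & 0xF
    return (high << 4) | low


def operand_kind(mode_bit, special_bit):
    """Classify a src1/src2 field from its mode (M) and special-purpose (S) bits."""
    if mode_bit == 0 and special_bit == 0:
        return "register"                   # global or local register
    if mode_bit == 1 and special_bit == 0:
        return "literal"                    # value in the range 0 to 31
    if mode_bit == 0 and special_bit == 1:
        return "special function register"
    return "not covered by this excerpt"    # M = 1, S = 1


# addi: 59H in bits 24-31 and 1H in bits 7-10 give opcode 591H.
addi_word = (0x59 << 24) | (0x1 << 7)
addi_opcode = reg_opcode(addi_word)
```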
618. tructions operate on 32-bit integers. Byte and short integers are only referenced by the byte and short classes of the load and store instructions. None of the i960 Cx processor instructions reference or produce the long integer data type. Table 3-1 shows the supported integer sizes.
Table 3-1. Supported Integer Sizes
Integer size | Descriptive name
8 bit | byte integers
16 bit | short integer
32 bit | integers
64 bit | long integers
NOTE: HLL compilers may define long integer types differently than the i960 architecture.
Integer load or store size (byte, short or word) determines how sign extension or data truncation is performed when data is moved between registers and memory. For instructions ldib (load integer byte) and ldis (load integer short), a byte or short word in memory is considered a two's complement value. The value is sign-extended and placed in the 32-bit register which is the destination for the load. For instructions stib (store integer byte) and stis (store integer short), a 32-bit two's complement number in a register is stored to memory as a byte or short word. If register data is too large to be stored as a byte or short word, the value is truncated and the integer overflow condition is signalled. When an overflow occurs, either an AC register flag is set or the integer overflow fault is generated. CHAPTER 7, FAULTS describes the integer overflow fault. For instructions ld (load word) and st (store
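The sign-extension and truncation behavior described for the byte and short integer loads and stores can be modeled in a few lines. This is a behavioral sketch with hypothetical helper names; it reports the overflow condition as a boolean and does not model the AC register flag or the fault itself.

```python
def load_integer(raw, bits):
    """Model ldib/ldis: treat `raw` as a two's complement value of `bits` width
    and sign-extend it to a full-width integer."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((raw & mask) ^ sign) - sign


def store_integer(value, bits):
    """Model stib/stis: truncate `value` to `bits` and report whether the
    integer overflow condition would be signalled."""
    stored = value & ((1 << bits) - 1)
    overflow = load_integer(stored, bits) != value
    return stored, overflow


byte_minus_one = load_integer(0xFF, 8)    # byte FFH loads as -1
fits = store_integer(-1, 8)               # stored as FFH, no overflow
too_large = store_integer(300, 8)         # truncated to 2CH, overflow signalled
```

A stored value overflows exactly when loading it back with sign extension would not reproduce the original register contents.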
619. ts most relaxed options:
• Non-burst
• Non-pipelined
• Ready disabled
• Bus width = 8 bits
• Little endian byte order
• N_RAD = 31, N_RDD = 3, N_WAD = 31, N_WDD = 3, N_XDA = 3
With this region configuration, the first byte of bus configuration data is loaded from the IBR. This byte is immediately placed into the lower byte of the MCON0 register. This action provides the user-specified N_RAD, pipeline control, ready control and burst control values for bus configuration. The remaining configuration data bytes are then read with requests which use the new N_RAD value. Once all three bytes are read, MCON0 is rewritten and initialization continues. This reduces the number of clocks required to load the bus configuration data. The configuration data in MCON0 controls all memory regions. The bus configuration data is typically programmed for a system's region 15 bus characteristics. This is done because the remainder of the IBR and the data structures must be loaded using the new bus characteristics, and the IBR is fixed in region 15. The processor loads the remainder of the IBR, which consists of the first instruction pointer, the PRCB pointer and six checksum words. The PRCB pointer and the first instruction pointer are internally cached. The six checksum words, along with the PRCB pointer and the first instruction pointer, are used in a checksum calculation which implements a confidence test of the external bus. The sum of these eight words plus F
620. turn to a stack frame for which the local registers are not cached, the processor reloads the locals from memory. flushreg is provided to allow a debugger or application program to circumvent the processor's normal call/return mechanism. For example, a debugger may need to go back several frames in the stack on the next return, rather than using the normal return mechanism that returns one frame at a time. Since the local registers of an unknown number of previous stack frames may be cached, a flushreg must be executed prior to modifying the PFP to return to a frame other than the one directly below the current frame.
Action: Write all cached local register sets (except the current set) to memory. Invalidate the local register cache.
Faults: Type Mismatch: Non-supervisor attempt to write to internal data RAM.
Example: flushreg
Opcode: flushreg 66D REG
9.3.28 fmark
Mnemonic: fmark (Force Mark)
Format: fmark
Description: Generates a breakpoint trace event. Causes a breakpoint trace event to be generated, regardless of the breakpoint trace mode flag setting, providing the PC register trace enable bit (bit 0) is set. When a breakpoint trace event is detected, the PC register trace fault pending flag (bit 10) and the TC register breakpoint trace event flag (bit 23) are set. Then a breakpoint trace fault is generated before the next instruction executes. For more information on trace fault generation, refer to CHAPTER 7, FAULTS. A
621. ually unpredictable.
The procedure stack can be located anywhere in the address space and grows from low addresses to high addresses. It consists of contiguous frames, one frame for each active procedure. Local registers for a procedure are assigned a save area in each stack frame (Figure 5-1). The procedure stack available to the user begins after this save area. To increase procedure call speed, the architecture allows an implementation to cache the saved local register sets on-chip. Thus, when a procedure call is made, the contents of the current set of local registers often do not have to be written out to the save area in the stack frame in memory. Refer to section 5.2.4, Caching of Local Register Sets (pg. 5-6) and section 5.2.5, Mapping Local Registers to the Procedure Stack (pg. 5-9) for more about local registers and procedure stack interrelations.
Figure 5-1. Procedure Stack Structure and Local Registers (each frame holds a register save area for the Previous Frame Pointer PFP in r0, the Stack Pointer SP in r1 and the Return Instruction Pointer RIP in r2, followed by the user-allocated stack; the stack grows toward higher addresses)
5.2
622. ue of the reserved bits. Normally, modpc is not used to directly modify execution mode, trace fault pending and state flags, except under special circumstances such as in initialization code.
2.6.4 Trace Controls (TC) Register
The TC register, in conjunction with the PC register, controls processor tracing facilities. It contains trace mode enable bits and trace event flags, which are used to enable specific tracing modes and record trace events, respectively. Trace controls are described in CHAPTER 8, TRACING AND DEBUGGING.
2.7 USER SUPERVISOR PROTECTION MODEL
The capability of a separate user and supervisor execution mode creates a code and data protection mechanism referred to as the user supervisor protection model. This mechanism allows code, data and stack for a kernel (or system executive) to reside in the same address space as code, data and stack for the application. The mechanism restricts access to all or parts of the kernel by the application code. This protection mechanism prevents application software from inadvertently altering the kernel.
2.7.1 Supervisor Mode Resources
The processor can be in either of two execution modes: user or supervisor. Supervisor mode is a privileged mode which provides several additional capabilities over user mode.
• When the processor switches to supervisor mode, it also switches to the supervisor stack. Switching to the supervisor stack helps maintain a ker
623. ulo and divo by the power of 2, respectively. shli shifts zeros in from the least significant bit. An overflow fault is generated if the bits shifted out are not the same as the most significant bit (bit 31). If overflow occurs, dst equals src shifted left as much as possible without overflowing. shri performs a conventional arithmetic shift-right operation by shifting in the most significant bit (bit 31). When this instruction is used to divide a negative integer operand by a power of 2, it produces an incorrect quotient: discarding the bits shifted out has the effect of rounding the result toward negative infinity. shrdi is provided for dividing integers by a power of 2. With this instruction, 1 is added to the result if the bits shifted out are non-zero and the src operand was negative, which produces the correct result for negative operands. shli and shrdi are equivalent to muli and divi by the power of 2. eshro is provided for extracting a 32-bit value from a long ordinal (i.e., 64 bits), which is contained in two adjacent registers.
Action:
shlo: if (len < 32) dst ← src << len; else dst ← 0;
shro: if (len < 32) dst ← src >> len; else dst ← 0;
Faults: Type, Arithmetic
Action (continued):
shli: if (len > 32) i ← 32; else i ← len; temp ← src; sign ← temp.bit31; while ((temp.bit31 = sign) and
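The difference between shri and shrdi on negative operands is easiest to see with small numbers. The sketch below models only the rounding behavior described above, using Python integers in place of 32-bit registers; the function names mirror the mnemonics, but restricting the shift count to 0 through 31 is an assumption of this sketch, not a statement about the hardware.

```python
def shri(src, length):
    """Arithmetic shift right: discards the shifted-out bits, which rounds
    toward negative infinity."""
    if not 0 <= length < 32:
        raise ValueError("sketch only models shift counts 0..31")
    return src >> length  # Python's >> on negative ints is an arithmetic shift


def shrdi(src, length):
    """Shift right for dividing integers: adds 1 to the result when src is
    negative and any shifted-out bit is non-zero."""
    result = shri(src, length)
    shifted_out = src & ((1 << length) - 1)
    if src < 0 and shifted_out != 0:
        result += 1
    return result


floor_quotient = shri(-7, 1)    # -4, not the quotient integer division expects
true_quotient = shrdi(-7, 1)    # -3, matching -7 / 2 truncated toward zero
```

For non-negative operands, and for negative operands that are exact multiples of the power of 2, the two instructions give the same result.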
624. upt detection
Figure 12-1. Interrupt Controller (external and DMA sources produce hardware-generated interrupts; the sysctl instruction request produces software-generated interrupts; both are posted in the Interrupt Pending Register, IPND)
12.2.1 Interrupt Controller Modes
The eight external interrupt pins can be configured for one of three modes: expanded, dedicated and mixed. Each mode is described in the subsections that follow.
12.2.1.1 Dedicated Mode
In dedicated mode, each external interrupt pin is assigned a vector number. Vector numbers that may be assigned to a pin are those with the encoding PPPP 0010 (Figure 12-2), where bits marked P are programmed with bits in the interrupt map (IMAP) registers. This encoding of programmable bits and preset bits can designate 15 unique vector numbers, each with a unique, even-numbered priority. (Vector 0000 0010 is undefined; it has a priority of 0.) Dedicated mode interrupts are posted in the interrupt pending (IPND) register. Single bits in the IPND register correspond to each of the eight dedicated external interrupt inputs, plus the four DMA inputs, to the interrupt controller. The interrupt mask (IMSK) register selectively masks each of the dedicated mode interrupts. The IMSK register can optionally be saved and cleare
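The PPPP 0010 encoding can be enumerated to show why it yields 15 vectors with even priorities. The sketch assumes that a vector's priority is its vector number divided by 8 (integer division); that rule is not stated in this excerpt, though it agrees with the excerpt's remark that vector 0000 0010 has priority 0. The function names are hypothetical.

```python
def dedicated_vector(pppp):
    """Build a dedicated-mode vector number from the four programmable bits.

    PPPP = 0 is excluded because vector 0000 0010 is described as undefined.
    """
    if not 1 <= pppp <= 15:
        raise ValueError("PPPP must be in the range 1..15")
    return (pppp << 4) | 0b0010


def vector_priority(vector):
    """Assumed rule: priority = vector number // 8."""
    return vector >> 3


vectors = [dedicated_vector(p) for p in range(1, 16)]
priorities = [vector_priority(v) for v in vectors]
```

Under that assumption each step in PPPP adds 16 to the vector number and therefore 2 to the priority, which is where the even-numbered priorities come from.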
625. ware must also manipulate the following registers and control bits to enable the various tracing modes and to enable or disable tracing in general. These controls are described in the following subsections.
• TC register mode bits
• PC register trace enable bit
• PC register trace fault pending flag
• PFP register return status field prereturn trace flag (bit 0)
• System procedure table supervisor stack pointer field trace control bit
• BPCON register breakpoint mode bits and enable bits in the control table
• IPB0-IPB1 registers address field and enable bit in the control table
• DAB0-DAB1 registers address field in the control table
8.1.1 Trace Controls (TC) Register
The TC register (Figure 8-1) allows software to define conditions which generate trace events.
Figure 8-1 field labels:
Trace Mode Bits: Instruction Trace Mode (TC.i), Branch Trace Mode (TC.b), Call Trace Mode (TC.c), Return Trace Mode (TC.r), Pre-Return Trace Mode (TC.p), Supervisor Trace Mode (TC.s), Breakpoint Trace Mode (TC.br)
Trace Event Flags: Instruction (TC.if), Branch (TC.bf), Call (TC.cf), Return (TC.rf), Pre-Return (TC.pf), Supervisor (TC.sf), Breakpoint (TC.brf)
Hardware Breakpoint Event Flags: Instruction Address Breakpoint 0 (TC.i0f), Instruction Address Breakpoint 1 (TC.i1f), Data Address Breakpoint 0 (TC.d0f), Data Address Breakpoint 1 (TC.d1f)
626. with the programmed internal wait state counter. If READY and BTERM are enabled in a region, these pins are sampled only after the programmed number of wait states expire. If the inputs are disabled in a region, the inputs are ignored and the internal wait state counter alone determines access wait states. Refer to section 11.2.1, Wait States (pg. 11-4) for details on the operation of the READY and BTERM inputs.
NOTE: READY and BTERM must be disabled in regions where pipelined reads are enabled.
10.2.4 Byte Ordering
Byte ordering determines how data is read from or written to the bus and ultimately how data is stored in memory. Byte ordering can be individually selected for each memory region by setting a bit in the corresponding MCON register. The bus controller supports big endian and little endian byte ordering for memory operations:
little endian: The controller reads or writes a data word's least significant byte to the bus's eight least significant data lines (D7:0). Little endian systems store a word's least significant byte at the lowest byte address in memory. For example, if a little endian ordered word is stored at address 600, the least significant byte is stored at address 600 and the most significant byte at address 603.
big endian: The controller reads or writes a data word's least significant byte to the bus's eight most significant data lines (D31:24). Big endian systems store the least significant byte at the highest b
627. WDD wait states required is a function of memory write cycle time. The number of N_XDA wait states required is a function of the memory system's output-to-float time. N_XDA determines how soon read data from the memory must be off the data bus before any other device asserts data on the data bus. This could be a read from another memory system or a write from the i960 Cx processors.
B.1.4.2 Output Enable and Write Enable Logic
The output enable signal is simply (see Figure 1):
OE = W/R (Equation B-2)
The PLD is used to buffer the W/R signal; this may be necessary to reduce the load on the W/R signal. The write enable signals are:
WE = WAIT & W/R
or:
WE0 = WE & BE0
WE1 = WE & BE1
WE2 = WE & BE2
WE3 = WE & BE3
The WAIT signal is used to create the write strobe. When W/R indicates a write and BEx and WAIT are asserted, the logic asserts WE. The i960 CA/CF Microprocessor Data Sheets guarantee a relationship from WAIT high to write data invalid.
B.1.4.3 State Machine Descriptions
The state machine PLD incorporates two state machines: one controls SRAM chip enable (CE); the other generates the A3:2 address signals for multiple-word burst accesses. The chip enable state machine (Figure B-4) controls the CE signal. CE is normally not enabled, but when both ADS and SRAM_CS are asserted, CE is asserted and remains asserted until BLAST is asserted. BLAST i
628. x processor implementations, the non-maskable interrupt (NMI) interrupts priority-31 execution; no interrupt can interrupt an NMI handler. The processor may post requests for later servicing. Interrupts waiting to be serviced, called pending interrupts, are discussed in section 6.4.2, Pending Interrupts (pg. 6-5).
6.4 INTERRUPT TABLE
The interrupt table (Figure 6-2), 1028 bytes in length, can be located anywhere in the non-reserved address space. It must be aligned on a word boundary. The processor reads a pointer to interrupt table byte 0 during initialization. The interrupt table must be located in RAM since the processor must be able to read and write the table's pending interrupt section. The interrupt table is divided into two sections: vector entries and pending interrupts. Each is described in the subsections that follow.
Figure 6-2. Interrupt Table (pending priorities at 000H; pending interrupts from 004H through 020H; vector entries from 024H for vector 8 through 400H for vector 255, for example 3D0H for vector 243 and 3F4H for vector 252; each vector entry holds an instruction pointer whose two low-order bits select the entry type: 00 = normal, 10 = target in cache, 01 and 11 = reserved; reserved fields are initialized to 0)
6.4.1 Vector Ent
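The offsets that survive from Figure 6-2 follow a simple pattern: one word per vector, starting at 024H for vector 8. The helper below is a sketch of that arithmetic (the function name is hypothetical) and checks it against the offsets and the 1028-byte table length quoted above.

```python
VECTOR_BASE_OFFSET = 0x24   # offset of the entry for vector 8
FIRST_VECTOR = 8
ENTRY_SIZE = 4              # one 32-bit word per vector entry


def vector_entry_offset(vector):
    """Byte offset of a vector entry from the start of the interrupt table."""
    if not FIRST_VECTOR <= vector <= 255:
        raise ValueError("vector entries exist for vectors 8..255 only")
    return VECTOR_BASE_OFFSET + ENTRY_SIZE * (vector - FIRST_VECTOR)


# The last entry (vector 255) ends at the table's total length.
table_length = vector_entry_offset(255) + ENTRY_SIZE
```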
629. y programming the Cache Configuration Word in the PRCB. See section 14.2.6, Process Control Block (PRCB) (pg. 14-8). The modes allow the cache to be turned off temporarily to aid in debugging. When the cache is disabled, the processor depends on a 16-word instruction buffer to supply instructions for decoding. The instruction buffer operates as a small cache, organized as two sets of two-way set associative cache with a four-word line size. When the main cache is disabled, small code loops may still execute entirely within the instruction buffer. Modes 100 and 110 select cache load-and-lock options. These modes determine whether half or all of the cache is loaded with instructions and locked against further updates. The sysctl instruction's field 3 must contain an address; this address points to a quad-word aligned block of memory in the external address space. Instructions starting at this address are loaded into the cache. These instructions can only be accessed by selected interrupts which vector to these instructions' addresses. The load-and-lock mechanism selectively optimizes latency and throughput for interrupts.
4.3.2.4 Reinitialize Processor
Executing sysctl with message type 03H reinitializes the processor. sysctl fields 3 and 4 must contain, respectively, the First Instruction Pointer and the PRCB Pointer. Reinitialization bypasses the i960 Cx processors' built-in self test. The PRCB is processed and the processor branches to the
630. y accesses are handled by the processor or generate a fault. See section 10.4, DATA ALIGNMENT (pg. 10-9).
7.8 FAULT HANDLING ACTION
Once a fault occurs, the processor saves the program state, calls the fault handling procedure and, when the fault recovery action completes, restores the program state (if possible). No software other than the fault handling procedures is required to support this activity.
Table 7-2. Fault Flags or Masks
Flag or Mask Name | Location | Faults Affected
Integer Overflow Mask Bit | Arithmetic Controls (AC) Register | Integer Overflow
No Imprecise Faults Bit | Arithmetic Controls (AC) Register | Imprecise Faults
Trace Enable Bit | Process Controls (PC) Register | All Trace Faults
Trace Mode Flags | Trace Controls (TC) Register | All Trace Faults
Unaligned Fault Mask | Process Control Block (PRCB) | Unaligned Fault
NOTE: The unaligned fault, unaligned fault mask and the processor control block are i960 Cx processor extensions to the i960 architecture.
Three different types of implicit procedure calls can be used to invoke the fault handling procedure, according to the information in the selected fault table entry: a local call, a system local call and a system supervisor call. The following sections describe actions the processor takes while handling faults. It is not necessary to read these sections to use the fault handling mechanism or to write a fault handling pro
631. yte address in memory. So if a big endian ordered word is stored at address 600, the least significant byte is stored at address 603 and the most significant byte at address 600.
10.3 PROGRAMMING THE BUS CONTROLLER
The bus controller is programmed using 17 control registers, 16 of which are MCON0-15; the remaining one is the Bus Configuration (BCON) register. Control registers are automatically loaded at initialization from the control table in external memory. Control registers are modified by using the load control registers message of the system control (sysctl) instruction. See section 2.3, CONTROL REGISTERS (pg. 2-6) for control register definition.
10.3.1 Memory Region Configuration Registers (MCON 0-15)
The control table contains 16 memory region control registers (MCON 0-15). Each specifies, for the region that it controls:
• number of wait states
• burst mode
• data bus width
• pipeline mode
• byte ordering
• external ready mode
An address's four most significant bits indicate which region is being accessed. Each MCON register is 32 bits wide (see Figure 10-1 and Figure 10-2); however, not all bits are currently used. Table 10-1 defines the MCON 0-15 registers' programmable bits.
Figure 10-1 (address space map: columns Address, Address Space, Memory Region Table Entry Configuration; 256-MByte regions beginning at 0000 0000H, regions 1-12 with entries 1-12 from 1000 0000H, region 13 at D000 0000H, 256 M
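Two points from the text above lend themselves to a short sketch: where each byte of a word lands under the two byte orderings, and how the four most significant address bits select a memory region. The helper names and the sample word 11223344H are assumptions made for illustration; the address 600 is taken as written in the example.

```python
def byte_layout(word, base_address, big_endian):
    """Map each byte address to the byte of `word` stored there."""
    data = word.to_bytes(4, "big" if big_endian else "little")
    return {base_address + i: b for i, b in enumerate(data)}


def memory_region(address):
    """Region number (0..15) selected by the four most significant address bits."""
    return (address >> 28) & 0xF


word = 0x11223344  # 44H is the least significant byte, 11H the most significant

little = byte_layout(word, 600, big_endian=False)
big = byte_layout(word, 600, big_endian=True)
region = memory_region(0xD0000000)
```

With little endian ordering the least significant byte sits at address 600; with big endian ordering it sits at address 603, matching the two examples in the text.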
