CPU Design

Contents

1. CPU Design. As we saw in Chapter 4, a CPU contains three main sections: the register section, the arithmetic/logic unit (ALU), and the control unit. These sections work together to perform the sequences of micro-operations needed to perform the fetch, decode, and execute cycles of every instruction in the CPU's instruction set. In this chapter we examine the process of designing a CPU in detail. To demonstrate this design process, we present the designs of two CPUs, each implemented using hardwired control. (A different type of control, which uses a microsequencer, is examined in Chapter 7.) We start by analyzing the applications for the CPU. For instance, will it be used to control a microwave oven or a personal computer? Once we know its application, we can determine the types of programs it will run, and from there we can develop the instruction set architecture (ISA) for the CPU. Next, we determine the other registers we need to include within the CPU that are not a part of its ISA. We then design the state diagram for the CPU, along with the micro-operations needed to fetch, decode, and execute each instruction. Once this is done, we define the internal data paths and the necessary control signals. Finally, we design the control unit, the logic that generates the control signals and causes the operations to occur. In this chapter we present the complete design of two simple CPUs, along with an
2. Figure 6.14: Generic bidirectional data pin. [Figure labels: Input Data Enable, Output Data Enable, To/From Internal Bus.]
Unlike the Very Simple CPU, it is not a trivial matter to assign connections between the registers and the bits of the data bus in the Relatively Simple CPU. AR and PC are 16-bit registers connected to a 16-bit bus, so they present no problem. The remaining 8-bit registers can be connected to bits 7..0 of the bus. Although this configuration allows almost every individual transfer to take place, it causes problems for several states.
During FETCH3, the CPU must transfer IR←DR and AR←PC simultaneously. As configured, both transfers would need to use bits 7..0 of the internal bus at the same time, which is not allowable. Since IR receives data only from DR, it is possible to establish a direct path from the output of DR to the input of IR, allowing IR←DR to occur without using the internal bus. This allows the CPU to perform both operations simultaneously. We can also disconnect the input of IR from the internal bus, since it no longer receives data from the bus.
During LDAC2 and several other states, TR←DR and DR←M need to use the bus simultaneously. Fortunately, TR also receives data only from DR, so the CPU can include a direct path from the output of DR to the input of TR, just as we did for IR. The input of TR is also dis
3. MEMBUS = FETCH2 ∨ ADD1 ∨ AND1
PCBUS = FETCH1
Finally, the control unit must generate a READ signal, which is output from the CPU. This signal causes memory to output its data value. This occurs when memory is read during states FETCH2, ADD1, and AND1, so READ can be set as follows:
READ = FETCH2 ∨ ADD1 ∨ AND1
The circuit diagram for the portion of the control unit that generates these signals is shown in Figure 6.10. This completes the design of the Very Simple CPU.
Figure 6.10: Control signal generation for the Very Simple CPU. [Figure: gates combining the state signals to produce ARLOAD, IRLOAD, PCLOAD, PCINC, ALUSEL, MEMBUS, PCBUS, DRLOAD, DRBUS, ACLOAD, ACINC, and READ.]
6.2.8 Design Verification. Now that we have designed the CPU, we must verify that it works properly. To do so, we trace through the fetch, decode, and execute cycles of each instruction. Consider this segment of code, containing each instruction once:
0: ADD 4
1: AND 5
2: INC
3: JMP 0
4: 27H
5: 39H
The CPU fetches, decodes, and executes each instruction following the appropriate state sequences from the state diagram:
ADD 4: FETCH1 → FETCH2 → FETCH3 → ADD1 → ADD2
AND 5: FETCH1 → FETCH2 → FETCH3 → AND1 → AND2
INC: FETCH1 → FETCH2 → FE
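The three signal equations stated above (MEMBUS, PCBUS, and READ) can be sketched in software as ORs of one-hot state signals. This is only an illustrative model: the function name `control_signals` and the dictionary representation are assumptions, not part of the original design.

```python
# Hypothetical model of three control signals of the Very Simple CPU.
# Each state signal is 1 only while the CPU is in that state (one-hot).
STATES = ("FETCH1", "FETCH2", "FETCH3", "ADD1", "ADD2",
          "AND1", "AND2", "JMP1", "INC1")


def control_signals(state):
    """Return MEMBUS, PCBUS and READ for the given active state."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    s = {name: name == state for name in STATES}
    return {
        # MEMBUS = FETCH2 v ADD1 v AND1
        "MEMBUS": s["FETCH2"] or s["ADD1"] or s["AND1"],
        # PCBUS = FETCH1
        "PCBUS": s["FETCH1"],
        # READ = FETCH2 v ADD1 v AND1
        "READ": s["FETCH2"] or s["ADD1"] or s["AND1"],
    }
```

Calling `control_signals("FETCH2")` reports MEMBUS and READ as active and PCBUS as inactive, which matches the states in which memory is read.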
4. The JMPZ and JPNZ instructions each have two possible outcomes, depending on the value of the Z flag. If the jump is to be taken, the CPU follows execution states exactly the same as those used by the JUMP instruction. However, if the jump is not taken, the CPU cannot simply return to the fetch routine. After the fetch routine, the PC contains the address of the low-order half of the jump address. If the jump is not taken, the CPU must increment the PC twice so that it points to the next instruction in memory, not to either byte of the jump address. The states to perform the JMPZ instruction are as follows. Note that the JMPZY states are executed if Z = 1 and the JMPZN states are executed if Z = 0.
JMPZY1: DR←M, AR←AR+1
JMPZY2: TR←DR, DR←M
JMPZY3: PC←DR,TR
JMPZN1: PC←PC+1
JMPZN2: PC←PC+1
The states for JPNZ are identical but are accessed under opposite conditions; that is, JPNZY states are executed when Z = 0 and JPNZN states are traversed when Z = 1.
JPNZY1: DR←M, AR←AR+1
JPNZY2: TR←DR, DR←M
JPNZY3: PC←DR,TR
JPNZN1: PC←PC+1
JPNZN2: PC←PC+1
6.3.3.7 The Remaining Instructions. The remaining instructions are each executed in a single state. For each state, two things happen: the correct value is generated and stored in AC, and the zero flag is set. If the result of the operation is 0, Z is set to 1; otherwise it is set to 0. Since this happens during a single state, the CPU cannot first store the re
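The JMPZ states can be traced with a short software sketch. It assumes that, on entry, both AR and PC hold the address of the low-order byte of the jump address, that memory is a dictionary from address to byte, and that registers live in a plain dictionary; the name `execute_jmpz` is hypothetical.

```python
def execute_jmpz(regs, memory):
    """Apply the JMPZ execute states to a register dictionary.

    Assumes regs["AR"] and regs["PC"] both point at the low-order
    address byte when the execute routine starts.
    """
    if regs["Z"] == 1:
        # JMPZY1: DR <- M, AR <- AR + 1
        regs["DR"] = memory[regs["AR"]]
        regs["AR"] = (regs["AR"] + 1) & 0xFFFF
        # JMPZY2: TR <- DR, DR <- M (both transfers happen in one state)
        regs["TR"], regs["DR"] = regs["DR"], memory[regs["AR"]]
        # JMPZY3: PC <- DR,TR (DR is the high byte, TR the low byte)
        regs["PC"] = (regs["DR"] << 8) | regs["TR"]
    else:
        # JMPZN1 and JMPZN2: skip both bytes of the jump address
        regs["PC"] = (regs["PC"] + 1) & 0xFFFF
        regs["PC"] = (regs["PC"] + 1) & 0xFFFF
    return regs
```

With the bytes 34H and 12H stored at addresses 11H and 12H, a taken jump leaves PC at 1234H, while a jump that is not taken leaves PC at 13H, the address of the next instruction.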
5. connected from the internal bus. During LDAC3 and several other states, DR and TR must be placed on the bus simultaneously, DR on bits 15..8 and TR on bits 7..0. However, DR is connected to bits 7..0 of the bus. One way to handle this is simply to connect the output of DR to bits 15..8 instead of bits 7..0, but that would cause a problem during LDAC5 and other states, which need DR on bits 7..0. Another solution, implemented here, is to route the output of DR to both bits 15..8 and bits 7..0. Separate buffers with different enable signals must be used because DR should not be active on both halves of the bus simultaneously. Finally, we must connect register Z. Reviewing the states and their functions, we see that Z is only set when an ALU operation occurs. It is set to 1 if the value to be stored in AC (which is the output of the ALU) is 0; otherwise it is set to 0. To implement this, we NOR together the bits output from the ALU. The NOR will produce a value of 1 only if all bits are 0; thus the output of the NOR gate can serve as the input of Z. This is why we implemented the increment and clear operations via the ALU, rather than incorporating them directly into the AC register. Figure 6.15 on page 246 shows the internal organization of the CPU after incorporating these changes. 6.3.5 Design of a Relatively Simple ALU. All data that is to be loaded into AC must pass through the ALU. To design the ALU, we first list all transfers that modify
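The NOR-based zero flag described above can be illustrated with a few lines of Python. This is a behavioral sketch only; the function name `zero_flag` and the default width of 8 bits are assumptions made for the example.

```python
def zero_flag(alu_output, width=8):
    """Model a wide NOR gate over the ALU output bits.

    Returns 1 only when every one of the `width` bits is 0.
    """
    bits = [(alu_output >> i) & 1 for i in range(width)]
    # NOR: the output is 1 exactly when no input bit is 1
    return 0 if any(bits) else 1
```

For example, `zero_flag(0x00)` yields 1, while any nonzero ALU output such as `0x80` yields 0.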
6. tion PC←AR instead. Either is acceptable.
6.2.4.4 INC Instruction. The INC instruction can also be executed using a single state. The CPU simply adds 1 to the contents of AC and goes to the fetch routine. The state for this execute cycle is
INC1: AC←AC+1
The state diagram for this CPU, including the fetch, decode, and execute cycles, is shown in Figure 6.4 on page 222.
6.2.5 Establishing Required Data Paths. The state diagram and register transfers specify what must be done in order to realize this CPU. Now we must design the CPU so that it actually does these things. First, we look at what data transfers can take place and design the internal data paths of the CPU so this can be done. The operations associated with each state for this CPU are:
FETCH1: AR←PC
FETCH2: DR←M, PC←PC+1
FETCH3: IR←DR[7..6], AR←DR[5..0]
ADD1: DR←M
ADD2: AC←AC+DR
AND1: DR←M
AND2: AC←AC∧DR
JMP1: PC←DR[5..0]
INC1: AC←AC+1
(If this looks like RTL code, you're headed in the right direction.)
Figure 6.4: Complete state diagram for the Very Simple CPU. [Figure: fetch states FETCH1, FETCH2, and FETCH3 branching to the execute routines.]
Note that memory supplies its data to the CPU via pins D[7..0]. Also recall that the address pins A[5..0] receive data from the address register, so the CPU must include a data path from the outputs of AR to A. To design the data paths, we can take one of two approaches. The f
7. 8-bit data register DR, which receives instructions and data from memory and transfers data to memory via D[7..0]. An 8-bit instruction register IR, which stores the opcode fetched from memory. An 8-bit temporary register TR, which temporarily stores data during instruction execution. Besides the differences in register size, there are several differences between the registers for this CPU and the Very Simple CPU. These changes are all necessary to accommodate the more complex instruction set. First of all, notice that the program counter can hold not only the address of the next instruction, but also the address of the next operand. In the Very Simple CPU, the only operand is an address that is fetched along with the opcode. The Relatively Simple CPU uses 8-bit opcodes and 16-bit addresses. If the opcode and address were packed into one word, it would have to be 24 bits wide. For instructions that do not access memory, the 16-bit address portion of the instruction code would be wasted. To minimize unused bits, the CPU keeps each word (byte) 8 bits wide but uses multiple bytes to store the instruction and its address. Part of the time the PC will be pointing to the memory byte containing the opcode, but at other times it will be pointing to the memory bytes containing the address. This may seem a bit confusing, but it will become clearer during the design of this CPU. The Very Simple CPU could not output data. The Relatively Simple CP
8. CPU contains a 6-bit address register AR and program counter PC, an 8-bit accumulator AC and data register DR, and a 2-bit instruction register IR. The CPU must realize the following instruction set.
Instruction  Instruction Code  Operation
COM          00XXXXXX          AC←AC'
JREL         01AAAAAA          PC←PC + 00AAAAAA
OR           10AAAAAA          AC←AC ∨ M[00AAAAAA]
SUB1         11AAAAAA          AC←AC - M[00AAAAAA] - 1
29. Design a CPU that meets the following specifications. It can access 256 words of memory, each word being 8 bits wide. The CPU does this by outputting an 8-bit address on its output pins A[7..0] and reading in the 8-bit value from memory on its inputs D[7..0]. The CPU contains an 8-bit address register AR, program counter PC, accumulator AC, and data register DR, and a 3-bit instruction register IR. The CPU must realize the following instruction set. Note that a is an 8-bit value stored in the location immediately following the instruction.
Instruction  Instruction Code  Operation
LDI          000XXXXX a        AC←a
STO          001XXXXX a        M[a]←AC
ADD          010XXXXX a        AC←AC + M[a]
OR           011XXXXX a        AC←AC ∨ M[a]
JUMP         100XXXXX a        PC←a
JREL         101AAAAA          PC←PC + 000AAAAA
SKIP         110XXXXX          PC←PC + 1
RST          111XXXXX          PC←0, AC←0
Modify the Relatively Simple CPU so that it can use a stack. The changes required to do this are as follows. Include a 16-bit stack pointer (SP) regi
9. The following control unit is supposed to realize the state diagram also shown, but it does not. Show the state diagram it actually realizes. [Figure: a state diagram with fetch states FETCH1 to FETCH4 and execute states including IA1 to IA3, IB1, IB2, IC2, and ID1, and a control unit built from a counter with LD, INC, and CLR inputs and a decoder.]
Modify the control unit of Problem 4 so that it realizes the state diagram properly.
We wish to modify the Very Simple CPU to incorporate a new instruction, CLEAR, which sets AC←0; the instruction code for CLEAR is 111X XXXX. The new instruction code for INC is 110X XXXX; all other instruction codes remain unchanged. Show the new state diagram and RTL code for this CPU.
For the CPU of Problem 6, show the modifications necessary for the register section.
For the CPU of Problem 6, show the modifications necessary for the control unit. Include the hardware needed to generate any new or modified control signals.
Verify the functioning of the CPU of Problems 6, 7, and 8 for the new instruction.
We wish to modify the Very Simple CPU to incorporate a new 8-bit register, R, and two new instructions. MVAC performs the transfer R←AC and has the instruction code 1110 XXXX; MOVR performs the operation AC←R and has the instruction code 1111 XXXX. The new instruction code for INC is 110X XXXX; all other instruc
10. times it must tri-state its connections to these buses; this is the function of these buffers. This happens when the computer is performing a DMA transfer, described in detail in Chapter 10. In addition, the data/address buffers determine whether data is input to or output from the CPU, just as was done with the Relatively Simple CPU. The interrupt control block contains the interrupt mask register. The user can read the value from this register or store a value into that register, so it is included in the microprocessor's instruction set architecture and its register section. The serial I/O control block also contains a register to latch serial output data. The registers communicate within the CPU via the 8-bit internal data bus. Although it is not very clear in Figure 6.19, the connection
11. also assign ADD1 and ADD2 to consecutive counter values, as well as AND1 and AND2. Assign the first state of each execute routine based on the instruction opcodes and the maximum number of states in the execute routines. Use the opcodes to generate the data input to the counter and the LD input of the counter to reach the proper execute routine. This point squarely addresses the implementation of instruction decoding. Essentially, it implements a mapping of the opcode to the execute routine for that instruction. It occurs exactly once in this and all CPUs, at the last state of the fetch cycle. To load in the address of the proper execute routine, the control unit must do two things. First, it must place the address of the first state of the proper execute routine on the data inputs of the counter. Second, it must assert the LD signal of the counter. The LD signal is easy; it is directly driven by the last state of the fetch cycle, FETCH3 for this CPU. The difficulty comes in allocating counter values to the states. Toward that end, consider the list of instructions, their first states, and the value in register IR for those instructions, as shown in Table 6.2. The input to the counter is a function of the value of IR. The goal is to make this function as simple as possible. Consider one possible mapping, 10IR[1..0]. That is, if IR = 00, the input to the counter is 1000; for I
12. analysis of their shortcomings. We also look at the internal architecture of the Intel 8085 microprocessor, whose instruction set architecture was introduced in Chapter 3. 6.1 Specifying a CPU. The first step in designing a CPU is to determine its applications. We don't need anything as complicated as an Itanium microprocessor to control a microwave oven; a simple 4-bit processor would be powerful enough to handle this job. However, the same 4-bit processor would be woefully inadequate to power a personal computer. The key is to match the capabilities of the CPU to the tasks it will perform. Once we have determined the tasks a CPU will perform, we must design an instruction set architecture capable of handling these tasks. We select the instructions a programmer could use to write the application programs and the registers these instructions will use. After this is done, we design the state diagram for the CPU. We show the micro-operations performed during each state and the conditions that cause the CPU to go from one state to another. A CPU is just a complex finite state machine. By specifying the states and their micro-operations, we specify the steps the CPU must perform in order to fetch, decode, and execute every instruction in its instruction set.
Figure 6.1: Generic CPU state diagram. [Figure labels: Decode, Execute.]
13. components of the control unit, and the rest of the blocks are parts of the register section. Let's look at these sections in some detail. The easiest component to examine is the 8085's ALU. It performs all arithmetic, logic, and shift instructions, making its result available to the registers via the 8-bit internal data bus. Control signals from the control unit, not shown in Figure 6.19, select the function to be performed by the ALU. The register section contains the user-accessible registers specified in the 8085's instruction set architecture: A, B, C, D, E, H, L, SP, and the flags. This section also contains the microprocessor's instruction register and program counter, a temporary register that it uses to input data to the ALU, and an address latch, which is equivalent to the AR register in the Relatively Simple CPU. Although not shown in Figure 6.19, two additional temporary registers are used by the microprocessor to store data during the execution of an instruction. They serve the same purpose as the TR register in the Relatively Simple CPU. Although not registers, the address and data/address buffers are included in the register section. Under certain conditions, the 8085 does not access the system address and data buses. During these
Figure 6.19: Internal organization of the 8085 microprocessor (MCS-80/85 Family User's Manual. Reprinted by permission of Intel Corporation. Copyright Intel Corporation, 1979).
14. data among themselves could be connected to more than one bus, or there could be connections established between buses. For the Relatively Simple CPU, one bus could be set up for address information and another for data. One possible configuration, which uses three buses, is shown in Figure 6.18 on page 254. Another benefit of multiple buses is the elimination of direct connections between components. Recall that the Relatively Simple CPU included direct connections from DR to TR and IR so that multiple data transfers could occur simultaneously during FETCH3, LDAC2, STAC2, JUMP2, JMPZY2, and JPNZY2. As the number of registers within the CPU increases, this becomes increasingly important. 6.4.3 Pipelined Instruction Processing. In the CPUs developed in this chapter, one instruction is fetched, decoded, and executed before the next instruction is processed. In pipelining, instructions are processed like goods on an assembly line. While one instruction is being decoded, the next instruction is fetched, and while the first instruction is being executed, the second is decoded and a third instruction is fetched. Overlapping the fetch, decode, and execute of several instructions allows programs to be executed more quickly, even though each individual instruction requires the same amount of time. Although this process has some problems, particularly with conditional and unconditional jump instructions, it offers significant increases in performa
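The benefit of overlapping fetch, decode, and execute can be estimated with a simple count. The sketch below assumes an idealized three-stage pipeline in which every stage takes one time unit and no jump ever disturbs the flow; both function names are hypothetical.

```python
def time_without_pipeline(instructions, stages=3):
    """Each instruction finishes all stages before the next one starts."""
    return instructions * stages


def time_with_pipeline(instructions, stages=3):
    """Ideal pipeline: after the first instruction fills the pipeline,
    one instruction completes per time unit (no stalls assumed)."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)
```

Under these assumptions, ten instructions need 30 time units sequentially but only 12 when pipelined, even though each individual instruction still spends three time units in the processor.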
15. instruction set architecture, and the system designer, computer organization. In this chapter we examined the CPU from the perspective of the computer architect. To design a CPU, we first develop its instruction set architecture, including its instruction set and its internal registers. We then create a finite state machine model of the micro-operations needed to fetch, decode, and execute every instruction in its instruction set. Then we develop an RTL specification for this state machine. A CPU contains three primary sections: the register section, consisting of the registers in the CPU's ISA as well as other registers not directly available to the programmer; the ALU; and the control unit. The micro-operations in its RTL code specify the functions to be performed by the register section and the ALU. These micro-operations are used to design the data paths within the register section, including direct connections and buses, and the functions of each register. The micro-operations also specify the functions of the ALU. Since the ALU must perform all of its calculations in a single clock cycle, it is constructed using only combinatorial logic. The conditions under which each micro-operation occurs dictate the design of the control unit. The control unit generates the control signals that load, increment, and clear the registers in the register section. The control unit also enables the buffers used to control the CPU's internal buses. The f
16. internal control signals of the microprocessor. Finally, the interrupt control and serial I/O control blocks are partially elements of the control unit. The interrupt control block accepts external interrupt requests, checks whether the requested interrupts are enabled, and passes valid requests to the rest of the control unit. As with the internal control signals, the path followed by these requests is not shown in Figure 6.19, but it is present nonetheless. The serial I/O control block contains logic to coordinate the serial transfer of data into and out of the microprocessor. The 8085 microprocessor addresses several, but not all, of the shortcomings of the Relatively Simple CPU. First of all, it contains more general purpose registers than the Relatively Simple CPU. This allows the 8085 to use fewer memory accesses than the Relatively Simple CPU to perform the same task. The 8085 microprocessor also has a larger instruction set and has the ability to handle subroutines and interrupts. However, it still uses only one internal bus to transfer data, which limits the number of data transfers that can occur at any given time. The 8085 also does not use an instruction pipeline. Like the Relatively Simple CPU, it processes instructions sequentially: it fetches, decodes, and executes one instruction before fetching the next instruction. 6.6 Summary. In previous chapters, we looked at the CPU from the point of view of the programmer
17. load. Now we connect every component to the system bus, including tri-state buffers where necessary. We also connect output pins A[15..0] and bidirectional pins D[7..0]. The preliminary connections are shown in Figure 6.13. Next, we modify the design based on the following considerations.
1. As in the Very Simple CPU, AR and IR of the Relatively Simple CPU do not supply data to other components. We can remove their outputs to the internal bus.
2. Pins D[7..0] are bidirectional, but the current configuration does not allow data to be output from these pins.
3. The 16-bit bus is not fully used by all registers. We must specify which bits of the data bus are connected to which bits of the registers.
4. Register Z is not connected to anything.
[Figure: Preliminary register section for the Relatively Simple CPU.]
To address the first point, we simply remove the unused connections. The second point is also straightforward. A standard way to implement bidirectional pins is to use a pair of buffers, one in each direction. One buffer is used to input data from the pins and the other outputs data to the pins. The two buffers must never be enabled simultaneously. This configuration is shown in Figure 6.14.
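The buffer pair used for a bidirectional pin can be modeled behaviorally. In this sketch, `None` stands for a tri-stated (undriven) connection, and the function name `bidirectional_pin` and its parameters are assumptions chosen for illustration.

```python
def bidirectional_pin(pin_value, bus_value, input_enable, output_enable):
    """Model one bidirectional data pin built from two tri-state buffers.

    Returns (value driven onto the internal bus, value driven onto the pin).
    None represents a tri-stated buffer output.
    """
    if input_enable and output_enable:
        # The two buffers must never be enabled at the same time.
        raise ValueError("input and output buffers enabled simultaneously")
    to_bus = pin_value if input_enable else None
    to_pin = bus_value if output_enable else None
    return to_bus, to_pin
```

Enabling only the input buffer passes the pin value to the internal bus; enabling only the output buffer drives the bus value onto the pin; with neither enabled, both sides are left undriven.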
18. reader.
Table 6.6: State generation for a Relatively Simple CPU
State    Function        State    Function
FETCH1   T0              JMPZY1   IJMPZ ∧ Z ∧ T3
FETCH2   T1              JMPZY2   IJMPZ ∧ Z ∧ T4
FETCH3   T2              JMPZY3   IJMPZ ∧ Z ∧ T5
NOP1     INOP ∧ T3       JMPZN1   IJMPZ ∧ Z' ∧ T3
LDAC1    ILDAC ∧ T3      JMPZN2   IJMPZ ∧ Z' ∧ T4
LDAC2    ILDAC ∧ T4      JPNZY1   IJPNZ ∧ Z' ∧ T3
LDAC3    ILDAC ∧ T5      JPNZY2   IJPNZ ∧ Z' ∧ T4
LDAC4    ILDAC ∧ T6      JPNZY3   IJPNZ ∧ Z' ∧ T5
LDAC5    ILDAC ∧ T7      JPNZN1   IJPNZ ∧ Z ∧ T3
STAC1    ISTAC ∧ T3      JPNZN2   IJPNZ ∧ Z ∧ T4
STAC2    ISTAC ∧ T4      ADD1     IADD ∧ T3
STAC3    ISTAC ∧ T5      SUB1     ISUB ∧ T3
STAC4    ISTAC ∧ T6      INAC1    IINAC ∧ T3
STAC5    ISTAC ∧ T7      CLAC1    ICLAC ∧ T3
MVAC1    IMVAC ∧ T3      AND1     IAND ∧ T3
MOVR1    IMOVR ∧ T3      OR1      IOR ∧ T3
JUMP1    IJUMP ∧ T3      XOR1     IXOR ∧ T3
JUMP2    IJUMP ∧ T4      NOT1     INOT ∧ T3
JUMP3    IJUMP ∧ T5
Finally, we generate the ALU control signals in the same manner. For example, ALUS1 = ADD1 ∨ SUB1 ∨ INAC1 and ALUS4 = SUB1 ∨ INAC1. The remaining control signals are left as exercises for the reader.
6.3.7 Design Verification. To verify the design of this CPU, the designer should prepare a trace of the execution, as was done for the Very Simple CPU. For the JMPZ and JPNZ instructions, the trace should show the execution under all possible circumstances, in this case Z = 0 and Z = 1. This is left as an exercise for the reader. To perform the trace, students may use the RS-CPU simulator package. This package is a Java applet that can be run using any
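The state-generation rules can be approximated in software by combining the decoded instruction, the Z flag, and the time signal. The sketch below is an assumption-laden model: it takes the instruction as a mnemonic string, the time signal as an integer index (T0 is 0), and follows the rule that the jump-taken states apply when Z = 1 for JMPZ and when Z = 0 for JPNZ. The name `active_state` is hypothetical.

```python
FETCH_STATES = {0: "FETCH1", 1: "FETCH2", 2: "FETCH3"}
# Number of execute states for instructions with more than one
MULTI_STATE = {"LDAC": 5, "STAC": 5, "JUMP": 3}
SINGLE_STATE = {"NOP", "MVAC", "MOVR", "ADD", "SUB", "INAC",
                "CLAC", "AND", "OR", "XOR", "NOT"}


def active_state(instruction, z, t):
    """Return the name of the active state, or None if no state applies."""
    if t < 0:
        return None
    if t in FETCH_STATES:
        return FETCH_STATES[t]      # fetch states depend only on T0..T2
    step = t - 2                    # T3 selects the first execute state
    if instruction in ("JMPZ", "JPNZ"):
        taken = (z == 1) if instruction == "JMPZ" else (z == 0)
        limit = 3 if taken else 2   # Y routines have 3 states, N routines 2
        suffix = "Y" if taken else "N"
        return f"{instruction}{suffix}{step}" if step <= limit else None
    if instruction in MULTI_STATE:
        limit = MULTI_STATE[instruction]
    elif instruction in SINGLE_STATE:
        limit = 1
    else:
        return None
    return f"{instruction}{step}" if step <= limit else None
```

For instance, `active_state("JMPZ", 1, 3)` returns `"JMPZY1"`, while `active_state("JMPZ", 0, 4)` returns `"JMPZN2"`.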
19. we can use these signals to generate the control signals for the counter of the control unit and for the components of the rest of the CPU. For the counter, we must generate the INC, CLR, and LD signals. INC is asserted when the control unit is traversing sequential states, during FETCH1, FETCH2, ADD1, and AND1. CLR is asserted at the end of each execute cycle to return to the fetch cycle; this happens during ADD2, AND2, JMP1, and INC1. Finally, as noted earlier, LD is asserted at the end of the fetch cycle, during state FETCH3. Note that each state of the CPU's state diagram drives exactly one of these three control signals. The circuit diagram for the control unit at this point is shown in Figure 6.9.
Figure 6.9: Hardwired control unit for the Very Simple CPU. [Figure: a counter with LD, INC, and CLR inputs, loaded with the value 1, IR[1..0], 0, feeding a decoder whose outputs are the state signals FETCH1, FETCH2, FETCH3, ADD1, ADD2, AND1, AND2, JMP1, and INC1.]
These state signals are also combined to create the control signals for AR, PC, DR, IR, M, the ALU, and the buffers. First consider register AR. It is loaded during states FETCH1 (AR←PC) and FETCH3 (AR←DR[5..0]). By logically ORing these two state signals together, the CPU generates the LD signal for AR. It doesn't matter which value is to be loaded into AR, at least as far as the LD signal is concerned. When the designers create the co
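The counter control signals described above can be written down as set membership tests, which also makes it easy to check that every state drives exactly one of INC, CLR, and LD. The function name `counter_controls` and the set-based representation are assumptions for illustration.

```python
INC_STATES = {"FETCH1", "FETCH2", "ADD1", "AND1"}   # move to the next state
CLR_STATES = {"ADD2", "AND2", "JMP1", "INC1"}       # return to FETCH1
LD_STATES = {"FETCH3"}                              # jump to an execute routine
ALL_STATES = INC_STATES | CLR_STATES | LD_STATES


def counter_controls(state):
    """Return the counter's INC, CLR and LD inputs for the active state."""
    if state not in ALL_STATES:
        raise ValueError(f"unknown state: {state}")
    return {
        "INC": int(state in INC_STATES),
        "CLR": int(state in CLR_STATES),
        "LD": int(state in LD_STATES),
    }
```

Summing the three values for any state always gives 1, which reflects the observation that each state asserts exactly one counter control signal.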
20. CPUs developed in this chapter. 6.4.1 More Internal Registers and Cache. One of the best ways to improve the performance of a microprocessor is to incorporate more storage within the CPU. Adding registers and cache makes it possible to replace some external memory accesses with much faster internal accesses. To illustrate this, consider the ADD instructions for the Very Simple and Relatively Simple CPUs. The ADD instruction for the Very Simple CPU adds the contents of the accumulator to that of a memory location. It requires two states: one to read the value from memory (ADD1) and another to add the two values and store the result in the accumulator (ADD2). The Relatively Simple CPU, however, adds the contents of the accumulator and register R. Because the CPU does not access memory, it executes the ADD instruction in a single state (ADD1). Removing memory accesses from other instructions by using internal registers reduces the time needed to execute the instructions in a similar manner. Having more registers within the CPU also improves performance in programs that have subroutines. Consider a program for a CPU with no internal data registers other than an accumulator. Assume this program invokes a subroutine, and this subroutine must receive six data values from the main program as passed parameters. The main program would have to write those six values to predetermined memory locations. The subroutine would
21. In general, a CPU performs the following sequence of operations.
Fetch cycle: Fetch an instruction from memory, then go to the decode cycle.
Decode cycle: Decode the instruction (that is, determine which instruction has been fetched), then go to the execute cycle for that instruction.
Execute cycle: Execute the instruction, then go to the fetch cycle and fetch the next instruction.
A generic state diagram is shown in Figure 6.1 on page 215. Note that the decode cycle does not have any states. Rather, the decode cycle is actually the multiple branches from the end of the fetch routine to each individual execute routine.
6.2 Design and Implementation of a Very Simple CPU. In this section, we specify and design a Very Simple CPU, probably the simplest CPU you will ever encounter. This CPU isn't very practical, but it is not meant to be. The sole application of this CPU is to serve as an instructional aid, to illustrate the design process without burdening the reader with too many design details. In the next section, we design a more complex CPU, which builds on the design methods presented here.
6.2.1 Specifications for a Very Simple CPU. To illustrate the CPU design process, consider this small and somewhat impractical CPU. It can access 64 bytes of memory, each byte being 8 bits wide. The CPU does this by outputting a 6-bit address on its output pins A[5..0] and reading in the 8-bit value from memory on its i
22. R = 01 the input is 1001, and so on. This would result in the assignment shown in Table 6.3.
Table 6.2: Instructions, first states, and opcodes for the Very Simple CPU
Instruction  First State  IR
ADD          ADD1         00
AND          AND1         01
JMP          JMP1         10
INC          INC1         11
Table 6.3: Counter values for the proposed mapping function
IR[1..0]  Counter Value  State
00        1000 (8)       ADD1
01        1001 (9)       AND1
10        1010 (10)      JMP1
11        1011 (11)      INC1
Although this would get to the proper execute routine, it causes a problem. Since state ADD1 has a counter value of 8 and state AND1 has a counter value of 9, what value should we assign to ADD2, and how would it be accessed from ADD1? This could be done by incorporating additional logic, but this is not the best solution for the design. Looking at the state diagram for this CPU, we see that no execute routine contains more than two states. As long as the first states of the execute routines have counter values at least two apart, it is possible to store the execute routines in sequential locations. This is accomplished by using the mapping function 1IR[1..0]0, which results in counter values of 8, 10, 12, and 14 for ADD1, AND1, JMP1, and INC1, respectively. To assign the execute routines to consecutive values, we assign ADD2 to counter value 9 and AND2 to counter value 11. Now that we have decided which decoder output is assigned to each state
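The mapping function 1IR[1..0]0 amounts to a few bit operations: a leading 1, the two opcode bits, and a trailing 0. The short sketch below computes the 4-bit counter input for each opcode; the function name `counter_load_value` is an assumption for the example.

```python
def counter_load_value(ir):
    """Build the 4-bit counter input 1,IR[1..0],0 from the 2-bit opcode."""
    return 0b1000 | ((ir & 0b11) << 1)


# First execute state reached for each opcode under this mapping
FIRST_STATE = {counter_load_value(ir): name
               for ir, name in enumerate(["ADD1", "AND1", "JMP1", "INC1"])}
```

Evaluating the function for IR = 00, 01, 10, and 11 produces 8, 10, 12, and 14, which leaves counter values 9 and 11 free for ADD2 and AND2.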
23. TCH3 → INC1
JMP 0: FETCH1 → FETCH2 → FETCH3 → JMP1
Table 6.4 shows the trace of the execution of one iteration of this program. We can see that the program processes every instruction correctly. Initially, all registers contain the value 0.
Table 6.4: Execution trace
Instruction  State   Active Signals                Operations Performed      Next State
ADD 4        FETCH1  PCBUS, ARLOAD                 AR←0                      FETCH2
             FETCH2  READ, MEMBUS, DRLOAD, PCINC   DR←04H, PC←1              FETCH3
             FETCH3  DRBUS, ARLOAD, IRLOAD         IR←00, AR←04H             ADD1
             ADD1    READ, MEMBUS, DRLOAD          DR←27H                    ADD2
             ADD2    DRBUS, ACLOAD                 AC←0 + 27H = 27H          FETCH1
AND 5        FETCH1  PCBUS, ARLOAD                 AR←1                      FETCH2
             FETCH2  READ, MEMBUS, DRLOAD, PCINC   DR←45H, PC←2              FETCH3
             FETCH3  DRBUS, ARLOAD, IRLOAD         IR←01, AR←05H             AND1
             AND1    READ, MEMBUS, DRLOAD          DR←39H                    AND2
             AND2    DRBUS, ALUSEL, ACLOAD         AC←27H ∧ 39H = 21H        FETCH1
INC          FETCH1  PCBUS, ARLOAD                 AR←2                      FETCH2
             FETCH2  READ, MEMBUS, DRLOAD, PCINC   DR←C0H, PC←3              FETCH3
             FETCH3  DRBUS, ARLOAD, IRLOAD         IR←11, AR←00H             INC1
             INC1    ACINC                         AC←21H + 1 = 22H          FETCH1
JMP 0        FETCH1  PCBUS, ARLOAD                 AR←3                      FETCH2
             FETCH2  READ, MEMBUS, DRLOAD, PCINC   DR←80H, PC←4              FETCH3
             FETCH3  DRBUS, ARLOAD, IRLOAD         IR←10, AR←00H             JMP1
             JMP1    DRBUS, PCLOAD                 PC←0                      FETCH1
6.3 Design and Implementation of a Relatively Simple CPU. The CPU designed in the previous
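The trace can be cross-checked with a small behavioral simulation of the Very Simple CPU. This is a sketch, not the hardware design: it collapses each instruction's states into straight-line register transfers, and the names `run_very_simple_cpu` and `PROGRAM` are assumptions. The memory image encodes ADD 4, AND 5, INC, and JMP 0 as 04H, 45H, C0H, and 80H, with data 27H and 39H at addresses 4 and 5.

```python
def run_very_simple_cpu(memory, instruction_count):
    """Execute instructions and return a list of (IR, AC, PC) per instruction."""
    ac = pc = ar = dr = ir = 0           # all registers start at 0
    trace = []
    for _ in range(instruction_count):
        ar = pc                          # FETCH1: AR <- PC
        dr = memory[ar]                  # FETCH2: DR <- M, PC <- PC + 1
        pc = (pc + 1) & 0x3F
        ir = (dr >> 6) & 0b11            # FETCH3: IR <- DR[7..6]
        ar = dr & 0x3F                   #         AR <- DR[5..0]
        if ir == 0b00:                   # ADD1, ADD2
            dr = memory[ar]
            ac = (ac + dr) & 0xFF
        elif ir == 0b01:                 # AND1, AND2
            dr = memory[ar]
            ac = ac & dr
        elif ir == 0b10:                 # JMP1: PC <- DR[5..0]
            pc = dr & 0x3F
        else:                            # INC1: AC <- AC + 1
            ac = (ac + 1) & 0xFF
        trace.append((ir, ac, pc))
    return trace


# 64-byte memory image holding the program and its two data values
PROGRAM = [0x04, 0x45, 0xC0, 0x80, 0x27, 0x39] + [0] * 58
```

Running four instructions leaves AC at 22H and PC back at 0. The intermediate value after AND 5 is 21H, since 27H ∧ 39H = 0010 0001.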
24. U provides this capability, and it does so by making data available on the bidirectional pins D[7..0]. For this design, this data is provided solely from DR. Most CPUs have more than one internal register for manipulating data. For this reason, the Relatively Simple CPU includes a general purpose register, R. Internal registers improve the performance of the CPU by reducing the number of times memory must be accessed. To illustrate this, consider the ADD instruction of the Very Simple CPU. After fetching and decoding the instruction, the CPU had to fetch the operand from memory before adding it to the accumulator. The Relatively Simple CPU adds the contents of register R to AC, eliminating the memory access and reducing the time needed to perform the addition. Most CPUs have several general purpose registers; this CPU has only one to illustrate the use of general purpose registers while still keeping the design relatively simple. Most CPUs contain internal data registers that cannot be accessed by the programmer. This CPU contains temporary register TR, which it uses to store data during the execution of instructions. As we will see, the CPU can use this register to save data while it fetches the address for memory reference instructions. Unlike the contents of AC or R, which are directly modified by the user, no instruction causes a permanent change in the contents of TR. Finally, most CPUs contain flag regi
25. an generate these results.

The first three changes are easy to make; we simply remove the unused connections. The fourth item is more of a bookkeeping matter than anything else. In most cases, we simply connect registers to the lowest order bits of the bus. For example, AR and PC are connected to bits 5..0 of the bus, since they are only 6-bit registers. The lone exception is IR. Since it receives data only from DR[7..6], it should be connected to the high-order 2 bits of the bus.

Now comes the tricky part. Since AC can load in one of two values, either AC + DR or AC ∧ DR, the CPU must incorporate some arithmetic and logic circuitry to generate these values. (Most CPUs contain an arithmetic/logic unit to do just that.) In terms of the data paths, the ALU must receive AC and DR as inputs and send its output to AC. There are a couple of ways to route the data to accomplish this. In this CPU, we hardwire AC as an input to and output from the ALU, and route DR as an input to the ALU via the system bus.

At this point, the CPU is capable of performing all of the required data transfers. Before proceeding, we must make sure transfers that are to occur during the same state can in fact occur simultaneously. For example, if two transfers that occur in the same state both require that data be placed on the internal bus, they could not be performed simultaneously, since only one piece of data may occupy the bus at a given time. (This is another reason f
26. assume that the CPU fetched an instruction from location 10. In FETCH1 it would perform the operation AR←PC (which has the value 10). In FETCH2 it would fetch the instruction from memory location 10 and store it in DR. Presumably the CPU would then decode the instruction and execute it, and then return to FETCH1 to fetch the next instruction. However, PC still contains the value 10, so the CPU would continuously fetch, decode, and execute the same instruction.

The next instruction to be executed is stored in the next location, 11. The CPU must increment the PC some time before it returns to the fetch routine. To make this happen, the designer has two options: have every instruction increment the PC as part of its execute routine, or increment the PC once during the fetch routine. The latter is much easier to implement, so CPUs take this approach.

SECTION 6.2 DESIGN AND IMPLEMENTATION OF A VERY SIMPLE CPU

writing the unused value. These two operations can be performed in one state as

FETCH3: IR←DR[7..6], AR←DR[5..0]

The state diagram for the fetch cycle is shown in Figure 6.2.

6.2.3 Decoding Instructions

After the CPU has fetched an instruction from memory, it must determine which instruction it has fetched so that it may invoke the correct execute routine. The state diagram represents this as a series of branches from the end of the fetch routine to the individual execute routines. For this CPU, there are four instructi
27. e ALU is shown in Figure 6.7.

[Figure 6.7: A Very Simple ALU. Inputs: AC and DR (from bus); output: to AC; select input S: control signal from control unit.]

6.2.7 Designing the Control Unit Using Hardwired Control

At this point it is possible for the CPU to perform every operation necessary to fetch, decode, and execute the entire instruction set. The next task is to design the circuitry to generate the control signals to cause the operations to occur in the proper sequence. This is the control unit of the CPU.

There are two primary methodologies for designing control units. Hardwired control uses sequential and combinatorial logic to generate control signals, whereas microsequenced control uses a lookup memory to output the control signals. Each methodology has several design variants. This chapter focuses on hardwired control; microsequenced control is covered in Chapter 7.

This Very Simple CPU requires only a very simple control unit. The simplest control unit has three components: a counter, which contains the current state; a decoder, which takes the current state and generates individual signals for each state; and some combinatorial logic to take the individual state signals and generate the control signals for each component, as well as the signals to control the counter. These signals cause the control unit to traverse the states in the proper order. A generic vers
28. e data between components. First we regroup the data transfers by destination.

AR: AR←PC; AR←AR+1; AR←DR,TR
PC: PC←PC+1; PC←DR,TR
DR: DR←M; DR←AC
IR: IR←DR
R: R←AC
TR: TR←DR
AC: AC←DR; AC←R; AC←AC+R; AC←AC−R; AC←AC+1; AC←0; AC←AC∧R; AC←AC∨R; AC←AC⊕R; AC←AC′
Z: Z←1; Z←0 (both conditional)

From these operations, we select the functions of each component.

AR and PC must be able to perform a parallel load and increment. Both registers receive their data from the internal bus.

DR, IR, R, and TR must be able to load data in parallel. For now, each register will receive its data from the internal bus. (As we will see later in the design process, this will not work, and more than one connection will have to be changed.)

AC will require a lot of work, as will Z. This CPU will utilize an ALU to perform all of these functions. The ALU will receive AC as one input and the value on the internal bus as the other input. AC will always receive its input from the ALU. The CPU will also use the output of the ALU to determine whether or not the result is 0, for the purpose of setting Z.

Although the CPU could use a register with parallel load, increment, and clear signals for AC, we will only use a register with parallel load, and have the ALU create values AC+1 and 0 when necessary. This is done to facilitate the proper setting of Z. The Z flag is implemented as a 1-bit register with parallel
29. e registers, but they are 32 bits wide, as opposed to the 16 bits of its predecessors. Intel's most recent microprocessor as of this writing, the Itanium microprocessor, has 128 general purpose integer registers and an additional 128 general purpose floating point registers.

Intel first introduced cache memory into its Pentium microprocessor, starting with 16K of cache memory. It soon increased this to 32K and further increased the amount in later processors. The Itanium microprocessor contains three levels of cache with over 4 MB of cache memory.

SECTION 6.4 SHORTCOMINGS OF THE SIMPLE CPUS

6.4.2 Multiple Buses Within the CPU

Buses are efficient media for routing data between components within a CPU. However, a bus may only contain one value at any given time. For that reason, most CPUs contain several buses. Multiple buses allow multiple data transfers to occur simultaneously, one via each bus. This reduces the time needed to fetch, decode, and execute instructions, thus improving system performance.

Consider the register section of the Relatively Simple CPU, shown in Figure 6.15. Most data transfers are routed via the 16-bit bus; every register except IR either outputs data to or inputs data from the bus. However, most of these components never need to communicate with each other. For example, it is possible to route data from R to AR, but it is never necessary to do so. If multiple buses were used, components that transfer
30. from the register array (the block containing registers B, C, D, E, H, L, SP, and PC) is wide enough for one register to place data onto the bus while another register reads the data from the bus, as when the instruction MOV B,C is executed. When data is read from memory, such as during an instruction fetch, or from an I/O device, the data is passed through the data/address buffer on to the internal data bus. From there it is read in by the appropriate register.

The control section consists of several parts. The timing and control block is equivalent to almost the entire control unit of the Relatively Simple CPU. It sequences through the states of the microprocessor and generates external control signals, such as those used to read from and write to memory. Although not shown, it also generates all of the internal control signals used to load, increment, and clear registers; to enable buffers; and to specify the function to be performed by the ALU.

The instruction decoder and machine cycle encoding block takes the current instruction (stored in the instruction register) as its input and generates state signals that are input to the timing and control block. This is similar to the function performed by the 4-to-16 decoder in the control unit of the Relatively Simple CPU, as shown in Figure 6.17. Essentially, it decodes the instruction. The decoded signals are then combined with the timing signals in the timing and control block to generate the
31. have to read the values from memory and write its results back to memory. Finally, the main program would have to read the results from memory. If the CPU contained enough registers, the main program could store the parameters in its internal registers. The subroutine would not need to access memory because the CPU already contained the data in its registers. On completion, the main program would receive the results via the registers. Overall, a large number of memory accesses are thus avoided.

As processors have become more complex, designers have included more storage within the CPU, both in registers and internal cache memory. See Historical Perspective: Storage in Intel Microprocessors.

HISTORICAL PERSPECTIVE: Storage in Intel Microprocessors

Since the introduction of its first microprocessor in 1971, Intel has steadily increased the number of general purpose registers in its microprocessors. The 4004, Intel's first microprocessor, had no general purpose registers per se, although a complete 4-chip computer, consisting of the 4001, 4002, 4003, and 4004 chips, included 16 RAM locations that were used as registers. Its successors, the 8008, 8080, and 8085, incorporated six general purpose registers, as well as an accumulator, within the processor chip itself.

The 8086 microprocessor has eight general purpose registers, as do the 80286, 80386, and 80486 microprocessors. The Pentium microprocessor also has 8 internal general purpos
32. hen load it into AR during the next state, but incrementing AR now will reduce the number of states needed to execute the LDAC instruction. Thus the first state of this execute routine is

LDAC1: DR←M, PC←PC+1, AR←AR+1

Having fetched the low-order half of the address, the CPU now must fetch the high-order half. It must also save the low-order half somewhere other than DR; otherwise it will be overwritten by the high-order half of address Γ. Here we make use of the temporary register TR. Again, the CPU must increment PC or it will not have the correct address for the next fetch routine. The second state is

LDAC2: TR←DR, DR←M, PC←PC+1

Now that the CPU contains the address, it can read the data from memory. To do this, the CPU first copies the address into AR, then reads data from memory into DR. Finally, it copies that data into the accumulator and branches back to the fetch routine. The states to perform these operations are

LDAC3: AR←DR,TR
LDAC4: DR←M
LDAC5: AC←DR

6.3.3.3 STAC Instruction

Although the STAC instruction performs the opposite operation of LDAC, it duplicates several of its states. Specifically, it fetches the memory address in exactly the same way as LDAC; states STAC1, STAC2, and STAC3 are identical to LDAC1, LDAC2, and LDAC3, respectively. Once AR contains the address, this routine must copy the data from AC to DR, then write it to
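The five LDAC states can be followed register by register in a short model. This is a minimal sketch under stated assumptions: the function name is hypothetical, memory is modeled as a Python dict from 16-bit addresses to 8-bit values, and AR is assumed to equal PC on entry (the fetch routine leaves the address of the low-order address byte in both). Transfers that share a state are written as one tuple assignment so that each uses the register values from before the state.

```python
def execute_ldac(memory, pc):
    """Run LDAC1..LDAC5 and return (AC, PC) afterwards.

    On entry, pc holds the address of the low-order half of the operand
    address; AR is assumed to hold the same value.
    """
    ar = pc
    # LDAC1: DR <- M, PC <- PC + 1, AR <- AR + 1
    dr, pc, ar = memory[ar], (pc + 1) & 0xFFFF, (ar + 1) & 0xFFFF
    # LDAC2: TR <- DR, DR <- M, PC <- PC + 1 (TR receives the old DR)
    tr, dr, pc = dr, memory[ar], (pc + 1) & 0xFFFF
    # LDAC3: AR <- DR,TR (DR is the high-order half, TR the low-order half)
    ar = (dr << 8) | tr
    # LDAC4: DR <- M
    dr = memory[ar]
    # LDAC5: AC <- DR
    ac = dr
    return ac, pc


# LDAC 1234H stored with the low-order half first, at locations 1 and 2.
memory = {0x0001: 0x34, 0x0002: 0x12, 0x1234: 0xAB}
ac, pc = execute_ldac(memory, pc=0x0001)
print(f"AC={ac:02X}H PC={pc:04X}H")
```

After the routine, PC has advanced past both address bytes, so the next fetch starts at the following instruction.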
33. his is the time savings mentioned earlier.) Thus,

ADD1: DR←M

Now that both operands are within the CPU, it can perform the actual addition in one state.

ADD2: AC←AC+DR

These two operations comprise the entire execute cycle for the ADD instruction. At this point, the ADD execute cycle would branch back to the fetch cycle to begin fetching the next instruction.

6.2.4.2 AND Instruction

The execute cycle for the AND instruction is virtually the same as that for the ADD instruction. It must fetch an operand from memory, making use of the address copied to AR during FETCH3. However, instead of adding the two values, it must logically AND the two values. The states that comprise this execute cycle are

AND1: DR←M
AND2: AC←AC∧DR

6.2.4.3 JMP Instruction

Any JMP instruction is implemented in basically the same way. The address to which the CPU must jump is copied into the program counter. Then, when the CPU fetches the next instruction, it uses this new address, thus realizing the JMP.

The execute cycle for the JMP instruction for this CPU is quite trivial. Since the address is already stored in DR[5..0], we simply copy that value into PC and go to the fetch routine. The single state which comprises this execute cycle is

JMP1: PC←DR[5..0]

In this case, we actually had a second choice. Since this value was copied into AR during FETCH3, we could have performed the opera
34. ing this CPU. First we design the state diagram for the CPU. Then we design both the necessary data paths and the control logic to realize the finite state machine, thus implementing the CPU.

6.2.2 Fetching Instructions from Memory

Before the CPU can execute an instruction, it must fetch the instruction from memory. To do this, the CPU performs the following sequence of actions.

1. Send the address to memory by placing it on the address pins A[5..0].
2. After allowing memory enough time to perform its internal decoding and to retrieve the desired instruction, send a signal to memory so that it outputs the instruction on its output pins. These pins are connected to D[7..0] of the CPU. The CPU reads this data in from those pins.

The address of the instruction to be fetched is stored in the program counter. Since A[5..0] receive their values from the address register, the first step is accomplished by copying the contents of PC to AR. Thus the first state of the fetch cycle is

FETCH1: AR←PC

Next, the CPU must read the instruction from memory. The CPU must assert a READ signal, which is output from the CPU to memory, to cause memory to output the data to D[7..0]. At the same time, the CPU must read the data in and store it in DR, since this is the only register used to access memory. By waiting until one state after FETCH1, the CPU gives memory the time to access the requested data (which is an instructi
35. ion of this type of hardwired control unit is shown in Figure 6.8.

[Figure 6.8: Generic hardwired control unit. A counter (with CLK, LD, INC, and CLR inputs) feeds a decoder, whose outputs drive logic that generates the control signals to registers, ALU, buffers, and output pins.]

For this CPU, there are a total of 9 states. Therefore, a 4-bit counter and a 4-to-16 decoder are needed. Seven of the outputs of the decoder will not be used.

The first task is to determine how best to assign states to the outputs of the decoder, and thus values in the counter. The following guidelines may help.

1. Assign FETCH1 to counter value 0 and use the CLR input of the counter to reach this state. Looking at the state diagram for this CPU, we see that every state except FETCH1 can only be reached from one other state. FETCH1 is reached from four states, the last state of each execute routine. By allocating FETCH1 to counter value 0, these four branches can be realized by asserting the CLR signal of the counter, which minimizes the amount of digital logic needed to design the control unit.
2. Assign sequential states to sequential counter values and use the INC input of the counter to traverse these states. If this is done, the control unit can traverse these sequential states by asserting the INC signal of the counter, which also reduces the digital logic needed in the control unit. This CPU would assign FETCH2 to counter value 1 and FETCH3 to counter value 2. It would
36. ion set computing debate, which is examined more closely in Chapter 11.

6.4.5 Subroutines and Interrupts

Almost all CPUs have hardware to handle subroutines, typically a stack pointer, and instructions to call and return from the subroutine. Most CPUs also have interrupt inputs to allow external hardware to interrupt the current operations of the CPU. This is useful for such things as updating the computer's display, since it is preferable for the CPU to perform useful work and to be interrupted when the screen must be refreshed, rather than spending time polling the screen controller to determine whether it needs to be updated. Interrupts are described in more detail in Chapter 10.

6.5 Real World Example: Internal Architecture of the 8085 Microprocessor

In Chapters 3 and 4, we examined the instruction set architecture of the 8085 microprocessor and a computer based on this microprocessor. In this section, we look inside the 8085 and compare its organization to that of the Relatively Simple CPU.

The internal organization of Intel's 8085 microprocessor is shown in Figure 6.19. (Note that some elements of the design, such as internal control signals, are present but not shown in the figure.) As with the other CPUs described so far, the 8085 contains a register section, an ALU, and a control unit. Note that the interrupt control and serial I/O control blocks are not exclusively a part of any one section. In fact, part of these blocks are
37. irst is to create direct paths between each pair of components that transfer data. We can use multiplexers or buffers to select one of several possible data inputs for registers that can receive data from more than one source. For example, in this CPU, AR can receive data from PC or DR[5..0], so the CPU would need a mechanism to select which one is to supply data to AR at a given time. This approach could work for this CPU because it is so small. However, as CPU complexity increases, this becomes impractical. A more sensible approach is to create a bus within the CPU and route data between components via the bus.

To illustrate the bus concept, consider an interstate highway that is 200 miles long and has about as many exits. Assume that each exit connects to one town. When building roads, the states had two choices: They could build a separate pair of roads (one in each direction) between every pair of towns, resulting in almost 40,000 roads, or one major highway with entrance and exit ramps connecting the towns. The bus is like the interstate highway: It consolidates traffic and reduces the number of roads (data paths) needed.

We begin by reviewing the data transfers that can occur to determine the functions of each individual component. Specifically, we look at the operations that load data into each component. It is not necessary to look at operations in which a component supplies
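The "almost 40,000 roads" figure in the highway analogy follows from counting pairs of towns. The short calculation below is only an illustration; the function name is hypothetical, and it assumes exactly 200 towns, one per exit.

```python
def roads_between_all_towns(towns):
    """Roads needed when every pair of towns gets one road in each direction."""
    pairs = towns * (towns - 1) // 2   # unordered pairs of towns
    return 2 * pairs                   # one road per direction for each pair


roads = roads_between_all_towns(200)
print(roads)   # 39800, i.e. "almost 40,000"
```

The same count applies to direct data paths between n components, which is why the point-to-point approach becomes impractical as the number of registers grows.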
38. memory. The states that comprise this execute routine are

STAC1: DR←M, PC←PC+1, AR←AR+1
STAC2: TR←DR, DR←M, PC←PC+1
STAC3: AR←DR,TR
STAC4: DR←AC
STAC5: M←DR

At first glance, it may appear that STAC3 and STAC4 can be combined into a single state. However, when constructing the data paths later in the design process, we decided to route both transfers via an internal bus. Since both values cannot occupy the bus simultaneously, we chose to split the state in two rather than create a separate data path. This process is not uncommon, and the designer should not be concerned about needing to modify the state diagram because of data path conflicts. Consider it one of the tradeoffs inherent to engineering design.

6.3.3.4 MVAC and MOVR Instructions

The MVAC and MOVR instructions are both fairly straightforward. The CPU simply performs the necessary data transfer in one state and goes back to the fetch routine. The states that comprise these routines are

MVAC1: R←AC

and

MOVR1: AC←R

6.3.3.5 JUMP Instruction

To execute the JUMP instruction, the CPU fetches the address just as it did for the LDAC and STAC instructions, except it does not increment PC. Instead of loading the address into AR, it copies the address into PC, so any incremented value of PC would be overwritten anyway. This instruction can be implemented using three states.

JUMP1: DR←M, AR←AR+1
JUMP2: TR←DR, DR←M
JUMP3: PC←DR,TR

6.3.3.6 JMPZ and JPNZ Instructions
39. nce. Pipelining is discussed in detail in Chapter 11.

6.4.4 Larger Instruction Sets

Having a larger number of instructions in a processor's instruction set generally allows a program to perform a function using fewer instructions. For example, consider a CPU that can logically AND two values and complement one value, but cannot logically OR two values. To perform a logical OR of two values, A and B, it would perform the following instructions:

Complement A
Complement B
AND A and B
Complement the result

If this CPU contained an OR instruction, only one instruction would be needed instead of four.

[Figure 6.18: Register section for the Relatively Simple CPU using multiple buses (16-bit address bus, 8-bit data bus 1, 8-bit data bus 2).]

There is considerable debate over how many instructions a CPU should have. As the number of instructions increases, so does the time needed to decode the instructions, which limits the clock speed of the CPU. This is the basis of the complex versus reduced instruct
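The four-instruction OR sequence described above is an application of De Morgan's law, A ∨ B = (A′ ∧ B′)′. A minimal sketch of the same steps on 8-bit values follows; the function name and the width parameter are illustrative only.

```python
def or_using_and_and_complement(a, b, width=8):
    """Compute a OR b using only complement and AND, as in the four-step sequence."""
    mask = (1 << width) - 1
    a = ~a & mask            # Complement A
    b = ~b & mask            # Complement B
    result = a & b           # AND A and B
    return ~result & mask    # Complement the result


print(f"{or_using_and_and_complement(0x27, 0x39):02X}H")   # same value as 0x27 | 0x39
```

Each line of the function body corresponds to one instruction on the CPU without OR, which makes the four-to-one instruction count difference concrete.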
40. nputs D[7..0].

This CPU will have only one programmer-accessible register, an 8-bit accumulator labeled AC. It has only four instructions in its instruction set, as shown in Table 6.1.

Table 6.1 Instruction set for the Very Simple CPU

Instruction  Instruction Code  Operation
ADD          00AAAAAA          AC←AC+M[AAAAAA]
AND          01AAAAAA          AC←AC∧M[AAAAAA]
JMP          10AAAAAA          GOTO AAAAAA
INC          11XXXXXX          AC←AC+1

As noted earlier, this is a fairly impractical CPU for several reasons. For example, although it can perform some computations, it cannot output the results.

In addition to AC, this CPU needs several additional registers to perform the internal operations necessary to fetch, decode, and execute instructions. The registers in this CPU are fairly standard and are found in many CPUs; their sizes vary depending on the CPU in which they are used. This CPU contains the following registers.

• A 6-bit address register, AR, which supplies an address to memory via A[5..0]
• A 6-bit program counter, PC, which contains the address of the next instruction to be executed
• An 8-bit data register, DR, which receives instructions and data from memory via D[7..0]
• A 2-bit instruction register, IR, which stores the opcode portion of the instruction code fetched from memory

A CPU is just a complex finite state machine, and that dictates the approach we take in design
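The instruction codes in Table 6.1 pack a 2-bit opcode and a 6-bit address into one byte. The helper functions below are a hedged sketch of that packing; `encode` and `decode` are hypothetical names, and setting the don't-care bits of INC to 0 is an assumption made here for definiteness.

```python
# Opcode field (two high-order bits of the instruction code), from Table 6.1.
OPCODES = {"ADD": 0b00, "AND": 0b01, "JMP": 0b10, "INC": 0b11}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic, address=0):
    """Pack a 2-bit opcode and a 6-bit address into an 8-bit instruction code."""
    if not 0 <= address < 64:
        raise ValueError("address must fit in 6 bits")
    return (OPCODES[mnemonic] << 6) | address


def decode(code):
    """Split an 8-bit instruction code into (mnemonic, 6-bit address field)."""
    return MNEMONICS[(code >> 6) & 0b11], code & 0x3F


for mnemonic, address in [("ADD", 4), ("AND", 5), ("INC", 0), ("JMP", 0)]:
    print(f"{mnemonic} {address}: {encode(mnemonic, address):02X}H")
```

The split performed by `decode` is the same one the CPU needs in hardware: the two high-order bits select the instruction and the six low-order bits form the address.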
41. ntrol signals for the buffers, they will ensure that the proper data is placed on the bus and made available to AR. Following this procedure, we create the following control signals for PC, DR, AC, and IR:

PCLOAD = JMP1
PCINC = FETCH2
DRLOAD = FETCH2 ∨ ADD1 ∨ AND1
ACLOAD = ADD2 ∨ AND2
ACINC = INC1
IRLOAD = FETCH3

The ALU has one control input, ALUSEL. When ALUSEL = 0, the output of the ALU is the arithmetic sum of its two inputs; if ALUSEL = 1, the output is the logical AND of its inputs. Setting ALUSEL = AND2 routes the correct data from the ALU to AC when the CPU is executing an ADD or AND instruction. At other times, during the fetch cycle and the other execute cycles, the ALU is still outputting a value to AC. However, since AC does not load this value, the value output by the ALU does not cause any problems.

Many of the operations use data from the internal system bus. The CPU must enable the buffers so the correct data is placed on the bus at the proper time. Again looking at the operations that occur during each state, we can generate the enable signals for the buffers. For example, DR must be placed onto the bus during FETCH3 (IR←DR[7..6], AR←DR[5..0]), ADD2 (AC←AC+DR), AND2 (AC←AC∧DR), and JMP1 (PC←DR[5..0]). (Recall that the ALU receives DR input via the internal bus.) Logically ORing these state values produces the DRBUS signal. This procedure is used to generate the enable signals for the other buffers as well.
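Each control signal above is the logical OR of the states in which it must be asserted, so the whole set can be written as a lookup from signal name to states. The sketch below is illustrative rather than the book's circuit. DRLOAD is taken to include FETCH2, the state in which DR←M occurs during the fetch. The entries for ARLOAD, PCBUS, READ, and MEMBUS are not derived in this passage; they are read off the active-signal column of the execution trace (Table 6.4) and should be treated as assumptions.

```python
# Signal name -> set of states whose OR produces that signal.
CONTROL_SIGNALS = {
    "PCLOAD": {"JMP1"},
    "PCINC": {"FETCH2"},
    "DRLOAD": {"FETCH2", "ADD1", "AND1"},
    "ACLOAD": {"ADD2", "AND2"},
    "ACINC": {"INC1"},
    "IRLOAD": {"FETCH3"},
    "ALUSEL": {"AND2"},
    "DRBUS": {"FETCH3", "ADD2", "AND2", "JMP1"},
    # Taken from the Table 6.4 trace, not from the derivation above:
    "ARLOAD": {"FETCH1", "FETCH3"},
    "PCBUS": {"FETCH1"},
    "READ": {"FETCH2", "ADD1", "AND1"},
    "MEMBUS": {"FETCH2", "ADD1", "AND1"},
}


def active_signals(state):
    """Return the names of all control signals asserted in the given state."""
    return sorted(name for name, states in CONTROL_SIGNALS.items() if state in states)


for state in ("FETCH1", "FETCH2", "FETCH3", "AND2", "JMP1"):
    print(state, active_signals(state))
```

In hardware each entry becomes one OR gate fed by decoder outputs; a signal driven by a single state needs no gate at all.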
42. on in this case). The net result is, at first,

FETCH2: DR←M

In fact, there is another operation that will be performed here. We must also increment the program counter, so FETCH2 should actually be as follows:

FETCH2: DR←M, PC←PC+1

See Practical Perspective: Why a CPU Increments PC During the Fetch Cycle for the reasoning behind this.

Finally, there are two other things that the CPU will do as part of the fetch routine. First, it copies the two high-order bits of DR to IR. As shown in Table 6.1, these two bits indicate which instruction is to be executed. As we will see in the design of the control logic, it is necessary to save this value in a location other than DR so it will be available to the control unit. Also, the CPU copies the six low-order bits of DR to AR during the fetch routine. For the ADD and AND instructions, these bits contain the memory address of one of the operands for the instruction. Moving the address to AR here will result in one less state in the execute routines for these instructions. For the other two instructions, it will not cause a problem. They do not need to access memory again, so they just won't use the value loaded into AR. Once they return to the FETCH routine, FETCH1 will load PC into AR, over

PRACTICAL PERSPECTIVE: Why a CPU Increments PC During the Fetch Cycle

To see why a CPU increments the program counter during FETCH2, consider what would happen if it did not increment PC. For example
43. one particularly unusual feature of the state diagram in Figure 6.11. Two of the instructions, JMPZ and JPNZ, have two different execute routines. These conditional instructions will be executed in one of two ways, depending on the value of Z. Either they will jump to address Γ or they will not. Each execute routine implements one of these two possibilities; the value of Z determines which is selected.

[Figure: Fetch and decode cycles for the Relatively Simple CPU. After the fetch states, the diagram branches on the value of IR (00 through 0F) to the execute cycle of NOP, LDAC, STAC, MVAC, MOVR, JUMP, JMPZ, JPNZ, ADD, SUB, INAC, CLAC, AND, OR, XOR, or NOT; JMPZ and JPNZ each have two execute cycles, selected by Z = 1 or Z = 0.]

6.3.3 Executing Instructions

The final task in creating the state diagram for this CPU is to prepare the state diagrams for the execute routines. As before, we develop them individually and combine them into a final state diagram.

6.3.3.1 NOP Instruction

The NOP is the easiest instruction to implement. The CPU does nothing and then goes to the fetch routine to fetch the next instruction. This could be accomplished either by having the fetch r
44. ons, and thus four execute routines. The value in IR (00, 01, 10, or 11) determines which execute routine is invoked. The state diagram for the fetch and decode cycles is shown in Figure 6.3.

[Figure: Fetch cycle for the Very Simple CPU: FETCH1 → FETCH2 → FETCH3.]

6.2.4 Executing Instructions

To complete the state diagram for this CPU, we must develop the state diagram for each execute routine. Now we design the portion of the state diagram for each execute routine and the overall design for the CPU. The state diagrams for the individual execute routines are fairly simple, so they are only included in the diagram of the finite state machine for the entire CPU.

6.2.4.1 ADD Instruction

In order to perform the ADD instruction, the CPU must do two things. First, it must fetch one operand from memory. Then it must add this operand to the current contents of the accumulator and store the result back into the accumulator.

[Figure 6.3: Fetch and decode cycles for the Very Simple CPU: FETCH1 → FETCH2 → FETCH3, then a branch to the ADD, AND, JMP, or INC execute cycle.]

To fetch the operand from memory, the CPU must first make its address available via A[5..0], just as it did to fetch the instruction from memory. This is done by moving the address into AR. However, this was already done in FETCH3, so the CPU can simply read the value in immediately. (T
45. or implementing PC←PC+1 by using a counter for PC; if that value was routed via the bus, both operations during FETCH2 would have required the bus.) As it is, no state of the state diagram for this CPU would require more than one value to be placed on the bus, so this design is OK in that respect.

The modified version of the internal organization of the CPU is shown in Figure 6.6. The control signals shown will be generated by the control unit.

[Figure 6.6: Final register section for the Very Simple CPU. Labeled signals: READ, MEMBUS, ARLOAD, PCLOAD, PCINC, DRBUS, ALUSEL, ACLOAD, ACINC, CLK; pins A[5..0] and D[7..0]; 8-bit bus.]

6.2.6 Design of a Very Simple ALU

The ALU for this CPU performs only two functions: adds its two inputs or logically ANDs its two inputs. The simplest way to design this ALU is to create separate hardware to perform each function and then use a multiplexer to output one of the two results. The addition is implemented using a standard 8-bit parallel adder. The logical AND operation is implemented using eight 2-input AND gates. The outputs of the parallel adder and the AND gates are input to an 8-bit 2-to-1 multiplexer. The control input of the MUX is called S (for select). The circuit diagram for th
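The adder, AND gates, and multiplexer described above can be modeled in a few lines. This is a behavioral sketch only: the function name is hypothetical, S = 0 is assumed to select the adder and S = 1 the AND gates (matching the ALUSEL description used for the control unit), and the adder's carry out is assumed to be discarded because AC is 8 bits wide.

```python
def very_simple_alu(ac, dr, s):
    """Behavioral model of the two-function ALU.

    Both function units always compute their result; the 2-to-1
    multiplexer, controlled by s, picks which one is passed to AC.
    """
    adder_out = (ac + dr) & 0xFF   # 8-bit parallel adder, carry out dropped
    and_out = ac & dr              # eight 2-input AND gates
    return and_out if s else adder_out


print(f"{very_simple_alu(0x00, 0x27, 0):02X}H")   # ADD2 step of the trace: 27H
print(f"{very_simple_alu(0x27, 0x39, 1):02X}H")   # AND2 step of the trace: 21H
```

Computing both results and selecting one afterwards mirrors the hardware, where the adder and the AND gates operate in parallel on every clock cycle.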
46. or this Relatively Simple CPU is its instruction set, shown in Table 6.5.

Table 6.5 Instruction set for a Relatively Simple CPU

Instruction  Instruction Code  Operation
NOP          0000 0000         No operation
LDAC         0000 0001 Γ       AC←M[Γ]
STAC         0000 0010 Γ       M[Γ]←AC
MVAC         0000 0011         R←AC
MOVR         0000 0100         AC←R
JUMP         0000 0101 Γ       GOTO Γ
JMPZ         0000 0110 Γ       IF (Z=1) THEN GOTO Γ
JPNZ         0000 0111 Γ       IF (Z=0) THEN GOTO Γ
ADD          0000 1000         AC←AC+R, IF (AC+R=0) THEN Z←1 ELSE Z←0
SUB          0000 1001         AC←AC−R, IF (AC−R=0) THEN Z←1 ELSE Z←0
INAC         0000 1010         AC←AC+1, IF (AC+1=0) THEN Z←1 ELSE Z←0
CLAC         0000 1011         AC←0, Z←1
AND          0000 1100         AC←AC∧R, IF (AC∧R=0) THEN Z←1 ELSE Z←0
OR           0000 1101         AC←AC∨R, IF (AC∨R=0) THEN Z←1 ELSE Z←0
XOR          0000 1110         AC←AC⊕R, IF (AC⊕R=0) THEN Z←1 ELSE Z←0
NOT          0000 1111         AC←AC′, IF (AC′=0) THEN Z←1 ELSE Z←0

As in the Very Simple CPU, this Relatively Simple CPU contains several registers in addition to those specified in its instruction set architecture. (Differences between these registers and those of the Very Simple CPU are italicized.)

• A 16-bit address register, AR, which supplies an address to memory via A[15..0]
• A 16-bit program counter, PC, which contains the address of the next instruction to be executed or the address of the next required operand of the instruction
• An
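The arithmetic and logic instructions in the table all follow one pattern: compute a new value for AC, then set Z to 1 if that value is 0 and to 0 otherwise. The sketch below models that pattern. The function name is hypothetical, and it assumes AC and R are 8 bits wide with results wrapping modulo 256, so that Z reflects the value actually stored in AC.

```python
def execute_alu_instruction(mnemonic, ac, r):
    """Return (new AC, new Z) for one of the instructions that update Z."""
    results = {
        "ADD": ac + r,
        "SUB": ac - r,
        "INAC": ac + 1,
        "CLAC": 0,
        "AND": ac & r,
        "OR": ac | r,
        "XOR": ac ^ r,
        "NOT": ~ac,
    }
    new_ac = results[mnemonic] & 0xFF    # keep the 8 bits that fit in AC
    z = 1 if new_ac == 0 else 0          # Z is set when the result is 0
    return new_ac, z


for name in ("ADD", "SUB", "AND", "XOR", "NOT"):
    new_ac, z = execute_alu_instruction(name, 0x27, 0x27)
    print(f"{name:4s} AC={new_ac:02X}H Z={z}")
```

CLAC needs no special case for Z: its result is always 0, so the general rule yields Z = 1, as the table specifies.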
47. outine branch back to its own beginning or by creating a single state that does nothing as the execute routine. In this CPU we use the latter approach. The state diagram for this execute routine contains the single state

NOP1: (No operation)

6.3.3.2 LDAC Instruction

LDAC is the first of the multiword instructions in this CPU. It contains three words: the opcode, the low-order half of the address, and the high-order half of the address. The execute routine must get the address from memory, then get data from that memory location and load it into the accumulator.

Remember that after the instruction has been fetched from memory, the program counter contains the next address in memory. If the instruction consisted of a single byte, the PC would contain the address of the next instruction. Here, however, it contains the address of the first operand, the low-order half of the address Γ. This CPU uses this value of PC to access the address.

First the CPU must get the address from memory. Since the address of the low-order half of address Γ was loaded into AR during FETCH3, this value can now be read in from memory. The CPU must also do two other things at this time: Because the CPU has read in the data whose address is stored in PC, it must increment PC; and because it will need to get the high-order half of the address from the next memory location, it must also increment AR. The CPU could simply increment PC now and t
48. ow the modifications necessary for the control unit. Include the hardware needed to generate any new or modified control signals.

22. Verify the functioning of the CPU of Problems 19, 20, and 21 for the new instruction.

23. Modify the Relatively Simple CPU to include a new 8-bit register, B, and five new instructions as follows. Show the modified state diagram and RTL code for this CPU.

Instruction   Instruction Code   Operation
ADDB          0001 1000          AC←AC+B
SUBB          0001 1001          AC←AC-B
ANDB          0001 1100          AC←AC∧B
ORB           0001 1101          AC←AC∨B
XORB          0001 1110          AC←AC⊕B

24. For the CPU of Problem 23, show the modifications necessary for the register section and the ALU.

25. For the CPU of Problem 23, show the modifications necessary for the control unit. Include the hardware needed to generate any new or modified control signals.

26. Verify the functioning of the CPU of Problems 23, 24, and 25 for the new instructions.

27. For the Relatively Simple CPU, assume the CLAC and INAC instructions are implemented via the CLR and INC signals of the AC register instead of through the ALU. Modify the input and control signals of Z so it is set properly for all instructions.

28. Design a CPU that meets the following specifications.

• It can access 64 words of memory, each word being 8 bits wide. The CPU does this by outputting a 6-bit address on its output pins A[5..0] and reading in the 8-bit value from memory on its inputs D[7..0]. The
49. rol unit logically ANDs the correct time value with the output of the instruction decoder corresponding to the proper instruction. For example, the states of the LDAC execute routine are

LDAC1 = ILDAC ∧ T3
LDAC2 = ILDAC ∧ T4
LDAC3 = ILDAC ∧ T5
LDAC4 = ILDAC ∧ T6
LDAC5 = ILDAC ∧ T7

[Figure: Hardwired control unit for the Relatively Simple CPU. Instruction decoder outputs: INOP, ILDAC, ISTAC, IMVAC, IMOVR, IJUMP, IJMPZ, IJPNZ, IADD, ISUB, IINAC, ICLAC, IAND, IOR, IXOR, INOT. The time counter, clocked by CLK, drives a decoder with outputs 0 through 8.]

The complete list of states is given in Table 6.6.

Having generated the states, we must generate the signals to supply the CLR and INC inputs of the time counter. The counter is cleared only at the end of each execute routine. To do this, we logically OR the last state of each execute routine to generate the CLR input. The INC input should be asserted at all other times, so it can be implemented by logically ORing the remaining states together. As an alternative, the INC input can be the complement of the CLR input, since, if the control unit is not clearing the counter, it is incrementing the counter.

Following the same procedure we used for the Very Simple CPU, we generate the register and buffer control signals. Table 6.7 shows the values for the buffers and AR. The remaining control signals are left as design problems for the
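The state-generation rule described in this passage (one decoder output ANDed with one time value, with CLR formed as the OR of each routine's last state) can be sketched in Python. This is only an illustration under stated assumptions, not the book's hardware: the names `ROUTINE_LENGTH`, `active_state`, `counter_controls` and `trace` are hypothetical, and only NOP, LDAC and ADD are modelled, using the state counts that appear in the text (NOP1, LDAC1 through LDAC5, ADD1).

```python
# Number of execute states per instruction (a subset, for illustration only).
ROUTINE_LENGTH = {"NOP": 1, "LDAC": 5, "ADD": 1}
FETCH_STATES = ["FETCH1", "FETCH2", "FETCH3"]  # assigned to T0, T1, T2


def active_state(instruction: str, t: int) -> str:
    """Return the state selected by the decoded instruction and time value t."""
    if t < 3:
        # Fetch states ignore the instruction decoder entirely.
        return FETCH_STATES[t]
    k = t - 3  # T3 is the first execute state, T4 the second, and so on
    if k >= ROUTINE_LENGTH[instruction]:
        raise ValueError("time value beyond the end of the execute routine")
    return f"{instruction}{k + 1}"


def counter_controls(instruction: str, t: int) -> tuple[bool, bool]:
    """Return (CLR, INC) for the time counter."""
    last_states = {f"{name}{n}" for name, n in ROUTINE_LENGTH.items()}
    clr = active_state(instruction, t) in last_states  # OR of all last states
    inc = not clr                                      # INC = complement of CLR
    return clr, inc


def trace(instruction: str) -> list[str]:
    """Step the time counter from T0 until CLR is asserted."""
    t, visited = 0, []
    while True:
        visited.append(active_state(instruction, t))
        clr, _ = counter_controls(instruction, t)
        if clr:
            return visited
        t += 1
```

Calling `trace("LDAC")` walks through the three fetch states and then LDAC1 to LDAC5, at which point CLR returns the counter to T0. Using the complement of CLR for INC follows the alternative mentioned in the text.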
50. rs and combines their outputs to generate the state value. One value is the opcode of the instruction. The other is a counter to keep track of which state in the fetch or execute routine should be active.

[Figure 6.16: A Relatively Simple ALU]

The opcode value is relatively easy to design. The opcode is stored in IR, so the control unit can use that register's outputs as inputs to a decoder. Since the instruction codes are all of the form 0000 XXXX, we only need to decode the four low-order bits. We NOR together the four high-order bits to enable the decoder.

Then the counter can be set up so that it only has to be incremented and cleared, and never loaded; this greatly simplifies the design. These components, and the labels assigned to their outputs, are shown in Figure 6.17.

The fetch routine is the only routine that does not use a value from the instruction decoder. Since the instruction is still being fetched during these states, this decoder could have any value during the instruction fetch. Just as with the Very Simple CPU, this control unit assigns T0 to FETCH1, since it can be reached by clearing the time counter. We assign T1 and T2 to FETCH2 and FETCH3, respectively.

The states of the execute routines depend on both the opcode and time counter values. T3 is the first time state of each execute routine, T4 is the second, and so on. The cont
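The opcode decoding described in this passage (a 4-to-16 decoder on the low-order bits of IR, enabled by the NOR of the high-order bits) can be modelled in a few lines of Python. This is a sketch only; the function names `decode_opcode` and `active_instruction` are hypothetical, and the output labels follow the instruction order of the instruction codes 0000 0000 through 0000 1111.

```python
# Decoder output labels, indexed by the four low-order bits of IR.
NAMES = ["INOP", "ILDAC", "ISTAC", "IMVAC", "IMOVR", "IJUMP", "IJMPZ", "IJPNZ",
         "IADD", "ISUB", "IINAC", "ICLAC", "IAND", "IOR", "IXOR", "INOT"]


def decode_opcode(ir: int) -> list[int]:
    """Return the 16 decoder outputs (0 or 1) for an 8-bit IR value."""
    high = (ir >> 4) & 0xF
    low = ir & 0xF
    enable = int(high == 0)  # NOR of the four high-order bits
    return [enable & int(low == i) for i in range(16)]


def active_instruction(ir: int) -> list[str]:
    """Return the labels of all asserted decoder outputs (at most one)."""
    return [name for name, bit in zip(NAMES, decode_opcode(ir)) if bit]
```

For instance, `active_instruction(0b00000001)` yields `["ILDAC"]`, while any code with a nonzero high-order nibble leaves the decoder disabled and yields an empty list.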
51. section is named appropriately. It is, indeed, very simple. It illustrated design methods that are too simple to handle the complexity of a larger CPU. This section presents the design of a more complex, but still relatively simple, CPU. This CPU has a larger instruction set with more complex instructions. Its design follows the same general procedure used to design the Very Simple CPU.

6.3.1 Specifications for a Relatively Simple CPU

Chapter 3 introduced the instruction set architecture for the Relatively Simple CPU. This CPU can access 64K bytes of memory, each 8 bits wide, via address pins A[15..0] and bidirectional data pins D[7..0].

Three registers in the ISA of this processor can be directly controlled by the programmer. The 8-bit accumulator, AC, receives the result of any arithmetic or logical operation and provides one of the operands for arithmetic and logical instructions that use two operands. Whenever data is loaded from memory, it is loaded into the accumulator; data stored to memory also comes from AC. Register R is an 8-bit general purpose register. It supplies the second operand of all two-operand arithmetic and logical instructions. It can also be used to temporarily store data that the accumulator will soon need to access. Finally, there is a 1-bit zero flag, Z, which is set whenever an arithmetic or logical instruction is executed.

The final component of the instruction set architecture f
52. standard Web browser with Java enabled. Using this package, the reader can enter a program and step through the fetch, decode, and execution of the individual instructions. The package may be accessed at the textbook's companion Web site, along with its instructions.

Table 6.7: Control signal values for a Relatively Simple CPU

Signal    Value
PCBUS     FETCH1 ∨ FETCH3
DRHBUS    LDAC3 ∨ STAC3 ∨ JUMP3 ∨ JMPZY3 ∨ JPNZY3
DRLBUS    LDAC5 ∨ STAC5
TRBUS     LDAC3 ∨ STAC3 ∨ JUMP3 ∨ JMPZY3 ∨ JPNZY3
RBUS      MOVR1 ∨ ADD1 ∨ SUB1 ∨ AND1 ∨ OR1 ∨ XOR1
ACBUS     STAC4 ∨ MVAC1
MEMBUS    FETCH2 ∨ LDAC1 ∨ LDAC2 ∨ LDAC4 ∨ STAC1 ∨ STAC2 ∨ JUMP1 ∨ JUMP2 ∨ JMPZY1 ∨ JMPZY2 ∨ JPNZY1 ∨ JPNZY2
BUSMEM    STAC5
ARLOAD    FETCH1 ∨ FETCH3 ∨ LDAC3 ∨ STAC3
ARINC     LDAC1 ∨ STAC1 ∨ JUMP1 ∨ JMPZY1 ∨ JPNZY1

6.4 Shortcomings of the Simple CPUs

The CPUs presented in this chapter were designed as educational tools. Although they share many features with commonly used microprocessors, they are not representative of the current state of CPU design. Several common features were excluded from the Very Simple and Relatively Simple CPUs in an attempt to incorporate the essential features without overwhelming the reader. Consider the feature sets of these CPUs to be the result of an engineering education design tradeoff.

Following are some of the features found in many CPUs that are not present in either of the
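Because every entry of Table 6.7 is a logical OR of states, the table can be read as a lookup from the active state to the set of asserted signals. The Python sketch below encodes the table rows exactly as listed; the names `CONTROL_SIGNALS` and `asserted_signals` are hypothetical and exist only for this illustration.

```python
# Each control signal is asserted when any one of its listed states is active.
CONTROL_SIGNALS = {
    "PCBUS": {"FETCH1", "FETCH3"},
    "DRHBUS": {"LDAC3", "STAC3", "JUMP3", "JMPZY3", "JPNZY3"},
    "DRLBUS": {"LDAC5", "STAC5"},
    "TRBUS": {"LDAC3", "STAC3", "JUMP3", "JMPZY3", "JPNZY3"},
    "RBUS": {"MOVR1", "ADD1", "SUB1", "AND1", "OR1", "XOR1"},
    "ACBUS": {"STAC4", "MVAC1"},
    "MEMBUS": {"FETCH2", "LDAC1", "LDAC2", "LDAC4", "STAC1", "STAC2",
               "JUMP1", "JUMP2", "JMPZY1", "JMPZY2", "JPNZY1", "JPNZY2"},
    "BUSMEM": {"STAC5"},
    "ARLOAD": {"FETCH1", "FETCH3", "LDAC3", "STAC3"},
    "ARINC": {"LDAC1", "STAC1", "JUMP1", "JMPZY1", "JPNZY1"},
}


def asserted_signals(state: str) -> list[str]:
    """Return, in alphabetical order, the signals asserted in the given state."""
    return sorted(sig for sig, states in CONTROL_SIGNALS.items() if state in states)
```

For example, `asserted_signals("LDAC3")` returns ARLOAD, DRHBUS and TRBUS, the three signals whose rows in the table list LDAC3.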
53. ster that holds the address of the top of the stack. The CPU must realize the following additional instructions. Note that operations separated by semicolons occur sequentially, and operations separated by commas occur simultaneously. Also note that the value of PC used by the CALL instruction is the value of PC after T has been fetched from memory.

Instruction   Instruction Code   Operation
LDSP          1000 0000 T        SP←T
CALL          1000 0010 T        SP←SP-1; M[SP]←PC[15..8]; SP←SP-1; M[SP]←PC[7..0]; PC←T
RET           1000 0011          PC[7..0]←M[SP]; SP←SP+1; PC[15..8]←M[SP]; SP←SP+1
PUSHAC        1000 0100          SP←SP-1; M[SP]←AC
POPAC         1000 0101          AC←M[SP]; SP←SP+1
PUSHR         1000 0110          SP←SP-1; M[SP]←R
POPR          1000 0111          R←M[SP]; SP←SP+1
54. sters or flags, which show the result of a previous operation. Typical flags indicate whether or not an operation generated a carry, the sign of the result, or the parity of the result. The Relatively Simple CPU contains a zero flag, Z, which is set to 1 if the last arithmetic or logical operation produced a result equal to 0. Not every instruction changes the contents of Z in this and other CPUs. For example, an ADD instruction sets Z, but a MOVR (move data from R into AC) instruction does not. Most CPUs contain conditional instructions that perform different operations, depending on the value of a given flag. The JMPZ and JPNZ instructions for this CPU fall into this category.

6.3.2 Fetching and Decoding Instructions

This CPU fetches instructions from memory in exactly the same way as the Very Simple CPU does, except at the end of the fetch cycle. Here, IR is 8 bits and receives the entire contents of DR. Also, AR←PC instead of DR[5..0], because that is the next address it would need to access. The fetch cycle thus becomes

FETCH1: AR←PC
FETCH2: DR←M, PC←PC+1
FETCH3: IR←DR, AR←PC

The state diagram for this fetch cycle is exactly the same as that of the Very Simple CPU, shown in Figure 6.2.

We also follow the same process for decoding instructions that we used for the Very Simple CPU. Here, IR is 8 bits wide and there will be more possible branches. The state diagram for the fetch and decode cycles is shown in Figure 6.11. There is
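The three fetch states can be traced with a small Python model. This is a minimal sketch, assuming memory is a 64K-byte array and that PC wraps at 16 bits; the function name `fetch` and the returned dictionary layout are hypothetical.

```python
def fetch(memory: bytearray, pc: int) -> dict[str, int]:
    """Run FETCH1 to FETCH3 once and return the resulting register values."""
    # FETCH1: AR <- PC
    ar = pc
    # FETCH2: DR <- M, PC <- PC + 1 (both happen in the same state)
    dr = memory[ar]
    pc = (pc + 1) & 0xFFFF  # assumption: the 16-bit PC wraps around
    # FETCH3: IR <- DR, AR <- PC (AR now points at the byte after the opcode)
    ir = dr
    ar = pc
    return {"IR": ir, "DR": dr, "AR": ar, "PC": pc}
```

After the fetch of a multiword instruction such as LDAC, AR already holds the address of the first operand byte, which is why the execute routine can read it immediately.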
55. sult in AC and then set Z. It must perform both operations simultaneously. For now we simply specify the states and defer the implementation until later in the design process. The states for these execute routines are as follows.

ADD1: AC←AC+R, IF (AC+R=0) THEN Z←1 ELSE Z←0
SUB1: AC←AC-R, IF (AC-R=0) THEN Z←1 ELSE Z←0
INAC1: AC←AC+1, IF (AC+1=0) THEN Z←1 ELSE Z←0
CLAC1: AC←0, Z←1
AND1: AC←AC∧R, IF (AC∧R=0) THEN Z←1 ELSE Z←0
OR1: AC←AC∨R, IF (AC∨R=0) THEN Z←1 ELSE Z←0
XOR1: AC←AC⊕R, IF (AC⊕R=0) THEN Z←1 ELSE Z←0
NOT1: AC←AC', IF (AC'=0) THEN Z←1 ELSE Z←0

The state diagram for this entire CPU is shown in Figure 6.12.

[Figure 6.12: Complete state diagram for the Relatively Simple CPU]

6.3.4 Establishing Data Paths

As with the Very Simple CPU, the Relatively Simple CPU uses an internal data bus to mov
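The single-state execute routines listed here all follow one pattern: compute a new AC value and set Z from that same value. A hedged Python sketch follows; the function name `execute` is hypothetical, and it assumes the zero test is applied to the 8-bit result that is actually stored in AC, so that, for example, 0xFF + 1 counts as zero.

```python
def execute(state: str, ac: int, r: int) -> tuple[int, int]:
    """Return (new AC, new Z) for one of the single-state ALU routines."""
    results = {
        "ADD1": ac + r,
        "SUB1": ac - r,
        "INAC1": ac + 1,
        "CLAC1": 0,
        "AND1": ac & r,
        "OR1": ac | r,
        "XOR1": ac ^ r,
        "NOT1": ~ac,
    }
    new_ac = results[state] & 0xFF   # keep only the 8 bits stored in AC
    z = 1 if new_ac == 0 else 0      # Z is derived from the same result
    return new_ac, z
```

Returning both values from one call mirrors the requirement that AC and Z be updated simultaneously rather than in two separate states.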
56. the contents of AC.

LDAC5: AC←DR
MOVR1: AC←R
ADD1: AC←AC+R
SUB1: AC←AC-R
INAC1: AC←AC+1
CLAC1: AC←0
AND1: AC←AC∧R
OR1: AC←AC∨R
XOR1: AC←AC⊕R
NOT1: AC←AC'

[Figure: Final register section for the Relatively Simple CPU, built around a 16-bit bus. Signals shown include READ, WRITE, MEMBUS, BUSMEM, PCLOAD, PCINC, DRLBUS, DRLOAD, TRBUS, IRLOAD, RLOAD, ACLOAD, ZLOAD, and CLK.]

An arithmetic/logic unit (ALU) can be designed just as its name implies. We can design one section to perform the arithmetic instructions and another section to perform the logical instructions. A multiplexer selects data from the correct section for output to AC.

First we design the arithmetic section. To do this, we rewrite the arithmetic instructions to indicate the source of their operands.

LDAC5: AC←BUS
MOVR1: AC←BUS
ADD1: AC←AC+BUS
SUB1: AC←AC-BUS
INAC1: AC←AC+1
CLAC1: AC←0

Each of these instructions can be implemented by using a parallel adder with carry in by modifying the input values, rewriting each operation as the sum of two
57. the data or one of the operands; that will be taken care of when we look at the component whose value is being changed. First we regroup the operations, without regard for the cycles in which they occur, by the register whose contents they modify. This results in the following.

AR: AR←PC; AR←DR[5..0]
PC: PC←PC+1; PC←DR[5..0]
DR: DR←M
IR: IR←DR[7..6]
AC: AC←AC+DR; AC←AC∧DR; AC←AC+1

Now we examine the individual operations to determine which functions can be performed by each component. AR, DR, and IR always load data from some other component, made available by the bus, so they only need to be able to perform a parallel load. PC and AC can load data from external sources, but they both need to be able to increment their values. We could create separate hardware that would increment the current contents of each register and make it available for the register to load back in, but it is easier to design each register as a counter with parallel load capability. In that way, the increment operations can be performed solely within the register; the parallel load is used to implement the other operations.

Next, we connect every component to the system bus, as shown in Figure 6.5. Notice that we have included tri-state buffers between the outputs of the registers and the system bus. If we did not do this, all the registers would place their data onto the bus at all times, making it impossible to transfer valid data wi
58. thin the CPU. Also, the outputs of AR are connected to pins A[5..0], as required in the CPU specification. At this point, the CPU does not include the control unit, nor the control signals; we will design those later. Right now our goal is to ensure that all data transfers can occur. Later we will design the control unit to make sure that they occur properly.

[Figure: Preliminary register section for the Very Simple CPU (8-bit bus)]

Now we look at the actual transfers that must take place and modify the design accordingly. After reviewing the list of possible operations, we note several things:

1. AR only supplies its data to memory, not to other components. It is not necessary to connect its outputs to the internal bus.
2. IR does not supply data to any other component via the internal bus, so its output connection can be removed. (The output of IR will be routed directly to the control unit, as shown later.)
3. AC does not supply its data to any component; its connection to the internal bus can also be removed.
4. The bus is 8 bits wide, but not all data transfers are 8 bits; some are only 6 bits and one is 2 bits. We must specify which registers send data to and receive data from which bits of the bus.
5. AC must be able to load the sum of AC and DR, and the logical AND of AC and DR. The CPU needs to include an ALU that c
59. tion codes remain unchanged. Show the new state diagram and RTL code for this CPU.

11. For the CPU of Problem 10, show the modifications necessary for the register section.

12. For the CPU of Problem 10, show the modifications necessary for the control unit. Include the hardware needed to generate any new or modified control signals.

13. Verify the functioning of the CPU of Problems 10, 11, and 12 for the new instructions.

14. Enhance the Very Simple ALU to perform the following operations in addition to those it currently performs.

shl: AC←AC+AC
neg: AC←AC'+1
adl: AC←AC+DR+1

15. Show the logic needed to generate the control signals for registers PC, DR, TR, and IR of the Relatively Simple CPU.

16. Show the logic needed to generate the control signals for registers R, AC, and Z of the Relatively Simple CPU.

17. Show the logic needed to generate the control signals for the ALU of the Relatively Simple CPU.

18. Verify the functioning of the Relatively Simple CPU for all instructions, either manually or using the CPU simulator.

19. Modify the Relatively Simple CPU to include a new instruction, SETR, which performs the operation R←1111 1111. Its instruction code is 0001 0000. Show the modified state diagram and RTL code for this CPU. (Hint: One way to implement this is to clear R and then decrement it.)

20. For the CPU of Problem 19, show the modifications necessary for the register section.

21. For the CPU of Problem 19, sh
60. unction to be performed by the ALU is specified by the control unit. By outputting the control signals in the proper sequence, the control unit causes the CPU to properly fetch, decode, and execute every instruction in its instruction set.

Problems

1. A CPU with the same registers as the Very Simple CPU, connected as shown in Figure 6.6, has the following instruction set and state diagram. Show the RTL code for the execute cycles for each instruction. Assume the RTL code for the fetch routine is the same as that of the Very Simple CPU.

Instruction   Instruction Code   Operation
JMP1          00AAAAAA           PC←AAAAAA+1
INC2          01XXXXXX           AC←AC+2
ADD1          10AAAAAA           AC←AC+M[AAAAAA]+1
SKIP          11XXXXXX           PC←PC+1

[Figure: State diagram for Problem 1. Recoverable labels: branches on IR = 01, IR = 10, and IR = 11; states JMP12 and INC22.]

2. A CPU with the same registers as the Very Simple CPU, connected as shown in Figure 6.6, has the state diagram on the next page and following RTL code. Show the instruction set for this CPU.

FETCH1: AR←PC
FETCH2: DR←M, PC←PC+1
FETCH3: IR←DR[7..6], AR←DR[5..0]
001 DRM ARAR 1
002 AC lt AC DR
003 DR M
004 AC AC DR
011 DRM PC PC 1
012 AC lt AC DR
1X1 AC AC 1 DR M
1X2 AC lt AC DR M

[Figure: State diagram for Problem 2. Recoverable labels: FETCH1, FETCH2, 012, 1X2.]

3. Develop a control unit for the state diagram in Problem 2.
61. values and a carry.

LDAC5: AC←0+BUS+0
MOVR1: AC←0+BUS+0
ADD1: AC←AC+BUS+0
SUB1: AC←AC+BUS'+1
INAC1: AC←AC+0+1
CLAC1: AC←0+0+0

Note that subtraction is implemented via two's complement addition, as described in Chapter 1. For now, we design the data paths; we implement the control logic later in the design process.

The first input to the parallel adder is either the contents of AC or 0. The ALU can use a multiplexer to select one of these two values and pass it to one input of the parallel adder. Similarly, the ALU uses a multiplexer to send BUS, BUS', or 0 to the second input. The ALU could also use a multiplexer to supply the carry input, but that would be overkill. We simply use a control input to directly generate this value.

The logical operations are relatively straightforward. Since there are four logical operations, we use an 8-bit 4-to-1 multiplexer. The inputs to the MUX are AC∧BUS, AC∨BUS, AC⊕BUS, and AC'.

Finally, a multiplexer selects the output of either the parallel adder or the logic multiplexer to output to AC. The entire ALU design is shown in Figure 6.16.

6.3.6 Designing the Control Unit Using Hardwired Control

The Relatively Simple CPU has a total of 37 states, making it too complex to implement efficiently using the same design as the Very Simple CPU's control unit. Instead of using one register to generate the state, this control unit uses two registe
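The ALU data path described in this passage (two input multiplexers feeding a parallel adder with carry in, a four-way logic multiplexer, and a final output selection) can be sketched in Python. This is an illustration under assumptions, not the circuit of Figure 6.16: the table `ARITH_SELECT`, the string selector values and the function name `alu` are hypothetical, and the selection is keyed by state name rather than by the actual ALU control signals, whose encodings are not given here.

```python
MASK = 0xFF  # 8-bit data path

# (first adder input, second adder input, carry in) for each arithmetic state,
# taken from the rewritten operations in the text.
ARITH_SELECT = {
    "LDAC5": ("0", "BUS", 0),
    "MOVR1": ("0", "BUS", 0),
    "ADD1": ("AC", "BUS", 0),
    "SUB1": ("AC", "NOT_BUS", 1),  # two's complement: AC + BUS' + 1
    "INAC1": ("AC", "0", 1),
    "CLAC1": ("0", "0", 0),
}

# The four inputs of the logic multiplexer.
LOGIC_SELECT = {
    "AND1": lambda ac, bus: ac & bus,
    "OR1": lambda ac, bus: ac | bus,
    "XOR1": lambda ac, bus: ac ^ bus,
    "NOT1": lambda ac, bus: ~ac,
}


def alu(state: str, ac: int, bus: int) -> int:
    """Return the 8-bit value the ALU would send to AC in the given state."""
    if state in ARITH_SELECT:
        sel_a, sel_b, carry_in = ARITH_SELECT[state]
        a = ac if sel_a == "AC" else 0                            # first mux
        b = {"BUS": bus, "NOT_BUS": ~bus & MASK, "0": 0}[sel_b]   # second mux
        return (a + b + carry_in) & MASK                          # parallel adder
    return LOGIC_SELECT[state](ac, bus) & MASK                    # logic mux
```

With these settings, `alu("SUB1", 5, 3)` evaluates 5 + 252 + 1 and keeps the low 8 bits, giving 2, which shows how the carry input completes the two's complement of BUS.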
