
MIPS32® 34Kc™ Processor Core Datasheet

1. All subsequent uncached accelerated word or doubleword stores to the same 32-byte region will write sequentially into this buffer, independent of the word address associated with these latter stores. The uncached accelerated buffer is tagged with the address of the first store. An uncached accelerated store that does not merge and does not go to an aligned address will be treated as a regular uncached store.

SimpleBE Mode

To aid in attaching the 34Kc CPU to structures that cannot easily handle arbitrary byte-enable patterns, there is a mode that generates only simple byte enables. Only byte enables representing naturally aligned byte, halfword, word, and doubleword transactions will be generated.

The only case where a read can generate non-simple byte enables is an uncached tri-byte load (LWL/LWR). In SimpleBE mode, such a read will be converted into a word read on the external interface.

Writes with non-simple byte-enable patterns can arise when a sequence of stores is processed by the merging write buffer, or from uncached tri-byte stores (SWL/SWR). In SimpleBE mode, these stores will be broken into multiple write transactions.

EJTAG Debug Support

The 34Kc CPU includes an Enhanced JTAG (EJTAG) block for use in the software debug of application and kernel code. In addition to standard user/supervisor/kernel modes of operation, the 34Kc CPU provide
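The SimpleBE rule above (only naturally aligned byte, halfword, word, and doubleword transactions) can be illustrated with a small sketch. The datasheet does not say how the hardware chooses the pieces, so the greedy splitting below is an assumption, as are the function name `split_simple_be` and the convention that bit i of the mask enables byte offset i within a 64-bit beat.

```python
def split_simple_be(mask: int) -> list[tuple[int, int]]:
    """Split an 8-bit byte-enable mask into naturally aligned pieces.

    Returns (byte_offset, size_in_bytes) tuples with size in {1, 2, 4, 8}.
    Assumption: bit i of `mask` enables byte offset i of a 64-bit beat.
    The greedy largest-piece-first choice is illustrative only.
    """
    if not 0 <= mask <= 0xFF:
        raise ValueError("mask must fit in 8 bits")
    pieces = []
    offset = 0
    while offset < 8:
        if not (mask >> offset) & 1:
            offset += 1  # byte not enabled, nothing to write here
            continue
        for size in (8, 4, 2, 1):
            chunk = ((1 << size) - 1) << offset
            # A piece is "simple" when it is aligned to its own size
            # and every byte inside it is enabled.
            if offset % size == 0 and (mask & chunk) == chunk:
                pieces.append((offset, size))
                offset += size
                break
    return pieces


# A tri-byte store covering byte offsets 0..2 becomes a halfword plus a byte.
print(split_simple_be(0b0000_0111))
```

Under these assumptions, a pattern that is already simple (for example a full doubleword, mask 0xFF) passes through as a single transaction.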
2. At the core of the user experience BusBridge Bus Navigator CLAM CorExtend CoreFPGA CoreLV EC FPGA View FS2 FS2 FIRST SILICON SOLUTIONS logo FS2 NAVIGATOR HyperDebug HyperJTAG JALGO Logic Navigator Malta MDMX MED MGB OCI PDtrace the Pipeline Pro Series SEAD SEAD 2 SmartMIPS SOC it System Navigator and YAMON are trademarks or registered trademarks of MIPS Technologies Inc in the United States and other countries All other trademarks referred to herein are the property of their respective owners Template nDb1 03 Built with tags 2B MIPS32 34Kc Processor Core Datasheet Revision 01 21 MD00418 Copyright 2005 2010 MIPS Technologies Inc All rights reserved
3. example SI_Int[5:0], SI_Int_1[5:0], SI_NMI, SI_NMI_1, and SI_Reset continue to run. Once the CPU is in instruction-controlled power management mode, any interrupt, NMI, or reset condition causes the CPU to exit this mode and resume normal operation.

The 34Kc CPU asserts the SI_Sleep signal, which is part of the system interface, whenever it has entered low-power operation and gone to sleep. It will enter sleep mode when all bus transactions are complete and no TC is running instructions. A TC is not running instructions when it is:
• Blocked due to a WAIT instruction
• Blocked due to an outstanding ITC operation
• Yielded
• Halted
• Not Active

Test Capability

Internal Scan

Full mux-based scan for maximum test coverage is supported, with a configurable number of scan chains. ATPG test coverage can exceed 99%, depending on standard cell libraries and configuration options.

Memory BIST

The core provides an integrated memory BIST solution for testing the internal cache SRAMs, the on-chip trace memory, and SPRAM, using BIST controllers and logic tightly coupled to the cache subsystem. These BIST controllers can be configured to use the following algorithms: March C or IFA-13.

Memory BIST can also be inserted with a CAD tool or other user-specified method. Wrapper modules and signal buses of configurable width are provided within the core to facilitate this approach. Use
4. py Configi ps Present or not Config 1 c
D-cache hardware aliasing support | Present or not | Config7.AR
Cache parity | Present or not | ErrCtl.PE
PrID Company Option | 0x0-0x7f | PrID.CompanyOption
Memory BIST | Integrated (March C or March C plus IFA-13), custom, or none | N/A
Clock gating | Top-level, integer register file array, TLB array, fine-grain, or none | N/A

These bits indicate the presence of external blocks. Bit will not be set if interface is present but block is not.

Document Revision History

Change bars (vertical lines) in the margins of this document indicate significant changes in the document since its last release. Change bars are removed for changes that are more than one revision old. This document may refer to Architecture specifications (for example, instruction set descriptions and EJTAG register definitions), and change bars in these sections indicate changes since the previous version of the relevant Architecture document.

Table 3 Revision History
• Revision 00.01, August 12, 2004: Initial version
• August 30, 2004: Pre pre-Sales Version
• February 4, 2005: Fixed some consistency errors on number of outstanding loads and misses
• May 24, 2005: Added ISPRAM details and misc cleanup
• September 26, 2005: Production release; add option for 8KB instruction and data caches
• March 7, 2006: Describe trace capability and options
• Revision 01.02, August
5. 25, 2006: Updated to reflect support for 9 TCs

Table 3 Revision History (continued)
• Revision 01.10, October 17, 2007: Updated to document template nDb1.03; updated clock ratio capabilities
• December 19, 2008: Alias removal supported in 64KB data cache and instruction cache; improved uncached accelerated description; IT bypass added for single-TC configurations
• November 19, 2010: Fixed maximum size of on-chip trace buffer (1MB); added number of area reduction options

Copyright 2005 2010 MIPS Technologies Inc. All rights reserved.

Unpublished rights (if any) reserved under the copyright laws of the United States of America and other countries.

This document contains information that is proprietary to MIPS Technologies, Inc. ("MIPS Technologies"). Any copying, reproducing, modifying or use of this information (in whole or in part) that is not expressly permitted in writing by MIPS Technologies or an authorized third party is strictly prohibited. At a minimum, this information is protected under unfair competition and copyright laws. Violations thereof may result in criminal penalties and fines.

Any document provided in source format, i.e., in a m
6. data is never loaded into the cache. Store data can be gathered in a write buffer before being sent out on the bus as a bursted write. This is more efficient than sending out individual writes, as occurs in regular uncached mode.

Scratchpad RAM

The 34Kc CPU allows blocks of scratchpad RAM to be attached to the load/store and fetch units. These allow low-latency access to a fixed block of memory.

These blocks can be modified by the customer. A reference design is provided that includes an SRAM array as well as an external DMA port to allow the system to directly access the array.

InterThread Communication Unit (ITU)

This block provides a mechanism for efficient communication between TCs using gating storage. This block has a number of locations that can be accessed using different views. These views provide the mechanisms to implement a number of useful communication methods such as mailboxes, FIFO mailboxes, mutexes, and semaphores.

This block can be modified by the customer to target a specific application. A reference ITU design is included with the CPU that implements some basic views and functionality.

Bus Interface (BIU)

The Bus Interface Unit (BIU) controls the external interface signals. The primary interface implements the Open Core Protocol (OCP). Additionally, the BIU includes a write buffer.

OCP Interface

Table 1 shows the OCP Performance Report for the 34Kc core. This table lists characteristics about the core an
7. is no data dependency, a MUL can be issued every cycle.

For applications that will not use the MDU much, an iterative MDU is also available. This MDU saves area while still preserving MIPS32 compatibility. Both multiplies and divides are processed using a 1-bit-per-cycle iterative algorithm and have 34-cycle latencies.

System Control Coprocessor (CP0)

In the MIPS architecture, CP0 is responsible for the virtual-to-physical address translation and cache protocols, the exception control system, the processor's diagnostic capability, the operating modes (kernel, user, supervisor, and debug), and whether interrupts are enabled or disabled. Configuration information, such as cache size and associativity and the presence of features like MIPS16e or a floating point unit, is also available by accessing the CP0 registers.

Coprocessor 0 also contains the logic for identifying and managing exceptions. Exceptions can be caused by a variety of sources, including boundary cases in data, external events, or program errors.

Most of CP0 is replicated per VPE. A small amount of state is replicated per TC, and some is shared between the VPEs.

Interrupt Handling

Each 34Kc VPE includes support for six hardware interrupt pins, two software interrupts, a timer interrupt, and a performance counter interrupt. These interrupts can be used in the following interrupt modes:
• Interrupt compatibilit
8. or 16 data pins plus a clock.

Clock and Power Considerations

The following sections describe clocking and power management features.

Clocking

The CPU has 3 primary clock domains:
• Core domain: This is the main CPU clock domain, controlled by the SI_ClkIn clock input.
• OCP domain: This domain controls the OCP bus interface logic. This domain is synchronous to SI_ClkIn, but can be run at different frequencies.
• TAP domain: This is a low-speed clock domain for the EJTAG TAP controller, controlled by the EJ_TCK pin. It is asynchronous to SI_ClkIn.

Power Management

The 34Kc core offers a number of power management features, including low-power design, active power management, and power-down modes of operation. The logic features a static design style that supports slowing or halting the clocks, which reduces system power consumption during idle periods.

Local clock gating

A significant portion of the power consumed by the 34Kc CPU is often in the clock tree and clocking registers. The CPU has support for extensive use of local gated clocks. Power-conscious implementors can use these gated clocks to significantly reduce power consumption within the CPU.

Instruction-Controlled Power Management

The primary mechanism for invoking power-down mode is through execution of the WAIT instruction. When the WAIT instruction is executed, the internal clock is suspended; however, the internal timer and some of the input pins for
9. pipeline. The execution unit includes:
• 32-bit adder used for calculating the data address
• Logic for verifying branch prediction
• Load aligner
• Bypass multiplexers used to avoid stalls when executing instruction streams where data-producing instructions are followed closely by consumers of their results
• Leading Zero/One detect unit for implementing the CLZ and CLO instructions
• Arithmetic Logic Unit (ALU) for performing bitwise logical operations
• Shifter & Store Aligner

MIPS16e Application Specific Extension

The 34Kc CPU includes support for the MIPS16e ASE. This ASE improves code density through the use of 16-bit encodings of many MIPS32 instructions plus some MIPS16e-specific instructions. PC-relative loads allow quick access to constants. Save/Restore macro instructions provide for single-instruction stack frame setup/teardown for efficient subroutine entry/exit.

Multiply/Divide Unit (MDU)

The 34Kc CPU includes a multiply/divide unit (MDU) that contains a separate pipeline for integer multiply and divide operations. This pipeline operates in parallel with the integer unit pipeline and does not stall when the integer pipeline stalls. This allows any long-running MDU operations to be masked by instructions on other TCs and/or other integer unit instructions.

The standard MDU consists of a pipelined 32x32 multiplier, result/accumulation registers (HI and LO), a divide state machine, and the necessary m
10. registers
  - Fractional data types (Q15, Q31)
  - Saturating arithmetic
  - SIMD instructions operate on 2x16b or 4x8b simultaneously
• Programmable Memory Management Unit
  - 16/32/64 dual-entry JTLB per VPE
  - JTLBs are sharable under software control
  - 4-12 entry MT-optimized ITLB
  - 8-entry DTLB
  - Optional simple Fixed Mapping Translation (FMT) mechanism
• Programmable L1 Cache Sizes
  - Individually configurable instruction and data caches
  - 4-Way Set Associative: sizes of 4/8/16/32/64 KB
  - Direct mapped optionally available in sizes 0/1/2/4/8/16 KB
  - Up to 9 outstanding load misses
  - Write-back and write-through support
  - 32-byte cache line size
  - Virtually indexed, physically tagged
  - Cache line locking support
  - Non-blocking prefetches
  - Optional parity support
• Scratchpad RAM support
  - Separate RAMs for Instruction and Data
  - Independent of cache configuration
  - Maximum size of 1MB
  - Reference design available that features two 64-bit OCP interfaces for external DMA
• Bus Interface
  - OCP interface with 32-bit address and 64-bit data
  - Flexible core:bus clock ratios
  - Burst size of four 64-bit beats
  - 4-entry write buffer
  - Simple byte enable mode allows easier bridging to other bus standards
  - Extensions for front-side L2 cache
• Multiply/Divide Unit (High Performance)
  - Maximum issue rate of one 32x32 multiply per clock
  - 5-cycle
11. 1, 2, 4, 8, or 16 KB direct mapped | 0, 1, 2, 4, 8, or 16 KB direct mapped
Line Size | 32 bytes | 32 bytes
Read Unit | 64 bits | 64 bits
Write Policies | N/A | write-through without write allocate, write-back with write allocate
Miss restart after transfer of | miss word | miss word
Cache Locking | per line | per line

Logical size of instruction cache. Cache physically contains some extra bits used for precoding the instruction type.

Cache Protocols

The 34Kc CPU supports the following cache protocols:
• Uncached: Addresses in a memory area indicated as uncached are not read from the cache. Stores to such addresses are written directly to main memory, without changing cache contents.
• Write-through, no write allocate: Loads and instruction fetches first search the cache, reading main memory only if the desired data does not reside in the cache. On data store operations, the cache is first searched to see if the target address is cache resident. If it is resident, the cache contents are updated, and main memory is also written. If the cache look-up misses, only main memory is written.
• Write-back, write allocate: Loads and stores that miss in the cache will cause a cache refill. Store data, however, is only written to the cache. Cache lines that are written by stores will be marked as dirty. If a dirty line is selected for replacement, the cache line will be written back to main memory.
• Uncached Accelerated: Like uncached
12. Breakpoints are set on virtual address and ASID values, similar to the Instruction breakpoint. Data breakpoints can also be set based on the value of the load/store operation. Finally, masks can be applied to both the virtual address and the load/store value.

Fast Debug Channel

The 34Kc CPU includes the EJTAG Fast Debug Channel (FDC) as a mechanism for efficient bidirectional data transfer between the CPU and the debug probe. Data is transferred serially via the TAP interface. A pair of memory-mapped FIFOs buffer the data, isolating software running on the CPU from the actual data transfer. Software can configure the FDC block to generate an interrupt based on the FIFO occupancy, or can poll the status.

[Figure 4: Fast Debug Channel]

MIPS Trace

The 34Kc CPU includes optional MIPS Trace support for real-time tracing of instruction addresses, data addresses, and data values. The trace information is collected in an on-chip or off-chip memory, for post-capture processing by trace regeneration software.

On-chip trace memory may be configured in size from 0 to 1MB; it is accessed through the existing EJTAG TAP interface and requires no additional chip pins. Off-chip trace memory is accessed through a special trace probe and can be configured to use 4, 8,
13. Entries can then be marked as locked or unlocked on a per-entry basis using the CACHE instruction.

Instruction Cache

The instruction cache is an on-chip memory block of 8/16/32/64 KB, with 4-way associativity. Direct-mapped caches of 0/1/2/4/8/16 KB are also supported, though not generally recommended for performance reasons. A tag entry holds 20 or 21 bits of physical address, a valid bit, a lock bit, and an optional parity bit. The instruction data entry holds two instructions (64 bits), 6 bits of pre-decode information to speed the decode of branch and jump instructions, and 9 optional parity bits (one per data byte plus one more for the pre-decode information). There are four data entries for each tag entry. The tag and data entries exist for each way of the cache. The LRU replacement bits (6b) are shared among the 4 ways and are stored in a separate array.

The instruction cache block also contains and manages the instruction line fill buffer. Besides accumulating data to be written to the cache, instruction fetches that reference data in the line fill buffer are serviced either by a bypass of that data, or data coming from the external interface. The instruction cache control logic controls the bypass function.

Just like the data cache, with certain cache and TLB page sizes it is possible to have virtual aliasing in the instruction cache. This is less of a problem because the instruction cache is not written, so the aliases are always co
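The remark above that aliasing depends on cache and page sizes can be made concrete with a little arithmetic. The sketch below is one reading of the text, not something the datasheet states: it assumes a 4 KB page, a 32-bit physical address, and that the tag must cover every physical address bit not fixed by the smaller of the way size and the page size. The function names are hypothetical. Under those assumptions the numbers agree with the "20 or 21 bits" tag width quoted for 8 KB to 64 KB 4-way caches.

```python
def way_bytes(cache_bytes: int, ways: int = 4) -> int:
    """Bytes covered by one way of a set-associative cache."""
    return cache_bytes // ways


def alias_colors(cache_bytes: int, page_bytes: int = 4096, ways: int = 4) -> int:
    """Number of cache locations one physical line could occupy.

    In a virtually indexed cache, index bits above the page offset come
    from the virtual address, so a way larger than a page allows aliases.
    A result of 1 means no virtual aliasing is possible.
    """
    return max(1, way_bytes(cache_bytes, ways) // page_bytes)


def tag_bits(cache_bytes: int, ways: int = 4,
             page_bytes: int = 4096, pa_bits: int = 32) -> int:
    """Physical tag width under the assumptions stated in the lead-in."""
    covered = min(way_bytes(cache_bytes, ways), page_bytes)
    return pa_bits - (covered.bit_length() - 1)  # sizes are powers of two


for kb in (8, 16, 32, 64):
    size = kb * 1024
    print(kb, "KB:", alias_colors(size), "alias location(s),",
          tag_bits(size), "tag bits")
```

With larger page sizes the alias count shrinks, which is one way to read "certain cache and TLB page sizes" in the paragraph above.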
14. [Figure 1 block labels: Fetch Unit, TC Dispatch Unit, Execution Unit (RF per TC, ALU, Shift, etc.), MMU per VPE (16-64 entry JTLB or FMT), Non-blocking Load/Store Unit (4-8 outstanding misses), I-cache and D-cache (64KB 4-way set associative), Scratchpad RAM, BIU (4-entry merging write buffer), CorExtend, Coprocessor, MT control blocks, Inter-Thread Communication, Power, Off-Chip]

34Kc CPU Features
• 8/9-stage pipeline (a thread selection stage is bypassed on single-TC CPUs, yielding 8 stages)
• 32-bit address paths
• 64-bit data paths to caches and external interface
• MIPS32 Release 2 Instruction Set and Privileged Resource Architecture
• MIPS16e Code Compression (optional)
• MIPS MT Application Specific Extension (ASE)
  - Support for 1 or 2 Virtual Processing Elements (VPEs)
  - Support for 1-9 Thread Contexts (TCs)
  - Inter-Thread Communication (ITC) memory for efficient communication & data transfer
• MIPS DSP ASE (optional)
  - 3 additional pairs of accumulator
15. MIPS32 34Kc Processor Core Datasheet, November 19, 2010

The MIPS32 34Kc core from MIPS Technologies is a high-performance, low-power, 32-bit MIPS RISC core designed for custom system-on-silicon applications. The core is designed for semiconductor manufacturing companies, ASIC developers, and system OEMs who want to rapidly integrate their own custom logic and peripherals with a high-performance RISC processor. Fully synthesizable and highly portable across processes, it can be easily integrated into full system-on-silicon designs, allowing developers to focus their attention on end-user products.

The MIPS32 34Kn core is a family variant of the MIPS32 34Kc that includes several features to mitigate performance degradation when using small cache sizes, so as to facilitate massively parallel systems.

The 34Kc CPU implements the MIPS32 Release 2 Architecture. In addition to the base architecture, it features the following application-specific extensions (ASEs):
• The MIPS MT ASE, which defines multi-threaded operation
• The MIPS DSP ASE, which provides support for signal processing instructions
• The MIPS16e ASE, which reduces code size

This standard architecture allows support by a wide range of industry-standard tools and development systems.

The MT ASE allows the CPU to operate more efficiently by executing multiple program streams concurrently. The CPU can be configured with 1 or 2 Virtual Pr
16. Performance or Iterative | Config upu
Watch Registers | Present or not | Config1.WR
UserLocal Register | Present or not | Config3.ULRI
Performance Counters | Present or not | Config3pc
Branch Prediction | Dynamic or Static | Config7 gyr Config7rps
Instruction Buffer Depth; WriteBack Buffer (WBB) Depth; Request Queue Depth | 14 7 or max none
L2 Cache Support | Present or not | Config2 57
Instruction hardware breakpoints per VPE | 0, 2, or 4 | DCRIp IBSgon
Data hardware breakpoints per VPE | 0, 1, or 2
Fast Debug FIFO Sizes | Min (2Tx, 2Rx), Useful (12Tx, 4Rx) | FDCFG
MIPS Trace support | Present or not | Config3.TL
MIPS Trace memory location | On-core, off-chip, or both | TCBCONFIG.OnT, TCBCONFIG.OfT
MIPS Trace on-chip memory size | 256B - 1MB | TCBCONFIG.SZ
MIPS Trace triggers | TCBCONFIG.TRIG
MIPS Trace source field bits in trace word | TCBCONTROLB.TWSrcWidth

These bits indicate the presence of external blocks. Bit will not be set if interface is present but block is not.

Table 2 Build-time Configuration Options (Continued)
CorExtend interface (Pro only) | Present or not | Configup
Coprocessor2 interface
0, 1, 2, 4, 8, 16, 32, or 64 KB | Contig1 Config1 ia Contig is
0, 1, 2, 4, 8, 16, 32, or 64 KB | Config1 p Config
17. d the specific OCP functionality that is supported.

Table 1 OCP Performance Report
34Kc
CPU Identity | Vendor Code: 0x4d50; CPU Code: 0x10a; Revision Code: 0x1. Additional identification is available through the PrID and EBase Coprocessor0 registers.
Process dependent | Yes
Frequency range for this CPU; Area; Power Estimate | Synthesizable, so varies based on process, libraries, and implementation
Special reset requirements
Number of Interfaces | 1 OCP master

Master OCP Interface
Operations issued | RD, WR
Issue rate (per OCP cycle) | One per cycle
Maximum number of operations outstanding | 10 read operations. All writes are posted, so the OCP fabric determines the maximum number of outstanding writes.
Burst support and effect on issue rates | Fixed burst length of four 64b beats with single request per burst. Burst sequences of WRAP or XOR supported.
High-level flow control | None
Number of threads supported and use of those threads | All transactions utilize a single thread

Table 1 OCP Performance Report (Continued)
Connection ID and use of connection information
Use of sideband signals
Implementation restrictions | 1. MReqInfo handled in a user-de
18. fined way. 2. MAddrSpace is used (2 bits) to indicate L2/L3 access. 3. CPU clock is synchronous but a multiple of the OCP clock. Strobe inputs to the core control input and output registers to establish the core:bus clock ratio.

Write Buffer

The BIU contains a merging write buffer. The purpose of this buffer is to store and combine write transactions before issuing them to the external interface. The write buffer is organized as four 32-byte buffers. Each buffer contains data from a single 32-byte aligned block of memory.

When using the write-through cache policy, the write buffer significantly reduces the number of write transactions on the external interface and reduces the amount of stalling in the core due to issuance of multiple writes in a short period of time.

The write buffer also holds eviction data for write-back lines. The load/store unit opportunistically pulls dirty data from the cache and sends it to the BIU. It is gathered in the write buffer and sent out as a bursted write.

For uncached accelerated references, the write buffer can gather multiple writes together and then perform a bursted write to increase the efficiency of the bus. Uncached accelerated gathering is supported for word or dword stores.

Gathering of uncached accelerated stores will start on cache-line aligned addresses, i.e., 32-byte aligned addresses. Once an uncached accelerated store starts gathering, a gather buffer is reserved for this store.
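The gathering rule for uncached accelerated stores can be sketched as a toy model. It follows the datasheet's stated behavior: gathering starts on a 32-byte aligned address, later word or doubleword stores to the same 32-byte region are written sequentially regardless of their word address, and a store that neither merges nor is aligned is treated as a regular uncached store. Everything else is an assumption made for illustration: the class name `GatherBuffer`, modeling a single buffer rather than four, flushing when 32 bytes have been gathered, and flushing a partial buffer when a new aligned store arrives.

```python
LINE_BYTES = 32  # gather region size stated in the datasheet


class GatherBuffer:
    """Toy model of one uncached accelerated gather buffer (hypothetical)."""

    def __init__(self):
        self.tag = None          # address of the first (aligned) store
        self.data = bytearray()  # bytes gathered so far, in arrival order
        self.bus = []            # log of (kind, address, nbytes) bus writes

    def _flush(self):
        # Assumption: whatever has been gathered goes out as one burst.
        if self.tag is not None:
            self.bus.append(("burst", self.tag, len(self.data)))
        self.tag, self.data = None, bytearray()

    def store(self, addr: int, payload: bytes) -> None:
        if len(payload) not in (4, 8):
            raise ValueError("only word and doubleword stores gather")
        same_region = (self.tag is not None
                       and addr // LINE_BYTES == self.tag // LINE_BYTES)
        fits = len(self.data) + len(payload) <= LINE_BYTES
        if same_region and fits:
            # Merges: appended sequentially, independent of the word address.
            self.data += payload
        elif addr % LINE_BYTES == 0:
            # Does not merge but is aligned: start a new gather.
            self._flush()
            self.tag = addr
            self.data = bytearray(payload)
        else:
            # Does not merge and is not aligned: regular uncached store.
            self.bus.append(("uncached", addr, len(payload)))
            return
        if len(self.data) == LINE_BYTES:
            self._flush()  # Assumption: a full buffer is written immediately.


gb = GatherBuffer()
for i in range(8):
    gb.store(0x1000 + 4 * i, bytes(4))  # eight word stores fill one region
gb.store(0x2004, bytes(4))              # unaligned, nothing to merge with
print(gb.bus)
```

In this model the eight word stores leave the bus log with a single 32-byte burst, while the stray unaligned store appears as an individual uncached write.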
19. misses are non-blocking and up to 8 may be outstanding. Two instruction cache misses can be outstanding. Both caches are virtually indexed and physically tagged to allow them to be accessed in the same cycle that the address is translated. To achieve high frequencies while using commercially available SRAM generators, the cache access is spread across two pipeline stages, dedicating nearly an entire cycle for the SRAM access.

The Bus Interface Unit implements the Open Core Protocol (OCP), which has been developed to address the needs of SOC designers. This implementation features 64-bit read and write data buses to efficiently transfer data to and from the L1 caches. The BIU also supports a variety of core:bus clock ratios to give greater flexibility for system design implementations.

An Enhanced JTAG (EJTAG) block allows for software debugging of the processor. This includes a TAP controller with PC sampling and Fast Debug Channel features. Optional features include instruction and data trace as well as instruction and data virtual address/value breakpoints.

Figure 1 shows a block diagram of the 34Kc CPU. The dashed boxes indicate blocks that can be modified by the customer for specific applications.

Figure 1 34Kc CPU Block Diagram
20. multiply latency
  - Early-in iterative divide. Minimum 11 and maximum 34 clock latency (dividend (rs) sign extension-dependent)
• Multiply/Divide Unit (Iterative)
  - Reduced-area option that maintains full MIPS32 compatibility
  - Iterative 1-bit-per-cycle processing of multiplies and divides
  - Not available with DSP ASE or CorExtend access
• CorExtend User Defined Instruction Set Extensions
  - Separately licensed; a core with this feature is known as the 34Kc Pro core
  - Allows user to define and add instructions to the CPU at build time
  - Maintains full MIPS32 compatibility
  - Supported by industry-standard development tools
  - Single or multi-cycle instructions
  - Includes access to HI and LO registers
• Coprocessor 2 interface
  - 64-bit interface to a user-designed coprocessor
• Power Control
  - Minimum frequency: 0 MHz
  - Power-down mode (triggered by WAIT instruction)
  - Support for software-controlled clock divider
  - Support for extensive use of fine-grained clock gating
• EJTAG Debug
  - Support for single stepping
  - Instruction address and data address/value breakpoints
  - TAP controller is chainable for multi-CPU debug
  - Cross-CPU breakpoint support
• MIPS Trace
  - PC, data address, and data value tracing w/ trace compression
  - Support for on-chip and off-chip trace memory
21. nsistent. If instruction memory is modified, all of the aliases should be flushed from the instruction cache. The CPU can automatically check all possible aliases when invalidating an address from the instruction cache.

The 34Kc CPU also supports instruction cache locking when configured as 4-way set associative. Cache locking allows critical code or data segments to be locked into the cache on a per-line basis, enabling the system programmer to maximize the efficiency of the system cache.

The cache locking function is always available on all instruction cache entries. Entries can then be marked as locked or unlocked on a per-entry basis using the CACHE instruction.

Cache Memory Configuration

The 34Kc CPU incorporates on-chip instruction and data caches that are usually implemented from readily available single-port synchronous SRAMs and accessed in two cycles: one cycle for the actual SRAM read and another cycle for the tag comparison, hit determination, and way selection. The instruction and data caches each have their own 64-bit data paths and can be accessed simultaneously. Table 2 lists the 34Kc CPU instruction and data cache attributes.

Table 2 34Kc CPU Instruction and Data Cache Attributes
Parameter | Instruction | Data
Size and Organization | 4, 8, 16, 32, or 64 KB 4-way set associative | 4, 8, 16, 32, or 64 KB 4-way set associative
0
22. ocessing Elements (VPEs), each of which contains much of the privileged coprocessor 0 state, including a full Memory Management Unit (MMU), to allow multiple OSes to operate concurrently on the processor. Additionally, the core can be configured to have from 1 to 9 Thread Contexts (TCs). A TC consists of a register file, a Program Counter, and a limited amount of privileged state. TCs offer lightweight multi-threading to allow cooperative or independent threads to run concurrently.

The DSP ASE provides support for a number of powerful data processing operations. There are instructions for fractional arithmetic (Q15/Q31) and for saturating arithmetic. Additionally, for smaller data sizes, SIMD operations are supported, allowing 2x16b or 4x8b operations to occur simultaneously. Another feature of the ASE is the inclusion of additional HI/LO accumulator registers to improve the parallelization of independent accumulation routines.

The synthesizable 34Kc CPU includes a high-performance Multiply/Divide Unit (MDU) by default. The MDU is fully pipelined to support a single-cycle repeat rate for 32x32 MAC instructions. Further, in the 34Kc Pro CPU, the optional CorExtend block can utilize the HI/LO registers in the MDU block. The CorExtend block allows specialized functions to be efficiently implemented.

Instruction and data level-one caches are configurable at 0, 8, 16, 32, or 64 KB in size. Each cache is organized as 4-way set associative by default. Data cache
23. odifiable form such as in FrameMaker or Microsoft Word format is subject to use and distribution restrictions that are independent of and supplemental to any and all confidentiality restrictions UNDER NO CIRCUMSTANCES MAY A DOCUMENT PROVIDED IN SOURCE FORMAT BE DISTRIBUTED TO A THIRD PARTY IN SOURCE FORMAT WITHOUT THE EXPRESS WRITTEN PERMISSION OF MIPS TECHNOLOGIES INC MIPS Technologies reserves the right to change the information contained in this document to improve function design or otherwise MIPS Technologies does not assume any liability arising out of the application or use of this information or of any error or omission in such information Any warranties whether express statutory implied or otherwise including but not limited to the implied warranties of merchantability or fitness for a particular purpose are excluded Except as expressly provided in any written license agreement from MIPS Technologies or an authorized third party the furnishing of this document does not give recipient any license to any intellectual property rights including any patent rights that cover the information in this document The information contained in this document shall not be exported reexported transferred or released directly or indirectly in violation of the law of any country or international law regulation treaty Executive Order statute amendments or supplements thereto Should a conflict arise regarding the export reexport transfer or
24. r-specified BIST signals are also provided for the other data arrays that can be implemented with generator-based SRAM cells in place of the standard registers.

Build-Time Configuration Options

The 34Kc core allows a number of features to be customized based on the intended application. Table 2 summarizes the key configuration options that can be selected when the core is synthesized and implemented.

For a core that has already been built, software can determine the value of many of these options by querying an appropriate register field. Refer to the MIPS32 34Kc CPU Family Software User's Manual for a more complete description of these fields. The values of some options that do not have a functional effect on the core are not visible to software.

Table 2 Build-time Configuration Options
Option | Choices | Software Visibility
Number of VPEs | 1 or 2 | MVPConf0.PVPE
Number of TCs | 1-9 | MVPConf0.PTC
Integer register file implementation style | Flops or generator | N/A
Number of outstanding data cache misses | 4 or 8 | N/A
Number of outstanding Loads | 4 or 9 | N/A
Memory Management Type (per VPE) | TLB or FMT | Config.MT
TLB Size (per VPE) | 16, 32, or 64 dual entries | Config1.MMUSize
TLB data array implementation style | Flops or generator | N/A
MIPS16e Support | Present or not | Config1.CA
DSP ASE Support | Present or not | Config3.DSPP
MDU | High
25. ration
• Bypass muxes

EX: Execute/Memory Access
• Skewed ALU
• DTLB
• DCache SRAM access
• Branch resolution
• Data watch and EJTAG break address compares

MS: Memory Access Second
• DCache hit detection
• Way select mux
• Load align

ER: Exception Resolution
• Instruction completion
• Register file write setup
• Exception processing

WB: Writeback
• Register file writeback occurs on the rising edge of this cycle

34Kc CPU Logic Blocks

The 34Kc CPU consists of the following logic blocks, shown in Figure 1. These logic blocks are defined in the following subsections.

Fetch Unit

This block is responsible for fetching instructions for all Thread Contexts (TCs). Each TC has an 8-entry instruction buffer (IBF) that decouples the fetch unit from the execution unit. When executing instructions from multiple TCs, a portion of the IBF is used as a skid buffer. Instructions are held in the IBF after being sent to the execution unit. This allows stalled instructions to be flushed from the execution pipeline without needing to be refetched.

In order to fetch instructions without intervention from the execution unit, the fetch unit contains branch prediction logic. A 512-entry Branch History Table (BHT) is used to predict the direction of branch instructions. It uses a bimodal algorithm with two bits of history information per entry. Also, a 4-entry Retu
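The bimodal scheme described for the Branch History Table can be modeled in a few lines. The following is a toy Python sketch, not the core's implementation: the class name `BimodalPredictor`, the index function (word address modulo table size), and the initial counter value are all assumptions chosen for illustration. Only the table size (512 entries) and the two bits of history per entry come from the text.

```python
class BimodalPredictor:
    """Toy bimodal predictor: one 2-bit saturating counter per table entry."""

    def __init__(self, entries=512):
        self.entries = entries
        # Counter values 0..3: 0-1 predict not taken, 2-3 predict taken.
        # Starting at 1 (weakly not taken) is an arbitrary choice.
        self.table = [1] * entries

    def _index(self, pc):
        # Hypothetical index: drop the two byte-offset bits, wrap to table size.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


bht = BimodalPredictor()
pc = 0x80001000  # made-up branch address
outcomes = [True, True, True, False, True]
predictions = []
for taken in outcomes:
    predictions.append(bht.predict(pc))
    bht.update(pc, taken)
```

With two bits of state, a single not-taken outcome inside a run of taken branches does not flip the prediction, which is the usual motivation for a 2-bit counter over a 1-bit one.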
26. release of the information contained in this document, the laws of the United States of America shall be the governing law. The information contained in this document constitutes one or more of the following: commercial computer software, commercial computer software documentation or other commercial items. If the user of this information, or any related documentation of any kind, including related technical data or manuals, is an agency, department, or other entity of the United States government ("Government"), the use, duplication, reproduction, release, modification, disclosure, or transfer of this information, or any related documentation of any kind, is restricted in accordance with Federal Acquisition Regulation 12.212 for civilian agencies and Defense Federal Acquisition Regulation Supplement 227.7202 for military agencies. The use of this information by the Government is further restricted in accordance with the terms of the license agreement(s) and/or applicable contract terms and conditions covering this information from MIPS Technologies or an authorized third party. MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPS-3D, MIPS16, MIPS16e, MIPS32, MIPS64, MIPS-Based, MIPSsim, MIPSpro, MIPS Technologies logo, MIPS-VERIFIED, MIPS-VERIFIED logo, 4K, 4Kc, 4Km, 4Kp, 4KE, 4KEc, 4KEm, 4KEp, 4KS, 4KSc, 4KSd, M4K, 5K, 5Kc, 5Kf, 24K, 24Kc, 24Kf, 24KE, 24KEc, 24KEf, 34K, 34Kc, 34Kf, 74K, 74Kc, 74Kf, 1004K, 1004Kc, 1004Kf, R3000, R4000, R5000, ASMACRO, Atlas,
27. • Testability: Full scan design achieves test coverage in excess of 99% (dependent on library and configuration options). Optional memory BIST for internal SRAM arrays.

Pipeline Flow

The 34Kc CPU implements an 8/9-stage pipeline. One stage is bypassed if the CPU is configured with a single TC. Two extra fetch stages are conditionally added when executing MIPS16e instructions. This pipeline allows the processor to achieve a high frequency while maintaining reasonable area and power numbers. Figure 2 shows a diagram of the 34Kc CPU pipeline.

Figure 2: 34Kc CPU Pipeline

IF: Instruction Fetch First
• I-cache tag/data arrays accessed
• Branch History Table accessed
• ITLB address translation performed
• Instruction watch and EJTAG break compares done

IS: Instruction Fetch Second
• Detect I-cache hit
• Way select
• Branch prediction

IR: Instruction Recode
• MIPS16e instruction recode

IK: Instruction Kill
• MIPS16e instruction kill

IT: Instruction Fetch Third / Instruction Buffer
• Thread selection
• This stage is bypassed on single-TC configurations when the instruction buffer is empty
• Branch target calculation

RF: Register File Access
• Register file access
• Instruction decoding/dispatch logic
• Bypass muxes

AG: Address Generation
• D-cache Address Gene
28. rganized as pairs of even and odd entries containing pages that range in size from 4 KB to 256 MB, in factors of four, into the 4 GB physical address space.

The JTLB is organized in page pairs to minimize the overall size. Each tag entry corresponds to two data entries: an even page entry and an odd page entry. The highest-order virtual address bit not participating in the tag comparison is used to determine which of the data entries is used. Since page size can vary on a page-pair basis, the determination of which address bits participate in the comparison, and which bit is used to make the even/odd determination, is decided dynamically during the TLB lookup.

Instruction TLB (ITLB)

The ITLB is dedicated to performing translations for the instruction stream. The ITLB is a hybrid structure having 3 entries that are shared by all TCs, plus an additional entry dedicated to each TC. Thus, the ITLB may be as large as 12 entries, but each TC may only have its translations in up to 4 places.

The ITLB only maps 4 KB or 1 MB pages/subpages. For 4 KB or 1 MB pages, the entire page is mapped in the ITLB. If the main TLB page size is between 4 KB and 1 MB, only the current 4 KB subpage is mapped. Similarly, for page sizes larger than 1 MB, the current 1 MB subpage is mapped.

The ITLB is managed by hardware and is transparent to
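The even/odd selection in a JTLB page pair can be illustrated with a little bit arithmetic. The following Python sketch assumes a hypothetical helper `split_vaddr` and ignores the ASID and any global bit. For a page of 2^n bytes, it treats bits below n as the page offset, bit n as the even/odd select (the highest-order bit not in the tag comparison), and the bits above n as the part compared against the tag.

```python
# Page sizes from 4 KB to 256 MB in factors of four (nine sizes).
PAGE_SIZES = [4096 * 4**i for i in range(9)]


def split_vaddr(vaddr, page_size):
    """Split a virtual address for a page-pair lookup (illustrative only).

    Returns (tag_bits, odd, offset):
      tag_bits - address bits that take part in the tag comparison
      odd      - 1 if the odd data entry of the pair is used, else 0
      offset   - byte offset within the page
    """
    shift = page_size.bit_length() - 1   # log2(page_size) for powers of two
    offset = vaddr & (page_size - 1)
    odd = (vaddr >> shift) & 1
    tag_bits = vaddr >> (shift + 1)
    return tag_bits, odd, offset


small = split_vaddr(0x00403ABC, 4 * 1024)    # 4 KB pages
larger = split_vaddr(0x00403ABC, 16 * 1024)  # 16 KB pages
```

The same address selects the odd entry with 4 KB pages and the even entry with 16 KB pages, which is why the select bit has to be chosen dynamically during the lookup.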
29. rn Prediction Stack (RPS) is a simple structure to hold the return address from the most recent subroutine calls. The link address is pushed onto the stack whenever a JAL, JALR, or BGEZAL instruction is seen. Then that address is popped when a JR instruction occurs. The BHT is shared by all TCs on the processor, while the RPS is dynamically associated with a single TC.

Thread Schedule Unit (TSU)

This unit is responsible for dispatching instructions from different Thread Contexts (TCs). An external policy manager assigns priorities for each TC. The TSU determines which TCs are runnable and selects the highest-priority one available. If multiple are available, a round-robin mechanism will select between them fairly.

The policy manager is a customer-configurable block. Simple round-robin or fixed-priority policies can be implemented by tying off signals on the interface. A reference policy manager is also included that implements a weighted round-robin algorithm for long-term distribution of execution bandwidth.

Execution Unit

The 34Kc CPU execution unit implements a load/store architecture with single-cycle ALU operations (logical, shift, add, subtract) and an autonomous multiply/divide unit. Each TC on a 34Kc CPU contains thirty-one 32-bit general-purpose registers used for integer operations and address calculation. The register file consists of two read ports and one write port and is fully bypassed to minimize operation latency in the
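The TSU selection rule (highest priority among runnable TCs, round-robin among equals) can be sketched as a small Python function. This is a behavioral illustration only: the function name `select_tc`, the convention that a larger number means higher priority, and the use of the last selected TC index as the round-robin pointer are assumptions, not details taken from the datasheet.

```python
def select_tc(priorities, runnable, last_selected):
    """Pick the next TC to dispatch from, or None if no TC is runnable.

    priorities    - list of per-TC priorities (larger = higher, assumed)
    runnable      - list of booleans, one per TC
    last_selected - index of the TC chosen last time (-1 if none yet)
    """
    n = len(priorities)
    candidates = [tc for tc in range(n) if runnable[tc]]
    if not candidates:
        return None
    top = max(priorities[tc] for tc in candidates)
    tied = [tc for tc in candidates if priorities[tc] == top]
    # Round-robin: the tied TC that comes soonest after last_selected,
    # wrapping around the TC index space.
    return min(tied, key=lambda tc: (tc - last_selected - 1) % n)


priorities = [3, 3, 1, 3]
runnable = [True, True, True, False]   # TC3 has top priority but is not runnable
first = select_tc(priorities, runnable, last_selected=0)
second = select_tc(priorities, runnable, last_selected=first)
```

Feeding each result back in as `last_selected` alternates between the two runnable top-priority TCs, while the lower-priority TC2 is only chosen when no higher-priority TC is runnable.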
30. s a Debug mode that is entered after a debug exception (derived from a hardware breakpoint, single-step exception, etc.) is taken and continues until a debug exception return (DERET) instruction is executed. During this time, the processor executes the debug exception handler routine.

The EJTAG interface operates through the Test Access Port (TAP), a serial communication port used for transferring test data in and out of the 34Kc CPU. In addition to the standard JTAG instructions, special instructions defined in the EJTAG specification define what registers are selected and how they are used.

Hardware Breakpoints

There are several types of simple hardware breakpoints defined in the EJTAG specification. These breakpoints stop the normal operation of the CPU and force the system into debug mode. There are two types of simple hardware breakpoints implemented in the 34Kc CPU: Instruction breakpoints and Data breakpoints.

During synthesis, the 34Kc CPU can be configured to support the following breakpoint options per VPE:
• Zero instruction and zero data
• Two instruction and one data
• Four instruction and two data

Instruction breaks occur on instruction fetch operations, and the break is set on the virtual address. Instruction breaks can also be made on the ASID value used by the MMU. A mask can be applied to the virtual address to set breakpoints on a range of instructions.

Data breakpoints occur on load and/or store transactions
31. software. The larger JTLB is used as a backing structure for the ITLB. If a fetch address cannot be translated by the ITLB, the JTLB is used to attempt to translate it in the following clock cycle, or when available. If successful, the translation information is copied into the ITLB for future use. There is a minimum two-cycle ITLB miss penalty.

Data TLB (DTLB)

The DTLB is an 8-entry, fully associative TLB dedicated to performing translations for loads and stores. All entries are shared by all TCs. Similar to the ITLB, the DTLB only maps either 4 KB or 1 MB pages/subpages.

The DTLB is managed by hardware and is transparent to software. The larger JTLB is used as a backing structure for the DTLB. If a load/store address cannot be translated by the DTLB, a lookup is done in the JTLB. If the JTLB translation is successful, the translation information is copied into the DTLB for future use. The DTLB miss penalty is also two cycles.

Fixed Mapping Translation (FMT)

The FMT is much simpler and smaller than the TLB-style MMU, and is a good choice when the full protection and flexibility of the TLB is not needed. Like a TLB, the FMT performs virtual-to-physical address translation and provides attributes for the different segments. Those segments that are unmapped in a TLB implementation (kseg0 and kseg1) are handled identically by the FMT.

Data Cache

The data cache is an on-chip memory block of 4/8/16/32/64 KB, with 4-way associativity. Direc
32. t mapped caches of 0/1/2/4/8/16 KB are also supported, though not generally recommended for performance reasons. A tag entry holds 20 or 21 bits of physical address, a valid bit, a lock bit, and an optional parity bit. The data entry holds 64 bits of data per way, with optional parity per byte. There are 4 data entries for each tag entry. The tag and data entries exist for each way of the cache. There is an additional array that holds the dirty and LRU replacement algorithm bits for all 4 ways (6b LRU, 4b dirty, and optionally 4b dirty parity).

Using 4 KB pages in the TLB and 32 or 64 KB cache sizes, it is possible to get virtual aliasing. A single physical address can exist in multiple cache locations if it was accessed via different virtual addresses. There is an implementation option to eliminate virtual aliasing. If this option is not selected, software must take care of any aliasing issues by using a page coloring scheme or some other mechanism.

When built with a 4-way cache, the 34Kc CPU supports data cache locking. Cache locking allows critical code or data segments to be locked into the cache on a per-line basis, enabling the system programmer to maximize the efficiency of the system cache. The locked contents can be updated on a store hit, but will not be selected for replacement on a cache miss. The cache locking function is always available on all data cache entries
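The aliasing condition for 4 KB pages can be checked with simple arithmetic. The Python sketch below assumes the cache is indexed with virtual address bits and that each way spans the cache size divided by the number of ways. The helper name `page_colors` is hypothetical. When one way is larger than a page, some index bits lie above the page offset, so two virtual mappings of the same physical page can land on different cache indexes.

```python
def page_colors(cache_bytes, ways=4, page_bytes=4096):
    """Number of distinct cache 'colors' a physical page can map to.

    A result of 1 means the index lies entirely within the page offset,
    so virtual aliasing cannot occur under the stated assumptions.
    """
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


colors = {kb: page_colors(kb * 1024) for kb in (8, 16, 32, 64)}
```

Under these assumptions the 32 KB and 64 KB 4-way caches give 2 and 4 colors, which matches the cache sizes the text names as susceptible to aliasing. A page coloring scheme would keep all virtual mappings of a physical page on the same color.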
33. ture. A TLB provides mapping and protection capability with per-page granularity. The 34Kc implementation allows a wide range of page sizes to be present simultaneously.

The TLB contains a fully associative Joint TLB (JTLB). To enable higher clock speeds, two smaller micro-TLBs are also implemented: the Instruction Micro TLB (ITLB) and the Data Micro TLB (DTLB). When an instruction or data address is calculated, the virtual address is compared to the contents of the appropriate micro TLB (uTLB). If the address is not found in the uTLB, the JTLB is accessed. If the entry is found in the JTLB, that entry is then written into the uTLB. If the address is not found in the JTLB, a TLB exception is taken.

Figure 3 shows how the ITLB, DTLB, and JTLB are implemented in the 34Kc CPU.

Figure 3: Address Translation During a Cache Access

Joint TLB (JTLB)

The JTLB is a fully associative TLB cache containing 16, 32, or 64 dual entries, mapping up to 128 virtual pages to their corresponding physical addresses. The address translation is performed by comparing the upper bits of the virtual address (along with the ASID) against each of the entries in the tag portion of the joint TLB structure.

The JTLB is o
34. ultiplexers and control logic.

The MDU supports execution of one multiply or multiply-accumulate operation every clock cycle.

Divide operations are implemented with a simple 1-bit-per-clock iterative algorithm. An early-in detection checks the sign extension of the dividend (rs) operand. If rs is 8 bits wide, 23 iterations are skipped. For a 16-bit-wide rs, 15 iterations are skipped, and for a 24-bit-wide rs, 7 iterations are skipped. Any attempt to issue a subsequent MDU instruction while a divide is still active causes a pipeline stall until the divide operation is completed.

Table 1 lists the repeat rate (peak issue rate of cycles until the operation can be reissued) and latency (number of cycles until a result is available) for the 34Kc CPU multiply and divide instructions. The approximate latency and repeat rates are listed in terms of pipeline clocks. For a more detailed discussion of latencies and repeat rates, refer to Chapter 9 of Programming the MIPS32 34Kc Core Family.

Table 1: 34Kc CPU Integer Multiply/Divide Unit Latencies and Repeat Rates (High-Performance MDU)

Opcode                                   Operand Size (mul rt, div rs)   Latency   Repeat Rate
MULT, MULTU, MADD, MADDU, MSUB, MSUBU    32 bits                         5         1
MUL                                      32 bits                         5         1
DIV, DIVU                                8 bits                          12-14     12-14
                                         16 bits                         20-22     20-22
                                         24 bits                         28-30     28-30
                                         32 bits                         36-38     36-38

If there
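The early-in rule for divides can be expressed as a short Python model. This is a sketch under assumptions: it treats rs as a signed 32-bit value (how DIVU operands are classified is not stated here), and the names `significant_width` and `skipped_iterations` are made up for illustration. The skipped-iteration counts are the ones quoted in the text.

```python
# Iterations skipped for each detected dividend width, as quoted in the text.
SKIPPED = {8: 23, 16: 15, 24: 7, 32: 0}


def significant_width(rs):
    """Smallest of 8, 16, 24, 32 such that the 32-bit value rs is the
    sign extension of a value of that many bits (signed view, assumed)."""
    rs &= 0xFFFFFFFF
    signed = rs - (1 << 32) if rs & 0x80000000 else rs
    for width in (8, 16, 24):
        if -(1 << (width - 1)) <= signed < (1 << (width - 1)):
            return width
    return 32


def skipped_iterations(rs):
    return SKIPPED[significant_width(rs)]


widths = [significant_width(v) for v in (100, 200, 0xFFFFFF80, 0x007FFFFF, 0x00800000)]
```

Each step up in detected width costs 8 more iterations, which lines up with the 8-cycle spacing between the divide rows of Table 1.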
35. used for application programs. Supervisor mode gives an intermediate privilege level with access to the ksseg address space. Supervisor mode is not supported with the fixed mapping MMU. Kernel mode is typically used for handling exceptions and operating system kernel functions, including CP0 management and I/O device accesses. An additional Debug mode is used during system bring-up and software development. Refer to "EJTAG Debug Support" on page 12 for more information on debug mode.

Memory Management Unit (MMU)

Each 34Kc VPE contains a Memory Management Unit (MMU) that is primarily responsible for converting virtual addresses to physical addresses and providing attribute information for different segments of memory. At synthesis time, the type of MMU can be chosen independently for each VPE from the following options:
• Translation Lookaside Buffer (TLB)
• Fixed Mapping Translation (FMT)

In a dual-TLB configuration, each VPE contains a separate JTLB so that the translations for each are independent from each other. However, there is a further configuration option where the JTLBs can be shared. This requires special OS support, but enables a higher performance MMU with less area impact.

The following sections explain the MMU options in more detail.

Translation Lookaside Buffer (TLB)

The basic TLB functionality is specified by the MIPS32 Privileged Resource Architec
36. y mode, which acts identically to that in an implementation of Release 1 of the Architecture.
• Vectored Interrupt (VI) mode, which adds the ability to prioritize and vector interrupts to a handler dedicated to that interrupt, and to assign a GPR shadow set for use during interrupt processing. The presence of this mode is denoted by the VInt bit in the Config3 register. This mode is architecturally optional, but it is always present on the 34Kc CPU, so the VInt bit will always read as a 1 for the 34Kc CPU.
• External Interrupt Controller (EIC) mode, which redefines the way in which interrupts are handled to provide full support for an external interrupt controller handling prioritization and vectoring of interrupts. The presence of this mode is denoted by the VEIC bit in the Config3 register. Again, this mode is architecturally optional. On the 34Kc core, the VEIC bit is set externally by the static input SI_EICPresent, to allow system logic to indicate the presence of an external interrupt controller.

If a TC is configured to be used as a shadow register set, the VI and EIC interrupt modes can specify which shadow set should be used upon entry to a particular vector. The shadow registers further improve interrupt latency by avoiding the need to save context when invoking an interrupt handler.

Modes of Operation

The 34Kc CPU supports four modes of operation: user mode, supervisor mode, kernel mode, and debug mode. User mode is most often
