Home

IPv6 in Embedded Micro-controllers

1. the means to which this is possible by software alone are limited if existent highlighting a need for hardware parallelism The DMA controller in the M30803 micro controller works by what is called the cycle steal method Mitsubishi 2001 This means that when there are DMA transfers in progress CPU cycles are shared between each DMA channel and the CPU in a round robin manner providing the data transfer program execution parallelism The word transfer count and the addresses to transfer to and from are also loaded but in order for a single word transfer to occur a selectable trigger signal must be activated Mitsubishi 2001 A In the case of the demonstration board there was no available trigger signal from the NIC to activate these so the DMA transfers were triggered from a high speed internal clock In the final hardware design the DMA transfers were set to trigger from the IOCS16B signal The IOCS16B signal is an ISA bus signal that is triggered on a word transfer hence allowing the DMA controller to effectively self trigger The timing of the trigger is not critical the CPU schedules the DMA transfer to occur at the next available moment although if two requests are made before the hardware has the opportunity to react only one transfer will take place Mitsubishi 2001 This is not an issue in the design 2 5 Implementation The Printed Circuit Board PCB was implemented using a two layer design as that was det
2. Introduction In this project two sets of hardware have been used one set was a development board with add on Ethernet card kindly loaned by Mitsubishi and the other set of hardware was the hardware designed specifically for this project The decision was made to construct hardware for the project in order to provide a typical system one that matched the approximate specifications of the target system family but was also engineered for the optimal performance that a system of this type would be engineered for An important part of implementing networking on such a limited platform is to make maximum use of hardware to support the software implementations for instance the common Network Interface Controller used in the final hardware contains 16KB of on board buffer RAM Realtek 2001 and much more control on how it is used as opposed to the 4KB buffer RAM available on the add on card Cirrus Logic 2001 The hardware designed in this project has the capability of providing better performance in several areas while being a very similar computing platform to the development board It was also projected at the beginning of the project that more system resources may have been needed in terms of RAM or a processor upgrade to determine the minimum requirements for such a system could it not be implemented on the development board but this turned out not to be the case The development board was also however a critical part of
3. Assembly Instructions PAI queue is a different way of looking at the problem It is based on the observation that there are some packets that are static very close to static or are responses in request response pairs where the response is very similar to the request Neighbour discovery packets and ping responses fit into this category The normal sending of a Neighbour Discovery Request for instance involves copying most of a static structure into a buffer modifying the address to which it is to be sent and then calculating the ICMPv6 checksum In the case of a ping response it is similar it is the copying of a static packet header coupled with a copying of the additional request data Again the checksum is calculated across the entire packet 14 Free Allocate Buffer Buffer Pong Template Copy Template Copy data Pong Packet Ping data Send buffer Normal Ping Processing and Pong Assembly This is inefficient both in its use of memory and processing power The Neighbour Discovery request could be represented as a pre built packet along with an instruction to modify two fields the destination address and the ICMPv6 checksum The ICMPv6 checksum could be recalculated more quickly by taking the checksum of the existing packet and combining it with the checksum of the new destination address rather than the entire packet This is of course assuming that there is no encryption in place The entire set of Neighbour discovery requests
4. This was accomplished using strong reading glasses silver solder flux and a fine tipped soldering iron and then inspected using a microscope to ensure that the wires did not touch After achieving this the soldering of the one hundred pin Ethernet chip to the board was not comparatively difficult After this saga had run to its completion the errors in the serial interface chip were discovered and corrected and several other small modifications to the board were made in the form of pull up or pull down resistors for the In System Programming that had somehow been missed in the design When the In System Programming was eventually working it was discovered that a compiler upgrade was necessary in order to get programs working and loaded onto the micro controller The last error in the hardware was discovered during the programming phase of the final device driver The RTL8019 is a mixed 8 16 bit device and it had been mapped into the address space in a manner which only allowed for sixteen bit transfers There were two possible solutions One solution was to modify the board such that all of the address lines were moved down one bit omitting address line zero The other solution was that the software had to switch the bus between eight and sixteen bit mode when accessing control registers and DMA ports respectively The software method was used to resolve this problem Though the implementation issues involved in this project were daunting
5. as it must wait until it is called again by the DMA transfer completing as it cannot do anything until the DMA transfer is complete The last step of the ISR is to read the current page pointer and the boundary pointer from the RTL8019 The current page pointer indicates where the NIC has written to in the Receive Ring Buffer The boundary pointer indicates where the micro controller has read to in the Recieve Ring Buffer The ISR checks to see whether the two values are equal If they are then nothing is available to be received and the ISR clears any miscellaneous bits Unused informational bits in the Interrupt Status Register and returns If however the two values are not equal then the NIC has received a packet that must be transferred It clears the Packet Received OK bit in the Interrupt Status Register and checks the Free Packet Pool to see whether there is a free buffer If there isn t then the ISR returns This means that the NIC will buffer the packet in the meantime and the ISR will receive it later when there is a free buffer available It then disables the NIC hardware interrupt and configures the RTL8019 for a DMA transfer before switching the system bus into eight bit mode and initiating the DMA transfer At this point the ISR returns 22 There are a couple of differences between this ISR and the previous one Though the operations are still partitioned because of hardware usage the order of the operations is much more flex
6. be seen in the next diagram In this diagram the address resolution request is sent freeing up a transmit buffer and hence allowing processing of Rx Packet 2 to continue In this case the response to Rx Packet 2 Tx Packet 3 is also to an address whose hardware address is unknown but there are no free transmit buffers in which to compose an address resolution request This is not too much of a problem this packet can be sent later but it means that the second transmit buffer is put on the queue awaiting address resolution This situation would be resolvable if the next packet were the packet resolving the address for Tx Packet 1 That would allow Tx packet 1 to be sent freeing up its transmit buffer Instead two more packets are received Rx Packet 3 and Rx Packet 4 The processing of Rx Packet 3 stops at the stack as there are no free transmit buffers and hence freezes the stack Rx Packet 4 is received by the driver but is not processed as the stack is still completing processing of Rx Packet 3 The resulting situation is that the system is deadlocked as it requires an address resolution response in order to free up its transmit buffers but the address resolution response cannot be received because the receive buffers are full This may at first glance seem to be symptomatic of a lack of buffer memory certainly the situation will be exacerbated by a lack of buffer memory However merely increasing the amount of buffer memory wi
7. being transmitted and one for being received This is so that the entirety of the free packet pool cannot be utilised by incoming packets deadlocking a system hence unable to make outgoing transmissions 3 4 The Memory Recycle Algorithm Problems and Solutions There are some practical faults in the implementation of the Memory Recycle Algorithm detailed above one of which is the basic assumption that packets are independent and that time is the only thing limiting buffer flow around the system However the algorithm can be modified to work around these blocking points without too much difficulty 11 Stack buffers Free Receive Buffers Driver Receive Rx Packet 1 Rx Packet 1 Rx Packet 1 Rx Packet 2 Rx Packet 2 EA Rx Packet 2 Stage 1 Receive packet 1 comes in is processed and a response Transmit packet 1 is generated to an unknown address The response is hence frozen and an address resolution packet Transmit packet 2 is sent in order to get a hardware address to send the response to Receive packet 1 is freed Meanwhile Receive packet 2 is received an independent request It is processed but when the stack tries to get a transmit buffer in order to send a response it is frozen until a transmit packet becomes free hence freezing receive packet 2 Stack buffers Free Transmit Driver Transmit Buffers A Tx Packet 1 Tx Packet 1 E Tx Packet 2 Tx Packet 2 Tx Packet 2 The Memory Recycle Algorithm Four Steps to deadl
8. data should be transferred or assembled inside the system only once and the data should then be passed by reference This concept developed from the observation that in Operating Systems of the time a large proportion of the CPU time consumed was in simple copy operations Although the concept is simple the implementation can however be complex due to issues surrounding the use of the memory in a multithreaded environment 3 3 The Memory Recycle Algorithm The Memory Recycle Algorithm is a response to the issues involved in implementing the Zero Copy paradigm and efficient memory use It is intended to avoid the overhead of memory allocation deallocation and excessive copying and it provides a structure that implements the Zero Copy paradigm In the Memory Recycle Algorithm each buffer is treated as an object in that only one distinct process can own the buffer at a time When a particular process has finished processing the buffer it gives the buffer to the next process in practical terms passes a pointer to the object and then forgets that it ever saw the object This concept in the algorithm is the part that implements the Zero Copy paradigm The Memory Recycle Algorithm can perhaps be best explained by explanation of the packet lifecycle Each packet starts as a buffer in the free packet pool When the NIC signals that a packet is ready to be received the Ethernet driver is called It requests a buffer from th
9. in each Carrying this to the extreme would however eventually result in the normal memory allocation behaviour of the system If many small packets are sent in a system that has only the one packet pool of maximum size packets then memory usage is not very efficient except for one point memory allocation can be controlled with respect to time instead of just with respect to memory available Consider the situation where the buffers have been Stack data Free Packet Pool YO Buffers ia ae Packet 2 Packet 2 Packet 2 Packet n The driver detects an incoming packet and gets a pointer to a free buffer DMA occurs to fill the buffer with data The driver passes the full buffer pointer to the stack for processing The stack processes the data and gives the pointer back to the free packet pool The stack gets a pointer to a free buffer and fills it with the packet response The stack passes the full buffer pointer to the driver for transmission DMA occurs to transmit the data in the buffer Once transmission is complete the buffer pointer is given back to the free packet pool A B C D E F G H Note Conceptually each buffer is treated like a physical object in that only one entity has possession of it at any one point in time Itis explained with pointers in this diagram for clarity The Memory Recycle Algorithm Data flow 10 exhausted and the stack wishes to compose a transmission Under a normal memory allocation
10. occur with minimal CPU intervention in parallel with other tasks For a comparison in the MI6C series at a minimum a 16 bit transfer from an external memory area to the internal RAM actuated by a CPU instruction will take two cycles to execute depending on factors such as the number of bytes in the CPU instruction queue buffer Mitsubishi 2001 C A 16 bit DMA transfer under the same conditions will take a guaranteed two cycles without overhead needed for loop control or incrementing of addresses Mitsubishi 2001 A In the best case even when utilising such optimisation techniques as loop unrolling DMA as a mechanism for data transfer is faster and hence more efficient The efficiency is clearly critical when the amount of data to be transferred and processed is taken into account An Ethernet link rate of only 10 Mb could at maximum produce in the order of 1 25 MB per second of data each way full duplex operation This means that the micro controller should optimally be capable of transferring a full 2 5 MB of data across its external data bus at a speed of 1 25 million 16 bit transfers second When the differing amounts of cycles for reading versus writing are taken into account this operation will occupy a maximum of 16 CPU time using DMA close to a sixth of the CPU time available If other operations such as measurement or data processing are also time critical then this means that the time must be efficiently multiplexed
11. of the NICs follow similar forms Following a software reset the hardware address must be programmed into the unit and the device configured to both generate and filter interrupts The physical interface must be selected as both NICs offer the ability to work with both twisted pair and coaxial interfaces although both pieces of hardware have only been designed with the external circuitry for twisted pair operation The multicast hash filter must be initialised and both transmitter and receiver parameters configured In the case of the RTL8019 the endian ness of the NIC interface can and must be set The receiver parameters have been selected such that only correct packets are received as although the hardware will accept and pass on packets with checksum errors and undersized or oversized packets there is no reason to waste resources processing these packets The ability to receive broadcast and multicast packets must also be enabled explicitly In the case of the RTL8019 the receive buffer ring must also be explicitly initialised The transmitter parameters have been initialised so as to maximise throughput and minimise extra processing needed In the case of the CS8900A the device is configured to start transmitting packets after only 381 bytes have been received This allows the use of buffer memory for less time and causes no problem with operation as the DMA controller can easily keep up the necessary data rate to supply the chip
12. the project although the use of two different Ethernet chips required the development of two Ethernet drivers the use of the development board was critical in both an early understanding of the issues involved and in the parallelisation of the hardware and software development 2 2 Problem Statement The notion of an embedded system is a vague one in terms of processing power and memory capacity with such systems including 4 MHz 8 bit micro controllers to 300 MHz Celeron systems such as found in Biscuit PCs Clearly the upper end of this scale is not what this project is concerned with a 300 MHz Celeron certainly has the processing power to manage IPv6 It is also clear that the lower end of the scale is not relevant either a 4 MHz 8 bit micro controller would lack the sheer processing power to even receive 10 Mbit Ethernet at full rate The first decision that was made was to design the hardware with a common embedded micro controller that runs at approximately the 20 MHz mark This means that there is sufficient processing power available to manage the Ethernet communication but also that the device will be close in specification to those employed in such applications as office equipment or consumer network appliances Mitsubishi 2001 The second decision was in order to satisfy the memory requirements for both RAM and ROM within the system The amount of RAM needed was estimated to be at a minimum of 10 KB with prefe
13. was selected as the operating system to run on the PCs because the TAHI test tools have been developed to run on BSD variants and also because BSD contains an IPv6 reference implementation so the machines that the project was tested against were known to be compliant 5 3 Testing Results Extensive testing was not possible to perform due to time constraints However basic testing through the development process showed that the stack functioned In terms of throughput as tested by a flood ping test the implementation on the final hardware was able to handle ping packets at a steady rate of 50000 per minute and that 100000 per minute was possible In the latter test the stack stopped functioning after a short time at the higher data rate and although the matter has not been thoroughly 27 debugged it is thought that the fault may be a missed DMA interrupt The figure of 50000 packets per minute is exhibited by the stable implementation which was capable of sustaining operation indefinitely The tests performed much earlier on the development board showed a sustained packet rate of 36000 packets per minute although that stack stopped functioning after a number of minutes at that data rate These figures cannot however be compared as the tests on the development board used some timing configurations that may have been too conservative with respect to what the CS8900A is capable of The reason for the stack stopping functionality had not bee
14. 7 Differences between the CS8900A and the RTL8019AGS cscsssssssessenees 17 3 8 Ethernet Driver Structure and Operation ssesssesssecssocesooessocessccesocesocssoosesse 18 Non Hardware Specific FUNCION tddi ii 18 3 9 1 Inttialisation Code iii donen eden eiii 18 3 8 2 The Interrupt Service Routine ISR idad ii 19 3 9 Implementation Issues ii iccs sconccsicisscnesissednsecasascvacseunccacteseaesddesansceccsssssceeunscdeseceass 23 A SOFTWARE olaaa A 24 4 1 Problem Statement e sesseseesossesossossesoossesossossesossossesoosoesessossesossossesosssssessessssese 24 4 2 Stateless Address Auto configuration ocomcmss 24 4 3 Duplicate Address Detection ccccccccssssssscssssssscccssssssccssssscccsssssscscssssesseees 25 AA Kotter DISCOVERY RS AR a tiantaa deans ati ear Ria 25 4 5 Neighbour DISCOVERY eeesssooesesosesesssooeessooesssooeeesosecesssocesssoossesooecesosecessseossssoosseo 25 4 6 IPv6 to Ethernet Address Mapping sssesssessseessecesocesocesoosesscessocesocssoossssesssee 26 A TS D Ae EEE T A TE E E TE ATTI 26 4 8 Implementation Issues os ssosocesssocesssooesssoocessosecesssocesssoossssosecesosecesssoosssseossee 26 5 TESTING von sas 27 5 1 Problem Statement soesoessessoesocssessoesoossesooesocssessossocesessossoossessossoossosssesoossossssse 27 5 2 Testing SEU o ores nios N r E i en EEN ETEN EE eE E Enas Sess 27 5 3 Testing A sss sacccsssescensiassvdeesuvav
15. Full TCP IP in 8 bits Sweden Swedish Institute of Computer Science SICS Retreieved 20 March 2003 from http www sics se adam full tcpip in 8 bits ps Erikstan U 2002 A Small Mobile IPv6 Client Thesis Report Australia The Centre for Telecommunications and Information Engineering Monash University Gascoigne C 2002 IPv6 for Embedded Devices Thesis report Australia Department of Electrical Engineering Monash University Hagen Silvia 2002 IPv6 Essentials O Reilly amp Associates Inc Sebastopol California Hinden R amp Deering S 1998 IP Version 6 Addressing Architecture RFC 2373 The Internet Society Retrieved 18 y uly 2003 from http www faqs org rfes rfc2373 html Mitsubishi Electric Corp 2001 B Programming Language lt C Language gt M16C 80 Series Rev A2 Japan Mitsubishi Electric Corp Semiconductor Marketing Division Mitsubishi Electric Corp 2000 MI6C Software Manual Rev D1 Japan Mitsubishi Electric Corp Kitaitami Works Mitsubishi Electric Corp 1999 M16C 80 Series Programming Manual lt Assembler Language gt Japan Mitsubishi Electric Corp Kitaitami Works National Semiconductor Corp 1995 DP83901A SNIC Serial Network Interface Controller Arlington Texas National Semiconductor Corp National Semiconductor Corp 1993 Writing Drivers for the DP8390 NIC Family of Ethernet Controllers Arlington Texas National Semiconductor Corp 30 31
16. IPv6 in Embedded Micro controllers COMP420Y Final Report Student Jeremy Stringer 0023109 Supervisor Richard Nelson J Stringer 2003 Table of Contents 1 INTRODUCTION ile 4 LE Perico 4 ZMHAROWARES O AA eine 5 PA AAA nn o O 5 22 A O 5 2 3 Component DECISIONS 652555 ccessieaseciiecsceesatoacudeceetecvuncerdesesdecavocsseapedannesevccsonseoignesetened 6 2 4 Direct Memory Access DMA SUppoOTt cocconccoonoccnonoccnnnnccnnnnccnnoccononcccnnnccconoconnos 6 2 5 Implementation arar in id da ci ida 7 2 6 Implementation Issues iciesisssscsesicsccscossisaccacisibesvenscdcciiessesvciecssscnsanissseverssserdencessvect 8 3 HARDWARE SOFTWARE INTERFACE cccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeees 9 3 1 Proplem Statement aca 9 3 2 The Zero Copy Paradigm dins 9 3 3 The Memory Recycle Algorithm 0scsssssscsssssssssssssssesssssssssssesssssessnsserenes 9 3 4 The Memory Recycle Algorithm Problems and Solutions ssccsesscees 11 3 4 1 Deadlock detection and buffer flushiNg oo ooooonnnocccnoncccnonccnnoncconnnccnnnnccnnnos 14 3 4 2 Dedicated Privileged DU da 14 3 4 3 Packet Assemble Instructions PAT Queues cccsccsccceceesesessceeeeeeeeeeees 14 3 5 WERT SUED CT END ssij 5 cease opeees ve euaasaseeneass cote ca ban canseny co o esoop dam terasta isst eo ate 16 3 6 Multicast Reception seoesssoocsssosccssssocesssoosessooesssosecessscossssoossssooecesosecesssosssssoosse 16 3
17. NIC The last is that additional support is required in order to support IPv6 over Ethernet Unlike PPP Ethernet has multiple destinations reachable on one interface and hence requires hardware addressing Other requirements specific to Ethernet are the implementation of Stateless Address Auto configuration Neighbour Discovery and the mapping of IPv6 addresses to Ethernet Addresses in general These will be detailed in later sections 4 2 Stateless Address Auto configuration In a protocol such as IPv4 nodes must either be statically configured with an address or they must request an address from an external source This is due both to the design and the limited address space of the protocol In IPv6 however 128 bits are allocated to the address and in addition one of the addresses that an interface may have is an address scoped purely to the link to which it is attached Stateless Address Auto configuration is a process by which a node may allocate addresses to its interfaces itself instead of being forced to have them configured statically or externally This is an advantage in deployment as it is part of a set of features that make IPv6 literally plug n play with respect to a network environment Due to the vast array of network types that IPv6 is expected to run on Stateless Address Auto configuration is specified in a link specific manner In the case of Ethernet the host portion of the IPv6 address is formed from a identifi
18. asuteer sconseus A 27 By CONCLUSION ona ie aia 28 Y REFERENCES urnas 29 8 BIBLIOGRAPHY iii acia 30 1 Introduction This report concerns itself with the practical issues in implementing the Internet Protocol Version 6 IPv6 in embedded systems IPv6 is the next generation of the Internet Protocol the protocol that all computers use to communicate across the Internet It supports features such as auto configuration that are ideal for Low Cost Network Appliances LCNAs by making them easier to configure and deploy One problem however is that IPv6 has considerably more complexity than IPv4 in some areas and as such presents practical difficulties in its implementation in embedded systems An embedded system is a small computer system literally embedded into an appliance or piece of machinery Embedded systems are typically very limited in the amount of memory and processing power that they have and often do not even run Operating Systems For this project I have built a small embedded system that is typical of an low cost network appliance and ported and modified an existing IPv6 stack to it The existing IPv6 stack that was used communicated via a slow serial line and needed to be tested on Ethernet hence I wrote an Ethernet driver and implemented specific functionalities required for Ethernet operation such as Neighbour Discovery and Auto configuration as detailed in later sections The initial software development was done
19. atest received packet to indicate that it was ready for passing to the stack before reading the Interrupt Status Queue on the CS8900A If the result was nonzero indicating that the NIC had signalled an event then the ISR would execute the event handling code otherwise it would continue on to the transmit initiation code to transmit any outstanding packets The event handling code would then check the Interrupt Status Queue value to determine which particular event had occurred The most important events were the Transmit Ready Transmit Complete Transmit Terminated and Receive Events the others events generated by the NIC are primarily informational or only useful for debugging The Transmit Ready Event would indicate that the NIC was ready for the next transmission to be started In this case the ISR would initiate the DMA transfer which had been previously set up by the transmit initiation code and return from the ISR The Transmit Complete Event would signal that the most recent transmission had completed successfully without colliding with other packets In this case the last transmitted packet would be taken off the transmitted queue and transferred to the free packet pool leaving a fresh entry on the top of the transmission queue for the next time the queue was accessed by the transmit initiation code The Transmit Terminated Event was initiated when transmission of a packet was terminated due to excessive collisions The ISR inc
20. cating the obvious In the case of a successful transmission it frees the most recently transmitted packet by transferring it to the free packet pool and updates a static variable that keeps track of the RTL8019 transmit ring buffer usage In the case of the transmit failing it merely changes the state of the packet indicating that it needs to be retransmitted After these processing steps are complete it resets the bits that it uses in the Interrupt Service Register in a similar manner to the DMA It is worth noting that each functional section in the ISR only resets the bits that are responsible for triggering them This is because the Interrupt Status Register can change over the execution time of the ISR and if they cleared all bits then they would eliminate critical status information causing severe ISR malfunction The next step of the ISR is to check for any packets that are ready to be transmitted If there are then the size of the packet in 256 byte pages is calculated and hence it is determined whether there is enough space in the RTL8019 transmit buffers for the packet If there isn t then execution continues after this stage of the ISR If however there is space then the NIC hardware interrupt is disabled the chip is configured to set up a DMA transfer the bus is put into sixteen bit mode and a DMA transfer is initiated The state variable is also set to indicate that a DMA transfer is in progress At this point the ISR returns
21. ck is expected to send or receive at this point mostly addresses used by processes such as Neighbour Discovery An additional but not overly significant overhead is that in order to receive packets addressed to multicast addresses the driver should be instructed to program the NIC to receive packets from these addresses The alternative is to receive all multicast packets lowering efficiency This involves a checksum calculation duplicating the checksum hardware in the NIC This would only be an issue if large amounts of multicast groups were joined and left constantly using a lot of CPU time and requiring that the addresses be kept track of At the current time the driver does not have this functionality and instead accepts all multicast addresses 4 7 IPSec IPSec is the security and encryption layer for IPv6 and it is a requirement for all IPv6 nodes to implement Loughney 2003 The minimum requirements for IPSec are that the ESP and AH protocols are implemented IPSec was not implemented in the stack due to time constraints It will however have a significant effect on the feasibility of IPv6 on embedded micro controllers because it involves a data processing step over most of the packet which will slow the stack down considerably This may be a function better implemented in specialised hardware positioned between the micro controller and the NIC as the encryption methods supported in IPv6 have been selected to be able to be
22. cks the top elements of both the transmit queue and the receive queue to determine whether the event was the completion of transmit DMA or receive DMA In the case of the receive DMA finishing it sets the status of the packet to done making it ready to be picked up by the stack It then updates the boundary pointer register in the RTL8019 to indicate the point up to 21 which it has read In the case of the transmit DMA finishing it goes through a short wait cycle checking the Interrupt Status Register until it has determined that the RTL8019 has internally completed the DMA transfer before moving the packet from the transmit queue to the transmitted queue reading for the transmit command After the ISR has completed the appropriate post DMA processing it resets the DMA completed flag in the Interrupt Status Register The Interrupt Status Register is the register that signals what event has caused a hardware interrupt if it is non zero then the RTL8019 interrupt line is asserted The ISR then re enables the hardware interrupt that would normally trigger it and resets the state variable The hardware interrupt that would normally trigger the ISR is disabled during DMA transfers because the Interrupt Status Register as noted before cannot be read during DMA The ISR then reads the Interrupt Status Register to check the status of the NIC The first bits that it checks are the Transmit completed ok and the Transmit failed bits indi
23. could hence be represented by a declaration of the static portions of the packet along with new destination new checksum parameter pairs Ping responses could be represented the same way using the same incremental checksum update techniques with the exception that they would have to include a pointer to the ping data to be sent This leads to the idea of the Packet Assembly Instructions queue The idea of the PAI queue is that instead of filling a buffer for certain packets instructions for packet assembly are sent The driver could then modify a static buffer containing the appropriate packet format and immediately transmit the packet to the hardware This would mean that the processing could be done more efficiently as only the necessary data is copied inside the CPU and also only one copy of the static data is used at all and hence the use of memory is vastly more efficient The assembly could however only be done at the driver level as only the driver can guarantee synchronised access to write to the static data structures Stack layering would still be preserved as the packet formation would still effectively be done at the correct level of the stack 15 Ping Packet Packet Reception d DMA PingData a Pre calculated Calculate response Parameters parameters Packet Assembly Send PAI i i and DMA Ping Processing and Response using Packet Assemble Instructions PAI Method This takes the idea of Zero copy one step
24. e free packet pool and transfers the data into the buffer via DMA When DMA completes the driver is called again and the packet is transferred into the queue of packets that the stack is processing The driver returns and the stack then processes the packet Upon completion of the processing the stack gives the packet back to the free packet pool The stack will then usually need to produce a response To do this it requests a buffer from the free packet pool fills the buffer with data and the response buffer is put on to the queue of packets ready for transmission by the driver The driver is then called again and if the NIC is ready for a packet to be transmitted then the packet is transferred via DMA When DMA completes and the packet has been transmitted successfully then the driver gives the buffer back to the free packet pool The algorithm avoids the overheads of memory allocation and de allocation by effectively simplifying the memory allocation scheme for packets which also has the benefit of reducing memory fragmentation to zero given the small amount of memory available for processing packets and the otherwise high allocation de allocation rate this is a highly desirable result One drawback is that in the simplest of implementations at least the buffers in the free packet pool must be at least large enough to hold a complete IPv6 packet This could be overcome by having multiple free packet pools with different size buffers
25. edure merely inserts the free buffer into the free buffer linked list for the ISR to utilise at its leisure The send buffer and release receive buffer functions are unique in that they are the only exported functions that actually trigger an interrupt The send buffer function does this only if the NIC is idle as there could otherwise be considerable delay in transmitting packets The receive buffer function triggers an interrupt only under the same conditions in order for the Interrupt Service to quickly process delayed packets 3 8 1 Initialisation Code The initialisation code naturally differs in both drivers but is very similar both structurally and in intent It initialises the queue structures used in the driver the 18 hardware the interrupts and the DMA In the case of the Mitsubishi demonstration board it sets the DMA controller to trigger off an clock signal internally generated within the micro controller and also initialises a system timer to generate this signal In the case of the final hardware it sets the DMA controller to trigger off the INT1 signal which is the line to which the hardware trigger is wired It detects the NIC by reading the product ID code naturally if in an embedded system the hardware were missing this would be extremely surprising but this also serves to ascertain that the hardware is responding correctly and that the system is configured correctly in the development process The initialisations
26. ements Neighbour Discovery Mostly It supports router discovery It supports prefix discovery It responds to Neighbour Solicitations It implements DAD It supports sending router solicitations It receives and processes Router Advertisements It can send and receive Neighbour Solicitation and Advertisement messages It supports ICMP v6 Conta Deering 1998 It supports IPv6 Stateless Address Auto configuration Thomson Narten 1998 It uses native addressing instead of transition based addressing RFC3484 6 Conclusion The following objectives have been met in this project o An embedded system has been designed representing a typical embedded system that maximised the usage of the hardware resources available to it o The aforementioned system has been constructed debugged fully tested and works successfully 28 o A stack for a different processor has been ported to the M16C platform for use on both the demonstration board and the final hardware produced This stack communicated via PPP over serial and hence the hardware layer driver was removed o An Ethernet driver has been written for each system used in the project o The Ethernet driver for the CS8900A was moderately successful o The Ethernet driver for the RTL8019 operates successfully o Ethernet specific functionality has been written into the stack including Neighbour Discovery Address Auto configuration Duplicate Address Detection and mapping between Eth
27. er called the EUI 64 The EUI 64 is an identifier that is guaranteed unique to each Ethernet node as it is formed from the 48 bit Media Access Controller MAC address by the expedient of inserting two bytes in the middle To use the EUI 64 to generate the host portion of an address one bit is inverted the universal local bit is inverted in the EUI 64 Thomson Narten 1998 Unfortunately the process is more complex than the generation and use of a magic number since other methods of generating an address on a link are possible and hence the host must check that the address is unique on the link It does this by performing a process known as Duplicate Address Detection DAD and then by searching for routers to give it a link prefix so as to form a globally unique address 24 4 3 Duplicate Address Detection Duplicate Address Detection DAD is performed by forming a tentative address from the EUI and the link local prefix which is a special network prefix meaning on this link The host then sends a number of Neighbour Solicitations querying whether another host is already using the address The Neighbour Solicitation message is a general message that is used for querying the mapping of IPv6 addresses to Ethernet Addresses on a link If another host is either using the address already or is testing the address with a view to using it then Duplicate Address Detection fails and the interface must be configured manually Th
28. ermined to be the minimum number of layers necessary in order to be able to complete the design Two layers was considered the minimum especially when noise considerations were taken into account two layers allowed a reasonable area of one layer to be used for a ground plane as well as providing for the routing considerations in connecting two 100 pin chips The costs of manufacture was also a factor in the design a two layer board is considerably cheaper than boards with more layers as it can be manufactured in New Zealand Quotes were obtained from multiple manufacturers confirming this Implementing the design in two layers was of moderate difficulty as the design parameters of the PCB manufacturer were larger than manufacturers offering multi layer board manufacture so some parts were changed and the board was increased in size The board was designed for hardware triggered DMA operation as described above and featured two serial ports for In System Programming a method by which the Flash ROM in the micro controller is written via serial port debugging via monitor and simultaneous program output The micro controller units were sourced from Mitsubishi in Australia and arrived promptly The NICs were sourced from Cornelius Consulting in Germany as Realtek does not provide samples and only sells in bulk The PCBs were manufactured in Auckland by Circuit Graphix Ltd 2 6 Implementation Issues Due to the timeframe of the projec
29. ernet and IPv6 multicast addresses o Limited testing has been performed on the stack to ensure that it can configure itself correctly and communicate with other devices both on link and off link o A data flow system has been implemented so as to make efficient use of memory and time Although there is more work to be done in order to make the software produced in this project fully compliant this project shows that it is at least feasible to implement IPv6 on an embedded device The system produced is capable of up to 100000 packets minute corresponding to a data rate of approximately 4 Mb s each way with room for improvement This is a utilisation of approximately 40 of the link rate and a respectable figure for a low powered device The only barrier to feasibility of IPv6 in embedded devices that remains is the question of IPSec which this project does not address 7 References Conta A amp Deering S 1998 Internet Control Message Protocol ICMPv6 for the Internet Protocol Version 6 IPv6 RFC 2463 The Internet Society Retrieved 5h April 2003 from http www faqs org rfcs rfc2463 html Crawford M 1998 Transmission of IPv6 Packets over Ethernet Networks RFC 2464 The Internet Society Retrieved e April 2003 from http www faqs org rfcs rfc2464 html Deering S amp Hinden R 1998 Internet Protocol Version 6 IP v6 Specification RFC 2460 The Internet Society Retrieved 18 3 uly 2003 from http
30. ess to which Tx Packet 1 must be sent Since the stack was able to get a transmit buffer and compose a response processing of Rx Packet 1 completes and Rx Packet 1 is released into the free packet pool A second packet Rx Packet 2 is then received by the driver and passed along to the stack Unlike the previous packet processing on this packet cannot be completed as there are no transmit buffers available in which a response can be built The transmit buffer allocation procedure locks the IPv6 stack until a transmit buffer becomes available in this case when the address resolution request finishes transmitting 12 Stack buffers Free Receive Buffers Driver Receive LA Rx Packet 4 Stage 2 Tx Packet 2 The Address Resolution Packet is sent freeing up a transmit buffer for Receive Packet 2 The stack generates the response Transmit Packet 3 to Receive Packet 2 again to an unknown address An address resolution packet must be sent but there are no free transmit buffers hence Transmit Packet 3 is frozen for the time being Receive Packet 2 is however released In the meanwhile reception continues and Receive Packet 3 comes into play again being frozen as there are no free transmit buffers Receive packet 4 delivers the coup de grace Stack buffers Free Transmit Buffers Driver Transmit Buffers Tx Packet 1 Tx Packet 1 The Memory Recycle Algorithm Four Steps to Deadlock Part 2 The second phase of this occurrence can
31. from the Interrupt Service Routine This means that there is a period in the Interrupt Service Routine after the reading of the Interrupt Status Queue where new events can occur but not generate a response due to the micro controller ignoring the interrupt Unless enabled explicitly interrupts cannot interrupt Interrupt Service Routines The programming interface consists of the queue of events being exposed via the Interrupt Status Queue and also a set of control registers which can be written to so as to execute commands to initiate received buffer transfers or transmissions It is a relatively simple interface in terms of transmission and reception as processes like buffer management and event queuing are managed on the NIC The RTL8019 has by contrast a slightly more complex interface due to more exposure of the internals to the outside world The interrupt is level triggered which means that the interrupt logic needs to be reset for each event as otherwise the Interrupt Service Routine will be called again by the micro controller on exit This does however provide the advantage that in the period between the Interrupt Status Register and the Interrupt Service Routine exiting the Interrupt Service Routine will be immediately re triggered The memory management of the buffers is reasonably simple and is exposed to the micro controller Memory management of the NIC transmit buffer space is left to the micro controller and memory manageme
32. further instead of avoiding copying the packet around in the CPU it avoids even assembling the packet in the CPU and it achieves this in an efficient way that requires very little processing It allows a very low memory usage out of band channel to be constructed aiding the solution of the deadlock problem in the transmit path but handling of the receive path would have to follow a similar procedure as outlined under the other solutions 3 5 Multi Buffering The use of DMA allows the parallelisation of data transfer and other tasks but to fully exploit this fact software support must be provided as well parallelising tasks is not possible without the tasks themselves being parallelisable In terms of getting maximum throughput while transferring data to the hardware this project utilises a multi buffering scheme The effect that this scheme has is that previous data is being processed while new data is being transferred from the hardware This requires at least two buffers per transfer direction one buffer for the data that is currently being transmitted and one working buffer where data is being assembled or processed This transforms the stack data transfer from what is essentially a stop and wait scheme to a sliding window scheme The Ethernet driver produced allows the number of buffers to be easily modified 3 6 Multicast Reception Both NICs must be explicitly programmed to receive multicast packets but ca
33. however for retransmissions of address resolution request packets This method is one of the simplest solutions but probably the least efficient 3 4 2 Dedicated Privileged buffers Another way of dealing with the deadlock problem as outlined above is by the introduction of dedicated buffers for address resolution Address resolution packets in IPv6 are only 100 bytes compared to full size Ethernet buffers at 1 5 KB or minimum MTU size IPv6 buffers at 1 2 KB This solution would set aside small sized privileged buffers only usable by address resolution code In the case of address resolution transmissions these buffers would recycle quickly and reliably since address resolution requests are targeted towards multicast addresses which are already known In the case of address resolution responses however filtering would need to be performed The first filter would be at the Ethernet driver level which would drop any packets too large to be an address resolution packet The second would be at the stack or address resolver module which would filter any packets that were not address resolution responses This solution is an improvement on the method above as it does not waste the processing time that has already been used producing packets but instead provides an out of band channel in the buffer management in order to get requests through It is still relatively simple however 3 4 3 Packet Assemble Instructions PAI Queues The Packet
34. ible as the state of the hardware is only changed explicitly not implicitly This means that receive operations can be delayed and packets left in the hardware buffer for later processing enabling the order of driver operations to follow the natural data flow of the system The second main difference is because the RTL8019 only changes its internal state by explicit operations the demands on the ISR are relaxed as there is little danger of losing data by out of order operations The other side to this of course is that some smarter management is needed in the ISR but this works to the advantage of the programmer by increasing the flexibility In this ISR for an example the amount of free transmit buffer space in the NIC could be read and then if it was not possible to transmit the oldest packet due to size constraints then the transmit packet queue could be checked for an appropriately sized packet Due to time constraints this was not implemented along with a few other elements that would improve the robustness and efficiency of the driver such as checking for Receive Buffer Ring overflows 3 9 Implementation Issues There were a number of issues experienced in the programming of the hardware software interface that made it more difficult Some of these were from the compiler some from the documentation available and some were a result of hardware issues The issues with the compiler were in code generation In certain cases the compile
35. is situation can be detected by either receiving a Neighbour Solicitation message querying the tentative address or by receiving a Neighbour Advertisement a message stating the Ethernet address mapping to the IPv6 address In order to be able to receive a Neighbour Advertisement message a host is required to include its Ethernet address in the Neighbour Solicitation so that other hosts have a hardware address mapping that they can use to address a reply The Neighbour Solicitation message is addressed to a special IPv6 address the Solicited Node Multicast address formed by a Solicited Node Multicast prefix and the lower 24 bits of the address required Narten Nordmark Simpson 1998 4 4 Router Discovery Router discovery is fairly simple in IPv6 a host sends out a Router Solicitation message addressed to the all routers multicast address and if there are any routers attached to the link they will reply with a Router Advertisement message containing the prefixes for the link as well as other parameters that the host should or may use such as the Time To Live for outgoing packets the link MTU and optionally the router s link layer address The host then combines the advertised network prefixes with its host identifier to get a unicast addresses scoped globally or to the site The information in the Router Advertisement is however time limited and includes an expiry date at which point the host may no longer use the p
36. ll only make the problem less likely to occur as in a theoretical sense this problem could clearly occur regardless of the amount of buffer memory unless other steps were taken 13 There are however solutions to this problem three of which are detailed below A complete solution would likely combine aspects of each solution as some have the advantage of simplicity while others offer better performance in general 3 4 1 Deadlock detection and buffer flushing One way to deal with the problem of deadlock in this manner is with co operation of the IPv6 stack In this case the stack updates a variable when it receives each packet as to whether it is an address resolution packet or not In the case where the transmit buffers are all engaged a situation that is easy to detect in the buffer allocation function these buffers can be checked to see if they are waiting on address resolution If they are then the transmit buffers can be flushed freeing them up for new responses This method is perfectly valid in IPv6 as it expects to see a lossy transmission medium but it is not very efficient due to the amount of CPU time already spent processing the relevant packets A similar variant on this is to have a separate module for address resolution filtering out non address resolution packets in situations where all the transmit buffers were waiting on address resolution This variant would still have to drop packets from the transmission queue
37. ly One example of this is in the addressing module the Neighbour Cache should really be a direct mapped cache but was implemented very quickly as a fixed array structure with first in first served mapping This simple modification would show significant differences in stack performance 5 Testing 5 1 Problem Statement For an IPv6 implementation to be useful it must be known to follow the IPv6 standards so that it can communicate with other devices including devices from different vendors The way that this is done is to do conformance testing running a test suite that can report whether the stack satisfies the requirements The TAHI test tools are tools that were designed for such a purpose and satisfy the requirements for IPv6 testing However during the development process conformance tools are not necessarily useful as often very localised tests testing specific or incomplete functionality are required 5 2 Testing Setup The WANDNet test network was used to test the hardware and software in the development stages The FreeBSD operating system was installed on two spare PCs and the PCs were connected to WANDNet along with the device under test One of the PCs was configured as an IPv6 router in order to test stateless address auto configuration and the other was configured as an IPv6 host in order to be able to test on link directly connected and off link indirectly connected via a router data transmissions FreeBSD
38. ming as it makes the device easier to program and test Events are signalled by the Interrupt Status Queue as stated before but the role of the Interrupt Service Routine is then just to read registers make decisions on what data to transfer and trigger transfers as opposed to having to process specific events as is the case with the CS8900A The transparency of the buffer management interface is an advantage as it means that the micro controller can form a complete picture of the state of the buffers vastly improving its ability to utilise the buffers in a manner that supports software buffering This is clear when the following situation is considered If the stack has two buffers for reception totalling 3KB and the NIC is configured to set aside 13KB of 16KB of buffer space for reception buffers then the total amount of space that can be utilised for buffering is 16KB assuming that the buffer space is managed in a sensible manner This is an example of the maximum utilisation of hardware needed for high efficiency in systems utilising limited memory 3 8 Ethernet Driver Structure and Operation The Ethernet Driver is in the critical path of all packets and it is hence critical that it be as efficient as possible To make the matter more complex it must fully utilise the available hardware such as to maximise efficiency and also be structured with care so that the correct sequences of operations are performed upon the hardware for ins
39. n be programmed to be selective about which multicast packets are to be received This is done via the Hash Filter in the CS8900A and the Multicast Address Register in the RTL8019 The mechanism is very similar When a packet is received its address is put through a CRC function to produce a 6 bit code This six bit code is put through a demultiplexer to in order to select 1 of 64 bits If the bit that is selected matches the corresponding bit in the hash filter register and that corresponding bit is set to a one then the packet is accepted Due to time constraints no use was made of the hash register but it could be a factor in speeding up processing as detailed in later sections The address so accepted is referred to in the documentation as a hashed address or a hashed multicast address 16 3 7 Differences between the CS8900A and the RTL8019AS The CS8900A is the NIC in the Ethernet Card used on the demonstration board and the RTL8019 is the NIC used on the final hardware The two NICs are quite different in terms of the programming interfaces that they export and an understanding of these interfaces is relevant to understanding the Ethernet driver operation The CS8900A exports quite a simple interface it generates an interrupt and passes sequences of event in the Interrupt Status Queue register The interrupt is edge triggered and hence the interrupt logic in the NIC does not have to be reset on return
40. n fully debugged on the CS8900A due to time constraints but the problem is thought to be linked to an incorrect skipping of packets causing internal memory management problems inside the CS8900A The correct manner of processing the packet skip operations had been implemented as per the documentation but the symptoms are the same as if it had not Apart from the throughput testing only function testing has been performed The stack correctly auto configures itself and correctly detects routers and configures link prefixes It correctly performs both Neighbour Discovery and Duplicate Address Detection and is capable of responding to and addressing hosts both on and off link One issue struck with the testing was that the initial routing daemon failed to respond to Router Solicitations This was resolved by replacing the routing daemon after an attempt at debugging that showed that packets were not being passed correctly to the daemon This may have been a configuration error but the quicker and easier solution was simply to replace the daemon The stack complies with the following node requirements It follows the appropriate layer 2 document IPv6 over Ethernet Crawford 1998 It processes unrecognised options in hop by hop or destination options It processes extensions as described in RFC 2460 It limits the layer 4 payload size because it doesn t allow outgoing fragments It accepts packets addressed to it but not router packets Impl
41. nt elements of the NIC receive buffers are manipulable by the micro controller meaning that if jams occur it is immediately clear to the debugger as contrasted to the CS8900A where errata and FAQs indicate what operations or sequences of operations should not be performed The way that the RTL8019 manages its buffers is by the control logic for the Receive Buffer Ring four registers As the name suggests the Receive Buffer Ring is a space in the RTL8019 buffers that is managed by treating it as a ring of buffers The four registers that control the Receive Buffer Ring are the Boundary Pointer Current Page Pointer Page Start Pointer and the Page Stop Pointer The Boundary Pointer is set to the end of the last buffer that the micro controller has read while the Current Page Pointer is set to the beginning of the next free 256 byte page of buffer memory If the Current Page Pointer should be incremented such that it is equal to the Boundary Pointer then a Ring Buffer Overflow condition is considered to have occurred and the NIC will need to be reset The Page Start and Page Stop Registers respectively set the beginning and the end of the Receive Buffer Ring in the buffer RAM The ability to modify these registers means that varying proportions of transmit to receive buffer space can be set which is an advantage as this ratio can be optimised to the application 17 The exposure of the Receive Buffer Ring is significant in terms of program
42. ock Part 1 One clear blocking point at the level of code that this project is concerned with is address resolution When a packet is addressed to an IPv6 address that the system cannot yet map to a hardware address the packet is put on a queue and an address resolution request is sent When an address resolution response arrives this queue is checked and any outstanding packets that can be sent are sent However if too many packets requiring address resolution are queued then the transmit buffers can become blocked When the transmission buffers become blocked the blocking effect propagates through the system the stack cannot compose responses because it has no buffers to compose them into and it cannot receive any more packets because it is waiting for transmit buffers in order to complete processing on previous packets This results in deadlock if not checked by other mechanisms The first phase of this occurrence can be seen in the diagram above In this case a packet Rx Packet 1 is received by the Ethernet driver and is then passed along to the stack which generates a response The response is directed towards an IPv6 address that hasn t been seen yet hence there is no way to send the packet since all Ethernet packets must have a hardware destination address The response Tx Packet 1 is put in a queue awaiting address resolution and an address resolution request Tx Packet 2 is sent out in order to get the hardware addr
43. on a development board on loan from Mitsubishi in order to parallelise the software and hardware development and as such two Ethernet drivers were written as the hardware that was produced used a different chipset from the development board Ethernet card The purpose of this project was to find the practicalities and limits of implementing IPv6 in an embedded system by building a typical system and attempting to modify an existing IPv6 stack to run on it The problems that have been tackled in this project fit into four main categories the design and construction of the hardware the data transfer strategy and interfacing of the stack to the hardware the IPv6 stack development and the testing of work completed The following report is structured as such 1 1 Specification IPv6 is still a protocol under development and work is still going on to standardise the minimum node requirements Work has been done by the TACA Tiny software and system Architecture for non Computer Appliances Project to define the minimum requirements for LCNAs Low Cost Network Appliances Okabe Sakane Inoue Ishiyama Esaki 2003 but that effort has since been absorbed into the main IPv6 Node requirements document Loughney 2003 which defines what an IPv6 node must implement in order to be considered IPv6 compliant The elements of the node requirements document that this project is compliant with are listed under testing 2 Hardware 2 1
44. performed efficiently in hardware In addition to this requirement automatic key generation may be too higher a load for the micro controller to handle while dealing with other tasks 4 8 Implementation Issues The implementation issues experienced during the work on the stack were similar to the issues experienced in the software hardware interface programming The compiler s code generation was a factor in the stack programming as well as in the hardware software interface 26 In terms of selecting code to modify the code that I was originally to modify was in fact a different stack that had already been ported to the M16C but the commenting was poor and the code a little untidy with sections of code commented out so I decided to use the original stack that it had been based on It was noticed that some of the standard C routines were a little slow and so for time critical sections of the code functions such as memcpy have been replaced with macros to do fixed length copy operations which produced substantial improvements These sort of modifications have not been implemented throughout the code however and the code as it stands could benefit from many optimisations as the focus of the project in terms of the time constraints was to prioritise the objective of getting at least a minimum set of functionality As a result complete compliance was not achieved in all modules and some optimisation work would benefit the stack great
45. program the device and it is difficult to program and understand a device based on public driver code written for other operating systems and architectures Fortunately upon contacting Realtek they were able to provide me with datasheets that were sufficient to program the device in fact devices for a National Semiconductor chip that the RTL8019 was an upgraded clone of The material that they gave me also had example drivers for some embedded systems as well as popular PC operating systems but these were only of limited use since the embedded drivers were primarily using polled mode instead of interrupt driven I O 23 The last issue experienced with the hardware software interface was due to the 8 16 bit hardware error detailed in the hardware section This took some time to debug as the problem eventually had to be diagnosed with an oscilloscope 4 Software 4 1 Problem Statement The stack that needed to be modified was a stack written by Chris Gascoigne for the Infineon C167 processor It communicated via the Point to Point Protocol PPP over a 9600 bps serial line The requirements of this project dictated a stack for an M16C family processor communicating via 10 Mbit Ethernet This dictated three main objectives for the stack The first is that the stack had to be ported to the M16C family of micro controllers The second achieved by the hardware software interface is that the micro controller had to be interfaced to the
46. r would fail to perform shift operations correctly instead setting the variable to zero This problem had to be resolved and debugged down at the assembly language level and the debugger Running on the PC and communicating with the board via serial port provided with the development kit proved an invaluable tool in resolving this problem However much care had to be taken in writing further code as even with care this error was experienced several times The second code generation problem related to near and far pointer parameters to functions The micro controller being a 16 bit device with a larger than 16 bit address space has the concept of near and far pointers the near pointer being 16 bits and the far pointer being 32 bits In the case of pointers being passed as parameters the pointers were not automatically converted for instance two bytes of pointer data would be pushed onto the stack and four bytes taken off as a parameter This problem was bypassed by using explicit casts on some parameters to functions although it came up a number of times as much care had to be taken to avoid the situation These issues had a strong effect on the development of the code as a compiler is usually a trusted part of the system and hence compiler error is one of the last possibilities to be considered and examined The RTL8019 documentation was the second major issue the datasheet that Realtek produced for the RTL8019 is insufficient to
47. refix Narten et al 1998 This is intended to allow site easy renumbering Unfortunately due to time constraints this timing out of router information has not yet been implemented 4 5 Neighbour Discovery Neighbour Discovery is the IPv6 equivalent to the Address Resolution Protocol ARP it is the protocol that maps IPv6 addresses to hardware addresses on multicast capable links Neighbour Discovery also keeps track of reachability in both directions To map an IPv6 address to an Ethernet address the host sends out a Neighbour Solicitation message as detailed above under Duplicate Address Detection If it gets a Neighbour Advertisement back then it knows the Ethernet address and also that the host is bidirectionally reachable at that point in time and so it caches the address for a time defined by the mandated ReachableTime variable If it fails to receive a Neighbour Advertisement in a certain amount of time then it will retransmit the Neighbour Solicitation a certain amount of times Narten et al 1998 The behaviour above is sufficient to get enough address information to be able to transmit packets to the specified IPv6 address but it does not address questions of reachability The Neighbour Discovery specification specifies a number of states that addresses may be in at certain points and appropriate actions to take Due to time restrictions the complete Neighbour Discovery specification has not been completed 25 in this p
48. remaining reasonably compatible enabling the replacement of the micro controller should the selected micro controller prove insufficient for the task The M30803 runs at 20MHz which is the same as the 10 Mbit Ethernet NICs and features 20 KB RAM and 256 KB of ROM hence satisfying or exceeding memory and processing power requirements It has a DMA controller and features In System Programming ISP a technology that allows the internal ROM to be programmed without removing the micro controller from the circuit It is also a micro controller aimed at the household appliance and office equipment market which means that it is well suited as exemplar of the target system family Mitsubishi 2003 The Realtek RTL8019 was selected for the Ethernet interface as it is a cheap and common NIC and it supports both 10 Mbit Ethernet and the ISA bus interface standard which makes it easier to interface to the micro controller as opposed to more recent NIC designs which utilise the PCI bus common in modern PCs In addition to these considerations the hardware structure of the RTL8019 supports solutions to some Hardware Software interface difficulties involving low memory and the Memory Recycle Algorithm This is detailed under later sections 2 4 Direct Memory Access DMA Support Direct Memory Access is important in an embedded system running a high bandwidth device such an NIC because it allows more efficient copying and it allows the copying to
49. rements a counter at the moment but it would normally retransmit the packet in this case The last main event is the Receive event indicating that the hardware has received a packet This event indicates whether the packet received was addressed to a broadcast unicast or hashed address The ISR doesn t pass this address type information up the stack at this stage but it may be desirable for it to do so in the future in order to speed up address recognition further up the stack Upon receiving this event the ISR would check that the Packet OK bit was set before checking for a free receive buffer If a free receive buffer was available the ISR would allocate it and then initiate DMA to transfer the packet into the waiting buffer If there were no free receive buffers then the ISR would issue a skip packet command to the NIC in order to avoid overflowing the internal memory an event that would require a minimum of a 10 ms reset time to correct The ISR would then return Should the ISR process an event that did not require returning from the ISR it would loop back to the start and continue reading the Interrupt Status Queue until the time at which the Interrupt Status Queue held a value of zero signalling that no events were currently outstanding If the ISR had not returned by this point it would check to see if there were any outstanding transmission requests If so it would initiate them which involved writing a transmi
50. rence for more The reasoning for this is that the existing stack already uses 10 KB of RAM and with the addition of Ethernet functionality the memory requirements increase The amount of ROM was estimated to be a minimum of approximately 10KB This figure allows for the code size of the existing stack with some room to spare Given that in most embedded micro controllers the ROM capacity far exceeds the RAM capacity this metric was not important in selecting a micro controller Renesas 2003 In addition to the requirements above the decision was made to use a micro controller with DMA support in order to relieve CPU contention as the transfer of Ethernet scale amounts of data amounts to a considerable waste of processor power in copying The system also needed a Network Interface Controller NIC in order to interface the micro controller to the network The NIC was specified to be a common 10 Mbit controller as 10 Mbit Ethernet is both common and compatible with a large amount of network equipment such as hubs and routers In addition 10 Mbit Ethernet presents a reasonable amount of data to the micro controller a 100 Mbit NIC would require computing power far in excess of what the target micro controller provides 2 3 Component Decisions The Mitsubishi M30803 processor was selected for the hardware implementation It is part of the M16C family of micro controllers which scale from 5 MIPS up to 32 MIPS in CPU power while still
51. roject although there is enough implemented to allow the project to function if non compliantly 4 6 IPv6 to Ethernet Address Mapping IPv6 has a number of address types unicast multicast and anycast to name a few Some of these addresses have corresponding addresses in Ethernet the all link local nodes multicast address directly maps onto the Ethernet broadcast mechanism The unicast addresses must clearly map onto corresponding unique Ethernet addresses The other multicast addresses also have equivalents and since multiple multicast addresses are provided on Ethernet a mechanism has been specified to map IPv6 multicast addresses to Ethernet addresses albeit imperfectly but sufficiently such that nodes will not have to process excessive quantities of broadcast messages in the case of extensive multicast usage This mapping is an effective use of the medium but increases processing time as the type of address and the mapping must be determined for each and every outgoing packet there is a similar occurrence at the reception end of the stack where the stack has to check the address against the multiple addresses that it either owns or listens to A way to optimise this is to pass up and down the stack information about which class an incoming or outgoing address falls into This optimisation has not been implemented time constraints but the multicast mapping has been implemented for all multicast addresses which the sta
52. scheme the memory allocation would fail usually desirable as in a normal system there is no guarantee that memory will become available In the situation of the data processing system detailed above however memory becoming available is just a matter of time time for the hardware to complete its current transmissions and free some more buffers Under the Memory Recycle Algorithm the memory allocation is blocked if no buffers are available naturally only for non driver processes This means that the data flow can be controlled to where demands can be met in all places in the system effectively making more efficient use of memory by using it where it is needed most This also makes the data processing programs easier to write as the sensible behaviour for a process in this state would be to poll the free packet pool until it had buffers available In the case of the driver the driver can drop packets or delay reception of packets without substantial effect on the system as IPv6 assumes a lossy link layer The last effect of controlling memory usage this way is that the total memory usage of the IPv6 subsystem is fixed and hence memory availability can be guaranteed for other processes in the system given that IPv6 operation is unlikely to be the primary operation in any particular embedded system One last point to make is that while one free packet pool has been assumed to exist above in practice two were used one for packets
53. ssion event to the hardware and then checking a status register to see if the NIC was ready for data to be transmitted If the NIC was ready for the data the ISR would set up and initiate a DMA transfer If not the transmission request would later result in a Transmit Ready event which would initiate the transfer of the packet to the NIC The packet status would then be marked as started and if the DMA transfer had taken place the packet would be transferred to the transmitted buffer queue awaiting confirmation of its transmission If there were no outstanding transmission requests then the ISR would just exit Several things to note in this structure are the partitioning of events with respect to the reading of them and the necessity of dealing with events first in the ISR as opposed 20 to last In the CS8900A the reading of events changes the Interrupt Status Queue and also changes the state of the hardware For an example if an Interrupt Status Queue read occurred during a DMA transfer of a packet to the micro controller the read would be interpreted as an implied skip and the internal buffer in which the packet resided would be de allocated ensuring that the packet data would be irretrievable This means that events must be partitioned in that the event that the Interrupt Status Queue signals must be completely processed before the next event is read even in the case of events that do not require reading the In
54. t and a miscalculation on time available the PCB design was completed under intense time pressure pressure that was increased when the RTL8019s failed to come in the two weeks promised The RTL8019s eventually arrived from Germany six weeks after they were ordered Due to the time pressure in getting the board designs off to be manufactured several small errors crept into the design The M30803 comes in two different packages and I did not realise that the pin configurations on the two different packages were different In fact they were out by two In addition the serial interface chip also came in two packages both the same shape but one smaller than the other They too had different pin configurations As a result the first smoke test failed and considerable time was spent in debugging the hardware offline as the problems in the hardware could obviously not be observed in the state it was in The problems were discovered and it looked as if the hardware part of the project would fail the one hundred pin micro controllers and NICs had been considered far too difficult to hand solder on to the board and it was planned to have the boards assembled by a company in Auckland However due to the helpful suggestions and assistance of other people a solution was realised One hundred wires approximately 1 5 cm long were hand soldered to the micro controller and then the other ends were painstakingly hand soldered to the board
55. tance reading of the Interrupt Service Register on the CS8900A NIC changes the state of the NIC It is hence not surprising that the hardware dictates some of the structure of the driver Ideally the ISR should also not impose many if any restrictions on the other processes that produce or consume the data that it processes the interface must be static so that replacement of the hardware driver does not impact the integration of the stack and the driver It should also ideally be asynchronous such that it does not demand or rely on processing from other code The drivers that have been developed in this project are strongly event driven synchronous processes which present an asynchronous best effort with respect to delay service to the other consumer producer processes The drivers that were written are identical in the interfaces that they export to the other processes and also share some code They however vary quite markedly in the Interrupt Service Routine ISR Non Hardware Specific Functions The functions that do not have to access the hardware functions like the transfer buffer allocation function are quite simple in structure They are directly called by other procedures and either add or delete objects from various linked lists with the manipulation procedure flanked by interrupt disabling and enabling operations in order to ensure that the operations are atomic If a receive buffer needs to be freed for example then the proc
56. terrupt Status Queue such as the transmission of packets which can be polled The approach of combining DMA read and writes Transmitting a packet to the NIC at the same time as receiving a packet from it was attempted but the hardware failed to work under these conditions This was unfortunate because the ability to transfer data using two DMA channels would allow higher use of data bus bandwidth but it did not come as a surprise The necessity of processing NIC signalled events before actuating driver transmit events was prompted by earlier code revisions that failed to operate correctly when processing driver transmit events first The relevance of the former point is that the hardware affects the structure of the ISR in this case the ability to read the hardware status at any time does not exist The relevance of the latter point is that processing the transmit events first would be a little better as the processing of the transmit events before the receive events follows the natural data flow of the system For an example consider the situation where the receive buffers are full and the stack is waiting on a transmission in order to continue processing packets If data cannot be transmitted before the receive events are processed then incoming packets must be skipped before the transmit event can take place in order to free up the stack If the transmit event can occur first however then the possibility exists to leave the receive event
57. they were very valuable learning experiences few university graduates complete their degrees having completed a design such as this 3 Hardware Software Interface 3 1 Problem Statement The interface between the hardware and the software is a critical one It must transfer data quickly and efficiently from the NIC to the micro controller RAM while inflicting a minimum of CPU overhead To do this it must exploit fully the functionality available in the hardware and should only minimally dictate the structure and requirements of the packet processing code One example which it would be best to avoid for instance would be to require the stack to process all incoming messages in the inter packet delay time This would put high perhaps unachievable requirements on the stack processing code that need not exist Another situation it is best to avoid is where the stack wishes to send a packet but must wait in a loop until the NIC becomes ready to send this can be overcome by buffering outgoing packets The last piece of functionality that the stack must exhibit is buffering in a manner that makes the most efficient use of memory possible Clearly only a minimum of memory space is actually available for buffering an intrinsically memory hungry operation 3 2 The Zero Copy Paradigm The Zero Copy paradigm is an important concept in the design of both embedded systems software and modern Operating Systems The concept is simple
58. unprocessed and hence let the stack do more processing and free up a receive buffer for incoming packets letting the ISR process the incoming packet at a later time hence avoiding having to drop the packet This shows what subtle effects the hardware can have on system dataflow The second ISR was for the final hardware produced in this project utilising the RTL8019AS This ISR was also triggered by both hardware interrupt and by DMA completion This first step in this ISR was to check a state variable indicating whether the ISR was in a state where it was completing DMA or whether it was responding to an NIC interrupt The state variable is necessary because it is not possible to determine the interrupt source without querying the hardware CHECK THIS and due to the way that both the hardware and the RTL8019 were designed it is not possible to check the Interrupt Status Register during DMA This is for two reasons The first is that the registers in the RTL8019 are divided into four pages and the register that selects which page to expose to the system bus is also the page that controls the DMA requests including the DMA abort The second is due to a minor error in the hardware that means that the bus width must be switched between eight and sixteen bits when accessing control registers and performing DMA respectively In the case where the ISR has been triggered by DMA completion the ISR switches the bus back into eight bit mode and then che
59. with data The hardware is also configured to generate the CRC checksums at the end of each packet as otherwise these would need to be generated in software In the case of the RTL8019 there is no option to disable the CRC generator and the size of the transmit buffer has already been set in the receive buffer ring initialisation 3 8 2 The Interrupt Service Routine ISR The core of the driver is the ISR This is the only part of the driver that interacts directly with the hardware apart from the initialisation code This is done this way so that all interactions with the hardware can be tightly synchronised and controlled If for example the send packet and receive packet functions directly wrote to the hardware then they would have to be synchronised via wait loops wasting CPU time as otherwise they would interrupt each other s transfers and hence not only would nothing be accomplished but the hardware would be left in an unknown state The first ISR that was written for this project was the ISR for the Mitsubishi demonstration board and Ethernet card which utilised the CS8900A NIC The ISR was triggered by either a DMA transfer completing or by the NIC asserting an interrupt The first step upon entering the ISR was to check if the DMA was running if it was then the ISR would exit as it would disturb the DMA transfer if it started 19 reading or writing to the NIC If the DMA had completed then it would set the status of the l
60. www faqs org rfes rfc2460 html Loughney J 2003 IPv Node Requirements The Internet Society Retrieved 23 June from http www ietf org internet drafts draft ietf ipv6 node requirements 06 txt Mitsubishi Electric Corp 2001 A M16C 80 Group Data Sheet Rev E3 Japan Mitsubishi Electric Corp Kitaitami Works Mitsubishi Electric Corp 2001 C User s Manual M16C 80 Group Rev B Japan Mitsubishi Electric Corp Kitaitami Works Narten T Nordmark E amp Simpson W 1998 Neighbour Discovery for IP Version 6 IPv6 RFC 2461 The Internet Society Retrieved 5 April 2003 from http www fags org rfcs rfc2461 html 29 Okabe N Sakane S Inoue A Ishiyama M amp H Esaki 2002 Host Requirements for IPv6 for Low Cost Network Appliances Internet Engineering Task Force IETF Retrieved 23 J une 2003 from http www taca jp internet draft draft okabe ipv6 lcna minreq 02 txt Realtek Semiconductor Co Ltd 2001 RTLSOI9AS Realtek Full Duplex Ethernet Controller with Plug and Play Function RealPNP Taiwan Realtek Semiconductor Co Ltd Thomson S amp Narten T 1998 IPv6 Stateless Address Autoconfiguration RFC 2462 The Internet Society Retrieved 11 April 2003 from http www faqs org rfcs rfc2462 html 8 Bibliography Dunkel A 2002 Minimal TCP IP implementation with proxy support Thesis Report Sweden Swedish Institute of Computer Science SICS Dunkel A 2002

IPv6 in Embedded Micro-controllers

Contents

Download Pdf Manuals

Related Search

Related Contents