C - Oracle Documentation

Contents

1. return (DDI_FAILURE);
    return (DDI_SUCCESS);
    default:
        return (DDI_FAILURE);

For a description of the autoconfiguration process, see Chapter 5, "Autoconfiguration."

Controlling Device Access

Access to a device by one or more application programs is controlled through the open(9E) and close(9E) entry points. The open(9E) routine of a character driver is always called whenever an open(2) system call is issued on a special file representing the device. For a particular minor device, open(9E) may be called many times, but the close(9E) routine is called only when the final reference to a device is removed. If the device is accessed through file descriptors, this is by a call to close(2) or exit(2). If the device is accessed through memory mapping, this could also be by a call to munmap(2).

open()

    int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp)

The primary function of open(9E) is to verify that the open request is allowed.

Code Example 8-2 Character driver open(9E) routine

    static int
    xxopen(dev_t *devp, int flag, int otyp, cred_t *credp)
    {
        int instance;

        if (getminor(*devp) is invalid)
            return (EINVAL);
        instance = getminor(*devp); /* one-to-one example mapping */
        /* Is the instance attached? */
        if (ddi_get_soft_state(statep, instance) == NULL)
            return (ENXIO);
        /* verify that otyp is appropriate */
        if (otyp
2. [Figure 1-2: Example device trees. Node names shown include dma, esp, le, cgthree, sd, i86pc, pseudo, isa, mm, smc, aha, asy, and cmdk.]

Each node is given a name by the kernel internally, which is not necessarily the same name that applications use. Associated with each leaf or bus nexus node may be a driver. Each device driver has associated with it a device operations structure (see dev_ops(9S)) that defines the operations that the device driver can perform. The device operations structure contains function pointers for generic operations such as identify(9E) and attach(9E). It also contains a pointer to operations specific to bus nexus drivers and a pointer to operations specific to leaf drivers.

The SPARCstation in Figure 1-2 has several on-board devices and a number of SBus devices. On board, it has some serial chips (zs), a floppy drive (fd), and an audio device. These on-board devices are children of the root node. On the SBus, it has a frame buffer (cgthree), an ethernet interface (le), and a SCSI host adapter (esp). These devices are represented as children of the SBus node. Finally, there are two disk devices (sd) connected to the SCSI host adapter, and these are represented as leaf nodes on the SCSI host adapter.

The x86 device tree has an ISA bus, which has a network device (smc), an asynchronous communication device (asy), and a SCSI host adapter (aha). As
3. [Figure 5-1: Autoconfiguration Data Structures. The diagram groups the structures as follows. Loadable driver interface: modldrv(9S), mod_driverops. Device configuration: dev_ops(9S), with identify(9E), attach(9E), and detach(9E). Device access: cb_ops(9S), with open(9E), close(9E), read(9E), write(9E), ioctl(9E), chpoll(9E), mmap(9E), print(9E), strategy(9E), and prop_op(9E).]

The structures in this diagram must be provided and initialized correctly for the driver to load and for its routines to be called. If an operation is not supported by the driver, the address of the routine nodev(9F) can be used to fill it in. If the driver supports the entry point but does not need to do anything except return success, the address of the routine nulldev(9F) can be used.

[Writing Device Drivers, August 1994]

Note: These structures should be initialized at compile time. They should not be accessed or changed by the driver at any other time.

modlinkage

    int ml_rev;
    void *ml_linkage[4];

The modlinkage(9S) structure is exported to the kernel when the driver is loaded. The ml_rev field indicates the revision number of the loadable module system, which should be set to MODREV_1. Drivers can only support one module, so only the first element of ml_linkage should be set to the address of a modldrv(9S) structure. ml_linkage[1] should be set to NULL.
4. [PTE layout diagram: the fields shown include Type and a 19-bit Physical Page Number.]

Note: This discussion is for informational and debugging use only. Device drivers must not manipulate page table entries.

The following is a template bit mask that can be used to construct standard PTEs. An acceptable mask assumes values as follows:

    V (valid)                             1
    w, s (write ok, supervisor only)      11
    don't cache                           1
    a, m (accessed, modified)             00
    unused                                00000

A one in the "don't cache" position only disables caching if the type is zero, since other types of pages are never cached. Substituting the above values, the template looks like this:

    [PTE layout diagram: 1111, Type, 00, 00000, Physical Page Number (19 bits)]

This results in a mask of 0xF0000000, assuming that the type field is 00. Thus the four masks for the four types of memory are:

Table 2-9 PTE masks

    Type  Description          Mask
    0     On-Board Memory      0xF0000000
    1     On-Board I/O Space   0xF4000000
    2     vme16d16             0xF8000000
    2     vme24d16             0xF8000000
    2     vme32d16             0xF8000000
    3     vme16d32             0xFC000000
    3     vme24d32             0xFC000000
    3     vme32d32             0xFC000000

To determine the value to be plugged into the PTE, the appropriate mask is added to the physical page number, resulting in the full 32-bit PTE. Following are some rules for generating PTEs with the correct template, based on the address space the device is in and assuming an 8K page size.
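The masks in Table 2-9 can be reproduced from the individual fields. The following is a minimal sketch, not code from the manual: the macro names and the function pte_template() are hypothetical, and the bit positions are an assumption inferred from the mask values in the table (four control bits at the top of the word, followed by a two-bit type field).

```c
#include <stdint.h>

/*
 * Hypothetical names. Bit positions are inferred from the masks in
 * Table 2-9 and are an assumption, not a statement of the hardware layout.
 */
#define PTE_VALID       (1u << 31)  /* V: entry is valid */
#define PTE_WRITE       (1u << 30)  /* w: write ok */
#define PTE_SUPERVISOR  (1u << 29)  /* s: supervisor only */
#define PTE_NOCACHE     (1u << 28)  /* don't cache */
#define PTE_TYPE_SHIFT  26          /* two-bit memory type field */

/* Build the template mask for a memory type in the range 0..3. */
uint32_t
pte_template(unsigned type)
{
    return (PTE_VALID | PTE_WRITE | PTE_SUPERVISOR | PTE_NOCACHE |
        ((uint32_t)(type & 0x3u) << PTE_TYPE_SHIFT));
}
```

Under these assumptions, pte_template(2) gives the 0xF8000000 mask listed for the 16-bit-data VME spaces. As the note above says, this is for debugging from the PROM only; drivers must not build PTEs themselves.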
5. The high-level interrupt routine services the device and enqueues the data. The high-level routine triggers a software interrupt if the low-level routine is not running.

Code Example 6-4 High-level interrupt routine

    static u_int
    xxhighintr(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;
        u_char status, temp;
        int need_softint;

        mutex_enter(&xsp->high_mu);
        /* read status */
        status = xsp->regp->csr;
        if (!(status & INTERRUPTING)) {
            mutex_exit(&xsp->high_mu);
            return (DDI_INTR_UNCLAIMED); /* device isn't interrupting */
        }
        xsp->regp->csr = CLEAR_INTERRUPT;
        /* flush store buffers */
        temp = xsp->regp->csr;
        read data from device and queue the data for the low-level interrupt handler;
        if (xsp->softint_running)
            need_softint = 0;
        else
            need_softint = 1;
        mutex_exit(&xsp->high_mu);
        /* read-only access to xsp->id, no mutex needed */
        if (need_softint)
            ddi_trigger_softintr(xsp->id);
        return (DDI_INTR_CLAIMED);
    }

The low-level interrupt routine is started by the high-level interrupt routine triggering a software interrupt. Once running, it should continue to do so until there is nothing left to process.

Code Example 6-5 Low-level interrupt routine

    static u_int
    xxlowintr(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;

        mutex_enter(&xsp->low_mu);
        mutex_enter(&
6.  /* for character-oriented devices */
    static int
    xxread(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        /* definition */
    }
    static int
    xxwrite(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        /* definition */
    }
    static int
    xxioctl(dev_t dev, int cmd, int arg, int mode, cred_t *credp, int *rvalp)
    {
        /* definition */
    }
    /* for memory-mapped character-oriented devices */
    static int
    xxmmap(dev_t dev, off_t off, int prot)
    {
        /* definition */
    }
    /* for support of the poll(2) system call */
    static int
    xxchpoll(dev_t dev, short events, int anyyet, short *reventsp,
        struct pollhead **phpp)
    {
        /* definition */
    }
    /* for drivers needing a xxprop_op routine */
    static int
    xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
        int mod_flags, char *name, caddr_t valuep, int *lengthp)
    {
        /* definition */
    }
    /* other driver routines, such as xxintr() */

The C Language and Compiler Modes

The SPARCworks 2.0.1 and ProWorks 2.0.1 C compilers are ANSI C compilers. They support several compilation modes, a number of new keywords, and function prototypes.

Compiler Modes

The following compiler modes are of interest to driver writers:

-Xt (Transition Mode): This mode accepts ANSI C and Sun C compatibility extensions. In case of a conflict between ANSI and Sun C, a warning is issued and Sun C semantics ar
7. Once the item is allocated, the driver only needs to call ddi_get_soft_state(9F) to retrieve the pointer. See "Loadable Driver Interface" on page 89 for an example use of these routines.

Properties

Properties define arbitrary characteristics of the device or device driver. Properties may be defined by the FCode of a self-identifying device, by a hardware configuration file (see driver.conf(4)), or by the driver itself using ddi_prop_create(9F).

A property is a name-value pair. The name is a string that identifies the property, and the value is an array of bytes. Examples of properties are the height and width of a frame buffer, or the number of blocks in a partition of a block device.

The value of a property may be one of three types:

    A boolean property, which has no length; it either exists or does not exist.
    An integer property, which has length four and has an integer value.
    A long property, which has an arbitrary length and whose value is a series of bytes.

Note: Strictly speaking, DDI software property names are not restricted in any way; however, there are certain recommended uses. As defined in IEEE 1275-1994, the Standard for Boot Firmware, a property is a human-readable text string consisting of one to thirty-one printable characters. Property names shall not contain upper-case characters or the characters "/", "\", ":", "[", "]", and "@". Property names
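The three value types and the naming recommendation can be expressed as two small checks. This is a user-level sketch under stated assumptions, not a DDI interface: prop_kind_from_length() and prop_name_ok() are hypothetical names, and the set of forbidden punctuation is the usual reading of IEEE 1275-1994, which should be confirmed against the standard.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a property value by its length. */
enum prop_kind { PROP_BOOLEAN, PROP_INTEGER, PROP_LONG };

enum prop_kind
prop_kind_from_length(size_t len)
{
    if (len == 0)
        return (PROP_BOOLEAN);  /* no length: the property merely exists */
    if (len == 4)
        return (PROP_INTEGER);  /* length four: integer value */
    return (PROP_LONG);         /* any other length: a series of bytes */
}

/*
 * Check a name against the recommendation: 1 to 31 printable characters,
 * no upper case, and none of the characters / \ : [ ] @ (assumed set).
 */
bool
prop_name_ok(const char *name)
{
    size_t i, n = strlen(name);

    if (n < 1 || n > 31)
        return (false);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];

        if (!isprint(c) || isupper(c))
            return (false);
        if (strchr("/\\:[]@", c) != NULL)
            return (false);
    }
    return (true);
}
```

A name such as "nblocks" passes the check, while "Height" fails because of the upper-case letter. Since DDI software property names are not strictly restricted, a check like this is advisory only.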
8. Exiting

To exit either adb(1M) or kadb(1M), use $q.

    kadb[0]: $q
    Type 'go' to resume
    ok

kadb(1M) can be continued by typing go on the OBP, or c on SunMON.

Warning: No other commands can be performed from the PROM if the system is to be continued. PROM commands other than c (continue) may change system state that SunOS depends on.

Staying at the kadb(1M) prompt for too long may cause the system to lose track of the time of day, and cause network connections to time out.

Commands

The general form of an adb(1M)/kadb(1M) command is:

    [ address ] [ ,count ] command [ ; ]

If address is omitted, the current location is used ("." also stands for the current location). The address can be a kernel symbol. If count is omitted, it defaults to 1.

Commands to adb consist of a verb followed by a modifier or list of modifiers. Verbs can be:

    ?        Print locations starting at address in the executable
    /        Print locations starting at address in the core file
    =        Print the value of address itself
    >        Assign a value to a variable or register
    <        Read a value from a variable or register
    RETURN   Repeat the previous command with a count of 1; increment "."

With ?, /, and =, output format specifiers can be used. Lowercase letters normally print 2 bytes; uppercase letters print 4 bytes:

    o, O     Octal
    d, D     Decimal
    x, X     Hexadecimal
    u, U     Unsigned decimal
    f, F     4-, 8-byte f
9.  ok setenv sunmon-compat? false
    sunmon-compat? = false

The printenv command displays all parameters and their values.

Help

Help is available with the help command.

History

EMACS-style command-line history is available. Use Control-N (next) and Control-P (previous) to walk the history list.

Forth Commands

The Open Boot PROM uses the Forth programming language. This is a stack-based language; arguments must be pushed on the stack before running the desired command (called a word), and the result is left on the stack.

To place a number on the stack, type its value:

    ok 57
    ok 68

To add the two top values on the stack, use the + operator:

    ok +

The result is left on the stack. The stack is shown with the .s word:

    ok .s
    bf

The default base is hexadecimal. The hex and decimal words can be used to switch bases:

    ok decimal
    ok .s
    191

See the Forth User's Guide for more information.

Walking the PROM's Device Tree

The SunOS-like commands pwd, cd, and ls walk the PROM device tree to get to the device. The cd command must be used to establish a position in the tree before pwd will work. This example is from a SPARCstation IPC:

    ok cd

To see the devices attached to the current node in the tree, use ls:

    ok ls
    ffec8760 ffecb5bce0 ffebabo64 ffeba958 ffeb9084 ffeb9020 ffeb8fb8 ffeb8
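The stack discipline behind the 57 68 + example can be mimicked in a few lines of C. This is only an illustrative sketch of a data stack, assuming a fixed depth; the names struct stack, push(), pop(), and add() are hypothetical and have nothing to do with the PROM's actual implementation.

```c
#include <stdint.h>

#define STACK_MAX 16    /* arbitrary depth chosen for this sketch */

struct stack {
    uint32_t cell[STACK_MAX];
    int depth;          /* number of values currently on the stack */
};

/* Push a value; silently ignored if the stack is full. */
void
push(struct stack *s, uint32_t v)
{
    if (s->depth < STACK_MAX)
        s->cell[s->depth++] = v;
}

/* Pop the top value; returns 0 if the stack is empty. */
uint32_t
pop(struct stack *s)
{
    return (s->depth > 0 ? s->cell[--s->depth] : 0);
}

/* The "+" word: replace the two top values with their sum. */
void
add(struct stack *s)
{
    uint32_t b = pop(s);
    uint32_t a = pop(s);

    push(s, a + b);
}
```

Pushing 0x57 and 0x68 and then calling add() leaves a single value, 0xbf, on the stack. Printed in base ten, the same value is 191, which is what .s shows after the decimal word.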
10. [Figure: process virtual address space, from 0xFFFFFFFF at the top to 0 at the bottom: System, Stack, Data, Text.]

The ABI specifies the system portion of a process virtual address space is in the high end, and may occupy no more than 512 megabytes. In other words, all kernel addresses will be 0xE0000000 or higher. Some implementations may use less kernel space, and so begin at a higher address. This fact can be used when debugging: if pointers point below the address 0xE0000000, they probably are user addresses.

System Support

The system provides a number of routines that can aid in debugging; they are documented in Section 9F of the man Pages(9F): DDI and DKI Kernel Functions.

cmn_err()

cmn_err(9F) is the function to use to print messages to the console from the kernel. See cmn_err(9F) and "Printing Messages" on page 55 for more information on its use.

Note: Though printf() and uprintf() currently exist, they should not be used if the driver is to be Solaris DDI-compliant.

An example from the probe(9E) routine might be to print a message if the device is not found. Normally, probe(9E) routines should not print messages if the device is not there.

    if (ddi_pokec(dip, &regp->csr, ENABLE_INTERRUPTS) != DDI_SUCCESS) {
        cmn_err(CE_NOTE, "%s not found", ddi_get_name(dip));
        return (DDI_PROBE_FAILURE);
    }

A handy format for printing device register bits is %b. See cmn_err(9F) for information on how to
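The debugging rule of thumb about 0xE0000000 can be written as a one-line test. This is a sketch for a 32-bit address space as described above; KERNEL_BASE and looks_like_kernel_addr() are hypothetical names, and the test is only a heuristic, since some implementations start the kernel at a higher address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lowest possible kernel address under the 512-megabyte ABI limit. */
#define KERNEL_BASE 0xE0000000u

/* Heuristic only: addresses below KERNEL_BASE are probably user addresses. */
bool
looks_like_kernel_addr(uint32_t addr)
{
    return (addr >= KERNEL_BASE);
}
```

The constant follows from the text: 512 megabytes is 0x20000000 bytes, and 0xFFFFFFFF - 0x20000000 + 1 is 0xE0000000.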
11. The crash occurs a few instructions after a call to getminor(9F). Examining the ramdisk.c source file, these lines stand out in rd_write():

    int instance = getminor(dev);
    rd_devstate_t *rsp;
    if (uiop->uio_offset > rsp->ramsize)
        return (EINVAL);

Notice that rsp is never initialized. This is the problem. It is fixed by including the correct call to ddi_get_soft_state(9F) (since the ramdisk driver uses the soft state routines to do state management):

    int instance = getminor(dev);
    rd_devstate_t *rsp = ddi_get_soft_state(rd_state, instance);
    if (uiop->uio_offset > rsp->ramsize)
        return (EINVAL);

Note: Most data fault panics are bad pointer references.

Example: kadb on a Deadlocked Thread

The next problem is that the system does not panic, but the mkfs(1M) command hangs and cannot be aborted. Though a core dump can be forced (by sending a break and then using sync from the OBP, or using g 0 from SunMon), in this case kadb(1M) will be used. After logging in remotely and using ps (which indicated that only the mkfs(1M) process was hung, not the entire system), the system is shut down and booted using kadb(1M).

    ok boot kadb -d
    Boot device: /sbus/esp@0,800000/sd@3,0 File and args: kadb -d
    kadb: kernel/unix
    Size: 673348+182896+46008 bytes
    /kernel/unix loaded - 0x125000 bytes used
    kadb[0]: :c
    SunOS Release 5.4 Version Generic [UNIX(R) System V Release 4.0]
    Copyright (c) 19
12. If the device is in vme16d16, vme24d16, or vme32d16: Use Type 2 Template.

If the device is in vme16d32, vme24d32, or vme32d32: Use Type 3 Template.

If the device is in vme32d16 or vme32d32:

    Physical Page Number = Physical Address >> 13

If the device is in vme24d16 or vme24d32:

    Physical Page Number = (Physical Address + 0xFF000000) >> 13

If the device is in vme16d16 or vme16d32:

    Physical Page Number = (Physical Address + 0xFFFF0000) >> 13

Example One

A device is attached at physical address 0x280008 in bus type vme24d16, which will be mapped into virtual memory at address 0xE000000. What is the corresponding PTE?

Answer: 0xF807F940

Explanation: Because the device is being mapped into vme24d16, 0xF8000000 is used as the template. Adding the physical address to 0xFF000000 yields 0xFF280008. In binary, this is:

    1111 1111 0010 1000 0000 0000 0000 1000

Shifting this right by 13 yields:

    XXXX XXXX XXXX X111 1111 1001 0100 0000

Adding the 0xF8000000 template results in values for the 13 bits that are undefined from the shift. The PTE is:

    1111 1000 0000 0111 1111 1001 0100 0000

In hexadecimal, this is 0xF807F940. The resulting PTE maps the virtual page beginning at 0xE000000 to the physical page containing 0x280008. To get the virtual address to access the device, it is necessary to take the lower 13 bits of the physical installation address (the bits that are just passed through the MMU)
13. SunOS 5.4 column: ddi_dma_buf_setup; ddi_dma_free; min; getminor.

Description column: allocate space from kernel free memory; free previously allocated kernel memory; allocate and clear space from kernel free memory; concatenate two message blocks; log kernel errors; get host ID from EPROM; get major device number; make packet for SCSI group 0 commands; make packet for SCSI group 0 sequential commands; make packet for SCSI group 1 commands; make packet for SCSI group 5 commands; map physical to virtual space; remove physical to virtual mappings; return the larger of two integers; setup system DMA resources; release system DMA resources; allocate a main bus buffer; free main bus resources; set up use of main bus resources; return the lesser of two integers; get minor device number.

Table A-1 SunOS 4.1.x and SunOS 5.4 Kernel Support Routines

SunOS 4.1.x column: minphys; mp_nbmapalloc; MBI_ADDR; msgdsize; nodev; noenable; nulldev; ovbcopy; panic; peek; peekc; peekl; physio; pkt_transport; poke; pokec; pokel; printf; pritospl; psignal; ptob; pullupmsg; put; putbq; putctl.

SunOS 5.4 column: minphys; ddi_dma_addr_setup; ddi_dma_htoc; msgdsize; nodev; noenable; nulldev; cmn_err; ddi_peeks; ddi_peekc; ddi_peekl; physio; scsi_transport; ddi_pokes; ddi_pokec; ddi_pokel
14. ddi_getprop(9F) is a wrapper around ddi_prop_op(9F). It can be used to retrieve boolean and integer-sized properties.

    int ddi_getlongprop(dev_t dev, dev_info_t *dip, int flags,
        char *name, caddr_t valuep, int *lengthp);

ddi_getlongprop(9F) is a wrapper around ddi_prop_op(9F). It is used to retrieve properties having values of arbitrary length. The value returned is stored in a buffer allocated by kmem_alloc(9F), which the driver must free with kmem_free(9F) when the value is no longer needed.

    int ddi_getlongprop_buf(dev_t dev, dev_info_t *dip, int flags,
        char *name, caddr_t valuep, int *lengthp);

ddi_getlongprop_buf(9F) is a wrapper around ddi_prop_op(9F). It is used to retrieve a property having a value of arbitrary length and to copy that value into a buffer supplied by the driver. valuep must point to this buffer.

    int ddi_getproplen(dev_t dev, dev_info_t *dip, int flags,
        char *name, int *lengthp);

ddi_getproplen(9F) is a wrapper around ddi_prop_op(9F) that passes back, in the location pointed to by lengthp, the length of the property identified by name.

Register and Memory Mapping

These interfaces support the mapping of device memory and device registers into kernel memory so a device driver can address them.

    int ddi_dev_nregs(dev_info_t *dip, int *resultp);

ddi_dev_nregs(9F) passes back, in the location pointed to by resultp, the number of register specifications a device has.
15. and add them to virtual address 0xE000000. The lower 13 bits of physical address 0x280008 are 0x0008; adding them to 0xE000000 yields 0xE000008, the virtual address by which the device can be accessed.

Example Two

A device at physical address 0xEE48 on bus type vme16d32 will be mapped to virtual address 0xE000000. What is the PTE?

Answer: 0xFC07FFFF

Explanation: Because the device is being mapped into vme16d32, 0xFC000000 is used as the template. Adding the physical address to 0xFFFF0000 yields 0xFFFFEE48. In binary, this is:

    1111 1111 1111 1111 1110 1110 0100 1000

Shifting this right by 13 yields:

    XXXX XXXX XXXX X111 1111 1111 1111 1111

Adding the 0xFC000000 template results in values for the 13 bits that are undefined from the shift. The PTE is:

    1111 1100 0000 0111 1111 1111 1111 1111

This is 0xFC07FFFF in hexadecimal. To get the virtual address to access the device at physical address 0xEE48, add its lower 13 bits (0xE48) to 0xE000000. This yields 0xE000E48.

Sun-4/110 Considerations

The Sun-4/110 MMU does not store bits 28-31. For the VMEbus, which uses 32 bits of physical addressing, bits 28-31 are generated by sign-extending bit 27. When the PTE is read back, these upper bits are always set to zero. This essentially creates a hole in the address space that is not addressable. When entering page table entries on a Sun-4/110 to test hardware from the PROM, use a virtual
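Both worked examples follow the same arithmetic, which can be sketched in C as a check on the figures. The names below (enum vme_space, vme_ppn(), make_pte(), device_vaddr()) are hypothetical, and the sketch uses a bitwise OR where the text says "adding", which gives the same result as long as the physical address fits within the 16 or 24 address bits of the bus.

```c
#include <stdint.h>

#define PAGE_SHIFT 13   /* 8K pages, as assumed in the text */

/* Hypothetical tag for the VME address width of the device's bus type. */
enum vme_space { VME16, VME24, VME32 };

/* Physical page number: fill the unused upper address bits, then shift. */
uint32_t
vme_ppn(enum vme_space space, uint32_t paddr)
{
    uint32_t fill = 0;

    if (space == VME24)
        fill = 0xFF000000u;
    else if (space == VME16)
        fill = 0xFFFF0000u;
    return ((paddr | fill) >> PAGE_SHIFT);
}

/* Full PTE: template mask combined with the 19-bit page number. */
uint32_t
make_pte(uint32_t tmpl, enum vme_space space, uint32_t paddr)
{
    return (tmpl | vme_ppn(space, paddr));
}

/* Virtual address of the device: page base plus the low 13 bits. */
uint32_t
device_vaddr(uint32_t vbase, uint32_t paddr)
{
    return (vbase + (paddr & ((1u << PAGE_SHIFT) - 1u)));
}
```

With the Type 2 template, make_pte(0xF8000000, VME24, 0x280008) gives 0xF807F940, and with the Type 3 template, make_pte(0xFC000000, VME16, 0xEE48) gives 0xFC07FFFF, matching the two answers. device_vaddr(0xE000000, 0xEE48) gives 0xE000E48.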
16. int ddi_dev_regsize(dev_info_t *dip, u_int rnumber, off_t *resultp);

ddi_dev_regsize(9F) passes back, in the location pointed to by resultp, the size of the register set identified by rnumber on the device identified by dip.

    int ddi_map_regs(dev_info_t *dip, u_int rnumber, caddr_t *kaddrp,
        off_t offset, off_t len);

ddi_map_regs(9F) maps the register specification identified by rnumber on the device identified by dip into kernel memory, starting at offset bytes from the base of the register specification. ddi_map_regs(9F) then passes back, in the location pointed to by kaddrp, a pointer to the base of the register specification plus offset.

    void ddi_unmap_regs(dev_info_t *dip, u_int rnumber, caddr_t *kaddrp,
        off_t offset, off_t len);

ddi_unmap_regs(9F) unmaps the register specification identified by rnumber on the device identified by dip. The associated mapping resources are freed, and the driver may no longer address the registers.

    int ddi_segmap(dev_t dev, off_t offset, struct as *as, caddr_t *addrp,
        off_t len, u_int prot, u_int maxprot, u_int flags, cred_t *credp);

ddi_segmap(9F) supports the mmap(2) system call, which allows application programs to map device memory into their address spaces. ddi_segmap(9F) should be used as the segmap(9E) entry in the cb_ops(9S) structure.

    int ddi_mapdev(dev_t dev, off_t offset, struct as *as, caddr_t *addrp,
        off_t len, u_int prot, u_int
17. SunOS 4.1.x column: suser; swab; testb; timeout; uiomove; unbufcall; unlinkb; untimeout; uprintf; ureadc; useracc.

SunOS 5.4 column: scsi_reset; scsi_destroy_pkt; scsi_probe; pollwakeup; ddi_slaveonly; cv_wait; mutex_enter; mutex_exit; strcmp; strcpy; drv_priv; swab; testb; timeout; uiomove; unbufcall; unlinkb; untimeout; cmn_err; ureadc; useracc.

Description column: reset a SCSI bus or target; free an allocated SCSI packet; probe for a SCSI target; inform a process that an event has occurred; tell if device is installed in a slave-only slot; suspend calling thread and exit mutex atomically; set CPU priority level; reset priority level; set processor level for STREAMS; compare two null-terminated strings; copy a string from one location to another; verify superuser; swap bytes in 16-bit halfwords; check for an available buffer; execute a function after a specified length of time; copy kernel data using uio(9S) structure; cancel an outstanding bufcall request; remove a message block from the head of a message; cancel previous timeout function call; kernel print to controlling terminal; add character to a uio structure; verify whether user has access to memory.

Table A-1 SunOS 4.1.x and SunOS 5.4 Kernel Support Routines

    SunOS 4.1.x   SunOS 5.4      Description
    usleep        drv_usecwait   busy-wait for specified
18.     u_long dlim_addr_hi;     /* inclusive upper bound of address range */
        u_int dlim_cntr_max;     /* set to 0 */
        u_int dlim_burstsizes;   /* set to 1 */
        u_int dlim_minxfer;      /* minimum DMA transfer size */
        u_int dlim_dmaspeed;     /* set to 0 */
        u_int dlim_version;      /* version number of this structure */
        u_int dlim_adreg_max;    /* inclusive upper bound of incrementing address register */
        u_int dlim_ctreg_max;    /* maximum transfer count - 1 */
        u_int dlim_granular;     /* granularity of transfer count */
        u_int dlim_sgllen;       /* length of DMA scatter/gather list */
        u_int dlim_reqsize;      /* maximum transfer size (bytes) of a single I/O */
    } ddi_dma_lim_t;

dlim_addr_lo is the lowest address that the DMA engine can access.

dlim_addr_hi is the highest address that the DMA engine can access.

dlim_minxfer is the minimum transfer size the DMA engine can perform. It also influences alignment and padding restrictions. It should be set to DMA_UNIT_8, DMA_UNIT_16, or DMA_UNIT_32 to indicate 1-, 2-, or 4-byte transfers.

dlim_version specifies the version number of this structure. It should be set to DMALIM_VER0.

dlim_adreg_max is the upper bound of the DMA engine's address register. This is often used where the upper 8 bits of an address register are a latch containing a segment number, and the lower 24 bits are used to address a segment. In this case, dlim_cntr_max would be set to 0x00FFFFFF; this prev
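The segment-latch arrangement described for the address register can be illustrated with a small boundary test. This is a sketch only: crosses_segment() is a hypothetical helper, not a DDI routine, and it assumes that a transfer is acceptable when its first and last bytes share the same upper (latched) address bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does a transfer of len bytes (len > 0) starting at
 * addr cross a segment boundary, given the address register's upper bound
 * reg_max (for example 0x00FFFFFF for a 24-bit register)?
 */
bool
crosses_segment(uint32_t addr, uint32_t len, uint32_t reg_max)
{
    uint32_t last = addr + len - 1;     /* address of the final byte */

    /* Bits above reg_max select the segment; they must not change. */
    return ((addr & ~reg_max) != (last & ~reg_max));
}
```

For a 24-bit register, a 0x1000-byte transfer starting at 0x01000000 stays within one segment, while a 0x200-byte transfer starting at 0x01FFFF00 spans two.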
19.     xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (-1);
        if (off is invalid)
            return (-1);
        return (hat_getkpfnum(xsp->regp->csr + off));

dev is the device number and off is the offset into the device's memory. prot specifies the kind of access requested, such as PROT_READ and PROT_WRITE. A value of PROT_WRITE for prot would be invalid on a read-only device. See mmap(9E) and mmap(2).

hat_getkpfnum(9F) returns the page frame number for the memory that should be mapped. xsp->regp->csr is the kernel virtual address of the device memory, determined in attach(9E) by calling ddi_map_regs(9F) and stored in the state structure.

In Code Example 8-9, the whole address range up to off must be mapped using ddi_map_regs(9F). This can use a lot of system resources for devices that have a large mappable memory area, and is a waste of resources if the driver only needs the mapping so it can call hat_getkpfnum(9F). A better way to get the page frame number for a given offset is to map just that individual page, retrieve the page frame number, then unmap the page before returning. Since the page frame number refers to a page on the device, it will not change when the page is unmapped.

Code Example 8-10 mmap(9E) routine using fewer resources

    static int
    xxmmap(dev_t dev, off_t off, int prot)
    {
        int kpfn = -1;
        caddr_t kva;

        if (ddi_map_regs(xsp->dip, rnumber, &kva,
20. This is a required entry point; it cannot be replaced with nulldev(9F) or nodev(9F).

open()

    SunOS 4.x:
        int xxopen(dev, flag)
        dev_t dev;
        int flag;

    SunOS 5.x:
        int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp)

The first argument to open(9E) is a pointer to a dev_t. The rest of the cb_ops(9S) routines receive a dev_t.

Verify that the open type is one that the driver actually supports. This is normally OTYP_CHR for character devices, or OTYP_BLK for block devices. This prevents the driver from allowing future open types that it does not support.

If the driver used to check for root privileges using suser(), it should now use drv_priv(9F) instead on the passed credential pointer.

psize()

This entry point does not exist. Instead, block devices should support the nblocks property. This property may be created in attach(9E) if its value will not change. A prop_op(9E) entry point may be required if the value cannot be determined at attach time, such as if the device supports removable media. See "Properties" on page 59 for more information.

read() and write()

    SunOS 4.x:
        int xxread(dev, uio)
        int xxwrite(dev, uio)
        dev_t dev;
        struct uio *uio;

    SunOS 5.x:
        int xxread(dev_t dev, uio_t *uiop, cred_t *credp);
        int xxwrite(dev_t dev, uio_t *uiop, cred_t *credp);

physio(9F) should no longer be called with the address of a statically allocated buf(9S) structure. Instead
21. modldrv

    struct mod_ops *drv_modops;
    char *drv_linkinfo;
    struct dev_ops *drv_dev_ops;

This structure describes the module in more detail. The drv_modops field points to a structure describing the module operations, which is &mod_driverops for a device driver. The drv_linkinfo field is displayed by the modinfo(1M) command and should be an informative string identifying the device driver. The drv_dev_ops field points to the next structure in the chain, the dev_ops(9S) structure.

dev_ops

    int devo_rev;
    int devo_refcnt;
    int (*devo_getinfo)(dev_info_t *dip, ddi_info_cmd_t infocmd,
        void *arg, void **result);
    int (*devo_identify)(dev_info_t *dip);
    int (*devo_probe)(dev_info_t *dip);
    int (*devo_attach)(dev_info_t *dip, ddi_attach_cmd_t cmd);
    int (*devo_detach)(dev_info_t *dip, ddi_detach_cmd_t cmd);
    int (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);
    struct cb_ops *devo_cb_ops;
    struct bus_ops *devo_bus_ops;

The dev_ops(9S) structure allows the kernel to find the autoconfiguration entry points of the device driver. The devo_rev field identifies the revision number of the structure itself, and must be set to DEVO_REV. The devo_refcnt field must be initialized to zero. The function address fields should be filled in with the address of the appropriate driver entry point. Exceptions: If a probe(9E) routine is not needed, use nulldev(9F). nodev(9F) can be used in devo
22.     "nblocks", flags, &xsp->nblocks, sizeof (int)
        other cases
    skip:
        return (ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp));

This section suggests a structure for device drivers. The following sections describe the example driver layout in detail. The code for a device driver is usually divided into the following files:

    headers (.h files)
    source files (.c files)
    possibly a configuration file (.conf file)

Note: This is a suggested layout only. It is not required, as only the final object module matters to the system.

Header files define data structures specific to the device (such as a structure representing the device registers), data structures defined by the driver for maintaining state information, defined constants (such as those representing the bits of the device registers), and macros (such as those defining the static mapping between the minor device number and the instance number).

Some of this information, such as the state structure, may only be needed by the device driver. This information should go in private headers. These header files are only included by the device driver itself.

Any information that an application might require, such as the I/O control commands, should be in public header files. These are included by the driver and any applications that need information about the device.

There is no standard for naming private and publi
23. prop_op(9E)   Manages arbitrary driver properties
    mmap(9E)      Checks virtual mapping for a memory-mapped device
    segmap(9E)    Maps device memory into user space
    chpoll(9E)    Polls device for events

Note: Some of these entry points may be replaced by nodev(9F) or nulldev(9F) as appropriate.

The attach(9E) routine should perform the common initialization tasks that all devices require. Typically, these tasks include:

    Allocating per-instance state structures
    Mapping the device's registers
    Registering device interrupts
    Initializing mutex and condition variables
    Creating minor nodes

Character device drivers create minor nodes of type S_IFCHR. This causes a character special file representing the node to eventually appear in the /devices hierarchy.

Code Example 8-1 Character driver attach(9E) routine

    static int
    xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
        switch (cmd) {
        case DDI_ATTACH:
            allocate a state structure and initialize it
            map the device's registers
            add the device driver's interrupt handler(s)
            initialize any mutexes and condition variables
            /*
             * Create the device's minor node. Note that the node_type
             * argument is set to DDI_NT_TAPE.
             */
            if (ddi_create_minor_node(dip, "minor_name", S_IFCHR,
                minor_number, DDI_NT_TAPE, 0) == DDI_FAILURE) {
                free resources allocated so far
                /* Remove any previously allocated minor nodes */
                ddi_remove_minor_node(dip, NULL);
24. that the device's CSR includes this information.

        mutex_enter(&xsp->mu);
        status = xsp->regp->csr;
        if (!(status & INTERRUPTING)) {
            mutex_exit(&xsp->mu);
            return (DDI_INTR_UNCLAIMED);
        }
        /*
         * Inform the device that it is being serviced, and re-enable
         * interrupts. This example assumes that writing to the CSR
         * accomplishes this. The driver must ensure that this write
         * operation makes it to the device before the interrupt service
         * returns. For example, reading the CSR, if it does not result
         * in unwanted effects, can ensure this.
         */
        xsp->regp->csr = CLEAR_INTERRUPT | ENABLE_INTERRUPTS;
        temp = xsp->regp->csr;
        perform any I/O related and synchronization processing
        signal waiting threads (biodone(9F) or cv_signal(9F))
        mutex_exit(&xsp->mu);
        return (DDI_INTR_CLAIMED);

On an architecture that does not support vectored hardware interrupts, when the system detects an interrupt, it calls the driver interrupt handler function for each device that could have issued the interrupt. The interrupt handler must determine whether the device it handles issued an interrupt. On architectures supporting vectored interrupts, this step is unnecessary but not harmful, and it enhances portability. The syntax and semantics of the interrupt handling routine, therefore, can be the same for both vectored interrupts and polling interrupts.

In the model presented here, the argum
25. [Typographic conventions table. Surviving entries: "Book title, a new word or term, or an emphasized word"; examples: "add_drv adds a driver to the system"; "inumber is the number of the interrupt to register"; "See chapter 9 of the STREAMS Programmer's Guide"; "A mutual exclusion lock is"; "Any device interrupts must be registered with the system".]

Overview of the SunOS Kernel (Chapter 1)

This chapter provides an overview of the SunOS kernel. It covers concepts of particular importance to device driver writers, including general kernel structure and function, kernel and user threads, relevant aspects of the virtual memory (VM) system, and the Solaris 2.x DDI/DKI.

What is the Kernel?

The SunOS kernel is a program that manages system resources. It insulates applications from the hardware and provides them with essential system services such as input/output (I/O) management, virtual memory, and scheduling. The kernel consists of object modules that are dynamically loaded into memory when needed. The main part of the kernel is contained in the file /kernel/unix.

The kernel provides a set of interfaces for applications to use, called system calls. System calls are documented in the Solaris 2.4 Reference Manual AnswerBook (see Intro(2)). The function of some system calls is to invoke a device driver to perform I/O. Device drivers are kernel modules, which normally reside in the hierarchy /kernel or /usr/kernel. See Chapter 12, "Loadi
26. Forward declaration of entry points
    /* static declarations of cb_ops entry point functions */
    static struct cb_ops xx_cb_ops = {
        /* set cb_ops fields */
    };
    /* static declarations of dev_ops entry point functions */
    static struct dev_ops xx_ops = {
        /* set dev_ops fields */
    };
    /* declare and initialize the module configuration section */
    static struct modldrv modldrv = {
        /* set modldrv fields */
    };
    static struct modlinkage modlinkage = {
        /* set modlinkage fields */
    };
    int _init(void) { /* definition */ }
    int _info(struct modinfo *modinfop) { /* definition */ }
    int _fini(void) { /* definition */ }
    static int xxidentify(dev_info_t *dip) { /* definition */ }
    static int xxprobe(dev_info_t *dip) { /* definition */ }
    static int xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd) { /* definition */ }
    static int xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd) { /* definition */ }
    static int xxgetinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg, void **result) { /* definition */ }
    static int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp) { /* definition */ }
    static int xxclose(dev_t dev, int flag, int otyp, cred_t *credp) { /* definition */ }
    static int xxstrategy(struct buf *bp) { /* definition */ }
27. Now entries are advertised to the kernel by device drivers calling ddi_create_minor_node(9F) once they have determined that a particular device exists. drvconfig(1M) actually maintains the file system nodes. This results in names that completely identify the device.

/dev

In SunOS 4.x, device special files lived, by convention, in /dev. Now that the /devices directory is used for special files, /dev is used for logical device names. Usually these are symbolic links to the real names in /devices. Logical names can be used for backwards compatibility with SunOS 4.x applications, as a shorthand for the real /devices name, or as a way to identify a device without having to know where it is in the /devices tree (/dev/fb could refer to a cgsix, cgthree, or bwtwo framebuffer, but the application does not need to know this). See disks(1M), tapes(1M), ports(1M), devlinks(1M), and /etc/devlink.tab for system-supported ways of creating these links. See Chapter 5, "Autoconfiguration," and the Application Packaging Developer's Guide for more information.

Multithreading

SunOS 5.x supports multiple threads in the kernel and multiple CPUs. A thread is a sequence of instructions being executed by a program. In SunOS 5.x there are application threads and there are kernel threads. Kernel threads are used to execute kernel code and are the threads of concern to the driver writer. Interrupts are also handled as threads. Because of this, there is less of a distinct
28. (Contents, continued) Readers/Writer Locks; Semaphores; Thread Synchronization; Condition Variables; cv_timedwait; cv_wait_sig; cv_timedwait_sig; Choosing a Locking Scheme. Autoconfiguration: Overview; State Structure; Data Structures; modlinkage; modldrv; dev_ops; cb_ops; Loadable Driver Interface; Device Configuration; identify; probe; attach; detach; getinfo. Interrupt Handlers: Overview; Interrupt Specification; Interrupt Number; Bus-Interrupt Levels; High-Level Interrupts
29. See Chapter 6, Appendix J, and Appendix K of The SPARC Architecture Manual, Version 8 for more details on the SPARC memory model.

This section describes a number of bus-specific topics, including device identification, device addressing, and interrupts.

Device Identification

Device identification is the process of determining which devices are present in the system.

Self-Identifying Devices

Some devices are self-identifying: the device itself provides information to the system so that it can identify the device driver that needs to be used. The device usually provides additional information to the system in the form of name-value pairs that can be retrieved using the property interfaces. See "Properties" on page 57 for more information on properties. SBus devices are examples of self-identifying devices. The information is usually derived from a small FORTH program stored in the FCode PROM on the device. See sbus(4) for more information.

Non-Self-Identifying Devices

Devices that do not provide information to the system to identify themselves are called non-self-identifying devices. Drivers for these devices must have a probe(9E) routine, which determines whether the device is really there. In addition, information about the device must be provided in a hardware configuration file (see driver.conf(4)) so that the system can provide probe(9E) with the information it needs t
30. align max align mineffect adjust offset for ddi_dma_htoc(9F)

Chapter 8: Drivers for Character Devices

This chapter describes the structure of a character device driver. The entry points of a character device driver are the main focus, and the use of physio(9F) in read(9E) and write(9E) is also explained.

Entry Points

Associated with each device driver is a dev_ops(9S) structure, which in turn refers to a cb_ops(9S) structure. These structures contain pointers to the driver entry points and must be set by the driver. Table 8-1 lists the character device driver entry points.

Table 8-1 Character Driver Entry Points
    Entry Point     Description
    _init(9E)       Initializes the loadable driver module
    _info(9E)       Returns the loadable driver module information
    _fini(9E)       Prepares a loadable driver module for unloading
    identify(9E)    Identifies if the device driver supports a physical device
    probe(9E)       Determines if a device is present
    attach(9E)      Performs device-specific initialization
    detach(9E)      Removes device-specific state
    getinfo(9E)     Gets device driver information
    open(9E)        Gains access to a device
    close(9E)       Relinquishes access to a device
    read(9E)        Reads data from device
    write(9E)       Writes data to device
    ioctl(9E)       Perform arbitrary operation
31. cv_wait_sig(9F) returns zero if it is returning because of a signal, or nonzero if the condition occurred.

Code Example 4-4 Using cv_wait_sig(9F)
    mutex_enter(&xsp->mu);
    while (xsp->busy) {
        if (cv_wait_sig(&xsp->cv, &xsp->mu) == 0) {
            /* Signalled while waiting for the condition. */
            /* tidy up and exit */
            mutex_exit(&xsp->mu);
            return (EINTR);
        }
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);

cv_timedwait_sig()

cv_timedwait_sig(9F) is similar to cv_timedwait(9F) and cv_wait_sig(9F), except that it returns -1 without the condition being signaled after a timeout has been reached, or 0 if a signal (for example, kill(2)) is sent to the thread. For both cv_timedwait(9F) and cv_timedwait_sig(9F), time is measured in absolute clock ticks since the last system reboot.

Choosing a Locking Scheme

The locking scheme for most device drivers should be kept straightforward. Using additional locks may allow more concurrency but increases overhead. Using fewer locks is cheaper but allows less concurrency. Generally, use one mutex per data structure, a condition variable for each event or condition the driver must wait for, and a mutex for each major set of data global to the driver. Avoid holding mutexes for long periods of time. For more information on locking schemes, see Appendix B, "Advanced Topics." Also see the Multithreaded Programming Guide for more detail on multithreading operatio
32. ddi_get_soft_state(statep, instance);
    struct xxctx *newctx;
    /* Create a new context for this mapping */
    newctx = kmem_alloc(sizeof (struct xxctx), KM_SLEEP);
    newctx->xsp = xsp;
    /* Set up mapping */
    error = ddi_mapdev(dev, off, asp, addrp, len, prot, maxprot,
        flags, credp, &xx_mapdev_ctl, &newctx->handle, newctx);
    if (error)
        kmem_free(newctx, sizeof (struct xxctx));
    return (error);
}

Managing Mapping Accesses

The device driver is notified when a user process accesses an address in the memory-mapped region that does not have valid mapping translations. When the access event occurs, the mapping translations of the process that currently has access to the device must be invalidated. The device context of the process requesting access to the device must be restored, and the translations of the mapping of the process requesting access must be validated. The functions ddi_mapdev_intercept(9F) and ddi_mapdev_nointercept(9F) are used to invalidate and validate mapping translations.

int ddi_mapdev_intercept(ddi_mapdev_handle_t handle, off_t offset, off_t len);

ddi_mapdev_intercept(9F) invalidates the mapping translations for the pages of the mapping specified by handle, offset, and len. By invalidating the mapping translations for these pages, the device driver is telling the system to intercept accesses to these pages of the mapping and notify the device d
33. detach to prevent the driver from being unloaded. devo_reset should be set to nodev(9F). The devo_cb_ops member should contain the address of the cb_ops(9S) structure. The devo_bus_ops field must be set to NULL.

    int (*cb_open)(dev_t *devp, int flag, int otyp, cred_t *credp);
    int (*cb_close)(dev_t dev, int flag, int otyp, cred_t *credp);
    int (*cb_strategy)(struct buf *bp);
    int (*cb_print)(dev_t dev, char *str);
    int (*cb_dump)(dev_t dev, caddr_t addr, daddr_t blkno, int nblk);
    int (*cb_read)(dev_t dev, struct uio *uiop, cred_t *credp);
    int (*cb_write)(dev_t dev, struct uio *uiop, cred_t *credp);
    int (*cb_ioctl)(dev_t dev, int cmd, int arg, int mode,
        cred_t *credp, int *rvalp);
    int (*cb_devmap)();
    int (*cb_mmap)(dev_t dev, off_t off, int prot);
    int (*cb_segmap)(dev_t dev, off_t off, struct as *asp,
        caddr_t *addrp, off_t len, unsigned int prot,
        unsigned int maxprot, unsigned int flags, cred_t *credp);
    int (*cb_chpoll)(dev_t dev, short events, int anyyet,
        short *reventsp, struct pollhead **phpp);
    int (*cb_prop_op)(dev_t dev, dev_info_t *dip,
        ddi_prop_op_t prop_op, int mod_flags,
        char *name, caddr_t valuep, int *length);
    struct streamtab *cb_str;   /* STREAMS information */
    int cb_flag;

The cb_ops(9S) structure contains the entry points for the character and block operations of the device driver. Any entry points the driver does not suppo
34. dip, char *name, int spec_type, int minor_num, char *node_type, int is_clone);

ddi_create_minor_node(9F) advertises a minor device node, which will eventually appear in the /devices directory and refer to the device specified by dip.

void ddi_remove_minor_node(dev_info_t *dip, char *name);

ddi_remove_minor_node(9F) removes the minor device node name for the device dip from the system. name is assumed to have been created by ddi_create_minor_node(9F). If name is NULL, all minor node information is removed.

int mod_install(struct modlinkage *modlinkage);

mod_install(9F) links the calling driver module into the system and prepares the driver to be used. modlinkage is a pointer to the modlinkage structure defined in the driver. mod_install(9F) must be called from the _init(9E) entry point.

int mod_remove(struct modlinkage *modlinkage);

mod_remove(9F) unlinks the calling driver module from the system. modlinkage is a pointer to the modlinkage structure defined in the driver. mod_remove(9F) must be called from the _fini(9E) entry point.

int mod_info(struct modlinkage *modlinkage, struct modinfo *modinfop);

mod_info(9F) reports the status of a dynamically loadable driver module. It must be called from the _info(9E) entry point.

Device Information

These interfaces provide information to the driver about a device, such as whether the device is self-identifying, wh
35. (Index) dynamic loading; dynamic memory allocation
E: entry points (see driver entry points); external registers
F: filesystem I/O; _fini(9E); first-party DMA
G: geographical addressing; graphics devices, device context management of
H: hardware configuration file; header files for device drivers
I: I/O (control overview; disk controls; filesystem structure; miscellaneous control of; multiplexing; port access; programmed transfers; scatter/gather structures); identify(9E) entry point; _info(9E); _init(9E); instance numbers; internal mode registers; internal sequencing logic; interrupt cookie (see cookie); interrupt handling (block interrupt cookie; device interrupt cookie; interfaces for; overview; registering a handler); interrupts (common problems with; registering; specification of; types of); inumber
K: kadb(1M) command; kernel modules (directory of; dynamic loading); kernel threads; kernel, definition of; keywords, new
L: leaf device drivers; lightweight process; linking a driver; loading drivers (add_drv(1M) command; compiling a driver; hardware configuration file; linking a driver; overview); loading modules; lock granularity; locking primitives, types of; LWP
M: memory mapping, device context m
36. for two arguments.

Breakpoints

In kadb(1M), breakpoints can be set which will automatically drop back into kadb when reached. The standard form of a breakpoint command is:

    addr,count:b command

addr is the address at which the program will be stopped and the debugger will receive control, count is the number of times that the breakpoint address occurs before stopping, and command is almost any adb(1) command. Other breakpoint commands are: continue execution (:c); delete breakpoint (:d); single step; single step, but step over function calls; stop after return to caller of current function; delete all breakpoints.

Here is an example of setting a breakpoint in a commonly used routine, scsi_transport(9F). Upon reaching the breakpoint, $c is used to get a stack trace. The top of stack is the first function printed. Note that kadb(1M) does not know how many arguments were passed to the function; it always prints six.

    kadb[0]: scsi_transport:b
    kadb[0]: :c
    test console login: root
    Password:
    breakpoint scsi_transport: save %sp, -0x60, %sp
    kadb[0]: $c
    scsi_transport(0xff09c400,0x3,0x3,0x1,0xff09c534,0xff0a0690)
    sdstrategy(0xff0a0690,0x3,0xff09c440,0x170,0xff09c534,0xff09c400) + 3d8
    bwrite(0xff0a0690,0xff1ed400,0x1,0xb,0xff0a06f0,0xf017d9d8) + bc
    sbupdate(0xf00dcfe8,0xff20 400,0xff0a0690,0x400,0x0,0xff035000) + 49c
    ufs_update(0xff034c80,0x3,0xff034
37. h

Example Three

A DMA engine on an x86 ISA bus device has the following limitations: it can access only the first 16 megabytes of memory; it can perform transfers to segments up to 32K in size; it can hold up to 17 scatter/gather transfers; it operates on units of 512 bytes; it has a minimum effective transfer size of 2 bytes; it has an average transfer speed of 10 Mbytes per second. The resulting limit structure is:

    static ddi_dma_lim_t limits = {
        0x00000000,     /* low address */
        0x00FFFFFF,     /* high address */
        0,              /* must be 0 */
        1,              /* must be 1 */
        DMA_UNIT_8,     /* minimum transfer size */
        0,              /* must be 0 */
        DMALIM_VER0,    /* version */
        0xFFFFFF,       /* address register maximum */
        0x007FFF,       /* maximum transfer - 1 */
        512,            /* granularity */
        17,             /* scatter/gather length */
        0xFFFFFFFF      /* request size */
    };

Object Locking

Before allocating the DMA resources for a memory object, the object must be prevented from moving. If it is not, the system may remove the object from memory while the device is writing to it, causing the data transfer to fail and possibly corrupting the system. The process of preventing memory objects from moving during a DMA transfer is known as locking down the object.

Note: Locking objects in memory is not related to the type of locking used to protect data.

The following object types do not require explicit locking: Buffers coming from the file
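To make the three address and size limits in Example Three concrete, here is a small user-space sketch that checks one proposed DMA segment against them. It is only an illustration: the function and macro names are hypothetical, the constants are taken from the prose description above, and a real driver would rely on the DDI DMA routines to enforce the limit structure rather than hand-checking segments.

```c
#include <assert.h>
#include <stdint.h>

/* Limits from the ISA example in the text; the names are hypothetical. */
#define XX_DMA_ADDR_HI  0x00FFFFFFu /* last reachable byte (16 MB - 1) */
#define XX_DMA_SEG_MAX  0x8000u     /* 32 KB per segment */
#define XX_DMA_GRANULAR 512u        /* device works in 512-byte units */

/* Return 1 if [addr, addr + len) satisfies all three limits, else 0. */
int xx_segment_ok(uint32_t addr, uint32_t len)
{
    if (len == 0 || len > XX_DMA_SEG_MAX)
        return 0;
    if (len % XX_DMA_GRANULAR != 0)
        return 0;
    /* Compare against the remaining room to avoid 32-bit overflow. */
    if (addr > XX_DMA_ADDR_HI || len - 1 > XX_DMA_ADDR_HI - addr)
        return 0;
    return 1;
}
```

A segment that ends exactly on the 16-megabyte boundary passes; one that is a single unit longer fails, which is the case a scatter/gather list builder would have to split or bounce.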
38. opaque object that keeps track of the soft state structures.

void ddi_soft_state_fini(void **state_p);

ddi_soft_state_fini(9F) is the inverse operation to ddi_soft_state_init(9F). state_p points to a pointer to an opaque object that keeps track of the soft state structures.

int ddi_soft_state_zalloc(void *state, int item);

ddi_soft_state_zalloc(9F) allocates and zeroes a new instance of a soft state structure. state points to an opaque object that keeps track of the soft state structures.

void *ddi_get_soft_state(void *state, int item);

ddi_get_soft_state(9F) returns a pointer to the soft state structure for the device instance item. state points to an opaque object that keeps track of the soft state structures.

void ddi_soft_state_free(void *state, int item);

ddi_soft_state_free(9F) releases the resources associated with the soft state structure for item. state points to an opaque object that keeps track of the soft state structures.

String Manipulation

These interfaces are generic string manipulation utilities similar to, and in most cases identical to, the routines of the same names defined in the standard C library used by application programmers.

int stoi(char **str);

stoi(9F) converts the ASCII decimal numeric string pointed to by *str to an integer and returns the integer. *str is updated to point to the last character examined.
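The pointer-updating behavior described for stoi(9F) is easy to misread, so here is a user-space sketch of the same idea. `model_stoi` is a hypothetical name, not the kernel routine; it assumes that "the last character examined" is the first non-digit, and it does no overflow checking.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert leading ASCII decimal digits at *str to an int and advance
 * *str to the first character that is not a digit.
 */
int model_stoi(char **str)
{
    int n = 0;
    char *p;

    assert(str != NULL && *str != NULL);
    p = *str;
    while (*p >= '0' && *p <= '9')
        n = n * 10 + (*p++ - '0');
    *str = p;
    return n;
}
```

Because the caller's pointer is advanced, a driver-style parser can call the function repeatedly to walk a string such as "3,0", skipping the separator between calls.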
39. pass a NULL pointer as the second argument, which causes physio(9F) to allocate a buf structure. The address of the allocated buf structure should always be saved in strategy(9E), since it is needed to call biodone(9F). An alternative is to use getrbuf(9F) to allocate the buf(9S) structure and freerbuf(9F) to free it.

ioctl()

SunOS 4.x:
    int xxioctl(dev, cmd, data, flag)
    dev_t dev;
    int cmd, flag;
    caddr_t data;

SunOS 5.x:
    int xxioctl(dev_t dev, int cmd, int arg, int mode, cred_t *credp, int *rvalp);

In SunOS 4.x, ioctl command arguments were defined as follows:

    #define XXIOCTL1 _IOR(m, 1, u_int)

The _IOR, _IOW, and _IOWR macros were used to encode the direction and size of the data transfer. The kernel would then automatically copy the data into or out of the kernel. This is no longer the case. To do a data transfer, the driver is now required to use ddi_copyin(9F) and ddi_copyout(9F) explicitly. Do not dereference arg directly. In addition, use the new method of a left-shifted letter OR-ed with a number:

    #define XXIOC    ('x' << 8)
    #define XXIOCTL1 (XXIOC | 1)

The credential pointer can be used to check credentials on the call with drv_priv(9F), and the return value pointer can be used to return a value which means something, as opposed to the old method of always getting zero back for success. This number should be positive to avoid confusion with applications that
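The "left-shifted letter OR-ed with a number" scheme can be exercised in ordinary user-space C. The sketch below is illustrative only: `XXIOCTL2`, the helper functions, and the dispatch model are hypothetical additions, and the choice of ENOTTY for a foreign command group is an assumption rather than something the text prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define XXIOC    ('x' << 8)   /* group letter in the upper byte */
#define XXIOCTL1 (XXIOC | 1)
#define XXIOCTL2 (XXIOC | 2)  /* hypothetical second command */

/* Split a command back into its group letter and number. */
int xx_ioc_group(int cmd)  { return (cmd >> 8) & 0xff; }
int xx_ioc_number(int cmd) { return cmd & 0xff; }

/* User-space model of a driver ioctl switch; returns 0 or an errno. */
int xx_ioctl_model(int cmd, int *rvalp)
{
    assert(rvalp != NULL);
    if (xx_ioc_group(cmd) != 'x')
        return ENOTTY;        /* not one of this driver's commands */
    switch (cmd) {
    case XXIOCTL1:
        *rvalp = 1;           /* positive, meaningful return value */
        return 0;
    case XXIOCTL2:
        *rvalp = 2;
        return 0;
    default:
        return EINVAL;
    }
}
```

With ASCII 'x' equal to 0x78, `XXIOCTL1` evaluates to 0x7801, so commands from different drivers that pick different letters cannot collide.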
40. their readiness to perform certain I/O operations. See the poll(2) manual page for details.

int nochpoll(dev_t dev, short events, int anyyet, short *reventsp, struct pollhead **pollhdrp);

Use nochpoll(9F) as the chpoll entry in the cb_ops(9S) structure if the driver does not support polling.

void pollwakeup(struct pollhead *php, short event);

If the driver does implement a chpoll(9E) entry point to support polling, it should call pollwakeup(9F) whenever the event occurs.

Printing System Messages

These interfaces are functions that display messages on the system console.

void cmn_err(int level, char *format, ...);

cmn_err(9F) is the mechanism for printing messages on the system console. level may be one of CE_NOTE, CE_WARN, CE_CONT, or CE_PANIC. CE_NOTE indicates a purely informational message, CE_WARN indicates a warning to the user, CE_CONT continues a previous message, and CE_PANIC issues a fatal error and crashes the system. Use CE_PANIC only for unrecoverable system errors. Whenever possible, CE_CONT should be used to print system messages. Note that CE_PANIC, CE_NOTE, and CE_WARN cause cmn_err(9F) to always append a new line to the message.

void ddi_report_dev(dev_info_t *dip);

ddi_report_dev(9F) possibly prints a message announcing the presence of a device on the system. Call this function before returning from a successful attach(9E).
41. void drv_usecwait(clock_t microsecs);

drv_usecwait(9F) busy-waits for microsecs microseconds.

clock_t drv_hztousec(clock_t hertz);

drv_hztousec(9F) converts hertz clock ticks to microseconds and returns the number of microseconds.

clock_t drv_usectohz(clock_t microsecs);

drv_usectohz(9F) converts microsecs microseconds to clock ticks and returns the number of clock ticks.

uio(9S) Handling

These interfaces all deal with moving data using the uio(9S) data structure.

int uiomove(caddr_t address, long nbytes, enum uio_rw rwflag, struct uio *uio_p);

uiomove(9F) copies data between the address address and the uio(9S) structure pointed to by uio_p. If rwflag is UIO_READ, data are transferred from address to a data buffer associated with the uio(9S) structure. If rwflag is UIO_WRITE, data are transferred from a data buffer associated with the uio(9S) structure to address.

int ureadc(int c, uio_t *uio_p);

ureadc(9F) appends the character c to a data buffer associated with the uio(9S) structure pointed to by uio_p.

int uwritec(uio_t *uio_p);

uwritec(9F) removes a character from a data buffer associated with the uio(9S) structure pointed to by uio_p and returns the character.

Utility Functions

These interfaces are miscellaneous utilities that drivers may use.

void ASSERT(EX); The AS
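The tick conversions above amount to simple arithmetic on the system clock rate. The following user-space sketch shows one way to do it; `model_usectohz` and `model_hztousec` are hypothetical names, the clock rate is passed in explicitly, and rounding a partial tick up is an assumption made here so that a short delay never becomes zero ticks.

```c
#include <assert.h>

/* Microseconds to clock ticks at hz ticks per second, rounding up. */
long model_usectohz(long usec, long hz)
{
    assert(usec >= 0 && hz > 0);
    /* Split whole seconds from the remainder to limit overflow risk. */
    return (usec / 1000000L) * hz +
        ((usec % 1000000L) * hz + 999999L) / 1000000L;
}

/* Clock ticks to microseconds at hz ticks per second. */
long model_hztousec(long ticks, long hz)
{
    assert(ticks >= 0 && hz > 0);
    return (ticks / hz) * 1000000L + ((ticks % hz) * 1000000L) / hz;
}
```

At 100 ticks per second, a 15000-microsecond request becomes 2 ticks under this rounding, and 250 ticks convert back to 2500000 microseconds.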
42. (1) can be used to debug applications or the kernel, though it cannot debug the kernel interactively (such as by setting breakpoints). To interactively debug the kernel, use kadb(1M). Both adb(1) and kadb(1M) share a common command set. adb(1) is a very terse debugger. It does not normally prompt for input, though kadb(1M) does.

Starting adb

The command for starting adb to debug a kernel core dump is:

    % adb -k /var/crash/hostname/unix.n /var/crash/hostname/vmcore.n

To start adb on a live system, use (as root):

    # adb -k /dev/ksyms /dev/mem

/dev/ksyms is a special driver that provides an image of the kernel's symbol table. This can be used to examine the debugging information (traces) the driver has left in memory. When adb(1) responds with physmem XXX, it is ready for a command.

Starting kadb

The system must be booted under kadb(1M) before kadb(1M) can be used. From the Open Boot PROM, use:

    ok boot kadb
    Boot device: /sbus/esp@0,800000/sd@3,0 File and args: kadb
    kadb: kernel/unix
    Size: 673348+182896+46008 bytes
    /kernel/unix loaded - 0x125000 bytes used
    SunOS Release 5.4 Version Generic [UNIX(R) System V Release 4.0]
    Copyright (c) 1983-1994, Sun Microsystems, Inc.

By default, kadb(1M) boots (and debugs) kernel/unix. It can be passed a file name as an argument to boot a different kernel, or -d can be passed to have kadb(1
43. 1 Accessing bwtwo with the sbusmem driver

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>     /* sleep(), close() */

    /* Write val into count bytes starting at base, then pause. */
    void
    fill(char *base, int count, char val)
    {
        register int i;

        for (i = 0; i < count; i++)
            *base++ = val;
        sleep(2);
    }

    int
    main(int argc, char *argv[])
    {
        int fd;
        caddr_t base;
        size_t fb_len = 0x20000;
        off_t fb_offset = 0x800000;

        fd = open("/devices/sbus@1,8000000/sbusmem@3,0:slot3", O_RDWR);
        if (fd == -1) {
            perror("open /devices/sbus@1,8000000/sbusmem@3,0:slot3");
            return (1);
        }
        base = mmap(NULL, fb_len, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, fb_offset);
        if (base == (caddr_t)-1) {
            perror("mmap of SBus slot 3");
            return (1);
        }
        (void) close(fd);
        fill(base, 0x20000, 0xff);
        fill(base, 0x20000, 0);
        fill(base, 0x18000, 0x55);
        fill(base, 0x15000, 0x3);
        fill(base, 0x10000, 0x5);
        fill(base, 0x5000, 0xf9);
        return (0);
    }

Debugging Tools

This section describes some programs and files that can be used to debug the driver at run time.

/etc/system

The /etc/system file is read once while the kernel is booting. It is used to set various kernel options. After modifying this file, the system must be rebooted for the changes to take effect. If a change in the file causes the system not to work, boot with the ask (-a) option and specify de
44. (Contents, continued) 10. SCSI Target Drivers: Overview; Reference Documents; Sun Common SCSI Architecture Overview; General Flow of Control; SCSA Functions; SCSA Compatibility Functions; SCSI Target Drivers; Hardware Configuration File; Declarations and Data Structures; Autoconfiguration; Resource Allocation; Building and Transporting a Command; Building a Command; Transporting a Command; Command Completion. What Is a Device Context?; Context Management Model; Multiprocessor Considerations; Context Management Operation; State Structure; Declarations and Data Structures; Associating Devices with User Mappings; Managing Mapping Accesses; Device Context Management Entry Points
45. (Index) SCSI; bus nexus device drivers; bus-master DMA; byte-stream I/O
C: cache; callback functions; cb_ops(9S) structure; character device drivers (entry points for); compiler modes; compiling/linking a driver; condition variables (and interface functions; and mutex locks; routines for); configuration file, device (attach(9E); detach(9E); getinfo(9E); identify(9E); probe(9E)); configuration file, hardware; context of device driver; control registers, device context management of; cookie (DMA; types of)
D: data structures (cb_ops(9S); dev_ops(9S); for device drivers; overview of); data storage, classes of; DDI/DKI (and disk performance; compliance testing; interface summary; kernel support routines); ddi functions: ddi_add_intr(9F), ddi_create_minor_node(9F), ddi_dma_free(9F), ddi_dma_nextseg(9F), ddi_dma_nextwin(9F), ddi_dma_segtocookie(9F), ddi_get_instance(9F), ddi_iblock_cookie_t, ddi_idevice_cookie_t, ddi_map_regs(9F), ddi_prop_create(9F), ddi_prop_op(9F), ddi_remove_intr(9F), ddi_unmap_regs(9F); detach(9E) entry point; dev_ops(9S) structure; device access system calls; device addressing; device driver (converting to 5.x; debugging coding hints
46. 32-bit boundaries; long long integers are aligned on 64-bit boundaries. Usually, alignment issues are handled by the compiler. Driver writers are more likely to be concerned about alignment, as they must use the proper data types to access their device. Since device registers are commonly accessed through a pointer reference, drivers must ensure that pointers are properly aligned when accessing the device. See "Device Issues" on page 44 for more information about accessing device registers.

Structure Member Alignment

Because of the data alignment restrictions imposed by the SPARC processor, C structures also have alignment requirements. Structure alignment requirements are imposed by the most strictly aligned structure component. For example, a structure containing only characters has no alignment restrictions, while a structure containing a long long member must be constructed to guarantee that this member falls on a 64-bit boundary. See "Structure Padding" on page 47 for more information on how this relates to device drivers.

Byte Ordering

The SPARC processor uses big-endian byte ordering; in other words, the most significant byte of an integer is stored at the lowest address of the integer.

[Figure: Byte 0 (MSB), Byte 1, Byte 2, Byte 3 (LSB)]

Register Windows

SPARC processors use register windows. Each register window comprises 8 in registers, 8 local registers, and 8 out registers (which are the in registers of the n
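Both points, big-endian layout and member alignment, can be observed from portable user-space C. The sketch below is an illustration under stated assumptions: `store_be32`, `load_be32`, and `struct xx_regs` are hypothetical names, and the byte helpers produce the big-endian layout explicitly so they behave the same on any host, not only on SPARC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v with the most significant byte at the lowest address. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24); /* Byte 0: MSB */
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;         /* Byte 3: LSB */
}

/* Rebuild the integer from its big-endian byte sequence. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Hypothetical layout: the compiler pads after tag to align value. */
struct xx_regs {
    char tag;
    long long value;
};

/* Offset of the strictly aligned member within the structure. */
size_t xx_value_offset(void)
{
    return offsetof(struct xx_regs, value);
}
```

On a typical system the offset returned is larger than 1, which shows the padding the compiler inserted to keep the long long member on its required boundary.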
47. (9S) structure contains information about the direction and size of the transfer, plus an array of buffers describing one end of the transfer (the other end is the device). Below is a list of uio(9S) structure members that are important to character drivers:

    iovec_t  *uio_iov;     /* base address of the iovec buffer description array */
    int      uio_iovcnt;   /* the number of iovec structures */
    off_t    uio_offset;   /* offset into device where data is transferred from or to */
    offset_t uio_loffset;  /* 64-bit offset into file where data is transferred from or to. See NOTES. */
    int      uio_resid;    /* amount (in bytes) not transferred on completion */

A uio(9S) structure is passed to the driver read(9E) and write(9E) entry points. This structure is generalized to support what is called gather-write and scatter-read. When writing to a device, the data buffers to be written do not have to be contiguous in application memory. Similarly, when reading from a device into memory, the data comes off the device in a contiguous stream but can go into a noncontiguous area of application memory. See readv(2), writev(2), pread(2), and pwrite(2) for more information on scatter/gather I/O. Each buffer is described by an iovec(9S) structure. This structure contains a pointer to the data area and the number of bytes to be transferred:

    caddr_t iov_base;  /* address of buffer */
    int     iov_len;   /* amount to trans
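Gather-write can be pictured as walking the iovec array and copying each piece into one contiguous stream. The sketch below models that in user space with the standard `struct iovec` from `<sys/uio.h>`; `model_gather` is a hypothetical name and is not uiomove(9F), though its return value plays the same role as uio_resid.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Copy up to len bytes from the buffers described by iov[] into the
 * contiguous area dst. Returns the residual: the number of requested
 * bytes that could not be supplied by the iovec array.
 */
size_t model_gather(const struct iovec *iov, int iovcnt,
    char *dst, size_t len)
{
    size_t resid = len;
    int i;

    assert(iov != NULL && dst != NULL);
    for (i = 0; i < iovcnt && resid > 0; i++) {
        size_t n = iov[i].iov_len < resid ? iov[i].iov_len : resid;

        memcpy(dst, iov[i].iov_base, n);
        dst += n;
        resid -= n;
    }
    return resid;
}
```

Two separate application buffers holding "abc" and "defg" come out as the single stream "abcdefg", and asking for more bytes than the array holds leaves a nonzero residual.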
48. ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. THIS PUBLICATION COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THE PUBLICATION. SUN MICROSYSTEMS, INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS PUBLICATION AT ANY TIME.

Contents: 1. Overview of the SunOS Kernel: What is the Kernel?; Multithreading; Virtual Memory; Virtual Addresses; Address Spaces; Special Files; Dynamic Loading of Kernel Modules; Overview of the Solaris 2.x DDI/DKI; Device Tree; Example Device Tree. 2. Hardware Overview: SPARC Processor Issues; Data Alignment; Structure Member Alignment; Byte Ordering; Register Windows
49. Avoiding Unnecessary Locks

Use the MT semantics of the entry points to your advantage. If an element of a device's state structure is read-mostly (for example, initialized in attach() and destroyed in detach(), but only read in other entry points), there is no need to acquire a mutex to read that element of the structure. This may sound obvious, but blindly adding calls to mutex_enter(9F) and mutex_exit(9F) around every access to such a variable can lead to unnecessary locking overhead. Make all entry points reentrant, and reduce the amount of shared data by changing static variables to automatic or by adding them to your state structure.

Note: Kernel thread stacks are small (currently 8 Kbytes), so do not allocate large automatic variables, and avoid deep recursion.

Locking Order

When acquiring multiple mutexes, be sure to acquire them in the same order on each code path. For example, mutexes A and B are used to protect two resources in the following ways:

    Code Path 1          Code Path 2
    mutex_enter(&A);     mutex_enter(&B);
    mutex_enter(&B);     mutex_enter(&A);
    mutex_exit(&B);      mutex_exit(&A);
    mutex_exit(&A);      mutex_exit(&B);

If thread one is executing code path 1 and thread two is executing code path 2, the following could occur: 1. Thread one acquires mutex A. 2. Thread two acquires mutex B. 3. Thread one needs mutex B, so it blocks holdin
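One common way to guarantee "the same order on each code path" is to derive the order from the locks themselves. The sketch below uses POSIX mutexes in user space and orders by address; this technique is an illustrative choice, not something the text prescribes, and `struct account`, `lock_pair`, and `transfer` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Always take the lower-addressed lock first, whatever the call order. */
static void lock_pair(struct account *x, struct account *y)
{
    struct account *first = ((uintptr_t)x < (uintptr_t)y) ? x : y;
    struct account *second = (first == x) ? y : x;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

/* Release order does not affect deadlock freedom. */
static void unlock_pair(struct account *x, struct account *y)
{
    pthread_mutex_unlock(&x->lock);
    pthread_mutex_unlock(&y->lock);
}

/* Move amount between accounts; returns 0, or -1 if funds are short. */
int transfer(struct account *from, struct account *to, long amount)
{
    int rc = -1;

    assert(from != to && amount >= 0);
    lock_pair(from, to);
    if (from->balance >= amount) {
        from->balance -= amount;
        to->balance += amount;
        rc = 0;
    }
    unlock_pair(from, to);
    return rc;
}
```

Because `transfer(&a, &b, ...)` and `transfer(&b, &a, ...)` both lock in the same address order, two threads running them concurrently cannot each hold one lock while waiting for the other.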
50. C can be used to abort the listing.

    kadb[0]: $<threadlist
    [threadlist output, garbled in extraction; recoverable entries: thread_id f0141ee0 (data address not found); thread_id f0165ee0 (callout_thread); thread_id f016bee0 (background); thread_id f016eee0 (freebs)]

Another useful macro is thread. Given a thread ID, it prints the corresponding thread structure. This can be used to look at a certain thread found with the threadlist macro, to look at the owner of a mutex, or to look at the current thread.

    [thread macro output for 0xf0141ee0, garbled in extraction; recoverable fields: link, stk, stksize; affinity, affcnt, bind_cpu]

Note: There is no type information kept in the kernel, so using a macro on an inappropriate object will result in garbage output.

Macros do not necessarily output all the fields of the structures, nor is the output necessarily in the order given in the structure definition. Occasionally, memory may need to be dumped for certain structures and then matched with the struct
51. Code Example 9-8 on page 185 for an example of setting b_resid. The b_resid member is overloaded: it is also used by disksort(9F). b_error is set by the driver to an error number when a transfer error occurs. It is set in conjunction with the b_flags B_ERROR bit. See Intro(9E) for details regarding error values. Drivers should use bioerror(9F) in preference to setting b_error directly. b_private is used exclusively by the driver to store driver-private data. b_edev contains the device number of the device involved in the transfer. bp_mapin: When a buf structure pointer is passed into the device driver's strategy(9E) routine, the data buffer referred to by b_un.b_addr is not necessarily mapped in the kernel's address space. This means that the data is not directly accessible by the driver. Most block-oriented devices have DMA capability, and therefore do not need to access the data buffer directly. Instead, they use the DMA mapping routines to allow the device's DMA engine to do the data transfer. For details about using DMA, see Chapter 7, "DMA". If a driver needs to directly access the data buffer (as opposed to having the device access the data), it must first map the buffer into the kernel's address space using bp_mapin(9F). bp_mapout(9F) should be used when the driver no longer needs to access the data directly. Synchronous Data Transfers: This section discusses a simple method for perf
52. Critical System Files: There are a number of driver-related system files that are difficult, if not impossible, to reconstruct. Files such as /etc/name_to_major could be corrupted if the driver crashes the system during installation (see add_drv(1M)). To be safe, once the test machine is in the proper configuration, make a backup copy of the root file system. Recreating /devices and /dev: If the /devices or /dev directories are damaged (most likely if the driver crashes during attach(9E)), they may be recreated in the following way. Boot the system from somewhere else (another disk, an installation CD, or the network) and run fsck(1M) to repair the damaged root filesystem. Then mount the root filesystem and recreate /devices by running drvconfig(1M) and specifying the devices directory on the mounted disk. The /dev directory can be repaired by running devlinks(1M), disks(1M), tapes(1M), and ports(1M) on the dev directory of the mounted disk. For example, if the damaged disk is /dev/dsk/c0t3d0s0 and an alternate boot disk is /dev/dsk/c0t1d0s0, do the following:
    ok boot disk1
    Rebooting with command: disk1
    Boot device: /sbus/esp@0,800000/sd@1,0 File and args:
    SunOS Release 5.4 Version Generic [UNIX(R) System V Release 4.0]
    Copyright (c) 1983-1994, Sun Microsystems, Inc.
    fsck /dev/dsk/c0t3d0s0
    ** /dev/dsk/c0t3d0s0
    ** Last Mounted on
    ** Phase 1 - Check Blocks and S
53. OTYP_CHR) return (EINVAL); if ((flag & FWRITE) && drv_priv(credp) == EPERM) return (EPERM); return (0); } devp is a pointer to a device number. The open(9E) routine is passed a pointer so that the driver can change the minor number. This allows drivers to dynamically create minor instances of the device. An example of this might be a pseudo-terminal driver that creates a new pseudo-terminal whenever the driver is opened. A driver that chooses the minor number dynamically normally creates only one minor device node in attach(9E) with ddi_create_minor_node(9F), then changes the minor number component of *devp using makedevice(9F) and getmajor(9F): *devp = makedevice(getmajor(*devp), new_minor); The driver must keep track of available minor numbers internally. otyp indicates how open(9E) was called. The driver must check that the value of otyp is appropriate for the device. For character drivers, otyp should be OTYP_CHR (see the open(9E) manual page). flag contains bits indicating whether the device is being opened for reading (FREAD), writing (FWRITE), or both. Threads issuing the open(2) can also request exclusive access to the device (FEXCL) or specify that the open should not block for any reason (FNDELAY), but it is up to the driver to enforce both cases. A driver for a write-only device such as a printer might
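The bookkeeping behind "the driver must keep track of available minor numbers internally" can be sketched in user space. This is an illustration under stated assumptions, not the kernel interface: sim_makedevice, sim_getmajor, and sim_getminor are hypothetical stand-ins for makedevice(9F), getmajor(9F), and getminor(9F), the 14-bit major / 18-bit minor split is assumed for the example, and the pool is limited to 32 minors so that a single bitmap word suffices.

```c
#include <stdint.h>

/* Assumed layout for illustration only: 14-bit major, 18-bit minor. */
#define SIM_MINOR_BITS  18
#define SIM_MINOR_MASK  ((1u << SIM_MINOR_BITS) - 1)
#define SIM_NMINORS     32

typedef uint32_t sim_dev_t;

sim_dev_t
sim_makedevice(uint32_t major, uint32_t minor)
{
    return ((major << SIM_MINOR_BITS) | (minor & SIM_MINOR_MASK));
}

uint32_t
sim_getmajor(sim_dev_t dev)
{
    return (dev >> SIM_MINOR_BITS);
}

uint32_t
sim_getminor(sim_dev_t dev)
{
    return (dev & SIM_MINOR_MASK);
}

/* Bitmap of minors in use: bit n set means minor n is taken. */
struct minor_pool {
    uint32_t in_use;
};

/* Return the lowest free minor and mark it used, or -1 if none is free. */
int
minor_alloc(struct minor_pool *p)
{
    int n;

    for (n = 0; n < SIM_NMINORS; n++) {
        if ((p->in_use & (1u << n)) == 0) {
            p->in_use |= (1u << n);
            return (n);
        }
    }
    return (-1);
}

/* Return a minor to the pool, as a close routine might. */
void
minor_free(struct minor_pool *p, int n)
{
    if (n >= 0 && n < SIM_NMINORS)
        p->in_use &= ~(1u << n);
}

/* What a cloning open might do: pick a new minor and rewrite *devp. */
int
sim_clone_open(struct minor_pool *p, sim_dev_t *devp)
{
    int n = minor_alloc(p);

    if (n < 0)
        return (-1);
    *devp = sim_makedevice(sim_getmajor(*devp), (uint32_t)n);
    return (0);
}
```

In a real driver the pool would live in driver-global state and be protected by a mutex, since open(9E) can run concurrently on several threads.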
54. Such a compiler might also determine that reordering the instructions would provide better performance, causing the device to be programmed in the incorrect order. If the csr field is declared volatile, the compiler is not allowed to do so. Following is an example of the second use of volatile. A busy flag is used to prevent a thread from continuing while the device is busy, and the flag is not protected by a lock: while (busy) { /* do something else */ } The testing thread will continue when another thread turns off the busy flag: busy = 0; However, since busy is accessed frequently in the testing thread, the compiler may optimize the test by placing the value of busy in a register, then testing the contents of the register without reading the value of busy in memory before every test. The testing thread would never see busy change, and the other thread would only change the value of busy in memory, resulting in deadlock. The busy flag should be declared volatile, forcing its value to be read before each test. Note: It would probably be preferable to use a condition variable and mutex (discussed under "Condition Variables" on page 79) instead of the busy flag in this example. It is also recommended that the volatile qualifier be used in such a way as to avoid the risk of accidental omission. For example, this code: struct device_reg { volatile u_char csr; volatile u_char data; }; struct device_reg *regp; is recommended over
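The recommended style, with volatile attached to the members rather than to each pointer, can be tried in an ordinary user program. The sketch below assumes a fake register block in normal memory; SIM_CSR_START, sim_start, and sim_wait_idle are hypothetical names, and uint8_t stands in for u_char.

```c
#include <stdint.h>

/* Members are volatile, so every pointer to the struct inherits it. */
struct device_reg {
    volatile uint8_t csr;
    volatile uint8_t data;
};

#define SIM_CSR_START 0x01      /* hypothetical "go" bit */

/*
 * Both stores are volatile accesses, so the compiler must perform
 * each one and keep them in program order: data first, then csr.
 */
void
sim_start(struct device_reg *regp, uint8_t value)
{
    regp->data = value;
    regp->csr = SIM_CSR_START;
}

/*
 * Poll until the start bit clears. Because csr is volatile, it is
 * re-read from memory on every iteration instead of being cached
 * in a register. Returns 0 when idle, -1 after max_polls attempts.
 */
int
sim_wait_idle(const struct device_reg *regp, int max_polls)
{
    while (max_polls-- > 0) {
        if ((regp->csr & SIM_CSR_START) == 0)
            return (0);
    }
    return (-1);
}
```

Because the qualifier is part of the structure definition, a pointer declared as plain struct device_reg * cannot accidentally lose it.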
55. Vectored interrupt handlers were called directly in response to a particular hardware interrupt, on the basis of the interrupt vector number assigned to the device. In SunOS 5.x, the interrupt handler model has been unified. The device driver registers an interrupt handler for each device instance, and the system either polls all the handlers for the currently active interrupt level, or calls that handler directly (if it is vectored). The driver no longer needs to care which type of interrupt mechanism is in use (in the handler). ddi_add_intr(9F) is used to register a handler with the system. It takes a driver-defined argument of type caddr_t to pass to the interrupt handler. The address of the state structure is a good choice. The handler can then cast the caddr_t to whatever was passed. See "Registering Interrupts" on page 111 and "Responsibilities of an Interrupt Handler" on page 113 for more information. In SunOS 4.x, to do a DMA transfer, the driver mapped a buffer into the DMA space, retrieved the DMA address and programmed the device, did the transfer, then freed the mapping. This was accomplished in a sequence like: 1. mb_mapalloc() - map buffer into DMA space. 2. MBI_ADDR() - retrieve address from returned cookie. 3. Program the device and start the DMA. 4. mb_mapfree() - free mapping when DMA is complete. The first three usually occurred in a start routine, and the last in the interrupt routine. The SunOS 5.x DMA model is similar, but it ha
56. a number of devices at the given interrupt priority level has issued the interrupt. 2. Inform the device that it is being serviced. This is a device-specific operation, but is required for the majority of devices. For example, SBus devices are required to interrupt until the driver tells them to stop. This guarantees that all SBus devices interrupting at the same priority level will be serviced. Most vectored devices, on the other hand, stop interrupting after the bus interrupt acknowledge cycle; however, their internal state still indicates that they have interrupted but have not been serviced yet. 3. Perform any I/O request-related processing. Devices interrupt for different reasons, such as transfer done or transfer error. This step may involve reading the device's data buffer, examining the device's error register, and setting the status field in a data structure accordingly. Interrupt dispatching and processing is relatively expensive. The following points apply to interrupt processing: Do only what absolutely requires interrupt context. Do any additional processing that could save another interrupt, for example, read the next data from the device. 4. Return DDI_INTR_CLAIMED. Code Example 6-2 Interrupt routine: static u_int xxintr(caddr_t arg) { struct xxstate *xsp = (struct xxstate *)arg; u_char status, temp; /* Claim or reject the interrupt. This example assumes
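The steps listed above (claim or reject, acknowledge, process, return) can be modeled in user space. This is a sketch under stated assumptions, not the manual's Code Example 6-2: the register bits SIM_INTERRUPTING and SIM_XFER_ERROR, the sim_state structure, and the SIM_INTR_* return values are hypothetical stand-ins for device-specific definitions and for DDI_INTR_CLAIMED / DDI_INTR_UNCLAIMED.

```c
#include <stdint.h>

#define SIM_INTR_UNCLAIMED  0   /* stand-in for DDI_INTR_UNCLAIMED */
#define SIM_INTR_CLAIMED    1   /* stand-in for DDI_INTR_CLAIMED */

#define SIM_INTERRUPTING    0x80    /* hypothetical csr bits */
#define SIM_XFER_ERROR      0x40

struct sim_state {
    volatile uint8_t csr;   /* simulated control/status register */
    int last_error;         /* 1 if the last transfer failed */
    unsigned claimed;       /* number of interrupts serviced */
};

unsigned
sim_intr(void *arg)
{
    struct sim_state *sp = arg;
    uint8_t status = sp->csr;

    /* Step 1: reject the interrupt if this device did not raise it. */
    if ((status & SIM_INTERRUPTING) == 0)
        return (SIM_INTR_UNCLAIMED);

    /* Step 2: tell the device it is being serviced (clear the bit). */
    sp->csr = (uint8_t)(status & ~SIM_INTERRUPTING);

    /* Step 3: minimal request processing, recording the outcome. */
    sp->last_error = (status & SIM_XFER_ERROR) ? 1 : 0;
    sp->claimed++;

    /* Step 4: report that the interrupt was handled. */
    return (SIM_INTR_CLAIMED);
}
```

A real handler would also hold the per-instance mutex around the register access, as the driver examples in the manual do.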
57. address pointed to by ap. See the manual page for a list of supported capabilities. whom indicates whether the capability applies only to the target at the specified SCSI address, or to all targets serviced by the host adapter. int scsi_ifsetcap(struct scsi_address *ap, char *cap, int value, int whom); scsi_ifsetcap(9F) sets the current value of the host adapter capability denoted by cap, for the host adapter servicing the target at the SCSI address pointed to by ap, to value. See the manual page for a list of supported capabilities. whom indicates whether the capability applies only to the target at the specified SCSI address, or to all targets serviced by the host adapter. struct scsi_pkt *scsi_init_pkt(struct scsi_address *ap, struct scsi_pkt *pktp, struct buf *bp, int cmdlen, int statuslen, int privatelen, int flags, int (*callback)(caddr_t), caddr_t arg); scsi_init_pkt(9F) requests the transport layer to allocate a command packet for commands and, possibly, data transfers. If pktp is NULL, a new scsi_pkt(9S) is allocated. If bp is non-NULL and contains a valid byte count, the buf(9S) structure is set up for DMA transfer. If bp was allocated by scsi_alloc_consistent_buf(9F), the PKT_CONSISTENT flag must be set. If privatelen is set, additional space is allocated for the pkt_private area of the scsi_pkt(9S) structure; otherwise pkt_private is a pointer that is typically used to sto
58. address less than 0x800000. Virtual addresses at or above 0x800000 are not set up by the PROM for use. When mapping the device to vme16, vme24, or the top half of the vme32 address space, after entering the PTE, the top five bits of the physical page number are zero because the Sun-4/110 physical address space is split, with 128 megabytes at the bottom and 128 megabytes at the top. Whenever the physical address goes above 128 megabytes, the high bit is sign-extended so that the address lies within the top 128 megabytes. Sign-extending the high bit into the next five bits should result in the previously calculated physical page number. In this example, instead of using 0xE000000 as the starting address, the value 0xE00000 could be used successfully. Mapping Commands: 1. Issue the PROM command that puts the CPU into supervisor data state: >sB 2. Calculate the PTE appropriate to the chosen physical address. 3. Map the virtual address to the device. The PROM p command will do this, given a virtual address: >p F32000 This command takes the virtual address 0xF32000 as its argument and displays the PTE for it. It also displays a prompt, which indicates that a new value may be typed in to replace the one displayed. Note that all virtual addresses within a page select the same PTE. Type any non-hexadecimal character to stop. 4. Repeat step 3 for each page to map. Reading and Writing
59. as a virtual CPU that schedules user thread execution. When a user thread issues a system call, the LWP running the thread calls into the kernel and remains bound to the thread at least until the system call completes. When an LWP is running in the kernel, executing a system call on behalf of a user thread, it runs one kernel thread. Each LWP is therefore associated with exactly one kernel thread. Kernel Threads: There are two types of kernel threads: those bound to an LWP and those not associated with an LWP. Threads not associated with LWPs are system threads, such as those created to handle hardware interrupts. For those threads bound to an LWP, there is one and only one kernel thread per LWP. On a multiprocessor system, several kernel threads can run simultaneously. Even on uniprocessors, running kernel threads can be preempted at any time to run other threads. Drivers are mainly concerned with kernel threads, as most device driver routines run as kernel threads. Figure 4-1 illustrates the relationship between threads and lightweight processes. [Figure 4-1: Threads and lightweight processes. Diagram labels: Process 1 through Process 4, user threads, LWPs, kernel threads, CPU hardware.] A multithreaded kernel requires programmers to consider two issues: locking primitives and thread synchroniz
60. at a lower priority than hardware interrupt handlers, so they can do more work without seriously impacting the performance of the system. Additionally, if the hardware interrupt handler is high-level, it is severely restricted in what it can do. In this case, it is a good idea to simply trigger a software interrupt in the high-level handler and put all possible processing in the lower-level software interrupt handler. Software interrupt handlers must not assume that they have work to do when they run, since (like hardware interrupt handlers) they can run because some other driver triggered a soft interrupt. For this reason, the driver must indicate to the soft interrupt handler that it should do work before triggering the soft interrupt. Registering Interrupts: Before a device can receive and service interrupts, it must register them with the system by calling ddi_add_intr(9F). This provides the system with a way to associate an interrupt handler with an interrupt specification. This interrupt handler is called when the device might have been responsible for the interrupt. It is the handler's responsibility to determine if it should handle the interrupt and, if so, to claim it. The following steps are usually performed in attach(9E): Test for high-level interrupts. Call ddi_intr_hilevel(9F) to find out if the interrupt specification maps to a high-level interrupt. If it does, one possibility is to pos
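The rule that a soft interrupt handler must be told it has work before it is triggered can be modeled with a simple flag. The sketch below is a user-space illustration with hypothetical names (sim_soft, sim_hi_intr, sim_soft_intr); the trigger_count field merely stands in for triggering the soft interrupt, and the locking a real driver would need around work_pending is omitted.

```c
/* Shared between the high-level handler and the soft handler. */
struct sim_soft {
    int work_pending;   /* set by the high-level handler */
    int trigger_count;  /* stands in for triggering the soft interrupt */
    int items_done;     /* work completed by the soft handler */
};

/* High-level handler: record the work first, then trigger. */
void
sim_hi_intr(struct sim_soft *s)
{
    s->work_pending = 1;
    s->trigger_count++;
}

/*
 * Soft handler: may run because another driver triggered a soft
 * interrupt, so it checks the flag before doing anything.
 * Returns 1 if it had work (claimed), 0 otherwise (unclaimed).
 */
int
sim_soft_intr(struct sim_soft *s)
{
    if (!s->work_pending)
        return (0);
    s->work_pending = 0;
    s->items_done++;
    return (1);
}
```

Setting the flag before triggering matters: if the order were reversed, the soft handler could run, find no work, and return before the flag was set.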
61. check for ioctl(2) returning a negative value for failure. strategy: SunOS 4.x: int xxstrategy(bp) struct buf *bp; SunOS 5.x: int xxstrategy(struct buf *bp); Retrieving the minor number from the b_dev field of the buf(9S) structure no longer works (or will work occasionally, and fail in new and interesting ways at other times). Use the b_edev field instead. If the driver used to allocate buffers uncached, it should now use ddi_dma_sync(9F) whenever a consistent view of the buffer is required. mmap: SunOS 4.x: int xxmmap(dev, off, prot) dev_t dev; off_t off; int prot; SunOS 5.x: int xxmmap(dev_t dev, off_t off, int prot); Building a page table entry manually is no longer allowed. The driver must use hat_getkpfnum(9F) to retrieve the PTE information from a virtual address. See "Mapping Device Memory" on page 161 for more information. If the driver used to check for root privileges using suser(), it should now use drv_priv(9F). Because there is no credential pointer passed to mmap(9E), the driver must use ddi_get_cred(9F) to retrieve the credential pointer. chpoll: chpoll(9E) is similar in operation to select(), but there are more conditions that can be examined. See "Multiplexing I/O on File Descriptors" on page 162 for details. SunOS 4.1.x to SunOS 5.4 Differences: This table compares device driver routines on SunOS 4.1.x versus SunOS 5.4. It is n
62. Routines: cmn_err; ptob; ddi_ptob; pullupmsg; put; putbq; putctl. Descriptions: limit transfer request size to system maximum; setup system DMA resources; retrieve DMA address; return the number of bytes in a message; error function returning ENXIO; prevent a queue from being scheduled; function returning zero; copy overlapping byte memory regions; reboot at fatal error; read a short value from a location; read a byte value from a location; read a long value from a location; limit transfer request size; request by a SCSI target driver to start a command; write a short value to a location; write a byte value to a location; write a long value to a location; display an error message or panic the system; convert priority level; send a signal to a process; convert size in pages to size in bytes; concatenate bytes in a message; call a STREAMS put procedure; place a message at the head of a queue; send a control message to a queue.
Table A-1 SunOS 4.1.x and SunOS 5.4 Kernel Support Routines (SunOS 4.1.x / SunOS 5.4: Description)
    putctl1 / putctl1: send a control message with one-byte parameter to a queue
    putnext / putnext: send a message to the next queue
    putq / putq: put a message on a queue
    qenable / qenable: enable a queue
    qreply / qreply: send a message on a stream in the reverse direction
    qsize / qsize: find the number of messages on a queue
    remintr / ddi_r
63. comes with the correct entry for serial port B, but one must be added for serial port A:
    debug:\
        :dv=/dev/term/a:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:
3. In a shell window on the host, run tip(1) and specify the name of the entry:
    test% tip debug
    connected
The shell window is now a tip window directed to the test machine. Setting Up the Test System: A quick way to set up the test machine is to unplug its keyboard before turning it on. It then automatically uses serial port A as the console. Another way to do this is to use boot PROM commands to make serial port A the console. 1. On the test machine, enter the boot PROM (ok prompt). Direct I/O to the serial line, indicating the correct serial port. In this example, the test machine is using serial port A, so the command is: ok ttya io Pressing Return in the tip window should get a boot PROM prompt. Caution: Do not use L1-A on the host machine to send a break to stop the test machine. This actually stops the host machine. To send a break to the test machine, type ~# in the tip window. Tilde commands such as this are recognized only if they are the first characters on a line, so press the Return key or Control-U first if there is no effect. 2. To make the test machine always come up with serial port A as the console, set the environment variables input-device and output-device: ok setenv inpu
64. command without disconnect privileges (FLAG_NODISCON) or to disable parity (FLAG_NOPARITY). See scsi_pkt(9S) for details. pkt_time is a timeout value (in seconds). If the command does not complete within this time, the host bus adapter calls the completion routine with pkt_reason set to CMD_TIMEOUT. The target driver should set this field to longer than the maximum time the command might take. If the timeout is zero, no timeout is requested. Timeout starts when the command is transmitted on the SCSI bus. pkt_scbp is a pointer to the SCSI Status completion block; this is filled in by the host bus adapter driver. pkt_cdbp is a pointer to the SCSI Command Descriptor Block, the actual command to be sent to the target device. The host bus adapter driver does not interpret this field. The target driver must fill it in with a command that the target device understands. pkt_resid is the residual of the operation. When allocating DMA resources for a command (scsi_init_pkt(9F)), pkt_resid indicates the number of bytes for which DMA resources could not be allocated due to DMA hardware scatter/gather or other device limitations. After command transport, pkt_resid indicates the number of data bytes not transferred; this is filled in by the host bus adapter driver before the completion routine is called. pkt_state indicates the state of the command. The host bus adapter driver fills in this field as the command progresses. One bit is set in this field
65. consider an open for reading invalid. credp is a pointer to a credential structure containing information about the caller, such as the user ID and group IDs. Drivers should not examine the structure directly, but should instead use drv_priv(9F) to check for the common case of root privileges. In this example, only root is allowed to open the device for writing. close: int xxclose(dev_t dev, int flag, int otyp, cred_t *credp); close(9E) should perform any cleanup necessary to finish using the minor device, and prepare the device (and driver) to be opened again. For example, the open routine might have been invoked with the exclusive access (FEXCL) flag. A call to close(9E) would allow further opens to continue. Other functions that close(9E) might perform are: Wait for I/O to drain from output buffers before returning. Rewind a tape (tape device). Hang up the phone line (modem device). I/O Request Handling: This section gives the details of I/O request processing: from the application to the kernel, the driver, the device, the interrupt handler, and back to the user. User Addresses: When a thread issues a write(2) system call, it passes the address of a buffer in user space: char buffer[] = "python"; count = write(fd, buffer, strlen(buffer) + 1); Vectored I/O: The system builds a uio(9S) structure to describe this transfer by allocating an iovec(9S) structure and setting t
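The idea of describing a user buffer by a base address and a length can be shown with the user-level struct iovec from <sys/uio.h>. This is an analogy, not the kernel's uio(9S)/iovec(9S) structures, which are not available to user programs; describe_buffer and total_len are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Describe one NUL-terminated buffer, including its terminator,
 * the way write(fd, buffer, strlen(buffer) + 1) would pass it.
 */
struct iovec
describe_buffer(char *buf)
{
    struct iovec iov;

    iov.iov_base = buf;
    iov.iov_len = strlen(buf) + 1;
    return (iov);
}

/* Sum the lengths of all vectors: the total bytes in the request. */
size_t
total_len(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    int i;

    for (i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    return (total);
}
```

For the "python" buffer in the excerpt, the single vector covers seven bytes: six characters plus the terminating NUL.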
66. declaration, including at least the types, before the function is called. Prototypes are provided for most DDI/DKI functions, so many potentially fatal errors are now caught at compile time. Header Files: Overview of Changes: For Solaris 2.x DDI/DKI compliance, drivers are allowed to include only the kernel header files listed in the synopsis sections of Section 9 of the man Pages(9): DDI and DKI Overview. All allowed kernel header files are now located in the /usr/include/sys directory. New header files all drivers must include are <sys/ddi.h> and <sys/sunddi.h>. These two headers must appear last in the list of kernel header include files. Autoconfiguration: Under SunOS 4.1.2 or later, the system initialized all the drivers in the system before starting init(8). The advent of loadable module technology allowed some device drivers to be added and removed manually at later times in the life of the system. SunOS 5.x extends this idea to make every driver loadable, and to allow the system to automatically configure itself continually in response to the needs of applications. This, plus the unification of the mb-style and Open Boot-style autoconfiguration, has meant some significant changes to the identify(9E), probe(9E), and attach(9E) routines, and has added detach(9E). Because all device drivers are loadable, the kernel no longer needs to be recompiled and relinked to add a d
67. describe the built-in restrictions of a DMA engine. These limits include: Limits on addresses the device can access. Maximum transfer count. Address alignment restrictions. To ensure that DMA resources allocated by the system can be accessed by the device's DMA engine, device drivers must inform the system of their DMA engine limitations using a ddi_dma_lim(9S) structure. The system may impose additional restrictions on the device attributes, but it never removes any of the driver-supplied restrictions. DMA Limits: All DMA resource allocation routines take a pointer to a DMA limit structure as an argument (see Code Example 7-1 on page 134). This structure is currently processor-architecture dependent. ddi_dma_lim_sparc: The SPARC DMA limit structure contains the following members:
    typedef struct ddi_dma_lim {
        u_long dlim_addr_lo;    /* lower bound of address range */
        u_long dlim_addr_hi;    /* inclusive upper bound of address range */
        u_int dlim_minxfer;     /* minimum effective DMA transfer size */
        u_int dlim_cntr_max;    /* inclusive upper bound of address register */
        u_int dlim_burstsizes;  /* bitmask-encoded DMA burst sizes */
        u_int dlim_dmaspeed;    /* average DMA data rate (KB/s) */
    } ddi_dma_lim_t;
dlim_addr_lo is the lowest address that the DMA engine can access. dlim_addr_hi is the highest address that the DMA engine can access. dlim_minxfer is the minimum effec
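How the two address bounds constrain a transfer can be illustrated with a small user-space check. This is a sketch, not what the system's DMA allocation routines actually do: sim_dma_lim and sim_dma_window_ok are hypothetical names, and only the dlim_addr_lo / dlim_addr_hi style bounds (with an inclusive upper bound, as in the excerpt) are modeled.

```c
#include <stdint.h>

/* Hypothetical, reduced model of a DMA limit structure. */
struct sim_dma_lim {
    uint32_t addr_lo;   /* lowest address the engine can reach */
    uint32_t addr_hi;   /* highest address, inclusive */
};

/*
 * Return 1 if the transfer [addr, addr + len - 1] lies entirely
 * inside the engine's address window, 0 otherwise. The comparison
 * is arranged so that addr + len cannot overflow.
 */
int
sim_dma_window_ok(const struct sim_dma_lim *lim, uint32_t addr,
    uint32_t len)
{
    if (len == 0)
        return (0);
    if (addr < lim->addr_lo || addr > lim->addr_hi)
        return (0);
    /* Bytes available from addr up to and including addr_hi. */
    if (len - 1 > lim->addr_hi - addr)
        return (0);
    return (1);
}
```

With a window of 0xff000000 through 0xffffffff, a 16-byte transfer starting at 0xfffffff0 just fits, while a 17-byte transfer from the same address does not.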
68. ddi_prop_remove(9F): Remove a property. ddi_prop_remove_all(9F): Remove all properties associated with a device. prop_op: To report the values of device properties to the system, drivers must fill in the entry point in the cb_ops(9S) structure with their own prop_op(9E) entry point or the ddi_prop_op(9F) routine. A prop_op(9E) routine is only necessary if more control is needed over property management. For example, if the value of a property changes frequently, it may be more efficient for the driver to maintain it locally, updating a variable representing the property value whenever it changes. If a caller requests the value of the property, the driver's prop_op(9E) modifies the property using ddi_prop_modify(9F) and then calls ddi_prop_op(9F) to retrieve it. Providing a prop_op(9E) entry point does not mean that the driver must manage all properties locally. If a property is modified dynamically, it should be maintained by the driver in prop_op(9E). If a property is static (set only once), it is easier for the driver to use ddi_prop_create(9F) to create it and allow ddi_prop_op(9F) to retrieve it. The prop_op(9E) entry point would just intercept some property requests, and pass all others to ddi_prop_op(9F). Here is the prop_op(9E) prototype: int xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op, int flags, char *name, caddr_t valuep, int *lengthp); This section d
69. disk sectors
    btop / btop, ddi_btop: convert size in bytes to size in pages (round down)
    btopr / btopr, ddi_btopr: convert size in bytes to size in pages (round up)
    bufcall / bufcall: call a function when a buffer becomes available
    bzero / bzero: zero out memory
    canput / canput: test for room in a message queue
    clrbuf / clrbuf: erase the contents of a buffer
    copyb / copyb: copy a message block
    copyin / ddi_copyin: copy data from a user program to a driver buffer
    copymsg / copymsg: copy a message
    copyout / ddi_copyout: copy data from a driver to a user program
    datamsg / datamsg: test whether a message is a data message
    delay / delay: delay execution for a specified number of clock ticks
    disksort / disksort: single direction elevator seek sort for buffers
    dupb / dupb: duplicate a message block descriptor
    dupmsg / dupmsg: duplicate a message
    enableok / enableok: reschedule a queue for service
    esballoc / esballoc: allocate a message block using caller-supplied buffer
    esbbcall / esbbcall: call function when buffer is available
    ffs / ddi_ffs: find first bit set in a long integer
    fls / ddi_fls: find last bit set in a long integer
    flushq / flushq: remove messages from a queue
    free_pktiopb freeb freemsg
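The difference between the round-down and round-up conversions in the table can be worked through with plain arithmetic. The sketch below is user-space code with hypothetical names (sim_btop, sim_btopr, sim_ptob); it assumes a 4096-byte page, whereas the real page size is platform dependent and is hidden behind the DDI routines.

```c
#include <stddef.h>

#define SIM_PAGESIZE 4096u      /* assumed page size for illustration */

/* Bytes to pages, rounding down (partial pages are dropped). */
size_t
sim_btop(size_t bytes)
{
    return (bytes / SIM_PAGESIZE);
}

/* Bytes to pages, rounding up (a partial page counts as one page). */
size_t
sim_btopr(size_t bytes)
{
    return (bytes / SIM_PAGESIZE + (bytes % SIM_PAGESIZE != 0));
}

/* Pages to bytes. */
size_t
sim_ptob(size_t pages)
{
    return (pages * SIM_PAGESIZE);
}
```

A buffer of 4097 bytes therefore spans one whole page by the round-down rule but needs two pages by the round-up rule, which is the count that matters when sizing a mapping.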
70. for each of the five following command states: STATE_GOT_BUS: Acquired the bus. STATE_GOT_TARGET: Selected the target. STATE_SENT_CMD: Sent the command. STATE_XFERRED_DATA: Transferred data (if appropriate). STATE_GOT_STATUS: Received status from the device. pkt_statistics contains transport-related statistics set by the host bus adapter driver. pkt_reason gives the reason the completion routine was called. The main function of the completion routine is to decode this field and take the appropriate action. If the command completed (in other words, if there were no transport errors), this field is set to CMD_CMPLT; other values in this field indicate an error. After a command completes, the target driver should examine the pkt_scbp field for a check condition status. See scsi_pkt(9S) for more information. State Structure: This section adds the following fields to the state structure. See "State Structure" on page 57 for more information. struct scsi_pkt *rqs; /* Request Sense packet */ struct buf *rqsbuf; /* buf for Request Sense */ struct scsi_pkt *pkt; /* packet for current command */ struct scsi_device *sdp; /* pointer to device's scsi_device(9S) structure */ rqs is a pointer to a SCSI Request Sense command scsi_pkt(9S) structure, allocated in the attach(9E) routine. This packet is preallocated because the Request Sense comm
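Decoding the state bits described above can be sketched in user space. The bit values below are assumptions chosen for illustration, as is the value of SIM_CMD_CMPLT; a real target driver takes STATE_* and CMD_CMPLT from the SCSI headers. The helper names sim_furthest_state and sim_completed_ok are hypothetical.

```c
/* Assumed bit assignments, one bit per command state. */
#define SIM_STATE_GOT_BUS       0x01u
#define SIM_STATE_GOT_TARGET    0x02u
#define SIM_STATE_SENT_CMD      0x04u
#define SIM_STATE_XFERRED_DATA  0x08u
#define SIM_STATE_GOT_STATUS    0x10u

#define SIM_CMD_CMPLT           0   /* assumed "no transport error" */

/*
 * Return the highest state bit set in state, i.e. how far the
 * command progressed, or 0 if it never acquired the bus.
 */
unsigned
sim_furthest_state(unsigned state)
{
    unsigned bit;

    for (bit = SIM_STATE_GOT_STATUS; bit != 0; bit >>= 1) {
        if (state & bit)
            return (bit);
    }
    return (0);
}

/*
 * A command is worth examining for a check condition only if the
 * transport reported completion and status was actually received.
 */
int
sim_completed_ok(int reason, unsigned state)
{
    return (reason == SIM_CMD_CMPLT &&
        (state & SIM_STATE_GOT_STATUS) != 0);
}
```

A completion routine could use the furthest state reached to decide how to report a failed command, for example distinguishing a target that never responded from one that stalled during data transfer.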
71. for each allocation operation that may have been performed. Registering Interrupts: In the call to ddi_add_intr(9F), inumber specifies which of several possible interrupt specifications is to be handled by intr_handler. For example, if the device interrupts at only one level, pass 0 for inumber. The interrupt specifications being referred to by inumber are described by the interrupts property (see driver.conf(4), isa(4), eisa(4), mca(4), sysbus(4), vme(4), and sbus(4)). intr_handler is a pointer to a function, in this case xxintr(), to be called when the device issues the specified interrupt. intr_handler_arg is an argument of type caddr_t to be passed to intr_handler. intr_handler_arg may be a pointer to a data structure representing the device instance that issued the interrupt. ddi_add_intr(9F) returns an interrupt block cookie in xsp->iblock_cookie for use in calls to mutex_init(9F); it also returns a device cookie in xsp->idevice_cookie for use with devices having programmable bus-interrupt levels. The device cookie contains the following fields: u_short idev_vector; u_short idev_priority; The idev_priority field of the returned structure contains the bus interrupt priority level, and the idev_vector field contains the vector number for vectored bus architectures such as VMEbus. Note: There is a potential race condition in attach(9E). The interrupt routine is eligible to be called as soon as d
72. for the device-specific driver and separates device debugging from driver debugging. Depending on the driver and device, mmap(2), or read(2) and write(2) (possibly with lseek(2)), may be used. The mem and kmem drivers access physical memory and kernel memory, respectively. /dev/mem is used to provide a core memory image to debuggers such as adb(1). To examine kernel memory from a user program, consider using the libkvm routines (see kvm_open(3K)) for a slightly more portable approach. Be aware that kernel structures change frequently, so any code that examines the kernel is likely to need changes in future releases or on other platforms. For devices, there are two bus-space drivers: vmemem and sbusmem. These allow access to devices on the bus without a driver. After verifying that the device is accessible through the PROM (see "The PROM on SPARC Machines" on page 26), these drivers can verify that it is accessible by SunOS. However, special handling requiring knowledge of the device (such as interrupt handling and DMA) cannot be performed by these drivers. Be careful not to compromise system security (such as by giving non-root users access to the special files for these drivers) or system integrity (by accessing other devices). This program opens the sbusmem driver for the slot the bwtwo is in, and performs the same operations that were done previously with the PROM (see "Reading and Writing" on page 33). Code Example 13
73. SunOS 4.1.x column: get_pktiopb; geterror; getlongprop; getprop; getproplen; getq; gsignal; hat_getpkfnum. SunOS 5.4 column: scsi_free_consistent_buf; freeb; freemsg; scsi_alloc_consistent_buf; geterror; ddi_getlongprop; ddi_getprop; ddi_getproplen; getq; hat_getkpfnum. Description column: free a SCSI packet in the iopb map; free a message block; free all message blocks in a message; allocate a SCSI packet in the iopb map; get buffer's error number; get arbitrary size property information; get boolean and integer property information; get property information length; get the next message from a queue; send signal to process group; get page frame number for kernel address.
    index / strchr: return pointer to first occurrence of character in string
    insq / insq: insert a message into a queue
SunOS 4.1.x column: kmem_alloc; kmem_free; kmem_zalloc; linkb; log; machineid; major; makecom_g0; makecom_g0_s; makecom_g1; makecom_g5; mapin; map_regs; mapout; unmap_regs; max; mb_mapalloc; mb_mapfree; mballoc; mbrelse; mbsetup; min; minor. SunOS 5.4 column: kmem_alloc; kmem_free; kmem_zalloc; linkb; strlog; getmajor; makecom_g0; makecom_g0_s; makecom_g1; makecom_g5; ddi_map_regs; ddi_unmap_regs; max
74. handle to store the returned handle. Handling Resource Allocation Failures: The resource allocation routines provide the driver several options when handling allocation failures. The waitfp argument indicates whether the allocation routines will block, return immediately, or schedule a callback.
    waitfp: Indicated Action
    DDI_DMA_DONTWAIT: Driver does not wish to wait for resources to become available.
    DDI_DMA_SLEEP: Driver is willing to wait indefinitely for resources to become available.
    Other values: The address of a function to be called when resources are likely to be available.
State Structure: This section adds the following fields to the state structure. See "State Structure" on page 57 for more information. struct buf *bp; /* current transfer */ ddi_dma_handle_t handle; struct xxiopb *iopb; /* array for I/O Parameter Blocks */ ddi_dma_handle_t iopb_handle; Device Register Structure: Devices that do DMA have more registers than have been used in previous examples. This section adds the following fields to the device register structure to support DMA-capable device examples: volatile caddr_t dma_addr; /* starting address for DMA */ volatile u_int dma_size; /* amount of data to transfer */ volatile caddr_t iopb_addr; /* When written, informs device of the next command's parameter block address. When read after an interrupt, contains the a
75. id. If the number of clock ticks originally specified to timeout(9F) has not elapsed, the callback function will not be called.

These interfaces manage device interrupts and software interrupts. The basic model is to register with the system an interrupt handling function to be called when a device interrupts or a software interrupt is triggered.

Writing Device Drivers, August 1994

    int ddi_add_intr(dev_info_t *dip, u_int inumber,
        ddi_iblock_cookie_t *iblock_cookiep,
        ddi_idevice_cookie_t *idevice_cookiep,
        u_int (*int_handler)(caddr_t), caddr_t int_handler_arg);

ddi_add_intr(9F) tells the system to call the function pointed to by int_handler when the device specified by dip issues the interrupt identified by inumber. ddi_add_intr(9F) passes back an interrupt block cookie in the location pointed to by iblock_cookiep, and an interrupt device cookie in the location pointed to by idevice_cookiep. The interrupt block cookie is used to initialize mutual exclusion locks (mutexes) and other synchronization variables. The interrupt device cookie is used to program the level at which the device interrupts, for those devices that support such programming.

    void ddi_remove_intr(dev_info_t *dip, u_int inumber,
        ddi_iblock_cookie_t iblock_cookie);

ddi_remove_intr(9F) tells the system to stop calling the interrupt handler registered for the interrupt inumber on the device identified by dip. iblock_cookie is the interrupt block cookie tha
76. interrupting Device

The interrupt routine is very similar to the synchronous version, with the addition of the call to xxstart() and the removal of the call to cv_signal(9F).

Code Example 9-8 Asynchronous block driver interrupt routine

    static u_int
    xxintr(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;
        struct buf *bp;
        u_char temp, status;

        mutex_enter(&xsp->mu);
        status = xsp->regp->csr;
        if (!(status & INTERRUPTING)) {
            mutex_exit(&xsp->mu);
            return (DDI_INTR_UNCLAIMED);
        }
        /* Get the buf responsible for this interrupt */
        bp = xsp->bp;
        xsp->bp = NULL;
        /*
         * This example is for a simple device which either
         * succeeds or fails the data transfer, indicated in the
         * command/status register.
         */
        if (status & DEVICE_ERROR) {
            /* failure */
            bp->b_resid = bp->b_bcount;
            bioerror(bp, EIO);
        } else {
            /* success */
            bp->b_resid = 0;
        }
        xsp->regp->csr = CLEAR_INTERRUPT;
        /* Read the csr to flush any hardware store buffers */
        temp = xsp->regp->csr;
        /* The transfer has finished, successfully or not */
        biodone(bp);
        /*
         * release any resources used in the transfer, such as
         * DMA resources (ddi_dma_free(9F))
         */
        /* Let the next I/O thread have access to the device */
        xsp->busy = 0;
        mutex_exit(&xsp->mu);
        (void) xxstart((caddr_t)xsp);
        return (DDI_INTR_CLAIMED);
    }

Miscellaneous Entry Po
77. [Figure 4-3: SunOS 5.x on a multiprocessor]

In Figure 4-3, CPU1 and CPU3 are executing kernel code simultaneously.

In traditional UNIX systems, any section of kernel code runs until it explicitly gives up the processor by calling sleep() or is interrupted by hardware. This is not true in SunOS 5.x. A kernel thread can be preempted at any time to run another thread. Since all kernel threads share kernel address space, and often need to read and modify the same data, the kernel provides a number of locking primitives to prevent threads from corrupting shared data. These mechanisms include mutual exclusion locks, readers/writer locks, and semaphores.

Storage Classes of Driver Data

The storage class of data is a guide to whether the driver may need to take explicit steps to control access to the data.

Automatic (Stack) Data
Since every thread has a private stack, drivers never need to lock automatic variables.

Global and Static Data
Global and static data can be shared by any number of threads in the driver; the driver may need to lock this type of data at times.

Kernel Heap Data
Kernel heap data, such as data allocated by kmem_alloc(9F), may be shared by any number of threads in the driver. If this data is shared, the driver may need to protect it at times.

State Structure

This section adds the following field t
78. maxprot, u_int flags, cred_t *credp, struct ddi_mapdev_ctl *ctl, ddi_mapdev_handle_t *handle, void *devprivate);

ddi_mapdev(9F) sets up user mappings to device space in the same manner as ddi_segmap(9F). However, unlike mappings created with ddi_segmap(9F), mappings created with ddi_mapdev(9F) have a set of driver entry points and a mapping handle associated with them. The driver is notified via these entry points in response to user events on the mappings.

    int ddi_mapdev_intercept(ddi_mapdev_handle_t handle, off_t offset, off_t len);
    int ddi_mapdev_nointercept(ddi_mapdev_handle_t handle, off_t offset, off_t len);

ddi_mapdev_intercept(9F) and ddi_mapdev_nointercept(9F) control whether or not user accesses to the device mappings created by ddi_mapdev(9F) in the specified range will generate an access event notification to the device driver. ddi_mapdev_intercept(9F) tells the system to intercept mapping accesses and invalidates the mapping translations. ddi_mapdev_nointercept(9F) prevents the system from intercepting mapping accesses and validates the mapping translations.

I/O Port Access

These interfaces support access to device registers from the device driver.

    unsigned char inb(int port);
    unsigned short inw(int port);
    unsigned long inl(int port);
    void repinsb(int port, unsigned char *addr, int count);
    void repinsw(int port, unsigned short *addr, int count);
    void repins
79. name pointed to by name and the value pointed to by valuep.

    int ddi_prop_modify(dev_t dev, dev_info_t *dip, int flags, char *name, caddr_t valuep, int length);

ddi_prop_modify(9F) changes the value of the property identified by name to the value pointed to by valuep.

    int ddi_prop_remove(dev_t dev, dev_info_t *dip, char *name);

ddi_prop_remove(9F) frees the resources associated with the property identified by name.

    void ddi_prop_remove_all(dev_info_t *dip);

ddi_prop_remove_all(9F) frees the resources associated with all properties belonging to dip. ddi_prop_remove_all(9F) should be called in the detach(9E) entry point if the driver defines properties.

    int ddi_prop_undefine(dev_t dev, dev_info_t *dip, int flags, char *name);

ddi_prop_undefine(9F) marks the value of the property identified by name as temporarily undefined. The property continues to exist, however, and may be redefined later using ddi_prop_modify(9F).

    int ddi_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op, int flags, char *name, caddr_t valuep, int *lengthp);

ddi_prop_op(9F) is the generic interface for retrieving properties. ddi_prop_op(9F) should be used as the prop_op(9E) entry in the cb_ops(9S) structure if the driver does not have a prop_op(9E) routine. See "Properties" on page 59 for more information.

    int ddi_getprop(dev_t dev, dev_info_t *dip, int flags, char *name, int defvalue
80. need to be declared as constants. If the parameter is a pointer, though, it can be declared to point to a constant object:

    int strlen(const char *s);

Any attempt to change the string by strlen() is an error, and the compiler will now catch it.

volatile

The correct use of volatile is necessary to prevent elusive bugs. It instructs the compiler to use exact semantics for the declared objects; in particular, the compiler must not optimize away or reorder accesses to the object. There are two instances where device drivers must use the volatile qualifier:

1. When data refers to an external hardware device register (memory that has side effects other than just storage).
2. When data refers to global memory that is accessible by more than one thread, is not protected by locks, and therefore relies on the sequencing of memory accesses.

In general, drivers should not qualify a variable as volatile if it is merely accessible by more than one thread and protected from conflicting access by synchronization routines.

The following is an example of the first use. There are two writes to a device required to begin a transfer:

    struct device_reg *regp;

    regp->csr = ENABLE_INTERRUPTS;
    regp->csr = START_TRANSFER;

A highly optimizing compiler may determine that the value of regp->csr is not used between the first and the second assignment, and could remove the first assignment
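The first use of volatile described above can be sketched in user space with a simulated register block. This is only an illustration under stated assumptions: the bit values of ENABLE_INTERRUPTS and START_TRANSFER and the helper name start_transfer() are hypothetical, and a real driver would obtain regp by mapping the device registers rather than pointing at ordinary memory.

```c
#include <stdint.h>

/* Hypothetical bit values; the text names these constants but not their values. */
#define ENABLE_INTERRUPTS 0x01u
#define START_TRANSFER    0x02u

/* Simulated register block. In a driver this would overlay mapped device registers. */
struct device_reg {
    volatile uint8_t csr;   /* command/status register */
};

/*
 * Issue the two writes required to begin a transfer.
 * Because csr is volatile, the compiler must emit both stores,
 * in this order, even though the first value is never read back.
 */
static void start_transfer(struct device_reg *regp)
{
    regp->csr = ENABLE_INTERRUPTS;
    regp->csr = START_TRANSFER;
}
```

With the volatile qualifier removed from csr, an optimizer would be free to drop the first store, which is harmless for plain memory but wrong for a register whose writes have side effects.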
81. overlays a larger VME space, it steals memory from the larger space and is considered by the MMU to be part of the larger address space. There is no way to physically access VMEbus addresses above 0xFF000000 in 32-bit VMEbus space or above 0x00FF0000 in 24-bit VMEbus space. Figure 2-1 illustrates the overlaying of VMEbus address spaces.

[Figure 2-1: Sun-4 architecture VMEbus address spaces (vme16d16, vme24d16, vme32d16, vme16d32, vme24d32, vme32d32)]

Caution: There are restrictions on device addressing. The lower ranges of the 32-bit and 24-bit VME space are reserved for DMA. For example, on the Sun-4 architecture, devices must not be present in the low megabyte of VME address space or the system will not boot. In addition, there may be devices on the bus with addresses that conflict. These can be determined by examining the hardware configuration files.

Hardware Configuration Files

Most VME devices require hardware configuration files to inform the system that the device hardware may be present. The configuration file must specify the device addresses on the VMEbus and any interrupt capabilities that the device has. Configuration files for VMEbus devices should identify th
82. registers complete at the proper time.

For example, when acknowledging an interrupt, the driver usually sets or clears a bit in a device control register. The driver must ensure that the write to the control register has reached the device before the interrupt handler returns. Similarly, if the device requires a delay (the driver busy-waits) after writing a command to the control register, the driver must ensure that the write has reached the device before delaying.

If the device registers can be read without undesirable side effects, verification of a write can be as simple as reading the register immediately after writing to it. If that particular register cannot be read without undesirable side effects, another device register in the same register set can be used (see ddi_map_regs(9F)).

If no device register in the set can be read without undesirable side effects, one of the ddi_poke(9F) routines may be used as a last resort to write to the registers. When these routines return, they guarantee that the write has reached the device.

Note: Future hardware platform implementations may not permit the ddi_poke(9F) routines to guarantee that a write has reached a device. Drivers should avoid the use of ddi_poke(9F) for this purpose whenever possible.

SPARC Memory Model

The SPARC memory model defines the semantics of memory operations such as load and store, and specifies how the order
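The write-then-read-back technique described above can be sketched as follows. This is a user-space illustration under stated assumptions, not driver code: the register layout, the CLEAR_INTERRUPT value and the helper name ack_interrupt() are hypothetical, and the register is assumed to be readable without side effects.

```c
#include <stdint.h>

#define CLEAR_INTERRUPT 0x04u   /* hypothetical bit value */

/* Simulated register set; csr is assumed safe to read back. */
struct device_reg {
    volatile uint8_t csr;   /* control/status register */
};

/*
 * Acknowledge an interrupt, then read the same register back.
 * On real hardware the read cannot complete until the preceding
 * write has left any store buffers, so the write has reached the
 * device by the time this function returns.
 * Returns the value read back (on real hardware this may differ
 * from the value written).
 */
static uint8_t ack_interrupt(struct device_reg *regp)
{
    regp->csr = CLEAR_INTERRUPT;
    return regp->csr;
}
```

An interrupt handler would call ack_interrupt() just before returning, and may ignore the returned value if only the flushing effect is wanted.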
83. structure; xxcallback is the private command completion routine */
        rqpkt->pkt_comp = xxcallback;
        rqpkt->pkt_time = 30;       /* 30 second command timeout */
        rqpkt->pkt_flags |= FLAG_SENSING;
        xsp->rqs = rqpkt;
        xsp->rqsbuf = bp;
        /* create minor nodes, report device, and do any other initialization */
        xsp->open = 0;
        return (DDI_SUCCESS);
    failed:
        if (bp)
            scsi_free_consistent_buf(bp);
        if (rqpkt)
            scsi_destroy_pkt(rqpkt);
        sdp->sd_private = (caddr_t)NULL;
        sdp->sd_sense = NULL;
        scsi_unprobe(sdp);
        /* free any other resources, such as the state structure */
        return (DDI_FAILURE);
    }

detach()

The detach(9E) entry point is the inverse of attach(9E); it must free all resources that were allocated in attach(9E). If successful, the detach should call scsi_unprobe().

Code Example 10-3 SCSI target driver detach(9E) routine

    static int
    xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
    {
        struct xxstate *xsp;

        /*
         * normal detach(9E) operations, such as getting a
         * pointer to the state structure
         */
        scsi_free_consistent_buf(xsp->rqsbuf);
        scsi_destroy_pkt(xsp->rqs);
        xsp->sdp->sd_private = (caddr_t)NULL;
        xsp->sdp->sd_sense = NULL;
        scsi_unprobe(xsp->sdp);
        /* remove minor nodes, free resources such as the state structure */
        return (DDI_SUCCESS);
    }

getinfo()

The getinfo(9E) routine for SCSI target drivers is much the same as for other d
84. system through strategy(9E). These buffers are already locked by the file system.

Kernel memory allocated within the device driver, such as that allocated by ddi_mem_alloc(9F) or ddi_iopb_alloc(9F).

For other objects, such as buffers from user space, physio(9F) must be used to lock down the objects. This is usually performed in the read(9E) or write(9E) routines of a character device driver. See "DMA Transfers" on page 158 for an example.

Allocating DMA Resources

Two interfaces are recommended for allocating DMA resources:

ddi_dma_buf_setup(9F)     Recommended for use with buffer structures.
ddi_dma_addr_setup(9F)    Recommended for use with virtual addresses.

Table 7-1 lists the appropriate DMA resource allocation interfaces for different classes of DMA objects.

Table 7-1 DMA Resource Allocation Interfaces

Type of Object                                                                       Resource Allocation Interface
Memory allocated within the driver using ddi_mem_alloc(9F) or ddi_iopb_alloc(9F)     ddi_dma_addr_setup(9F)
Requests from the file system through strategy(9E)                                   ddi_dma_buf_setup(9F)
Memory in user space that has been locked down using physio(9F)                      ddi_dma_buf_setup(9F)

All resource allocation routines return a DMA handle for use in subsequent calls to DMA-related functions.

DMA resources are usually allocated in the driver's xxstart() routine, if it has one. See "Asynchronous Data Transfers" on page 184 for discussion of xxstart().

    int ddi_dm
85. the FKIOCTL flag is set, it indicates that buf is a kernel address, and ddi_copyout(9F) behaves like bcopy(9F). Otherwise buf is interpreted as a user buffer address, and ddi_copyout(9F) behaves like copyout(9F). The value of the flags argument to ddi_copyout(9F) should be passed through directly from the mode argument of ioctl(9E), untranslated.

These interfaces verify the credentials of application threads making system calls into drivers. They are sometimes used in the open(9E) entry point to restrict access to a device, though this is usually achieved with the permissions on the special files in the file system.

    int drv_priv(cred_t *credp);

drv_priv(9F) returns zero if the credential structure pointed to by credp is that of a privileged thread. It returns EPERM otherwise. Only use drv_priv(9F) in place of calls to the obsolete suser() function and when making explicit checks of a calling thread's UID.

Device Configuration

These interfaces are used in setting up a driver and preparing it for use. Some of these routines handle the dynamic loading of device driver modules into the kernel, and some manage the minor device nodes in /devices that are the interface to a device for application programs. All of these routines are intended to be called in the driver's _init(9E), _fini(9E), _info(9E), attach(9E), detach(9E), and probe(9E) entry points.

    int ddi_create_minor_node(dev_info_t
86. the PROM User's Guide.

Mapping the Device

To test the device, it must be mapped into memory. The PROM can then be used to verify proper operation by using data transfer commands to transfer bytes, words, and long words. If the device can be made to operate from the PROM, even in a limited way, the driver should also be able to operate the device.

To set up the device for initial testing:

1. Select an appropriate virtual address for the testing of the device.
2. Determine the physical address of the device, as well as the address space that it occupies.
3. Use the monitor to map the system's virtual address to the device's physical address.

Selecting a Virtual Address

When mapping a virtual address to a physical address, the MMU is actually mapping to a page of physical memory and an offset within that page. The low-order bits of a virtual address (those that specify the offset) are not mapped. See Figure 2-2 on page 35. The mapping mechanism is essentially the same for all systems, though the details of the address size and page mapping differ.

[Figure 2-2: Sun-4 architecture address mapping. The MMU maps the high 19 bits of a 32-bit virtual address to the high 19 bits of a 32-bit physical address; the low 13 bits are passed through.]

The easiest way to select a virtual address for testing is to use one between 0x4000 and 0x100000. Addresses in this range are unused by the PROM in Sun-4 architecture machines
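The page-and-offset split described above can be worked through with a little arithmetic. The sketch below assumes the 19-bit page number and 13-bit offset shown in Figure 2-2; the function names and the example physical page number are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT       13u                        /* low 13 bits are the offset */
#define PAGE_OFFSET_MASK ((1u << PAGE_SHIFT) - 1u)  /* 0x1FFF */

/* High 19 bits: the part of the address that the MMU maps. */
static uint32_t page_number(uint32_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Low 13 bits: the offset within the page, which is not mapped. */
static uint32_t page_offset(uint32_t addr)
{
    return addr & PAGE_OFFSET_MASK;
}

/* Combine a physical page number with the unchanged offset of vaddr. */
static uint32_t translate(uint32_t vaddr, uint32_t phys_page)
{
    return (phys_page << PAGE_SHIFT) | page_offset(vaddr);
}
```

For example, virtual address 0x4123 lies in virtual page 2 at offset 0x123; if that page were mapped to physical page 0x7C000, the resulting physical address would be 0xF8000123.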
87. the completion function which was specified in the SCSI packet. At this time, the host bus adapter driver is no longer responsible for the packet, and the target driver has regained ownership of the packet.

5. The SCSI packet's completion routine analyzes the returned information and determines whether the SCSI operation was successful. If a failure has occurred, the target driver may retry the command by calling scsi_transport(9F) again. If the host bus adapter driver does not support auto request sense, the target driver must submit a request sense packet in order to retrieve the sense data in the event of a check condition.

6. If the command either completed successfully or cannot be retried, the target driver calls scsi_destroy_pkt(9F), which synchronizes the data and frees the packet. If the target driver needs to access the data before freeing the packet, it may call scsi_sync_pkt(9F).

7. Finally, the target driver notifies the application program that originally requested the read or write that the transaction is complete, either by returning from the read(9E) entry point in the driver (for a character device) or indirectly through biodone(9F).

The SCSA allows the execution of many such operations, both overlapped and queued, at various points in the process. The model places the management of system resources on the host bus adapter driver. The software interface allows the execution of target driver functions on host bus adapter driver
88. the parent bus nexus page size, rounded up to the nearest page.

    unsigned long ddi_ptob(dev_info_t *dip, unsigned long pages);

ddi_ptob(9F) converts a size expressed in terms of the parent bus nexus page size to a size expressed in bytes.

    int ddi_ffs(long mask);

ddi_ffs(9F) returns the number of the first (least significant) bit set in mask.

    int ddi_fls(long mask);

ddi_fls(9F) returns the number of the last (most significant) bit set in mask.

    caddr_t ddi_get_driver_private(dev_info_t *dip);

ddi_get_driver_private(9F) returns a pointer to the data stored in the driver-private area of the dev_info node identified by dip.

    void ddi_set_driver_private(dev_info_t *dip, caddr_t data);

ddi_set_driver_private(9F) sets the driver-private data of the dev_info node identified by dip to the value data.

    int ddi_peekc(dev_info_t *dip, char *addr, char *valuep);

ddi_peekc(9F) reads a character from the address addr to the location pointed to by valuep.

    int ddi_peeks(dev_info_t *dip, short *addr, short *valuep);

ddi_peeks(9F) reads a short integer from the address addr to the location pointed to by valuep.

    int ddi_peekl(dev_info_t *dip, long *addr, long *valuep);

ddi_peekl(9F) reads a long integer from the address addr to the location pointed to by valuep.

    int ddi_peekd(dev_info_t *dip, longlong_t *addr, longlong_t *valuep);

ddi_peekd(9F) reads a double long integer from the
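The bit-search and page-size conversions described above can be illustrated with portable user-space equivalents. These are not the DDI routines themselves: the function names are hypothetical, the page size is passed explicitly instead of being derived from a dev_info node, and the bit numbering is assumed to be 1-based with 0 meaning "no bit set", as in ffs(3C).

```c
/* Position (1-based) of the least significant set bit; 0 if mask is 0. */
static int first_bit_set(unsigned long mask)
{
    int pos = 1;

    if (mask == 0)
        return 0;
    while ((mask & 1ul) == 0) {
        mask >>= 1;
        pos++;
    }
    return pos;
}

/* Position (1-based) of the most significant set bit; 0 if mask is 0. */
static int last_bit_set(unsigned long mask)
{
    int pos = 0;

    while (mask != 0) {
        mask >>= 1;
        pos++;
    }
    return pos;
}

/* Convert a byte count to pages, rounding up to the nearest page. */
static unsigned long bytes_to_pages_roundup(unsigned long bytes,
    unsigned long pagesize)
{
    return (bytes + pagesize - 1) / pagesize;
}

/* Convert a page count back to bytes. */
static unsigned long pages_to_bytes(unsigned long pages,
    unsigned long pagesize)
{
    return pages * pagesize;
}
```

For the mask 0x18 (bits 4 and 5 set), first_bit_set() gives 4 and last_bit_set() gives 5; with an 8192-byte page, 8193 bytes round up to 2 pages.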
89. to a driver debugging and test machine using tip(1). This allows a window on the host system, called a tip window, to be used as the console of the test machine. See tip(1) for additional information.

Note: A second machine is not required to debug a SunOS device driver. It is only required for the use of tip(1).

Using a tip window is very helpful:

It lets the window system assist in interactions with the boot PROM or kadb. For example, the window can keep a log of the session, which is very handy if the driver crashes the test system.

It allows the test machine to be remote. It is reached by logging into a host machine (often called a tip host) and using tip(1) to connect to the test machine.

Setting Up the Host System

A simple setup for connecting serial port A on the host (running Solaris 2.x) to serial port A on the test machine (a SPARC system with an Open Boot PROM) is:

1. Connect the host system to the test machine using either serial port (the example in this section uses port A). This connection must be made with a null modem cable, which connects the signal Receive to Transmit and Ground to Ground. This cable can be constructed by the developer, or null modem adapters can be found at electronics stores.

2. On the host system, make an entry in /etc/remote for the connection if it is not already there (see remote(4)). The terminal entry must match the serial port being used. Solaris 2.x
90. to configure this channel in a cascade mode such that the DMA engine will not interfere with the transfer.

The platform that the device operates on may provide one of two types of memory access: Direct Memory Access (DMA) or Direct Virtual Memory Access (DVMA).

On platforms that support DMA, the device is provided with a physical address by the system in order to perform transfers. In this case, one logical transfer may actually consist of a number of physically discontiguous transfers. An example of this occurs when an application transfers a buffer that spans several contiguous virtual pages that map to physically discontiguous pages. In order to deal with the discontiguous memory, devices for these platforms usually have some kind of scatter/gather DMA capability. Typically, the system that supports x86 platforms provides physical addresses for direct memory transfers.

On platforms that support DVMA, the device is provided with a virtual address by the system in order to perform transfers. In this case, the underlying platform provides some form of MMU which translates device accesses to these virtual addresses into the proper physical addresses. The device transfers to and from a contiguous virtual image that may be mapped to discontiguous physical pages. Devices that operate on these platforms do not need scatter/gather DMA capability. Typically, the system which supports SPARC platforms provi
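The way one logical transfer breaks into physically discontiguous pieces on a DMA platform can be sketched in user space. Everything here is an assumption made for illustration: the 4096-byte page size, the sg_entry structure, the page_map array standing in for the system's virtual-to-physical translation, and the function name build_sg_list(). It is not a DDI interface.

```c
#include <stdint.h>

#define PAGESIZE 4096u   /* assumed page size, for illustration only */

/* One element of a scatter/gather list. */
struct sg_entry {
    uint32_t paddr;   /* physical start address of the segment */
    uint32_t len;     /* segment length in bytes */
};

/*
 * Split the virtually contiguous range [vaddr, vaddr + len) into
 * physically contiguous segments. page_map[i] is the (hypothetical)
 * physical page number backing virtual page i. Pages that happen to
 * be physically adjacent are merged into one segment.
 * Returns the number of entries written, or -1 if more than max
 * entries would be needed.
 */
static int build_sg_list(uint32_t vaddr, uint32_t len,
    const uint32_t *page_map, struct sg_entry *out, int max)
{
    int n = 0;

    while (len > 0) {
        uint32_t off = vaddr % PAGESIZE;
        uint32_t chunk = PAGESIZE - off;
        uint32_t paddr;

        if (chunk > len)
            chunk = len;
        paddr = page_map[vaddr / PAGESIZE] * PAGESIZE + off;

        if (n > 0 && out[n - 1].paddr + out[n - 1].len == paddr) {
            out[n - 1].len += chunk;        /* extends the previous segment */
        } else {
            if (n == max)
                return -1;
            out[n].paddr = paddr;
            out[n].len = chunk;
            n++;
        }
        vaddr += chunk;
        len -= chunk;
    }
    return n;
}
```

A buffer covering three virtual pages backed by physical pages 7, 8, and 3 yields two segments: pages 7 and 8 merge because they are adjacent, while page 3 needs its own entry. On a DVMA platform the device would instead be handed a single contiguous virtual range, and no such list would be needed.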
91. to the device. Since the page daemon has no relation to the current user thread, strategy(9E) has kernel context in this case.

Interrupt Context

Interrupt context is a more restrictive form of kernel context. Driver interrupt routines operate in interrupt context and have an interrupt level associated with them. See Chapter 6, "Interrupt Handlers," for more information.

High-level Interrupt Context

High-level interrupt context is a more restricted form of interrupt context. If ddi_intr_hilevel(9F) indicates that an interrupt is high-level, driver interrupt routines added for that interrupt with ddi_add_intr(9F) run in high-level interrupt context. See "Handling High-Level Interrupts" on page 113 for more information.

Device drivers do not usually print messages. Instead, the entry points should return error codes so that the application can determine how to handle the error. If the driver really needs to print a message, it can use cmn_err(9F) to do so. This is similar to the C function printf(3S), but only prints to the console, to the message buffer displayed by dmesg(1M), or both.

    void cmn_err(int level, char *format, ...);

format is similar to the printf(3S) format string, with the addition of the format %b, which prints bit fields. level indicates what label will be printed:

CE_NOTE     NOTICE: format\n
CE_WARN     WARNING: format\n
CE_CONT     format
CE_PANIC    panic: format\n
92. update the CPU cache with the new data. Similarly, a synchronization step is required if data modified by the CPU is to be accessed by a device.

There may also be additional caches and buffers in between the device and memory, such as caches associated with bus extenders or bridges. ddi_dma_sync(9F) is provided to synchronize all applicable caches.

ddi_dma_sync()

If a memory object has multiple mappings, such as for a device (through the DMA handle) and for the CPU, and one mapping is used to modify the memory object, the driver needs to call ddi_dma_sync(9F) to ensure that the modification of the memory object is complete before accessing the object through another mapping. ddi_dma_sync(9F) may also inform other mappings of the object that any cached references to the object are now stale. Additionally, ddi_dma_sync(9F) flushes or invalidates stale cache references as necessary.

Generally, the driver has to call ddi_dma_sync(9F) when a DMA transfer completes. The exception to this is that deallocating the DMA resources with ddi_dma_free(9F) does an implicit ddi_dma_sync(9F) on behalf of the driver.

    int ddi_dma_sync(ddi_dma_handle_t handle, off_t off, u_int length, u_int type);

If the object is going to be read by the DMA engine of the device, the device's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORDEV. If the DMA engine of the device has written to the memory object, and the obj
93. xxctx), KM_SLEEP);
        mutex_enter(&xsp->ctx_lock);
        newctx->xsp = xsp;
        bcopy(ctxp->context, newctx->context, XXCTX_SIZE);
        newctx->handle = new_handle;
        *new_devprivate = newctx;
        mutex_exit(&xsp->ctx_lock);
        return (0);
    }

Writing Solaris Graphics Device Drivers, August 1994

Loading and Unloading Drivers

This chapter describes the procedure for installing a device driver in the system, and for dynamically loading and unloading a device driver during testing and development.

Preparing for Installation

Before the driver is actually installed, all necessary files must be prepared. The driver's module name must either match the name of the device nodes, or the system must be informed that other names should be managed by this driver. The driver must then be properly compiled, and a configuration file must be created if necessary.

Module Naming

The system maintains a one-to-one association between the name of the driver module and the name of the dev_info node. For example, a dev_info node for a device named wombat is handled by a driver module called wombat in a subdirectory called drv (resulting in drv/wombat) found in the module path.

If the driver should manage dev_info nodes with different names, the add_drv(1M) utility can create aliases. The -i flag specifies the names of other dev_info nodes that the driver handles.

Compile and Link the Driver

Comp
94. xxstrategy(struct buf *bp);
    int xxprint(dev_t dev, char *str);
    int xxdump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk);
    int xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op, int mod_flags, char *name, caddr_t valuep, int *length);

These routines have kernel context.

Character Driver Entry Points

    int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp);
    int xxclose(dev_t dev, int flag, int otyp, cred_t *credp);
    int xxread(dev_t dev, struct uio *uiop, cred_t *credp);
    int xxwrite(dev_t dev, struct uio *uiop, cred_t *credp);
    int xxioctl(dev_t dev, int cmd, int arg, int mode, cred_t *credp, int *rvalp);
    int xxmmap(dev_t dev, off_t off, int prot);
    int xxsegmap(dev_t dev, off_t off, struct as *asp, caddr_t *addrp, off_t len, unsigned int prot, unsigned int maxprot, unsigned int flags, cred_t *credp);
    int xxchpoll(dev_t dev, short events, int anyyet, short *reventsp, struct pollhead **phpp);
    int xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op, int mod_flags, char *name, caddr_t valuep, int *length);

With the exception of prop_op(9E), all these routines have user context. prop_op(9E) has kernel context.

Callback functions

Some routines provide a callback mechanism. This is a way to schedule a function to be called when a condition is met. Typical conditions for which callback functions are set up include:

When a transfer h
95. 0x800000 within the bwtwo space. As a result, the actual offset to be mapped is 0x6800000.

3. Use the map-sbus word to map the device in. The map-sbus word takes an offset and a size as arguments to map. Like the offset, the size of the byte transfer is specific to the device. In the bwtwo example, the size is set to 20000 bytes.

In the code example below, the offset and size values for the frame buffer are displayed as arguments to the map-sbus word. Notice that the virtual address to use is left on top of the stack. The stack is then shown using the .s word. It can be assigned a name with the constant operation.

    ok 6800000 20000 map-sbus
    ok .s
    ffe7f000
    ok constant fb

Reading and Writing

The PROM provides a variety of 8-bit, 16-bit, and 32-bit operations. In general, a c (character) prefix indicates an 8-bit (one byte) operation; a w (word) prefix indicates a 16-bit (two byte) operation; and an l (longword) prefix indicates a 32-bit (four byte) operation.

A suffix of ! is used to indicate a write operation. The write operation takes the first two items off the stack; the first item is the address, and the second item is the value.

    ok 55 ffe80000 c!

A suffix of @ is used to indicate a read operation. The read operation takes one argument (the address) off the stack.

    ok ffe80000 c@
    ok .s
    ae

A suffix of ? is used to display the value without af
96. 1a7000 4 le
    zs_high_intr(0xff1a0230) + 19c
    levell1(0xf0141ee0) + 404
    idle(0x0,0x0,0x0,0xf0171ee0,0x0,0x1) + 28
    thread_id f0165ee0
    le
    cv_wait(0xf00e24e0,0xf00e24e0,0xff004000,0xb,0x0,0x4000e4)
    callout_thread(0xff004090,0xf00d7d9a,0xf00e24e0,0xf00ac6c0,0x0,0xff004000) + 2c
    thread_id f016bee0
    thread_id ff11c600
    le
    biowait(0xf01886d0,0x0,0x7fe00,0x200,0xf00e085c,0x3241c)
    physio(0xff196120,0xf01886d0,0xf01888a4,0x3241c,0x0,0xf0188878) + 338
    rd_write(0x1180000,0xf0188878,0xff19b680,0xff190680,0x2,0xff335884) + 8c
    rdwr(0xff1505c0,0xf0188878,0xf0188918,0x0,0x0,0xff24dd04) + 138
    rw(0xf0188e90,0xf0188918,0x2,0xf01888a4,0xf0188e90,0x3241c) + 11c
    syscall(0xf00c1c54) + 4d4

Of all the threads, only one has a stack trace that references the ramdisk driver. It happens to be the last one. It seems that the process running mkfs(1M) is blocked in biowait(9F) after a call to physio(9F). biowait(9F) takes a buf(9S) structure as a parameter; the next step is to examine the buf(9S) structure.

    kadb[0]: f01886d0$<buf
    0xf01886d0:   flags
                  129
    0xf01886d4:   forw        back        av_forw     av_back
                  ff24dd04    72616d64    69736b3a    302c7261
    0xf01886e8:   count       bufsize     error       edev
                  512         770         0           1180000
    0xf01886ec:   addr        blkno       resid       proc
                  3241c       3ff         0           f26f000
    0xf0188714:   iodone      vp          pages
                  0           01888a4     efffff68

The resid field is 0, which indicates that the transfer is
97. 220
Loading and Unloading Drivers 225
Preparing for Installation 225
Module Naming 225
Compile and Link the Driver 226
Write a Hardware Configuration File 226
Installing and Removing Drivers 227
Copy the Driver to a Module Directory 227
Run add_drv(1M) 227
Removing the Driver 228
Loading Drivers 228
Getting the Driver Module's ID 228
Unloading Drivers 229
Debugging 231
Machine Configuration 231
Setting Up a tip(1) Connection 231
Preparing for the Worst 233
Coding Hints 236
Process Layout 237
System Support 237
Conditional Compilation and Variables 239
The Optimizer and volatile 241
Using Existing Drivers 241
Debugging Tools 243
/etc/system 243
modload and modunload 244
Saving System Core Dumps 245
adb
98. 4 35
Figure 4-1 Threads and lightweight processes 72
Figure 4-2 SunOS 4.x kernels on a multiprocessor 73
Figure 4-3 SunOS 5.x on a multiprocessor 74
Figure 5-1 Autoconfiguration Data Structures 86
Figure 7-1 The DMA Model 121
Figure 7-2 aches 141
Figure 10-1 SCSA Block Diagram 191
Figure 11-1 Device context management 214
Figure 11-2 Device context switched to user process A 215

Tables

Table 2-1 Physical space in the SPARCstation 1 and SPARCstation 1+ 16
Table 2-2 SPARCstation 1 SBus address bits 17
Table 2-3 Generic VMEbus (full set) 18
Table 2-4 Page table types for the Sun-4 19
Table 2-5 ISA bus address space 22
Table 2-6 EISA bus address space 23
Table 2-7 MCA address space 24
Table 2-8 SBus physical addresses 31
Table 2-9 PTE Masks 37
Table 4-1 Mutex routines 75
Table 4-2 Condition variable routines 78
Table 5-1 Possible node types 99
Table 5-2 Exa
99. [...]; ddi_mem_alloc(); ddi_dma_devalign(); Drivers for Character Devices; Entry Points; Autoconfiguration; Controlling Device Access; I/O Request Handling; User Addresses; Vectored I/O; Driver Operations; Mapping Device Memory; Multiplexing I/O on File Descriptors; Miscellaneous I/O Controls; Drivers for Block Devices; File I/O; State Structure; Entry Points; Autoconfiguration; Controlling Device Access; Data Transfers; strategy(); The buf Structure; Synchronous Data Transfers; Asynchronous Data Transfers; Miscellaneous Entry Points; dump(); print()
100. 83-1994, Sun Microsystems, Inc. After the rest of the kernel has loaded, moddebug is patched to see if loading is the problem (since it got to rd_write() before, it is probably not the problem; it will be checked anyway).
    stopped at 0xfbd01028: ta 0x7d
    kadb[0]: moddebug/X
    moddebug:
    moddebug: 0
    kadb[0]: moddebug/W 0x80000000
    moddebug: 0x0 = 0x80000000
    kadb[0]: :c
modload(1M) is used to load the driver, to separate module loading from the real access.
    # modload /home/driver/drv/ramdisk
    load '/usr/kernel/drv/ramdisk' id 61 loaded @ 0xff335000 size 3304
    installing ramdisk, module id 61.
It loads fine, so loading is not the problem. The condition is recreated with mkfs(1M).
    # mkfs -F ufs -o nsect=8,ntrack=8,free=5 /devices/pseudo/ramdisk@0:c,raw 1024
    ramdisk0: misusing 524288 bytes of memory
It hangs. At this point, kadb(1M) is entered and the stack examined.
    stopped at 0xfbd01028: ta 0x7d
    kadb[0]: $c
    end() + bc1eb40
    debug_enter(0xfbd01000,0xff1a7054,0x0,0x0,0xb,0xff1a7000) + 88
    zs_high_intr(0xff1a0230) + 19c
    level1(0xf0141ee0) + 404
    idle(0x0,0x0,0x0,0xf0171ee0,0x0,0x1) + 28
It does not look like the current thread is the problem, so the entire thread list is checked for hung threads.
    kadb[0]: $<threadlist
    thread_id f0141ee0
    0xfbd01000 0xff1a7054 0x0 0x0 0xb 0xff
101. CE_PANIC has the side effect of crashing the system. This level should only be used if the system is in such an unstable state that to continue would cause more problems. It can also be used to get a system core dump when debugging. The first character of the format string is treated specially. See cmn_err(9F) for more detail. Dynamic Memory Allocation. Device drivers must be prepared to simultaneously handle all attached devices that they claim to drive. There should be no driver limit on the number of devices that the driver handles, and all per-device information must be dynamically allocated.
    void *kmem_alloc(size_t size, int flag);
The standard kernel memory allocation routine is kmem_alloc(9F). It is similar to the C library routine malloc(3C), with the addition of the flag argument. The flag argument can be either KM_SLEEP or KM_NOSLEEP, indicating whether the caller is willing to block if the requested size is not available. If KM_NOSLEEP is set and memory is not available, kmem_alloc(9F) returns NULL. kmem_zalloc(9F) is similar to kmem_alloc(9F), but also clears the contents of the allocated memory. Note: Kernel memory is a limited resource, not pageable, and competes with user applications and the rest of the kernel for physical memory. Drivers that allocate a large amount of kernel memory may cause application performance to degrade.
102. Configuration files for these devices should normally identify the parent bus driver as isa. However, since the EISA bus is a superset of the ISA bus, all ISA devices can also be configured to run in an EISA bus slot. In this case, instead of implicitly specifying a particular parent in the configuration file, driver writers can use the class keyword and specify the class as sysbus. This removes the dependency on the name of a particular bus driver. See driver.conf(4) and isa(4) for further details. EISA Bus Memory and I/O Space. Two address spaces are provided: memory address space and I/O address space. Depending on the device, registers may appear in one or both of these address spaces. Table 2-6 EISA bus address space (columns: EISA Space Name, Address Size, Data Transfer Size, Physical Address Range). Main Memory: address size 32, data transfer size 32, physical address range 0x0 to 0xffffffff. I/O: sizes 8/16/32, physical address range 0x0 to 0xffff. Registers can be mapped in memory address space and used by the driver as normal memory (see "Memory-mapped Access" on page 44). Registers in I/O space are accessed through I/O port numbers, using separate kernel routines. See "I/O Port Access" on page 45 for more information. Hardware Configuration Files. EISA bus devices require hardware configuration files to inform the system that the hardware may be present. The configuration file must specify any device I/O port addresses, any interrupt capabilities that t
103.
        biodone(bp);
        return (0);
    }
    /* validate the transfer request */
    if ((bp->b_blkno >= xsp->nblocks) || (bp->b_blkno < 0)) {
        bioerror(bp, EINVAL);
        biodone(bp);
        return (0);
    }
    /* Hold off all threads until the device is not busy. */
    mutex_enter(&xsp->mu);
    while (xsp->busy) {
        cv_wait(&xsp->cv, &xsp->mu);
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);
    /* set up DMA resources with ddi_dma_buf_setup(9F) */
    /* retrieve the DMA cookie from the handle returned */
    xsp->bp = bp;
    /* Set up device DMA engine from the cookie. */
    regp = xsp->regp;
    regp->dma_addr = cookie.dmac_address;
    regp->dma_size = cookie.dmac_size;
    regp->csr = ENABLE_INTERRUPTS | START_TRANSFER;
    /* Read the csr to flush any hardware store buffers */
    temp = regp->csr;
    return (0);
    }
5. Handle the interrupting device. When the device finishes the data transfer, it generates an interrupt, which eventually results in the interrupt routine being called. Most drivers specify the state structure of the device as the argument to the interrupt routine when registering interrupts (see ddi_add_intr(9F) and "Registering Interrupts" on page 111). The interrupt routine can then access the buf(9S) structure being transferred, plus any other information available from the state structure. The interrupt handler should
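The range check at the top of the strategy code above can be tried in isolation. The following is a minimal user-space sketch, not kernel code: `struct sketch_state` and `sketch_check_blkno` are hypothetical names introduced here, and the check assumes that a block number equal to the device size is out of range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state; only nblocks is used. */
struct sketch_state {
    long nblocks;   /* device capacity, in DEV_BSIZE blocks */
};

/*
 * Returns 0 if the starting block lies on the device, EINVAL otherwise.
 * A real strategy(9E) routine would report the error with bioerror(9F)
 * and biodone(9F) instead of returning it.
 */
int sketch_check_blkno(const struct sketch_state *xsp, long blkno)
{
    assert(xsp != NULL);
    if (blkno < 0 || blkno >= xsp->nblocks)
        return EINVAL;
    return 0;
}
```

With a 100-block device, blocks 0 through 99 are accepted, and both block 100 and block -1 are rejected.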
104. EAMS Drivers; Device Issues; Accessing Device Registers; Example Device Registers; Device Register Structure; Driver Interfaces; Entry Points; Callback functions; Interrupt Handling; Driver Context; Printing Messages; Dynamic Memory Allocation; Software State Management; State Structure; State Management Routines; Properties; Driver Layout; Header Files; The C Language and Compiler Modes; Compiler Modes; Function Prototypes; New Keywords; Multithreading; Threads; User Threads; Kernel Threads; Multiprocessing Changes Since SunOS 4.x; Locking Primitives; Storage Classes of Driver Data; State Structure; Mutual Exclusion Locks
105. However, extra memory cycles may be required for the x86 processor to properly handle misaligned data transfers. Structure Member Alignment. See "Structure Padding" on page 47 for more information on how this relates to device drivers. Byte Ordering. The x86 processor uses little-endian byte ordering. The least significant byte of an integer is stored at the lowest address of the integer: Byte 0 (LSB), Byte 1, Byte 2, Byte 3 (MSB). Floating Point Operations. Drivers should not perform floating point operations, since they are not supported in the kernel. x86 Architecture Manuals. Intel Corporation publishes a number of books on the x86 family of processors: 80386 Programmer's Reference Manual, Intel Corporation, 1986, ISBN 1-55512-022-9; i486 Microprocessor Hardware Reference Manual, Intel Corporation, 1990, ISBN 1-55512-112-8; Pentium Processor User's Manual, Volume 3: Architecture and Programming Manual, Intel Corporation, 1993, ISBN 1-55512-195-0. System Memory Model. This section describes memory model implications for device drivers. Store Buffers. To improve performance, the system hardware may buffer data written to device memory. This may affect the synchronization of device I/O operations. Writes to device registers may pass through several system I/O buffers before reaching the registers. The driver needs to take explicit steps to make sure that writes to
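The little-endian layout described under Byte Ordering can be illustrated in portable C. This is only a sketch: `sketch_store_le32` and `sketch_host_is_little_endian` are hypothetical helper names, not DDI/DKI routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least significant byte first (little-endian). */
void sketch_store_le32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(value & 0xff);          /* Byte 0: LSB */
    out[1] = (unsigned char)((value >> 8) & 0xff);   /* Byte 1 */
    out[2] = (unsigned char)((value >> 16) & 0xff);  /* Byte 2 */
    out[3] = (unsigned char)((value >> 24) & 0xff);  /* Byte 3: MSB */
}

/* Returns 1 if the machine running this code is little-endian, else 0. */
int sketch_host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

For the value 0x12345678, the bytes are stored as 0x78, 0x56, 0x34, 0x12 from the lowest address upward. Because the first helper uses shifts rather than pointer casts, it produces that layout on any host.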
106. It uses the ddi_getprop(9F) routine to retrieve the device's SCSI target and logical unit numbers so that it can print them in messages. It also retrieves its scsi_device(9S) structure from the private field of its dev_info structure. The probe(9E) routine then calls scsi_probe(9F) to verify that the expected device (a printer in this case) is present. If scsi_probe(9F) succeeds, it has attached the device's SCSI Inquiry data, in a scsi_inquiry(9S) structure, to the sd_inq field of the scsi_device(9S) structure. The driver can then check to see if the device type is a printer (reported in the inq_dtype field). If it is, the type is reported with scsi_log(9F), using scsi_dname(9F) to convert the device type into a string. Code Example 10-1 SCSI target driver probe(9E) routine
    static int
    xxprobe(dev_info_t *dip)
    {
        struct scsi_device *sdp;
        int rval, target, lun;
        /*
         * Get the SCSI target/lun properties. DDI_PROP_DONTPASS prevents
         * ddi_getprop from looking beyond this node for the properties.
         */
        target = ddi_getprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS, "target", -1);
        lun = ddi_getprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS, "lun", -1);
        if ((target == -1) || (lun == -1))
            return (DDI_FAILURE);
        /* Get a pointer to the scsi_device(9S) structure */
        sdp = (struct scsi_device *)ddi_get_driver_private(dip);
        /* Call scsi_probe(9F) to send the Inquiry command. It w
107. M prompt for the kernel name. This flag also causes kadb(1M) to provide a prompt after it has loaded the kernel, so breakpoints can be set.
    ok boot kadb -d
    Boot device: /sbus/esp@0,800000/sd@3,0 File and args: kadb -d
    kadb: kernel/unix
    kadb: kernel/unix
    Size: 673348+182896+46008 bytes
    /kernel/unix loaded - 0x125000 bytes used
    kadb[0]:
At this point you can set breakpoints or continue with the :c command. Note: kadb(1M) passes on any kernel flags to the booted kernel. For example, the flags -r, -s, and -a can be passed to /kernel/unix with the command boot kadb -ras. Once the system is booted, sending a break passes control to kadb(1M). A break is generated with L1-A, or by the tip break escape if the console is connected through a tip window.
    The system is ready.
    test console login:
    stopped at 0xfbd01028: ta 0x7d
    kadb[0]:
The number in brackets is the CPU that kadb(1M) is currently executing on; the remaining CPUs are halted. The CPU number is zero on a uniprocessor. Halting the system will also drop to a kadb(1M) prompt. Warning: Before rebooting or shutting off the power, always halt the system cleanly (with init 0 or shutdown). Buffers may not be flushed otherwise. If the shutdown must occur from the boot PROM, make sure to flush buffers (with sync on the OBP, or g0 on SunMon). To continue back to SunOS, use :c.
    kadb[0]: :c
    test console login:
108. Operations. The read(9E) or write(9E) entry point is called when a user thread issues the corresponding system call. It is the responsibility of these routines to perform the desired transfer, then return an indication of success or failure. Programmed I/O Transfers. Programmed I/O (PIO) devices rely on the CPU to perform the data transfer. PIO data transfers are identical to other device register read and write operations. An assignment statement is used to move a value from one variable or structure to another. uiomove(). This kind of device may allow the driver to use uiomove(9F). uiomove(9F) transfers data between the user space (defined by the uio(9S) structure) and the kernel. uiomove(9F) can handle page faults, so the memory to which data is transferred need not be locked down. It also updates the uio_resid field in the uio(9S) structure. The following example is one way to write the ramdisk read(9E) routine, and relies on the following fields being present in the ramdisk state structure:
    caddr_t ram;      /* base address of ramdisk */
    int ramsize;      /* size of the ramdisk */
Code Example 8-3 Ramdisk read(9E) routine using uiomove(9F)
    static int
    rd_read(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        int instance;
        rd_devstate_t *rsp;
        instance = getminor(dev);
        rsp = ddi_get_soft_state(rd_statep, instance);
        if (rsp == NULL)
            return (ENXIO);
        if (uiop->uio_offset >= rsp->ramsize)
            return (EINVAL);
        /* uiomove takes
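The bounds check and the bookkeeping that the text attributes to uiomove(9F) can be imitated in user space. The sketch below is an illustration under stated assumptions, not the kernel interface: `struct sketch_uio`, `struct sketch_ramdisk`, and `sketch_rd_read` are hypothetical, memcpy stands in for the copy to user space, and the negative-offset check is an addition not present in the excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for uio(9S); the real structure has more fields. */
struct sketch_uio {
    char   *dst;     /* destination buffer (stands in for user space) */
    long    offset;  /* like uio_offset: position in the ramdisk */
    size_t  resid;   /* like uio_resid: bytes still to transfer */
};

/* Hypothetical ramdisk state with the two fields named in the text. */
struct sketch_ramdisk {
    char *ram;       /* base address of ramdisk */
    long  ramsize;   /* size of the ramdisk */
};

/* Copy up to resid bytes out of the ramdisk, updating offset and resid. */
int sketch_rd_read(const struct sketch_ramdisk *rsp, struct sketch_uio *uiop)
{
    size_t avail, n;

    assert(rsp != NULL && uiop != NULL);
    if (uiop->offset < 0 || uiop->offset >= rsp->ramsize)
        return EINVAL;

    /* Never copy past the end of the ramdisk. */
    avail = (size_t)(rsp->ramsize - uiop->offset);
    n = uiop->resid < avail ? uiop->resid : avail;

    memcpy(uiop->dst, rsp->ram + uiop->offset, n);
    uiop->dst += n;
    uiop->offset += (long)n;
    uiop->resid -= n;
    return 0;
}
```

A request for 4 bytes at offset 6 of an 8-byte ramdisk copies 2 bytes and leaves a residual count of 2; a further read at offset 8 fails with EINVAL.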
109. S 5.4. This chapter is a guide to the differences between SunOS 4.x and SunOS 5.x device drivers. Some simple drivers can be ported easily, but to properly handle the multithreaded environment most drivers will need to be rethought and rewritten. Before Starting the Conversion. Review Existing Functionality. Make sure the driver's current functionality is well understood: the way it manages the hardware, and the interfaces it provides to applications (ioctl(2) states the device is put in, for example). Maintain this functionality in the new driver. Read the Manual. This chapter is not a substitute for the rest of this book. Make sure you have access to the SunOS 5.4 Reference Manuals. ANSI C. The unbundled Sun C compiler is now ANSI C compliant. Most ANSI C changes are beyond the scope of this book. There are a number of good ANSI C books available from local bookstores. The following two books are good references: Kernighan and Ritchie, The C Programming Language, Second Edition, 1988, Prentice-Hall; Harbison and Steele, C: A Reference Manual, Second Edition, 1987, Prentice-Hall. Development Environment. DDI/DKI. The DDI/DKI is a new name for the routines formerly called "kernel support routines" in the SunOS 4.x Writing Device Drivers manual, and for the "well-known" entry points in the SunOS 4.x cdevsw and bdevsw structures. The intent is to specify a set of interfaces for drivers that provide a binary and sour
110. SERT(9F) macro does nothing if EX evaluates to non-zero. If EX evaluates to zero, ASSERT(9F) panics the system. ASSERT(9F) is useful in debugging a driver, since it can be used to stop the system when an unexpected situation is encountered, such as an erroneously NULL pointer. ASSERT(9F) exhibits this behavior only when the DEBUG preprocessor symbol is defined.
    int bcmp(char *s1, char *s2, size_t len);
bcmp(9F) compares len bytes of the byte arrays starting at s1 and s2. If these bytes are identical, bcmp(9F) returns zero. Otherwise, bcmp(9F) returns a non-zero value.
    unsigned long btop(unsigned long numbytes);
btop(9F) converts a size expressed in bytes to a size expressed in terms of the main system MMU page size, rounded down to the nearest page.
    unsigned long btopr(unsigned long numbytes);
btopr(9F) converts a size expressed in bytes to a size expressed in terms of the main system MMU page size, rounded up to the nearest page.
    void bzero(caddr_t addr, size_t bytes);
bzero(9F) zeroes bytes bytes starting at addr.
    unsigned long ddi_btop(dev_info_t *dip, unsigned long bytes);
ddi_btop(9F) converts a size expressed in bytes to a size expressed in terms of the parent bus nexus page size, rounded down to the nearest page.
    unsigned long ddi_btopr(dev_info_t *dip, unsigned long bytes);
ddi_btopr(9F) converts a size expressed in bytes to a size expressed in terms of
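The difference between rounding down and rounding up in btop(9F) and btopr(9F) can be shown with plain arithmetic. This is a user-space sketch only: `SKETCH_PAGESIZE` is an assumed value of 4096 bytes chosen for illustration (the real MMU page size is platform-dependent), and the `sketch_` functions are hypothetical stand-ins, not the kernel routines.

```c
#include <assert.h>

/* Assumed page size for illustration; the real value is platform-dependent. */
#define SKETCH_PAGESIZE 4096UL

/* Bytes to pages, rounded down (the behavior described for btop(9F)). */
unsigned long sketch_btop(unsigned long numbytes)
{
    return numbytes / SKETCH_PAGESIZE;
}

/* Bytes to pages, rounded up (the behavior described for btopr(9F)). */
unsigned long sketch_btopr(unsigned long numbytes)
{
    /* Written this way to avoid overflow when numbytes is near ULONG_MAX. */
    return numbytes / SKETCH_PAGESIZE + (numbytes % SKETCH_PAGESIZE != 0);
}
```

Under the assumed page size, 4097 bytes is 1 page rounded down and 2 pages rounded up, while exactly 4096 bytes is 1 page either way.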
111. System V Release 4.0] Copyright (c) 1983-1994, Sun Microsystems, Inc. For more complete control, boot with the ask (-a) option; this allows alternate files to be specified, such as /etc/system.orig if that is the original clean system file.
    ok boot disk1 -a
    Rebooting with command: disk1 -a
    Boot device: /sbus/esp@0,800000/sd@1,0 File and args: -a
    Enter filename [/kernel/unix]: kernel.orig/unix
    SunOS Release 5.4 Version Generic [UNIX(R) System V Release 4.0]
    Copyright (c) 1983-1994, Sun Microsystems, Inc.
    Name of system file [/etc/system]: /etc/system.orig
    Name of default directory for modules [/kernel.orig /usr/kernel]: <CR>
    root filesystem type [ufs]: <CR>
    Enter physical name of root device [/sbus@1,8000000/esp@0,800000/sd@1,0:a]: <CR>
During development, debugging the driver should be a constant consideration. Since the driver is operating much closer to the hardware, without the protection of the operating system, debugging kernel code is harder than debugging user-level code. A stray pointer access can crash the entire system. This section provides some information that may be used to make the driver easier to debug. Process Layout. SunOS 5.x operating system processes follow the definition given in the System V Application Binary Interface, SPARC Processor Supplement (also known as the ABI). A standard process looks similar to this
112. The format of the data transfer commands is: command address value. Valid commands are: o, open a byte; e, open a word (2 bytes); l, open a longword (4 bytes). When using the o, e, or l commands to open a location, the monitor reads the present contents of that location and displays it before giving the option to rewrite it. To do the write without the read (because the device does something else when the read occurs), pass a value after the address argument; this is known as a blind write. Overview of SunOS Device Drivers. This chapter gives an overview of SunOS device drivers. It discusses what a device driver is and the types of device drivers that SunOS supports. It also provides a general discussion of the routines that device drivers must implement and points out compiler-related issues. What is a Device Driver? A device driver is a kernel module containing subroutines and data responsible for managing low-level I/O operations for a particular hardware device. Device drivers can also be software-only, emulating a device such as a RAM disk or a pseudo-terminal that only exists in software. Such device drivers are called pseudo device drivers, and cannot perform functions requiring hardware (such as DMA). A device driver contains all the device-specific code necessary to communicate with a device and provides a standard I/O interface to the rest of the system. This interface protects the ker
113. There is always the possibility that either the driver accidentally waits for an event that will never occur, or the event will not happen for a long time. In either case, the user may want to abort the process by sending it a signal (or typing a character that causes a signal to be sent to the process). Whether the signal causes the driver to wake up depends on the driver. In SunOS 4.x, whether the sleep() was signal-interruptible depended on the dispatch priority passed to sleep(). If the priority was greater than PZERO, the driver was signal-interruptible; otherwise the driver would not be awakened by a signal. Normally, a signal interrupt caused sleep() to return back to the user, without letting the driver know the signal had occurred. Drivers that needed to release resources before returning to the user passed the PCATCH flag to sleep(), then looked at the return value of sleep() to determine why they awoke:
    while (busy) {
        if (sleep(&busy, PCATCH | (PRIBIO + 1))) {
            /* awakened because of a signal */
            /* free resources */
            return (EINTR);
        }
    }
In SunOS 5.x, the driver can use cv_wait_sig(9F) to wait on the condition variable, but be signal-interruptible. Note that cv_wait_sig(9F) returns zero to indicate the return was due to a signal, but sleep() in SunOS 4.x returned a nonzero value.
    while (busy) {
        if (cv_wait_sig(&busy_cv, &busy_mu) == 0) {
            re
114. [...]; Locking; Interrupts; DMA; Conversion Notes; SunOS 4.1.x to SunOS 5.4 Differences; B. Advanced Topics; Multithreading; Lock Granularity; Avoiding Unnecessary Locks; Locking Order; Scope of a Lock; Potential Panics; Sun Disk Device Drivers; Disk I/O Controls; Disk Performance; SCSA; Global Data Definitions; Tagged Queueing; Untagged Queueing; Auto-Request Sense Mode; C. Summary of Solaris 2.4 DDI/DKI Services; buf(9S) Handling; Copying Data; Device Access; Device Configuration; Device Information; DMA Handling
115. Writing Device Drivers. 2550 Garcia Avenue, Mountain View, CA 94043 U.S.A. A Sun Microsystems, Inc. Business. © 1994 Sun Microsystems, Inc. 2550 Garcia Avenue, Mountain View, California 94043-1100 U.S.A. All rights reserved. This product and related documentation are protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Portions of this product may be derived from the UNIX and Berkeley 4.3 BSD systems, licensed from UNIX System Laboratories, Inc., a wholly owned subsidiary of Novell, Inc., and the University of California, respectively. Third-party font software in this product is protected by copyright and licensed from Sun's font suppliers. RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the United States Government is subject to the restrictions set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19. The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications. TRADEMARKS: Sun, the Sun logo, Sun Microsystems, Sun Microsystems Computer Corporation, SunSoft, the SunSoft logo, Solaris, SunOS, OpenWindows, DeskSet, ONC, ONC+, and NFS are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and certain other cou
116.
    dev_t b_edev;    /* expanded dev field */
b_flags contains status and transfer attributes of the buf structure. If B_READ is set, the buf structure indicates a transfer from the device to memory; otherwise it indicates a transfer from memory to the device. If the driver encounters an error during data transfer, it should set the B_ERROR field in the b_flags member and provide a more specific error value in b_error. Drivers should use bioerror(9F) in preference to setting B_ERROR. Caution: Drivers should never clear b_flags. av_forw and av_back are pointers that can be used to manage a list of buffers by the driver. See "Asynchronous Data Transfers" on page 184 for a discussion of the av_forw and av_back pointers. b_bcount specifies the number of bytes to be transferred by the device. b_un.b_addr is the virtual address of the data buffer when it is mapped into the kernel. b_blkno is the starting 32-bit logical block number on the device for the data transfer, expressed in DEV_BSIZE (512 bytes) units. The driver should use b_blkno or b_lblkno, but not both. b_lblkno is the starting 64-bit logical block number on the device for the data transfer, expressed in DEV_BSIZE (512 bytes) units. The driver should use b_blkno or b_lblkno, but not both. b_resid is set by the driver to indicate the number of bytes that were not transferred due to an error. See
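The unit conventions for b_blkno and b_resid reduce to two small calculations. The following user-space sketch assumes only the 512-byte DEV_BSIZE stated in the text; `sketch_blk_to_byte` and `sketch_resid` are hypothetical helpers, not part of the buf(9S) interface.

```c
#include <assert.h>

#define SKETCH_DEV_BSIZE 512L   /* block size stated in the text */

/* Byte offset on the device of logical block blkno. */
long long sketch_blk_to_byte(long blkno)
{
    assert(blkno >= 0);
    return (long long)blkno * SKETCH_DEV_BSIZE;
}

/* Bytes not transferred: the value a driver would store in b_resid. */
unsigned long sketch_resid(unsigned long bcount, unsigned long done)
{
    assert(done <= bcount);
    return bcount - done;
}
```

Block 3 starts at byte offset 1536, and a 1024-byte request that stops after 512 bytes leaves a residual count of 512.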
117.
    a_addr_setup(dev_info_t *dip, struct as *as, caddr_t addr, u_int len, u_int flags, int (*waitfp)(caddr_t), caddr_t arg, ddi_dma_lim_t *lim, ddi_dma_handle_t *handlep);
    int ddi_dma_buf_setup(dev_info_t *dip, struct buf *bp, u_int flags, int (*waitfp)(caddr_t), caddr_t arg, ddi_dma_lim_t *lim, ddi_dma_handle_t *handlep);
ddi_dma_addr_setup(9F) and ddi_dma_buf_setup(9F) take the following two arguments: dip is a pointer to the device's dev_info structure; the second is the object to allocate resources for. For ddi_dma_addr_setup(9F), the object is described by an address range: as is a pointer to an address space structure (this must be NULL); addr is the base kernel address of the object; len is the length of the object. For ddi_dma_buf_setup(9F), the object is described by a buf(9S) structure: bp is a pointer to a buf(9S) structure. flags is a set of flags indicating the transfer direction and other attributes. DDI_DMA_READ indicates a data transfer from device to memory. DDI_DMA_WRITE indicates a data transfer from memory to device. See ddi_dma_req(9S) for a complete discussion of the allowed flags. waitfp is the address of a callback function for handling resource allocation failures. arg is the argument to pass to the callback function. lim is a pointer to a ddi_dma_lim(9S) structure, as described in "Device limitations" on page 128. handlep is a pointer to DMA
118. a call to adddma. This was needed because the kernel might want to block interrupts to prevent DMA, but needed to know the highest interrupt level to block. Because the new implementation uses mutexes, this is no longer needed. identify().
    SunOS 4.x:
    int xxidentify(name)
    char *name;
    SunOS 5.x:
    int xxidentify(dev_info_t *dip)
The name property is no longer passed to identify(9E); ddi_get_name(9F) must be used to retrieve the name from the dev_info_t pointer argument. Note: The unit counting is now handled by the framework. To get the unit number in any routine, call ddi_get_instance(9F). Do not count units anywhere. identify(9E) is no longer guaranteed to be called for all units before attach(9E) is ever called. However, identify(9E) is guaranteed to be called before attach(9E) on a per-instance basis. probe().
    SunOS 4.x:
    int xxprobe(reg, unit)
    caddr_t reg;
    int unit;
    SunOS 5.x:
    int xxprobe(dev_info_t *dip)
probe(9E) is still expected to determine if a device is there or not, but now it may be called any number of times, so it must be stateless (free anything it allocates). attach().
    SunOS 4.x (VMEbus):
    int xxattach(md)
    struct mb_device *md;
    SunOS 4.x (SBus):
    int xxattach(devinfo)
    struct dev_info *devinfo;
    SunOS 5.x:
    int xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
As noted in identify(9E), drivers are not allowed to count instances anywhere. Use ddi_get_instance(9F) to ge
119. abled. A target driver enables auto request sense mode by using scsi_ifsetcap(9F). Code Example B-2 is an example of enabling auto request sense. Code Example B-2 Enabling auto request sense
    static int
    xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
        struct xxstate *xsp;
        struct scsi_device *sdp = (struct scsi_device *)ddi_get_driver_private(dip);
        /*
         * enable auto request sense; an auto request sense cmd may fail
         * due to a BUSY condition or transport error. Therefore, it is
         * recommended to allocate a separate request sense packet as well.
         * Note that scsi_ifsetcap(9F) may return 1, 0, or -1.
         */
        xsp->sdp_arq_enabled = ((scsi_ifsetcap(ROUTE, "auto-rqsense", 1, 1) == 1) ? 1 : 0);
        /*
         * if the HBA driver supports auto request sense then the
         * status blocks should be sizeof (struct scsi_arq_status); else
         * one byte is sufficient
         */
        xsp->sdp_cmd_stat_size = (xsp->sdp_arq_enabled ? sizeof (struct scsi_arq_status) : 1);
When a packet is allocated using scsi_init_pkt(9F) and auto request sense is desired on this packet, the target driver must request additional space for the status block to hold the auto request sense structure (as Code Example B-3 illustrates). The sense length used in the request sense command is sizeof (struct scsi_extended_sense). The scsi_arq_status structure contains the following members:
    struct scsi_status sts_status;
    struct scsi_stat
120. action to take if resources are not available. If set to NULL_FUNC, scsi_init_pkt(9F) returns immediately (returning NULL). If set to SLEEP_FUNC, it does not return until resources are available. Any other valid kernel address is interpreted as the address of a function to be called when resources are likely to be available. arg is the parameter to pass to the callback function. The scsi_sync_pkt(9F) routine can be used to synchronize any cached data after a transfer, when a target driver wants to reuse a scsi_pkt for another command. This may be done either in the command completion routine or before calling scsi_transport(9F) for the command a second time. The scsi_destroy_pkt(9F) routine synchronizes any remaining cached data associated with the packet, if necessary, and then frees the packet and associated command, status, and target driver private data areas. This routine should be called in the command completion routine (see "scsi_pkt structure" on page 193). scsi_alloc_consistent_buf(). For most I/O requests, the data buffer passed to the driver entry points is not accessed directly by the driver; it is just passed on to scsi_init_pkt(9F). If a driver sends SCSI commands which operate on buffers the driver examines itself (such as the SCSI Request Sense command), the buffers should be DMA consistent. The scsi_alloc_consistent_buf(9F) routine allocates a buf(9S) structure and a data buffer suitable for DMA consistent operatio
121. address addr to the location pointed to by valuep.
    int ddi_pokec(dev_info_t *dip, char *addr, char value);
ddi_pokec(9F) writes the character in value to the address addr.
    int ddi_pokes(dev_info_t *dip, short *addr, short value);
ddi_pokes(9F) writes the short integer in value to the address addr.
    int ddi_pokel(dev_info_t *dip, long *addr, long value);
ddi_pokel(9F) writes the long integer in value to the address addr.
    int ddi_poked(dev_info_t *dip, longlong_t *addr, longlong_t value);
ddi_poked(9F) writes the double long integer in value to the address addr.
    major_t getmajor(dev_t dev);
getmajor(9F) decodes the major device number from dev and returns it.
    minor_t getminor(dev_t dev);
getminor(9F) decodes the minor device number from dev and returns it.
    dev_t makedevice(major_t majnum, minor_t minnum);
makedevice(9F) constructs and returns a device number of type dev_t from the major device number majnum and the minor device number minnum.
    int max(int int1, int int2);
max(9F) returns the larger of the integers int1 and int2.
    int min(int int1, int int2);
min(9F) returns the lesser of the integers int1 and int2.
    int nodev();
nodev(9F) returns an error. Use nodev(9F) as the entry in the cb_ops(9S) structure for any entry point for which the driver must always fail.
    int nulldev();
nulldev(9F) always returns zero, a return which for many entry point
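The relationship between makedevice(9F), getmajor(9F), and getminor(9F) is that of packing and unpacking two fields in one integer. The sketch below illustrates the idea only: the 18-bit minor field, the `sketch_dev_t` type, and the `sketch_` functions are assumptions made for this example, and drivers should always use the real routines rather than depend on any particular encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed split for illustration: low 18 bits minor, remaining bits major. */
#define SKETCH_MINOR_BITS 18u
#define SKETCH_MINOR_MASK ((1u << SKETCH_MINOR_BITS) - 1u)

typedef uint32_t sketch_dev_t;

/* Pack a major and minor number into one device number. */
sketch_dev_t sketch_makedevice(uint32_t majnum, uint32_t minnum)
{
    assert(minnum <= SKETCH_MINOR_MASK);
    assert(majnum <= (UINT32_MAX >> SKETCH_MINOR_BITS));
    return (majnum << SKETCH_MINOR_BITS) | minnum;
}

/* Recover the major number from a packed device number. */
uint32_t sketch_getmajor(sketch_dev_t dev)
{
    return dev >> SKETCH_MINOR_BITS;
}

/* Recover the minor number from a packed device number. */
uint32_t sketch_getminor(sketch_dev_t dev)
{
    return dev & SKETCH_MINOR_MASK;
}
```

Packing major 61 and minor 3 and then unpacking the result yields 61 and 3 again, which is the round-trip property the three routines are described as providing.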
122. address space where the driver can address the data. The driver developer must ensure that adequate space is allocated for driverbuf.
    int copyout(caddr_t driverbuf, caddr_t userbuf, size_t cn);
copyout(9F) copies data from the kernel virtual address space to an application program's virtual address space.
    int ddi_copyin(caddr_t buf, caddr_t driverbuf, size_t cn, int flags);
This routine is designed for use in driver ioctl(9E) routines. It copies data from a source address to a driver buffer. The driver developer must ensure that adequate space is allocated for the destination address. The flags argument is used to determine the address space information about buf. If the FKIOCTL flag is set, it indicates that buf is a kernel address, and ddi_copyin(9F) behaves like bcopy(9F). Otherwise, buf is interpreted as a user buffer address, and ddi_copyin(9F) behaves like copyin(9F). The value of the flags argument to ddi_copyin(9F) should be passed through directly from the mode argument of ioctl(9E), untranslated.
    int ddi_copyout(caddr_t driverbuf, caddr_t buf, size_t cn, int flags);
This routine is designed for use in driver ioctl(9E) routines for drivers that support layered I/O controls. ddi_copyout(9F) copies data from a driver buffer to a destination address, buf. The flags argument is used to determine the address space information about buf. If
123. ally allocate structures whenever possible. If a buf(9S) structure is needed, do not declare one. Instead, declare a pointer to one and call getrbuf(9F) to allocate it. Note: Even using kmem_alloc(sizeof (struct buf)) is not allowed, since the size of a buf(9S) structure may change in future releases.
     System V Release 4
     SunOS 5.x is the Sun version of AT&T's System V Release 4 (SVR4). The system administration model is different from those in previous SunOS releases, which were more like 4.3 BSD. Differences important to device driver writers are: halting and booting the machine (see the Solaris 1.x to Solaris 2.x Transition Guide); kernel configuration (see Chapter 5, "Autoconfiguration"); software packaging (see the Application Packaging Developer's Guide). For general SVR4 system administration information, see the Solaris 1.x to Solaris 2.x Transition Guide.
     Development Tools
     The only compiler that should be used to compile SunOS 5.x device drivers is the unbundled Sun C compiler. This is either part of SPARCworks 2.0.1 (for SPARC systems) or ProWorks 2.0.1 (for x86 systems). See Chapter 12, "Loading and Unloading Drivers", for information on how to compile and load a driver. Note that the compiler's bin directory (possibly /opt/SUNWspro/bin) and the supporting tools directory (/usr/ccs/bin) should be prepended to the PATH. When compiling a driver, use the -Xa and -D_KERNEL options. When building a loada
124. an entry point and another in the interrupt routine. The driver had to deal with this in SunOS 4.x, but with the restriction that the interrupt routine blocked the user context routine while it ran. 2. Two threads could be in a routine at the same time. This could not happen in SunOS 4.x. Both of these cases are similar to situations present in SunOS 4.x, but now these threads could run at the same time on different CPUs. The driver must be prepared to handle these types of occurrences.
     Mutual Exclusion Locks
     In SunOS 4.x, a driver had to be careful when accessing data shared between the top half and the interrupt routine. Since the interrupt could occur asynchronously, the interrupt routine could corrupt data or simply hang. To prevent this, portions of the top half of the driver would raise, using the various spl routines, the interrupt priority level of the CPU to block the interrupt from being handled:
          s = splr(pritospl(6));
          /* access shared data */
          (void) splx(s);
     In SunOS 5.x, this no longer works. Changing the interrupt priority level of one CPU does not necessarily prevent another CPU from handling the interrupt. Also, two top-half routines may be running simultaneously, with the interrupt running on a third CPU. To solve this, SunOS 5.x provides: 1. A uniform module of execution: even interrupts run as threads. This blurs the distinction between the top half and the bottom hal
126. and completed successfully, and decode the sense data.
     Code Example B-4 Checking for auto request sense
          static void
          xxcallback(struct scsi_pkt *pkt)
          {
               ...
               if (pkt->pkt_state & STATE_ARQ_DONE) {
                    /*
                     * The transport layer successfully completed an
                     * auto request sense. Decode the auto request
                     * sense data here.
                     */
                    ...
               }
               ...
          }
     The sample SCSI drivers in appendixes E and F show in more detail how to interpret the auto request sense data structure.
     Summary of Solaris 2.4 DDI/DKI Services
     This chapter discusses, by category, the interfaces provided by the Solaris 2.4 DDI/DKI. After each category is introduced, each function in that category is listed with a brief description. These descriptions should not be considered complete or definitive, nor do they provide a thorough guide to usage. The descriptions are intended to describe what the functions do in general terms, and what the arguments and return values mean. See the manual pages for more detailed information. The categories are: buf(9S) Handling; Copying Data; Device Access; Device Configuration; Device Information; DMA Handling; Flow of Control; Interrupt Handling; Kernel Statistics; Memory Allocation; Polling; Printing System Messages; Process Signalin
127. and are available. Be aware that these addresses, while convenient for testing, are not those that the kernel chooses when the driver is finally installed. It is most convenient to select a virtual address that has only zeros in its low-order bits. This way, the first address in a virtual page is selected. The low-order bits in the chosen address remain unchanged. With X representing the unmapped low-order bits (13 on an 8K-page Sun-4), the test address 0x4000 is, in binary: 0000 0000 0000 0000 010X XXXX XXXX XXXX.
     Finding a Physical Address
     The device may be preconfigured to some address. If it is, then that address should be used unless it conflicts with the address of an already installed device. If it conflicts, an unused physical address must be found. To do so, examine the hardware configuration files in /kernel/drv and /usr/kernel/drv on the test system. See driver.conf(4) for more information on hardware configuration files, and vme(4) for specific information on configuration files for VMEbus devices.
     Creating a Page Table Entry
     The link between the virtual and physical address is the MMU. The MMU contains a page table to keep track of which virtual pages map to which physical pages. Entries in this table are called page table entries (PTE). To create a mapping, a page table entry must be constructed. On a Sun-4, PTEs are 32-bit numbers with the following structure: Viwsc am Unused
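The low-order-bit arithmetic described above for choosing a test virtual address can be written out as a short sketch. The 13-bit offset for an 8K page comes from the text; the XX_ macro names and xx_ function names are assumptions made for this example.

```c
#include <stdint.h>

#define XX_PAGESHIFT  13u                    /* 8K pages: 2^13 bytes */
#define XX_PAGESIZE   (1u << XX_PAGESHIFT)
#define XX_PAGEOFFSET (XX_PAGESIZE - 1u)     /* mask of the unmapped low bits */

/* First address of the virtual page containing va. */
static uint32_t
xx_page_base(uint32_t va)
{
    return va & ~XX_PAGEOFFSET;
}

/* Offset of va within its page; these bits pass through unmapped. */
static uint32_t
xx_page_offset(uint32_t va)
{
    return va & XX_PAGEOFFSET;
}
```

For the test address 0x4000, the offset is zero and the page base is 0x4000 itself, so the address selects the first byte of its page.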
128. and is small, and may be used in time-critical areas of the driver, such as when handling errors.
     Autoconfiguration
     SCSI target drivers must implement the standard autoconfiguration routines _init(9E), _fini(9E), _info(9E), and identify(9E). See Chapter 5, "Autoconfiguration", for more information. probe(9E), attach(9E), and getinfo(9E) are also required, but they must perform SCSI (and SCSA) specific processing.
     probe()
     SCSI target devices are not self-identifying, so target drivers must have a probe(9E) routine. This routine must determine whether or not the expected type of device is present and responding. The general structure and return codes of the probe(9E) routine are the same as those of other device drivers. See "probe()" on page 87 for more information. SCSI target drivers must use the scsi_probe(9F) routine in their probe(9E) entry point. scsi_probe(9F) sends a SCSI Inquiry command to the device and returns a code indicating the result. If the SCSI Inquiry command is successful, scsi_probe(9F) allocates a scsi_inquiry(9S) structure and fills it in with the device's Inquiry data. Upon return from scsi_probe(9F), the sd_inq field of the scsi_device(9S) structure points to this scsi_inquiry(9S) structure. Since probe(9E) must be stateless, the target driver must call scsi_unprobe(9F) before probe(9E) returns, even if scsi_probe(9F) fails. Code Example 10-1 shows a typical probe(9E) routine.
129. Example: adb on a Core Dump
     Example: kadb on a Deadlocked Thread
     Testing
          Configuration Testing
          Functionality Testing
          Error Handling
          Stress, Performance, and Interoperability Testing
          DDI/DKI Compliance Testing
          Installation and Packaging Testing
          Testing Specific Types of Drivers
     A. Converting a Device Driver to SunOS 5.4
          Before Starting the Conversion
               Review Existing Functionality
               Read the Manual
               ANSI C
               Development Environment
               DDI/DKI
               Things to Avoid
          System V Release 4
          Development Tools
          Debugging Tools
          ANSI C
          Header Files
          Overview of Changes
               Autoconfiguration
               Multithreading
130. and vector that contains strings of this kind for all the SCSI commands it supports.
     struct scsi_pkt *scsi_dmaget(struct scsi_pkt *pkt, opaque_t dmatoken, int (*callback)(void));
     scsi_dmaget(9F) allocates resources for an existing scsi_pkt(9S) structure pointed to by pkt. Pass in dmatoken a pointer to the buf(9S) structure that encodes the original I/O request. If callback is not NULL_FUNC and the requested DMA resources are not immediately available, the function pointed to by callback will be called when resources may have become available. callback can call scsi_dmaget(9F) again. If callback is SLEEP_FUNC, scsi_dmaget(9F) may block waiting for resources.
     void scsi_dmafree(struct scsi_pkt *pkt);
     scsi_dmafree(9F) frees the DMA resources previously allocated by scsi_dmaget(9F) for the scsi_pkt(9S) structure pkt.
     char *scsi_dname(int dtype);
     scsi_dname(9F) decodes the device type code dtype found in the INQUIRY data and returns a character string denoting this device type.
     void scsi_free_consistent_buf(struct buf *bp);
     scsi_free_consistent_buf(9F) frees a buffer header and consistent data buffer that was previously allocated using scsi_alloc_consistent_buf(9F).
     int scsi_ifgetcap(struct scsi_address *ap, char *cap, int whom);
     scsi_ifgetcap(9F) returns the current value of the host adapter capability denoted by cap for the host adapter servicing the target at the SCSI
131. Types of Interrupts
          Vectored Interrupts
          Polled Interrupts
          Software Interrupts
     Registering Interrupts
     Responsibilities of an Interrupt Handler
     State Structure
     Handling High-Level Interrupts
          Example
     7. DMA
     The DMA Model
     Types of Device DMA
          DMA and DVMA
     Handles, Windows, Segments, and Cookies
     DMA Operations
          Device Limitations
          Object Locking
          Allocating DMA Resources
          Burst Sizes
          Programming the DMA Engine
          Freeing the DMA Resources
          Cancelling DMA Callbacks
          ddi_dma_sync()
     Allocating Private DMA Buffers
132. annot be allocated right away. If the resources are allocated successfully, ddi_dma_addr_setup(9F) passes back the DMA handle for the mapping in the location pointed to by handlep. NULL should be passed for as.
     int ddi_dma_buf_setup(dev_info_t *dip, struct buf *bp, u_int flags, int (*waitfp)(caddr_t), caddr_t arg, ddi_dma_lim_t *lim, ddi_dma_handle_t *handlep);
     ddi_dma_buf_setup(9F) allocates resources for an object described by a buf(9S) structure pointed to by bp, subject to constraints specified by lim. waitfp is a pointer to a callback function to be called later if the DMA resources cannot be allocated right away. If the resources are allocated successfully, ddi_dma_buf_setup(9F) passes back the DMA handle for the resources in the location pointed to by handlep.
     int ddi_dma_burstsizes(ddi_dma_handle_t handle);
     ddi_dma_burstsizes(9F) returns an integer that encodes the allowed burst sizes for the DMA resources specified by handle. Allowed power-of-two burst sizes are bit-encoded in the return value. For a mapping that allows only two-byte bursts, for example, the return value would be 0x2. For a mapping that allows 1-, 2-, 4-, and 8-byte bursts, the return value would be 0xf.
     int ddi_dma_coff(ddi_dma_handle_t handle, ddi_dma_cookie_t *cookiep, off_t *offp);
     ddi_dma_coff(9F) passes back, in the location pointed to by offp, an offset into a DMA object. The mapping is specified by handl
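The bit encoding returned by ddi_dma_burstsizes(9F) can be decoded with a few lines of arithmetic. The sketch below works on a plain integer and does not call any DDI routine; xx_burst_allowed() and xx_max_burst() are hypothetical helper names introduced only for this example.

```c
/*
 * Return nonzero if a burst of nbytes is permitted by the encoded
 * value. Each set bit stands for a burst size equal to that bit's
 * value, so 0x2 means "2-byte bursts" and 0xf means "1, 2, 4, 8".
 */
static int
xx_burst_allowed(unsigned burstsizes, unsigned nbytes)
{
    /* Only powers of two can be encoded. */
    if (nbytes == 0 || (nbytes & (nbytes - 1)) != 0)
        return 0;
    return (burstsizes & nbytes) != 0;
}

/* Largest burst size, in bytes, permitted by the encoded value. */
static unsigned
xx_max_burst(unsigned burstsizes)
{
    unsigned best = 0;
    unsigned bit;

    for (bit = 1; bit != 0; bit <<= 1) {
        if (burstsizes & bit)
            best = bit;
    }
    return best;
}
```

A driver could use a test like this to pick the largest burst size that both the mapping and its own DMA engine support.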
133. are used in the DDI/DKI to describe aspects of a DMA transaction. These include:
     DMA Object: Memory that is the source or destination of a DMA transfer.
     DMA Handle: An opaque object returned from a successful DMA setup call. The DMA handle can be used in successive DMA subroutine calls to refer to the DMA object.
     DMA Window: A DMA window describes all or a portion of a DMA object that is ready to accept data transfers.
     DMA Segment: A DMA segment is a contiguous portion of a DMA window that is entirely addressable by the device.
     DMA Cookie: A ddi_dma_cookie(9S) structure (ddi_dma_cookie_t) describes a DMA segment. It contains DMA addressing information required to program the DMA engine.
     Rather than knowing that a platform needs to map an object (typically a memory buffer) into a special DMA area of the kernel address space, device drivers instead allocate DMA resources for the object. The DMA routines then perform any platform-specific operations needed to set the object up for DMA access. The driver receives a DMA handle to identify the DMA resources allocated for the object. This handle is opaque to the device driver: the driver must save the handle and pass it in subsequent calls to DMA routines, but should not interpret it in any way. Operations are defined on a DMA handle that provide the following services: manipulating DMA resources; synchronizing DMA objects; retrieving attributes of th
134. as completed; when a resource might become available; when a timeout period has expired. Transfer completion callbacks perform the tasks usually done in an interrupt service routine. In some sense, callback functions are similar to entry points. The functions that allow callbacks expect the callback function to do certain things. In the case of DMA routines, a callback function must return a value indicating whether the callback function wants to be rescheduled in case of a failure. Callback functions execute as a separate thread. They must consider all the usual multithreading issues. Note: All scheduled callback functions must be canceled before a device is detached.
     Interrupt Handling
     The Solaris 2.x DDI/DKI addresses these aspects of device interrupt handling: registering device interrupts with the system; removing device interrupts from the system. Interrupt information is contained in a property called interrupts (or intr on x86 platforms; see isa(4)), which is either provided by the PROM of a self-identifying device or in a hardware configuration file. See sbus(4), vme(4), and "Properties" on page 59 for more information. Since the internal implementation of interrupts is an architectural detail, special interrupt cookies are used to allow drivers to perform interrupt-related tasks. The types of cookies for interrupts are: Device interrup
135. asier to allocate DMA resources for use with kernel virtual addresses (ddi_dma_addr_setup(9F)) and buf(9S) structures (ddi_dma_buf_setup(9F)). The setup functions pass back a pointer to a DMA handle, which identifies the allocated DMA resources in future calls to other DMA handling functions. The DMA setup functions take a pointer to a DMA limits structure as an argument. The DMA limits structure allows any constraints which the device's DMA controller may impose on DMA transfers to be specified, such as a limited transfer size. The DMA setup functions also provide a callback mechanism where a function can be specified to be called later if the requested mapping can't be set up immediately. The DMA window functions allow resources to be allocated for a large object. The resources can be moved from one part of the object to another by moving the DMA window. The DMA engine functions allow drivers to manipulate the system DMA engine, if there is one. These are currently used on x86 systems.
     int ddi_dma_addr_setup(dev_info_t *dip, struct as *as, caddr_t addr, u_int len, u_int flags, int (*waitfp)(caddr_t), caddr_t arg, ddi_dma_lim_t *lim, ddi_dma_handle_t *handlep);
     ddi_dma_addr_setup(9F) allocates resources for an object of length len at kernel address addr, subject to any constraints specified by lim. waitfp is a pointer to a callback function to be called later if the DMA resources c
136. asses of devices, such as frame buffers or disks, must support standard sets of I/O control requests. These standard I/O control interfaces are documented in the Solaris 2.4 Reference Manual AnswerBook. For example, fbio(7) documents the I/O controls that frame buffers must support, and dkio(7) documents standard disk I/O controls. See "Miscellaneous I/O Control" on page 165 for more information on I/O control. Note: The I/O control commands in section 7 are not part of the Solaris 2.x DDI/DKI.
     Memory Mapping
     For certain devices, such as frame buffers, it is more efficient for application programs to have direct access to device memory. Applications can map device memory into their address spaces using the mmap(2) system call. To support memory mapping, device drivers implement segmap(9E) and mmap(9E) entry points. See Chapter 11, "Device Context Management", for details. Drivers that define an mmap(9E) entry point usually do not define read(9E) and write(9E) entry points, since application programs perform I/O directly to the devices after calling mmap(2). See Chapter 11, "Device Context Management", for more information on I/O control.
     Device Polling
     The poll(2) system call allows application programs to monitor or poll a set of file descriptors for certain conditions or events. poll(2) is used to find out whether data are available to be read from the file descriptors or whether
137. at instance number the system has assigned to a device instance, the name of the dev_info node for the device, and the dev_info node of the device's parent.
     int ddi_dev_is_sid(dev_info_t *dip);
     ddi_dev_is_sid(9F) returns DDI_SUCCESS if the device identified by dip is self-identifying (see "Device Identification" on page 14). Otherwise, it returns DDI_FAILURE.
     int ddi_get_instance(dev_info_t *dip);
     ddi_get_instance(9F) returns the instance number assigned by the system for the device instance specified by dip.
     char *ddi_get_name(dev_info_t *dip);
     ddi_get_name(9F) returns a pointer to a character string that is the name of the dev_info tree node specified by dip. ddi_get_name(9F) should be called in the identify(9E) entry point, and the result compared to the name of the device.
     dev_info_t *ddi_get_parent(dev_info_t *dip);
     ddi_get_parent(9F) returns the dev_info_t pointer for the parent dev_info node of the passed node, identified by dip.
     int ddi_slaveonly(dev_info_t *dip);
     ddi_slaveonly(9F) returns DDI_SUCCESS if the device indicated by dip is installed in a slave-access-only bus slot. It returns DDI_FAILURE otherwise.
     DMA Handling
     These interfaces allocate and release DMA resources for devices capable of directly accessing system memory. The family of setup functions are all wrappers around the main setup function, ddi_dma_setup(9F). The wrappers make it e
138. ately by kstat_runq_enter(9F).
     void kstat_runq_to_waitq(kstat_io_t *kiop);
     kstat_runq_to_waitq(9F) is used to update the kstat_io(9S) structure pointed to by kiop, indicating that the request is transitioning from one state to the next. kstat_runq_to_waitq(9F) is used when a driver would normally call kstat_runq_exit(9F) followed immediately by a call to kstat_waitq_enter(9F).
     These interfaces dynamically allocate memory for the driver to use.
     void *kmem_alloc(size_t size, int flag);
     kmem_alloc(9F) allocates a block of kernel virtual memory of length size and returns a pointer to it. If flag is KM_SLEEP, kmem_alloc(9F) may block waiting for memory to become available. If flag is KM_NOSLEEP, kmem_alloc(9F) returns NULL if the request cannot be satisfied immediately.
     void kmem_free(void *cp, size_t size);
     kmem_free(9F) releases a block of memory of length size starting at address cp that was previously allocated by kmem_alloc(9F). size must be the original amount allocated.
     void *kmem_zalloc(size_t size, int flags);
     kmem_zalloc(9F) calls kmem_alloc(9F) to allocate a block of memory of length size, and calls bzero(9F) on the block to zero its contents before returning its address.
     Polling
     These interfaces support the poll(2) system call, which provides a mechanism for application programs to poll character-oriented devices, inquiring about
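Two points in the memory allocation descriptions above lend themselves to a small user-space sketch: a zeroing allocator is an allocation followed by a clear, and the size given at release time must match the size given at allocation time. The code below is an analogy built on malloc(3), not the kernel allocator; the xx_ names and the size-recording header are assumptions made so that the mismatch can be detected and shown.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header placed in front of each block; the union keeps the payload aligned. */
union xx_hdr {
    size_t size;
    max_align_t align;
};

/* Allocate size bytes and remember the size (stand-in for kmem_alloc). */
static void *
xx_alloc(size_t size)
{
    union xx_hdr *h = malloc(sizeof (*h) + size);

    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Allocate, then zero the block (stand-in for kmem_zalloc). */
static void *
xx_zalloc(size_t size)
{
    void *p = xx_alloc(size);

    if (p != NULL)
        memset(p, 0, size);
    return p;
}

/* Release a block; returns -1 without freeing if size does not match. */
static int
xx_free(void *cp, size_t size)
{
    union xx_hdr *h = (union xx_hdr *)cp - 1;

    if (h->size != size)
        return -1;
    free(h);
    return 0;
}
```

The kernel routines do not report a mismatch in this way; the return value here exists only to make the "size must be the original amount allocated" rule visible in a test.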
139. ation
     Multiprocessing Changes Since SunOS 4.x
     Here is a simplified view of how the earlier releases of the SunOS kernel ran on multiprocessors: only one processor could run kernel code at any one time, and this was enforced by using a master lock around the entire kernel. When a processor wanted to execute kernel code, it acquired the master lock, blocking other processors from accessing kernel code. It released the lock on exiting the kernel.
     Figure 4-2 SunOS 4.x kernels on a multiprocessor (four CPUs, CPU0 to CPU3, each with user and kernel regions; CPU1: acquire master lock, run code, release master lock)
     In Figure 4-2, CPU1 executes kernel code. All other processors are locked out of the kernel; the other processors could, however, run user code. In SunOS 5.x, instead of one master lock, there are many locks that protect smaller regions of code or data. In the example shown in Figure 4-3, there is a kernel lock that controls access to data structure A, and another that controls access to data structure B. Using these locks, only one processor at a time can be executing code dealing with data structure A, but another could be accessing data within structure B. This allows a greater degree of concurrency.
140. ations with the tape removed, attempting writes with the write protect on, and removing power during operations. Tape drivers typically implement exclusive-access open(9E) calls, which should be tested by having a second process try to open the device while a first process already has it open.
     Disk Drivers
     Disk drivers should be tested in both the raw and block device modes. For block device tests, a new file system should be created on the device and mounted. Multiple file operations can be performed on the device at this time. Note: The file system uses a page cache, so reading the same file over and over again will not really be exercising the driver. The page cache can be forced to retrieve data from the device by memory mapping the file (with mmap(2)) and using msync(2) to invalidate the in-memory copies. Another (unmounted) partition of the same size can be copied to the raw device, and then commands such as fsck(1M) can be used to verify the correctness of the copy. The new partition can also be mounted and compared to the old one on a file-by-file basis.
     Asynchronous Communication Drivers
     Asynchronous drivers can be tested at the basic level by setting up a login line to the serial ports. A good start is if a user can log in on this line. To sufficiently test an asynchronous driver, however, all of the I/O control functions must be tested, and many interrupts at high speed must occur. A test involving a loopback serial cable a
141. be
               return (DDI_PROBE_DONTCARE);
          instance = ddi_get_instance(dip);	/* assigned instance */
          if (ddi_intr_hilevel(dip, inumber)) {
               cmn_err(CE_CONT,
                   "?xx driver does not support high level interrupts."
                   " Probe failed.");
               return (DDI_PROBE_FAILURE);
          }
          /* Map device registers and try to contact device */
          if (ddi_map_regs(dip, rnumber, &reg_addr, offset, len) != 0)
               return (DDI_PROBE_FAILURE);
          if (ddi_peekc(dip, reg_addr, NULL) != DDI_SUCCESS)
               goto failed;
          /* free allocated resources */
          ddi_unmap_regs(dip, rnumber, &reg_addr, offset, len);
          if (device is present and ready for attach)
               return (DDI_PROBE_SUCCESS);
          else if (device is present but not ready for attach)
               return (DDI_PROBE_PARTIAL);
          else	/* device is not present */
               return (DDI_PROBE_FAILURE);
     failed:
          /* free allocated resources */
          ddi_unmap_regs(dip, rnumber, &reg_addr, offset, len);
          return (DDI_PROBE_FAILURE);
     }
     The string printed in the high-level interrupt case begins with a '?' character. This causes the message to be printed only if the kernel was booted with the verbose (-v) flag. (See kernel(1M).) Otherwise, the message only goes into the message log, where it can be seen by running dmesg(1M). Probing the device registers is device specific. The driver probably has to perform a series of tests of the hardware to assure that the hardware is reall
142. beginning with the character '+' are reserved for use by future revisions of IEEE 1275-1994. By convention, underscores are not used in property names; use a hyphen (-) instead. Also by convention, property names ending with the question mark character (auto-boot?) contain values that are strings, typically true or false. A driver can request a property from its parent, which in turn may ask its parent. The driver can control whether the request can go higher than its parent. For example, the esp driver maintains an integer-sized property called targetX-sync-speed. The prtconf(1M) command, in its verbose mode, displays driver properties. The following example shows a partial listing for the esp driver:
          test% prtconf -v
          ...
          esp, instance #0
          Driver software properties:
          name target2-sync-speed length 4
          value <0x00000fa0>
     The property interface can be used to:
     Create a property with ddi_prop_create(9F). This usually is performed in attach(9E).
     Retrieve a property with ddi_prop_op(9F) or one of the following more specific routines: ddi_getproplen(9F), to retrieve the length of a property; ddi_getprop(9F), for boolean and integer properties; ddi_getlongprop(9F) and ddi_getlongprop_buf(9F), for other property sizes.
     Change the value of a property with ddi_prop_modify(9F).
     Explicitly undefine (but not remove) a property with ddi_prop_undefine(9F).
143. bject is used in the next transfer. However, if the same object is always used, the resources may be allocated once and continually reused as long as there are intervening calls to ddi_dma_sync(9F).
     Cancelling DMA Callbacks
     DMA callbacks cannot be cancelled. This requires some additional code in the driver's detach(9E) routine, since it must not return DDI_SUCCESS if there are any outstanding callbacks. When DMA callbacks occur, the detach(9E) routine must wait for the callback to run and must prevent it from rescheduling itself. This can be done using additional fields in the state structure:
          int cancel_callbacks;		/* detach(9E) sets this to prevent callbacks from rescheduling themselves */
          int callback_count;		/* number of outstanding callbacks */
          kmutex_t callback_mutex;	/* protects callback_count and cancel_callbacks */
          kcondvar_t callback_cv;	/* condition is that callback_count is zero; detach(9E) waits on it */
     Code Example 7-4 Cancelling DMA callbacks
          static int
          xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
          {
               ...
               mutex_enter(&xsp->callback_mutex);
               xsp->cancel_callbacks = 1;
               while (xsp->callback_count > 0) {
                    cv_wait(&xsp->callback_cv, &xsp->callback_mutex);
               }
               mutex_exit(&xsp->callback_mutex);
               ...
          }
          static int
          xxstrategy(struct buf *bp)
          {
               ...
               mutex_enter(&xsp->callback_mutex);
               xsp->bp = bp;
               er
144. ble driver module from the object modules, use ld(1) with the -r flag.
     Debugging Tools
     adb(1), kadb(1M), and crash(1M) are essentially the same as they were in SunOS 4.x, though there are new macros. To debug a live kernel, use /dev/ksyms (see ksyms(7)) instead of the kernel name, which used to be /vmunix:
          # adb -k /dev/ksyms /dev/mem
     See "Debugging Tools" on page 245 for more information.
     ANSI C
     The unbundled Sun C compiler is now ANSI C compliant. Two important ANSI C features device driver writers should use are the volatile keyword and function prototyping.
     volatile
     volatile is a new ANSI C keyword. It is used to prevent the optimizer from removing what it thinks are unnecessary accesses to objects. All device registers should be declared volatile. As an example, if the device has a control register that requires two consecutive writes to get it to do something, the optimizer could decide that the first write is unnecessary, since the value is unused if there is no intervening read access. Note: It is not an error to declare something volatile unnecessarily.
     Function Prototypes
     ANSI C provides function prototypes. This allows the compiler to check the type and number of arguments to functions, and avoids default argument promotions. To prototype functions, declare the type and name of each function in the function definition. Then provide a prototype
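The two ANSI C features discussed above can be seen together in a short sketch of the "two consecutive writes" case. The register is modelled as an ordinary byte so the code can run outside the kernel; XX_CSR_GO and xx_start() are hypothetical names, and a real driver would obtain the register address from its mapping routines rather than from a local variable.

```c
#include <stdint.h>

#define XX_CSR_GO 0x01u  /* hypothetical "start" bit in the control register */

/* Prototype: lets the compiler check argument types at each call site. */
static void xx_start(volatile uint8_t *csr);

/*
 * Write the start bit twice. Because csr points to a volatile object,
 * the compiler must emit both stores; without volatile, the first
 * store could legally be removed as dead.
 */
static void
xx_start(volatile uint8_t *csr)
{
    *csr = XX_CSR_GO;  /* first write */
    *csr = XX_CSR_GO;  /* second write */
}
```

Declaring the pointer target volatile costs nothing when it turns out to be unnecessary, which is why the note above recommends it for every device register.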
145. c files. One possible convention is to name the private header file xximpl.h and the public header file xxio.h. Code Example 3-4 and Code Example 3-5 show the layout of these headers.
     Code Example 3-4 xximpl.h Header File
          /* xximpl.h */
          struct device_reg {
               /* fields */
          };
          /* #define bits of the device registers */
          struct xxstate {
               /* fields */
          };
          /* related #define statements */
     Code Example 3-5 xxio.h Header File
          /* xxio.h */
          struct xxioctlreq {
               /* fields */
          };
          /* etc. */
          #define XXIOC		('b' << 8)
          #define XXIOCTL_1	(XXIOC | 1)	/* description */
          #define XXIOCTL_2	(XXIOC | 2)	/* description */
     xx.c Files
     A .c file for a device driver contains the data declarations and the code for the entry points of the driver. It contains the #include statements the driver needs, declares extern references, declares local data, sets up the cb_ops and dev_ops structures, declares and initializes the module configuration section, makes any other necessary declarations, and defines the driver entry points. The following sections describe these driver components. Code Example 3-6 shows the layout of an xx.c file.
     Code Example 3-6 xx.c File
          /* xx.c */
          #include "xximpl.h"
          #include "xxio.h"
          #include <sys/ddi.h>	/* must include these two files */
          #include <sys/sunddi.h>	/* and they must be the last system includes */
146. cac 0xff2ac764 0xff035b000 0xf00dfe8 198 ufs_sync 0x0 0x10000 Oxf 007980 Oxle 0xf00d8e38 0xf00d8e38 8 fsflush 0xf00dd4330 0xf006e0830 0x1af 0x2 0xf00 3ab8 0xf00508254 568
          kadb[0]: :s
          stopped at scsi_transport+4: ld [%i0 + 0x4], %o0
          kadb[0]: $b
          breakpoints
          count   bkpt             command
          1       scsi_transport
          kadb[0]: scsi_transport:d
          kadb[0]: :c
     Conditional Breakpoints
     Breakpoints can also be set to occur only if a certain condition is met. By providing a command, the breakpoint will be taken only if the count is reached or the command returns zero. For example, a breakpoint that occurs only on certain I/O controls could be set in the driver's ioctl(9E) routine. Here is an example of breaking only in the sdioctl() routine if the DKIOCGVTOC (get volume table of contents) I/O control occurs:
          kadb[0]: sdioctl+4,0:b <i1-0x40B
          kadb[0]: $b
          breakpoints
          count   bkpt             command
          0       sdioctl+4        <i1-0x40B
          kadb[0]: :c
     Adding four to sdioctl skips to the second instruction in the routine, bypassing the save instruction that establishes the stack. The i1 refers to the first input register, which is the second parameter to the routine (the cmd argument of ioctl(9E)). The count of zero is impossible to reach, so it stops only when the command returns zero, which is when i1 - 0x40B is zero. This means i1 contains 0x40B, the value of the ioctl() command determined by examining th
147. ce. It supports a mechanism to manage the list of I/O requests so as to optimize disk access for a file system. See "Asynchronous Data Transfers" on page 184 for a description of enqueuing an I/O request. The diskhd structure is used to manage a linked list of I/O requests.

    struct diskhd {
        long    b_flags;                /* not used, needed for consistency */
        struct buf *b_forw, *b_back;    /* queue of unit queues */
        struct buf *av_forw, *av_back;  /* queue of bufs for this unit */
        long    b_bcount;               /* active flag */
    };

The diskhd data structure has two buf pointers which can be manipulated by the driver. The av_forw pointer points to the first active I/O request. The second pointer, av_back, points to the last active request on the list. A pointer to this structure is passed as an argument to disksort(9F), along with a pointer to the current buf structure being processed. The disksort(9F) routine sorts the buf requests in a fashion that optimizes disk seek and then inserts the buf pointer into the diskhd list. The disksort program uses the value that is in b_resid of the buf structure as a sort key. It is up to the driver to set this value. Most Sun disk drivers use the cylinder group as the sort key. This tends to optimize the file system read-ahead accesses. Once data has been added to the diskhd list, the device needs to transfer the data. If the device is not busy processing a request, the xxstart() routine pulls the first buf structure off t
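The idea of keeping the request queue ordered by a driver-chosen key can be sketched at user level. This is a simplification under stated assumptions: struct xbuf and struct xqueue are hypothetical stand-ins for buf(9S) and diskhd, and the insertion is a plain ascending sort on b_resid rather than the actual algorithm used by disksort(9F).

```c
#include <stddef.h>

/* Simplified stand-ins for buf(9S) and diskhd; not the kernel types. */
struct xbuf {
    long b_resid;           /* sort key set by the driver */
    struct xbuf *av_forw;   /* next request in the queue */
};

struct xqueue {
    struct xbuf *av_forw;   /* first request on the list */
    struct xbuf *av_back;   /* last request on the list */
};

/* Insert bp so that the queue stays ordered by ascending b_resid. */
static void xq_insert_sorted(struct xqueue *q, struct xbuf *bp)
{
    struct xbuf **pp = &q->av_forw;

    while (*pp != NULL && (*pp)->b_resid <= bp->b_resid)
        pp = &(*pp)->av_forw;
    bp->av_forw = *pp;
    *pp = bp;
    if (bp->av_forw == NULL)
        q->av_back = bp;    /* bp is the new tail */
}

/* Remove and return the first request, or NULL if the queue is empty. */
static struct xbuf *xq_pop(struct xqueue *q)
{
    struct xbuf *bp = q->av_forw;

    if (bp != NULL) {
        q->av_forw = bp->av_forw;
        if (q->av_forw == NULL)
            q->av_back = NULL;
        bp->av_forw = NULL;
    }
    return bp;
}
```

In this sketch a start routine would call xq_pop() to obtain the next request once the device is idle, which mirrors the role of pulling the first buf off the diskhd list.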
148. ce allocation function. It allocates resources based on the DMA request structure pointed to by dmareqp and passes back a DMA handle that identifies the mapping in the location pointed to by handlep.

    int ddi_dma_free(ddi_dma_handle_t handle);

ddi_dma_free(9F) calls ddi_dma_sync(9F) and frees the resources associated with the DMA mapping identified by handle.

    int ddi_dma_sync(ddi_dma_handle_t handle, off_t off, u_int length, u_int type);

ddi_dma_sync(9F) assures that any CPU and the device see the same data, starting at off bytes into the DMA resources identified by handle and continuing for length bytes. type should be:
DDI_DMA_SYNC_FORDEV to make sure the device sees any changes made by a CPU.
DDI_DMA_SYNC_FORCPU to make sure all CPUs see any changes made by the device.
DDI_DMA_SYNC_FORKERNEL, similar to DDI_DMA_SYNC_FORCPU, except that only the kernel view of the object is synchronized.

    int ddi_dmae_alloc(dev_info_t *dip, int chnl, int (*dmae_waitfp)(), caddr_t arg);

ddi_dmae_alloc(9F) allocates a DMA channel from the system DMA engine. It must be called prior to any operation on a channel.

    int ddi_dmae_release(dev_info_t *dip, int chnl);

ddi_dmae_release(9F) releases a previously allocated DMA channel.

    int ddi_dmae_prog(dev_info_t *dip, struct ddi_dmae_req *dmaereqp, ddi_dma_cookie_t *cookiep, int chnl);

The ddi_dmae_prog(9F) function program
149. ce code interface. If a driver uses only kernel routines and structures described in Section 9 of the man Pages(9): DDI and DKI Overview, it is called Solaris 2.4 DDI/DKI compliant. A Solaris 2.4 DDI/DKI-compliant driver is likely to be binary compatible across Sun Solaris platforms with the same processor (SPARC), and binary compatible with future releases of Solaris on platforms the driver works on.

Things to Avoid

Many architecture-specific features have been hidden from driver writers behind DDI/DKI interfaces. Specific examples are elements of the dev_info structure, user structure, proc structure, and page tables. If the driver has been using unadvertised interfaces, it must be changed to use DDI/DKI interfaces that provide the required functionality. If the driver continues to use unadvertised interfaces, it loses all the source and binary compatibility features of the DDI/DKI. For example, previous releases had an undocumented routine called as_fault() that could be used to lock down user pages in memory. This routine still exists, but is not part of the DDI/DKI, so it should not be used. The only documented way to lock down user memory is to use physio(9F).

Writing Device Drivers, August 1994

Do not use any undocumented fields of structures. Documented fields are in Section 9S of the man Pages(9S): DDI and DKI Data Structures. Do not use fields, structures, variables, or macros just because they are in a header file. Dynamic
150. ces that have different interrupts for different events. A communications controller may have one interrupt for receive ready and one for transmit ready. The device driver normally knows how many interrupts the device has, but if the driver has to support several variations of a controller, it can call ddi_dev_nintrs(9F) to find out the number of device interrupts. For a device with n interrupts, the interrupt numbers range from 0 to n-1.

Bus Interrupt Levels

Buses prioritize device interrupts at one of several bus interrupt levels. These bus interrupt levels are then associated with different processor interrupt levels. For example, SBus devices that interrupt at SBus level 7 interrupt at SPARC level 9 on SPARCstation 2 systems, but interrupt at SPARC level 13 on SPARCstation 10 systems.

High-Level Interrupts

A bus interrupt level that maps to a CPU interrupt priority level above the scheduler priority level is called a high-level interrupt. High-level interrupts must be handled without using system services that manipulate threads. In particular, the only kernel routines that high-level interrupt handlers are allowed to call are mutex_enter(9F) and mutex_exit(9F) on a mutex initialized with an interrupt block cookie associated with the high-level interrupt, and ddi_trigger_softintr(9F). A bus interrupt level by itself does not determine whether a device interrupts at high level: a gi
151. ch(9F) in the Solaris 2.4 Reference Manual AnswerBook for information on controlling these options for a particular host adapter. The default setting for scsi_options has these values set: SCSI_OPTIONS_DR, SCSI_OPTIONS_SYNC, SCSI_OPTIONS_PARITY, SCSI_OPTIONS_TAG, SCSI_OPTIONS_FAST, SCSI_OPTIONS_WIDE.

Tagged Queueing

For a definition of tagged queueing, refer to the SCSI-2 specification. To support tagged queueing, first check the scsi_options flag SCSI_OPTIONS_TAG to see if tagged queueing is enabled globally. Next, check to see if the target is a SCSI-2 device and whether it has tagged queueing enabled. If this is all true, attempt to enable tagged queueing by using scsi_ifsetcap(9F). Code Example B-1 shows an example of supporting tagged queueing.

Code Example B-1 Supporting SCSI Tagged Queueing

    #define ROUTE    &sdp->sd_address
    /*
     * If SCSI-2 tagged queueing is supported by the disk drive and
     * by the host adapter then we will enable it.
     */
    xsp->tagflags = 0;
    if ((scsi_options & SCSI_OPTIONS_TAG) &&
        (devp->sd_inq->inq_rdf == RDF_SCSI2) &&
        (devp->sd_inq->inq_cmdque)) {
        if (scsi_ifsetcap(ROUTE, "tagged-qing", 1, 1) == 1) {
            xsp->tagflags = FLAG_STAG;
            xsp->throttle = 256;
        } else if (scsi_ifgetcap(ROUTE, "untagged-qing", 0) == 1) {
            xsp->dp->options |= XX_QUEUEING;
            xsp->throttle = 3;
        } else {
            xsp->dp->
152. ch as those for graphics hardware, provide user processes with direct access to the device. These devices often require that only one process at a time accesses the device. This chapter describes the set of interfaces that allow device drivers to manage access to such devices.

What Is a Device Context?

The context of a device is the current state of the device hardware. The device context for a process is managed by the device driver on behalf of the process. The device driver must maintain a separate device context for each process that accesses the device. It is the device driver's responsibility to restore the correct device context when a process accesses the device.

Context Management Model

An accelerated frame buffer is an example of a device that allows user processes (such as graphics applications) to directly manipulate the control registers of the device through memory-mapped access. Since these processes are not using the traditional I/O system calls (read(2), write(2), and ioctl(2)), the device driver is no longer called when a process accesses the device. However, it is important that the device driver be notified when a process is about to access a device so that it can restore the correct device context and provide any needed synchronization. To resolve this problem, the device context management interfaces allow a device driver to be notified when a user process accesses memory-mapped regions of the device an
153. check the device's status register to determine if the transfer completed without error. If an error occurred, the handler should indicate the appropriate error with bioerror(9F). The handler should also clear the pending interrupt for the device and then complete the transfer by calling biodone(9F). As the final task, the handler clears the busy flag and calls cv_signal(9F) or cv_broadcast(9F) on the condition variable, signaling that the device is no longer busy. This allows other threads waiting for the device (in strategy(9E)) to proceed with the next data transfer.

Code Example 9-5 Synchronous block driver interrupt routine

    static u_int
    xxintr(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;
        struct buf *bp;
        u_char temp, status;

        mutex_enter(&xsp->mu);
        status = xsp->regp->csr;
        if (!(status & INTERRUPTING)) {
            mutex_exit(&xsp->mu);
            return (DDI_INTR_UNCLAIMED);
        }
        /* Get the buf responsible for this interrupt */
        bp = xsp->bp;
        xsp->bp = NULL;
        /*
         * This example is for a simple device which either
         * succeeds or fails the data transfer, indicated in the
         * command/status register.
         */
        if (status & DEVICE_ERROR) {
            /* failure */
            bp->b_resid = bp->b_bcount;
            bp->b_error = EIO;
            bp->b_flags |= B_ERROR;
        } else {
            /* success */
            bp->b_resid = 0;
        }
        xsp->regp->csr = CLEAR_INTERRUPT;
        /* Read the csr to flush any hardware st
154. checks: The device can be opened, for example, it is on-line and ready. The device can be opened as requested: the device supports the operation, and the device's current state does not conflict with the request. The caller has permission to open the device.

Code Example 9-2 Block driver open(9E) routine

    static int
    xxopen(dev_t *devp, int flags, int otyp, cred_t *credp)
    {
        int instance;
        struct xxstate *xsp;

        instance = getminor(*devp);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (ENXIO);
        mutex_enter(&xsp->mu);
        /*
         * only honor FEXCL. If a regular open or a layered open
         * is still outstanding on the device, the exclusive open
         * must fail.
         */
        if ((flags & FEXCL) && (xsp->open || xsp->nlayered)) {
            mutex_exit(&xsp->mu);
            return (EAGAIN);
        }
        switch (otyp) {
        case OTYP_LYR:
            xsp->nlayered++;
            break;
        case OTYP_BLK:
            xsp->open = 1;
            break;
        default:
            mutex_exit(&xsp->mu);
            return (EINVAL);
        }
        mutex_exit(&xsp->mu);
        return (0);
    }

The otyp argument is used to specify the type of open on the device. OTYP_BLK is the typical open type for a block device. A device may be opened several times with otyp set to OTYP_BLK, though close(9E) will only be called once the final close of type OTYP_BLK has occurred for the device. otyp is set to OTYP_LYR if the device is being used as a layered device. For e
155. chitectures that conform to the Solaris 2.x DDI/DKI.

Chapter Overview

Chapter 1, "Overview of the SunOS Kernel," provides general background information about the SunOS kernel and the interfaces provided for device drivers.
Chapter 2, "Hardware Overview," discusses hardware issues related to device drivers.
Chapter 3, "Overview of SunOS Device Drivers," gives an outline of the kinds of device drivers and their basic structure.
Chapter 4, "Multithreading," describes the mechanisms of the SunOS multithreaded kernel that are of interest to driver writers.
Chapter 5, "Autoconfiguration," describes the support a driver must provide for autoconfiguration.
Chapter 6, "Interrupt Handlers," describes the interrupt handling mechanisms. These include registering, servicing, and removing interrupts.
Chapter 7, "DMA," describes direct memory access (DMA) and the DMA interfaces.
Chapter 8, "Drivers for Character Devices," describes the structure and functions of a driver for a character-oriented device.
Chapter 9, "Drivers for Block Devices," describes the structure and functions of a driver for a block-oriented device.
Chapter 10, "SCSI Target Drivers," outlines the Sun Common SCSI Architecture and describes the additional requirements of SCSI target drivers.
Chapter 11, "Device Context Management," describes the set of interfaces that allow device drivers to manage the context of use
156. ck ticks since the last reboot. The driver, however, usually has a maximum number of seconds or microseconds to wait, so this value is converted to clock ticks with drv_usectohz(9F) and added to the value from drv_getparm(9F). Code Example 4-3 shows how to use cv_timedwait(9F) to wait up to five seconds to access the device before returning EIO to the caller.

Code Example 4-3 Using cv_timedwait(9F)

    clock_t cur_ticks, to;

    mutex_enter(&xsp->mu);
    while (xsp->busy) {
        drv_getparm(LBOLT, &cur_ticks);
        to = cur_ticks + drv_usectohz(5000000); /* 5 seconds from now */
        if (cv_timedwait(&xsp->cv, &xsp->mu, to) == -1) {
            /*
             * The timeout time 'to' was reached without the
             * condition being signalled.
             */
            /* tidy up and exit */
            mutex_exit(&xsp->mu);
            return (EIO);
        }
    }
    xsp->busy = 1;
    mutex_exit(&xsp->mu);

cv_wait_sig()

There is always the possibility that either the driver accidentally waits for a condition that will never occur (as described in "cv_timedwait()" on page 81) or that the condition will not happen for a long time. In either case, the user may want to abort the thread by sending it a signal. Whether the signal causes the driver to wake up depends on the driver. cv_wait_sig(9F) allows a signal to unblock the thread. This allows the user to break out of potentially long waits by sending a signal to the thread with kill(1) or by typing the interrupt character
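The deadline arithmetic behind the timeout can be shown in isolation. The sketch below is a user-level illustration, not the kernel routine: usec_to_ticks is a hypothetical stand-in for drv_usectohz(9F), the clock rate hz is passed in explicitly, and rounding up is an assumption made here so that a short wait never becomes zero ticks. It also assumes hz divides one million evenly.

```c
/* Hypothetical stand-in for drv_usectohz(9F): convert microseconds to
 * clock ticks for a clock running at hz ticks per second, rounding up. */
static long usec_to_ticks(long usec, long hz)
{
    long usec_per_tick = 1000000L / hz;

    return (usec + usec_per_tick - 1) / usec_per_tick;
}

/* Absolute deadline in ticks: the current tick count plus the wait. */
static long deadline_ticks(long cur_ticks, long usec, long hz)
{
    return cur_ticks + usec_to_ticks(usec, hz);
}
```

With a 100 Hz clock, a five-second wait corresponds to 500 ticks added to the current tick count, which is the value a driver would hand to cv_timedwait(9F) as the absolute timeout.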
157. cked the data in memory.
3. Allocate DMA resources for the object.
4. Retrieve the next DMA window with ddi_dma_nextwin(9F).
5. Retrieve the next segment in the window with ddi_dma_nextseg(9F).
6. Get a DMA cookie for the segment with ddi_dma_segtocookie(9F).
7. Program the DMA engine on the device and start it (this is device specific). When the transfer is complete, continue the bus master operation.
8. Perform any required object synchronizations.
9. Transfer the rest of the window by repeating from Step 5.
10. Transfer the rest of the object by repeating from Step 4.
11. Release the DMA resources.

First-party DMA

In general, the following steps must be performed for first-party DMA:
1. Allocate a DMA channel.
2. Configure the channel with ddi_dmae_1stparty(9F).
3. Lock the DMA objects in memory. This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.
4. Allocate DMA resources for the object.
5. Retrieve the next DMA window with ddi_dma_nextwin(9F).
6. Retrieve the next segment in the window with ddi_dma_nextseg(9F).
7. Get a DMA cookie for the segment with ddi_dma_segtocookie(9F). Program the DMA engine and start it. When the transfer is complete, continue the DMA operation.
8. Perform any required object synchronizations.
9. Transfer the rest of the window by
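The control flow of the window and segment steps is a pair of nested loops. The following user-level sketch only counts iterations; struct dma_plan and dma_walk are hypothetical names, the fixed window and segment sizes are assumptions (a real system decides them from the DMA limits and memory layout), and both sizes must be nonzero.

```c
#include <stddef.h>

/* Hypothetical description of a DMA object; all sizes in bytes. */
struct dma_plan {
    size_t object_len;   /* total length of the object */
    size_t window_len;   /* largest window that can be mapped at once */
    size_t segment_len;  /* largest contiguous piece the engine handles */
};

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Walk the object window by window and segment by segment.
 * Returns the number of segments (one cookie each) and stores the
 * number of windows in *nwindows. */
static size_t dma_walk(const struct dma_plan *p, size_t *nwindows)
{
    size_t nsegs = 0, off = 0;

    *nwindows = 0;
    while (off < p->object_len) {              /* "next window" step */
        size_t wlen = min_size(p->window_len, p->object_len - off);
        size_t woff = 0;

        (*nwindows)++;
        while (woff < wlen) {                  /* "next segment" step */
            size_t slen = min_size(p->segment_len, wlen - woff);

            /* a real driver would program the engine with this cookie */
            nsegs++;
            woff += slen;
        }
        off += wlen;
    }
    return nsegs;
}
```

For a 10000-byte object with 4096-byte windows and 1024-byte segments, the walk visits three windows and ten segments in total.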
158. [Table of contents fragment; page numbers and unrecoverable entries omitted]
Floating Point Operations
Multiply and Divide Instructions
SPARC Architecture Manual
x86 Processor Issues
Data Alignment
Structure Member Alignment
Byte Ordering
Floating Point Operations
x86 Architecture Manuals
System Memory Model
Store Buffers
SPARC Memory Model
Bus Architectures
Device Identification
Internal Sequencing Logic
Interrupt Issues
Byte Ordering
The PROM on SPARC Machines
Open Boot PROM 2.X
Reading and Writing
The Sun Monitor
Overview of SunOS Device Drivers
What is a Device Driver?
Types of Device Drivers
Block Device Drivers
Standard Character Device Drivers
STR
159. complete. physio(9F) is still blocked, however. Examining the physio(9F) manual page points out that biodone(9F) should be called to unblock biowait(9F). This is the problem: rd_strategy() did not call biodone(9F). Adding a call to biodone(9F) before returning fixes this problem.

Once a device driver is functional, it should be thoroughly tested before it is distributed. In addition to the testing done to traditional UNIX device drivers, Solaris 2.x drivers require testing of Solaris 2.x features such as dynamic loading and unloading of drivers and multithreading.

Configuration Testing

A driver's ability to handle multiple configurations is very important, and is a part of the test process. Once the driver is working on a simple, or default, configuration, additional configurations should be tested. Depending on the device, this may be accomplished by changing jumpers or DIP switches. If the number of possible configurations is small, all of them should be tried. If the number is large, various classes of possible configurations should be defined, and a sampling of configurations from each class should be tested. The designation of such classes depends on how the different configuration parameters might interact, which in turn depends on the device and on how the driver was written. For each configuration, the basic functions must be tested, which include loading, opening, reading, writing, closing, and unloading the driver. Any function that de
160. [Index fragment; page numbers omitted] configuration; existing drivers; tools; definition of; entry points; for character-oriented devices; header files; layout structure; loadable interface; module configuration; overview; register mapping; source files; standard character; testing; types of.
device information: dev_info node; self-identifying; tree structure.
device interrupt cookie.
device interrupt handling: ddi_add_intr(9F); ddi_remove_intr(9F); interrupt block cookie.
device interrupts, types of.
device memory: accessing; mapping.
device polling: overview; poll(2) system call.
device registers: accessing; ddi_map_regs(9F); ddi_unmap_regs(9F); examples of; mapping.
device tree.
devlinks(1M) command.
disk: I/O controls; performance.
DKI. See DDI/DKI.
DMA: buffer allocation; burst sizes; callbacks; cookie; engine programming; engine restrictions; freeing resources; handle; limits; locking; object; operations; private buffer allocation; register structure; resource allocation; resource interfaces; segment; transfers; types of; window.
driver entry points: attach(9E); definition of; detach(9E); identify(9E); probe(9E); prop_op(9E).
DVMA: SBus slots that support.
161. ct. Because of padding and alignment requirements, the actual size might be larger than the requested size. ddi_dma_addr_setup(9F) requires the actual length.

ddi_dma_devalign()

After allocating DMA resources for private data buffers, ddi_dma_devalign(9F) should be used to determine the minimum required data alignment and minimum effective transfer size. Although the starting address for the DMA transfer will be aligned properly, the offset passed to ddi_dma_htoc(9F) allows the driver to start a transfer anywhere within the object, eventually bypassing alignment restrictions. The driver should therefore check the alignment restrictions prior to initiating a transfer and align the offset appropriately. The driver should also check the minimum effective transfer size. The minimum effective transfer size indicates, for writes, how much of the mapped object will be affected by the minimum access. For reads, it indicates how much of the mapped object will be accessed. For memory allocated with ddi_iopb_alloc(9F), the minimum transfer size will usually be one byte. This means that positioning randomly within the mapped object is possible. For memory allocated with ddi_mem_alloc(9F), the minimum transfer size is usually larger, as caches might be activated that only operate on entire cache lines (line size granularity).

Example:

    if (ddi_dma_devalign(xsp->handle, &align, &mineffect) == DDI_FAILURE) {
        /* error handling */
        goto failure;
162. ct buf *bp = NULL;
    int instance;

    instance = ddi_get_instance(dip);
    /* allocate a state structure and initialize it */
    xsp = ddi_get_soft_state(statep, instance);
    sdp = (struct scsi_device *)ddi_get_driver_private(dip);
    /*
     * Cross-link the state and scsi_device(9S) structures.
     */
    sdp->sd_private = (caddr_t)xsp;
    xsp->sdp = sdp;
    /* call scsi_probe(9F) again to get and validate inquiry data */
    /*
     * Allocate a request sense buffer. The buf(9S) structure
     * is set to NULL to tell the routine to allocate a new
     * one. The callback function is set to NULL_FUNC to tell
     * the routine to return failure immediately if no
     * resources are available.
     */
    bp = scsi_alloc_consistent_buf(&sdp->sd_address, NULL,
        SENSE_LENGTH, B_READ, NULL_FUNC, NULL);
    if (bp == NULL)
        goto failed;
    /*
     * Create a Request Sense scsi_pkt(9S) structure.
     */
    rqpkt = scsi_init_pkt(&sdp->sd_address, NULL, bp,
        CDB_GROUP0, 1, 0, PKT_CONSISTENT, NULL_FUNC, NULL);
    if (rqpkt == NULL)
        goto failed;
    /*
     * scsi_alloc_consistent_buf(9F) returned a buf(9S) structure.
     * The actual buffer address is in b_un.b_addr.
     */
    sdp->sd_sense = (struct scsi_extended_sense *)bp->b_un.b_addr;
    /*
     * Create a Group0 CDB for the Request Sense command
     */
    makecom_g0(rqpkt, devp, FLAG_NOPARITY, SCMD_REQUEST_SENSE,
        0, SENSE_LENGTH);
    /*
     * Fill in the rest of the scsi_pkt
163. cute the same code simultaneously. The way this should be done depends on the driver: some drivers will require special testing applications, but starting several UNIX commands in the background will be suitable for others. It depends on where the particular driver uses locks and condition variables. Testing a driver on a multiprocessor machine is more likely to expose problems than testing on a single-processor machine.

Interoperability between drivers must also be tested, particularly because different devices can share interrupt levels. If possible, configure another device at the same interrupt level as the one being tested, and test whether the driver correctly claims its own interrupts and otherwise operates correctly under the stress tests described above. Stress tests should be run on both devices at once. Even if the devices do not share an interrupt level, this test can still be valuable. For example, if serial communication devices start to experience errors while a network driver is being tested, that could indicate that the network driver is causing the rest of the system to encounter interrupt latency problems.

Performance of a driver under these stress tests should be measured using UNIX performance-measuring tools. This can be as simple as using the time(1) command along with commands used for stress tests.

DDI/DKI Compliance Testing

To assure compatibility with later releases and reliable support for the current relea
164. d_drv xx

This is a simple case in which the device identifies itself as xx and the device special files will have default ownership and permissions (0600 root sys). add_drv(1M) also allows additional names for the device (aliases) to be specified. See add_drv(1M) to determine how to add aliases and set file permissions explicitly.

Note: add_drv(1M) should not be run when installing a STREAMS module. See the STREAMS Programmer's Guide for details.

Removing the Driver

To remove a driver from the system, use rem_drv(1M), then delete the driver module and configuration file from the module path. The driver cannot be used again until it is reinstalled with add_drv(1M).

Loading Drivers

Opening a special file associated with the device driver causes the driver to be loaded. modload(1M) can also be used to load the driver into memory, but does not call any routines in the module. Opening the device is the preferred method.

Getting the Driver Module's ID

Individual drivers can be unloaded by module id. To determine the module id assigned to a driver, use modinfo(1M). Find the driver's name in the output. The first column of that entry is the driver's module ID.

    # modinfo
     Id Loadaddr   Size Info Rev Module Name
    124   211000   1df4  101   1 xx (xx driver v1.0)

The number in the Info field is the major number chosen for the driver.
165. d(int port, unsigned long *addr, int count);

These routines read data of various sizes from the I/O port with the address specified by port. The inb(9F), inw(9F), and inl(9F) functions read 8 bits, 16 bits, and 32 bits of data respectively, returning the resulting values. The repinsb(9F), repinsw(9F), and repinsd(9F) functions read multiple 8-bit, 16-bit, and 32-bit values, respectively. count specifies the number of values to be read. addr is a pointer to a buffer that will receive the input data. The buffer must be long enough to hold count values of the requested size.

    void outb(int port, unsigned char value);
    void outw(int port, unsigned short value);
    void outl(int port, unsigned long value);
    void repoutsb(int port, unsigned char *addr, int count);
    void repoutsw(int port, unsigned short *addr, int count);
    void repoutsd(int port, unsigned long *addr, int count);

These routines write data of various sizes to the I/O port with the address specified by port. The outb(9F), outw(9F), and outl(9F) functions write 8 bits, 16 bits, and 32 bits of data respectively, writing the data specified by value.

SCSI and SCSA

The repoutsb(9F), repoutsw(9F), and repoutsd(9F) functions write multiple 8-bit, 16-bit, and 32-bit values, respectively. count specifies the number of values to be written. addr is a pointer to a buffer from which the output values are fetched. These i
166. d to control accesses to the device's hardware. Synchronization and management of the various device contexts is the responsibility of the device driver. When a user process accesses a mapping, the device driver must restore the correct device context for that process. A device driver will be notified whenever one of the following events occurs on a mapping: access to a mapping by a user process; duplication of a mapping by a user process; freeing of a mapping by a user process.

Figure 11-1 is a snapshot of multiple user processes that have memory-mapped a device. Process B has been granted access to the device by the driver, and the driver no longer requires notification by process B. However, the driver does require notification if either process A or process C accesses the device.

[Figure 11-1 Device context management: user processes A, B, and C; the current context on the hardware device belongs to process B]

Writing Solaris Graphics Device Drivers, August 1994

At some point in the future, process A accesses the device. The device driver is notified of this and blocks future access to the device by process B. It then restores the device context of process A and grants access to process A. This is illustrated in Figure 11-2. At this point, the device driver requires notification if either process B or process C accesses the device. User Processes Current Conte
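The notification pattern described above amounts to a small state machine: only the process that currently owns the device context can touch the hardware without the driver hearing about it. The sketch below models that bookkeeping at user level. struct ctx_mgr and ctx_access are hypothetical names invented for illustration; a real driver would rely on the device context management interfaces rather than integer process identifiers.

```c
#define NO_OWNER (-1)

/* Hypothetical per-device bookkeeping kept by the driver. */
struct ctx_mgr {
    int owner;      /* process whose context is loaded, or NO_OWNER */
    int switches;   /* number of context restores performed so far */
};

/* Model an access to the mapping by process pid.
 * Returns 1 if the driver is notified and switches context,
 * 0 if the access goes straight to the hardware. */
static int ctx_access(struct ctx_mgr *m, int pid)
{
    if (m->owner == pid)
        return 0;       /* owner's mapping is valid: no notification */

    /* block the previous owner, then restore the context for pid */
    m->owner = pid;
    m->switches++;
    return 1;
}
```

Repeated accesses by the current owner cost nothing in this model, while every change of owner costs one context restore, which is the trade-off the figures illustrate.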
167. data may be written to the file descriptors without delay. Drivers referred to by these file descriptors must provide support for the poll(2) system call by implementing a chpoll(9E) entry point. Drivers for communication devices, such as serial ports, should support polling since they are used by applications that require synchronous notification of changes in read and write status. Many communications devices, however, are better implemented as STREAMS drivers.

STREAMS Drivers

STREAMS is a separate programming model for writing a character device. Devices that receive data asynchronously (such as terminal and network devices) are suited to a STREAMS implementation. STREAMS device drivers must provide the loading and autoconfiguration support described in Chapter 5, "Autoconfiguration." See the STREAMS Programmer's Guide for additional information on how to write STREAMS drivers.

Device Issues

Accessing Device Registers

There are two common ways of accessing device registers: through memory mapping and I/O ports. The preferred method depends on the device; it is not generally software configurable. For example, SBus and VMEbus devices do not provide I/O ports, but some ISA, MCA, and EISA devices may provide both access methods.

Memory-mapped Access

In memory-mapped access, device registers appear in memory address space and are treated as normal memory. Just as the driver needs a kernel virtual address to access physical memory
168. ddi_prop_create(makedevice(DDI_MAJOR_T_UNKNOWN, instance),
            dip, DDI_PROP_CANSLEEP, "nblocks",
            (caddr_t)&xsp->nblocks, sizeof (int)) != DDI_PROP_SUCCESS) {
            cmn_err(CE_CONT, "%s: cannot create nblocks property\n",
                ddi_get_name(dip));
            /* free resources allocated so far */
            return (DDI_FAILURE);
        }
        xsp->open = 0;
        xsp->nlayered = 0;
        return (DDI_SUCCESS);
        return (DDI_FAILURE);

Properties are associated with device numbers. In Code Example 9-1, attach(9E) builds a device number using makedevice(9F). At this point, however, only the minor number component of the device number is known, so it must use the special major number DDI_MAJOR_T_UNKNOWN to build the device number.

Controlling Device Access

This section describes aspects of the open(9E) and close(9E) entry points that are specific to block device drivers. See Chapter 8, "Drivers for Character Devices," for more information on open(9E) and close(9E).

open()

    int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp);

The open(9E) entry point is used to gain access to a given device. The open(9E) routine of a block driver is called when a user thread issues an open(2) or mount(2) system call on a block special file associated with the minor device, or when a layered driver calls open(9E). See "File I/O" on page 171 for more information. The open(9E) entry point should make the following
169. ddress of the completed command.

Callback Example

In Code Example 7-1, xxstart() is used as the callback function and the per-device state structure is given as its argument. xxstart() attempts to start the command. If the command cannot be started because resources are not available, xxstart() is scheduled to be called sometime later, when resources might be available. Since xxstart() is used as a DMA callback, it must follow these rules imposed on DMA callbacks: It must not assume that resources are available (it must try to allocate them again). It must indicate to the system whether allocation succeeded by returning 0 if it fails to allocate resources (and needs to be called again later) or 1 indicating success (so no further callback is necessary). See ddi_dma_req(9S) for a discussion of DMA callback responsibilities.

Code Example 7-1 Allocating DMA resources

    static int
    xxstart(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;
        struct device_reg *regp;
        int flags;

        mutex_enter(&xsp->mu);
        if (xsp->busy) {
            /* transfer in progress */
            mutex_exit(&xsp->mu);
            return (0);
        }
        xsp->busy = 1;
        mutex_exit(&xsp->mu);
        regp = xsp->regp;
        if (transfer is a read) {
            flags = DDI_DMA_READ;
        } else {
            flags = DDI_DMA_WRITE;
        }
        if (ddi_dma_buf_setup(xsp->dip, xsp->bp, flags, xxstart,
            (caddr_t)xsp, &limits, &xsp
170. des virtual addresses for direct memory transfers.

Handles, Windows, Segments and Cookies

A DMA handle is an opaque pointer representing an object (usually a memory buffer or address) where a device can perform DMA transfer. The handle is used in several different calls to DMA routines to identify the DMA resources allocated for the object.

An object represented by a DMA handle is completely covered by one or more DMA windows. The system uses the information in the DMA limit structure, and the memory location and alignment of the target object, to decide how to divide an object into multiple windows in order to fit the request within system resource limitations. The ddi_dma_nextwin(9F) function takes a DMA handle obtained from a DMA setup function and a previous window (or NULL for the first window) and passes back the next (or first) window of the object. An active DMA window may represent allocated resources such as intermediate buffers. The resources will be released upon the next call to ddi_dma_nextwin(9F) or when the DMA resources are freed using ddi_dma_free(9F).

A DMA window can span several discontiguous pages of system memory. If the DMA engine does not have a memory map, a DMA window might have to be broken into multiple DMA segments, each representing a contiguous piece of memory to or from which the DMA engine can transfer data. The ddi_dma_nextseg(9F) function takes a DMA window obtained from ddi_dma_nextwin(9F) and a previous segm
171. device driver entry points are used to manage device context. mapdev_access() int xxmapdev_access(ddi_mapdev_handle_t handle, void *devprivate, off_t offset); This entry point is called when an access is made to a mapping whose translations are invalid. Mapping translations are invalidated when the mapping is created with ddi_mapdev(9F) in response to mmap(2), duplicated by fork(2), or explicitly invalidated by a call to ddi_mapdev_intercept(9F). handle represents the mapping that was accessed by a user process. devprivate is a pointer to the driver private data associated with the mapping. offset is the offset within the mapping that was accessed. In general, mapdev_access(9E) should call ddi_mapdev_intercept(9F), with the handle of the mapping that currently has access to the device, to invalidate the translations for that mapping. This ensures that a call to mapdev_access(9E) occurs for the current mapping the next time it is accessed. To validate the mapping translations for the mapping that caused the access event to occur, the driver must restore the device context for the process requesting access and call ddi_mapdev_nointercept(9F) on the handle of the mapping that generated the call to this entry point. Accesses to portions of mappings that have had their mapping translations validated by a call to ddi_mapdev_nointercept(9F) do not generate a call to mapdev_acc
172. device_reg { volatile u_char csr; volatile u_char data; }; The driver then maps the registers into memory and refers to them through a pointer to the structure: struct device_reg *regp; ddi_map_regs(..., (caddr_t *)&regp, ...); The code that reads the data register upon a completed transfer now looks like this: if (regp->csr & TRANSFER_COMPLETE) data = regp->data; Structure Padding A device that has a one-byte command/status register followed by a four-byte data register might lead to the following structure layout: struct device_reg { u_char csr; u_long data; }; The above structure is not correct, because the compiler places padding between the two fields. For example, the SPARC processor requires each type to be on its natural boundary, which is byte alignment for the csr field but four-byte alignment for the data field. This results in three unused bytes between the two fields. Using this structure, the driver would be three bytes off when accessing the data register. Finding Padding The ANSI C offsetof(3C) macro may be used in a test program to determine the offset of each element in the structure. Knowing the offset and the size of each element, the location and size of any padding can be determined. Code Example 3-1 Structure padding: #include <sys/types.h> #include <stdio.h> #include <stddef.h> struct de
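The padding check described in the excerpt above can be sketched in user-space C. This is only an illustration, not the manual's Code Example 3-1 (which is cut off here): the fixed-width types and the helper name pad_after_csr() are assumptions chosen so the sketch compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout: a one-byte command/status register followed by a
 * four-byte data register, as in the "Structure Padding" discussion.
 */
struct device_reg {
	uint8_t		csr;
	uint32_t	data;
};

/*
 * Hypothetical helper: number of padding bytes the compiler inserted
 * between csr and data, computed from offsetof() and the field size.
 */
static size_t
pad_after_csr(void)
{
	size_t end_of_csr = offsetof(struct device_reg, csr) + sizeof (uint8_t);

	/* data can never start before csr ends */
	assert(offsetof(struct device_reg, data) >= end_of_csr);
	return (offsetof(struct device_reg, data) - end_of_csr);
}
```

A nonzero result means the structure does not match a device whose data register directly follows csr; on a platform that aligns four-byte types on four-byte boundaries the result would be the three bytes the text mentions.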
173. device-specific parameters. It is frequently used to set a device-specific mode, either by setting internal driver software flags or by writing commands to the device. It can also be used to return information to the user about the current device state. In short, it can do whatever the application and driver need it to do. ioctl(9E) int xxioctl(dev_t dev, int cmd, int arg, int mode, cred_t *credp, int *rvalp); The cmd parameter indicates which command ioctl(9E) should perform. By convention, I/O control commands indicate the driver they belong to in bits 8-15 of the command (usually given by the ASCII code of a character representing the driver) and the driver-specific command in bits 0-7. They are usually created in the following way: #define XXIOC ('x' << 8) #define XX_GET_STATE (XXIOC | 1) /* get status register */ #define XX_SET_CMD (XXIOC | 2) /* send command */ The interpretation of arg depends on the command. I/O control commands should be documented (in the driver documentation or a manual page) and defined in a public header file, so that applications know the names, what they do, and what they accept or return as arg. Any data transfer of arg into or out of the driver must be performed by the driver. ioctl(9E) is usually a switch statement with a case for each supported ioctl(9E) request. Code Example 8-13 ioctl(9E) routine: static int xxioctl(d
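The command-numbering convention above (driver character in bits 8-15, command number in bits 0-7) can be tried out in a small user-space sketch. The three macros are taken from the excerpt; the helper functions ioc_driver(), ioc_number(), and xx_make_cmd() are hypothetical names added for illustration and are not part of the DDI.

```c
#include <assert.h>

/* From the excerpt: driver character 'x' in bits 8-15 */
#define	XXIOC		('x' << 8)
#define	XX_GET_STATE	(XXIOC | 1)	/* get status register */
#define	XX_SET_CMD	(XXIOC | 2)	/* send command */

/* Hypothetical helper: extract the driver character (bits 8-15). */
static int
ioc_driver(int cmd)
{
	return ((cmd >> 8) & 0xff);
}

/* Hypothetical helper: extract the driver-specific number (bits 0-7). */
static int
ioc_number(int cmd)
{
	return (cmd & 0xff);
}

/* Hypothetical helper: build a command for this driver from a number. */
static int
xx_make_cmd(int number)
{
	assert(number >= 0 && number <= 0xff);	/* must fit in bits 0-7 */
	return (XXIOC | number);
}
```

Keeping the driver character in the upper byte lets a driver reject commands meant for another driver before it looks at the command number.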
174. di_add_intr(9F) returns. This may result in the interrupt routine being called before any mutexes have been initialized with the interrupt block cookie. If the interrupt routine acquires the mutex before it has been initialized, undefined behavior may result. See Registering Interrupts on page 111 for a solution to this problem. Mapping Device Drivers In the ddi_map_regs(9F) call, dip is the dev_info pointer passed to attach(9E). rnumber specifies which register set to map if there is more than one. For devices with only one register set, pass 0 for rnumber. The register specifications referred to by rnumber are described by the reg property; see driver.conf(4), isa(4), eisa(4), mca(4), sysbus(4), vme(4), and sbus(4). ddi_map_regs(9F) maps a device register set (register specification) and returns a kernel virtual address in xsp->regp. This address is offset bytes from the base of the device register set, and the mapping extends sizeof (struct device_reg) bytes beyond that. To map all of a register set, pass zero for offset and the length. Minor Device Nodes A minor device node contains the information exported by the device that the system uses to create a special file for the device under /devices in the filesystem. In the call to ddi_create_minor_node(9F), the minor name is the character string that is the last part of the base name of the special file to be created for this minor device number, for example b raw in Writing Dev
175. di_mapdev_nointercept(9F). 5. Free the device context structure if needed. State Structure This section adds the following fields to the state structure. See State Structure on page 51 for more information. kmutex_t ctx_lock; struct xxctx *current_ctx; The structure xxctx is the driver private device context structure for the examples used in this section. It looks like this: struct xxctx { ddi_mapdev_handle_t handle; char context[XXCTX_SIZE]; struct xxstate *xsp; }; The context field stores the actual device context. In this case it is simply a chunk of memory; in other cases it may actually be a series of structure fields corresponding to device registers. Declarations and Data Structures Device drivers that use the device context management interfaces must include the following declaration: char _depends_on[] = "misc/seg_mapdev"; ddi_mapdev_ctl The device driver must allocate and initialize a ddi_mapdev_ctl(9S) structure to inform the system of its device context management entry point routines. This structure contains the following fields: struct ddi_mapdev_ctl { int mapdev_rev; int (*mapdev_access)(ddi_mapdev_handle_t handle, void *private, off_t offset); void (*mapdev_free)(ddi_mapdev_handle_t handle, void *private); int (*mapdev_dup)(ddi_mapdev_handle_t oldhandle, void *oldprivate, ddi_mapdev_handle_t newhandle, void *new
176. dmae_1stparty(9F) function is used by device drivers using first-party DMA to configure a channel in the system's DMA engine to operate in a slave mode. int ddi_dmae_getlim(dev_info_t *dip, ddi_dma_lim_t *limitsp); The ddi_dmae_getlim(9F) function fills in the DMA limit structure pointed to by limitsp with the DMA limits of the system DMA engine. This limit structure must be passed to the DMA setup routines so that they will know how to break the DMA request into windows and segments. If the device has any particular restrictions on transfer size or granularity (for example, a disk sector size), the driver should further restrict the values in the structure members before passing them to the DMA setup routines. The driver must not relax any of the restrictions embodied in the structure after it is filled in by ddi_dmae_getlim(9F). int ddi_iomin(dev_info_t *dip, int initial, int streaming); ddi_iomin(9F) returns an integer that encodes the required alignment and the minimum number of bytes that must be read or written by the DMA controller of the device identified by dip. ddi_iomin(9F) is like ddi_dma_devalign(9F), but the memory object is assumed to be primary memory and the alignment is assumed to be equal to the minimum possible transfer. int ddi_iopb_alloc(dev_info_t *dip, ddi_dma_lim_t *limits, u_int length, caddr_t *iopbp); ddi_iopb_alloc(9F) allocates a block of length bytes of memory subject to constraints specified by li
177. Figure 10-1 SCSA Block Diagram The lower-level software component consists of a SCSA interface layer and one or more host bus adapter drivers. The host bus adapter driver has several responsibilities. It must: manage host bus adapter hardware; accept SCSI commands from the SCSI target driver; transport the commands to the specified SCSI target device; perform any data transfers that the command requires; collect status; handle auto-request sense (optional); inform the target driver of command completion (or failure). The target driver is completely responsible for the generation of the proper SCSI commands required to execute the desired function. General Flow of Control When transferring data to or from a user address space using the read(9E) or write(9E) entry points, SCSI target character device drivers must use physio(9F) to encode the request into a buf(9S) structure and call the driver's strategy(9E) routine. physio(9F) locks down the user buffer into memory before issuing a SCSI command. The file system locks down memory for block device drivers. See Chapter 9, Drivers for Block Devices, for more information on writing a strategy(9E) entry point, and Chapter 8, Drivers for Character Devices, for more information on using physio(9F). Assuming no transport errors occur, the following steps descr
178. e allocate resources specific to each printer instance in attach(9E). Similarly, in _fini(9E), release only those resources allocated by _init(9E). Note: Once _init(9E) has called mod_install(9F), none of the data structures hanging off of the modlinkage(9S) structure should be changed by the driver, as the system may make copies of them or change them. Each driver must provide five entry points that are used by the kernel for device configuration. They are: identify(9E), probe(9E), attach(9E), detach(9E), getinfo(9E). Every device driver must have an identify(9E), attach(9E), and getinfo(9E) routine. probe(9E) is only required for non-self-identifying devices. For self-identifying devices, an explicit probe routine may be provided, or nulldev(9F) may be specified in the dev_ops structure for the probe(9E) entry point. identify() The system calls identify(9E) to find out whether the driver drives the device specified by dip. Code Example 5-2 identify(9E) routine: static int xxidentify(dev_info_t *dip) { if (strcmp(ddi_get_name(dip), "xx") == 0) return (DDI_IDENTIFIED); else return (DDI_NOT_IDENTIFIED); } If the device is known by several different names, identify(9E) should check for a match with each name before failing. The names must also have been passed with aliases to add_drv(1M) when the driver was installed. See Chapter 12, Loading and Unloading Dr
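The last point above, checking every name a device is known by before failing, can be sketched as a plain string-table lookup. This is a user-space illustration under stated assumptions: the alias strings and the helper xx_name_matches() are hypothetical, and a real identify(9E) routine would pass ddi_get_name(dip) as the name and return DDI_IDENTIFIED or DDI_NOT_IDENTIFIED.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of names the device may be known by. */
static const char *const xx_names[] = {
	"xx",
	"SUNW,xx",
	"xx2"
};

/*
 * Hypothetical helper: return 1 if name matches any alias, 0 otherwise.
 * The lookup fails only after every alias has been compared.
 */
static int
xx_name_matches(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof (xx_names) / sizeof (xx_names[0]); i++) {
		if (strcmp(name, xx_names[i]) == 0)
			return (1);
	}
	return (0);
}
```

Keeping the aliases in one table makes it easier to keep them in step with the alias list given to add_drv(1M).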
179. e and the offset offp is derived from the DMA cookie referred to by cookiep. ddi_dma_coff(9F) can be used after a DMA transfer is complete to find out where the DMA controller stopped. int ddi_dma_curwin(ddi_dma_handle_t handle, off_t *offp, u_int *lenp); ddi_dma_curwin(9F) passes back the offset and length of the current DMA window in the locations pointed to by offp and lenp, respectively. int ddi_dma_devalign(ddi_dma_handle_t handle, u_int *alignment, u_int *minxfr); ddi_dma_devalign(9F) passes back, in the location pointed to by alignment, the required alignment for the beginning of a DMA transfer using the resources identified by handle. The alignment will be a power of two. ddi_dma_devalign(9F) also passes back, in the location pointed to by minxfr, the minimum number of bytes of the mapping that will be read or written in a single transfer. int ddi_dma_htoc(ddi_dma_handle_t handle, off_t off, ddi_dma_cookie_t *cookiep); ddi_dma_htoc(9F) passes back a DMA cookie, in the location pointed to by cookiep, that represents a DMA transfer starting at off in the DMA resources identified by handle. The DMA cookie, described in ddi_dma_cookie(9S), contains information about a potential DMA transfer. The field dmac_address contains the transfer address for the DMA controller. int ddi_dma_movwin(ddi_dma_handle_t handle, off_t *offp, u_int *lenp, ddi_dma_cookie_t *cookiep); dd
180. e used. This is the default mode. -Xa (ANSI C Mode) This mode accepts ANSI C and Sun C compatibility extensions. In case of a conflict between ANSI and Sun C, the compiler issues a warning and uses ANSI C interpretations. This will be the default mode in the future. Function Prototypes Function prototypes specify the following information to the compiler: the type returned by the function; the number of the arguments to the function; the type of each argument. This allows the compiler to do more type checking and also to promote the types of the parameters to the type expected by the function. For example, if the compiler knows a function takes a pointer, casting NULL to that pointer type is no longer necessary. Prototypes are provided for most Solaris 2.x DDI/DKI functions, provided the driver includes the proper header file (documented in the manual page for the function). New Keywords There are a few new keywords available in ANSI C. The following keywords are of interest to driver writers. const The const keyword can be used to define constants instead of using #define: const int count = 5; However, it is most useful when combined with function prototypes. Routines that should not be modifying parameters can define the parameters as constants, and the compiler will then give errors if the parameter is modified. Since C passes parameters by value, most parameters don't
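The use of const in a prototype, as described above, might look like the following sketch. The function xx_checksum() is a hypothetical example, not a DDI routine; it only reads through its pointer parameter, and the const qualifier lets the compiler enforce that.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical routine that must not modify the caller's buffer.
 * Declaring the pointed-to data const makes any write through buf
 * a compile-time error.
 */
static unsigned int
xx_checksum(const unsigned char *buf, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++)
		sum += buf[i];
	/* A statement such as "buf[0] = 0;" would be rejected here. */
	return (sum);
}
```

Because C passes arguments by value, qualifying the pointer's target (const unsigned char *) is what protects the caller's data; qualifying the parameter itself would only affect the local copy.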
181. e Example 6-3 attach(9E) routine handling high-level interrupts: static int xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd) { struct xxstate *xsp; if (ddi_intr_hilevel(dip, inumber)) { /* add null high-level handler */ if (ddi_add_intr(dip, inumber, &xsp->high_iblock_cookie, NULL, (u_int (*)(caddr_t))nulldev, NULL) != DDI_SUCCESS) goto failed; mutex_init(&xsp->high_mu, "xx high mutex", MUTEX_DRIVER, (void *)xsp->high_iblock_cookie); ddi_remove_intr(dip, inumber, xsp->high_iblock_cookie); if (ddi_add_intr(dip, inumber, &xsp->high_iblock_cookie, &xsp->high_idevice_cookie, xxhighintr, (caddr_t)xsp) != DDI_SUCCESS) goto failed; /* add null low-level handler */ if (ddi_add_softintr(dip, DDI_SOFTINT_HI, &xsp->id, &xsp->low_iblock_cookie, NULL, (u_int (*)(caddr_t))nulldev, NULL) != DDI_SUCCESS) goto failed; mutex_init(&xsp->low_mu, "xx low mutex", MUTEX_DRIVER, (void *)xsp->low_iblock_cookie); ddi_remove_softintr(xsp->id); if (ddi_add_softintr(dip, DDI_SOFTINT_HI, &xsp->id, &xsp->low_iblock_cookie, NULL, xxlowintr, (caddr_t)xsp) != DDI_SUCCESS) goto failed; } else { /* add normal interrupt handler */ } cv_init(&xsp->cv, "xx condvar", CV_DRIVER, NULL); return (DDI_SUCCESS); failed: /* free allocated resources, remove interrupt handlers */ return (DDI_FAILURE); }
182. e allocated resources. Figure 7-1 shows the relationship between the DMA object, the DMA handle, and the DMA windows, segments, and cookies. Figure 7-1 The DMA Model Types of Device DMA Devices may perform one of the following three types of DMA. Bus-Master DMA If the device is capable of acting as a true bus master, then the driver should program the device's DMA registers directly. The transfer address and count are obtained from the cookie and given to the device. Devices on current SPARC platforms use this form of DMA exclusively. Third-party DMA Third-party DMA utilizes a system DMA engine resident on the main system board, which has several DMA channels available for use by devices. The device relies on the system's DMA engine to perform the data transfers between the device and memory. The driver uses DMA engine routines (see ddi_dmae(9F)) to initialize and program the DMA engine. For each DMA data transfer, the driver programs the DMA engine and then gives the device a command to initiate the transfer in cooperation with that engine. First-party DMA Under first-party DMA, the device drives its own DMA bus cycles using a channel from the system's DMA engine. The ddi_dmae_1stparty(9F) function is used
183. e file, which can then be examined with adb(1): # cd /var/crash/test # ls bounds unix.0 vmcore.0 # adb -k unix.0 vmcore.0 physmem ac0 The first step is to examine the stack to determine where the system was when it crashed: $c complete_panic(0x0,0x1,0xf0056c00,0x7d0,0xf0056c00,0xe3) + 114 do_panic(0xf00be7ac,0xf0269750,0x4,0xb,0xb,0xf00b6c00) + 1c die(0x9,0xf0269704,0x4,0x80,0x1,0xf00be7ac) + 5c trap(0x9,0xf0269704,0x4,0x80,0x1,0xf02699d8) + 6b4 This stack trace is not very helpful initially, since the ramdisk routines are not on the stack trace. However, there is a useful bit of information: the call to trap(). The first argument to trap() is the trap type, in this case 9, which is a T_DATA_FAULT trap (from <sys/trap.h>). See The SPARC Architecture Version 8 manual for more information. The second argument to trap() is a pointer to a regs structure containing the state of the registers at the time of the trap: 0xf0269704$<regs 0xf0269704: psr pc npc c0 ff2dd8b0 ff2dd8b4 0xf0269710: y g1 g2 g3 e0000000 ffffff98 8000000 ffffff80 0xf0269720: g4 g5 g6 g7 0 02699d8 1 22c800 0xf0269730: o0 o1 o2 o3 f02697a0 ff080000 19000 ef709000 0xf0269740: o4 o5 o6 o7 8000 0 0269750 VEEL EEL Note that the PC was ff2dd8b0 when the trap occurred. The next step is to determine which routine that is in: ff2dd8b0/i rd_write+0x2c: ld 02 0
184. e header file. To force the breakpoint to be reached, the prtvtoc(1M) command is used. It is known to issue this I/O control: # prtvtoc /dev/rdsk/c0t3d0s0 breakpoint sdioctl+4: mov 10 00 kadb[0]: $c sdioctl(0x800018,0x40b,0xeffffc24,0x1,0xff22fa80,0xf01e9918) + 4 ioctl(0xf01e9e90,0xf01e9918,0x1,0x40b,0xff2ab380,0xff0894b4) + 1ec syscall(0xf00c1c54) + 4d4 Syscall 0x3 8c 2 2 7fffffff Syssize 0x3 0xeffffc24 0xeffffd6c 0x5403148 0x0 0x5452ea0 20338 Syssize 0x3 0xefffff7c 0xeffffc24 0x80 0x3 0x0 fb70 Syssize 0xefffff7c 0x2000 0x1 0x1 0x1 0x3 f51c Syssize 0x2 0xeffffee4 0xeffffef0 0x22c00 0x0 0xffffffff eb8c kadb(1M) cannot always determine where the bottom of the stack is. In the above example, the calls to Syssize and are not part of the stack. Macros adb(1) and kadb(1M) support macros. adb(1) macros are in /usr/kvm/lib/adb, while kadb(1M)'s macros are built in and can be displayed with $M. Most of the existing macros are for private kernel structures. New macros for adb can be created with adbgen(1). Macros are used in the form: [address] $<macroname threadlist is a useful macro that displays the stacks of all the threads in the system. This macro does not take an address and can generate a lot of output, so be ready to use Control-S and Control-Q to start/stop if necessary (this is another good reason to use a tip window). Control
185. e parent bus driver implicitly, using the class keyword and specifying class="vme". This removes the dependency on the name of the particular bus driver involved, since the driver may be named differently on different platforms. See driver.conf(4) and vme(4) for further details. Currently there are three buses supported on the x86 platform: ISA (Industry Standard Architecture), EISA (Extended Industry Standard Architecture), and MCA (MicroChannel Architecture). ISA Bus Memory and I/O Space Two address spaces are provided: memory address space and I/O address space. Depending on the device, registers may appear in one or both of these address spaces. Table 2-5 ISA bus address space (ISA Space Name / Address Size / Data Transfer Size / Physical Address Range): Main Memory 24 16 0x0-0xffffff; I/O 8 16 0x0-0xfff. Registers can be mapped in memory address space and used by the driver as normal memory (see Memory-mapped Access on page 44). Registers in I/O space are accessed through I/O port numbers using separate kernel routines. See I/O Port Access on page 45 for more information. Hardware Configuration Files ISA bus devices require hardware configuration files to inform the system that the hardware may be present. The configuration file must specify any device I/O port addresses, any interrupt capabilities that the device may have, and any memory-mapped addresses it may occupy
186. ect is going to be read by the CPU, the CPU's view of the object must be synchronized by setting type to DDI_DMA_SYNC_FORCPU. Here is an example of synchronizing a DMA object for the CPU: if (ddi_dma_sync(xsp->handle, 0, length, DDI_DMA_SYNC_FORCPU) == DDI_SUCCESS) { /* the CPU can now access the transferred data */ } else { /* error handling */ } If the only mapping that concerns the driver is one for the kernel (such as memory allocated by ddi_mem_alloc(9F)), the flag DDI_DMA_SYNC_FORKERNEL can be used. This is a hint to the system that if it can synchronize the kernel's view faster than the CPU's view, it can do so; otherwise, it acts the same as DDI_DMA_SYNC_FORCPU. Allocating Private DMA Buffers Some device drivers may need to allocate memory for DMA transfers to or from a device, in addition to doing transfers requested by user threads and the kernel. Examples of this are setting up shared memory for communication with the device and allocating intermediate transfer buffers. Two interfaces are provided for allocating memory for DMA transfers: ddi_iopb_alloc(9F) and ddi_mem_alloc(9F). ddi_iopb_alloc() ddi_iopb_alloc(9F) should be used if the device accesses in a non-sequential fashion, or if synchronization steps using ddi_dma_sync(9F) should be as lightweight as possible (due to frequent use on small objects). This type of access is commonly known as consiste
187. edev xsp = ddi_get_soft_state(statep, instance); /* Figure out why this callback routine was called */ if (pkt->pkt_reason != CMD_CMPLT) { bp->b_resid = bp->b_bcount; bioerror(bp, EIO); scsi_destroy_pkt(pkt); /* release resources */ biodone(bp); /* notify waiting threads */ } else { /* Command completed, check status. See scsi_status(9S) */ ssp = (struct scsi_status *)pkt->pkt_scbp; if (ssp->sts_busy) { /* error, target busy or reserved */ } else if (ssp->sts_chk) { /* send a request sense command */ } else { bp->b_resid = pkt->pkt_resid; /* packet completed OK */ scsi_destroy_pkt(pkt); biodone(bp); } } This is a very simple completion callback routine. It checks to see whether the command completed, and if it did not, gives up immediately. If the target was busy, it gives up, or if it returned a check condition status, it sends a Request Sense command. Otherwise, the command succeeded. If this is the end of processing for the command, it destroys the packet and calls biodone(9F). This example does not attempt to retry incomplete commands. See Appendix D, Sample Driver Source Code Listings, for information about sample SCSI drivers. Also see Appendix B, Advanced Topics, for further information. Chapter 11 Device Context Management Some device drivers su
188. ee rwlock(9F) for more information. Semaphores Counting semaphores are available as an alternative primitive for managing threads within device drivers. See semaphore(9F) for more information. Thread Synchronization In addition to protecting shared data, drivers often need to synchronize execution among multiple threads. Condition Variables Condition variables are a standard form of thread synchronization. They are designed to be used with mutexes. The associated mutex is used to ensure that a condition can be checked atomically, and that the thread can block on the associated condition variable without missing either a change to the condition or a signal that the condition has changed. Condition variables must be initialized by calling cv_init(9F) and must be destroyed by calling cv_destroy(9F). Note: Condition variable routines are approximately equivalent to the routines sleep() and wakeup() used in SunOS 4.x. Table 4-2 lists the condvar(9F) interfaces. The four wait routines, cv_wait(9F), cv_timedwait(9F), cv_wait_sig(9F), and cv_timedwait_sig(9F), take a pointer to a mutex as an argument. Table 4-2 Condition variable routines (Name: Description): cv_init(9F): Initialize a condition variable; cv_destroy(9F): Destroy a condition variable; cv_wait(9F): Wait for condition; cv_timedwait(9F): Wait for condition or timeout; cv_wait_sig(9F): Wait for condition or return zero o
189. el SUNW,501-1561 The reg property defines an array of register description structures containing the following fields: u_int bustype; /* cookie for related bus type */ u_int addr; /* address of reg relative to bus */ u_int size; /* size of this register set */ For the bwtwo example, the address is 0. Mapping the Device To test the device, it must be mapped into memory. The PROM can then be used to verify proper operation of the device by using data transfer commands to transfer bytes, words, and long words. If the device can be operated from the PROM, even in a limited way, the driver should also be able to operate the device. To set up the device for initial testing, perform the following three steps: 1. Determine the physical address of the SBus slot the device is in. Table 2-8 displays the physical addresses of various SBus slots on a SPARCstation 1 and SPARCstation 1+. Table 2-8 SBus physical addresses (SBus Slot Number: Physical Address Space): SBus slot 0: 0 (internal slot); SBus slot 1: 0x2000000; SBus slot 2: 0x4000000; SBus slot 3: 0x6000000. In this example, the bwtwo device is located in slot 3. Consequently, the physical address space for the device is 0x6000000. 2. Determine the offset within the physical address space used by the device. The offset used is specific to the device. In the bwtwo example, the video memory happens to start at offset
190. em code that determines the logical device address, or logical block number, for each block and builds a block I/O request in the form of a buf(9S) structure. The driver strategy(9E) entry point then interprets the buf(9S) structure and completes the request. State Structure This chapter adds the following fields to the state structure. See State Structure on page 57 for more information. int nblocks; /* size of device */ int open; /* flag indicating device is open */ int nlayered; /* count of layered opens */ struct buf *list_head; /* head of transfer request list */ struct buf *list_tail; /* tail of transfer request list */ Entry Points Associated with each device driver is a dev_ops(9S) structure, which in turn refers to a cb_ops(9S) structure. See Chapter 5, Autoconfiguration, for details regarding driver data structures. Table 9-1 lists the block driver entry points. Table 9-1 Block Driver Entry Points. Entry Point: _init(9E), _info(9E), _fini(9E), identify(9E), probe(9E), attach(9E), detach(9E), getinfo(9E), dump(9E), open(9E), close(9E), prop_op(9E), print(9E), strategy(9E). Description (in the same order): Initialize a loadable driver module; Return information on a loadable driver module; Prepare a loadable driver module for unloading; Determine if the device driver supports a given physical device; Determine if a device is present; Perform device-specific initialization; Remove device-specific sta
191. emove_intr remove an interrupt handler. Remaining rows (SunOS 4.1.x -> SunOS 5.4: Description): report_dev -> ddi_report_dev: announce a device; (none) -> rmallocmap: allocate resource map; rmalloc -> rmalloc: allocate space from a resource map; rmalloc(iopbmap) -> ddi_iopb_alloc: allocate consistent memory; (none) -> rmfreemap: free resource map; rmfree -> rmfree: free space back into a resource map; rmfree(iopbmap) -> ddi_iopb_free: free consistent memory; rmvb -> rmvb: remove a message block from a message; rmvq -> rmvq: remove a message from a queue; scsi_abort -> scsi_abort: abort a SCSI command; scsi_dmafree -> scsi_destroy_pkt: free DMA resources for SCSI command; scsi_dmaget -> scsi_init_pkt: allocate DMA resources for SCSI command; scsi_ifgetcap -> scsi_ifgetcap: get SCSI transport capability; scsi_ifsetcap -> scsi_ifsetcap: set SCSI transport capability; scsi_pktalloc -> scsi_pktalloc: allocate packet resources for SCSI command; scsi_pktfree -> scsi_pktfree: free packet resources for SCSI command; scsi_poll -> scsi_poll: run a polled SCSI command; scsi_resalloc -> scsi_init_pkt: prepare a complete SCSI packet. Table A-1 SunOS 4.1.x and SunOS 5.4 Kernel Support Routines (SunOS 4.1.x / SunOS 5.4 / Description), SunOS 4.1.x column: scsi_reset, scsi_resfree, scsi_slave, selwakeup, slaveslot, sleep, spln, splr, splx, splstr, strcmp, strcpy
192. ent (or NULL for the first segment) and returns the next (or first) segment in the window. A segment represents a contiguous object that is completely addressable in one DMA cookie. The DMA cookie is a data structure that contains information, such as the transfer address and count, needed to program the DMA engine; see ddi_dma_cookie(9S). The ddi_dma_segtocookie(9F) function takes a DMA segment (obtained from ddi_dma_nextseg(9F)) and passes back a DMA cookie for that segment. Scatter/Gather Some DMA engines may be able to accept more than one cookie. Such engines can perform scatter/gather I/O without the help of the system. In this case it is most efficient if the driver uses ddi_dma_nextseg(9F) and ddi_dma_segtocookie(9F) to get as many cookies as the DMA engine can handle and program them all into the engine. The device can then be programmed to transfer the total number of bytes covered by all these segments combined. DMA Operations The steps involved in a DMA transfer are similar among the types of DMA. Bus-master DMA In general, here are the steps that must be followed to perform bus-master DMA: 1. Describe the device limitations. This allows the routines to ensure that the device will be able to access the buffer. 2. Lock the DMA objects in memory (see physio(9F)). Note: This step is not necessary in block drivers for buffers coming from the file system, as the file system has already lo
193. ent passed to xxintr() is a pointer to the state structure for the device that may have issued the interrupt. This was set up by passing a pointer to the state structure as the intr_handler_arg argument to ddi_add_intr(9F) in attach(9E). Most of the steps performed by the interrupt routine depend on the specifics of the device itself. Consult the hardware manual for the device to learn how to determine the cause of the interrupt, detect error conditions, and access the device data registers. State Structure This section adds the following fields to the state structure. See State Structure on page 57 for more information. ddi_iblock_cookie_t high_iblock_cookie; ddi_idevice_cookie_t high_idevice_cookie; kmutex_t high_mu; int softint_running; ddi_iblock_cookie_t low_iblock_cookie; kmutex_t low_mu; ddi_softintr_t id; Handling High-Level Interrupts High-level interrupts are those that interrupt at the level of the scheduler and above. This level does not allow the scheduler to run; therefore, high-level interrupt handlers cannot be preempted by the scheduler, nor can they rely on the scheduler (they cannot block); they can only use mutual exclusion locks for locking. Because of this, the driver must use ddi_intr_hilevel(9F) to determine if it uses high-level interrupts. If ddi_intr_hilevel(9F) returns true, the driver can fail to attach, or it can use a two-level scheme to handle t
194. ently available. The next step is to initialize the SCSI CDB, using the makecom(9F) family of functions: makecom_g0(pkt, sdp, flags, SCMD_READ, bp->b_blkno, bp->b_bcount >> DEV_BSHIFT); This example builds a Group 0 Command Descriptor Block and fills in the pkt_cdbp field as follows: The command itself (byte 0) is set from the fourth parameter (SCMD_READ). The target device's logical unit number (bits 5-7 of byte 1) is set using the sd_address field of sdp. The pkt_flags field is set from the flags parameter. The address field (bits 0-4 of byte 1, and bytes 2 and 3) is set from bp->b_blkno. The count field (byte 4) is set from the last parameter. In this case it is set to bp->b_bcount >> DEV_BSHIFT, which is the byte count of the transfer converted to the number of blocks. After initializing the SCSI CDB, initialize three other fields in the packet and store a pointer to the packet in the state structure: pkt->pkt_private = (opaque_t)bp; pkt->pkt_comp = xxcallback; pkt->pkt_time = 30; xsp->pkt = pkt; The buf(9S) pointer is saved in the pkt_private field for later use in the completion routine. Transporting a Command After creating and filling in the scsi_pkt(9S) structure, the final step is to hand it to the host bus adapter driver: if (scsi_transport(pkt) != TRAN_ACCEPT) { bp->b_resid = bp->b_bcount; bioer
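The Group 0 CDB byte layout listed above can be made concrete with a user-space sketch. This is not the implementation of makecom_g0(9F); pack_cdb_g0(), CDB_GROUP0_LEN, and BLOCK_SHIFT are hypothetical names, and the 512-byte block size (shift of 9) is an assumption standing in for DEV_BSHIFT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	CDB_GROUP0_LEN	6	/* a Group 0 CDB is six bytes long */
#define	BLOCK_SHIFT	9	/* assumption: 512-byte blocks */

/*
 * Hypothetical helper that packs the fields described in the text:
 * byte 0 = command, bits 5-7 of byte 1 = logical unit number,
 * bits 0-4 of byte 1 plus bytes 2 and 3 = 21-bit block address,
 * byte 4 = block count. Byte 5 (control) is left zero.
 */
static void
pack_cdb_g0(uint8_t cdb[CDB_GROUP0_LEN], uint8_t cmd, unsigned int lun,
    uint32_t addr, uint32_t nblocks)
{
	assert(lun <= 7);		/* three bits */
	assert(addr <= 0x1fffff);	/* twenty-one bits */
	assert(nblocks <= 0xff);	/* one byte */

	memset(cdb, 0, CDB_GROUP0_LEN);
	cdb[0] = cmd;
	cdb[1] = (uint8_t)((lun << 5) | ((addr >> 16) & 0x1f));
	cdb[2] = (uint8_t)((addr >> 8) & 0xff);
	cdb[3] = (uint8_t)(addr & 0xff);
	cdb[4] = (uint8_t)nblocks;
}
```

For instance, a 4096-byte transfer shifted right by BLOCK_SHIFT gives a count of 8 blocks, which fits in the single count byte.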
195. ents the system from crossing a 24-bit segment boundary when establishing mappings to the object. dlim_ctreg_max specifies the maximum transfer count that the DMA engine can handle in one segment or cookie. The limit is expressed as the maximum count minus one. This transfer count limitation is a per-segment limitation. It is used as a bit mask, so it must also be one less than a power of two. The dlim_granular field describes the granularity of the device's DMA transfer ability, in units of bytes. This value is used to specify, for example, the sector size of a mass storage device. DMA requests will be broken into multiples of this value. If there is no scatter/gather capability, then the size of each DMA transfer will be a multiple of this value. If there is scatter/gather capability, then a single segment will not be smaller than the minimum transfer value but may be less than the granularity; however, the total transfer length of the scatter/gather list will be a multiple of the granularity value. dlim_sgllen specifies the maximum number of entries in the scatter/gather list. It is the number of segments or cookies that the DMA engine can consume in one I/O request to the device. If the DMA engine has no scatter/gather list, this field should be set to one. dlim_reqsize describes the maximum number of bytes that the DMA engine can transmit or receive in one I/O command. This limitation is only significant if it is less than (dlim_ctreg_max + 1) * d
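Two of the constraints in this excerpt are simple arithmetic and can be checked with a short user-level sketch. Both helper names are hypothetical and are not DDI routines: the first tests whether a candidate dlim_ctreg_max value is one less than a power of two, and the second trims a transfer length down to a multiple of the granularity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero if v is one less than a power
 * of two, which is what a count limit used as a bit mask requires.
 */
static int
xx_is_mask(uint32_t v)
{
	return ((v & (v + 1)) == 0);
}

/*
 * Hypothetical helper: trim len down to a multiple of the granularity
 * (for example, the sector size of a mass storage device).
 */
static uint32_t
xx_trim_to_granular(uint32_t len, uint32_t granular)
{
	assert(granular != 0);
	return (len - (len % granular));
}
```

For example, a 24-bit counter would use a limit of 0xffffff, and a 1300-byte request against a 512-byte granularity would be trimmed to 1024 bytes.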
196. ep, instance); /* set up interrupt handler for the device */ if (ddi_add_intr(dip, inumber, &xsp->iblock_cookie, &xsp->idevice_cookie, NULL, intr_handler, intr_handler_arg) != DDI_SUCCESS) { ddi_soft_state_free(statep, instance); return (DDI_FAILURE); } /* map device registers */ if (ddi_map_regs(dip, rnumber, (caddr_t *)&xsp->regp, offset, sizeof (struct device_reg)) != DDI_SUCCESS) { ddi_remove_intr(dip, inumber, xsp->iblock_cookie); ddi_soft_state_free(statep, instance); return (DDI_FAILURE); } /* initialize locks. Note that mutex_init wants a ddi_iblock_cookie, not the address of one, as the fourth argument. */ mutex_init(&xsp->mu, "xx mutex", MUTEX_DRIVER, (void *)xsp->iblock_cookie); cv_init(&xsp->cv, "xx cv", CV_DRIVER, NULL); xsp->dip = dip; /* initialize the rest of the software state structure; make device quiescent (device-specific); for devices with programmable bus interrupt level, program device interrupt level using xsp->idevice_cookie */ if (ddi_create_minor_node(dip, minorname, S_IFCHR, minor_number, node_type, 0) != DDI_SUCCESS) goto failed; /* initialize driver data, prepare for a later open of the device (device-specific) */ ddi_report_dev(dip); return (DDI_SUCCESS); default: return (DDI_FAILURE); } failed: /* free allocated resources */ ddi_unmap_regs(dip, r
197. er to a private target driver state structure. scsi_pkt Structure: This structure contains the following fields: struct scsi_address pkt_address; opaque_t pkt_private; void (*pkt_comp)(struct scsi_pkt *pkt); long pkt_flags; u_long pkt_time; u_char *pkt_scbp; u_char *pkt_cdbp; long pkt_resid; u_long pkt_state; u_long pkt_statistics; u_char pkt_reason; pkt_address is the target device's address, set by scsi_init_pkt(9F). pkt_private is a place to store private data for the target driver. It is commonly used to save the buf(9S) pointer for the command. pkt_comp is the address of the completion routine. The host bus adapter driver calls this routine when it has transported the command. This does not mean that the command succeeded; the target might have been busy or may not have responded before the time-out period elapsed (see the description of the pkt_time field). The target driver must supply a valid value in this field, though it can be NULL if the driver does not want to be notified. Note: There are two different SCSI callback routines. The pkt_comp field identifies a completion callback routine, called when the host bus adapter completes its processing. There is also a resource callback routine, called when currently unavailable resources are likely to be available (as in scsi_init_pkt(9F)). pkt_flags provides additional control information, for example to transport the
198. es, also known as the soft state routines. These routines dynamically allocate, retrieve, and destroy memory items of a specified size, and hide all the details of list management in a multithreaded kernel. An item number is used to identify the desired memory item; this can be (and usually is) the instance number assigned by the system. The driver must provide a state pointer, which is used by the soft state system to create the list of memory items: static void *statep; Routines are provided to: Initialize the provided state pointer: ddi_soft_state_init(9F). Allocate space for a certain item: ddi_soft_state_zalloc(9F). Retrieve a pointer to the indicated item: ddi_get_soft_state(9F). Free the memory item: ddi_soft_state_free(9F). Finish using the state pointer: ddi_soft_state_fini(9F). When the module is loaded, the driver calls ddi_soft_state_init(9F) to initialize the driver state pointer, passing a hint indicating how many items to pre-allocate. If more items are needed, they will be allocated as necessary. The driver must call ddi_soft_state_fini(9F) when the driver is unloaded. To allocate an instance of the soft state structure, the driver calls ddi_soft_state_zalloc(9F), then ddi_get_soft_state(9F) to retrieve a pointer to the allocated structure. This is usually performed when the device is attached, and the inverse operation, ddi_soft_state_free(9F), is performed when the device is detached.
199. es are not immediately available, the function pointed to by callback will be called when resources may have become available. If callback is SLEEP_FUNC, scsi_slave(9F) may block waiting for resources. char *scsi_sname(u_char sense_key); scsi_sname(9F) decodes the SCSI sense key sense_key and returns the corresponding sense key string. int scsi_transport(struct scsi_pkt *pkt); scsi_transport(9F) requests the host adapter driver to schedule the command packet pointed to by pkt for execution. Use scsi_transport(9F) to issue most SCSI commands; scsi_poll(9F) may be used to issue synchronous commands. void scsi_unprobe(struct scsi_device *devp); scsi_unprobe(9F) is used to free any resources that were allocated on the driver's behalf during scsi_probe(9F). void scsi_unslave(struct scsi_device *devp); scsi_unslave(9F) is used to free any resources that were allocated on the driver's behalf during scsi_slave(9F). Soft State Management: These interfaces comprise the soft state structure allocator, a facility that simplifies the management of state structures for driver instances. These routines are the recommended way to keep track of per-instance data. int ddi_soft_state_init(void **state_p, size_t size, size_t n_items); ddi_soft_state_init(9F) sets up the soft state allocator to keep track of soft state structures for all device instances. state_p points to a pointer to an
200. escribes a simple implementation of the prop_op(9E) routine that intercepts property requests, then uses the existing software property routines to update property values. For a complete description of all the parameters to prop_op(9E), see the manual page. In Code Example 3-3, the prop_op(9E) routine intercepts requests for the nblocks property. The driver updates a variable in the state structure whenever the property changes, but only updates the property when a request is made. It then uses the system routine ddi_prop_op(9F) to get the new value. If the property request is not specific to a device, the driver does not intercept the request. This is indicated when the value of the dev parameter is equal to DDI_DEV_T_ANY (the wildcard device number). State Structure: This section adds the following field to the state structure. See State Structure on page 57 for more information. int nblocks; /* number of blocks in block device */ Code Example 3-3 prop_op(9E) routine: static int xxprop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op, int flags, char *name, caddr_t valuep, int *lengthp) { int instance; struct xxstate *xsp; if (dev == DDI_DEV_T_ANY) goto skip; instance = getminor(dev); xsp = ddi_get_soft_state(statep, instance); if (xsp == NULL) return (DDI_PROP_NOTFOUND); if (strcmp(name, "nblocks") == 0) { ddi_prop_modify(dev, dip,
201. ess(9E). A subsequent call to ddi_mapdev_intercept(9F) will invalidate the mapping translations and allow mapdev_access(9E) to be called again. If either ddi_mapdev_intercept(9F) or ddi_mapdev_nointercept(9F) returns an error, mapdev_access(9E) should immediately return that error. If the device driver encounters a hardware failure while restoring a device context, -1 should be returned. Otherwise, after successfully handling the access request, mapdev_access(9E) should return zero. A return of other than zero from mapdev_access(9E) will cause a SIGBUS or SIGSEGV to be sent to the process. Code Example 11-2 shows how to manage a one-page device context. Code Example 11-2 mapdev_access(9E) routine: static int xxmapdev_access(ddi_mapdev_handle_t handle, void *devprivate, off_t offset) { int error; struct xxctx *ctxp = devprivate; struct xxstate *xsp = ctxp->xsp; mutex_enter(&xsp->ctx_lock); /* enable access callback for the current mapping */ if (xsp->current_ctx != NULL) { if ((error = ddi_mapdev_intercept(xsp->current_ctx->handle, offset, 0)) != 0) { xsp->current_ctx = NULL; mutex_exit(&xsp->ctx_lock); return (error); } } /* Switch device context - device dependent */ if (xxctxsave(xsp->current_ctx) < 0) { xsp->current_ctx = NULL; mutex_exit(&xsp->ctx_lock); return (-1); } if (xxctxrestore(ctxp) < 0) { xsp->current_ctx = NULL; mu
202. eter events specifies which conditions the driver should check. If the appropriate conditions have occurred, the driver sets that bit in *reventsp. If none of the conditions have occurred and anyyet is not set, the address of the pollhead structure is returned in *phpp. Code Example 8-12 Interrupt routine supporting chpoll(9E): static u_int xxintr(caddr_t arg) { struct xxstate *xsp = (struct xxstate *)arg; u_char status, temp; /* normal interrupt processing */ status = xsp->regp->csr; if (status & DEVICE_ERROR) { pollwakeup(&xsp->pollhead, POLLERR); } if (just completed a read) { pollwakeup(&xsp->pollhead, POLLIN); } return (DDI_INTR_CLAIMED); } pollwakeup(9F) is usually called in the interrupt routine when a supported condition has occurred. The interrupt routine reads the status from the status register and checks for the conditions. It then calls pollwakeup(9F) for each event to possibly notify polling threads that they should check again. Note that pollwakeup(9F) should not be called with any locks held, since it could cause the chpoll(9E) routine to be entered, causing deadlock if that routine tries to grab the same lock. Miscellaneous I/O Control: The ioctl(9E) routine is called when a user thread issues an ioctl(2) system call on a file descriptor associated with the device. The I/O control mechanism is a catchall for getting and setting
203. ev_info_t *dip, int *result); ddi_dev_nintr(9F) passes back, in the location pointed to by result, the number of different interrupt specifications that the device indicated by dip can generate. This is useful when dealing with a device that can interrupt at more than one level. int ddi_intr_hilevel(dev_info_t *dip, u_int inumber); ddi_intr_hilevel(9F) returns non-zero if the system considers the interrupt specified by inumber on the device identified by dip to be high-level. Otherwise it returns zero. These interfaces allow device drivers to store statistics about the device in the kernel for later retrieval by applications. kstat_t *kstat_create(char *module, int instance, char *name, char *class, uchar_t type, ulong_t ndata, uchar_t ks_flag); kstat_create(9F) allocates and performs necessary system initialization of a kstat(9S) structure. After a successful call to kstat_create(9F), the driver must perform any necessary initialization of the data structure and then use kstat_install(9F) to make the kstat(9S) structure accessible to user-land applications. void kstat_delete(kstat_t *ksp); kstat_delete(9F) removes the kstat(9S) structure pointed to by ksp from the kernel statistics data and frees associated system resources. void kstat_install(kstat_t *ksp); kstat_install(9F) allows the kstat(9S) structure pointed to by ksp to be accessible by the user-land applicati
204. ev_t dev, int cmd, int arg, int mode, cred_t *credp, int *rvalp) { int instance; u_char csr; struct xxstate *xsp; instance = getminor(dev); xsp = ddi_get_soft_state(statep, instance); if (xsp == NULL) { return (ENXIO); } switch (cmd) { case XX_GET_STATUS: csr = xsp->regp->csr; if (ddi_copyout(&csr, (caddr_t)arg, sizeof (u_char), mode) != 0) { return (EFAULT); } break; case XX_SET_CMD: if (ddi_copyin((caddr_t)arg, &csr, sizeof (u_char), mode) != 0) { return (EFAULT); } xsp->regp->csr = csr; break; default: /* generic "ioctl unknown" error */ return (ENOTTY); } return (0); } The cmd variable identifies a specific device control operation. If arg contains a user virtual address, ioctl(9E) must call ddi_copyin(9F) or ddi_copyout(9F) to transfer data between the data structure in the application program pointed to by arg and the driver. In Code Example 8-13, for the case of an XX_GET_STATUS request, the contents of xsp->regp->csr are copied to the address in arg. When a request succeeds, ioctl(9E) can store in *rvalp any integer value to be the return value of the ioctl(2) system call that made the request. Negative return values, such as -1, should be avoided, as they usually indicate the system call failed, and many application programs assume negative values indicate failure. An application that uses the I/O control
205. void numtos(unsigned long num, char *s); numtos(9F) converts the integer num to an ASCII decimal string and copies the string to the location pointed to by s. The driver must provide the storage for the string s and ensure that it can contain the result. char *strchr(const char *str, int chr); strchr(9F) returns a pointer to the first occurrence of the character chr in the string pointed to by str, or NULL if chr is not found in the string. int strcmp(const char *s1, const char *s2); strcmp(9F) compares two null-terminated character strings. It returns zero if they are identical; otherwise, it returns a non-zero value. int strncmp(const char *s1, const char *s2, size_t n); strncmp(9F) compares the first n characters of the two strings. It returns zero if these characters are identical; otherwise, it returns a non-zero value. char *strcpy(char *dst, const char *srs); strcpy(9F) copies the character string pointed to by srs to the location pointed to by dst. The driver must provide storage for the string dst and ensure that it is long enough. char *strncpy(char *dst, const char *srs, size_t n); strncpy(9F) copies n characters from the string pointed to by srs to the string pointed to by dst. The driver must provide storage for the string dst and ensure that it is long enough. size_t strlen(const char *sp); strlen(9F) returns the length of the character string pointed to by sp, not
206. evice specific; the driver should know how to interpret it. There is an implementation-specific agreement on the shape of the cookie. The driver should not perform any manipulations, such as logical or arithmetical, on the cookie. Freeing the DMA Resources: After a DMA transfer completes (usually in the interrupt routine), the DMA resources may be released by calling ddi_dma_free(9F). As described in Synchronizing Memory Objects on page 142, ddi_dma_free(9F) calls ddi_dma_sync(9F), eliminating the need for any explicit synchronization. After calling ddi_dma_free(9F), the DMA handle becomes invalid, and further references to the handle have undefined results. Code Example 7-3 shows how to use ddi_dma_free(9F). Code Example 7-3 Freeing DMA resources: static u_int xxintr(caddr_t arg) { struct xxstate *xsp = (struct xxstate *)arg; u_char status, temp; mutex_enter(&xsp->mu); /* read status */ status = xsp->regp->csr; if (!(status & INTERRUPTING)) { mutex_exit(&xsp->mu); return (DDI_INTR_UNCLAIMED); } xsp->regp->csr = CLEAR_INTERRUPT; /* for store buffers */ temp = xsp->regp->csr; ddi_dma_free(xsp->handle); /* check for errors */ xsp->busy = 0; mutex_exit(&xsp->mu); if (pending transfers) { (void) xxstart((caddr_t)xsp); } return (DDI_INTR_CLAIMED); } The DMA resources should be released and reallocated if a different o
207. ext window. There are also 8 global registers. The number of register windows ranges from 2 to 32, depending on the processor implementation. Because drivers are normally written in C, the fact that register windows are used is usually hidden by the compiler. However, it may be necessary to use them when debugging the driver. See Debugging Tools on page 243 for more information on how register windows are used when debugging. Also see the SPARC Assembly Language Reference Manual for more information. Floating Point Operations: Drivers should not perform floating point operations, since they are not supported in the kernel. Multiply and Divide Instructions: The Version 7 SPARC processors do not have multiply or divide instructions. These instructions are emulated in software and should be avoided. Since a driver cannot tell whether it is running on a Version 7 or Version 8 processor, intensive integer multiplication and division should be avoided if possible. Instead, use bitwise left and right shifts to multiply and divide by powers of two. SPARC Architecture Manual: The SPARC Architecture Manual, Version 8, contains more specific information on the SPARC CPU. x86 Processor Issues: This section describes a number of x86 processor-specific topics, including data alignment, byte ordering, and floating point instructions. Data Alignment: There are no alignment restrictions on data types
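The advice about shifts can be shown with a short sketch of block and byte conversions. The names XX_BSHIFT and XX_BSIZE and the 512-byte block size are assumptions chosen for illustration; a real driver would use the shift appropriate to its device.

```c
#include <assert.h>
#include <stdint.h>

#define XX_BSHIFT 9			/* hypothetical: log2 of a 512-byte block */
#define XX_BSIZE  (1u << XX_BSHIFT)	/* 512 */

/* Divide by the block size with a right shift. */
static uint32_t
xx_bytes_to_blocks(uint32_t nbytes)
{
	return (nbytes >> XX_BSHIFT);
}

/* Multiply by the block size with a left shift. */
static uint32_t
xx_blocks_to_bytes(uint32_t nblocks)
{
	assert(nblocks <= (UINT32_MAX >> XX_BSHIFT));	/* avoid overflow */
	return (nblocks << XX_BSHIFT);
}

/* Remainder modulo the block size with a mask instead of a division. */
static uint32_t
xx_offset_in_block(uint32_t nbytes)
{
	return (nbytes & (XX_BSIZE - 1));
}
```

This only works when the divisor is a power of two; other divisors still need a real multiply or divide.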
208. f, as effectively every routine is a bottom half routine. 2. A number of locking mechanisms; a common mechanism is to use mutual exclusion locks (mutexes): mutex_enter(&mu); /* access shared data */ mutex_exit(&mu); A subtle difference from SunOS 4.x is that, because everything is run by kernel threads, the interrupt routine needs to explicitly acquire and release the mutex. In SunOS 4.x this was implicit, since the interrupt handler automatically ran at an elevated priority. See Locking Primitives on page 76 for more information on locking. Condition Variables: In SunOS 4.x, when the driver wanted the current process to wait for something (such as a data transfer to complete), it called sleep(), specifying a channel and a dispatch priority. The interrupt routine then called wakeup() on that channel to notify all processes waiting on that channel that something happened. Since the interrupt could occur at any time, the interrupt priority was usually raised to ensure that the wakeup could not occur until the process was asleep. Code Example 13-2 SunOS 4.x synchronization method: int busy; /* global device busy flag */ int xxread(dev, uio) dev_t dev; struct uio *uio; { int s; s = splr(pritospl(6)); while (busy) sleep(&busy, PRIBIO + 1); busy = 1; (void) splx(s); /* do the read */ } int xxintr() { busy = 0; wakeup(&busy); } SunOS 5.x p
209. f type events listed in Code Example 8-11 occurs. This function should be called only with one event at a time. pollwakeup(9F) might be called in the interrupt routine when the condition has occurred. The following two examples show how to implement the polling discipline and how to use pollwakeup(9F). Code Example 8-11 chpoll(9E) routine: static int xxchpoll(dev_t dev, short events, int anyyet, short *reventsp, struct pollhead **phpp) { int instance; u_char status; short revent; struct xxstate *xsp; instance = getminor(dev); xsp = ddi_get_soft_state(statep, instance); if (xsp == NULL) return (ENXIO); revent = 0; /* Valid events are: POLLIN, POLLOUT, POLLPRI, POLLHUP, POLLERR. This example checks only for POLLIN and POLLERR. */ status = xsp->regp->csr; if ((events & POLLIN) && data available to read) { revent |= POLLIN; } if ((events & POLLERR) && (status & DEVICE_ERROR)) { revent |= POLLERR; } /* if nothing has occurred */ if (revent == 0) { if (!anyyet) { *phpp = &xsp->pollhead; } } *reventsp = revent; return (0); } In this example, the driver can handle the POLLIN and POLLERR events (see chpoll(9E) for a detailed discussion of the available events). The driver first reads the status register to determine the current state of the device. The param
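The event-matching logic in this chpoll(9E) excerpt can be exercised outside the kernel. The sketch below is a user-level approximation only: xx_poll_check() is a hypothetical function, DEVICE_ERROR is an assumed status bit, and the POLLIN and POLLERR values come from the user-level poll.h header. It also omits the pollhead bookkeeping that a real chpoll(9E) routine performs when no event is pending.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

#define DEVICE_ERROR 0x80	/* hypothetical error bit in the status register */

/*
 * Hypothetical helper: report which of the requested events are
 * currently true, given a status register value and a flag saying
 * whether data is available to read.
 */
static void
xx_poll_check(short events, unsigned char status, int data_ready,
    short *reventsp)
{
	short revent = 0;

	assert(reventsp != NULL);

	if ((events & POLLIN) && data_ready)
		revent |= POLLIN;
	if ((events & POLLERR) && (status & DEVICE_ERROR))
		revent |= POLLERR;

	*reventsp = revent;	/* zero means nothing has occurred yet */
}
```

Only events that the caller asked for are reported, which is why an error bit in the status register does not show up when events contains POLLIN alone.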
210. f54 ffeb8ed0 ffeb8e5c ffeb8de8 ffeb8cf8 ffeb8c54 ffeb8c04 ffeb7b5c options fd 1 7200000 virtual memory Q0 0 memory@0,0 sbus@1,f8000000 auxiliary io01 f7400003 interrupt enableG1 f5000000 memory error81 f4000000 counter timer 1 3000000 eeprom l 2000000 audio 1 7201000 zs 1 0000000 zs 1 1000000 openprom packages The full node name can be used: ok cd sbus 1 8000000 ok ls ffecd450 bwtwo@3,0 ffecc2f0 1e80 c00000 ffec9b38 esp 0 800000 ffec9af4 dma 0 400000 Rather than using the full node name in the previous example, you could have used an abbreviation. The abbreviated command line entry looks like this: ok cd sbus The name is actually device@slot,offset for SBus devices. The bwtwo device is in slot 3 and starts at offset 0. If an SBus device shows up in this tree, the device has been recognized by the PROM. The attributes command displays the PROM properties of a device. These can be examined to determine what properties the device exports; this is useful later to ensure that the driver is looking for the correct hardware properties. These are the same properties that can be retrieved with ddi_getprop(9F). See sbus(4) and Properties on page 57 for related information. ok cd bwtwo ok attributes monitor sense 00 00 00 03 intr 00 00 00 07 00 00 00 00 reg 00 00 00 03 00 00 00 00 01 00 00 00 device_type display mod
211. fecting the stack. ok e80000 c TY Be careful when trying to query the device. If the mappings are not set up correctly, trying to read or write could cause errors. There are special words provided to handle these cases: cprobe, wprobe, and lprobe, for example, read from the given address but return zero if the location does not respond, or nonzero if it does. ok ffee0000 cQ Data Access Exception ok ffee0000 cprobe ok s 0 ok ffe80000 cprobe ok s QO ffffffff A region of memory can be shown with the dump word. This takes an address and a length, and displays the contents of the memory region in bytes. In the following example, the fill word is used to fill video memory with a pattern. fill takes the address, the number of bytes to fill, and the byte to use (there are also wfill and lfill for words and longwords). This causes the bwtwo to display simple patterns based on the byte passed. 6800000 20000 map sbus constant fb fb 20000 ff fill fb 20000 O0 fill fb 18000 55 fill fb 15000 3 fill fb 10000 5 fill fb 5000 f9 fill Interrupts: Certain machine-specific interrupt levels are ignored when the Open Boot PROM controls the machine. The Sun Monitor: Normally, the Sun Monitor is used on Sun 4 architectures with a VMEbus. For complete documentation on SunMon, see
212. fer. The uio structure contains a pointer to an array of iovec(9S) structures. The base address of this array is held in uio_iov, and the number of elements is stored in uio_iovcnt. The uio_offset field contains the 32-bit offset into the device at which the application wants to begin the transfer. uio_loffset is used for 64-bit file offsets. If the device does not support the notion of an offset, these fields can be safely ignored. The driver should interpret either uio_offset or uio_loffset, but not both. The driver determines the offset used according to the settings of the flags field in the cb_ops(9S) structure. The uio_resid field starts out as the amount of data to be transferred (the sum of all the iov_len fields in uio_iov), and must be set by the driver to the amount of data not transferred before returning. The read(2) and write(2) system calls use the return value from the read(9E) and write(9E) entry points to determine if the transfer failed (and then return -1). If the return value indicates success, the system calls return the number of bytes requested minus uio_resid. If uio_resid is not changed by the driver, the read(2) and write(2) calls will return 0 (indicating end-of-file), even though all the data was transferred. The support routines uiomove(9F) and physio(9F) update the uio(9S) structure directly. If they are used, no driver adjustments are necessary. Driver
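The relationship between the entry point's return value, uio_resid, and what the application sees from read(2) or write(2) can be written out as a small sketch. xx_syscall_result() is a hypothetical function that models the rule stated above; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the rule in the text: a nonzero return from
 * the driver entry point makes the system call return -1; otherwise
 * the system call returns the bytes requested minus the residual.
 */
static long
xx_syscall_result(int drv_error, size_t requested, size_t resid)
{
	assert(resid <= requested);

	if (drv_error != 0)
		return (-1);
	return ((long)(requested - resid));
}
```

The second case in the text follows directly: if the driver moves all 512 requested bytes but leaves uio_resid at 512, the application sees a return of 0 and treats it as end-of-file.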
213. fic to SVR4. These interfaces may not be supported in future releases of System V. Only two interfaces belong to this group: segmap(9E) and hat_getkpfnum(9F). The Solaris 2.x DDI/DKI, like its SVR4 counterpart, is intended to standardize and document all interfaces between device drivers and the kernel. In addition, the Solaris 2.x DDI/DKI is designed to allow source compatibility for drivers on any SunOS 5.x-based machine, regardless of the processor architecture (such as SPARC or x86). It is also intended to provide binary compatibility for drivers running on any SunOS 5.x-based processor, regardless of the specific platform architecture (sun4, sun4c, sun4d, sun4e, sun4m, i86pc). Drivers using only kernel facilities that are part of the Solaris 2.x DDI/DKI are known as Solaris 2.x DDI/DKI-compliant device drivers. The Solaris 2.x DDI/DKI allows platform-independent device drivers to be written for SunOS 5.x-based machines. These shrink-wrapped (binary compatible) drivers allow third-party hardware and software to be more easily integrated into SunOS 5.x-based machines. The Solaris 2.x DDI/DKI is designed to be architecture independent and allow the same driver to work across a diverse set of machine architectures. Platform independence is accomplished in the design of DDI portions of the Solaris 2.x DDI/DKI. The following main areas are addressed: Interrupt handling. Accessing the device space from the kernel or a user process (regi
214. find the responsible interrupt handler. The VMEbus supports vectored interrupts. Polled Interrupts: In polled (or autovectored) devices, the only information the system has about a device interrupt is its bus interrupt priority level. When a handler is registered, the system adds the handler to a list of potential interrupt handlers for the bus interrupt level. When an interrupt occurs, the system must determine which device, of all the devices at that level, actually interrupted. It does this by calling all the interrupt handlers for that bus interrupt level until one of them claims the interrupt. The SBus supports polled interrupts. Software Interrupts: The Solaris 2.x DDI/DKI supports software interrupts, also known as soft interrupts. Soft interrupts are not initiated by a hardware device; they are initiated by software. Handlers for these interrupts must also be added to and removed from the system. Soft interrupt handlers run in interrupt context and therefore can be used to do many of the tasks that belong to an interrupt handler. Commonly, hardware interrupt handlers are supposed to be very quick, since they may suspend other system activity while running (particularly in high-level interrupt handlers). For example, they may prevent lower-priority interrupts from occurring while they run. For this reason, hardware interrupt handlers should do the minimum amount of work needed to service the device. Software interrupt handlers run
215. g mutex A. 4. Thread two needs mutex A, so it blocks holding mutex B. These threads are now deadlocked. This is hard to track down, and usually even more so since the code paths are rarely so straightforward. Also, it doesn't always happen, as it depends on the relative timing of threads one and two. Scope of a Lock: Experience has shown that it is easier to deal with locks that are either held throughout the execution of a routine, or locks that are both acquired and released in one routine. Avoid nesting like this: static void xxfoo() { mutex_enter(&softc->lock); xxbar(); } static void xxbar() { mutex_exit(&softc->lock); } This example works, but will almost certainly lead to maintenance problems. If contention is likely in a particular code path, try to hold locks for a short time. In particular, arrange to drop locks before calling kernel routines that might block. For example: mutex_enter(&softc->lock); softc->foo = bar; softc->thingp = kmem_alloc(sizeof (thing_t), KM_SLEEP); mutex_exit(&softc->lock); This is better coded as: thingp = kmem_alloc(sizeof (thing_t), KM_SLEEP); mutex_enter(&softc->lock); softc->foo = bar; softc->thingp = thingp; mutex_exit(&softc->lock); Potential Panics: Here is a set of mutex-related panics: panic: recursive mutex_enter. mutex %x caller %x Mutexes are not reen
216. g page 330; Properties page 331; Register and Memory Mapping page 333; I/O Port Access page 333; SCSI and SCSA page 336; Soft State Management page 341; String Manipulation page 344; System Information page 346; Thread Synchronization page 346; Timing page 351; uio(9S) Handling page 352; Utility Functions page 352. STREAMS interfaces are not discussed here; to learn about them, see the STREAMS Programmer's Guide. buf(9S) Handling: These interfaces manipulate the buf(9S) data structure. It is used to encode block I/O transfer requests, but some character drivers also use buf(9S) to encode character I/O requests with physio(9F). Drivers that use buf(9S) as their primary means of encoding I/O requests have to implement a strategy(9E) routine. See Chapter 9, Drivers for Block Devices, and Chapter 8, Drivers for Character Devices, for more information. void biodone(struct buf *bp); biodone(9F) marks the I/O described by the buf(9S) structure pointed to by bp as complete by setting the B_DONE flag in bp->b_flags. biodone(9F) then notifies any threads waiting in biowait(9F) for this buffer. Call biodone(9F) on bp when the I/O request it encodes is finished. void bioerror(struct buf *bp, int error); bioerror(9F) marks the error bits in the I/O described by the buf(9S) structure pointed to by bp with error. v
217. Unloading Drivers: Normally, the system automatically unloads device drivers when they are no longer in use. During development, it may be necessary to use modunload(1M) to unload the driver before installing a new version. In order for modunload(1M) to be successful, the device driver must not be active; there must be no outstanding references to the device, such as through open(2) or mmap(2). Use modunload(1M) like this to unload a driver from the system: modunload -i module_id In addition to being inactive, the driver must have working detach(9E) and _fini(9E) routines for modunload(1M) to succeed. To unload all currently unloadable modules, specify module ID zero: modunload -i 0 Debugging: This chapter describes how to debug a device driver. This includes how to set up a tip(1) connection to the test machine, how to prepare for a crash, how to use existing memory driver, and also some hints for coding the device driver. It also introduces system debugging tools that are available and gives hints on how to test the device driver. Note: The information presented in this chapter is specific to the release of the operating system and is subject to change. Machine Configuration. Setting Up a tip(1) Connection: The serial ports on one system (the host system) can be used to connect
218. handle) != DDI_DMA_MAPPED) { /* really should check all return values in a switch */ return (0); } /* program the DMA engine */ return (1); } Burst Sizes: SPARC device drivers specify the burst sizes their device supports in the dlim_burstsizes field of the ddi_dma_lim(9S) structure. This is a bitmap of the supported burst sizes. However, when DMA resources are allocated, the system might impose further restrictions on the burst sizes that may actually be used by the device. The ddi_dma_burstsizes(9F) routine can be used to obtain the allowed burst sizes. It returns the appropriate burst size bitmap for the device. When DMA resources are allocated, a driver can ask the system for appropriate burst sizes to use for its DMA engine: #define BEST_BURST_SIZE 0x20 /* 32 bytes */ if (ddi_dma_buf_setup(xsp->dip, xsp->bp, flags, xxstart, (caddr_t)xsp, &limits, &xsp->handle) != DDI_DMA_MAPPED) { /* error handling */ return (0); } burst = ddi_dma_burstsizes(xsp->handle); /* check which bit is set and choose one burstsize to program the DMA engine */ if (burst & BEST_BURST_SIZE) { /* program DMA engine to use this burst size */ } else { /* other cases */ } Programming the DMA Engine: When the resources have been successfully allocated, the driver traverses the returned DMA window and finds the first segment. Code Example 7-2 is a simple example of this. Code Example 7-2 Traversing windo
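The "other cases" branch of the burst size check could, for instance, fall back to the next smaller size that the system allows. The sketch below is one possible approach, not the manual's method. xx_pick_burst() is a hypothetical helper, and it assumes that each set bit in the bitmap stands for a burst of that many bytes, which matches the excerpt's use of 0x20 for 32 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define BEST_BURST_SIZE 0x20	/* 32 bytes, as in the excerpt */

/*
 * Hypothetical helper: given the bitmap of allowed burst sizes and a
 * preferred size (a single bit), return the largest allowed size that
 * does not exceed the preferred one, or 0 if none is allowed.
 */
static uint32_t
xx_pick_burst(uint32_t allowed, uint32_t preferred)
{
	uint32_t b;

	/* preferred must be a single bit, that is, a power of two */
	assert(preferred != 0 && (preferred & (preferred - 1)) == 0);

	for (b = preferred; b != 0; b >>= 1) {
		if (allowed & b)
			return (b);
	}
	return (0);
}
```

In a driver, the allowed bitmap would be the value returned by ddi_dma_burstsizes(9F), and a result of 0 would mean that the device cannot use any burst size at or below its preferred one.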
219. he device */
    mutex_enter(&xsp->mu);
    bp->av_forw = NULL;
    if (xsp->list_head) {
        /* Non-empty transfer list */
        xsp->list_tail->av_forw = bp;
        xsp->list_tail = bp;
    } else {
        /* Empty Transfer list */
        xsp->list_head = bp;
        xsp->list_tail = bp;
    }
    mutex_exit(&xsp->mu);
    /* Start the transfer if possible */
    (void) xxstart((caddr_t)xsp);
    return (0);
    }
3. Start the first transfer.
Device drivers that implement queuing usually have a start() routine. start() is so called because it is this routine that dequeues the next request and starts the data transfer to or from the device. In this example, all requests, regardless of the state of the device (busy or free), are processed by start().
Note: start() must be written so that it can be called from any context, since it can be called by both the strategy routine (in kernel context) and the interrupt routine (in interrupt context).
start() is called by strategy() every time it queues a request so that an idle device can be started. If the device is busy, start() returns immediately. start() is also called by the interrupt handler before it returns from a claimed interrupt so that a non-empty queue can be serviced. If the queue is empty, start() returns immediately.
Since start() is a private driver routine, it can take any arguments and return any type. The example is written as if it will also be used as a DMA callback
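The queue-then-start pattern described above can be modeled in plain C. This is a user-space sketch under stated assumptions: struct req stands in for buf(9S) with only the av_forw link, struct xxq stands in for the driver state, the xxq_* names are hypothetical, and the mutex that the real driver holds around the list is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct buf: only the av_forw link is modeled. */
struct req {
	struct req *av_forw;
	int id;
};

/* Stand-in for the driver state: list head and tail plus a busy flag. */
struct xxq {
	struct req *list_head;
	struct req *list_tail;
	int busy;
};

/* Append a request, as the strategy routine does under its mutex. */
static void
xxq_enqueue(struct xxq *q, struct req *rp)
{
	assert(rp != NULL);
	rp->av_forw = NULL;
	if (q->list_head != NULL) {
		/* Non-empty transfer list */
		q->list_tail->av_forw = rp;
		q->list_tail = rp;
	} else {
		/* Empty transfer list */
		q->list_head = rp;
		q->list_tail = rp;
	}
}

/*
 * Dequeue the next request and mark the device busy. Returns NULL,
 * changing nothing, if the device is busy or the queue is empty.
 */
static struct req *
xxq_start(struct xxq *q)
{
	struct req *rp;

	if (q->busy || q->list_head == NULL)
		return (NULL);
	rp = q->list_head;
	q->list_head = rp->av_forw;
	if (q->list_head == NULL)
		q->list_tail = NULL;
	rp->av_forw = NULL;
	q->busy = 1;
	return (rp);
}

/* Called where the interrupt handler would run: the device is idle. */
static void
xxq_done(struct xxq *q)
{
	q->busy = 0;
}
```

Calling xxq_start after every enqueue and after every completion mirrors the two call sites the text names: the strategy routine and the interrupt handler.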
220. he device may have, and any memory-mapped addresses it may occupy. Configuration files for these devices should normally identify the parent bus driver as eisa. See driver.conf(4) and eisa(4) for further details.
MCA Bus Device Issues
Memory and I/O Space
Two address spaces are provided: memory address space and I/O address space. Depending on the device, registers may appear in one or both of these address spaces.
Table 2-7 MCA address space
    MCA Space Name   Address Size   Data Transfer Size   Physical Address Range
    Main Memory      32             32                   0x0 - 0xffffffff
    I/O              8 16 32                             0x0 - 0xfff
Registers can be mapped in memory address space and used by the driver as normal memory (see "Memory-mapped Access" on page 44). Registers in I/O space are accessed through I/O port numbers using separate kernel routines. See "I/O Port Access" on page 45 for more information.
Hardware Configuration Files
MCA bus devices require hardware configuration files to inform the system that the hardware may be present. The configuration file must specify any device I/O port addresses, any interrupt capabilities that the device may have, and any memory-mapped addresses it may occupy. Configuration files for these devices should normally identify the parent bus driver as mca. See driver.conf(4) and mca(4) for further details.
Timing-Critical Sections
While most driver operations can be performed without
221. he diskhd list and starts a transfer. If the device is busy, the driver should return from the xxstrategy() entry point. Once the hardware is done with the data transfer, it generates an interrupt. The driver's interrupt routine is then called to service the device. After servicing the interrupt, the driver can then call the start() routine to process the next buf structure in the diskhd list.
Global Data Definitions
The following is information for debugging, useful when a driver runs into bus-wide problems. There is one global data variable that has been defined for the SCSA implementation: scsi_options. This variable is a SCSA configuration longword used for debug and control. The defined bits in the scsi_options longword can be found in the file <sys/scsi/conf/autoconf.h> and have the following meanings when set:
Table B-3 SCSA Options
    Option                 Description
    SCSI_OPTIONS_DR        enable global disconnect/reconnect
    SCSI_OPTIONS_SYNC      enable global synchronous transfer capability
    SCSI_OPTIONS_PARITY    enable global parity support
    SCSI_OPTIONS_TAG       enable global tagged queuing support
    SCSI_OPTIONS_FAST      enable global FAST SCSI support: 10MB/sec transfers, as opposed to 5 MB/sec
    SCSI_OPTIONS_WIDE      enable global WIDE SCSI
Note: The setting of scsi_options affects all host adapter and target drivers present on the system (as opposed to scsi_ifsetcap(9F)). Refer to scsi_hba_atta
222. he iov_base field to the address passed to write(2), in this case, buffer. The uio(9S) structure is what is passed to the driver write(9E) routine (see "Vectored I/O" for more information about the uio(9S) structure).
A problem is that this address is in user space, not kernel space, and so is not guaranteed to be currently in memory. It is not even guaranteed to be a valid address. In either case, accessing a user address directly could crash the system, so device drivers should never access user addresses directly. Instead, they should always use one of the data transfer routines in the Solaris 2.x DDI/DKI that transfer data into or out of the kernel (see "Copying Data" on page 313 and "uio(9S) Handling" on page 352 for a summary of the available routines). These routines are able to handle page faults, either by bringing the proper user page in and continuing the copy transparently, or by returning an error on an invalid access.
Two routines commonly used are copyout(9F), to copy data from the driver to user space, and copyin(9F), to copy data from user space to the driver. ddi_copyout(9F) and ddi_copyin(9F) operate similarly, but are to be used in the ioctl(9E) routine. copyin(9F) and copyout(9F) can be used on the buffer described by each iovec(9S) structure, or uiomove(9F) can perform the entire transfer to or from a contiguous area of driver (or device) memory.
In character drivers, transfers are described by a uio(9S) structure. The uio
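To make the "entire transfer to a contiguous area" idea concrete, here is a user-space sketch that gathers several iovec buffers into one destination buffer. It is an analogy only: gather_iov is a hypothetical name, struct iovec comes from the POSIX <sys/uio.h> header rather than the kernel's iovec(9S), and the plain memcpy stands in for the fault-safe copy that copyin(9F) or uiomove(9F) would perform in a driver.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical user-space analogue of a gather into driver memory:
 * copy up to n bytes described by iov[0..iovcnt) into the contiguous
 * buffer dst, advancing each iovec as it is consumed. Returns the
 * number of bytes copied. In a real driver the source addresses would
 * be user addresses and must not be dereferenced directly.
 */
static size_t
gather_iov(void *dst, size_t n, struct iovec *iov, int iovcnt)
{
	size_t done = 0;
	int i;

	assert(dst != NULL && iov != NULL);
	for (i = 0; i < iovcnt && done < n; i++) {
		size_t chunk = iov[i].iov_len;

		if (chunk > n - done)
			chunk = n - done;
		memcpy((char *)dst + done, iov[i].iov_base, chunk);
		iov[i].iov_base = (char *)iov[i].iov_base + chunk;
		iov[i].iov_len -= chunk;
		done += chunk;
	}
	return (done);
}
```

The function updates each iovec in place so that a second call continues where the first stopped, which is roughly how a uio-style descriptor tracks the bytes still to be transferred.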
223. he on-board slot, and slots 0-3 are available for SBus cards. Slots 4-14 are reserved. Because some SBus systems (such as the SPARCstation 1) may not allow some slots to perform DMA, drivers that require DMA capability should use ddi_slaveonly(9F) to determine if their device is in a DMA-capable slot. For an example use of this function, see "attach()" on page 95.
Hardware Configuration Files
Hardware configuration files should be unnecessary for SBus devices. However, on some occasions, drivers for SBus devices may need to use hardware configuration files to augment the information provided by the SBus card. See driver.conf(4) and sbus(4) for further details.
The VMEbus supports multiple address spaces. An appropriate entry in the driver.conf(4) file should be made for the address space used by the device (generally this is not under control of the driver). For DMA devices, the address space that the board uses for its DMA transfers must be known by the driver (this is usually a 32 or 24 bit space).
Address Spaces
Sun-4 architecture machines that use a VMEbus are all based on the full 32-bit VMEbus. Table 2-3 contains a listing of the VMEbus address types supported by the generic VMEbus.
Table 2-3 Generic VMEbus (full set)
    VMEbus Space Name   Address Size   Data Transfer Size   Physical Address Range
    vme32d16            32 bits        16 bits              0x0 - 0xFFFFFFFF
    vme24d16            24 bits        16 bits              0x0 - 0xFFFFFF
    vme16d16            16 bits        16 bits              0x0 - 0xFFFF
224. he system resources become low.
    0x00000080   No autounloading (streams): the system will not attempt to unload the streams module when the system resources become low.
modload and modunload
Since the kernel automatically loads needed modules and unloads unused ones, these two commands are now obsolete. However, they can be used for debugging. modload(1M) can be used to force a module into memory. The kernel may unload it subsequently, but modload(1M) may be used to ensure that the driver has no unresolved references when loaded. modunload(1M) can be used to unload a module, given a module ID (which can be determined with modinfo(1M)). Unloading a module does not necessarily remove it from memory. To unload all unloadable modules and forcibly remove them from memory (so that they will be reloaded from the actual object file), use module ID zero:
    modunload -i 0
Note: modload(1M) and modunload(1M) may be removed in a future release.
Saving System Core Dumps
When the system panics, it writes the interesting portions of memory to the dump device, which is usually the swap device. This is a system core dump, similar to the core dumps generated by applications. To save a core dump, there must be enough space in the swap area to contain it. To be safe, the primary swap area should be at least the size of main memory (all the information is in main memory, though not all
225. hem. Properly handling high-level interrupts is the preferred solution.
Note: By writing the driver as if it always uses high-level interrupts, a separate case can be avoided. However, this does result in an extra software interrupt for each hardware interrupt.
The suggested method is to add a high-level interrupt handler, which just triggers a lower-priority software interrupt to handle the device. The driver should allow more concurrency by using a separate mutex for protecting data from the high-level handler.
High-level Mutexes
A mutex initialized with the interrupt block cookie that represents a high-level interrupt is known as a high-level mutex. While holding a high-level mutex, the driver is subject to the same restrictions as a high-level interrupt handler. The only routines it can call are:
    mutex_exit(9F), to release the high-level mutex
    ddi_trigger_softintr(9F), to trigger a soft interrupt
Example
In the model presented here, the high-level mutex (xsp->high_mu) is only used to protect data shared between the high-level interrupt handler and the soft interrupt handler. This includes a queue that the high-level interrupt handler appends data to (and the low-level handler removes data from), and a flag that indicates the low-level handler is running. A separate low-level mutex (xsp->low_mu) is used to protect the rest of the driver from the soft interrupt handler.
226. i_dma_movwin(9F) moves the current DMA window in the mapping identified by handle. The new window offset and length are passed back in the locations pointed to by offp and lenp, respectively. If a pointer to a DMA cookie structure is passed in cookiep, ddi_dma_movwin(9F) calls ddi_dma_htoc(9F) and passes back a new DMA cookie in the location pointed to by cookiep.
    int ddi_dma_nextseg(ddi_dma_win_t win, ddi_dma_seg_t seg, ddi_dma_seg_t *nseg);
ddi_dma_nextseg(9F) gets the next DMA segment within the specified window win. If the current segment is NULL, the first DMA segment within the window is returned.
    int ddi_dma_nextwin(ddi_dma_handle_t handle, ddi_dma_win_t win, ddi_dma_win_t *nwin);
ddi_dma_nextwin(9F) shifts the current DMA window win within the object referred to by handle to the next DMA window nwin. If the current window is NULL, the first window within the object is returned.
    int ddi_dma_segtocookie(ddi_dma_seg_t seg, off_t *offp, off_t *lenp, ddi_dma_cookie_t *cookiep);
ddi_dma_segtocookie(9F) takes a DMA segment and fills in the cookie pointed to by cookiep with the appropriate address, length, and bus type to be used to program the DMA engine. ddi_dma_segtocookie(9F) also fills in offp and lenp, which specify the range within the object.
    int ddi_dma_setup(dev_info_t *dip, struct ddi_dma_req *dmareqp, ddi_dma_handle_t *handlep);
ddi_dma_setup(9F) is the main DMA resour
227. ibe the general flow of control for a read or write request, starting from the call to the target driver's strategy routine:
1. The target driver's strategy(9E) routine checks the request and allocates a scsi_pkt(9S) using scsi_init_pkt(9F). The target driver initializes the packet and sets the SCSI CDB using the makecom(9F) function. The target driver also specifies a timeout and provides a pointer to a callback function, which is called by the host bus adapter driver on completion of the command. The buf(9S) pointer should be saved in the SCSI packet's target-private space.
2. The target driver submits the packet to the host bus adapter driver using scsi_transport(9F). The target driver is then free to accept other requests. The target driver should not access the packet while it is in transport. If either the host bus adapter driver or the target support queueing, new requests can be submitted while the packet is in transport.
3. As soon as the SCSI bus is free and the target not busy, the host bus adapter driver selects the target and passes the CDB. The target executes the command and performs the requested data transfers. The target controls the SCSI bus phase transitions. The host bus adapter just responds to these transitions until the command completes.
4. After the target sends completion status and disconnects, the host bus adapter driver notifies the target driver by calling
228. fd@1,7200000:b,raw
S_IFCHR means create a character special file. Finally, the node type is one of the following system macros, or any string constant that does not conflict with the values of these macros. See ddi_create_minor_node(9F) for more information.
Table 5-1 Possible node types
    Constant            Description
    DDI_NT_SERIAL       Serial port
    DDI_NT_SERIAL_DO    Dialout ports
    DDI_NT_BLOCK        Hard disks
    DDI_NT_BLOCK_CHAN   Hard disks with channel or target numbers
    DDI_NT_CD           CD-ROM drives
    DDI_NT_CD_CHAN      CD-ROM drives with channel or target numbers
    DDI_NT_FD           Floppy disks
    DDI_NT_TAPE         Tape drives
    DDI_NT_NET          Network devices
    DDI_NT_DISPLAY      Display devices
    DDI_PSEUDO          General pseudo devices
The node types DDI_NT_BLOCK, DDI_NT_BLOCK_CHAN, DDI_NT_CD, and DDI_NT_CD_CHAN cause disks(1M) to identify the device instance as a disk and to create a symbolic link in the /dev/dsk or /dev/rdsk directory pointing to the device node in the /devices directory tree.
The node type DDI_NT_TAPE causes tapes(1M) to identify the device instance as a tape and to create a symbolic link from the /dev/rmt directory to the device node in the /devices directory tree.
The node type DDI_NT_SERIAL causes ports(1M) to identify the device instance as a serial port and to create symbolic links from the /dev/term and /dev/cua directories to the device node in the /devices director
229. ile each driver source file and link the resulting object files into a driver module. For a driver called xx that has two C-language source files, the following commands are appropriate:
    test% cc -D_KERNEL -c xx1.c
    test% cc -D_KERNEL -c xx2.c
    test% ld -r -o xx xx1.o xx2.o
The _KERNEL symbol must be defined while compiling kernel (driver) code. No other symbols (such as sun4c or sun4m) should be defined, other than driver private symbols. DEBUG may also be defined to enable any calls to ASSERT(9F). There is also no need to use the -I flag for the standard headers.
Once the driver is stable, optimization flags can be used. For SPARCompilers 2.0.1 and ProCompilers 2.0.1, the normal -O flag, or its equivalent -xO2, may be used. Note that -xO2 is the highest level of optimization device drivers should use (see cc(1)).
Note: Running ld -r is necessary even if there is only one object module.
Write a Hardware Configuration File
If the device is non-self-identifying, the kernel requires a hardware configuration file for it. If the driver is called xx, the hardware configuration file for it should be called xx.conf. See driver.conf(4), isa(4), pseudo(4), sbus(4), scsi(4), and vme(4) for more information on hardware configuration files.
Arbitrary properties can be defined in hardware configuration files by adding entries of the form property=value, where property is the property name and value is its initial val
230. ile holding the mutex pointed to by mp. cv_wait(9F) releases the mutex and blocks until a call is made to cv_signal(9F) or cv_broadcast(9F) for the condition variable pointed to by cvp. cv_wait(9F) then reacquires the mutex and returns. Use cv_wait(9F) to block on a condition that may take a while to change.
    void cv_signal(kcondvar_t *cvp);
cv_signal(9F) unblocks one cv_wait(9F) call that is blocked on the condition variable pointed to by cvp. Call cv_signal(9F) when the condition that cv_wait(9F) is waiting for becomes true. To unblock all threads blocked on this condition variable, use cv_broadcast(9F).
    void cv_broadcast(kcondvar_t *cvp);
cv_broadcast(9F) unblocks all threads that are blocked on the condition variable pointed to by cvp. To unblock only one thread, use cv_signal(9F).
    int cv_wait_sig(kcondvar_t *cvp, kmutex_t *mp);
cv_wait_sig(9F) is like cv_wait(9F), but if the calling thread receives a signal while cv_wait_sig(9F) is blocked, cv_wait_sig(9F) immediately reacquires the mutex and returns zero.
    int cv_timedwait(kcondvar_t *cvp, kmutex_t *mp, long timeout);
cv_timedwait(9F) is like cv_wait(9F), but it returns -1 at time timeout if the condition has not occurred. timeout is given as a number of clock ticks since the last reboot. drv_usectohz(9F) converts microseconds (a platform-independent time) to clock ticks.
    int cv_timedwait_sig(kcondvar_t *cvp, k
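The wait-and-signal protocol described above has a close user-space counterpart in POSIX threads. The sketch below is an analogy under stated assumptions, not kernel code: pthread_mutex_t and pthread_cond_t stand in for kmutex_t and kcondvar_t, and struct xxdev and the xxdev_* functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical device state: mu guards busy, cv means "busy changed". */
struct xxdev {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	int busy;
};

static void
xxdev_init(struct xxdev *d)
{
	pthread_mutex_init(&d->mu, NULL);
	pthread_cond_init(&d->cv, NULL);
	d->busy = 0;
}

/* Block until the device is free, then claim it. */
static void
xxdev_acquire(struct xxdev *d)
{
	pthread_mutex_lock(&d->mu);
	while (d->busy)		/* re-check the condition after each wakeup */
		pthread_cond_wait(&d->cv, &d->mu);
	d->busy = 1;
	pthread_mutex_unlock(&d->mu);
}

/* Release the device and wake one waiter, in the role of cv_signal(9F). */
static void
xxdev_release(struct xxdev *d)
{
	pthread_mutex_lock(&d->mu);
	d->busy = 0;
	pthread_cond_signal(&d->cv);
	pthread_mutex_unlock(&d->mu);
}
```

The while loop matters in both worlds: the waiter holds the mutex again when the wait returns, but the condition may already have been consumed by another thread, so it is tested again before proceeding.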
231. ill fill in the sd_inq field of the scsi_device structure. */
    switch (scsi_probe(sdp, NULL_FUNC)) {
    case SCSIPROBE_FAILURE:
    case SCSIPROBE_NORESP:
    case SCSIPROBE_NOMEM:
        /*
         * In these cases, device may be powered off, in which
         * case we may be able to successfully probe it at some
         * future time (referred to as deferred attach).
         */
        rval = DDI_PROBE_PARTIAL;
        break;
    case SCSIPROBE_NONCCS:
    default:
        /*
         * Device isn't of the type we can deal with, and/or
         * it will never be useable.
         */
        rval = DDI_PROBE_FAILURE;
        break;
    case SCSIPROBE_EXISTS:
        /*
         * There is a device at the target/lun address. Check
         * inq_dtype to make sure that it is the right device
         * type. See scsi_inquiry(9S) for possible device types.
         */
        switch (sdp->sd_inq->inq_dtype) {
        case DTYPE_PRINTER:
            scsi_log(sdp, "xx", SCSI_DEBUG,
                "found %s device at target %d lun %d\n",
                scsi_dname((int)sdp->sd_inq->inq_dtype),
                target, lun);
            rval = DDI_PROBE_SUCCESS;
            break;
        case DTYPE_NOTPRESENT:
        default:
            rval = DDI_PROBE_FAILURE;
            break;
        }
    }
    scsi_unprobe(sdp);
    return (rval);
    }
A more thorough probe(9E) routine could also check other fields of the scsi_inquiry(9S) structure as necessary to make sure that the device is of the type expected by a particular driver.
attach()
After the pr
232. in the SPARCstation example, these devices are represented as children of their physical parent, which in this case is the ISA bus node. The SCSI host adapter also has two children, a disk (cmdk) and a tape (cmtp). The x86 device tree also shows the pseudo bus nexus node. This node is the parent of all pseudo device drivers (drivers without hardware).
The prtconf(1M) and sysdef(1M) commands display the internal device tree. The /devices hierarchy is the external representation of the device tree; ls(1) can be used to view it.
Hardware Overview
This chapter discusses some general issues about the hardware that SunOS 5.x runs on. This includes issues related to the processor, bus architectures, and memory models supported by Solaris 2.x, various device issues, and the PROM used in Sun platforms.
Note: The information presented here is for informational purposes only and may be of help during driver debugging. However, the Solaris 2.x DDI/DKI hides many of these implementation details from device drivers.
SPARC Processor Issues
This section describes a number of SPARC processor-specific topics, including data alignment, byte ordering, register windows, and availability of floating-point instructions.
Data Alignment
All quantities must be aligned on their natural boundaries. Using standard C data types: short integers are aligned on 16-bit boundaries, long integers are aligned on
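Natural alignment means that the address of an object is a multiple of the object's size. A minimal sketch of that check follows; is_naturally_aligned is a hypothetical helper, and sizes are passed explicitly in bytes so the example makes no claim about the width of any particular C type on a given platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if addr is a multiple of size, that is, if an object
 * of that size would sit on its natural boundary at addr.
 * size must be a nonzero power of two.
 */
static int
is_naturally_aligned(uintptr_t addr, size_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return ((addr & (size - 1)) == 0);
}
```

For instance, a 2-byte quantity at address 0x1002 is on a 16-bit boundary, while a 4-byte quantity at the same address is not on a 32-bit boundary.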
233. in the tree are generally associated with buses (such as the SBus, SCSI, and EISA buses). These nodes are called bus nexus nodes, and the drivers associated with them are called bus nexus drivers. Bus nexus drivers encapsulate the architectural dependencies associated with a particular bus. This manual does not document writing bus nexus drivers.
This approach allows drivers to function regardless of the architecture of the machine or the processor. In all of the architectural configurations in Figure 1-1, the xyz driver can be source compatible, and it can be binary compatible if the system uses the same Instruction Set Architecture.
Additionally, in Figure 1-1, the bus nexus driver associated with the SBus-to-VMEbus adapter card handles all of the architectural dependencies of the interface. The xyz driver only needs to know that it is connected to a VMEbus.
Example Device Tree
In this example, the system builds a tree structure that contains information about the devices connected to the machine at boot time. The system uses this information to build a node for each device and to create a dependency tree.
Figure 1-2 illustrates two device trees that might be created for particular SPARCstation and x86 machines.
[Figure 1-2: beginning of the Sun 4/60 tree, with nodes zs, zs, fd, sbus]
234. in which these operations are issued by a processor is related to the order in which they reach memory. The memory model applies to both uniprocessors and shared-memory multiprocessors. Two memory models are supported by the SPARC processor: Total Store Ordering (TSO) and Partial Store Ordering (PSO). All SPARC processors must support TSO.
TSO guarantees that the store, FLUSH, and atomic load-store instructions of all processors appear to be executed by memory serially in a single order called the memory order. Furthermore, the sequence of store, FLUSH, and atomic load-store instructions in the memory order for a given processor is identical to the sequence in which they were issued by the processor.
Like the TSO memory model, PSO guarantees that the store, FLUSH, and atomic load-store instructions of all processors appear to be executed by memory serially in a single order called the memory order. However, the memory order of store, FLUSH, and atomic load-store instructions for a given processor is, in general, not the same as the order in which the instructions were issued by that processor. Conformance between issuing order and memory order is provided by the STBAR instruction: if two of the above instructions are separated by an STBAR in the issuing order of a processor, or if they reference the same location, then the memory order of the two instructions is the same as the issuing order.
Bus Architectures
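The practical consequence of a model like PSO is that a program publishing data through a flag must say which stores have to become visible first. The sketch below expresses that requirement with C11 atomics, which postdate this manual and are not part of the DDI/DKI; it is an illustration only, with hypothetical names, in which the release store plays the role that a store barrier such as STBAR plays between two stores.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;		/* ordinary data, written before the flag */
static atomic_int ready;	/* flag that publishes the payload */

/* Writer: store the data, then publish it with release ordering. */
static void
publish(int value)
{
	payload = value;
	/* the payload store may not be reordered after this store */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: return 1 and fill *out only if the flag was observed set. */
static int
try_consume(int *out)
{
	if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		return (0);
	*out = payload;
	return (1);
}
```

Without the ordering constraint, a reader on another processor could in principle see the flag set while still reading a stale payload, which is exactly the kind of issuing-order versus memory-order difference the text describes.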
235. including the null termination character.
System Information
These interfaces return current information about the system, such as the root node of the system dev_info tree, and the values of certain system-wide parameters.
    dev_info_t *ddi_root_node(void);
ddi_root_node(9F) returns a pointer to the root node of the system dev_info tree. Device drivers rarely use this.
    int drv_getparm(unsigned long parm, unsigned long *valuep);
drv_getparm(9F) retrieves the value of the system parameter parm and returns that value in the location pointed to by valuep. See the manual page for a list of possible parameters.
Thread Synchronization
These interfaces allow a device to exploit multiple CPUs on multiprocessor machines. They prevent the corruption of data by simultaneous access by more than one thread. The mechanisms for doing this are mutual exclusion locks (mutexes), condition variables, readers/writer locks, and semaphores.
    void cv_init(kcondvar_t *cvp, char *name, kcv_type_t type, void *arg);
cv_init(9F) prepares the condition variable pointed to by cvp for use. CV_DRIVER should be specified for type.
    void cv_destroy(kcondvar_t *cvp);
cv_destroy(9F) releases the resources associated with the condition variable pointed to by cvp.
    void cv_wait(kcondvar_t *cvp, kmutex_t *mp);
cv_wait(9F) must be called wh
237. Table 2-3 Generic VMEbus (full set)
    VMEbus Space Name   Address Size   Data Transfer Size   Physical Address Range
    vme32d32            32 bits        32 bits              0x0 - 0xFFFFFFFF
    vme24d32            24 bits        32 bits              0x0 - 0xFFFFFF
    vme16d32            16 bits        32 bits              0x0 - 0xFFFF
Not all of these address spaces are commonly used; nevertheless, they are all supported on Sun-4 architecture systems. Table 2-4 indicates their sizes and physical address mappings.
Table 2-4 Page table types for the Sun-4
    Type   Address Space Name                                   Address Range
    0      On-board Memory                                      0x0 - 0xFFFFFFFF
    1      On-board I/O                                         0x0 - 0xFFFFFF
    2      vme32d16                                             0x0 - 0xFEFFFFFF
    3      vme32d32                                             0x0 - 0xFEFFFFFF
    2      vme24d16 (Stolen from top 16M of vme32d16)           0x0 - 0xFEFFFF
    2      vme16d16 (Stolen from top 64K of vme24d16)           0x0 - 0xFFFF
    3      vme24d32 (Stolen from top 16M of vme32d32)           0x0 - 0xFEFFFF
    3      vme16d32 (Stolen from top 64K of vme24d32)           0x0 - 0xFFFF
The type is a field in the page table entry used by the Sun-4 MMU to implement the virtual memory subsystem. It indicates the type of memory referenced:
    Type 0: main memory
    Type 1: on-board I/O
    Type 2: VMEbus memory, 16-bit data
    Type 3: VMEbus memory, 32-bit data
Other Sun machines, such as the SPARCserver 600 series, have a different format for page table entries.
[Figure: CPU bits, Virtual Address, CPU or DVMA]
When a smaller VME space
238. interval
    uwritec   uwritec        remove a character from a uio structure
    wakeup    cv_broadcast   signal condition and wake all blocked threads
Advanced Topics
This appendix contains a collection of topics. Not all drivers need to be concerned with the issues addressed.
Multithreading
This section supplements the guidelines presented in Chapter 4, "Multithreading," for writing an MT-safe driver, a driver that safely supports multiple threads.
Lock Granularity
Here are some issues to consider when deciding on how many locks to use in a driver:
    The driver should allow as many threads as possible into the driver: this leads to fine-grained locking.
    However, it should not spend too much time executing the locking primitives: this approach leads to coarse-grained locking.
    Moreover, the code should be simple and maintainable.
    Avoid lock contention for shared data.
    Write reentrant code wherever possible. This makes it possible for many threads to execute without grabbing any locks.
    Use locks to protect the data and not the code path.
    Keep in mind the level of concurrency provided by the device: if the controller can only handle one request at a time, there is no point in spending a lot of time making the driver handle multiple threads.
A little thought in reorganizing the ordering and types of locks around such data can lead to considerable savings
239. dump()
The dump(9E) entry point is used to dump a portion of virtual address space directly to the specified device in the case of a system failure.
    int xxdump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk)
dev is the device number of the device to dump to, addr is the base kernel virtual address to start the dump at, blkno is the beginning block to dump to, and nblk is the number of blocks to dump. The example depends on the existing driver working properly. It creates a buf(9S) request to pass to strategy(9E). Interrupts are not necessarily enabled at this point, so xxdump() calls a special version of strategy(9E) (not shown) that only does polled I/O (non-interrupt driven).
Code Example 9-9 Block driver dump(9E) routine
    static int
    xxdump(dev_t dev, caddr_t addr, daddr_t blkno, int nblk)
    {
        int error;
        struct buf *bp;
        /* Allocate a buf structure to perform the dump */
        bp = getrbuf(KM_NOSLEEP);
        if (bp == NULL)
            return (EIO);
        /*
         * Set the appropriate fields in the buf structure.
         * This is OK since the driver knows what its strategy
         * routine will examine.
         */
        bp->b_un.b_addr = addr;
        bp->b_edev = dev;
        bp->b_bcount = nblk * DEV_BSIZE;
        bp->b_flags = B_WRITE | B_KERNBUF;
        bp->b_blkno = blkno;
        /* disable interrupts */
        (void) xxstrategy_poll(bp);
        /*
         * Wait here until the driver performs a biodone(9F)
         * on the buffer being t
240. ion between the top half and bottom half of a driver than there was in SunOS 4.x. All driver code is executed by a thread, which may be running in parallel with threads in other (or the same) part of a driver. The distinction now is whether these threads have user context. See Chapter 4, "Multithreading," for more information.
Locking
Under SunOS 4.1.2 or later, only one processor can be in the kernel at any one time. This is accomplished by using a master lock around the entire kernel. When a processor wants to execute kernel code, it needs to acquire the lock (this excludes other processors from running the code protected by the lock) and then release the lock when it is through. Because of this master lock, drivers written for uniprocessor systems did not change for multiprocessor systems. Two processors could not execute driver code at the same time.
In SunOS 5.x, instead of one master lock, there are many smaller locks that protect smaller regions of code. For example, there may be a kernel lock that protects access to a particular vnode, and one that protects an inode. Only one processor can be running code dealing with that vnode at a time, but another could be accessing an inode. This allows a greater degree of concurrency.
However, because the kernel is multithreaded, the possibility exists that two or more threads are in driver code at the same time.
1. One thread could be in
241. ivers. identify(9E) should not maintain a device count, since the system does not guarantee that identify(9E) will be called for all device instances before attach(9E) is called for any device instance, nor does the system make any guarantees about the number of times identify(9E) will be called for any given device.
Instance Numbers
In SunOS 4.x, drivers counted the calls to identify(9E) and used the current value of this number as an instance number in a later call to attach(). However, drivers must not do this in Solaris 2.x. The system now assigns an instance number to each device; this number is derived in an implementation-specific manner from different properties for the different device types. The following properties are used to derive instance numbers:
    The reg property is used for SBus, VMEbus, ISA, EISA, and MCA devices. Non-self-identifying device drivers provide this in the hardware configuration file. See sbus(4), isa(4), and vme(4).
    The target and lun properties are used for SCSI target devices. These are provided in the hardware configuration file. See scsi(4).
    The instance property is used for pseudo devices. This is provided in the hardware configuration file. See pseudo(4).
The driver should retrieve the particular instance number that has been assigned by calling ddi_get_instance(9F). See Code Example 5-4 on page 95 for an example.
Persistent Ins
242. struct device_reg {
        u_char csr;
        u_char data;
    };
    volatile struct device_reg *regp;
Although the two examples are functionally equivalent, the second one requires the writer to ensure that volatile is used in every declaration of type struct device_reg. The first example results in the data being treated as volatile in all declarations and is therefore preferred.
Multithreading
This chapter describes the locking primitives and thread synchronization mechanisms of the SunOS multithreaded kernel.
Threads
A thread of control, or thread, is a sequence of instructions executed within a program. A thread can share data and code with other threads and can run concurrently with other threads. There are two kinds of threads: user threads and kernel threads. See Multithreaded Programming Guide for more information on threads.
User Threads
Each process in the SunOS operating system has an address space that contains one or more lightweight processes (LWPs), each of which in turn runs one or more user threads. Figure 4-1 shows the relationship between threads, LWPs, and processes. An LWP schedules its user threads and runs one user thread at a time, though multiple LWPs may run concurrently. User threads are handled in user space.
The LWP is the interface between user threads and the kernel. The LWP can be thought of
243. void kmem_free(void *cp, size_t size);
Memory allocated by kmem_alloc(9F) or by kmem_zalloc(9F) is returned to the system with kmem_free(9F). This is similar to the C library routine free(3C), with the addition of the size argument. Drivers must keep track of the size of each object they allocate in order to call kmem_free(9F) later.
Software State Management
State Structure
For each device that the driver handles, the driver must keep some state information. At the minimum, this consists of a pointer to the dev_info node for the device (required by getinfo(9E)). The driver can define a structure that contains all the information needed about a single device:
    struct xxstate {
        dev_info_t *dip;
    };
This structure will grow as the device driver evolves. Additional useful fields might be:
    A pointer to each of the device's mapped registers
    Flags (such as busy)
The initial state structure the examples in this book use is given in Code Example 3-2.
Code Example 3-2 Initial State Structure
    struct xxstate {
        dev_info_t *dip;
        struct device_reg *regp;
    };
Subsequent chapters may require new fields. Each chapter will list any additions to the state structure.
State Management Routines
To assist device driver writers in allocating state structures, the Solaris 2.x DDI/DKI provides a set of memory management routines called the software state routin
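The idea of per-instance state can be illustrated with a small user-space table indexed by instance number. This is a sketch under stated assumptions, not the DDI interface: the xx_state_* functions, the fixed XX_MAXINST table size, and the int field standing in for dev_info_t *dip are all hypothetical, and calloc and free take the place of the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

struct xxstate {
	int instance;	/* stand-in for dev_info_t *dip */
	int busy;	/* example flag field */
};

#define XX_MAXINST 8	/* hypothetical fixed table size */
static struct xxstate *xx_table[XX_MAXINST];

/* Allocate zeroed state for an instance; 0 on success, -1 on failure. */
static int
xx_state_zalloc(int instance)
{
	if (instance < 0 || instance >= XX_MAXINST ||
	    xx_table[instance] != NULL)
		return (-1);
	xx_table[instance] = calloc(1, sizeof (struct xxstate));
	if (xx_table[instance] == NULL)
		return (-1);
	xx_table[instance]->instance = instance;
	return (0);
}

/* Look up state; NULL if the instance has no state allocated. */
static struct xxstate *
xx_state_get(int instance)
{
	if (instance < 0 || instance >= XX_MAXINST)
		return (NULL);
	return (xx_table[instance]);
}

/* Free the state for an instance and clear its table slot. */
static void
xx_state_free(int instance)
{
	if (instance < 0 || instance >= XX_MAXINST)
		return;
	free(xx_table[instance]);
	xx_table[instance] = NULL;
}
```

A lookup that returns NULL is the natural signal that an instance is not attached, which matches how the open(9E) example earlier in the manual treats a failed state lookup.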
244. izes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    1478 files, 9922 used, 29261 free (141 frags, 3640 blocks, 0.4% fragmentation)
    # mount /dev/dsk/c0t3d0s0 /mnt
    # drvconfig -r /mnt/devices
    # devlinks -r /mnt
    # disks -r /mnt
    # tapes -r /mnt
    # ports -r /mnt

Caution: Fixing /devices and /dev may allow the system to boot, but other parts of the system may still be corrupted. This may only be a temporary fix to allow saving of information, such as system core dumps, before reinstalling the system.

Booting an Alternate Kernel

A kernel other than /kernel/unix can be booted by specifying it as the boot file. In fact, backup copies of all the system drivers in /kernel can be made and used in the event the originals fail (this is probably more useful if more than one driver is being debugged). For example:

    # cp -r /kernel /kernel.orig

To boot the original system, boot kernel.orig/unix. By default, the first module directory in the module directory path is the one the kernel resides in. By booting kernel.orig/unix, the module directory path becomes /kernel.orig /usr/kernel.

    ok boot disk1 kernel.orig/unix
    Rebooting with command: disk1 kernel.orig/unix
    Boot device: /sbus/esp@0,800000/sd@1,0  File and args: kernel.orig/unix
    SunOS Release 5.4 Version Generic UNIX R
245. ked thread with cv_signal(9F).
- Release the mutex.

Code Example 4-1 uses a busy flag, mutex, and condition variable to force the read(9E) routine to wait until the device is no longer busy before starting a transfer.

Code Example 4-1 Using mutexes and condition variables

    static int
    xxread(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        struct xxstate *xsp;
        ...
        mutex_enter(&xsp->mu);
        while (xsp->busy)
            cv_wait(&xsp->cv, &xsp->mu);
        xsp->busy = 1;
        mutex_exit(&xsp->mu);
        /* do the read */
    }

    static u_int
    xxintr(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;

        mutex_enter(&xsp->mu);
        xsp->busy = 0;
        cv_broadcast(&xsp->cv);
        mutex_exit(&xsp->mu);
    }

In Code Example 4-1, xxintr() always calls cv_broadcast(9F), even if there are no threads waiting on the condition. This extra call can be avoided by using a want flag in the state structure. Before a thread blocks on the condition variable (such as because the device is busy), it sets the want flag, indicating that it wants to be signalled when the condition occurs. When the condition occurs (the device finishes the transfer), the call to cv_broadcast(9F) is made only if the want flag is set.

Code Example 4-2 Using a want flag

    static int
    xxread(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        struct xxstate *xsp;
        ...
        mutex_enter(&xsp->mu);
        while (xsp->busy) {
            xsp->want = 1;
            cv_wait(&xsp->cv, &xs
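Since the want-flag listing in this excerpt is cut off, the following is a user-space sketch of the same pattern, not the manual's code. It swaps in POSIX threads (pthread_mutex_t and pthread_cond_t) for the kernel's kmutex_t and kcondvar_t. The function names xx_init(), xx_acquire(), and xx_release(), the wakeups counter, and the choice to clear want before broadcasting are all assumptions made for illustration.

```c
#include <pthread.h>

/* User-space stand-in for the driver state structure (illustrative). */
struct xxstate {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int busy;      /* device is in use */
    int want;      /* some thread is waiting for busy to clear */
    int wakeups;   /* number of broadcasts issued, for demonstration */
};

static void xx_init(struct xxstate *xsp)
{
    pthread_mutex_init(&xsp->mu, NULL);
    pthread_cond_init(&xsp->cv, NULL);
    xsp->busy = 0;
    xsp->want = 0;
    xsp->wakeups = 0;
}

/* Analog of the read path: wait until not busy, then claim the device. */
static void xx_acquire(struct xxstate *xsp)
{
    pthread_mutex_lock(&xsp->mu);
    while (xsp->busy) {
        xsp->want = 1;                          /* ask to be woken */
        pthread_cond_wait(&xsp->cv, &xsp->mu);  /* drops and retakes mu */
    }
    xsp->busy = 1;
    pthread_mutex_unlock(&xsp->mu);
}

/* Analog of the interrupt path: broadcast only if a waiter asked for it. */
static void xx_release(struct xxstate *xsp)
{
    pthread_mutex_lock(&xsp->mu);
    xsp->busy = 0;
    if (xsp->want) {
        xsp->want = 0;
        xsp->wakeups++;
        pthread_cond_broadcast(&xsp->cv);
    }
    pthread_mutex_unlock(&xsp->mu);
}
```

When no thread has set want, xx_release() skips the broadcast entirely, which is the saving the text describes.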
246. kt(9S) structure and related DMA resources previously allocated by get_pktiopb(9F).

    void makecom_g0(struct scsi_pkt *pkt, struct scsi_device *devp, int flag, int cmd, int addr, int cnt);

makecom_g0(9F) formulates a group 0 SCSI command for the target device denoted by devp in the scsi_pkt(9S) structure pointed to by pkt. The target must be a non-sequential access device. Use makecom_g0_s(9F) to formulate group 0 commands for sequential access devices.

    void makecom_g0_s(struct scsi_pkt *pkt, struct scsi_device *devp, int flag, int cmd, int cnt, int fixbit);

makecom_g0_s(9F) formulates a group 0 SCSI command for the sequential access target device denoted by devp in the scsi_pkt(9S) structure pointed to by pkt. Use makecom_g0(9F) to formulate group 0 commands for non-sequential access devices.

    void makecom_g1(struct scsi_pkt *pkt, struct scsi_device *devp, int flag, int cmd, int addr, int cnt);

makecom_g1(9F) formulates a group 1 SCSI command for the target device denoted by devp in the scsi_pkt(9S) structure pointed to by pkt.

    void makecom_g5(struct scsi_pkt *pkt, struct scsi_device *devp, int flag, int cmd, int addr, int cnt);

makecom_g5(9F) formulates a group 5 SCSI command for the target device denoted by devp in the scsi_pkt(9S) structure pointed to by pkt.

    int scsi_abort(struct scsi_address *ap, struct scsi_pkt *pkt);

scsi_abort(9F) cancels the command encoded in the sc
247. l unit attached to the system. The target driver can retrieve a pointer to this structure by calling ddi_get_driver_private(9F).

Caution: Because the host bus adapter driver uses the private field in the target device's dev_info structure, target drivers should not use ddi_set_driver_private(9F).

The scsi_device(9S) structure contains the following fields:

    struct scsi_address         sd_address;
    dev_info_t                  *sd_dev;
    kmutex_t                    sd_mutex;
    struct scsi_inquiry         *sd_inq;
    struct scsi_extended_sense  *sd_sense;
    caddr_t                     sd_private;

sd_address is a data structure that is passed to the SCSI resource allocation routines.

sd_dev is a pointer to the target's dev_info structure.

sd_mutex is a mutex for use by the target driver. This is initialized by the host bus adapter driver and can be used by the target driver as a per-device mutex. Do not hold this mutex across a call to scsi_transport(9F) or scsi_poll(9F). See Chapter 4, "Multithreading," for more information on mutexes.

sd_inq is a pointer for the target device's SCSI Inquiry data. The scsi_probe(9F) routine allocates a buffer, fills it in, and attaches it to this field.

sd_sense is a pointer to a buffer to contain SCSI Request Sense data from the device. The target driver must allocate and manage this buffer itself; see "attach()" on page 202.

sd_private is a pointer field for use by the target driver. It is commonly used to store a point
248. l entry points should be tested in this process, including mmap(9E), poll(9E), and ioctl(9E), if applicable. The ioctl(9E) tests may be quite different for each driver, and for nonstandard devices a custom testing application will be required.

Error Handling

A driver may perform correctly in an ideal environment, but fail to handle cases where a device encounters an error, or an application specifies erroneous operations or sends bad data to the driver. Therefore, an important part of driver testing is the testing of its error handling.

All of a driver's possible error conditions should be exercised, including error conditions for actual hardware malfunctions. Some hardware error conditions may be difficult to induce, but an effort should be made to cause them or to simulate them if possible. It should always be assumed that all of these conditions will be encountered in the field. Cables should be removed or loosened, boards should be removed, and erroneous user application code should be written to test those error paths.

Stress, Performance, and Interoperability Testing

To help ensure that the driver performs well, it should be subjected to vigorous stress testing. Running single threads through a driver will not test any of the locking logic and might not test condition variable waits. Device operations should be performed by multiple processes at once in order to cause several threads to exe
249. le can be used to invalidate and validate the mapping translations. If the driver invalidates the mapping translations, it will be notified of any future access to the mapping. If the driver validates the mapping translations, it will no longer be notified of accesses to the mapping. Mappings are always created with the mapping translations invalidated, so that the driver will be notified on first access to the mapping.

To ensure that a device driver can distinguish between the various user processes that have memory-mapped the device, only mappings of type MAP_PRIVATE can be used with ddi_mapdev(9F).

The dev, offset, asp, addrp, len, prot, maxprot, flags, and cred arguments are passed into the segmap(9E) entry point and should be passed on to ddi_mapdev(9F) unchanged. ddi_mapdev(9F) also takes the driver-defined structure ddi_mapdev_ctl(9S) and a pointer to device private data. This pointer is passed into each entry point and is usually a pointer to the device context structure.

Code Example 11-1 segmap(9E) entry point

    static struct ddi_mapdev_ctl xx_mapdev_ctl = {
        MAPDEV_REV,
        xxmapdev_access,
        xxmapdev_free,
        xxmapdev_dup
    };

Writing Solaris Graphics Device Drivers, August 1994

    static int
    xxsegmap(dev_t dev, off_t off, struct as *asp, caddr_t *addrp,
        off_t len, unsigned int prot, unsigned int maxprot,
        unsigned int flags, cred_t *credp)
    {
        int error;
        int instance = getminor(dev);
        struct xxstate *xsp
250. lim_sgllen If the DMA engine has no particular limitation, this field should be set to 0xFFFFFFFF.

Here are some examples specifying device limitations.

Example One

A DMA engine on a SPARC SBus device has the following limitations:
- It can only access addresses ranging from 0xFF000000 to 0xFFFFFFFF.
- It has a 32-bit address register.
- It supports 1-, 2-, and 4-byte burst sizes.
- It has a minimum effective transfer size of 1 byte.
- The system should not make optimizations related to transfer speed. The average speed is not known, so the dlim_dmaspeed field is set to zero.

The resulting limit structure is:

    static ddi_dma_lim_t limits = {
        0xFF000000,    /* low address */
        0xFFFFFFFF,    /* high address */
        0xFFFFFFFF,    /* address register maximum */
        0x7,           /* burst sizes: 0x1 | 0x2 | 0x4 */
        0x1,           /* minimum transfer size */
        0              /* speed */
    };

Example Two

A DMA engine on a SPARC VMEbus device has the following limitations:
- It can address the full 32-bit range.
- It has a 24-bit address register.
- It supports 2- to 256-byte burst sizes, and all powers of 2 in between.
- It has a minimum effective transfer size of 2 bytes.
- It has an average transfer speed of 10 Mbytes per second.

The resulting limit structure is:

    static ddi_dma_lim_t limits = {
        0x00000000,    /* low address */
        0xFFFFFFFF,    /* high address */
        0xFFFFFF,      /* address register maximum */
        0x1FE,         /* burst sizes */
        2,             /* minimum transfer size */
        10240          /* speed */
    };
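The burst-size values in both examples (0x7 for 1, 2, and 4 bytes; 0x1FE for 2 through 256 bytes) are consistent with a bitmask in which bit n is set when bursts of 2^n bytes are supported. The sketch below derives those two masks and the address register maxima arithmetically. The helper names burst_mask() and addr_reg_max() are hypothetical and are not part of the DDI.

```c
#include <stdint.h>

/*
 * Build a burst-size mask covering every power of two from min_bytes
 * to max_bytes inclusive. Both arguments are assumed to be powers of two.
 */
static uint32_t burst_mask(uint32_t min_bytes, uint32_t max_bytes)
{
    uint32_t mask = 0;
    uint32_t size;

    /* size becomes 0 after shifting past bit 31, which ends the loop. */
    for (size = min_bytes; size != 0 && size <= max_bytes; size <<= 1)
        mask |= size;
    return mask;
}

/* Largest value an address register of the given bit width can hold. */
static uint32_t addr_reg_max(unsigned bits)
{
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}
```

With these helpers, Example One corresponds to burst_mask(1, 4) and addr_reg_max(32), and Example Two to burst_mask(2, 256) and addr_reg_max(24).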
251. loaded dynamically as references are made to them. For example, when a device special file is opened (see open(2)), the corresponding driver is loaded if it is not already in memory. Device drivers must provide support for dynamic loading. See Chapter 5, "Autoconfiguration," for more details about the loadable module interface.

Overview of the Solaris 2.x DDI/DKI

In System V Release 4 (SVR4), the interface between device drivers and the rest of the UNIX kernel has been standardized and documented in Section 9 of the Solaris 2.4 Reference Manual AnswerBook. The reference manual documents driver entry points, driver-callable functions, and kernel data structures used by device drivers. These interfaces, known collectively as the Solaris 2.x Device Driver Interface/Driver Kernel Interface (Solaris 2.x DDI/DKI), are divided into the following subdivisions:

- Device Driver Interface/Driver Kernel Interface (DDI/DKI): Includes architecture-independent interfaces supported on all implementations of System V Release 4 (SVR4).
- Solaris DDI: Includes architecture-independent interfaces specific to Solaris.
- Solaris SPARC DDI: Includes SPARC Instruction Set Architecture (ISA) interfaces specific to Solaris.
- Solaris x86 DDI: Includes x86 Instruction Set Architecture (ISA) interfaces specific to Solaris.
- Device Kernel Interface (DKI): Includes DKI-only architecture-independent interfaces speci
252. loating point

    c       Print the addressed character
    C       Print the addressed character using escape notation
    s       Print the addressed string
    S       Print the addressed string using escape notation
    i       Print as machine instructions (disassemble)
    a       Print the value of . in symbolic form
    w, W    Write a 2-byte or 4-byte value

Note: Understand exactly what sizes the objects are, and what effects changing them might have, before making any changes.

For example, to set a bit in the moddebug variable when debugging the driver, first examine the value of moddebug, then OR in the desired bit.

    kadb[0]: moddebug/X
    moddebug:
    moddebug:       0
    kadb[0]: moddebug/W 0x80000000
    moddebug:       0x0 = 0x80000000

Routines can be disassembled with the i command. This is useful when tracing crashes, since the only information may be the program counter at the time of the crash. The output has been formatted for readability.

    kadb[0]: strcmp,4?i
    strcmp:
    strcmp:         ba      strcmp + 0x20
                    ldsb    [%o1], %o5
                    add     %o0, 0x1, %o0
                    orcc    %g0, %o5, %g0

To show the addresses also, specify symbolic notation with the a command.

    kadb[0]: strcmp,4?ai
    strcmp:
    strcmp:         ba      strcmp + 0x20
    strcmp+4:       ldsb    [%o1], %o5
    strcmp+8:       add     %o0, 0x1, %o0
    strcmp+0xc:     orcc    %g0, %o5, %g0

Register Identifiers

Machine or kadb(1M) internal registers are identified with the command foll
253. low of control 192 interfaces 334 resource allocation 205 simple driver code listing 355 target driver overview 189 target drivers 102 195 self identifying devices 14 semaphores 344 slice number for block devices 171 soft state structure 55 341 source compatibility 4 source files for device drivers 62 SPARC processor byte ordering 10 12 data alignment 9 11 floating point operations 11 12 multiply and divide instructions 11 register windows 10 structure member alignment 10 11 special files 3 sst_getinfo entry point 102 state structure description of 55 management routines 56 store buffers 12 STREAMS drivers 44 interfaces 308 string manipulation 342 structure padding 47 SunDDI DKI interface summary 307 overview 3 170 synchronization of threads 344 system call description of 1 T tagged queueing 302 third party DMA 126 thread synchronization 344 condition variables 77 mutex locks 75 mutex init 9F 76 per instance mutex 95 readers writer locks 77 threads preemption of 74 types of 71 timing routines 349 U uio 9S data structure 350 unloading drivers getting the module ID 228 untagged queuing 303 user threads 71 utility functions 350 V vectored interrupts 107 virtual addresses 2 virtual DMA 122 virtual memory address spaces 2 memory management unit MMU 2 overview 2 VMEbus address spaces 20 machine architecture 18 361 362 Writ
254. method handles the fact that cmn_err(9F) has a variable number of arguments. Another method relies on the macro having one argument, a parenthesized argument list for cmn_err(9F), which the macro removes. It also removes the reliance on the optimizer by expanding the macro to nothing if DEBUG is not defined.

    #ifdef DEBUG
    /* comments on values of xxdebug and what they do */
    static int xxdebug;
    #define dcmn_err(X) if (xxdebug) cmn_err X
    #else
    #define dcmn_err(X) /* nothing */
    #endif
    ...
    dcmn_err((CE_NOTE, "Error!"));

This can be extended in many ways, such as by having different messages from cmn_err(9F) depending on the value of xxdebug, but be careful not to obscure the code with too much debugging information.

Another common scheme is to write an xxlog() function and have it use vsprintf(9F) or vcmn_err(9F) to handle variable argument lists.

The Optimizer and volatile

The volatile keyword must be used when declaring any variable that will reference a device register, or the optimizer may optimize important accesses away. This is very important, since not using volatile can result in bugs that are very difficult to track down. See "volatile" on page 69 for more information.

Using Existing Drivers

Using existing drivers with a user program is a good way to see if the kernel sees the device. This allows the device to be debugged without the need
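The parenthesized-argument macro described at the start of this excerpt can be tried outside the kernel with the sketch below. It is a user-space analogy under stated assumptions: xx_cmn_err() is a hypothetical stand-in for cmn_err(9F) that writes to stderr, xx_msgs is a counter added only so the effect is observable, and the macro body is wrapped in do/while(0), a small departure from the excerpt made to avoid dangling-else surprises.

```c
#include <stdarg.h>
#include <stdio.h>

/* Normally supplied on the compiler command line (-DDEBUG). */
#ifndef DEBUG
#define DEBUG
#endif

/* Number of messages emitted, kept for demonstration only. */
static int xx_msgs;

/* Hypothetical stand-in for cmn_err(9F): level, format, variable args. */
static void xx_cmn_err(int level, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    fprintf(stderr, "level %d: ", level);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
    xx_msgs++;
}

#ifdef DEBUG
/* Nonzero enables debugging messages at run time. */
static int xxdebug;
/* X is a complete parenthesized argument list, pasted after the name. */
#define dcmn_err(X) do { if (xxdebug) xx_cmn_err X; } while (0)
#else
/* Expands to an empty statement when DEBUG is not defined. */
#define dcmn_err(X) do { } while (0)
#endif
```

A call is written with doubled parentheses, for example `dcmn_err((1, "unit %d failed", unit));`, so that the whole argument list travels through the macro as the single parameter X.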
255. mits A block of memory so allocated is commonly called an I/O parameter block, or IOPB, and is usually used to encode a device command. This block of consistent memory can be directly accessed by the device. A pointer to the allocated IOPB is passed back in the location pointed to by iopbp.

    void ddi_iopb_free(caddr_t iopb);

ddi_iopb_free(9F) frees the I/O parameter block pointed to by iopb, which must have been allocated previously by ddi_iopb_alloc(9F).

Flow of Control

These interfaces influence the flow of program control in a driver. These are mostly callback mechanisms: functions that schedule another function to run at a later time. Many drivers schedule a function to run every so often to check on the status of the device, and possibly issue an error message if some strange condition is detected.

Note: The detach(9E) entry point must ensure that no callback functions are pending in the driver before returning successfully. See Chapter 5, "Autoconfiguration."

    int timeout(void (*ftn)(caddr_t), caddr_t arg, long ticks);

timeout(9F) schedules the function pointed to by ftn to be run after ticks clock ticks have elapsed. arg is passed to the function when it is run. timeout(9F) returns a timeout ID that can be used to cancel the timeout later.

    int untimeout(int id);

untimeout(9F) cancels the timeout indicated by the timeout ID
256. Tables (partial list):
- mple of functions with callbacks that can be cancelled
- DMA Resource Allocation Interfaces
- Character Driver Entry Points
- Block Driver Entry Points
- Table 10-1 Standard SCSA Functions
- Table 10-2 SCSA Compatibility Functions
- Table A-1 SunOS 4.1.x and SunOS 5.4 Kernel Support Routines
- Table B-1 Mandatory Sun Disk I/O Controls
- Table B-2 Optional Sun Disk Ioctls
- Table B-3 SCSA Options
- Table D-1 Sample driver source code listings

Preface

Writing Device Drivers describes how to develop device drivers for character-oriented devices, block-oriented devices, and Small Computer System Interface (SCSI) target devices.

Who Should Read This Book

The audience for this book is UNIX programmers familiar with UNIX device drivers. Several overview chapters at the beginning of the book provide background information for the detailed technical chapters that follow, but they are not intended as a general tutorial or text on device drivers.

How This Book Is Organized

This book discusses the development of a dynamically loadable and unloadable, multithreaded, reentrant device driver applicable to all ar
257. mutex_t *mp, long timeout);

cv_timedwait_sig(9F) is like cv_timedwait(9F) and cv_wait_sig(9F), except that it returns -1 at time timeout if the condition has not occurred. If the calling thread receives a signal while cv_timedwait_sig(9F) is blocked, cv_timedwait_sig(9F) immediately returns zero. In all cases, cv_timedwait_sig(9F) reacquires the mutex before returning.

    void mutex_init(kmutex_t *mp, char *name, kmutex_type_t type, void *arg);

mutex_init(9F) prepares the mutual exclusion lock pointed to by mp for use. Specify MUTEX_DRIVER for type, and pass an interrupt block cookie of type ddi_iblock_cookie_t for arg. The interrupt block cookie is returned by ddi_add_intr(9F).

    void mutex_enter(kmutex_t *mp);

mutex_enter(9F) acquires the mutual exclusion lock pointed to by mp. If another thread holds the mutex, mutex_enter(9F) will either block or spin, waiting for the mutex to become available. Mutexes are not reentrant: if a thread calls mutex_enter(9F) on a mutex it already holds, the system will panic.

mp is assumed to protect a certain set of data, often a single data structure, and all driver threads accessing those data must first acquire the mutex by calling mutex_enter(9F). This is accomplished by mutual agreement and consistency among all driver code paths that access the data in question; mutex_enter(9F) in no way prevents other threads from accessing the data. It is only when all driver code paths agree to acquire
258. n identify which device is interrupting and call the correct device driver directly. The Solaris 2.x DDI/DKI interrupt model is the same for both types of devices. See Chapter 6, "Interrupt Handlers," for more information about interrupt handling.

This section covers addressing and device configuration issues specific to the buses that SunOS supports.

Typical SBus systems consist of a motherboard (containing the CPU and SBus interface logic), a number of SBus devices on the motherboard itself, and a number of SBus expansion slots. An SBus can also be connected to other types of buses through an appropriate bus bridge. Following is a discussion of how the SBus is implemented in the SPARCstation 1 and SPARCstation 1+.

Physical Address Space

The physical address space layout of the SPARCstation 1 and SPARCstation 1+ is shown in Table 2-1.

Table 2-1 Physical space in the SPARCstation 1 and SPARCstation 1+

    Space        Range                      Usage
    Main Memory  0x00000000 - 0xEFFFFFFF    Main Memory
    I/O Devices  0xF0000000 - 0xF7FFFFFF    Sun I/O Devices
                 0xF8000000 - 0xF9FFFFFF    SBus Slot 0
                 0xFA000000 - 0xFBFFFFFF    SBus Slot 1
                 0xFC000000 - 0xFDFFFFFF    SBus Slot 2
                 0xFE000000 - 0xFFFFFFFF    SBus Slot 3

Physical SBus Addresses

The address bus of the SPARC CPU has 32 bits. The SBus has 28 address bits, as described in the SBus Specification. In the SPARCstation 1, the address bits are used as desc
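The slot ranges in Table 2-1 are evenly sized: each SBus slot occupies 0x02000000 bytes (32 Mbytes) starting at 0xF8000000, so the slot number can be recovered from a physical address with a subtraction and a shift. The sketch below is only an arithmetic illustration of the table; the names SBUS_BASE, SBUS_SLOT_SHIFT, sbus_slot(), and sbus_offset() are hypothetical and do not correspond to any kernel interface.

```c
#include <stdint.h>

#define SBUS_BASE       0xF8000000u  /* start of SBus Slot 0 in Table 2-1 */
#define SBUS_SLOT_SHIFT 25           /* each slot spans 1 << 25 bytes */

/* Return the SBus slot (0-3) for a physical address, or -1 if outside. */
static int sbus_slot(uint32_t paddr)
{
    if (paddr < SBUS_BASE)
        return -1;
    return (int)((paddr - SBUS_BASE) >> SBUS_SLOT_SHIFT);
}

/* Offset within the slot; meaningful only when sbus_slot() is not -1. */
static uint32_t sbus_offset(uint32_t paddr)
{
    return (paddr - SBUS_BASE) & ((1u << SBUS_SLOT_SHIFT) - 1u);
}
```

For instance, 0xFA000010 falls in slot 1 at offset 0x10, while any address below 0xF8000000 belongs to main memory or the Sun I/O devices and yields -1.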
259. n receipt of a signal
    cv_timedwait_sig(9F)    Wait for condition or timeout or signal
    cv_signal(9F)           Signal one thread waiting on the condition variable
    cv_broadcast(9F)        Signal all threads waiting on the condition variable

Initializing Condition Variables

Declare a condition variable (type kcondvar_t) for each condition. Usually, this is done in the driver's soft-state structure. Use cv_init(9F) to initialize each one. Similar to mutexes, condition variables are usually initialized at attach(9E) time. For example:

    cv_init(&xsp->cv, "xx cv", CV_DRIVER, NULL);

For a more complete example of condition variable initialization, see Chapter 5, "Autoconfiguration."

Using Condition Variables

On the code path waiting for the condition, take the following steps:
- Acquire the mutex guarding the condition.
- Test the condition.
- If the test results do not allow the thread to continue, use cv_wait(9F) to block the current thread on the condition. cv_wait(9F) releases the mutex before blocking. Upon return from cv_wait(9F) (which will reacquire the mutex before returning), repeat the test.
- Once the test allows the thread to continue, set the condition to its new value. For example, set a device flag to busy.
- Release the mutex.

On the code path signaling the condition, take the following steps:
- Acquire the mutex guarding the condition.
- Set the condition.
- Signal the bloc
260. nd high speed communication will help test the reliability of the driver. Running uucp(1C) over the line also provides some exercise; note, however, that uucp(1C) performs its own error handling, so it is important to verify that the driver is not reporting excessive numbers of errors to the uucp(1C) process.

These types of devices are usually STREAMS based.

Network Drivers

Network drivers may be tested using standard network utilities. ftp(1) and rcp(1) are useful because the files can be compared on each end of the network. The driver should be tested under heavy network loading, so various commands should be run by multiple processes. Heavy network loading means:
- There is a lot of traffic to the test machine.
- There is heavy traffic among all machines on the network.

Network cables should be unplugged while the tests are executing, and the driver should recover gracefully from the resulting error conditions.

Another important test is for the driver to receive multiple packets in rapid succession (back-to-back packets). In this case, a relatively fast host on a lightly loaded network should send multiple packets in quick succession to the machine with the driver being tested. It should be verified that the receiving driver does not drop the second and subsequent packets.

These types of devices are usually STREAMS based.

Converting a Device Driver to SunO
261. necessary mutex is not acquired. To determine if a mutex is held by a thread, use mutex_owned(9F) within ASSERT(9F):

    void
    helper(void)
    {
        /* this routine should always be called with the mu mutex held */
        ASSERT(mutex_owned(&xsp->mu));
        ...
    }

Future releases of Solaris may only support the use of mutex_owned(9F) within ASSERT(9F), by not defining mutex_owned(9F) unless the preprocessor symbol DEBUG is defined.

Conditional Compilation and Variables

There are two common ways to place debugging code in a driver: conditionally compiling code based on a preprocessor symbol such as DEBUG, or using a global variable. Conditional compilation has the advantage that unnecessary code can be removed in the production driver. Using a variable allows the amount of debugging output to be chosen at run time, such as by setting a debugging level with an I/O control or through a debugger. Commonly, these two methods are combined.

The following example relies on the compiler to remove unreachable code (the code following the always-false test of zero), and also provides a local variable that can be set in /etc/system or patched by a debugger.

    #ifdef DEBUG
    /* comments on values of xxdebug and what they do */
    static int xxdebug;
    #define dcmn_err if (xxdebug) cmn_err
    #else
    #define dcmn_err if (0) cmn_err
    #endif
    ...
    dcmn_err(CE_NOTE, "Error!\n");

This
262. nel from device specifics, just as the system call interface protects application programs from platform specifics. Application programs and the rest of the kernel need little (if any) device-specific code to address the device. In this way, device drivers make the system more portable and easier to maintain.

Types of Device Drivers

There are several kinds of device drivers, each handling a different kind of I/O. Block device drivers manage devices with physically addressable storage media, such as disks. All other devices are considered character devices. There are two types of character device drivers: standard character device drivers and STREAMS device drivers.

Block Device Drivers

Devices that support a file system are known as block devices. Drivers written for these devices are known as block device drivers. Block device drivers take a file system request, in the form of a buf(9S) structure, and make the device transfer the specified block. The main interface to the file system is the strategy(9E) routine. See Chapter 9, "Drivers for Block Devices," for more information.

Block device drivers can also provide a character driver interface that allows utility programs to bypass the file system and access the device directly. This device access is commonly referred to as the raw interface to a block device.

Standard Character Device Drivers

Character device drivers normally perform I/O in a byte stream. They can als
263. ng Device Memory

Some devices, such as frame buffers, have memory that is directly accessible to user threads by way of memory mapping. Drivers for these devices typically do not support the read(9E) and write(9E) interfaces. Instead, these drivers support memory mapping with the mmap(9E) entry point. A typical example is a frame buffer driver that implements the mmap(9E) entry point to allow the frame buffer to be mapped in a user thread.

segmap()

    int xxsegmap(dev_t dev, off_t off, struct as *asp, caddr_t *addrp,
        off_t len, unsigned int prot, unsigned int maxprot,
        unsigned int flags, cred_t *credp);

segmap(9E) is the entry point responsible for actually setting up a memory mapping requested by the system on behalf of an mmap(2) system call. Drivers for memory-mapped devices usually use ddi_segmap(9F) as the entry point rather than define their own segmap(9E) routine.

mmap()

    int xxmmap(dev_t dev, off_t off, int prot);

This routine is called as a result of an mmap(2) system call, and also as the result of a page fault. mmap(9E) is called to translate the device offset off to the corresponding page frame number.

Code Example 8-9 allows a user thread to memory-map the device registers.

Code Example 8-9 mmap(9E) routine

    static int
    xxmmap(dev_t dev, off_t off, int prot)
    {
        int instance;
        struct xxstate *xsp;

        if (prot & PROT_WRITE)
            return (-1);
        instance = getminor(dev)
264. ng and Unloading Drivers for the details of compiling and installing device drivers.

Multithreading

In most UNIX systems, the process is the unit of execution. In SunOS 5.x, a thread is the unit of execution. A thread is a sequence of instructions executed within a program. A process consists of one or more threads. There are two types of threads: application threads, which run in user space, and kernel threads, which run in kernel space.

The kernel is multithreaded (MT). Many kernel threads can be running kernel code, and may be doing so concurrently on a multiprocessor (MP) machine. Kernel threads may also be preempted by other kernel threads at any time. This is a departure from the traditional UNIX model, where only one process can be running kernel code at any one time, and that process is not preemptable (though it is interruptible).

The multithreading of the kernel imposes some additional restrictions on the device drivers. For more information on multithreading considerations, see Chapter 4, "Multithreading," and Appendix B, "Advanced Topics."

Virtual Memory

A complete overview of the SunOS virtual memory (VM) system is far beyond the scope of this book, but two virtual memory terms of special importance are used when discussing device drivers: virtual addresses and address spaces.

Virtual Addresses

A virtual address is an address that is mapped by the memory management unit (MMU) to a physical hardware addre
265. ns

Autoconfiguration (Chapter 5)

This chapter describes the support a driver must provide for autoconfiguration.

Overview

Autoconfiguration is the process of getting the driver's code and static data loaded into memory and registered with the system. Autoconfiguration also involves configuring (attaching) individual device instances that are controlled by the driver. These processes are discussed in more detail in "Loadable Driver Interface" on page 89 and "Device Configuration" on page 93. The autoconfiguration process begins when the device is put into use.

State Structure

This section adds the following fields to the state structure. See "State Structure" on page 57 for more information.

    int instance;
    ddi_iblock_cookie_t iblock_cookie;
    ddi_idevice_cookie_t idevice_cookie;

Data Structures

Figure 5-1 shows an overview of the driver data structures that are accessed directly by the kernel for loading the driver, configuring devices, and providing access to devices. The figure is divided into three sections; the first two are discussed in this chapter. The third section is discussed in Chapter 8, "Drivers for Character Devices," and Chapter 9, "Drivers for Block Devices."

Figure 5-1 shows the relationship between the autoconfiguration structures and the driver entry points.

[Figure 5-1 labels: _init(9E), modlinkage(9S), _info(9E), _fini(9E)]
266. ns The HBA will perform any necessary synchronization of the buffer before performing the command completion callback.

Caution: scsi_alloc_consistent_buf(9F) uses scarce system resources; it should be used sparingly.

scsi_free_consistent_buf(9F) releases a buf(9S) structure and the associated data buffer allocated with scsi_alloc_consistent_buf(9F). See "attach()" on page 202 and "detach()" on page 204 for examples.

Building and Transporting a Command

The host bus adapter driver is responsible for transmitting the command to the device and taking care of the low-level SCSI protocol. The scsi_transport(9F) routine hands a packet to the host bus adapter driver for transmission. It is the target driver's responsibility to create a valid scsi_pkt(9S) structure.

Building a Command

The routine scsi_init_pkt(9F) allocates space for a SCSI CDB, allocates DMA resources if necessary, and sets the pkt_flags field:

    pkt = scsi_init_pkt(&sdp->sd_address, NULL, bp,
        CDB_GROUP0, 1, 0, 0, SLEEP_FUNC, NULL);

This example creates a new packet and allocates DMA resources as specified in the passed buf(9S) structure pointer. A SCSI CDB is allocated for a Group 0 (6-byte) command, and the pkt_flags field is set to zero, but no space is allocated for the pkt_private field. This call to scsi_init_pkt(9F), because of the SLEEP_FUNC parameter, waits indefinitely for resources if none are curr
267. nt access I/O parameter blocks that are used for communication between a device and the driver are set up using ddi_iopb_alloc(9F). ddi_iopb_free(9F) is used to free the memory allocated by ddi_iopb_alloc(9F). On x86 systems, ddi_iopb_alloc(9F) can be used to allocate memory that is physically contiguous as well as consistent.

Code Example 7-5 is an example of how to allocate IOPB memory and the necessary DMA resources to access it. DMA resources must still be allocated, and the DDI_DMA_CONSISTENT flag must be passed to the allocation function.

Code Example 7-5 Using ddi_iopb_alloc(9F)

    if (ddi_iopb_alloc(xsp->dip, &limits, size,
        &xsp->iopb_array) != DDI_SUCCESS) {
        /* error handling */
        goto failure;
    }
    if (ddi_dma_addr_setup(xsp->dip, NULL, xsp->iopb_array, size,
        DDI_DMA_READ | DDI_DMA_CONSISTENT, DDI_DMA_SLEEP, NULL,
        &limits, &xsp->iopb_handle) != DDI_DMA_MAPPED) {
        /* error handling */
        ddi_iopb_free(xsp->iopb_array);
        goto failure;
    }

ddi_mem_alloc()

ddi_mem_alloc(9F) should be used if the device is doing sequential, unidirectional, block-sized and block-aligned transfers to or from memory. This type of access is commonly known as streaming access.

In SPARC, ddi_mem_alloc(9F) obeys the alignment and padding constraints specified by the dlim_minxfer and dlim_burstsizes fields in the passed DMA limit structure to get the most effective hardware support for large transfers. F
268. nterfaces are part of the Sun Common SCSI Interface: routines that support the writing of target drivers to drive SCSI devices. Most of these routines handle allocating SCSI command packets, formulating SCSI commands within those packets, and transporting the packets to the host adapter driver for execution. See Chapter 10, "SCSI Target Drivers."

    struct scsi_pkt *get_pktiopb(struct scsi_address *ap,
        caddr_t *datap, int cdblen, int statuslen, int datalen,
        int readflag, int (*callback)(void));

get_pktiopb(9F) allocates a SCSI packet structure with a small data area in the system IOPB (I/O parameter block) map for the target device denoted by ap. get_pktiopb(9F) calls scsi_dmaget(9F) to allocate the data area and calls scsi_resalloc(9F) to allocate the scsi_pkt(9S) structure itself. If func is not NULL_FUNC and resources cannot be allocated right away, the function pointed to by func will be called when resources may have become available. func can call get_pktiopb(9F) again. If callback is SLEEP_FUNC, scsi_dmaget(9F) may block waiting for resources.

Target drivers often use get_pktiopb() to allocate packets for the REQUEST SENSE or INQUIRY SCSI commands, which need a small amount of cache-consistent memory. Use IOPB packets sparingly, though, because they are allocated from scarce DMA memory resources.

    void free_pktiopb(struct scsi_pkt *pkt, caddr_t datap, int datalen);

free_pktiopb(9F) frees a scsi_p
270. number, (caddr_t *)&xsp->regp, offset,
        sizeof (struct device_reg));
    ddi_remove_intr(dip, inumber, xsp->iblock_cookie);
    cv_destroy(&xsp->cv);
    mutex_destroy(&xsp->mu);
    ddi_soft_state_free(statep, instance);
    return (DDI_FAILURE);

attach(9E) first checks for the DDI_ATTACH command, which is the only one it handles. Future releases may support additional commands; consequently, it is important that drivers return DDI_FAILURE for all the commands they do not recognize.

attach(9E) then calls ddi_get_instance(9F) to get the instance number the system has assigned to the dev_info node indicated by dip. Since the driver must be able to return a pointer to its dev_info node for each instance, attach(9E) must save dip, usually in a field of a per-instance state structure.

The example also requires DMA capability, so ddi_slaveonly(9F) is called to check if the slot is capable of DMA. The section "SBus Slots" on page 18 discusses one example of such SBus hardware.

If any of the resource allocation routines fail, the code at the failed label should free any resources that had already been allocated before returning DDI_FAILURE. This can be done with a series of checks that look like this:

    if (xsp->regp)
            ddi_unmap_regs(dip, rnumber, (caddr_t *)&xsp->regp,
                offset, sizeof (struct device_reg));

There should be such a check and a deallocation operation
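The "series of checks" at the failure label can be illustrated outside the kernel. The following is a minimal user-space sketch, not driver code: malloc(3) and free(3) stand in for the DDI allocation routines, and the names xxstate_sketch, xxattach_sketch, and fail_at are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical per-instance state; each pointer models one resource. */
struct xxstate_sketch {
	void *regs;	/* stands in for mapped registers */
	void *intr;	/* stands in for an interrupt registration */
};

/*
 * Acquire resources in order. fail_at forces a failure at step 1 or 2
 * so the cleanup path can be exercised; 0 means no forced failure.
 */
int
xxattach_sketch(struct xxstate_sketch *xsp, int fail_at)
{
	xsp->regs = NULL;
	xsp->intr = NULL;

	if (fail_at == 1 || (xsp->regs = malloc(16)) == NULL)
		goto failed;
	if (fail_at == 2 || (xsp->intr = malloc(16)) == NULL)
		goto failed;
	return (0);

failed:
	/* One check and one deallocation per resource, in reverse order. */
	if (xsp->intr != NULL) {
		free(xsp->intr);
		xsp->intr = NULL;
	}
	if (xsp->regs != NULL) {
		free(xsp->regs);
		xsp->regs = NULL;
	}
	return (-1);
}
```

Because every resource pointer starts out NULL, the failure label can be reached from any step and releases only what was actually acquired.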
271. o provide additional interfaces not present in block drivers, such as I/O control (ioctl(9E)) commands, memory mapping, and device polling. See Chapter 8, "Drivers for Character Devices," for more information.

Byte-Stream I/O

The main job of any device driver is to perform I/O, and many character device drivers do what is called byte-stream or character I/O. The driver transfers data to and from the device without using a specific device address. This is in contrast to block device drivers, where part of the file system request identifies a specific location on the device. The read(9E) and write(9E) entry points handle byte-stream I/O for standard character drivers. See "I/O Request Handling" on page 153 for more information.

I/O Control

Many devices have characteristics and behaviors that can be configured or tuned. The ioctl(2) system call and the ioctl(9E) driver entry point provide a mechanism for application programs to change and determine the status of a driver's configurable characteristics. The baud rate of a serial communications port, for example, is usually configurable in this way.

The I/O control interface is open ended, allowing device drivers to define special commands for the device. The definition of the commands is entirely up to the driver and is restricted only by the requirements of the application programs using the device and the device itself. Certain cl
272. o by pkt that was previously allocated by scsi_pktalloc(9F).

    int scsi_poll(struct scsi_pkt *pkt);

scsi_poll(9F) transports the command packet pointed to by pkt to the host adapter driver for execution and waits for it to complete before it returns. Use scsi_poll(9F) sparingly and only for commands that must execute synchronously.

    int scsi_probe(struct scsi_device *devp, int (*callback)(void));

scsi_probe(9F) determines whether a target/lun is present and sets up the scsi_device(9S) structure with inquiry data. scsi_probe(9F) uses the SCSI INQUIRY command to test if the device exists. It may retry the INQUIRY command as appropriate. If scsi_probe(9F) is successful, it will fill in the scsi_inquiry(9S) structure pointed to by the sd_inq member of the scsi_device(9S) structure and return SCSIPROBE_EXISTS. If callback is not NULL_FUNC and necessary resources are not immediately available, the function pointed to by callback will be called when resources may have become available. If callback is SLEEP_FUNC, scsi_probe(9F) may block waiting for resources.

    struct scsi_pkt *scsi_resalloc(struct scsi_address *ap, int cmdlen,
        int statuslen, opaque_t dmatoken, int (*callback)(void));

scsi_resalloc(9F) allocates and returns a pointer to a SCSI command packet for the target at the SCSI address pointed to by ap. cmdlen and statuslen tell scsi_resalloc(9F) what size command descriptor block (CDB) and
273. o completion. Callbacks that can be cancelled do not pose a problem; just remember to cancel the callback before detach(9E) returns DDI_SUCCESS. Each of the callback cancellation routines in Table 5-2 atomically cancels callbacks so that a callback routine does not run while it is being cancelled.

Table 5-2 Example of functions with callbacks that can be cancelled

    Function        Cancelling function
    timeout(9F)     untimeout(9F)
    bufcall(9F)     unbufcall(9F)
    esbbcall(9F)    unbufcall(9F)

Some callbacks cannot be cancelled; for these it is necessary to wait until the callback has been called. In some cases, such as ddi_dma_setup(9F), the callback must also be prevented from rescheduling itself. See "Cancelling DMA Callbacks" on page 140 for an example. Following is a list of some functions that may establish callbacks that cannot be cancelled:

    esballoc(9F)
    ddi_dma_setup(9F)
    ddi_dma_addr_setup(9F)
    ddi_dma_buf_setup(9F)
    scsi_dmaget(9F)
    scsi_resalloc(9F)
    scsi_pktalloc(9F)
    scsi_init_pkt(9F)

getinfo()

The system calls getinfo(9E) to obtain configuration information that only the driver knows. The mapping of minor numbers to device instances is entirely under the control of the driver. The system sometimes needs to ask the driver which device a particular dev_t represents. getinfo(9E) is called during module loading and at other times during the life of the driver. It can take
274. o contact the device. See "probe()" on page 93 for more information. VMEbus, ISA, EISA, and MicroChannel devices are examples of non-self-identifying devices. SCSI target devices and pseudo devices are also non-self-identifying devices. See vme(4), isa(4), scsi(4), and pseudo(4) for more information.

Device Addressing

Device addressing is different on the buses that SunOS currently supports. The SBus is geographically addressed; each SBus slot exists at a fixed physical address in the system. An SBus card has a different address depending on which slot it is plugged into. Moving an SBus device to a new slot causes the system to treat it as a new device. See "Persistent Instances" on page 93 for more information.

On other buses, such as the VMEbus, each card has its own address, possibly configurable by jumpers. A VMEbus card has the same address no matter which slot it is plugged into. Changing the address of a VME card causes the system to treat it as a new device.

Interrupts

SunOS supports polling interrupts and vectored interrupts. The SBus uses polling interrupts. When an SBus device interrupts, the system only knows which of several devices might have issued the interrupt. The system interrupt handler must ask the driver for each device whether it is responsible for the interrupt.

The VMEbus uses vectored interrupts. When a VMEbus device interrupts, the system ca
275. o the state structure. See "State Structure" on page 57 for more information.

    int        busy;  /* device busy flag */
    kmutex_t   mu;    /* mutex to protect state structure */
    kcondvar_t cv;    /* threads wait for access here */

Mutual Exclusion Locks

A mutual exclusion lock, or mutex, is usually associated with a set of data and regulates access to that data. Mutexes provide a way to allow only one thread at a time access to that data.

Table 4-1 Mutex routines

    Name                 Description
    mutex_init(9F)       Initialize a mutex.
    mutex_destroy(9F)    Release any associated storage.
    mutex_enter(9F)      Acquire mutex.
    mutex_tryenter(9F)   Acquire mutex if available, but do not block.
    mutex_exit(9F)       Release mutex.
    mutex_owned(9F)      Test if the mutex is held by the current thread.
                         To be used in ASSERT(9F) only.

Setting Up Mutexes

Device drivers usually allocate a mutex for each driver data structure. The mutex is typically a field in the structure and is of type kmutex_t. mutex_init(9F) is called to prepare the mutex for use. This is usually done at attach(9E) time for per-device mutexes and _init(9E) time for global driver mutexes. For example:

    struct xxstate *xsp;
    ...
    mutex_init(&xsp->mu, "xx mutex", MUTEX_DRIVER, NULL);
    ...

For a more complete example of mutex initialization, see Chapter 5, "Autoconfiguration." The driver must destroy the mutex with mutex_destroy(9F) before being unloaded. This is
276. oll(9E) entry point.

State Structure

This section adds the following field to the state structure. See "State Structure" on page 57 for more information.

    struct pollhead pollhead;   /* for chpoll(9E)/pollwakeup(9F) */

chpoll()

    int xxchpoll(dev_t dev, short events, int anyyet,
        short *reventsp, struct pollhead **phpp);

The system calls chpoll(9E) when a user process issues a poll(2) system call on a file descriptor associated with the device. The chpoll(9E) entry point routine is used by non-STREAMS character device drivers that wish to support polling. In chpoll(9E), the driver must follow these rules:

Implement the following algorithm when the chpoll(9E) entry point is called:

    if (events are satisfied now) {
            *reventsp = mask of satisfied events;
    } else {
            *reventsp = 0;
            if (!anyyet)
                    *phpp = &local pollhead structure;
    }
    return (0);

xxchpoll() should check to see if certain events have occurred; see chpoll(9E). It should then return the mask of satisfied events by setting the return events in *reventsp. If no events have occurred, the return field for the events is cleared. If the anyyet field is not set, the driver must return an instance of the pollhead structure. It is usually allocated in a state structure and should be treated as opaque by the driver. None of its fields should be referenced.

Call pollwakeup(9F) whenever a device condition o
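The chpoll(9E) algorithm quoted above can be written out as compilable C if the kernel types are replaced by stand-ins. This is a minimal user-space sketch under that assumption: pollhead_sketch, xxstate_sketch, the ready_events field, and xxchpoll_sketch are hypothetical names, and a real driver would take a dev_t and look up its state structure instead of receiving it directly.

```c
#include <stddef.h>

/* Hypothetical stand-in for the opaque kernel pollhead structure. */
struct pollhead_sketch {
	int opaque;
};

/* Hypothetical per-device state. */
struct xxstate_sketch {
	short ready_events;			/* events satisfiable now */
	struct pollhead_sketch pollhead;	/* handed back to the caller */
};

/*
 * Report satisfied events, or clear the result and, if no earlier
 * descriptor in the poll set was ready (anyyet == 0), hand back the
 * pollhead so the caller can wait on it.
 */
int
xxchpoll_sketch(struct xxstate_sketch *xsp, short events, int anyyet,
    short *reventsp, struct pollhead_sketch **phpp)
{
	short satisfied = (short)(events & xsp->ready_events);

	if (satisfied != 0) {
		*reventsp = satisfied;
	} else {
		*reventsp = 0;
		if (!anyyet)
			*phpp = &xsp->pollhead;
	}
	return (0);
}
```

The routine never blocks and always returns 0; the poll framework does the waiting, which is why the pollhead is only passed back and never inspected here.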
277. obe(9E) routine has verified that the expected device is present, attach(9E) is called. This routine allocates and initializes any per-instance data and creates minor device node information. See "attach()" on page 95 for details of this. In addition to these steps, a SCSI target driver again calls scsi_probe(9F) to retrieve the device's Inquiry data and also creates a SCSI Request Sense packet. If the attach is successful, the attach function should not call scsi_unprobe().

Three routines are used to create the Request Sense packet: scsi_alloc_consistent_buf(9F), scsi_init_pkt(9F), and makecom_g0(9F). scsi_alloc_consistent_buf(9F) allocates a buffer suitable for consistent DMA and returns a pointer to a buf(9S) structure. The advantage of a consistent buffer is that no explicit syncing of the data is required. In other words, the target driver can access the data after the callback. The sd_sense element of the device's scsi_device(9S) structure must be initialized with the address of the sense buffer. scsi_init_pkt(9F) creates and partially initializes a scsi_pkt(9S) structure. makecom_g0(9F) creates a SCSI Command Descriptor Block (CDB), in this case creating a SCSI Request Sense command.

Code Example 10-2 SCSI target driver attach(9E) routine

    static int
    xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
            struct xxstate *xsp;
            struct scsi_pkt *rqpkt = NULL;
            struct scsi_device *sdp;
            stru
278. ock for writing. If rw_tryupgrade(9F) cannot acquire the lock for writing, it returns zero.

    void sema_init(ksema_t *sp, u_int val, char *name,
        ksema_type_t type, void *arg);

sema_init(9F) prepares the semaphore pointed to by sp for use. SEMA_DRIVER should be passed for type. val is the initial count for the semaphore, which usually should be 1 or 0. In almost all cases, drivers should pass 1 for val.

    void sema_destroy(ksema_t *sp);

sema_destroy(9F) releases the resources associated with the semaphore pointed to by sp.

    void sema_p(ksema_t *sp);

sema_p(9F) acquires the semaphore pointed to by sp by decrementing the counter if its value is greater than zero. If the semaphore counter is zero, sema_p(9F) blocks, waiting to acquire the semaphore.

    int sema_p_sig(ksema_t *sp);

sema_p_sig(9F) is like sema_p(9F), except that if the calling thread has a signal pending and the semaphore counter is zero, sema_p_sig(9F) returns zero without blocking.

    void sema_v(ksema_t *sp);

sema_v(9F) releases the semaphore pointed to by sp by incrementing its counter.

    int sema_tryp(ksema_t *sp);

sema_tryp(9F) is similar to sema_p(9F), but if the semaphore counter is zero, sema_tryp(9F) immediately returns zero.

Timing

These are delay and time value conversion routines.

    void delay(long ticks);

delay(9F) blocks the calling thread for at least ticks clock ticks (using timeout(9F)
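The counter semantics described for sema_init(9F), sema_tryp(9F), and sema_v(9F) can be modelled with a plain integer. The sketch below is a single-threaded illustration only: it has no locking and cannot block, so it is not a usable semaphore, and the ksema_sketch type and the *_sketch function names are hypothetical.

```c
/* Hypothetical single-threaded model of a counting semaphore. */
struct ksema_sketch {
	unsigned int count;
};

/* Set the initial count, normally 1 or 0. */
void
sema_init_sketch(struct ksema_sketch *sp, unsigned int val)
{
	sp->count = val;
}

/*
 * Non-blocking acquire: returns non-zero and decrements the counter if
 * it is greater than zero, otherwise returns zero immediately.
 */
int
sema_tryp_sketch(struct ksema_sketch *sp)
{
	if (sp->count == 0)
		return (0);
	sp->count--;
	return (1);
}

/* Release: increment the counter. */
void
sema_v_sketch(struct ksema_sketch *sp)
{
	sp->count++;
}
```

With an initial count of 1 the model behaves like a lock that can be released by a different caller than the one that acquired it, which is the usual reason for choosing a semaphore over a mutex.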
279. of it is dumped. savecore(1M) is used to copy the system's core image to a file. Normally, the system does not examine the swap area for core dumps when it boots. This must be enabled in /etc/init.d/sysetup. Change the lines that read:

    ##
    ## Default is to not do a savecore
    ##
    #if [ ! -d /var/crash/`uname -n` ]
    #then mkdir -p /var/crash/`uname -n`
    #fi
    #        echo 'checking for crash dump...\c '
    #savecore /var/crash/`uname -n`
    #        echo ''

To:

    ##
    ## Default is to not do a savecore
    ##
    if [ ! -d /var/crash/`uname -n` ]
    then mkdir -p /var/crash/`uname -n`
    fi
            echo 'checking for crash dump...\c '
    savecore /var/crash/`uname -n`
            echo ''

When savecore(1M) runs, it makes a copy of the kernel that was running (called unix.n) and dumps a core file (called vmcore.n) in the specified directory, normally /var/crash/machine_name. There must be enough space in /var/crash to contain the core dump, or it will be truncated. The file will appear larger than it actually is, since it contains holes, so avoid copying it. adb(1) can then be used on the core dump and the saved kernel.

Note: savecore(1M) can be prevented from filling the file system if there is a file called minfree in the directory in which the dump will be saved. This file contains a number of kilobytes to remain free after savecore(1M) has run. However, if not enough space is available, the core file is not saved.

adb and kadb

adb
280. off, ptob(1)) == DDI_SUCCESS) {
            kpfn = hat_getkpfnum(kva);
            ddi_unmap_regs(xsp->dip, rnumber, &kva, off, ptob(1));
    }
    return (kpfn);

If the mappable memory of the device is physically contiguous, converting off to the number of pages and adding it to the base page frame number will give the same result as getting the page frame number of a mapped page. In this case, only the first page of the device's memory needs to be mapped:

    return (hat_getkpfnum(xsp->regp->csr) + btop(off));

For an example showing how to access a memory-mapped device from a user program, see "Using Existing Drivers" on page 243.

Multiplexing I/O on File Descriptors

A thread sometimes wants to handle I/O on more than one file descriptor. One example is an application program that wants to read the temperature from a temperature-sensing device and then report the temperature to an interactive display. If the program makes a read request and there is no data available, it should not block waiting for the temperature before interacting with the user again.

The poll(2) system call provides users with a mechanism for multiplexing I/O over a set of file descriptors that reference open files. poll(2) identifies those files on which a program can send or receive data without blocking, or on which certain events have occurred.

To allow a program to poll a character driver, the driver must implement the chp
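For the physically contiguous case described in this excerpt, the page frame number is simple arithmetic. The following sketch assumes a 4 KB page size purely for illustration (the real value is platform dependent), and sketch_btop() and contiguous_pfn() are hypothetical helpers that only mirror what btop(9F) does for whole pages.

```c
#include <stdint.h>

#define	SKETCH_PAGESHIFT	12	/* assumed 4 KB pages */

/* Convert a byte offset to a page count, rounding down. */
uint64_t
sketch_btop(uint64_t bytes)
{
	return (bytes >> SKETCH_PAGESHIFT);
}

/* Page frame number of offset off in a physically contiguous region. */
uint64_t
contiguous_pfn(uint64_t base_pfn, uint64_t off)
{
	return (base_pfn + sketch_btop(off));
}
```

With a base frame of 0x100, an offset of 0x3000 lands in frame 0x103, while any offset below 0x1000 stays in frame 0x100.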
281. oid bioreset(struct buf *bp);

bioreset(9F) is used to reset the buf(9S) structure pointed to by bp, allowing a device driver to reuse privately allocated buffers. bioreset(9F) resets the buffer header to its initially allocated state.

    int biowait(struct buf *bp);

biowait(9F) suspends the calling thread until the I/O request described by bp completes. A call to biodone(9F) unblocks the waiting thread. Usually, if a driver does synchronous I/O, it calls biowait(9F) in its strategy(9E) routine and calls biodone(9F) in its interrupt handler when the request is complete.

biowait(9F) is usually not called by the driver; instead, it is called by physio(9F) or by the file system after calling strategy(9E). The driver is responsible for calling biodone(9F) when the I/O request is complete.

    void bp_mapin(struct buf *bp);

bp_mapin(9F) maps the data buffer associated with the buf(9S) structure pointed to by bp into the kernel virtual address space so the driver can access it. Programmed I/O device drivers often use bp_mapin(9F) because they have to transfer data explicitly between the buf(9S) structure's buffer and a device buffer. See "bp_mapin()" on page 177 for more information.

    void bp_mapout(struct buf *bp);

bp_mapout(9F) unmaps the data buffer associated with the buf(9S) structure pointed to by bp. The buffer must have been mapped previously by bp_mapin(9F). bp_mapout(9F) can only be called from user or kernel context.

    void clrbuf(s
282. ompletes and the queue is non-empty, the interrupt routine begins a new transfer. This example actually places the decision of whether to start a new transfer into a separate routine for convenience.

The av_forw and the av_back members of the buf(9S) structure can be used by the driver to manage a list of transfer requests. A single pointer can be used to manage a singly linked list, or both pointers can be used together to build a doubly linked list. The driver writer can determine from a hardware specification which type of list management (such as insertion policies) will optimize the performance of the device. The transfer list is a per-device list, so the head and tail of the list are stored in the state structure.

This example is designed to allow multiple threads access to the driver's shared data, so it is extremely important to identify any such data (such as the transfer list) and protect it with a mutex. See Chapter 4, "Multithreading," for more details about mutex locks.

Code Example 9-6 Asynchronous block driver strategy(9E) routine

    static int
    xxstrategy(struct buf *bp)
    {
            struct xxstate *xsp;
            int instance;

            instance = getminor(bp->b_edev);
            xsp = ddi_get_soft_state(statep, instance);
            ...
            validate transfer request
            ...
            /*
             * Add the request to the end of the queue. Depending on the
             * device, a sorting algorithm, such as disksort(9F), may be
             * used if it improves the performance of t
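The singly linked transfer list mentioned above, with head and tail kept in the state structure, might look like the following user-space sketch. The buf_sketch and xxqueue_sketch types and the xxenqueue()/xxdequeue() helpers are hypothetical; only the av_forw member name is taken from the text. A real driver would hold the state mutex around both operations, which this sketch omits.

```c
#include <stddef.h>

/* Hypothetical stand-in for buf(9S); only the link field is modelled. */
struct buf_sketch {
	long blkno;
	struct buf_sketch *av_forw;	/* next request in the list */
};

/* Hypothetical per-device queue kept in the state structure. */
struct xxqueue_sketch {
	struct buf_sketch *head;
	struct buf_sketch *tail;
};

/* Append a request to the end of the queue (FIFO policy). */
void
xxenqueue(struct xxqueue_sketch *q, struct buf_sketch *bp)
{
	bp->av_forw = NULL;
	if (q->head == NULL)
		q->head = bp;
	else
		q->tail->av_forw = bp;
	q->tail = bp;
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
struct buf_sketch *
xxdequeue(struct xxqueue_sketch *q)
{
	struct buf_sketch *bp = q->head;

	if (bp != NULL) {
		q->head = bp->av_forw;
		if (q->head == NULL)
			q->tail = NULL;
		bp->av_forw = NULL;
	}
	return (bp);
}
```

In this arrangement the strategy routine would call the enqueue helper and the interrupt handler the dequeue helper. A plain FIFO is the simplest insertion policy; a sorted insert would replace xxenqueue() without changing the dequeue side.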
283. one of two commands as its infocmd argument: DDI_INFO_DEVT2INSTANCE, which asks for a device's instance number, and DDI_INFO_DEVT2DEVINFO, which asks for a pointer to the device's dev_info structure.

In the DDI_INFO_DEVT2INSTANCE case, arg is a dev_t, and getinfo(9E) must translate the minor number to an instance number. In the following example, the minor number is the instance number, so it simply passes back the minor number. In this case, the driver must not assume that a state structure is available, since getinfo(9E) may be called before attach(9E). The mapping the driver defines between minor device number and instance number does not necessarily follow the mapping shown in the example. In all cases, however, the mapping must be static.

In the DDI_INFO_DEVT2DEVINFO case, arg is again a dev_t, so getinfo(9E) first decodes the instance number for the device. It then passes back the dev_info pointer saved in the driver's soft state structure for the appropriate device.

Code Example 5-1 getinfo(9E) routine

    static int
    xxgetinfo(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg,
        void **result)
    {
            struct xxstate *xsp;
            dev_t dev;
            int instance, error;

            switch (infocmd) {
            case DDI_INFO_DEVT2INSTANCE:
                    dev = (dev_t)arg;
                    *result = (void *)getminor(dev);
                    error = DDI_SUCCESS;
                    break;
            case DDI_INFO_DEVT2DEVINFO:
                    dev = (dev
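The text notes that a driver's mapping between minor number and instance need not be the identity, only static. One possible alternative is sketched below under a purely hypothetical encoding in which the low three bits of the minor number select a unit within the device and the remaining bits hold the instance; XX_UNIT_BITS and the xx_* helpers are illustrative names, not part of the DDI.

```c
/* Hypothetical layout: minor = (instance << 3) | unit. */
#define	XX_UNIT_BITS	3u
#define	XX_UNIT_MASK	((1u << XX_UNIT_BITS) - 1u)

/* Build a minor number from an instance and a unit. */
unsigned int
xx_make_minor(unsigned int instance, unsigned int unit)
{
	return ((instance << XX_UNIT_BITS) | (unit & XX_UNIT_MASK));
}

/* Recover the instance; this is what a getinfo routine would report. */
unsigned int
xx_minor_to_instance(unsigned int minor)
{
	return (minor >> XX_UNIT_BITS);
}

/* Recover the unit within the instance. */
unsigned int
xx_minor_to_unit(unsigned int minor)
{
	return (minor & XX_UNIT_MASK);
}
```

Because the decoding depends only on the minor number itself, it works before any state structure exists, which matters when the lookup can happen before attach.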
284. ons

    void kstat_named_init(kstat_named_t *knp, char *name,
        uchar_t data_type);

kstat_named_init(9F) associates the name pointed to by name and the type specified in data_type with the kstat_named(9S) structure pointed to by knp.

    void kstat_waitq_enter(kstat_io_t *kiop);

kstat_waitq_enter(9F) is used to update the kstat_io(9S) structure pointed to by kiop, indicating that a request has arrived but has not yet been processed.

    void kstat_waitq_exit(kstat_io_t *kiop);

kstat_waitq_exit(9F) is used to update the kstat_io(9S) structure pointed to by kiop, indicating that the request is about to be serviced.

    void kstat_runq_enter(kstat_io_t *kiop);

kstat_runq_enter(9F) is used to update the kstat_io(9S) structure pointed to by kiop, indicating that the request is in the process of being serviced. kstat_runq_enter(9F) is generally invoked after a call to kstat_waitq_exit(9F).

    void kstat_runq_exit(kstat_io_t *kiop);

kstat_runq_exit(9F) is used to update the kstat_io(9S) structure pointed to by kiop, indicating that the request is serviced.

    void kstat_waitq_to_runq(kstat_io_t *kiop);

kstat_waitq_to_runq(9F) is used to update the kstat_io(9S) structure pointed to by kiop, indicating that the request is transitioning from one state to the next. kstat_waitq_to_runq(9F) is used when a driver would normally call kstat_waitq_exit(9F) followed immedi
285. options & XX_QUEUEING
    xsp->throttle = 1;

Untagged Queueing

If tagged queueing fails, you can attempt to set untagged queueing. In this mode, you submit as many commands as you think necessary or optimal to the host adapter driver. Then the host adapter queues the commands to the target one at a time (as opposed to tagged queueing, where the host adapter submits as many commands as it can until the target indicates that the queue is full).

Auto-Request Sense Mode

Auto-request sense mode is most desirable if tagged or untagged queueing is used. A contingent allegiance condition is cleared by any subsequent command and, consequently, the sense data is lost. Most HBA drivers will start the next command before performing the target driver callback. However, some HBA drivers may use a separate and lower-priority thread to perform the callbacks, which may increase the time it takes to notify the target driver that the packet completed with a check condition. In this case, the target driver may not be able to submit a request sense command in time to retrieve the sense data.

To avoid this loss of sense data, the HBA driver or controller should issue a request sense command as soon as a check condition has been detected; this mode is known as auto-request sense mode. Note that not all HBA drivers are capable of auto-request sense mode, and some can only operate with auto-request sense mode en
286. strategy()

The strategy(9E) routine originated in block drivers and is so called because it can implement a strategy for efficient queuing of I/O requests to a block device. A driver for a character-oriented device can also use a strategy(9E) routine. In the character I/O model presented here, strategy(9E) does not maintain a queue of requests, but rather services one request at a time.

In this example, the strategy(9E) routine for a character-oriented DMA device allocates DMA resources for the data transfer and starts the command by programming the device register (see Chapter 7, "DMA," for a detailed description).

Note that strategy(9E) does not receive a device number (dev_t) as a parameter; this is instead retrieved from the b_edev field of the buf(9S) structure.

Code Example 8-7 strategy(9E) routine supporting physio(9F)

    static int
    xxstrategy(struct buf *bp)
    {
            int instance;
            struct xxstate *xsp;
            ddi_dma_cookie_t cookie;

            instance = getminor(bp->b_edev);
            xsp = ddi_get_soft_state(statep, instance);
            ...
            set up DMA resources with ddi_dma_buf_setup(9F)
            ...
            xsp->bp = bp;   /* remember bp */
            program DMA engine and start command
            return (0);
    }

Though strategy(9E) is declared to return an int, it should always return zero. strategy(9E) indicates an error to physio(9F) by setting the B_ERROR bit in the b_flags member of the buf(9S) structure and placing the appropriate error number in the b_er
287. or example, if an I/O transfer can be sped up by using an I/O cache, which at a minimum transfers (flushes) one cache line, ddi_mem_alloc(9F) will round the size to a multiple of the cache line to avoid data corruption.

On x86, ddi_mem_alloc(9F) obeys the alignment specified by the dlim_minxfer field in the passed DMA limit structure. In addition, the physical address of the allocated memory will be within the dlim_addr_lo and dlim_addr_hi of the DMA limit structure.

ddi_mem_free(9F) is used to free the memory allocated by ddi_mem_alloc(9F).

Note: If the memory is not properly aligned, the transfer will succeed, but the system will pick a different (and possibly less efficient) transfer mode that has fewer restrictions. For this reason, ddi_mem_alloc(9F) is preferred over kmem_alloc(9F) when allocating memory for the device to access.

Code Example 7-6 is an example of how to allocate memory for streaming access.

Code Example 7-6 Using ddi_mem_alloc(9F)

    if (ddi_mem_alloc(xsp->dip, &limits, size, 0, &memp,
        &real_length) != DDI_SUCCESS) {
            /* error handling */
            goto failure;
    }
    if (ddi_dma_addr_setup(xsp->dip, NULL, memp, real_length,
        DDI_DMA_READ, DDI_DMA_SLEEP, NULL, &limits,
        &mem_handle) != DDI_DMA_MAPPED) {
            /* error handling */
            ddi_mem_free(memp);
            goto failure;
    }

ddi_mem_alloc(9F) returns the actual size of the allocated memory obje
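Rounding a request up to a whole number of cache lines, as described at the start of this excerpt, is ordinary integer arithmetic. The helper below is only an illustration of that rounding, not the implementation of ddi_mem_alloc(9F); the name round_up_to_line() and the 32-byte line size used in the usage note are assumptions.

```c
#include <stddef.h>

/*
 * Round size up to the next multiple of line (line must be non-zero).
 * A size that is already a multiple is returned unchanged.
 */
size_t
round_up_to_line(size_t size, size_t line)
{
	return (((size + line - 1) / line) * line);
}
```

With a 32-byte line, a 100-byte request becomes 128 bytes, so a flush of the last cache line cannot touch memory that belongs to another object. This is also why the allocator reports a real length that may exceed the requested size.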
288. ore buffers */
            temp = xsp->regp->csr;
            /* The transfer has finished, successfully or not */
            biodone(bp);
            /*
             * release any resources used in the transfer, such as
             * DMA resources (ddi_dma_free(9F))
             */
            /* Let the next I/O thread have access to the device */
            xsp->busy = 0;
            cv_signal(&xsp->cv);
            mutex_exit(&xsp->mu);
            return (DDI_INTR_CLAIMED);
    }

Asynchronous Data Transfers

This section discusses a method for performing asynchronous I/O transfers. The driver queues the I/O requests and then returns control to the caller. Again, the assumption is that the hardware is a simple disk device that allows one transfer at a time. The device interrupts when a data transfer has completed or when an error occurs.

1. Check for invalid buf(9S) requests.
   As in the synchronous case, the device driver should check the buf(9S) structure passed to strategy(9E) for validity. See "Synchronous Data Transfers" on page 180 for more details.

2. Enqueue the request.
   Unlike synchronous data transfers, the asynchronous driver does not wait for the current data transfer to complete. Instead, it adds the request to a queue. The head of the queue can be the current transfer, or a separate field in the state structure can be used to hold the active request (as in this example). If the queue was initially empty, then the hardware is not busy, and strategy(9E) starts the transfer before returning. Otherwise, whenever a transfer c
289. orming synchronous I/O transfers. It assumes that the hardware is a simple disk device that can transfer only one data buffer at a time using DMA. The device driver's strategy(9E) routine waits for the current request to complete before accepting a new one. The device interrupts the CPU when the transfer completes or when an error occurs.

1. Check for invalid buf(9S) requests.
   Check the buf(9S) structure passed to strategy(9E) for validity. All drivers should check to see if:
   a. The request begins at a valid block. The driver converts the b_blkno field to the correct device offset and then determines if the offset is valid for the device.
   b. The request does not go beyond the last block on the device.
   c. Device-specific requirements are met.
   If an error is encountered, the driver should indicate the appropriate error with bioerror(9F) and complete the request by calling biodone(9F). biodone(9F) notifies the caller of strategy(9E) that the transfer is complete (in this case, because of an error).

2. Check if the device is busy.
   Synchronous data transfers allow single-threaded access to the device. The device driver enforces this by maintaining a busy flag (guarded by a mutex) and by waiting on a condition variable with cv_wait(9F) when the device is busy. If the device is busy, the thread waits until a cv_broadcast(9F) or cv_signal(9F) from the interrupt handler indicates that
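The three validity checks in step 1 can be expressed as a small pure function. This is a user-space sketch with hypothetical names: xx_check_request(), its parameters, and the "whole number of blocks" rule used for check (c) are assumptions chosen for illustration. In a driver, a non-zero result would be reported with bioerror(9F) followed by biodone(9F) rather than returned to the caller.

```c
#include <errno.h>

/*
 * Validate a transfer request against a device of nblocks blocks of
 * blksize bytes each. Returns 0 if acceptable, EINVAL otherwise.
 */
int
xx_check_request(long blkno, unsigned long bcount, long nblocks,
    unsigned long blksize)
{
	unsigned long count;

	if (blksize == 0 || nblocks <= 0)
		return (EINVAL);

	/* (a) the request must begin at a valid block */
	if (blkno < 0 || blkno >= nblocks)
		return (EINVAL);

	/* (c) assumed device rule: transfers are whole blocks */
	if (bcount % blksize != 0)
		return (EINVAL);

	/* (b) the request must not run past the last block */
	count = bcount / blksize;
	if (count > (unsigned long)(nblocks - blkno))
		return (EINVAL);

	return (0);
}
```

Comparing the block count against the remaining blocks, instead of adding it to blkno, avoids an overflow when bcount is very large.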
290. ot a table of equivalences. That is, simply changing from the function in column one to the function (or group of functions) in column two is not always sufficient. If the 4.1.x driver used a function in column one, read about the function in column two before changing any code.

Table A-1 SunOS 4.1.x and SunOS 5.4 Kernel Support Routines

    SunOS 4.1.x        SunOS 5.4       Description
    ASSERT             ASSERT          expression verification
    CDELAY                             conditional busy wait
    DELAY              drv_usecwait    busy wait for specified interval
    OTHERQ             OTHERQ          get pointer to queue's partner queue
    RD                 RD              get pointer to the read queue
    WR                 WR              get pointer to the write queue
    add_intr           ddi_add_intr    add an interrupt handler
    adjmsg             adjmsg          trim bytes from a message
    allocb             allocb          allocate a message block
    backq              backq           get pointer to queue behind the current queue
    bcmp               bcmp            compare two byte arrays
    bcopy              bcopy           copy data between address locations in kernel
    biodone, iodone    biodone         indicate I/O is complete
    biowait, iowait    biowait         wait for I/O to complete
    bp_mapin           bp_mapin        allocate virtual address space
    bp_mapout          bp_mapout       deallocate virtual address space
    brelse             brelse          return buffer to the free list
    btodb                              convert bytes to
291. owed by the register of interest. The following register names are recognized:

    .       dot, the current location
    i0-7    input registers to current function
    o0-7    output registers for current function
    l0-7    local registers
    g0-7    global registers
    psr     Processor Status Register
    tbr     Trap Base Register
    wim     Window Invalid Mask

g7 always contains the current thread pointer. For more information on how these registers are normally used, see The SPARC Architecture Manual, Version 8 and the System V Application Binary Interface, SPARC Processor Supplement.

The following command displays the PSR as a 4-byte hexadecimal value:

    kadb[0]: <psr=X
                    400cc3

The individual bits of the PSR are defined in <sys/psw.h>. More information is available in The SPARC Architecture Manual, Version 8 and The SPARC Assembly Language Reference Manual.

Display and Control Commands

The following commands display and control the status of adb(1)/kadb(1M):

    $b    Display all breakpoints
    $c    Display stack trace
    $d    Change default radix to value of dot
    $q    Quit
    $r    Display registers
    $M    Display built-in macros

$c is very useful with crash dumps: it shows the call trace and arguments at the time of the crash. It is also useful in kadb(1M) when a breakpoint is reached, but is usually not useful if kadb(1M) is entered at a random time. The number of arguments to print can be passed following the $c ($c 2
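As a worked example, the PSR value 400cc3 displayed above can be split into its fields. The sketch below uses the field positions given in The SPARC Architecture Manual, Version 8 (icc in bits 23:20, EC bit 13, EF bit 12, PIL in bits 11:8, S bit 7, PS bit 6, ET bit 5, CWP in bits 4:0); the psr_fields structure and decode_psr() are hypothetical names, and a driver writer would normally use the definitions in <sys/psw.h> instead.

```c
#include <stdint.h>

/* Decoded SPARC V8 Processor Status Register fields. */
struct psr_fields {
	unsigned int impl;	/* bits 31:28, implementation */
	unsigned int ver;	/* bits 27:24, version */
	unsigned int icc;	/* bits 23:20, condition codes N Z V C */
	unsigned int ec;	/* bit 13, coprocessor enabled */
	unsigned int ef;	/* bit 12, floating point enabled */
	unsigned int pil;	/* bits 11:8, processor interrupt level */
	unsigned int s;		/* bit 7, supervisor mode */
	unsigned int ps;	/* bit 6, previous supervisor mode */
	unsigned int et;	/* bit 5, traps enabled */
	unsigned int cwp;	/* bits 4:0, current window pointer */
};

struct psr_fields
decode_psr(uint32_t psr)
{
	struct psr_fields f;

	f.impl = (psr >> 28) & 0xf;
	f.ver = (psr >> 24) & 0xf;
	f.icc = (psr >> 20) & 0xf;
	f.ec = (psr >> 13) & 0x1;
	f.ef = (psr >> 12) & 0x1;
	f.pil = (psr >> 8) & 0xf;
	f.s = (psr >> 7) & 0x1;
	f.ps = (psr >> 6) & 0x1;
	f.et = (psr >> 5) & 0x1;
	f.cwp = psr & 0x1f;
	return (f);
}
```

For 0x400cc3 this yields icc = 0x4 (only the Z flag set), PIL = 12, S = 1 and PS = 1 (supervisor mode), ET = 0 (traps disabled), and CWP = 3.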
292. p->mu);
    xsp->busy = 1;
    mutex_exit(&xsp->mu);
    /* do the read */
}

static u_int
xxintr(caddr_t arg)
{
    struct xxstate *xsp = (struct xxstate *)arg;

    mutex_enter(&xsp->mu);
    xsp->busy = 0;
    if (xsp->want) {
        xsp->want = 0;
        cv_broadcast(&xsp->cv);
    }
    mutex_exit(&xsp->mu);
}

Writing Device Drivers August 1994

cv_timedwait()

If a thread blocks on a condition with cv_wait(9F), and that condition does not occur, it may wait forever. One way to prevent this is to establish a callback with timeout(9F). This callback sets a flag indicating that the condition did not occur normally, and then unblocks the thread. The notified thread then notices that the condition did not occur and can return an error (such as device broken).

A better solution is to use cv_timedwait(9F). An absolute wait time is passed to cv_timedwait(9F), which returns -1 if the time is reached and the event has not occurred. It returns a positive value otherwise. This saves a lot of work setting up separate timeout(9F) routines and avoids having threads get stuck in the driver.

cv_timedwait(9F) requires an absolute wait time expressed in clock ticks since the system was last rebooted. This can be determined by retrieving the current value with drv_getparm(9F). The drv_getparm(9F) function takes an address to store a value and an indicator of which kernel parameter to retrieve. In this case, LBOLT is used to get the number of clo
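The absolute-time arithmetic described above can be sketched in user space. This is only an illustration: the names usec_to_ticks and absolute_deadline, the explicit hz parameter, and the round-up policy are assumptions made for this sketch. In a driver, the current tick count would come from drv_getparm(9F) with LBOLT, and the unit conversion from drv_usectohz(9F).

```c
#include <assert.h>

/*
 * Convert microseconds to clock ticks at hz ticks per second.
 * Rounds up so that a nonzero timeout never becomes zero ticks
 * (a policy assumed for this sketch).
 */
static long
usec_to_ticks(long usec, long hz)
{
    assert(usec >= 0 && hz > 0);
    return (long)(((long long)usec * hz + 999999LL) / 1000000LL);
}

/*
 * Absolute deadline, in ticks since boot, for a wait that starts
 * at now_ticks and lasts at most usec microseconds.
 */
static long
absolute_deadline(long now_ticks, long usec, long hz)
{
    return now_ticks + usec_to_ticks(usec, hz);
}
```

With a 100 Hz clock, a two-second timeout that starts at tick 5000 yields a deadline of tick 5200. A value computed this way is what a driver would pass as the absolute time argument.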
293. p for reading if no thread currently holds the lock for writing and if no thread is waiting to acquire the lock for writing. Otherwise, rw_enter(9F) blocks. If enter_type is RW_WRITER, rw_enter(9F) acquires the lock for writing if no thread holds the lock for reading or writing and if no other thread is waiting to acquire the lock for writing. Otherwise, rw_enter(9F) blocks.

    void rw_exit(krwlock_t *rwlp);

rw_exit(9F) releases the lock pointed to by rwlp.

    void rw_init(krwlock_t *rwlp, char *name, krw_type_t type, void *arg);

rw_init(9F) prepares the readers/writer lock pointed to by rwlp for use. RW_DRIVER should be passed for type.

    int rw_read_locked(krwlock_t *rwlp);

The lock pointed to by rwlp must be held during a call to rw_read_locked(9F). If the calling thread holds the lock for reading, rw_read_locked(9F) returns a non-zero value. If the calling thread holds the lock for writing, rw_read_locked(9F) returns zero.

    int rw_tryenter(krwlock_t *rwlp, krw_t enter_type);

rw_tryenter(9F) attempts to enter the lock, like rw_enter(9F), but never blocks. It returns a non-zero value if the lock was successfully entered, and zero otherwise.

    int rw_tryupgrade(krwlock_t *rwlp);

If the calling thread holds the lock pointed to by rwlp for reading, rw_tryupgrade(9F) acquires the lock for writing if no other threads hold the lock and no thread is waiting to acquire the l
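The admission rules stated above for readers and writers can be modeled as a small user-space state machine. This is a sketch of the policy only, not the kernel implementation: struct rw_model, enum model_type, and model_tryenter are hypothetical names, and the model does no blocking or thread tracking.

```c
#include <assert.h>
#include <stddef.h>

enum model_type { MODEL_READER, MODEL_WRITER };

/* Hypothetical bookkeeping for one readers/writer lock. */
struct rw_model {
    int readers;          /* threads holding the lock for reading */
    int writer;           /* 1 if a thread holds the lock for writing */
    int writers_waiting;  /* threads blocked waiting to write */
};

/*
 * Non-blocking entry attempt, in the spirit of rw_tryenter(9F):
 * returns nonzero and updates the model on success, zero otherwise.
 */
static int
model_tryenter(struct rw_model *lk, enum model_type type)
{
    assert(lk != NULL);
    if (lk->writers_waiting > 0)
        return 0;            /* waiting writers go first */
    if (type == MODEL_READER) {
        if (lk->writer)
            return 0;        /* held for writing */
        lk->readers++;
        return 1;
    }
    if (lk->writer || lk->readers > 0)
        return 0;            /* held for reading or writing */
    lk->writer = 1;
    return 1;
}
```

The model shows why a steady stream of readers cannot starve a writer under these rules: once a writer is waiting, new read attempts fail until the writer has been served.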
294. pends on the configuration deserves special attention. For example, changing the base memory address of device registers is not likely to affect the behavior of most driver functions: if the driver works well with one address, it is likely to work as well with a different address, providing the configuration code allows it to work at all. On the other hand, a special I/O control call may have different effects depending on the particular device configuration.

Loading the driver with varying configurations assures that the probe(9E) and attach(9E) entry points can find the device at different addresses. For basic functional testing, using regular UNIX commands such as cat(1) or dd(1M) is usually sufficient for character devices. Mounting or booting may be required for block devices.

Functionality Testing

After a driver has been run through configuration testing, all of its functionality should be thoroughly tested. This requires exercising the operation of all of the driver's entry points. In addition to the basic functional tests done in configuration testing, full functionality testing requires testing the rest of the entry points and functions to obtain confidence that the driver can correctly perform all of its functions.

Many drivers will require custom applications to test functionality, but basic drivers for devices such as disks, tapes, or asynchronous boards can be tested using standard system utilities. Al
295. pings are SunOS-specific and do not become active until SunOS is booted. Upon power-up, the PROM maps essential system devices (such as the keyboard).

Examples in this section use a bwtwo monochrome frame buffer on a SPARCstation IPC. Using PROM commands to modify video memory on this frame buffer provides a visual indication that something is happening when PROM commands are executed.

Open Boot PROM 2.x

For complete documentation on the Open Boot PROM, see the Open Boot PROM Toolkit User's Guide and monitor(1M). The examples in this section refer to a Sun-4c; other architectures may require new commands to map memory, among other things.

The Open Boot PROM is currently used on Sun machines with an SBus. It is more powerful than the older SunMon (see "The Sun Monitor" on page 34). The Open Boot PROM uses an ok prompt rather than the > prompt used by SunMon. However, many Open Boot PROM machines present the old-style interface by default. The n command switches an OBP from the old mode to the new mode:

    Type b (boot), c (continue), or n (new command mode)
    >n
    Type help for more information
    ok

Note: If the PROM is in secure mode (the security-mode parameter is not set to none), the PROM password may be required (set in the security-password parameter).

To make the machine come up in new mode by default, set the environment variable sunmon-compat? to false
296. pointer tells physio(9F) to allocate a buf(9S) structure. If it is necessary for the driver to provide physio(9F) with a buf(9S) structure, getrbuf(9F) should be used to allocate one. physio(9F) returns zero if the transfer completes successfully, or an error number on failure. The return value of physio(9F) is determined by the strategy(9E) routine.

minphys()

xxminphys is a pointer to a function to be called by physio(9F) to ensure that the size of the requested transfer does not exceed a driver-imposed limit. If the user requests a larger transfer, physio(9F) calls xxstrategy() repeatedly, requesting no more than the imposed limit at a time. This is important for DMA transfers because only a finite amount of DMA resources is available. Drivers for slow devices, such as printers, should be careful because they tie up resources for a long time.

Usually, a driver passes a pointer to the kernel function minphys(9F), but it can define its own xxminphys() routine instead. The job of minphys(9F) is to keep the b_bcount field of the buf(9S) structure below a driver limit. There may be additional system limits that the driver should not circumvent, so the driver minphys routine should call the system minphys(9F) routine before returning.

Code Example 8-6 minphys(9F) routine

    #define XXMINVAL (124 << 10)

    static void
    xxminphys(struct buf *bp)
    {
        if (bp->b_bcount > XXMINVAL)
            bp->b_bcount = XXMINVAL;
        minphys(bp);
    }
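To make the splitting behavior concrete, the following user-space sketch counts how many strategy calls a request of a given size would need when each call is clamped to the 124 KB limit used in the example above. struct fake_buf, fake_minphys, and count_strategy_calls are hypothetical stand-ins invented for this illustration; the real physio(9F) loop also handles locking, offsets, and errors, and the system minphys(9F) limit is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define XX_MAX_XFER ((size_t)124 << 10)   /* same 124 KB limit as XXMINVAL */

/* Hypothetical stand-in for the b_bcount field of buf(9S). */
struct fake_buf {
    size_t b_bcount;
};

/* Clamp one transfer to the driver-imposed limit. */
static void
fake_minphys(struct fake_buf *bp)
{
    assert(bp != NULL);
    if (bp->b_bcount > XX_MAX_XFER)
        bp->b_bcount = XX_MAX_XFER;
}

/* Number of clamped transfers needed to move total bytes. */
static unsigned
count_strategy_calls(size_t total)
{
    unsigned calls = 0;

    while (total > 0) {
        struct fake_buf b = { total };

        fake_minphys(&b);        /* limit this piece */
        total -= b.b_bcount;     /* the rest waits for the next call */
        calls++;
    }
    return calls;
}
```

Under these assumptions a 1 MB request needs nine transfers: eight full 124 KB pieces and one 32 KB remainder.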
297. private h

mapdev_rev is the version number of the ddi_mapdev_ctl(9S) structure. It must be set to MAPDEV_REV.

mapdev_access must be set to the address of the driver's mapdev_access(9E) entry point.

mapdev_free must be set to the address of the driver's mapdev_free(9E) entry point.

mapdev_dup must be set to the address of the driver's mapdev_dup(9E) entry point.

Associating Devices with User Mappings

When a user process requests a mapping to a device with mmap(2), the device's segmap(9E) entry point is called. The device must use ddi_mapdev(9F) when setting up the memory mapping if it wants to manage device contexts. Otherwise, the device driver must use ddi_segmap(9F) to set up the mapping.

Unlike ddi_segmap(9F), ddi_mapdev(9F) cannot be used directly as an entry point in the cb_ops(9S) structure. You must define a segmap(9E) entry point to use ddi_mapdev(9F).

    int ddi_mapdev(dev_t dev, off_t offset, struct as *asp,
        caddr_t *addrp, off_t len, u_int prot, u_int maxprot,
        u_int flags, cred_t *cred, struct ddi_mapdev_ctl *m_ops,
        ddi_mapdev_handle_t *handlep, void *private_data);

ddi_mapdev(9F) is similar to ddi_segmap(9F) in that they both allow a user to map device space. In addition to establishing a mapping, ddi_mapdev(9F) informs the system of the ddi_mapdev_ctl(9S) entry points and creates a handle to the mapping in *handlep. This hand
298. r and reads from it are directed to an internal status register.

The command register looks like this:

    (Figure: command register bit fields: Enable Interrupts, Clear Interrupt, Start Transfer)

The status register looks like this:

    (Figure: status register bit fields: Interrupt Pending, Interrupts Enabled, Device Busy, Error Occurred, Transfer Complete)

Many drivers provide macros for the various bits in their registers to make the code more readable. The examples in this manual use the following names for the bits in the command register:

    #define ENABLE_INTERRUPTS   0x10
    #define CLEAR_INTERRUPT     0x08
    #define START_TRANSFER      0x04

For the bits in the status register, the following macros are used:

    #define INTERRUPTS_ENABLED  0x10
    #define INTERRUPTING        0x08
    #define DEVICE_BUSY         0x04
    #define DEVICE_ERROR        0x02
    #define TRANSFER_COMPLETE   0x01

Device Register Structure

Using pointer accesses to communicate with the device results in unreadable code. For example, the code that reads the data register when a transfer has completed might look like this:

    if (*reg_addr & TRANSFER_COMPLETE) {
        data = *(reg_addr + 1);    /* read data */
    }

To make the code more readable, it is common to define a structure that matches the layout of the device's registers. In this case, the structure could look like this: struct
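As an illustration of the register-structure idea, the sketch below reads the data register through a structure instead of raw pointer arithmetic. The field names csr and data, the one-byte register width, and the helper read_data_if_complete are assumptions chosen to match the pointer example (status at offset 0, data at offset 1); they are not taken from a particular device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEVICE_ERROR        0x02
#define TRANSFER_COMPLETE   0x01

/* Assumed layout: status register first, data register next. */
struct device_reg {
    volatile uint8_t csr;    /* command/status register */
    volatile uint8_t data;   /* data register */
};

/*
 * Return the data byte (0-255) if the transfer completed without
 * error, or -1 if no valid data is available yet.
 */
static int
read_data_if_complete(struct device_reg *regp)
{
    uint8_t status;

    assert(regp != NULL);
    status = regp->csr;      /* read the status register once */
    if ((status & DEVICE_ERROR) || !(status & TRANSFER_COMPLETE))
        return -1;
    return regp->data;
}
```

Compared with the pointer version, the field names document which register is being touched, and the volatile qualifier keeps the compiler from caching register reads.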
299. r drivers, especially those with DMA capabilities, use physio(9F) to do most of the work.

    int physio(int (*strat)(struct buf *), struct buf *bp, dev_t dev,
        int rw, void (*mincnt)(struct buf *), struct uio *uio);

physio(9F) requires the driver to provide a strategy(9E) entry point (though it does not get placed in the cb_ops(9S) structure). physio(9F) ensures that memory space is locked down (cannot be paged out) for the duration of the data transfer. This is necessary for DMA transfers because they cannot handle page faults. physio(9F) also provides an automated way of breaking a larger transfer into a series of smaller, more manageable ones.

Code Example 8-5 read(9E) and write(9E) routines using physio(9F)

    static int
    xxread(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        int instance;
        struct xxstate *xsp;

        instance = getminor(dev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (ENXIO);
        return (physio(xxstrategy, NULL, dev, B_READ, xxminphys, uiop));
    }

    static int
    xxwrite(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        int instance;
        struct xxstate *xsp;

        instance = getminor(dev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (ENXIO);
        return (physio(xxstrategy, NULL, dev, B_WRITE, xxminphys, uiop));
    }

In the call to physio(9F), xxstrategy is a pointer to the driver strategy routine. Passing NULL as the buf(9S) structure
300. r may need to synchronize the memory object with respect to various caches. This section gives guidelines on when and how to synchronize memory objects.

Cache is a very high-speed memory that sits between the CPU and the system's main memory (CPU cache), or between a device and the system's main memory (I/O cache).

    (Figure 7-2 Caches: CPU Cache, System I/O Cache, Bus Extender I/O Cache, I/O Device)

When an attempt is made to read data from main memory, the associated cache first checks to see if it contains the requested data. If so, it very quickly satisfies the request. If the cache does not have the data, it retrieves the data from main memory, passes the data on to the requestor, and saves the data in case that data is requested again.

Similarly, on a write cycle, the data is stored in the cache very quickly and the CPU or device is allowed to continue (executing or transferring). This takes much less time than it otherwise would if the CPU or device had to wait for the data to be written to memory.

An implication of this model is that after a device transfer has completed, the data may still be in the I/O cache but not yet in main memory. If the CPU accesses the memory, it may read the wrong data from the CPU cache. To ensure a consistent view of the memory for the CPU, the driver must call a synchronization routine to write the data from the I/O cache to main memory and
301. r processes accessing a device.

Chapter 12, "Loading and Unloading Drivers," shows the steps for compiling and linking a driver, and for installing it in the system.

Chapter 13, "Debugging," gives coding suggestions, debugging hints, a simple adb/kadb tutorial, and some hints on testing the driver.

Appendix A, "Converting a Device Driver to SunOS 5.4," gives hints on converting SunOS 4.x drivers to SunOS 5.x.

Appendix B, "Advanced Topics," presents a collection of optional topics.

Appendix C, "Summary of Solaris 2.4 DDI/DKI Services," summarizes, by topic, the kernel functions a device driver can use.

Appendix D, "Sample Driver Source Code Listings," displays a list of sample drivers and the location of the sample code in the DDK.

Related Books

For information about writing STREAMS device drivers and modules, see the STREAMS Programmer's Guide.

For more detailed reference information about the device driver interfaces, see sections 9, 9E (entry points), 9F (functions), and 9S (structures) of the Solaris 2.4 Reference Manual AnswerBook.

Typographic Conventions

The following table describes the meanings of the typefaces used in this book.

Typographic Conventions

Typeface          Meaning                                Example
constant width    C language symbol or UNIX command      ddi_add_intr registers a device interrupt with the system
italic            Placeholder for a value that the driver must supply
302. ransferred if error biowait bp freerbuf bp return error

print()

    int xxprint(dev_t dev, char *str);

The print(9E) entry is called by the system to display a message about an exception it has detected. print(9E) should call cmn_err(9F) to post the message to the console on behalf of the system. Here is an example:

    static int
    xxprint(dev_t dev, char *str)
    {
        cmn_err(CE_CONT, "xx: %s\n", str);
        return (0);
    }

SCSI Target Drivers

This chapter describes how to write a SCSI target driver using the interfaces provided by the Sun Common SCSI Architecture (SCSA). Overviews of SCSI and SCSA are presented, followed by the details of implementing a target driver.

Note: Target driver developers may be interested in SCSI HBA driver information. A SCSI HBA chapter in progress (SCSIHBA.PS) is included as a .ps file in the DDK. It is located in this path: /opt/SUNWddk/doc

Overview

The Solaris 2.4 DDI/DKI divides the software interface to SCSI devices into two major parts: target drivers and host bus adapter (HBA) drivers. Target refers to a driver for a device on a SCSI bus, such as a disk or a tape drive. Host bus adapter refers to the driver for the SCSI controller on the host machine, such as the esp driver on a SPARCstation. SCSA defines the interface between these two components. This chapter discusse
303. re the bp during execution of the command. The flags are set in the command portion of the scsi_pkt(9S) structure.

If callback is not NULL_FUNC and the requested DMA resources are not immediately available, the function pointed to by callback will be called when resources may have become available. callback can call scsi_init_pkt(9F) again. If callback is SLEEP_FUNC, scsi_init_pkt(9F) may block waiting for resources.

    char *scsi_mname(u_char msg);

scsi_mname(9F) decodes the SCSI message code msg and returns the corresponding message string.

    struct scsi_pkt *scsi_pktalloc(struct scsi_address *ap, int cmdlen,
        int statuslen, int (*callback)(void));

scsi_pktalloc(9F) allocates and returns a pointer to a SCSI command packet for the target at the SCSI address pointed to by ap. cmdlen and statuslen tell scsi_pktalloc(9F) what size command descriptor block (CDB) and status completion block (SCB) to allocate. Use scsi_pktalloc(9F) only for commands that do no actual I/O. Use scsi_resalloc(9F) for I/O commands.

If callback is not NULL_FUNC and the requested DMA resources are not immediately available, the function pointed to by callback will be called when resources may have become available. If callback is SLEEP_FUNC, scsi_pktalloc(9F) may block waiting for resources.

    void scsi_pktfree(struct scsi_pkt *pkt);

scsi_pktfree(9F) frees the scsi_pkt(9S) structure pointed t
304. rent from that used by the system on which they are installed. The Intel 82586, for example, supports little-endian byte ordering conventions, making it compatible with Multibus-based, but not VMEbus-based, machines. Drivers for such peripheral devices must swap bytes without inadvertently reordering the bits in any control fields greater than 16 bits in length. See swab(9F) for more information.

The PROM on SPARC Machines

Some platforms have a PROM monitor that provides support for debugging a device without an operating system. This section describes how to use the PROM on SPARC machines to map device registers so that they can be accessed. Usually, the device can be exercised enough with PROM commands to determine if the device is working correctly.

Two separate boot PROMs are briefly discussed here: the Open Boot PROM version 2 (OBP), used on machines with an SBus, and the PROM Monitor (SunMon), available on Sun-4 machines.

The PROM has several purposes; it serves to:

  - Bring the machine up from power on, or from a hard reset (OBP reset command, or SunMon k2 command)
  - Provide an interactive tool for examining and setting memory, device registers, and memory mappings
  - Boot SunOS or the kernel debugger kadb(1M)

Simply powering up the computer and attempting to use its PROM to examine device registers will likely fail. While the device may be correctly installed, those map
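The byte-ordering issue raised at the start of this passage can be illustrated with a short user-space sketch that swaps the two bytes of each 16-bit word in a buffer, which is the kind of pairwise swap swab(9F) is described as providing. swap16 and swap16_buffer are names invented for this illustration; they are not DDI routines, and a real driver would also have to decide which device fields need swapping at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchange the high and low bytes of one 16-bit value. */
static uint16_t
swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Swap every 16-bit word of a buffer in place. */
static void
swap16_buffer(uint16_t *buf, size_t nwords)
{
    size_t i;

    assert(buf != NULL || nwords == 0);
    for (i = 0; i < nwords; i++)
        buf[i] = swap16(buf[i]);
}
```

Only byte positions change; the bit order inside each byte is untouched, which is the property the text asks drivers to preserve. Applying the swap twice restores the original data.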
305. repeating from Step 6.
10. Transfer the rest of the object by repeating from Step 5.
11. Release the DMA resources.
12. Deallocate the DMA channel.

Third-party DMA

In general, third-party DMA requires the following steps:

1. Allocate a DMA channel.
2. Retrieve the system's DMA engine limitations with ddi_dmae_getlim(9F).
3. Lock the DMA objects in memory. This step is not necessary in block drivers for buffers coming from the file system, as the file system has already locked the data in memory.
4. Allocate DMA resources for the object.
5. Retrieve the next DMA window with ddi_dma_nextwin(9F).
6. Retrieve the next segment in the window with ddi_dma_nextseg(9F).
7. Get a DMA cookie for the segment with ddi_dma_segtocookie(9F).
8. Program the system DMA engine to perform the transfer with ddi_dmae_prog(9F).
9. Perform any required object synchronizations.
10. Transfer the rest of the window by repeating from Step 6.
11. Transfer the rest of the object by repeating from Step 5.
12. Stop the DMA engine with ddi_dmae_stop(9F).
13. Release the DMA resources.
14. Deallocate the DMA channel.

Certain hardware platforms may restrict DMA capabilities in a bus-specific way. Drivers should use ddi_slaveonly(9F) to determine if the device is in a slot in which DMA is possible. For an example, see the attach() section on page 95.

Device limitations

Device limitations
306. request. strat is a pointer to a strategy(9E) routine which physio(9F) calls to handle the I/O request. If bp is NULL, physio(9F) allocates a private buf(9S) structure.

Before calling strategy(9E), physio(9F) locks down the memory referred to by the buf(9S) structure (initialized from the uio(9S) structure). For this reason, many drivers which do DMA must use physio(9F), as it is the only way to lock down memory.

In most block device drivers, read(9E) and write(9E) handle raw I/O requests, and consist of little more than a call to physio(9F).

    void minphys(struct buf *bp);

minphys(9F) can be passed as the mincnt argument to physio(9F). This causes physio(9F) to make I/O requests to the strategy routine that are no larger than the system default maximum data transfer size. If the original uio(9S) I/O request is to transfer a greater amount of data than minphys(9F) allows, physio(9F) calls strategy(9E) repeatedly.

Copying Data

These interfaces are data copying utilities, used both for copying data within the kernel, and for copying data between the kernel and an application program.

    void bcopy(caddr_t from, caddr_t to, size_t bcount);

bcopy(9F) copies bcount bytes from the location pointed to by from to the location pointed to by to.

    int copyin(caddr_t userbuf, caddr_t driverbuf, size_t cn);

copyin(9F) copies data from an application program's virtual address space to the kernel virtual
307. ribed in Table 2-2.

Table 2-2 SPARCstation 1 SBus address bits

Bits     Description
0-24     These bits are the SBus address lines, used by an SBus card to address the contents of the card.
25-26    Used by the CPU to select one of the SBus slots. These bits generate the SlaveSelect lines.
27       Used by the CPU to distinguish between SBus devices and devices resident on the CPU board. A one (1) indicates an SBus device, and a zero (0) indicates an on-board device.
28-31    Not used. For compatibility with Sun-4 architecture conventions, these bits are assumed to be all ones.

This addressing scheme yields the SPARCstation 1 and SPARCstation 1+ addresses shown earlier in Table 2-1. Other implementations may use a different number of address bits.

SBus Slots

SBus systems have several SBus slots. The number of slots is system-specific. The SPARCstation 1 has four SBus slots, numbered 0 through 3. Slot 0 is reserved; slots 1, 2, and 3 are available for SBus cards. The slots are used in the following way:

  - Slot 0 is not a physical slot, but refers to the on-board DMA, SCSI, and Ethernet controllers. For convenience, these are viewed as being plugged into Slot 0.
  - Slots 1 and 2 are physical slots that have DMA-master capability.
  - Slot 3 is a slave-only physical slot that does not support boards that operate as DMA masters.

On other systems, for example the SPARCstation 10, slot 15 (slot 0xf) is t
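The bit layout in Table 2-2 can be expressed as a few masks and shifts. The sketch below is a user-space illustration only: the macro and function names are invented here, and the example addresses in the usage note are constructed from the table rather than taken from a real system map.

```c
#include <stdint.h>

#define SBUS_OFFSET_MASK  0x01FFFFFFu   /* bits 0-24: address within the card */
#define SBUS_SLOT_SHIFT   25            /* bits 25-26: slot select */
#define SBUS_SLOT_MASK    0x3u
#define SBUS_DEVICE_BIT   (1u << 27)    /* bit 27: 1 = SBus, 0 = on-board */

/* Slot number (0-3) encoded in a physical address. */
static unsigned
sbus_slot(uint32_t pa)
{
    return (pa >> SBUS_SLOT_SHIFT) & SBUS_SLOT_MASK;
}

/* Nonzero if the address selects an SBus device rather than an on-board one. */
static int
is_sbus_device(uint32_t pa)
{
    return (pa & SBUS_DEVICE_BIT) != 0;
}

/* Offset of the access within the selected card. */
static uint32_t
sbus_offset(uint32_t pa)
{
    return pa & SBUS_OFFSET_MASK;
}
```

For example, with bits 28-31 all ones, an address of 0xFA000010 decodes to SBus slot 1 at offset 0x10, and 0xFE000000 decodes to slot 3 at offset 0.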
308. river. The config(8) program has been replaced by Open Boot PROM information, and supplemented by information in hardware configuration files (see driver.conf(4)).

Changes to Routines

  - The xxinit() routine for loadable modules in SunOS 4.x has been split into three routines. The VDLOAD case has become _init(9E), the VDUNLOAD case has become _fini(9E), and the VDSTAT case has become _info(9E).
  - It is no longer guaranteed that identify(9E) is called once before attach(9E). It may now be called any number of times, and may be called at any time. Do not count device units. See ddi_get_instance(9F) for more information.
  - The SunOS 5.x probe(9E) is not the same as probe(9E) in SunOS 4.x. It is called before attach(9E), and may be called any number of times, so it must be stateless. If it allocates resources before it probes the device, it must deallocate them before returning, regardless of success or failure. attach(9E) will not be called unless probe(9E) succeeds.
  - attach(9E) is called to allocate any resources the driver needs to operate the device. The system now assigns the instance number (previously known as the unit number) to the device.

The reason the rules are so stringent is that the implementation will change. If driver routines follow these rules, they will not be affected by changes to the implementation. If, however, they assume that the autoconfiguration routine
309. river the next time these pages of the mapping are accessed, by calling the mapdev_access(9E) entry point.

    int ddi_mapdev_nointercept(ddi_mapdev_handle_t handle,
        off_t offset, off_t len);

ddi_mapdev_nointercept(9F) validates the mapping translations for the pages of the mapping specified by handle, offset, and len. By validating the mapping translations for these pages, the driver is telling the system not to intercept accesses to these pages of the mapping, and to allow accesses to proceed without notifying the device driver.

ddi_mapdev_nointercept(9F) must be called with the offset and the handle of the mapping that generated the access event for the access to complete. If ddi_mapdev_nointercept(9F) is not called on this handle, the mapping translations will not be validated and the process will receive a SIGBUS.

For both functions, requests affect the entire page containing the offset and all the pages up to and including the entire page containing the last byte, as indicated by offset + len. The device driver must make sure that for each page of device memory being mapped, only one process has valid translations at any one time.

Both functions return zero if they are successful. If, however, there was an error in validating or invalidating the mapping translations, that error is returned to the device driver. It is the device driver's responsibility to return this error to the system.

Device Context Management Entry Points

The following
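The page-rounding rule described above (a request covers every whole page from the one containing offset through the one containing the last byte) can be worked through in user space. This is a sketch under stated assumptions: struct page_range and affected_pages are names invented here, the last byte is taken to be offset + len - 1, and the page size is passed in explicitly (4096 is used only as an example value).

```c
#include <assert.h>

/* Half-open byte range [start, end) covering whole pages. */
struct page_range {
    unsigned long start;
    unsigned long end;
};

/*
 * Round a request of len bytes at offset out to page boundaries.
 * pagesize must be a power of two and len must be nonzero.
 */
static struct page_range
affected_pages(unsigned long offset, unsigned long len, unsigned long pagesize)
{
    struct page_range r;
    unsigned long mask;

    assert(len > 0 && pagesize > 0 && (pagesize & (pagesize - 1)) == 0);
    mask = ~(pagesize - 1);
    r.start = offset & mask;                            /* page holding first byte */
    r.end = ((offset + len - 1) & mask) + pagesize;     /* end of page holding last byte */
    return r;
}
```

A 200-byte request at offset 4000 with 4096-byte pages therefore affects two full pages, bytes 0 through 8191, even though it touches only a small region around the page boundary.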
310. rivers (see getinfo() on page 102 for more information on the DDI_INFO_DEVT2INSTANCE case). However, in the DDI_INFO_DEVT2DEVINFO case of the getinfo(9E) routine, the target driver must return a pointer to its dev_info node. This can be saved in the driver state structure or can be retrieved from the sd_dev field of the scsi_device(9S) structure.

Code Example 10-4 Alternative SCSI target driver getinfo(9E) code fragment

    case DDI_INFO_DEVT2DEVINFO:
        dev = (dev_t)arg;
        instance = getminor(dev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (DDI_FAILURE);
        *result = (void *)xsp->sdp->sd_dev;
        return (DDI_SUCCESS);

Resource Allocation

To send a SCSI command to the device, the target driver must create and initialize a scsi_pkt(9S) structure and pass it to the host bus adapter driver.

scsi_init_pkt()

The scsi_init_pkt(9F) routine allocates and zeros a scsi_pkt(9S) structure; it also sets pointers to pkt_private, *pkt_scbp, and *pkt_cdbp. Additionally, it provides a callback mechanism to handle the case where resources are not available. The routine takes the following arguments:

    struct scsi_pkt *scsi_init_pkt(struct scsi_address *ap,
        struct scsi_pkt *pktp, struct buf *bp, int cmdlen,
        int statuslen, int privatelen, int flags,
        int (*callback)(caddr_t), caddr_t arg);

ap is a pointer to a scsi_address st
311. ror = ddi_dma_buf_setup(xsp->dip, xsp->bp, flags, xxdmacallback,
        (caddr_t)xsp, limits, &xsp->handle);
    if (error == DDI_DMA_NORESOURCES)
        xsp->callback_count++;
    mutex_exit(&xsp->callback_mutex);

    static int
    xxdmacallback(caddr_t callbackarg)
    {
        struct xxstate *xsp = (struct xxstate *)callbackarg;

        mutex_enter(&xsp->callback_mutex);
        if (xsp->cancel_callbacks) {
            /* do not reschedule, in process of detaching */
            xsp->callback_count--;
            if (xsp->callback_count == 0)
                cv_signal(&xsp->callback_cv);
            mutex_exit(&xsp->callback_mutex);
            return (1);    /* tell framework for this callback */
                           /* routine not to reschedule it */
        }
        /*
         * Presumably at this point the device is still active
         * and will not be detached until the DMA has completed.
         * A return of 0 means try again later.
         */
        error = ddi_dma_buf_setup(xsp->dip, xsp->bp, flags,
            DDI_DMA_DONTWAIT, NULL, &limits, &xsp->handle);
        if (error == DDI_DMA_MAPPED) {
            /* program the DMA engine */
            xsp->callback_count--;
            mutex_exit(&xsp->callback_mutex);
            return (1);
        }
        if (error != DDI_DMA_NORESOURCES) {
            xsp->callback_count--;
            mutex_exit(&xsp->callback_mutex);
            return (1);
        }
        mutex_exit(&xsp->callback_mutex);
        return (0);
    }

Synchronizing Memory Objects

At various points when the memory object is accessed (including the time of removal of the DMA resources), the drive
312. ror(bp, EIO);
    biodone(bp);

The other return values from scsi_transport(9F) are:

    TRAN_BUSY          There is already a command in progress for the specified target.
    TRAN_BADPKT        The DMA count in the packet was too large, or the host adapter driver rejected this packet.
    TRAN_FATAL_ERROR   The host adapter driver is unable to accept this packet.

Warning: The mutex sd_mutex in the scsi_device(9S) structure must not be held across a call to scsi_transport(9F).

If scsi_transport(9F) returns TRAN_ACCEPT, the packet is the responsibility of the host bus adapter driver and should not be accessed by the target driver until the command completion routine is called.

Command Completion

Once the host bus adapter driver has done all it can with the command, it invokes the packet's completion callback routine, passing a pointer to the scsi_pkt(9S) structure as a parameter. The completion routine decodes the packet and takes the appropriate action. A simple completion routine is given in Code Example 10-5.

Code Example 10-5 SCSI driver completion routine

    static void
    xxcallback(struct scsi_pkt *pkt)
    {
        struct buf *bp;
        struct xxstate *xsp;
        int instance;
        struct scsi_status *ssp;

        /*
         * Get a pointer to the buf(9S) structure for the command
         * and to the per-instance data structure.
         */
        bp = (struct buf *)pkt->pkt_private;
        instance = getminor(bp->b
313. ror field. physio(9F) will then return zero to indicate success, or the value of the b_error field if an error occurs.

After calling strategy(9E), physio(9F) waits until the driver finishes with the buf(9S) structure (the transfer fails or is complete) by calling biowait(9F). A call to biodone(9F) must later be made by the driver, or physio(9F) will wait forever. For example, strategy(9E) must call biodone(9F) when an error occurs.

Transfer Completion

On completion of the DMA transfer, the device generates an interrupt. Eventually, the interrupt routine will be called. In this example, xxintr() receives a pointer to the state structure for the device that might have issued the interrupt.

The most important function in the interrupt routine is to notify physio(9F) that the transfer is complete, which is accomplished by the call to biodone(9F). If the driver does not notify physio(9F) that the transfer is complete, physio(9F) will not return and the thread will hang.

Code Example 8-8 Interrupt routine using physio(9F)

    static u_int
    xxintr(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;

        if (device did not interrupt) {
            return (DDI_INTR_UNCLAIMED);
        }
        if (error) {
            error handling
        }
        release any resources used in the transfer, such as DMA
        resources (ddi_dma_free(9F))
        /* wake up waiting thread in physio(9F) */
        biodone(xsp->bp);
        return (DDI_INTR_CLAIMED);
    }

Mappi
314. rovides similar functionality with condition variables. Threads are blocked on condition variables until they are notified that the condition has occurred. The driver must acquire a mutex which protects the condition variable before blocking the thread. The mutex is then released before the thread is blocked (similar to blocking/unblocking interrupts in SunOS 4.x).

Code Example 13-3 Synchronization in SunOS 5.x similar to SunOS 4.x

    int busy;              /* global device busy flag */
    kmutex_t busy_mu;      /* mutex protecting busy flag */
    kcondvar_t busy_cv;    /* condition variable for busy flag */

    static int
    xxread(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        mutex_enter(&busy_mu);
        while (busy)
            cv_wait(&busy_cv, &busy_mu);
        busy = 1;
        mutex_exit(&busy_mu);
        /* do the read */
    }

    static u_int
    xxintr(caddr_t arg)
    {
        mutex_enter(&busy_mu);
        busy = 0;
        cv_broadcast(&busy_cv);
        mutex_exit(&busy_mu);
    }

Like wakeup(), cv_broadcast(9F) unblocks all threads waiting on the condition variable. To wake up one thread, use cv_signal(9F) (there was no documented equivalent for cv_signal(9F) in SunOS 4.x).

Note: There is no equivalent to the dispatch priority passed to sleep().

Though the sleep() and wakeup() calls exist, please do not use them, since the result would be an MT-unsafe driver. See "Thread Synchronization" on page 79 for more information.

Catching Signals
315. rresponds to each partition on the disk, the driver must also create an nblocks property. This is an integer property giving the number of blocks supported by the minor device, expressed in units of DEV_BSIZE (512 bytes). The file system uses the nblocks property to determine device limits. See "Properties" on page 59 for details.

Code Example 9-1 shows a typical attach(9E) entry point, with emphasis on creating the device's minor node and the nblocks property.

Code Example 9-1 Block driver attach(9E) routine

    static int
    xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
        switch (cmd) {
        case DDI_ATTACH:
            allocate a state structure and initialize it
            map the device's registers
            add the device driver's interrupt handler(s)
            initialize any mutexes and condition variables
            read label information if the device is a disk
            /*
             * Create the device's minor node. Note that the node_type
             * argument is set to DDI_NT_BLOCK.
             */
            if (ddi_create_minor_node(dip, "minor name", S_IFBLK,
                minor_number, DDI_NT_BLOCK, 0) == DDI_FAILURE) {
                free resources allocated so far
                /* Remove any previously allocated minor nodes */
                ddi_remove_minor_node(dip, NULL);
                return (DDI_FAILURE);
            }
            /*
             * Create driver properties like "nblocks". If the device
             * is a disk, the nblocks property is usually calculated
             * from information in the disk label.
             */
            xsp->nblocks = size of device in 512 byte blocks;
            if
316. rs such as the functions that are required for module loading (_init(9E), _info(9E), and _fini(9E)) and the required autoconfiguration entry points (identify(9E), attach(9E), and getinfo(9E)). Drivers may also support the optional autoconfiguration entry points probe(9E) and detach(9E). All device drivers must support the entry point getinfo(9E). Most drivers have open(9E) and close(9E) entry points to control access to their devices. See Chapter 8, "Drivers for Character Devices," Chapter 9, "Drivers for Block Devices," and Chapter 5, "Autoconfiguration," for details about these entry points.

Traditionally, all driver function and variable names have some prefix added to them. Usually this is the name of the driver, such as xxopen() for the open(9E) routine of driver xx. In subsequent examples, xx is used as the driver prefix.

Note: In SunOS 5.x, only the loadable module routines must be visible outside the driver object module. Everything else can have the storage class static.

Loadable Module Routines

    int _init(void);
    int _info(struct modinfo *modinfop);
    int _fini(void);

All drivers must implement the _init(9E), _fini(9E), and _info(9E) entry points to load, unload, and report information about the driver module. The driver is single-threaded when the kernel calls _init(9E). No other thread will enter a driver routine until mod_install(9F) returns success. Any resources global to
317. rt should be initialized to nodev(9F). For example, character device drivers should set all the block-only fields, such as cb_strategy, to nodev(9F). The cb_str field is used to determine if this is a STREAMS-based driver. The device drivers discussed in this book are not STREAMS-based, so cb_str must be set to NULL. The cb_flag member indicates whether the driver is safe for multithreading (D_MP) and whether it is a new-style driver (D_NEW). All drivers are new-style drivers and should properly handle the multithreaded environment, so cb_flag should be set to D_NEW | D_MP. If the driver properly handles 64-bit offsets, it should also set the D_64BIT flag in the cb_flag field. This specifies that the driver will use the uio_loffset field of the uio(9S) structure.

Loadable Driver Interface

Device drivers must be dynamically loadable and should be unloadable to help conserve memory resources. Drivers that can be unloaded are also easier to test and debug. Each device driver has a section of code that defines a loadable interface. This code section defines a static pointer for the soft state routines, the structures described in "Data Structures" on page 88, and the routines involved in loading the module.

Code Example 5-1 Loadable interface section

    static void *statep;               /* for soft state routines */
    static struct cb_ops xx_cb_ops;    /* forward reference */
    static struct dev_ops xx_ops = {
        DEVO_REV,
        0,
        xxgetinfo,
        xxidentif
318. rticular device may be able to generate interrupts on one or more of them. Multiple devices may share a common IRQ line. The bus IRQ lines are connected to an interrupt controller that arbitrates between interrupt requests. The kernel programs the interrupt controller to select which interrupts should be enabled at any particular time. When the interrupt controller determines that an interrupt should be delivered, it raises a request to the CPU. If processor interrupts are enabled, the CPU acknowledges the interrupt and causes the kernel to begin interrupt handler processing.

Interrupt Specification

An interrupt specification is the information the system needs in order to link an interrupt handler with a given interrupt. It describes the information provided by the hardware to the system when making an interrupt request. An interrupt specification typically includes a bus-interrupt level. For vectored interrupts, it also includes an interrupt vector. On x86 platforms, the driver.conf(4) file specifies the relative priority of the device's interrupt. See isa(4), eisa(4), mca(4), sbus(4), and vme(4) for specific information on interrupt specifications for these buses.

Interrupt Number

When registering interrupts, the driver must provide the system with an interrupt number. This identifies which interrupt specification the driver is registering a handler for. Most devices have one interrupt: interrupt number zero. However, there are devi
319. ruct xxstate *xsp = ctxp->xsp;

        mutex_enter(&xsp->ctx_lock);
        if (xsp->current_ctx == ctxp)
            xsp->current_ctx = NULL;
        mutex_exit(&xsp->ctx_lock);
        kmem_free(ctxp, sizeof (struct xxctx));
    }

mapdev_dup()

    int xxmapdev_dup(ddi_mapdev_handle_t handle, void *devprivate,
        ddi_mapdev_handle_t new_handle, void **new_devprivate);

This entry point is called when a device mapping is duplicated, for example, by a user process calling fork(2). The driver is expected to generate new driver private data for the new mapping.

handle is a pointer to the mapping being duplicated.

new_handle is a pointer to the new mapping that was duplicated.

devprivate is a pointer to the driver private data associated with the mapping being duplicated.

*new_devprivate should be set to point to the new driver private data for the new mapping.

Mappings created with mapdev_dup(9E) will, by default, have their mapping translations invalidated. This will force a call to the mapdev_access(9E) entry point the first time the mapping is accessed.

Code Example 11-4 mapdev_dup(9E) routine

    static int
    xxmapdev_dup(ddi_mapdev_handle_t handle, void *devprivate,
        ddi_mapdev_handle_t new_handle, void **new_devprivate)
    {
        struct xxctx *ctxp = devprivate;
        struct xxstate *xsp = ctxp->xsp;
        struct xxctx *newctx;

        /* Create a new context for the duplicated mapping */
        newctx = kmem_alloc(sizeof (struct
320. ructure. This is the sd_address field of the device's scsi_device(9S) structure.

pktp is a pointer to the scsi_pkt(9S) structure to be initialized. If this is set to NULL, a new packet is allocated.

bp is a pointer to a buf(9S) structure. If this is non-NULL and contains a valid byte count, DMA resources are allocated.

cmdlen is the length of the SCSI Command Descriptor Block (CDB) in bytes.

statuslen is the required length of the SCSI status completion block in bytes.

privatelen is the number of bytes to allocate for the pkt_private field. To store a pointer, specify the size of the pointer here (such as sizeof (struct xxstate *) when storing a pointer to the state structure).

flags is a set of flags. Possible bits include:

PKT_CONSISTENT: This must be set if the DMA buffer was allocated using scsi_alloc_consistent_buf(9F). In this case, the host bus adapter driver guarantees that the data transfer is properly synchronized before performing the target driver's command completion callback.

PKT_DMA_PARTIAL: This may be set if the driver can accept a partial DMA mapping. If set, scsi_init_pkt(9F) allocates DMA resources with the DDI_DMA_PARTIAL dmar_flag set. The pkt_resid field of the scsi_pkt(9S) structure may be returned with a non-zero residual, indicating the number of bytes for which scsi_init_pkt(9F) was unable to allocate DMA resources.

callback specifies the
321. s are called only in a certain order (first identify(9E), then probe(9E), then attach(9E), for example), these drivers will break in some future release.

Instance Numbers

In SunOS 4.x, drivers used to count the number of devices that they found and assign a unit number to each (in the range 0 to the number of units found, less one). Now these are called instance numbers and are assigned to devices by the system. Instances can be thought of as a shorthand name for a particular instance of a device (foo0 could name instance 0 of device foo). They are assigned and remembered by the system, even after any number of reboots. This is because at open(2) time all the system has is a dev_t. To determine which device is needed (since it may need to be attached), the system needs to get the instance number, which the driver retrieves from the minor number. The mapping between instance numbers and minor numbers (see getinfo(9E)) should be static. The driver should not require any state information to do the translation, since that information may not be available (the device may not be attached).

/devices

All devices in the system are represented by a data structure in the kernel called the device tree. The /devices hierarchy is a representation of this tree in the file system. In SunOS 4.x, special device files were created using mknod, run either by an installation script or by the administrator
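One way to keep the mapping between instance numbers and minor numbers static is to make it pure arithmetic, so that no driver state is needed to translate in either direction. The sketch below assumes a hypothetical layout in which the low three bits of the minor number select a partition and the remaining bits hold the instance. The names and the bit split are illustrative only; a real driver chooses its own encoding.

```c
#include <assert.h>

#define	XX_PART_BITS	3	/* assumed: up to 8 partitions per instance */
#define	XX_PART_MASK	((1u << XX_PART_BITS) - 1)

/* Hypothetical encoding: minor = (instance << 3) | partition. */
unsigned int
xx_minor(unsigned int instance, unsigned int part)
{
	assert(part <= XX_PART_MASK);
	return ((instance << XX_PART_BITS) | part);
}

/* Recover the instance from a minor number without any driver state. */
unsigned int
xx_instance(unsigned int minor)
{
	return (minor >> XX_PART_BITS);
}

/* Recover the partition from a minor number. */
unsigned int
xx_partition(unsigned int minor)
{
	return (minor & XX_PART_MASK);
}
```

Because the translation depends only on the minor number itself, it still works when the device instance has not been attached yet.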
322. s implies success. See the manual pages in Section 9 of the man Pages(9E): DDI and DKI Driver Entry Points to learn about entry point return semantics.

    unsigned long ptob(unsigned long numpages);

ptob(9F) converts a size expressed in terms of the main system MMU page size to a size expressed in bytes.

Sample Driver Source Code Listings (Appendix D)

This chapter lists all the sample driver source code available on the DDK. Sample driver names and driver descriptions are provided. Sample drivers are located in the following DDK path: /opt/SUNWddk/driver_dev

Table D-1 Sample driver source code listings

    Subdirectory   Driver description
    sst            Simple SCSI target driver
    bst            Block SCSI target driver
    cgsix          Graphics device driver
    psli           Data link provider interface (DLPI) network device driver
    pio            Simple programmed I/O driver
    dma            Simple DMA character device driver
    ramdisk        Simple RAM disk pseudo-device driver
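The page-to-byte conversion that ptob(9F) performs can be pictured as a left shift by the page shift. The user-space sketch below is not the kernel routine: xx_ptob() is a hypothetical name, and it takes the page shift as an explicit argument, whereas ptob(9F) uses the system's own MMU page size.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for ptob(9F). pageshift is log2 of the page
 * size: 12 for 4096-byte pages, 13 for 8192-byte pages.
 */
unsigned long
xx_ptob(unsigned long numpages, unsigned int pageshift)
{
	assert(pageshift < sizeof (unsigned long) * 8);
	return (numpages << pageshift);
}
```

With 4096-byte pages, three pages come to xx_ptob(3, 12), or 12288 bytes. In a driver, calling ptob(9F) rather than hard-coding a shift keeps the code independent of the platform's page size.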
323.     char *sprintf(char *buf, const char *fmt, ...);

sprintf(9F) is just like the C library's sprintf(3). Use it to format a message and place it in buf.

    void vcmn_err(int level, char *format, va_list ap);

vcmn_err(9F) is a version of cmn_err(9F) that uses varargs (see the stdarg(5) manual page).

    char *vsprintf(char *buf, const char *fmt, va_list ap);

vsprintf(9F) is a version of sprintf(9F) that uses varargs (see the stdarg(5) manual page).

Process Signaling

These interfaces allow a device driver to send signals to a process in a multithread-safe manner.

    void *proc_ref(void);

proc_ref(9F) retrieves an unambiguous reference to the process of the current thread for signalling purposes.

    int proc_signal(void *pref, int sig);

proc_signal(9F) sends the signal indicated in sig to the process defined by pref that has been referenced by proc_ref(9F).

    void proc_unref(void *pref);

proc_unref(9F) unreferences the process defined by pref.

Properties

Properties are name-value pairs defined by the PROM or the kernel at boot time, by hardware configuration files, or by calls to ddi_prop_create(9F). These interfaces handle creating, modifying, retrieving, and reporting properties.

    int ddi_prop_create(dev_t dev, dev_info_t *dip, int flags,
        char *name, caddr_t valuep, int length);

ddi_prop_create(9F) creates a property of the
324. s above could look like the following:

Code Example 8-14 Using ioctl(2)

    #include <sys/types.h>
    #include "xxio.h"

    int
    main(void)
    {
        u_char status;

        /* fd is assumed to refer to the already opened device */
        /* read the device status */
        if (ioctl(fd, XX_GET_STATUS, &status) == -1) {
            error handling
        }
        printf("device status %x\n", status);
        exit(0);
    }

Drivers for Block Devices (Chapter 9)

This chapter describes the structure of block device drivers. The kernel views a block device as a set of randomly accessible logical blocks. The file system buffers the data blocks between a block device and the user space using a list of buf(9S) structures. Only block devices can support a file system. For information on writing disk drivers that support SunOS disk commands (such as format(1M)), see Appendix B, "Advanced Topics."

File I/O

A file system is a tree-structured hierarchy of directories and files. Some file systems, such as the UNIX File System (UFS), reside on block-oriented devices. File systems are created by mkfs(1M) and newfs(1M). When an application issues a read(2) or write(2) system call to an ordinary file on the UFS file system, the file system may call the device driver strategy(9E) entry point for the block device on which the file resides. The file system code may call strategy(9E) several times for a single read(2) or write(2) system call. It is the file syst
325. s been extended. The goal of the new DMA model is to abstract the platform-dependent details of DMA away from the driver. A sliding DMA window has been added for drivers that want to do DMA to large objects, and the DMA routines can be informed of device limitations (such as 24-bit addressing).

The normal sequence for DMA is similar to SunOS 4.x. Commit DMA resources using one of the ddi_dma_setup(9F) routines, retrieve the DMA address from the handle to do the DMA, then free the mapping with ddi_dma_free(9F). The new sequence is something like this:

1. ddi_dma_buf_setup(9F): allocate resources
2. ddi_dma_nextwin(9F): get the first DMA window
3. ddi_dma_nextseg(9F): get the first DMA segment
4. ddi_dma_segtocookie(9F): retrieve the address from the returned cookie
5. Program the device and start the DMA
6. Perform the transfer
   Note: If the transfer involves several segments or windows (or both), you can call ddi_dma_nextseg(9F) or ddi_dma_nextwin(9F) (or both) to move to subsequent segments and windows.
7. ddi_dma_free(9F): free the mapping when DMA is complete

Additional routines have been added to synchronize any underlying caches and buffers, and to handle IOPB memory. See Chapter 7, "DMA," for details.

Conversion Notes

In addition, in SunOS 4.x the driver had to inform the system that it might do DMA, either through the mb_driver structure or with
326. s target drivers only. See SCSI HBA Drivers for information on host bus adapter drivers.

Note: The terms "host bus adapter" or "HBA" used in this manual are equivalent to the phrase "host adapter" as defined in SCSI specifications.

Target drivers can be either character or block device drivers, depending on the device. Drivers for tape drives are usually character device drivers, while disks are handled by block device drivers. This chapter describes how to write a SCSI target driver and discusses the additional requirements that SCSA places on block and character drivers for SCSI target devices.

Reference Documents

The following reference documents provide supplemental information needed by the designers of target drivers and host bus adapter drivers.

Small Computer System Interface (SCSI) Standard, ANSI X3.131-1986. American National Standards Institute, Sales Department, 1430 Broadway, New York, NY 10018. Phone (212) 642-4900.

Small Computer System Interface 2 (SCSI-2) Standard, document X3.131-1994. Global Engineering Documents, 15 Inverness Way East, Englewood, CO 80112-5704. Phone 800-854-7179 or 303-792-2181, FAX 303-792-2192.

Basics of SCSI. ANCOT Corporation, Menlo Park, California 94025. Phone (415) 322-5322, FAX (415) 322-0455.

Also refer to the SCSI command specification for the target device, provided by the hardware vendor. For information on setting global SCSI options, see Appendi
327. s the DMA channel for an operation. This function allows access to various capabilities of the DMA engine hardware. It disables the channel prior to setup and enables the channel before returning. The DMA address and count are specified by passing ddi_dmae_prog(9F) a cookie obtained from ddi_dma_segtocookie(9F). Other DMA engine parameters are specified by the DMA engine request structure passed in through dmaereqp. The fields of that structure are documented in ddi_dmae_req(9S).

    int ddi_dmae_disable(dev_info_t *dip, int chnl);

The ddi_dmae_disable(9F) function disables the DMA channel so that it no longer responds to a device's DMA service requests.

    int ddi_dmae_enable(dev_info_t *dip, int chnl);

The ddi_dmae_enable(9F) function enables the DMA channel for operation. This may be used to re-enable the channel after a call to ddi_dmae_disable(9F). The channel is automatically enabled after successful programming by ddi_dmae_prog(9F).

    int ddi_dmae_stop(dev_info_t *dip, int chnl);

The ddi_dmae_stop(9F) function disables the channel and terminates any active operation.

    int ddi_dmae_getcnt(dev_info_t *dip, int chnl, int *countp);

The ddi_dmae_getcnt(9F) function examines the count register of the DMA channel and sets *countp to the number of bytes remaining to be transferred. The channel is assumed to be stopped.

    int ddi_dmae_1stparty(dev_info_t *dip, int chnl);

The ddi_
328. s using SCSI bus adapters of varying degrees of intelligence.

SCSA Functions

SCSA defines a number of functions, listed in Table 10-1, which manage the allocation and freeing of resources, the sensing and setting of control states, and the transport of SCSI commands.

Table 10-1 Standard SCSA Functions

    Function Name                    Category
    scsi_init_pkt(9F)                Resource management
    scsi_sync_pkt(9F)
    scsi_dmafree(9F)
    scsi_destroy_pkt(9F)
    scsi_alloc_consistent_buf(9F)
    scsi_free_consistent_buf(9F)
    scsi_transport(9F)               Command transport
    scsi_ifgetcap(9F)                Transport information and control
    scsi_ifsetcap(9F)
    scsi_abort(9F)                   Error handling
    scsi_reset(9F)
    scsi_poll(9F)                    Polled I/O
    scsi_probe(9F)                   Probe functions
    scsi_unprobe(9F)
    makecom_g0(9F)                   CDB initialization functions
    makecom_g1(9F)
    makecom_g0_s(9F)
    makecom_g5(9F)

SCSA Compatibility Functions

The functions listed in Table 10-2 are maintained for both source and binary compatibility with previous releases. However, new drivers should use the new functions listed in Table 10-1.

Table 10-2 SCSA Compatibility Functions

    Function Name                    Category
    scsi_resalloc(9F)                Resource management
    scsi_resfree(9F)
    scsi_pktalloc(9F)
    scsi_pktfree(9F)
    scsi_dmaget(9F)
    get_pktiopb(9F)
    free_pktiopb(9F)
    scsi_slave(9F)                   Probe functions
    sc
329. se every driver should be Solaris 2.4 DDI/DKI compliant. One way to determine if the driver is compliant is by inspection. The driver can be visually inspected to ensure that only kernel routines and data structures specified in sections 9F and 9S of the Solaris 2.4 Reference Manual AnswerBook are used. In addition, the Solaris 2.4 Driver Developer Kit (DDK) now includes a DDI compliance tool (DDICT) that checks device driver C source code for non-DDI/DKI compliance and issues either error or warning messages when it finds non-compliant code. SunSoft recommends that all drivers be written to pass DDICT. After the DDK has been installed, the DDICT can be found in:

    /opt/SUNWddk/driver_dev/bin/ddict

A new manual page describing DDICT is available in:

    /opt/SUNWddk/driver_dev/ddict/man/man1/ddict.1

Installation and Packaging Testing

Drivers are delivered to customers in packages. A package can be added to and removed from the system using a standard, documented mechanism (see the SunOS 5.4 Application Packaging and Installation Guide). Test that the driver has been correctly packaged to ensure that the end user will be able to add it to and remove it from a system. In testing, the package should be installed and removed from every type of media on which it will be released and on several system configurations. Packages must not make unwarranted assumptions about the directory environment of the target system. Certain valid ass
330. si_unslave(9F)

Hardware Configuration File

Since SCSI devices are not self-identifying, a hardware configuration file is required for a target driver (see driver.conf(4) and scsi(4) for details). A typical configuration file looks like this:

    name="xx" class="scsi" target=2 lun=0;

The system reads the file during autoconfiguration and uses the class property to identify the driver's possible parent. The system then attempts to attach the driver to any parent driver that is of class scsi. All host bus adapter drivers are of this class. Using the class property rather than the parent property allows the target driver to be attached to any host bus adapter driver that finds the expected device at the specified target and lun ids. The target driver is responsible for verifying this in its probe(9E) routine.

Declarations and Data Structures

Target drivers must include the header file <sys/scsi/scsi.h>. SCSI target drivers must also include this declaration:

    char _depends_on[] = "misc/scsi";

scsi_device Structure

The host bus adapter driver allocates and initializes a scsi_device(9S) structure for the target driver before either the probe(9E) or attach(9E) routine is called. This structure stores information about each SCSI logical unit, including pointers to information areas that contain both generic and device-specific information. There is one scsi_device(9S) structure for each logica
331. si_pkt(9S) structure pointed to by pkt at the SCSI address denoted by ap. To indicate the current target, pass in ap the sd_address field of the scsi_device(9S) structure for the target. To abort the current command, pass NULL for pkt.

    struct buf *scsi_alloc_consistent_buf(struct scsi_address *ap,
        struct buf *bp, int datalen, ulong bflags,
        int (*callback)(caddr_t), caddr_t arg);

scsi_alloc_consistent_buf(9F) allocates a buffer header and the associated data buffer for direct memory access (DMA) transfer. This buffer is allocated from the IOPB space, which is considered consistent memory. If bp is NULL, a new buffer header will be allocated using getrbuf(9F). If datalen is non-zero, a new buffer will be allocated using ddi_iopb_alloc(9F). If callback is not NULL_FUNC and the requested DMA resources are not immediately available, the function pointed to by callback will be called when resources may have become available. callback can call scsi_alloc_consistent_buf(9F) again. If callback is SLEEP_FUNC, scsi_alloc_consistent_buf(9F) may block waiting for resources.

    char *scsi_cname(u_char cmd, char **cmdvec);

scsi_cname(9F) searches for the command code cmd in the command vector cmdvec, and returns the command name. Each string in cmdvec starts with a one-character command code, followed by the name of the command. To use scsi_cname(9F), the driver must define a comm
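The command vector layout described for scsi_cname(9F), where each string begins with a one-character command code followed by the command name, can be illustrated with a small user-space lookup. This is a sketch of the layout only, not the kernel routine: xx_cname() and xx_cmds are hypothetical names, the vector is assumed to be NULL-terminated, and returning NULL for an unknown code is an assumption made here for simplicity, so the real routine may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative command vector. The first character of each string is
 * the SCSI command code, written as an octal escape (0x00, 0x08, 0x0a).
 */
const char *xx_cmds[] = {
	"\000test unit ready",
	"\010read",
	"\012write",
	NULL
};

/* Hypothetical lookup in the style described for scsi_cname(9F). */
const char *
xx_cname(unsigned char cmd, const char **cmdvec)
{
	assert(cmdvec != NULL);
	while (*cmdvec != NULL) {
		if ((unsigned char)(*cmdvec)[0] == cmd)
			return (*cmdvec + 1);	/* skip the code byte */
		cmdvec++;
	}
	return (NULL);	/* assumption: unknown codes yield NULL here */
}
```

Note that the entry for command code 0 starts with a NUL byte, so the lookup has to compare the first character directly rather than treat each entry as an ordinary C string.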
332. sk information to or from the device driver. When data is copied out of the driver to the user, ddi_copyout(9F) should be used to copy the information into the user's address space. When data is copied to the disk from the user, ddi_copyin(9F) should be used to copy data into the kernel's address space. Table B-1 lists the mandatory Sun disk I/O controls.

Table B-1 Mandatory Sun Disk I/O Controls

    I/O Control    Description
    DKIOCINFO      Return information describing the disk controller
    DKIOCGAPART    Return a disk's partition map
    DKIOCSAPART    Set a disk's partition map
    DKIOCGGEOM     Return a disk's geometry
    DKIOCSGEOM     Set a disk's geometry
    DKIOCGVTOC     Return a disk's Volume Table of Contents
    DKIOCSVTOC     Set a disk's Volume Table of Contents

Sun disks may also support a number of optional ioctls listed in the hdio(7) manual page. Table B-2 lists optional Sun disk ioctls.

Table B-2 Optional Sun Disk Ioctls

    I/O Control    Description
    HDKIOCGTYPE    Return the disk's type
    HDKIOCSTYPE    Set the disk's type
    HDKIOCGBAD     Return the bad sector map of the device
    HDKIOCSBAD     Set the bad sector map for the device
    HDKIOCGDIAG    Return the diagnostic information regarding the most recent command

Disk Performance

The Solaris 2.x DDI/DKI provides facilities to optimize I/O transfers for improved file system performan
333. sp->mu);
        return (0);
    }

Data Transfers

strategy()

    int xxstrategy(struct buf *bp);

The strategy(9E) entry point is used to read and write data buffers to and from a block device. The name strategy comes from the fact that this entry point may implement some optimal strategy for ordering requests to the device. strategy(9E) can be written to process one request at a time (synchronous transfer), or to queue multiple requests to the device (asynchronous transfer). When choosing a method, the abilities and limitations of the device should be taken into account.

The strategy(9E) routine is passed a pointer to a buf(9S) structure. This structure describes the transfer request and contains status information on return. buf(9S) and strategy(9E) are the focus of block device operations.

The buf Structure

Below is a list of buf structure members that are important to block drivers:

    int             b_flags;     /* Buffer status */
    struct buf      *av_forw;    /* Driver work list link */
    struct buf      *av_back;    /* Driver work list link */
    unsigned int    b_bcount;    /* # of bytes to transfer */
    union {
        caddr_t     b_addr;      /* Buffer's virtual address */
    } b_un;
    daddr_t         b_blkno;     /* Block number on device */
    diskaddr_t      b_lblkno;    /* Expanded block number on device */
    unsigned int    b_resid;     /* # of bytes not transferred after error */
    int             b_error;     /* Expanded error field */
    void            *b_private;  /* "opaque" driver private are
334. ss. All addresses accessed directly by the driver are kernel virtual addresses; they refer to the kernel address space.

Address Spaces

An address space is a set of virtual address segments, each of which is a contiguous range of virtual addresses. Each user process has an address space called the user address space. The kernel has its own address space called the kernel address space.

Special Files

In UNIX, devices are treated as files. They are represented in the file system by special files. These files are advertised by the device driver and maintained by the drvconfig(1M) program. Special files commonly reside in the /devices directory hierarchy. Special files may be of type block or character. The type indicates which kind of device driver operates the device.

Associated with each special file is a device number. This consists of a major number and a minor number. The major number identifies the device driver associated with the special file. The minor number is created and used by the device driver to further identify the special file. Usually, the minor number is an encoding that identifies the device the driver should access and the type of access to perform. The minor number, for example, could identify a tape device requiring backup and also specify whether the tape needs to be rewound when the backup operation completes.

Dynamic Loading of Kernel Modules

Kernel modules are
335. ster mapping and memory mapping
Accessing kernel or user process space from the device (DMA services)
Managing device properties

Device Tree

Architectural independence is achieved in the Solaris 2.x DDI/DKI through a layered approach, implemented as a tree structure. Each node in the tree structure is described by a device-information structure. Standard device drivers and their devices are associated with leaf nodes. These drivers are called leaf drivers. Bus drivers are associated with bus nexus nodes and are called bus nexus drivers. This book documents writing leaf drivers only. Figure 1-1 illustrates possible device tree configurations.

[Figure 1-1 Possible device tree configurations. The figure shows two trees. In the first, the root node has an sbus bus nexus node and an onboard uart leaf node; below the sbus is a vmebus adapter bus nexus node with an xyz device leaf node. In the second, the root node has a vme bus nexus node with an xyz device leaf node, and an onboard uart leaf node.]

The topmost node in the device tree is called the root node. The tree structure creates a parent-child relationship between nodes. This parent-child relationship is the key to architectural independence. When a leaf or bus nexus driver requires a service that is architecturally dependent in nature, it requests its parent to provide the service. The intermediate nodes
336. supported\n");
            return (DDI_FAILURE);
        }
        /*
         * The interrupt routine will grab the mutex, so a
         * null handler is required.
         */
        if (ddi_add_intr(dip, inumber, &xsp->iblock_cookie, NULL,
            (u_int (*)(caddr_t))nulldev, NULL) != DDI_SUCCESS) {
            cmn_err(CE_WARN, "xx: cannot add interrupt handler.");
            return (DDI_FAILURE);
        }
        mutex_init(&xsp->mu, "xx mutex", MUTEX_DRIVER,
            (void *)xsp->iblock_cookie);
        ddi_remove_intr(dip, inumber, xsp->iblock_cookie);
        if (ddi_add_intr(dip, inumber, &xsp->iblock_cookie,
            &xsp->idevice_cookie, xxintr, (caddr_t)xsp) != DDI_SUCCESS) {
            cmn_err(CE_WARN, "xx: cannot add interrupt handler.");
            goto failed;
        }
        cv_init(&xsp->cv, "xx cv", CV_DRIVER, NULL);
        return (DDI_SUCCESS);
    failed:
        /* remove interrupt handler if necessary, destroy mutex */
        return (DDI_FAILURE);
    }

Responsibilities of an Interrupt Handler

The interrupt handler has a set of responsibilities to perform. Some are required by the framework, and some are required by the device. All interrupt handlers are required to do the following:

1. Possibly reject the interrupt. The interrupt handler must first examine the device and determine if it has issued the interrupt. If it has not, the handler must return DDI_INTR_UNCLAIMED. This step allows the implementation of device polling: it tells the system whether this device, among
337. synchronization and protection mechanisms beyond those provided by the locking primitives described in "Locking Primitives" on page 74, some devices require that a sequence of events happen in order without interruption. In conjunction with the locking primitives, the function ddi_enter_critical(9F) asks the system to guarantee, to the best of its ability, that the current thread will neither be preempted nor interrupted. This stays in effect until a closing call to ddi_exit_critical(9F) is made. See ddi_enter_critical(9F) for details.

Delays

Many chips specify that they can be accessed only at specified intervals. For example, the Zilog Z8530 SCC has a "write recovery time" of 1.6 microseconds. This means that a delay must be enforced with drv_usecwait(9F) when writing characters with an 8530. In some instances, it is unclear what delays are needed; in such cases, they must be determined empirically.

Internal Sequencing Logic

Devices with internal sequencing logic map multiple internal registers to the same external address. There are various kinds of internal sequencing logic:

The Intel 8251A and the Signetics 2651 alternate the same external register between two internal mode registers. Writing to the first internal register is accomplished by writing to the external register. This write, however, has the side effect of setting up the sequencing logic in the chip so that the nex
338. t arg
            instance = getminor(dev);
            xsp = ddi_get_soft_state(statep, instance);
            if (xsp == NULL)
                return (DDI_FAILURE);
            *result = (void *)xsp->dip;
            error = DDI_SUCCESS;
            break;
        default:
            error = DDI_FAILURE;
            break;
        }
        return (error);
    }

Interrupt Handlers (Chapter 6)

This chapter describes the interrupt handling mechanisms of the Solaris 2.x DDI/DKI. This includes registering, servicing, and removing interrupts.

Overview

An interrupt is a hardware signal from a device to the CPU. It tells the CPU that the device needs attention and that the CPU should drop whatever it is doing and respond to the device. If the CPU is available (it is not doing something of higher priority, such as servicing a higher-priority interrupt), it suspends the current thread and eventually invokes the interrupt handler for that device. The job of the interrupt handler is to service the device and stop it from interrupting. Once the handler returns, the CPU resumes whatever it was doing before the interrupt occurred.

The Solaris 2.x DDI/DKI provides a bus-architecture independent interface for registering and servicing interrupts. Drivers must register their device interrupts before they can receive and service interrupts.

Example

On x86 platforms, a device requests an interrupt by asserting an interrupt request line (IRQ) on the system bus. The bus implements multiple IRQ lines, and a pa
339. t read/write operation refers to the second internal register.

The NEC PD7201 PCC has multiple internal data registers. To write a byte into a particular register, two steps must be performed. The first step is to write into register zero the number of the register into which the following byte of data will go. The data is then written to the specified data register. The sequencing logic automatically sets up the chip so that the next byte sent will go into data register zero.

The AMD 9513 timer has a data pointer register that points at the data register into which a data byte will go. When sending a byte to the data register, the pointer is incremented. The current value of the pointer register cannot be read.

Interrupt Issues

The following are some common interrupt-related issues:

A controller interrupt does not necessarily indicate that both the controller and one of its slave devices are ready. For some controllers, an interrupt may indicate that either the controller is ready or one of its devices is ready, but not both.

Not all devices power up with interrupts disabled and then start interrupting only when told to do so.

Some devices do not provide a way to determine that the board has generated an interrupt.

Not all interrupting boards shut off interrupts when told to do so or after a bus reset.

Byte Ordering

Peripheral devices can contain chips that use a byte ordering convention diffe
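The two-step register write described for chips with a register-select scheme can be modeled in user space. The sketch below simulates a chip with a single external address and a small bank of internal registers. It is a simplified model with hypothetical names and an assumed eight-register bank, not a description of any real part's register map.

```c
#include <assert.h>

#define	XX_NREGS	8	/* assumed number of internal registers */

/* Simulated chip: one external address, several internal registers. */
struct xx_chip {
	unsigned char regs[XX_NREGS];
	unsigned int next;	/* internal register the next write lands in */
};

/* Model one write to the chip's single external address. */
void
xx_chip_write(struct xx_chip *c, unsigned char byte)
{
	if (c->next == 0) {
		/* A write to register zero selects the next target. */
		c->regs[0] = byte;
		c->next = byte & (XX_NREGS - 1);
	} else {
		c->regs[c->next] = byte;
		c->next = 0;	/* sequencing logic falls back to register zero */
	}
}

/* Hypothetical driver helper: select a register, then write the data. */
void
xx_write_reg(struct xx_chip *c, unsigned int reg, unsigned char val)
{
	assert(reg > 0 && reg < XX_NREGS && c->next == 0);
	xx_chip_write(c, (unsigned char)reg);
	xx_chip_write(c, val);
}
```

In a real driver the two writes would have to happen without another thread or the interrupt handler touching the chip in between, since an interleaved write would land in the wrong register.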
340. t a message to that effect and return DDI_FAILURE. Code Example 6-1 on page 110 does this.

Add the interrupt.

Initialize any associated mutexes.

There is a potential race condition between adding the interrupt handler and initializing mutexes. The interrupt routine is eligible to be called as soon as ddi_add_intr(9F) returns, as another device might interrupt and cause the handler to be invoked. This may result in the interrupt routine being called before any mutexes have been initialized with the returned interrupt block cookie. If the interrupt routine acquires the mutex before it has been initialized, undefined behavior may result.

The solution to this problem is to use ddi_add_intr(9F) to add an interrupt handler that never claims the interrupt. This allows the driver to get the interrupt block cookie for the interrupt, which it can then use to initialize any mutexes. Once the mutexes are initialized, the temporary interrupt handler can be removed and the real one installed. nulldev(9F) can be used as the temporary handler, though it needs to be cast properly. See Code Example 6-1 for an example.

Code Example 6-1 attach(9E) routine with temporary interrupt handler

    static int
    xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
        struct xxstate *xsp;

        if (cmd != DDI_ATTACH)
            return (DDI_FAILURE);
        if (ddi_intr_hilevel(dip, inumber) != 0) {
            cmn_err(CE_CONT, "xx: high-level interrupts are not
341. tach(9E) always fails, and the driver will not be unloaded. This is the simplest way to specify that a driver is not unloadable.

Code Example 5-5 detach(9E) routine

    static int
    xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd)
    {
        struct xxstate *xsp;
        int instance;

        switch (cmd) {
        case DDI_DETACH:
            instance = ddi_get_instance(dip);
            xsp = ddi_get_soft_state(statep, instance);
            /* make device quiescent; device-specific */
            ddi_remove_minor_node(dip, NULL);
            ddi_unmap_regs(dip, rnumber, (caddr_t *)&xsp->regp,
                offset, sizeof (struct device_reg));
            ddi_remove_intr(dip, inumber, xsp->iblock_cookie);
            mutex_destroy(&xsp->mu);
            cv_destroy(&xsp->cv);
            ddi_soft_state_free(statep, instance);
            return (DDI_SUCCESS);
        default:
            return (DDI_FAILURE);
        }
    }

In the call to ddi_unmap_regs(9F), rnumber and offset are the same values passed to ddi_map_regs(9F) in attach(9E). Similarly, in the call to ddi_remove_intr(9F), inumber is the same value that was passed to ddi_add_intr(9F).

Callbacks

The detach(9E) routine must not return DDI_SUCCESS while it has callback functions pending. This is only critical for callbacks registered for device instances that are not currently open, since the DDI_DETACH case is not entered if the device is open. There are two types of callback routines of interest: callbacks that can be cancelled, and callbacks that must run t
342. t cookies, Block interrupt cookies.

Device Interrupt Cookies

Defined as type ddi_idevice_cookie_t, this cookie is a data structure containing information used by a driver to program the interrupt request level (or the equivalent) for a programmable device. See ddi_add_intr(9F) and Registering Interrupts on page 99 for more information.

Interrupt Block Cookies

Defined as type ddi_iblock_cookie_t, this cookie is used by a driver to initialize the mutual exclusion locks it uses to protect data. This cookie should not be interpreted by the driver in any way.

There are four contexts in which driver code executes: user, kernel, interrupt, and high-level interrupt.

Printing Messages

The following sections point out the context in which driver code can execute. The driver context determines which kernel routines the driver is permitted to call. For example, in kernel context the driver must not call copyin(9F). The manual pages in section 9F document the allowable contexts for each function.

User Context

A driver entry point has user context if it was directly invoked because of a user thread. The read(9E) entry point of the driver, invoked by a read(2) system call, has user context.

Kernel Context

A driver function has kernel context if it was invoked by some other part of the kernel. In a block device driver, the strategy(9E) entry point may be called by the pageout daemon to write pages
343. t-device ttya
    ok setenv output-device ttya

On x86 platforms, the test machine needs to set console 1 in /etc/system. This causes a switch to COM1 during reboot.

Preparing for the Worst

It is possible that the driver will render the system unbootable; this is most likely if the driver is for the boot device. If a complete system reinstallation is to be avoided, some advance work must be done to prepare for this possibility.

Boot Off a Backup Root Partition

One way to deal with this is to have another bootable root file system. Use format(1M) to make a partition the exact size of the original, then, from SunOS, use dd(1M) to copy it. Do this from single-user mode so that there is as little file system activity as possible, and run fsck(1M) on the new file system to ensure its integrity.

Later, if the system cannot boot from the original root partition, boot the backup partition and use dd(1M) to copy the backup partition onto the original one. If the system will not boot but the root file system is undamaged (just the boot block or boot program was destroyed), boot off the backup partition with the ask (-a) option, then specify the original file system as the root file system.

Boot Off the Network

If the system is attached to a network, the test machine can be added as a client of a server. If a problem occurs, the system can be booted off the network. The local disks can then be mounted and fixed
344. t the assigned instance number.

new_kmem_alloc() and new_kmem_zalloc() have become kmem_alloc(9F) and kmem_zalloc(9F). In SunOS 4.x, the sleep flags were KMEM_SLEEP and KMEM_NOSLEEP; now they are KM_SLEEP and KM_NOSLEEP. Consider using KM_SLEEP only on small requests, as larger requests could deadlock the driver if there is not (or there will not be) enough memory. Instead, use KM_NOSLEEP, possibly shrink the request, and try again.

Any required memory should be dynamically allocated, as the driver should handle all occurrences of its device rather than a fixed number of them, if possible. Instead of statically allocating an array of controller state structures, each should now be allocated dynamically.

Remember to call ddi_create_minor_node(9F) for each minor device name that should be visible to applications.

The module loading process turns the information in any driver.conf(4) file into properties. Information that used to be passed in the config file (such as flags) should now be passed as properties.

getinfo

SunOS 5.x:

    int xxgetinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg, void **resultp);

Make sure that the translation from minor number to instance number, and the reverse translation, is static, since getinfo(9E) may be called when the device is not attached. For example:

    #define XXINST(dev) (getminor(dev) >> 3)
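The static minor-number translation described above can be sketched in user space. This is a minimal illustration, not the manual's code: it assumes the low three bits of the minor number select a sub-device (as the shift by 3 in XXINST suggests), uses plain unsigned integers in place of dev_t and getminor(9F), and the names XX_INST, XX_SUBDEV, and xx_make_minor are hypothetical.

```c
#include <assert.h>

/* Hypothetical layout: low 3 bits pick a sub-device, the rest the instance. */
#define XX_SUBDEV_BITS	3u
#define XX_SUBDEV_MASK	((1u << XX_SUBDEV_BITS) - 1u)

/* Instance from a minor number; mirrors XXINST(dev) = getminor(dev) >> 3. */
#define XX_INST(minor)		((minor) >> XX_SUBDEV_BITS)
/* Sub-device from a minor number (the bits the shift discards). */
#define XX_SUBDEV(minor)	((minor) & XX_SUBDEV_MASK)

/* Reverse translation: build a minor number from instance and sub-device. */
unsigned int
xx_make_minor(unsigned int instance, unsigned int subdev)
{
	assert(subdev <= XX_SUBDEV_MASK);
	return ((instance << XX_SUBDEV_BITS) | subdev);
}
```

Because both directions are pure arithmetic on the minor number, they need no per-instance state and so remain valid even when no instance is attached.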
345. t was returned by ddi_add_intr(9F) when the interrupt handler was set up. Device interrupts must be disabled before calling ddi_remove_intr(9F). Always call ddi_remove_intr(9F) in the detach(9E) entry point before returning successfully, if any interrupt handlers were added.

    int ddi_add_softintr(dev_info_t *dip, int preference,
        ddi_softintr_t *idp, ddi_iblock_cookie_t *ibcp,
        ddi_idevice_cookie_t *idcp,
        u_int (*int_handler)(caddr_t), caddr_t int_handler_arg);

ddi_add_softintr(9F) tells the system to call the function pointed to by int_handler when a certain software interrupt is triggered. ddi_add_softintr(9F) returns a software interrupt ID in the location pointed to by idp. This ID is later used by ddi_trigger_softintr(9F) to trigger the software interrupt.

    void ddi_trigger_softintr(ddi_softintr_t id);

ddi_trigger_softintr(9F) triggers the software interrupt identified by id. The interrupt handling function that was set up for this software interrupt by ddi_add_softintr(9F) is then called.

    void ddi_remove_softintr(ddi_softintr_t id);

ddi_remove_softintr(9F) tells the system to stop calling the software interrupt handler for the software interrupt identified by id. If the driver has soft interrupts registered, it must call ddi_remove_softintr(9F) in the detach(9E) entry point before returning successfully.

    int ddi_dev_nintrs(d
346. tances. Once an instance number has been assigned to a particular physical device by the system, it stays the same, even across reconfiguration and reboot. Because of this, instance numbers seen by a driver may not appear to be in consecutive order.

For non-self-identifying devices (see Device Identification on page 14), this entry point should determine whether the hardware device is present on the system and return:

DDI_PROBE_SUCCESS if the probe was successful
DDI_PROBE_FAILURE if the probe failed
DDI_PROBE_DONTCARE if the probe was unsuccessful, yet attach(9E) should still be called
DDI_PROBE_PARTIAL if the instance is not present now, but may be present in the future

For a given device instance, attach(9E) will not be called before probe(9E) has succeeded at least once on that device.

It is important that probe(9E) free all the resources it allocates, because it may be called multiple times; however, attach(9E) will not necessarily be called even if probe(9E) succeeds.

For probe(9E) to determine whether the instance of the device is present, it may need to do many of the things also commonly done by attach(9E). In particular, it may need to map the device registers. Code Example 5-3 is an example of probe(9E).

Code Example 5-3 probe(9E) routine

    static int
    xxprobe(dev_info_t *dip)
    {
        int instance;
        volatile caddr_t reg_addr;

        if (ddi_dev_is_sid(dip) == DDI_SUCCESS) /* no need to pro
347. te
Get device driver information
Dump memory to the device during system failure
Gain access to a device
Relinquish access to a device
Manage arbitrary driver properties
Print error message on driver failure
I/O interface for block data

Note: Some of the above entry points may be replaced by nodev(9F) or nulldev(9F) as appropriate.

Autoconfiguration

attach(9E) should perform the common initialization tasks for each instance of a device. Typically, these tasks include:

Allocating per-instance state structures
Mapping the device's registers
Registering device interrupts
Initializing mutex and condition variables
Creating minor nodes

Block device drivers create minor nodes of type S_IFBLK. This causes a block special file representing the node to eventually appear in the /devices hierarchy.

Logical device names for block devices appear in the /dev/dsk directory, and consist of a controller number, bus-address number, disk number, and slice number. These names are created by the disks(1M) program if the node type is set to DDI_NT_BLOCK or DDI_NT_BLOCK_CHAN. DDI_NT_BLOCK_CHAN should be specified if the device communicates on a channel (a bus with an additional level of addressability), such as SCSI disks, and causes a bus-address field (tN) to appear in the logical name. DDI_NT_BLOCK should be used for most other devices.

For each minor device which co
348. tex_exit(&xsp->ctx_lock); return (1); } xsp->current_ctx = ctxp; /* Disable access callback for handle and return */ error = ddi_mapdev_nointercept(handle, offset, 0); if (error) xsp->current_ctx = NULL; mutex_exit(&xsp->ctx_lock); return (error); }

mapdev_free

    void xxmapdev_free(ddi_mapdev_handle_t handle, void *devprivate);

This entry point is called when a mapping is unmapped. This can be caused by a user process exiting or calling the munmap(2) system call. Partial unmappings are not supported and will cause the munmap(2) system call to fail with EINVAL.

handle is the handle of the mapping being freed. devprivate is a pointer to the driver private data associated with the mapping.

The mapdev_free(9E) routine is expected to free any driver private resources that were allocated when this mapping was created, either by ddi_mapdev(9F) or by mapdev_dup(9E).

There is no need to call ddi_mapdev_intercept(9F) on the handle of the mapping being freed, even if it is the mapping with the valid translations. However, to prevent future problems in mapdev_access(9E), the device driver should make sure that its representation of the current mapping is set to "no current mapping".

Code Example 11-3 mapdev_free(9E) routine

    static void
    xxmapdev_free(ddi_mapdev_handle_t handle, void *devprivate)
    {
        struct xxctx *ctxp = devprivate;
        st
349. the device driver should be allocated in _init(9E) before calling mod_install(9F), and should be released in _fini(9E) after calling mod_remove(9F). These routines have kernel context.

Note: Drivers must use these names, and they must not be declared static, unlike the other entry points where the names and storage classes are up to the driver.

Autoconfiguration Routines

    static int xxidentify(dev_info_t *dip);
    static int xxprobe(dev_info_t *dip);
    static int xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd);
    static int xxdetach(dev_info_t *dip, ddi_detach_cmd_t cmd);
    static int xxgetinfo(dev_info_t *dip, ddi_info_cmd_t infocmd,
        void *arg, void **result);

The driver is single-threaded on a per-device basis when the kernel calls these routines, with the exception of getinfo(9E). The kernel may be in a multithreaded state when calling getinfo(9E), which can occur at any time. No calls to attach(9E) will occur on the same device concurrently. However, calls to attach(9E) on different devices that the driver handles may occur concurrently.

Any per-device resources should be allocated in attach(9E) and released in detach(9E). No resources global to the driver should be allocated in attach(9E). These routines have kernel context.

Block Driver Entry Points

    int xxopen(dev_t *devp, int flag, int otyp, cred_t *credp);
    int xxclose(dev_t dev, int flag, int otyp, cred_t *credp);
    int
350. the device is no longer busy. See Chapter 4, Multithreading, for details on condition variables. When the device is no longer busy, the strategy(9E) routine marks it as busy and prepares the buffer and the device for the transfer.

Set up the buffer for DMA

Prepare the data buffer for a DMA transfer using ddi_dma_buf_setup(9F). See Chapter 7, DMA, for information on setting up DMA resources and related data structures.

Begin the Transfer

At this point, a pointer to the buf(9S) structure is saved in the state structure of the device. This is so that the interrupt routine can complete the transfer by calling biodone(9F).

The device driver then accesses device registers to initiate a data transfer. In most cases, the driver should protect the device registers from other threads by using mutexes. In this case, because strategy(9E) is single-threaded, guarding the device registers is not necessary. See Chapter 4, Multithreading, for details about data locks.

Once the executing thread has started the device's DMA engine, the driver can return execution control to the calling routine.

Code Example 9-4 Synchronous block driver strategy(9E) routine

    static int
    xxstrategy(struct buf *bp)
    {
        struct xxstate *xsp;
        struct device_reg *regp;
        u_char temp;
        int instance;
        ddi_dma_cookie_t cookie;

        instance = getminor(bp->b_edev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL) {
            bioerror(bp, ENXIO);
351. the driver also needs a virtual address to access any device registers. To get a virtual address, the driver must map the device registers into kernel virtual memory. The Solaris 2.x DDI/DKI provides this ability with the ddi_map_regs(9F) function:

    volatile char *reg_addr;
    ddi_map_regs(..., (caddr_t *)&reg_addr, ...);

Once the registers are successfully mapped, they can be accessed as any other memory. The following example writes one byte to the first location mapped (hexadecimal notation is usually used when writing bits):

    *reg_addr = 0x10;

I/O Port Access

In I/O port access, the device registers appear in I/O address space. Each addressable element of the I/O address space is called an I/O port. Device registers are accessed through I/O port numbers, which are defined by the hardware. These port numbers can refer to 8-, 16-, or 32-bit registers. Reading from a port is accomplished with one of the inb(9F) family of routines. Writing to a port is performed with an outb(9F) routine.

On x86 systems, device registers are typically accessed through I/O ports. Large buffers, on the other hand, are accessed using memory mapping.

Example Device Registers

Most of the examples in this manual use a fictitious device that has an 8-bit command/status register (csr) followed by an 8-bit data register. The command/status register is so called because writes to it go to an internal command registe
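The memory-mapped access style described above can be sketched in user space. This is only an illustration under stated assumptions: ordinary memory stands in for the registers that ddi_map_regs(9F) would map, the bit value XX_START_TRANSFER is invented for the example, and the helper names xx_send_byte and xx_status are hypothetical. On real hardware a read of the csr returns device status rather than the last command written.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the fictitious device: an 8-bit csr followed by 8-bit data. */
struct device_reg {
	volatile uint8_t csr;
	volatile uint8_t data;
};

/* Assumed command bit, for illustration only. */
#define XX_START_TRANSFER	0x01u

/* Store one data byte, then issue the command through the csr. */
void
xx_send_byte(struct device_reg *regp, uint8_t value)
{
	regp->data = value;
	regp->csr = XX_START_TRANSFER;
}

/* Read the csr back through the same mapped pointer. */
uint8_t
xx_status(const struct device_reg *regp)
{
	return (regp->csr);
}
```

The volatile qualifier is the important design point: it keeps the compiler from caching or reordering away loads and stores that the device is expected to observe.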
352. the mutex before accessing the data that the data are safe.

    void mutex_exit(kmutex_t *mp);

mutex_exit(9F) releases the mutual exclusion lock pointed to by mp.

    void mutex_destroy(kmutex_t *mp);

mutex_destroy(9F) releases the resources associated with the mutual exclusion lock pointed to by mp.

    int mutex_owned(kmutex_t *mp);

mutex_owned(9F) returns non-zero if the mutual exclusion lock pointed to by mp is currently held; otherwise it returns zero. Use mutex_owned(9F) only in an expression used in ASSERT(9F).

    int mutex_tryenter(kmutex_t *mp);

mutex_tryenter(9F) is similar to mutex_enter(9F), but it does not block waiting for the mutex to become available. If the mutex is held by another thread, mutex_tryenter(9F) returns zero. Otherwise, mutex_tryenter(9F) acquires the mutex and returns non-zero.

    void rw_destroy(krwlock_t *rwlp);

rw_destroy(9F) releases the resources associated with the readers/writer lock pointed to by rwlp.

    void rw_downgrade(krwlock_t *rwlp);

If the calling thread holds the lock pointed to by rwlp for writing, rw_downgrade(9F) releases the lock for writing, but retains the lock for reading. This allows other readers to acquire the lock unless a thread is waiting to acquire the lock for writing.

    void rw_enter(krwlock_t *rwlp, krw_t enter_type);

If enter_type is RW_READER, rw_enter(9F) acquires the lock pointed to by rwl
353. the offset into the kernel buffer, the data transfer count (minimum of the requested and the remaining data), the UIO_READ flag, and a pointer to the uio structure */
    return (uiomove(rsp->ram + uiop->uio_offset,
        min(uiop->uio_resid, rsp->ramsize - uiop->uio_offset),
        UIO_READ, uiop));

uwritec and ureadc

Another example might be a driver writing data directly to the device's memory, which must be performed one byte at a time. Each byte is retrieved from the uio(9S) structure using uwritec(9F), then sent to the device. read(9E) can use ureadc(9F) to transfer a byte from the device to the area described by the uio(9S) structure.

Code Example 8-4 Programmed I/O write(9E) routine using uwritec(9F)

    static int
    xxwrite(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        int instance;
        int value;
        struct xxstate *xsp;
        struct device_reg *regp;

        instance = getminor(dev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (ENXIO);
        regp = xsp->regp;
        while (uiop->uio_resid > 0) {
            /* do the PIO access */
            value = uwritec(uiop);
            if (value == -1)
                return (EFAULT);
            regp->data = (u_char)value;
            regp->csr = START_TRANSFER;
            /*
             * this device requires a ten microsecond delay
             * between writes
             */
            drv_usecwait(10);
        }
        return (0);
    }

DMA Transfers

Many characte
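The transfer-count arithmetic in the uiomove(9F) call above (the minimum of the requested bytes and the bytes remaining in the RAM buffer) can be sketched in user space. This is an assumption-laden stand-in, not the kernel routine: memcpy replaces uiomove(9F), plain sizes replace the uio(9S) fields, and the names xx_min and xx_ram_read are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Smaller of two sizes, in the spirit of the kernel's min(9F). */
static size_t
xx_min(size_t a, size_t b)
{
	return (a < b ? a : b);
}

/*
 * Copy up to resid bytes from a RAM buffer of ramsize bytes, starting
 * at offset, into dst. Returns the number of bytes actually copied,
 * which is zero when offset is at or past the end of the buffer.
 */
size_t
xx_ram_read(const char *ram, size_t ramsize, size_t offset,
    char *dst, size_t resid)
{
	size_t count;

	if (offset >= ramsize)
		return (0);
	count = xx_min(resid, ramsize - offset);
	memcpy(dst, ram + offset, count);
	return (count);
}
```

Clamping the count this way means a read that starts near the end of the buffer returns a short transfer instead of running past the allocated memory.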
354. though that portion is not shown), so it must take a caddr_t as an argument and return an int. See Handling Resource Allocation Failures on page 135 for more information about DMA callback routines.

Code Example 9-7 Block driver start routine

    static int
    xxstart(caddr_t arg)
    {
        struct xxstate *xsp = (struct xxstate *)arg;
        struct device_reg *regp;
        struct buf *bp;
        u_char temp;

        /*
         * start should never be called with the mutex held;
         * just in case, though
         */
        ASSERT(!mutex_owned(&xsp->mu));
        mutex_enter(&xsp->mu);
        /*
         * If there is nothing more to do, or the device is busy, return.
         */
        if (xsp->list_head == NULL || xsp->busy) {
            mutex_exit(&xsp->mu);
            return (0);
        }
        xsp->busy = 1;
        /* Get the first buffer off the transfer list */
        bp = xsp->list_head;
        /* Update the head and tail pointer */
        xsp->list_head = xsp->list_head->av_forw;
        if (xsp->list_head == NULL)
            xsp->list_tail = NULL;
        bp->av_forw = NULL;
        mutex_exit(&xsp->mu);
        /* set up DMA resources with ddi_dma_buf_setup(9F) */
        xsp->bp = bp;
        /* Set up device DMA engine from the cookie */
        regp = xsp->regp;
        regp->dma_addr = cookie.dmac_address;
        regp->dma_size = cookie.dmac_size;
        regp->csr = ENABLE_INTERRUPTS | START_TRANSFER;
        /* Read the csr to flush any hardware store buffers */
        temp = regp->csr;
        return (0);
    }

4. Handle the
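The list handling inside the start routine above (take the head buffer, update head and tail, mark the device busy) can be sketched in user space. This is a simplified model under stated assumptions: struct xbuf and struct xqueue are hypothetical stand-ins for buf(9S) and the driver state, the function names are invented, and the mutex that the real routine holds around these updates is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for buf(9S): only the forward link is modeled. */
struct xbuf {
	struct xbuf *av_forw;
	int id;
};

/* Stand-in for the driver state: transfer list plus busy flag. */
struct xqueue {
	struct xbuf *list_head;
	struct xbuf *list_tail;
	int busy;
};

/* Append a request at the tail of the transfer list. */
void
xq_enqueue(struct xqueue *q, struct xbuf *bp)
{
	bp->av_forw = NULL;
	if (q->list_head == NULL)
		q->list_head = bp;
	else
		q->list_tail->av_forw = bp;
	q->list_tail = bp;
}

/*
 * Take the first request off the list and mark the device busy.
 * Returns NULL if the list is empty or the device is already busy.
 */
struct xbuf *
xq_start(struct xqueue *q)
{
	struct xbuf *bp;

	if (q->list_head == NULL || q->busy)
		return (NULL);
	q->busy = 1;
	bp = q->list_head;
	q->list_head = bp->av_forw;
	if (q->list_head == NULL)
		q->list_tail = NULL;
	bp->av_forw = NULL;
	return (bp);
}
```

In a driver, the caller would clear busy when the transfer completes and call the start routine again to pick up the next queued request.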
355. tive transfer size the device can perform. It also influences alignment and padding restrictions.

dlim_cntr_max is the upper bound of the DMA engine's address register. This is often used where the upper 8 bits of an address register are a latch containing a segment number, and the lower 24 bits are used to address a segment. In this case, dlim_cntr_max would be set to 0x00FFFFFF; this prevents the system from crossing a 24-bit segment boundary when establishing mappings to the object.

dlim_burstsizes specifies the burst sizes that the device supports. A burst size is the amount of data the device can transfer before relinquishing the bus. This member is a bitmask encoding of the burst sizes. For example, if the device is capable of doing 1-, 2-, 4-, and 16-byte bursts, this field should be set to 0x17. The system also uses this field to determine alignment restrictions. If the device is an SBus device and can take advantage of a 64-bit SBus, the lower 16 bits are used to specify the burst size for 32-bit transfers, and the upper 16 bits are used to specify the burst size for 64-bit transfers.

dlim_dmaspeed is the average speed of the DMA engine in KBytes/second. This is intended to be a hint for the resource allocation routines, but is optional and may be zero.

ddi_dma_lim_x86

The x86 DMA limit structure contains the following members:

    typedef struct ddi_dma_lim {
        u_long dlim_addr_lo; /* lower bound of address range
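The two encodings described above can be worked through in a short user-space sketch. It assumes the usual reading of the bitmask (bit n set means a burst of 2^n bytes is supported, which is what makes 1, 2, 4, and 16 encode as 0x17) and treats dlim_cntr_max as a mask of the address bits that may change within one segment. The function names xx_burst_mask and xx_crosses_segment are hypothetical, and address wraparound is not handled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build a burst-size bitmask from a list of burst sizes in bytes.
 * Each size must be a power of two, so OR-ing the sizes together
 * sets bit n for every supported burst of 2^n bytes.
 */
unsigned int
xx_burst_mask(const unsigned int *sizes, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(sizes[i] != 0 && (sizes[i] & (sizes[i] - 1)) == 0);
		mask |= sizes[i];
	}
	return (mask);
}

/*
 * Return 1 if the range [addr, addr + len) crosses a segment boundary
 * implied by cntr_max (for example 0x00FFFFFF), else 0.
 */
int
xx_crosses_segment(unsigned long addr, unsigned long len,
    unsigned long cntr_max)
{
	if (len == 0)
		return (0);
	return ((addr & ~cntr_max) != ((addr + len - 1) & ~cntr_max));
}
```

With a limit of 0x00FFFFFF, a 32-byte transfer starting at 0x00FFFFF0 would spill into the next 24-bit segment, which is the case the limit is meant to rule out.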
356. tra edd bad da
Flow of Control
Interrupt Handling
Kernel Statistics
Memory Allocation
Polling
Printing System Messages
Process Signaling
Properties
Register and Memory Mapping
I/O Port Access
SCSI and SCSA
Soft State Management
String Manipulation
System Information
Thread Synchronization
Timing
uio(9S) Handling
Utility Functions
D. Sample Driver Source Code Listings
hono TUE

Figures

Figure 1-1 Possible device tree configurations
Figure 1-2 Example device trees
Figure 2-1 Sun-4 architecture VMEbus address spaces
Figure 2-2 Sun architecture address mapping
357. trant by the same thread. If you already own the mutex, you cannot own it again. Doing this leads to the above panic.

    panic: mutex_adaptive_exit: mutex not held by thread

Releasing a mutex that the current thread does not hold causes the above panic.

    panic: lock_set: lock held and only one CPU

This only occurs on a uniprocessor, and says that a spin mutex is held and it would spin forever, because there is no other CPU to release it. This could happen because the driver forgot to release the mutex on one code path, or blocked while holding it.

A common cause of this panic is that the device's interrupt is high-level (see ddi_intr_hilevel(9F) and Intro(9F)), and is calling a routine that blocks the interrupt handler while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but may not be so if it is blocking while grabbing an adaptive mutex with mutex_enter(9F).

Note: In principle, this is only a problem for drivers that operate above lock level.

Sun Disk Device Drivers

Sun disk devices represent an important class of block device drivers. A Sun disk device is one that is supported by disk utility commands such as format(1M) and newfs(1M).

Disk I/O Controls

Sun disk drivers need to support a minimum set of I/O controls specific to Sun disk drivers. These I/O controls are specified in the dkio(7) manual page. Disk I/O controls transfer di
358. truct buf *bp);

clrbuf(9F) zeroes bp->b_bcount bytes starting at bp->b_un.b_addr.

    void disksort(struct diskhd *dp, struct buf *bp);

disksort(9F) implements a queueing strategy for block I/O requests to block-oriented devices. dp is a pointer to a diskhd structure that represents the head of the request queue for the disk. disksort(9F) sorts bp into this queue in ascending order of cylinder number. The cylinder number is stored in the b_resid field of the buf(9S) structure. This strategy minimizes seek time for some disks.

    void freerbuf(struct buf *bp);

freerbuf(9F) frees the buf(9S) structure pointed to by bp. The structure must have been allocated previously by getrbuf(9F).

    int geterror(struct buf *bp);

geterror(9F) returns the error code stored in bp if the B_ERROR flag is set in bp->b_flags. It returns zero if no error occurred.

    struct buf *getrbuf(long sleepflag);

getrbuf(9F) allocates a buf(9S) structure and returns a pointer to it. sleepflag should be either KM_SLEEP or KM_NOSLEEP, depending on whether getrbuf(9F) should wait for a buf(9S) structure to become available if one cannot be allocated immediately.

    int physio(int (*strat)(struct buf *), struct buf *bp, dev_t dev,
        int rw, void (*mincnt)(struct buf *), struct uio *uio);

physio(9F) translates a read or write I/O request encoded in a uio(9S) structure into a buf(9S) I/O
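The idea of sorting requests into a queue by ascending cylinder number, as described for disksort(9F) above, can be sketched in user space. This is a deliberately simplified illustration and not the kernel's algorithm: it performs a plain ordered insert into a singly linked list, and struct xreq and xx_sorted_insert are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a queued request; cylinder plays the role of b_resid. */
struct xreq {
	struct xreq *next;
	long cylinder;
};

/*
 * Insert rq into the list at *headp, keeping ascending cylinder order.
 * Requests for the same cylinder keep their arrival order.
 */
void
xx_sorted_insert(struct xreq **headp, struct xreq *rq)
{
	struct xreq **pp = headp;

	while (*pp != NULL && (*pp)->cylinder <= rq->cylinder)
		pp = &(*pp)->next;
	rq->next = *pp;
	*pp = rq;
}
```

Servicing the list from the head then visits cylinders in one direction, which is where the reduction in seek time comes from.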
359. turned because of signal */ free resources; return (EINTR);

cv_timedwait

Another solution drivers used to avoid blocking on events that would not occur was to set a timeout before the call to sleep. This timeout would occur far enough in the future that the event should have happened, and if it did run, it would awaken the blocked process. The driver would then see if the timeout function had run, and return some sort of error.

This can still be done in SunOS 5.x, but the same thing may be accomplished with cv_timedwait(9F). An absolute time to wait for is passed to cv_timedwait(9F), which will return zero if the time is reached and the event has not occurred. See Code Example 4-3 on page 81 for an example usage of cv_timedwait(9F). Also see cv_timedwait_sig on page 84 for information on cv_timedwait_sig(9F).

Other Locks

Semaphores and readers/writer locks are also available. See semaphore(9F) and rwlock(9F).

Lock Granularity

Generally, start with one lock and add more depending on the abilities of the device. See Choosing a Locking Scheme on page 85 and Appendix B, Advanced Topics, for more information.

In SunOS 4.x, two distinct methods were used for handling interrupts. Polled, or autovectored, interrupts were handled by calling the xxpoll routine of the device driver. This routine was responsible for checking all of the driver's active units.

DMA
360. ue. This allows devices to be configured by changing the property values.

Installing and Removing Drivers

Before a driver can be used, the system must be informed that it exists. The add_drv(1M) utility must be used to correctly install the device driver. Once the driver is installed, it can be loaded and unloaded from memory without using add_drv(1M) again.

Copy the Driver to a Module Directory

The driver and its configuration file must be copied to a drv directory in the module path. Usually, this is /usr/kernel/drv:

    $ su
    # cp xx /usr/kernel/drv
    # cp xx.conf /usr/kernel/drv

During development, it may be convenient to add the development directory to the module path that the kernel searches by adding a line to /etc/system:

    moddir: /kernel /usr/kernel /new/mod/dir

Optionally Edit /etc/devlink.tab

If the driver creates minor nodes that do not represent disks, tapes, or ports (terminal devices), /etc/devlink.tab can be modified to cause devlinks(1M) to create logical device names in /dev. See devlink.tab(4) for a description of the syntax of this file. Alternatively, logical names can be created by a program run at driver installation time.

Run add_drv(1M)

Run add_drv(1M) to install the driver in the system. If the driver installs successfully, add_drv(1M) will run disks(1M), tapes(1M), ports(1M), and devlinks(1M) to create the logical names in /dev. ad
361. umptions may be made about where standard kernel files are kept, however.

It is a good idea to test the adding and removing of packages on newly installed machines that have not been modified for a development environment. It is a common packaging error for a package to use a tool or file that exists only in a development environment, or only on the driver writer's own development system. For example, no tools from the Source Compatibility package, SUNWscpu, should be used in driver installation programs.

The driver installation must be tested on a minimal Solaris system without any of the optional packages installed.

Testing Specific Types of Drivers

Since each type of device is different, it is difficult to describe how to test them all specifically. This section provides some information about how to test certain types of standard devices.

Tape Drivers

Tape drivers should be tested by performing several archive and restore operations. The cpio(1) and tar(1) commands may be used for this purpose. The dd(1M) command can be used to write an entire disk partition to tape, which can then be read back and written to another partition of the same size, and the two copies compared. The mt(1) command will exercise most of the I/O controls that are specific to tape drivers (see mtio(7)); all of the options should be attempted. The error handling of tape drivers can be tested by attempting various oper
362. ure definition in the kernel header files.

Warning: The driver should have knowledge only of headers and structures listed in Section 9S of the man Pages(9S): DDI and DKI Data Structures, even though interesting knowledge may be uncovered while debugging.

Example: adb on a Core Dump

During the development of the example ramdisk driver, the system crashes with a data fault when running mkfs(1M).

    test# mkfs -F ufs -o nsect=8,ntrack=8,free=5 /devices/pseudo/ramdisk@0:raw 1024
    BAD TRAP
    mkfs: Data fault
    kernel read fault at addr 0x4 pme 0x0
    Sync Error Reg 80 <INVALID>
    pid 280 pc 0xff2f88b0 sp 0xf01fe750 psr 0xc0 context 2
    gl g7 ffffff98 8000000 ffffff80 0 f01fe9d8 1 ffld4900
    Begin traceback sp f01fe750
    Called from 0098050 fp f01fe7b8 args 1180000 f01fe878 ffled280 ffled280 2 ff2f8884
    Called from f 0097d94 fp f01fe818 args ff24fd40 f01fe878 f01fe918 0 0 ff2c9504
    Called from f 0024e8c fp f0lfe8b0 args f0lfee90 f01fe918 2 f0l1fe8a4 fO0lfee90 3241c
    Called from 0005a28 fp f01fe930 args f00c1c54 f01fe98c 1 f00p9d58 0 3
    Called from 15c9c fp effffca0 args 5 3241c 200 0 0 7fe00
    End traceback
    panic: Data fault

savecore(1M) was not enabled. After enabling it (see Saving System Core Dumps on page 247), the system is rebooted. The crash is then recreated by running mkfs(1M) again. When the system comes up, it saves the kernel and the cor
363. us sts_rqpkt_status; u_char sts_rqpkt_reason; /* reason completion */ u_char sts_rqpkt_resid; /* residue */ u_long sts_rqpkt_state; /* state of command */ u_long sts_rqpkt_statistics; /* statistics */ struct scsi_extended_sense sts_sensedata; };

Auto request sense can be disabled per individual packet by just allocating sizeof (struct scsi_status) for the status block.

Code Example B-3 Allocating a packet with auto request sense

    pkt = scsi_init_pkt(ROUTE, NULL, bp, CDB_GROUP1,
        xsp->sdp_cmd_stat_size, PP_LEN, 0, func, (caddr_t)xsp);

The packet is submitted using scsi_transport(9F) as usual. When a check condition occurs on this packet, the host adapter driver:

Issues a request sense command if the controller doesn't have auto request sense capability
Obtains the sense data
Fills in the scsi_arq_status information in the packet's status block
Sets STATE_ARQ_DONE in the packet's pkt_state field
Calls the packet's callback handler (pkt_comp())

The target driver's callback routine should verify that sense data is available by checking the STATE_ARQ_DONE bit in pkt_state, which implies that a check condition has occurred and a request sense has been performed. If auto request sense has been temporarily disabled in a packet, there is no guarantee that the sense data can be retrieved at a later time.

The target driver should then verify whether the auto request sense comm
364. use it.

ASSERT

    void ASSERT(int expression);

ASSERT(9F) can be used to assure that a condition is true at some point in the program. It is a macro, and what it does depends on whether or not the symbol DEBUG is defined (from <sys/debug.h>). If DEBUG is not defined, the macro expands to nothing and the expression is not evaluated. If DEBUG is defined, the expression is evaluated, and if the value is zero, a message is printed and the system panics.

For example, if at a point in the driver a pointer should be non-NULL, and if it is not something is seriously wrong, the following assertion could be used:

    ASSERT(ptr != NULL);

If compiled with DEBUG defined and the assertion fails, a panic occurs:

    panic: assertion failed: ptr != NULL, file: driver.c, line: 56

Note: Because ASSERT(9F) uses DEBUG, it is suggested that any conditional debugging code should also be based on DEBUG, rather than on a driver symbol such as MYDEBUG. Otherwise, for ASSERT(9F) to function properly, DEBUG must be defined whenever MYDEBUG is defined.

Assertions are an extremely valuable form of active documentation.

mutex_owned

    int mutex_owned(kmutex_t *mp);

A significant portion of driver development involves properly handling multiple threads. Comments should always be used when a mutex is acquired, and are even more useful when an apparently
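The ASSERT(9F) behavior described in excerpt 364 can be illustrated with a small user-space sketch. This is not the kernel macro: XX_ASSERT, XX_DEBUG, xx_assfail, and xx_count_evaluations are hypothetical names chosen for illustration, and abort() stands in for a system panic. The point it shows is that when the debug symbol is undefined, the asserted expression is never evaluated, so it must not carry side effects the driver depends on.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-space stand-ins; XX_ASSERT and XX_DEBUG are not the
 * kernel's ASSERT(9F) and DEBUG.
 */
void
xx_assfail(const char *expr, const char *file, int line)
{
	/* abort() plays the role of the panic in this sketch. */
	fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
	    expr, file, line);
	abort();
}

#ifdef XX_DEBUG
#define	XX_ASSERT(ex) \
	((void)((ex) || (xx_assfail(#ex, __FILE__, __LINE__), 0)))
#else
#define	XX_ASSERT(ex)	((void)0)	/* expression is not evaluated */
#endif

/* Returns how many times the asserted expression was evaluated. */
int
xx_count_evaluations(void)
{
	int evaluations = 0;

	XX_ASSERT(++evaluations > 0);
	return (evaluations);
}
```

Built without XX_DEBUG, xx_count_evaluations() returns 0 because the macro expands to nothing; built with -DXX_DEBUG it returns 1.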
365. usually done at detach(9E) time for per-device mutexes and _fini(9E) time for global driver mutexes.

Using Mutexes

Every section of the driver code that needs to read or write the shared data structure must do the following:
- Acquire the mutex
- Access the data
- Release the mutex

For example, to protect access to the busy flag in the state structure:

    mutex_enter(&xsp->mu);
    xsp->busy = 0;
    mutex_exit(&xsp->mu);

The scope of a mutex (the data it protects) is entirely up to the programmer. A mutex protects some particular data structure because the programmer chooses to do so and uses it accordingly. A mutex protects a data structure only if every code path that accesses the data structure does so while holding the mutex. For additional guidelines on using mutexes, see Appendix B, "Advanced Topics."

Readers/Writer Locks

A readers/writer lock regulates access to a set of data. The readers/writer lock is so called because many threads can hold the lock simultaneously for reading, but only one thread can hold it for writing. Most device drivers do not use readers/writer locks. These locks are slower than mutexes, and provide a performance gain only when protecting data that is not frequently written but is commonly read by many concurrent threads. In this case, contention for a mutex could become a bottleneck, so using a readers/writer lock might be more efficient. S
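The acquire, access, release pattern from excerpt 365 can be tried outside the kernel with POSIX threads. The sketch below is an assumption-laden analogue, not driver code: pthread_mutex_lock and pthread_mutex_unlock stand in for mutex_enter(9F) and mutex_exit(9F), and the helper names xx_state_init, xx_try_claim, and xx_release are hypothetical.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for the driver's per-instance state. */
struct xxstate {
	pthread_mutex_t	mu;	/* protects busy */
	int		busy;
};

void
xx_state_init(struct xxstate *xsp)
{
	pthread_mutex_init(&xsp->mu, NULL);
	xsp->busy = 0;
}

/* Claim the device: returns 1 if busy was set here, 0 if already busy. */
int
xx_try_claim(struct xxstate *xsp)
{
	int claimed = 0;

	pthread_mutex_lock(&xsp->mu);		/* acquire the mutex */
	if (!xsp->busy) {			/* access the data */
		xsp->busy = 1;
		claimed = 1;
	}
	pthread_mutex_unlock(&xsp->mu);		/* release the mutex */
	return (claimed);
}

/* Clear the busy flag, again only while holding the mutex. */
void
xx_release(struct xxstate *xsp)
{
	pthread_mutex_lock(&xsp->mu);
	xsp->busy = 0;
	pthread_mutex_unlock(&xsp->mu);
}
```

Every path that touches busy takes mu first, which is the condition the excerpt gives for the mutex to actually protect the data.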
366. v/null as the system file.

The path the kernel uses when looking for modules can be set by changing the moddir variable in the system file. If the driver module is in a working area, such as /home/driver, the following example adds that directory to the module path:

    moddir: /kernel /usr/kernel /home/driver

Caution: Do not allow non-root users to write to the module directory.

The set command is used to set integer variables. To set module variables, the module name must also be specified:

    set module:variable=value

For example, to set the variable xxdebug in the driver xx, use the following set command:

    set xx:xxdebug=1

To set a kernel integer variable, omit the module name. Other assignments are also supported, such as bitwise ORing a value into an existing value:

    set moddebug | 0x80000000

See system(4) for more information.

Note: Most kernel variables are not guaranteed to be present in subsequent releases.

moddebug is a bit field that controls the module loading process. See <sys/modctl.h> for all its possible values. Here are a few useful ones:

    0x80000000  Print messages to the console when loading/unloading modules.
    0x40000000  Give more detailed error messages.
    0x20000000  Print more detail when loading/unloading (such as including the address and size).
    0x00001000  No autounloading drivers: the system will not attempt to unload the device driver when t
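The effect of "set moddebug | value" in excerpt 366 is a bitwise OR into the existing bit field. A minimal sketch of that arithmetic follows; the macro names are hypothetical labels for the values listed in the excerpt (the real names live in <sys/modctl.h> and are not reproduced here), and moddebug_or is an illustrative helper, not a kernel interface.

```c
/* Hypothetical labels for the moddebug values quoted in the excerpt. */
#define	XX_MODDEBUG_LOADMSG	0x80000000u	/* print load/unload messages */
#define	XX_MODDEBUG_ERRMSG	0x40000000u	/* more detailed error messages */
#define	XX_MODDEBUG_LOADDETAIL	0x20000000u	/* address and size detail */
#define	XX_MODDEBUG_NOAUTOUNLOAD 0x00001000u	/* no autounloading drivers */

/* OR new bits into the current value, leaving bits already set intact. */
unsigned int
moddebug_or(unsigned int current, unsigned int bits)
{
	return (current | bits);
}
```

For example, starting from a value that already disables autounloading, ORing in 0x80000000 yields 0x80001000, so both behaviors are enabled at once.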
367. ven bus interrupt level may map to a high-level interrupt on one platform, but map to an ordinary interrupt on another platform. The function ddi_intr_hilevel(9F), given an interrupt number, returns a value indicating whether the interrupt is high level. The driver can choose whether or not to support high-level interrupts, but it always has to check; it cannot assume that its interrupts are not high level. For information on checking for high-level interrupts, see "Registering Interrupts" on page 111.

Types of Interrupts

There are two common ways to request an interrupt: vectored and polled. Both methods commonly specify a bus-interrupt priority level. Their only difference is that vectored devices also specify an interrupt vector, but polled devices do not.

Vectored Interrupts

Devices that use vectored interrupts are assigned an interrupt vector. This is a number that identifies that particular interrupt handler. This vector may be fixed, configurable (using jumpers or switches), or programmable. In the case of programmable devices, an interrupt device cookie is used to program the device interrupt vector. When the interrupt handler is registered, the kernel saves the vector in a table.

When the device interrupts, the system enters the interrupt acknowledge cycle, asking the interrupting device to identify itself. The device responds with its interrupt vector. The kernel then uses this vector to
368. very open of type OTYP_LYR, the layering driver issues a corresponding close of type OTYP_LYR. The example keeps track of each type of open so the driver can determine when the device is not being used in close(9E). See the open(9E) manual page for more details about the otyp argument.

close

    int xxclose(dev_t dev, int flag, int otyp, cred_t *credp);

The arguments of the close(9E) entry point are identical to the arguments of open(9E), except that dev is the device number, as opposed to a pointer to the device number.

The close(9E) routine should verify otyp in the same way as was described for the open(9E) entry point. In the example, close(9E) must determine when the device can really be closed based on the number of block opens and layered opens.

Code Example 9-3 Block device close(9E) routine

    static int
    xxclose(dev_t dev, int flag, int otyp, cred_t *credp)
    {
        int instance;
        struct xxstate *xsp;

        instance = getminor(dev);
        xsp = ddi_get_soft_state(statep, instance);
        if (xsp == NULL)
            return (ENXIO);
        mutex_enter(&xsp->mu);
        switch (otyp) {
        case OTYP_LYR:
            xsp->nlayered--;
            break;
        case OTYP_BLK:
            xsp->open = 0;
            break;
        default:
            mutex_exit(&xsp->mu);
            return (EINVAL);
        }
        if (xsp->open || xsp->nlayered) {
            /* not done yet */
            mutex_exit(&xsp->mu);
            return (0);
        }
        /* cleanup (rewind tape, free memory, etc.) */
        /* wait for I/O to drain */
        mutex_exit(&x
369. vice_reg {
        u_char csr;
        u_int data;
    };

    int main(void)
    {
        printf("The offset of csr is %d, its size is %d.\n",
            offsetof(struct device_reg, csr), sizeof (u_char));
        printf("The offset of data is %d, its size is %d.\n",
            offsetof(struct device_reg, data), sizeof (u_int));
        return (0);
    }

Here is a sample compilation with SPARCompilers 2.0.1 and a subsequent run of the program:

    test% cc -Xa c.c
    test% a.out
    The offset of csr is 0, its size is 1.
    The offset of data is 4, its size is 4.

Driver developers should be aware that padding is dependent not only on the processor but also on the compiler.

Driver Interfaces

The kernel expects device drivers to provide certain routines that must perform certain operations; these routines are called entry points. This is similar to the requirement that application programs have a _start entry point or that C applications have the more familiar main routine.

Entry Points

Each device driver defines a standard set of functions called entry points, which are defined in the Solaris 2.4 Reference Manual AnswerBook. Drivers for different types of devices have different sets of entry points according to the kinds of operations the devices perform. A driver for a memory-mapped character-oriented device, for example, supports an mmap(9E) entry point, while a block driver does not. Some operations are common to all drive
370. status completion block (SCB) to allocate. Pass in dmatoken a pointer to the buf(9S) structure encoding the original I/O request. Use scsi_pktalloc(9F) for commands that do no actual I/O. If callback is not NULL_FUNC and the requested DMA resources are not immediately available, the function pointed to by callback will be called when resources may have become available. If callback is SLEEP_FUNC, scsi_resalloc(9F) may block waiting for resources.

    int scsi_reset(struct scsi_address *ap, int level);

scsi_reset(9F) requests the host adapter driver to reset the target at the SCSI address pointed to by ap if level is RESET_TARGET. If level is RESET_ALL, the entire SCSI bus is reset.

    void scsi_resfree(struct scsi_pkt *pkt);

scsi_resfree(9F) frees the scsi_pkt(9S) structure pointed to by pkt and related DMA resources that were previously allocated by scsi_resalloc(9F).

    char *scsi_rname(u_char reason);

scsi_rname(9F) decodes the packet completion reason code reason, and returns the corresponding reason string.

    int scsi_slave(struct scsi_device *devp, int (*callback)(void));

scsi_slave(9F) issues, to the device indicated by devp, a TEST UNIT READY command, one or more REQUEST SENSE commands, and an INQUIRY command to determine whether the target is present and ready. It returns a code indicating the state of the target. If callback is not NULL_FUNC and necessary resourc
371. ws and segments

    ddi_dma_win_t win, nwin;
    ddi_dma_seg_t seg, nseg;
    int retw, rets;

    for (win = NULL; (retw = ddi_dma_nextwin(xsp->handle, win, &nwin)) !=
        DDI_DMA_DONE; win = nwin) {
        if (retw != DDI_SUCCESS) {
            /* do error handling */
        } else {
            for (seg = NULL; (rets = ddi_dma_nextseg(nwin, seg, &nseg)) !=
                DDI_DMA_DONE; seg = nseg) {
                if (rets != DDI_SUCCESS) {
                    /* do error handling */
                } else {
                    ddi_dma_segtocookie(nseg, &off, &len, &cookie);
                    /* program the DMA engine */
                }
            }
        }
    }

The device must then be programmed to transfer this segment. Although programming a DMA engine is device specific, all DMA engines require a starting address and a transfer count. Device drivers retrieve these two values from a given segment by calling ddi_dma_segtocookie(9F). This function takes the segment and fills in a DMA cookie and the offset and length of the segment. A cookie is of type ddi_dma_cookie(9S) and has the following fields:

    unsigned long dmac_address;  /* unsigned 32 bit address */
    u_int dmac_size;             /* unsigned 32 bit size */
    u_int dmac_type;             /* bus-specific type bits */

Upon return from ddi_dma_segtocookie(9F), the dmac_address field of the cookie contains the DMA transfer's starting address and dmac_size contains the transfer count. Depending on the bus architecture, the third field in the cookie may be required by the driver. The exact shape of dmac_address is d
372. x B, "Advanced Topics." For a pointer to SCSI driver sample code, see Appendix D, "Sample Driver Source Code Listings."

Sun Common SCSI Architecture Overview

The Sun Common SCSI Architecture (SCSA) is the Solaris 2.4 DDI/DKI programming interface for the transmission of SCSI commands from a target driver to a host bus adapter driver. This interface is independent of the type of host bus adapter hardware, the platform, the processor architecture, and the SCSI command that is being transported across the interface.

By conforming to the SCSA, the target driver can pass any SCSI command to a target device without knowledge of the hardware implementation of the host bus adapter.

The SCSA conceptually separates building the SCSI command (by the target driver) from transporting the SCSI command and data across the SCSI bus. The architecture defines the software interface between high-level and low-level software components. The higher-level software component consists of one or more SCSI target drivers, which translate I/O requests into SCSI commands appropriate for the peripheral device.

[Figure: Application Programs 1 and 2 make system calls into the kernel, where Target Drivers 1 to 3 sit above the Sun Common SCSI Architecture (SCSA) layer and Host Bus Adapter Drivers 1 and 2; the remaining figure text is truncated.]
373. x4], %o3

That PC corresponds to rd_write, which is a routine in the ramdisk driver. The bug is in the ramdisk write routine, and occurs during an ld (load) instruction. This load instruction is dereferencing the value of %o2+4, so the next step is to determine the value of %o2.

Note: Using the $r command to examine the registers is inappropriate because the registers have been reused in the trap routine. Instead, examine the value of %o2 from the regs structure.

%o2 has the value 19000 in the regs structure. Valid kernel addresses are constrained to be above 0xE0000000 by the ABI, so this address is probably a user one. The ramdisk does not deal with user addresses, though, so this is something the ramdisk write routine should not be dereferencing.

Now, where this occurs in relation to the complete routine must be determined, so that the assembly language can be matched to the C code. To do this, the routine is disassembled up to the problem instruction, which occurs 2c bytes into the routine. Each instruction is 4 bytes in size, so 2c/4, or 0xb, additional instructions should be displayed.

    rd_write,c?i
    rd_write:
    rd_write:   sethi   %hi(0xfffffc00), %g1
                add     %g1, 0x398, %g1     ! ffffff98
                save    %sp, %g1, %sp
                st      %i0, [%fp + 0x44]
                st      %i1, [%fp + 0x48]
                st      %i2, [%fp + 0x4c]
                ld      [%fp + 0x44], %o0
                call    getminor
                nop
                st      %o0, [%fp - 0x4]
                ld      [%fp - 0x8], %o2
                ld      [%o2 + 0x4], %o3
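The instruction-count arithmetic in excerpt 373 (a fault 0x2c bytes into the routine, 4 bytes per instruction) can be checked with a few lines of C. The function and macro names are hypothetical, and the fixed 4-byte instruction size is taken from the text, so this sketch applies only to fixed-width instruction sets such as the one described.

```c
#include <stddef.h>

#define	XX_INSN_BYTES	4	/* each instruction is 4 bytes, per the text */

/* Number of whole instructions that precede the given byte offset. */
size_t
insns_before(size_t byte_offset)
{
	return (byte_offset / XX_INSN_BYTES);
}

/* Count to display so that the instruction at byte_offset is included. */
size_t
insns_to_display(size_t byte_offset)
{
	return (insns_before(byte_offset) + 1);
}
```

For the offset 0x2c, insns_before gives 0xb instructions ahead of the faulting one, and insns_to_display gives 0xc, which matches the twelve instructions in the disassembly listing.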
374. xsp->high_mu);
        if (queue empty || xsp->softint_running) {
            mutex_exit(&xsp->high_mu);
            mutex_exit(&xsp->low_mu);
            return (DDI_INTR_UNCLAIMED);
        }
        xsp->softint_running = 1;
        while (data on queue) {
            ASSERT(mutex_owned(&xsp->high_mu));
            dequeue data from high level queue;
            mutex_exit(&xsp->high_mu);
            normal interrupt processing
            mutex_enter(&xsp->high_mu);
        }
        xsp->softint_running = 0;
        mutex_exit(&xsp->high_mu);
        mutex_exit(&xsp->low_mu);
        return (DDI_INTR_CLAIMED);
    }

DMA

Many devices can temporarily take control of the bus and perform data transfers to (and from) main memory or other devices. Since the device is doing the work without the help of the CPU, this type of data transfer is known as a direct memory access (DMA). DMA transfers can be performed between two devices, between a device and memory, or between memory and memory. This chapter describes transfers between a device and memory only.

The DMA Model

The Solaris 2.x DDI/DKI provides a high-level, architecture-independent model for DMA. This allows the framework (the DMA routines) to hide architecture-specific details such as:
- Setting up DMA mappings
- Building scatter-gather lists
- Ensuring I/O and CPU caches are consistent

There are several abstractions that
375. [Figure 11-2: Device context switched to user process A. Labels: Process A, Process B, Process C, Hardware Device.]

Multiprocessor Considerations

On a multiprocessor machine, multiple processes could be attempting to access the device at the same time. This can cause thrashing. The kernel prevents this from happening by guaranteeing that once a device driver has granted access to a process, no other process will be allowed to request access to the same device for at least one clock tick.

However, some devices require more time to restore a device context than others. To prevent more CPU time from being used to restore a device context than to actually use that device context, the time that a process needs to have access to the device must be increased. If more time than one clock tick is required, the driver can block new access to the device for an additional predetermined amount of time using the standard thread synchronization function calls. See "Thread Synchronization" on page 79 for more information.

Context Management Operation

In general, here are the steps for performing device context management:
1. Define a ddi_mapdev_ctl(9S) structure.
2. Allocate space to save device context if necessary.
3. Set up user mappings to the device and driver notifications with ddi_mapdev(9F).
4. Manage user access to the device with ddi_mapdev_intercept(9F) and d
376. y,
        xxprobe,
        xxattach,
        xxdetach,
        nodev,
        &xx_cb_ops,
        (struct bus_ops *)NULL
    };

    static struct modldrv modldrv = {
        &mod_driverops,
        "xx driver v1.0",
        &xx_ops
    };

    static struct modlinkage modlinkage = {
        MODREV_1,
        &modldrv,
        NULL
    };

    int
    _init(void)
    {
        int error;

        ddi_soft_state_init(&statep, sizeof (struct xxstate),
            estimated number of instances);
        further per-module initialization if necessary
        error = mod_install(&modlinkage);
        if (error != 0) {
            undo any per-module initialization done earlier
            ddi_soft_state_fini(&statep);
        }
        return (error);
    }

    int
    _fini(void)
    {
        int error;

        error = mod_remove(&modlinkage);
        if (error == 0) {
            release per-module resources if any were allocated
            ddi_soft_state_fini(&statep);
        }
        return (error);
    }

    int
    _info(struct modinfo *modinfop)
    {
        return (mod_info(&modlinkage, modinfop));
    }

Device Configuration

Any one-time resource allocation or data initialization should be performed during driver loading in _init(9E). For example, any mutexes global to the driver should be initialized here. Do not, however, use _init(9E) to allocate or initialize anything that has to do with a particular instance of the device. Per-instance initialization must be done in attach(9E). For example, if a driver for a printer can drive more than one printer at the same tim
377. y there. The test criteria must be rigorous enough to avoid misidentifying devices. It may, for example, appear that the device is present when in fact it is not, because a different device appears to behave like the expected device.

The ddi_peek(9F) and ddi_poke(9F) family of routines must be used to access the device registers, as they cope correctly with the faults that may occur if the access fails (for example, because the device is not there).

attach

The system calls attach(9E) to attach a device instance to the system. The responsibilities of the DDI_ATTACH case of attach(9E) include:
- Optionally allocating a soft state structure for the instance
- Registering an interrupt handler
- Mapping device registers
- Initializing per-instance mutexes and condition variables
- Creating minor device nodes for the instance

Code Example 5-4 is an example of an attach(9E) routine.

Code Example 5-4 attach(9E) routine

    static int
    xxattach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
        struct xxstate *xsp;
        int instance;

        switch (cmd) {
        case DDI_ATTACH:
            /* get assigned instance number */
            instance = ddi_get_instance(dip);
            /*
             * this device requires DMA capability, make sure the
             * bus slot allows this
             */
            if (ddi_slaveonly(dip) == DDI_SUCCESS)
                return (DDI_FAILURE);
            if (ddi_soft_state_zalloc(statep, instance) != 0)
                return (DDI_FAILURE);
            xsp = ddi_get_soft_state(stat
378. y tree, and to add a new entry to /etc/inittab.

Vendor-supplied strings should include an identifying value to make them unique, such as their name or stock symbol (if appropriate). The string (along with the other node types not consumed by disks(1M), tapes(1M), or ports(1M)) can be used in conjunction with devlinks(1M) and devlink.tab(4) to create logical names in /dev.

Deferred Attach

open(9E) might be called before attach(9E) has succeeded. open(9E) must then return ENXIO, which will cause the system to attempt to attach the device. If the attach succeeds, the open is retried automatically.

detach

detach(9E) is the inverse operation to attach(9E). It is called for each device instance, receiving a command of DDI_DETACH, when the system attempts to unload a driver module. The system only calls the DDI_DETACH case of detach(9E) for a device instance if the device instance is not open. No calls to other driver entry points for that device instance occur during detach(9E), although interrupts and time-outs may occur.

The main purpose of detach(9E) is to free resources allocated by attach(9E) for the specified device. For example, detach(9E) should unmap any mapped device registers, remove any interrupts registered with the system, and free the soft state structure for this device instance.

If the detach(9E) routine entry in the dev_ops(9S) structure is initialized to nodev, it implies that de
