
DIGITAL StorageWorks HSJ50 Array Controller HSOF V5.1


Contents

1. To move a storageset member while maintaining the data it contains:
   1. Delete the unit number of the storageset that contains the disk drive you want to move:
      HSJ50> DELETE unit-number
   2. Delete the storageset that contains the disk drive you want to move:
      HSJ50> DELETE storageset-name
   3. Delete each disk drive, one at a time, that was contained by the storageset:
      HSJ50> DELETE disk-name_1
      HSJ50> DELETE disk-name_2
      HSJ50> DELETE disk-name_n
   4. Move the desired disk drive to its new PTL location.
   5. Re-add each member to the controller's list of valid devices:
      HSJ50> ADD DISK disk-name PTL-location
      HSJ50> ADD DISK disk-name PTL-location
      HSJ50> ADD DISK disk-name PTL-location
   6. Re-create the storageset by adding its name to the controller's list of valid storagesets and specifying the disk drives it contains. Although you have to re-create the storageset from its original members, you don't have to add them in their original order:
      HSJ50> ADD STORAGESET storageset-name disk_1 disk_n
   7. Re-present the storageset to the host by giving it a unit number that the host can recognize. You can use the original unit number or create a new one:
      HSJ50> ADD UNIT unit-number storageset-name
   Example: The following example moves disk210 to PTL location 30
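The seven steps in this excerpt can be sketched as a small command generator. This is only an illustration, not a tool supplied with the controller: the function name, the unit name D100, the storageset name STRIPE1, the disk names, and the PTL values are all hypothetical, and `ADD STORAGESET` is kept as the generic form the excerpt prints. The PTL location is assumed to be written as three numbers (port, target, LUN).

```python
def move_member_commands(unit, storageset, members, moved_disk, new_ptl):
    """Return the CLI commands, in order, for steps 1-3 and 5-7.

    members maps each disk name to its current (port, target, lun).
    Step 4 (physically moving the drive) has no command.
    """
    if moved_disk not in members:
        raise ValueError(f"{moved_disk} is not a member of {storageset}")

    updated = dict(members)
    updated[moved_disk] = new_ptl

    # Steps 1-3: delete the unit, the storageset, then each member disk.
    commands = [f"DELETE {unit}", f"DELETE {storageset}"]
    commands += [f"DELETE {disk}" for disk in members]

    # Step 5: re-add every member, using the new PTL for the moved disk.
    for disk, (port, target, lun) in updated.items():
        commands.append(f"ADD DISK {disk} {port} {target} {lun}")

    # Steps 6-7: re-create the storageset and present it to the host again.
    commands.append(f"ADD STORAGESET {storageset} " + " ".join(updated))
    commands.append(f"ADD UNIT {unit} {storageset}")
    return commands


if __name__ == "__main__":
    # Hypothetical two-member storageset; DISK110 moves to port 3, target 0.
    plan = move_member_commands(
        "D100", "STRIPE1",
        {"DISK100": (1, 0, 0), "DISK110": (1, 1, 0)},
        "DISK110", (3, 0, 0),
    )
    for line in plan:
        print("HSJ50>", line)
```

Generating the whole sequence up front makes it easy to review the order before typing anything at the console, since the deletes must finish before the drive is moved.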
2. 4. At the tape drive to be replaced, press the two mounting tabs together to release the device from the shelf and partially pull it out of the shelf.
   5. Use both hands to pull the device out of the shelf.
   6. Quiesce the port again. Look for the following OCP indicators:
      [OCP indicator figure, CXO-4824A-MC]
   7. Align the tape drive with the shelf rails and insert the new device.
   8. When the controller recognizes the device, the port light will be turned off and the system will return to normal operation.
   Replacing a solid state disk drive, optical, or CD-ROM drives
   Use the cold swap method to replace a solid state, optical, or CD-ROM drive. When using this method, you must shut down the controllers and remove power from the shelf.
   Required tools
   The tools listed in Table 2-10 are required for replacing solid state disk drives, optical, or CD-ROM drives.
   Table 2-10 Required tools for replacing solid state disk drives
      5/32-inch Allen wrench: To unlock the SW800-series cabinet
   Solid state, optical, and CD-ROM drive replacement procedure
   1. Halt all host I/O activity using the appropriate procedures for your operating system.
   2. Connect a maintenance terminal to one of the controllers
3. NAME        Type                 Port  Targ  Lun  Used by
   LOADER120   passthrough loader   4     2     0    P3
   TAPE430     passthrough tape     4     3     0    P0
   HSJ50> DE P3
   HSJ50> DE P0
   HSJ50> DE TAPE430
   (move passthrough loader120 to new location)
   HSJ50> ADD PASSTHROUGH LOADER120 1 2 0
   HSJ50> ADD UNIT P0 TAPE430
   HSJ50> ADD UNIT P3 LOADER120
   5 Removing
   Removing a patch
   Removing a controller and cache module
   Removing storage devices
   Precautions
   Some of the procedures in this chapter involve handling program cards, controller modules, and cache modules. Use the following guidelines to prevent component damage while servicing subsystem modules:
   - After removing a controller or cache module from the shelf, place the module into an approved antistatic bag or onto a grounded antistatic mat.
   - Cover the program card with the snap-on ESD cover when the card is installed in the controller.
   - Keep the program card in its original carrying case when not in use.
   - Do not twist or bend the program card.
   - Do not touch the program card contacts.
   Removing a patch
   Use the delete patch program to free memory space for patches that need to be added to the current software version. When patches are removed from controller memory, they will also be removed from storagesets
4. 030C4002 00003C51 Q lt 00000000 heed 000B9331 VARN 00000000 EE 00000000 Diera 00000000 Peen f 00000000 EEEE 1F000504 VEE 36325A52 RZ26 20202020 f 29432820 HSJ50 Array Controller Troubleshooting C LONGWORD 13 43454420 DEC LONGWORD 14 34333533 3534 LONGWORD 15 37313739 9717 LONGWORD 16 00000000 fadia LONGWORD 17 00000004 fenek LONGWORD 18 00000000 VEEN A LONGWORD 19 853F0000 AE A LONGWORD 20 00000000 Once you locate and identify the instance code see the Appendix for information about instance codes last fail codes and recommended repair actions If possible use the FMU to interpret the event codes See Using FMU to describe event log codes in this chapter for details about using the FMU Reading a DECevent error log DECevent generated error reports while containing basically the same information as ERF generated reports are far easier to interpret This is true because more of the binary log is translated Some of the information directly available from the ASCII report output includes the following HSJ50 Array Controller Instance code Recommended repair action Recovery threshold PTL of the faulty device ASC ASCQ code values Template type MSCP event code Controller model The error log example that follows illustrates the difference between an event log generated by ERF as shown in the exa
5. 11. Press Return on the operating controller's console.
   12. Wait for the following text to be displayed on the operating controller's console:
       Port 1 restarted
       Port 2 restarted
       Port 3 restarted
       Port 4 restarted
       Port 5 restarted
       Port 6 restarted
       Controller Warm Swap terminated.
       The configuration has two controllers.
       To restart the other HSJ50:
       1) Enter the command RESTART OTHER_CONTROLLER
       2) Press and hold in the Reset button while inserting the program card.
       3) Release Reset; the controller will initialize.
       4) Configure new controller by referring to the controller's configuring manual.
   Restarting the subsystem
   Note: The following section explains in detail the four steps shown on the terminal's screen.
   Start the new controller by entering the following CLI command:
       HSJ50> RESTART OTHER_CONTROLLER
   Connect the maintenance terminal to the newly installed controller.
   Press and hold the Reset button on the new controller while inserting the program card from the replaced controller.
   Release the Reset button to initialize the controller. Wait for the CLI prompt to appear at the terminal. You will see a "Controllers misconfigured" message, which you can ignore.
   If the new controller reports an invalid cache error, enter one of the follow
6. button to initialize the controller. Wait for the CLI prompt to appear at the terminal. You will see a "Controllers misconfigured" message, which you can ignore.
   Enter the following command:
       HSJ50> SET NOFAILOVER
   Enter the following command from the controller B CLI to put the controllers into dual-redundant mode:
       HSJ50> SET FAILOVER COPY=OTHER_CONTROLLER
   Controller B will restart.
   Ensure that the ECB cable mounting is secure.
   Remove the disconnected SBB battery module from the device shelf and replace it with the new operating SBB.
   Shut down the old SBB battery module by pressing both ECB shut down buttons until the LEDs stop flashing.
   Replacing power supplies
   There are two methods for replacing shelf and controller power supplies: asynchronous swap and cold swap.
   Asynchronous swap allows you to remove a defective power supply while the other power supply provides power to the shelf or the controller. Use asynchronous swap only when the shelf has a redundant power supply and one power supply is still operating.
   With the cold swap method, service to a device is interrupted for the duration of the service cycle. Use the cold swap method when there are no redundant power supplies in the shelf.
   Required tools
   Table 2-7 Required tools
   The tools listed in Table 2-7 are required for replacing power sup
7. If the failure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port.
   If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT.
   Follow repair action 36.
   The CACHEB0 or CACHEB1 DRAB detected a Nonexistent Memory Error condition. Use the following register information to locate additional details:
   - The CACHEBn DRAB EAR register combined with the Master DRAB RSR register bits 12 through 15 (CACHEB memory region) yields the affected memory address.
   - The CACHEBn DRAB EDR register contains the error data.
   - If the failure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port.
   - If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI.
   - If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT.
   Follow repair action 36.
   Repair Action Code 2F, Action to take:
   The Master DRAB detected an Address Parity Error or a Write Data Parity Error condition. Use the following register information to locate additional details:
   - The Master DRAB EAR register combined with Master DRAB ERR bits 0 through 3 (address region) yields the affected memory address.
   - The Master DRAB EDR register contains the error data.
   - If the fa
8. 03A34002  Request Sense command to drive failed.
   03A40064  Illegal command for pass through mode.
   03A64002  Premature completion of a drive command.
   03AC4002  ID message not sent by drive.
   03AD4002  Synchronous negotiation error.
   03AE4002  The drive unexpectedly disconnected from the SCSI bus.
   03B40101  No command control structures available for media loader operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
   03B52002  SCSI interface chip command time out during media loader operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
   03B64002  Byte transfer time out during media loader operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
   03B74402  SCSI bus errors during media loader operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
   03B82002  Device port SCSI chip reported gross error during media loader operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
   03B92002  Non SCSI bus parity error durin
9. [Status indicator table, partly unreadable. Recoverable entries: Shelf LED: PS 2 is operational. Shelf LED: Possible PS 1 and PS 2 fault or input power problem. Power supply LED. Legend: LED on, LED off.]
   Required tools
   The tools listed in Table 3-15 are required for power supply installation.
   Table 3-15 Required tools for power supply installation
      5/32-inch Allen wrench: To unlock the SW800-series cabinet
   Installing a power supply
   Use the following procedure to install a power supply. For dual power supply configurations, repeat this procedure. For a single power supply configuration, use slot 7 of the SBB shelf. Use slot 6 if you are adding a second power supply for redundancy.
   1. Firmly push the power supply into the shelf until the mounting tabs snap into place. See Figure 3-27.
      Figure 3-27 Installing a power supply SBB
   2. Plug the power cord into the supply.
   3. Observe the power and shelf status indicators and ensure they are both on. If the status indicators are not on, refer to the Status indicator tables and take appropriate service action.
   4. Repeat the above steps to add a second power supply for redundancy. After connecting the power cord, observe the status indicators and ensure that they are both on.
   Installing storage building blocks
   The storage device building blocks (SBBs) are in
10. i  Offline, Inoperative. The unit is inoperative and cannot be brought available by the controller.
    m  Offline, Maintenance. The unit has been placed in maintenance mode for diagnostic or other purposes.
    o  Online. Mounted by at least one of the host systems. For HSZ controllers, on line in this column means that the unit is on line to the HSZ controller only. It does not indicate that the unit is mounted by the host.
    r  Offline, Rundown. The CLI SET NORUN command has been issued for this unit.
    v  Offline, No Volume Mounted. The device does not contain media.
    x  Online to other controller. Not available for use by this controller.
    A space in this column indicates the availability is unknown.
    The spindle state is indicated using the following characters:
    (symbol not legible in this excerpt)  For disks, this symbol indicates the device is at speed. For tapes, it indicates the tape is loaded.
    >  For disks, this symbol indicates the device is spinning up. For tapes, it indicates the tape is loading.
    <  For disks, this symbol indicates the device is spinning down. For tapes, it indicates the tape is unloading.
    v  For disks, this symbol indicates the device is stopped. For tapes, it indicates the tape is unloaded.
    For other types of devices, this column is left blank.
    For disks and tapes, a w in the write protect column indicates the unit i
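The availability codes in this excerpt lend themselves to a simple lookup. The sketch below is an illustration only; the dictionary and function names are hypothetical, and the descriptions are shortened from the text above. Only the six letters and the space listed in the excerpt are covered.

```python
# Availability characters as listed in the excerpt; names are hypothetical.
AVAILABILITY = {
    "i": "Offline: inoperative",
    "m": "Offline: maintenance mode",
    "o": "Online: mounted by at least one host",
    "r": "Offline: rundown (SET NORUN issued)",
    "v": "Offline: no volume mounted",
    "x": "Online to other controller",
    " ": "Availability unknown",
}


def describe_availability(code):
    """Translate one availability character into a short description."""
    return AVAILABILITY.get(code, f"Unrecognized availability code {code!r}")


if __name__ == "__main__":
    for code in "oxr ":
        print(repr(code), "->", describe_availability(code))
```

Returning a message for unknown characters, rather than raising, keeps a screen-scraping script running if the display contains a code this excerpt does not list.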
11. 0 1 4 Enter option 3 to list patches The following patches are currently stored in the patch area Firmware Version Patch number s v50J Zi V519 D2 rk Currently 91 of the patch area is free The SHOW THIS_CONTROLLER command also provides patch information In the following example software Version 3 0 has three patches applied to the current software HSJ50 Array Controllers Service Manual 3 6 Installing 5 At the CLI prompt enter CLI gt SHOW THIS_CONTROLLER Controller HSJ50 ZG33400026 Firmware V51J 3 Hardware 0000 Note at the bold number 3 shows that three patches have been installed for software version 51J Installing a patch This option allows you to enter a firmware program patch directly into the controller s NVMEM You are prompted to enter the firmware version number to which the patch applies the patch length the patch type the patch number the count the RAM address the new contents of that address and a patch verification number Note The patch data in this example is provided only for the purpose of illustrating the code patch operation Obtain actual code patch data for your controller s firmware version from your Digital representative The code patch utility verifies that the patch you are entering is appropriate for the firmware version in the controller and that there are no required dependent patches It allows you to enter only one patch at a ti
12. 022E0102 02360101 02370102 Service Manual Appendix A Explanation A callback from DS on a transfer request has returned a bad or illegal DWD status Last Failure Parameter 0 contains the DWD Status Last Failure Parameter 1 contains the DWD address Last Failure Parameter 2 contains the PUB Address Last Failure Parameter 3 contains the Device port A READ_LONG operation was requested for a Local Buffer Transfer READ LONG is not supported for Local Buffer Transfers A WRITE_LONG operation was requested for a Local Buffer Transfer WRTE_LONG is not supported for Local Buffer Transfers An invalid mapping type was specified for a logical unit Last Failure Parameter 0 contains the USB address Last Failure Parameter 1 contains the Unit Mapping Type Unrecognized state supplied to FOC SEND callback routine va_dap_snd_cmd_complete Last Failure Parameter 0 contains the unrecognized value Unsupported return from HIS GET_CONN_INFO routine Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the invalid status HSJ50 Array Controller Appendix A HSJ50 Array Controller Last Fail Code 02392084 023A2084 02440100 02530102 02560102 Explanation A 49 A processor interrupt was generated by the controller s XOR engine FX with no bits set in the CSR to indicate a reason for the interrupt Last Failure Parameter 0 contains t
13. Removing a controller and cache module
    A controller and a cache module may be removed so they can be used in another subsystem.
    Required tools
    The tools listed in Table 5-2 are required for removing a controller.
    Table 5-2 Required tools
       Maintenance terminal: To shut down and to restart controllers
       Small flat-head screwdriver: To loosen captive screws
       ESD wrist strap: To protect all equipment against electrostatic discharge
       5/32-inch Allen wrench: To unlock the SW800-series cabinet
    To remove a controller and its cache module:
    1. Connect a maintenance terminal to the controller to be removed. See Figure 5-2.
       Figure 5-2 Connecting a maintenance terminal to the controller (labels: Local connection port, BC16E-XX, To terminal)
    2. Take the controller to be removed out of service:
       CLI> SHUTDOWN THIS_CONTROLLER
    3. Ensure that the controller has shut down cleanly. Check for the following indications on the controller's OCP (operator control panel):
       - The Reset light is on.
       - Port lights 1, 2, 3 are continuously lit.
    4. With a small flat-head screwdriver, loosen the captive screws that secure the CI cable to the controller. See Figure 5-3.
       Figure 5-3 Disconnecting the CI cable connector (label: CI bus cable)
    5. Remove the CI cable from the controller's front bezel.
    6. Disable the ECB by pressing the b
14. The Code Patch utility does not allow you to incorrectly enter or delete patch information. The program provides messages to assist you with understanding any problems that you may encounter and suggests corrective actions.
    Message: Software Version x does not have any patches to delete.
    Explanation: You cannot delete a patch because the software version entered does not have any patches entered.
    Message: Firmware Version x does not have patch number x to delete.
    Explanation: You cannot delete this patch because the software version entered does not have the specified patch entered.
    Message: The patch you entered is already installed on this controller.
    Explanation: The specified patch is already present in the patch area of controller memory. If you wish to reenter this patch, first use the Delete Patch option.
    Message: The patch you are entering requires other patches to be entered.
    Explanation: You have attempted to enter a patch without first entering the lower numbered patches in the hierarchy. Enter all patches for this software version that have lower numbers than the current patch. Then enter the current patch.
    Message: WARNING The patch you are entering is not for the current firmware version x.
    Explanation: The patch you are entering applies to a software version other than the one currently installed in the controller. Code Patch w
15. Small computer system interface. An ANSI interface defining the physical and electrical parameters of a parallel I/O bus used to connect initiators to a maximum of seven devices. The StorageWorks device interface is implemented according to the SCSI-2 standard, allowing the synchronous transfer of 8-bit data at rates of up to 10 MB/s.
    SCSI device: A host computer adapter, a peripheral controller, or a storage element that can be attached to the SCSI bus.
    SCSI device ID: The bit-significant representation of the SCSI addressing that refers to one of the signal lines numbered 0 through 7. Also referred to as a target ID.
    SCSI-A cable: A 50-conductor (25 twisted pair) cable used for single-ended SCSI bus connections.
    SCSI-P cable: A 68-conductor (34 twisted pair) cable used for differential bus connections.
    Small Computer System Interface: See SCSI.
    Spareset: A pool of disk drives used by the controller to replace failed members of a RAIDset.
    SPD: Software product description. A document that contains the legal description of a product.
    storageset: Any collection of containers, such as stripesets, RAIDsets, the spareset, and the failedset, that make up a container.
    storage unit: The general term that refers to storagesets, single disk units, and all other storage devices that can be installed in your subsystem and accessed by a host. A storage unit can be any entity that is capable
16. Instance Code: 01662D02 01672D02 01682D02 01692D02
    Explanation:
    - The Master DRAB detected a Nonexistent Memory Error condition during a host port attempt to read buffer memory.
    - The Master DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write buffer memory.
    - The Master DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write a byte to buffer memory.
    - The Master DRAB detected a Nonexistent Memory Error condition during a Device port attempt to read buffer memory.
    - The Master DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write buffer memory.
    - The Master DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write a byte to buffer memory.
    - The Master DRAB detected a Nonexistent Memory Error condition during an i960 attempt to read buffer memory.
    - The CACHEA0 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write CACHEA0 memory.
    - The CACHEA0 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write a byte to CACHEA0 memory.
    - The CACHEA0 DRAB detected a Nonexistent Memory Error condition during an FX attempt to read CACHEA0 memory.
    - The CACHEA0 DRAB detected a Nonexistent Memory Error condition during a host port attempt to write CACHEA0 memory.
    - The CACHEA0 DRAB detected a Nonexistent Memory Error condition during a host port attempt t
17.    Port 4 quiesced
       Port 5 quiesced
       Port 6 quiesced
       All ports quiesced.
       Insert the other HSJ50 WITHOUT its program card, and press Return.
    5. Slide the cache module for controller A along the rails and then push firmly to seat it in the backplane. See Figure 2-24.
       Figure 2-24 Reinstalling the cache and controller module
    6. Slide controller A along the rails and then push firmly to seat it in the backplane. See Figure 2-24.
    Caution: Do not overtighten the controller's front panel captive screws. Damage to the controller PC board or front panel may result.
    Reinstall the CI connector on controller A and tighten the captive screws.
    Press Return on the operating controller's console.
    Wait for the following text to be displayed on the operating controller's console:
       Port 1 restarted
       Port 2 restarted
       Port 3 restarted
       Port 4 restarted
       Port 5 restarted
       Port 6 restarted
       Controller Warm Swap terminated.
       The configuration has two controllers.
       To restart the other HSJ50:
       1) Enter the command RESTART OTHER_CONTROLLER
       2) Press and hold in the Reset button while inserting the program card.
       3) Release Reset; the controller will initialize.
       4) Configure new controller by referring to the controller's configuration manual.
    Restarti
18. Associated Additional Sense Code Qualifier fields are undefined 03832002 A SCSI interface chip command time out occurred during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03844002 Byte transfer time out during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined HSJ50 Array Controller Service Manual 40 40 40 40 00 00 40 40 01 40 0 40 0 0 40 40 40 40 40 45 1 20 40 A 28 Appendix A PE 03854402 SCSI bus errors occurred during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03862002 Device port SCSI chip reported gross error during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03872002 A non SCSI bus parity error occurred during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03880101 A source driver programming error was encountered during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 0389010
19. Cm This column indicates what percentage of data transferred between the host and the unit were compared A compare operation may be accompanied by either a read or a write operation so this column is not cumulative with read percentage and write percentage columns This data is only contained in the DEFAULT display for disk and tape device types HSJ50 Array Controller Troubleshooting Device Status PTLO ASWFO D100 D120 D140 D210 D230 D300 D310 D320 D400 D410 D420 D430 D440 D450 D500 D510 D520 D530 KA AA KA RA AA AA KA AS AA AX KA 1 59 HT This column indicates the cache hit percentage for data transferred between the host and the unit PH This column indicates the partial cache hit percentage for data transferred between the host and the unit MS This column indicates the cache miss percentage for data transferred between the host and the unit Purge This column shows the number of blocks purged from the write back cache in the last update interval BlChd This column shows the number of blocks added to the cache in the last update interval BIHit This column shows the number of cached data blocks hit in the last update interval Rq S RdKB S WrkB S Que Tg CRO BRO TRO 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 93 0 ii 1 0 0 0 0 0 0 0 0 0 0 0 11 93 0 2 1 0 0 0 0 0 0 0 0 0 0 0 36 247 0 12 10 0 0 0 11 93 0 2 1
20. You can run only one VTDPY session on each controller at one time. Prior to running VTDPY, set the terminal to NOWRAP mode to prevent the top line of the display from scrolling off of the screen.
    To initiate VTDPY from a maintenance terminal at the CLI> prompt, enter the following command:
       HSJ50> RUN VTDPY
    To initiate VTDPY from a virtual terminal, use the following command. This command applies only to HSJ or HSD controllers from VMS:
       SET HOST/DUP/SERVER=MSCP$DUP <controller_name>/TASK=VTDPY
    Using the VTDPY Control Keys
    Use the control key sequences shown in Table 1-5 with the VTDPY display.
    Table 1-5 VTDPY control keys
       Control Key Sequence   Function
       Ctrl/R                 Refreshes current screen display (same as Ctrl/W).
       Ctrl/W                 Refreshes current screen display (same as Ctrl/R).
       Ctrl/Y                 Terminates VTDPY and resets screen characteristics.
       Ctrl/Z                 Updates the screen (same as Ctrl/G).
    While VTDPY and a maintenance terminal interface support passing all of the listed control characters, some host-based terminal interfaces restrict passing some of the characters. All of the listed characters have equivalent text string commands, defined in the table in "Using the VTDPY Command Line."
    Using the VTDPY Command Line
    VTDPY contains a command line interpreter that you can invoke by entering Ctrl/C any time after starting the program. The command line interpret
21. Occurred on 07-DEC-1995 at 09:21:44
    Controller Model                HSJ50
    Serial Number                   2651909900
    Hardware Version                0000 (00)
    Controller Identifier
      Unique Device Number          01519090
      Model                         40 (x28)
      Class                         1 (x01)
    Firmware Version                W18J (FF)
    Node Name                       HSJA3
    CI Node Number                  12 (x0C)
    Informational Report
    Instance Code                   01010302
      Description: An unrecoverable hardware detected fault occurred.
      Reporting Component           1 (x01)  Description: Executive Services
      Reporting component's event number  1 (x01)
      Event Threshold               2 (x02)  Classification: HARD. Failure of a component that affects controller performance or precludes access to a device connected to the controller is indicated.
    Last Failure Code               018800A0 (No Last Failure Parameters)
    Last Failure Code               018800A0
      Description: A processor interrupt was generated with an indication that the program card was removed.
      Reporting Component           1 (x01)  Description: Executive Services
      Reporting component's event number  136 (x88)
      Restart Type                  2 (x02)  Description: Automatic hardware restart
    Testing disks (DILX)
    HSJ50 series controllers have a Disk In-line Exerciser (DILX) that you can use to test suspect disks. When you run DILX, you can specify many parameters for the test, such as starting and ending block numbers, the duration of the test, and whether the test should be read-only or read/write.
    Note: DILX
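In the report above, the first two bytes of each code line up with the fields DECevent prints: 01010302 shows reporting component 1 and event number 1, and 018800A0 shows reporting component 1 and event number 136 (hex 88). The low byte of the instance code (02) also matches the event threshold of 2. The sketch below splits a code along those lines. The byte layout is an assumption inferred from this single example, not a documented format, the function name is hypothetical, and the remaining bytes are returned undecoded; the Appendix of the manual is the reference for what each code means.

```python
def split_event_code(code_hex):
    """Split an 8-digit hex code into the fields visible in the sample report.

    Assumed layout (inferred from one DECevent example, not a specification):
      byte 3 (most significant) = reporting component
      byte 2                    = reporting component's event number
      bytes 1-0                 = left undecoded here
    """
    if len(code_hex) != 8:
        raise ValueError("expected exactly 8 hexadecimal digits")
    value = int(code_hex, 16)
    return {
        "component": (value >> 24) & 0xFF,
        "event_number": (value >> 16) & 0xFF,
        "low_bytes": f"{value & 0xFFFF:04X}",
    }


if __name__ == "__main__":
    for code in ("01010302", "018800A0"):
        print(code, split_event_code(code))
```

Checking the decoded component and event number against the fields DECevent prints is a quick way to confirm that a code was copied correctly before looking it up.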
22. See FRU Firmware HSJ50 Array Controller Index copying from host to subsystem 3 17 installing on devices 3 15 upgrading for devices 3 15 Flush G 5 Formatting a disk drive 3 12 FRU G 5 dual redundant configuration 2 22 replacing 2 2 single configuration 2 15 FWD SCSI G 5 G Guidelines ESD protection 2 2 H Half height device G 5 HBVS G 5 HIS G 5 Host G 5 Host copy script OpenVMS 3 17 Host port cables and ESD 2 2 Hot swap G 6 HSOF G 5 HSUTIL abort codes 3 21 device code load function 3 15 error messages 3 21 formatting a disk drive with 3 12 Initiator G 6 Installation controller into a shelf 3 25 firmware on target device 3 18 installing a cache module 3 44 power supplies into shelf 3 60 precautions 3 2 SBBs 3 64 second controller 3 30 SIMM cards 3 53 Installing new device firmware 3 15 Instance code HSJ50 Array Controller definitions A 2 L Last fail code code load code patch utility CLCP A 90 Last fail codes CLI A 72 Clone unit utility CLONE A 89 common library A 67 device configuration utilities CONFIG CFMENU A 89 diagnostics and utilities protocol server A 84 disk and tape MSCP server A 80 disk in line exerciser DILX A 86 DUART services A 67 executive services A 42 facility lock manager A 71 Failover control A 68 fault manager A 64 format and device code load utility HSUTIL A 89 host interconnec
23. Service Manual HSJ50 Array Controller Appendix A Instance Code 0257000A 0258000A 0259000A 025A000A 025B000A 025C000A 025D000A 025E0064 03010101 03022002 HSJ50 Array Controller A 23 Explanation An attempt to reassign a bad disk block failed The contents of the disk block is lost The Information field of the Device Sense Data contains the block number of the first block in error The command was aborted prior to completion The Information field of the Device Sense Data contains the block number of the first block in error The write operation failed because the unit is hardware write protected The Information field of the Device Sense Data contains the block number of the first block in error The command failed because the unit became inoperative prior to command completion The Information field of the Device Sense Data contains the block number of the first block in error The command failed because the unit became unknown to the controller prior to command completion The Information field of the Device Sense Data contains the block number of the first block in error The command failed because of a unit media format error The Information field of the Device Sense Data contains the block number of the first block in error The command failed for an unknown reason The Information field of the Device Sense Data contains the block number of the first block in error The mirrorset unit a
24. fields are undefined Received SCS DISCONNECT_REQ on a connection that is no longer valid Note that in this instance if the connection id field is zero the content of the vcstate remote node name remote connection id and connection state fields are undefined Received SCS DISCONNECT_RSP ona connection that is no longer valid Note that in this instance if the connection id field is zero the content of the vcstate remote node name remote connection id and connection state fields are undefined Received SCS CREDIT_REQ on a connection that is no longer valid Note that in this instance if the connection id field is zero the content of the vcstate remote node name remote connection id and connection state fields are undefined Received SCS CREDIT_RSP on a connection that is no longer valid Note that in this instance if the connection id field is zero the content of the vcstate remote node name remote connection id and connection state fields are undefined Received SCS APPL_MSG on a connection that is no longer valid Note that in this instance if the connection id field is zero the content of the vcstate remote node name remote connection id and connection state fields are undefined A 39 Service Manual A 40 Appendix A Instance Code Explanation 4064020A Received an unrecognized SCS message Note that in this instance if the connection id field is zero the content of the vcstate remote node na
25. for yes. Wait for the following text to appear on the operating controller's console:
        Attempting to quiesce all ports
        Port 1 quiesced
        Port 2 quiesced
        Port 3 quiesced
        Port 4 quiesced
        Port 5 quiesced
        Port 6 quiesced
        All ports quiesced
        Insert the other HSJ50 WITHOUT its program card, and press Return.
    6. If necessary, install the cache module. Slide the module straight in along the rails and then push firmly to seat it in the backplane. See Figure 2-5.
    Figure 2-5 Installing the new cache and controller module (CXO-5324A-MC)
    7. Slide the module straight in along the rails and then push firmly to seat it in the backplane. See Figure 2-5.
    Caution: Do not overtighten the controller's front panel captive screws, the cache module's front panel captive screws, or the ECB cable captive screws. Damage to the controller PC board or front panel, the cache module front panel, or the battery SBB may result.
    8. Tighten the two front panel captive screws on the controller and cache module.
    9. Reconnect the CI cable to the controller's front panel.
    10. Reconnect the open end of the ECB cable to the cache module. Tighten the ECB cable captive screws. See Figure 2-6.
    Figure 2-6 Reconnecting the ECB cable
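The console transcript in this excerpt can be checked mechanically before the second controller is inserted. The following is a minimal sketch, assuming the console text has been captured into a string and that the subsystem has six device ports as in the transcript. The function name `all_ports_quiesced` and the `expected_ports` parameter are hypothetical and are not part of any HSJ50 utility.

```python
import re

def all_ports_quiesced(console_text, expected_ports=6):
    """Return True only if every expected port and the summary line were reported."""
    # Collect the port numbers that appear in "Port N quiesced" lines.
    reported = {int(n) for n in re.findall(r"Port (\d+) quiesced", console_text)}
    has_summary = "All ports quiesced" in console_text
    return has_summary and reported == set(range(1, expected_ports + 1))

# Sample transcript built from the console text quoted in the procedure.
sample = "\n".join(
    ["Attempting to quiesce all ports"]
    + [f"Port {n} quiesced" for n in range(1, 7)]
    + ["All ports quiesced"]
)
print(all_ports_quiesced(sample))
```

A transcript that is missing any one of the per-port lines fails the check, which is the case worth catching before a module is inserted.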
26. removing 5 6 replacing controllers 2 3 replacing in a dual redundant configuration 2 22 replacing in a single configuration 2 15 warm swap 2 3 Copying firmware from host to subsystem 3 17 CRC G 3 Creating the firmware source in your subsystem 3 17 D Data center cabinet G 3 DDL G 3 DECevent 1 15 Deleting cache modules 5 6 controllers 5 6 storage devices 5 10 Device tape drives 4 10 Device code load function 3 15 Devices CD ROM drive 4 10 disk drives 4 10 installing new firmware on 3 15 moving 4 10 removing 5 10 Differential SCSI bus G 3 DILX 1 22 G 3 advanced disk test 28 running a disk test 1 25 Disk drives Service Manual Index configuring 3 17 formating 3 12 installing new firmware on 3 15 removing 5 10 Drives formatting 3 12 installing new firmware on 3 15 removing 5 10 DSA G 3 DSSI G 3 Dual redundant configuration G 4 DUART G 4 DUP G 4 DWZZA G 4 E ECB G 4 ECC G 4 EDC G 4 Electrostatic discharge See ESD See ESD ERF 1 12 Error logging HSJ series 1 12 1 15 ESD G 4 guidelines 2 2 handling components 3 2 precautions 3 2 protection from 2 2 wrist strap 3 2 F Failedset G 4 Failover G 5 Fault indications SW300 cabinet 3 60 SW500 and SW800 single power supply 3 61 Fault indicators SW500 and SW800 dual power supply 3 62 Fault isolation 1 3 Fault management utility 1 19 Field replaceable units
27. which signals that nothing is blocked. Last Failure Parameter 0 contains the port number (1 to n) that we were waiting on to be unblocked.
    200E0101: While traversing the structure of a unit, a config_info node was discovered with an unrecognized structure type. Last Failure Parameter 0 contains the structure type number that was unrecognized.
    200F0101: A config_info node was discovered with an unrecognized structure type. Last Failure Parameter 0 contains the structure type number that was unrecognized.
    20100101: A config_node of type VA_MA_DEVICE had an unrecognized SCSI device type. Last Failure Parameter 0 contains the SCSI device type number that was unrecognized.
    20120101: While traversing the structure of a unit, a config_info node was discovered with an unrecognized structure type. Last Failure Parameter 0 contains the structure type number that was unrecognized.
    20130101: While traversing the structure of a unit, the device was of an unrecognized type. Last Failure Parameter 0 contains the SCSI device type that was unrecognized.
    20150100: On SCSI failover, both controllers must be restarted for failover to take effect. This is how this controller is restarted in COPY=OTHER.
    20160100: Unable to allocate resources needed for the CLI local program.
    20180010: User requested this controller's parameters to be set to initial configuration state.
28. [Sample device subdisplay output: columns of per-device counters, not recoverable from this excerpt.]
    Description: This subdisplay shows the status of the physical storage devices that are known to the controller firmware. It also shows I/O performance information and bus statistics for these devices. Up to 42 devices can be displayed in this subdisplay.
    The PTL column contains a letter indicating the type of device, followed by the SCSI Port, Target, and LUN of the device. The list is sorted by port, target, and LUN. The following device type letters may appear:
        D indicates a disk device.
        T indicates a tape device.
        L indicates a media loader.
        C indicates a CD-ROM device.
        F indicates a device type not listed above.
        U indicates the device type is unknown.
    The ASWF columns indicate the allocation, spindle state, write-protect state, and fault state, respectively, of the device. The allocation state is indicated using the following letters:
        A: Allocated to this controller.
        a: Allocated to the other controller.
        U: Unallocated, but owned by this controller.
        u: Unallocated, but owned by the other controller.
        A space in this column indicates the allocation is unknown.
    The spindle state is indicated using the following char
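The PTL and allocation letters described in this excerpt map naturally onto lookup tables. The sketch below is illustrative only: it assumes a PTL entry is written as one type letter followed by single-digit port, target, and LUN values (for example `D130`), which the excerpt does not state explicitly. The names `parse_ptl` and `sort_key` are hypothetical.

```python
# Device type letters and allocation letters as listed in the subdisplay description.
DEVICE_TYPES = {
    "D": "disk", "T": "tape", "L": "media loader",
    "C": "CD-ROM", "F": "device type not listed", "U": "unknown",
}
ALLOCATION = {
    "A": "allocated to this controller",
    "a": "allocated to the other controller",
    "U": "unallocated, owned by this controller",
    "u": "unallocated, owned by the other controller",
    " ": "allocation unknown",
}

def parse_ptl(entry):
    """Split a PTL entry such as 'D130' into (type, port, target, lun).

    Assumption: one type letter followed by single-digit port, target, and LUN.
    """
    letter, digits = entry[0], entry[1:]
    if letter not in DEVICE_TYPES or len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"unrecognized PTL entry: {entry!r}")
    port, target, lun = (int(d) for d in digits)
    return DEVICE_TYPES[letter], port, target, lun

def sort_key(entry):
    # The subdisplay is sorted by port, target, and LUN, not by device type.
    return parse_ptl(entry)[1:]

entries = ["T220", "D130", "C110"]
print(sorted(entries, key=sort_key))
print(ALLOCATION["a"])
```

Note that the allocation letters are case-sensitive: uppercase refers to this controller and lowercase to the other controller.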
29. 2ZG61000012 In dual redundant configuration
        SCSI address 7
        Time: 15-JUN-1996 16:32:54
    Host port:
        Node name: HSJA1, valid CI node 21, 32 max nodes
        Path A is on
        Path B is on
        MSCP allocation class 3
        TMSCP allocation class 3
        CI_ARBITRATION = SYNCHRONOUS
        MAXIMUM_HOSTS = 9
        NOCI_4K_PACKET_CAPABILITY
    Cache:
        128 megabyte write cache, version 3
        Cache is GOOD
        Battery is good
        No unflushed data in cache
        CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
        CACHE_POLICY = B
        NOCACHE_UPS
    Replacing one dual-redundant controller and write-back cache module
    When you replace one dual-redundant controller module using the following instructions, device service is interrupted for the duration of the service cycle. Use this procedure when you can't warm swap the controller and/or cache module.
    Required tools
    The tools listed in Table 2-4 are required for the replacement of controllers.
    Table 2-4 Required tools for controller replacement
        Maintenance terminal: To shut down controllers, restart controllers, execute CLI commands, and invoke utilities.
        ESD wrist strap and ESD mat: To protect all equipment against ESD.
        Small flat-head screwdriver: To loosen CI connector and front bezel captive screws.
        5/32-inch Allen wrench: To unlock the SW800-series cabinet.
30. Use the following procedure to install solid-state disk, CD-ROM, and optical drives. To install the device, power must be removed from the shelf.
    1. Halt all I/O activity using the appropriate procedures for your operating system.
    2. Connect a maintenance terminal to one of the controllers.
    3. At the CLI prompt, enter:
        CLI> SHUTDOWN OTHER_CONTROLLER
        CLI> SHUTDOWN THIS_CONTROLLER
    4. To ensure that the controller has shut down cleanly, check for the following indications on the controller's OCP (operator control panel): The Reset light is on continuously. Port lights 1, 2, and 3 are also on continuously.
    5. Remove the power cords from the shelf power supplies in which you will install the solid-state drive.
    6. Insert the device into the shelf.
    7. Reconnect the power cord to the shelf power supply.
    8. Reinitialize the controllers by pressing the Reset button on both controllers.
    9. Observe the status indicators for the following conditions.
    10. Ensure that the amber status indicator is off.
    4 Moving Storagesets and Devices
        Moving storagesets
        Moving storageset members
        Moving single disk drive units
        Moving devices
    Precautions
    If you're moving a storageset or device that contains data you want to keep:
        Make sure that the controller is functioning properly; its green LED shoul
31. Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
    The controller shelf is reporting a problem. This could mean one or both of the following: If the shelf is using dual power supplies, one power supply has failed. One of the shelf cooling fans has failed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
    Failover Control detected a receive packet sequence number mismatch. The controllers are out of synchronization with each other and are unable to communicate. Note that in this instance the Last Failure Code and Last Failure Parameters fields are undefined.
    Failover Control detected a transmit packet sequence number mismatch. The controllers are out of synchronization with each other and are unable to communicate. Note that in this instance the Last Failure Code and Last Failure Parameters fields are undefined.
    Failover Control received a Last Gasp message from the other controller. The other controller is expected to restart itself within a given time period. If it does not, it will be held reset with the Kill line.
    Instance Code
        07060C01
        07070C01
        07080B0A
        40016001: CI A/B transmit cables are crossed.
        40026001: CI A/B receive cables are crossed.
        4003640A
        4004020A: Host Interconnect
32. Controller:
        HSJ50 ZG34901786 Firmware V05.0-0, Hardware BX11
        Configured for dual-redundancy with 2G61000012
        In dual redundant configuration
        SCSI address 7
        Time: 15-JUN-1995 16:32:54
    Host port:
        Node name: HSJA1, valid CI node 21, 32 max nodes
        Path A is on
        Path B is on
        MSCP allocation class 3
        TMSCP allocation class 3
        CI_ARBITRATION = SYNCHRONOUS
        MAXIMUM_HOSTS = 9
        NOCI_4K_PACKET_CAPABILITY
    Cache:
        128 megabyte write cache, version 3
        Cache is GOOD
        Battery is good
        No unflushed data in cache
        CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
        CACHE_POLICY = B
        NOCACHE_UPS
    26. Use the SHOW OTHER_CONTROLLER command to check the capacity of the second cache module:
        CLI> SHOW OTHER_CONTROLLER
    The other controller will report the same information.
    Installing power supplies
    This procedure may be used to install a power supply into an SBB shelf or into a controller shelf.
    Power supply and shelf LED status indicators
    Each power supply has two LED indicators that display the power supply status. The upper LED is the common power supply status. The lower LED is the power supply status indicator. Table 3-12 shows the possible fault indications for an SW300 cabinet.
    Table 3-12 Power supply status indicators (columns: When the LED Display is / The RAID Shelf Power Status is)
        All the power supplies on the associated power bus are functioning. This power suppl
33. DATE TIME 28 APR 1994 11 39 40 33 SYS_TYPE 00000003 SYSTEM UPTIME 0 DAYS 00 01 41 SCS NODE MTX2 OpenVMS AXP X6 1 FT7 HW_MODEL 00000401 Hardware Model 1025 ERLSLOGMESSAGE ENTRY DEC 7000 Model 610 I O SUB SYSTEM UNIT _MATSDUA450 MESSAGE TYPE 0001 DISK MSCP MESSAGE MSLG L_CMD_REF 00000000 MSLGSW_UNIT 01C2 UNIT 450 MSLGS W_SEQ_NUM 0015 SEQUENCE 21 MSLGS B_FORMAT 02 DISK TRANSFER LOG MSLG B_FLAGS 00 UNRECOVERABLE ERROR MSLGSW_EVENT 014B DRIVE ERROR HSJ50 Array Controller Service Manual MSLG Q_CNT_ID MSLGS B_CNT_SVR 14 MSLGSB_CNT_HVR 49 MSLGSW_MULT_UNT 0035 MSLG Q_UNIT_ID 02FF0000 MS MS MS MS MS MS Service Manual LGS LG LGS LGS LG LGS B_ Bi B_ B_ Li UNIT_SVR 01 UNIT_HVR 43 LEVEL 01 RETRY 00 VOL_SER 00000000 _HDR_CODE 00000000 01280009 40802576 00000022 Troubleshooting CNTRLR DETECTED PROTOCOL ERROR UNIQUE IDENTIFIER 000940802576 X MASS STORAGE CONTROLLER MODEL 40 CONTROLLER SOFTWARE VERSION 20 CONTROLLER HARDWARE REVISION 73 UNIQUE IDENTIFIER 000000000022 X DISK CLASS DEVICE 166 HSXnn UNIT SOFTWARE VERSION 1 UNIT HARDWARE REVISION 67 VOLUME SERIAL 0 LOGICAL BLOCK 0 GOOD LOGICAL SECTOR CONTROLLER DEPENDENT INFORMATION LONGWORD 1 LONGWORD 2 LONGWORD 3 LONGWORD 4 LONGWORD 5 LONGWORD 6 LONGWORD 7 LONGWORD 8 LONGWORD 9 LONGWORD 10 LONGWORD 11 LONGWORD 12
34. DRAB detected a Nonexistent Memory Error condition during a Host port attempt to write CACHEA1 memory.
    The CACHEA1 DRAB detected a Nonexistent Memory Error condition during a Host port attempt to write a byte to CACHEA1 memory.
    The CACHEA1 DRAB detected a Nonexistent Memory Error condition during a Host port attempt to read CACHEA1 memory.
    The CACHEA1 DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write CACHEA1 memory.
    The CACHEA1 DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write a byte to CACHEA1 memory.
    Instance Code and Explanation:
    01772D02: The CACHEA1 DRAB detected a Nonexistent Memory Error condition during a Device port attempt to read CACHEA1 memory.
    01782D02: The CACHEA1 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write CACHEA1 memory.
    01792D02: The CACHEA1 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write a byte to CACHEA1 memory.
    017A2D02: The CACHEA1 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to read CACHEA1 memory.
    017B2E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write CACHEB0 memory.
    017C2E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write a byte to CACHEB0 memory.
    017D2E02: The CACHEB0 DRAB detected a Nonex
35. DRAB detected an Address Parity error during a Host port attempt to read CACHEB1 memory.
    01AF3102: The CACHEB1 DRAB detected an Address Parity error during a Device port attempt to read CACHEB1 memory.
    01B03102: The CACHEB1 DRAB detected an Address Parity error during an i960 attempt to read CACHEB1 memory.
    01B13702: The Master DRAB unexpectedly reported an Address Parity error.
    01B23702: The CACHEA0 DRAB unexpectedly reported an Address Parity error.
    01B33702: The CACHEA1 DRAB unexpectedly reported an Address Parity error.
    01B43702: The CACHEB0 DRAB unexpectedly reported an Address Parity error.
    01B53702: The CACHEB1 DRAB unexpectedly reported an Address Parity error.
    01B63202: The Master DRAB detected an Ibus Parity Error during an i960 ID/Cache access attempt.
    01B73202: The Master DRAB detected an Ibus Parity Error during an i960 buffer memory access attempt.
    01B83202: The Master DRAB detected an Ibus Parity Error during an i960 buffer memory access attempt with a simultaneous but unrelated CACHExn memory access.
    01B93202: The Master DRAB detected an Ibus Parity Error during an i960 CACHEA memory access with a simultaneous but unrelated buffer memory access.
    01BA3202: The Master DRAB detected an Ibus Parity Error during an i960 CACHEB memory access with a simultaneous but unrelated buffer memory access.
    01BB3202: The M
36. Enter record count (1:4294967295) [4096] ? 1000
    If you want to test data integrity, set the percentage of read and write commands that will have a data compare operation performed.
        Perform data compare (y/n) [n] ? y
        Enter compare percentage (1:100) [2] ? 1
    The system displays a list of all tape drive units (by unit number) that you can choose for TILX testing. Select the first tape drive that you want to test. Do not include the letter "T" in the unit number.
        Enter unit number to be tested ? 350
    Check to be sure that a tape is loaded in the drive, then answer yes to the next question.
        Is a tape loaded and ready, answer Yes when ready ? Y
    14. TILX indicates whether it has been able to allocate the tape drive. If you want to test more tape drives, enter the unit numbers when prompted. Otherwise, enter "n" to start the test.
        Select another unit (y/n) [n] ? n
        TILX testing started at <date> <time>
        Test will run for <nn> minutes
    15. TILX will run for the amount of time that you selected and then display the results of the testing. If you want to interrupt the test early:
        Type ^G (Control-G) to get a performance summary without stopping the test (^T if you are running TILX through VCS).
        Type ^C to terminate the current TILX test.
        Type ^Y to terminate the current test and exit TILX.
    Running an advanced tape drive test
    This section prov
37. In addition, you will see one of the following error code messages:
    Message: TILX detected error code 1 Illegal Data Pattern Number found in data pattern header Unit x
    Explanation: TILX read data from the tape and found that the data was not in a pattern that TILX previously wrote to the tape.
    Message: TILX detected error code 2 No write buffers correspond to data pattern Unit x
    Explanation: TILX read a legal data pattern from the tape at a place where TILX wrote to the tape, but TILX does not have any write buffers that correspond to the data pattern. Thus, the data has been corrupted.
    Message: TILX detected error code 3 Read data do not match write buffer Unit x
    Explanation: TILX writes data to the tape and then reads it and compares it against what was written. This indicates a compare failure. More information is displayed to indicate where in the data buffer the compare operation failed and what the data was and should have been.
    Message: TILX detected error code 4 TILX TAPE record size mismatch Unit x
    Explanation: The size of the record that TILX read from the tape did not match the size of the record that was written on the previous write pass.
    Message: TILX detected error code 5 A tape mark was detected in a place where it was not expected Unit x
    Explanation: TILX encountered a tape mark where it was not expected. The tape
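When TILX output is captured to a log, the five error codes in this excerpt can be looked up mechanically. The sketch below is a hedged illustration: the message texts come from the excerpt, but the exact punctuation of real TILX output is not preserved in this source, so the pattern only relies on the words "TILX detected error" and "code N". The names `TILX_ERRORS` and `parse_tilx_error` are hypothetical.

```python
import re

# Message texts as listed in the excerpt, keyed by error code.
TILX_ERRORS = {
    1: "Illegal Data Pattern Number found in data pattern header",
    2: "No write buffers correspond to data pattern",
    3: "Read data do not match write buffer",
    4: "TILX TAPE record size mismatch",
    5: "A tape mark was detected in a place where it was not expected",
}

def parse_tilx_error(line):
    """Return (code, description) for a TILX error line, or None if no code is found."""
    # Tolerate an optional comma, since the original punctuation is unknown.
    match = re.search(r"TILX detected error,? code (\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, TILX_ERRORS.get(code, "code not listed in this excerpt")

print(parse_tilx_error("TILX detected error code 3 Read data do not match write buffer Unit 350"))
```

Codes outside 1 through 5 are reported as not listed rather than guessed, because the excerpt breaks off after code 5.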
38. CLI> SHOW THIS_CONTROLLER
    10. After the controller has initialized, connect the CI cable to the new controller and tighten the captive screws.
    11. Enable the path by entering the following commands:
        CLI> SET THIS_CONTROLLER PATH_A
        CLI> SET THIS_CONTROLLER PATH_B
    Installing a cache module
    This procedure contains information for installing a cache module. The controller module is seated in front of the cache module. Any time you add a cache module, the controller module has to be removed first. Service to the devices is interrupted during the installation procedure.
    Required tools
    The tools listed in Table 3-9 are required for the installation of a cache module.
    Table 3-9 Required tools for cache module replacement
        Maintenance terminal: To set controller parameters.
        ESD wrist strap: To protect all equipment against electrostatic discharge.
        ESD mat: To protect the controller and cache modules.
        5/32-inch Allen wrench: To unlock the SW800-series cabinet.
        Flat-head screwdriver: To loosen controller mounting screws and to disconnect the CI cable.
    Install a cache module (single controller configuration)
    The following procedure describes how to install a write-back cache module in a single controller configuration. For dual-redundant configurations, use the "Adding a second controller using C_Swap" procedure.
    Removing the controlle
39. Appendix A. Last Fail Code and Explanation (codes 02AD0180 and 02AE0100 are also listed on this page):
    02A30100: No available data buffers. If the cache module exists, then this is true after testing the whole cache. Otherwise, there were no buffers allocated from BUFFER memory on the controller module.
    02A40100: A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when allocating VAXDs.
    02A50100: A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when allocating DILPs.
    02A60100: A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when allocating Change State Work Items.
    02A70100: A call to EXEC$ALLOCATE_MEM_ZEROED failed to return memory when allocating VA Request Items.
    02A80000: Controller is being rebooted as a result of a CLI CLEAR INVALID_CACHE command being issued on the other controller.
    02A90100: Too many pending FOC$SEND requests by the Cache Manager. Code is not designed to handle more than one FOC$SEND to be pending, because there's no reason to expect more than one pending.
    02AA0100: An invalid call was made to CACHE$DEALLOCATE_CLD. Either that device had dirty data or it was bound to a RAIDset.
    02AB0100: An invalid call was made to CACHE$DEALLOCATE_SLD. A RAIDset member either had dirty data or write-back already turned on.
    02AC0100: An invalid call was made to CACHE$DEALLOCATE_SLD. The RAIDset still has data strip nodes
40. Services detected protocol error upon validating a received packet.
        4007640A
        4009640A: CI Port detected bad path A upon attempting to transmit a packet.
        400A640A: CI Port detected bad path B upon attempting to transmit a packet.
        400E640A: CI Port detected bad path B upon attempting to transmit a packet.
        400F640A: Host Interconnect Services detected packet sequence number mismatch.
    Explanation
    Failover Control detected that both controllers are acting as SCSI ID 6. Since IDs are determined by hardware, it is unknown which controller is the real SCSI ID 6. Note that in this instance the Last Failure Code and Last Failure Parameters fields are undefined.
    Failover Control detected that both controllers are acting as SCSI ID 7. Since IDs are determined by hardware, it is unknown which controller is the real SCSI ID 7. Note that in this instance the Last Failure Code and Last Failure Parameters fields are undefined.
    Failover Control was unable to send keep-alive communication to the other controller. It is assumed that the other controller is hung or not started. Note that in this instance the Last Failure Code and Last Failure Parameters fields are undefined.
    CI Port detected a Dual Receive condition that resulted in the closure of the Virtual Circuit. This error condition will be eliminated in a future CI interface chip.
    CI Port detected error upon attempting to transmit a packet. This resulted in the closure of the Virtua
41. TILX is running, specify how often TILX should display the summaries.
        Enter performance summary interval in minutes (1:65535) [10] ? 5
    The normal TILX summary simply indicates whether it detected any errors on each unit. Additionally, you can choose to see statistics on how many operations were performed and how much data was transferred during the test.
        Include performance statistics in performance summary (y/n) [n] ? y
    TILX asks if you want hard and soft errors (sense data and deferred errors) displayed. If you do, answer "y" and respond to the rest of the questions. If you don't want to see the errors displayed, answer "n" and proceed to the next step.
        Display hard/soft errors ? y
        Display hex dump of Error Information Packet Requester Specific information (y/n) [n] ? y
    When the hard error limit is reached, the unit will be dropped from testing.
        Enter hard error limit (1:65535) [32] ? 100
    When the soft error limit is reached, soft errors will no longer be displayed, but testing will continue for the unit.
        Enter soft error limit (1:65535) [32] ? 32
    Set the maximum number of outstanding I/Os for each unit.
        Set the I/O queue depth (1:12) [4] ? 6
    Suppress caching.
        Suppress caching (y/n) [n] ? n
    9. Run the basic function test.
        *** Available tests are:
        1. Basic Function
        2. User Defined
42. Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
    The SWAP interrupt from the shelf indicated by the port field cannot be cleared. All SWAP interrupts from all ports will be disabled until corrective action is taken. When SWAP interrupts are disabled, neither controller front panel button presses nor removal or insertion of devices are detected by the controller. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
    The SWAP interrupts have been cleared and re-enabled for all shelves. Note that in this instance the Associated Port, Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
    Instance Code: 03F30064, 03F40064, 03F50402, 07030B0A, 07040B0A, 07050064
    Explanation
    An asynchronous SWAP interrupt was detected by the controller for the shelf indicated by the port field. Possible reasons for this occurrence include: device insertion or removal, shelf power failure, SWAP interrupts re-enabled. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.
    Device services had to reset the port to clear a bad condition. Note that in this instance the
43. The FX detected a compare error for data that was identical. This error has always previously occurred due to a hardware problem.
    The mirrorset member count and individual member states are inconsistent. Discovered during a mirrorset write or erase.
    Last Fail Code and Explanation:
    02AF0102: An invalid status was returned from VA$XFER in a write operation. Last Failure Parameter 0 contains the DD address. Last Failure Parameter 1 contains the invalid status.
    02B00102: An invalid status was returned from VA$XFER in an erase operation. Last Failure Parameter 0 contains the DD address. Last Failure Parameter 1 contains the invalid status.
    02B10100: A mirrorset read operation was received and the round-robin selection algorithm found no normal members in the mirrorset. Internal inconsistency.
    02B20102: An invalid status was returned from CACHE$LOCK_READ during a mirror copy operation. Last Failure Parameter 0 contains the DD address. Last Failure Parameter 1 contains the invalid status.
    02B80100: Invalid Cache Policy parameter to CACHE$POLICY_CHANGE.
    02B90100: Invalid code loop counter attempting to find the Cache ID Blocks.
    02BC0100: A mirrorset read operation was received and the preferred-member selection algorithm found no normal members in the mirrorset. Internal inconsistency.
    02BD0100: A mirrorset metadata online operation found no normal members in the mirrorset. Internal
44. a write within the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the event report.
    Journal SRAM backup battery failure; detected during system restart. The Memory Address field contains the starting physical address of the Journal SRAM.
    Instance Code and Explanation (codes 02082201, 02090064, 020A0064, 020B2201, and 020C2201 are also listed on this page):
    02042001: Journal SRAM backup battery failure; detected during periodic check. The Memory Address field contains the starting physical address of the Journal SRAM.
    02052301: A processor interrupt was generated by the CACHE Dynamic Ram controller and ArBitration engine (DRAB) with an indication that the CACHE backup battery has failed or is low (needs charging). The Memory Address field contains the starting physical address of the CACHEA0 memory.
    02062301: The CACHE backup battery has been declared bad. Either it failed testing performed by the cache diagnostics during system startup, or it was too low (insufficiently charged) for the expected duration. The Memory Address field contains the starting physical address of the CACHEA0 memory.
    02072201: The CACHE Dynamic Ram controller and ArBitration engine 0 (DRAB0) failed testing performed by the cache diagnostics. The Memory Address fiel
45. and cache module (CXO-5324A-MC)
    Caution: Do not overtighten the front panel captive screws. Damage to the controller PC board or front panel may result.
    Connect one end of the battery cable to the cache module and the other end to the ECB.
    17. Tighten the two front panel captive screws on the cache module and the two captive screws on the controller module.
    18. Tighten the ECB cable mounting screws.
    20. Remove the program card from each controller by pressing and holding in the Reset button, then pressing the eject button next to the program card. See Figure 3-26.
    Figure 3-26 Removing the program card (CXO-5323A-MC)
    21. Reconnect the power cords to the controller power supplies.
    22. Press and hold the Reset button on each controller while pushing in the program card.
    23. The controllers will initialize. When the Reset light on each controller flashes at a rate of once every second, the initialization process is complete.
    24. Snap the ESD covers into place over each program card. Push the pins inward to lock the covers in place.
    25. To check the cache capacity of the modules, attach a maintenance terminal to one of the controllers. At the CLI prompt, type:
        CLI> SHOW THIS_CONTROLLER
    The controller will report the following information
46. are known to the controller firmware. It also indicates performance information for the units. Up to 42 units can be displayed in this subdisplay.
    The Unit column contains a letter indicating the type of unit, followed by the unit number of the logical unit. The list is sorted by unit number. There may be duplication of unit numbers between devices of different types. If this happens, the order of these devices is arbitrary. The following device type letters may appear:
        D indicates a disk device.
        T indicates a tape device.
        L indicates a media loader.
        C indicates a CD-ROM device.
        F indicates a device type not listed above.
        U indicates the device type is unknown.
    The ASWC columns indicate, respectively, the availability, spindle state, write-protect state, and cache state of the logical unit. For HSZ controllers, "on line" in this column means that the unit is on line to the HSZ controller only. It does not indicate that the unit is mounted by the host. The availability state is indicated using the following letters:
        a: Available. Available to be mounted by a host system.
        d: Offline, Disabled by Digital Multivendor Customer Services. The unit has been disabled for service.
        e: Online, Exclusive Access. Unit has been mounted for exclusive access by a user.
        f: Offline, Media Format Error. The unit cannot be brought available due to a media format inconsistency
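The availability letters in the ASWC column can be translated with a small lookup. The sketch below covers only the four letters that appear in this excerpt (a, d, e, f) and reports any other letter as not covered rather than guessing its meaning. It assumes a unit entry is a type letter followed by the decimal unit number (for example `T350`); the names `describe_unit` and `rows` are hypothetical.

```python
# Unit type letters and the availability letters that appear in this excerpt.
UNIT_TYPES = {
    "D": "disk", "T": "tape", "L": "media loader",
    "C": "CD-ROM", "F": "device type not listed", "U": "unknown",
}
AVAILABILITY = {
    "a": "Available",
    "d": "Offline, Disabled by Digital Multivendor Customer Services",
    "e": "Online, Exclusive Access",
    "f": "Offline, Media Format Error",
}

def describe_unit(unit, availability_letter):
    """Decode a Unit entry and the first (availability) letter of the ASWC columns."""
    return {
        "type": UNIT_TYPES.get(unit[0], "unrecognized type letter"),
        "number": int(unit[1:]),
        "availability": AVAILABILITY.get(
            availability_letter, "letter not covered in this excerpt"
        ),
    }

# The display is sorted by unit number only; units of different types that share
# a number may appear in either order (a stable sort keeps the input order).
rows = [("T350", "a"), ("D450", "e"), ("D350", "f")]
rows.sort(key=lambda row: int(row[0][1:]))
for unit, letter in rows:
    print(unit, describe_unit(unit, letter))
```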
47. be disconnected from at least one controller for the duration of the procedure. If you are using a dual-redundant configuration, you may want to shut down one controller and use the surviving controller to service devices while you replace the cables on the shut-down controller.
    Required tools
    The tools listed in Table 2-11 are required for replacing CI host cables.
    Table 2-11 Required tools for CI host cable replacement
        Small flat-head screwdriver: To loosen captive screws.
        5/32-inch Allen wrench: To unlock the SW800-series cabinet.
    Replacing the internal CI cables
    Caution: Never leave the active CI host buses unterminated during the service cycle.
    1. Halt all I/O activity using the appropriate procedures for your operating system.
    2. Dismount all units using the procedures for your operating system.
    3. Disconnect the external CI cable from the star coupler and terminate. See Figure 2-31.
    4. Disconnect the CI cable from the controller host port.
    Figure 2-31 Disconnecting the internal CI cable (labels: internal CI cable, external cables; CXO-5319A-MC)
    5. Disconnect the internal CI cable (17-03427-01) from the external CI cables. See Figure 2-31.
    6. Remove the internal CI cable from the cabinet, cutting the tie wraps as necessary.
    7. Position and route the new CI cable within the cabi
48. be dropped from testing.
        Enter hard error limit (1:65535) [65535] ? 100
    When the soft error limit is reached, soft errors will no longer be displayed, but testing will continue for the unit.
        Enter soft error limit (1:65535) [32] ? 32
    Set the maximum number of outstanding I/Os for each unit.
        Set the I/O queue depth (1:12) [4] ? 9
    Run the user-defined test.
        *** Available tests are:
        1. Basic Function
        2. User Defined
        Use the Basic Function test 99.9% of the time. The User Defined test is for special problems only.
        Enter test number (1:2) [1] ? 2
    Caution: If you define write or erase commands, user data will be destroyed.
    Define the test sequence by entering each command number and its associated parameters. You may define up to 20 commands, and they will be executed in the order in which you enter them.
        Enter command number 1 (read, write, access, erase, quit) ? read
        Enter starting lbn for this command ? 0
        Enter the IO size in 512 byte blocks for this command (1:128) ? 20
        Enter in HEX, the MSCP Command Modifiers [0] ? 0
    Repeat the previous step until you have defined the entire command sequence (up to 20). When you have finished entering commands, type "quit".
    The system displays a list of all single-disk units (by unit number) that you can choose for DILX testing. Select the first disk that you want to tes
49. but no error conditions are indicated in the available DRAB registers The Master DRAB registers content is supplied 01292602 The Master DRAB detected a Cache Time out condition during an access attempt to a CACHEAO CACHEA1 CACHEBO or CACHEB1 DRAB registers region or memory region The addressed region failed to respond 012A3702 The CACHEAO DRAB unexpectedly reported a Cache Time out condition 012B3702 The CACHEA1 DRAB unexpectedly reported a Cache Time out condition 012C3702 The CACHEBO DRAB unexpectedly reported a Cache Time out condition 012D3702 The CACHEB1 DRAB unexpectedly reported 37 a Cache Time out condition Service Manual HSJ50 Array Controller 20 7 3 26 37 37 37 0122330A An error condition detected by one of the 3 CACHE DRABs that requires supplemental information has been reported in another event report This event report contains the Master DRAB and Diagnostic registers content associated with that initial event report Appendix A Instance Code 012E2702 HSJ50 Array Controller Explanation The Master DRAB detected an Nbus Transfer Error Acknowledge TEA condition This means the TEA signal was asserted by an Nbus device during an 1960 to Nbus device transaction The CACHEAO DRAB unexpectedly reported an Nbus Transfer Error Acknowledge condition The CACHEA1 DRAB unexpectedly reported an Nbus Transfer Error Acknowledge condition The CACHEBO DRAB unexpectedly reporte
50. complete compliance with country-specific standards (for example, FCC, TUV, and so forth) and with all Digital standards.
    quiesce: To make a bus inactive or dormant. The operator must quiesce SCSI bus operations, for example, during a device warm swap.
    RAID: Redundant array of independent disks. The multiple storage access methods devised for performance (RAID 0, striping) and/or various cost levels of availability (RAID 1 through RAID 5).
    RAIDset: Three or more physical disks that are configured to present an array of disks as a single virtual unit to the host.
    read cache: The cache used to accelerate read operations by retaining data that has been previously read, written, or erased, based on a prediction that it will be reread.
    replacement policy: The method by which a spare disk is selected to replace a disk that has failed in a RAIDset.
    SBB: StorageWorks building block. A modular carrier plus the individual mechanical and electromechanical interface required to mount it into a standard StorageWorks shelf. Any device conforming to shelf mechanical and electrical standards is considered an SBB.
    SBB shelf: StorageWorks building block shelf. A StorageWorks shelf, such as the BA350-Sx, designed to house plug-in SBB modules.
    SCS: System communication services. A delivery protocol for packets of information (commands or data) to or from the host.
    SCSI
51. controller to shutdown 080A0000 The other controller requested this controller to selftest 080B0100 Could not get enough memory to build a FCB 0l to send to the remote routines on the other controller 080C0100 Could not get enough memory for FCBs to receive information from the other controller 080D0100 Could not get enough memory to build a FCB to reply to a request from the other controller 080E0101 An out of range receiver ID was received by Ol the NVFOC communication utility master send to slave send ACK Last Failure Parameter 0 contains the bad id value HSJ50 Array Controller Service Manual 01 01 01 01 01 01 01 00 00 00 01 01 A 70 Appendix A 080F0101 An out of range receiver ID was received by 1 the NVFOC communication utility received by master Last Failure Parameter 0 contains the bad id value 08100101 A call to NVFOC TRANSACTION had a 1 from field id that was out of range for the NVFOC communication utility Last Failure Parameter 0 contains the bad id value 08110101 NVFOC tried to defer more than one FOC 1 send Last Failure Parameter 0 contains the master ID of the connection that had the multiple delays 08120100 Unable to lock other controller s NVmemory 1 despite the fact that the running and handshake_complete flags are set 08130100 Could not allocate memory to build a callback 1 context block on an unlock NVmemory call 08140100 Could not allocate memory to build a
52. device could be disabled.
        NOTE: In order to minimize the possibility of a SCSI bus reset, which could disable the destination device, it is recommended that you prevent non-HSUTIL IO operations to all other devices on the same port as the destination device.
        After you answer the next question, the code load will start.
        Do you want to continue (y/n) [n] ? Y
        HSUTIL is doing code load. Please be patient.
        Device code has been successfully downloaded to device TAPE100
        HSUTIL Normal Termination at 14-JUN-1996 16:31:09
    HSUTIL abort codes
    If HSUTIL terminates before it formats a disk drive or installs new firmware, it reports one of the abort codes in Table 3-1.
    Table 3-1 Abort codes
        FAO returned either FAO_BAD_FORMAT or FAO_OVERFLOW
        Bad return from TS$READ_TERMINAL_DATA
        TS$READ_TERMINAL_DATA returned either an ABORTED or INVALID_BYTE_COUNT
        User requested an abort via ^Y or ^C
        An error occurred on a SCSI command
        Can't find the PUB; device is probably missing
    HSUTIL messages
    HSUTIL may produce one or more of the following messages while you're formatting disk drives or installing new firmware. Many HSUTIL messages have been omitted from this section since they are self-explanatory.
    Message: Insufficient resources
    Explanation: HSUTIL cannot find or perform the operation because internal controller resources
53. device is connected to the controller with an incorrect cable Follow repair action 08 A controller hardware failure Follow repair action 20 Determine which blower failed and replace it Replace the power supply documentation There are four possible problem sources 1 Total power supply failure on a shelf Follow repair action 09 A device inserted into a shelf that has a broken internal A 91 Service Manual A 92 Service Manual Appendix A Repair Action Action to take Code The other controller in a dual redundant configuration has been reset with the Kill line by the controller that reported the event To restart the Killed controller enter the CLI command RESTART OTHER on the Surviving controller and then depress the RESET button on the Killed controller If the other controller is repeatedly being Killed for the same or a similar reason follow repair action 20 Both controllers in a dual redundant configuration are attempting to use the same SCSI ID either 6 or 7 as indicated in the event report Note The other controller of the dual redundant pair has been reset with the Kill line by the controller that reported the event Two possible problem sources are indicated 1 A controller hardware failure 2 A controller backplane failure First follow repair action 20 for the Killed controller If the problem persists then follow repair action 20 for the Surviving controller If the pr
54. disk drives
    Table 2-9 Required tools for SBB replacement
    Table 2-10 Required tools for replacing solid state disk drives
    Table 2-11 Required tools for CI host cable replacement
    Table 2-12 Required tools for replacing SCSI device cables
    Table 3-1 Abort codes
    Table 3-2 Required tools for controller installation
    Table 3-3 Controller installation guide
    Table 3-4 ECB status indications
    Table 3-5 Required tools
    Table 3-6 Controller installation guide
    Table 3-7 ECB status indicators
    Table 3-8 Required tools for adding a second controller
    Table 3-9 Required tools for cache module replacement
    Table 3-10 Required tools
    Table 3-11 Add
55. downloaded with a single write buffer command y n y Explanation This message is displayed if HSUTIL detects that an unsupported device has been selected as the target device You must indicate whether to download the firmware image to the device in one or more contiguous blocks each corresponding to one SCSI Write Buffer command Installing a controller and cache module single controller configuration Use the following procedure to install a controller and its power supplies into an empty controller shelf for the first time Required tools The tools listed in Table 3 2 are required for the installation of controllers Table 3 2 Required tools for controller installation Maintenance terminal To set controller parameters ESD wrist strap To protect all equipment against electrostatic discharge 5 32 inch Allen wrench To unlock the SW800 series cabinet Flat head screwdriver To loosen controller mounting screws and to disconnect the CI cable 1 Using Table 3 3 as a guide determine the slot and the SCSI ID into which the controller is to be installed Note that the first controller should be installed in the slot that corresponds to SCSI ID 7 Table 3 3 Controller installation guide Controller SW800 Sws00 Sw500 Front View Rear View Font amp Rear View First Right Side Left Side Top Slot Bottom Slot Controller SCSI ID 7 SCSI ID 7 SCSI ID 7 SCSI ID 7 Second Left Side Right Side Bottom Slot Top Slot Control
56. drive may not be able to accurately position the tape.
    Message: TILX detected error code 7: EOT encountered in unexpected position. Unit x
    Explanation: The tape reached EOT before TILX expected it. The tape drive may not be able to accurately position the tape.
    TILX data patterns
    Table 1-4 defines the data patterns used with the TILX Basic Function or User Defined tests. There are 18 unique data patterns. These data patterns were selected as worst case, or the ones most likely to produce errors on tape drives connected to the controller.
    Table 1-4 TILX data patterns
    Pattern Number            Pattern in Hexadecimal Numbers
    5 (shifting 1s)           0001 0003 0007 000F 001F 003F 007F 00FF 01FF 03FF 07FF 0FFF 1FFF 3FFF 7FFF
    6 (shifting 0s)           FFFE FFFC FFFC FFFC FFE0 FFE0 FFE0 FFE0 FE00 FC00 F800 F000 F000 C000 8000 0000
    7 (alternating 1s, 0s)    0000 0000 0000 FFFF FFFF FFFF 0000 0000 FFFF FFFF 0000 FFFF 0000 FFFF 0000 FFFF
                              5555 5555 5555 AAAA AAAA AAAA 5555 5555 AAAA AAAA 5555 AAAA 5555 AAAA 5555 AAAA 5555
                              2D2D 2D2D 2D2D D2D2 D2D2 D2D2 2D2D 2D2D D2D2 D2D2 2D2D D2D2 2D2D D2D2 2D2D D2D2
    13 (ripple 1)             0001 0002 0004 0008 0010 0020 0040 0080 0100 0200 0400 0800 1000 2000 4000 8000
    14 (ripple 0)             FFFE FFFD FFFB FFF7 FFEF FFDF FFBF FF7F FE
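Two rows of the data pattern table above follow a simple bit rule, so they can be regenerated and compared against a transcription. The sketch below is only an illustration: it assumes 16-bit words and reads the letter O in the extracted text as the digit 0. The function names are hypothetical, and the other rows of the table are not derived here.

```python
def shifting_ones(width=16):
    """Words with 1, 2, 3, ... low-order bits set, stopping one bit short of full width."""
    return [(1 << k) - 1 for k in range(1, width)]


def ripple_one(width=16):
    """Words with a single 1 bit that moves from bit 0 up to the top bit."""
    return [1 << k for k in range(width)]


def as_hex(words):
    """Format words as space-separated four-digit uppercase hex, as in the table."""
    return " ".join(f"{w:04X}" for w in words)


if __name__ == "__main__":
    print("5  shifting 1s:", as_hex(shifting_ones()))
    print("13 ripple 1:   ", as_hex(ripple_one()))
```

Comparing this output with the table is a quick way to catch transcription errors in those two rows.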
57. drive, the drive may become unusable until a successful format is completed. To minimize this possibility, Digital recommends that you secure a reliable power source and suspend all non-HSUTIL activity to the bus that services the target disk drive.
    • HSUTIL cannot control or affect the defect management for a disk drive. The drive's microcode controls the defect management during formatting.
    • Don't invoke any CLI command or run any local program that might reference the target disk drive while HSUTIL is active. Also, don't initialize either controller in the dual-redundant configuration.
    For example:
        CLI> RUN HSUTIL
        *** Available functions are:
            0. EXIT
            1. FORMAT
            2. DEVICE_CODE_LOAD_DISK
            3. DEVICE_CODE_LOAD_TAPE
        Enter function number (0:3) [0] ? 1
        Unattached devices on this controller include:
            Device     SCSI Product ID    Current Device Rev
            DISK100    RZ26 (C) DEC       T386
            DISK200    RZ26 (C) DEC       T386
            DISK210    RZ29B (C) DEC      0006
            DISK310    RZ25 (C) DEC       0900
            DISK320    RZ26L (C) DEC      X442
        Enter a device to format ? disk100
        Format DISK100 may take up to 40 minutes to format
        Select another device (y/n) [n] ? Y
        Enter a device to format ? disk200
        Format DISK200 may take up to 40 minutes to format
        Select another device (y/n) [n] ? Y
        Enter a device to format ? disk210
        Format DISK210 may take up to 15 minutes to format
        Select another device y
58. inconsistency 02BF0100 Report_error routine encountered an 01 unexpected failure status returned from DIAG LOCK_AND_TEST_CACHE _B 02C00100 Copy_buff_on_this routine expected the given 01 page to be marked bad and it wasn t 02C10100 Copy_buff_on_other routine expected the 01 given page to be marked bad and it wasn t 02C60100 Mirroring transfer found CLD with writeback 01 state OFF Service Manual HSJ50 Array Controller Appendix A A 55 Last Fail Code Explanation 02C70100 Bad BBR offsets for active shadowset 01 detected on write 02C80100 Bad BBR offsets for active shadowset 01 detected on read 02C90100 Illegal call made to CACHE 01 PURGE_META when the storageset was not quiesced 02CA0100 Illegal call made to VA 01 RAID5_META_READ when another read of metadata is already in progress on the same strip 02CB0000 A restore of the configuration has been done This cleans up and restarts with the new configuration 02CC0100 On an attempt which is not allowed to fail 01 to allocate a cache node no freeable cache node was found 02CD0100 On an attempt which is not allowed to fail 01 to allocate a strip node no freeable strip node was found 02CE1010 Serial number mismatch was detected during 10 an attempt to restore saved configuration information 02CF0100 An unsupported message type or terminal 01 request was received by the VA_SAVE_Config virtual terminal code from the CLI 02D00100 Not all alter_d
59. indication. See Table 2-2.
    Table 2-2 ECB status indicators
    LED Status                 Battery Status
    LED is on continuously     System power is on and the ECB is fully charged.
    LED blinks rapidly         System power is on and the ECB is charging.
    LED blinks slowly          System power is off and the ECB is supplying power to the cache.
    LED is off                 System power is off and the ECB is not supplying power to the cache.
    Replacing a controller and cache module in a single controller configuration
    When you replace the controller module in a nonredundant configuration, device service is interrupted for the duration of the service cycle.
    Required tools
    The tools listed in Table 2-3 are required for the replacement of single controllers.
    Table 2-3 Required tools for single controller replacement
        Maintenance terminal: To shut down controllers, restart controllers, execute CLI commands, and invoke utilities
        ESD wrist strap and ESD mat: To protect all equipment against ESD
        Small flat-head screwdriver: To loosen CI connector and front bezel captive screws
        5/32-inch Allen wrench: To unlock the SW800-series cabinet
    Removing the controller
    1. If the controller is still functioning, even partially, connect a maintenance terminal to the controller. See Figure 2-7.
    Figure 2-7 Connecting a maintenance terminal to the con
60. is the code to be loaded TO device-name
    8. Read the cautionary information that HSUTIL displays, then confirm or cancel the load.
        Do you want to continue (y/n) [n] ? Y
    9. When HSUTIL terminates, verify the new firmware revision level.
        CLI> SHOW device-name
    HSUTIL Output Example
        CLI> RUN HSUTIL
        *** Available functions are:
            0. EXIT
            1. FORMAT
            2. DEVICE_CODE_LOAD_DISK
            3. DEVICE_CODE_LOAD_TAPE
        Enter function number (0:3) [0] ? 3
        Available single device units on this controller include:
            Unit    Associated Device    SCSI Product ID    Current Device Rev
            625     DISK250              RZ28 (C) DEC       T436
            613     DISK130              RZ25 (C) DEC       0700
        Which unit is the code to be loaded from ? 625
        What is the starting LBN [0] ? 0
        What is the SCSI PRODUCT ID of the device that you want code load TO ? TZ867
        Unattached devices on this controller include:
            Device     SCSI Product ID    Current Device Rev
            TAPE100    TZ867              4318
        Which device is the code to be loaded TO ? tape100
        The tape cartridge must be removed to update the tape drive firmware. The cartridge is in the drive. Please unload.
        Is the cartridge loaded (y/n) [n] ? n
        ^Y and ^C will be disabled while the code load operation is in progress.
        CAUTION: Loading the incorrect firmware can disable the destination device. If a failure occurs while loading drive memory, the destination
61. module
    • One dual-redundant controller and cache module
    • Both dual-redundant controllers and cache modules
    • Cache modules
    • Battery modules
    • Power supplies
    • Disk drives
    • Tape drives
    • Solid state disks, optical, and CD-ROM drives
    • Host and device cables
    Electrostatic discharge protection
    The following sections describe necessary precautions for preventing component damage while servicing your HSJ50 subsystem. Use the following guidelines when performing any of the replacement procedures.
    Electrostatic discharge (ESD) can damage system components. Use the following guidelines when handling subsystem components.
    Handling controllers or cache modules
    After removing a controller or cache module from the shelf, place the module into an approved antistatic bag or onto a grounded antistatic mat.
    When removing write-back cache modules, always disconnect the external cache battery cable from the external cache battery (ECB) module and the cache module. If power is removed from the controller shelf or cabinet during a procedure, disable the ECB by pressing the battery disable switch, then disconnect the battery cable at the cache module. If power is not to be removed during the procedure, disconnect the ECB cable starting at the cache module end, and then the ECB. After the cable is removed, disable the ECB by pressing the battery disable switch.
    Handling the program ca
62. n n N
        ^Y and ^C will be disabled while the format operation is in progress.
        CAUTION: When you format a device, it will destroy the data on the device. A backup of the device should have been done if the data is important.
        NOTE: In order to minimize the possibility of a SCSI bus reset, it's recommended that you prevent non-HSUTIL IO operations to all other devices on the same port as the destination device(s). If a SCSI bus reset occurs, the format may be incomplete and you may have to re-invoke HSUTIL.
        After you answer the next question, the format will start.
        Do you want to continue (y/n) [n] ? Y
        HSUTIL started at 14-JUN-1996 15:00:31
        Format of DISK100 finished at 14-JUN-1996 16:40:12
        Format of DISK200 finished at 14-JUN-1996 17:15:31
        Format of DISK210 finished at 14-JUN-1996 16:30:43
        HSUTIL Normal Termination at 14-JUN-1996 16:31:09
    Installing new firmware on a device
    Installing new firmware on a disk or tape drive is a two-step process, as shown in Figure 3-3. First, you copy the new firmware from your host to a disk drive in your subsystem, then use HSUTIL to distribute the firmware to devices in your subsystem.
    Figure 3-3 Copy the firmware to a disk drive in your subsystem, then distribute it to the devices you want to upgrade
63. of storing data, whether it is a physical device or a group of physical devices.
    StorageWorks: Digital's family of modular data storage products that allows customers to design and configure their own storage subsystems. Components include power, packaging, cabling, devices, controllers, and software. Customers can integrate devices and array controllers in StorageWorks enclosures to form storage subsystems.
    StorageWorks building block: See SBB.
    stripeset: A virtual disk drive with its physical data spread across multiple physical disks. Stripeset configurations do not include a data recovery mechanism.
    striped mirrorset: Stripesets whose members have been mirrored.
    tagged command queuing: A SCSI feature that allows a device to have multiple I/O requests outstanding to it at one time.
    target: A SCSI device that performs an operation requested by an initiator. The target number is determined by the device's address on its SCSI bus.
    TMSCP: Tape mass storage control protocol. The protocol by which blocks of information are transferred between the host and the controller.
    unit: The host's view of a container on an HS array controller. A unit may be made up of simply a physical disk or tape drive, or a more complex container such as a RAIDset.
    unwritten cached data: Data in the write-back cache that has not yet been written to the physical device, but the use
64. patterns
    Pattern Number            Pattern in Hexadecimal Numbers
    5 (shifting 1s)           0001 0003 0007 000F 001F 003F 007F 00FF 01FF 03FF 07FF 0FFF 1FFF 3FFF 7FFF
    6 (shifting 0s)           FFFE FFFC FFFC FFFC FFE0 FFE0 FFE0 FFE0 FE00 FC00 F800 F000 F000 C000 8000 0000
    7 (alternating 1s, 0s)    0000 0000 0000 FFFF FFFF FFFF 0000 0000 FFFF FFFF 0000 FFFF 0000 FFFF 0000 FFFF
                              5555 5555 5555 AAAA AAAA AAAA 5555 5555 AAAA AAAA 5555 AAAA 5555 AAAA 5555 AAAA 5555
    11                        2D2D 2D2D 2D2D D2D2 D2D2 D2D2 2D2D 2D2D D2D2 D2D2 2D2D D2D2 2D2D D2D2 2D2D D2D2
    13 (ripple 1)             0001 0002 0004 0008 0010 0020 0040 0080 0100 0200 0400 0800 1000 2000 4000 8000
    14 (ripple 0)             FFFE FFFD FFFB FFF7 FFEF FFDF FFBF FF7F FEFF FDFF FBFF F7FF EFFF BFFF DFFF 7FFF
                              DB6D B6DB 6DB6 DB6D B6DB 6DB6 DB6D B6DB 6DB6 DB6D B6DB 6DB6 DB6D
                              9999 1999 699C E99C 9921 9921 1921 699C 699C 0747 0747 0747 699C E99C 9999 9999
    16                        3333 3333 3333 1999 9999 9999 B6D9 B6D9 B6D9 B6D9 FFFF FFFF 0000 0000 DB6C DB6C
    Default                   Use all of the above patterns in a random method.
    Testing tapes (TILX)
    HSJ50 series controllers have a Tape In-line Exerciser (TILX) that you can use to test suspect tape drives. When you run TILX, you can specify many parameters for the test, such as the I/O queue depth, the
65. recovery thread in the er_funct_step field of the PCB. Last Failure Parameter 0 contains the PCB er_funct_step code.
    Last Fail Code    Explanation
    033E0108          An attempt was made to restart a device port at the SDP DBD.
                      Last Failure Parameter 0 contains the PCB port_ptr value.
                      Last Failure Parameter 1 contains the PCB copy of the device port TEMP register.
                      Last Failure Parameter 2 contains the PCB copy of the device port DBC register.
                      Last Failure Parameter 3 contains the PCB copy of the device port DNAD register.
                      Last Failure Parameter 4 contains the PCB copy of the device port DSP register.
                      Last Failure Parameter 5 contains the PCB copy of the device port DSPS register.
                      Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
                      Last Failure Parameter 7 contains the PCB copies of the device port LCRC/RESERVED/ISTAT/DFIFO registers.
    033F0108          An EDC error was detected on a read of a soft-sectored device (path not yet implemented).
                      Last Failure Parameter 0 contains the PCB port_ptr value.
                      Last Failure Parameter 1 contains the PCB copy of the device port TEMP register.
                      Last Failure Parameter 2 contains the PCB copy of the device port DBC register.
                      Last Failure Parameter 3 contains the PCB copy of the device port DNAD register.
                      Last Failure Parameter 4 contains the PCB copy of the device port D
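Code 033E0108 above is followed by eight parameters (0 through 7), and its last hexadecimal digit is 8. The sketch below shows how such a code might be split into fields. The field layout is an assumption inferred from that observation and is not stated in this excerpt: first byte as a component identifier, second byte as an error number, third byte as a repair action code, and the low four bits as the parameter count. The upper four bits of the last byte are left undecoded. All names are hypothetical.

```python
from typing import NamedTuple


class LastFailCode(NamedTuple):
    component_id: int      # assumed: bits 31..24
    error_number: int      # assumed: bits 23..16
    repair_action: int     # assumed: bits 15..8
    parameter_count: int   # assumed: bits 3..0


def decode_last_fail_code(code: str) -> LastFailCode:
    """Split an 8-digit hexadecimal last fail code under the assumed layout."""
    if len(code) != 8:
        raise ValueError("expected exactly 8 hexadecimal digits")
    value = int(code, 16)
    return LastFailCode(
        component_id=(value >> 24) & 0xFF,
        error_number=(value >> 16) & 0xFF,
        repair_action=(value >> 8) & 0xFF,
        parameter_count=value & 0x0F,
    )


if __name__ == "__main__":
    print(decode_last_fail_code("033E0108"))
```

If the assumed layout holds, the decoded parameter count tells a reader how many Last Failure Parameter entries to expect for a given code.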
66. redundant controller system.
        Enter auto-configure option (1:3) [3] ? 1
    DILX displays a warning indicating that data on the disks will be destroyed. Either exit the test or enter y to continue.
        All data on the Auto-Configured disks will be destroyed. You MUST be sure of yourself.
        Are you sure you want to continue (y/n) [n] ? y
    Enter the amount of time that you want the test to run. A single complete pass takes 10 minutes (after the initial write pass).
        Enter execution time limit in minutes (1:65535) [60] ? 25
    If you want to see performance summaries while DILX is running, specify how often DILX should display the summaries.
        Enter performance summary interval in minutes (1:65535) [60] ? 5
    The normal DILX summary simply indicates whether it detected any errors on each unit. Additionally, you can choose to see statistics on how many read and write operations were performed during the test.
        Include performance statistics in performance summary (y/n) [n] ? y
    DILX displays a list of the units that it is able to test and begins the initial write pass and test.
        DILX testing started at <date> <time>
        Test will run for <nn> minutes
    DILX will run for the amount of time that you selected and then display the results of the testing. If you want to interrupt the test early:
    • Type ^G (Control-G) to get a current performance summary without stopping t
67. redundant mode.
        HSJ50> SET FAILOVER COPY=OTHER_CONTROLLER
    To check controller parameters:
        HSJ50> SHOW THIS_CONTROLLER
    A listing of controller information will be displayed, similar to the following example:
        Controller:
            HSJ50 2634901786 Firmware V05.0-0, Hardware BX11
            Configured for dual-redundancy with 2ZG61000012
            In dual-redundant configuration
            SCSI address 7
            Time: 15-JUN-1996 16:32:54
        Host ports:
            Node name: HSJA1, valid CI node 21, 32 max nodes
            Path A is on
            Path B is on
            MSCP allocation class 3
            TMSCP allocation class 3
            CI_ARBITRATION = SYNCHRONOUS
            MAXIMUM_HOSTS = 9
            NOCI_4K_PACKET_CAPABILITY
        Cache:
            128 megabyte write cache, version 3
            Cache is GOOD
            Battery is good
            No unflushed data in cache
            CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
            CACHE_POLICY = B
            NOCACHE_UPS
    Repeat for the other controller:
        HSJ50> SHOW OTHER_CONTROLLER
    Replacing cache modules
    This section contains replacement procedures for write-back cache modules for the HSJ50 controllers. If you are replacing one cache module in a dual-redundant configuration, use the controller and cache module warm-swap procedure in this manual.
    The controller module is seated in front of the cache module. Any time you service a cache module in a single controller configuration, you must shut down the subsystem. In a dual-redundant configuration, use the controller warm s
68. Required tools
    Removing a controller and cache module
    Required tools
    Removing storage devices
    Removing disk drives
    Required tools
    Removing solid state disks, read/write optical devices, and CD-ROM drives
    Required tools
    Removing tape drives
    Required tools
    Appendix A
    Instance codes and definitions
    Last fail codes
    Repair action codes
    Glossary
    Index
    Figures
    Figure 2-1 Connecting a maintenance terminal to the controller
    Figure 2-2 Removing the program card
    Figure 2-3 Disconnecting the ECB cable
69. remote connection s queue 60450100 dmscp_ded_errlog_rve found that an error log is not associated with a command internal miscellaneous error logs are assumed to not be associated with a connection and remote miscellaneous error logs generation was not requested Service Manual HSJ50 Array Controller Appendix A A 83 Last Fail Code Explanation 60460100 dmscp_dcd_elrt_scc_send was entered to issue a remote source connection SCC but was unable to find an available HTB on the connection s htb_list With no active DCDs the connection should always have HTBs available 60480100 tmscp_suc_avl_cmpl_rtn found the unit not in the available state 60490100 tmscp_clear_sex_cdl_cmpl_rtn found the state change failed 604A0100 tmscp_clear_sex_cdl_cmpl_rtn found the state change failed 604B0100 Subroutine process_event returned a value to 01 dmscp_dcd_comm_path_event that indicates that an internal disconnect request occurred while processing an immediate communications event 604D0100 Subroutine process_event returned a value to 01 dmscp_dcd_comm_path_event that indicates that a connection established event occurred while no DCD commands were active 604F0100 tmscp_set_sex_cmpl_rtn found the state 01 change failed 60500100 dmscp_dcd_op_cmpl found an unrecognized 01 P_STS value in a DCD HTB status field 60550100 mscp_initialize unable to get LOCAL 01 STATIC memory from exec for use as a local connection ITB 60560100 msc
70. size of the cache module that is installed with the first controller. If a cache module is present at the first controller, prepare another one of the same size for installation with the second controller.
    Insert the SBB battery module that was shipped with your controller into an existing device slot. If you are replacing a single ECB with a dual ECB, follow these steps:
        a. Press the shutdown button on the single ECB until the LED stops flashing.
        b. Remove the ECB cable from the single ECB.
        c. Remove the single ECB from the device slot.
        d. Install the dual ECB into the device slot.
    Start the C_SWAP program:
        CLI> RUN C_SWAP
    When the controller prompts you, answer the question:
        Do you have a replacement HSJ readily available [N] ? y
    Press Y for yes.
    Answer the question:
        Sequence to INSERT the other HSJ has begun.
        Do you wish to INSERT the other HSJ (y/n) ?
    Press Y for yes.
    Wait for the following text to appear on the operating controller's console:
        Attempting to quiesce all ports.
        Port 1 quiesced.
        Port 2 quiesced.
        Port 3 quiesced.
        Port 4 quiesced.
        Port 5 quiesced.
        Port 6 quiesced.
        All ports quiesced.
        Insert the other HSJ50 WITHOUT its program card, and press Return.
    If you are installing a new cache module, slide it straight in along the rails and then push firmly to seat it in the backplane.
    Check the new controller to make su
71. terminal code from the CLI Not all alter_device requests from the CONFIG utility completed within the timeout interval An unsupported message type or terminal request was received by the CFMENU utility code from the CLI Not all alter_device requests from the CFMENU utility completed within the timeout interval Explanation An unsupported message type or terminal request was received by the CLONE virtual terminal code from the CLI Table A 24 Format and device code load utility HSUTIL last failure codes Last Fail Code Explanation 85010100 85020100 HSJ50 Array Controller Repair Action HSUTIL tried to release a facility that wasn t 01 reserved by HSUTIL 1 HSUTIL tried to change the unit state from 0 MAINTENANCE _MODE to NORMAL but was rejected because of insufficient resources Service Manual A 90 Appendix A Last Fail Code Explanation 85030100 HSUTIL tried to change the usb unit state from MAINTENANCE_MODE to NORMAL but HSUTIL never received notification of a successful state change 85040100 HSUTIL tried to switch the unit state from MAINTENANCE_MODE to NORMAL but was not successful Table A 25 Code load code patch utility CLCP last failure codes Last Fail Code Explanation 86000020 Controller was forced to restart in order for 00 new code load or patch to take effect 0 86010010 The controller code load function is about to 0 update the program card This requires controller act
72. test in almost all situations.
    1. Start DILX from the CLI prompt.
        HSJ50> RUN DILX
    2. Skip the auto-configure option to get to the user-defined test.
        Do you wish to perform an Auto-configure (y/n) ? n
    3. Do not accept the default settings.
        Use all defaults and run in read only mode (y/n) ? n
    4. Enter the amount of time that you want the test to run. A single complete pass takes 10 minutes (after the initial write pass).
        Enter execution time limit in minutes (1:65535) [60] ? 25
    5. If you want to see performance summaries while DILX is running, specify how often DILX should display the summaries.
        Enter performance summary interval in minutes (1:65535) [60] ? 5
    6. The normal DILX summary simply indicates whether it detected any errors on each unit. Additionally, you can choose to see statistics on how many read and write operations were performed during the test.
        Include performance statistics in performance summary (y/n) [n] ? y
    7. DILX asks if you want hard and soft errors (sense data and deferred errors) displayed. If you do, answer y and respond to the rest of the questions. If you don't want to see the errors displayed, answer n and proceed to the next step.
        Display hard/soft errors ? y
        Display hex dump of Error Information Packet Requester Specific information (y/n) [n] ? y
    8. When the hard error limit is reached, the unit will
73. the Name column of the SHOW storageset-name command:
    HSJ50> DELETE storageset-name
Delete each disk drive, one at a time, that the storageset contained:
    HSJ50> DELETE disk-name
    HSJ50> DELETE disk-name
    HSJ50> DELETE disk-name
Remove the disk drives and move them to their new PTL locations.
Re-add each disk drive to the controller's list of valid devices:
    HSJ50> ADD DISK disk-name PTL-location
    HSJ50> ADD DISK disk-name PTL-location
    HSJ50> ADD DISK disk-name PTL-location
Re-create the storageset by adding its name to the controller's list of valid storagesets and specifying the disk drives it contains. Although you have to re-create the storageset from its original disks, you don't have to add them in their original order.
    HSJ50> ADD STORAGESET storageset-name disk-name disk-name disk-name
Re-present the storageset to the host by giving it a unit number that the host can recognize. You can use the original unit number or create a new one.
    HSJ50> ADD UNIT unit-number storageset-name
Example: The following example moves unit D100 to another cabinet. D100 is the RAIDset RAID99, which comprises members 200, 300, and 400.
    HSJ50> SHOW RAID99
    Name      Storageset    Uses       Used by
    RAID99    raidset       disk200    D100
                            disk300
                            disk400
    HSJ50> DELETE D100
    HSJ50> DELETE RAID99
    HSJ50> DELETE DISK200
    HSJ50> DELETE DISK300
    HSJ50> DELETE DISK400
    move the
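The move procedure above can be sketched as a short script that emits the CLI commands in the order the steps give them. This is an illustration only: `move_storageset_commands` and its parameters are hypothetical names, the storageset keyword (for example RAIDSET) is passed in by the caller and not assumed, and the script only builds strings. It does not talk to a controller.

```python
def move_storageset_commands(unit, storageset, kind, members):
    """Return the CLI command sequence for moving a storageset.

    unit       -- unit number presented to the host, e.g. "D100"
    storageset -- storageset name, e.g. "RAID99"
    kind       -- ADD keyword for the storageset type, e.g. "RAIDSET"
    members    -- dict mapping each disk name to its new PTL location
                  as a (port, target, lun) tuple
    """
    # Tear down from the top: unit first, then the storageset itself.
    cmds = [f"DELETE {unit}", f"DELETE {storageset}"]
    # Delete each member disk one at a time.
    cmds += [f"DELETE {disk}" for disk in members]
    # (The drives are physically moved at this point.)
    # Re-add every disk at its new PTL location.
    for disk, (port, target, lun) in members.items():
        cmds.append(f"ADD DISK {disk} {port} {target} {lun}")
    # Re-create the storageset; member order does not have to match the original.
    cmds.append(f"ADD {kind} {storageset} " + " ".join(members))
    # Present the storageset to the host again.
    cmds.append(f"ADD UNIT {unit} {storageset}")
    return cmds


if __name__ == "__main__":
    new_locations = {
        "DISK200": (2, 0, 0),
        "DISK300": (3, 0, 0),
        "DISK400": (4, 0, 0),
    }
    for line in move_storageset_commands("D100", "RAID99", "RAIDSET", new_locations):
        print("HSJ50>", line)
```

Generating the list first and reviewing it before typing anything at the maintenance terminal keeps the delete and re-add steps in matching order.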
74. the CLI Reference Manual for further details of the LOCATE command.

Asynchronous device installation

The controller supports asynchronous device swapping on the device bus. Asynchronous swapping is defined as removal or insertion of a device while the controller is running, without halting bus activity using port quiesce buttons or CLI commands. Asynchronous device swapping is supported on the HSJ50 controller subject to the following restrictions:
• Asynchronous device swapping is not supported during failover.
• Asynchronous device swapping is not supported during failback.
• Asynchronous device swapping is not supported from the time the controller is initialized until the CLI prompt appears.
• Asynchronous device swapping is not supported if the controller is in the process of recognizing or processing one or more asynchronous device removals.
• Asynchronous device swapping is not supported while local programs such as DILX or VTDPY are running.

Installing SBBs

Use the following procedure to install SBBs:
1. Insert the SBB into the shelf guide slots and push it in until it is fully seated and the mounting tabs engage the shelf.
2. Observe the activity indicator (upper LED) and the status indicator (lower LED). The activity indicator is either on, flashing, or off. The status indicator is off.

Installing a solid state disk, CD-ROM, and optical drives HSJ50
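The restrictions listed above amount to a pre-swap checklist, which can be written down as a small predicate. This is a sketch for an operator-side checklist: the `ControllerState` fields and the `async_swap_blockers` function are hypothetical names, and nothing here queries a real controller. Each flag has to be set by whoever observes the subsystem.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """Operator-observed conditions (hypothetical fields, one per restriction)."""
    in_failover: bool = False
    in_failback: bool = False
    cli_prompt_shown: bool = True
    processing_async_removal: bool = False
    local_program_running: bool = False  # e.g. DILX or VTDPY


def async_swap_blockers(state):
    """Return the reasons an asynchronous swap is not supported right now."""
    checks = [
        (state.in_failover, "failover in progress"),
        (state.in_failback, "failback in progress"),
        (not state.cli_prompt_shown, "controller still initializing (no CLI prompt yet)"),
        (state.processing_async_removal, "an earlier asynchronous removal is still being processed"),
        (state.local_program_running, "a local program such as DILX or VTDPY is running"),
    ]
    return [reason for blocked, reason in checks if blocked]


if __name__ == "__main__":
    state = ControllerState(local_program_running=True)
    blockers = async_swap_blockers(state)
    print("OK to swap" if not blockers else "Do not swap: " + "; ".join(blockers))
```

Returning the list of reasons, not a bare True or False, tells the operator which condition to wait out.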
75. the ECB cable to the new cache module. Reconnect the power cords to the controller power supplies. Initialize the controllers by pressing the front panel Reset button. If the controller reports an invalid cache error, enter the following command:
    HSJ50> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER DESTROY_UNFLUSHED_DATA

Replacing external cache batteries (ECBs)

The batteries are installed in a 3 1/2-inch storage building block (SBB) module. Digital does not recommend replacing individual ECBs. Therefore, when one ECB needs to be replaced, replace the entire SBB. There are two ways to change the battery SBB: an online method (C_SWAP), in which one controller continues to process I/O, and an offline method, in which both controllers are shut down. The following procedure describes the online method.

Required tools

The tools listed in Table 2-6 are required for replacing the external cache batteries.

Table 2-6 Required tools
Required tools                 Purpose
Maintenance terminal           To shut down controllers, restart controllers, execute CLI commands, and invoke utilities
ESD wrist strap and ESD mat    To protect all equipment against ESD
Small flat-head screwdriver    To loosen CI connector and front bezel captive screws
5/32-inch Allen wrench         To unlock the SW800-series cabinet

Replacing the SBB battery module

Use the following procedure to replace a
76. the program card
Figure 3-13 Connecting a maintenance terminal to the controller
Figure 3-14 Connecting a maintenance terminal to the controller
Figure 3-15 Disconnecting the CI cable adapter
Figure 3-16 Removing controller modules
Figure 3-17 Installing an SBB battery module
Figure 3-18 Installing the cache and controller modules
Figure 3-19 Removing the program card
Figure 3-20 Installing the program card
Figure 3-21 Connecting a maintenance terminal to the controller
Figure 3-22 Disconnecting the CI cable
Figure 3-23 Removing controller and cache modules
Figure 3-24 Cache configurations for cache Version 3
Figure 3-25 Installing the controller a
77. three four and so on Service Manual Installing e Controllers in dual redundant configurations must have the same patches applied You must enter patches into each controller separately Listing patches Service Manual The List Patches option allows you to display a listing of controller software versions and the currently installed patches that apply to them Following is an example of the List Patches option 1 Connect a maintenance terminal to the controller See Figure 3 1 Figure 3 1 Connecting a maintenance terminal to the controller Local connection port BC16E XX ia To terminal CXO 5322A MC 2 Invoke the CLCP utility CLI gt RUN CLCP The CLCP main menu is displayed Select an option from the following list Code Load amp Code Patch local program Main Menu QO Exit 1 Enter Code LOAD local program 2 Enter Code PATCH local program Enter option number O02 EO 2 HSJ50 Array Controllers Installing 3 5 This controller module does not support code load functionality Exiting CLCP CLI gt 3 Enter option 2 to enter the patch program You have selected the Code Patch local program This program is used to manage firmware code patches Select an option from the following list Type Y or C then RETURN at any time to abort Code Patch Code Patch Main Menu O Exit 1 Enter a Patch 2 Delete Patches 3 List Patches Enter option number 0 3
78. to recover by trying the command again 03DF450A During device initialization the device reported the SCSI Sense Key EQUAL This indicates a SEARCH DATA command has satisfied an equal comparison 03E0450A During device initialization the device reported the SCSI Sense Key VOLUME OVERFLOW This indicates a buffered peripheral device has reached the end of partition and data may remain in the buffer that has not been written to the medium A RECOVER BUFFERED DATA command s may be issued to read the unwritten data from the buffer 03E1450A During device initialization the device reported the SCSI Sense Key MISCOMPARE This indicates the source data did not match the data read from the medium 03E2450A During device initialization the device 45 reported a reserved SCSI Sense Key 03E60702 The EMU has detected one or more bad power 07 supplies Note that in this instance the Associated Target Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03E70602 The EMU has detected one or more bad fans 06 Note that in this instance the Associated Target Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03E80D02 The EMU has detected an elevated temperature condition Note that in this instance the Associated Target Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03E90E02 The EM
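Wherever the repair action column survives in the rows above, it matches the third byte of the instance code: 03E60702 lists 07, 03E70602 lists 06, and 03E2450A lists 45. The sketch below splits a code into its four bytes on that basis. Only the repair-action byte is corroborated by this table. The other field names (`component`, `event`, `threshold`) are labels assumed for illustration, and `split_instance_code` is a hypothetical helper, not part of any controller utility.

```python
from typing import NamedTuple


class InstanceCode(NamedTuple):
    component: int      # first byte (label assumed for illustration)
    event: int          # second byte (label assumed for illustration)
    repair_action: int  # third byte; matches the table's repair action column
    threshold: int      # fourth byte (label assumed for illustration)


def split_instance_code(code: str) -> InstanceCode:
    """Split an 8-digit hexadecimal instance code into its four bytes."""
    if len(code) != 8:
        raise ValueError("instance code must be 8 hexadecimal digits")
    value = int(code, 16)  # raises ValueError on non-hex input
    return InstanceCode(
        component=(value >> 24) & 0xFF,
        event=(value >> 16) & 0xFF,
        repair_action=(value >> 8) & 0xFF,
        threshold=value & 0xFF,
    )


if __name__ == "__main__":
    for code in ("03E60702", "03E70602", "03E2450A"):
        fields = split_instance_code(code)
        print(code, "repair action code", format(fields.repair_action, "02X"))
```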
79. to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Byte transfer time out during operation to a device which is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Miscellaneous SCSI Port Driver coding error detected during operation to a device which is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined An error code was reported that was unknown to the Fault Management firmware Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Device port SCSI chip reported gross error during operation to a device which is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Non SCSI bus parity error during operation to a device which is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Source driver programming error encountered during operation to a device which is unknown to the controller Note that in this instance the Ass
80. unit addressed cannot be accessed Operator intervention may be required to correct this condition During device initialization the device reported the SCSI Sense Key MEDIUM ERROR This indicates that the command terminated with a non recovered error condition that was probably caused by a flaw in the medium or an error in the recorded data This sense key also may be returned if the target is unable to distinguish between a flaw in the medium and a specific hardware failure HARDWARE ERROR sense key During device initialization the device reported the SCSI Sense Key HARDWARE ERROR This indicates that the target detected a non recoverable hardware failure for example controller failure device failure parity error etc while performing the command or during a self test HSJ50 Array Controller Appendix A Instance Code 03D8450A 03D9450A 03DA450A 03DB450A 03DC450A 03DD450A HSJ50 Array Controller A 33 Explanation During device initialization the device reported the SCSI Sense Key ILLEGAL REQUEST This indicates that there was an illegal parameter in the command descriptor block or in the additional parameters supplied as data for some commands FORMAT UNIT SEARCH DATA etc If the target detects an invalid parameter in the command descriptor block then it will terminate the command without altering the medium If the target detects an invalid parameter in the additional parameters supp
81. 0 Its new name will be disk300 to correspond to its new PTL location Disk210 was a spare that was pulled into unit D100 when its disk300 failed D100 is the RAIDset RAID99 that comprises members 200 210 and 400 HSJ50 gt DELETE D100 HSJ50 gt DELETE RAID99 HSJ50 gt DELETE DISK210 move disk210 to PTL location 300 HSJ50 gt ADD DISK DISK300 3 0 0 HSJ50 gt ADD RAIDSET RAID99 DISK200 DISK300 DISK400 HSJ50 gt ADD UNIT D100 RAID99 Service Manual Moving Storagesets and Devices Moving a single disk drive unit Service Manual You can move a single disk drive unit from one subsystem to another without destroying its data You can also use the procedure given below to move a unit to a new location within the same subsystem To move a single disk drive unit while maintaining the data it contains 1 Show the details for the unit that you want to move HSJ50 gt SHOW unit number Delete the unit number shown in the Used by column of the SHOW unit number command HSJ50 gt DELETE unit number Delete the disk drive HSJ50 gt DELETE disk name Remove the disk drive and move it to its new PTL location If you re moving disk drives from an HSC subsystem run CHVSN to generate a unique volume serial number for each disk otherwise skip this step HSJ50 gt RUN CHVSN Device port target lun EXIT PTL location CHVSN Volume Serial Number is 00000
82. 0 attempt to read CACHEA0 memory.
The CACHEA1 DRAB detected a Multiple Bit ECC error during an FX attempt to read CACHEA1 memory.
The CACHEA1 DRAB detected a Multiple Bit ECC error during a host port attempt to read CACHEA1 memory.
The CACHEA1 DRAB detected a Multiple Bit ECC error during a device port attempt to read CACHEA1 memory.
The CACHEA1 DRAB detected a Multiple Bit ECC error during an i960 attempt to read CACHEA1 memory.
The CACHEB0 DRAB detected a Multiple Bit ECC error during an FX attempt to read CACHEB0 memory.
The CACHEB0 DRAB detected a Multiple Bit ECC error during a host port attempt to read CACHEB0 memory.
The CACHEB0 DRAB detected a Multiple Bit ECC error during a device port attempt to read CACHEB0 memory.
The CACHEB0 DRAB detected a Multiple Bit ECC error during an i960 attempt to read CACHEB0 memory.
The CACHEB1 DRAB detected a Multiple Bit ECC error during an FX attempt to read CACHEB1 memory.
The CACHEB1 DRAB detected a Multiple Bit ECC error during a host port attempt to read CACHEB1 memory.

Instance Code    Explanation
014B2A02         The CACHEB1 DRAB detected a Multiple Bit ECC error during a device port attempt to read CACHEB1 memory.
014C2A02         The CACHEB1 DRAB detected a Multiple Bit ECC error during an i960 attempt to read CACHEB1 memory.
014D3702         The Master DRAB unexpectedly reported a Multiple Bit ECC error. Repair action code: 37.
014E3702         The CACHEA0 DRAB unexpe
83. 000 00000000 Update CHVSN (Y/N) [Y] ?
HSC controllers accept duplicate volume serial numbers, whereas HSJ50 controllers do not. If CHVSN reports a volume serial number of zero for a disk drive, press Y when prompted to Update CHVSN.
6. Re-add the disk drive to the controller's list of valid devices:
    HSJ50> ADD DISK disk-name PTL-location
7. Re-present the disk drive to the host by giving it a unit number that the host can recognize. You can use the original unit number or create a new one.
    HSJ50> ADD UNIT unit-number disk-name
Example: The following example moves Disk507 to PTL location 100. Its new name will be Disk100, to correspond to its new PTL location.
    HSJ50> SHOW D507
    HSJ50> DELETE D507
    HSJ50> DELETE DISK507
    HSJ50> ADD DISK DISK100 1 0 0
    HSJ50> ADD UNIT D507 DISK100

Moving devices

Follow these steps to move a device such as a disk drive, tape drive, CD-ROM drive, or tape loader:
1. Quiesce the bus that services the device you want to move.
2. Show the details for the device that you want to move. If you're moving a tape loader, show the details for the passthrough device that's associated with it.
    HSJ50> SHOW device-name
3. If the device has a unit number associated with it, delete the unit number
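The examples name each disk after its PTL location: Disk100 sits at 1 0 0 and DISK300 at 3 0 0. A small helper can build the matching ADD DISK command from a PTL. `disk_name_for_ptl` and `add_disk_command` are hypothetical names, the naming rule is only the convention these examples follow and not something the controller is stated to enforce, and the sketch assumes single-digit port, target, and LUN values.

```python
def disk_name_for_ptl(port, target, lun):
    """Build a disk name from a PTL location, following the examples' convention."""
    for label, value in (("port", port), ("target", target), ("lun", lun)):
        # Assumption: each PTL component is a single decimal digit.
        if not 0 <= value <= 9:
            raise ValueError(f"{label} must be a single digit, got {value}")
    return f"DISK{port}{target}{lun}"


def add_disk_command(port, target, lun):
    """Return the ADD DISK command for a drive at the given PTL location."""
    name = disk_name_for_ptl(port, target, lun)
    return f"ADD DISK {name} {port} {target} {lun}"


if __name__ == "__main__":
    print("HSJ50>", add_disk_command(1, 0, 0))
    print("HSJ50>", add_disk_command(3, 0, 0))
```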
84. 0A020100 ILF CACHE_READY unable to allocate 01 necessary DWDs e 01 Table A 12 CLI last failure codes Last Fail Code Explanation 20010100 The action for work on the CLI queue should be CLI_CONNECT CLI_COMMAND_IN or CLI_PROMPT If it isn t one of these three this bugcheck will result 20020100 The FAO returned a non successful response Ol This will only happen if a bad format is detected or the formatted string overflows the output buffer 20030100 The type of work received on the CLI work 1 queue wasn t of type CLI 20070100 A work item of an unknown type was placed 01 on the CLI s DUP Virtual Terminal thread s work queue by the CLI 20080000 This controller requested this controller to restart 20090010 This controller requested this controller to shutdown 200A0000 This controller requested this controller to selftest 200B0100 Could not get enough memory for FCBs to 0 receive information from the Other controller 20060100 A work item of an unknown type was placed 0l on the CLI s SCSI Virtual Terminal thread s work queue by the CLI 00 00 00 1 Service Manual HSJ50 Array Controller Appendix A A 73 Last Fail Code Explanation 200C0100 After a CLI command the NV memory was still locked The CLI should always unlock NV memory when the command is complete if it had an error or not Removed from HSOF firmware at Version 2 7 200D0101 After many calls to DS PORT_BLOCKED we never got a FALSE status back
85. 1 workblock to queue to the NVFOC thread 08150100 A lock was requested by the other controller but the memory is already locked by the other controller 08160100 A request to clear the remote configuration 1 was received but the memory was not locked 08170100 A request to read the next configuration was 1 received but the memory was not locked 1 1 01 08180100 Could not get enough memory for FLS FCBs 01 to receive information from the other controller 08190100 An unlock command was received when the NV memory was not locked Removed from HSOF firmware at Version 2 7 081A0100 Unable to allocate memory for remote work 081B0101 Bad remote work received on remote work 01 queue Last Failure Parameter 0 contains the id type value that was received on the NVFOC remote work queue 081C0101 Bad member management work received 01 Last Failure Parameter 0 contains the bad member management value that was detected Service Manual HSJ50 Array Controller 0 0 0 0 0 0 0 0 0 Appendix A A 71 081F0000 An FLM INSUFFICIENT_RESOURCES error was returned from a FLM lock or unlock call 08200000 Expected restart so the write_instance may recover from a configuration mismatch Last Fail Code Explanation 09010100 Unable to acquire memory to initialize the FLM structures 09640101 Work that was not FLM work was found on the FLM queue Bad format is detected or the formatted string overflows the output buffer Last Fai
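In the rows above, the two codes ending in 01 whose text is intact (081B0101 and 081C0101) both describe a Last Failure Parameter 0, while codes ending in 00 (such as 08150100 and 081F0000) describe none. That pattern suggests the low-order hexadecimal digit counts the parameters, and the third byte again lines up with the repair action column. The sketch below reads a code on those two assumptions. `last_failure_fields` is a hypothetical helper, and the interpretation is inferred from this table, not quoted from a format definition.

```python
def last_failure_fields(code: str) -> dict:
    """Extract the fields this table appears to encode in a last failure code."""
    if len(code) != 8:
        raise ValueError("last failure code must be 8 hexadecimal digits")
    value = int(code, 16)
    # Assumption: the low-order hex digit is the number of Last Failure Parameters.
    count = value & 0x0F
    return {
        # Assumption: the third byte is the repair action code.
        "repair_action": (value >> 8) & 0xFF,
        "parameter_count": count,
        "parameters": [f"Last Failure Parameter {i}" for i in range(count)],
    }


if __name__ == "__main__":
    for code in ("081B0101", "08150100", "081F0000"):
        print(code, last_failure_fields(code))
```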
86. 1 A miscellaneous SCSI Port Driver coding error was encountered during tape operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 038A0101 A tape related error code was reported that was unknown to the Fault Management firmware Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 038B450A The tape device reported standard SCSI Sense 45 Data 03964002 An unrecoverable media loader error was 40 encountered while performing work related to media loader operations 03994002 A Drive failed because a Test Unit Ready 40 command or a Read Capacity command failed 039A000A The drive was failed by a Mode Select 0 command received from the host i 0 039B4002 The drive failed due to a deferred error 0 reported by drive 039C4002 Unrecovered Read or Write error 039D4002 No response from one or more drives Service Manual HSJ50 Array Controller 4 4 Appendix A A 29 Instance Code Explanation 039E430A Nonvolatile memory and drive metadata indicate conflicting drive configurations 039F430A The Synchronous Transfer Value differs between drives in the same storageset 03A04002 Maximum number of errors for this data transfer operation exceeded 03A14002 The drive reported recovered error without transferring all data 03A24002 Data returned from drive is invalid
87. 100 DILX tried to issue a oneshot I O for an opcode not supported 800D0100 A DILX device control block contains an unsupported unit_state 800E0100 While trying to print an Event Information 01 Packet DILX discovered an unsupported MSCP error log format 800F0100 A DILX cmd completed with a sense key that DILX does not support 80100100 DILX could not compare buffers because no memory was available from EXEC ALLOCATE_MEM_ZEROED 80110100 While DILX was deallocating his deferred error buffers at least one could not be found 80120100 DILX expected an eip to be on the receive eip 0 q but no eips were there Service Manual HSJ50 Array Controller 80080100 While DILX was deallocating his eip buffers at least one could not be found 01 01 01 01 01 01 01 01 1 01 01 01 01 01 01 1 Appendix A A 87 Last Fail Code Explanation 80130100 DILX was asked to fill a data buffer with an 01 unsupported data pattern 80140100 DILX could not process an unsupported Ol answer in dx reuse_params 80150100 A deferred error was received with an 0l unsupported template Table A 21 Tape inline exerciser TILX last failure codes Last Fail Code Explanation 81010100 An HTB was not available to issue an I O when it should have been 81020100 A unit could not be dropped from testing because an available cmd failed 81030100 TILX tried to release a facility that wasn t reserved by TILX 81040100 TILX tried to change the unit state fro
88. 1072 97 0 Description This display indicates the number of packets sent and received through the DSSI port and the packet rate This display is available only on DSSI based controllers Service Manual 1 50 Troubleshooting Packets received from a remote node Packets sent to a remote node that were ACKed Packets sent to a remote node that were NAKed Packets sent to a remote node for which no response was received Cl DSSI Connection Status Service Manual Connections 0123456789 Description This display shows the current status of any connections to a remote CI or DSSI node This display is available only on CI and DSSI based controllers Each position in the data field represents one of the possible nodes to which the controller can communicate To locate the connection status for a given node use the column on the left to determine the high order digit of the node number and use the second row to determine the low order digit of the node number For CI controllers the number of nodes displayed is determined by the controllers MAX NODE parameter The maximum supported value for this parameter is 32 For DSSI controllers the number of nodes is fixed at 8 Each location in the grid contains a character to indicate the connection status C indicates one connection to that node In this example node 12 shows one connection This usually happens if a host has multiple adapters and it is us
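The lookup rule for the connection grid (high-order digit from the left-hand column, low-order digit from the 0123456789 row) is ordinary division by ten. The sketch below applies it to a grid stored as a list of 10-character strings. `connection_status` is a hypothetical helper, and the "." used for empty positions is a placeholder chosen for this example; the excerpt only defines the meaning of C.

```python
def connection_status(rows, node):
    """Return the status character for a node number.

    rows[h] is the 10-character data field for nodes h*10 through h*10 + 9,
    so the row index is the high-order digit and the column is the low-order digit.
    """
    if node < 0:
        raise ValueError("node number must not be negative")
    high, low = divmod(node, 10)
    if high >= len(rows) or low >= len(rows[high]):
        raise ValueError(f"node {node} is outside the displayed grid")
    return rows[high][low]


if __name__ == "__main__":
    # Example grid: "." is a placeholder for positions with nothing to report.
    grid = [
        "..........",  # nodes 0-9
        "..C.......",  # nodes 10-19: node 12 shows one connection
        "..........",  # nodes 20-29
        "..........",  # nodes 30-39
    ]
    print("node 12:", connection_status(grid, 12))
```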
89. 2010100 Initialization code was unable to allocate enough memory to setup the send data descriptors 02040100 Unable to allocate memory necessary for data buffers 02050100 Unable to allocate memory for the Free Buffer Array 02080100 A call to EXEC ALLOCATE_MEM_ZEROED failed to return memory when populating the disk read DWD stack Service Manual HSJ50 Array Controller Appendix A HSJ50 Array Controller Explanation A call to EXEC ALLOCATE_MEM_ZEROED failed to return memory when populating the disk write DWD stack A call to EXEC ALLOCATE_MEM_ZEROED failed to return memory when populating the tape read DWD stack A call to EXEC ALLOCATE_MEM_ZEROED failed to return memory when populating the tape write DWD stack A call to EXEC ALLOCATE_MEM_ZEROED failed to return memory when populating the miscellaneous DWD stack A call to EXEC ALLOCATE_MEM_ZEROED failed to return memory when creating the device services state table Unable to allocate memory for the Free Node Array Unable to allocate memory for the Free Buffer Descriptor Array Unable to allocate memory for the Free Buffer rray Unable to allocate memory for the Free Strip ode Array Unable to allocate memory for WARPs and RMDs Invalid parameters in CACHE OFFER_META call No buffer found for CACHE MARK_META_DIRTY call A 47 01 01 01 01 Service Manual A 48 Last Fail Code 02270104 022C0100 022D0100
90. 22C0064 022D0064 022F0064 02300064 02310064 023B0064 023C0064 Service Manual Appendix A Explanation The device specified in the Device Locator field failed to be added to the mirrorset associated with the logical unit The device will remain in the Spareset The device specified in the Device Locator field failed to be added to the mirrorset associated with the logical unit The failed device has been moved to the Failedset The device specified in the Device Locator has transitioned from Copying or Normalizing state to Normal state The mirrorset associated with the logical unit has gone inoperative Note that in this instance information supplied in the Device Locator Device Firmware Revision Level Device Product ID and Device Type fields is for the first device in the mirrorset The mirrored device specified in the Device Locator field has been converted to a single device associated with the logical unit The device specified in the Device Locator field has been reduced from its associated mirrorset The nominal number of members in the mirrorset has been decreased by one The reduced device is now available for use The mirrorset associated with the logical unit has had its nominal membership changed The new nominal number of members for the mirrorset is specified in the Associated Port field Note that in this instance information supplied in the Device Locator Device Firmware Revision Le
91. 3 Read Only
Use the Basic Function test 99.9% of the time. The User Defined test is for special problems only.
    Enter test number (1:3) [1] ? 1
Specify the data pattern to be used for the writes. Unless you have some specific requirement, select 0 to use all patterns. See Table 1-3 for a listing of available patterns.
    Enter data pattern number 0=ALL, 19=USER_DEFINED, (0:19) [0] ? 0
Set the number of records to be written and read during the test.
    Enter record count (1:4294967295) [4096] ? 1000
If you want to test data integrity, set the percentage of read and write commands that will have a data compare operation performed.
    Perform data compare (y/n) [n] ? y
    Enter compare percentage (1:100) [2] ? 1
The system displays a list of all tape drive units, by unit number, that you can choose for TILX testing. Select the first tape drive that you want to test. Do not include the letter T in the unit number.
    Enter unit number to be tested ? 350
Check to be sure that a tape is loaded in the drive, then answer yes to the next question.
    Is a tape loaded and ready, answer Yes when ready ? Y
TILX indicates whether it has been able to allocate the tape drive. If you want to test more tape drives, enter the unit numbers when prompted. Otherwise, enter n to start the test.
    Select another unit (y/n) [n] ? n
    TILX testing started at <date> <time>
    Test will run for <nn> minutes
TILX wi
92. 3 1 2 inch or 5 1 4 inch form factors The HSJ50 controller supports the following devices e 3 5 inch and 5 25 inch disk drives Figure 3 28 Typical 3 5 inch and 5 25 inch disk drive or optical disk SBBs Device Device ctivity Activity Green Green Amber Amber CXO 5254A MC Service Manual HSJ50 Array Controllers Installing 3 65 e CDROM drives in 5 1 4 inch Storage Works building blocks Figure 3 29 Typical 5 25 inch CD ROM SBB CXO 5167A MC HSJ50 Array Controllers Service Manual 3 66 Installing e Solid state disks and tape drives Figure 3 30 Typical 3 5 inch tape drive SBB A CXO 5168A MC Caution Do not install solid state disk and CD ROM drives when power is applied to the shelf SBB activity and status indicators Most storage device have two LEDs that display SBB status These LEDs have three states on off and flashing The upper LED green is the device activity indicator and is on or flashing when the SBB is active The lower LED amber is the device status indicator and indicates an error condition or a configuration problem when it is on or flashing See Table 3 16 Service Manual HSJ50 Array Controllers Installing HSJ50 Array Controllers LED Device activity Device fault Device activity Device fault Device activity Device fault Device activity Device fault Device activity Device fault Device activity Devic
93. 7040100 Unable to restart the Failover Control Timer 1 07050100 Unable to allocate flush buffer 07060100 Unable to allocate active receive fcb 07070100 The other controller killed this controller but 01 could not assert the kill line because nindy was on or in debug It killed this controller now 07080000 The other controller crashed so this one must 00 crash too Service Manual HSJ50 Array Controller Appendix A A 69 Table A 9 Nonvolatile parameter memory failover control last failure codes 08010101 A remote state change was received from the FOC thread that NVFOC does not recognize Last Failure Parameter 0 contains the unrecognized state value 08020100 No memory could be allocated for a NVFOC information packet 08030101 Work received on the S_nvfoc_bque did not have a NVFOC work id Last Failure Parameter 0 contains the id type value that was received on the NVFOC work queue 08040101 Unknown work value received by the S_nvfoc_bque Last Failure Parameter 0 contains the unknown work value 08050100 An unlock was received and the controller was not locked by the other controller This last_failure code was removed from HSOF firmware at Version 2 7 08060100 A really write command was received when the NV memory was not locked 08070100 A write to NV memory was received while not locked 08080000 The other controller requested this controller to restart 08090010 The other controller requested this
94. ATION failed due to INSUFFICIENT_RESOURCES 41180100 Encountered a NULL completion routine 0 pointer in a Work q entry 01 01 01 1 01 01 01 01 01 01 01 01 1 Service Manual HSJ50 Array Controller Appendix A A 77 Table A 15 Host interconnect port services last failure codes Last Fail Code Explanation Repair Action Code 42000100 Cmpl_main routine found invalid port xmt status 42040100 Host port buffer allocation macro found an 01 error allocating free buffers The free buffer was NULLPTR DEBUG conditional 42060100 HP_INIT could not allocate initial buffers for 420B0100 HP_INIT could not allocate initial buffs for Path A dl_ctl table 420C0100 HP_INIT could not allocate initial htb for Path ee A 01 Ol 01 01 01 01 420D0100 HP_INIT could not allocate HPHW structure 01 42120100 Host port polling HTB failed to complete in Ol time This last_failure code was removed from HSOF firmware at Version 2 7 42126500 Host port polling HTB failed to complete in 65 time 42130100 Host port detected a inconsistency in the HW 01 transmit status 20 01 01 0 0 0 42316601 Host port found that the controller has exceeded the maximum number of user specified host VCS Last Failure Parameter 0 is a 32 bit MASK of OPEN VCS the controller sees to host nodes 42332080 Receive_main found destination address in the rcv packet does not match node address 42340100 HP could not allocate buffers for I O rundown in V
95. C Close 42350100 HP found a negative offset in a Host Data transfer Operation 42382080 Ci_isr found that the yaci hardware had 20 invalid xmt status on Path A no bits set 42392080 Ci_isr found that the yaci hardware had 2 invalid xmt status on Path B no bits set 423A2080 CI_ISR found the abort bit set with out any 2 valid reason Path A 423B2080 CI_ISR found transmit parity error without 2 abort bit set Path A HSJ50 Array Controller Service Manual A 78 Appendix A Last Fail Code Explanation 423C2080 CI_ISR found buffer underflow without abort 2 bit set Path A 423D2080 CLISR found the abort bit set with out any 2 valid reason Path B 423E2080 CI_ISR found transmit parity error without 2 abort bit set Path B 423F2080 CI_ISR found buffer underflow without abort 2 bit set Path B i 0 0 0 0 0 0 0 20 42452080 Ci_isr found that yaci hardware had a bus timeout error 42472080 Ci_isr found Data parity on Transmit Path A 42482080 Ci_isr found Data parity on Transmit Path B f20 424B0001 Ci_isr found Host Reset on Path A 00 Last Failure Parameter 0 contains the node number of the resetting node 424C0001 Ci_isr found Host Reset on Path B Last Failure Parameter 0 contains the node number of the resetting node 424D2080 Ci_isr found Fetch parity on Transmit Path A 20s 42442080 Ci_isr found that yaci hardware had a parity error is i smi 20 424E2080 Ci_isr found Fetch parity on Transmit
96. CD-ROM readers and optical devices.

Table 5-4 Required tools
Required tools            Purpose
Maintenance terminal      To shut down and to restart controllers
5/32-inch Allen wrench    To unlock the SW800-series cabinet

1. Connect a maintenance terminal to the controller (see Figure 5-7).
[Figure 5-7 Connecting a maintenance terminal to the controller. Labels: local connection port; BC16E-XX; to terminal.]
2. Halt all host I/O activity using the appropriate procedures for your operating system.
3. Take the controllers out of service:
    HSJ50> SHUTDOWN OTHER_CONTROLLER
    HSJ50> SHUTDOWN THIS_CONTROLLER
4. Remove the power cords from the shelf power supplies.
5. Remove the device by pressing the two mounting tabs together to release it from the shelf.
6. Using both hands, remove the device from the shelf.
7. Restart the system.

Removing tape drives

Use the following procedure to remove tape drives.

Required tools

The tools listed in Table 5-5 are required for removing tape drives.

Table 5-5 Required tools
Required tools            Purpose
5/32-inch Allen wrench    To unlock the SW800-series cabinet

1. Halt all I/O activity to the tape drive using the appropriate procedures for your operating system.
2. Quiesce the appropriate device port by pushing the device port button on the controller's OCP (operator control panel).
3. When the OCP LEDs flash
97. DIGITAL StorageWorks HSJ50 Array Controller HSOF Version 5.1 Service Manual

    Part Number: EK-HSJ50-SV. B01
    March 1997
    Software Version: HSOF Version 5.1
    Digital Equipment Corporation, Maynard, Massachusetts

    March 1997

    While Digital Equipment Corporation believes the information included in this manual is correct as of the date of publication, it is subject to change without notice. DIGITAL makes no representations that the interconnection of its products in the manner described in this document will not infringe existing or future patent rights, nor do the descriptions contained in this document imply the granting of licenses to make, use, or sell equipment or software in accordance with the description. No responsibility is assumed for the use or reliability of firmware on equipment not supplied by DIGITAL or its affiliated companies. Possession, use, or copying of the software or firmware described in this documentation is authorized only pursuant to a valid written license from DIGITAL, an authorized sublicensor, or the identified licensor.

    Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government with DIGITAL's standard commercial license and, when applicable, the rights in DFAR 252.227-7015, "Technical Data: Commercial Items."

    © Digital Equipment Corporation 1997. Printed in U.S.A. All rights reserved.

    Alpha, CI, DCL, DECconnect, DECserver, DIG
98. DSP register.
    Last Failure Parameter 5 contains the PCB copy of the device port DSPS register.
    Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
    Last Failure Parameter 7 contains the PCB copies of the device port LCRC/RESERVED/ISTAT/DFIFO registers.

    Last Fail Code / Explanation

    03370108  A device port detected an illegal script instruction.
        Last Failure Parameter 0 contains the PCB port_ptr value.
        Last Failure Parameter 1 contains the PCB copy of the device port TEMP register.
        Last Failure Parameter 2 contains the PCB copy of the device port DBC register.
        Last Failure Parameter 3 contains the PCB copy of the device port DNAD register.
        Last Failure Parameter 4 contains the PCB copy of the device port DSP register.
        Last Failure Parameter 5 contains the PCB copy of the device port DSPS register.
        Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
        Last Failure Parameter 7 contains the PCB copies of the device port LCRC/RESERVED/ISTAT/DFIFO registers.

    03380188  A device port's DSTAT register contains multiple asserted bits, or an invalidly asserted bit, or both.
        Last Failure Parameter 0 contains the PCB port_ptr value.
        Last Failure Parameter 1 contains the PCB copy of the device port TEMP register.
        Last Failure Parame
99. EBO memory

    Instance Code / Explanation

    01E93102
    01EA3702  The Master DRAB unexpectedly reported a Write Data Parity error. Repair action code: 37.
    01EB3702  The CACHEA0 DRAB unexpectedly reported a Write Data Parity error. Repair action code: 37.
    01EC3702  The CACHEA1 DRAB unexpectedly reported a Write Data Parity error. Repair action code: 37.
    01ED3702  The CACHEB0 DRAB unexpectedly reported a Write Data Parity error. Repair action code: 37.
    01EE3702  The CACHEB1 DRAB unexpectedly reported a Write Data Parity error. Repair action code: 37.

    Explanation

    The CACHEB1 DRAB detected a Write Data Parity error during an FX attempt to write CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during an FX attempt to write a byte to CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during a Host port attempt to write CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during a Host port attempt to write a byte to CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during a Device port attempt to write CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during a Device port attempt to write a byte to CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during an i960 attempt to write CACHEB1 memory.
    The CACHEB1 DRAB detected a Write Data Parity error during an i960 attempt to write a byte to CACHEB1 memory.

    02020064
    02032001  Disk Bad Block Replacement attempt completed for
100. [Figure labels: ER SUPPLY (2X), FIRST CONTROLLER, BA350-M SHELF. CXO-5006A-MC]

    Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface.

    9. Connect the ECB cable to the cache module and then to the ECB.
    10. Unsnap and remove the program card ESD cover.
    11. Remove the program card from the controller by pressing and holding in the Reset button, then pressing the eject button next to the program card.
    12. Connect the power cords to the controller power supplies.
    13. Press and hold the Reset button while inserting the program card. Release the Reset button. The controller will initialize and perform all internal self tests. When the reset light flashes at a rate of once every second, the initialization process is complete.
    14. Snap the ESD cover into place over the program card. Push the pins inward to lock the cover into place.
    15. Connect the host CI connector to the controller.
    16. Check the ECB status indicator for the appropriate indication. See Table 3-4.

    Table 3-4 ECB status indications

    LED status                   Battery status
    LED is on continuously       System power is on and the ECB is fully charged.
    LED blinks rapidly           System power is on and the ECB is
101. FF FDFF FRBFF F7FF EFFF BFFF DFFF 7FFF DB6D B6DB 6DB6 DB6D B6DB 6DB6 DB6D B6DB 6DB6 DB6D B6DB 6DB6 DB6D 3333 3333 3333 1999 9999 9999 B6D9 B6D9 B6D9 B6D9 FFFF FFFF 0000 0000 DB6C DB6C 9999 1999 699C E99C 9921 9921 1921 699C 699C 0747 0747 0747 699C E99C 9999 9999

    Default: Use all of the above patterns in a random method.

    Monitoring system performance with the VTDPY utility

    The VTDPY utility gathers and displays system state and performance information for the HS family of modular storage controllers. The information displayed includes processor utilization, host port activity and status, device state, logical unit state, and cache and I/O performance.

    The VTDPY utility requires a video terminal that supports ANSI control sequences, such as a VT220, VT320, or VT420 terminal. A graphics display that provides emulation of an ANSI-compatible video terminal can also be used. For DSSI and CI based HS controllers, VTDPY can be run on terminals either directly connected to the HS controller or on terminals connected through a host-based DUP connection. For SCSI based HS controllers, VTDPY can be run only on terminals connected to the HS controller maintenance terminal port.

    Note that VCS can be used only from a terminal attached to the terminal port on the front bezel of the HS array controller.

    The following sections show how to use the VTDPY utility.

    How to Run VTDPY
102. HER_CONTROLLER

        CLI> SHUTDOWN THIS_CONTROLLER

    To ensure that the controller has shut down cleanly, check the OCP for the following indications:

    The Reset LED is continuously lit.
    Port LEDs 1, 2, and 3 are also lit continuously.

    4. After the controllers have shut down, remove the maintenance terminal cable and remove the power cords from the controller power supplies.
    5. Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist.
    6. Attach or clip the other end of the ESD strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface).
    7. Disable the ECB by pressing the battery disable switch on the battery module's front panel.
    8. Loosen the captive screws on the controller's front bezel.
    9. With a small flat-head screwdriver, loosen the captive screws on the CI cable of each controller and remove the cable. See Figure 3-22.

    [Figure 3-22: Disconnecting the CI cable. Label: CI bus cable. CXO-5319A-MC]

    Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface.

    10. Disconnect the battery cable from the ECB SBB.
    11. Use a gentle rocking motion to loosen the controller modules.
    12. Slide the controller m
103. ITAL, DSSI, HSC, HSJ, HSD, HSZ, MSCP, OpenVMS, StorageWorks, TMSCP, VAX, VAXcluster, VAX 7000, VAX 10000, VMS, VMScluster, and the DIGITAL logo are trademarks of Digital Equipment Corporation. All other trademarks and registered trademarks are the property of their respective holders.

    This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at his own expense. Restrictions apply to the use of the local connection port on this series of controllers; failure to observe these restrictions may result in harmful interference. Always disconnect this port as soon as possible after completing the setup operation. Any changes or modifications made to this equipment may void the user's authority to operate the equipment.

    Warning: This is a Class A product. In a domestic environment this product may cause radio interference, in which case the user may be required to take adequate measures.

    Acht
104. Menu

        0: Exit
        1: Enter a Patch
        2: Delete Patches
        3: List Patches
        Enter option number (0..3) [0] ?

    Select option 2 to delete a patch.

        This is the Delete Patches option. The program prompts you for the
        firmware version and patch number you wish to delete. If you select a
        patch for deletion that is required for another patch, all dependent
        patches are also selected for deletion. The program lists your deletion
        selections and asks if you wish to continue.
        Type ^Y or ^C (then RETURN) at any time to abort Code Patch.

        The following patches are currently stored in the patch area:
        Firmware Version - Patch number(s)
        V123 - T2
        V456 - 1
        Currently, 90% of the patch area is free.

        Firmware Version of patch to delete ?

    5. Enter the firmware version: V456

        Patch Number to delete ?

    6. Enter the patch number to delete.

        The following patches have been selected for deletion:
        Firmware Version - Patch number(s)
        V456 - 1
        Do you wish to continue (y/n) [y] ?

    7. Press Y to continue.

        The patch you have just deleted is currently applied, but will not be
        applied when the controller is restarted.

        Code Patch Main Menu
        0: Exit
        1: Enter a Patch
        2: Delete Patches
        3: List Patches
        Enter option number (0..3) [0] ?

        The following patches are currently stored in the patch area:
        Firmware Version - Patch number(s)
        V123 - V2
        Currently, 95% of the patch area is free.
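    The Delete Patches text above says that deleting a patch which another patch requires also selects every dependent patch. The selection rule can be sketched in Python. This is only an illustration: the `requires` mapping, the function name, and the sample dependencies are hypothetical and say nothing about how the controller actually stores patches.

```python
def select_for_deletion(requires, target):
    """Return the target patch plus every patch that directly or
    indirectly requires it (hypothetical data layout)."""
    selected = {target}
    changed = True
    while changed:
        changed = False
        for patch, needed in requires.items():
            # A patch is pulled in as soon as one of its prerequisites is selected.
            if patch not in selected and selected & set(needed):
                selected.add(patch)
                changed = True
    return selected


# Hypothetical example: patch 2 requires patch 1, patch 3 requires patch 2.
requires = {1: [], 2: [1], 3: [2], 4: []}
print(sorted(select_for_deletion(requires, 1)))  # [1, 2, 3]
```

    Repeating the scan until nothing changes keeps the sketch short; it picks up indirect dependents without needing the patches in any particular order.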
105. Parameter 6 contains the PCB copies of the device port SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
    Last Failure Parameter 7 contains the PCB copies of the device port LCRC/RESERVED/ISTAT/DFIFO registers.

    The required error information packet (EIP) or device work descriptor (DWD) were not supplied to the Device Services error logging code.

    032A0100  HIS GET_CONN_INFO returned an unexpected completion code. Repair action code: 01.
    032B0100  A Device Work Descriptor (DWD) was supplied with a NULL Physical Unit Block (PUB) pointer.
    03320101  An invalid code was passed to the error recovery thread in the error_stat field of the PCB.
        Last Failure Parameter 0 contains the PCB error_stat code.

    Last Fail Code / Explanation

    03330188  A parity error was detected by a device port while sending data out onto the SCSI bus.
        Last Failure Parameter 0 contains the PCB port_ptr value.
        Last Failure Parameter 1 contains the PCB copy of the device port TEMP register.
        Last Failure Parameter 2 contains the PCB copy of the device port DBC register.
        Last Failure Parameter 3 contains the PCB copy of the device port DNAD register.
        Last Failure Parameter 4 contains the PCB copy of the device port DSP register.
        Last Failure Parameter 5 contains the PCB copy of the device port DSPS register.
        Last Failure Parameter 6 contains the PCB copies o
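    In the entries above, the last hex digit of each code agrees with the number of Last Failure Parameters documented for it: 03320101 lists one parameter and 03330188 lists parameters 0 through 7. A minimal Python sketch of splitting a code on that basis follows. The field layout (component, error number, repair action, then a final byte whose low four bits count the parameters) is an assumption inferred from these listings, and the function and field names are hypothetical.

```python
def decode_last_failure_code(code_hex):
    """Split an 8-digit hex last failure code into assumed fields."""
    value = int(code_hex, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit code")
    low_byte = value & 0xFF
    return {
        "component": (value >> 24) & 0xFF,      # assumed: reporting component
        "error_number": (value >> 16) & 0xFF,   # assumed: error within component
        "repair_action": (value >> 8) & 0xFF,   # assumed: repair action code
        "upper_flags": low_byte >> 4,           # left uninterpreted in this sketch
        "parameter_count": low_byte & 0x0F,     # matches the parameters listed
    }


for code in ("03320101", "03330188", "032B0100"):
    print(code, decode_last_failure_code(code))
```

    The upper four bits of the final byte are returned as-is because nothing in this excerpt explains them.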
106. Path B. Repair action code: 20.

    424F0100  HP could not allocate buffers to repopulate dds when we close path. Repair action code: 01.
    42506700  CI Host port detected an arbitration timeout on Path A. Persistent hardware faults generating an arbitration timeout will cause the controller to repeatedly reboot. Repair action code: 67.
    42516700  CI Host port detected an arbitration timeout on Path B. Persistent hardware faults generating an arbitration timeout will cause the controller to repeatedly reboot. Repair action code: 67.
    42590001  Ci_isr found Host Reset on Path A. Last Failure Parameter 0 contains the node number of the resetting node. Repair action code: 00.
    425A0001  Ci_isr found Host Reset on Path B. Last Failure Parameter 0 contains the node number of the resetting node. Repair action code: 00.
    425B2080  CI_ISR found the abort bit set without any valid reason, Path A. Repair action code: 20.

    Last Fail Code / Explanation

    425C2080  CI_ISR found transmit parity error without abort bit set, Path A. Repair action code: 20.
    425D2080  CI_ISR found buffer underflow without abort bit set, Path A. Repair action code: 20.
    425E2080  Ci_isr found that the yaci hardware had invalid xmt status on Path A (no bits set). Repair action code: 20.
    425F2080  CI_ISR found the abort bit set without any valid reason, Path B. Repair action code: 20.
    42602080  CI_ISR found transmit parity error without abort bit set, Path B. Repair action code: 20.
    42612080  CI_ISR found buffer underflow without abort bit set, Path B. Repair action code: 20.
    42622080  Ci_isr found that the yaci hardware had invalid xmt status on Path B (no bits set). Repair action code: 20.
    42632080  Receive_
107. Press Y for yes. Answer the question:

        Will its cache module also be removed Y/N [n] ?

    Press Y for yes. Wait for the following text to be displayed at the console:

        Killing other controller
        Attempting to quiesce all ports.
        Port 1 quiesced
        Port 2 quiesced
        Port 3 quiesced
        Port 4 quiesced
        Port 5 quiesced
        Port 6 quiesced
        All ports quiesced.
        Remove the other controller (the one WITHOUT a blinking green LED) within 5 minutes.
        Time remaining 4 minutes 50 seconds.

    Place the ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist.

    Attach or clip the other end of the ESD wrist strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface).

    Unsnap and remove the program card ESD cover on the controller you are removing. See Figure 2-21.

    Remove the program card by pressing and holding in the Reset button, then pressing the eject button next to the card. See Figure 2-21.

    Slide the controller module out of the shelf, noting which rails the module was seated in, and place it on an ESD mat. See Figure 2-22. Do not remove the CI connector.

    Wait for the following text to be displayed at the operating controller's console.

    Note: You may remove the cache module before or after port activity has restarted.

        Restarting all ports
        Port 1 restart
108. SHOW THIS_CONTROLLER

    The controller will display the following information (this is a sample only):

        Controller:
            HSJ50 2634901786 Firmware V05.0-0, Hardware BX11
            Not configured for dual-redundancy
            SCSI address 7
            Time: 15-JUN-1995 16:32:54
        Host port:
            Node name: HSJA1, valid CI node 21, 32 max nodes
            Path A is on
            Path B is on
            MSCP allocation class 3
            TMSCP allocation class 3
            CI_ARBITRATION = SYNCHRONOUS
            MAXIMUM_HOSTS = 9
            NOCI_4K_PACKET_CAPABILITY
        Cache:
            128 megabyte write cache, version 3
            Cache is GOOD
            Battery is good
            No unflushed data in cache
            CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
            CACHE_POLICY = B
            NOCACHE_UPS

    Note the type, memory size, and version of cache that is installed with the first controller. If a cache module is present with the first controller, prepare another one of the same type, memory size, and version for installation with the second controller. An additional single ECB or a dual ECB must also be installed.

    Use the procedures appropriate to your host operating system to halt host activity on your subsystem.

    At the CLI prompt, enter:

        CLI> SHUTDOWN THIS_CONTROLLER

    When you enter the SHUTDOWN command, do not specify any optional qualifiers. The default qualifiers do not allow the controller to shut down until data is completely and successfully stored on the appropriate storage devices.

    Obtain and place an ESD
109. Message: CAUTION: In order to minimize the possibility of a SCSI bus reset, which could disable the destination device, it is recommended that you prevent I/O operations to all other devices on the same port as the destination device.
    Explanation: Displayed in code load only. A SCSI bus reset can occur if the controller is manually initialized or if it detects an error during normal subsystem operation. The more active devices there are on the same port as the target device, the greater the chance that an error causing a SCSI bus reset may occur. By minimizing the level of activity on the device port being used for code loading, the user minimizes the chances of a SCSI bus reset that could render a target device unusable.

    Message: Exclusive access is declared for unit unit_number.
    Explanation: Another subsystem function has reserved the unit shown.

    Message: The other controller has exclusive access declared for unit unit_number.
    Explanation: The companion controller has locked out this controller from accessing the unit shown.

    Message: The RUNSTOP_SWITCH is set to RUN_DISABLED for unit unit_number.
    Explanation: The RUN/NORUN unit indicator for the unit shown is set to NORUN. The disk is not spun up.

    Message: No available unattached devices.
    Explanation: The program could find no unat
110. SP register.
    Last Failure Parameter 5 contains the PCB copy of the device port DSPS register.
    Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
    Last Failure Parameter 7 contains the PCB copies of the device port LCRC/RESERVED/ISTAT/DFIFO registers.

    Last Fail Code / Explanation

    03410101  Invalid SCSI device type in PUB.
        Last Failure Parameter 0 contains the PUB SCSI device type.

    03420188  A UDC interrupt could not be associated with either a DWD or the non-callable scripts.
        Last Failure Parameter 0 contains the PCB port_ptr value.
        Last Failure Parameter 1 contains the PCB copy of the device port TEMP register.
        Last Failure Parameter 2 contains the PCB copy of the device port DBC register.
        Last Failure Parameter 3 contains the PCB copy of the device port DNAD register.
        Last Failure Parameter 4 contains the PCB copy of the device port DSP register.
        Last Failure Parameter 5 contains the PCB copy of the device port DSPS register.
        Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2/SSTAT1/SSTAT0/DSTAT registers.
        Last Failure Parameter 7 contains the PCB copies of the device port LCRC/RESERVED/ISTAT/DFIFO registers.

    03470100  Insufficient memory available for target block allocation. Repair action code: 01.
    03480100  Insufficient memory available for device port info block allocation. Repair action code: 01.
    03490100  Insufficient memor
111. subsystem
    Servicing the second cache module
    Removing the SBB battery module
    Reinstalling the modules
    Restarting the subsystem
    Replacing power supplies
    Required tools
    Removing the power supply
    Installing the new power supply
    Asynchronous swap method
    Replacing storage devices
    Asynchronous disk drive swap
    Required tools
    Disk drive replacement procedure (3 1/2 and 5 1/4 inch drives)
    Replacing tape drives
    Required tools
112. Caution: Suspend all non-HSUTIL I/O to the buses that service the source disk drive and the target device. Loading the incorrect firmware can disable the destination device. If a failure occurs while loading drive memory, the destination device could be disabled.

    To install new device firmware:

    1. In dual-redundant configurations, you should shut down the controller that you won't be using for the installation and eject its program card. After you've finished installing the firmware, reinstall the program card and restart the controller.
    2. Start HSUTIL:

        CLI> RUN HSUTIL

    3. Press 2 to select DEVICE_CODE_LOAD_DISK or 3 to select DEVICE_CODE_LOAD_TAPE. HSUTIL finds and displays all of the disk drives that may contain the new firmware for your device.
    4. Enter the unit number of the disk drive that actually contains the firmware.

        Which unit is the code to be loaded from ? unit number

    5. Enter the starting LBN of the firmware. In most cases, you can accept the default (0).

        What is the starting LBN of the code on the unit where the code is to be loaded FROM [0] ? 0

    6. Enter the product ID of the device that you're updating. Enter this information exactly as it appears in the SHOW command output.

        What is the SCSI PRODUCT ID of the device that you want code load TO ? Product ID

    7. Enter the name of the target device.

        Which device
113. THRESHOLD_CODE
    RESTART_TYPE
    SCSI_COMMAND_OPERATION_CODE
    SENSE_DATA_QUALIFIERS
    SENSE_KEY_CODE
    TEMPLATE_CODE

    To translate a code:

    1. Start FMU from the CLI:

        HSJ50> RUN FMU

    2. Use the correct DESCRIBE command and give it the code number that you want translated:

        FMU> DESCRIBE code-type code-number [additional numbers]

    FMU Output Example

        HSJ50> RUN FMU
        Fault Management Utility
        FMU> DESCRIBE INSTANCE_CODE 030C4002
        Instance Code: 030C4002
        Description: A Drive failed because a Test Unit Ready command or a Read Capacity command failed.
        Reporting Component: 3.(03)
        Description: Device Services
        Reporting component's event number: 12.(0C)
        Event Threshold: 2.(02) Classification: HARD. Failure of a component that affects controller performance or precludes access to a device connected to the controller is indicated.
        FMU> DESCRIBE REPAIR_ACTION_CODE 22
        Recommended Repair Action Code: 34.(22)
        Description: Replace the indicated cache module.

    If you are unsure what value to enter with the DESCRIBE command, type a question mark (?) in place of a parameter to see the value and range required. For those code types that require multiple values, you must supply values for the earlier parameters before entering a question mark for the later values.

    FMU Help Example

        FMU> DESCRIBE ASC_ASCQ_CODE ?
        Your options are: ASC value (range: 0
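    The FMU output example above breaks instance code 030C4002 into a reporting component (03), an event number (0C), and an event threshold (02). A minimal Python sketch of that split follows, assuming the code is four one-byte fields read left to right. The third byte (40 here) is labelled a repair action code only by assumption, since the FMU example does not print it; the function and field names are hypothetical.

```python
def decode_instance_code(code_hex):
    """Split an 8-digit hex instance code into four one-byte fields."""
    value = int(code_hex, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit code")
    return {
        "component": (value >> 24) & 0xFF,      # FMU: Reporting Component
        "event_number": (value >> 16) & 0xFF,   # FMU: component's event number
        "repair_action": (value >> 8) & 0xFF,   # assumption: not shown by FMU
        "threshold": value & 0xFF,              # FMU: Event Threshold
    }


fields = decode_instance_code("030C4002")
# Print each field in decimal and hex, the way the FMU example does.
for name, number in fields.items():
    print(f"{name}: {number}.({number:02X})")
```

    This is no substitute for DESCRIBE, which also supplies the text description and classification; it only shows where the numbers in the FMU output come from.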
114. U has detected an external air sense OE fault. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.

    Instance Code / Explanation

    03F00402
    03F10502
    03F20064

    The EMU-detected power supply fault is now fixed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.

    The EMU-detected bad fan fault is now fixed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.

    The EMU-detected elevated temperature fault is now fixed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.

    The EMU-detected external air sense fault is now fixed. Note that in this instance the Associated Target, Associated Additional Sense Code, and Associated Additional Sense Code Qualifier fields are undefined.

    The shelf indicated by the port field is reporting a problem. This could mean one or both of the following:
    If the shelf is using dual power supplies, one power supply has failed.
    One of the shelf cooling fans has failed.
    Note that in this instance the Associated
115. [Figure: labels not recoverable. CXO-5259A-MC]

    Considerations for installing new device firmware

    Keep the following points in mind while using HSUTIL to install new firmware on a device:

    Installing new firmware with HSUTIL has been thoroughly tested with the qualified devices listed in the release notes. HSUTIL doesn't prevent you from attempting to install new firmware on unsupported devices, but if the installation fails, the device may be rendered unusable and require the manufacturer's attention.

    If the power fails or the bus is reset while HSUTIL is installing the new firmware, the device may become unusable. To minimize this possibility, Digital recommends that you secure a reliable power source and suspend all non-HSUTIL activity to the bus that services the target device.

    HSUTIL cannot install firmware on devices that have been configured as single disk drive units or as members of a storageset, spareset, or failedset. If you want to install firmware on a device that's previously been configured as such, you'll have to delete the unit number and storageset name associated with it.

    The source disk drive that contains the new firmware to be downloaded must be configured as a single disk driv
116. Y displays

    The display and subdisplay you see depend on the type of controller: CI and DSSI based controllers or SCSI and HS based controllers. Characteristics for each are included in the field description sections that follow.

    CI/DSSI Host Port Characteristics

        Node HSJ501 Port 13 SysId 4200100D0720

    Description

    This subdisplay shows the current host port identification information. This subdisplay is available only for CI or DSSI based controllers. The fields are:

    SCS node name
    Port number
    SCS system ID

    SCSI Host Port Characteristics

        Xfer Rate
        T W I Mhz
        1 W 7 10.00
        2 W Async

    Description

    This subdisplay shows the current host port SCSI target identification, any initiator which has negotiated synchronous transfers, and the negotiated transfer method currently in use between the controller and the initiators. This subdisplay is available only for SCSI based HS controllers. The fields are:

    SCSI host port target ID.
    Transfer width. W indicates 16-bit or wide transfers are being used. A space indicates 8-bit transfers are being used.
    The initiator with which synchronous communication has been negotiated.
    A numeric value indicates the synchronous data rate which has been negotiated with the initiator at the specified SCSI ID. The value is listed in megahertz (Mhz). In this example, the negotiated synchronous transfer rate is approximately 3.57 Mhz. To convert thi
117. Tape drive replacement procedure
    Replacing a solid state disk drive, optical, or CD ROM drives
    Required tools
    Solid state, optical, and CD ROM drive replacement procedure
    Replacing internal CI cables
    Required tools
    Replacing the internal CI cables
    Replacing SCSI device port cables
    Required tools
    Replacing the device cables

    3 Installing

    Precautions
    Electrostatic discharge protection
    Handling controllers or cache modules
    Handling the program cards
    Patching controller software
    Code patch
118. able A 24 Table A 25 Table A 26 Table A 27

    Disk and tape MSCP server last failure codes
    Diagnostics and utilities protocol server last failure codes
    System communication services directory last failure code
    SCSI host value-added services last failure codes
    Disk inline exerciser (DILX) last failure codes
    Tape inline exerciser (TILX) last failure codes
    Device configuration utilities (CONFIG/CFMENU) last failure codes
    Clone unit utility (CLONE) last failure codes
    Format and device code load utility (HSUTIL) last failure codes
    Code load/code patch utility (CLCP) last failure codes
    Induce controller crash utility (CRASH) last failure codes
    Repair action codes

    Related documents

    The following table lists documents that contain information related to this product:

    Document title                                                  Order number
    DECevent Installation Guide                                     AA-Q73JA-TE
    StorageWorks BA350-MA Controller Shelf User's Guide             EK-350MA-UG
    StorageWorks Configuration Manager for DEC OSF/1 Installati     AA-QC38A-TE
119. acters:

    ^  For disks, this symbol indicates the device is at speed. For tapes, it indicates the tape is loaded.
    >  For disks, this symbol indicates the device is spinning up. For tapes, it indicates the tape is loading.
    <  For disks, this symbol indicates the device is spinning down. For tapes, it indicates the tape is unloading.
    v  For disks, this symbol indicates the device is stopped. For tapes, it indicates the tape is unloaded.

    For other types of devices, this column is left blank.

    For disks and tapes, a W in the write-protect column indicates the device is hardware write protected. This column is left blank for other device types.

    An F in the fault column indicates an unrecoverable device fault. If this field is set, the device fault indicator also is illuminated.

    Rq/S: This column shows the average I/O request rate for the device during the last update interval. These requests are up to eight kilobytes long and are generated either by host requests or by cache flush activity.

    RdKB/S: This column shows the average data transfer rate from the device, in kilobytes, during the previous screen update interval.

    WrKB/S: This column shows the average data transfer rate to the device, in kilobytes, during the previous screen update interval.

    Que: This column shows the maximum number of transfer requests waiting to be transf
120. apes, it indicates the tape is loaded.

    >  For disks, this symbol indicates the device is spinning up. For tapes, it indicates the tape is loading.
    <  For disks, this symbol indicates the device is spinning down. For tapes, it indicates the tape is unloading.
    v  For disks, this symbol indicates the device is stopped. For tapes, it indicates the tape is unloaded.

    For other types of devices, this column is left blank.

    For disks and tapes, a w in the write-protect column indicates the unit is write protected. This column is left blank for other device types.

    The data caching state is indicated using the following letters:

    b  Both Read caching and Write Back caching are enabled.
    r  Read caching is enabled.
    w  Write Back caching is enabled.

    A space in this column indicates caching is disabled.

    KB/S: This column indicates the average number of kilobytes of data transferred to and from the unit in the previous screen update interval. This data is available only for disk and tape units.

    Rd%: This column indicates what percentage of the data transferred between the host and the unit was read from the unit. This data is contained only in the DEFAULT display for disk and tape device types.

    Wr%: This column indicates what percentage of the data transferred between the host and the unit was written to the unit. This data is contained only in the DEFAULT display for disk and tape device types.
121. are not available.
  Message: Unable to change operation mode to maintenance for unit unit_number
  Explanation: HSUTIL was unable to put the source single disk drive unit into maintenance mode to enable formatting or code load.
  Message: Unit unit_number successfully allocated
  Explanation: HSUTIL has allocated the single disk drive unit for the code load operation. At this point, the unit and its associated device are not available for other subsystem operations.
  Message: Unable to allocate unit
  Explanation: HSUTIL could not allocate the single disk drive unit. An accompanying message explains the reason.
  Message: Unit is owned by another sysap
  Explanation: The device cannot be allocated because it is being used by another subsystem function or local program.
  Message: Unit unit_number is in maintenance mode
  Explanation: The device cannot be formatted or code loaded because it is being used by another subsystem function or local program.
  Message: Unit unit_number is allocated to other controller; please re-invoke HSUTIL from the other controller or make the unit allocated to this controller by one of the following commands:
    SET THIS PREFERRED_ID=(unit's target ID)
    SET OTHER NOPREFERRED_ID
  Explanation: The device shown is still under the control of the companion controller. Follow the recommended steps to run HSUTIL.
122. ast Failure Parameter 1 contains the value supplied in the eip->generic.mscp1.flgs field.
  Unexpected template type found during fmu_display_errlog processing. Last Failure Parameter 0 contains the unexpected template value.
  Unexpected instance code found during fmu_memerr_report processing. Last Failure Parameter 0 contains the unexpected instance code value.
  CLIB SDD_FAO call failed. Last Failure Parameter 0 contains the failure status code value.
  Last Fail Code  Explanation
  04130103  The event log format found in the eip is not supported by the Fault Manager. The bad format was discovered while trying to fill in the DLI of the supplied eip. Last Failure Parameter 0 contains the instance code value. Last Failure Parameter 1 contains the format code value. Last Failure Parameter 2 contains the requester error table index value.
  04140103  The template value found in the eip is not supported by the Fault Manager. The bad template value was discovered while trying to build an esd. Last Failure Parameter 0 contains the instance code value. Last Failure Parameter 1 contains the template code value. Last Failure Parameter 2 contains the requester error table index value.
  Table A-6 Common library last failure codes
  Last Fail Code  Explanation
  05010100  In recursive_nonconflict could not get enough memory for scanning the keyword tables for configuration name
123. aster DRAB detected an Ibus Parity Error during an i960 CACHEA Control and Status Register access with a simultaneous but unrelated buffer memory access. Repair action: 32.
  01BC3202  The Master DRAB detected an Ibus Parity Error during an i960 CACHEB Control and Status Register access with a simultaneous but unrelated buffer memory access. Repair action: 32.
  01BD3702  The Master DRAB unexpectedly reported an Ibus Parity error. Repair action: 37.
  01BE3702  The CACHEA0 DRAB unexpectedly reported an Ibus Parity error. Repair action: 37.
  01BF3702  The CACHEA1 DRAB unexpectedly reported an Ibus Parity error. Repair action: 37.
  01C03702  The CACHEB0 DRAB unexpectedly reported an Ibus Parity error. Repair action: 37.
  01C13702  The CACHEB1 DRAB unexpectedly reported an Ibus Parity error. Repair action: 37.
  01C22F02  The Master DRAB detected a Write Data Parity error during an FX attempt to write buffer memory. Repair action: 2F.
  01C32F02  The Master DRAB detected a Write Data Parity error during an FX attempt to write a byte to buffer memory. Repair action: 2F.
  Explanation
  The Master DRAB detected a Write Data Parity error during a Host port attempt to write buffer memory.
  The Master DRAB detected a Write Data Parity error during a Host port attempt to write a byte to buffer memory.
  The Master DRAB detected a Write Data Parity error during a Device port attempt to write buffer memory.
  The Master DRAB detected a Write Data Parity error during a Device port attemp
124. ated with this Last Failure code.
  All structures contained in the System Information Page and the Last Failure entries have been reset to their default settings as the result of certain controller manufacturing configuration activities. If this event is reported at any other time, follow the recommended repair action associated with this Last Failure code.
  Non-maskable interrupt entered but no non-maskable interrupt pending. This is typically caused by an indirect call to address 0.
  Last Fail Code: 01110106, 01126880, 01136880, 01150106
  Explanation
  A bugcheck occurred during EXEC BUGCHECK processing. Last Failure Parameter 0 contains the executive flags value. Last Failure Parameter 1 contains the RIP from the bugcheck call stack. Last Failure Parameter 2 contains the first SIP last failure parameter value. Last Failure Parameter 3 contains the second SIP last failure parameter value. Last Failure Parameter 4 contains the SIP last failure code value. Last Failure Parameter 5 contains the EXEC BUGCHECK call last failure code value.
  A processor interrupt was generated by the CACHA Dynamic Ram controller and ArBitration engine (DRAB) with an indication that the CACHE backup battery has been disconnected.
  A processor interrupt was generated by the CACHB Dynamic Ram controller and ArBitration engine (DRAB) with an indication that the CACHE backup
125. ation
  60290100  HIS did not allocate an HTB when there should have been one reserved for this connection, as determined by mscp_rcv_listen.
  602A0100  HIS did not allocate an HTB when there should have been one reserved for this connection, as determined by dmscp_dcd_sre_gces_send.
  602B0100  HIS did not allocate an HTB when there should have been one reserved for this connection, as determined by dmscp_dcd_comm_path_event.
  602C0100  When trying to put the extra send HTB on the connection's send_htb_list, there was already one on the queue.
  602D0100  The VA CHANGE_STATE service did not set the Software write protect as requested for disk.
  602E0100  The VA CHANGE_STATE service did not set the Software write protect as requested for tape.
  603B0100  Initial HIS LISTEN call for MSCP DISK was unsuccessful.
  603C0100  Initial HIS LISTEN call for MSCP TAPE was unsuccessful.
  60400100  Unrecognized or invalid in this context return value from routine RESMGR ALLOCATE_DATA_SEGMENT while dmscp_dcd_allocate_dseg was attempting to allocate a data segment.
  60410100  Unrecognized or invalid in this context return value from routine RESMGR ALLOCATE_DATA_BUFFERS while dmscp_dcd_allocate_dbuf was attempting to allocate a data buffer.
  60420100  dmscp_dcd_rmte_end_msg was unable to find a command message that corresponds to the end message it is currently processing.
  60440100  dmscp_dced_src_gces_cmpl found the command being GCSed is no longer at the head of the
126. attery disable switch on the battery module's front panel.
  Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface.
  7. Disconnect the ECB cables. See Figure 5-4.
  Figure 5-4 Removing the ECB cables
  8. Loosen the captive screws on the controller and cache module's front bezel.
  9. Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist.
  10. Attach the other end of the ESD strap to the cabinet grounding stud or a convenient cabinet grounding point (non-painted surface).
  11. Slide the controller out of the shelf (see Figure 5-5) and place it in an ESD bag.
  Figure 5-5 Removing the controller and cache module
  12. If required, slide the cache module out of the shelf and place it in an ESD bag. See Figure 5-5.
  13. Remove the ECB and store it with the cache module.
  Removing storage devices
  Remove storage devices so they can be used in other subsystems.
  Removing disk drives
  Disk drives may be removed without having to quiesce the device bus or remove power from the shelf, with the following rest
127. battery has been disconnected.
  A bugcheck occurred before subsystem initialization completed. Last Failure Parameter 0 contains the executive flags value. Last Failure Parameter 1 contains the RIP from the bugcheck call stack. Last Failure Parameter 2 contains the first SIP last failure parameter value. Last Failure Parameter 3 contains the second SIP last failure parameter value. Last Failure Parameter 4 contains the SIP last failure code value. Last Failure Parameter 5 contains the EXEC BUGCHECK call last failure code value.
  018000A0  A powerfail interrupt occurred.
  Last Fail Code  Explanation
  018600A0  A processor interrupt was generated with an indication that the other controller in a dual controller configuration asserted the KILL line to disable this controller.
  018700A0  A processor interrupt was generated with an indication that the RESET button on the controller module was depressed.
  018900A0  A processor interrupt was generated with an indication that the controller inactivity watchdog timer expired.
  018B2580  An NMI interrupt was generated with an indication that a memory system problem occurred.
  018C2580  A DRAB_INT interrupt was generated with an indication that a memory system problem occurred.
  Last Fail Code  Explanation
  02000100  Initialization code was unable to allocate enough memory to set up the receive data descriptors.
  0
128. ce code contained in the memory address field of this event report to correlate this event report with the other event report.
  If bit 31 of the DCSR register of the DRAB that detected the failure is set, it indicates a firmware fault; follow repair action 01. If bit 31 is not set, follow repair action 36.
  If bits 20 through 23 of the WDR1 register contain a non-zero value, it indicates a firmware fault; follow repair action 01. If bits 20 through 23 contain zero, follow repair action 36.
  No other information is available to aid in diagnosing the cause of the failure. If the Master DRAB detected the failure, follow repair action 20. If the CACHEAn or CACHEBn DRAB detected the failure, follow repair action 22. If the problem persists, follow repair action 01.
  The Memory System Failure translator could not determine the failure cause. Follow repair action 01.
  If the Sense Data FRU field is non-zero, follow repair action 41. If the Sense Data FRU field is zero, replace the appropriate FRU associated with the device's SCSI interface, or the entire device.
  Update the configuration data to correct the problem.
  Replace the SCSI cable for the failing SCSI bus. If the problem persists, replace the controller backplane, drive backplane, or controller module.
  Interpreting the device-supplied Sense Data is beyond the scope of the controller's firmware. See the device's service manual to determine the appropriate repair action, if any.
  Swa
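The two register checks described above (bit 31 of DCSR, and bits 20 through 23 of WDR1) can be expressed as simple bit tests. The Python sketch below is a minimal illustration, assuming the register contents are available as 32-bit integers with bit 0 as the least significant bit; the function names are hypothetical and are not part of any controller utility.

```python
# Hypothetical sketch of the repair-action selection described above.
# Assumes 32-bit register values, with bit 0 as the least significant bit.

def repair_action_from_dcsr(dcsr):
    """Bit 31 set indicates a firmware fault (repair action 01); otherwise 36."""
    firmware_fault = (dcsr >> 31) & 1 == 1
    return "01" if firmware_fault else "36"


def repair_action_from_wdr1(wdr1):
    """A non-zero value in bits 20 through 23 indicates a firmware fault
    (repair action 01); a zero value selects repair action 36."""
    field = (wdr1 >> 20) & 0xF
    return "01" if field != 0 else "36"
```

For instance, a DCSR value of 0x80000000 selects repair action 01, while a WDR1 value of 0xFF0FFFFF (bits 20 through 23 all clear) selects repair action 36.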
129. Reading a DECevent error log 1-15
  Using FMU to Describe Event Log Codes 1-19
  Using FMU to Describe Recent Last Fail or Memory System Failure Codes 1-21
  For Examples 1-21
  Testing disks (DILX) 1-22
  Running a quick disk test 1-23
  Running an initial test on all disks 1-24
  Running a disk basic function test 1-25
  Running an advanced disk test 1-28
  DILX error codes 1-31
  DILX data patterns 1-32
  Testing tapes (TILX) 1-33
  Running a quick tape test 1-33
  Running a tape drive basic function test 1-34
  Running a tape drive read-only test 1-37
  Running an advanced tape drive test
130. cells, 2-33
  Write-through cache, G-12
131. charging
  LED blinks slowly: System power is off and the ECB is supplying power to the cache.
  LED is off: System power is off and the ECB is not supplying power to the cache.
  17. Enable the write-back cache at the CLI:
    CLI> SET unit-name WRITEBACK_CACHE
  Installing a second controller and cache module
  This procedure may be used to install a second controller and cache module for redundancy. Service to the subsystem is halted during this procedure. Use this procedure if you prefer not to use the C_SWAP utility.
  Required tools
  The tools listed in Table 3-5 are required for the installation of a second controller and cache module.
  Table 3-5 Required tools
  Required tools  Purpose
  Maintenance terminal  To set controller parameters
  ESD wrist strap  To protect all equipment against electrostatic discharge
  5/32-inch Allen wrench  To unlock the SW800-series cabinet
  Flat-head screwdriver  To loosen controller mounting screws and to disconnect SCSI cables
  Add a second controller
  1. Connect a maintenance terminal to the operating controller. See Figure 3-9.
  Figure 3-9 Connecting a maintenance terminal to the controller
  At the operating controller's terminal, enter:
    CLI>
132. conflicts.
  05030100  In clib scan_for_nonconflict could not get enough memory for scanning the keyword tables for configuration name conflicts.
  Table A-7 DUART services last failure codes
  Last Fail Code  Explanation  Repair Action
  06010100  The DUART was unable to allocate enough memory to establish a connection to the CLI. Repair action: 01.
  06020100  A port other than terminal port A was referred to by a set terminal characteristics command. This is illegal.
  06030100  A DUP question or default question message type was passed to the DUART driver, but the pointer to the input area to receive the response to the question was NULL.
  06040100  Attempted to detach unattached maintenance terminal. Repair action: 01.
  06050100  Attempted output to unattached maintenance terminal. Repair action: 01.
  06060100  Attempted input from output-only maintenance terminal service. Repair action: 01.
  06070100  The DUART was unable to allocate enough memory for its input buffers. Repair action: 01.
  06080000  Controller was forced to restart due to entry of a CONTROL-K character on the maintenance terminal. Repair action: 00.
  Table A-8 Failover control last failure codes
  Last Fail Code  Explanation  Repair Action
  07010100  All available slots in the FOC notify table are filled. Repair action: 01.
  07020100  FOC CANCEL_NOTIFY was called to disable notification for a rtn that did not have notification enabled. Repair action: 01.
  07030100  Unable to start the Failover Control Timer before main loop 0
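In the rows above where a Repair Action value is legible, it matches the third byte of the eight-digit code (for example, 06080000 and repair action 00), and the first byte matches the group each table covers (05, 06, 07). The Python sketch below splits a code on that basis. Treat this as an observation drawn from these rows rather than a documented code format; the function names are hypothetical.

```python
# Hypothetical sketch: split an eight-digit last failure code into bytes.
# The meaning assigned to each byte is inferred from the table rows above,
# not taken from a documented format.

def split_code(code):
    """Return the four two-digit hex bytes of an eight-digit code."""
    if len(code) != 8:
        raise ValueError("expected an eight-digit hexadecimal code")
    int(code, 16)  # raises ValueError if the text is not hexadecimal
    return [code[i:i + 2].upper() for i in range(0, 8, 2)]


def group_byte(code):
    """First byte: matches the table group in the rows shown (05, 06, 07)."""
    return split_code(code)[0]


def repair_action_byte(code):
    """Third byte: matches the Repair Action column in the rows shown."""
    return split_code(code)[2]
```

For example, repair_action_byte("07010100") returns "01", which agrees with the Repair Action listed for that row.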
133. considerations 3-3
  Listing patches 3-4
  Installing a patch 3-6
  Code patch messages 3-9
  Formatting disk drives 3-12
  Considerations for formatting disk drives 3-12
  Installing new firmware on a device 3-15
  Considerations for installing new device firmware 3-16
  Copying the firmware to your subsystem 3-17
  From OpenVMS 3-17
  Installing the firmware onto a target device 3-18
  HSUTIL abort codes 3-21
  HSUTIL messages 3-21
  Installing a controller and cache module (single controller configuration) 3-25
  Required tools 3-25
  Installing a second controller and cache module
134. controller
  flush: The act of writing data from the cache module to the media.
  FRU: Field replaceable unit. A hardware component that can be replaced.
  FWD SCSI: Fast, wide, differential SCSI. The differential SCSI bus with a 16-bit parallel data path that yields a transfer rate of up to 20 MB/s.
  half-height device: A device that occupies half of a 5.25-inch SBB carrier. Two half-height devices can be mounted in a 5.25-inch SBB carrier. The first half-height device is normally mounted in the lower part of the carrier. The second device is normally mounted in the upper part of the carrier.
  HBVS: Host-based volume shadowing. Also known as Phase 2 volume shadowing.
  HSOF: Hierarchical storage operating firmware. Software contained on a program card that provides the logic for the HS array controllers.
  HIS: Host interconnect services. The firmware in the HS array controller that communicates with the host.
  host: Any computer to which a storage subsystem can be attached.
  hot swap: A method of replacing a device whereby the system that contains the device remains online and active during replacement. The device being replaced is the only device that cannot perform operations during a hot swap.
  initiator: A SCSI device that requests an I/O process to be performed by another SCSI device (a target). This is always the controller.
  local terminal: A terminal plugged into
135. cted a Nonexistent Memory Error condition during an FX attempt to write a byte to CACHEB1 memory.
  01892E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during an FX attempt to read CACHEB1 memory.
  018A2E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during a Host port attempt to write CACHEB1 memory.
  018B2E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during a Host port attempt to write a byte to CACHEB1 memory.
  018C2E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during a Host port attempt to read CACHEB1 memory.
  018D2E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write CACHEB1 memory.
  018E2E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write a byte to CACHEB1 memory.
  018F2E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during a Device port attempt to read CACHEB1 memory.
  01902E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write CACHEB1 memory.
  01912E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write a byte to CACHEB1 memory.
  01922E02  The CACHEB1 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to read CACHEB1 memory.
  01933702  The Master DRAB unexpectedly reported a Nonexistent Memory Error condition. Repair action: 37.
136. cted in build_raid_node. Repair action: 01.
  02920100  Unable to handle that many bad dirty pages (exceeded MAX_BAD_DIRTY). Cache memory is bad. Repair action: 01.
  02930100  There was no free or freeable buffer to convert bad metadata or to borrow a buffer during failover of bad dirty.
  02940100  A free Device Correlation Array entry could not be found during write-back cache failover.
  02950100  Invalid DCA state detected in start_crashover. Repair action: 01.
  02960100  Invalid DCA state detected in start_failover. Repair action: 01.
  02970100  Invalid DCA state detected in init_failover. Repair action: 01.
  02980100  This bugcheck was created for testing purposes only, specifically testing write-back cache failover. It should not be seen in the field. Repair action: 01.
  02990100  A free RAID Correlation Array entry could not be found during write-back cache failover. Repair action: 01.
  029A0100  Invalid cache buffer metadata detected while scanning the Buffer Metadata Array. Found a page containing dirty data but the corresponding Device Correlation Array entry does exist. Repair action: 01.
  029D0100  Invalid metadata combination detected in build_bad_raid_node. Repair action: 01.
  029E0100  Distinguished member is not null_pub. This last_fail code was removed from HSOF firmware at Version 2.5. Repair action: 01.
  029F0100  The Cache Manager software has insufficient resources to handle a buffer request pending. Repair action: 01.
  02A00100  VA change state is trying to change device affinity and the cache has data for this device. Repair action: 01.
  02A10100  Pubs not one when transportable.
  02A20100  Pubs not one when transportable.
137. ctedly reported a Multiple Bit ECC error. Repair action: 37.
  014F3702  The CACHEA1 DRAB unexpectedly reported a Multiple Bit ECC error.
  01503702  The CACHEB0 DRAB unexpectedly reported a Multiple Bit ECC error.
  01513702  The CACHEB1 DRAB unexpectedly reported a Multiple Bit ECC error.
  01522B02  The Master DRAB detected an Ibus to Nbus Time-out condition during an i960 to Nbus device transaction. The Nbus device failed to respond.
  01533702  The CACHEA0 DRAB unexpectedly reported an Ibus to Nbus Time-out condition.
  01543702  The CACHEA1 DRAB unexpectedly reported an Ibus to Nbus Time-out condition.
  01553702  The CACHEB0 DRAB unexpectedly reported an Ibus to Nbus Time-out condition.
  01563702  The CACHEB1 DRAB unexpectedly reported an Ibus to Nbus Time-out condition.
  01572C02  The Master DRAB detected a Nonexistent Memory Error condition during an FX attempt to write buffer memory.
  01582C02  The Master DRAB detected a Nonexistent Memory Error condition during an FX attempt to write a byte to buffer memory.
  01592C02  The Master DRAB detected a Nonexistent Memory Error condition during an FX attempt to read buffer memory.
  015A2C02  The Master DRAB detected a Nonexistent Memory Error condition during a Host port attempt to write buffer memory.
  015B2C02  The Master DRAB detected a Nonexistent Memory Error condition during a Host port attempt to write a byte to buffer memory.
  01632D02 01642D02 01652D02
138. d an Nbus Transfer Error Acknowledge condition.
  The CACHEB1 DRAB unexpectedly reported an Nbus Transfer Error Acknowledge condition.
  A Multiple Bit ECC error was detected during a memory refresh attempt by the Master DRAB.
  A Multiple Bit ECC error was detected during a memory refresh attempt by the CACHEA0 DRAB.
  A Multiple Bit ECC error was detected during a memory refresh attempt by the CACHEA1 DRAB.
  A Multiple Bit ECC error was detected during a memory refresh attempt by the CACHEB0 DRAB.
  A Multiple Bit ECC error was detected during a memory refresh attempt by the CACHEB1 DRAB.
  The Master DRAB detected a Multiple Bit ECC error during an FX attempt to read buffer memory.
  The Master DRAB detected a Multiple Bit ECC error during a Host port attempt to read buffer memory.
  The Master DRAB detected a Multiple Bit ECC error during a Device port attempt to read buffer memory.
  Explanation
  The Master DRAB detected a Multiple Bit ECC error during an i960 attempt to read buffer memory.
  The CACHEA0 DRAB detected a Multiple Bit ECC error during an FX attempt to read CACHEA0 memory.
  The CACHEA0 DRAB detected a Multiple Bit ECC error during a Host port attempt to read CACHEA0 memory.
  The CACHEA0 DRAB detected a Multiple Bit ECC error during a Device port attempt to read CACHEA0 memory.
  The CACHEA0 DRAB detected a Multiple Bit ECC error during an i960
139. d be flashing on and off about one time per second.
  - Exit any local programs that you may be running, such as C_SWAP or CFMENU.
  - Wait until the CLI prompt appears on your local or remote terminal before inserting or removing any device.
  - Wait about one minute after inserting each device before you insert another.
  - Don't insert or remove a device during failover or failback.
  Moving storagesets
  You can move a storageset from one subsystem to another without destroying its data. You can also follow these steps to move a storageset to a new location within the same subsystem.
  Figure 4-1 Moving a storageset from one subsystem to another
  To move a storageset while maintaining the data it contains:
  1. Show the details for the storageset that you want to move:
    HSJ50> SHOW storageset-name
  2. Label each member with its name and PTL location. If you don't have a storageset map for your subsystem, you can LOCATE each member to find its PTL location:
    HSJ50> LOCATE disk-name
  To cancel the locate command:
    HSJ50> LOCATE CANCEL
  3. Delete the unit number shown in the "Used by" column of the SHOW storageset-name command:
    HSJ50> DELETE unit-number
  4. Delete the storageset shown in
140. d contains the starting physical address of the CACHEA0 memory.
  The CACHE Dynamic Ram controller and ArBitration engine 1 (DRAB1) failed testing performed by the cache diagnostics. The Memory Address field contains the starting physical address of the CACHEA1 memory.
  A data compare error was detected during the execution of a compare-modified READ or WRITE command.
  A data compare error was detected during the execution of a compare-modified READ or WRITE command. Note that in this instance the SCSI Device Sense Data fields cmdopced through keyspec are undefined.
  A failed read test of a write-back metadata page residing in cache occurred. Dirty write-back cached data exists and cannot be flushed to media. The dirty data is lost. The Memory Address field contains the starting physical address of the CACHEA0 memory.
  Cache diagnostics have declared the cache bad during testing. The Memory Address field contains the starting physical address of the CACHEA0 memory.
  020D2401  The wrong write cache module is configured. The serial numbers do not match. Either the existing or the expected cache contains dirty write-back cached data. Note that in this instance the Memory Address, Byte Count, DRAB register, and Diagnostic register fields are undefined.
  020E2401  The write cache module is missing. A cache is expected to be configured and contains dirty write-back cached data. Note that in
141. d write fail. Corrective action: Replace the card.
  Description of Error  Corrective Action
  ILF INIT unable to allocate memory.  Reset the controller.
  Bugcheck before subsystem initialization completed.  Reset the controller.
  No program card seen.  Try the card in another module. If the problem follows the card, replace the card. Otherwise, replace the controller.
  Table 1-2 Flashing controller LED codes
  Description of Error  Corrective Action
  Program card EDC error.  Replace program card.
  Timer zero in the timer chip will run when disabled.  Replace controller module.
  Timer zero in the timer chip decrements incorrectly.  Replace controller module.
  Timer zero in the timer chip did not interrupt the processor when requested.  Replace controller module.
  Timer one in the timer chip decrements incorrectly.  Replace controller module.
  Timer one in the timer chip did not interrupt the processor when requested.  Replace controller module.
  Timer two in the timer chip decrements incorrectly.  Replace controller module.
  Timer two in the timer chip did not interrupt the processor when requested.  Replace controller module.
  Memory failure in the I/D cache.  Replace controller module.
  No hit or miss to the I/D cache when expected.  Replace controller module.
  One
142. de
  60140100  tmscp_clear_sex_cdl_cmpl_rtn detected an unexpected opcode.
  60150100  VA CHANGE_STATE failed to change the SW Write protect when requested to do so as part of the Disk Set Unit Characteristics command.
  60160100  VA CHANGE_STATE failed to change the SW Write protect when requested to do so as part of the Tape Set Unit Characteristics command. Repair action: 01.
  60170100  Invalid type in entry of long interval work queue. Repair action: 01.
  60180100  mscp_short_interval found an invalid type in entry of long interval work queue. Repair action: 01.
  60190100  dmscp_dcd_send_cmd found that the SIWI Work Item code supplied is unrecognized or invalid in this context during DCD inhibited processing. Repair action: 01.
  601B0100  Invalid EVENT_CODE parameter in call to dmscp_connection_event. Repair action: 01.
  601C0100  Invalid EVENT_CODE parameter in call to tmscp_connection_event. Repair action: 01.
  601D0100  Invalid EVENT_CODE parameter in call to dmscp_dcd_comm_path_event. Repair action: 01.
  601E0100  Invalid EVENT_CODE parameter in call to dmscp_dcd_comm_path_event. Repair action: 01.
  60250100  An attempt was about to be made to return a progress indicator to the host that was 0xFFFFFFFF, the only invalid value. Repair action: 01.
  60260100  A WH_DAF command was requested to be performed by the wrong process. Repair action: 01.
  60270100  A nonimmediate WHM operation was passed to the dmscp_exec_whm_immediate routine. Repair action: 01.
  60280100  This routine found an invalid xfer_state so cannot continue. Repair action: 01.
143. de the user data area of the disk. Note that due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the event report.
  021D0064  Unable to lock the other controller's cache in a write cache failover attempt. Either a latent error could not be cleared on the cache or the other controller did not release its cache. Note that in this instance the Memory Address, Byte Count, DRAB register, and Diagnostic register fields are undefined.
  021E0064  The device specified in the Device Locator field has been added to the RAIDset associated with the logical unit. The RAIDset is now in Reconstructing state.
  Instance Code: 021F0064, 02200064, 02210064, 02220064, 02230064, 02240064, 0227000A, 02280064, 02290064
  Explanation
  The device specified in the Device Locator field has been removed from the RAIDset associated with the logical unit. The removed device is now in the Failedset. The RAIDset is now in Reduced state.
  The device specified in the Device Locator field failed to be added to the RAIDset associated with the logical unit. The device will remain in the spareset.
  The device specified in the Device Locator field failed to be added to the RAIDset associated with the logical unit. The failed device has been moved t
144. dependent controllers. Controllers in a dual-redundant configuration must have the same allocation class.
  array controller: A hardware/software device that facilitates communications between a host and one or more devices organized in an array. HS family controllers are examples of array controllers.
  BBR: Bad block replacement. The procedure used to locate a replacement block, mark the bad block as replaced, and move the data from the bad block to the replacement block.
  BBU: Battery backup unit. A StorageWorks SBB option that extends power availability after the loss of primary ac power or a power supply, to protect against the corruption or loss of data.
  block: The smallest data unit addressable on a disk. Also called a sector. In integrated storage elements, a block contains 512 bytes of data, EDC, ECC, flags, and the block's address header.
  CDU: Cable distribution unit. The power entry device for StorageWorks cabinets. The unit provides the connections necessary to distribute ac power to cabinet shelves and fans.
  CLI: Command line interpreter. Operator command line interface for the HS family controller firmware.
  controller shelf: A StorageWorks shelf designed to contain controller and cache memory modules.
  CRC: Cyclic redundancy check. An 8-character cyclic redundancy check string used in conjunction with the customer identification string for turn
145. different than what is found present. Note that in this instance the Memory Address, Byte Count, DRAB register, and Diagnostic register fields are undefined.
02502401 The cache module has memory SIMMs populated in an unsupported configuration. Note that in this instance the Memory Address, Byte Count, DRAB register, and Diagnostic register fields are undefined.
0251000A The command failed because the target unit is not online to the controller. The Information field of the Device Sense Data contains the block number of the first block in error.
0252000A The last block of data returned contains a forced error. A forced error occurs when a disk block is successfully reassigned but the data in that block is lost. Rewriting the disk block will clear the forced error condition. The Information field of the Device Sense Data contains the block number of the first block in error.
0253000A The data supplied from the host for a data compare operation differs from the data on the disk in the specified block. The Information field of the Device Sense Data contains the block number of the first block in error.
0254000A The command failed due to a host data transfer failure. The Information field of the Device Sense Data contains the block number of the first block in error.
0255000A The controller was unable to successfully transfer data to the target unit.
0256000A The write operation failed because the unit is data safety write protected.
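The eight-digit instance codes in this table can be split into byte fields for easier lookup. The following is only a minimal sketch: it assumes the layout (from most to least significant byte) component ID, event number, repair action code, notification threshold. That layout is not stated in this excerpt, and the field names and the helper decode_instance_code are hypothetical.

```python
from typing import NamedTuple


class InstanceCode(NamedTuple):
    # Field names are assumptions for illustration, not taken from this excerpt.
    component_id: int
    event_number: int
    repair_action: int
    threshold: int


def decode_instance_code(code: str) -> InstanceCode:
    """Split an 8-hex-digit instance code into four one-byte fields."""
    if len(code) != 8:
        raise ValueError("instance code must be 8 hexadecimal digits")
    value = int(code, 16)  # raises ValueError on non-hex input
    return InstanceCode(
        component_id=(value >> 24) & 0xFF,
        event_number=(value >> 16) & 0xFF,
        repair_action=(value >> 8) & 0xFF,
        threshold=value & 0xFF,
    )


if __name__ == "__main__":
    for text in ("02502401", "0251000A", "0256000A"):
        print(text, decode_instance_code(text))
```

Under this assumed layout, 02502401 would carry repair action code 24 (hexadecimal), which can then be looked up in the repair action table.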
146. disk drives to their new location
HSJ50> ADD DISK DISK200 2 0 0
HSJ50> ADD DISK DISK300 3 0 0
HSJ50> ADD DISK DISK400 4 0 0
HSJ50> ADD RAIDSET RAID99 DISK200 DISK300 DISK400
HSJ50> ADD UNIT D100 RAID99
Example: The following example moves the reduced RAIDset R3 to another cabinet. R3 used to contain disk200, which failed before the RAIDset was moved. At the beginning of this example, it contains disk100, disk300, and disk400.
HSJ50> DELETE D100
HSJ50> DELETE R3
HSJ50> DELETE DISK100 DISK300 DISK400
(move disk drives to their new location)
HSJ50> ADD DISK DISK100 1 0 0
HSJ50> ADD DISK DISK300 3 0 0
HSJ50> ADD DISK DISK400 4 0 0
HSJ50> ADD RAIDSET R3 DISK100 DISK300 DISK400 REDUCED
HSJ50> ADD UNIT D100 R3
Moving storageset members
You may want to move a storageset member and its data from one PTL location to another to maintain the symmetry in your subsystem. For example, if a RAIDset member fails and is replaced by a disk drive in the spareset, you could move the replacement member into the column that contains the RAIDset.
Figure 4-2 Maintaining symmetry in your subsystem makes it easier to keep track of your storagesets and their members
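The delete, move, and re-add sequence shown in the examples can be written out mechanically before a maintenance window. The sketch below only builds the command text; it does not talk to a controller. The function name move_storageset_commands and its parameters are hypothetical, and it issues one DELETE per disk, following the procedure's instruction to delete each disk drive one at a time.

```python
def move_storageset_commands(unit, storageset, members, kind="RAIDSET", reduced=False):
    """Return (before, after) command lists for moving a storageset.

    unit       -- unit number presented to the host, e.g. "D100"
    storageset -- storageset name, e.g. "R3"
    members    -- dict mapping disk name to its NEW (port, target, lun)
    kind       -- storageset keyword used with ADD, e.g. "RAIDSET"
    reduced    -- append REDUCED when re-creating a reduced RAIDset
    """
    # Commands to enter before the drives are physically moved.
    before = [f"DELETE {unit}", f"DELETE {storageset}"]
    before += [f"DELETE {disk}" for disk in members]

    # Commands to enter after the drives are in their new PTL locations.
    after = [
        f"ADD DISK {disk} {port} {target} {lun}"
        for disk, (port, target, lun) in members.items()
    ]
    add_set = f"ADD {kind} {storageset} " + " ".join(members)
    if reduced:
        add_set += " REDUCED"
    after.append(add_set)
    after.append(f"ADD UNIT {unit} {storageset}")
    return before, after


if __name__ == "__main__":
    new_locations = {"DISK100": (1, 0, 0), "DISK300": (3, 0, 0), "DISK400": (4, 0, 0)}
    before, after = move_storageset_commands("D100", "R3", new_locations, reduced=True)
    for line in before + ["(move disk drives to their new location)"] + after:
        print(line)
```

Printing the commands first gives a checklist to follow at the maintenance terminal; each line would still be entered by hand at the HSJ50> prompt.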
147. duration of the test and whether the test should be read-only or read-write.
Note: TILX places a heavy load on the controllers. To avoid the possibility that data may be lost, you should stop normal I/O operations before running TILX, or run TILX during periods of low activity.
TILX can test several tape drives at the same time. Each tape drive that you want to test must be configured as a unit, but it must be dismounted from the host. There are four tests that you can run with TILX: a quick tape test, a basic function test, a read-only test, and a user-defined test.
Running a quick tape test
This section provides instructions on how to run a quick TILX test on one or more tape drives. This is a 10-minute read-only test that uses the default TILX settings.
1. Start TILX from the CLI prompt:
HSJ50> RUN TILX
2. Use the default settings:
Use all defaults (y/n) [y] y
3. The system displays a list of all tape drive units, by unit number, that you can choose for TILX testing. Select the first tape drive that you want to test. Do not include the letter T in the unit number:
Enter unit number to be tested 350
4. Check to be sure that a tape is loaded in the drive, then answer yes to the next question:
Is a tape loaded and ready, answer Yes when ready Y
5. TILX indicates whether it has been able to allocate the tape drive. If you want to test more tape dri
148. • After removing a controller or cache module from the shelf, place the module into an approved antistatic bag or onto a grounded antistatic mat.
Write-back cache modules contain high-current battery cells that can cause injury to personnel if in contact with a conductive surface. Do not place write-back cache modules or battery cells on metal surfaces.
Handling the program card
Use the following guidelines when handling the program card:
• Cover the program card with the snap-on ESD cover when the card is installed in the controller.
• Keep the program card in its original carrying case when not in use.
• Do not twist or bend the program card.
• Do not touch the program card contacts.
Patching controller software
The Code Patch function of the Code Load/Code Patch (CLCP) utility allows you to enter small changes to the controller's software. The patches that you enter are placed directly into the controller's NVMEM (non-volatile memory) and become active after the next controller initialization. If the storage devices have been initialized with the SAVE_CONFIGURATION switch of the INITIALIZE command, the patches that you enter will be stored on each disk of the storageset. The code patching utility allows more than one patch to be entered for a given software version. Each patch is associated with only one software version, and the code patch utility verifies the patc
149. Install controller B. Slide the controller module along the rails and then push firmly to seat it in the backplane. See Figure 2-24.
Caution: Do not overtighten the captive screws on the controller's front bezel or the cache module's front bezel. Damage to the controller PC board or front bezel, or the cache module's front bezel, may result.
Tighten the front bezel captive screws on the cache module and the controller module.
Press Return on the operating controller's console.
Wait for the following text to be displayed on the operating controller's console:
Port 1 restarted
Port 2 restarted
Port 3 restarted
Port 4 restarted
Port 5 restarted
Port 6 restarted
Controller Warm Swap terminated.
The configuration has two controllers.
To restart the other HSJ50:
1) Enter the command RESTART OTHER_CONTROLLER
2) Press and hold in the Reset button while inserting the program card.
3) Release Reset; the controller will initialize.
4) Configure new controller by referring to the controller's configuration manual.
Restarting the subsystem
1. Start controller B by entering the following CLI command:
HSJ50> RESTART OTHER_CONTROLLER
Connect the maintenance terminal to controller B.
Press and hold the Reset button on controller B while inserting the program card.
Release the Reset
150. • Check that the screws that attach the internal CI cable to the controller are tightened.
VMS shadowsets go into mount verify
Symptoms:
• Units that are members of VMS shadowsets intermittently go into Mount Verification during heavy I/O operations such as backup or shadow copy.
• VMS errorlog shows DATAGRAM FOR NON EXISTING UCB at the time of the mount verifications.
• HSJ50 controller error log shows instance code 4007640A.
Likely Cause: The HSJ50 declared a timeout on the CI, which momentarily closed its VC with the host node.
Solution: The Quiet Slot may not be set correctly on the host nodes to work with the HSJ50 controllers. Set the Quiet Slot to 10.
Units are Host Unavailable
Symptoms:
• The OpenVMS host reports all disk or tape units in the subsystem as host unavailable.
• The host can see the controller.
Likely Cause: The CI connections are not complete. The MSCP and/or TMSCP allocation class settings on the controller have been changed while the cluster was up.
Solution: Check the CI cable connections. Check the MSCP and TMSCP allocation class in the controller:
HSJ50> SHOW THIS_CONTROLLER
HSJ50> SHOW OTHER_CONTROLLER
If the settings are not correct, you must either change them and then restart the cluster, or change the controller's SCS_NODENAME. If the SCS_NODENAME contains characters such as _ or the co
151. e (for example, both write-back cache), replace a cache module to assure both are compatible.
If this is a dual-redundant configuration and both write caches are not of the same size, replace a cache module to assure both are compatible.
If the cache module is populated with memory SIMMs in an illegal configuration, reconfigure according to guidelines.
An unrecoverable Memory System failure occurred. Upon restart, the controller will generate one or more Memory System Failure Event Sense Data Responses. Follow the repair actions contained in the responses.
The Master DRAB detected a Cache Time-out condition. The cache regions in effect are identified in the Master DRAB RSR register as follows:
Bits 8 through 11 identify the CACHEA memory region.
Bits 12 through 15 identify the CACHEB memory region.
Bits 20 through 23 identify the CACHEA DRAB registers region.
Bits 24 through 27 identify the CACHEB DRAB registers region.
If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT. Follow repair action 36.
The Master DRAB detected an Nbus Transfer Error Acknowledge (TEA) condition. If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear
152. e invalidate or delete data from the cache memory accordingly to ensure that the cache does not contain obsolete data The user sees the operation as complete only after the backup storage device has been updated HSJ50 Array Controller A Abort codes 3 21 Adapter G 2 Adding disk drives to configuration 3 17 Allocation class G 2 Array controller G 2 Asynchronous device swap 3 68 Asynchronous swap 2 47 power supplies 2 47 storage devices 2 50 B BBR G 2 BBU G 2 Block G 2 C C_SWAP 3 38 adding second controller 3 38 offline method 3 38 online method 3 38 Cables handling for ESD 2 2 replacing CI host cables 2 56 HSJ50 Array Controller Index replacing SCSI device port cables 2 58 Cache memory procedure for adding 3 52 SIMM cards 3 52 Cache module removing 5 6 Cache modules handling for ESD 2 2 installing into HSJ50 controller 3 44 removing 5 6 replacing 2 29 replacing battery cells 2 33 CD ROM installing 3 69 CDU G 2 CFMENU considerations for using 4 2 CI host cables replacing 2 56 CLCP code patch 3 3 CLI G 2 code patch error messages 3 9 installing patches 3 6 listing patches 3 3 3 4 special considerations 3 3 Service Manual Cold swap power supplies 2 47 solid state disks 2 50 Configuring disk drives 3 17 Controller removing 5 6 Controller shelf G 3 Controllers handling for ESD 2 2 installing patches for 3 3
153. e DISCONNECT_SENT or DISCONNECT_MATCH connection state.
4059020A Received SCS CREDIT_REQ when in the DISCONNECT_REC or DISCONNECT_MATCH connection state. (Repair action code 02.)
405A020A Received SCS APPL_MSG when in the DISCONNECT_SENT or DISCONNECT_ACK connection state. (Repair action code 02.)
405B020A Received SCS ACCEPT_REQ on a connection that is no longer valid. Note that in this instance, if the connection id field is zero, the content of the vcstate, remote node name, remote connection id, and connection state fields are undefined. (Repair action code 02.)
4053020A Received SCS ACCEPT_REQ when not in CONNECT_ACK connection state.
Received SCS ACCEPT_RSP on a connection that is no longer valid. Note that in this instance, if the connection id field is zero, the content of the vcstate, remote node name, remote connection id, and connection state fields are undefined.
Received SCS REJECT_REQ on a connection that is no longer valid. Note that in this instance, if the connection id field is zero, the content of the vcstate, remote node name, remote connection id, and connection state fields are undefined.
Received SCS REJECT_RSP on a connection that is no longer valid. Note that in this instance, if the connection id field is zero, the content of the vcstate, remote node name, remote connection id, and connection state
154. e Explanation
20190010 A cache state of a unit remains WRITE_CACHE_UNWRITTEN_DATA. The unit is not ONLINE; thus this state would only be valid for a very short period of time.
201A0100 An attempt to allocate memory so a CLI prompt message could be reformatted has failed.
201B0100 Insufficient resources to get memory to lock CLI.
201C0100 Insufficient resources to get memory to unlock CLI.
20640000 Nindy was turned on.
20650000 Nindy was turned off.
20692010 To enter dual-redundant mode, both controllers must be of the same type.
206A0000 Controller restart forced by DEBUG CRASH REBOOT command.
206B0010 Controller restart forced by DEBUG CRASH NOREBOOT command.
Table A-13 Host interconnect services last failure codes
Last Fail Code / Explanation
40000101 HSJ3x/4x: An unrecognized DSSI opcode was received by HIS. These packets are packets with CI opcodes recognized by the port but not by HIS. Last Failure Parameter 0 contains the CI opcode value. HSJ3x/4x/HS1CP: An unrecognized DSSI opcode was received by HIS. These packets are packets with DSSI opcodes recognized by the port but not by HIS. Last Failure Parameter 0 contains the DSSI opcode value.
4007640A HSJ3x/4x/HS1CP: DSSI Port detected error upon attempting to transmit a packet. This resulted in the closure of the Virtual Circuit.
40150100 LOCAL VC Timer in unexpected state.
40280100 Failed to allocate Buffer Name Tab
155. Table 3-16 Storage SBB Status Indicators
SBB is operating normally.
SBB is operating normally.
SBB is operating normally. The SBB is inactive and there is no fault.
Fault status: SBB is probably not responding to control signals. Replace the SBB.
Fault status: SBB is inactive and spun down. Replace the SBB.
Fault status: SBB is active and is spinning down because of a fault.
Fault status: SBB has been identified by the controller as failed. Replace the SBB.
Legend: LED on, LED off, LED flashing.
When using the LOCATE command, the lower LED is used to locate or identify units, storagesets, and devices in a cabinet. For example, to locate or identify device DISK100, enter:
CLI> LOCATE DISK100
The lower LED (amber) of DISK100 will flash at a rate of once every second. To turn off the lower LED, use the LOCATE CANCEL command.
Note: If a device has been placed into a failedset, the lower LED will flash at a rapid rate. In this case, the LOCATE command cannot identify the device.
The lower LED of each configured device can be tested using the LOCATE command with the ALL qualifier. For example, to test the fault LED of each configured device, enter:
CLI> LOCATE ALL
The lower LED of each configured device will flash at a rate of once every second. To turn off the lower LED, use the LOCATE CANCEL command. Refer to
156. e module until it is seated.
Figure 2-16 Installing the new cache and controller module
3. Slide the controller module into the shelf using the same rails as the removed module. See Figure 2-16.
4. Tighten the two captive retaining screws on the front panel of the cache and controller modules.
5. Reconnect the CI connector to the controller's front panel.
6. Reconnect the open end of the ECB cable to the new cache module.
7. Press and hold the green reset button while inserting the program card. The program card eject button will extend when the card is fully inserted.
Release the reset button to initialize the controller.
Attach the maintenance terminal to the replaced controller.
If the controller reports an invalid cache error, enter one of the following CLI commands. If you have replaced the cache module, enter the following command:
CLI> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER DESTROY_UNFLUSHED_DATA
Otherwise enter:
HSJ50> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER NODESTROY_UNFLUSHED_DATA
Ensure that the controllers are not in dual-redundant mode:
HSJ50> SET NOFAILOVER
Enter the following command from the surviving controller to put both controllers into dua
157. e unit within your subsystem. During the installation, the source disk drive is not available for other subsystem operations.
Some firmware releases require that you format the device after installing the new firmware. Refer to the documentation that accompanied the firmware to determine if you need to reformat the device after installing new firmware.
Some devices may not reflect the new firmware version number, etc., when viewed from another controller in dual-redundant configurations. If you experience this, simply re-initialize the device from either controller.
Copying the firmware to your subsystem
Before you can install new firmware on a device, you need to copy its image to a single disk drive unit in a StorageWorks subsystem. You can use this disk drive in other StorageWorks subsystems to install firmware on other devices in those subsystems. The single disk drive unit may be a raw disk drive with no file system or label, or it may be a normal file system disk drive. In either case, the firmware must be copied in contiguous blocks beginning at a known logical block number (LBN). The steps for copying the firmware are specific to your host's operating system and are given below.
From OpenVMS
To copy one or more firmware files from your OpenVMS host to a single disk drive unit in your subsystem:
1. Add a disk drive to your subsystem:
CLI> ADD DISK di
158. Figure 5-7 Connecting a maintenance terminal to the controller
Table 1-1 Solid controller LED codes
Table 1-2 Flashing controller LED codes
Table 1-3 DILX data patterns
Table 1-4 TILX data patterns
Table 1-5 VTDPY control keys
Table 1-6 VTDPY commands
Table 2-1 Required tools
Table 2-2 ECB status indicators
Table 2-3 Required tools for single controller replacement
Table 2-4 Required tools for controller replacement
Table 2-5 Required tools
Table 2-6 Required tools
Table 2-7 Required tools
Table 2-8 Required tools for installing
159. Figure 2-4 Removing the controller and cache modules
Figure 2-5 Installing the new cache and controller module
Figure 2-6 Reconnecting the ECB cable
Figure 2-7 Connecting a maintenance terminal to the controller
Figure 2-8 Disconnecting the ECB cable
Figure 2-9 Removing the program card
Figure 2-10 Removing the controller and cache module
Figure 2-11 Installing the new cache and controller modules
Figure 2-12 Connecting a maintenance terminal to the controller
Figure 2-13 Removing the program card
Figure 2-14 Disconnecting the ECB cable
Figure 2-15 Removing the controller and cache modules
Figure 2-16 Installing the new cache and controller module
160. econds after inserting one disk drive before inserting a second disk drive.
The tools listed in Table 2-8 are required for replacing disk drives.
Table 2-8 Required tools for installing disk drives
Required tools: 5/32-inch Allen wrench (to unlock the SW800-series cabinet)
Disk drive replacement procedure (3 1/2- and 5 1/4-inch drives)
1. Press the two mounting tabs together to release the disk drive from the shelf and partially pull it out of the shelf. See Figure 2-28.
Figure 2-28 Removing a disk drive
2. Using both hands, pull the disk drive out of the shelf. See Figure 2-28. Notice that the corresponding port LED on the controller's OCP is flashing.
3. When the port LED stops flashing, align the replacement disk drive with the shelf rails.
4. Push the disk drive all the way into the shelf until the locking tabs snap into place.
5. Observe the device status LED for the following indications. See Figure 2-29. The device status indicator (amber LED) is off.
Figure 2-29 Status indicators for 3.5- and 5.25-inch SBBs
161. ed
Port 2 restarted
Port 3 restarted
Port 4 restarted
Port 5 restarted
Port 6 restarted
Pull the cache module partly out of the shelf.
13. Disconnect the Controller B cache module ECB cable from the ECB you are replacing and connect it to the new ECB. See Figure 2-25. Tighten the battery cable connector mounting screws on the new SBB battery module. Leave the old SBB battery module in the device shelf.
Figure 2-25 ECB Cable Connection
Reinstalling the modules
1. When the controller prompts you, answer the question:
Do you have a replacement HSJ50 readily available [N] y
2. Press Y for yes.
3. Answer the question:
Sequence to INSERT the other HSJ50 has begun. Do you wish to INSERT the other HSJ50 [N]
4. Press Y for yes.
Wait for the following text to appear on the operating controller's console:
Attempting to quiesce all ports.
Port 1 quiesced
Port 2 quiesced
Port 3 quiesced
Port 4 quiesced
Port 5 quiesced
Port 6 quiesced
All ports quiesced.
Insert the other HSJ50 WITHOUT its program card, and press Return.
Slide the cache module for controller B all the way back into the shelf and push firmly to seat it in the backplan
162. ed to the UNKNOWN state. Therefore, the I/O was aborted.
Instance Code / Explanation
02160064 A request was received to abort this command.
0217000A Raid support is enabled but not licensed on this controller. Any use of this feature requires licensing. Continued use does not comply with the terms and conditions of licensing for this product.
0218000A Write-back cache support is enabled but not licensed on this controller. Any use of this feature requires licensing. Continued use does not comply with the terms and conditions of licensing for this product.
02192401 The cache modules are not configured properly for a dual-redundant configuration. One of the cache modules is not the same type (both write-back cache), which is necessary to perform cache failover of dirty write-back cached data. Note that in this instance the Memory Address, Byte Count, DRAB register, and Diagnostic register fields are undefined.
021A0064 Disk Bad Block Replacement attempt completed for a write of controller metadata to a location outside the user data area of the disk. Note that, due to the way Bad Block Replacement is performed on SCSI disk drives, information on the actual replacement blocks is not available to the controller and is therefore not included in the event report.
021B0064 Disk Bad Block Replacement attempt completed for a read of controller metadata from a location outsi
163. Table A-4 Device services last failure codes
Table A-5 Fault manager last failure codes
Table A-6 Common library last failure codes
Table A-7 DUART services last failure codes
Table A-8 Failover control last failure codes
Table A-9 Nonvolatile parameter memory failover control last failure codes
Table A-10 Facility lock manager last failure codes
Table A-11 Integrated logging facility last failure codes
Table A-12 CLI last failure codes
Table A-13 Host interconnect services last failure codes
Table A-14 SCSI host interconnect services last failure codes
Table A-15 Host interconnect port services last failure codes
164. Figure 2-32 Removing the volume shield
Figure 2-33 Access to the SCSI cables
Figure 3-1 Connecting a maintenance terminal to the controller
Figure 3-2 Connecting a maintenance terminal to the controller
Figure 3-3 Copy the firmware to a disk drive in your subsystem, then distribute it to the devices you want to upgrade
Figure 3-4 Connecting a maintenance terminal to the controller
Figure 3-5 Installing an SBB battery module
Figure 3-6 Installing power supplies into the controller shelf
Figure 3-7 Installing a single controller (SW500 cabinet)
Figure 3-8 Installing a single controller (SW800 cabinet)
Figure 3-9 Connecting a maintenance terminal to the controller
Figure 3-10 Installing an SBB battery module
Figure 3-11 Installing the second controller and cache module
Figure 3-12 Inserting
165. emory available in the buffer area is insufficient for the controller to run. Corrective action: Replace controller module.
The code image was not the same as the image on the card after the contents were copied to memory. Corrective action: Replace controller module.
Description of Error / Corrective Action
Diagnostic register indicates that the cache module does not exist, but access to that cache module caused an error. Corrective action: Replace controller shelf backplane.
Diagnostic register indicates that the cache module does not exist, but access to that cache module did not cause an error. Corrective action: Replace controller shelf backplane.
The journal SRAM battery is bad. Corrective action: Replace controller module.
There was an unexpected interrupt from a read cache, or the present and lock bits are not working correctly. Corrective action: Replace controller module.
There is an interrupt pending on the controller's policy processor when there should be none. Corrective action: Replace controller module.
There was an unexpected fault during initialization. Corrective action: Replace controller module.
There was an unexpected maskable interrupt received during initialization. Corrective action: Replace controller module.
There was an unexpected nonmaskable interru
166. emoving the controller
1. Determine that one controller is still operating properly.
2. If the controller you are removing is partially functioning, connect a maintenance terminal to the controller. If the controller has failed, connect a maintenance terminal to the other controller. See Figure 2-12.
Figure 2-12 Connecting a maintenance terminal to the controller
3. At the partially functioning controller or at the operating controller's console, enter:
HSJ50> SHUTDOWN THIS_CONTROLLER
To ensure that the controller has shut down cleanly, check for the following indication on the controller's OCP: the Reset light is lit continuously. Port lights 1, 2, 3 are also lit continuously.
4. After shutdown is complete, connect the maintenance terminal to the other controller. Shut down the controller by entering:
HSJ50> SHUTDOWN THIS_CONTROLLER
5. Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist.
6. Attach or clip the other end of the ESD wrist strap to the cabinet grounding stud or a convenient cabinet grounding point.
7. Unsnap and remove the program card ESD cover. See Figure 2-13.
Figure 2-13 Removing the program card
8. Remove the program card b
167. en prompted. Otherwise, enter n to start the test:
Select another unit (y/n) [n] n
DILX testing started at <date> <time>
Test will run for 10 minutes
6. DILX will run for 10 minutes and then display the results of the testing. If you want to interrupt the test early:
Type ^G (Control-G) to get a performance summary without stopping the test (^T if you are running DILX through VCS).
Type ^C to terminate the current DILX test.
Type ^Y to terminate the current test and exit DILX.
Running an initial test on all disks
Caution: The initial test performs write operations. Make sure that the disks that you use do not contain customer data.
This section provides instructions on how to run a DILX test on all single-disk units in the subsystem. This is a read-write basic function test that uses the default DILX settings. The test performs an initial write pass followed by a repeating 10-minute cycle consisting of eight minutes of random I/O and two minutes of data-intensive transfers. You can set the duration of the test.
1. Start DILX from the CLI prompt:
HSJ50> RUN DILX
Choose the auto-configure option to test all single-disk units:
Do you wish to perform an Auto configure (y/n) [n] y
Choose the option to test all disks if you have a single-controller system; choose option 2 (test half of the disks) if you have a dual
168. epeated three times. Tape mark writing is intermixed in the test. The read pass consists of three phases:
Data intensive: Read operations of fixed record sizes with a byte count equal to the expected tape record size. Forward position commands are issued when tape marks are encountered.
Random: Begins at the point where random-sized records were written to the tape. Most read operations are issued with a byte count equal to the expected tape record byte count. Occasionally, read operations are intermixed with a byte count less than or greater than the expected tape record byte count. Forward position commands are issued when tape marks are encountered.
Position intensive: Begins halfway down from the start of the area where random-sized records are located. Read operations and position commands are intermixed so that the test gradually proceeds toward the EOT. Forward position commands are issued when tape marks are encountered.
In all phases, if the EOT is detected, the tape is rewound to the beginning of tape (BOT) and the write pass starts again.
To run a basic function test:
1. Start TILX from the CLI prompt:
HSJ50> RUN TILX
Do not accept the default settings:
Use all defaults (y/n) [y] n
Enter the amount of time that you want the test to run:
Enter execution time limit in minutes (10:65535) [10] 25
If you want to see performance summaries while
169. er

Appendix A

Instance Code: Explanation

01943702: The CACHEA0 DRAB unexpectedly reported a Nonexistent Memory Error condition.
01953702: The CACHEA1 DRAB unexpectedly reported a Nonexistent Memory Error condition.
01963702: The CACHEB0 DRAB unexpectedly reported a Nonexistent Memory Error condition.
01973702: The CACHEB1 DRAB unexpectedly reported a Nonexistent Memory Error condition.
01982F02 (repair action code 2F): An Address Parity error was detected during a memory refresh attempt by the Master DRAB.
01993002 (repair action code 30): An Address Parity error was detected during a memory refresh attempt by the CACHEA0 DRAB.
019A3002: An Address Parity error was detected during a memory refresh attempt by the CACHEA1 DRAB.
019B3102 (repair action code 31): An Address Parity error was detected during a memory refresh attempt by the CACHEB0 DRAB.
019C3102 (repair action code 31): An Address Parity error was detected during a memory refresh attempt by the CACHEB1 DRAB.
019D2F02 (repair action code 2F): The Master DRAB detected an Address Parity error during an FX attempt to read buffer memory.
019E2F02 (repair action code 2F): The Master DRAB detected an Address Parity error during a Host port attempt to read buffer memory.
019F2F02 (repair action code 2F): The Master DRAB detected an Address Parity error during a Device port attempt to read buffer memory.
01A02F02 (repair action code 2F): The Master DRAB detected an Address Parity error during an i960 attempt to read buffer memory.
01A13002 (repair action code 30): The CACHEA0 DRAB detected an Address Parity error du
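In the entries above that show a repair action code, that code matches the third byte of the eight-digit instance code (2F in 01982F02, 30 in 01993002, 31 in 019B3102). The Python sketch below extracts that byte. It assumes the pattern holds for other instance codes, which this excerpt does not confirm. The function name `repair_action_byte` is hypothetical, and the other bytes of the code are left uninterpreted.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def repair_action_byte(code: str) -> str:
    """Return hex digits 5-6 (the third byte) of an 8-digit code, upper-cased.

    Assumption: the third byte carries the repair action code, as seen in
    the table rows quoted in the text. This is an illustration only.
    """
    cleaned = code.strip().upper()
    if len(cleaned) != 8 or not set(cleaned) <= HEX_DIGITS:
        raise ValueError(f"expected 8 hex digits, got {code!r}")
    return cleaned[4:6]


if __name__ == "__main__":
    for sample in ("01982F02", "01993002", "019B3102"):
        print(sample, "-> repair action", repair_action_byte(sample))
```

If the assumption is right, the same slice would give a quick cross-check for entries whose repair action column was lost in extraction, but the result should be confirmed against the printed manual.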
170. er bits 10 through 12 contain the value 3 and WDR1 register bit 28 is clear.
• Master DRAB CSR register bits 10 through 12 contain the value 4 and WDR1 register bit 29 is clear.
• Master DRAB CSR register bits 10 through 12 contain the value 5 and WDR1 register bit 30 is clear.
• Master DRAB CSR register bits 10 through 12 contain the value 6 and WDR1 register bit 31 is clear.

If none of the above conditions were true, follow repair action 36.

Repair Action Code: Action to take

The Master DRAB detected a Nonexistent Memory Error condition. Use the following register information to locate additional details:

• The Master DRAB EAR register combined with Master DRAB ERR bits 0 through 3 (address region) yields the affected memory address.
• The Master DRAB EDR register contains the error data.
• If the failure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port.
• If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT.

Follow repair action 36.

The CACHEA0 or CACHEA1 DRAB detected a Nonexistent Memory Error condition.

• The CACHEAn DRAB EAR register combined with the Master DRAB RSR register bits 8 through 11 (CACHEA memory region) yields the affected memory address.
• The CACHEAn DRAB EDR register contains the error data
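The register checks in these repair actions come down to reading bit fields. The Python sketch below shows one way to extract them. It assumes that bit 0 is the least significant bit (the excerpt does not state the numbering convention) and that register contents are already available as integers. `bit_field`, `device_port_from_csr`, and `reported_via` are hypothetical helper names, not controller firmware interfaces.

```python
def bit_field(value: int, low: int, high: int) -> int:
    """Return bits low..high (inclusive) of value, with bit 0 as the LSB (assumed)."""
    if low < 0 or high < low:
        raise ValueError("need 0 <= low <= high")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def device_port_from_csr(master_drab_csr: int) -> int:
    """Master DRAB CSR bits 10 through 12 identify the Device port."""
    return bit_field(master_drab_csr, 10, 12)


def reported_via(master_drab_dsr: int) -> str:
    """Master DRAB DSR bit 14: set means NMI, clear means DRAB_INT."""
    return "NMI" if bit_field(master_drab_dsr, 14, 14) else "DRAB_INT"


if __name__ == "__main__":
    csr = 5 << 10   # made-up register value with the port field equal to 5
    dsr = 1 << 14   # made-up register value with bit 14 set
    print(device_port_from_csr(csr), reported_via(dsr))
```

The register values in the usage lines are invented for illustration; real values come from the error report being analyzed.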
171. er is used to modify the characteristics of the VTDPY display. Table 1-6 lists the VTDPY commands.

Table 1-6 VTDPY commands

    Command String     Function
    DISPLAY CACHE      Use 132-column unit caching statistics display
    DISPLAY DEFAULT    Use default 132-column system performance display
    DISPLAY DEVICE     Use 132-column device performance display
    DISPLAY STATUS     Use 80-column controller status display

The keywords in the command strings can be abbreviated to the minimum number of characters necessary to uniquely identify the keyword. Entering a question mark (?) after a keyword causes the parser to provide a list of keywords or values that can follow the supplied keyword. The command line interpreter is not case sensitive, so keywords can be entered in uppercase, lowercase, or mixed case.

Upon successful execution of a command other than HELP, the command line interpreter is exited and the display is resumed. Entering a carriage return without a command also exits the command line interpreter and resumes the display. If an error occurs in the command, the user prompts for command expansion help, or the HELP command is entered, the command line interpreter prompts for an additional command instead of returning to the display.

How to Interpret the VTDPY Display Fields

This section contains descriptions of the major fields in the VTDP
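The abbreviation rule for VTDPY keywords (case-insensitive, shortest unique prefix) can be illustrated with a short Python sketch. This is not the controller's parser. It only models the rule as described, using the four DISPLAY keywords from Table 1-6, and the function name `expand_keyword` is hypothetical.

```python
# Keywords that can follow DISPLAY, taken from Table 1-6.
DISPLAY_KEYWORDS = ("CACHE", "DEFAULT", "DEVICE", "STATUS")


def expand_keyword(abbrev, keywords=DISPLAY_KEYWORDS):
    """Expand a case-insensitive abbreviation to the one keyword it identifies."""
    prefix = abbrev.strip().upper()
    if not prefix:
        raise ValueError("empty keyword")
    if prefix in keywords:
        return prefix  # an exact match always wins
    matches = [word for word in keywords if word.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {abbrev!r}")
    raise ValueError(f"ambiguous keyword {abbrev!r}: could be {', '.join(matches)}")


if __name__ == "__main__":
    for text in ("c", "Dev", "def", "STATUS"):
        print(text, "->", expand_keyword(text))
```

With these four keywords, C and S are already unique, while DEFAULT and DEVICE share the prefix DE and need at least three characters (DEF, DEV).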
172. er queue 62030100 Failure to allocate connection id timers Table A 19 SCSI host value added services last failure codes Last Fail Code Explanation 64000100 Insufficient buffer memory to allocate data structures needed to propagate SCSI Mode Select changes to other controller 64010100 During initialization of LUN specific mode pages an unexpected device type was encountered HSJ50 Array Controller Service Manual A 86 Appendix A Table A 20 Disk inline exerciser DILX last failure codes Last Fail Code Explanation Repair Action Code 80010100 An HTB was not available to issue an I O when it should have been 80020100 A unit could not be dropped from testing because an available cmd failed 80030100 DILX tried to release a facility that wasn t 01 reserved by DILX 80040100 DILX tried to change the unit state from MAINTENANCE_MODE to NORMAL but was rejected because of insufficient resources 80050100 DILX tried to change the usb unit state from MAINTENANCE_MODE to NORMAL but DILX never received notification of a successful state change 80060100 DILX tried to switch the unit state from MAINTENANCE_MODE to NORMAL but was not successful 80070100 DILX aborted all cmds via va d_abort but the HTBS haven t been returned 80090100 DILX received an end msg which corresponds to an op code not supported by DILX 800A0100 DILX was not able to restart his timer 800B0100 DILX tried to issue an I O for an opcode not supported 800C0
173. erred to the device during the last screen update interval.

Tg: This column shows the maximum number of transfer requests queued to the device during the last screen update interval. If a device does not support tagged queuing, the maximum value is 1.
CR: This column indicates the number of SCSI command resets that occurred since VTDPY was started.
BR: This column indicates the number of SCSI bus resets that occurred since VTDPY was started.
TR: This column indicates the number of SCSI target resets that occurred since VTDPY was started.

Device SCSI Port Performance

    Port  Rq/S  RdKB/S  WrKB/S  CR  BR  TR
       1     0       0       0   0   0   0
       2    11      93       0   0   0   0
       3    48     341       0   0   0   0
       4    48     340       0   0   0   0
       5    58      93     375   0   0   0
       6     0       0       0   0   0   0

Description

This subdisplay shows the accumulated I/O performance values and bus statistics for the SCSI device ports. The subdisplay for a controller that has six SCSI device ports is shown.

Port: The Port column indicates the number of the SCSI device port.
Rq/S: This column shows the average I/O request rate for the port during the last update interval. These requests are up to eight kilobytes long and are either generated by host requests or cache flush activity.
RdKB/S: This column shows the average data transfer rate from all devices on the SCSI bus, in kilobytes, during the previous screen update interval.
WrKB/S: This column show
174. ... 1-40
TILX error codes ... 1-42
TILX data patterns ... 1-43
Monitoring system performance with the VTDPY utility ... 1-45
How to Run VTDPY ... 1-45
Using the VTDPY Control Keys ... 1-45
Using the VTDPY Command Line ... 1-46
How to Interpret the VTDPY Display Fields ... 1-47
CI/DSSI Host Port Characteristics ... 1-47
SCSI Host Port Characteristics ... 1-48
CI Performance Display ... 1-49
DSSI Performance Display ... 1-49
CI/DSSI Connection Status ... 1-50
CI/DSSI Host Path Status ... 1-51
Device SCSI Status ... 1-52
Unit Status (abbreviated) ... 1-53
Unit Status ... 1-56
Device Status ... 1-59
Device SCSI Port Pe
175. ... 3-30
Required tools ... 3-31
Add a second controller ... 3-31
Adding a second controller using C_SWAP ... 3-38
Required tools ... 3-39
Prepare the subsystem ... 3-39
Restarting the subsystem ... 3-43
Installing a cache module ... 3-44
Required tools ... 3-44
Install a cache module (single controller configuration) ... 3-44
Removing the controller ... 3-45
Installing a write-back cache module ... 3-47
Restarting the subsystem ... 3-50
Adding Cache Memory ... 3-52
Required Tools ... 3-52
Installing SIMM Cards ... 3-53
Installing power supplies ... 3-60
Power supply and shelf LED status ind
176. evice requests from 01 VA_SAVE_CONFIG completed within the time out interval 02D10102 Failed to read or write metadata while 01 UNMIRRORING a mirrorset to a disk unit 02D30100 The controller has insufficient memory to 01 allocate enough data structures used to manage metadata operations 02D50100 No resources are available to handle a new 01l metadata work request 02D60100 An invalid storage set type was specified for 01 metadata initialization 02D72390 Forced failover of devices due to a cache 23 battery failure This was initiated because the dual partner was operational with a good battery and there is no host failover assistance HSJ50 Array Controller Service Manual A 56 Last Fail Code 02D80100 Appendix A Explanation Unable to allocate memory for Fast Buffers Code bug suspected rather than low buffer memory because the code just checked for sufficient buffer memory Table A 4 Device services last failure codes Last Fail Code Explanation 03010100 03020101 03030101 03040101 03050101 03060101 03070101 03080101 Failed request for port specific scripts memory allocation Invalid SCSI direct access device opcode in misc command DWD Last Failure Parameter 0 contains the SCSI command opcode Invalid SCSI sequential access device opcode in misc cmd DWD Last Failure Parameter 0 contains the SCSI command opcode Invalid SCSI CDROM device opcode in misc command DWD Las
177. f the device port SSTAT2 SSTAT1 SSTATO DSTAT registers Last Failure Parameter 7 contains the PCB copies of the device port LCRC RESERVED ISTAT DFIFO registers 03350188 The TEA bus fault signal was asserted into a device port Last Failure Parameter 0 contains the PCB port_ptr value Last Failure Parameter 1 contains the PCB copy of the device port TEMP register Last Failure Parameter 2 contains the PCB copy of the device port BC register Last Failure Parameter 3 contains the PCB copy of the device port DNAD register Last Failure Parameter 4 contains the PCB copy of the device port DSP register Last Failure Parameter 5 contains the PCB copy of the device port DSPS register Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2 SSTAT1 registers Last Failure Parameter 7 contains the PCB copies of the device port LCRC RESERVED ISTAT DFIFO registers HSJ50 Array Controller Appendix A HSJ50 Array Controller A 59 Last Fail Code Explanation 03360188 A device port s host bus watchdog timer expired Last Failure Parameter 0 contains the PCB port_ptr value Last Failure Parameter 1 contains the PCB copy of the device port TEMP register Last Failure Parameter 2 contains the PCB copy of the device port DBC register Last Failure Parameter 3 contains the PCB copy of the device port DNAD register Last Failure Parameter 4 contains the PCB copy of the device port
178. f the failure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port.
• If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT.
• For Write Data Parity Error conditions, bits 0 through 3 of the CACHEAn DRAB CSR register identify the byte in error.

For Address Parity Error conditions, follow repair action 34. For Write Data Parity Error conditions, follow repair action 35.

Repair Action Code 31

Action to take: The CACHEB0 or CACHEB1 DRAB detected an Address Parity Error or a Write Data Parity Error condition. Use the following register information to locate additional details about the error:

• If the failure occurred during a memory refresh attempt, the CACHEBn DRAB EAR register combined with the Master DRAB RSR register bits 8 through 11 (CACHEB memory region) yields the affected memory address.
• If the failure occurred during a memory access attempt, the CACHEB0 DRAB EAR register combined with the Master DRAB RSR register bits 8 through 11 (CACHEB memory region) or bits 20 through 23 (CACHEB DRAB register region) yields the affected memory address. Unfortunately, no other information is available to distinguish a memory region access from a DRAB register region access.
• The CACHEBn DRAB EDR register con
179. fic information (y/n) [n]? y

When the hard error limit is reached, the unit will be dropped from testing.

    Enter hard error limit (1:65535) [32]? 100

When the soft error limit is reached, soft errors will no longer be displayed, but testing will continue for the unit.

    Enter soft error limit (1:65535) [32]? 32

Set the maximum number of outstanding I/Os for each unit.

    Set the I/O queue depth (1:12) [4]? 6

Suppress caching.

    Suppress caching (y/n) [n]? n

Select the user-defined test.

    *** Available tests are:
        1. Basic Function
        2. User Defined
        3. Read Only
    Use the Basic Function test 99.9% of the time. The User Defined test is for special problems only.
    Enter test number (1:3) [1]? 2

Define the test sequence by entering each command number and its associated parameters. You may define up to 20 commands, and they will be executed in the order in which you enter them. If you define write or erase commands, user data will be destroyed.

    Enter command number 1 (read, write, access, erase, quit)? read
    Enter starting lbn for this command? 0
    Enter the IO size in 512 byte blocks for this command (1:128)? 20
    Enter in HEX the MSCP Command Modifiers [0]? 0

Repeat step 10 until you have defined the entire command sequence (up to 20). When you have finished entering commands, type quit.

The system displays a list of all single tape drive units by unit nu
180. g 3 35 12 Slide the cache module into the appropriate slot in the controller shelf Push the module firmly into the slot until it is seated See Figure 3 11 Figure 3 11 Installing the second controller and cache module Controller CXO 5324A MC 13 Slide the controller module into the appropriate slot Push the module firmly into the slot until it is seated HSJ50 Array Controllers Service Manual 3 36 Service Manual 14 Installing Caution Do not overtighten the controller s front panel captive screws the cache module s front panel captive screws or the ECB cable captive screws Damage to the controller PC board or front panel the cache module front panel or the SBB may result Tighten the front panel captive screws on the cache and the controller modules Caution To avoid the possibility of short circuit or electrical shock do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface Connect the battery cable to the cache module and then the ECB For dual ECB SBBs a Connect one end of a battery cable to ECB A and the other end to cache module A b Connect one end of a battery cable to ECB B and the other end to cache module B Reconnect power cords to the controller power supplies Unsnap and remove the program card ESD cover Remove the program card from the controller by pressing and holding in the Reset butto
181. g media loader operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03BA0101 Source driver programming error encountered during media loader operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03BB0101 Miscellaneous SCSI Port Driver coding error detected during media loader operation Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03BC0101 A media loader related error code was reported that was unknown to the Fault Management firmware Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 03BD450A The media changer device reported standard 45 SCSI Sense Data 03C80101 No command control structures available for operation to a device which is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Service Manual HSJ50 Array Controller Appendix A Instance Code 03C92002 03CA4002 03CB0101 03CC0101 03CD2002 03CE2002 03CF0101 03D04002 HSJ50 Array Controller A 31 Explanation SCSI interface chip command time out during operation to a device which is unknown
182. h 0 byte count 409D0100 Illegal return value from HIS MAP 40B40101 Invalid value in max_nodes field of 01 se_params structure Last Failure Parameter 0 contains the max_nodes field value HSJ50 Array Controller Service Manual 408F0100 Unrecognized HTB id type 01 1 1 1 1 01 01 01 01 01 01 01 1 A 76 Appendix A Table A 14 SCSI host interconnect services last failure codes Last Failure Explanation Repair Code Action Code 41000100 Encountered an unexpected structure type on S_shis_ctl scsi_q 41020100 Unable to allocate the necessary number of HTBS in shis_init 41030100 Unable to allocate the necessary number of large Sense Data buckets in shis_init 41060100 Unable to locate the IDENTIFY msg in HTB 41070100 Encountered an unknown MESSAGE OUT message 41080100 Encountered an unknown MESSAGE OUT message 41090100 Encountered an unknown structure on the host port queue During SCSI ABORT message 410A0100 Encountered an unknown structure on the host 0l port queue During SCSI ABORT TAG message 410B0100 Encountered an unknown structure on the host port queue During SCSI CLEAR QUEUE message 410E0100 Encountered an unrecognized queue tag message 41100100 Encountered a NULL completion routine pointer in a DD 41130100 Could not allocate a large sense bucket for 41160100 A sense data bucket of unknown type neither 01 LARGE or SMALL was passed to deallocate_SDB 41170100 Call to VA ENABLE _NOTIFIC
183. h against the currently installed software version. Some patches require the installation of previous patches, called dependent patches, before they can be installed. Each patch has a unique patch number to identify it.

The Code Patch function also allows you to list patches already installed. You may want to list patches before you install a patch to see what has previously been loaded and to see how much free space is available.

You can run the Code Patch function of the CLCP utility from either a maintenance terminal or a virtual host terminal.

Code patch considerations

Be aware of the following characteristics when using the Code Patch function of the CLCP utility:

• The controller reserves enough nonvolatile memory for approximately 10 patches. However, this number varies according to the size of the patches you install.
• Each patch is associated with only one software version, and the Code Patch program verifies the patch against the currently installed software version.
• Patches are hierarchical. In other words, for any given software version, patch number one must be entered before you enter patch number two, and so on. Furthermore, there are no zero patches. Patches are always numbered sequentially, beginning with the number one.
• Because of the hierarchical patch structure, removing any patch also removes all higher-numbered patches. For example, deleting patch number two also removes patches
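The hierarchical patch rule described above (patches numbered from one, entered in sequence, and removed together with every higher-numbered patch) can be modeled in a few lines of Python. This is a toy model, not the CLCP utility. The class name `PatchList` and its methods are hypothetical, and the sketch ignores the nonvolatile memory limit and the software version check.

```python
class PatchList:
    """Toy model of sequential patch numbering for one software version."""

    def __init__(self):
        # 0 means nothing is installed; real patch numbers start at 1.
        self.highest = 0

    def installed(self):
        """Return the installed patch numbers in ascending order."""
        return list(range(1, self.highest + 1))

    def enter(self, number):
        """Install a patch; only the next number in sequence is accepted."""
        if number != self.highest + 1:
            raise ValueError(
                f"patch {number} cannot be entered before patch {self.highest + 1}"
            )
        self.highest = number

    def delete(self, number):
        """Remove a patch and every higher-numbered patch; return what was removed."""
        if not 1 <= number <= self.highest:
            raise ValueError(f"patch {number} is not installed")
        removed = list(range(number, self.highest + 1))
        self.highest = number - 1
        return removed


if __name__ == "__main__":
    patches = PatchList()
    for n in (1, 2, 3):
        patches.enter(n)
    print("removed:", patches.delete(2))
    print("still installed:", patches.installed())
```

Because the installed set is always a run 1..n, a single integer is enough to represent it; that design choice follows directly from the sequential-numbering rule in the text.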
184. he FX Control and Status Register CSR Last Failure Parameter 1 DMA Indirect List Poi Last Failure Parameter 2 contains the FX nter register DILP contains the FX DMA Page Address register DADDR Last Failure Parameter 3 contains the FX DMA Command and control register DCMD A processor interrupt was generated by the controller s XOR engine FX indicating an unrecoverable error condition Last Failure Parameter 0 contains the FX Control and Status Register CSR Last Failure Parameter 1 contains the FX DMA Indirect List Pointer register DILP Last Failure Parameter 2 DMA Page Address r Last Failure Parameter 3 contains the FX egister DADDR contains the FX DMA Command and control register DCMD The logical unit mapping type was detected invalid in va_set_disk_geometry An invalid status was returned from CACHE LOOKUP_LOCK Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the invalid status An invalid status was returned from CACHE LOOKUP_LOCK Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 status contains the invalid Service Manual A 50 Appendix A Last Fail Code Explanation 02570102 An invalid status was returned from VA XFER during a operation Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the in
185. he test (^T if you are running DILX through VCS).
• Type ^C to terminate the current DILX test.
• Type ^Y to terminate the current test and exit DILX.

Running a disk basic function test

This section provides instructions on how to run a DILX basic function test on one or more disks. The test performs an optional initial write pass followed by a repeating 10-minute cycle consisting of eight minutes of random I/O and two minutes of data-intensive transfers. You can set the percentage of the test that will be read operations, the data pattern to use for write commands, and other parameters.

Start DILX from the CLI prompt:

    HSJ50> RUN DILX

Skip the auto-configure option to get to the basic function test:

    Do you wish to perform an Auto-configure (y/n) [n]? n

Do not accept the default settings:

    Use all defaults and run in read only mode (y/n)? n

Enter the amount of time that you want the test to run. A single complete pass takes 10 minutes (after the initial write pass).

    Enter execution time limit in minutes (1:65535) [60]? 25

If you want to see performance summaries while DILX is running, specify how often DILX should display the summaries:

    Enter performance summary interval in minutes (1:65535) [60]? 5

The normal DILX summary simply indicates whether it detected any errors on each unit. Additionally, you can choose to see s
186. hether it detected any errors on each unit. Additionally, you can choose to see statistics on how many operations were performed and how much data was transferred during the test.

    Include performance statistics in performance summary (y/n) [n]? y

6. TILX asks if you want hard and soft errors (sense data and deferred errors) displayed. If you do, answer y and respond to the rest of the questions. If you don't want to see the errors displayed, answer n and proceed to the next step.

    Display hard/soft errors? y
    Display hex dump of Error Information Packet Requester Specific information (y/n) [n]? y

When the hard error limit is reached, the unit will be dropped from testing.

    Enter hard error limit (1:65535) [32]? 100

When the soft error limit is reached, soft errors will no longer be displayed, but testing will continue for the unit.

    Enter soft error limit (1:65535) [32]? 32

Set the maximum number of outstanding I/Os for each unit.

    Set the I/O queue depth (1:12) [4]? 6

Suppress caching.

    Suppress caching (y/n) [n]? n

Run the read-only test.

    *** Available tests are:
        1. Basic Function
        2. User Defined
        3. Read Only
    Use the Basic Function test 99.9% of the time. The User Defined test is for special problems only.
    Enter test number (1:3) [1]? 3

Set the number of records to be written/read during the test
187. holding in the Reset button, then pressing the eject button next to the card. See Figure 2-2. Pull the card out and save it for use in the replacement controller module.

Start the C_SWAP program:

    HSJ50> RUN C_SWAP

Figure 2-2 Removing the program card (CXO-5323A-MC)

Removing the modules

1. Indicate that you are going to remove the controller module:

    Do you wish to remove the other HSJ50 (y/n) [n]?

2. Press Y for yes.
3. Indicate if you are also going to remove the cache module:

    Will its cache module also be removed (Y/N) [n]?

4. Wait for the following text to be displayed at the console:

Note: If the cache module is not to be removed, the time allowed to remove the controller will be 2 minutes.

    Port 1 quiesced
    Port 2 quiesced
    Port 3 quiesced
    Port 4 quiesced
    Port 5 quiesced
    Port 6 quiesced
    All ports quiesced
    Remove the other controller (the one WITHOUT a blinking green LED) within 5 minutes.
    Time remaining 4 minutes 50 seconds.

5. Remove the CI connector from the controller.

Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface.

6. Di
188. hose symptoms Controller is not operating Symptoms e Controller is not operating e Green RESET LED is on solid e Controller does not restart when you press the RESET button Likely Cause In dual redundant configurations when one controller stops the surviving controller asserts a kill line which prevents the controller from restarting Solution 1 Establish a local or remote terminal connection to the operating controller 2 From the CLI tell the operating controller to allow the other controller to restart HSJ50 gt RESTART OTHER_CONTROLLER 3 Press the RESET button on the non operating controller and verify that it returns to normal service Unable to see units from host HSJ50 Array Controller Symptoms e Units have been created in the subsystem but cannot be seen from the host Likely Cause Settings are not correct or enabled to allow the subsystem to communicate with the host Solution e Check that the console version includes HSJ50 device support e Check that PATH_A and PATH_B are enabled on the HSJ50 SHow THIS Service Manual 1 4 Troubleshooting e Check that the CIXCD is set up correctly Quiet Slot Count 10 Tick Count CI Node number for the subsystem does not conflict with another CI node CIXCD transition card is plugged into the right backplane slot and seated properly CI cables are wired correctly Transmit to Transmit and Receive to Receiv
189. hould not occur 81130100 TILX calculated an illegal position type value while trying to generate a cmd for the position intensive phase of the Basic Function test 81140100 While trying to print an Event Information Packet TILX discovered an unsupported MSCP error log format 81150100 A cmd which TILX issued was terminated with a sense key of SCSILSENSEKEY_ILLEGAL_REQUEST 81160100 A cmd which TILX issued was terminated with a sense key of SCSI_LSENSEKEY_VOLUME_OVERFLOW but the End of Medium bit is not set 81170100 A TILX cmd completed with a sense key that 01 TILX does not support 81180100 TILX found an unsupported device control block substate while trying to build a SCSI cmd for the Basic Function test 81190100 While TILX was deallocating his deferred 01 error buffers at least one could not be found 811A0100 TILX expected a deferred error to be on the 0l receive deferred error q but no deferred errors were there 811B0100 TILX was asked to fill a data buffer with an 01 unsupported data pattern 811C0100 TILX could not process an unsupported 01 answer in tx reuse_params 811D0100 TILX received a SCSI deferred error with a 01 template which is not supported Service Manual HSJ50 Array Controller Appendix A A 89 Table A 22 Device configuration utilities CONFIG CFMENU last failure codes Last Fail Code 84010100 Explanation An unsupported message type or terminal request was received by the CONFIG virtual
190. icators ... 3-60
Required tools ... 3-63
Installing a power supply ... 3-63
Installing storage building blocks ... 3-64
SBB activity and status indicators ... 3-66
Asynchronous device installation ... 3-68
Installing SBBs ... 3-69
Installing a solid state disk, CD-ROM, and optical drives ... 3-69

4 Moving Storagesets and Devices
Precautions ... 4-2
Moving storagesets ... 4-3
Moving storageset members ... 4-6
Moving a single disk drive unit ... 4-8
Moving devices ... 4-10

5 Removing
Precautions ... 5-2
Removing a patch
191. ice Fault Fault Amber Amber CXO 4654B MC

If the replaced disk was in the failedset before replacement and the failedset is set to autospare, the replacement disk is automatically placed into the spareset. Otherwise, you may use the replacement disk in creating or recreating an appropriate container type.

6. Restore the data from whatever backup method you use.

Replacing tape drives

Use the warm swap method to replace tape drives. When using this method, the OCP (operator control panel) buttons are used to quiesce the bus that corresponds to the replacement device.

Required tools

The tool listed in Table 2-9 is required for replacing tape, optical, and CD-ROM drives.

Table 2-9 Required tools for SBB replacement

5/32-inch Allen wrench: To unlock the SW800-series cabinet

Tape drive replacement procedure

1. Halt all I/O activity to the appropriate port using the required procedure for your operating system.
2. Quiesce the port by pressing the OCP button for that port.
3. When the OCP LEDs flash in an alternating pattern, the device port is quiesced. For example, when you quiesce device port three and I/O has halted, the OCP LEDs flash in an alternating pattern as shown in the following illustration. The flashing LEDs are represented by the dark circles with lines radiating from them
192. ides instructions on how to run an advanced TILX test in which you define the commands that make up the test (read, write, reposition, and so on). Select this test only if you are very knowledgeable about tape drive testing. You should use the basic function test in almost all situations. 1 Start TILX from the CLI prompt: HSJ50> RUN TILX 2 Do not accept the default settings: Use all defaults and run in read only mode (y/n) n 3 Enter the amount of time that you want the test to run: Enter execution time limit in minutes (10:65535) [10] 25 4 If you want to see performance summaries while TILX is running, specify how often TILX should display the summaries: Enter performance summary interval in minutes (1:65535) [10] 5 5 The normal TILX summary simply indicates whether it detected any errors on each unit. Additionally, you can choose to see statistics on how many operations were performed and how much data was transferred during the test: Include performance statistics in performance summary (y/n) [n] y TILX asks if you want hard and soft errors (sense data and deferred errors) displayed. If you do, answer y and respond to the rest of the questions. If you don't want to see the errors displayed, answer n and proceed to the next step. Display hard/soft errors? y Display hex dump of Error Information Packet Requester Speci
193. iguration To replace a controller using the C_SWAP method, a minimum of two power supplies is required in SW500 and SW800 cabinets. If you are performing the replacement procedure in an SW300 cabinet, a minimum of five power supplies is required. Required tools: The tools listed in Table 2 1 are required to replace a controller and/or cache module. Table 2 1 Required tools: Maintenance terminal, to shut down controllers, restart controllers, execute CLI commands, and invoke utilities; Small flat head screwdriver, to loosen CI connector and front bezel captive screws; ESD wrist strap and ESD mat, to protect all equipment against ESD; 5/32 inch Allen wrench, to unlock the SW800 series cabinet. Caution: Before invoking the C_SWAP utility, terminate all other running utilities and disable all other terminals. Preparing the subsystem: 1 Terminate all running utilities and disable all other terminals. 2 Have the replacement modules at hand. The modules should be factory fresh or should have been shut down cleanly with the SHUTDOWN command in their last application. 3 Connect a maintenance terminal to the controller that you are not replacing. See Figure 2 1. [Figure 2 1 Connecting a maintenance terminal to the controller: local connection port, BC16E XX cable to terminal; CXO 5322A MC] 4 Prefer all units to the controller that you are not rep
194. ill allow you to enter the patch; however, the patch will not be applied until its correct software version is installed. Message: You incorrectly entered the patch information. Explanation: The patch information was not entered exactly. The program prompts you for each line of the patch entry, with the default from your previous response. Verify that each entry is exactly the same as the patch release. If you choose not to continue, or if you abort during this review procedure, the patch information you entered is lost and you must enter the entire patch again. You may enter Ctrl/Z followed by Return at any prompt to choose the default for the remaining entries. Message: The patch you have just entered is not applied until the controller firmware is changed to Version x. Explanation: The patch entered applies to a software version other than the one currently installed in the controller. Code Patch will not apply the patch until its correct software version is installed. Message: You have requested deletion of a patch number that another patch requires. Explanation: You are attempting to delete a patch in the hierarchy that has higher numbered patches entered. Code Patch will allow you to proceed; however, the program will delete all the higher numbered patches in the hierarchy for this software version along with the specified patch.
195. ilure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port. If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT. For Write Data Parity Error conditions, bits 0 through 3 of the Master DRAB CSR register identify the byte in error. For Address Parity Error conditions, follow repair action 34. For Write Data Parity Error conditions, follow repair action 35. Repair action code 30: The CACHEA0 or CACHEA1 DRAB detected an Address Parity Error or a Write Data Parity Error condition. Use the following register information to locate additional details about the error. If the failure occurred during a memory refresh attempt, the CACHEAn DRAB EAR register combined with the Master DRAB RSR register bits 8 through 11 (CACHEA memory region) yields the affected memory address. If the failure occurred during a memory access attempt, the CACHEA0 DRAB EAR register combined with the Master DRAB RSR register bits 8 through 11 (CACHEA memory region) or bits 20 through 23 (CACHEA DRAB register region) yields the affected memory address. Unfortunately, no other information is available to distinguish a memory region access from a DRAB register region access. The CACHEAn DRAB EDR register contains the error data. I
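The register bit positions named in this item can be separated with ordinary shifts and masks. The following Python sketch is illustrative only and is not part of HSOF: the function names and sample values are hypothetical, bit 0 is assumed to be the least significant bit, and because the text does not say how the bits 0 through 3 field maps to a particular byte, the sketch returns the raw field value.

```python
def bit_field(value: int, low: int, high: int) -> int:
    """Return bits low..high (inclusive) of value; bit 0 is assumed to be the LSB."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_master_drab(csr: int, dsr: int) -> dict:
    """Split Master DRAB CSR and DSR values into the fields named in the text."""
    return {
        # CSR bits 10 through 12 identify the Device port (raw 3-bit field).
        "device_port_field": bit_field(csr, 10, 12),
        # CSR bits 0 through 3 identify the byte in error (raw 4-bit field).
        "byte_in_error_field": bit_field(csr, 0, 3),
        # DSR bit 14 set: reported via NMI; clear: reported via DRAB_INT.
        "reported_via": "NMI" if bit_field(dsr, 14, 14) else "DRAB_INT",
    }


if __name__ == "__main__":
    # Hypothetical register values, chosen only to exercise each field.
    sample_csr = (0b011 << 10) | 0b0100
    sample_dsr = 1 << 14
    print(decode_master_drab(sample_csr, sample_dsr))
```

Read the actual register values from the error log before drawing conclusions; the sample values here do not come from a real controller.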
196. in an alternating pattern, the device port is quiesced. For example, when you quiesce device port 3 and I/O has halted, the OCP LEDs flash in an alternating pattern as shown below. The flashing LEDs are represented by the dark circles with lines radiating from them. [Figure CXO 4824A MC] 4 When the port has quiesced, remove the tape drive by pressing the two mounting tabs together to release it from the shelf. 5 Using both hands, pull the tape drive out of the device shelf. Appendix A: Instance codes, Last failure codes, Repair action codes. Instance codes and definitions: The following table contains instance codes and their definitions. Use this table to identify instance code definitions and the suggested repair action. "Repair action codes" on page A 91 contains the repair action codes and actions. Table A 1 Instance codes (Instance Code: Explanation. Repair Action Code): 01010302: An unrecoverable hardware detected fault occurred. Repair action code 03. 0102030A: An unrecoverable firmware inconsistency was detected, or an intentional restart or shutdown of controller operation was requested. Repair action code 03. 01032002: Nonvolatile parameter memory component EDC check failed. The content of the component was reset to default settings. 0121370A: Memory System Error Analysis is indicated in the information preserved during a previous last failure
197. ing CLI commands to clear the error. If you have replaced the cache module, enter the following command: HSJ50> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER DESTROY_UNFLUSHED_DATA Otherwise, enter: HSJ50> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER NODESTROY_UNFLUSHED_DATA Ensure that the new controller is not in dual redundant mode by entering the following command on the new controller: HSJ50> SET NOFAILOVER If the controller was already in nonredundant mode, you will see a message indicating this condition. Connect the maintenance terminal to the new controller. Enter the following command from the new controller to put the controllers into dual redundant mode: HSJ50> SET FAILOVER COPY=OTHER_CONTROLLER The new controller will initialize. Verify that all host settings are correct: HSJ50> SHOW THIS_CONTROLLER Modify any incorrect information. If any information has been changed, restart the new controller: HSJ50> RESTART THIS_CONTROLLER After the controller has initialized, reconnect the CI cable to the new controller and tighten the captive screws. Enable CI paths by entering the following commands: HSJ50> SET THIS_CONTROLLER PATH_A HSJ50> SET THIS_CONTROLLER PATH_B On the ECB module front panel, check the LED status indicator for the appropriate
198. ing cache memory. You may increase cache memory up to a maximum of 128 MB using four 32 MB SIMM cards. The controller module is seated in front of the cache module. Any time the cache module requires service, the controller module has to be removed first. Service to the devices is interrupted during the upgrade procedures. Required tools: The tools listed in Table 3 10 are required for upgrading cache modules. Table 3 10 Required tools: Maintenance terminal, to change controller parameters; ESD wrist strap and ESD mat, to protect all equipment against ESD; Small flat head screwdriver, to loosen CI cable and front bezel captive screws; 5/32 inch Allen wrench, to unlock the SW800 series cabinet. Installing SIMM cards: The following procedure shows how to install SIMM cards to increase write back cache capacity in single and dual redundant controller configurations. 1 Connect a maintenance terminal to the controller. See Figure 3 21. [Figure 3 21 Connecting a maintenance terminal to the controller: local connection port, H8571 J to PC, BC16E XX cable to terminal; CXO 5322A MC] 2 Take the single controller out of service: CLI> SHUTDOWN THIS_CONTROLLER 3 If you are working with a dual redundant configuration, take both controllers out of service: CLI> SHUTDOWN OT
199. ing cache memory capacity; Table 3 12 Power supply status indicators; Table 3 13 Shelf and single power supply status indicators; Table 3 14 Shelf and dual power supply status indicators; Table 3 15 Required tools for power supply installation; Table 3 16 Storage SBB status indicators; Table 5 1 Required tools; Table 5 2 Required tools; Table 5 3 Required tools; Table 5 4 Required tools; Table 5 5 Required tools; Table A 1 Instance codes; Table A 2 Executive services last failure codes; Table A 3 Value added services last failure codes
200. ing more than one adapter for load balancing. M indicates multiple connections to that node. Because each host system can make a separate connection to each of the disk, tape, and DUP servers, this field frequently shows multiple connections to a host system. In this example, nodes 8, 9, and 14 show multiple connections. V indicates that only a virtual circuit is open and no connection is present. This happens prior to establishing a connection. It also happens when there is another controller on the same network and when there are systems with multiple adapters connected to the same network. Node 15 demonstrates this principle. If a period is in a position corresponding to a node, that node does not have any virtual circuits or connections to this controller. A space indicates the address is beyond the visible node range for this controller. CI/DSSI Host Path Status [display: Path Status grid, column header 0123456789] Description: This display indicates the path status to any system for which a virtual circuit exists. This display is available only on CI and DSSI based controllers. Each position in the data field represents one of the possible nodes with which the controller can communicate. To locate the path status for a given node, use the column on the left to determine the high order digit of the node number and use the sec
201. ing on licensed features such as write back caching. data center cabinet: A generic reference to the large cabinets, such as the SW800 series, in which StorageWorks components can be mounted. DDL: Dual data link. The ability to operate on the CI bus using both paths simultaneously to the same remote node. differential SCSI bus: A bus in which a signal's level is determined by the potential difference between two wires. A differential bus is more robust and less subject to electrical noise than is a single ended bus. DILX: Disk inline exerciser. Diagnostic firmware used to test the data transfer capabilities of disk drives in a way that simulates a high level of user activity. DSA: Digital storage architecture. A set of specifications and interfaces describing standards for designing mass storage products. DSA defines the functions performed by host computers, controllers, and disk drives. It also specifies how they interact to accomplish mass storage management. DSSI: Digital storage system interconnect. A Digital specific data bus with an 8 bit data transfer rate of 4 MB/s. dual redundant configuration: Two controllers in one controller shelf, providing the ability for one controller to take over the work of the other controller in the event of a failure of the other controller. DUART: Dual universal asynchronous receiver transmitter. An integrated circuit containing two serial asynchrono
202. isk. Command Reference number: x00000000; Unit Number: 450; MSCP Sequence number: 21; Logged Message Format: 2 (Disk Transfer Error); MSCP Flags: x00 (No MSCP Flags indicated); MSCP Unique Controller ID: x0000000940802576; MSCP Controller Model: 40 (HSJ50 HS Array Controller); MSCP Controller Class: 1 (Mass Storage Controller class); Controller SW version: 20; Controller HW version: t3; MSCP Unique Unit ID: x0000000000000022; MSCP Unit Model: 1 (HSX0n MSCP basic virtual disk); MSCP Unit Class: 2 (Disk class, DEC Std 166 disk); Unit SW version: 1s; Unit HW version: 67; MSCP Event Code: x014B (Major Event: Drive Error; Sub event: Controller Detected Protocol Error); Multiunit code: x0035; Error recovery Level: Ls; Retry count: O; Volume Serial Number: QO; Header code: x00000000; Flags: Good LBN; LBN: QO. HSAC Data: Instance Code: x030C4002 (A Drive failed because a Test Unit Ready command or a Read Capacity command failed); Component ID: Device Services; Event Number: x0000000C; Repair Action: x00000040; NR Threshold: x00000002; Template Type: x51 (Disk Transfer Error); Power On Time Value: x00000000000B9331; Completed Byte Count: 0; Starting LBN: 0; Device Locator: x00000504 (Port 4, Target 5, LUN 0); SCSI Device Type: x1F (Device Type not decoded); Drive Product Name: RZ26 C DEC; Drive Serial Number Command Opcode Sense Data Qualifie
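In the sample log in this item, Instance Code x030C4002 appears alongside Event Number x0C, Repair Action x40, and NR Threshold x02, and Device Locator x00000504 appears alongside Port 4, Target 5, LUN 0. That pairing suggests one field per byte. The Python sketch below decodes both values on that basis. The byte layout is inferred from this single sample rather than taken from a format definition, the helper names are hypothetical, and treating the first byte as a numeric component ID and the third locator byte as the LUN are assumptions.

```python
def decode_instance_code(code: int) -> dict:
    """Split a 32-bit instance code into four one-byte fields (layout inferred from the sample log)."""
    return {
        "component_id": (code >> 24) & 0xFF,   # assumed: numeric ID of the reporting component
        "event_number": (code >> 16) & 0xFF,
        "repair_action": (code >> 8) & 0xFF,
        "nr_threshold": code & 0xFF,
    }


def decode_device_locator(locator: int) -> dict:
    """Split a device locator into port, target, and LUN (byte positions inferred from the sample log)."""
    return {
        "port": locator & 0xFF,
        "target": (locator >> 8) & 0xFF,
        "lun": (locator >> 16) & 0xFF,         # assumed position; the sample LUN is 0
    }


if __name__ == "__main__":
    fields = decode_instance_code(0x030C4002)
    print({name: f"{value:02X}" for name, value in fields.items()})
    print(decode_device_locator(0x00000504))
```

For the sample values, the decoded repair action (40) and NR threshold (02) agree with the Repair Action and NR Threshold fields printed in the log.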
203. istent Memory Error condition during an FX attempt to read CACHEB0 memory. 017E2E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during a host port attempt to write CACHEB0 memory. 017F2E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during a host port attempt to write a byte to CACHEB0 memory. 01802E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during a host port attempt to read CACHEB0 memory. 01812E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during a device port attempt to write CACHEB0 memory. 01822E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during a device port attempt to write a byte to CACHEB0 memory. 01832E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during a device port attempt to read CACHEB0 memory. 01842E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write CACHEB0 memory. 01852E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to write a byte to CACHEB0 memory. 01862E02: The CACHEB0 DRAB detected a Nonexistent Memory Error condition during an i960 attempt to read CACHEB0 memory. 01872E02: The CACHEB1 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write CACHEB1 memory. 01882E02: The CACHEB1 DRAB dete
204. it Status (full)
Unit   ASWC   KB/S  Rd%  Wr%  Cm%  HT%  PH%  MS%  Purge  BlChd  BlHit
D0003  o r    382   0    100  0    0    0    0    0      6880   0
D0250  o r    382   100  0    0    0    0    100  0      6880   0
D0251  o r    284   100  0    0    0    0    100  0      5120   0
D0262  av xr  0     0    0    0    0    0    0    0      0      0
D0280  o r    497   44   55   0    0    0    100  0      9011   0
D0351  a r    0     0    0    0    0    0    0    0      0      0
D0911  a r    0     0    0    0    0    0    0    0      0      0
D1000  a xr   0     0    0    0    0    0    0    0      0      0
Description: This subdisplay shows the status of the logical units that are known to the controller firmware. It also shows I/O performance information and caching statistics for the units. Up to 42 units can be displayed in this subdisplay. The Unit column contains a letter indicating the type of unit, followed by the unit number of the logical unit. The list is sorted by unit number. There may be duplication of unit numbers between devices of different types. If this happens, the order of these devices is arbitrary. The following device type letters may appear: D indicates a disk device; T indicates a tape device; L indicates a media loader; C indicates a CD ROM device; F indicates a device type not listed above; U indicates the device type is unknown. The ASWC columns indicate the availability, spindle state, write protect state, and cache state, respectively, of the logical unit. For HSZ controllers on line in this column means that the uni
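The Unit column convention described in this item (one device type letter followed by the unit number, with the list sorted by unit number) can be illustrated with a short Python sketch. It is a reading aid only: the function names are hypothetical, and the letter table simply restates the list in the text.

```python
# Device type letters as listed in the description of the Unit column.
DEVICE_TYPES = {
    "D": "disk",
    "T": "tape",
    "L": "media loader",
    "C": "CD ROM",
    "F": "device type not listed",
    "U": "unknown",
}


def parse_unit(name: str) -> tuple:
    """Split a unit name such as 'D0250' into (device type, unit number)."""
    letter, digits = name[0].upper(), name[1:]
    if letter not in DEVICE_TYPES or not digits.isdigit():
        raise ValueError(f"unrecognized unit name: {name!r}")
    return DEVICE_TYPES[letter], int(digits)


def sort_units(names: list) -> list:
    """Sort unit names by unit number only; ties between device types keep input order."""
    return sorted(names, key=lambda name: parse_unit(name)[1])


if __name__ == "__main__":
    print(parse_unit("D0250"))
    print(sort_units(["D1000", "T0100", "D0003"]))
```

Sorting on the number alone mirrors the statement that devices of different types sharing a unit number appear in arbitrary order.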
205. ivity to cease. This code is used to inform the other controller that this controller will stop responding to inter controller communications during card update. An automatic restart of the controller at the end of the program card update will cause normal controller activity to resume. Table A 26 Induce controller crash utility (CRASH) last failure codes (Last Fail Code: Explanation): 88000000: Controller was forced to restart due to the execution of the CRASH utility. Repair action code 00. Repair action codes: This section contains the repair action codes you will reference from Instance codes and Last Fail codes. Table A 27 Repair action codes (Repair Action Code: Action to take): No action necessary. An unrecoverable hardware detected fault occurred or an unrecoverable firmware inconsistency was detected; proceed with controller support avenues. Inconsistent or erroneous information was received from the operating system. Proceed with operating system software support avenues. Follow the recommended repair action contained in the Last Failure Code. There are two possible problem sources: 1 In the case of a shelf with dual power supplies, one of the power supplies has failed. Follow repair action 07 for the power supply with the power LED out. One of the shelf blowers has failed. Follow repair action 06. SBB connector. Follow repair action 0A. A standalone
206. l Circuit. 400D640A: CI Port detected bad path A upon attempting to transmit a packet. Repair action code 64. 4015020A: Remote SYSAP sent an SCS APPL_MSG but no receive credit was available. 4029010A: Illegal connection state. Not in CONNECT_REC connection state when an SCS ACCEPT_REQ is pending. 402A010A: Illegal connection state. Not in CONNECT_REC connection state when an SCS REJECT_REQ is pending. 402B010A: Illegal connection state. Not in CLOSED connection state when an SCS CONNECT_REQ is pending. 402C010A: Illegal connection state. Not in OPEN or DISCONNECT_REC connection state when an SCS DISCONNECT_REQ is pending. 403D020A: Received packet with an unrecognized PPD opcode. Note that the content of the vestate field is undefined in this instance. 40440064: Received a PPD NODE_STOP and closed virtual circuit. 4051020A: Received SCS CONNECT_RSP when not in CONNECT_SENT connection state. 4052020A: Received SCS CONNECT_RSP when the connection is no longer valid. 4054020A: Received SCS ACCEPT_RSP when not in the ACCEPT_SENT connection state. 4055020A: Received SCS REJECT_REQ when not in the CONNECT_ACK connection state. 4056020A: Received SCS REJECT_RSP when not in the REJECT_SENT connection state. 4057020A: Received SCS DISCONNECT_REQ when not in the OPEN, DISCONNECT_SENT, or DISCONNECT_ACK connection state. 4058020A (repair action code 02): Received SCS DISCONNECT_RSP when not in th
207. lacing: HSJ50> SET unit number PREFERRED_PATH=THIS_CONTROLLER Repeat the command for each unit that is preferred to the controller that you are replacing. Disable the CI paths by issuing the following CLI commands: HSJ50> SET OTHER_CONTROLLER NOPATH_A HSJ50> SET OTHER_CONTROLLER NOPATH_B If the controller to be replaced is still functioning, use the SHUTDOWN command to take it out of service. When using this command, do not specify any command switches. When the controller halts, the green Reset LED stops flashing and stays lit. Take the operating controller out of dual redundant failover mode: HSJ50> SET NOFAILOVER With a small flat head screwdriver, loosen the captive screws that secure the CI connector to the controller. Loosen the two captive retaining screws at each corner of the controller and cache module front bezel. Do not remove the module yet. Obtain and place the ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist. Attach or clip the other end of the ESD wrist strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface). Unsnap and remove the program card ESD cover on the controller you are removing. Remove the program card by pressing and
208. lacing a single ECB with dual ECB, follow these steps: a Press the shutdown button on the single ECB until the LED stops flashing. b Remove the ECB cable from the single ECB. c Remove the single ECB from the device slot. d Install the dual ECB into the device slot. [Figure 3 17 Installing an SBB battery module; CXO 5306A MC] 2 Slide the cache module into the appropriate slot. Push the module firmly into the slot until it is seated. See Figure 3 18. [Figure 3 18 Installing the cache and controller modules; CXO 5324A MC] 3 Reinstall the controller module into the appropriate slot. Push the module firmly into the slot until it is seated. See Figure 3 18. Caution: Do not overtighten the controller's front panel captive screws, the cache module's front panel captive screws, or the ECB cable captive screws. Damage to the controller PC board or front panel, the cache module front panel, or the SBB may result. 4 Tighten the front panel captive screws on the controller and cache module. Do not overtighten. Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface. 5 Connect the ECB battery cables, starting at the cache modules. 6 Reinstall the CI cable at
209. ld replaceable units Caution: Do not overtighten the controller's front panel captive screws, the cache module's front panel captive screws, or the ECB cable captive screws. Damage to the controller PC board or front panel, the cache module front panel, or the battery SBB may result. Tighten the front panel captive screws on the controller and cache module. Reconnect the open end of the ECB cable to the cache module. Tighten the cable mounting screws. Attach a maintenance terminal to the new controller. Press and hold the controller's green reset button while inserting the program card. The program card eject button will extend when the card is fully inserted. Release the reset button to initialize the controller. If the controller reports an invalid cache error, enter one of the following CLI commands. If you have replaced the cache board, enter the following command: CLI> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER DESTROY_UNFLUSHED_DATA Otherwise, enter: HSJ50> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER NODESTROY_UNFLUSHED_DATA Reattach the CI cable to the controller's front panel. Check the configuration by entering the following CLI command: HSJ50> SHOW THIS_CONTROLLER The controller will display the following information (this is a sample only): Controller HSJ50 2634901786 Firmware V05 0 Hardware BX11 Configured for dual redundancy with
210. le 1 40290100: Failed to allocate ACB. 402A0100: Failed to allocate ID member template. 402B0100: Failed to allocate DG HTBs. 402C0100: Failed to allocate message HTBs. 402D0101: S_max_node greater than MAX_VC_ENTRIES. Last Failure Parameter 0 contains the S_ci_max_nodes value. Repair action code 01. 402E0101: S_max_node not set to valid value (8, 16, 32, 64, 128, 256). Last Failure Parameter 0 contains the S_ci_max_nodes value. Repair action code 01. 402F0100: Failure to allocate a HIS EIP structure. 40300100: Failure in memory allocation. 40510100: htb_id type not DG when attempting to deallocate DG HTB. 40520100: htb_id type not RCV_SND when attempting to dealloc recv queue HTB. 40530100: htb_id type not RCV_SND when attempting to dealloc SCS queue HTB. 40560100: Failed to find a vc entry for ccb during his_close_connection routine. 407B0100: SCS command timeout unexpectedly inactive during SCS Accept Request. 407C0100: SCS command timeout unexpectedly inactive during SCS Reject Request. 408E0100: Message receive queue count disagrees with HTBs on the queue. 40930100: Message receive queue count disagrees with HTBs on the queue. 40950100: Create xfer request with 0 byte count. 40900100: htb_id type not DG when attempting to xmit DG HTB. 40960100: Create xfer request with 0 byte count. 40970100: Create xfer request with 0 byte count. 40980100: Create xfer request wit
211. ler SCSI ID 6 2 Connect a maintenance terminal to the controller. See Figure 3 4. [Figure 3 4 Connecting a maintenance terminal to the controller: local connection port, BC16E XX cable to terminal; CXO 5322A MC] 3 Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist. 4 Attach or clip the other end of the ESD strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface). 5 Install an external cache battery (ECB) SBB into a convenient device slot. See Figure 3 5. [Figure 3 5 Installing an ECB SBB; CXO 5306A MC] 6 Install the controller power supplies into the controller shelf. See Figure 3 6. [Figure 3 6 Installing power supplies into the controller shelf; CXO 5304A MC] 7 Slide the cache module into the controller shelf. Push the module firmly into the slot until it is seated. See Figure 3 7. [Figure 3 7 Installing a single controller (SW500 cabinet); CXO 5324A MC] 8 Install the controller module into the shelf slot that corresponds to SCSI ID 7. See Figure 3 7 and Figure 3 8. Figure 3 8 Installing a single controller (SW800 cabinet) POW
212. lied as data, then the target may have already altered the medium. This sense key also may indicate that an invalid IDENTIFY message was received. During device initialization, the device reported the SCSI Sense Key UNIT ATTENTION. This indicates that the removable medium may have been changed or the target has been reset. During device initialization, the device reported the SCSI Sense Key DATA PROTECT. This indicates that a command that reads or writes the medium was attempted on a block that is protected from this operation. The read or write operation is not performed. During device initialization, the device reported the SCSI Sense Key BLANK CHECK. This indicates that a write once device encountered blank medium or format defined end of data indication while reading, or a write once device encountered a nonblank medium while writing. During device initialization, the device reported a SCSI Vendor Specific Sense Key. This sense key is available for reporting vendor specific conditions. During device initialization, the device reported the SCSI Sense Key COPY ABORTED. This indicates a COPY, COMPARE, or COPY AND VERIFY command was aborted due to an error condition on the source device, the destination device, or both. 03DE450A: During device initialization, the device reported the SCSI Sense Key ABORTED COMMAND. This indicates the target aborted the command. The initiator may be able
213. ll ports quiesced Remove the other controller (the one WITHOUT a blinking green LED) within 5 minutes. Time remaining 4 minutes 50 seconds. 6 Place the ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist. 7 Attach or clip the other end of the ESD wrist strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface). 8 Unsnap and remove the program card ESD cover on the controller you are removing. 9 Remove the program card by pressing and holding in the Reset button, then pressing the eject button next to the card. See Figure 2 21. [Figure 2 21 Removing the program card: PCMCIA card, eject button; CXO 5323A MC] 10 Remove the CI cable from controller A. 11 Slide controller A out of the shelf and note the rails in which the module was seated. Place the module on an ESD mat. Do not remove the CI connector. See Figure 2 22. [Figure 2 22 Removing the controller and cache module; CXO 5327A MC] 12 Wait for the following text to be displayed at the operating controller's console. Note: You may remove the cache module before or after port activity has restarted. Restarting all ports Port 1 re
214. ll run for the amount of time that you selected and then display the results of the testing. If you want to interrupt the test early: Type ^G (Control G) to get a performance summary without stopping the test (^T if you are running TILX through VCS). Type ^C to terminate the current TILX test. Type ^Y to terminate the current test and exit TILX. Running a tape drive read only test: This section provides instructions on how to run a TILX read only test on one or more tape drives. The test only verifies that a tape is readable. The test performs an initial read pass until it reaches the EOT or the selected number of records. It then rewinds the tape and starts another read pass. The test ignores tape marks. Because the test will most likely perform read operations with incorrect record sizes, it ignores record size mismatches. The test records all other errors. To run a read only TILX test: 1 Start TILX from the CLI prompt: HSJ50> RUN TILX 2 Do not accept the default settings: Use all defaults (y/n) n 3 Enter the amount of time that you want the test to run: Enter execution time limit in minutes (10:65535) [10] 25 4 If you want to see performance summaries while TILX is running, specify how often TILX should display the summaries: Enter performance summary interval in minutes (1:65535) [10] 5 5 The normal TILX summary simply indicates w
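The two numeric prompts in this item each show a pair of limits, a third number, and then the operator's answer (10, 65535, 10, and 25 for the execution time limit; 1, 65535, 10, and 5 for the summary interval). Reading the third number as the default is an interpretation of the sample, not something the text states. Under that assumption, the Python sketch below shows how such an answer could be checked before it is typed at the console. The function name is hypothetical and the code does not talk to a controller.

```python
def resolve_answer(answer: str, low: int, high: int, default: int) -> int:
    """Return the value a range prompt would use for the given answer.

    Assumption: an empty answer selects the default; anything else must be
    an integer inside the inclusive range low..high.
    """
    text = answer.strip()
    if not text:
        return default
    value = int(text)  # raises ValueError for non-numeric input
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range {low} to {high}")
    return value


if __name__ == "__main__":
    # Values taken from the sample session in the text.
    print(resolve_answer("25", 10, 65535, 10))  # execution time limit in minutes
    print(resolve_answer("5", 1, 65535, 10))    # performance summary interval in minutes
```

Note that an answer of 5 is acceptable for the summary interval but would fall below the lower limit shown for the execution time.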
215. llegal Data Pattern Number found in data pattern header. Unit x Explanation: DILX read data from the disk and found that the data was not in a pattern that DILX previously wrote to the disk. Message: Code 2: No write buffers correspond to data pattern. Unit x Explanation: DILX read a legal data pattern from the disk at a place where DILX wrote to the disk, but DILX does not have any write buffers that correspond to the data pattern. Thus, the data has been corrupted. Message: Code 3: Read data do not match what DILX thought was written to the media. Unit x Explanation: DILX writes data to the disk and then reads it and compares it against what was written to the disk. This indicates a compare failure. More information is displayed to indicate where in the data buffer the compare operation failed and what the data was and should have been. Message: Code 4: Compare Host Data should have reported a compare error but did not. Unit x Explanation: A compare host data compare was issued in a way that DILX expected to receive a compare error, but no error was received. DILX data patterns: Table 1 3 defines the data patterns used with the DILX Basic Function or User Defined tests. There are 18 unique data patterns. These data patterns were selected as worst case, or the ones most likely to produce errors on disks connected to the controller. Table 1 3 DILX data
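The Code 3 explanation in this item describes a write, read back, and compare cycle that reports where in the buffer the compare failed, what the data was, and what it should have been. The Python sketch below illustrates that kind of comparison in general terms. It is not DILX code: the function name and the sample buffers are hypothetical, and the actual DILX report format is not reproduced here.

```python
def first_mismatch(expected: bytes, actual: bytes):
    """Compare two equal-length buffers byte by byte.

    Returns (offset, actual byte, expected byte) for the first difference,
    or None when the buffers match.
    """
    if len(expected) != len(actual):
        raise ValueError("buffers must be the same length")
    for offset, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return offset, got, want
    return None


if __name__ == "__main__":
    written = bytes([0x00, 0xFF, 0x55, 0xAA])    # hypothetical pattern written to the media
    read_back = bytes([0x00, 0xFE, 0x55, 0xAA])  # hypothetical data read back
    result = first_mismatch(written, read_back)
    if result is not None:
        offset, got, want = result
        print(f"compare failed at offset {offset}: was {got:02X}, should be {want:02X}")
```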
216. lure Parameter 0 contains the work found 09650101 Work that was not FLM work was found on the FLM queue Last Failure Parameter 0 contains the structure found 09670101 Local FLM detected an invalid facility to act upon Last Failure Parameter 0 contains the facility found 09680101 Remote FLM detected an error and requested the local controller to restart Last Failure Parameter 0 contains the reason for the request 09C80101 Remote FLM detected an invalid facility to act upon Last Failure Parameter 0 contains the facility found 09C90101 Remote FLM detected an invalid work type Last Failure Parameter 0 contains the work type found 09CA0101 Remote FLM detected an invalid work type Last Failure Parameter 0 contains the work type found 09CB0012 Remote FLM detected that the other controller has a facility lock manager at an incompatible revision level with this controller Last Failure Parameter 0 contains this controller s FLM revision Last Failure Parameter 1 contains the other controller s FLM revision HSJ50 Array Controller Service Manual A 72 Appendix A Table A 11 Integrated logging facility last failure codes Last Fail Code Explanation Repair Action Code 0A010100 CACHE FIND_LOG_BUFFERS returned continuation handle gt 0 0A030100 ILF CACHE_READY buffers_obtained gt non zero stack entry count 0A040100 ILF CACHE_READY DWD overrun 0A050100 IFL CACHE_READY DWD underrun for d J
217. m MAINTENANCE _MODE to NORMAL but was rejected because of insufficient resources 81050100 TILX tried to change the usb unit state from MAINTENANCE _MODE to NORMAL but TILX never received notification of a successful state change 81060100 TILX tried to switch the unit state from MAINTENANCE _MODE to NORMAL but was not successful 81070100 TILX aborted all cmds via va d_abort but the htbs haven t been returned 81080100 While TILX was deallocating his eip buffers at least one could not be found 81090100 TILX received an end msg which corresponds 01 to an opcode not supported by TILX 810A0100 TILX was not able to restart his timer for 810B0100 TILX tried to issue an I O for an opcode not supported 810D0100 A TILX device control block contains an unsupported unit_state 810E0100 TILX received an unsupported Value Added status in a Value added completion message HSJ50 Array Controller Service Manual 01 01 01 01 01 01 01 01 01 01 01 01 A 88 Appendix A Last Fail Code Explanation 810F0100 TILX found an unsupported device control block substate while trying to build a cmd for the Basic Function test 81100100 TILX found an unsupported device control block substate while trying to build a cmd for the Read Only test 81110100 TILX found an unsupported device control block substate while trying to build a cmd for the User Defined test 81120100 TILX received an EOT encountered while in a substate where EOT encountered s
218. m controller and ArBitration engine (DRAB) with an indication that the CACHE backup battery has failed or is low (needs charging). A processor interrupt was generated by the CACHEB Dynamic Ram controller and ArBitration engine (DRAB) with an indication that the CACHE backup battery has failed or is low (needs charging). Last Fail Codes on this page: 010D0110, 010E0110, 010F0110, 01100100. Explanations: The System Information structure within the System Information Page has been reset to default settings. The only known cause for this event is an i960 processor hang caused by an unimplemented memory region reference. When such a hang occurs, controller modules equipped with inactivity watchdog timer circuitry will spontaneously reboot after the watchdog timer expires (within seconds of the hang). Controller modules not so equipped will just hang, as indicated by the green LED on the OCP remaining in a steady state. All structures contained in the System Information Page (SIP) and the Last Failure entries have been reset to their default settings. This is a normal occurrence in the following situations: for the first boot following manufacture of the controller module; during the transition from one firmware version to another, if the format of the SIP is different between the two versions. If this event is reported at any other time, follow the recommended repair action associ
219. m one side to the other and rock the connector to help seat it. Listen for the connector to snap into place. 11. Reinstall the SBBs into the shelf. Ensure that you install all devices into the same slot that they were removed from. 12. Replace the volume shield in the controller shelf and lightly tighten the captive screws using a flat-head screwdriver. Replace the cache modules and the controller modules using the Replace Cache Module section of this manual. 3 Installing: Patching controller software; Formatting disk drives; Installing new software on a device; Installing a controller and cache module (single controller configuration); Installing a second controller and cache module; Adding a second controller using C_SWAP; Installing a cache module; Adding cache memory; Installing power supplies; Installing storage building blocks. Precautions: Some of the procedures in this chapter involve handling program cards, controller modules, and cache modules. Use the following guidelines to prevent component damage while servicing your subsystem modules. Electrostatic discharge protection: Electrostatic discharge (ESD) can damage system components. Use the following guidelines when handling your subsystem components. Handling controllers or cache modules: Always wear a properly grounded ESD wrist strap whenever you remove or install a controller or cache modul
220. m will not be able to communicate with the host 82062002 An unrecoverable error was detected during execution of the UART DUART Subsystem Test This will cause the console to be unusable This will cause failover communications to fail 82072002 An unrecoverable error was detected during 20 execution of the FX Subsystem Test Service Manual HSJ50 Array Controller Appendix A A 41 Instance Code Explanation 82082002 An unrecoverable error was detected during 20 execution of the nbuss init Test HSJ50 Array Controller Service Manual A 42 Appendix A Last fail codes The following tables contain last fail codes and their definitions Use these tables if your subsystem or controller is out of service due to some type of failure and you cannot use FMU to translate the last fail code These codes are presented in tables according to the software component that was the source of the error However they are also sorted numerically so you can scan down the list until you find the code you re looking for Table A 2 Executive services last failure codes Last Fail Code Explanation 01000100 Memory allocation failure during executive 0l initialization 01010100 An interrupt without any handler was 01 triggered 01020100 Entry on timer queue was not of type AQ or 01 BQ 01 01030100 Memory allocation for a facility lock failed 0 01040100 Memory initialization called with invalid memory type 01050104 The 1960 rep
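In the rows of these tables where the extraction preserved the repair action column, that value matches the third byte of the eight-digit code (for example, 82072002 is listed with repair action 20, and 01010100 with 01). The sketch below relies only on that observation; it is an assumption generalized from the visible rows, not a documented rule, and the function name is hypothetical. Confirm any result against the tables or FMU.

```python
def repair_action_byte(code: str) -> str:
    """Return the third byte (hex digits 5-6) of an 8-digit code.

    Assumption: this byte corresponds to the Repair Action Code column,
    as it does in the table rows visible in this excerpt.
    """
    digits = code.strip().upper()
    if len(digits) != 8 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"expected 8 hexadecimal digits, got {code!r}")
    return digits[4:6]


if __name__ == "__main__":
    for sample in ("82072002", "01010100"):
        print(sample, "->", repair_action_byte(sample))
```

This can help when scanning a code whose repair action column was lost in extraction, but the repair action tables in Appendix A remain the reference.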
221. main found destination address in 20 the rcv packet does not match node address 42640100 Scan packet que found bad path select case for DSSI 42680102 Dssi_wait_isr routine found that 720 report 01 unexpected interrupt status for target mode Last Fail Parameter 0 contains the 720 chip sist0 register value Last Failure Parameter 1 contains the 720 chip sist1 register value 42690101 Dssi_wait_isr routine found that the 720 script 01 reported an invalid rcv status Last Failure Parameter 0 contains the receive interrupt status N720 _dsps value 426B0101 Dssi_wait_isr routine found that 720 01 interrupted without status Last Failure Parameter 0 contains the 720 chip istat register value 42752002 Dssi_wait_isr routine found that 720 reported 20 a bus error on the controller s internal bus Last Failure Parameter 0 contains the 720 chip dstat register value Last Failure Parameter 1 contains the 720 chip dcmd register value HSJ50 Array Controller Service Manual A 80 Service Manual Last Fail Code 42760102 42770102 42790103 427A6601 Explanation Dssi_wait_isr routine found that 720 reported an unexpected status for initiator mode Last Failure Parameter 0 contains the 720 chip dstat register value Last Failure Parameter 1 contains the 720 chip demd register value Dssi_wait_isr routine found that 720 reported an unexpected status for initiator mode Last Failure Parameter 0 contains the 720 chi
222. mber that you can choose for TILX testing. Select the first tape drive that you want to test. Do not include the letter D in the unit number: Enter unit number to be tested? 350 TILX indicates whether it has been able to allocate the tape drive. If you enabled the read/write test, TILX gives you a final warning that the data on the tape drive will be destroyed: Unit 350 will be write enabled. Do you still wish to add this unit (y/n) [n]? y If you want to test more tape drives, enter the unit numbers when prompted. Otherwise, enter n to start the test: Select another unit (y/n) [n]? n TILX testing started at <date> <time>. Test will run for <nn> minutes. TILX will run for the amount of time that you selected and then display the results of the testing. If you want to interrupt the test early: Type ^G (Control-G) to get a performance summary without stopping the test (^T if you are running TILX through VCS). Type ^C to terminate the current TILX test. Type ^Y to terminate the current test and exit TILX. TILX error codes: If TILX detects an error, the performance display for the unit includes the controller instance code (IC), the device PTL location (PTL), the SCSI sense key (Key), the ASC and ASCQ codes (ASC/Q), the number of hard and soft errors (HC/SC)
223. me The utility prompts with error messages if you attempt to perform an illegal patch entry Following is an example of the use of the patch entry option 1 Obtain the appropriate patch data for your controller s firmware version from your Digital Equipment Corporation representative 2 Connect a maintenance terminal to the controller See Figure 3 2 Service Manual HSJ50 Array Controllers Installing HSJ50 Array Controllers 3 7 Figure 3 2 Connecting a maintenance terminal to the controller Local connection port BC16E XX p To terminal CXO 5322A MC Invoke the CLCP utility CLI gt RUN CLCP The CLCP main menu is displayed Select an option from the following list Code Load amp Code Patch local program Main Menu Exit Enter Code PATCH local program 0 1 Enter Code LOAD local program 2a Enter option number 0 2 This controller module does not support code load functionality Exiting CLCP CLI gt Enter option 2 for the code patch menu You have selected the Code Patch program This program is used to manage firmware code patches Select an option from the following list Type Y or C then RETURN at any time to abort Code Patch Service Manual 3 8 Service Manual Installing Code Patch Main Menu 0 Exit Enter a Patch Delete Patches List Patches WN FR Enter option number 0 3 Press 1 to select the Enter a Patch program Thi
224. me remote connection id and connection state fields are undefined 4065020A Received SCS CONNECT_RSP with an unrecognized status Connection is broken by Host Interconnect Services 4066020A Received SCS REJECT_REQ with an invalid reason 4067020A Received SCS APPL_MSG with no receive credit available 41010064 SCSI Host Interconnect Services has detected that the other controller identified in the Failed Controller Target Number and Other Controller Board Serial Number sense data fields has failed and that the controller reporting the event has assumed control of the units identified in the Affected LUNs sense data field 41020064 SCSI Host Interconnect Services has detected that the other controller identified in the Failed Controller Target Number and Other Controller Board Serial Number sense data fields is again operational and that the controller reporting the event is willing to relinquish control of the units identified in the affected LUNs sense data field 82012002 An unrecoverable error was detected during execution of the Device port Subsystem Built In Self Test One or more of the device ports on the controller module has failed some all of the attached storage is no longer accessible via this controller 82042002 A spurious interrupt was detected during the 20 execution of a Subsystem Built In Self Test 82052002 An unrecoverable error was detected during 20 execution of the HOST PORT Subsystem Test The syste
225. mple in the previous section and a DECevent log which translates the same event Service Manual Troubleshooting When using DECevent to translate event information you should not need to refer as frequently to the information and tables included in the remaining sections of this chapter However familiarity with the ERF error log format and other elements of the event logs will help you understand the wide variety of events that might be reported regardless of the translation utility you are using KKK KKK KK KK KK KK KR KKK KKK KKK KKK RNTRY Logging OS 3y OS version Event sequence number Ds Timestamp of occurrence System uptime in seconds 101 VMS error mask x00000000 VMS flags x0001 Host name AXP HW model System type register x00000003 Unique CPU ID x00000002 mpnum x000000FF mperr x000000FF Event validity Sale Event severity I Entry type 100 Major Event class oy AXP Device Type 0 IO Minor Class Li IO Minor Sub Class Des Device Profile Vendor Product Name Unit Name Unit Number 450 Device Class x0001 IO SW Profile VMS DCS_CLASS I VMS DT _TYPE 141 gt MSCP Logged Msg Service Manual 1122 KKK KK KK KKK KK KK KK KKK KK KK KKK OpenVMS AXP X6 1 FT7 28 APR 1994 11 39 40 Dynamic Device Recognition present MTX2 DEC 7000 Model 610 DEC 7000 Unknown validity code Unknown severity code IO Subsystem MSCP Logged Message HSX00 MSCP basic disk MATSDUA D
226. n then pressing the eject button next to the program card. Press and hold the Reset button while reinserting the program card. Release the Reset button. The controller will initialize and perform all internal self tests. See Figure 3-12. When the reset LED flashes at a rate of once every second, the initialization process is complete. If the controller fails to initialize, an OCP code will be displayed. Figure 3-12 Inserting the program card (callouts: orientation dot; PCMCIA card; cover; CXO 5331A MC) 20. Snap the ESD cover into place over the program card. Push inward to lock the cover into place. Caution: Ensure the version level of the software in the program card is the same as that in the existing controller. A mismatch in the firmware levels may cause unpredictable controller operation. 21. Connect the CI cable to the controller. 22. From the maintenance terminal that is still connected to the first controller, place the two controllers in dual-redundant mode by entering the following command: CLI> SET FAILOVER COPY=THIS_CONTROLLER The two controllers are now in a dual-redundant configuration. 23. Check the front panel ECB status indicator for the appropriate indication. See Table 3-7. Table 3-7 ECB status indicators: LED Status; Battery Status. LED is on continuously: System
227. n n y Set the percentage of requests that will be read requests during the random I/O phase. The remaining requests will be write requests: Enter read percentage for Random IO and Data Intensive phase (0:100) [67]? 80 If you set the test to allow write operations, specify the data pattern to be used for the writes. Unless you have some specific requirement, select 0 to use all patterns. See Table 1-3 for a listing of available patterns: Enter data pattern number 0=ALL, 19=USER_DEFINED, (0:19) [0]? 0 If you set the test to allow write operations and you want to test data integrity, enable or disable the initial write pass. The initial write pass writes the selected data patterns to the entire specified data space. This allows the data to be verified later, but it may take a long time to complete the write operation: Perform initial write (y/n) [n]? y During the data intensive phase, DILX only executes access and erase commands. Set the percentage of commands that will be access commands; the remaining commands will be erase commands. The erase percentage will be set automatically: Enter access percentage for Seek Intensive phase (0:100) [90]? 15. If you enabled the initial write pass and want to test data integrity, set the percentage of read and write commands that will have a data compare operation performed: Perform data compare (y/n) [n]? y En
228. nd cache module 00 0 0 cece eeseeeeeeeereeseeeeeeeeeeeeeeees 3 57 Figure 3 26 Removing the program Card 0 cc eeeeesseceseceseeceeeeeeseesaeeesaeesaeeesaeesseecseeesaeeeaee 3 58 Figure 3 27 Installing a power supply SBB 00 eee eeeeeessccneeceseeeseeceeeeeeeesaeenaeeesesesaeeaee 3 63 Figure 3 28 Typical 3 5 inch and 5 25 inch disk drive or optical disk SBBS eee 3 64 Figure 3 29 Typical 5 25 inch CD ROM SBB uu eee eeceeneeeeceseecneeeesaeeeeeeesaeesaeeesaeenaes 3 65 Figure 3 30 Typical 3 5 inch tape drive SBB 0 0 eee eeeceseecneeceseeeseeceeesaeessseeseeseseesee 3 66 Figure 4 1 Moving a storageset from one subsystem to another cs eeseeeeseeceeeeteeeeeeeee 4 3 Figure 4 2 Maintaining symmetry in your subsystem makes it easier to keep track of your storagesets and their members ce eeeeceeecesseecscecseecscecsseeseecsseeeeecesaeeesaeenseeesaeers 4 6 Figure 5 1 Connecting a maintenance terminal to the controller eeeeeeeeeeeeeeeeeeeeeeeee 5 3 Figure 5 2 Connecting a maintenance terminal to the controller ee eee eeeeeseeeeeeeeeeeeeeeeee 5 6 Figure 5 3 Disconnecting the CI cable connector 0 eee eeeeeeeeceseeeeeeeeeeeeeeeeesaeenseecseesneers 5 7 Figure 5 4 Removing the ECB cables eseescecsseesseeeeseeceeecescecesaeeaeeesaeecsaeesaeesseeeseers 5 8 Figure 5 5 Removing the controller and cache module eee eee eeseeeeesneeeneeeeseeseersneesaeers 5 9 Figure 5 6 Removing a 3 5 inch disk drive eeees
229. ndex value The event log format found in the already built eip is not supported by the Fault Manager The bad format was discovered while trying to copy the eip information into a datagram HTB Last Failure Parameter 0 contains the format code value Last Failure Parameter 1 contains the instance code value The caller of FM CANCEL_EVENT_NOTIFICATION passed an address of an event notification routine which does not match the address of any routines for which event notification is enabled Service Manual A 66 Last Fail Code 040A0100 040B0100 040D0100 040E0100 040F0102 04100101 04110101 04120101 Service Manual Appendix A Explanation The caller of FM CANCEL_SCSI_DE_NOTIFICATION passed an address of a deferred error notification routine which doesn t match the address of any routines for which deferred error notification is enabled An error which is not related to an O request htb_ptr has an unsupported template type FM ENABLE EVENT_NOTIFICATION was called to enable eip notification but the specified routine was already enabled to receive eip notification FM ENABLE_DE_NOTIFICATION was called to enable deferred error notification but the specified routine was already enabled to receive deferred error notification The eip gt generic mscp1 flgs field of the EIP passed to FM REPORT_EVENT contains an invalid flag Last Failure Parameter 0 contains the instance code value L
230. ne 1 DRAB1 failed cache diagnostics testing performed on Cache B other cache during a cache failover attempt The memory address field contains the starting physical address of the CACHEB1 memory The CACHE Dynamic Ram controller and ArBitration engine 0 and 1 DRABO and DRAB1 failed cache diagnostics testing performed on Cache B other cache during a cache failover attempt The memory address field contains the starting physical address of the CACHEB1 memory The A Write Append Position Error occurred during a tape write but no recovery was attempted because the attempted transfer did not meet the parameters for a recoverable Write Append Position Error When attempting to recover a Write Append Position Error on a tape unit the recovery failed to start because resources required for the recovery were not available When attempting to recover a Write Append Position Error on a tape unit an error occurred during the recovery Service Manual A 22 Appendix A aa 024B2401 The Write back caching has been disabled either due to a cache or battery related problem The exact nature of the problem is reported by other instance codes Note that in this instance the memory address byte count DRAB register and Diagnostic register fields are undefined 024F2401 This cache module is populated with SIMMs incorrectly Cache metadata resident in the cache module indicates that unflushed write cache data exists for a cache size
231. net 8. Connect the new CI cable to the controller. 9. Install new tie wraps as necessary to hold the new CI cable in place. 10. Connect the other end of the internal CI cable to the external CI cable. 11. Reconnect the external CI cable at the star coupler. Replacing SCSI device port cables: Servicing SCSI device port cables will require some downtime because you must remove devices to access SCSI connectors in the controller shelf and the device shelf. Required tools: The tools listed in Table 2-12 are required for replacing SCSI device cables. Table 2-12 Required tools for replacing SCSI device cables: Small flat-head screwdriver, to loosen captive screws; 5/32-inch Allen wrench, to unlock the SW800-series cabinet. Replacing the device cables: 1. Halt all I/O activity to the controller using the appropriate procedure for your operating system. 2. Dismount all units using the appropriate procedures for your operating system. 3. Remove each controller and cache module using the Replacing Cache Module procedures. 4. Use a flat-head screwdriver to loosen the two captive screws on each side of the volume shield and remove the shield. See Figure 2-32. Figure 2-32 Removing the volume shield
232. ng the subsystem 1. Restart controller A: HSJ50> RESTART OTHER_CONTROLLER Connect the maintenance terminal to controller A. See Figure 2-20. Press and hold the Reset button on controller A while inserting the program card. Release the Reset button to initialize the controller. Wait for the CLI prompt to appear at the terminal. You will see a Controllers misconfigured message, which you can ignore. Enter the following command: HSJ50> SET NOFAILOVER Enter the following command from the controller A CLI to put the controllers into dual-redundant mode: HSJ50> SET FAILOVER COPY=OTHER_CONTROLLER Controller A will restart. Tighten the front bezel captive screws on the controller A cache module. Do not overtighten. Servicing the second cache module: 1. Loosen the captive screws of the controller B CI connector and the front bezel of controller B and cache module B. At controller A, shut down controller B: HSJ50> SHUTDOWN OTHER_CONTROLLER When the controller halts, the green Reset LED stops flashing and stays lit. Take the operating controller out of dual-redundant (failover) mode: HSJ50> SET NOFAILOVER Start the C_SWAP program: HSJ50> RUN C_SWAP Removing the SBB battery module: 1. When the controller prompts you, answer the question: Do you wish to remove the other HSJ50 (y/n) [n]?
233. nt firmware Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined 0328450A The disk device reported standard SCSI Sense Data 03324002 SCSI bus selection time out f4o 03204002 Synchronous negotiation error 03344002 Target assertion of REQ after WAIT DISCONNECT 03354002 During device initialization a Test Unit Ready 40 command or a Read Capacity command to the drive failed HSJ50 Array Controller Service Manual A 26 Appendix A Instance Code Explanation 03364002 During device initialization the device 40 reported a deferred error 03374002 During device initialization the maximum 40 number of errors for a data transfer operation was exceeded 03384002 Request Sense command to the device failed 033B4002 Unexpected bus phase 033C4002 The device unexpectedly disconnected from the SCSI bus 033D4002 Unexpected message 033F0101 No command control structures available for pass through device operation 03402002 Device port SCSI chip reported gross error 03410101 Miscellaneous SCSI Port Driver coding error for 03420101 A pass through device related internal error 01 code was reported that is not recognized by the Fault Management firmware of the HSZ controller 03434002 During device initialization the device 40 reported unexpected standard SCSI Sense Data 03644002 An unrecoverable tape drive error was 01 encountered while perf
234. ntifies the following: the error or condition; the component reporting the condition; the recommended repair action. 3. Keep in mind that the 32-bit instance code appears in LONGWORD 1 of CONTROLLER DEPENDENT INFORMATION, with the following exceptions: When MSLG$B_FORMAT reads 09 BAD BLOCK REPLACEMENT ATTEMPT, the instance code does not appear because the Error Reporting Formatter (ERF) does not provide CONTROLLER DEPENDENT INFORMATION. When MSLG$B_FORMAT reads 0A MEDIA LOADER LOG, the instance code appears in LONGWORD 2. When MSLG$B_FORMAT reads 00 CONTROLLER LOG, the instance code appears in part of both LONGWORD 1 and LONGWORD 2. For the 00 CONTROLLER LOG MSLG$B_FORMAT, the code is skewed and not directly readable as a longword. The code's low-order bytes appear in the two high-order bytes of LONGWORD 1, and the code's high-order bytes appear in the two low-order bytes of LONGWORD 2. For example: CONTROLLER DEPENDENT INFORMATION: LONGWORD 1. 030A0000, LONGWORD 2. 24010102. 4. The following VMS example shows an ERF-translated host error log (a Disk Transfer Event log). Locate MSLG$B_FORMAT and CONTROLLER DEPENDENT INFORMATION in the example. MS SYSTEM ERROR REPORT COMPILED 9-AUG-1994 13:41:37 PAGE 758 ******** ENTRY 1122 ******** ERROR SEQUENCE 5. LOGGED ON: CPU_TYPE 00000002
235. ntroller will not establish a VMS disk or tape class connection. Change the SCS_NODENAME and restart the controller. Foreign disk drive is not usable. Symptoms: • An RZ disk drive moved from another environment to the StorageWorks subsystem is not usable from the host. • You want to keep the data that is on the foreign disk drive. Likely cause: The disk does not contain controller metadata and is not set as TRANSPORTABLE. The controller can only present foreign disks when they are set as TRANSPORTABLE. Solution: 1. From the controller CLI, set the disk drive as transportable: HSJ50> SET disk-name TRANSPORTABLE where disk-name is the name that you gave the foreign drive with the ADD DISK command. 2. Look for the disk as a DKA device in VMS. Interpreting controller LED codes: The operator control panel (OCP) on each HSJ50 controller contains a green reset LED and six device bus LEDs. These LEDs light in patterns to display codes when there is a problem with a device configuration, a device, or a controller. • During normal operation, the green reset LED on each controller flashes once per second and the device bus LEDs are not lit. • The amber LED for a device bus lights continuously when the installed devices do not match the controller configuration or when a device fault occurs. • The green reset LED lights continuously and the amber LEDs display a c
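The skew described for the 00 CONTROLLER LOG format can be undone with two shifts and a mask. The following is a minimal sketch based only on the rule stated in this excerpt (low-order half of the code in the high-order half of LONGWORD 1, high-order half of the code in the low-order half of LONGWORD 2); the function name is hypothetical and is not part of ERF, DECevent, or FMU.

```python
def instance_code_from_controller_log(longword1: int, longword2: int) -> int:
    """Reassemble a 32-bit instance code from a skewed CONTROLLER LOG entry."""
    # The two high-order bytes of LONGWORD 1 hold the code's low-order bytes.
    low_half = (longword1 >> 16) & 0xFFFF
    # The two low-order bytes of LONGWORD 2 hold the code's high-order bytes.
    high_half = longword2 & 0xFFFF
    return (high_half << 16) | low_half


if __name__ == "__main__":
    # Values from the example in the text above.
    code = instance_code_from_controller_log(0x030A0000, 0x24010102)
    print(f"{code:08X}")  # prints 0102030A
```

Applied to the example longwords 030A0000 and 24010102, the rule yields 0102030A, which can then be looked up in the instance code tables.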
236. nvalid status 027D0100 Unable to allocate memory for a Failover Control Block 027E0100 Unable to allocate memory for a Failover 0l Control Block 027F0100 Unable to allocate memory for a Failover 01 Control Block 02800100 Unable to allocate memory for a Failover 01 Control Block 02820100 Unable to allocate memory for the Dirty 0l Count Array 02830100 Unable to allocate memory for the Cache 0l Buffer Index Array 02840100 Unable to allocate memory for the XNode 01 Array 02850100 Cache was declared bad by the cache 0l diagnostics after first Meg was tested Can t recover and use local memory because cannot get those initial buffers back 02860100 Unable to allocate memory for the Fault 01 Management Event Information Packet used by the Cache Manager in generating error logs to the host 02880100 Invalid FOC Message in cmfoc_snd_cmd for 02890100 Invalid FOC Message in cmfoc_rcv_cmd 01l i A 1 028A0100 Invalid return status from DIAG 01 CACHE_MEMORY_TEST 028B0100 Invalid return status from DIAG 01 CACHE_MEMORY_TEST s gi w 01 028C0100 Invalid error status given to cache_fail for HSJ50 Array Controller Service Manual A 52 Appendix A Last Fail Code Explanation 028E0100 Invalid DCA state detected in init_crashover 028F0100 Invalid status returned from CACHE 01 CHECK_METADATA 02900100 Unable to allocate memory for the First Cache 0l Buffer Index Array 02910100 Invalid metadata combination dete
237. (Figure 2-32 callout: VOLUME SHIELD; CXO 5175A MC) 5. Remove the failed cable from the controller shelf backplane by pinching the cable connector side clips and disconnecting the cable. Caution: Digital recommends that you label all devices before you remove them from the device shelf. 6. Before removing the disk drives from the shelf, let the drives spin down for at least 30 seconds. Gyroscopic motion from a spinning disk may cause you to drop and damage the disk. 7. Remove any SBBs necessary to gain access to the SCSI cable. See Figure 2-33. Figure 2-33 Access to the SCSI cables (callouts: 8-bit shelf SCSI cable; remove SBBs for access; bus connector JA1; bus connector JB1; remove device cable; CXO 5176A MC) 8. Remove the failed cable from the device shelf backplane by pinching the cable connector side clips and disconnecting the cable. See Figure 2-33. 9. To install the new SCSI device port cable at the device shelf, gently slide the cable connector in from one side to the other and rock the connector to help seat it. Listen for the connector to snap into place. Take care not to bend any connector pins. 10. To install the cable at the controller shelf, gently slide the cable connector on fro
238. o the Failedset The RAID set associated with the logical unit has transitioned from Reconstructing state to Normal state Note that in this instance information supplied in the Device Locator Device Firmware Revision Level Device Product ID and Device Type fields is for the first device in the RAID set The RAIDset associated with the logical unit has gone inoperative Note that in this instance information supplied in the Device Locator Device Firmware Revision Level Device Product ID and Device Type fields is for the first device in the RAIDset The RAIDset associated with the logical unit has transitioned from Normal state to Reconstructing state Note that in this instance information supplied in the Device Locator Device Firmware Revision Level Device Product ID and Device Type fields is for the first device in the RAIDset Mirroring support is enabled but not licensed on this controller Any use of this feature requires licensing Continued use does not comply with the terms and conditions of licensing for this product The device specified in the Device Locator field has been added to the mirrorset associated with the logical unit The new mirrorset member is now in the Copying state The device specified in the Device Locator field has been removed from the mirrorset associated with the logical unit The removed device is now in the Failedset Service Manual A 20 Instance Code 022A0064 022B0064 0
239. o write a byte to CACHEAO memory The CACHEAO DRAB detected a Nonexistent Memory Error condition during a host port attempt to read CACHEAO memory The CACHEAO DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write CACHEAO memory HSJ50 Array Controller Appendix A Instance Code 016A2D02 016B2D02 016C2D02 016D2D02 016E2D02 016F2D02 01702D02 01712D02 01722D02 01732D02 01742D02 01752D02 01762D02 HSJ50 Array Controller Explanation The CACHEAO DRAB detected a Nonexistent Memory Error condition during a Device port attempt to write a byte to CACHEAO memory The CACHEAO DRAB detected a Nonexistent Memory Error condition during a Device port attempt to read CACHEAO memory The CACHEAO DRAB detected a Nonexistent Memory Error condition during an 1960 attempt to write CACHEAO memory The CACHEAO DRAB detected a Nonexistent Memory Error condition during an 1960 attempt to write a byte to CACHEAO memory The CACHEAO DRAB detected a Nonexistent Memory Error condition during an 1960 attempt to read CACHEAO memory The CACHEA1 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write CACHEA1 memory The CACHEA1 DRAB detected a Nonexistent Memory Error condition during an FX attempt to write a byte to CACHEA1 memory The CACHEA1 DRAB detected a Nonexistent Memory Error condition during an FX attempt to read CACHEA1 memory The CACHEA1
240. oblem still persists then replace the controller backplane The Environmental Monitor Unit EMU has detected an elevated temperature condition Check the shelf and its components for the cause of the fault The Environmental Monitor Unit EMU has detected an external air sense fault Check components outside of the shelf for the cause of the fault An environmental fault previously detected by the EMU is now fixed The EIP is used to notify that the repair was a EES Replace the controller module the controller module Eeo eee the indicated cache module or the appropriate memory Eeo eee located on the indicated cache module Replace the indicated write cache battery Caution BATTERY REPLACEMENT MAY CAUSE INJURY HSJ50 Array Controller Appendix A A 93 Repair Action Action to take Code Check for the following invalid write cache configurations If it is the wrong write cache module replace with the matching module or clear the invalid cache error via the CLI See the CLI Reference Manual for details If the write cache module is missing re seat cache if it is actually present or add the missing cache module or clear the invalid cache error via the CLI See the CLI Reference Manual for details If this is a dual redundant configuration and one of the write cache modules is missing match write cache boards with both controllers If this is a dual redundant configuration and both caches are not of the same typ
241. ociated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined A failure occurred while attempting a SCSI Test Unit Ready or Read Capacity command to a device The device type is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined Service Manual A 32 Instance Code 03D24402 03D3450A 03D4450A 03D5450A 03D6450A 03D7450A Service Manual Appendix A Explanation SCSI bus errors during device operation The device type is unknown to the controller Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined During device initialization the device reported the SCSI Sense Key NO SENSE This indicates that there is no specific sense key information to be reported for the designated logical unit This would be the case for a successful command or a command that received CHECK CONDITION or COMMAND TERMINATED status because one of the FM EOM or ILI bits is set to one in the sense data flags field During device initialization the device reported the SCSI Sense Key RECOVERED ERROR This indicates the last command completed successfully with some recovery action performed by the target During device initialization the device reported the SCSI Sense Key NOT READY This indicates that the logical
242. ode when a controller problem occurs. Solid LED codes indicate a fault detected by internal diagnostic and initialization routines. Flashing LED codes indicate a fault that occurred during core diagnostics. Look up the LED code that is showing on your controller in Table 1-1 or Table 1-2 to determine its meaning and find the corrective action. The tables use three symbols, which stand for LED on, LED off, and LED flashing.
Table 1-1 Solid controller LED codes
  Description of Error / Corrective Action
  DAEMON hard error / Replace controller module
  Repeated firmware bugcheck / Replace controller module
  NVMEM version mismatch / Replace program card with later version of firmware
  NVMEM write error / Replace controller module
  NVMEM read error / Replace controller module
  NMI error within firmware bugcheck / Reset the controller
  Inconsistent NVMEM structures repaired / Reset the controller
  Bugcheck with no restart / Reset the controller
  Firmware-induced restart following bugcheck failed to occur / Replace controller module
  Hardware-induced restart following bugcheck failed to occur / Replace controller module
  Bugcheck within bugcheck controller / Reset controller module
  NVMEM version is too low / Verify the card is the latest revision. If the problem still exists, replace the module
  Program car
243. odules out of the shelf and place them on an ESD mat. See Figure 3-23.
Figure 3-23 Removing controller and cache modules
13. Remove the cache modules from the controller shelf.
14. Refer to Table 3-11 and install as many SIMM cards into each cache module as are required, up to 4 x 32 MB maximum.
Table 3-11 Adding cache memory capacity
  SIMM slots occupied: SIMM 1; SIMM 1, 2; SIMM 1, 2, 3, 4
Figure 3-24 shows all possible cache configurations.
Figure 3-24 Cache configurations for cache version 3 (32 MB configuration, 64 MB configuration, 128 MB configuration)
15. Reinstall the cache modules into the controller shelf.
16. Reinstall the controller modules into their original slots. See Figure 3-25. Use a gentle rocking motion to help seat the module. If you are using a single controller configuration, use the slot that is designated SCSI ID 7.
Figure 3-25 Installing the controller
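The SIMM populations in Table 3-11 and the configuration names in Figure 3-24 can be cross-checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes every SIMM is 32 MB (from the "4 x 32 MB maximum" wording) and that only the one-, two-, and four-SIMM populations listed in the table are supported.

```python
# Illustrative sketch, not part of the manual.
# Assumption: each SIMM is 32 MB and only the populations listed in
# Table 3-11 (1, 2, or 4 SIMMs) are valid cache configurations.
SIMM_SIZE_MB = 32
VALID_SIMM_COUNTS = (1, 2, 4)


def cache_capacity_mb(simm_count: int) -> int:
    """Return the cache size in MB for a given number of installed SIMMs."""
    if simm_count not in VALID_SIMM_COUNTS:
        raise ValueError(f"unsupported SIMM population: {simm_count}")
    return simm_count * SIMM_SIZE_MB


if __name__ == "__main__":
    for count in VALID_SIMM_COUNTS:
        print(f"{count} SIMM(s): {cache_capacity_mb(count)} MB configuration")
```

Rejecting a three-SIMM population reflects the table as excerpted here; the controller documentation remains the authority on which populations are actually supported.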
244. oller
3. At the CLI prompt, enter:
   HSJ50> SHUTDOWN OTHER_CONTROLLER
   HSJ50> SHUTDOWN THIS_CONTROLLER
4. Remove the power cords from the device shelf that contains the failed solid state disk drive, optical drive, or CD-ROM drive.
5. Press the two mounting tabs together to remove the storage building block device from the shelf. See Figure 2-30.
Figure 2-30 Removing a CD-ROM drive
6. Align the replacement device with the shelf rails.
7. Push the device all the way into the shelf until the locking tabs snap into place.
8. Observe the status LED for the following indications: The device fault (amber) LED is off.
Replacing internal CI cables
Servicing the internal CI cables will cause some system downtime because the host path will
245. ollers
To restart the other HSJ50:
1. Enter the command RESTART OTHER_CONTROLLER.
2. Press and hold in the Reset button while inserting the program card.
3. Release Reset; the controller will initialize.
4. Configure the new controller by referring to the controller's user guide.
Restarting the subsystem
1. Restart the new controller:
   CLI> RESTART OTHER_CONTROLLER
Connect the maintenance terminal to the newly installed controller. Press and hold the Reset button on the new controller while inserting the program card from the new controller. Release the Reset button to initialize the controller. Wait for the CLI prompt to appear at the terminal. You will see a "Controllers misconfigured" message, which you can ignore. If the new controller reports an invalid cache error, enter the following CLI command to clear the error:
   CLI> CLEAR_ERRORS INVALID_CACHE THIS_CONTROLLER DESTROY_UNFLUSHED_DATA
Ensure that the new controller is not in dual-redundant mode by entering the following command on the new controller:
   CLI> SET NOFAILOVER
If the controller was already in nonredundant mode, you will see a message indicating that. Enter the following command from the new controller CLI to put the controllers into dual-redundant mode:
   CLI> SET FAILOVER COPY=OTHER_CONTROLLER
The new controller will initialize. Verify that all host settings are correct. Service
246. on Guide StorageWorks Configuration Manager for DEC AA QC39A TE OSF 1 System Manager s Guide for HSZterm StorageWorks Solutions Configuration Guide EK BA350 CG StorageWorks Solutions Shelf and SBB User s EK BA350 UG Guide StorageWorks Solutions SW300 Series RAID EK SW300 UG Enclosure Installation and User s Guide StorageWorks SW500 Series Cabinet Installation EK SW500 UG and User s Guide StorageWorks SW800 Series Data Center Cabinet EK SW800 UG Installation and User s Guide The RAIDBOOK A Source for RAID RAID Advisory Technology Board Polycenter Console Manager User s Guide Computer Associates VAXcluster Systems Guidelines for VAXcluster System Configurations 16 Bit SBB User s Guide Service Manual HSJ50 Array Controller 1 Troubleshooting Fault isolation guide Interpreting log messages controller LED codes Interpreting host event Reading a DECevent error log Using FMU to describe event log codes Testing disk drives Testing tapes Monitoring subsystem performance HSJ50 Array Controller Service Manual 1 2 Troubleshooting Introduction This chapter is designed to help you diagnose and correct any problems you might encounter while working with StorageWorks HSJ controllers Service Manual HSJ50 Array Controller Troubleshooting 1 3 Fault isolation guide This chapter is structured to help you match the symptoms that you are seeing console message unit not available and so on with the source of t
247. ond row to determine the low-order digit of the node number. For CI controllers, the number of nodes displayed is determined by the controller's MAX NODE parameter. The maximum supported value for this parameter is 32. For DSSI controllers, the number of nodes is fixed at 8. Each location in the grid contains a character to indicate the path status:
  A indicates only CI path A is functioning properly. In this example, node 12 demonstrates this. This value is not displayed for DSSI-based controllers.
  B indicates only CI path B is functioning properly. In this example, node 14 demonstrates this. This value is not displayed for DSSI-based controllers.
  X indicates the CI cables are crossed. In this example, node 27 demonstrates this. This value is not displayed for DSSI-based controllers.
  A circumflex (^) indicates the single DSSI path or both CI paths are functioning properly. In this example, nodes 8, 9, and 15 demonstrate this.
  If a period (.) is in a position corresponding to a node, that node does not have any virtual circuits or connections to this controller, so either the path status cannot be determined or neither path is functioning properly.
  A space indicates the address is beyond the visible node range for this controller.
Device SCSI Status
[Screen display: grid of device ports (rows 1 through 6) against targets (columns 0 through 7)]
Description Thi
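The path status characters described above amount to a small lookup table. The following Python sketch restates them; the dictionary and function names are hypothetical and exist only for illustration, while the meanings are taken from the text.

```python
# Illustrative sketch: the names below are hypothetical, the meanings are
# paraphrased from the VTDPY path status description.
PATH_STATUS = {
    "A": "only CI path A is functioning properly",
    "B": "only CI path B is functioning properly",
    "X": "CI cables are crossed",
    "^": "single DSSI path or both CI paths are functioning properly",
    ".": "no virtual circuits or connections; path status unknown or neither path working",
    " ": "address is beyond the visible node range",
}


def describe_path_status(char: str) -> str:
    """Translate one character from the VTDPY path status grid."""
    return PATH_STATUS.get(char, "unrecognized status character")


if __name__ == "__main__":
    for symbol in "AX^.":
        print(repr(symbol), "->", describe_path_status(symbol))
```

Note that A, B, and X apply only to CI controllers; the text states that these values are not displayed for DSSI-based controllers.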
248. or more bits in the Replace controller diagnostic registers did module not match the expected reset value OO BHO Memory error in the Replace controller nonvolatile journal module SRAM OO0O 8 amp amp amp Wrong image seen on Replace program card program card At least one register in Replace controller the controller DRAB module does not read as written O XOOO Main memory is Replace controller fragmented into too module many sections for the number of entries in the good memory list O x OOF O The controller DRAB or Replace controller DRAC chip does not module arbitrate correctly Service Manual HSJ50 Array Controller Troubleshooting HSJ50 Array Controller Description of Error Corrective Action 0O OF O00 O X OKO OOOO O w ta ba ote ae Tae ye Re a he he ze Re Re The controller DRAB or DRAC chop failed to detect forced parity or detected parity when not forced The controller DRAB or DRAC chip failed to verify the EDC correctly The controller DRAB or DRAC chip failed to report forced ECC The controller DRAB or DRAC chip failed some operation in the reporting validating and testing of the multibit ECC memory error The controller DRAB or DRAC chip failed some operation in the reporting validating and testing of the multiple single bit ECC memory error The controller main memory did not write correctly in one or more sized memory transfers The cont
249. orming work related to tape unit operations 03674002 A Drive failed because a Test Unit Ready 40 command or a Read Capacity command failed 0368000A Drive was failed by a Mode Select command received from the host 03694002 Drive failed due to a deferred error reported by drive 036A4002 Unrecovered Read or Write error 036B4002 No response from one or more drives 3 036C430A Nonvolatile memory and drive metadata indicate conflicting drive configurations 036D430A The Synchronous Transfer Value differs 4 between drives in the same storageset Service Manual HSJ50 Array Controller 033E4002 Message Reject received on a valid message lao 40 40 40 40 40 40 40 01 20 01 Appendix A A 27 Instance Code Explanation 036E4002 Maximum number of errors for this data transfer operation exceeded 036F4002 Drive reported recovered error without transferring all data 03784002 Unexpected bus phase 037B4002 Synchronous negotiation error 037C4002 The drive unexpectedly disconnected from the SCSI bus 03760101 Watchdog timer time out 037D4002 Unexpected message 037E4002 Unexpected Tag message 037F4002 Channel busy 03804002 Message Reject received on a valid message 0381450A The tape device reported Vendor Unique SCSI Sense Data 037A4002 Message not sent by drive 03820101 No command control structures available for 0 tape operation Note that in this instance the Associated Additional Sense Code and
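The instance codes in this table appear to carry their repair action in the third byte: for example, 03674002 is listed with repair action 40 and 03760101 with repair action 01. The Python sketch below splits a 32-bit instance code into four bytes on that basis. The field names are assumptions made for illustration; only the repair action byte can be cross-checked against this excerpt, so treat the other three labels as unverified.

```python
# Illustrative sketch. Assumption: an instance code is four one-byte fields.
# Only the repair-action byte is corroborated by the table excerpt; the other
# field names are hypothetical labels.
def decode_instance_code(code: int) -> dict:
    """Split a 32-bit instance code into its four bytes."""
    return {
        "component_id": (code >> 24) & 0xFF,
        "event_number": (code >> 16) & 0xFF,
        "repair_action": f"{(code >> 8) & 0xFF:02X}",
        "low_byte": code & 0xFF,
    }


if __name__ == "__main__":
    for raw in (0x03674002, 0x036C430A, 0x03760101):
        print(f"{raw:08X}", decode_instance_code(raw))
```

The FMU DESCRIBE function remains the authoritative way to interpret a code; a decoder like this is only a convenience when scanning a long error log.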
250. orted a fault. Last Failure Parameter 0 contains the PC value. Last Failure Parameter 1 contains the AC value. Last Failure Parameter 2 contains the fault type and subtype values. Last Failure Parameter 3 contains the address of the faulting instruction.
01070100: Timer chip setup failed.
Last Fail Code: 01082004 01090105 010A2080 010B2380 010C2380
Explanation: The core diagnostics reported a fault. Last Failure Parameter 0 contains the error code value (same as the blinking OCP LEDs error code). Last Failure Parameter 1 contains the address of the fault. Last Failure Parameter 2 contains the actual data value. Last Failure Parameter 3 contains the expected data value. An NMI occurred during EXEC BUGCHECK processing. Last Failure Parameter 0 contains the executive flags value. Last Failure Parameter 1 contains the RIP from the NMI stack. Last Failure Parameter 2 contains the read diagnostic register 0 value. Last Failure Parameter 3 contains the Master DRAB CSR value. Last Failure Parameter 4 contains the SIP last failure code value. A single-bit or a multi-single-bit ECC error was detected. To prevent data corruption, the controller was reset. If this event occurs frequently, the controller module should be replaced. A processor interrupt was generated by the CACHEA Dynamic Ra
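If the codes and explanations on this page pair up in order, the low four bits of a last failure code match the number of Last Failure Parameters described: 01082004 has four parameters (0 through 3), 01090105 has five (0 through 4), and 01070100 has none. The Python sketch below decodes a code under that assumption. The field names are hypothetical, and the meaning of the upper four bits of the low byte is not stated in this excerpt, so they are returned uninterpreted.

```python
# Illustrative sketch. Assumptions: the third byte is the repair action and
# the low nibble of the last byte is the Last Failure Parameter count. The
# upper nibble is passed through without interpretation.
def decode_last_failure_code(code: int) -> dict:
    """Extract the repair action and parameter count from a last failure code."""
    low_byte = code & 0xFF
    return {
        "repair_action": f"{(code >> 8) & 0xFF:02X}",
        "parameter_count": low_byte & 0x0F,
        "upper_nibble": low_byte >> 4,
    }


if __name__ == "__main__":
    for raw in (0x01070100, 0x01082004, 0x01090105, 0x010A2080):
        print(f"{raw:08X}", decode_last_failure_code(raw))
```

A quick check of this kind can flag a log entry whose parameter list looks shorter than the code implies.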
251. orted via the DRAB_INT Follow repair action 34 HSJ50 Array Controller Appendix A HSJ50 Array Controller Repair Action Code 2A A 95 Action to take A Multiple Bit ECC error was detected by the CACHEBO or CACHEB1 DRAB Use the following register information to locate additional details The CACHEBn DRAB DER register bits 0 through 6 contain the syndrome value The CACHEBn DRAB EAR register combined with the Master DRAB RSR register bits 12 through 15 CACHEB memory region yields the affected memory address The CACHEBn DRAB EDR register contains the error data If the failure involved a Device port the Master DRAB CSR register bits 10 through 12 identify that Device port If Master DRAB DSR register bit 14 is set the failure was reported via the NMI If Master DRAB DSR register bit 14 is clear the failure was reported via the DRAB_INT Follow repair action 34 The Master DRAB detected an Ibus to Nbus Time out condition If Master DRAB DSR register bit 14 is set the failure was reported via the NMI If Master DRAB DSR register bit 14 is clear the failure was reported via the DRAB_INT If any of the following is true a firmware fault is indicated follow repair action 01 Master DRAB CSR register bits 10 through 12 contains the value 1 and WDR1 register bit 26 is clear Master DRAB CSR register bits 10 through 12 contains the value 2 and WDR1 register bit 27 is clear Master DRAB CSR regist
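The register checks in repair action 2A are plain bit-field extractions. The Python sketch below shows one way to pull out the fields named in the text: DER bits 0 through 6 (syndrome), Master DRAB CSR bits 10 through 12 (Device port), and Master DRAB DSR bit 14 (NMI versus DRAB_INT). The function names and the sample register values are made up for illustration; only the bit positions come from the text.

```python
# Illustrative sketch: helper names and sample values are hypothetical,
# bit positions are taken from the repair action text.
def bits(value: int, low: int, high: int) -> int:
    """Return bits low..high (inclusive) of value, right-aligned."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def summarize_ecc_error(der: int, csr: int, dsr: int) -> dict:
    """Collect the fields the repair action asks you to inspect."""
    return {
        "syndrome": bits(der, 0, 6),        # CACHEBn DRAB DER bits 0-6
        "device_port": bits(csr, 10, 12),   # Master DRAB CSR bits 10-12
        "reported_via": "NMI" if bits(dsr, 14, 14) else "DRAB_INT",
    }


if __name__ == "__main__":
    # Made-up register values, for demonstration only.
    print(summarize_ecc_error(der=0x0000002B, csr=0x00000C00, dsr=0x00004000))
```

The Device port field is meaningful only when the failure involved a Device port, as the text notes.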
252. ounting screws to reinstall the controller
  5/32-inch Allen wrench: To unlock the SW800 series cabinet
Caution: Before starting the C_SWAP utility, terminate all other running utilities and disable all other terminals.
Prepare the subsystem
1. Have the new module at hand. The module should be factory fresh or should have been shut down cleanly with the SHUTDOWN command in its last application.
2. Connect a maintenance terminal to the existing controller. See Figure 3-13.
Figure 3-13 Connecting a maintenance terminal to the controller
3. Show the configuration of the existing controller:
   CLI> SHOW THIS_CONTROLLER
   Controller:
     HSJ50 2634901786 Firmware V05.0-1, Hardware BX11
     Not configured for dual-redundancy
     SCSI address 7
     Time: 15-JUN-1995 16:32:54
   Host port:
     Node name: HSJA1, valid CI node 21, 32 max nodes
     Path A is on
     Path B is on
     MSCP allocation class 3
     TMSCP allocation class 3
     CI_ARBITRATION = SYNCHRONOUS
     MAXIMUM_HOSTS = 9
     NOCI_4K_PACKET_CAPABILITY
   Cache:
     128 megabyte write cache, version 3
     Cache is GOOD
     Battery is good
     No unflushed data in cache
     CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
     CACHE_POLICY = B
     NOCACHE_UPS
Note the
253. ower supplies 2 47 SCSI device port cables 2 58 solid state disk drives 2 54 storage devices 2 50 tape drives 2 53 write back cache battery cells 2 33 Replacing units FRUs C_swap 2 3 dual redundant configuration 2 22 precautions 2 2 single configuration 2 15 S SBB G 8 SBBs 3 64 installing SBBs 3 64 installing procedure 3 69 status indicators 3 66 SCS G 9 SCSI G 9 SCSI device port cables replacing 2 58 SIMM card installing 3 53 precaution 3 57 Single disk drive unit moving 4 8 HSJ50 Array Controller Index Software patching 3 3 SPD G 10 Storage device building blocks See SBBs Storage devices 2 50 Storage unit G 10 Storageset moving 4 3 T Tagged command queuing G 10 Tape drives installing new firmware on 3 15 removing 5 10 Target G 10 TILX basic tape test 1 34 drive read only test 1 38 TMSCP G 11 Trobleshooting LED codes 1 6 Troubleshooting DECevent error log 1 15 host event logs 1 12 introductiion 1 2 testing disks 1 22 testing tapes 1 33 using FMU 1 19 HSJ50 Array Controller Troublesooting fault isolation 1 3 U Unit G 11 Upgrading device firmware 3 15 V VCS G 11 Virtual terminal G 11 VTDPY command line 7 46 control keys 1 46 display fields 47 help 1 63 running VTDPY 1 45 W Warm swap G 11 controllers 2 3 Write hole G 11 Write back cache handling for ESD 2 2 replacing battery
254. p dstat register value. Last Failure Parameter 1 contains the 720 chip dcmd register value. Dssi_wait_isr routine found that 720 reported an unexpected status in initiator mode (causes too many stacked interrupts). Last Failure Parameter 0 contains the 720 chip dstat register value. Last Failure Parameter 1 contains the 720 chip dcmd register value. Host port found that the controller has exceeded the maximum number of user-specified host VCs. Last Failure Parameter 0 is a 32-bit MASK of OPEN VCs the controller sees to host nodes.
Table A-16 Disk and tape MSCP server last failure codes
  Last Fail Code / Explanation / Repair Action Code
  60030100 / Unable to find free DCD_CMDCORL_BLOCK / 01
  60050100 / Invalid return value from routine HIS CONNECT while DCD attempting to establish connection to a remote subsystem / 01
  60070100 / Invalid return value from routine HIS MAP while dmscp_dcd_allocate_bh attempting to map a buffer / 01
  60090100 / Invalid return value from routine HIS DISCONNECT while dmscp_dcd_comm_path_event attempting to disconnect a remote source connection / 01
  600C0100 / Invalid return value from routine RESMGR ALLOCATE_DATA_SEGMENT / 01
  600D0100 / Opcode field in command being aborted is not valid / 01
  60120100 / Opcode of TMSCP command to be aborted is invalid / 01
  60130100 / tmscp_clear_sex_cdl_cmpl_rtn detected an unexpected opco
255. p the transmit and receive cables for the indicated path Check the indicated path cables for proper installation For HSHJ3x 4x Check the CI adapter on the host system identified in the Remote Node Name field for proper operation For HSJ3x HS1CP Check the DSSI adapter on the host system identified in the Remote Node Name field for proper operation Excessive VC closures are occurring Perform repair action 61 on both sets of path cables If the problem persists perform repair action 63 Polling failed to complete in a timely manner Perform repair action 61 on all path cables HSJ50 Array Controller Consult the device s maintenance manual for guidance on replacing the indicated device FRU Appendix A A 101 Repair Action Action to take Code The number of hosts forming virtual circuits with the controller exceeds the current user specified maximum Increase the maximum number of hosts allowed value Perform repair action 61 If the problem persists perform repair action 20 The external cache battery cable might have been disconnected HSJ50 Array Controller Service Manual Glossary HSJ50 Array Controller Service Manual G 2 Service Manual Glossary adapter A device that converts the protocol and hardware interface of one bus type into that of another without changing the functionality of the bus allocation class A numerical value assigned to a controller to identify units across multiple in
256. p_initialize unable to get LOCAL 01 STATIC memory from exec for use as an AVAILABLE ITB 60570100 mscp_initialize unable to get LOCAL 01 STATIC memory from exec for use as an AVAILABLE state change ITB 60580100 mscp_initialize unable to get LOCAL 01 STATIC memory from exec for use as a state change ITB 605D0100 tmscp_onl_cleanup_rtn detected a failure in 01 enabling variable speed mode suppression 605E0100 tmscp_suc_cmpl_rtn detected a failure in 01 enabling variable speed mode suppression 605F0100 tmscp_suc_cmpl_rtn detected a failure in 0l enabling variable speed mode suppression HSJ50 Array Controller Service Manual A 84 Appendix A Last Fail Code Explanation 60610100 mscp_initialize unable to get BUFFER STATIC memory from exec for use as Write History Logs 60620100 mscp_initialize unable to get LOCAL STATIC memory from exec for use as Write History Log Allocation Failure Lists 60650100 Attempting to block incoming requests for the tape loader when it was unexpectedly found already blocked 60660100 Loader boundary block request to stall incoming requests to the tape loader unit was not setup as expected 60670100 The controller has insufficient memory available for allocating context blocks needed for Disk_Copy_Data commands 60680100 VA ENABLE_NOTIFICATION failed with insufficient resources at init time 606B0100 mscp_foc_receive_cmd detected that the message sent from the other controller had an illegal
257. places a heavy load on the controllers. To avoid the possibility that data may be lost, you should stop normal I/O operations before running DILX, or run DILX during periods of low activity. DILX can test several disks at the same time. Before starting DILX, you must configure the disks that you want to test as single-disk units. In other words, the disks cannot be part of any storageset, and they must have a unit number assigned. There are four tests that you can run with DILX: a quick disk test, an initial test on all disks, a basic function test, and an advanced user-defined test.
Running a quick disk test
This section provides instructions on how to run a quick DILX test on one or more disks. This is a 10-minute read-only test that uses the default DILX settings.
1. Start DILX from the CLI prompt:
   HSJ50> RUN DILX
2. Skip the auto-configure option so that you can specify which disk drives to test:
   Do you wish to perform an Auto configure (y/n)? n
3. Accept the default settings:
   Use all defaults and run in read only mode (y/n)? y
4. The system displays a list of all single-disk units (by unit number) that you can choose for DILX testing. Select the first disk that you want to test. Do not include the letter D in the unit number:
   Enter unit number to be tested? 350
5. DILX indicates whether it has been able to allocate the disk. If you want to test more disks, enter the unit numbers wh
258. plies
  Maintenance terminal: To shut down controllers, restart controllers, execute CLI commands, and invoke utilities
  5/32-inch Allen wrench: To unlock the SW800 Series cabinet
Removing the power supply
Use this procedure in a dual-redundant configuration if you have only one power supply in the device shelf and the controller shelf.
1. If you are performing a cold swap and the power supply is still operating, connect a maintenance terminal to one of the controllers.
2. At the CLI prompt, enter:
   HSJ50> SHUTDOWN OTHER_CONTROLLER
   HSJ50> SHUTDOWN THIS_CONTROLLER
3. Disconnect the power cords from the power supply.
4. Press the two mounting tabs together to release the power supply from the shelf and partially pull it out of the shelf. See Figure 2-26.
5. Use both hands to pull the power supply out of the shelf.
Figure 2-26 Removing the power supply
Installing the new power supply
1. Firmly push the power supply into the shelf until the mounting tabs snap into place.
2. Reconnect the power cord to the power supply.
3. Observe the power and shelf status indicators to make sure they are on. See Figure 2-27. If the status indicators are not on, refer to the installation section (tables 46 and 47) of this manual.
Figure 2-27 The power su
259. power is on and the ECB is fully charged.
  LED blinks rapidly: System power is on and the ECB is charging.
  LED blinks slowly: System power is off and the ECB is supplying power to the cache.
  LED is off: System power is off and the ECB is not supplying power to the cache.
Adding a second controller using C_SWAP
You can add a second controller to a single controller configuration to create a dual-redundant configuration. There are two procedures for adding a second controller: an online method using the C_SWAP utility, in which the existing controller continues to process I/O, and an offline method, in which you must shut down the existing controller (see previous procedure). To add a controller module using the online C_SWAP method, your system must have two power supplies in the controller shelf of an SW500 or SW800 cabinet. If you are adding a second controller in an SW300 cabinet, your cabinet must have a minimum of five power supplies. The following steps guide you through the online method using the C_SWAP utility.
Required tools
The tools listed in Table 3-8 are required for adding a second controller.
Table 3-8 Required tools for adding a second controller
  Maintenance terminal and cable: To change controller parameters
  ESD wrist strap and ESD mat: To protect all equipment against electrostatic discharge
  3/32-inch Allen wrench: To loosen the controller m
260. pply status indicator (shelf status LED and power supply status LED)
Asynchronous swap method
Use this swap method only if you have redundant power supplies and if one of the supplies is still operating.
1. Remove the failed power supply using steps 4 and 5 in the preceding procedure.
2. Follow the procedure for installing the new power supply from the preceding procedure.
Replacing storage devices
The asynchronous swap method may be used to replace disk drives. Use the warm swap method to replace tape drives. Solid state disks, optical drives, and CD-ROM drives can be replaced only using the cold swap method.
Disk drives
Asynchronous disk drive swap
Software Version 5.0 supports asynchronous disk drive replacement (device removal and device insertion) without first quiescing the device bus. You can remove or insert devices at any time, with the following restrictions:
  Do not remove or insert devices during failover.
  Do not remove or insert devices during failback.
  Do not remove or insert devices before the CLI prompt appears during controller initialization.
  Do not remove or insert devices while the controller is still recognizing a device removal (indicated by flashing LEDs).
  Do not remove or insert devices while the controller is running a local program such as DILX or HSUTIL.
  Wait 50 s
Required tools
261. pt received during initialization
  An illegal process was activated during initialization / Replace controller module
Interpreting host event log messages
This section explains how to interpret event messages sent to the host system by StorageWorks HSJ controllers. Some of these messages are information only, others report significant events, and others report subsystem and controller failures. You can use the Fault Management Utility (FMU) in the controller to get more information on any specific event. The FMU can provide a description of instance codes, last failure codes, and memory system failure codes. The FMU can also provide recommended repair actions for each code. Because the subsystem and controller might be out of service after an event that generates a last failure code, you might not be able to run FMU. Instance codes and last failure codes are listed in this chapter.
Finding the Instance Code in an Event Message
To work with and understand the event logs, use these steps and guidelines:
1. Note the MSLG B_FORMAT field in the upper portion of the log. Also note the CONTROLLER DEPENDENT INFORMATION in the lower portion of the log; this information varies according to the MSLG B_FORMAT field.
2. Locate a 32-bit instance code in the CONTROLLER DEPENDENT INFORMATION area. This is the key to interpreting error logs. The instance code uniquely ide
262. r HSAC Sense Data Error Code Segment Information Bytes CMD Specific Info Sense Key ASC amp ASCQ FRU Code Sense Key Specific Data Service Manual x00 x00 x00 x00 x00000000 x00000000 x04 x3F85 Troubleshooting 35349717 Test Unit Ready Buf Mode The target shall not report GOOD status on write commands until the data blocks are actually written on the medium UWEUO zero not defined MSBD zero not defined FBW zero not defined DSSD Sense Data fields were generated by the HSAC controller on behalf of the target devices because the Sense Data could not be obtained from that device Error Code no decoded Hardware Error ASC x003F ASCO x0085 Test Unit Ready or Read Capacity Command failed x00 x00000000 Sense Key Data NOT Valid Byte 1 x00000000 Byte 2 x00000000 Byte 3 x00000000 HSJ50 Array Controller Troubleshooting Using FMU to Describe Event Log Codes FMU has a DESCRIBE function that you can use to interpret event codes produced by the controller Use this function to understand events that have occurred in the subsystem instance codes and to find the recommended repair action repair action codes as well as to interpret other codes The types of codes that FMU can describe are HSJ50 Array Controller INSTANCE_CODE REPAIR_ACTION_CODE LAST_FAILURE_CODE ASC_ASCQ_CODE COMPONENT_CODE CONTROLLER_UNIQUE_ASC_ASCQ_ CODE DEVICE_TYPE_CODE EVENT_
263. r
1. Connect a maintenance terminal to the controller. See Figure 3-14.
Figure 3-14 Connecting a maintenance terminal to the controller
2. Halt all host I/O activity using the appropriate procedure for your operating system.
3. At the CLI prompt, enter:
   CLI> SHUTDOWN THIS_CONTROLLER
4. After the controller has shut down, remove the maintenance terminal cable and remove power cables from the controller shelf.
5. Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist.
6. Attach or clip the other end of the ESD strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface).
7. With a small flat-head screwdriver, loosen the captive screws on the CI cable of the controller and remove the cable. See Figure 3-15.
Figure 3-15 Disconnecting the CI cable adapter
8. Use a gentle rocking motion to loosen the controller modules.
9. Slide the controller modules out of the shelf, as shown in Figure 3-16, and place them on an ESD mat.
Figure 3-16 Removing controller modules
Installing a write-back cache module
1. Install the ECB SBB into a convenient device slot. See Figure 3-17. If you are rep
264. r during a Host port attempt to write CACHEA1 memory.
The CACHEA1 DRAB detected a Write Data Parity error during a Host port attempt to write a byte to CACHEA1 memory.
The CACHEA1 DRAB detected a Write Data Parity error during a Device port attempt to write CACHEA1 memory.
The CACHEA1 DRAB detected a Write Data Parity error during a Device port attempt to write a byte to CACHEA1 memory.
The CACHEA1 DRAB detected a Write Data Parity error during an i960 attempt to write CACHEA1 memory.
The CACHEA1 DRAB detected a Write Data Parity error during an i960 attempt to write a byte to CACHEA1 memory.
The CACHEB0 DRAB detected a Write Data Parity error during an FX attempt to write CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during an FX attempt to write a byte to CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during a Host port attempt to write CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during a Host port attempt to write a byte to CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during a Device port attempt to write CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during a Device port attempt to write a byte to CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during an i960 attempt to write CACHEB0 memory.
The CACHEB0 DRAB detected a Write Data Parity error during an i960 attempt to write a byte to CACH
265. r has been notified that the data has been written.
VCS: VAXcluster console system.
virtual terminal: A software path from an operator terminal on the host to the controller's CLI. The path can be established via the host port on the controller (using DUP) or via the maintenance port through an intermediary host (VCS). A virtual terminal is also sometimes called a host console.
warm swap: A method for adding or replacing a device whereby the system remains online, but all activity on the device's bus must be halted for the duration of the swap.
write back caching: A caching strategy that writes data to the cache memory, then flushes the data to the intended device at some future time. From the user's perspective, the write operation is complete when the data is stored in the cache memory. This strategy avoids unnecessary access of the devices.
write hole: Undetectable RAID level 1 or 5 data corruption. A write hole is caused by the successful writing of some, but not all, of the storageset members. Write holes occur under conditions such as power outages, in which the writing of multiple members can be abruptly interrupted. A battery backed up cache design eliminates the write hole because data is preserved and writes can be retried.
write through cache: A cache write strategy in which the destination of the write data is the primary storage media. This operation may updat
266. ration
Required tools
Removing the controller
Installing the new module
Replacing one dual redundant controller and write back cache module
Required tools
Removing the controller
Installing the new controller
Replacing cache modules
Required tools
Removing a write back cache module in a single controller configuration
Installing the new cache module
Replacing external cache batteries (ECBs)
Required tools
Replacing the SBB battery module
Preparing the subsystem
Removing the ECB
Reinstalling the modules
Restarting the
267. rce driver programming error encountered during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03080101 Miscellaneous SCSI Port Driver coding error detected during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03094002 An unrecoverable disk drive error was encountered while performing work related to disk unit operations.
030C4002 A Drive failed because a Test Unit Ready command or a Read Capacity command failed.
030D000A Drive was failed by a Mode Select command received from the host.
030E4002 Drive failed due to a deferred error reported by drive.
03104002 No response from one or more drives.
0311430A Nonvolatile memory and drive metadata indicate conflicting drive configurations.
0312430A The Synchronous Transfer Value differs between drives in the same storageset.
030F4002 Unrecovered Read or Write error.
transfer operation exceeded transferring all data SCSI bus 03224002 03284002 45 40 00 40
03244002 Channel busy.
03254002 Message Reject received on a valid message.
0326450A The disk device reported Vendor Unique SCSI Sense Data.
03270101 A disk related error code was reported that was unknown to the Fault Manageme
268. rd
Use the following guidelines when handling the program card:
Cover the program card with the snap on ESD cover when the card is installed in the controller.
Keep the program card in its original carrying case when not in use.
Do not twist or bend the program card.
Do not touch the program card contacts.
Handling controller host port cables
Use the following guidelines when handling controller host port cables:
When installing host port cables, use care not to touch the connector's pins.
Use care not to bend any connector pins when plugging the host cables into the controller CI connector.
Replacing controllers and cache modules using C_swap
This section presents controller and controller component replacement procedures for HSJ50 controllers. Replacement procedures for cache modules and write back cache batteries are also provided.
Controller and cache module warm swap procedure
Note: Use this procedure when you cannot shut down the system and one controller is still fully functioning.
Use the warm swap procedure to replace a controller that has failed when failover has already taken place. Also use this procedure if you suspect that a controller is not operating properly, that is, one or more controller ports have failed. Use this procedure when you cannot shut down the system and only if the controller is in a dual redundant conf
269. re that the program card is not installed. If it is installed, take the card out by removing the ESD cover and then pressing the eject button next to the card.
Install the controller module. Slide the module straight in along the rails and then push firmly to seat it in the backplane.
Caution: Do not overtighten the controller's front panel captive screws, the cache module's front panel captive screws, or the ECB cable captive screws. Damage to the controller PC board or front panel, the cache module front panel, or the SBB may result.
Tighten the front panel captive screws on the cache and the controller modules.
Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface.
Connect the battery cable to the cache module and then the ECB. For dual ECB SBBs:
a. Connect one end of a battery cable to ECB A and the other end to cache module A.
b. Connect one end of a battery cable to ECB B and the other end to cache module B.
Press Return on the operating controller's console.
Wait for the following text to be displayed on the operating controller's console:
Port 1 restarted
Port 2 restarted
Port 3 restarted
Port 4 restarted
Port 5 restarted
Port 6 restarted
Controller Warm Swap terminated
The configuration has two contr
270. rforman e r tine
Help Example
2 Replacing field replaceable units
Electrostatic discharge protection
Handling controllers or cache modules
Handling the program card
Handling controller host port cables
Replacing controllers and cache modules using C_SWAP
Controller and cache module warm swap procedure
Required tools
Preparing the subsystem
Removing the modules
Installing the new modules
Restarting the subsystem
Replacing a controller and cache module in a single controller configu
271. rictions
• Do not remove devices during failover.
• Do not remove devices during failback.
• Do not remove devices before the CLI prompt appears during controller initialization.
• Do not remove devices while the controller is running a local program such as DILX or VTDPY.
Required tools
The tools listed in Table 5-3 are required for removing SBBs.
Table 5-3 Required tools
5/32 inch Allen wrench: To unlock the SW800 series cabinet
To remove storage devices
Use the following procedure to remove 3 1/2 inch and 5 1/4 inch disk drives:
1. Determine the disk drive you wish to remove.
2. Press the two mounting tabs together to release the disk drive from the shelf and partially pull the disk drive out of the shelf. Allow the disk drive to spin down.
Figure 5-6 Removing a 3.5 inch disk drive
3. Using both hands, slide the disk drive out of the shelf and place it on a flat surface (see Figure 5-6).
Removing solid state disks, read write optical devices, and CD ROM drives
When removing solid state disk drives or read write optical devices, the controller must be shut down and power must be removed from the device shelf.
Required tools
The tools listed in Table 5-4 are required for removing solid state disk drives tape drives
272. ring an FX attempt to read CACHEA0 memory.
01A23002 The CACHEA0 DRAB detected an Address Parity error during a Host port attempt to read CACHEA0 memory.
01A33002 The CACHEA0 DRAB detected an Address Parity error during a Device port attempt to read CACHEA0 memory.
01A43002 The CACHEA0 DRAB detected an Address Parity error during an i960 attempt to read CACHEA0 memory.
01A53002 The CACHEA1 DRAB detected an Address Parity error during an FX attempt to read CACHEA1 memory.
01A63002 The CACHEA1 DRAB detected an Address Parity error during a Host port attempt to read CACHEA1 memory.
01A73002 The CACHEA1 DRAB detected an Address Parity error during a Device port attempt to read CACHEA1 memory.
01A83002 The CACHEA1 DRAB detected an Address Parity error during an i960 attempt to read CACHEA1 memory.
01A93102 The CACHEB0 DRAB detected an Address Parity error during an FX attempt to read CACHEB0 memory.
01AA3102 The CACHEB0 DRAB detected an Address Parity error during a Host port attempt to read CACHEB0 memory.
01AB3102 The CACHEB0 DRAB detected an Address Parity error during a Device port attempt to read CACHEB0 memory.
01AC3102 The CACHEB0 DRAB detected an Address Parity error during an i960 attempt to read CACHEB0 memory.
01AD3102 The CACHEB1 DRAB detected an Address Parity error during an FX attempt to read CACHEB1 memory.
01AE3102 The CACHEB1
273. roller configuration: A controller configuration that does not include a second backup controller permitting failover in the event of a failure.
normal member: A mirrorset member whose entire contents are guaranteed to be the same as all other NORMAL members. All NORMAL members are exactly equivalent.
normalizing member: A mirrorset member whose contents are the same as all other NORMAL and NORMALIZING members for data that has been written since the mirrorset was created or lost cache data was cleared. Data that has never been written may differ among NORMALIZING members.
NV: Nonvolatile. A term used to describe memory that can retain data during a power loss to the controller.
partition: A percentage of a storageset or single disk unit that may be presented to the host as a storage unit.
port: The hardware and software used to connect a host controller to a communication bus, such as a CI, DSSI, or SCSI bus. This term is also used to describe the connection between the controller and its SCSI storage devices.
PTL: Port target LUN. A method of device notation where P designates the controller's device port (1-6), T designates the target ID of the device (0-6), and L designates the LUN of the device (0-7).
qualified device: A device that has been fully tested in an approved StorageWorks configuration (that is, shelf, cabinet, power supply, cabling, and so forth) and is in
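The PTL ranges in the glossary entry above can be checked mechanically. The following is a minimal sketch in Python, not part of the controller software; the names PTL and validate_ptl are hypothetical, and only the three numeric ranges come from the glossary text.

```python
from typing import NamedTuple


class PTL(NamedTuple):
    """Hypothetical container for a Port-Target-LUN address."""
    port: int
    target: int
    lun: int


def validate_ptl(port: int, target: int, lun: int) -> PTL:
    """Return a PTL if every field is inside the glossary's stated range."""
    if not 1 <= port <= 6:      # P: controller device port 1-6
        raise ValueError(f"port {port} is outside 1-6")
    if not 0 <= target <= 6:    # T: device target ID 0-6
        raise ValueError(f"target {target} is outside 0-6")
    if not 0 <= lun <= 7:       # L: device LUN 0-7
        raise ValueError(f"LUN {lun} is outside 0-7")
    return PTL(port, target, lun)
```

For example, validate_ptl(2, 1, 0) returns PTL(port=2, target=1, lun=0), while a port of 7 raises ValueError.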
274. roller did not cause an I to N bus timeout when accessing a reset host port chip.
The controller DRAB or DRAC did not report an I to N bus timeout when accessing a reset host port chip.
The controller DRAB or DRAC did not interrupt the controller processor when expected.
Corrective Action (listed for each error in this group): Replace controller module.
Description of Error / Corrective Action
The controller DRAB or DRAC did not report an NXM error when nonexistent memory was accessed.
The controller DRAB or DRAC did not report an address parity error when one was forced.
There was an unexpected nonmaskable interrupt from the controller DRAB or DRAC during the DRAB memory test.
Diagnostic register indicates there is no cache module, but an interrupt exists from the nonexistent cache module.
The required amount of memory available for the code image to be loaded from the program card is insufficient.
The required amount of memory available in the pool area is insufficient for the controller to run.
The required amount of m
275. s write protected. This column is left blank for other device types.
The data caching state is indicated using the following letters:
b Both Read caching and Write Back caching are enabled.
r Read caching is enabled.
w Write Back caching is enabled.
A space in this column indicates caching is disabled.
KB/S This column indicates the average number of kilobytes of data transferred to and from the unit in the previous screen update interval. This data is available only for disk and tape units.
Rd% This column indicates what percentage of the data transferred between the host and the unit was read from the unit. This data is contained only in the DEFAULT display for disk and tape device types.
Wr% This column indicates what percentage of the data transferred between the host and the unit was written to the unit. This data is contained only in the DEFAULT display for disk and tape device types.
Cm% This column indicates what percentage of the data transferred between the host and the unit was compared. A compare operation can be accompanied by either a read or a write operation, so this column is not cumulative with the read percentage and write percentage columns. This data is contained only in the DEFAULT display for disk and tape device types.
HT% This column indicates the cache hit percentage for data transferred between the host and the unit. Un
276. s SBB battery module in a dual redundant controller configuration
Preparing the subsystem
1. For the purpose of this procedure, mark one controller A and the other controller B.
2. Connect a maintenance terminal to controller B. See Figure 2-20.
Figure 2-20 Connecting a maintenance terminal to the controller
3. Loosen the captive screws of the controller A CI connector and the front bezel of controller A and cache module A.
4. Shut down controller A:
HSJ50> SHUTDOWN OTHER_CONTROLLER
When the controller halts, the green Reset LED stops flashing and stays lit.
5. Take the operating controller out of dual redundant (failover) mode:
HSJ50> SET NOFAILOVER
You may see a Warning 600 at the terminal; you can safely ignore this warning.
6. Start the C_SWAP program:
HSJ50> RUN C_SWAP
Removing the ECB
1. When the controller prompts you, answer the question:
Do you wish to remove the other HSJ50 y/n [n]
2. Press Y for yes.
3. Answer the question:
Will its cache module also be removed Y/N [n]
4. Press Y for yes.
5. Wait for the following text to be displayed at the console:
Killing other controller
Attempting to quiesce all ports
Port 1 quiesced
Port 2 quiesced
Port 3 quiesced
Port 4 quiesced
Port 5 quiesced
Port 6 quiesced
A
277. s display shows what devices the controller has been able to identify on the device busses. The controller does not look for devices that are not configured into the nonvolatile memory using the CLI ADD command.
The column headings indicate the SCSI target numbers for the devices. SCSI targets are in the range 0 through 7. Target 7 is always used by a controller. In a dual controller configuration, target 6 is used by the second controller.
The device grid contains a letter signifying the device type in each port target location where a device has been found:
C indicates a CD ROM device.
D indicates a disk device.
F indicates a device type not listed above.
H indicates bus position of this controller.
h indicates bus position of the other controller.
L indicates a media loader.
T indicates a tape device.
A period indicates the device type is unknown.
A space indicates there is no device configured at this location.
This subdisplay contains a row for each SCSI device port supported by the controller. The subdisplay for a controller that has six SCSI device ports is shown.
Unit Status (abbreviated)
Unit   ASWC   KB/S  Rd%  Wr%  Cm%  HT%
D0110  a xr      0    0    0    0    0
D0120  at xr     0    0    0    0    0
D0130  o r     236  100    0    0  100
T0220  av        0    0    0    0    0
T0230  o       123    0  100    0    0
Description
This subdisplay shows the status of the logical units that
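The device grid legend listed above amounts to a lookup table. As a rough illustration (a Python sketch, not a VTDPY interface), the mapping below uses only the symbols given in that legend; the function name decode_grid_row and the idea of passing one grid row as a plain string, one character per target, are assumptions made for the example.

```python
# Symbol meanings copied from the device grid legend in the text.
DEVICE_GRID_LEGEND = {
    "C": "CD ROM device",
    "D": "disk device",
    "F": "device type not listed",
    "H": "this controller",
    "h": "other controller",
    "L": "media loader",
    "T": "tape device",
    ".": "device type unknown",
    " ": "no device configured",
}


def decode_grid_row(row: str) -> list[str]:
    """Translate each character of a hypothetical grid row into its meaning."""
    return [DEVICE_GRID_LEGEND.get(ch, "unrecognized symbol") for ch in row]
```

With this sketch, decode_grid_row("DT. hH") yields one description per target position, ending with the other controller (h) and this controller (H).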
278. s is the Enter a Code Patch option. The program prompts you for the patch information, one line at a time. Be careful to enter the information exactly as it appears on the patch release. Patches may be installed for any version of firmware; however, patches entered for firmware versions other than V51J are not applied until the matching version of firmware is installed. To enter any patch, you must first install all patches with lower patch numbers, beginning with patch number 1, for the specific firmware version. If you incorrectly enter the patch information, you are given the option to review the patch one line at a time.
Type ^Y or ^C (then RETURN) at any time to abort Code Patch.
Do you wish to continue y/n [y]
Press Y to continue. Enter the required information as shown:
Version v51d
Length 10
Patch Type 0
Patch Number 1
Count 1
Address 10
Value 0 0
Count 0
Verification 18FG2118
The patch you just entered is not applied until the controller is restarted.
Code Patch Main Menu
0 Exit
1 Enter a Patch
2 Delete Patches
3 List Patches
Enter option number (0-3): 0
CLCP Normal Termination
Restart of the controller required to apply new patch
CLI>
8. If you are using a dual redundant controller configuration, repeat the Installing a Patch procedure for the second controller.
Code patch messages
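The ordering rule stated above (a patch can be entered only after every lower-numbered patch, starting at 1, is installed for that firmware version) can be expressed as a short check. This is a minimal Python sketch for planning purposes; the function name can_enter_patch and the use of a set of installed patch numbers are assumptions, not anything CLCP exposes.

```python
def can_enter_patch(patch_number: int, installed: set[int]) -> bool:
    """True if patches 1 .. patch_number-1 are all already installed.

    `installed` holds the patch numbers present for ONE firmware version;
    the rule in the text applies per firmware version.
    """
    if patch_number < 1:
        raise ValueError("patch numbers begin at 1")
    return all(n in installed for n in range(1, patch_number))
```

For example, patch 3 is allowed when patches 1 and 2 are installed, and refused when only patch 1 is.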
279. s number to the nanosecond period, invert and multiply by 1000. The period for this is approximately 280 nanoseconds. If the field is still Async, this might indicate a failure to establish communication between host adapter and HSZ. The problem could be one of the following:
Host port SCSI bus configuration
SCSI termination
SCSI cables
HSZ
Async indicates communication between this target and all initiators is being done in asynchronous mode. This is the default communication mode and is used unless the initiator successfully negotiates for synchronous communications. If there is no communication with a given target ID, the communication mode is listed as asynchronous.
CI Performance Display
Path A
      Pkts   Pkts/S
RCV   5710     519
ACK  11805    1073
NAK   2073     188
NOR   1072      97
Path B
      Pkts   Pkts/S
RCV   5869     533
ACK  11318    1028
NAK   2164     196
NOR    445      40
Description
This display indicates the number of packets sent and received over each CI path and the packet rate. This display is available only on CI based controllers.
RCV Packets received from a remote node
ACK Packets sent to a remote node that were ACKed
NAK Packets sent to a remote node that were NAKed
NOR Packets sent to a remote node for which no response was received
DSSI Performance Display
DSSI  Pkts   Pkts/S
RCV   5710     519
ACK  11805    1073
NAK   2073     188
NOR
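The "invert and multiply by 1000" conversion at the start of this excerpt can be worked through in a few lines. The sketch below is in Python and assumes the displayed number is a transfer rate in megahertz, which this excerpt does not state; the function name period_ns is hypothetical.

```python
def period_ns(rate_mhz: float) -> float:
    """Convert a rate in MHz (assumed unit) to a period in nanoseconds.

    Inverting a rate in MHz gives a period in microseconds; multiplying
    by 1000 converts microseconds to nanoseconds.
    """
    if rate_mhz <= 0:
        raise ValueError("rate must be positive")
    return (1.0 / rate_mhz) * 1000.0
```

Under that assumption, a displayed value of about 3.57 corresponds to a period of roughly 280 nanoseconds, which agrees with the figure quoted in the text.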
280. s the average data transfer rate to all devices on the SCSI bus in kilobytes during the previous screen update interval.
CR This column indicates the number of SCSI command resets that occurred since VTDPY was started.
BR This column indicates the number of SCSI bus resets that occurred since VTDPY was started.
TR This column indicates the number of SCSI target resets that occurred since VTDPY was started.
Help Example
VTDPY> HELP
Available VTDPY commands:
^C                  Prompt for commands
^G or ^Z            Update screen
^O                  Pause/Resume screen updates
^Y                  Terminate program
^R or ^W            Refresh screen
DISPLAY CACHE       Use 132 column unit caching statistics display
DISPLAY DEFAULT     Use default 132 column system performance display
DISPLAY DEVICE      Use 132 column device performance display
DISPLAY STATUS      Use 80 column controller status display
EXIT                Terminate program (same as QUIT)
INTERVAL <seconds>  Change update interval
HELP                Display this help message
REFRESH             Refresh the current display
QUIT                Terminate program (same as EXIT)
UPDATE              Update screen display
VTDPY>
Description
This is the sample output from executing the HELP command.
2 Replacing field replaceable units
Controller and cache modules using C_Swap
Single controller and cache
281. sconnect the ECB cable from the cache module. Disable the ECB by pressing the battery disable switch until the LED goes out. See Figure 2-3.
Figure 2-3 Disconnecting the ECB cable
7. Slide the controller and cache modules out of the shelf. Note in which rails the module was seated, and place the modules on an ESD mat. See Figure 2-4.
Figure 2-4 Removing the controller and cache modules
Wait for the following text to be displayed at the operating controller's console:
Note: You may remove the cache module before or after port activity has restarted.
Port 1 restarted
Port 2 restarted
Port 3 restarted
Port 4 restarted
Port 5 restarted
Port 6 restarted
If required, remove the cache module.
Installing the new modules
1. When the controller prompts you, answer the question:
Do you have a replacement HSJ50 readily available [N] y
If you have a replacement module available, enter YES.
Answer the question:
Sequence to INSERT the other HSJ50 has begun.
Do you wish to INSERT the other HSJ50 [N]
Press Y
282.
Figure 2-17 Connecting a maintenance terminal to the controller
Figure 2-18 Removing controller and cache modules
Figure 2-19 Installing cache and controller module
Figure 2-20 Connecting a maintenance terminal to the controller
Figure 2-21 Removing the program card
Figure 2-22 Removing the controller and cache module
Figure 2-23 ECB Cable Connection
Figure 2-24 Reinstalling the cache and controller module
Figure 2-25 ECB Cable Connections
Figure 2-26 Removing the power supply
Figure 2-27 The power supply status indicator
Figure 2-28 Removing a disk drive
Figure 2-29 Status indicators for 3.5 and 5.25 inch SBBs
Figure 2-30 Removing a CD ROM drive
Figure 2-31 Disconnecting the internal CI cable
283. sk name PTL location 2 Configure the disk drive as a single disk drive unit See Working with HSJ50 units in Interface for the appropriate unit number syntax CLI gt ADD UNIT unit number disk name 3 From a OpenVMS account copy contiguously the firmware from your host to the single disk drive unit COPY CONTIGUOUS firmware unit number 000000 Repeat this step to copy more than one firmware image to the single disk drive unit To find the starting LBN of each firmware image DUMP HEAD BLOCK COUNT 0 2 DUA300 000000 FUP 4 Copy the firmware onto the target devices in your subsystem Copy Command Output Example COPY CONT BABAGI LCA FIRMWARE RZ2X RZ29B_DEC_0014 LOD 2 DUA300 000000 COPY CONT BABAGI LCA FIRMWARE RZ2X RZ28P4_42C_DEC FUP 2 DUA300 000000 DUMP HEAD BLOCK COUNT 0 2 DUA300 000000 LOD Service Manual Installing 2SDUA30 000000 RZ29B_ DEC_0014 LOD LBN 8 DUMP HEAD BLOCK COUNT 0 2SDUA300 000000 FUP 2SDUA300 000000 RZ28P4_42C_DEC FUP LBN 520 Installing the firmware onto a target device Use HSUTIL s DEVICE_CODE_LOAD_DISK option to update a disk drive by installing new firmware Use the DEVICE_CODE_LOAD_TAPE option to update a tape drive In both cases the disk drive that contains the new firmware and the device onto which you re installing it must be configured on the controller from which you re running HSUTIL
284. ssociated with the logical unit has had its nominal membership changed. The information provided in the device locator, device type, device identification, and device serial number fields is for the first device in the mirrorset.
No command control structures available for disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
A SCSI interface chip command time out occurred during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03034002 Byte transfer time out during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03044402 SCSI bus errors during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03052002 Device port SCSI chip reported gross error during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03062002 Non SCSI bus parity error during disk operation. Note that in this instance the Associated Additional Sense Code and Associated Additional Sense Code Qualifier fields are undefined.
03070101 Sou
285. started
Port 2 restarted
Port 3 restarted
Port 4 restarted
Port 5 restarted
Port 6 restarted
Caution: Do not disconnect the ECB cable from cache module B. Data may be lost if the battery cable is disconnected from the operating cache module.
13. Pull the cache module for controller A partly out of the shelf. Leave the SBB battery module installed in the device shelf.
Caution: To avoid the possibility of short circuit or electrical shock, do not allow the free end of an ECB cable attached to a cache module to make contact with a conductive surface.
14. Disconnect the ECB cable from the ECB you are replacing and connect it to the new ECB. Tighten the battery cable connector mounting screws. Do not overtighten. See Figure 2-23.
Figure 2-23 ECB Cable Connection
Reinstalling the modules
1. When the controller prompts you, answer the question:
Do you have a replacement HSJ50 readily available [N] y
2. Answer the question:
Sequence to INSERT the other HSJ50 has begun.
Do you wish to INSERT the other HSJ50 [N]
3. Press Y for yes. Wait for the following text to appear on the operating controller's console:
Attempting to quiesce all ports
Port 1 quiesced
Port 2 quiesced
Port 3 quiesced
286. t Do not include the letter D in the unit number.
Enter unit number to be tested 350
DILX indicates whether it has been able to allocate the disk. If you enabled the read write test, DILX gives you a final warning that the data on the disk will be destroyed.
Unit 350 will be write enabled. Do you still wish to add this unit y/n [n] y
If you want to test more disks, enter the unit numbers when prompted. Otherwise, enter n to start the test.
Select another unit y/n [n] n
DILX testing started at <date> <time>
Test will run for <nn> minutes
DILX will run for the amount of time that you selected and then display the results of the testing. If you want to interrupt the test early:
Type ^G (Control G) to get a performance summary without stopping the test (^T if you are running DILX through VCS).
Type ^C to terminate the current DILX test.
Type ^Y to terminate the current test and exit DILX.
DILX error codes
If DILX detects an error, the performance display for the unit includes the controller instance code (IC), the device PTL location (PTL), the SCSI sense key (Key), the ASC and ASCQ codes (ASC/Q), and the number of hard and soft errors (HC/SC). In addition, you will see the message "DILX detected error, code x", where x is 1, 2, 3, or 4. The meanings of the codes are:
Message Code 1 I
287. t Failure Parameter 0 contains the SCSI command opcode Invalid SCSI medium changer device opcode in misc command DWD Last Failure Parameter 0 contains the SCSI command opcode Invalid SCSI device type in PUB Last Failure Parameter 0 contains the SCSI device type Invalid CDB Group Code detected during create of misc cmd DWD Last Failure Parameter 0 contains the SCSI command opcode Invalid SCSI OPTICAL MEMORY device opcode in misc command DWD Last Failure Parameter 0 contains the SCSI command opcode 030A0100 Error DWD not found in port in_proc_q 0l Service Manual HSJ50 Array Controller Appendix A Last Fail Code 030B0188 03150100 More DBDs than allowed for in mask 031E0100 Can t find in_error dwd on in process queue 031F0100 Either DWD_ptr is null or bad value in dsps 03280100 SCSI CDB contains an invalid group code for a transfer command 03290100 Explanation A dip error was detected when pcb_busy was set Last Failure Parameter 0 contains the PCB port_ptr value Last Failure Parameter 1 contains the new info NULL SSTATO DSTAT ISTAT Last Failure Parameter 2 contains the PCB copy of the device port DBC register Last Failure Parameter 3 contains the PCB copy of the device port DNAD register Last Failure Parameter 4 contains the PCB copy of the device port DSP register Last Failure Parameter 5 contains the PCB copy of the device port DSPS register Last Failure
288. t is on line to the HSZ controller only It does not indicate that the unit is mounted by the host The availability state is indicated using the following letters a Available Available to be mounted by a host system d Offline Disabled by Digital Multivendor Customer Services The unit has been disabled for service e Online Exclusive Access Unit has been mounted for exclusive access by a user f Offline Media Format Error The unit cannot be brought available due to a media format inconsistency i Offline Inoperative The unit is inoperative and cannot be brought available by the controller m Offline Maintenance The unit has been placed in maintenance mode for diagnostic or other purposes o Online Mounted by at least one of the host systems For HSZ controllers on line in this column means that the unit is on line to the HSZ controller only It does not indicate that the unit is mounted by the host r Offline Rundown The CLI SET NORUN command has been issued for this unit v Offline No Volume Mounted The device does not contain media x Online to other controller Not available for use by this controller A space in this column indicates the availability is unknown Service Manual 1 58 Service Manual Troubleshooting The spindle state is indicated using the following characters For disks this symbol indicates the device is at speed For t
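The availability-state letters listed above lend themselves to a simple lookup table. The following is a minimal sketch, assuming the letters are exactly as listed in this text; the function name `describe_availability` and the wording of the "unrecognized" fallback are hypothetical and not part of the controller software.

```python
# Availability-state letters and meanings, copied from the list in the text.
# A single space means the availability is unknown.
AVAILABILITY_STATES = {
    "a": "Available",
    "d": "Offline, Disabled by Digital Multivendor Customer Services",
    "e": "Online, Exclusive Access",
    "f": "Offline, Media Format Error",
    "i": "Offline, Inoperative",
    "m": "Offline, Maintenance",
    "o": "Online",
    "r": "Offline, Rundown",
    "v": "Offline, No Volume Mounted",
    "x": "Online to other controller",
    " ": "Unknown",
}


def describe_availability(letter: str) -> str:
    """Return the description for one availability-state letter.

    Hypothetical helper: letters that are not in the table are reported
    as unrecognized rather than guessed at.
    """
    return AVAILABILITY_STATES.get(letter, f"Unrecognized state letter {letter!r}")


if __name__ == "__main__":
    for state in ("o", "r", " ", "z"):
        print(repr(state), "->", describe_availability(state))
```

Keeping the table as data makes it easy to compare against the display when reading a terminal capture by hand.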
289. t port services A 77 Host interconnect services A 74 induce controller crash utility CRASH A 90 Integrated logging facility A 72 nonvolatile parameter memory failover A 69 overview A 42 SCSI host interconnect services A 76 SCSI host value added services A 85 system communication services directory A 85 tape in line exerciser TILX A 87 LED codes 1 6 LED status indicators 3 60 Local terminal G 6 Logical unit G 6 LRU G 6 Service Manual LUN G 6 Maintenance terminal G 6 Metadata G 6 Moving reduced RAIDset 4 5 single disk drive unit 4 8 storageset members 4 6 storagesets 4 3 MSCP G 7 N Non redundant configuration G 7 Normal member G 7 NV G 7 O OpenVMS host copy script 3 17 optical drives installing 3 69 P Patches installing for controllers 3 3 removing 5 3 Patching controller software 3 3 Port G 7 Power supplies installing into shelf 3 60 replacing 2 47 Precautions electrostatic discharge 3 2 Program cards guidelines 3 2 handling for ESD 2 2 Protection ESD 2 2 Q Qualified device G 8 Quiesce G 8 Service Manual Index R Read cache G 8 Removing cache modules 5 6 controllers 5 6 disk drives 5 10 patches 5 3 storage devices 5 10 Repair action codes A 91 Replacement procedures battery cells 2 33 cache modules 2 29 CD ROM drives 2 53 CI host cables 2 56 controllers 2 3 optical drives 2 53 p
290. t to write a byte to buffer memory. The Master DRAB detected a Write Data Parity error during an I960 attempt to write buffer memory. The Master DRAB detected a Write Data Parity error during an I960 attempt to write a byte to buffer memory. The CACHEA0 DRAB detected a Write Data Parity error during an FX attempt to write CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during an FX attempt to write a byte to CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during a Host port attempt to write CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during a Host port attempt to write a byte to CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during a Device port attempt to write CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during a Device port attempt to write a byte to CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during an I960 attempt to write CACHEA0 memory. The CACHEA0 DRAB detected a Write Data Parity error during an I960 attempt to write a byte to CACHEA0 memory. The CACHEA1 DRAB detected a Write Data Parity error during an FX attempt to write CACHEA1 memory. The CACHEA1 DRAB detected a Write Data Parity error during an FX attempt to write a byte to CACHEA1 memory. The CACHEA1 DRAB detected a Write Data Parity erro
291. tached devices to list. Message: What BUFFER SIZE (KB 1024) does the drive require (2, 4, 8, 16, 32) [8]? Explanation: This message is displayed if HSUTIL detects that an unsupported device has been selected as the target device and if you're downloading the firmware image using more than one SCSI Write Buffer command. You must specify the number of bytes to be sent in each Write Buffer command. The default buffer size is 8192 bytes. A firmware image of 256 KB, for example, can be code loaded in 32 Write Buffer commands, each transferring 8192 bytes. In this example, the correct entry for the buffer size would be 8. Message: What is the TOTAL SIZE of the code image in 512 byte blocks [MAX 512]? Explanation: This message is displayed if HSUTIL detects that an unsupported device has been selected as the target device. You must enter the total number of 512-byte blocks of data to be sent in the code load operation. For example, a firmware image that is 262,144 bytes long would require 512 512-byte blocks. Message: Does the target device support only the download microcode and SAVE (y/n) [y]? Explanation: This message is displayed if HSUTIL detects that an unsupported device has been selected as the target device. You must specify whether or not the device supports the SCSI Write Buffer command's DOWNLOAD AND SAVE function. Message: Should the code be
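The buffer-size and total-size answers described above are simple arithmetic on the firmware image length. The sketch below reproduces the two worked figures from the text (32 Write Buffer commands and 512 blocks for a 256 KB image). The function names are hypothetical, and rounding up for images that are not an exact multiple of the buffer or block size is an assumption; the text only gives exact-multiple examples.

```python
KB = 1024          # bytes per KB, as used by the BUFFER SIZE prompt
BLOCK_SIZE = 512   # bytes per block, as used by the TOTAL SIZE prompt


def write_buffer_commands(image_bytes: int, buffer_kb: int = 8) -> int:
    """Number of Write Buffer commands needed to send the whole image.

    buffer_kb is the value entered at the BUFFER SIZE prompt (default 8,
    that is, 8192 bytes). Partial final buffers are rounded up (assumption).
    """
    buffer_bytes = buffer_kb * KB
    return -(-image_bytes // buffer_bytes)  # ceiling division


def total_blocks(image_bytes: int) -> int:
    """Value to enter at the TOTAL SIZE prompt, in 512-byte blocks."""
    return -(-image_bytes // BLOCK_SIZE)  # ceiling division


if __name__ == "__main__":
    image = 256 * KB  # 262,144 bytes, the example image from the text
    print(write_buffer_commands(image))  # 32
    print(total_blocks(image))           # 512
```

Note that the second prompt quotes a maximum of 512 blocks, so the 256 KB example image sits exactly at that limit.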
292. tains the error data If the failure involved a Device port the Master DRAB CSR register bits 10 through 12 identify that Device port If Master DRAB DSR register bit 14 is set the failure was reported via the NMI If Master DRAB DSR register bit 14 is clear the failure was reported via the DRAB_INT For Write Data Parity Error conditions bits 0 through 3 of the CACHEBn DRAB CSR register identify the byte in error For Address Parity Error conditions follow repair action 34 For Write Data Parity Error conditions follow repair action 35 The Master DRAB detected an Ibus Parity Error condition Use the following register information to locate additional details about the error The Master DRAB EAR register combined with the Master DRAB ERR bits 4 through 7 address region yields the affected memory address The Master DRAB EDR register contains the error data If Master DRAB DSR register bit 14 is set the failure was reported via the NMI If Master DRAB DSR register bit 14 is clear the failure was reported via the DRAB_INT If bits 20 through 23 of the Master DRAB DCSR register contain a non zero value a firmware fault is indicated follow repair action 01 otherwise follow repair action 36 Service Manual A 100 Service Manual Appendix A Repair Action Action to take Code This event report contains supplemental information related to a Memory System Failure event report delivered earlier Use the instan
293. tatistics on how many read and write operations were performed during the test Include performance statistics in performance summary y n n y DILX asks if you want hard and soft errors sense data and deferred errors displayed If you do answer y and respond to the rest of the questions If you don t want to see the errors displayed answer n and proceed to the next step Display hard soft errors y Display hex dump of Error Information Packet Requester Specific information y n n y When the hard error limit is reached the unit will be dropped from testing Enter hard error limit 1 65535 65535 100 When the soft error limit is reached soft errors will no longer be displayed but testing will continue for the unit Enter soft error limit 1 65535 32 32 Set the maximum number of outstanding I Os for each unit Set the I O queue depth 1 12 4 9 HSJ50 Array Controller Troubleshooting 9 10 11 12 13 14 HSJ50 Array Controller 1 27 Run the basic function test xxx Available tests are 1 Basic Function 2 User Defined Use the Basic Function test 99 9 of the time The User Defined test is for special problems only Enter test number 1 2 1 1 Caution If you choose to write enable disks during the test make sure that the disks do not contain customer data Set the test as read only or read write Write enable disk unit s to be tested y
294. ter 2 contains the PCB copy of the device port DBC register Last Failure Parameter 3 contains the PCB copy of the device port DNAD register Last Failure Parameter 4 contains the PCB copy of the device port DSP register Last Failure Parameter 5 contains the PCB copy of the device port DSPS register Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2 SSTAT1 SSTATO DSTAT registers Last Failure Parameter 7 contains the PCB copies of the device port LCRC RESERVED ISTAT DFIFO registers HSJ50 Array Controller Appendix A HSJ50 Array Controller Last Fail Code 03390108 033C0101 A 61 Explanation An unknown interrupt code was found in a device port s DSPS register Last Failure Parameter 0 contains the PCB port_ptr value Last Failure Parameter 1 contains the PCB copy of the device port TEMP register Last Failure Parameter 2 contains the PCB copy of the device port DBC register Last Failure Parameter 3 contains the PCB copy of the device port DNAD register Last Failure Parameter 4 contains the PCB copy of the device port DSP register Last Failure Parameter 5 contains the PCB copy of the device port DSPS register Last Failure Parameter 6 contains the PCB copies of the device port SSTAT2 SSTAT1 SSTATO DSTAT registers Last Failure Parameter 7 contains the PCB copies of the device port LCRC RESERVED ISTAT DFIFO registers An invalid code was seen by the error
295. ter compare percentage (1 to 100) [5]? 10 16. The system displays a list of all single-disk units (by unit number) that you can choose for DILX testing. Select the first disk that you want to test. Do not include the letter D in the unit number. Enter unit number to be tested? 350 17. DILX indicates whether it has been able to allocate the disk. If you enabled the read/write test, DILX gives you a final warning that the data on the disk will be destroyed. Unit 350 will be write enabled. Do you still wish to add this unit (y/n) [n]? y 18. If you want to test more disks, enter the unit numbers when prompted. Otherwise, enter n to start the test. Select another unit (y/n) [n]? n DILX testing started at <date> <time>. Test will run for <nn> minutes. 19. DILX will run for the amount of time that you selected and then display the results of the testing. If you want to interrupt the test early: Type ^G (Control-G) to get a performance summary without stopping the test (^T if you are running DILX through VCS). Type ^C to terminate the current DILX test. Type ^Y to terminate the current test and exit DILX. Running an advanced disk test: This section provides instructions on how to run an advanced DILX test in which you define the commands that make up the test (read, write, access, and so on). Only select this test if you are very knowledgeable about disk testing. You should use the basic function
296. that's shown in the Used by column: HSJ50> DELETE unit number Delete the device name shown in the Name column: HSJ50> DELETE device name Remove the device and move it to its new PTL location. Re-add the device to the controller's list of valid devices: HSJ50> ADD DEVICE device name PTL location If you're moving a tape loader, re-create the passthrough device that represents the loader: HSJ50> ADD PASSTHROUGH passthrough_name PTL location Re-present the device to the host by giving it a unit number that the host can recognize. You can use the original unit number or create a new one: HSJ50> ADD UNIT unit number device name You might have to reconfigure the host-based software that controls the loader. Refer to the documentation that accompanied the loader and its software. Example: The following example moves TAPE100 (unit T108) from PTL 1 0 0 to PTL 6 0 0: HSJ50> SHOW tape100 Name Type Port Targ Lun Used by TAPE100 tape 1 0 0 T108 HSJ50> DELETE T108 HSJ50> DELETE TAPE100 (move tape100 to its new location) HSJ50> ADD TAPE TAPE600 6 0 0 HSJ50> ADD UNIT T600 TAPE600 The following example moves tape LOADER 120 from p3 to p1: HSJ50> SHOW PASSTHROUGH LOADER
297. that have been initialized with the INITIALIZE container name SAVE_CONFIGURATION command. The version numbers and patch numbers in this procedure are only examples. The Patch Code program will not allow you to enter any of the numbers used in these examples. Required tools: The tools listed in Table 5-1 are required for removing a patch. Table 5-1 Required tools: Maintenance terminal (to start the CLCP utility); 5/32-inch Allen wrench (to unlock the SW800 cabinet). To remove a patch: 1. Connect a maintenance terminal to one of the controllers. See Figure 5-1. Figure 5-1 Connecting a maintenance terminal to the controller (callouts: local connection port, BC16E XX, to terminal; CXO 5322A MC) Start the CLCP utility: HSJ50> RUN CLCP The CLCP main menu is displayed: Select an option from the following list: Code Load & Code Patch Utility Main Menu 0: Exit 1: Enter Code LOAD utility 2: Enter Code PATCH utility Enter option number (0..2) [0]? 2 This controller module does not support code load functionality. Exiting CLCP. HSJ50> Press 2 to select the code patch program. The code patch menu is displayed: You have selected the Code Patch local program. This program is used to manage firmware code patches. Select an option from the following list: Type ^Y or ^C (then RETURN) at any time to abort Code Patch. Code Patch Main
298. the EI A 423 maintenance port on the front bezel of the HS array controller Also called a maintenance terminal logical unit The physical device or storage unit seen by the host Often these logical units are spread across more than one physical device especially in RAID implementations This is not a LUN Logical Unit Number See LUN LRU Least recently used This is cache terminology for the block replacement policy for the read cache LUN A logical unit number is a physical or virtual peripheral device addressable through a target LUNs use their target s bus connection to communicate on the SCSI bus maintenance terminal Any EIA 423 compatible terminal to be plugged into the HS controller This terminal is used to identify the controller enable host paths define the configuration and check controller status It is not required for normal operations It is sometimes referred to as a local terminal metadata Data written on the physical disk that is not visible to the host customer that allows the HS array controller to maintain a high integrity of customer data HSJ50 Array Controller Glossary HSJ50 Array Controller G 7 mirrorset Two or more physical disks configured to present one highly reliable virtual unit to the host MSCP Mass storage control protocol The protocol by which blocks of information are transferred between the host and the controller non redundant configuration A single cont
299. the controller s front bezel HSJ50 Array Controllers Service Manual 3 50 Installing Restarting the subsystem 1 Remove the program card from the controller by pressing and holding in the Reset button then pressing the eject button next to the program card See Figure 3 19 Figure 3 19 Removing the program card PCMCIA button card CXO 5323A MC 2 Reconnect power cords to the controller power supplies Service Manual HSJ50 Array Controllers Installing 3 51 3 Press and hold the Reset button on the controller while pushing in the program cards Release the Reset button The controller will initialize See Figure 3 20 When the reset light on each controller flashes at a rate of once every second the initialization process is complete Figure 3 20 Installing the program card Orientation dot PCMCIA card cover CXO 5331A MC HSJ50 Array Controllers Service Manual 3 52 Installing 4 Snap the ESD covers into place over each program card Push the pins inward to lock the covers in place 5 Reconnect the maintenance terminal to the controller 6 At the controller check for the new write back cache CLI gt SHOW THIS_CONTROLLER Notice that the new write back cache is reported on THIS_CONTROLLER 7 Enable write back cache on specific units using the command CLI gt SET unit name WRITEBACK_CACHE Adding Cache Memory This procedure contains information for increas
300. the failure was reported via the DRAB_INT. Follow repair action 36. A Multiple Bit ECC error was detected by the Master DRAB. Use the following register information to locate additional details: The Master DRAB DER register bits 0 through 6 contain the syndrome value. The Master DRAB EAR register combined with Master DRAB ERR bits 0 through 3 (address region) yields the affected memory address. The Master DRAB EDR register contains the error data. If the failure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port. If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was reported via the DRAB_INT. Follow repair action 34. A Multiple Bit ECC error was detected by the CACHEA0 or CACHEA1 DRAB. Use the following register information to locate additional details: The CACHEAn DRAB DER register bits 0 through 6 contain the syndrome value. The CACHEAn DRAB EAR register combined with the Master DRAB RSR register bits 8 through 11 (CACHEA memory region) yields the affected memory address. The CACHEAn DRAB EDR register contains the error data. If the failure involved a Device port, the Master DRAB CSR register bits 10 through 12 identify that Device port. If Master DRAB DSR register bit 14 is set, the failure was reported via the NMI. If Master DRAB DSR register bit 14 is clear, the failure was rep
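The register fields named in these repair actions can be pulled out of raw register values with ordinary shift-and-mask operations. The following is a minimal sketch, assuming bit 0 is the least significant bit and that the register values are available as integers; the function names and the sample values are hypothetical, and the meaning of the 3-bit Device port field is left undecoded because the text does not define it.

```python
def bits(value: int, low: int, high: int) -> int:
    """Extract bits low..high (inclusive) from value, bit 0 being the LSB."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_master_drab(der: int, csr: int, dsr: int) -> dict:
    """Pick out the fields the repair-action text refers to.

    der bits 0-6   : syndrome value
    csr bits 10-12 : field identifying the Device port (raw value only)
    dsr bit 14     : set = reported via NMI, clear = reported via DRAB_INT
    """
    return {
        "syndrome": bits(der, 0, 6),
        "device_port_field": bits(csr, 10, 12),
        "reported_via": "NMI" if bits(dsr, 14, 14) else "DRAB_INT",
    }


if __name__ == "__main__":
    # Made-up register contents, for illustration only.
    print(decode_master_drab(der=0x2B, csr=0x1400, dsr=0x4000))
```

The same `bits` helper applies to the ERR and RSR address-region fields mentioned in the text; only the bit ranges change.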
301. this environment Last Failure Parameter 0 contains the instance code value The requester s error table index passed to FM REPORT_EVENT is larger than the maximum allowed for this requester Last Failure Parameter 0 contains the instance code value Last Failure Parameter 1 contains the requester error table index value The USB index supplied in the EIP is larger than the maximum number of USBs Last Failure Parameter 0 contains the instance code value Last Failure Parameter 1 contains the USB index value HSJ50 Array Controller Appendix A Last Fail Code 04040103 04050100 04070103 04080102 04090100 HSJ50 Array Controller A 65 Explanation The event log format found in V_fm_template_table is not supported by the Fault Manager The bad format was discovered while trying to fill in a supplied eip Last Failure Parameter 0 contains the instance code value Last Failure Parameter 1 contains the format code value Last Failure Parameter 2 contains the requester error table index value The Fault Manager could not allocate memory for his Event Information Packet EIP buffers There is more EIP information than will fit into a datagram The requester specific size is probably too large Last Failure Parameter 0 contains the instance code value Last Failure Parameter 1 contains the format code value Last Failure Parameter 2 contains the requester error table i
302. this instance the Memory Address Byte Count DRAB register and Diagnostic register fields are undefined 020F2401 The write cache modules are not configured properly for a dual redundant configuration One of the write cache modules is not present to perform cache failover of dirty write back cached data Note that in this instance the Memory Address Byte Count DRAB register and Diagnostic register fields are undefined 02102401 The write cache modules are not configured properly for a dual redundant configuration One of the cache modules is not the same size to perform cache failover of dirty write back cached data Note that in this instance the Memory Address Byte Count DRAB register and Diagnostic register fields are undefined 02110064 Disk Bad Block Replacement attempt completed for a read within the user data area of the disk Note that due to the way Bad Block Replacement is performed on SCSI disk drives information on the actual replacement blocks is not available to the controller and is therefore not included in the event report 02120064 There are insufficient resources to complete operation in a SCSI environment Insufficient resources returned from HIS CREATE_RECEIVE_DATA 02130064 The tape device does not contain any medium 00 02140064 The unit has been marked inoperative or 00 UNKNOWN In either case the unit is not available 02150064 The Unit State Block unit status associated with this I O has chang
303. through FF hexadecimal. FMU> DESCRIBE ASC_ASCQ_CODE 0 Your options are: ASCQ value (range: 0 through FF hexadecimal) FMU> DESCRIBE ASC_ASCQ_CODE 0 0 Your options are: SCSI Device Type value (range: 0 through FF hexadecimal) FMU> Using FMU to Describe Recent Last Failure or Memory System Failure Codes: HSJ controllers store the four most recent last failure codes and memory system failure codes. You can use the FMU utility to retrieve these codes and their descriptions. To view a last failure or memory system failure code: 1. Start FMU from the CLI: HSJ50> RUN FMU To see all of the stored last failure or memory system failure events: FMU> DESCRIBE LAST_FAILURE ALL or FMU> DESCRIBE MEMORY_SYSTEM_FAILURE ALL To see the most recent last failure or memory system failure events: FMU> DESCRIBE LAST_FAILURE MOST_RECENT or FMU> DESCRIBE MEMORY_SYSTEM_FAILURE MOST_RECENT To see one of the four stored last failure or memory system failure events: FMU> DESCRIBE LAST_FAILURE n or FMU> DESCRIBE MEMORY_SYSTEM_FAILURE n where n is the stored event number from 1 to 4. FMU Output Example: HSJ50> RUN FMU Fault Management Utility FMU> SHOW LAST_FAILURE MOST_RECENT Last Failure Entry 1 Flags 000FF301 Template 1 01 Description Last Failure Event
304. troller (callouts: local connection port, BC16E XX, to terminal; CXO 5322A MC) 2. Take the controller out of service: HSJ50> SHUTDOWN THIS_CONTROLLER To ensure that the controller has shut down cleanly, check for the following indication on the controller's OCP (operator control panel): The Reset light is lit continuously. Port lights 1, 2, and 3 are also lit continuously. 3. Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist. 4. Attach or clip the other end of the ESD wrist strap to the cabinet grounding stud or a convenient cabinet grounding point. 5. Remove the power cords from the controller power supplies. 6. Disable the ECB by pressing the battery disable switch on the battery module front panel. See Figure 2-8. Figure 2-8 Disconnecting the ECB cable (CXO 5282A MC) 7. Disconnect the ECB cable from the cache module. See Figure 2-8. 8. Unsnap and remove the program card ESD cover. See Figure 2-9. 9. Remove the program card by pressing and holding in the Reset button, then pressing the eject button next to the card. See Figure 2-9. Pull the card from the controller mod
305. ual Installing Formatting disk drives Use HSUTIL s FORMAT_DISK option to format simultaneously up to seven disk drives attached to a single controller or up to six disk drives attached to a dual redundant pair of controllers Caution To avoid the possibility that data may be lost you must suspend all I O to the buses that service the target disk drives To format one or more disk drives 1 Start HSUTIL CLI gt RUN HSUTIL Press 1 to select the FORMAT function HSUTIL finds and displays all of the unformatted disk drives attached to the controller Enter the name of each disk drive you want to format Enter a device to format disk_name Press Y to enter another disk drive name or N to begin the formatting operation Select another device y n n N Read the cautionary information that HSUTIL displays then confirm or cancel the formatting operation Do you want to continue y n n Y Considerations for formatting disk drives Keep the following points in mind for formatting disk drives with HSUTIL Service Manual HSUTIL cannot format disk drives that have been configured as single disk drive units or as members of a storageset spareset or failedset If you want to format a disk drive that s previously been configured as such you ll have to delete the unit number and storageset name associated with it If the power fails or the bus is reset while HSUTIL is formatting a disk
306. ule Save the program card for the replacement controller Figure 2 9 Removing the program card Eject pcmcia button card CXO 5323A MC 10 With a small flat head screwdriver loosen the captive screws on the CI connector at the controller s front panel Service Manual HSJ50 Array Controller Replacing field replaceable units 2 19 HSJ50 Array Controller Figure 2 10 Removing the controller and cache module 11 12 13 14 15 CXO 5327A MC Loosen the two captive retaining screws at each corner of the controller s front bezel Use a gentle rocking motion to loosen the controller module from the shelf backplane Slide the controller module out of the shelf and place it on an ESD mat See Figure 2 10 If you are also removing the cache module loosen the captive screws at each corner of the cache module s front bezel Use a gentle rocking motion to loosen the cache from the shelf backplane Slide the cache module out of the shelf See Figure 2 10 Service Manual 2 20 Replacing field replaceable units Installing the new module 1 If required install the replacement cache module into the shelf See Figure 2 11 Figure 2 11 Installing the new cache and controller modules Cache ae CXO 5324A MC 2 Slide the new controller module into the shelf using the same rails as the removed module See Figure 2 11 Service Manual HSJ50 Array Controller Replacing fie
307. ung This is a Class A radio interference device. In residential areas, operating this device may cause radio interference, in which case the user is responsible for taking appropriate countermeasures. Warning: This is a Class A device. In a residential environment this device may cause radio interference. In that case the user may be required to take appropriate measures. Table of Contents 1 Troubleshooting: Fault isolation; Controller is not operating; Unable to see units from Hosts; VMS shadowsets go into mount verify; Units are Host Unavailable; Foreign disk drive is not Usable; Interpreting controller LED codes; Interpreting host event log messages; Finding the Instance Code in an Event Message
308. us transceiver circuits DUP Diagnostic and utility protocol Host application software that allows a host terminal to be connected to the controller s command line interpreter DWZZA The StorageWorks compatible SCSI bus signal converter ECB External cache battery ECC Error correction code One or more cyclic redundancy check CRC words that allow detection of a mismatch between transmitted and received data in a communications system or between stored and retrieved data in a storage system The ECC allows for location and correction of an error in the received retrieved data All ECCs have limited correction power EDC Error detection code One or more checksum words that allow detection of a mismatch between transmitted and received data in a communications system or between stored and retrieved data in a storage system The EDC has no data correction capability ESD Electrostatic discharge The discharge of a potentially harmful static electric voltage as a result of improper grounding failedset A group of disk drives that have been removed from RAIDsets due to a failure or a manual removal Disk drives in the failedset should be HSJ50 Array Controller Glossary HSJ50 Array Controller G 5 considered defective and should be tested repaired and then placed into the spareset failover The process that takes place when one controller in a dual redundant configuration assumes the workload of a failed
309. usb index 606C0100 mscp_foc_receive_cmd detected that the message sent from the other controller had an illegal exclusive access state 606D0100 FOC provided mscp_foc_send_cmpl_rtn with an invalid status for the FOC SEND transmit command completion 606E0100 FOC provided mscp_foc_send_rsp_done with an invalid transmit status for the FOC SEND transmit response completion Table A 17 Diagnostics and utilities protocol server last failure codes Last Fail Code Explanation 61010000 Controller crash was intentionally caused by the execution of the CRASH program This bugcheck does not indicate the occurrence of a controller failure Removed from HSOF firmware at Version 2 7 Service Manual HSJ50 Array Controller Appendix A A 85 Last Fail Code Explanation 61020100 HIS LISTEN call failed with 01 INSUFFICIENT_RESOURCES 61090100 LISTEN_CONNECTION_ESTABLISHED 01 event from HIS specified a connection ID for a connection we already know about 610B0100 Code Load or Code Patch utility in CLCP local program forced controller restart to force new code or patch to take effect This last_failure code was removed from HSOF firmware at Version 2 7 610C0100 HIS has reported a connection event that 01 should not be possible Table A 18 System communication services directory last failure code Last Fail Code Explanation 62000100 HIS LISTEN call failed with 01 INSUFFICIENT_RESOURCES 62020100 Failure to allocate associated tim
310. valid status 025A0102 An invalid status was returned from CACHE LOOKUP_LOCK Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the invalid status 025B0102 An invalid mapping type was specified for a logical unit Last Failure Parameter 0 contains the USB address Last Failure Parameter 1 contains the Unit Mapping Type 025C0102 An invalid mapping type was specified for a logical unit Last Failure Parameter 0 contains the USB address Last Failure Parameter 1 contains the Unit Mapping Type 02620102 An invalid status was returned from CACHE LOOKUP_LOCK Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the invalid status 02690102 An invalid status was returned from CACHE OFFER_WRITE_DATA Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the invalid status 02730100 A request was made to write a device 01 metadata block with an invalid block type Service Manual HSJ50 Array Controller Appendix A A 51 Last Fail Code Explanation 02790102 An invalid status was returned from VA XFER in a complex read operation Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the invalid status 027B0102 An invalid status was returned from VA XFER in a complex ACCESS operation Last Failure Parameter 0 contains the DD address Last Failure Parameter 1 contains the i
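The codes in these tables appear to follow a regular byte layout: the leading byte changes with the component table (for example 61 and 62 in Tables A-17 and A-18), the third byte matches the repair action shown beside several entries (01), and the low digit matches the number of Last Failure Parameters listed (025A0102 lists two, 02730100 lists none). The sketch below splits a code on that basis. The layout is an inference from the listed codes, not a definition taken from this text, and the field and function names are hypothetical; confirm against the manual's own description of the last failure code format before relying on it.

```python
def split_last_failure_code(code_hex: str) -> dict:
    """Split an 8-digit last failure code into its apparent fields.

    ASSUMED layout (inferred from the codes listed in this appendix):
      bits 24-31 : component identifier
      bits 16-23 : error number within the component
      bits  8-15 : repair action code
      bits  0-3  : count of Last Failure Parameters
    Bits 4-7 are not interpreted here.
    """
    value = int(code_hex, 16)
    return {
        "component": (value >> 24) & 0xFF,
        "error_number": (value >> 16) & 0xFF,
        "repair_action": (value >> 8) & 0xFF,
        "parameter_count": value & 0x0F,
    }


if __name__ == "__main__":
    for code in ("025A0102", "02730100", "03390108"):
        fields = split_last_failure_code(code)
        print(code, {name: f"{val:02X}" for name, val in fields.items()})
```

A quick cross-check when reading a dump is that the parameter count should match the number of "Last Failure Parameter n" lines in the explanation for that code.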
311. ve the power cords from the controller power supplies Disable the ECB by pressing the battery disable switch on the battery module s front pane HSJ50 Array Controller Replacing field replaceable units 2 31 8 Disconnect the ECB cable from the cache module 9 With a small flat head screwdriver remove the CI connector from the controller s front panel 10 Loosen the two captive retaining screws on the controller and cache module front panel 11 Use a gentle rocking motion to loosen the controller module from the shelf backplane Figure 2 18 Removing controller and cache modules Cache le module CXO 5327A MC 12 Slide the controller module out of the shelf noting which rail it was seated in and place it on an ESD mat See Figure 2 18 HSJ50 Array Controller Service Manual 2 32 Replacing field replaceable units 13 Slide the cache module out of the shelf See Figure 2 18 Installing the new cache module 1 Install the replacement cache module into the shelf using the same rails as the removed modules See Figure 2 19 Figure 2 19 Installing cache and controller module Cache ae CXO 5324A MC 2 Reinstall the controller module into the shelf using the same controllers rails and tighten the two captive retaining screws on the controller and cache modules See Figure 2 19 Service Manual HSJ50 Array Controller Replacing field replaceable units 2 33 Reconnect the open end of
312. vel Device Product ID and Device Type fields is for the first device in the mirrorset. This Instance code has been removed from the HSOF Version 2.7 release. The device specified in the Device Locator field had a read error that has been repaired with data from another mirrorset member. The device specified in the Device Locator field had a read error. Attempts to repair the error with data from another mirrorset member failed because of a lack of an alternate error-free data source. Instance Code 023D0064 02422464 02432201 02442201 02452201 02460064 02470064 02480064 Explanation The device specified in the Device Locator field had a read error. Attempts to repair the error with data from another mirrorset member failed because of a write error on the original device. The original device will be removed from the mirrorset. Cache failover attempt failed because the other cache was illegally configured with SIMMs. Note that in this instance the memory address, byte count, DRAB register, and Diagnostic register fields are undefined. The CACHE Dynamic Ram controller and ArBitration engine 0 (DRAB0) failed cache diagnostics testing performed on Cache B (other cache) during a cache failover attempt. The memory address field contains the starting physical address of the CACHEB0 memory. The CACHE Dynamic Ram controller and ArBitration engi
313. ves, enter the unit numbers when prompted. Otherwise, enter n to start the test. Select another unit y n n n TILX testing started at <date> <time>. Test will run for 10 minutes. 6. TILX will run for 10 minutes and then display the results of the testing. If you want to interrupt the test early: Type G (Control G) to get a performance summary without stopping the test. Type T if you are running TILX through the VAXcluster control system (VCS). Type C to terminate the current TILX test. Type Y to terminate the current test and exit TILX. Running a tape drive basic function test. Caution: The basic function test performs write operations. Make sure that the tapes that you use do not contain customer data. This section provides instructions on how to run a TILX basic function test on one or more tape drives. The test performs a repeating cycle of a write pass followed by a read pass. The write pass executes in two phases. Data intensive: The first one-third of the specified number of records are written as 16-kilobyte records. With this high byte count and the default queue depth, this phase should test the streaming capability (if supported) of the tape unit. Random: The remaining two-thirds of the records are written with random byte counts. The command sequence is write, reposition back one record, read, and is r
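The two-phase write pass described above can be sketched in a few lines. This is only an illustration of the one-third / two-thirds split and is not TILX code: the function name `plan_write_pass`, the `max_random_bytes` upper bound, the rounding by integer division, and the reading of "16 kilobyte" as 16 * 1024 bytes are all assumptions made for the example.

```python
import random

RECORD_16K = 16 * 1024  # assumption: "16 kilobyte" means 16 * 1024 bytes


def plan_write_pass(num_records, max_random_bytes=RECORD_16K, seed=None):
    """Return a list of record byte counts for one hypothetical write pass.

    Phase 1 (data intensive): the first one-third of the records are 16 KB each.
    Phase 2 (random): the remaining records get random byte counts.
    """
    if num_records < 0:
        raise ValueError("num_records must be non-negative")
    rng = random.Random(seed)
    data_intensive = num_records // 3  # assumption: round the one-third down
    sizes = [RECORD_16K] * data_intensive
    # The upper bound for random sizes is hypothetical; the excerpt does not state one.
    sizes += [rng.randint(1, max_random_bytes)
              for _ in range(num_records - data_intensive)]
    return sizes


if __name__ == "__main__":
    for index, size in enumerate(plan_write_pass(9, seed=1)):
        phase = "data intensive" if size == RECORD_16K and index < 3 else "random"
        print(f"record {index}: {size} bytes ({phase})")
```

Passing a fixed `seed` makes the random phase repeatable, which is convenient when comparing two runs of the same plan.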
314. wap procedure to replace a failed cache module. Required tools: The tools listed in Table 2-5 are required for replacing cache modules. Table 2-5 Required tools: Maintenance terminal (to shut down controllers, restart controllers, execute CLI commands, and invoke utilities); ESD wrist strap and ESD mat (to protect all equipment against ESD); small flat-head screwdriver (to loosen CI connector and front bezel captive screws); 5/32 Allen wrench (to unlock the SW800 Series cabinet). Removing a write-back cache module in a single controller configuration: 1. Connect a maintenance terminal to the controller that contains the cache module to be replaced. See Figure 2-17. Take the controller out of service: HSJ50> SHUTDOWN THIS_CONTROLLER. Figure 2-17 Connecting a maintenance terminal to the controller. To ensure that the controller has shut down cleanly, check for the following indication on the controller's OCP: The Reset light is lit continuously. Port lights 1, 2, 3 are also lit continuously. Obtain and place an ESD wrist strap around your wrist. Ensure that the strap fits snugly around your wrist. Attach or clip the other end of the ESD wrist strap to the cabinet grounding stud or a convenient cabinet grounding point. Remo
315. wrist strap around your wrist. Ensure that the strap fits snugly around your wrist. Attach or clip the other end of the ESD strap to the cabinet grounding stud or a convenient cabinet grounding point (nonpainted surface). Disconnect power cords from controller power supplies. If you do not already have a second controller power supply, this may be the time to add one. Using Table 3-6 as a guide, find the slot and the SCSI ID into which the controller is to be installed. Note that the second controller should be installed in the slot that corresponds to SCSI ID 6. Table 3-6 Controller installation guide: Controller, SW800 Front View, SW800 Rear View, SW500 Front & Rear View. First Controller, Second Controller: Left Side, Right Side, Bottom Slot, Top Slot; SCSI ID 6, SCSI ID 6, SCSI ID 6, SCSI ID 6. Right Side, Left Side, Top Slot, Bottom Slot; SCSI ID 7, SCSI ID 7, SCSI ID 7, SCSI ID 7. 11. Install the SBB battery module into a convenient device slot. See Figure 3-10. If you are replacing a single ECB with a dual ECB, follow these steps: a. Press the shutdown button on the single ECB until the LED stops flashing. b. Remove the ECB cable from the single ECB. c. Remove the single ECB from the device slot. d. Install the dual ECB into the device slot. Figure 3-10 Installing an SBB battery module.
316. y available for autoconfig 01 buffer allocation. 034A0100: Insufficient memory available for PUB 01 allocation. 034B0100: Insufficient memory available for DS init 01 buffer allocation. 034C0100: Insufficient memory available for static 01 structure allocation. 034D0100: DS init DWDs exhausted. 034E2080: Diagnostics report all device ports are broken. 03500100: Insufficient memory available for command 01 disk allocation. 03510100: Insufficient resources available for command 01 disk data region. 03520100: A failure resulted when an attempt was made to allocate a DWD for use by DS CDI. 035A0100: Invalid SCSI message byte passed to DS. 035B0100: Insufficient DWD resources available for 01 SCSI message pass through 1. 03640100: Processing run_switch disabled for LOGDISK 0 associated with the other controller. 03650100: Processing pub unblock for LOGDISK 01 associated with the other controller. 03660100: No memory available to allocate pub to tell 01 the other controller of reset to one of its LUNs. 03670100: No memory available to allocate pub to tell 01 the other controller of a bdr to one of its LUNs. Table A-5 Fault manager last failure codes. Last Fail Code 04010101 04020102 04030102 Explanation The requester id component of the instance code passed to FM REPORT_EVENT is larger than the maximum allowed for
317. y is functioning properly; however, one power supply on the associated bus has failed. Either there is no AC power to this supply or this power supply should be replaced. Legend: LED on, LED off. Table 3-13 shows all possible fault indications for the SW500 and the SW800 cabinet in a single power supply configuration. Table 3-13 Shelf and single power supply status indicators. Status Indicator: Shelf LED, System is operating normally. Power supply LED, Shelf and power supply fault. Replace power supply. Shelf LED, Power supply LED, Described in the Replace Section. Shelf LED, There is a shelf fault; there is no power supply fault. Power supply LED, Replace shelf blower. Legend: LED on, LED off. Table 3-14 shows all possible fault indications for the SW500 and the SW800 cabinets for a dual power supply configuration. Note: The status indicators will operate ONLY if the power supplies and the shelf blowers are present. The failure must be an electrical or mechanical failure. Table 3-14 Shelf and dual power supply status indicators. Status indicator, PS1, PS2, Indication: Shelf LED, Normal. Power supply LED, System is operating normally. Shelf LED, There is a shelf fault; there is no power supply. Power Supply LED, Replace shelf blower. Shelf LED, PS 1 is operational
318. y pressing and holding in the Reset button, then pressing the eject button next to the card. See Figure 2-13. Pull the card out and save it for use in the replacement controller module. 9. With a small flat-head screwdriver, remove the CI connector from the controller's front panel. 10. Disconnect the ECB cable from the cache module. While holding the battery cable in one hand, disable the ECB by pressing the battery disable switch. See Figure 2-14. Figure 2-14 Disconnecting the ECB cable. 11. Loosen the front bezel captive screws on the controller and cache module. 12. Use a gentle rocking motion to loosen the controller module from the shelf backplane. 13. Slide the controller module out of the shelf, noting which rails the module was seated in, and place it on an approved ESD mat. See Figure 2-15. 14. If necessary, remove the cache module and note in which rails it was seated. Place the module on an approved ESD mat. See Figure 2-15. Figure 2-15 Removing the controller and cache modules. Installing the new controller: 1. Reinstall the cache module if it has been removed. See Figure 2-16. 2. Use a gentle rocking motion to help seat the module into the backplane. Press firmly on th
