Sun Microsystems 3900 Series Computer Drive User Manual

Contents

1. TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for Switches (Continued)
   Columns: Cat, Component, EventType, Sev, Action, Description, Information/Action

   switch / port / StateChange  [Info/Action]
     Description: port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Available (status state changed from OFFLINE to ONLINE)
     Information: Port on switch is now available.

   switch / port / StateChange  Sev: Red  Action: Y  [Info/Action]
     Description: port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Not Available (status state changed from ONLINE to OFFLINE)
     Information: A port on the switch has logged out of the Fabric and has gone offline.
     Recommended action:
       1. Verify cables, GBICs, and connections along the Fibre Channel path.
       2. Check the Storage Automated Diagnostic Environment SAN Topology GUI to identify the failing segment of the data path.
       3. Verify the correct FC switch configuration.

   switch / enclosure / Statistics  [Info]
     Description: Statistics about switch d2-swb1 (ip=xxx.0.0.41) 10002000007a609
     Information: Port Statistics

   Chapter 6, Troubleshooting Sun StorEdge FC Switch-8 and Switch-16 Devices
   For Internal Use Only

   Replacing the Master Midplane
   Follow this procedure when replacing the master midplane in a Sun StorEdge network FC switch-8 or switch-16 switch, or a Brocade Silkworm switch. This procedure is detailed in the Storage Automated Diagnostic Environment User's Guide.

   To Replace the Master Midplane
     1. Choose Maintenance > General Maintenance > Maint
2. Columns: Category, Component, EventType, Sev, Action, Description

   virtualization engine / ve_diag / Diagnostic Test  Sev: Red
     Description: ve_diag (diag240) on ve-1 (ip=xxx.20.67.213) failed

   virtualization engine / veluntest / Diagnostic Test  Sev: Red
     Description: veluntest (diag240) on ve-1 (ip=xxx.20.67.213) failed

   virtualization engine / enclosure / Discovery  [Info]
     Description: Discovered a new Virtualization Engine called v1a
     Information: Discovery events occur the first time the agent probes a storage device and creates a detailed description of the device monitored. The discovery device sends it using any active notifier, such as NetConnect or email.

   Chapter 7, Troubleshooting Virtualization Engine Devices
   Sun StorEdge 3900 and 6900 Series Troubleshooting Guide, March 2002

   CHAPTER 8: Troubleshooting the Sun StorEdge T3 Array Devices
   This chapter contains the following sections:
     - Explorer Data Collection Utility
     - Sun StorEdge T3 Array Event Grid

   Explorer Data Collection Utility
   The Explorer Data Collection Utility script is included on the Storage Service Processor in the /export/packages directory. The Explorer Data Collection Utility is not installed by default, but it can be installed during rack setup. Customer-specific site information can be entered at that time.

   To Install Explorer Data Collection Utility on the Storage Service Processor
     cd /export/packages
     pkgadd -d S
3. SRN after Corrective SRN SNMP Description Corrective Action Action 70020 - SAN topology has changed Check SAN cabling and connections 70020 170030 - Global SAN configuration has between Sun StorEdge T3 array 70030 70050 changed and virtualization engine 70051 70021 - SAN configuration has changed Perform Sun StorEdge T3 array - A physical device is missing failback if necessary 70025 Partner's virtualization engine's IP is not Check Ethernet cabling and None reachable connections 70020 - SAN topology has changed - Check cabling and connections 70020 70030 - Global SAN configuration has between virtualization engine 70030 70050 changed - Cycle power on failed 70050 70025 - SAN configuration has changed virtualization engine if fault LED 70024 70021 - Partner virtualization engine's IP is flashes 70021 70022 not reachable - Perform Sun StorEdge T3 70022 - A physical device is missing array failback if necessary - A SLIC virtualization engine is - Enable VERITAS path Readings ee 12097 When error halt on virtualization engine not master - SLIC daemon connection is inactive 72000 Failed to check for SAN changes daemon error check the SLIC virtualization engine - Secondary daemon connection is active Sun StorEdge T3 array LUN Failover Sun StorEdge T3 array LUN Failback TABLE A
4. Rebuild is aborted with write error. This means the primary drive cannot write to the drive being built. Read error is reported by follower. If the initiator is master, then its follower has detected a read error on a member within a mirror drive. Read error is detected by master. If the initiator is master, then it has detected a read error on a member within a mirror drive. CleanUp configuration table is completed. SAN physical configuration has changed. Internal error. Update firmware. If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive. If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive. If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive. If unintentional, check condition of drives. 70021 Drive is offline. If unintentional, check condition of drives. 70022 70023 70024 70025 virtualization engine is offline. Drive is unresponsive. For Sun StorEdge T3 array pack: Master virtualization engine has detected the partner virtualization engine's IP Address. For Sun StorEdge T3 array pack: Master virtualization engine is unable to detect the partner virtualization en
5. Columns: Count Type, Description

   Link Failure Count: The number of times the virtualization engine's frame manager detects a non-operational state or other failure of N_Port initialization protocol.
   Loss of Synchronization Count: The number of times that the virtualization engine detects a loss in synchronization.
   Loss of Signal Count: The number of times that the virtualization engine's frame manager detects a loss of signal.
   Primitive Sequence Protocol Error: The number of times that the virtualization engine's frame manager detects N_Port protocol errors.
   Invalid Transmission Word: The number of times that the virtualization engine's 8b/10b decoder does not detect a valid 10-bit code.
   Invalid CRC Count: The number of times that the virtualization engine receives frames with a bad CRC and a valid EOF. A valid EOF includes EOFn, EOFt, or EOFdti.

   To Check Fibre Channel Link Error Status Manually
   The Storage Automated Diagnostic Environment, which runs on the Storage Service Processor, monitors the Fibre Channel link status of the virtualization engine. The virtualization engine must be power cycled to reset the counters. Therefore, you should manually check how many errors accumulate over a fixed period of time. To check the status manually, follow these steps:
     Use the svstat command to take a reading as s
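The manual check described above amounts to taking two readings a fixed interval apart and subtracting them, because the counters only reset on a power cycle. The following is a minimal sketch in Python, assuming the six counters from the table have already been copied by hand into dictionaries. The key names and the sample values are hypothetical, and nothing here parses svstat output, whose format is not shown in this excerpt.

```python
# Hypothetical key names for the six counters described in the table above.
COUNTERS = (
    "link_failure",
    "loss_of_sync",
    "loss_of_signal",
    "primitive_seq_protocol_error",
    "invalid_transmission_word",
    "invalid_crc",
)


def counter_deltas(first, second):
    """Return how much each counter grew between two readings."""
    deltas = {}
    for name in COUNTERS:
        growth = second[name] - first[name]
        if growth < 0:
            # Counters only reset on a power cycle, so a drop means the
            # two readings are not comparable.
            raise ValueError(f"{name} decreased; was the engine power cycled?")
        deltas[name] = growth
    return deltas


# Made-up sample readings, for illustration only.
reading_1 = dict.fromkeys(COUNTERS, 0)
reading_2 = dict(reading_1, invalid_crc=3, loss_of_sync=1)

# Keep only the counters that actually grew during the interval.
growing = {k: v for k, v in counter_deltas(reading_1, reading_2).items() if v}
print(growing)
```

Reporting only the counters that grew keeps attention on errors that are still accumulating rather than on totals left over from earlier incidents.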
6. Virtualization Engine Event Grid
   The Storage Automated Diagnostic Environment Event Grid enables you to sort virtualization engine events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, whether action is required, a description of the event, and the recommended action. Refer to the Storage Automated Diagnostic Environment User's Guide Help section for more information.

   Using the Virtualization Engine Event Grid
     1. From the Storage Automated Diagnostic Environment Help menu, click the Event Grid link.
     2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in FIGURE 7-7.

   FIGURE 7-7 Virtualization Engine Event Grid
7. FIGURE 5-1 Host Event Grid

   TABLE 5-1 lists all the host events in the Storage Automated Diagnostic Environment.

   TABLE 5-1 Storage Automated Diagnostic Environment Event Grid for the Host
   Columns: Category, Component, EventType, Sev, Action, Description, Information

   host / hba / Alarm  Sev: Yellow  [Info]
     Description: status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com changed from NOT CONNECTED to CONNECTED
     Information: Monitors changes in the output of luxadm -e port.

   host / hba / Alarm  Sev: Red  Action: Y  [Info]
     Description: status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
     Information:
       - Monitors changes in the output of luxadm -e port
       - Found path to 20 HBA ports

   host / lun.t300 / Alarm  Sev: Red  Action: Y  [Info]
     Description: The state of lun.T300.c14t50020F2300003EE5d0s2.statusA on diag.xxxxx.xxx.com changed from OK to ERROR (target=t3:diag244-t3b0/90.0.0.40)
     Information: luxadm display reported a change in the port status of one of its paths. The Storage Automated Diagnostic Environment then tries to find to which enclosure this path corresponds by reviewing its database of Sun StorEdge T3 array
8. Jan 8 17:34:36 WWN:2b000060220041f4 diag.xxxxx.xxx.com fp: [ID 517869 kern.warning] WARNING: fp(0): N_x Port with D_ID=108000, PWWN=2b000060220041f4 disappeared from fabric
   <snip>
   multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to target address: 2b000060220041f4,1 is offline
   Jan 8 17:34:55 WWN:2b000060220041f4 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to target address: 2b000060220041f4,0 is offline

   FIGURE 3-4 A2/B2 FC Link Host-Side Event

   Chapter 3, Troubleshooting the Fibre Channel Links

   Site: FSDE LAB Broomfield CO
   Source: diag.xxxxx.xxx.com
   Severity: Normal
   Category: Switch
   Key: switch:100000c0dd0061bb
   EventType: StateChangeEvent.X.port.1
   EventTime: 01/08/2002 17:38:32
   port.1 in SWITCH diag-sw1b (ip=192.168.0.31) is now Unknown (state changed from Online to Admin)

   Site: FSDE LAB Broomfield CO
   Source: diag.xxxxx.xxx.com
   Severity: Normal
   Category: San
   Key: switch:100000c0dd0061bb:1
   EventType: LinkEvent.ITW.switch|ve
   EventTime: 01/08/2002 17:39:47
   Destination: port 1 on ve diag-v1b 29000060220041f4
   Info: This could indicate a potential problem.
   Cause/Action: tests associated with this link segment

   FIGURE 3-5 A2/B2 FC Link Storage Service Processor-Side Even
9. grep savevemap or listavailable -v, which returns the status of individual virtualization engines.

   Another virtualization engine command is updating the configuration.
     Try listavailable -v, which returns the status of individual virtualization engines, and check for the lock file directly by using ls -la /opt/SUNWsecfg/etc (look for v1.lock or v2.lock). If the lock is set in error, use the removelocks -v command to clear it.

   Common to Virtualization engine
     Message: Unable to start slicd on vepair
     Action: Try running startslicd and then showlogs -e 50 to determine why startslicd couldn't start the daemon. You might have to reset or power off the virtualization engine if the problem persists.

   Common to Virtualization engine
     Message: Cannot execute command. Login failed. The environment variable VEPASSWD might be set to an incorrect value. Try again.
     Action: A password is required to log in to the virtualization engine. The utility uses the VEPASSWD environment variable to log in. Set the VEPASSWD environment variable with the proper value.

   Common to Virtualization engine
     Message: After resetting the virtualization engine, the VENAME is unreachable.
     Action: Be aware that after a reset it takes approximately 30 seconds to boot. The hardware might be faulty. Check the IP address and netmask that have been assigned to the virtualization engine hardware.
10.   2. Verify that the power cooling unit state is in fru stat.
      3. Replace the PCU if necessary.

   t3 / enclosure / Alarm.log  Sev: Red  Action: Y  [Info/Action]
     Description: Errors found in logfile /var/adm/messages.t3
     Information: This event includes all important errors found.
     Recommended action: Check the messages file for appropriate action.

   t3 / enclosure / Alarm.time  Sev: Yellow  [Action]
     Description: Time of T3 diag213 (ip=xxx.20.67.213) is different from host. T3: Fri Oct 26 10:16:17 200, Host: 2001-10-26 12:21:04
     Information: Discrepancy
     Recommended action: Fix the date and time on the Sun StorEdge T3 array using the date command. Date and time should be the same as the monitoring host.

   t3 / enclosure / Audit  [Info]
     Description: Auditing a new Sun StorEdge T3 array called ras d2-t3b1 (ip=xxx.0.0.41) slr-mi.370-3990-01-e-e1.003239
     Information: Audits occur every week and send a detailed description of the enclosure to the Sun Network Storage Command Center (NSCC).

   t3 / ib / Comm_Established.InBand  [Info]
     Description: Communication regained (InBand (ccadieux)) with diag213 (ip=xxx.20.67.213), last reboot was 2001-09-27 15:22:00
     Information: InBand communication

   t3 / oob / Comm_Established.OutOfBand  [Info]
     Description: Communication regained (OutOfBand) with diag213 (ip=xxx.20.67.213)
     Information: OutOfBand communications
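The time-discrepancy alarm above compares the array clock with the monitoring host clock. As a rough illustration, the gap in that sample event can be computed as follows. This is a sketch under stated assumptions: the two timestamp layouts are guesses (separators were lost in extraction), and the array timestamp's year is assumed to be 2001 to match the host date, since the excerpt shows it cut off.

```python
from datetime import datetime

# Assumed timestamp layouts; the original separators are not preserved.
T3_FORMAT = "%a %b %d %H:%M:%S %Y"    # e.g. Fri Oct 26 10:16:17 2001
HOST_FORMAT = "%Y-%m-%d %H:%M:%S"     # e.g. 2001-10-26 12:21:04


def time_discrepancy_seconds(t3_stamp, host_stamp):
    """Return the absolute clock difference between array and host, in seconds."""
    t3_time = datetime.strptime(t3_stamp, T3_FORMAT)
    host_time = datetime.strptime(host_stamp, HOST_FORMAT)
    return abs((host_time - t3_time).total_seconds())


# Year 2001 on the array timestamp is an assumption (see lead-in).
gap = time_discrepancy_seconds("Fri Oct 26 10:16:17 2001", "2001-10-26 12:21:04")
print(f"array and host clocks differ by {gap:.0f} seconds")
```

With these inputs the gap is 7487 seconds, a little over two hours, which is the kind of drift the recommended date command on the array is meant to correct.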
11. 2b000060220041f9 2b000060220041f4 080c Unsupported 102400.000 MBytes Enabled Enabled 0x0 0x0 Disk device
    /dev/rdsk/c6t29000060220041F96257354230303052d0s2
    /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
    /devices/pci@6,4000/SUNW,qlc@3/fp@0,0  2b000060220041f9,0  primary  ONLINE
    /devices/pci@6,4000/SUNW,qlc@2/fp@0,0  2b000060220041f4,0  primary  OFFLINE
    Condition unknown CONNECTED CONNECTED
    luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2

    Note: You can find procedures for restoring virtualization engine settings in the Sun StorEdge 3900 and 6900 Series Reference Manual.

    To Verify the A2/B2 FC Link
    You can check the A2/B2 FC link using the Storage Automated Diagnostic Environment Diagnose Test from Topology functionality. The Storage Automated Diagnostic Environment's implementation of diagnostic tests verifies the operation of user-selected components. Using the Topology view, you can select specific tests, subtests, and test options. Refer to Chapter 5 of the Storage Automated Diagnostic Environment User's Guide for more information.

    FRU Tests Available for A2/B2 FC Link Segment
      - The linktest is not available.
      - The switch and/or GBIC switchtest test:
          - Can be used only in conjunction with the loopback connector
          - Cannot be cabled to the virtualization engine while swi
12. > System > System > Timeouts. The current default timeouts are 10 seconds for ping and 60 seconds for http tokens.

    Columns: Category, Component, EventType, Sev, Action, Description, Information

    t3 / t3ofdg / Diagnostic Test  Sev: Red
      Description: t3ofdg (diag240) on diag213 (ip=xxx.20.67.213) failed

    t3 / t3test / Diagnostic Test  Sev: Red
      Description: t3test (diag240) on diag213 (ip=xxx.20.67.213) failed

    t3 / t3volverify / Diagnostic Test  Sev: Red
      Description: t3volverify (diag240) on diag213 (ip=xxx.20.67.213) failed

    t3 / enclosure / Discovery  [Info]
      Description: Discovered a new Sun StorEdge T3 array called ras d2-t3b1 (ip=xxx.0.0.41) slr-mi.370-3990-01-e-e1.003239
      Information: Discovery events occur the first time the agent probes a storage device. The Discovery event creates a detailed description of the device monitored and sends it using any active notifier, such as NetConnect or Email.

    t3 / controller / Insert Component  [Info]
      Description: controller.u1ctr was added to T3 diag213 (ip=xxx.20.67.213)
      Information: A new Controller, as identified by its serial number, has been installed on the Sun StorEdge T3 array.

    t3 / disk / Insert Component
      Description: disk.u2d3 (SEAGATE ST318203FSUN18G LRGO7139) was added to diag158 (ip=xxx.20.67.158)

    t3 / interface.loopcard / Insert Component  [Info]
      Information: A new LoopCard, as identified by its serial number, has been installed on the Su
13. of LED off

    TABLE 7-2 lists the status LED code descriptions.

    TABLE 7-2 LED Service and Diagnostic Codes
    0 wo N 10
      - Fast blink
      - LED blinks once
      - LED blinks twice with one short duration (one second) between blinks
      - LED blinks three times with one short duration (one second) between blinks
      - LED blinks ten times with one short duration (one second) between blinks
    The blink code repeats continuously, with a four-second off interval between code sequences.

    Back Panel Features
    The back panel of the virtualization engine contains the Sun StorEdge network FC switch-8 or switch-16 switches and a socket for the AC power input, and various data ports and LEDs.

    Ethernet Port LEDs
    The Ethernet port LEDs indicate the speed, activity, and validity of the link, shown in TABLE 7-3.

    TABLE 7-3 Speed, Activity, and Validity of the Link
    Columns: LED, Color, State, Description
      Speed          Amber  Solid On  The link is 100Base-TX.
                            Off       The link is 10Base-T.
      Link Activity  Green  Solid On  A valid link is established.
                            Blink     Normal operations, including data activity.

    Fibre Channel Link Error Status Report
    The virtualization engine's host-side and device-side interfaces provide statistical data for the counts listed in TABLE 7-4.

    TABLE 7-4 Virtualization Engine Statistical Data
14. t3 / disk.port / Alarm  Sev: Red  Action: Y  [Info/Action]
      Description: The state of disk.u1d1.PortState on Sun StorEdge T3 array t300 changed from OK to failed
      Information: The Sun StorEdge T3 array has reported that one port of a dual-ported disk has failed.
      Recommended action:
        1. Telnet to affected Sun StorEdge T3 array.
        2. Verify disk state in fru stat, fru list, and vol stat.

    t3 / interface / Alarm  Sev: Red  Action: Y  [Info/Action]
      Description: The
      Information: The Sun StorEdge T3 array has reported that a loopcard is in a failed state.
      Recommended action:
        1. Telnet to affected Sun StorEdge T3 array.
        2. Verify the loopcard state with fru stat.
        3. Verify the matching firmware with the other loopcard.
        4. Re-enable the loopcard if possible (enable u encid 1 2). Replace loopcard if necessary.
        5. Re-enable the disk if possible.
        6. Replace the disk if necessary.

    Columns: Category, Component, EventType, Sev, Action, Description, Information

    t3 / power / Alarm  Sev: Red  Action: Y  [Info/Action]
      Description: The state of power.u1pcu1.BatState on diag213 (ip=xxx.20.67.213) is Fault
      Information: The state of the batteries in the Sun StorEdge T3 array is not optimal.
      Possible causes are:
        1. Voltage level on power supply and battery have moved out of acceptable
      Recommended action:
        1. Telnet to the affected Sun StorEdge T3 array.
        2. Run refresh -s
15. to access the LUN on the Alt Master, the Sun StorEdge T3 array I/O could travel:
      - From HBA 0 > Switch > SVE 1 > Switch > Alt Master Controller (Primary Route from HBA 0)
      - From HBA 0 > Switch > SVE 1 > Switch > Switch > Master Controller > Backend Loop to Alt Master (Secondary Route from HBA 0)
      - From HBA 1 > Switch > SVE 2 > Switch > Switch > Alt Master Controller (Primary Route from HBA 1)
      - From HBA 1 > Switch > SVE 2 > Switch > Master Controller > Backend Loop to Alt Master (Secondary Route from HBA 1)

    The virtualization engine recognizes the primary (active) and secondary (passive) pathing for the LUNs and routes the I/O to the primary controller unless there is a pathing failure to the primary path. In this case, the virtualization engine initiates a LUN failover and routes the I/O through the secondary path, which in turn goes through the interconnect cables. Refer to FIGURE 7-6.

    The host, using multipathing software, is presented two primary (active) paths for each LUN, allowing the host to route I/O through either or both HBAs. In the event of a path failure before the second tier of Sun StorEdge network FC switch-8 and switch-16 switches (refer to FIGURE 7-5), one of the paths is disabled, but the other path continues sending I/O as normal and takes over the entire load. No Sun StorEdge T3 array LUN failure is noted because of the redundant
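The primary/secondary routing rule described above can be pictured as a simple lookup: use the primary route unless the primary path has failed, in which case fall back to the secondary route. The sketch below is only an illustration of that rule, not how the virtualization engine is implemented. The hop lists are copied from the four routes listed above; the dictionary layout and the function name are assumptions.

```python
# Hop lists copied from the routes above; the data layout is hypothetical.
ROUTES = {
    "HBA 0": {
        "primary": ["Switch", "SVE 1", "Switch", "Alt Master Controller"],
        "secondary": ["Switch", "SVE 1", "Switch", "Switch",
                      "Master Controller", "Backend Loop to Alt Master"],
    },
    "HBA 1": {
        "primary": ["Switch", "SVE 2", "Switch", "Switch",
                    "Alt Master Controller"],
        "secondary": ["Switch", "SVE 2", "Switch",
                      "Master Controller", "Backend Loop to Alt Master"],
    },
}


def choose_route(hba, primary_path_ok):
    """Pick the primary route unless the primary path has failed."""
    kind = "primary" if primary_path_ok else "secondary"
    return kind, ROUTES[hba][kind]


# Normal operation, then a primary-path failure that triggers LUN failover.
for healthy in (True, False):
    kind, hops = choose_route("HBA 0", healthy)
    print(kind, ":", " > ".join(["HBA 0"] + hops))
```

In both secondary routes the I/O reaches the Alt Master through the Master Controller and the backend loop, which is why a failover sends traffic over the interconnect cables.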
16. using the set switchs1 command. View and verify this nonstandard configuration setup, as required, using the showswitch command. Refer to the Sun StorEdge 3900 and 6900 Series Reference Manual for detailed configuration information.
      - INFO: The chassis ID on the switch is not set to the default value. This could be caused by unique ID settings or by conflicts in a SAN environment.
      - INFO: Ports are identified that are not in the default hard zone. This could be because the port is set to the same hard zone as the cascaded switch in a SAN environment.
    NOTE: If multiple solutions are connected to a switch, the switch settings might not match the default settings.

    Appendix B, SUNWsecfg Error Messages
17. 3 Port Communication
    Columns: Port, Port, Port Number
      Daemon                 Management Programs    20000
      Daemon                 Daemon                 20001
      Daemon                 virtualization engine  25000
      virtualization engine  virtualization engine  25001

    TABLE A-4 provides service codes for the virtualization engine.

    TABLE A-4 Service Codes
    Columns: Code Number, Cause, Corrective Action

    005  Cause: PCI bus parity error
         Action: Replace virtualization engine.
    24   Cause: The attempt to report one error resulted in another error.
         Action: Cycle power to the virtualization engine.
    40   Cause: Corrupt database
         Action: Clear SAN database. Cycle power to the virtualization engine. Import SAN zone configuration.
    41   Cause: Corrupt database
         Action: Clear SAN database. Cycle power to the virtualization engine. Import SAN zone configuration.
    42   Cause: Zone mapping database
         Action: Import SAN zone configuration.
    050  Cause: This message indicates that an attempt to write a value into non-volatile storage failed. It could be a hardware failure, or it could be that one of the databases stored in Flash memory could not accept the entry being added.
         Action: Clear the SAN database. Cycle power to the virtualization engine.
    051  Cause: Cannot erase FLASH memory
         Action: Replace virtualization engine.
    53   Cause: Unauthorized cabling configuration
         Action: Check cabling. Ensure server/switch connects to host side and storage connects to device side of virtualization engine. If necessary, clear SAN database. If necessary, cycle virtualization engine power. If necessary, import
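A service-code table like the one above is often easier to use in the field as a lookup. The sketch below encodes only the rows that are complete in this excerpt (code 53 is cut off and is left out), keeps the codes as strings exactly as printed, and shortens the cause text for code 050. The dictionary and function names are hypothetical.

```python
# Rows copied from the service-code table above; code 53 is omitted because
# its corrective action is truncated in the excerpt.
SERVICE_CODES = {
    "005": ("PCI bus parity error",
            ["Replace virtualization engine"]),
    "24": ("The attempt to report one error resulted in another error",
           ["Cycle power to the virtualization engine"]),
    "40": ("Corrupt database",
           ["Clear SAN database",
            "Cycle power to the virtualization engine",
            "Import SAN zone configuration"]),
    "41": ("Corrupt database",
           ["Clear SAN database",
            "Cycle power to the virtualization engine",
            "Import SAN zone configuration"]),
    "42": ("Zone mapping database",
           ["Import SAN zone configuration"]),
    "050": ("Write to non-volatile storage failed",  # shortened cause text
            ["Clear the SAN database",
             "Cycle power to the virtualization engine"]),
    "051": ("Cannot erase FLASH memory",
            ["Replace virtualization engine"]),
}


def describe(code):
    """Return a short cause/action summary, or a fallback for unknown codes."""
    if code not in SERVICE_CODES:
        return f"{code}: not in this excerpt of the table"
    cause, actions = SERVICE_CODES[code]
    return f"{code}: {cause} -> " + "; ".join(actions)


print(describe("42"))
print(describe("99"))
```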
18. 55 4e 20 20 20 20 20                               SUN
    53 45 53 53 30 31 20 20 20 20 20 20 20 20 20 20    SESS01
    30 38 30 45 62 57 33 4b 30 30 31 48 30 30 30       080EbW3K001H000
    Vendor: SUN
    Product: SESS01
    Revision: 080E
    Removable media: no
    Device type: 0

    From this screen, note that the VLUN number is 62 57 33 4b 30 30 31 48, beginning with the 5th pair of numbers on the 3rd line, up to and including the 12th pair.

    Chapter 7, Troubleshooting Virtualization Engine Devices

    Sun StorEdge Traffic Manager-Enabled Devices
      1. If the devices support the Sun StorEdge Traffic Manager software, you can use this shortcut.
      2. Type:
         luxadm display /dev/rdsk/c6t29000060220041956257334B30303148d0s2

    DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041956257334B30303148d0s2
      Status(Port A):        O.K.
      Status(Port B):        O.K.
      Vendor:                SUN
      Product ID:            SESS01
      WWN(Node):             2a00006022004195
      WWN(Port A):           2b00006022004195
      WWN(Port B):           2b00006022004186
      Revision:              080E
      Serial Num:            Unsupported
      Unformatted capacity:  56320.000 MBytes
      Write Cache:           Enabled
      Read Cache:            Enabled
      Minimum prefetch:      0x0
      Maximum prefetch:      0x0
      Device Type:           Disk device
      Path(s):
      /dev/rdsk/c6t29000060220041956257334B30303148d0s2
      /devices/scsi_vhci/ssd@g29000060220041956257334b30303148:c,raw
        Controller      /devices/pci@1f,4000/SUNW,qlc@4/fp@0,0
        Device Address  2b00006022004195,0
        Class           primary
        State           ONLINE
        Controller      /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0
        Device A
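Picking the VLUN number out of the hex dump above is a matter of slicing the 5th through 12th byte pairs from the third line. The following is a small sketch of that step in Python; the function name is hypothetical, and the input is the third dump line exactly as shown in the excerpt.

```python
# Third line of the hex dump shown above.
DUMP_LINE_3 = "30 38 30 45 62 57 33 4b 30 30 31 48 30 30 30"


def vlun_from_dump_line(line):
    """Return the VLUN byte pairs (5th through 12th) and their ASCII text."""
    pairs = line.split()
    vlun_pairs = pairs[4:12]  # zero-based slice for pairs 5 to 12 inclusive
    ascii_text = bytes.fromhex("".join(vlun_pairs)).decode("ascii")
    return " ".join(vlun_pairs), ascii_text


vlun_hex, vlun_text = vlun_from_dump_line(DUMP_LINE_3)
print(vlun_hex)   # 62 57 33 4b 30 30 31 48
print(vlun_text)  # bW3K001H

# The same bytes, upper-cased and joined, appear inside the device name.
device = "c6t29000060220041956257334B30303148d0s2"
print(vlun_hex.replace(" ", "").upper() in device)
```

The last check shows why the Traffic Manager shortcut works: the VLUN bytes are embedded in the device name itself, so they can be read without the dump.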
19. Environment device monitoring reports
      - Run the SEcfg script, which displays the Sun StorEdge T3 array configuration
      - LED status, online/offline, and POST error codes, found in the Sun StorEdge network FC switch-8 and switch-16 switch Installation and Configuration Guide
      - Explorer Data Collection Utility output, located on the Storage Service Processor
      - SANsurfer GUI
      Note: To run the SANsurfer GUI from the Storage Service Processor, you must export X Display.

    5. Check the status of the virtualization engine using one or more of the following methods:
      - Storage Automated Diagnostic Environment device monitoring reports
      - Run the SEcfg script, which displays the virtualization engine
      - Refer to the LED status blink codes in Chapter 7

    6. Quiesce the I/O along the path to be tested as follows:
      - For installations using VERITAS VxDMP, disable vxdmpadm
      - For installations using the Sun StorEdge Traffic Manager software, unconfigure the Fabric device
      - Refer to To Quiesce the I/O
      - Halt the application

    7. Test and isolate the FRUs using the following tools:
      - Storage Automated Diagnostic Environment diagnostic tests (this might require the use of a loopback cable for isolation)
      - Sun StorEdge T3 array tests, including t3test(1M), t3ofdg(1M), and t3volverify(1M), which can be found in the Storage Automated Diagnostic Environment User's Guide
      Note: These tests isolate the probl
20. FRU, run mpdrive failback if needed.

    Sun StorEdge T3 Array Event Grid
    The Storage Automated Diagnostic Environment Event Grid enables you to sort Sun StorEdge T3 array events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, whether action is required, a description of the event, and the recommended action. Refer to the Storage Automated Diagnostic Environment User's Guide for more information.

    Using the Sun StorEdge T3 Array Event Grid
      1. From the Storage Automated Diagnostic Environment Help menu, click the Event Grid link.
      2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in FIGURE 8-5.
21. Initiator Online Number of VLUNs
      Undefined  210000E08B033401  100001  Yes  0
      Undefined  210000E08B026C0F  100002  Yes  0

    Note: This example uses the virtualization engine map file, which could include old information.

    2. You can optionally establish a telnet connection to the virtualization engine and run the runsecfg utility to poll a live snapshot of the virtualization engine map. Refer to To Replace a Failed Virtualization Engine for telnet instructions.

    Determining the virtualization engine pairs on the system

    MAIN MENU - SUN StorEdge 6910 SYSTEM CONFIGURATION TOOL
      1) T3 Configuration Utility
      2) Switch Configuration Utility
      3) Virtualization Engine Configuration Utility
      4) View Logs
      5) View Errors
      6) Exit
    Select option above:> 3

    VIRTUALIZATION ENGINE MAIN MENU
      1) Manage VLUNs
      2) Manage Virtualization Engine Zones
      3) Manage Configuration Files
      4) Manage Virtualization Engine Hosts
      5) Help
      6) Return
    Select option above:> 3

    MANAGE CONFIGURATION FILES MENU
      1) Display Virtualization Engine Map
      2) Save Virtualization Engine Map
      3) Verify Virtualization Engine Map
      4) Help
      5) Return
    Select configuration option above:> 1
    Do you want to poll the live system (time consuming) or view the file l f 1

    From the virtualization engine map output, you can match the VLUN serial number to
22. Link linktest
    Running linktest from the Storage Automated Diagnostic Environment GUI will guide the Service Engineer to discover the failed FRU. Once the test has completed its run, an email message similar to the following message will be sent to the Email recipient that was specified in

    linktest running on diag.xxxxx.xxx.com
    linktest started on FC interconnect: switch to switch
    switchtest started on switch 100000c0dd00b682 port 8
    Estimated test time 14 minute(s)
    01/30/02 11:21:26 diag209 Storage Automated Diagnostic Environment MSGID 6013 switchtest.FATAL switch0: Device Switch Port 8 is Offline
    switchtest failed
    Remove FC Cable from switch 100000c0dd00b682 port 8
    Insert FC loopback cable into switch 100000c0dd00b682 port 8
    Continue Isolation
    switchtest started on switch 100000c0dd00b682 port 8
    Estimated test time 14 minute(s)
    01/30/02 11:22:11 diag209 Storage Automated Diagnostic Environment MSGID 6013 switchtest.FATAL switch0: Device Switch Port 8 is Offline
    switchtest failed
    Remove FC loopback cable from switch 100000c0dd00b682 port 8
    Insert a NEW FC GBIC into switch 100000c0dd00b682 port 8
    Insert FC loopback cable into switch 100000c0dd00b682 port 8
    Continue Isolation
    switchtest started on switch 100000c0dd00b682 port 8
    Estimated test time 14 minute(s)
    01/30/02 11:25:12 diag209 Storage Automated Diagnostic Environment MSGID 4001 switchtest.WARNING switch0: Maximum tran
23. as examples.

    qlctest(1M)
    The qlctest(1M) comprises several subtests that test the functions of the Sun StorEdge PCI dual Fibre Channel (FC) host adapter board. This board is an HBA that has diagnostic support. This diagnostic test is not scalable.

    CODE EXAMPLE 2-1 qlctest(1M)

    /opt/SUNWstade/Diags/bin/qlctest -v -o "dev=/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"
    qlctest: called with options: dev=/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable
    qlctest: Started.
    Program Version is 4.0.1
    Testing qlc0 device at /devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl.
    QLC Adapter Chip Revision 1, Risc Revision 3, Frame Buffer Revision 1029, Riscrom Revision 4, Driver Revision 5 a 2 1 15
    Running ECHO command test with pattern 0x7e7e7e7e
    Running ECHO command test with pattern 0x1e1e1e1e
    Running ECHO command test with pattern 0xf1f1f1f1
    <snip>
    Running ECHO command test with pattern 0x4a4a4a4a
    Running ECHO command test with pattern 0x78787878
    Running ECHO command test with pattern 0x25252525
    FCODE revision is ISP2200 FC-AL Host Adapter Driver: 1.12 01/01/16
    Firmware revision is 2.1.7f
    Running CHECKSUM check
    Running diag selftest
    qlctest: Stopped successfully.

    Chapter 2, General Troubleshooting Procedures
24. cause SCSI errors on the data host and a brief suspension of I/O while the failover occurs.

    To Return the Path to Production
      1. Type: vxdmpadm enable ctlr=<c>
      2. Verify that the path has been re-enabled by typing: vxdmpadm listctlr all

    Fibre Channel Links
    The following sections provide troubleshooting information for the basic components and Fibre Channel links listed in TABLE 2-1.

    TABLE 2-1
    Columns: Link, Provides Fibre Channel Link Between these Components
      A1 to B1  Data host, sw1a, and sw1b
      A2        sw1a and v1a
      B2        sw1b and v1b
      A3        v1a and sw2a
      B3        v1b and sw2b
      A4        Master Sun StorEdge T3 array and the "A" path switch
      B4        AltMaster Sun StorEdge T3 array and the "B" path switch
      T1 to T2  sw2a and sw2b (Sun StorEdge 6900 series only)

    Note: In an actual Sun StorEdge 3900 or 6900 series configuration, there could be more Sun StorEdge T3 arrays than are shown in FIGURE 2-1 and FIGURE 2-2.

    By using the Storage Automated Diagnostic Environment, you should be able to isolate the problem to one particular segment of the configuration. The information found in this section is based on the assumption that the Storage Automated Diagnostic Environment is running on the data host and that it is configured to monitor host errors. If the Storage Automate
25. data

    SRN | Description | Corrective Action
    71000 | Virtualization engine to virtualization engine communication has recovered. |
    71001 | This is a generic error code for the SLIC. It signifies communication problems between the virtualization engine and the Daemon. | Check the condition of the virtualization engine. Check the cabling between the virtualization engine and Daemon server. Error halt mode also forces this SRN.
    71002 | This indicates that the SLIC was busy. | Check the condition of the virtualization engine. Check the cabling between the virtualization engine and the Daemon server. Error halt mode also forces this SRN.
    71003 | SLIC Master unreachable. | Check conditions of the virtualization engines in the SAN.
    71010 | The status of the SLIC daemon has changed. |
    72000 | Primary/Secondary SLIC daemon connection is active. |
    72001 | Failed to read SAN drive configuration. |
    72002 | Failed to lock on to SLIC daemon. |
    72003 | Failed to read SAN SignOn Information. |
    72004 | Failed to read Zone configuration. |
    72005 | Failed to check for SAN changes. |
    72006 | Failed to read SAN event log. |
    72007 | SLIC daemon connection is down. | Wait for 1-5 minutes for backup daemon to come up. If it doesn't, check the network connection for virtualization engine halt or hardware failure.

TABLE A-2 SRN/SNMP Single Point of Failure Table
26. device, the data host multipathing software is responsible for initiating the failover and reports it in /var/adm/messages, such as those reported by the Storage Automated Diagnostic Environment email notifications. The luxadm failover command is used to fail the Sun StorEdge T3 array LUNs back to the proper configuration after the failing FRU is replaced. This command is issued from the data host.

Sun StorEdge 6900 Series

In a Sun StorEdge 6900 series device, the virtualization engine pairs handle the failover, and the failover is not noted on the data host. All paths would remain ONLINE and ACTIVE. The mpdrive failback command is used, and it is issued from the Storage Service Processor.

Note: In the event of a complete sw1b or sw2b failure in a Sun StorEdge 6900 series configuration, the virtualization engine pairs handle the failover. In addition, the multipathing software notes a path failure on the data host. Sun StorEdge Traffic Manager or VxDMP takes the entire path that was connected to the failed switch offline, and the ISL ports on the surviving switch go offline as well.

To verify the failover, luxadm display can be used. The failed path will be marked OFFLINE, as shown in CODE EXAMPLE 3-7.

CODE EXAMPLE 3-7 Failed Path Marked OFFLINE

    luxadm display /dev/rdsk/c26t60020F200000644>
    DEVICE PROPERTIES for disk: /dev/rdsk/c26t60020F2000006443
27. [Flattened event grid table; the recoverable entries are grouped by column below.]

    Category: Sun Switch; Sun StorEdge T3 array; Tape; Virtualization engine
    Component: Interface; LUN; Port; Power
    EventType: Audit; CommunicationEstablished; CommunicationLost; Discovery; Heartbeat; Insert Component; Location Change; Patch Info; Quiesce End; Remove Component; State Change from offline to online; State Change from online to offline; Statistics; Backup
    Severity: Yellow (Alert, Warning); Down (System Down)
    Action: SRS; N (This event is non-actionable)

CHAPTER 3

Troubleshooting the Fibre Channel Links

A1/B1 Fibre Channel (FC) Link

If a problem occurs with the A1/B1 FC link:

- In a Sun StorEdge 3900 series system, the Sun StorEdge T3 array will fail over.
- In a Sun StorEdge 6900 series system, no Sun StorEdge T3 array will fail over, but a severe problem can cause a path to go offline.

FIGURE 3-1, FIGURE 3-2, and FIGURE 3-3 are examples of A1/B1 Fibre Channel Link Notification Events.

    Site: FSDE LAB Broomfield CO
    Source: diag.xxxxx.xxx.com
    Severity: Normal
    Category: Message
    Key: message:diag.xxxxx.xxx.com
    EventType: LogEvent.driver.LOOP_OFFLINE
    EventTime: 01/08/2002 14:34:45
    Found 1 'driver.LOOP_OFFLINE' error(s) in logfile /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746):
    info: Loop Offline
    Jan 8 14:34
28. engine changes are accepted.

2. Monitor the /var/adm/log/SEcfglog file to see when the savevemap process successfully exits.

CODE EXAMPLE 4-2 savevemap output

    Tue Jan 29 34 MST 2002 savevemap v1 ENTER
    Tue Jan 29 2 12 34 MST 2002 checkslicd v1 ENTER
    Tue Jan 29 2 12 42 MST 2002 checkslicd v1 EXIT
    Tue Jan 29 14 01 MST 2002 savevemap v1 EXIT

When savevemap <ve-pair> EXIT is displayed, the savevemap process has successfully exited.

CHAPTER 5

Troubleshooting Host Devices

This chapter describes how to troubleshoot components associated with a Sun StorEdge 3900 or 6900 series Host.

This chapter contains the following sections:

- "Using the Host Event Grid"
- "To Replace the Master Host"
- "To Replace the Alternate Master or Slave Monitoring Host"

Host Event Grid

The Storage Automated Diagnostic Environment Event Grid enables you to sort host events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, whether action is required, a description of the event, and the recommended action. Refer to the Storage Automated Diagnostic Environment User's G
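The log check described in this excerpt (wait until a savevemap EXIT line appears for the virtualization engine pair) can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and the log line layout is assumed to match the flattened sample shown in CODE EXAMPLE 4-2, with the program name followed by the pair name and ENTER or EXIT.

```python
def savevemap_finished(log_lines, ve_pair):
    """Return True if the last savevemap entry for ve_pair is an EXIT.

    Assumption: each log line contains "savevemap", then the pair name,
    then ENTER or EXIT, as in the manual's sample output. Colons and a
    trailing period are tolerated because real punctuation is unknown.
    """
    state = None
    for line in log_lines:
        tokens = line.replace(":", " ").split()
        if "savevemap" not in tokens:
            continue  # e.g. checkslicd lines are ignored
        i = tokens.index("savevemap")
        rest = tokens[i + 1:i + 3]
        if len(rest) == 2 and rest[0] == ve_pair:
            word = rest[1].rstrip(".")
            if word in ("ENTER", "EXIT"):
                state = word
    return state == "EXIT"


if __name__ == "__main__":
    sample = [
        "Tue Jan 29 34 MST 2002 savevemap v1 ENTER",
        "Tue Jan 29 2 12 34 MST 2002 checkslicd v1 ENTER",
        "Tue Jan 29 2 12 42 MST 2002 checkslicd v1 EXIT",
        "Tue Jan 29 14 01 MST 2002 savevemap v1 EXIT",
    ]
    print(savevemap_finished(sample, "v1"))
```

In practice the lines would come from reading /var/adm/log/SEcfglog; tracking the last ENTER or EXIT (rather than any EXIT) avoids reporting success while a newer savevemap run is still in progress.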
29. failed

    Description and Cause of Error: 2. Restore physical and logical data failed. 3. Restore zone data failed.
    Suggested Action: Check the status of both virtualization engines. If there is an error condition, refer to Appendix A for corrective action. Attempt to run the restorevemap command again.

    Message: setdefaultconfig
    Description and Cause of Error: 1. Unable to properly configure the virtualization engine host $vehost. 2. Cannot continue configuration of other components.
    Suggested Action: Check the status of the virtualization engine and try again.

    Message: setdefaultconfig
    Description and Cause of Error: The setupve command failed.
    Suggested Action: Try running setupve -n ve_hostname -v (verbose mode) and check the errors. Then run checkve -n ve_hostname. You can continue to configure VLUNs and zones only if both of these commands work.

TABLE B-2 Sun StorEdge Network FC Switch-8 and Switch-16 Switch SUNWsecfg Error Messages

    Message: Common (Switch)
    Description and Cause of Error: Sun StorEdge system type entered ($cab_type) does not match system type discovered ($boxtype).
    Suggested Action: Either call the command with the -f (force) option to force the series type, or do not specify the cabinet type (no -c option).

    Message: Common (Switch)
    Description and Cause of Error: Unable to obtain lock on switch $switch. Another command is running.
    Suggested Action: 1. Another switch command might be updating the configuration. Check listavailable -s. 2. If the switch in question does not appear, check for the existence of t
30. series storage systems. The information in this appendix expands on that information by providing recommendations for corrective action should you encounter errors with the command utilities.

The error messages are broken out into the following tables:

- TABLE B-1 lists SUNWsecfg error messages specific to the virtualization engine.
- TABLE B-2 lists SUNWsecfg error messages specific to the Sun StorEdge network FC switch-8 and switch-16 switches.
- TABLE B-3 lists SUNWsecfg error messages specific to the Sun StorEdge T3 array.
- TABLE B-4 lists miscellaneous SUNWsecfg error messages common to all components.

TABLE B-1 Virtualization Engine SUNWsecfg Error Messages

    Message: Common to virtualization engines
    Description and Cause of Error: Invalid virtualization engine pair name ($vepair), or virtualization engine is unavailable. Confirm that the configuration locks are set. This is usually due to the savevemap command running.
    Suggested Action: Try ps -ef | grep savevemap or listavailable -v (which returns the status of individual virtualization engines).

    Message: Common to virtualization engine
    Description and Cause of Error: No virtualization engine pairs found, or the virtualization engine pairs are offline. Confirm that the configuration locks are set. This is usually due to the savevemap command running.

    Message: Common to virtualization engine
    Description and Cause of Error: Unable to obtain lock on $vepair. Another command is running.

    Suggested Action: Try ps -ef
31. the VLUN name (VDRV000), the disk pool (t3b00), and the MP drive target (149152). This information can also help you find the controller serial number (60020F2000006DFA), which you need to perform Sun StorEdge T3 array LUN failback commands.

To Failback the Virtualization Engine

In the event of a Sun StorEdge T3 array LUN failover, use the following procedure to fail the LUN back to its original controller.

1. From the Storage Service Processor, type:

    /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA

where:

    -d    Virtualization engine pair on which to run the command
    -j    Controller serial number, which corresponds to the Sun StorEdge T3 array WWN of the affected partner pair

The failback command is always performed on the controller serial number, regardless of which controller (the Master or Alt Master) currently owns the LUN. All VLUNs are affected by a failover and failback of the underlying physical LUN.

The controller serial number is the system WWN for the Sun StorEdge T3 array. In the above example, the master Sun StorEdge T3 array WWN is 50020F2300006DFA and the number used in the failback command is 60020F2000006DFA.

2. The SLIC daemon must be running for the mpdrive failback command to work. Ensure that the SLIC daemon is running by using the command found in CODE EXAMPLE 7-3. If no SLIC p
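The relationship between the array WWN and the controller serial number in the example above can be sketched in code. This is a hedged illustration, not a documented rule: the mapping (prefix 50020F23 becomes 60020F20, trailing eight digits kept) is inferred from the single example in this excerpt, and both function names are hypothetical.

```python
def failback_serial(t3_wwn: str) -> str:
    """Derive the serial number passed to `mpdrive failback -j`.

    Assumption: inferred from the manual's one example
    (50020F2300006DFA -> 60020F2000006DFA). Anything that does not
    look like that example is rejected rather than guessed at.
    """
    wwn = t3_wwn.strip().upper()
    if len(wwn) != 16 or not wwn.startswith("50020F23"):
        raise ValueError(f"unexpected T3 WWN format: {t3_wwn!r}")
    return "60020F20" + wwn[8:]


def failback_command(ve_pair: str, t3_wwn: str) -> str:
    """Build the command line shown in step 1 of the procedure."""
    serial = failback_serial(t3_wwn)
    return f"/opt/svengine/sduc/mpdrive failback -d {ve_pair} -j {serial}"


if __name__ == "__main__":
    print(failback_command("v1", "50020F2300006DFA"))
```

Always confirm the derived serial number against the virtualization engine map output before running the failback, since the mapping here rests on one worked example.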
32. the system and configuring devices. See one or more of the following for this information:

- Solaris Handbook for Sun Peripherals
- AnswerBook2 online documentation for the Solaris operating environment
- Other software documentation that you received with your system

Typographic Conventions

    Typeface | Meaning | Examples
    AaBbCc123 | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
    AaBbCc123 | What you type, when contrasted with on-screen computer output | % su Password:
    AaBbCc123 | Book titles, new words or terms, words to be emphasized; command-line variable (replace with a real name or value) | Read Chapter 6 in the User's Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.

Shell Prompts

    Shell | Prompt
    C shell | machine_name%
    C shell superuser | machine_name#
    Bourne shell and Korn shell | $
    Bourne shell and Korn shell superuser | #

Related Documentation

    Product | Title | Part Number
    Late-breaking News | Sun StorEdge 3900 and 6900 Series Release Notes | 816-3247
    Sun StorEdge 3900 and 6900 series hardware information | Sun StorEdge 3900 and 6900 Series Site Preparation Guide | 816-3242
    | Sun StorEdge 3900 and 6900 Series Regulatory and Saf
33.

    Information (continued): thresholds. 2. The internal PCU temp has exceeded acceptable thresholds. 3. A PCU fan has failed.
    Recommended action (continued): to verify the battery state. 3. Replace the battery if necessary.

    Category: t3
    Component: power.fan
    EventType: Alarm
    Sev: Red
    Action: Y
    Description: [Info/Action] The state of power.u1pcu1.Fan1State on diag213 (ip=xxx.20.67.213) is Fault.
    Information: The state of a fan on the Sun StorEdge T3 array is not optimal.
    Recommended action: 1. Telnet to affected Sun StorEdge T3 array. 2. Verify the fan state with fru stat. 3. Replace the power cooling unit if necessary.

    Category: t3
    Component: power.output
    EventType: Alarm
    Sev: Red
    Action: Y
    Description: [Info/Action] The state of power.u1pcu1.PowOutput on diag213 (ip=xxx.20.67.213) is Fault.
    Information: The state of the power in the Sun StorEdge T3 array power cooling unit is not optimal.
    Recommended action: 1. Telnet to affected Sun StorEdge T3 array. 2. Verify power cooling unit state in fru stat. 3. Replace PCU if necessary.

    Category: t3
    Component: power.temp
    EventType: Alarm
    Sev: Red
    Action: Y
    Description: [Info/Action] The state of power.u1pcu1.PowTemp on diag213 (ip=xxx.20.67.213) is Fault.
    Information: The state of the temperature in the Sun StorEdge T3 array power cooling unit is either too high or is unknown.
    Recommended action: 1. Telnet to the affected Sun StorEdge T3 array
34. 00007a609

    Category: switch
    Component: oob
    EventType: Comm_Established
    Description: Communication regained with sw1a (ip=xxx.20.67.213).

    Category: switch
    Component: oob
    EventType: Comm_Lost
    Sev: Down
    Action: Yes
    Description: [Info/Action] Lost communication with sw1a (ip=xxx.20.67.213).
    Information: Ethernet connectivity to the switch has been lost.
    Recommended action: 1. Check Ethernet connectivity to the switch. 2. Verify that the switch is booted correctly with no POST errors. 3. Verify that the switch Test Mode is set for normal operations. 4. Verify the TCP/IP settings on switch via Forced PROM Mode access. 5. Replace switch if needed.

    Category: switch
    Component: switchtest
    EventType: Diagnostic Test
    Sev: Red
    Description: switchtest (diag240) on d2 swb1 (ip=xxx.0.0.41) 10002000007a609.

TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for Switches (Continued)

    Category: switch
    Component: enclosure
    EventType: Discovery
    Description: [Info] Discovered a new switch called ras d2 swb1 (ip=xxx.0.0.41) 10002000007a609.
    Information: Discovery events occur the very first time the agent probes a storage device. It creates a detailed description of the device monitored and sends it using any active notifier (NetConnect, Email).

    Category: switch
    Component: enclosure
    EventType: LocationChange
    Description: Location of switch rasd2 swb0 (ip=xxx.0.0.40) was changed.
35. 0041F96257354230303052d0s2

    Status(Port A): O.K.
    Status(Port B): O.K.
    Vendor: SUN
    Product ID: SESS01
    WWN(Node): 2a000060220041f4
    WWN(Port A): 2b000060220041f4
    WWN(Port B): 2b000060220041f9
    Revision: 080C
    Serial Num: Unsupported
    Unformatted capacity: 102400.000 MBytes
    Write Cache: Enabled
    Read Cache: Enabled
    Minimum prefetch: 0x0
    Maximum prefetch: 0x0
    Device Type: Disk device
    Path(s):
    /dev/rdsk/c6t29000060220041F96257354230303052d0s2
    /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
      Controller /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
        Device Address 2b000060220041f4,0
        Class primary
        State ONLINE
      Controller /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
        Device Address 2b000060220041f9,0
        Class primary
        State ONLINE

Note that in the Class and State fields, the virtualization engines are presented as two primary/ONLINE devices. The current Sun StorEdge Traffic Manager design does not enable you to manually halt the I/O (that is, you cannot perform a failover to the secondary path) when only primary devices are present.

Alternatives to Sun StorEdge Traffic Manager

As an alternative to using Sun StorEdge Traffic Manager, you can manually halt the I/O using one of two methods: quiesce I/O and unconfigure the c2 path. These methods are explained below.

To Quiesce the I/O

1. Determine the path you want to disable.
2. Type: cf
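The Class and State fields discussed in this excerpt can be pulled out of saved luxadm display output with a short parser. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the text layout (a "Controller" line followed by "Device Address", "Class", and "State" lines) is assumed from the listing above, whose original punctuation was lost in extraction.

```python
import re

FIELD = re.compile(r"(Device Address|Class|State)\s+(\S+)")


def path_states(luxadm_output):
    """Return one dict per Controller block in luxadm display output.

    Assumption: each path starts with a line beginning with
    "Controller", followed by its Device Address, Class, and State.
    """
    paths = []
    current = None
    for raw in luxadm_output.splitlines():
        line = raw.strip()
        if line.startswith("Controller"):
            parts = line.split(None, 1)
            current = {"controller": parts[1] if len(parts) > 1 else ""}
            paths.append(current)
        elif current is not None:
            match = FIELD.match(line)
            if match:
                key = match.group(1).lower().replace(" ", "_")
                current[key] = match.group(2)
    return paths


def offline_paths(luxadm_output):
    """Return the paths whose State is anything other than ONLINE."""
    return [p for p in path_states(luxadm_output) if p.get("state") != "ONLINE"]
```

Calling offline_paths() on captured output returns an empty list when every path is ONLINE, which matches the healthy Sun StorEdge 6900 series case described above where both virtualization engine paths appear as primary and ONLINE.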
36. 07 51 PST 2002 checkt3config t3b0 INFO

In this example, the mirror setting in the Sun StorEdge T3 array system settings is off. The SAVED CONFIGURATION setting for this parameter, which is the default setting, should be auto.

11. Fix the FAIL condition, and then verify the settings again.

    /opt/SUNWsecfg/bin/checkt3config -n t3b0
    Checking t3b0 Configuration
    Checking command ver        PASS
    Checking command vol stat   PASS
    Checking command port list  PASS
    Checking command port listmap PASS
    Checking command sys list   PASS

If you interrupt any of the SUNWsecfg scripts (by typing Control-C, for example), a lock file might remain in the /opt/SUNWsecfg/etc directory, causing subsequent commands to fail. Use the following procedure to clear the lock file.

To Clear the Lock File

1. Type the following command:

    /opt/SUNWsecfg/bin/removelocks
    usage: removelocks [-t|-s|-v]
    where:
        -t  remove all T3 related lock files
        -s  remove all switch related lock files
        -v  remove all virtualization engine related lock files

    /opt/SUNWsecfg/bin/removelocks -v

Note: After any virtualization engine configuration change, the script saves a new copy of the virtualization engine map. This may take a minimum of two minutes, during which time no additional virtualization
37.

    Category: t3
    Component: ib
    EventType: Comm_Lost
    Sev: Down
    Description: [Info/Action] Lost communication (InBand) with diag213 (ip=xxx.20.67.213), last reboot was 2001-09-27 15:22:00.
    Information: InBand: This event is established using luxadm. This monitoring may not be activated for a particular Sun StorEdge T3 array.
    Recommended action: 1. Verify luxadm via command line (luxadm probe, luxadm display). 2. Verify cables, GBICs, and connections along data path. 3. Check the Storage Automated Diagnostic Environment SAN Topology GUI to identify the failing segment of the data path. 4. Verify the correct FC switch configuration, if applicable.

    Category: t3
    Component: oob
    EventType: Comm_Lost
    Sev: Down
    Description: [Info/Action] Lost communication (OutOfBand) with diag213 (ip=xxx.20.67.212). Probable Cause: This problem can also be caused by a very slow network, or because the Ethernet connection to this Sun StorEdge T3 array was lost.
    Information: OutOfBand: This means that the Sun StorEdge T3 array failed to answer to a ping or failed to return its tokens.
    Recommended action: 1. Check Ethernet connectivity to the affected Sun StorEdge T3 array. 2. Verify the Sun StorEdge T3 array is booted correctly. 3. Verify the correct TCP/IP settings on the Sun StorEdge T3 array. 4. Increase the http and/or ping timeout in Utilities
38. 196609 0x5555aa7a ra root other
    s 196610 0x5555aaba ra root other
    s 3 0x10e1 a l iirinn root root

Segments identified with 0x5555aa in the address are associated with the SLIC daemon.

3. Remove the segments by typing the following:

    ipcrm -m 301 -m 302 -m 303 -s 196608 -s 196609 -s 196610

Check the ipcrm(1M) man page for details.

4. Restart the SLIC daemon:

    /opt/SUNWsecfg/bin/startslicd -n v1 (or v2, depending on configuration)

5. Confirm that the SLIC daemon is running:

    ps -eaf | grep slicd
    root 16132 16130 0 11:45:00 ?     0:00 slicd
    root 16135 16130 0 11:45:00 ?     0:00 slicd
    root 16130     1 0 11:45:00 ?     0:00 slicd
    root 16131 16130 0 11:45:00 ?     0:00 slicd
    root 16189 15877 0 11:48:49 pts/1 0:00 grep slicd
    root 16143 16130 0 11:45:00 ?     0:00 slicd

The message queues, shared memory, and semaphores have been removed.

Sun StorEdge 6900 Series Multipathing Example

One Sun StorEdge T3 array partner pair with one 500-GB RAID 5 LUN per brick (2 LUNs total). Currently there is one 10-GB VLUN created from each physical LUN, for a total of two VLUNs.

In a Sun StorEdge 6900 series, there are four possible physical paths to each Sun StorEdge T3 array Volume (LUN). Refer to FIGURE 7-4 and FIGURE 7-3. For example
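The segment cleanup in step 3 above (find every IPC entry whose key starts with 0x5555aa, then remove it with ipcrm) can be sketched as a helper that builds the ipcrm argument list from saved ipcs output. This is a hedged illustration: the function name is hypothetical, and the input rows are assumed to follow the "T ID KEY MODE OWNER GROUP" column order, with T being m (shared memory), s (semaphore), or q (message queue).

```python
SLIC_KEY_PREFIX = "0x5555aa"  # key prefix the manual associates with the SLIC daemon
IPCRM_FLAG = {"m": "-m", "s": "-s", "q": "-q"}


def slic_ipcrm_command(ipcs_lines):
    """Build an ipcrm argument list for SLIC-owned IPC entries.

    Assumption: each relevant row looks like "T ID KEY ...". Rows
    that do not match (headers, blank lines) are skipped. Returns an
    empty list when nothing belongs to the SLIC daemon.
    """
    args = []
    for line in ipcs_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in IPCRM_FLAG:
            continue
        kind, ipc_id, key = fields[0], fields[1], fields[2]
        if key.lower().startswith(SLIC_KEY_PREFIX):
            args += [IPCRM_FLAG[kind], ipc_id]
    return ["ipcrm"] + args if args else []


if __name__ == "__main__":
    rows = [
        "m 301 0x5555aa7a --rw-rw-rw- root other",
        "s 196609 0x5555aa7a --ra-ra-ra- root other",
        "s 3 0x10e1 --ra-r--r-- root root",
    ]
    print(" ".join(slic_ipcrm_command(rows)))
```

The sketch only assembles the command; review the result before running it, because removing IPC entries that belong to a live process can disrupt that process.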
39. 2 Editing Sun StorEdge T3 array information using vi

    vi t3input.txt
    Input file for extended data collection
    Format is HOST PASSWORD
    t3b0 xxxx
    t3b2 xxxx
    t3b3 xxxx
    :wq

Note: xxxx represents Sun StorEdge T3 array passwords.

- You can now run /opt/SUNWexplo/bin/explorer to collect information about the Storage Service Processor operating system, the Sun StorEdge network FC switch-8 or switch-16 switch, and the Sun StorEdge T3 array, which can be used for troubleshooting purposes.
- A tar/gzip file will be put into the /opt/SUNWexplo/output directory. The tar/gzip file can be sent to Sun Service for evaluation.
- The Sun StorEdge network FC switch-8 and switch-16 switch information will be placed in the san directory of the tar file.
- Sun StorEdge T3 array information will be placed in the disks/t3 directory.

Troubleshooting the T1/T2 Data Path

Notes

- There are two T Port links for redundancy. If one of the two links is lost, no Sun StorEdge T3 array LUN failover will occur and no pathing failures will be noted.
- If both T Port links fail, there will be a Sun StorEdge T3 array LUN failover as one of the virtualization engines takes control of the I/O operations. One of the Sun StorEdge T3 array LUNs will fail over as all I/O is routed to the controlling virtualization engine. The host will n
40. 25 WWN Received 2 Loop Offline message(s) (threshold is 1 in 5mins)
    Last Message: diag.xxxxx.xxx.com qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE

FIGURE 3-1 Data Host Notification of Intermittent Problems

    Site: FSDE LAB Broomfield CO
    Source: diag.xxxxx.xxx.com
    Severity: Normal
    Category: Message
    Key: message:diag.xxxxx.xxx.com
    EventType: LogEvent.driver.MPXIO_offline
    EventTime: 01/08/2002 14:48:02
    Found 2 'driver.MPXIO_offline' warning(s) in logfile /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746):
    Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053 (ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address 2b000060220041f9,1 is offline
    Jan 8 14:47:07 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address 2b000060220041f9,0 is offline

FIGURE 3-2 Data Host Notification of Severe Link Error

    Site: FSDE LAB Broomfield CO
    Source: diag.xxxxx.xxx.com
    Severity: Normal
    Category: Switch
    Key: switch:100000c0dd0057bd
    EventType: StateChangeEvent.X.port.6
    EventTime: 01/08/2002 14:54:20
    port.6 in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status state changed from Online t
41. 3 B3 FC link Notification Events

    Site: FSDE LAB Broomfield CO
    Source: diag.xxxxx.xxx.com
    Severity: Normal
    Category: Message
    Key: message:diag.xxxxx.xxx.com
    EventType: LogEvent.driver.MPXIO_offline
    EventTime: 01/08/2002 18:25:18
    Found 2 'driver.MPXIO_offline' warning(s) in logfile /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746):
    Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053 (ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address 2b000060220041f9,1 is offline
    Jan 8 18:24:24 WWN:2b000060220041f9 diag.xxxxx.xxx.com mpxio: [ID 779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052 (ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0 (fp1) to target address 2b000060220041f9,0 is offline

    Site: FSDE LAB Broomfield CO
    Source: diag.xxxxx.xxx.com
    Severity: Normal
    Category: Message
    Key: message:diag.xxxxx.xxx.com
    EventType: LogEvent.driver.Fabric_Warning
    EventTime: 01/08/2002 18:25:18
    Found 1 'driver.Fabric_Warning' warning(s) in logfile /var/adm/messages on diag.xxxxx.xxx.com (id=80fee746):
    Info: Fabric warning
    Jan 8 18:24:04 WWN:2b000060220041f9 diag.xxxxx.xxx.com fp: [ID 517869 kern.warning] WARNING: fp(1): N_x Port with D_ID=104000, PWWN=2b000060220041f9 disappeared from fabric

FIGURE 3-6 A3/B3 FC Link Host-Side Event
42. 3C3352A60003E82Fd0s2

    Status(Port A): O.K.
    Status(Port B): O.K.
    Vendor: SUN
    Product ID: T300
    WWN(Node): 50020f2000006443
    WWN(Port A): 50020f2300006355
    WWN(Port B): 50020f2300006443
    Revision: 0118
    Serial Num: Unsupported
    Unformatted capacity: 488642.000 MBytes
    Write Cache: Enabled
    Read Cache: Enabled
    Minimum prefetch: 0x0
    Maximum prefetch: 0x0
    Device Type: Disk device
    Path(s):
    /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
    /devices/scsi_vhci/ssd@g60020f20000064433c3352a60003e82f:c,raw
      Controller /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0
        Device Address 50020f2300006355,1
        Class primary
        State OFFLINE
      Controller /devices/pci@e,2000/pci@2/SUNW,qlc@5/fp@0,0
        Device Address 50020f2300006443,1
        Class secondary
        State ONLINE

Note: This type of error may also cause the device to show up as unusable in cfgadm, as shown in CODE EXAMPLE 3-8.

CODE EXAMPLE 3-8 Failed Path Marked unusable

    cfgadm -al
    Ap_Id            Type        Receptacle  Occupant      Condition
    ac0:bank0        memory      connected   configured    ok
    ac0:bank1        memory      empty       unconfigured  unknown
    c1               scsi-bus    connected   configured    unknown
    c16              scsi-bus    connected   unconfigured  unknown
    c18              scsi-bus    connected   unconfigured  unknown
    c19              scsi-bus    connected   unconfigured  unknown
    c1::dsk/c1t6d0   CD-ROM      connected   configured    unknown
    c20              fc-private  connected   unconfigured  unknown
    c21              fc-fabric   connected   c
43. 3config and restoret3config utility to configure the Sun StorEdge T3 array.

    Message: checkt3config
    Description and Cause of Error: vol init command is being executed by another user. Additional vol commands cannot run.
    Suggested Action: Check whether any other secfg utility is running. If one is running, allow it to finish.

    Message: checkt3config
    Description and Cause of Error: An error occurred while checking proc list; aborting operation on $BRICK_IP brick_name.
    Suggested Action: Check whether any other secfg or native Sun StorEdge T3 commands are being executed on the particular Sun StorEdge T3 array.

TABLE B-3 Sun StorEdge T3 Array SUNWsecfg Error Messages (Continued)

    Message: checkt3config
    Description and Cause of Error: Snapshot configuration files are not present. Unable to check configuration.
    Suggested Action: Make sure that the snapshot files are saved and have read permissions in the /opt/SUNWsecfg/etc/t3name/ directory. If the snapshot files are not available, create them by using the savet3config command.

    Message: checkt3mount
    Description and Cause of Error: 1. The lun status reported a bad or nonexistent LUN. 2. While checking the configuration using the showt3 -n command, operations abort.
    Suggested Action: Make sure that the requested LUN exists on the Sun StorEdge T3 array by using the showt3 -n command. Confirm that the Sun StorEdge T3 array configuration matches standard configurations.

    Message: createvlun
    Description and Cause of Error: Invalid diskpool ($diskpool on $vepair), or diskpool is unavailable.
    Suggested Action: Ensure the diskpool was created p
44. 55 copy 01 offset 000000 enabled
    Multipathing information:
    numpaths: 2
    c20t2B000060220041F4d0s2 state=enabled
    c23t2B000060220041F9d0s2 state=enabled

    vxdmpadm listctlr all
    ENCLR-TYPE     STATE     ENCLR-NAME
    OTHER_DISKS    ENABLED   OTHER_DISKS
    SENA           ENABLED   SENA0
    SENA           ENABLED   SENA0
    Disk           ENABLED   Disk
    Disk           ENABLED   Disk

From the vxdisk output, notice that there are two physical paths to the LUN:

- c20t2B000060220041F4d0s2
- c23t2B000060220041F9d0s2

Both of these paths are currently enabled with VxDMP.

2. Use the luxadm(1M) command to display further information about the underlying LUN.

    luxadm display /dev/rdsk/c20t2B000060220041F4d0s2
    DEVICE PROPERTIES for disk: /dev/rdsk/c20t2B000060220041F4d0s2
    Status(Port A): O.K.
    Vendor: SUN
    Product ID: SESS01
    WWN(Node): 2a000060220041f4
    WWN(Port A): 2b000060220041f4
    Revision: 080c
    Serial Num: Unsupported
    Unformatted capacity: 102400.000 MBytes
    Write Cache: Enabled
    Read Cache: Enabled
    Minimum prefetch: 0x0
    Maximum prefetch: 0x0
    Device Type: Disk device
    Path(s):
    /dev/rdsk/c20t2B000060220041F4d0s2
    /devices/pci@a,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f4,0:c,raw

    luxadm display /dev/rdsk/c23t2B000060220041F9d0s2
    DEVICE PROPERTIES for disk: /dev/rdsk/c23t2B000060220041F9d0s2
    Status(Port A): O.K.
    Vendor: SUN
    Product ID: SESS01
    W
45. 7 18:07:51 PST 2002 checkt3config t3b0 INFO SAVED CONFIGURATION
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO blocksize 16k
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO cache auto
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO mirror auto
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO mp_support rw
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO rd_ahead off
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO recon_rate med
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO sys memsize 32 MBytes
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO cache memsize 256 MBytes
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO CURRENT CONFIGURATION
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO blocksize 16k
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO cache auto
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO mirror off
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO mp_support rw
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO rd_ahead off
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO recon_rate med
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO sys memsize 32 MBytes
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO cache memsize 256 MBytes
    Mon Jan 7 18:07:51 PST 2002 checkt3config t3b0 INFO
    Mon Jan 7 18
46. Address. The FCAddress can be set to 0x0.

CODE EXAMPLE 3-3 switchtest(1M) called with options

    switchtest -v -o "dev=2:192.168.0.30:0"
    switchtest: called with options: dev=2:192.168.0.30:0
    switchtest: Started.
    Testing port: 2
    Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port.
    Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK Fan 2: OK
    02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001 switchtest.WARNING switch0: Maximum transfer size for a FABRIC port is 200. Changing transfer size 2000 to 200
    Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e
    Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e

Note: The Storage Automated Diagnostic Environment automatically resets the transfer size if it notes that it is about to test a switch-to-HBA connection. This is done both in the Storage Automated Diagnostic Environment GUI and from the command line interface (CLI).

To Isolate the A1/B1 FC Link

1. Quiesce the I/O on the A1/B1 FC link path.
2. Run switchtest or qlctest to test the entire link.
3. Break the connection by uncabling the link.
4. Insert a loopback connector into the switch port.
5. Rerun switchtest.
   a. If switchtest fails, replace the GBIC and rerun switchtest.
   b. If switchtes
47. E LAB Broomfield CO
    Source: diag
    Severity: Warning
    Category: T3message
    DeviceId: t3message:83060c0c
    EventType: LogEvent.MessageLog
    EventTime: 01/29/2002 14:25:06
    Warning(s) found in logfile /var/adm/messages.t3 on diag (id=83060c0c):
    Jan 29 14:12:58 t3b0 ISR1[2]: W: u2ctr ISP2100[2] Received LOOP DOWN async event
    Jan 29 14:13:32 t3b0 MNXT[1]: W: u1ctr starting lun 1 failover

    Site: FSDE LAB Broomfield CO
    Source: diag
    Severity: Warning
    Category: T3message
    DeviceId: t3message:83060c0c
    EventType: LogEvent.MessageLog
    EventTime: 01/29/2002 14:11:14
    Warning(s) found in logfile /var/adm/messages.t3 on diag (id=83060c0c):
    Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d4 SVD_PATH_FAILOVER: path_id = 0
    Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d5 SVD_PATH_FAILOVER: path_id = 0
    Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d6 SVD_PATH_FAILOVER: path_id = 0
    Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d7 SVD_PATH_FAILOVER: path_id = 0
    Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d8 SVD_PATH_FAILOVER: path_id = 0
    Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d9 SVD_PATH_FAILOVER: path_id = 0

FIGURE 3-10 Storage Service Processor Notification

To Verify the Data Host

A problem in the A4/B4 FC Link appears differently on the data host, depending on whether the array is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series device.

Sun StorEdge 3900 Series

In a Sun StorEdge 3900 series
48. [Screenshot: Storage Automated Diagnostic Environment Event Grid page. Help text: Select a Category, Component, and EventType and type GO to limit the report. Click on the column headers to change the sort. Check ReportFormat to display a report format. Click Info/Action to review. The grid lists switch events (for example Comm_Lost, DiagnosticTest, Discovery, and Statistics) with their Description and Info/Action columns.]

Legend: Sev is the severity of the event (Warning > Error > Down). Action means this event is actionable and will be sent to RSS/SRS. SubComp is the subcomponent.

FIGURE 6-1 Switch Event Grid

TABLE 6-1 lists the switch events.

TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for Switch
49. TABLE 7-5 lists the virtualization engine events.

TABLE 7-5 Storage Automated Diagnostic Environment Event Grid for Virtualization Engine

Category: virtualization engine
Component: enclosure
EventType: Alarm
Sev: Yellow
Description: Volume E00012 on engine v1a changed mapping

Category: virtualization engine
Component: enclosure
EventType: Alarm.log
Sev: Yellow
Description: Change in Port Statistics on virtualization engine v1a

Category: virtualization engine
Component: enclosure
EventType: Audit
Description: Auditing a Virtualization Engine called v1a
Information: Audits occur every week and send a detailed description of the enclosure to the Sun Network Storage Command Center (NSCC).

Category: virtualization engine
Component: oob
EventType: Comm_Established
Description: Communication regained with virtualization engine v1a

Category: virtualization engine
Component: oob
EventType: Comm_Lost
Sev: Down
Action: Y
Description: Lost communication with virtualization engine v1a
Information: Ethernet connectivity to the virtualization engine unit has been lost.
Recommended action:
1. Check Ethernet connectivity to the virtualization engine.
2. Make sure the virtualization engine is booted correctly.
3. Verify that the TCP/IP settings on the virtualization engine are correct.
4. Replace the virtualization engine if necessary.
50. SAN zone configuration

TABLE A-4 Service Codes

Code: 54
Description: Unauthorized cabling configuration
Action: Check cabling.

Code: 57
Description: Too many HBAs attempting to log in
Action: Check cabling.

Code: 60
Description: Node mapping table cleared using SW2
Action: No action required.

Code: 62
Description: Improper SW2 setting
Action: Correct the SW2 setting. Cycle virtualization engine power.

Code: 126
Description: Too many virtualization engines in SAN
Action: Remove the extra virtualization engine. Cycle virtualization engine power.

Code: 130
Description: Heartbeat connection between virtualization engines is down
Action: Correct the problem. Cycle the power on the follower virtualization engine.

Codes 400 to 599: Device side interface driver errors

Code: 409
Description: FC device side type code invalid
Action: Cycle power. If the problem persists, replace the virtualization engine.

Code: 434
Description: Too many elastic store errors to continue. Elastic store errors result from a clock mismatch between transmitter and receiver and indicate an unreliable link. This error can also occur if a device in the SAN loses power unexpectedly.
Action: Check for a faulty component and replace it. Cycle the power on the follower virtualization engine.

APPENDIX B SUNWsecfg Error Messages

The Sun StorEdge 3900 and 6900 Series Reference Manual lists and defines the command utilities that configure the various components of the Sun StorEdge 3900 and 6900
52. Sun StorEdge T3 Array SUNWsecfg Error Messages Continued Message Description and Cause of Error Suggested Action restoret3config Error while the block size compare The Sun StorEdge T3 array block size command is executing The parameter is different from the SBRICK_IP SIPADD command is snapshot file The Sun StorEdge T3 aborted array may have been reconfigured IRun restoret3config restoret3config SLUN configuration failed to restore Check the Sun StorEdge T3 land the force option was used to configuration with the showt3 n reinitialize without success t3_name command Restore the Sun StorEdge T3 array configuration with the restoret 3config command restoret3config SLUN configuration is not found in the Check for snapshot files in the Srestore_file Cannot restore opt SUNWsecfg etc t3_name SLUN directory If the snapshot files are not found use the modifyt3config command to configure the Sun StorEdge T3 array sSavet3config While checking the configuration the Check the Sun StorEdge T3 array Sun StorEdge T3 array configuration konfiguration by using the showt3 n has not been saved it 3_ name command if the configuration is different from standard Sun StorEdge T3 configurations Use the modifyt3config command to reconfigure the device Appendix B SUNWsecfg Error Messages 139 TABLE B 4 Message Other SUNWsecfg Error Messages Description and Cause of Error Suggested Action Commo
53. TABLE B 1 Message Description and Cause of Error Virtualization Engine SUNWsecfg Error Messages Continued Suggested Action Common to virtualization engine 1 Device side operating mode is not set properly 2 Device side UID reporting scheme is not set properly B Host side operating mode is not set properly 4 Host side LUN mapping mode is not set properly 5 Host side Command Queue Depth is not set properly Host side UID distinguish is not set properly IP is not set properly Subnet mask is not set properly Default gateway is not set properly 0 Server port number is not set properly 11 Host WWN Authentications are not set properly 12 Host IP Authentications are not set properly 13 Other VEHOST IP is not set properly 5 SON Log in to the virtualization engine and verify that the device host and Inetwork settings are correct Make sure the virtualization engine hardware is Inot in ERROR 50 mode If required power cycle the virtualization engine hardware or disable the host side switch port Run the setupve n lve_name command and enable the switch port S vepair Appendix B Ccheckslicd Cannot establish communication with Run startslicd n S vepair S vepair Ccheckslicd Cannot establish communication with Determine the host name associated Virtualization engine pair vepair with initiator by using the initiator S initiator Showvemap n ve
54. UNWexplo

As part of the installation procedure, you will be asked to enter site-specific information. You can optionally press the Return key to accept the blank defaults.

Do not accept automatic emailing of the Explorer Data Collection Utility output unless the Storage Service Processor is properly set up to handle mail correctly.

    Automatic Email Submission
    Would you like all explorer output to be sent to:
    explorer-database-americas@sun.com
    at the completion of explorer when -mail or -e is specified? [y,n] n

Before running the Explorer Data Collection Utility, make sure that the switch and Sun StorEdge T3 array information is added to the proper /opt/SUNWexplo/etc files.

Example:

1. Type switch information into the /opt/SUNWexplo/etc/saninput.txt file. Edit the file with a text editor such as vi.

CODE EXAMPLE 8-1 Editing switch information using vi

    # vi saninput.txt
    # Input file for extended data collection
    # Format is SWITCH SWITCH-TYPE PASSWORD LOGIN
    # Valid switch types are ancor and brocade
    # LOGIN is required for brocade switches, the default is admin
    sw1a ancor
    sw1b ancor
    sw2a ancor
    sw2b ancor
    :wq

2. Type Sun StorEdge T3 array information into the /opt/SUNWexplo/etc/t3input.txt file. Edit the file with a text editor such as vi.

3. Type the password for your specific site.

CODE EXAMPLE 8
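The format rules quoted from saninput.txt (SWITCH, SWITCH-TYPE, PASSWORD, LOGIN; valid types ancor and brocade; LOGIN defaulting to admin for brocade switches) can be checked before Explorer is run. The following is only a minimal sketch in Python. It assumes that fields are separated by white space and that comment lines start with #, neither of which is confirmed by this guide, and the function name parse_saninput is hypothetical.

```python
VALID_SWITCH_TYPES = {"ancor", "brocade"}


def parse_saninput(text):
    """Parse saninput.txt-style entries into a list of dicts.

    Assumed layout (not confirmed by the guide): white-space separated
    fields in the order SWITCH SWITCH-TYPE [PASSWORD] [LOGIN], with
    blank lines and lines starting with '#' ignored.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2 or len(fields) > 4:
            raise ValueError(f"line {lineno}: expected 2 to 4 fields, got {len(fields)}")
        switch, switch_type = fields[0], fields[1]
        if switch_type not in VALID_SWITCH_TYPES:
            raise ValueError(f"line {lineno}: unknown switch type {switch_type!r}")
        password = fields[2] if len(fields) > 2 else None
        login = fields[3] if len(fields) > 3 else None
        if switch_type == "brocade" and login is None:
            login = "admin"  # default stated in the file's own comment
        entries.append(
            {"switch": switch, "type": switch_type, "password": password, "login": login}
        )
    return entries


if __name__ == "__main__":
    sample = "# Input file for extended data collection\nsw1a ancor\nsw1b ancor\n"
    for entry in parse_saninput(sample):
        print(entry["switch"], entry["type"])
```

Raising on the first bad line, rather than skipping it, keeps a mistyped switch type from silently dropping a switch out of the data collection.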
55. Verify that the disk state is in fru stat fru list and vol stat B Replace the disk if necessary Chapter 8 Troubleshooting the Sun StorEdge T3 Array Devices 119 Category t3 Component interface loopcard EventType StateChange Sev Red Action Description Info Action Information The Sun StorEdge T3 array has indicated that the loopcard is no longer in an optimal state iL Information Recommended action Telnet to the affected Sun StorEdge T3 array Verify loopcard state with fru stat Verify matching firmware with other loopcard Re enable loopcard if possible enable u encid 1121 Replace the loopcard if necessary 120 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 For Internal Use Only Category Component lEventType Sev Action Description Information t3 volume StateChange Red Y Info Action Information The Sun StorEdge T3 array has reported that a power cooling unit has been disabled Recommended action 1 Check the Sun StorEdge T3 array syslog for battery hold times 2 If lt 6 minutes replace the battery or the entire PCU as required t3 power StateChange Red Y Info Action Information The Sun power ulpcu2 TECT BtorEdge T3 array ROL CAN 300 has reported that a 1454 LUN has changed 01 50 008275 in state T3 rasd2 t3bl ip xxx 0 0 41 is Rec
56. WN(Node): 2a000060220041f9
WWN(Port A): 2b000060220041f9
Revision: 080C
Serial Num: Unsupported
Unformatted capacity: 102400.000 MBytes
Write Cache: Enabled
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Disk device
/dev/rdsk/c23t2B000060220041F9d0s2
/devices/pci@e,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f9,0:c,raw

To Quiesce the I/O on the A3/B3 Link

1. Determine the path you want to disable.

2. Disable the path by typing the following:

    # vxdmpadm disable ctlr=<c#>

3. Verify that the path is disabled:

    # vxdmpadm listctlr all

Steps 1 and 2 halt I/O only up to the A3/B3 link. I/O will continue to move over the T1 and T2 paths, as well as the A4/B4 links to the Sun StorEdge T3 array.

To Suspend the I/O on the A3/B3 Link

Use one of the following methods to suspend I/O while the failover occurs:

1. Stop all customer applications that are accessing the Sun StorEdge T3 array.

2. Manually pull the link from the Sun StorEdge T3 array to the switch and wait for a Sun StorEdge T3 array LUN failover.

   a. After the failover occurs, replace the cable and proceed with testing and FRU isolation.

   b. After testing is complete and any FRU replacement is finished, return the controller state to the default by using the virtualization engine failback command.

Caution: This action will
57. ain Devices. Refer to Chapter 3 of the Storage Automated Diagnostic Environment User's Guide.

2. In the Maintain Devices window, delete the device that is to be replaced.

3. Choose Maintenance > General Maintenance > Discovery.

4. In the Device Discovery window, rediscover the device.

5. Choose Maintenance > Topology Maintenance > Topology Snapshot.

   a. Select the host that monitors the replaced FRU.

   b. Click Create and Retrieve Selected Topologies.

   c. Click Merge and Push Master Topology.

Conclusion

Any time a master midplane is replaced, you must rediscover the device using the procedure described above. This is especially important when the Storage Service Processor is replaced as a FRU, whether the Storage Service Processor is the master or the slave.

CHAPTER 7 Troubleshooting Virtualization Engine Devices

This chapter describes how to troubleshoot the virtualization engine component of a Sun StorEdge 6900 series system. This chapter contains the following sections:

■ Virtualization Engine Description
■ Translating Host Device Names
■ Sun StorEdge 6900 Series Multipathing Example
■ Virtualization Engine Event Grid

Virtualization Engine Description

The virtualization engine supports the multipathing functionality of the Sun StorEdge T3 array. Each virtua
58. and 6900 series storage subsystems.

Chapter 2 offers general troubleshooting guidelines, such as quiescing the I/O, and tools you can use to isolate and troubleshoot problems.

Chapter 3 provides Fibre Channel link troubleshooting procedures.

Chapter 4 presents information about configuration settings specific to the Sun StorEdge 3900 and 6900 series. It also provides a procedure for clearing the lock file.

Chapter 5 provides information on host device troubleshooting.

Chapter 6 provides information on Sun StorEdge network FC switch 8 and switch 16 switch device troubleshooting.

Chapter 7 provides detailed information for troubleshooting the virtualization engines.

Chapter 8 describes how to troubleshoot the Sun StorEdge T3 array devices. Also included in this chapter is information about the Explorer Data Collection Utility.

Chapter 9 discusses Ethernet hub troubleshooting. Information associated with the 3COM Ethernet hubs is limited in this guide, however, as this is third-party information.

Appendix A provides virtualization engine references, including the SRN and SNMP Reference, an SRN/SNMP single point of failure table, and port communication and service code tables.

Appendix B provides a list of SUNWsecfg error messages and recommendations for corrective action.

Using UNIX Commands

This document may not contain information on basic UNIX commands and procedures such as shutting down the system, booting
59. CHAPTER 9 Troubleshooting Ethernet Hubs

The Sun StorEdge 3900 and 6900 series uses an Ethernet hub as the backbone for the internal service network. Ethernet ports are allocated as follows:

■ 1 for the Storage Service Processor, per subsystem
■ 1 for each Fibre Channel switch
■ 1 for each virtualization engine
■ 2 for each Sun StorEdge T3 array partner group
■ 1 for the Ethernet hub that is installed in the second Sun StorEdge Expansion Cabinet in the Sun StorEdge 3960 and 6960 systems

Note: Information about LED status lights, power information, and front panel settings can be found in the SuperStack 3 Baseline Hub 12-Port TP (3C16440A) and 24-Port TP (3C16441A) User Guide (pn DUA1644-0AAA03). This is a 3COM document. Go to http://www.3com.com to access the documentation.

APPENDIX A Virtualization Engine References

This appendix contains the following tables:

■ Table A-1 SRN and SNMP Reference
■ Table A-2 SRN/SNMP Single Point of Failure Table
■ Table A-3 Port Communication
■ Table A-4 Service Codes

TABLE A-1 provides an explanation of Service Request Numbers for the virtualization engine.

TABLE A-1 SRN and SNMP Reference

SRN / Description / Corrective Action

1xxxx Disk drive Check Condition status xxxx is f too many Check Cond
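The port allocation above is simple arithmetic, and it can help to total it when checking whether a 12-port or 24-port hub is fully populated. The sketch below is illustrative only: the function name hub_ports_needed is hypothetical, and the component counts in the usage lines are made-up inputs, not configurations taken from this guide.

```python
def hub_ports_needed(fc_switches, virtualization_engines, t3_partner_groups,
                     second_cabinet_hub=False):
    """Total the service-network Ethernet ports using the allocation rules above."""
    ports = 1                        # one Storage Service Processor per subsystem
    ports += fc_switches             # one per Fibre Channel switch
    ports += virtualization_engines  # one per virtualization engine
    ports += 2 * t3_partner_groups   # two per Sun StorEdge T3 array partner group
    if second_cabinet_hub:
        ports += 1                   # uplink to the hub in the second expansion cabinet
    return ports


if __name__ == "__main__":
    # Hypothetical inputs: 2 switches, no virtualization engines, 4 partner groups.
    print(hub_ports_needed(2, 0, 4))
    # Hypothetical inputs: 4 switches, 2 engines, 3 partner groups, second-cabinet hub.
    print(hub_ports_needed(4, 2, 3, second_cabinet_hub=True))
```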
60. d Diagnostic Environment is not installed on the data host, there will be areas of limited monitoring, diagnosis, and isolation.

The following diagrams provide troubleshooting information for the basic components and Fibre Channel links specific to the Sun StorEdge 3900 series, shown in FIGURE 2-1, and the Sun StorEdge 6900 series, shown in FIGURE 2-2.

Fibre Channel Link Diagrams

FIGURE 2-1 shows the basic components and the Fibre Channel links for a Sun StorEdge 3900 series system:

■ A1 to B1: HBA to Sun StorEdge network FC switch 8 and switch 16 switch link
■ A4 to B4: Sun StorEdge network FC switch 8 and switch 16 switch to Sun StorEdge T3 array link

[Figure: link diagram showing the A4 links to the T3 Alt Master and T3 Master.]

FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram

FIGURE 2-2 shows the basic components and the Fibre Channel links for a Sun StorEdge 6900 series system:

■ A1 to B1: HBA to Sun StorEdge network FC switch 8 and switch 16 switch link
■ A2 to B2: Sun StorEdge network FC switch 8 and switch 16 switch to virtualization engine link on the host side
■ A3 to B3: Sun StorEdge network FC switch 8 and switch 16 switch to virtualization engine link on the device side
■ A4 to B4: Sun StorEdge network FC switch 8 and switch 16 switch to Sun StorEdge T3 array link
■ T1
61. ddress 2b00006022004186 0 Class primary State ONLINE The dev rdsk c t represents the Global Unique Identifier of the device It is 32 bits long m The first 16 bits correspond to the WWN of the master virtualization engine router m The remaining 16 bits are a the VLUN serial number a Virtualization engine WWN 2900006022004195 a VLUN serial number 6257334B30303148 80 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 v To View the Virtualization Engine Map The virtualization engine map is stored on the Storage Service Processor 1 To view the virtualization engine map type showvemap n v1 f VIRTUAL LUN SUMMARY Disk pool VLUN Serial MP Drive VLUN VLUN Size Slic Zones Number Target Target Name GB t3b00 6257334B30303148 T49152 T16384 VDRV000 95 60 t3b00 6257334B30303149 T49152 T16385 VDRVOO01 55 0 Kk KK DISK POOL SUMMARY Disk pool RAID MP Drive Size Free Space T3 Active Number of Target GB GB Path WWN VLUNs t3b00 5 T49152 TEG 6 7 50020F2300006DFA 2 t3b01 5 T49153 TL67 TEG T7 50020F230000725B 0 KKK MULTIPATH DRIVE SUMMARY Disk pool MP Drive T3 Active Controller Serial Target Path WWN Number t3b00 T49152 50020F2300006DFA 60020F2000006DFA t3b01 T49153 50020F230000725B 60020F2000006DFA Initiator UID VE Host Online Revision Number of SLIC Zones I00001 2900006022004195 vla Yes 08 14 0 100002 2900006022004186 vib Yes 08 14 0 Kk KKK ZONE SUMMARY Zone Name HBA WWN
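The device-name layout described above (a 32-character identifier whose first 16 characters are the virtualization engine WWN and whose last 16 are the VLUN serial number, as in the example values 2900006022004195 and 6257334B30303148) can be split mechanically. This is a sketch under stated assumptions: the helper name split_vlun_device is hypothetical, the identifier is assumed to sit between the t and d fields of a cNtXdNsN device name, and the controller number c6 in the usage line is chosen only for illustration.

```python
import re

# cN t <16 hex: engine WWN> <16 hex: VLUN serial> dN [sN]
_DEVICE_RE = re.compile(
    r"c(?P<ctlr>\d+)"
    r"t(?P<wwn>[0-9A-Fa-f]{16})(?P<serial>[0-9A-Fa-f]{16})"
    r"d(?P<lun>\d+)(?:s(?P<slice>\d+))?$"
)


def split_vlun_device(path):
    """Return (engine_wwn, vlun_serial) from a /dev/rdsk-style device name."""
    name = path.rsplit("/", 1)[-1]  # drop any leading directory components
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError(f"not a virtualization engine device name: {path!r}")
    return match.group("wwn").upper(), match.group("serial").upper()


if __name__ == "__main__":
    wwn, serial = split_vlun_device("/dev/rdsk/c6t29000060220041956257334B30303148d0s2")
    print("Virtualization engine WWN:", wwn)
    print("VLUN serial number:", serial)
```

The serial number returned this way is the value to look up in the VLUN Serial Number column of the showvemap output.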
62. e following messages or files:

   ■ Storage Automated Diagnostic Environment alerts or email messages
   ■ /var/adm/messages
   ■ Sun StorEdge T3 array syslog file
   ■ Storage Service Processor messages
      ■ /var/adm/messages.t3 messages
      ■ /var/adm/log/SEcfglog file

2. Determine the extent of the problem by using one or more of the following methods:

   ■ Storage Automated Diagnostic Environment Topology view
   ■ Storage Automated Diagnostic Environment Revision Checking (manual patch or package to check whether the package or patch is installed)
   ■ Verify the functionality using one of the following:
      ■ checkdefaultconfig(1M)
      ■ checkt3config(1M)
      ■ cfgadm -al output
      ■ luxadm(1M) output
   ■ Check the multipathing status using the Sun StorEdge Traffic Manager software or VxDMP.

3. Check the status of a Sun StorEdge T3 array by using one or more of the following methods:

   ■ Storage Automated Diagnostic Environment device monitoring reports
   ■ Run the SEcfg script, which displays the Sun StorEdge T3 array configuration
   ■ Manually open a telnet session to the Sun StorEdge T3 array
   ■ luxadm(1M) display output
   ■ LED status on the Sun StorEdge T3 array
   ■ Explorer Data Collection Utility output (located on the Storage Service Processor)

4. Check the status of the Sun StorEdge network FC switch 8 and switch 16 switches using the following tools:

   ■ Storage Automated Diagnostic
63. em to a FRU that must be replaced. Follow the instructions in the Sun StorEdge 3900 and 6900 Series Reference Manual and the Sun StorEdge 3900 and 6900 Installation and Service Manual for proper FRU replacement procedures.

8. Verify the fix using the following tools:

   ■ Storage Automated Diagnostic Environment GUI Topology View and Diagnostic Tests
   ■ /var/adm/messages on the data host

9. Return the path to service by using one of the following methods:

   ■ Multipathing software
   ■ Restarting the application

Multipathing Options in the Sun StorEdge 6900 Series

Using the virtualization engines presents several challenges in how multipathing is handled in the Sun StorEdge 6900 series. Unlike Sun StorEdge T3 array and Sun StorEdge network FC switch 8 and switch 16 switch installations, which present primary and secondary pathing options, the virtualization engines present only primary pathing options to the data host. The virtualization engines handle all failover and failback operations and mask those operations from the multipathing software on the data host.

The following example illustrates a Sun StorEdge Traffic Manager problem on a Sun StorEdge 6900 series system.

    # luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
    DEVICE PROPERTIES for disk: /dev/rdsk/c6t2900006022
64. eplaced as a FRU, whether the Storage Service Processor is the master or the slave.

CHAPTER 6 Troubleshooting Sun StorEdge FC Switch 8 and Switch 16 Devices

This chapter describes how to troubleshoot the switch components associated with a Sun StorEdge 3900 or 6900 series system. This chapter contains the following sections:

■ Sun StorEdge Network FC Switch 8 and Switch 16 Switch Description
■ Switch Event Grid
■ setupswitch Exit Values
■ Replacing the Master Midplane

Sun StorEdge Network FC Switch 8 and Switch 16 Switch Description

The Sun StorEdge network FC switch 8 and switch 16 switches provide cable consolidation and increased connectivity for the internal data interconnection infrastructure. The switches are paired to provide redundancy. Two switches are used in each Sun StorEdge 3900 series system, and four switches are used in each Sun StorEdge 6900 series system. Each Sun StorEdge network FC switch 8 and switch 16 switch is connected by way of Ethernet to the service network for management and service from the Storage Service Processor.

These switches can be monitored through the SANSurfer GUI, which is available on the Storage Service Processor. You configure and modify the switches using the Configuration U
65. eptacle     Occupant       Condition
connected     configured     unknown
connected     configured     unknown
connected     configured     unknown
connected     configured     unknown
connected     configured     unknown
connected     unconfigured   unknown
connected     unconfigured   unknown
connected     unconfigured   unknown
connected     configured     unknown
connected     unconfigured   unknown
connected     configured     unknown
connected     unconfigured   unknown
connected     unconfigured   unknown

4. Verify that I/O has halted.

This halts the I/O only up to the A3/B3 link (see FIGURE 2-2). I/O continues to move over the T1 and T2 paths, as well as the A4/B4 links to the Sun StorEdge T3 array.

To Suspend the I/O

Use one of the following methods to suspend the I/O while the failover occurs:

1. Stop all customer applications that are accessing the Sun StorEdge T3 array.

2. Manually pull the link from the Sun StorEdge T3 array to the switch and wait for a Sun StorEdge T3 array LUN failover.

   ■ After the failover occurs, replace the cable and proceed with testing and FRU isolation.
   ■ After testing and any FRU replacement is finished, return the controller state to the default by using virtualization engine failback. Refer to "Virtualization Engine Failback."

Note: To confirm that a failover is occurring, open a telnet session to the Sun StorEdge T3 array and check the output of por
66. es ICat Component lEventType Sev Action Description Information Action switch port statistics Log Yellow Y Info Action Information The switch has reported Change in port fa change in an error statistics on switch counter This could diag156 sw1lb indicate a failing ip 192 168 0 31 component in the link Action Check the Topology GUI for any link errors Run linktest on the link to isolate the failing FRU Quiesce I O on the link before running llinktest switch khassis fan Alarm Yellow chassis fan 1 status changed from OK switch chassis power Alarm Yellow Info This event monitors chassis power 1 status changes in the changed from OK status of the chassis power supply as reported by SANbox chassis_status switch fchassis temp Alarm Yellow Info chassis temp 1 This event monitors status changed from changes in the OK status of the chassis temperature supply las reported by SANbox chassis_status switch chassis zone Alarm Yellow Info Switch swla This event reports was rezoned new changes in the zones zoning of a switch 64 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 TABLE6 1 Storage Automated Diagnostic Environment Event Grid for Switches Continued ICat Component lEventType Sev Action Description Information Action switch enclosure Audit Auditing a new switch called ras d2 swb1 ip xxx 0 0 41 100020
67. es with the virtualization engine. The SLIC daemon periodically polls the virtualization engine for all subsystem errors and for topology changes. It then passes this information, in the form of an SRN, to the Error Log file.

To Display Log Files and Retrieve SRNs

Use the /opt/svengine/sduc/sreadlog command to display log files and retrieve the Service Request Numbers (SRNs) for errors that need action. Data is returned in the following format:

    TimeStamp nnn Txxxxx uuuuuuuu SRN mmmmm

Item: TimeStamp
Description: Time and date when the error occurred

Item: nnn
Description: The name of the virtualization engine pair (v1 or v2)

Item: Txxxxx
Description: The LUN where the error occurred. Note: Txxxxx can represent a physical or a logical LUN.

Item: uuuuuuuu
Description: The unique ID of the drive or the virtualization engine router

Item: SRN mmmmm
Description: The SRN, defined in numerical order

Example:

    # /opt/svengine/sduc/sreadlog -d v1
    2002 Jan 3 10:13:05 v1 29000060 220041F9 SRN 70030
    2002 Jan 3 10:13:31 v1 29000060 220041F9 SRN 70030
    2002 Jan 3 10:17:10 v1 29000060 220041F9 SRN 70030
    2002 Jan 3 10:17:37 v1 29000060 220041F9 SRN 70030
    2002 Jan 3 10:22:26 v1 29000060 220041F9 SRN 70030
    2002 Jan 3 10:25:54 v1 29000060 220041F9 SRN 70030
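When the sreadlog output is long, it can be useful to tally which SRNs appear before looking them up in the SRN reference table. The sketch below only scans text for SRN tokens; it does not run sreadlog. It assumes that an SRN is five hexadecimal characters (as in 70030 or 7009A) and that it follows the literal text SRN with an optional space, = or : separator, since the exact punctuation of the log format is not recoverable from this excerpt. The function name count_srns is hypothetical.

```python
import re
from collections import Counter

# "SRN", an optional separator, then five hexadecimal characters.
_SRN_RE = re.compile(r"SRN\s*[=:]?\s*([0-9A-Fa-f]{5})\b")


def count_srns(log_text):
    """Count how often each SRN appears in captured sreadlog output."""
    return Counter(match.group(1).upper() for match in _SRN_RE.finditer(log_text))


if __name__ == "__main__":
    captured = (
        "2002 Jan 3 10:13:05 v1 29000060 220041F9 SRN 70030\n"
        "2002 Jan 3 10:13:31 v1 29000060 220041F9 SRN 70030\n"
    )
    for srn, hits in count_srns(captured).most_common():
        print(srn, hits)
```

Matching only on the SRN token keeps the tally independent of how the timestamp and ID fields are delimited.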
68. 11. Enable the switch port:

    # /opt/SUNWsecfg/flib/setveport -v virtualization-engine-name -e

12. Reset the virtualization engine:

    # resetve -n virtualization-engine-name

13. Find the initiator numbers for the new and old units:

    # showvemap -n virtualization-engine-pairname -l

14. The new unit will not have any zones defined. If zones were present before the replacement, type the following:

    # restorevemap -n virtualization-engine-pair -z -c old-ve-initiator-number -d new-ve-initiator-number

15. Verify the new unit by typing:

    # showvemap -n virtualization-engine-pairname -l

To Manually Clear the SAN Database

It is occasionally necessary to manually clear the SAN database on the virtualization engine routers.

Caution: This procedure will wipe out the SAN database and will remove the configuration of disk pools, multipath drives, zoning, and VLUNs. After performing this procedure, the virtualization map must be restored to the virtualization engine pair using /opt/SUNWsecfg/bin/restorevemap. This requires a valid copy of the /opt/SUNWsecfg/etc/v1.san or v2.san file.

To Reset the SAN Database on Both Virtualization Engines

Type the resetsandb -n vepair command.

To Reset the SAN Database on a Single Virtualization Engine

Disconnect t
69. et The switch setting matches the default configuration.

1 ERROR: Errors occurred while trying to set the proper switch settings.
The switch setting does not match the default configuration or any valid alternatives.

2 WARNING: Errors occurred while trying to set the proper switch settings. The ports did not self-configure properly.
A cable connection might not be working properly. T ports self-configure (that is, the configuration tool cannot control the configuration) from F ports when they are cabled properly. Specifically, these are the ports on the back-end switches in Sun StorEdge 6900 series configurations only. The ports support the ISL connections.

3 WARNING: The Flash code is different from the release level.
The switch Flash code does not match the current release version (30462). This is not an error. QLogic periodically releases new versions of the switch Flash code, and the new version will not match the default version.

4 WARNING: The configuration is not set to the default, but the differences are likely supported alternatives.
The default switch configurations were overridden with valid alternatives, which are also supported by the SUNWsecfg configuration tools. It should still be flagged as not the default. It can imply any of the following alternatives (these messages are printed to the screen and to the Storage Automated Diagnostic Environment GUI):

■ INFO: Some ports have been set to SL mode but should have been set
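A script that wraps setupswitch can turn the exit values listed above into a short log message. The following is a sketch only. The severities and summaries for codes 1 through 4 paraphrase the table; the entry for code 0 is an assumption, because the start of that row is cut off in this excerpt, and the names SETUPSWITCH_EXIT and describe_setupswitch_exit are hypothetical.

```python
# Exit value -> (severity, short summary). Codes 1 to 4 paraphrase the table
# above; code 0 is assumed to mean success because its row is truncated here.
SETUPSWITCH_EXIT = {
    0: ("OK", "switch setting matches the default configuration (assumed)"),
    1: ("ERROR", "settings do not match the default or any valid alternative"),
    2: ("WARNING", "ports did not self-configure properly; check cabling"),
    3: ("WARNING", "switch Flash code differs from the release level"),
    4: ("WARNING", "non-default configuration, likely a supported alternative"),
}


def describe_setupswitch_exit(code):
    """Return (severity, summary) for a setupswitch exit value."""
    return SETUPSWITCH_EXIT.get(code, ("UNKNOWN", f"undocumented exit value {code}"))


if __name__ == "__main__":
    for code in (0, 1, 2, 3, 4, 9):
        severity, summary = describe_setupswitch_exit(code)
        print(f"{code}: {severity}: {summary}")
```

Only code 1 is reported as an error in the table, so a wrapper might stop on 1 and merely log the warnings for 2 through 4.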
70. ety 816 3243 Compliance Manual e Sun StorEdge 3900 and 6900 Series Hardware Installation and 816 3244 Service Manual Sun StorEdge T3 and T3 e Sun StorEdge T3 and T3 Array Start Here 816 0772 array e Sun StorEdge T3 and T3 Array Installation Operation and 816 0773 Service Manual 816 0776 e Sun StorEdge T3 and T3 Array Administrator s Guide 816 0777 e Sun StorEdge T3 and T3 Array Configuration Guide 816 0778 e Sun StorEdge T3 and T3 Array Site Preparation Guide 816 0779 e Sun StorEdge T3 and T3 Field Service Manual 816 0781 e Sun StorEdge T3 and T3 Array Release Notes Diagnostics e Storage Automated Diagnostics Environment User s Guide 816 3142 Sun StorEdge network FC e Sun StorEdge Network FC Switch 8 and Switch 16 Release Notes 816 0842 switch 8 and switch 16 e Sun StorEdge Network FC Switch 8 and Switch 16 Installation 816 0830 and Configuration Guide e Sun StorEdge Network FC Switch 8 and Switch 16 Best 816 2688 Practices Manual e Sun StorEdge Network FC Switch 8 and Switch 16 Operations 816 1986 Guide e Sun StorEdge Network FC Switch 8 and Switch 16 Field 816 1701 Troubleshooting Guide SANbox switch management e SANbox 8 16 Segmented Loop Switch Management User s 875 3060 using SANsurfer Manual e SANbox 8 Segmented Loop Fibre Channel Switch Installer s 875 1881 User s Manual e SANbox 16 Segmented Loop Fibre Channel Switch Installer s 875 3059 User s Manual Expansion cabinet e Sun StorEdge Expansion Cabinet Installation and Se
71. gadm -c unconfigure device

To Unconfigure the c2 Path

1. Type:

    # cfgadm -al
    Ap_Id                   Type         Receptacle   Occupant       Condition
    c0                      scsi-bus     connected    configured     unknown
    c0::dsk/c0t0d0          disk         connected    configured     unknown
    c0::dsk/c0t1d0          disk         connected    configured     unknown
    c1                      scsi-bus     connected    configured     unknown
    c1::dsk/c1t6d0          CD-ROM       connected    configured     unknown
    c2                      fc-fabric    connected    configured     unknown
    c2::210100e08b23fa25    unknown      connected    unconfigured   unknown
    c2::2b000060220041f4    disk         connected    configured     unknown
    c3                      fc-fabric    connected    configured     unknown
    c3::210100e08b230926    unknown      connected    unconfigured   unknown
    c3::2b000060220041f9    disk         connected    configured     unknown
    c4                      fc-private   connected    unconfigured   unknown
    c5                      fc           connected    unconfigured   unknown

2. Using the Storage Automated Diagnostic Environment Topology GUI, determine which virtualization engine is in the path you need to disable.

3. Use the world wide name (WWN) of that virtualization engine in the unconfigure command, as follows:

    # cfgadm -c unconfigure c2::2b000060220041f4
    # cfgadm -al
    Ap_Id                   Type         Rec
    c0                      scsi-bus
    c0::dsk/c0t0d0          disk
    c0::dsk/c0t1d0          disk
    c1                      scsi-bus
    c1::dsk/c1t6d0          CD-ROM
    c2                      fc-fabric
    c2::210100e08b23fa25    unknown
    c2::2b000060220041f4    disk
    c3                      fc-fabric
    c3::210100e08b230926    unknown
    c3::2b000060220041f9    disk
    c4                      fc-private
    c5                      fc
72. gine s IP Address If unintentional check condition of drives Check condition of drives Check the Ethernet connection between the two virtualization engines 70030 70040 SAN configuration changed by SV SAN Builder Host zoning configuration has changed 70050 MultiPath drive Failover Check MultiPath drive 70051 70098 MultiPath drive Failback Instant Copy degrade If no spare is available replace the failed drive with a new drive 70099 Degrade because the drive has disappeared IReinsert the missing drive or replace it with a drive of equal or greater capacity 126 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 TABLE A 1 SRN and SNMP Reference ISRN Description Corrective Action 7009A Read degrade recorded A mirror drive IReinsert the missing drive or replace it was written to causing it to enter the with a drive of equal or greater capacity degrade state 7009B Write degrade recorded If a spare drive is The removed drive needs to be if good available it will be brought in and used to reinserted or if bad replaced replace the failed drive 7009C Last primary failed during rebuild Thisis e Backup drive data a multi point failure and is very rare Destroy mirror drive where failure has occurred e Format mode 14 drives e Create new mirror drive e Re assign old SCSI ID and LUN to mirror drive e Restore
73. he lock file directly by typing 1s la opt SUNWsecfg etc look for Sswitch lock B If the lock is set in error use the removelocks s command to clear it cCheckswitch listavailable 1 Current configuration on switch does not match the defined configuration 2 One of the predefined static switch configuration parameters that can be overridden for special configurations such as NT connect or cascaded switches is set incorrectly lor switch 16 switch devices are available They are either not found or the configuration lock is set Either the components the Sun StorEdge T3 array the switch or the Virtualization engine are down cannot be pinged or another SUNWsecfg command is running and is updating the configuration ps ef il Select View Logs or directly view S LOGFILE for more details 2 Re run setupswitch on the specified switch INo Sun StorEdge network FC switch 8 If no other commands are running and lyou believe the configuration lock Imight be set in error run the removelocks command Appendix B SUNWsecfg Error Messages 135 TABLE B 2 Message Description and Cause of Error Sun StorEdge Network FC Switch 8 and Switch 16 Switch SUNWsecfg Error Messages Suggested Action setswitchflash setswitchflash Invalid flash file flashfile Check the number of ports on switch Sswitch switch timed out after reset The switch took longer than t
74. he virtualization engine device side FC cables

Telnet to the first virtualization engine in the pair.

Enter the password. The User Service Utility Menu is displayed.

Enter 9 to clear the SAN database.
  - A successful command displays the message: SAN database has been cleared
  - An unsuccessful command results in service code 051. If this occurs, repeat Steps 1-3.
  - If the command continues to fail, replace the virtualization engine.

Reconnect the virtualization engine device-side FC cables.

Enter B to Warm Reboot both virtualization engines.

Stopping and Restarting the SLIC Daemon

Follow this procedure to restart the SLIC daemon if the SLIC daemon becomes unresponsive, or if messages such as the following are displayed:

    connect: Connection refused
    Socket error encountered

To Restart the SLIC Daemon

1. Check whether the SLICD is running:

    ps -eaf | grep slicd

2. Check for any message queues, shared memory, or semaphores still in use:

    ipcs
    IPC status from <running system> as of Wed Feb 20 12:48:30 MST 2002
    T   ID       KEY          MODE     OWNER   GROUP
    Message Queues:
    Shared Memory:
    m   0        0x50000483   rw r r   root    root
    m   301      0x5555aa8a   rw       root    other
    m   302      0x5555aaaa   rw       root    other
    m   303      0x5555aaba   rw       root    other
    m   4        Ox7cc napy            root    root
    Semaphores:
    s   196608   0x5555aa9a   ra       root    other
    s
75. hown in CODE EXAMPLE 7-1. A status report for the host-side and device-side ports is displayed.

Within the next few minutes, take another reading. The number of new errors that occurred within that time frame represents the number of link errors.

Note: If t3ofdg(1M) is running while you perform these steps, the following error message is displayed:

    Daemon error: check the SLIC router

CODE EXAMPLE 7-1 Fibre Channel Link Error Status Example

    /opt/svengine/sduc/svstat -d v1

    I00001 Host Side FC Vital Statistics
      Link Failure Count       0
      Loss of Sync Count
      Loss of Signal Count
      Protocol Error Count
      Invalid Word Count
      Invalid CRC Count

    I00001 Device Side FC Vital Statistics
      Link Failure Count       0
      Loss of Sync Count       0
      Loss of Signal Count     0
      Protocol Error Count     0
      Invalid Word Count       139
      Invalid CRC Count        0

    I00002 Host Side FC Vital Statistics
      Link Failure Count       0
      Loss of Sync Count       0
      Loss of Signal Count     0
      Protocol Error Count
      Invalid Word Count
      Invalid CRC Count

    I00002 Device Side FC Vital Statistics
      Link Failure Count       0
      Loss of Sync Count       0
      Loss of Signal Count     0
      Protocol Error Count     0
      Invalid Word Count       135
      Invalid CRC Count        0

    diag.xxxxx.xxx.com: root

Note: v1 represents the first virtualization engine pair.

Note: The SLIC daemon must be running for the /opt/svengine/sduc/svstat -d v1 com
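The "take two readings and compare" step can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes each counter appears on its own line as "<name> Count <number>" under an "I0000n Host|Device Side FC Vital Statistics" heading, and the second reading (142) is a made-up value. The names `parse_svstat` and `new_errors` are hypothetical.

```python
import re

HEADER = re.compile(r"^\s*(I\d+)\s+(Host|Device) Side FC Vital Statistics")
COUNTER = re.compile(
    r"^\s*(Link Failure|Loss of Sync|Loss of Signal|Protocol Error|"
    r"Invalid Word|Invalid CRC) Count\s*:?\s*(\d+)\s*$"
)


def parse_svstat(text):
    """Map (initiator, side, counter name) to the reported count."""
    counts = {}
    section = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            section = (header.group(1), header.group(2))
            continue
        counter = COUNTER.match(line)
        if counter and section:
            counts[section + (counter.group(1),)] = int(counter.group(2))
    return counts


def new_errors(first, second):
    """Return only the counters that changed between two readings."""
    return {
        key: second[key] - first[key]
        for key in second
        if key in first and second[key] != first[key]
    }


FIRST = """\
I00001 Device Side FC Vital Statistics
Link Failure Count 0
Invalid Word Count 139
"""

# Hypothetical second reading taken a few minutes later.
SECOND = """\
I00001 Device Side FC Vital Statistics
Link Failure Count 0
Invalid Word Count 142
"""

if __name__ == "__main__":
    for key, delta in new_errors(parse_svstat(FIRST), parse_svstat(SECOND)).items():
        print(key, "increased by", delta)
```

Only the difference between readings matters here: a counter that is nonzero but unchanged (such as an Invalid Word Count that stays at 139) does not indicate new link errors in the sampled interval.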
76. htest runs
  - No virtualization engine tests are available at this time.

To Isolate the A3/B3 FC Link

1. Quiesce the I/O on the A3/B3 FC link path.
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest.
   a. If the test fails, replace the GBIC and rerun switchtest.
   b. If the test fails again, replace the switch.
5. If the switch or the GBIC show no errors, replace the remaining components in the following order:
   a. Replace the virtualization engine-side GBIC, recable the link, and monitor the link for errors.
   b. Replace the cable, recable the link, and monitor the link for errors.
   c. Replace the virtualization engine, restore the virtualization engine settings, recable the link, and monitor the link for errors.
6. Return the path to production.

The procedures for restoring virtualization engine settings are in the Sun StorEdge 3900 and 6900 Series Reference Manual.

A4/B4 Fibre Channel (FC) Link

If a problem occurs with the A4/B4 FC link:
  - In a Sun StorEdge 3900 series system, the Sun StorEdge T3 array will fail over.
  - In a Sun StorEdge 6900 series system, no Sun StorEdge T3 array will fail over, but a severe problem can cause a path to go offline.

and FIGURE 3-10 are examples of A4/B4 Link Notification E
77. ice Processor Side Event 36 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 v To Verify the Host Side An error in the A3 B3 FC link results in a device being listed as in an unusable state in cf gadm but no HBAs are listed as in the unconnected state in luxadm output The multipathing software will note an offline path CODE EXAMPLE 3 5 Devices in the connected state cfgadm al Ap_Id c0 c0 dsk c0t0d0 c0 dsk c0t1d0 el cl dsk cl1t6d0 c2 c2 210100e08b23fa25 c2 2b000060220041f4 c3 c3 2b000060220041 f9 c3 210100e08b230926 c4 c5 luxadm e port luxadm display lt snip gt Type scsi bus disk disk scsi bus CD ROM fc fabric unknown disk fc fabric disk unknown fc private ie Found path to 2 HBA ports Receptacle connected connected connected connected connected connected connected connected connected connected connected connected connected devices pcit6 4000 SUNW qlc 2 fp 0 0 devctl devices pcit6 4000 SUNW qlc 3 fp 0 0 devctl dev rdsk c6t29000060220041F96257354230303052d0s2 DEVICE PROPERTIES for disk dev rdsk c6t29000060220041F96257354230303052d0s2 Occupant Condition configured unknown configured unknown configured unknown configured unknown configured unknown configured unknown unconfigured unknown configured unknown configured unknown configured unusable unconfigured unknown unconfigured unkn
79. ion The Sun StorEdge T3 array has reported that a loopcard has been removed from the chassis Info Action power ulpcu2 TE ICTROL CAN 300 1454 01 50 008275 was removed from T3 diag213 ip xxx 20 67 213 controller ulctr inT3 diag213 ip xxx 20 67 213 lis now Available status state changed from disabled to ready enabled Information Information The Sun StorEdge T3 array has reported a disk has been removed from the chassis Replace the disk within the 30 minute power shutdown window Replace the loopcard within the 30 minute power shutdown window Information The Sun StorEdge T3 array has reported that a power cooling unit has been removed from the chassis Replace the PCU within 30 minute shutdown window Troubleshooting the Sun StorEdge T3 Array Devices Recommended action Recommended action Recommended action 117 Category Component t3 disk EventType State Change Sev Action Description disk uld5 in Sun StorEdge T3 array rasd3 t3b1 ip xxx 0 0 41 is mnow Available status state changed from fault disabled to ready enabled Information t3 interface loopcard t3 volume State Change State Change Info loopcard ull1 SLR MI 375 0085 01 G G4 070924 in T3 msp0 t3b0 volume ulvoll slr mi 370 3990 01 e 0 022542 ulvoll in T3 dvt2 t3b0 ip 192 168 0 40 lis now Available sta
80. itions are returned the Unit Error Code, then check the link status. The Unit Error Codes are returned by the drive in Sense Data bytes 20-21 in response to the SCSI Request Sense command.

TABLE A-1 SRN and SNMP Reference

SRN 70000
  Description: SAN configuration has changed.

SRN 70001
  Description: Rebuild process has started.

SRN 70002
  Description: Rebuild is completed without error.

SRN 70003
  Description: Rebuild is aborted with a read error. This means that the drive copying information cannot read from the primary drive.
  Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN 70004
  Description: Write error is reported by follower. If the initiator is master, then its follower has detected a write error on a member within a mirror drive.
  Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN 70005
  Description: Write error is detected by master. If the initiator is master, then it has detected a write error on a member within a mirror drive.
  Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN 70006 70007 70008 70009 70010 70020
  virtualization engine to virtualization engine communication has failed
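When SRNs are collected from logs, a small lookup keeps the description and corrective action together. The sketch below is a hedged illustration in Python: it covers only SRNs 70000 through 70005 as listed in this excerpt of TABLE A-1, and the names `SRN_TABLE` and `describe` are hypothetical.

```python
# Corrective action shared by the drive-failure SRNs in this excerpt.
SPARE_ACTION = (
    "If a spare drive is available, it is brought in to replace the failed "
    "drive. If no spare is available, replace the failed drive with a new drive."
)

# SRN -> (description, corrective action or None). Only the entries that
# appear in this excerpt are included; the full table has many more.
SRN_TABLE = {
    "70000": ("SAN configuration has changed", None),
    "70001": ("Rebuild process has started", None),
    "70002": ("Rebuild is completed without error", None),
    "70003": ("Rebuild is aborted with a read error", SPARE_ACTION),
    "70004": ("Write error is reported by follower", SPARE_ACTION),
    "70005": ("Write error is detected by master", SPARE_ACTION),
}


def describe(srn):
    """Return a one-line summary for an SRN, or flag it as not in this excerpt."""
    entry = SRN_TABLE.get(srn.upper())
    if entry is None:
        return srn + ": not in this excerpt, see the full SRN table"
    description, action = entry
    if action is None:
        return srn + ": " + description
    return srn + ": " + description + ". Action: " + action


if __name__ == "__main__":
    for code in ("70001", "70003", "7009A"):
        print(describe(code))
```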
82. lity during the repair process. PFA is not always effective in detecting or isolating failures. The remainder of this document provides guidelines that can be used to troubleshoot problems that occur in supported components of the Sun StorEdge 3900 and 6900 series.

CHAPTER 2

General Troubleshooting Procedures

This chapter contains the following sections:
  - Troubleshooting Overview Tasks
  - Multipathing Options in the Sun StorEdge 6900 Series
  - Fibre Channel Links
  - Storage Automated Diagnostic Environment Event Grid

Troubleshooting Overview Tasks

This section lists the high-level steps to isolate and troubleshoot problems in the Sun StorEdge 3900 and 6900 series. It offers a methodical approach and lists the tools and resources available at each step.

Note: A single problem can cause various errors throughout the SAN. A good practice is to begin by investigating the devices that have experienced Loss of Communication events in the Storage Automated Diagnostic Environment. These errors usually indicate more serious problems. A Loss of Communication error on a switch, for example, could cause multiple ports and HBAs to go offline. Concentrating on the switch and fixing that failure can help bring the ports and HBAs back online.

1. Discover the error by checking one or more of th
83. lization engine has physical access to all underlying Sun StorEdge T3 arrays and controls access to half of the Sun StorEdge T3 arrays. The virtualization engine has the ability to assume control of all arrays in the event of component failure. The configuration is maintained between virtualization engine pairs through redundant T Port connections by way of a pair of Sun StorEdge network FC switch-8 or switch-16 switches.

Virtualization Engine Diagnostics

The virtualization engine monitors the following components:
  - Virtualization engine router
  - Sun StorEdge T3 array
  - Cabling among the router and storage

Service Request Numbers

The service request numbers are used to inform the user of storage subsystem activities.

Service and Diagnostic Codes

The virtualization engine's service and diagnostic codes inform the user of subsystem activities. The codes are presented as an LED readout. See Appendix A for the table of codes and actions to take. In some cases, you might not be able to receive Service Request Numbers (SRNs) because of communication errors. If this occurs, you must read the virtualization engine LEDs to determine the problem.

To Retrieve Service Information

You can retrieve service information in two ways:
  - CLI Interface
  - Error Log Analysis Commands

Both of these methods are described in the following sections.

CLI Interface

The SLIC daemon, which runs on the Storage Service Processor, communicat
84. mand to work.

Translating Host Device Names

You can translate host device names to VLUN, disk pool, and physical Sun StorEdge T3 array LUNs. The luxadm output for a host device, shown in CODE EXAMPLE 7-2, does not include the unique VLUN serial number that is needed to identify this LUN.

CODE EXAMPLE 7-2 luxadm Output for a Host Device

    luxadm display /dev/rdsk/c4t2B00006022004186d0s2
    DEVICE PROPERTIES for disk: /dev/rdsk/c4t2B00006022004186d0s2
      Status(Port A):         O.K.
      Vendor:                 SUN
      Product ID:             SESS01
      WWN(Node):              2a00006022004186
      WWN(Port A):            2b00006022004186
      Revision:               080E
      Serial Num:             Unsupported
      Unformatted capacity:   56320.000 MBytes
      Write Cache:            Enabled
      Read Cache:             Enabled
        Minimum prefetch:     0x0
        Maximum prefetch:     0x0
      Device Type:            Disk device
      Path(s):
      /dev/rdsk/c4t2B00006022004186d0s2
      /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0/ssd@w2b00006022004186,0:c,raw

To Display the VLUN Serial Number

Devices That Are Not Sun StorEdge Traffic Manager Enabled

1. Use the format -e command.
2. Type the disk on which you are working at the format prompt.
3. Type inquiry at the scsi prompt.
4. Find the VLUN serial number in the Inquiry displayed list.

    format -e c4t2B00006022004186d0
    format> scsi
    scsi> inquiry
    Inquiry:
    00 00 03 12 2b 00 00 02 53
85. mary ulpl 0 hard 1 vol2 ul failover u2pl 1 hard 0 voll ul failover u2pl 1 hard 1 vol2 ul primary 2 Compare the virtualization engine configuration to a saved configuration by running opt SUNWsecfg runsecfg and choosing Verify Virtualization Engine Map The output is from the diff 1 command which shows the lines that have been added changed or deleted Notice that the active Sun StorEdge T3 array controller WWN has changed for one of the Sun StorEdge T3 arrays indicating it is using its alternate path MANAGE CONFIGURATION FILES MENU 1 Display Virtualization Engine Map 2 Save Virtualization Engine Map 3 Verify Virtualization Engine Map 4 Help 5 Return Select configuration option above gt 3 Verifying Virtualization Engine map for vl ERROR virtualization engine map for vl has changed 18c18 lt t3b01 5 T49153 116 7 0 7 50020F230000725B 1 gt t3b01 5 T49153 116 7 0 7 50020F2300006DFA 1 28c28 lt t3b01 T49153 50020F230000725B 60020F2000006DFA gt t3b01 T49153 50020F2300006DFA 60020F2000006DFA 37637 lt 100002 2900006022004186 vib Yes 08 14 0 gt 100002 2900006022004186 Unknown No Unknown 0 46d45 lt Undefined 210000E08BO26COF 100002 Yes 0 checkvemap virtualization engine map vl verification complete FAIL FIGURE 8 3 Manage Configuration Files Menu 106 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 T1 T2 FRU Tests Available m Switch switchtest m
86. mmands

CHAPTER 4

Configuration Settings

This chapter contains the following sections:
  - Verifying Configuration Settings
  - To Clear the Lock File

For a complete listing of SUNWsecfg error messages and recommended actions, refer to Appendix B.

Verifying Configuration Settings

During the course of troubleshooting, you might need to verify configuration settings on the various components in the Sun StorEdge 3900 or 6900 series.

To Verify Configuration Settings

Run one of the following scripts:
  - Use the /opt/SUNWsecfg/runsecfg script and select the various Verify menu selections.
  - Run the /opt/SUNWsecfg/bin/checkdefaultconfig script to check all accessible components. The output is shown in CODE EXAMPLE 4-1.
  - Run the checkswitch, checkt3config, checkve, and checkvemap scripts manually from /opt/SUNWsecfg/bin.

The scripts listed above check the default configuration files in the /opt/SUNWsecfg/etc directory and compare the current live settings to those of the defaults. Any differences are marked with a FAIL.

Note: For cluster configurations and systems that are attached to Windows NT, the default configurations may not match the current installed configuration. Be aware of this when running the verification scripts. Cer
87. n StorEdge T3 array Chapter 8 Troubleshooting the Sun StorEdge T3 Array Devices 115 Category Component lEventType Sev Action Description Information t3 power Insert Info Component power ulpcu2 TE CTROL CAN 300 1454 01 50 008275 was added to T3 diag213 ip xxx 20 67 21 3 t3 enclosure Location Location of t3 Change rasd2 t3b0 ip xxx 0 0 40 was changed t3 enclosure QuiesceEnd Quiesce End on t3 d2 t3b1 ip xxx 0 0 41 t3 enclosure QuiesceStart Quiesce Start on t3 d2 t3b1 ip xxx 0 0 41 t3 enclosure IRemoval Monitoring of t3 d2 t3b1 ip xxx 0 0 41 ended t3 kontroller Remove Red Y Info Action Information The Sun Component fcontroller ulctr torEdge T3 array id was removed has reported that a from T3 diag213 controller was ip xxx 20 67 213 removed from the chassis Recommended action Replace the Controller within 30 minute power shutdown window 116 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 Category t3 t3 t3 t3 Component disk interface loopcard power kontroller EventType Remove Component Remove Component Remove Component State Change Sev Red Red Red Chapter 8 For Internal Use Only Action Description Info Action disk u2d3 SEAGAT E ST318203FSUN18 G LRG07139 was removed from diag158 ip xxx 20 67 158 Info Action Informat
88. n to all components

checkdefaultconfig
  Description and Cause of Error: If the Sun StorEdge 3900 or 6900 series has multiple (more than two) failures, for example both virtualization engines and two switches are down, the getcabinet tool might not determine the correct cabinet type. In this example, the getcabinet script might determine the device to be a Sun StorEdge 3900 series when in reality it is a Sun StorEdge 6900 series.
  Suggested Action: Set the BOXTYPE variable as follows:
    BOXTYPE=6910; export BOXTYPE

  Message: Could not determine the Sun StorEdge system type
  Description and Cause of Error: Multiple components might be down, and the getcabinet command could not determine the Sun StorEdge series type (3910, 3960, 6910, or 6960).
  Suggested Action: Try using the command line interface (CLI) by setting the BOXTYPE environment variable to one of the four values. For example:
    BOXTYPE=3910; export BOXTYPE

setdefaultconfig
  Description and Cause of Error: The system could not determine the Sun StorEdge system type.
  Suggested Action: Try using the command line interface (CLI) by setting the BOXTYPE environment variable to one of the four values. For example:
    BOXTYPE=3910; export BOXTYPE

setupswitch Exit Values

TABLE 9-1 lists the setupswitch exit values. The associated messages are logged in the /var/adm/log/SEcfglog log file.

TABLE 9-1 setupswitch Exit Values

    Severity Level   Message Type   Message Meaning
    0                INFO           All switch settings are properly s
89. nforment aux licences crites de Sun LA DOCUMENTATION EST FOURNIE EN L ETAT ET TOUTES AUTRES CONDITIONS DECLARATIONS ET GARANTIES EXPRESSES OU TACITES SONT FORMELLEMENT EXCLUES DANS LA MESURE AUTORISEE PAR LA LOI APPLICABLE Y COMPRIS NOTAMMENT TOUTE GARANTIE IMPLICITE RELATIVE A LA QUALITE MARCHANDE A L APTITUDE A UNE UTILISATION PARTICULIERE OU A L ABSENCE DE CONTREFA ON Od aoe Ca Adobe PostScript Contents Introduction 1 Predictive Failure Analysis Capabilities 2 General Troubleshooting Procedures 3 Troubleshooting Overview Tasks 3 Multipathing Options in the Sun StorEdge 6900 Series 7 Alternatives to Sun StorEdge Traffic Manager 8 To Quiesce the I O 8 To Unconfigure the c2 Path 8 To Suspend the I O 10 To Return the Path to Production 10 To View the VxDisk Properties 11 To Quiesce the I O on the A3 B3 Link 13 To Suspend the I O on the A3 B3 Link 13 DENSE SESS SEEE SEEE GE EEE 4 To Return the Path to Production 14 Fibre Channel Links 15 Fibre Channel Link Diagrams 16 Host Side Troubleshooting 18 Storage Service Processor Side Troubleshooting 18 Contents iii For Internal Use Only Command Line Test Examples 19 qictest 1M 19 switchtest 1M 20 Storage Automated Diagnostic Environment Event Grid 21 v To Customize an Event Report 21 Troubleshooting the Fibre Channel Links 23 A1 B1 Fibre Channel FC Link 23 v To Verify the Data Host 25 FRU Tests Available for A1 B1 FC Link Segment 26 v T
91. nly switchtest 1M switchtest 1M is used to diagnose the Sun StorEdge network FC switch 8 and switch 16 switch devices The switchtest process also provides command line access to switch diagnostics switchtest supports testing on local and remote switches switchtest runs the port diagnostic on connected switch ports While switchtest is running the port statistics are monitored for errors and the chassis status is checked CODE EXAMPLE 2 2 switchtest 1M opt SUNWstade Diags bin switchtest v o dev 2 192 168 0 30 0x0 xfersize 200 switchtest called with options dev 2 192 168 0 30 0x0 xfersize 200 switchtest Started Testing port 2 Using ip_addr 192 168 0 30 fcaddr 0x0 to access this port Chassis Status for Device Switch Power OK Temp OK 23 0c Fan 1 OK Fan 23 ORM Testing Device Switch Port Pattern 0x7e7e7e7e Pattern Oxlelelele Pattern Oxflflflfi1 Pattern Oxb5b5b5b5 Pattern Ox4a4a4a4a Pattern 0x78787878 Pattern Oxe7e7e7e7 Testing Device Switch Port Port Port Testing Device Switc Testing Device Switc Testing Device Switch Port Testing Device Switch Port Pattern Oxaa55aa55 Pattern 0x7f7f7f7f Port Pattern Ox0f0f0f0E Testing Device Switch Port Pattern Ox00ff00fE Testing Device Switch Port 2 Pattern 0x25252525 Port 2 passed all tests on Switch switchtest Stopped successfully Testing Device Switch Port Testing Device S
92. o Admin FIGURE 3 3 Storage Service Processor Notification Note An A1 B1 FC link error can cause a port in swla or swlb to change state 24 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 v To Verify the Data Host An error in the A1 B1 FC link can cause a path to go offline in the multipathing software CODE EXAMPLE 3 1 luxadm 1M Display luxadm display Status Port A Status Port B Vendor Product ID WWN Node WWN Port A WWN Port B Revision Serial Num Unformatted capacity Write Cache Read Cache Minimum prefetch Maximum prefetch Device Type Path s Controller Device Address Class State Controller Device Address dev rdsk c6t29000060220041F96257354230303052d0s2 DEVICE PROPERTIES for disk c6t29000060220041F96257354230303052d0s2 dev rdsk O K O K SUN SESSO1 2a000060220041f4 2b000060220041 f4 2b000060220041f9 080c Unsupported 102400 000 MBytes Enabled Enabled 0x0 0x0 Disk device dev rdsk c6t29000060220041F96257354230303052d0s2 devices scsi_vhci ssd g29000060220041 96257354230303052 c raw devices pci 6 4000 SUNW glc 3 fp 0 0 2b6000060220041f9 0 primary OFFLINE devices pci 6 4000 SUNW glc 2 fp 0 0 2b6000060220041f4 0 For Internal Use Only Class primary State ONLINE Chapter 3 Troubleshooting the Fibre Channel Links 25 An error in the A1 B1 FC link can also cause a device t
93. o Action The state of disk u1d1 Port1State on T3 300 changed from OK to failed nterface loopcard cab hanged from Ok to failed el Anfo Action The state of power u1pcul BatState on power battery iag213 ip xxx 20 67 213 is Faut oe ieee aN Anfo Action The state of power u1pcu1 PowOutput on Info Action The state of power u1pcul PowTemp on iag213 ip xxx 20 67 213 is Fault Info Action Errors s found in logfile Action Time of T3 diag213 ip xxx 20 67 213 is ifferent from host T3 Fri Oct 26 10 16 17 200 2001 10 26 12 21 04 Info Auditing a new T3 called ras d2 t3b1 enclosure enclosure FIGURE 8 5 Sun StorEdge T3 array Event Grid Chapter8 Troubleshooting the Sun StorEdge T3 Array Devices 109 For Internal Use Only The following table lists all of the events for the Sun StorEdge T3 array loopcard cab lle state of loopcable u111 CableSt late changed from OK to failed Drive Status Messages Value Description 0 Drive mounted 2 Drive present B Drive is spun up 4 Drive is disable 5 Drive has been replaced 7 Invalid system area jon drive 9 Drive not present ID Drive disabled drive is being reconstructed S Drive substituted Category Component EventType Sev Action Description Information t3 power temp Alarm The state of lpower ulpcul PowTe Imp on diag213 ip xxx 20 67 213 is INormal
94. o Isolate the A1 B1 FC Link 28 A2 B2 Fibre Channel FC Link 29 v To Verify the Host Side 31 v To Verify the A2 B2 FC Link 33 v To Isolate the A2 B2 FC Link 33 A3 B3 Fibre Channel FC Link 35 v To Verify the Host Side 37 v To Verify the Storage Service Processor 38 FRU Tests Available for the A3 B3 FC Link Segment 38 v___ To Isolate the A3 B3 FC Link 39 A4 B4 Fibre Channel FC Link 40 v To Verify the Data Host 42 Sun StorEdge 3900 Series 42 Sun StorEdge 6900 Series 42 FRU tests available for the A4 B4 FC Link Segment 44 v To Isolate the A4 B4 FC Link 44 Configuration Settings 47 Verifying Configuration Settings 47 Contents iv For Internal Use Only v To Verify Configuration Settings 47 v ToClear the Lock File 50 Troubleshooting Host Devices 53 Host Event Grid 53 v Usingthe Host Event Grid 53 Replacing the Master Alternate Master and Slave Monitoring Host 57 v To Replace the Master Host 57 v To Replace the Alternate Master or Slave Monitoring Host 58 Conclusion 59 Troubleshooting Sun StorEdge FC Switch 8 and Switch 16 Devices 61 Sun StorEdge Network FC Switch 8 and Switch 16 Switch Description 61 v To Diagnose and Troubleshoot Switch Hardware 62 Switch Event Grid 62 v Using the Switch Event Grid 62 Replacing the Master Midplane 68 v To Replace the Master Midplane 68 Conclusion 68 Troubleshooting Virtualization Engine Devices 69 Virtualization Engine Description 69 Virtualization Engine Diagnostics 70 Se
95. o enter the unusable state in cfgadm. In this case, the output for luxadm -e port will show that a device that was connected changed to an unconnected state.

CODE EXAMPLE 3-2 cfgadm -al Display

    cfgadm -al
    Ap_Id                  Type        Receptacle  Occupant      Condition
    c0                     scsi-bus    connected   configured    unknown
    c0::dsk/c0t0d0         disk        connected   configured    unknown
    c0::dsk/c0t1d0         disk        connected   configured    unknown
    c1                     scsi-bus    connected   configured    unknown
    c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
    c2                     fc-fabric   connected   configured    unknown
    c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
    c2::2b000060220041f4   disk        connected   configured    unknown
    c3                     fc-fabric   connected   configured    unknown
    c3::2b000060220041f9   disk        connected   configured    unusable
    c4                     fc-private  connected   unconfigured  unknown
    c5                     fc          connected   unconfigured  unknown

FRU Tests Available for A1/B1 FC Link Segment

  - HBA: qlctest(1M)
      - Available only if the Storage Automated Diagnostic Environment is installed on a data host.
      - Causes the HBA to go offline and online during tests.
  - Switch: switchtest(1M)
      - Can be run while the link is still cabled and online (connected to the HBA).
      - You must specify a payload of 200 bytes or less when testing the A1/B1 FC link while the link is connected to the HBA (a limitation in the HBA ASIC).
      - Can be run only from the Storage Service Processor.
      - The dev option to switchtest is in the following format: Port IP Address FC
96. ommended action

  Description: now Not Available (status state changed from ready enabled to ready disable)
  Recommended action:
    1. Telnet to the affected Sun StorEdge T3 array.
    2. Check the status of LUNs via vol mode or vol stat.

  t3 / enclosure / Statistics: Statistics about T3 d2-t3b1 (ip xxx.0.0.41)

Replacing the Master Midplane

Follow this procedure when replacing the master midplane in a Sun StorEdge T3 array. This procedure is detailed in the Storage Automated Diagnostic Environment User's Guide.

To Replace the Master Midplane

1. Choose Maintenance > General Maintenance > Maintain Devices. Refer to Chapter 3 of the Storage Automated Diagnostic Environment User's Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance > General Maintenance > Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance > Topology Maintenance > Topology Snapshot.
   a. Select the host that monitors the replaced FRU.
   b. Click Create and Retrieve Selected Topologies.
   c. Click Merge and Push Master Topology.

Conclusion

Any time a master midplane is replaced, you must rediscover the device using the procedure described above. This is especially important when the Storage Service Processor is replaced as a FRU, whether the Storage Service Processor is the master or the slave.

122 Sun StorEdge 3900 an
97. onfigured unknown c21 50020 2300006355 disk connected configured unusable

FRU Tests Available for the A4/B4 FC Link Segment

  - The switchtest can be run only from the Storage Service Processor.
  - The linktest can isolate the switch and the GBIC on the switch. It cannot isolate the cable or the Sun StorEdge T3 array controller.

To Isolate the A4/B4 FC Link

1. Quiesce the I/O on the A4/B4 FC link path.
2. Run linktest from the Storage Automated Diagnostic Environment GUI to isolate suspected failing components.

Alternatively, follow these steps:

1. Quiesce the I/O on the A4/B4 FC link path.
2. Run switchtest to test the entire link (re-create the problem).
3. Break the connection by uncabling the link.
4. Insert the loopback connector into the switch port.
5. Rerun switchtest.
   a. If switchtest fails, replace the GBIC and rerun switchtest.
   b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the Sun StorEdge T3 array controller.
   a. Replace the cable.
   b. Rerun switchtest.
   c. If the test fails again, replace the Sun StorEdge T3 array controller.
7. Return the path to production.
8. Return the Sun StorEdge T3 array LUNs to the correct controllers if a failover occurred (determine if failovers occur using the luxadm failover or mpdrive failback co
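The replacement order in the manual isolation steps above can be summarized as a small decision routine. This is a minimal Python sketch, not a Sun utility: `loopback_test` and `link_test` are hypothetical callables that stand in for an operator running switchtest (with the loopback connector inserted, and with the link recabled) and reporting True when the test passes.

```python
def isolate_a4b4(loopback_test, link_test):
    """Return the FRU actions, in order, for one pass of the procedure.

    loopback_test: callable, True if switchtest passes with the loopback
                   connector in the switch port (hypothetical stand-in).
    link_test:     callable, True if switchtest passes on the recabled
                   link (hypothetical stand-in).
    """
    actions = []
    if not loopback_test():
        # Failure with the loopback in place points at the switch side.
        actions.append("replace GBIC")
        if not loopback_test():
            actions.append("replace switch")
        return actions
    # Switch and GBIC passed, so the cable and the array controller remain.
    actions.append("replace cable")
    if not link_test():
        actions.append("replace Sun StorEdge T3 array controller")
    return actions


if __name__ == "__main__":
    # Example: loopback test fails once, then passes after the GBIC swap.
    results = iter([False, True])
    print(isolate_a4b4(lambda: next(results), lambda: True))
```

The routine deliberately stops after the switch-side branch, mirroring the procedure: the cable and controller are suspected only when the loopback test passes.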
98. orEdge 3900 and 6900 Series Troubleshooting Guide March 2002 continued from previous page Site Lab 3286 DSQA1 Broomfield Source diag xxxxx xxx com Severity Warning Category Message DeviceId message diag xxxxx xxx com EventType LogEvent driver Fabric_Warning EventTime 01 30 2002 11 50 07 Found 1 driver Fabric_Warning warning s in logfile var adm messages on diag xxxxx xxx com id 809f76b4 INFORMATION Fabric warning Jan 30 11 46 37 WWN 2b00006022004186 diag xxxxx xxx com fp ID 517869 kern warning WARNING fp 2 N_x Port with D_ID 108000 PWWN 2b00006022004186 reappeared in fabric in backup diag xxxxx xxx com Site Lab 3286 DSQA1 Broomfield Source diag xxxxx xxx com Severity Warning Actionable Category Host DeviceId host diag xxxxx xxx com EventType AlarmEvent P hba EventTime 01 30 2002 11 50 10 status of hba devices pci 1f 4000 pci 2 SUNW qlc 5 fp 0 0 devctl on diag xxxxx xxx com changed from NOT CONNECTED to CONNECTED INFORMATION monitors changes in the output of luxadm e port FIGURE 8 2 Virtualization Engine Alert Chapter 8 Troubleshooting the Sun StorEdge T3 Array Devices 105 For Internal Use Only Sun StorEdge T3 Array Storage Service Processor Verification 1 Run port listmap on the Sun StorEdge T3 array to see the failover event t3b0 lt 1 gt port listmap port targetid addr_type lun volume owner access ulpl 0 hard 0 voll ul pri
99. osystems Inc The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems Inc for its users and licensees Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry Sun holds a non exclusive license from Xerox to the Xerox Graphical User Interface which license also covers Sun s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun s written license agreements Federal Acquisitions Commercial Software Government Users Subject to Standard License Terms and Conditions DOCUMENTATION IS PROVIDED AS IS AND ALL EXPRESS OR IMPLIED CONDITIONS REPRESENTATIONS AND WARRANTIES INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY FITNESS FOR A PARTICULAR PURPOSE OR NON INFRINGEMENT ARE DISCLAIMED EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID Copyright 2002 Sun Microsystems Inc 4150 Network Circle Santa Clara CA 95054 Etats Unis Tous droits r serv s Ce produit ou document est distribu avec des licences qui en restreignent l utilisation la copie la distribution et la d compilation Aucune partie de ce produit ou document ne peut tre reproduite sous aucune forme par quelque moyen que ce soit sans l autorisation pr alable et crite de Sun et de ses bailleurs de licence s il y en a Le logiciel d tenu par des tiers et qui comprend la technologie relative aux pol
100. otice a pathing failure in its multipathing software Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 T1 T2 Notification Events The example below shows a typical port failure event Site Lab 3286 DSQA1 Broomfield Source diag xxxxx xxx com Severity Error Actionable Category Switch DevicelId switch 100000c0dd00b682 EventType StateChangeEvent M port 8 EventTime 01 30 2002 11 17 22 port 8 in SWITCH diag209 sw2a ip 192 168 0 32 is now Not Available status state changed from Online to Offline INFORMATION A port on the switch has logged out of the fabric and gone offline PROBABLE CAUSE 1 Verify cables GBICs and connections along Fibre Channel path 2 Check Storage Automated Diagnostic Environment SAN Topology GUI to identify failing segment of the data path 3 Verify correct FC switch configuration Site Lab 3286 DSQA1 Broomfield Source diag xxxxx xXxx com Severity Warning Category Switch DevicelId switch 100000c0dd00b682 EventType LogEvent MessageLog EventTime 01 30 2002 11 17 22 Change in Port Statistics on switch diag209 sw2a ip 192 168 0 32 Port 8 Received 9746 InvalidTxWds in 0 mins value 9805 FIGURE 8 1 Storage Service Processor Event Chapter 8 Troubleshooting the Sun StorEdge T3 Array Devices 103 For Internal Use Only If both T Ports go offline you might see messages like the following Note the virtualization engine Even
101. own unconfigured unknown CONNECTED CONNECTED devices scsi_vhci ssd g29000060220041 96257354230303052 c raw For Internal Use Only Controller devices pcit6 4000 SUNW qlc 3 fp 0 0 Device Address 2b000060220041f9 0 Class primary State OFFLINE Controller devices pcit6 4000 SUNW qlc 2 fp 0 0 Device Address 26000060220041f4 0 Class primary State ONLINE Chapter 3 Troubleshooting the Fibre Channel Links 37 CODE EXAMPLE 3 6 VxDMP Error Message Jan 8 18 26 38 diag xxxxx xxx com vxdmp ID 619769 kern notice NOTICE vxdmp Path failure on 118 0x1f8 Jan 8 18 26 38 diag xxxxx xxx com vxdmp ID 997040 kern notice NOTICE vxvm vxdmp disabled path 118 0x1f8 belonging to the dmpnode 231 0xd0 v To Verify the Storage Service Processor You can check the A3 B3 FC link using the Storage Automated Diagnostic Environment Diagnose Test from Topology functionality Storage Automated Diagnostic Environment s implementation of diagnostic tests verify the operation of user selected components Using the Topology view you can select specific tests subtests and test options Refer to the Storage Automated Diagnostic Environment User s Guide for more information FRU Tests Available for the A3 B3 FC Link Segment m The Linktest is not available m The switch and or GBIC switchtest test Can be used only in conjunction with the loopback connector a Cannot be cabled to the virtualization engine while switc
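The VxDMP messages in CODE EXAMPLE 3-6 can be scanned for failed paths with a short script. The sketch below is illustrative only. It assumes the raw syslog line carries the punctuation that the text extraction above has stripped (for example `NOTICE: vxdmp: Path failure on 118/0x1f8`), and it assumes the two values are a decimal number and a hexadecimal number separated by a slash. The names `PATH_FAILURE` and `failed_paths` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message shape: "NOTICE: vxdmp: Path failure on <dec>/<hex>"
PATH_FAILURE = re.compile(
    r"NOTICE: vxdmp: Path failure on (\d+)/(0x[0-9a-fA-F]+)"
)


def failed_paths(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Collect (first, second) number pairs from path-failure notices."""
    found = []
    for line in lines:
        match = PATH_FAILURE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2), 16)))
    return found


if __name__ == "__main__":
    sample = [
        "Jan  8 18:26:38 host vxdmp: [ID 619769 kern.notice] "
        "NOTICE: vxdmp: Path failure on 118/0x1f8",
        "Jan  8 18:26:39 host unrelated message",
    ]
    print(failed_paths(sample))
```

If the actual log format differs from the assumed one, only the regular expression needs to change.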
102. pair f command output Run the command resetve n vename checkvemap Cannot establish communication with Run the command again If this fails check the status of both virtualization engines If there is an error condition see Appendix A for corrective action SUNWsecfg Error Messages 133 TABLE B 1 Virtualization Engine SUNWsecfg Error Messages Continued Message Description and Cause of Error Suggested Action createvezone Invalid WWN wwn on vepair WWN that has already been specified initiator Sinit or virtualization has a SLIC zone and or an HBA alias engine is unavailable assigned Note that for a WWN to be available for createvezone the zone name in the map file showvemap n lve_pairname must be undefined and the online status should be yes If a zone name is assigned run the rmvezone command If there are still errors try Sadapter alias d vepair r Sinitiator a zone n and then run savemap n Svepair Listavailable No virtualization engines are available If no other commands are running and They are either not found or the lyou believe the configuration lock configuration lock is set might be set in error run the remove locks command Either the components the Sun StorEdge T3 array the switch or the Virtualization engine are down cannot be pinged or another ISUNWsecfg command is running and lis updating the configuration ps ef Irestorevemap 1 Import zone data
103. path by way of the Sun StorEdge network FC switch 8 and switch 16 switch T Ports Chapter 7 Troubleshooting Virtualization Engine Devices 89 For Internal Use Only In the event of a path failure after the second tier of Sun StorEdge network FC switch 8 and switch 16 switches or in the event of both T Ports failing between the switches the virtualization engines force a LUN failover of the affected Sun StorEdge T3 array and routes all I O to its secondary path From the host side nothing has changed all I O is routed through both HBAs refer to FIGURE 7 6 Storage IO amp SVE Communications Traffic Logical Multipath Drive MPDrive 0 FIGURE 7 2 Sun StorEdge 6900 Series Logical View 90 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 Storage IO amp SVE Communications Traffic FIGURE 7 3 Primary Data Paths to the Alternate Master Chapter 7 Troubleshooting Virtualization Engine Devices 91 For Internal Use Only FIGURE 7 4 Primary Data Paths to the Master Sun StorEdge T3 Array 92 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 Storage IO amp SVE Communications Traffic FIGURE 7 5 Path Failure Before the Second Tier of Switches Chapter 7 Troubleshooting Virtualization Engine Devices 93 For Internal Use Only FIGURE 7 6 Path Failure I O Routed through Both HBAs 94 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002
104. Item / Description
    TimeStamp: January 3 2002 10:13
    nnn: v1 (virtualization engine pair v1)
    uuuuuuuu: 29000060 220041F9 (v1a, obtained by checking the virtualization engine map from the SEcfg utility)
    SRN mmmmm: SRN 70030, SAN Configuration Changed (refer to Appendix A for codes)

    To Clear the Log

    Use the /opt/svengine/sduc/sclrlog command.

    Virtualization Engine LEDs

    TABLE 7-1 describes the LEDs on the back of the virtualization engine.

    TABLE 7-1 Virtualization Engine LEDs

    LED      Color   State                Description
    Power    Green   Solid on             The virtualization engine is powered on.
    Status   Green   Solid on             Normal operating mode.
                     Blink Service Code   Number of blinks to indicate a decimal number.
    Fault    Amber                        Serious problem. Decipher the blinking of the Status LED to determine the service code. Once you have determined the service code, look up the decimal number of the service code in Appendix A.

    1. The Status LED will blink a service code when the Fault LED is Solid on.

    Power LED Codes

    The virtualization engine LEDs are shown in FIGURE 7-1.

    FIGURE 7-1 Virtualization Engine Front Panel LEDs

    Interpreting LED Service and Diagnostic Codes

    The Status LED communicates the status of the virtualization engine in decimal numbers. Each decimal number is represented by a number of blinks followed by a medium duration two seconds
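Once the blinks of the Status LED have been counted digit by digit, the service code is simply those digits read as one decimal number, which can then be looked up in Appendix A. The sketch below shows only that last assembly step. It is an illustration under assumptions: the function name `service_code` is hypothetical, each list entry is taken to be one already-decoded decimal digit, and how the LED signals a zero or separates digits is not covered by the truncated text above, so the caller supplies the digits directly.

```python
from typing import Sequence


def service_code(digits: Sequence[int]) -> int:
    """Combine observed decimal digits into one service code number."""
    if not digits:
        raise ValueError("no digits observed")
    code = 0
    for digit in digits:
        if not 0 <= digit <= 9:
            raise ValueError(f"not a decimal digit: {digit}")
        # Shift the running value one decimal place and add the digit.
        code = code * 10 + digit
    return code


if __name__ == "__main__":
    # Digits 1, 2, 5 observed in sequence give service code 125.
    print(service_code([1, 2, 5]))
```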
105. rid 63 Virtualization Engine Front Panel LEDs 73 Sun StorEdge 6900 Series Logical View 90 Primary Data Paths to the Alternate Master 91 Primary Data Paths to the Master Sun StorEdge T3 Array 92 Path Failure Before the Second Tier of Switches 93 List of Figures ix GURE 7 6 GURE 7 7 GURE 8 1 GURE 8 2 GURE 8 3 GURE 8 4 GURE 8 5 Path Failure I O Routed through Both HBAs 94 Virtualization Engine Event Grid 95 Storage Service Processor Event 103 Virtualization Engine Alert 105 Manage Configuration Files Menu 106 Example Link Test Text Output from the Storage Automated Diagnostic Environment 107 Sun StorEdge T3 array Event Grid 109 List of Figures x Preface The Sun StorEdge 3900 and 6900 Series Troubleshooting Guide provides guidelines for isolating problems in supported configurations of the Sun StorEdge 3900 and 6900 series For detailed configuration information refer to the Sun StorEdge 3900 and 6900 Series Reference Manual The scope of this troubleshooting guide is limited to information pertaining to the components of the Sun StorEdge 3900 and 6900 series including the Storage Service Processor and the virtualization engines in the Sun StorEdge 6900 series This guide is written for Sun personnel who have been fully trained on all the components in the configuration How This Book Is Organized This book contains the following topics Chapter 1 introduces the Sun StorEdge 3900
106. rocesses are running, you can start them manually using the SUNWsecfg scripts, which are located in the /opt/SUNWsecfg/bin directory (for example, startslicd -n v1).

    CODE EXAMPLE 7-3 slicd Output Example

    ps -ef | grep slic
    root 6299 6295 0 Jan 04 0:00 slicd
    root 6296 6295 0 Jan 04 0:02 slicd
    root 6295 1 0 Jan 04 0:01 slicd
    root 6357 6295 0 Jan 04 0:00 slicd
    root 6362 6295 0 Jan 04 0:03 slicd

    For detailed information about the SUNWsecfg scripts, refer to the Sun StorEdge 3900 and 6900 Series Reference Manual.

    To Replace a Failed Virtualization Engine

    1. Replace the old (failed) virtualization engine unit with a new unit.
    2. Identify the MAC address of the new unit and replace the old MAC address with the new one in the /etc/ethers file:
       8:0:20:7d:82:9e virtualization-engine-name
    3. Verify that RARP is running on the Storage Service Processor.
    4. Disable the switch port:
       /opt/SUNWsecfg/flib/setveport -v VE-name -d
    5. Power on the new unit.
    6. Log in to the new unit, for example:
       telnet v1a (virtualization engine name)
    7. From the User Service Utility Menu, enter 9 to clear the SAN database.
    8. Choose Quit to clear the SAN database.
    9. Configure the new unit:
       setupve -n virtualization-engine-name
    10. Check the configuration:
       checkve -n virtualization-engine-name
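The replacement procedure above requires a new /etc/ethers entry for the replacement unit. The following sketch builds such an entry from a MAC address and a host name. It is an illustration, not a Sun utility: the function name `ethers_entry` is hypothetical, and the output style (colon-separated hexadecimal octets without leading zeros, followed by the host name) is assumed from the sample entry shown in the procedure.

```python
import re


def ethers_entry(mac: str, hostname: str) -> str:
    """Format one /etc/ethers line: '<mac> <hostname>'.

    The MAC address is validated and normalized to lowercase hex
    octets without leading zeros, for example 8:0:20:7d:82:9e.
    """
    octets = mac.strip().split(":")
    valid = len(octets) == 6 and all(
        re.fullmatch(r"[0-9a-fA-F]{1,2}", octet) for octet in octets
    )
    if not valid:
        raise ValueError(f"not a MAC address: {mac!r}")
    normalized = ":".join(format(int(octet, 16), "x") for octet in octets)
    return f"{normalized}\t{hostname}"


if __name__ == "__main__":
    # "v1a" stands in for the virtualization engine name.
    print(ethers_entry("08:00:20:7D:82:9E", "v1a"))
```

Validating the address before editing the file avoids a silent RARP failure caused by a mistyped octet.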
107. roperly using the showvemap n Svepair command If the diskpool is unavailable try creatediskpools In t3name If that fails check the Sun StorEdge 3 array for unmounted volumes or path failures by using Checkt3config n St3name v createvlun Unable to execute command The Run checkt3mount n t3name associated Sun StorEdge T3 array 1 ALL to see the mount status of the physical LUN t31lun for disk pool volume For further information about diskpool might not be mounted problems with the underlying Sun StorEdge T3 array try cCheckt3config n St3name v Listavailable INo Sun StorEdge T3 arrays are If no other commands are running and available They are either not found or lyou believe the configuration lock the configuration lock is set might be set in error run the removelocks command Either the components the Sun StorEdge T3 array the switch or the virtualization engine are down cannot be pinged or another ISUNWsecfg command is running and lis updating the configuration ps ef modifyt3config The lock file clear waiting period Check to see if the modifyt3config expired and the creatediskpools and restoret3config commands command is aborted fare executing on other Sun StorEdge T3 arrays If the commands are executing wait for them to complete and then run creatediskpools n it 3name 138 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 TABLE B 3
108. roubleshooting Guide March 2002 Troubleshooting the T1 T2 Data Path 102 Notes 102 T1 T2 Notification Events 103 Sun StorEdge T3 Array Storage Service Processor Verification 106 T1 T2 FRU Tests Available 107 Notes 108 T1 T2 Isolation Procedures 108 Sun StorEdge T3 Array Event Grid 109 v Using the Sun StorEdge T3 Array Event Grid 109 Replacing the Master Midplane 122 v To Replace the Master Midplane 122 Conclusion 122 Troubleshooting Ethernet Hubs 123 setupswitch Exit Values 141 Contents For Internal Use Only vii viii Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 GURE 2 1 GURE 2 2 GURE 3 1 GURE 3 2 GURE 3 3 GURE 3 4 GURE 3 5 GURE 3 6 GURE 3 7 GURE 3 8 GURE 3 9 GURE 3 10 GURE 5 1 GURE 6 1 GURE 7 1 GURE 7 2 GURE 7 3 GURE 7 4 GURE 7 5 List of Figures Sun StorEdge 3900 Series Fibre Channel Link Diagram 16 Sun StorEdge 6900 Series Fibre Channel Link Diagram 17 Data Host Notification of Intermittent Problems 23 Data Host Notification of Severe Link Error 24 Storage Service Processor Notification 24 A2 B2 FC Link Host Side Event 29 A2 B2 FC Link Storage Service Processor Side Event 30 A3 B3 FC Link Host Side Event 35 A3 B3 FC Link Storage Service Processor Side Event 36 A3 B3 FC Link Storage Service Processor Side Event 36 A4 B4 FC Link Data Host Notification 40 Storage Service Processor Notification 41 Host Event Grid 54 Switch Event G
109. rvice 805 3067 Manual Storage server processor e Netra X1 Server User s Guide 806 5980 e Netra X1 Server Hard Disk Drive Installation Guide 806 7670 xiv Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 Accessing Sun Documentation Online A broad selection of Sun system documentation is located at http www sun com products n solutions hardware docs A complete set of Solaris documentation and many other titles are located at http docs sun com Sun Welcomes Your Comments Sun is interested in improving its documentation and welcomes your comments and suggestions You can email your comments to Sun at docfeedback sun com Please include the part number 816 4290 10 of your document in the subject line of your email Preface xv xvi Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 CHAPTER 1 Introduction Series The Sun StorEdge 3900 and 6900 series storage subsystems are complete preconfigured storage solutions The configurations for each of the storage subsystems are shown in TABLE 1 1 TABLE 1 1 Sun StorEdge 3900 series Sun StorEdge 6900 series System Sun StorEdge 3910 system Sun StorEdge 3960 system Sun StorEdge 6910 system Sun StorEdge 6960 system Sun StorEdge Fibre Channel Switch Supported Two 8 port switches Two 16 port switches Two 8 port switches Two 16 port switches Sun StorEdge T3 Array Partner Gro
110. rvice Request Numbers 70 Service and Diagnostic Codes 70 v To Retrieve Service Information 70 CLI Interface 70 v To Display Log Files and Retrieve SRNs_ 71 v ToClear the Log 72 Contents For Internal Use Only v vi Virtualization Engine LEDs 72 Power LED Codes 73 Interpreting LED Service and Diagnostic Codes 73 Back Panel Features 74 Ethernet Port LEDs 74 Fibre Channel Link Error Status Report 75 v To Check Fibre Channel Link Error Status Manually 76 Translating Host Device Names 78 v To Display the VLUN Serial Number 79 Devices That Are Not Sun StorEdge Traffic Manager Enabled 79 Sun StorEdge Traffic Manager Enabled Devices 80 To View the Virtualization Engine Map 81 v To Failback the Virtualization Engine 83 v To Replace a Failed Virtualization Engine 84 v To Manually Clear the SAN Database 86 v To Reset the SAN Database on Both Virtualization Engines 86 v To Reset the SAN Database on a Single Virtualization Engine 86 Stopping and Restarting the SLIC Daemon 87 v To Restart the SLIC Daemon 87 Sun StorEdge 6900 Series Multipathing Example 89 One Sun StorEdge T3 array partner pair with 1 500GB RAID 5 LUN per brick 2 LUNs total 89 Virtualization Engine Event Grid 95 v___ Using the Virtualization Engine Event Grid 95 Troubleshooting the Sun StorEdge T3 Array Devices 99 Explorer Data Collection Utility 99 v To Install Explorer Data Collection Utility on the Storage Service Processor 99 Sun StorEdge 3900 and 6900 Series T
111. s and virtualization engines Chapter 5 Troubleshooting Host Devices 55 For Internal Use Only Storage Automated Diagnostic Environment Event Grid for the Host Continued TABLE 5 1 host lun VE host ifptest host qlctest host socaltest host enclosure host enclosure Alarm Diagnostic Test Diagnostic Test Diagnostic Test IPatchInfo backup Red Red Red Red Y Info The state of lun VE c14t50020 F2300003EE5d0s2 statusA on diag Xxxxx xxx com changed from OK to IERROR target ve diag244 lve0 90 0 0 40 ifptest diag240 on host failed qictest diag240 on host failed socaltest diag240 on host failed Info New patch land package information Info Agent Backup 56 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 luxadm display reported a change in the port status of fone of its paths The Storage Automated Diagnostic Environment then tries to find to which enclosure this path corresponds by reviewing its database of Sun StorEdge T3 arrays and virtualization engines Send changes to the output of showrev p and pkginfo generated Backup of the configuration file of the agent Replacing the Master Alternate Master and Slave Monitoring Host The following procedures are a high level overview of the procedures that are detailed in the Storage Automated Diagnostic Environment User s G
112. sS amp Sun microsystems Sun StorEdge 3900 and 6900 series Troubleshooting Guide Sun Microsystems Inc 4150 Network Circle Santa Clara CA 95054 U S A 650 960 1300 Part No 816 4290 11 March 2002 Revision A Send comments about this document to docfeedback sun com Copyright 2002 Sun Microsystems Inc 4150 Network Circle Santa Clara CA 95054 U S A All rights reserved This product or document is distributed under licenses restricting its use copying distribution and decompilation No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors if any Third party software including font technology is copyrighted and licensed from Sun suppliers Parts of the product may be derived from Berkeley BSD systems licensed from the University of California UNIX is a registered trademark in the U S and other countries exclusively licensed through X Open Company Ltd Sun Sun Microsystems the Sun logo AnswerBook2 Sun StorEdge StorTools docs sun com Sun Enterprise Sun Fire SunOS Netra and Solaris are trademarks registered trademarks or service marks of Sun Microsystems Inc in the U S and other countries All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International Inc in the U S and other countries Products bearing SPARC trademarks are based upon an architecture developed by Sun Micr
113. sfer size for a FABRIC port is 200
    Changing transfer size 2000 to 200
    switchtest completed successfully
    Remove FC loopback cable from switch 100000c0dd00b682 port 8
    Restore ORIGINAL FC Cable into switch 100000c0dd00b682 port 8
    Suspect ORIGINAL FC GBIC in switch 100000c0dd00b682 port 8
    Retest to verify FRU replacement
    linktest completed on FC interconnect switch to switch

    FIGURE 8-4 Example Link Test Text Output from the Storage Automated Diagnostic Environment

    Notes

    - When you insert a loopback connector into the T Port, there is NO green light to indicate a proper insertion. However, the test will run and be valid. There is currently an RFE to address this issue.
    - If only one of the links has failed and the I/O is travelling over the remaining link, the switch automatically routes I/O over the repaired link once the failed link is replaced and recabled. No manual intervention is required.
    - If both links have failed and a LUN failover has occurred, the user must manually perform an mpdrive failback after repairing and recabling the links to return the paths to their optimal state. I/O then resumes as normal over the T Ports.

    T1/T2 Isolation Procedures

    1. Run linktest from the Storage Automated Diagnostic Environment for a guided isolation procedure.
    2. After replacing the failed
114. t Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 TW ERROR 765 in 11 mins Origin port 1 on switch swlb 192 168 0 31 An invalid transmission word ITW was detected between two components Likely Causes are GBIC FC Cable and device optical connections To isolate further please run the Storage Automated Diagnostic Environment v To Verify the Host Side An error in the A2 B2 FC link can result in a device being listed as in an unusable state in cf gadm but no HBAs are listed as in the unconnected state in luxadm output The multipathing software will note an OFFLINE path Chapter 3 Troubleshooting the Fibre Channel Links 31 For Internal Use Only CODE EXAMPLE 3 4 cfgadm cfgadm al Ap_Id c0 lt snip gt luxadm e port Status Port A Status Port B Vendor Product ID WWN Node WWN Port A WWN Port B Revision Serial Num Unformatted capacity Write Cache Read Cache Minimum prefetch Maximum prefetch Device Type Path s Controller Device Address Class State Controller Device Address Class State 32 DEVICE PROPERTIES for disk c6t29000060220041F96257354230303052d0s2 al Type Receptacle scsi bus Occupant connected configured Found path to 2 HBA ports devices pci 6 4000 SUNW qlc 2 fp 0 0 devct1 devices pci 6 4000 SUNW qlc 3 fp 0 0 devct1 dev rdsk O K O Ks SUN SESSO1 2a000060220041 f9
115. t alerting the LUN failover Site Lab 3286 DSQA1 Broomfield Source diag xxxxx xxx com Severity Warning Actionable Category Ve DeviceId ve 6257335A 30303142 EventType AlarmEvent volume EventTime 01 30 2002 11 49 05 Volume T49152 on diag209 vla changed from 6257335A 30303142 active 50020F23 00006DFA passive to 6257335A 30303142 active 50020F23 00006DFA passive 50020F23 0000725B INFORMATION This event occurs when the virtualization engine has detected a change in status for a Multipath Drive or VLUN usually meaning a pathing problem to a Sun StorEdge T3 array controller for changes in Active Passive paths 2 Check Sun StorEdge T3 array for current LUN ownership port listmap 3 Use mpdrive failback if needed to fail LUNs back to correct controller if needed Site Lab 3286 DSQA1 Broomfield Source diag xxxxx xxx com Severity Warning Category Message DeviceId message diag xxxxx xxx com EventType LogEvent driver SSD_WARN EventTime 01 30 2002 11 50 07 Found 1 driver SSD_WARN warning s in logfile var adm messages on diag xxxxx xxx com id 809f76b4 INFORMATION SSD warnings Jan 30 11 49 48 WWN Received 7 SSD Warning message s on ssd56 in 8 mins threshold is 5 in 24hours Last Message diag xxxxx xxx com scsi ID 243001 kern warning WARNING scsi_vhci ssd g29000060220041956257335a30303145 ssd56 continued on next page 104 Sun St
116. t fails again replace the switch 6 Insert a loopback connector into the HBA 7 Run qlctest m If the test fails replace the HBA a If the test passes replace the cable 8 Recable the entire link 9 Run switchtest or qlctest to validate the fix 10 Return the path to production 28 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 A2 B2 Fibre Channel FC Link If a problem occurs with the A2 B2 FC link m Ina Sun StorEdge 3900 series system the Sun StorEdge T3 array will fail over m Ina Sun StorEdge 6900 series system no Sun StorEdge T3 array will fail over but a severe problem can cause a path to go offline FIGURE 3 4 and FIGURE 3 5 are examples of A2 B2 FC Link Notification Events From root Tue Jan 8 18 39 48 2002 Date Tue 8 Jan 2002 18 39 47 0700 MST Message Id lt 200201090139 g091d1g07015 diag xxxxx xxx com gt From Storage Automated Diagnostic Environment Agent Subject Message from diag xxxxx xxx com 2 0 B2 002 Content Length 2742 You requested the following events be forwarded to you from diag skeeK ae COM Site FSDE LAB Broomfield CO Source diag226 xxxxx xxx com Severity Normal Category Message Key message diag xxxxx xXxxX Ccom EventType LogEvent driver Fabric_Warning EventTime 01 08 2002 17 34 47 Found 1 driver Fabric_Warning warning s in logfile var adm messages on diag xxxxx xxx com id 80fee746 Info Fabric warning
117. t listmap Another but slower method is to run the runsecfg script and verify the virtualization engine maps by polling them against a live system Caution During the failover SCSI errors will occur on the data host and a brief suspension of I O will occur v To Return the Path to Production 1 Type cfgadm c configure device cfgadm c configure c2 2b000060220041f4 2 Verify that I O has resumed on all paths 10 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 v To View the VxDisk Properties 1 Type the fol lowing Device type hostid disk group flags pubpaths version iosize public private update headers configs logs config config log CTLR NAME devicetag privpaths vxdisk list Disk_1 Disk_1 Disk_1 sliced diag xxxxx xxx COM name t3dg02 id 1010283311 1163 diag xxxxx xxx com name t3dg id 1010283312 1166 diag xxxxx xxx com online ready private autoconfig nohotuse autoimport imported block dev vx dmp Disk_1s4 char dev vx rdmp Disk_1s4 block dev vx dmp Disk_1s3 char dev vx rdmp Disk_1s3 2 2 min 512 bytes max 2048 blocks slice 4 offset 0 len 209698816 slice 3 offset 1 len 4095 time 1010434311 seqno 0 6 0 248 count 1 len 3004 count 1 len 455 Defined regions priv 000017 000247 000231 copy 01 offset 000000 enabled priv 000249 003021 002773 copy 01 offset 000231 enabled priv 003022 003476 0004
118. tain items may be flagged as FAIL in these special circumstances CODE EXAMPLE 4 1 opt SUNWsecfg checkdefaultconfig output opt SUNWsecfg checkdefaultconfig Checking all accessible components hecking switch swla witch swla PASSED hecking switch swlb witch swlb PASSED hecking switch sw2a witch sw2a PASSED hecking switch sw2b Switch sw2b PASSED Please enter the Sun StorEdge T3 array password QANnANANA Checking T3 t3b0 Checking t3b0 Configuration ssis Checking command ver PASS Checking command vol stat PASS Checking command port list PASS Checking command port listmap PASS Checking command sys list FAIL lt Failure Noted Checking T3 t3b2 Checking t3b2 Configuration Checking command ver PASS Checking command vol stat PASS Checking command port list PASS Checking command port listmap PASS Checking command sys list PASS lt snip gt Checking Virtualization Engine Pair Parameters vla vla configuration check passed Checking Virtualization Engine Pair Parameters vlb vlb configuration check passed Checking Virtualization Engine Pair Configuration vl checkvemap virtualization engine map vl verification complete PASS 48 Sun StorEdge 3900 and 6900 Series Troubleshooting Guide March 2002 10 If anything is marked FAIL check the var adm log SEcfglog file for the details of the failure Mon Jan
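The checkdefaultconfig output in CODE EXAMPLE 4-1 can be long, so it may help to list only the failing checks before opening the SEcfglog file. The sketch below does that. It is an illustration under assumptions: the exact separators in the real output are not recoverable from the extracted text above, so the patterns accept any punctuation between the fields, and the names `DEVICE`, `RESULT`, and `failed_checks` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# "Checking <device> Configuration" starts the checks for one device.
DEVICE = re.compile(r"Checking\W+(\S+)\W+Configuration")
# "Checking command <name> PASS|FAIL" reports one check result.
RESULT = re.compile(r"Checking command\s+(.+?)\W+(PASS|FAIL)\b")


def failed_checks(output: str) -> List[Tuple[Optional[str], str]]:
    """Return (device, command) pairs for every check marked FAIL."""
    device = None
    failures = []
    for line in output.splitlines():
        device_match = DEVICE.search(line)
        if device_match:
            device = device_match.group(1)
            continue
        result_match = RESULT.search(line)
        if result_match and result_match.group(2) == "FAIL":
            failures.append((device, result_match.group(1)))
    return failures


if __name__ == "__main__":
    sample = "\n".join([
        "Checking T3 t3b0",
        "Checking t3b0 Configuration",
        "Checking command ver PASS",
        "Checking command sys list FAIL",
        "Checking t3b2 Configuration",
        "Checking command sys list PASS",
    ])
    print(failed_checks(sample))
```

Each reported pair identifies which device and which command to look for in the log file.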
119. tchtest runs
    - No virtualization engine tests are available at this time.

    To Isolate the A2/B2 FC Link

    1. Quiesce the I/O on the A2/B2 FC link path.
    2. Break the connection by uncabling the link.
    3. Insert the loopback connector into the switch port.
    4. Run switchtest.
       a. If the test fails, replace the GBIC and rerun switchtest.
       b. If the test fails again, replace the switch.
    5. If the switch or the GBIC shows no errors, replace the remaining components in the following order:
       a. Replace the virtualization engine side GBIC, recable the link, and monitor the link for errors.
       b. Replace the cable, recable the link, and monitor the link for errors.
       c. Replace the virtualization engine, restore the virtualization engine settings, recable the link, and monitor the link for errors.
    6. Return the path to production.

    The procedures for restoring virtualization engine settings are in the Sun StorEdge 3900 and 6900 Series Reference Manual.

    A3/B3 Fibre Channel (FC) Link

    If a problem occurs with the A3/B3 FC link:

    - In a Sun StorEdge 3900 series system, the Sun StorEdge T3 array will fail over.
    - In a Sun StorEdge 6900 series system, no Sun StorEdge T3 array will fail over, but a severe problem can cause a path to go offline.

    FIGURE 3-6, FIGURE 3-7, and FIGURE 3-8 are examples of A
120. ter 3 Troubleshooting the Fibre Channel Links 35 For Internal Use Only Site FSDE LAB Broomfield CO Source diag xxxxx xXxx com Severity Normal Category Switch Key switch 100000c0dd0057bd EventType StateChangeEvent M port 1 EventTime 01 08 2002 18 28 38 port 1 in SWITCH diag swla ip 192 168 0 30 is now Not Available status state changed from Online to Offline Info A port on the switch has logged out of the fabric and gone offline Action 1 Verify cables GBICs and connections along Fibre Channel path 2 Check Storage Automated Diagnostic Environment SAN Topology GUI to identify failing segment of the data path 3 Verify correct FC switch configuration FIGURE 3 7 A3 B3 FC Link Storage Service Processor Side Event Site FSDE LAB Broomfield CO Source diag XXXXX XXX COom Severity Normal Category Switch Key switch 100000c0dd00cbfe EventType StateChangeEvent M port 1 EventTime 01 08 2002 18 28 40 port 1 in SWITCH diag sw2a ip 192 168 0 32 is now Not Available status state changed from Online to Offline Info A port on the switch has logged out of the fabric and gone offline Action 1 Verify cables GBICs and connections along Fibre Channel path 2 Check Storage Automated Diagnostic Environment SAN Topology GUI to identify failing segment of the data path 3 Verify correct FC switch configuration FIGURE 3 8 A3 B3 FC Link Storage Serv
121. the master host

    To Replace the Alternate Master or Slave Monitoring Host

    1. Choose Maintenance > General Maintenance > Maintain Hosts. Refer to Chapter 3, Maintenance, of the Storage Automated Diagnostic Environment User's Guide.
    2. In the Maintain Hosts window, select the host to be replaced from the Existing Hosts list and click Delete.
    3. Install the new host.

    Refer to Chapter 2 of the Storage Automated Diagnostic Environment User's Guide for detailed instructions for the next four steps.

    4. Install the SUNWstade package on the new host.
    5. Run /opt/SUNWstade/bin/ras_install.
    6. Configure the host as a slave.
    7. Choose Maintenance > General Maintenance > Maintain Hosts. Refer to Chapter 3, Maintenance, of the Storage Automated Diagnostic Environment User's Guide for detailed instructions.
    8. In the Maintain Hosts window, select the new host.
    9. Configure the options as needed.
    10. Choose Maintenance > Topology Maintenance > Topology Snapshot.
       a. In the Topology Snapshot window, select the new host.
       b. Click Create and Retrieve Selected Topologies.
       c. Click Merge and Push Master Topology.

    Conclusion

    Any time a master, alternate master, or slave monitoring host is replaced, you must recover the configuration using the procedures described above. This is especially important when the Storage Service Processor is r
122. tilities. Do not configure or modify the switches using any method other than the SUNWsecfg tools.

   To Diagnose and Troubleshoot Switch Hardware

   1. To diagnose and troubleshoot the switch hardware, begin by running the SUNWsecfg checkswitch utility.
   2. For detailed troubleshooting procedures, refer to the Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0.

   The Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0 describes how to diagnose and troubleshoot the switch hardware. The scope of this document includes the Sun StorEdge network FC switch-8 and switch-16 switch and the interconnections (HBA, GBIC, cables) on either side of the switch. In addition, the document provides examples of fault isolation and includes a Brocade switch appendix.

   Switch Event Grid

   The Storage Automated Diagnostic Environment Event Grid enables you to sort switch events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, whether action is required, a description of the event, and the recommended action. Refer to the Storage Automated Diagnostic Environment User's Guide for more information.

   Using the Switch Event Grid

   1. From the Storage Automated Diagnostic Environment Help menu, click the Event Grid link.
   2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in FIGURE 6-1.

   62 Sun Stor
123. to T2 T Port switch to switch link T3 Alt Master T3 Master

   FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram

   Chapter 2 General Troubleshooting Procedures 17

   Host-Side Troubleshooting

   Host-side troubleshooting refers to the messages and errors the data host detects. Usually, these messages appear in the /var/adm/messages file.

   Storage Service Processor-Side Troubleshooting

   Storage Service Processor-side troubleshooting refers to messages, alerts, and errors that the Storage Automated Diagnostic Environment, running on the Storage Service Processor, detects. You can find these messages by monitoring the following Sun StorEdge 3900 series and Sun StorEdge 6900 series components:

   - Sun StorEdge network FC switch-8 and switch-16 switches
   - Virtualization engine
   - Sun StorEdge T3 array

   Combining the host-side messages and errors with the Storage Service Processor-side messages, alerts, and errors into a meaningful context is essential for proper troubleshooting.

   Command Line Test Examples

   To run a single Sun StorEdge diagnostic test from the command line rather than through the Storage Automated Diagnostic Environment interface, you must log in to the appropriate Host or Slave for testing the components. The following two tests, qlctest(1M) and switchtest(1M), are provided
124. tus state changed from unmounted to mounted)
   Information: The Sun StorEdge T3 array has reported that a loopcard has been replaced or brought back online.

   Category: t3. Component: power. EventType: StateChange.
   Description: power u1pcu2 (TECTROL CAN 300 1454 01 50 008275) in T3 rasd2 t3b1 (ip xxx.0.0.41) is now Available (status state changed from ready disable to ready enable)

   Columns: Category, Component, EventType, Sev, Action, Description, Information

   Category: t3. Component: controller. EventType: StateChange. Sev: Red. Action: Y.
   Description: controller u1ctr in T3 diag213 (ip xxx.20.67.213) is now Not Available (status state changed from unknown to ready disabled)
   Information: The Sun StorEdge T3 array controller has been disabled.
   Recommended action:
   1. Telnet to the affected Sun StorEdge T3 array.
   2. Verify the controller state with fru stat and sys stat.
   3. Run logger -dmprstlog to capture controller information.
   4. Re-enable the controller if possible (enable u).
   5. Replace the controller if necessary.

   Category: t3. EventType: StateChange. Sev: Red. Action: Y.
   Description: disk u1d5 in T3 rasd3 t3b1 (ip xxx.0.0.41) is now Not Available (status state changed from unknown to fault disabled)
   Information: The Sun StorEdge T3 array has reported that a disk has failed.
   Recommended action:
   1. Telnet to the affected Sun StorEdge T3 array.
   2
125. uide

   Follow these procedures when replacing a master, alternate master, or slave monitoring host.

   Note: The procedures for replacing the master host are different from the procedures for replacing an alternate master or slave monitoring host.

   To Replace the Master Host

   Refer to Chapter 2 of the Storage Automated Diagnostic Environment User's Guide for detailed instructions for the next four steps.

   1. Install the SUNWstade package on a new Master Host.
   2. Run /opt/SUNWstade/bin/ras_install on the new Master Host.
   3. Configure the Host as the Master Host.
   4. Connect to the Master Server's GUI at http://<servername>:7654

   Chapter 5 Troubleshooting Host Devices 57

   5. Choose Utilities > System > Recover Config. Refer to Chapter 7 of the Storage Automated Diagnostic Environment User's Guide for detailed instructions.
      a. In the Recover Config window, enter the IP address of any alternate master or slave monitoring host (all hosts keep a copy of the configuration).
      b. Make sure the Recover Config and Reset slave to this master checkboxes are checked.
      c. Click Recover.
   6. Choose Maintenance > General Maintenance. Ensure that all host and device settings are recovered correctly. Refer to Chapter 3 of the Storage Automated Diagnostic Environment User's Guide for detailed instructions.
   7. Choose Maintenance > General Maintenance > Start/Stop Agent to start the agent on
126. uide for more information.

   Using the Host Event Grid

   1. From the Storage Automated Diagnostic Environment Help menu, click the Event Grid link.
   2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in FIGURE 5-1.

   FIGURE 5-1 [screenshot: Storage Automated Diagnostic Environment 2.0.06.010 Event Grid help page on diag176.central.sun.com]

   Screenshot text: Select a Category, Component, and EventType and type GO to limit the report. Click on the column headers to change the Event Grid sort.

   Host events listed in the screenshot:
   - Info: status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag245.central.sun.com changed from NOT CONNECTED to CONNECTED
   - Info: status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag245.central.sun.com changed from CONNECTED to NOT CONNECTED
   - Info: The state of lun T300 c14t50020F2300003EE5d0s2 status on diag245.central.sun.com changed from O.K. to ERROR (target t3 diag244 t3b0 90.0.0.40)
   - lun T300 diag245.central.sun.com changed from O.K. to ERROR
   - ifptest (diag240) on host failed
   - qlctest (diag240) on host failed
   - Info: New Patch and Package Information generated
127. ups Supported: 1 to 4, 1 to 4, 1 to 3, 1 to 3
   Additional Array Partner Groups Supported with Optional Additional Expansion Cabinet: Not applicable, 1 to 5, 1 to 4, 2

   Predictive Failure Analysis Capabilities

   The Storage Automated Diagnostic Environment software provides the health and monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This software provides the following predictive failure analysis (PFA) capabilities:

   - FC links: Fibre Channel links are monitored at all end points using the Fibre Channel Extended Link Service (FC-ELS) link counters. When link errors surpass the threshold values, an alert is sent. This enables Sun personnel to replace components that are experiencing high transient fault levels before a hard fault occurs.
   - Enclosure status: Many devices, like the Sun StorEdge network FC switch-8 and switch-16 switch and the Sun StorEdge T3 array, cause Storage Automated Diagnostic Environment alerts to be sent if the temperature thresholds are exceeded. This enables Sun-trained personnel to address the problem before the component and enclosure fail.
   - SPOF notification: Storage Automated Diagnostic Environment notification for path failures and failovers (that is, Sun StorEdge Traffic Manager software failover) can be considered PFA, since Sun-trained personnel are notified and can repair the primary path. This eliminates the time of exposure to single points of failure and helps to preserve customer availabi
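The FC link monitoring described in the excerpt above amounts to comparing the growth of each link error counter against a threshold and raising an alert when the threshold is passed. The following Python is a minimal sketch of that idea only. The function name `links_over_threshold`, the dictionary layout, the link label, and the threshold value are all hypothetical; the guide does not state how the Storage Automated Diagnostic Environment stores counters or which thresholds it applies. The counter name `InvalidTxWds` is taken from a switch event quoted elsewhere in the guide.

```python
def links_over_threshold(previous, current, thresholds):
    """Return (link, counter, delta) for counters that grew past their threshold.

    previous, current: {link_name: {counter_name: cumulative_value}}
    thresholds:        {counter_name: maximum allowed increase per poll}
    All three layouts are assumptions made for this sketch.
    """
    alerts = []
    for link, counters in current.items():
        for name, value in counters.items():
            limit = thresholds.get(name)
            if limit is None:
                continue  # no threshold configured for this counter
            delta = value - previous.get(link, {}).get(name, 0)
            if delta > limit:
                alerts.append((link, name, delta))
    return alerts


# Hypothetical poll data: the counter rose by 16289 between two polls.
before = {"sw1b port 1": {"InvalidTxWds": 349683}}
after = {"sw1b port 1": {"InvalidTxWds": 365972}}
print(links_over_threshold(before, after, {"InvalidTxWds": 1000}))
```

Working on the increase between polls rather than the cumulative value keeps a long-running link with an old, large counter from alerting on every poll.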
128. urations
   Suggested action: Check the present Sun StorEdge T3 array configuration with the showt3 -n <t3> command and verify whether the configuration is corrupted or has changed. If it is not one of the standard configurations, restore the configuration using the restoret3config command.

   Source: Common to Sun StorEdge T3 array
   Message: 1. Could not mount volume vol. 2. $lun config does not match.
   Description and cause: There might be multiple drive failures, or corrupted data or parity on the LUN.
   Suggested action: Replace the failed FRUs and restore the Sun StorEdge T3 array configuration with the restoret3config -f -n t3_name command.

   Source: Common to Sun StorEdge T3 array
   Message: The fru status is not ready or enabled. Operations on the Sun StorEdge T3 array are being aborted.
   Description and cause: The disk, controller, or loop interface card in the Sun StorEdge T3 array might be bad.
   Suggested action: Replace the failed FRU and rerun the utility.

   Source: Common to Sun StorEdge T3 array
   Message: 1. The Sun StorEdge T3 array is not of T3B type and it cannot continue; aborting operations. 2. t3config utilities are supported only in the Sun StorEdge T3 array.
   Description and cause: The t3config utilities are not supported on Sun StorEdge T3 arrays with 1.xx firmware. The Sun StorEdge T3 array configuration is not a standard configuration; refer to the t3 default/custom configuration table in the Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual.
   Suggested action: Use showt3 -n t3_name to display the present configuration. Use the modifyt
129. vents

   Site: FSDE LAB Broomfield CO
   Source: diag.xxxxx.xxx.com
   Severity: Warning
   Category: Message
   DeviceId: message diag.xxxxx.xxx.com
   EventType: LogEvent.driver.MPXIO_offline
   EventTime: 01/29/2002 14:28:06
   Found 2 driver.MPXIO_offline warning(s) in logfile /var/adm/messages on diag.xxxxx.xxx.com (id 80e4aa60)
   <snip>

   Site: FSDE LAB Broomfield CO
   Source: diag.xxxxx.xxx.com
   Severity: Warning
   Category: Message
   DeviceId: message diag.xxxxx.xxx.com
   EventType: LogEvent.driver.Fabric_Warning
   EventTime: 01/29/2002 14:28:06
   Found 1 driver.Fabric_Warning warning(s) in logfile /var/adm/messages on diag.xxxxx.xxx.com (id 80e4aa60)
   INFORMATION: Fabric warning
   <snip>
   status of hba /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0:devctl on diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
   INFORMATION: monitors changes in the output of luxadm -e port
   Found path to 20 HBA ports
   /devices/sbus@2,0/SUNW,socal@d,10000:0 NOT CONNECTED

   FIGURE 3-9 A4 B4 FC Link Data Host Notification

   Site: FSDE LAB Broomfield CO
   Source: diag
   Severity: Warning
   Category: Switch
   DeviceId: switch 100000c0dd0061bb
   EventType: LogEvent.MessageLog
   EventTime: 01/29/2002 14:25:05
   Change in Port Statistics on switch diag sw1b (ip 192.168.0.31): Port 1 received 16289 InvalidTxWds in 0 mins (value 365972)

   Site FSD
130. [truncated command-line test output: Testing Device, Switch, Port]

   All Storage Automated Diagnostic Environment diagnostic tests are located in /opt/SUNWstade/Diags/bin. Refer to the Storage Automated Diagnostic Environment User's Guide for a complete list of tests, subtests, options, and restrictions.

   Storage Automated Diagnostic Environment Event Grid

   The Storage Automated Diagnostic Environment generates component-specific event grids that describe the severity of an event, whether action is required, a description of the event, and the recommended action. Refer to Chapters 5 through 9 of this troubleshooting guide for component-specific event grids.

   To Customize an Event Report

   1. Click the Event Grid link on the Storage Automated Diagnostic Environment Help menu.
   2. Select the criteria from the Storage Automated Diagnostic Environment event grid, like the one shown in TABLE 2-2.

   TABLE 2-2 Event Grid Sorting Criteria

   Category: All (Default), Sun StorEdge A3500FC array, Sun StorEdge A5000 array, Agent, Host, Message
   Component: All (Default), Backplane, Controller, Disk
   Event Type: Agent Deinstall, Agent Install, Alarm, Alternate Master, Alternate Master
   Severity: Red, Critical, Error
   Action: Y (This event is actionable and is sent to RSS)
131. wo minutes to reset after a configuration change.

   Description and cause: You might be attempting to download a flash file for an 8-port switch to a 16-port switch.
   Suggested action: Check showswitch -s switch and look for the number of ports. Ensure that this matches the second and third characters of the flash file name, for example m08030462.fls.

   Description and cause: The switch might not be set for RARP, or RARP is not working correctly.
   Suggested action: Try ping switch after waiting a few more minutes. If errors persist, manually power cycle the switch.

   Source: setupswitch
   Message: Switch $switch timed out after reset.
   Description and cause: The switch took longer than two minutes to reset after a configuration change.
   Suggested action: Try ping switch after waiting a few more minutes. If errors persist, manually power cycle the switch.

   Source: setupswitch
   Message: Could not set chassis ID on switch $switch to $cid.
   Description and cause: This should occur only in a SAN environment with cascaded switches.
   Suggested action: Be aware of the switch chassis IDs of all switches in the SAN and make sure the IDs are all unique. Once the chassis IDs are established, override the switch chassis IDs with the following command: setupswitch -s switch_name -i unique_chassis_id -v

   TABLE B-3 Sun StorEdge T3 Array SUNWsecfg Error Messages

   Columns: Message, Description and Cause of Error, Suggested Action

   Source: Common to Sun StorEdge T3 array
   Message: Present configuration does not match Reference config
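The switch error table above says that the number of ports on the switch should match the second and third characters of the flash file name, for example m08030462.fls for an 8-port switch. A minimal Python sketch of that check follows. The function names are hypothetical, the file name is assumed to be written with a dot before fls, and the port count is assumed to have been read by hand from the showswitch output; the sketch does not call any SUNWsecfg tool.

```python
def flash_file_port_count(filename):
    """Return the port count encoded in characters 2 and 3 of a flash file name."""
    digits = filename[1:3]
    if len(digits) != 2 or not digits.isdigit():
        raise ValueError(f"cannot read a port count from {filename!r}")
    return int(digits)


def flash_matches_switch(filename, switch_ports):
    """True when the flash file was built for a switch with switch_ports ports."""
    return flash_file_port_count(filename) == switch_ports


# m08030462.fls is the example name from the guide; 8 and 16 are the
# port counts of the switch-8 and switch-16 models.
print(flash_matches_switch("m08030462.fls", 8))    # same port count
print(flash_matches_switch("m08030462.fls", 16))   # 8-port file, 16-port switch
```

Running the check before a download catches the mismatch that the table lists as a possible cause of a failed flash load.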
