Sun SPARC Enterprise T5440 Server Service Manual
Contents
1. Servicing Customer Replaceable Units

Note: Depending on the configuration of Oracle ILOM POST variables and whether POST detected faults, the system might boot or might remain at the ok prompt. If the system is at the ok prompt, type boot.

d. Return the virtual keyswitch to Normal mode.

    -> set /SYS keyswitch_state=Normal
    Set 'keyswitch_state' to 'Normal'

e. Switch to the system console and issue the Solaris OS fmadm faulty command.

    # fmadm faulty

No memory faults should be displayed. If faults are reported, refer to the diagnostics flowchart in FIGURE: Diagnostic Flowchart on page 12 for an approach to diagnosing the fault.

4. Switch to the Oracle ILOM command shell.

5. Run the show faulty command.

- If the fault was detected by the host and the fault information persists, the output will be similar to the following example:

    -> show faulty
    Target                   | Property    | Value
    -------------------------+-------------+--------------------------------------
    /SP/faultmgmt/0          | fru         | /SYS/MB/CPU0/CMP0/BR0/CH1/D0
    /SP/faultmgmt/0          | timestamp   | Dec 14 22:43:59
    /SP/faultmgmt/0/faults/0 | sunw-msg-id | SUN4V-8000-DX
    /SP/faultmgmt/0/faults/0 | uuid        | 3aa7c854-9667-e176-efe5-e487e5207a8a
    /SP/faultmgmt/0/faults/0 | timestamp   | Dec 14 22:43:59

- If the show faulty command does not report a fault with a UUID, then you do not need to proceed with the following step b
2. show /SYS

    /SYS
       Targets:
           SERVICE
           LOCATE
           ACT
           PS_FAULT
           TEMP_FAULT
           FAN_FAULT
       Properties:
           type = Host System
           keyswitch_state = Normal
           product_name = T5440
           product_serial_number = 0723BBC006
           fault_state = OK
           clear_fault_action = (none)
           power_state = On
       Commands:
           cd
           reset
           set
           show
           start
           stop

Powering Off the System

Note: Additional information about powering off the system is located in the Sun SPARC Enterprise T5440 Server Administration Guide.

This topic includes the following:
- Power Off Command Line on page 67
- Power Off Graceful Shutdown on page 68
- Power Off Emergency Shutdown on page 68
- Disconnect Power Cords From the Server on page 68

Power Off Command Line

1. Shut down the Solaris OS. At the Solaris prompt, type:

    # shutdown -g0 -i0 -y
    # svc.startd: The system is coming down. Please wait.
    svc.startd: 91 system services are now being stopped.
    Jun 12 19:46:57 wgs41-58 syslogd: going down on signal 15
    svc.startd: The system is down.
    syncing file systems... done
    Program terminated
    r)eboot, o)k prompt, h)alt?

2. Switch from the system console prompt to the service processor console prompt. Type:

    ok #.
    ->

3. From the Oracle ILOM -> prompt, type:

    -> stop /SYS
    Are you sure you want to stop /SYS (y/n)? y
    Stopping
3. Parameter | Values | Description
   (continued) | normal | POST output displays all test and informational messages.
   (continued) | max    | POST displays all test, informational, and some debugging messages.

Related Information
- Diagnostic Flowchart on page 11
- Change POST Parameters on page 29
- Run POST in Maximum Mode on page 30
- Detecting Faults Using POST on page 45
- Clear Faults Detected During POST on page 51
- Sun SPARC Enterprise T5440 Server Installation and Setup Guide
- Sun SPARC Enterprise T5440 Server Administration Guide

Change POST Parameters

1. Access the Oracle ILOM prompt. See Connecting to the Service Processor on page 23.

2. Use the Oracle ILOM commands to change the POST parameters. Refer to Component Fault on page 33 for a list of Oracle ILOM POST parameters and their values.

The set /SYS keyswitch_state command sets the virtual keyswitch parameter. For example:

    -> set /SYS keyswitch_state=Diag
    Set 'keyswitch_state' to 'Diag'

To change individual POST parameters, you must first set the keyswitch_state parameter to normal. For example:

    -> set /SYS keyswitch_state=Normal
    Set 'keyswitch_state' to 'Normal'
    -> set /HOST/diag property=Min

Run POST in Maximum Mode

This procedure describes how to run POST when you want maximum testing, as in the case when you are troubleshooting a server or verifying a hardware upgrade or repair.

1. Access the Orac
4. Antistatic Prevention Measures on page 73
- Remove a Fan Tray on page 89
  Note: You must remove all four fan trays.
- Remove the Top Cover on page 73
- Remove a CMP/Memory Module on page 106
  Note: You must remove all CPU modules and memory modules from the system.

Do the following:
1. Remove the nine No. 1 Phillips screws securing the fan tray carriage to the top of the chassis.
2. Loosen the seven captive No. 2 Phillips screws securing the bottom of the fan tray carriage to the motherboard assembly.
3. Lift the fan tray carriage up and out of the system.

Install the Fan Tray Carriage

1. Lower the fan tray carriage into the system.
2. Secure the seven captive No. 2 Phillips screws.
3. Install the nine No. 1 Phillips screws.

Next Steps
- Install a Fan Tray on page 90
  Note: Install all four fan trays.
- Install the Top Cover on page 158
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161

Servicing the Hard Drive Backplane

The hard drive backplane provides the power and data interconnect to the internal hard drives.

This topic includes the following:
- Remove the Hard Drive Backplane on page 140
- Install the Hard Drive Backplane on page 141

Related Information
- Servicing Hard Drives
5.  -> show faulty
    Target                   | Property          | Value
    -------------------------+-------------------+----------------------------------------------
    /SP/faultmgmt/0          | fru               | /SYS
    /SP/faultmgmt/0          | timestamp         | Mar 17 08:17:45
    /SP/faultmgmt/0/faults/0 | timestamp         | Mar 17 08:17:45
    /SP/faultmgmt/0/faults/0 | sp_detected_fault | At least 2 power supplies must have AC power

Note: Environmental and configuration faults automatically clear when the environmental condition returns to the normal range or when the configuration fault is addressed.

- Example showing a fault that was detected by the PSH technology. These kinds of faults are distinguished from other kinds of faults by the presence of a sunw-msg-id and by a UUID.

    -> show faulty
    Target                   | Property    | Value
    -------------------------+-------------+--------------------------------------
    /SP/faultmgmt/0          | fru         | /SYS/MB/MEM0/CMP0/BR1/CH1/D1
    /SP/faultmgmt/0          | timestamp   | Dec 14 22:43:59
    /SP/faultmgmt/0/faults/0 | sunw-msg-id | SUN4V-8000-DX
    /SP/faultmgmt/0/faults/0 | uuid        | 3aa7c854-9667-e176-efe5-e487e5207a8a
    /SP/faultmgmt/0/faults/0 | timestamp   | Dec 14 22:43:59

- Example showing a fault that was detected by POST. These kinds of faults are identified by the message "Forced fail reason", where reason is the name of the power-on routine that detected the failure.

    -> show faulty
    /SP/faultmgmt/0 /SYS/MB/CPU0/CMP0/BR1/CH0/D0
    /SP/fau
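
The three kinds of show faulty output described above can be told apart mechanically. The following Python sketch is illustrative only and is not part of Oracle ILOM: the function name classify_fault and the dictionary layout are assumptions, and the property names (uuid, sunw-msg-id, sp_detected_fault) are written as they appear in the example tables.

```python
def classify_fault(properties):
    """Classify one `show faulty` entry using the rules in the text above.

    `properties` is a hypothetical dict mapping property names (for
    example "fru", "uuid", "sunw-msg-id", "sp_detected_fault") to the
    string values shown in the Value column.
    """
    # PSH-detected faults carry a sunw-msg-id and a UUID.
    if "uuid" in properties or "sunw-msg-id" in properties:
        return "psh"
    # POST-detected faults are identified by the text "Forced fail".
    if any("Forced fail" in str(value) for value in properties.values()):
        return "post"
    # Everything else is treated as an environmental or configuration fault.
    return "environmental-or-configuration"


if __name__ == "__main__":
    example = {
        "fru": "/SYS",
        "sp_detected_fault": "At least 2 power supplies must have AC power",
    }
    print(classify_fault(example))
```

The order of the checks matters only if an entry carries both a UUID and a "Forced fail" message; this sketch gives the UUID priority, which is a design choice rather than documented behavior.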
6. 2. Type the following command:

    # more /var/adm/messages

3. If you want to view all logged messages, type the following command:

    # more /var/adm/messages*

Detecting Faults Oracle ILOM Event Log

Certain problems are recorded in the Oracle ILOM event log but not posted to the list of faults displayed by the Oracle ILOM show faulty command. Inspect the Oracle ILOM event log if you suspect a problem but no entry appears in the Oracle ILOM show faulty command output.

Related Information
- Diagnostic Flowchart on page 11
- View Oracle ILOM Event Log on page 39
- Oracle ILOM to ALOM CMT Command Reference on page 57
- Sun SPARC Enterprise T5440 Server Installation and Setup Guide
- Sun SPARC Enterprise T5440 Server Administration Guide
- Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server

View Oracle ILOM Event Log

Type the following command:

    -> show /SP/logs/event/list

Note: The Oracle ILOM event log can also be viewed through the Oracle ILOM BUI or the ALOM CMT CLI.

If a major or critical event is found that was not expected and is not included in the Oracle ILOM show faulty output, then it may indicate a system fault. The following is an example of unexpected major events in the log:

    -> show /SP/logs/event/list
    1626 Fri Feb 15 18:57:29 2008 Chassis Log major
    Feb 15 18:57
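
When the event log is long, it can help to filter a saved copy for major and critical entries. The Python sketch below is an illustration under stated assumptions, not an Oracle tool: it assumes each entry starts with a header line of the form "ID weekday month day time year Class Type Severity", as in the example above, followed by one or more message lines. The function name unexpected_events is hypothetical.

```python
import re

# Header line layout assumed from the example: "1626 Fri Feb 15 18:57:29 2008 Chassis Log major"
HEADER = re.compile(
    r"^\s*(?P<id>\d+)\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}"
    r"\s+(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<severity>\S+)\s*$"
)


def unexpected_events(log_text, severities=("major", "critical")):
    """Return event-log entries whose severity is in `severities`."""
    events = []
    current = None
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A header line starts a new entry.
            current = {
                "id": int(match["id"]),
                "severity": match["severity"],
                "message": "",
            }
            events.append(current)
        elif current is not None and line.strip():
            # Any other non-blank line belongs to the current entry's message.
            current["message"] = (current["message"] + " " + line.strip()).strip()
    return [event for event in events if event["severity"] in severities]
```

Entries returned by this filter still need to be compared by hand against the show faulty output, since the text treats an event as suspicious only when it is both unexpected and absent from that output.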
7. Note: You can use the FB-DIMM DIAG buttons on the CMP module and memory module to identify faulty FB-DIMMs. See FB-DIMM Fault Button Locations on page 120.

Once you identify which FB-DIMMs you want to replace, see Servicing FB-DIMMs on page 110 for FB-DIMM removal and replacement instructions. You must perform the instructions in that section to clear the faults and enable the replaced FB-DIMMs.

Related Information
- POST Parameters on page 28
- Displaying FRU Information With Oracle ILOM on page 25
- Detecting Faults on page 32
- Servicing FB-DIMMs on page 110

Connecting to the Service Processor

Before you can run Oracle ILOM commands, you must connect to the service processor. There are several ways to connect to the service processor.

Topic | Links
Connect an ASCII terminal directly to the serial management port. | Sun SPARC Enterprise T5440 Server Installation and Setup Guide
Use the ssh command to connect to the service processor through an Ethernet connection on the network management port. | Sun SPARC Enterprise T5440 Server Installation and Setup Guide
Switch from the system console to the service processor. | Switch From the System Console to the Service Processor (Oracle ILOM or ALOM CMT Compatibility Shell) on page 24
Switch from the service processor to the system console. | Switch From Oracle ILOM to the System Console on page 24; Switch From the ALOM CMT Compatibility She
8. You must install each hard drive in the same bay from which it was removed.

2. Press the hard drive latch release button.
3. Slide the hard drive out of its bay.

Install a Hard Drive

If you are installing a hard drive after servicing another component in the system, do the following:

1. Align the replacement drive to the drive slot. Hard drives are physically addressed according to the slot in which they are installed. If you removed an existing hard drive from a slot in the server, you must install the replacement drive in the same slot as the drive that was removed.
2. Slide the drive into the drive slot until it is fully seated.
3. Close the latch to lock the drive in place.
4. If you performed any additional service procedures, see Power On the Server on page 161.

Hard Drive Device Identifiers

The following table lists physical drive locations and their corresponding default path names in OpenBoot PROM and Solaris for the server.

Device | Device Identifier | OpenBoot PROM/Solaris Default Drive Path Name
HDD0 | /SYS/HDD0 | c0::dsk/c1t0d0
HDD1 | /SYS/HDD1 | c0::dsk/c1t1d0
HDD2 | /SYS/HDD2 | c0::dsk/c1t2d0
HDD3 | /SYS/HDD3 | c0::dsk/c1t3d0

Note: Hard drive names in Oracle ILOM messages are displayed with the full FRU name, such as /SYS/HDD0.

Related Information
- Hard Drive LEDs on page 86

H
9. All other devices:
   1: Slots 0, 1, 2, 3. Maximum of 4 cards; install in order shown. PCIe Slots 4, 5, 6, and 7 are unavailable in 1P systems. Both XAUI Slots 0 and 1 are available in 1P systems.
   2: Slots 0, 4, 1, 5, 2, 6, 3, 7. Maximum of 8 cards; install in order shown.
   4: Slots 0, 4, 2, 6, 1, 5, 3, 7. Maximum of 8 cards; install in order shown.

Note: These are guidelines to spread out the I/O load across multiple CMP/memory module pairs. These are not configuration restrictions.

External I/O Expansion Unit PCIe Link cards must be placed in a PCIe slot with a CMP/memory module pair present, as follows:
- PCIe Slots 0 and 1 require CMP/Memory pair 0
- PCIe Slots 4 and 5 require CMP/Memory pair 1
- PCIe Slots 2 and 3 require CMP/Memory pair 2
- PCIe Slots 6 and 7 require CMP/Memory pair 3

Related Information
- PCIe Device Identifiers on page 101
- System Bus Topology on page 171
- I/O Fabric in 2P Configuration on page 172
- I/O Fabric in 4P Configuration on page 173

Servicing CMP/Memory Modules

This topic includes the following:
- CMP/Memory Modules Overview on page 104
- Remove a CMP/Memory Module on page 106
- Install a CMP/Memory Module on page 107
- Add a CMP/Memory Module on page 108
- CMP and Memory Module Device Identifiers on page 109
- Supported CMP/Memory Module Configurations on page 110

CMP/Memory Modules Overview

Up to four
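
The slot guidelines above can be captured as two small lookup tables. This Python sketch is a planning aid under stated assumptions, not an Oracle utility: it assumes the leading 1, 2, and 4 in the guideline table are the number of installed CMP/memory module pairs (1P, 2P, and 4P systems), and the names INSTALL_ORDER, LINK_CARD_PAIR, slots_for_cards, and link_card_slots are hypothetical.

```python
# Recommended PCIe install order, keyed by the number of CMP/memory
# module pairs (assumed meaning of the 1 / 2 / 4 rows in the table).
INSTALL_ORDER = {
    1: [0, 1, 2, 3],
    2: [0, 4, 1, 5, 2, 6, 3, 7],
    4: [0, 4, 2, 6, 1, 5, 3, 7],
}

# CMP/memory pair that must be present for an External I/O Expansion
# Unit PCIe Link card in each slot.
LINK_CARD_PAIR = {0: 0, 1: 0, 4: 1, 5: 1, 2: 2, 3: 2, 6: 3, 7: 3}


def slots_for_cards(num_pairs, num_cards):
    """Return the slots to populate, in the recommended order."""
    order = INSTALL_ORDER[num_pairs]
    if num_cards > len(order):
        raise ValueError(f"at most {len(order)} cards with {num_pairs} pair(s)")
    return order[:num_cards]


def link_card_slots(installed_pairs):
    """Return the slots usable for Link cards given the installed pair numbers."""
    return sorted(slot for slot, pair in LINK_CARD_PAIR.items()
                  if pair in installed_pairs)
```

For example, slots_for_cards(2, 3) selects slots 0, 4, and 1, and link_card_slots({0, 1}) lists slots 0, 1, 4, and 5. The install order is a load-spreading guideline, while the Link card mapping is a requirement, so the two tables are kept separate.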
10. Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73

Do the following:
1. Identify the correct slot for installation.
2. Remove the air baffle. Squeeze the air baffle latches toward each other and lift the air baffle straight up and out of the chassis.
3. If you are installing the module into a previously empty slot, remove the plastic connector cover on the motherboard.
4. Slide the module down into its slot.
5. Rotate the ejector levers down to secure the module into place.

Next Steps
- Install the Top Cover on page 158
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161

CMP and Memory Module Device Identifiers

The following table describes devices, device identifiers, and supported configurations for CMP and memory modules. Device identifiers are case sensitive.

Device | Device Identifier
CMP0 | /SYS/MB/CPU0/CMP0
MEM0 | /SYS/MB/MEM0/CMP0
CMP1 | /SYS/MB/CPU1/CMP1
MEM1 | /SYS/MB/MEM1/CMP1
CMP2 | /SYS/MB/CPU2/CMP2
MEM2 | /SYS/MB/MEM2/CMP2
CMP3 | /SYS/MB/CPU3/CMP3
MEM3 | /SYS/MB/MEM3/CMP3

Note: CMP and memory module names in Oracle ILOM messages are displayed with the full FRU name, such as /SYS/MB/CPU0.

Related Information
- Managing Faults on page 9
- FB-DIMM Configuration on page 116
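
Because FRU names in fault messages extend these identifiers (for example, an FB-DIMM under a module), it can be handy to map a full FRU name back to the module it sits on. The Python sketch below is illustrative only: it assumes the slash-separated identifier form /SYS/MB/CPUn/CMPn and /SYS/MB/MEMn/CMPn, and the function name module_for_fru is hypothetical.

```python
import re

# Matches /SYS/MB/CPUn/CMPn... or /SYS/MB/MEMn/CMPn... for n in 0..3.
# The match is case sensitive, as the identifiers are.
_MODULE = re.compile(r"^/SYS/MB/(CPU|MEM)([0-3])/CMP\2(?:/|$)")


def module_for_fru(fru):
    """Return "CMPn" or "MEMn" for a FRU name on that module, else None."""
    match = _MODULE.match(fru)
    if match is None:
        return None
    kind, index = match.groups()
    return ("CMP" if kind == "CPU" else "MEM") + index
```

With this sketch, module_for_fru("/SYS/MB/CPU0/CMP0/BR1/CH0/D0") yields "CMP0", while a name such as "/SYS/HDD0" yields None.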
11. - Detecting Faults Oracle ILOM Event Log on page 39
- Detecting Faults Using POST on page 45
- Identifying Faults Detected by PSH on page 47

Detecting Faults Using LEDs

The server provides the following groups of LEDs:
- Front panel system LEDs. See Front Panel LEDs on page 5.
- Rear panel system LEDs. See Rear Panel LEDs on page 8.
- Hard drive LEDs. See Hard Drive LEDs on page 86.
- Power supply LEDs. See Power Supply LED on page 97.
- Fan tray LEDs. See Fan Tray Fault LED on page 91.
- Rear panel Ethernet port LEDs. See Ethernet Port LEDs on page 9.
- CMP module or memory module LEDs. See Servicing CMP/Memory Modules on page 104.
- FB-DIMM Fault LEDs. See FB-DIMM Fault Button Locations on page 120.

These LEDs provide a quick visual check of the state of the system.

The following table describes which fault LEDs are lit under given error conditions. Use the Oracle ILOM show faulty command to obtain more information about the nature of a given fault. See Detect Faults Oracle ILOM show faulty Command on page 35.

Component Fault | Fault LEDs Lit | Additional Information
Power supply | Service Required LED (front and rear panel); Front panel Power Supply Fault LED; Individual power supply Fault LED |
Fan tray | Service Required LED (front and rear panel); Front panel Fan Fault LED; Individual fan tray Fault LED |
Hard drive
12. FRUID PROMs are updated, the fault is logged, and alerts are displayed. Faulty FRUs are identified in fault messages using the FRU name.

FIGURE: Oracle ILOM Fault Management (figure labels: Environmentals, Solaris PSH, ILOM fault manager, FRU fault LEDs, System fault LED, User alerts, showfaults)

The service processor can detect when a fault is no longer present and clears the fault in several ways:

- Fault recovery. The system automatically detects that the fault condition is no longer present. The service processor extinguishes the Service Required LED and updates the FRU's PROM, indicating that the fault is no longer present.
- Fault repair. The fault has been repaired by human intervention. In most cases, the service processor detects the repair and extinguishes the Service Required LED. If the service processor does not perform these actions, you must perform these tasks manually by setting the Oracle ILOM component_state or fault_state of the faulted component.

The service processor can detect the removal of a FRU, in many cases even if the FRU is removed while the service processor is powered off (for example, if the system power cables are unplugged during service procedures). This function enables Oracle ILOM to know that a fault diagnosed to a specific FRU has been
13. Off the System on page 67
- Disconnect Power Cords From the Server on page 68
- Remove the Server From the Rack on page 71
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
- Remove a Fan Tray on page 89
- Remove the Fan Tray Carriage on page 137

1. Unplug the front control panel cable from J9901 on the motherboard.
2. Unplug the front control panel cable from the front I/O board.
3. Remove the two No. 2 Phillips screws.
4. Lift the front I/O board up and out of the system.
5. Place the front I/O board on an antistatic mat.

Install the Front I/O Board

1. Lower the front I/O board into the system.
2. Install the two No. 2 Phillips screws.
3. Plug the front control panel connector into the front I/O board.
4. Plug the front control panel connector into J9901 on the motherboard.

Next Steps
- Install the Fan Tray Carriage on page 138
- Install a Fan Tray on page 90
- Install the Top Cover on page 158
- Install the Server Into the Rack on page 158
- Connect the Power Cords to the Server on page 161
- Power On the Server on page 161

Returning the Server to Operation

These topics describe how to return the server to operation after you have performed
14. - Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Extend the Server to the Maintenance Position on page 70
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73

Do the following:
1. Identify the PCIe card you want to remove.
2. Open the PCIe card latch.

FIGURE: Removing a PCIe Card

3. Remove the PCIe card from the system.
4. Place the PCIe card on an antistatic mat.
5. If you are not replacing the PCIe card, install a PCIe filler panel in its place.
6. Close the PCIe card latch.

Install a PCIe Card

1. Identify the correct slot for installation.
2. Open the PCIe card latch.

FIGURE: Installing a PCIe Card

3. Insert the PCIe card into its slot.
4. Close the PCIe card latch.

Next Steps
- Install the Top Cover on page 158
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161

Add a PCIe Card

Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Disconnect Power Cords From the Server on page 68
- Extend the Server to the Maintenance Position on page 70
- Perform Electrostatic
15. where applicable, report them to it in writing.

If this software or the accompanying documentation is licensed to the U.S. Government, or to any entity that licenses this software or uses it on behalf of the U.S. Government, the following notice applies:

U.S. GOVERNMENT RIGHTS. Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

This software or hardware was developed for general use in information management applications. This software or hardware is not designed or intended for use in high-risk applications, in particular applications that could cause personal injury. If you use this software or hardware in dangerous applications, it is your responsibility to take all measures
16. on page 158
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161

Verify FB-DIMM Replacement

1. Access the Oracle ILOM -> prompt. Refer to the Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server for instructions.

2. Run the show faulty command to determine how to clear the fault. The method you use to clear a fault depends on how the fault is identified by the showfaults command. Examples:

- If the fault is a host-detected fault (displays a UUID), continue to Step 3. For example:

    -> show faulty
    Target                   | Property    | Value
    -------------------------+-------------+--------------------------------------
    /SP/faultmgmt/0          | fru         | /SYS/MB/CPU0/CMP0/BR0/CH1/D0
    /SP/faultmgmt/0          | timestamp   | Dec 14 22:43:59
    /SP/faultmgmt/0/faults/0 | sunw-msg-id | SUN4V-8000-DX
    /SP/faultmgmt/0/faults/0 | uuid        | 3aa7c854-9667-e176-efe5-e487e5207a8a
    /SP/faultmgmt/0/faults/0 | timestamp   | Dec 14 22:43:59

- In most cases, if the fault was detected by POST and resulted in the FB-DIMM being disabled (such as the following example), the replacement of the faulty FB-DIMM is detected when the service processor is power cycled. In this case, the fault is automatically cleared from the system.

    -> show faulty
    Target                   | Property    | Value
    -------------------------+-------------+--------------------------------------
    /SP/faultmgmt/0          | fru         | /S
17. service procedures.

Caution: Never attempt to run the server with the cover removed. Hazardous voltage is present.

Caution: Equipment damage could occur if you run the server with the cover removed. The cover must be in place for proper air flow.

Description | Links
Install the top cover after servicing internal components. | Install the Top Cover on page 158
Re-attach the server to the cabinet slide rails after performing a bench procedure. | Install the Server Into the Rack on page 158
Slide the server back into the equipment rack. | Slide the Server Into the Rack on page 159
Re-attach power cords and data cables to the back panel of the server. | Connect the Power Cords to the Server on page 161
Power on the server after performing a service procedure. | Power On the Server on page 161

Related Information
- Preparing to Service the System on page 63
- Servicing Customer Replaceable Units on page 77
- Servicing Field Replaceable Units on page 123

Install the Top Cover

If you removed the top cover, perform the steps in this procedure.

Note: If removing the top cover caused an emergency shutdown, you must install the top cover and use the poweron command to restart the system. See Power On the Server on page 161.

1. Place the top cover on the chassis. Set the cover down so that it hangs over the rear of the server by about an inch (25.4 mm).
2. Slide the top cover forw
18. 10/Tools/Boot directory of your exported Solaris 10 8/07, Solaris 10 5/08, or Solaris 10 10/08 OS image on your JumpStart server.

3. Power off the system.

4. Log in to the ALOM compatibility shell. Type:

    sc> setsc sys_ioreconfigure nextboot

5. Power on the system.

6. Boot from the network. Type:

    ok boot net -s

7. Mount the system boot disk under the /mnt directory. Type:

    # mount /dev/dsk/c0t0d0s0 /mnt

8. Change to the root directory of your boot disk and copy the reconfig.pl script to the root of the boot disk. Type:

    # cd /mnt

9. Do one of the following:
- If your JumpStart server is exporting Solaris 10 8/07 or Solaris 10 5/08, type:

    # cp /reconfig.pl .

- If your JumpStart server is exporting Solaris 10 10/08, type:

    # cp /cdrom/Solaris_10/Tools/Boot/reconfig.pl .

10. Run the reconfig.pl script. Type:

    # /mnt/reconfig.pl

11. Halt the system. Type:

    # halt

12. Power off the system. For example, to power off using the ALOM compatibility shell, type:

    sc> poweroff

Wait for the console message which indicates that the system has been powered off.

Temporarily Disable All Memory Modules

A disabled CMP node complicates the memory topology and can prevent a system from booting. To run the system in a degraded state, you must reduce the total amount of system memory by disabling all of the FB-DIMMs on all of
19. 29 ERROR: CMP0 Only 4 cores, up to 32 cpus are configured because some L2 BANKS are unusable
    1625 Fri Feb 15 18:57:28 2008 Chassis Log major
    Feb 15 18:57:28 ERROR: System DRAM Available: 004096 MB
    1624 Fri Feb 15 18:57:28 2008 Chassis Log major
    Feb 15 18:57:28 ERROR: CMP1 memc 1 1 unused because associated L2 banks on CMP0 cannot be used
    1623 Fri Feb 15 18:57:27 2008 Chassis Log major
    Feb 15 18:57:27 ERROR: Degraded configuration: system operating at reduced capacity
    1622 Fri Feb 15 18:57:27 2008 Chassis Log major
    Feb 15 18:57:27 ERROR: CMP0 /MB/CPU0/CMP0/BR1: neither channel populated with DIMM0. Branch 1 not configured

Detecting Faults Oracle VTS Software

This topic includes the following:
- About Oracle VTS Software on page 40
- Verify Installation of Oracle VTS Software on page 41
- Start the Oracle VTS Browser Environment on page 42
- Oracle VTS Software Packages on page 44
- Useful Oracle VTS Tests on page 45

About Oracle VTS Software

The Oracle VTS software features a Java-based browser environment, an ASCII-based screen interface, and a command-line interface. For more information about how to use the Oracle VTS software, see the Oracle VTS 7.0 User's Guide.

The Oracle Solaris OS must be running in order to use the Oracle VTS software. You also must ensure that the Oracle VTS validation test software is installed on your system.

This section descri
20. 40 must be running the Oracle Solaris OS.
- If Oracle VTS reports a faulty device, replace the FRU.
- If Oracle VTS does not report a faulty device, go to Action No. 5.

TABLE: Diagnostic Flowchart Actions (Continued)

Action No. | Diagnostic Action | Resulting Action | For more information
5 | Run POST. | POST performs basic tests of the server components and reports faulty FRUs. | Detecting Faults Using POST on page 45; Controlling How POST Runs on page 27
6 | Determine if the fault is an environmental or configuration fault. | See the description that follows. | Detecting Faults Oracle ILOM show faulty Command on page 34; Detecting Faults on page 32

Resulting action for Action No. 6: Determine if the fault is an environmental fault or a configuration fault.
- If the fault listed by the show faulty command displays a temperature or voltage fault, then the fault is an environmental fault. Environmental faults can be caused by faulty FRUs (power supply or fan) or by environmental conditions, such as when computer room ambient temperature is too high or the server airflow is blocked. When the environmental condition is corrected, the fault will automatically clear.
- If the fault indicates that a fan or power supply is bad, you can perform a hot-swap of the FRU. You can also use the fault LEDs on the server to identify the faulty FRU (fans and power supplies).
- If the FRU displayed by the show faulty command is /SYS, the fault is a configuration problem. /SYS indicates no faulty FRU has
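
The decision in Action No. 6 can be sketched as a short function. This Python example is illustrative only: the function name, the keyword matching on "temperature" and "voltage", and the "undetermined" fallback are assumptions made for the sketch, and the real fault text reported by show faulty may be worded differently.

```python
def environmental_or_configuration(fru, description):
    """Apply the Action No. 6 rules to one fault.

    `fru` is the FRU name reported by show faulty and `description`
    is the fault text; both are plain strings.
    """
    text = description.lower()
    # A temperature or voltage fault is an environmental fault.
    if "temperature" in text or "voltage" in text:
        return "environmental"
    # A FRU of /SYS means no faulty FRU was identified: a configuration problem.
    if fru == "/SYS":
        return "configuration"
    # Anything else needs the other diagnostic actions in the flowchart.
    return "undetermined"
```

For the power supply example shown earlier in this manual (FRU /SYS, "At least 2 power supplies must have AC power"), the sketch reports a configuration problem.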
21. Assembly 148
Remove the Flex Cable Assembly 149
Install the Flex Cable Assembly 150
Servicing the Front Control Panel 152
Remove the Front Control Panel 152
Install the Front Control Panel 153
Servicing the Front I/O Board 154
Remove the Front I/O Board 155
Install the Front I/O Board 156
Returning the Server to Operation 157
Install the Top Cover 158
Install the Server Into the Rack 158
Slide the Server Into the Rack 159
Connect the Power Cords to the Server 161
Power On the Server 161
Performing Node Reconfiguration 163
I/O Connections to CMP/Memory Modules 164
Recovering From a Failed CMP/Memory Module 165
Options for Recovering From a Failed CMP/Memory Module 165
Reconfiguring I/O Device Nodes 166
Options for Reconfiguring I/O Device Nodes 166
Reconfigure the I/O and PCIe Fabric 167
Temporarily Disable All Memory Modules 168
Re-Enable All Memory Modules 169
Reset the LDoms Guest Configuration 170
System Bus Topology 171
I/O Fabric in 2P Configuration 172
I/O Fabric in 4P Configuration 173
Identifying Connector Pinouts 175
Serial Management Port Connector Pinouts 176
Network Management Port Connector Pinouts 177
Serial Port Connector Pinouts 178
USB Connector Pinouts 179
Gigabit Ethernet Connector Pinouts 180
Server Components 181
Customer Replaceable Units 182
Field Replaceable Units 184
Index 187
Using This
22. Buffer 38
View System Message Log Files 38
Detecting Faults Oracle ILOM Event Log 39
View Oracle ILOM Event Log 39
Detecting Faults Oracle VTS Software 40
About Oracle VTS Software 40
Verify Installation of Oracle VTS Software 41
Start the Oracle VTS Browser Environment 42
Oracle VTS Software Packages 44
Useful Oracle VTS Tests 45
Detecting Faults Using POST 45
Identifying Faults Detected by PSH 47
Detect Faults Identified by the Oracle Solaris PSH Facility Oracle ILOM fmdump Command 48
Clearing Faults 51
Clear Faults Detected During POST 51
Clear Faults Detected by PSH 53
Clear Faults Detected in the External I/O Expansion Unit 54
Disabling Faulty Components 54
Disabling Faulty Components Using Automatic System Recovery 55
Disable System Components 56
Re-Enable System Components 56
Oracle ILOM to ALOM CMT Command Reference 57
Preparing to Service the System 63
Safety Information 63
Observing Important Safety Precautions 64
Safety Symbols 64
Electrostatic Discharge Safety Measures 65
Handling Electronic Components 65
Antistatic Wrist Strap 65
Antistatic Mat 65
Required Tools 66
Obtain the Chassis Serial Number 66
Obtain the Chassis Serial Number Remotely 66
Powering Off the System 67
Power Off Command Line 67
Power Off Graceful Shutdown 68
Power Off Emergency Shutdown 68
Disconnect Power Cords From the Server 68
C
23. CMP/memory modules can be installed in the system. Each CMP module is paired with a memory module. CMP modules and memory modules are keyed uniquely to prevent insertion into the wrong type of slot.

A faulty CMP or memory module is indicated by an illuminated fault LED. An illuminated module LED also might indicate a faulty FB-DIMM on that module.

Related Information
- CMP and Memory Module Device Identifiers on page 109
- Supported CMP/Memory Module Configurations on page 110
- I/O Connections to CMP/Memory Modules on page 164
- Reconfiguring I/O Device Nodes on page 166
- Servicing FB-DIMMs on page 110
- System Bus Topology on page 171
- I/O Fabric in 2P Configuration on page 172
- I/O Fabric in 4P Configuration on page 173

Remove a CMP/Memory Module

Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Extend the Server to the Maintenance Position on page 70
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- R
24. CMP modules to CMP Slot 0.

3. If neither option 1 nor 2 is possible, you must do the following:
- Temporarily Disable All Memory Modules on page 168
- Reconfigure the I/O and PCIe Fabric on page 167
- Re-Enable All Memory Modules on page 169
- Reset the LDoms Guest Configuration on page 170

Related Information
- Managing Faults on page 9
- Servicing CMP/Memory Modules on page 104
- Servicing FB-DIMMs on page 110
- I/O Connections to CMP/Memory Modules on page 164
- Reconfiguring I/O Device Nodes on page 166
- System Bus Topology on page 171
- I/O Fabric in 2P Configuration on page 172
- I/O Fabric in 4P Configuration on page 173

Reconfiguring I/O Device Nodes

This topic includes the following:
- Options for Reconfiguring I/O Device Nodes on page 166
- Reconfigure the I/O and PCIe Fabric on page 167
- Temporarily Disable All Memory Modules on page 168
- Re-Enable All Memory Modules on page 169

Options for Reconfiguring I/O Device Nodes

You might need to change the connection between the CMP modules and the onboard devices (described in I/O Fabric in 2P Configuration on page 172 or I/O Fabric in 4P Configuration on page 173) in one of the following circumstances:
- A CMP module has completely failed, you need access to a PCIe slot or device which was attached to that CMP module, and you are unable to temporarily replace the failed mo
25. Discharge Antistatic Prevention Measures on page 73. Remove the Top Cover on page 73. Identify the correct slot for installation. See PCIe Device Identifiers on page 101 and PCIe Slot Configuration Guidelines on page 102.
2. Open the PCIe card latch.
3. Remove the PCIe filler panel.
4. Insert the PCIe card into its slot.
5. Close the PCIe card latch.
Next Steps: Install the Top Cover on page 158; Slide the Server Into the Rack on page 159; Power On the Server on page 161.
PCIe Device Identifiers
Device identifiers are case sensitive.
Device            Device Identifier                   Notes
PCIe0             /SYS/MB/PCIE0                       x8 slot
PCIe1             /SYS/MB/PCIE1                       x16 slot operating at x8
PCIe2             /SYS/MB/PCIE2                       x8 slot
PCIe3             /SYS/MB/PCIE3                       x8 slot
PCIe4 or XAUI0    /SYS/MB/PCIE4 or /SYS/MB/XAUI0      x8 slot shared with XAUI slot
PCIe5 or XAUI1    /SYS/MB/PCIE5 or /SYS/MB/XAUI1      x8 slot shared with XAUI slot
PCIe6             /SYS/MB/PCIE6                       x16 slot operating at x8
PCIe7             /SYS/MB/PCIE7                       x8 slot
Note: PCIe names in Oracle ILOM messages are displayed with the full FRU name, such as /SYS/MB/PCIE0.
Note: In the Solaris OS, PCIe slot addresses are associated with CMP modules. The PCIe slot address in the Solaris OS might change if you add or remove CMP modules, or if a CMP module is brought offl
26. Documentation This document describes how to remove and install replaceable parts in Oracle s Sun SPARC Enterprise T5440 server This manual also includes information about the use and maintenance of the servers This document is written for technicians system administrators authorized service providers ASPs and users who have advanced experience troubleshooting and replacing hardware m Related Documentation on page xi m Documentation Support and Training on page xii Related Documentation The documents listed as online are available at the following URL http download oracle com docs cd E19488 01 Application Title Format Location Late breaking information Sun SPARC Enterprise T5440 Server PDF Online Product Notes Site planning Sun SPARC Enterprise T5440 Server PDF Online Site Planning Guide Safety and regulatory Sun SPARC Enterprise T5440 Server PDF Online compliance Safety and Compliance Guide Installation Sun SPARC Enterprise T5440 Server Printed Shipping Installation and Setup Guide PDF kit Online System administration Sun SPARC Enterprise T5440 Server PDF Online Administration Guide Service processor Oracle Integrated Lights Out PDF Online Manager ILOM 3 0 Getting Started Guide xi Documentation Support and Training These web sites provide additional resources m Documentation http www oracle com technetwork indexes documentation index html Support https support ora
27. FB DIMM you want to remove.
a. Press the FB DIMM fault button. See FB DIMM Fault Button Locations on page 120.
b. Note which FB DIMM fault LED is illuminated.
2. Push down on the ejector tabs on each side of the FB DIMM until the FB DIMM is released.
Caution: FB DIMMs might be hot. Use caution when servicing FB DIMMs.
3. Grasp the top corners of the faulty FB DIMM and remove it from the CMP memory module.
4. Place the FB DIMM on an antistatic mat.
5. Repeat Step 2 through Step 4 to remove any additional FB DIMMs.
Install FB DIMMs
1. Unpack the replacement FB DIMMs and place them on an antistatic mat.
Tip: See FB DIMM Configuration on page 116 for information about configuring the FB DIMMs.
2. Ensure that the ejector tabs are in the open position.
3. Line up the replacement FB DIMM with the connector. Align the FB DIMM notch with the key in the connector. This ensures that the FB DIMM is oriented correctly.
4. Push the FB DIMM into the connector until the ejector tabs lock the FB DIMM in place. If the FB DIMM does not easily seat into the connector, verify that the orientation of the FB DIMM is correct. If the orientation is reversed, damage to the FB DIMM might occur.
5. Repeat Step 2 through Step 4 until all replacement FB DIMMs are installed.
Next Steps: Install a CMP Memory Module on page 107; Install the Top Cover
28. 7:2>H/W under test = /SYS/MB/CPU0/CMP0/BR1/CH0/D0
7:2>Repair Instructions: Replace items in order listed by H/W under test above
7:2>MSG = Pin 149 failed on /SYS/MB/CPU0/CMP0/BR1/CH0/D0 (J792)
7:2>END_ERROR
7:2>Decode of Dram Error Log Reg Channel 2 bits 60000000 0000108c
7:2> 1 MEC 62 R W1C Multiple corrected errors, one or more CE not logged
7:2> 1 DAC 61 R W1C Set to 1 if the error was a DRAM access CE
7:2> 108c SYND 15:0 RW ECC syndrome
7:2>
7:2> Dram Error AFAR channel 2 = 00000000 00000000
7:2> L2 AFAR channel 2 = 00000000 00000000
d. Perform further investigation if needed.
If POST detects a faulty device, the fault is displayed and the fault information is passed to the service processor for fault handling. Faulty FRUs are identified in fault messages using the FRU name.
The fault is captured by the service processor, where the fault is logged, the Service Required LED is lit, and the faulty component is disabled. See EXAMPLE Fault Detected by POST on page 53.
Run the Oracle ILOM show faulty command to obtain additional fault information. In this example, /SYS/MB/CPU0/CMP0/BR1/CH0/D0 is disabled. The system can boot using memory that was not disabled until the faulty component is replaced.
Note: You can use ASR commands to display and control disabled components. See Disabling Faulty Components on page 54.
Related Information: Diagnostic Flo
29. Healing PSH Q quick visual notification 10 R rack extending server to maintenance position 69 slide rail 70 Ready to Remove hard drive LED 80 82 rear panel access 5 reconfig pl script 167 removefru command 59 removing 140 battery 133 CMP memory module 106 DVD ROM drive 126 fan tray 87 89 fan tray carriage 137 FB DIMMs 110 flex cable assembly 149 front bezel 124 front control panel 152 front I O board 155 hard drive 79 83 hard drive backplane 140 IDPROM 131 motherboard 143 PCIe card 98 power distribution board 134 power supply 92 95 service processor 128 reset command 59 reset system using ILOM 30 using POST commands 30 resetsc command 59 S safety information 63 safety symbols 64 Index 191 sanity check for hardware components 20 SCC module and host ID 2 and MAC addresses 2 serial management port pinouts 176 serial number chassis 66 serial port DB 9 pinouts 178 service processor installing 130 removing 128 Service Required system LED 34 about 4 cleared by enablecomponent command 52 interpreting to diagnose faults 33 triggered by ILOM 16 triggered by power supply fault 98 set command and component state property 52 setkeyswitch parameter 29 60 61 113 setlocator command 4 7 60 70 show faulty command 33 46 60 and faults detected by POST 37 and PSH faults 36 and Service Required LED 34 description and examples 34 environmental f
30. I/O Fabric in 4P Configuration
CMP Number   Devices Controlled
CMP0         Onboard disk drives, onboard USB ports, onboard DVD drive, PCIe0, PCIe1
CMP1         Onboard Gbit or 10 Gbit network, PCIe4, PCIe5
CMP2         PCIe2, PCIe3
CMP3         PCIe6, PCIe7
Related Information: System Bus Topology on page 171; I/O Fabric in 2P Configuration on page 172.
Identifying Connector Pinouts
This section provides reference information about the system back panel ports and pin assignments.
Topic: Reference for system connector pinouts. Links: Serial Management Port Connector Pinouts on page 176; Network Management Port Connector Pinouts on page 177; Serial Port Connector Pinouts on page 178; USB Connector Pinouts on page 179; Gigabit Ethernet Connector Pinouts on page 180.
Related Information: Identifying Server Components on page 1.
Serial Management Port Connector Pinouts
The serial management connector (labeled SERIAL MGT) is an RJ-45 connector located on the back panel. This port is the default connection to the system console.
Pin   Signal Description
1     Request to Send
2     Data Terminal Ready
3     Transmit Data
4     Ground
5     Ground
6     Receive Data
7     Data Set Ready
8     Clear to Send
Network Man
31. FB DIMM Device Identifiers (memory module rows)
FB DIMM Device Identifier          Connector Number   FB DIMM Group
/SYS/MB/MEMx/CMPx/BR0/CH1/D1       J746               Bank 1
/SYS/MB/MEMx/CMPx/BR0/CH0/D1       J511
/SYS/MB/MEMx/CMPx/BR1/CH1/D1       J1344
/SYS/MB/MEMx/CMPx/BR1/CH0/D1       J927
Motherboard connector
The FB DIMM address follows the same convention as the CMP or memory module upon which it is mounted. For example, /SYS/MB/CPU0/CMP0/BR1/CH0/D0 is the device identifier for the FB DIMM mounted at J792 on CMP module 0.
Related Information: Managing Faults on page 9; FB DIMM Configuration on page 116; FB DIMM Fault Button Locations on page 120; Performing Node Reconfiguration on page 163.
FB DIMM Fault Button Locations
This figure shows the location of the FB DIMM fault buttons on the CMP module and the memory module. Press this button to illuminate the fault indicator on the module. Replace the FB DIMM identified by the indicator.
Note: You must replace a faulty FB DIMM with an identical part (same part number). See FB DIMM Configuration on page 116 for more information.
[Figure: FB DIMM fault button locations]
Related Information: Managing Faults on page 9; FB DIMM Configuration on page 116; FB DIMM Device Identifiers on page 119.
32. Processor on page 23 m Displaying FRU Information With Oracle ILOM on page 25 m Controlling How POST Runs on page 27 m Detecting Faults on page 32 m Clearing Faults on page 51 m Disabling Faulty Components on page 54 m Oracle ILOM to ALOM CMT Command Reference on page 57 Understanding Fault Handling Options This topic contains the following m Server Diagnostics Overview on page 10 m Diagnostic Flowchart on page 11 m Options for Accessing the Service Processor on page 15 m Oracle ILOM Overview on page 16 m ALOM CMT Compatibility Shell Overview on page 18 m Predictive Self Healing Overview on page 19 m Oracle VTS Overview on page 20 m POST Fault Management Overview on page 20 m POST Fault Management Flowchart on page 21 10 Memory Fault Handling Overview on page 22 server Diagnostics Overview You can use a variety of diagnostic tools commands and indicators to monitor and troubleshoot a server LEDs Provide a quick visual notification of the status of the server and of some of the FRUs See Detecting Faults Using LEDs on page 32 Oracle ILOM firmware This system firmware runs on the service processor In addition to providing the interface between the hardware and OS Oracle ILOM also tracks and reports the health of key server components Oracle ILOM works closely with POST and Oracle Solaris Operating System Oracle Solaris OS Predictive Self Healing technology to keep th
33. /SYS -> Note: To perform an immediate shutdown, use the stop -force -script /SYS command. Ensure that all data is saved before entering this command.
Power Off Graceful Shutdown
Press and release the Power button. If necessary, use a pen or pencil to press the Power button.
Power Off Emergency Shutdown
Caution: All applications and files will be closed abruptly without saving changes. File system corruption might occur.
Press and hold the Power button for four seconds.
Disconnect Power Cords From the Server
Unplug all power cords from the server.
Caution: Because 3.3V standby power is always present in the system, you must unplug the power cords before accessing any cold-serviceable components.
Extending the Server to the Maintenance Position
This topic includes the following: Components Serviced in the Maintenance Position on page 69; Extend the Server to the Maintenance Position on page 70.
Components Serviced in the Maintenance Position
The following components can be serviced with the server in the maintenance position: fan trays, CMP memory modules, FB DIMMs, PCIe/XAUI cards, service processor, power supply backplane, hard drive backplane.
Related Information: Front Panel Diagram on page 3; Rear Panel Diagram on page 6; Extend the Server to the Maintenance Position
34. Sun prescribed level of diagnostic execution Overrides user defined settings as if parameters were diag_level max diag_verbosity max diag_trigger all resets User defined settings are not modified System Boot OpenBoot PROM Solaris boot Related Information m Diagnostic Flowchart on page 11 Normal Mode Diagnostic execution is enabled User defined settings control test coverage and verbosity via diag_level diag_verbosity diag_trigger Managing Faults 21 m Detecting Faults Using POST on page 45 m Sun SPARC Enterprise T5440 Server Installation and Setup Guide m Sun SPARC Enterprise T5440 Server Administration Guide Memory Fault Handling Overview A variety of features plays a role in how the memory subsystem is configured and how memory faults are handled Understanding the underlying features helps you identify and repair memory problems This section describes how the server deals with memory faults Note For memory configuration information see FB DIMM Configuration on page 116 The server uses advanced ECC technology that corrects up to 4 bits in error on nibble boundaries as long as the bits are all in the same DRAM On 4 GB FB DIMMs if a DRAM fails the DIMM continues to function The following server features independently manage memory faults m POST Based on Oracle ILOM configuration variables POST runs when the server is powered on For correctable mem
35. This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services. Copyright © 2008, 2011, Oracle and/or its affiliates. All rights reserved. This software and the accompanying documentation are protected by intellectual property laws. They are provided under license and are subject to restrictions on use and disclosure. Except as provided in your license agreement or by law, you may not copy, reproduce, translate, broadcast, modify, patent, transmit, distribute, exhibit, perform, publish, or display the software, even in part, in any form or by any means. Furthermore, reverse engineering, disassembly, or decompilation of the software is prohibited, except for purposes of interoperability with third-party software or as prescribed by law. The information provided in this document is subject to change without notice. Furthermore, Oracle Corporation does not warrant that it is free of errors and invites you
36. and Memory Module Device Identifiers 109 Supported CMP Memory Module Configurations 110 Servicing FB DIMMs 110 v Remove FB DIMMs 110 v Install FB DIMMs 111 v Verify FB DIMM Replacement 112 v AddFB DIMMs 115 FB DIMM Configuration 116 Supported FB DIMM Configurations 116 Memory Bank Configurations 117 FB DIMM Device Identifiers 119 FB DIMM Fault Button Locations 120 Servicing Field Replaceable Units 123 Contents vii Servicing the Front Bezel 123 v Remove the Front Bezel 124 v Install the Front Bezel 125 Servicing the DVD ROM Drive 126 v Remove the DVD ROM Drive 126 v Install the DVD ROM Drive 127 Servicing the Service Processor 128 v Remove the Service Processor 128 v Install the Service Processor 130 Servicing the IDPROM 131 v Remove the IDPROM 131 v Installthe IDPROM 132 Servicing the Battery 133 v Remove the Battery 133 v Install the Battery 134 Servicing the Power Distribution Board 134 v Remove the Power Distribution Board 134 v Install the Power Distribution Board 136 Servicing the Fan Tray Carriage 137 v Remove the Fan Tray Carriage 137 v Install the Fan Tray Carriage 138 Servicing the Hard Drive Backplane 139 v Remove the Hard Drive Backplane 140 v Install the Hard Drive Backplane 141 Servicing the Motherboard 143 v Remove the Motherboard 143 v Install the Motherboard 146 Motherboard Fastener Locations 147 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Servicing the Flex Cable
37. message ID (SUN4V-8000-JA) provides information for corrective action.
3. Follow the suggested actions to repair the fault.
EXAMPLE Output from the fmdump -v Command
# fmdump -v -u fd940ac2-d21e-c94a-f258-f8a9bb69d05b
TIME                 UUID                                 SUNW-MSG-ID
Jul 31 12:47:42.2007 fd940ac2-d21e-c94a-f258-f8a9bb69d05b SUN4V-8000-JA
  100% fault.cpu.ultraSPARC-T2.misc_regs
  Problem in: cpu:///cpuid=16/serial=5D67334847
  Affects: cpu:///cpuid=16/serial=5D67334847
  FRU: hc://:serial=101083:part=541215101/motherboard=0
  Location: MB
EXAMPLE PSH Message Output
CPU errors exceeded acceptable levels
Type: Fault
Severity: Major
Description: The number of errors associated with this CPU has exceeded acceptable levels.
Automated Response: The fault manager will attempt to remove the affected CPU from service.
Impact: System performance may be affected.
Suggested Action for System Administrator: Schedule a repair procedure to replace the affected CPU, the identity of which can be determined using fmdump -v -u EVENT-ID.
Details: The Message ID SUN4V-8000-JA indicates diagnosis has determined that a CPU is faulty. The Oracle Solaris fault manager arranged an automated attempt to disable this CPU.
Clearing Faults
This section describes how to clear faults.
Note: Some system faults are cleared automatica
38. output to the screen s displays static information about system FRUs defaults to all FRUs unless one is specified d displays dynamic information about system FRUs defaults to all FRUs unless one is specified See Display Individual Component Information Oracle ILOM show Command on page 26 showkeyswitch Sets the virtual keyswitch Turns the Locator LED on the server on or off Displays the environmental status of the host server This information includes system temperatures power supply front panel LED hard drive fan voltage and current sensor status See Display Individual Component Information Oracle ILOM show Command on page 26 Displays current system faults See Detecting Faults on page 32 Displays information about the FRUS in the server Displays the status of the virtual keyswitch 60 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Oracle ILOM Command ALOM CMT Command Description show SYS LOCATE show SP logs event list showlocator Displays the current state of the Locator LED as either on or off showlogs b lines e lines v g lines p logtypel r p11 Displays the history of all events logged in the service processor event buffers in RAM or the persistent buffers show SYS showplatform v Displays information about the operating state of the host system the system serial number and whether the hardwa
39. practices as described in this section Related Information m Safety Symbols on page 64 m Handling Electronic Components on page 65 m Electrostatic Discharge Safety Measures on page 65 Safety Symbols Note the meanings of the following symbols that might appear in this document Caution There is a risk of personal injury or equipment damage To avoid personal injury and equipment damage follow the instructions Caution Hot surface Avoid contact Surfaces are hot and might cause personal injury if touched Caution Hazardous voltages are present To reduce the risk of electric shock and danger to personal health follow the instructions Related Information m Safety Information on page 63 64 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Electrostatic Discharge Safety Measures This topic includes the following m Handling Electronic Components on page 65 m Antistatic Wrist Strap on page 65 m Antistatic Mat on page 65 Handling Electronic Components Electrostatic discharge ESD sensitive devices such as the motherboard PCI cards hard drives and memory modules require special handling Caution Circuit boards and hard drives contain electronic components that are extremely sensitive to static electricity Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards Do not touch the compone
40. the memory modules in order to work around this complication. If you are recovering from a failed CMP module, you must temporarily disable the FB DIMMs on all memory modules while Solaris is halted and the system is powered off. The FB DIMMs are re-enabled after the I/O and PCIe devices are reconfigured.
You can either physically remove the memory modules from the system or remotely disable all FB DIMMs located on all memory modules using the disablecomponent command. To remove the memory modules from the system, see the instructions in the Sun SPARC Enterprise T5440 Server Service Manual.
To remotely disable all FB DIMMs in the system, do the following:
1. Halt the Solaris OS.
2. Power off the system.
3. Disable each FB DIMM:
sc> disablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D1
sc> disablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D2
sc> disablecomponent /SYS/MB/MEMx/CMPx/BR1/CH1/D3
where x is the memory module to be disabled.
The following example shows how to disable all the FB DIMMs on MEM1:
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D1
sc> disablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D2
sc> disablecomponent /SY
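The per-module command list in this procedure follows a regular pattern, so it can be generated rather than typed. The following is a minimal sketch only. It assumes the identifiers take the slash-separated form /SYS/MB/MEMx/CMPx/BRb/CHc/Dd (the slashes are not visible in this extract) and that each memory module carries DIMMs D1 through D3 on every branch and channel, which gives twelve commands per module. The function name disable_commands is hypothetical; the script prints text to enter at the sc> prompt and does not talk to the service processor.

```python
from itertools import product


def disable_commands(module: int) -> list[str]:
    """Return one disablecomponent command per FB DIMM on a memory module.

    Assumed layout: branches BR0-BR1, channels CH0-CH1, DIMMs D1-D3.
    """
    commands = []
    for branch, channel, dimm in product((0, 1), (0, 1), (1, 2, 3)):
        target = (
            f"/SYS/MB/MEM{module}/CMP{module}"
            f"/BR{branch}/CH{channel}/D{dimm}"
        )
        commands.append(f"disablecomponent {target}")
    return commands


if __name__ == "__main__":
    # Print the commands for memory module 1 (MEM1).
    for line in disable_commands(1):
        print(line)
```

Review the generated list against the FB DIMMs actually installed before entering any command, because the sketch does not check which banks are populated.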
41. 3. Power off the server using one of the methods described in the section Powering Off the System on page 67. Disconnect Power Cords From the Server on page 68. Extend the Server to the Maintenance Position on page 70. Remove a Power Supply on page 95.
Note: You must remove all four power supplies from the system.
Perform Electrostatic Discharge Antistatic Prevention Measures on page 73. Remove the Top Cover on page 73.
Do the following:
1. Remove the flex cable retainer. Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the chassis.
2. Unplug the flex cable from the power distribution board.
3. Unplug the auxiliary power cable from the power distribution board.
4. Remove the No. 2 Phillips screw.
5. Remove the two 7 mm hex nuts securing the bus bars to the power distribution board.
6. Slide the power distribution board up and out of the chassis.
Install the Power Distribution Board
1. Align the keyholes in the power distribution board with the corresponding mushroom standoffs in the chassis.
2. Lower the power distribution board into the chassis.
3. Install the No. 2 Phillips screw.
4. Install the two 7 mm nuts securing the bus bars to the power distribution board.
5. Plug in the flex cable connector. Ensure that the auxiliary power cable
42. Useful Oracle VTS Tests
SunVTS Test          FRUs Exercised by Test
Memory Test          FB DIMMs
Processor Test       CMP, motherboard
Disk Test            Disks, cables, disk backplane, DVD drive
Network Test         Network interface, network cable, CMP, motherboard
Interconnect Test    Board ASICs and interconnects
IO Ports Test        I/O (serial port interface), USB subsystem
Environmental Test   Motherboard and service processor
Related Information: Diagnostic Flowchart on page 11; Oracle VTS Software Packages on page 44; Sun SPARC Enterprise T5440 Server Installation and Setup Guide; Sun SPARC Enterprise T5440 Server Administration Guide.
Detecting Faults Using POST
Run POST in maximum mode to detect system faults. See Run POST in Maximum Mode on page 30.
POST error messages use the following syntax:
c:s > ERROR: TEST = failing-test
c:s > H/W under test = FRU
c:s > Repair Instructions: Replace items in order listed by H/W under test above
c:s > MSG = test-error-message
c:s > END_ERROR
In this syntax, c is the core number and s is the strand number.
Warning and informational messages use the following syntax: INFO or WARNING: message
In the following example, POST reports a memory error at FB DIMM location /SYS/MB/CPU0/CMP0/BR1/CH0/D0. The error was detected by POST running on core 7, strand 2.
EXAMPLE show Command Output
7:2>ERROR: TEST = Data Bitwalk
7:2>
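When POST output is captured to a log, the error syntax described here can be picked apart mechanically. The following is a minimal sketch, not an Oracle tool. It assumes each line begins with core:strand followed by ">" and that the TEST, H/W under test, and MSG fields use "=" as the separator; that punctuation is not preserved in this extract, so treat it as an assumption and adjust the pattern to match real console output. The function name parse_post_error is hypothetical, and the sample lines reuse the Data Bitwalk example from this manual.

```python
import re

# Assumed line shape: "<core>:<strand>>TEXT", for example "7:2>MSG = ...".
LINE_RE = re.compile(r"^\s*(\d+):(\d+)\s*>\s*(.*)$")

SAMPLE = [
    "7:2>ERROR: TEST = Data Bitwalk",
    "7:2>H/W under test = /SYS/MB/CPU0/CMP0/BR1/CH0/D0",
    "7:2>Repair Instructions: Replace items in order listed by H/W under test above",
    "7:2>MSG = Pin 149 failed on /SYS/MB/CPU0/CMP0/BR1/CH0/D0 (J792)",
    "7:2>END_ERROR",
]


def parse_post_error(lines):
    """Collect the fields of the first POST error block into a dict."""
    record = {}
    for raw in lines:
        match = LINE_RE.match(raw)
        if not match:
            continue  # Skip lines that do not carry a core:strand prefix.
        text = match.group(3).strip()
        record.setdefault("core", int(match.group(1)))
        record.setdefault("strand", int(match.group(2)))
        value = text.partition("=")[2].strip()
        if text.startswith("ERROR: TEST"):
            record["test"] = value
        elif text.startswith("H/W under test"):
            record["fru"] = value
        elif text.startswith("MSG"):
            record["msg"] = value
        elif text.startswith("END_ERROR"):
            break  # End of this error block.
    return record


if __name__ == "__main__":
    print(parse_post_error(SAMPLE))
```

The fru field is the FRU name to look for in show faulty output and to use when replacing the component.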
43. 8 Servicing FB DIMMs on page 110 FB DIMM Fault Button Locations on page 120 Not all components have an individual component Fault LED If the Service Required LED is lit use the show faulty command to obtain additional information about the component affected See these sections Front Panel LEDs on page 5 Rear Panel LEDs on page 8 m Diagnostic Flowchart on page 11 m Detecting Faults Using LEDs on page 32 m Oracle ILOM to ALOM CMT Command Reference on page 57 m Sun SPARC Enterprise T5440 Server Installation and Setup Guide m Sun SPARC Enterprise T5440 Server Administration Guide m Oracle Integrated Lights Out Manager 3 0 Supplement for the Sun SPARC Enterprise T5440 Server Detecting Faults Oracle ILOM show faulty Command Use the Oracle ILOM show faulty command to display the following kinds of faults Sun SPARC Enterprise T5440 Server Service Manual June 2011 m Environmental or configuration faults System configuration faults Or temperature or voltage problems that might be caused by faulty FRUs power supplies fans or blower or by room temperature or blocked air flow to the server m POST detected faults Faults on devices detected by the POST diagnostics m PSH detected faults Faults detected by the Predictive Self Healing PSH technology m External I O Expansion Unit faults Faults detected in the optional External I O Expansion Unit Use the show faulty comma
44. CH1/D2
/SYS/MB/MEM1/CMP1/BR0/CH1/D3
/SYS/MB/MEM1/CMP1/BR1/CH0/D1
/SYS/MB/MEM1/CMP1/BR1/CH0/D2
/SYS/MB/MEM1/CMP1/BR1/CH0/D3
/SYS/MB/MEM1/CMP1/BR1/CH1/D1
/SYS/MB/MEM1/CMP1/BR1/CH1/D2
/SYS/MB/MEM1/CMP1/BR1/CH1/D3
Reset the LDoms Guest Configuration
After reconfiguring the I/O and PCIe fabric, you must re-create your LDoms guest configurations, because hardware resources that were previously assigned to your guests might no longer be available.
1. Power off the system.
2. In the ALOM compatibility shell, type:
sc> bootmode config="factory-default"
3. Power on the system.
4. Re-create your LDoms guests using the remaining hardware resources.
System Bus Topology
[Figure: system bus topology]
Related Information: I/O Fabric in 2P Configuration on page 172; I/O Fabric in 4P Configuration on page 173.
I/O Fabric in 2P Configuration
CMP Number   Devices Controlled
CMP0         Onboard disk drives, onboard USB ports, onboard DVD drive, PCIe0, PCIe1, PCIe2, PCIe3
CMP1         Onboard Gbit or 10 Gbit network, PCIe4, PCIe5, PCIe6, PCIe7
Related Information: System Bus Topology on page 171; I/O Fabric in 4P Configuration on page 173.
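When planning a node reconfiguration, it helps to know which CMP module a given slot or onboard device depends on. The following is a minimal sketch that transcribes the 2P table above and the 4P table (I/O Fabric in 4P Configuration on page 173) into a lookup. The dictionary name IO_FABRIC, the function names, and the shortened device labels are illustrative assumptions, not names used by Oracle ILOM or the Solaris OS.

```python
# Device-to-CMP assignments transcribed from the 2P and 4P I/O fabric tables.
IO_FABRIC = {
    "2P": {
        "CMP0": ["disk drives", "USB ports", "DVD drive",
                 "PCIe0", "PCIe1", "PCIe2", "PCIe3"],
        "CMP1": ["network", "PCIe4", "PCIe5", "PCIe6", "PCIe7"],
    },
    "4P": {
        "CMP0": ["disk drives", "USB ports", "DVD drive", "PCIe0", "PCIe1"],
        "CMP1": ["network", "PCIe4", "PCIe5"],
        "CMP2": ["PCIe2", "PCIe3"],
        "CMP3": ["PCIe6", "PCIe7"],
    },
}


def controlling_cmp(config, device):
    """Return the CMP that controls a device, or None if it is not listed."""
    for cmp_name, devices in IO_FABRIC[config].items():
        if device in devices:
            return cmp_name
    return None


def devices_behind(config, cmp_name):
    """Return the devices attached to a CMP module in the given configuration."""
    return list(IO_FABRIC[config].get(cmp_name, []))


if __name__ == "__main__":
    print(controlling_cmp("4P", "PCIe2"))
    print(devices_behind("2P", "CMP1"))
```

The lookup reflects the default fabric only. After the I/O and PCIe fabric has been reconfigured, the assignments differ and the tables no longer apply as written.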
45. [Figure: Supported FB DIMM configurations]
Figure Legend
1  Configuration 1: 4 FB DIMMs (Bank 0 populated)
2  Configuration 2: 8 FB DIMMs (Banks 0 and 1 populated)
3  Configuration 3: 16 FB DIMMs (Banks 0, 1, 2, and 3 populated)
Note: See FB DIMM Device Identifiers on page 119 for a list of FB DIMM device identifiers and the corresponding slots on the CMP memory modules.
Related Information: Managing Faults on page 9; FB DIMM Device Identifiers on page 119; FB DIMM Fault Button Locations on page 120; Performing Node Reconfiguration on page 163.
FB DIMM Device Identifiers
These are the devices and device identifiers for FB DIMMs on a CMP and memory module pair. Device identifiers are case sensitive.
Location        FB DIMM Device Identifier          Connector Number   FB DIMM Group
CMP module      /SYS/MB/CPUx/CMPx/BR1/CH0/D0       J792               Bank 0 (Minimum Configuration)
                /SYS/MB/CPUx/CMPx/BR1/CH1/D0       J896
                /SYS/MB/CPUx/CMPx/BR0/CH0/D0       J585
                /SYS/MB/CPUx/CMPx/BR0/CH1/D0       J687
                Motherboard connector
Memory module   /SYS/MB/MEMx/CMPx/BR1/CH1/D2       J1471              Bank 3
                /SYS/MB/MEMx/CMPx/BR1/CH1/D3       J1573
                /SYS/MB/MEMx/CMPx/BR1/CH0/D2       J1066
                /SYS/MB/MEMx/CMPx/BR1/CH0/D3       J1167
                /SYS/MB/MEMx/CMPx/BR0/CH1/D2       J847               Bank 2
                /SYS/MB/MEMx/CMPx/BR0/CH1/D3       J948
                /SYS/MB/MEMx/CMPx/BR0/CH0/D2       J660
                /SYS/MB/MEMx/CMPx/BR0/CH0/D3       J762
46. ILOM POST variables. The following table lists the Oracle ILOM variables used to configure POST. POST Fault Management Flowchart on page 21 shows how the variables work together.
Parameter        Value            Description
keyswitch_mode   normal           The system can power on and run POST (based on the other parameter settings). For details, see FIGURE Flowchart of Variables for POST Configuration on page 21. This parameter overrides all other commands.
                 diag             The system runs POST based on predetermined settings.
                 stby             The system cannot power on.
                 locked           The system can power on and run POST, but no flash updates can be made.
diag_mode        off              POST does not run.
                 normal           Runs POST according to the diag_level value.
                 service          Runs POST with preset values for diag_level and diag_verbosity.
diag_level       max              If diag_mode = normal, runs all the minimum tests, plus extensive processor and memory tests.
                 min              If diag_mode = normal, runs the minimum set of tests.
diag_trigger     none             Does not run POST on reset.
                 user-reset       Runs POST upon user-initiated resets.
                 power-on-reset   Only runs POST for the first power on. This option is the default.
                 error-reset      Runs POST if fatal errors are detected.
                 all-resets       Runs POST after any reset.
diag_verbosity   none             No POST output is displayed.
                 min              POST output displays functional tests with a banner and pinwheel.
47. Identifying Server Components on page 1 m Managing Faults on page 9 m Powering Off the System on page 67 m Hot Pluggable and Hot Swappable Devices on page 77 m Fan Tray Device Identifiers on page 90 m Fan Tray Fault LED on page 91 m Server Components on page 181 Remove a Fan Tray Hot Swap Before you begin complete these tasks m Read the section Safety Information on page 63 m Perform the task Extend the Server to the Maintenance Position on page 70 m Perform the task Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Do the following 1 Identify the fan tray to be removed See Fan Tray Device Identifiers on page 90 and Fan Tray Fault LED on page 91 2 Press the fan tray latches toward the center of the fan tray and pull the fan tray up and out of the system Servicing Customer Replaceable Units 87 V Install a Fan Tray Hot Swap 1 Slide the fan tray into its bay until it locks into place Ensure that the fan tray is oriented correctly Airflow in the system is from front to back 2 Verify proper fan tray operation See Fan Tray Fault LED on page 91 Next Steps If you are replacing a faulty fan tray due to an overtemperature condition monitor the system to ensure proper cooling m Slide the Server Into the Rack on page 159 m If you performed any additional service procedures see Power On the Server on page 161 88 Sun SPARC E
48. LEDs on the server The AC Present LED is located on the rear of the server on each power supply If these LEDs are not on check the power source and power connections to the server 2 Run the Oracle The show faulty command displays the following Detect Faults Oracle ILOM show kinds of faults ILOM show faulty faulty command Environmental faults Command on page 35 to check for faults External I O Expansion Unit faults Predictive Self Healing PSH detected faults POST detected faults Faulty FRUs are identified in fault messages using the FRU name Note If the Oracle ILOM show faulty output includes an error string such as Ext sensor or Ext FRU it indicates a fault in the External I O Expansion Unit 3 Check the Oracle The Oracle Solaris log files and the Oracle ILOM Detecting Faults Oracle Solaris log files and system event log record system events and provide Solaris OS Files and Oracle ILOM information about faults Commands on page 37 system event log e Browse the Oracle ILOM system event log for for fault major or critical events Some problems are logged information in the event log but not added to the show faulty list If system messages indicate a faulty device replace the FRU To obtain more diagnostic information go to Action No 4 4 Run Oracle VTS Oracle VTS is an application you can run to exercise Detecting Faults Oracle software and diagnose FRUs To run Oracle VTS the server VTS Software on page
49. Ms must be added
Column notes: The CPU module includes Memory Bank 0. The memory module includes Memory Banks 1, 2, and 3. Bank 1, if filled, must contain FB DIMMs of the same capacity as Bank 0. Banks 2 and 3 must be either completely empty or completely filled. If filled, they must have FB DIMMs of the same capacity as Banks 0 and 1.
Configuration Number | Memory Bank 0 | Memory Bank 1 | Memory Bank 2 | Memory Bank 3 | Total Memory
Configuration 1 | 4 x 2 GByte | empty | empty | empty | 8 GBytes
Configuration 2 | 4 x 2 GByte | 4 x 2 GByte | empty | empty | 16 GBytes
Configuration 3 | 4 x 2 GByte | 4 x 2 GByte | 4 x 2 GByte | 4 x 2 GByte | 32 GBytes
Configuration 4 | 4 x 4 GByte | empty | empty | empty | 16 GBytes
Configuration 5 | 4 x 4 GByte | 4 x 4 GByte | empty | empty | 32 GBytes
Configuration 6 | 4 x 4 GByte | 4 x 4 GByte | 4 x 4 GByte | 4 x 4 GByte | 64 GBytes
Configuration 7 | 4 x 8 GByte | empty | empty | empty | 32 GBytes
Configuration 8 | 4 x 8 GByte | 4 x 8 GByte | empty | empty | 64 GBytes
Configuration 9 | 4 x 8 GByte | 4 x 8 GByte | 4 x 8 GByte | 4 x 8 GByte | 128 GBytes
FIGURE Supported FB DIMM Configurations
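The population rules in the table can be checked mechanically. The following is a minimal sketch, not part of the manual: the function name `total_memory_gb` and the list-of-four-banks input are hypothetical, and two rules are inferred from the table rather than stated outright (Bank 0 is always populated, and Banks 2 and 3 are only filled when Bank 1 is filled).

```python
from typing import Optional, Sequence

DIMMS_PER_BANK = 4               # every populated bank in the table is "4 x N GByte"
TABLE_CAPACITIES_GB = (2, 4, 8)  # FB-DIMM sizes that appear in the table


def total_memory_gb(banks: Sequence[Optional[int]]) -> int:
    """Return total GBytes for one CMP/memory module pair.

    banks holds four entries (Bank 0..3). Each entry is the FB-DIMM
    capacity in GBytes, or None when the bank is empty.
    Raises ValueError when the layout breaks a rule from the table.
    """
    if len(banks) != 4:
        raise ValueError("expected four entries, one per memory bank")
    bank0, bank1, bank2, bank3 = banks

    # Assumption inferred from the table: Bank 0 is never empty.
    if bank0 not in TABLE_CAPACITIES_GB:
        raise ValueError("Bank 0 must hold 2, 4, or 8 GByte FB-DIMMs")

    # Bank 1, if filled, must match the capacity of Bank 0.
    if bank1 is not None and bank1 != bank0:
        raise ValueError("Bank 1 must match the capacity of Bank 0")

    # Banks 2 and 3 are either both empty or both filled.
    if (bank2 is None) != (bank3 is None):
        raise ValueError("Banks 2 and 3 must be both empty or both filled")

    # If filled, Banks 2 and 3 must match Banks 0 and 1
    # (inferred: this requires Bank 1 to be filled as well).
    if bank2 is not None:
        if bank1 is None or bank2 != bank0 or bank3 != bank0:
            raise ValueError("Banks 2 and 3 must match Banks 0 and 1")

    populated = sum(1 for bank in banks if bank is not None)
    return populated * DIMMS_PER_BANK * bank0
```

For example, `total_memory_gb([4, 4, None, None])` corresponds to Configuration 5, and a layout with only Bank 2 filled is rejected.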
50. OM CMT shell
Related Information
- Diagnostic Flowchart on page 11
- Sun SPARC Enterprise T5440 Server Installation and Setup Guide
- Sun SPARC Enterprise T5440 Server Administration Guide
- Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server
Oracle ILOM Overview
The Integrated Lights Out Manager (Oracle ILOM) firmware runs on the service processor in the server, enabling you to remotely manage and administer your server. Oracle ILOM enables you to remotely run diagnostics, such as power-on self-test (POST), that would otherwise require physical proximity to the server's serial port. You can also configure Oracle ILOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to Oracle ILOM.
The service processor runs independently of the server, using the server's standby power. Therefore, Oracle ILOM firmware and software continue to function when the server OS goes offline or when the server is powered off.
Note: Refer to the Oracle Integrated Lights Out Manager 3.0 Concepts Guide for comprehensive Oracle ILOM information.
Faults detected by Oracle ILOM, POST, the Predictive Self-Healing (PSH) technology, and the External I/O Expansion Unit (if attached) are forwarded to Oracle ILOM for fault handling (FIGURE Oracle ILOM Fault Management on page 17). In the event of a system fault, Oracle ILOM ensures that the Service Required LED is lit
51. Oracle VTS software is not installed, you can obtain the installation packages from the following places:
- Oracle Solaris Operating System DVDs
- Download from the web. Refer to the Preface for information on how to access the web site.
EXAMPLE show Command Output
ERROR information for SUNWvts was not found
ERROR information for SUNWvtsr was not found
Start the Oracle VTS Browser Environment
For information about test options and prerequisites, refer to the Oracle VTS 7.0 User's Guide.
Note: Oracle VTS software can be run in several modes. You must perform this procedure using the default mode.
1. Start the Oracle VTS agent and Javabridge on the server: cd usr Oracle VTS bin startOracle VTS
2. At the interface prompt, choose C to start the Oracle VTS client.
3. Start the Oracle VTS browser environment from a web browser on the client system. Type https://server-name:6789
The Oracle VTS browser environment is displayed.
[Screenshot: Oracle VTS browser environment, Host Machine View > Test Group]
By default, all the tests are enabled. To run a subset of tests, select the tests that should not be run and click Disable bu
52. Overtemp LED if overtemp condition exists Service Required LED front and rear panel Individual hard drive Fault LED Front Panel LEDs on page 5 Rear Panel LEDs on page 8 Power Supply LED on page 97 Servicing Power Supplies on page 91 Front Panel LEDs on page 5 Rear Panel LEDs on page 8 Fan Tray Fault LED on page 91 Servicing Fan Trays on page 86 See these sections Front Panel LEDs on page 5 Rear Panel LEDs on page 8 Hard Drive LEDs on page 86 Servicing Hard Drives on page 78 Managing Faults 33 Component Fault Fault LEDs Lit Additional Information CMP module or memory module FB DIMM Other components 34 Service Required LED front and rear panel CMP Module Fault LED or Memory Module Fault LED Service Required LED front and rear panel CMP Module Fault LED or Memory Module Fault LED FB DIMM Fault LED CMP and memory modules when FB DIMM Locate button is pressed Service Required LED front and rear panel Related Information A lit CMP module or memory module fault LED might indicate a problem with an FB DIMM installed on the CMP module or a problem with the CMP module itself See these sections Front Panel LEDs on page 5 Rear Panel LEDs on page 8 Servicing CMP Memory Modules on page 104 Servicing FB DIMMs on page 110 See these sections Front Panel LEDs on page 5 Rear Panel LEDs on page
53. Performing Node Reconfiguration on page 163
Supported CMP Memory Module Configurations
These are the supported CMP memory module configurations, as viewed from the front of the server.
Configuration | CMP3 MEM3 | CMP1 MEM1 | CMP2 MEM2 | CMP0 MEM0
One CMP memory pair: X
Two CMP memory pairs: X X
Three CMP memory pairs: X X X
Four CMP memory pairs (full configurations): X X X X
Related Information
- CMP and Memory Module Device Identifiers on page 109
- Performing Node Reconfiguration on page 163
Servicing FB DIMMs
- Remove FB DIMMs on page 110
- Install FB DIMMs on page 111
- Verify FB DIMM Replacement on page 112
- Add FB DIMMs on page 115
- FB DIMM Configuration on page 116
- FB DIMM Device Identifiers on page 119
- FB DIMM Fault Button Locations on page 120
Remove FB DIMMs
Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Extend the Server to the Maintenance Position on page 70
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
- Remove a CMP Memory Module on page 106
Do the following:
1. If you are removing a faulty FB DIMM, determine which
54. S/MB/MEM1/CMP1/BR0/CH0/D3
/SYS/MB/MEM1/CMP1/BR0/CH1/D1
/SYS/MB/MEM1/CMP1/BR0/CH1/D2
/SYS/MB/MEM1/CMP1/BR0/CH1/D3
/SYS/MB/MEM1/CMP1/BR1/CH0/D1
/SYS/MB/MEM1/CMP1/BR1/CH0/D2
/SYS/MB/MEM1/CMP1/BR1/CH0/D3
/SYS/MB/MEM1/CMP1/BR1/CH1/D1
/SYS/MB/MEM1/CMP1/BR1/CH1/D2
/SYS/MB/MEM1/CMP1/BR1/CH1/D3
Re-Enable All Memory Modules
Now that the connection between the CMP modules and the I/O devices has been reestablished, you can re-enable the FB DIMMs that were temporarily disabled in Temporarily Disable All Memory Modules on page 168.
Do one of the following:
- Install the memory modules, if you removed them.
- Re-enable all of the FB DIMMs that you previously disabled, using the enablecomponent command:
sc> enablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D1
sc> enablecomponent /SYS/MB/MEMx/CMPx/BR0/CH0/D2
sc> enablecomponent /SYS/MB/MEMx/CMPx/BR1/CH1/D3
where x is the CMP memory module to be enabled.
The following example shows how to enable all the FB DIMMs on MEM1:
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D2
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH0/D3
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0/CH1/D1
sc> enablecomponent /SYS/MB/MEM1/CMP1/BR0
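Typing twelve enablecomponent commands per memory module is error-prone, so the list can be generated instead. This is a hedged sketch only: the helper names `fbdimm_targets` and `enable_commands` are hypothetical, the `/` separators in the target names are an assumption (the extracted text shows the names with spaces), and the branch, channel, and DIMM ranges are taken from the MEM1 listing above (BR0 to BR1, CH0 to CH1, D1 to D3).

```python
from itertools import product
from typing import List


def fbdimm_targets(module: int) -> List[str]:
    """List the FB-DIMM target names on memory module MEM<module>.

    Assumes the MEMx/CMPx indices are equal, as in the listing above.
    """
    return [
        f"/SYS/MB/MEM{module}/CMP{module}/BR{branch}/CH{channel}/D{dimm}"
        for branch, channel, dimm in product((0, 1), (0, 1), (1, 2, 3))
    ]


def enable_commands(module: int) -> List[str]:
    """Build one enablecomponent command line per FB-DIMM target."""
    return [f"enablecomponent {target}" for target in fbdimm_targets(module)]


if __name__ == "__main__":
    # Print the commands for MEM1 so they can be reviewed before use.
    for command in enable_commands(1):
        print(command)
```

The output is meant to be reviewed and pasted at the sc> prompt by hand; the sketch does not talk to the service processor.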
55. ST are distinguished from other kinds of faults by the text Forced fail. No UUID number is reported. Refer to EXAMPLE Fault Detected by POST on page 53. If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps.
2. Use the component_state property of the component to clear the fault and remove the component from the ASR blacklist. Use the FRU name that was reported in the fault in Step 1.
-> set /SYS/MB/CPU0/CMP0/BR1/CH0/D0 component_state=Enabled
The fault is cleared and should not show up when you run the show faulty command. Additionally, the Service Required LED is no longer on.
3. Reset the server. You must reboot the server for the component_state property to take effect.
4. At the Oracle ILOM prompt, use the show faulty command to verify that no faults are reported.
-> show faulty
Target | Property | Value
EXAMPLE Fault Detected by POST
-> show faulty
Target | Property | Value
/SP/faultmgmt/0 | fru | /SYS/MB/CPU0/CMP0/BR1/CH0/D0
/SP/faultmgmt/0 | timestamp | Dec 21 16:40:56
/SP/faultmgmt/0/faults/0 | timestamp | Dec 21 16:40:56
/SP/faultmgmt/0/faults/0 | sp_detected_fault | /SYS/MB/CPU0/CMP0/BR1/CH0/D0 Forced fail (POST)
Clear Faults Detected by PSH
When the Oracle Solaris PSH facility detects faults, the faults are logged and disp
56. Solaris software displays the fault, logs it, and passes information to Oracle ILOM, where it is logged. Depending on the fault, one or more LEDs might be illuminated.
See TABLE Diagnostic Flowchart Actions on page 13 and Parameter on page 28 for an approach for using the server diagnostics to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting. So you might perform some actions and not others.
Before referring to the flowchart, perform some basic troubleshooting tasks:
- Verify that the server was installed properly.
- Visually inspect cables and power.
- (Optional) Perform a reset of the server.
Related Information
- Diagnostic Flowchart on page 11
- Sun SPARC Enterprise T5440 Server Installation and Setup Guide
- Sun SPARC Enterprise T5440 Server Administration Guide
Diagnostic Flowchart
The following diagnostics are available to troubleshoot faulty hardware. See Change POST Parameters on page 29 for more information about each diagnostic in this chapter.
FIGURE Diagnostic Flowchart [flowchart not reproduced; legible box text: Faulty hardware suspected. 1. Are the Power OK and AC OK LEDs off? Check the power source and connections.] 2 Are any faults reported by the ILOM displays a fault the fault an en
57. Sun SPARC Enterprise T5440 Server Service Manual
Part No. E22634-01, June 2011, Revision A
Copyright 2008, 2011, Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related software documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are commercial computer software or commercial technical data pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification
58. V standby power supply rail 2 A AC Present power supply LED 13 97 adding CMP memory module 108 FB DIMMs 115 PCIe card 100 addresses device and system configuration 164 advanced ECC technology 22 Advanced Lights Out Management ALOM CMT connecting to 23 airflow blocked 14 antistatic wrist strap 65 ASR blacklist 55 56 asrkeys system components 25 Automatic System Recovery ASR 55 B battery installing 134 removing 133 blacklist ASR 55 bootmode command 59 break command 58 C cfgadm command 80 82 chassis dimensions 1 serial number 66 clearfault command 58 clearing POST detected faults 51 clearing PSH detected faults 53 CMP module disabling to run system in degraded state 168 failure recovery 165 fault recovery 163 168 I O devices connected to 164 CMP memory module 107 adding 108 device identifiers 109 installing 107 removing 106 supported configurations 110 CMP0 failure mode 164 CMP1 failure mode 164 command cfgadm 80 82 disablecomponent 56 fmdump 48 iostat E 83 removefru 59 setlocator 4 7 60 70 show faulty 34 114 showfaults 60 showfru 26 60 component state ILOM component property 52 components disabled automatically by POST 55 disabling using disablecomponent command 56 displaying state of 55 displaying using showcomponent command 25 configuration device addresses 164 connecting to ALOM CMT 23 console command 30 58 114 187
59. VCORE has been margined up to the value specified in the default scr file that was previously configured Example start SYS To initiate the power on sequence manually use a pen or pencil to press the Power button on the front panel See Front Panel Diagram on page 3 for Power button location Note If you are powering on the server following an emergency shutdown triggered by the top cover interlock switch you must use the poweron command Returning the Server to Operation 161 162 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Performing Node Reconfiguration If a CMP memory module pair develops a fault the server can be reconfigured to run in a degraded state until the CMP memory module is replaced In addition you can add CMP memory module pairs to existing systems However adding or removing CMP memory modules might affect internal hardware device addresses as well as the device address of any external devices attached to the system such as external disk arrays and devices attached via an External I O Expansion Unit Depending on which CMP memory module is added or removed it might be necessary to manually reassign one or more I O devices before they can function correctly in the new system configuration Topic Links Learn about how CMP memory modules map to I O devices Learn how to reconfigure the server to temporarily bypass a failed CMP memory module Disabl
60. YS/MB/CPU0/CMP0/BR1/CH0/D0
/SP/faultmgmt/0 | timestamp | Dec 21 16:40:56
/SP/faultmgmt/0/faults/0 | timestamp | Dec 21 16:40:56
/SP/faultmgmt/0/faults/0 | sp_detected_fault | /SYS/MB/CPU0/CMP0/BR1/CH0/D0 Forced fail (POST)
If the fault is still displayed by the show faulty command, then run the set command to enable the FB DIMM and clear the fault. Example:
-> set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 component_state=Enabled
3. Perform the following steps to verify the repair:
a. Set the virtual keyswitch to diag so that POST will run in Service mode.
-> set /SYS keyswitch_state=Diag
Set keyswitch_state to Diag
b. Power cycle the system.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
Note: The server takes about one minute to power off. Use the show /HOST command to determine when the host has been powered off. The console will display status Powered Off.
c. Switch to the system console to view POST output.
-> start /SYS/console
Watch the POST output for possible fault messages. The following output is a sign that POST did not detect any faults:
0> INFO
0> POST Passed all devices
0> POST Return to VBSC
0> Master set ACK for vbsc runpost command and spin
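When the console output is captured to a file, the success marker quoted above can be searched for instead of read by eye. This is a minimal sketch under stated assumptions: the function name `post_passed` is hypothetical, and it relies only on the phrase "POST Passed all devices" shown in the sample output, not on the exact prefix format of each console line.

```python
def post_passed(console_output: str) -> bool:
    """Return True if any captured console line reports a clean POST run."""
    marker = "POST Passed all devices"
    return any(marker in line for line in console_output.splitlines())
```

A missing marker does not by itself identify a fault; it only means the captured text should be reviewed for POST error messages.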
61. ackplane 140 about 2 installing 141 removing 140 hard drive LEDs 86 help command 58 host ID stored on SCC module 2 hot pluggable devices 77 hot plugging hard drive 79 81 hard drive situations inhibiting 79 hot swappable devices 78 hot swapping fan tray 87 88 power supply 92 l I O connections to CMP module 164 I O fabric in 2 processor configuration 172 in 4 processor configuration 173 I O subsystem 19 20 55 IDPROM installing 132 removing 131 ILOM commands show 26 show faulty 35 46 60 114 ILOM system event log 13 ILOM See Integrated Lights Out Management ILOM indicators 33 infrastructure boards about 1 infrastructure boards about See also power distribution board power supply backplane hard drive backplane front I O board front control panel installing 107 battery 134 CMP memory module 107 DVD ROM drive 127 fan tray 88 90 fan tray carriage 138 FB DIMMs 111 flex cable assembly 150 front bezel 125 front control panel 153 Index 189 front I O board 156 hard drive 81 84 hard drive backplane 141 IDPROM 132 motherboard 146 PCIe card 99 power distribution board 136 power supply 93 96 service processor 130 top cover 158 Integrated Lights Out Manager and fault detection in External I O Expansion Unit 16 iostat Ecommand 83 L latch power supply 93 95 slide rail 70 LED AC Present power supply LED 13 97 DC OK power supply LED 97 Fa
62. agement Port Connector Pinouts The network management connector labeled NET MGT is an RJ 45 connector located on the motherboard and can be accessed from the back panel This port needs to be configured prior to use Qe e e NET MGT 8 Pin Signal Description Pin Signal Description 1 Transmit Data 5 Common Mode Termination 2 Transmit Data 6 Receive Data 3 Receive Data 7 Common Mode Termination 4 Common Mode Termination 8 Common Mode Termination Identifying Connector Pinouts 177 178 Serial Port Connector Pinouts The serial port connector TTYA is a DB 9 connector that can be accessed from the back panel 12345 6789 Pin Signal Description Pin Signal Description 1 Data Carrier Detect 6 Data Set Ready 2 Receive Data 7 Request to Send 3 Transmit Data 8 Clear to Send 4 Data Terminal Ready 9 Ring Indicate 5 Ground Sun SPARC Enterprise T5440 Server Service Manual June 2011 USB Connector Pinouts Two Universal Serial Bus USB ports are located on the motherboard in a double stacked layout and can be accessed from the back panel Two additional USB ports are located on the front panel lt gt USB3 USB2 Pin Signal Description Pin Signal Description A1 5 V fused B1 5 V fused A2 USB0 1 B2 USB2 3 A3 USB0 1 B3 USB2 3 A4 Ground B4 Ground Identifying Connector Pinouts 179 180 Gigabit Ethernet Connector Pinouts Four RJ 45 Gigabit Et
63. and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.
64. ar Panel Diagram on page 6 m Ethernet Port LEDs on page 9 m Detecting Faults Using LEDs on page 32 8 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Ethernet Port LEDs The service processor network management port and the four 10 100 1000 Mbps Ethernet ports each have two LEDs LED Color Description Left LED Amber Speed indicator or Amber on The link is operating as a Gigabit connection green 1000 Mbps Green on The link is operating as a 100 Mbps connection Off The link is operating as a 10 Mbps connection Right LED Green Link Activity indicator e Steady on A link is established Blinking There is activity on this port Off No link is established The NET MGT port only operates in 100 Mbps or 10 Mbps so the speed indicator LED will be green or off never amber Related Information m Rear Panel Diagram on page 6 m Rear Panel LEDs on page 8 m Detecting Faults Using LEDs on page 32 Identifying Server Components 9 10 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Managing Faults These topics describe the diagnostics tools that are available for monitoring and troubleshooting the server These topics are intended for technicians service personnel and system administrators who service and repair computer systems It contains the following topics m Understanding Fault Handling Options on page 9 m Connecting to the Service
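The Ethernet Port LEDs table above amounts to a small lookup. The sketch below is illustrative only: the function name `describe_port`, the state strings, and the returned dictionary shape are hypothetical. It also assumes that the speed LED is not interpreted when the right LED reports no link, which the table does not state.

```python
from typing import Dict, Optional, Union

# Left LED (speed indicator): color shown -> link speed in Mbps.
SPEED_BY_LEFT_LED = {"amber": 1000, "green": 100, "off": 10}

# Right LED (Link/Activity indicator): state -> meaning.
STATUS_BY_RIGHT_LED = {
    "steady": "link established",
    "blinking": "activity on port",
    "off": "no link",
}


def describe_port(left_led: str, right_led: str,
                  net_mgt: bool = False) -> Dict[str, Union[str, Optional[int]]]:
    """Translate the two port LED states into link status and speed."""
    left = left_led.lower()
    right = right_led.lower()
    if net_mgt and left == "amber":
        # NET MGT operates at 100 Mbps or 10 Mbps only, so amber is unexpected.
        raise ValueError("NET MGT speed LED should be green or off, never amber")
    status = STATUS_BY_RIGHT_LED[right]
    if right == "off":
        # Assumption: without a link, the speed LED carries no information.
        return {"status": status, "speed_mbps": None}
    return {"status": status, "speed_mbps": SPEED_BY_LEFT_LED[left]}
```

For instance, a green left LED with a blinking right LED reads as a 100 Mbps link carrying traffic.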
65. ard Drive LEDs
No. | LED | Color | Notes
1 | Ready to Remove | Blue | This LED is lit to indicate that a hard drive can be removed safely during a hot-plug operation.
2 | Service Required | Amber | This LED is lit when the system is running and the hard drive is faulty.
3 | OK/Activity | Green | This LED lights when data is being read from or written to the hard drive.
The front and rear panel Service Required LEDs are also lit if the system detects a hard drive fault.
Related Information
- Hard Drive Device Identifiers on page 85
Servicing Fan Trays
This topic includes the following:
- About Fan Trays on page 87
- Remove a Fan Tray (Hot-Swap) on page 87
- Install a Fan Tray (Hot-Swap) on page 88
- Remove a Fan Tray on page 89
- Install a Fan Tray on page 90
- Fan Tray Device Identifiers on page 90
- Fan Tray Fault LED on page 91
About Fan Trays
Four fan trays are located toward the front of the server, arranged in two N+1 redundant pairs. Each fan tray contains a fan mounted in an integrated, hot-swappable CRU. If a fan tray fails, replace it as soon as possible to maintain server availability.
Caution: Hazardous moving parts. Unless the power to the server is completely shut down, the only service permitted in the fan compartment is the replacement of the fan trays by trained personnel.
Related Information
66. ard until it seats.
3. Secure the top cover by tightening the two captive screws along the rear edge.
Install the Server Into the Rack
The following procedure explains how to insert the server into the rack.
Caution: The weight of the server on extended slide rails can be enough to overturn an equipment rack. Before you begin, deploy the antitilt feature on your cabinet.
Caution: The server weighs approximately 88 lb (40 kg). Two people are required to lift and mount the server into a rack enclosure when using the procedures in this chapter.
1. Slide the inner slide assemblies out from the outer rails, about 2 inches (5 cm) from the front face of the rail's bracket. Ensure the following:
- The inner slide assemblies are locked past the internal stop.
- The ball bearing retainer is locked all the way forward.
2. Lift the server up and insert the inner rails into the inner slide assemblies. Ensure that the inner rails are horizontal when the inner rails enter the inner slide assemblies.
3. Ensure that the inner rails are engaged with the ball bearing retainers on both inner slide assemblies.
Note: If necessary, support the server with the mechanical lift while aligning the inner rails parallel to the rack-mounted inner slide assemblies.
Slide the Server Into the Rack
1. Press the inner rail release buttons on bo
67. are logs all status and error messages To view these messages click the Logs tab You can choose to view the following logs m Test Error Detailed error messages from individual tests Managing Faults 43 m Oracle VTS Test Kernel Vtsk Error Error messages pertaining to the Oracle VTS software itself Look here if the Oracle VTS software appears to be acting strangely especially when it starts up m Information Detailed versions of all the status and error messages that appear in the test messages area m Oracle Solaris OS Messages var adm messages A file containing messages generated by the operating system and various applications m Test Messages var Oracle VTS logs Oracle VTS info A directory containing the Oracle VTS log files Oracle VTS Software Packages Package Description SUNWvts Test development library APIs and Oracle VTS kernel You must install this package to run the Oracle VTS software SUNWvtsmn Man pages for the Oracle VTS utilities including the command line utility SUNWvtsr Oracle VTS framework root SUNWvtss Oracle VTS browser user interface BUI components required on the server system SUNWvtsts Oracle VTS test binaries Related Information m Diagnostic Flowchart on page 11 m Useful Oracle VTS Tests on page 45 m Sun SPARC Enterprise T5440 Server Installation and Setup Guide m Sun SPARC Enterprise T5440 Server Administration Guide 44 Sun SPARC Enterprise T
68. ata from previous errors and other related information to diagnose the problem. Once diagnosed, the Fault Manager daemon assigns the problem a Universal Unique Identifier (UUID) that distinguishes the problem across any set of systems. When possible, the Fault Manager daemon initiates steps to self-heal the failed component and take the component offline. The daemon also logs the fault to the syslogd daemon and provides a fault notification with a message ID (MSGID). You can use the message ID to get additional information about the problem from the knowledge article database.
The Predictive Self-Healing technology covers the following server components:
- UltraSPARC T2 Plus multicore processor
- Memory
- I/O subsystem
The PSH console message provides the following information about each detected fault:
- Type
- Severity
- Description
- Automated response
- Impact
- Suggested action for system administrator
Related Information
- Diagnostic Flowchart on page 11
- Identifying Faults Detected by PSH on page 47
- Sun SPARC Enterprise T5440 Server Administration Guide
Oracle VTS Overview
Sometimes a server exhibits a problem that cannot be isolated definitively to a particular hardware or software component. In such cases, it might be useful to run a diagnostic tool that stresses the system by continuously running a comprehensive battery of tests. Oracle VTS software is provided for this purpose.
Relate
69. ault 36 reasons to use 35 use in detecting faults in an External I O Expansion Unit 37 using to check for faults 13 using to diagnose FB DIMMs 112 using to verify successful FB DIMM replacement 114 showcomponent command 25 55 showenvironment command 60 showfaults command syntax 60 showfru command 26 60 showkeyswitch command 60 showlocator command 61 showlogs command 61 showplatform command 61 66 shutdown triggered by top cover removal emergency shutdown 158 using Power button emergency shutdown 5 using Power button graceful shutdown 5 using powercycle command graceful shutdown 59 using powercycle f command emergency shutdown 59 using poweroff command 59 slide rail latch 70 Solaris log files 13 Solaris log files as diagnostic tool 13 Solaris OS checking log files for fault information 13 collecting diagnostic information from 37 message buffer checking 38 message log files viewing 38 Solaris Predictive Self Healing 19 SunVTS 20 as fault diagnosis tool 13 browser environment 42 Component Stress parameter 43 exercising the system with 40 System Excerciser 43 tests 45 user interfaces 40 42 43 44 45 using for fault diagnosis 13 verifying installation 41 syslogd daemon 38 system console 24 system console switching to 24 system controller 10 T tools required for service 66 Top system LED about 5 top cover and emergency shutdown 158 installing 158 troubl
70. back Test Port 3 2007 12 19 22 01 22 556 0 0 0 gt INFO STATUS Running BMAC level Loopback Test 2007 12 19 22 01 32 004 0 0 0 gt End Neptune 1G Loopback Test Port 3 Enter 4 to return to ALOM 2007 12 19 22 01 27 271 0 0 0 gt 2007 12 19 22 01 32 012 0 0 0 gt INFO 2007 12 19 22 01 32 019 0 0 0 gt POST Passed all devices 2007 12 19 22 01 27 274 0 0 0 gt INFO STATUS Running RGMII 1G BCM5466R PHY level Loopback Test 2007 12 19 22 01 32 036 0 0 0 gt Master set ACK for vbsc runpost command and spin T5440 No Keyboard OpenBoot 7968 MB memory available Serial 475916434 stacie obp 0 0 ok 2007 12 19 22 01 32 028 0 0 0 gt POST Return to VBSC Ethernet address 0 14 4f 86 64 92 Host ID xxxxx Managing Faults 31 Detecting Faults This section describes the different methods you can use to identify system faults in the server Task Topic Use front panel and back panel LEDs to Detecting Faults Using LEDs on page 32 U U U U U identify system faults se the Oracle ILOM show faulty command to detect faults se Oracle Solaris OS files and commands to detect faults se the Oracle ILOM event log to detect faults se POST to identify faults se Predictive Self Healing PSH to identify faults Detecting Faults Oracle ILOM show faulty Command on page 34 Detecting Faults Oracle Solaris OS Files and Commands on page 37
71. been diagnosed but there is a problem with the system configuration 7 Determine if the Problems detected in the External I O Expansion Detecting Faults Oracle fault was detected Unit include the text string Ext FRU or Ext ILOM show faulty in the External I O Sensor at the beginning of the fault description Command on page 34 Expansion Unit Clear Faults Detected in the External I O Expansion Unit on page 54 14 Sun SPARC Enterprise T5440 Server Service Manual June 2011 TABLE Diagnostic Flowchart Actions Continued Action No Diagnostic Action Resulting Action For more information 8 Determine if the If the fault displayed included a uuid and Identifying Faults fault was detected sunw msg id property the fault was detected by the Detected by PSH on by PSH Predictive Self Healing software page 47 If the fault is a PSH detected fault refer to the PSH Knowledge Article web site for additional Clear Faults Detected by information The Knowledge Article for the fault is PSH on page 53 located at the following link http www sun com msg message ID where message ID is the value of the sunw msg id property displayed by the show faulty command After the FRU is replaced perform the procedure to clear PSH detected faults 9 Determine if the POST performs basic tests of the server components POST Fault Management fault was detected and reports faulty FRUs When POST detects a Overview on page 20 by POST
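Actions 7 through 9 describe how to tell which subsystem reported a fault from the show faulty output. The sketch below encodes that decision order. It is a hedged illustration: the function name `fault_source` and its inputs are hypothetical, the property names are normalized because the extracted text shows them with spaces (the hyphenated form `sunw-msg-id` is an assumption), and the "Forced fail" marker for POST comes from the POST fault-clearing procedure elsewhere in this manual.

```python
from typing import Iterable


def fault_source(property_names: Iterable[str], description: str = "") -> str:
    """Guess which subsystem detected a fault shown by 'show faulty'.

    property_names: property names listed for the fault entry.
    description: the fault description or detected-fault text, if any.
    """
    # Normalize "sunw msg id", "sunw_msg_id", and "sunw-msg-id" to one form.
    names = {
        name.strip().lower().replace("_", "-").replace(" ", "-")
        for name in property_names
    }
    text = description.strip().lower()

    # Action 7: External I/O Expansion Unit faults begin with Ext FRU or Ext Sensor.
    if text.startswith(("ext fru", "ext sensor")):
        return "External I/O Expansion Unit"
    # Action 8: PSH faults carry both a uuid and a sunw-msg-id property.
    if {"uuid", "sunw-msg-id"} <= names:
        return "PSH"
    # Action 9: POST faults are marked with the text "Forced fail" and have no UUID.
    if "forced fail" in text:
        return "POST"
    return "undetermined"
```

An "undetermined" result means none of the three markers was present, for example for an environmental fault.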
72. bes the tasks necessary to use Oracle VTS software to exercise your server.
Related Information
- Diagnostic Flowchart on page 11
- Verify Installation of Oracle VTS Software on page 41
- Start the Oracle VTS Browser Environment on page 42
- Oracle VTS Software Packages on page 44
- Useful Oracle VTS Tests on page 45
Verify Installation of Oracle VTS Software
To perform this procedure, the Oracle Solaris OS must be running on the server, and you must have access to the Oracle Solaris command line.
Note: The Oracle VTS 7.0 software, and future compatible versions, are supported on the server.
The Oracle VTS installation process requires that you specify one of two security schemes to use when running Oracle VTS. The security scheme you choose must be properly configured in the Oracle Solaris OS for you to run the Oracle VTS software. For details, refer to the Oracle VTS User's Guide.
1. Check for the presence of Oracle VTS packages using the pkginfo command.
# pkginfo -l SUNWvts SUNWvtsmn SUNWvtsr SUNWvtss SUNWvtsts
- If the Oracle VTS software is installed, information about the packages is displayed.
- If the Oracle VTS software is not installed, you see an error message for each missing package, as in EXAMPLE show Command Output on page 42.
See Oracle VTS Overview on page 20 for a list of required Oracle VTS software packages.
2. If the
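If the pkginfo output is saved to a file, the missing packages can be listed by scanning for the error lines. This is a sketch under assumptions: the helper name `missing_packages` is hypothetical, and the matched wording ("information for ... was not found", with or without quotation marks around the package name) is taken from the example error text in this manual rather than from a formal specification of pkginfo output.

```python
import re
from typing import List

# Matches: ERROR: information for "SUNWvts" was not found
# and the unquoted form: ERROR information for SUNWvts was not found
_NOT_FOUND = re.compile(r'information for "?([A-Za-z0-9_.-]+)"? was not found')


def missing_packages(pkginfo_output: str) -> List[str]:
    """Return package names that the captured pkginfo output reports as not found."""
    return _NOT_FOUND.findall(pkginfo_output)
```

An empty result only means no such error line was captured; it does not confirm that every required package is installed.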
73. ble assembly The flex cable assembly serves as the interconnect between the power supply backplane motherboard hard drive backplane and DVD ROM drive m Power supply backplane I2C cable This cable transmits power supply status to the motherboard Related Information m Sun SPARC Enterprise T5440 Server Site Planning Guide m Managing Faults on page 9 m Servicing Customer Replaceable Units on page 77 m Servicing Field Replaceable Units on page 123 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Front Panel Diagram The server front panel contains a recessed system power button system status and fault LEDs Locator button and LED The front panel also provides access to internal hard drives the DVD ROM drive if equipped and the two front USB ports The following illustration shows front panel features on the server fron panel For a detailed description of front panel controls and LEDs see Front Panel LEDs on page 5 FIGURE Front Panel Features I ve A OH oo o ommo Figure Legend 1 Locator Button LED 5 Component Fault LEDs 2 Service Required LED 6 DVD ROM Drive 3 Power OK LED 7 USB Ports 4 Power Button 8 Hard Drives Related Information m Front Panel LEDs on page 5 Identifying Server Components 3 m Rear Panel Diagram on page 6 m Servicing the Front Bezel on page 123 4 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Front Panel LED
74. ce processor Servicing Field Replaceable Units 131 2 Place the IDPROM on an antistatic mat V Install the IDPROM Before you begin complete these tasks Read the section Safety Information on page 63 Power off the server using one of the methods described in the section Powering Off the System on page 67 Extend the Server to the Maintenance Position on page 70 Disconnect Power Cords From the Server on page 68 Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Remove the Top Cover on page 73 Remove the Service Processor on page 128 132 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Plug the IDPROM into its connector on the service processor Ensure that the service processor is oriented correctly A notch on the IDPROM corresponds to a similar notch on the connector Servicing the Battery The battery provides the power necessary to maintain system configuration parameters during power outages or while the system is being serviced stored or relocated m Remove the Battery on page 133 m Install the Battery on page 134 Related Information m Servicing the Service Processor on page 128 m Servicing the IDPROM on page 131 Remove the Battery Before you begin complete these tasks m Read the section Safety Information on page 63 m Power off the server using one of the methods described in the section Powering Off the Syst
75. cle com m Training https education oracle com xii Sun SPARC Enterprise T5440 Server Service Manual June 2011 Identifying Server Components These topics provide an overview of the server including major boards and components as well as front and rear panel features For a more comprehensive overview of the server performance features and specifications see the Sun SPARC Enterprise T5440 Seroer Overview Guide Description Links Review the infrastructure boards and cables Infrastructure Boards and Cables on in the server page 1 Review the front panel features Front Panel Diagram on page 3 Front Panel LEDs on page 5 Review the rear panel features Rear Panel Diagram on page 6 Rear Panel LEDs on page 8 Ethernet Port LEDs on page 9 Related Information m Server Components on page 181 Infrastructure Boards and Cables The server is based on a 4U chassis and has the following boards installed m Motherboard The motherboard includes slots for up to four CMP modules and four memory modules memory control subsystem up to eight PCIe expansion slots and a service processor slot The motherboard also contains a top cover safety interlock kill switch Note 10 Gbit Ethernet XAUI cards are shared in Slots 4 and 5 m CMP module Each CMP module contains an UltraSPARC T2 Plus chip slots for four FB DIMMs and associated DC DC converters m Memory module A me
76. consolehistory command 58 D DC OK power supply LED 97 device identifiers CMP memory modules 109 fan tray 90 FB DIMMs 119 hard drive 85 PCIe card 101 power supply 97 diag level parameter 28 61 diag mode parameter 28 61 diag trigger parameter 28 61 diag verbosity parameter 28 61 diagnostics about 10 flowchart 12 low level 20 running remotely 16 using SunVTS 20 disablecomponent command 56 displaying FRU status 26 dmesg command 38 DVD ROM drive installing 127 removing 126 E ejector tabs FB DIMM 111 electrostatic discharge ESD preventing using an antistatic mat 65 preventing using an antistatic wrist strap 65 safety measures 65 emergency shutdown 68 using Power button 5 enablecomponent command 52 environmental faults 13 14 17 35 event log checking the PSH 48 EVENT ID FRU 48 exercising the system with SunVTS 40 External I O Expansion Unit fault detected by show faulty command 37 faults detection in 16 F Fan Fault system LED interpreting to diagnose faults 33 fan tray 89 determining fault state 33 device identifiers 90 Fault LED 33 installing 88 90 removing 87 89 fan tray carriage installing 138 removing 137 fan tray LEDs about 91 using to identify faults 33 fan trays about 86 Fault hard drive LED 33 Fault power supply LED 92 98 fault manager daemon md 1M 19 fault records 53 fault recovery CMP module 163 I O device 166
77. d Steady on Indicates that a temperature failure event has been acknowledged and a service action is required Related Information m Front Panel Diagram on page 3 m Rear Panel LEDs on page 8 m Detecting Faults Using LEDs on page 32 Rear Panel Diagram The rear panel provides access to system I O ports PCIe ports Gigabit Ethernet ports power supplies Locator button and LED and system status LEDs FIGURE Rear Panel Features on page 7 shows rear panel features on the SPARC Enterprise T5440 server For more detailed information about ports and their uses see the Sun SPARC Enterprise T5440 Server Installation and Setup Guide For a detailed description of PCIe slots see PCIe Device Identifiers on page 101 6 Sun SPARC Enterprise T5440 Server Service Manual June 2011 FIGURE Rear Panel Features Figure Legend o0 BR ON Power supplies Serial port Serial management port System status LEDs USB ports Network management port Gigabit ethernet ports Related Information m Front Panel Diagram on page 3 m Rear Panel LEDs on page 8 m Ethernet Port LEDs on page 9 m Detecting Faults Using LEDs on page 32 Identifying Server Components 7 Rear Panel LEDs LED Icon Description Locator LED NK The Locator LED enables you to find a particular system The LED is and button activated using one of the following methods white The ALOM CMT command
78. ...hard drive faults are usually captured by the Oracle Solaris message files. Use the dmesg command to view the most recent system message. To view the system messages log file, view the contents of the /var/adm/messages file. Related Information: Diagnostic Flowchart on page 11; Detecting Faults Using LEDs on page 32; Oracle ILOM to ALOM CMT Command Reference on page 57; Sun SPARC Enterprise T5440 Server Installation and Setup Guide; Sun SPARC Enterprise T5440 Server Administration Guide; Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server.
Check the Message Buffer. 1. Log in as superuser. 2. Issue the dmesg command:
    # dmesg
The dmesg command displays the most recent messages generated by the system.
View System Message Log Files. The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems, such as a device that is about to fail. The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every week), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.1. Over a period of time, the messages are further rotated to messages.2 and messages.3, and then deleted. 1. Log in as superuser.
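The rotation scheme described above can be walked programmatically when a message of interest may already have been rotated out of the current file. The following Python sketch is an illustration only. It assumes the rotated files sit next to the messages file and are named messages.1, messages.2, and so on; the function names message_files and find_lines are hypothetical helpers, not Solaris utilities.

```python
from pathlib import Path


def message_files(log_dir="/var/adm"):
    """Return the messages file and its rotated copies, newest first."""
    directory = Path(log_dir)
    current = directory / "messages"
    rotated = []
    for path in directory.glob("messages.*"):
        suffix = path.name.split(".", 1)[1]
        # Keep only numeric rotation suffixes such as messages.1.
        if suffix.isdigit():
            rotated.append((int(suffix), path))
    rotated.sort()  # lower numbers are more recent
    files = [current] if current.exists() else []
    files.extend(path for _, path in rotated)
    return files


def find_lines(keyword, log_dir="/var/adm"):
    """Yield (file name, line) pairs for every line containing keyword."""
    for path in message_files(log_dir):
        with path.open(errors="replace") as handle:
            for line in handle:
                if keyword in line:
                    yield path.name, line.rstrip("\n")
```

Reading the files under /var/adm typically requires the same superuser access as the manual procedure. Passing a different log_dir lets the helper run against a copy of the logs taken from the server.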
79. Related Information: Diagnostic Flowchart on page 11; Oracle VTS Software Packages on page 44; Useful Oracle VTS Tests on page 45; Sun SPARC Enterprise T5440 Server Administration Guide.
POST Fault Management Overview. Power-on self-test (POST) is a group of PROM-based tests that run when the server is powered on or reset. POST checks the basic integrity of the critical hardware components in the server (CMP, memory, and I/O subsystem). POST tests critical hardware components to verify functionality before the system boots and accesses software. If POST detects a faulty component, the component is disabled automatically, preventing faulty hardware from potentially harming any software. If the system is capable of running without the disabled component, the system will boot when POST is complete. For example, if one of the processor cores is deemed faulty by POST, the core will be disabled. The system will boot and run using the remaining cores. You can use POST as an initial diagnostic tool for the system hardware. In this case, configure POST to run in maximum mode (diag_mode=service, setkeyswitch=diag, diag_level=max) for thorough test coverage and verbose output.
POST Fault Management Flowchart. FIGURE: Flowchart of Variables for POST Configuration (figure not reproduced; it relates the diag mode and diag trigger variables to user reset, power-on reset, and error reset).
80. detected by Oracle Solaris PSH You can also configure the ALOM CMT compatibility shell to display Oracle Solaris PSH alerts See the Oracle Integrated Lights Out Manager 3 0 Concepts Guide The following example depicts an ALOM CMT alert of the same fault reported by Oracle Solaris PSH in EXAMPLE Console Message Showing Fault Detected by PSH on page 47 Managing Faults 47 EXAMPLE ALOM CMT Alert of PSH Diagnosed Fault SC Alert Host detected fault MSGID SUNAV 8000 DX The Oracle ILOM show faulty command provides summary information about the fault See Detect Faults Oracle ILOM show faulty Command on page 35 for more information about the show faulty command Note The Service Required LED is also turned on for PSH diagnosed faults Related Information m Diagnostic Flowchart on page 11 m Predictive Self Healing Overview on page 19 m Oracle ILOM to ALOM CMT Command Reference on page 57 m Sun SPARC Enterprise T5440 Server Administration Guide m Oracle Integrated Lights Out Manager 3 0 Supplement for the Sun SPARC Enterprise 15440 Server V Detect Faults Identified by the Oracle Solaris PSH Facility Oracle ILOM mdump Command The Oracle ILOM mdump command displays the list of faults detected by the Oracle Solaris PSH facility and identifies the faulty FRU for a particular EVENT ID UUID Note Do not use mdump to verify that a FRU replacement has cleared a fault because the output
81. displays n lines from the beginning of the buffer v displays the entire buffer boot run specifies the log to display run is the default log Displays a list of all available commands with syntax and descriptions Specifying a command name as an option displays help for that command Takes the host server from the OS to either kmdb or OpenBoot PROM equivalent to a Stop A depending on the mode Oracle Solaris software was booted Manually clears host detected faults The LIUID is the unique fault ID of the fault to be cleared Connects you to the host system Displays the contents of the system s console buffer 58 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Oracle ILOM Command ALOM CMT Command Description set HOST bootmode value normal re set_nvram bootscript string stop SYS start SYS stop SYS start SYS Set SYS PSx prepare to remove acti on true reset SYS reset SP bootmode value normal reset_nvram bootscript string powercycle The f option forces an immediate poweroff Otherwise the command attempts a graceful shutdown poweroff y y enables you to skip the confirmation question e f forces an immediate shutdown poweron c c executes a console command after completion of the poweron command removefru PSO PS1 reset y c y enables you to skip the confirmation question cexecu
82. ...module or move an existing module over from a different slot until the failed CMP is replaced. You are upgrading from a 2P to a 4P system. Related Information: Managing Faults on page 9; I/O Connections to CMP Memory Modules on page 164; System Bus Topology on page 171; I/O Fabric in 2P Configuration on page 172; I/O Fabric in 4P Configuration on page 173; Temporarily Disable All Memory Modules on page 168; Reconfigure the I/O and PCIe Fabric on page 167; Re-Enable All Memory Modules on page 169; Reset the LDoms Guest Configuration on page 170.
Reconfigure the I/O and PCIe Fabric. The reconfig.pl script reconfigures the PCIe fabric to reconnect the PCIe slots and onboard devices to the CMP nodes as efficiently as possible. The reconfig.pl script also reconfigures the Solaris device names to match the new connections between the CMP modules and the PCIe devices and slots. Use the reconfig.pl script to reattach each PCIe slot and onboard device to its nearest available CMP module. To use the reconfig.pl script, you must have the following: a Solaris OS JumpStart server; a net install image; the reconfig.pl script. Do the following: 1. Download the reconfig.pl script. The reconfig.pl script is included in Patch ID 10264587. 2. Copy the reconfig.pl script to the root directory of the miniroot of the netinstall image. This is the Solaris
83. e T5440 Server Service Manual June 2011 Servicing Field Replaceable Units These topics describe how to service field replaceable units FRUs in the server Note The procedures in this chapter must be performed by a qualified service technician Topic Remove and install field replaceable components Exploded views of FRUs Links Servicing the Front Bezel on page 123 Servicing the DVD ROM Drive on page 126 Servicing the Service Processor on page 128 Servicing the IDPROM on page 131 Servicing the Battery on page 133 Servicing the Power Distribution Board on page 134 Servicing the Fan Tray Carriage on page 137 Servicing the Hard Drive Backplane on page 139 Servicing the Motherboard on page 143 Servicing the Flex Cable Assembly on page 148 Servicing the Front Control Panel on page 152 Servicing the Front I O Board on page 154 Field Replaceable Units on page 184 Servicing the Front Bezel You must remove the front bezel in order to service the DVD ROM drive m Remove the Front Bezel on page 124 m Install the Front Bezel on page 125 123 Related Information m Servicing the DVD ROM Drive on page 126 WV Remove the Front Bezel Before you begin complete these tasks m Read the section Safety Information on page 63 m If you are performing additional service procedures power off the server using one of the methods described in the section Power
84. e memory modules Reconfigure I O and PCIe fabric Re enable memory modules to work in a new I O and PCIe configuration Reset logical domain guest configuration Reference for system bus topology Reference for I O fabric in supported configurations T O Connections to CMP Memory Modules on page 164 Reconfiguring I O Device Nodes on page 166 Temporarily Disable All Memory Modules on page 168 Reconfigure the I O and PCIe Fabric on page 167 Re Enable All Memory Modules on page 169 Reset the LDoms Guest Configuration on page 170 System Bus Topology on page 171 I O Fabric in 2P Configuration on page 172 T O Fabric in 4P Configuration on page 173 Related Information m Managing Faults on page 9 m Servicing PCIe Cards on page 98 163 m Servicing CMP Memory Modules on page 104 m Servicing FB DIMMs on page 110 I O Connections to CMP Memory Modules Each PCIe slot and onboard I O device is connected to one CMP module Device address is dependent on system configuration See CMP Number on page 172 and CMP Number on page 173 for more information If a CMP module fails the onboard devices and slots directly connected to it become unavailable Recovery of the I O services connected to the failed CMP requires I O node reconfiguration For example in a 4P system if CMPO goes offline the following devices become unavailable m PCIe0 m PClel m O
85. ...the system up and running even when there is a faulty component. See Oracle ILOM Overview on page 16.
Power-on self-test (POST): POST performs diagnostics on system components upon system reset to ensure the integrity of those components. POST is configurable and works with Oracle ILOM to take faulty components offline if needed. See POST Fault Management Overview on page 20.
Oracle Solaris OS Predictive Self-Healing (PSH): This technology continuously monitors the health of the processor and memory, and works with Oracle ILOM to take a faulty component offline if needed. The Predictive Self-Healing technology enables systems to accurately predict component failures and mitigate many serious problems before they occur. See Identifying Faults Detected by PSH on page 47.
Log files and console messages: Oracle Solaris OS log files and the Oracle ILOM system event log can be accessed and displayed on the device of your choice. For more information, see Detecting Faults (Oracle Solaris OS Files and Commands) on page 37 and Detecting Faults (Oracle ILOM Event Log) on page 39.
Oracle VTS software: The Oracle VTS software exercises the system, provides hardware validation, and discloses possible faulty components with recommendations for repair. See About Oracle VTS Software on page 40.
The LEDs, Oracle ILOM, Oracle Solaris OS PSH, and many of the log files and console messages are integrated. For example, a fault detected by the Oracle
86. ...cleared from Oracle ILOM show faulty after the problem has been repaired. Note: After the problem has been repaired, resetting the service processor also clears the fault from the Oracle ILOM show faulty command output. The example below shows a problem detected in the External I/O Expansion Unit:
    -> show faulty
    Target | Property | Value
    --------------------------+-------------------+--------------------------------
    /SP/faultmgmt/0 | fru | /SYS/IOX@X0TC/IOB1/LINK
    /SP/faultmgmt/0 | timestamp | Feb 05 18:28:20
    /SP/faultmgmt/0/faults/0 | timestamp | Feb 05 18:28:20
    /SP/faultmgmt/0/faults/0 | sp_detected_fault | Ext FRU /SYS/IOX@X0TC/IOB1/LINK SIGCON=0 I2C no device response
After the problem is repaired, use the Oracle ILOM set clear_fault_action command to clear a fault in the External I/O Expansion Unit:
    -> set clear_fault_action=true /SYS/IOX@X0TC/IOB1/LINK
    Are you sure you want to clear /SYS/IOX@X0TC/IOB1/LINK (y/n)? y
    Set 'clear_fault_action' to 'true'
Disabling Faulty Components. This topic contains the following: Disabling Faulty Components Using Automatic System Recovery on page 55; Disable System Components on page 56; Re-Enable System Components on page 56.
Disabling Faulty Components Using Automatic System Recovery. You can use the Automatic System Recovery (ASR) feature to configure the server to automatically disable failed co
87. ecause the fault is cleared 6 Run the set command gt set SYS MB CPU0 CMP0 BRO CH1 D0 clear fault action True Are you sure you want to clear SYS MB CPUO CMPO BRO CH1 DO y n y Set clear fault action to true 114 Sun SPARC Enterprise T5440 Server Service Manual June 2011 V Add FB DIMMs If you are upgrading the system with additional FB DIMMs use this procedure Before you begin complete these tasks 5 Read the section Safety Information on page 63 Read the sections FB DIMM Configuration on page 116 and FB DIMM Device Identifiers on page 119 Power off the server using one of the methods described in the section Powering Off the System on page 67 Extend the Server to the Maintenance Position on page 70 Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Remove the Top Cover on page 73 Remove a CMP Memory Module on page 106 Unpackage the FB DIMMs and place them on an antistatic mat Ensure that the ejector tabs are in the open position Line up the FB DIMM with the connector Align the FB DIMM notch with the key in the connector This ensures that the FB DIMM is oriented correctly Push the FB DIMM into the connector until the ejector tabs lock the FB DIMM in place If the FB DIMM does not easily seat into the connector verify that the orientation of the FB DIMM is correct If the orientation is reversed damage to the FB DIMM migh
88. em on page 67 m Extend the Server to the Maintenance Position on page 70 m Disconnect Power Cords From the Server on page 68 m Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 m Remove the Top Cover on page 73 m Remove the Service Processor on page 128 1 Release the latch securing the battery to its holder on the service processor board 2 Lift the battery up and off the board Servicing Field Replaceable Units 133 V Install the Battery 1 Place the battery into its holder on the service processor board Ensure that the battery is oriented correctly 2 Press the battery firmly until it snaps into place Next Steps m Install the Service Processor on page 130 m Install the Top Cover on page 158 m Slide the Server Into the Rack on page 159 m Connect the Power Cords to the Server on page 161 m Power On the Server on page 161 Servicing the Power Distribution Board Main 12V power is connected to the motherboard through a bus bar Standby power and other control signals are routed through the flex cable circuit to the motherboard m Remove the Power Distribution Board on page 134 m Install the Power Distribution Board on page 136 Related Information m Safety Information on page 63 m Servicing Power Supplies on page 91 V Remove the Power Distribution Board Before you begin complete these tasks m Read the section Safety Information on page 6
89. emove the Top Cover on page 73 Do the following 1 2 Identify the module you want to remove Rotate the ejector levers up and away from the module UD D D D D D AS ha l TN d if N 4 LA JT FL UI n H ad s MIN 3 4 Slide the module up and out of the system Place the module on an antistatic mat Sun SPARC Enterprise T5440 Server Service Manual June 2011 V Install CMP Memory Module Note If you are replacing a faulty CMP or memory module you must transfer the FB DIMMs on the faulty module to the replacement module Replacement CMP memory modules do not include FB DIMMs For more information about installing FB DIMMs see Servicing FB DIMMs on page 110 1 Identify the correct slot for installation 2 Slide the module down into its slot My mpm TR eee 3 Rotate the ejector levers down to secure the module into place Next Steps m Install the Top Cover on page 158 m Slide the Server Into the Rack on page 159 m Power On the Server on page 161 WV Add a CMP Memory Module Before you begin complete these tasks m Read the section Safety Information on page 63 Servicing Customer Replaceable Units 107 Power off the server using one of the methods described in the section Powering Off the System on page 67 Extend the Server to the Maintenance Position on page 70 Perform Electrostatic Discharge
90. ...fail-safe, backup, redundancy, and other measures necessary to use it under optimal safety conditions. Oracle Corporation and its affiliates disclaim any liability for damages caused by the use of this software or hardware for this type of application. Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates. Other names mentioned may be trademarks of owners other than Oracle. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd. This software or hardware and its accompanying documentation may provide information on, or links giving access to, content, products, and services from third parties. Oracle Corporation and its affiliates disclaim all liability and any express warranty regarding third-party content, products, or services. In no event shall Oracle Corporation and its affiliates be held liable for losses incurred, costs incurred, or damages caused by access to third-party content, products, or services, or by their use.
91. eshooting AC OK LED state 13 actions 13 by checking Solaris OS log files 13 CMP0 failure 164 CMP1 failure 164 FB DIMMs 23 Power OK LED state 13 192 Sun SPARC Enterprise T5440 Server Service Manual June 2011 using LEDs 32 using POST 14 15 using SunVTS 13 using the show faulty command 13 U UltraSPARC T2 multicore processor 19 Universal Unique Identifier UUID 19 48 USB ports front 3 pinouts 179 V virtual keyswitch 29 113 X XAUI card about 1 configuration guidelines See PCIe configuration guidelines installing See PCIe card installing Index 193 194 Sun SPARC Enterprise T5440 Server Service Manual June 2011
92. If the status of a component is changed, there is no effect to the system until the next reset or power cycle. Related Information: Diagnostic Flowchart on page 11; Detecting Faults on page 32; Sun SPARC Enterprise T5440 Server Administration Guide.
Disable System Components. The component_state property disables a component by adding it to the ASR blacklist. 1. At the -> prompt, set the component_state property to Disabled:
    -> set /SYS/MB/CPU0/CMP0/BR1/CH0/D0 component_state=Disabled
2. Reset the server so that the ASR command takes effect:
    -> stop /SYS
    Are you sure you want to stop /SYS (y/n)? y
    Stopping /SYS
    -> start /SYS
    Are you sure you want to start /SYS (y/n)? y
    Starting /SYS
Note: In the Oracle ILOM shell, there is no notification when the system is actually powered off. Powering off takes about a minute. Use the show /HOST command to determine if the host has powered off.
Re-Enable System Components. The component_state property enables a component by removing it from the ASR blacklist. 1. At the -> prompt, set the component_state property to Enabled:
    -> set /SYS/MB/CPU0/CMP0/BR1/CH0/D0 component_state=Enabled
2. Reset the server so that the ASR command takes effect:
    -> stop /SYS
    Are you sure you want to stop /SYS (y/n)? y
    Stopping /SYS
    -> start /SYS
    Are yo
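Both procedures follow the same pattern: set component_state on the target, then stop and start the host so that the change takes effect. A minimal Python sketch of that sequence follows. It only builds the command strings and does not talk to a service processor. The function name component_state_commands is hypothetical, and the set syntax (target followed by component_state=value) is an assumption reconstructed from the steps above.

```python
def component_state_commands(target, enable):
    """Return the ILOM shell commands, in order, for one state change.

    `target` is a component path such as /SYS/MB/CPU0/CMP0/BR1/CH0/D0.
    `enable` selects Enabled (True) or Disabled (False).
    """
    if not target.startswith("/SYS/"):
        raise ValueError("expected a target under /SYS/: %r" % target)
    state = "Enabled" if enable else "Disabled"
    return [
        # Assumed syntax: set <target> component_state=<value>
        "set %s component_state=%s" % (target, state),
        # The change has no effect until the next reset or power cycle.
        "stop /SYS",
        "start /SYS",
    ]
```

Each stop and start command prompts for confirmation, and powering off takes about a minute, so a script that sends these strings would need to answer the prompts and wait for the host to power off before issuing the start command.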
93. faults clearing POST detected faults 51 detected by POST 13 35 37 detected by PSH 13 36 diagnosing with LEDs 32 to 34 environmental 13 14 35 environmental displayed by show faulty command 36 FB DIMM 112 forwarded to ILOM 16 recovery 17 repair 17 types of 35 FB DIMM fault button 120 FB DIMM Fault LEDs 34 FB DIMMs adding 115 degraded 168 device identifiers 119 diagnosing with fault button 120 diagnosing with show faulty command 112 disabling to run system in degraded state 168 ejector tabs 111 188 Sun SPARC Enterprise T5440 Server Service Manual June 2011 example POST error output 45 fault handling 22 installing 111 managing faults in 112 re enabling to run system in degraded state 169 removing 110 troubleshooting 23 verifying successful replacement 112 flex cable assembly installing 150 removing 149 fmadm command 53 114 fmdump command 48 front bezel installing 125 removing 124 front control panel installing 153 removing 152 front I O board installing 156 removing 155 front panel diagram 3 front panel LEDs 4 FRU event ID 48 FRU ID PROMs 16 FRU information displaying with show command 26 FRU status displaying 26 G Gigabit Ethernet ports LEDs 8 pinouts 180 H hard drive about 78 addressing 81 84 determining fault state 33 device identifiers 85 Fault LED 33 hot plugging 81 installing 81 84 Ready to Remove LED 82 removing 79 83 hard drive b
94. faulty FRU it logs the fault and if possible takes the FRU offline POST detected FRUs display the ncisar Faults Detected following text in the fault message During POST on page 51 Forced fail reason In a POST fault message reason is the name of the power on routine that detected the failure 10 Contact technical The majority of hardware faults are detected by the Obtain the Chassis Serial support server s diagnostics In rare cases a problem might Number on page 66 require additional troubleshooting If you are unable to determine the cause of the problem contact your service representative for support Related Information m Server Diagnostics Overview on page 10 m Sun SPARC Enterprise T5440 Server Administration Guide Options for Accessing the Service Processor There are three methods of interacting with the service processor m Oracle Integrated Lights Out Manager Oracle ILOM shell default Available via the System Management Port and the Network Management Port m Oracle ILOM browser interface BI Documented in the Oracle Integrated Lights Out Manager 2 0 User s Guide m ALOM CMT compatibility shell Legacy shell emulation of ALOM CMT Managing Faults 15 The code examples in this document depict use of the Oracle ILOM shell Note Multiple service processor accounts can be active concurrently A user can be logged in under one account using the Oracle ILOM shell and another account using the AL
95. he Chassis Serial Number on page 66 Obtain the Chassis Serial Number Remotely on page 66 Powering Off the System on page 67 Extending the Server to the Maintenance Position on page 69 Remove the Server From the Rack on page 71 Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Remove the Top Cover on page 73 Related Information Managing Faults on page 9 Servicing Customer Replaceable Units on page 77 Servicing Field Replaceable Units on page 123 Returning the Server to Operation on page 157 Safety Information The following topics describe important safety information that you need to know prior to removing or installing parts in the server Observing Important Safety Precautions on page 64 Safety Symbols on page 64 Electrostatic Discharge Safety Measures on page 65 63 E Observing Important Safety Precautions For your protection observe the following safety precautions when setting up your equipment m Follow all cautions and instructions marked on the equipment and described in the documentation shipped with your system m Follow all cautions and instructions marked on the equipment and described in the Sun SPARC Enterprise T5440 Server Safety and Compliance Guide m Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment s electrical rating label m Follow the electrostatic discharge safety
96. he iostat E command displays information about your system s installed devices such as manufacturer model number serial number size and system error statistics EXAMPLE Sample Ap_id Output Ap_id Type Receptacle Occupant Condition co scsi bus connected configured unknown c0 dsk di1t0dO disk connected configured unknown c0 sd1 disk connected unconfigured unknown usb0 1 unknown empty unconfigured ok usb0 2 unknown empty unconfigured ok usb0 3 unknown empty unconfigured ok usb1 1 unknown empty unconfigured ok usb1 2 unknown empty unconfigured ok usb1 3 unknown empty unconfigured ok usb2 1 unknown empty unconfigured ok usb2 2 unknown empty unconfigured ok usb2 3 unknown empty unconfigured ok usb2 4 unknown empty unconfigured ok usb2 5 unknown empty unconfigured ok usb2 6 unknown empty unconfigured ok usb2 7 unknown empty unconfigured ok usb2 8 unknown empty unconfigured ok V Remove a Hard Drive If you are removing a hard drive as a prerequisite for another service procedure follow the steps in this section Before you begin complete these tasks m Read the section Safety Information on page 63 m Power off the server using one of the methods described in the section Powering Off the System on page 67 m Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Do the following Servicing Customer Replaceable Units 83 1 Note the location of each hard drive Note
97. he motherboard 6 Remove the six No 2 Phillips screws that secure the bus bar assembly to the motherboard 7 Slide the chassis midwall panel up Note Use the clips to secure the midwall panel in the open position 8 Loosen the No 2 Phillips screws that secure the motherboard to the chassis floor See Motherboard Fastener Locations on page 147 for the fastener locations 9 Lift the motherboard up and out of the chassis Guide the flex cable connector out from under the midwall partition 10 Place the motherboard on an antistatic mat Next Steps If you are replacing a faulty motherboard you must program the chassis serial number and product part number into the new motherboard See your service representative Servicing Field Replaceable Units 145 V Install the Motherboard 1 Ensure that all 14 captive screws in the motherboard are retracted 2 Lower the motherboard down into the chassis Guide the flex cable connector through the midwall partition 3 Secure the No 2 captive Phillips screws Ensure that all fasteners are secured See Motherboard Fastener Locations on page 147 4 Lower and secure the midwall partition 5 Install the six No 2 Phillips screws that secure the bus bar assembly to the motherboard 6 Install the CMP memory module bracket The bracket is secured with six No 2 Phillips screws 7 Plug in the auxiliary power cable to 9803 8 Plug in the flex cable c
98. heir status.

At the -> prompt, type the show components command. The examples below show two possibilities.

EXAMPLE Output of the show components Command With No Disabled Components

-> show components
Target          | Property         | Value
----------------+------------------+----------
/SYS/MB/PCIE0   | component_state  | Enabled
/SYS/MB/PCIE3   | component_state  | Enabled
/SYS/MB/PCIE1   | component_state  | Enabled
/SYS/MB/PCIE4   | component_state  | Enabled
/SYS/MB/PCIE2   | component_state  | Enabled
/SYS/MB/PCIE5   | component_state  | Enabled
/SYS/MB/NET0    | component_state  | Enabled
/SYS/MB/NET1    | component_state  | Enabled
/SYS/MB/NET2    | component_state  | Enabled
/SYS/MB/NET3    | component_state  | Enabled
/SYS/MB/PCIE    | component_state  | Enabled

EXAMPLE Output of the show components Command Showing Disabled Components

-> show components
Target          | Property         | Value
----------------+------------------+----------
/SYS/MB/PCIE0   | component_state  | Enabled
/SYS/MB/PCIE3   | component_state  | Disabled
/SYS/MB/PCIE1   | component_state  | Enabled
/SYS/MB/PCIE4   | component_state  | Enabled
/SYS/MB/PCIE2   | component_state  | Enabled
/SYS/MB/PCIE5   | component_state  | Enabled
/SYS/MB/NET0    | component_state  | Enabled
/SYS/MB/NET1    | component_state  | Enabled
/SYS/MB/NET2    | component_state  | Enabled
/SYS/MB/NET3    | component_state  | Enabled
/SYS/MB/PCIE    | component_state  | Enab
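The show components output in this excerpt lends itself to a quick automated check for disabled components. The following is a minimal sketch, not an Oracle-supplied tool: it assumes the output has been captured as text in a pipe-delimited Target | Property | Value layout, and the function name and sample data are hypothetical.

```python
def disabled_components(output: str) -> list[str]:
    """Return the targets whose component_state is Disabled.

    Assumes each data row looks like 'target | property | value';
    header and separator rows are skipped because they do not match.
    """
    disabled = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # separator row, blank line, or prompt line
        target, prop, value = parts
        if prop == "component_state" and value == "Disabled":
            disabled.append(target)
    return disabled


# Hypothetical captured output, shortened for illustration.
SAMPLE = """\
Target          | Property         | Value
----------------+------------------+----------
/SYS/MB/PCIE0   | component_state  | Enabled
/SYS/MB/PCIE3   | component_state  | Disabled
/SYS/MB/NET0    | component_state  | Enabled
"""

print(disabled_components(SAMPLE))
```

Any target the function returns is a candidate for the ASR blacklist review described elsewhere in this manual.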
99. hernet connectors NETO NET1 NET2 NET3 are located on the system motherboard and can be accessed from the back panel The Ethernet interfaces operate at 10 Mbit sec 100 Mbit sec and 1000 Mbit sec 1 8 Qe Pin Signal Description Pin Signal Description 1 Transmit Receive Data 0 Transmit Receive Data 2 Transmit Receive Data 0 Transmit Receive Data 1 oN FD A Ae WO N Transmit Receive Data 2 Transmit Receive Data 1 Transmit Receive Data 3 Transmit Receive Data 3 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Server Components These topics provide illustrations depicting components of Oracle s Sun SPARC Enterprise T5440 server Description Links A diagram and list of customer Customer Replaceable Units on page 182 replaceable units CRUs A diagram and list of components Field Replaceable Units on page 184 that only field service personnel can replace Related Information m Identifying Server Components on page 1 m Servicing Customer Replaceable Units on page 77 m Servicing Field Replaceable Units on page 123 181 Customer Replaceable Units FIGURE Customer Replaceable Units CRUs Figure Legend 1 CMP modules 5 Front bezel 2 Memory modules 6 Hard drives 3 Fan trays 7 Power supplies 4 Removable media drive 8 182 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Related Information m Hot Pluggab
100. ilisation CA m Adobe PostScript Contents Using This Documentation xi Identifying Server Components 1 Infrastructure Boards and Cables 1 Front Panel Diagram 3 Front Panel LEDs 4 Rear Panel Diagram 5 Rear Panel LEDs 7 Ethernet Port LEDs 8 Managing Faults 9 Understanding Fault Handling Options 9 Server Diagnostics Overview 10 Diagnostic Flowchart 11 Options for Accessing the Service Processor 15 Oracle ILOM Overview 16 ALOM CMT Compatibility Shell Overview 18 Predictive Self Healing Overview 19 Oracle VTS Overview 20 POST Fault Management Overview 20 POST Fault Management Flowchart 21 Memory Fault Handling Overview 22 Connecting to the Service Processor 23 v Switch From the System Console to the Service Processor Oracle ILOM or ALOM CMT Compatibility Shell 24 v Switch From Oracle ILOM to the System Console 24 v Switch From the ALOM CMT Compatibility Shell to the System Console 25 Displaying FRU Information With Oracle ILOM 25 v Display System Components Oracle ILOM show components Command 25 v Display Individual Component Information Oracle ILOM show Command 26 Controlling How POST Runs 27 POST Parameters 28 v Change POST Parameters 29 v Run POST in Maximum Mode 30 Detecting Faults 32 Detecting Faults Using LEDs 32 Detecting Faults Oracle ILOM show faulty Command 34 v Detect Faults Oracle ILOM show faulty Command 35 Detecting Faults Oracle Solaris OS Files and Commands 37 v Check the Message
101. ine. For more information, see the Sun SPARC Enterprise T5440 Server Product Notes.

Related Information
■ Managing Faults on page 9
■ PCIe Slot Configuration Guidelines on page 102
■ System Bus Topology on page 171
■ Performing Node Reconfiguration on page 163

PCIe Slot Configuration Guidelines

You can install up to eight low-profile PCIe cards in the system. All slots are wired to x8 PCIe lanes. Slot 1 and Slot 7 support graphics cards with x16 connectors.

Slot 4 and Slot 5 also support 10-Gbit Ethernet cards (XAUI cards). When a XAUI card is installed, a PCIe card cannot be installed in the same slot.

If you are installing a XAUI card, note the following:
■ If you install a XAUI card in XAUI Port 0, the onboard NET1 port is disabled.
■ If you install a XAUI card in XAUI Port 1, the onboard NET0 port is disabled.

Use the following guidelines to spread the load evenly across CMP/memory modules. If a slot is already populated with a device, install a new device in the next available slot, in the order indicated.

PCIe/XAUI Card Type           Number of CMP/Memory Modules   Installation Order             Notes
10-Gbit Ethernet XAUI card    1, 2, 3, or 4                  Slot 4, 5                      Install XAUI cards first.
External I/O Expansion Unit   2                              Slot 0, 4, 1, 5                Maximum of 4 cards; install in order shown.
PCIe Link card                4                              Slot 0, 4, 2, 6, 1, 5, 3, 7    Maximum of 8 cards; install in order shown.
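The slot rules above can be expressed as a small validation routine. This is a sketch under stated assumptions rather than anything taken from the manual: the function name and the plan data structure are hypothetical, and the slot numbering 0 through 7 is inferred from the installation order table.

```python
# Slots that accept XAUI cards, per the guidelines above.
XAUI_CAPABLE_SLOTS = {4, 5}

# Onboard port that is disabled when a card occupies the given XAUI port.
DISABLED_NET_PORT = {0: "NET1", 1: "NET0"}


def check_slot_plan(plan):
    """Check a proposed card layout against the slot guidelines.

    plan maps a slot number to the set of card kinds planned for it,
    for example {4: {"XAUI"}}. Returns a list of problem descriptions.
    """
    problems = []
    for slot, cards in sorted(plan.items()):
        if slot not in range(8):  # assumption: slots are numbered 0-7
            problems.append(f"slot {slot}: no such slot")
            continue
        if "XAUI" in cards and slot not in XAUI_CAPABLE_SLOTS:
            problems.append(f"slot {slot}: XAUI cards fit only slots 4 and 5")
        if "XAUI" in cards and "PCIe" in cards:
            problems.append(f"slot {slot}: PCIe and XAUI cannot share a slot")
    return problems


print(check_slot_plan({0: {"PCIe"}, 4: {"XAUI", "PCIe"}}))
print("XAUI Port 0 disables", DISABLED_NET_PORT[0])
```

An empty list means the plan does not violate any rule encoded here; the routine does not check the recommended installation order.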
102. ing Off the System on page 67
■ Extend the Server to the Maintenance Position on page 70
■ Perform Electrostatic Discharge Antistatic Prevention Measures on page 73

Do the following:
1. Grasp the front bezel on the left and right sides.
2. Pull the bezel off the front of the chassis. The bezel is secured with three snap-in posts.

Note: To avoid bending the bezel, pull it gradually from the middle and both ends simultaneously.

▼ Install the Front Bezel

1. Align the bezel with the chassis front panel.
2. Press the bezel onto the front panel. The bezel is oriented with four guide pins and is secured with three snap-in posts.

Next Steps
■ Slide the Server Into the Rack on page 159
■ If you performed any additional service procedures, see Power On the Server on page 161

Servicing the DVD-ROM Drive

You must remove the front bezel before servicing the DVD-ROM drive.
■ Remove the DVD-ROM Drive on page 126
■ Install the DVD-ROM Drive on page 127

Related Information
■ Servicing the Front Bezel on page 123

▼ Remove the DVD-ROM Drive

Before you begin, complete these tasks:
■ Read the section Safety Information on page 63
■ Power off the server using one of the methods described in the section Powering Off the System on page 67
■ Extend the Server to the Mainte
103. ion See Extending the Server to the Maintenance Position on page 69 3 Disconnect the CMA Pull out the retention pin that secures the cable management arm CMA to the rack rail FIGURE Removing the Server From the Rack on page 72 Slide the CMA out of the end of the inner glide The CMA is still attached to the cabinet but the server is now disconnected from the CMA Preparing to Service the System 71 FIGURE Removing the Server From the Rack E E E B a Figure Legend 1 Disconnect system cables and CMA 2 Press inner rail release buttons to remove the server from the rack Caution Use two people to dismount and carry the chassis FIGURE Lift Warning d 4 From the front of the server press inner rail release buttons and pull the server forward until it is free of the rack rails 5 Set the server on a sturdy work surface 72 Sun SPARC Enterprise T5440 Server Service Manual June 2011 V Perform Electrostatic Discharge Antistatic Prevention Measures 1 Prepare an antistatic surface to set parts on during the removal installation or replacement process Place ESD sensitive components such as the printed circuit boards on an antistatic mat The following items can be used as an antistatic mat m Antistatic bag used to wrap a replacement part m ESD mat m disposable ESD mat shipped with some replacement parts or optional system components Attach a
104. is routed under the flex cable connector 6 Plug in the auxiliary power cable 7 Install the flex cable retainer Place the retainer into position and tighten the captive No 2 Phillips screw Next Steps m Install the Top Cover on page 158 m Slide the Server Into the Rack on page 159 m Install a Power Supply on page 96 Note Install all four power supplies m Connect the Power Cords to the Server on page 161 136 Sun SPARC Enterprise T5440 Server Service Manual June 2011 m Power On the Server on page 161 Servicing the Fan Tray Carriage You must remove the fan tray carriage in order to service the following components m Hard drive backplane m Motherboard m Front control panel m Front I O board This topic includes the following m Remove the Fan Tray Carriage on page 137 m Install the Fan Tray Carriage on page 138 Related Information m Servicing Fan Trays on page 86 m Servicing the Hard Drive Backplane on page 139 m Servicing the Motherboard on page 143 m Servicing the Front Control Panel on page 152 m Servicing the Front I O Board on page 154 Remove the Fan Tray Carriage Before you begin complete these tasks m Read the section Safety Information on page 63 m Power off the server using one of the methods described in the section Powering Off the System on page 67 m Extend the Server to the Maintenance Position on page 70 m Perform Electrostatic Discharge
105. layed on the console. In most cases, after the fault is repaired, the system detects the corrected state and the fault condition is repaired automatically. However, this repair should be verified. If the fault condition is not cleared automatically, you must clear the fault manually.

1. After replacing a faulty FRU, power on the server.
2. At the Oracle ILOM prompt, use the show faulty command to identify PSH-detected faults.
   ■ If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps.
   ■ If a fault is reported, perform Step 3 and Step 4.
3. Use the clear_fault_action property of the FRU to clear the fault from the service processor. For example:

   -> set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 clear_fault_action=True
   Are you sure you want to clear /SYS/MB/CPU0/CMP0/BR0/CH0/D0 (y/n)? y
   Set clear_fault_action to true

4. Clear the fault from all persistent fault records.
   In some cases, even though the fault is cleared, some persistent fault information remains and results in erroneous fault messages at boot time. To ensure that these messages are not displayed, run the following Oracle Solaris command:

   fmadm repair UUID

   Example:

   # fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86

▼ Clear Faults Detected in the External I/O Expansion Unit

For service processor-detected faults in the External I/O Expansion Unit, the fault must be manually cl
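Steps 3 and 4 of the PSH procedure above amount to two commands, one typed at the Oracle ILOM prompt and one in Oracle Solaris. The sketch below only builds the command strings so they can be reviewed before being typed; it does not contact a service processor. The function name is hypothetical, the property=value spelling of the ILOM command is an assumption based on the example in this procedure, and the all-zero UUID is a placeholder rather than a real fault record.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def psh_clear_commands(fru_path: str, uuid: str) -> list[str]:
    """Return the ILOM command, then the Solaris command, as text."""
    if not fru_path.startswith("/SYS/"):
        raise ValueError(f"not a /SYS FRU path: {fru_path!r}")
    if not UUID_RE.match(uuid.lower()):
        raise ValueError(f"not a well-formed UUID: {uuid!r}")
    return [
        f"set {fru_path} clear_fault_action=True",  # Step 3, at the -> prompt
        f"fmadm repair {uuid}",                     # Step 4, in Oracle Solaris
    ]


# Placeholder UUID for illustration only.
for command in psh_clear_commands(
    "/SYS/MB/CPU0/CMP0/BR0/CH0/D0",
    "00000000-0000-0000-0000-000000000000",
):
    print(command)
```

Checking the UUID format first catches copy errors such as a stray character or missing hyphen before the command reaches fmadm.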
106. le ILOM prompt. See Connecting to the Service Processor on page 23.

2. Set the virtual keyswitch to diag so that POST will run in service mode.

   -> set /SYS keyswitch_state=Diag
   Set keyswitch_state to Diag

3. Reset the system so that POST runs.
   There are several ways to initiate a reset. EXAMPLE show Command Output on page 31 shows a reset using a power cycle command sequence. For other methods, refer to the Sun SPARC Enterprise T5440 Server Administration Guide.

   Note: The server takes about one minute to power off. Use the show /HOST command to determine when the host has been powered off. The console will display Status Powered Off.

4. Switch to the system console to view the POST output.

   -> start /SP/console

   If no faults were detected, the system will boot. EXAMPLE show Command Output on page 31 depicts abridged POST output.

EXAMPLE show Command Output

-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

EXAMPLE show Command Output

-> start /SP/console
2007-12-19 22:01:17.810 0:0:0>INFO STATUS Running RGMII 1G BCM5466R PHY level Loopback Test
2007-12-19 22:01:22.534 0:0:0>End Neptune 1G Loopback Test Port 2
2007-12-19 22:01:22.553 0:0:0>
2007-12-19 22:01:22.542 0:0:0>Begin Neptune 1G Loop
107. le and Hot Swappable Devices on page 77 m Servicing Hard Drives on page 78 m Servicing Fan Trays on page 86 m Servicing Power Supplies on page 91 m Servicing CMP Memory Modules on page 104 m Servicing FB DIMMs on page 110 m Servicing the Front Bezel on page 123 m Servicing the DVD ROM Drive on page 126 Server Components 183 Field Replaceable Units FIGURE Field Replaceable Units FRUs Figure Legend 1 CMP memory module bracket 4 Power supply backplane 2 Fan cage 5 Flex cable assembly 3 Hard drive backplane 6 Auxiliary power cable 184 Sun SPARC Enterprise T5440 Server Service Manual June 2011 FIGURE Field Replacable Units FRUs Motherboard and Auxiliary Boards Figure Legend 1 IDPROM 4 Motherboard 2 Front Control Panel 5 Battery 3 Front I O Board 6 Service Processor Related Information m Servicing the Service Processor on page 128 m Servicing the IDPROM on page 131 m Servicing the Battery on page 133 m Servicing the Power Distribution Board on page 134 Server Components 185 m Servicing the Fan Tray Carriage on page 137 m Servicing the Hard Drive Backplane on page 139 m Servicing the Motherboard on page 143 m Servicing the Flex Cable Assembly on page 148 m Servicing the Front Control Panel on page 152 m Servicing the Front I O Board on page 154 186 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Index Numerics 3 3
108. led V Display Individual Component Information Oracle ILOM show Command Use the show command to display information about individual components in the server At the gt prompt enter the show command In EXAMPLE show Command Output on page 27 the show command is used to get information about a memory module FB DIMM 26 Sun SPARC Enterprise T5440 Server Service Manual June 2011 EXAMPLE show Command Output gt show SYS MB CPU0 CMP0 BR1 CH0 DO SYS MB CPUO0 CMPO BR1 CHO0 DO Targets RO R1 SEEPROM SERVICE PRSNT T_AMB Properties type DIMM component_state Enabled fru_name 1024MB DDR2 SDRAM FB DIMM 333 PC2 5300 fru_description FBDIMM 1024 Mbyte fru_manufacturer Micron Technology fru_version FFFFFF fru_part_number 18HF12872FD667D6D4 fru serial number d81813ce fault state OK clear fault action none Commands cd show Controlling How POST Runs This topic contains the following m POST Parameters on page 28 m Change POST Parameters on page 29 m Run POST in Maximum Mode on page 30 Managing Faults 27 POST Parameters The server can be configured for normal extensive or no POST execution You can also control the level of tests that run the amount of POST output that is displayed and which reset events trigger POST by using Oracle ILOM command variables The keyswitch state parameter when set to diag overrides all the other Oracle
109. ll to the System Console on page 25

Related Information
■ Diagnostic Flowchart on page 11
■ Switch From the System Console to the Service Processor (Oracle ILOM or ALOM CMT Compatibility Shell) on page 24
■ Switch From Oracle ILOM to the System Console on page 24
■ Switch From the ALOM CMT Compatibility Shell to the System Console on page 25
■ Sun SPARC Enterprise T5440 Server Installation and Setup Guide
■ Sun SPARC Enterprise T5440 Server Administration Guide

▼ Switch From the System Console to the Service Processor (Oracle ILOM or ALOM CMT Compatibility Shell)

To switch from the system console to the service processor prompt, type #. (Hash-Period).

   #.
   ->

▼ Switch From Oracle ILOM to the System Console

From the Oracle ILOM -> prompt, type start /SP/console.

   -> start /SP/console

▼ Switch From the ALOM CMT Compatibility Shell to the System Console

From the ALOM CMT sc> prompt, type console.

   sc> console

Displaying FRU Information With Oracle ILOM
■ Display System Components (Oracle ILOM show components Command) on page 25
■ Display Individual Component Information (Oracle ILOM show Command) on page 26

▼ Display System Components (Oracle ILOM show components Command)

The show components command displays the system components (asrkeys) and reports t
110. lly

Description                                                Topic
Clear faults detected during POST                          Clear Faults Detected During POST on page 51
Clear faults detected by PSH                               Clear Faults Detected by PSH on page 53
Clear faults detected in the External I/O Expansion Unit   Clear Faults Detected in the External I/O Expansion Unit on page 54

Related Information
■ Diagnostic Flowchart on page 11
■ POST Fault Management Overview on page 20
■ Predictive Self-Healing Overview on page 19
■ Sun SPARC Enterprise T5440 Server Administration Guide
■ Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server
■ Sun External I/O Expansion Unit Installation and Service Manual for SPARC Enterprise T5120, T5140, T5220, T5240, and T5440 Servers

▼ Clear Faults Detected During POST

In most cases, when POST detects a faulty component, POST logs the fault and automatically takes the failed component out of operation by placing the component in the ASR blacklist. See Disabling Faulty Components on page 54.

In most cases, the replacement of the faulty FRU is detected when the service processor is reset or power cycled. In this case, the fault is automatically cleared from the system. This procedure describes how to identify a POST-detected fault and, if necessary, manually clear the fault.

1. After replacing a faulty FRU, at the Oracle ILOM prompt, use the show faulty command to identify POST-detected faults. Faults detected by PO
111. ltmgmt 0 timestamp Dec 21 16 40 56 SP faultmgmt 0 timestamp Dec 21 16 40 56 faults 0 SP faultmgmt 0 sp detected fault SYS MB CPU0 CMP0 CMP0 BR1 CH0 DO faults 0 Forced fail POST m Example showing a fault in the External I O Expansion Unit These faults can be identified by the text string Ext FRU or Ext sensor at the beginning of the fault description The text string Ext FRU indicates that the specified FRU is faulty and should be replaced The text string Ext sensor indicates that the specified FRU contains the sensor that detected the problem In this case the specified FRU may not be faulty Contact service support to isolate the problem show faulty Target Property Value m xcci cutem ie ire pee LM oe M Eee d des a SP faultmgmt 0 fru SYS IOX X0TC IOB1 LINK SP faultmgmt 0 timestamp Feb 05 18 28 20 SP faultmgmt 0 timestamp Feb 05 18 28 20 faults 0 SP faultmgmt 0 sp detected fault Ext FRU SYS IOX X0TC IOB1 LINK faults 0 SIGCON 0 I2C no device response Detecting Faults Oracle Solaris OS Files and Commands With the Oracle Solaris OS running on the server you have the full complement of Oracle Solaris OS files and commands available for collecting information and for troubleshooting If POST Oracle ILOM or the Oracle Solaris PSH features do not indicate the source of a fault check the message buffer and log files for notifications for faults Har
112. mmands that resemble the commands of ALOM CMT.

The service processor sends alerts to all ALOM CMT users that are logged in, sends the alert through email to a configured email address, and writes the event to the Oracle ILOM event log. The Oracle ILOM event log is also available using the ALOM CMT compatibility shell.

See the Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server for comparisons between the Oracle ILOM CLI and the ALOM CMT compatibility CLI, and for instructions for adding an ALOM CMT account.

Related Information
■ Diagnostic Flowchart on page 11
■ Detecting Faults Using LEDs on page 32
■ Oracle ILOM to ALOM CMT Command Reference on page 57
■ Sun SPARC Enterprise T5440 Server Installation and Setup Guide
■ Sun SPARC Enterprise T5440 Server Administration Guide
■ Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server

Predictive Self-Healing Overview

The Predictive Self-Healing (PSH) technology enables the server to diagnose problems while the Oracle Solaris OS is running and mitigate many problems before they negatively affect operations.

The Oracle Solaris OS uses the Fault Manager daemon, fmd(1M), which starts at boot time and runs in the background to monitor the system. If a component generates an error, the daemon handles the error by correlating the error with d
113. mory module containing slots for an additional 12 FB DIMMs is associated with each CMP module m Service processor The service processor Oracle ILOM board controls the server power and monitors server power and environmental events The service processor draws power from the server s 3 3V standby supply rail which is available whenever the system is receiving main input power even when the system is turned off A removable IDPROM contains MAC addresses host ID and ILOM and OpenBoot PROM configuration data When replacing the service processor the IDPROM can be transferred to a new board to retain system configuration data m Power supply backplane This board distributes main 12V power from the power supplies to the rest of the system The power supply backplane is connected to the motherboard and the disk drive backplane via a flex cable High voltage power is provided to the motherboard via a bus bar assembly m Hard drive backplane This board includes the connectors for up to four hard drives It is connected to the motherboard via a flex cable assembly Each drive has its own Power Activity Fault and Ready to Remove LEDs m Front control panel This board connects directly to the motherboard and serves as the interconnect for the front I O board It contains the front panel LEDs and the Power button m Front I O board This board connects to the front control panel interconnect It contains two USB ports m Flex ca
114. mponents until they can be replaced. The following components are managed by the ASR feature:
■ UltraSPARC T2 Plus processor strands
■ Memory FB-DIMMs
■ I/O subsystem

The database that contains the list of disabled components is referred to as the ASR blacklist (asr-db).

In most cases, POST automatically disables a faulty component. After the cause of the fault is repaired (FRU replacement, loose connector reseated, and so on), you might need to remove the component from the ASR blacklist.

Note: For instructions on enabling or disabling ASR, see the Sun SPARC Enterprise T5440 Server Administration Guide.

The ASR commands (TABLE ASR Commands on page 55) enable you to view and manually add or remove components (asrkeys) from the ASR blacklist. You run these commands from the Oracle ILOM -> prompt.

TABLE ASR Commands

Command                               Description
show components                       Displays system components and their current state.
set asrkey component_state=Enabled    Removes a component from the asr-db blacklist, where asrkey is the component to enable.
set asrkey component_state=Disabled   Adds a component to the asr-db blacklist, where asrkey is the component to disable.

Note: The asrkeys vary from system to system, depending on how many cores and how much memory are present. Use the show components command to see the asrkeys on a given system.

Note: A reset or power cycle is required after disabling or enabling a component. I
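The two set commands in the ASR table differ only in the final state value, so a small helper can generate either form for a given asrkey. This is an illustrative sketch: the function name is hypothetical, the property=value spelling is an assumption about the ILOM syntax, and the asrkey shown is just an example target of the kind reported by show components.

```python
def asr_command(asrkey: str, enable: bool) -> str:
    """Build the ILOM command that enables or disables one asrkey.

    Enabling removes the component from the ASR blacklist;
    disabling adds it to the blacklist.
    """
    if not asrkey.strip():
        raise ValueError("asrkey must not be empty")
    state = "Enabled" if enable else "Disabled"
    return f"set {asrkey.strip()} component_state={state}"


# Example asrkey; real keys vary from system to system.
print(asr_command("/SYS/MB/PCIE3", enable=True))
print(asr_command("/SYS/MB/PCIE3", enable=False))
```

As the note above states, the change takes effect only after a reset or power cycle, which this helper does not perform.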
115. n The service processor is running Slow blink Indicates that a normal transitory activity is taking place Slow blinking could indicate that the system diagnostics are running or that the system is booting The recessed Power button toggles the system on or off f the system is powered off press once to power on If the system is powered on press once to initiate a graceful system shutdown f the system is powered on press and hold for 4 seconds to initiate an emergency shutdown For more information about powering on and powering off the system see the Sun SPARC Enterprise T5440 Server Administration Guide Identifying Server Components 5 LED or Button Icon Description Fan Fault LED TOP amber FAN Power Supply REAR Fault LED PS amber Overtemp LED amber Provides the following operational fan indications e Off Indicates a steady state no service action is required Steady on Indicates that a fan failure event has been acknowledged and a service action is required on at least one of the fan modules Provides the following operational PSU indications e Off Indicates a steady state no service action is required Steady on Indicates that a power supply failure event has been acknowledged and a service action is required on at least one PSU Provides the following operational temperature indications e Off Indicates a steady state no service action is require
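The Power button behavior described in this excerpt reduces to a three-way decision on the current power state and how long the button is held. The sketch below restates those rules in code; the function name is hypothetical, and treating any press on a powered-off system as a power-on request, regardless of hold time, is an assumption the manual text does not address.

```python
EMERGENCY_HOLD_SECONDS = 4  # hold time stated in the button description


def power_button_action(powered_on: bool, hold_seconds: float) -> str:
    """Return the action a front panel Power button press initiates."""
    if not powered_on:
        # Assumption: hold time is irrelevant when the system is off.
        return "power on"
    if hold_seconds >= EMERGENCY_HOLD_SECONDS:
        return "emergency shutdown"
    return "graceful shutdown"


print(power_button_action(powered_on=False, hold_seconds=0.2))
print(power_button_action(powered_on=True, hold_seconds=0.2))
print(power_button_action(powered_on=True, hold_seconds=4))
```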
116. n Fault system LED 33 Fault fan tray LED 33 Fault hard drive LED 33 Fault power supply LED 33 92 98 FB DIMM Fault motherboard LEDs 34 Gigabit Ethernet port 8 Locator 4 7 Overtemp system LED 5 33 Power OK system LED 13 Power Supply Fault system LED 5 33 94 98 Ready to Remove hard drive LED 80 82 Service Required system LED 4 33 98 Top system LED 5 LEDs about 32 fan tray 33 91 front panel 4 hard drive 86 network management port 8 Service Required system LED 34 using to diagnose faults 32 using to identify device state 32 Locator LED and button 3 4 5 7 log files viewing 38 logical domains guest configuration 170 M MAC addresses stored on SCC module 2 maintenance position 69 71 memory fault handling 22 memory modules See CMP memory modules memory See also FB DIMMs message ID 19 messages file 38 motherboard about 1 fastener locations 147 installing 146 removing 143 N network management port LEDs 8 pinouts 177 node reconfiguration 163 and I O services 164 I O device nodes 166 PCIe 167 Normal mode virtual keyswitch position 114 Normal mode virtual keyswitch position See also setkeyswitch command O Overtemp system LED 5 33 overtemperature condition 33 P PCIe card adding 100 configuration guidelines 102 device identifiers 101 installing 99 removing 98 PCIe fabric reconfiguration 167 pinouts Gigabit Ethernet ports 180 ne
117. n antistatic wrist strap When servicing or removing server components attach an antistatic strap to your wrist and then to a metal area on the chassis V Remove the Top Cover Before you begin complete these tasks Read the section Safety Information on page 63 Power off the server using one of the methods described in the section Powering Off the System on page 67 Extend the Server to the Maintenance Position on page 70 Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Loosen the two captive No 2 Phillips screws at the rear edge of the top panel Slide the top cover to the rear about 0 5 inch 12 7 mm Remove the top cover Lift up and remove the cover Preparing to Service the System 73 Caution If the top cover is removed before the server is powered off the server will immediately disable the front panel Power button and shut down After such an event you must replace the top cover and use the poweron command to power on the server See Power On the Server on page 161 74 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Preparing to Service the System 75 76 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Servicing Customer Replaceable Units These topics describe how to service customer replaceable units CRUs in the server Topic Links Read and learn about components which can be serviced while the system is in ope
118. nance Position on page 70 Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 Remove the Top Cover on page 73 Remove the Front Bezel on page 124 Do the following 1 2 3 Remove the flex cable retainer Loosen the captive No 2 Phillips screw and lift the retainer up and out of the chassis Unplug the DVD ROM drive from the flex cable assembly Push the DVD ROM drive forward until it protrudes from the front of the chassis 126 Sun SPARC Enterprise T5440 Server Service Manual June 2011 4 Slide the DVD ROM drive out of the chassis WV Install the DVD ROM Drive 1 Slide the DVD ROM drive into its bay 2 Connect the DVD ROM drive to the flex cable assembly 3 Install the flex cable retainer Place the retainer into position and tighten the captive No 2 Phillips screw Next Steps m Install the Front Bezel on page 125 m Install the Top Cover on page 158 m Slide the Server Into the Rack on page 159 Servicing Field Replaceable Units 127 m Power On the Server on page 161 Servicing the Service Processor The service processor module contains the service processor firmware IDPROM and system battery m Remove the Service Processor on page 128 m Install the Service Processor on page 130 Related Information m Servicing the IDPROM on page 131 m Servicing the Battery on page 133 WV Remo
119. nboard hard drives In this failure scenario the system is unable to boot from internal drives Similarly if CMP1 goes offline the following devices become unavailable m PCle4 m PCle5 m Onboard network devices Related Information m System Bus Topology on page 171 m I O Fabric in 2P Configuration on page 172 m I O Fabric in 4P Configuration on page 173 164 Sun SPARC Enterprise T5440 Server Service Manual June 2011 Recovering From a Failed CMP Memory Module This topic includes the following m Options for Recovering From a Failed CMP Memory Module on page 165 m Reconfiguring I O Device Nodes on page 166 m Reset the LDoms Guest Configuration on page 170 Options for Recovering From a Failed CMP Memory Module If your system experiences a complete CMP memory module failure do one of the following 1 Replace the failed CMP memory module 2 If a replacement CMP module is not available remove the failed CMP module and replace it with a CMP from a different slot that does not have any directly connected I O devices in use see I O Fabric in 2P Configuration on page 172 and I O Fabric in 4P Configuration on page 173 If this leaves a memory module without its associated CMP module remove the memory module Note At a minimum a functioning CMP module must be installed in CMP Slot 0 If you are performing a node reconfiguration following a failure in CMP Slot 0 you must move one of the remaining
120. nd for the following reasons m To see if any faults have been diagnosed in the system m To verify that the replacement of a FRU has cleared the fault and not generated any additional faults Related Information m Diagnostic Flowchart on page 11 m Detecting Faults Using LEDs on page 32 m Oracle ILOM to ALOM CMT Command Reference on page 57 m Sun SPARC Enterprise T5440 Server Installation and Setup Guide m Sun SPARC Enterprise T5440 Server Administration Guide m Oracle Integrated Lights Out Manager 3 0 Supplement for the Sun SPARC Enterprise T5440 Server V Detect Faults Oracle ILOM show faulty Command Atthe gt prompt type the show faulty command The following show faulty command examples show the different kinds of output from the show faulty command m Example of the show faulty command when no faults are present show faulty Target Property Managing Faults 35 m Example of the show faulty command displaying an environmental fault show faulty Target Property Value Be ieee er Na So hh pete he nb ue fois EAE te t ee ce tie ro SP faultmgmt 0 fru SYS MB FT1 SP faultmgmt 0 timestamp Dec 14 23 01 32 SP faultmgmt 0 timestamp Dec 14 23 01 32 faults 0 SP faultmgmt 0 sp detected fault TACH at SYS MB FT1 has faults 0 exceeded low non recoverable threshold m Example of the show faulty command displaying a configuration fault
121. nterprise T5440 Server Service Manual June 2011 6 Install the flex cable retainer Place the retainer into position and tighten the captive No 2 Phillips screw 7 Plug in the power cables Next Steps m Install the Top Cover on page 158 m Slide the Server Into the Rack on page 159 m Power On the Server on page 161 Servicing Field Replaceable Units 151 Servicing the Front Control Panel The front control panel contains system status LEDs and the Power button This topic includes the following m Remove the Front Control Panel on page 152 m Install the Front Control Panel on page 153 Related Information m Infrastructure Boards and Cables on page 1 m Front Panel Diagram on page 3 m Front Panel LEDs on page 5 V Remove the Front Control Panel Before you begin complete these tasks m Read the section Safety Information on page 63 m Power off the server using one of the methods described in the section Powering Off the System on page 67 m Disconnect Power Cords From the Server on page 68 m Remove the Server From the Rack on page 71 m Perform Electrostatic Discharge Antistatic Prevention Measures on page 73 m Remove the Top Cover on page 73 m Remove a Fan Tray on page 89 m Remove the Fan Tray Carriage on page 137 1 Unplug the front control panel cable from J9901 on the motherboard 2 Unplug the front control panel cable from the front I O board 3 Remove
122. Remove a Fan Tray
If you are removing the fan trays as a prerequisite for another service procedure, follow the steps in this procedure.
Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Perform the task Extend the Server to the Maintenance Position on page 70.
- Perform the task Perform Electrostatic Discharge Antistatic Prevention Measures on page 73.
Do the following: Press the fan tray latches toward the center of the fan tray and pull the fan tray up and out of the system.
Install a Fan Tray
1. Slide each fan tray into its bay until it locks into place. Ensure that the fan tray is oriented correctly. Airflow in the system is from front to back.
2. Verify proper fan tray operation. See Fan Tray Fault LED on page 91.
Next Steps
If you are replacing the fan trays after performing another service procedure, complete these steps:
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161
Fan Tray Device Identifiers
These are the FRU device names for the fan trays in the server.
    Device | Device Identifier
    FT0    | /SYS/MB/FT0
    FT1    | /SYS/MB/FT1
    FT2    | /SYS/MB/FT2
    FT3    | /SYS/MB/FT3
Related Information
- Managing Fault
123. ntion Measures on page 73
Note: If you are servicing Power Supply 0, you must disconnect the cable management arm support strut.
1. Grasp the power supply handle and press the release latch.
2. Pull the power supply out of the chassis.
Install a Power Supply
If you are installing the power supplies after another service task, complete these steps:
1. Align the replacement power supply with the empty power supply bay.
2. Slide the power supply into the bay until it is fully seated.
Next Steps
- Connect the Power Cords to the Server on page 161
- Power On the Server on page 161
Power Supply Device Identifiers
These are the FRU device names for power supplies.
    Device | Device Identifier
    PS0    | /SYS/PS0
    PS1    | /SYS/PS1
    PS2    | /SYS/PS2
    PS3    | /SYS/PS3
Note: Power supply names in Oracle ILOM messages are displayed with the full FRU name, such as /SYS/PS0.
Related Information
- Managing Faults on page 9
- Hot Pluggable and Hot Swappable Devices on page 77
- Power Supply LED on page 97
Power Supply LED
Each power supply contains a dual-color LED that is visible when looking at the back panel of the system. The following table includes a description of power supply LED modes and their function, listed from top to bottom.
    LED State | Meaning       | Notes
    Off       | No AC present | Power s
124. nts along their connector edges.
Caution: You must disconnect both power supplies before servicing any of the components documented in this chapter.
Antistatic Wrist Strap
Wear an antistatic wrist strap and use an antistatic mat when handling components such as hard drive assemblies, circuit boards, or PCI cards. When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Following this practice equalizes the electrical potentials between you and the server.
Note: An antistatic wrist strap is no longer included in the server accessory kit. However, antistatic wrist straps are still included with options.
Antistatic Mat
Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.
Required Tools
- Antistatic wrist strap
- Antistatic mat
- No. 1 Phillips screwdriver
- No. 2 Phillips screwdriver
- 7 mm hex driver
- No. 1 flat-blade screwdriver (battery removal)
- Pen or pencil (power on server)
Obtain the Chassis Serial Number
To obtain support for your system, you need your chassis serial number. The chassis serial number is located on a sticker on the front of the server and on another sticker on the side of the server.
Obtain the Chassis Serial Number Remotely
Use the Oracle ILOM show /SYS command to obtain the chassis serial number.
125. of fmdump is the same after the FRU has been replaced. Use the fmadm faulty command to verify that the fault has cleared. See Clear Faults Detected by PSH on page 53.
1. Check the event log using the fmdump command with -v for verbose output. In the following example, a fault is displayed, indicating the following details:
- Date and time of the fault (Jul 31 12:47:42 2007)
- Universal Unique Identifier (UUID). The UUID is unique for every fault (d940ac2 d21e c94a 258 f8a9bb69d05b)
- Message identifier, which can be used to obtain additional fault information (SUN4V-8000-JA)
- Faulted FRU. The information provided in the example includes the part number of the FRU (part: 541215101) and the serial number of the FRU (serial: 101083). The Location field provides the name of the FRU. In EXAMPLE: Output from the fmdump -v Command on page 49, the FRU name is MB, meaning the motherboard.
Note: fmdump displays the PSH event log. Entries remain in the log after the fault has been repaired.
2. Use the message ID to obtain more information about this type of fault.
a. In a browser, go to the Predictive Self-Healing Knowledge Article web site: http://www.sun.com/msg
b. Obtain the message ID from the console output or the Oracle ILOM show faulty command.
c. Enter the message ID in the SUNW-MSG-ID field and click Lookup. In EXAMPLE: PSH Message Output on page 49, the
126. on page 70
Extend the Server to the Maintenance Position
1. (Optional) Use the set /SYS/LOCATE command from the -> prompt to locate the system that requires maintenance.
    -> set /SYS/LOCATE value=Fast_Blink
Once you have located the server, press the Locator LED and button to turn it off.
2. Verify that no cables will be damaged or will interfere when the server is extended. Although the cable management arm (CMA) that is supplied with the server is hinged to accommodate extending the server, you should ensure that all cables and cords are capable of extending.
3. From the front of the server, release the two slide release latches (FIGURE: Extending the Server Into the Maintenance Position on page 70). Squeeze the slide rail locks to release the slide rails.
FIGURE: Extending the Server Into the Maintenance Position
Figure Legend: 1 Slide Rail Lock; 2 Inner Rail Release Button
4. While squeezing the slide rail locks, slowly pull the server forward until it is locked in the service position.
Remove the Server From the Rack
The server must be removed from the rack to remove or install the following components:
- Motherboard
Caution: Two people must dismount and carry the chassis.
1. Disconnect all the cables and power cords from the server.
2. Extend the server to the maintenance posit
127. on page 78
- Servicing the Fan Tray Carriage on page 137
Remove the Hard Drive Backplane
Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Extend the Server to the Maintenance Position on page 70
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
- Remove a Hard Drive on page 83
  Note: You must remove all four hard drives from the server. Note the location of each hard drive you remove. You must reinstall each hard drive in the correct bay.
- Remove a Fan Tray on page 89
  Note: You must remove all four fan trays.
- Remove the Fan Tray Carriage on page 137
Do the following:
1. Remove the flex cable retainer. Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the chassis.
2. Unplug the cable from the hard drive backplane.
3. Loosen the three captive No. 2 Phillips screws.
4. Lift the backplane up and out of the system.
Install the Hard Drive Backplane
1. Lower the hard drive backplane into the system. Align the tab on the lower edge of the backplane with the corresponding slot in the chassis floor.
128. on to open the latch.
Caution: The latch is not an ejector. Do not bend the latch too far. Doing so can damage the latch.
5. Grasp the latch and pull the drive out of the drive slot.
EXAMPLE: Sample Ap_id Output
    Ap_id          | Type     | Receptacle | Occupant     | Condition
    c0             | scsi-bus | connected  | configured   | unknown
    c0::dsk/c1t0d0 | disk     | connected  | configured   | unknown
    c0::dsk/c1t1d0 | disk     | connected  | configured   | unknown
    usb0/1         | unknown  | empty      | unconfigured | ok
    usb0/2         | unknown  | empty      | unconfigured | ok
    usb0/3         | unknown  | empty      | unconfigured | ok
    usb1/1         | unknown  | empty      | unconfigured | ok
    usb1/2         | unknown  | empty      | unconfigured | ok
    usb1/3         | unknown  | empty      | unconfigured | ok
    usb2/1         | unknown  | empty      | unconfigured | ok
    usb2/2         | unknown  | empty      | unconfigured | ok
    usb2/3         | unknown  | empty      | unconfigured | ok
    usb2/4         | unknown  | empty      | unconfigured | ok
    usb2/5         | unknown  | empty      | unconfigured | ok
    usb2/6         | unknown  | empty      | unconfigured | ok
    usb2/7         | unknown  | empty      | unconfigured | ok
    usb2/8         | unknown  | empty      | unconfigured | ok
Install a Hard Drive (Hot Plug)
Installing a hard drive into the server is a two-step process. You must first install a hard drive into the desired drive slot. Then you must configure that drive to the server. Perform the following process to install a hard drive.
1. If necessary, remove the blank panel from the chassis.
Note: The server might have up to three blank panels covering unoccupied drive slots.
2. Align the replacemen
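When the cfgadm listing is captured by a script, the Ap_id values for disks can be picked out by column position. The following is a minimal sketch, assuming whitespace-separated output with the five columns shown in the sample (Ap_id, Type, Receptacle, Occupant, Condition). The helper names parse_cfgadm_listing and configured_disks are hypothetical and are not Solaris commands.

```python
def parse_cfgadm_listing(text):
    """Turn captured cfgadm listing text into a list of dicts, one per row.

    Assumes five whitespace-separated columns per data row; the header
    row and any line with a different column count are ignored.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "Ap_id":
            continue
        ap_id, dev_type, receptacle, occupant, condition = fields
        entries.append({
            "ap_id": ap_id,
            "type": dev_type,
            "receptacle": receptacle,
            "occupant": occupant,
            "condition": condition,
        })
    return entries


def configured_disks(entries):
    """Return the Ap_id of every row whose type is 'disk' and is configured."""
    return [e["ap_id"] for e in entries
            if e["type"] == "disk" and e["occupant"] == "configured"]


if __name__ == "__main__":
    listing = """Ap_id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c1t0d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
"""
    print(configured_disks(parse_cfgadm_listing(listing)))
```

Checking the Occupant column this way mirrors the manual verification step: a drive is safe to pull only after its row no longer reads configured.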
129. onnector to J9801.
9. Install the flex cable retainer. Place the retainer into position and tighten the captive No. 2 Phillips screw.
10. Plug in the front I/O cable to J9901.
Next Steps
- Install the Fan Tray Carriage on page 138
- Install a Fan Tray on page 90
  Note: Install all four fan trays.
- Install a CMP Memory Module on page 107
  Note: Install all CMP and memory modules.
- Install the Service Processor on page 130
- Install a PCIe Card on page 99
- Install the Top Cover on page 158
- Install the Server Into the Rack on page 158
- Connect the Power Cords to the Server on page 161
- Power On the Server on page 161
Motherboard Fastener Locations
This figure shows the location of the captive screws that secure the motherboard to the chassis floor.
Related Information
- Servicing the Motherboard on page 143
Servicing the Flex Cable Assembly
The flex cable assembly provides the power and data connection between the power supply backplane, hard drive backplane, and motherboard. This topic includes the following:
- Remove the Flex Cable Assembly on page 149
- Install the Flex Cable Assembly on page 150
Related Information
- Safety Information on page 63
- Servicing Po
130. Extending the Server to the Maintenance Position 69
Components Serviced in the Maintenance Position 69
Extend the Server to the Maintenance Position 70
Remove the Server From the Rack 71
Perform Electrostatic Discharge Antistatic Prevention Measures 73
Remove the Top Cover 73
Servicing Customer Replaceable Units 77
Hot Pluggable and Hot Swappable Devices 77
Servicing Hard Drives 78
About Hard Drives 78
Remove a Hard Drive (Hot Plug) 79
Install a Hard Drive (Hot Plug) 81
Remove a Hard Drive 83
Install a Hard Drive 84
Hard Drive Device Identifiers 85
Hard Drive LEDs 86
Servicing Fan Trays 86
About Fan Trays 87
Remove a Fan Tray (Hot Swap) 87
Install a Fan Tray (Hot Swap) 88
Remove a Fan Tray 89
Install a Fan Tray 90
Fan Tray Device Identifiers 90
Fan Tray Fault LED 91
Servicing Power Supplies 91
About Power Supplies 92
Remove a Power Supply (Hot Swap) 92
Install a Power Supply (Hot Swap) 93
Remove a Power Supply 95
Install a Power Supply 96
Power Supply Device Identifiers 97
Power Supply LED 97
Servicing PCIe Cards 98
Remove a PCIe Card 98
Install a PCIe Card 99
Add a PCIe Card 100
PCIe Device Identifiers 101
PCIe Slot Configuration Guidelines 102
Servicing CMP Memory Modules 104
CMP Memory Modules Overview 104
Remove a CMP Memory Module 106
Install a CMP Memory Module 107
Add a CMP Memory Module 108
CMP
131. ory errors (CEs), POST forwards the error to the Predictive Self-Healing (PSH) daemon for error handling. If an uncorrectable memory fault is detected, POST displays the fault with the device name of the faulty FB-DIMMs and logs the fault. POST then disables the faulty FB-DIMMs. Depending on the memory configuration and the location of the faulty FB-DIMM, POST disables half of physical memory in the system, or half the physical memory and half the processor threads. When this offlining process occurs in normal operation, you must replace the faulty FB-DIMMs based on the fault message and enable the disabled FB-DIMMs with the Oracle ILOM command set device component_state=enabled, where device is the name of the FB-DIMM being enabled (for example, set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 component_state=enabled).
- Predictive Self-Healing (PSH) technology: A feature of the Oracle Solaris OS, PSH uses the Fault Manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a unique fault ID (UUID) and logged. PSH reports the fault and identifies the locations of the faulty FB-DIMMs.
If you suspect that the server has a memory problem, follow the flowchart (see FIGURE: Diagnostic Flowchart on page 12). Run the Oracle ILOM show faulty command. The show faulty command lists memory faults and lists the specific FB-DIMMs that are associated with the fault.
132. ower supply handle and press the release latch.
5. Pull the power supply out of the chassis.
Install a Power Supply (Hot Swap)
1. Align the replacement power supply with the empty power supply bay.
2. Slide the power supply into the bay until it is fully seated.
3. Reconnect the power cord to the power supply. Verify that the power supply LED is green or blinking green.
4. Verify that the system Power Supply Fault LED and the front and rear Service Required LEDs are not lit.
Note: See Front Panel LEDs on page 5 and Rear Panel LEDs on page 8 for more information about identifying and interpreting system LEDs.
5. At the Oracle ILOM -> prompt, use the show faulty command to verify the status of the power supplies.
Remove a Power Supply
Caution: Hazardous voltages are present. To reduce the risk of electric shock and danger to personal health, follow the instructions.
If you are removing the power supplies as a prerequisite for another service procedure, follow these steps.
Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Disconnect Power Cords From the Server on page 68
- Perform Electrostatic Discharge Antistatic Preve
133. 2. Tighten the three captive No. 2 Phillips screws.
3. Plug the cable into its connector on the backplane.
4. Install the flex cable retainer. Place the retainer into position and tighten the captive No. 2 Phillips screw.
Next Steps
- Install the Fan Tray Carriage on page 138
- Install a Fan Tray on page 90
- Install a CMP Memory Module on page 107
- Install the Top Cover on page 158
- Install a Hard Drive on page 84
  Note: You must install the hard drives in the correct slots.
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161
Servicing the Motherboard
Note: If you are replacing a faulty motherboard, you must set diag_mode to normal or off before performing this procedure.
This topic includes the following:
- Remove the Motherboard on page 143
- Install the Motherboard on page 146
- Motherboard Fastener Locations on page 147
Related Information
- POST Parameters on page 28
- Servicing CMP Memory Modules on page 104
- Servicing PCIe Cards on page 98
- Servicing the Service Processor on page 128
- Servicing the Fan Tray Carriage on page 137
- Motherboard Fastener Locations on page 147
Remove the Motherboard
Before you begin, complete these tasks:
- Read the sec
134. power supplies enable you to remove and replace a power supply without shutting the server down, provided that at least two other power supplies are online and working.
Note: If a power supply fails and you do not have a replacement available, leave the failed power supply installed to ensure proper airflow in the server.
Related Information
- Identifying Server Components on page 1
- Managing Faults on page 9
- Hot Pluggable and Hot Swappable Devices on page 77
- Power Supply Device Identifiers on page 97
- Power Supply LED on page 97
- Server Components on page 181
Remove a Power Supply (Hot Swap)
Caution: Hazardous voltages are present. To reduce the risk of electric shock and danger to personal health, follow the instructions.
Note: If you are servicing Power Supply 0, you must disconnect the cable management arm support strut.
1. Identify which power supply requires replacement. An amber LED on a power supply indicates that a failure was detected. In addition, the show faulty command indicates which power supply is faulty. See Detecting Faults on page 32.
2. Gain access to the rear of the server where the faulty power supply is located. If necessary, slide the system partially out of the rack to obtain better access to the rear panel.
3. Disconnect the power cord from the faulty power supply.
4. Grasp the p
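The hot-swap precondition stated above (at least two other power supplies online and working) can be expressed as a small pre-check. This is only an illustrative sketch: the function name can_hot_swap_power_supply is hypothetical, the /SYS/PSn names are used as plain labels, and how the set of online supplies is obtained is left as an assumption outside the code.

```python
def can_hot_swap_power_supply(target, online_supplies):
    """Return True if removing `target` leaves at least two supplies online.

    target: name of the supply to be removed, for example "/SYS/PS1".
    online_supplies: iterable of supply names currently online and working.
    The target itself is never counted, whether or not it is still online.
    """
    others = set(online_supplies) - {target}
    return len(others) >= 2


if __name__ == "__main__":
    online = ["/SYS/PS0", "/SYS/PS2", "/SYS/PS3"]
    # PS1 has failed; three other supplies are online, so the swap is allowed.
    print(can_hot_swap_power_supply("/SYS/PS1", online))
```

Excluding the target from the count keeps the check correct when the supply being replaced still reports as online.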
135. pplies on page 91
- Server Components on page 181
Servicing Hard Drives
This topic includes the following:
- About Hard Drives on page 78
- Remove a Hard Drive (Hot Plug) on page 79
- Install a Hard Drive (Hot Plug) on page 81
- Remove a Hard Drive on page 83
- Install a Hard Drive on page 84
- Hard Drive Device Identifiers on page 85
- Hard Drive LEDs on page 86
About Hard Drives
The hard drives in the server are hot-pluggable, but this capability depends on how the hard drives are configured. To hot-plug a drive, you must take the drive offline before you can safely remove it. Taking a drive offline prevents any applications from accessing it and removes the logical software links to it.
Caution: You must use hard drives designed for this server, which have a vented front panel to allow adequate airflow to internal system components. Installing inappropriate hard drives could result in an overtemperature condition.
The following situations inhibit your ability to hot-plug a drive:
- If the hard drive contains the operating system, and the operating system is not mirrored on another drive.
- If the hard drive cannot be logically isolated from the online operations of the server.
If your drive falls into one of these conditions, you must power off the server before you replace the hard drive.
Related Information
- Identifying Se
136.
    Description | Links
    ration | Hot Pluggable and Hot Swappable Devices on page 77
    Remove, install, and add hard drives | Servicing Hard Drives on page 78
    Remove and install fan trays | Servicing Fan Trays on page 86
    Remove and install power supplies | Servicing Power Supplies on page 91
    Remove, install, and add PCIe cards | Servicing PCIe Cards on page 98
    Remove, install, and add CMP or memory modules | Servicing CMP Memory Modules on page 104
    Remove, install, and add FB-DIMMs | Servicing FB DIMMs on page 110
    Exploded views of CRUs | Customer Replaceable Units on page 182
Related Information
- Servicing Field Replaceable Units on page 123
Hot Pluggable and Hot Swappable Devices
Hot-pluggable devices are those devices that you can remove and install while the server is running. However, you must perform administrative tasks before or after installing the hardware (for example, mounting a hard drive). The following devices are hot-pluggable:
- Hard drives
Hot-swappable devices are those devices that can be removed and installed while the server is running without affecting the rest of the server's capabilities. The following devices are hot-swappable:
- Fan trays
- Power supplies
Note: The chassis-mounted hard drives can be hot-swappable, depending on how they are configured.
Related Information
- Servicing Hard Drives on page 78
- Servicing Fan Trays on page 86
- Servicing Power Su
137. re is providing service.
The following table shows typical combinations of ALOM CMT variables and associated POST modes.
    Parameter | Normal Diagnostic Mode (Default Settings) | No POST Execution | Diagnostic Service Mode | Keyswitch Diagnostic Preset Values
    diag_mode | normal | Off | service | normal
    keyswitch_state | normal | normal | normal | diag
    diag_level | max | N/A | max | max
    diag_trigger | power-on-reset, error-reset | None | all-resets | all-resets
    diag_verbosity | normal | N/A | max | max
Description of POST execution:
- Normal Diagnostic Mode: This is the default POST configuration. This configuration tests the system thoroughly and suppresses some of the detailed POST output.
- No POST Execution: POST does not run, resulting in quick system initialization. This is not a suggested configuration.
- Diagnostic Service Mode: POST runs the full spectrum of tests with the maximum output displayed.
- Keyswitch Diagnostic Preset Values: POST runs the full spectrum of tests with the maximum output displayed.
Related Information
- Diagnostic Flowchart on page 11
- Detecting Faults Using LEDs on page 32
- Oracle ILOM to ALOM CMT Command Reference on page 57
- Sun SPARC Enterprise T5440 Server Administration Guide
- Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server
Preparing to Service the System
These topics describe how to prepare the server for servicing.
- Safety Information on page 63
- Required Tools on page 66
- Obtain t
138. repaired.
Note: Oracle ILOM does not automatically detect hard drive replacement.
Many environmental faults can automatically recover. A temperature that is exceeding a threshold might return to normal limits. An unplugged power supply can be plugged in, and so on. Recovery of environmental faults is automatically detected.
Note: No Oracle ILOM command is needed to manually repair an environmental fault.
The Predictive Self-Healing technology does not monitor the hard drive for faults. As a result, the service processor does not recognize hard drive faults and will not light the fault LEDs on either the chassis or the hard drive itself. Use the Oracle Solaris message files to view hard drive faults.
Related Information
- Diagnostic Flowchart on page 11
- Detecting Faults Using LEDs on page 32
- Detecting Faults (Oracle Solaris OS Files and Commands) on page 37
- Sun SPARC Enterprise T5440 Server Installation and Setup Guide
- Sun SPARC Enterprise T5440 Server Administration Guide
- Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server
ALOM CMT Compatibility Shell Overview
The default shell for the service processor is the Oracle ILOM shell. However, you can use the ALOM CMT compatibility shell to emulate the ALOM CMT interface supported on the previous generation of CMT servers. Using the ALOM CMT compatibility shell (with a few exceptions), you can use co
139. rver Components on page 1
- Managing Faults on page 9
- Powering Off the System on page 67
- Hot Pluggable and Hot Swappable Devices on page 77
- Hard Drive Device Identifiers on page 85
- Hard Drive LEDs on page 86
- Server Components on page 181
Remove a Hard Drive (Hot Plug)
Removing a hard drive from the server is a three-step process. You must first identify the drive you want to remove, unconfigure that drive from the server, and then manually remove the drive from the chassis.
Note: See Hard Drive Device Identifiers on page 85 for information about identifying hard drives.
Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
1. At the Solaris prompt, issue the cfgadm -al command to list all drives in the device tree, including drives that are not configured. Type:
    # cfgadm -al
This command should identify the Ap_id for the hard drive you wish to remove, as in EXAMPLE: Sample Ap_id Output on page 81.
2. Issue the cfgadm -c unconfigure command to unconfigure the disk. For example, type:
    # cfgadm -c unconfigure c0::dsk/c1t1d1
where c0::dsk/c1t1d1 is the disk that you are trying to unconfigure.
3. Wait until the blue Ready-to-Remove LED lights. This LED will help you identify which drive is unconfigured and can be removed.
4. On the drive you plan to remove, push the hard drive release butt
140.
    LED or Button | Description
    Locator LED and button (white) | The Locator LED enables you to find a particular system. The LED is activated using one of the following methods: the ALOM CMT command setlocator on; the Oracle ILOM command set /SYS/LOCATE value=Fast_Blink; manually pressing the Locator button to toggle the Locator LED on or off. This LED provides the following indications. Off: Normal operating state. Fast blink: System received a signal as a result of one of the methods previously mentioned, indicating that it is active.
    Service Required LED (amber) | If on, indicates that service is required. POST and Oracle ILOM are two diagnostics tools that can detect a fault or failure resulting in this indication. The Oracle ILOM show faulty command provides details about any faults that cause this indicator to light. Under some fault conditions, individual component fault LEDs are lit in addition to the system Service Required LED.
    Power OK LED (green) | Provides the following indications. Off: Indicates that the system is not running in its normal state. System power might be off. The service processor might be running. Steady on: Indicates that the system is powered on and is running in its normal operating state. No service actions are required. Fast blink: Indicates the system is running at a minimum level in standby and is ready to be quickly returned to full functio
    Power button |
141. 3. Press down evenly to plug the service processor into the motherboard.
4. Secure the service processor with the two captive No. 2 Phillips screws.
Next Steps
- Install the Top Cover on page 158
- Slide the Server Into the Rack on page 159
- Connect the Power Cords to the Server on page 161
- Power On the Server on page 161
Servicing the IDPROM
The IDPROM stores system parameters, such as host ID and MAC address, Oracle ILOM configuration settings, and OpenBoot PROM configuration settings. If you are replacing a faulty service processor, you must move the IDPROM from the old service processor to the new one.
- Remove the IDPROM on page 131
- Install the IDPROM on page 132
Related Information
- Servicing the Service Processor on page 128
- Servicing the Battery on page 133
Remove the IDPROM
Before you begin, complete these tasks:
- Read the section Safety Information on page 63.
- Power off the server using one of the methods described in the section Powering Off the System on page 67.
- Extend the Server to the Maintenance Position on page 70
- Disconnect Power Cords From the Server on page 68
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
- Remove the Service Processor on page 128
1. Lift the IDPROM up off its connector on the servi
142. s on page 9
- Hot Pluggable and Hot Swappable Devices on page 77
- Fan Tray Fault LED on page 91
Fan Tray Fault LED
Each fan tray contains a Fault LED that is located on the top panel of the server. The LED is visible when you slide the server partially out of the rack.
    LED   | Color | Notes
    Fault | Amber | This LED is lit when the fan tray is faulty.
The front panel Fan Fault LED and the front and rear panel Service Required LEDs are also lit if the system detects a fan tray fault. In addition, the system Overtemp LED might be lit if a fan fault causes an increase in system operating temperature. See Front Panel LEDs on page 5 and Rear Panel LEDs on page 8 for more information about system status LEDs.
Related Information
- Managing Faults on page 9
- Hot Pluggable and Hot Swappable Devices on page 77
- Fan Tray Fault LED on page 91
Servicing Power Supplies
This topic includes the following:
- About Power Supplies on page 92
- Remove a Power Supply (Hot Swap) on page 92
- Install a Power Supply (Hot Swap) on page 93
- Remove a Power Supply on page 95
- Install a Power Supply on page 96
- Power Supply Device Identifiers on page 97
- Power Supply LED on page 97
About Power Supplies
The server is equipped with redundant hot-swappable power supplies. Redundant
143. setlocator on; the Oracle ILOM command set /SYS/LOCATE value=Fast_Blink; manually pressing the Locator button to toggle the Locator LED on or off. This LED provides the following indications:
- Off: Normal operating state.
- Fast blink: System received a signal as a result of one of the methods previously mentioned, indicating that it is active.
Service Required LED (amber): If on, indicates that service is required. POST and Oracle ILOM are two diagnostics tools that can detect a fault or failure resulting in this indication. The Oracle ILOM show faulty command provides details about any faults that cause this indicator to light. Under some fault conditions, individual component fault LEDs are lit in addition to the system Service Required LED.
Power OK LED (green): Provides the following indications:
- Off: Indicates that the system is not running in its normal state. System power might be off. The service processor might be running.
- Steady on: Indicates that the system is powered on and is running in its normal operating state. No service actions are required.
- Fast blink: Indicates the system is running at a minimum level in standby and is ready to be quickly returned to full function. The service processor is running.
- Slow blink: Indicates that a normal transitory activity is taking place. Slow blinking could indicate the system diagnostics are running, or that the system is booting.
Related Information
- Re
144. t drive to the drive slot. Hard drives are physically addressed according to the slot in which they are installed. If you removed an existing hard drive from a slot in the server, you must install the replacement drive in the same slot as the drive that was removed.
3. Slide the drive into the drive slot until it is fully seated.
4. Close the latch to lock the drive in place.
5. At the Solaris prompt, type the cfgadm -al command to list all drives in the device tree, including any drives that are not configured. Type:
    # cfgadm -al
This command should help you identify the Ap_id for the hard drive you installed. For an output example, refer to EXAMPLE: Sample Ap_id Output on page 83.
6. Type the cfgadm -c configure command to configure the disk. For example, type:
    # cfgadm -c configure c0::sd1
where c0::sd1 is the disk that you are trying to configure.
7. Wait until the blue Ready-to-Remove LED is no longer lit on the drive that you installed.
8. At the Solaris prompt, type the cfgadm -al command to list all drives in the device tree, including any drives that are not configured. Type:
    # cfgadm -al
This command should identify the Ap_id for the hard drive that you installed. The drive you installed should be configured.
9. Type the iostat -E command. Type:
    # iostat -E
T
145. t occur. Repeat Step 2 through Step 4 until all the FB-DIMMs are installed.
Next Steps
- Install a CMP Memory Module on page 107
- Install the Top Cover on page 158
- Slide the Server Into the Rack on page 159
- Power On the Server on page 161
FB-DIMM Configuration
This topic includes the following:
- Supported FB DIMM Configurations on page 116
- Memory Bank Configurations on page 117
Supported FB-DIMM Configurations
Use these FB-DIMM configuration rules to help you plan the memory configuration of your server:
- Up to 16 FB-DIMMs can be installed in each CMP/memory module pair.
- Each bank consists of four FB-DIMMs.
- Each bank must be populated completely, never partially.
- For each CPU/memory module pair, all FB-DIMMs must be of the same capacity: either 2 GB, 4 GB, or 8 GB per FB-DIMM.
- Memory bank 0 must always be populated.
- Memory bank 1 must be populated before banks 2 and 3.
- Memory banks 2 and 3 must be populated simultaneously and completely.
- The number of FB-DIMMs installed on a processor and its associated memory expansion module must be either 4, 8, or 16. No other combinations are supported.
- 4 GB FB-DIMMs at 800 MHz are available for 1.6 GHz systems only and cannot be mixed with other FB-DIMMs of different speed within the same system.
Memory Bank Configurations
The following table describes the supported memory configurations and the order in which FB DIM
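The population rules listed under Supported FB-DIMM Configurations can be checked mechanically before a memory layout is installed. The following is a minimal sketch, assuming a layout is described as a mapping from bank number (0 to 3) to the list of FB-DIMM capacities, in GB, in that bank. The name check_fb_dimm_layout and the data shape are hypothetical, and the 800 MHz speed-mixing rule is not modeled.

```python
SUPPORTED_CAPACITIES_GB = {2, 4, 8}
DIMMS_PER_BANK = 4


def check_fb_dimm_layout(banks):
    """Return a list of rule violations for one CMP/memory module pair.

    banks: dict mapping bank number (0-3) to a list of FB-DIMM capacities
    in GB. An empty list or a missing key means the bank is unpopulated.
    An empty result means the layout satisfies the rules checked here.
    """
    problems = []
    populated = {bank for bank, dimms in banks.items() if dimms}

    # Each populated bank must be completely filled with four FB-DIMMs.
    for bank in sorted(populated):
        if len(banks[bank]) != DIMMS_PER_BANK:
            problems.append(f"bank {bank} must hold exactly 4 FB-DIMMs")

    # All FB-DIMMs in the pair share one supported capacity.
    capacities = {gb for dimms in banks.values() for gb in dimms}
    if len(capacities) > 1:
        problems.append("all FB-DIMMs must have the same capacity")
    if not capacities <= SUPPORTED_CAPACITIES_GB:
        problems.append("capacity must be 2, 4, or 8 GB")

    # Bank ordering rules.
    if 0 not in populated:
        problems.append("bank 0 must always be populated")
    if (2 in populated or 3 in populated) and 1 not in populated:
        problems.append("bank 1 must be populated before banks 2 and 3")
    if (2 in populated) != (3 in populated):
        problems.append("banks 2 and 3 must be populated together")
    return problems


if __name__ == "__main__":
    # Banks 0 and 1 filled with 4 GB FB-DIMMs: 8 FB-DIMMs in total.
    print(check_fb_dimm_layout({0: [4] * 4, 1: [4] * 4}))
```

Because banks 2 and 3 must be filled together, the bank rules alone already limit valid totals to 4, 8, or 16 FB-DIMMs, so the count rule needs no separate check in this sketch.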
146. tes a console command after completion of the reset command.
resetsc [-y]: -y enables you to skip the confirmation question.
Descriptions (continued from the preceding table rows):
- Enables control of the firmware during system initialization with the following options: normal is the default boot mode; reset_nvram resets OpenBoot PROM parameters to their default values; bootscript=string enables the passing of a string to the boot command.
- Performs a poweroff followed by poweron.
- Powers off the host server.
- Powers on the host server.
- Indicates if it is okay to perform a hot swap of a power supply. This command does not perform any action, but it provides a warning if the power supply should not be removed because the other power supply is not enabled.
- Generates a hardware reset on the host server.
- Reboots the service processor.

Oracle ILOM Command, ALOM CMT Command, and Description:
Oracle ILOM: set /SYS keyswitch_state=value (normal | stby | diag | locked)
ALOM CMT: setkeyswitch [-y] value (normal | stby | diag | locked). -y enables you to skip the confirmation question when setting the keyswitch to stby.
Oracle ILOM: set /SYS/LOCATE value=value (Fast_blink | Off)
ALOM CMT: setlocator value (on | off)
Oracle ILOM: No Oracle ILOM equivalent
ALOM CMT: showenvironment
Oracle ILOM: show faulty
ALOM CMT: showfaults [-v]
Oracle ILOM: No Oracle ILOM equivalent
ALOM CMT: showfru [-g lines] [-s | -d] [FRU]. -g lines specifies the number of lines to display before pausing the
Oracle ILOM: show /SYS keyswitch_state
147. th sides of the server.
FIGURE: Sliding the server into the rack
Figure Legend
1 Inner rail release button
2 Slide rail lock
2. While pushing on the release buttons, slowly push the server into the rack. Ensure that the cables do not get in the way.
3. If necessary, reattach the CMA.
a. Attach the CMA support strut to the inner glide.
b. Attach the CMA to the inner glide. Slide the hinge plate into the end of the outer rail until the retaining pin snaps into place.
4. Reconnect the cables to the back of the server. If the CMA is in the way, slide the server partially out of the cabinet to access the necessary rear panel connections.

Connect the Power Cords to the Server
Reconnect both power cords to the power supplies.
Note: As soon as the power cords are connected, standby power is applied. Depending on the configuration of the firmware, the system might boot. See the Sun SPARC Enterprise T5440 Server Administration Guide for configuration and power-on information.

Power On the Server
To power on the server, do one of the following:
- To initiate the power-on sequence from the service processor prompt, issue the poweron command. You will see an Alert message on the system console. This message indicates that the system is reset. You will also see a message indicating that the
148. the two No. 2 Phillips screws.
4. Lift the front control panel up and out of the system.
5. Place the front control panel on an antistatic mat.

Install the Front Control Panel
1. Lower the front control panel into the system.
2. Install the two No. 2 Phillips screws.
3. Plug the front control panel connector into the front I/O board.
4. Plug the front control panel connector into J9901 on the motherboard.
Next Steps
- Install the Fan Tray Carriage on page 138
- Install a Fan Tray on page 90
- Install the Top Cover on page 158
- Install the Server Into the Rack on page 158
- Connect the Power Cords to the Server on page 161
- Power On the Server on page 161

Servicing the Front I/O Board
The front I/O board contains two USB connectors. You must remove the front control panel to service the front I/O board. This topic includes the following:
- Remove the Front I/O Board on page 155
- Install the Front I/O Board on page 156
Related Information
- Infrastructure Boards and Cables on page 1
- Front Panel Diagram on page 3
- Servicing the Front Control Panel on page 152

Remove the Front I/O Board
Before you begin, complete these tasks:
- Read the section Safety Information on page 63
- Power off the server using one of the methods described in the section Powering
149. tion Safety Information on page 63
- Power off the server using one of the methods described in the section Powering Off the System on page 67
- Disconnect Power Cords From the Server on page 68
- Remove the Server From the Rack on page 71
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
- Remove a PCIe Card on page 98
  Note: You must remove all PCIe cards. Note the location of all PCIe cards so you can install them in the correct slots during reassembly.
- Remove the Service Processor on page 128
- Remove a CMP/Memory Module on page 106
  Note: You must remove all CMP and memory modules.
- Remove a Fan Tray on page 89
  Note: You must remove all four fan trays.
- Remove the Fan Tray Carriage on page 137
1. Remove the CMP/memory module bracket. The bracket is secured with six captive No. 2 Phillips screws. See Motherboard Fastener Locations on page 147.
2. Remove the flex cable retainer. Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the chassis.
3. Unplug the flex cable from J9801 on the motherboard.
4. Unplug the auxiliary power cable from 9803 on the motherboard.
5. Unplug the front I/O connector from J9901 on t
150. tton. Then, to execute enabled tests, click the Start Tests button. To restart a completed test, reset the test counters by clicking the Reset Results button.
[Figure: SunVTS browser interface. Legible labels: System Status (idle), Elapsed Time, Edit Global Options, Start Tests, Stop Tests, Reset Results, Reprobe, and a Test Results table with the columns Test, State, Scheduling Policy, Stress, Progress Indicator, and Test Status. The test categories Disk, Interconnect, Ioports, Memory, Network, and Processor are shown as Enabled, Time, high, idle.]
4. (Optional) Select the test categories you want to run. Certain test categories are enabled by default. You can choose to accept these.
Note: SunVTS Tests on page 45 lists test categories that are especially useful to run on this server.
5. (Optional) Customize individual tests. Click the name of the test to select and customize individual tests.
Tip: Use the System Exerciser High Stress Mode to test system operations. Use the Component Stress High setting for the highest stress possible.
6. Start testing. Click the Start Tests button. Status and error messages appear in the test messages area located across the bottom of the window. You can stop testing at any time by clicking the Stop button. During testing, the Oracle VTS softw
151. twork management port, 177
serial management port, 176
serial port (DB-9), 178
USB ports, 179
power cords
  plugging into server, 161
  unplugging before servicing the system, 65
power distribution board
  about, 2
  installing, 136
  removing, 134
power off, 68
Power OK (system LED), 13
power supply
  about, 92
  AC Present LED, 13, 97
  DC OK LED, 97
  device identifiers, 97
  Fault LED, 33, 92, 98
  hot-swapping, 93, 96
  installing, 93, 96
  removing, 92, 95
Power Supply Fault (system LED)
  about, 5, 98
  interpreting to diagnose faults, 33
  using to verify successful power supply replacement, 94
powercycle command, 30, 59
powering off server
  emergency shutdown, 68
  from service processor prompt, 67
  graceful shutdown, 68
  service processor command, 67
powering on
  at service processor prompt, 161
  following emergency shutdown triggered by top panel removal, 158, 161
  using Power button, 161
poweron command, 59
power-on self-test (POST), 20
  about, 20
  components disabled by, 55
  configuration flowchart, 21
  controlling output, 28
  error messages, 45
  fault clearing, 51
  faults detected by, 13, 35
  faulty components detected by, 51
  parameters, changing, 29
  running in maximum mode, 30
  troubleshooting with, 15
  using for fault diagnosis, 14
Predictive Self-Healing (PSH)
  about, 19
  clearing faults, 53
  faults detected by, 13
  faults displayed by ILOM, 35
  memory faults, 22
PSH, See Predictive Self
152. u sure you want to start /SYS (y/n)? y
Starting /SYS
Note: In the Oracle ILOM shell, there is no notification when the system is actually powered off. Powering off takes about a minute. Use the show /HOST command to determine if the host has powered off.

Oracle ILOM to ALOM CMT Command Reference
The following table describes the typical commands for servicing a server. For descriptions of all ALOM CMT commands, issue the help command or refer to the following documents:
- Oracle Integrated Lights Out Manager 3.0 Concepts Guide
- Oracle Integrated Lights Out Manager 3.0 Supplement for the Sun SPARC Enterprise T5440 Server

Oracle ILOM Command, ALOM CMT Command, and Description:
Oracle ILOM: help [command]
ALOM CMT: help [command]
Oracle ILOM: set /HOST send_break_action=true
ALOM CMT: break [-y] [-c] [-D]. -y skips the confirmation question. -c executes a console command after the break command completes. -D forces a core dump of the Oracle Solaris OS.
Oracle ILOM: set /SYS/component clear_fault_action=true
ALOM CMT: clearfault UUID
Oracle ILOM: start /SP/console
ALOM CMT: console [-f]. -f forces the console to have read and write capabilities.
Oracle ILOM: show /SP/console/history
ALOM CMT: consolehistory [-b lines | -e lines | -v] [-g lines] [boot | run]. The following options enable you to specify how the output is displayed: -g lines specifies the number of lines to display before pausing. -e lines displays n lines from the end of the buffer. -b lines
153. upply is unplugged or if no AC power is present.
Blinking green: AC present, system in standby mode. AC power is present and the system is in standby.
Green: AC present, system powered on. The system is powered on.
Blinking amber: Fault. Voltage, overcurrent, or other power fault.
Amber: Fault. Internal power supply failure or power supply fan failure.

The following LEDs are lit when a power supply fault is detected:
- Front and rear Service Required LEDs
- Rear PS Failure LED on the bezel of the server
- Fault LED mode on the faulty power supply
The front and rear panel Service Required LEDs are also lit if the system detects a power supply fault. See Front Panel LEDs on page 5 and Rear Panel LEDs on page 8 for more information about identifying and interpreting system LEDs. See Power Supply LED on page 97 for specific information about power supply status LEDs.
Related Information
- Managing Faults on page 9
- Hot-Pluggable and Hot-Swappable Devices on page 77
- Front Panel LEDs on page 5
- Rear Panel LEDs on page 8

Servicing PCIe Cards
This topic includes the following:
- Remove a PCIe Card on page 98
- Install a PCIe Card on page 99
- Add a PCIe Card on page 100
- PCIe Device Identifiers on page 101
- PCIe Slot Configuration Guidelines on page 102

Remove a PCIe Card
Before you begin, complete these tasks:
- Read the section Safety Information on page 63
154. ve the Service Processor
Before you begin, complete these tasks:
- Read the section Safety Information on page 63
- Power off the server using one of the methods described in the section Powering Off the System on page 67
- Extend the Server to the Maintenance Position on page 70
- Disconnect Power Cords From the Server on page 68
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
Do the following:
1. Ensure that the power cords are disconnected from the server.
2. Loosen the two captive No. 2 Phillips screws securing the service processor to the motherboard.
3. Lift the service processor up and out of the system.
4. Place the service processor on an antistatic mat.
Next Steps
If you are replacing a faulty service processor, you must install the IDPROM onto the new service processor. Do the following:
- Remove the IDPROM from the old service processor. See Remove the IDPROM on page 131.
- Install the IDPROM onto the new service processor. See Install the IDPROM on page 132.

Install the Service Processor
1. Ensure that the power cords are disconnected from the system.
2. Lower the service processor into position. Ensure that the service processor is oriented correctly over the motherboard connector and the two snap on standoff
155. vironmental or configuration
[FIGURE: Diagnostic Flowchart. The diagram text did not survive extraction intact. Legible decision and action labels:
4. Does SunVTS report any faulty devices?
5. Does POST report any faulty devices?
7. Is the fault an External I/O Expansion Unit fault?
8. Is the fault a PSH-detected fault?
9. The fault is a POST-detected fault.
10. Contact Sun Support if the fault condition persists.
Legible action boxes: Identify faulty FRU from the fault message and replace the FRU. Identify faulty FRU from the SunVTS message and replace the FRU. Identify the fault condition from the fault message. Isolate and replace the faulty FRU in the External I/O Expansion Unit. Identify and replace the faulty FRU from the PSH message and perform the procedure to clear the PSH-detected fault. Identify and replace the faulty FRU from the POST message and perform the procedure to clear the POST-detected faults.]
Numbers in this flowchart correspond to the Action numbers in Table 2-1.

TABLE: Diagnostic Flowchart Actions
Columns: Action No., Diagnostic Action, Resulting Action, For more information
Action No. 1. Diagnostic action: Check Power OK and AC Present. Resulting action: The Power OK LED is located on the front and rear of the chassis. For more information: Detecting Faults on page 32
156. wchart on page 11
- POST Fault Management Overview on page 20
- POST Fault Management Flowchart on page 21
- Sun SPARC Enterprise T5440 Server Administration Guide

Identifying Faults Detected by PSH
When a PSH fault is detected, an Oracle Solaris console message is displayed, similar to the following example.
EXAMPLE: Console Message Showing Fault Detected by PSH
SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005
PLATFORM: SUNW,system_name, CSN: -, HOSTNAME: wgs48-37
SOURCE: cpumem-diagnosis, REV: 1.5
EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
DESC: The number of errors associated with this memory module has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4V-8000-DX for more information.
AUTO-RESPONSE: Pages of memory associated with this memory module are being removed from service as errors are reported.
IMPACT: Total system memory capacity will be reduced as pages are retired.
REC-ACTION: Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u EVENT_ID to identify the module.

Faults detected by the Oracle Solaris PSH facility are also reported through service processor alerts.
Note: You can configure Oracle ILOM to generate SNMP traps or e-mail alerts when a fault is
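The REC-ACTION field tells the operator to run fmdump -v -u with the event ID from the message. As a rough sketch of that step, the Python below pulls the event ID out of a console message and builds the command line. It assumes the message carries a line of the form EVENT-ID: followed by a UUID (the hyphens and colon are an assumption about the layout, since the extracted example lost its punctuation), and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumed layout: "EVENT-ID: <8-4-4-4-12 hexadecimal UUID>".
EVENT_ID_RE = re.compile(
    r"EVENT-ID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)

# Shortened sample built from the values in the example message.
SAMPLE_MESSAGE = """\
SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
REC-ACTION: Schedule a repair procedure to replace the affected memory module.
"""


def fmdump_command(console_message: str) -> Optional[str]:
    """Return the fmdump command for the first event ID found, or None."""
    match = EVENT_ID_RE.search(console_message)
    if match is None:
        return None
    return f"fmdump -v -u {match.group(1)}"


if __name__ == "__main__":
    print(fmdump_command(SAMPLE_MESSAGE))
```

Returning None when no event ID is present keeps the caller from running fmdump with an empty argument.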
157. wer Supplies on page 91
- Servicing the Power Distribution Board on page 134
- Servicing the Hard Drive Backplane on page 139
- Servicing the Motherboard on page 143

Remove the Flex Cable Assembly
Before you begin, complete these tasks:
- Read the section Safety Information on page 63
- Power off the server using one of the methods described in the section Powering Off the System on page 67
- Extend the Server to the Maintenance Position on page 70
- Perform Electrostatic Discharge Antistatic Prevention Measures on page 73
- Remove the Top Cover on page 73
Do the following:
1. Unplug the power cords.
2. Remove the flex cable retainer. Loosen the captive No. 2 Phillips screw and lift the retainer up and out of the chassis.
3. Unplug the flex cable to power supply backplane connection.
4. Unplug the flex cable to hard drive backplane connection.
5. Unplug the flex cable to DVD-ROM drive connection.
6. Unplug the flex cable to motherboard connection.
7. Lift the flex cable up and out of the system.

Install the Flex Cable Assembly
1. Ensure the power cables are unplugged.
2. Plug in the motherboard connector.
3. Plug in the hard drive backplane connector.
4. Plug in the DVD-ROM drive connector.
5. Plug in the power supply backplane connector.