SPARC T3-1B Server Module Service Manual
Contents
1. Documentation Links

   All Oracle products: http://www.oracle.com/documentation
   SPARC T3-1B server module: http://www.oracle.com/pls/topic/lookup?ctx=SPARCT3-1B
   Sun Blade 6000 modular system: http://www.oracle.com/pls/topic/lookup?ctx=E19938-01
   Oracle Integrated Lights Out Manager (Oracle ILOM): http://www.oracle.com/pls/topic/lookup?ctx=ilom30
   Oracle Solaris OS and other system software: http://www.oracle.com/technetwork/indexes/documentation/#sys_sw
   Oracle VTS software: http://www.oracle.com/pls/topic/lookup?ctx=OracleVTS7.0
   SAS-1/SAS-2 Compatibility: http://www.oracle.com/pls/topic/lookup?ctx=E22513_01

   Feedback

   Provide feedback about this documentation at: http://www.oracle.com/goto/docfeedback

   SPARC T3-1B Server Module Service Manual, July 2012

   Support and Accessibility

   Access electronic support through My Oracle Support: http://support.oracle.com
   For hearing impaired: http://www.oracle.com/accessibility/support.html
   Learn about Oracle's commitment to accessibility: http://www.oracle.com/us/corporate/accessibility/index.html

   Identifying Components

   These topics explain the components of the server module, focusing on the components that can be removed and replaced for service.
   - Front and Rear Panel Components on page 2
   - Illustrated Parts Breakdown on
2. Step 4. Confirm that the server module is in standby mode by viewing the blue Ready to Remove LED on the front of the server module. See Front and Rear Panel Components on page 2 to locate this LED. If the Ready to Remove LED is on, the server module is ready for removal from the modular system chassis.

   Related Information
   - Shut Down the Oracle Solaris OS on page 56
   - Power Off the Server Module (Power Button Standby Mode) on page 57
   - Power Off the Server Module (Emergency Shutdown) on page 58

   Remove the Server Module From the Modular System

   Before performing this task, review the following cautions.

   Caution: A server module can weigh as much as 17 pounds (8.0 kg). During removal, hold the server module firmly with both hands.
   Caution: Do not stack server modules higher than five units tall.
   Caution: Insert a filler panel into the empty server module slot within 60 seconds after removing a server module to ensure proper modular system chassis cooling.

   1. If a cable is connected to the front of the server module, disconnect it. Press the buttons on either side of the UCP to release the connector.
   2. Open both ejector arms (panel 2). Squeeze both latches on each of the two ejector arms.
   3. Pull the server module halfway out (pan
3. on page 44
   - Checking if Oracle VTS Software Is Installed on page 48

   Related Information
   - Preparing for Service on page 51

   Diagnostics Overview

   You can use a variety of diagnostic tools, commands, and indicators to monitor and troubleshoot a server module:
   - LEDs: Provide a quick visual notification of the status of the server module and of some of the FRUs.
   - Oracle ILOM: This firmware runs on the SP. In addition to providing the interface between the hardware and OS, Oracle ILOM also tracks and reports the health of key server module components. Oracle ILOM works closely with POST and Oracle Solaris PSH technology to keep the system running even when there is a faulty component. You can log in to multiple SP accounts simultaneously and have separate Oracle ILOM shell commands executing concurrently under each account.
     Note: Unless indicated otherwise, all examples of interaction with the SP are depicted with Oracle ILOM shell commands.
   - POST: POST performs diagnostics on system components upon system reset to ensure the integrity of those components. POST can be configured and works with Oracle ILOM to take faulty components offline if needed.
   - Oracle Solaris PSH: This technology continuously monitors the health of the CPU, memory, and other components, and works with Oracle ILOM to take a faulty component offline if needed. The PSH technology enables systems to accurately predict component failures and
4. For possible values for the keyswitch_state parameter, see Oracle ILOM Properties That Affect POST Behavior on page 33.

   3. If the virtual keyswitch is set to normal, and you want to define the mode, level, verbosity, or trigger, set the respective parameters.
      Syntax:
         set /HOST/diag property=value
      See Oracle ILOM Properties That Affect POST Behavior on page 33 for a list of parameters and values.
      Examples:
         -> set /HOST/diag mode=normal
      or
         -> set /HOST/diag verbosity=max

   4. To see the current values for settings, use the show command.
      Example showing default values:
         -> show /HOST/diag

          /HOST/diag
             Targets:

             Properties:
                 error_reset_level = max
                 error_reset_verbosity = normal
                 hw_change_level = max
                 hw_change_verbosity = normal
                 level = max
                 mode = normal
                 power_on_level = max
                 power_on_verbosity = normal
                 trigger = hw-change error-reset
                 verbosity = normal

             Commands:
                 cd
                 set
                 show

   Related Information
   - POST Overview on page 32
   - Oracle ILOM Properties That Affect POST Behavior on page 33
   - Run POST With Maximum Testing on page 37
   - Interpret POST Fault Messages on page 39
   - Clear POST Detected Faults on page 40

   Run POST With Maximum Testing

   This procedure describes how to configure the server module to run the maximum level of POST.

   1. Access the Oracle ILOM -> prompt. See
5. Access the SP (Oracle ILOM) on page 15.

   2. Set the virtual keyswitch to diag so that POST will run in service mode.
         -> set /SYS keyswitch_state=diag
         Set 'keyswitch_state' to 'Diag'

   3. Reset the system so that POST runs. There are several ways to initiate a reset. The following example shows a reset by issuing commands that will power cycle the host.
         -> stop /SYS
         Are you sure you want to stop /SYS (y/n)? y
         Stopping /SYS
         -> start /SYS
         Are you sure you want to start /SYS (y/n)? y
         Starting /SYS

      Note: The server module takes about one minute to power off. Type the show /HOST command to determine when the host has been powered off. The console will display status=Powered Off.

   4. Switch to the system console to view the POST output.
         -> start /HOST/console

      The following example shows abridged POST output.
         0>SPARC T3-1B POST 4.32.1.b 2010/11/15 21:42
         0>
         0>Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
         0>POST enabling CMP 0 threads: ffffffff.ffffffff.ffffffff.ffffffff
         0>Diag mode 1 Normal
         0>Diag level 1 Max
         0>Diag verbosity 2 Normal
         0>Test Memory Done
         0>Setup POST Mailbox Done
         0>Master CPU Tests B
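The maximum-testing procedure above can be summarized as an ordered command list. The following Python sketch is illustrative only: the function names are hypothetical, the command strings assume the usual Oracle ILOM path syntax (slashes and `=`, which the extracted text no longer shows), and the power-off check assumes that the `show /HOST` output contains the words "Powered Off", as the note in step 3 suggests.

```python
def max_post_commands():
    """Return, in order, the ILOM commands used to run POST at the maximum level.

    The strings follow the procedure text; the exact punctuation is an
    assumption based on common Oracle ILOM CLI syntax.
    """
    return [
        "set /SYS keyswitch_state=diag",  # step 2: POST runs in service mode
        "stop /SYS",                      # step 3: power off the host
        "start /SYS",                     # step 3: power the host back on
        "start /HOST/console",            # step 4: watch the POST output
    ]


def host_is_powered_off(show_host_output):
    """Return True if captured `show /HOST` output reports a powered-off host.

    Assumes the status line contains the words "Powered Off".
    """
    return "powered off" in show_host_output.lower()
```

A script driving the SP would typically wait until `host_is_powered_off` returns True (about one minute, according to the note) before it issues `start /SYS`.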
6. Checking if Oracle VTS Software Is Installed 48
   Oracle VTS Overview 49
   Check if Oracle VTS Software Is Installed 50
   Preparing for Service 51
   General Safety Information 51
   Safety Symbols 52
   ESD Safety Measures 52
   Antistatic Wrist Strap Use 52
   Antistatic Mat 52
   Tools Needed for Service 53
   Find the Modular System Serial Number 53
   Find the Server Module Serial Number 54
   Locate the Server Module 55
   Removing the Server Module From the Modular System for Service 55
   Shut Down the Oracle Solaris OS 56
   Power Off the Server Module (Power Button Standby Mode) 57
   Power Off the Server Module (Emergency Shutdown) 58
   Prepare the Server Module for Removal 58
   Remove the Server Module From the Modular System 59
   Remove the Cover 62
   Servicing Hard Drives 63
   Drive Hot-Plugging Rules 63
   Remove a Drive 64
   Replace or Add a Drive 65
   Remove a Drive Filler 67
   Install a Drive Filler 67
   Servicing Memory 69
   Memory Faults 69
   Locate a Faulty DIMM (LEDs) 70
   Remove a DIMM 73
   Install a Replacement DIMM 74
   Clear the Fault and Verify the Functionality of the Replacement DIMM 75
   Verify DIMM Functionality 79
   DIMM Configuration Reference 81
   Servicing a REM 85
   Remove a REM 85
   Install a REM 86
   Servicing a FEM 89
   Remove a FEM 89
   Install a FEM 90
   Servicing a Service Processor
7. July 2012 2010 07 03 18 44 14 804 0 7 2 gt 1 VEU 57 R W1C Set to 1 on an UE if VEF 0 and no fatal error is detected in same cycle 2010 07 03 18 44 14 983 0 7 2 gt 1 VEC 56 R W1C Set to 1 on a CE if VEF VEU 0 and no fatal or UE is detected in same cycle 2010 07 03 18 44 15 169 0 7 2 gt 1 DAU 50 R W1C Set to 1 if the error was a DRAM access UE 2010 07 03 18 44 15 304 0 7 2 gt 1 DAC 46 R W1C Set to 1 if the error was a DRAM access CE 2010 07 03 18 44 15 440 0 7 2 gt 2010 07 03 18 44 15 486 0 7 2 gt DRAM Error Address Reg for Branch 1 00000034 8647d2e0 2010 07 03 18 44 15 614 0 7 2 gt Physical Address is 00000005 d21bc0c0 2010 07 03 18 44 15 715 0 7 2 gt DRAM Error Location Reg for Branch 1 00000000 00000800 2010 07 03 18 44 15 842 0 7 2 gt DRAM Error Syndrome Reg for Branch 1 dd1676ac 8c18c045 2010 07 03 18 44 15 967 0 7 2 gt DRAM Error Retry Reg for Branch 1 00000000 00000004 2010 07 03 18 44 16 086 0 7 2 gt DRAM Error RetrySyndrome 1 Reg for Branch 1 a8a5f81le f6411b5a 2010 07 03 18 44 16 218 0 7 2 gt DRAM Error Retry Syndrome 2 Reg for Branch 1 a8a5f8le f6411b5a 2010 07 03 18 44 16 351 0 7 2 gt DRAM Failover Location 0 for Branch 1 00000000 00000000 2010 07 03 18 44 16 475 0 7 2 gt DRAM Failover Location 1 for Branch 1 00000000 00000000 2010 07 03 18 44 16 604 0 7 2 gt 2010 07 03 18 44 16 648 0 7 2 gt ERROR POST terminated prematurel
8. Managing Faults (POST)

   These topics explain how to use POST as a diagnostic tool.
   - POST Overview on page 32
   - Oracle ILOM Properties That Affect POST Behavior on page 33
   - Configure How POST Runs on page 35
   - Run POST With Maximum Testing on page 37
   - Interpret POST Fault Messages on page 39
   - Clear POST Detected Faults on page 40
   - POST Error Message Syntax on page 42

   Related Information
   - Diagnostics Overview on page 5
   - Diagnostics Process on page 7
   - Managing Faults (Oracle ILOM) on page 12
   - Interpreting Log Files and System Messages on page 23
   - Managing Faults (Oracle Solaris PSH) on page 25
   - Managing Components (ASR Commands) on page 44
   - Checking if Oracle VTS Software Is Installed on page 48

   POST Overview

   POST is a group of PROM-based tests that run when the server module is powered on or when it is reset. POST checks the basic integrity of the critical hardware components in the server module (CMP, memory, and I/O subsystem).

   You can also run POST as a system-level hardware diagnostic tool. To do this, use the Oracle ILOM set command to set the parameter keyswitch_state to diag.

   You can also set other Oracle ILOM properties to control various other aspects of POST operations. For example, you can specify the events that cause POST to run, the level of testing POST performs, and the amount of
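As a rough illustration of how the POST-related properties could be set from a script, the following Python sketch builds `set /HOST/diag` command strings. It is a sketch under stated assumptions, not part of Oracle ILOM: the helper name is hypothetical, the property names (mode, level, verbosity, trigger) are taken from the POST configuration procedure in this manual, and the `property=value` punctuation is assumed. Valid values are not checked here; see Oracle ILOM Properties That Affect POST Behavior for those.

```python
# Property names mentioned in the POST configuration procedure (assumed complete
# only for the purpose of this sketch).
DIAG_PROPERTIES = ("mode", "level", "verbosity", "trigger")


def diag_set_command(prop, value):
    """Build a `set /HOST/diag` command string for one POST property.

    Only the property name is validated; the value is passed through unchanged.
    """
    if prop not in DIAG_PROPERTIES:
        raise ValueError(f"unexpected /HOST/diag property: {prop!r}")
    return f"set /HOST/diag {prop}={value}"
```

For example, `diag_set_command("verbosity", "max")` produces the string `set /HOST/diag verbosity=max`.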
9. show /SYS/LOCATE
   show /SP/logs/event/list
   show /HOST
   show /SYS

   - Displays a list of all available commands with syntax and descriptions. Specifying a command name as an option displays help for that command.
   - Takes the host server module from the OS to either kmdb or OBP (equivalent to a Stop-A), depending on the mode Oracle Solaris software was booted.
   - Manually clears host-detected faults. The component is the unique ID of the device with a fault to be cleared.
   - Connects you to the host system.
   - Displays the contents of the system's console buffer.
   - Controls the host server module OBP firmware method of booting. property is state, config, or script.
   - Powers off the host server module and then powers on the host server module.
   - Powers off the host server module.
   - Powers on the host server module.
   - Generates a hardware reset on the host server module.
   - Reboots the SP.
   - Sets the virtual keyswitch. value is normal, standby, diag, or locked.
   - Turns the Locator LED on the server module on or off. value is Fast_blink or Off.
   - Displays current system faults. See Check for Faults (show faulty Command) on page 18.
   - Displays the status of the virtual keyswitch.
   - Displays the current state of the Locator LED as either on or off.
   - Displays the history of all events logged in the SP event buffers (in RAM or the persistent buffers).
   - Displays information about the operating state of the host system, whether the hardware is p
10. 2010 Oracle and/or its affiliates, Inc. All rights reserved.
    Warning: password is set to factory default.

    The Oracle ILOM -> prompt indicates that you are accessing the SP with the Oracle ILOM CLI.

    4. Perform Oracle ILOM commands that provide the diagnostic information you need.
       The following Oracle ILOM commands are commonly used for fault management:
       - show command: Displays information about individual FRUs. See Display FRU Information (show Command) on page 17.
       - show faulty command: Displays environmental, POST-detected, and PSH-detected faults. See Check for Faults (show faulty Command) on page 18.
         Note: You can use fmadm faulty in the Oracle ILOM faultmgmt shell as an alternative to show faulty.
       - clear_fault_action property of the set command: Manually clears PSH-detected faults. See Clear Faults (clear_fault_action Property) on page 21.

    Related Information
    - Oracle Integrated Lights Out Manager (ILOM) 3.0 Concepts Guide
    - Display FRU Information (show Command) on page 17
    - Check for Faults (show faulty Command) on page 18
    - Check for Faults (fmadm faulty Command) on page 20
    - Clear Faults (clear_fault_action Property) on page 21
    - Service-Related Oracle ILOM Command Summary on page 21
    - Oracle ILOM Properties That Affect POST Behavior on page 33

    Display FRU Information (show Co
11. Check for PSH-Detected Faults

    Use the fmadm faulty command to display the list of faults detected by the Oracle Solaris PSH facility. You can run this command either from the host or through the Oracle ILOM fmadm shell. As an alternative, you can display fault information by running the Oracle ILOM command show.

    1. Check the event log.
          # fmadm faulty
          TIME            EVENT-ID                             MSG-ID        SEVERITY
          Aug 13 11:48:33 21a8b59e-89ff-692a-c4bc-f4c5cccca8c8 SUN4V-8002-6E Major

          Platform    : sun4v    Chassis_id : (masked)
          Product_sn  : (masked)
          Fault class : fault.cpu.generic-sparc.strand
          Affects     : cpu (CPU identifier and serial number masked)
                        faulted and taken out of service
          FRU         : /SYS/MB (product ID, product serial number, server ID, chassis ID, and serial number masked; revision 05, chassis 0, motherboard 0)
                        faulty

          Description : The number of correctable errors associated with this strand has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4V-8002-6E for more information.
          Response    : The fault manager will attempt to remove the affected strand from service.
          Impact      : System performance may be affected.
          Action      : Schedule a repair procedure to replace the affected resource, the identity of which can be determined using fmadm faulty.

       In this example, a fault is displayed, indicating the following details:
       - Date and time
12. PSH-detected faults: Faults detected by the Oracle Solaris PSH technology.

    At the -> prompt, enter the show faulty command. If a fault is displayed, check the output to determine the nature of the fault. The following examples show the different kinds of output that might be displayed:

    - Example of the show faulty command when no faults are present:

         -> show faulty
         Target                   | Property              | Value
         -------------------------+-----------------------+------------------------------

    - Example of the show faulty command displaying a fault when one of the AC inputs for power supply PS0 is not plugged in:

         -> show faulty
         Target                   | Property              | Value
         -------------------------+-----------------------+------------------------------
         /SP/faultmgmt/0          | fru                   | /SYS/PS0
         /SP/faultmgmt/0/faults/0 | class                 | fault.chassis.env.power.loss
         /SP/faultmgmt/0/faults/0 | sunw-msg-id           | SPT-8000-5X
         /SP/faultmgmt/0/faults/0 | uuid                  | 64d52ce4 614e 693 bb71 ea3 829d ad73
         /SP/faultmgmt/0/faults/0 | timestamp             | 2010-10-14/20:14:13
         /SP/faultmgmt/0/faults/0 | detector              | /SYS/PS0/S1/V_IN_ERR
         /SP/faultmgmt/0/faults/0 | product_serial_number | 1030NNDOD2
         /SP/faultmgmt/0/faults/0 | chassis_serial_number | 0000000-0000000000
         ->

    - Example of the show faulty command displaying a fault that was detected by POST. These kinds of faults are identified by the message Forced fail reason, where reason is the name of the power-on routine that detected t
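When `show faulty` output is captured from a script, the property and value columns can be collected into a dictionary. The following Python sketch is a minimal example under stated assumptions: it assumes a pipe-delimited three-column layout (Target, Property, Value) in which long targets or values may wrap onto a following row, it handles a single fault only, and the function name and sample text are hypothetical.

```python
def parse_show_faulty(text):
    """Collect property/value pairs from pipe-delimited `show faulty` rows."""
    props = {}
    last_key = None
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3:
            continue  # separator rows, prompts, and blank lines
        _target, prop, value = cells
        if prop == "Property":
            continue  # header row
        if prop:
            props[prop] = value
            last_key = prop
        elif value and last_key:
            props[last_key] += value  # a wrapped value continues on this row
    return props


# Hypothetical sample in the assumed layout, using values from the example above.
SAMPLE = """\
Target              | Property    | Value
--------------------+-------------+------------------------------
/SP/faultmgmt/0     | fru         | /SYS/PS0
/SP/faultmgmt/0/    | class       | fault.chassis.env.power.loss
 faults/0           |             |
/SP/faultmgmt/0/    | sunw-msg-id | SPT-8000-5X
 faults/0           |             |
"""
```

An empty dictionary corresponds to the "no faults are present" case, in which only the header and separator rows are printed.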
13. Related Information
    - Detecting and Managing Faults on page 5
    - Preparing for Service on page 51

    Remove a USB Flash Drive

    1. Prepare for service. See:
       - Shut Down the Oracle Solaris OS on page 56
       - Prepare the Server Module for Removal on page 58
       - Remove the Server Module From the Modular System on page 59
       - Remove the Cover on page 62
       - ESD Safety Measures on page 52
    2. Locate the USB flash drive at the rear of the server module (panel 1).
    3. Pull the drive out (panel 2).
    4. If needed, install a USB flash drive. See Install a USB Flash Drive on page 102.

    Related Information
    - Install a USB Flash Drive on page 102

    Install a USB Flash Drive

    The server module has a USB port on the motherboard. The USB port accepts USB flash drives that do not exceed a length of 39 mm.

    1. Prepare for service. See:
       - Shut Down the Oracle Solaris OS on page 56
       - Prepare the Server Module for Removal on page 58
       - Remove the Server Module From the Modular System on page 59
       - Remove the Cover on page 62
       - ESD Safety Measures on page 52
       - If needed, Remove a USB Flash Drive on page 101
    2. Locate the USB connector on the motherboard.
    3. Plug your USB flash drive into the upper port of the USB connector (panels 1 and 2). Do not use the lower port of this connector.
    4. Retur
14. time setting, 105
    tools for service, 53
    troubleshooting
       by checking Oracle Solaris OS log files, 8
       using POST, 8, 9
       using the show faulty command, 8
       using VTS, 8
15. 81 installing replacements 74 LEDs 70 locating faulty 70 low voltage types 81 removing 73 supported 81 verifying 75 79 121 disk drives 63 displaying faults 18 FRU information 17 dmesg command 24 documentation related x E ejector tabs 73 emergency shutdown 58 enclosure assembly 108 topics 107 environmental faults 8 9 14 18 ESD 52 preventing using an antistatic mat 52 preventing using an antistatic wrist strap 52 safety measures 52 ethernet port 15 F fault management 5 fault messages POST interpreting 39 faults clearing 21 detected by POST 8 detected by PSH 8 displaying 18 environmental 8 9 memory 69 PSH detected fault example 27 PSH detected checking for 28 recovery 13 repair 14 FEM installing 90 removing 89 topics 89 field replaceable units FRUs displaying status of 25 flash drive installing 102 removing 101 topics 101 fmadm command 30 75 fmadm faulty command 20 fmdump command 28 FRU ID PROMs 13 FRU information displaying 17 FRU names DIMMs 81 H hard drives 63 HDD 63 hot plugging 63 l I O subsystem 32 44 ID PROM 13 installing 98 removing 97 topics 97 verifying 99 identifying components 1 ILOM CLI 15 fault management 12 service related commands 21 troubleshooting 12 web interface 15 ILOM commands show faulty 22 installation order DIMMs 81 installing battery 105 cover 111 replacement DIMM
16. D1
    13 | Black | 16 | BOB2/CH0/D1
    14 | White | 8 16 | BOB2/CH0/D0
    15 | Black | 16 | BOB2/CH1/D1
    16 | Blue | 4 8 16 | BOB2/CH1/D0

    From top to bottom when viewed with the front panel of the server module on your left.

    Related Information
    - Memory Faults on page 69
    - Locate a Faulty DIMM (LEDs) on page 70
    - Remove a DIMM on page 73
    - Install a Replacement DIMM on page 74
    - Clear the Fault and Verify the Functionality of the Replacement DIMM on page 75

    Servicing a REM

    The server module supports the installation of one REM. Only certain REMs are supported. For a list of supported REMs, refer to the SPARC T3-1B Server Module Product Notes.

    Description   | Links
    Replace a REM | Remove a REM on page 85; Install a REM on page 86
    Install a REM | Install a REM on page 86

    Related Information
    - Detecting and Managing Faults on page 5
    - Preparing for Service on page 51

    Remove a REM

    1. Prepare for service. See:
       - Shut Down the Oracle Solaris OS on page 56
       - Prepare the Server Module for Removal on page 58
       - Remove the Server Module From the Modular System on page 59
       - Remove the Cover on page 62
       - ESD Safety Measures on page 52
    2. Lift up on the REM lever (panel 1).
    3. Rotate the card up and off the retainer (panels 2 and 3).
    4. Set the car
17. LED or Button | Color | Description

    OK | Green | Indicates the following conditions:
       - Off: System is not running in its normal state. System power might be off. The SP might be running.
       - Steady on: System is powered on and is running in its normal operating state. No service actions are required.
       - Fast blink: System is running in standby mode and can be quickly returned to full function.
       - Slow blink: A normal but transitory activity is taking place. Slow blinking might indicate that system diagnostics are running or the system is booting.

    On/Standby button | n/a | The recessed Power button toggles the system on or off.
       - Press once to turn the system on.
       - Press once to shut the system down to a standby state.
       - Press and hold for 4 seconds to perform an emergency shutdown.

    HDD Ready to Remove LED | Blue | Indicates that a hard drive can be removed during a hot-plug operation.

    HDD Service Action Required LED | Amber | Indicates that the hard drive has experienced a fault condition.

    HDD OK/Activity LED | Green | On HDDs, indicates the following drive status:
       - On: Drive is idle and available for use.
       - Off: Read or write activity is in progress.

    Related Information
    - Diagnostics Overview on page 5
    - Diagnostics Process on page 7
    - Managing Faults (Oracle ILOM) on page 12
    - Interpreting Log Files and System Messages on page 23
    - Managing Faults (Oracle Solaris PSH) on page 25
    - Managing Faults (POST) on
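For monitoring scripts or service checklists, the green OK LED states described above can be kept in a small lookup table. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, and the descriptions are shortened from the table text.

```python
# Green OK LED states and their meaning, condensed from the LED table.
OK_LED_STATES = {
    "off": "System is not running in its normal state; power might be off.",
    "steady on": "System is powered on and running normally; no service needed.",
    "fast blink": "System is in standby mode and can quickly return to full function.",
    "slow blink": "Transitory activity, such as diagnostics running or booting.",
}


def describe_ok_led(state):
    """Return the meaning of an OK LED state, ignoring case and outer spaces."""
    return OK_LED_STATES.get(state.strip().lower(), "Unknown OK LED state.")
```

For example, `describe_ok_led("Fast blink")` returns the standby-mode description.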
18. - Prepare the Server Module for Removal on page 58
    - Remove the Server Module From the Modular System on page 59
    - Remove the Cover on page 62
    - ESD Safety Measures on page 52
    - If needed, Remove a FEM on page 89

    2. Determine the correct set of motherboard FEM connectors for your FEM.
       - An L-shaped FEM card (1) uses connectors FEM X and FEM 0.
       - A rectangular double-width FEM card (2) uses connectors FEM 0 and FEM 1.
       - A rectangular single-width FEM card (3) uses connector FEM 0.
    3. Insert the FEM edge into the bracket and carefully align the FEM so that the card connects with the correct motherboard connectors.
    4. Lower the card and press the card into place. If the card has rubber bumpers, you can press directly on them to seat the card into the connectors.
    5. Return the server module to operation. See Returning the Server Module to Operation on page 111.

    Related Information
    - Remove a FEM on page 89

    Servicing a Service Processor Card

    The server module has a service processor card with firmware that provides the SP.
    - Remove the Service Processor Card on page 93
    - Install the Service Processor Card on page 94

    Related Information
    - Detecting and Managing Faults on page 5
    - Preparing for Service on page 51

    Remove the Service Proce
19. clear_fault_action=true
    Are you sure you want to clear /SYS/MB/CMP0/BOB0/CH0/D0 (y/n)? y
    Set 'clear_fault_action' to 'true'

    7. (Only if previous steps did not clear the fault.) Switch to the system console and type the fmadm repair command with the UUID. Use the same UUID that was displayed from the output of the Oracle ILOM show faulty command.
          # fmadm repair 3aa7c854-9667-e176-efe5-e487e5207a8a

    Related Information
    - Install a Replacement DIMM on page 74
    - Verify DIMM Functionality on page 79

    Verify DIMM Functionality

    1. Access the Oracle ILOM -> prompt. Refer to the SPARC T3 Series Servers Administration Guide for instructions.
    2. Use the show faulty command to determine how to clear the fault.
       - If show faulty indicates a POST-detected fault, go to Step 3.
       - If show faulty output displays a UUID, which indicates a host-detected fault, skip Step 3 and go to Step 4.
    3. Use the set command to enable the DIMM that was disabled by POST. In most cases, replacement of a faulty DIMM is detected when the SP is power cycled. In those cases, the fault is automatically cleared from the system. If show faulty still displays the fault, the set command will clear it.
          -> set /SYS/MB/CMP0/BOB0/CH0/D0 component_state=Enabled
    4. For a host-detected fault, perform the following steps to verify the new DIMM.
       a. Set the virtual keyswitch to
20. diag so that POST will run in Service mode.
          -> set /SYS keyswitch_state=Diag
          Set 'keyswitch_state' to 'Diag'

       b. Power cycle the server module host.
          -> stop /SYS
          Are you sure you want to stop /SYS (y/n)? y
          Stopping /SYS
          -> start /SYS
          Are you sure you want to start /SYS (y/n)? y
          Starting /SYS

          Note: Use the show /HOST command to determine when the host has been powered off. The console will display status=Powered Off. Allow approximately one minute before running this command.

       c. Switch to the system console to view POST output. Watch the POST output for possible fault messages. The following output indicates that POST did not detect any faults:
          -> start /HOST/console
          2>INFO:
          2>POST Passed all devices.
          2>POST: Return to VBSC.
          2>Master set ACK for vbsc runpost command and spin

          Note: The system might boot automatically at this point. If so, go directly to Step e. If it remains at the ok prompt, go to Step d.

       d. If the server module remains at the ok prompt, type boot.

       e. Return the virtual keyswitch to Normal mode.
          -> set /SYS keyswitch_state=Normal
          Set 'keyswitch_state' to 'Normal'

       f. Switch to the system console and type the Oracle Solaris OS fmadm faulty command.
          # fmadm faulty
          If any faults are reported, see the diagnostics instructions in Oracle ILOM Troubleshooting Overv
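If the console output from step c is captured to a file or string, the pass condition can be checked automatically. The following Python sketch is a minimal example, assuming that a successful run contains the phrase "POST Passed all devices" as in the sample output above; the function name is hypothetical and no other POST messages are interpreted.

```python
def post_passed(console_output):
    """Return True if captured console text reports that POST passed.

    Only the "POST Passed all devices" phrase from the sample output is
    checked; any other result is treated as not passed.
    """
    return any(
        "POST Passed all devices" in line
        for line in console_output.splitlines()
    )
```

A False result does not identify the fault; in that case, the POST fault messages and the `show faulty` output still need to be reviewed.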
21. drives and DIMMs require special handling.

    Caution: Circuit boards and hard drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.

    Antistatic Wrist Strap Use

    Wear an antistatic wrist strap and use an antistatic mat when handling components such as hard drive assemblies, circuit boards, or PCI cards. When servicing or removing server module components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Following this practice equalizes the electrical potentials between you and the server module.

    Antistatic Mat

    Place ESD-sensitive components, such as cards and DIMMs, on an antistatic mat.

    Related Information
    - Returning the Server Module to Operation on page 111

    Tools Needed for Service

    The following tools are required for service procedures:
    - Antistatic wrist strap
    - Antistatic mat
    - Stylus or pencil (to operate the power button)
    - UCP-3 dongle (a UCP-4 dongle can be used, but see instructions in the SPARC T3-1B Server Module Installation Guide)
    - Blade filler panel

    Related Information
    - General Safety Information on page 51

    Find the Modular System Serial Number

    To obtain support for your s
22. have to clear the fault manually.

    Note: This procedure clears the fault from the SP but not from the host. If the fault persists in the host, clear it manually as described in Clear PSH-Detected Faults on page 30.

    At the -> prompt, use the set command with the clear_fault_action=True property.
    Example:
       -> set /SYS/MB/CMP0/BOB0/CH0/D0 clear_fault_action=True
       Are you sure you want to clear /SYS/MB/CMP0/BOB0/CH0/D0 (y/n)? y
       Set 'clear_fault_action' to 'true'

    Related Information
    - Diagnostics Process on page 7
    - Access the SP (Oracle ILOM) on page 15
    - Display FRU Information (show Command) on page 17
    - Check for Faults (show faulty Command) on page 18
    - Check for Faults (fmadm faulty Command) on page 20
    - Service-Related Oracle ILOM Command Summary on page 21

    Service-Related Oracle ILOM Command Summary

    The following table describes the Oracle ILOM shell commands most frequently used when performing service-related tasks.

    Oracle ILOM Command | Description

    help [command]
    set /HOST send_break_action=break
    set /SYS/component clear_fault_action=true
    start /HOST/console
    show /HOST/console/history
    set /HOST/bootmode property=value
    stop /SYS; start /SYS
    stop /SYS
    start /SYS
    reset /SYS
    reset /SP
    set /SYS keyswitch_state=value
    set /SYS/LOCATE value=value
    show faulty
    show /SYS keyswitch_state
23. number of correctable errors associated with this strand has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4V-8002-6E for more information.
       Response : The fault manager will attempt to remove the affected strand from service.
       Impact   : System performance may be affected.
       Action   : Schedule a repair procedure to replace the affected resource, the identity of which can be determined using fmadm faulty.

       - If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps.
       - If a fault is reported, continue to the next step.

    3. Clear the fault from all persistent fault records. In some cases, even though the fault is cleared, some persistent fault information remains and results in erroneous fault messages at boot time. To ensure that these messages are not displayed, type the following Oracle Solaris command:
          # fmadm repair UUID
       For the UUID in the example shown in Step 2, type:
          # fmadm repair 21a8b59e-89ff-692a-c4bc-f4c5cccca8c8

    4. Use the clear_fault_action property of the FRU to clear the fault.
          -> set /SYS/MB clear_fault_action=True
          Are you sure you want to clear /SYS/MB (y/n)? y
          Set 'clear_fault_action' to 'true'

    Related Information
    - Oracle Solaris PSH Technology Overview on page 26
    - PSH-Detected Fault Example on page 27
    - Clear PSH-Detected Faults on page 30
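Steps 3 and 4 can be sketched as a helper that produces the two commands for a given fault. This Python example is illustrative only: the function name is hypothetical, the default FRU path /SYS/MB comes from the example above, and the UUID check assumes the usual 8-4-4-4-12 hexadecimal layout. The first command is an Oracle Solaris command; the second is typed at the Oracle ILOM -> prompt.

```python
import re

# Assumed UUID layout: 8-4-4-4-12 lowercase hexadecimal digits.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def clear_psh_fault_commands(uuid, fru="/SYS/MB"):
    """Return the Solaris and ILOM commands that clear one PSH-detected fault."""
    if not UUID_RE.match(uuid):
        raise ValueError(f"not a well-formed UUID: {uuid!r}")
    return [
        f"fmadm repair {uuid}",                # step 3, run on the host
        f"set {fru} clear_fault_action=True",  # step 4, run at the -> prompt
    ]
```

Validating the UUID first avoids sending a truncated or mistyped identifier to `fmadm repair`.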
24. of the fault (Aug 13 11:48:33).
       - UUID, which is unique for every fault (21a8b59e-89ff-692a-c4bc-f4c5cccca8c8).
       - Message identifier, which can be used to obtain additional fault information (SUN4V-8002-6E).
       - Faulted FRU. The information provided in the example includes the part number of the FRU (part 511127809) and the serial number of the FRU (serial 1005LCB-1019B100A2). The FRU field provides the name of the FRU (/SYS/MB for motherboard in this example).

    2. Use the message ID to obtain more information about this type of fault.
       a. Obtain the message ID from console output or from the Oracle ILOM show faulty command.
       b. Enter the message ID at the end of the PSH Knowledge Article web site, http://www.sun.com/msg/
          In the current example, enter this in the browser address window:
          http://www.sun.com/msg/SUN4V-8002-6E

       The following example shows the message ID SUN4V-8002-6E and provides information for corrective action.

          Correctable strand errors exceeded acceptable levels
          Type: Fault
          Severity: Major
          Description: The number of correctable errors associated with this strand has exceeded acceptable levels.
          Automated Response: The fault manager will attempt to remove the affected strand from service.
          Impact: System performance may be affected.
          Suggested Action for System Administrator: Schedule a repair procedure to replace the affected resource, the identity of wh
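The lookup in step 2 amounts to appending the message ID to the knowledge article address. The following Python sketch shows that, together with a simple split of the fmadm faulty summary line into its fields. It is a sketch under stated assumptions: both function names are hypothetical, and the summary line is assumed to hold the time as three whitespace-separated tokens followed by the UUID, message ID, and severity, as in the example.

```python
def knowledge_article_url(msg_id, base="http://www.sun.com/msg/"):
    """Append a message ID to the PSH knowledge article base address."""
    return base + msg_id


def parse_fault_summary(line):
    """Split one `fmadm faulty` summary line into named fields.

    Assumes the layout "Mon DD HH:MM:SS UUID MSG-ID SEVERITY".
    """
    parts = line.split()
    if len(parts) != 6:
        raise ValueError("unexpected summary line layout")
    return {
        "time": " ".join(parts[:3]),  # month, day, and clock time
        "uuid": parts[3],
        "msg_id": parts[4],
        "severity": parts[5],
    }
```

For the example fault, `knowledge_article_url(parse_fault_summary(line)["msg_id"])` yields the address shown in step 2b.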
25. on page 1
    - Detecting and Managing Faults on page 5
    - Preparing for Service on page 51

    Transfer Components to Another Enclosure Assembly

    1. Prepare to take all ESD precautions when working with both the original server module and the new enclosure assembly. Prepare to place all components on an antistatic mat unless you install each component immediately in the new enclosure assembly. Follow the precautions explained in Preparing for Service on page 51.
    2. Remove the top cover from the original server module and the new enclosure assembly. See Remove the Cover on page 62.
    3. Transfer the drives from the original server module to the enclosure assembly. See Servicing Hard Drives on page 63.
    4. Transfer the drive fillers from the original server module to the enclosure assembly. See Remove a Drive Filler on page 67 and Install a Drive Filler on page 67.
    5. Transfer the DIMMs from the original server module to the enclosure assembly. Move each DIMM to the same slot in the enclosure assembly. See Servicing Memory on page 69.
    6. Transfer the REM from the original server module to the enclosure assembly. See Servicing a REM on page 85.
    7. Transfer the FEM from the original server module to the enclosure assembly. Install the FEM in the same connectors in the enclosure assembly. See Servicing a FEM on page 89.
    8. Transfer the service processor card from the original server modul
26. page 3 Related Information m Detecting and Managing Faults on page 5 m Replacing the Server Module Enclosure Assembly on page 107 Front and Rear Panel Components FIGURE Front and Rear Components Figure Legend White LED Locator functions as the physical presence switch Blue LED Ready to Remove Amber LED Service Action Required Green LED OK Power button Reset button NMI for service use only Green LED Drive OK Amber LED Drive Service Action Required on oOo art WD 9 Blue LED Drive Ready to Remove 10 RFID sticker indicates serial number of the server module 11 Universal connector port UCP 12 Chassis power connector 13 Chassis data connector 2 SPARC T3 1B Server Module Service Manual July 2012 Related Information m Diagnostics LEDs on page 10 a Illustrated Parts Breakdown on page 3 Illustrated Parts Breakdown This topic identifies components in the server module that you can install or remove and replace The following table provides information about the replaceable components Identifying Components 3 TABLE Replaceable Components FRU 1 Hard drives 2 Replacement enclosure 3 Service processor card 4 DIMMs 5 FEM card 6 REM card 7 Clock battery 8 Connector cover 9 USB flash drive 10 ID PROM 11 Drive filler Replacement Instructions Servicing Hard Drives on page 63 Replacing the Se
27. page 31 m Managing Components ASR Commands on page 44 m Checking if Oracle VTS Software Is Installed on page 48 Detecting and Managing Faults 11 Managing Faults Oracle ILOM These topics explain how to use Oracle ILOM the SP firmware to diagnose faults and verify successful repairs a Oracle ILOM Troubleshooting Overview on page 12 m Access the SP Oracle ILOM on page 15 a Display FRU Information show Command on page 17 m Check for Faults show faulty Command on page 18 m Check for Faults fmadm faulty Command on page 20 m Clear Faults clear_fault_action Property on page 21 m Service Related Oracle ILOM Command Summary on page 21 m Oracle ILOM Properties That Affect POST Behavior on page 33 Related Information a Diagnostics Overview on page 5 m Diagnostics Process on page 7 m Interpreting Log Files and System Messages on page 23 m Managing Faults Oracle Solaris PSH on page 25 m Managing Faults POST on page 31 m Managing Components ASR Commands on page 44 m Checking if Oracle VTS Software Is Installed on page 48 m POST Overview on page 32 m Oracle ILOM Properties That Affect POST Behavior on page 33 Oracle ILOM Troubleshooting Overview The Oracle ILOM firmware enables you to remotely run diagnostics such as POST that would otherwise require physical proximity to the server module Yo
28. system is coming down. Please wait.
        svc.startd: 100 system services are now being stopped.
        Jun 28 13:06:34 dt90-366 syslogd: going down on signal 15
        svc.startd: The system is down.
        syncing file systems... done
        Program terminated
        SPARC T3-1B, No Keyboard
        OpenBoot 4.30, 16256 MB memory available, Serial #87305111.
        Ethernet address 0:21:28:34:2b:90, Host ID: 85342b90.
        {0} ok
    6. Switch from the system console to the -> prompt by typing the #. (Hash-Period) key sequence.
    7. At the -> prompt, type:
        -> stop /SYS
    Note: You can also use the Power button on the front of the server module to initiate a graceful shutdown. See Power Off the Server Module Power Button Standby Mode on page 57. This button is recessed to prevent accidental server module power off. Use the tip of a pen or other stylus to operate this button.
    Related Information
    - Power Off the Server Module Power Button Standby Mode on page 57
    - Power Off the Server Module Emergency Shutdown on page 58
    - Prepare the Server Module for Removal on page 58
    Power Off the Server Module Power Button Standby Mode
    This procedure places the server module in the power standby mode. In this mode, the Power OK LED blinks rapidly.
    Press and release the recessed Power button. Use a stylus or the tip of a pen to operate this button. See Fron
29. the replacement enclosure assembly. This action ensures that your server module will maintain the same host ID and MAC address.
    - Remove the ID PROM on page 97
    - Install the ID PROM on page 98
    - Verify the ID PROM on page 99
    Related Information
    - Detecting and Managing Faults on page 5
    - Preparing for Service on page 51
    Remove the ID PROM
    1. Prepare for service. See:
        - Shut Down the Oracle Solaris OS on page 56
        - Prepare the Server Module for Removal on page 58
        - Remove the Server Module From the Modular System on page 59
        - Remove the Cover on page 62
        - ESD Safety Measures on page 52
    2. Locate the ID PROM on the motherboard.
    3. Lift the ID PROM (1) straight up from its socket (2). Place the ID PROM on an antistatic mat.
    4. Install the ID PROM. See Install the ID PROM on page 98.
    Related Information
    - Install the ID PROM on page 98
    - Verify the ID PROM on page 99
    Install the ID PROM
    1. If needed, remove the ID PROM. See Remove the ID PROM on page 97.
    2. Locate the ID PROM socket on the motherboard.
    3. Align the notched end of the ID PROM with the notched end of the motherboard socket, and press the ID PROM into place.
    4. Return the server module to operation. See Returning the Server Module to Operation on page 111.
    5. Verify the ID PROM. See Verify the ID PRO
30. to avoid damaging the circuit boards.
    Caution: Components inside the chassis might be hot. Use caution when servicing components inside the chassis.
    1. If needed, prepare for service. See:
        - Shut Down the Oracle Solaris OS on page 56
        - Prepare the Server Module for Removal on page 58
        - Remove the Server Module From the Modular System on page 59
        - Remove the Cover on page 62
        - ESD Safety Measures on page 52
    2. If needed, locate the faulty DIMM. See Locate a Faulty DIMM LEDs on page 70.
    3. Remove the DIMM from the motherboard as described in the following steps:
        a. Push down on the ejector tabs on each side of the DIMM until the DIMM is released (panel 1).
        b. Grasp the top corners of the DIMM and lift and remove it from the server module (panel 2).
        c. Place the DIMM on an antistatic mat.
    4. Install a replacement DIMM. See Install a Replacement DIMM on page 74.
    Related Information
    - Install a Replacement DIMM on page 74
    - DIMM Configuration Reference on page 81
    Install a Replacement DIMM
    Caution: This procedure involves handling circuit boards that are extremely sensitive to static electricity. Ensure that you follow ESD preventative practices to avoid damaging the circuit boards.
    Caution: Components inside the chassis might be hot. Use caution when servicing components inside the chassis.
    1. If n
31. 0 faults 0 Forced fail POST
    In most cases, the replacement of the faulty DIMM is detected when the SP is power cycled. In this case, the fault is automatically cleared from the system. If the fault is still displayed by the show faulty command, use the set command to enable the DIMM and clear the fault. Example:
        -> set /SYS/MB/CMP0/BOB0/CH0/D0 component_state=Enabled
    3. Perform the following steps to verify the repair:
        a. Set the virtual keyswitch to diag so that POST will run in Service mode.
            -> set /SYS keyswitch_state=Diag
            Set keyswitch_state to Diag
        b. Power cycle the system.
            -> stop /SYS
            Are you sure you want to stop /SYS (y/n)? y
            Stopping /SYS
            -> start /SYS
            Are you sure you want to start /SYS (y/n)? y
            Starting /SYS
           Note: The server module takes about one minute to power off. Use the show /HOST command to determine when the host has been powered off. The console will display status = Powered Off.
        c. Switch to the system console to view POST output.
            -> start /HOST/console
           Watch the POST output for possible fault messages. The following output is a sign that POST did not detect any faults:
            0>INFO
            0>POST Passed all devices
            0>POST Return to VBSC
            0>Master set ACK for vbsc runpost command and spin
           Note: Depending on the configuration of
32. ARC T3 1B Server Module Service Manual July 2012 Glossary A ANSI SIS ASR B blade blade server C chassis CLI CMM CMM ILOM American National Standards Institute Status Indicator Standard Automatic system recovery Generic term for server modules and storage modules Server module Modular system enclosure Command line interface Chassis monitoring module ILOM runs on the CMM providing lights out management of the components in the modular system chassis See ILOM ILOM that runs on the CMM See ILOM 115 D DHCP disk module or disk blade DTE ESD FEM FRU HBA ILOM ID PROM IP Dynamic Host Configuration Protocol Interchangeable terms for storage module Data terminal equipment Electrostatic discharge Fabric expansion module FEMs enable server modules to use the 10GbE connections provided by certain NEMs See NEM Field replaceable unit Host bus adapter See REM Oracle Integrated Lights Out Manager ILOM firmware is preinstalled on a variety of Oracle systems ILOM enables you to remotely manage your Oracle servers regardless of the state of the host system Chip that contains system information for the server module Internet Protocol 116 SPARC T3 1B Server Module Service Manual July 2012 KVM M MAC or MAC address MSGID N name space NEM NET MGT NMI O OBP PCI EM Keyboard v
33. ASR Commands on page 44 m Checking if Oracle VTS Software Is Installed on page 48 SPARC T3 1B Server Module Service Manual July 2012 Diagnostics Process The following flowchart illustrates the complementary relationship of the different diagnostic tools and indicates a default sequence of use Faulty hardware suspected 9 Contact Support if the fault condition persists Detecting and Managing Faults 7 The following table provides brief descriptions of the troubleshooting actions shown in the flowchart The table also provides links to topics with additional information on each diagnostic action TABLE Diagnostic Action Diagnostic Flowchart Reference Table Possible Outcome Additional Information Flowchart item 1 Check the Power OK LED Flowchart item 2 Run the Oracle ILOM show faulty command to check for faults Flowchart item 3 Check the Oracle Solaris log files for fault information Flowchart item 4 Run VTS software Flowchart item 5 Run POST The Power OK LED is located on the front of the server module If this LED is not lit check the power source and ensure that the server module is properly installed in the modular system chassis The show faulty command displays the following kinds of faults e Environmental and configuration faults e PSH detected faults e POST detected faults Faulty FRUs are identified in fault messages using the FRU na
34. Card 93 v Remove the Service Processor Card 93 v Install the Service Processor Card 94 Servicing the ID PROM 97 v Remove the ID PROM 97 v Installthe ID PROM 98 v Verify the ID PROM 99 Servicing a USB Flash Drive 101 v Remove a USB Flash Drive 101 v Installa USB Flash Drive 102 Servicing the Battery 105 v Replace the Battery 105 Replacing the Server Module Enclosure Assembly 107 v Transfer Components to Another Enclosure Assembly 108 Returning the Server Module to Operation 111 v Replace the Cover 111 v Install the Server Module Into the Modular System 112 SPARC T3 1B Server Module Service Manual July 2012 v Start the Server Module Host 114 Glossary 115 Index 121 Contents vii viii SPARC T3 1B Server Module Service Manual July 2012 Using This Documentation This service manual explains how to identify faults replace parts and add additional options in the SPARC T3 1B server module from Oracle This document is written for technicians system administrators authorized service providers and users who have advanced experience troubleshooting and replacing hardware Product Notes on page ix Product Notes on page ix Feedback on page x Support and Accessibility on page xi Product Notes For late breaking information and known issues about this product refer to the product notes at http www oracle com pls topic lookup ctx SPARCT3 1B Related Documentation
35. FAN FAULT
    Properties:
        type = Host System
        keyswitch_state = Normal
        product_name = SPARC T3-1B
        product_serial_number = 0723BBC006 <--
        fault_state = OK
        clear_fault_action = (none)
        power_state = On
    Related Information
    - Locate the Server Module on page 55
    - Find the Modular System Serial Number on page 53
    Locate the Server Module
    To identify a specific server module from others in the modular system, perform the following steps.
    1. Log in to Oracle ILOM on the server module you plan to locate.
    2. Type:
        -> set /SYS/LOCATE value=fast_blink
       The Locator LED on the server module blinks.
    3. Identify the server module with a blinking white LED.
    4. Once you locate the server module, press the Locator LED to turn it off.
    Note: Alternatively, you can turn off the Locator LED by typing the Oracle ILOM set /SYS/LOCATE value=off command.
    Related Information
    - Remove the Server Module From the Modular System on page 59
    Removing the Server Module From the Modular System for Service
    Perform the following tasks:
    - Shut Down the Oracle Solaris OS on page 56
    - Prepare the Server Module for Removal on page 58
    - Remove the Server Module From the Modular System on page 59
    - Remove the Cover on page 62
    Related Information
    - Install the Server Module Into the Mo
36. M on page 99
    Related Information
    - Remove the ID PROM on page 97
    - Verify the ID PROM on page 99
    Verify the ID PROM
    The host MAC address and the host ID values are stored in the ID PROM. This task describes ways to display these values.
    1. Display the MAC address that is stored in the ID PROM. Example using the Oracle ILOM show command:
        -> show /HOST macaddress
        /HOST
        Properties:
        macaddress = 00:21:28:34:29:9c
    2. Display the host ID. Example using the Solaris hostid command:
        # hostid
        857 6844
    3. Display the Ethernet address. Example using the Solaris ifconfig command:
        # ifconfig -a
        lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
            inet 127.0.0.1 netmask 000000
        igb0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index
            inet 10.6.91.117 netmask fffffe00 broadcast 10.6.91.255
            ether 0:21:28:7:68:44
    Related Information
    - Remove the ID PROM on page 97
    - Install the ID PROM on page 98
    Servicing a USB Flash Drive
    You can install one USB flash drive in the server module.
    Description and Links:
    - Replace a USB flash drive: Remove a USB Flash Drive on page 101, Install a USB Flash Drive on page 102
    - Add a USB flash drive: Install a USB Flash Drive on page 102
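When comparing the MAC address reported by Oracle ILOM with the Ethernet address reported by ifconfig, note that the two examples above use different notations: the ILOM value is zero-padded (00 21 28 ...), while the ifconfig value drops leading zeros (0 21 28 7 ...). The following Python sketch normalizes both forms before comparing them. The helper names `normalize_mac` and `same_mac` are hypothetical, and the colon separators are an assumption about the unflattened output.

```python
def normalize_mac(mac: str) -> str:
    """Return a MAC address as six zero-padded, lower-case hex octets."""
    parts = mac.strip().lower().replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}: {mac!r}")
    octets = []
    for part in parts:
        value = int(part, 16)  # raises ValueError on non-hex input
        if not 0 <= value <= 0xFF:
            raise ValueError(f"octet out of range in {mac!r}")
        octets.append(f"{value:02x}")
    return ":".join(octets)


def same_mac(first: str, second: str) -> bool:
    """Compare two MAC addresses regardless of padding or letter case."""
    return normalize_mac(first) == normalize_mac(second)


print(normalize_mac("0:21:28:7:68:44"))
# prints 00:21:28:07:68:44
```

A plain string comparison of the two outputs would report a mismatch even when the ID PROM was transferred correctly, which is why the values are normalized first.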
37. OM command show faulty for the same purpose.
    Related Information
    - Check for Faults show faulty Command on page 18
    - PSH Detected Fault Example on page 27
    - Check for PSH Detected Faults on page 28
    - Clear PSH Detected Faults on page 30
    PSH Detected Fault Example
    When a PSH fault is detected, an Oracle Solaris console message similar to the following example is displayed.
        SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
        EVENT-TIME: Wed Jun 17 10:09:46 EDT 2009
        PLATFORM: SUNW,system_name, CSN: -, HOSTNAME: server48-37
        SOURCE: cpumem-diagnosis, REV: 1.5
        EVENT-ID: 92e9fbe-735e-c218-cf87-9e1720a28004
        DESC: The number of errors associated with this memory module has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4V-8000-DX for more information.
        AUTO-RESPONSE: Pages of memory associated with this memory module are being removed from service as errors are reported.
        IMPACT: Total system memory capacity will be reduced as pages are retired.
        REC-ACTION: Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u <EVENT_ID> to identify the module.
    Note: The Service Action Required LED is also turned on for PSH-diagnosed faults.
    Related Information
    - Oracle Solaris PSH Technology Overview on page 26
    - Check for PSH Detected Faults on page 28
    - Clear PSH Detected Faults on page 30
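If you collect PSH console messages from a log, a small parser can pull out the message ID and severity for follow-up. The Python sketch below assumes the unflattened message uses upper-case labels followed by a colon (SUNW-MSG-ID:, SEVERITY:, and so on) with commas between fields on the same line; that punctuation is not visible in the text above, so treat the pattern as an assumption. The function name `parse_psh_message` and the shortened sample are illustrative only.

```python
import re

# Assumed label shape: upper-case words joined by hyphens, then a colon
# (for example SUNW-MSG-ID:, EVENT-TIME:, REC-ACTION:).
_LABEL = re.compile(r"\b([A-Z][A-Z-]*[A-Z]):\s*")


def parse_psh_message(text: str) -> dict:
    """Split a PSH console message into a {label: value} dictionary."""
    fields = {}
    labels = list(_LABEL.finditer(text))
    for i, match in enumerate(labels):
        end = labels[i + 1].start() if i + 1 < len(labels) else len(text)
        raw = text[match.end():end]
        # Collapse line wraps, then drop the comma that separates fields.
        fields[match.group(1)] = " ".join(raw.split()).rstrip(",").strip()
    return fields


# Shortened sample based on the example message in this manual.
SAMPLE = """\
SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Jun 17 10:09:46 EDT 2009
SOURCE: cpumem-diagnosis, REV: 1.5
DESC: The number of errors associated with this memory module has
exceeded acceptable levels.
"""

fields = parse_psh_message(SAMPLE)
print(fields["SUNW-MSG-ID"], fields["SEVERITY"])
# prints SUN4V-8000-DX Minor
```

The SUNW-MSG-ID value is the same message ID that the knowledge article lookup expects.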
38. Oracle ILOM variables that affect POST and whether POST detected faults or not, the system might boot, or the system might remain at the ok prompt. If the system is at the ok prompt, type boot.
        d. Return the virtual keyswitch to Normal mode.
            -> set /SYS keyswitch_state=Normal
            Set keyswitch_state to Normal
        e. Switch to the system console and type the Solaris OS fmadm faulty command.
            # fmadm faulty
           No memory faults should be displayed. If faults are reported, refer to the Diagnostics Process on page 7 for an approach to diagnose the fault.
    4. Switch to the Oracle ILOM prompt (->).
    5. Type the show faulty command.
        - If the fault was detected by the host and the fault information persists, the output will be similar to the following example:
            -> show faulty
            Target                   | Property    | Value
            -------------------------+-------------+-------------------------------------
            /SP/faultmgmt/0          | fru         | /SYS/MB/CMP0/BOB0/CH0/D0
            /SP/faultmgmt/0          | timestamp   | Dec 14 22:43:59
            /SP/faultmgmt/0/faults/0 | sunw-msg-id | SUN4V-8000-DX
            /SP/faultmgmt/0/faults/0 | uuid        | 3aa7c854-9667-e176-efe5-e487e5207a8a
            /SP/faultmgmt/0/faults/0 | timestamp   | Dec 14 22:43:59
        - If the show faulty command does not report a fault with a UUID, the fault is cleared. You do not need to proceed with the following steps.
    6. (Only if previous steps did not clear the fault) Type the set command.
        -> set /SYS/MB/CMP0/BOB0/CH0/D0
39. SPARC T3 1B Server Module Service Manual SS Sun o Copyright 2010 2012 Oracle and or its affiliates All rights reserved This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws Except as expressly permitted in your license agreement or allowed by law you may not use copy reproduce translate broadcast modify license transmit distribute exhibit perform publish or display any part in any form or by any means Reverse engineering disassembly or decompilation of this software unless required by law for interoperability is prohibited The information contained herein is subject to change without notice and is not warranted to be error free If you find any errors please report them to us in writing If this is software or related software documentation that is delivered to the U S Government or anyone licensing it on behalf of the U S Government the following notice is applicable U S GOVERNMENT RIGHTS Programs software databases and related documentation and technical data delivered to U S Government customers are commercial computer software or commercial technical data pursuant to the applicable Federal Acquisition Regulation and agency specific ja ieee oleae regulations As such the use duplication disclosure modification and adaptation shall be subject to the restrictions and license t
40. SPARC T3 Series Servers Administration Guide
    Related Information
    - Display System Components on page 45
    - Disable System Components on page 47
    - Enable System Components on page 48
    Display System Components
    The show components command displays the system components (asrkeys) and reports their status.
    At the -> prompt, type the show components command. In the following example, one of the DIMMs (BOB1/CH0/D0) is shown as disabled.
        -> show components
        Target                | Property        | Value
        ----------------------+-----------------+---------
        /SYS/MB/REM           | component_state | Enabled
        /SYS/MB/FEM0          | component_state | Enabled
        /SYS/MB/CMP0/L2T0     | component_state | Enabled
        /SYS/MB/CMP0/L2T1     | component_state | Enabled
        /SYS/MB/CMP0/L2T2     | component_state | Enabled
        /SYS/MB/CMP0/L2T3     | component_state | Enabled
        /SYS/MB/CMP0/L2T4     | component_state | Enabled
        /SYS/MB/CMP0/L2T5     | component_state | Enabled
        /SYS/MB/CMP0/L2T6     | component_state | Enabled
        /SYS/MB/CMP0/L2T7     | component_state | Enabled
        /SYS/MB/CMP0/CORE0/P0 | component_state | Enabled
        /SYS/MB/CMP0/CORE0/P1 | component_state | Enabled
        <...>
        /SYS/MB/CMP0/MCU0     | component_state | Enabled
        /SYS/MB/CMP0/MCU1     | component_state | Enabled
        /SYS/MB/CMP0/NIU0     | component_state | Enabled
        /SYS/MB/CMP0/NIU1     | component_state | Enabled
        /SYS/MB/CMP0/NIU_CORE | component_state | Enabled
        /SYS/MB/CMP0/PEX      | component_
41. /SYS
        -> start /SYS
        Are you sure you want to start /SYS (y/n)? y
        Starting /SYS
    Note: In the Oracle ILOM shell, there is no notification when the system is actually powered off. Powering off takes about a minute. Use the show /HOST command to determine if the host has powered off.
    Related Information
    - View the System Message Log Files on page 24
    - Display System Components on page 45
    - Enable System Components on page 48
    Enable System Components
    You enable a component by setting its component_state property to Enabled. This action removes the component from the ASR blacklist.
    1. At the -> prompt, set the component_state property to Enabled.
        -> set /SYS/MB/CMP0/BOB1/CH0/D0 component_state=Enabled
    2. Reset the server module so that the ASR command takes effect.
        -> stop /SYS
        Are you sure you want to stop /SYS (y/n)? y
        Stopping /SYS
        -> start /SYS
        Are you sure you want to start /SYS (y/n)? y
        Starting /SYS
    Note: In the Oracle ILOM shell, there is no notification when the system is actually powered off. Powering off takes about a minute. Use the show /HOST command to determine if the host has powered off.
    Related Information
    - View the System Message Log Files on page 24
    - Display System Components on page 45
    - Disable System Components on page 47
    Checking if Oracle VTS Software Is Inst
42. U POST logs the fault and if possible takes the FRU offline POST detected FRUs display the following text in the fault message Forced fail reason where reason is the name of the power on routine that detected the failure The majority of hardware faults are detected by the server module s diagnostics In rare cases a problem might require additional troubleshooting If you are unable to determine the cause of the problem contact your service representative for support Related Information m SPARC T3 Series Servers Administration Guide m Diagnostics Overview on page 5 m Diagnostics LEDs on page 10 e Check for Faults show faulty Command on page 18 e Managing Faults Oracle Solaris PSH on page 25 e Clear PSH Detected Faults on page 30 e Managing Faults POST on page 31 e Clear POST Detected Faults on page 40 7 e Support and Accessibility on page xi Detecting and Managing Faults m Managing Faults Oracle ILOM on page 12 m Interpreting Log Files and System Messages on page 23 a Managing Faults Oracle Solaris PSH on page 25 m Managing Faults POST on page 31 m Managing Components ASR Commands on page 44 m Checking if Oracle VTS Software Is Installed on page 48 Diagnostics LEDs The server module has LEDs on the front panel and on the hard drives The LEDs conform to ANSI SIS For the locations of these LED
43. Universal unique identifier 118 SPARC T3 1B Server Module Service Manual July 2012 W WWID World wide identifier A unique number that identifies a SAS target Glossary 119 120 SPARC T3 1B Server Module Service Manual July 2012 Index A accessing the service processor 15 accounts ILOM 15 airflow blocked 9 antistatic mat and wrist strap 52 ASR disabling components 47 enabling components 48 overview 44 show components command 45 topic 44 ASR blacklist 44 asrkeys system components 45 B battery installing 105 removing 105 topics 105 blacklist ASR 44 button Remind 70 C clear_fault_action property 21 clearing PSH detected faults 30 clearing faults memory faults 75 clearing POST detected faults 40 clock battery 105 completing service 111 components disabled automatically by POST 44 displaying using showcomponent command 45 front and rear panel 2 identifying 1 location 3 managing with ASR 44 configuration guidelines memory 81 configuration reference DIMMs 81 configuring how POST runs 35 cover installing 111 removing 62 D default ILOM password 15 detecting faults 5 diag_level parameter 33 diag_mode parameter 33 diag_trigger parameter 33 diag_verbosity parameter 33 diagnostics low level 32 overview 5 process 7 running remotely 12 diagnostics overview 6 DIMMs configuration reference 81 FRU names 81 installation order
44. V List FRU Status prtdiag Command From an Oracle Solaris OS command line run the prtdiag command FRU status information is displayed Example prtdiag System Configuration Sun Microsystems sun4v SPARC T3 1B Memory size 130560 Megabytes BoSSschseessesscecesesscecesecss Virtual CPUs ssssssssnsssssscnsscosescscscass Status 1649 MHz SPARC T3 on line 1649 MHz SPARC T3 All FRUs are enabled Related Information m Check the Message Buffer dmesg Command on page 24 m View the System Message Log Files on page 24 Display FRU Information show Command on page 17 Managing Faults Oracle Solaris PSH The following topics describe the Oracle Solaris PSH feature m Oracle Solaris PSH Technology Overview on page 26 m PSH Detected Fault Example on page 27 m Check for PSH Detected Faults on page 28 m Clear PSH Detected Faults on page 30 Related Information m Diagnostics Overview on page 5 m Diagnostics Process on page 7 Detecting and Managing Faults 25 m Managing Faults Oracle ILOM on page 12 m Interpreting Log Files and System Messages on page 23 m Managing Faults POST on page 31 m Managing Components ASR Commands on page 44 m Checking if Oracle VTS Software Is Installed on page 48 m POST Overview on page 32 Oracle Solaris PSH Technology Overview The Oracle Solaris PSH technology en
45. ables the server module to diagnose problems while the Oracle Solaris OS is running and to mitigate many problems before they negatively affect operations The Oracle Solaris OS uses the fault manager daemon fmd 1M which starts at boot time and runs in the background to monitor the system If a component generates an error the daemon correlates the error with data from previous errors and other relevant information to diagnose the problem Once diagnosed the fault manager daemon assigns a UUID to the error This value distinguishes this error across any set of systems When possible the fault manager daemon initiates steps to self heal the failed component and take the component offline The daemon also logs the fault to the syslogd daemon and provides a fault notification with a message ID sometimes labeled MSG ID You can use the message ID to get additional information about the problem from the knowledge article database The PSH technology covers the following server module components a CPU Memory a I O subsystem The PSH console message provides the following information about each detected fault m Type m Severity m Description Automated response Impact m Suggested action for system administrator 26 SPARC T3 1B Server Module Service Manual July 2012 If the PSH facility detects a faulty component use the fmadm faulty command to display information about the fault Alternatively you can use the Oracle IL
46. alled
    Oracle VTS (previously named SunVTS) is a validation test suite that you can use to test this server module. This section provides an overview and a way to check if VTS is installed. For comprehensive VTS information, refer to the Oracle VTS 7.0 documentation.
    - Oracle VTS Overview on page 49
    - Check if Oracle VTS Software Is Installed on page 50
    Related Information
    - Diagnostics Overview on page 5
    - Diagnostics Process on page 7
    - Managing Faults Oracle ILOM on page 12
    - Interpreting Log Files and System Messages on page 23
    - Managing Faults Oracle Solaris PSH on page 25
    - Managing Faults POST on page 31
    - Managing Components ASR Commands on page 44
    Oracle VTS Overview
    VTS is a validation test suite that you can use to test this server module. VTS provides multiple diagnostic hardware tests that verify the connectivity and functionality of most hardware controllers and devices for this server module.
    VTS provides these kinds of test categories:
    - Audio
    - Communication (serial and parallel)
    - Graphic and video
    - Memory
    - Network
    - Peripherals (hard disk drives, CD-DVD devices, and printers)
    - Processor
    - Storage
    Use VTS to validate a system during development, production, receiving inspection, troubleshooting, periodic maintenance, and system or subsystem stressing.
    You can run VTS through a brow
47. are met:
    - The server module is in Standby mode (installed in a powered modular system, but the server module's host is not started). See Prepare the Server Module for Removal on page 58.
    - You have connectivity to the SP. See Access the SP Oracle ILOM on page 15.
    Access the Oracle ILOM -> prompt. See Access the SP Oracle ILOM on page 15.
    2. Determine how to clear the fault. The method you use to clear a fault depends on how the fault is identified by the show faulty command. Examples:
        - If the fault is a host-detected fault (displays a UUID), continue to Step 3. For example:
            -> show faulty
            Target                   | Property    | Value
            -------------------------+-------------+-------------------------------------
            /SP/faultmgmt/0          | fru         | /SYS/MB/CMP0/BOB0/CH0/D0
            /SP/faultmgmt/0          | timestamp   | Dec 14 22:43:59
            /SP/faultmgmt/0/faults/0 | sunw-msg-id | SUN4V-8000-DX
            /SP/faultmgmt/0/faults/0 | uuid        | 3aa7c854-9667-e176-efe5-e487e5207a8a
            /SP/faultmgmt/0/faults/0 | timestamp   | Apr 24 22:43:59
        - If the fault was detected by POST and resulted in the DIMM being disabled, you will see something similar to the following output:
            -> show faulty
            Target                   | Property          | Value
            -------------------------+-------------------+--------------------------
            /SP/faultmgmt/0          | fru               | /SYS/MB/CMP0/BOB1/CH0/D0
            /SP/faultmgmt/0          | timestamp         | Apr 24 16:40:56
            /SP/faultmgmt/0/faults/0 | timestamp         | Apr 24 16:40:56
            /SP/faultmgmt/0/         | sp_detected_fault | /SYS/MB/CMP0/BOB1/CH0/D
48. asic Done O gt Init MMU 0 gt Setup POST Mailbox Done 0 gt L2 Tests Done 0 gt Extended CPU Tests Done 0 gt Scrub Memory Done 0 gt Functional CPU Tests Done 0 gt Extended Memory Tests Done 0 gt SPU CWQ Tests Done 0 gt MAU Tests Done 0 gt I0S register tests Done 0 gt Network Interface Unit Port 0 Tests Done Interface Unit Port 1 Tests Done 2010 11 18 22 24 47 330 0 0 0 gt INFO 2010 11 18 22 24 47 338 0 0 0 gt POST Passed all devices 2010 11 18 22 24 47 351 0 0 0 gt POST Return to Host Config 38 If you receive POST error messages learn how to interpret them See Interpret POST Fault Messages on page 39 Related Information SPARC T3 1B POST Overview on page 32 Oracle ILOM Properties That Affect POST Behavior on page 33 Configure How POST Runs on page 35 Interpret POST Fault Messages on page 39 Clear POST Detected Faults on page 40 Server Module Service Manual e July 2012 V Interpret POST Fault Messages 1 Run POST See Run POST With Maximum Testing on page 37 2 View the output and watch for messages that look similar to the following syntax descriptions and example m POST error messages use the following syntax where c the core number s the strand number c s gt ERROR TEST failing test cs gt H W under test FRU cis gt Repair Instructions Replace items in order liste
49. at condition m Faults When the fault manager determines that a particular FRU has an error condition that is permanent that error is classified as a fault This condition causes the Service Action Required LEDs to be turned on the FRUID PROMs updated and a fault message logged If the FRU has status LEDs the Service Action Required LED for that FRU will also be turned on You must replace a FRU identified as having a fault condition In the event of a system fault Oracle ILOM ensures that the Service Action Required LED is turned on FRUID PROMs are updated the fault is logged and alerts are displayed Faulty FRUs are identified in fault messages using the FRU name Fault Clearing The SP can detect when a fault is no longer present When this happens it clears the fault state in the FRU PROM and extinguishes the Service Action Required LED A fault condition can be removed in two ways m Unaided recovery Faults caused by environmental conditions can clear automatically if the condition responsible for the fault is no longer present Detecting and Managing Faults 13 14 m Repaired fault When a fault is repaired by human intervention such as a FRU replacement the SP will usually detect the repair automatically and extinguish the Service Action Required LED If the SP does not perform these actions you must perform these tasks manually by setting the Oracle ILOM component_state or fault_state of the faulted compon
50. ation to the serial management port. On the CMM, this connector is labeled SER MGT. Set up your terminal device for 9600 baud, 8 bit, no parity, 1 stop bit, and no handshaking, and use a null-modem configuration (transmit and receive signals crossed over) to enable DTE-to-DTE communication. The crossover adapters supplied with the server module provide a null-modem configuration.
    - Network management port: Connect this port to an Ethernet network. On the CMM, this connector is labeled NET MGT. This port requires an IP address. By default, this port is configured for DHCP, or you can assign an IP address.
    2. Decide which interface to use:
        - Oracle ILOM CLI: The CLI is the default Oracle ILOM UI. Most of the commands and examples in this service manual use this interface. The default login account is root with a password of changeme.
        - Oracle ILOM browser interface: Can be used when you access the SP through the network management port and have a browser. Refer to the Oracle ILOM 3.0 documentation for details. This interface is not referenced in this service manual.
    3. Log in to Oracle ILOM. The default Oracle ILOM login account is root with a default password of changeme. Example of logging in to the Oracle ILOM CLI:
        ssh root@xxx.xxx.xxx.xxx
        Password:
        Waiting for daemons to initialize...
        Daemons ready
        Oracle(R) Integrated Lights Out Manager
        Version 3.0.12.1 r57146
        Copyright (c
51. other measures necessary for its use under optimal safety conditions. Oracle Corporation and its affiliates disclaim any liability for damages caused by the use of this software or hardware for this type of application. Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates. Any other name mentioned may be a trademark belonging to an owner other than Oracle. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd. This software or hardware and the accompanying documentation may provide information about, or links to, content, products, and services from third parties. Oracle Corporation and its affiliates disclaim any liability or express warranty regarding third-party content, products, or services. In no event shall Oracle Corporation and its affiliates be held liable for any loss, cost, or damage caused by access to or use of third-party content, products, or services. Adobe PostScript Conten
52. d by H W under test above c s gt MSG test error message c s gt END_ERROR In this syntax c the core number s the strand number Warning and informational messages use the following syntax INFO message or WARNING message Example 33 3 36 2 gt ERROR TEST Data Bitwalk 2 gt H W under test SYS MB BOB1 CHO0 DO 2 gt Repair Instructions Replace items in order listed by H W under test above 36 3 3 2 gt MSG Pin 149 failed on SYS MB BOB1 CHO DO J1101 2 gt END_ERROR 2 gt Decode of Dram Error Log Reg Channel 2 bits 60000000 0000108c 3 2 gt 1 MEC 62 R W1C Multiple corrected errors one or more CE not logged 3 2 gt 1 DAC 61 R W1C Set to 1 if the error was a DRAM access CE 33 32 gt 38 3 2 gt L2 AFAR channel 2 00000000 00000000 2 gt 108c SYND 15 0 RW ECC syndrome 2 gt Dram Error AFAR channel 2 00000000 00000000 Detecting and Managing Faults 39 3 To obtain more information on faults run the show faulty command See Check for Faults show faulty Command on page 18 Related Information a Clear POST Detected Faults on page 40 m POST Overview on page 32 m Oracle ILOM Properties That Affect POST Behavior on page 33 a Diagnostics Overview on page 5 m Configure How POST Runs on page 35 a Run POST With Maximum Testing on page 37 Vv Clear POST Detected Faults Use this procedure if
53. d on an antistatic surface. 5. Install a REM. See Install a REM on page 86. Related Information: • Install a REM on page 86. Install a REM: This task describes how to install a REM onto the server module. For information about specific configuration tasks for your REM, refer to the REM documentation. 1. Prepare for service by performing the following tasks: • Shut Down the Oracle Solaris OS on page 56 • Prepare the Server Module for Removal on page 58 • Remove the Server Module From the Modular System on page 59 • Remove the Cover on page 62 • ESD Safety Measures on page 52 • If needed, Remove a REM on page 85. 2. Align the REM for installation (panel 1). 3. Slide the end of the REM that is opposite the connector under the tabs of the plastic standoff (panel 2). 4. Press the REM until the connector is fully seated on the motherboard (panel 3). If there is a rubber bumper on the REM, you can press down on it directly to seat the connector. 5. Return the server module to operation. See Returning the Server Module to Operation on page 111. 6. Configure or verify the RAID after installing the REM. Refer to the SPARC T3 Series Servers Administration Guide for information about RAID configuration on this server module. Related Information: • Remove a REM on page 85
54. diagnostic information POST displays. These properties are listed and described in Oracle ILOM Properties That Affect POST Behavior on page 33. If POST detects a faulty component, the component is disabled automatically. If the system is able to run without the disabled component, it boots when POST completes its tests. For example, if POST detects a faulty processor core, the core is disabled. After POST completes its test sequence, the system boots and runs using the remaining cores. Related Information: • Diagnostics Overview on page 5 • Oracle ILOM Properties That Affect POST Behavior on page 33 • Configure How POST Runs on page 35 • Run POST With Maximum Testing on page 37 • Interpret POST Fault Messages on page 39 • Clear POST Detected Faults on page 40 • POST Error Message Syntax on page 42. Oracle ILOM Properties That Affect POST Behavior: The following table describes the Oracle ILOM properties that determine how POST performs its operations. Note: The value of keyswitch_state must be normal when individual POST parameters are changed. Parameter | Values | Description: /SYS keyswitch_state | normal | The system can power on and run POST (based on the other parameter settings). This parameter overrides all other commands. /SYS keyswitch_state | diag | The system runs POST based on predetermined settings. /SYS keyswitch_state | standby | The system cannot po
55. dular System on page 112. Shut Down the Oracle Solaris OS: This topic describes one method for shutting down the Oracle Solaris OS. For information on other ways to shut down the Oracle Solaris OS, refer to the Oracle Solaris OS documentation. 1. Log in as superuser or equivalent. Depending on the type of problem, you might want to view server module status or log files. You also might want to run diagnostics before you shut down the server module. 2. Notify affected users that the server module will be shut down. Refer to the Oracle Solaris system administration documentation for additional information. 3. Save any open files and quit all running programs. Refer to the application documentation for specific information on these processes. 4. (If applicable) Shut down all logical domains. Refer to the Oracle Solaris system administration and Oracle VM Manager for SPARC documentation for additional information. 5. Shut down the Oracle Solaris OS and reach the ok prompt. Refer to the Oracle Solaris system administration documentation for additional information. The following example uses the Oracle Solaris shutdown command: # shutdown -g0 -i0 -y Shutdown started. Tue Jun 28 13:06:20 PDT 2010 Changing to init state 0 - please wait Broadcast Message from root (console) on server1 Tue Jun 28 13:06:20... THE SYSTEM server1 IS BEING SHUT DOWN NOW ! ! ! Log off now or risk your files being damaged svc.startd: The
56. e Returning the Server Module to Operation on page 111. 6. Access Oracle ILOM on the SP. See Access the SP (Oracle ILOM) on page 15. If the replacement service processor detects that the service processor firmware is not compatible with the existing host firmware, further action is suspended and the following message is displayed: Unrecognized Chassis: This module is installed in an unknown or unsupported chassis. You must upgrade the firmware to a newer version that supports this chassis. If you see this message, go to Step 7. Otherwise, go to Step 8. 7. Download the system firmware. Refer to the Oracle ILOM documentation for instructions. 8. If you created a backup of the SP configuration, use the Oracle ILOM restore utility to restore the configuration. 9. Return the server module to operation. Related Information: • Remove the Service Processor Card on page 93. Servicing the ID PROM: The system ID PROM (sometimes referred to as the SCC) provides the server module with the host ID, MAC addresses, and some Oracle ILOM configuration information. The system ID PROM does not typically require replacement. However, if you replace the ID PROM, be aware that the host ID and MAC address will change. When you replace the enclosure assembly, swap the system ID PROM from the original enclosure assembly to
57. e filler from this slot. See Remove a Drive Filler on page 67. 3. Slide the drive into the bay until it is fully seated (panel 1). 4. Close the latch to lock the drive in place (panels 2 and 3). 5. Perform administrative tasks to reconfigure the drive. The procedures that you perform at this point depend on how your data is configured. You might need to partition the drive, create file systems, load data from backups, or have data updated from a RAID configuration. The following commands might apply to your circumstances: • You can use the Solaris command cfgadm -al to list all disks in the device tree, including unconfigured disks. • If the disk is not in the list, such as with a newly installed disk, you can use devfsadm to configure it into the tree. See the devfsadm man page for details. Related Information: • Remove a Drive on page 64. Remove a Drive Filler: All drive bays must be populated by either a drive or a filler. 1. Open the filler lever (panels 1 and 2). 2. Pull to remove the filler (panel 3). Related Information: • Replace or Add a Drive on page 65 • Install a Drive Filler on page 67. Install a Drive Filler: All drive bays must be populated by either a drive or a fi
58. e next step in this procedure. Use the component_state property of the component to clear the fault and remove the component from the ASR blacklist. Use the FRU name that was reported in the fault in Step 1. Example: -> set /SYS/MB/CMP0/BOB1/CH0/D0 component_state=Enabled The fault is cleared and should not show up when you run the show faulty command. Additionally, the front panel Fault (Service Action Required) LED is no longer on. Reset the server module. You must reboot the server module for the component_state property to take effect. At the Oracle ILOM prompt, use the show faulty command to verify that no faults are reported. Example: -> show faulty Target | Property | Value -> Related Information: • POST Overview on page 32 • Oracle ILOM Properties That Affect POST Behavior on page 33 • Configure How POST Runs on page 35 • Run POST With Maximum Testing on page 37 • Clear POST Detected Faults on page 40. POST Error Message Syntax: POST error messages use the following syntax: c:s>ERROR: TEST = failing-test c:s>H/W under test = FRU c:s>Repair Instructions: Replace items in order listed by H/W under test above c:s>MSG = test-error-message c:s>END_ERROR In this syntax, c = the core number and s = the strand number. Warning messages use the following sy
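The POST error block described above is regular enough to pick apart with a short script. The following Python sketch assumes each line has the shape c:s>FIELD = value; that punctuation is an assumption, so adjust the pattern to match real console output. The helper name parse_post_error and the core number 0 in the sample are hypothetical and used only for illustration.

```python
import re

# Assumed line shape: "<core>:<strand>>" followed by the message body.
# This punctuation is an assumption, not confirmed console output.
PREFIX = re.compile(r"^\s*(?P<core>\d+):(?P<strand>\d+)\s*>\s*(?P<body>.*)$")


def parse_post_error(lines):
    """Collect the fields of one POST error block into a dict (hypothetical helper)."""
    record = {}
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # ignore lines without a core:strand prefix
        body = match.group("body")
        value = body.partition("=")[2].strip()  # text after the first "="
        if body.startswith("ERROR: TEST"):
            record["core"] = int(match.group("core"))
            record["strand"] = int(match.group("strand"))
            record["test"] = value
        elif body.startswith("H/W under test"):
            record["fru"] = value
        elif body.startswith("MSG"):
            record["message"] = value
        elif body.startswith("END_ERROR"):
            break  # end of this error block
    return record


# Illustrative input only; the core number 0 is made up for the sketch.
sample = [
    "0:2>ERROR: TEST = Data Bitwalk",
    "0:2>H/W under test = /SYS/MB/CMP0/BOB1/CH0/D0",
    "0:2>Repair Instructions: Replace items in order listed by H/W under test above",
    "0:2>MSG = Pin 149 failed on /SYS/MB/CMP0/BOB1/CH0/D0 (J1101)",
    "0:2>END_ERROR",
]
print(parse_post_error(sample))
```

The fru field is the name to pass to the component_state command once the part has been replaced.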
59. e to the enclosure assembly. See Servicing a Service Processor Card on page 93. 9. Transfer the ID PROM from the original server module to the enclosure assembly. See Servicing the ID PROM on page 97. 10. Transfer the USB flash drive (if present) from the original server module to the enclosure assembly. Ensure that you install a USB flash drive only in the top slot of the connector. See Servicing a USB Flash Drive on page 101. Note: Do not move the battery from the original server module. The replacement enclosure assembly includes a battery. 11. Attach the cover to the enclosure assembly. See Replace the Cover on page 111. 12. Insert the completed enclosure assembly in the same slot as the original server module. See Install the Server Module Into the Modular System on page 112. 13. Start the server module host. See Start the Server Module Host on page 114. 14. Access Oracle ILOM on the SP. See Access the SP (Oracle ILOM) on page 15. If the replacement service processor detects that the service processor firmware is not compatible with the existing host firmware, further action is suspended and the following message is displayed: Unrecognized Chassis: This module is installed in an unknown or unsupported chassis. You must upgrade the firmware to a newer version that supports this chassis. If you see this message, g
60. eeded. Prepare the server module for service and remove the faulty DIMM. See Remove a DIMM on page 73. 2. Unpackage the replacement DIMM and set it on an antistatic mat. 3. Ensure that the DIMM ejector tabs are in the open position (panel 1). 4. Line up the replacement DIMM with the connector. Align the DIMM notch with the key in the connector, as in panel 3. This action ensures that the DIMM is oriented correctly. Panel 2 shows an incorrect alignment. 5. Push the DIMM into the connector until the ejector tabs lock the DIMM in place. If the DIMM does not easily seat into the connector, verify that the orientation of the DIMM is correct. Never apply excessive force. 6. Return the server module to operation. See Returning the Server Module to Operation on page 111. 7. Perform one of the following tasks to verify the DIMM: • Verify a replacement DIMM. See Clear the Fault and Verify the Functionality of the Replacement DIMM on page 75. • Verify additional memory. See Verify DIMM Functionality on page 79. Related Information: • Remove a DIMM on page 73 • DIMM Configuration Reference on page 81. Clear the Fault and Verify the Functionality of the Replacement DIMM: This procedure describes how to clear a memory fault and how to verify the functionality of the replacement DIMM. Ensure that the following conditions
61. el 3). 4. Close the ejector arms. 5. Remove the server module from the modular system. Lift the server module with two hands. 6. Place the server module on an antistatic mat or surface. 7. Insert a filler panel into the empty chassis slot. Note: When the modular system is operating, you must fill every slot with a filler panel or a server module within 60 seconds. Related Information: • Remove the Cover on page 62 • Install the Server Module Into the Modular System on page 112. Remove the Cover: 1. Attach an antistatic strap to your wrist and then to a metal area on the server module. 2. While pressing the cover release button, slide the cover toward the rear of the server module about half an inch (1 cm). 3. Lift the cover off the server module chassis. Related Information: • Illustrated Parts Breakdown on page 3 • Replace the Cover on page 111. Servicing Hard Drives: The following topics apply to hard drives installed in the external slots of the server module. Description | Links: Determine if you can remove and replace a drive using hot-plugging capabilities | Drive Hot-Plugging Rules on page 63. Replace a drive | Remove a Drive on page 64; Replace or Add a Drive on page 65. Add an additional drive to the server module | Remove a Drive Filler on page 67; Replace
62. ent. The procedure for clearing faults manually is described in Clear Faults (clear_fault_action Property) on page 21. Many environmental faults can automatically recover. For example, a temporary condition might cause the computer room temperature to rise above the maximum threshold, producing an overtemperature fault in the server module. If the computer room temperature then returns to the normal range, and the server module's internal temperature also drops back to an acceptable level, the SP detects the new fault-free condition. The SP then extinguishes the Service Action Required LED and clears the fault state from the FRU PROM. The SP can automatically detect when a FRU is removed. In many cases, the SP does this even if you remove the FRU while the SP is not running (for example, if you unplug the system power cables during service procedures). This function enables Oracle ILOM to sense that a fault diagnosed to a specific FRU has been repaired. Note: Oracle ILOM does not automatically detect hard drive replacement. Oracle Solaris Fault Manager Commands in Oracle ILOM: The Oracle ILOM CLI includes a feature that enables you to access Oracle Solaris fault manager commands, such as fmadm, fmdump, and fmstat, from within the Oracle ILOM shell. This feature is referred to as the Oracle ILOM faultmgmt shell. HDD Faults: The Oracle Solaris PSH technology does not monitor hard drives for faults. As a result, the SP does not rec
63. er module uses advanced ECC technology that corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. On some DIMMs, if a DRAM fails, the DIMM continues to function. The following server module features independently manage memory faults: • POST: Based on Oracle ILOM configuration variables, POST runs when the server module is powered on. For correctable memory errors (sometimes called CEs), POST forwards the error to the Oracle Solaris PSH daemon for error handling. If an uncorrectable memory fault is detected, POST displays the fault with the device name of the faulty DIMMs and logs the fault. POST then disables the faulty DIMMs. Depending on the memory configuration and the location of the faulty DIMM, POST disables half of physical memory in the system, or half the physical memory and half the processor threads. When the offlining process occurs in normal operation, you must replace the faulty DIMMs based on the fault message, and then enable the disabled DIMMs. See Clear the Fault and Verify the Functionality of the Replacement DIMM on page 75. • Oracle Solaris PSH technology: A feature of the Solaris OS, PSH uses the fault manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a UUID and logged. PSH reports the fault and suggests a replacement for the DIMMs associated with the fault. If you suspect that the server module has a memory proble
64. erms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065. This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications which may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd. This software or hardware and documentation may provide access
65. erver module, you need the serial number of the Sun Blade 6000 modular system in which the server module is located, not the serial number of the server module. The serial number of the modular system is provided on a label on the upper left edge of the front bezel. Use the following procedure to obtain the serial number remotely: 1. Log in to the CMM of the modular system. See the documentation for the Sun Blade 6000 modular system. 2. Type: -> show /CH 3. In the output, locate the value for product_serial_number. That number is the serial number of the modular system. Related Information: • Find the Server Module Serial Number on page 54 • Locate the Server Module on page 55. Find the Server Module Serial Number: Note: To obtain support for your server module, you need the serial number of the Sun Blade 6000 modular system in which the server module is located, not the serial number of the server module. See Find the Modular System Serial Number on page 53. The serial number of the server module is located on a sticker on the RFID mounted in the center of the front panel. However, this label is not present on a system that has been moved into a new enclosure assembly. You can also type the Oracle ILOM show /SYS command to display the number. Access the Oracle ILOM CLI and type: -> show /SYS /SYS Targets: SERVICE LOCATE ACT PS_FAULT TEMP_FAULT
66. erver module to automatically configure failed components out of operation until they can be replaced. In the server module, ASR manages the following components: • CPU strands • Memory DIMMs • I/O subsystem. The database that contains the list of disabled components is referred to as the ASR blacklist (asr-db). In most cases, POST automatically disables a faulty component. After the cause of the fault is repaired (FRU replacement, loose connector reseated, and so on), you might need to remove the component from the ASR blacklist. The following ASR commands enable you to view and add or remove components (asrkeys) from the ASR blacklist. You run these commands from the Oracle ILOM -> prompt. TABLE: ASR Commands. Command | Description: show components | Displays system components and their current state. set asrkey component_state=Enabled | Removes a component from the asr-db blacklist, where asrkey is the component to enable. set asrkey component_state=Disabled | Adds a component to the asr-db blacklist, where asrkey is the component to disable. Note: The asrkey values vary from system to system, depending on how many cores and how much memory are present. Use the show components command to see the asrkey values on a given system. After you enable or disable a component, you must reset or power cycle the system for the component's change of state to take effect. See the
67. he characters SPT at the beginning of a message ID indicate that Oracle ILOM detected the fault. 1. At the -> prompt, access the Oracle ILOM faultmgmt shell: -> start /SP/faultmgmt/shell Are you sure you want to start /SP/faultmgmt/shell (y/n)? y 2. At the faultmgmtsp> prompt, type the fmadm faulty command: faultmgmtsp> fmadm faulty 2010-08-11/14:54:23 HEKKH ARK kK eA REE SPT-8000-LC Critical 3. Type the exit command when you are finished using the Oracle ILOM faultmgmt shell: faultmgmtsp> exit Related Information: • Diagnostics Process on page 7 • Access the SP (Oracle ILOM) on page 15 • Display FRU Information (show Command) on page 17 • Check for Faults (show faulty Command) on page 18 • Clear Faults (clear_fault_action Property) on page 21 • Service Related Oracle ILOM Command Summary on page 21. Clear Faults (clear_fault_action Property): Use the clear_fault_action property with the set command to manually clear PSH detected faults for a FRU. If Oracle ILOM detects a FRU replacement, it automatically clears the fault, so you do not have to clear the fault manually. For PSH diagnosed faults, if the replacement of the FRU is detected by the system or the fault is manually cleared on the host, the fault is also cleared from Oracle ILOM. In such cases, you typically do not
68. he fault gt show faulty Target Property SP faultmgmt 0 SP faultmgmt 0 faults 0 SP faultmgmt 0 faults 0 timestamp Oct 12 16 40 56 sp_detected_fault SYS MB CMP0O BOB1 CH0 DO ki SYS MB CMP0 BOB1 CH0 D0 Forced fail POST m Example of the show faulty command displaying a fault that was detected by the PSH technology These kinds of faults are identified by the presence of a UUID value gt show faulty Target Property Value eee eee Se i ce R ee eee ee E D SP f aultmgmt 0 fru SYS MB CMP0 BOB0 CHO DO SP faultmgmt 0 timestamp Mar 29 22 43 59 SP faultmgmt 0 sunw msg id SUN4V 8000 Dx faults 0 SP faultmgmt 0 uuid 3aa7c854 9667 e176 efe5 e487e520 faults 0 7a8a SP faultmgmt 0 timestamp Mar 29 22 43 59 faults 0 Related Information m Diagnostics Process on page 7 m Access the SP Oracle I LOM on page 15 m Display FRU Information show Command on page 17 m Check for Faults fmadm faulty Command on page 20 m Clear Faults clear_fault_action Property on page 21 m Service Related Oracle ILOM Command Summary on page 21 Detecting and Managing Faults 19 V Check for Faults fmadm faulty Command The following is an example of the fmadm faulty command which is an alternative to the show faulty command The Oracle Solaris fmadm faulty command is invoked from within the Oracle ILOM faultmgmt shell Note T
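The two identification rules stated in this part of the manual (a message ID that begins with SPT points to Oracle ILOM, and the presence of a UUID value points to PSH) can be expressed as a small check. This is a minimal sketch, not an Oracle tool: the function name classify_fault, the dict layout, and the hyphenated property name sunw-msg-id are assumptions made for illustration.

```python
def classify_fault(properties):
    """Guess which subsystem detected a fault (hypothetical helper).

    `properties` maps property names to values for a single fault entry,
    for example values copied by hand from show faulty output. The dict
    layout and the "sunw-msg-id" spelling are assumptions of this sketch.
    """
    msg_id = properties.get("sunw-msg-id", "")
    # Rule 1: SPT at the start of the message ID means Oracle ILOM detected it.
    if msg_id.startswith("SPT"):
        return "Oracle ILOM"
    # Rule 2: a UUID value marks a fault detected by the PSH technology.
    if properties.get("uuid"):
        return "PSH"
    # Anything else is left undecided rather than guessed.
    return "unknown"


example = {
    "fru": "/SYS/MB/CMP0/BOB0/CH0/D0",
    "sunw-msg-id": "SUN4V-8000-DX",
    "uuid": "3aa7c854-9667-e176-efe5-e487e5207a8a",
}
print(classify_fault(example))
```

The SPT check runs first because a fault reported by Oracle ILOM can also carry an identifier, so testing for a UUID alone would misfile it.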
69. ich can be determined using fmadm faulty Details There is no more information available at this time 3 Follow the suggested actions to repair the fault Related Information m PSH Detected Fault Example on page 27 m Clear PSH Detected Faults on page 30 Detecting and Managing Faults 29 Y Clear PSH Detected Faults When the Oracle Solaris PSH technology detects faults the faults are logged and displayed on the console In most cases after the fault is repaired the system detects the corrected state and repairs the fault condition automatically However you should verify this repair In cases where the fault condition is not automatically cleared you must clear the fault manually 1 After replacing a faulty FRU power on the server module 2 At the host prompt use the fmadm faulty command to determine whether the replaced FRU still shows a faulty state fmadm faulty TIME EVENT ID MSG ID SEVERITY Aug 13 11 48 33 21a8b59e 89ff 692a c4bc f4c5cccca8c8 SUN4V 8002 6E Major Platform sun4dv Chassis_id Product_sn Fault class fault cpu generic sparc strand Affects cpu cpurd Seriala ee ke ERREREEREERERKER faulted and taken out of service FRU SYS MB he product id product sn server Lda K AAEREN RRA KK ChaSsis ida ke KKK KKK KKK KKK KK KKK KK KKK KKKEK gt Serial a revyision 05 chassis 0 motherboard 0 faulty Description The
70. ware or the accompanying documentation is licensed to the U.S. Government, or to anyone who licenses this software or uses it on behalf of the U.S. Government, the following notice applies: U.S. GOVERNMENT RIGHTS. Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are commercial computer software or commercial technical data pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065. This software or hardware was developed for general use in a variety of information management applications. This software or hardware is not designed or intended for use in inherently dangerous applications, including applications that may cause personal injury. If you use this software or hardware in dangerous applications, it is your responsibility to take all appropriate fail-safe, backup, redundancy, and
71. ideo, mouse. Refers to using a switch to enable sharing of one keyboard, one display, and one mouse with more than one computer. Media access controller address. Message ID. Top-level ILOM CMM target. Network express module. NEMs provide 10/100/1000 Ethernet, 10GbE Ethernet ports, and SAS connectivity to storage modules. Network management port. An Ethernet port on the CMM and on server module service processors. Non-maskable interrupt. OpenBoot PROM. PCIe ExpressModule. Modular components that are based on the PCI Express industry-standard form factor and offer I/O features such as Gigabit Ethernet and Fibre Channel. POST: Power-on self-test. PSH: Predictive self healing. REM: RAID expansion module. Sometimes referred to as an HBA. See HBA. Supports the creation of RAID volumes on disk drives. SAS: Serial attached SCSI. SCC: System configuration chip. SER MGT: Serial management port. A serial port on the CMM and on server module service processors. server module: Modular component that provides the main compute resources (CPU and memory) in a modular system. Server modules might also have onboard storage and connectors that hold REMs and FEMs. SP: Service processor. SSH: Secure shell. storage module: Modular component that provides computing storage to the server modules. UCP: Universal connector port. UI: User interface. UTC: Coordinated Universal Time. UUID
72. iew on page 12. 5. Switch to the Oracle ILOM command shell. 6. Type the show faulty command: -> show faulty Target | Property | Value: /SP/faultmgmt/0 | fru | /SYS/MB/CMP0/BOB0/CH1/D0; /SP/faultmgmt/0 | timestamp | Dec 14 22:43:59; /SP/faultmgmt/0/faults/0 | sunw-msg-id | SUN4V-8000-DX; /SP/faultmgmt/0/faults/0 | uuid | 3aa7c854-9667-e176-efe5-e487e5207a8a; /SP/faultmgmt/0/faults/0 | timestamp | Dec 14 22:43:59. If the show faulty command reports a fault with a UUID, go to Step 7. If show faulty does not report a fault with a UUID, you have completed the verification process. 7. Switch to the system console and type the fmadm repair command with the UUID. Use the same UUID that was displayed in the output of the Oracle ILOM show faulty command: # fmadm repair 3aa7c854-9667-e176-efe5-e487e520 Related Information: • Remove a DIMM on page 73 • Install a Replacement DIMM on page 74 • DIMM Configuration Reference on page 81. DIMM Configuration Reference: This topic provides configuration guidelines and the relationships between the DIMM physical locations and FRU names. DIMM configuration guidelines: • There are 16 DIMM slots that support industry-standard DIMMs. • You can install quantities of 4, 8, or 16 DIMMs. • Supported DIMM capacities: 2 Gbyte, 4 Gbyte, and 8 Gbyte. Refer to the SPARC T3 1B Server Module Product Notes f
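The DIMM guidelines listed above (quantities of 4, 8, or 16, and capacities of 2, 4, or 8 Gbyte) lend themselves to a quick pre-installation check. The following Python sketch covers only those two rules; the guideline list is cut off in this text, so further rules may apply, and the function name check_dimm_plan is hypothetical.

```python
# Values taken from the guidelines above; the guideline list is truncated
# in this document, so additional rules may exist that are not checked here.
ALLOWED_DIMM_COUNTS = {4, 8, 16}
ALLOWED_CAPACITIES_GB = {2, 4, 8}


def check_dimm_plan(capacities_gb):
    """Return a list of problems; an empty list means both checks pass.

    `capacities_gb` holds one integer per DIMM to be installed
    (hypothetical input format chosen for this sketch).
    """
    problems = []
    count = len(capacities_gb)
    if count not in ALLOWED_DIMM_COUNTS:
        problems.append(f"{count} DIMMs planned: install 4, 8, or 16")
    # Report each unsupported capacity once, in ascending order.
    for size in sorted(set(capacities_gb) - ALLOWED_CAPACITIES_GB):
        problems.append(f"{size} Gbyte is not a listed capacity")
    return problems


print(check_dimm_plan([4] * 8))   # a plan that passes both checks
print(check_dimm_plan([16] * 6))  # wrong quantity and unsupported capacity
```

Check the Product Notes for the current list of supported DIMMs before relying on a result from a sketch like this.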
73. ing Faults (POST) on page 31 • Managing Components (ASR Commands) on page 44 • Checking if Oracle VTS Software Is Installed on page 48. Check the Message Buffer (dmesg Command): The dmesg command checks the system buffer for recent diagnostic messages and displays them. 1. Log in as superuser. 2. Type: # dmesg Related Information: • View the System Message Log Files on page 24 • List FRU Status (prtdiag Command) on page 25. View the System Message Log Files: The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems, such as a device that is about to fail. The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every week), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.0. Over a period of time, the messages are further rotated to messages.1 and messages.2, and then deleted. 1. Log in as superuser. 2. Type: # more /var/adm/messages Or, if you want to view all logged messages, type: # more /var/adm/messages* Related Information: • Check the Message Buffer (dmesg Command) on page 24
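The rotation scheme described above (messages, then messages.0, messages.1, and messages.2) implies a reading order from newest to oldest. The following Python sketch sorts a directory listing into that order; the function name order_message_files is hypothetical, and the sketch assumes the numeric-suffix naming described in the text.

```python
import re


def order_message_files(names):
    """Return message file names newest first (hypothetical helper).

    Assumes the naming described above: "messages" is the current file and
    a higher numeric suffix means an older rotation. Other names are dropped.
    """
    def age(name):
        match = re.fullmatch(r"messages(?:\.(\d+))?", name)
        if match is None:
            return None  # not a message file
        # The unsuffixed file is the newest, so it sorts before messages.0.
        return -1 if match.group(1) is None else int(match.group(1))

    dated = [(age(name), name) for name in names]
    return [name for _, name in sorted(pair for pair in dated if pair[0] is not None)]


listing = ["messages.1", "lastlog", "messages", "messages.2", "messages.0"]
print(order_message_files(listing))
# prints ['messages', 'messages.0', 'messages.1', 'messages.2']
```

On a live system the listing would come from something like os.listdir("/var/adm"), run with enough privilege to read the files.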
74. is OS. If the drive cannot be taken offline, shut down the Solaris OS on the server module. See Shut Down the Oracle Solaris OS on page 56. 3. Verify whether the blue Drive Ready LED is illuminated on the front of the drive. See Diagnostics LEDs on page 10. The blue LED is illuminated only if the drive was taken offline using cfgadm or an equivalent command. It is not illuminated if Oracle Solaris was shut down. 4. Remove the drive as described in the following steps: a. Push the latch release button on the drive (panels 1 and 2). b. Grasp the latch and pull the drive out of the drive slot (panel 3). 5. Insert a drive filler if you are not replacing the drive in this slot. See Install a Drive Filler on page 67. Related Information: • Install a Drive Filler on page 67 • Replace or Add a Drive on page 65. Replace or Add a Drive: The physical address of a hard drive is based on the slot in which it is installed. 1. Identify the slot in which to install the drive: • If you are replacing a drive, ensure that you install the replacement drive in the same slot as the drive you removed. • If you are adding an additional drive, install the drive in the next available drive slot. 2. If necessary, remove the driv
75. le to Operation on page 111. Use an Oracle ILOM command to set the clock's day and time. For example: -> set /SP/clock datetime=061716192010, followed by -> show /SP/clock, which displays: /SP/clock Targets: Properties: datetime = Thu JUN 17 16:19:56 2010, timezone = GMT (GMT), usentpserver = disabled. Related Information: Servicing a FEM on page 89; Returning the Server Module to Operation on page 111. Replacing the Server Module Enclosure Assembly: When certain parts and components in the server module, such as the motherboard, require replacing, you must replace a high-level assembly called the enclosure assembly. This includes a new server module chassis with the motherboard and many other components already installed. If you determine that a faulty component is not one of the replaceable FRUs described in this service manual, replace the enclosure assembly of the faulty server module with a new enclosure assembly. Note: This procedure must be performed by an Oracle field service representative. When you use an enclosure assembly, you must move the following parts from the original server module to the same locations in the replacement enclosure assembly: drives, drive fillers, DIMMs, REM, FEMs, SP, ID PROM, and USB flash drive. Transfer Components to Another Enclosure Assembly on page 108. Related Information: Identifying Components
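The datetime value in the example above (061716192010, shown back as Thu JUN 17 16:19 2010) appears to follow a month, day, hour, minute, year layout. A minimal Python sketch of building and checking such a value follows. The MMDDhhmmYYYY layout is inferred from that single example, and both function names are hypothetical.

```python
from datetime import datetime

# Assumed layout, inferred from the example 061716192010 -> Jun 17 16:19 2010:
# two-digit month, day, hour, minute, then a four-digit year.
_ILOM_CLOCK_FORMAT = "%m%d%H%M%Y"


def to_clock_value(moment):
    """Format a datetime as the digit string used in the set-clock example."""
    return moment.strftime(_ILOM_CLOCK_FORMAT)


def from_clock_value(value):
    """Parse a digit string in the assumed layout back into a datetime."""
    return datetime.strptime(value, _ILOM_CLOCK_FORMAT)
```

Round-tripping a value through both functions before typing it at the SP prompt is a cheap way to catch a transposed day and month.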
76. ller. 1. Extend the filler handle, then align the filler to the empty drive bay (panel 1). 2. Push the filler into place. 3. Close the filler lever (panels 2 and 3). Related Information: Remove a Drive on page 64; Remove a Drive Filler on page 67. Servicing Memory: The following topics describe how to determine which DIMMs are faulty, remove DIMMs, install DIMMs, and verify DIMM functionality after installation. Description and Links: Understand memory faults: Memory Faults on page 69. Replace a faulty DIMM: Locate a Faulty DIMM (LEDs) on page 70; Remove a DIMM on page 73; Install a Replacement DIMM on page 74; Clear the Fault and Verify the Functionality of the Replacement DIMM on page 75. Add memory to the server module: Install a Replacement DIMM on page 74; Verify DIMM Functionality on page 79; DIMM Configuration Reference on page 81. Related Information: Detecting and Managing Faults on page 5; Preparing for Service on page 51. Memory Faults: A variety of features play a role in how the memory subsystem is configured and how memory faults are handled. Understanding the underlying features helps you identify and repair memory problems. This topic describes how the server module deals with memory faults. The serv
77. m follow the Diagnostics Process on page 7. The flowchart helps you determine if the memory problem was detected by POST or by the PSH technology. Once you identify which DIMMs you want to replace, see Locate a Faulty DIMM (LEDs) on page 70. After replacing a faulty DIMM, you must perform the instructions in Clear the Fault and Verify the Functionality of the Replacement DIMM on page 75. Related Information: Locate a Faulty DIMM (LEDs) on page 70; Clear the Fault and Verify the Functionality of the Replacement DIMM on page 75; Detecting and Managing Faults on page 5. Locate a Faulty DIMM (LEDs): This procedure describes how to use the DIMM LEDs on the motherboard to pinpoint the physical location of a faulty DIMM. Note: You can also obtain the location of the faulty DIMM using the Oracle ILOM show faulty command. This command displays the FRU name, such as /SYS/MB/CMP0/BOB0/CH0. Use the FRU name and information to locate the faulty DIMM. See DIMM Configuration Reference on page 81. 1. Check the front panel Fault LED. See Diagnostics LEDs on page 10. When a faulty DIMM is detected, the front panel Fault LED and the motherboard DIMM Fault LEDs are illuminated. Before opening the server module to check the DIMM Fault LEDs, verify that the S
78. me. All Oracle ILOM detected fault messages begin with the characters SPT. For additional information on a reported fault, including possible corrective action, go to this web site: http://www.sun.com/msg/message-ID, where message-ID is the message contained in the fault message. The Oracle Solaris message buffer and log files record system events and provide information about faults. If system messages indicate a faulty device, replace the FRU. For more diagnostic information, review the VTS report (flowchart item 4). VTS is an application you can run to exercise and diagnose FRUs. To run VTS, the server module must be running the Oracle Solaris OS. If VTS reports a faulty device, replace the FRU. If VTS does not report a faulty device, run POST (flowchart item 5). POST performs basic tests of the server module components and reports faulty FRUs. Additional Information: Diagnostics LEDs on page 10; Service Related Oracle ILOM Command Summary on page 21; Check for Faults (show faulty Command) on page 18; Interpreting Log Files and System Messages on page 23; Checking if Oracle VTS Software Is Installed on page 48; Managing Faults (POST) on page 31; Oracle ILOM Properties That Affect POST Behavior on page 33. TABLE: Diagnostic Flowchart Reference Table (Continued). Diagnostic Action; Possible Ou
79. mitigate many serious problems before they occur. Log files and command interface: Provide the standard Oracle Solaris OS log files and investigative commands that can be accessed and displayed on the device of your choice. Oracle VTS (formerly SunVTS): An application that exercises the system, provides hardware validation, and discloses possible faulty components with recommendations for repair. The LEDs, Oracle ILOM, PSH, and many of the log files and console messages are integrated. For example, when the Oracle Solaris software detects a fault, it displays the fault, logs it, and passes information to Oracle ILOM, where it is logged. Depending on the fault, one or more LEDs might also be illuminated. The diagnostic flowchart in Diagnostics Process on page 7 describes an approach for using the server module diagnostics to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting. Therefore, you might perform some actions and not others. Related Information: SPARC T3 Series Servers Administration Guide; Diagnostics Process on page 7; Diagnostics LEDs on page 10; Managing Faults (Oracle ILOM) on page 12; Interpreting Log Files and System Messages on page 23; Managing Faults (Oracle Solaris PSH) on page 25; Managing Faults (POST) on page 31; Managing Components
80. mmand Use the Oracle ILOM show command to display information about individual FRUs. At the -> prompt, enter the show command. In the following example, the show command displays information about a memory module.
-> show /SYS/MB/CMP0/BOB0/CH0/D0
/SYS/MB/CMP0/BOB0/CH0/D0
Targets:
T_AMB
SERVICE
Properties:
Type = DIMM
ipmi_name = B0/C0/D0
component_state = Enabled
fru_name = 2048MB DDR3 SDRAM
fru_description = DDR3 DIMM 2048 Mbytes
fru_manufacturer = Samsung
fru_version = 0
fru_part_number = ****************
fru_serial_number = ****************
fault_state = OK
clear_fault_action = (none)
Commands:
cd
set
show
Related Information: Diagnostics Process on page 7; Access the SP (Oracle ILOM) on page 15; Check for Faults (show faulty Command) on page 18; Check for Faults (fmadm faulty Command) on page 20; Clear Faults (clear_fault_action Property) on page 21; Service Related Oracle ILOM Command Summary on page 21.
Check for Faults (show faulty Command): Use the Oracle ILOM show faulty command to display the following kinds of faults and alerts. Environmental or configuration faults: Faults caused by temperature or voltage problems. Environmental faults can also be caused by room temperature or blocked air flow. POST detected faults: Faults on devices detected by the POST diagnostics
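The property listing printed by the show command lends itself to simple scripted checks, for example confirming that fault_state is OK after a repair. The sketch below assumes each property is printed on its own line as "name = value", and the function name `parse_show_properties` is hypothetical rather than part of any Oracle tool.

```python
def parse_show_properties(show_output):
    """Collect "name = value" lines from captured show output into a dict.

    Assumes one property per line with an equals sign between the name and
    the value. Lines without an equals sign (headings, targets, commands)
    are ignored.
    """
    properties = {}
    for line in show_output.splitlines():
        name, separator, value = line.partition("=")
        if separator:
            properties[name.strip()] = value.strip()
    return properties
```

With the output captured as text, a check such as `parse_show_properties(text).get("fault_state") == "OK"` gives a quick pass or fail for a single FRU.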
81. n the server module to operation. See Returning the Server Module to Operation on page 111. Related Information: Remove a USB Flash Drive on page 101. Servicing the Battery: The battery operates the clock for the server module. Replace the Battery on page 105. Related Information: Detecting and Managing Faults on page 5; Preparing for Service on page 51. Replace the Battery: The battery maintains system time when the server module is powered off. If the server module fails to maintain the proper time when it is powered off, replace the battery. 1. Prepare for service by performing the following tasks: Shut Down the Oracle Solaris OS on page 56; Prepare the Server Module for Removal on page 58; Remove the Server Module From the Modular System on page 59; Remove the Cover on page 62; ESD Safety Measures on page 52. 2. If a FEM card is installed in connector FEM 1, remove it. See Remove a FEM on page 89. 3. Push the top of the battery forward, then lift the battery from the holder (panel 1). 4. Install the replacement battery with the negative side out. 5. Put the FEM back in place if you needed to remove it to access the battery. See Install a FEM on page 90. 6. Return the server module to operation. See Returning the Server Modu
82. ng the Keyswitch_state
keyswitch_state | /HOST/diag mode | /HOST/diag level | /HOST/diag trigger | /HOST/diag verbosity | Description of POST Execution
normal | normal | max | hw-change error-reset | normal | This is the default POST configuration. This configuration tests the system thoroughly and suppresses some of the detailed POST output.
normal | Off | N/A | none | N/A | POST does not run, resulting in quick system initialization. This configuration is not suggested.
diag | N/A | N/A | N/A | N/A | POST runs the full spectrum of tests with the maximum output displayed.
The keyswitch_state parameter, when set to diag, overrides all the other POST variables.
Related Information: POST Overview on page 32; Configure How POST Runs on page 35; Run POST With Maximum Testing on page 37; Interpret POST Fault Messages on page 39; Clear POST Detected Faults on page 40; POST Error Message Syntax on page 42.
Configure How POST Runs
1. Log in to Oracle ILOM (-> prompt). See Access the SP (Oracle ILOM) on page 15.
2. Set the virtual keyswitch to the value that corresponds to the POST configuration you want to run. The following example sets the virtual keyswitch to normal, which configures POST to run according to the other parameter values.
-> set /SYS keyswitch_state=normal
Set 'keyswitch_state' to 'Normal'
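The precedence described in the configuration table above can be written down as a short decision function: a keyswitch_state of diag overrides everything else, a diag mode of off means POST does not run, and otherwise POST follows the remaining diag properties. This is a sketch of the documented behavior only, not firmware logic. The function name and the returned strings are hypothetical.

```python
def describe_post_run(keyswitch_state, diag_mode="normal"):
    """Summarize how POST behaves for the given settings.

    Mirrors the three table rows: the diag keyswitch wins, then a diag mode
    of off disables POST, otherwise the other diag properties apply.
    """
    if keyswitch_state.lower() == "diag":
        # Overrides all the other POST variables.
        return "full tests, maximum output"
    if diag_mode.lower() == "off":
        return "POST does not run"
    return "POST runs according to the other diag properties"
```

Calling `describe_post_run("diag", "off")` returns the full-test description, which illustrates that the keyswitch setting takes priority over the diag mode.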
83. ntax: WARNING: message. Informational messages use the following syntax: INFO: message. In the following example, POST reports an uncorrectable memory error affecting DIMM locations /SYS/MB/CMP0/BOB0/CH0/D0 and /SYS/MB/CMP0/BOB1/CH0/D0. The error was detected by POST running on node 0, core 7, strand 2.
2010-07-03 18:44:13:359 0:7:2>Decode of Disrupting Error Status Reg (DESR HW Corrected) bits 00300000.00000000
2010-07-03 18:44:13:517 0:7:2> 1 DESR_SOCSRE: SOC (non-local) sw_recoverable_error.
2010-07-03 18:44:13:638 0:7:2> 1 DESR_SOCHCCE: SOC (non-local) hw_corrected_and_cleared_error.
2010-07-03 18:44:13:773 0:7:2>
2010-07-03 18:44:13:836 0:7:2>Decode of NCU Error Status Reg bits 00000000.22000000
2010-07-03 18:44:13:958 0:7:2> 1 NESR_MCU1SRE: MCU1 issued a Software Recoverable Error Request
2010-07-03 18:44:14:095 0:7:2> 1 NESR_MCU1HCCE: MCU1 issued a Hardware Corrected-and-Cleared Error Request
2010-07-03 18:44:14:248 0:7:2>
2010-07-03 18:44:14:296 0:7:2>Decode of Mem Error Status Reg Branch 1 bits 33044000.00000000
2010-07-03 18:44:14:427 0:7:2> 1 MEU 61 R/W1C Set to 1 on an UE if VEU = 1, or VEF = 1, or higher priority error in same cycle.
2010-07-03 18:44:14:614 0:7:2> 1 MEC 60 R/W1C Set to 1 on a CE if VEC = 1, or VEU = 1, or VEF = 1, or another error in same cycle.
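Each line of the POST example begins with a timestamp and the node, core, and strand that produced the message. The Python sketch below splits that prefix from the message text. It assumes the prefix is printed as a date, a time with milliseconds, and then node:core:strand followed by a > character. The punctuation is an assumption about the console output, and the names `PostLine` and `parse_post_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PostLine(NamedTuple):
    timestamp: str
    node: int
    core: int
    strand: int
    text: str


# Assumed prefix layout: "YYYY-MM-DD hh:mm:ss:mmm node:core:strand>".
_POST_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3}) (\d+):(\d+):(\d+)>\s*(.*)$"
)


def parse_post_line(line: str) -> Optional[PostLine]:
    """Split one POST output line into its prefix fields and message text.

    Returns None when the line does not match the assumed prefix layout.
    """
    match = _POST_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, node, core, strand, text = match.groups()
    return PostLine(timestamp, int(node), int(core), int(strand), text)
```

Grouping parsed lines by (node, core, strand) makes it easier to follow a single strand's messages when several strands report at once.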
84. o to Step 15. Otherwise, go to Step 16. 15. Download the system firmware. Refer to the Oracle ILOM documentation for instructions. 16. Perform diagnostics to verify the proper operation of the server module. 17. Transfer the serial number and product number to the FRUID of the new enclosure assembly. Refer to the SPARC T3-1B knowledge article for specific instructions for updating FRUID. Note: The replacement enclosure assembly does not have a label with the serial number on the front of the system, as was present on the original server module. 18. Update any customer database that contains RFID data to include data from the RFID on the new enclosure assembly. The RFID on the original server module contained different values. Related Information: Detecting and Managing Faults on page 5; Identifying Components on page 1. Returning the Server Module to Operation: The following topics describe how to return the server module to operation after removing it from the modular system for service: Replace the Cover on page 111; Install the Server Module Into the Modular System on page 112; Start the Server Module Host on page 114. Related Information: Preparing for Service on page 51. Replace the Cover: Perform this task after completing installation or servicing
85. of components inside the server module. 1. Set the cover on the server module (panel 1). The cover edge hangs over the rear of the server module by about half an inch (1 cm). 2. Slide the cover forward until it latches into place (panel 2). 3. Install the server module into the modular system chassis. See Install the Server Module Into the Modular System on page 112. Related Information: Install the Server Module Into the Modular System on page 112; Remove the Cover on page 62. Install the Server Module Into the Modular System: Caution: Hold the server module firmly with both hands so that you do not drop it. The server module weighs approximately 17 pounds (8.0 kg). Caution: Insert a filler panel into an empty modular system slot within 60 seconds of server module removal to ensure proper chassis cooling. 1. Remove the rear connector cover from the server module before inserting it in the modular system. 2. Remove a filler panel from the modular system chassis slot you intend to use. When the modular system is operating, you must fill every slot with a filler panel or a server module within 60 seconds. 3. Hold the server module in a vertical position so that both ejector levers are on the right (panel 1). 4. Slide the server module into the chassis (panel 2). 5. Clo
86. ognize hard drive faults and will not light the fault LEDs on either the server module or the hard drive itself. Use the Oracle Solaris message files to view hard drive faults. See View the System Message Log Files on page 24. Related Information: Oracle Integrated Lights Out Manager (ILOM) 3.0 Concepts Guide; SPARC T3 Series Servers Administration Guide; Oracle ILOM Troubleshooting Overview on page 12; Access the SP (Oracle ILOM) on page 15; Display FRU Information (show Command) on page 17; Check for Faults (show faulty Command) on page 18; Check for Faults (fmadm faulty Command) on page 20; Clear Faults (clear_fault_action Property) on page 21; Service Related Oracle ILOM Command Summary on page 21; Oracle ILOM Properties That Affect POST Behavior on page 33. Access the SP (Oracle ILOM): Note: Unless indicated otherwise, all examples of interaction with the SP are depicted with Oracle ILOM shell commands rather than the Oracle ILOM browser interface. You can access the server module's SP either directly or through the CMM of the modular system. The following steps describe connecting directly to the server module. 1. Establish connectivity to the SP using one of the following methods: Serial management port: Connect a terminal device (such as an ASCII terminal or laptop with terminal emul
87. om the Modular System for Service on page 55 Remove the Server Module From the Modular System on page 59 Remove the Cover on page 62 Related Information Returning the Server Module to Operation on page 111 General Safety Information For your protection observe the following safety precautions when setting up your equipment Follow all cautions and instructions marked on the equipment Follow all cautions and instructions described in the documentation that shipped with your system and in the SPARC T3 1B Server Module Safety and Compliance Guide Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment s electrical rating label Follow the electrostatic discharge safety practices as described in this section 51 52 Safety Symbols You will see the following symbols in various places in the server module documentation Note the explanations provided next to each symbol Caution There is a risk of personal injury or equipment damage To avoid personal injury and equipment damage follow the instructions Caution Hot surface Avoid contact Surfaces are hot and might cause personal injury if touched Caution Hazardous voltages are present To reduce the risk of electric shock and danger to personal health follow the instructions ESD Safety Measures ESD sensitive devices such as the motherboard cards hard
88. or Add a Drive on page 65. Install a drive filler: Install a Drive Filler on page 67. Related Information: Detecting and Managing Faults on page 5; Preparing for Service on page 51. Drive Hot Plugging Rules: To safely remove a hard drive, you must prevent any applications from accessing the drive and remove the logical software links. Drives cannot be hot plugged if the drive provides the operating system and the operating system is not mirrored on another drive, or if the drive cannot be logically isolated from the online operations of the server module. If either condition applies to your drive, you must shut down the Oracle Solaris OS before you replace the drive. Related Information: Remove a Drive on page 64; Replace or Add a Drive on page 65; Shut Down the Oracle Solaris OS on page 56. Remove a Drive: 1. Identify the drive you plan to remove. Use the Ready to Remove indicator. See Diagnostics LEDs on page 10. 2. Prepare the drive for removal by performing one of the following: Take the drive offline. The exact commands required to take the drive offline depend on the configuration of your drives. For example, you might need to unmount file systems or perform certain RAID commands. One command that is commonly used to take a drive offline is the cfgadm command. For more information, refer to the Solaris cfgadm man page. Shut down the Solar
89. or the latest information. All DIMMs in the server module must be the same capacity.
FIGURE: DIMM Slot Locations. Slots labeled in the figure: BOB0/CH1/D0, BOB0/CH1/D1, BOB0/CH0/D0, BOB0/CH0/D1, BOB1/CH0/D1, BOB1/CH0/D0, BOB1/CH1/D1, BOB1/CH1/D0, BOB3/CH1/D0, BOB3/CH1/D1, BOB3/CH0/D0, BOB3/CH0/D1, BOB2/CH0/D1, BOB2/CH0/D0, BOB2/CH1/D1, BOB2/CH1/D0.
Figure Legend: DIMM slots controlled by BOB0; DIMM slots controlled by BOB1; DIMM slots controlled by BOB3; DIMM slots controlled by BOB2; Fault remind button; Memory fault LED for the adjacent DIMM.
The slots are color coded to indicate which slots to use to install different quantities of DIMMs. 4 DIMMs: Blue slots. 8 DIMMs: White and blue slots. 16 DIMMs: Black, white, and blue slots.
The following table summarizes details on using each of the 16 DIMM slots.
DIMM Location | Slot Color | Slot Is Used For This Quantity of DIMMs | FRU Name (all start with /SYS/MB/CMP0/)
1 | Blue | 4, 8, 16 | BOB0/CH1/D0
2 | Black | 16 | BOB0/CH1/D1
3 | White | 8, 16 | BOB0/CH0/D0
4 | Black | 16 | BOB0/CH0/D1
5 | Black | 16 | BOB1/CH0/D1
6 | White | 8, 16 | BOB1/CH0/D0
7 | Black | 16 | BOB1/CH1/D1
8 | Blue | 4, 8, 16 | BOB1/CH1/D0
9 | Blue | 4, 8, 16 | BOB3/CH1/D0
10 | Black | 16 | BOB3/CH1/D1
11 | White | 8, 16 | BOB3/CH0/D0
12 | Black | 16 | BOB3/CH0
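The color coding above amounts to a lookup: the DIMM quantity selects a set of slot colors, and the colors select slots. The following Python sketch encodes only the table rows that are complete in this excerpt (locations 1 through 11). Later rows are deliberately omitted rather than guessed. The names `SLOT_TABLE` and `slots_for_quantity` are hypothetical, and writing the FRU names with / separators is an assumption about the path form.

```python
# (location, slot color, FRU name relative to /SYS/MB/CMP0/) for the rows
# that are complete in this excerpt. Rows 12-16 are intentionally left out.
SLOT_TABLE = [
    (1, "Blue", "BOB0/CH1/D0"),
    (2, "Black", "BOB0/CH1/D1"),
    (3, "White", "BOB0/CH0/D0"),
    (4, "Black", "BOB0/CH0/D1"),
    (5, "Black", "BOB1/CH0/D1"),
    (6, "White", "BOB1/CH0/D0"),
    (7, "Black", "BOB1/CH1/D1"),
    (8, "Blue", "BOB1/CH1/D0"),
    (9, "Blue", "BOB3/CH1/D0"),
    (10, "Black", "BOB3/CH1/D1"),
    (11, "White", "BOB3/CH0/D0"),
]

# Which slot colors are populated for each supported DIMM quantity.
COLORS_BY_QUANTITY = {
    4: {"Blue"},
    8: {"Blue", "White"},
    16: {"Blue", "White", "Black"},
}


def slots_for_quantity(quantity):
    """Return (location, FRU name) pairs to populate for a DIMM quantity."""
    if quantity not in COLORS_BY_QUANTITY:
        raise ValueError("supported quantities are 4, 8, and 16 DIMMs")
    colors = COLORS_BY_QUANTITY[quantity]
    return [(loc, name) for loc, color, name in SLOT_TABLE if color in colors]
```

Because the table here stops at location 11, results for any quantity are partial and cover only the slots listed above.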
90. r Module Service Manual July 2012 Servicing a FEM The server module supports the installation of one FEM To see a list of supported FEMs for this server module refer to the SPARC T3 1B Server Module Product Notes Description Links Replace a FEM Remove a FEM on page 89 Install a FEM on page 90 Install a FEM Install a FEM on page 90 Related Information m Detecting and Managing Faults on page 5 m Preparing for Service on page 51 V Remove a FEM In addition to when you replace a FEM you might need to remove a FEM in the FEM 1 connector to access the clock battery 1 Prepare for service See m Shut Down the Oracle Solaris OS on page 56 m Prepare the Server Module for Removal on page 58 m Remove the Server Module From the Modular System on page 59 m Remove the Cover on page 62 m ESD Safety Measures on page 52 2 Lift the lever to eject the card panel 1 89 3 Rotate the card up and off the retainer panels 2 and 3 4 Place the removed FEM on an antistatic mat 5 Install a FEM See Install a FEM on page 90 Related Information m Install a FEM on page 90 Y Install a FEM This procedure applies to any of the form factors of FEM cards that are supported by this server module 1 Prepare for service by performing the following tasks m Shut Down the Oracle Solaris OS on page 56 90 SPARC T3 1B Server Module
91. roviding service and system firmware version information Displays information about the system serial number 22 SPARC T3 1B Server Module Service Manual July 2012 Related Information m Oracle ILOM Troubleshooting Overview on page 12 m Access the SP Oracle ILOM on page 15 m Display FRU Information show Command on page 17 m Check for Faults show faulty Command on page 18 m Check for Faults fmadm faulty Command on page 20 m Clear Faults clear_fault_action Property on page 21 m Oracle ILOM Properties That Affect POST Behavior on page 33 Interpreting Log Files and System Messages With the Oracle Solaris OS running on the server module you have the full complement of Oracle Solaris OS files and commands available for collecting information and for troubleshooting If POST or the Oracle Solaris PSH features do not indicate the source of a fault check the message buffer and log files for notifications for faults Hard disk drive faults are usually captured by the Oracle Solaris message files m Check the Message Buffer dmesg Command on page 24 m View the System Message Log Files on page 24 m List FRU Status prtdiag Command on page 25 Related Information m Diagnostics Overview on page 5 m Diagnostics Process on page 7 m Managing Faults Oracle ILOM on page 12 m Managing Faults Oracle Solaris PSH on page 25 m Manag
92. rtdiag command 25 PSH faults detected by 8 knowledge article web site 28 overview 26 topics 25 PSH detected faults 18 checking for 28 clearing 30 example 27 R related documentation x REM installing 86 removing 85 topics 85 Remind button 70 Remind Power LED 70 removing battery 105 cover 62 DIMMs 73 drive fillers 67 drives 64 FEM 89 flash drive 101 ID PROM 97 REM 85 server module 55 59 service processor card 93 returning to operation 111 running POST in Diag Mode 37 Index 123 S U safety 51 Universal Unique Identifier UUID 28 SCC 97 USB flash drive 101 SER MGT port 15 serial number 54 109 V serial port 15 var adm messages file 24 server module verifying host 114 DIMMs 75 79 installing 112 ID PROM 99 power on 114 viewing system message log files 24 service processor VTS accessing 15 checking if VTS is installed 50 service processor card 93 overview 49 installing 94 packages 50 removing 93 test types 49 Service Required system LED topics 48 triggered by ILOM 13 show command 17 show faulty command 8 18 22 30 40 75 showcomponent command 45 shutdown command 56 slot assignments DIMMs 81 Solaris OS shutting down 56 Solaris Predictive Self Healing PSH and memory faults 69 SP 93 standby mode 57 58 supported DIMMs 81 system components see components system message log files 24 using for fault diagnosis 8 T
93. rver Module Enclosure Assembly on page 107 Servicing a Service Processor Card on page 93 Servicing Memory on page 69 Servicing a FEM on page 89 Servicing a REM on page 85 Servicing the Battery on page 105 Servicing a USB Flash Drive on page 101 Servicing the ID PROM on page 97 Servicing Hard Drives on page 63 Notes FRU Name If Applicable SYS HDDn where n 0 3 SYS MB SYS MB SP SYS MP CMP0O BOBn CHn Dn SYS MB FEMn SYS MB REM SYS MB BAT Remove before inserting the server module in a slot Oracle does not offer supported USB flash drives for this server module SYS MB SCC Related Information m Front and Rear Panel Components on page 2 m Detecting and Managing Faults on page 5 m Replacing the Server Module Enclosure Assembly on page 107 4 SPARC T3 1B Server Module Service Manual July 2012 Detecting and Managing Faults These topics explain how to use various diagnostic tools to monitor server module status and troubleshoot faults in the server module Diagnostics Overview on page 5 Diagnostics Process on page 7 Diagnostics LEDs on page 10 Managing Faults Oracle ILOM on page 12 Interpreting Log Files and System Messages on page 23 Managing Faults Oracle Solaris PSH on page 25 Managing Faults POST on page 31 Managing Components ASR Commands
94. s 74 drive fillers 67 FEM 90 flash drive 102 ID PROM 98 REM 86 server module 112 service processor card 94 installing drives 65 L LEDs DIMMs 70 front panel 2 interpreting 10 Remind Power 70 locating faulty DIMMs 70 locating the server module to be serviced 55 log files 8 24 logging into ILOM 15 M MAC address 99 maximum testing with POST 37 memory configuration guidelines 81 memory faults 69 memory faults and POST 69 message buffer checking 24 message identifier 28 messages POST fault 39 motherboard 107 N NET MGT port 15 O ok prompt 56 Oracle Solaris log files 8 Oracle Solaris OS checking log files for fault information 8 files and commands 23 Oracle Solaris Predictive Self Healing See PSH Oracle Solaris Predictive Self Healing PSH see PSH OS log files 8 P password default ILOM 15 POST clearing faults 40 components disabled by 44 configuration examples 35 configuring 35 faults detected by 8 32 interpreting POST fault messages 39 and memory faults 69 modes and ILOM parameters 33 output 42 overview running 31 running in Diag Mode 37 troubleshooting with 9 using for fault diagnosis 8 POST detected faults 18 power button 57 58 powering on 114 power on self test See POST Predictive Self Healing See PSH Predictive Self Healing PSH see PSH preparing for service 51 p
95. s see Front and Rear Panel Components on page 2. The table identifies the server module LEDs and explains how to interpret their behavior. Columns: LED or Button; Icon or Label; Color; Description. Locator LED and button (white): You can turn on the Locator LED to identify a particular server module. When on, the LED blinks rapidly. There are two methods for turning a Locator LED on: issuing the Oracle ILOM command set /SYS/LOCATE value=Fast_Blink, or pressing the Locator button. The Locator LED functions as the physical presence switch. Ready to Remove LED (blue): Steady state. If the LED is off, it is not safe to remove the server module from the modular system chassis. You must use Oracle ILOM to shut down the server module and put the blade into the ready to remove state before this LED is on. Service Action Required LED (amber): Indicates that service is required. POST and Oracle ILOM are two diagnostics tools that can detect a fault or failure resulting in this indication. Also, faults detected by Solaris PSH can result in Oracle ILOM lighting this LED. The Oracle ILOM show faulty command provides details about any faults that cause this indicator to light. Under some fault conditions, individual component fault LEDs are turned on in addition to the Service Action Required LED. LED or Button; Icon or Label; Color; Description. Power OK LED
96. se both latches simultaneously, locking the server module in the modular system chassis (panel 3). Once the server module is installed, the following activities take place: standby power is applied; the front panel LEDs blink three times, then the green OK LED on the front panel blinks for a few minutes; Oracle ILOM is initialized on the server module SP and ready to use, but the server module host is not started. 6. Start the server module host. See Start the Server Module Host on page 114. Related Information: Start the Server Module Host on page 114; Remove the Server Module From the Modular System on page 59. Start the Server Module Host: Perform this step after the server module is installed in a powered modular system. 1. Perform one of the following actions: Press the Power button on the front of the server module (see Front and Rear Panel Components on page 2 to locate the Power button), or access Oracle ILOM on the server module and run the start /SYS command. Note: The server module power-on process can take several minutes to complete, depending on the amount of installed memory and the configured diagnostic level. By default, the server module boots the Oracle Solaris OS. 2. Perform any diagnostics that verify the results of servicing the server module. Related Information: Detecting and Managing Faults on page 5.
97. ser UI, terminal UI, or command UI. You can run tests in a variety of modes for online and offline testing. VTS also provides a choice of security mechanisms. VTS software is provided in the preinstalled Oracle Solaris OS that shipped with the server module. Related Information: Oracle VTS documentation; Check if Oracle VTS Software Is Installed on page 50. Check if Oracle VTS Software Is Installed: 1. Log in as superuser. 2. Check for the presence of VTS packages: pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn. If information about the packages is displayed, then VTS software is installed. If you receive messages reporting ERROR: information for package was not found, then VTS is not installed. You must install the software before you can use it. You can obtain the VTS software from the following places: the Oracle Solaris OS media kit (DVDs), or as a download from the web. Related Information: Oracle VTS documentation. Preparing for Service: The following topics describe how to prepare the server module for servicing: General Safety Information on page 51; Tools Needed for Service on page 53; Find the Modular System Serial Number on page 53; Find the Server Module Serial Number on page 54; Locate the Server Module on page 55; Removing the Server Module Fr
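The package check above comes down to looking for "was not found" errors in the pkginfo output. The Python sketch below interprets already captured output text and does not run pkginfo itself. The exact error wording beyond "ERROR" and "was not found" is an assumption, and the function name `missing_vts_packages` is hypothetical.

```python
import re

# Package names taken from the pkginfo command shown in the procedure.
VTS_PACKAGES = ("SUNWvts", "SUNWvtsr", "SUNWvtsts", "SUNWvtsmn")


def missing_vts_packages(pkginfo_output, packages=VTS_PACKAGES):
    """Return the package names that the captured output reports as missing.

    A package counts as missing when an ERROR line containing
    "was not found" mentions it as a whole word.
    """
    error_lines = [
        line for line in pkginfo_output.splitlines()
        if line.lstrip().startswith("ERROR") and "was not found" in line
    ]
    missing = []
    for package in packages:
        # Word boundaries keep SUNWvts from matching inside SUNWvtsr.
        pattern = re.compile(r"\b" + re.escape(package) + r"\b")
        if any(pattern.search(line) for line in error_lines):
            missing.append(package)
    return missing
```

An empty result suggests that no package was reported as missing, although the full output is still worth a look to confirm that package details were actually displayed.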
98. ssor Card 1 If possible save the configuration information for the SP Refer to the related procedures using Oracle ILOM in the SPARC T3 Series Servers Administration Guide 2 Prepare for service See m Shut Down the Oracle Solaris OS on page 56 m Prepare the Server Module for Removal on page 58 m Remove the Server Module From the Modular System on page 59 m Remove the Cover on page 62 m ESD Safety Measures on page 52 3 Lift the lever to eject the SP card panel 1 93 4 Rotate the card up and off the retainer panels 2 and 3 Set the card on an antistatic mat 5 Continue to install the new card See Install the Service Processor Card on page 94 Related Information m Install the Service Processor Card on page 94 V Install the Service Processor Card 1 If needed Remove the service processor card See Remove the Service Processor Card on page 93 94 SPARC T3 1B Server Module Service Manual July 2012 2 Insert the replacement service processor card into the retainer panel 1 Make sure the tab is aligned with the key panel 2 3 Lower the service processor card until it is aligned with the connector panel 3 4 Seat the service processor card into the connector by pressing the card toward the tabs while pressing down panel 4 When the service processor card is in place the lever will close 5 Return the server module to the chassis Se
99. state Enabled
/SYS/MB/CMP0/PEU0           component_state   Enabled
/SYS/MB/CMP0/PEU1           component_state   Enabled
/SYS/MB/CMP0/BOB0/CH0/D0    component_state   Enabled
/SYS/MB/CMP0/BOB0/CH1/D0    component_state   Enabled
/SYS/MB/CMP0/BOB1/CH0/D0    component_state   Disabled
/SYS/MB/CMP0/BOB1/CH1/D0    component_state   Enabled
/SYS/MB/CMP0/BOB2/CH0/D0    component_state   Enabled
/SYS/MB/CMP0/BOB2/CH1/D0    component_state   Enabled
/SYS/MB/CMP0/BOB3/CH0/D0    component_state   Enabled
/SYS/MB/CMP0/BOB3/CH1/D0    component_state   Enabled
/SYS/MB/GBE                 component_state   Enabled
/SYS/MB/USB                 component_state   Enabled
/SYS/MB/VIDEO               component_state   Enabled
/SYS/MB/PCI-SWITCH0         component_state   Enabled
/SYS/MB/PCI-SWITCH1         component_state   Enabled

Related Information
- View the System Message Log Files on page 24
- Disable System Components on page 47
- Enable System Components on page 48

Disable System Components

You disable a component by setting its component_state property to Disabled. This adds the component to the ASR blacklist.

1. At the -> prompt, set the component_state property to Disabled:
     -> set /SYS/MB/CMP0/BOB1/CH0/D0 component_state=Disabled
2. Reset the server module so that the ASR command takes effect:
     -> stop /SYS
     Are you sure you want to stop /SYS (y/n)? y
     Stopping
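When the component listing is long, it can help to pick out the entries that are not Enabled before you decide what to disable or re-enable. The following is a minimal sketch under stated assumptions: the listing has been captured as text, each component occupies one line (real Oracle ILOM output may wrap long target names across two lines, which this sketch does not handle), and the function names are hypothetical.

```python
from typing import Dict, List


def parse_component_states(listing: str) -> Dict[str, str]:
    """Map each /SYS target to its component_state value.

    Assumes one "target  component_state  value" row per line;
    optional "|" column separators are tolerated.
    """
    states: Dict[str, str] = {}
    for line in listing.splitlines():
        fields = line.replace("|", " ").split()
        if (
            len(fields) == 3
            and fields[0].startswith("/SYS")
            and fields[1] == "component_state"
        ):
            states[fields[0]] = fields[2]
    return states


def disabled_components(listing: str) -> List[str]:
    """Return targets whose component_state is Disabled, in listing order."""
    return [
        target
        for target, state in parse_component_states(listing).items()
        if state == "Disabled"
    ]


# Illustrative sample built from rows that appear in the listing above.
SAMPLE_LISTING = """\
/SYS/MB/CMP0/BOB0/CH1/D0    component_state   Enabled
/SYS/MB/CMP0/BOB1/CH0/D0    component_state   Disabled
/SYS/MB/GBE                 component_state   Enabled
"""
```

With the sample rows, `disabled_components(SAMPLE_LISTING)` reports only the DIMM at /SYS/MB/CMP0/BOB1/CH0/D0, which is the component the procedure above adds to the ASR blacklist.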
100. t and Rear Panel Components on page 2.

Related Information
- Shut Down the Oracle Solaris OS on page 56
- Power Off the Server Module (Emergency Shutdown) on page 58
- Prepare the Server Module for Removal on page 58

Power Off the Server Module (Emergency Shutdown)

Caution: All applications and files will be closed abruptly without saving changes. File system corruption might occur.

Press and hold the Power button for four seconds. Use a stylus or the tip of a pen to operate this button.

Related Information
- Shut Down the Oracle Solaris OS on page 56
- Power Off the Server Module (Power Button, Standby Mode) on page 57
- Prepare the Server Module for Removal on page 58

Prepare the Server Module for Removal

1. Log in to Oracle ILOM on the server module you plan to remove.
2. Ensure the server module is in standby mode with the host powered off. Type:
     -> show /SYS power_state
     /SYS
        Properties:
           power_state = Off
   If you do not see this message, check that you have performed all the steps in Shut Down the Oracle Solaris OS on page 56.
3. Type:
     -> set /SYS prepare_to_remove_action=true
     Set 'prepare_to_remove_action' to 'true'
   The server module is in standby mode. Power is removed from the host while standby power is applied to the SP.
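Step 2 of "Prepare the Server Module for Removal" gates step 3 on the host being powered off. The following is a minimal sketch of that check, assuming the output of the show command has been captured as text and contains a `power_state = Off` style line. The function name `host_is_powered_off` is hypothetical and not an Oracle ILOM feature.

```python
def host_is_powered_off(show_output: str) -> bool:
    """Return True only if a power_state line reports Off."""
    for line in show_output.splitlines():
        name, separator, value = line.partition("=")
        if separator and name.strip() == "power_state":
            return value.strip() == "Off"
    # No power_state line found: treat as "not confirmed off".
    return False


# Illustrative capture of the expected response.
SAMPLE_SHOW_OUTPUT = """\
/SYS
   Properties:
      power_state = Off
"""
```

Returning False when the property is missing is a deliberate choice: the procedure proceeds to `prepare_to_remove_action` only after the Off state has been positively confirmed.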
101. tcome Additional Information

Diagnostic actions:
- Flowchart item 6: Check if the fault is environmental.
- Flowchart item 7: Determine if the fault was detected by PSH.
- Flowchart item 8: Determine if the fault was detected by POST.
- Flowchart item 9: Contact technical support.

Item 6: Determine if the fault is an environmental fault or a configuration fault. If the fault listed by the show faulty command displays a temperature or voltage fault, then the fault is an environmental fault. Environmental faults can be caused by faulty FRUs or by environmental conditions, such as when computer room ambient temperature is too high or airflow is blocked. When the environmental condition is corrected, the fault will automatically clear.

For additional information on a reported fault, including possible corrective action, go to this web site: http://www.sun.com/msg/message-ID, where message-ID is the message contained in the fault message.

Item 7: If the fault message does not begin with the characters SPT, the fault was detected by the PSH feature. For additional information on a reported fault, including possible corrective action, go to this web site: http://www.sun.com/msg/message-ID, where message-ID is the message contained in the fault message. After the FRU is replaced, perform the procedure to clear PSH-detected faults.

Item 8: POST performs basic tests of the server module components and reports faulty FRUs. When POST detects a faulty FR
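The triage rules for flowchart items 6 and 7 can be written down as a small decision function. This is a minimal sketch under stated assumptions: matching on the words "temperature" and "voltage" is a simplification of "displays a temperature or voltage fault", the function name `fault_source` is hypothetical, and the message IDs in the usage note are placeholders rather than real identifiers.

```python
def fault_source(message_id: str, description: str = "") -> str:
    """Classify a reported fault using the rules for flowchart items 6 and 7.

    Returns "environmental", "PSH", or "not PSH".
    """
    text = description.lower()
    # Item 6: temperature or voltage faults are environmental.
    if "temperature" in text or "voltage" in text:
        return "environmental"
    # Item 7: a message that does not begin with SPT was detected by PSH.
    if not message_id.startswith("SPT"):
        return "PSH"
    # Otherwise continue with item 8 (POST) in the flowchart.
    return "not PSH"
```

For example, `fault_source("XXXX-0000-XX")` returns "PSH" because the placeholder ID does not start with SPT, while `fault_source("SPT-0000-XX", "voltage out of range")` returns "environmental".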
102. to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.

Copyright 2010, 2012, Oracle and/or its affiliates. All rights reserved. This software and the accompanying documentation are protected by intellectual property laws. They are provided under license and are subject to restrictions on use and disclosure. Except as provided in your license agreement or by law, you may not copy, reproduce, translate, broadcast, modify, patent, transmit, distribute, exhibit, perform, publish, or display the software, even in part, in any form or by any means. In addition, reverse engineering, disassembly, or decompilation of the software is prohibited, except for purposes of interoperability with third-party software or as prescribed by law. The information provided in this document is subject to change without notice. In addition, Oracle Corporation does not warrant that it is error-free and invites you to report any errors in writing. If this soft
103. ts
Using This Documentation ix
Identifying Components 1
  Front and Rear Panel Components 2
  Illustrated Parts Breakdown 3
Detecting and Managing Faults 5
  Diagnostics Overview 5
  Diagnostics Process 7
  Diagnostics LEDs 10
  Managing Faults (Oracle ILOM) 12
    Oracle ILOM Troubleshooting Overview 12
    Fault Management 13
    Fault Clearing 13
    Oracle Solaris Fault Manager Commands in Oracle ILOM 14
    HDD Faults 14
    Access the SP (Oracle ILOM) 15
    Display FRU Information (show Command) 17
    Check for Faults (show faulty Command) 18
    Check for Faults (fmadm faulty Command) 20
    Clear Faults (clear_fault_action Property) 21
    Service-Related Oracle ILOM Command Summary 21
  Interpreting Log Files and System Messages 23
    Check the Message Buffer (dmesg Command) 24
    View the System Message Log Files 24
    List FRU Status (prtdiag Command) 25
  Managing Faults (Oracle Solaris PSH) 25
    Oracle Solaris PSH Technology Overview 26
    PSH-Detected Fault Example 27
    Check for PSH-Detected Faults 28
    Clear PSH-Detected Faults 30
  Managing Faults (POST) 31
    POST Overview 32
    Oracle ILOM Properties That Affect POST Behavior 33
    Configure How POST Runs 35
    Run POST With Maximum Testing 37
    Interpret POST Fault Messages 39
    Clear POST-Detected Faults 40
    POST Error Message Syntax 42
  Managing Components (ASR Commands) 44
    ASR Overview 44
    Display System Components 45
    Disable System Components 47
    Enable System Components 48
104. u can also configure Oracle ILOM to send email alerts of hardware failures, hardware warnings, and other events related to the server module or Oracle ILOM. The SP runs independently of the server module, using the server module's standby power. Therefore, Oracle ILOM firmware and software continue to function when the server module OS goes offline or when the server module is powered off.

Fault Management

Error conditions detected by Oracle ILOM, POST, and the Oracle Solaris PSH technology are forwarded to Oracle ILOM for fault handling.

[Figure: fault management flow. Labels: Environmentals, POST, Solaris PSH, Oracle ILOM fault manager, FRU fault LEDs, System fault LED, User alerts, show faulty]

The Oracle ILOM fault manager evaluates error messages it receives to determine whether the condition being reported should be classified as an alert or a fault.

- Alerts: When the fault manager determines that an error condition being reported does not indicate a faulty FRU, the fault manager classifies the error as an alert. Alert conditions are often caused by environmental conditions, such as computer room temperature, which might improve over time. Conditions might also be caused by a configuration error, such as the wrong DIMM type being installed. If the conditions responsible for the alert go away, the fault manager will detect the change and will stop logging alerts for th
105. wer on
- locked: The system can power on and run POST, but no flash updates can be made.

/HOST/diag mode
- off: POST does not run.
- normal: Runs POST according to diag level value.
- service: Runs POST with preset values for diag level and diag verbosity.

/HOST/diag level
- max: If diag mode = normal, runs all the minimum tests, plus extensive processor and memory tests.
- min: If diag mode = normal, runs minimum set of tests.

/HOST/diag trigger
- none: Does not run POST on reset.
- hw-change: (Default) Runs POST following an AC power cycle and when the top cover is removed.
- power-on-reset: Only runs POST for the first power on.
- error-reset: (Default) Runs POST if fatal errors are detected.
- all-resets: Runs POST after any reset.

/HOST/diag verbosity
- normal: POST output displays all test and informational messages.
- min: POST output displays functional tests with a banner and pinwheel.
- max: POST displays all test, informational, and some debugging messages.
- debug
- none: No POST output is displayed.

The following flowchart illustrates the same set of Oracle ILOM set command variables.

[Figure: flowchart of host boot settings]

The following table shows combinations of Oracle ILOM parameters and associated POST modes.

Columns: Oracle ILOM Parameter | Normal Diagnostic Mode (Default Settings) | No POST Execution | Service Mode | Usi
106. y Not all system components tested
2010-07-03 18:44:16.786 0:7:2>POST: Return to VBSC
2010-07-03 18:44:16.795 0:7:2>ERROR:
2010-07-03 18:44:16.839 0:7:2>   POST toplevel status has the following failures:
2010-07-03 18:44:16.952 0:7:2>   Node 0
2010-07-03 18:44:17.051 0:7:2>   /SYS/MB/CMP0/BOB0/CH1/D0 (J1001)
2010-07-03 18:44:17.145 0:7:2>   /SYS/MB/CMP0/BOB1/CH1/D0 (J3001)
2010-07-03 18:44:17.241 0:7:2>END_ERROR

Related Information
- Oracle ILOM Properties That Affect POST Behavior on page 33
- Run POST With Maximum Testing on page 37
- Clear POST Detected Faults on page 40

Managing Components (ASR Commands)

The following topics explain the role played by the Automatic System Recovery (ASR) feature and how to manage the components it controls.
- ASR Overview on page 44
- Display System Components on page 45
- Disable System Components on page 47
- Enable System Components on page 48

Related Information
- Diagnostics Overview on page 5
- Diagnostics Process on page 7
- Managing Faults (Oracle ILOM) on page 12
- Interpreting Log Files and System Messages on page 23
- Managing Faults (Oracle Solaris PSH) on page 25
- Managing Faults (POST) on page 31
- Checking if Oracle VTS Software Is Installed on page 48

ASR Overview

The ASR feature enables the s
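The POST output at the start of this excerpt lists failed components between the ERROR and END_ERROR lines. The following is a minimal sketch of how such a block could be scanned for the failing /SYS targets. It assumes the log has been captured as text and that each message follows a ">" prompt marker; the timestamp layout in the sample is a reconstruction, and the function name `failed_components` is hypothetical.

```python
from typing import List


def failed_components(post_log: str) -> List[str]:
    """Return /SYS targets listed between ERROR and END_ERROR lines."""
    failures: List[str] = []
    in_error_block = False
    for line in post_log.splitlines():
        # Everything after the first ">" is treated as the POST message.
        _, _, message = line.partition(">")
        message = message.strip()
        if message.startswith("END_ERROR"):
            in_error_block = False
        elif message.startswith("ERROR"):
            in_error_block = True
        elif in_error_block and message.startswith("/SYS"):
            # Keep the target path, drop the connector label such as (J1001).
            failures.append(message.split()[0])
    return failures


# Illustrative sample based on the POST output above.
SAMPLE_POST_LOG = """\
2010-07-03 18:44:16.786 0:7:2>POST: Return to VBSC
2010-07-03 18:44:16.795 0:7:2>ERROR:
2010-07-03 18:44:16.839 0:7:2>   POST toplevel status has the following failures:
2010-07-03 18:44:16.952 0:7:2>   Node 0
2010-07-03 18:44:17.051 0:7:2>   /SYS/MB/CMP0/BOB0/CH1/D0 (J1001)
2010-07-03 18:44:17.145 0:7:2>   /SYS/MB/CMP0/BOB1/CH1/D0 (J3001)
2010-07-03 18:44:17.241 0:7:2>END_ERROR
"""
```

The returned paths use the same naming as the ASR component listing, so they can be compared directly against the component_state entries.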
107. you suspect that a fault was not automatically cleared. This procedure describes how to identify a POST-detected fault and, if necessary, manually clear the fault.

In most cases, when POST detects a faulty component, POST logs the fault and automatically takes the failed component out of operation by placing the component in the ASR blacklist. See Managing Components (ASR Commands) on page 44.

Usually, when a faulty component is replaced, the replacement is detected when the SP is reset or power cycled. Then the fault is automatically cleared from the system.

1. After replacing a faulty FRU, at the Oracle ILOM prompt use the show faulty command to identify POST-detected faults. POST-detected faults are distinguished from other kinds of faults by the text "Forced fail". No UUID number is reported. Example:
     -> show faulty
     Target              Property            Value
     /SP/faultmgmt/0     fru                 /SYS/MB/CMP0/BOB1/CH0/D0
     /SP/faultmgmt/0     timestamp           Dec 21 16:40:56
     /SP/faultmgmt/0/    timestamp           Dec 21 16:40:56
      faults/0
     /SP/faultmgmt/0/    sp_detected_fault   /SYS/MB/CMP0/BOB1/CH0/D0
      faults/0                               Forced fail (POST)
2. Take one of the following actions based on the show faulty output:
   - No fault is reported: The system cleared the fault and you do not need to manually clear the fault. Do not perform the subsequent steps.
   - Fault reported: Go to th
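Step 1 above identifies POST-detected faults by the text "Forced fail" in the show faulty output. The following is a minimal sketch of that check, assuming the output has been captured as text, that each row starts with a /SP/faultmgmt/N target, and that wrapped continuation lines belong to the most recent target. The function name `post_detected_faults` is hypothetical.

```python
import re
from typing import Dict, Optional, Set


def post_detected_faults(show_faulty_output: str) -> Dict[int, str]:
    """Map fault index to FRU path for faults marked "Forced fail"."""
    fru_by_fault: Dict[int, str] = {}
    forced_fail: Set[int] = set()
    current: Optional[int] = None
    for line in show_faulty_output.splitlines():
        match = re.match(r"\s*/SP/faultmgmt/(\d+)", line)
        if match:
            current = int(match.group(1))
        if current is None:
            continue  # prompt or header lines before the first fault row
        fields = line.replace("|", " ").split()
        if len(fields) >= 3 and fields[1] == "fru":
            fru_by_fault[current] = fields[2]
        if "Forced fail" in line:
            forced_fail.add(current)
    return {n: fru for n, fru in fru_by_fault.items() if n in forced_fail}


# Illustrative sample based on the example output above.
SAMPLE_SHOW_FAULTY = """\
-> show faulty
Target              Property            Value
/SP/faultmgmt/0     fru                 /SYS/MB/CMP0/BOB1/CH0/D0
/SP/faultmgmt/0     timestamp           Dec 21 16:40:56
/SP/faultmgmt/0/    timestamp           Dec 21 16:40:56
 faults/0
/SP/faultmgmt/0/    sp_detected_fault   /SYS/MB/CMP0/BOB1/CH0/D0
 faults/0                               Forced fail (POST)
"""
```

An empty result corresponds to the "No fault is reported" branch of step 2 as far as POST-detected faults are concerned. Faults detected by other means are intentionally left out of the result.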
108. ystem Fault LED is lit.
   - If the System Fault LED is not lit, and you suspect there is a problem, see Diagnostics Process on page 7.
   - If the System Fault LED is lit, go to the next step.
2. (If needed) Prepare for service. See:
   - Shut Down the Oracle Solaris OS on page 56
   - Prepare the Server Module for Removal on page 58
   - Remove the Server Module From the Modular System on page 59
   - Remove the Cover on page 62
   - ESD Safety Measures on page 52
3. Press the Remind button on the motherboard. While the Remind button is pressed, an LED next to the faulty DIMM illuminates, enabling you to identify the faulty DIMM.
   Tip: The DIMM Fault LEDs are small and difficult to identify when they are not illuminated. If you do not see any illuminated LEDs in the area of the DIMM LEDs, assume that the DIMMs are not faulty.

   FIGURE: Locating Faulty DIMMs
   Figure Legend
   1  DIMM 1 (BOB0/CH1/D0)
   2  Fault LED for DIMM 1
   3  Locate button for LEDs of faulty DIMMs
4. Remove the faulty DIMM. See Remove a DIMM on page 73.

Related Information
- DIMM Configuration Reference on page 81
- Remove a DIMM on page 73

Remove a DIMM

Caution: This procedure involves handling circuit boards that are extremely sensitive to static electricity. Ensure that you follow ESD preventative practices