SPARC Enterprise T2000 Server Service Manual
Contents
1. FIGURE 5-22 Removing the Fan Power Board
   SPARC Enterprise T2000 Server Service Manual, April 2007
   5.2.14 Replacing the Fan Power Board
   1. Unpackage the replacement fan power board and place it on an antistatic mat.
   2. Lower the board into place and slide the board to the left to plug it into the front I/O board.
   3. Secure the board to the chassis with the screws.
   4. Reinstall all three fans. See Section 4.2.2, Replacing a Fan, on page 4-4.
   5. Perform the procedures described in Section 5.3, Common Procedures for Finishing Up, on page 5-41.
   5.2.15 Removing the Front I/O Board
   1. Perform the procedures described in Section 5.1, Common Procedures for Parts Replacement, on page 5-1.
   2. Remove all three fans. See Section 4.2.1, Removing a Fan, on page 4-2.
   3. Disengage the fan power board from the front I/O board (Step 3 and Step 4 in Section 5.2.13, Removing the Fan Power Board, on page 5-34).
   4. Remove the fan guard to gain access to the M3x6 flat-head screw that secures the front I/O board to the chassis.
      a. Remove the screw that secures the fan guard to the chassis.
         FIGURE 5-23 Removing the Fan Guard
      b. Remove the fan guard from the chassis.
   5. Disconnect the front I/O board data cable.
   6. Remove the LED board. See Section 5.2.11, Removing the LED Board, on page 5-32.
   7. Remove the screw that se
2. LED: Blower Unit LED. Color: Amber.
      On: The blower unit is faulty.
      Off: Normal operation.
   Note: When a blower fault is detected, the Rear-FRU Fault LED is lit.
   3.2.6 Ethernet Port LEDs
   The ALOM CMT management Ethernet port and the four 10/100/1000 Mbps Ethernet ports each have two LEDs, as shown in FIGURE 3-8 and described in TABLE 3-7.
   FIGURE 3-8 Ethernet Port LEDs
   TABLE 3-7 Ethernet Port LEDs
   LED        Color           Description
   Left LED   Amber or Green  Speed indicator:
                              Amber on: The link is operating as a Gigabit connection (1000 Mbps).
                              Green on: The link is operating as a 100 Mbps connection.
                              Off: The link is operating as a 10 Mbps connection.
   Right LED  Green           Link/Activity indicator:
                              Steady on: A link is established.
                              Blinking: There is activity on this port.
                              Off: No link is established.
   Note: The NET MGT port only operates in 100 Mbps or 10 Mbps, so the speed indicator LED will be green or off (never amber).
   3.3 Using ALOM CMT for Diagnosis and Repair Verification
   The Advanced Lights Out Manager (ALOM) CMT is a system controller in the server that enables you to remotely manage and administer your server. ALOM CMT enables you to run diagnostics remotely, such as power-on self-test (POST), that would otherwise require physical proximity to the server's serial port. You can also configure ALOM CMT to send e
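The LED meanings in TABLE 3-7 can be expressed as a small lookup. The following is a minimal sketch, not part of the manual: the function name `decode_port_leds` and the state strings are hypothetical, and the mapping simply restates the table as excerpted above.

```python
# Illustrative lookup built from TABLE 3-7. All names here are hypothetical.
SPEED_BY_LEFT_LED = {"amber": 1000, "green": 100, "off": 10}  # Mbps
LINK_BY_RIGHT_LED = {
    "steady": "link established",
    "blinking": "activity on port",
    "off": "no link",
}


def decode_port_leds(left, right, net_mgt=False):
    """Translate the two Ethernet port LED states into speed and link status.

    left    -- state of the left (speed) LED: "amber", "green", or "off"
    right   -- state of the right (link/activity) LED: "steady", "blinking", or "off"
    net_mgt -- True for the NET MGT port, which never shows amber
    """
    left, right = left.lower(), right.lower()
    if left not in SPEED_BY_LEFT_LED:
        raise ValueError(f"unknown left LED state: {left!r}")
    if right not in LINK_BY_RIGHT_LED:
        raise ValueError(f"unknown right LED state: {right!r}")
    if net_mgt and left == "amber":
        # The table note says NET MGT runs at 100 or 10 Mbps only.
        raise ValueError("the NET MGT port speed LED is green or off, never amber")
    return {
        "speed_mbps": SPEED_BY_LEFT_LED[left],
        "link": LINK_BY_RIGHT_LED[right],
    }
```

For example, `decode_port_leds("green", "blinking")` reports a 100 Mbps link with activity. The sketch follows the table literally, so a left LED that is off is reported as 10 Mbps even when the right LED shows no link.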
3. FIGURE 5-2 Locating the Metal Lever
   Caution: The server weighs approximately 40 lb (18 kg). The next step requires two people to dismount and carry the chassis.
   4. From the front of the server, pull the release tabs forward and pull the server forward until it is free of the rack rails. The release tabs are located on each rail, about midway on the server.
   5. Set the server on a sturdy work surface.
   5.1.5 Disconnecting Power From the Server
   Caution: The system supplies standby power to the circuit boards even when the system is powered off.
   Disconnect both power cords from the power supplies.
   Note: The following FRU replacements do not require that power be removed: DIMMs and PCI cards.
   5.1.6 Performing Electrostatic Discharge Prevention Measures
   1. Prepare an antistatic surface on which to set parts during removal and installation. Place ESD-sensitive components, such as the printed circuit boards, on an antistatic mat. The following items can be used as an antistatic mat:
      - Antistatic bag used to wrap a replacement part
      - ESD mat, part number 250-1088
      - Disposable ESD mat (shipped with some replacement parts or optional system components)
   2. Attach an antistatic wrist strap. When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Do this after you disconnect th
4. PCI Express and PCI-X specifications described in this table list the physical requirements for PCI cards. Additional support capabilities (such as device drivers) must also be provided for a PCI card to function in the server. Refer to the specifications and documentation for a given PCI card to determine if the required drivers are provided that enable the card to function in this server.
   Remote Manageability With ALOM CMT
   The Advanced Lights Out Manager (ALOM) CMT feature is a system controller (SC) that enables you to remotely manage and administer the server. The ALOM CMT software is preinstalled as firmware, and it initializes as soon as you apply power to the system. You can customize ALOM CMT to work with your particular installation.
   ALOM CMT enables you to monitor and control your server over a network, or by using a dedicated serial port for connection to a terminal or terminal server. ALOM CMT provides a command-line interface for remotely administering geographically distributed or physically inaccessible machines. In addition, ALOM CMT enables you to run diagnostics (such as POST) remotely that would otherwise require physical proximity to the server's serial port.
   You can configure ALOM CMT to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM CMT. The ALOM CMT circuitry runs independently of the server, using the server's standby power. Therefore, ALOM CMT firmwar
5. Viewing System Message Log Files
   The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems such as a device that is about to fail.
   The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every ten days), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.1. Over a period of time, the messages are further rotated to messages.2 and messages.3, and then deleted.
   1. Log in as superuser.
   2. Issue the following command:
      more /var/adm/messages
   3. If you want to view all logged messages, issue the following command:
      more /var/adm/messages*
   3.7 Managing Components With Automatic System Recovery Commands
   The Automatic System Recovery (ASR) feature enables the server to automatically configure failed components out of operation until they can be replaced. In the server, the following components are managed by the ASR feature:
      - UltraSPARC T1 processor strands
      - Memory DIMMs
      - I/O bus
   The database that contains the list of disabled components is called the ASR blacklist (asr-db). In most cases, POST automatically disables a faulty component. After the cause of the fault is repaired (FRU replacem
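The rotation scheme described above can be handled from a script when the logs need to be read newest first. The following is a minimal sketch under stated assumptions: the rotated files are taken to be named `messages.N` with a numeric suffix, and the helper name `message_files` is hypothetical.

```python
import re
from pathlib import Path

# Matches "messages" and numbered rotations such as "messages.1".
# The "messages.N" naming is an assumption based on the text above.
_MESSAGES_NAME = re.compile(r"^messages(?:\.(\d+))?$")


def message_files(log_dir="/var/adm"):
    """Return the message log files in log_dir, newest first.

    The live "messages" file sorts first, followed by messages.1,
    messages.2, and so on. Unrelated files are ignored.
    """
    found = []
    for path in Path(log_dir).iterdir():
        match = _MESSAGES_NAME.match(path.name)
        if match:
            rotation = int(match.group(1) or 0)  # live file counts as 0
            found.append((rotation, path))
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]
```

Reading the returned paths in order gives the same coverage as viewing every messages file, with the most recent entries first.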
6. 0 gt IMMU TLB DATA RAM Access 0 gt IMMU TLB TAGS Access 0 gt IMMU CAM 0 gt Setup and Enable DMMU 0 gt Setup DMMU Miss Handler 0 gt Niagara Version 2 0 0 gt Serial Number 00000098 00000820 0 gt Init JBUS Config Regs 0 gt T0 Bridge unit 1 init test 0 0 gt sys 150 MHz CPU 600 MHz 0 gt Integrated POST Testing 0 gt Setup L2 Cache 0 gt L2 Cache Control 0 gt Scrub and Setup L2 Cache 0 gt L2 Directory clear 0 gt L2 Scrub VD amp UA 0 gt L2 Scrub Tags oo ooo co 0 coe 0 00 00 00 00 00 0000 0 O 0 gt Test Memory 0 0 gt Copyright 2005 Sun Microsystems Note some output omitted Inc All rights reserved fELLL231 17422755 mem 150 MHz 00000000 00300000 Chapter 3 Server Diagnostics 3 33 0 0 gt Scrub 00000000 00600000 gt 00000001 00000000 on Memory Channel 0 1 2 3 Rank 0 Stack 0 0 gt Scrub 00000001 00000000 gt 00000002 00000000 on Memory Channel 1 2 3 Rank 1 Stack 0 0 gt IMMU Functional 0 gt IMMU Functional 0 gt DMMU Functional 0 gt IMMU Functional 0 gt DMMU Functional 0 gt Print Mem Config 0 gt Caches Icache is ON Dcache is ON 0 gt Bank 0 4096MB 00000000 00000000 gt 00000001 00000000 0 gt Bank 2 4096MB 00000001 00000000 gt 00000002 00000000 0 gt Block Mem Test 0 gt Test 4288675840 bytes at 00000000 00600000 Memory Channel 0 1 2 3 Rank 0 Stack 0 J J WOO 0 0 gt Test 4294967
7. FIGURE 3-12 SunVTS Test Selection Panel
   6. (Optional) Select the tests you want to run.
      Certain tests are enabled by default, and you can choose to accept these. Alternatively, you can enable and disable individual tests or blocks of tests by clicking the checkbox next to the test name or test category name. Tests are enabled when checked, and disabled when not checked. TABLE 3-12 lists tests that are especially useful to run on this server.
   TABLE 3-12 Useful SunVTS Tests to Run on This Server
      SunVTS tests: cmttest, cputest, fputest, iutest, l1dcachetest, dtlbtest, and l2sramtest (indirectly: mptest and systest)
      FRUs exercised: DIMMs, CPU motherboard
      SunVTS tests: disktest
      FRUs exercised: Disks, cables, disk backplane
      SunVTS tests: cddvdtest
      FRUs exercised: CD/DVD device, cable, motherboard
      SunVTS tests: nettest, netlbtest
      FRUs exercised: Network interface, network cable, CPU motherboard
      SunVTS tests: pmemtest, vmemtest, ramtest
      FRUs exercised: DIMMs, motherboard
      SunVTS tests: serialtest
      FRUs exercised: I/O (serial port interface)
      SunVTS tests: usbkbtest, disktest
      FRUs exercised: USB devices, cable, CPU motherboard (USB controller)
      SunVTS tests: hsclbtest
      FRUs exercised: Motherboard, system controller (host to system controller interface)
   7. (Optional) Customize individual tests.
      You can customize indi
8. TABLE 3-1 Diagnostic Flowchart Actions
      Resulting action: diag_level max, the detected errors might be correctable by PSH after the server boots. If POST does not indicate a faulty FRU, go to Action No. 9.
      For more information: Errors Detected by POST on page 3-36
   Action No. 6
      Diagnostic action: Determine if the fault is an environmental fault.
      Resulting action: If the fault listed by the showfaults command displays a temperature or voltage fault, then the fault is an environmental fault. Environmental faults can be caused by faulty FRUs (power supply, fan, or blower) or by environmental conditions such as when computer room ambient temperature is too high, or the server airflow is blocked. When the environmental condition is corrected, the fault will automatically clear. If the fault indicates that a fan, blower, or power supply is bad, you can perform a hot-swap of the FRU. You can also use the fault LEDs on the server to identify the faulty FRU (fans, blower, and power supplies).
      For more information: Section 3.3.2, Running the showfaults Command, on page 3-21; Chapter 4; Section 3.2, Using LEDs to Identify the State of Devices, on page 3-8
   TABLE 3-1 Diagnostic Flowchart Actions (Continued)
   Action No. 7
      Diagnostic action: Determine if the fault was detected by PSH.
      For more information: Section 3.5, Using the Solaris Predictive Self-Healing Feature, on
      Resulting action: If the fault message displays the following text, the fault was detected by the Solaris Predictive Self-Healing software: Host detecte
9. TABLE A-1 Server FRU List
      FRU: distribution board
      Replacement instructions: Removing the Power Distribution Board on page 5-27
      Description: Provides the main 12V power interconnect between the power supplies and the other boards.
      FRU name: PDB
   Item No. 7
      FRU: Cable management kit
      Replacement instructions: Cable replacement instructions are provided in the corresponding FRU procedures.
      Description: Includes the following: bus bars, hard disk drive cable, motherboard-I/O cable, PDB-I/O cable, PDB-DVD cable, front I/O board cable.
      FRU name: n/a
   Item No. 8
      FRU: Power supply units (PS)
      Replacement instructions: Section 4.3.1, Removing a Power Supply, on page 4-4
      Description: The power supplies provide -3.3 Vdc standby power at 3 Amps and 12 Vdc at 25 Amps. When facing the rear of the system, PS0 is on the left and PS1 is on the right.
      FRU name: PS0, PS1
   Item No. 9
      FRU: Rear blower
      Replacement instructions: Section 4.4.1, Removing the Rear Blower, on page 4-7
      Description: Blower
      FRU name: FT2
   Item No. 10
      FRU: LED board
      Replacement instructions: Section 5.2.11, Removing the LED Board, on page 5-32
      Description: Contains the push-button circuitry and LEDs that are displayed on the front bezel of the chassis.
      FRU name: LEDBD
   Item No. 11
      FRU: Front I/O board
      Replacement instructions: Section 5.2.15, Removing the Front I/O Board, on page 5-35
      Description: Front I/O board
      FRU name: FIOBD
   Item No. 12
      FRU: Fan power board
      Replacement instructions: Section 5.2.13, Removing the Fan Power Board, on page 5-34
      Description: Houses the connectors and three amber LEDs for the fan assemblies.
      FRU name: FANBD
   Item No. 13
      FRU: Fans
      Replacement instructions: Section 4.2.1, Removing a Fan
      Description: Fans 0, 1, and 2
      FRU name: FT0/FM0, FT0/FM1, FT0/FM2
   TABLE A-1 Server FRU List (Continued)
   Item No. 14
      FRU: SAS disk
      Replacement instructions: Section 5.2.19 T
10. Rotate the hold-down bracket so that it does not protrude into the chassis.
    5.2.2 Replacing PCI Cards
    1. Unpackage the replacement PCI-Express or PCI-X card and place it on an antistatic mat.
    2. Locate the proper socket for the card you are replacing.
    3. Rotate the PCI hold-down bracket 90 degrees so you can install the card.
    4. Insert the card into the socket.
    5. Rotate the PCI hold-down bracket 90 degrees to lock the card in place.
    6. Perform the procedures described in Section 5.3, Common Procedures for Finishing Up, on page 5-41.
    5.2.3 Removing DIMMs
    Note: Not all DIMMs detected as faulty and offlined by POST must be replaced. In service (maximum) mode, POST detects memory devices with errors that might be corrected with Solaris PSH. See Section 3.4.5, Correctable Errors Detected by POST, on page 3-36.
    Caution: This procedure requires that you handle components that are sensitive to static discharges that can cause the component to fail. To avoid this problem, ensure that you follow antistatic practices as described in Section 5.1.6, Performing Electrostatic Discharge Prevention Measures, on page 5-6.
    1. Perform the procedures described in Section 5.1, Common Procedures for Parts Replacement, on page 5-1.
    2. Locate the DIMMs (FIGURE 5-8) that you want to replace. Use FIGURE 5-8 and TABLE 5-1 to identify the DIMMs you want to remove.
    Note: For memory c
11. sc> enablecomponent MB/CMP0/CH0/R0/D0
    8. Perform the following steps to verify the repair:
       a. Set the virtual keyswitch to diag so that POST will run in Service mode.
          sc> setkeyswitch diag
       b. Issue the poweron command.
          sc> poweron
       c. Switch to the system console to view POST output.
          sc> console
          Watch the POST output for possible fault messages. The following output is a sign that POST did not detect any faults:
          0:0>POST Passed all devices.
          0:0>
          0:0>DEMON: (Diagnostics Engineering MONitor)
          0:0>Select one of the following functions
          0:0>POST:Return to OBP.
          0:0>INFO:
          0:0>POST Passed all devices.
          0:0>Master set ACK for vbsc runpost command and spin...
          Note: Depending on the configuration of ALOM CMT POST variables and whether POST detected faults or not, the system might boot, or the system might remain at the ok prompt. If the system is at the ok prompt, type boot.
       d. Return the virtual keyswitch to normal mode.
          sc> setkeyswitch normal
       e. Issue the Solaris OS fmadm faulty command.
          fmadm faulty
          No memory or DIMM faults should be displayed. If faults are reported, refer to the diagnostics flowchart in FIGURE 3-1 for an approach to diagnose the fault.
    9. Gain access to the ALOM CMT sc> prompt.
    10. Run the showfaults command.
12. sc> showcomponent
    Keys:
    MB/CMP0/P0 MB/CMP0/P1 MB/CMP0/P2 MB/CMP0/P3
    MB/CMP0/P8 MB/CMP0/P9 MB/CMP0/P10 MB/CMP0/P11
    MB/CMP0/P12 MB/CMP0/P13 MB/CMP0/P14 MB/CMP0/P15
    MB/CMP0/P16 MB/CMP0/P17 MB/CMP0/P18 MB/CMP0/P19
    MB/CMP0/P20 MB/CMP0/P21 MB/CMP0/P22 MB/CMP0/P23
    MB/CMP0/P28 MB/CMP0/P29 MB/CMP0/P30 MB/CMP0/P31
    MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D1 MB/CMP0/CH0/R1/D0 MB/CMP0/CH0/R1/D1
    MB/CMP0/CH1/R0/D0 MB/CMP0/CH1/R0/D1 MB/CMP0/CH1/R1/D0 MB/CMP0/CH1/R1/D1
    MB/CMP0/CH2/R0/D0 MB/CMP0/CH2/R0/D1 MB/CMP0/CH2/R1/D0 MB/CMP0/CH2/R1/D1
    MB/CMP0/CH3/R0/D0 MB/CMP0/CH3/R0/D1 MB/CMP0/CH3/R1/D0 MB/CMP0/CH3/R1/D1
    IOBD/PCIEa IOBD/PCIEb PCIX1 PCIX0 PCIE2 PCIE1 PCIE0 TTYA
    ASR state: clean
    Example showing a disabled component:
    sc> showcomponent
    ASR state: Disabled Devices
    MB/CMP0/CH3/R1/D1 : dimm15 deemed faulty
    3.7.2 Disabling Components
    The disablecomponent command disables a component by adding it to the ASR blacklist.
    1. At the sc> prompt, enter the disablecomponent command.
       sc> disablecomponent MB/CMP0/CH3/R1/D1
       SC Alert: MB/CMP0/CH3/R1/D1 disabled
    2. After receiving confirmation that the disablecomponent command is complete, reset the server so that the ASR command takes effect.
       sc> reset
    3.7.3 Enabling Disabled Components
    The enablecomponent command enables a disabled component by removing it from the ASR blacklist.
    1. At the sc> prompt
13. setcsn command 5 31 setkeyswitch parameter 3 20 3 30 5 15 setlocator command 3 10 3 20 5 3 showcomponent command 3 46 3 47 showenvironment command 3 20 3 22 4 6 showfaults command 3 4 5 15 5 16 Index 5 description and examples 3 21 syntax 3 20 troubleshooting with 3 5 showfru command 3 20 3 25 showkeyswitch command 3 20 showlocator command 3 20 showlogs command 3 20 showplat form command 2 10 3 20 5 32 shutting down the system 5 2 slide rails release lever 5 5 releasing 5 3 5 43 Solaris log files 3 4 Solaris OS collecting diagnostic information from 3 45 Solaris Predictive Self Healing PSH detected faults 3 4 standby power 5 6 state of server 3 10 sun4v architecture 2 3 SunVTS 3 2 3 4 exercising the system with 3 50 running 3 51 tests 3 53 user interfaces 3 50 3 51 3 53 3 54 support obtaining 3 5 switch intrusion 5 1 syslogd daemon 3 45 system configuration PROM 5 18 system console switching to 3 18 system controller 3 2 system controller card A 4 battery 5 40 removing 5 17 replacing 5 18 system temperatures displaying 3 22 T temperature sensors 2 7 TLB misses reduction of 2 3 tools required 5 2 top cover release button 5 7 removing 5 6 replacing 5 42 Top Fan Fault LED 3 10 4 2 top front cover removing 5 7 replacing 5 41 troubleshooting actions 3 4 DIMMs 3 8 U UltraSPARC T1 multicore processor 2 2 2 3 3 41 Universal Unique
14. 1. Disengage both power supplies from the power distribution board. To disengage a power supply, push the power supply latch to the right and pull the power supply out a few inches to disengage it from the PDB (FIGURE 5-17).
    FIGURE 5-17 Location of Power Supply Latch (callout: power supply latches)
    3. Disconnect all cables from the PDB:
       - Disconnect the hard drive power connector from the PDB.
       - Release the latches on the DVD cable and disconnect it.
       - Disconnect the cable marked P7.
       - Disconnect the blower power cable from the power distribution board.
    4. Remove the two screws that secure the power distribution board to the bus bar (FIGURE 5-18).
    FIGURE 5-18 Location of Bus Bar Screws on the Power Distribution Board and the Motherboard Assembly (callouts: PDB mounting screw, bus bar screws)
    5. Remove the screw (FIGURE 5-19) that secures the power distribution board to the chassis.
    FIGURE 5-19 Removing the Power Distribution Board
    6. Slide the power distribution board toward the front of the chassis and remove it from the chassis (FIGURE 5-19).
    7. Place the power distribution board on an antistatic mat.
    5.2.10 Replacing the Power Distribution Board
    Caution: The system supplies power to the power distribution board even when t
15. 16 fmdump command 3 42 front bezel removing 5 8 replacing 5 42 front I O board A 5 removing 5 35 replacing 5 36 front panel illustration 2 9 LED status displaying 3 22 LEDs 3 8 FRU ID PROMs 3 16 FRU replacement common procedures 5 1 FRU status displaying 3 25 FRUs hot swapping 4 1 illustration of A 2 names locations and descriptions A 4 replacing 5 8 FTO fan FRU names A 5 FT2 rear blower FRU name A 5 H hard drive controller 6 1 hard drives A 6 adding additional 6 1 hot plugging 4 9 identification 4 9 latch release button 4 10 LEDs 3 11 removing 4 9 replacing 4 9 4 10 slot assignments 6 3 specifications 2 4 status displaying 3 22 hardware components sanity check 3 31 HDD hard drive FRU names A 6 help command 3 19 host ID 5 18 hot pluggable devices adding 6 1 hot plugging hard drives 4 9 hot swappable devices fans 4 2 FRUs 4 1 overview 2 6 power supplies 4 4 l I O board see also motherboard 5 19 identification of chassis 2 9 indicators 3 8 installing additional devices 6 1 insulating washer for motherboard 5 26 interlock 5 1 intrusion switch 5 1 5 41 IOBD motherboard FRU name A 4 J JBus I O interface 2 3 L L1 and L2 cache 2 3 large page optimization 2 3 latch release button hard drive 4 10 LED board A 5 removing 5 32 replacing 5 33 LEDBD LED board FRU name A 5 LEDs about 3 8 AC OK 3 4 3 12 blower unit fault 3 1
16. 3 and swing the cable management arm out of the way so you can access the power supply.
    Unscrew the two thumbscrews (FIGURE 4-4) that secure the rear blower to the chassis.
    FIGURE 4-4 Removing the Rear Blower (callout: LED)
    Grasp the thumbscrews and slowly slide the blower out of the chassis, keeping the blower level as you remove it.
    Replacing the Rear Blower
    1. Unpackage the replacement blower.
    2. Slide the blower into the chassis until it locks into the power connector at the front of the blower compartment (FIGURE 4-5).
    FIGURE 4-5 Replacing the Blower Unit (callout: FT2)
    3. Tighten the two thumbscrews to secure the blower to the chassis.
    4. Verify that the Rear Blower and Service Required LEDs are not lit.
    5. Close the CMA, inserting the end of the CMA into the rear left rail bracket.
    4.5 Hot-Plugging a Hard Drive
    The hard drives in the server are hot-pluggable, but this capability depends on how the hard drives are configured. To hot-plug a drive, you must be able to take the drive offline (prevent any applications from accessing it, and remove the logical software links to it) before you can safely remove it.
    The following situations inhibit the ability to perform hot-plugging of a drive:
       - The hard drive provides the operating sys
17. 5-37
       - Section 5.2.19, Removing the SAS Disk Backplane, on page 5-37 and Section 5.2.20, Replacing the SAS Disk Backplane, on page 5-38
       - Section 5.2.21, Removing the Battery on the System Controller, on page 5-40 and Section 5.2.22, Replacing the Battery on the System Controller, on page 5-40
    To locate these FRUs, refer to Appendix A.
    5.2.1 Removing PCI-Express and PCI-X Cards
    1. Perform the procedures described in Section 5.1, Common Procedures for Parts Replacement, on page 5-1.
    2. Locate the PCI card that you want to remove. To locate the PCI card slots, refer to FIGURE 5-5 and FIGURE 5-6. The PCI card slots are located on the I/O portion of the motherboard assembly.
    FIGURE 5-5 Location of PCI-Express and PCI-X Card Slots (callouts: PCI-E slots, PCI-X slots)
    3. Note where the PCI card is installed, and note any cables, so you know where to reinstall the card and cables.
    FIGURE 5-6 Location of PCI-Express and PCI-X Card Slots (callouts: PCI-E slots 0, 1, 2; PCI-X slots 0, 1)
    4. Note and remove any cables that are attached to the card.
    5. Rotate the PCI hold-down bracket 90 degrees so it no longer covers the PCI card (FIGURE 5-7).
    FIGURE 5-7 PCI Card and Hold-Down Bracket (callout: PCI hold-down bracket)
    6. Carefully pull the card out of the socket.
    7. Place the card on an antistatic mat.
18. 5-6.
    FIGURE 5-11 Motherboard Assembly (callouts: CPU board, I/O board)
    1. Perform the procedures described in Section 5.1, Common Procedures for Parts Replacement, on page 5-1.
    2. Remove all cables from the rear of the server. Ensure that you remove all cables and power cords.
    3. Remove any PCI option cards that are installed, and then rotate the hold-down brackets so they do not protrude into the chassis.
    4. Remove all DIMMs from the motherboard assembly. See Section 5.2.3, Removing DIMMs, on page 5-12. Note the memory configuration so you can reinstall the memory in the replacement board.
    5. Remove the system controller card from the motherboard assembly. See Section 5.2.5, Removing the System Controller Card, on page 5-17.
    6. Disconnect cables from the motherboard assembly:
       - Disconnect the gray ribbon cable that runs along the left side of the chassis and motherboard.
       - Disconnect the cable marked P8 (FIGURE 5-12).
       - Disconnect the hard drive data cables and carefully pull them through the interior wall of the chassis.
    The SAS hard drive cables and the cable marked P8 pass through a cutout in the interior wall of the chassis. Before removing the motherboard assembly, ensure that these cables are out of the way. The SAS hard drive cables can be folded back over the interior wall or passed through the cutout (FIG
19. 78 1.53 1.62 1.98 2.07
    IOBD/V_+3V3MAIN  OK    3.38    2.80    2.97    3.63    3.79
    IOBD/V_+3V3STBY  OK    3.33    2.80    2.97    3.63    3.79
    IOBD/V_+1V       OK    1.11    0.93    0.99    1.21    1.26
    IOBD/V_+1V2      OK    [illegible]  1.02  1.08  1.32    1.38
    IOBD/V_+5V       OK    5.09    4.25    4.50    5.50    5.75
    IOBD/V_-12V      OK  -12.11  -13.80  -13.20  -10.80  -10.20
    IOBD/V_+12V      OK   12.18   10.20   10.80   13.20   13.80
    SC/BAT/V_BAT     OK    3.03    2.69
    System Load (in amps):
    Sensor      Status  Load    Warn    Shutdown
    MB/I_VCORE  OK      25.280  80.000  88.000
    MB/I_VMEML  OK       4.680  60.000  66.000
    MB/I_VMEMR  OK       4.680  60.000  66.000
    Current sensors:
    Sensor       Status
    IOBD/I_USB0  OK
    IOBD/I_USB1  OK
    FIOBD/I_USB  OK
    Power Supplies:
    Supply  Status  Underspeed  Overtemp  Overvolt  Undervolt  Overcurrent
    PS0     OK      OFF         OFF       OFF       OFF        OFF
    PS1     OK      OFF         OFF       OFF       OFF        OFF
    sc>
    Note: Some environmental information might not be available when the server is in standby mode.
    3.3.4 Running the showfru Command
    The showfru command displays information about the FRUs in the server. Use this command to see information about an individual FRU, or for all the FRUs.
    Note: By default, the output of the showfru command for all FRUs is very long.
    At the sc> prompt, enter the showfru command. In the following example, the showfru command is used to get information about the motherboard (MB).
    sc> showfru MB
    SEEPROM
    SEGMENT: SD
    /ManR
    /ManR/UNIX_Timestamp32: WED OCT 12
20. POST runs the full spectrum of tests with the maximum output displayed.
    The setkeyswitch parameter, when set to diag, overrides all the other ALOM CMT POST variables.
    Earlier versions of firmware have max as the default setting for the POST diag_level variable. To set the default to min, use the ALOM CMT command setsc diag_level min.
    3.4.2 Changing POST Parameters
    1. Access the ALOM CMT sc> prompt. At the console, issue the #. key sequence.
    2. Use the ALOM CMT sc> prompt to change the POST parameters. Refer to TABLE 3-9 for a list of ALOM CMT POST parameters and their values.
       The setkeyswitch parameter sets the virtual keyswitch, so it does not use the setsc command. For example, to change the POST parameters using the setkeyswitch command, enter the following:
       sc> setkeyswitch diag
       To change the POST parameters using the setsc command, you must first set the setkeyswitch parameter to normal. Then you can change the POST parameters using the setsc command:
       sc> setkeyswitch normal
       sc> setsc value
       Example:
       sc> setkeyswitch normal
       sc> setsc diag_mode service
    3.4.3 Reasons to Run POST
    You can use POST for basic hardware verification and diagnosis, and for troubleshooting, as described in the following sections.
    3.4.3.1 Verifying Hardware Functionality
    POST tests c
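The ordering rule in the procedure above (the virtual keyswitch is set directly, while any setsc change is preceded by setkeyswitch normal) can be captured in a short helper that builds the command sequence to type at the sc> prompt. This is a sketch only: the function name `post_change_commands` is hypothetical, the helper does not talk to ALOM CMT, and the parameter name `diag_mode` assumes the underscore that the excerpt appears to have lost.

```python
def post_change_commands(parameter, value):
    """Return the ALOM CMT commands, in order, for changing a POST setting.

    parameter -- "keyswitch" for the virtual keyswitch, otherwise the name
                 of a setsc variable (for example "diag_level")
    value     -- the new value (for example "diag", "normal", "min")
    """
    if parameter == "keyswitch":
        # The keyswitch has its own command and does not use setsc.
        return [f"setkeyswitch {value}"]
    # A keyswitch set to diag overrides the other POST variables, so the
    # procedure returns it to normal before issuing setsc.
    return ["setkeyswitch normal", f"setsc {parameter} {value}"]


if __name__ == "__main__":
    for command in post_change_commands("diag_mode", "service"):
        print(f"sc> {command}")
```

Keeping the keyswitch step inside the helper means a caller cannot issue a setsc change that the diag keyswitch position would silently override.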
21. Figures
    FIGURE 2-1 Server
    FIGURE 2-2 Motherboard and UltraSPARC T1 Multicore Processor
    FIGURE 2-3 Server Front Panel
    FIGURE 2-4 Server Rear Panel
    FIGURE 3-1 Diagnostic Flow Chart
    FIGURE 3-2 Front Panel LEDs
    FIGURE 3-3 Rear Panel LEDs
    FIGURE 3-4 Hard Drive LEDs
    FIGURE 3-5 Power Supply LEDs
    FIGURE 3-6 Location of Fan LEDs
    FIGURE 3-7 Location of the Blower Unit LED
    FIGURE 3-8 Ethernet Port LEDs
    FIGURE 3-9 ALOM CMT Fault Management
    FIGURE 3-10 Flowchart of ALOM CMT Variables for POST Configuration
    FIGURE 3-11 SunVTS GUI
    FIGURE 3-12 SunVTS Test Selection Panel
    FIGURE 4-1 Fan Identification and Removal
    FIGURE 4-2 Locating Power Supplies and Release Latch
    FIGURE 4-3 Rotating the Cable Management Arm
    FIGURE 4-4 Removing the Rear Blower
    FIGURE 4-5 Replacing the Blower Unit
    FIGURE 4-6 Locating the Hard Drive Release Button and Latch
    FIGURE 5-1 Slide Relea
22. article database.
    The Predictive Self-Healing technology covers the following server components:
       - UltraSPARC T1 multicore processor
       - Memory
       - I/O bus
    The PSH console message provides the following information:
       - Type
       - Severity
       - Description
       - Automated response
       - Impact
       - Suggested action for system administrator
    If the Solaris PSH facility detects a faulty component, use the fmdump command to identify the fault. Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see Appendix A.
    Note: Additional Predictive Self-Healing information is available at http://www.sun.com/msg
    Identifying PSH Detected Faults
    When a PSH fault is detected, a Solaris console message similar to the following is displayed:
    SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
    EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005
    PLATFORM: SUNW,SPARC-Enterprise-T2000, CSN: -, HOSTNAME: wgs48-37
    SOURCE: cpumem-diagnosis, REV: 1.5
    EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
    DESC: The number of errors associated with this memory module has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4V-8000-DX for more information.
    AUTO-RESPONSE: Pages of memory associated with this memory module are being removed from service as errors are reported.
    IMPACT: Total system memory capacity will be reduced as pages are retired.
    REC-ACTI
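When many console messages need to be triaged, the labeled fields of a PSH message can be pulled into a dictionary. The following is a minimal sketch, assuming the message uses `LABEL: value` pairs separated by commas or newlines (the punctuation is not preserved in this excerpt, so that layout is an assumption). The function name `parse_psh_message` and the label list are illustrative, not an official format definition.

```python
import re

# Field labels as they appear in the sample message; treated as an
# illustrative list rather than a complete specification.
PSH_LABELS = [
    "SUNW-MSG-ID", "TYPE", "VER", "SEVERITY", "EVENT-TIME", "PLATFORM",
    "CSN", "HOSTNAME", "SOURCE", "REV", "EVENT-ID", "DESC",
    "AUTO-RESPONSE", "IMPACT", "REC-ACTION",
]

_LABEL_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in PSH_LABELS) + r"):\s*"
)


def parse_psh_message(text):
    """Split a PSH console message into a {label: value} dictionary."""
    # re.split with one capturing group yields:
    # [leading text, label1, value1, label2, value2, ...]
    parts = _LABEL_PATTERN.split(text)
    fields = {}
    for label, value in zip(parts[1::2], parts[2::2]):
        # Drop the trailing comma or newline that separated this field
        # from the next one.
        fields[label] = value.strip().rstrip(",").strip()
    return fields
```

A caller might filter on `fields["SEVERITY"]` or pass `fields["EVENT-ID"]` to the fmdump command mentioned above to look up the corresponding fault.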
23. differs according to your system's model and configuration.
    Example:
    sc> showenvironment
    Environmental Status
    System Temperatures (Temperatures in Celsius):
    Sensor           Status  Temp  LowHard  LowSoft  LowWarn  HighWarn  HighSoft  HighHard
    PDB/T_AMB        OK        23      -10       -5        0        45        50        55
    MB/T_AMB         OK        26      -10       -5        0        50        55        60
    MB/CMP0/T_TCORE  OK        44      -10       -5        0        85        95       100
    MB/CMP0/T_BCORE  OK        45      -10       -5        0        85        95       100
    IOBD/IOB/TCORE   OK        41      -10       -5        0        95       100       105
    IOBD/T_AMB       OK        30      -10       -5        0        45        50        55
    SYS/LOCATE  SYS/SERVICE  SYS/ACT
    OFF         ON           ON
    SYS/REAR_FAULT  SYS/TEMP_FAULT  SYS/TOP_FAN_FAULT
    OFF             OFF             OFF
    Disk  Status  Service  OK2RM
    HDD0  OK      OFF      OFF
    HDD1  OK      OFF      OFF
    HDD2  OK      OFF      OFF
    HDD3  OK      OFF      OFF
    Fans (Speeds Revolution Per Minute):
    Sensor   Status  Speed  Warn  Low
    FT0/FM0  OK       3618        1920
    FT0/FM1  OK       3437        1920
    FT0/FM2  OK       3556        1920
    FT2      OK       2578        1900
    Voltage sensors (in Volts):
    Sensor         Status  Voltage  LowSoft  LowWarn  HighWarn  HighSoft
    MB/V_+1V5      OK         1.48     1.36     1.39      1.60      1.63
    MB/V_VMEML     OK         1.78     1.69     1.72      1.87      1.90
    MB/V_VMEMR     OK         1.78     1.69     1.72      1.87      1.90
    MB/V_VTTL      OK         0.87     0.84     0.86      0.93      0.95
    MB/V_VTTR      OK         0.87     0.84     0.86      0.93      0.95
    MB/V_+3V3STBY  OK         3.33     3.13     3.16      3.53      3.59
    MB/V_VCORE     OK         1.30     1.20     1.24      1.36      1.39
    IOBD/V_+1V5    OK         1.48     1.27     1.35      1.65      1.72
    IOBD/V_+1V8    OK  1
24. does not perform any action, but it provides a warning if the power supply should not be removed because the other power supply is not enabled.
    Generates a hardware reset on the host server. The -y option enables you to skip the confirmation question. The -c option executes a console command after completion of the reset command.
    Reboots the system controller. The -y option enables you to skip the confirmation question.
    Sets the virtual keyswitch. The -y option enables you to skip the confirmation question when setting the keyswitch to stby.
    Turns the Locator LED on the server on or off.
    Displays the environmental status of the host server. This information includes system temperatures, power supply, front panel LED, hard drive, fan, voltage, and current sensor status. See Section 3.3.3, Running the showenvironment Command, on page 3-22.
    Displays current system faults. See Section 3.3.2, Running the showfaults Command, on page 3-21.
    Displays information about the FRUs in the server.
       -g lines specifies the number of lines to display before pausing the output to the screen.
       -s displays static information about system FRUs (defaults to all FRUs, unless one is specified).
       -d displays dynamic information about system FRUs (defaults to all FRUs, unless one is specified).
    See Section 3.3.4, Running the showfru Command, on page 3-25.
    Displays the status of the virtual keyswitch.
    Displays the current state of the
25. enter the enablecomponent command.
    sc> enablecomponent MB/CMP0/CH3/R1/D1
    SC Alert: MB/CMP0/CH3/R1/D1 reenabled.
2. After receiving confirmation that the enablecomponent command is complete, reset the server so that the ASR command takes effect.
    sc> reset
3.8 Exercising the System With SunVTS
Sometimes a server exhibits a problem that cannot be isolated definitively to a particular hardware or software component. In such cases, it might be useful to run a diagnostic tool that stresses the system by continuously running a comprehensive battery of tests. Sun provides the SunVTS software for this purpose.
This section describes the tasks necessary to use SunVTS software to exercise your server:
    • Section 3.8.1, "Checking Whether SunVTS Software Is Installed" on page 3-49
    • Section 3.8.2, "Exercising the System Using SunVTS Software" on page 3-50
3.8.1 Checking Whether SunVTS Software Is Installed
This procedure assumes that the Solaris OS is running on the server, and that you have access to the Solaris command line.
Check for the presence of SunVTS packages using the pkginfo command.
    # pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn
The following table lists SunVTS packages:
    Package     Description
    SUNWvts     SunVTS framework
    SUNWvtsr    SunVTS framework (root)
    SUNWvtsts   SunVTS for tests
    SUNWvtsmn   SunVTS man
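The package check above amounts to comparing the four package names against whatever is installed. The following is a minimal sketch in Python, not part of SunVTS or Solaris: `missing_sunvts_packages` is a hypothetical helper, and it assumes you have already collected the installed package names by some other means (it does not run or parse pkginfo itself).

```python
# Package names taken from the pkginfo command in the excerpt.
REQUIRED_SUNVTS_PACKAGES = ("SUNWvts", "SUNWvtsr", "SUNWvtsts", "SUNWvtsmn")


def missing_sunvts_packages(installed):
    """Return the required SunVTS packages absent from `installed`.

    `installed` is any iterable of package-name strings (assumed input).
    The result keeps the order of REQUIRED_SUNVTS_PACKAGES.
    """
    installed_set = set(installed)
    return [pkg for pkg in REQUIRED_SUNVTS_PACKAGES if pkg not in installed_set]


if __name__ == "__main__":
    # Hypothetical inventory with two of the four packages present.
    print(missing_sunvts_packages(["SUNWvts", "SUNWvtsr"]))
```

An empty result means all four packages named in the table are present.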
26. license governing the product or technology provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC., AND THEIR AFFILIATES DISCLAIM ANY REPRESENTATION OR WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, REGARDING SUCH PRODUCT, TECHNOLOGY, OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS. IN ADDITION, ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc., or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental, or consequential damages, even if advised of the possibility of such damages. DOCUMENTATION IS PROVIDED AS IS, AND ALL OTHER EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES ARE FORMALLY EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW
27. on page 5-11.
13. Reconnect the cables to the motherboard:
    • Reconnect the cable marked P8 to the I/O board.
    • Reconnect the gray ribbon cable that runs along the left side of the chassis.
    • Pull the hard drive data cables through the interior wall of the chassis and reconnect the cables to the motherboard.
14. Reconnect all cables that were removed from the rear of the server.
15. Perform the procedures described in Section 5.3, "Common Procedures for Finishing Up" on page 5-41.
5.2.9 Removing the Power Distribution Board
The power distribution board (PDB) provides the circuitry that distributes power to the other components in the system. The PDB also contains an electronic copy of the chassis serial number (see Section 2.3, "Obtaining the Chassis Serial Number" on page 2-10). When you replace this board, you must run certain service commands to update the replacement PDB with the chassis serial number. The steps to perform these service commands are provided in Section 5.2.10, "Replacing the Power Distribution Board" on page 5-30.
Caution: The system supplies power to the power distribution board even when the system is powered off. To avoid personal injury or damage to the system, you must disconnect power cords before servicing the PDB.
Perform the procedures described in Section 5.1, "Common Procedures for Parts Replacement" on page 5
28. pin. Ensure that the system controller is correctly oriented. A notch along the bottom of the system controller corresponds to a tab on the socket.
6. Push firmly and evenly on both ends of the system controller until it is firmly seated in the socket. You hear a click when the ejector levers lock into place.
7. Perform the procedures described in Section 5.3, "Common Procedures for Finishing Up" on page 5-41.
Removing the Motherboard Assembly
The motherboard for your server consists of two distinct boards, the CPU board and the I/O board. However, they must be removed and replaced as a single motherboard assembly (FIGURE 5-11).
Caution: Remove and replace the motherboard carefully. The motherboard rests on metal standoffs, and the components mounted on the underside of the motherboard can be damaged if they hit the standoffs. To prevent this damage, follow the removal and replacement instructions described in this document.
Caution: A flexible cable connects the CPU and I/O boards. This flexible cable is fragile. Handle these parts very carefully to prevent damage.
Caution: This procedure requires that you handle components that are sensitive to static discharge, which can cause the components to fail. To avoid this problem, follow antistatic practices as described in Section 5.1.6, "Performing Electrostatic Discharge Prevention Measures" on page
29. intellectual property relating to the products and technology described in this document. Likewise, these products, technology, and this document are protected by copyright laws, patents, other intellectual property laws, and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology, and this document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more additional patents or patent applications in the United States and other countries. This document and the product and technology to which it pertains are distributed exclusively under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without the prior written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains. Furthermore, this document does not contain or represent any commitment of any kind on the part of Fujitsu
30. swappable power supplies (two)
    • Redundant hot-swappable fan units (three)
    • Environmental monitoring
    • Error detection and correction for improved data integrity
    • Easy access for most component replacements
    • Extensive POST tests that automatically delete faulty components from the configuration
    • PSH automated run-time diagnosis capability that takes faulty components offline
For more information about using RAS features, refer to the SPARC Enterprise T2000 Server Administration Guide.
2.1.4.1 Hot-Pluggable and Hot-Swappable Components
The server hardware supports hot-plugging or hot-swapping of the chassis-mounted hard drives, fans, power supplies, and the rear blower. Using the proper software commands, you can install or remove these components while the server is running. Hot-plug and hot-swap technologies significantly increase the server's serviceability and availability by making it possible to replace hard drives, fan units, the rear blower, and power supplies without service disruption.
2.1.4.2 Power Supply Redundancy
The server features two hot-swappable power supplies, which enable the system to continue operating should a power supply or power source fail. The server also has a single hot-swappable blower unit that works in conjunction with the power supply fans to provide cooling for the internal hard drives. If the blow
31. test (POST), 3-26
power cords
    disconnecting, 5-6
    reconnecting, 5-45
power distribution board (PDB), A-5
    cables, 5-28
    removing, 5-27
    replacing, 5-30
Power OK LED, 3-4, 3-10
Power On/Off button, 3-10, 5-3
power specifications, 2-4
power supplies, 4-2, A-5
    fault LED, 4-4
    hot-swapping, 4-4
    latches, 5-27
    LEDs, 3-12
    redundancy, about, 2-7
    replacing, 4-6
    status, displaying, 3-22
powercycle command, 3-19, 3-32, 3-37
powering off the server, 5-2
powering on the server, 5-45
poweroff command, 3-19, 5-2
poweron command, 3-19, 5-15
power-on self-test (POST), 3-5
    about, 3-26
    ALOM CMT commands, 3-27
    configuration flowchart, 3-29
    error message example, 3-35
    error messages, 3-35
    example output, 3-33
    fault clearing, 3-39
    faulty components detected by, 3-39
    how to run, 3-32
    parameters, changing, 3-30
    reasons to run, 3-31
    troubleshooting with, 3-6
Predictive Self-Healing (PSH)
    about, 2-8, 3-40
    clearing faults, 3-44
    memory faults and, 3-7
procedures for finishing up, 5-41
procedures for parts replacement, 5-1
processor, 2-2
processor designation, 2-4
PROM, system configuration, 5-18
PS0/PS1 (power supply FRU names), A-5
PSH-detected faults, 3-21
PSH, see also Predictive Self-Healing (PSH), 3-40
Q
quick visual notification, 3-1
R
RAID (redundant array of independent disks) storage configurations, 2-7
rear blower, A-5
    hot-swapping, 4-7
    re
32. the System Controller Card
Grasp the top corners of the card and pull it out of the socket.
Place the system controller card on an antistatic mat.
Remove the system configuration PROM (FIGURE 5-10) from the system controller and place it on an antistatic mat.
The system controller contains the persistent storage for the host ID and Ethernet MAC addresses of the system, as well as the ALOM CMT configuration, including the IP addresses and ALOM CMT user accounts, if configured. This information will be lost unless the system configuration PROM is removed and installed in the replacement system controller. The PROM does not hold the fault data, and this data will no longer be accessible when the system controller is replaced.
FIGURE 5-10 Locating the System Configuration PROM
Replacing the System Controller Card
1. Unpackage the replacement system controller card and place it on an antistatic mat.
2. Install the system configuration PROM that you removed from the faulty system controller card. The PROM is keyed to ensure proper orientation.
3. Locate the system controller slot on the motherboard assembly.
4. Ensure that the ejector levers are open.
5. Holding the bottom edge of the system controller parallel to its socket, carefully align the system controller so that each of its contacts is centered on a socket
33. the console, type console.
3.3.1.3 Service-Related ALOM CMT Commands
TABLE 3-8 describes the typical ALOM CMT commands for servicing a server. For descriptions of all ALOM CMT commands, issue the help command or refer to the Advanced Lights Out Management (ALOM) CMT Guide.
TABLE 3-8 Service-Related ALOM CMT Commands
ALOM CMT Command:
    help [command]
    break [-y][-c][-D]
    clearfault UUID
    console [-f]
    consolehistory [-b lines|-e lines|-v] [-g lines] [boot|run]
    bootmode [normal|reset_nvram|bootscript=string]
    powercycle [-f]
    poweroff [-y] [-f]
    poweron [-c]
Description:
    help: Displays a list of all ALOM CMT commands with syntax and descriptions. Specifying a command name as an option displays help for that command.
    break: Takes the host server from the OS to either kmdb or OpenBoot PROM (equivalent to a Stop-A), depending on the mode Solaris software was booted.
        • -y skips the confirmation question.
        • -c executes a console command after the break command completes.
        • -D forces a core dump of the Solaris OS.
    clearfault: Manually clears host-detected faults. The UUID is the unique fault ID of the fault to be cleared.
    console: Connects you to the host system. The -f option forces the console to have read and write capabilities.
    consolehistory: Displays the contents of the system's console buffer. The following options enable you to specify how the output is displayed:
        • -g lines spe
34. weighs approximately 40 lb (18 kg). Two people are required to dismount and carry the chassis.
The system supplies power to the power distribution board even when the system is powered off. To avoid personal injury or damage to the system, you must disconnect all power cords before servicing the power distribution board.
Product Handling
Maintenance
Warning: Certain tasks in this manual should only be performed by a certified service engineer. Users must not perform these tasks. Incorrect operation of these tasks may cause electric shock, injury, or fire.
    • Installation and reinstallation of all components, and initial settings
    • Removal of front, rear, or side covers
    • Mounting/de-mounting of optional internal devices
    • Plugging or unplugging of external interface cards
    • Maintenance and inspections (repairing, and regular diagnosis and maintenance)
Caution: The following tasks regarding this product and the optional products provided from Fujitsu should only be performed by a certified service engineer. Users must not perform these tasks. Incorrect operation of these tasks may cause malfunction.
    • Unpacking optional adapters and such packages delivered to the users
    • Plugging or unplugging of external interface cards
Remodeling/Rebuilding
Caution: Do not make mechanical or electrical modifications to the equipment. Using this pro
35. 1, "Common Procedures for Parts Replacement" on page 5-1.
2. Remove all three fans. See Section 4.2.1, "Removing a Fan" on page 4-2.
3. Remove the screws that secure the LED board to the chassis (FIGURE 5-21).
FIGURE 5-21 Removing the LED Board From the Chassis
4. Slide the LED board to the right to disconnect it from the front I/O board.
5. Remove the LED board from the chassis and place it on an antistatic mat.
5.2.12 Replacing the LED Board
1. Install the LED board in the chassis.
2. Slide the board to the left to connect it to the front I/O board.
3. Secure the LED board to the chassis using two M3x6 flat-head screws (FIGURE 5-21).
4. Replace all three fans. See Section 4.2.2, "Replacing a Fan" on page 4-4.
5. Perform the procedures described in Section 5.3, "Common Procedures for Finishing Up" on page 5-41.
5.2.13 Removing the Fan Power Board
1. Perform the procedures described in Section 5.1, "Common Procedures for Parts Replacement" on page 5-1.
2. Remove all three fans. See Section 4.2.1, "Removing a Fan" on page 4-2.
3. Remove the screw that secures the fan power board to the chassis (FIGURE 5-22).
4. Slide the fan power board to the right to disengage it from the front I/O board.
5. Remove the fan power board from the front fan bay and place the board on an antistatic mat.
36. 18:24:28 2005
    /ManR/Description: ASSY SPARC Enterprise T2000 CPU Board
    /ManR/Manufacture_Location: Sriracha, Chonburi, Thailand
    /ManR/Sun_Part_No: 5016843
    /ManR/Sun_Serial_No: NCOOOD
    /ManR/Vendor: Celestica
    /ManR/Initial_HW_Dash_Level: 06
    /ManR/Initial_HW_Rev_Level: 02
    /ManR/Shortname: T2000_MB
    /SpecPartNo: 885-0483-04
    SEGMENT: FL
    /Configured_LevelR
    /Configured_LevelR/UNIX_Timestamp32: WED OCT 12 18:24:28 2005
    /Configured_LevelR/Sun_Part_No: 5410827
    /Configured_LevelR/Configured_Serial_No: N4001A
    /Configured_LevelR/HW_Dash_Level: 03
3.4 Running POST
Power-on self-test (POST) is a group of PROM-based tests that run when the server is powered on or reset. POST checks the basic integrity of the critical hardware components in the server (CPU, memory, and I/O buses).
If POST detects a faulty component, the component is disabled automatically, preventing faulty hardware from potentially harming any software. If the system is capable of running without the disabled component, the system will boot when POST is complete. For example, if one of the processor cores is deemed faulty by POST, the core will be disabled, and the system will boot and run using the remaining cores.
In normal operation, the default configuration of POST (diag_level=min) provides a sanity check to ensure the server will boot. Normal operation applies to any power on of the server not intended to
37. 296 bytes at 00000001 00000000 Memory Channel 0 1 2 3 Rank 1 Stack 0
    0:0>IO-Bridge Tests
    0:0>IO-Bridge Quick Read
    0:0>
    0:0>IO-Bridge Quick Read Only of CSR and ID
    0:0>fire 1 JBUSID 00000080 0 000000
    0:0>IO-Bridge unit 1 Config MB bridges
    0:0>Config port A, bus 2 dev 0 func 0, tag IOBD/PCI-SWITCH0
    0:0>Config port A, bus 3 dev 1 func 0, tag IOBD/GBE0
    0:0>INFO: Master Abort for probe, device IOBD/PCIE1 looks like it is not present
    0:0>INFO: Master Abort for probe, device IOBD/PCIE2 looks like it is not present
    0:0>INFO:
    0:0>POST Passed all devices.
    0:0>
    0:0>DEMON: (Diagnostics Engineering MONitor)
    0:0>Select one of the following functions
    0:0>POST: Return to OBP.
    0:0>INFO:
    0:0>POST Passed all devices.
    0:0>Master set ACK for vbsc runpost command and spin
5. Perform further investigation if needed.
    • If no faults were detected, the system will boot.
    • If POST detects a faulty device, the fault is displayed and the fault information is passed to ALOM CMT for fault handling. Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see Appendix A.
    a. Interpret the POST messages. POST error messages use the following syntax: c
38. 3
    Ethernet port, 3-14
    fan, 4-2
    fan fault, 3-13
    front panel, 3-8
    hard drive, 3-11
    hard drive activity, 3-11
    Locator, 3-10
    OK to Remove, 3-11
    Overtemp, 3-11, 4-2
    Power OK, 3-4, 3-10
    power supply, 3-12
    Rear Blower Fault, 4-8
    Rear FRU fault, 3-10
    rear panel, 3-9
    Service Required, 3-10
    Top Fan Fault, 3-10
locating the server, 3-10
locating the server for maintenance, 5-3
location of connectors, 2-9
Locator LED/button, 3-10, 5-3
log files, viewing, 3-45
M
maintenance position, 5-3
    extending server to, 5-3
MB (motherboard FRU name), A-4
memory
    configuration, 3-6
    configuration guidelines, 6-4
    fault handling, 3-6
    overview, 2-4
    ranks, 6-4
memory access crossbar, 2-3
memory, also see DIMMs
message ID, 3-40
messages file, 3-45
motherboard, A-4
    cables, reconnecting, 5-27
    removing, 5-19
    removing cables, 5-20
    replacing, 5-23
    screw locations, 5-26
motherboard washer, 5-26
N
names of FRUs, A-1
O
OK LED, 3-10
OK to Remove LED, 3-11
operating state, determining, 3-10
OSP card, A-4
Overtemp LED, 3-11, 4-2
P
parity checking, 2-7
parts replacement, see FRUs
PCI, PCIE, and PCIX (FRU names), A-4
PCI capabilities, 6-7
PCI hold-down bracket, 5-10
PCI-E and PCI-X cards
    adding, 6-7
    designations, A-4
    replacing, 5-11
PCI-E and PCI-X interface specifications, 2-4
PDB (power distribution board FRU name), A-5
performance enhancements, 2-3
platform name, 2-4
POST-detected faults, 3-4, 3-21
POST, see also power-on self-
39. 5-13, 6-5
    troubleshooting, 3-8
disablecomponent command, 3-46, 3-48
disabled component, 3-48
disabled DIMMs, 5-15
disk drives, see hard drives
displaying FRU status, 3-25
dmesg command, 3-45
DVD drive, A-6
    removing, 5-37
    replacing, 5-37
DVD drive (FRU name), A-6
DVD specification, 2-4
E
electrostatic discharge (ESD) prevention, 1-2, 5-6
enablecomponent command, 3-40, 3-46, 3-48, 5-15
environmental faults, 3-4, 3-5, 3-17, 3-21
error correction, 2-7
error messages, 2-7
Ethernet MAC addresses, 5-18
Ethernet ports
    about, 2-4
    LEDs, 3-14
    specifications, 2-4
event log, checking the PSH, 3-42
exercising the system with SunVTS, 3-50
extending server to maintenance position, 5-3
F
fan cover latch, 5-7
fan door, 4-2
fan fault LEDs, 3-13, 4-2
fan power board, A-5
    removing, 5-34
    replacing, 5-34
fan redundancy, 2-7
fan status, displaying, 3-22
FANBD (fan power board FRU name), A-5
fans, A-5
    hot-swapping, 4-2
    identifying faulty, 4-3
    removing, 4-2
    replacing, 4-4
fault manager daemon, fmd(1M), 3-40
fault message ID, 3-21
fault records, 3-44
faults, 3-16, 3-21
    environmental, 3-4, 3-5
    managing DIMM faults, 5-15
    recovery, 3-17
    repair, 3-17
    types of, 3-21
feature specifications, 2-4
features, server, 2-2
field-replaceable units (FRUs), also see FRUs, 5-1
FIOBD (front I/O board FRU name), A-5
firmware, 2-5
flexible cable, 5-19, 5-24
fmadm command, 3-44, 5
40. 3.3.1 Running ALOM CMT Service-Related Commands
            3.3.1.1 Connecting to ALOM CMT
            3.3.1.2 Switching Between the System Console and ALOM CMT
            3.3.1.3 Service-Related ALOM CMT Commands
        3.3.2 Running the showfaults Command
        3.3.3 Running the showenvironment Command
        3.3.4 Running the showfru Command
    3.4 Running POST
        3.4.1 Controlling How POST Runs
        3.4.2 Changing POST Parameters
        3.4.3 Reasons to Run POST
            3.4.3.1 Verifying Hardware Functionality
            3.4.3.2 Diagnosing the System Hardware
        3.4.4 Running POST in Maximum Mode
        3.4.5 Correctable Errors Detected by POST
            3.4.5.1 Correctable Errors for Single DIMMs
            3.4.5.2 Determining When to Replace Detected Devices
        3.4.6 Clearing POST Detected Faults
    3.5 Using the Solaris Predictive Self-Healing Feature
        3.5.1 Identifying PSH Detected Faults
            3.5.1.1 Using the fmdump Command to Identify Faults
        3.5.2 Clearing PSH Detected Faults
    3.6 Collecting Information From Solaris OS Files and Commands
        3.6.1 Checking the Message Buffer
        3.6.2 Viewing System Message Log Files
    3.7 Managing Components With Automatic System Recovery Commands
        3.7.1 Displaying System Components
        3.7.2 Disabling Components
        3.7.3 Enabling Disabled Components
    3.8 Exercising the System With SunVTS
        3.8.1 Checking Whether Sun
41. • If the fault was detected by the host and the fault information persists, the output will be similar to the following example:
        sc> showfaults -v
        ID Time            FRU                 Fault
        0  SEP 09 11:09:26 MB/CMP0/CH0/R0/D0   Host detected fault, MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
    • If the showfaults command does not report a fault with a UUID, then you do not need to proceed with the following steps because the fault is cleared.
11. Run the clearfault command.
        sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
12. Switch to the system console.
        sc> console
13. Issue the fmadm repair command with the UUID. Use the same UUID that you used with the clearfault command.
        # fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86
5.2.5 Removing the System Controller Card
Caution: The system controller card can be hot. To avoid injury, handle it carefully.
Perform the procedures described in Section 5.1, "Common Procedures for Parts Replacement" on page 5-1.
Locate the system controller card. See Appendix A for an illustration of the server's FRUs that shows the system controller card.
Push down on the ejector levers on each side of the system controller until the card is released from the socket.
FIGURE 5-9 Ejecting and Removing
42. INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
Contents
Preface
1. Safety Information
    1.1 Safety Information
    1.2 Safety Symbols
    1.3 Electrostatic Discharge Safety
        1.3.1 Using an Antistatic Wrist Strap
        1.3.2 Using an Antistatic Mat
2. Server Overview
    2.1 Server Features
        2.1.1 Chip Multithreaded Multicore Processor and Memory Technology
        2.1.2 Performance Enhancements
        2.1.3 Remote Manageability With ALOM CMT
        2.1.4 System Reliability, Availability, and Serviceability
            2.1.4.1 Hot-Pluggable and Hot-Swappable Components
            2.1.4.2 Power Supply Redundancy
            2.1.4.3 Fan Redundancy
            2.1.4.4 Environmental Monitoring
            2.1.4.5 Error Correction and Parity Checking
        2.1.5 Predictive Self-Healing
    2.2 Chassis Identification
    2.3 Obtaining the Chassis Serial Number
3. Server Diagnostics
    3.1 Overview of Server Diagnostics
        3.1.1 Memory Configuration and Fault Handling
            3.1.1.1 Memory Configuration
            3.1.1.2 Memory Fault Handling
            3.1.1.3 Troubleshooting Memory Faults
    3.2 Using LEDs to Identify the State of Devices
        3.2.1 Front and Rear Panel LEDs
        3.2.2 Hard Drive LEDs
        3.2.3 Power Supply LEDs
        3.2.4 Fan LEDs
        3.2.5 Blower Unit LED
        3.2.6 Ethernet Port LEDs
    3.3 Using ALOM CMT for Diagnosis and Repair Verification
43. D (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc., or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental, or consequential damages, even if advised of the possibility of such damages. DOCUMENTATION IS PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Copyright 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. FUJITSU LIMITED provided technical input and review on portions of this material. Sun Microsystems, Inc. and Fujitsu Limited both own and control rights of
44. Example:
    ok
    sc> showfaults -v
    ID Time            FRU                 Fault
    1  APR 24 12:47:27 MB/CMP0/CH2/R0/D0   MB/CMP0/CH2/R0/D0 deemed faulty and disabled
In this example, MB/CMP0/CH2/R0/D0 is disabled. The system can boot using memory that was not disabled until the faulty component is replaced.
Note: You can use ASR commands to display and control disabled components. See Section 3.7, "Managing Components With Automatic System Recovery Commands" on page 3-46.
3.4.5 Correctable Errors Detected by POST
In maximum mode, POST detects and offlines memory devices with errors that could be correctable by PSH. Use the examples in this section to verify whether the detected memory devices are correctable.
Note: For servers powered on in maximum mode without the intention of validating a hardware upgrade or repair, examine all faults detected by POST to verify whether the errors can be corrected by Solaris PSH. See Section 3.5, "Using the Solaris Predictive Self-Healing Feature" on page 3-40.
When using maximum mode, if no faults are detected, return POST to minimum mode:
    sc> setkeyswitch normal
    sc> setsc diag_mode normal
    sc> setsc diag_level min
3.4.5.1 Correctable Errors for Single DIMMs
If POST faults a single DIMM (CODE EXAMPLE 3-1) that was not part of a hardware upgrade or repair, it is likely that POST encountered a correctable error that can be ha
45. Identifier (UUID), 3-40, 3-42
USB connectors, 6-4
USB devices, guidelines for adding, 6-3
USB ports, 2-4
V
virtual keyswitch, 3-30, 5-15
voltage and current sensor status, displaying, 3-22
W
washer for motherboard, 5-26
weight of server, 5-4
46. Limited or Sun Microsystems, Inc., or any affiliate of either of them. This document and the product and technology described in it may incorporate third-party intellectual property copyrighted by and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology. Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the end user. Please contact Fujitsu Limited or Sun Microsystems, Inc. This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the United States and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, Java, Netra, Solaris, Sun StorEdge, docs.sun.com, OpenBoot, SunVTS, Sun Fire, SunSolve, CoolThreads, J2EE, and Sun are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and in other
47. Locator LED as either on or off.
Displays the history of all events logged in the ALOM CMT event buffers (in RAM or the persistent buffers).
Displays information about the host system's hardware configuration, the system serial number, and whether the hardware is providing service.
Note: See TABLE 3-11 for the ALOM CMT ASR commands.
3.3.2 Running the showfaults Command
The ALOM CMT showfaults command displays the following kinds of faults:
    • Environmental faults: Temperature or voltage problems that might be caused by faulty FRUs (power supplies, fans, or blower), or by room temperature or blocked air flow to the server.
    • POST-detected faults: Faults on devices detected by the power-on self-test diagnostics.
    • PSH-detected faults: Faults detected by the Solaris Predictive Self-Healing (PSH) technology.
Use the showfaults command for the following reasons:
    • To see if any faults have been passed to, or detected by, ALOM CMT.
    • To obtain the fault message ID (SUNW-MSG-ID) for PSH-detected faults.
    • To verify that the replacement of a FRU has cleared the fault and not generated any additional faults.
At the sc> prompt, type the showfaults command.
The following showfaults command examples show the different kinds of output from the showfaults command:
    • Example of the showfaults command when no faults are present:
        sc> s
48. Memory Configuration
The server has 16 slots that hold DDR-2 memory DIMMs in the following DIMM sizes:
    • 512 MB (maximum of 8 GB)
    • 1 GB (maximum of 16 GB)
    • 2 GB (maximum of 32 GB)
    • 4 GB (maximum of 64 GB)
DIMMs are installed in groups of eight, called ranks (ranks 0 and 1). At a minimum, rank 0 must be fully populated with eight DIMMs of the same capacity. A second rank of DIMMs of the same capacity can be added to fill rank 1.
See Section 5.2.3, "Removing DIMMs" on page 5-12 for instructions about adding memory to a server.
3.1.1.2 Memory Fault Handling
The server uses advanced ECC technology, also called chipkill, that corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. If a DRAM fails, the DIMM continues to function.
The following server features independently manage memory faults:
    • POST: Based on ALOM CMT configuration variables, POST runs when the server is powered on. In normal operation, the default configuration of POST (diag_level=min) provides a check to ensure the server will boot. Normal operation applies to any boot of the server not intended to test power-on errors, hardware upgrades, or repairs. Once the Solaris OS is running, PSH provides run-time diagnosis of faults. When a memory fault is detected, POST displays the fault with the device name of the faulty DIMMs, logs the fau
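The rank population rules in the Memory Configuration text (eight DIMMs per rank, rank 0 always full, rank 1 optional, one capacity throughout) can be expressed as a short validation routine. The following is a minimal sketch in Python under those stated rules only: `total_memory_mb` is a hypothetical helper, not a tool supplied with the server, and it represents each DIMM simply by its capacity in MB.

```python
SLOTS_PER_RANK = 8
# Supported DIMM sizes from the list in the text, expressed in MB.
SUPPORTED_DIMM_MB = {512, 1024, 2048, 4096}


def total_memory_mb(rank0, rank1=()):
    """Validate a DIMM layout and return the total capacity in MB.

    rank0, rank1: sequences of DIMM capacities in MB (assumed input format).
    Raises ValueError when the layout breaks a rule quoted in the text.
    """
    if len(rank0) != SLOTS_PER_RANK:
        raise ValueError("rank 0 must be fully populated with eight DIMMs")
    if len(rank1) not in (0, SLOTS_PER_RANK):
        raise ValueError("rank 1 must be empty or hold eight DIMMs")
    sizes = set(rank0) | set(rank1)
    if len(sizes) != 1:
        raise ValueError("all DIMMs must have the same capacity")
    if not sizes <= SUPPORTED_DIMM_MB:
        raise ValueError("unsupported DIMM capacity")
    return sum(rank0) + sum(rank1)


if __name__ == "__main__":
    # Sixteen 4 GB DIMMs give 65536 MB, the 64 GB maximum in the list.
    print(total_memory_mb([4096] * 8, [4096] * 8))
```

With both ranks filled, the totals for each supported size reproduce the maximums in the list: 8 GB, 16 GB, 32 GB, and 64 GB.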
49. ON Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u <EVENT_ID> to identify the module.
The following is an example of the ALOM CMT alert for the same PSH diagnosed fault:
SC Alert: Host detected fault, MSGID: SUN4V-8000-DX
Note: The Service Required LED also turns on for PSH diagnosed faults.
3.5.1.1 Using the fmdump Command to Identify Faults
The fmdump command displays the list of faults detected by the Solaris PSH facility and identifies the faulty FRU for a particular EVENT_ID (UUID). Do not use fmdump to verify that a FRU replacement has cleared a fault, because the output of fmdump is the same after the FRU has been replaced. Use the fmadm faulty command to verify that the fault has cleared.
Note: Faults detected by the Solaris PSH facility are also reported through ALOM CMT alerts. In addition to the PSH fmdump command, the ALOM CMT showfaults command provides information about faults and displays fault UUIDs. See Section 3.3.2, Running the showfaults Command.
1. Check the event log using the fmdump command with -v for verbose output:
fmdump -v
TIME UUID SUNW-MSG-ID
Apr 24 06:54:08.2005 1ce22523-1c80-6062-e61d-3b39290ae2c SUN4U-8000-6H
100% fault.cpu.ultraSPARCT1l2cachedata
FRU: hc:///component=MB
rsrc: cpu:///cpuid=0/serial=22D1D6604A
In this example a fault is displayed indic
50. Shut down the Solaris OS. Refer to the Solaris system administration documentation for additional information.
Switch from the system console to the ALOM CMT sc> prompt by typing the #. (Hash-Period) key sequence.
At the ALOM CMT sc> prompt, issue the poweroff command:
sc> poweroff -fy
SC Alert: SC Request to Power Off Host Immediately.
Note: You can also use the Power On/Off button on the front of the server to initiate a graceful system shutdown. This button is recessed to prevent accidental server power-off. Use the tip of a pen to operate this button.
Refer to the Advanced Lights Out Management (ALOM) CMT Guide for more information about the ALOM CMT poweroff command.
5.1.3 Extending the Server to the Maintenance Position
If the server is installed in a rack with the extendable slide rails that were supplied with the server, use this procedure to extend the server to the maintenance position.
Note: Remove the server from the rack for all cold-swappable FRU replacement procedures except the DIMMs, PCI cards, and the system controller.
(Optional) Issue the following command from the ALOM CMT sc> prompt to locate the system that requires maintenance:
sc> setlocator on
Locator LED is on.
Once you have located the server, press the Locator LED button to turn it off.
Check to see that no cables w
51. URE 5-12. However, the cable marked P8 is large and contains a number of small wires, so it will not easily pass through the cutout. While pushing and pulling the cables through the cutout, be careful not to damage the wires.
FIGURE 5-12 Cable Cutout
7. Remove the screws that secure the motherboard assembly to the chassis (FIGURE 5-13).
Caution: Do not remove the screws that hold the flexible cable in place. These screws must be installed at the factory, and they must not be removed.
FIGURE 5-13 Location of the Screws in the Motherboard Assembly (callout: flexible cable, do not remove flex cable screws)
8. Lift the front of the motherboard to clear the front standoffs. The front of the motherboard refers to the part of the motherboard nearest the front of the server.
9. Slide the motherboard forward to clear the connectors from the cutouts in the rear of the chassis.
10. Using the handle, tilt the motherboard assembly over the interior chassis wall and lift it out of the chassis (FIGURE 5-14).
Caution: Do not lift the motherboard assembly over the front fan housing to remove it from the chassis, because doing so can damage the assembly.
FIGURE 5-14 Removing the Motherboard Assembly From the Server Chassis
11. Place the motherboard assembly on an antistatic mat.
Replacing the Mot
52. VTS Software Is Installed
3.8.2 Exercising the System Using SunVTS Software
3.8.3 Exercising the System With SunVTS Software
4 Replacing Hot-Swappable and Hot-Pluggable FRUs
4.1 Devices That Are Hot-Swappable and Hot-Pluggable
4.2 Hot-Swapping a Fan
4.2.1 Removing a Fan
4.2.2 Replacing a Fan
4.3 Hot-Swapping a Power Supply
4.3.1 Removing a Power Supply
4.3.2 Replacing a Power Supply
4.4 Hot-Swapping the Rear Blower
4.4.1 Removing the Rear Blower
4.4.2 Replacing the Rear Blower
4.5 Hot-Plugging a Hard Drive
4.5.1 Removing a Hard Drive
4.5.2 Replacing a Hard Drive
5 Replacing Cold-Swappable FRUs
5.1 Common Procedures for Parts Replacement
5.1.1 Required Tools
5.1.2 Shutting the System Down
5.1.3 Extending the Server to the Maintenance Position
5.1.4 Removing the Server From a Rack
5.1.5 Disconnecting Power From the Server
5.1.6 Performing Electrostatic Discharge Prevention Measures
5.1.7 Removing the Top Cover
5.1.8 Removing the Front Bezel and Top Front Cover
5.2 Removing and Replacing FRUs
5.2.1 Removing PCI Express and PCI-X Cards
5.2.2 Replacing PCI Cards
5.2.3 Removing DIMMs
5.2.4 Replacing DIMMs
5.2.5 Removing the System Controller Card
5.2.6 Replacing the System Controller Card
5.2.7 Removing the Motherboard Assembly
53. a location other than the default /opt directory, alter the path in the following command accordingly:
/opt/SUNWvts/bin/sunvts -display display-system:0
where display-system is the name of the machine through which you are remotely logged in to the server.
The SunVTS GUI is displayed (FIGURE 3-11).
FIGURE 3-11 SunVTS GUI (screenshot of the SunVTS Diagnostic window: menu bar, testing status, device selection, and test mode panels)
5. Expand the test lists to see the individual tests. The test selection area lists tests in categories, such as Network, as shown in FIGURE 3-12. To expand a category, left-click the expand category icon to the left of the category name.
54. aceable units (FRUs) in the server. The following topics are covered:
- Section 4.1, Devices That Are Hot-Swappable and Hot-Pluggable
- Section 4.2, Hot-Swapping a Fan
- Section 4.3, Hot-Swapping a Power Supply
- Section 4.4, Hot-Swapping the Rear Blower
- Section 4.5, Hot-Plugging a Hard Drive
4.1 Devices That Are Hot-Swappable and Hot-Pluggable
Hot-swappable devices are those devices that you can remove and install while the server is running without affecting the rest of the server's capabilities. In a server, the following devices are hot-swappable:
- Fans
- Power supplies
- Rear blower
Hot-pluggable devices are those devices that can be removed and installed while the system is running, but you must perform administrative tasks beforehand. In a server, the chassis-mounted hard drives can be hot-swappable, depending on how they are configured.
4.2 Hot-Swapping a Fan
Three hot-swappable fans are located under the fan door. Two working fans are required to provide adequate cooling for the server. If a fan fails, replace it as soon as possible to ensure system availability.
The following LEDs are lit when a fan fault is detected:
- Front and rear Service Required LEDs
- Top Fan LED on the front of the server
- LED on the faulty fan
If an overtemperature condition occurs, the front panel Overtemp LED lights. A m
55. adm -al to list all disks in the device tree, including unconfigured disks. If the disk is not in the list, such as with a newly installed disk, use devfsadm to configure it into the tree. See the devfsadm man page for details.
FIGURE 6-1 Hard Drive Slots (callouts: latch, latch release button)
6.1.2 Adding a USB Device
Follow these guidelines:
- Only perform USB hot-plug operations while the operating system is running. Do not perform USB hot-plug operations when the system ok prompt is displayed or before the system has completed booting.
- You can connect up to 126 devices to each of the two USB controllers (each controller provides two connectors), for a total of 252 USB devices.
- The USB ports on the server support USB 1.1 devices.
Note: There are many USB devices on the market. Read the product documentation for your USB device for additional installation requirements and instructions that are not covered here.
Plug a standard USB device into one of the USB ports (FIGURE 6-2) on the front or rear of the server.
FIGURE 6-2 Adding a USB Device
6.2 Adding Components Inside the Chassis
You can add the following components to the server:
- Memory
- PCI-X cards
- PCI Express cards
6.2.1 Memory Guidelines
Use the following guidelines and FIGURE 6-3 and TABLE 6-1 to plan the me
56. al fan indications:
- Off: Indicates a steady state; no service action is required.
- Steady on: Indicates that a fan failure event has been acknowledged and a service action is required on at least one of the three fans. Use the fan LEDs to determine which fan requires service.
Rear FRU Fault LED (Amber): Provides the following indications:
- Off: Indicates a steady state; no service action is required.
- Steady on: Indicates a failure of a rear-access FRU (a power supply or the rear blower). Use the FRU LEDs to determine which FRU requires service.
TABLE 3-2 Front and Rear Panel LEDs (Continued)
OverTemp LED (Amber): Provides the following operational temperature indications:
- Off: Indicates a steady state; no service action is required.
- Steady on: Indicates that a temperature failure event has been acknowledged and a service action is required. View the ALOM CMT reports for further information on this event.
Hard Drive LEDs
The hard drive LEDs (FIGURE 3-4 and TABLE 3-3) are located on the front of each hard drive that is installed in the server chassis.
FIGURE 3-4 Hard Drive LEDs (callouts: OK to Remove, Unused, Activity)
TABLE 3-3 Hard Drive LEDs
OK to Remove (Blue):
- On: The drive is ready for hot-plug removal.
- Off: Normal operation.
Unused (Amber)
Activity (Green):
- On: Drive is rec
57. ally detects that the fault condition is no longer present. ALOM CMT extinguishes the Service Required LED and updates the FRU's PROM, indicating that the fault is no longer present.
- Fault repair: The fault has been repaired by human intervention. In most cases, ALOM CMT detects the repair and extinguishes the Service Required LED. If ALOM CMT does not perform these actions, you must perform these tasks manually with the clearfault or enablecomponent commands.
ALOM CMT can detect the removal of a FRU in many cases, even if the FRU is removed while ALOM CMT is powered off. This enables ALOM CMT to know that a fault diagnosed to a specific FRU has been repaired. The ALOM CMT clearfault command enables you to manually clear certain types of faults without a FRU replacement, or if ALOM CMT was unable to automatically detect the FRU replacement.
Note: ALOM CMT does not automatically detect hard drive replacement.
Many environmental faults can automatically recover. A temperature that is exceeding a threshold might return to normal limits. An unplugged power supply can be plugged in, and so on. Recovery of environmental faults is automatically detected. Recovery events are reported using one of two forms:
- fru at location is OK
- sensor at location is within normal range
Environmental faults can be repaired through hot removal of the faulty FRU. FRU removal is automatically detected by the environmental monitoring, and all faults assoc
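The two recovery event forms quoted above can be recognized with a pair of regular expressions. The following Python sketch is only an illustration: it assumes the fru, sensor, and location placeholders are each a single whitespace-free token, the function name parse_recovery is hypothetical, and the sample messages are made up rather than captured from a real ALOM CMT log.

```python
import re

# One pattern per documented form. Assumption: each placeholder is a single
# token without spaces, which the manual does not state explicitly.
RECOVERY_FORMS = (
    ("fru", re.compile(r"^(?P<name>\S+) at (?P<location>\S+) is OK$")),
    ("sensor", re.compile(r"^(?P<name>\S+) at (?P<location>\S+) is within normal range$")),
)


def parse_recovery(message):
    """Return a dict describing a recovery event, or None if the text is not one."""
    text = message.strip()
    for kind, pattern in RECOVERY_FORMS:
        match = pattern.match(text)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


if __name__ == "__main__":
    # Made-up sample messages that follow the two forms.
    for sample in ("PS0 at PS0 is OK",
                   "T_AMB at MB is within normal range",
                   "PS0 at PS0 has failed"):
        print(sample, "->", parse_recovery(sample))
```

Keeping the kind alongside each pattern lets a log filter tell FRU recoveries apart from sensor recoveries without a second pass over the text.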
58. [Diagnostic flowchart, partially recovered. Box text: "an environmental fault"; "7. Is the fault a PSH detected fault?"; "8. The fault is a POST detected fault"; "9. Contact Support if the fault condition persists"; "Identify the fault condition from the fault message"; "Identify and replace the faulty FRU from the PSH message and perform the procedure to clear the PSH detected fault"; "Identify and replace the faulty FRU from the POST message and perform the procedure to clear the POST detected faults". Numbers in this flowchart correspond to the Action numbers in TABLE 3-1.]
TABLE 3-1 Diagnostic Flowchart Actions
Columns: Action No.; Diagnostic Action; Resulting Action; For more information, see these sections
Diagnostic actions in this part of the table:
1. Check Power OK and AC OK LEDs on the server.
2. Run the ALOM CMT showfaults command to check for faults.
3. Check the Solaris log files for fault information.
4. Run SunVTS
Resulting action for Action 1: The Power OK LED is located on the front and rear of the chassis. The AC OK LED is located on the rear of the server on each power supply. If these LEDs are not on, check the power source and power connections to the server. For more information: Section 3.2, Using LEDs
Resulting action for Action 2: The showfaults command displays the following kinds of faults:
- Environmental faults
- Solaris Predictive Self-Healing (PSH) detected faults
- POST detected faults
Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see Appendix A T
59. antistatic mat
Replacing DIMMs
1. Unpackage the replacement DIMMs and place them on an antistatic mat.
2. Ensure that the connector ejector tabs are in the open position.
3. Line up the replacement DIMM with the connector. Align the DIMM notch with the key in the connector. This action ensures that the DIMM is oriented correctly.
4. Push the DIMM into the connector until the ejector tabs lock the DIMM in place.
5. Perform the procedures described in Section 5.3, Common Procedures for Finishing Up.
6. Gain access to the ALOM CMT sc> prompt. Refer to the SPARC Enterprise T2000 Advanced Lights Out Management (ALOM) Guide for instructions.
7. Run the showfaults -v command to determine how to clear the fault. The method you use to clear a fault depends on how the fault is identified by the showfaults command. Examples:
- If the fault is a host-detected fault (displays a UUID), continue to Step 8. For example:
sc> showfaults -v
ID Time FRU Fault
0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault, MSGID: SUN4V-8000-DX UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
- If the fault resulted in the FRU being disabled, such as the following:
sc> showfaults -v
ID Time FRU Fault
1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D0 deemed faulty and disabled
Then run the enablecomponent command to enable the FRU
60. atic mat.
Using a small flat-head screwdriver, carefully pry the battery (FIGURE 5-27) from the system controller.
FIGURE 5-27 Removing the Battery From the System Controller
Replacing the Battery on the System Controller
1. Unpackage the replacement battery.
2. Press the new battery into the system controller (FIGURE 5-28) with the positive side facing upward (away from the card).
FIGURE 5-28 Replacing the Battery in the System Controller (callout: battery)
3. Replace the system controller. See Section 5.2.6, Replacing the System Controller Card.
4. Perform the procedures described in Section 5.3, Common Procedures for Finishing Up.
5. Use the ALOM CMT setdate command to set the day and time. Use the setdate command before you power on the host system. For details about this command, refer to the Advanced Lights Out Management (ALOM) CMT Guide.
5.3 Common Procedures for Finishing Up
5.3.1 Replacing the Top Front Cover and Front Bezel
1. Place the top front cover on the chassis.
2. Slide the front top cover forward until it snaps into place, being careful to avoid catching the cover on the intrusion switch (FIGURE 5-29).
FIGURE 5-29 Replacing the Top Front Cover (callout: intrusion switch)
3. Position the bezel on the front of the chassis and snap it into
61. ating the following details:
- Date and time of the fault (Apr 24 06:54:08.2005)
- Universal Unique Identifier (UUID), which is unique for every fault (1ce22523-1c80-6062-e61d-3b39290ae2c)
- The message identifier (SUN4U-8000-6H), which can be used to obtain additional fault information
- Faulted FRU (FRU: hc:///component=MB), which in this example is identified as MB, indicating that the motherboard requires replacement
Note: fmdump displays the PSH event log. Entries remain in the log after the fault has been repaired.
2. Use the message ID to obtain more information about this type of fault.
a. In a browser, go to the Predictive Self-Healing Knowledge Article web site: http://www.sun.com/msg
b. Obtain the message ID from the console output or the ALOM CMT showfaults command.
c. Enter the message ID in the SUNW-MSG-ID field and click Lookup.
In this example, the message ID SUN4U-8000-6H returns the following information for corrective action:
CPU errors exceeded acceptable levels
Type: Fault
Severity: Major
Description: The number of errors associated with this CPU has exceeded acceptable levels.
Automated Response: The fault manager will attempt to remove the affected CPU from service.
Impact: System performance may be affected.
Suggested Action for System Administrator: Schedule a repair procedure to replace the affected CPU, the identity of whi
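The three fields listed above (time, UUID, and message ID) can be pulled out of an fmdump summary line with a short parser. This is a sketch in Python under stated assumptions: it takes the line as fmdump normally prints it, with colons in the time and hyphens in the UUID and message ID, and the name parse_summary and the exact pattern are illustrative rather than part of any Solaris tool.

```python
import re

# Assumed layout: "<Mon> <day> <hh:mm:ss.frac> <uuid> <SUNW-MSG-ID>".
SUMMARY = re.compile(
    r"^(?P<time>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<uuid>[0-9a-f-]+)\s+"
    r"(?P<msgid>[A-Z0-9]+-\d{4}-[0-9A-Z]{2})$"
)


def parse_summary(line):
    """Split one fmdump summary line into time, uuid, and msgid, or return None."""
    match = SUMMARY.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Values taken from the example in this section.
    sample = "Apr 24 06:54:08.2005 1ce22523-1c80-6062-e61d-3b39290ae2c SUN4U-8000-6H"
    fields = parse_summary(sample)
    print(fields["msgid"], "is the ID to look up for", fields["uuid"])
```

The message ID returned here is the value to enter in the lookup field, and the UUID is the value to pass to fmdump with the -u option.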
62. ault LED 3 13 removing 4 7 replacing 4 7 bootmode command 3 19 break command 3 19 bus bar screws 5 26 5 29 button Locator 5 3 Power On Off 3 10 5 3 top cover release 5 7 C cable kit A 5 cable management arm CMA reconnecting 5 44 releasing 4 5 cables removing from motherboard 5 20 card slots PCI 5 9 cfgadmcommand 6 2 chassis identification 2 9 reinstalling in the rack 5 42 removing from the rack 5 4 serial number 2 10 5 31 serial number electronic 5 27 5 32 chip multithreading CMT 2 2 chipkill 3 7 clearasrdb command 3 46 Index 1 clearfault command 3 19 3 44 5 17 clearing POST detected faults 3 39 clearing PSH detected faults 3 44 common procedures for parts replacement 5 1 components disabled 3 46 3 48 displaying state of 3 46 connecting to ALOM CMT 3 18 connectors location of 2 9 console 3 18 console command 3 19 3 33 5 16 consolehistory command 3 19 cooling 2 4 cores 2 2 2 4 CPU board see also motherboard 5 19 cryptography 2 5 D DDR 2 memory DIMMs 3 6 diag_level parameter 3 27 3 30 diag_mode parameter 3 27 3 30 diag_trigger parameter 3 27 3 30 diag_verbosity parameter 3 27 3 30 diagnostics about 3 1 flowchart 3 3 low level 3 26 running remotely 3 16 SunVTS 3 49 DIMMs 3 7 A 5 error correcting 2 7 example POST error output 3 35 names and socket numbers 5 13 6 5 parity checking 2 7 removing 5 12 replacing 5 14 slot assignments
63. FUJITSU SPARC Enterprise T2000 Server Service Manual
Manual Code C120-E377-01EN
Part No. 875-4036-10
April 2007
Copyright 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
FUJITSU LIMITED provided technical input and review on portions of this material.
Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more additional patents or patent applications in the United States or other countries.
This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this docum
64. ceable units (FRUs).
Chapter 5, Replacing Cold-Swappable FRUs: Describes how to remove and replace the FRUs that cannot be hot-swapped.
Chapter 6, Adding New Components and Devices: Explains how to add new components such as hard drives, memory, and PCI cards to the SPARC Enterprise T2000 server.
Appendix A, Field-Replaceable Units: Provides an illustrated breakdown of parts and lists the field-replaceable units (FRUs).
Index: Provides keywords and corresponding reference page numbers so that the reader can easily search for items in this manual as necessary.
Related Documentation
The latest versions of all the SPARC Enterprise Series manuals are available at the following Web sites:
Global Site: http://www.fujitsu.com/sparcenterprise/manual/
Japanese Site: http://primeserver.fujitsu.com/sparcenterprise/manual/
Titles:
- SPARC Enterprise T2000 Server Product Notes
- SPARC Enterprise T2000 Server Site Planning Guide
- SPARC Enterprise T2000 Server Getting Started Guide
- SPARC Enterprise T2000 Server Overview Guide
- SPARC Enterprise T2000 Server Installation Guide
- SPARC Enterprise T2000 Server Administration Guide
- Advanced Lights Out Management (ALOM) CMT vx.x Guide
- SPARC Enterprise T2000 Server Safety and Compliance Guide
Descriptions (listed in the same order as the titles):
- Information about the latest product updates and issues
- Server specifications for site planning
- Informa
65. cessing it by tip or telnet commands, refer to the SunVTS User's Guide.
SunVTS software can be run in several modes. This procedure assumes that you are using the default mode. This procedure also assumes that the server is headless, that is, it is not equipped with a monitor capable of displaying bitmap graphics. In this case, you access the SunVTS GUI by logging in remotely from a machine that has a graphics display.
Finally, this procedure describes how to run SunVTS tests in general. Individual tests might presume the presence of specific hardware, or might require specific drivers, cables, or loopback connectors. For information about test options and prerequisites, refer to the following documentation:
- SunVTS 6.3 Test Reference Manual for SPARC Platforms
- SunVTS 6.3 User's Guide
3.8.3 Exercising the System With SunVTS Software
1. Log in as superuser to a system with a graphics display. The display system should be one with a frame buffer and monitor capable of displaying bitmap graphics such as those produced by the SunVTS GUI.
2. Enable the remote display. On the display system, type:
/usr/openwin/bin/xhost + test-system
where test-system is the name of the server you plan to test.
3. Remotely log in to the server as superuser. Use a command such as rlogin or telnet.
4. Start SunVTS software. If you have installed SunVTS software in
66. ch can be determined using fmdump -v -u <EVENT_ID>.
Details: The Message ID SUN4U-8000-6H indicates diagnosis has determined that a CPU is faulty. The Solaris fault manager arranged an automated attempt to disable this CPU. The recommended action for the system administrator is to contact Sun support so a Sun service technician can replace the affected component.
3. Follow the suggested actions to repair the fault.
3.5.2 Clearing PSH Detected Faults
When the Solaris PSH facility detects faults, the faults are logged and displayed on the console. After the fault condition is corrected, for example by replacing a faulty FRU, you must clear the fault.
Note: If you are dealing with faulty DIMMs, do not follow this procedure. Instead, perform the procedure in Section 5.2.4, Replacing DIMMs.
After replacing a faulty FRU, power on the server.
At the ALOM CMT prompt, use the showfaults command to identify PSH detected faults. PSH detected faults are distinguished from other kinds of faults by the text "Host detected fault". Example:
sc> showfaults -v
ID Time FRU Fault
0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault, MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
- If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps.
- If a fault is reported, perform Step 2 thr
67. cifies the number of lines to display before pausing.
- -e lines displays n lines from the end of the buffer.
- -b lines displays n lines from the beginning of the buffer.
- -v displays the entire buffer.
- boot|run specifies the log to display (run is the default log).
Enables control of the firmware during system initialization with the following options:
- normal is the default boot mode.
- reset_nvram resets OpenBoot PROM parameters to their default values.
- bootscript=string enables the passing of a string to the boot command.
Performs a poweroff followed by poweron. The -f option forces an immediate poweroff; otherwise the command attempts a graceful shutdown.
Powers off the host server. The -y option enables you to skip the confirmation question. The -f option forces an immediate shutdown.
Powers on the host server. Using the -c option executes a console command after completion of the poweron command.
TABLE 3-8 Service-Related ALOM CMT Commands (Continued)
ALOM CMT commands listed in this part of the table:
- removefru PS0|PS1
- reset [-y] [-c]
- resetsc [-y]
- setkeyswitch [-y] normal|stby|diag|locked
- setlocator [on|off]
- showenvironment
- showfaults [-v]
- showfru [-g lines] [-s|-d] [FRU]
- showkeyswitch
- showlocator
- showlogs [-b lines|-e lines|-v] [-g lines] [-p logtype[r|p]]
- showplatform [-v]
Description (first command): Indicates if it is okay to perform a hot-swap of a power supply. This command
68. cures the front I/O board to the chassis.
Slide the front I/O board back, tilt it up to clear the two mounting tabs in the front, and lift the board straight out of the chassis (FIGURE 5-24).
FIGURE 5-24 Removing the Front I/O Board
Place the front I/O board on an antistatic mat.
5.2.16 Replacing the Front I/O Board
Unpackage the front I/O board and place it on an antistatic mat.
Tip the front I/O board downwards and slightly forward, and push it into place, aligning the board with the screw hole in the exterior wall of the chassis. When the board is fully seated, both connectors on the USB ports are mounted flush against the motherboard assembly.
Using the screw, secure the front I/O board to the chassis.
Reconnect the front I/O board data cable.
Reinstall the LED board. See Section 5.2.12, Replacing the LED Board.
Reconnect and secure the fan power board.
Replace all three fans. See Section 4.2.2, Replacing a Fan.
Perform the procedures described in Section 5.3, Common Procedures for Finishing Up.
5.2.17 Removing the DVD Drive
Perform the procedures described in Section 5.1, Common Procedures for Parts Replacement.
Remove the DVD interconnect board from the back of the DVD drive.
Press the release latch and pull the DVD drive o
69. d fault
Resulting action (PSH detected fault): If the fault is a PSH detected fault, identify the faulty FRU from the fault message and replace the faulty FRU. After the FRU is replaced, perform the procedure to clear PSH detected faults. For more information: Chapter 5; Section 3.5.2, Clearing PSH Detected Faults.
8. Determine if the fault was detected by POST.
Resulting action: POST performs basic tests of the server components and reports faulty FRUs. When POST detects a faulty FRU, it logs the fault and, if possible, takes the FRU offline. POST detected FRUs display the following text in the fault message: FRU_name deemed faulty and disabled. In this case, replace the FRU and run the procedure to clear POST detected faults. For more information: Section 3.4, Running POST; Chapter 5; Clearing POST Detected Faults.
9. Contact technical support.
Resulting action: The majority of hardware faults are detected by the server's diagnostics. In rare cases a problem might require additional troubleshooting. If you are unable to determine the cause of the problem, contact support. For more information: Section 2.3, Obtaining the Chassis Serial Number.
3.1.1 Memory Configuration and Fault Handling
A variety of features play a role in how the memory subsystem is configured and how memory faults are handled. Understanding the underlying features helps you identify and repair memory problems. This section describes how the memory is configured and how the server deals with memory faults.
70. duct after modifying or reproducing by overhaul may cause unexpected injury or damage to the property of the user or bystanders.
Alert Labels
The following labels are attached to this product:
- Never peel off the labels.
- The following labels provide information to the users of this product.
Sample of SPARC Enterprise T2000 (label text: SPARC Enterprise T2000 General Service Information)
Fujitsu Welcomes Your Comments
We would appreciate your comments and suggestions to improve this document. You can submit your comments by using the Reader's Comment Form.
Reader's Comment Form
We would appreciate your comments and suggestions for improving this publication.
Fields: Date, Publication No., Your Name, Publication Name, Company, Address, City/State/Zip, Phone, Email address, Your Comments, Reply requested (Yes/No).
Please evaluate the overall quality of this manual by checking the appropriate boxes (Good, Fair, Poor) for each of the following: Organization, Use of examples, Legibility, Accuracy, Index coverage, Binding, Clarity, Cross-referencing, Figures and tables, Overall rating of this publication, General appearance, Technical level
71. e and software continue to function when the server operating system goes offline or when the server is powered off. ALOM CMT monitors the following server components:
- CPU temperature conditions
- Hard drive status
- Enclosure thermal conditions
- Fan speed and status
- Power supply status
- Voltage levels
- Faults detected by POST (power-on self-test)
- Solaris Predictive Self-Healing (PSH) diagnostic facilities
For information about configuring and using the ALOM system controller, refer to the latest Advanced Lights Out Manager (ALOM) CMT Guide.
2.1.4 System Reliability, Availability, and Serviceability
Reliability, availability, and serviceability (RAS) are aspects of a system's design that affect its ability to operate continuously and to minimize the time necessary to service the system. Reliability refers to a system's ability to operate continuously without failures and to maintain data integrity. System availability refers to the ability of a system to recover to an operational state after a failure, with minimal impact. Serviceability relates to the time it takes to restore a system to service following a system failure. Together, reliability, availability, and serviceability features provide for near-continuous system operation.
To deliver high levels of reliability, availability, and serviceability, the server offers the following features:
- Hot-pluggable hard drives
- Redundant hot
72. e populated with one of the following types of DDR-2 DIMMs:
- 512 MB (8 GB maximum)
- 1 GB (16 GB maximum)
- 2 GB (32 GB maximum)
- 4 GB (64 GB maximum)
The memory subsystem supports the chipkill feature.
4 ports, 10/100/1000 Mb autonegotiating
1-4 SAS 2.5-inch form factor drives (hot-pluggable)
1 slimline DVD-R/CD-RW device
4 USB 1.1 ports (2 in front and 2 in rear)
3 hot-swappable and redundant system fans and 1 blower unit
3 PCI Express (PCI-E) slots that support cards with the following specifications:
- Low-profile
- x1, x4, and x8 width
- 12V and 3.3V, as defined by the PCI Express specification
2 PCI-X slots that support cards with the following specifications:
- 64-bit, 133 MHz
- Low-profile
- 3.3V (5V is also supplied, as defined by the PCI-X specification, using a 3.3V form factor connector)
2 hot-swappable and redundant power supply units (PSUs)
Refer to the SPARC Enterprise T2000 Server Site Planning Guide for power and environmental specifications.
TABLE 2-1 Server Features (Continued)
Remote management: ALOM CMT management controller with a serial and 10/100 Mb Ethernet port
Firmware: System firmware comprising:
- OpenBoot PROM for system settings and power-on self-test (POST) support
- ALOM CMT for remote management administration
Cryptography: Hardware-assisted cryptographic acceleration
73. e power cords from the server. 5.1.7 Removing the Top Cover. All field-replaceable units (FRUs) that are not hot-swappable require the removal of the top cover. 1. Press the top cover release button (FIGURE 5-3). [FIGURE 5-3 Top Cover and Release Button; callouts: top cover, top cover release button, top front cover, cover latch] 2. While pressing the top cover release button, slide the cover toward the rear of the server about half an inch. 3. Lift the cover off the chassis. 5.1.8 Removing the Front Bezel and Top Front Cover. The following field-replaceable units (FRUs) require the removal of the top front cover and front bezel: motherboard, SAS disk backplane, LED board, front I/O board, fan power board, DVD. 1. Remove the top cover as described in Section 5.1.7, "Removing the Top Cover" on page 5-6. 2. Lift the fan cover latch (FIGURE 5-3) and open the fan cover. 3. Loosen the captive screw near the farthest right fan that secures the bezel to the chassis (FIGURE 5-4). [FIGURE 5-4 Removing the Front Bezel From the Server Chassis] 4. Remove the front bezel from the chassis (FIGURE 5-4). The bezel is held in place by a mounting tab and four fasteners that clamp the bezel to the chassis. 5. While holding the fan cover open, slide the top front cover forward to disengage the top front cover from the c
74. e repaired or replaced, return POST to the default minimum level:
    sc> setkeyswitch normal
    sc> setsc diag_mode normal
    sc> setsc diag_level min
Clearing POST-Detected Faults. In most cases, when POST detects a faulty component, POST logs the fault and automatically takes the failed component out of operation by placing the component in the ASR blacklist (see Section 3.7, "Managing Components With Automatic System Recovery Commands" on page 3-46). After the faulty FRU is replaced, you must clear the fault by removing the component from the ASR blacklist. This procedure describes how to do this. After replacing a faulty FRU, at the ALOM CMT prompt use the showfaults command to identify POST-detected faults. POST-detected faults are distinguished from other kinds of faults by the text "deemed faulty and disabled", and no UUID number is reported. Example:
    sc> showfaults -v
    ID Time            FRU                Fault
    1  APR 24 12:47:27 MB/CMP0/CH2/R0/D0  MB/CMP0/CH2/R0/D0 deemed faulty and disabled
If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps. 2. Use the enablecomponent command to clear the fault and remove the component from the ASR blacklist. Use the FRU name that was reported in the fault in the previous step. Example:
    sc> enablecomponent MB/CMP0/CH0/R0/D0
The fault is cleared and should not show up when you run t
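The clearing procedure above (find the faults marked "deemed faulty and disabled", then run enablecomponent on each reported FRU name) can be sketched as a small text filter. This is only an illustration under stated assumptions: it assumes showfaults output has been captured as text, that FRU names are slash-separated paths without spaces, and that the FRU name is the token immediately before the marker text. The function names are hypothetical, and nothing here talks to ALOM CMT itself.

```python
POST_MARKER = "deemed faulty and disabled"


def post_faulted_frus(showfaults_output: str) -> list[str]:
    """Collect FRU names from lines that carry the POST fault marker."""
    frus: list[str] = []
    for line in showfaults_output.splitlines():
        if POST_MARKER not in line:
            continue
        # Assumption: the FRU name is the last token before the marker text.
        tokens = line.split(POST_MARKER)[0].split()
        if tokens and tokens[-1] not in frus:
            frus.append(tokens[-1])
    return frus


def clear_commands(showfaults_output: str) -> list[str]:
    """Build the enablecomponent commands an operator would type at sc>."""
    return [f"enablecomponent {fru}" for fru in post_faulted_frus(showfaults_output)]


# Hypothetical captured output, laid out like the example in the text.
SAMPLE = (
    "ID Time            FRU                Fault\n"
    "1  APR 24 12:47:27 MB/CMP0/CH0/R0/D0  MB/CMP0/CH0/R0/D0 deemed faulty and disabled\n"
)
print(clear_commands(SAMPLE))
```

An empty result corresponds to the case described above where no fault is reported and the remaining steps are skipped.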
75. e to set the electronically readable chassis serial number. The following steps describe how to do this. Gain access to the ALOM CMT sc> prompt. Perform the following service commands to set the electronic chassis serial number in the power distribution board. Caution: Once the power distribution board is programmed with an electronic chassis serial number, the serial number cannot be changed. When executing the following commands, ensure that you run the commands correctly and that you enter the correct chassis serial number, because you will not be able to change it. The chassis serial number is used to obtain support. If the showplatform command outputs SUNW,SPARC-Enterprise-T2000, the setpartner -c 1 command was executed correctly.
    sc> setsc sc_servicemode true
    Warning: misuse of this mode may invalidate your warranty.
    sc> setcsn -c chassis_serial_number
    Are you sure you want to permanently set the Chassis Serial Number to chassis_serial_number [y/n]? y
    Chassis serial number recorded.
    sc> setpartner -c 1
    sc> resetsc -y
    (System controller reboot message)
    login: admin
    password: admin_password
    sc> showplatform
    SUNW,SPARC-Enterprise-T2000
    Chassis Serial Number: chassis_serial_number
    Domain Status
    S0     Running
    sc> setsc sc_servicemode false
Removing the LED Board. 1. Perform the procedures described in Section 5
76. ection 5.1.6, "Performing Electrostatic Discharge Prevention Measures" on page 5-6; Section 5.1.5, "Disconnecting Power From the Server" on page 5-6; Section 5.1.7, "Removing the Top Cover" on page 5-6; Section 5.1.8, "Removing the Front Bezel and Top Front Cover" on page 5-7. Note: These procedures do not apply to the hot-pluggable and hot-swappable devices (fans, power supplies, hard drives, and rear blower) described in Chapter 4. The corresponding procedures that you perform when maintenance is complete are described in Section 5.3, "Common Procedures for Finishing Up" on page 5-41. 5.1.1 Required Tools. The server can be serviced with the following tools: antistatic wrist strap, antistatic mat, No. 2 Phillips screwdriver. 5.1.2 Shutting the System Down. Performing a graceful shutdown ensures that all of your data is saved and the system is ready for restart. Log in as superuser or equivalent. Depending on the nature of the problem, you might want to view the system status or the log files, or run diagnostics, before you shut down the system. Refer to the SPARC Enterprise T2000 Server Administration Guide for log file information. Notify affected users. Refer to your Solaris system administration documentation for additional information. Save any open files and quit all running programs. Refer to your application documentation for specific information on these processes
77. eiving power. Solidly lit if drive is idle. Flashes while the drive processes a command. Off: Power is off. 3.2.3 Power Supply LEDs. The power supply LEDs (FIGURE 3-5 and TABLE 3-4) are located on the back of each power supply. [FIGURE 3-5 Power Supply LEDs] TABLE 3-4 Power Supply LEDs (LED, color, description): Power OK, green: On = normal operation, DC output voltage is within normal limits; Off = power is off. Failure, amber: On = power supply has detected a failure; Off = normal operation. AC OK, green: On = normal operation, input power is within normal limits; Off = no input voltage, or input voltage is below limits. 3.2.4 Fan LEDs. The fan LEDs are located on the top of each fan unit and are visible when you open the top fan door (FIGURE 3-6). [FIGURE 3-6 Location of Fan LEDs; callout: Fault] TABLE 3-5 Fan LEDs (LED, color, description): Fan LEDs, amber: On = this fan is faulty; Off = normal operation. Note: When a fan fault is detected, the front panel Top Fan LED is lit. 3.2.5 Blower Unit LED. The blower unit LED is located on the back of the blower unit and is visible from the rear of the server (TABLE 3-6). [FIGURE 3-7 Location of the Blower Unit LED; callout: Fault] TABLE 3-6 Blower Unit LED LED Color Description
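The power supply LED table above is effectively a lookup from three on/off indicators to a status description. A minimal sketch of that lookup follows, assuming each LED is observed simply as lit or not lit. The function names are hypothetical, and the "attention" rule at the end is an illustrative reading of the table, not a rule stated in the manual.

```python
def describe_psu_leds(power_ok: bool, failure: bool, ac_ok: bool) -> list[str]:
    """Translate the three power supply LEDs into the table's descriptions."""
    return [
        "Power OK on: DC output voltage is within normal limits"
        if power_ok else "Power OK off: power is off",
        "Failure on: power supply has detected a failure"
        if failure else "Failure off: normal operation",
        "AC OK on: input power is within normal limits"
        if ac_ok else "AC OK off: no input voltage, or input voltage is below limits",
    ]


def psu_needs_attention(power_ok: bool, failure: bool, ac_ok: bool) -> bool:
    """Illustrative rule: anything other than green, dark amber, green is flagged."""
    return failure or not power_ok or not ac_ok


# Healthy supply: both green LEDs lit, amber Failure LED dark.
for text in describe_psu_leds(power_ok=True, failure=False, ac_ok=True):
    print(text)
```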
78. ent, loose connector reseated, and so on, you must remove the component from the ASR blacklist. The ASR commands (TABLE 3-11) enable you to view and manually add or remove components from the ASR blacklist. You run these commands from the ALOM CMT sc> prompt. TABLE 3-11 ASR Commands (command, description): showcomponent: Displays system components and their current state. enablecomponent asrkey: Removes a component from the asr-db blacklist, where asrkey is the component to enable. disablecomponent asrkey: Adds a component to the asr-db blacklist, where asrkey is the component to disable. clearasrdb: Removes all entries from the asr-db blacklist. The showcomponent command might not report all blacklisted DIMMs. Note: The component asrkeys vary from system to system, depending on how many cores and how much memory are present. Use the showcomponent command to see the asrkeys on a given system. Note: A reset or power cycle is required after disabling or enabling a component. If the status of a component is changed with power on, there is no effect on the system until the next reset or power cycle. 3.7.1 Displaying System Components. The showcomponent command displays the system components (asrkeys) and reports their status. At the sc> prompt, enter the showcomponent command. Example with no disabled components
79. ent to you does not give you any rights or licenses express or implied with respect to the product or technology to which it pertains and this document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems Inc or any affiliate of either of them This document and the product and technology described in this document may incorporate third party intellectual property copyrighted by and or licensed from suppliers to Fujitsu Limited and or Sun Microsystems Inc including software and font technology Per the terms of the GPL or LGPL a copy of the source code governed by the GPL or LGPL as applicable is available upon request by the End User Please contact Fujitsu Limited or Sun Microsystems Inc This distribution may include materials developed by third parties Parts of the product may be derived from Berkeley BSD systems licensed from the University of California UNIX is a registered trademark in the U S and in other countries exclusively licensed through X Open Company Ltd Sun Sun Microsystems the Sun logo Java Netra Solaris Sun StorEdge docs sun com OpenBoot SunVTS Sun Fire SunSolve CoolThreads J2EE and Sun are trademarks or registered trademarks of Sun Microsystems Inc in the U S and other countries Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited All SPARC trademarks are used under license and are registered trademarks of SPARC Int
80. er supply is located. At the rear of the server, release the cable management arm (CMA) tab (FIGURE 4-3) and swing the CMA out of the way so you can access the power supply. [FIGURE 4-3 Rotating the Cable Management Arm] Disconnect the power cord from the faulty power supply. Grasp the power supply handle and push the power supply latch to the right. Pull the power supply out of the chassis. 4.3.2 Replacing a Power Supply. Align the replacement power supply with the empty power supply bay. Slide the power supply into the bay until it is fully seated. Reconnect the power cord to the power supply. Close the CMA, inserting the end of the CMA into the rear left rail bracket. Verify that the amber LED on the replaced power supply, the Service Required LED, and the Rear FRU Fault LED are not lit. At the sc> prompt, issue the showenvironment command to verify the status of the power supplies. 4.4 Hot-Swapping the Rear Blower. The rear blower on the server is hot-swappable. The following LEDs are lit when a blower unit fault is detected: front and rear Service Required LEDs; LED on the blower. 4.4.1 Removing the Rear Blower. Gain access to the rear of the server where the faulty blower unit is located. Release the cable management arm tab FIGURE 4
81. er unit fails the two power supply fan units provide enough cooling for the hard drive bay to keep the server running Fan Redundancy The server features three hot swappable system fans Multiple fans enable the server to continue operating with adequate cooling in the event that one of the fans fails Environmental Monitoring The server features an environmental monitoring subsystem designed to protect the server and its components against Extreme temperatures Lack of adequate airflow through the system Power supply failures Hardware faults Temperature sensors located throughout the server monitor the ambient temperature of the server and internal components The software and hardware ensure that the temperatures within the enclosure do not exceed predetermined safe operating ranges If the temperature observed by a sensor falls below a low temperature threshold or rises above a high temperature threshold the monitoring subsystem software lights the amber Service Required LEDs on the front and back panel If the temperature condition persists and reaches a critical threshold the system initiates a graceful server shutdown All error and warning messages are sent to the system controller SC console and are logged in the ALOM CMT log file Additionally some FRUs such as power supplies provide LEDs that indicate a failure within the FRU Error Correction and Parity Checking The UltraSPARC T1 multicore processor provides parit
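The environmental monitoring behavior described above has two stages: a warning when a sensor reading leaves its low or high threshold, and a graceful shutdown when a critical threshold is reached. The sketch below illustrates that decision only. The threshold values, class names, and the choice to treat both a critical low and a critical high limit are assumptions made for the example; the manual excerpt does not give numeric limits.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "no action"
    SERVICE_REQUIRED_LED = "light amber Service Required LEDs"
    GRACEFUL_SHUTDOWN = "initiate graceful server shutdown"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits in degrees Celsius, for illustration only.
    critical_low: float = -10.0
    low_warning: float = 0.0
    high_warning: float = 50.0
    critical_high: float = 60.0


def evaluate(temp_c: float, limits: Thresholds = Thresholds()) -> Action:
    """Map one sensor reading to the response described in the text."""
    if temp_c <= limits.critical_low or temp_c >= limits.critical_high:
        return Action.GRACEFUL_SHUTDOWN
    if temp_c < limits.low_warning or temp_c > limits.high_warning:
        return Action.SERVICE_REQUIRED_LED
    return Action.NONE


for reading in (25.0, 55.0, 61.0):
    print(reading, "->", evaluate(reading).value)
```

The critical check comes first so that a reading past a critical limit is never reported as a mere warning.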
82. ernational Inc in the U S and other countries Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems Inc SPARC64 is a trademark of SPARC International Inc used under license by Fujitsu Microelectronics Inc and Fujitsu Limited The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems Inc for its users and licensees Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry Sun holds a non exclusive license from Xerox to the Xerox Graphical User Interface which license also covers Sun s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun s written license agreements United States Government Rights Commercial use U S Government users are subject to the standard government user license agreements of Sun Microsystems Inc and Fujitsu Limited and the applicable provisions of the FAR and its supplements Disclaimer The only warranties granted by Fujitsu Limited Sun Microsystems Inc or any affiliate of either of them in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT FUJITSU LIMITED SUN MICROSYSTEMS INC AND THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIN
83. erprise T2000 Server Service Manual April 2007 2.2 Chassis Identification. FIGURE 2-3 and FIGURE 2-4 show the physical characteristics of the server. [FIGURE 2-3 Server Front Panel; callouts: indicators and buttons, DVD drive, USB ports, hard drives] [FIGURE 2-4 Server Rear Panel; callouts: SC serial mgt port, SC net mgt port, TTYA serial port, GBE ports, PCI-X slots, PCI-E slots, USB ports, indicators, power supply 0, power supply 1] Obtaining the Chassis Serial Number. To obtain support for your system, you need your chassis serial number. The chassis serial number is located on a sticker on the front of the server and on another sticker on the side of the server. You can also run the ALOM CMT showplatform command to obtain the chassis serial number. Example:
    sc> showplatform
    SUNW,SPARC-Enterprise-T2000
    Chassis Serial Number: 0529AP000882
    Domain Status
    S0     OS Standby
    sc>
CHAPTER 3 Server Diagnostics. This chapter describes the diagnostics that are available for monitoring and troubleshooting the server. This chapter is intended for technicians, service personnel, and system administrators
84. erprise T2000 Server Service Manual April 2007 FIGURE 5 15 Installing the Motherboard Assembly 5 Adjust the position of the motherboard assembly so that it is mounted on the bus bar 6 Adjust the position of the motherboard assembly so that it lines up with the standoff screw holes 7 Loosely install two screws shown in FIGURE 5 16 Chapter 5 Replacing Cold Swappable FRUs 5 25 10 11 12 Insulating washer Install these two screws first to properly align the board ee aye ee 0 FIGURE 5 16 Securing the Motherboard Assembly to the Chassis Secure the motherboard assembly to the chassis with the remaining screws and an insulating washer FIGURE 5 16 Do not fully tighten any screws until all of the screws are loosely installed One insulating washer is required in the position shown in FIGURE 5 16 A washer is supplied with the replacement FRU Install the washer in this position even if the original motherboard did not have a washer Tighten the two bus bar screws to secure the bus bar to the motherboard assembly Reinstall the system controller card in the motherboard assembly See Section 5 2 6 Replacing the System Controller Card on page 5 18 Reinstall all DIMMs in the motherboard assembly in the slots from which they were removed See Section 5 2 4 Replacing DIMMs on page 5 14 Reinstall any PCI option cards that were removed See Section 5 2 2 Replacing PCI Cards
85. essage is displayed on the console and logged by ALOM. Use the showfaults command at the sc> prompt to view the current faults. 4.2.1 Removing a Fan. 1. Gain access to the top of the server where the fan door is located (FIGURE 4-1). You might need to extend the server to a maintenance position. See Section 5.1.3, "Extending the Server to the Maintenance Position" on page 5-3. [FIGURE 4-1 Fan Identification and Removal; callouts: FM2, latch, fan door] 2. Unpackage the replacement fan and place it near the server. 3. Lift the latch on the top of the fan door (FIGURE 4-1) and lift the fan door open. The fan door is spring-loaded, and you must hold it in the open position. 4. Identify the faulty fan. A lighted LED on the top of a fan indicates that the fan is faulty. 5. Pull up on the fan strap handle until the fan is removed from the fan bay. 4.2.2 Replacing a Fan. With the fan door held open, slide the replacement fan into the fan bay. Apply firm pressure to fully seat the fan. Verify that the LED on the replaced fan and the Top Fan, Service Required, and Locator LEDs are not lit. Close the fan door. If necessary, return the server to its normal position in the rack. 4.3 Hot-Swapping a Power Supply. The server's redundant hot-swappable power supplies enable you to remo
86. fic DIMMs that are associated with the fault. Once you identify which DIMMs you want to replace, see Section 5.2.3, "Removing DIMMs" on page 5-12 for DIMM removal and replacement instructions. It is important that you perform the instructions in that chapter to clear the faults and enable the replaced DIMMs. 3.2 Using LEDs to Identify the State of Devices. The server provides the following groups of LEDs: Section 3.2.1, "Front and Rear Panel LEDs" on page 3-8; Section 3.2.2, "Hard Drive LEDs" on page 3-11; Section 3.2.3, "Power Supply LEDs" on page 3-12; Section 3.2.4, "Fan LEDs" on page 3-13; Section 3.2.5, "Blower Unit LED" on page 3-13; Section 3.2.6, "Ethernet Port LEDs" on page 3-14. These LEDs provide a quick visual check of the state of the system. 3.2.1 Front and Rear Panel LEDs. The six front panel LEDs (FIGURE 3-2) are located in the upper left corner of the server chassis. Three of these LEDs are also provided on the rear panel (FIGURE 3-3). [FIGURE 3-2 Front Panel LEDs; callouts: Locator LED/button, Service Required LED, Power OK LED, Power On/Off button, Top Fan LED, Rear FRU Fault LED, Over Temp LED] [FIGURE 3-3 Rear Panel LEDs; callouts: Locator LED/button, Service Required LED, Power OK LED] TABLE 3-2 lists and describes the front and rear panel LEDs. TABLE 3-2 Fron
87. gable and Hot Swappable Devices Hot pluggable devices such as hard drives require administration during installation Hot swappable devices such as USB devices can be connected to and disconnected from the system while the system is running Other components and devices require you to shut down the system prior to installation See Section 6 2 Adding Components Inside the Chassis on page 6 4 Adding a Hard Drive to the Server Hard drives are physically addressed according to the slot in which they are installed Depending on the server model the hard drives might be connected to a PCI X SAS controller card or connected to a drive controller that is built into the 6 1 6 2 motherboard an onboard hard drive controller Regardless of the type of controller the hard drives are installed into the chassis the same way as described in this procedure Note Not all servers have the on board hard drive controller support These servers have the PCI X SAS controller card installed Remove the blank panel from the chassis a On the blank panel push the latch release button b Grasp the latch and pull the blank panel out Align the disk drive to the drive bay slot See FIGURE 6 1 For additional details see Section 4 5 1 Removing a Hard Drive on page 4 9 Slide the hard drive into the bay until the drive is fully seated Close the hard drive lever to lock the drive in place Use cfg
88. hassis 6 Lift the top front cover from the chassis 5 2 5 8 Removing and Replacing FRUs This section provides procedures for replacing the following field replaceable parts FRUs inside the server chassis Section 5 2 1 Removing PCI Express and PCI X Cards on page 5 9 and Section 5 2 2 Replacing PCI Cards on page 5 11 Section 5 2 3 Removing DIMMs on page 5 12 and Section 5 2 4 Replacing DIMMs on page 5 14 Section 5 2 5 Removing the System Controller Card on page 5 17 and Section 5 2 6 Replacing the System Controller Card on page 5 18 Section 5 2 7 Removing the Motherboard Assembly on page 5 19 and Section 5 2 8 Replacing the Motherboard Assembly on page 5 23 Section 5 2 9 Removing the Power Distribution Board on page 5 27 and Section 5 2 10 Replacing the Power Distribution Board on page 5 30 SPARC Enterprise T2000 Server Service Manual April 2007 m Section 5 2 11 Removing the LED Board on page 5 32 and Section 5 2 12 Replacing the LED Board on page 5 33 m Section 5 2 13 Removing the Fan Power Board on page 5 34 and Section 5 2 14 Replacing the Fan Power Board on page 5 34 m Section 5 2 15 Removing the Front I O Board on page 5 35 and Section 5 2 16 Replacing the Front I O Board on page 5 36 m Section 5 2 17 Removing the DVD Drive on page 5 37 and Section 5 2 18 Replacing the DVD Drive on page
89. he SAS backplane board contains the Molex SASBP backplane Removing the SAS connector for interfacing to 2 5 SAS or S ATA disk Disk Backplane on drives In addition the board contains four page 5 37 seven position vertical SAS connectors that bring each of the four SAS links from the I O board This board contains the electronic chassis serial number 15 Hard drives Section 4 5 1 SFF SAS 2 5 inch form factor hard drives HDDO Removing a Hard HDD1 Drive on page 4 9 HDD HDD3 16 DVD drive Section 5 2 17 DVD CD ROM drive DVD Removing the DVD Drive on page 5 37 The FRU name is used in system messages A 6 SPARC Enterprise T2000 Server Service Manual April 2007 Index A AC OK LED 3 4 3 12 activity indicator hard drive 3 11 adding memory 6 1 adding new devices 6 1 advanced ECC technology 3 7 Advanced Lights Out Management ALOM CMT about 2 7 configuration parameters 5 18 connecting to 3 18 diagnosis and repair of server 3 16 POST and 3 27 prompt 3 18 remote management 2 5 service related commands 3 18 airflow blocked 3 5 ALOM CMT see Advanced Lights Out Management ALOM CMT antistatic mat 1 2 antistatic wrist strap 1 2 architecture designation 2 4 ASR blacklist 3 46 3 48 asrkeys 3 47 Automatic System Recovery ASR 3 46 B battery system controller A 4 replacing 5 40 bezel removing 5 7 replacing 5 42 blacklist ASR 3 46 block copy optimized 2 3 blower 4 2 f
90. he Solaris message buffer and log files record system events and provide information about faults e If system messages indicate a faulty device replace the FRU e To obtain more diagnostic information go to Action No 4 SunVTS is an application you can run to exercise and diagnose FRUs To run SunVTS the server must be running the Solaris OS If SunVTS reports a faulty device replace the FRU e If SunVTS does not report a faulty device go to Action No 5 to Identify the State of Devices on page 3 8 Section 3 3 2 Running the showfaults Command on page 3 21 Section 3 6 Collecting Information From Solaris OS Files and Commands on page 3 45 Chapter 5 Section 3 8 Exercising the System With SunVTS on page 3 49 Chapter 5 3 4 SPARC Enterprise T2000 Server Service Manual April 2007 TABLE 3 1 Diagnostic Flowchart Actions Continued Action For more information see No Diagnostic Action Resulting Action these sections 5 Run POST POST performs basic tests of the server components Section 3 4 Running and reports faulty FRUs POST on page 3 26 Note diag_level min is the default ALOM CMT setting which tests devices required to boot TABLE 3 9 TABLE 3 10 the server Use diag_level max for troubleshooting and hardware replacement Chapices If POST indicates a faulty FRU while diag_level min replace the FRU e If POST indicates a faulty memory device while RS sea k
91. he product or other property may occur if the user does not perform the procedure correctly Alert Messages in the Text An alert message in the text consists of a signal indicating an alert level followed by an alert statement Alert messages are indented to distinguish them from regular text Also a space of one line precedes and follows an alert statement Preface xxi provided from Fujitsu should only be performed by a certified service engineer Users must not perform these tasks Incorrect operation of these tasks may cause malfunction N Caution The following tasks regarding this product and the optional products m Unpacking optional adapters and such packages delivered to the users Also important alert messages are shown in Important Alert Messages on page xxii Notes on Safety Important Alert Messages This manual provides the following important alert signals Caution This indicates a hazardous situation could result in minor or moderate personal injury if the user does not perform the procedure correctly This signal also indicates that damage to the product or other property may occur if the user does not perform the procedure correctly Task Warning Maintenance Electric shock The system supplies standby power to the circuit boards even when the system is powered off Extremely hot The system controller card can be hot To avoid injury handle it carefully Damage The server
92. he showfaults command. Additionally, the Service Required LED is no longer on. Power cycle the server. You must reboot the server for the enablecomponent command to take effect. At the ALOM CMT prompt, use the showfaults command to verify that no faults are reported.
    sc> showfaults
    Last POST run: THU MAR 09 16:52:44 2006
    POST status: Passed all devices
    No failures found in System
Using the Solaris Predictive Self-Healing Feature. The Solaris Predictive Self-Healing (PSH) technology enables the server to diagnose problems while the Solaris OS is running, and to mitigate many problems before they negatively affect operations. The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time and runs in the background to monitor the system. If a component generates an error, the daemon handles the error by correlating the error with data from previous errors and other related information to diagnose the problem. Once diagnosed, the fault manager daemon assigns the problem a Universal Unique Identifier (UUID) that distinguishes the problem across any set of systems. When possible, the fault manager daemon initiates steps to self-heal the failed component and take the component offline. The daemon also logs the fault to the syslogd daemon and provides a fault notification with a message ID (MSGID). You can use the message ID to get additional information about the problem from Sun's knowledge
93. he system is powered off. To avoid personal injury or damage to the system, you must disconnect all power cords before servicing the power distribution board. 1. Loosely fit the power distribution board (PDB) onto the locator pins in the chassis and slide the board toward the rear of the chassis. 2. Secure the PDB to the chassis with the mounting screw. Do not tighten the screw yet. 3. Secure the PDB to the bus bar with two screws and tighten all three screws (FIGURE 5-20). [FIGURE 5-20 Installing the Power Distribution Board; callouts: bus bar screws, mounting screw] Connect the cables to the power distribution board: blower power cable, cable marked P7, DVD cable, hard drive power connector. Re-engage the power supplies with the power distribution board connectors. Note the chassis serial number. The chassis serial number is located on a sticker on the front of the server and on a sticker on the side of the server. The serial number is unique to each server. You need this number for subsequent steps in this procedure. Perform the procedures described in Section 5.3, "Common Procedures for Finishing Up" on page 5-41, and then return to this procedure to complete the remaining steps. Note: After replacing the power distribution board and powering on the system, you must run the setcsn command on the ALOM CMT consol
94. herboard Assembly Caution Remove and replace the motherboard carefully The motherboard rests on metal standoffs If the motherboard is not handled carefully the components mounted on the underside of the motherboard can be damaged if they hit the standoffs To ensure that this damage does not occur perform the removal and replacement instructions described in this document Chapter 5 Replacing Cold Swappable FRUs 5 23 5 24 Caution A flexible cable connects the CPU and I O boards This flexible cable is fragile Handle these parts very carefully to prevent damage Caution This procedure requires that you handle components that are sensitive to static discharges that can cause the component to fail To avoid this problem ensure that you follow antistatic practices as described in Section 5 1 6 Performing Electrostatic Discharge Prevention Measures on page 5 6 Unpackage the replacement motherboard assembly and place it on an antistatic mat Tilt the motherboard assembly over the interior wall into the chassis FIGURE 5 15 and place it down on the rear standoffs Avoid touching the front standoffs with the motherboard Slide the motherboard backward on the rear standoffs to engage the connectors in the rear cutouts Place the front of the motherboard down on the front standoffs The front of the motherboard refers to the part of the motherboard nearest the front of the server SPARC Ent
95. howfaults
    Last POST run: THU MAR 09 16:52:44 2006
    POST status: Passed all devices
    No failures found in System
Example of the showfaults command displaying an environmental fault:
    sc> showfaults -v
    ID FRU   Fault
    0  IOBD  VOLTAGE_SENSOR at IOBD/V_+1V has exceeded low warning threshold
    Last POST run: TUE FEB 07 18:51:02 2006
    POST status: Passed all devices
Example showing a fault that was detected by POST. These kinds of faults are identified by the message "deemed faulty and disabled" and by a FRU name.
    sc> showfaults -v
    ID Time            FRU                Fault
    1  OCT 13 12:47:27 MB/CMP0/CH0/R0/D0  MB/CMP0/CH0/R0/D0 deemed faulty and disabled
Example showing a fault that was detected by the PSH technology. These kinds of faults are identified by the text "Host detected fault" and by a UUID.
    sc> showfaults -v
    ID Time            FRU                Fault
    0  SEP 09 11:09:26 MB/CMP0/CH0/R0/D0  Host detected fault, MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963F7b86
Running the showenvironment Command. The showenvironment command displays a snapshot of the server's environmental status. This command displays system temperatures, hard drive status, power supply and fan status, front panel LED status, and voltage and current sensors. The output uses a format similar to the Solaris OS command prtdiag(1M). At the sc> prompt, type the showenvironment command. The output
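The three showfaults examples above differ in a way that is easy to test mechanically: POST-detected faults carry the text "deemed faulty and disabled", PSH-detected faults carry "Host detected fault" and a UUID, and the environmental example carries neither. A minimal sketch of that classification follows. It assumes one fault per text line and a standard 8-4-4-4-12 hexadecimal UUID layout; the function name and category labels are hypothetical.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID, optionally preceded by "UUID:".
UUID_RE = re.compile(
    r"UUID:?\s*[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def classify_fault(line: str) -> str:
    """Label one showfaults line as POST, PSH, or other."""
    if "deemed faulty and disabled" in line:
        return "POST"
    if "Host detected fault" in line or UUID_RE.search(line):
        return "PSH"
    # Environmental faults, as in the first example, carry neither marker.
    return "other"


SAMPLES = [
    "0 IOBD VOLTAGE_SENSOR at IOBD/V_+1V has exceeded low warning threshold",
    "1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D0 deemed faulty and disabled",
    "0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0 Host detected fault, "
    "MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963F7b86",
]
labels = [classify_fault(s) for s in SAMPLES]
print(labels)
```

The distinction matters in practice because the two kinds of fault are cleared by different procedures.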
96. iated with the removed FRU are cleared. The message for that case, and the alert sent for all FRU removals, is: fru at location has been removed. There is no ALOM CMT command to manually repair an environmental fault. The Solaris Predictive Self-Healing technology does not monitor the hard drive for faults. As a result, ALOM CMT does not recognize hard drive faults and will not light the fault LEDs on either the chassis or the hard drive itself. Use the Solaris message files to view hard drive faults. See Section 3.6, "Collecting Information From Solaris OS Files and Commands" on page 3-45. 3.3.1 Running ALOM CMT Service-Related Commands. This section describes the ALOM CMT commands that are commonly used for service-related activities. 3.3.1.1 Connecting to ALOM CMT. Before you can run ALOM CMT commands, you must connect to ALOM CMT. There are several ways to connect to the system controller: Connect an ASCII terminal directly to the serial management port. Use the telnet command to connect to ALOM CMT through an Ethernet connection on the network management port. Note: Refer to the Advanced Lights Out Manager (ALOM) CMT Guide for instructions on configuring and connecting to ALOM CMT. 3.3.1.2 Switching Between the System Console and ALOM CMT. To switch from the console output to the ALOM CMT sc> prompt, type #. (Hash-Period). To switch from the sc> prompt to
97. ice Mode: Forces a Sun-prescribed level of diagnostic execution. Overrides user-defined settings as if parameters were diag_level=max, diag_verbosity=max, diag_trigger=all-resets (diag_mode=service). User-defined settings are not modified.
Normal Mode: Diagnostic execution is enabled. User-defined settings control test coverage and verbosity via diag_level, diag_verbosity, and diag_trigger (user_reset, power_on_reset, error_reset).
System Boot (OpenBoot PROM)

FIGURE 3-10 Flowchart of ALOM CMT Variables for POST Configuration

TABLE 3-10 shows typical combinations of ALOM CMT variables and associated POST modes.

TABLE 3-10 ALOM CMT Parameters and POST Modes

    Parameter       Normal Diagnostic Mode      No POST    Diagnostic    Keyswitch Diagnostic
                    (Default Settings)          Execution  Service Mode  Preset Values
    diag_mode       normal                      off        service       normal
    setkeyswitch    normal                      normal     normal        diag
    diag_level      min                         n/a        max           max
    diag_trigger    power-on-reset error-reset  none       all-resets    all-resets
    diag_verbosity  normal                      n/a        max           max

Description of POST execution:
- Normal Diagnostic Mode (Default Settings): This is the default POST configuration. This configuration tests the system thoroughly and suppresses some of the detailed POST output.
- No POST Execution: POST does not run, resulting in quick system initialization, but this is not a suggested configuration.
- Diagnostic Service Mode: POST runs the full spectrum of tests with the maximum output displayed.
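The flowchart and table in this excerpt can be read as a small decision procedure. The Python sketch below restates that reading and is not ALOM CMT code: the function name is hypothetical, the hyphenated value spellings (for example all-resets) are assumed, and checking the keyswitch before diag_mode is an assumption about the flowchart's order.

```python
def effective_post_mode(setkeyswitch: str, diag_mode: str, user_settings: dict) -> dict:
    """Return the POST mode and the settings that take effect.

    Assumption: a keyswitch position of "diag" is checked before diag_mode.
    """
    service = {"diag_level": "max", "diag_verbosity": "max", "diag_trigger": "all-resets"}
    if setkeyswitch == "diag" or diag_mode == "service":
        # Service mode overrides the user-defined values without modifying them.
        return {"mode": "service", **service}
    if diag_mode == "off":
        return {"mode": "off"}  # POST does not run
    # Normal mode: the user-defined values control coverage and verbosity.
    return {"mode": "normal", **user_settings}


defaults = {"diag_level": "min", "diag_verbosity": "normal",
            "diag_trigger": "power-on-reset error-reset"}
print(effective_post_mode("diag", "normal", defaults))
```

Returning a new dictionary instead of editing user_settings mirrors the statement that user-defined settings are not modified in service mode.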
98. ill be damaged or interfere when the server is extended. Although the cable management arm (CMA) that is supplied with the server is hinged to accommodate extending the server, ensure that all cables and cords are capable of extending.

From the front of the server, release the slide rail latches on each side. Pinch the green latches as shown in FIGURE 5-1.

FIGURE 5-1 Slide Release Latches

While pinching the release latches, slowly pull the server forward until the slide rails latch.

5.1.4 Removing the Server From a Rack

Remove the server from the rack for all cold-swappable FRU replacement procedures except the DIMMs, PCI cards, and the system controller.

Caution: The server weighs approximately 40 lb (18 kg). Two people are required to dismount and carry the chassis.

1. Disconnect all the cables and power cords from the server.
2. Extend the server to the maintenance position as described in Section 5.1.3, "Extending the Server to the Maintenance Position" on page 5-3.
3. Press the metal lever (FIGURE 5-2) that is located on the inner side of the rail to disconnect the CMA from the rail assembly (on the right side from the back of the rack). This action leaves the CMA still attached to the cabinet, but the server chassis is now disconnected from the CMA.
99. ing the Server to the Rack 5 43 Release Levers 5 44 Installing the CMA 5 45 Hard Drive Slots 6 3 Adding a USB Device 6 4 DIMM Layout 6 5 Location of PCI Express and PCI X Card Slots 6 7 Field Replaceable Units 1 of 2 A 2 Field Replaceable Units 2 of 2 A 3 Figures xiii xiv SPARC Enterprise T2000 Server Service Manual April 2007 Tables TABLE 2 1 TABLE 3 1 TABLE 3 2 TABLE 3 3 TABLE 3 4 TABLE 3 5 TABLE 3 6 TABLE 3 7 TABLE 3 8 TABLE 3 9 TABLE 3 10 TABLE 3 11 TABLE 3 12 TABLE 5 1 TABLE 6 1 TABLE A 1 Server Features 2 4 Diagnostic Flowchart Actions 3 4 Front and Rear Panel LEDs 3 10 Hard Drive LEDs 3 11 Power Supply LEDs 3 12 FanLEDs 3 13 Blower UnitLED 3 14 Ethernet PortLEDs 3 15 Service Related ALOM CMT Commands 3 19 ALOM CMT Parameters Used For POST Configuration 3 27 ALOM CMT Parameters and POST Modes 3 30 ASR Commands 3 46 Useful SunVTS Tests to Run on This Server 3 53 DIMM Names and Socket Numbers 5 13 DIMM Names and Socket Numbers 6 6 Server FRU List A 4 XV xvi SPARC Enterprise T2000 Server Service Manual April 2007 Preface The SPARC Enterprise T2000 Server Service Manual provides information to aid in diagnosing hardware problems and describes how to replace components within the SPARC Enterprise T2000 server This guide also describes how to add components such as hard drives and memory to the server This manual is written for technician
100. it is fully seated.
3. Close the latch to lock the drive in place.
4. Perform administrative tasks to reconfigure the hard drive. The procedures that you perform at this point depend on how your data is configured. You might need to partition the drive, create file systems, load data from backups, or have data updated from a RAID configuration.

CHAPTER 5 Replacing Cold-Swappable FRUs

This chapter describes how to remove and replace field-replaceable units (FRUs) in the server that must be cold-swapped. The following topics are covered:
- Section 5.1, "Common Procedures for Parts Replacement" on page 5-1
- Section 5.2, "Removing and Replacing FRUs" on page 5-8
- Section 5.3, "Common Procedures for Finishing Up" on page 5-41

For a list of FRUs, see Appendix A.

Note: Never attempt to run the system with the cover removed. The cover must be in place for proper air flow. The cover interlock switch (intrusion switch) immediately shuts the system down when the cover is removed.

5.1 Common Procedures for Parts Replacement

Before you can remove and replace parts that are inside the server, you must perform the following procedures:
- Section 5.1.2, "Shutting the System Down" on page 5-2
- Section 5.1.3, "Extending the Server to the Maintenance Position" on page 5-3
- S
101. ks with ALOM CMT to take faulty components offline if needed.
- Solaris OS Predictive Self-Healing (PSH): This technology continuously monitors the health of the CPU and memory, and works with ALOM CMT to take a faulty component offline if needed. The Predictive Self-Healing technology enables systems to accurately predict component failures and mitigate many serious problems before they occur.
- Log files and console messages: Provide the standard Solaris OS log files and investigative commands that can be accessed and displayed on the device of your choice.
- SunVTS: An application that exercises the system, provides hardware validation, and discloses possible faulty components with recommendations for repair.

The LEDs, ALOM CMT, Solaris OS PSH, and many of the log files and console messages are integrated. For example, when the Solaris software detects a fault, it displays the fault, logs it, and passes the information to ALOM CMT, where the fault is logged again and, depending on the fault, one or more LEDs might be lit.

The flow chart in FIGURE 3-1 and TABLE 3-1 describe an approach for using the server diagnostics to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting, so you might perform some actions and not others.

The flow chart assumes that you have already performed some troubleshooting such as verification of proper installation and
102. l 2007 5 3 5 28 Replacing the Motherboard Assembly 5 23 5 29 Removing the Power Distribution Board 5 27 5 2 10 Replacing the Power Distribution Board 5 30 5 2 11 Removing the LED Board 5 32 5 2 12 Replacing the LED Board 5 33 5 2 13 Removing the Fan Power Board 5 34 5 2 14 Replacing the Fan Power Board 5 34 5 2 15 Removing the Front I O Board 5 35 5 2 16 Replacing the Front I O Board 5 36 5 2 17 Removing the DVD Drive 5 37 5 2 18 Replacing the DVD Drive 5 37 5 2 19 Removing the SAS Disk Backplane 5 37 5 220 Replacing the SAS Disk Backplane 5 38 5 2 21 Removing the Battery on the System Controller 5 40 5 222 Replacing the Battery on the System Controller 5 40 Common Procedures for Finishing Up 5 41 5 3 1 Replacing the Top Front Cover and Front Bezel 5 41 5 3 2 Replacing the Top Cover 5 42 5 3 3 Reinstalling the Server Chassis in the Rack 5 42 5 34 Returning the Server to the Normal Rack Position 5 43 5 3 5 Applying Power to the Server 5 45 6 Adding New Components and Devices 6 1 6 1 6 2 Adding Hot Pluggable and Hot Swappable Devices 6 1 6 1 1 Adding a Hard Drive to the Server 6 1 6 12 Adding a USB Device 6 3 Adding Components Inside the Chassis 6 4 6 2 1 Memory Guidelines 6 4 6 2 2 Adding DIMMs 6 6 Contents ix 6 2 3 PCI Express or PCI X Card Guidelines 6 7 6 2 4 Adding a PCI Express or PCI X Card 6 7 A Field Replaceable Units A 1 A 1 Illustrated FRU Locations A 2 Index Index 1 x SPARC Enterprise
103. le FRU e CPU board Comprises the central processing subsystem for the server which includes the UltraSPARC T1 CPU processor 16 DIMM connectors the memory controllers and supporting circuitry I O board Provides the I O logic including the connectors for the PCI X and PCI Express interfaces Ethernet interfaces all the power interconnections and miscellaneous logic Note This assembly is provided in different configurations to accommodate the different processor models 4 6 and 8 core This board implements the system controller subsystem The SC board contains a PowerPC Extended Core and a communications processor that controls the host power and monitors host system events power and environmental The board holds a socketed EEPROM for storing the system configuration all Ethernet MAC addresses and the host ID This board only draws power from the 3 3V standby supply rail which is available whenever the system is receiving AC input power even when the system is turned off Battery Optional add on cards MB IOBD SC SC BAT PCIEO PCIE1 PCIE2 PCIXO PCIX1 A 4 SPARC Enterprise T2000 Server Service Manual April 2007 TABLE A 1 Server FRU List Continued Replacement Item No FRU Instructions Description FRU Name 5 DIMMs Section 5 2 3 Can be ordered in the following sizes See Removing e 512 MB TABLE 5 1 DIMMs on e 1GB in page 5 12 e 2GB Chapter 5 e 4GB 6 Power Section 5 2 9
104. lt and disables the faulty DIMMs by placing them in the ASR blacklist. For a given memory fault, POST disables half of the physical memory in the system. When this offlining process occurs in normal operation, you must replace the faulty DIMMs based on the fault message and enable the disabled DIMMs with the ALOM CMT enablecomponent command. In other than normal operation, POST can be configured to run various levels of testing (see TABLE 3-9 and TABLE 3-10) and can thoroughly test the memory subsystem based on the purpose of the test. However, with thorough testing enabled (diag_level=max), POST finds faults and offlines memory devices with errors that could be correctable with PSH. Thus, not all memory devices detected and offlined by POST need to be replaced. See Section 3.4.5, "Correctable Errors Detected by POST" on page 3-36.
- Solaris Predictive Self-Healing (PSH) technology: A feature of the Solaris OS, PSH uses the fault manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a unique fault ID (UUID) and logged. PSH reports the fault and provides a recommended proactive replacement for the DIMMs associated with the fault.

3.1.1.3 Troubleshooting Memory Faults

If you suspect that the server has a memory problem, follow the flowchart (see FIGURE 3-1). Run the ALOM CMT showfaults command. The showfaults command lists memory faults and lists the speci
105. mail alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM CMT.

The ALOM CMT circuitry runs independently of the server, using the server's standby power. Therefore, ALOM CMT firmware and software continue to function when the server OS goes offline or when the server is powered off.

Note: Refer to the Advanced Lights Out Manager (ALOM) CMT Guide for comprehensive ALOM CMT information.

Faults detected by ALOM CMT, POST, and the Solaris Predictive Self-Healing (PSH) technology are forwarded to ALOM CMT for fault handling (FIGURE 3-9).

In the event of a system fault, ALOM CMT ensures that the Service Required LED is lit, FRU ID PROMs are updated, the fault is logged, and alerts are displayed. Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see Appendix A.

FIGURE 3-9 ALOM CMT Fault Management (diagram labels: Service Required LED, FRU LEDs, FRUID PROMs, Environmentals, POST, ALOM fault manager, Logs, Solaris PSH, Alerts)

ALOM CMT sends alerts to all ALOM CMT users who are logged in, sends the alert through email to a configured email address, and writes the event to the ALOM CMT event log.

ALOM CMT can detect when a fault is no longer present and clears the fault in several ways:
- Fault recovery: The system automatic
106. mory configuration of your server:
- There are 16 slots that hold DDR2 memory DIMMs.
- The server accepts the following DIMM sizes: 512 MB, 1 GB, 2 GB, 4 GB.
- The server supports two ranks of eight DIMMs each.
- At minimum, rank 0 must be fully populated with eight DIMMs of the same capacity.
- DIMMs can be added eight at a time, of the same capacity, to fill rank 1.

FIGURE 6-3 DIMM Layout (socket labels: J0901, J0701, J0801, J0601, J1401, J1201, J1301, J1101, J1901, J1701, J1801, J1601, J2401, J2201, J2301, J2101; other labels: Rank 0, Rank 1)

TABLE 6-1 DIMM Names and Socket Numbers

    DIMM Name   Socket Number
    Rank 0 DIMMs
    CH0/R0/D1   J0701
    CH0/R0/D0   J0601
    CH1/R0/D1   J1201
    CH1/R0/D0   J1101
    CH2/R0/D1   J1701
    CH2/R0/D0   J1601
    CH3/R0/D1   J2201
    CH3/R0/D0   J2101
    Rank 1 DIMMs
    CH0/R1/D1   J0901
    CH0/R1/D0   J0801
    CH1/R1/D1   J1401
    CH1/R1/D0   J1301
    CH2/R1/D1   J1901
    CH2/R1/D0   J1801
    CH3/R1/D1   J2401
    CH3/R1/D0   J2301

6.2.2 Adding DIMMs

1. Perform all of the procedures in Section 5.1, "Common Procedures for Parts Replacement" on page 5-1.
2. Unpackage the DIMMs and place them on an antistatic mat.
3. Ensure that the connector ejector tabs on the CPU board DIMM connectors are in the open position.
4. Line up the DIMM with the connector.
5. Push the DIMM into the connector until the ejector tabs lock the DIMM in place.
6. Repeat Step 3 thr
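The memory guidelines in this excerpt amount to a few checks that can be applied to a planned DIMM layout before opening the chassis. The Python sketch below is a hedged illustration: the function name and message wording are hypothetical, sizes are given in megabytes, and it assumes that "the same capacity" applies within each rank rather than across both ranks, which the excerpt does not spell out.

```python
VALID_SIZES_MB = {512, 1024, 2048, 4096}  # 512 MB, 1 GB, 2 GB, 4 GB


def check_dimm_plan(rank0, rank1=()):
    """Return a list of guideline violations; an empty list means the plan passes."""
    problems = []
    if len(rank0) != 8:
        problems.append("rank 0 must hold exactly eight DIMMs")
    if rank1 and len(rank1) != 8:
        problems.append("rank 1 must be empty or hold exactly eight DIMMs")
    for name, rank in (("rank 0", rank0), ("rank 1", rank1)):
        if len(set(rank)) > 1:
            problems.append(f"{name} mixes DIMM capacities")
        if any(size not in VALID_SIZES_MB for size in rank):
            problems.append(f"{name} contains an unsupported DIMM size")
    return problems


print(check_dimm_plan([1024] * 8, [1024] * 4))
```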
107. moving 4 7 replacing 4 7 Rear Blower LED 4 8 Rear FRU Fault LED 3 10 rear panel illustration 2 9 LEDs 3 9 reliability availability serviceability RAS features 2 6 remote management 2 5 removefru command 3 20 4 5 removing battery on the system controller 5 40 DIMMs 5 12 DVD drive 5 37 fan power board 5 34 front bezel 5 7 front I O board 5 35 LED board 5 32 motherboard assembly 5 19 PCI E and PCI X cards 5 9 power distribution board 5 27 SAS disk backplane 5 37 server from the rack 5 4 system controller card 5 17 top cover 5 6 replacing battery on the system controller 5 40 DIMMs 5 14 DVD drive 5 37 fan power board 5 34 front I O board 5 36 LED board 5 33 motherboard assembly 5 23 PCI cards 5 11 power distribution board 5 30 SAS disk backplane 5 38 system controller card 5 18 top cover 5 42 top front cover and front bezel 5 41 reset command 3 20 resetsc command 3 20 S safety information 1 1 safety symbols 1 1 SAS controller 6 1 SAS disk backplane A 6 removing 5 37 replacing 5 38 SASBP SAS disk backplane FRU name A 6 SC system controller card FRU name A 4 SC BAT system controller battery FRU name A 4 sc_servicemode parameter 5 32 sensors temperature 2 7 serial number chassis 2 10 server extending to maintenance position 5 3 illustration 2 2 locating 3 10 returning to normal rack position 5 43 weight 5 4 Service Required LED 3 10 3 16 3 40 4 2
108. ndled by PSH.

CODE EXAMPLE 3-1 POST Fault for a Single DIMM

    sc> showfaults -v
    ID Time            FRU                Fault
     1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0  MB/CMP0/CH0/R0/D0 deemed faulty and disabled

In this case, reenable the DIMM and run POST in minimum mode as follows:

1. Reenable the DIMM.

    sc> enablecomponent name-of-DIMM

2. Return POST to minimum mode.

    sc> setkeyswitch normal
    sc> setsc diag_mode normal
    sc> setsc diag_level min

3. Reset the system so that POST runs. There are several ways to initiate a reset. The following example uses the powercycle command. For other methods, refer to the SPARC Enterprise T2000 Server Administration Guide.

    sc> powercycle
    Are you sure you want to powercycle the system [y/n]? y
    Powering host off at MON JAN 10 02:52:02 2000
    Waiting for host to Power Off; hit any key to abort.
    SC Alert: SC Request to Power Off Host.
    SC Alert: Host system has shut down.
    Powering host on at MON JAN 10 02:52:13 2000
    SC Alert: SC Request to Power On Host.

4. Replace the DIMM if POST continues to fault the device in minimum mode.

3.4.5.2 Determining When to Replace Detected Devices

Note: This section assumes faults are detected by POST in maximum mode.

If a detected device is part of a hardware upgrade or repair, or if POST detects multiple DIMMs (CODE EXAMPLE 3-2), replace the detected devices.

CODE EXAMPLE 3-2 POST Fault fo
109. o Too detailed o Appropriate o Not enough detail All comments and suggestions become the property of Fujitsu Limited For Users in U S A Canada For Users in Other Countries and Mexico Fax this form to the number below or send this form Fold and fasten as shown on back to the address below No postage necessary if mailed in U S A Fujitsu Computer Systems Fujitsu Learning Media Limited Attention Engineering Ops M S 249 FAX 81 3 3730 3702 1250 East Arques Avenue 37 10 Nishi Kamata 7 chome P O Box 3470 Oota Ku Sunnyvale CA 94088 3470 Tokyo 144 0051 FAX 408 746 6813 JAPAN FUJITSU LIMITED Preface xxv FOLD AND TAPE BUSINESS REPLY MAIL FIRST CLASS MAIL PERMIT NO 741 SUNNYVALE CA POSTAGE WILL BE PAID BY ADDRESSEE co FUJITSU FUJITSU COMPUTER SYSTEMS AT TENTION ENGINEERING OPS M S 249 1250 EAST ARQUES AVENUE P O BOX 3470 SUNNYVALE CA 94088 3470 NO POSTAGE NECESSARY IF MAILED IN THE UNITED STATES xxvi FOLD AND TAPE SPARC Enterprise T2000 Server Service Manual April 2007 CHAPTER 1 Safety Information This chapter provides important safety information for servicing the server The following topics are covered m Section 1 1 Safety Information on page 1 1 m Section 1 2 Safety Symbols on page 1 1 m Section 1 3 Electrostatic Discharge Safety on page 1 2 1 1 Safety Information This section describes safety information you need to kn
110. onfiguration information, see Section 6.2.1, "Memory Guidelines" on page 6-4.

FIGURE 5-8 DIMM Locations (socket labels: J0901, J0701, J0801, J0601, J1401, J1201, J1301, J1101, J1901, J1701, J1801, J1601, J2401, J2201, J2301, J2101; other labels: Rank 0, Rank 1, Channel 1, Front of board)

Use FIGURE 5-8 and TABLE 5-1 to map DIMM names that are displayed in faults to socket numbers that identify the location of the DIMM on the motherboard.

TABLE 5-1 DIMM Names and Socket Numbers

    DIMM Name Used in Messages   Socket No.
    CH0/R1/D1                    J0901
    CH0/R0/D1                    J0701
    CH0/R1/D0                    J0801
    CH0/R0/D0                    J0601
    CH1/R1/D1                    J1401
    CH1/R0/D1                    J1201
    CH1/R1/D0                    J1301
    CH1/R0/D0                    J1101
    CH2/R1/D1                    J1901
    CH2/R0/D1                    J1701
    CH2/R1/D0                    J1801
    CH2/R0/D0                    J1601
    CH3/R1/D1                    J2401
    CH3/R0/D1                    J2201
    CH3/R1/D0                    J2301
    CH3/R0/D0                    J2101

DIMM names in messages are displayed with the full name, such as MB/CMP0/CH1/R1/D1. This table omits the preceding MB/CMP0 for clarity.

Note the DIMM locations so you can install the replacement DIMMs in the same sockets.
Push down on the ejector levers on each side of the DIMM connector until the DIMM is released.
Grasp the top corners of the faulty DIMM and remove it from the system.
Place DIMMs on an
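Because fault messages use the full DIMM name while the table lists short names, a lookup that strips the MB/CMP0 prefix can save a step when locating a socket. The Python sketch below copies the sixteen name and socket pairs from TABLE 5-1; the function name is hypothetical, and the slash-separated name form is assumed because the extracted text lost its separators.

```python
# Socket numbers keyed by short DIMM name, copied from TABLE 5-1.
DIMM_SOCKETS = {
    "CH0/R0/D0": "J0601", "CH0/R0/D1": "J0701", "CH0/R1/D0": "J0801", "CH0/R1/D1": "J0901",
    "CH1/R0/D0": "J1101", "CH1/R0/D1": "J1201", "CH1/R1/D0": "J1301", "CH1/R1/D1": "J1401",
    "CH2/R0/D0": "J1601", "CH2/R0/D1": "J1701", "CH2/R1/D0": "J1801", "CH2/R1/D1": "J1901",
    "CH3/R0/D0": "J2101", "CH3/R0/D1": "J2201", "CH3/R1/D0": "J2301", "CH3/R1/D1": "J2401",
}

PREFIX = "MB/CMP0/"  # full names in messages start with this prefix


def socket_for(dimm_name: str) -> str:
    """Return the motherboard socket for a full or short DIMM name."""
    short = dimm_name[len(PREFIX):] if dimm_name.startswith(PREFIX) else dimm_name
    return DIMM_SOCKETS[short]  # raises KeyError for names not in the table


print(socket_for("MB/CMP0/CH1/R1/D1"))
```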
111. ough Step 4.

Run the clearfault command with the UUID provided in the showfaults output.

    sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
    Clearing fault from all indicted FRUs...
    Fault cleared.

Clear the fault from all persistent fault records. In some cases, even though the fault is cleared, some persistent fault information remains and results in erroneous fault messages at boot time. To ensure that these messages are not displayed, perform the following command:

    fmadm repair UUID

Example:

    fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86

3.6 Collecting Information From Solaris OS Files and Commands

With the Solaris OS running on the server, you have the full complement of Solaris OS files and commands available for collecting information and for troubleshooting.

If POST, ALOM CMT, or the Solaris PSH features do not indicate the source of a fault, check the message buffer and log files for notifications of faults. Hard drive faults are usually captured by the Solaris message files.

Use the dmesg command to view the most recent system messages. To view the system messages log file, view the contents of the /var/adm/messages file.

3.6.1 Checking the Message Buffer

1. Log in as superuser.
2. Issue the dmesg command.

    dmesg

The dmesg command displays the most recent messages generated by the system
112. ough Step 5 for each additional DIMM.
7. Perform the procedures described in Section 5.1, "Common Procedures for Parts Replacement" on page 5-1.

6.2.3 PCI Express or PCI-X Card Guidelines

Follow these guidelines and FIGURE 6-4 to plan your configuration:
- The server provides the following PCI capabilities:
  - 3 PCI-Express (PCI-E) slots for low-profile cards (supports lane widths of x1, x2, x4, and x8)
  - 2 PCI-X slots for low-profile cards

Note: There are a variety of PCI-X and PCI-Express cards on the market. Read the product documentation for your device for additional installation requirements and instructions that are not covered here.

FIGURE 6-4 Location of PCI Express and PCI-X Card Slots (labels: PCI-E slots 0, 1, 2; PCI-X slots 0, 1)

6.2.4 Adding a PCI Express or PCI-X Card

1. Perform all of the procedures in Section 5.1, "Common Procedures for Parts Replacement" on page 5-1.
2. Rotate the PCI hold-down bracket (located on the edge of the chassis) 90 degrees so that the chassis edge can accept the card. You might need to loosen the screw that holds the bracket to the chassis.
3. Line up the PCI card with the PCI connector on the rear of the motherboard.
4. Push the card into the connector so it is fully seated.
5. Rotate the PCI hold-down bracket to the closed position and secure the screw on
113. ow prior to removing or installing parts in the server. For your protection, observe the following safety precautions when setting up your equipment:
- Follow all standard cautions, warnings, and instructions marked on the equipment and described in Important Safety Information for Hardware Systems (C120-E391).
- Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment's electrical rating label.
- Follow the electrostatic discharge safety practices as described in Section 1.3, "Electrostatic Discharge Safety" on page 1-2.

1.2 Safety Symbols

The following symbols might appear in this document. Note their meanings.

Caution: There is a risk of personal injury and equipment damage. To avoid personal injury and equipment damage, follow the instructions.

Caution: Hot surface. Avoid contact. Surfaces are hot and might cause personal injury if touched.

Caution: Hazardous voltages are present. To reduce the risk of electric shock and danger to personal health, follow the instructions.

1.3 Electrostatic Discharge Safety

Electrostatic discharge (ESD) sensitive devices, such as the motherboard, PCI cards, hard drives, and memory cards, require special handling.

Caution: The boards and hard drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricit
114. pages
- If SunVTS software is installed, information about the packages is displayed.
- If SunVTS software is not installed, you see an error message for each missing package:

    ERROR: information for "SUNWvts" was not found
    ERROR: information for "SUNWvtsr" was not found

If SunVTS is not installed, you can obtain the installation packages from the Solaris Operating System DVDs.

The SunVTS 6.0 PS3 software, and future compatible versions, are supported on the server.

SunVTS installation instructions are described in the SunVTS User's Guide.

Exercising the System Using SunVTS Software

Before you begin, the Solaris OS must be running. You also need to ensure that SunVTS validation test software is installed on your system. See Section 3.8.1, "Checking Whether SunVTS Software Is Installed" on page 3-49.

The SunVTS installation process requires that you specify one of two security schemes to use when running SunVTS. The security scheme you choose must be properly configured in the Solaris OS for you to run SunVTS. For details, refer to the SunVTS User's Guide.

SunVTS software features both character-based and graphics-based interfaces. This procedure assumes that you are using the graphical user interface (GUI) on a system running the Common Desktop Environment (CDE). For more information about the character-based SunVTS TTY interface, and specifically for instructions on ac
115. place.
Open the fan door.
Tighten the captive screw to secure the front bezel to the chassis.

5.3.2 Replacing the Top Cover

1. Place the top cover on the chassis. Set the cover down so that it hangs over the rear of the server by about an inch.
2. Slide the cover forward until it latches into place.

5.3.3 Reinstalling the Server Chassis in the Rack

If you removed the server chassis from the rack, perform these steps.

Caution: The server weighs approximately 40 lb (18 kg). Two people are required to carry the chassis and install it in the rack.

1. Ensure that the rack rails are extended.
2. Place the ends of the chassis mounting brackets (inner section) into the slide rails (FIGURE 5-30).

FIGURE 5-30 Returning the Server to the Rack

3. Slide the server into the rack until the brackets lock into place. The server is now in the extended maintenance position.

5.3.4 Returning the Server to the Normal Rack Position

If you extended the server to the maintenance position, use this procedure to return the server to the normal rack position.

1. Release the slide rails from the fully extended position by pushing the release levers on the side of each rail (FIGURE 5-31).

FIGURE 5-31 Release Levers

2. While pushing on the release levers, slowly push the server into the rack. Ens
116. r Multiple DIMMs

    sc> showfaults -v
    ID Time            FRU                Fault
     1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0  MB/CMP0/CH0/R0/D0 deemed faulty and disabled
     2 OCT 13 12:47:27 MB/CMP0/CH0/R0/D1  MB/CMP0/CH0/R0/D1 deemed faulty and disabled

Note: The previous example shows two DIMMs on the same channel/rank, which could be an uncorrectable error.

If the detected device is not a part of a hardware upgrade or repair, use the following list to examine and repair the fault:

1. If a detected device is not a DIMM, or if more than a single DIMM is detected, replace the detected devices.
2. If a detected device is a single DIMM and the same DIMM is also detected by PSH, replace the DIMM (CODE EXAMPLE 3-3).

CODE EXAMPLE 3-3 PSH and POST Faults on the Same DIMM

    sc> showfaults -v
    ID Time            FRU                Fault
     0 SEP 09 11:09:26 MB/CMP0/CH0/R0/D0  Host detected fault, MSGID: SUN4V-8000-DX UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
     1 OCT 13 12:47:27 MB/CMP0/CH0/R0/D0  MB/CMP0/CH0/R0/D0 deemed faulty and disabled

Note: The detected DIMM in the previous example must also be replaced because it exceeds the PSH page retire threshold.

3. If a device detected by POST is a single DIMM and the same DIMM is not detected by PSH, follow the procedure in Section 3.4.5.1, "Correctable Errors for Single DIMMs" on page 3-37.

After the detected devices ar
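The replacement rules in this excerpt can be summarized as one decision function. The Python sketch below is a reading of those rules, not a tool supplied with the server: the function name, the return labels, and the DIMM-name pattern (slash-separated, as in MB/CMP0/CH0/R0/D0) are assumptions, and it presumes the faults were reported by POST in maximum mode.

```python
import re

# Assumed form of a DIMM FRU name, for example MB/CMP0/CH0/R0/D0.
DIMM_NAME = re.compile(r"^MB/CMP0/CH[0-3]/R[01]/D[01]$")


def post_fault_action(post_frus, psh_frus=(), upgrade_or_repair=False):
    """Return "replace" or "reenable-and-retest" for devices detected by POST."""
    if not post_frus:
        raise ValueError("no POST-detected devices given")
    single_dimm = len(post_frus) == 1 and DIMM_NAME.match(post_frus[0]) is not None
    if upgrade_or_repair or not single_dimm:
        return "replace"              # upgrade/repair, non-DIMM, or multiple DIMMs
    if post_frus[0] in psh_frus:
        return "replace"              # the same DIMM was also detected by PSH
    return "reenable-and-retest"      # single DIMM: rerun POST in minimum mode


print(post_fault_action(["MB/CMP0/CH0/R0/D0"]))
```

The "reenable-and-retest" outcome stands for the single-DIMM procedure, after which the DIMM is replaced only if POST continues to fault it in minimum mode.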
117. r improves throughput while using less power and dissipating less heat than conventional processor designs. Depending on the model purchased, the processor has four or eight UltraSPARC cores. Each core equates to a 64-bit execution pipeline capable of running four threads. The result is that the 8-core processor handles up to 32 active threads concurrently.

Additional processor components, such as L1 cache, L2 cache, memory access crossbar, DDR2 memory controllers, and a JBus I/O interface, have been carefully tuned for optimal performance.

FIGURE 2-2 Motherboard and UltraSPARC T1 Multicore Processor

2.1.2 Performance Enhancements

The server introduces several new technologies with its sun4v architecture and multithreaded UltraSPARC T1 multicore processor. Some of these enhancements are:
- Large page optimization
- Reduction on TLB misses
- Optimized block copy

TABLE 2-1 lists feature specifications for the server.

TABLE 2-1 Server Features (rows: Processor, Architecture, Memory, Ethernet ports, Internal hard drives, Other internal peripherals, USB ports, Cooling, PCI interfaces, Power)

    Feature        Description
    Processor      1 UltraSPARC T1 multicore processor (4 or 8 cores)
    Architecture   SPARC V9 architecture, ECC protected
                   Platform group: sun4v
                   Platform name: SUNW,SPARC-Enterprise-T2000
    Memory         16 slots that can b
118. other countries. Products bearing SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc. SPARC64 is a registered trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited. The OPEN LOOK and Sun graphical user interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox graphical user interface; this license also covers Sun's licensees who implement the OPEN LOOK graphical user interface and otherwise comply with Sun's written license agreements. United States Government Rights, commercial software: U.S. Government users are subject to the standard license agreements of Sun Microsystems, Inc. and Fujitsu Limited, and to the applicable provisions of the FAR and its supplements. Disclaimer: the only warranties granted by Fujitsu Limited, Sun Microsystems, Inc., or any affiliate of either of them in connection with this document or any product or technology described herein are the warranties expressly set forth in the contract
119. ritical hardware components to verify functionality before the system boots and accesses software. If POST detects an error, the faulty component is disabled automatically, preventing faulty hardware from potentially harming software.

In normal operation (diag_level=min), POST runs in minimum mode by default to test devices required to power on the server. Replace any devices POST detects as faulty in minimum mode.

Run POST in maximum mode (diag_level=max) for all power-on or error-generated resets, and to validate hardware upgrades or repairs. With maximum testing enabled, POST finds faults and offlines memory devices with errors that could be correctable by PSH. Check the POST-generated errors with the showfaults -v command to verify whether memory devices detected by POST can be corrected by PSH or need to be replaced. See Section 3.4.5, "Correctable Errors Detected by POST" on page 3-36.

3.4.3.2 Diagnosing the System Hardware

You can use POST as an initial diagnostic tool for the system hardware. In this case, configure POST to run in maximum mode (diag_mode=service, setkeyswitch=diag, diag_level=max) for thorough test coverage and verbose output.

3.4.4 Running POST in Maximum Mode

This procedure describes how to run POST when you want maximum testing, as in the case when you are troubleshooting a server or verifying a hardware upgrade or repair.

Switch from the system console prompt to the
120. s>ERROR: TEST = failing-test
    c:s>H/W under test = FRU
    c:s>Repair Instructions: Replace items in order listed by 'H/W under test' above
    c:s>MSG = test-error-message
    c:s>END_ERROR

In this syntax, c = the core number and s = the strand number.

Warning and informational messages use the following syntax:

    INFO or WARNING: message

The following example shows a POST error message.

    7:2>
    7:2>ERROR: TEST = Data Bitwalk
    7:2>H/W under test = MB/CMP0/CH2/R0/D0/S0 (MB/CMP0/CH2/R0/D0)
    7:2>Repair Instructions: Replace items in order listed by 'H/W under test' above.
    7:2>MSG = Pin 149 failed on MB/CMP0/CH2/R0/D0 (J1601)
    7:2>END_ERROR
    7:2>Decode of Dram Error Log Reg Channel 2 bits 60000000.0000108c
    7:2> 1 MEC 62 R-W1C Multiple corrected errors, one or more CE not logged
    7:2> 1 DAC 61 R-W1C Set to 1 if the error was a DRAM access CE
    7:2> 108c SYND 15:0 RW ECC syndrome.
    7:2>
    7:2> Dram Error AFAR channel 2 = 00000000.00000000
    7:2> L2 AFAR channel 2 = 00000000.00000000

In this example, POST is reporting a memory error at DIMM location MB/CMP0/CH2/R0/D0. It was detected by POST running on core 7, strand 2.

b. Run the showfaults command to obtain additional fault information. The fault is captured by ALOM CMT, where the fault is logged, the Service Required LED is lit, and the faulty component is disabled.
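When POST output is captured to a file, the core, strand, and named fields of an error block can be pulled out with a short parser. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the punctuated line form core:strand>KEY = value (for example 7:2>MSG = ...), which the extracted text does not preserve and which should be checked against real console output.

```python
import re

# Assumed line form: "<core>:<strand>><text>", for example "7:2>MSG = ..."
LINE = re.compile(r"^(\d+):(\d+)>\s*(.*)$")


def parse_post_error(lines):
    """Collect core, strand, and KEY = value fields from one ERROR ... END_ERROR block."""
    record = {}
    in_error = False
    for raw in lines:
        match = LINE.match(raw.strip())
        if not match:
            continue
        core, strand, text = int(match.group(1)), int(match.group(2)), match.group(3)
        if text.startswith("ERROR:"):
            in_error = True
            record = {"core": core, "strand": strand}
            text = text[len("ERROR:"):].strip()   # keep the "TEST = ..." part
        if text.startswith("END_ERROR"):
            break
        if in_error and "=" in text:
            key, value = text.split("=", 1)
            record[key.strip()] = value.strip()
    return record


sample = [
    "7:2>",
    "7:2>ERROR: TEST = Data Bitwalk",
    "7:2>H/W under test = MB/CMP0/CH2/R0/D0/S0 (MB/CMP0/CH2/R0/D0)",
    "7:2>MSG = Pin 149 failed on MB/CMP0/CH2/R0/D0 (J1601)",
    "7:2>END_ERROR",
]
print(parse_post_error(sample))
```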
121. s on screen computer Use ls a to list all files output You have mail AaBbCc123 What you type when contrasted su with on screen computer output Password AaBbCc123 Book titles new words or terms Read Chapter 6 in the User s words to be emphasized Replace command line variables with real names or values Guide These are called class options You must be superuser to do this To delete a file type rm filename The settings on your browser might differ from these settings xx SPARC Enterprise T2000 Server Service Manual April 2007 Prompt Notations The following prompt notations are used in this manual Shell Prompt Notations C shell machine name C shell superuser machine name Bourne shell and Korn shell Bourne shell and Korn shell and Korn shell superuser gt gt Conventions for Alert Messages This manual uses the following conventions to show alert messages which are intended to prevent injury to the user or bystanders as well as property damage and important messages that are useful to the user Warning This indicates a hazardous situation that could result in death or serious personal injury potential hazard if the user does not perform the procedure correctly Caution This indicates a hazardous situation that could result in minor or moderate personal injury if the user does not perform the procedure correctly This signal also indicates that damage to t
122. s service personnel and system administrators who service and repair computer systems The person qualified to use this manual Can open a system chassis identify and replace internal components Understands the Solaris Operating System and the command line interface Has superuser privileges for the system being serviced Understands typical hardware troubleshooting tasks FOR SAFE OPERATION This manual contains important information regarding the use and handling of this product Read this manual thoroughly Pay special attention to the section Notes on Safety on page xxii Use the product according to the instructions and information available in this manual Keep this manual handy for further reference Fujitsu makes every effort to prevent users and bystanders from being injured or from suffering damage to their property Use the product according to this manual xvii Structure and Contents of This Manual This manual is organized as described below Chapter 1 Safety Information Describes the safety precautions of the SPARC Enterprise T2000 server Chapter 2 Server Overview Describes the main features of the SPARC Enterprise T2000 server Chapter 3 Server Diagnostics Describes the diagnostics that are available for monitoring and diagnosing the SPARC Enterprise T2000 server Chapter 4 Replacing Hot Swappable and Hot Pluggable FRUs Explains how to remove and replace hot swappable and hot pluggable field repla
123. sc> prompt by issuing the escape sequence.
ok
sc>
2. Set the virtual keyswitch to diag so that POST will run in service mode.
sc> setkeyswitch diag
3. Reset the system so that POST runs. There are several ways to initiate a reset. The following example uses the powercycle command. For other methods, refer to the SPARC Enterprise T2000 Server Administration Guide.
sc> powercycle
Are you sure you want to powercycle the system [y/n]? y
Powering host off at MON JAN 10 02:52:02 2000
Waiting for host to Power Off; hit any key to abort.
SC Alert: SC Request to Power Off Host.
SC Alert: Host system has shut down.
Powering host on at MON JAN 10 02:52:13 2000
SC Alert: SC Request to Power On Host.
4. Switch to the system console to view the POST output.
sc> console
Example of POST output:
SC Alert: Host System has Reset
0:0>
0:0>SUN PROPRIETARY/CONFIDENTIAL: Use is subject to license terms.
0:0>VBSC selecting POST MAX Testing.
0:0>VBSC enabling L2 Cache.
0:0>VBSC enabling Full Memory Scrub.
0:0>VBSC enabling threads: fffff00f
0:0>Init CPU
0:0>Start Selftest.....
0:0>CPU =: 0
0:0>DMMU Registers Access
0:0>IMMU Registers Access
0:0>Init mmu regs
0:0>D-Cache RAM
0:0>Init MMU.....
0:0>DMMU TLB DATA RAM Access
0:0>DMMU TLB TAGS Access
0:0>DMMU CAM
124. se Latches 5 4 Locating the Metal Lever 5 5 Top Cover and Release Button 5 7 Removing the Front Bezel From the Server Chassis 5 8 Location of PCI Express and PCI X Card Slots 5 9 Location of PCI Express and PCI X Card Slots 5 10 PCI Card and Hold Down Bracket 5 11 DIMM Locations 5 13 Ejecting and Removing the System Controller Card 5 18 Locating the System Configuration PROM 5 18 Motherboard Assembly 5 20 Cable Cutout 5 21 Location of the Screws in the Motherboard Assembly 5 22 Removing the Motherboard Assembly From the Server Chassis 5 23 Installing the Motherboard Assembly 5 25 Securing the Motherboard Assembly to the Chassis 5 26 Location of Power Supply Latch 5 28 Location of Bus Bar Screws on the Power Distribution Board and the Motherboard Assembly 5 29 Removing the Power Distribution Board 5 30 Installing the Power Distribution Board 5 31 Removing the LED Board From the Chassis 5 33 Removing the Fan Power Board 5 34 Removing the Fan Guard 5 35 Removing the Front I O Board 5 36 Removing the SAS Disk Backplane 5 38 Replacing the SAS Disk Backplane 5 39 Removing the Battery From the System Controller 5 40 xii SPARC Enterprise T2000 Server Service Manual April 2007 FIGURE 5 28 FIGURE 5 29 FIGURE 5 30 FIGURE 5 31 FIGURE 5 32 FIGURE 6 1 FIGURE 6 2 FIGURE 6 3 FIGURE 6 4 FIGURE A 1 FIGURE A 2 Replacing the Battery in the System Controller 5 40 Replacing the Top Front Cover 5 41 Return
125. t and Rear Panel LEDs
LED | Color | Description
Locator LED/button | White | Enables you to identify a particular server. Activate the LED using one of the following methods: issuing the setlocator on or off command, or pressing the button to toggle the indicator on or off. This LED provides the following indications. Off: Normal operating state. Fast blink: The server received a signal as a result of one of the preceding methods and is indicating that it is operational.
Service Required LED | Amber | If on, indicates that service is required. The ALOM CMT showfaults command provides details about any faults that cause this indicator to light.
Power OK LED | Green | The LED provides the following indications. Off: The server is unavailable. Either it has no power or ALOM CMT is not running. Steady on: Indicates that the server is powered on and is running in its normal operating state. Standby blink: Indicates that the service processor is running while the server is running at a minimum level in standby mode and ready to be returned to its normal operating state. Slow blink: Indicates that a normal transitory activity is taking place. Server diagnostics might be running, or the system might be powering on.
Power on/off button | | Turns the host system on and off. This button is recessed to prevent accidental server power-off. Use the tip of a pen to operate this button.
Top fan LED | Amber | Provides the following operation
126. tem and the operating system is not mirrored on another drive.
- The hard drive cannot be logically isolated from the online operations of the server.
If your drive falls into these conditions, you must shut the system down before you replace the hard drive. See Section 5.1.2, "Shutting the System Down," on page 5-2.
4.5.1 Removing a Hard Drive
1. Identify the location of the hard drive that you want to replace (FIGURE 4-6).
FIGURE 4-6 Locating the Hard Drive Release Button and Latch (callouts: latch, latch release button)
2. Issue the Solaris OS commands required to stop using the hard drive. The exact commands required depend on the configuration of your hard drives. You might need to unmount file systems or perform RAID commands.
3. On the drive you plan to remove, push the latch release button (FIGURE 4-6). The latch opens.
Caution: The latch is not an ejector. Do not bend it too far to the left. Doing so can damage the latch.
4. Grasp the latch and pull the drive out of the drive slot.
4.5.2 Replacing a Hard Drive
1. Align the replacement drive to the drive slot. The hard drive is physically addressed according to the slot in which it is installed. See FIGURE 4-6. It is important to install a replacement drive in the same slot from which the drive was removed.
2. Slide the drive into the bay until
127. test power on errors hardware upgrades or repairs. Once the Solaris OS is running, PSH provides run-time diagnosis of faults.
Note: Earlier versions of firmware have max as the default setting for the POST diag_level variable. To set the default to min, use the ALOM CMT command setsc diag_level min.
For validating hardware upgrades or repairs, configure POST to run in maximum mode (diag_level=max). Note that with maximum testing enabled, POST detects and offlines memory devices with errors that could be correctable by PSH. Thus, not all memory devices detected by POST need to be replaced. See Section 3.4.5, "Correctable Errors Detected by POST," on page 3-36.
Note: Devices can be manually enabled or disabled using ASR commands (see Section 3.7, "Managing Components With Automatic System Recovery Commands," on page 3-46).
Controlling How POST Runs
The server can be configured for normal, extensive, or no POST execution. You can also control the level of tests that run, the amount of POST output that is displayed, and which reset events trigger POST by using ALOM CMT variables. TABLE 3-9 lists the ALOM CMT variables used to configure POST, and FIGURE 3-10 shows how the variables work together.
Note: Use the ALOM CMT setsc command to set all the parameters in TABLE 3-9 except setkeyswitch.
TABLE 3-9 ALOM CMT Parameters Used For POST Configura
128. the bottom of the chassis The ledges hold the backplane in place temporarily 5 38 SPARC Enterprise T2000 Server Service Manual April 2007 Secure the backplane to the drive cage assembly with five insulating washers and five screws FIGURE 5 26 Do not fully tighten any screws until all of the screws are loosely installed Insulating washers are supplied with the replacement FRU Install one insulating washer with each screw even if the original SAS disk backplane did not have any washers FIGURE 5 26 Replacing the SAS Disk Backplane Connect the SAS power cable from the power cable connector Connect the four SAS data cables to the replacement SAS disk backplane ensuring that you connect the cables in the same positions on the replacement SAS disk backplane Reinstall all four hard drives in the slots from which you removed them Reinstall the DVD drive See Section 5 2 18 Replacing the DVD Drive on page 5 37 Chapter 5 Replacing Cold Swappable FRUs 5 39 6 2 21 0 2 22 Perform the procedures described in Section 5 3 Common Procedures for Finishing Up on page 5 41 Removing the Battery on the System Controller Perform the procedures described in Section 5 1 Common Procedures for Parts Replacement on page 5 1 Remove the system controller from the chassis Section 5 2 5 Removing the System Controller Card on page 5 17 and place the system controller on an antist
129. the bracket 6 Install any cables that go to the PCI card 7 Perform the procedures described in Section 5 1 Common Procedures for Parts Replacement on page 5 1 6 8 SPARC Enterprise T2000 Server Service Manual April 2007 APPENDIX A Field Replaceable Units This appendix provides illustrated parts breakdown diagrams and a table that lists the server FRUs The following topic is covered m Section A 1 Illustrated FRU Locations on page A 2 A 1 A 1 Illustrated FRU Locations FIGURE A 1 FIGURE A 2 and TABLE A 1 list the locations of the field replaceable units FRUS in the server FIGURE A 1 Field Replaceable Units 1 of 2 A 2 SPARC Enterprise T2000 Server Service Manual April 2007 FIGURE A 2 Field Replaceable Units 2 of 2 A 3 Appendix A Field Replaceable Units TABLE A 1 Server FRU List Item No Replacement FRU Instructions Description FRU Name Motherboard Section 5 2 7 assembly Removing the Motherboard Assembly on page 5 19 System Section 5 2 5 controller card Removing the OSP board System Controller Card on page 5 17 System Section 5 2 21 controller Removing the battery Battery on the System Controller on page 5 40 PCI Express Section 5 2 1 and Removing PCI PCI X cards Express and PCI X Cards on page 5 9 The motherboard assembly is comprised of the following boards that must be replaced as a sing
130. tion
Parameter | Values | Description
setkeyswitch | normal | The system can power on and run POST (based on the other parameter settings). For details, see FIGURE 3-10. This parameter overrides all other commands.
setkeyswitch | diag | The system runs POST based on predetermined settings.
setkeyswitch | stby | The system cannot power on.
setkeyswitch | locked | The system can power on and run POST, but no flash updates can be made.
diag_mode | off | POST does not run.
diag_mode | normal | Runs POST according to the diag_level value.
diag_mode | service | Runs POST with preset values for diag_level and diag_verbosity.
diag_level | min | If diag_mode = normal, runs the minimum set of tests.
diag_level | max | If diag_mode = normal, runs all the minimum tests plus extensive CPU and memory tests.
diag_trigger | none | Does not run POST on reset.
diag_trigger | user_reset | Runs POST upon user-initiated resets.
diag_trigger | power_on_reset | Only runs POST for the first power-on. This option is the default.
diag_trigger | error_reset | Runs POST if fatal errors are detected.
diag_trigger | all_resets | Runs POST after any reset.
diag_verbosity | none | No POST output is displayed.
diag_verbosity | min | POST output displays functional tests with a banner and pinwheel.
diag_verbosity | normal | POST output displays all test and informational messages.
diag_verbosity | max | POST displays all test, informational, and some debugging messages.
Serv
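The parameter table can be turned into a small lookup that rejects values the table does not list before a command is typed at the sc> prompt. This is a minimal sketch under stated assumptions, not ALOM CMT tooling: the helper name alom_command and the dictionaries are hypothetical, the values are spelled exactly as the table prints them (firmware releases may differ), and the only command forms used are setsc for the diag_* variables and setkeyswitch for the keyswitch, as the text describes.

```python
# Values transcribed from the table above; treat them as a snapshot of the
# manual, not as a definitive list for every firmware release.
POST_VARIABLES = {
    "diag_mode": {"off", "normal", "service"},
    "diag_level": {"min", "max"},
    "diag_trigger": {
        "none", "user_reset", "power_on_reset", "error_reset", "all_resets",
    },
    "diag_verbosity": {"none", "min", "normal", "max"},
}

KEYSWITCH_POSITIONS = {"normal", "diag", "stby", "locked"}


def alom_command(parameter, value):
    """Build the command string for one POST-related setting.

    Per the note in the text, every parameter is set with setsc except
    setkeyswitch, which is a command in its own right.
    """
    if parameter == "setkeyswitch":
        if value not in KEYSWITCH_POSITIONS:
            raise ValueError(f"unknown keyswitch position: {value!r}")
        return f"setkeyswitch {value}"
    allowed = POST_VARIABLES.get(parameter)
    if allowed is None:
        raise ValueError(f"unknown POST variable: {parameter!r}")
    if value not in allowed:
        raise ValueError(f"{value!r} is not listed for {parameter}")
    return f"setsc {parameter} {value}"


if __name__ == "__main__":
    # Settings the text recommends for validating a repair.
    for name, val in [("setkeyswitch", "diag"), ("diag_level", "max")]:
        print("sc>", alom_command(name, val))
```

The helper only formats and validates strings. It does not model how the keyswitch position overrides the other variables, which is what FIGURE 3-10 in the manual describes.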
131. tion about where to find documentation to get your system installed and running quickly Provides an overview of the features of this server Detailed rackmounting cabling power on and configuring information How to perform administrative tasks that are specific to this server How to use the Advanced Lights Out Manager ALOM software Safety and compliance information about this server Manual Code C120 E374 C120 H017 C120 E372 C120 E373 C120 E376 C120 E378 C120 E386 C120 E375 Note The product notes document is available on the website only Please check for the recent update on your product a Manuals included on the Enhanced Support Facility CD ROM disk Remote maintenance service Title Manual Code Enhanced Support Facility User s Guide for REMCS C112 B067 Preface xix Using UNIX Commands This document might not contain information about basic UNIX commands and procedures such as shutting down the system booting the system and configuring devices Refer to the following for this information a Software documentation that you received with your system a Solaris Operating System documentation which is at http docs sun com Text Conventions This manual uses the following fonts and symbols to express specific types of information Typeface Meaning Example AaBbCc123 The names of commands files and Edit your login file directorie
132. ure that the cables do not get in the way 3 Reconnect the CMA into the back of the rail assembly Note Refer to the SPARC Enterprise T2000 Server Installation Guide for detailed CMA installation instructions a Insert the inner latch smaller right side into the clip located at the end of the mounting bracket FIGURE 5 32 5 44 SPARC Enterprise T2000 Server Service Manual April 2007 FIGURE 5 32 Installing the CMA b Plug the CMA rail extension into the end of the left slide rail assembly The tab at the front of the rail extension clicks into place 4 Reconnect the cables to the back of the server If the CMA is in the way disconnect the left CMA release and swing the CMA open 5 3 5 Applying Power to the Server Reconnect both power cords to the power supplies Note As soon as the power cords are connected standby power is applied and depending on the configuration of the firmware the system might boot Chapter 5 Replacing Cold Swappable FRUs 5 45 5 46 SPARC Enterprise T2000 Server Service Manual April 2007 CHAPTER 6 Adding New Components and Devices This chapter describes how to add new components and devices to the server The following topics are covered m Section 6 1 Adding Hot Pluggable and Hot Swappable Devices on page 6 1 m Section 6 2 Adding Components Inside the Chassis on page 6 4 6 1 6 1 1 Adding Hot Plug
133. ut of the chassis Replacing the DVD Drive Slide the DVD drive into the front of the chassis Replace the DVD interconnect board on the back of the DVD drive Perform the procedures described in Section 5 3 Common Procedures for Finishing Up on page 5 41 Removing the SAS Disk Backplane Perform the procedures described in Section 5 1 Common Procedures for Parts Replacement on page 5 1 Remove the DVD from the chassis See Section 5 2 17 Removing the DVD Drive on page 5 37 Remove all hard drives from the chassis See Section 4 5 1 Removing a Hard Drive on page 4 9 Note the slot in which each drive belongs Disconnect the SAS power cable from the power cable plug Note of which SAS data cable is plugged into which slot and disconnect the four SAS data cables from the SAS disk backplane Chapter 5 Replacing Cold Swappable FRUs 5 37 6 Remove the five screws that secure the SAS disk backplane to the chassis FIGURE 5 25 SAS disk backplane Power cable 8 connector le plug FIGURE 5 25 Removing the SAS Disk Backplane 7 Remove the SAS disk backplane from the chassis and place it on an antistatic mat 5 2 20 Replacing the SAS Disk Backplane 1 Unpackage the replacement SAS disk backplane and place it on an antistatic mat 2 Place the SAS disk backplane on the two ledges on the bottom of the drive cage assembly with the power connector facing down toward
134. ve and replace a power supply without shutting the server down, provided that the other power supply is online and working. The following LEDs are lit when a power supply fault is detected:
- Front and rear Service Required LEDs
- Rear-FRU Fault LED on the front of the server
- Amber Failure LED on the faulty power supply
If a power supply fails and you do not have a replacement available, leave the failed power supply installed to ensure proper air flow in the server.
Removing a Power Supply
Identify which power supply (0 or 1) requires replacement (FIGURE 4-2). A lighted amber LED on a power supply indicates that a failure was detected. You can also use the showfaults command at the sc> prompt.
FIGURE 4-2 Locating Power Supplies and Release Latch (callout: latches)
At the sc> prompt, issue the removefru command. The removefru command indicates whether it is OK to perform a hot-swap of a power supply. This command does not perform any action, but provides a warning if the power supply should not be removed because the other power supply is not providing power to the server.
Example:
sc> removefru PSn
Are you sure you want to remove PS0 [y/n]? y
<PSn> is safe to remove.
In this command, PSn is the power supply identifier for the power supply you plan to remove, either PS0 or PS1.
Gain access to the rear of the server where the faulty pow
135. vidual tests by right-clicking on the name of the test. For example, in FIGURE 3-12, right-clicking on the text string ce0(nettest) brings up a menu that enables you to configure this Ethernet test.
8. Start testing. Click the Start button that is located at the top left of the SunVTS window. Status and error messages appear in the test messages area located across the bottom of the window. You can stop testing at any time by clicking the Stop button.
During testing, SunVTS software logs all status and error messages. To view these messages, click the Log button or select Log Files from the Reports menu. This action opens a log window from which you can choose to view the following logs:
- Information: Detailed versions of all the status and error messages that appear in the test messages area.
- Test Error: Detailed error messages from individual tests.
- VTS Kernel Error: Error messages pertaining to SunVTS software itself. Look here if SunVTS software appears to be acting strangely, especially when it starts up.
- Solaris OS Messages (/var/adm/messages): A file containing messages generated by the operating system and various applications.
- Log Files (/var/opt/SUNWvts/logs): A directory containing the log files.
CHAPTER 4 Replacing Hot-Swappable and Hot-Pluggable FRUs
This chapter describes how to remove and replace the hot-swappable and hot-pluggable field repl
136. visual inspection of cables and power, and possibly performed a reset of the server (refer to the SPARC Enterprise T2000 Server Installation Guide and SPARC Enterprise T2000 Server Administration Guide for details). FIGURE 3-1 is a flow chart of the diagnostics available to troubleshoot faulty hardware. TABLE 3-1 has more information about each diagnostic in this chapter.
Note: POST is configured with ALOM CMT configuration variables (TABLE 3-9). If diag_level is set to max (diag_level=max), POST reports all detected FRUs, including memory devices with errors correctable by Predictive Self-Healing (PSH). Thus, not all memory devices detected by POST need to be replaced. See Section 3.4.5, "Correctable Errors Detected by POST," on page 3-36.
FIGURE 3-1 Diagnostic Flow Chart (figure not reproduced). Legible decision points: 1. Are the Power OK and AC OK LEDs off? If yes, check the power source and connections. 2. Are any faults reported by the ALOM showfaults command? 3. Do the Solaris logs indicate a faulty FRU? 4. Does SunVTS report any faulty devices? Legible actions: identify the faulty FRU from the fault message, the SunVTS message, or the POST message, and replace the FRU.
137. who service and repair computer systems. The following topics are covered:
- Section 3.1, "Overview of Server Diagnostics," on page 3-1
- Section 3.2, "Using LEDs to Identify the State of Devices," on page 3-8
- Section 3.3, "Using ALOM CMT for Diagnosis and Repair Verification," on page 3-16
- Section 3.4, "Running POST," on page 3-26
- Section 3.5, "Using the Solaris Predictive Self-Healing Feature," on page 3-40
- Section 3.6, "Collecting Information From Solaris OS Files and Commands," on page 3-45
- Section 3.7, "Managing Components With Automatic System Recovery Commands," on page 3-46
- Section 3.8, "Exercising the System With SunVTS," on page 3-49
3.1 Overview of Server Diagnostics
You can use a variety of diagnostic tools, commands, and indicators to monitor and troubleshoot a server:
- LEDs: Provide a quick visual notification of the status of the server and of some of the FRUs.
- ALOM CMT firmware: This system firmware runs on the system controller. In addition to providing the interface between the hardware and OS, ALOM CMT also tracks and reports the health of key server components. ALOM CMT works closely with POST and Solaris Predictive Self-Healing technology to keep the system up and running even when there is a faulty component.
- Power-on self-test (POST): POST performs diagnostics on system components upon system reset to ensure the integrity of those components. POST is configurable and wor
138. y from clothing or the work environment can destroy components. Do not touch the components along their connector edges.
Using an Antistatic Wrist Strap
Wear an antistatic wrist strap and use an antistatic mat when handling components such as drive assemblies, boards, or cards. When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Do this after you disconnect the power cords from the server. Following this practice equalizes the electrical potentials between you and the server.
Using an Antistatic Mat
Place ESD-sensitive components such as the motherboard, memory, and other PCB cards on an antistatic mat.
CHAPTER 2 Server Overview
This chapter provides an overview of the features of the server. The following topics are covered:
- Section 2.1, "Server Features," on page 2-2
- Section 2.2, "Chassis Identification," on page 2-9
2.1 Server Features
The server is a high-performance, entry-level server that is highly scalable and extremely reliable.
FIGURE 2-1 Server
2.1.1 Chip Multithreaded Multicore Processor and Memory Technology
The UltraSPARC T1 multicore processor is the basis of the server. The UltraSPARC T1 processor is based on chip multithreading (CMT) technology that is optimized for highly threaded transactional processing. The UltraSPARC T1 processo
139. y protection on its internal cache memories, including tag parity and data parity on the D-cache and I-cache. The internal 3-Mbyte L2 cache has parity protection on the tags and ECC protection of the data.
Advanced ECC, also called chipkill, corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. If a DRAM fails, the DIMM continues to function.
2.1.5 Predictive Self-Healing
The server features the latest fault management technologies. The Solaris 10 Operating System (OS) introduces a new architecture for building and deploying systems and services capable of Predictive Self-Healing. Self-healing technology enables systems to accurately predict component failures and mitigate many serious problems before they occur. This technology is incorporated into both the hardware and software of the server.
At the heart of the Predictive Self-Healing capabilities is the Solaris Fault Manager, a service that receives data relating to hardware and software errors, and automatically and silently diagnoses the underlying problem. Once a problem is diagnosed, a set of agents automatically responds by logging the event and, if necessary, takes the faulty component offline. By automatically diagnosing problems, business-critical applications and essential system services can continue uninterrupted in the event of software failures or major hardware component failures.
2 8 SPARC Ent
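The advanced ECC description above limits correction to errors that stay within one nibble. As a rough illustration of what "on nibble boundaries" means, the Python sketch below checks whether a set of flipped bit positions falls inside a single aligned 4-bit group. It is not an ECC implementation and says nothing about how the memory controller maps nibbles to DRAM devices; the function name and the bit-numbering convention (bit 0 is the least significant bit of the data word) are assumptions made for the example.

```python
def confined_to_one_nibble(error_bits):
    """Return True if every flipped bit lies in the same aligned 4-bit group.

    error_bits is an iterable of bit positions (0 = least significant bit).
    An aligned nibble covers positions 4*k .. 4*k + 3, so two positions
    share a nibble exactly when integer division by 4 gives the same k.
    An empty set of errors is trivially confined.
    """
    nibbles = {bit // 4 for bit in error_bits}
    return len(nibbles) <= 1


if __name__ == "__main__":
    # Bits 8, 9, and 11 all sit in nibble 2 (positions 8..11).
    print(confined_to_one_nibble([8, 9, 11]))
    # Bits 3 and 4 are adjacent but straddle the boundary between
    # nibble 0 (positions 0..3) and nibble 1 (positions 4..7).
    print(confined_to_one_nibble([3, 4]))
```

The second case shows why adjacency is not the criterion: two neighbouring bits can still belong to different nibbles.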