        819-3248-10 Sun Fire T1000 Server Service Manual
         Contents
1.  See

    Note: Depending on the configuration of ALOM POST variables and whether POST detected faults or not, the system might boot, or the system might remain at the ok prompt. If the system is at the ok prompt, type boot.

    d. Issue the Solaris OS fmadm faulty command.

        # fmadm faulty

    No memory or DIMM faults should be displayed. If any faults are reported, return to the "Diagnostic Flow Chart" on page 11 for an approach to diagnosing the fault.

    To Remove the Motherboard and Chassis

    The motherboard, power supply, and chassis are replaced as a unit. Therefore, remove all other FRUs and associated cables from your chassis and install them in the new chassis. The FRUs to remove and replace, and the procedures to remove and replace them, are:

    Sun Fire T1000 Server Service Manual, January 2006

    - Remove the PCI Express card. See "To Remove the Optional PCI Express Card" on page 58.
    - Remove the fan tray assembly and cable. See "To Remove the Fan Tray Assembly" on page 60.
    - Remove the power supply and cable. See "To Remove the Power Supply" on page 61.
    - Remove the hard drive and cable. See "To Remove the Hard Drive" on page 63.
    - Remove the memory DIMMs. See "To Remove DIMMs" on page 65.
    - Remove the socketed system configuration SEEPROM from the motherboard and place it on an antistatic mat.
      The system configuration SEEPROM contains the persistent storage for the
2.  Note: Never run the system with the top cover removed. The top cover must be in place for proper air flow. The cover interlock switch immediately shuts the system down when the cover is removed.

    Caution: The system supplies 3.3 Vdc standby power to the circuit boards even when the system is powered off if the AC power cord is plugged in.

    1. Press the cover release button (FIGURE 3-3).
    2. While pressing the release button, grasp the rear of the cover and slide the cover toward the rear of the server about one-half inch.
    3. Lift the cover off the chassis.

    [Figure callouts: Cover release button, Top cover]
    FIGURE 3-3 Location of Top Cover Release Button

    Removing and Replacing CRUs

    This section provides procedures for replacing the following customer-replaceable parts (CRUs) inside the server chassis:

    - "To Remove the Optional PCI Express Card" on page 58 and "To Add or Replace the Optional PCI Express Card" on page 60

    Chapter 3 Removing and Replacing FRUs 57

    - "To Remove the Fan Tray Assembly" on page 60 and "To Replace the Fan Tray Assembly" on page 61
    - "To Remove the Power Supply" on page 61 and "To Replace the Power Supply" on page 62
    - "To Remove the Hard Drive" on page 63 and "To Replace the Hard Drive" on page 64
    - "To Remove DIMMs" on page 65 and "To Add or Replace DIMMs" on page 66
    - "To Remove the Clock Battery on the Motherboard" on page 70 and "To Replace
3.  SunVTS software features both character-based and graphics-based interfaces. This procedure assumes that you are using the graphical user interface (GUI) on a system running the Common Desktop Environment (CDE). For more information about the character-based SunVTS TTY interface, and specifically for instructions on accessing it by TIP or telnet commands, refer to the SunVTS User's Guide.

    SunVTS software can be run in several modes. This procedure assumes that you are using the default mode.

    This procedure also assumes that the Sun Fire T1000 server is headless, that is, it is not equipped with a monitor capable of displaying bit-mapped graphics. In this case, you access the SunVTS GUI by logging in remotely from a machine that has a graphics display.

    Finally, this procedure describes how to run SunVTS tests in general. Individual tests may presume the presence of specific hardware, or may require specific drivers, cables, or loopback connectors. For information about test options and prerequisites, refer to the following documentation:

    - SunVTS Test Reference Manual
    - SunVTS 6.0 PS3 Doc Supplement (SPARC)

    To Exercise the System Using SunVTS Software

    1. Log in as superuser to a system with a graphics display.
       The display system should be one with a frame buffer and monitor capable of displaying bit-mapped graphics such as those produced by the SunVTS GUI.
    2. Enable remote display. On the display system, type:

        # /usr/openwin/bi
4.  [Figure residue: test selection panel entries bge2(netlbtest), bge3(netlbtest)]
    FIGURE 2-7 SunVTS Test Selection Panel

    6. (Optional) Select the tests you want to run.
       Certain tests are enabled by default, and you can choose to accept these.
       Alternatively, you can enable and disable individual tests or blocks of tests by clicking the checkbox next to the test name or test category name. Tests are enabled when checked, and disabled when not checked.

    TABLE 2-8 lists tests that are especially useful to run on a Sun Fire T1000 server.

    TABLE 2-8 Useful SunVTS Tests to Run on a Sun Fire T1000 Server

    SunVTS Tests                                                   FRUs Exercised by Tests
    cmttest, cputest, fputest, iutest, l1dcachetest, dtlbtest,
      and l2sramtest (indirectly: mptest and systest)              DIMMs, motherboard
    disktest                                                       Disks, cables, disk backplane
    nettest, netlbtest                                             Network interface, network cable, motherboard
    pmemtest, vmemtest, ramtest                                    DIMMs, motherboard
    serialtest                                                     I/O (serial port interface)
    hsclbtest                                                      Motherboard, ALOM system controller (Host to System Controller interface)

    7. (Optional) Customize individual tests.
       You can customize individual tests by right-clicking on the name of the test. For example, in the illustration under FIGURE 2-7, right-clicking on the text string bge0(nettest) brings up a menu that enables you to co
5.  online documentation for the Solaris operating environment
    - Other software documentation that you received with your system

    Typographic Conventions

    Typeface     Meaning                                        Examples
    AaBbCc123    The names of commands, files, and              Edit your .login file.
                 directories; on-screen computer output         Use ls -a to list all files.
                                                                % You have mail.
    AaBbCc123    What you type, when contrasted with            % su
                 on-screen computer output                      Password:
    AaBbCc123    Book titles, new words or terms, words         Read Chapter 6 in the User's Guide.
                 to be emphasized. Replace command-line         These are called class options.
                 variables with real names or values.           You must be superuser to do this.
                                                                To delete a file, type rm filename.

    1 The settings on your browser might differ from these settings.

    Shell Prompts

    Shell                                    Prompt
    C shell                                  machine-name%
    C shell superuser                        machine-name#
    Bourne shell and Korn shell              $
    Bourne shell and Korn shell superuser    #

    Sun Fire T1000 Server Documentation

    You can view and print the following documents from the Sun documentation web site at http://www.sun.com/documentation

    Columns: Title | Description | Part Number

    - Sun Fire T1000 Server Site Planning Data Guide
    - Sun Fire T1000 Server Product Notes
    - Sun Fire T1000 Server Product Overview
    - Sun Fire T1000 Server Getting Started Guide
    - Sun Fire T1000 Server Installation Guide
    - Sun Fire T1000 Server Sy
6.  service personnel, and system administrators who service and repair computer systems.

    The following topics are covered:

    - "Overview of Sun Fire T1000 Server Diagnostics" on page 9
    - "Using LEDs to Identify the State of Devices" on page 14
    - "Using ALOM For Diagnosis and Repair Verification" on page 17
    - "Running POST" on page 27
    - "Using the Solaris Predictive Self-Healing Feature" on page 35
    - "Collecting Information From Solaris OS Files and Commands" on page 39
    - "Managing System Components with Automatic System Recovery Commands" on page 40
    - "Exercising the System with SunVTS" on page 43

    Overview of Sun Fire T1000 Server Diagnostics

    There are a variety of diagnostic tools, commands, and indicators you can use to troubleshoot a Sun Fire T1000 server.

    - LEDs: provide a quick visual notification of the status of the server and of some of the FRUs.
    - ALOM CMT firmware: the system firmware that runs on the system controller. In addition to providing the interface between the hardware and OS, ALOM also tracks and reports the health of key server components. ALOM works closely with POST and Solaris predictive self-healing technology to keep the system up and running even when there is a faulty component.
    - Power-on self-test (POST): performs diagnostics on system components upon system reset to ensure the integrity of those components. POST is configurable and works with ALOM to take faulty compo
7.  The PSH console message provides the following information:

    - Type
    - Severity
    - Description
    - Automated Response
    - Impact
    - Suggested Action for System Administrator
    - Details

    If the Solaris OS PSH facility has detected a faulty component, use the fmdump command to identify the fault.

    Note: Additional predictive self-healing information is available at http://www.sun.com/msg

    ▼ To Use the fmdump Command to Identify Faults

    The fmdump command displays the list of faults detected by the Solaris PSH facility. Use this command for the following reasons:

    - To see if any faults have been detected by the Solaris PSH facility.
    - To obtain the fault message ID (SUNW-MSG-ID) for detected faults.
    - To verify that the replacement of a FRU has cleared the fault and not generated any additional faults.

    If you already have a fault message ID, go to Step 2 to obtain more information about the fault from Sun's Predictive Self-Healing Knowledge Article web site.

    1. Check the event log using the fmdump command with -v for verbose output.

        # fmdump -v
        TIME                 UUID                                 SUNW-MSG-ID
        Oct 21 10:32:47.2211 a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc SUN4U-8000-28
          95%  fault.memory.dimm
               FRU: mem:///component=MB/CMP0/CH0/R1/D0/J0701
              rsrc: mem:///component=MB/CMP0/CH0/R1/D0/J0701

    In this example, a fault is displayed, indicating the following details:

    - Date and time of
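The summary records in fmdump -v output can be pulled out with a short script when many events need to be reviewed. The following is a minimal sketch in Python, assuming the whitespace-separated TIME / UUID / SUNW-MSG-ID layout of the excerpt above. The function name parse_fmdump_summary and the SAMPLE text are illustrative only and are not part of any Sun tool.

```python
import re

# One summary record: timestamp, UUID, message ID (layout assumed from the excerpt).
RECORD = re.compile(
    r"^(?P<time>\w{3} +\d{1,2} [\d:.]+)\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msg_id>\S+)\s*$"
)

def parse_fmdump_summary(text):
    """Return one dict (time, uuid, msg_id) per fault summary line."""
    faults = []
    for line in text.splitlines():
        match = RECORD.match(line.strip())
        if match:  # header and detail lines do not match and are skipped
            faults.append(match.groupdict())
    return faults

# Illustrative input modeled on the excerpt; not captured from a live system.
SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Oct 21 10:32:47.2211 a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc SUN4U-8000-28
  95%  fault.memory.dimm
"""

faults = parse_fmdump_summary(SAMPLE)
for fault in faults:
    print(fault["time"], fault["msg_id"], fault["uuid"])
```

The message ID extracted this way is the value to look up on the knowledge article site, and the UUID identifies the individual fault event.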
8.  CH1/R1/D1, but this table lists the DIMM name in an abbreviated way (the preceding MB/CMP0 is omitted for clarity).

    - Grasp the top corners of the DIMM and remove it from the motherboard.
    - Place the DIMM on an antistatic mat.

    To Add or Replace DIMMs

    Use the following guidelines and FIGURE 3-11 and TABLE 3-1 to plan the memory configuration of your server.

    - Eight slots hold industry-standard DDR-2 memory DIMMs, providing a total of 16 GBytes of memory.
    - The Sun Fire T1000 server accepts the following DIMM sizes:
        - 512 MB
        - 1 GB
        - 2 GB
    - All DIMMs installed must be the same size.
    - DIMMs must be added four at a time.
    - Rank 0 memory must be fully populated for the Sun Fire T1000 to function.

    1. Unpackage the replacement DIMMs and place them on an antistatic mat.
    2. Ensure that the socket ejector tabs are in the open position.
    3. Line up the replacement DIMM with the connector.
    4. Push the DIMM into the socket until the ejector tabs lock the DIMM in place.
    5. Perform the procedures described in "Common Procedures for Finishing Up" on page 72.
    6. Perform the following steps to clear the memory fault.
       a. Gain access to the ALOM sc> prompt.
          Refer to the Sun Fire T2000 Server Advanced Lights Out Management (ALOM) Guide for instructions.
       b. Run the showfaults -v command to determine how to clear the fault.
          - If the fault is a host-detected fault (displays a UUID), such
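The DIMM population guidelines in this excerpt can be checked mechanically before a memory upgrade. The following is a minimal sketch in Python. The slot names in ALL_SLOTS are an assumption made for illustration (two channels, two ranks, two DIMMs per rank, giving eight slots); confirm the real names against TABLE 3-1 of the manual. The function name check_dimm_config is hypothetical.

```python
ALLOWED_SIZES_MB = {512, 1024, 2048}  # sizes listed in the guidelines

# ASSUMPTION: abbreviated slot names for eight slots; verify against TABLE 3-1.
ALL_SLOTS = [f"CH{ch}/R{rank}/D{dimm}"
             for ch in (0, 3) for rank in (0, 1) for dimm in (0, 1)]

def check_dimm_config(installed):
    """installed maps slot name -> DIMM size in MB. Returns a list of problems."""
    problems = []
    unknown = sorted(set(installed) - set(ALL_SLOTS))
    if unknown:
        problems.append(f"unknown slots: {unknown}")
    sizes = set(installed.values())
    unsupported = sorted(sizes - ALLOWED_SIZES_MB)
    if unsupported:
        problems.append(f"unsupported sizes (MB): {unsupported}")
    if len(sizes) > 1:
        problems.append("all DIMMs must be the same size")
    if len(installed) % 4 != 0:
        problems.append("DIMMs must be added four at a time")
    missing_rank0 = [s for s in ALL_SLOTS if "/R0/" in s and s not in installed]
    if missing_rank0:
        problems.append(f"rank 0 not fully populated: {missing_rank0}")
    return problems

# Four 1 GB DIMMs in rank 0 only: satisfies every rule above.
rank0_only = {slot: 1024 for slot in ALL_SLOTS if "/R0/" in slot}
print(check_dimm_config(rank0_only))  # prints an empty list

# Mixing sizes violates the same-size rule.
mixed = dict(rank0_only)
mixed["CH0/R0/D0"] = 2048
print(check_dimm_config(mixed))
```

With all eight slots holding 2 GB DIMMs, the total is 8 x 2048 MB = 16 GBytes, which matches the maximum stated in the guidelines.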
9.  For example, if one of the processor cores is deemed faulty by POST, the core will be disabled, and the system will boot and run using the remaining cores.

    Devices can be manually enabled or disabled using ASR commands (see "Managing System Components with Automatic System Recovery Commands" on page 40).

    Controlling How POST Runs

    The server can be configured for normal, extensive, or no POST execution. You can also control the level of tests that run, the amount of POST output that is displayed, and which reset events trigger POST by using ALOM variables.

    Chapter 2 Sun Fire T1000 Server Diagnostics 27

    TABLE 2-5 lists the ALOM variables used to configure POST and FIGURE 2-5 shows how the variables work together.

    TABLE 2-5 ALOM Parameters Used For POST Configuration

    Parameter      Values    Description
    setkeyswitch   normal    The system can power on and run POST (based on the other parameter settings). For details see FIGURE 2-5. This parameter overrides all other commands.
                   diag      The system runs POST based on predetermined settings.
                   stby      The system cannot power on.
                   locked    The system can power on and run POST, but no flash updates can be made.
    diag_mode      off       POST does not run.
                   normal    Runs POST according to diag_level value.
                   service   Runs POST with preset values for diag_level and diag_verbosity.
    diag_level     min       If diag_mode = normal, runs minimum set of tests.
                   max       If diag_mode = normal, runs all the minimum tests plus extensive CPU
10. If the suggested action does not recommend replacing a FRU, perform the suggested action. Contact Sun for additional support, if needed.

    The showenvironment command reports over-temperature conditions when the ambient room temperature exceeds the upper limit.

    For more information, see these sections:

    - "To Remove the Power Supply" on page 61 and "To Replace the Power Supply" on page 62
    - "To Run the showfaults Command" on page 21
    - "Using the Solaris Predictive Self-Healing Feature" on page 35
    - Sun Support information: http://www.sun.com/service/contacting
    - "To Run the showenvironment Command" on page 22

    TABLE 2-1 Diagnostic Flow Chart Actions (Continued)

    Columns: Action No. | Diagnostic Action | Resulting Action | For more information, see these sections

    Action numbers listed: 8, 10, 11, 12

    Diagnostic actions listed:

    - Identify the cause of the over-temperature condition.
    - Identify the faulty FRU.
    - Check the Solaris log files for fault information.
    - Run POST.
    - Run SunVTS.

    Resulting actions:

    The over-temperature condition may be caused by excessive ambient room temperature, an overheating power supply, or a faulty fan tray assembly.

    - If ambient room temperature is too high, reduce room temperature.
    - If over-temperature condition still exists, go to Action 9.
    - If over-temperature condition does not exist, go to Action 10.

    The FRUs require that you shut down the server to perform a cold swap.

    After replacing the faulty FRU, go
11. Information" on page 51
    - "Common Procedures for Parts Replacement" on page 53
    - "Removing and Replacing CRUs" on page 57
    - "Common Procedures for Finishing Up" on page 72

    For a list of CRUs, see Appendix A, "Field-Replaceable Units (FRUs)" on page 75.

    Note: Never attempt to run the system with the cover removed. The cover must be in place for proper air flow. The cover interlock switch immediately shuts the system down when the cover is removed.

    Safety Information

    This section describes important safety information you need to know prior to removing or installing parts in the Sun Fire T1000 server.

    For your protection, observe the following safety precautions when setting up your equipment:

    - Follow all Sun standard cautions, warnings, and instructions marked on the equipment and described in Important Safety Information for Sun Hardware Systems.
    - Ensure that the voltage and frequency of your power source match the voltage and frequency inscribed on the equipment's electrical rating label.
    - Follow the electrostatic discharge safety practices as described in this section.

    The document, Important Safety Information for Sun Hardware Systems (816-7190), contains a listing of safety precautions for Sun systems. This document is located in the packing carton of your server.

    The Sun Fire T1000 server complies with regulatory requirements for safety and EMI. Document about compliance is available onl
12. System Reliability, Availability, and Serviceability 4
    Environmental Monitoring 5
    Error Correction and Parity Checking 5
    Predictive Self-Healing 6
    Chassis Identification 6
    Additional Service-Related Information 7

    Sun Fire T1000 Server Diagnostics 9
    Overview of Sun Fire T1000 Server Diagnostics 9
    Using LEDs to Identify the State of Devices 14
    Front and Rear Panel LEDs 16
    Power Supply LEDs 17
    Using ALOM For Diagnosis and Repair Verification 17
    Running ALOM Service-Related Commands 19
    Connecting to ALOM 19
    Switching Between the System Console and ALOM 20
    Service-Related ALOM Commands 20
    ▼ To Run the showfaults Command 21
    ▼ To Run the showenvironment Command 22
    ▼ To Run the showfru Command 24
    Running POST 27
    Controlling How POST Runs 27
    ▼ To Change POST Parameters 30
    Reasons to Run POST 31
    Routine Sanity Check of the Hardware 31
    Diagnosing the System Hardware 31
    ▼ To Run POST 31
    Using the Solaris Predictive Self-Healing Feature 35
    ▼ To Use the fmdump Command to Identify Faults 37
    Collecting Information From Solaris OS Files and Commands 39
    ▼ To Check the Message Buffer 39
    ▼ To View System Message Log Files 39
    Managing System Components with Automatic System Recovery Commands 40
    ▼ To Run the showcomponent Command 41
    ▼ To Run the disablecomponent Command 42
    ▼ To Run the enablecomponent Command 43
    Exercising the System with SunVTS 43
    Checking Whether SunVTS Software Is Installed 43
    ▼ To Check Whether SunVTS Software Is Installed 44
    Exercising
13. and memory tests.

    Parameter        Values           Description
    diag_trigger     none             Do not run POST on reset.
                     user_reset       Runs POST upon user-initiated resets.
                     power_on_reset   Only run POST for the first power on. This is the default.
                     error_reset      Runs POST if fatal errors are detected.
                     all_reset        Runs POST after any reset.
    diag_verbosity   none             No POST output is displayed.
                     min              POST output displays functional tests with a banner and pinwheel.
                     normal           POST output displays all test and informational messages.
                     max              POST displays all test, informational, and some debugging messages.

    All of these parameters are set using the ALOM setsc command except for the setkeyswitch command.

    [Flowchart residue. Decision nodes: diag_mode; diag_trigger (user_reset, power_on_reset, error_reset). End nodes: System Boot, OpenBoot PROM.]

    Service Mode: Forces a Sun-prescribed level of diagnostic execution. Overrides user-defined settings, as if parameters were diag_level=max, diag_verbosity=max, diag_trigger=all_resets. User-defined settings are not modified.

    Normal Mode: Diagnostic execution is enabled. User-defined settings control test coverage and verbosity via diag_level, diag_verbosity, diag_trigger.

    FIGURE 2-5 Flowchart of ALOM Variable for POST Configuration

    TABLE 2-6 shows typical combinations of ALOM variables and associated POST mode.

    TABLE 2-6 ALOM Param
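The way diag_mode interacts with the other variables in FIGURE 2-5 can be summarized as a small decision function. The following is a minimal sketch in Python of that logic only: it models the three diag_mode values described in the excerpt, does not model the setkeyswitch override, and does not talk to ALOM. The function name effective_post_settings and the all_resets spelling (taken from the figure text) are assumptions.

```python
# Values the figure says service mode behaves "as if" were set.
SERVICE_PRESET = {
    "diag_level": "max",
    "diag_verbosity": "max",
    "diag_trigger": "all_resets",
}

def effective_post_settings(diag_mode, user_settings):
    """Return the settings POST would run with, or None if POST does not run.

    user_settings holds the user-defined diag_level, diag_verbosity and
    diag_trigger values; it is never modified, matching the figure's note
    that service mode leaves user-defined settings untouched.
    """
    if diag_mode == "off":
        return None                    # POST does not run
    if diag_mode == "service":
        return dict(SERVICE_PRESET)    # overrides, but does not modify, user values
    if diag_mode == "normal":
        return dict(user_settings)     # user-defined settings control POST
    raise ValueError(f"unknown diag_mode: {diag_mode!r}")

user = {"diag_level": "min", "diag_verbosity": "normal",
        "diag_trigger": "power_on_reset"}

print(effective_post_settings("normal", user))
print(effective_post_settings("service", user))
print(effective_post_settings("off", user))
```

Returning copies rather than the original dictionaries keeps the stored user-defined values unchanged when service mode is selected.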
14. command takes effect.

        sc> reset

    ▼ To Run the enablecomponent Command

    The enablecomponent command enables a disabled component by removing it from the ASR blacklist.

    1. At the sc> prompt, enter the enablecomponent command.

        sc> enablecomponent MB/CMP0/CH3/R1/D1
        sc> SC Alert: MB/CMP0/CH3/R1/D1 reenabled

    2. After receiving confirmation that the enablecomponent command is complete, reset the server so that the ASR command takes effect.

        sc> reset

    Exercising the System with SunVTS

    Sometimes a server exhibits a problem that cannot be isolated definitively to a particular hardware or software component. In such cases, it may be useful to run a diagnostic tool that stresses the system by continuously running a comprehensive battery of tests. Sun provides the SunVTS software for this purpose.

    This chapter describes the tasks necessary to use SunVTS software to exercise your Sun Fire T1000 server:

    - "Checking Whether SunVTS Software Is Installed" on page 43
    - "Exercising the System Using SunVTS Software" on page 44

    Checking Whether SunVTS Software Is Installed

    This procedure assumes that the Solaris OS is running on the Sun Fire T1000 server, and that you have access to the Solaris OS command line.

    ▼ To Check Whether SunVTS Software Is Installed

    1. Check for the
15. connect to the system console after performing the operation.

    Prepares a FRU for removal, and illuminates the host system's OK to Remove LED.

    Generates a hardware reset on the host server. The -y option enables you to skip the confirmation question. The -c option instructs ALOM to connect to the system console after performing the operation.

    Reboots the ALOM system controller. The -y option enables you to skip the confirmation question.

    Sets the virtual keyswitch.

    Turns the Locator LED on the server on or off.

    TABLE 2-4 Service-Related ALOM Commands (Continued)

    ALOM Command                         Description
    showenvironment                      Displays the environmental status of the host server. This information includes system temperatures, power supply, front panel LED, hard drive, fan, voltage, and current sensor status. See "To Run the showenvironment Command" on page 22.
    showfaults [-v]                      Displays current system faults. See "To Run the showfaults Command" on page 21.
    showfru [-g lines] [-s | -d] [FRU]   Displays information about the FRUs in the server.
                                         - The -g lines option specifies the number of lines to display before pausing the output to the screen.
                                         - The -s option displays static information about system FRUs (defaults to all FRUs, unless one is specified).
                                         - The -d option displays dynamic information about system FRUs (defaults to all FRUs, unless one is specified). See "To Run the sho
16. drive and remove the drive and tray assembly from the chassis.

    [Figure showing how to remove the hard disk drive. Callouts: Latches, Hard drive]
    FIGURE 3-9 Removing the Hard Drive

    ▼ To Replace the Hard Drive

    1. Unpackage the replacement hard drive and tray assembly.
    2. Slide the hard drive and tray assembly into the chassis until it mates with the front of the chassis (FIGURE 3-10).

    [Figure callouts: Hard drive, Latches]
    FIGURE 3-10 Replacing the Hard Drive

    3. Snap the catches on the latches to lock the drive and tray assembly into place in the chassis.
    4. Redress the power and cable through the midwall in the chassis and reconnect the cable to the rear of the drive.
    5. Perform the procedures described in "Common Procedures for Finishing Up" on page 72.
    6. Perform administrative tasks to reconfigure the hard disk drive.
       The procedures that you perform at this point depend on how your data is configured. You might need to partition the drive, create file systems, load data from backups, or have it updated from a RAID configuration.

       Example:

        # cfgadm -c configure c0t0d0s0C

    To Remove DIMMs

    Caution: This procedure requires that you handle components that are sensitive to static discharges that can cause the component to fail. To avoid this problem, ensure that you follow antistatic practices as described in "To Pe
17. >IO-Bridge unit 1 interrupt test
        0:0>IO-Bridge unit 1 Config MB bridges
        0:0>Config port B, bus 2 dev 0 func 0, tag 5714 BRIDGE
        0:0>Config port B, bus 3 dev 8 func 0, tag PCIX BRIDGE
        0:0>IO-Bridge unit 1 PCI id test
        0:0>INFO:10 count read passed for MB/IOB_PCIEb/BRIDGE! Last read VID:1166|DID:103
        0:0>INFO:10 count read passed for MB/IOB_PCIEb/BRIDGE/GBE! Last read VID:14e4|DID:1648
        0:0>INFO:10 count read passed for MB/IOB_PCIEb/BRIDGE/HBA! Last read VID:1000|DID:50
        0:0>Quick JBI Loopback Block Mem Test
        0:0>Quick jbus loopback Test 262144 bytes at 00000000.00600000
        0:0>INFO:
        0:0>POST Passed all devices.
        0:0>POST: Return to VBSC.
        0:0>Master set ACK for vbsc runpost command and spin...

    5. Perform further investigation if needed.
       When POST is finished running, the system will continue to boot even if POST detects a faulty FRU, provided it does not leave the system without memory or a CPU core.

       Note that certain DIMM failures may not be diagnosable to a single DIMM. These failures are fatal, and will result in both logical banks being unconfigured. If POST detects a faulty device, the fault is displayed and the fault information is passed to ALOM for fault handling.

       a. Interpret the POST messages.

          POST error messages use the following syntax:

            c:s > ERROR: TEST = failing-test
            c:s > H/W under test = FRU
            c:s > Repair Instructions: Replace items in order listed by H/W

    Ch
18. host ID and Ethernet MAC addresses of the system, as well as the ALOM configuration, including the IP addresses and ALOM user accounts, if configured. This information will be lost unless the system configuration SEEPROM is removed and installed in the replacement motherboard. The PROM does not hold the fault data, and this data will no longer be accessible when the motherboard and chassis assembly is replaced.

    The location of this SEEPROM is shown in Appendix A, "Field-Replaceable Units (FRUs)" on page 75.

    To Replace the Motherboard and Chassis Assembly

    1. Reconnect the front panel LED cable.
    2. Replace the PCI Express card. See "To Add or Replace the Optional PCI Express Card" on page 60.
    3. Replace the fan tray assembly and cable. See "To Replace the Fan Tray Assembly" on page 61.
    4. Replace the power supply and cable. See "To Replace the Power Supply" on page 62.
    5. Replace the hard disk drive and cable. See "To Replace the Hard Drive" on page 64.
    6. Replace the memory DIMMs. See "To Add or Replace DIMMs" on page 66.
    7. Replace the socketed system configuration SEEPROM.
       The location of this SEEPROM is shown in Appendix A, "Field-Replaceable Units (FRUs)" on page 75.
    8. Perform the procedures described in "Common Procedures for Finishing Up" on page 72.
    9. Boot the system and run POST to verify that the system is fully operational. See "Running POST" on pag
19. indicate the source of a fault, check the message buffer and log files for notifications for faults. Hard drive faults are usually captured by the Solaris message files.

    Use the dmesg command to view the most recent system messages. To view the system messages log file, view the contents of the /var/adm/messages file.

    To Check the Message Buffer

    1. Log in as superuser.
    2. Issue the dmesg command.

        # dmesg

    The dmesg command displays the most recent messages generated by the system.

    To View System Message Log Files

    The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems such as a device that is about to fail.

    The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every ten days), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.1. Over a period of time, the messages are further rotated to messages.2 and messages.3, and then deleted.

    1. Log in as superuser.
    2. Issue the following command.

        # more /var/adm/messages

    3. If you want to view all logged messages, issue the following command.

        # more /var/adm/messages*

    Managing System Components with Automatic System Recovery Co
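When reviewing rotated message files, it helps to read them in a known order, newest first. The following is a minimal sketch in Python, assuming the messages, messages.1, messages.2 naming described in the excerpt above. The function name message_files is hypothetical, and the demonstration runs against a temporary directory rather than a real /var/adm.

```python
import glob
import os
import re
import tempfile

# "messages" or "messages.N"; anything else (for example messages.bak) is ignored.
_NAME = re.compile(r"messages(?:\.(\d+))?")

def message_files(log_dir="/var/adm"):
    """Return message file paths ordered newest first (messages, then .N ascending)."""
    entries = []
    for path in glob.glob(os.path.join(log_dir, "messages*")):
        match = _NAME.fullmatch(os.path.basename(path))
        if match:
            suffix = match.group(1)
            # The unsuffixed file is the current one, so it sorts before any .N file.
            entries.append((-1 if suffix is None else int(suffix), path))
    return [path for _, path in sorted(entries)]

# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("messages.2", "messages", "messages.1", "messages.bak"):
        open(os.path.join(demo_dir, name), "w").close()
    names = [os.path.basename(p) for p in message_files(demo_dir)]

print(names)
```

Sorting on the numeric suffix rather than on the file name keeps the order correct if a two-digit suffix such as messages.10 is ever present.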
20. location is OK.
    sensor at location is within normal range.

    Environmental faults can be repaired through removal and replacement of the faulty FRU. FRU removal is automatically detected by the environmental monitoring and all faults associated with the removed FRU are cleared. The message for that case, and the alert sent for all FRU removals, is:

    fru at location has been removed.

    There is no ALOM command to manually repair an environmental fault.

    ALOM does not handle hard drive faults. Use the Solaris message files to view hard drive faults. See "Collecting Information From Solaris OS Files and Commands" on page 39.

    Running ALOM Service-Related Commands

    This section describes the ALOM commands that are commonly used for service-related activities.

    Connecting to ALOM

    Before you can run ALOM commands, you must connect to the ALOM. There are several ways to connect to the system controller:

    - Connect an ASCII terminal directly to the serial management port.
    - Use the telnet command to connect to ALOM through an Ethernet connection on the network management port.
    - Connect an external modem to the network management port and dial in to the modem.

    Note: Refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) Guide for instructions on configuring and connecting to ALOM.

    Switching Between the System Console and ALOM

    - To switch from the console output to the
21. or interfere when the server chassis is removed from the rack.

    3. Disconnect the power cord from the power supply.
    4. Disconnect all cables from the server and label them.
    5. From the front of the server, unlock both mounting brackets (FIGURE 3-1) and pull the server chassis out until the brackets lock in the open position.

    FIGURE 3-1 Unlocking a Mounting Bracket

    6. Press the release buttons on both mounting brackets (FIGURE 3-2) to release the right and left mounting brackets, then pull the server chassis out of the rails.
       The mounting brackets slide approximately 4 in. (10 cm) further before disengaging.

    FIGURE 3-2 Location of the Mounting Bracket Release Buttons

    7. Set the chassis on a sturdy work surface.

    ▼ To Perform Electrostatic Discharge (ESD) Prevention Measures

    1. Prepare an antistatic surface on which to set parts during removal and installation.
       Place ESD-sensitive components such as the printed circuit boards on an antistatic mat. The following items can be used as an antistatic mat:
       - Antistatic bag used to wrap a Sun replacement part
       - Sun ESD mat, part number 250-1088
       - Disposable ESD mat (shipped with some replacement parts or optional system components)
    2. Use an antistatic wrist strap.

    ▼ To Remove the Top Cover

    Access to all customer-replaceable units (CRUs) requires the removal of the top cover.
22. presence of SunVTS packages. Type:

        # pkginfo -l SUNWvts SUNWvtsr SUNWvtsts SUNWvtsmn

    - If SunVTS software is loaded, information about the packages is displayed.
    - If SunVTS software is not loaded, you see an error message for each missing package.

        ERROR: information for "SUNWvts" was not found
        ERROR: information for "SUNWvtsr" was not found

    The pertinent packages are as follows:

    Package      Description
    SUNWvts      SunVTS framework
    SUNWvtsr     SunVTS framework (root)
    SUNWvtsts    SunVTS for tests
    SUNWvtsmn    SunVTS man pages

    If SunVTS is not installed, you can obtain the installation packages from the following:

    - Solaris Operating System DVDs
    - The Sun Download Center: http://www.sun.com/oem/products/vts

    The SunVTS 6.0 PS3 software, and future compatible versions, are supported on the Sun Fire T1000 server.

    SunVTS installation instructions are described in the SunVTS User's Guide.

    Exercising the System Using SunVTS Software

    Before you begin, the Solaris OS must be running. You also need to ensure that SunVTS validation test software is installed on your system. See "Checking Whether SunVTS Software Is Installed" on page 43.

    SunVTS software requires that you use one of two security schemes. The security scheme you choose must be properly configured in order for you to perform this procedure. For details, refer to the SunVTS User's Guide
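The package check at the start of this excerpt produces one error line per missing package, which makes it easy to script. The following is a minimal sketch in Python, assuming the error wording shown in the excerpt. The function name missing_packages and the SAMPLE text are illustrative; the sketch parses captured output and does not run pkginfo itself.

```python
import re

# The four packages named in the excerpt.
REQUIRED = ["SUNWvts", "SUNWvtsr", "SUNWvtsts", "SUNWvtsmn"]

# Error wording assumed from the excerpt; the quotes around the name are optional here.
NOT_FOUND = re.compile(r'ERROR:\s+information for\s+"?(\w+)"?\s+was not found')

def missing_packages(pkginfo_output):
    """Return the required SunVTS packages that pkginfo reported as not found."""
    reported = set(NOT_FOUND.findall(pkginfo_output))
    return [pkg for pkg in REQUIRED if pkg in reported]

# Illustrative captured output, modeled on the two error lines in the excerpt.
SAMPLE = '''ERROR: information for "SUNWvts" was not found
ERROR: information for "SUNWvtsr" was not found
'''

missing = missing_packages(SAMPLE)
print(missing)
```

An empty result means no required package was reported as missing in the text that was parsed; it does not by itself prove that pkginfo ran successfully.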
23.  sont utilis  es sous licence et sont des marques de fabrique ou des marques d  pos  es de SPARC International  Inc   aux Etats Unis et dans d   autres pays  Les produits portant les marques SPARC sont bas  s sur une architecture d  velopp  e par Sun  Microsystems  Inc     L interface d utilisation graphique OPEN LOOK et Sun    a   t   d  velopp  e par Sun Microsystems  Inc  pour ses utilisateurs et licenci  s  Sun  reconna  t les efforts de pionnier de Xerox pour la recherche et le d  veloppement du concept des interfaces d utilisation visuelle ou graphique  pour l industrie de l informatique  Sun d  tient une license non exclusive de Xerox sur l interface d utilisation grephique Xerox  cette licence  couvrant   galement les licenci  es de Sun qui mettent en place l interface d    utilisation graphique OPEN LOOK et qui en outre se conforment  aux licences   crites de Sun     LA DOCUMENTATION EST FOURNIE  EN L   TAT  ET TOUTES AUTRES CONDITIONS  DECLARATIONS ET GARANTIES EXPRESSES  OU TACITES SONT FORMELLEMENT EXCLUES  DANS LA MESURE AUTORISEE PAR LA LOI APPLICABLE  Y COMPRIS NOTAMMENT  TOUTE GARANTIE IMPLICITE RELATIVE A LA QUALITE MARCHANDE  A L APTITUDE A UNE UTILISATION PARTICULIERE OU A  L ABSENCE DE CONTREFA  ON     EG 192 Ca    Adobe PostScript    Contents    Preface vii    Sun Fire T1000 Server Overview 1  Sun Fire T1000 Server Features 1  Chip Multitheaded  CMT  Multicore Processor and Memory Technology 2  Performance Enhancements 2  Remote Manageability With ALOM 3 
24. temperature observed by a sensor falls below a low-temperature threshold or rises above a high-temperature threshold, the monitoring subsystem software lights the amber Service Required LEDs on the front and back panels. If the temperature condition persists and reaches a critical threshold, the system initiates a graceful system shutdown.

All error and warning messages are sent to the ALOM system controller system console and logged in the ALOM log file. Additionally, some FRUs, such as the power supply, provide an LED that is lit to indicate a failure within the FRU.

Error Correction and Parity Checking

The SPARC T1 multicore processor provides parity protection on its internal cache memories, including tag parity and data parity on the D-cache and I-cache. The internal 3-MB L2 cache has parity protection on the tags and ECC protection of the data.

Advanced ECC, also called Chipkill, detects up to 4 bits in error.

Predictive Self-Healing

The Sun Fire T1000 server features the latest fault management technologies. With the Solaris 10 Operating System (OS), Sun is introducing a new architecture for building and deploying systems and services capable of predictive self-healing. Self-healing technology enables Sun systems to accurately predict component failures and mitigate many serious problems before th
25. the fault (Oct 21 10:32 EDT 2004).

■ A Universal Unique Identifier (UUID) that is unique for every fault (a26d5379-24b8-4a46-bchf-d9e1ff75albc).
■ Sun message identifier (SUN4U-8000-2S) that can be used to obtain additional fault information.
■ Faulted FRU (FRU: mem:///component=MB/CMP0/CH0/R1/D0/J0701), which in this example is identified as the DIMM at R1/D0 (J0701).

2. Use the Sun message ID to obtain more information about this type of fault.

   a. In a browser, go to the Predictive Self-Healing Knowledge Article web site:
      http://www.sun.com/msg

   b. Enter the message ID in the SUNW-MSG-ID field, and press Lookup.

   In this example, the message ID SUN4U-8000-2S returns the following information for corrective action:

    Memory module errors exceeded acceptable levels

    Type: Fault
    Severity: Major
    Description: The Solaris(TM) Fault Manager has determined that the number
      of correctable (single-bit) memory errors reported against
      a memory DIMM module indicates a fault requiring repair
      action is present.
    Automated Response: The system will attempt to remove the affected page of
      memory from service.
    Impact: The system is at increased risk of incurring an uncorrectable
      error, which will cause a service interruption, until the
      memory DIMM module is replaced.
    Suggested Action for System Administrator:
      For Sun Fire(TM) T1000, T2000, 1280, 3800-6800, 2900-6900,
      E12K, E15K, F20K, and F25K
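When fault output is captured to a file, the two identifiers discussed above (the Sun message ID and the UUID) can be pulled out mechanically. The following Python sketch is illustrative only: the function name and both regular expressions are assumptions based on the identifier shapes shown in this manual (for example SUN4U-8000-2S and an 8-4-4-4-12 hexadecimal UUID), and the sample line is modeled on the showfaults -v output shown elsewhere in this manual.

```python
import re

# Assumed identifier shapes, inferred from the examples in this manual.
MSGID_RE = re.compile(r"\b[A-Z0-9]+-[0-9A-Z]{4}-[0-9A-Z]{2}\b")
UUID_RE = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")


def fault_identifiers(text):
    """Return the first message ID and UUID found in fault output, or None."""
    msgid = MSGID_RE.search(text)
    uuid = UUID_RE.search(text)
    return {
        "msgid": msgid.group(0) if msgid else None,
        "uuid": uuid.group(0) if uuid else None,
    }


sample = (
    "Host detected fault, MSGID: SUN4U-8000-2S "
    "UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86"
)
ids = fault_identifiers(sample)
print(ids["msgid"])                   # value to enter in the SUNW-MSG-ID field
print("clearfault " + ids["uuid"])    # ALOM command line for clearing the fault
```

The message ID is what you enter at http://www.sun.com/msg; the UUID is the argument that the ALOM clearfault command expects.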
26. to Action 14.
   For more information, see "To Remove the Fan Tray Assembly" on page 60 and "To Replace the Fan Tray Assembly" on page 61, and "To Remove the Power Supply" on page 61 and "To Replace the Power Supply" on page 62.

The Solaris message buffer and log files record system events and can provide information about faults.
   • If system messages indicate a faulty device, replace the FRU (Action 11).
   • To obtain more diagnostic information, go to Action 7.
   For more information, see "Collecting Information From Solaris OS Files and Commands" on page 39.

POST performs basic tests of the server components and reports faulty FRUs.
   • If POST indicates a faulty FRU, replace the FRU (Action 9).
   • If POST does not indicate a faulty FRU, go to Action 12.
   For more information, see "Running POST" on page 27.

SunVTS provides tests used to exercise and diagnose FRUs. To run SunVTS, the server must be running the Solaris OS.
   • If SunVTS reports a faulty device, replace the FRU (Action 9).
   • If SunVTS does not report a faulty device, go to Action 11.
   For more information, see "Exercising the System with SunVTS" on page 43.

TABLE 2-1 Diagnostic Flow Chart Actions (Continued)

Action No. 13. Replace faulty FRU.
   The FRUs require that you shut down the server to perform a cold swap. After replacing the faulty FRU, go to Action 14.
   For more information, see "Removing and Replacing FRUs" on page 51.
27. Preface

The Sun Fire T1000 Service Manual provides information to aid in troubleshooting problems with and replacing components within the Sun Fire T1000 server.

This manual is written for technicians, service personnel, and system administrators who service and repair computer systems. The person qualified to use this manual:

■ Can open a system chassis, identify, and replace internal components.
■ Understands the Solaris Operating System and the command-line interface.
■ Has superuser privileges for the system being serviced.
■ Understands typical hardware troubleshooting tasks.

How This Book Is Organized

This guide is organized into the following chapters:

Chapter 1 describes the main features of the Sun Fire T1000 server.

Chapter 2 describes the diagnostics that are available for monitoring and troubleshooting the Sun Fire T1000 server.

Chapter 3 describes how to remove and replace the FRUs.

Appendix A lists the customer-replaceable components in the Sun Fire T1000 server.

Using UNIX Commands

This document might not contain information on basic UNIX commands and procedures such as shutting down the system, booting the system, and configuring devices.

See one or more of the following for this information:

■ Solaris Handbook for Sun Peripherals
■ AnswerBook2
28. 0/100/1000 Mbit autonegotiating. Each of the 4 Ethernet RJ45s includes two LEDs:
   • A green Link indicator, lit when a link is established at any speed.
   • A yellow Activity indicator, which blinks during packet transfers.

1 DB-9 serial port

1 SATA disk drive, 3.5-inch form factor. Support for hardware-embedded RAID 1 (mirroring).

4 fans in a single assembly

1 PCI Express (PCI-E) slot for low-profile cards (supports 1x, 4x, and 8x width cards)

1 power supply (PS)

ALOM system controller (integrated on motherboard) with a serial and 10/100 Mbit Ethernet port

OpenBoot PROM for reset and POST support
ALOM CMT for remote management administration

Solaris 10 1/06 or later Operating System preinstalled on the hard disk drive

Java Enterprise System with a 90-day trial license

For additional information on the Sun Fire T1000 server features, refer to the Sun Fire T1000 Server Product Overview.

Remote Manageability With ALOM

The Sun Advanced Lights Out Manager (ALOM) feature is a system controller (SC) that enables you to remotely manage and administer the Sun Fire T1000 server.

The ALOM CMT software is preinstalled as firmware, and therefore ALOM initializes as soon as you apply power to the system. You can customize ALOM to work with your particular installation.

ALOM enables you to monitor and control your server over a network, or by using a dedicated serial port for co
29. 14  Verify the repair  Various commands and utilities can be used to    To Run the showfaults  verify the functionality of the system components    Command    on page 21  Two useful commands are     Managing System    The ALOM showfaults command Components with  e The ASR showcomponents command Automatic System  If the FRU is blacklisted  you can manually remove Recovery Commands    on  it from the black list with the enablecomponent page 40  command     Exercising the System  If the fault is cleared  and the component is not W aN TE pn pape ae  blacklisted  the repair is verified well enough to  boot the server  For added assurance  you can run  the SunVTS diagnostic software   15  Contact Sun for The majority of hardware faults are detected by the Sun Support information   Support  server   s diagnostics  In rare cases it is possible that  http   www sun com   a problem requires additional troubleshooting  If service contacting  you are unable to determine the cause of the  problem  contact Sun for support   Using LEDs to Identify the State of  Devices  The Sun Fire T1000 server provides the following groups of LEDs   m ABrO and rear panel LEDS  FIGURE 2 2  FIGURE 2 3  and TABLE 2 2     LPBwer supply LEDs  FIGURE 2 3 and TABLE 2 3   These LEDs provide a quick visual check of the state of the system   14 Sun Fire T1000 Server Service Manual   January 2006             Power OK Service  LED power required  on off button LED    FIGURE 2 2 Sun Fire T1000 Server Front Panel    Ac
30. ALOM sc gt  prompt  type     Pound    Period      a To switch from the sc gt  prompt to the console  type console     Service Related ALOM Commands    TABLE 2 4 describes the typical ALOM commands for servicing a Sun Fire T1000  server  For descriptions of all ALOM commands  issue the help command or refer  to the Sun Fire T1000 Server Advanced Lights Out Management  ALOM  Guide     TABLE 2 4 Service Related ALOM Commands       ALOM Command    Description       help  command     clearfault UUID    powercycle   f     poweroff   y    f     poweron   y    c   FRU     removefru  y   FRU     reset   y    c     resetsc   y     setkeyswitch  normal   stby    diag   locked     setlocator  on   off     Displays a list of all ALOM commands with syntax and descriptions   Specifying a command name as an option displays help for that command     Manually clears system faults  UUID is the unique fault ID of the fault to  be cleared     Performs a poweroff followed by poweron  The  f option forces an  immediate poweroff  otherwise the command attempts a graceful  shutdown     Removes the main power from the host server  The  y option enables you  to skip the confirmation question  The  f option forces an immediate  shutdown  CAUTION  Using the  y option to skip the confirmation  question could enable you to inadvertently shut down the system     Applies the main power to the host server  or FRU  The  y option enables  you to skip the confirmation question  The   c  option instructs ALOM to 
31. GURE 2 1 Diagnostic Flow Chart    Chapter 2    Sun Fire T1000 Server Diagnostics    11          TABLE 2 1 Diagnostic Flow Chart Actions   Action For more information  see  No  Diagnostic Action Resulting Action these sections   1  Check the power The amber Fault LED indicates the power cord in    supply fault LED     2  Check the power  cord     3  Run the ALOM  showfaults  command     4  Check fault  message for a Sun  Message ID     5  Enter the Sun  Message ID into  the Sun  Knowledge  Article web site     6  Analyze the  suggested actions     7  Run the ALOM    showenvironment    command     unplugged or the power supply is faulty     If the Fault LED is lit  go to Action 2     Connect the power cord      If the Fault LED is still lit  replace faulty power  supply      If the green LEDs are lit  go to Action 3     The showfaults command displays faults  detected by the system firmware       If faults are displayed  go to Action 2   e If no faults are displayed  go to Action 6   Sun Message IDs  SUNW MSG ID  indicate that    information is available from Sun   s knowledge  article database       If you have a message ID number  go to Action 5     e If you do not have a message ID number  go to  Action 10     Enter the Sun Message ID number into the  knowledge article web site at   http www sun com msg and go to Action 4     In some cases  fault related messages are identified   with suggested actions    If the suggested action recommends replacing a  FRU  go to Action 9   
32. Refer to your application documentation for specific information on these processes.

4. Shut down the OS.
   At the Solaris OS prompt, issue the uadmin command to halt the Solaris OS and to return to the ok prompt.

    # uadmin 2 0
    WARNING: proc_exit: init exited
    syncing file systems... done
    Program terminated
    ok

   This command is described in Solaris system administration documentation.

5. Switch from the system console prompt to the SC console prompt by issuing the #. (Pound-Period) escape sequence.

    ok #.
    sc>

6. Using the SC console, issue the poweroff command.

    sc> poweroff -fy
    SC Alert: SC Request to Power Off Host Immediately.

Note: You can also use the Power On/Off button on the front of the server to initiate a graceful system shutdown.

Refer to the Sun Fire T1000 Server Administration Guide for more information about the ALOM poweroff command.

▼ To Remove the Server From a Rack

If the server is installed in a rack with the extendable slide rails that were supplied with the server, use this procedure to remove the server chassis from the rack.

1. (Optional) Issue the following command from the ALOM SC prompt to locate the system that requires maintenance.

    sc> setlocator on
    Locator LED is on.

   Once you have located the server, press the Locator button to turn it off.

2. Check to see that no cables will be damaged
33. ademarks or  registered trademarks of Sun Microsystems  Inc  in the U S  and in other countries     All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International  Inc  in the U S  and in other  countries  Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems  Inc     The OPEN LOOK and Sun   Graphical User Interface was developed by Sun Microsystems  Inc  for its users and licensees  Sun acknowledges  the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry  Sun  holds a non exclusive license from Xerox to the Xerox Graphical User Interface  which license also covers Sun s licensees viho implement OPEN  LOOK GUIs and otherwise comply with Sun   s written license agreements     U S  Government Rights   Commercial use  Government users are subject to the Sun Microsystems  Inc  standard license agreement and  applicable provisions of the FAR and its supplements     DOCUMENTATION IS PROVIDED  AS IS  AND ALL EXPRESS OR IMPLIED CONDITIONS  REPRESENTATIONS AND WARRANTIES     INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY  FITNESS FOR A PARTICULAR PURPOSE OR NON INFRINGEMENT   ARE DISCLAIMED  EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID        Copyright 2006 Sun Microsystems  Inc   4150 Network Circle  Santa Clara  Californie 95054  Etats Unis  Tous droits r  serv  s     Sun Microsyst
34. an display includes system  temperatures  hard drive status  power supply and fan status  and voltage and  current sensors        Note     You do not need user permissions to use this command       At the sc gt  prompt  type the showenvironment command     sc gt  showenvironment    System Temperatures  Temperatures in Celsius         Sensor Status Temp LowHard LowSoft LowWarn HighWarn HighSoft HighHard  MB T AMB OK 28  10  5 0 45 50 55  MB CMPO T TCORE OK 50  10  5 0 85 90 95  MB CMPO T BCORE OK 51  10  5 0 85 90 95  MB IOB T CORE OK 49  10  5 0 95 100 105    SYS LOCATE SYS SERVICE SYS ACT  OFF OFF ON       22 Sun Fire T1000 Server Service Manual   January 2006             Fans  Speeds Revolution Per Minute      Sensor Status Speed Warn Low  FTO FO OK 6762 2240 1920  FTO F1 OK 6762 2240 1920  FTO F2 OK 6762 2240 1920  FTO F3 OK 6653 2240 1920       Voltage sensors  in Volts            Sensor Status Voltage LowSoft LowWarn HighWarn HighSoft  MB V_VCORE OK 430 20 24 1 36 1 39  MB V_VMEM OK 1 79 69 72 1 87 1 90  MB V_VTT OK 0 89 0 84 0 86 0 93 0 95  MB V_ 1V2 OK 1 18 09 11 1 28 1 30  MB V_ 1V5 OK 1 49 1 36 1 39 1 60 1 63  MB V_ 2V5 OK 2 51 2 27 2 32 2 67 2 72  MB V_ 3V3 OK 3 29 3 06 3 10 3 49 353  MB V  5V OK 5 02 4 55 4 65 5 35 5 45  MB V_ 12V OK 12 25 10 92 11 16 12 84 13 08  MB V_ 3V3STBY OK 3 33 313 3 16 35 53 3 59  System Load  in amps    Sensor Status Load Warn Shutdown   MB I VCORE OK 20 560 80 000 88 000   MB I_VMEM OK 8 160 60 000 66 000   Current sensors    Sensor Stat
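To make the threshold columns in the temperature table easier to reason about, the following Python sketch classifies one reading against its six thresholds. It is a sketch under stated assumptions, not ALOM's actual logic: the severity labels ("warn", "soft", "hard"), the inclusive comparisons, and the negative signs on the low thresholds (which appear to have been lost from the listing above) are all assumptions.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    """Column order follows the showenvironment temperature table."""
    low_hard: float
    low_soft: float
    low_warn: float
    high_warn: float
    high_soft: float
    high_hard: float


def classify(temp, t):
    """Return 'ok', 'warn', 'soft', or 'hard' for one temperature reading.

    Assumed meaning: 'warn' is a warning, while 'soft' and 'hard' are
    increasingly severe shutdown thresholds. Comparisons are inclusive.
    """
    if temp <= t.low_hard or temp >= t.high_hard:
        return "hard"
    if temp <= t.low_soft or temp >= t.high_soft:
        return "soft"
    if temp <= t.low_warn or temp >= t.high_warn:
        return "warn"
    return "ok"


# MB/T_AMB row from the sample output; low thresholds assumed to be negative.
ambient = Thresholds(-10, -5, 0, 45, 50, 55)
print(classify(28, ambient))   # the reading shown in the sample output
```

With the sample ambient reading of 28 degrees Celsius, the function returns "ok", which agrees with the OK status in the listing.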
35. and the server     Use an Antistatic Mat    Place ESD sensitive components such as the motherboard  memory  and other PCB  cards on an antistatic mat        Common Procedures for Parts  Replacement    Before you can remove and replace parts that are inside the Sun Fire T1000 server   you must perform the following procedures     m    To Shut the System Down    on page 53   m    To Remove the Server From a Rack    on page 55   m    To Perform Electrostatic Discharge  ESD  Prevention Measures    on page 56  a    To Remove the Top Cover    on page 57    The corresponding procedures that you perform when maintenance is complete are  described in    Common Procedures for Finishing Up    on page 72     Required Tools    The Sun Fire T1000 server can be serviced with the following tools     m Antistatic wrist strap  m Antistatic mat  m No  2 Phillips screwdriver    To Shut the System Down    Performing a graceful shutdown makes sure all of your data is saved and the system  is ready for restart     Chapter 3 Removing and Replacing FRUs 53    1  Log in as superuser or equivalent   Depending on the nature of the problem  you might want to view the system status  or the log files  or run diagnostics before you shut down the system  Refer to the Sun  Fire T1000 Server Administration Guide for log file information    2  Notify affected users   Refer to your Solaris system administration documentation for additional  information    3  Save any open files and quit all running programs     
36. under test above
    c:s>MSG = test-error-message
    c:s>END_ERROR

where c = the core number, s = the strand number.

Warning and informational messages use the following syntax:

    INFO or WARNING: message

The following is an example of a POST error message:

    0:0>Data Bitwalk
    0:0>L2 Scrub Data
    0:0>L2 Enable
    0:0>Testing Memory Channel 0 Rank 0 Stack 0
    0:0>Testing Memory Channel 3 Rank 0 Stack 0
    0:0>Testing Memory Channel 0 Rank 1 Stack 0
    0:0>ERROR: TEST = Data Bitwalk
    0:0>H/W under test = MB/CMP0/CH0/R1/D0/S0 (J0701)
    0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above
    0:0>MSG = Pin 3 failed on MB/CMP0/CH0/R1/D0/S0 (J0701)
    0:0>END_ERROR
    0:0>Testing Memory Channel 3 Rank 1 Stack 0

In this example, POST is reporting a memory error at DIMM location MB/CMP0/CH0/R1/D0 (J0701).

b. Run the showfaults command to obtain additional fault information.

   The fault is captured by ALOM, where the fault is logged, the Service Required LED is lit, and the faulty component is disabled.

   Example:

    ok #.
    sc> showfaults -v
    ID Time             FRU                 Fault
    1  APR 24 12:47:27  MB/CMP0/CH2/R0/D0   MB/CMP0/CH2/R0/D0 deemed faulty and disabled

   In this example, MB/CMP0/CH2/R0/D0 (DIMM 0 at J0701) is disabled. Until the faulty component is replaced, the
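The POST error syntax described above (an ERROR line, a hardware-under-test line, a MSG line, and a closing END_ERROR, each prefixed with core:strand>) lends itself to simple log scanning. The following Python sketch is a hypothetical helper, not a Sun tool: it assumes the "c:s>" prefix and the "KEY = value" layout shown in the example, and it collects each error block into a dictionary.

```python
import re

# Assumed line shape: "<core>:<strand>><text>", as in the POST example.
_LINE = re.compile(r"^(\d+):(\d+)>(.*)$")


def _value(body):
    """Return the text after the first '=' in a 'KEY = value' line."""
    return body.split("=", 1)[1].strip() if "=" in body else ""


def parse_post_errors(log):
    """Collect ERROR ... END_ERROR blocks from captured POST output."""
    errors, current = [], None
    for raw in log.splitlines():
        match = _LINE.match(raw.strip())
        if not match:
            continue
        core, strand = int(match.group(1)), int(match.group(2))
        body = match.group(3).strip()
        if body.startswith("ERROR:"):
            current = {"core": core, "strand": strand, "test": _value(body)}
        elif current is not None and body.startswith("H/W under test"):
            current["fru"] = _value(body)
        elif current is not None and body.startswith("MSG"):
            current["msg"] = _value(body)
        elif current is not None and body == "END_ERROR":
            errors.append(current)
            current = None
    return errors


sample = """\
0:0>Testing Memory Channel 0 Rank 1 Stack 0
0:0>ERROR: TEST = Data Bitwalk
0:0>H/W under test = MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above
0:0>MSG = Pin 3 failed on MB/CMP0/CH0/R1/D0/S0 (J0701)
0:0>END_ERROR
"""
for err in parse_post_errors(sample):
    print(err["test"], "->", err["fru"])
```

The "fru" field holds the FRU name that POST asks you to replace, which is the value to compare against the showfaults output.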
37. as the following:

    sc> showfaults -v
    ID Time             FRU                 Fault
    0  SEP 09 11:09:26  MB/CMP0/CH0/R0/D0   Host detected fault, MSGID: SUN4U-8000-2S UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86

   Use the UUID reported by the showfaults -v command to clear the fault:

    sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
    Clearing fault from all indicted FRUs...
    Fault cleared.

■ If the fault resulted in the DIMM being disabled, such as the following:

    sc> showfaults -v
    ID Time             FRU                 Fault
    1  OCT 13 12:47:27  MB/CMP0/CH0/R0/D0   MB/CMP0/CH0/R0/D0 deemed faulty and disabled

   Run the enablecomponent command to enable the FRU.

    sc> enablecomponent

7. Perform the following steps to verify that there are no faults:

   a. Set the virtual keyswitch to diag mode so that POST will run in service mode.

    sc> setkeyswitch diag

   b. Issue the poweron command.

    sc> poweron

   c. Switch to the system console to view POST output.

    sc> console

   Watch the POST output for possible fault messages. The following output is an indication that POST did not detect any faults:

    0:0>POST Passed all devices.
    0:0>
    0:0>DEMON: (Diagnostics Engineering MONitor)
    0:0>Select one of the following functions
    0:0>POST:Return to OBP.
    0:0>INFO:
    0:0>POST Passed all devices.
    0:0>Master set ACK for vbsc runpost command and spin...
38. breaking information about the system, including required software patches, updated hardware and compatibility information, and solutions to known issues. The product notes are available online at:
http://www.sun.com/documentation

Release Notes: The Solaris OS Release Notes contain important information about the Solaris operating system. The release notes are available online at:
http://www.sun.com/documentation

SunSolve Online: Provides a collection of support resources. Depending on the level of your service contract, you have access to Sun patches, the Sun System Handbook, the SunSolve knowledge base, the Sun Support Forum, and additional documents, bulletins, and related links. Access this site at:
http://sunsolve.sun.com

Predictive Self-Healing Knowledge Database: You can access the knowledge article corresponding to a self-healing message by taking the Sun Message Identifier (SUNW-MSG-ID) and entering it into the field on this page:
http://www.sun.com/msg

CHAPTER 2

Sun Fire T1000 Server Diagnostics

This chapter describes the diagnostics that are available for monitoring and troubleshooting the Sun Fire T1000 server. This chapter does not provide detailed troubleshooting procedures, but instead describes the Sun Fire T1000 server diagnostics facilities and how to use them.

This chapter is intended for technicians
39. e 27     To Remove the Clock Battery on the  Motherboard      Perform the procedures described in    Common Procedures for Parts  Replacement    on page 53       Using a small flat head screwdriver  carefully pry the battery  FIGURE 3 12  from the  motherboard        FIGURE 3 12 Removing the Clock Battery from the Motherboard    Sun Fire T1000 Server Service Manual   January 2006    v To Replace the Clock Battery on the  Motherboard    1  Unpackage the replacement battery     2  Press the new battery into the motherboard  FIGURE 3 13  with the   facing upward        FIGURE 3 13 Replacing the Clock Battery on the Motherboard  3  Perform the procedures described in    Common Procedures for Finishing Up    on  page 72     4  Use the ALOM setdate command to set the day and time     Use the setdate command before you power on the host system  For details about this  command  refer to the Sun Fire T1000 Server Advanced Lights Out Management  ALOM  Guide     Chapter 3 Removing and Replacing FRUs 71       Common Procedures for Finishing Up    v To Replace the Top Cover    1  Place the top cover on the chassis     Set the cover down so that the cover hangs over the rear of the server by about an  inch  2 5 cm      2  Slide the cover forward until it latches into place     72 Sun Fire T1000 Server Service Manual   January 2006    v To Reinstall the Server Chassis in the Rack    Refer to the Sun Fire T1000 System Installation Manual for installation instructions     After you have reins
40. e and for troubleshooting as described in the following sections.

Routine Sanity Check of the Hardware

POST tests critical hardware components to verify functionality before the system boots and accesses software. If POST detects an error, the faulty component is disabled automatically, preventing faulty hardware from impacting system operation.

Under normal operating conditions, the server is usually configured to run POST in maximum mode for all power-on or error-generated resets. This enables the system to initialize quickly and still have hardware checkups to ensure a healthy system.

Diagnosing the System Hardware

You can use POST as an initial diagnostic tool for the system hardware. In this case, configure POST to run in diagnostic service mode for maximum test coverage and verbose output.

▼ To Run POST

This procedure describes how to run POST when you want maximum testing, as in the case when you are troubleshooting a system.

1. Switch from the system console prompt to the SC console prompt by issuing the #. escape sequence, and type the command setsc diag_mode normal.

    ok #.
    sc> setsc diag_mode normal

2. Set the virtual keyswitch to diag so that POST will run in service mode.

    sc> setkeyswitch diag

3. Reset the system so that POST runs.

   The following example uses the powercycle command. For other methods, refer to the Sun Fire T1000 Server Administrat
41. ems  Inc  a les droits de propriete intellectuels relatants    la technologie qui est d  crit dans ce document  En particulier  et sans la  limitation  ces droits de propri  t   intellectuels peuvent inclure un ou plus des brevets am  ricains   num  r  s    http   www sun com patents et  un ou les brevets plus suppl  mentaires ou les applications de brevet en attente dans les Etats Unis et dans les autres pays     Ce produit ou document est prot  g   par un copyright et distribu   avec des licences qui en restreignent l   utilisation  la copie  la distribution  et la  d  compilation  Aucune partie de ce produit ou document ne peut   tre reproduite sous aucune forme  par quelque moyen que ce soit  sans  l autorisation pr  alable et   crite de Sun et de ses bailleurs de licence  s il y ena     Le logiciel d  tenu par des tiers  et qui comprend la technologie relative aux polices de caract  res  est prot  g   par un copyright et licenci   par des  fournisseurs de Sun     Des parties de ce produit pourront   tre d  riv  es des syst  mes Berkeley BSD licenci  s par l Universit   de Californie  UNIX est une marque  d  pos  e aux Etats Unis et dans d   autres pays et licenci  e exclusivement par X Open Company  Ltd     Sun  Sun Microsystems  le logo Sun  AnswerBook2  docs sun com  Java  OpenBoot  SunSolve  Sun VTS  Sun Fire  et Solaris sont des marques de  fabrique ou des marques d  pos  es de Sun Microsystems  Inc  aux Etats Unis et dans d   autres pays     Toutes les marques SPARC
42. erver  Installation Guide and Sun Fire T1000 Server Administration Guide     Use this flow chart to understand what diagnostics are available to troubleshoot  faulty hardware  and use TABLE 2 1 to find more information about each diagnostic  in this chapter     For many faults  service can be deferred  either because the faulty component has  been asr d out  the fault is being corrected  or the fault is predictive    Sun Fire T1000 Server Service Manual   January 2006        Suspect  faulty  hardware               3   Are any  faults reported by           1    Numbers in this flowchart       s the power  supply  fault LED  lit              2     Connect power  cord or replace  faulty power    correspond to the Action  numbers in Table 2 1              the showfaults  command                      Isa  fault message  ID  MSG ID   displayed     No          5  Enter the  message ID into  the Sun Knowl    edge Article   web site for  recommended  actions                article recom                FRU        supply   9   Do the  Solaris logs  No indicate a No  faulty FRU   No 10   Identify and  Br replace faulty             ment command  eports overtemp             8     Find cause of  overtemp cond                 mend a FRU  replacement         Yes                        11   Does POST  report any faulty  devices        12   Does SunVTS  report any faulty  devices                 13   Perform recom   mended corrective  actions  If needed   contact Sun for  support             FI
43. eters and POST Modes

    Parameter        Normal Diagnostic Mode       No POST      Diagnostic     Keyswitch Diagnostic
                     (default settings)           Execution    Service Mode   preset values
    diag_mode        normal                       off          service        normal
    setkeyswitch*    normal                       normal       normal         diag
    diag_level       max                          n/a          max            max
    diag_trigger     power-on-reset error-reset   none         all-resets     all-resets
    diag_verbosity   normal                       n/a          max            max

Description of POST execution:

■ Normal Diagnostic Mode: This is the default POST configuration and provides a reasonable compromise between testing thoroughness and quick server initialization.
■ No POST Execution: POST does not run, resulting in quick system initialization, but this is not a suggested configuration.
■ Diagnostic Service Mode: POST runs the full spectrum of tests with the maximum output displayed.
■ Keyswitch Diagnostic preset values: POST runs the full spectrum of tests with the maximum output displayed.

* The setkeyswitch parameter, when set to diag, overrides all the other ALOM POST variables.

▼ To Change POST Parameters

1. Access the ALOM sc> prompt.
   At the console, issue the #. key sequence:

    #.

2. At the ALOM sc> prompt, use the setsc command to set the POST parameter.
   Example:

    sc> setsc diag_mode service

   The setkeyswitch parameter is a command that sets the virtual keyswitch, so it does not use the setsc command. Example:

    sc> setkeyswitch diag

Reasons to Run POST

You can use POST for basic sanity checking of the server hardwar
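The parameter table above can be turned into the corresponding setsc command lines. The following Python sketch is an illustration rather than a supported tool: it encodes only the "No POST Execution" and "Diagnostic Service Mode" columns, omits parameters listed as n/a, and assumes the hyphenated value spellings (for example all-resets). The dictionary and function names are hypothetical.

```python
# Values copied from the POST modes table; "n/a" entries are omitted.
# The hyphenated spelling of diag_trigger values is an assumption.
POST_MODES = {
    "off": {
        "diag_mode": "off",
        "diag_trigger": "none",
    },
    "service": {
        "diag_mode": "service",
        "diag_level": "max",
        "diag_trigger": "all-resets",
        "diag_verbosity": "max",
    },
}


def setsc_commands(mode):
    """Return the setsc command lines for one POST mode from the table."""
    settings = POST_MODES[mode]
    return ["setsc {} {}".format(name, value) for name, value in settings.items()]


for line in setsc_commands("service"):
    print("sc> " + line)
```

The virtual keyswitch is deliberately left out of the mapping because, as noted above, setkeyswitch is its own command and is not set through setsc.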
44. ey actually occur. This technology is incorporated into both the hardware and software of the Sun Fire T1000 server.

At the heart of the predictive self-healing capabilities is the Solaris Fault Manager, a new service that receives data relating to hardware and software errors, and automatically and silently diagnoses the underlying problem. Once a problem is diagnosed, a set of agents automatically respond by logging the event and, if necessary, take the faulty component offline. By automatically diagnosing problems, business-critical applications and essential system services can continue uninterrupted in the event of software failures or major hardware component failures.

Chassis Identification

FIGURE 1-3 and FIGURE 1-4 show the physical characteristics of the Sun Fire T1000 server.

FIGURE 1-3 Sun Fire T1000 Server Front Panel
(Callouts: Locator LED/button, Power OK LED and Power On/Off button, Service Required LED)

FIGURE 1-4 Sun Fire T1000 Server Rear Panel
(Callouts: Ethernet ports, PCI-E slot, power supply LEDs, Locator LED/button, Service Required LED, Power OK LED, DB-9 serial port, system console ports)

Additional Service-Related Information

In addition to this document, the following resources are available to help you keep your server running optimally:

Product Notes: The Sun Fire T1000 Server Product Notes (819-3244) contain late-
45. ine at:
http://www.sun.com/documentation

Safety Symbols

The following symbols might appear in this document. Note their meanings.

Caution: There is a risk of personal injury and equipment damage. To avoid personal injury and equipment damage, follow the instructions.

Caution: Hot surface. Avoid contact. Surfaces are hot and might cause personal injury if touched.

Caution: Hazardous voltages are present. To reduce the risk of electric shock and danger to personal health, follow the instructions.

Electrostatic Discharge Safety

Electrostatic discharge (ESD) sensitive devices, such as the motherboard, PCI cards, hard drives, and memory cards, require special handling.

Caution: The boards and hard drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy components. Do not touch the components along their connector edges.

Use an Antistatic Wrist Strap

Wear an antistatic wrist strap and use an antistatic mat when handling components such as drive assemblies, boards, or cards. When servicing or removing server components, attach an antistatic strap to your wrist and then to a metal area on the chassis. Do this after you disconnect the power cords from the server. Following this practice equalizes the electrical potentials between you 
46. ion Guide.

sc> powercycle
Are you sure you want to powercycle the system [y/n]? y
Powering host off at MON JAN 10 02:52:02 2000
Waiting for host to Power Off; hit any key to abort.
SC Alert: SC Request to Power Off Host.
SC Alert: Host system has shut down.
Powering host on at MON JAN 10 02:52:13 2000
SC Alert: SC Request to Power On Host.

4. Switch to the system console to view the POST output.

sc> console

Example of POST output (Note: Some output omitted):

SC Alert: Host system has reset
0:0>
0:0>ERIE Integrated POST 4 x 0 build 17 2005 08 30 11 25
    export common source firmware re ontario fireball fio build 17 post Niagara erie integrated (firmware re)
0:0>Copyright 2005 Sun Microsystems, Inc. All rights reserved
    SUN PROPRIETARY/CONFIDENTIAL.
    Use is subject to license terms.
0:0>VBSC selecting POST IO Testing.
0:0>VBSC enabling threads: 1
0:0>VBSC setting verbosity level 3
0:0>Start Selftest.....
0:0>Init CPU
0:0>Master CPU Tests Basic.....
0:0>CPU 0
0:0>
0:0>Test 6291456 bytes at 00000001 00000000 Memory Channel 0 3 Rank 0 Stack 1
0:0>IO-Bridge unit 1 ilu init test
0:0>IO-Bridge unit 1 tlu init test
0:0>IO-Bridge unit 1 lpu init test
0:0>IO-Bridge unit 1 link train port B
0:0
47. issipating less heat than conventional processor designs.

Depending on the model purchased, the processor has six or eight UltraSPARC cores. Each core equates to a 64-bit execution pipeline capable of running four threads. The result is that the 8-core processor handles up to 32 active threads concurrently.

Additional processor components, such as the DDR2 memory controllers, L1 cache, L2 cache, and the Jbus I/O interface, have been carefully tuned for optimal performance.

FIGURE 1-2 shows the major components in the Sun Fire T1000 server.

[FIGURE 1-2 Sun Fire T1000 Server Components. Callouts: PCI-E socket and slot, motherboard and chassis assembly, UltraSPARC T1 multicore processor]

Performance Enhancements

The Sun Fire T1000 server introduces several new technologies with its sun4v architecture and multicore, multithreaded UltraSPARC T1 processor.

TABLE 1-1 lists feature specifications for the Sun Fire T1000 server.

TABLE 1-1 Sun Fire T1000 System Features

Feature          Description
Processor        1 UltraSPARC T1 multicore processor (6 or 8 cores)
Memory           8 slots that can be populated with one of the following types of DDR-2 DIMMs:
                 - 512 MB (4 GB maximum)
                 - 1 GB (8 GB maximum)
                 - 2 GB (16 GB maximum)
Other features   DB-9 serial port, internal hard disk drive, cooling, PCI interface, power, firmware, operating system, other software
Ethernet ports   4 ports  1
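The thread and memory maximums quoted above follow from simple multiplication. The short Python sketch below reproduces that arithmetic; the constant and function names are invented for illustration, and the figures (four threads per core, eight DIMM slots, all slots filled with the same DIMM size) come from the text and table above.

```python
THREADS_PER_CORE = 4   # each core runs four threads
DIMM_SLOTS = 8         # the server has eight DIMM slots


def max_threads(cores):
    """Concurrent hardware threads for a 6-core or 8-core processor."""
    if cores not in (6, 8):
        raise ValueError("the processor is offered with 6 or 8 cores")
    return cores * THREADS_PER_CORE


def max_memory_gb(dimm_size_mb):
    """Total memory in GB when every slot holds a DIMM of the given size."""
    return DIMM_SLOTS * dimm_size_mb / 1024


print(max_threads(8))        # 8 cores x 4 threads
print(max_memory_gb(2048))   # 8 slots x 2 GB
```

The 8-core case gives the 32 threads stated in the text, and the 512 MB, 1 GB, and 2 GB DIMM sizes give the 4 GB, 8 GB, and 16 GB maximums listed in the table.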
48. le on or through such sites or resources.

Contacting Sun Technical Support

If you have technical questions about this product that are not answered in this document, go to:

http://www.sun.com/service/contacting

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:

http://www.sun.com/hwdocs/feedback

Please include the title and part number of your document with your feedback:

Sun Fire T1000 Server Service Manual, part number 819-3248-10

CHAPTER 1

Sun Fire T1000 Server Overview

This chapter provides an overview of the features of the Sun Fire T1000 server.

The following topics are covered:

- "Sun Fire T1000 Server Features" on page 1
- "Chassis Identification" on page 6

Sun Fire T1000 Server Features

The Sun Fire T1000 server (FIGURE 1-1) is a high-performance, entry-level server that is highly scalable and very reliable.

FIGURE 1-1 Sun Fire T1000 Server

Chip Multithreaded (CMT) Multicore Processor and Memory Technology

The UltraSPARC T1 multicore processor is the basis of the Sun Fire T1000 server. The UltraSPARC T1 processor is based on chip multithreading (CMT) technology that is optimized for highly threaded transactional processing. The UltraSPARC T1 processor improves throughput while using less power and d
49. mmands

The Automatic System Recovery (ASR) feature enables the server to automatically configure failed components out of operation until they can be replaced. In the Sun Fire T1000 server, the following components are managed by the ASR feature:

- UltraSPARC T1 processor strands
- Memory DIMMs
- I/O bus

The database that contains the list of disabled components is called the ASR blacklist (asr-db).

In most cases, POST and ALOM automatically manage the disabling of faulty components. When the faulty FRU is replaced, it must be manually enabled.

Example: A component appears faulty and is automatically disabled. The problem is due to a loose connector, and no FRU replacement is required to fix the problem. ALOM, which would normally detect a FRU replacement and enable the FRU, does not do so. In this case, after the loose cable is reseated, the disabled component must be manually enabled.

The ASR commands (TABLE 2-7) enable you to view, and manually add or remove, components from the ASR blacklist. These commands are run from the ALOM sc> prompt.

TABLE 2-7 ASR Commands

Command                    Description
showcomponent              Displays system components and their current state.
enablecomponent asrkey     Removes a component from the asr-db blacklist, where asrkey is the component to enable.
disablecomponent asrkey    Adds a component to the asr-db blacklist, where asrkey is the component to disable.
clea
50. n xhost + test-system

where test-system is the name of the Sun Fire T1000 server you plan to test.

3. Remotely log in to the Sun Fire T1000 server as superuser.
   Use a command such as rlogin or telnet.

4. Start SunVTS software. Type:

   /opt/SUNWvts/bin/sunvts -display display-system:0

   where display-system is the name of the machine through which you are remotely logged in to the Sun Fire T1000 server.

   If you have installed SunVTS software in a location other than the default /opt directory, alter the path in this command accordingly.

   The SunVTS GUI appears on the display system's screen.

[FIGURE 2-6 The SunVTS GUI Screen. Test categories shown: Processor(s), Memory, Cryptography, SCSI-Devices(mpt0), OtherDevices, Network, USB Devices]

5. Expand the test lists to see the individual tests.
   The test selection area lists tests in categories, such as Network, as shown in FIGURE 2-7. To expand a category, left-click the icon to the left of the category name. FIGURE 2-7 shows the expand category icon, which looks like a plus sign and appears to the left of the category name.

[Figure residue: Processor(s), Memory, Cryptography, SCSI-Devices(mpt0), OtherDevices, Network: bge0(nettest), bge1(netlbtest)]
51. nents offline if needed and blacklist them in the asr-db.

- Solaris OS predictive self-healing (PSH): Continuously monitors the health of the CPU and memory, and works with ALOM to take a faulty component offline if needed.

- Log files and console messages: Provide the standard Solaris OS log files and investigative commands that can be accessed and displayed on the device of your choice.

- SunVTS: An application you can run that exercises the system, provides hardware validation, and discloses possible faulty components with recommendations for repair.

The LEDs, ALOM, Solaris OS PSH, and many of the log files and console messages are integrated. For example, when the Solaris PSH software detects a fault, it displays the fault, logs it, and passes information to ALOM, where it is also logged. Depending on the fault, one or more LEDs might be illuminated.

The diagnostic flowchart in FIGURE 2-1 and TABLE 2-1 describe an approach for using the server's diagnostics that is likely to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting, so you might not follow this flow step by step.

The flowchart assumes that you have already performed some rudimentary troubleshooting, such as verification of proper installation, visual inspection of cables and power, and possibly a reset of the server. For details, refer to the Sun Fire T1000 S
52. nfigure this Ethernet test.

Start testing.

Click the Start button that is located at the top left of the SunVTS window. Status and error messages appear in the test messages area located across the bottom of the window. You can stop testing at any time by clicking the Stop button.

During testing, SunVTS software logs all status and error messages. To view these, click the Log button or select Log Files from the Reports menu. This opens a log window from which you can choose to view the following logs:

- Information: Detailed versions of all the status and error messages that appear in the test messages area.

- Test Error: Detailed error messages from individual tests.

- VTS Kernel Error: Error messages pertaining to SunVTS software itself. You should look here if SunVTS software appears to be acting strangely, especially when it starts up.

- UNIX Messages (/var/adm/messages): A file containing messages generated by the operating system and various applications.

- Log Files (/var/opt/SUNWvts/logs): A directory containing the log files.

For further information, refer to the documents that accompany the SunVTS software.

CHAPTER 3

Removing and Replacing FRUs

This chapter describes how to remove and replace field-replaceable units (FRUs) in the Sun Fire T1000 server.

The following topics are covered:

- Safety
53. nnection to a terminal or terminal server. ALOM provides a command-line interface that you can use to remotely administer geographically distributed or physically inaccessible machines. In addition, ALOM enables you to run diagnostics (such as POST) remotely that would otherwise require physical proximity to the server's serial port.

You can configure ALOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM. The ALOM circuitry runs independently of the server, using the server's standby power. Therefore, ALOM firmware and software continue to function when the server operating system goes offline or when the server is powered off. ALOM monitors the following Sun Fire T1000 server components:

- Hard disk drive status
- Enclosure thermal conditions
- Power supply status
- Voltage levels
- Faults detected by POST (Power-On Self-Test)
- Solaris OS Predictive Self-Healing (PSH) diagnostic facilities

For information about configuring and using the ALOM system controller, refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) Guide.

System Reliability, Availability, and Serviceability

Reliability, availability, and serviceability (RAS) are aspects of a system's design that affect its ability to operate continuously and to minimize the time necessary to service the system. Reliability refers to a system's ability to operate continuously without failures and to maintain data in
54. ons that are not covered here.

Insert the PCI Express card into the connector slot and retention bracket (FIGURE 3-5) on the PCI Express riser board.

On the rear of the chassis, engage the retention latch (FIGURE 3-4) to secure the card to the chassis.

Perform the procedures described in "Common Procedures for Finishing Up" on page 72.

Run the Solaris prtdiag command to verify that the PCI Express card is being recognized by the system.

To Remove the Fan Tray Assembly

1. Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.

2. Disconnect the fan power cable from the motherboard.

3. Release the tabs (FIGURE 3-6) on both sides of the fan assembly.

[FIGURE 3-6 Removing the Fan Tray Assembly. Callout: Fan tray assembly]

4. Remove the fan assembly from the sheet metal mounting brackets.

To Replace the Fan Tray Assembly

1. Unpackage the replacement fan tray assembly and place it on an antistatic mat.

2. Align the fan tray assembly with the sheet metal mounting brackets and slide it in until the tabs on each side lock it into place.

3. Reconnect the fan power cable to the motherboard.

4. Perform the procedures described in "Common Procedures for Finishing Up" on page 72.

5. Verify that the Service required and Locator LEDs are not lit.

To Remove the Power Supply

1. Perform the procedures de
55. rasrdb Removes all entries from the asr-db blacklist.

The showcomponent command may not report all blacklisted DIMMs.

Note: The components (asrkeys) vary from system to system, depending on how many cores and how much memory are present. Use the showcomponent command to see the asrkeys on a specific system.

Note: A reset or powercycle is required after disabling or enabling a component. If component status is changed with power on, there is no effect on the system until the next reset or powercycle.

The following examples show the output of these commands.

To Run the showcomponent Command

The showcomponent command displays the system components (asrkeys) and reports their status.

1. At the sc> prompt, enter the showcomponent command.

Example with no disabled components:

sc> showcomponent
Keys:
...
ASR state: clean

Example showing a disabled component:

sc> showcomponent
Keys:
...
ASR state: Disabled Devices
MB/CMP0/CH3/R1/D1 : dimm8 deemed faulty

To Run the disablecomponent Command

The disablecomponent command disables a component by adding it to the ASR blacklist.

1. At the sc> prompt, enter the disablecomponent command.

sc> disablecomponent MB/CMP0/CH3/R1/D1
sc> SC Alert: MB/CMP0/CH3/R1/D1 disabled

2. After receiving confirmation that the disablecomponent command is complete, reset the server so that the ASR
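The interplay between the blacklist and the reset requirement can be pictured with a toy model in Python. This is purely a sketch for reasoning about the workflow: the class `AsrBlacklistModel` and its attributes are hypothetical, the method names simply borrow the ALOM command names, and the returned status text only imitates the examples above rather than reproducing real ALOM output.

```python
class AsrBlacklistModel:
    """Toy model: blacklist edits take effect only at the next reset or powercycle."""

    def __init__(self):
        self.blacklist = set()   # what the asr-db blacklist currently lists
        self.in_effect = set()   # what the running system actually has disabled

    def disablecomponent(self, asrkey):
        self.blacklist.add(asrkey)

    def enablecomponent(self, asrkey):
        self.blacklist.discard(asrkey)

    def clearasrdb(self):
        self.blacklist.clear()

    def reset(self):
        # A reset (or powercycle) is when the blacklist is applied to the system.
        self.in_effect = set(self.blacklist)

    def showcomponent(self):
        if not self.blacklist:
            return "ASR state: clean"
        return "ASR state: Disabled Devices\n" + "\n".join(sorted(self.blacklist))


model = AsrBlacklistModel()
model.disablecomponent("MB/CMP0/CH3/R1/D1")
print(model.showcomponent())   # the DIMM is listed in the blacklist
print(model.in_effect)         # still empty until reset() is called
```

The point of separating `blacklist` from `in_effect` is to make the second note above concrete: a change made with power on is recorded at once but does nothing until the next reset or powercycle.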
56. rform Electrostatic Discharge (ESD) Prevention Measures" on page 56.

Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.

Locate the DIMM (FIGURE 4-8) that you want to replace.

Use FIGURE 3-11 and TABLE 3-1 to identify the DIMM you want to remove.

Make note of the DIMM location so you can install the replacement DIMM in the same socket.

Push down on the ejector levers on each side of the DIMM until the DIMM is released.

[FIGURE 3-11 DIMM Locations. Front to back: sockets J0501, J0701, J0601, J0801 (Channel 0) and J1301, J1101, J1201, J1001 (Channel 3), with Rank 0 and Rank 1, DIMM 0 and DIMM 1 labels]

TABLE 3-1 maps the DIMM names that are displayed in faults to the socket numbers that identify the location of the DIMM on the motherboard.

TABLE 3-1 DIMM Names and Socket Numbers

Socket Number    DIMM Name Used in Messages*
J0501            CH0/R0/D0
J0601            CH0/R0/D1
J0701            CH0/R1/D0
J0801            CH0/R1/D1
J1001            CH3/R0/D0
J1101            CH3/R0/D1
J1201            CH3/R1/D0
J1301            CH3/R1/D1

* DIMM names in messages are displayed with the full name, such as MB/CMP0
57. rt No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03e620

FRU_PROM at MB/CMP0/CH3/R0/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d040920

FRU_PROM at MB/CMP0/CH3/R1/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03ec27

FRU_PROM at MB/CMP0/CH3/R1/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d040924

sc>

If you do not provide a command-line argument, all FRUs are listed.

Running POST

Power-on self-test (POST) is a group of PROM-based tests that run when the server is powered on or reset. POST checks the basic integrity of the critical hardware components in the server (motherboard, memory, and I/O buses).

If POST detects a faulty component, it is disabled automatically. If the system is capable of running without the disabled component, the system will boot when POST is complete 
58. s
/ManR/Initial HW Dash Level: 02
/ManR/Initial HW Rev Level: 01
/ManR/Shortname: PS
/SpecPartNo: 885-0407-02

FRU_PROM at MB/CMP0/CH0/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03fe27

FRU_PROM at MB/CMP0/CH0/R0/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03  623

FRU_PROM at MB/CMP0/CH0/R1/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03fc26

FRU_PROM at MB/CMP0/CH0/R1/D1/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Part No: 72T256220HR3.7A
/SPD/Vendor Serial No: d03eb26

FRU_PROM at MB/CMP0/CH3/R0/D0/SEEPROM
/SPD/Timestamp: MON OCT 03 12:00:00 2005
/SPD/Description: DDR2 SDRAM, 2048 MB
/SPD/Manufacture Location:
/SPD/Vendor: Infineon (formerly Siemens)
/SPD/Vendor Pa
59. s below limits.

Using ALOM For Diagnosis and Repair Verification

The Sun Advanced Lights Out Manager (ALOM) is a system controller on the Sun Fire T1000 server motherboard that enables you to remotely manage and administer your server.

ALOM enables you to run diagnostics remotely, such as power-on self-test (POST), that would otherwise require physical proximity to the server's serial port. You can also configure ALOM to send email alerts of hardware failures, hardware warnings, and other events related to the server or to ALOM.

The ALOM circuitry runs independently of the server, using the server's standby power. Therefore, ALOM firmware and software continue to function when the server operating system goes offline or when the server is powered off.

Note: For comprehensive ALOM information, refer to the Sun Fire T1000 Server Advanced Lights Out Manager (ALOM) guide.

Faults detected by ALOM, POST, and the Solaris Predictive Self-healing (PSH) technology are forwarded to ALOM for fault handling (FIGURE 2-4).

In the event of a system fault, ALOM ensures that the Service required LED is lit, FRU ID PROMs are updated, the fault is logged, and alerts are displayed.

[Figure: Environmentals, POST, and Solaris PSH feed the ALOM fault manager, which drives the Service Required LED, FRU LEDs, FRUID PROMs, logs, and alerts]

FIGURE 2-4 ALOM Fault Managemen
60. Sun Microsystems

Sun Fire T1000 Server Service Manual

Sun Microsystems, Inc.
www.sun.com

Part No. 819-3248-10
January 2006, Revision A

Submit comments about this document at: http://www.sun.com/hwdocs/feedback

Copyright 2006 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.

Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.

This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.

Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.

Sun, Sun Microsystems, the Sun logo, Answerbook2, docs.sun.com, Java, OpenBoot, SunSolve, SunVTS, Sun Fire, and Solaris are tr
61. scribed in "Common Procedures for Parts Replacement" on page 53.

2. Disconnect the power cable from the motherboard and pull it through the midwall.

3. Loosen the fastener (FIGURE 3-7) on the front of the power supply and slide the power supply forward to remove it from the chassis.

FIGURE 3-7 Removing the Power Supply

To Replace the Power Supply

1. Unpackage the replacement power supply.

2. Slide the power supply into the chassis and engage the two alignment pins in the rear of the chassis that mate with the power supply.

3. Tighten the fastener (FIGURE 3-8) to lock the power supply into place in the chassis.

4. Redress the power cable through the midwall in the chassis and connect the cable to the motherboard.

5. Perform the procedures described in "Common Procedures for Finishing Up" on page 72.

6. Verify that the amber Fault LED on the replaced power supply and the Service required LED are not lit.

7. At the sc> prompt, issue the showenvironment command to verify the status of the power supply.

[FIGURE 3-8 Replacing the Power Supply. Callouts: Fastener, Power supply]

To Remove the Hard Drive

1. Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.

2. Disconnect the cable from the hard drive.

3. Unsnap the catches on the latches (FIGURE 3-9) on the front of the disk
62. ssembly" on page 60

Item 4: Power supply unit (PS)
  Replacement instructions: "To Remove the Power Supply" on page 61
  Description: The power supply provides 3.3 Vdc standby power at 3   3 Amps and 12 Vdc at 25 Amps.
  Location: PS0

Item 5: Hard drive
  Replacement instructions: "To Remove the Hard Drive" on page 63
  Description: SATA disk drive, 3.5-inch form factor
  Location: HD0

Item 6: PCI Express card slot
  Replacement instructions: "To Remove the Optional PCI Express Card" on page 58
  Description: Optional add-on express card
  Location: PCI0

Item 7: Clock battery
  Replacement instructions: "To Remove the Clock Battery on the Motherboard" on page 70
  Description: Battery is located on the motherboard.
  Location: SC/BAT

Item 8: SEEPROM
  Replacement instructions: Remove and replace the socketed SEEPROM.
  Description: The socketed SEEPROM contains the MAC address and system configuration information.
  Location: MB/SEEPROM

TABLE A-2 Location of DIMMs

Connector Number    Location
J0501               MB/CMP0/CH0/R0/D0
J0601               MB/CMP0/CH0/R0/D1
J0701               MB/CMP0/CH0/R1/D0
J0801               MB/CMP0/CH0/R1/D1
J1001               MB/CMP0/CH3/R0/D0
J1101               MB/CMP0/CH3/R0/D1
J1201               MB/CMP0/CH3/R1/D0
J1301               MB/CMP0/CH3/R1/D1
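When a fault message names a DIMM, the connector table above is effectively a lookup. A minimal Python sketch of that lookup follows; the dictionary and the helper `socket_for` are illustrative names, not part of any Sun tool, and the fourth socket is assumed to be J0801, as it is labeled in the DIMM names table and DIMM locations figure elsewhere in this manual.

```python
# Connector number -> full DIMM name, transcribed from the DIMM location table.
SOCKET_TO_DIMM = {
    "J0501": "MB/CMP0/CH0/R0/D0",
    "J0601": "MB/CMP0/CH0/R0/D1",
    "J0701": "MB/CMP0/CH0/R1/D0",
    "J0801": "MB/CMP0/CH0/R1/D1",  # assumed label, see lead-in
    "J1001": "MB/CMP0/CH3/R0/D0",
    "J1101": "MB/CMP0/CH3/R0/D1",
    "J1201": "MB/CMP0/CH3/R1/D0",
    "J1301": "MB/CMP0/CH3/R1/D1",
}

# Reverse mapping: DIMM name as shown in messages -> motherboard socket.
DIMM_TO_SOCKET = {name: socket for socket, name in SOCKET_TO_DIMM.items()}

PREFIX = "MB/CMP0/"


def socket_for(dimm_name):
    """Return the socket for a DIMM name given in full or short (CHx/Ry/Dz) form."""
    name = dimm_name.strip().upper()
    if not name.startswith(PREFIX):
        name = PREFIX + name
    return DIMM_TO_SOCKET[name]   # raises KeyError for an unknown name


print(socket_for("MB/CMP0/CH3/R1/D1"))
```

For instance, the DIMM reported as MB/CMP0/CH3/R1/D1 in the showcomponent example resolves to socket J1301.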
63. stem Administration Guide

Advanced Lights Out Management (ALOM) CMT v1.1 Guide

Descriptions:
- Site planning information for the Sun Fire T1000 server
- Late-breaking information about the server. The latest notes are posted at: http://www.sun.com/documentation
- Provides an overview of the features of this server
- Information about where to find documentation to get your system installed and running quickly
- Detailed rack mounting, cabling, power-on, and configuration information
- How to perform administrative tasks that are specific to the Sun Fire T1000 server
- How to use the Advanced Lights Out Manager (ALOM) software on the Sun Fire T1000 server

Part numbers: 819-3246, 819-3244, 819-3247, 819-3249, 819-3248, 819-3250, 819-3246

Accessing Sun Documentation

You can view, print, or purchase a broad selection of Sun documentation, including localized versions, at:

http://www.sun.com/documentation

Third-Party Web Sites

Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are availab
64. stem is running at a minimum level in standby and is ready to be quickly returned to full function. The service processor is running.
- Slow blink: Indicates that a normal transitory activity is taking place. This could indicate that the system diagnostics are running, or that the system is booting.

The Power On/Off button turns the server on and off. There is no Power On/Off button on the rear panel.

TABLE 2-2 Front and Rear Panel LEDs (continued)

LED                            Color    Description
Ethernet Activity LEDs         Green    These LEDs indicate that there is activity on the associated net(s).
Ethernet Link LEDs             Yellow   These LEDs indicate that the system is linked to the associated net(s).
System console Activity LED    Green    This LED indicates that there is activity on the associated system console.
System console Link LED        Yellow   This LED indicates that the system is linked to the associated system console.

* Provided on the front and rear panel.

Power Supply LEDs

The power supply LEDs (TABLE 2-3) are located on the back of the power supply.

TABLE 2-3 Power Supply LEDs

Name     Color    Description
Fault    Amber    On: Power supply has detected a failure.
                  Off: Normal operation.
DC OK    Green    On: Normal operation. DC output voltage is within normal limits.
                  Off: Power is off.
AC OK    Green    On: Normal operation. Input power is within normal limits.
                  Off: No input voltage, or input voltage i
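The power supply LED table reads naturally as a decision table. The Python sketch below encodes it that way as a reading aid; the function name and the wording of the returned strings are made up for the example, and the "below limits" wording for the AC OK off state is taken from the continuation of the table.

```python
def describe_power_supply(fault_on, dc_ok_on, ac_ok_on):
    """Translate the three power supply LED states into the table's descriptions."""
    return [
        "Fault (amber): "
        + ("power supply has detected a failure" if fault_on else "normal operation"),
        "DC OK (green): "
        + ("DC output voltage is within normal limits" if dc_ok_on else "power is off"),
        "AC OK (green): "
        + ("input power is within normal limits"
           if ac_ok_on
           else "no input voltage, or input voltage is below limits"),
    ]


# A healthy, powered-on supply: Fault off, DC OK on, AC OK on.
for line in describe_power_supply(False, True, True):
    print(line)
```

A supply that is plugged in while the server is off would show AC OK on with DC OK off, which the table describes as normal input power with the DC output off.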
65. system can boot using memory that was not disabled.

Note: You can use ASR commands to display and control disabled components. See "Managing System Components with Automatic System Recovery Commands" on page 40.

Using the Solaris Predictive Self-Healing Feature

The Solaris OS predictive self-healing technology enables the Sun Fire T1000 server to diagnose problems while the Solaris OS is running, and to mitigate many serious problems before they occur.

The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time and runs in the background to monitor the system. If a component generates an error, the daemon handles the error by correlating it with data from previous errors and other related information to diagnose the problem. Once the problem is diagnosed, the fault manager daemon assigns it a universally unique identifier (UUID) that distinguishes the problem across any set of systems. When possible, the fault manager daemon initiates steps to self-heal the failed component and take the component offline. The daemon also logs the fault to the syslogd daemon and provides a fault notification with a message ID (MSGID). You can use the message ID to get additional information about the problem from Sun's knowledge article database.

The predictive self-healing technology covers the following Sun Fire T1000 server components:

- UltraSPARC T1 multicore processor
- Memory
- I/O bus
66. systems, it is imperative that the System Controller be checked for evidence of a faulty system board to ensure that the appropriate service action is performed.

Use the fmdump(1M) command

fmdump -v -u <event-id>

to view the results of diagnosis and the specific Field Replaceable Unit (FRU) identified for repair.

The event-id can be found in the EVENT-ID field of the message. For example:

EVENT-ID: 39b30371   009 c76c 90ee b245784d2277

Details: The Message ID SUN4U-8000-2S indicates the Solaris Fault Manager has received reports that multiple correctable (single-bit) errors associated with a memory DIMM module have been detected. Diagnosis applied to the error reports has determined that a fault requiring repair action is present.

A service case should be opened and time scheduled to replace the FRU (identified in the fmdump(1M) output) on which the suspect DIMM is located.

If Customer Enabled Services apply to the product, then refer to the FRU replacement procedures in the appropriate service manual.

c. Follow the suggested actions to repair the fault.

Collecting Information From Solaris OS Files and Commands

With the Solaris OS running on the Sun Fire T1000 server, you have the full complement of Solaris OS files and commands available for collecting information and for troubleshooting.

If POST, ALOM, or the Solaris PSH features did not
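Copying the event ID from a fault message into the fmdump command is easy to get wrong by hand. The following Python sketch shows one way to automate that step. It assumes the message carries the ID as `EVENT-ID: <uuid>` in the usual hyphenated 8-4-4-4-12 hexadecimal form; the function name and the sample UUID are invented for the example, and the sketch only builds the argument list without running anything.

```python
import re

# Assumed layout of the field: "EVENT-ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
EVENT_ID_RE = re.compile(
    r"EVENT-ID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def fmdump_command(message):
    """Build the fmdump argument list for the event ID found in a fault message."""
    match = EVENT_ID_RE.search(message)
    if match is None:
        raise ValueError("no EVENT-ID field found in message")
    return ["fmdump", "-v", "-u", match.group(1)]


# Made-up message text and UUID, for illustration only.
sample = "SUNW-MSG-ID: SUN4U-8000-2S\nEVENT-ID: 01234567-89ab-cdef-0123-456789abcdef"
print(" ".join(fmdump_command(sample)))
```

Returning a list rather than a single string keeps the result safe to hand to a process launcher without shell quoting, should you choose to run it on the server.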
67. t

ALOM sends alerts to all ALOM users that are logged in, sending the alert through email to a configured email address, and writing the event to the ALOM event log.

- Fault recovery: The system automatically detects that the fault condition is no longer present. ALOM extinguishes the Service required LED and updates the FRU's PROM, indicating that the fault is no longer present.
- Fault repair: The fault has been repaired by human intervention. In most cases, ALOM detects the repair and extinguishes the Service required LED. If ALOM does not perform these actions, you must perform them manually with the clearfault or enablecomponent commands.

ALOM can detect the removal of a FRU, in many cases even if the FRU is removed while ALOM is powered off. This enables ALOM to know that a fault diagnosed to a specific FRU has been repaired. The ALOM clearfault command enables you to manually clear certain types of faults without a FRU replacement, or if ALOM was unable to automatically detect the FRU replacement. ALOM does not automatically detect hard drive replacement.

Persistent environmental faults can automatically recover. A temperature that is exceeding a threshold may return to normal limits. An unplugged power supply can be plugged in, and so on. Recovery of environmental faults is automatically detected. Recovery events are reported using one of two forms:

fru at
68. talled the server chassis in the rack, reconnect all cables that you disconnected when you removed the chassis from the rack.

To Apply Power to the Server

1. Reconnect the power cord to the power supply.

Note: As soon as the power cord is connected, standby power is applied. Depending on the configuration of the firmware, the system might boot.

"Safety Information" on page 43

APPENDIX A

Field Replaceable Units (FRUs)

FIGURE A-1 shows the locations of the field replaceable units (FRUs) in the Sun Fire T1000 server. TABLE A-1 lists the FRUs. TABLE A-2 lists the locations of the DIMMs. The Channel Rank DIMM locations

FIGURE A-1 Field Replaceable Units

TABLE A-1 Sun Fire T1000 Server FRU List

Item No. | CRU | Replacement Instructions | Description | Location
1 | Motherboard and chassis assembly | "To Remove the Motherboard and Chassis" on page 68 | The motherboard and chassis are replaced as a single assembly. The motherboard is provided in different configurations to accommodate the different processor models (6 core and 8 core). | MB
2 | DIMMs | "To Remove DIMMs" on page 65 | Can be ordered in the following sizes: 512 MB, 1 GB, 2 GB | See TABLE A-2 and FIGURE 3-11
3 | Fan assembly | "To Remove the Fan Tray A… | A single assembly containing 4 fans. | FAN TRAY
69. tegrity. System availability refers to the ability of a system to recover to an operational state after a failure, with minimal impact. Serviceability relates to the time it takes to restore a system to service following a system failure. Together, reliability, availability, and serviceability features provide for near-continuous system operation.

To deliver high levels of reliability, availability, and serviceability, the Sun Fire T1000 server offers the following features:

- Environmental monitoring
- Error detection and correction for improved data integrity
- Easy access for most component replacements
- Extensive POST tests that automatically delete faulty components from the configuration
- PSH automated run-time diagnosis capability that takes faulty components offline

For more information about using RAS features, refer to the Sun Fire T1000 Server System Administration Guide.

Environmental Monitoring

The Sun Fire T1000 server features an environmental monitoring subsystem designed to protect the server and its components against:

- Extreme temperatures
- Lack of adequate airflow through the system
- Power supply failure
- Hardware faults

Temperature sensors throughout the system monitor the ambient temperature of the system and internal components. The software and hardware ensure that the temperatures within the enclosure do not exceed predetermined safe operating ranges. If the
70. the Clock Battery on the Motherboard" on page 71

To locate these CRUs, refer to Appendix A, "Field Replaceable Units (FRUs)" on page 75.

To Remove the Optional PCI Express Card

Use this procedure to remove the optional low-profile PCI Express card from the server.

1. Perform the procedures described in "Common Procedures for Parts Replacement" on page 53.
2. Remove any cables that are attached to the card.
3. On the rear of the chassis, release the retention latch (FIGURE 3-4) that secures the PCI Express card to the chassis.

FIGURE 3-4 Releasing the PCI Express Card Retention Latch (labels: PCI Express card, retention latch)

4. Gently work the PCI Express card out of the socket on the PCI Express riser board (FIGURE 3-5) and the retention bracket.

FIGURE 3-5 Removing and Replacing the PCI Express Card (labels: retention bracket, PCI Express card riser board)

5. Place the PCI Express card on an antistatic mat.

To Add or Replace the Optional PCI Express Card

Use this procedure to add or replace the PCI Express card.

1. Unpackage the replacement PCI Express card and place it on an antistatic mat.

Note: Only low-profile PCI-E cards with low brackets will fit into the chassis. There are a variety of PCI-E cards on the market. Read the product documentation for your device for additional installation requirements and instructi
71. the System Using SunVTS Software 44
    To Exercise the System Using SunVTS Software 45
    For further information, refer to the documents that accompany the SunVTS software 49

3. Removing and Replacing FRUs 51
  Safety Information 51
    Safety Symbols 52
    Electrostatic Discharge Safety 52
      Use an Antistatic Wrist Strap 53
      Use an Antistatic Mat 53
  Common Procedures for Parts Replacement 53
    Required Tools 53
    To Shut the System Down 53
    To Remove the Server From a Rack 55
    To Perform Electrostatic Discharge (ESD) Prevention Measures 56
    To Remove the Top Cover 57
  Removing and Replacing CRUs 57
    To Remove the Optional PCI Express Card 58
    To Add or Replace the Optional PCI Express Card 60
    To Remove the Fan Tray Assembly 60
    To Replace the Fan Tray Assembly 61
    To Remove the Power Supply 61
    To Replace the Power Supply 62
    To Remove the Hard Drive 63
    To Replace the Hard Drive 64
    To Remove DIMMs 65
    To Add or Replace DIMMs 66
    To Remove the Motherboard and Chassis 68
    To Replace the Motherboard and Chassis Assembly 69
    To Remove the Clock Battery on the Motherboard 70
    To Replace the Clock Battery on the Motherboard 71
  Common Procedures for Finishing Up 72
    To Replace the Top Cover 72
    To Reinstall the Server Chassis in the Rack 73
    To Apply Power to the Server 73

A. Field Replaceable Units (FRUs) 75
72. FIGURE 2-3 Sun Fire T1000 Server Rear Panel LEDs (labels: Locator LED, Service required LED, Power OK LED, Ethernet ports, Link and Activity LEDs)

Front and Rear Panel LEDs

Two LEDs and one LED button are located in the upper left corner of the front panel (TABLE 2-2). The LEDs are also provided on the rear panel.

TABLE 2-2 Front and Rear Panel LEDs

LED | Color | Description
Locator LED and button | White | Enables you to identify a particular server. The LED is controlled using one of the following methods: issuing the setlocator on or off command, or pressing the button to toggle the indicator on or off. This LED provides the following indications. Off: Normal operating state. Fast blink: The server received a signal as a result of one of the preceding methods and is indicating "here I am", that it is operational.
Service required LED | Yellow | If on, indicates that service is required. The ALOM showfaults command will indicate any faults causing this indicator to light.
Power OK LED and Power On/Off button | Green | The LED provides the following indications. Off: The system is unavailable. Either it has no power or ALOM is not running. Steady on: Indicates that the system is powered on and is running in its normal operating state. No service actions are required. Standby blink: Indicates the sy
73. us
    MB/BAT/V_BAT  OK

    Power Supplies:
    Supply  Status  Underspeed  Overtemp  Overvolt  Undervolt  Overcurrent
    PS0     OK      OFF         OFF       OFF       OFF        OFF

Note: Some information might not be available when the server is in standby mode.

To Run the showfru Command

Note: By default, the output of the showfru command for all FRUs is very long.

The showfru command displays information about the FRUs in the server. Use this command to see information about an individual FRU, or for all the FRUs.

Note: You do not need user permissions to use this command.

- At the sc> prompt, enter the showfru command.

    sc> showfru -s
    FRU_PROM at MB/SEEPROM
    --------------------
    SEGMENT: SD
    /ManR
    /ManR/UNIX_Timestamp32: TUE OCT 18 21:17:55 2005
    /ManR/Description: ASSY, Sun Fire T1000 Motherboard
    /ManR/Manufacture Location: Sriracha Chonburi, Thailand
    /ManR/Sun Part No: 5017302
    /ManR/Sun Serial No: 002989
    /ManR/Vendor: Celestica
    /ManR/Initial HW Dash Level: 03
    /ManR/Initial HW Rev Level: 01
    /ManR/Shortname: T1000_MB
    /SpecPartNo: 885-0505-04

    FRU_PROM at PS0/SEEPROM
    --------------------
    SEGMENT: SD
    /ManR
    /ManR/UNIX_Timestamp32: SUN JUL 31 19:45:13 2005
    /ManR/Description: PSU, 300W, AC_INPUT, A207
    /ManR/Manufacture Location: Matamoros, Tamps, Mexico
    /ManR/Sun Part No: 3001799
    /ManR/Sun Serial No: G00001
    /ManR/Vendor: Tyco Electronic
74. wfru Command" on page 24.

showkeyswitch | Displays the status of the virtual keyswitch.
showlocator | Displays the current state of the Locator LED as either on or off.
showlogs [-b lines | -e lines] [-g lines] [-v] | Displays the history of all events logged in the ALOM event buffer.
showplatform [-v] | Displays information about the host system's hardware configuration, and whether the hardware is providing service.

Note: For the ALOM ASR commands, see TABLE 2-7.

To Run the showfaults Command

The showfaults command displays faults handled by ALOM. Use the showfaults command for the following reasons:

- To see if any faults have been passed to, or detected by, ALOM.
- To obtain the fault message ID (SUNW-MSG-ID).
- To verify that the replacement of a FRU has cleared the fault and not generated any additional faults.

- At the sc> prompt, type the showfaults command.

    sc> showfaults -v
    Last POST run: WED OCT 20 19:32:24 2004
    POST status: Passed all devices

    ID  Time             FRU                Fault
    1   OCT 21 14:32:48  MB/CMP0/CH0/R1/D0  Host detected fault, MSGID: SUN4U-8000-2S UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc

In this example, showfaults is reporting a memory error at DIMM location MB/CMP0/CH0/R1/D0 (J0701).

To Run the showenvironment Command

The showenvironment command displays a snapshot of the server's environmental status. The information this command c
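When the showfaults output is captured to a log, the FRU path, message ID, and UUID can be pulled out of each fault line for lookup in the knowledge article database or for use with fmdump. The following Python sketch illustrates one way to do that. It assumes each fault is printed on a single line with the punctuation shown in the sample string; the sample line, the function names, and the regular expression are illustrative assumptions and are not part of ALOM or the Solaris OS.

```python
import re

# Hypothetical sample modeled on a `showfaults -v` fault line.
# Real output may wrap or space fields differently; adjust the pattern if so.
SAMPLE = (
    "1 OCT 21 14:32:48 MB/CMP0/CH0/R1/D0 Host detected fault, "
    "MSGID: SUN4U-8000-2S UUID: a26d5379-24b8-4a46-bcbf-d9e1ff75a1bc"
)

# Assumed field layout: ID, time, FRU path, fault text, MSGID, UUID.
FAULT_RE = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<time>[A-Z]{3} \d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<fru>\S+)\s+"
    r"(?P<fault>.*?),?\s*"
    r"MSGID:\s*(?P<msgid>\S+)\s+"
    r"UUID:\s*(?P<uuid>[0-9a-fA-F-]{36})\s*$"
)


def parse_fault_line(line):
    """Return a dict of fault fields, or None if the line is not a fault line."""
    match = FAULT_RE.match(line.strip())
    return match.groupdict() if match else None


def fmdump_command(uuid):
    """Build the fmdump invocation that shows the diagnosis for one event ID."""
    return "fmdump -v -u " + uuid


if __name__ == "__main__":
    fault = parse_fault_line(SAMPLE)
    if fault is not None:
        print(fault["fru"], fault["msgid"])
        print(fmdump_command(fault["uuid"]))
```

Lines that do not match the assumed layout, such as the POST status header, return None, so the parser can be run over a whole captured session without filtering it first.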
    