        Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide
         Contents
1.  c. From the Event Logging Details screen, select View Event Log.
       All unread events are displayed.

    4. View the BMC system event log:

       a. From the BIOS Main Menu screen, select Advanced.
          The Advanced Settings screen is displayed.

       b. From the Advanced Settings screen, select IPMI 2.0 Configuration.
          The Advanced Menu IPMI 2.0 Configuration screen is displayed.

          BIOS screen, Advanced > IPMI 2.0 Configuration. Items shown: Status Of BMC [Working], View BMC System Event Log, Reload BMC System Event Log, Clear BMC System Event Log, LAN Configuration, PEF Configuration, BMC Watch Dog Timer Action [Disabled]. Help text: "View all events in the BMC Event Log. It will take up to 60 Seconds approx. to read all BMC SEL records." v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.

       c. From the IPMI 2.0 Configuration screen, select View BMC System Event Log.
          The log takes about 60
2.  "How DIMM Errors Are Handled by the System" on page 12
    "Isolating and Correcting DIMM ECC Errors" on page 18

    DIMM Population Rules

    The DIMM population rules for the server are as follows:

    - Each CPU can support a maximum of eight DIMMs.
    - The DIMM slots are paired, and the DIMMs must be installed in pairs (0-1, 2-3, 4-5, and 6-7). See FIGURE 3-1 and FIGURE 3-2. The memory sockets are colored black or white to indicate which slots are paired by matching colors.
    - DIMMs are populated starting from the outside (away from the CPU) and working toward the inside.
    - CPUs with only a single pair of DIMMs must have those DIMMs installed in that CPU's outside white DIMM slots (6 and 7). See FIGURE 3-1 and FIGURE 3-2.
    - Only DDR2 800 MHz, 667 MHz, and 533 MHz DIMMs are supported.
    - Each pair of DIMMs must be identical (same manufacturer, size, and speed).

    DIMM Replacement Policy

    Replace a DIMM when one of the following events takes place:

    - The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs).
    - UCEs occur and investigation shows that the errors originated from memory.

    In addition, a DIMM should be replaced whenever more than 24 Correctable Errors (CEs) originate in 24 hours from a single DIMM and no other DIMM is showing further CEs.

    - If more than one DIMM has experienced multiple CEs, other possible causes of CEs have to be ruled out by a qualified Sun Support specialist before replacing any DIMMs.

    R
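The population rules above can be written down as a small check. The Python sketch below is illustrative only: the function name `check_population`, the dictionary layout, and the assumption that pair 6-7 is the outermost pair followed by 4-5, 2-3, and 0-1 are choices made for this example, not part of any Sun tool.

```python
# Hypothetical helper: checks one CPU's DIMM layout against the rules above.
# Assumption: pairs are ordered from the outside (6-7) toward the CPU (0-1).
PAIRS = [(6, 7), (4, 5), (2, 3), (0, 1)]
SUPPORTED_SPEEDS_MHZ = {800, 667, 533}


def check_population(dimms):
    """dimms maps a slot number (0-7) to (manufacturer, size_gb, speed_mhz).

    Returns a list of rule violations; an empty list means the layout passes.
    """
    problems = []
    for slot in dimms:
        if slot not in range(8):
            problems.append(f"slot {slot} does not exist (valid slots are 0-7)")

    seen_empty_pair = False
    for a, b in PAIRS:
        in_a, in_b = a in dimms, b in dimms
        if in_a != in_b:
            problems.append(f"slots {a} and {b} must be populated as a pair")
            continue
        if not in_a:
            seen_empty_pair = True
            continue
        if seen_empty_pair:
            problems.append(f"pair {a}/{b} is populated but an outer pair is empty")
        if dimms[a] != dimms[b]:
            problems.append(f"DIMMs in slots {a} and {b} are not identical")
        for slot in (a, b):
            speed = dimms[slot][2]
            if speed not in SUPPORTED_SPEEDS_MHZ:
                problems.append(f"slot {slot}: unsupported speed {speed} MHz")
    return problems


if __name__ == "__main__":
    dimm = ("ExampleVendor", 2, 667)  # made-up part used only for the demo
    print(check_population({6: dimm, 7: dimm}))
    print(check_population({0: dimm, 1: dimm}))
```

A layout with a single pair in slots 6 and 7 passes, while the same pair placed in slots 0 and 1 is reported because the outer pairs are empty.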
3.  This section lists facts and considerations about how the server handles system errors (SERR).

    - System error handling works through the HyperTransport Synch Flood Error mechanism on 8111 and 8131.
    - The following events happen during BIOS POST:
      - POST reports any previous system errors at the bottom of the screen. See FIGURE D-4 for an example.

    FIGURE D-4 POST Screen, Previous System Error Listed

        American Megatrends    Sun Microsystems    www.ami.com
        BMC Firmware Revision : 1.00
        Checking NVRAM..
        Initializing USB Controllers .. Done
        Press F2 to run Setup (CTRL+E on Remote Keyboard)
        Press F12 to boot from the network (CTRL+N on Remote Keyboard)
        USB Device(s): 3 Keyboards, 3 Mice, 2 Storage Devices
        Auto-Detecting Pri Master.. ATAPI CDROM
        Pri Master: DU 28SL 1 0A DE 8
        Ultra DMA Mode-2
        Auto-detecting USB Mass Storage Devices ..
        Device #01: AMI Virtual CDROM
        Device #02: AMI Virtual Floppy
        2 USB mass storage devices found and configured.
        0085 BMC Responding
        1 Hyper Transport sync flood error occurred on last boot
        PCI System Error

    - SERR and HyperTransport Synch Flood Error are logged in DMI and the SP SEL. See the following sample output:

        SEL Record ID   : 0a00
        Record Type     : 00
        Timestamp       : 08/10/2005 06:05:32
        Generator ID    : 0001
        EvM Revision    : 04
        Sensor Type     : Critical Interrupt
        Sensor Number   : 00
        Event Type      : Sensor-specific Discrete
        Event Direction : Assertion Event
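When several SEL records need to be compared, it can help to turn the sample output above into a dictionary. The following Python sketch assumes that each field sits on its own line in a "name : value" layout; that layout, the function name `parse_sel_record`, and the shortened `SAMPLE` text are assumptions made for this example.

```python
def parse_sel_record(text):
    """Parse 'Field : value' lines into a dictionary.

    Only the first colon on a line separates the field name from its value,
    so a timestamp such as 06:05:32 stays intact.
    """
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that is not a field
        key, value = line.split(":", 1)
        record[key.strip()] = value.strip()
    return record


# Shortened copy of the sample record, used only for demonstration.
SAMPLE = """\
SEL Record ID   : 0a00
Record Type     : 00
Timestamp       : 08/10/2005 06:05:32
Sensor Type     : Critical Interrupt
Event Direction : Assertion Event
"""

if __name__ == "__main__":
    record = parse_sel_record(SAMPLE)
    print(record["Sensor Type"], "at", record["Timestamp"])
```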
4.  - The system enters Halt mode and the following message is displayed:

        Warning: Bad Mix of Processors
        Multiple core processors cannot be installed with single core processors.
        Fatal Error... System Halted.

    Hardware Error Handling Summary

    TABLE D-1 summarizes the most common hardware errors that you might encounter with these servers.

    TABLE D-1 Hardware Error Handling Summary

    Error: SP failure
    Description: The SP fails to boot upon application of system power.
    Handling: The SP controls the system reset, so the system may power on, but will not come out of reset.
      - During power up, the SP's boot loader turns on the power LED.
      - During SP boot, Linux startup, and SP sanity check, the power LED blinks.
      - The LED is turned off when SP management code (the IPMI stack) is started.
      - At exit of BIOS POST, the LED goes to STEADY ON state.
    Logged (DMI Log or SP SEL): Not logged
    Fatal?: Fatal

    Error: SP failure
    Description: SP boots but fails POST.
    Handling: The SP controls the system RESET, so the system will not come out of reset.
    Logged (DMI Log or SP SEL): Not logged
    Fatal?: Fatal

    Error: BIOS POST failure
    Description: Server BIOS does not pass POST.
    Handling: There are fatal and non-fatal errors in POST. The BIOS does detect some errors that are announced during POST as POST codes on the bottom right corner of the display on the serial console and on the video display. Some POST codes are forwarded to the SP for logging. The POST codes do not come out in sequen
5.      Event Data      : 05FFFF
        Description     : PCI SERR

    - FIGURE D-5 shows an example DMI log screen from the BIOS Setup Page with a system error.

    FIGURE D-5 DMI Log Screen with Error

    BIOS Setup Utility screen, Advanced > View Event Log. One entry is shown, dated 09/12/05 14:23:47, reporting that a HyperTransport sync flood error occurred on the last boot.

    Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide, August 2008

    Handling Mismatching Processors

    This section lists facts and considerations about how the server handles mismatching processors.

    - The BIOS performs a complete POST.
    - The BIOS displays a report of any mismatching CPUs, as shown in the following example:

        AMIBIOS(C)2003 American Megatrends, Inc.
        BIOS Date: 08/10/05 14:51:11 Ver: 08.00.10
        CPU : AMD Opteron(tm) Processor 254, Speed : 2.4 GHz
        Count : 3, CPU Revision, CPU0 : E4, CPU1 : E6
        Microcode Revision, CPU0 : 0, CPU1 : 0
        DRAM Clocking CPU0 = 400 MHz, CPU1 Core0/1 = 400 MHz
        Sun Fire Server, 1 AMD North Bridge, Rev E
        1 AMD North Bridge, Rev E6
        1 AMD 8111 I/O Hub, Rev C2
        2 AMD 8131 PCI-X Controllers, Rev B2
        System Serial Number : 0505AMF028
        BMC Firmware Revision : 1.00
        Checking NVRAM..
        Initializing USB Controllers .. Done
        Press F2 to run Setup (CTRL+E on Remote Keyboard)
        Press F12 to boot from the network (CTRL+N on Remote Keyboard)
        Press F8 for BBS POPUP (CTRL+P on Remote Keyboard)

    - No SEL or DMI event is recorded.
6.  2008

    Preface

    The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information and procedures for using available tools to diagnose problems with the servers.

    Before You Read This Document

    It is important that you review the safety guidelines in the Sun Fire X4140, X4240, and X4440 Safety and Compliance Guide.

    Related Documentation

    The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that is packed with your system. You can also find the documentation at http://docs.sun.com.

    Translated versions of some of these documents are available at http://docs.sun.com. Select a language from the drop-down list and navigate to the Sun Fire X4140, X4240, and X4440 Servers document collection using the Product category link. Available translations for the Sun Fire X4140, X4240, and X4440 Servers include Simplified Chinese, Traditional Chinese, French, Japanese, and Korean.

    English documentation is revised more frequently and might be more up-to-date than the translated documentation. For all Sun documentation, go to the following URL:

    http://docs.sun.com

    Typographic Conventions

    Typeface    Meaning    Examples

    AaBbCc123   The names of commands, files, and directories; onscreen computer output
7.  AC power cords are attached firmly to the server's power supplies and to the AC sources.

    - Check that the main cover is firmly in place.
      There is an intrusion switch on the motherboard that automatically shuts down the server power to standby mode when the cover is removed.

    Externally Inspecting the Server

    To perform a visual inspection of the external system:

    1. Inspect the external status indicator LEDs, which can indicate component malfunction.
       For the LED locations and descriptions of their behavior, see "External Status Indicator LEDs" on page 37.
    2. Verify that nothing in the server environment is blocking air flow or making a contact that could short out power.
    3. If the problem is not evident, continue with the next section, "Internally Inspecting the Server" on page 4.

    Internally Inspecting the Server

    To perform a visual inspection of the internal system:

    1. Choose a method for shutting down the server from main power mode to standby power mode. See FIGURE 1-1 and FIGURE 1-2.
       - Graceful shutdown: Use a ballpoint pen or other stylus to press and release the Power button on the front panel. This causes Advanced Configuration and Power Interface (ACPI) enabled operating systems to perform an orderly shutdown of the operating system. Servers not running ACPI-enabled operating systems will shut down to standby power mode immediately.
       - Emergen
8.  Figure Legend

    1  Locator LED/Locator button: White
    2  Service Required LED: Amber
    3  Power OK LED: Green
    4  Rear PS LED: Amber (power supply fault)
    5  System Over Temperature LED: Amber
    6  Top Fan LED: Amber (service action required on fan(s))

    Back Panel LEDs

    FIGURE B-2 Back Panel LEDs (X4140 shown)

    Figure Legend

    1  Power Supply LEDs:
       Power Supply OK: Green
       Power Supply Fail: Amber
       AC OK: Green
    2  Locator LED Button
    3  Service Required LED
    4  Power OK LED
    5  Ethernet Port LEDs:
       Left side: Green indicates link activity.
       Right side: Green indicates link activity. Amber indicates link is operating at less than maximum speed.

    Hard Drive LEDs

    FIGURE B-3 Hard Drive LEDs

    Figure Legend

    1  Ready to remove LED: Blue (service action is allowed)
    2  Fault LED: Amber (service action is required)
    3  Status LED: Green (blinks when data is being transferred)

    Internal Status Indicator LEDs

    The server has internal status indicators on the motherboard and on the mezzanine board. For motherboard locations, see FIGURE B-4. For mezzanine board locations, see FIGURE B-5.

    - The DIMM Fault LEDs indicate a problem with the corresponding DIMM. They are located next to the DIMM ejector handles.
      When you press the Press to See Fault button, if there is a problem with a DIMM, the corresponding DIMM Fault LED flashes. See "DIMM
9.  Description: ... signal upon detecting an overtemp condition.
    Handling: ... LED and System Overheat Fault LED blink.

    Error: Boot device failure
    Description: The BIOS is not able to boot from a device in the boot device list.
    Handling: The BIOS goes to the next boot device in the list. If all devices in the list fail, an error message is displayed, and the BIOS retries from the beginning of the list. The SP can control or change the boot order.
    Logged (DMI Log or SP SEL): DMI Log
    Fatal?: Non-fatal

    Index

    B
    BIOS
      changing POST options, 28
      event logs, 21
      POST code checkpoints, 33
      POST codes, 31
      POST overview, 25
      redirecting console output for POST, 26
    Bootable Diagnostics CD, 8

    C
    comments and suggestions, x
    component inventory
      viewing with ILOM SP GUI, 48
    console output, redirecting, 26
    correctable errors, handling, 56

    D
    diagnostic software
      Bootable Diagnostics CD, 8
      SunVTS, 7
    DIMMs
      error handling, 12
      fault LEDs, 15
      isolating errors, 18
      population rules, 11

    E
    emergency shutdown, 4
    error handling
      correctable, 56
      DIMMs, 12
      hardware errors, 64
      mismatching processors, 63
      parity errors, 59
      system errors, 61
      uncorrectable errors, 53
    event logs, BIOS, 21
    external inspection, 3
    external LEDs, 37

    F
    faults, DIMM, 15
    FRU inventory, viewing with ILOM SP GUI, 48

    G
    gathering service visit information, 2
    general troubleshooting guidelines, 2
    graceful shutdown, 4
    guidelines for troubleshooting, 2

    H
    hardwa
10. Sun Microsystems

    Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide

    Sun Microsystems, Inc.
    www.sun.com

    Part No. 820-3067-11
    August 2008, Revision A

    Submit comments about this document at: http://www.sun.com/hwdocs/feedback

    Copyright © 2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Unpublished - rights reserved under the Copyright Laws of the United States.

    THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SUN MICROSYSTEMS, INC.

    This distribution may include materials developed by third parties.

    Sun, Sun Microsystems, the Sun logo, Java, Solaris, Sun Fire 4140, Sun Fire 4240 and Sun Fire 4440 are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

    AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc. Intel is a registered trademark of Intel Corporation.

    This product is covered and controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. Nuclear, missile, chemical biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not
11. in BMC: XXX.XXX.XXX.XXX

       - If you choose Static to assign the IP address manually, perform the following steps:
         i. Type the IP address in the IP Address field.
            You can also enter the subnet mask and default gateway settings in their respective fields.
         ii. Select Commit and press Return to commit the changes.
         iii. Select Refresh and press Return to see your new settings displayed in the Current IP address in BMC field.

    6. Start a web browser and type the service processor's IP address in the browser's URL field.
    7. When you are prompted for a user name and password, type the following:
       - User Name: root
       - Password: changeme
       The Sun Integrated Lights Out Manager main GUI screen is displayed.
    8. Click the Remote Control tab.
    9. Click the Redirection tab.
    10. Set the color depth for the redirection console at either 6 or 8 bits.
    11. Click the Start Redirection button.
    12. When you are prompted for a user name and password, type the following:
        - User Name: root
        - Password: changeme
        The current POST screen is displayed.

    Changing POST Options

    These instructions are optional, but you can use them to change the operations that the server performs during POST testing. To change POST options:

    1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power-on
12. limited to, the denied persons and specially designated nationals lists is strictly prohibited.

    Use of any spare or replacement CPUs is limited to repair or one-for-one replacement of CPUs in products exported in compliance with U.S. export laws. Use of CPUs as product upgrades unless authorized by the U.S. Government is strictly prohibited.

    Copyright © 2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, United States. All rights reserved. Unpublished - rights reserved under United States copyright law.

    THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS, INC. ITS USE, DISCLOSURE, AND REPRODUCTION ARE PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN AUTHORIZATION OF SUN MICROSYSTEMS, INC.

    This distribution may include materials developed by third parties.

    Sun, Sun Microsystems, the Sun logo, Java, Solaris, Sun Fire 4140, Sun Fire 4240, and Sun Fire 4440 are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

    AMD Opteron and Opteron are registered trademarks of Advanced Micro Devices, Inc. Intel is a registered trademark of Intel Corporation.

    This product is subject to U.S. export control legislation and may be subject to the regulations in force in other countries in the area of exports and imports. End uses
13. or front panel for 5 seconds to initiate a "push-to-test" mode that illuminates all other LEDs both inside and outside of the chassis for 15 seconds.

    4. Verify that there are no loose or improperly seated components.
    5. Verify that all cable connectors inside the system are firmly and correctly attached to their appropriate connectors.
    6. Verify that any after-factory components are qualified and supported.
       For a list of supported PCI cards and DIMMs, refer to your server's service manual.
    7. Check that the installed DIMMs comply with the supported DIMM population rules and configurations, as described in "DIMM Population Rules" on page 11.
    8. Replace the server cover.
    9. To restore the server to main power mode (all components powered on), use a ballpoint pen or other stylus to press and release the Power button on the server front panel. See FIGURE 1-1 and FIGURE 1-2.
       When main power is applied to the full server, the Power OK LED next to the Power button lights and remains lit.
    10. If the problem with the server is not evident, you can obtain additional information by viewing the power-on self-test (POST) messages and BIOS event logs during system startup. Continue with "Viewing Event Logs" on page 21.

    CHAPTER 2

    Using SunVTS Diagnostic Software

    This chapter contains information about t
14. pattern 55aa55aa).

    Note: Enabling Quick Boot causes the BIOS to skip the memory test. See "Changing POST Options" on page 28 for more information.

    Note: Because the server can contain up to 64 GB of memory (128 GB for the X4440), the memory test can take several minutes. You can cancel POST testing by pressing any key during POST.

    3. The BIOS polls the memory controllers for both correctable and uncorrectable memory errors and logs those errors into the service processor.

    Redirecting Console Output

    Use the following instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read.

    1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST).
       The BIOS Main menu screen is displayed.
    2. Select the Advanced menu tab.
       The Advanced Settings screen is displayed.
    3. Select IPMI 2.0 Configuration.
       The IPMI 2.0 Configuration screen is displayed.
    4. Select the LAN Configuration menu item.
       The LAN Configuration screen displays the service processor's IP address.
    5. To configure the service processor's IP address (optional):
       a. Select the IP Assignment option that you want to use (DHCP or Static).
          - If you choose DHCP, the server's IP address is retrieved from your network's DHCP server and displayed using the following format: Current IP address
15. seconds to generate, then it is displayed on the screen.

    5. If the problem with the server is not evident, continue with "Using the ILOM Service Processor GUI to View System Information" on page 43, or "Viewing ILOM SP Event Logs" on page 45.

    Power-On Self-Test (POST)

    The system BIOS provides a rudimentary power-on self-test. The basic devices required for the server to operate are checked, memory is tested, the LSI 1064 disk controller and attached disks are probed and enumerated, and the two Intel dual Gigabit Ethernet controllers are initialized.

    The progress of the self-test is indicated by a series of POST codes. These codes are displayed at the bottom right corner of the system's VGA screen (once the self-test has progressed far enough to initialize the system video). However, the codes are displayed as the self-test runs and scroll off of the screen too quickly to be read. An alternate method of displaying the POST codes is to redirect the output of the console to a serial port (see "Redirecting Console Output" on page 26).

    How BIOS POST Memory Testing Works

    The BIOS POST memory testing is performed as follows:

    1. The first megabyte of DRAM is tested by the BIOS before the BIOS code is shadowed (that is, copied from ROM to DRAM).
    2. Once executing out of DRAM, the BIOS performs a simple memory test (a write/read of every location with the
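The simple write/read test named in "How BIOS POST Memory Testing Works" uses the pattern 55aa55aa. As a rough illustration of the idea, and not of the BIOS code itself, the Python sketch below writes that pattern to every word of a simulated memory and reports the words that do not read back correctly. The names `pattern_test` and `StuckBitMemory`, and the list-of-integers memory model, are assumptions made for this example.

```python
PATTERN = 0x55AA55AA  # test pattern named in the guide


def pattern_test(memory):
    """Write PATTERN to every word, read it back, and return failing indexes.

    `memory` is any mutable sequence of integers standing in for DRAM words.
    """
    for index in range(len(memory)):
        memory[index] = PATTERN
    return [index for index in range(len(memory)) if memory[index] != PATTERN]


class StuckBitMemory(list):
    """Hypothetical faulty memory in which one word has bit 1 stuck at 0."""

    def __init__(self, size, bad_index):
        super().__init__([0] * size)
        self.bad_index = bad_index

    def __setitem__(self, index, value):
        if index == self.bad_index:
            value &= ~0b10  # the faulty cell cannot store a 1 in bit 1
        super().__setitem__(index, value)


if __name__ == "__main__":
    print(pattern_test([0] * 16))
    print(pattern_test(StuckBitMemory(16, bad_index=5)))
```

A single fixed pattern only exposes faults in bit positions where the stored value differs from what the faulty cell produces, so a bit stuck at 0 is caught only where the pattern holds a 1. That is consistent with the guide describing the test as simple.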
16. self-test (POST).
       The BIOS Main menu screen is displayed.

    2. Select Boot.
       The Boot Settings screen is displayed.

       BIOS screen, Boot > Boot Settings. Menu bar: Main, Advanced, PCIPnP, Boot, Security, Chipset, Exit. Items shown: Boot Settings Configuration, Boot Device Priority, Hard Disk Drives, CD/DVD Drives. Help text: "Configure Settings during System Boot." v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.

    3. Select Boot Settings Configuration.
       The Boot Settings Configuration screen is displayed.

       BIOS screen, Boot > Boot Settings Configuration. Help text: "Allows BIOS to skip certain tests while booting. This will decrease the time". Options shown: Quick Boot [Disabled], Quiet Boot [Disabled], AddOn ROM Display Mode [Force
17. system timer interrupt. Traps INT1Ch vector to "POSTINT1ChHandlerBlock."

    C0  Early CPU Init Start. Disable Cache. Init Local APIC.
    C1  Set up boot strap processor information.
    C2  Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user requested value for GART Error Reporting setup question.
    C3  Errata workarounds applied to the BSP (#78 & #110).
    C5  Enumerate and set up application processors. This includes microcode loading and workarounds for errata (#78, #110, #106, #107, #69, #63).
    C6  Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata #106, #107, #69, and #63 if appropriate. In case of mixed CPU steppings, errors are sought and logged, and an appropriate frequency for all CPUs is found and applied. NOTE: APs are left in the CLI HLT state.
    C7  The HT sets link frequencies and widths to their final values. This routine gets called after CPU frequency has been calculated to prevent bad programming.
    0A  Initializes the 8042 compatible Keyboard Controller.
    0B  Detects the presence of PS/2 mouse.
    0C  Detects the presence of Keyboard in KBC port.

    TABLE A-2 POST Code Checkpoints (Continued)

    Post Code  Description

    0E  Testing and initialization of different Input Devices. Also, update the Kernel Variables. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1. Uncompress all a
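When POST codes are read from a redirected console, a lookup table saves paging through the checkpoint list. The Python sketch below covers only the codes visible in this excerpt, with shortened descriptions; the names `POST_CODES` and `describe` are invented for this example, and a complete table would have to be filled in from the full TABLE A-2.

```python
# Partial table: only the checkpoints that appear in the excerpt above.
POST_CODES = {
    "C0": "Early CPU init start; disable cache; init local APIC",
    "C1": "Set up boot strap processor information",
    "C2": "Set up boot strap processor for POST",
    "C3": "Errata workarounds applied to the BSP",
    "C5": "Enumerate and set up application processors",
    "C6": "Re-enable cache for boot strap processor",
    "C7": "HT sets link frequencies and widths to their final values",
    "0A": "Initialize the 8042-compatible keyboard controller",
    "0B": "Detect the presence of a PS/2 mouse",
    "0C": "Detect the presence of a keyboard in the KBC port",
}


def describe(code):
    """Return the description for a POST code given as a hex string.

    The string is normalized first, so 'c0', 'C0', and 'a' (for 0A) all work.
    """
    key = code.strip().upper().zfill(2)
    return POST_CODES.get(key, "not in this excerpt")


if __name__ == "__main__":
    for seen in ("c0", "0B", "FF"):
        print(seen, "->", describe(seen))
```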
18. the default password, changeme.

       Once you have successfully logged in to the SP, it displays its default command prompt:

       ->

    4. To start the serial console, type the following commands:

       cd /SP/console
       start

       To exit console mode and return to the service processor, press Esc, then Shift+9.

    Continue with the following procedures:
    - "Viewing ILOM SP Event Logs" on page 45
    - "Viewing Replaceable Component Information" on page 48
    - "Viewing Sensors" on page 50

    Viewing ILOM SP Event Logs

    Events are notifications that occur in response to some actions. The IPMI system event log (SEL) provides status information about the server's hardware and software to the ILOM software, which displays the events in the ILOM web GUI. To view event logs:

    1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI.
       a. Type the IP address of the server's SP into your web browser.
          The Sun Integrated Lights Out Manager Login screen is displayed.
       b. Type your user name and password.
          When you first try to access the ILOM SP, you are prompted to type the default user name and password. The default user name and password are:
          Default user name: root
          Default password: changeme
    2. From the System Monitoring tab, select Event Logs.
       The System Event Logs page is displayed. See FIGURE C-1 for a page that shows sample information.

    Appen
19. the network. SunVTS software also provides a TTY mode interface for situations in which running a GUI is not possible.

    SunVTS Documentation

    For the most up-to-date information on SunVTS software, go to:

    http://docs.sun.com/app/docs/prod/test.validate

    Diagnosing Server Problems With the Bootable Diagnostics CD

    SunVTS 6.4 or later software is preinstalled on your server. The server is also shipped with the Bootable Diagnostics CD. This CD is designed so that the server will boot from the CD. This CD boots and starts SunVTS software. Diagnostic tests run and write output to log files that the service technician can use to determine the problem with the server.

    Requirements

    - To use the diagnostics CD, you must have a keyboard, mouse, and monitor attached to the server on which you are performing diagnostics, or available through a remote KVM.

    Using the Bootable Diagnostics CD

    To use the diagnostics CD to perform diagnostics:

    1. With the server powered on, insert the CD into the DVD-ROM drive.
    2. Reboot the server, and press F2 during the start of the reboot so that you can change the BIOS setting for boot device priority.
    3. When the BIOS Main menu appears, navigate to the BIOS Boot menu.
       Instructions for navigating within the BIOS screens appear on the BIOS screens.
    4. On the BIOS Boot menu screen, select Boot Device Priority.
       The Boot Device
20. to See Fault button on the motherboard or the mezzanine board, LEDs next to the DIMMs flash to indicate that the system has detected 24 or more CEs in a 24-hour period on that DIMM.

    Note: The DIMM Fault and Motherboard Fault LEDs operate on stored power for up to a minute when the system is powered down, even after the AC power is disconnected and the motherboard (or mezzanine board) is out of the system. The stored power lasts for about half an hour.

    Note: Disconnecting the AC power removes the fault indication. To recover fault information, look in the SP SEL, as described in the Sun Integrated Lights Out Manager 2.0 User's Guide.

    - DIMM fault LED is off: The DIMM is operating properly.
    - DIMM fault LED is flashing (amber): At least one of the DIMMs in this DIMM pair has reported 24 CEs within a 24-hour period.
    - Motherboard Fault LED on mezzanine is on: There is a fault on the motherboard. This LED is there because you cannot see the motherboard LEDs when the mezzanine board is present.

    Note: The Motherboard Fault LED operates independently of the Press to See Fault button and does not operate on stored power.

    See FIGURE 3-1 for the locations of DIMMs and LEDs on the motherboard. See FIGURE 3-2 for the locations of DIMMs and LEDs on the mezzanine board.

    FIGURE 3-1 DIMMs and LEDs on Mot
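The flashing-LED condition above is a count over a sliding time window: 24 or more CEs within a 24-hour period. The Python sketch below shows one way to evaluate that condition from a list of CE timestamps. It is only an illustration of the rule as worded in this guide; the function name `fault_led_flashing` and the idea of feeding it timestamps taken from a log are assumptions, and the system's real bookkeeping is not described here.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
THRESHOLD = 24  # "24 or more CEs in a 24-hour period"


def fault_led_flashing(ce_times, now):
    """Return True if at least THRESHOLD correctable errors fall in the window.

    `ce_times` is an iterable of datetime objects, one per correctable error
    reported for a DIMM pair; `now` is the moment the check is made.
    """
    recent = [t for t in ce_times if timedelta(0) <= now - t <= WINDOW]
    return len(recent) >= THRESHOLD


if __name__ == "__main__":
    now = datetime(2008, 8, 1, 12, 0)
    # Made-up data: one correctable error every 30 minutes, 24 in total.
    burst = [now - timedelta(minutes=30 * i) for i in range(24)]
    print(fault_led_flashing(burst, now))
```

Errors older than 24 hours drop out of the count, so the same 24 errors spread over two days would not meet the threshold.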
21. to see details of the error.

    - Solaris:
      Solaris FMA reports and (sometimes) retires memory with correctable Error Correction Code (ECC) errors. See your Solaris Operating System documentation for details. Use the command:

      fmdump -eV

      to view ECC errors.

    - Linux:
      The HERD utility can be used to manage DIMM errors in Linux. See the x64 Servers Utilities Reference Manual for details.
      - If HERD is installed, it copies messages from /dev/mcelog to /var/log/messages.
      - If HERD is not installed, a program called mcelog copies messages from /dev/mcelog to /var/log/mcelog.

    The Bootable Diagnostics CD described in Chapter 2 also captures and logs CEs.

    BIOS DIMM Error Messages

    The BIOS displays and logs the following DIMM error messages:

    - Memory Configuration Mismatch
      The following conditions will cause this error message:
      - The DIMMs mode is not paired (running in 64-bit mode instead of 128-bit mode).
      - The DIMMs' speeds are not the same.
      - The DIMMs do not support ECC.
      - The DIMMs are not registered.
      - The MCT stopped due to errors in the DIMM.
      - The DIMM module type (buffer) is mismatched.
      - The DIMM generation (I or II) is mismatched.
      - The DIMM CL/T is mismatched.
      - The banks on a two-sided DIMM are mismatched.
      - The DIMM organization is mismatched (128-bit).
      - The SPD is missing Trc or Trfc information.

    DIMM Fault LEDs

    When you press the Press
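For the Linux case described earlier in this excerpt, the log file to inspect depends on whether HERD is installed. The Python sketch below captures that choice and filters a log for a search term. How to detect HERD is not covered by this guide, so it is passed in as a flag; the function names and the search term used in the demonstration are assumptions made for this example.

```python
def machine_check_log(herd_installed):
    """Return the log path named in the guide for each case."""
    return "/var/log/messages" if herd_installed else "/var/log/mcelog"


def lines_mentioning(path, needle):
    """Return the lines of the file at `path` that contain `needle`.

    The comparison ignores case; undecodable bytes are replaced rather than
    raising, since system logs can contain stray binary data.
    """
    wanted = needle.lower()
    with open(path, errors="replace") as log:
        return [line.rstrip("\n") for line in log if wanted in line.lower()]


if __name__ == "__main__":
    path = machine_check_log(herd_installed=False)
    try:
        for line in lines_mentioning(path, "ecc"):
            print(line)
    except OSError as error:
        print(f"could not read {path}: {error}")
```

Reading these logs usually requires elevated privileges, so the demonstration reports a read failure instead of stopping with a traceback.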
22. 0000h shadow RAM cacheability. Ported to handle any OEM specific programming needed during End POST. Copy OEM specific data from POST_DSEG to RUN_CSEG.

    TABLE A-2 POST Code Checkpoints (Continued)

    Post Code  Description

    B1     Save system context for ACPI.
    00     Prepares CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
    61-70  OEM POST Error. This range is reserved for chipset vendors and system manufacturers. The error associated with this value may be different from one platform to the next.

    APPENDIX B

    Status Indicator LEDs

    This appendix contains information about the locations and behavior of the LEDs on the server. It describes the external LEDs that can be viewed on the outside of the server and the internal LEDs that can be viewed only with the main cover removed.

    External Status Indicator LEDs

    See the following figures and tables for information about the LEDs that are viewable on the outside of the server.

    - FIGURE B-1 shows and describes the front panel LEDs.
    - FIGURE B-2 shows and describes the back panel LEDs.
    - FIGURE B-3 shows and describes the hard drive LEDs.
    - FIGURE B-4 and FIGURE B-5 show the location of the internal LEDs.

    Front Panel LEDs

    FIGURE B-1 Front Panel LEDs (X4140 shown)
23. Examples: Edit your .login file. Use ls -a to list all files. % You have mail.

AaBbCc123  What you type, when contrasted with onscreen computer output.
           Examples: % su  Password:

AaBbCc123  Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values.
           Examples: Read Chapter 6 in the User's Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.

The settings on your browser might differ from these settings.

Third-Party Web Sites

Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are available on or through such sites or resources.

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:

http://www.sun.com/hwdocs/feedback

Please include the title and part number of your document with your feedback:

Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide, part number 820-3067-11
24. [Screen: BIOS Setup Utility, Boot Settings Configuration. Visible options: Bootup Num-Lock (On), Wait For 'F1' If Error (Disabled), Interrupt 19 Capture (Enabled). v02.61 (C) Copyright 1985-2006, American Megatrends, Inc.]

4. On the Boot Settings Configuration screen, there are several options that you can enable or disable:

- Quick Boot: This option is disabled by default. If you enable this, the BIOS skips certain tests while booting, such as the extensive memory test. This decreases the time it takes for the system to boot.
- Quiet Boot: This option is disabled by default. If you enable this, the Sun Microsystems logo is displayed instead of POST codes.
- Add On ROM Display Mode: This option is set to Force BIOS by default. This option takes effect only if you have also enabled the Quiet Boot option; it controls whether output from the Option ROM is displayed. The two settings for this option are as follows:
  - Force BIOS: Remove the Sun logo and display Option ROM output.
  - Keep Current: Do not remove the Sun logo. The Option ROM output is not displayed.
- Boot Num Lock: This option is On by default (keyboard Num Lock is turned on during boot). If
25. Fault LEDs" on page 15, for details.

- The CPU Fault LEDs indicate a problem with the corresponding CPU.
  When you press the Press to See Fault button, if there is a problem with a CPU, the corresponding CPU Fault LED flashes.

Note: The DIMM Fault and Motherboard Fault LEDs operate on stored power for up to a minute when the system is powered down, even after the AC power is disconnected, and the motherboard (or mezzanine board) is out of the system. The stored power lasts for about half an hour.

- The Motherboard Fault LED on the mezzanine board indicates that there is a problem with the motherboard.

Note: The mezzanine board, when present, obscures part of the motherboard, including the LEDs. The Motherboard Fault LED indicates that one or more of the LEDs on the motherboard is active.

FIGURE B-4 DIMMs and LEDs on Motherboard
[Figure labels: Fans, CPU 0, CPU 1, Press to See Fault button]

FIGURE B-5 DIMMs and LEDs on Mezzanine Board
[Figure labels: Fans, Motherboard Fault LED, Press to See Fault button]

APPENDIX C

Using the ILOM Service Processor GUI to View System Information

This appendix contains information about using the Integrated Lights Out Manager (ILOM) Service processor (SP) GUI to view mon
26. Linux NMI trap catches the interrupt and reports the following NMI "confusion report" sequence:

    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 1.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 1.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
    Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?

Note: The Linux system reboots, but does not inform the BIOS of this incident.

Handling of System Errors (SERR)
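When many such reports accumulate in a system log, it can help to tally them by reason code and CPU. The following is a minimal Python sketch and is not part of the original guide: it assumes each kernel message occupies a single syslog line, and the function name count_unknown_nmis, the host name, and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the kernel's "unknown NMI" report; the reason code is two hex digits.
NMI_RE = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]{2}) on CPU (\d+)")


def count_unknown_nmis(lines):
    """Return a Counter keyed by (reason_code, cpu_number) for each NMI report."""
    counts = Counter()
    for line in lines:
        match = NMI_RE.search(line)
        if match:
            counts[(match.group(1).lower(), int(match.group(2)))] += 1
    return counts


# Illustrative input only; real lines would come from the system log.
sample = [
    "Aug  5 05:15:00 host kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.",
    "Aug  5 05:15:00 host kernel: Dazed and confused, but trying to continue",
    "Aug  5 05:15:00 host kernel: Uhhuh. NMI received for unknown reason 3d on CPU 1.",
    "Aug  5 05:15:00 host kernel: Uhhuh. NMI received for unknown reason 2d on CPU 1.",
]

if __name__ == "__main__":
    for (reason, cpu), total in sorted(count_unknown_nmis(sample).items()):
        print(f"reason {reason} on CPU {cpu}: {total}")
```

Lines that carry only the "Dazed and confused" or "power saving mode" follow-ups are ignored, because they do not identify a CPU.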
27. Priority screen appears.

5. Select the DVD-ROM drive to be the primary boot device.

6. Save and exit the BIOS screens.

7. Reboot the server.
   When the server reboots from the CD in the DVD-ROM drive, the Solaris Operating System boots and SunVTS software starts and opens its first GUI window.

8. In the SunVTS GUI, press Enter or click the Start button when you are prompted to start the tests.
   The test suite will run until it encounters an error or the test is completed.

Note: The CD will take approximately nine minutes to boot.

9. When SunVTS software completes the test, review the log files generated during the test.
   SunVTS provides access to four different log files:

- SunVTS test error log contains time-stamped SunVTS test error messages. The log file path name is /var/opt/SUNWvts/logs/sunvts.err. This file is not created until a SunVTS test failure occurs.
- SunVTS kernel error log contains time-stamped SunVTS kernel and SunVTS probe errors. SunVTS kernel errors are errors that relate to running SunVTS, and not to testing of devices. The log file path name is /var/opt/SUNWvts/logs/vtsk.err. This file is not created until SunVTS reports a SunVTS kernel error.
- SunVTS information log contains informative messages that are generated when you start and stop the SunVTS test sessions. The log file path name is /var/opt/SUNWvts/logs/sunvts.info. This file is not created until a SunVTS test session runs.
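Because each SunVTS log file is created only after the corresponding kind of event has occurred, checking which files exist gives a quick first indication of what happened during a run. The following is a minimal Python sketch, not a SunVTS feature: it assumes the log files live under /var/opt/SUNWvts/logs, and the names SUNVTS_LOGS and present_logs are hypothetical.

```python
from pathlib import Path

# Log file locations relative to the filesystem root (assumed standard layout).
SUNVTS_LOGS = {
    "test error log": "var/opt/SUNWvts/logs/sunvts.err",
    "kernel error log": "var/opt/SUNWvts/logs/vtsk.err",
    "information log": "var/opt/SUNWvts/logs/sunvts.info",
}


def present_logs(root="/"):
    """Return a dict mapping each log name to True if its file exists under root."""
    base = Path(root)
    return {name: (base / rel).is_file() for name, rel in SUNVTS_LOGS.items()}


if __name__ == "__main__":
    for name, exists in present_logs().items():
        print(f"{name}: {'present' if exists else 'not created yet'}")
```

A present test error log means at least one test failure was recorded; an absent one means no test failure has been logged in this session.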
28. are Error | No usable system memory
 300 | 08/26/2005 | 11:36:12 | Memory | Memory Device Disabled | CPU 0 DIM 0

When the faulty DIMM is beyond the BIOS's low 1MB extraction space, proper boot happens:

ipmitool> sel list
 100 | 08/26/2005 | 05:04:04 | OEM #0xfb |
 200 | 08/26/2005 | 05:04:09 | Memory | Memory Device Disabled | CPU 0 DIM 0

- Note the following considerations for this revision:
  - Uncorrectable ECC Memory Error is not reported.
  - Multi-bit ECC errors are reported as Memory Device Disabled.
  - On first reboot, BIOS logs a HyperTransport Error in the DMI log.
  - The BIOS disables the DIMM.
  - The BIOS sends the SEL records to the BMC.
  - The BIOS reboots again.
  - The BIOS skips the faulty DIMM on the next POST memory test.
  - The BIOS reports available memory, excluding the faulty DIMM pair.

FIGURE D-1 shows an example of a DMI log screen from BIOS Setup Page.

FIGURE D-1 DMI Log Screen, Uncorrectable Error
[Screen: BIOS SETUP UTILITY, Advanced, Event Logging details, View Event Log. Entry shown: 09/12/05 11:51:05 A Hyper Transport sync flood error occurred on last boot. v02.53 (C) Copyright 1985-2002, American Megatrends, Inc.]

Handling of Correctabl
29. ate.

TABLE A-1 POST Codes (Continued)

Post Code  Description
de00       Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
8613       Initialize PM regs and PM PCI regs at Early POST. Initialize multi host bridge, if system supports it. Setup ECC options before memory clearing. Enable PCI-X clock lines in the 8131.
0024       Uncompress and initialize any platform-specific BIOS modules.
862a       BBS ROM initialization.
002a       Generic Device Initialization Manager (DIM) - Disable all devices.
042a       ISA PnP devices - Disable all devices.
052a       PCI devices - Disable all devices.
122a       ISA devices - Static device initialization.
152a       PCI devices - Static device initialization.
252a       PCI devices - Output device initialization.
202c       Initializing different devices. Detecting and initializing the video adapter installed in the system that have optional ROMs.
002e       Initializing all the output devices.
0033       Initializing the silent boot module. Set the window for displaying text information.
0037       Displaying sign-on message, CPU information, setup key message, and any OEM-specific information.
4538       PCI devices - IPL device initialization.
5538       PCI devices - General device initialization.
8600       Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left i
30. ck will be in UTC.

- Via the CLI, ILOM web GUI, and IPMI

Viewing Replaceable Component Information

Depending on the component you select, information about the manufacturer, component name, serial number, and part number can be displayed. To view replaceable component information:

1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI.

   a. Type the IP address of the server's SP into your web browser.
      The Sun Integrated Lights Out Manager Login screen is displayed.

   b. Type your user name and password.
      When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:

      Default user name: root
      Default password: changeme

2. From the System Information tab, select Components.
   The Replaceable Component Information page is displayed. See FIGURE C-2.

FIGURE C-2 Replaceable Component Information Page
[Screen: Integrated Lights Out Manager, System Information tab, Components page (Component Management). View component information from this page. To view further details, click on a Component Name.]

Component Name  Type
/SYS            Host System
/SYS/MB         Motherboard
/SYS/MB/P0      Host Processo
31. cy shutdown: Use a ballpoint pen or other stylus to press and hold the Power button for four seconds to force main power off and enter standby power mode.

Caution: Performing an emergency shutdown can cause open files to become corrupt. Use an emergency shutdown only when necessary.

When main power is off, the Power/OK LED on the front panel will begin flashing, indicating that the server is in standby power mode.

Caution: When you use the Power button to enter standby power mode, power is still directed to the service processor and power supply fans, indicated when the Power/OK LED is flashing. To completely power off the server, you must disconnect the AC power cords from the back panel of the server.

FIGURE 1-1 X4140 Server Front Panel
[Figure labels: Locate Button/LED, Power Button]

FIGURE 1-2 X4440 Server Front Panel
[Figure labels: Locate Button/LED, Power Button]

2. Remove the server cover.
   For instructions on removing the server cover, refer to your server's service manual.

3. Inspect the internal status indicator LEDs. These can indicate component malfunction.
   For the LED locations and descriptions of their behavior, see "Internal Status Indicator LEDs" on page 39.

Note: The server must be in standby power mode for viewing the internal LEDs.

You can hold down the Locate button on the server back panel
32. FIGURE C-1 System Event Logs Page
[Screen: Integrated Lights Out Manager, Event Logs tab. The Event Log displays every event in the SP, including IPMI, Audit, and FMA events. Click the Clear Log button to delete all current log entries. Columns: Event ID, Class, Severity, Date/Time, Description.]

3. Select the category of event that you want to view in the log from the drop-down list box.

   You can select from the following types of events:

- Sensor-specific events. These events relate to a specific sensor for a component, for example, a fan sensor or a power supply sensor.
- BIOS-generated events. These events relate to error messages generated in the BIOS.
- System management software events. These events relate to events that occur within the ILOM software.
33. e Errors

This section lists facts and considerations about how the server handles correctable errors.

- During BIOS POST:
  - The BIOS polls the MCK registers.
  - The BIOS logs to DMI.
  - The BIOS logs to the SP SEL through the BMC.
- The feature is turned off at OS boot time by default.
- The following Linux versions report correctable ECC syndrome and memory fill errors in /var/log, if kernel flag mce is indicated at boot time, or if mce is enabled through kernel compile or installation:
  - RH3 Updated single core
  - RH4 Update1
  - SLES9 SP1
- The Linux kernel (x86_64/kernel/mce.c) repeats a report every 30 seconds until another error is encountered and an 8131 flag is reset.
- Solaris support provides full self-healing and automated diagnosis for the CPU and Memory subsystems.

FIGURE D-2 shows an example of a DMI log screen from BIOS Setup Page.

FIGURE D-2 DMI Log Screen, Correctable Error
[Screen: BIOS SETUP UTILITY, View Event Log, showing an ECC memory error entry dated 09/12/05 12:33:16 for Node 1, DIMM Pair 0.]

If during any stage of memory testing the BIOS finds itself incapable of reading/writing to the DIMM, it takes the following actions:

- The BIOS disables the DIMM as indicated by the Memory Decreased message in the example in EXAMPLE D-1.
- The BIOS logs an SEL record.
- The BIOS logs an event in DMI.
34. etain copies of the logs showing the memory errors per the above rules to send to Sun for verification prior to calling Sun.

How DIMM Errors Are Handled by the System

This section describes system behavior for the two types of DIMM errors, UCEs and CEs, and also describes BIOS DIMM error messages.

Uncorrectable DIMM Errors

For all operating systems (OSs), the behavior is the same for UCEs:

1. When a UCE occurs, the memory controller causes an immediate reboot of the system.

2. During reboot, the BIOS checks the Machine Check registers and determines that the previous reboot was due to a UCE, then reports this in POST after the memtest stage:

   A Hypertransport Sync Flood occurred on last boot

3. BIOS reports this event in the service processor's system event log (SEL) as shown in the sample IPMItool output below:

ipmitool -H 10.6.77.249 -U root -P changeme -I lanplus sel list
   8 | 09/25/2007 | 03:22:03 | System Boot Initiated #0x02 | Initiated by warm reset | Asserted
   9 | 09/25/2007 | 03:22:03 | Processor #0x04 | Presence detected | Asserted
   a | 09/25/2007 | 03:22:03 | OEM #0x12 | | Asserted
   b | 09/25/2007 | 03:22:03 | System Event #0x12 | Undetermined system hardware failure | Asserted
   c | OEM record e0 | 00000002000000000029000002
   d | OEM record e0 | 00000004000000000000b00006
   e | OEM record e0 | 00000048000000000011110322
   f | OEM rec
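SEL listings of this kind are pipe-delimited, which makes them straightforward to split into fields for filtering or reporting. The following is a minimal Python sketch under stated assumptions: it expects rows in the "id | date | time | sensor | event | direction" layout, or the shorter "id | OEM record xx | data" layout, as printed by ipmitool sel list; the function name parse_sel_line and the dictionary keys are illustrative, not part of IPMItool.

```python
def parse_sel_line(line):
    """Split one 'sel list' row into a dict, or return None if it is not a record."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 3 or not fields[0]:
        return None
    # Record IDs are printed in hexadecimal (for example "a", "1f").
    record = {"id": int(fields[0], 16)}
    if fields[1].startswith("OEM record"):
        record["type"] = fields[1]
        record["data"] = fields[2]
    else:
        keys = ("date", "time", "sensor", "event", "direction")
        record.update(zip(keys, fields[1:]))
    return record


if __name__ == "__main__":
    row = "   b | 09/25/2007 | 03:22:03 | System Event #0x12 | Undetermined system hardware failure | Asserted"
    print(parse_sel_line(row))
```

Treating the ID as hexadecimal keeps the parsed records in the same order as the event numbers used when the log is interpreted.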
35. g the cause of a problem with the server is to gather information from the service call paperwork or the onsite personnel. Use the following general guideline steps when you begin troubleshooting.

To gather service information:

1. Collect information about the following items:
   - Events that occurred prior to the failure
   - Whether any hardware or software was modified or installed
   - Whether the server was recently installed or moved
   - How long the server exhibited symptoms
   - The duration or frequency of the problem

2. Document the server settings before you make any changes.
   If possible, make one change at a time in order to isolate potential problems. In this way, you can maintain a controlled environment and reduce the scope of troubleshooting.

3. Take note of the results of any change that you make. Include any errors or informational messages.

4. Check for potential device conflicts before you add a new device.

5. Check for version dependencies, especially with third-party software.

System Inspection

Controls that have been improperly set and cables that are loose or improperly connected are common causes of problems with hardware components.

Troubleshooting Power Problems

- If the server will power on, skip this section and go to "Externally Inspecting the Server" on page 3.
- If the server will not power on, check the following:
  - Check that
36. Description (continued): DRAM on the DIMM interface.
Handling (continued): handler starts logging each detected error and stops logging when the limit for the same error is reached. The BIOS's polling can be disabled through a software interface.

Error: Uncorrectable DRAM ECC error
Description: The CPU detects an uncorrectable multiple-bit DIMM error.
Handling: The "sync flood" method is used to prevent the erroneous data from being propagated across the HyperTransport links. The system reboots, the BIOS recovers the machine check register information, maps this information to the failing DIMM (when CHIPKILL is disabled) or DIMM pair (when CHIPKILL is enabled), and logs that information to the SP. The BIOS will halt the CPU.
Logged: SP SEL
Fatal?: Fatal

Error: Unsupported DIMM configuration
Description: Unsupported DIMMs are used, or supported DIMMs are loaded improperly.
Handling: The BIOS displays an error message, logs an error, and halts the system.
Logged: DMI Log, SP SEL
Fatal?: Fatal

Error: HyperTransport link failure
Description: CRC or link error on one of the HyperTransport links.
Handling: Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset. The BIOS reports: A Hyper Transport sync flood error occurred on last boot, press F1 to continue.
Logged: DMI Log, SP SEL
Fatal?: Fatal

TABLE D-1 Hardware Error Handling Summary (Continued)
Columns: Error; Description; Handling; Logged (DMI Log or SP SEL); Fatal?

Error: PCI SERR, PERR
Description: System or parity error on a PCI bus.
Logged: DMI Log
Fatal?: Fatal
Handling: Sync floods on HyperTransport links, the machine resets
37. he SunVTS diagnostic software tool.

Running SunVTS Diagnostic Tests

The servers are shipped with a Bootable Diagnostics CD that contains the Sun Validation Test Suite (SunVTS) software.

SunVTS provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and functionality of most hardware controllers and devices on Sun platforms. SunVTS software can be tailored with modifiable test instances and processor affinity features.

The following tests are supported on x86 platforms:

- CD DVD Test (cddvdtest)
- CPU Test (cputest)
- Cryptographics Test (cryptotest)
- Disk and Diskette Drives Test (disktest)
- Data Translation Look-aside Buffer (dtlbtest)
- Emulex HBA Test (emlxtest)
- Floating Point Unit Test (fputest)
- InfiniBand Host Channel Adapter Test (ibhcatest)
- Level 1 Data Cache Test (l1dcachetest)
- Level 2 SRAM Test (l2sramtest)
- Ethernet Loopback Test (netlbtest)
- Network Hardware Test (nettest)
- Physical Memory Test (pmemtest)
- QLogic Host Bus Adapter Test (qlctest)
- RAM Test (ramtest)
- Serial Port Test (serialtest)
- System Test (systest)
- Tape Drive Test (tapetest)
- Universal Serial Board Test (usbtest)
- Virtual Memory Test (vmemtest)

SunVTS software has a sophisticated graphical user interface (GUI) that provides test configuration and status monitoring. The user interface can be run on one system to display the SunVTS testing of another system on
38. herboard
[Figure labels: Fans, CPU 0, Press to See Fault button]

FIGURE 3-2 DIMMs and LEDs on Mezzanine Board
[Figure labels: Fans, CPU 2, CPU 3, Motherboard Fault LED, Press to See Fault button]

Isolating and Correcting DIMM ECC Errors

If your log files report an ECC error or a problem with a DIMM, complete the steps below until you can isolate the fault.

In this example, the log file reports an error with the DIMM in CPU0, slot 7. The fault LEDs on CPU0, slots 6 and 7, are on.

To isolate and correct DIMM ECC errors:

1. If you have not already done so, shut down your server to standby power mode and remove the cover.

2. Inspect the installed DIMMs to ensure that they comply with the "DIMM Population Rules" on page 11.

3. Press the PRESS TO SEE FAULT button, and inspect the DIMM fault LEDs. See FIGURE 3-1 and FIGURE 3-2.

   A flashing LED identifies a component with a fault.
   - For CEs, the LEDs correctly identify the DIMM where the errors were detected.
   - For UCEs, both LEDs in the pair flash if there is a problem with either DIMM in the pair.

Note: If your server is equipped with a mezzanine board, the motherboard DIMMs and LEDs will be hidden beneath it. However, the Motherboard Fault LED lights to indicate that there is a problem on the motherboard (only while AC power is still connected). If
39. ilure
Description (continued): detected by reading tach signals.
Handling (continued): and individual fan module LEDs are lit.

TABLE D-1 Hardware Error Handling Summary (Continued)
Columns: Error; Description; Handling; Logged (DMI Log or SP SEL); Fatal?

Error: Multiple fan failure
Description: Fan failure is detected by reading tach signals.
Handling: The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.
Logged: SP SEL
Fatal?: Fatal

Error: Single power supply failure
Description: When any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals are deasserted.
Handling: Service Action Required and Power Supply Fault LEDs are lit.
Logged: SP SEL
Fatal?: Non-fatal

Error: DC/DC power converter failure
Description: Any POWER_GOOD signal is deasserted from the DC/DC converters.
Handling: The Service Action Required LED is lit, the system is powered down to standby power mode, and the Power LED enters standby blink state.
Logged: SP SEL
Fatal?: Fatal

Error: Voltage above/below threshold
Description: The SP monitors system voltages and detects voltage above or below a given threshold.
Handling: The Service Action Required LED and Power Supply Fault LED blink.
Logged: SP SEL
Fatal?: Fatal

Error: High temperature
Description: The SP monitors CPU and system temperatures, and detects temperatures above a given threshold.
Handling: The Service Action Required LED and System Overheat Fault LED blink. The motherboard is shut down above the specified critical level.
Logged: SP SEL
Fatal?: Fatal

Error: Processor thermal trip
Logged: SP SEL
Fatal?: Fatal
Description: The CPU drives the THERMTRIP_L
Handling: CPLD shuts down power to the CPU. The Service Action Required
40. ing SunVTS Diagnostic Software

- Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages.

   a. Click the Log button.
      The Log file window is displayed.

   b. Specify the log file that you want to view by selecting it from the Log file window.
      The content of the selected log file is displayed in the window.

   c. With the three lower buttons you can perform the following actions:
      - Print the log file: A dialog box appears for you to specify your printer options and printer name.
      - Delete the log file: The file remains on the display, but it will not be available the next time you try to display it.
      - Close the Log file window: The window is closed.

Note: When you use the Bootable Diagnostics CD, the server boots from the CD. Therefore, the test log files are not on the server's hard disk drive and they will be deleted when you power cycle the server. To save the log files, you must save them to a removable media device or FTP them to another system.

CHAPTER 3

Troubleshooting DIMM Problems

This chapter describes how to detect and correct problems with the server's Dual Inline Memory Modules (DIMMs). It includes the following sections:

- "DIMM Population Rules" on page 11
- "DIMM Replacement Policy" on page 12
41. itoring and maintenance information for your server.

- "Making a Serial Connection to the SP" on page 44
- "Viewing ILOM SP Event Logs" on page 45
- "Viewing Replaceable Component Information" on page 48
- "Viewing Sensors" on page 50

For more information on using the ILOM SP GUI to maintain the server (for example, configuring alerts), refer to the Integrated Lights Out Manager Administration Guide.

- If any of the logs or information screens indicate a DIMM error, see Chapter 3.
- If the problem with the server is not evident after viewing ILOM SP logs and information, continue with "Running SunVTS Diagnostic Tests" on page 7.

Making a Serial Connection to the SP

To make a serial connection to the SP:

1. Connect a serial cable from the RJ-45 Serial Management port on the server to a terminal device.

2. Press ENTER on the terminal device to establish a connection between that terminal device and the ILOM SP.

Note: If you are connecting to the serial port on the SP before it has been powered up or during its power-up sequence, you will see boot messages.

   The service processor eventually displays a login prompt. For example:

   SUNSP0003BA84D777 login:

   The first string in the prompt is the default host name for the ILOM SP. It consists of the prefix SUNSP and the MAC address of the ILOM SP. The MAC address for each ILOM SP is unique.

3. Log in to the SP and type the default user name, root, with
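Since the default host name is the prefix SUNSP followed by the SP's MAC address, the MAC address can be read back out of the login prompt. The following is a minimal Python sketch of that decomposition; it assumes the MAC address appears as 12 hexadecimal digits with no separators, as in the example prompt, and the function name mac_from_sp_hostname is hypothetical.

```python
import re

# Default ILOM SP host name: the literal prefix "SUNSP" plus 12 hex digits.
HOSTNAME_RE = re.compile(r"^SUNSP([0-9A-Fa-f]{12})$")


def mac_from_sp_hostname(hostname):
    """Return the MAC address embedded in a default SP host name, colon-separated."""
    match = HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"not a default SP host name: {hostname!r}")
    digits = match.group(1).upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    print(mac_from_sp_hostname("SUNSP0003BA84D777"))
```

A host name that has been changed from the default will not match the pattern, so the function raises an error instead of guessing.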
42. Handling (continued): itself, and error information gets retained through reset. The BIOS reports: A Hyper Transport sync flood error occurred on last boot, press F1 to continue.
Logged (continued): SP SEL

Error: BIOS POST Microcode Error
Description: The BIOS could not find or load the CPU Microcode Update to the CPU. The message most likely appears when a new CPU is installed in a motherboard with an outdated BIOS. In this case, the BIOS must be updated.
Handling: The BIOS displays an error message, logs the error to DMI, and boots.
Logged: DMI Log
Fatal?: Non-fatal

Error: BIOS POST CMOS Checksum Bad
Description: CMOS contents failed the Checksum check.
Handling: The BIOS displays an error message, logs the error to DMI, and boots.
Logged: DMI Log
Fatal?: Non-fatal

Error: Unsupported CPU configuration
Description: The BIOS supports mismatched frequency and steppings in CPU configuration, but some CPUs might not be supported.
Handling: The BIOS displays an error message, logs the error, and halts the system.
Logged: DMI Log
Fatal?: Fatal

Error: Correctable error
Description: The CPU detects a variety of correctable errors in the MCi_STATUS registers.
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts, and is done by the BIOS SMI handler. The SMI handler logs a message to the SP SEL if the SEL is available; otherwise SMI logs a message to DMI. The BIOS's polling can be disabled through software SMI.
Logged: DMI Log, SP SEL
Fatal?: Normal operation

Error: Single fan fa
Logged: SP SEL
Fatal?: Non-fatal
Description: Fan failure is
Handling: The Front Fan Fault, Service Action Required,
43. n of the event. TABLE 3-1 describes the contents of the display.

TABLE 3-1 Lines in IPMI Output

Event (hex)  Description
8            UCE caused a HyperTransport sync flood, which led to the system's warm reset. 0x02 refers to a reboot count maintained since the last AC power reset.
9            BIOS detected and initiated 4 processors in system.
a            BIOS detected that a Sync Flood caused this reboot.
b            BIOS detected that a hardware error caused the Sync Flood.
c to 1e      BIOS retrieved and reported some hardware evidence, including all processors' Machine Check Error registers (events 14 to 18).
1f           After BIOS detected that a UCE had occurred, it located the DIMM and reset. 0x03 refers to reboot count.
21 to 25     BIOS off-lined faulty DIMMs from system memory space and reported them. Each DIMM of a pair is being reported, since hardware UCE evidence cannot lead BIOS any further than detection of a faulty pair.

Correctable DIMM Errors

If a DIMM has 24 or more correctable errors in 24 hours, it is considered defective and should be replaced.

At this time, CEs are not logged in the server's system event logs. They are reported or handled in the supported OSs as follows:

- Windows Server:
  a. A Machine Check error message bubble appears on the task bar.
  b. The user must manually open Event Viewer to view errors. Access Event Viewer through this menu path:
     Start --> Administration Tools --> Event Viewer
  c. The user can then view individual errors (by time
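The correctable-error threshold stated above amounts to a sliding-window count per DIMM. The following is a minimal Python sketch of that check, not a Sun tool: it assumes CE events have already been extracted from OS logs as (timestamp, DIMM label) pairs sorted by time, and the names dimms_over_threshold, CE_THRESHOLD, and WINDOW are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

CE_THRESHOLD = 24              # "24 or more correctable errors"
WINDOW = timedelta(hours=24)   # "in 24 hours"


def dimms_over_threshold(events, threshold=CE_THRESHOLD, window=WINDOW):
    """Return the set of DIMM labels that reach the threshold within any window.

    events: iterable of (timestamp, dimm_label) pairs, sorted by timestamp.
    """
    recent = defaultdict(deque)  # per-DIMM timestamps still inside the window
    flagged = set()
    for when, dimm in events:
        stamps = recent[dimm]
        stamps.append(when)
        # Drop timestamps that are a full window (or more) older than this event.
        while when - stamps[0] >= window:
            stamps.popleft()
        if len(stamps) >= threshold:
            flagged.add(dimm)
    return flagged


if __name__ == "__main__":
    start = datetime(2008, 8, 1)
    burst = [(start + timedelta(minutes=30 * i), "CPU0 DIMM7") for i in range(24)]
    print(dimms_over_threshold(burst))
```

The threshold and window are parameters so that the check can be adjusted if the applicable replacement policy states the limit differently.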
44. n the CLI HLT state.

POST Code Checkpoints

The POST code checkpoints are the largest set of checkpoints during the BIOS pre-boot process. TABLE A-2 describes the type of checkpoints that might occur during the POST portion of the BIOS. These two-digit checkpoints are the output from primary I/O port 80.

TABLE A-2 POST Code Checkpoints

Post Code  Description
03         Disable NMI, Parity, video for EGA, and DMA controllers. At this point, only ROM accesses go to the GPNV. If BB size is 64K, turn on ROM Decode below FFFF0000h. It should allow USB to run in the E000 segment. The HT must program the NB-specific initialization and OEM-specific initialization, and can program if it need be at beginning of BIOS POST, similar to overriding the default values of kernel variables.
04         Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. Verify CMOS checksum manually by reading storage area. If the CMOS checksum is bad, update CMOS with power-on default values and clear passwords. Initialize status register A. Initialize data variables that are based on CMOS setup questions. Initialize both the 8259-compatible PICs in the system.
05         Initialize the interrupt controlling hardware (generally PIC) and interrupt vector table.
06         Do R/W test to CH-2 count reg. Initialize CH-0 as system timer. Install the POSTINT1Ch handler. Enable IRQ-0 in PIC for
45. nitializes different devices through DIM.
39         Initializes DMAC-1 and DMAC-2.
3A         Initialize RTC date/time.
3B         Test for total memory installed in the system. Also, check for DEL or ESC keys to limit memory test. Display total memory in the system.
3C         By this point, the RAM read/write test is completed. Program memory holes or handle any adjustments needed in RAM size with respect to NB. Test if HT Module found an error in BootBlock and CPU compatibility for MP environment.
40         Detect different devices (parallel ports, serial ports, coprocessor in CPU, and so on) successfully installed in the system and update the BDA, EBDA, and so on.
50         Programming the memory hole or any kind of implementation that needs an adjustment in system RAM size if required.
52         Updates CMOS memory size from memory found in memory test. Allocates memory for Extended BIOS Data Area from base memory.
60         Initializes NUM-LOCK status and programs the KBD typematic rate.
75         Initialize Int-13 and prepare for IPL detection.
78         Initializes IPL devices controlled by BIOS and option ROMs.
7A         Initializes remaining option ROMs.
7C         Generate and write contents of ESCD in NVRam.
84         Log errors encountered during POST.
85         Displays errors to the user and gets the user response for error.
87         Execute BIOS setup if needed/requested.
8C         Afte
46. or Handling

This appendix contains information about how the servers process and log errors. See the following sections:

- "Handling of Uncorrectable Errors" on page 53
- "Handling of Correctable Errors" on page 56
- "Handling of Parity Errors (PERR)" on page 59
- "Handling of System Errors (SERR)" on page 61
- "Handling Mismatching Processors" on page 63
- "Hardware Error Handling Summary" on page 64

Handling of Uncorrectable Errors

This section lists facts and considerations about how the server handles uncorrectable errors.

Note: The BIOS ChipKill feature must be disabled if you are testing for failures of multiple bits within a DRAM (ChipKill corrects for the failure of a four-bit-wide DRAM).

- The BIOS logs the error to the SP system event log (SEL) through the board management controller (BMC).
- The SP's SEL is updated with the failing DIMM pair's particular bank address.
- The system reboots.
- The BIOS logs the error in DMI.

Note: If the error is on low 1MB, the BIOS freezes after rebooting. Therefore, no DMI log is recorded.

- An example of the error reported by the SEL through IPMI 2.0 is as follows:

  When low memory is erroneous, the BIOS is frozen on pre-boot low memory test because the BIOS cannot decompress itself into faulty DRAM and execute the following items:

  ipmitool> sel list
   100 | 08/26/2005 | 11:36:09 | OEM #0xfb |
   200 | 08/26/2005 | 11:36:12 | System Firmw
47. [Screenshot residue: ILOM Sensor Readings page. "View readings for system sensors. Click on a sensor name for more information, including threshold values." The table lists each sensor's Name, Type, and Reading.]

3. Click the Refresh button to update the sensor readings to their current status.

4. Click a sensor to display its thresholds.

   A display of properties and values appears. See the example in FIGURE C-4.

FIGURE C-4 Sensor Details Page

[Screenshot residue: Integrated Lights Out Manager sensor details page showing the Property and Value columns for a presence sensor (type: Entity Presence, class: Discrete Sensor, value: Present).]

5. If the problem with the server is not evident after viewing sensor readings information, continue with "Running SunVTS Diagnostic Tests" on page 7.

APPENDIX D Err
48. ord e0 | 00000058000000000000030000
  10 | OEM record e0 | 000100440000000000feff  000
  11 | OEM record e0 | 00010048000000000000ff3efa
  12 | OEM record e0 | 10ab0000000010000006040012
  13 | OEM record e0 | 10ab0000001111002011110020
  14 | OEM record e0 | 0018304c00  200002000020c0
  15 | OEM record e0 | 0019304c00f200004000020c0f
  16 | OEM record e0 | 001a304c00f45aa10015080a13
  17 | OEM record e0 | 001a3054000000000320004880
  18 | OEM record e0 | 001b304c00f200001000020c0f
  19 | OEM record e0 | 80000002000000000029000002
  1a | OEM record e0 | 80000004000000000000b00006
  1b | OEM record e0 | 80000048000000000011110322
  1c | OEM record e0 | 80000058000000000000030000
  1d | OEM record e0 | 800100440000000000feff  000
  1e | OEM record e0 | 80010048000000000000ff3efa
  1f | 09/25/2007 | 03:22:06 | System Boot Initiated #0x03 | Initiated by warm reset | Asserted
  20 | 09/25/2007 | 03:22:06 | Processor #0x04 | Presence detected | Asserted
  21 | 09/25/2007 | 03:22:15 | System Firmware Progress #0x01 | Memory initialization | Asserted
  22 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 0
  23 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 1
  24 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 0
  25 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 1

The lines in the display start with event numbers (in hex), followed by a descriptio
49. ostic Tests
  SunVTS Documentation
  Diagnosing Server Problems With the Bootable Diagnostics CD
  Requirements
  Using the Bootable Diagnostics CD

Troubleshooting DIMM Problems
  DIMM Population Rules
  DIMM Replacement Policy
  How DIMM Errors Are Handled by the System
  Uncorrectable DIMM Errors
  Correctable DIMM Errors
  BIOS DIMM Error Messages
  DIMM Fault LEDs
  Isolating and Correcting DIMM ECC Errors

Event Logs and POST Codes
  Viewing Event Logs
  Power-On Self-Test (POST)
  How BIOS POST Memory Testing Works
  Redirecting Console Output
  Changing POST Options
  POST Codes
  POST Code Checkpoints

Status Indicator LEDs
  External Status Indicator LEDs
  Front Panel LEDs
  Back Panel LEDs
  Hard Drive LEDs
  Internal Status Indicator LEDs

Using the ILOM Service Processor GUI to View System Information
  Making a Serial Connection to the SP
  Viewing ILOM SP Event Logs
  Interpreting Event Log Time Stamps
  Viewing Replaceable Component Information
  Viewing Sensors

Error Handling
  Handling of Uncorrectable Errors
  Handling of Correctable Errors
  Handling of Parity Errors (PERR)
  Handling of System Errors (SERR)
  Handling Mismatching Processors
  Hardware Error Handling Summary

Index
50. or end users, for nuclear weapons, missiles, chemical or biological weapons, or nuclear maritime end uses, whether direct or indirect, are strictly prohibited. Exports or re-exports to countries subject to U.S. embargo, or to entities identified on U.S. export exclusion lists, including, but not limited to, the list of persons who are subject to an order not to participate, directly or indirectly, in exports of products or services governed by U.S. export control laws, and the list of specially designated nationals, are strictly prohibited.

Use of spare parts or replacement CPUs is limited to repair or one-for-one replacement of CPUs in exported products, in compliance with U.S. export laws. Unless authorized by U.S. authorities, use of CPUs for product upgrades is strictly prohibited.

Contents

Preface

Initial Inspection of the Server
  Service Troubleshooting Flowchart
  Gathering Service Information
  System Inspection
  Troubleshooting Power Problems
  Externally Inspecting the Server
  Internally Inspecting the Server

Using SunVTS Diagnostic Software
  Running SunVTS Diagn
51. out of RAM.
01d6       Key sequence and OEM-specific method is checked to determine if BIOS recovery is forced. If next code is E0, BIOS recovery is being executed. Main BIOS checksum is tested.
01d7       Restoring CPUID; moving bootblock-runtime interface module to RAM; determine whether to execute serial flash.
01d8       Uncompressing runtime module into RAM. Storing CPUID information in memory.
01d9       Copying main BIOS into memory.
01da       Giving control to BIOS POST.
0004       Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. If the CMOS checksum is bad, update CMOS with power-on default values.
00c2       Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user-requested value for GART Error Reporting setup question.
00c3       Errata workarounds applied to the BSP (78 & 110).
00c6       Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata 106, 107, 69, and 63 if appropriate.
00c7       HT sets link frequencies and widths to their final values.
000a       Initializing the 8042-compatible Keyboard Controller.
000c       Detecting the presence of Keyboard in KBC port.
000e       Testing and initialization of different Input Devices. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1.
8600       Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT st
52. ps.

When the service processor reboots, the SP clock is set to Thu Jan 1 00:00:00 UTC 1970. The SP reboots as a result of the following:

- A complete system unplug/replug power cycle
- An IPMI command, for example, mc reset cold
- A command-line interface (CLI) command, for example, reset /SP
- An ILOM web GUI operation, for example, from the Maintenance tab, selecting Reset SP
- An SP firmware upgrade

After an SP reboot, the SP clock is changed by the following events:

- When the host is booted. The host's BIOS unconditionally sets the SP time to that indicated by the host's RTC. The host's RTC is set by the following operations:
  - When the host's CMOS is cleared as a result of changing the host's RTC battery or inserting the CMOS-clear jumper on the motherboard. The host's RTC starts at Jan 1 00:01:00 2002.
  - When the host's operating system sets the host's RTC. The BIOS does not consider time zones. Solaris and Linux software respect time zones and will set the system clock to UTC. Therefore, after the OS adjusts the RTC, the time set by the BIOS will be UTC.
  - When the user sets the RTC using the host BIOS Setup screen.
- Continuously via NTP if NTP is enabled on the SP. NTP jumping is enabled to recover quickly from an erroneous update from the BIOS or user. NTP servers provide UTC time. Therefore, if NTP is enabled on the SP, the SP clo
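The two reset values described above (the SP clock restarting at the 1970 epoch, and the host RTC restarting at Jan 1 00:01:00 2002 after a CMOS clear) can help when reading event log time stamps that look implausibly old. The following Python sketch is a hypothetical helper, not an ILOM or ipmitool feature. The function name, the returned labels, and the 30-day tolerance after each reset value are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Reset values taken from the text above.
SP_CLOCK_RESET = datetime(1970, 1, 1, 0, 0, 0)    # SP clock after an SP reboot
HOST_RTC_RESET = datetime(2002, 1, 1, 0, 1, 0)    # host RTC after a CMOS clear

# Assumption: a clock that was reset and never corrected will still be
# within this many days of its reset value when the event is logged.
TOLERANCE = timedelta(days=30)


def classify_time_stamp(stamp):
    """Label a SEL time stamp as a likely clock-reset artifact or as plausible."""
    if SP_CLOCK_RESET <= stamp < SP_CLOCK_RESET + TOLERANCE:
        return "likely SP clock reset (no host boot or NTP update yet)"
    if HOST_RTC_RESET <= stamp < HOST_RTC_RESET + TOLERANCE:
        return "likely host RTC reset (CMOS cleared, RTC not set since)"
    return "plausible wall-clock time"


if __name__ == "__main__":
    for stamp in (datetime(1970, 1, 1, 0, 5, 0), datetime(2007, 9, 25, 3, 22, 6)):
        print(stamp.isoformat(), "->", classify_time_stamp(stamp))
```

The labels are hints only. A time stamp near either reset value can also be genuine if the clock was deliberately set to that date.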
53. [Screenshot residue: component drop-down list showing motherboard DIMM entries.]

3. Select a component from the drop-down list.

   Information about the selected component is displayed.

4. If the problem with the server is not evident after viewing replaceable component information, continue with "Running SunVTS Diagnostic Tests" on page 7.

Viewing Sensors

This section describes how to view the server temperature, voltage, and fan sensor readings.

For a complete list of sensors, see Appendix D.

To view sensor readings:

1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI.

   a. Type the IP address of the server's SP into your web browser.

      The Sun Integrated Lights Out Manager Login screen is displayed.

   b. Type your user name and password.

      When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:

      Default user name: root
      Default password: changeme

2. From the System Monitoring tab, select Sensor Readings.

   The Sensor Readings page is displayed. See FIGURE C-3.

FIGURE C-3 Sensor Readings Page

[Screenshot residue: Integrated Lights Out Manager page header.]
54. EXAMPLE D-1 DMI Log Screen, Correctable Error, Memory Decreased

[Screen residue: BIOS View Event Log screen showing a memory-decreased entry and a memory error on Node 1 DIMM Pair 0.]

Handling of Parity Errors (PERR)

This section lists facts and considerations about how the server handles parity errors (PERR).

- The handling of parity errors works through NMIs.
- During BIOS POST, the NMI is logged in the DMI and the SP SEL. See the following example command and output:

  [root@d-mpk12-53-238 root]# ipmitool -H 129.146.53.95 -U root -P changeme -I lan sel list -v

  SEL Record ID   : 0100
  Record Type     : 00
  Timestamp       : 01/10/2002 20:16:16
  Generator ID    : 0001
  EvM Revision    : 04
  Sensor Type     : Critical Interrupt
  Sensor Number   : 00
  Event Type      : Sensor-specific Discrete
  Event Direction : Assertion Event
  Event Data      : 04ff00
  Description     : PCI PERR

- FIGURE D-3 shows an example of a DMI log screen from BIOS Setup Page, with a parity error.

FIGURE D-3 DMI Log Screen, PCI Parity Error

[Screen residue: BIOS SETUP UTILITY, View Event Log screen showing a PCI parity entry.]

- The BIOS displays the following messages and freezes (during POST or DOS):

  NMI EVENT
  System Halted due to Fatal NMI

- The 
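The verbose SEL record in the ipmitool example above is a series of key and value pairs, which makes it straightforward to load into a dictionary for scripted checks. The following Python sketch assumes the `Key : Value` layout, with a colon after each field name. The function name and the embedded sample text are illustrative, and the field names are copied from the example rather than from a formal specification.

```python
def parse_sel_record(text):
    """Parse one verbose SEL record ("Key : Value" per line) into a dict.

    Only the first colon on each line separates key from value, so values
    that contain colons themselves (such as time stamps) are kept intact.
    """
    record = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            record[key.strip()] = value.strip()
    return record


# Sample text reproduced from the example output in this section.
SAMPLE = """\
SEL Record ID   : 0100
Record Type     : 00
Timestamp       : 01/10/2002 20:16:16
Generator ID    : 0001
EvM Revision    : 04
Sensor Type     : Critical Interrupt
Sensor Number   : 00
Event Type      : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data      : 04ff00
Description     : PCI PERR
"""

if __name__ == "__main__":
    record = parse_sel_record(SAMPLE)
    print(record["Sensor Type"], "/", record["Description"])
```

A script built on this could, for example, filter a saved log for records whose Description is PCI PERR before opening the DMI log in BIOS Setup.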
55. r all device initialization is done, program any user-selectable parameters relating to NB/SB, such as timing parameters, non-cacheable regions, and the shadow RAM cacheability, and do any other NB/SB/PCIX/OEM-specific programming needed during Late-POST. Background scrubbing for DRAM, and L1 and L2 caches are set up based on setup questions. Get the DRAM scrub limits from each node.
8D         Build ACPI tables (if ACPI is supported).
8E         Program the peripheral parameters. Enable/disable NMI as selected.
90         Late POST initialization of system management interrupt.
A0         Check boot password if installed.
A1         Clean-up work needed before booting to OS.
A2         Takes care of runtime image preparation for different BIOS modules. Fills the free area in F000h segment with 0FFh. Initializes the Microsoft IRQ Routing Table. Prepares the runtime language module. Disables the system configuration display if needed.
A4         Initialize runtime language module.
A7         Displays the system configuration screen if enabled. Initializes the CPUs before boot, which includes the programming of the MTRRs.
A8         Prepare CPU for OS boot including final MTRR values.
A9         Wait for user input at configuration display if needed.
AA         Uninstall POST INT1Ch vector and INT09h vector. Deinitializes the ADM module.
AB         Prepare BBS for Int 19 boot.
AC         Any kind of chipset-specific (NB/SB) programming needed during End-POST, just before giving control to runtime code booting to OS. Program the system BIOS (0F
57. After you have selected a category of event, the Event Log table is updated with the specified events. The fields in the Event Log are described in TABLE C-1.

TABLE C-1 Event Log Fields

Field        Description
Event ID     The number of the event, in sequence from number 1.
Time Stamp   The day and time the event occurred. If the Network Time Protocol (NTP) server is enabled to set the SP time, the SP clock will use Universal Coordinated Time (UTC). For more information about time stamps, see "Interpreting Event Log Time Stamps" on page 47.
Sensor Name  The name of a component for which an event was recorded. The sensor name abbreviations correspond to these components:
             - sys: System or chassis
             - p0: Processor 0
             - p1: Processor 1
             - io: I/O board
             - ps: Power supply
             - fp: Front panel
             - ft: Fan tray
             - mb: Motherboard
Sensor Type  The type of sensor for the specified event.
Description  A description of the event.

4. To clear the event log, click the Clear Event Log button.

   A confirmation dialog box is displayed.

5. Click OK to clear all entries in the log.

6. If the problem with the server is not evident after viewing ILOM SP logs and information, continue with "Running SunVTS Diagnostic Tests" on page 7.

Interpreting Event Log Time Stamps

The system event log time stamps are related to the service processor clock settings. If the clock settings change, the change is reflected in the time stam
58. [BIOS screen residue: Advanced Settings menu listing CPU Configuration, IDE Configuration, Hyper Transport Configuration, ACPI Configuration, Event Log Configuration, IPMI 2.0 Configuration, MPS Configuration, PCI Express Configuration, Remote Access Configuration, and USB Configuration.]

b. From the Advanced Settings screen, select Event Log Configuration.

   The Advanced Menu Event Logging Details screen is displayed.

[BIOS screen residue: Event Logging Details menu with the options View Event Log, Mark all events as read, and Clear Event Log. Help text: "View all unread events on the Event Log."]
60. CHAPTER 1

Initial Inspection of the Server

This chapter includes the following topics:

- "Service Troubleshooting Flowchart" on page 1
- "Gathering Service Information" on page 2
- "System Inspection" on page 3

Service Troubleshooting Flowchart

Use the following flowchart as a guideline for using the subjects in this book to troubleshoot the server.

TABLE 1-1 Troubleshooting Flowchart

To perform this task | Refer to this section
Gather initial service information. | "Gathering Service Information" on page 2
Investigate any powering-on problems. | "Troubleshooting Power Problems" on page 3
Perform external visual inspection and internal visual inspection. | "Externally Inspecting the Server" on page 3; "Internally Inspecting the Server" on page 4; Chapter 3
View BIOS event logs and POST messages. | "Viewing Event Logs" on page 21; "Power-On Self-Test (POST)" on page 25
View service processor logs and sensor information. | "Using the ILOM Service Processor GUI to View System Information" on page 43
Or view service processor logs and sensor information. | "Using IPMItool to View System Information" on page 55
Run SunVTS diagnostics. | "Diagnosing Server Problems With the Bootable Diagnostics CD" on page 8

Gathering Service Information

The first step in determinin
61. the Motherboard Fault LED on the mezzanine board lights, remove the mezzanine board as described in your server's service manual, and inspect the LEDs on the motherboard.

4. Disconnect the AC power cords from the server.

Caution: Before handling components, attach an ESD wrist strap to a chassis ground (any unpainted metal surface). The system's printed circuit boards and hard disk drives contain components that are extremely sensitive to static electricity.

Note: To recover fault information, look in the SP SEL, as described in the Sun Integrated Lights Out Manager 2.0 User's Guide.

5. Remove the DIMMs from the DIMM slots in the CPU.

   Refer to your server's service manual for details.

6. Visually inspect the DIMMs for physical damage, dust, or any other contamination on the connector or circuits.

7. Visually inspect the DIMM slot for physical damage. Look for cracked or broken plastic on the slot.

8. Dust off the DIMMs, clean the contacts, and reseat them.

Caution: Use only compressed air to dust DIMMs.

9. If there is no obvious damage, replace any failed DIMMs.

   For UCEs, if the LEDs indicate a fault with the pair, replace both DIMMs. Ensure that they are inserted correctly with ejector latches secured.

10. Reconnect AC power cords to the server.

11. Power on the server and run the diagnostics test again.

12. Review the log file.

    If the tests identify
62. the same error, the problem is in the CPU, not the DIMMs.

APPENDIX A

Event Logs and POST Codes

This appendix contains information about the BIOS event log, the BMC system event log, the power-on self-test (POST), and console redirection. It contains the following sections:

- "Viewing Event Logs" on page 21
- "Power-On Self-Test (POST)" on page 25

Viewing Event Logs

Use this procedure to view the BIOS event log and the BMC system event log.

1. To turn on main power mode (all components powered on), if necessary, use a ball-point pen or other stylus to press and release the Power button on the server front panel. See FIGURE 1-1.

   When main power is applied to the full server, the Power/OK LED next to the Power button lights and remains lit.

2. Enter the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST).

   The BIOS Main menu screen is displayed.

3. View the BIOS event log.

   a. From the BIOS Main Menu screen, select Advanced.

      The Advanced Settings screen is displayed.

[BIOS screen residue: menu bar with Main, Advanced, PCIPnP, Boot, Security, Chipset, and Exit; Advanced Settings screen with the help text "Configure CPU" and a warning about setting wrong values in the sections below.]
63. tial order and some are repeated, because some POST codes are issued by code in add-in card BIOS expansion ROMs. In the case of early POST failures (for example, the BSP fails to operate correctly), BIOS just halts without logging. For some other POST failures subsequent to memory and SP initialization, the BIOS logs a message to the SP's SEL.

TABLE D-1 Hardware Error Handling Summary (Continued)

Columns: Error | Description | Handling | Logged (DMI Log or SP SEL) | Fatal

Error: Single-bit DRAM ECC error
Description: With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface.
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts and is done by the BIOS SMI handler. The BIOS SMI handler starts logging each detected error and stops logging when the limit for the same error is reached. The BIOS's polling can be disabled through a software interface.
Logged: SP SEL
Fatal: Normal operation

Error: Single four-bit DRAM error
Logged: SP SEL
Fatal: Normal operation
Description: With CHIP-KILL enabled in the BIOS Setup, the CPU detects and corrects for the failure of a four-bit-wide
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts and is done by the BIOS SMI handler. The BIOS SMI 
64. vailable language, BIOS logo, and Silent logo modules.
13         Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if the system will support it. Set up ECC options before memory clearing. REDIRECTION causes corrected data to be written to RAM immediately. CHIPKILL provides 4-bit error det/corr of x4 type memory. Enable PCI-X clock lines in the 8131.
20         Relocate all the CPUs to a unique SMBASE address. The BSP will be set to have its entry point at A000:0. If fewer than 5 CPU sockets are present on a board, subsequent CPUs' entry points will be separated by 8000h bytes. If more than 4 CPU sockets are present, entry points are separated by 200h bytes. The CPU module will be responsible for the relocation of the CPU to the correct address. NOTE: APs are left in the INIT state.
24         Uncompress and initialize any platform-specific BIOS modules.
30         Initialize System Management Interrupt.
2A         Initializes different devices through DIM.
2C         Initializes different devices. Detects and initializes the video adapter installed in the system that have optional ROMs.
2E         Initializes all the output devices.
31         Allocate memory for ADM module and uncompress it. Give control to ADM module for initialization. Initialize language and font modules for ADM. Activate ADM module.
33         Initializes the silent boot module. Set the window for displaying text information.
37         Displaying sign-on message, CPU information, setup key message, and any OEM-specific information.
38         I
65. you set this to off, the keyboard Num Lock is not turned on during boot.

- Wait for F1 if Error: This option is disabled by default. If you enable this, the system will pause if an error is found during POST and will only resume when you press the F1 key.
- Interrupt 19 Capture: This option is reserved for future use. Do not change.
- Default Boot Order: The letters in the brackets represent the boot devices. To see the letters defined, position your cursor over the field and read the definition in the right side of the screen.

POST Codes

TABLE A-1 contains descriptions of each of the POST codes, listed in the same order in which they are generated. These POST codes appear as a four-digit string that is a combination of two-digit output from primary I/O port 80 and two-digit output from secondary I/O port 81. In the POST codes listed in TABLE A-1, the first two digits are from port 81 and the last two digits are from port 80.

TABLE A-1 POST Codes

Post Code  Description
00d0       Coming out of POR, PCI configuration space initialization, enabling 8111's SMBus.
00d2       Disable cache, full memory sizing, and verify that flat mode is enabled.
00d3       Memory detections and sizing in boot block, cache disabled, IO APIC enabled.
01d4       Test base 512KB memory. Adjust policies and cache first 8MB.
01d5       Bootblock code is copied from ROM to lower RAM. BIOS is now executing 
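The four-digit POST code format described under "POST Codes" above (first two digits from secondary I/O port 81, last two digits from primary I/O port 80) can be split with a few lines of code when working through a captured list of codes. The following Python sketch is a minimal illustration. The function name and the strict four-hex-digit validation are assumptions and are not taken from the BIOS or from any Sun tool.

```python
import string


def split_post_code(code):
    """Split a four-digit POST code into (port 81 byte, port 80 byte).

    Per the text above, the first two hex digits come from secondary
    I/O port 81 and the last two from primary I/O port 80.
    """
    digits = code.strip()
    if len(digits) != 4 or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"expected four hex digits, got {code!r}")
    value = int(digits, 16)
    return value >> 8, value & 0xFF


if __name__ == "__main__":
    port81, port80 = split_post_code("01d4")
    print(f"port 81 = {port81:02x}, port 80 = {port80:02x}")
```

The low byte is the value to look up when comparing against the two-digit port 80 checkpoints in TABLE A-2.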
    