        System Event Log Troubleshooting Guide for EPSD
         Contents
1.  Byte  Field         Description
          [3:0] Event Trigger Offset = 0Ah (Critical over temperature)
    15    Event Data 2  Not used
    16    Event Data 3  Not used

    5.2.6.1 DIMM Thermal Trip Sensors - Next Steps

    - Check for clear and unobstructed airflow into and out of the chassis.
    - Ensure the SDR is programmed and the correct chassis has been selected.
    - Ensure there are no fan failures.
    - Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

    5.3 System Air Flow Monitoring Sensor

    The BMC provides an IPMI sensor to report the volumetric system airflow in CFM (cubic feet per minute). The airflow in CFM is calculated from the system fan PWM values. The specific Pulse Width Modulation (PWM) signal or signals used to determine the CFM are SDR configurable. The relationship between PWM and CFM is based on a lookup table in an OEM SDR.

    The airflow data is used in the calculation for exit air temperature monitoring. It is exposed as an IPMI sensor so that a data center management application can use it for rack-level thermal management.

    This sensor is informational only and will not log events into the SEL.

    Intel order number G90620-002, Revision 1.1
    System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families

    6. Processor Subsystem

    Intel® servers report multiple processor
2.  Table 23. Power Supply Power In Sensor - Event Trigger Offset - Next Steps
    Table 24. Power Supply Current Out Sensors Typical Characteristics
    Table 25. Power Supply Current Out Sensor - Event Trigger Offset - Next Steps
    Table 26. Power Supply Temperature Sensors Typical Characteristics
    Table 27. Power Supply Temperature Sensor - Event Trigger Offset - Next Steps
    Table 28. Power Supply Fan Tachometer Sensors Typical Characteristics
    Table 29. Fan Tachometer Sensors Typical Characteristics
    Table 30. Fan Tachometer Sensor - Event Trigger Offset - Next Steps
    Table 31. Fan Presence Sensors Typical Characteristics
    Table 32. Fan Presence Sensors - Event Trigger Offset - Next Steps
    Table 33. Fan Redundancy Sensors Typical Characteristics
    Table 34. Fan Redundancy Sensor - Event Trigger Offset - Next Steps
    Table 35. Temperature Sensors Typical Characteristics
    Table 36. Temperature Sensors Event Triggers - Description
    Table 37. Temperature Sensors - Next Steps
    Table 38. Thermal Margin Sensors Typical Character
3.  Table 7. BIOS SMI Handler owned Sensors
    Table 8. Management Engine Firmware Owned Sensors
    Table 9. Microsoft OS Owned Events
    Table 10. Linux Kernel Panic Events
    Table 11. Threshold-based Voltage Sensors Typical Characteristics
    Table 12. Threshold-based Voltage Sensors Event Triggers - Description
    Table 13. Threshold-based Voltage Sensors - Next Steps
    Table 14. Voltage Regulator Watchdog Timer Sensor Typical Characteristics
    Table 15. Power Unit Status Sensors Typical Characteristics
    Table 16. Power Unit Status Sensor - Sensor Specific Offsets - Next Steps
    Table 17. Power Unit Redundancy Sensors Typical Characteristics
    Table 18. Power Unit Redundancy Sensor - Event Trigger Offset - Next Steps
    Table 19. Node Auto Shutdown Sensor Typical Characteristics
    Table 20. Power Supply Status Sensors Typical Characteristics
    Table 21. Power Supply Status Sensor - Sensor Specific Offsets - Next Steps
    Table 22. Power Supply Power In Sensors Typical Characteristics
4.  Byte  Field          Description
    8-9   Generator ID   0001h = BIOS POST
    11    Sensor Type    0Ch = Memory
    12    Sensor Number  02h
    13    Event Direction and Event Type
                         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                         [6:0] Event Type = 09h (Digital Discrete)
    14    Event Data 1   [7:6] 10b = OEM code in Event Data 2
                         [5:4] 10b = OEM code in Event Data 3
                         [3:0] Event Trigger Offset as described in Table 59
    15    Event Data 2   RAS Configuration Error Type
                         [7:4] Reserved
                         [3:0] Configuration Error: 0 = None, 3 = Invalid DIMM Configuration for RAS Mode. All other values are reserved.
    16    Event Data 3   RAS Mode Configured
                         [7:4] Reserved
                         [3:0] RAS Mode: 0h = None (Independent Channel Mode), 1h = Mirroring Mode, 2h = Lockstep Mode, 4h = Rank Sparing Mode

    Table 59. Memory RAS Configuration Status Sensor - Event Trigger Offset - Next Steps

    Offset  Description                 Details                                          Next Steps
    01h     RAS configuration enabled   User enabled mirrored channel mode in setup.     Informational event only.
    00h     RAS configuration disabled  Mirrored channel mode is disabled (either in se  1. If this event is accompanied by a post error 8500, there was a problem
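The Event Data 2 and Event Data 3 fields described above can be decoded with a pair of lookups. The following is a minimal sketch, not part of the guide: the function and dictionary names are hypothetical, and only the code values listed in the excerpt are mapped, with everything else reported as reserved.

```python
# Hypothetical helper; the code tables are copied from the excerpt above.
RAS_CONFIG_ERROR = {
    0x0: "None",
    0x3: "Invalid DIMM Configuration for RAS Mode",
}

RAS_MODE = {
    0x0: "None (Independent Channel Mode)",
    0x1: "Mirroring Mode",
    0x2: "Lockstep Mode",
    0x4: "Rank Sparing Mode",
}


def decode_ras_config(ed2, ed3):
    """Return (configuration error, RAS mode) from Event Data 2 and 3.

    Only bits [3:0] of each byte carry information; bits [7:4] are reserved.
    """
    error = RAS_CONFIG_ERROR.get(ed2 & 0x0F, "Reserved")
    mode = RAS_MODE.get(ed3 & 0x0F, "Reserved")
    return error, mode


if __name__ == "__main__":
    # Example values chosen for illustration, not taken from a real log.
    print(decode_ras_config(0x03, 0x01))
```

Masking with 0x0F keeps the sketch tolerant of non-zero reserved bits in the upper nibble.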
5.  Sensor Number  Sensor Name / Details Section / Next Steps

    02h  Memory RAS Configuration Status
         Details: Memory RAS Configuration Status
         Next steps: Table 58. Memory RAS Configuration Status Sensor Typical Characteristics
    06h  POST Error
         Details: System Firmware Progress (Formerly Post Error)
         Next steps: System Firmware Progress (Formerly Post Error) - Next Steps
    09h  Intel Quick Path Interface Link Width Reduced
         Details: QPI Link Width Reduced Sensor
         Next steps: QPI Link Width Reduced Sensor - Next Steps
    12h  Memory RAS Mode Select
         Details: Memory RAS Mode Select
         Next steps: Not applicable
    83h  System Event
         Details: System Events
         Next steps: Not applicable

    3.3 BIOS SMI Handler owned Sensors (GID = 0033h)

    The following table can be used to find the details of sensors owned by the BIOS SMI Handler.

    Table 7. BIOS SMI Handler owned Sensors

    Sensor Number  Sensor Name / Details Section / Next Steps

    01h  Mirroring Redundancy State
         Details: Mirroring Redundancy State
         Next steps: Mirroring Redundancy State Sensor - Next Steps
    02h  Memory ECC Error
         Details: Memory Correctable and Uncorrectable ECC Error
         Next steps: Table 64. Correctable and Uncorrectable ECC Error Sensor - Event Trigger Offset - Next Steps
    03h  Legacy PCI Error
         Details: Legacy PCI Errors
         Next steps: Legacy PCI Error Sensor - Next Steps
    04h  PCI Express Fatal Error
         Details: PCI Express Fatal Errors and Fatal Error 2
         Next steps: PCI Express Fatal Error and Fatal Error 2 Sensor - Next Steps
    05h  PCI Express Correctable Error
         Details: PCI Express Correctable Errors
         Next steps: PCI Expres
6.  IPMI v1.0 (ER)

    Byte  Field                            Description
    11    Sensor Type (ST)                 Sensor Type Code for the sensor that generated the event.
    12    Sensor # (SN)                    Number of the sensor that generated the event (from the SDR).
    13    Event Dir | Event Type (EDIR)    Event Dir [7]: 0b = Assertion event, 1b = Deassertion event.
                                           Event Type [6:0]: Type of trigger for the event, for example, critical threshold going high, state asserted, and so on. Also indicates the class of the event, for example, discrete, threshold, or OEM. The Event Type field is encoded using the Event/Reading Type Code.
                                           Event Type Codes: 01h = Threshold (States = 0x00-0x0b), 02h-0Ch = Discrete, 6Fh = Sensor Specific, 70h-7Fh = OEM
    14    Event Data 1 (ED1)               Per Table 2
    15    Event Data 2 (ED2)
    16    Event Data 3 (ED3)

    Table 2. Event Request Message Event Data Field Contents

    Sensor Class  Event Data
    Threshold     Event Data 1
                  [7:6] 00b = Unspecified Event Data 2
                        01b = Trigger reading in Event Data 2
                        10b = OEM code in Event
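The byte 13 layout above (direction in bit 7, Event Type in bits 6:0) can be unpacked with two masks. The following is a minimal sketch under the assumption that only the Event Type code ranges quoted in the excerpt matter; the function name and the "Other/unspecified" label for unlisted codes are illustrative choices, not terms from the guide.

```python
def decode_event_dir_type(byte13):
    """Split SEL byte 13 into (direction, event type code, event class)."""
    # Bit 7: 0b = Assertion event, 1b = Deassertion event.
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"
    # Bits 6:0: Event Type code.
    event_type = byte13 & 0x7F

    if event_type == 0x01:
        event_class = "Threshold"
    elif 0x02 <= event_type <= 0x0C:
        event_class = "Discrete"
    elif event_type == 0x6F:
        event_class = "Sensor Specific"
    elif 0x70 <= event_type <= 0x7F:
        event_class = "OEM"
    else:
        # Codes not listed in the excerpt; label chosen for this sketch only.
        event_class = "Other/unspecified"
    return direction, event_type, event_class


if __name__ == "__main__":
    for value in (0x6F, 0x81, 0x71):
        print(f"{value:02X}h ->", decode_event_dir_type(value))
```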
7.  Offset  Description                Next Steps
    07h     Rebuild/Remap in progress  If you have replaced a hard drive, this is expected.
                                       If you have a hot spare and one of the drives failed, this is expected. Check logs for which drive has failed.
                                       If this is seen unexpectedly, it could be an indication of a drive that is close to failing.

    12.3 Hot Swap Controller Health Sensor

    The BMC supports an IPMI sensor to indicate the health of the Hot Swap Controller (HSC). This sensor indicates that the controller is offline when the BMC either cannot communicate with it, or it is stuck in a degraded state so that the BMC cannot restore it to full operation through a firmware update.

    Table 91. HSC Health Sensor Typical Characteristics

    Byte  Field          Description
    11    Sensor Type    16h = Microcontroller
    12    Sensor Number  69h = Hot Swap Controller 1 Status
                         6Ah = Hot Swap Controller 2 Status
                         6Bh = Hot Swap Controller 3 Status
    13    Event Direction and Event Type
                         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                         [6:0] Event Type = 0Ah (Discrete)
    14    Event Data 1   [7:6] 00b = Unspecified Event Data 2
                         [5:4] 00b = Unspecified Event Data 3
                         [3:0] Event Trigger Offset = 4h = Transition to offline
8.  (MIC) Status Sensors; Intel® Xeon Phi Coprocessor (MIC) Status Sensors - Next Steps

    Sensor Number  Sensor Name                                                       Details Section         Next Steps
    B0h  Processor 1 DIMM Aggregate Thermal Margin 1 (P1 DIMM Thrm Mrgn1)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
    B1h  Processor 1 DIMM Aggregate Thermal Margin 2 (P1 DIMM Thrm Mrgn2)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
    B2h  Processor 2 DIMM Aggregate Thermal Margin 1 (P2 DIMM Thrm Mrgn1)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
    B3h  Processor 2 DIMM Aggregate Thermal Margin 2 (P2 DIMM Thrm Mrgn2)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
    B4h  Processor 3 DIMM Aggregate Thermal Margin 1 (P3 DIMM Thrm Mrgn1)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
    B5h  Processor 3 DIMM Aggregate Thermal Margin 2 (P3 DIMM Thrm Mrgn2)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
    B6h  Processor 4 DIMM Aggregate Thermal Margin 1 (P4 DIMM Thrm Mrgn1)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps

    Sensor Cross Reference List

    Sensor  Sensor Name  Details Section  Next Steps
    Num
9.  Byte  Field          Description
    12    Sensor Number  05h
    13    Event Direction and Event Type
                         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                         [6:0] Event Type = 71h (OEM Specific)
    14    Event Data 1   [7:6] 10b = OEM code in Event Data 2
                         [5:4] 10b = OEM code in Event Data 3
                         [3:0] Event Trigger Offset:
                               0h = Receiver Error
                               1h = Bad DLLP
                               2h = Bad TLP
                               3h = Replay Num Rollover
                               4h = Replay Timer timeout
                               5h = Advisory Non-fatal
                               6h = Link BW Changed
                               7h = Correctable Internal
                               8h = Header Log Overflow
                               Fh = Unspecified Non-AER Correctable Error
    15    Event Data 2   PCI Bus number
    16    Event Data 3   [7:3] PCI Device number
                         [2:0] PCI Function number

    8.1.3.1 PCI Express Correctable Error Sensor - Next Steps

    This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:

    1. Decode the bus, device, and function to identify the card.
    2. If this is an add-in card:
       a. Verify the card is inserted properly.
       b. Install the card in another slot and check whether the error follows the card or stays with the slot.
       c. Update all firmware and drivers, including non-Intel components.
    3. If this is an on-board device:
       a. Update all
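Step 1 of the next steps ("decode the bus, device, and function") follows directly from the Event Data 2 and Event Data 3 layout in the table above. The sketch below assumes that layout; the function name and the returned dictionary keys are hypothetical, and the sample bytes are constructed for illustration rather than copied from a real log.

```python
# Offsets copied from the Event Data 1 [3:0] list in the excerpt above.
PCIE_CORRECTABLE_OFFSETS = {
    0x0: "Receiver Error",
    0x1: "Bad DLLP",
    0x2: "Bad TLP",
    0x3: "Replay Num Rollover",
    0x4: "Replay Timer timeout",
    0x5: "Advisory Non-fatal",
    0x6: "Link BW Changed",
    0x7: "Correctable Internal",
    0x8: "Header Log Overflow",
    0xF: "Unspecified Non-AER Correctable Error",
}


def decode_pcie_correctable(ed1, ed2, ed3):
    """Decode a PCI Express correctable error event into error name and B/D/F."""
    offset = ed1 & 0x0F                       # ED1 [3:0]: Event Trigger Offset
    return {
        "error": PCIE_CORRECTABLE_OFFSETS.get(offset, "Unknown offset"),
        "bus": ed2,                           # ED2: PCI Bus number
        "device": (ed3 >> 3) & 0x1F,          # ED3 [7:3]: PCI Device number
        "function": ed3 & 0x07,               # ED3 [2:0]: PCI Function number
    }


if __name__ == "__main__":
    # Illustrative bytes: receiver error on bus 0, device 2, function 2.
    print(decode_pcie_correctable(0xA0, 0x00, 0x12))
```

On Linux, the decoded bus, device, and function can then be matched against the operating system's PCI device listing to identify the card.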
10. Error Code  Error Message                                                        Response
    84FF  System event log full                                                      Minor
    8500  Memory component could not be configured in the selected RAS mode          Major
    8501  DIMM Population Error                                                      Major
    8520  DIMM_A1 failed test/initialization                                         Major
    8521  DIMM_A2 failed test/initialization                                         Major
    8522  DIMM_A3 failed test/initialization                                         Major
    8523  DIMM_B1 failed test/initialization                                         Major
    8524  DIMM_B2 failed test/initialization                                         Major
    8525  DIMM_B3 failed test/initialization                                         Major
    8526  DIMM_C1 failed test/initialization                                         Major
    8527  DIMM_C2 failed test/initialization                                         Major
    8528  DIMM_C3 failed test/initialization                                         Major
    8529  DIMM_D1 failed test/initialization                                         Major
    852A  DIMM_D2 failed test/initialization                                         Major
    852B  DIMM_D3 failed test/initialization                                         Major
    852C  DIMM_E1 failed test/initialization                                         Major
    852D  DIMM_E2 failed test/initialization                                         Major
    852E  DIMM_E3 failed test/initialization                                         Major
    852F  DIMM_F1 failed test/initialization                                         Major
    8530  DIMM_F2 failed test/initialization                                         Major
    8531  DIMM_F3 failed test/initialization                                         Major
    8532  DIMM_G1 failed test/initialization                                         Major
    8533  DIMM_G2 failed test/initialization                                         Major
    8534  DIMM_G3 failed test/initialization                                         Ma
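The "failed test/initialization" codes listed above advance by one per DIMM slot, three slots per channel letter, starting at 8520 for DIMM_A1. The following is a minimal sketch of that arithmetic, assuming the pattern holds exactly for the codes shown in this excerpt (8520 through 8534). The function name is hypothetical, and codes outside that range are deliberately not mapped because the excerpt does not show how the lettering continues.

```python
FIRST_CODE = 0x8520          # DIMM_A1 in the listing above
LAST_CODE = 0x8534           # DIMM_G3, the last code visible in the excerpt
CHANNELS = "ABCDEFG"         # channel letters visible in the excerpt
SLOTS_PER_CHANNEL = 3


def dimm_for_init_failure(code):
    """Map a POST error code to a DIMM label, or None if out of range."""
    if not FIRST_CODE <= code <= LAST_CODE:
        return None
    index = code - FIRST_CODE
    channel = CHANNELS[index // SLOTS_PER_CHANNEL]
    slot = index % SLOTS_PER_CHANNEL + 1
    return f"DIMM_{channel}{slot}"


if __name__ == "__main__":
    for code in (0x8520, 0x8525, 0x8533):
        print(f"{code:04X} -> {dimm_for_init_failure(code)}")
```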
11. Error Code  Error Message                                                        Response
    85DA  DIMM_R1 disabled                                                           Major
    85DB  DIMM_R2 disabled                                                           Major
    85DC  DIMM_R3 disabled                                                           Major
    85DD  DIMM_T1 disabled                                                           Major
    85DE  DIMM_T2 disabled                                                           Major
    85DF  DIMM_T3 disabled                                                           Major
    85E0  DIMM_L3 encountered a Serial Presence Detection (SPD) failure              Major
    85E1  DIMM_M1 encountered a Serial Presence Detection (SPD) failure              Major
    85E2  DIMM_M2 encountered a Serial Presence Detection (SPD) failure              Major
    85E3  DIMM_M3 encountered a Serial Presence Detection (SPD) failure              Major
    85E4  DIMM_N1 encountered a Serial Presence Detection (SPD) failure              Major
    85E5  DIMM_N2 encountered a Serial Presence Detection (SPD) failure              Major
    85E6  DIMM_N3 encountered a Serial Presence Detection (SPD) failure              Major
    85E7  DIMM_P1 encountered a Serial Presence Detection (SPD) failure              Major
    85E8  DIMM_P2 encountered a Serial Presence Detection (SPD) failure              Major
    85E9  DIMM_P3 encountered a Serial Presence Detection (SPD) failure              Major
    85EA  DIMM_R1 encountered a Serial Presence Detection (SPD) failure              Major
    85EB  DIMM_R2 encountered a Serial Presence Detection (SPD) failure              Major
    85EC  DIMM_R3 encountered a Serial Presence Detection (SPD) failure              Major
    85ED  DIMM_T1 encountered a Serial Presence Detection (SPD) failure              Major
    85EE  DIMM_T2 encountered a Serial Presence Detection (SPD) failure              Major
    85EF  DIMM_T3 encountered a Serial Presence Detection (SPD) failure              Major
    8604  POST Reclaim of non-critical NVRAM variables                               Minor
12. Catastrophic Error Sensor Typical Characteristics

    Byte  Field          Description
    11    Sensor Type    07h = Processor
    12    Sensor Number  80h
    13    Event Direction and Event Type
                         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                         [6:0] Event Type = 03h (Digital Discrete)
    14    Event Data 1   [7:6] 10b = OEM code in Event Data 2
                         [5:4] 10b = OEM code in Event Data 3
                         [3:0] Event Trigger Offset = 1h (State Asserted)
    15    Event Data 2   Event Data 2 values as described in Table 50
    16    Event Data 3   Bitmap of the CPU that causes the system CATERR:
                         [0] CPU1
                         [1] CPU2
                         [2] CPU3
                         [3] CPU4
                         Note: If more than one bit is set, the BMC cannot determine the source of the CATERR.

    Table 50. Catastrophic Error Sensor - Event Data 2 Values - Next Steps

    ED2  Description  Next Steps
                      1. Cross test the processors.
                      2. Replace the processors depending on the results of the test.
    01h  CATERR       This error is typically caused by other platform components.
                      1. Check for other errors near the time of the CATERR event.
                      2. Verify all peripherals are plugged in and operating correctly, particularly Hard Drives, Optical Drives, and I/O.
                      3. Update system f
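The Event Data 3 bitmap described above can be expanded into a list of CPU names with a simple bit test. The following is a minimal sketch; the function name is hypothetical, and the second return value simply encodes the note that the BMC cannot determine the source when more than one bit is set.

```python
def caterr_sources(ed3):
    """Return (list of flagged CPUs, True if exactly one CPU is flagged)."""
    # Bits 0..3 of Event Data 3 correspond to CPU1..CPU4.
    cpus = [f"CPU{bit + 1}" for bit in range(4) if ed3 & (1 << bit)]
    # With more than one bit set, the source of the CATERR is ambiguous.
    return cpus, len(cpus) == 1


if __name__ == "__main__":
    for value in (0x02, 0x05):
        cpus, unambiguous = caterr_sources(value)
        print(f"ED3={value:02X}h cpus={cpus} source_identified={unambiguous}")
```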
13.   RELATING TO SALE AND OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR  WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE  MERCHANTABILITY  OR  INFRINGEMENT OF ANY PATENT  COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT     A  Mission Critical Application  is any application in which failure of the Intel Product could result  directly or indirectly   in personal injury or death  SHOULD YOU PURCHASE OR USE INTEL S PRODUCTS FOR ANY SUCH MISSION  CRITICAL APPLICATION  YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES   SUBCONTRACTORS AND AFFILIATES  AND THE DIRECTORS  OFFICERS  AND EMPLOYEES OF EACH   HARMLESS AGAINST ALL CLAIMS COSTS  DAMAGES  AND EXPENSES AND REASONABLE ATTORNEYS   FEES ARISING OUT OF  DIRECTLY OR INDIRECTLY  ANY CLAIM OF PRODUCT LIABILITY  PERSONAL  INJURY  OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION  WHETHER OR  NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN  MANUFACTURE  OR WARNING OF  THE INTEL PRODUCT OR ANY OF ITS PARTS     Intel may make changes to specifications and product descriptions at any time  without notice  Designers must not  rely on the absence or characteristics of any features or instructions marked  reserved  or  undefined   Intel reserves  these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from  future changes to them  The information here is subject to change without notice  Do not finalize a design with this  information     T
14. Table of Contents

    9.2.1 System Firmware Progress (Formerly Post Error) - Next Steps
    10. Chassis Subsystem
    10.1 Physical Security
    10.1.1 Chassis Intrusion
    10.1.2 LAN
    10.2 FP (NMI) Interrupt
    10.2.1 FP (NMI) Interrupt - Next Steps
    10.3
    11. Miscellaneous
    11.1
    11.2 SMI Timeout
    11.2.1 SMI Timeout - Next Steps
    11.3 System Event Log Cleared
    11.4 System Event - PEF Action
    11.4.1 System Event - PEF Action - Next Steps
    11.5 BMC Watchdog
    11.5.1 BMC Watchdog Sensor - Next Steps
    11.6 BMC FW Health
    11.6.1 BMC FW Health Sensor - Next Steps
    11.7 Firmware Update Status Sensor
    11.8 Add-In Module Presence Sensor
    11.8.1 Add-In Module Presence - Next Steps
    11.9 Intel® Xeon Phi Coprocessor Management Sensors
15. 001Fh
    RT (Record Type) = 02h = system event record
    TS (Timestamp) = 4F8D70C3h
    GID (Generator ID) = 0033h = BIOS SMI Handler
    ER (Event Message Revision) = 04h = IPMI v2.0
    ST (Sensor Type) = 12h = System Event (from IPMI Specification Table 42-3, Sensor Type Codes)
    SN (Sensor Number) = 83h
    EDIR (Event Direction/Event Type) = 6Fh: [7] = 0 = Assertion Event; [6:0] = 6Fh = Sensor specific
    ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
    ED2 (Event Data 2) = 00h = First in pair

    RID 20 00 | RT 02 | TS C4 70 8D 4F | GID 33 00 | ER 04 | ST 12 | SN 83 | EDIR 6F | ED1 05 | ED2 80 | ED3 FF

    RID (Record ID) = 0020h
    RT (Record Type) = 02h = system event record
    TS (Timestamp) = 4F8D70C4h
    GID (Generator ID) = 0033h = BIOS SMI Handler
    ER (Event Message Revision) = 04h = IPMI v2.0
    ST (Sensor Type) = 12h = System Event (from IPMI Specification Table 42-3, Sensor Type Codes)
    SN (Sensor Number) = 83h
    EDIR (Event Direction/Event Type) = 6Fh: [7] = 0 = Assertion Event; [6:0] = 6Fh = Sensor specific
    ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
    ED2 (Event Data 2) = 80h = Second in pair

    2.2.2 Example of Decoding a PCI Express Correctable Error Event

    The following is an example of decoding a PCI Express correctable error event. This particular event recorded a receiver error on Bus 0, Device 2, Function 2. Note that correctable errors are acceptable and normal at a low rate of occurrence.
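The field-by-field decoding above can be reproduced by unpacking the 16 raw bytes in little-endian order. The sketch below is an illustration rather than a tool from the guide: the function name and dictionary keys are hypothetical, and the fourth timestamp byte is taken as 8Dh on the assumption that the garbled byte in the raw dump must agree with the decoded timestamp 4F8D70C4h.

```python
import struct


def parse_sel_record(raw):
    """Unpack a 16-byte SEL system event record into named fields."""
    if len(raw) != 16:
        raise ValueError("a SEL record is exactly 16 bytes")
    # Little-endian layout: RID (2), RT (1), TS (4), GID (2), then seven
    # single-byte fields: ER, ST, SN, EDIR, ED1, ED2, ED3.
    (rid, rtype, ts, gid, evm_rev, sensor_type, sensor_num,
     edir, ed1, ed2, ed3) = struct.unpack("<HBIHBBBBBBB", raw)
    return {
        "record_id": rid,
        "record_type": rtype,
        "timestamp": ts,
        "generator_id": gid,
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,
        "sensor_number": sensor_num,
        "assertion": not (edir & 0x80),   # EDIR bit 7: 0 = assertion
        "event_type": edir & 0x7F,        # EDIR bits 6:0
        "ed1": ed1,
        "ed2": ed2,
        "ed3": ed3,
    }


if __name__ == "__main__":
    raw = bytes.fromhex("20 00 02 C4 70 8D 4F 33 00 04 12 83 6F 05 80 FF")
    for name, value in parse_sel_record(raw).items():
        print(f"{name}: {value}")
```

Multi-byte fields such as the Record ID and timestamp are stored least significant byte first, which is why the dump shows 20 00 for Record ID 0020h.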
16. 010b = DIMM Socket 3. All other values are reserved.

    7.5.2.1 Memory Address Parity Error Sensor - Next Steps

    These are bit errors that are detected in the memory addressing hardware. An Address Parity Error implies that the memory address transmitted to the DIMM addressing circuitry has been compromised, and data read or written is compromised in turn. An Address Parity Error is logged as such in the SEL but in all other ways is treated the same as an Uncorrectable ECC Error.

    While the error may be due to a failing DRAM chip on the DIMM, it can also be caused by incorrect seating or improper contact between the socket and the DIMM, or by bent pins in the processor socket.

    1. If needed, decode the DIMM location from the hex version of the SEL.
    2. Verify the DIMM is seated properly.
    3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
    4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board.
    5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
17. going high
    1. Verify the power budget is within the specified range.
    2. Check http://www.intel.com/p/en_US/support/ for the power budget tool for your system.

    4.4.4 Power Supply Temperature Sensors

    The BMC monitors one or two power supply temperature sensors for each installed PMBus-compliant power supply.

    Table 26. Power Supply Temperature Sensors Typical Characteristics

    Byte  Field          Description
    11    Sensor Type    01h = Temperature
    12    Sensor Number  5Ch = Power Supply 1 Temperature
                         5Dh = Power Supply 2 Temperature
    13    Event Direction and Event Type
                         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
                         [6:0] Event Type = 01h (Threshold)
    14    Event Data 1   [7:6] 01b = Trigger reading in Event Data 2
                         [5:4] 01b = Trigger threshold in Event Data 3
                         [3:0] Event Trigger Offset as described in Table 27
    15    Event Data 2   Reading that triggered event
    16    Event Data 3   Threshold value that triggered event

    The following table describes the severity of each of the event triggers for both assertion and deassertion.

    Table 27. Power Supply Temperature Sensor - Event Trigger Offset - Next Steps

    Event Trigger Offset  Assertion  Deass
18. 2. Check the DIMMs are seated properly.
    3. Cross test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.

    EAh  Baseboard +3.3V Riser 1 Power Good (BB +3.3 RSR1 PGD)
         +3.3V Riser 1 Power Good is supplied by Riser 1 on specific platforms.
         +3.3V Riser 1 Power Good is an indication of the +3.3V on Riser 1.
         1. Ensure that the riser is seated correctly.
         2. If the issue remains, replace the riser.
         3. If the issue remains, replace the main board.
         4. If the issue remains, replace the power supplies.

    EBh  Baseboard +3.3V Riser 2 Power Good (BB +3.3 RSR2 PGD)
         +3.3V Riser 2 Power Good is supplied by Riser 2 on specific platforms.
         +3.3V Riser 2 Power Good is an indication of the +3.3V on Riser 2.
         1. Ensure that the riser is seated correctly.
         2. If the issue remains, replace the riser.
         3. If the issue remains, replace the main board.
         4. If the issue remains, replace the power supplies.

    Sensor Number  Sensor Name  Next Steps

    ECh  Baseboard 0.9V Core IB (BB 0.9V Core IB)
         0.9V Core IB is supplied by the main board on specific platforms.
         0.9V Core IB is used by the on-board InfiniBand controller on those specific platforms.
         1. Ensure all cables are connected correctly.
         2. If t
19. 1. Introduction

    The server management hardware that is part of the Intel® Server Boards and Intel® Server Platforms serves as a vital part of the overall server management strategy. The server management hardware provides essential information to the system administrator and gives the administrator the ability to remotely control the server, even when the operating system is not running.

    The Intel® Server Boards and Intel® Server Platforms offer comprehensive hardware and software based solutions. The server management features make the servers simple to manage and provide alerting on system events. From entry to enterprise systems, good overall server management is essential to reduce overall total cost of ownership.

    This Troubleshooting Guide is intended to help users better understand the events that are logged in the Baseboard Management Controller's (BMC) System Event Logs (SEL) on these Intel® Server Boards.

    There is a separate User's Guide that covers the general server management and the server management software offered on the Intel® Server Boards and Intel® Server Platforms.

    Server boards currently supported by this document:

    - Intel® S1400FP Server Boards
    - Intel® S1400SP Server Boards
    - Intel® S1600JP Server Boards
    - Intel® S2400BB Server Boards
    - Intel® S2400EP Server Boards
    - Intel® S2400GP Server Boards
    - Intel® S2400LP Server Boards
    - Intel® S2400SC Server Boards
    - Intel®
20. 88
    15    Event Data 2   Reading that triggered event
    16    Event Data 3   Threshold value that triggered event

    Table 88. HSC Backplane Temperature Sensor - Event Trigger Offset - Next Steps

    Offset  Description                    Assertion Severity  Deassert Severity  Description
    00h     Lower non-critical going low   Degraded            OK                 The temperature has dropped below its lower non-critical threshold.
    02h     Lower critical going low       non-fatal           Degraded           The temperature has dropped below its lower critical threshold.
    07h     Upper non-critical going high  Degraded            OK                 The temperature has gone over its upper non-critical threshold.
    09h     Upper critical going high      non-fatal           Degraded           The temperature has gone over its upper critical threshold.

    Next Steps (all offsets):
    - Check for clear and unobstructed airflow into and out of the chassis.
    - Ensure the SDR is programmed and the correct chassis has been selected.
    - Ensure there are no fan failures.
    - Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

    12.2 Hard Disk Drive Monitoring Sensor

    The new backplane design for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600
21. BIOS, firmware, and drivers.
    b. Replace the board.

    9. System BIOS Events

    A number of events are owned by the system BIOS. These events can occur during Power-On Self Test (POST) or when coming out of a sleep state. Not all of these events signify errors. Some events are described in other chapters of this document (for example, memory events).

    9.1 System Events

    These events can occur during POST or when coming out of a sleep state. They are informational events only.

    1. Events logged during BIOS POST use Generator ID 0001h.
    2. Events logged by the BIOS SMI Handler use Generator ID 0033h.

    9.1.1 System Boot

    At the end of POST, just before the actual OS boot occurs, a System Boot Event is logged. It marks the transition of control from completed POST to the OS Loader. It is an informational-only event.

    9.1.2 Timestamp Clock Synchronization

    These events are used when the time between the BIOS and the BMC is synchronized. Two events are logged. The BIOS logs the first one to send the time synch message to the BMC for synchronization, and the timestamp that message gets is unknown (that is, the timestamp in the log can be anything because it gets the "before" timestamp). So the BIOS sends a s
22. Error Code  Error Message                                                                                Response
    8605  BIOS Settings are corrupted                                                                Major
    8606  NVRAM variable space was corrupted and has been reinitialized                              Major
    92A3  Serial port component was not detected                                                     Major
    92A9  Serial port component encountered a resource conflict error                                Major
    A000  TPM device not detected.                                                                   Minor
    A001  TPM device missing or not responding.                                                      Minor
    A002  TPM device failure.                                                                        Minor
    A003  TPM device failed self test.                                                               Minor
    A100  BIOS ACM Error                                                                             Major
    A421  PCI component encountered a SERR error                                                     Fatal
    A5A0  PCI Express component encountered a PERR error                                             Minor
    A5A1  PCI Express component encountered an SERR error                                            Fatal
    A6A0  DXE Boot Services driver: Not enough memory available to shadow a Legacy Option ROM.       Minor

    10. Chassis Subsystem

    The BMC monitors several aspects of the chassis. In addition to logging when the power and reset buttons are pressed, the BMC monitors chassis intrusion if a chassis intrusion switch is included in the chassis, and it also looks at the network connections, logging an event whenever the physical network li
23.  Byte  Field  Description
15  Event Data 2  Not used
16  Event Data 3  Not used

12.3.1 HSC Health Sensor - Next Steps

Ensure that all connections to the HSC are well seated.

Cross test with another HSC. If the issue remains with the HSC, replace the HSC; otherwise, start cross testing all interconnections.

13. Manageability Engine (ME) Events

The Manageability Engine controls the PECI interface and also contains the Node Manager functionality.

13.1 ME Firmware Health Event

This sensor is used in Platform Event messages to the BMC containing health information including, but not limited to, firmware upgrade and application errors.

Table 92. ME Firmware Health Event Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  002Ch or 602Ch = ME Firmware
11  Sensor Type  DCh = OEM
12  Sensor Number  17h
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 75h (OEM)
14  Event Data 1
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Health event type: 0h = Firmware Status
15  Event Data 2  See Table 93
16  Event Data 3  See Table 93

13.1.1 ME Firmware
24.  Express and Legacy PCI Subsystem

8. PCI Express and Legacy PCI Subsystem

The PCI Express (PCIe) Specification defines standard error types under the Advanced Error Reporting (AER) capabilities. The BIOS logs AER events into the SEL.

The Legacy PCI Specification error types are PERR and SERR. These errors are supported and logged into the SEL.

8.1 PCI Express Errors

PCIe error events are either correctable (informational event) or fatal. In both cases, information is logged to help identify the source of the PCIe error, and the bus, device, and function are included in the extended data fields. The PCIe devices are mapped in the operating system by bus, device, and function, and each device is uniquely identified by that combination. PCIe device information can be found in the operating system.

8.1.1 Legacy PCI Errors

Legacy PCI errors include PERR and SERR; both are fatal errors.

Table 66. Legacy PCI Error Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  0033h = BIOS SMI Handler
11  Sensor Type  13h = Critical Interrupt
12  Sensor Number  03h
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
    [7:6] 10b = OEM code in Event Data 2
    [5:4] 10b = OEM code in Event Data 3
    [3:0] Event Trigger Offset
25.  Fan Presence and Redundancy Sensors
5.2 Temperature Sensors
5.2.1 Threshold-based Temperature Sensors
5.2.2 Thermal Margin Sensors
5.2.3 Processor Thermal Control Sensors
5.2.4 Processor DTS Thermal Margin Sensors
5.2.5 Discrete Thermal Sensors
5.2.6 DIMM Thermal Trip Sensors
5.3 System Air Flow Monitoring Sensor
6. Processor Subsystem
6.1 Processor Status Sensor
6.2 Catastrophic Error Sensor
6.3 CPU Missing Sensor
6.3.1 CPU Missing Sensor - Next Steps
6.4 Quick Path Interconnect Sensors
6.4.1 QPI Link Width Reduced Sensor
6.4.2 QPI Correctable Error Sensor
6.4.3 QPI Fatal Error and Fatal Error #2
6.5 Processor ERR2 Timeout
6.5.1 Processor ERR2 Timeout - Next Steps
6.6 Processor MSID Mismatch Sensor
6.6.1 Processor MSID Mismatc
26.  Health Event - Next Steps

In the following table, Event Data 3 is only noted for specific errors.

If the issue persists, provide the content of Event Data 3 to the Intel support team for interpretation. Event Data 3 codes are in general not documented, because their meaning varies, only provides some clues, and usually needs to be individually interpreted.

Table 93. ME Firmware Health Event Sensor - Next Steps

ED2 00h: Recovery GPIO forced. Recovery Image loaded due to recovery MGPIO pin asserted. Pin number is configurable in factory presets. Default recovery pin is MGPIO1.
    Next Steps: Deassert MGPIO1 and reset the Intel ME.

ED2 01h: Image execution failed. Recovery Image or backup operational image loaded because operational image is corrupted. This may be caused either by flash device corruption or by a failed upgrade procedure.
    Next Steps: Either the flash device must be replaced (if the error is persistent) or the upgrade procedure must be started again.

ED2 02h: Flash erase error. Error during flash erasure procedure.
    Next Steps: The flash device must be replaced.

ED2 03h, ED3 00h: Flash state information. Recovery bootloader image or factory presets image corrupted.
    ED3 01h: Check exten
27.  If the system has a heatsink fan, ensure the fan is spinning.
Check that all system fans are operating properly.
Check that the air used to cool the system is within limits (typically 35 °C).

5.2.4 Processor DTS Thermal Margin Sensors

Intel Xeon processor E5 4600/2600/2400/1600 v2 product families incorporate a DTS-based thermal specification. This allows much more accurate control of the thermal solution and enables lower fan speeds and lower fan power consumption. For Intel Xeon processor E5 4600/2600/2400/1600 product families, this requires significant BMC FW calculations to derive the sensor value. Intel Xeon processor E5 4600/2600/2400/1600 v2 product families are the follow-on processors to Intel Xeon processor E5 4600/2600/2400/1600 product families. For the v2 product families, the BMC's derivation of this value is greatly simplified because the majority of the calculations are performed within the processor itself.

The main usage of this sensor is as an input to the BMC's fan control algorithms. The BMC implements this as a threshold sensor. There is one DTS sensor for each installed physical processor package. Thresholds are not set and alert generation is not enabled for these se
28.  Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1 600 1 400 Product Families                      Sensor Sensor Name Details Section Next Steps   Number  D2h coe pas ae Table 13  Threshold based Voltage Sensors     Next Steps  D3h GE a aoe Table 13  Threshold based Voltage Sensors     Next Steps  D4h ere GPS ee Voltage Table 13  Threshold based Voltage Sensors     Next Steps       Baseboard  1 05V Processor1  Det Vecp t Threshold based Voltage    Sensors Table 13  Threshold based Voltage Sensors     Next Steps   BB  1 05Vccp P1  Spe at          Baseboard  1 05V Processor2  D7h Vecp i Threshold based Voltage    Se  sors Table 13  Threshold based Voltage Sensors     Next Steps   BB  1 05Vccp P2  WEN          Baseboard  1 5V P1 Memory AB  D8h VDDQ S y Threshold based Voltage    Sensors Table 13  Threshold based Voltage Sensors     Next Steps   BB  1 5 P1MEM AB  ee       Baseboard  1 5V P1 Memory CD  Dot VDDQ j y Threshold based Voltage    Sensors Table 13  Threshold based Voltage Sensors     Next Steps   BB  1 5 P1MEM CD  faces some       Baseboard  1 5V P2 Memory AB  DAh VDDQ   y Threshold based Voltage    Sensors Table 13  Threshold based Voltage Sensors     Next Steps   BB  1 5 P2MEM AB  aw       Baseboard  1 5V P2 Memory CD  DBh VDDQ a Y Threshold based Voltage       Table 13  Threshold based Voltage Sensors     Next Steps                            Sensors    BB  1 5 P2MEM CD  SE  Baseboard  1 8V Aux Threshold based Voltag
29.  Out   rable 25  Power Supply Current Out   Sensor     Event Trigger Offset       PS2 Curr Out     5Ch Power Supply 1 Temperature Power Supply Temperature Table 27  Power Supply Temperature Sensor     Event Trigger Offset     Next   PS1 Temperature  Sensors Steps  16 Intel order number G90620 002 Revision 1 1       System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1600 1 400 Product Families    Sensor Cross Reference List                                                                               Sensor Sensor Name Details Section Next Steps  Number  5Dh Power Supply 2 Temperature Power Supply Temperature Table 27  Power Supply Temperature Sensor     Event Trigger Offset     Next   PS2 Temperature  Sensors Steps  60h 68h Hard Disk Drive 15     23 Status Hard Disk Drive Monitoring Table 90  Hard Disk Drive Monitoring Sensor   Event Trigger Offset     Next   HDD 15     23 Status  Sensor Steps  Hot Swap Controller 1 3 Status Hot Swap Controller Health  69h 6Bh HSC Health Sensor     Next Steps   HSC      3 Status  Sensor  Processor 1 Status  70h Processor Status Sensor Table 48  Processor Status Sensors     Next Steps   P1 Status  a n  P 2  71h rcessolie Status Processor Status Sensor Table 48  Processor Status Sensors     Next Steps   P2 Status  Og Natasa oe ee e  Processor 3 Status  72h Processor Status Sensor Table 48  Processor Status Sensors     Next Steps   P3 Status   Processor 4 Status  73h Processor Status S
30.  Processor E5 4600/2600/2400/1600 Product Families, this timeout is 500 ms.

If the SystemPowerGood signal has not asserted by the time the VR Watchdog Timer expires, the FW powers down the system, logs a SEL entry, and emits a beep code (1-5-1-2). This failure is termed a VR Watchdog Timeout.

Table 14. Voltage Regulator Watchdog Timer Sensor Typical Characteristics

Byte  Field  Description
11  Sensor Type  02h = Voltage
12  Sensor Number  0Bh
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 03h ("digital" Discrete)
14  Event Data 1
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset = 1h = State Asserted
15  Event Data 2  Not used
16  Event Data 3  Not used

4.2.1 Voltage Regulator Watchdog Timer Sensor - Next Steps

1. Ensure that all the connectors from the power supply are well seated.
2. Cross test the baseboard. If the issue remains with the baseboard, replace the baseboard.

4.3 Power Unit

The power unit monitors the power state of the system and logs the state changes in the SEL.

4.3.1 Power Unit Status Sensor

The power unit status sensor monitors the power state of t
31.  RID 27 00  RT 02  TS 0A 9B 2E 50  GID 33 00  ER 04  ST 13  SN 05  EDIR 71  ED1 A0  ED2 00  ED3 12

RID (Record ID) = 0027h
RT (Record Type) = 02h = system event record
TS (Timestamp) = 502E9B0Ah
GID (Generator ID) = 0033h = BIOS SMI Handler
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 13h = Critical Interrupt (from IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 05h
EDIR (Event Direction/Event Type) = 71h
    [7] = 0 = Assertion Event
    [6:0] = 71h = OEM Specific for PCI Express correctable errors
ED1 (Event Data 1) = A0h
    [7:6] = 10b = OEM code in Event Data 2
    [5:4] = 10b = OEM code in Event Data 3
    [3:0] = Event Trigger Offset = 0h = Receiver Error
ED2 (Event Data 2) = 00h = PCI Bus number 0
ED3 (Event Data 3) = 12h
    [7:3] = PCI Device number = 02h
    [2:0] = PCI Function number = 2

2.2.3 Example of Decoding a Power Supply Predictive Failure Event

The following is an example of decoding a Power Supply predictive failure event. In this example, power supply 1 saw an AC power loss event with both the input under-voltage warning and fault events set. In most cases this means that the AC power dropped below the minimum warning and fault thresholds for over 20 milliseconds
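The field-by-field decoding of the PCI Express correctable error record in this entry can be sketched in Python. This is a minimal illustration rather than a tool from the guide: the names `SelRecord`, `parse_sel_record`, and `pcie_location` are hypothetical, and the layout assumes a 16-byte system event record whose multi-byte fields (Record ID, Timestamp, Generator ID) are stored least significant byte first, as the worked example's values indicate.

```python
import struct
from dataclasses import dataclass


@dataclass
class SelRecord:
    record_id: int
    record_type: int
    timestamp: int
    generator_id: int
    evm_rev: int
    sensor_type: int
    sensor_number: int
    assertion: bool      # EDIR bit 7: 0 = assertion, 1 = deassertion
    event_type: int      # EDIR bits 6:0
    offset: int          # ED1 bits 3:0 (event trigger offset)
    ed1: int
    ed2: int
    ed3: int


def parse_sel_record(raw: bytes) -> SelRecord:
    """Unpack one 16-byte system event record (little-endian fields)."""
    if len(raw) != 16:
        raise ValueError("expected a 16-byte SEL record")
    (rid, rt, ts, gid, er, st, sn,
     edir, ed1, ed2, ed3) = struct.unpack("<HBIHBBBBBBB", raw)
    return SelRecord(rid, rt, ts, gid, er, st, sn,
                     assertion=(edir & 0x80) == 0,
                     event_type=edir & 0x7F,
                     offset=ed1 & 0x0F,
                     ed1=ed1, ed2=ed2, ed3=ed3)


def pcie_location(rec: SelRecord) -> tuple:
    """Bus/device/function as laid out in the example: ED2 = bus,
    ED3[7:3] = device, ED3[2:0] = function."""
    return rec.ed2, rec.ed3 >> 3, rec.ed3 & 0x07


# The raw bytes from the worked example above.
example = bytes.fromhex("27 00 02 0A 9B 2E 50 33 00 04 13 05 71 A0 00 12")
record = parse_sel_record(example)
bus, device, function = pcie_location(record)
```

The bus/device/function split only applies to records whose Event Data 1 flags OEM codes in Event Data 2 and 3 (bits [7:6] and [5:4] both 10b); other sensors use those bytes differently.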
32.  S2600CO Server Boards
Intel S2600CP Server Boards
Intel S2600GZ/S2600GL Server Boards
Intel S2600IP Server Boards
Intel S2600JF Server Boards
Intel S2600WP Server Boards
Intel S4600LH Server Boards
Intel W2600CR Workstation Boards

1.1 Purpose

The purpose of this document is to list all possible events generated by the Intel platform. Other sources, not under our control, may also generate events; those events are not described in this document.

1.2 Industry Standard

1.2.1 Intelligent Platform Management Interface (IPMI)

The key characteristic of the Intelligent Platform Management Interface (IPMI) is that the inventory, monitoring, logging, and recovery control functions are available independently of the main processors, BIOS, and operating system. Platform management functions can also be made available when the system is in a power-down state.

IPMI works by interfacing with the BMC, which extends management capabilities in the server system and operates independently of the main processor by monitoring the on-board instrumentation. Through the BMC, IPMI also allows administrators to control power to the server, and remotely access BIOS configuration and operating system console information.

IPMI defines
33.  Sensor Specific

14  Event Data 1
    [7:6] 11b = Sensor-specific event extension code in Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset = 4h = Sensor failure
15  Event Data 2  Sensor number of the failed sensor
16  Event Data 3  Not used

11.6.1 BMC FW Health Sensor - Next Steps

1. Check the SEL for any other events around the time of the failure.
2. Take note of all IPMI activity that was occurring around the time of the failure. Capture a System BMC Debug Log as soon as you can after experiencing this failure. This log can be captured from the Integrated BMC Web Console or by using the Intel Syscfg utility (syscfg /sbmcdl private filename.zip). Send the log file to your system manufacturer or Intel representative for failure analysis.
3. If the failure continues around a specific sensor, replace the board with that sensor.

11.7 Firmware Update Status Sensor

The BMC FW supports a single Firmware Update Status sensor. This sensor is used to generate SEL events related to update of embedded firmware on the platform. This includes updates to the BMC, BIOS, and ME FW.

This sensor is an event-only sensor that is not readable. Event generation is only enabled for assert
34.  Status event is logged after an AC power-on occurs, only if any RAS Mode is currently configured, and only if RAS Mode is successfully initiated.

This ensures that there is a record in the SEL showing what the RAS Mode was at the time the system started up. It is only logged after AC power-on, not DC power-on.

The Memory RAS Configuration Status Sensor is also used to log an event during POST whenever there is a RAS configuration error. This is the case where a RAS Mode has been selected, but when the system boots, the memory configuration cannot support the RAS Mode. The RAS configuration fails, and memory operates in Independent Channel Mode.

In the SEL record logged, the ED1 Offset value is "RAS Configuration Disabled", and ED3 contains the RAS Mode that is currently selected but could not be configured. ED2 gives the reason for the RAS configuration failure. At present, only two "RAS Configuration Error Type" values are implemented:

0 = None. This is used for an AC power-on log record when the RAS configuration is successfully configured.
3 = Invalid DIMM Configuration for RAS Mode. The installed DIMM configuration cannot support the currently selected RAS Mode. This may be due to DIMMs that have failed or been disabled, so when this reason has been logged, the user should check the preceding SEL events to see whether there are DIMM error events.

Table 58. Memory RAS Configuration Status Sensor Typical Characteristics
35.  Steps   P4 VRD Hot  ee ee       18    Intel order number G90620 002    Revision 1 1       System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1600 1 400 Product Families  Sensor Cross Reference List                                                                            Sensor Sensor Name Details Section Next Steps  Number   Processor 1 Memory VRD Hot 0 1   94h  P1 Mem01 VRD Se Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 1 Memory VRD Hot 2 3   95h  P1 Mem23 VRD Se Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 2 Memory VRD Hot 0 1   96h  P2 Mem01 VRD SE Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 2 Memory VRD Hot 2 3   97h  P2 Mem23 VRD Se Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 3 Memory VRD Hot 0 1   98h  P3 Mem01 VRD SCH Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 3 Memory VRD Hot 2 3   99h  P4 Mem23 VRD ae Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 4 Memory VRD Hot 0 1   9Ah  P4 Mem01 VRD an Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Processor 4 Memory VRD Hot 2 3   9Bh  P4 Mem23 VRD Si Discrete Thermal Sensors Table 45  Discrete Thermal Sensors     Next Steps  Power Supply 1 Fan Tachometer 1   Power Supply Fan Tachometer   Fowe
36.  Subsystems

Sensor Number  Sensor Name  Next Steps

D0h  Baseboard 12V
    12V is supplied by the power supplies.
    12V is used by SATA drives, fans, and PCI cards. In addition, it is used to generate various processor voltages.
    1. Ensure all cables are connected correctly.
    2. Check connections on the fans and HDDs.
    3. If the issue follows the component, swap it; otherwise, replace the board.
    4. If the issue remains, replace the power supplies.

Baseboard 5V (BB 5.0V)
    5.0V is supplied by the power supplies for pedestal systems, and supplied by the main board on rack-optimized systems.
    5.0V is used by the PCI slots.
    1. Ensure all cables are connected correctly.
    2. Reseat any PCI cards.
    3. Try PCI cards in other PCI slots.
    4. If the issue follows the card, swap it; otherwise, replace the main board.
    5. If the issue remains, replace the power supplies.

Baseboard 3.3V (BB 3.3V)
    3.3V is supplied by the power supplies for pedestal systems, and supplied by the main board on rack-optimized systems.
    3.3V is used by the PCIe and PCI-X slots.
    1. Ensure all cables are connected correctly.
    2. Reseat any PCI cards.
    3. Try PCI cards in other PCI slots.
    4. If the issue follows the card, swap it; otherwise, replace the main board.
    5. If the issue remains, replace the power supplies.

Baseboard 5V Stand-by
    5.0V STBY is supplied by the power supplies for pedestal systems, and supplied by the main board on rack-optimized systems.
    5.0V STBY is used to generate o
37.  Type = 0Bh (Generic Discrete)
14  Event Data 1
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset as described in Table 18
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 18. Power Unit Redundancy Sensor - Event Trigger Offset - Next Steps

Hex  Description
00h  Fully redundant
01h  Redundancy lost
02h  Redundancy degraded
03h  Non-redundant, sufficient from redundant
04h  Non-redundant, sufficient from insufficient
05h  Non-redundant, insufficient
06h  Non-redundant, degraded from fully redundant
07h  Redundant, degraded from non-redundant

For 00h: the system is fully operational. Next Steps: informational event.
For 01h through 07h: the system is not running in redundant power supply mode. Next Steps: this event is accompanied by specific power supply error events (a failure, and so on). Troubleshoot these events.

4.3.3 Node Auto Shutdown Sensor

The BMC supports a Node Auto Shutdown sensor for logging a SEL event when a node undergoes an emergency shutdown due to loss of power supply redundancy or PSU CLST throttling caused by an over-current warning condition. This sensor is applicable only to multi-node systems.

Th
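The offset table above maps directly onto a lookup. The following Python sketch shows one way a log viewer might label a Power Unit Redundancy event; the function name `describe_redundancy_event` and its return shape are hypothetical, and only the 0Bh event type, the [3:0] offset position, and the offset descriptions come from this entry.

```python
GENERIC_DISCRETE_REDUNDANCY = 0x0B  # event type used by this sensor

# Event trigger offsets as listed in Table 18.
REDUNDANCY_OFFSETS = {
    0x00: "Fully redundant",
    0x01: "Redundancy lost",
    0x02: "Redundancy degraded",
    0x03: "Non-redundant, sufficient from redundant",
    0x04: "Non-redundant, sufficient from insufficient",
    0x05: "Non-redundant, insufficient",
    0x06: "Non-redundant, degraded from fully redundant",
    0x07: "Redundant, degraded from non-redundant",
}


def describe_redundancy_event(event_dir_type: int, event_data1: int):
    """Return (asserted, description, informational) for one event.

    event_dir_type is record byte 13, event_data1 is record byte 14.
    """
    if event_dir_type & 0x7F != GENERIC_DISCRETE_REDUNDANCY:
        raise ValueError("not a generic discrete (0Bh) redundancy event")
    asserted = (event_dir_type & 0x80) == 0
    offset = event_data1 & 0x0F
    description = REDUNDANCY_OFFSETS.get(offset, "Unknown offset")
    # Per Table 18, only "Fully redundant" is purely informational.
    informational = offset == 0x00
    return asserted, description, informational
```

For any result other than "Fully redundant", the table's advice still applies: look for the accompanying power supply events in the SEL and troubleshoot those.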
38.  Type  01h = Temperature
12  Sensor Number
    78h = Processor 1 Thermal Control
    79h = Processor 2 Thermal Control
    7Ah = Processor 3 Thermal Control
    7Bh = Processor 4 Thermal Control
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 01h (Threshold)
14  Event Data 1
    [7:6] 01b = Trigger reading in Event Data 2
    [5:4] 01b = Trigger threshold in Event Data 3
    [3:0] Event Triggers as described in Table 42
15  Event Data 2  Reading that triggered event
16  Event Data 3  Threshold value that triggered event

Table 42. Processor Thermal Control Sensors Event Triggers - Description

Hex  Description  Assertion Severity  Deassert Severity  Description
07h  Upper non-critical going high  Degraded  OK  The thermal margin has gone over its upper non-critical threshold.
09h  Upper critical going high  Non-fatal  Degraded  The thermal margin has gone over its upper critical threshold.

5.2.3.1 Processor Thermal Control Sensors - Next Steps

These events normally occur due to failures of the thermal solution.

Verify heatsink is properly attached and has thermal grease.
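The threshold event layout above can be read mechanically. The sketch below is a hedged Python illustration with a hypothetical function name (`decode_thermal_control_event`); it returns the raw reading and threshold bytes as they appear in the record, since converting them to engineering units depends on the sensor's SDR and is outside what this entry describes.

```python
THRESHOLD_EVENT_TYPE = 0x01

# Event triggers listed in Table 42 for these sensors.
TRIGGERS = {
    0x07: "Upper non-critical going high",
    0x09: "Upper critical going high",
}


def decode_thermal_control_event(event_dir_type, ed1, ed2, ed3):
    """Decode record bytes 13..16 of a Processor Thermal Control event."""
    if event_dir_type & 0x7F != THRESHOLD_EVENT_TYPE:
        raise ValueError("not a threshold (01h) event")
    has_reading = (ed1 >> 6) & 0x03 == 0b01    # ED1[7:6] = 01b
    has_threshold = (ed1 >> 4) & 0x03 == 0b01  # ED1[5:4] = 01b
    offset = ed1 & 0x0F
    return {
        "asserted": (event_dir_type & 0x80) == 0,
        "trigger": TRIGGERS.get(offset, "Offset %02Xh" % offset),
        # Raw, unconverted values; None when ED1 does not flag them as valid.
        "reading_raw": ed2 if has_reading else None,
        "threshold_raw": ed3 if has_threshold else None,
    }
```

A deassertion of the same offset (bit 7 of byte 13 set) indicates the reading has come back under that threshold, which corresponds to the Deassert Severity column in Table 42.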
39.  as soon as the microcontroller is initialized.

Monitoring Interface available indicates that Intel Intelligent Power Node Manager has the capability to monitor power and temperature. This is generally available when firmware is operational.

Power limiting interface available indicates that Intel Intelligent Power Node Manager can do power limiting and is indicative of an ACPI-compliant OS loaded (unless the OEM has indicated support for a non-ACPI-compliant OS).

The current value of a capability sensor that has not been acknowledged will be retransmitted no faster than every 300 milliseconds.

Next steps depend on the policy that was set. See the Node Manager Specification for more details.

13.5 Node Manager Alert Threshold Exceeded

A Policy Correction Time Exceeded event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.

Table 97. Node Manager Alert Threshold Exceeded Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  002Ch = ME Firmware
11  Sensor Type  DCh = OEM
12  Sensor Number  1Bh
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 72h (OEM)
14  Event Data 1
    [7:6] 10b = OEM code in Event Dat
40.  bent pins, and if found, replace the board.

Event Trigger Offset (Hex, Description)  Description  Next Steps

5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.

7.5.2 Memory Address Parity Error

Address Parity errors are errors detected in the memory addressing hardware. Because these affect the addressing of memory contents, they can potentially lead to the same sort of failures as ECC errors. They are logged as a distinct type of error because they affect memory addressing rather than memory contents, but otherwise they are treated exactly the same as Uncorrectable ECC Errors. Address Parity errors are logged to the BMC SEL, with Event Data to identify the failing address by channel and DIMM to the extent that it is possible to do so.

Table 65. Address Parity Error Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  0033h = BIOS SMI Handler
11  Sensor Type  0Ch = Memory
12  Sensor Number  13h
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
    [7:6] 10b = OEM code in Event Data 2
    5
41.  centric sensors in the SEL.

6.1 Processor Status Sensor

The BMC provides an IPMI sensor of type processor for monitoring status information for each processor slot. If an event state (sensor offset) has been asserted, it remains asserted until one of the following happens:

- A Rearm Sensor Events command is executed for the processor status sensor.
- An AC or DC power cycle, system reset, or system boot occurs.

CPU Presence status is not saved across AC power cycles and therefore will not generate a deassertion after cycling AC power.

Table 47. Processor Status Sensors Typical Characteristics

Byte  Field  Description
11  Sensor Type  07h = Processor
12  Sensor Number
    70h = Processor 1 Status
    71h = Processor 2 Status
    72h = Processor 3 Status
    73h = Processor 4 Status
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset as described in Table 48
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 48. Processor Status Sensors - Next Steps

Eve
42.  fault   associated PMBus  Status  04h     Over tamperature fault register  For example  Data 3 will  have the contents of the  05h     Fan fault VOLTAGE_STATUS register at  the time an Output Voltage fault  was detected  Refer to the  PMBus  Specification for details  on specific register contents           AC     2  If the power supply  still fails  replace it        Revision 1 1    Intel order number G90620 002    39       Power Subsystems    System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1600 1 400 Product Families                      Sensor Specific Offset Description ED2 ED3 Next Steps  Hex Description  02h   Predictive Check the data in ED2 10b   OEM code in Event Data 2 10b   OEM code in Event Data 3 Depends on the warning  Failure and ED3 for more details    a 01h     Output voltage warning event     02h     Output power warning Will have the contents of the 1  Replace the power    03h     Output over current associated PMBus  Status supply   warning register  For example  Data 3 will 2  Verify proper airflow    04h    Over t t have the contents of the to the system    Lver temperature warning   VOLTAGE_STATUS register at    f 3  Verify the power    05h     Fan warning the time an Output Voltage Source  D 06h     Input under voltage warning was detected  Refer to  warning the PMBus  Specification for 4  Rates the system    07h Input over current e specific register i  warning SS HSH    08h     Input ov
43.  generate a critical interrupt).

Table 77. IPMI Watchdog Sensor Typical Characteristics

Byte  Field  Description
11  Sensor Type  23h = Watchdog 2
12  Sensor Number  03h
13  Event Direction and Event Type
    [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
    [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
    [7:6] 00b = Unspecified Event Data 2
    [5:4] 00b = Unspecified Event Data 3
    [3:0] Event Trigger Offset as described in Table 78
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 78. IPMI Watchdog Sensor Event Trigger Offset - Next Steps

Hex  Description
00h  Timer expired, status only
01h  Hard reset
02h  Power down
03h  Power cycle
08h  Timer interrupt

Description: Our server systems support a BMC watchdog timer, which can check whether the OS is still responsive. The timer is disabled by default and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that resets the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).

Next Steps: If this event
44.  owned by the BMC.

Table 5. BMC-owned Sensors

Sensor Number  Sensor Name  Details Section  Next Steps
01h  Power Unit Status (Pwr Unit Status)  Power Unit Status Sensor  Table 16. Power Unit Status Sensor - Sensor Specific Offsets - Next Steps
02h  Power Unit Redundancy (Pwr Unit Redund)  Power Unit Redundancy Sensor  Table 18. Power Unit Redundancy Sensor - Event Trigger Offset - Next Steps
03h  IPMI Watchdog (IPMI Watchdog)  IPMI Watchdog  Table 78. IPMI Watchdog Sensor Event Trigger Offset - Next Steps
04h  Physical Security (Physical Scrty)  Physical Security  Table 74. Physical Security Sensor Event Trigger Offset - Next Steps
05h  FP Interrupt (FP NMI Diag Int)  FP (NMI) Interrupt  FP (NMI) Interrupt - Next Steps
06h  SMI Timeout (SMI Timeout)  SMI Timeout  SMI Timeout - Next Steps
07h  System Event Log (System Event Log)  System Event Log Cleared  Not applicable
08h  System Event (System Event)  System Event - PEF Action  System Event - PEF Action - Next Steps
09h  Button Sensor (Button)  Button Sensor  Not applicable
45.  panic string (0 if no panic string)

Byte  Field                 Description
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 6Fh (Sensor Specific)
14    Event Data 1          [7:6] 10b = OEM code in Event Data 2
                            [5:4] 10b = OEM code in Event Data 3
                            [3:0] Event Trigger Offset = 1h = Runtime Critical Stop (a.k.a. "core dump", "blue screen")
15    Event Data 2          The second byte of the panic string
16    Event Data 3          The third byte of the panic string

Table 106. Linux Kernel Panic String Extended Record Characteristics

Byte  Field                 Description
1-2   Record ID             ID used for SEL Record access
3     Record Type           [7:0] F0h = OEM non-timestamped, bytes 4-16 OEM defined
4     Slave Address         The slave address of the card saving the panic
5     Sequence Number       A sequence number, starting at zero
6-16  Kernel Panic Data     These bytes hold the panic string. If the panic string is longer than 11 bytes, multiple messages are sent with increasing sequence numbers.
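The extended-record layout in Table 106 can be illustrated with a minimal Python sketch that rebuilds a panic string from several records. The function name `reassemble_panic_string` is hypothetical, and the sketch assumes that unused trailing bytes are NUL padded and that all records passed in belong to one panic; the guide states neither.

```python
from typing import Iterable

OEM_NON_TIMESTAMPED = 0xF0  # Record Type value from Table 106


def reassemble_panic_string(records: Iterable[bytes]) -> str:
    """Join the Kernel Panic Data of 16-byte extended records in sequence order."""
    chunks = []
    for rec in records:
        if len(rec) != 16 or rec[2] != OEM_NON_TIMESTAMPED:
            continue  # not a panic-string extended record
        sequence = rec[4]   # byte 5: sequence number, starting at zero
        data = rec[5:16]    # bytes 6-16: up to 11 bytes of panic string
        chunks.append((sequence, data))
    chunks.sort(key=lambda item: item[0])
    raw = b"".join(data for _, data in chunks)
    # Assumption: unused trailing bytes are NUL padded.
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")
```

Sorting by the sequence number means the records can be passed in any order, for example as read back from the SEL.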
46.  power cycle with Recovery jumper asserted. If this does not clear the issue, reflash the SPI flash.

10h-FFh  Reserved

13.2 Node Manager Exception Event

A Node Manager Exception Event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.

Table 94. Node Manager Exception Sensor Typical Characteristics

Byte  Field                 Description
8-9   Generator ID          002Ch or 602Ch = ME Firmware
11    Sensor Type           DCh = OEM
12    Sensor Number         18h
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 72h (OEM)
14    Event Data 1          [7:6] 10b = OEM code in Event Data 2
                            [5:4] 10b = OEM code in Event Data 3
                            [3] Node Manager Policy event
                                0 = Reserved
                                1 = Policy Correction Time Exceeded: the policy did not meet the contract for the defined policy. The policy will continue to limit the power or shut down the platform based on the defined policy action.
                            [2] Reserved
                            [1:0] 00b
15    Event Data 2          [4:7] Reserved
                            [0:3] Domain Id (currently supports only one domain, Domain 0)
16    Event Data 3          Policy Id

13.2.1 Node Manager Exception Event - Next Steps

This is an informational eve
47.  selected

Table 33. Fan Redundancy Sensors Typical Characteristics

Byte  Field                 Description
11    Sensor Type           04h = Fan
12    Sensor Number         0Ch
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 0Bh (Generic Discrete)
14    Event Data 1          [7:6] 00b = Unspecified Event Data 2
                            [5:4] 00b = Unspecified Event Data 3
                            [3:0] Event Trigger Offset as described in Table 34
15    Event Data 2          Not used
16    Event Data 3          Not used

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 34. Fan Redundancy Sensor - Event Trigger Offset - Next Steps

Event Trigger Offset
Hex   Description
00h   Fully redundant
01h   Redundancy lost
02h   Redundancy degraded

Description: The system has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.

Next Steps: Fan redundancy loss indicates failure of one or more fans. Look for lower (non-critical) fan errors, or fan removal errors in the SEL, to indicate which fan is causin
48.  sensor is rearmed on power-on (AC or DC power-on transitions).

Table 57. Processor MSID Mismatch Sensor Typical Characteristics

Byte  Field                 Description
11    Sensor Type           07h = Processor
12    Sensor Number         81h = Processor 1 MSID Mismatch
                            87h = Processor 2 MSID Mismatch
                            88h = Processor 3 MSID Mismatch
                            89h = Processor 4 MSID Mismatch
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 03h ("digital" discrete)
14    Event Data 1          [7:6] 00b = Unspecified Event Data 2
                            [5:4] 00b = Unspecified Event Data 3
                            [3:0] Event Trigger Offset = 1h (State Asserted)
15    Event Data 2          Not used
16    Event Data 3          Not used

6.6.1 Processor MSID Mismatch Sensor - Next Steps

Verify the processor is supported by your baseboard. Check your board's Technical Product Specification (TPS).

7. Memory Subsystem

Intel servers report memory errors, status, and configuration in the SEL.

7.1 Memory RAS Configuration Status

A Memory RAS Configuration
49.  the first entry in the SEL, and continue sequentially to n, the number of entries in the SEL.

Byte   Field                          Description
12-15  Bug Check/Blue Screen Data     The first record of this type contains the Bug Check/Blue Screen Stop code and is followed by the four Bug Check/Blue Screen parameters, LSB first.
                                      Note that each of the Bug Check/Blue Screen parameters requires two records. Both records for a parameter have the same Record ID. There is a total of nine records.
16     Operating system type          00 = 32-bit OS
                                      01 = 64-bit OS

15. Linux Kernel Panic Records

The Open IPMI driver supports the ability to put semi-custom and custom events in the system event log if a panic occurs. If you enable the "Generate a panic event to all BMCs on a panic" option, you will get one event on a panic in a standard IPMI event format. If you enable the "Generate OEM events containing the panic string" option, you will also get a set of OEM events holding the panic string.

Table 105. Linux Kernel Panic Event Record Characteristics

Byte  Field                 Description
8-9   Generator ID          0021h = Kernel
10    EvM Rev               03h = IPMI 1.0 format
11    Sensor Type           20h = OS Stop/Shutdown
12    Sensor Number         The first byte of the
50.
11.9.1 Intel Xeon Phi Coprocessor (MIC) Thermal Margin Sensors
11.9.2 Intel Xeon Phi Coprocessor (MIC) Status Sensors
12. Hot Swap Controller Backplane Event
12.1 HSC Backplane Temperature Sensor
12.2 Hard Disk Drive Monitoring Sensor
12.3 Hot Swap Controller Health Sensor
12.3.1 HSC Health Sensor - Next Steps
13. Manageability Engine (ME) Events
13.1 ME Firmware Health Event
13.1.1 ME Firmware Health Event - Next Steps
13.2 Node Manager Exception Event
13.2.1 Node Manager Exception Event - Next Steps
13.3 Node Manager Health Event
13.3.1 Node Manager Health Event - Next Steps
13.4 Node Manager Operational Capabilities Change
13.4.1 Node Manager Operationa
51.
Byte  Field                 Description
14    Event Data 1          [7:6] 10b = OEM code in Event Data 2
                            [5:4] 10b = OEM code in Event Data 3
                            [3:0] Event Trigger Offset = 0h
15    Event Data 2          Low Byte of POST Error Code
16    Event Data 3          High Byte of POST Error Code

9.2.1 System Firmware Progress (Formerly Post Error) - Next Steps

See the following table for POST Error Codes.

Table 72. POST Error Codes

Error Code  Error Message                                                      Response
0012        System RTC date/time not set                                       Major
0048        Password check failed                                              Major
0140        PCI component encountered a PERR error                             Major
0141        PCI resource conflict                                              Major
0146        PCI out of resources error                                         Major
0191        Processor core/thread count mismatch detected                      Fatal
0192        Processor cache size mismatch detected                             Fatal
0194        Processor family mismatch detected                                 Fatal
0195        Processor Intel(R) QPI link frequencies unable to synchronize      Fatal
0196        Processor model mismatch detected                                  Fatal
0197        Processor frequencies unable to synchronize                        Fatal
5220        BIOS Settings reset to default settings                            Major
5221        Passwords cleared by jumper                                        Major
5224        Password clear jumper is Set                                       Major
8130        Processor 01 disabled                                              Major
8131        Processor 02 disabled                                              Major
8132        Processor 03 disabled                                              Ma
52. 37. Temperature Sensors - Next Steps

2Fh      Network Interface Controller Temperature (LAN NIC Temp)
         Details Section: Threshold-based Temperature
         Next Steps: Table 37. Temperature Sensors - Next Steps
30h-3Fh  Fan Tachometer Sensors (chassis-specific sensor names)
         Details Section: Fan Tachometer Sensors
         Next Steps: Table 30. Fan Tachometer Sensor - Event Trigger Offset - Next Steps
40h-4Fh  Fan Present Sensors (Fan x Present)
         Details Section: Fan Presence and Redundancy
         Next Steps: Table 32. Fan Presence Sensors - Event Trigger Offset - Next Steps
50h      Power Supply 1 Status (PS1 Status)
         Details Section: Power Supply Status Sensors
         Next Steps: Table 16. Power Unit Status Sensor - Sensor Specific Offsets - Next Steps
51h      Power Supply 2 Status (PS2 Status)
         Details Section: Power Supply Status Sensors
         Next Steps: Table 16. Power Unit Status Sensor - Sensor Specific Offsets - Next Steps
54h      Power Supply 1 AC Power Input (PS1 Power In)
         Details Section: Power Supply Power In Sensors
         Next Steps: Table 23. Power Supply Power In Sensor - Event Trigger Offset - Next Steps
55h      Power Supply 2 AC Power Input (PS2 Power In)
         Details Section: Power Supply Power In Sensors
         Next Steps: Table 23. Power Supply Power In Sensor - Event Trigger Offset - Next Steps
58h      Power Supply 1 +12V % of Maximum Current Output (PS1 Curr Out %)
         Details Section: Power Supply Current Out % Sensors
         Next Steps: Table 25. Power Supply Current Out % Sensor - Event Trigger Offset - Next Steps
59h      Power Supply 2 +12V % of Maximum Current Output
         Details Section: Power Supply Current
53. 4] 10b = OEM code in Event Data 3
[3:0] Event Trigger Offset = 2h

Byte  Field          Description
15    Event Data 2   [7:5] Reserved. Set to 0.
                     [4] Channel Information Validity Check:
                         0b = Channel Number in Event Data 3 Bits[4:3] is not valid
                         1b = Channel Number in Event Data 3 Bits[4:3] is valid
                     [3] DIMM Information Validity Check:
                         0b = DIMM Slot ID in Event Data 3 Bits[2:0] is not valid
                         1b = DIMM Slot ID in Event Data 3 Bits[2:0] is valid
                     [2:0] Error Type:
                         000b = Parity Error Type not known
                         001b = Data Parity Error (not used)
                         010b = Address Parity Error
                         All other values are reserved.
16    Event Data 3   [7:5] Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached:
                         0-3 = CPU1-4
                         All other values are reserved.
                     [4:3] Channel Number (if valid) on which the Parity Error occurred. This value will be indeterminate and should be ignored if ED2 Bit[4] is 0b.
                         00b = Channel A
                         01b = Channel B
                         10b = Channel C
                         11b = Channel D
                     [2:0] DIMM Slot ID (if valid) of the specific DIMM that was involved in the transaction that led to the parity error. This value will be indeterminate and should be ignored if ED2 Bit[3] is 0b.
                         000b = DIMM Socket 1
                         001b = DIMM Socket 2 
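As an illustration of the bit fields above, the following Python sketch unpacks Event Data 2 and Event Data 3 of an address parity event. The function name `decode_address_parity` and the returned dictionary keys are hypothetical. The DIMM Slot ID is returned as the raw 3-bit value because the slot table is cut off in this excerpt.

```python
CHANNELS = {0b00: "A", 0b01: "B", 0b10: "C", 0b11: "D"}


def decode_address_parity(ed2: int, ed3: int) -> dict:
    """Decode Event Data 2 and 3 of an address parity error event."""
    channel_valid = bool(ed2 & 0x10)  # ED2 bit 4: channel number is valid
    dimm_valid = bool(ed2 & 0x08)     # ED2 bit 3: DIMM slot ID is valid
    socket = (ed3 >> 5) & 0x07        # ED3 bits [7:5]: 0-3 = CPU1-4
    return {
        "error_type": ed2 & 0x07,     # ED2 bits [2:0]
        "cpu": socket + 1 if socket <= 3 else None,  # other values reserved
        "channel": CHANNELS[(ed3 >> 3) & 0x03] if channel_valid else None,
        "dimm_slot_id": (ed3 & 0x07) if dimm_valid else None,
    }
```

Fields whose validity bit is clear come back as `None`, which mirrors the guide's instruction to ignore indeterminate values.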
54. 51. CPU Missing Sensor Typical Characteristics
Table 52. QPI Link Width Reduced Sensor Typical Characteristics
Table 53. QPI Correctable Error Sensor Typical Characteristics
Table 54. QPI Fatal Error Sensor Typical Characteristics
Table 55. QPI Fatal #2 Error Sensor Typical Characteristics
Table 56. Processor ERR2 Timeout Sensor Typical Characteristics
Table 57. Processor MSID Mismatch Sensor Typical Characteristics
Table 58. Memory RAS Configuration Status Sensor Typical Characteristics
Table 59. Memory RAS Configuration Status Sensor - Event Trigger Offset - Next Steps
Table 60. Memory RAS Mode Select Sensor Typical Characteristics
Table 61. Mirroring Redundancy State Sensor Typical Characteristics
Table 62. Sparing Redundancy State Sensor Typical Characteristics
Table 63. Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
Table 64. Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset - Next Steps
Table 65. Address Parity Error Sensor Typical Cha
55. B
2. Check the DIMMs are seated properly.
3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board, otherwise the DIMM.

E5h  Baseboard +1.35V P1 Low Voltage Memory CD VDDQ (BB +1.35 P1LV CD)
     This 1.35V line is supplied by the main board.
     This 1.35V line is used by processor 1 memory slots C and D.
     1. Ensure all cables are connected correctly.
     2. Check the DIMMs are seated properly.
     3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board, otherwise the DIMM.

E6h  Baseboard +1.35V P2 Low Voltage Memory AB VDDQ (BB +1.35 P2LV AB)
     This 1.35V line is supplied by the main board.
     This 1.35V line is used by processor 2 memory slots A and B.
     1. Ensure all cables are connected correctly.
     2. Check the DIMMs are seated properly.
     3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board, otherwise the DIMM.

E7h  Baseboard +1.35V P2 Low Voltage Memory CD VDDQ (BB +1.35 P2LV CD)
     This 1.35V line is supplied by the main board.
     This 1.35V line is used by processor 2 memory slots C and D.
     1. Ensure all cables are connected correctly.
56. DDh = OEM timestamped, bytes 8-16 OEM defined

Byte   Field                  Description
4-7    Timestamp              Time when the event was logged. LS byte first.
8-10   IPMI Manufacturer ID   0137h (311d) = IANA enterprise number for Microsoft
                              0157h (343) = IANA enterprise number for Intel
                              The value logged depends on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
11     Record ID              Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
12-15  Shutdown Comment       Shutdown Comment from the registry, LSB first:
                              HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\Comment
16     Reserved               00h

14.3 Bug Check/Blue Screen Event Records

When the system experiences a bug check (blue screen), multiple records will be written to the event log. The first is a Bug Check/Blue Screen OS Stop/Shutdown Event Record; this can be followed by multiple Bug Check/Blue Screen code OEM records that contain the Bug Check/Blue Screen codes. This information can be used to determine what caused the failure.

Table 103. Bug Check/Blue Screen - OS Stop Event Record 
57. Data 2
[5:4] 00b = Unspecified Event Data 3
[3:0] Event Trigger Offset as described in Table 45

Byte  Field          Description
15    Event Data 2   Not used
16    Event Data 3   Not used

Table 45. Discrete Thermal Sensors - Next Steps

Sensor  Sensor Name        Event  Event Trigger Offset   Description
Number                     Type   Hex  Description
0Dh     SSB Thermal Trip   03h    01h  State Asserted    South Side Bridge (SSB) overheated
90h     P1 VRD Hot         05h    01h  Limit Exceeded    Processor 1 voltage regulator overheated
91h     P2 VRD Hot         05h    01h  Limit Exceeded    Processor 2 voltage regulator overheated
92h     P3 VRD Hot         05h    01h  Limit Exceeded    Processor 3 voltage regulator overheated
93h     P4 VRD Hot         05h    01h  Limit Exceeded    Processor 4 voltage regulator overheated
94h     P1 Mem01 VRD Hot   05h    01h  Limit Exceeded    Processor 1 Memory 0/1 voltage regulator overheated
95h     P1 Mem23 VRD Hot   05h    01h  Limit Exceeded    Processor 1 Memory 2/3 voltage regulator overheated
96h     P2 Mem01 VRD Hot   05h    01h  Limit Exceeded    Processor 2 Memory 0/1 voltage regulator overheated

Next Steps (all sensors in this table):
- Check for clear and unobstructed airflow into and out of the chassis.
- Ensure the SDR is programmed and the correct chassis has been selected.
- Ensure there are no fan failures.
- Ensure the air used for cooling the system is within the thermal specifications for the system (typically below 35 °C).
58. Data 2
    11b = Sensor-specific event extension code in Event Data 2
[5:4] 00b = Unspecified Event Data 3
      01b = Trigger threshold value in Event Data 3
      10b = OEM code in Event Data 3
      11b = Sensor-specific event extension code in Event Data 3
[3:0] Offset from Event/Reading Code for threshold event
Event Data 2: Reading that triggered event; FFh or not present if unspecified
Event Data 3: Threshold value that triggered event; FFh or not present if unspecified. If present, Event Data 2 must be present.

discrete
Event Data 1
[7:6] 00b = Unspecified Event Data 2
      01b = Previous state and/or severity in Event Data 2
      10b = OEM code in Event Data 2
      11b = Sensor-specific event extension code in Event Data 2
[5:4] 00b = Unspecified Event Data 3
      01b = Reserved
      10b = OEM code in Event Data 3
      11b = Sensor-specific event extension code in Event Data 3
[3:0] Offset from Event/Reading Code for discrete event state
Event Data 2
[7:4] Optional offset from "Severity" Event/Reading Code (0Fh if unspecified)
[3:0] Optional offset from Event/Reading Type Code for previous discrete event state (0Fh if unspecified)
Event Data 3: Optional OEM code; FFh or not present if unspecified

OEM
Event Data 1
[7:6] 00b = Unspecified in Event Data 2
      01b = Previous state and/or severity in Event Data 2
      10b = OEM code in Event Data 2
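The Event Data 1 layout for threshold and discrete events can be sketched in Python as follows. The function and table names are hypothetical. The 00b, 01b, and 10b meanings of bits [7:6] for threshold events are not visible in this excerpt, so the sketch assumes the standard IPMI values (unspecified, trigger reading, OEM code). The OEM event class is left out because its table is cut off here.

```python
# Meaning of ED1 bits [7:6] (what Event Data 2 holds) per event class.
ED2_MEANING = {
    "threshold": ["unspecified", "trigger reading", "OEM code",
                  "sensor-specific event extension code"],
    "discrete": ["unspecified", "previous state and/or severity", "OEM code",
                 "sensor-specific event extension code"],
}
# Meaning of ED1 bits [5:4] (what Event Data 3 holds) per event class.
ED3_MEANING = {
    "threshold": ["unspecified", "trigger threshold value", "OEM code",
                  "sensor-specific event extension code"],
    "discrete": ["unspecified", "reserved", "OEM code",
                 "sensor-specific event extension code"],
}


def decode_event_data1(ed1: int, event_class: str) -> dict:
    """Split Event Data 1 into its three bit fields for one event class."""
    if event_class not in ED2_MEANING:
        raise ValueError("event_class must be 'threshold' or 'discrete'")
    return {
        "ed2_contains": ED2_MEANING[event_class][(ed1 >> 6) & 0x03],
        "ed3_contains": ED3_MEANING[event_class][(ed1 >> 4) & 0x03],
        "offset": ed1 & 0x0F,  # bits [3:0]: event trigger offset
    }
```

For example, `decode_event_data1(0x57, "threshold")` reports a trigger reading in Event Data 2, a trigger threshold value in Event Data 3, and offset 7.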
59. Event
[6:0] Event Type = 76h (OEM Specific)

Byte  Field          Description
14    Event Data 1   [7:6] 10b = OEM code in Event Data 2
                     [5:4] 10b = OEM code in Event Data 3
                     [3:0] Event Trigger Offset
                         0h = Atomic Egress Blocked
                         1h = TLP Prefix Blocked
                         Fh = Unspecified Non-AER Fatal Error
15    Event Data 2   PCI Bus number
16    Event Data 3   [7:3] PCI Device number
                     [2:0] PCI Function number

8.1.2.1 PCI Express Fatal Error and Fatal Error #2 Sensor - Next Steps

1. Decode the bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
   b. Install the card in another slot and check whether the error follows the card or stays with the slot.
   c. Update all firmware and drivers, including non-Intel components.
3. If this is an on-board device:
   a. Update all BIOS, firmware, and drivers.
   b. Replace the board.

8.1.3 PCI Express Correctable Errors

When a PCI Express correctable error is reported to the BIOS SMI handler, it will record the error using the following format.

Table 69. PCI Express Correctable Error Sensor Typical Characteristics

Field           Description
Generator ID    0033h = BIOS SMI Handler
Sensor Type     13h = Critical Interrupt
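Step 1 of the next steps (decoding the bus, device, and function) follows directly from the Event Data 2 and Event Data 3 layout. A minimal Python sketch, with a hypothetical function name; the `bus:device.function` output style is an assumption chosen to resemble common PCI tooling and is not prescribed by the guide:

```python
def decode_pci_location(ed2: int, ed3: int) -> str:
    """Return the PCI location encoded in Event Data 2 and Event Data 3."""
    bus = ed2 & 0xFF             # Event Data 2: PCI bus number
    device = (ed3 >> 3) & 0x1F   # Event Data 3 bits [7:3]: PCI device number
    function = ed3 & 0x07        # Event Data 3 bits [2:0]: PCI function number
    return f"{bus:02x}:{device:02x}.{function}"
```

The resulting string can then be matched against the operating system's PCI device listing to identify the card or on-board device.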
60. Exception Action
[3:0] Domain Id

Byte  Field          Description
16    Event Data 3   If Error type = 10 or 15: <Policy Id>
                     If Error type = 11: <Power Sensor Address>
                     If Error type = 12: <Inlet Sensor Address>
                     Otherwise set to 0.

13.3.1 Node Manager Health Event - Next Steps

A misconfigured policy can occur if the maximum or minimum power consumption of the platform exceeds the values in the policy because of a hardware reconfiguration.

The first occurrence of an unacknowledged event is retransmitted no more often than every 300 milliseconds.

A real-time clock synchronization failure alert is sent when NM is enabled and capable of limiting power, but the firmware cannot obtain a valid calendar time from the host side within 10 minutes, so NM cannot handle suspend periods.

Next steps depend on the policy that was set. See the Node Manager Specification for more details.

13.4 Node Manager Operational Capabilities Change

This message provides a runtime error indication about Intel 
61. FF

RID (Record ID) = 0119h
RT (Record Type) = 02h = system event record
TS (Timestamp) = 4E6A4957h
GID (Generator ID) = 0001h = BIOS POST
ER (Event Message Revision) = 04 = IPMI v2.0
ST (Sensor Type) = 12h = System Event (from IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 83h
EDIR (Event Direction/Event Type) = 6Fh; [7] = 0 = Assertion Event; [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
ED2 (Event Data 2) = 00h = First in pair

RID 1A 01  RT 02  TS 57 49 6A 4E  GID 01 00  ER 04  ST 12  SN 83  EDIR 6F  ED1 05  ED2 80  ED3 FF

RID (Record ID) = 011Ah
RT (Record Type) = 02h = system event record
TS (Timestamp) = 4E6A4957h
GID (Generator ID) = 0001h = BIOS POST
ER (Event Message Revision) = 04 = IPMI v2.0
ST (Sensor Type) = 12h = System Event (from IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 83h
EDIR (Event Direction/Event Type) = 6Fh; [7] = 0 = Assertion Event; [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
ED2 (Event Data 2) = 80h = Second in pair

2.2.1.2 BIOS SMI Handler Timestamp Events

RID 1F 00  RT 02  TS C3 70 5D 4F  GID 33 00  ER 04  ST 12  SN 83  EDIR 6F  ED1 05  ED2 00  ED3 FF

RID (Record ID) = 
62. Fatal Error and Fatal Error #2

The system detected a QPI fatal or non-recoverable error. This is a fatal error.

Table 54. QPI Fatal Error Sensor Typical Characteristics

Byte  Field                 Description
8-9   Generator ID          0033h = BIOS SMI Handler
11    Sensor Type           13h = Critical Interrupt
12    Sensor Number         07h
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 73h (OEM Discrete)
14    Event Data 1          [7:6] 10b = OEM code in Event Data 2
                            [5:4] 00b = Unspecified Event Data 3
                            [3:0] Event Trigger Offset
                                0h = Link Layer Uncorrectable ECC Error
                                1h = Protocol Layer Poisoned Packet Reception Error
                                2h = Link/PHY Init Failure with resultant degradation in link width
                                3h = PHY Layer detected drift buffer alarm
                                4h = PHY detected latency buffer rollover
                                5h = PHY Init Failure
                                6h = Link Layer generic control error (buffer overflow/underflow, credit underflow, and so on)
                                7h = Parity error in link or PHY layer
                                8h = Protocol layer timeout detected
                                9h = Protocol layer failed response
                                Ah = Protocol layer illegal packet field, target Node ID Error, and so on
                                Bh = Protocol Layer Queue table over
63. Intelligent Power Node Manager's operational capabilities. This applies to all domains.

Assertion and deassertion of these events are supported.

Table 96. Node Manager Operational Capabilities Change Sensor Typical Characteristics

Byte  Field                 Description
8-9   Generator ID          002Ch or 602Ch = ME Firmware
11    Sensor Type           DCh = OEM
12    Sensor Number         1Ah
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 74h (OEM)
14    Event Data 1          [7:6] 00b = Unspecified Event Data 2
                            [5:4] 00b = Unspecified Event Data 3
                            [3:0] Current state of Operational Capabilities. Bit pattern:
                                Bit 0: Policy interface capability (0 = Not Available, 1 = Available)
                                Bit 1: Monitoring capability (0 = Not Available, 1 = Available)
                                Bit 2: Power limiting capability (0 = Not Available, 1 = Available)
15    Event Data 2          Not used
16    Event Data 3          Not used

13.4.1 Node Manager Operational Capabilities Change - Next Steps

Policy Interface available indicates that Intel Intelligent Power Node Manager is able to respond to the external interface about querying and setting Intel Intelligent Power Node Manager policies. This is generally available
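The capability bit pattern in Event Data 1 can be read with a few bit masks. A minimal Python sketch; the function name and dictionary keys are hypothetical:

```python
def decode_nm_capabilities(ed1: int) -> dict:
    """Decode Event Data 1 bits [3:0] of an Operational Capabilities Change event."""
    state = ed1 & 0x0F
    return {
        "policy_interface": bool(state & 0x01),  # bit 0
        "monitoring": bool(state & 0x02),        # bit 1
        "power_limiting": bool(state & 0x04),    # bit 2
    }
```

For example, an Event Data 1 value of 05h means the policy interface and power limiting are available while monitoring is not.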
64. Mrgn2

Sensor  Sensor Name
Number
B2h     P2 DIMM Thrm Mrgn1
B3h     P2 DIMM Thrm Mrgn2
B4h     P3 DIMM Thrm Mrgn1
B5h     P3 DIMM Thrm Mrgn2
B6h     P4 DIMM Thrm Mrgn1
B7h     P4 DIMM Thrm Mrgn2
C8h     Agg Therm Mrgn 1
C9h     Agg Therm Mrgn 2
CAh     Agg Therm Mrgn 3
CBh     Agg Therm Mrgn 4
CCh     Agg Therm Mrgn 5
CDh     Agg Therm Mrgn 6
CEh     Agg Therm Mrgn 7
CFh     Agg Therm Mrgn 8

Next Steps:
2. Ensure the SDR is programmed and the correct chassis has been selected.
3. Ensure there are no fan failures.
4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

5.2.3 Processor Thermal Control Sensors

The BMC FW monitors the percentage of time that a processor has been operationally constrained over a given time window (nominally six seconds) due to internal thermal management algorithms engaging to reduce the temperature of the device. This monitoring is instantiated as one IPMI analog/threshold sensor per processor package.

If this is not addressed, the processor will overheat and shut down the system to protect itself from damage.

Table 41. Processor Thermal Control Sensors Typical Characteristics

Byte  Field     Description
11    Sensor
65. Next Steps

Sensor  Sensor Name        Description
Number
97h     P2 Mem23 VRD Hot   Processor 2 Memory 2/3 voltage regulator overheated
98h     P3 Mem01 VRD Hot   Processor 3 Memory 0/1 voltage regulator overheated
99h     P3 Mem23 VRD Hot   Processor 3 Memory 2/3 voltage regulator overheated
9Ah     P4 Mem01 VRD Hot   Processor 4 Memory 0/1 voltage regulator overheated
9Bh     P4 Mem23 VRD Hot   Processor 4 Memory 2/3 voltage regulator overheated

5.2.6 DIMM Thermal Trip Sensors

The BMC supports DIMM Thermal Trip monitoring that is instantiated as one aggregate IPMI discrete sensor per CPU. When a DIMM Thermal Trip occurs, the system hardware automatically powers down the server, and the BMC asserts the sensor offset and logs an event.

Table 46. DIMM Thermal Trip Typical Characteristics

Byte  Field                 Description
11    Sensor Type           0Ch = Memory
12    Sensor Number         C0h = Processor 1 DIMM Thermal Trip
                            C1h = Processor 2 DIMM Thermal Trip
                            C2h = Processor 3 DIMM Thermal Trip
                            C3h = Processor 4 DIMM Thermal Trip
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 6Fh (Sensor Specific)
14    Event Data 1          [7:6] 00b = Unspecified Event Data 2
                            [5:4] 00b = Unspecified Event Data 3
66.
4.4.3 Power Supply Current Out % Sensors

PMBus-compliant power supplies may monitor the current output of the main 12V voltage rail and report the current usage as a percentage of the maximum power output for that rail.

Table 24. Power Supply Current Out % Sensors Typical Characteristics

Byte  Field                 Description
11    Sensor Type           03h = Current
12    Sensor Number         58h = Power Supply 1 Current Out %
                            59h = Power Supply 2 Current Out %
13    Event Direction and   [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
      Event Type            [6:0] Event Type = 01h (Threshold)
14    Event Data 1          [7:6] 01b = Trigger reading in Event Data 2
                            [5:4] 01b = Trigger threshold in Event Data 3
                            [3:0] Event Trigger Offset as described in Table 25
15    Event Data 2          Reading that triggered event
16    Event Data 3          Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 25. Power Supply Current Out % Sensor - Event Trigger Offset - Next Steps

Event Trigger Offset               Assertion  Deassert  Description / Next Steps
Hex  Description                   Severity   Severity
07h  Upper non-critical going high Degraded   OK        Description: PMBus feature to monitor power supply power consumption.
                                                        Next Steps: If you see this event, the system is using too much power on the output for the PSU rating.
09h  Upper critical                non-fatal  Degraded
67. Product Families moves IPMI ownership of the HDD sensors to the BMC. Note that systems may have multiple storage backplanes. Hard Disk Drive status monitoring is supported through disk status sensors owned by the BMC.

Table 89. Hard Disk Drive Monitoring Sensor Typical Characteristics
Byte  Field  Description
11  Sensor Type  0Dh = Drive Slot (Bay)
12  Sensor Number  60h-68h = Hard Disk Drive 15-23 Status; F0h-FEh = Hard Disk Drive 0-14 Status
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 6Fh (Sensor Specific)
14  Event Data 1  [7:6]: 00b = Unspecified Event Data 2; [5:4]: 00b = Unspecified Event Data 3; [3:0]: Event Trigger Offset as described in Table 90
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 90. Hard Disk Drive Monitoring Sensor - Event Trigger Offset - Next Steps
Event Trigger  Description  Next Steps
00h  Drive Presence  If during normal operation the state changes unexpectedly, ensure that the drive was seated properly and the drive carrier was properly latched. If that does not work, replace the drive.
01h  Drive Fault
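The sensor-number ranges in Table 89 can be mapped to a drive index with a small helper. The following is a minimal sketch that assumes only the two ranges listed in the table (F0h-FEh for drives 0-14 and 60h-68h for drives 15-23); the function name `hdd_index_from_sensor_number` is hypothetical and is not part of any Intel utility.

```python
from typing import Optional


def hdd_index_from_sensor_number(sensor_number: int) -> Optional[int]:
    """Map a drive-status sensor number to a drive index (hypothetical helper).

    Assumes the two ranges given in Table 89:
      F0h-FEh -> Hard Disk Drive 0-14
      60h-68h -> Hard Disk Drive 15-23
    Returns None for any other sensor number.
    """
    if 0xF0 <= sensor_number <= 0xFE:
        return sensor_number - 0xF0
    if 0x60 <= sensor_number <= 0x68:
        return 15 + (sensor_number - 0x60)
    return None


if __name__ == "__main__":
    for sn in (0xF0, 0xFE, 0x60, 0x68, 0x50):
        print(f"{sn:02X}h -> {hdd_index_from_sensor_number(sn)}")
```

Returning `None` rather than raising keeps the helper usable when scanning a SEL that also contains non-drive sensors.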
68. Record access
Byte  Field  Description
3  Record Type  [7:0] = DCh = OEM timestamped, bytes 8-16 OEM defined
4-7  Timestamp  Time when the event was logged. LS byte first.
8-10  IPMI Manufacturer ID  0137h (311d) = IANA enterprise number for Microsoft
11  Record ID  Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
12-15  Boot Time  Timestamp of when the system booted into the OS
16  Reserved  00h

14.2 Shutdown Event Records

When the system shuts down from the Microsoft Windows OS, multiple events can be logged. The first is an OS Stop/Shutdown Event Record; this can be followed by a shutdown reason code OEM record, and then zero or more shutdown comment OEM records. These are all informational-only records.

Table 100. Shutdown Reason Code Event Record Typical Characteristics
Byte  Field  Description
8-9  Generator ID  0041h = System Software with an ID = 20h
11  Sensor Type  20h = OS Stop/Shutdown
12  Sensor Number  00h
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 6Fh (Sensor Specific)
14  E
69. Sensor Specific)
14  Event Data 1  [7:6]: 00b = Unspecified Event Data 2; [5:4]: 00b = Unspecified Event Data 3; [3:0]: Event Trigger Offset, 01h = System Boot, 05h = Timestamp Clock Synchronization
15  Event Data 2  For Event Trigger Offset 05h only (Timestamp Clock Synchronization): 00h = 1st in pair, 80h = 2nd in pair
16  Event Data 3  Not used

9.2 System Firmware Progress (Formerly POST Error)

The BIOS logs any POST errors to the SEL. The 2-byte POST code is logged in the ED2 and ED3 bytes of the SEL entry. This event is logged every time a POST error is displayed. Even though this event indicates an error, it may not be a fatal error. If it is a serious error, there will typically also be a corresponding SEL entry logged for whatever caused the error; that entry may contain more information about what happened than the POST error event.

Table 71. POST Error Sensor Typical Characteristics
Byte  Field  Description
8-9  Generator ID  0001h = BIOS POST
11  Sensor Type  0Fh = System Firmware Progress (formerly POST Error)
12  Sensor Number  06h
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 6Fh (Sensor Specific)
70. State

Rank Sparing Mode is a Memory RAS configuration option that reserves one memory rank per channel as a "spare rank". If any rank on a given channel experiences enough Correctable ECC Errors to cross the Correctable Error Threshold, the data in that rank is copied to the spare rank, and then the spare rank is mapped into the memory array to replace the failing rank.

Rank Sparing Mode protects memory data by reserving a "Spare Rank" on each channel that has memory installed on it. If a Correctable Error Threshold event occurs, the data from the failing rank is copied to the Spare Rank on the same channel, and the failing DIMM is disabled. Because the Sparing Domain is no longer redundant, a Sparing Redundancy State SEL Event is logged.

Table 62. Sparing Redundancy State Sensor Typical Characteristics
Byte  Field  Description
8-9  Generator ID  0033h = BIOS SMI Handler
11  Sensor Type  0Ch = Memory
12  Sensor Number  11h
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 0Bh (Generic Discrete)
14  Event Data 1  [7:6]: 10b = OEM code in Event Data 2; [5:4]: 10b = OEM code in Event Data 3; [3:0]: Event Trigger Offset, 0h = Fully Redundant, 2h = Redundancy Degr
71. System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5-4600/2600/2400/1600/1400 Product Families

Intel order number G90620-002
Revision 1.1
September 2013
Enterprise Platforms and Services Division - Marketing

Revision History
January 2013: Initial release.
September 2013, Revision 1.1:
- Added MIC Thermal Margin sensors C4 through C7.
- Added MIC Status sensors A2, A3, A6, and A7.
- Added voltage sensors EA, EB, EC, ED, and EF.
- Corrected typographical errors.
- Made corrections to Firmware Update Status table.
- Made corrections to Catastrophic Error Sensor table.
- Added support for S1400FP, S1400SP, S1600JUP, and S4600LH.

Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY
72. Sensor Number  Sensor Name  Details Section  Next Steps
23h  (Platform Specific)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
24h  Baseboard Temperature 3 (Platform Specific)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
25h  Baseboard Temperature 4 (Platform Specific)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
26h  IO Module Temperature (I/O Mod Temp)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
27h  PCI Riser 1 Temperature (PCI Riser 1 Temp)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
28h  IO Riser Temperature (IO Riser Temp)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
29h-2Bh  Hot Swap Back Plane 1-3 Temperature (HSBP 1-3 Temp)  HSC Backplane Temperature Sensor  Table 88. HSC Backplane Temperature Sensor - Event Trigger Offset - Next Steps
2Ch  PCI Riser 2 Temperature (PCI Riser 2 Temp)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
2Dh  SAS Module Temperature (SAS Mod Temp)  Threshold-based Temperature Sensors  Table 37. Temperature Sensors - Next Steps
2Eh  Exit Air Temperature (Exit Air Temp)  Threshold-based Temperature Sensors  Table
73. Typical Characteristics
Byte  Field  Description
8-9  Generator ID  0041h = System Software with an ID = 20h
11  Sensor Type  20h = OS Stop/Shutdown
12  Sensor Number  00h
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 6Fh (Sensor Specific)
14  Event Data 1  [7:6]: 00b = Unspecified Event Data 2; [5:4]: 00b = Unspecified Event Data 3; [3:0]: Event Trigger Offset = 1h = Runtime Critical Stop (that is, "core dump", "blue screen")
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 104. Bug Check/Blue Screen code OEM Event Record Typical Characteristics
Byte  Field  Description
1-2  Record ID  ID used for SEL Record access
3  Record Type  [7:0] = DEh = OEM timestamped, bytes 8-16 OEM defined
4-7  Timestamp  Time when the event was logged. LS byte first.
8-10  IPMI Manufacturer ID  0137h (311) = IANA enterprise number for Microsoft; 0157h (343) = IANA enterprise number for Intel. The value logged depends on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
11  Sequence Number  Sequential number reflecting the order in which the records are read. The numbers start at 1 for
74. a 2; [5:4]: 10b = OEM code in Event Data 3; [3]: Node Manager Policy event, 0 = Threshold exceeded, 1 = Policy Correction Time Exceeded (the policy did not meet the contract for the defined policy; the policy will continue to limit the power or shut down the platform based on the defined policy action); [2]: Reserved; [1:0]: Threshold Number, valid only if Byte 5 bit 8 is set to 0, 0 to 2 = Threshold index
15  Event Data 2  [7:4]: Reserved; [3:0]: Domain Id (currently supports only one domain, Domain 0)
16  Event Data 3  Policy ID

13.5.1 Node Manager Alert Threshold Exceeded - Next Steps

The first occurrence of a not-acknowledged event will be retransmitted no faster than every 300 milliseconds.
The first occurrence of a Threshold exceeded event assertion/deassertion will be retransmitted no faster than every 300 milliseconds.

Next steps depend on the policy that was set. See the Node Manager Specification for more details.

14. Microsoft Windows Records

With Microsoft Windows S
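The Node Manager alert event data fields described above can be unpacked with a few bit masks. This is a minimal sketch, not Intel's implementation: the function name and dictionary keys are hypothetical, and it assumes that the threshold number in Event Data 1 bits [1:0] is only meaningful when bit [3] reports "Threshold exceeded", which is one reading of the validity note in the table.

```python
def decode_nm_alert(ed1: int, ed2: int, ed3: int) -> dict:
    """Decode Node Manager Alert Threshold Exceeded event data (sketch).

    ed1 bit 3     : 0 = Threshold exceeded, 1 = Policy Correction Time Exceeded
    ed1 bits 1:0  : threshold index (treated as valid only when bit 3 is 0,
                    which is an assumption about the table's validity note)
    ed2 bits 3:0  : domain ID
    ed3           : policy ID
    """
    correction_time_exceeded = bool(ed1 & 0x08)
    return {
        "event": ("Policy Correction Time Exceeded"
                  if correction_time_exceeded else "Threshold exceeded"),
        "threshold_number": None if correction_time_exceeded else ed1 & 0x03,
        "domain_id": ed2 & 0x0F,
        "policy_id": ed3 & 0xFF,
    }


if __name__ == "__main__":
    # Example values are illustrative, not taken from a real SEL.
    print(decode_nm_alert(0xA1, 0x00, 0x05))
    print(decode_nm_alert(0xA8, 0x00, 0x07))
```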
75. a common platform instrumentation interface to enable interoperability between:
- The baseboard management controller and chassis
- The baseboard management controller and systems management software
- Between servers

IPMI enables the following:
- Common access to platform management information, consisting of:
  - Local access from systems management software
  - Remote access from LAN
  - Inter-chassis access from Intelligent Chassis Management Bus
  - Access from LAN, serial/modem, IPMB, PCI SMBus, or ICMB, available even if the processor is down
- IPMI interface isolates systems management software from hardware.
- Hardware advancements can be made without impacting the systems management software.
- IPMI facilitates cross-platform management software.

You can find more information on IPMI at the following URL: http://www.intel.com/design/servers/ipmi

1.2.2 Baseboard Management Controller (BMC)

A baseboard management controller (BMC) is a specialized microcontroller embedded on most Intel® Server Boards. The BMC is the heart of the IPMI architecture and provides the intelligence behind intelligent platform management, that is, the autonomous monitoring and recovery features implemented directly in platform management hardware and firmware.

Different types of sensors built into the computer system report to the BMC on parameters such as temperature, cooling fan speeds, power mode, operating system status, and so on. The BMC monito
76. Table 8. Management Engine Firmware-owned Sensors
Sensor Number  Sensor Name  Details Section  Next Steps
17h  ME Firmware Health Events  ME Firmware Health Event  ME Firmware Health Event - Next Steps
18h  Node Manager Exception Events  Node Manager Exception Event  Node Manager Exception Event - Next Steps
19h  Node Manager Health Events  Node Manager Health Event  Node Manager Health Event - Next Steps
1Ah  Node Manager Operational Capabilities Change Events  Node Manager Operational Capabilities Change  Node Manager Operational Capabilities Change - Next Steps
1Bh  Node Manager Alert Threshold Exceeded Events  Node Manager Alert Threshold Exceeded  Node Manager Alert Threshold Exceeded - Next Steps

3.5 Microsoft OS-owned Events (GID = 0041)

The following table can be used to find the details of records that are owned by the Microsoft Operating System (OS).

Table 9. Microsoft OS-owned Events
Sensor Name  Record Type  Sensor Type  Details Section  Next Steps
Boot Event  02h  1Fh = OS Boot  Table 98. Boot up Event Record Typical Characteristics  Not applicable
Boot Event  DCh  Not applicable  Table 99. Boot up OEM Event Record Typical Characteristics
02
77. aded
15  Event Data 2  Location: [7:4]: Sparing Domain, 0-3 = Channel A-D for Socket; [3:2]: Reserved; [1:0]: Rank on DIMM, 0-3 = Rank Number
16  Event Data 3  Location: [7:5]: Socket ID, 0-3 = CPU1-4; [4:3]: Channel, 0-3 = Channel A-D for Socket; [2:0]: DIMM, 0-2 = DIMM 1-3 on Channel

7.4.1 Sparing Redundancy State Sensor - Next Steps

This event is accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace affected DIMM).

For boards with DIMM Fault LEDs, the appropriate Fault LED is lit to indicate which DIMM was the source of the error triggering the Mirroring Failover action, that is, the failing DIMM.

7.5 ECC and Address Parity

1. Memory data errors are logged as correctable or uncorrectable.
2. Uncorrectable errors are fatal.
3. Memory addresses are protected with parity bits and a parity error is logged. This is a fatal error.

7.5.1 Memory Correctable and Uncorrectable ECC Error

ECC errors are divided into Uncorrectable ECC Errors and Correctable ECC Errors. A "Correctable ECC Error" actually represents a threshold overflow. More Correctable Errors are detected at the memory controller level for a given DIMM within a given timeframe. In both cases, th
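The Event Data 2 and Event Data 3 location fields of the Sparing Redundancy State event can be split into socket, channel, DIMM, and rank with shifts and masks. The sketch below follows the bit layout in the table rows for bytes 15 and 16; the function name, the returned keys, and the choice to raise `ValueError` for values outside the documented ranges are assumptions made for illustration.

```python
CHANNELS = "ABCD"


def decode_sparing_location(ed2: int, ed3: int) -> dict:
    """Decode the DIMM location from a Sparing Redundancy State event (sketch).

    ed2 [7:4] sparing domain (0-3 = Channel A-D), [1:0] rank on DIMM
    ed3 [7:5] socket ID (0-3 = CPU1-4), [4:3] channel (0-3 = A-D),
        [2:0] DIMM (0-2 = DIMM 1-3 on the channel)
    """
    domain = (ed2 >> 4) & 0x0F
    rank = ed2 & 0x03
    socket = (ed3 >> 5) & 0x07
    channel = (ed3 >> 3) & 0x03
    dimm = ed3 & 0x07
    if domain > 3 or socket > 3 or dimm > 2:
        raise ValueError("field outside the documented range")
    return {
        "sparing_domain": f"Channel {CHANNELS[domain]}",
        "rank": rank,
        "cpu": socket + 1,           # Socket ID 0 is CPU1
        "channel": CHANNELS[channel],
        "dimm": dimm + 1,            # DIMM 0 is DIMM 1 on the channel
    }


if __name__ == "__main__":
    # Illustrative values only: ED2 = 11h, ED3 = 2Ah.
    print(decode_sparing_location(0x11, 0x2A))
```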
78. ailure  Major
Error Code  Error Message  Response
857F  DIMM_L2 encountered a Serial Presence Detection (SPD) failure  Major
(Go to 85E0)
85C0  DIMM_L3 failed test/initialization  Major
85C1  DIMM_M1 failed test/initialization  Major
85C2  DIMM_M2 failed test/initialization  Major
85C3  DIMM_M3 failed test/initialization  Major
85C4  DIMM_N1 failed test/initialization  Major
85C5  DIMM_N2 failed test/initialization  Major
85C6  DIMM_N3 failed test/initialization  Major
85C7  DIMM_P1 failed test/initialization  Major
85C8  DIMM_P2 failed test/initialization  Major
85C9  DIMM_P3 failed test/initialization  Major
85CA  DIMM_R1 failed test/initialization  Major
85CB  DIMM_R2 failed test/initialization  Major
85CC  DIMM_R3 failed test/initialization  Major
85CD  DIMM_T1 failed test/initialization  Major
85CE  DIMM_T2 failed test/initialization  Major
85CF  DIMM_T3 failed test/initialization  Major
85D0  DIMM_L3 disabled  Major
85D1  DIMM_M1 disabled  Major
85D2  DIMM_M2 disabled  Major
85D3  DIMM_M3 disabled  Major
85D4  DIMM_N1 disabled  Major
85D5  DIMM_N2 disabled  Major
85D6  DIMM_N3 disabled  Major
85D7  DIMM_P1 disabled  Major
85D8  DIMM_P2 disabled  Major
85D9  DIMM_P3 disabled  Major
79. ajor
Error Code  Error Message  Response
8555  DIMM_H1 disabled  Major
8556  DIMM_H2 disabled  Major
8557  DIMM_H3 disabled  Major
8558  DIMM_J1 disabled  Major
8559  DIMM_J2 disabled  Major
855A  DIMM_J3 disabled  Major
855B  DIMM_K1 disabled  Major
855C  DIMM_K2 disabled  Major
855D  DIMM_K3 disabled  Major
855E  DIMM_L1 disabled  Major
855F  DIMM_L2 disabled  Major
(Go to 85D0)
8560  DIMM_A1 encountered a Serial Presence Detection (SPD) failure  Major
8561  DIMM_A2 encountered a Serial Presence Detection (SPD) failure  Major
8562  DIMM_A3 encountered a Serial Presence Detection (SPD) failure  Major
8563  DIMM_B1 encountered a Serial Presence Detection (SPD) failure  Major
8564  DIMM_B2 encountered a Serial Presence Detection (SPD) failure  Major
8565  DIMM_B3 encountered a Serial Presence Detection (SPD) failure  Major
8566  DIMM_C1 encountered a Serial Presence Detection (SPD) failure  Major
8567  DIMM_C2 encountered a Serial Presence Detection (SPD) failure  Major
8568  DIMM_C3 encountered a Serial Presence Detection (SPD) failure  Major
8569  DIMM_D1 encountered a Serial Presence Detection (SPD) failure  Major
856A  DIMM_D2 encountered a Serial Presence Detection (SPD) failure  Major
856B  DIMM_D3 encountered a Serial Presence Detection (SPD) failure  Major
856C  DIMM_E1 encountered a Serial Presence Detection (SPD) failure  Major
856D  DIMM_E2 encountered a Serial Presence Detection (SPD) failure  Major
856E  DIMM_E3 encountered a Serial Presence Detection (SPD) failure  Major
856F  DIMM_F1 encountered a Serial Presence Detecti
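The SPD failure codes listed above follow a regular pattern: three codes per slot letter starting at 8560 for DIMM_A1, with the letter I skipped (H3 is followed by J1). A minimal sketch of that mapping follows, assuming the pattern holds for every code between 8560 and 857F, including the rows not visible in this excerpt; the function name is hypothetical.

```python
from typing import Optional

# Slot letters as they appear in the error code table; there is no "I".
SLOT_LETTERS = "ABCDEFGHJKL"


def spd_failure_dimm(code: int) -> Optional[str]:
    """Return the DIMM slot for a POST SPD-failure code in 8560h-857Fh (sketch).

    Assumes three consecutive codes per slot letter, starting with
    8560h = DIMM_A1. Codes outside the range return None.
    """
    if not 0x8560 <= code <= 0x857F:
        return None
    index = code - 0x8560
    return f"DIMM_{SLOT_LETTERS[index // 3]}{index % 3 + 1}"


if __name__ == "__main__":
    for code in (0x8560, 0x8563, 0x856F, 0x857F):
        print(f"{code:04X} -> {spd_failure_dimm(code)}")
```

The same arithmetic could be reused for the other code ranges with a different base value, but those bases would need to be confirmed against the full table first.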
80. an spins too slowly.

Table 29. Fan Tachometer Sensors Typical Characteristics
Byte  Field  Description
11  Sensor Type  04h = Fan
12  Sensor Number  30h-3Fh (Chassis specific); BAh-BFh (Chassis specific)
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 01h (Threshold)
14  Event Data 1  [7:6]: 01b = Trigger reading in Event Data 2; [5:4]: 01b = Trigger threshold in Event Data 3; [3:0]: Event Trigger Offset as described in Table 30
15  Event Data 2  Reading that triggered event
16  Event Data 3  Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 30. Fan Tachometer Sensor - Event Trigger Offset - Next Steps
Event Trigger Offset (Hex, Description)  Assertion Severity  Deassert Severity  Description  Next Steps
00h Lower non-critical going low  Degraded  OK  The fan speed has dropped below its lower non-critical threshold.  A fan speed error on a new system build is typically not caused by the fan spinning too slowly; instead it is caused by the fan being connected to the wrong header (the BMC expects them on certai
81. ber
Sensor Number  Sensor Name  Details Section  Next Steps
B7h  Processor 4 DIMM Aggregate Thermal Margin 2 (P4 DIMM Thrm Mrgn2)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
B8h  Node Auto Shutdown Sensor (Auto Shutdown)  Node Auto Shutdown Sensor  Node Auto Shutdown Sensor - Next Steps
BAh-BFh  Fan Tachometer Sensors (Chassis specific sensor names)  Fan Tachometer Sensors  Table 30. Fan Tachometer Sensor - Event Trigger Offset - Next Steps
C0h-C3h  Processor 1-4 DIMM Thermal Trip (P1-P4 Mem Thrm Trip)  DIMM Thermal Trip Sensors  DIMM Thermal Trip Sensors - Next Steps
C4h  Intel® Xeon Phi Coprocessor Thermal Margin 1 (MIC 1 Margin)  Intel® Xeon Phi Coprocessor (MIC) Thermal Margin Sensors  Not applicable
C5h  Intel® Xeon Phi Coprocessor Thermal Margin 2 (MIC 2 Margin)  Intel® Xeon Phi Coprocessor (MIC) Thermal Margin Sensors  Not applicable
C6h  Intel® Xeon Phi Coprocessor Thermal Margin 3 (MIC 3 Margin)  Intel® Xeon Phi Coprocessor (MIC) Thermal Margin Sensors  Not applicable
C7h  Intel® Xeon Phi Coprocessor Thermal Margin 4 (MIC 4 Margin)  Intel® Xeon Phi Coprocessor (MIC) Thermal Margin Sensors  Not applicable
C8h-CFh  Global Aggregate Temperature Margin 1-8 (Agg Therm Mrgn 1-8)  Thermal Margin Sensors  Table 40. Thermal Margin Sensors - Next Steps
D0h  Baseboard +12V (BB +12.0V)  Threshold-based Voltage Sensors  Table 13. Threshold-based Voltage Sensors - Next Steps
D1h  Baseboard +5V (BB +5.0V)  Threshold-based Voltage Sensors  Table 13. Threshold-based Voltage Sensors - Next Steps
82. but the system remained powered on. If these events continue to occur, it is advisable to check your power source.

RID 5D 00  RT 02  TS D3 B1 AE 4E  GID 20 00  ER 04  ST 08  SN 50  EDIR 6F  ED1 A2  ED2 06  ED3 30

RID (Record ID) = 005Dh
RT (Record Type) = 02h = system event record
TS (Timestamp) = 4EAEB1D3h
GID (Generator ID) = 0020h = BMC
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 08h = Power Supply (from IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 50h = Power Supply 1
EDIR (Event Direction/Event Type) = 6Fh: [7] = 0 = Assertion Event; [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = A2h: [7:6] = 10b = OEM code in Event Data 2; [5:4] = 10b = OEM code in Event Data 3; [3:0] = Event Trigger Offset = 2h = Predictive Failure
ED2 (Event Data 2) = 06h = Input under-voltage warning
ED3 (Event Data 3) = 30h (from PMBus Specification STATUS_INPUT command): [5] VIN_UV_WARNING (Input Under-voltage Warning) = 1; [4] VIN_UV_FAULT (Input Under-voltage Fault) = 1

3. Sensor Cross Reference List

This section contains a cross reference to help find details on any specific SEL entry.

3.1 BMC-owned Sensors (GID = 0020h)

The following table can be used to find the details of sensors
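The field-by-field decode in the example above can be reproduced programmatically. The following is a minimal sketch, assuming the 16 raw bytes arrive in the order shown in the example (little-endian multi-byte fields, LS byte first); the function name and dictionary keys are hypothetical, and the sketch only splits fields without interpreting sensor-specific meanings.

```python
import struct
from datetime import datetime, timezone


def decode_system_event_record(raw: bytes) -> dict:
    """Split a 16-byte system event record (Record Type 02h) into fields."""
    if len(raw) != 16:
        raise ValueError("expected a 16-byte SEL record")
    # "<" = little-endian, no padding: RID(2) RT(1) TS(4) GID(2) + 7 single bytes
    (rid, rtype, ts, gid, evm_rev,
     sensor_type, sensor_num, edir, ed1, ed2, ed3) = struct.unpack("<HBIH7B", raw)
    return {
        "record_id": rid,
        "record_type": rtype,
        "timestamp": ts,
        "time_utc": datetime.fromtimestamp(ts, tz=timezone.utc),
        "generator_id": gid,
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,
        "sensor_number": sensor_num,
        "assertion": (edir & 0x80) == 0,      # bit 7: 0 = assertion
        "event_type": edir & 0x7F,            # bits 6:0
        "ed2_contents": (ed1 >> 6) & 0x03,    # ED1 bits 7:6
        "ed3_contents": (ed1 >> 4) & 0x03,    # ED1 bits 5:4
        "trigger_offset": ed1 & 0x0F,         # ED1 bits 3:0
        "ed2": ed2,
        "ed3": ed3,
    }


# The record from the example above.
raw = bytes.fromhex("5D 00 02 D3 B1 AE 4E 20 00 04 08 50 6F A2 06 30")
rec = decode_system_event_record(raw)

if __name__ == "__main__":
    for key, value in rec.items():
        print(f"{key:15} {value:#x}" if isinstance(value, int) and not isinstance(value, bool)
              else f"{key:15} {value}")
```

Using a single `struct.unpack` format keeps the byte offsets aligned with the Byte column used throughout the tables (byte 11 is the sensor type, byte 14 is Event Data 1, and so on).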
83. cation from hex version of SEL.

Description: Error is a fatal issue that will typically lead to an OS crash (unless memory has been configured in a RAS mode). The system will generate a CATERR (catastrophic error) and an MCE (Machine Check Exception Error). While the error may be due to a failing DRAM chip on the DIMM, it can also be caused by incorrect seating or improper contact between socket and DIMM, or by bent pins in the processor socket.
Next Steps (continued):
2. Verify the DIMM is seated properly.
3. Examine gold fingers on edge of the DIMM to verify contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.

00h  Correctable ECC Error threshold reached
Description: There have been too many (10 or more) correctable ECC errors for this particular DIMM since last boot. This event in itself does not pose any direct problems because the ECC errors are still being corrected. Depending on the RAS configuration of the memory, the IMC may take the affected DIMM offline.
Next Steps: Even though this event doesn't immediately lead to problems, it can indicate one of the DIMM modules is slowly failing. If this error occurs more than once:
1. If needed, decode DIMM location from hex version of SEL.
2. Verify the DIMM is seated properly.
3. Examine gold fingers on edge of the DIMM to verify contacts are clean.
4. Inspect the processor socket this DIMM is connected to for
84. 00h  chassis intrusion
Description: ... chassis intrusion sensor is not connected.
Next Steps:
1. Use the Quick Start Guide and the Service Guide to determine whether the chassis intrusion switch is connected properly.
2. If this is the case, make sure it makes proper contact when the chassis is closed.
3. If this is also the case, someone has opened the chassis. Ensure nobody has access to the system that shouldn't.

04h  LAN leash lost
Description: Someone has unplugged a LAN cable that was present when the BMC initialized. This event gets logged when the electrical connection on the NIC connector gets lost.
Next Steps: This is most likely due to unplugging the cable but can also happen if there is an issue with the cable or switch.
1. Check the LAN cable and connector for issues.
2. Investigate switch logs where possible.
3. Ensure nobody has access to the server that shouldn't.

10.2 FP (NMI) Interrupt

The BMC supports an NMI sensor for logging an event when a diagnostic interrupt is generated for the following cases:
- The front panel diagnostic interrupt button is pressed.
- The BMC receives an IPMI Chassis Control command that requests this action.

The front panel interrupt button (also referred to as NMI button) is a recessed button on the front panel that allows the user to force a critical interrupt, which causes a crash error or kernel panic.
85. 11.2.1 SMI Timeout - Next Steps

This event normally only occurs after another more critical event.
1. Check the SEL for any critical interrupts, memory errors, bus errors, PCI errors, or any other serious errors.
2. If these are not present, the system locked up before it was able to log the original issue. In this case, low-level debug is normally required.

11.3 System Event Log Cleared

The BMC logs a SEL clear event. This is only ever the first event in the SEL. The cause of this event is either a manual SEL clear using selview or some other IPMI-aware utility, or a clear done in the factory as one of the last steps in the manufacturing process.

This is an informational event only.

Table 80. System Event Log Cleared Sensor Typical Characteristics
Byte  Field  Description
11  Sensor Type  10h = Event Logging Disabled
12  Sensor Number  07h
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 6Fh (Sensor Specific)
14  Event Data 1  [7:6]: 00b = Unspecified Event Data 2; [5:4]: 00b = Unspecified Event Data 3; [3:0]: Event Trigger Offset = 2h = Log area reset/cleared
15  Event Data 2  Not used
16  Event Data 3  Not used
86. Byte  Field  Description
4h = PCI PERR
5h = PCI SERR
15  Event Data 2  PCI Bus number
16  Event Data 3  [7:3]: PCI Device number; [2:0]: PCI Function number

8.1.1.1 Legacy PCI Error Sensor - Next Steps

1. Decode the bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
   b. Install the card in another slot and check whether the error follows the card or stays with the slot.
   c. Update all firmware and drivers, including non-Intel components.
3. If this is an on-board device:
   a. Update all BIOS, firmware, and drivers.
   b. Replace the board.

8.1.2 PCI Express Fatal Errors and Fatal Error #2

When a PCI Express fatal error is reported to the BIOS SMI handler, it will record the error using the following format.

Table 67. PCI Express Fatal Error Sensor Typical Characteristics
Byte  Field  Description
8-9  Generator ID  0033h = BIOS SMI Handler
11  Sensor Type  13h = Critical Interrupt
12  Sensor Number  04h
13  Event Direction and Event Type  [7]: Event direction, 0b = Assertion Event, 1b = Deassertion Event; [6:0]: Event Type = 70h (OEM Specific)
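Step 1 of the Legacy PCI Error next steps (decoding bus, device, and function) can be sketched as follows. This assumes the legacy PCI error layout given above: Event Data 2 holds the bus number, Event Data 3 bits [7:3] the device number, and bits [2:0] the function number. The function name and the `bus:device.function` output notation (the style printed by tools such as `lspci`) are choices made for this illustration.

```python
def decode_pci_location(ed2: int, ed3: int) -> str:
    """Format the PCI bus/device/function from legacy PCI error event data.

    ed2        : PCI bus number
    ed3 [7:3]  : PCI device number
    ed3 [2:0]  : PCI function number
    Returns a "bus:device.function" string in hexadecimal.
    """
    bus = ed2 & 0xFF
    device = (ed3 >> 3) & 0x1F
    function = ed3 & 0x07
    return f"{bus:02x}:{device:02x}.{function}"


if __name__ == "__main__":
    # Illustrative values only: ED2 = 03h, ED3 = 19h.
    print(decode_pci_location(0x03, 0x19))
```

The resulting string can be compared against the operating system's device listing to identify the card or on-board device.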
87. d
Sensor Class  Event Data
11b = Reserved
[5:4]: 00b = Unspecified Event Data 3; 01b = Reserved; 10b = OEM code in Event Data 3; 11b = Reserved
[3:0]: Offset from Event/Reading Type Code
Event Data 2: [7:4]: Optional OEM code bits or offset from "Severity" Event/Reading Type Code (0Fh if unspecified); [3:0]: Optional OEM code or offset from Event/Reading Type Code for previous event state (0Fh if unspecified)
Event Data 3: Optional OEM code. FFh or not present if unspecified.

Table 3. OEM SEL Record (Type C0h-DFh)
Byte  Field  Description
1-2  Record ID (RID)  ID used for SEL Record access
3  Record Type (RT)  [7:0]: Record Type, C0h-DFh = OEM timestamped, bytes 8-16 OEM defined
4-7  Timestamp (TS)  Time when event was logged. LS byte first. Example: TS 29 76 68 4C = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC. Note: There are various websites that will convert the raw number to a date/time.
8-10  Manufacturer ID  LS Byte first. The manufacturer ID is a 20-bit value that is derived from the IANA "Private Enterprise" ID. Most significant four bits = Reserved (0000b). 000000h = Unspecified, 0FFFFFh = Reserved. This value is binary encoded. For example the ID for the IPMI forum is 7154 decimal, which is 1BF2h, which will be s
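The timestamp and manufacturer ID conversions described in Table 3 can be checked with a few lines of Python. This is a minimal sketch using the example values from the table; the function names are hypothetical, and the timestamp is treated as seconds since 1 January 1970 UTC, which matches the worked example.

```python
from datetime import datetime, timezone


def decode_sel_timestamp(ts_bytes: bytes) -> datetime:
    """Convert a 4-byte SEL timestamp (LS byte first) to a UTC datetime."""
    seconds = int.from_bytes(ts_bytes, "little")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def decode_manufacturer_id(id_bytes: bytes) -> int:
    """Convert a 3-byte manufacturer ID (LS byte first) to its IANA number.

    Only the low 20 bits carry the ID; the top four bits are reserved.
    """
    return int.from_bytes(id_bytes, "little") & 0x0FFFFF


if __name__ == "__main__":
    # TS 29 76 68 4C from the example in Table 3.
    print(decode_sel_timestamp(bytes([0x29, 0x76, 0x68, 0x4C])))
    # 1BF2h stored LS byte first (IPMI forum, 7154 decimal).
    print(decode_manufacturer_id(bytes([0xF2, 0x1B, 0x00])))
```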
88. Byte  Field  Description
14  Event Data 1  [7:6]: 00b = Unspecified Event Data 2; [5:4]: 00b = Unspecified Event Data 3; [3:0]: Event Trigger Offset as described in Table 32
15  Event Data 2  Not used
16  Event Data 3  Not used

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 32. Fan Presence Sensors - Event Trigger Offset - Next Steps
Event Trigger Offset (Hex, Description)  Assertion Severity  Deassert Severity  Description  Next Steps
01h Device Present  OK  Degraded
Assertion: A fan was inserted. This event may also get logged when the BMC initializes when AC is applied.
Next Steps (assertion): Informational only.
Deassert: A fan was removed, or was not present at the expected location when the BMC initialized.
Next Steps (deassert): These events only get generated in the systems with hot-swappable fans, and normally only when a fan is physically inserted or removed. If fans were not physically removed:
1. Use the Quick Start Guide to check whether the right fan headers were used.
2. Swap the fans round to see whether the problem stays with the location or follows the fan.
3. Replace the fan or fan wiring/housing depending on the outcome of step 2.
4. Ensure the latest FRUSDR update has been run and the correct chassis is detected or
89. ded info byte in ED3 whether this is wear-out protection causing this event. If so, just wait until wear-out protection expires; otherwise the flash device probably must be replaced (if error is persistent).
  Extended info in ED3:
    Flash erase limit has been reached.
    02h = Flash write limit has been reached, writing to flash has been disabled.
    03h = Writing to the flash has been enabled.
04h, Internal error. Error during firmware execution (FW Watchdog Timeout).
  Next Steps: Operational image needs to be updated to another version, or hardware board repair is needed (if error is persistent).
05h, BMC did not respond to cold reset request and Intel® ME rebooted the platform.
  Next Steps: Verify the Intel® Node Manager configuration.
06h, Direct Flash update requested by the BIOS. Intel® ME firmware will switch to recovery mode to perform full update from the BIOS.
  Next Steps: This is a transient state. Intel® ME firmware will return to operational mode after a successful image update performed by the BIOS.
07h, Manufacturing error. Wrong manufacturing configuration detected by Intel® ME firmware. Intel® ME FW configuration is inconsistent or out of range.
  Next Steps: The flash device must be replaced (if error is persistent).
08h, Persistent storage integrity error. Flash file system error detected.
  Next Steps: If error is persistent, restore factory presets using the "Force ME Recovery" IPMI command or by doing an AC power cycle with the Recovery jumper asserted.
09h, Firmware Exception.
  Next Steps: Restore factory presets using the "Force ME Recovery" IPMI command or by doing AC
90. dundancy State Sensor Typical Characteristics

Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 0Ch = Memory
Byte 12, Sensor Number: 01h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 0Bh (Generic Discrete)
Byte 14, Event Data 1:
  [7:6] 10b = OEM code in Event Data 2
  [5:4] 10b = OEM code in Event Data 3
  [3:0] Event Trigger Offset: 0h = Fully Redundant, 2h = Redundancy Degraded
Byte 15, Event Data 2 (Location):
  [7:4] Mirroring Domain: 0-1 = Channel Pair for Socket
  [3:2] Reserved
  [1:0] Rank on DIMM: 0-3 = Rank Number
Byte 16, Event Data 3 (Location):
  [7:5] Socket ID: 0-3 = CPU1-4
  [4:3] Channel: 0-3 = Channel A-D for Socket
  [2:0] DIMM: 0-2 = DIMM 1-3 on Channel

7.3.1 Mirroring Redundancy State Sensor - Next Steps

This event is accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace the affected DIMM).

For boards with DIMM Fault LEDs, the appropriate Fault LED is lit to indicate which DIMM was the source of the error triggering the Mirroring Failover action, that is, the failing DIMM.

7.4 Sparing Redundancy
91. e (BB +1.8V AUX)
  Sensor Number: DCh. Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.

Sensor Cross Reference List (continued)
Each complete row below lists Details Section "Threshold-based Voltage Sensors" and Next Steps "Table 13. Threshold-based Voltage Sensors - Next Steps".

Sensor Number  Sensor Name
DDh  Baseboard +1.1V Stand-by (BB +1.1V STBY)
DEh  Baseboard CMOS Battery (BB +3.3V Vbat)
E4h  Baseboard +1.35V P1 Low Voltage Memory AB VDDQ (BB +1.35 P1LV AB)
E5h  Baseboard +1.35V P1 Low Voltage Memory CD VDDQ (BB +1.35 P1LV CD)
E6h  Baseboard +1.35V P2 Low Voltage Memory AB VDDQ (BB +1.35 P2LV AB)
E7h  Baseboard +1.35V P2 Low Voltage Memory CD VDDQ (BB +1.35 P2LV CD)
Baseboard +3.3V Riser 1 Pow
92. e 01h = Temperature
Byte 12, Sensor Number: See Table 40

Cooling Subsystem

Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 01h (Threshold)
Byte 14, Event Data 1:
  [7:6] 01b = Trigger reading in Event Data 2
  [5:4] 01b = Trigger threshold in Event Data 3
  [3:0] Event Triggers as described in Table 39
Byte 15, Event Data 2: Reading that triggered event
Byte 16, Event Data 3: Threshold value that triggered event

Table 39. Thermal Margin Sensors Event Triggers - Description
07h, Upper non-critical going high. Assertion severity: Degraded. Deassert severity: OK.
  The thermal margin has gone over its upper non-critical threshold.
09h, Upper critical going high. Assertion severity: non-fatal. Deassert severity: Degraded.
  The thermal margin has gone over its upper critical threshold.

Table 40. Thermal Margin Sensors - Next Steps
74h P1 Therm Margin, 75h P2 Therm Margin, 76h P3 Therm Margin, 77h P4 Therm Margin:
  Not a logged SEL event. Sensor is used for thermal management of the processor.
B0h P1 DIMM Thrm Mrgn1:
  1. Check for clear and unobstructed airflow into and out of the chassis.
B1h P1 DIMM Thrm
93. e Table 4

Bytes 4-7, Timestamp (TS): Time when event was logged. LS byte first.
  Example: TS:[29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC
  Note: There are various websites that will convert the raw number to a date/time.

Basic Decoding of a SEL Record

Bytes 8-9, Generator ID (GID): RqSA and LUN if event was generated from IPMB. Software ID if event was generated from system software.
  Byte 1:
    [7:1] 7-bit I2C Slave Address, or 7-bit system software ID
    [0] 0b = ID is IPMB Slave Address, 1b = System software ID
  Software ID values:
    0001h = BIOS POST for POST errors, RAS Configuration/State, Timestamp Synch, OS Boot events
    0033h = BIOS SMI Handler
    0020h = BMC Firmware
    002Ch = ME Firmware
    0041h = Server Management Software
    00C0h = HSC Firmware - HSBP A
    00C2h = HSC Firmware - HSBP B
  Byte 2:
    [7:4] Channel number. Channel that event message was received over. 0h if the event message was received from the system interface, primary IPMB, or internally generated by the BMC.
    [3:2] Reserved. Write as 00b.
    [1:0] IPMB device LUN if byte 1 holds Slave Address. 00b otherwise.
Byte 10, EvM Rev: Event Message format version. 04h = IPMI v2.0, 03h
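As a hedged illustration of the Generator ID layout above, the following sketch splits record bytes 8 and 9 into their bit fields and looks the 16-bit value up in the Software ID list. The class and function names are hypothetical; only the bit positions and the ID values come from the table.

```python
from dataclasses import dataclass

# 16-bit Generator ID values listed in the table (byte 9 is the high byte).
KNOWN_GENERATORS = {
    0x0001: "BIOS POST",
    0x0033: "BIOS SMI Handler",
    0x0020: "BMC Firmware",
    0x002C: "ME Firmware",
    0x0041: "Server Management Software",
    0x00C0: "HSC Firmware - HSBP A",
    0x00C2: "HSC Firmware - HSBP B",
}


@dataclass
class GeneratorId:
    raw: int              # 16-bit value, byte 9 in the high byte
    is_software_id: bool  # byte 1 bit [0]: True = system software ID
    seven_bit_id: int     # byte 1 bits [7:1]: slave address or software ID
    channel: int          # byte 2 bits [7:4]
    lun: int              # byte 2 bits [1:0]
    name: str             # entry from KNOWN_GENERATORS, or "unknown"


def decode_generator_id(byte8: int, byte9: int) -> GeneratorId:
    """Split SEL record bytes 8 and 9 into the Generator ID bit fields."""
    if not (0 <= byte8 <= 0xFF and 0 <= byte9 <= 0xFF):
        raise ValueError("each argument must be a single byte")
    raw = (byte9 << 8) | byte8
    return GeneratorId(
        raw=raw,
        is_software_id=bool(byte8 & 0x01),
        seven_bit_id=byte8 >> 1,
        channel=(byte9 >> 4) & 0x0F,
        lun=byte9 & 0x03,
        name=KNOWN_GENERATORS.get(raw, "unknown"),
    )


gid = decode_generator_id(0x33, 0x00)
print(gid.name, gid.is_software_id, hex(gid.seven_bit_id))
```

For `0020h` (BMC Firmware), bit 0 of byte 1 is clear, so the same function reports an IPMB slave address rather than a software ID.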
94. e error can be narrowed down to particular DIMM(s). The BIOS SMI error handler uses this information to log the data to the BMC SEL and identify the failing DIMM module.

Table 63. Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 0Ch = Memory
Byte 12, Sensor Number: 02h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1:
  [7:6] 10b = OEM code in Event Data 2
  [5:4] 10b = OEM code in Event Data 3
  [3:0] Event Trigger Offset as described in Table 64
Byte 15, Event Data 2:
  [7:2] Reserved. Set to 0.
  [1:0] Rank on DIMM: 0-3 = Rank number
Byte 16, Event Data 3:
  [7:5] Socket ID: 0-3 = CPU1-4
  [4:3] Channel: 0-3 = Chan A-D for Socket
  [2:0] DIMM: 0-2 = DIMM 1-3 on Channel

Table 64. Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset - Next Steps
01h, Uncorrectable ECC
  Description: An uncorrectable (multi-bit) ECC error has occurred. This
  Next Steps: 1. If needed, decode DIMM lo
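The Event Data 2 and Event Data 3 bit fields in Table 63 can be unpacked with shifts and masks. This is a minimal sketch under the field layout shown above; the function name and the returned dictionary keys are illustrative choices, and values outside the ranges the table documents are rejected rather than guessed at.

```python
def decode_dimm_location(event_data_2: int, event_data_3: int) -> dict:
    """Translate ED2/ED3 of a memory ECC event into CPU, channel, DIMM and rank."""
    rank = event_data_2 & 0x03           # ED2 [1:0]: rank on DIMM, 0-3
    socket = (event_data_3 >> 5) & 0x07  # ED3 [7:5]: 0-3 = CPU1-4
    channel = (event_data_3 >> 3) & 0x03 # ED3 [4:3]: 0-3 = channel A-D
    dimm = event_data_3 & 0x07           # ED3 [2:0]: 0-2 = DIMM 1-3
    if socket > 3 or dimm > 2:
        raise ValueError("socket or DIMM index outside the documented range")
    return {
        "cpu": socket + 1,
        "channel": "ABCD"[channel],
        "dimm": dimm + 1,
        "rank": rank,
    }


# Illustrative values: ED2 = 02h, ED3 = 29h (001 01 001b)
print(decode_dimm_location(0x02, 0x29))
# {'cpu': 2, 'channel': 'B', 'dimm': 2, 'rank': 2}
```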
95. e sensor is rearmed on power-on (AC or DC power-on transitions).

This sensor is only used for triggering SEL to indicate node or power auto shutdown assertion or deassertion.

Table 19. Node Auto Shutdown Sensor Typical Characteristics
Byte 11, Sensor Type: 09h = Power Unit
Byte 12, Sensor Number: B8h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 03h ("digital" discrete)
Byte 14, Event Data 1:
  [7:6] 00b = Unspecified Event Data 2
  [5:4] 00b = Unspecified Event Data 3
  [3:0] Event Trigger Offset: 1h = State Asserted
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

4.3.3.1 Node Auto Shutdown Sensor - Next Steps

This event is accompanied by specific power supply errors (AC lost, PSU failure, and so on) or other system events. Troubleshoot these events accordingly.

4.4 Power Supply

The BMC monitors the power supply subsystem.

4.4.1 Power Supply Status Sensors

These sensors report the status of the power supplies in the system. When AC is first applied to or removed from a system, an event can be logged. Also if there is a failure, predictive failure, or a configuration error,
96. econd time synch message to get a baseline "correct" timestamp in the log. That is the starting time.

For example, say that the time the BMC has is March 1, 2011 21:00. The BIOS time synch updates that to the same date, 21:20 (the BMC was running behind). Without that second time synch message, you don't know that the log time jumped ahead, and when you get the next log message it looks like there was a 20-minute delay during the boot for some unknown reason.

Without that second time synch message, the time span to the next logged message is indeterminate. With the second time synch as a baseline, the following log timestamps are always determinate.

System BIOS Events

The timestamp clock synchronization is run and the events are logged by the BIOS POST every time the system boots. In addition, during the shutdown from some Operating Systems the BIOS SMI Handler is called to run timestamp clock synchronization and log the events.

Table 70. System Event Sensor Typical Characteristics
Bytes 8-9, Generator ID: 0001h = BIOS POST, 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 12h = System Event
Byte 12, Sensor Number: 83h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 6Fh
97. ectable errors are acceptable and normal at a low rate of occurrence. If the error continues:

1. Check the processor is installed correctly.
2. Inspect the socket for bent pins.
3. Cross test the processor. If the issue remains with the processor socket, replace the main board, otherwise the processor.

6.5 Processor ERR2 Timeout Sensor

The BMC supports an ERR2 Timeout Sensor (1 per CPU) that asserts if a CPU's ERR2 signal has been asserted for longer than a fixed time period (> 90 seconds). ERR[2] is a processor signal that indicates when the IIO (Integrated IO module in the processor) has a fatal error which could not be communicated to the core to trigger SMI. ERR[2] events are fatal error conditions, where the BIOS and OS will attempt to gracefully handle the error, but may not always do so reliably. A continuously asserted ERR2 signal is an indication that the BIOS cannot service the condition that caused the error. This is usually because that condition prevents the BIOS from running.

When an ERR2 timeout occurs, the BMC asserts/deasserts the ERR2 Timeout Sensor, and logs a SEL event for that sensor. The default behavior for BMC core firmware is to initiate a system reset upon detection of an ERR2 timeout. The BIOS setup utility provides an option to disable or enable system reset by the BMC on detection of this condition.

Processor Subsystem
98. ed Event Data 3
  [3:0] Event Trigger Offset: 1h = State Asserted
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

4.4.5.1 Power Supply Fan Tachometer Sensors - Next Steps

These events only get generated in the systems with PMBus-capable power supplies and normally when the airflow is obstructed to the power supply.

1. Remove and then reinstall the power supply to see whether something might have temporarily caused the fan failure.
2. Swap the power supply with another one to see whether the problem stays with the location or follows the power supply.
3. Replace the power supply depending on the outcome of steps 1 and 2.
4. Ensure the latest FRUSDR update has been run and the correct chassis is detected or selected.

5. Cooling Subsystem

5.1 Fan Sensors

There are three types of fan sensors that can be present on Intel® Server Systems: speed, presence, and redundancy. The last two are only present in the systems with hot-swap redundant fans.

5.1.1 Fan Tachometer Sensors

Fan tachometer sensors monitor the RPM signal on the relevant fan headers on the platform. Fan speed sensors are threshold-based sensors. Usually they only have lower (critical) thresholds set, so that a SEL entry is only generated if the f
99. List of Tables (continued)
ME Firmware Health Event Sensor Typical Characteristics
ME Firmware Health Event Sensor - Next Steps
Node Manager Exception Sensor Typical Characteristics
Node Manager Health Event Sensor Typical Characteristics
Node Manager Operational Capabilities Change Sensor Typical Characteristics
Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
Boot up Event Record Typical Characteristics
Boot up OEM Event Record Typical Characteristics
Shutdown Reason Code Event Record Typical Characteristics
Shutdown Reason OEM Event Record Typical Characteristics
Shutdown Comment OEM Event Record Typical Characteristics
Bug Check/Blue Screen - OS Stop Event Record Typical Characteristics
Bug Check/Blue Screen code OEM Event Record Typical Characteristics
Linux Kernel Panic Event Record Characteristics
Linux Kernel Panic String Extended Record Characteristics
100. ensor. Next Steps: Table 48. Processor Status Sensors - Next Steps. (P4 Status)

Sensor Cross Reference List (continued)
74h  Processor 1 Thermal Margin (P1 Therm Margin). Details Section: Thermal Margin Sensors. Next Steps: Table 40. Thermal Margin Sensors - Next Steps.
75h  Processor 2 Thermal Margin (P2 Therm Margin). Details Section: Thermal Margin Sensors. Next Steps: Table 40. Thermal Margin Sensors - Next Steps.
76h  Processor 3 Thermal Margin (P3 Therm Margin). Details Section: Thermal Margin Sensors. Next Steps: Table 40. Thermal Margin Sensors - Next Steps.
77h  Processor 4 Thermal Margin (P4 Therm Margin). Details Section: Thermal Margin Sensors. Next Steps: Table 40. Thermal Margin Sensors - Next Steps.
78h-7Bh  Processor 1-4 Thermal Control % (P1-P4 Therm Ctrl %). Details Section: Processor Thermal Control Sensors. Next Steps: Processor Thermal Control % Sensors - Next Steps.
7Ch  Processor 1 ERR2 Timeout (P1 ERR2). Details Section: Processor ERR2 Timeout Sensor. Next Steps: Processor ERR2 Timeout - Next Steps.
7Dh  Processor 2 ERR2 Timeout (P2 ERR2). Details Section: Processor ERR2 Timeout Sensor. Next Steps: Processor ERR2 Timeout - Next Steps.
7Eh  Processor 3 ERR2 Timeout (P3 ERR2). Details Section: Processor ERR2 Timeout Sensor. Next Steps: Processor ERR2 Timeout - Next Steps.
7Fh  Processor 4 ERR2 Timeout. Details Section: Processor ERR2 Timeout Sensor. Next Steps: Pr
101. Table 75. FP (NMI) Interrupt Sensor Typical Characteristics
Byte 11, Sensor Type: 13h = Critical Interrupt
Byte 12, Sensor Number: 05h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1:
  [7:6] 00b = Unspecified Event Data 2
  [5:4] 00b = Unspecified Event Data 3
  [3:0] Event Trigger Offset = 0h
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

10.2.1 FP (NMI) Interrupt - Next Steps

The purpose of this button is for diagnosing software issues: when a critical interrupt is generated, the OS typically saves a memory dump. This allows for exact analysis of what is going on in system memory, which can be useful for software developers, or for troubleshooting OS, software, and driver issues.

If this button was not actually pressed, you should ensure there is no physical fault with the front panel.

This event only gets logged if a user pressed the NMI button or sent an IPMI Chassis Control command requesting this action, and although it causes the OS to crash, is not an error.

Chassis Subsystem
102. er Good (BB +3.3 RSR1 PGD)
  Sensor Number: EAh. Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.

Sensor Cross Reference List (continued)
EBh  Baseboard +3.3V Riser 2 Power Good (BB +3.3 RSR2 PGD). Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.
ECh  Baseboard +0.9V (BB 0.9V Core IB). Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.
EDh  Baseboard +1.8V (BB 1.8V IB I/O). Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.
EEh  Baseboard +1.1V (BB 1.1V PCH). Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.
EFh  Baseboard +1.2V (BB +1.2V IB). Details Section: Threshold-based Voltage Sensors. Next Steps: Table 13. Threshold-based Voltage Sensors - Next Steps.
F0h-FEh  Hard Disk Drive 0-14 Status (HDD 0 - 14 Status). Details Section: Hard Disk Drive Monitoring Sensor. Next Steps: Table 90. Hard Disk Drive Monitoring Sensor - Event Trigger Offset - Next Steps.

3.2 BIOS POST owned Sensors (GID = 0001h)

The following table can be used to find the details of sensors owned by BIOS POST.

Table 6. BIOS POST owned Sensors
Sensor Number  Sensor Name  Details Section  Next Steps
103. er power warning
03h, A/C lost
  Description: AC removed.
  Event Data 1 [7:6]: 00b = Unspecified Event Data 2
  Event Data 1 [5:4]: 00b = Unspecified Event Data 3
  Next Steps: Informational Event.
06h, Configuration error
  Description: Power supply configuration is not supported. Check the data in ED2 for more details.
  Event Data 1 [7:6]: 10b = OEM code in Event Data 2
  Event Data 1 [5:4]: 00b = Unspecified Event Data 3
  Event Data 2:
    01h = The BMC cannot access the PMBus device on the PSU but its FRU device is responding.
    02h = The PMBUS_REVISION command returns a version number that is not supported (only version 1.1 and 1.2 are supported).
    03h = The PMBus device does not successfully respond to the PMBUS_REVISION command.
    04h = The PSU is incompatible with one or more PSUs that are present in the system.
    05h = The PSU FW is operating in a degraded mode (likely due to a failed firmware update).
  Next Steps: Indicates that at least one of the supplies is not correct for your system configuration.
    1. Remove the power supply and verify compatibility.
    2. If the power supply is compatible, it may be faulty. Replace it.

Power Subsystems

4.4.2 Power Supply Power In Sensors

These sensors will log an event when a power supply in the system is exceeding its AC power in threshold.

Table 22. Power Supply Power In Sensors Typical Charact
104. eristics
Byte 11, Sensor Type: 0Bh = Other Units
Byte 12, Sensor Number: 54h = Power Supply 1 Status, 55h = Power Supply 2 Status
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 01h (Threshold)
Byte 14, Event Data 1:
  [7:6] 01b = Trigger reading in Event Data 2
  [5:4] 01b = Trigger threshold in Event Data 3
  [3:0] Event Trigger Offset as described in Table 23
Byte 15, Event Data 2: Reading that triggered event
Byte 16, Event Data 3: Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 23. Power Supply Power In Sensor - Event Trigger Offset - Next Steps
07h, Upper non-critical going high. Assertion severity: Degraded. Deassert severity: OK.
09h, Upper critical going high. Assertion severity: non-fatal. Deassert severity: Degraded.
  Description: PMBus feature to monitor power supply power consumption.
  Next Steps: If you see this event, the system is pulling too much power on the input for the PSU rating.
    1. Verify the power budget is within the specified range.
    2. Check http://www.intel.com/p/en_US/support/ for the power budget tool for your system.

Power Subsystems
105. ert Severity, Description, Next Steps (column headings; the Event Trigger Offset is given as Hex and Description)
07h, Upper non-critical going high. Assertion severity: Degraded. Deassert severity: OK.
09h, Upper critical going high. Assertion severity: non-fatal. Deassert severity: Degraded.
  Description: An upper non-critical or critical temperature threshold has been crossed.
  Next Steps:
    1. Check for clear and unobstructed airflow into and out of the chassis.
    2. Ensure SDR is programmed and correct chassis has been selected.
    3. Ensure there are no fan failures.
    4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

Power Subsystems

4.4.5 Power Supply Fan Tachometer Sensors

The BMC polls each installed power supply using the PMBus fan status commands to check for failure conditions for the power supply fans.

Table 28. Power Supply Fan Tachometer Sensors Typical Characteristics
Byte 11, Sensor Type: 04h = Fan
Byte 12, Sensor Number:
  A0h = Power Supply 1 Fan Tachometer 1
  A1h = Power Supply 1 Fan Tachometer 2
  A4h = Power Supply 2 Fan Tachometer 1
  A5h = Power Supply 2 Fan Tachometer 2
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 03h ("digital" Discrete)
Byte 14, Event Data 1:
  [7:6] 00b = Unspecified Event Data 2
  [5:4] 00b = Unspecifi
106. erver 2003 R2 and later versions, an Intelligent Platform Management Interface (IPMI) driver was added. This added the capability of logging some OS events to the SEL. The driver can write multiple records to the SEL for the following events:

  Boot up
  Shutdown
  Bug Check/Blue Screen

14.1 Boot up Event Records

When the system boots into the Microsoft Windows OS, two events can be logged. The first is a boot up record and the second is an OEM event. These are informational-only records.

Table 98. Boot up Event Record Typical Characteristics
Bytes 8-9, Generator ID: 0041h = System Software with an ID = 20h
Byte 11, Sensor Type: 1Fh = OS Boot
Byte 12, Sensor Number: 00h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1:
  [7:6] 00b = Unspecified Event Data 2
  [5:4] 00b = Unspecified Event Data 3
  [3:0] Event Trigger Offset: 1h = C: boot completed
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

Microsoft Windows Records

Table 99. Boot up OEM Event Record Typical Characteristics
Byte 1, Record ID: ID used for SEL
107. 10.3 Button Sensor

The BMC logs when the front panel power and reset buttons get pressed. This is purely for informational purposes and these events do not indicate errors.

Table 76. Button Sensor Typical Characteristics
Byte 11, Sensor Type: 14h = Button/Switch
Byte 12, Sensor Number: 09h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 6Fh (Sensor Specific)
Byte 14, Event Data 1:
  [7:6] 00b = Unspecified Event Data 2
  [5:4] 00b = Unspecified Event Data 3
  [3:0] Event Trigger Offset: 0h = Power Button, 2h = Reset Button
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

11. Miscellaneous Events

The miscellaneous events section addresses sensors not easily grouped with other sensor types.

11.1 IPMI Watchdog

EPSD server systems support an IPMI watchdog timer, which can check to see whether the OS is still responsive. The timer is disabled by default, and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that will reset the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or
108. ffset: Refer to the latest Intel® Xeon Phi™ Coprocessor Adapter specification.
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used

Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors - Next Steps

Refer to the latest Intel® Xeon Phi™ Coprocessor Adapter specification for the next steps.

12. Hot Swap Controller Backplane Events

All new EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600 Product Families backplanes follow a hybrid architecture, in which the IPMI functionality previously supported in the HSC is integrated into the BMC FW.

12.1 HSC Backplane Temperature Sensor

There is a thermal sensor on the Hot Swap Backplane to measure the ambient temperature.

Table 87. HSC Backplane Temperature Sensor Typical Characteristics
Byte 11, Sensor Type: 01h = Temperature
Byte 12, Sensor Number: 29h = HSBP 1 Temp, 2Ah = HSBP 2 Temp, 2Bh = HSBP 3 Temp
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 01h (Threshold)
Byte 14, Event Data 1:
  [7:6] 01b = Trigger reading in Event Data 2
  [5:4] 01b = Trigger threshold in Event Data 3
  [3:0] Event Trigger Offset as described in Table
109. flow/underflow
  Ch = Viral Error
  Dh = Protocol Layer parity error
  Eh = Routing Table Error
  Fh = (unused) Reserved
Byte 15, Event Data 2: 0-3 = CPU1-4
Byte 16, Event Data 3: Not used

The QPI Fatal Error #2 is a continuation of QPI Fatal Error.

Table 55. QPI Fatal #2 Error Sensor Typical Characteristics
Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 13h = Critical Interrupt
Byte 12, Sensor Number: 17h
Byte 13, Event Direction and Event Type:
  [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
  [6:0] Event Type = 74h (OEM Discrete)
Byte 14, Event Data 1:
  [7:6] 10b = OEM code in Event Data 2
  [5:4] 00b = Unspecified Event Data 3
  [3:0] Event Trigger Offset:
    0h = Illegal inbound request
    1h = IIO Write Cache Uncorrectable Data ECC Error
    2h = IIO CSR crossing 32-bit boundary Error
    3h = IIO Received XPF physical/logical redirect interrupt inbound
    4h = IIO Illegal SAD or Illegal or non-existent address or memory
    5h = IIO Write Cache Coherency Violation
Byte 15, Event Data 2: 0-3 = CPU1-4
Byte 16, Event Data 3: Not used

6.4.3.1 QPI Fatal Error and Fatal Error #2 - Next Steps

This is an Informational event only. Corr
110. g the problem, and follow the troubleshooting steps for these event types.
03h = Non-redundant, sufficient from redundant
04h = Non-redundant, sufficient from insufficient
05h = Non-redundant, insufficient
  The system has lost fans and may no longer be able to cool itself adequately. Overheating may occur if this situation remains for a longer period of time.
06h = Non-redundant, degraded from fully redundant
  The system has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.
07h = Redundant, degraded from non-redundant
  The system has lost one or more fans and is running in a degraded mode, but still is redundant. There are enough fans to keep the system properly cooled.

Cooling Subsystem

5.2 Temperature Sensors

There are a variety of temperature sensors that can be implemented on Intel® Server Systems. They are split into various types, each with their own events that can be logged:

  Threshold-based Temperature
  Thermal Margin
  Processor Thermal Control %
  Processor DTS Thermal Margin (Monitor only)
  Discrete Thermal
  DIMM Thermal Trip

5.2.1 Threshold-based Temperature Sensors

Threshold-based temperature sensors are sensors that
111. ge sources in the system, including the baseboard, memory, and processors, using IPMI-compliant analog threshold sensors. Some voltages are only on specific platforms. For details check your platform's Technical Product Specification (TPS).

Note: A voltage error can be caused by the device supplying the voltage or by the device using the voltage. For each sensor it will be noted who is supplying the voltage and who is using it.

Table 11. Threshold-based Voltage Sensors Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  02h = Voltage
  12    Sensor Number  See Table 13
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 01h (Threshold)
  14    Event Data 1
          [7:6] 01b = Trigger reading in Event Data 2
          [5:4] 01b = Trigger threshold in Event Data 3
          [3:0] Event Triggers as described in Table 12
  15    Event Data 2  Reading that triggered event
  16    Event Data 3  Threshold value that triggered event

The following table describes the severity of each of the event triggers for both assertion and deassertion.

Table 12. Threshold-based Voltage Sensors Event Triggers - Description

  Eve
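The Event Data 1 layout in Table 11 can be unpacked with a few shifts and masks. The following is a minimal sketch, not part of the guide: the function name and the dictionary keys are illustrative assumptions, and only the bit positions come from the table above.

```python
def decode_event_data1(ed1: int) -> dict:
    """Split an Event Data 1 byte into its three bit fields.

    Bit positions follow Table 11; the key names are hypothetical.
    """
    if not 0 <= ed1 <= 0xFF:
        raise ValueError("Event Data 1 must be a single byte")
    return {
        "ed2_code": (ed1 >> 6) & 0b11,  # bits [7:6]: what Event Data 2 holds
        "ed3_code": (ed1 >> 4) & 0b11,  # bits [5:4]: what Event Data 3 holds
        "offset": ed1 & 0x0F,           # bits [3:0]: event trigger offset
    }


# 52h = 0101 0010b: reading in ED2 (01b), threshold in ED3 (01b), offset 2h
fields = decode_event_data1(0x52)
```

For a threshold-based voltage sensor, `ed2_code` and `ed3_code` should both come back as 01b, which means Event Data 2 carries the trigger reading and Event Data 3 carries the threshold value.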
112. gine (Intel ME) is an IPMI satellite controller. A mechanism exists to forward commands to Intel ME and then send the response back to the originator. Similarly, events from Intel ME will be sent as alerts outside of the BMC.

2. Basic Decoding of a SEL Record

The System Event Log (SEL) record format is defined in the IPMI Specification. The following section provides a basic definition for each of the fields in a SEL. For more details see the IPMI Specification.

The definitions for the standard SEL can be found in Table 1.

The definitions for the OEM defined event logs can be found in Table 3 and Table 4.

2.1 Default Values in the SEL Records

Unless otherwise noted in the event record descriptions, the following are the default values in all SEL entries:
  - Byte [3] = Record Type (RT) = 02h = System event record
  - Byte [9:8] = Generator ID = 0020h = BMC Firmware
  - Byte [10] = Event Message Revision (ER) = 04h = IPMI 2.0

Table 1. SEL Record Format

  Byte  Field  Description
  1-2   Record ID (RID)  ID used for SEL Record access.
  3     Record Type (RT)
          [7:0] Record Type:
            02h = System event record
            C0h-DFh = OEM timestamped, bytes 8-16 OEM defined (See Table 3)
            E0h-FFh = OEM non-timestamped, bytes 4-16 OEM defined (Se
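The record layout and default values above can be sketched as a small parser for the 16-byte system event record (record type 02h). This is an illustrative sketch under stated assumptions, not a tool described by the guide: the class and function names are hypothetical, the 4-byte timestamp in bytes 4-7 and the least-significant-byte-first ordering of multi-byte fields are taken from the IPMI Specification rather than from the excerpt above, and the sample record is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelEvent:
    record_id: int      # bytes 1-2, least significant byte first
    record_type: int    # byte 3
    timestamp: int      # bytes 4-7 (per the IPMI Specification)
    generator_id: int   # bytes 8-9
    evm_rev: int        # byte 10
    sensor_type: int    # byte 11
    sensor_number: int  # byte 12
    deassertion: bool   # byte 13, bit 7
    event_type: int     # byte 13, bits 6:0
    event_data: tuple   # bytes 14-16 (Event Data 1..3)


def parse_sel_record(raw: bytes) -> SelEvent:
    """Decode one standard SEL record; OEM record types are not handled."""
    if len(raw) != 16:
        raise ValueError("a SEL record is 16 bytes long")
    if raw[2] != 0x02:
        raise ValueError("only record type 02h (system event) is handled here")
    return SelEvent(
        record_id=int.from_bytes(raw[0:2], "little"),
        record_type=raw[2],
        timestamp=int.from_bytes(raw[3:7], "little"),
        generator_id=int.from_bytes(raw[7:9], "little"),
        evm_rev=raw[9],
        sensor_type=raw[10],
        sensor_number=raw[11],
        deassertion=bool(raw[12] & 0x80),
        event_type=raw[12] & 0x7F,
        event_data=(raw[13], raw[14], raw[15]),
    )


# Made-up record carrying the default values: GID 0020h, ER 04h, RT 02h.
sample = bytes([0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x20, 0x00,
                0x04, 0x02, 0xD8, 0x01, 0x52, 0x7A, 0x80])
event = parse_sel_record(sample)
```

Keeping the three Event Data bytes as a raw tuple is deliberate: their meaning depends on the sensor type and event type, so they are best interpreted by a per-sensor decoder afterwards.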
113. h 20h   OS Stop Shutdown   Table 100  Shutdown Reason Code Event Record Typical Characteristics Not applicable  Shutdown Event GE E AEE Table 101  Shutdown Reason OEM Event Record Typical Characteristics We  SE    Table 102  Shutdown Comment OEM Event Record Typical Characteristics PP  8 Table 103  Bug Check Blue Screen     OS Stop Event Record Typical    02h 20h   OS Stop Shutdown Characteristics Not applicable  Bug Check Blue Screen  DEh Not applicable Table 104  Bug Check Blue Screen code OEM Event Record Typical  Characteristics                   3 6 Linux  Kernel Panic Events  GID   0021     The following table can be used to find the details of records that can be generated when there is a Linux  Kernel panic     Table 10  Linux  Kernel Panic Events             Sensor Name Record Sensor Type Details Section Next Steps  Type  02h 20h   OS Stop Shutdown   Table 105  Linux  Kernel Panic Event Record Characteristics Not applicable          Linux  Kernel Panic  FOh Not applicable Table 106  Linux  Kernel Panic String Extended Record Characteristics                   26 Intel order number G90620 002 Revision 1 1       System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1600 1 400 Product Families  Power Subsystems    4  Power Subsystems  The BMC monitors the power subsystem including power supplies  select onboard voltages  and related sensors     4 1 Threshold based Voltage Sensors    The BMC monitors the main volta
114. h Sensor     Next Steps        s nssssseesesseneessrerrsserrnnerrrreresrne 69   7  Memory Subsystem 52 8 cost cece ee ee ee 70  7 1 Memory RAS Configuration Status AC EEKENDEESEdEEEREUERSEEENE EEN EEN EAR 70  7 2 Memory RAS Mode Select             ccccccceceeeeeeceeeeeeeeeeeeeeeeeeaeeeeesaaaeeeeenaaeeeeseseeeseenaaes 72  7 3 Mirroring Redundancy State eeegedd erger 73  7 3 1 Mirroring Redundancy State Sensor     Next Steps         0    cccccccesssscceeeeeeeeeessesneaes 74   7 4 Sparing Redundancy State ics ccpsacteacedtandenedsedaenacdenankendvasdieassniene ens mane eeieinenees 74  7 4 1 Sparing Redundancy State Sensor     Next Greng    76   7 5 ECC anad Address Parity  eebe EE EE 76  7 5 1 Memory Correctable and Uncorrectable ECC Error               ccccceeeencceeeeeeeeeeseeeneaaes 76  7 5 2 Memory Address Parity Error                cceccseeececceeeeeeeeseeceeeeeeeeeeeeeseseneeeeeeeeeeeeneseeeeees 78   8  PCI Express  and Legacy PCI Subsystem            cccccseeeeeceeeeeeeeeeesenneeeeeeseeeeeenseeeeeeeeeeeeeees 81  8 1 PGI Express    Error Sereen nnp ee paea aaa aE a a eai AEAEE E 81  8 1 1 ele AS ee 81  8 1 2 PCI Express  Fatal Errors and Fatal Error Ai    82  8 1 3 PCI Express  Correctable te 84   9   System BIOS IE Vents ed 87  9 1 Ee A E E E tels seach een ees Meeedtea 87  9 1 1 System  Boot  aere 87  9 1 2 Timestamp Clock Synchronization 4  ssccicssosaeesedeencnaesceke avassieessvianereeet ends ARC ERENAEEN 87   9 2 System Firmware Progress  Formerly Post Error    89  
115. he issue remains  replace the board   3  If the issue remains  replace the power supplies    1 8V IB I O is supplied by the main board on specific platforms    1 8V IB I O is used by the on board Infiniband  controller on those specific platforms   Baseboard  1 8V  EDh 1  Ensure all cables are connected correctly    BB 1 8V IB I O  8    2  Ifthe issue remains  replace the board   3  Ifthe issue remains  replace the power supplies   This 1 1V line is supplied by the main board   EEh Baseboard  1 1V This 1 1V line is used by the Intel   C600 series Chipset    BB 1 1V PCH  1  Ensure all cables are connected correctly   2  Ifthe issue remains  replace the board    1 2V is supplied by the main board on specific platforms    1 2V is used by the on board Infiniband  controller on those specific platforms   Baseboard  1 2V  EFh 1  Ensure all cables are connected correctly    BB  1 2V IB       2  If the issue remains  replace the board   3  Ifthe issue remains  replace the power supplies           4 2 Voltage Regulator Watchdog Timer Sensor    The BMC FW monitors that the power sequence for the board VR controllers is completed when a DC power on is initiated   Incompletion of the sequence indicates a board problem  in which case the FW powers down the system     The sequence is as follows        BMC FW monitors the PowerSupplyPowerGood signal for assertion  indicating a DC power on has been initiated  and starts a  timer  VR Watchdog Timer   For EPSD Platforms Based on Intel   Xeon  
116. he products described in this document may contain design defects or errors known as errata which may cause the  product to deviate from published specifications  Current characterized errata are available on request     Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your  product order     Copies of documents which have an order number and are referenced in this document  or other Intel literature  may  be obtained by calling 1 800 548 4725  or go to  http   www intel com design literature     Revision 1 1 Intel order number G90620 002 ii    Table of Contents System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5  4600 2600 2400 1 600 1 400 Product Families    Table of Contents   SM AO GUG TON WEEN 1  1 1 Nie 1  1 2 Industry STAN IAG TE 2  1 2 1 Intelligent Platform Management Interface  IPMI            eee cere eeeeeentaeeeeeeeeneee 2  1 2 2 Baseboard Management Controller  DM     2  1 2 3 Intel   Intelligent Power Node Manager Version 30    3   2  Basic Decoding ot a SEL Record   iicccccaine ene ea 4  2 1 Default Values in the SEL Records biet deeg ne kainate 4  2 2 Notes on SEL Logs and Collecting SEL Information              ccccceeceeseeeeeeeeeeeeeeeees 10  2 2 1 Examples of Decoding BIOS Timestamp Events          s nsssosenneeeneeeserrsserrrnrrrreeeesnee 10  2 2 2 Example of Decoding a PCI Express  Correctable Error Events  11  2 2 3 Example of Decoding a Power S
117. he system and logs state changes. Expected power-on events such as DC ON/OFF are logged, and unexpected events are also logged, such as AC loss and power good loss.

Table 15. Power Unit Status Sensors Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  09h = Power Unit
  12    Sensor Number  01h
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 6Fh (Sensor Specific)
  14    Event Data 1
          [7:6] 00b = Unspecified Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Sensor Specific offset as described in Table 16
  15    Event Data 2  Not used
  16    Event Data 3  Not used

Table 16. Power Unit Status Sensor - Sensor Specific Offsets - Next Steps

  Sensor Specific Offset (Hex, Description)  Description  Next Steps
  00h  Power down
         System is powered down.
         Informational Event.
  02h  240 VA power down
         240 VA power limit was exceeded and the hardware forced a power down.
         This could have been caused by many things.
         1. If you recently added hardware, try removing it.
         2. Remove/replace any add-in adapters.
         3. Remove/replace the power supply.
         4. Remove/replace the processors, DIMM, and/or hard drives.
         5. Remove/replace the boards in 
118. i Coprocessor adapter provides an IPMI sensor that is read to get the temperature data. The BMC then instantiates its own version of this sensor, which is used for fan speed control.

The thermal margin sensor is the difference between the Core Temp sensor value and the TControl value reported by the Intel Xeon Phi Coprocessor adapter.

This sensor will not log events into the SEL.

11.9.2 Intel Xeon Phi Coprocessor (MIC) Status Sensors

Every time DC power is turned on, the BMC checks for Intel Xeon Phi Coprocessor adapters installed in the system. All compatible cards will be enabled for management. The status sensor is a direct copy of the status sensor reported by the Intel Xeon Phi Coprocessor adapter.

Table 86. MIC Status Sensors - Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  C0h = OEM defined
  12    Sensor Number
          A2h = MIC 1 Status
          A3h = MIC 2 Status
          A6h = MIC 3 Status
          A7h = MIC 4 Status
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 70h (OEM defined)
  14    Event Data 1
          [7:6] 00b = Unspecified Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Event Trigger O
119. ion and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 08h ("digital" discrete)
  14    Event Data 1
          [7:6] 00b = Unspecified Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Event Trigger Offset:
            0h = Device Removed/Device Absent
            1h = Device Inserted/Device Present
  15    Event Data 2  Not used
  16    Event Data 3  Not used

11.8.1 Add-In Module Presence - Next Steps

If an unexpected device is removed or inserted, ensure that the module has been seated properly.

11.9 Intel Xeon Phi Coprocessor Management Sensors

The Intel Xeon Processor E5-4600/2600/2400/1600 Product Families BMC supports limited manageability of the Intel Xeon Phi Coprocessor adapter as described in this section. The Intel Xeon Phi Coprocessor adapter uses the Many Integrated Core (MIC) architecture and the sensors are referred to as MIC sensors.

For each manageable Intel Xeon Phi Coprocessor adapter found in the system, the BMC automatically enables the associated thermal margin sensors (0xC4-0xC7) and status sensors (0xA2, 0xA3, 0xA6, 0xA7).

11.9.1 Intel Xeon Phi Coprocessor (MIC) Thermal Margin Sensors

The management controller FW of the Intel Xeon Ph
120. ion events.

Table 84. Firmware Update Status Sensor Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  2Bh (Version Change)
  12    Sensor Number  12h
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 70h (OEM defined)
  14    Event Data 1
          [7:6] 10b = OEM code in Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Event Trigger Offset:
            0h = Update started
            1h = Update completed successfully
            2h = Update failure
  15    Event Data 2
          Bits [7:4] Target of update:
            0000b = BMC
            0001b = BIOS
            0010b = ME
            All other values are reserved.
          Bits [3:1] Target instance (zero-based)
          Bits [0:0] Reserved
  16    Event Data 3  Not used

11.8 Add-In Module Presence Sensor

Some server boards provide dedicated slots for add-in modules/boards (for example, SAS, IO, and PCIe riser). For these boards the BMC provides an individual presence sensor to indicate whether the module/board is installed.

Table 85. Add-In Module Presence Sensor Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  15h = Module/Board
  12    Sensor Number
          0Eh = IO Module Presence
          0Fh = SAS Module Presence
          13h = IO Module2 Presence
  13    Event Direct
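The Firmware Update Status fields in Table 84 can be decoded as follows. This is a minimal sketch rather than a documented utility: the function and dictionary names are illustrative assumptions, while the offsets, target codes, and bit positions are the ones listed in the table.

```python
# Lookup tables transcribed from Table 84; the names are hypothetical.
FW_UPDATE_STATUS = {
    0x0: "Update started",
    0x1: "Update completed successfully",
    0x2: "Update failure",
}
FW_UPDATE_TARGET = {0b0000: "BMC", 0b0001: "BIOS", 0b0010: "ME"}


def decode_fw_update(ed1: int, ed2: int) -> dict:
    """Decode Event Data 1 and 2 of a Firmware Update Status event."""
    offset = ed1 & 0x0F          # Event Data 1 bits [3:0]
    target = (ed2 >> 4) & 0x0F   # Event Data 2 bits [7:4]
    instance = (ed2 >> 1) & 0x07 # Event Data 2 bits [3:1], zero-based
    return {
        "status": FW_UPDATE_STATUS.get(offset, "reserved"),
        "target": FW_UPDATE_TARGET.get(target, "reserved"),
        "instance": instance,
    }


# ED1 = 81h (OEM code in ED2, offset 1h), ED2 = 10h (target 0001b, instance 0)
result = decode_fw_update(0x81, 0x10)
```

Unknown offsets and targets are reported as "reserved" instead of raising an error, so a log containing values outside the table can still be listed in full.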
121. irmware and drivers.
  2h  CPU Core Error
        1. Cross test the processors.
        2. Replace the processors depending on the results of the test.
  3h  MSID Mismatch
        Verify the processor is supported by your baseboard. Check your board's Technical Product Specification (TPS).

6.3 CPU Missing Sensor

The CPU Missing sensor is a discrete sensor reporting that the processor is not installed. The most common cause of this event is a processor populated in the incorrect socket.

Table 51. CPU Missing Sensor Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  07h = Processor
  12    Sensor Number  82h
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 6Fh (Sensor Specific)
  14    Event Data 1
          [7:6] 00b = Unspecified Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Event Trigger Offset = 1h (State Asserted)
  15    Event Data 2  Not used
  16    Event Data 3  Not used

6.3.1 CPU Missing Sensor - Next Steps

Verify the processor is installed in the correct slot.

6.4 Quick Path Interconnect Sensors

The Intel Quick Path Interconnect (QPI) bus on Intel EPSD Boards Based on Intel Xeon Processor E5-4600/2600/2400/1600/1400 Prod
122. is being logged, it is because the BMC has been configured to check the watchdog timer.

  1. Make sure you have support for this in your OS (typically using a third-party IPMI-aware utility such as ipmitool or ipmiutil along with the OpenIPMI driver).
  2. If this is the case, it is likely your OS has hung, and you need to investigate OS event logs to determine what may have caused this.

11.2 SMI Timeout

SMI stands for system management interrupt. It is an interrupt that gets generated so the processor can service server management events (typically memory or PCI errors, or other forms of critical interrupts) in order to log them to the SEL. If this interrupt times out, the system is frozen. The BMC will reset the system after logging the event.

Table 79. SMI Timeout Sensor Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  F3h = SMI Timeout
  12    Sensor Number  06h
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 03h ("digital" Discrete)
  14    Event Data 1
          [7:6] 00b = Unspecified Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Event Trigger Offset = 1h (State Asserted)
  15    Event Data 2  Not used
  16    Event Data 3  Not used
123. istics             ccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeenaeeees 51    viii    Intel order number G90620 002 Revision 1 1    System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5    Revision 1 1    4600 2600 2400 1 600 1 400 Product Families List of Tables  Table 39  Thermal Margin Sensors Event Triggers     Description    52  Table 40  Thermal Margin Sensors     Next Steps  00        cccceeeeeceeeeeeeneeeeeeeeaeeeeeeeeeeeeeeneeeeeeneneeeees 52  Table 41  Processor Thermal Control Sensors Typical Charachertsics  eee 53  Table 42  Processor Thermal Control Sensors Event Triggers     Description    54  Table 43  Processor DTS Thermal Margin Sensors Typical Characteristics             ccceeeees 55  Table 44  Discrete Thermal Sensors Typical Characteristics               cceeceseeeeeeeeeeeeeeeeeeeeeeeneeees 56  Table 45  Discrete Thermal Sensors     Next Steps          c  ccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeneeeeeesenaeeees 56  Table 46  DIMM Thermal Trip Typical Characteristics             ccccccceeeeeeeeeeeceeeeeeeeeeeeeeeeneeeeeesenaeeees 57  Table 47  Process Status Sensors Typical Characteristics            ccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeneeeees 59  Table 48  Processor Status Sensors     Next Gieps nnt 60  Table 49  Catastrophic Error Sensor Typical Characteristics               cceccceeeeeeeeeeeeeeeeeeeeeeeseneeees 61  Table 50  Catastrophic Error Sensor     Event Data 2 Values     Next Steps              eeeeeeeeees 61  Table 
124. it can log an event     Table 20  Power Supply Status Sensors Typical Characteristics                Byte Field Description  11 Sensor Type 08h   Power Supply  12 Sensor Number 50h   Power Supply 1 Status  51h   Power Supply 2 Status  13 Event Direction and  7  Event direction  Event Type 0b   Assertion Event    1b   Deassertion Event                   38 Intel order number G90620 002 Revision 1 1    System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1600 1 400 Product Families                                                 Power Subsystems  Byte Field Description   6 0  Event Type   6Fh  Sensor Specific   14 Event Data 1  7 6      ED2 data in Table 21   5 4      ED3 data in Table 21   3 0      Sensor Specific offset as described in Table 21  15 Event Data 2 As described in Table 21  16 Event Data 3 As described in Table 21  Table 21  Power Supply Status Sensor   Sensor Specific Offsets   Next Steps  Sensor Specific Offset Description ED2 ED3 Next Steps  Hex Description  00h   Presence Power supply detected 00b   Unspecified Event Data 2 00b   Unspecified Event Data 3 Informational Event  Oih   Failure Power supply failed 10b   OEM code in Event Data 2 10b   OEM code in Event Data 3 Indicates a power supply  Check the data in ED2   01h     Output voltage fault failed   and ED3 for more details    a 92h     Output power fault Will have the contents of the 1  Remove and reapply             03h     Output over current
125. jor
  8133  Processor 04 disabled  Major
  8160  Processor 01 unable to apply microcode update  Major
  8161  Processor 02 unable to apply microcode update  Major
  8162  Processor 03 unable to apply microcode update  Major
  8163  Processor 04 unable to apply microcode update  Major
  8170  Processor 01 failed Self Test (BIST)  Major
  8171  Processor 02 failed Self Test (BIST)  Major
  8172  Processor 03 failed Self Test (BIST)  Major
  8173  Processor 04 failed Self Test (BIST)  Major
  8180  Processor 01 microcode update not found  Minor
  8181  Processor 02 microcode update not found  Minor
  8182  Processor 03 microcode update not found  Minor
  8183  Processor 04 microcode update not found  Minor

  Error Code  Error Message  Response
  8190  Watchdog timer failed on last boot  Major
  8198  OS boot watchdog timer failure  Major
  8300  Baseboard management controller failed self test  Major
  8305  Hot Swap Controller failure  Major
  83A0  Management Engine (ME) failed self test  Major
  83A1  Management Engine (ME) Failed to respond.  Major
  84F2  Baseboard management controller failed to respond  Major
  84F3  Baseboard management controller in update mode  Major
  84F4  Sensor data record empty  Major
126. jor
  8535  DIMM_H1 failed test/initialization  Major
  8536  DIMM_H2 failed test/initialization  Major
  8537  DIMM_H3 failed test/initialization  Major
  8538  DIMM_J1 failed test/initialization  Major
  8539  DIMM_J2 failed test/initialization  Major
  853A  DIMM_J3 failed test/initialization  Major
  853B  DIMM_K1 failed test/initialization  Major
  853C  DIMM_K2 failed test/initialization  Major
  853D  DIMM_K3 failed test/initialization  Major
  853E  DIMM_L1 failed test/initialization  Major
  853F  DIMM_L2 failed test/initialization  Major
        (Go to 85C0)
  8540  DIMM_A1 disabled  Major
  8541  DIMM_A2 disabled  Major
  8542  DIMM_A3 disabled  Major
  8543  DIMM_B1 disabled  Major
  8544  DIMM_B2 disabled  Major
  8545  DIMM_B3 disabled  Major
  8546  DIMM_C1 disabled  Major
  8547  DIMM_C2 disabled  Major
  8548  DIMM_C3 disabled  Major
  8549  DIMM_D1 disabled  Major
  854A  DIMM_D2 disabled  Major
  854B  DIMM_D3 disabled  Major
  854C  DIMM_E1 disabled  Major
  854D  DIMM_E2 disabled  Major
  854E  DIMM_E3 disabled  Major
  854F  DIMM_F1 disabled  Major
  8550  DIMM_F2 disabled  Major
  8551  DIMM_F3 disabled  Major
  8552  DIMM_G1 disabled  Major

  Error Code  Error Message  Response
  8553  DIMM_G2 disabled  Major
  8554  DIMM_G3 disabled  M
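The DIMM error codes listed above follow a regular pattern: within each block the code advances by one per DIMM slot, three slots per channel, with channel letter I skipped. The sketch below relies on that pattern and is an inference from the rows shown, not a rule stated by the guide. In particular, the block start 8520h for the failed test/initialization codes and the extent of each block (20h codes, A1 through L2) are extrapolated assumptions, and the function name is hypothetical.

```python
# Channel letters as they appear in the listed rows (no "I").
CHANNELS = "ABCDEFGHJKL"

# Block base -> event text. 0x8540 is listed above; 0x8520 is inferred
# from 8535h = DIMM_H1 and is an assumption.
BLOCKS = {
    0x8520: "failed test/initialization",
    0x8540: "disabled",
}


def decode_dimm_post_code(code: int) -> tuple:
    """Map a POST error code to (DIMM name, event) using the inferred pattern."""
    for base, event in BLOCKS.items():
        index = code - base
        if 0 <= index < 0x20:  # assumed block size: A1 .. L2
            channel = CHANNELS[index // 3]
            slot = index % 3 + 1
            return (f"DIMM_{channel}{slot}", event)
    raise ValueError(f"code {code:04X}h is outside the blocks sketched here")


print(decode_dimm_post_code(0x8552))
```

Codes outside these two blocks, such as the continuation at 85C0h that the table points to, raise an error rather than being guessed.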
127. ket  replace the main board   otherwise the processor   This 1 5V line is supplied by the main board   This 1 5V line is used by processor 1 memory slots A and B   Baseboard  1 5V P1 Memory AB YP y  D8h VDDQ 1  Ensure all cables are connected correctly    BB  1 5 P1MEM AB  2  Check the DIMMs are seated properly   3  Cross test the DIMMs  If the issue remains with the DIMMs on this socket  replace the main  board  otherwise the DIMM   This 1 5V line is supplied by the main board   This 1 5V line is used by processor 1 memory slots C and D   Baseboard  1 5V P1 Memory CD YP y  D9h VDDQ 1  Ensure all cables are connected correctly    BB  1 5 P1MEM CD  2  Check the DIMMs are seated properly   3  Cross test the DIMMs  If the issue remains with the DIMMs on this socket  replace the main  board  otherwise the DIMM   30 Intel order number G90620 002 Revision 1 1    System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5 4600 2600 2400 1 600 1 400 Product Families  Power Subsystems       Sensor Sensor Name Next Steps  Number       This 1 5V line is supplied by the main board   This 1 5V line is used by processor 2 memory slots A and B   Baseboard  1 5V P2 Memory AB YP y  DA VDDQ 1  Ensure all cables are connected correctly    BB  1 5 P2MEM AB  2  Check the DIMMs are seated properly     3  Cross test the DIMMs  If the issue remains with the DIMMs on this socket  replace the main  board  otherwise the DIMM        This 1 5V line is supplied by the 
128. l Capabilities Change     Next Giepns  eee 121  13 5 Node Manger Alert Threshold Exceeded                eccseeeenccceeeeeeeeeeeeeeeeeeeeeeeneeeeenens 122    vi Intel order number G90620 002 Revision 1 1    System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5    4600 2600 2400 1600 1 400 Product Families Table of Contents  13 5 1 Node Manger Alert Threshold Exceeded     Next Steps             ceeeeeeeeeeeeeeeeeenees 123   14  Microsoft Windows  Records  iveccccccccescescsieveccccecacenvadeescceucvoceaussaseedenCoadeSvavesseverccccsasavess 124  14 1 Boot up Event RECORAS ET 124  14 2 Shutdown  Event  RECOrd Sainn a Hae eee Ree  126  14 3 Bug Check   Blue Screen Event RecordS Abu 128  15  Linux  Kernel Panic RecordS          saannnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnmnnn nnmnnn nnna 130    Revision 1 1 Intel order number G90620 002 vii    List of Tables   System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel    Xeon    Processor E5    4600 2600 2400 1600 1 400 Product Families    List of Tables    Tabl   1  SEL Rec  rd  Format EE 4  Table 2  Event Request Message Event Data Field Content  7  Table 3  OEM SEL Record  Type COh DFh  EEN 8  Table 4  OEM SEL Record  Type Ee EE  geigeferete  egerieug iiteete ch andaneett eeeeteanninns 9  Table 5  BMC  Opere SeOnSOLs siissssicadovecdevsisstcaketeinscndaninienrmsacereadswaukecdtautunasdeveneuertiniaabshaauaaghanioes 13  Table 6  BIOS POST owned Sensors  
129. main board   This 1 5V line is used by processor 2 memory slots C and D   Baseboard  1 5V P2 Memory CD YP y  DBh VDDQ 1  Ensure all cables are connected correctly    BB  1 5 P2MEM CD  2  Check the DIMMs are seated properly     3  Cross test the DIMMs  If the issue remains with the DIMMs on this socket  replace the main  board  otherwise the DIMM         1 8V AUX is supplied by the main board   1 8V AUX is used by the BMC and on board NIC   Baseboard  1 8V Aux S y  DCh 1  Ensure all cables are connected correctly    BB  1 8V AUX    S  2  If the issue remains  replace the board   3  Ifthe issue remains  replace the power supplies      1 1V STBY is supplied by the main board    1 1V STBY is used by the Intel   C600 series Chipset        Baseboard  1 1V Stand by          DDh 1  Ensure all cables are connected correctly    BB  1 1V STBY  j    2  Ifthe issue remains  replace the board   3  Ifthe issue remains  replace the power supplies    3 3V Vbat is supplied by the CMOS battery when power is off and by the main board when power is on   DEh Baseboard CMOS Battery  3 3V Vbat is used by the CMOS and related circuits    BB  3 3V Vbat  1  Replace the CMOS battery  Any battery of type CR2032 can be used   2  If error remains  unlikely   replace the board   This 1 35V line is supplied by the main board   This 1 35V line is used by processor 1 memory slots A and B   Baseboard  1 35V P1 Low Voltage as 4  E4h Memory AB VDDQ 1  Ensure all cables are connected correctly      BB  1 35 P1LV A
130. n headers for each  chassis and will log this event if there is no fan on that header     1  Refer to the Quick Start Guide or the Service Guide to identify  the correct fan headers to use    2  Ensure the latest FRUSDR update has been run and the correct  chassis is detected or selected    3  If you are sure this was done  the event may be a sign of  impending fan failure  although this only normally applies if the  system has been in use for a while   Replace the fan    02h   Lower critical non fatal Degraded The fan speed has dropped  going low below its lower critical  threshold   5 1 2 Fan Presence and Redundancy Sensors    Fan presence sensors are only implemented for hot swap fans  and require an additional pin on the fan header  Fan redundancy is  an aggregate of the fan presence sensors and will warn when redundancy is lost  Typically the redundancy mode on Intel   servers is  an n 1 redundancy  if one fan fails there are still sufficient fans to cool the system  but it is no longer redundant  although other  modes are also possible     46    Table 31  Fan Presence Sensors Typical Characteristics                   Byte Field Description  11 Sensor Type 04h   Fan   12 Sensor Number 40h 4Fh  Chassis specific    13 Event Direction and  7  Event direction       Event Type       Ob   Assertion Event  1b   Deassertion Event   6 0  Event Type   08h  Generic    digital    Discrete           Intel order number G90620 002 Revision 1 1       System Event Log Troubleshooting Gui
131. n is the BIOS timestamp synchronization event log. This event can be logged by the BIOS during POST, or it can be logged by the BIOS SMI Handler when a system is requested to do a shutdown or a restart from the operating system (OS). See section 2.2.1 for examples. Most utilities report this as just a BIOS event and do not differentiate between the two. Telling them apart is sometimes useful because it shows the sequence of events more clearly. For example, if there are multiple sequences of the timestamp synchronization events: was the power lost after booting to the OS and then the system restarted, was it multiple POST events, or was it a restart from the OS?

An example of not decoding all the information is with the PCI Express errors and some of the Power Supply events. For the PCI Express errors, the type of error and the PCI Bus, Device, and Function are all a part of Event Data 1 through Event Data 3. See section 2.2.2. For the Power Supply events, when there is a failure, predictive failure, or a configuration error, Event Data 2 and Event Data 3 hold additional information that describes the Power Supply's PMBus Command Registers and values for that particular event. See section 2.2.3.

2.2.1 Examples of Decoding BIOS Timestamp Events

The following are some samples of BIOS timestamp events during POST and during an OS shutdown.

2.2.1.1 BIOS POST Timestamp Events

RID 19 01  RT 02  TS 57 49 6A 4E  GID 01 00  ER 04  ST 12  SN 83  EDIR 6F  ED1 05  ED2 00  ED3 
132. nk is lost.

10.1 Physical Security

Two sensors are included in the physical security subsystem: chassis intrusion and LAN leash lost.

10.1.1 Chassis Intrusion

Chassis Intrusion is monitored on supported chassis, and the BMC logs corresponding events when the chassis lid is opened and closed.

10.1.2 LAN Leash Lost

The LAN Leash lost sensor monitors the physical connection on the on-board network ports. If a LAN Leash lost event is logged, this means the network port lost its physical connection.

Table 73. Physical Security Sensor Typical Characteristics

  Byte  Field  Description
  11    Sensor Type  05h = Physical Security
  12    Sensor Number  04h
  13    Event Direction and Event Type
          [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
          [6:0] Event Type = 6Fh (Sensor Specific)
  14    Event Data 1
          [7:6] 00b = Unspecified Event Data 2
          [5:4] 00b = Unspecified Event Data 3
          [3:0] Event Trigger Offset as described in Table 74
  15    Event Data 2  Not used
  16    Event Data 3  Not used

Table 74. Physical Security Sensor Event Trigger Offset - Next Steps

  Event Trigger Offset (Hex, Description)  Description  Next Steps
  Somebody has opened the chassis  or the 
133. nsor Number | Sensor Name | Details Section | Next Steps
0Ah | BMC Watchdog (BMC Watchdog) | BMC Watchdog Sensor | BMC Watchdog Sensor - Next Steps
0Bh | Voltage Regulator Watchdog (VR Watchdog) | Voltage Regulator Watchdog Timer Sensor | Voltage Regulator Watchdog Timer Sensor - Next Steps
0Ch | Fan Redundancy (Fan Redundancy) | Fan Presence and Redundancy Sensors | Table 34. Fan Redundancy Sensor - Event Trigger Offset - Next Steps
0Dh | SSB Thermal Trip (SSB Thermal Trip) | Discrete Thermal Sensors | Table 45. Discrete Thermal Sensors - Next Steps
0Eh | IO Module Presence (IO Mod Presence) | Add-In Module Presence Sensor | Add-In Module Presence - Next Steps
0Fh | SAS Module Presence (SAS Mod Presence) | Add-In Module Presence Sensor | Add-In Module Presence - Next Steps
10h | BMC Firmware Health (BMC FW Health) | BMC FW Health Sensor | BMC FW Health Sensor - Next Steps
11h | System Airflow (System Airflow) | System Air Flow Monitoring Sensor | Not applicable
12h | Firmware Update Status (FW Update Status) | Firmware Update Status Sensor | Not applicable
13h | IO Module2 Presence (IO Mod2 Presence) | Add-In Module Presence Sensor | Add-In Module Presence - Next Steps
14h | Baseboard Temperature 5 (Platform Specific) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
15h | Baseboard Temperature 6 (Platform Specific) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
134. nsors.

Table 43. Processor DTS Thermal Margin Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | 83h = Processor 1 DTS Thermal Margin; 84h = Processor 2 DTS Thermal Margin; 85h = Processor 3 DTS Thermal Margin; 86h = Processor 4 DTS Thermal Margin
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 01h (Threshold)

5.2.5 Discrete Thermal Sensors
Discrete thermal sensors do not report a temperature at all; instead they report an overheating event of some kind. Examples are VRD Hot (a voltage regulator is overheating) and processor Thermal Trip (the processor got so hot that its over-temperature protection was triggered and the system was shut down to prevent damage).

Table 44. Discrete Thermal Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | See Table 45
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = See Table 45
14 | Event Data 1 | [7:6] 00b = Unspecified Event 
135. nt. Next steps depend on the policy that was set. See the Node Manager Specification for more details.

13.3 Node Manager Health Event
A Node Manager Health Event message provides a runtime error indication about the health of Intel Intelligent Power Node Manager. The types of service that can send an error are: misconfigured policy, error reading power data, and error reading inlet temperature.

Table 95. Node Manager Health Event Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 002Ch or 602Ch = ME Firmware
11 | Sensor Type | DCh = OEM
12 | Sensor Number | 19h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 73h (OEM)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Health Event Type = 02h (Sensor Node Manager)
15 | Event Data 2 | [7:4] Error type: 0-9 = Reserved; 10 = Policy Misconfiguration; 11 = Power Sensor Reading Failure; 12 = Inlet Temperature Reading Failure; 13 = Host Communication error; 14 = Real-time clock synchronization failure; 15 = Platform shutdown initiated by NM policy due to execution of action defined by Policy 
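The error type in Event Data 2 bits [7:4] of Table 95 can be looked up with a small mapping. The Python below is only a sketch: the name `nm_error_type` is hypothetical, and the description for code 15 is shortened because the source text is cut off at that point.

```python
# Error types from Event Data 2 [7:4]; codes 0-9 are reserved.
NM_ERROR_TYPES = {
    10: "Policy Misconfiguration",
    11: "Power Sensor Reading Failure",
    12: "Inlet Temperature Reading Failure",
    13: "Host Communication error",
    14: "Real-time clock synchronization failure",
    15: "Platform shutdown initiated by NM policy",  # shortened; source is truncated
}

def nm_error_type(event_data2):
    """Return the error type named by the upper nibble of Event Data 2."""
    code = (event_data2 >> 4) & 0x0F
    return NM_ERROR_TYPES.get(code, "Reserved")
```

The lower nibble of Event Data 2 is ignored here because the excerpt does not describe it.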
136. nt Trigger (Hex, Description) | Assertion Severity | Deassert Severity | Description
00h Lower non-critical going low | Degraded | OK | The voltage has dropped below its lower non-critical threshold.
02h Lower critical going low | Non-fatal | Degraded | The voltage has dropped below its lower critical threshold.
07h Upper non-critical going high | Degraded | OK | The voltage has gone over its upper non-critical threshold.
09h Upper critical going high | Non-fatal | Degraded | The voltage has gone over its upper critical threshold.

Table 13. Threshold-based Voltage Sensors - Next Steps
Sensor Number | Sensor Name | Next Steps
19h | Baseboard +1.05V Processor3 Vccp (BB +1.05Vccp P3) | This 1.05V line is supplied by the main board and is used by processor 3. 1. Ensure all cables are connected correctly. 2. Check that the processor is seated properly. 3. Cross test the processors. If the issue remains with the processor socket, replace the main board; otherwise replace the processor.
1Ah | Baseboard +1.05V Processor4 Vccp (BB +1.05Vccp P4) | This 1.05V line is supplied by the main board and is used by processor 4. 1. Ensure all cables are connected correctly. 2. Check that the processor is seated properly. 3. Cross test the processors. If the issue remains with the processor socket, replace the main board; otherwise replace the processor.
137. nt Trigger Offset | Processor Status | Next Steps
0h | Internal error (IERR) | 1. Cross test the processors. 2. Replace the processors depending on the results of the test.
1h | Thermal trip | This event normally only happens due to failures of the thermal solution. 1. Verify the heatsink is properly attached and has thermal grease. 2. If the system has a heatsink fan, ensure the fan is spinning. 3. Check that all system fans are operating properly. 4. Check that the air used to cool the system is within limits (typically below 35 °C).
2h | FRB1 BIST failure
3h | FRB2 Hang in POST failure
4h | FRB3 Processor startup initialization failure (CPU fails to start)
5h | Configuration error (for DMI)
6h | SM BIOS uncorrectable CPU complex error
Next steps for 2h through 6h: 1. Cross test the processors. 2. Replace the processors depending on the results of the test.
7h | Processor presence detected | Informational Event.
8h | Processor disabled
9h | Terminator presence detected
Next steps for 8h and 9h: 1. Cross test the processors. 2. Replace the processors depending on the results of the test.

6.2 Catastrophic Error Sensor
When the Catastrophic Error signal (CATERR#) stays asserted, it is a sign that something serious has gone wrong in the hardware. The BMC monitors this signal and reports when it stays asserted.

Table 49
138. o the action so you need to investigate the SEL and PEF settings to identify this event, and troubleshoot accordingly.

11.5 BMC Watchdog Sensor
The BMC supports an IPMI sensor to report that a BMC reset has occurred due to an action taken by the BMC Watchdog feature. A SEL event will be logged whenever either the BMC FW stack is reset or the BMC CPU itself is reset.

Table 82. BMC Watchdog Sensor Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 28h = Management Subsystem Health
12 | Sensor Number | 0Ah
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 03h ("digital" Discrete)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

11.5.1 BMC Watchdog Sensor - Next Steps
A SEL event will be logged whenever either the BMC FW stack is reset or the BMC CPU itself is reset.
1. Check the SEL for any other events around the time of the failure.
Take note of all IPMI activity that was occurring around the time of the failure. Capture a System BMC Debug Log as soon as you can afte
139. ocessor ERR2 Timeout - Next Steps (P4 ERR2)
80h | Catastrophic Error (CATERR) | Catastrophic Error Sensor | Table 50. Catastrophic Error Sensor - Event Data 2 Values - Next Steps
81h | Processor 1 MSID Mismatch (P1 MSID Mismatch) | Processor MSID Mismatch Sensor | Processor MSID Mismatch Sensor - Next Steps
82h | Processor Population Fault (CPU Missing) | CPU Missing Sensor | CPU Missing Sensor - Next Steps
83h-86h | Processor 1-4 DTS Thermal Margin (P1-P4 DTS Therm Mgn) | Processor DTS Thermal Margin Sensors | Not applicable
87h | Processor 2 MSID Mismatch (P2 MSID Mismatch) | Processor MSID Mismatch Sensor | Processor MSID Mismatch Sensor - Next Steps
88h | Processor 3 MSID Mismatch (P3 MSID Mismatch) | Processor MSID Mismatch Sensor | Processor MSID Mismatch Sensor - Next Steps
89h | Processor 4 MSID Mismatch (P4 MSID Mismatch) | Processor MSID Mismatch Sensor | Processor MSID Mismatch Sensor - Next Steps
90h | Processor 1 VRD Temp (P1 VRD Hot) | Discrete Thermal Sensors | Table 45. Discrete Thermal Sensors - Next Steps
91h | Processor 2 VRD Temp (P2 VRD Hot) | Discrete Thermal Sensors | Table 45. Discrete Thermal Sensors - Next Steps
92h | Processor 3 VRD Temp (P3 VRD Hot) | Discrete Thermal Sensors | Table 45. Discrete Thermal Sensors - Next Steps
93h | Processor 4 VRD Temp | Discrete Thermal Sensors | Table 45. Discrete Thermal Sensors - Next
140. ode Manager Version 2.0

Intel Intelligent Power Node Manager Version 2.0 (NM) is a platform-resident technology that enforces power and thermal policies for the platform. These policies are applied by exploiting subsystem knobs (such as processor P and T states) that can be used to control power consumption. Intel Intelligent Power Node Manager enables data center power and thermal management by exposing an external interface to management software through which platform policies can be specified. It also enables specific data center power management usage models such as power limiting.

The configuration and control commands are used by the external management software or BMC to configure and control the Intel Intelligent Power Node Manager feature. Because Platform Services firmware does not have any external interface, external commands are first received by the BMC over LAN and then relayed to the Platform Services firmware over the IPMB channel. The BMC acts as a relay and the transport conversion device for these commands. For simplicity, the commands from the management console might be encapsulated in a generic CONFIG packet format (configuration data length, configuration data blob) to the BMC so that the BMC doesn't even have to parse the actual configuration data.

The BMC provides the access point for remote commands from external management SW and generates alerts to them. Intel Intelligent Power Node Manager on Intel Manageability En
141. on (SPD) failure | Major
8570 | DIMM_F2 encountered a Serial Presence Detection (SPD) failure | Major
8571 | DIMM_F3 encountered a Serial Presence Detection (SPD) failure | Major
Error Code | Error Message | Response
8572 | DIMM_G1 encountered a Serial Presence Detection (SPD) failure | Major
8573 | DIMM_G2 encountered a Serial Presence Detection (SPD) failure | Major
8574 | DIMM_G3 encountered a Serial Presence Detection (SPD) failure | Major
8575 | DIMM_H1 encountered a Serial Presence Detection (SPD) failure | Major
8576 | DIMM_H2 encountered a Serial Presence Detection (SPD) failure | Major
8577 | DIMM_H3 encountered a Serial Presence Detection (SPD) failure | Major
8578 | DIMM_J1 encountered a Serial Presence Detection (SPD) failure | Major
8579 | DIMM_J2 encountered a Serial Presence Detection (SPD) failure | Major
857A | DIMM_J3 encountered a Serial Presence Detection (SPD) failure | Major
857B | DIMM_K1 encountered a Serial Presence Detection (SPD) failure | Major
857C | DIMM_K2 encountered a Serial Presence Detection (SPD) failure | Major
857D | DIMM_K3 encountered a Serial Presence Detection (SPD) failure | Major
857E | DIMM_L1 encountered a Serial Presence Detection (SPD) f
142. Table 56. Processor ERR2 Timeout Sensor Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 07h = Processor
12 | Sensor Number | 7Ch = Processor 1 ERR2 Timeout; 7Dh = Processor 2 ERR2 Timeout; 7Eh = Processor 3 ERR2 Timeout; 7Fh = Processor 4 ERR2 Timeout
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 03h ("digital" discrete)
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = 1h (State Asserted)
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

6.5.1 Processor ERR2 Timeout - Next Steps
1. Check the SEL for any other events around the time of the failure.
Take note of all IPMI activity that was occurring around the time of the failure. Capture a System BMC Debug Log as soon as you can after experiencing this failure. This log can be captured from the Integrated BMC Web Console or by using the Intel Syscfg utility (syscfg /sbmcdl private filename.zip). Send the log file to your system manufacturer or Intel representative for failure analysis.

6.6 Processor MSID Mismatch Sensor
The BMC supports an MSID Mismatch sensor that monitors for the fault condition that occurs if there is a power rating incompatibility between a baseboard and a processor.

The
143. oss test the processor. If the issue remains with the processor socket, replace the main board; otherwise replace the processor.

6.4.2 QPI Correctable Error Sensor
The system detected an error and corrected it. This is an informational event.

Table 53. QPI Correctable Error Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 0033h = BIOS SMI Handler
11 | Sensor Type | 13h = Critical Interrupt
12 | Sensor Number | 06h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 72h (OEM Discrete)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset = Reserved
15 | Event Data 2 | 0-3 = CPU1-4
16 | Event Data 3 | Not used

6.4.2.1 QPI Correctable Error Sensor - Next Steps
This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
1. Check that the processor is installed correctly.
2. Inspect the socket for bent pins.
3. Cross test the processor. If the issue remains with the processor socket, replace the main board; otherwise replace the processor.

6.4.3 QPI 
144. Table 79. SMI Timeout Sensor Typical Characteristics
Table 80. System Event Log Cleared Sensor Typical Characteristics
Table 81. System Event - PEF Action Sensor Typical Characteristics
Table 82. BMC Watchdog Sensor Typical Characteristics
Table 83. BMC FW Health Sensor Typical Characteristics
Table 84. Firmware Update Status Sensor Typical Characteristics
Table 85. Add-In Module Presence Sensor Typical Characteristics
Table 86. MIC Status Sensors - Typical Characteristics
Table 87. HSC Backplane Temperature Sensor Typical Characteristics
Table 88. HSC Backplane Temperature Sensor - Event Trigger Offset - Next Steps
Table 89. Hard Disk Drive Monitoring Sensor Typical Characteristics
Table 90. Hard Disk Drive Monitoring Sensor - Event Trigger Offset - Next Steps
Table 91. HSC Health Sensor Typical Characteristics
145. r experiencing this failure. This log can be captured from the Integrated BMC Web Console or by using the Intel Syscfg utility (syscfg /sbmcdl private filename.zip). Send the log file to your system manufacturer or Intel representative for failure analysis.

11.6 BMC FW Health Sensor
The BMC tracks the health of each of its IPMI sensors and reports failures by providing a "BMC FW Health" sensor of the IPMI 2.0 sensor type Management Subsystem Health with support for the Sensor Failure offset. Only assertions will be logged into the SEL for the Sensor Failure offset. The BMC Firmware Health sensor asserts for any sensor when 10 consecutive sensor errors are read. These are not standard sensor events (that is, threshold crossings or discrete assertions). These are BMC Hardware Access Layer (HAL) errors such as I2C NAKs or internal errors while attempting to read a register. If a successful sensor read is completed, the counter resets to zero.

Table 83. BMC FW Health Sensor Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 28h = Management Subsystem Health
12 | Sensor Number | 10h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 6Fh 
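The counting rule described for the BMC FW Health sensor (assert after 10 consecutive read errors, reset on any successful read) can be sketched as follows. This is only an illustration of the rule as stated in the text, not the BMC firmware's implementation; the class name is hypothetical, and firing exactly once when the limit is reached is an assumption.

```python
class ConsecutiveErrorCounter:
    """Tracks consecutive read errors for one sensor (illustrative only)."""

    def __init__(self, limit=10):
        self.limit = limit   # 10 consecutive errors, per the text above
        self.count = 0

    def record(self, read_ok):
        """Record one read attempt; return True when the limit is just reached."""
        if read_ok:
            self.count = 0   # a successful read resets the counter to zero
            return False
        self.count += 1
        # Assumption: the assertion fires once, on the read that reaches the limit.
        return self.count == self.limit
```

With this rule, nine errors followed by one successful read never produce an assertion, which is why intermittent read failures may not appear in the SEL.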
146. r Supply Fan Tachometer Sensors - Next Steps
A0h | Power Supply 1 Fan Tachometer 1 (PS1 Fan Tach 1) | Power Supply Fan Tachometer Sensors | Power Supply Fan Tachometer Sensors - Next Steps
A1h | Power Supply 1 Fan Tachometer 2 (PS1 Fan Tach 2) | Power Supply Fan Tachometer Sensors | Power Supply Fan Tachometer Sensors - Next Steps
A2h | Intel Xeon Phi Coprocessor Status 1 (MIC 1 Status) | Intel Xeon Phi Coprocessor (MIC) Status Sensors | Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
A3h | Intel Xeon Phi Coprocessor Status 2 (MIC 2 Status) | Intel Xeon Phi Coprocessor (MIC) Status Sensors | Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
A4h | Power Supply 2 Fan Tachometer 1 (PS2 Fan Tach 1) | Power Supply Fan Tachometer Sensors | Power Supply Fan Tachometer Sensors - Next Steps
Sensor Number | Sensor Name | Details Section | Next Steps
A5h | Power Supply 2 Fan Tachometer 2 (PS2 Fan Tach 2) | Power Supply Fan Tachometer Sensors | Power Supply Fan Tachometer Sensors - Next Steps
A6h | Intel Xeon Phi Coprocessor Status 3 (MIC 3 Status) | Intel Xeon Phi Coprocessor (MIC) Status Sensors | Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
A7h | Intel Xeon Phi Coprocessor Status 4 (MIC 4 Status) | Intel Xeon Phi Coprocessor 
147. racteristics
Table 66. Legacy PCI Error Sensor Typical Characteristics
Table 67. PCI Express Fatal Error Sensor Typical Characteristics
Table 68. PCI Express Fatal Error #2 Sensor Typical Characteristics
Table 69. PCI Express Correctable Error Sensor Typical Characteristics
Table 70. System Event Sensor Typical Characteristics
Table 71. POST Error Sensor Typical Characteristics
Table 72. POST Error Code
Table 73. Physical Security Sensor Typical Characteristics
Table 74. Physical Security Sensor Event Trigger Offset - Next Steps
Table 75. FP (NMI) Interrupt Sensor Typical Characteristics
Table 76. Button Sensor Typical Characteristics
Table 77. IPMI Watchdog Sensor Typical Characteristics
Table 78. IPMI Watchdog Sensor Event Trigger Offset - Next Steps
148. report an actual temperature. These are linear, threshold-based sensors. In most Intel Server Systems, multiple sensors are defined: front panel temperature and baseboard temperature. There are also multiple other sensors that can be defined and are platform-specific. Most of these sensors typically have upper and lower thresholds set: upper to warn in case of an over-temperature situation, lower to warn against sensor failure (temperature sensors typically read out 0 if they stop working).

Table 35. Temperature Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | See Table 37
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 01h (Threshold)
14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Trigger Offset as described in Table 36
15 | Event Data 2 | Reading that triggered event
16 | Event Data 3 | Threshold value that triggered event

Table 36. Temperature Sensors Event Triggers - Description
Event Trigger Assertion Deassert Descrip
149. Byte | Field | Description
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger: 0h = Data Link Layer Protocol Error; 1h = Surprise Link Down Error; 2h = Completer Abort; 3h = Unsupported Request; 4h = Poisoned TLP; 5h = Flow Control Protocol; 6h = Completion Timeout; 7h = Receiver Buffer Overflow; 8h = ACS Violation; 9h = Malformed TLP; Ah = ECRC Error; Bh = Received Fatal Message From Downstream; Ch = Unexpected Completion; Dh = Received ERR_NONFATAL Message; Eh = Uncorrectable Internal; Fh = MC Blocked TLP
15 | Event Data 2 | PCI Bus number
16 | Event Data 3 | [7:3] PCI Device number; [2:0] PCI Function number

The PCI Express Fatal Error #2 is a continuation of the PCI Express Fatal Error.

Table 68. PCI Express Fatal Error #2 Sensor Typical Characteristics
Byte | Field | Description
8, 9 | Generator ID | 0033h = BIOS SMI Handler
11 | Sensor Type | 13h = Critical Interrupt
12 | Sensor Number | 14h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion 
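The PCI Express Fatal Error layout above, with Event Data 2 holding the bus number and Event Data 3 packing device [7:3] and function [2:0], can be unpacked as in this Python sketch. The function names are hypothetical; the offset names are copied from the table.

```python
# Event Trigger Offset values for the PCI Express Fatal Error sensor (from the table).
PCIE_FATAL_ERRORS = {
    0x0: "Data Link Layer Protocol Error", 0x1: "Surprise Link Down Error",
    0x2: "Completer Abort", 0x3: "Unsupported Request",
    0x4: "Poisoned TLP", 0x5: "Flow Control Protocol",
    0x6: "Completion Timeout", 0x7: "Receiver Buffer Overflow",
    0x8: "ACS Violation", 0x9: "Malformed TLP",
    0xA: "ECRC Error", 0xB: "Received Fatal Message From Downstream",
    0xC: "Unexpected Completion", 0xD: "Received ERR_NONFATAL Message",
    0xE: "Uncorrectable Internal", 0xF: "MC Blocked TLP",
}

def decode_pcie_location(event_data2, event_data3):
    """Return (bus, device, function) from Event Data 2 and Event Data 3."""
    bus = event_data2
    device = (event_data3 >> 3) & 0x1F   # bits [7:3]
    function = event_data3 & 0x07        # bits [2:0]
    return bus, device, function

def describe_pcie_fatal(event_data1, event_data2, event_data3):
    """Build a one-line description of a PCI Express Fatal Error event."""
    error = PCIE_FATAL_ERRORS[event_data1 & 0x0F]
    bus, device, function = decode_pcie_location(event_data2, event_data3)
    return f"{error} at bus {bus:02X}h, device {device:02X}h, function {function}"
```

The resulting bus, device, and function values can then be matched against the system's PCI device listing to find the slot or on-board device that reported the error.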
150. rs the system for critical events by communicating with various sensors on the system board; it sends alerts and logs events when certain parameters exceed their preset thresholds, indicating a potential failure of the system. The administrator can also remotely communicate with the BMC to take some corrective action, such as resetting or power cycling the system to get a hung OS running again. These abilities save on the total cost of ownership of a system.

For Intel Server Boards and Intel Server Platforms, the BMC supports the industry-standard IPMI 2.0 Specification, enabling you to configure, monitor, and recover systems remotely.

1.2.2.1 System Event Log (SEL)
The BMC provides a centralized, non-volatile repository for critical, warning, and informational system events called the System Event Log or SEL. Having the BMC manage the SEL and logging functions helps to ensure that "post-mortem" logging information is available if a failure occurs that disables the system processor(s).

The BMC allows access to the SEL from in-band and out-of-band mechanisms. Various tools and utilities can be used to access the SEL, including the Intel SELView utility and multiple open-source IPMI tools.

1.2.3 Intel Intelligent Power N
151. s Correctable Error Sensor - Next Steps
Sensor Number | Sensor Name | Details Section | Next Steps
06h | Intel Quick Path Interface Correctable Error | QPI Correctable Error Sensor | QPI Correctable Error Sensor - Next Steps
07h | Intel Quick Path Interface Fatal Error | QPI Fatal Error and Fatal Error #2 | QPI Fatal Error and Fatal Error #2 - Next Steps
11h | Sparing Redundancy State | Sparing Redundancy State | Sparing Redundancy State Sensor - Next Steps
13h | Memory Parity Error | Memory Address Parity Error | Memory Address Parity Error Sensor - Next Steps
14h | PCI Express Fatal Error #2 (continuation of Sensor 04h) | PCI Express Fatal Error and Fatal Error #2 | PCI Express Fatal Error and Fatal Error #2 Sensor - Next Steps
17h | Intel Quick Path Interface Fatal Error #2 (continuation of Sensor 07h) | QPI Fatal Error and Fatal Error #2 | QPI Fatal Error and Fatal Error #2 - Next Steps
83h | System Event | System Events | Not applicable

3.4 Node Manager / ME Firmware-owned Sensors (GID = 002Ch or 602Ch)
The following table can be used to find the details of sensors owned by the Node Manager / Management Engine (ME) firmware.

T
152. sors
16h | IO Module Temperature (I/O Mod Temp) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
17h | PCI Riser 3 Temperature (PCI Riser 3 Temp) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
Sensor Number | Sensor Name | Details Section | Next Steps
18h | PCI Riser 4 Temperature (PCI Riser 4 Temp) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
19h | Baseboard +1.05V Processor3 Vccp (BB +1.05Vccp P3) | Threshold-based Voltage Sensors | Table 13. Threshold-based Voltage Sensors - Next Steps
1Ah | Baseboard +1.05V Processor4 Vccp (BB +1.05Vccp P4) | Threshold-based Voltage Sensors | Table 13. Threshold-based Voltage Sensors - Next Steps
20h | Baseboard Temperature 1 (Platform Specific) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
21h | Front Panel Temperature (Front Panel Temp) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
22h | SSB Temperature (SSB Temp) | Threshold-based Temperature Sensors | Table 37. Temperature Sensors - Next Steps
Baseboard Temperature 2 
153. t Steps Number
23h | Baseboard Temperature 2
24h | Baseboard Temperature 3
25h | Baseboard Temperature 4
26h | I/O Mod Temp
27h | PCI Riser 1 Temp
28h | IO Riser Temp
2Ch | PCI Riser 2 Temp
2Dh | SAS Mod Temp
2Eh | Exit Air Temp
2Fh | LAN NIC Temp

5.2.2 Thermal Margin Sensors
Margin sensors are also linear sensors but typically report a negative value. This is not an actual temperature but an offset to a critical temperature. The reported value is the number of degrees below the critical temperature for the particular component.

The BMC supports DIMM aggregate temperature margin IPMI sensors. The temperature readings from the physical temperature sensors on each DIMM (such as Temperature Sensor on DIMM, or TSOD) are aggregated into IPMI temperature margin sensors for groupings of DIMM slots, the partitioning of which is platform/SKU-specific and generally corresponds to fan domains.

The BMC supports global aggregate temperature margin IPMI sensors. There may be as many unique global aggregate sensors as there are fan domains. Each sensor aggregates the readings of multiple other IPMI temperature sensors supported by the BMC FW. The mapping of child sensors into each global aggregate sensor is SDR configurable. The primary usage for these sensors is to trigger turning off fans when a lower threshold is reached.

Table 38. Thermal Margin Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Typ
154. t direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type = 09h (digital Discrete)
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset: 0h = RAS Configuration Disabled; 1h = RAS Configuration Enabled
15 | Event Data 2 | Prior RAS Mode. [7:4] Reserved; [3:0] RAS Mode: 0h = None (Independent Channel Mode); 1h = Mirroring Mode; 2h = Lockstep Mode; 4h = Rank Sparing Mode
16 | Event Data 3 | Selected RAS Mode. [7:4] Reserved; [3:0] RAS Mode: 0h = None (Independent Channel Mode); 1h = Mirroring Mode; 2h = Lockstep Mode; 4h = Rank Sparing Mode

7.3 Mirroring Redundancy State
Mirroring Mode protects memory data by full redundancy: keeping complete copies of all data on both channels of a Mirroring Domain (channel pair). If an Uncorrectable Error, which is normally fatal, occurs on one channel of a pair, and the other channel is still intact and operational, then the Uncorrectable Error is "demoted" to a Correctable Error, and the failed channel is disabled. Because the Mirror Domain is no longer redundant, a Mirroring Redundancy State SEL Event is logged.

Table 61. Mirroring Re
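The prior and selected RAS modes carried in Event Data 2 and Event Data 3 of the RAS configuration event can be translated with a small lookup. This Python sketch uses the mode values from the table rows above; the function name and the output wording are hypothetical.

```python
# RAS Mode values from bits [3:0] of Event Data 2 (prior) and Event Data 3 (selected).
RAS_MODES = {
    0x0: "None (Independent Channel Mode)",
    0x1: "Mirroring Mode",
    0x2: "Lockstep Mode",
    0x4: "Rank Sparing Mode",
}

def describe_ras_change(event_data2, event_data3):
    """Describe the RAS mode transition; bits [7:4] are reserved and ignored."""
    prior = RAS_MODES.get(event_data2 & 0x0F, "Unknown")
    selected = RAS_MODES.get(event_data3 & 0x0F, "Unknown")
    return f"RAS mode changed from {prior} to {selected}"
```

Values not listed in the table are reported as "Unknown" rather than guessed.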
155. the system

04h  AC Lost
     AC power was removed.
     Next Steps: Informational Event.

Sensor Specific Offset (Hex)  Description  Next Steps

05h  Soft Power Control Failure
     Generally means power good was lost in the system, causing a shutdown.
     Next Steps: This could be caused by the power supply subsystem or system components.
     1. Verify all power cables and adapters are connected properly (AC cables as well as the cables between the PSU and system components).
     2. Cross test the PSU if possible.
     3. Replace the power subsystem.

06h  Power Unit Failure
     Power subsystem experienced a failure.
     Next Steps: Indicates a power supply failed.
     1. Remove and reapply AC power.
     2. If the power supply still fails, replace it.

4.3.2 Power Unit Redundancy Sensor

This sensor is enabled on systems that support redundant power supplies. When a system has AC applied, or if it loses redundancy of the power supplies, a message is logged into the SEL.

Table 17. Power Unit Redundancy Sensors Typical Characteristics

Byte  Field  Description
11  Sensor Type  09h = Power Unit
12  Sensor Number  02h
13  Event Direction and Event Type
        [7] Event direction
              0b = Assertion Event
              1b = Deassertion Event
        [6:0] Event
156. ther standby voltages

Sh  (BB 5.0V STBY)
     1. Ensure all cables are connected correctly.
     2. If the issue remains, replace the board.
     3. If the issue remains, replace the power supplies.

Sensor Number  Sensor Name  Next Steps

D4h  Baseboard 3.3V Auxiliary (BB 3.3V AUX)
     3.3V AUX is supplied by the main board.
     3.3V AUX is used by the BMC, clock chips, PCI-E Slot, on-board NIC, Intel® C600 series Chipset, and ICH.
     1. Ensure all cables are connected correctly.
     2. If the issue remains, replace the board.
     3. If the issue remains, replace the power supplies.

D6h  Baseboard 1.05V Processor1 Vccp (BB 1.05Vccp P1)
     This 1.05V line is supplied by the main board.
     This 1.05V line is used by processor 1.
     1. Ensure all cables are connected correctly.
     2. Check the processor is seated properly.
     3. Cross test the processors. If the issue remains with the processor socket, replace the main board; otherwise replace the processor.

D7h  Baseboard 1.05V Processor2 Vccp (BB 1.05Vccp P2)
     This 1.05V line is supplied by the main board.
     This 1.05V line is used by processor 2.
     1. Ensure all cables are connected correctly.
     2. Check the processor is seated properly.
     3. Cross test the processors. If the issue remains with the processor soc
157. tion

Hex  Description  Severity  Severity  Description
00h  Lower non-critical going low  Degraded  OK  The temperature has dropped below its lower non-critical threshold.
02h  Lower critical going low  non-fatal  Degraded  The temperature has dropped below its lower critical threshold.
07h  Upper non-critical going high  Degraded  OK  The temperature has gone over its upper non-critical threshold.
09h  Upper critical going high  non-fatal  Degraded  The temperature has gone over its upper critical threshold.

Table 37. Temperature Sensors - Next Steps

Sensor Number  Sensor Name  Next Steps

21h  Front Panel Temp
     If the front panel temperature reads zero, check:
     1. It is connected properly.
     2. The SDR has been programmed correctly for your chassis.
     If the front panel temperature is too high:
     1. Check the cooling of your server room.

14h  Baseboard Temperature 5
15h  Baseboard Temperature 6
16h  I/O Mod2 Temp
17h  PCI Riser 5 Temp
18h  PCI Riser 4 Temp
20h  Baseboard Temperature 1
22h  SSB Temperature
     1. Check for clear and unobstructed airflow into and out of the chassis.
     2. Ensure the SDR is programmed and the correct chassis has been selected.
     3. Ensure there are no fan failures.
     4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).

Sensor  Sensor Name  Nex
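The threshold offsets in the event trigger table above lend themselves to a small lookup. This sketch is illustrative only: the names are hypothetical, and it assumes the two "Severity" columns are the assertion severity followed by the deassertion severity, which the flattened table header does not state explicitly.

```python
# Offsets and severities copied from the table above.
# Assumption: first severity = on assertion, second = on deassertion.
TEMPERATURE_EVENTS = {
    0x00: ("dropped below its lower non-critical threshold", "Degraded", "OK"),
    0x02: ("dropped below its lower critical threshold", "non-fatal", "Degraded"),
    0x07: ("gone over its upper non-critical threshold", "Degraded", "OK"),
    0x09: ("gone over its upper critical threshold", "non-fatal", "Degraded"),
}


def temperature_event_severity(offset, asserted=True):
    """Return (condition, severity) for a temperature threshold event offset.

    The condition text describes the assertion; for a deassertion the
    same threshold is being re-crossed in the opposite direction.
    """
    if offset not in TEMPERATURE_EVENTS:
        raise ValueError(f"offset {offset:02X}h is not listed in the table")
    condition, on_assert, on_deassert = TEMPERATURE_EVENTS[offset]
    return condition, (on_assert if asserted else on_deassert)
```

A lookup like this is mainly useful when post-processing a hex SEL dump, where the offset is available as the low nibble of Event Data 1.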
158. tored in this record as F2h, 1Bh, 00h for bytes 8 through 10, respectively.

Basic Decoding of a SEL Record

Byte  Field  Description
11-16  OEM Defined  OEM Defined. This is defined according to the manufacturer identified by the Manufacturer ID field.

Table 4. OEM SEL Record (Type E0h-FFh)

Byte  Field  Description
1-2  Record ID (RID)  ID used for SEL Record access.
3  Record Type (RT)  [7:0] Record Type: E0h-FFh = OEM system event record
4-16  OEM  OEM Defined. This is defined by the system integrator.

2.2 Notes on SEL Logs and Collecting SEL Information

Whenever you capture the SEL log, always collect both the human-readable text version and the hex version. Because some of the data is OEM specific, some utilities cannot decode the information correctly. In addition, some OEM-specific data may contain additional variables that are not decoded at all.

An example of not decoding all of the informatio
159. tup or due to unavailability of memory at POST, in which case POST error 8500 is also logged.

Next Steps:
     applying the mirroring configuration to the memory. Check for other errors related to the memory and troubleshoot accordingly.
     2. If there is no POST error, mirror mode was simply disabled in BIOS setup and this should be considered informational only.

7.2 Memory RAS Mode Select

Memory RAS Mode Select events are logged to record changes in RAS Mode.

When a RAS Mode selection is made that changes the RAS Mode (including selecting a RAS Mode from or to Independent Channel Mode), that change is logged to the SEL in a Memory RAS Mode Select event message, which records the previous RAS Mode ("from") and the newly selected RAS Mode ("to"). The event also includes an Offset value in ED1 which indicates whether the mode change left the system with a RAS Mode active (Enabled) or not (Disabled, that is, Independent Channel Mode selected). This sensor provides the Spare Channel mode RAS Configuration status. Memory RAS Mode Select is an informational event.

Table 60. Memory RAS Mode Select Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  0001h = BIOS POST
11  Sensor Type  0Ch = Memory
12  Sensor Number  12h
13  Event Direction and
        [7] Even
160. uct Families

11.4 System Event - PEF Action

The BMC is configurable to send alerts for events logged into the SEL. These alerts are called Platform Event Filters (PEF) and are disabled by default. The user must configure and enable this feature. PEF events are logged if the BMC takes action due to a PEF configuration. The BMC event triggering the PEF action will also be in the SEL.

This functionality is built into the BMC to allow it to send alerts (SNMP or other) for any event that gets logged to the SEL. PEF filters are turned off by default and have to be enabled manually using Intel® Deployment Assistant, the Intel® syscfg utility, or an IPMI-aware utility.

Table 81. System Event - PEF Action Sensor Typical Characteristics

Byte  Field  Description
11  Sensor Type  12h = System Event
12  Sensor Number  08h
13  Event Direction and Event Type
        [7] Event direction
              0b = Assertion Event
              1b = Deassertion Event
        [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
        [7:6] 00b = Unspecified Event Data 2
        [5:4] 00b = Unspecified Event Data 3
        [3:0] Event Trigger Offset = 4h = PEF Action
15  Event Data 2  Not used
16  Event Data 3  Not used

11.4.1 System Event - PEF Action - Next Steps

This event gets logged if the BMC takes an action due to PEF configuration. Actions can be sending an alert, along with possibly resetting, power cycling, or powering down the system. There will be another event that has led t
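Bytes 13 and 14 follow the same bit layout in every sensor table in this guide, so one pair of helpers can split them. The sketch below uses hypothetical function names and assumes the bytes were already pulled out of a 16-byte SEL record as integers; the PEF check simply compares against the values in Table 81.

```python
def decode_event_dir_type(byte13):
    """Split byte 13 into (direction, event type).

    Bit 7 is the event direction (0b = assertion, 1b = deassertion);
    bits [6:0] are the event type code.
    """
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"
    return direction, byte13 & 0x7F


def decode_event_data1(ed1):
    """Split Event Data 1 into its three bit fields."""
    return {
        "ed2_code": (ed1 >> 6) & 0x3,  # bits [7:6]: how to read Event Data 2
        "ed3_code": (ed1 >> 4) & 0x3,  # bits [5:4]: how to read Event Data 3
        "offset": ed1 & 0x0F,          # bits [3:0]: event trigger offset
    }


def is_pef_action(sensor_type, byte13, ed1):
    """True if the fields match Table 81: sensor type 12h, type 6Fh, offset 4h."""
    _, event_type = decode_event_dir_type(byte13)
    return (
        sensor_type == 0x12
        and event_type == 0x6F
        and decode_event_data1(ed1)["offset"] == 0x4
    )
```

Masking the direction bit before comparing the event type means the same check works for both assertion and deassertion records.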
161. uct Families is the interconnect between processors.

The QPI Link Width Reduced sensor is used by the BIOS POST to report when the link width has been reduced. Therefore the Generator ID will be 01h.

The QPI Error sensors are reported by the BIOS SMI Handler to the BMC, so the Generator ID will be 33h.

6.4.1 QPI Link Width Reduced Sensor

BIOS POST has reduced the QPI Link Width because of an error condition seen during initialization.

Table 52. QPI Link Width Reduced Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  0001h = BIOS POST
11  Sensor Type  13h = Critical Interrupt
12  Sensor Number  09h
13  Event Direction and Event Type
        [7] Event direction
              0b = Assertion Event
              1b = Deassertion Event
        [6:0] Event Type = 77h (OEM Discrete)
14  Event Data 1
        [7:6] 10b = OEM code in Event Data 2
        [5:4] 00b = Unspecified Event Data 3
        [3:0] Event Trigger Offset
              1h = Reduced to 1/2 width
              2h = Reduced to 1/4 width
15  Event Data 2  0-3 = CPU1-4
16  Event Data 3  Not used

6.4.1.1 QPI Link Width Reduced Sensor - Next Steps

If the error continues:
1. Check the processor is installed correctly.
2. Inspect the socket for bent pins.
3. Cr
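As an illustration of Table 52, the sketch below maps the Event Data 1 offset and the Event Data 2 value to a readable message. The function name is hypothetical, and the byte values are assumed to have been extracted from the SEL record already.

```python
# Offsets copied from Table 52; anything else is reported as unknown.
QPI_WIDTH_OFFSETS = {
    0x1: "Reduced to 1/2 width",
    0x2: "Reduced to 1/4 width",
}


def decode_qpi_link_width_reduced(ed1, ed2):
    """Return (cpu, description) for a QPI Link Width Reduced event.

    Event Data 2 holds a zero-based processor index: 0-3 map to CPU1-CPU4.
    """
    if not 0 <= ed2 <= 3:
        raise ValueError(f"Event Data 2 value {ed2} is outside the 0-3 range")
    offset = ed1 & 0x0F  # bits [3:0] carry the event trigger offset
    description = QPI_WIDTH_OFFSETS.get(offset, f"unknown offset {offset:X}h")
    return f"CPU{ed2 + 1}", description
```

For instance, Event Data 1 = 82h with Event Data 2 = 01h would point at CPU2 with the link reduced to 1/4 width.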
162. upply Predictive Failure Event
3. Sensor Cross Reference List
   3.1 BMC owned Sensors (GID = 0020h)
   3.2 BIOS POST owned Sensors (GID = 0001h)
   3.3 BIOS SMI Handler owned Sensors (GID = 0033h)
   3.4 Node Manager / ME Firmware owned Sensors (GID = 002Ch or 602Ch)
   3.5 Microsoft OS owned Events (GID = 0041h)
   3.6 Linux Kernel Panic Events (GID = 0021h)
4. Power Subsystems
   4.1 Threshold based Voltage Sensors
   4.2 Voltage Regulator Watchdog Timer Sensor
      4.2.1 Voltage Regulator Watchdog Timer Sensor - Next Steps
   4.3
      4.3.1 Power Unit Status Sensor
      4.3.2 Power Unit Redundancy Sensor
      4.3.3 Node Auto Shutdown Sensor
   4.4
      4.4.1 Power Supply Status Sensors
      4.4.2 Power Supply Power In Sensors
      4.4.3 Power Supply Current Out % Sensors
      4.4.4 Power Supply Temperature Sensors
      4.4.5 Power Supply Fan Tachometer Sensors
5. Cooling Subsystem
   5.1 Fan Sensors
      5.1.1 Fan Tachometer Sensors
      5.1.2
163. vent Data 1
        [7:6] 00b = Unspecified Event Data 2
        [5:4] 00b = Unspecified Event Data 3
        [3:0] Event Trigger Offset = 3h = OS Graceful Shutdown
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 101. Shutdown Reason OEM Event Record Typical Characteristics

Byte  Field  Description
1-2  Record ID  ID used for SEL Record access.
3  Record Type  [7:0] DDh = OEM timestamped, bytes 8-16 OEM defined
4-7  Timestamp  Time when the event was logged. LS byte first.
8-10  IPMI Manufacturer ID  0137h (311d) = IANA enterprise number for Microsoft
11  Record ID  Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
12-15  Shutdown Reason  Shutdown Reason code from the registry (LSB first): HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\ReasonCode
16  Reserved  00h

Table 102. Shutdown Comment OEM Event Record Typical Characteristics

Byte  Field  Description
1-2  Record ID  ID used for SEL Record access.
3  Record Type  [7:0]
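The byte layout in Table 101 can be unpacked directly from a raw 16-byte record. The following is a rough sketch rather than a reference decoder: the function name is hypothetical, and it assumes the record is available as a `bytes` object in SEL byte order, with the Record ID stored least significant byte first like the other multi-byte fields.

```python
import struct

SHUTDOWN_RECORD_TYPE = 0xDD       # OEM timestamped record type from Table 101
MICROSOFT_MANUFACTURER_ID = 0x0137  # 311 decimal, per Table 101


def decode_shutdown_reason(record):
    """Decode a 16-byte Shutdown Reason OEM SEL record (Table 101 layout)."""
    if len(record) != 16:
        raise ValueError("a SEL record is exactly 16 bytes")
    # Bytes 1-2 record ID, byte 3 record type, bytes 4-7 timestamp (LS byte first).
    record_id, record_type, timestamp = struct.unpack_from("<HBI", record, 0)
    if record_type != SHUTDOWN_RECORD_TYPE:
        raise ValueError(f"unexpected record type {record_type:02X}h")
    # Bytes 8-10: three-byte manufacturer ID, LS byte first.
    manufacturer_id = int.from_bytes(record[7:10], "little")
    if manufacturer_id != MICROSOFT_MANUFACTURER_ID:
        raise ValueError(f"unexpected manufacturer ID {manufacturer_id:06X}h")
    return {
        "record_id": record_id,
        "timestamp": timestamp,
        "sequence": record[10],  # byte 11: sequential record number
        # Bytes 12-15: shutdown reason code, LSB first.
        "reason_code": struct.unpack_from("<I", record, 11)[0],
    }
```

Rejecting records with a different type or manufacturer ID keeps the decoder from misreading other OEM records that share the timestamped layout.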
    