System Event Log Troubleshooting Guide for Intel® S5500/S3420
Contents
1. Byte  Field: Description
   8-9   Generator ID: 0041h = System Software with an ID = 20h
   11    Sensor Type: 20h = OS Stop/Shutdown
   12    Sensor Number: 00h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 6Fh = Sensor Specific
   14    Event Data 1:
         [7:6] 00b = Unspecified Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset: 1h = Runtime Critical Stop (that is, core dump, blue screen)
   15    Event Data 2: Not used
   16    Event Data 3: Not used

   Table 97: Bug Check (Blue Screen) Code OEM Event Record Typical Characteristics
   Byte  Field: Description
   1-2   Record ID: ID used for SEL Record access
   3     Record Type: [7:0] DEh = OEM timestamped, bytes 8-16 OEM defined
   4-7   Timestamp: Time when event was logged. LS byte first.
   8-10  IPMI Manufacturer ID: 0137h (311) = IANA enterprise number for Microsoft; 0157h (343) = IANA enterprise number for Intel. The value logged depends on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
   11    Sequence Number: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL.

   Revision 1.1, Intel order number G7421 1 002
2. Byte  Field: Description
   11    Sensor Type: 02h = Voltage
   12    Sensor Number: See Table 14
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 01h = Threshold
   14    Event Data 1:
         [7:6] 01b = Trigger reading in Event Data 2
         [5:4] 01b = Trigger threshold in Event Data 3
         [3:0] Event Triggers as described in Table 13
   15    Event Data 2: Reading that triggered event
   16    Event Data 3: Threshold value that triggered event

   The following table describes the severity of each of the event triggers for both assertion and deassertion.

   Table 13: Voltage Sensors Event Triggers Description
   Event Trigger (Hex, Description) / Assertion Severity / Deassert Severity / Description
   00h  Lower non-critical going low / Degraded / OK / The voltage has dropped below its lower non-critical threshold.
   02h  Lower critical going low / non-fatal / Degraded / The voltage has dropped below its lower critical threshold.
   07h  Upper non-critical going high / Degraded / OK / The voltage has gone over its upper non-critical threshold.
   09h  Upper critical going high / non-fatal / Degraded / The voltage has gone over its upper critical threshold.

   Table 14: Voltage Sensors Next Steps
   Sensor Number / Sensor Name / Nex
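The bit layout of bytes 13 and 14 in the excerpt above can be unpacked with a few masks and shifts. The following is a minimal sketch in Python, not a reference implementation: the function name `decode_threshold_event`, the dictionary keys, and the sample byte values are illustrative assumptions, while the bit positions and trigger names come from the table and Table 13.

```python
# Trigger offsets for voltage sensors, taken from Table 13 in the excerpt.
VOLTAGE_TRIGGERS = {
    0x00: "Lower non-critical going low",
    0x02: "Lower critical going low",
    0x07: "Upper non-critical going high",
    0x09: "Upper critical going high",
}


def decode_threshold_event(byte13: int, ed1: int) -> dict:
    """Split SEL byte 13 and Event Data 1 (byte 14) into their bit fields.

    Hypothetical helper: the name and return shape are assumptions.
    """
    offset = ed1 & 0x0F                      # bits [3:0]: event trigger offset
    return {
        "deassertion": bool(byte13 & 0x80),  # byte 13 bit [7]: 1b = deassertion
        "event_type": byte13 & 0x7F,         # byte 13 bits [6:0]: 01h = threshold
        "ed2_code": (ed1 >> 6) & 0x03,       # bits [7:6]: 01b = reading in ED2
        "ed3_code": (ed1 >> 4) & 0x03,       # bits [5:4]: 01b = threshold in ED3
        "trigger_offset": offset,
        "trigger": VOLTAGE_TRIGGERS.get(offset, "unknown"),
    }


# Made-up sample bytes: assertion of a threshold event with offset 02h.
event = decode_threshold_event(0x01, 0x52)
print(event["trigger"])
```

The same masks apply to the other threshold-based sensors in this guide, since they share the byte 13 and Event Data 1 layout; only the trigger lookup table would change.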
3. 0b = DIMM Slot ID in Event Data 3 Bits [2:0] is not valid
   1b = DIMM Slot ID in Event Data 3 Bits [2:0] is valid
         [2:0] Error Type: 000b = Parity Error Type not known; 001b = Data Parity Error (not used); 010b = Address Parity Error. All other values are reserved.
   16    Event Data 3:
         [7:5] Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached: 000b = Processor Socket 1; 001b = Processor Socket 2. All other values are reserved.
         [4:3] Channel Number (if valid) on which the Parity Error occurred. This value will be indeterminate and should be ignored if ED2 Bit 4 is 0b.
               00b = Channel A or D (for Processor Socket 1 / Processor Socket 2)
               01b = Channel B or E
               10b = Channel C or F
               11b = Reserved
         [2:0] DIMM Slot ID (if valid) of the specific DIMM that was involved in the transaction that led to the parity error. This value will be indeterminate and should be ignored if ED2 Bit 3 is 0b.
               000b = DIMM Socket 1
               001b = DIMM Socket 2
               All other values are reserved.

   7.2.2.1 Memory Address Parity Error Sensor Next Steps
   These are bit errors that are detected in the memory addressing hardware. An Address Parity Error implies that the memory address transmitted to the DIMM addressing circuitry has been compromised, and data read or written is com
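As a rough illustration of how the Event Data 2 validity bits gate the Event Data 3 fields described above, here is a short Python sketch. The function name `decode_parity_location`, the returned dictionary, and the sample bytes are assumptions made for the example; the bit positions and the channel naming follow the excerpt.

```python
# Channel letter per channel code: (Processor Socket 1, Processor Socket 2).
CHANNELS = {0b00: ("A", "D"), 0b01: ("B", "E"), 0b10: ("C", "F")}


def decode_parity_location(ed2: int, ed3: int) -> dict:
    """Locate the DIMM named by a memory parity error event.

    Hypothetical helper. Fields that the excerpt marks as reserved or
    not valid are returned as None instead of being guessed.
    """
    socket_code = (ed3 >> 5) & 0x07   # ED3 bits [7:5]
    channel_code = (ed3 >> 3) & 0x03  # ED3 bits [4:3]
    dimm_code = ed3 & 0x07            # ED3 bits [2:0]
    channel_valid = bool(ed2 & 0x10)  # ED2 bit 4
    dimm_valid = bool(ed2 & 0x08)     # ED2 bit 3

    socket = socket_code + 1 if socket_code in (0, 1) else None
    channel = None
    if channel_valid and socket is not None and channel_code in CHANNELS:
        channel = CHANNELS[channel_code][socket - 1]
    dimm = dimm_code + 1 if dimm_valid and dimm_code in (0, 1) else None
    return {"socket": socket, "channel": channel, "dimm_socket": dimm}


# Made-up sample: ED2 = 1Ah (channel and DIMM valid, address parity error),
# ED3 = 29h (socket code 001b, channel code 01b, DIMM code 001b).
location = decode_parity_location(0x1A, 0x29)
print(location)
```

Returning None for invalid fields keeps a caller from acting on the indeterminate values the table warns about.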
4. Policy interface capability: 0 = Not Available, 1 = Available
         [1] Monitoring capability: 0 = Not Available, 1 = Available
         [2] Power limiting capability: 0 = Not Available, 1 = Available
   15    Event Data 2: Not used
   16    Event Data 3: Not used

   13.3.1 Node Manager Operational Capabilities Change Next Steps
   Policy Interface available indicates that Intel Intelligent Power Node Manager is able to respond to the external interface about querying and setting Intel Intelligent Power Node Manager policies. This is generally available as soon as the microcontroller is initialized.
   Monitoring Interface available indicates that Intel Intelligent Power Node Manager has the capability to monitor power and temperature. This is generally available when firmware is operational.
   Power limiting interface available indicates that Intel Intelligent Power Node Manager can do power limiting and is indicative of an ACPI-compliant OS loaded (unless the OEM has indicated support for a non-ACPI-compliant OS).
   The current value of a not-acknowledged capability sensor will be retransmitted no faster than every 300 milliseconds.
   Next steps depend on the policy that was set. See the Node Manager Specification for more details.
5. 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset: 01h = System Boot; 05h = Timestamp Clock Synchronization
   15    Event Data 2: For Event Trigger Offset 05h only (Timestamp Clock Synchronization): 00h = 1st in pair; 80h = 2nd in pair
   16    Event Data 3: Not used

   9.2 System Firmware Progress (Formerly Post Error)
   The BIOS logs any POST errors to the SEL. The 2-byte POST code gets logged in the ED2 and ED3 bytes in the SEL entry. This event will be logged every time a POST error is displayed. Even though this event indicates an error, it may not be a fatal error. If this is a serious error, there will typically also be a corresponding SEL entry logged for whatever was the cause of the error; this event may contain more information about what happened than the POST error event.

   9.2.1
   Table 70: POST Error Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 0001h = BIOS POST
   11    Sensor Type: 0Fh = System Firmware Progress (formerly POST Error)
   12    Sensor Number: 06h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 6Fh = Sensor Specific
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 10b = OEM code in Event Data 3
         [3:0] Event Trigger O
6. 2.1 Default Values in the SEL Records
   Unless otherwise noted in the event record descriptions, the following are the default values in all SEL entries:
   Byte [3] = Record Type (RT) = 02h = System event record
   Byte [9:8] = Generator ID = 0020h = BMC Firmware
   Byte [10] = Event Message Revision (ER) = 04h = IPMI 2.0

   Table 1: SEL Record Format
   Byte  Field: Description
   1-2   Record ID (RID): ID used for SEL Record access
   3     Record Type (RT): [7:0] Record Type
         02h = System event record
         C0h-DFh = OEM timestamped, bytes 8-16 OEM defined (see Table 3)
         E0h-FFh = OEM non-timestamped, bytes 4-16 OEM defined (see Table 4)
   4-7   Timestamp (TS): Time when event was logged. LS byte first.
         Example: TS: [29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC
         Note: There are various websites that will convert the raw number to a date/time.
   8-9   Generator ID (GID): RqSA and LUN if event was generated from IPMB. Software ID if event was generated from system software.
         Byte 1:
         [7:1] 7-bit I2C Slave Address, or 7-bit system software ID
         [0] 0b = ID is IPMB Slave Address, 1b = System software ID
         Software ID values:
         0001h = BIOS POST for POST errors RAS Configuration State Time
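The timestamp example in Table 1 can be reproduced in a few lines of Python. This is a sketch under one assumption that the example itself supports: the four bytes form a little-endian count of seconds since 1 January 1970 UTC. The function name `decode_sel_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def decode_sel_timestamp(ts_bytes: bytes) -> datetime:
    """Decode SEL record bytes 4-7 (least-significant byte first) to UTC."""
    if len(ts_bytes) != 4:
        raise ValueError("SEL timestamp must be exactly 4 bytes")
    # LS byte first means little-endian byte order.
    seconds = int.from_bytes(ts_bytes, byteorder="little")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Bytes from the Table 1 example: [29][76][68][4C]
raw = bytes.fromhex("2976684C")
stamp = decode_sel_timestamp(raw)
print(int.from_bytes(raw, "little"))  # 1281914409
print(stamp.isoformat())              # 2010-08-15T23:20:09+00:00
```

Reading the bytes in the wrong order (as 2976684Ch) would give a date in 1992, which is a quick sanity check when a decoded log looks implausibly old.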
7. Byte  Field: Description
   8-9   Generator ID: 0033h = BIOS SMI Handler
   11    Sensor Type: 13h = Critical Interrupt
   12    Sensor Number: 17h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 74h = OEM Discrete
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset = Reserved
   15    Event Data 2: 0-3 = CPU1-4
   16    Event Data 3: Not used

   The QPI Fatal 2 Error is a continuation of QPI Fatal Error.

   Table 51: QPI Fatal 2 Error Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 0033h = BIOS SMI Handler
   11    Sensor Type: 13h = Critical Interrupt
   12    Sensor Number: 18h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 74h = OEM Discrete
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset = Reserved
   15    Event Data 2: 0-3 = CPU1-4
   16    Event Data 3: Not used

   6.4.3.1 QPI Fatal and Fatal 2 Next Steps
   This is an Informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues: 1 2
8. This unknown address is a fatal error including non Intel components o

   Event Trigger Offset (Hex, Description) / Description
   04h  Poisoned TLP Error / Typically indicates a parity error in a TLP transaction. This means the data received is not correct.
   05h  Flow Control Protocol Error / Indicates an error during initialization, with the device not providing enough flow control credits. This means the bus configuration is incorrect and it cannot continue.
   06h  Completion Timeout Error / Indicates a transaction did not complete in the specified amount of time.
   07h  Completer Abort Error / Indicates a transaction had unexpected content or format.
   08h  Receiver Buffer Overflow Error / Indicates a synchronization problem between PCI Express devices. Extremely rare.
   09h  ACS Violation Error / Access Control Services (a transaction routing feature) failed.
   0Ah  Malformed TLP Error / Indicates a transaction was sent with data exceeding the maximum allowed number of bytes. This is not allowed and is a fatal error (usually a firmware or driver problem).
   0Bh  Received ERR_FATAL message from / Indicates a fatal error occurred and

   Next Steps (cell shared across the rows above): If this is an onboard device; Update all BIOS, firmware, and drivers; Replace the board
9. 1111b indicates that this field is unused and does not contain valid data.
         [3:0] If Domain Instance Type (ED3) is set to Local, this field specifies the sparing domain local sub-instances (which channels are included in this sub-instance):
               0000b = Reserved
               0001b = Ch A, Ch B, Ch C (only configuration possible on Intel S5500/S5520 Server Boards)
               0010b-1110b = Reserved
               If Domain Instance Type (ED3) is set to Global, this field specifies the 0-based Socket ID of the first participant processor in this sparing domain global instance.
               A value of 1111b indicates that this field is unused and does not contain valid data.
   16    Event Data 3:
         [7] Domain Instance Type:
             0b = Local memory sparing domain instance. This SEL pertains to a local memory mirroring domain that is restricted to memory mirroring pairs within a processor socket only.
             1b = Global memory sparing domain instance. This SEL pertains to a global memory mirroring domain that pertains to memory mirroring between processor sockets.
         [6:4] Reserved
         [3:0] 0-based Instance ID of this sparing domain

   Table 55: Mirrored Redundancy State Sensor Event Trigger Offset Next Steps
   Event Trigger Offset / Description / Next Steps
   Hex / Descript
10. 2
   16    Event Data 3: Not used / Not used

   12.2.1 HSC Drive Slot Status Sensor Next Steps
   If during normal operation a drive gets reported as failed, ensure that the drive was seated properly and the drive carrier was properly latched. If that does not work, replace the drive.

   12.3 HSC Drive Presence Sensor
   The HSC Drive Slot Presence sensor provides the current presence state for the drive in each of the slots. After an AC power cycle, there will be a SEL entry to report the presence of the drive in a slot, and there will be another entry for any changes in the presence of drives after that.

   Table 84: HSC Drive Presence Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 00C0h = HSC Firmware, HSBP A; 00C2h = HSC Firmware, HSBP B
   11    Sensor Type: 0Dh = Drive Slot (Bay)
   12    Sensor Number:
         6-Slot HSBP                    8-Slot HSBP
         08h = Drive Slot 0 Presence    0Ah = Drive Slot 0 Presence
         09h = Drive Slot 1 Presence    0Bh = Drive Slot 1 Presence
         0Ah = Drive Slot 2 Presence    0Ch = Drive Slot 2 Presence
         0Bh = Drive Slot 3 Presence    0Dh = Drive Slot 3 Presence
         0Ch = Drive Slot 4 Presence    0Eh = Drive Slot 4 Presence
         0Dh = Drive Slot 5 Presence    0Fh = Drive Slot 5 Presence
                                        10h = Drive Slot 6 Presence
                                        11h = Drive Slot 7 Presence
   13    Event Direction and [7] Event
11. Check the processor is installed correctly.
   2. Inspect the socket for bent pins.
   3. Cross test the processor if possible.

   6.4.2 QPI Non-Fatal Error Sensor
   The system detected a QPI non-fatal error that is recoverable. This is an informational event.

   Table 49: QPI Non-Fatal Error Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 0033h = BIOS SMI Handler
   11    Sensor Type: 13h = Critical Interrupt
   12    Sensor Number: 07h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 73h = OEM Discrete
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset = Reserved
   15    Event Data 2: 0-3 = CPU1-4
   16    Event Data 3: Not used

   6.4.2.1 QPI Non-Fatal Error Sensor Next Steps
   This is an Informational event only. Non-Fatal errors are acceptable and normal at a low rate of occurrence. If the error continues:
   1. Check the processor is installed correctly.
   2. Inspect the socket for bent pins.
   3. Cross test the processor if possible.

   6.4.3 QPI Fatal and Fatal 2
   The system detected a QPI fatal or non-recoverable error. This is a fatal error.

   Table 50: QPI Fatal Error Sensor Typical Characteristics
12. Event Data 1:
         [7:6] 00b = Unspecified Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset: 2 = Log area reset/cleared
   15    Event Data 2: Not used
   16    Event Data 3: Not used

   11.4 System Event PEF Action
   The BMC is configurable to send alerts for events logged into the SEL. These alerts are called Platform Event Filters (PEF) and are disabled by default. The user must configure and enable this feature. PEF events are logged if the BMC takes action due to a PEF configuration. The BMC event triggering the PEF action will also be in the SEL.
   This functionality is built into the BMC to allow it to send alerts (SNMP or other) for any event that gets logged to the SEL. PEF filters are turned off by default and have to be enabled manually using the Intel Deployment Assistant, the Intel syscfg utility, or an IPMI-aware utility.

   11.4.1
   Table 80: System Event PEF Action Sensor Typical Characteristics
   Byte  Field: Description
   11    Sensor Type: 12h = System Event
   12    Sensor Number: 08h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 6Fh = Sensor Specific
   14    Event Data 1:
         [7:6] 11b = Sensor-specific event extension code in Event Data 2
         [5:4] 00
13. 10.2 FP NMI Interrupt
   The front panel interrupt button (also referred to as the NMI button) is a recessed button on the front panel that allows the user to force a critical interrupt, which causes a crash error or kernel panic.

   Table 74: FP NMI Interrupt Sensor Typical Characteristics
   Byte  Field: Description
   11    Sensor Type: 13h = Critical Interrupt
   12    Sensor Number: 05h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 6Fh = Sensor Specific
   14    Event Data 1:
         [7:6] 00b = Unspecified Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Event Trigger Offset = 0
   15    Event Data 2: Not used
   16    Event Data 3: Not used

   10.2.1 FP NMI Interrupt Next Steps
   The purpose of this button is for diagnosing software issues: when a critical interrupt is generated, the OS typically saves a memory dump. This allows for exact analysis of what is going on in system memory, which can be useful for software developers or for troubleshooting OS, software, and driver issues.
   If this button was not actually pressed, you should ensure there is no physical fault with the front panel.
   This event only gets logged if a user pressed the NMI button and, although it causes the OS to crash, is not an error.
14. Byte  Field: Description
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 6Fh = Sensor Specific
   14    Event Data 1:
         [7:6] 00b = Unspecified Event Data 2
         [5:4] 00b = Unspecified Event Data 3
         [3:0] Sensor Specific offset as described in Table 9
   15    Event Data 2: Not used
   16    Event Data 3: Not used

   Table 16: Power Unit Status Sensor Sensor Specific Offsets Next Steps
   Sensor Specific Offset (Hex, Description) / Description / Next Steps
   00h  Power down / System is powered down / Informational Event
   04h  AC Lost / AC removed / Informational Event
   05h  Soft Power Control Failure / Generally means power good was lost in the system, causing a shutdown / This could be caused by the power supply subsystem or system components.
        1. Verify all power cables and adapters are connected properly (AC cables as well as the cables between the PSU and system components).
        2. Cross test the PSU if possible.
        3. Replace the power subsystem.
   06h  Power Unit Failure / Power subsystem experienced a failure / Indicates a power supply failed.
        1. Remove and reapply AC power.
        2. If the power supply still fails, replace it.

   4.2.2 Power Unit Redundancy Sensor
   This sensor is enabled on systems that support redundant power supplies. When a system has AC applied or if it loses redundancy of the pow
15. The BMC allows access to SEL from in-band and out-of-band mechanisms. There are various tools and utilities that can be used to access the SEL. There is the Intel SELViewer and multiple open-source IPMI tools.

   1.2.3 Intel Intelligent Power Node Manager Version 1.5
   Intel Intelligent Power Node Manager version 1.5 (NM) is a platform-resident technology that enforces power and thermal policies for the platform. These policies are applied by exploiting subsystem knobs (such as processor P and T states) that can be used to control power consumption. Intel Intelligent Power Node Manager enables data center power and thermal management by exposing an external interface to management software through which platform policies can be specified. It also enables specific data center power management usage models, such as power limiting.
   The configuration and control commands are used by the external management software or BMC to configure and control the Intel Intelligent Power Node Manager feature. Because Platform Services firmware does not have any external interface, external commands are first received by the BMC over LAN and then relayed to the Platform Services firmware over the IPMB channel. The BMC acts as a relay and the transport conversion device for these commands. For simplicity the commands from the
16. Power down
   03h  Power cycle
   08h  Timer interrupt
   Description: check to see whether the OS is still responsive. The timer is disabled by default and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that will reset the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).
   Next Steps (list items 1 and 2, partly cut off): watchdog timer. Make sure you have support for this in your OS (typically using a third-party IPMI-aware utility like ipmitool or ipmiutil, along with the openipmi driver). If this is the case, then it is likely your OS has hung, and you should investigate OS event logs to determine what may have caused this.

   11.2 SMI Timeout
   SMI stands for system management interrupt and is an interrupt that gets generated so the processor can service server management events (typically memory or PCI errors, or other forms of critical interrupts) in order to log them to the SEL. If this interrupt times out, the system is frozen.

   Table 78: SMI Timeout Sensor Typical Characteristics
   Byte  Field: Description
   11    Sensor Type: F3h = SMI Timeout
   12    Sensor Number: 06h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Ass
17. 13.4 Node Manager Alert Threshold Exceeded
   Policy Correction Time Exceeded Event will be sent each time the maintained policy power limit is exceeded over the Correction Time Limit.

   Table 88: Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 002Ch = ME Firmware
   11    Sensor Type: DCh = OEM
   12    Sensor Number: 1Bh
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 72h = OEM
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 10b = OEM code in Event Data 3
         [3] Node Manager Policy event:
             0 = Threshold exceeded
             1 = Policy Correction Time Exceeded. Policy did not meet the contract for the defined policy. The policy will continue to limit the power or shut down the platform based on the defined policy action.
         [2] Reserved
         [1:0] Threshold Number. Valid only if Byte 5 bit [3] is set to 0. 0 to 2 = Threshold index.
   15    Event Data 2:
         [7:4] Reserved
         [3:0] Domain Id (currently supports only one domain, Domain 0)
   16    Event Data 3: Policy ID

   13.4.1 Node Manager Alert
18. Byte  Field: Description
   16    Event Data 3:
         [7] Domain Instance Type:
             0b = Local memory sparing domain instance. This SEL pertains to a local memory sparing domain that is restricted to memory sparing pairs within a processor socket only.
             1b = Global memory sparing domain instance. This SEL pertains to a global memory sparing domain that pertains to memory sparing between processor sockets.
         [6:4] Reserved
         [3:0] 0-based Instance ID of this sparing domain

   Table 59: Sparing Redundancy State Sensor Event Trigger Offset Next Steps
   Event Trigger Offset (Hex, Description) / Description / Next Steps
   01h  Memory is configured in Spare Channel Mode and the memory is operating in the fully redundant state, with the spare channel inactive and available / System boots with spare channel mode active (one entry per processor) / Informational event
   00h  Memory is configured in Spare Channel Mode and the memory has lost redundancy and is operating in the degraded state, with the spare channel active and used to replace a failed channel / Spare channel replaces failing channel (one SEL entry for processor with failing memory to signify loss of redundancy) / This event should be accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace affected DIMM).
19. 7.2 ECC and Address Parity
   1. Memory data errors are logged as correctable or uncorrectable.
   2. Uncorrectable errors are fatal.
   3. Memory addresses are protected with parity bits, and a parity error is logged. This is a fatal error.

   7.2.1 Memory Correctable and Uncorrectable ECC Error
   ECC errors are divided into Uncorrectable ECC Errors and Correctable ECC Errors. A Correctable ECC Error actually represents a threshold overflow: more Correctable Errors are detected at the memory controller level for a given DIMM within a given timeframe. In both cases, the error can be narrowed down to particular DIMM(s). The BIOS SMI error handler uses this information to log the data to the BMC SEL and identify the failing DIMM module.

   Table 60: Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 0033h = BIOS SMI Handler
   11    Sensor Type: 0Ch = Memory
   12    Sensor Number: 02h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 6Fh = Sensor Specific
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 10b = OEM code in Event Data 3
         [3:0] Event Trigger Offset as described in Table 61
   15    Event Data 2:
         [7:2] Reserved. Set to 0.
         [1:0] The logical rank associated with the failed DDR3 DIMM
20. Steps (Front Panel Temp)
   Sensor Number / Sensor Name (name in SEL) / Details Section / Next Steps
   22h      IOH Thermal Margin (IOH Therm Margin) / Thermal Margin Sensors / Table 38: Thermal Margin Sensors Next Steps
   23h      Processor 1 Memory Thermal Margin (Mem P1 Thrm Mrgn) / Thermal Margin Sensors / Table 38: Thermal Margin Sensors Next Steps
   24h      Processor 2 Memory Thermal Margin (Mem P2 Thrm Mrgn) / Thermal Margin Sensors / Table 38: Thermal Margin Sensors Next Steps
   30h-39h  Fan Tachometer Sensors (chassis-specific sensor names) / Fan Speed Sensors / Table 28: Fan Speed Sensor Event Trigger Offset Next Steps
   40h-45h  Fan Present Sensors (Fan x Present) / Fan Presence and Redundancy Sensors / Table 30: Fan Presence Sensors Event Trigger Offset Next Steps
   46h      Fan Redundancy (Fan Redundancy) / Fan Presence and Redundancy Sensors / Table 32: Fan Redundancy Sensor Event Trigger Offset Next Steps
   50h      Power Supply 1 Status (PS1 Status) / Power Supply Status Sensors / Table 16: Power Unit Status Sensor Sensor Specific Offsets Next Steps
   51h      Power Supply 2 Status (PS2 Status) / Power Supply Status Sensors / Table 16: Power Unit Status Sensor Sensor Specific Offsets Next Steps
   52h      Power Supply 1 AC Power Input (PS1 Power In) / Power Supply AC Power Input Sensors / Table 22: Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
   53h      Power Supply 2 AC Power Input (PS2 Powe / Power Supply AC Power Input Sensors / Table 22: Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
21. Steps
   Hex, Description / Description / Next Steps
   01h  Failure / Power supply failed / Indicates a power supply failed.
        1. Remove and reapply AC.
        2. If the power supply still fails, replace it.
   02h  Predictive Failure / Typically means a fan inside the power supply is not cooling the power supply. It may indicate the fan is failing. / Replace the power supply.
   03h  AC lost / AC removed / Informational Event
   06h  Configuration error / Power supply configuration is not supported / Indicates that at least one of the supplies is not correct for your system configuration.
        1. Remove the power supply and verify compatibility.
        2. If the power supply is compatible, it may be faulty. Replace it.

   4.3.2 Power Supply AC Power Input Sensors
   These sensors will log an event when a power supply in the system is exceeding its AC power-in threshold.

   Table 21: Power Supply AC Power Input Sensors Typical Characteristics
   Byte  Field: Description
   11    Sensor Type: 0Bh = Other Units
   12    Sensor Number: 52h = Power Supply 1 AC Power Input; 53h = Power Supply 2 AC Power Input
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 01h = Threshold
   14    Event Data 1:
         [7:6] 01b = Trigger reading in Event Data 2
         [5:4] 01b = Trigger threshold in Event Data 3
         [3:0] Event Trigger Offset as described in Table 22
22. Threshold Exceeded Next Steps
   First occurrence of an unacknowledged event will be retransmitted no faster than every 300 milliseconds.
   First occurrence of Threshold exceeded event assertion/deassertion will be retransmitted no faster than every 300 milliseconds.
   Next steps depend on the policy that was set. See the Node Manager Specification for more details.

   13.5 ME Firmware Health Event
   This sensor is used in Platform Event messages to the BMC containing health information, including but not limited to firmware upgrade and application errors.

   Table 89: ME Firmware Health Event Sensor Typical Characteristics
   Byte  Field: Description
   8-9   Generator ID: 002Ch or 602Ch = ME Firmware
   11    Sensor Type: DCh = OEM
   12    Sensor Number: 17h
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 75h = OEM
   14    Event Data 1:
         [7:6] 10b = OEM code in Event Data 2
         [5:4] 10b = OEM code in Event Data 3
         [3:0] Health event type: 0h = Firmware Status
   15    Event Data 2: See Table 90
   16    Event Data 3: See Table 90

   13.5.1 ME Firmware Health Event Next Steps
   In the following table, Event Data 3 is only noted for specific errors.
23. air used to cool the system is within the thermal specifications for the system.

   5. Cooling Subsystem
   5.1 Fan Sensors
   There are three types of fan sensors that can be present on Intel Server Systems: speed, presence, and redundancy. The last two are only present in systems with hot-swap redundant fans.

   5.1.1 Fan Speed Sensors
   Fan speed sensors monitor the rpm signal on the relevant fan headers on the platform. Fan speed sensors are threshold-based sensors. Usually they only have lower (critical) thresholds set, so that a SEL entry is only generated if the fan spins too slowly.

   Table 27: Fan Speed Sensors Typical Characteristics
   Byte  Field: Description
   11    Sensor Type: 04h = Fan
   12    Sensor Number: 30h-39h (chassis specific)
   13    Event Direction and Event Type:
         [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 01h = Threshold
   14    Event Data 1:
         [7:6] 01b = Trigger reading in Event Data 2
         [5:4] 01b = Trigger threshold in Event Data 3
         [3:0] Event Trigger Offset as described in Table 28
   15    Event Data 2: Reading that triggered event
   16    Event Data 3: Threshold value that triggered event

   The following table describes the severity of each of the event triggers for both assertion and deassertion.
24. direction: 0b = Assertion Event, 1b = Deassertion Event
         [6:0] Event Type: 01h = Threshold
   14    Event Data 1:
         [7:6] 01b = Trigger reading in Event Data 2
         [5:4] 01b = Trigger threshold in Event Data 3
         [3:0] Event Triggers as described in Table 40
   15    Event Data 2: Reading that triggered event
   16    Event Data 3: Threshold value that triggered event

   Table 40: Processor Thermal Control Sensors Event Triggers Description
   Event Trigger (Hex, Description) / Assertion Severity / Deassert Severity / Description
   07h  Upper non-critical going high / Degraded / OK / The thermal margin has gone over its upper non-critical threshold.
   09h  Upper critical going high / non-fatal / Degraded / The thermal margin has gone over its upper critical threshold.

   Table 41: Processor Thermal Control Sensors Next Steps
   Sensor Number / Sensor Name / Next Steps
   64h  P1 Therm Ctl
   65h  P2 Therm Ctl
   Next Steps (both sensors): These events normally only happen due to failures of the thermal solution.
        1. Verify the heatsink is properly attached and has thermal grease.
        2. If the system has a heatsink fan, ensure the fan is spinning.
        3. Check all system fans are operating properly.
        4. Check that the air used to cool the system i
25. fan presence sensors and will warn when redundancy is lost. Typically the redundancy mode on Intel servers is an n+1 redundancy (if one fan fails there are still sufficient fans to cool the system, but it is no longer redundant), although other modes are also possible. Table 29. Fan Presence Sensors Typical Characteristics (Byte, Field: Description): Byte 11, Sensor Type: 04h = Fan. Byte 12, Sensor Number: 40h-45h (chassis specific). Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 08h = Generic 'digital' Discrete. Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 30. Byte 15, Event Data 2: Not used. Byte 16, Event Data 3: Not used. The following table describes the severity of each of the event triggers for both assertion and deassertion. Table 30. Fan Presence Sensors Event Trigger Offset Next Steps (Event Trigger Offset Hex, Description: Assertion Severity / Deassert Severity. Description. Next Steps): 01h, Device: OK / Degraded. Assertion: A fan was inserted. This Informational
26. is being reported downstream. 0Ch, Unexpected Completion Error: Indicates the device received a completion notification for a transaction it does not recognize. 0Dh, Received ERR_NONFATAL Message Error: Indicates a non-fatal error is redefined as fatal and is being reported. 8.1.3 Legacy PCI Errors. Legacy PCI errors include PERR and SERR; both are fatal errors. Table 67. Legacy PCI Error Sensor Typical Characteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler. Byte 11, Sensor Type: 13h = Critical Interrupt. Byte 12, Sensor Number: 03h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific. Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 68. Byte 15, Event Data 2: PCI Bus number. Byte 16, Event Data 3: [7:3] PCI Device number; [2:0] PCI Function number. Table 68. Legacy PCI Error Sensor Event Trigger Offset Next Steps (Event Trigger Offset Hex, Description: Description. Next Steps): 04h, PERR (Parity Error): PERR asserted. This is a fatal error. 1. Decode bus, device, and function to identify the card. 2. If th
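The "decode bus, device, and function" step can be sketched directly from the Event Data 2 and Event Data 3 layout in the excerpt above. This is an illustrative sketch only: the function names are hypothetical, and the `bb:dd.f` output format is an assumption chosen because it resembles what many operating system tools print, not something the guide prescribes.

```python
from typing import Tuple


def decode_pci_location(event_data2: int, event_data3: int) -> Tuple[int, int, int]:
    """Return (bus, device, function) from Event Data 2 and Event Data 3."""
    bus = event_data2 & 0xFF             # Event Data 2: PCI bus number
    device = (event_data3 >> 3) & 0x1F   # Event Data 3 [7:3]: PCI device number
    function = event_data3 & 0x07        # Event Data 3 [2:0]: PCI function number
    return bus, device, function


def format_bdf(bus: int, device: int, function: int) -> str:
    """Format as bb:dd.f in hex (an assumed, commonly used notation)."""
    return f"{bus:02x}:{device:02x}.{function}"


# Illustrative values only: bus 05h, Event Data 3 = 19h (device 3, function 1).
print(format_bdf(*decode_pci_location(0x05, 0x19)))
```

The resulting triple can then be matched against the device list reported by the operating system to identify the card.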
27. [...] 4.1 Voltage Sensors. 4.2 Power Unit. 4.2.1 Power Unit Status Sensor. 4.2.2 Power Unit Redundancy Sensor. 4.3 Power Supply. 4.3.1 Power Supply Status Sensors. 4.3.2 Power Supply AC Power Input Sensors. 4.3.3 Power Supply Current Output Sensors. 4.3.4 Power Supply Temperature Sensors. 5 Cooling Subsystem. 5.1 Fan Sensors. 5.1.1 Fan Speed Sensors. 5.1.2 Fan Presence and Redundancy Sensors. 5.2 Temperature Sensors. 5.2.1 Regular Temperature Sensors. 5.2.2 Thermal Margin Sensors. 5.2.3 Processor Thermal Control Sensors. 5.2.4 Discrete Thermal Sensors. 6 Processor Subsystem. 6.1 Processor Status Sensor. 6.2 Catastrophic Error Sensor. 6.2.1 Catastrophic Error Sensor Next Steps. 6.3 CPU Missing Sensor. 6.3.1 CPU
28. non-critical threshold. 4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C). 09h, Upper critical going high: non-fatal / Degraded. The temperature has gone over its upper critical threshold. 12.2 HSC Drive Slot Status Sensor. The HSC Drive Slot Status sensor provides the current status for drives in each of the slots. Table 83. HSC Drive Slot Status Sensor Typical Characteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 00C0h = HSC Firmware HSBP A; 00C2h = HSC Firmware HSBP B. Byte 11, Sensor Type: 0Dh = Drive Slot (Bay). Byte 12, Sensor Number: 6-Slot HSBP: 02h = Drive Slot 0 Status, 03h = Drive Slot 1 Status, 04h = Drive Slot 2 Status, 05h = Drive Slot 3 Status, 06h = Drive Slot 4 Status, 07h = Drive Slot 5 Status. 8-Slot HSBP: 02h = Drive Slot 0 Status, 03h = Drive Slot 1 Status, 04h = Drive Slot 2 Status, 05h = Drive Slot 3 Status, 06h = Drive Slot 4 Status, 07h = Drive Slot 5 Status, 08h = Drive Slot 6 Status, 09h = Drive Slot 7 Status. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific. 40h Failed Drive. Byte 14, Event Data 1. Byte 15, Event Data
29. Present. Description (continued): ... event may also get logged when the BMC initializes (when AC is applied). Deassert: A fan was removed or was not present at the expected location when the BMC initialized. Next steps (continued): ... only. These events only get generated in systems with hot-swappable fans, and normally only when a fan is physically inserted or removed. If fans were not physically removed: 1. Use the Quick Start Guide to check whether the right fan headers were used. 2. Swap the fans round to see whether the problem stays with the location or follows the fan. 3. Replace the fan or fan wiring/housing depending on the outcome of step 2. 4. Ensure the latest FRUSDR update has been run and the correct chassis was detected or selected. Table 31. Fan Redundancy Sensors Typical Characteristics (Byte, Field: Description): Byte 11, Sensor Type: 04h = Fan. Byte 12, Sensor Number: 46h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 0Bh = Generic Discrete. Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 32. Byte 15, Event Data 2: Not used. Byte 16, Event Data 3: Not used. The following table describes the severity of eac
30. reset, power down, power cycle, or generate a critical interrupt. Table 76. IPMI Watchdog Sensor Typical Characteristics (Byte, Field: Description): Byte 11, Sensor Type: 23h = Watchdog 2. Byte 12, Sensor Number: 03h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific. Byte 14, Event Data 1: [7:6] 11b = Sensor-specific event extension code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 77. Byte 15, Event Data 2: [7:4] Interrupt type: 0h = None, 1h = SMI, 2h = NMI, 3h = Messaging Interrupt, Fh = Unspecified, all other = Reserved; [3:0] Timer use at expiration: 0h = Reserved, 1h = BIOS FRB2, 2h = BIOS POST, 3h = OS Load, 4h = SMS/OS, 5h = OEM, Fh = Unspecified, all other = Reserved. Byte 16, Event Data 3: Not used. Table 77. IPMI Watchdog Sensor Event Trigger Offset Next Steps (Event Trigger Offset Hex, Description: Description. Next Steps): 00h, Timer expired, status only; 01h, Hard reset; 02h [...]. Description: Our server systems support a BMC watchdog timer which can [...]. Next steps: If this event is being logged it is because the BMC has been configured to check the
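The two nibbles of Event Data 2 in the IPMI Watchdog table above lend themselves to a small lookup. The sketch below is illustrative; the dictionary and function names are hypothetical, and any value not listed in the table is reported as "Reserved".

```python
# Event Data 2 [7:4]: interrupt type
INTERRUPT_TYPES = {
    0x0: "None",
    0x1: "SMI",
    0x2: "NMI",
    0x3: "Messaging Interrupt",
    0xF: "Unspecified",
}

# Event Data 2 [3:0]: timer use at expiration (0h is reserved)
TIMER_USES = {
    0x1: "BIOS FRB2",
    0x2: "BIOS POST",
    0x3: "OS Load",
    0x4: "SMS/OS",
    0x5: "OEM",
    0xF: "Unspecified",
}


def decode_watchdog_event_data2(event_data2: int) -> tuple:
    """Return (interrupt type, timer use) for a watchdog SEL entry."""
    interrupt = INTERRUPT_TYPES.get((event_data2 >> 4) & 0xF, "Reserved")
    timer_use = TIMER_USES.get(event_data2 & 0xF, "Reserved")
    return interrupt, timer_use


# Illustrative value only: 04h = no interrupt, timer was in use by SMS/OS.
print(decode_watchdog_event_data2(0x04))
```

Knowing which timer use expired (for example BIOS POST versus SMS/OS) narrows down which software stopped resetting the watchdog.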
31. Byte 16, Event Data 3: [7:5] Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached: 000b = Processor Socket 1; 001b = Processor Socket 2; all other values are reserved. [4:3] Indicates the processor Memory Channel to which the failing DDR3 DIMM is attached: 00b = Channel A (for Processor Socket 1) or D (for Processor Socket 2); 01b = Channel B or E; 10b = Channel C or F; 11b is reserved. [2:0] Indicates the DIMM Socket on the channel to which the failing DDR3 DIMM is attached: 000b = DIMM Socket 1; 001b = DIMM Socket 2; all other values are reserved. Table 61. Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps (Event Trigger Offset Hex, Description: Description. Next Steps): 01h, Uncorrectable ECC Error: An uncorrectable (multi-bit) ECC error has occurred. This is a fatal issue that will typically lead to an OS crash (unless memory has been configured in a RAS mode). The system will generate a CATERR (catastrophic error) and an MCE (Machine Check Exception Error). Next steps: 1. If needed, decode DIMM location from hex version of SEL. 2. Verify the DIMM is seated properly. 3. [...] the DIMM [...]. 4. Inspect the processor socket this DIMM is connected to for bent pins. While the error may be due to a fai
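The "decode DIMM location from hex version of SEL" step can be sketched from the Event Data 3 bit fields in the excerpt above. This is a minimal illustration with hypothetical names; it raises an error on encodings the table marks as reserved rather than guessing.

```python
# Channel letter by (socket bits, channel bits): A/B/C on Processor Socket 1,
# D/E/F on Processor Socket 2, as listed for Event Data 3 [4:3].
CHANNELS = {
    0b000: {0b00: "A", 0b01: "B", 0b10: "C"},
    0b001: {0b00: "D", 0b01: "E", 0b10: "F"},
}


def decode_dimm_location(event_data3: int) -> dict:
    """Decode processor socket, memory channel, and DIMM socket from Event Data 3."""
    socket_bits = (event_data3 >> 5) & 0x7
    channel_bits = (event_data3 >> 3) & 0x3
    dimm_bits = event_data3 & 0x7
    if socket_bits not in CHANNELS or channel_bits > 0b10 or dimm_bits > 0b001:
        raise ValueError(f"reserved encoding in Event Data 3: {event_data3:#04x}")
    return {
        "processor_socket": socket_bits + 1,   # 000b = Processor Socket 1
        "channel": CHANNELS[socket_bits][channel_bits],
        "dimm_socket": dimm_bits + 1,          # 000b = DIMM Socket 1
    }


# Illustrative value only: 29h = 001 01 001b.
print(decode_dimm_location(0x29))
```

For the sample byte 29h this yields Processor Socket 2, Channel E, DIMM Socket 2.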
32. 2 Therm Margin. 64h, Processor 1 Thermal Control (P1 Therm Ctrl): Processor Thermal Control Sensors; Table 41 Processor Thermal Control Sensors Next Steps. 65h, Processor 2 Thermal Control (P2 Therm Ctrl): Processor Thermal Control Sensors; Table 41 Processor Thermal Control Sensors Next Steps. 66h, Processor 1 VRD Temp (P1 VRD Hot): Discrete Thermal Sensors; Table 43 Discrete Thermal Sensors. 67h, Processor 2 VRD Temp (P2 VRD Hot): Discrete Thermal Sensors; Table 43 Discrete Thermal Sensors. 68h, Catastrophic Error (CATERR): Catastrophic Error Sensor; Catastrophic Error Sensor Next Steps. 69h, CPU Missing (CPU Missing): CPU Missing Sensor; CPU Missing Sensor Next Steps. 6Ah, IOH Thermal Trip (IOH Thermal Trip): Discrete Thermal Sensors; Table 43 Discrete Thermal Sensors. 3.2 BIOS POST owned Sensors (GID = 0001h). The following table can be used to find the details of sensors owned by BIOS POST. Table 6. BIOS POST owned Sensors (Sensor Number, Sensor Name: Details Section; Next Steps): 01h, Mirroring Redundancy State: Mirrored Redundancy State Sensor; Table 55 Mirrored Redundancy State Sensor Event Trigger Offset Next Steps. 06h, POST Error: System Firmware Progress Form
33. 3. Cross-test the processor if possible. Check the processor is installed correctly. Inspect the socket for bent pins. 7 Memory Subsystem. Intel servers report memory errors, status, and configuration in the SEL. 7.1 Memory RAS (Mirroring and Sparing). Memory RAS Configuration Status refers to the BIOS sending the current RAS mode and RAS operational state to the BMC to log into the SEL as a SEL record. This allows a remote software application to query and retrieve the system memory state. The memory configuration state sensors are virtual sensors. In other words, these sensors are owned and controlled completely by the BIOS, independently of the BMC. The RAS configuration and state definitions are aligned with the definitions within the Intelligent Platform Management Interface Specification, Version 2.0. Accordingly, these sensors are read as Status and Redundancy sensors (Event/Reading Type 0x09 and 0x0B respectively). Sensor Number 12h, Event Type 0x09: Mirroring Configuration Status. Sensor Number 01h, Event Type 0x0B: Mirroring Redundancy State. Sensor Number 13h, Event Type 0x09: Sparing Configuration Status. Sensor Number 11h, Event Type 0x0B: Sparing Redundancy State. 7.1.1 Mirroring Configuration Status. This sensor provides the Mi
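The four sensor number and event type pairs listed in the Memory RAS excerpt above can be captured in a small lookup table. The sketch below is illustrative only; the names are hypothetical, and it assumes the caller passes the raw SEL byte 13, whose bit 7 carries the event direction and is masked off before the lookup.

```python
# (sensor number, event/reading type) -> sensor name, as listed for the
# BIOS-owned memory RAS sensors.
RAS_SENSORS = {
    (0x12, 0x09): "Mirroring Configuration Status",
    (0x01, 0x0B): "Mirroring Redundancy State",
    (0x13, 0x09): "Sparing Configuration Status",
    (0x11, 0x0B): "Sparing Redundancy State",
}


def identify_ras_sensor(sensor_number: int, byte13: int):
    """Return the RAS sensor name, or None if the pair is not a RAS sensor."""
    event_type = byte13 & 0x7F  # drop the assertion/deassertion bit
    return RAS_SENSORS.get((sensor_number, event_type))


# Illustrative values only: sensor 11h with a deasserted 0Bh event type.
print(identify_ras_sensor(0x11, 0x8B))
```

Keying on both values matters because sensor numbers are only unique per generator; the same number can belong to an unrelated sensor owned by another controller.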
34. Correctable errors; Table 64 PCI Express Correctable Error Sensor Event Trigger Offset Next Steps. 06h, Intel QuickPath Interface Correctable Error: QPI Correctable Error Sensor; QPI Correctable Error Sensor Next Steps. 07h, Intel QuickPath Interface Non-fatal Error: QPI Non-Fatal Error Sensor; QPI Non-Fatal Error Sensor Next Steps. 14h, Memory Address Parity Error: Memory Address Parity Error; Memory Address Parity Error Sensor Next Steps. 17h, Intel QuickPath Interface Fatal Error: QPI Fatal and Fatal #2; QPI Fatal and Fatal #2 Next Steps. 18h, Intel QuickPath Interface Fatal2 Error: QPI Fatal and Fatal #2; QPI Fatal and Fatal #2 Next Steps. 83h, System Event: System Events; Not applicable. 3.4 Hot Swap Controller Firmware owned Sensors (GID = 00C0h/00C2h). The following table can be used to find the details of sensors owned by the Hot Swap Controller (HSC) firmware. The HSC firmware resides on a Hot Swap Back Plane (HSBP). There can be up to two HSBPs in a system. Each HSBP will have its own GID: 00C0h = HSC Firmware HSBP A; 00C2h = HSC Firmware HSBP B. Table 8. Hot Swap Controller Firmware owned Sensors (Sensor Number, Sensor Name: Details Section; Next Steps): 01h, Backplane Temperature: H
35. Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 59. Byte 15, Event Data 2: [7:4] If Domain Instance Type (ED3) is set to Local, this field specifies the 0-based Socket ID of the processor that contains the sparing domain local sub-instances. A value of 1110b indicates that the sparing configuration specified in Bits [3:0] applies globally to all sockets in the system. If Domain Instance Type (ED3) is set to Global, this field specifies the 0-based Socket ID of the second participant processor in this sparing domain global instance. A value of 1111b indicates that this field is unused and does not contain valid data. [3:0] If Domain Instance Type (ED3) is set to Local, this field specifies the sparing domain local sub-instances (which channels are included in this sub-instance): 0000b = Reserved; 0001b = {Ch A, Ch B, Ch C} (only configuration possible on Intel S5500/S5520 Server Boards); 0010b-1110b = Reserved. If Domain Instance Type (ED3) is set to Global, this field specifies the 0-based Socket ID of the first participant processor in this sparing domain global instance. A value of 1111b indicates that this field is unused and does not contain valid data.
36. Event Data 3: Not used. Table 57. Sparing Configuration Status Sensor Event Trigger Offset Next Steps (Event Trigger Offset Hex, Description: Description. Next Steps): 01h, The system has configured into Spare Channel RAS mode: Sparing mode is enabled in setup. Informational event only. 00h, The system has configured out of Spare Channel RAS mode: Sparing mode is disabled (either from setup or due to error, in which case post error 8500 also occurs). Next steps: 1. If this event is accompanied by a post error 8500, there was a problem applying the sparing configuration to the memory. Check for other errors related to the memory and troubleshoot accordingly. 2. If there is no post error, then sparing mode was simply disabled in BIOS setup and this should be considered informational only. 7.1.4 Sparing Redundancy State Sensor. This sensor provides the RAS Redundancy state for the Spare Channel Mode. Table 58. Sparing Redundancy State Sensor Typical Characteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 0001h = BIOS POST. Byte 11, Sensor Type: 0Ch = Memory. Byte 12, Sensor Number: 11h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 0Bh = Generic Discrete. 14
37. Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific. Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset: 1h = Runtime Critical Stop (that is, core dump or blue screen). Byte 15, Event Data 2: The second byte of the panic string. Byte 16, Event Data 3: The third byte of the panic string. Table 99. Linux Kernel Panic String Extended Record Characteristics (Byte, Field: Description): Bytes 1-2, Record ID: ID used for SEL Record access. Byte 3, Record Type: [7:0] F0h = OEM non-timestamped. Bytes 4-16: OEM defined. Byte 4, Slave Address: The slave address of the card saving the panic. Byte 5, Sequence Number: A sequence number (starting at zero). Bytes 6-16, Kernel Panic Data: These hold the panic string. If the panic string is longer than 11 bytes, multiple messages will be sent with increasing sequence numbers.
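Reassembling a panic string from the extended records in Table 99 amounts to ordering the F0h records by sequence number and concatenating bytes 6-16 of each. The following is a minimal sketch under stated assumptions: each record is available as a raw 16-byte value, unused trailing bytes are zero-padded, and the text is ASCII. None of these details, nor the helper names or the sample slave address 20h, come from the guide.

```python
def reassemble_panic_string(records) -> str:
    """Join the Kernel Panic Data of OEM F0h records in sequence-number order."""
    chunks = {}
    for rec in records:
        # Table byte N is index N-1: byte 3 = record type, byte 5 = sequence number.
        if len(rec) != 16 or rec[2] != 0xF0:
            continue
        chunks[rec[4]] = rec[5:16]  # bytes 6-16: up to 11 bytes of panic string
    data = b"".join(chunks[seq] for seq in sorted(chunks))
    # Assumption: unused trailing bytes are zero-padded.
    return data.rstrip(b"\x00").decode("ascii", errors="replace")


def make_record(record_id: int, seq: int, payload: bytes) -> bytes:
    """Build a sample record for the demo (hypothetical helper)."""
    header = record_id.to_bytes(2, "little") + bytes([0xF0, 0x20, seq])
    return header + payload.ljust(11, b"\x00")


message = b"Kernel panic - not syncing"
parts = [message[i:i + 11] for i in range(0, len(message), 11)]
# Deliberately out of order to show that the sequence number drives reassembly.
sample = [make_record(0x10 + n, n, part) for n, part in enumerate(parts)][::-1]
print(reassemble_panic_string(sample))
```

Sorting by sequence number rather than by Record ID is a design choice here, since the table defines the sequence number as the ordering field for the string fragments.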
38. (Sensor Number, Sensor Name: Details Section; Next Steps) 09h, Drive Slot 7 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 0Ah, Drive Slot 0 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Bh, Drive Slot 1 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Ch, Drive Slot 2 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Dh, Drive Slot 3 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Eh, Drive Slot 4 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Fh, Drive Slot 5 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 10h, Drive Slot 6 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 11h, Drive Slot 7 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 3.5 Node Manager/ME Firmware owned Sensors (GID = 002Ch or 602Ch). The following table can be used to find the details of sensors owned by the Node Manager/Management Engine (ME) firmware. Table 9. Management Engine Firmware owned Sensors (Sensor Number, Sensor Name: Details Section; Next Steps): 17h, ME Firmware Health Events: ME Firmware Health Event; ME Firmware Health Event Next Steps. 18h, No
39. Missing Sensor Next Steps. 6.4 QuickPath Interconnect Error Sensors. 6.4.1 QPI Correctable Error Sensor. 6.4.2 QPI Non-Fatal Error Sensor. 6.4.3 QPI Fatal and Fatal #2. 7 Memory Subsystem. 7.1 Memory RAS (Mirroring and Sparing). 7.1.1 Mirroring Configuration Status. 7.1.2 Mirrored Redundancy State Sensor. 7.1.3 Sparing Configuration Status. 7.1.4 Sparing Redundancy State Sensor. 7.2 ECC and Address Parity. 7.2.1 Memory Correctable and Uncorrectable ECC Error. 7.2.2 Memory Address Parity Error. 8 PCI Express and Legacy PCI Subsystem. 8.1 PCI Express Errors. 8.1.1 PCI Express Correctable Errors. 8.1.2 PCI Express Fatal Errors. 8.1.3 Legacy PCI Errors. 9 System BIOS Events. 9.1 System Events. 9.1.1 System Boot. 9.1.2 Timestamp Clock Synchronization
40. List of Tables. Table 1. [...] Table 2. Event Request Message Event Data Field Contents. Table 3. OEM SEL Record (Type C0h-DFh). Table 4. OEM SEL Record (Type E0h-FFh). Table 5. BMC owned Sensors. Table 6. BIOS POST owned Sensors. Table 7. BIOS SMI owned Sensors. Table 8. Hot Swap Controller Firmware owned Sensors. Table 9. Management Engine Firmware owned Sensors. Table 10. Microsoft OS owned Events. Table 11. Linux Kernel Panic Events. Table 12. Voltage Sensors Typical Characteristics. Table 13. Voltage Sensors Event Triggers Description. Table 14. Voltage Sensors Next Steps. Table 15. Power Unit Status Sensors Typical Characteristics. Table 16. Power Unit Status Sensor - Sensor Specific Offsets - Next Steps. Table 17. Power Unit Redundancy Sensors Typical Characteristics. Table 18. Power Unit Redundancy Sensor Event Trigger Offset Next Steps. Table 19. Power Supply Status S
41. SC Backplane Temperature Sensor; Table 82 HSC Backplane Temperature Sensor Event Trigger Offset Next Steps. 02h, Drive Slot 0 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 03h, Drive Slot 1 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 04h, Drive Slot 2 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 05h, Drive Slot 3 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 06h, Drive Slot 4 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 07h, Drive Slot 5 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps. 6-Slot HSBP: 08h, Drive Slot 0 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 09h, Drive Slot 1 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Ah, Drive Slot 2 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Bh, Drive Slot 3 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Ch, Drive Slot 4 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 0Dh, Drive Slot 5 Presence: HSC Drive Presence Sensor; HSC Drive Presence Sensor Next Steps. 8-Slot HSBP: 08h, Drive Slot 6 Status: HSC Drive Slot Status Sensor; HSC Drive Slot Status Sensor Next Steps.
42. SC Firmware HSBP A; 00C2h = HSC Firmware HSBP B. Byte 11, Sensor Type: 01h = Temperature. Byte 12, Sensor Number: 01h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 01h = Threshold. Byte 14, Event Data 1: [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Trigger Offset as described in Table 82. Byte 15, Event Data 2: Reading that triggered event. Byte 16, Event Data 3: Threshold value that triggered event. Table 82. HSC Backplane Temperature Sensor Event Trigger Offset Next Steps (Event Trigger Hex, Description: Assertion Severity / Deassert Severity. Description). Next steps, shared by all rows: 1. Check for clear and unobstructed airflow into and out of the chassis. 2. Ensure the SDR is programmed and correct chassis has been selected. 3. Ensure there are no fan failures. 00h, Lower non-critical going low: Degraded / OK. The temperature has dropped below its lower non-critical threshold. 02h, Lower critical going low: non-fatal / Degraded. The temperature has dropped below its lower critical threshold. 07h, Upper non-critical: Degraded / OK. The temperature has gone over its upper
43. Sensors Next Steps (BB 1.5V P2 DDR3). 15h, BB 1.8V AUX (BB 1.8V AUX): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 16h, BB 3.3V (BB 3.3V): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 17h, BB 3.3V STBY (BB 3.3V STBY): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 18h, BB 3.3V Vbat (BB 3.3V Vbat): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 19h, BB 5.0V (BB 5.0V): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 1Ah, BB 5.0V STBY (BB 5.0V STBY): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 1Bh, BB 12.0V (BB 12.0V): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 1Ch, BB 12.0V (BB 12.0V): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 1Dh, BB 1.35V P1 LV DDR3 (BB 1.35v P1 MEM): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 1Eh, BB 1.35V P2 LV DDR3 (BB 1.35v P2 MEM): Voltage Sensors; Table 14 Voltage Sensors Next Steps. 20h, Baseboard Temperature (Baseboard Temp): Regular Temperature Sensors; Table 35 Temperature Sensors Next Steps. 21h, Front Panel Temperature: Regular Temperature Sensors; Table 35 Temperature Sensors Next
44. System Event Log Troubleshooting Guide for Intel S5500/S3420 Series Server Boards. Intel order number G74211-002. Revision 1.1, December 2013. Platform Collaboration and Systems Division - Marketing. Revision History (Date, Revision Number: Modifications): August 2012: Initial draft. December 2013, 1.1: Corrected IPMI Watchdog and PEF Sensors Typical Characteristics tables. Clarified Channel designators for DIMM memory errors. Added ME sensor 17h. Disclaimers. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or
45. (Sensor Number, Sensor Name: Next Steps) 1Dh, BB 1.35 P1 Mem: This 1.35V line is supplied by the main board. This 1.35V line is used by low-voltage memory on processor 1. 1. Ensure all cables are connected correctly. 2. Check the DIMMs are seated properly. 3. Cross-test the DIMMs. 4. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM. 1Eh, BB 1.35 P2 Mem: This 1.35V line is supplied by the main board. This 1.35V line is used by low-voltage memory on processor 2. 1. Ensure all cables are connected correctly. 2. Check the DIMMs are seated properly. 3. Cross-test the DIMMs. 4. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM. 4.2 Power Unit. The power unit monitors the power state of the system and logs the state changes in the SEL. 4.2.1 Power Unit Status Sensor. The power unit status sensor monitors the power state of the system and logs state changes. Expected power-on events such as DC ON/OFF are logged, and unexpected events such as AC loss and power good loss are also logged. Table 15. Power Unit Status Sensors Typical Characteristics (Byte, Field: Description): Byte 11, Sensor Type: 09h = Power Unit. Byte 12, Sensor Number: 01h.
46. and continue sequentially to n, the number of entries in the SEL. Bytes 12-15, Bug Check/Blue Screen Data: The first record of this type will contain the Bug Check/Blue Screen Stop code and will be followed by the four Bug Check/Blue Screen parameters, LSB first. Note that each of the Bug Check/Blue Screen parameters requires two records. Both of the two records for each parameter will have the same Record ID. There will be a total of 9 records. Byte 16, Operating system type: 00 = 32-bit OS; 01 = 64-bit OS. 15 Linux Kernel Panic Records. The OpenIPMI driver supports the ability to put semi-custom and custom events in the system event log if a panic occurs. If you enable the "Generate a panic event to all BMCs on a panic" option, you will get one event on a panic in a standard IPMI event format. If you enable the "Generate OEM events containing the panic string" option, you will also get a set of OEM events holding the panic string. Table 98. Linux Kernel Panic Event Record Characteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 0021h = Kernel. Byte 10, EvM Rev: 03h = IPMI 1.0 format. Byte 11, Sensor Type: 20h = OS Stop/Shutdown. Byte 12, Sensor Number: The first byte of the panic string (0 if no panic string). Byte 13, Event Direction and Event Type: [7]
47. aracteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 002Ch = ME Firmware. Byte 11, Sensor Type: DCh = OEM. Byte 12, Sensor Number: 18h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 72h = OEM. Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3] Node Manager Policy event: 0 = Reserved, 1 = Policy Correction Time Exceeded (the policy did not meet the contract for the defined policy; the policy will continue to limit the power or shut down the platform based on the defined policy action); [2] Reserved; [1:0] 00b. Byte 15, Event Data 2: [7:4] Reserved; [3:0] Domain Id (currently supports only one domain, Domain 0). Byte 16, Event Data 3: Policy Id. 13.1.1 Node Manager Exception Event Next Steps. This is an informational event. Next steps depend on the policy that was set. See the Node Manager Specification for more details. 13.2 Node Manager Health Event. A Node Manager Health Event message provides a runtime error indication about Intel Intelligent Power Node Manager's health. Types of service that can send an error are defined as follows: Misconfigured policy. Error reading power data. Err
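The three event data bytes of the Node Manager Exception Event described above can be unpacked as follows. This is an illustrative sketch with hypothetical names; the sample value A8h for Event Data 1 is simply what the listed bit fields produce when bit 3 is set (10b, 10b, 1, 0, 00b).

```python
def decode_nm_exception_event(event_data1: int, event_data2: int, event_data3: int) -> dict:
    """Decode the Node Manager Exception Event data bytes."""
    return {
        # Event Data 1 [3]: 1 = Policy Correction Time Exceeded
        "policy_correction_time_exceeded": bool(event_data1 & 0x08),
        # Event Data 2 [3:0]: Domain Id (upper nibble is reserved)
        "domain_id": event_data2 & 0x0F,
        # Event Data 3: Policy Id
        "policy_id": event_data3 & 0xFF,
    }


# Illustrative values only: policy 3 in domain 0 exceeded its correction time.
print(decode_nm_exception_event(0xA8, 0x00, 0x03))
```

The decoded Policy Id identifies which configured policy to review when deciding on next steps.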
48. b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 4 = PEF Action. Byte 15, Event Data 2: [7:6] Reserved; [5] 1b = Diagnostic Interrupt (NMI); [4] 1b = OEM action; [3] 1b = Power cycle; [2] 1b = Reset; [1] 1b = Power off; [0] 1b = Alert. Byte 16, Event Data 3: Not used. System Event PEF Action Next Steps. This event gets logged if the BMC takes an action due to PEF configuration. Actions can be sending an alert, or resetting, power cycling, or powering down the system. There will be another event that has led to the action, so you should investigate the SEL and PEF settings to identify this event and troubleshoot accordingly. 12 Hot Swap Controller Events. The Hot Swap Controller (HSC) implements the same basic sensor model that is utilized by the other management controllers in the system. Sensor model information is contained in the document Intelligent Platform Management Interface Specification. A common set of IPMI commands is used for configuring the sensors and returning threshold status. 12.1 HSC Backplane Temperature Sensor. There is a thermal sensor on the Hot Swap Backplane to measure the ambient temperature. Table 81. HSC Backplane Temperature Sensor Typical Characteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 00C0h = H
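The Event Data 2 bit assignments listed above for the PEF Action event form a bitmask, so more than one action can be reported in a single entry. A minimal sketch of decoding it, with hypothetical names and an illustrative sample value:

```python
# Event Data 2 bit positions for the System Event (PEF Action) entry.
PEF_ACTIONS = [
    (5, "Diagnostic Interrupt (NMI)"),
    (4, "OEM action"),
    (3, "Power cycle"),
    (2, "Reset"),
    (1, "Power off"),
    (0, "Alert"),
]


def decode_pef_actions(event_data2: int) -> list:
    """Return the names of all PEF actions whose bit is set (bits 7:6 are reserved)."""
    return [name for bit, name in PEF_ACTIONS if event_data2 & (1 << bit)]


# Illustrative value only: 05h = Reset plus Alert.
print(decode_pef_actions(0x05))
```

Once the action is known, the triggering event still has to be found elsewhere in the SEL, as the next steps above describe.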
49. ces are mapped in the operating system by bus, device, and function. Each device is uniquely identified by the bus, device, and function. PCIe device information can be found in the operating system. 8.1.1 PCI Express Correctable Errors. When a PCI Express correctable error is reported to the BIOS SMI handler, it will record the error using the following format. Table 63. PCI Express Correctable Error Sensor Typical Characteristics (Byte, Field: Description): Bytes 8-9, Generator ID: 0033h = BIOS SMI Handler. Byte 11, Sensor Type: 13h = Critical Interrupt. Byte 12, Sensor Number: 05h. Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 71h = OEM Specific. Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 64. Byte 15, Event Data 2: PCI Bus number. Byte 16, Event Data 3: [7:3] PCI Device number; [2:0] PCI Function number. Table 64. PCI Express Correctable Error Sensor Event Trigger Offset Next Steps (Event Trigger Offset Hex, Description: Description. Next Steps): 00h, Receiver error: Correctable error occurred. Informationa
50. cy Sensor Event Trigger Offset Next Steps / (Pwr Unit Redund)
03h / IPMI Watchdog / IPMI Watchdog / Table 77. IPMI Watchdog Sensor Event Trigger Offset Next Steps / (IPMI Watchdog)
04h / Physical Security / Physical Security / Table 73. Physical Security Sensor Event Trigger Offset Next Steps / (Physical Scrty)
05h / FP Interrupt / FP NMI Interrupt / FP NMI Interrupt Next Steps / (FP NMI Diag Int)
06h / SMI Timeout / SMI Timeout / SMI Timeout Next Steps / (SMI Timeout)
07h / System Event Log / System Event Log Cleared / Not applicable / (System Event Log)
08h / System Event / System Event PEF Action / System Event PEF Action Next Steps / (System Event)
09h / Button Press Event / Button Press Events / Not applicable / (Button Press)
Sensor Number / Sensor Name / Details Section / Next Steps
10h / BB 1.1V IOH / Voltage Sensors / Table 14. Voltage Sensors Next Steps / (BB 1.1V IOH)
11h / BB 1.1V P1 Vccp / Voltage Sensors / Table 14. Voltage Sensors Next Steps / (BB 1.1V P1 Vccp)
12h / BB 1.1V P2 Vccp / Voltage Sensors / Table 14. Voltage Sensors Next Steps / (BB 1.1V P2 Vccp)
13h / BB 1.5V P1 DDR3 / Voltage Sensors / Table 14. Voltage Sensors Next Steps / (BB 1.5V P1 DDR3)
14h / BB 1.5V P2 DDR3 / Voltage Sensors / Table 14. Voltage
51. de Manager Exception Events / Node Manager Exception Event / Node Manager Exception Event Next Steps
19h / Node Manager Health Events / Node Manager Health Event / Node Manager Health Event Next Steps
1Ah / Node Manager Operational Capabilities Change Events / Node Manager Operational Capabilities Change / Node Manager Operational Capabilities Change Next Steps
1Bh / Node Manager Alert Threshold Exceeded Events / Node Manager Alert Threshold Exceeded / Node Manager Alert Threshold Exceeded Next Steps
3.6 Microsoft OS owned Events GID 0041
The following table can be used to find the details of records that are owned by the Microsoft Operating System (OS).
Table 10. Microsoft OS owned Events
Sensor Name / Record Type / Sensor Type / Details Section / Next Steps
Boot Event / 02h / 1Fh = OS Boot / Table 91. Boot up Event Record Typical Characteristics / Not applicable
DCh / Not applicable / Table 92. Boot up OEM Event Record Typical Characteristics
Shutdown Event / 02h / 20h = OS Stop/Shutdown / Table 93. Shutdown Reason Code Event Record Typical Characteristics / Not applicable
DDh / Not applicable / Table 94. Shutdown Reason OEM Event Record Typical Characteristics / Not applicable / Table 95. Shutdown Comment OEM Event Record Typical Characteristics
Bug Check Blue Scre
52. death SHOULD YOU PURCHASE OR USE INTEL S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES SUBCONTRACTORS AND AFFILIATES AND THE DIRECTORS OFFICERS AND EMPLOYEES OF EACH HARMLESS AGAINST ALL CLAIMS COSTS DAMAGES AND EXPENSES AND REASONABLE ATTORNEYS FEES ARISING OUT OF DIRECTLY OR INDIRECTLY ANY CLAIM OF PRODUCT LIABILITY PERSONAL INJURY OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN MANUFACTURE OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS Intel may make changes to specifications and product descriptions at any time without notice Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them The information here is subject to change without notice Do not finalize a design with this information The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications Current characterized errata are available on request Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order Copies of documents wh
53. direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 08h = digital Discrete
Byte 14 (Event Data 1): [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 0h = Device Removed / Device Absent, 1h = Device Inserted / Device Present
Byte 15 (Event Data 2): Not used
Byte 16 (Event Data 3): Not used
12.3.1 HSC Drive Presence Sensor Next Steps
On AC power-on, the drive presence is logged as an informational event. If a drive is removed or installed during normal operation, an event is also logged. If a drive removed or drive installed event appears without operator intervention, ensure that the drive was seated properly and the drive carrier was properly latched.
13 Manageability Engine (ME) Events
The Manageability Engine controls the PECI interface and also contains the Node Manager functionality.
13.1 Node Manager Exception Event
A Node Manager Exception Event is sent each time a maintained policy power limit is exceeded over the Correction Time Limit.
Table 85. Node Manager Exception Sensor Typical Ch
54. ed event
Table 34. Temperature Sensors Event Triggers Description
Event Trigger Hex / Event Trigger Description / Assertion Severity / Deassert Severity / Description
00h / Lower non-critical going low / Degraded / OK / The temperature has dropped below its lower non-critical threshold.
02h / Lower critical going low / non-fatal / Degraded / The temperature has dropped below its lower critical threshold.
07h / Upper non-critical going high / Degraded / OK / The temperature has gone over its upper non-critical threshold.
09h / Upper critical going high / non-fatal / Degraded / The temperature has gone over its upper critical threshold.
Table 35. Temperature Sensors Next Steps
Sensor Name / Sensor Number / Next Steps
Baseboard Temp / 20h / 1. Check for clear and unobstructed airflow into and out of the chassis. 2. Ensure the SDR is programmed and the correct chassis has been selected. 3. Ensure there are no fan failures. 4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).
Front Panel Temp / 21h / If the front panel temperature reads zero, check that: 1. It is connected properly. 2. The FRUSDR has been programmed correctly
55. em BIOS Events
10 Chassis Subsystem
The BMC monitors several aspects of the chassis. In addition to logging when the power and reset buttons are pressed, the BMC monitors chassis intrusion (if a chassis intrusion switch is included in the chassis) and the network connections, logging an event whenever the physical network link is lost.
10.1 Physical Security
Two sensors are included in the physical security subsystem: chassis intrusion and LAN leash lost.
10.1.1 Chassis Intrusion
Chassis Intrusion is monitored on supported chassis, and the BMC logs corresponding events when the chassis lid is opened and closed.
10.1.2 LAN Leash Lost
The LAN Leash lost sensor monitors the physical connection on the onboard network ports. A LAN Leash lost event means the network port lost its physical connection.
Table 72. Physical Security Sensor Typical Characteristics
Byte 11 (Sensor Type): 05h = Physical Security
Byte 12 (Sensor Number): 04h
Byte 13 (Event Direction and Event Type): [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14 (Event Data 1): [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecif
56. ement capabilities in the server system and operates independently of the main processor by monitoring the on-board instrumentation. Through the BMC, IPMI also allows administrators to control power to the server and remotely access BIOS configuration and operating system console information.
IPMI defines a common platform instrumentation interface to enable interoperability between:
- The baseboard management controller and chassis
- The baseboard management controller and systems management software
- Between servers
IPMI enables the following:
- Common access to platform management information, consisting of:
  - Local access from systems management software
  - Remote access from LAN
  - Inter-chassis access from Intelligent Chassis Management Bus
  - Access from LAN, serial/modem, IPMB, PCI SMBus, or ICMB, available even if the processor is down
- The IPMI interface isolates systems management software from hardware.
- Hardware advancements can be made without impacting the systems management software.
- IPMI facilitates cross-platform management software.
You can find more information on IPMI at the following URL: http://www.intel.com/design/servers/ipmi
1.2.2 Baseboard Management Controller (BMC)
A baseboard management controller (BMC) is a specialized microcontroller embedded on most I
57. en / 02h / 20h = OS Stop/Shutdown / Table 96. Bug Check Blue Screen OS Stop Event Record Typical Characteristics / Not applicable
DEh / Not applicable / Table 97. Bug Check Blue Screen Code OEM Event Record Typical Characteristics
3.7 Linux Kernel Panic Events GID 0021
The following table can be used to find the details of records that can be generated when there is a Linux kernel panic.
Table 11. Linux Kernel Panic Events
Sensor Name / Record Type / Sensor Type / Details Section / Next Steps
Linux Kernel Panic / 02h / 20h = OS Stop/Shutdown / Table 98. Linux Kernel Panic Event Record Characteristics / Not applicable
F0h / Not applicable / Table 99. Linux Kernel Panic String Extended Record Characteristics
4 Power Subsystems
The BMC monitors the power subsystem, including power supplies, select onboard voltages, and related sensors.
4.1 Voltage Sensors
The BMC monitors the main voltage sources in the system, including the baseboard, memory, and processors, using IPMI-compliant analog threshold sensors.
Note: A voltage error could be caused by the device supplying the voltage or by the device using the voltage. For each sensor, it is noted which device supplies the voltage and which device uses it.
Table 12. Voltage Sensors Typical Characteristics
58. ensors Next Steps
Catastrophic Error Sensor Typical Characteristics
CPU Missing Sensor Typical Characteristics
QPI Correctable Error Sensor Typical Characteristics
QPI Non Fatal Error Sensor Typical Characteristics
QPI Fatal Error Sensor Typical Characteristics
QPI Fatal 2 Error Sensor Typical Characteristics
Mirroring Configuration Status Sensor Typical Characteristics
Mirroring Configuration Status Sensor Event Trigger Offset Next Steps
Mirrored Redundancy State Sensor Typical Characteristics
Mirrored Redundancy State Sensor Event Trigger Offset Next Steps
Sparing Configuration Status Sensor Typical Characteristics
Sparing Configuration Status Sensor Event Trigger Offset Next Steps
Sparing Redundancy State Sensor Typical Characteristics
Sparing Redundancy State Sensor Event Trigger Offset Next Steps
Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
Correctable and Uncorrectable ECC Error Sensor Event Trigg
59. ensors Typical Characteristics
Table 20. Power Supply Status Sensor, Sensor Specific Offsets, Next Steps
Table 21. Power Supply AC Power Input Sensors Typical Characteristics
Table 22. Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
Table 23. Power Supply Current Output Sensors Typical Characteristics
Table 24. Power Supply Current Output Sensor Event Trigger Offset Next Steps
Table 25. Power Supply Temperature Sensors Typical Characteristics
Table 26. Power Supply Temperature Sensor Event Trigger Offset Next Steps
Table 27. Fan Speed Sensors Typical Characteristics
Table 28. Fan Speed Sensor Event Trigger Offset Next Steps
Table 29. Fan Presence Sensors Typical Characteristics
Table 30. Fan Presence Sensors Event Trigger Offset Next Steps
Table 31. Fan Redundancy Sensors Typical Characteristics
Table 32. Fan Redundancy Sensor Event Trigger Offset Next Steps
Table 33. Temperature Sensors Typical Characteristics
Table 34. Temperature Sensors Event Triggers Descri
60. er Offset Next Steps
Address Parity Error Sensor Typical Characteristics
PCI Express Correctable Error Sensor Typical Characteristics
PCI Express Correctable Error Sensor Event Trigger Offset Next Steps
PCI Express Fatal Error Sensor Typical Characteristics
PCI Express Fatal Error Sensor Event Trigger Offset Next Steps
Legacy PCI Error Sensor Typical Characteristics
Legacy PCI Error Sensor Event Trigger Offset Next Steps
System Event Sensor Typical Characteristics
POST Error Sensor Typical Characteristics
POST Error Codes
Physical Security Sensor Typical Characteristics
Physical Security Sensor Event Trigger Offset Next Steps
FP NMI Interrupt Sensor Typical Characteristics
Button Press Events Sensor Typical Characteristics
IPMI Watchdog Sensor Typical Characteristics
IPMI Watchdog Sensor Event Trigger Offset Next Steps
SMI Timeo
61. er System Firmware Progress (Formerly Post Error) Next Steps / (Post Error)
11h / Sparing Redundancy State / Sparing Redundancy State Sensor / Table 59. Sparing Redundancy State Sensor Event Trigger Offset Next Steps
12h / Mirroring Configuration Status / Mirroring Configuration Status / Table 53. Mirroring Configuration Status Sensor Event Trigger Offset Next Steps
13h / Sparing Configuration Status / Sparing Configuration Status / Table 57. Sparing Configuration Status Sensor Event Trigger Offset Next Steps
83h / System Event / System Events / Not applicable
3.3 BIOS SMI owned Sensors GID 0033h
The following table can be used to find the details of sensors owned by BIOS SMI.
Table 7. BIOS SMI owned Sensors
Sensor Number / Sensor Name / Details Section / Next Steps
02h / Memory ECC Error / Memory Correctable and Uncorrectable ECC Error / Table 61. Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps
03h / Legacy PCI Error / Legacy PCI Errors / Table 68. Legacy PCI Error Sensor Event Trigger Offset Next Steps
04h / PCI Express Fatal Error / PCI Express Fatal Errors / Table 66. PCI Express Fatal Error Sensor Event Trigger Offset Next Steps
05h / PCI Express Correctable Error / PCI Express
62. er supplies, a message is logged into the SEL.
Table 17. Power Unit Redundancy Sensors Typical Characteristics
Byte 11 (Sensor Type): 09h = Power Unit
Byte 12 (Sensor Number): 02h
Byte 13 (Event Direction and Event Type): [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 0Bh = Generic Discrete
Byte 14 (Event Data 1): [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 18
Byte 15 (Event Data 2): Not used
Byte 16 (Event Data 3): Not used
Table 18. Power Unit Redundancy Sensor Event Trigger Offset Next Steps
Event Trigger Offset Hex / Event Trigger Offset Description / Description / Next Steps
00h / Fully redundant / System is fully operational. / Informational Event.
01h / Redundancy lost / System is not running in redundant power supply mode. / This event should be accompanied by specific power supply errors (AC lost, PSU failure, and so on). Troubleshoot these events accordingly.
02h / Redundancy degraded
03h / Non-redundant, sufficient from redundant
04h / Non-redundant, sufficient from insufficient
05h / Non-redundant, insufficient
06h / Non-redundant, degraded from fully redundant
07h / Redundant, degraded from non-redundant
63. ertion Event, 1b = Deassertion Event; [6:0] Event Type: 03h = digital Discrete
Byte 14 (Event Data 1): [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 1 = State Asserted
Byte 15 (Event Data 2): Not used
Byte 16 (Event Data 3): Not used
11.2.1 SMI Timeout Next Steps
This event normally only occurs after another, more critical event.
1. Check the SEL for any critical interrupts, memory errors, bus errors, PCI errors, or any other serious errors.
2. If these are not present, the system locked up before it was able to log the original issue. In this case, low-level debug is normally required.
11.3 System Event Log Cleared
The BMC logs a SEL clear event. This is only ever the first event in the SEL. The cause of this event is either a manual SEL clear using Intel SEL Viewer or some other IPMI-aware utility, or a clear done in the factory as one of the last steps in the manufacturing process. This is an informational event only.
Table 79. System Event Log Cleared Sensor Typical Characteristics
Byte 11 (Sensor Type): 10h = Event Logging Disabled
Byte 12 (Sensor Number): 07h
Byte 13 (Event Direction and Event Type): [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14
64. escription
Bytes 1-2 (Record ID): ID used for SEL Record access
Byte 3 (Record Type): [7:0] DDh = OEM timestamped, bytes 8-16 OEM defined
Bytes 4-7 (Timestamp): Time when event was logged. LS byte first.
Bytes 8-10 (IPMI Manufacturer ID): 0137h (311d) = IANA enterprise number for Microsoft; 0157h (343d) = IANA enterprise number for Intel. The value logged depends on the Intelligent Management Bus Driver (IMBDRV) that is loaded.
Byte 11 (Record ID): Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
Bytes 12-15 (Shutdown Comment): Shutdown Comment from the registry, LSB first: HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\Comment
Byte 16: Reserved, 00h
14.3 Bug Check Blue Screen Event Records
When the system experiences a bug check (blue screen), multiple records are written to the event log. The first is a Bug Check Blue Screen OS Stop/Shutdown Event Record. It can be followed by multiple Bug Check Blue Screen code OEM records that contain the Bug Check Blue Screen codes. This information can be used to determine what caused the failure.
Table 96. Bug Check Blue Screen OS Stop Event Record Typical Characteristics
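The OEM record layout above stores its multi-byte fields least significant byte first. The sketch below parses the first ten bytes of such a record in Python. It assumes the record is available as a 16-byte `bytes` object and that the Record ID and Manufacturer ID are little-endian like the timestamp; the function name and the sample record are hypothetical.

```python
# IANA enterprise numbers named in the excerpt.
MANUFACTURER_IDS = {0x000137: "Microsoft", 0x000157: "Intel"}


def parse_oem_header(record):
    """Parse bytes 1-10 of a 16-byte OEM timestamped SEL record."""
    if len(record) != 16:
        raise ValueError("expected a 16-byte SEL record")
    manufacturer_id = int.from_bytes(record[7:10], "little")  # bytes 8-10
    return {
        "record_id": int.from_bytes(record[0:2], "little"),   # bytes 1-2
        "record_type": record[2],                             # byte 3
        "timestamp": int.from_bytes(record[3:7], "little"),   # bytes 4-7
        "manufacturer_id": manufacturer_id,
        "manufacturer": MANUFACTURER_IDS.get(manufacturer_id, "unknown"),
    }


if __name__ == "__main__":
    # Hypothetical shutdown comment record (type DDh) logged by Microsoft's driver.
    sample = bytes([0x01, 0x00, 0xDD, 0x10, 0x20, 0x30, 0x40,
                    0x37, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00])
    print(parse_oem_header(sample))
```

Checking the manufacturer ID first is a quick way to tell which driver wrote the record before interpreting the OEM-defined bytes that follow.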
65. 12.3.1 HSC Drive Presence Sensor Next Steps
13 Manageability Engine (ME) Events
13.1 Node Manager Exception Event
13.1.1 Node Manager Exception Event Next Steps
13.2 Node Manager Health Event
13.2.1 Node Manager Health Event Next Steps
13.3 Node Manager Operational Capabilities Change
13.3.1 Node Manager Operational Capabilities Change Next Steps
13.4 Node Manager Alert Threshold Exceeded
13.4.1 Node Manager Alert Threshold Exceeded Next Steps
13.5 ME Firmware Health Event
13.5.1 ME Firmware Health Event Next Steps
14 Microsoft Windows Records
14.1 Boot up Event Records
14.2 Shutdown Event Records
14.3 Bug Check Blue Screen Event Records
15 Linux Kernel Panic Records
66. eserved, FFh
14 Microsoft Windows Records
With Microsoft Windows Server 2003 R2 and later versions, an Intelligent Platform Management Interface (IPMI) driver was added. This added the capability of logging some OS events to the SEL. The driver can write multiple records to the SEL for the following events:
- Boot up
- Shutdown
- Bug Check / Blue Screen
14.1 Boot up Event Records
When the system boots into the Microsoft Windows OS, there can be two events logged. The first is a boot up record and the second is an OEM event. These are informational-only records.
Table 91. Boot up Event Record Typical Characteristics
Bytes 8-9 (Generator ID): 0041h = System Software with an ID = 20h
Byte 11 (Sensor Type): 1Fh = OS Boot
Byte 12 (Sensor Number): 00h
Byte 13 (Event Direction and Event Type): [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14 (Event Data 1): [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 1h = C: boot completed
Byte 15 (Event Data 2): Not used
Byte 16 (Event Data 3): Not used
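Byte 13 of these records combines the event direction (bit 7) with the event type (bits 6:0). The following Python sketch separates the two fields. The `EVENT_TYPES` table only holds the event type codes that appear in these excerpts, and the function name is an assumption made for illustration.

```python
# Event type codes that appear in the excerpts of this guide.
EVENT_TYPES = {
    0x01: "Threshold",
    0x03: "digital Discrete",
    0x08: "digital Discrete",
    0x09: "digital Discrete",
    0x0B: "Generic Discrete",
    0x6F: "Sensor Specific",
    0x71: "OEM Specific",
    0x72: "OEM Discrete",
}


def decode_event_dir_type(byte13):
    """Split byte 13 into (direction, event type code, event type name)."""
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"
    event_type = byte13 & 0x7F
    return direction, event_type, EVENT_TYPES.get(event_type, "unknown")


if __name__ == "__main__":
    # 6Fh is an asserted sensor-specific event; EFh is the same type deasserted.
    print(decode_event_dir_type(0x6F))
    print(decode_event_dir_type(0xEF))
```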
67. ffset 0
Byte 15 (Event Data 2): Low Byte of POST Error Code
Byte 16 (Event Data 3): High Byte of POST Error Code
System Firmware Progress (Formerly Post Error) Next Steps
See the following table for POST error codes.
Table 71. POST Error Codes
0012 / CMOS date/time not set / Major
0048 / Password check failed / Major
0108 / Keyboard component encountered a locked error / Minor
0109 / Keyboard component encountered a stuck key error / Minor
8526 / DIMM_D1 failed Self Test (BIST) / Major
8527 / DIMM_D2 failed Self Test (BIST) / Major
[Table 71 continues. Only the Error Code column survives for the remaining rows: 8500, 8501, 8502, 8520, 8521, 8522, 8523, 8524, 8525, 8526, 8527, 8528, 8529, 852A, 8540, 8542, 8544, 8566.]
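Because Event Data 2 carries the low byte and Event Data 3 the high byte, the POST error code has to be reassembled before it can be looked up in Table 71. The sketch below does this in Python. The lookup table contains only the rows legible in this excerpt, and the function names are illustrative assumptions.

```python
# Rows of Table 71 that are legible in this excerpt: code -> (message, severity).
POST_ERROR_CODES = {
    0x0012: ("CMOS date/time not set", "Major"),
    0x0048: ("Password check failed", "Major"),
    0x0108: ("Keyboard component encountered a locked error", "Minor"),
    0x0109: ("Keyboard component encountered a stuck key error", "Minor"),
    0x8526: ("DIMM_D1 failed Self Test (BIST)", "Major"),
    0x8527: ("DIMM_D2 failed Self Test (BIST)", "Major"),
}


def post_error_code(event_data2, event_data3):
    """Combine the low byte (Event Data 2) and high byte (Event Data 3)."""
    return int.from_bytes(bytes([event_data2, event_data3]), "little")


def describe_post_error(event_data2, event_data3):
    """Return a one-line description of the POST error code."""
    code = post_error_code(event_data2, event_data3)
    message, severity = POST_ERROR_CODES.get(code, ("not listed in this excerpt", "unknown"))
    return f"{code:04X}: {message} ({severity})"


if __name__ == "__main__":
    print(describe_post_error(0x26, 0x85))
```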
68. 6.3.1 CPU Missing Sensor Next Steps
Verify the processor is installed in the correct slot.
6.4 QuickPath Interconnect Error Sensors
The Intel QuickPath Interconnect (QPI) bus on Intel S5500/S3420 series server boards is the interconnection between processors and to the chipset. The QPI Error sensors are all reported by the BIOS SMI Handler to the BMC, so the Generator ID will be 33h.
6.4.1 QPI Correctable Error Sensor
The system detected an error and corrected it. This is an informational event.
Table 48. QPI Correctable Error Sensor Typical Characteristics
Bytes 8-9 (Generator ID): 0033h = BIOS SMI Handler
Byte 11 (Sensor Type): 13h = Critical Interrupt
Byte 12 (Sensor Number): 06h
Byte 13 (Event Direction and Event Type): [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 72h = OEM Discrete
Byte 14 (Event Data 1): [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: Reserved
Byte 15 (Event Data 2): 0-3 = CPU1-4
Byte 16 (Event Data 3): Not used
6.4.1.1 QPI Correctable Error Sensor Next Steps
This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues: 1
69. Table 2. Event Request Message Event Data Field Contents
Threshold:
Event Data 1
[7:6] 00b = Unspecified Event Data 2; 01b = Trigger reading in Event Data 2; 10b = OEM code in Event Data 2; 11b = Sensor specific event extension code in Event Data 2
[5:4] 00b = Unspecified Event Data 3; 01b = Trigger threshold value in Event Data 3; 10b = OEM code in Event Data 3; 11b = Sensor specific event extension code in Event Data 3
[3:0] Offset from Event/Reading Code for threshold event
Event Data 2: Reading that triggered event. FFh or not present if unspecified.
Event Data 3: Threshold value that triggered event. FFh or not present if unspecified. If present, Event Data 2 must be present.
discrete:
Event Data 1
[7:6] 00b = Unspecified Event Data 2; 01b = Previous state and/or severity in Event Data 2; 10b = OEM code in Event Data 2; 11b = Sensor specific event extension code in Event Data 2
[5:4] 00b = Unspecified Event Data 3; 01b = Reserved; 10b = OEM code in Event Data 3; 11b = Sensor specific event extension code in Event Data 3
[3:0] Offset from Event/Reading Code for discrete event state
Event Data 2: [7:4] Optional offset from Severity Event/Reading Code (0Fh if unspecified); [3:0] Optional offset from Event/Reading Type Code for previous discrete event state (0Fh if unspecified)
Event Data 3: Optional OEM code. FFh or not present if
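The Event Data 1 layout in Table 2 can be decoded with a pair of two-bit lookups plus the four-bit offset. The Python sketch below follows the threshold and discrete variants listed in the table; the dictionary and function names are assumptions made for illustration.

```python
EXTENSION = "sensor specific event extension code"

# Meaning of Event Data 1 bits 7:6 (what Event Data 2 holds).
ED2_THRESHOLD = {0: "unspecified", 1: "trigger reading", 2: "OEM code", 3: EXTENSION}
ED2_DISCRETE = {0: "unspecified", 1: "previous state and/or severity", 2: "OEM code", 3: EXTENSION}

# Meaning of Event Data 1 bits 5:4 (what Event Data 3 holds).
ED3_THRESHOLD = {0: "unspecified", 1: "trigger threshold value", 2: "OEM code", 3: EXTENSION}
ED3_DISCRETE = {0: "unspecified", 1: "reserved", 2: "OEM code", 3: EXTENSION}


def decode_event_data1(event_data1, threshold):
    """Decode Event Data 1 for a threshold (True) or discrete (False) sensor."""
    ed2_code = (event_data1 >> 6) & 0x03
    ed3_code = (event_data1 >> 4) & 0x03
    ed2_map = ED2_THRESHOLD if threshold else ED2_DISCRETE
    ed3_map = ED3_THRESHOLD if threshold else ED3_DISCRETE
    return {
        "event_data2": ed2_map[ed2_code],
        "event_data3": ed3_map[ed3_code],
        "offset": event_data1 & 0x0F,
    }


if __name__ == "__main__":
    # 57h on a threshold sensor: reading and threshold present, offset 07h.
    print(decode_event_data1(0x57, threshold=True))
```

Decoding Event Data 1 first tells you whether Event Data 2 and 3 hold a reading and threshold, an OEM code, or nothing, before you try to interpret them.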
70. for your chassis. If the front panel temperature is too high: Check the cooling of your server room.
5.2.2 Thermal Margin Sensors
Margin sensors are also linear sensors but typically report a negative value. This is not an actual temperature but an offset to a critical temperature. Example sensors are Processor Thermal Margin, Memory Thermal Margin, and IOH Thermal Margin. Values reported should be read as the number of degrees below a critical temperature for the particular component.
Table 36. Thermal Margin Sensors Typical Characteristics
Byte 11 (Sensor Type): 01h = Temperature
Byte 12 (Sensor Number): See Table 38
Byte 13 (Event Direction and Event Type): [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 01h = Threshold
Byte 14 (Event Data 1): [7:6] 01b = Trigger reading in Event Data 2; [5:4] 01b = Trigger threshold in Event Data 3; [3:0] Event Triggers as described in Table 37
Byte 15 (Event Data 2): Reading that triggered event
Byte 16 (Event Data 3): Threshold value that triggered event
Table 37. Thermal Margin Sensors Event Triggers Description
Assertion Severity / Deassert Severity / Description
Degraded / OK / The thermal margin has gone over its upper non-critical
71. Byte 15 (Event Data 2): Reading that triggered event
Byte 16 (Event Data 3): Threshold value that triggered event
The following table describes the severity of each of the event triggers for both assertion and deassertion.
Table 22. Power Supply AC Power Input Sensor Event Trigger Offset Next Steps
Event Trigger Offset / Assertion Severity / Deassert Severity / Description / Next Steps
07h / Upper non-critical going high / Degraded / OK
09h / Upper critical going high / non-fatal / Degraded
Description: PMBus feature to monitor power supply power consumption.
Next Steps: If you see this event, the system is pulling too much power on the input for the PSU rating. 1. Verify the power budget is within the specified range. 2. Check http://www.intel.com/p/en_US/support for the power budget tool for your system.
4.3.3 Power Supply Current Output Sensors
PMBus-compliant power supplies may monitor the current output of the main 12 V voltage rail and report the current usage as a percentage of the maximum power output for that rail.
Table 23. Power Supply Current Output Sensors Typical Characteristics
Byte 11 (Sensor Type): 03h = Current
Byte 12 (Sensor Number): 54h = Power Supply 1 Current Output; 55h = Power Supply 2 Current Output
72. h of the event triggers for both assertion and deassertion.
Table 32. Fan Redundancy Sensor Event Trigger Offset Next Steps
Event Trigger Offset Hex / Event Trigger Offset Description / Description
00h / Fully redundant
01h / Redundancy lost
02h / Redundancy degraded
03h / Non-redundant, sufficient from redundant
04h / Non-redundant, sufficient from insufficient
Description printed alongside offsets 00h to 04h: System has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.
05h / Non-redundant, insufficient / System has lost fans and may no longer be able to cool itself adequately. Overheating may occur if this situation remains for a longer period of time.
06h / Non-redundant, degraded from fully redundant / System has lost one or more fans and is running in non-redundant mode. There are enough fans to keep the system properly cooled, but fan speeds will boost.
07h / Redundant, degraded from non-redundant / System has lost one or more fans and is running in a degraded mode, but still is redundant. There are enough fans to keep the system properly cooled.
Next Steps: Fan redundancy loss indicates failure of one or more fans. Look for lower non-critical fan errors or fan removal errors in the SEL to indicate which fan is causing the problem, and follow the troubleshooting steps for these event types.
73. ich have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725 or by going to http://www.intel.com/design/literature
Table of Contents
1 Introduction
1.1
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
1.2.2 Baseboard Management Controller (BMC)
1.2.3 Intel Intelligent Power Node Manager Version 1.5
2 Basic Decoding of SEL Record
2.1 Default Values in the SEL Records
3 Sensor Cross Reference List
3.1 BMC owned Sensors (GID 0020)
3.2 BIOS POST owned Sensors (GID 0001h)
3.3 BIOS SMI owned Sensors (GID 0033h)
3.4 Hot Swap Controller Firmware owned Sensors (GID 00C0h, 00C2h)
3.5 Node Manager ME Firmware owned Sensors (GID 002Ch or 602Ch)
3.6 Microsoft OS owned Events (GID 0041)
3.7 Linux Kernel Panic Events (GID 0021)
4 Power Subsystems
74. ied Event Data 3; [3:0] Event Trigger Offset as described in Table 73
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used
Table 73: Physical Security Sensor - Event Trigger Offset - Next Steps
00h Chassis intrusion
  Description: Somebody has opened the chassis, or the chassis intrusion sensor is not connected.
  Next Steps:
  1. Use the Quick Start Guide and the Service Guide to determine whether the chassis intrusion switch is connected properly.
  2. If this is the case, make sure it makes proper contact when the chassis is closed.
  3. If this is also the case, someone has opened the chassis. Ensure nobody has access to the system that shouldn't.
04h LAN leash lost
  Description: Someone has unplugged a LAN cable that was present when the BMC initialized. This event gets logged when the electrical connection on the NIC connector gets lost.
  Next Steps: This is most likely due to unplugging the cable, but could also happen if there is an issue with the cable or switch.
  1. Check the LAN cable and connector for issues.
  2. Investigate switch logs where possible.
  3. Ensure nobody has access to the server that shouldn't.
75. ion
01h: Memory is configured in Mirrored Channel Mode and the memory is operating in the fully redundant state. System boots with mirrored channel mode active (one entry per processor). Next Steps: Informational event.
00h: Memory is configured in Mirrored Channel Mode and the memory has lost redundancy and is operating in the degraded state. One of the channels in the mirror pair is taken offline (loss of mirror; one entry only, for the affected processor). Next Steps: This event should be accompanied by memory errors indicating the source of the issue. Troubleshoot accordingly (probably replace the affected DIMM).
7.1.3 Sparing Configuration Status
This sensor provides the Spare Channel mode RAS Configuration status.
Table 56: Sparing Configuration Status Sensor - Typical Characteristics
Generator ID: 0001h = BIOS POST
Byte 11, Sensor Type: 0Ch = Memory
Byte 12, Sensor Number: 13h
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 09h (digital Discrete)
Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 57
Byte 15, Event Data 2: Not used
Byte 16
76. irst one to send the time synch message to the BMC for synchronization, and the timestamp that the message gets is unknown; that is, the timestamp in the log could be anything, because it gets the "before" timestamp. So the BIOS sends a second time synch message to get a baseline "correct" timestamp in the log. That is the starting time. For example, say that the time the BMC has is March 1, 2011, 21:00. The BIOS time synch updates that to the same date, 21:20 (the BMC was running behind). Without that second time synch message, you don't know that the log time jumped ahead, and when you get the next log message it looks like there was a 20-minute delay during the boot for some unknown reason. In other words, without the second message the time span to the next logged message is indeterminate. With the second time synch as a baseline, the following log timestamps are always determinate.
Table 69: System Event Sensor - Typical Characteristics
Bytes 8-9, Generator ID: 0001h = BIOS POST; 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 12h = System Event
Byte 12, Sensor Number: 83h
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4]
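The March 1, 2011 example above can be worked through in a few lines. This is a minimal sketch, not part of the guide: the helper name `gap_to_next_event` and the 21:21 time of the next logged event are hypothetical, chosen only to show why the second time synch message is the right baseline.

```python
from datetime import datetime, timedelta


def gap_to_next_event(baseline: datetime, next_event: datetime) -> timedelta:
    """Elapsed time between a baseline log entry and the next logged event."""
    return next_event - baseline


# First time synch event: stamped with the BMC's old ("before") clock.
first_synch = datetime(2011, 3, 1, 21, 0)
# Second time synch event: stamped after the BIOS corrected the BMC clock.
second_synch = datetime(2011, 3, 1, 21, 20)
# Hypothetical next SEL entry, one minute after the clock was corrected.
next_event = datetime(2011, 3, 1, 21, 21)

# Measured from the first event, the boot appears to stall for 21 minutes.
misleading_gap = gap_to_next_event(first_synch, next_event)
# Measured from the second event, the real gap is one minute.
real_gap = gap_to_next_event(second_synch, next_event)
```

Only the gap measured from the second event reflects time that actually passed; the difference between the two synch timestamps is clock correction, not boot delay.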
77. is is an add-in card:
   a. Verify the card is inserted properly.
   b. Install the card in another slot and check whether the error follows the card or stays with the slot.
   c. Update all firmware and drivers, including non-Intel components.
3. If this is an onboard device:
   a. Update all BIOS, firmware, and drivers.
   b. Replace the board.
05h SERR: System Error (SERR) asserted. This is a fatal error.
9 System BIOS Events
There are a number of events that are owned by the system BIOS. These events can occur during Power On Self Test (POST) or when coming out of a sleep state. Not all of these events signify errors. Some events are described in other chapters in this document (for example, memory events).
9.1 System Events
These events can occur during POST or when coming out of a sleep state. These are informational events only.
1. When logging events during POST, BIOS uses generator ID 0001h.
2. When coming out of a sleep state, BIOS uses generator ID 0033h.
9.1.1 System Boot
The BIOS logs a system boot event every time the system boots. The event gets logged early during POST, when BIOS-BMC communication is first established. This event is not an error.
9.1.2 Timestamp Clock Synchronization
These events are logged when the time between the BIOS and the BMC is synchronized. Two events are logged. The BIOS does the f
78. Table 28: Fan Speed Sensor - Event Trigger Offset - Next Steps
00h Lower non-critical, going low. Assertion severity: Degraded. Deassertion severity: OK. The fan speed has dropped below its lower non-critical threshold.
02h Lower critical, going low. Assertion severity: non-fatal. Deassertion severity: Degraded. The fan speed has dropped below its lower critical threshold.
Next Steps: A fan speed error on a new system build is typically not caused by the fan spinning too slowly; instead it is caused by the fan being connected to the wrong header (the BMC expects them on certain headers for each chassis and will log this event if there is no fan on that header).
1. Refer to the Quick Start Guide or the Service Guide to identify the correct fan headers to use.
2. Ensure the latest FRUSDR update has been run and the correct chassis was detected or selected.
3. If you are sure this was done, the event may be a sign of impending fan failure (although this will normally only apply if the system has been in use for a while). Replace the fan.
5.1.2 Fan Presence and Redundancy Sensors
Fan presence sensors are only implemented for hot-swap fans, and require an additional pin on the fan header. Fan redundancy is an aggregate of the
79. its Reserved: 0000b. 000000h = Unspecified, 0FFFFFh = Reserved. This value is binary encoded. For example, the ID for the IPMI forum is 7154 decimal, which is 1BF2h, which will be stored in this record as F2h, 1Bh, 00h for bytes 8 through 10, respectively.
Bytes 11-16, OEM Defined: This is defined according to the manufacturer identified by the Manufacturer ID field.
Table 4: OEM SEL Record (Type E0h-FFh)
Bytes 1-2, Record ID (RID): ID used for SEL Record access
Byte 3, Record Type (RT): [7:0] E0h-FFh = OEM system event record
Bytes 4-16, OEM Defined: This is defined by the system integrator.
3 Sensor Cross Reference List
This section contains a cross reference to help find details on any specific SEL entry.
3.1 BMC-owned Sensors (GID = 0020h)
The following table can be used to find the details of sensors owned by the BMC.
Table 5: BMC-owned Sensors
Sensor Number; Sensor Name; Details Section; Next Steps
01h; Power Unit Status (Pwr Unit Status); Power Unit Status Sensor; Table 16: Power Unit Status Sensor - Sensor Specific Offsets - Next Steps
02h; Power Unit Redundancy; Power Unit Redundancy Sensor; Table 18: Power Unit Redundan
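The byte order in the Manufacturer ID example above (7154 decimal = 1BF2h, stored as F2h 1Bh 00h in bytes 8 through 10) can be checked with a short sketch. The function names are illustrative only and are not taken from the guide or from any IPMI library.

```python
def encode_manufacturer_id(iana_number: int) -> bytes:
    """Pack a 20-bit IANA enterprise number into 3 bytes, LS byte first."""
    if not 0 <= iana_number <= 0xFFFFF:
        raise ValueError("Manufacturer ID must fit in 20 bits")
    return iana_number.to_bytes(3, "little")


def decode_manufacturer_id(raw: bytes) -> int:
    """Unpack bytes 8-10 of an OEM SEL record into an IANA enterprise number."""
    if len(raw) != 3:
        raise ValueError("Manufacturer ID field is exactly 3 bytes")
    # The upper 4 bits are reserved (0000b), so mask down to 20 bits.
    return int.from_bytes(raw, "little") & 0xFFFFF


ipmi_forum = encode_manufacturer_id(7154)       # bytes F2h 1Bh 00h
microsoft = decode_manufacturer_id(bytes([0x37, 0x01, 0x00]))  # 0137h = 311
```

Reading the three bytes in the wrong order is a common decoding slip: F2h 1Bh 00h read MS byte first would give F21B00h instead of 1BF2h.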
80. l event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:
1. Decode bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
   b. Install the card in another slot and check whether the error follows the card or stays with the slot.
   c. Update all firmware and drivers, including non-Intel components.
3. If this is an onboard device:
   a. Update all BIOS, firmware, and drivers.
   b. Replace the board.
01h Bad DLLP error: Correctable bad DLLP occurred.
02h Bad TLP error: Correctable bad TLP occurred.
03h REPLAY_NUM Rollover Error: Correctable Replay event occurred.
04h REPLAY Timer Timeout Error: Correctable Replay timeout event occurred.
05h Advisory non-fatal Error (received ERR_COR message): Correctable advisory event occurred (typically provided as notice to software driver).
06h Link bandwidth changed: Link bandwidth changed.
8.1.2 PCI Express Fatal Errors
When a PCI Express fatal error is reported to the BIOS SMI handler, it will record the error using the following format.
Table 65: PCI Express Fatal Error Sensor - Typical Characteristics
Generator ID: 0033h = BIOS SMI Handler
Byte 11, Sensor Type: 13h = Critical Interru
81. ling DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between the socket and DIMM, or by bent pins in the processor socket.
Next Steps (continued):
4. Inspect the processor socket this DIMM is connected to for bent pins and, if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
00h Correctable ECC Error threshold reached
  Description: There have been too many (10 or more) correctable ECC errors for this particular DIMM since last boot. This event in itself does not pose any direct problems, as the ECC errors are still being corrected. Depending on the RAS configuration of the memory, the IMC may take the affected DIMM offline.
  Next Steps: Even though this event doesn't immediately lead to problems, it can indicate that one of the DIMM modules is slowly failing. If this error occurs more than once:
  1. If needed, decode the DIMM location from the hex version of the SEL.
  2. Verify the DIMM is seated properly.
  3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
  4. Inspect the processor socket this DIMM is connected to for bent pins and, if found, replace the board.
  5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
7.2.2 Memory Address Parity Error
Addres
82. management console might be encapsulated in a generic CONFIG packet format (config data length, config data blob) to the BMC, so that the BMC doesn't even have to parse the actual configuration data. The BMC provides the access point for remote commands from external management SW and generates alerts to them. Intel Intelligent Power Node Manager on Intel Manageability Engine (Intel ME) is an IPMI satellite controller. A mechanism needs to exist to forward commands to Intel ME and send the response back to the originator. Similarly, events from Intel ME have to be sent as alerts outside of the BMC. It is the responsibility of the BMC to implement these mechanisms for communication with Intel Intelligent Power Node Manager. The full specification can be downloaded from the following link: http www intel com content dam doc technical specification intelliqent power node manager 1 5 specification pdf
2 Basic Decoding of a SEL Record
The System Event Log (SEL) record format is defined in the IPMI Specification. The following section provides a basic definition for each of the fields in a SEL record. For more details, see the IPMI Specification. The definitions for the standard SEL can be found in Table 1. The definitions for the OEM-defined event logs can be found in Table 3 and Table 4.
83. ngly.
(Description column, continued: post, in which case post error 8500 is also logged.)
2. If there is no post error, then mirror mode was simply disabled in BIOS setup and this should be considered informational only.
7.1.2 Mirrored Redundancy State Sensor
This sensor provides the RAS Redundancy state for the Memory Mirrored Channel Mode.
Table 54: Mirrored Redundancy State Sensor - Typical Characteristics
Bytes 8-9, Generator ID: 0001h = BIOS POST
Byte 11, Sensor Type: 0Ch = Memory
Byte 12, Sensor Number: 01h
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 0Bh = Generic Discrete
Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 55
Byte 15, Event Data 2: [7:4]
  If Domain Instance Type (ED3) is set to Local, this field specifies the mirroring domain local sub-instances (which channels are included in this sub-instance):
  0000b = Reserved
  0001b = {Ch A, Ch B}
  0010b = {Ch A, Ch C}
  0011b = {Ch B, Ch C}
  0100b-1110b = Reserved
  If Domain Instance Type (ED3) is set to Global, this field specifies the 0-based Socket ID of the first participant processor in this mirroring domain global instance. A value of
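The Event Data 2 [7:4] encoding in Table 54 can be sketched as a small lookup. This is an illustration under stated assumptions: the excerpt does not show where Domain Instance Type sits in Event Data 3, so the caller passes it in as the hypothetical flag `domain_is_local`, and any special Global values (the text is cut off at "A value of") are returned undecoded.

```python
# Channel pairs for a Local mirroring domain, from Table 54 (ED2 bits 7:4).
LOCAL_CHANNEL_PAIRS = {
    0b0001: ("A", "B"),
    0b0010: ("A", "C"),
    0b0011: ("B", "C"),
}


def decode_mirror_domain(event_data2: int, domain_is_local: bool) -> dict:
    """Decode ED2[7:4] of a Mirrored Redundancy State event."""
    field = (event_data2 >> 4) & 0x0F
    if domain_is_local:
        # Reserved encodings map to None rather than a guessed pair.
        return {"channels": LOCAL_CHANNEL_PAIRS.get(field)}
    # Global: 0-based Socket ID of the first participant processor.
    return {"first_socket_id": field}


local_example = decode_mirror_domain(0x20, domain_is_local=True)
global_example = decode_mirror_domain(0x10, domain_is_local=False)
```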
84. nnnnrrrrennnennrrrnnnnnn 63 9 2 System Firmware Progress Formerly Post Error 64 9 2 1 System Firmware Progress Formerly Post Error Next Steps seessesssnee 65 UI Be EE TE EE 71 10 1 Eeler e eee 71 TOT Crest ee 71 10 1 2 LAN Leash Lost ug sunnier mpinimmninGundininbsynmdame 71 10 2 FP NMI Interrupts ee Ee 73 10 21 FP NMI Interrupt Next Steps Aender deed ed E dier 73 10 3 Button EL 74 11 Miscellaneous Events ica cicsicswicueiccietie vies ee cnwiies eile tees eid ewinesten ed ed sanasana Mananasi sMenu Eaua 75 11 1 SREL DEE 75 11 2 SM Da a6 ionene 77 11 2 1 SMI Timeout Next Gieps ccc ceeeeessesessessssssssssessssseessssesesseesseseeeeeeseeeeeeees 77 11 3 System Event Log E EE 78 11 4 System Event PEP ACHOn ssrin asrasa ee oe de 78 11 4 1 System Event PEF Action Next Steps rrrrrrrnnnnnnnrornnnvrrrrnnnnnnnrrrnnnnrrrnnnnnenennr 79 12 Hot Swap Controller Event etic is iccicsscisscsadscessies aaaea aaa aaaea rana aada deaan aiaiai 80 Revision 1 1 Intel order number G7421 1 002 V Table of Contents System Event Log Troubleshooting Guide for Intel SS500 S3420 Series Server Boards 12 1 HSC Backplane Temperature Sensor nneeeeeeeeee eneee ennnen nennen 80 12 2 HSG Drive Slot Street idee geg eege TEE EN 12 2 1 HSC Drive Slot Status Sensor Next Gieps cc cceeeeeeeeeeeesseeeeseeeeeeeseeeeeees 82 12 3 HSC Drive Presence Sensor ccccccccccecceeceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
85. ntel Server Boards. The BMC is the heart of the IPMI architecture and provides the intelligence behind intelligent platform management, that is, the autonomous monitoring and recovery features implemented directly in platform management hardware and firmware. Different types of sensors built into the computer system report to the BMC on parameters such as temperature, cooling fan speeds, power mode, operating system status, and so on. The BMC monitors the system for critical events by communicating with various sensors on the system board; it sends alerts and logs events when certain parameters exceed their preset thresholds, indicating a potential failure of the system. The administrator can also remotely communicate with the BMC to take corrective actions, such as resetting or power cycling the system to get a hung OS running again. These abilities reduce the total cost of ownership of a system. For Intel Server Boards and Intel Server Platforms, the BMC supports the industry-standard IPMI 2.0 Specification, enabling you to configure, monitor, and recover systems remotely.
1.2.2.1 System Event Log (SEL)
The BMC provides a centralized, non-volatile repository for critical, warning, and informational system events, called the System Event Log or SEL. Having the BMC manage the SEL and logging functions helps to ensure that post-mortem logging information is available if a failure occurs that disables the system's processor(s).
86. 4.3 Power Supply
The BMC monitors the power supply subsystem.
4.3.1 Power Supply Status Sensors
These sensors report the status of the power supplies in the system. When AC is first applied to a system or removed from it, an event can be logged. An event can also be logged on a failure, a predictive failure, or a configuration error.
Table 19: Power Supply Status Sensors - Typical Characteristics
Byte 11, Sensor Type: 08h = Power Supply
Byte 12, Sensor Number: 50h = Power Supply 1 Status; 51h = Power Supply 2 Status
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Sensor Specific offset as described in Table 20
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used
Table 20: Power Supply Status Sensor - Sensor Specific Offsets - Next Steps
00h Presence: Power supply detected. Next Steps: Informational Event.
Sensor Specific Offset; Description; Next
87. ocessor 2.
  1. Ensure all cables are connected correctly.
  2. Check the DIMMs are seated properly.
  3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.
15h BB 1.8V AUX: 1.8V is supplied by the main board. 1.8V is used by the onboard NIC and I/O hub.
  1. Ensure all cables are connected correctly.
  2. If the issue remains, replace the main board.
16h BB 3.3V: 3.3V is supplied by the power supplies. 3.3V is used by the PCIe and PCI-X slots.
  1. Ensure all cables are connected correctly.
  2. Reseat any PCI cards, and try them in other slots.
  3. If the issue follows the card, swap it; otherwise replace the main board.
  4. If the issue remains, replace the power supplies.
17h BB 3.3V STBY: 3.3V Stby is supplied by the main board. 3.3V Stby is used by the BMC, onboard NIC, IOH, and ICH.
  1. Ensure all cables are connected correctly.
  2. If the issue remains, replace the board.
  3. If the issue remains, replace the power supplies.
Sensor Number; Sensor Name; Next Steps
18h BB 3.3V Vbat: 3.3V Vbat is supplied by the CMOS battery when power is off and by the main board when power is on. 3.3V Vbat is used by the CMOS and related circuits.
  1. Replace the CMOS battery. Any battery of type CR2032 can be
88. ogrammed and correct chassis has been selected.
3. Ensure there are no fan failures.
4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).
67h P2 VRD Hot: Processor 2 voltage regulator overheated.
6Ah IOH Thermal Trip (03h; 01h = State Asserted): I/O Hub (IOH) overheated.
6 Processor Subsystem
Intel servers report several processor-centric sensors in the SEL.
6.1 Processor Status Sensor
The status sensor reports processor presence or a thermal trip condition. Each processor has a status sensor.
Table 44: Process Status Sensors - Typical Characteristics
Byte 11, Sensor Type: 07h = Processor
Byte 12, Sensor Number: See Table 45
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset as described in Table 45
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used
Table 45: Processor Status Senso
89. on and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 3h = OS Graceful Shutdown
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used
Table 94: Shutdown Reason OEM Event Record - Typical Characteristics
Bytes 1-2, Record ID: ID used for SEL Record access
Byte 3, Record Type: [7:0] DDh = OEM timestamped, bytes 8-16 OEM defined
Bytes 4-7, Timestamp: Time when event was logged. LS byte first.
Bytes 8-10, IPMI Manufacturer ID: 0137h (311d) = IANA enterprise number for Microsoft
Byte 11, Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
Bytes 12-15, Shutdown Reason: Shutdown Reason code from the registry (LSB first): HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\shutdown\ReasonCode
Byte 16, Reserved: 00h
Table 95: Shutdown Comment OEM Event Record - Typical Characteristics
Byte; Field; D
90. or reading inlet temperature.
Table 86: Node Manager Health Event Sensor - Typical Characteristics
Bytes 8-9, Generator ID: 002Ch = ME Firmware
Byte 11, Sensor Type: DCh = OEM
Byte 12, Sensor Number: 19h
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 73h = OEM
Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Health Event Type: 02h = Sensor Node Manager
Byte 15, Event Data 2:
  [7:4] Error type:
    0-9 = Reserved
    10 = Policy Misconfiguration
    11 = Power Sensor Reading Failure
    12 = Inlet Temperature Reading Failure
    13 = Host Communication error
    14 = Real-time clock synchronization failure
    15 = Platform shutdown initiated by NM policy due to execution of action defined by Policy Exception Action
  [3:0] Domain Id (currently supports only one domain, Domain 0)
Event Data 3:
  If Error type = 10 or 15: <Policy Id>
  If Error type = 11: <Power Sensor Address>
  If Error type = 12: <Inlet Sensor Address>
  Otherwise set to 0.
13.2.1 Node Manager Health Event Next Steps
Misconfigured policy can happen if the max min power consump
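The Event Data 2 and Event Data 3 rules in Table 86 lend themselves to a small decoder. The sketch below is an assumption-labelled illustration: the names `NM_ERROR_TYPES`, `ED3_MEANING`, and `decode_nm_health_event` are hypothetical, and only the values listed in the table are encoded.

```python
# Error types from Table 86 (Event Data 2, bits 7:4); 0-9 are reserved.
NM_ERROR_TYPES = {
    10: "Policy Misconfiguration",
    11: "Power Sensor Reading Failure",
    12: "Inlet Temperature Reading Failure",
    13: "Host Communication error",
    14: "Real-time clock synchronization failure",
    15: "Platform shutdown initiated by NM policy",
}

# What Event Data 3 carries for each error type; absent means "set to 0".
ED3_MEANING = {
    10: "Policy Id",
    15: "Policy Id",
    11: "Power Sensor Address",
    12: "Inlet Sensor Address",
}


def decode_nm_health_event(event_data2: int, event_data3: int) -> dict:
    """Split ED2 into error type and domain, and label ED3 accordingly."""
    error_type = (event_data2 >> 4) & 0x0F
    return {
        "error_type": error_type,
        "error_name": NM_ERROR_TYPES.get(error_type, "Reserved"),
        "domain_id": event_data2 & 0x0F,
        "event_data3_meaning": ED3_MEANING.get(error_type),
        "event_data3": event_data3,
    }


inlet_failure = decode_nm_health_event(0xC0, 0x20)
```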
91. ot used
Byte 16, Event Data 3: Not used
6.2.1 Catastrophic Error Sensor Next Steps
This error is typically caused by other platform components.
1. Check for other errors near the time of the CATERR event.
2. Verify all peripherals are plugged in and operating correctly, particularly hard drives, optical drives, and I/O.
3. Update system firmware and drivers.
6.3 CPU Missing Sensor
The CPU Missing sensor is a discrete sensor reporting that the processor is not installed. The most common cause of this event is a processor populated in the incorrect socket.
Table 47: CPU Missing Sensor - Typical Characteristics
Byte 11, Sensor Type: 07h = Processor
Byte 12, Sensor Number: 69h
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 6Fh = Sensor Specific
Byte 14, Event Data 1: [7:6] 00b = Unspecified Event Data 2; [5:4] 00b = Unspecified Event Data 3; [3:0] Event Trigger Offset: 01h = State Asserted
Byte 15, Event Data 2: Not used
Byte 16, Event Data 3: Not used
92. Table 92: Boot-up OEM Event Record - Typical Characteristics
Bytes 1-2, Record ID: ID used for SEL Record access
Byte 3, Record Type: [7:0] DCh = OEM timestamped, bytes 8-16 OEM defined
Bytes 4-7, Timestamp: Time when event was logged. LS byte first.
Bytes 8-10, IPMI Manufacturer ID: 0137h (311d) = IANA enterprise number for Microsoft
Byte 11, Record ID: Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
Bytes 12-15, Boot Time: Timestamp of when system booted into the OS
Byte 16, Reserved: 00h
14.2 Shutdown Event Records
When the system shuts down from the Microsoft Windows OS, there can be multiple events logged. The first is an OS Stop/Shutdown Event Record; this can be followed by a shutdown reason code OEM record, and then zero or more shutdown comment OEM records. These are all informational-only records.
Table 93: Shutdown Reason Code Event Record - Typical Characteristics
Bytes 8-9, Generator ID: 0041h = System Software with an ID = 20h
Byte 11, Sensor Type: 20h = OS Stop/Shutdown
Byte 12, Sensor Number: 00h
Byte 13, Event Directi
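The 16-byte layout in Table 92 can be unpacked with Python's standard `struct` module. This is a sketch under assumptions, not the guide's own tooling: the function name is hypothetical, the Record ID and Boot Time fields are assumed to be LS byte first like the Timestamp field, and timestamps are treated as seconds since 1970-01-01 00:00 UTC as in the IPMI Specification.

```python
import struct
from datetime import datetime, timezone

MICROSOFT_IANA = 0x000137  # 311 decimal, from Table 92


def parse_bootup_record(record: bytes) -> dict:
    """Decode a Windows boot-up OEM SEL record (record type DCh)."""
    if len(record) != 16:
        raise ValueError("a SEL record is 16 bytes long")
    # "<" = little-endian, no padding: 2 + 1 + 4 + 3 + 1 + 4 + 1 = 16 bytes.
    (record_id, record_type, timestamp, mfr_raw,
     sequence, boot_time, _reserved) = struct.unpack("<HBI3sBIB", record)
    if record_type != 0xDC:
        raise ValueError("not a boot-up OEM record (expected type DCh)")
    return {
        "record_id": record_id,
        "logged_at": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "manufacturer_id": int.from_bytes(mfr_raw, "little"),
        "sequence": sequence,
        "boot_time": datetime.fromtimestamp(boot_time, tz=timezone.utc),
    }


# Hypothetical sample record, built only to exercise the parser.
sample = struct.pack("<HBI3sBIB", 0x0042, 0xDC, 1299013200,
                     MICROSOFT_IANA.to_bytes(3, "little"), 1, 1299013100, 0)
parsed = parse_bootup_record(sample)
```

The same unpacking pattern would apply to the shutdown reason record (type DDh), with bytes 12 to 15 read as the reason code instead of a boot time.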
93. promised in turn. An Address Parity Error is logged as such in the SEL, but in all other ways is treated the same as an Uncorrectable ECC Error. While the error may be due to a failing DRAM chip on the DIMM, it could also be caused by incorrect seating or improper contact between the socket and DIMM, or by bent pins in the processor socket.
1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify the contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins and, if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
8 PCI Express and Legacy PCI Subsystem
The PCI Express (PCIe) Specification defines standard error types under the Advanced Error Reporting (AER) capabilities. The BIOS logs AER events into the SEL. The Legacy PCI Specification error types are PERR and SERR. These errors are supported and logged into the SEL.
8.1 PCI Express Errors
PCIe error events are either correctable (informational event) or fatal. In both cases, information is logged to help identify the source of the PCIe error, and the bus, device, and function are included in the extended data fields. The PCIe devi
94. pt
Byte 12, Sensor Number: 04h
Byte 13, Event Direction and Event Type: [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event; [6:0] Event Type: 70h = OEM Specific
Byte 14, Event Data 1: [7:6] 10b = OEM code in Event Data 2; [5:4] 10b = OEM code in Event Data 3; [3:0] Event Trigger Offset as described in Table 66
Byte 15, Event Data 2: PCI Bus number
Byte 16, Event Data 3: [7:3] PCI Device number; [2:0] PCI Function number
Table 66: PCI Express Fatal Error Sensor - Event Trigger Offset - Next Steps
00h Data Link Layer Protocol Error: Indicates a CRC error detected during a DLLP transaction. This means the transaction was corrupted.
01h Surprise Link Down: The link was lost and is no longer functional. Requires a reboot to bring the link back.
02h Unexpected Completion: Indicates the device received a completion notification for a transaction it does not recognize. This is a fatal error.
Next Steps:
1. Decode bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
   b. Install the card in another slot and check whether the error follows the card or stays with the slot.
   c. Update all firmware and drivers
03h Received Unsupported request condition on d address decode with the exception: Typically indicates a failure due to an incorrect address sent to the target
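Step 1 of the next steps ("decode bus, device, and function") follows directly from the Event Data 2 and Event Data 3 layout in Table 65. A minimal sketch, with a hypothetical function name and a hypothetical sample event:

```python
def decode_pci_location(event_data2: int, event_data3: int) -> tuple:
    """Return (bus, device, function) from Event Data 2 and Event Data 3."""
    bus = event_data2 & 0xFF              # ED2: PCI bus number
    device = (event_data3 >> 3) & 0x1F    # ED3 bits 7:3: PCI device number
    function = event_data3 & 0x07         # ED3 bits 2:0: PCI function number
    return bus, device, function


def format_pci_location(bus: int, device: int, function: int) -> str:
    """Render the location in the bus:device.function style used by lspci."""
    return f"{bus:02x}:{device:02x}.{function:x}"


# Hypothetical event: ED2 = 05h, ED3 = E1h (11100b device, 001b function).
location = decode_pci_location(0x05, 0xE1)
label = format_pci_location(*location)
```

The resulting string can be compared against the operating system's device listing to find which card or onboard device reported the error.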
95. ption
Table 35: Temperature Sensors - Next Steps
Table 36: Thermal Margin Sensors - Typical Characteristics
Table 37: Thermal Margin Sensors - Event Triggers Description
Table 38: Thermal Margin Sensors - Next Steps
Table 39: Processor Thermal Control Sensors - Typical Characteristics
Table 40: Processor Thermal Control Sensors - Event Triggers Description
Table 41: Processor Thermal Control Sensors - Next Steps
Table 42: Discrete Thermal Sensors - Typical Characteristics
Table 43: Discrete Thermal Sensors - Next Steps
Table 44: Process Status Sensors - Typical Characteristics
Table 45: Processor Status S
96. r In
Sensor Number; Sensor Name; Details Section; Next Steps
54h; Power Supply 1 12V % of Maximum Current Output (PS1 Curr Out); Power Supply Current Output Sensors; Table 24: Power Supply Current Output Sensor - Event Trigger Offset - Next Steps
55h; Power Supply 2 12V % of Maximum Current Output (PS2 Curr Out); Power Supply Current Output Sensors; Table 24: Power Supply Current Output Sensor - Event Trigger Offset - Next Steps
56h; Power Supply 1 Temperature (PS1 Temperature); Power Supply Temperature Sensors; Table 26: Power Supply Temperature Sensor - Event Trigger Offset - Next Steps
57h; Power Supply 2 Temperature (PS2 Temperature); Power Supply Temperature Sensors; Table 26: Power Supply Temperature Sensor - Event Trigger Offset - Next Steps
60h; Processor 1 Status (P1 Status); Processor Status Sensor; Table 45: Processor Status Sensors - Next Steps
61h; Processor 2 Status (P2 Status); Processor Status Sensor; Table 45: Processor Status Sensors - Next Steps
62h; Processor 1 Thermal Margin (P1 Therm Margin); Thermal Margin Sensors; Table 38: Thermal Margin Sensors - Next Steps
63h; Processor 2 Thermal Margin (P
97. re based solutions. The server management features make the servers simple to manage and provide alerting on system events. From entry to enterprise systems, good overall server management is essential to reduce total cost of ownership. This Troubleshooting Guide is intended to help users better understand the events that are logged in the Baseboard Management Controller's (BMC) System Event Logs (SEL) on these Intel Server Boards. There is a separate User's Guide that covers general server management and the server management software offered on Intel Server Boards and Intel Server Platforms.
Server boards currently supported by this document:
Intel S3200/X38ML Server Boards
Intel S5500/S3420 Series Server Boards
1.1 Purpose
The purpose of this document is to list all possible events generated by the Intel platform. Other sources not under our control may also generate events, which will not be described in this document.
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
The key characteristic of the Intelligent Platform Management Interface (IPMI) is that the inventory, monitoring, logging, and recovery control functions are available independently of the main processors, BIOS, and operating system. Platform management functions can also be made available when the system is in a power-down state. IPMI works by interfacing with the BMC, which extends manag
98. 5.2 Temperature Sensors
A variety of temperature sensors can be implemented on Intel Server Systems. They are split into three types: regular temperature sensors, thermal margin sensors, and discrete temperature sensors. Each type has its own types of events that can be logged.
5.2.1 Regular Temperature Sensors
Regular temperature sensors report an actual temperature. These are linear, threshold-based sensors. Most Intel Server Systems define at least two of them: front panel temperature and baseboard temperature. Both typically have upper and lower thresholds set: the upper threshold warns of an over-temperature situation, and the lower threshold warns of sensor failure (temperature sensors typically read out 0 if they stop working).
Table 33. Temperature Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | See Table 35
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 01h = Threshold
14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2. [5:4] 01b = Trigger threshold in Event Data 3. [3:0] Event Trigger Offset as described in Table 34
15 | Event Data 2 | Reading that triggered event
16 | Event Data 3 | Threshold value that trigger
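The bit layout of Event Data 1 for threshold sensors can be unpacked with a few shifts and masks. The following is a minimal sketch, not part of the guide: the function name and the returned dictionary keys are hypothetical, and the offset names are taken from the threshold trigger tables in this guide (00h, 02h, 07h, 09h).

```python
# Hypothetical helper: unpack byte 14 (Event Data 1) of a threshold event.
# Offset names follow the guide's threshold trigger tables; any other
# offset is reported as "Unknown" rather than guessed.
THRESHOLD_OFFSETS = {
    0x00: "Lower non-critical going low",
    0x02: "Lower critical going low",
    0x07: "Upper non-critical going high",
    0x09: "Upper critical going high",
}


def decode_threshold_event_data1(ed1: int) -> dict:
    """Split Event Data 1 into its [7:6], [5:4] and [3:0] fields."""
    if not 0 <= ed1 <= 0xFF:
        raise ValueError("Event Data 1 must be a single byte")
    offset = ed1 & 0x0F
    return {
        # [7:6] == 01b means Event Data 2 holds the trigger reading
        "reading_in_ed2": (ed1 >> 6) & 0b11 == 0b01,
        # [5:4] == 01b means Event Data 3 holds the trigger threshold
        "threshold_in_ed3": (ed1 >> 4) & 0b11 == 0b01,
        "offset": offset,
        "trigger": THRESHOLD_OFFSETS.get(offset, "Unknown"),
    }


if __name__ == "__main__":
    print(decode_threshold_event_data1(0x57))
```

A byte of 57h (0101 0111b) decodes as reading present, threshold present, offset 07h. The raw reading and threshold in Event Data 2 and 3 still need the sensor's SDR conversion factors before they can be read as degrees.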
99. rroring mode RAS configuration status.
Table 52. Mirroring Configuration Status Sensor Typical Characteristics
Byte | Field | Description
8-9 | Generator ID | 0001h = BIOS POST
11 | Sensor Type | 0Ch = Memory
12 | Sensor Number | 12h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 09h = Digital Discrete
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset as described in Table 53
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used
Table 53. Mirroring Configuration Status Sensor Event Trigger Offset Next Steps
Event Trigger Offset (Hex) | Offset Description | Description | Next Steps
01h | The system has been configured into Mirrored Channel RAS Mode | User enabled mirrored channel mode in setup | Informational event only
00h | The system has been configured out of Mirrored Channel RAS Mode | Mirrored channel mode is disabled, either in setup or due to unavailability of memory at | 1. If this event is accompanied by a POST error 8500, there was a problem applying the mirroring configuration to the memory. Check for other errors related to the memory and troubleshoot accordi
100. List of Tables (continued)
Table 90. ME Firmware Health Event Sensor Next Steps
Table 91. Boot up Event Record Typical Characteristics
Table 92. Boot up OEM Event Record Typical Characteristics
Table 93. Shutdown Reason Code Event Record Typical Characteristics
Table 94. Shutdown Reason OEM Event Record Typical Characteristics
Table 95. Shutdown Comment OEM Event Record Typical Characteristics
Table 96. Bug Check/Blue Screen OS Stop Event Record Typical Characteristics
Table 97. Bug Check/Blue Screen Code OEM Event Record Typical Characteristics
Table 98. Linux Kernel Panic Event Record Characteristics
Table 99. Linux Kernel Panic String Extended Record Characteristics
1 Introduction
The server management hardware that is part of Intel Server Boards and Intel Server Platforms serves as a vital part of the overall server management strategy. It provides essential information to the system administrator and lets the administrator remotely control the server, even when the operating system is not running. The Intel Server Boards and Intel Server Platforms offer comprehensive hardware and softwa
101. rs Next Steps
Sensor Number | Sensor Name | Event Trigger Offset (Hex) | Offset Description | Description
60h | P1 Status | 01h | Thermal trip | The processor exceeded the maximum temperature
60h | P1 Status | 07h | State Asserted | Indicates the processor is present
61h | P2 Status | 01h | Thermal trip | The processor exceeded the maximum temperature
61h | P2 Status | 07h | State Asserted | Indicates the processor is present
Next steps for a thermal trip: This event normally only happens due to failures of the thermal solution.
1. Verify the heatsink is properly attached and has thermal grease.
2. If the system has a heatsink fan, ensure the fan is spinning.
3. Check all system fans are operating properly.
4. Check that the air used to cool the system is within limits (typically 35 °C).
6.2 Catastrophic Error Sensor
When the Catastrophic Error signal (CATERR#) stays asserted, it is a sign that something serious has gone wrong in the hardware. The BMC monitors this signal and reports when it stays asserted.
Table 46. Catastrophic Error Sensor Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 07h = Processor
12 | Sensor Number | 68h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 6Fh = Sensor Specific
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset: 01h = State Asserted
15 | Event Data 2 | N
102. s Parity errors are errors detected in the memory addressing hardware. Because these affect the addressing of memory contents, they can potentially lead to the same sort of failures as ECC errors. They are logged as a distinct type of error because they affect memory addressing rather than memory contents, but otherwise they are treated exactly the same as Uncorrectable ECC Errors. Address Parity errors are logged to the BMC SEL, with Event Data to identify the failing address by channel and DIMM to the extent that it is possible to do so.
Table 62. Address Parity Error Sensor Typical Characteristics
Byte | Field | Description
8-9 | Generator ID | 0033h = BIOS SMI Handler
11 | Sensor Type | 0Ch = Memory
12 | Sensor Number | 14h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 6Fh = Sensor Specific
14 | Event Data 1 | [7:6] 10b = OEM code in Event Data 2. [5:4] 10b = OEM code in Event Data 3. [3:0] Event Trigger Offset = 02h
15 | Event Data 2 | [7:5] Reserved. Set to 0. [4] Channel Information Validity Check: 0b = Channel Number in Event Data 3 Bits[4:3] is not valid, 1b = Channel Number in Event Data 3 Bits[4:3] is valid. [3] DIMM Information Validity Check
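The validity bit in Event Data 2 gates whether the channel field in Event Data 3 can be trusted. A minimal sketch of that check follows, assuming only what the excerpt above states (ED2 bit 4 and ED3 bits [4:3]); the function name is hypothetical, the DIMM fields are left out because their layout is cut off in this excerpt, and the mapping from channel number to a board silkscreen label is not covered here.

```python
from typing import Optional


def address_parity_channel(ed2: int, ed3: int) -> Optional[int]:
    """Return the raw channel number from Event Data 3 bits [4:3],
    or None when Event Data 2 bit 4 marks the channel field as not valid."""
    for value in (ed2, ed3):
        if not 0 <= value <= 0xFF:
            raise ValueError("event data bytes must be in the range 0..255")
    channel_valid = bool(ed2 & 0x10)  # ED2 bit 4: channel validity check
    if not channel_valid:
        return None
    return (ed3 >> 3) & 0b11          # ED3 bits [4:3]: channel number


if __name__ == "__main__":
    print(address_parity_channel(0x10, 0x08))  # channel field valid
    print(address_parity_channel(0x00, 0x08))  # channel field not valid
```

Returning None instead of 0 for an invalid field keeps "channel 0" and "channel unknown" from being confused in later reporting.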
103. s within limits (typically 35 °C).
Discrete Thermal Sensors
Discrete thermal sensors do not report a temperature at all; instead, they report an overheating event of some kind. Examples are VRD Hot (a voltage regulator is overheating) and processor Thermal Trip (the processor got so hot that its over-temperature protection was triggered and the system was shut down to prevent damage).
Table 42. Discrete Thermal Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | See Table 43
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: See Table 43
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset as described in Table 43
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used
Table 43. Discrete Thermal Sensors Next Steps
Sensor Number | Sensor Name | Event Type | Event Trigger Offset (Hex) | Offset Description | Description | Next Steps
66h | P1 VRD Hot | 05h | 01h | Limit Exceeded | Processor 1 voltage regulator overheated | 1. Check for clear and unobstructed airflow into and out of the chassis. 2. Ensure the SDR is pr
104. stamp Synch OS Boot events
0033h = BIOS SMI Handler
0020h = BMC Firmware
002Ch = ME Firmware
0041h = Server Management Software
00C0h = HSC Firmware - HSBP A
00C2h = HSC Firmware - HSBP B
Byte 2: [7:4] Channel number. Channel that the event message was received over; 0h if the event message was received from the system interface, primary IPMB, or internally generated by the BMC. [3:2] Reserved. Write as 00b. [1:0] IPMB device LUN if byte 1 holds Slave Address; 00b otherwise.
Byte | Field | Description
10 | EvM Rev (ER) | Event Message format version: 04h = IPMI v2.0, 03h = IPMI v1.0
11 | Sensor Type (ST) | Sensor Type Code for the sensor that generated the event
12 | Sensor Number (SN) | Number of the sensor that generated the event (from SDR)
13 | Event Dir and Event Type | [7] Event Dir: 0b = Assertion event, 1b = Deassertion event. [6:0] Event Type: type of trigger for the event (for example, critical threshold going high, state asserted, and so on). Also indicates the class of the event (for example, discrete, threshold, or OEM). The Event Type field is encoded using the Event/Reading Type Code. Event Type Codes: 01h = Threshold (States 0x00-0x0b), 02h-0Ch = Discrete, 6Fh = Sensor Specific, 70h-7Fh = OEM
14 | Event Data 1 (ED1) | Per Table 2, Event Request Message Event Data Field Contents
15 | Event Data 2 (ED2) | Per Table 2
16 | Event Data 3 (ED3) | Per Table 2
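Byte 13 packs the event direction and the event type into one byte, and the Generator ID identifies which firmware or software logged the record. The sketch below illustrates both lookups using only the values listed in this excerpt; the function and dictionary names are hypothetical, and Generator IDs not shown above are reported as unknown rather than guessed.

```python
# Generator IDs as listed in this excerpt of the guide (not exhaustive).
GENERATOR_IDS = {
    0x0033: "BIOS SMI Handler",
    0x0020: "BMC Firmware",
    0x002C: "ME Firmware",
    0x0041: "Server Management Software",
    0x00C0: "HSC Firmware - HSBP A",
    0x00C2: "HSC Firmware - HSBP B",
}


def event_type_class(event_type: int) -> str:
    """Map an Event/Reading Type Code to its class."""
    if event_type == 0x01:
        return "Threshold"
    if 0x02 <= event_type <= 0x0C:
        return "Discrete"
    if event_type == 0x6F:
        return "Sensor Specific"
    if 0x70 <= event_type <= 0x7F:
        return "OEM"
    return "Unknown"


def decode_event_dir_type(byte13: int) -> tuple:
    """Split byte 13 into (direction, event type code, event type class)."""
    if not 0 <= byte13 <= 0xFF:
        raise ValueError("byte 13 must be a single byte")
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"  # bit 7
    event_type = byte13 & 0x7F                                   # bits 6:0
    return direction, event_type, event_type_class(event_type)


if __name__ == "__main__":
    print(GENERATOR_IDS.get(0x002C, "Unknown generator"))
    print(decode_event_dir_type(0xEF))
```

For example, a byte 13 value of EFh has bit 7 set and 6Fh in the low seven bits, so it is the deassertion of a sensor-specific event.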
105. t Steps
Sensor Number | Sensor Name | Next Steps
10h | BB 1.1V IOH | This 1.1V line is supplied by the main board and is used by the I/O hub (IOH). 1. Ensure all cables are connected correctly. 2. If the issue remains, replace the motherboard.
11h | BB 1.1V P1 Vccp | This 1.1V line is supplied by the main board and is used by processor 1. 1. Ensure all cables are connected correctly. 2. Cross-test the processor if possible. If the issue remains with the socket, replace the main board; otherwise replace the processor.
12h | BB 1.1V P2 Vccp | This 1.1V line is supplied by the main board and is used by processor 2. 1. Ensure all cables are connected correctly. 2. Cross-test the processor if possible. If the issue remains with the socket, replace the main board; otherwise replace the processor.
13h | BB 1.5V P1 DDR3 | This 1.5V line is supplied by the main board and is used by the memory on processor 1. 1. Ensure all cables are connected correctly. 2. Check the DIMMs are seated properly. 3. Cross-test the DIMMs. If the issue remains with the DIMMs on this socket, replace the main board; otherwise replace the DIMM.
14h | BB 1.5V P2 DDR3 | This 1.5V line is supplied by the main board. This 1.5V line is used by the memory on pr
106. Byte | Field | Description
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 01h = Threshold
14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2. [5:4] 01b = Trigger threshold in Event Data 3. [3:0] Event Trigger Offset as described in Table 24
15 | Event Data 2 | Reading that triggered event
16 | Event Data 3 | Threshold value that triggered event
The following table describes the severity of each of the event triggers for both assertion and deassertion.
Table 24. Power Supply Current Output Sensor Event Trigger Offset Next Steps
Event Trigger Offset (Hex) | Offset Description | Assertion Severity | Deassert Severity
07h | Upper non-critical going high | Degraded | OK
09h | Upper critical going high | non-fatal | Degraded
Description: PMBus feature to monitor power supply power consumption.
Next steps: If you see this event, the system is using too much power on the output for the PSU rating.
1. Verify the power budget is within the specified range.
2. Check http://www.intel.com/p/en_US/support for the power budget tool for your system.
4.3.4 Power Supply Temperature Sensors
The BMC monitors one power supply temperature sensor for each installed PMBus-compliant power supply.
107. 10.3 Button Press Events
The BMC logs when the front panel power and reset buttons are pressed. This is purely for informational purposes; these events do not indicate errors.
Table 75. Button Press Events Sensor Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 14h = Button/Switch
12 | Sensor Number | 09h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 6Fh = Sensor Specific
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Event Trigger Offset: 0h = Power Button, 2h = Reset Button
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used
11 Miscellaneous Events
The miscellaneous events section addresses sensors not easily grouped with other sensor types.
11.1 IPMI Watchdog
PCSD server systems support an IPMI watchdog timer, which can check whether the OS is still responsive. The timer is disabled by default and has to be enabled manually. It then requires an IPMI-aware utility in the operating system that resets the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so
108. threshold
Event Trigger Offset (Hex) and description:
07h: Upper non-critical going high.
09h: Upper critical going high. Assertion severity: non-fatal. Deassert severity: Degraded. The thermal margin has gone over its upper critical threshold.
Table 38. Thermal Margin Sensors Next Steps
Sensor Number | Sensor Name | Next Steps
22h | IOH Therm Margin | See steps 1 to 4 below
23h | Mem P1 Therm Margin | See steps 1 to 4 below
24h | Mem P2 Therm Margin | See steps 1 to 4 below
62h | P1 Therm Margin | Not a logged SEL event. Sensor is used for thermal management of the processor.
63h | P2 Therm Margin | Not a logged SEL event. Sensor is used for thermal management of the processor.
1. Check for clear and unobstructed airflow into and out of the chassis.
2. Ensure the SDR is programmed and the correct chassis has been selected.
3. Ensure there are no fan failures.
4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).
5.2.3 Processor Thermal Control Sensors
Processor Thermal Control sensors report the percentage of the time that the processor is throttling its performance due to thermal issues. If this is not addressed, the processor could overheat and shut down the system to protect itself from damage.
Table 39. Processor Thermal Control Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | See Table 41
13 | Event Direction and Event Type | [7] Event
109. tion of the platform exceeds the values in policy due to hardware reconfiguration. The first occurrence of an unacknowledged event will be retransmitted no faster than every 300 milliseconds. A real-time clock synchronization failure alert is sent when NM is enabled and capable of limiting power, but within 10 minutes the firmware cannot obtain valid calendar time from the host side, so NM cannot handle suspend periods. Next steps depend on the policy that was set. See the Node Manager Specification for more details.
13.3 Node Manager Operational Capabilities Change
This message provides a runtime error indication about Intel Intelligent Power Node Manager's operational capabilities. This applies to all domains. Assertion and deassertion of these events are supported.
Table 87. Node Manager Operational Capabilities Change Sensor Typical Characteristics
Byte | Field | Description
8-9 | Generator ID | 002Ch = ME Firmware
11 | Sensor Type | DCh = OEM
12 | Sensor Number | 1Ah
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 74h = OEM
14 | Event Data 1 | [7:6] 00b = Unspecified Event Data 2. [5:4] 00b = Unspecified Event Data 3. [3:0] Current state of Operational Capabilities. Bit pattern: 0
110. unspecified
Event class OEM:
Event Data 1 | [7:6] 00b = Unspecified in Event Data 2, 01b = Previous state and/or severity in Event Data 2, 10b = OEM code in Event Data 2, 11b = Reserved. [5:4] 00b = Unspecified Event Data 3, 01b = Reserved, 10b = OEM code in Event Data 3, 11b = Reserved. [3:0] Offset from Event/Reading Type Code
Event Data 2 | [7:4] Optional OEM code bits or offset from "Severity" Event/Reading Type Code (0Fh if unspecified). [3:0] Optional OEM code or offset from Event/Reading Type Code for previous event state (0Fh if unspecified)
Event Data 3 | Optional OEM code. FFh or not present if unspecified.
Table 3. OEM SEL Record (Type C0h-DFh)
Byte | Field | Description
1-2 | Record ID (RID) | ID used for SEL Record access
3 | Record Type (RT) | [7:0] Record Type: C0h-DFh = OEM timestamped, bytes 8-16 OEM defined
4-7 | Timestamp (TS) | Time when event was logged, LS byte first. Example: TS:[29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC. Note: Various websites will convert the raw number to a date/time.
8-10 | Manufacturer ID | LS byte first. The manufacturer ID is a 20-bit value that is derived from the IANA Private Enterprise ID. Most significant four b
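The timestamp example above can be reproduced without an external converter. This is a minimal sketch, assuming the four timestamp bytes are handed over in record order (LS byte first) and count seconds since 1 January 1970 UTC; the function name is hypothetical.

```python
from datetime import datetime, timezone


def decode_sel_timestamp(raw: bytes) -> datetime:
    """Convert SEL record bytes 4-7 (LS byte first) to a UTC datetime."""
    if len(raw) != 4:
        raise ValueError("a SEL timestamp is exactly four bytes")
    seconds = int.from_bytes(raw, byteorder="little")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Bytes as they appear in the record: 29 76 68 4C
    stamp = decode_sel_timestamp(bytes.fromhex("2976684C"))
    print(stamp.isoformat())  # 2010-08-15T23:20:09+00:00
```

Reading the bytes as little-endian gives 4C687629h (1281914409), which matches the worked example in the table.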
111. used 2. If the error remains (unlikely), replace the board.
Sensor Number | Sensor Name | Next Steps
19h | BB 5.0V | 5.0V is supplied by the power supplies and is used by the PCI slots. 1. Ensure all cables are connected correctly. 2. Reseat any PCI cards and try them in other slots. 3. If the issue follows the card, swap it; otherwise replace the main board. 4. If the issue remains, replace the power supplies.
1Ah | BB 5.0V STBY | 5.0V STBY is supplied by the power supplies and is used to generate other standby voltages. 1. Ensure all cables are connected correctly. 2. If the issue remains, replace the board. 3. If the issue remains, replace the power supplies.
1Bh | BB 12.0V | 12V is supplied by the power supplies and is used by SATA drives, fans, and PCI cards. In addition, it is used to generate various processor voltages. 1. Ensure all cables are connected correctly. 2. Check connections on fans and HDDs. 3. If the issue follows the component, swap it; otherwise replace the board. 4. If the issue remains, replace the power supplies.
1Ch | BB -12.0V | -12V is supplied by the power supplies and is used by the serial port and by PCI cards. In addition, it is used to generate various processor voltages. 1. Ensure all cables are connected correctly. 2. Reseat any PCI cards and try them in other slots. 3. If the issue follows the card, swap it; otherwise replace the main board. 4. If the issue remains, replace the power supplies.
112. ut Sensor Typical Characteristics
System Event Log Cleared Sensor Typical Characteristics
System Event PEF Action Sensor Typical Characteristics
List of Tables (continued)
Table 81. HSC Backplane Temperature Sensor Typical Characteristics
Table 82. HSC Backplane Temperature Sensor Event Trigger Offset Next Steps
Table 83. HSC Drive Slot Status Sensor Typical Characteristics
Table 84. HSC Drive Presence Sensor Typical Characteristics
Table 85. Node Manager Exception Sensor Typical Characteristics
Table 86. Node Manager Health Event Sensor Typical Characteristics
Table 87. Node Manager Operational Capabilities Change Sensor Typical Characteristics
Table 88. Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
Table 89. ME Firmware Health Event Sensor Typical Characteristics
113. If the issue continues to be persistent, provide the content of Event Data 3 to the Intel support team for interpretation. Event Data 3 codes are in general not documented, because their meaning varies, only provides some clues, and usually needs to be individually interpreted.
Table 90. ME Firmware Health Event Sensor Next Steps
ED2 | ED3 | Description | Next Steps
00h | Recovery GPIO forced | Recovery Image loaded due to recovery MGPIO pin asserted. Pin number is configurable in factory presets. Default recovery pin is MGPIO1. | Deassert MGPIO1 and reset the Intel ME.
01h | Image execution failed | Recovery Image or backup operational image loaded because the operational image is corrupted. This may be caused either by flash device corruption or by a failed upgrade procedure. | Either the flash device must be replaced (if the error is persistent) or the upgrade procedure must be started again.
02h | Flash erase error | Error during flash erasure procedure. | The flash device must be replaced.
03h | Flash corrupted | Error while checking Flash consistency. | The Flash device must be replaced (if the error is persistent).
04h | Internal error | Error during firmware execution. FW Watchdog Timeout. | Operational image needs to be updated to another version, or hardware board repair is needed (if the error is persistent).
05h | | BMC did not respond to cold reset request and Intel ME rebooted the platform. | Verify the Intel Node Manager configuration.
06h | R
114. Table 25. Power Supply Temperature Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 01h = Temperature
12 | Sensor Number | 56h = Power Supply 1 Temperature, 57h = Power Supply 2 Temperature
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event. [6:0] Event Type: 01h = Threshold
14 | Event Data 1 | [7:6] 01b = Trigger reading in Event Data 2. [5:4] 01b = Trigger threshold in Event Data 3. [3:0] Event Trigger Offset as described in Table 26
15 | Event Data 2 | Reading that triggered event
16 | Event Data 3 | Threshold value that triggered event
The following table describes the severity of each of the event triggers for both assertion and deassertion.
Table 26. Power Supply Temperature Sensor Event Trigger Offset Next Steps
Event Trigger Offset (Hex) | Offset Description | Assertion Severity | Deassert Severity
07h | Upper non-critical going high | Degraded | OK
09h | Upper critical going high | non-fatal | Degraded
Description: An upper non-critical or critical temperature threshold has been crossed.
Next steps:
1. Check for clear and unobstructed airflow into and out of the chassis.
2. Ensure the SDR is programmed and the correct chassis has been selected.
3. Ensure there are no fan failures.
4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).