Sun Fire X4100/X4100 M2 and X4200/X4200 M2 Servers
Contents
1. 200 | 08/26/2005 | 05:04:09 | Memory | Memory Device Disabled | CPU 0 DIMM 0

Note the following considerations for this revision:
- Uncorrectable ECC Memory Error is not reported.
- Multi-bit ECC errors are reported as Memory Device Disabled.
- On first reboot, the BIOS logs a HyperTransport Error in the DMI log.
- The BIOS disables the DIMM.
- The BIOS sends the SEL records to the BMC.
- The BIOS reboots again.
- The BIOS skips the faulty DIMM on the next POST memory test.
- The BIOS reports available memory, excluding the faulty DIMM pair.

FIGURE E-1 shows an example of a DMI log screen from BIOS Setup Page.

72 Sun Fire X4100/X4100 M2 and X4200/X4200 M2 Servers Diagnostics Guide, May 2007

[FIGURE E-1: Sample DMI Log Screen, Uncorrectable Error. BIOS Setup Utility, Advanced, Event Logging details, View Event Log. Logged entry: 09/12/05 11:51:05, "A Hyper Transport sync flood error occurred on last boot."]

Appendix E Error Handling 73

Handling of Correctable Errors

This section lists facts and considerations about how the server handles correctable errors.

During BIOS POST:
- The BIOS polls the MCK registers.
- The BIOS logs to
2. Mode      Description
   OFF       LED off
   ON        LED steady on
   STANDBY   100 ms on, 2900 ms off
   SLOW      1 Hz blink rate
   FAST      4 Hz blink rate

LED Sensor Groups

Because each LED has its own sensor and can be controlled independently, there is some overlap in sensors. In particular, there are separate LEDs defined for the power, locate, and alert LEDs on the front and back panels. It is desirable to have these sensors linked so that both the front and back panel LEDs can be controlled at the same time. This is handled through the use of Entity Association Records. These are records in the SDR that contain a list of entities that are considered part of a group. For each Entity Association Record, we also define another Generic Device Locator as a logical entity to indicate to system software that it refers to a group of LEDs rather than a single physical LED. TABLE D-4 describes the LED sensor groups.

TABLE D-4 LED Sensor Groups

   Group Name       Sensors in Group
   sys.power.led    bp.power.led, fp.power.led
   sys.locate.led   bp.locate.led, fp.locate.led
   sys.alert.led    bp.alert.led, fp.alert.led

For example, to set both the front and back panel Power/OK LEDs to a standby blink rate, you could use this command:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme sunoem led set sys.power.led standby
   Set LED fp.power.led to STANDBY
   Set LED bp.power.led to STANDBY
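The LED modes and the group command above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `led_set_command`, the example address, and the dotted sensor spelling (`sys.power.led`) are assumptions made for the sketch, and the script only builds the argument list rather than contacting a service processor.

```python
# Sketch only: helper name, host address, and dotted sensor IDs are assumptions.
LED_MODES = ("OFF", "ON", "STANDBY", "SLOW", "FAST")  # modes from the LED Modes table

# Full blink cycle in milliseconds for the blinking modes, derived from the table:
# STANDBY is 100 ms on + 2900 ms off, SLOW is 1 Hz, FAST is 4 Hz.
BLINK_PERIOD_MS = {"STANDBY": 100 + 2900, "SLOW": 1000 // 1, "FAST": 1000 // 4}


def led_set_command(host, sensor, mode, user="root", password="changeme"):
    """Return the ipmitool argument list that sets one LED (or LED group) to a mode."""
    if mode.upper() not in LED_MODES:
        raise ValueError(f"unknown LED mode: {mode!r}")
    return [
        "ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
        "sunoem", "led", "set", sensor, mode.lower(),
    ]


if __name__ == "__main__":
    # Setting the group sensor drives both the front and back panel LEDs.
    print(" ".join(led_set_command("192.0.2.10", "sys.power.led", "standby")))
```

The returned list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with the password argument.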
3. 600 | 02/16/2006 | 03:32:55 | System Firmware Progress | Video initialization
700 | 02/16/2006 | 03:33:01 | System Firmware Progress | USB resource configuration

Correctable DIMM Errors

At this time, correctable errors are not logged in the server's system event logs. They are reported or handled in the supported operating systems as follows:

- Windows server: A Machine Check error message bubble pops up on the task bar. The user must manually go into Event Viewer to view errors, as follows: Start > Administration Tools > Event Viewer. View individual errors (by time) to see details of the error.
- Solaris: There is no reporting of correctable errors in Solaris x86 at this time.
- Linux: There is no reporting of correctable errors in the Linux distributions that we support at this time.

BIOS DIMM Error Messages

BIOS will display and log three types of error messages:

NODE-n Memory Configuration Mismatch

The following conditions will cause this error message:
- DIMMs are not paired (running in 64-bit mode instead of 128-bit mode)
- DIMMs' speed not same
- DIMMs do not support ECC
- DIMMs are not registered
- MCT stopped due to errors in DIMM
- DIMM module type (buffer) mismatch
- DIMM generation (I or II) mismatch
- DIMM CL/T mismatch
- Banks on two-sided DIMM mismatch
- DIMM organization mismatch (128-bit)
- SPD missing T
4. Then you could turn off the back panel Power/OK LED but leave the front panel Power/OK LED blinking by using this command:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme sunoem led set bp.power.led off
   Set LED bp.power.led to OFF

Using IPMItool Scripts For Testing

For testing purposes, it is often useful to change the status of all, or at least several, LEDs at once. You can do this by constructing an IPMItool script and executing it with the exec command. For example, a script to turn on all fan module LEDs would look like:

   sunoem led set ft0.fm0.led on
   sunoem led set ft0.fm1.led on
   sunoem led set ft0.fm2.led on
   sunoem led set ft1.fm0.led on
   sunoem led set ft1.fm1.led on
   sunoem led set ft1.fm2.led on

If this script file were then named leds_fan_on.isc, you would use it in a command as shown here:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme exec leds_fan_on.isc

Appendix D Using IPMItool to View System Information 69

APPENDIX E
Error Handling

Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.

This appendix contains information about how the servers process and log errors. See the following sections:
- "Handling of Uncorrectable Errors" on page 71
- "Handling of Correctable Errors" on page 74
- "Handling of Parity Errors (PERR)" o
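A script file like the one above is regular enough to generate. The following is a minimal sketch, assuming two fan trays with three modules each (the six LEDs listed above) and dotted sensor IDs of the form ft<tray>.fm<module>.led; the function name is hypothetical.

```python
# Sketch only: function name and the dotted sensor-ID spelling are assumptions.
def fan_led_script(mode="on", trays=2, modules=3):
    """Return the lines of an IPMItool script that sets every fan module LED."""
    return [
        f"sunoem led set ft{tray}.fm{module}.led {mode}"
        for tray in range(trays)
        for module in range(modules)
    ]


if __name__ == "__main__":
    # Print the script; redirect the output to a file such as leds_fan_on.isc
    # and pass that file name to the ipmitool exec command.
    print("\n".join(fan_led_script("on")))
```

Generating the file avoids the hand-typing slips that are easy to make in near-identical lines, and the same function produces the matching "off" script for cleanup after a test.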
5. all the fans are entity 29. The last fan listed (29.5) is entity 29, with instance 5:

   ft1.fm2.f0.speed | 48h | ok | 29.5 | 6000 RPM

For example, to see all fan-related sensors, you would use the following command that uses the entity 29 argument:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme sdr entity 29

[Sample output: the listing covers the fail, led, and f0.speed sensors for fan modules ft0.fm0, ft0.fm1, ft0.fm2, ft1.fm0, ft1.fm1, and ft1.fm2, at entity instances 29.0 through 29.5. Each fail sensor reports Predictive Failure Deasserted, each led sensor appears as a Generic Device, and each f0.speed sensor reads 6000 RPM with status ok.]

Other queries can include a particular type of sensor. The command in the following example would return a list
6. reserved. Sun Microsystems, Inc. has intellectual property rights relating to the technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the United States and in other countries. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of this product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the United States and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, Java, AnswerBook2, docs.sun.com, Sun Fire, SunVTS, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and in other countries. All
7. viewing with IPMItool 65 87 G general troubleshooting guidelines 3 graceful shutdown 5 GRASP board power status LED 42 guidelines for troubleshooting 3 H hard disk drive status LEDs 37 hardware errors handling 82 l ILOM SP GUI general information 43 serial connection 44 time stamps 47 viewing component inventory 48 viewing sensors 50 viewing SP SEL 45 inspection external 4 internal 5 Integrated Lights Out Manager Service Processor See ILOM SP Intelligent Platform Management Interface See IPMI internal inspection 5 internal LEDs 39 IPMI general information 56 IPMItool changing password 58 clearing SP SEL 63 configuring SSH key 58 connecting to server 57 enabling anonymous user 57 general information 56 LED modes 68 LED sensor groups 68 LED sensor IDs 66 location of package 56 man page 56 setting LED status 66 using scripts for testing 69 using SDR 64 viewing component inventory 65 viewing LED status 66 viewing sensor status 59 viewing SP SEL 62 isolating DIMM ECC errors 16 L LEDs back panel definitions 38 back panel locations 37 CPU fault 42 DIMM fault 42 external 35 fan module fault 42 Front Fan Fault 37 front panel definitions 36 front panel locations 36 GRASP Board Power Status 42 hard disk drive status 37 internal 39 Locate 36 modes 68 power supply status 38 Power Supply Rear Fan Tray Fault 37 Power OK 36 rear fan tray fault 38 sensor groups 68 sensor IDs 66 Service Action Required 36
8. Action Required LED: Same function as on front panel.
Power/OK LED: Same function as on front panel.

This LED has two states:
- Off: Fan module is OK.
- Lit (amber): Fan tray has failed.

The power supplies have three LEDs:
- Top LED (green): Power supply is OK.
- Middle LED (amber): Power supply failed.
- Bottom LED (green): AC power to power supply is OK.

This LED helps you to identify which system you are working on in a rack full of servers.
- Push and release this button to make the Locate LED blink for 30 minutes.
- Hold down the button for 5 seconds to initiate a "push-to-test" mode that illuminates all other LEDs, both inside and outside of the chassis, for 15 seconds.

This LED has two states:
- Off: Normal operation.
- Slow Blinking: An event that requires a service action has been detected.

This LED has three states:
- Off: Server main power and standby power are off.
- Blinking: Server is in standby power mode, with AC power applied to only the GRASP board and the power supply fans.
- On: Server is in main power mode, with AC power supplied to all components.

Internal Status Indicator LEDs

The servers have internal fault indicator LEDs for the fan modules, the DIMM slots, and the CPUs. FIGURE B-3 shows the locations of the internal LEDs. See TABLE B-3 for descriptions of the LED behavior. N
9. Correcting DIMM ECC Errors" on page 16 for a procedure to determine which DIMM of the pair is faulty. FIGURE 1-4 shows the numbering of the Sun Fire X4100/X4200 DIMM slots. FIGURE 1-5 shows the numbering of the Sun Fire X4100 M2/X4200 M2 DIMM slots.

Chapter 1 Initial Inspection of the Server 11

[FIGURE 1-4: Sun Fire X4100/X4200 DIMM Slot Locations. Labels: back panel of server; DIMM 3, DIMM 1, DIMM 2, DIMM 0; Pair 0 (DIMM 0, DIMM 1); Pair 1 (DIMM 2, DIMM 3); DIMM fault LEDs in DIMM ejector levers.]

[FIGURE 1-5: Sun Fire X4100 M2/X4200 M2 DIMM Slot Locations. Labels: DIMM SW2; DIMM A0, DIMM B0, DIMM A1, DIMM B1; Pair 0 (DIMM B1, DIMM A1); Pair 1 (DIMM B0, DIMM A0); back panel of server; DIMM fault LEDs in DIMM ejector levers.]

DIMM Population Rules

Note: The Sun Fire X4100/X4200 servers use only DDR1 DIMMs. The Sun Fire X4100 M2/X4200 M2 servers use only DDR2 DIMMs.

Sun Fire X4100/X4200 Rules

The DIMM population rules for the Sun Fire X4100/X4200 servers are listed here:
- Each CPU can support a maximum of four DDR1 DIMMs.
- Each pair of DIMMs must be identical (same manufacturer, size, and speed).
- The DIMM slots are paired and the DIMMs mu
10. Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.

Appendix A BIOS Event Logs and POST Codes 29

POST Code Checkpoints

The POST code checkpoints are the largest set of checkpoints during the BIOS pre-boot process. TABLE A-2 describes the type of checkpoints that might occur during the POST portion of the BIOS. These two-digit checkpoints are the output from primary I/O port 80.

TABLE A-2 POST Code Checkpoints

Post Code   Description
03   Disable NMI, Parity, video for EGA, and DMA controllers. At this point, only ROM accesses are to the GPNV. If BB size is 64K, require to turn on ROM Decode below FFFF0000h. It should allow USB to run in E000 segment. The HT must program the NB-specific initialization, and OEM-specific initialization can program if it needs to at the beginning of BIOS POST, like overriding the default values of Kernel Variables.
04   Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. Verify CMOS checksum manually by reading storage area. If the CMOS checksum is bad, update CMOS with power-on default values and clear passwords. Initialize status register A. Initializes data variables that are based on CMOS setup questions. Initializes both the 8259-compatible PICs in the system.
05   Initializes the interrupt controlling hardware (generally PIC) and interrupt vector table.
06   Do R/W test to
11. Problems

- If the server will power on, skip this section and go to "Externally Inspecting the Server" on page 4.
- If the server will not power on, check this list of items:

1. Check that AC power cords are attached firmly to the server's power supplies and to the AC source.
2. Check that both the main cover and rear cover are firmly in place. There is an intrusion switch on the front I/O board that automatically shuts down the server power to standby mode when the covers are removed.

Externally Inspecting the Server

To perform a visual inspection of the external system:

1. Inspect the external status indicator LEDs, which can indicate component malfunction. For the LED locations and descriptions of their behavior, see "External Status Indicator LEDs" on page 35.
2. Verify that nothing in the server environment is blocking air flow or making a contact that could short out power.
3. If the problem is not evident, continue with "Internally Inspecting the Server" on page 5.

Internally Inspecting the Server

Perform a visual inspection of the internal system by following these steps. Stop when you identify the problem.

1. Choose a method for shutting down the server from main power mode to standby power mode.
   a. Graceful shutdown: Use a ballpoint pen or other stylus to press and release the Power button on the front panel. This ca
12. Refresh button to update the sensor readings to their current status.

5. Click the Show Thresholds button to display the settings that trigger alerts. The Sensor Readings table is updated. See the example in FIGURE C-4. For example, if system temperature reaches 30 C, the service processor will send an alert.

Sensor thresholds include the following:
- Low/High NR: Low or high non-recoverable
- Low/High CR: Low or high critical
- Low/High NC: Low or high non-critical

[Screen capture: ILOM Sensor Readings page with thresholds shown (77 sensors). Columns: Status, Name, Reading, Low NR, Low CT, Low NC, High NC, High CT, High NR. Example row: Normal, mb.t_amb, 24 degrees C, with thresholds of 18, 20, 22, 35, 40, and 45 degrees C.]
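The three threshold pairs can be read as nested bands around the normal range. The sketch below shows one way to classify a reading against them. The threshold numbers are taken from the mb.t_amb row of the sample screen; the function and type names are hypothetical, and treating a reading that equals a threshold as having crossed it is an assumption, not documented ILOM behavior.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    """Six sensor thresholds, ordered from lowest to highest."""
    low_nr: float   # low non-recoverable
    low_cr: float   # low critical
    low_nc: float   # low non-critical
    high_nc: float  # high non-critical
    high_cr: float  # high critical
    high_nr: float  # high non-recoverable


def classify(reading, t):
    """Return the most severe band the reading falls into (inclusive bounds assumed)."""
    if reading <= t.low_nr or reading >= t.high_nr:
        return "non-recoverable"
    if reading <= t.low_cr or reading >= t.high_cr:
        return "critical"
    if reading <= t.low_nc or reading >= t.high_nc:
        return "non-critical"
    return "normal"


# Ambient temperature thresholds in degrees C, from the sample mb.t_amb row.
MB_T_AMB = Thresholds(18, 20, 22, 35, 40, 45)

if __name__ == "__main__":
    for value in (24, 36, 41, 45):
        print(value, classify(value, MB_T_AMB))
```

Checking the widest band first keeps the function short: a reading that is past the non-recoverable limit is also past the critical and non-critical limits, so the first match is always the most severe one.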
13. The BIOS is not The BIOS goes to the next boot device in the DMI Log Non fatal Failure able to boot froma list If all devices inthe list fail an error mes device in the boot sage is displayed retry from beginning of device list list SP can control change boot order Appendix E Error Handling 85 86 Sun Fire X4100 X4100 M2 and X4200 X4200 M2 Servers Diagnostics Guide May 2007 Index A anonymous user IPMItool 57 B back panel LEDs definitions 38 locations 37 BIOS event logs 23 POST code checkpoints 30 POST codes 28 POST options 27 POST overview 25 redirecting console output for POST 26 Bootable Diagnostics CD 20 C comments and suggestions x component inventory viewing with ILOM SP GUI 48 viewing with IPMItool 65 console output redirecting 26 correctable errors handling 74 CPU fault LED 42 D diagnostic software Bootable Diagnostics CD 20 SunVTS 19 DIMM fault LEDs definition 42 DIMMs fault LEDs 11 isolating errors 16 population rules 14 E emergency shutdown 5 error handling correctable 74 hardware errors 82 mismatching processors 81 parity errors 76 system errors 79 uncorrectable errors 71 event logs BIOS 23 external inspection 4 external LEDs 35 F fan fault LED front panel 37 fan module fault LEDs 42 faults DIMM 11 finding sensor names 64 Front Fan Fault LED 37 front panel LEDs definitions 36 locations 36 front panel Power button 5 FRU inventory viewing with ILOM SP GUI 48
14. could not find or load the CPU Microcode Update to the CPU. The message most likely appears when a new CPU is installed in a motherboard with an outdated BIOS. In this case, the BIOS must be updated.
   Error: Microcode Error
   Handling: The BIOS displays an error message, logs the error to DMI, and boots.
   Logged: DMI Log
   Severity: Non-fatal

BIOS POST CMOS Checksum Bad
   Description: CMOS contents failed the Checksum check.
   Handling: The BIOS displays an error message, logs the error to DMI, and boots.
   Logged: DMI Log
   Severity: Non-fatal

Unsupported CPU configuration
   Description: The BIOS supports mismatched frequency and steppings in CPU configuration, but some CPUs might not be supported.
   Handling: The BIOS displays an error message, logs the error, and halts the system.
   Logged: DMI Log
   Severity: Fatal

Correctable error
   Description: The CPU detects a variety of correctable errors in the MCi_STATUS registers.
   Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half-second by SMI timer interrupts and is done by the BIOS SMI handler. The SMI handler logs a message to the SP SEL if the SEL is available; otherwise, SMI logs a message to DMI. The BIOS's polling can be disabled through software SMI.
   Logged: DMI Log, SP SEL
   Severity: Normal operation

Single fan failure
   Description: Fan failure is detected by reading tach signals.
   Handling: The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.
   Logged: SP SEL
   Severity: Non-fatal
15. example:

   fp.t_amb | 0Ah | ok | 12.0 | 22 degrees C

Reading Specific Sensors

Although the default output is a long list of sensors, it is possible to refine the output to see only specific sensors. The sdr list command can use an optional argument to limit the output to sensors of a specific type. TABLE D-1 describes the available sensor arguments.

TABLE D-1 IPMItool Sensor Arguments

   Argument   Description               Sensors
   all        All sensor records        All sensors
   full       Full sensor records       Temperature, voltage, and fan sensors
   compact    Compact sensor records    Digital Discrete: failure and presence sensors
   event      Event-only records        Sensors used only for matching with SEL records
   mcloc      MC locator records        Management Controller sensors
   generic    Generic locator records   Generic devices: LEDs
   fru        FRU locator records       FRU devices

For example, to see only the temperature, voltage, and fan sensors, you would use the following command, with the full argument:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme sdr elist full
   fp.t_amb | 0Ah | ok | 12.0 | 22 degrees C
   ps.t_amb | 11h | ok | 10.0 | 21 degrees C
   ps0.f0.speed | 15h | ok | 10.0 | 11000 RPM
   ps1.f0.speed | 19h | ok | 10.1 | 0 RPM
   mb.t_amb | 1Ah | ok | 7.0 | 25 degrees C
   mb.v_bat | 1Bh | ok | 7.0 | 3.18 Volts
   mb.v_+3v3stby | 1Ch | ok | 7.0 | 3.17 Volts
   mb.v_+3v3 | 1Dh | ok | 7.0 | 3.34 Volts
   mb.v_+5v | 1Eh | ok | 7.0 | 5.04 Vol
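Each line of the sensor listing carries five fields: sensor name, sensor number in hexadecimal, status, entity ID with instance, and the reading. A minimal parsing sketch follows, assuming the fields are separated by a vertical bar as in the sdr elist output of IPMItool; the function name and the dictionary layout are illustrative only.

```python
# Sketch only: assumes pipe-delimited five-column lines; names are hypothetical.
def parse_sdr_line(line):
    """Split one 'name | NNh | status | entity.instance | reading' line into fields."""
    name, sensor_id, status, entity, reading = [field.strip() for field in line.split("|")]
    entity_id, _, instance = entity.partition(".")
    return {
        "name": name,
        "number": int(sensor_id.rstrip("h"), 16),  # "0Ah" -> 10
        "status": status,
        "entity": int(entity_id),                  # for example, 29 for fans
        "instance": int(instance),
        "reading": reading,
    }


if __name__ == "__main__":
    sample = "fp.t_amb | 0Ah | ok | 12.0 | 22 degrees C"
    print(parse_sdr_line(sample))
```

Splitting the entity field into ID and instance makes it easy to group sensors the same way the sdr entity query does, for example by collecting every record whose entity is 29.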
16. of all Temperature type sensors in the SDR:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme sdr type temperature

   Sensor         Status   Entity   Reading
   sys.tempfail   ok       23.0     Predictive Failure Deasserted
   mb.t_amb       ok       7.0      25 degrees C
   fp.t_amb       ok       12.0     25 degrees C
   ps.t_amb       ok       10.0     24 degrees C
   io.t_amb       ok       15.0     23 degrees C
   p0.t_core      ok       3.0      35 degrees C
   p1.t_core      ok       3.1      36 degrees C

   (Sensor number column as extracted: 03h 05h 14h 1Bh 2Ch 35h)

Using IPMItool to View the ILOM SP System Event Log

The ILOM SP System Event Log (SEL) provides storage of all system events. You can view the SEL with IPMItool. See the following sections:
- "Viewing the SEL With IPMItool" on page 62
- "Clearing the SEL With IPMItool" on page 63
- "Using the Sensor Data Repository (SDR) Cache" on page 64
- "Sensor Numbers and Sensor Names in SEL Events" on page 64

Viewing the SEL With IPMItool

There are two different IPMI commands that you can use to see different levels of detail.

- View the ILOM SP SEL with a minimal level of detail by using the sel list command:

   ipmitool -I lanplus -H <IPADDR> -U root -P changeme sel list
   100 | Pre-Init Time-stamp | Entity Presence 0x16 | Device Absent
   200 | Pre-Init Time-stamp | Entity Presence 0x26 | Device Present
   300 | Pre-Init Time-stamp | Entity Presence 0x25 | Device Absent
   400 | Pre-I
17. power is applied to the full server, the Power/OK LED next to the Power button lights and remains lit.

2. Enter the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST). The BIOS Main menu screen is displayed.

3. View the BIOS event log:
   a. From the BIOS Main Menu screen, select Advanced. The Advanced Settings screen is displayed.
   b. From the Advanced Settings screen, select Event Log Configuration. The Advanced Menu Event Logging screen is displayed.
   c. From the Event Logging Details screen, select View Event Log. All unread events are displayed.

4. View the BMC system event log:
   a. From the BIOS Main Menu screen, select Advanced. The Advanced Settings screen is displayed.
   b. From the Advanced Settings screen, select IPMI 2.0 Configuration. The Advanced Menu IPMI 2.0 Configuration screen is displayed.
   c. From the IPMI 2.0 Configuration screen, select View BMC System Event Log. The log takes about 60 seconds to generate, then it is displayed on the screen.

5. If the problem with the server is not evident, continue with "Using the ILOM SP GUI to View System Information" on page 43 or "Using IPMItool to View System Information" on page 55.

Power-On Self-Test (POST)

The system BIOS provides a rudimentary power-on self-test. The basic devices required for the server to opera
18. setting status with IPMItool 66 System Overheat Fault 37 viewing status with IPMItool 66 Locate LED and button 36 mapping sensor numbers to sensor names 64 mismatching processors error handling 81 O overview SunVTS diagnostics 19 P parity errors handling 76 password changing with IPMItool 58 PERR 76 population rules for DIMMs 14 POST changing options 27 code checkpoints 30 codes table 28 overview 25 88 Sun Fire X4100 X4100 M2 and X4200 X4200 M2 Servers Diagnostics Guide May 2007 redirecting console output 26 Power button front panel 5 power off procedure 5 power supply status LEDs 38 Power Supply Rear Fan Tray Fault LED 37 Power OK LED 36 power on self test see POST processors mismatched error 81 R rear fan tray fault LED 38 redirecting console output 26 related documentation viii Resource CD 56 S safety guidelines vii scripts IPMItool 69 SDR using with IPMItool 64 sensor data repository See SDR sensor IDs for LEDs 66 sensor number formats 64 sensors viewing with ILOM SP GUI 50 viewing with IPMItool 59 serial connection to ILOM SP 44 serial number locations 3 SERR 79 Service Action Required LED 36 Service Processor system event log See SP SEL shutdown procedure 5 SP SEL clearing with IMPItool 63 sensor numbers and names 64 time stamps 47 using SDR 64 viewing with ILOM SP GUI 45 viewing with IPMItool 62 SSH key configuring with IPMItool 58 sticker serial number 3 SunVTS Bootable Diagnost
19. LED Sensor ID      Description
   sys.power.led      System Power (front and back)
   sys.locate.led     System Locate (front and back)
   sys.alert.led      System Alert (front and back)
   sys.psfail.led     System Power Supply Failed
   sys.tempfail.led   System Over Temperature
   sys.fanfail.led    System Fan Failed
   bp.power.led       Back Panel Power
   bp.locate.led      Back Panel Locate
   bp.alert.led       Back Panel Alert
   fp.power.led       Front Panel Power
   fp.locate.led      Front Panel Locate
   fp.alert.led       Front Panel Alert
   io.hdd0.led        Hard Disk 0 Failed
   io.hdd1.led        Hard Disk 1 Failed
   io.hdd2.led        Hard Disk 2 Failed
   io.hdd3.led        Hard Disk 3 Failed
   io.f0.led          I/O Fan Failed
   p0.led             CPU 0 Failed
   p0.d0.led          CPU 0 DIMM 0 Failed
   p0.d1.led          CPU 0 DIMM 1 Failed
   p0.d2.led          CPU 0 DIMM 2 Failed
   p0.d3.led          CPU 0 DIMM 3 Failed
   p1.led             CPU 1 Failed
   p1.d0.led          CPU 1 DIMM 0 Failed
   p1.d1.led          CPU 1 DIMM 1 Failed
   p1.d2.led          CPU 1 DIMM 2 Failed
   p1.d3.led          CPU 1 DIMM 3 Failed
   ft0.fm0.led        Fan Tray 0 Module 0 Failed
   ft0.fm1.led        Fan Tray 0 Module 1 Failed

TABLE D-2 LED Sensor IDs (Continued)

   LED Sensor ID      Description
   ft0.fm2.led        Fan Tray 0 Module 2 Failed
   ft1.fm0.led        Fan Tray 1 Module 0 Failed
   ft1.fm1.led        Fan Tray 1 Module 1 Failed
   ft1.fm2.led        Fan Tray 1 Module 2 Failed

LED Modes

You supply the modes in TABLE D-3 to the led set commands to specify the mode in which you want the LED to be placed.

TABLE D-3 LED Modes
20. that you received with your system a Solaris Operating System documentation which is at http docs sun com vii viii Related Documentation For a description of the document set for these servers see the Where To Find Documentation sheet that is packed with your system and also posted at the product s documentation site See the following URL then navigate to your product http www sun com documentation Translated versions of some of these documents are available at the web site described above in French Simplified Chinese Traditional Chinese Korean and Japanese English documentation is revised more frequently and might be more up to date than the translated documentation For all Sun hardware documentation see the following URL http www sun com documentation For Solaris and other software documentation see the following URL http docs sun com Sun Fire X4100 X4100 M2 and X4200 X4200 M2 Servers Diagnostics Guide May 2007 Typographic ConventionsThird Party Typeface Meaning Examples AaBbCc123 The names of commands files and directories on screen computer output AaBbCc123 What you type when contrasted with on screen computer output AaBbCc123 Book titles new words or terms words to be emphasized Replace command line variables with real names or values The settings on your browser might differ from these settings Web Sites Edit your login file Use 1s a to l
21. window for displaying text information.
37   Displaying sign-on message, CPU information, setup key message, and any OEM-specific information.
38   Initializes different devices through DIM.
39   Initializes DMAC-1 and DMAC-2.
3A   Initialize RTC date/time.
3B   Test for total memory installed in the system. Also, check for DEL or ESC keys to limit memory test. Display total memory in the system.
3C   By this point, RAM read/write test is completed; program memory holes or handle any adjustments needed in RAM size with respect to NB. Test if HT Module found an error in BootBlock and CPU compatibility for MP environment.
40   Detect different devices (parallel ports, serial ports, and coprocessor in CPU, etc.) successfully installed in the system and update the BDA, EBDA, etc.
50   Programming the memory hole or any kind of implementation that needs an adjustment in system RAM size, if needed.
52   Updates CMOS memory size from memory found in memory test. Allocates memory for Extended BIOS Data Area from base memory.

TABLE A-2 POST Code Checkpoints (Continued)

Post Code   Description
60   Initializes NUM-LOCK status and programs the KBD typematic rate.
75   Initialize Int-13 and prepare for IPL detection.
78   Initializes IPL devices controlled by BIOS and option ROMs.
7A   Initializes remaining option ROMs.
7C   Generate and write contents of ESCD in NVRam.
84   Log errors encountered during POST.
8
22. [FIGURE C-3: Sample Sensor Readings Screen. Sensor Readings table (77 sensors) with columns Status, Name, and Reading. Sample rows: State Asserted, sys.id; State Asserted, sys.intsw; Predictive Failure Deasserted, sys.psfail; Predictive Failure Deasserted, sys.tempfail; Predictive Failure Deasserted, sys.fanfail; Normal, mb.t_amb, 24 degrees C; Normal, mb.v_bat, 3.232 Volts; Normal, mb.v_+3v3stby, 3.217 Volts; Unknown, mb.v_+3v3, Not Available; Unknown, mb.v_+5v, Not Available.]

3. Select the type of sensor readings that you want to view from the drop-down list box. You can select All Sensors, Temperature Sensors, Voltage Sensors, or Fan Sensors.

Appendix C Using the ILOM SP GUI to View System Information 51

The sensor readings are displayed. The Sensor Readings fields are described in TABLE C-2.

TABLE C-2 Event Log Fields

Field     Description
Status    Reports the status of the sensor, including State Asserted, State Deasserted, Predictive Failure, Device Inserted, Device Present, Device Removed, Device Absent, Unknown, and Normal.
Name      Reports the name of the sensor. The names correspond to these components:
          - sys: System or chassis
          - bp: Back panel
          - fp: Front panel
          - mb: Motherboard
          - io: I/O board
          - p0: Processor 0
          - p1: Processor 1
          - ft0: Fan tray 0
          - ft1: Fan tray 1
          - pdb: Power distribution board
          - ps0: Power supply 0
          - ps1: Power supply 1
Reading   Reports the rpm, temperature, and voltage measurements.

4. Click the
23. 5    Display errors to the user and get the user response for error.
87   Execute BIOS setup if needed or requested.
8C   After all device initialization is done, program any user-selectable parameters relating to NB/SB, such as timing parameters, non-cacheable regions, and the shadow RAM cacheability, and do any other NB/SB/PCIX/OEM-specific programming needed during Late-POST. Background scrubbing for DRAM, and L1 and L2 caches, are set up based on setup questions. Get the DRAM scrub limits from each node.
8D   Build ACPI tables (if ACPI is supported).
8E   Program the peripheral parameters. Enable/Disable NMI as selected.
90   Late POST initialization of system management interrupt.
A0   Check boot password if installed.
A1   Clean-up work needed before booting to OS.
A2   Takes care of runtime image preparation for different BIOS modules. Fill the free area in F000h segment with 0FFh. Initializes the Microsoft IRQ Routing Table. Prepares the runtime language module. Disables the system configuration display if needed.
A4   Initialize runtime language module.
A7   Displays the system configuration screen if enabled. Initialize the CPUs before boot, which includes the programming of the MTRRs.
A8   Prepare CPU for OS boot, including final MTRR values.
A9   Wait for user input at config display if needed.
AA   Uninstall POST INT1Ch vector and INT09h vector. Deinitializes the ADM module.
AB   Prepare BBS for Int 19 boot.
AC   Any kind of Chipsets (NB/SB) specific programmin
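Because each checkpoint is a single byte, a lookup table is a natural way to turn a captured code into its description. The sketch below is illustrative only: it includes a handful of the checkpoints listed above with shortened descriptions, the names are hypothetical, and how the byte is actually captured (for example, from a POST diagnostic card) is outside its scope.

```python
# Sketch only: a partial table built from a few of the checkpoints listed above.
POST_CHECKPOINTS = {
    0x87: "Execute BIOS setup if needed or requested",
    0x8D: "Build ACPI tables (if ACPI is supported)",
    0x90: "Late POST initialization of system management interrupt",
    0xA0: "Check boot password if installed",
    0xA4: "Initialize runtime language module",
    0xAB: "Prepare BBS for Int 19 boot",
}


def describe_checkpoint(code):
    """Format a checkpoint byte as two hex digits plus its description, if known."""
    text = POST_CHECKPOINTS.get(code, "(not in this partial table)")
    return f"{code:02X}: {text}"


if __name__ == "__main__":
    for byte in (0x8D, 0xA0, 0xFF):
        print(describe_checkpoint(byte))
```

Keying the table by integer rather than by string sidesteps the letter O versus digit 0 confusion that is easy to introduce when codes such as A0 or 0A are copied by hand.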
24. CH-2 count reg. Initialize CH-0 as system timer. Install the POSTINT1Ch handler. Enable IRQ-0 in PIC for system timer interrupt. Traps INT1Ch vector to "POSTINT1ChHandlerBlock."
C0   Early CPU Init Start: Disable Cache, Init Local APIC.
C1   Set up boot strap processor information.
C2   Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user-requested value for GART Error Reporting setup question.
C3   Errata workarounds applied to the BSP (78 and 110).
C5   Enumerate and set up application processors. This includes microcode loading and workarounds for errata (78, 110, 106, 107, 69, 63).
C6   Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata 106, 107, 69, and 63 if appropriate. In case of mixed CPU steppings, errors are sought and logged, and an appropriate frequency for all CPUs is found and applied. NOTE: APs are left in the CLI HLT state.
C7   The HT sets link frequencies and widths to their final values. This routine gets called after CPU frequency has been calculated to prevent bad programming.
0A   Initializes the 8042-compatible Keyboard Controller.
0B   Detects the presence of PS/2 mouse.
0C   Detects the presence of Keyboard in KBC port.

TABLE A-2 POST Code Checkpoints (Continued)

Post Code   Description
0E   Testing and initialization of different In
25. CPU slot 0 (opposite to the original error in slot 1), the problem is related to the individual DIMM. In this case, return both DIMMs (the pair) to the Support Center for replacement.
- If the error still appears in CPU0 slot 1 (as the original error did), the problem is not related to an individual DIMM. Instead, it might be caused by CPU0 or by the DIMM slot. Continue with the next step.
Shut down the server again and disconnect the AC power cords.
Remove both DIMMs of the pair and install them into paired slots on the opposite CPU. Using the example, install the two DIMMs from CPU0 slots 0 and 1 into CPU1 slots 0 and 1 or CPU1 slots 2 and 3.
Reconnect AC power cords to the server.
Power on the server and run the diagnostics test again.
Review the log file.
- If the error now appears under the CPU that manages the DIMM slots you just installed, the problem is with the DIMMs. Return both DIMMs (the pair) to the Support Center for replacement.
- If the error remains with the original CPU, there is a problem with that CPU.

CHAPTER 2 Diagnostic Testing Software
This chapter contains information about diagnostic software tools that you can use.
Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.
SunVTS Diagnostic Tests
The s
26. DMI.
- The BIOS logs to the SP SEL through the BMC.
The feature is turned off at OS boot time by default.
The following Linux versions report correctable ECC syndrome and memory fill errors in /var/log if the kernel flag mce is indicated at boot time, or if mce is enabled through kernel compile or installation:
- RH3 Updated single core
- RH4 Update1
- SLES9 SP1
The Linux kernel (x86_64/kernel/mce.c) repeats a report every 30 seconds until another error is encountered and a flag is reset.
Solaris support provides full self-healing and automated diagnosis for the CPU and Memory subsystems.
FIGURE E-2 shows an example of a DMI log screen from the BIOS Setup Page.
FIGURE E-2 Sample DMI Log Screen, Correctable Error (BIOS Setup Utility, View Event Log)
- If, during any stage of memory testing, the BIOS finds itself incapable of reading or writing to the DIMM, it takes the following actions:
  - The BIOS disables the DIMM, as indicated by the Memory Decreased message in the example in FIGURE E-3.
  - The BIOS logs an SEL record.
  - The BIOS logs an event in DMI.
FIGURE E-3 Sample DMI Log Screen, Correctable Error, Memory Decreased
Ha
27. ED behavior, which differs slightly between Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers.
FIGURE B-1 Sun Fire X4200/X4200 M2 Servers Front Panel LEDs (callouts: Locate button/LED, Service action required LED, Power/OK LED, Power button, Front fan fault LED, Power supply/rear fan tray fault LED, System overheat fault LED, Hard disk drive status LEDs)
TABLE B-1 Front Panel LED Functions
LED Name: Locate button/LED
Description: This LED helps you to identify which system in the rack you are working on in a rack full of servers.
- Push and release this button to make the Locate LED blink for 30 minutes.
- Hold down the button for 5 seconds to initiate a push-to-test mode that illuminates all other LEDs both inside and outside of the chassis for 15 seconds.
LED Name: Service Action Required LED
Description: This LED has two states:
- Off: Normal operation.
- Slow Blinking: An event that requires a service action has been detected.
LED Name: Power/OK LED
Description: This LED has three states:
- Off: Server main power and standby power are off.
- Blinking: Server is in standby power mode, with AC power applied to only the GRASP board and the power supply fans.
- On: Server is in main power mode, with AC power supplied to all components.
TABLE B-1 Front Panel LED Functions (Continued)
LED Name: Front Fan Fault LED
Description: Thi
28. EDs on page 66

About IPMI
IPMI is an open-standard hardware management interface specification that defines a specific way for embedded management subsystems to communicate. IPMI information is exchanged through baseboard management controllers (BMCs), which are located on IPMI-compliant hardware components. Using low-level hardware intelligence instead of the operating system has two main benefits: first, this configuration allows for out-of-band server management, and second, the operating system is not burdened with transporting system status data.
Your ILOM Service Processor (SP) is IPMI v2.0 compliant. You can access IPMI functionality through the command line with the IPMItool utility, either in-band or out-of-band. Additionally, you can generate an IPMI-specific trap from the web interface, or manage the server's IPMI functions from any external management solution that is IPMI v1.5 or v2.0 compliant. For more information about the IPMI v2.0 specification, go to:
http://www.intel.com/design/servers/ipmi/spec.htm#spec2

About IPMItool
IPMItool is included on the Resource CD (also titled Tools and Drivers CD in later servers) (705-1438). IPMItool is a simple command-line interface that is useful for managing IPMI-enabled devices. You can use this utility to perform IPMI functions with a kernel device driver or over a LAN interface. IPMItool enables you to manage system hardware components, monitor system health, and monitor and m
29. Halted

Hardware Error Handling Summary
This section contains a table that summarizes the most common hardware errors that you might encounter with these servers.
TABLE E-1 Hardware Error Handling Summary

Error: SP failure
Description: The SP fails to boot upon application of system power.
Handling: The SP controls the system reset, so the system may power on but will not come out of reset.
- During power up, the SP's boot loader turns on the power LED.
- During SP boot, Linux startup, and SP sanity check, the power LED blinks.
- The LED is turned off when SP management code (the IPMI stack) is started.
- At exit of BIOS POST, the LED goes to STEADY ON state.
Logged (DMI Log or SP SEL): Not logged
Fatal?: Fatal

Error: SP failure
Description: SP boots but fails POST.
Handling: The SP controls the system RESET, so the system will not come out of reset.
Logged (DMI Log or SP SEL): Not logged
Fatal?: Fatal

Error: BIOS POST failure
Description: Server BIOS does not pass POST.
Handling: There are fatal and non-fatal errors in POST. The BIOS does detect some errors that are announced during POST as POST codes on the bottom right corner of the display, on the serial console, and on the video display. Some POST codes are forwarded to the SP for logging. The POST codes described above do not come out in sequential order, and some are repeated, because some POST codes are issued by code in add-in card BIOS expansion ROMs. In the case of early POST failures (for example, the BSP fails to operate correctl
30. Sun Fire X4100/X4100 M2 and X4200/X4200 M2 Servers Diagnostics Guide
Sun Microsystems, Inc.
www.sun.com
Part No. 819-3284-17, May 2007, Revision A
Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, Java, AnswerBook2, docs.sun.com, Sun Fire, Su
31. Error: Uncorrectable DRAM ECC error
Description: The CPU detects an uncorrectable multiple-bit DIMM error.
Handling: The sync flood method of handling this is used to prevent the erroneous data from being propagated across the HyperTransport links. The system reboots, the BIOS recovers the machine check register information, maps this information to the failing DIMM (when CHIPKILL is disabled) or DIMM pair (when CHIPKILL is enabled), and logs that information to the SP. The BIOS will halt the CPU.
Logged (DMI Log or SP SEL): SP SEL
Fatal?: Fatal

Error: Unsupported DIMM configuration
Description: Unsupported DIMMs are used, or supported DIMMs are loaded improperly.
Handling: The BIOS displays an error message, logs an error, and halts the system.
Logged (DMI Log or SP SEL): DMI Log, SP SEL
Fatal?: Fatal

Error: HyperTransport link failure
Description: CRC or link error on one of the HyperTransport links.
Handling: Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset. The BIOS reports: A Hyper Transport sync flood error occurred on last boot, press F1 to continue.
Logged (DMI Log or SP SEL): DMI Log, SP SEL
Fatal?: Fatal

TABLE E-1 Hardware Error Handling Summary (Continued)

Error: PCI SERR, PERR
Description: System or parity error on a PCI bus.
Handling: Sync floods on HyperTransport links, the machine resets itself, and error information gets retained through reset. The BIOS reports: A Hyper Transport sync flood error occurred on last boot, press F1 to continue.
Logged (DMI Log or SP SEL): DMI Log, SP SEL
Fatal?: Fatal

Error: BIOS POST
Description: The BIOS
32. ailable to refine and limit the SEL output. If you want to see only the first NUM records, add that as a qualifier to the command. If you want to see the last NUM records, use that qualifier. For example, to see the last three records in the SEL, use this command:

    ipmitool -I lanplus -H <IPADDR> -U root -P changeme sel elist last 3
    800 | Pre-Init Time-stamp | Entity Presence ps1.prsnt | Device Absent
    900 | Pre-Init Time-stamp | Phys Security sys.intsw | Gen Chassis intrusion
    a00 | Pre-Init Time-stamp | Entity Presence ps0.prsnt | Device Present

If you want to get more detailed information on a particular event, you can use the sel get ID command, in which you specify an SEL record ID. For example:

    ipmitool -I lanplus -H <IPADDR> -U root -P changeme sel get 0x0a00
    SEL Record ID          : 0a00
     Record Type           : 02
     Timestamp             : 07/06/1970 01:53:58
     Generator ID          : 0020
     EvM Revision          : 04
     Sensor Type           : Entity Presence
     Sensor Number         : 12
     Event Type            : Generic Discrete
     Event Direction       : Assertion Event
     Event Data (RAW)      : 01ffff
     Description           : Device Present
    Sensor ID              : ps0.prsnt (0x12)
     Entity ID             : 10.0
     Sensor Type (Discrete): Entity Presence
     States Asserted       : Availability State
                             [Device Present]

In the example above, this particular event describes that Power Supply 0 is detected and present.

Clearing the SEL With IPMItool
To clear the SEL, use the sel clear command:

    ipmitool -I lanplus -H <IPADDR> -U root -P changeme
33. anage system environmentals, independent of the operating system.
Locate IPMItool and its related documentation on your Resource CD (705-1438), or download this tool from the following URL:
http://ipmitool.sourceforge.net/

IPMItool Man Page
After you install the IPMItool package, you can access detailed information about command usage and syntax from the man page that is installed. From a command line, type this command:

    man ipmitool

Connecting to the Server With IPMItool
To connect over a remote interface, you must supply a user name and password. The default user with admin-level access is root with password changeme. This means you must use the -U and -P parameters to pass both user name and password on the command line, as shown in the following example:

    ipmitool -I lanplus -H <IPADDR> -U root -P changeme chassis status

Note: If you encounter command-syntax problems with your particular operating system, you can use the ipmitool -h command and parameter to determine which parameters can be passed with the ipmitool command on your operating system. Also refer to the ipmitool man page by typing man ipmitool.
Note: In the example commands shown in this appendix, the default user name, root, and default password, changeme, are shown. You should type the user name and password that have been set for the server.

Enabling the Anonymou
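The connection parameters described above can be assembled programmatically. The following is a minimal Python sketch, not part of the guide: the helper name `ipmitool_argv` and the address 192.0.2.10 (a documentation-only address) are assumptions made for illustration, while the `-I lanplus`, `-H`, `-U`, and `-P` options and the `chassis status` subcommand are the ones shown in the example command.

```python
import shlex


def ipmitool_argv(host, user, password, *subcommand):
    """Build the argument list for a remote ipmitool call over lanplus.

    Hypothetical helper for illustration; it only assembles arguments
    and does not run ipmitool.
    """
    if not subcommand:
        raise ValueError("an ipmitool subcommand is required")
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *subcommand]


# 192.0.2.10 is a placeholder address; substitute the SP's real IP address.
argv = ipmitool_argv("192.0.2.10", "root", "changeme", "chassis", "status")
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` would execute the command on a system where ipmitool is installed. Keeping the arguments as a list avoids shell quoting problems if the password contains special characters.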
34. arts and opens its first GUI window.
In the SunVTS GUI, press Enter or click the Start button when you are prompted to start the tests. The test suite will run until it encounters an error or the test is completed.
Note: The CD will take approximately nine minutes to boot.
When SunVTS software completes the test, review the log files generated during the test. SunVTS provides access to four different log files:
- SunVTS test error log contains time-stamped SunVTS test error messages. The log file path name is /var/opt/SUNWvts/logs/sunvts.err. This file is not created until a SunVTS test failure occurs.
- SunVTS kernel error log contains time-stamped SunVTS kernel and SunVTS probe errors. SunVTS kernel errors are errors that relate to running SunVTS, and not to testing of devices. The log file path name is /var/opt/SUNWvts/logs/vtsk.err. This file is not created until SunVTS reports a SunVTS kernel error.
- SunVTS information log contains informative messages that are generated when you start and stop the SunVTS test sessions. The log file path name is /var/opt/SUNWvts/logs/sunvts.info. This file is not created until a SunVTS test session runs.
- Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages.
a. Click the Log button. The Log file window is displayed.
b. Specify the log file that you
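Because three of the four log files are created only after the corresponding event occurs, a quick check of which files exist can show what happened during a test run. The following is a minimal Python sketch under that reading; the function name `existing_logs` and its `root` parameter are assumptions for illustration, and the paths are the ones listed above.

```python
from pathlib import Path

# Log names and path names as listed in the SunVTS log file descriptions.
SUNVTS_LOGS = {
    "SunVTS test error log": "/var/opt/SUNWvts/logs/sunvts.err",
    "SunVTS kernel error log": "/var/opt/SUNWvts/logs/vtsk.err",
    "SunVTS information log": "/var/opt/SUNWvts/logs/sunvts.info",
    "Solaris system message log": "/var/adm/messages",
}


def existing_logs(root="/"):
    """Return {log name: Path} for the log files that exist under root.

    The root parameter is a hypothetical convenience so the check can be
    pointed at a mounted or copied file system instead of the live one.
    """
    found = {}
    for name, path in SUNVTS_LOGS.items():
        candidate = Path(root) / path.lstrip("/")
        if candidate.is_file():
            found[name] = candidate
    return found


for name, path in existing_logs().items():
    print(f"{name}: {path}")
```

A missing sunvts.err file is not an error in itself: according to the descriptions above, it simply means that no SunVTS test failure has been recorded.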
35. ble Diagnostics CD
SunVTS software is preinstalled on these servers. The server is also shipped with the Bootable Diagnostics CD (705-1439). This CD is designed so that the server will boot from the CD. This CD will boot the Solaris Operating System and start SunVTS software. Diagnostic tests will run and write output to log files that the service technician can use to determine the problem with the server.

Requirements
- To use the Bootable Diagnostics CD, you must have a keyboard, mouse, and monitor attached to the server on which you are performing diagnostics.

Using the Bootable Diagnostics CD
To use the Bootable Diagnostics CD to perform diagnostics:
With the server powered on, insert the Bootable Diagnostics CD into the DVD-ROM drive.
Reboot the server, but press F2 during the start of reboot so that you can change the BIOS setting for boot-device priority.
When the BIOS Main menu appears, navigate to the BIOS Boot menu. Instructions for navigating within the BIOS screens are printed on the BIOS screens.
On the BIOS Boot menu screen, select Boot Device Priority. The Boot Device Priority screen appears.
Select the DVD-ROM drive to be the primary boot device.
Save and exit the BIOS screens.
Reboot the server.
When the server reboots from the CD in the DVD-ROM drive, the Solaris Operating System boots and SunVTS software st
36. blems
Externally Inspecting the Server
Internally Inspecting the Server
Troubleshooting DIMM Problems
How DIMM Errors Are Handled By the System
Uncorrectable DIMM Errors
Correctable DIMM Errors
BIOS DIMM Error Messages
DIMM Fault LEDs
DIMM Population Rules
Sun Fire X4100/X4200 Rules
Sun Fire X4100 M2/X4200 M2 Rules
Isolating and Correcting DIMM ECC Errors
Diagnostic Testing Software
SunVTS Diagnostic Tests
SunVTS Documentation
Diagnosing Server Problems With the Bootable Diagnostics CD
Requirements
Using the Bootable Diagnostics CD
BIOS Event Logs and POST Codes
Viewing BIOS Event Logs
Power-On Self-Test (POST)
How BIOS POST Memory Testing Works
Redirecting Console Output
Changing POST Options
POST Codes
POST Code Checkpoints
Status Indicator LEDs
External Status Indicator LEDs
Internal Status Indicator LEDs
Using the ILOM SP GUI to View System Information
Making a Serial Connection to the SP
Viewing ILOM SP Event Logs
Interpreting Event Log Time Stamps
Viewing Replaceable Component Information
Viewing Temperature, Voltage, and Fan Sensor Readings
Using IPMItool to View System Information
About IPMI
About IPMItool
IPMItool Man Page
Connecting to the Server With IPMItool
Enabling the Anonymous User
Changi
37. by substituting all for the sensor ID. That way, you can easily get a list of all LEDs and their status with one command. See LED Sensor IDs on page 66 and LED Modes on page 68 for information about the variables in these commands.

LED Sensor IDs
All LEDs in these servers are represented by two sensors:
- A Generic Device Locator record describes the location of the sensor in the system. It has an led suffix and is the name that is fed into the led set and led get commands. You can get a list of all of these sensors by issuing the sdr list generic command.
- A Digital Discrete fault sensor monitors the status of the LED pin and is asserted when the LED is active. These sensors have a fail suffix and are used to report events to the SEL.
Each LED has both a descriptor and a status reading sensor, and the two are linked. That is, if you use the led sensor to turn on a particular LED, then the status change is represented in the associated fail sensor. Also, for some of these, an event is generated in the SEL. For LEDs that blink on failure (instead of steady on), events are not generated, because an event would otherwise be logged every time the LED flashed in its blink cycle.
TABLE D-2 lists the LED sensor IDs in these servers. See Status Indicator LEDs on page 35 for diagrams of the LED locations.
TABLE D-2 LED Sensor IDs
LED Sensor ID
38. cator LEDs, which can indicate component malfunction. For the LED locations and descriptions of their behavior, see Internal Status Indicator LEDs on page 39.
Note: You can hold down the Locate button on the server back panel or front panel for 5 seconds to initiate a push-to-test mode that illuminates all other LEDs both inside and outside of the chassis for 15 seconds.
4. Verify that there are no loose or improperly seated components.
5. Verify that all cable connectors inside the system are firmly and correctly attached to their appropriate connectors.
6. Verify that any after-factory components are qualified and supported. For a list of supported PCI cards and DIMMs, refer to the Sun Fire X4100/X4100 M2 and Sun Fire X4200/X4200 M2 Servers Service Manual (819-1157).
7. Check that the installed DIMMs comply with the supported DIMM population rules and configurations, as described in Troubleshooting DIMM Problems on page 8.
8. Replace the server covers.
9. To restore main power mode to the server (all components powered on), use a ballpoint pen or other pointed object to press and release the Power button on the server front panel. See FIGURE 1-2 or FIGURE 1-3. When main power is applied to the full server, the Power/OK LED next to the Power button lights and remains lit.
10. If the problem with the server is not e
39. comply with the DIMM Population Rules on page 14.
3. Inspect the fault LEDs on the DIMM slot ejectors and the CPU LEDs on the motherboard. See FIGURE 1-4. If any of these LEDs are lit, they can indicate the component with the fault.
4. Disconnect the AC power cords from the server.
Caution: Before handling components, attach an ESD wrist strap to a chassis ground (any unpainted metal surface). The system's printed circuit boards and hard disk drives contain components that are extremely sensitive to static electricity.
5. Remove the DIMMs.
6. Visually inspect the DIMMs for physical damage, dust, or any other contamination on the connector or circuits.
7. Visually inspect the DIMM slot for physical damage. Look for cracked or broken plastic on the slot.
8. Dust off the DIMMs, clean the contacts, and reseat them.
9. If there is no obvious damage, exchange the individual DIMMs between the two slots of a given pair. Ensure that they are inserted correctly with ejector latches secured. Using the example, remove the DIMMs from CPU0 slots 0 and 1, then reinstall the DIMM from slot 1 into slot 0 and reinstall the DIMM from slot 0 into slot 1.
10. Reconnect AC power cords to the server.
11. Power on the server and run the diagnostics test again.
12. Review the log file.
- If the error now appears in
40. cting Console Output
Use these instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read.
1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST). The BIOS Main menu screen is displayed.
2. When the BIOS Main menu screen is displayed, select Advanced.
3. When the Advanced Settings screen is displayed, select IPMI 2.0 Configuration.
4. When the IPMI 2.0 Configuration screen is displayed, select the LAN Configuration menu item.
5. Select the IP Address menu item. The service processor's IP address is displayed using the following format: Current IP address in BMC: xxx.xxx.xxx.xxx
6. Start a web browser and type the service processor's IP address in the browser's URL field.
7. When you are prompted for a user name and password, type the following:
- User Name: root
- Password: changeme
8. When the ILOM Service Processor web GUI screen is displayed, click the Remote Control tab.
9. Click the Redirection tab.
10. Set the color depth for the redirection console at either 6 or 8 bits.
11. Click the Start Redirection button.
12. When you are prompted for a user name and password, type the following:
- User Name: root
- Password: changeme
The current POST screen is displayed.

Changing POST Options
These instruct
41. FIGURE B-4 Sun Fire X4100 M2/X4200 M2 Internal LED Locations (callouts: DIMM fault LEDs in DIMM ejector levers for DIMM slots A0, B0, A1, and B1; CPU fault LEDs on the motherboard; fan module fault LEDs on fan modules; SW2; GRASP board power status LED on the GRASP board; back panel of server)
TABLE B-3 Internal LED Functions
LED Name: DIMM Fault LED (The ejector levers on the DIMM slots hold the LEDs.)
Description: This LED has two states:
- Off: DIMM is OK.
- Lit (amber): DIMM has failed.
LED Name: CPU Fault LED (on motherboard)
Description: This LED has two states:
- Off: CPU is OK.
- Lit (amber): CPU has encountered a voltage or heat error condition.
LED Name: Fan Module Fault LED
Description: This LED has two states:
- Off: Fan module is OK.
- Lit (amber): Fan module has failed.
LED Name: GRASP Board Power Status LED
Description: This LED has two states:
- Off: Standby power is not reaching the GRASP board.
- Lit (green): 3.3V standby power is reaching the GRASP board.

APPENDIX C Using the ILOM SP GUI to View System Information
Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.
This appendix contains information about using the Inte
42. emory sockets are colored black or white to indicate which slots are paired by matching colors.
- CPUs with only a single pair of DIMMs must have those DIMMs installed in that CPU's white DIMM slots (A1 and B1).
- See TABLE 1-2 for supported DIMM configurations.
TABLE 1-2 Sun Fire X4100/X4200 M2 Supported DIMM Configurations (DDR2 Only)
Slot A1   Slot B1   Slot A0   Slot B0   Total Memory Per CPU
1 GB      1 GB      0         0         2 GB
1 GB      1 GB      1 GB      1 GB      4 GB
2 GB      2 GB      1 GB      1 GB      6 GB
4 GB      4 GB      1 GB      1 GB      10 GB
2 GB      2 GB      0         0         4 GB
2 GB      2 GB      2 GB      2 GB      8 GB
4 GB      4 GB      2 GB      2 GB      12 GB
4 GB      4 GB      0         0         8 GB
4 GB      4 GB      4 GB      4 GB      16 GB

Isolating and Correcting DIMM ECC Errors
If your log files report an ECC error or a problem with a DIMM, complete the steps below until you can isolate the fault.
Note: The slot numbers given in the following example use the slot numbering from Sun Fire X4100/X4200 servers. The pair 0 and 1 is equivalent to pair A1 and B1, and pair 2 and 3 is equivalent to pair A0 and B0, in the Sun Fire X4100 M2/X4200 M2 servers.
In this example, the log file reports an error with the DIMM in CPU0, slot 1. The fault LEDs on CPU0, slots 0 and 1, are lit.
1. If you have not already done so, shut down your server to standby power mode and remove the main cover. Refer to the Sun Fire X4100 and Sun Fire X4200 Servers Service Manual (819-1157).
2. Inspect the installed DIMMs to ensure that they
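The configurations in TABLE 1-2 can be checked mechanically before DIMMs are installed. The following is a minimal Python sketch, not part of the guide: the function name `check_dimms` is hypothetical, and the matched-pair check is an assumption inferred from the table rows (every row uses equal sizes within a pair), while the white-slot rule and the supported rows come from the text and table above.

```python
# Supported per-CPU configurations from TABLE 1-2, sizes in GB.
# Tuple order: (slot A1, slot B1, slot A0, slot B0); 0 means an empty slot.
SUPPORTED = {
    (1, 1, 0, 0), (1, 1, 1, 1), (2, 2, 1, 1), (4, 4, 1, 1),
    (2, 2, 0, 0), (2, 2, 2, 2), (4, 4, 2, 2), (4, 4, 0, 0),
    (4, 4, 4, 4),
}


def check_dimms(a1, b1, a0, b0):
    """Return the total memory per CPU in GB, or raise ValueError.

    Hypothetical helper: it validates one CPU's DIMM sizes against the
    rows of TABLE 1-2.
    """
    if a1 != b1 or a0 != b0:
        # Assumption inferred from the table: both DIMMs of a pair match.
        raise ValueError("both DIMMs of a pair must be the same size")
    if a1 == 0 and a0 != 0:
        # Rule from the text: a single pair goes in the white slots A1/B1.
        raise ValueError("a single pair must use the white slots A1 and B1")
    if (a1, b1, a0, b0) not in SUPPORTED:
        raise ValueError("configuration is not listed in TABLE 1-2")
    return a1 + b1 + a0 + b0


print(check_dimms(4, 4, 1, 1))
```

The printed total for the (4, 4, 1, 1) row matches the 10 GB given in the table. A configuration such as (0, 0, 1, 1) is rejected because a single pair must occupy slots A1 and B1.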
43. en is displayed.
b. Type your user name and password. When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:
Default user name: root
Default password: changeme
2. From the System Monitoring tab, choose Event Logs. The System Event Logs page is displayed. See FIGURE C-1.
FIGURE C-1 Sample System Event Logs Screen (System Monitoring tab, Event Logs page, showing an event log category selector and an event table with the columns Event ID, Time Stamp, Sensor Name, Sensor Type, and Description)
3. Select a category of event that you want to view in the log from the drop-down list box.
44. ervers are shipped with a Bootable Diagnostics CD (705-1439) that contains SunVTS software.
SunVTS is the Sun Validation Test Suite, which provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and functionality of most hardware controllers and devices on Sun platforms. SunVTS software can be tailored with modifiable test instances and processor affinity features.
Only the following tests are supported on x86 platforms. The current x86 support is for the 32-bit operating system only.
- CD DVD Test (cddvdtest)
- CPU Test (cputest)
- Disk and Floppy Drives Test (disktest)
- Data Translation Look-aside Buffer (dtlbtest)
- Floating Point Unit Test (fputest)
- Network Hardware Test (nettest)
- Ethernet Loopback Test (netlbtest)
- Physical Memory Test (pmemtest)
- Serial Port Test (serialtest)
- System Test (systest)
- Universal Serial Bus Test (usbtest)
- Virtual Memory Test (vmemtest)
SunVTS software has a sophisticated graphical user interface (GUI) that provides test configuration and status monitoring. The user interface can be run on one system to display the SunVTS testing of another system on the network. SunVTS software also provides a TTY-mode interface for situations in which running a GUI is not possible.

SunVTS Documentation
For the most up-to-date information on SunVTS software, go to this site:
http://docs.sun.com/app/docs/coll/1140.2

Diagnosing Server Problems With the Boota
45. fore, after the OS adjusts the RTC, the time set by the BIOS will be UTC.
- When the user sets the RTC using the host BIOS Setup screen.
- Continuously via NTP, if NTP is enabled on the SP. NTP jumping is enabled to recover quickly from an erroneous update from the BIOS or user. NTP servers provide UTC time. Therefore, if NTP is enabled on the SP, the SP clock will be in UTC.
- Via the CLI, ILOM web GUI, and IPMI.

Viewing Replaceable Component Information
Depending on the component you select, information about the manufacturer, component name, serial number, and part number can be displayed.
1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:
a. Type the IP address of the server's SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed.
b. Type your user name and password. When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:
Default user name: root
Default password: changeme
2. From the System Information tab, choose Components. The Replaceable Component Information page is displayed. See FIGURE C-2.
46. g needed during End POST, just before giving control to runtime code booting to OS. Programmed the system BIOS (0F0000h shadow RAM) cacheability. Ported to handle any OEM-specific programming needed during End POST. Copy OEM-specific data from POST_DSEG to RUN_CSEG.
TABLE A-2 POST Code Checkpoints (Continued)
Post Code  Description
B1 Save system context for ACPI.
00 Prepares CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
61-70 OEM POST Error. This range is reserved for chipset vendors and system manufacturers. The error associated with this value may be different from one platform to the next.

APPENDIX B Status Indicator LEDs
This appendix describes the locations and definitions of the system LEDs.
Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.

External Status Indicator LEDs
FIGURE B-1 and FIGURE B-2 show the locations of the external status indicator LEDs. A Sun Fire X4200/X4200 M2 server is shown, but the LED locations are the same for the Sun Fire X4100/X4100 M2 servers. Refer to TABLE B-1 and TABLE B-2 for descriptions of the L
47. go and display Option ROM output. Keep Current: Do not remove the Sun logo. The Option ROM output is not displayed.
- Boot Num Lock: This option is On by default (keyboard Num Lock is turned on during boot). If you set this to off, the keyboard Num Lock is not turned on during boot.
- Wait for F1 if Error: This option is disabled by default. If you enable this, the system will pause if an error is found during POST and will only resume when you press the F1 key.
- Interrupt 19 Capture: This option is reserved for future use. Do not change.

POST Codes
TABLE A-1 contains descriptions of each of the POST codes, listed in the same order in which they are generated. These POST codes appear as a four-digit string that is a combination of two-digit output from primary I/O port 80 and two-digit output from secondary I/O port 81. In the POST codes listed in TABLE A-1, the first two digits are from port 81 and the last two digits are from port 80.
TABLE A-1 POST Codes
Post Code  Description
00d0 Coming out of POR, PCI configuration space initialization, enabling the AMD controller's SMBus.
00d1 Keyboard controller BAT, waking up from PM, saving power-on CPUID in scratch CMOS.
00d2 Disable cache, full memory sizing, and verify that flat mode is enabled.
00d3 Memory detections and sizing in boot block, cache disabled, IO APIC enabled.
01d4 Test base 512KB memory. Adjust policies and cache f
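The four-digit POST code format described above (first two digits from port 81, last two digits from port 80) can be illustrated with a short decoder. This is a minimal Python sketch; the function name `split_post_code` is hypothetical, and the sample value 01d4 is taken from TABLE A-1.

```python
import string


def split_post_code(code):
    """Split a four-digit POST code into (port 81 byte, port 80 byte).

    Hypothetical helper: the first two hex digits come from secondary
    I/O port 81 and the last two from primary I/O port 80.
    """
    if len(code) != 4 or any(c not in string.hexdigits for c in code):
        raise ValueError("expected four hexadecimal digits, got %r" % code)
    value = int(code, 16)
    return value >> 8, value & 0xFF


port81, port80 = split_post_code("01d4")
print(f"port 81: {port81:02x}, port 80: {port80:02x}")
```

For the code 01d4, the sketch reports 01 from port 81 and d4 from port 80.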
48. grated Lights Out Manager (ILOM) Service Processor (SP) GUI to view monitoring and maintenance information for your server.
- Making a Serial Connection to the SP on page 44
- Viewing ILOM SP Event Logs on page 45
- Viewing Replaceable Component Information on page 48
- Viewing Temperature, Voltage, and Fan Sensor Readings on page 50
For more information on using the ILOM SP GUI to maintain the server (for example, configuring alerts), refer to the Integrated Lights Out Manager (ILOM) Administration Guide (819-1160).
- If any of the logs or information screens indicate a DIMM error, see Troubleshooting DIMM Problems on page 8 and Isolating and Correcting DIMM ECC Errors on page 16.
- If the problem with the server is not evident after viewing ILOM SP logs and information, continue with SunVTS Diagnostic Tests on page 19.

Making a Serial Connection to the SP
1. Connect a serial cable from the RJ-45 Serial Management (SER MGT) port on your ILOM SP to a terminal device.
2. Press ENTER on the terminal device to establish a connection between that terminal device and the ILOM SP.
Note: If you are connecting to the serial port on the SP before it has been powered up or during its power-up sequence, you will see bootup messages displayed.
The service processor eventually displays a login prompt. For example:

    SUNSP0003BA84D777 login:

The first string in the prompt is the default
49. host name for the ILOM SP. It consists of the prefix SUNSP and the MAC address of the ILOM SP. The MAC address for each ILOM SP is unique.
3. Log in to the SP by typing the default user name, root, with the default password, changeme.
   Once you have successfully logged in to the SP, it displays its default command prompt:
   ->
4. To start the serial console, type the following commands:
   cd /SP/console
   start
5. Determine whether you could successfully connect to the SP:
   - If you could not connect to the SP, there is likely a problem with the graphics redirect and service processor (GRASP) board. Replace this board and then repeat this procedure.
   - If you could connect to the SP, continue with the following procedures:
     - "Viewing ILOM SP Event Logs" on page 45
     - "Viewing Replaceable Component Information" on page 48
     - "Viewing Temperature, Voltage, and Fan Sensor Readings" on page 50

Viewing ILOM SP Event Logs

The IPMI system event log (SEL) provides status information about the server's hardware and software to the ILOM software, which displays the events in the ILOM web GUI. Events are notifications that occur in response to some actions.

1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:
   a. Type the IP address of the server's SP into your web browser.
      The Sun Integrated Lights Out Manager Login scre
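The default SP host name format described in the serial connection procedure (the prefix SUNSP followed by the MAC address of the ILOM SP) can be decoded with a short Python sketch. The helper name mac_from_sp_hostname is hypothetical, and the sketch assumes the MAC address appears as twelve hexadecimal digits with no separators, as in the sample prompt SUNSP0003BA84D777.

```python
import string


def mac_from_sp_hostname(hostname, prefix="SUNSP"):
    """Return the MAC address embedded in a default ILOM SP host name.

    Assumption: the host name is the prefix followed by exactly twelve
    hexadecimal digits, as in the sample login prompt.
    """
    if not hostname.startswith(prefix):
        raise ValueError("host name does not start with %s" % prefix)
    digits = hostname[len(prefix):]
    if len(digits) != 12 or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected twelve hex digits after the prefix")
    # Group the digits into six colon-separated octets.
    octets = [digits[i:i + 2] for i in range(0, 12, 2)]
    return ":".join(octets).upper()


if __name__ == "__main__":
    print(mac_from_sp_hostname("SUNSP0003BA84D777"))
```

A host name that has been changed from its default will not match this pattern, so the helper raises ValueError rather than guessing.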
50. ics CD 20 documentation 20 logs 21 overview 19 system errors handling 79 System Overheat Fault LED 37 T time stamps in ILOM SP SEL 47 troubleshooting guidelines 3 U uncorrectable errors handling 71 V visual inspection of system 4 Index 89 90 Sun Fire X4100 X4100 M2 and X4200 X4200 M2 Servers Diagnostics Guide May 2007
51. You can select from the following types of events:
- Sensor-specific events. These events relate to a specific sensor for a component, for example, a fan sensor or a power supply sensor.
- BIOS-generated events. These events relate to error messages generated in the BIOS.
- System management software events. These events relate to events that occur within the ILOM software.

After you have selected a category of event, the Event Log table is updated with the specified events. The fields in the Event Log are described in TABLE C-1.

TABLE C-1 Event Log Fields
Field         Description
Event ID      The number of the event, in sequence from number 1.
Time Stamp    The day and time the event occurred. If the Network Time Protocol (NTP) server is enabled to set the SP time, the SP clock uses Coordinated Universal Time (UTC). For more information about time stamps, see "Interpreting Event Log Time Stamps" on page 47.
Sensor Name   The name of a component for which an event was recorded. The sensor name abbreviations correspond to these components:
              - sys: System or chassis
              - p0: Processor 0
              - p1: Processor 1
              - io: I/O board
              - ps: Power supply
              - fp: Front panel
              - ft: Fan tray
              - mb: Motherboard
Sensor Type   The type of sensor for the specified event.
Description   A description of the event.

4. To clear the event log, click the Clear Event Log button.
   A confirmation dialog box is displayed.
5. Click OK t
52. ions are optional, but you can use them to change the operations that the server performs during POST testing.

1. Initialize the BIOS Setup utility by pressing the F2 key while the system is performing the power-on self-test (POST).
   The BIOS Main menu screen is displayed.
2. When the BIOS Main menu screen is displayed, select Boot.
   The Boot Settings screen is displayed.
3. When the Boot Settings screen is displayed, select Boot Settings Configuration.
   The Boot Settings Configuration screen is displayed.

On the Boot Settings Configuration screen, there are several options that you can enable or disable:
- Quick Boot: This option is disabled by default. If you enable it, the BIOS skips certain tests while booting, such as the extensive memory test. This decreases the time it takes for the system to boot.
- System Configuration Display: This option is disabled by default. If you enable it, the System Configuration screen is displayed before booting begins.
- Quiet Boot: This option is disabled by default. If you enable it, the Sun Microsystems logo is displayed instead of POST codes.
- Language: This option is reserved for future use. Do not change.
- Add On ROM Display Mode: This option is set to Force BIOS by default. It has an effect only if you have also enabled the Quiet Boot option, in which case it controls whether output from the Option ROM is displayed. The two settings for this option are as follows:
   - Force BIOS: Remove the Sun lo
53. irst 8MB.
01d5        Bootblock code is copied from ROM to lower RAM. BIOS is now executing out of RAM.
01d6        Key sequence and OEM-specific method is checked to determine if BIOS recovery is forced. If the next code is E0, BIOS recovery is being executed. Main BIOS checksum is tested.
01d7        Restoring CPUID; moving bootblock-runtime interface module to RAM; determine whether to execute serial flash.
01d8        Uncompressing runtime module into RAM. Storing CPUID information in memory.
01d9        Copying main BIOS into memory.
01da        Giving control to BIOS POST.
0004        Check CMOS diagnostic byte to determine if battery power is OK and CMOS checksum is OK. If the CMOS checksum is bad, update CMOS with power-on default values.
00c2        Set up boot strap processor for POST. This includes frequency calculation, loading BSP microcode, and applying user-requested value for GART Error Reporting setup question.
00c3        Errata workarounds applied to the BSP (78 & 110).
00c6        Re-enable cache for boot strap processor, and apply workarounds in the BSP for errata 106, 107, 69, and 63 if appropriate.
00c7        HT sets link frequencies and widths to their final values.
000a        Initializing the 8042-compatible Keyboard Controller.
000c        Detecting the presence of Keyboard in KBC port.
000e        Testing and initialization of different Input Devices. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1.
54. ist all files.
You have mail.
su
Password:
Read Chapter 6 in the User's Guide.
These are called class options.
You must be superuser to do this.
To delete a file, type rm filename.

Sun is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are available on or through such sites or resources.

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire X4100/X4100 M2 and X4200/X4200 M2 Servers Diagnostics Guide, part number 819-3284-17

CHAPTER 1 Initial Inspection of the Server

Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.

Service Visit Troubleshooting Flowchart

Use the following flowchart as a guideline for using the subjects in this book t
55. kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 1.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?
Aug 5 05:15:00 d-mpk12-53-159 kernel: Dazed and confused, but trying to continue
Aug 5 05:15:00 d-mpk12-53-159 kernel: Do you have a strange power saving mode enabled?

Note: The Linux system reboots, but does not inform the BIOS of this incident.

Handling of System Errors (SERR)

This section lists facts and considerations about how the server handles system errors (SERR).
- System error handling works through the HyperTransport Synch Flood Error mechanism in the AMD controller.
- The following events happen during BIOS POST:
   - POST reports any previous system errors at the bottom of the screen. See FIGURE E-5 for an example.
56. mmand, as shown in the following example (only two FRU devices are shown in the example, but all devices would be shown):

ipmitool -I lanplus -H <IPADDR> -U root -P changeme fru print
FRU Device Description : Builtin FRU Device (ID 0)
 Board Mfg             : BENCHMARK ELECTRONICS
 Board Product         : ASSY SERV PROCESSOR X4X00
 Board Serial          : 0060HSV 0523000195
 Board Part Number     : 501-6979-02
 Board Extra           : 000-000-00
 Board Extra           : HUNTSVILLE AL USA
 Board Extra           : b302
 Board Extra           : 06
 Board Extra           : GRASP
 Product Manufacturer  : SUN MICROSYSTEMS
 Product Name          : ILOM

FRU Device Description : sp.net0.fru (ID 2)
 Product Manufacturer  : MOTOROLA
 Product Name          : FAST ETHERNET CONTROLL
 Product Part Number   : MPC8248 FCC
 Product Serial        : 00:03:BA:D8:73:AC
 Product Extra         : 01
 Product Extra         : 00:03:BA:D8:73:AC

Viewing and Setting Status LEDs

In these servers, all LEDs are active driven; that is, the SP is responsible for the I2C commands that assert and deassert each GPIO pin for each flash cycle.

The IPMItool command for reading LED status is:
ipmitool -I lanplus -H <IPADDR> sunoem led get <sensor ID>

The IPMItool command for setting LED status is:
ipmitool -I lanplus -H <IPADDR> sunoem led set <sensor ID> <LED mode>

It is possible for both of these commands to operate on all sensors at once
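As an illustration of the LED read command given above, the following Python sketch assembles the argument list for ipmitool without running it. The function name led_get_argv, the address 192.0.2.10, and the dotted sensor ID spelling sys.power.led are assumptions made for the example. The sketch deliberately omits the -P option, in which case ipmitool is expected to prompt for the password.

```python
def led_get_argv(sp_address, sensor_id, user="root"):
    """Build the argument list for reading one LED sensor with ipmitool.

    Only the read form (sunoem led get) is built here. The values passed
    in are placeholders supplied by the caller, not verified sensor names.
    """
    if not sp_address or not sensor_id:
        raise ValueError("SP address and sensor ID are both required")
    return [
        "ipmitool",
        "-I", "lanplus",      # IPMI v2.0 LAN interface, as in the manual's examples
        "-H", sp_address,     # address of the service processor
        "-U", user,           # SP user name
        "sunoem", "led", "get", sensor_id,
    ]


if __name__ == "__main__":
    # Hypothetical address and sensor ID, printed rather than executed.
    print(" ".join(led_get_argv("192.0.2.10", "sys.power.led")))
```

Passing the list to subprocess.run would execute the command on a machine where ipmitool is installed. The sketch stops short of that, and it builds no set command, so it cannot change the state of any LED.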
57. [FIGURE E-5 screen image: AMI BIOS POST screen showing BMC firmware revision, USB and ATAPI device detection, and, at the bottom of the screen, the messages "Hyper Transport sync flood error occurred on last boot" and "PCI System Error"]
FIGURE E-5 Sample POST Screen, Previous System Error Listed

   - SERR and HyperTransport Synch Flood Error are logged in DMI and the SP SEL. See the following sample output:

SEL Record ID    : 0a00
Record Type      : 00
Timestamp        : 08/10/2005 06:05:32
Generator ID     : 0001
EvM Revision     : 04
Sensor Type      : Critical Interrupt
Sensor Number    : 00
Event Type       : Sensor-specific Discrete
Event Direction  : Assertion Event
Event Data       : 05ffff
Description      : PCI SERR

- FIGURE E-6 shows an example DMI log screen from the BIOS Setup Page with a system error.

[FIGURE E-6 screen image: BIOS SETUP UTILITY event log listing "09/12/05 14:23:47 A Hyper Transport sync flood error occurred on last boot" and a System Error entry at 14:23:36]
FIGURE E-6 Sample DMI Log Screen, System Err
58. n page 76
- "Handling of System Errors (SERR)" on page 79
- "Handling Mismatching Processors" on page 81
- "Hardware Error Handling Summary" on page 82

Handling of Uncorrectable Errors

This section lists facts and considerations about how the server handles uncorrectable errors.

Note: The BIOS ChipKill feature must be disabled if you are testing for failures of multiple bits within a DRAM (ChipKill corrects for the failure of a four-bit wide DRAM).

- The BIOS logs the error to the SP system event log (SEL) through the board management controller (BMC).
- The SP's SEL is updated with the failing DIMM pair's particular bank address.
- The system reboots.
- The BIOS logs the error in DMI.

Note: If the error is in the low 1MB of memory, the BIOS freezes after rebooting. Therefore, no DMI log is recorded.

- Examples of the error, as reported by the SEL through IPMI 2.0, are as follows:
   - When low memory is faulty, the BIOS freezes during the pre-boot low memory test, because the BIOS cannot decompress itself into faulty DRAM and execute the following items:

ipmitool> sel list
100 | 08/26/2005 | 11:36:09 | OEM #0xfb |
200 | 08/26/2005 | 11:36:12 | System Firmware Error | No usable system memory
300 | 08/26/2005 | 11:36:12 | Memory | Memory Device Disabled | CPU 0 DIMM 0

   - When the faulty DIMM is beyond the BIOS's low 1MB extraction space, a proper boot happens:

ipmitool> sel list
100 | 08/26/2005 | 05:04:04 | OEM #0xfb |
59. nVTS, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.

U.S. Government Rights - Commercial use. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Copyright 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, Californie 95054, Etats-Unis. Tous droits r
60. ndling of Parity Errors (PERR)

This section lists facts and considerations about how the server handles parity errors (PERR).
- The handling of parity errors works through NMIs.
- During BIOS POST, the NMI is logged in the DMI and the SP SEL. See the following example command and output:

[root@d-mpk12-53-238 root]# ipmitool -H 129.146.53.95 -U root -P changeme -I lan sel list -v
SEL Record ID    : 0100
Record Type      : 00
Timestamp        : 01/10/2002 20:16:16
Generator ID     : 0001
EvM Revision     : 04
Sensor Type      : Critical Interrupt
Sensor Number    : 00
Event Type       : Sensor-specific Discrete
Event Direction  : Assertion Event
Event Data       : 04 00
Description      : PCI PERR

- FIGURE E-4 shows an example of a DMI log screen from the BIOS Setup Page with a parity error.

[FIGURE E-4 screen image: BIOS SETUP UTILITY, View Event Log, listing "09/12/05 14:27:47 PCI Parity"]
FIGURE E-4 Sample DMI Log Screen, PCI Parity Error

- The BIOS displays the following messages and freezes (during POST or DOS):
   - NMI EVENT
   - System Halted due to Fatal NMI
- The Linux NMI trap catches the interrupt and reports the following NMI "confusion report" sequence:

Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Aug 5 05:15:00 d-mpk12-53-159 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 1.
Aug 5 05:15:00 d-mpk12-53-159
61. ng the Default Password
Configuring an SSH Key
Using IPMItool to Read Sensors
Reading Sensor Status
Reading All Sensors
Reading Specific Sensors
Using IPMItool to View the ILOM SP System Event Log
Viewing the SEL With IPMItool
Clearing the SEL With IPMItool
Using the Sensor Data Repository (SDR) Cache
Sensor Numbers and Sensor Names in SEL Events
Viewing Component Information With IPMItool
Viewing and Setting Status LEDs
LED Sensor IDs
LED Modes
LED Sensor Groups
Using IPMItool Scripts For Testing
Error Handling
Handling of Uncorrectable Errors
Handling of Correctable Errors
Handling of Parity Errors (PERR)
Handling of System Errors (SERR)
Handling Mismatching Processors
Hardware Error Handling Summary

Preface

This Guide contains information and procedures for troubleshooting problems with the servers.

Before You Read This Document

It is important that you review the safety guidelines in the Sun Fire X4100/X4100 M2 and X4200/X4200 M2 Servers Safety and Compliance Guide, 819-1161.

Using UNIX Commands

This document might not contain information about basic UNIX commands and procedures such as shutting down the system, booting the system, and configuring devices. Refer to the following for this information:
- Software documentation
62. nit Time-stamp | Phys Security #0x01 | Gen Chassis intrusion
500 | Pre-Init Time-stamp | Entity Presence #0x12 | Device Present

Note: When you use this command, an event record gives a sensor number, but does not display the name of the sensor for the event. For example, in line 100 in the sample output above, the sensor number 0x16 is displayed. For information about how to map sensor names to the different sensor number formats that might be displayed, see "Sensor Numbers and Sensor Names in SEL Events" on page 64.

- View the ILOM SP SEL with a detailed event output by using the sel elist command instead of sel list. The sel elist command cross-references event records with sensor data records to produce descriptive event output. It takes longer to execute because it has to read from both the SEL and the Sensor Data Repository (SDR). For increased speed, generate an SDR cache before using the sel elist command. See "Using the Sensor Data Repository (SDR) Cache" on page 64. For example:

ipmitool -I lanplus -H <IPADDR> -U root -P changeme sel elist first 3
100 | Pre-Init Time-stamp | Temperature fp.t_amb | Upper Non-critical going high | Reading 31 > Threshold 30 degrees C
200 | Pre-Init Time-stamp | Power Supply ps1.pwrok | State Deasserted
300 | Pre-Init Time-stamp | Entity Presence ps1.prsnt | Device Present

Certain qualifiers are av
63. nplus -H <IPADDR> -U root -P changeme sunoem sshkey set 2 id_rsa.pub
Setting SSH key for user id 2.......done

You can also clear the key for a particular user, for example:

ipmitool -I lanplus -H <IPADDR> -U root -P changeme sunoem sshkey del 2
Deleted SSH key for user id 2

Using IPMItool to Read Sensors

For more information about supported IPMI 2.0 commands and the sensor naming for this server, also refer to the Integrated Lights Out Manager (ILOM) Administration Guide, 819-1160, and the Integrated Lights Out Manager Supplement for Sun Fire X4100 and Sun Fire X4200 Servers, 819-5464.

Reading Sensor Status

There are a number of ways to read sensor status, from a broad overview that lists all sensors, to querying individual sensors and returning detailed information on them. See the following sections:
- "Reading All Sensors" on page 59
- "Reading Specific Sensors" on page 60

Reading All Sensors

To get a list of all sensors in these servers and their status, use the sdr list command with no arguments. This returns a large table with every sensor in the system and its status.

The five fields of the output lines, as read from left to right, are:
1. IPMI sensor ID (16-character maximum)
2. IPMI sensor number
3. Sensor status, indicating which thresholds have been exceeded
4. Entity ID and instance
5. Sensor reading

For
64. o troubleshoot the server.

To perform this task                                              Refer to these sections
Gather initial service visit information.                         "Gathering Service Visit Information" on page 3
Investigate any powering-on problems.                             "Troubleshooting Power Problems" on page 4
Perform external visual inspection and internal visual            "Externally Inspecting the Server" on page 4
inspection.                                                       "Internally Inspecting the Server" on page 5
                                                                  "Troubleshooting DIMM Problems" on page 8
View BIOS event logs and POST messages.                           "Viewing BIOS Event Logs" on page 23
                                                                  "Power-On Self-Test (POST)" on page 25
View service processor logs and sensor information.               "Using the ILOM SP GUI to View System Information" on page 43
View service processor logs and sensor information.               "Using IPMItool to View System Information" on page 55
Run SunVTS diagnostics.                                           "Diagnosing Server Problems With the Bootable Diagnostics CD" on page 20

FIGURE 1-1 Troubleshooting Flowchart

Gathering Service Visit Information

The first step in determining the cause of the problem with the server is to gather whatever information you can from the service call paperwork or the on-site personnel. Use the following general guideline steps when you begin troubleshooting.

Collect information ab
65. o clear all entries in the log.
6. If the problem with the server is not evident after viewing ILOM SP logs and information, continue with "SunVTS Diagnostic Tests" on page 19.

Interpreting Event Log Time Stamps

The system event log time stamps are related to the service processor clock settings. If the clock settings change, the change is reflected in the time stamps.

When the service processor reboots, the SP clock is set to Thu Jan 1 00:00:00 UTC 1970. The SP reboots as a result of the following:
- A complete system unplug/replug power cycle
- An IPMI command, for example, mc reset cold
- A command-line interface (CLI) command, for example, reset /SP
- ILOM web GUI operation, for example, from the Maintenance tab, selecting Reset SP
- An SP firmware upgrade

After an SP reboot, the SP clock is changed by the following:
- When the host is booted. The host's BIOS unconditionally sets the SP time to that indicated by the host's RTC. The host's RTC is set by the following operations:
   - When the host's CMOS is cleared as a result of changing the host's RTC battery or inserting the CMOS-clear jumper on the motherboard. The host's RTC starts at Jan 1 00:01:00 2002.
   - When the host's operating system sets the host's RTC. The BIOS does not consider time zones. Solaris and Linux software respect time zones and will set the system clock to UTC. There
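The two clock starting points named above (the SP clock restarting at Jan 1 1970 after an SP reboot, and the host RTC restarting at Jan 1 00:01:00 2002 after a CMOS clear) suggest a simple way to flag suspicious event time stamps. The Python sketch below is only an illustration: the function name classify_timestamp, the one-day window, and the treatment of the 2002 value as UTC are all assumptions, not behavior documented for the SP.

```python
from datetime import datetime, timedelta, timezone

# Starting points taken from the text; the UTC tag on the 2002 value is assumed.
SP_RESET_START = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
RTC_CLEARED_START = datetime(2002, 1, 1, 0, 1, 0, tzinfo=timezone.utc)


def classify_timestamp(ts, window=timedelta(days=1)):
    """Label a time stamp that falls shortly after a known clock reset.

    Returns "sp-reset", "rtc-cleared", or "normal". The window is an
    arbitrary choice for this sketch.
    """
    if SP_RESET_START <= ts < SP_RESET_START + window:
        return "sp-reset"
    if RTC_CLEARED_START <= ts < RTC_CLEARED_START + window:
        return "rtc-cleared"
    return "normal"


if __name__ == "__main__":
    sample = datetime(1970, 1, 1, 0, 5, 0, tzinfo=timezone.utc)
    print(classify_timestamp(sample))
```

An event logged a few minutes after an SP reboot, before the host BIOS has set the SP time, would be labeled sp-reset by this check.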
66. or Listed

Handling Mismatching Processors

This section lists facts and considerations about how the server handles mismatching processors.
- The BIOS performs a complete POST.
- The BIOS displays a report of any mismatching CPUs, as shown in the following example.

Note: In the following example report, the names of the AMD controllers in the original Sun Fire X4100/X4200 are used.

AMIBIOS (C) 2003 American Megatrends, Inc.
BIOS Date: 08/10/05 14:51:11 Ver: 08.00.10
CPU: AMD Opteron(tm) Processor 254, Speed: 2.4 GHz, Count: 3
CPU Revision, CPU0: E4, CPU1: E6
Microcode Revision, CPU0: 0, CPU1: 0
DRAM Clocking CPU0 = 400 MHz, CPU1 Core0/1 = 400 MHz
Sun Fire X4100 Server
1 AMD North Bridge, Rev E4
1 AMD North Bridge, Rev E6
1 AMD 8111 I/O Hub, Rev C2
2 AMD 8131 PCI-X Controllers, Rev B2
System Serial Number: 0505AMF028
BMC Firmware Revision: 1.00
Checking NVRAM..
Initializing USB Controllers .. Done
Press F2 to run Setup (CTRL+E on Remote Keyboard)
Press F12 to boot from the network (CTRL+N on Remote Keyboard)
Press F8 for BBS POPUP (CTRL+P on Remote Keyboard)

- No SEL or DMI event is recorded.
- The system enters Halt mode and the following message is displayed:

XXXXXXX Warning: Bad Mix of Processors
Multiple core processors cannot be installed with single core processors.
Fatal Error System
67. ote: To see the CPU LEDs or the GRASP board LED, you must put the server in standby power mode (shut down with the front panel Power button, but do not disconnect the AC power cords).

Note the following differences between the original Sun Fire X4100/X4200 and the Sun Fire X4100/X4200 M2 servers, regarding the power requirements for viewing the DIMM fault LEDs:
- For the original Sun Fire X4100/X4200 servers, to see the DIMM fault LEDs, you must put the server in standby power mode, with the AC power cords attached. See "Internally Inspecting the Server" on page 5.
- For the Sun Fire X4100/X4200 M2 servers, you can view the DIMM fault LEDs without the power cords attached. These LEDs can be lit by a capacitor on the motherboard for up to one minute. To light the DIMM fault LEDs from the capacitor, push the small button on the motherboard labeled "DIMM SW2". See FIGURE B-4.

FIGURE B-3 shows the internal LEDs in the Sun Fire X4100/X4200 servers. FIGURE B-4 shows the internal LEDs in the Sun Fire X4100/X4200 M2 servers.

[FIGURE B-3 callouts: Back panel of server; DIMM 3, DIMM 1, DIMM 2, DIMM 0; CPU fault LEDs (on the motherboard); Fan module fault LEDs (on fan modules); Front panel of server]
FIGURE B-3 Sun Fire X4100/X4200 Internal LED Locations

GRASP board power status LED on the GRASP boar
68. out the following items:
- Events that occurred prior to the failure
- Whether any hardware or software was modified or installed
- Whether the server was recently installed or moved
- How long the server exhibited symptoms
- The duration or frequency of the problem

Document the server settings before you make any changes. If possible, make one change at a time, in order to isolate potential problems. In this way, you can maintain a controlled environment and reduce the scope of troubleshooting.

Take note of the results of any change you make. Include any errors or informational messages.

Check for potential device conflicts before you add a new device.

Check for version dependencies, especially with third-party software.

Serial Number Locations

The system serial number is located on a sticker that is attached to the front bezel (see FIGURE 1-2 or FIGURE 1-3 for the location). If the bezel is missing, a second serial number label is affixed to the system:
- For Sun Fire X4100/X4100 M2 servers, the second sticker is attached to the top of
- For Sun Fire X4200/X4200 M2 servers, the second sticker is attached to the side of the chassis. If you are facing the chassis front, the sticker is on the left side, near the front.

System Inspection

Improperly set controls and loose or improperly connected cables are common causes of problems with hardware components.

Troubleshooting Power
69. put Devices. Also, update the Kernel Variables. Traps the INT09h vector, so that the POST INT09h handler gets control for IRQ1. Uncompress all available language, BIOS logo, and Silent logo modules.
13          Initialize PM regs and PM PCI regs at Early-POST. Initialize multi-host bridge, if the system supports it. Set up ECC options before memory clearing. REDIRECTION causes corrected data to be written to RAM immediately. CHIPKILL provides 4-bit error detection/correction of x4 type memory. Enable PCI-X clock lines in the AMD controller.
20          Relocate all the CPUs to a unique SMBASE address. The BSP will be set to have its entry point at A000:0. If fewer than 5 CPU sockets are present on a board, subsequent CPUs' entry points will be separated by 8000h bytes. If more than 4 CPU sockets are present, entry points are separated by 200h bytes. The CPU module will be responsible for the relocation of the CPU to the correct address. NOTE: APs are left in the INIT state.
24          Uncompress and initialize any platform-specific BIOS modules.
30          Initialize System Management Interrupt.
2A          Initializes different devices through DIM.
2C          Initializes different devices. Detects and initializes the video adapters installed in the system that have optional ROMs.
2E          Initializes all the output devices.
31          Allocate memory for ADM module and uncompress it. Give control to ADM module for initialization. Initialize language and font modules for ADM. Activate ADM module.
33          Initializes the silent boot module. Set the
70. r that is displayed for an event might appear in slightly different formats. See the following examples:
- The sensor number for the sensor ps1.prsnt (power supply 1 present) can be displayed as either 1Fh or 0x1F.
- 38h is equivalent to 0x38.
- 4Bh is equivalent to 0x4B.

The output from certain commands might not display the sensor name along with the corresponding sensor number. To see all sensor names in your server mapped to the corresponding sensor numbers, you can use the following command:

ipmitool -H 129.144.82.21 -U root -P changeme sdr elist
sys.id           | 00h | ok  | 23.0 | State Asserted
sys.intsw        | 01h | ok  | 23.0 |
sys.psfail       | 02h | ok  | 23.0 | Predictive Failure Asserted

In the sample output above, the sensor name is in the first column and the corresponding sensor number is in the second column.

For a detailed explanation of each sensor, listed by name, refer to the Integrated Lights Out Manager Supplement For Sun Fire X4100 and Sun Fire X4200 Servers, 819-5464.

Viewing Component Information With IPMItool

You can view information about system hardware components. The software refers to these components as field-replaceable unit (FRU) devices.

To read the FRU inventory information on these servers, you must first have the FRU ROMs programmed. After that is done, you can see a full list of the available FRU data by using the fru print co
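The equivalence between the two sensor number notations described above (a trailing h, as in 1Fh, and a leading 0x, as in 0x1F) can be checked with a short Python sketch. The function name parse_sensor_number is hypothetical; it is a convenience for comparing values by hand and is not part of IPMItool.

```python
def parse_sensor_number(text):
    """Convert a sensor number written as '1Fh' or '0x1F' to an integer."""
    value = text.strip().lower()
    if value.startswith("0x"):
        return int(value[2:], 16)   # C-style prefix notation
    if value.endswith("h"):
        return int(value[:-1], 16)  # trailing-h notation used by sdr elist
    raise ValueError("unrecognized sensor number format: %r" % text)


if __name__ == "__main__":
    # Both spellings of the ps1.prsnt sensor number compare equal.
    print(parse_sensor_number("1Fh") == parse_sensor_number("0x1F"))
```

Normalizing both notations to an integer makes it straightforward to match a number shown in sel list output against the second column of sdr elist output.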
71. re or Trfc info

NODE n Paired DIMMs Mismatch
The following condition will cause this error message:
- Paired DIMMs are not the same (checksum mismatch).

NODE n DIMMs Manufacturer Mismatch
The following condition will cause this error message:
- The DIMM manufacturer is not supported. Only Samsung, Micron, Infineon, and SMART DIMMs are supported. For example, this message is displayed when you add Hitachi DIMMs.

DIMM Fault LEDs

The ejectors on the DIMM slots on the motherboard contain DIMM fault LEDs. Note the following differences between the Sun Fire X4100/X4200 and the X4100 M2/X4200 M2 servers, regarding the power requirements for viewing the DIMM fault LEDs:
- Sun Fire X4100/X4200 servers only: To see the DIMM fault LEDs, you must put the server in standby power mode, with the AC power cords attached. See "Internally Inspecting the Server" on page 5.
- Sun Fire X4100 M2/X4200 M2 servers only: You can view the DIMM fault LEDs without the power cords attached. These LEDs can be lit by a capacitor on the motherboard for up to one minute. To light the DIMM fault LEDs from the capacitor, push the small button on the motherboard labeled "DIMM SW2". See FIGURE 1-5.

Note: The DIMM fault LEDs always indicate a failed DIMM pair, with the LEDs lit on both slots of the pair that contains the failed DIMM. See "Isolating and
72. resholds
FIGURE C-4 Sample Sensor Readings Screen With Thresholds Shown

6. Click the Hide Thresholds button to revert to the sensor readings.
   The sensor readings are redisplayed, without the thresholds.
7. If the problem with the server is not evident after viewing ILOM SP logs and information, continue with "SunVTS Diagnostic Tests" on page 19.

APPENDIX D Using IPMItool to View System Information

Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.

Caution: Although you can use IPMItool to view sensor and LED information, do not use any interface other than the ILOM CLI or Web GUI to alter the state or configuration of any sensor or LED. Doing so could void your warranty.

This appendix contains information about using the Intelligent Platform Management Interface (IPMI) to view monitoring and maintenance information for your server. This appendix contains the following sections:
- "About IPMI" on page 56
- "About IPMItool" on page 56
- "Connecting to the Server With IPMItool" on page 57
- "Using IPMItool to Read Sensors" on page 59
- "Using IPMItool to View the ILOM SP System Event Log" on page 62
- "Viewing Component Information With IPMItool" on page 65
- "Viewing and Setting Status L
73. ... SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc.

The OPEN LOOK and Sun graphical user interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox graphical user interface, which license also covers Sun's licensees who implement the OPEN LOOK graphical user interface and otherwise comply with Sun's written license agreements.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL OTHER EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES ARE FORMALLY EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW, INCLUDING IN PARTICULAR ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

Contents
- Preface
- Initial Inspection of the Server
- Service Visit Troubleshooting Flowchart
- Gathering Service Visit Information
- Serial Number Locations
- System Inspection
- Troubleshooting Power Pro
74. s LED lights when there is a failed front cooling fan module. LEDs on the individual fan modules indicate which fan module has failed.

Power Supply/Rear Fan Tray Fault LED
This LED lights when:
- Two power supplies are present in the system but only one has AC power connected. To clear this condition, either plug in the second power supply or remove it from the chassis.
- Any voltage-related event occurs in the system. For CPU-related voltage errors, the associated CPU Fault LED will also be illuminated.
- For Sun Fire X4200/X4200 M2 only: The rear fan tray has failed or is removed.

System Overheat Fault LED
This LED lights when an upper temperature limit is detected.

Hard Disk Drive Status LEDs
The hard disk drives have three LEDs:
- Top LED (blue): Reserved for future use.
- Middle LED (amber): Hard disk drive failed.
- Bottom LED (green): Hard disk drive is OK.

FIGURE B-2 Sun Fire X4200/X4200 M2 Servers Back Panel LEDs
Callouts: Rear fan tray fault LED (Sun Fire X4200 only); Power supply LEDs (on each power supply); Power OK LED; Service action required LED; Locate button/LED.

TABLE B-2 Back Panel LED Functions
LED Name / Description
Rear Fan Tray Fault LED: The rear fan tray and the LED are present only in Sun Fire X4200/X4200 M2 servers.
Power Supply LEDs
Locate button/LED: Same function as on front panel.
Service
75. s User
In order to enable the Anonymous/NULL user, you must alter the privilege level on that account. This will let you connect without supplying a -U user option on the command line. The default password for this user is anonymous.

ipmitool -I lanplus -H <IPADDR> -U root -P changeme channel setaccess 1 1 privilege=4
ipmitool -I lanplus -H <IPADDR> -P anonymous user list

Changing the Default Password
You can also change the default passwords for a particular user ID. First, get a list of users and find the ID for the user you wish to change, then supply it with a new password, as shown in the following command sequence:

ipmitool -I lanplus -H <IPADDR> -U root -P changeme user list
ID  Name  Callin  Link Auth  IPMI Msg  Channel Priv Limit
1         false   false      true      NO ACCESS
2   root  false   false      true      ADMINISTRATOR
ipmitool -I lanplus -H <IPADDR> -U root -P changeme user set password 2 newpass
ipmitool -I lanplus -H <IPADDR> -U root -P newpass chassis status

Configuring an SSH Key
You can use IPMItool to configure an SSH key for a remote shell user. To do this, first determine the user ID for the desired remote SP user with the user list command:

ipmitool -I lanplus -H <IPADDR> -U root -P changeme user list

Then supply the user ID and the location of the RSA or DSA public key to use with the ipmitool sunoem sshkey command. For example:

ipmitool -I la
76. s follows:
- Front panel ambient temperature (fp.t_amb)
  Upper non-critical: 30 degrees C
  Upper critical: 35 degrees C
  Upper non-recoverable: 40 degrees C
- CPU 0 (p0.t_core) and CPU 1 (p1.t_core) die temperatures
  Upper non-critical: 55 degrees C
  Upper critical: 65 degrees C
  Upper non-recoverable: 75 degrees C

There are three other temperature sensors:
- I/O board ambient temperature (io.t_amb)
- Motherboard ambient temperature (mb.t_amb)
- Power distribution board ambient temperature (pdb.t_amb)

1. Log in to the SP as Administrator or Operator to reach the ILOM web GUI:
a. Type the IP address of the server's SP into your web browser. The Sun Integrated Lights Out Manager Login screen is displayed.
b. Type your user name and password. When you first try to access the ILOM Service Processor, you are prompted to type the default user name and password. The default user name and password are:
   Default user name: root
   Default password: changeme
2. From the System Monitoring tab, choose Sensor Readings. The Sensor Readings page is displayed. See FIGURE C-3.

[Screenshot: Sun Integrated Lights Out Manager, System Monitoring tab, Sensor Readings page]
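The threshold levels listed above can be pictured with a small Python sketch. It is only an illustration: the dotted sensor names, the `classify` helper, and the choice to treat a reading equal to a threshold as having reached it are assumptions, not behavior taken from the ILOM firmware.

```python
# Upper thresholds in degrees C, copied from the list above:
# (upper non-critical, upper critical, upper non-recoverable).
THRESHOLDS = {
    "fp.t_amb": (30, 35, 40),
    "p0.t_core": (55, 65, 75),
    "p1.t_core": (55, 65, 75),
}


def classify(sensor, reading_c):
    """Return the highest threshold level reached by a reading (hypothetical helper)."""
    non_critical, critical, non_recoverable = THRESHOLDS[sensor]
    # Assumption: a reading equal to a threshold counts as reaching it.
    if reading_c >= non_recoverable:
        return "upper non-recoverable"
    if reading_c >= critical:
        return "upper critical"
    if reading_c >= non_critical:
        return "upper non-critical"
    return "ok"


if __name__ == "__main__":
    for name, value in [("fp.t_amb", 21), ("p0.t_core", 66)]:
        print(name, value, classify(name, value))
```

A reading taken from the Sensor Readings page can be passed to `classify` to see which level, if any, it has reached.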
77. sel clear
Clearing SEL. Please allow a few seconds to erase.

Using the Sensor Data Repository (SDR) Cache
When working with the ILOM SP, certain operations can be expensive in terms of execution time and the amount of data transferred. Typically, issuing the sdr elist command requires the entire SDR to be read from the SP. Similarly, the sel elist command needs to read both the SDR and the SEL from the SP in order to cross-reference events and display useful information.

To speed up these operations, it is possible to pre-cache the static data in the SDR and feed it back into IPMItool. This can have a dramatic effect on the processing time for some commands. In order to generate an SDR cache for later reuse, use the sdr dump command. For example:

ipmitool -I lanplus -H <IPADDR> -U root -P changeme sdr dump galaxy.sdr
Dumping Sensor Data Repository to 'galaxy.sdr'

After you have generated a cache file, it can be supplied to future invocations of IPMItool with the -S option. For example:

ipmitool -I lanplus -H <IPADDR> -U root -P changeme -S galaxy.sdr sel elist
100 | Pre-Init Time-stamp | Entity Presence ps1.prsnt | Device Absent
200 | Pre-Init Time-stamp | Entity Presence io.f0.prsnt | Device Absent
300 | Pre-Init Time-stamp | Power Supply ps0.vinok | State Asserted

Sensor Numbers and Sensor Names in SEL Events
Depending on which IPMI command you use, the sensor numbe
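As a rough sketch of the SDR cache workflow described above, the following Python helper only assembles the IPMItool argument lists; it does not contact a server. The function name `ipmitool_args` and the host address are hypothetical, while the `-I`, `-H`, `-U`, `-P`, and `-S` options and the `sdr dump` and `sel elist` subcommands are the ones shown in the text.

```python
def ipmitool_args(host, user, password, command, sdr_cache=None):
    """Build an ipmitool argument list (hypothetical helper, nothing is executed)."""
    args = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password]
    if sdr_cache is not None:
        # -S points ipmitool at a local SDR cache file created by "sdr dump".
        args += ["-S", sdr_cache]
    return args + list(command)


if __name__ == "__main__":
    host = "192.0.2.10"  # placeholder address, an assumption for illustration
    # Step 1: dump the SDR once to a local cache file.
    print(ipmitool_args(host, "root", "changeme", ["sdr", "dump", "galaxy.sdr"]))
    # Step 2: reuse the cache when listing the SEL.
    print(ipmitool_args(host, "root", "changeme", ["sel", "elist"],
                        sdr_cache="galaxy.sdr"))
```

The resulting lists could be handed to `subprocess.run` on a system where IPMItool is installed; that step is left out here so the sketch stays side-effect free.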
78. st be installed in pairs (0 and 1, 2 and 3). The memory sockets are colored black or white to indicate which slots are paired by matching colors.
- CPUs with only a single pair of DIMMs must have those DIMMs installed in that CPU's white DIMM slots (0 and 1).
- See TABLE 1-1 for supported DIMM configurations.

TABLE 1-1 Sun Fire X4100/X4200 Supported DIMM Configurations (DDR1 Only)

Slot 3    Slot 1    Slot 2    Slot 0    Total Memory Per CPU
0         512 MB    0         512 MB    1 GB
512 MB    512 MB    512 MB    512 MB    2 GB
512 MB    1 GB      512 MB    1 GB      3 GB
512 MB    2 GB      512 MB    2 GB      5 GB
512 MB    4 GB      512 MB    4 GB      9 GB
0         1 GB      0         1 GB      2 GB
1 GB      512 MB    1 GB      512 MB    3 GB
1 GB      1 GB      1 GB      1 GB      4 GB
1 GB      2 GB      1 GB      2 GB      6 GB
1 GB      4 GB      1 GB      4 GB      10 GB
0         2 GB      0         2 GB      4 GB
2 GB      512 MB    2 GB      512 MB    5 GB
2 GB      1 GB      2 GB      1 GB      6 GB
2 GB      2 GB      2 GB      2 GB      8 GB
2 GB      4 GB      2 GB      4 GB      12 GB
0         4 GB      0         4 GB      8 GB
4 GB      4 GB      4 GB      4 GB      16 GB

Sun Fire X4100 M2/X4200 M2 Rules
The DIMM population rules for the Sun Fire X4100 M2/X4200 M2 servers are listed here:
- Each CPU can support a maximum of four DDR2 DIMMs.
- Each pair of DIMMs must be identical (same manufacturer, size, and speed).
- The DIMM slots are paired and the DIMMs must be installed in pairs (A1 and B1, A0 and B0). The m
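The pairing rules and the totals in TABLE 1-1 for the Sun Fire X4100/X4200 can be checked with a short Python sketch. The helper name `total_memory_mb` is hypothetical, and the check compares DIMM sizes only; manufacturer and speed matching are not modeled.

```python
def total_memory_mb(slots):
    """Validate one CPU's DIMM layout and return total memory in MB.

    slots maps slot number (0 to 3) to DIMM size in MB; missing or 0 means empty.
    Hypothetical helper: only sizes are compared.
    """
    sizes = [slots.get(slot, 0) for slot in range(4)]
    # Slots 0 and 1 form one pair, slots 2 and 3 form the other.
    if sizes[0] != sizes[1] or sizes[2] != sizes[3]:
        raise ValueError("DIMMs must be installed in matched pairs (0 and 1, 2 and 3)")
    # A single pair must go in slots 0 and 1.
    if sizes[0] == 0 and sizes[2] != 0:
        raise ValueError("a single pair of DIMMs must use slots 0 and 1")
    return sum(sizes)


if __name__ == "__main__":
    # 1 GB pair in slots 0 and 1, 512 MB pair in slots 2 and 3.
    print(total_memory_mb({0: 1024, 1: 1024, 2: 512, 3: 512}))
```

The example layout corresponds to the 3 GB row of the table (512 MB, 1 GB, 512 MB, 1 GB in slots 3, 1, 2, 0).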
79. te are checked, memory is tested, the LSI 1064 disk controller and attached disks are probed and enumerated, and the two Intel dual-gigabit Ethernet controllers are initialized.

The progress of the self-test is indicated by a series of POST codes. These codes are displayed at the bottom right corner of the system's VGA screen (once the self-test has progressed far enough to initialize the system video). However, the codes are displayed as the self-test runs and scroll off of the screen too quickly to be read. An alternate method of displaying the POST codes is to redirect the output of the console to a serial port (see Redirecting Console Output on page 26).

How BIOS POST Memory Testing Works
The BIOS POST memory testing is performed as follows:
1. The first megabyte of DRAM is tested by the BIOS before the BIOS code is shadowed (that is, copied from ROM to DRAM).
2. Once executing out of DRAM, the BIOS performs a simple memory test (a write/read of every location with the pattern 55aa55aa).
   Note: This memory test is performed only if Quick Boot is not enabled from the Boot Settings Configuration screen. Enabling Quick Boot causes the BIOS to skip the memory test. See Changing POST Options on page 27 for more information.
3. The BIOS polls the memory controllers for both correctable and uncorrectable memory errors and logs those errors into the service processor.

Redire
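The write/read pattern test in step 2 can be illustrated with a small Python simulation. This is not the BIOS code: the `SimulatedRam` class, the 32-bit word size, and the injected stuck bit are all assumptions made so the idea can be run on any machine.

```python
PATTERN = 0x55AA55AA  # pattern named in step 2


class SimulatedRam:
    """Toy memory model; optionally one word has bit 1 stuck at 0 (assumption)."""

    def __init__(self, stuck_addr=None):
        self.words = {}
        self.stuck_addr = stuck_addr

    def write(self, addr, value):
        self.words[addr] = value

    def read(self, addr):
        value = self.words[addr]
        if addr == self.stuck_addr:
            value &= ~0x2  # bit 1 is set in the pattern, so this corrupts it
        return value


def pattern_test(write, read, size_bytes, pattern=PATTERN):
    """Write the pattern to every 32-bit word, read it back, return failing addresses."""
    failures = []
    for addr in range(0, size_bytes, 4):
        write(addr, pattern)
        if read(addr) != pattern:
            failures.append(addr)
    return failures


if __name__ == "__main__":
    ram = SimulatedRam(stuck_addr=8)
    print(pattern_test(ram.write, ram.read, 64))
```

Passing the read and write operations in as callables keeps the test loop independent of how the memory is modeled.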
80. [Screenshot: replaceable component information page showing chassis, board, and product information for the selected device]
FIGURE C-2 Sample Replaceable Component Information Screen

3. Select a component from the drop-down list box. Information about the selected component is displayed.
4. If the problem with the server is not evident after viewing ILOM SP logs and information, continue with SunVTS Diagnostic Tests on page 19.

Viewing Temperature, Voltage, and Fan Sensor Readings
This section explains how to view the server temperature, voltage, and fan sensor readings.

There are a total of six temperature sensors that are monitored. They all generate IPMI events that will be logged in to the system event log (SEL) when an upper threshold is exceeded. Three of these sensor readings are used to adjust the fan speeds and perform other actions, such as illuminating LEDs and powering off the chassis. These sensors and their respective thresholds are a
81. ts
mb.v_+12v        | 1Fh | ok | 7.0  | 12.22 Volts
mb.v_-12v        | 20h | ok | 7.0  | -12.20 Volts
mb.v_+2v5core    | 21h | ok | 7.0  | 2.54 Volts
mb.v_+1v8core    | 22h | ok | 7.0  | 1.83 Volts
mb.v_+1v2core    | 23h | ok | 7.0  | 1.21 Volts
io.t_amb         | 24h | ok | 15.0 | 21 degrees C
p0.t_core        | 2Bh | ok | 3.0  | 44 degrees C
p0.v_+1v5        | 2Ch | ok | 3.0  | 1.56 Volts
p0.v_+2v5core    | 2Dh | ok | 3.0  | 2.64 Volts
p0.v_+1v25core   | 2Eh | ok | 3.0  | 1.32 Volts
p1.t_core        | 34h | ok | 3.1  | 40 degrees C
p1.v_+1v5        | 35h | ok | 3.1  | 1.55 Volts
p1.v_+2v5core    | 36h | ok | 3.1  | 2.64 Volts
p1.v_+1v25core   | 37h | ok | 3.1  | 1.32 Volts
ft0.fm0.f0.speed | 43h | ok | 29.0 | 6000 RPM
ft0.fm1.f0.speed | 44h | ok | 29.1 | 6000 RPM
ft0.fm2.f0.speed | 45h | ok | 29.2 | 6000 RPM
ft1.fm0.f0.speed | 46h | ok | 29.3 | 6000 RPM
ft1.fm1.f0.speed | 47h | ok | 29.4 | 6000 RPM
ft1.fm2.f0.speed | 48h | ok | 29.5 | 6000 RPM

You can also generate a list of all sensors for a specific Entity. Use the list output to determine which entity you are interested in seeing, then use the sdr entity command to get a list of all sensors for that entity. This command accepts an entity ID and an optional entity instance argument. If an entity instance is not specified, it will display all instances of that entity.

The entity ID is given in the 4th field of the output, as read from left to right. For example, in the output shown in the previous example
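To make the point about the 4th field concrete, here is a hedged Python sketch that splits one line of sensor list output into its fields. It assumes the usual pipe-delimited layout and dotted sensor names, which the extracted text above no longer shows; `parse_sdr_line` and `sensors_for_entity` are hypothetical helper names.

```python
def parse_sdr_line(line):
    """Split one pipe-delimited sensor line into named fields (assumed layout)."""
    name, number, status, entity, reading = [f.strip() for f in line.split("|")]
    # The 4th field holds "<entity ID>.<entity instance>", for example "7.0".
    entity_id, _, instance = entity.partition(".")
    return {
        "name": name,
        "number": int(number.rstrip("h"), 16),  # "1Fh" is a hexadecimal sensor number
        "status": status,
        "entity_id": int(entity_id),
        "entity_instance": int(instance),
        "reading": reading,
    }


def sensors_for_entity(lines, entity_id):
    """Return the names of all sensors that belong to one entity ID."""
    return [s["name"] for s in map(parse_sdr_line, lines) if s["entity_id"] == entity_id]


if __name__ == "__main__":
    sample = [
        "mb.v_+12v | 1Fh | ok | 7.0 | 12.22 Volts",
        "p0.t_core | 2Bh | ok | 3.0 | 44 degrees C",
        "p1.t_core | 34h | ok | 3.1 | 40 degrees C",
    ]
    print(sensors_for_entity(sample, 3))
```

Filtering by entity ID in this way mirrors what the sdr entity command does on the SP side when no entity instance is given.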
82. TABLE A-1 POST Codes (Continued)

Post Code   Description
8600        Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
de00        Preparing CPU for booting to OS by copying all of the context of the BSP to all application processors present. NOTE: APs are left in the CLI HLT state.
8613        Initialize PM regs and PM PCI regs at Early-POST. Initialize multi host bridge, if system supports it. Setup ECC options before memory clearing. Enable PCI-X clock lines in the AMD controller.
0024        Uncompress and initialize any platform-specific BIOS modules.
862a        BBS ROM initialization.
002a        Generic Device Initialization Manager (DIM): Disable all devices.
042a        ISA PnP devices: Disable all devices.
052a        PCI devices: Disable all devices.
122a        ISA devices: Static device initialization.
152a        PCI devices: Static device initialization.
252a        PCI devices: Output device initialization.
202c        Initializing different devices. Detecting and initializing the video adapter installed in the system that have optional ROMs.
002e        Initializing all the output devices.
0033        Initializing the silent boot module. Set the window for displaying text information.
0037        Displaying sign-on message, CPU information, setup key message, and any OEM-specific information.
4538        PCI devices: IPL device initialization.
5538        PCI devices: General device initialization.
8600
83. uses Advanced Configuration and Power Interface (ACPI) enabled operating systems to perform an orderly shutdown of the operating system. Servers not running ACPI-enabled operating systems will shut down to standby power mode immediately.
- Emergency shutdown: Use a ballpoint pen or other stylus to press and hold the Power button for four seconds to force main power off and enter standby power mode.

When main power is off, the Power/OK LED on the front panel will begin flashing, indicating that the server is in standby power mode.

Caution: When you use the Power button to enter standby power mode, power is still directed to the graphics-redirect and service processor (GRASP) board and power supply fans, indicated when the Power/OK LED is flashing. To completely power off the server, you must disconnect the AC power cords from the back panel of the server.

FIGURE 1-2 Sun Fire X4100/X4100 M2 Server Front Panel
Callouts: Power/OK LED; Power button; Serial number sticker on bezel.

FIGURE 1-3 Sun Fire X4200/X4200 M2 Server Front Panel
Callouts: Power/OK LED; Power button; Serial number sticker on bezel.

2. Remove the server covers, as required. For instructions on removing system covers, refer to the Sun Fire X4100/X4100 M2 and Sun Fire X4200/X4200 M2 Servers Service Manual, 819-1157.
3. Inspect the internal status indi
84. vident, you can try viewing the power-on self-test (POST) messages and BIOS event logs during system startup. Continue with Viewing BIOS Event Logs on page 23.

Troubleshooting DIMM Problems
Use this section to troubleshoot problems with memory modules, or DIMMs.

Note: For information on Sun's DIMM replacement policy for x64 servers, contact your Sun Service representative.

How DIMM Errors Are Handled By the System

Uncorrectable DIMM Errors
For all operating systems (OS), the behavior is the same:
- When an uncorrectable (UC) error happens, the memory controller causes an immediate reboot of the system.
- During reboot, the BIOS checks the NorthBridge memory controller's Machine Check registers and finds that the previous reboot was due to an Uncorrectable ECC Error (PERR/SERR also). It then reports this in POST, after the memtest stage:
  A Hypertransport Sync Flood occurred on last boot
- Memory reports this event in the Service Processor's System Event Log (SEL) as follows:

ipmitool -H 10.6.77.249 -U root -P changeme -I lanplus sel list
F000 | 02/16/2006 | 03:32:38 | OEM 0x12
100 | OEM record e0 | 00000000040 0c0200200000a2
200 | OEM record e0 | 01000000040000000000000000
300 | 02/16/2006 | 03:32:50 | Memory | Uncorrectable ECC | CPU 1 DIMM 0
400 | 02/16/2006 | 03:32:50 | Memory | Memory Device Disabled | CPU 1 DIMM 0
500 | 02/16/2006 | 03:32:55 | System Firmware Progress | Motherboard initialization
85. want to view by selecting it from the Log file window. The content of the selected log file is displayed in the window.
c. With the three lower buttons, you can do the following actions:
- Print the log file: A dialog box appears for you to specify your printer options and printer name.
- Delete the log file: The file remains displayed, but will be gone the next time you try to display it.
- Close the Log file window: The window is dismissed.

Note: If you want to save the log files, you must save them to another networked system or a removable media device. When you use the Bootable Diagnostics CD, the server boots from the CD. Therefore, the test log files are not on the server's hard disk drive, and they will be deleted when you power cycle the server.

APPENDIX A
BIOS Event Logs and POST Codes

This appendix contains information about BIOS event logs, power-on self-test (POST), and console redirection.

Note: This chapter applies to all Sun Fire X4100/X4100 M2 and X4200/X4200 M2 servers, unless otherwise noted.

Viewing BIOS Event Logs
Use this procedure to view the BIOS event log and the BMC system event log.

To turn on main power mode (all components powered on), use a ballpoint pen or other stylus to press and release the Power button on the server front panel. See FIGURE 1-1 or FIGURE 1-2. When main
86. y BIOS just halts without logging. For some other POST failures, subsequent to memory and SP initialization, the BIOS logs a message to the SP's SEL.

TABLE E-1 Hardware Error Handling Summary
Columns: Error; Description; Handling; Logged (DMI Log or SP SEL); Fatal?

Error: Single-bit DRAM ECC error
Description: With ECC enabled in the BIOS Setup, the CPU detects and corrects a single-bit error on the DIMM interface.
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts and is done by the BIOS SMI handler. The BIOS SMI handler starts logging each detected error and stops logging when the limit for the same error is reached. The BIOS's polling can be disabled through a software interface.
Logged: SP SEL
Fatal?: Normal operation

Error: Single four-bit DRAM error
Description: With CHIP-KILL enabled in the BIOS Setup, the CPU detects and corrects for the failure of a four-bit-wide DRAM on the DIMM interface.
Handling: The CPU corrects the error in hardware. No interrupt or machine check is generated by the hardware. The polling is triggered every half second by SMI timer interrupts and is done by the BIOS SMI handler. The BIOS SMI handler starts logging each detected error and stops logging when the limit for the same error is reached. The BIOS's polling can be disabled via a software interface.
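The "log until the limit for the same error is reached" behavior described for the SMI handler can be sketched in Python as follows. This is an illustration of the idea only: the class name, the error key format, and the limit of 3 are hypothetical, since the text does not state the actual limit or how the BIOS identifies repeated errors.

```python
from collections import Counter


class LimitedErrorLog:
    """Log each distinct error until a per-error limit is reached (hypothetical model)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()
        self.entries = []

    def record(self, error_key):
        """Count the error; append it to the log only while under the limit."""
        self.counts[error_key] += 1
        if self.counts[error_key] <= self.limit:
            self.entries.append(error_key)
            return True
        return False  # still counted, but no longer logged


if __name__ == "__main__":
    log = LimitedErrorLog(limit=3)  # the limit value is an assumption
    for _ in range(5):
        log.record("CPU 0 DIMM 0 single-bit ECC")
    print(len(log.entries), log.counts["CPU 0 DIMM 0 single-bit ECC"])
```

Capping the entries per error keeps a single noisy DIMM from filling the event log while still recording that the error occurred.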
87. TABLE E-1 Hardware Error Handling Summary
Columns: Error; Description; Handling; Logged (DMI Log or SP SEL); Fatal?

Error: Multiple fan failure
Description: Fan failure is detected by reading tach signals.
Handling: The Front Fan Fault, Service Action Required, and individual fan module LEDs are lit.
Logged: SP SEL
Fatal?: Fatal

Error: Single power supply failure
Description: When any of the AC/DC PS_VIN_GOOD or PS_PWR_OK signals are deasserted.
Handling: Service Action Required and Power Supply/Rear Fan Tray Fault LEDs are lit.
Logged: SP SEL
Fatal?: Non-fatal

Error: DC/DC power converter failure
Description: Any POWER_GOOD signal is deasserted from the DC/DC converters.
Handling: The Service Action Required LED is lit, the system is powered down to standby power mode, and the Power LED enters standby blink state.
Logged: SP SEL
Fatal?: Fatal

Error: Voltage above/below threshold
Description: The SP monitors system voltages and detects voltage above or below a given threshold.
Handling: The Service Action Required LED and Power Supply/Rear Fan Tray Fault LED blink.
Logged: SP SEL
Fatal?: Fatal

Error: High temperature
Description: The SP monitors CPU and system temperatures and detects temperature above a given threshold.
Handling: The Service Action Required LED and System Overheat Fault LED blink. The motherboard is shut down above the specified critical level.
Logged: SP SEL
Fatal?: Fatal

Error: Processor thermal trip
Description: The CPU drives the THERMTRIP_L signal upon detecting an overtemp condition.
Handling: CPLD shuts down power to the CPU. The Service Action Required LED and System Overheat Fault LED blink.
Logged: SP SEL
Fatal?: Fatal

Error: Boot device