Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide

Contents

1. FIGURE 3-2 Rear Panel LEDs. Callouts: Power Supply Status LEDs (redundant power supplies shown), NIC1 Network Activity LED, NIC1 Network Speed LED, System Status LED, ID LED, POST LEDs (4; LEDs are on the main board, visible through the rear of the chassis).
Chapter 3 Troubleshooting the Server Using Built-In Tools 3-7
TABLE 3-4 Rear Panel LEDs
LED | Color
Network Connection/Network Activity | Green
Network Speed | Amber/Green
POST LEDs (four) | Multicolor (Red/Green/Amber)
System ID | Blue
System Status/Fault | Green/Amber
Power Supply | Green/Amber
Function, by LED:
Network Connection/Network Activity: This LED is on the left side of each NIC connector. Green: valid network connection. Blinking: transmit or receive activity.
Network Speed: This LED is on the right side of the NIC connector. Off: 10 Mbps operation. Green: 100 Mbps operation. Amber: 1000 Mbps operation.
POST LEDs: To help diagnose power-on self test (POST) failures, a set of four bi-color diagnostic LEDs is located on the back edge of the server Main Board. These LEDs are visible through holes in the rear panel. Each of the four LEDs can have one of four states: Off, Green, Red, or Amber. For detailed information on these LEDs, see POST Progress Code LED Indicators on page 3-22.
System ID: This LED is located on the Main Board and is visibl
2. FIGURE 4-1 Main Board Jumper Locations. Jumper labels shown: RCVR BOOT, CLR PSWD, CLR CMOS, BMC BB WE. Position labels shown: Normal, Recover, Clear, Write En.
4-2 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide, November 2003
TABLE 4-1 Jumper Function Summary
Designator | Jumper Name | Action at System Reset
J5A2 | RJ-45 Serial COM2 Port | Configures either a DSR or a DCD signal to the connector. See Configuration Rear Panel RJ-45 Serial COM2 Connector in Chapter 2 of the Sun Fire V60x and Sun Fire V65x Server User Guide and Setting the Serial COM2 Port Jumper on page 4-4 of this document.
CLR CMOS | Clear CMOS | If these pins are jumpered, the CMOS settings are cleared. These pins should not be jumpered for normal operation.
CLR PSWD | Clear Password | If these pins are jumpered, the password is cleared. These pins sho
3. FIGURE 3-4 Location of Sun Fire V60x and Sun Fire V65x Servers Rear Panel Power Supply Status LEDs. Callouts: Power Supply Status LED (Single Power Supply), Power Supply Status LEDs (Redundant Power Supplies).
The rear panel power supply status LED has the states indicated in Table 3-6.
TABLE 3-6 Power Supply Status LED States
Power Supply LED State | Power Supply Condition
OFF | No AC power present to power supply
BLINKING GREEN | AC power present, but only the standby outputs are on
GREEN | Power supply DC outputs are on and OK
BLINKING AMBER | PSAlert signal asserted, power supply on
AMBER | Power supply shutdown due to over current, over temperature, over voltage, or undervoltage
AMBER or OFF | Power supply
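For scripted triage notes, the states in Table 3-6 can be captured as a simple lookup. The following Python sketch is illustrative only: the names `PSU_LED_STATES` and `describe_psu_led` are assumptions made for this example, not part of any Sun utility, and the final row of the table (AMBER or OFF) is left out because its description is cut off in the excerpt.

```python
# Rear-panel power supply LED states, transcribed from Table 3-6.
# The truncated "AMBER or OFF" row is intentionally omitted.
PSU_LED_STATES = {
    "OFF": "No AC power present to power supply",
    "BLINKING GREEN": "AC power present, but only the standby outputs are on",
    "GREEN": "Power supply DC outputs are on and OK",
    "BLINKING AMBER": "PSAlert signal asserted, power supply on",
    "AMBER": ("Power supply shutdown due to over current, over temperature, "
              "over voltage, or undervoltage"),
}


def describe_psu_led(state: str) -> str:
    """Return the Table 3-6 condition for an observed LED state.

    The input is normalized (case and repeated spaces) before lookup.
    """
    key = " ".join(state.upper().split())
    return PSU_LED_STATES.get(key, "Unknown LED state; consult Table 3-6")


if __name__ == "__main__":
    for observed in ("green", "Blinking  Amber", "purple"):
        print(f"{observed!r}: {describe_psu_led(observed)}")
```

A dictionary keeps the table and the code in one-to-one correspondence, so a row can be checked against the guide at a glance.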
4. 5.5.8.1 Sun Fire V60x Server Air Baffle Removal
Follow the steps in this section to remove the air baffle.
1. Before removing the cover to work inside the system, observe the safety guidelines mentioned earlier.
2. Remove the chassis cover.
3. Gently lift the air baffle until pin C is free of the board mounting hole.
4. Remove the baffle from the chassis.
FIGURE 5-29 Removing the Air Baffle
5.5.8.2 Sun Fire V60x Server Air Baffle Installation
Follow these steps to install the air baffle.
1. Ensure the flex cable, auxiliary power cable, USB ribbon cable, and SCSI cables are routed under where you will be installing the air baffle.
FIGURE 5-30 Installing the Air Baffle
2. Aligning pin C with the board's mounting hole, position the air baffle over the white server board power connector.
3. Lower the baffle into position and press it down against the backplane board.
4. Ensure tab A aligns with the edge of the power supply and tab B aligns with the edge of the fan module.
Chapter 5 Maintaining the Server 5-39
5.5.8.3 Sun Fire V65x Server Air Baffle Removal
To remove the air baffle, follow these steps.
1. Remove the blue thumbscrew holding the air baffle to the backplane board.
2. Lift the air baffle straight up, moving the SCSI cable out of the way.
FIGURE 5-31 Removing the Air Baffle
5.5.8.4 Sun Fire V65x
5. FIGURE 5-19 Installing the Heatsink Retention Clip Details. Figure annotations: (A) Tab at inside of retention clip engages slot on heatsink base during installation. Note that the center dogleg slot in the clip provides room for side-to-side motion while engaging the retention clip slots located at each side. Side Plastic Tab. NOTE: For ease of installation, BOTH retention clips should be installed simultaneously.
Warning: Incorrect use of the tool can cause the tool to slip from the retention clip and strike the server board, possibly causing severe damage to the board or board components. In addition, if too much force is used, you may bend the heatsink retention clip to a point where it may be difficult to replace it without bending it back to its original position.
Secure each end of the retention clip to the tabs in the processor retainer by aligning the clip holes over the tabs and pushing down:
a. With the tool in the vertical position, firmly grasp it and insert the middle prong of the tool securely into the hole at the center of the retention clip.
b. Slowly and carefully push the tool downward, making sure the center prong of the tool stays in the retention clip hole.
c. As you continue to exert downward pressure, move the top of the handle slightly in a direction away from the heatsink so that the clip is pushed away from the retainer and the hole in the center of the clip is aligned over the retainer tab
6. Install the processor retention mechanisms using the eight screws you removed earlier, along with the processor(s), heatsink, and DIMMs that you wish to use with the new board. If you only have one processor, install the processor air dam in the outer processor location.
Install the fan module and connect the fan cables to the server board.
If you are using the DIMM fan assembly from the old main board, install the DIMM fan assembly and connect the DIMM power cable to the DIMM fan power connector on the fan module.
Rethread the USB ribbon cable through the clips on the top of the fan module and connect the USB cable to the USB connector on the server board.
Install the cables between the new server board and the other system components, including the power supply.
Install the air baffle.
With a screw, install the blue plastic retention clip that holds down the flex cable onto the server board.
Install the processor air duct.
Install both PCI riser board assemblies.
Replace the chassis cover if you have no additional work to do inside the chassis.
Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See Run Platform Confidence Test (PCT) on page 4-31.
5.6.2 5.6.2.1 Cable Kit
Caution: The procedure below is for the attention of qualified service engineers only. Bef
7. Note: The Processor Retest feature will return to its default Disabled condition after this cycle is complete.
Power LED Does Not Light
Check the following:
- Is the system operating normally? If so, the power LED is probably defective or the cable from the front panel to the server board is loose.
- Are there other problems with the system? If so, check the items listed under System Cooling Fans Do Not Rotate Properly on page 2-12.
If all items are correct and problems persist, contact your service representative or authorized dealer for help.
Chapter 2 Troubleshooting Specific Problems 2-9
2.2.3 Video Problems
This section gives help on how to isolate and solve video problems.
2.2.3.1 No Video Appears on the Screen
Check the following:
The server board accommodates two processors. If only one processor is installed, it must be placed in the CPU 1 socket. The system will not boot if only one processor is installed and it is in the CPU 2 socket.
Are there any beeps coming from the board, and is the floppy drive being accessed? If so, your system may have been put in the BIOS recovery mode. This mode is used to reflash the BIOS in the event it gets corrupted. To enter this mode, the RCVR BOOT jumper located along the edge of the board must be set on the two pins nearest the front of the server. For normal operation, this jumper must be set on the two pins nearest the back of the server
8. 4.12.3 Service Partition
This menu item (see Figure 4-32) allows you to find, create, format, or remove a service partition on the hard disk of the Sun Fire V60x and Sun Fire V65x servers.
Caution: If you remove the service partition, it is possible that you may have to reformat the hard disk to create it again. Reformatting the disk removes all partitions and destroys all data on the disk. If you are just updating the service partition, you should reformat the service partition and install the updated software using option 4 shown in Figure 4-34.
You can create service partitions on any disk you specify, as long as a service partition or any other partition does not already exist. There is only one service partition allowed. Some operating systems automatically create the service partition when they install.
If you want to reformat the service partition and copy the CD contents to the service partition, select Service Partition > Run Service Partition Administrator and use option 4, Format service partition and install software, from the resulting menu (see Figure 4-34). This updates the service partition only.
If the Sun Fire V60x and Sun Fire V65x servers do not have a service partition installed for some reason (for example, if you are running Red Hat Linux), the Service Partition Administrator tool on the CD will not be able to create a service partition. However, from Linux you can cre
9. Caution: Before touching or replacing any component inside the server, disconnect all external cables and follow the instructions in Safety: Before You Remove the Cover on page 5-2 and Removing and Replacing the Cover on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
The main board supports DDR-266 compliant registered ECC DIMMs operating at 266 MHz. Only tested and qualified DIMMs are supported on the main board. Note that all DIMMs are supported by design, but only fully tested DIMMs are supported.
The minimum supported DIMM size is 128 MB. Therefore, the minimum main memory configuration is 2 x 128 MB, or 256 MB. The largest size DIMM supported is a 2 GB stacked registered DDR-266 ECC DIMM based on 512 megabit technology.
The memory system on the main board has the following features:
- The maximum memory capacity is 12 GB on the Sun Fire V65x server and 6 GB on the Sun Fire V60x server.
- The minimum memory capacity is 256 MB.
- ECC single-bit errors are corrected and multiple-bit errors are detected.
- Single-bit error correction: If a single-bit error is detected, the ECC logic generates a new, recovered 64-bit QWord with a pattern that corresponds to the originally received 8-bit ECC parity code. The corrected data is returned to the requestor (the processor or PCI master).
- Multiple-bit error detection: Additional errors within the same QWord c
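The size limits quoted above lend themselves to a quick sanity check before ordering or installing memory. The Python sketch below is a hedged illustration: the function name `check_memory_config`, the model keys, and the message wording are assumptions for this example. It checks only the numeric limits stated in the excerpt (128 MB to 2 GB per DIMM, 256 MB minimum total, 6 GB or 12 GB maximum total) and does not know which DIMMs are tested and qualified.

```python
# Numeric limits taken from the excerpt above; everything else is illustrative.
MIN_DIMM_MB = 128
MAX_DIMM_MB = 2048
MIN_TOTAL_MB = 256
MAX_TOTAL_MB = {"V60x": 6 * 1024, "V65x": 12 * 1024}


def check_memory_config(model, dimm_sizes_mb):
    """Return a list of problems; an empty list means the stated limits are met."""
    if model not in MAX_TOTAL_MB:
        return [f"unknown model: {model}"]

    problems = []
    for size in dimm_sizes_mb:
        if size < MIN_DIMM_MB:
            problems.append(f"{size} MB DIMM is below the {MIN_DIMM_MB} MB minimum")
        elif size > MAX_DIMM_MB:
            problems.append(f"{size} MB DIMM is above the {MAX_DIMM_MB} MB maximum")

    total = sum(dimm_sizes_mb)
    if total < MIN_TOTAL_MB:
        problems.append(f"total {total} MB is below the {MIN_TOTAL_MB} MB minimum")
    if total > MAX_TOTAL_MB[model]:
        problems.append(
            f"total {total} MB exceeds the {MAX_TOTAL_MB[model]} MB maximum for {model}"
        )
    return problems


if __name__ == "__main__":
    print(check_memory_config("V60x", [128, 128]))   # minimum configuration
    print(check_memory_config("V60x", [2048] * 4))   # exceeds the V60x total
```

Returning a list of messages rather than a single boolean makes it possible to report every violated limit at once.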
10. 3. If the configuration is correct, press Enter to continue. Several entries are displayed and scroll past on the screen, showing the test progress (see Figure 4-25).
Chapter 4 Powering On and Configuring the Server 4-35
FIGURE 4-25 Platform Confidence Quick Test Progress (screen capture; each test is listed with a timestamp and then reported as PASSED. Tests legible in the capture: CPU_TYPE, FLOATING_POINT_UNIT, CLOCK_SPEED, MMX_INSTRUCTIONS, CPU MMX2_INSTRUCTIONS, HARD_DISK RESET, CACHE FIND_CACHE_TYPE, CACHE PAGE_FAULT, CACHE RANDOM_PATTERNS, CACHE ADDRESS_PATTERNS)
When the testing is done, the results are summarized (see Figure 4-26).
Test Result Summary (screen capture): Pass Count 1. FRU CPU: PASSED (CPU Module: PASSED). FRU BASEBOARD: PASSED (Power On Self Test: PASSED; CACHE Controller and Memory). FRU MEMORY Controller DIMM: PASSED (MEMORY Controller DIMM: PASSED; Extended MEMORY DIMM). FRU HARD DISK DRIVES: PASSED (PASSED Ha
11. TABLE 3-14 POST Progress LED Code Table (Port 80h Codes) (Continued)
Diagnostic LED Decoder: G = green, R = red, A = amber
POST Code | LEDs (MSB to LSB) | Description
40h | Off R Off Off | Calculate CPU speed
42h | Off R G Off | Init interrupt vectors: interrupt vector initialization is done
44h | Off A Off Off | Enable USB controller in chipset
46h | Off A G Off | Initialize SMM handler. Initialize USB emulation
48h | G R Off Off | Validate NVRAM areas. Restore from backup if corrupted
4Ah | G R G Off | Load defaults in CMOS RAM if bad checksum or CMOS clear jumper is detected
4Ch | G A Off Off | Validate date and time in RTC
4Eh | G A G Off | Determine number of microcode patches present
50h | Off R Off R | Load microcode to all CPUs
52h | Off R G R | Scan SMBIOS GPNV areas
54h | Off A Off R | Early extended memory tests
56h | Off A G R | Disable DMA
58h | G R Off R | Disable video controller
5Ah | G R G R | 8254 timer test on channel 2
5Ch | G A Off R | Enable 8042. Enable timer and keyboard IRQs. Set video mode: initialization before setting the video mode is complete. Configuring the monochrome mode and color mode settings next
5Eh | G A G R | Initialize PCI devices and motherboard devices. Pass control to video BIOS. Start serial console redirection
60h | Off R R Off | Initialize memory test parameters
62h | Off R A Off | Initialize AMI display manager module. Initialize support code for headless system if no video controlle
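When reading the LEDs at the back of a stalled server, it can be quicker to compute the port 80h code than to scan the table. The rows above appear to follow a regular pattern: reading the four LEDs from MSB to LSB, a red LED marks a set bit in the upper nibble of the code, a green LED marks a set bit in the lower nibble, and amber marks both. This rule is inferred from the table rows in this excerpt, not quoted from the guide, and the function name `leds_to_post_code` is an assumption for this Python sketch.

```python
def leds_to_post_code(leds):
    """Convert four LED states (MSB first) into a port 80h POST code.

    Each state is one of "Off", "R", "G", "A" (case-insensitive).
    Inferred rule: red = upper-nibble bit, green = lower-nibble bit,
    amber = both bits set.
    """
    if len(leds) != 4:
        raise ValueError("expected exactly four LED states")

    upper = 0  # upper nibble, built from the red component of each LED
    lower = 0  # lower nibble, built from the green component of each LED
    for state in leds:
        s = state.strip().upper()
        if s not in ("OFF", "R", "G", "A"):
            raise ValueError(f"unknown LED state: {state!r}")
        upper = (upper << 1) | int(s in ("R", "A"))
        lower = (lower << 1) | int(s in ("G", "A"))
    return (upper << 4) | lower


if __name__ == "__main__":
    observed = ["G", "A", "G", "R"]
    print(f"{leds_to_post_code(observed):02X}h")
```

The computed code still needs to be looked up in the table for its description; the sketch only replaces the pattern-matching step.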
12. FIGURE 5-27 Replacing the Power Supply
5.5.7 Hard Disk Drives
Caution: Not all SCSI hard disk drives (HDD) are supported by the server. Unsupported drives will not mate mechanically with the connector on the inside of the drive bay. All drives must be LVDS SCA type (80-pin connector). The server does not support internal single-ended (SE) drives. Unless an approved RAID card is installed in the server, the hard drives cannot be hot swapped. The use of unauthorized HDDs may damage the system and void the warranty. Only Sun-certified drives should be used. See Table 1-3 and Table 1-4 in the Sun Fire V60x and Sun Fire V65x Server User Guide for a list of approved hard disk drives.
Follow these steps to replace a hard disk drive while referring to Figure 5-28.
1. Before removing the cover to work inside the system, observe the safety guidelines mentioned earlier.
2. Remove the bezel from the front of the chassis (see panel 1 of Figure 5-28).
3. As shown in panel 2 of Figure 5-28, push the green release tab in the retention lever and pull the HDD retention lever toward you until the tab end (left end) of the lever is free of the housing slot.
4. Pull the HDD assembly forward and out of the drive bay.
5. Remove the new HDD assembly (hard disk drive mounted on a carrier) from its wrapper and place it
13. General Board and Feature Issues on page 2-21
2.1 Preparing the System for Diagnostic Testing
Caution: Turn off devices before disconnecting cables. Before disconnecting any peripheral cables from the system, turn off the system and any external peripheral devices. Failure to do so can cause permanent damage to the system and/or the peripheral devices.
Turn off the system and all external peripheral devices. Disconnect all of them from the system, except the keyboard and video monitor.
Make sure the system power cord is plugged into a properly grounded AC outlet.
Make sure your video display monitor and keyboard are correctly connected to the system.
Turn on the video monitor. Set its brightness and contrast controls to at least two thirds of their maximum ranges (see the documentation supplied with your video display monitor).
If the operating system normally loads from the hard disk drive, make sure there is no diskette in drive A. Otherwise, place a diskette containing the operating system files in drive A.
5. Turn on the system. If the power LED does not light, see Power LED Does Not Light on page 2-9.
6. If errors are encountered, power off the system, remove all add-in cards, and turn the power back on.
2.2 Specific Problems and Corrective Actions
This section provides possible solutions for the specific problems listed in Table 2-1.
TABLE 2-1 Index to Problems
Prob
14. FIGURE 5-43 Sun Fire V65x Server Cable Kit Removal
FIGURE 5-44 Sun Fire V65x Server Cable Routing
System Components: Power Supply; SCSI Hard Disk Drives; DVD/CD-ROM/FDD module; Tape Drive (optional); Front Panel Board; Fan Module; SCSI Backplane (shown horizontal for clarity); Server Board
Connections:
To backplane power connector from power supply
To server board primary power connector from power supply
Floppy/FP/IDE flex circuit cable from server board to backplane
SCSI cable from server board to backplane
USB ribbon cable from front panel board to server board
Ribbon cable from front panel board to backplane
Fan module to server board fan connectors (2)
To server board auxiliary signal connector from power supply
To server board auxiliary power connector from power supply
Serial cable from server board to knockout on back of chassis (optional)
15. Run HSC Firmware Update
Run FRU/SDR Update
Run BIOS Update (reboot required)
Reboot to Service Partition
Reboot System
4.11.2.1 Run System Setup Utility
If installed, the Service Partition on the hard disk of the Sun Fire V60x and Sun Fire V65x servers allows you to perform server management, configuration, and validation testing. To bring up the service partition, reboot and press the F4 function key when the first BIOS screen appears.
Note: Any configuration change (CPU, memory, hard disk, add-in PCI cards, and so forth) causes the server to revert to its factory default state, regardless of how the server boot options have been set up using the System Setup Utility (SSU) or the BIOS setup.
Select Run System Setup Utility to run system setup. The System Setup Utility main window appears (see Figure 4-17).
FIGURE 4-17 SSU Main Window (screen capture: System Setup Utility window with File, Preferences, and Help menus and an Available Tasks list)
The System Setup Utility (SSU) allows you to configure the following:
- User Preferences
- Boot devices
- Security
Setting User Preferences
Because the server supports running the SSU over a serial console, all of the menus work in text mode only.
Configuring Boot Devices
The Multiboot Add in MBA
16. S
serial console, configuring an external 4-17
serial console communications settings 4-17
service partition
  create diskettes 4-25
  reboot system 4-44
  reboot to service partition 4-44
  restoring 4-48
  run Baseboard Management Controller (BMC) firmware update 4-43
  run BIOS update 4-44
  run Field Replaceable Unit/Sensor Data Record (FRU/SDR) update 4-44
  run HSC firmware update 4-43
  run platform confidence test 4-31
  run system setup 4-27
  system utilities 4-26
service partition menu 4-24
service partition menu, using the 4-24
shutting down 4-55
Signal
  DCD 4-4
  DSR 4-4
SMBIOS 3-26, 3-28
standby power LED 2-3
system errors 3-1
  beep codes 3-2
  LEDs 3-1
  POST screen messages 3-2
  system utilities 3-2
system power cord 2-1
system utilities
  platform confidence test (PCT) 3-2
  system setup utility (SSU) 3-2
T
technical support, contacting 3-29
tools and supplies 5-2
troubleshooting
  checklists 1-1
  guidelines 1-1
U
USB controller 3-24
using the Service Partition Menu 4-24
V
video display monitor 2-1
W
Watchdog timers 4-19
17. FIGURE 5-40 Sun Fire V60x Server Cable Kit Removal. Callouts: SCSI Cable; Floppy/FP/IDE cable; Backplane Ribbon Cable; Backplane Board; USB Ribbon Cable.
FIGURE 5-41 Sun Fire V60x Server Cable Routing. Callouts: Hex Head Screw; Server Board; Backplane Thumbscrew.
System Components: Power supply; Power distribution board; Hard disk drive; Optional DVD/CD-ROM/FDD module; Front panel board; Fan module; Backplane board; Server board
Connections:
To the backplane Power connector
To the server board Auxiliary Signal connector
Flex circuit cable from the server board FDD/FP/IDE connector to the backplane
SCSI cable from the server board to the backplane
USB ribbon cable from the server board to the front panel board
Front panel ribbon cable from the front panel board to the backplane
From the fan module to the server board fan connector
To the server board Auxiliary Power connector
9. Remove the S
18. The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See Run Platform Confidence Test (PCT) on page 4-31.
5.5.10 PCI Cards
Note: Add-in cards must be replaced while the riser board is removed from the chassis. The server supports 3V-only and Universal PCI cards. It does not support 5V-only cards.
Caution: Before touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers, disconnect all external cables and follow the instructions in Safety: Before You Remove the Cover on page 5-2 and Removing and Replacing the Cover on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
Note: Disconnecting an Ethernet cable from Network 2 may interrupt network connectivity on other network interfaces. Run the following commands to restore connectivity to other connected network interfaces:
    /etc/rc.d/init.d/network stop
    /etc/rc.d/init.d/network start
To replace a PCI card, follow these steps while referring to Figure 5-34.
Before removing the cover to work inside the system, observe the previously stated safety guidelines.
Remove the chassis cover.
Insert your finger in the plastic loop on the PCI riser assembly.
Pull straight up and remove the riser assembly from th
19. To install a new processor, follow these steps:
1. Remove any server items necessary to gain access to the CPU socket where you will install the new CPU and heatsink.
2. Following the instructions packaged with your boxed processor, prepare the new processor for installation.
Caution: You should not allow any surface that has thermal interface material to come in contact with any other surface, as surface contamination may occur.
3. As shown in Figure 5-20, open the socket lever. Open the lever all the way as shown.
FIGURE 5-20 Opening the Socket Lever
4. Align the corner mark on the processor with the mark on the socket.
5. Insert the processor into the socket, as shown in Figure 5-21.
FIGURE 5-21 Inserting the Processor (annotation: Align corner mark on the processor to the socket as shown)
6. Verify that the processor sits flush and level on the socket.
7. Close the socket lever until it locks and secures the processor in the socket.
FIGURE 5-22 Closing the Socket Lever
Caution: Move the socket lever slowly and make sure that it is engaged on the locking tab on the side of the socket.
8. If you have not already done so, apply thermal conducting material to the processor now (see Figure 5-23).
Note: Heatsink styles may Apply thermal
20. b. Choose the Run System Setup Utility menu.
c. Press any key when prompted.
d. Choose the SEL Manager option.
View the SEL Manager listing to determine which faulty DIMM is detected by the BIOS.
Choose the Exit option from the File menu of the SEL Manager.
Exit out of the SSU menu and Diagnostics CD main menu.
Turn off the system.
Open the top cover.
10. Remove the faulty DIMM and replace it with the good DIMM (read the silkscreen on the motherboard for the DIMM position). Refer to the Sun Fire V60x and Sun Fire V65x Server User Guide (817-2023-xx) for information on how to correctly replace the DIMMs.
Note: Make sure to replace only the faulty DIMM, as indicated in the SEL Manager.
11. Replace the top cover.
12. Power on the system.
13. Press the F2 key to select SETUP when the option appears on the screen.
14. In the main page of the SETUP menu, use the arrow keys to select the Advanced Menu.
15. In the Memory Configuration screen, select Memory Retest and select Enabled.
16. Press the F10 key to exit the SETUP menu and save the changes. The system will now boot correctly with no memory errors.
For Systems With Only Two DIMMs
When you turn on the system, the BIOS will issue a sequence of beeps to indicate a memory error detected in POST, and the system will not boot (no video will be displayed on
21. CD: In selecting this form of handshaking, the server is prevented from sending video updates to a modem that is not connected to a remote modem. If this is not selected, video update data being sent to the modem inhibits many modems from answering an incoming call. An Emergency Management Port option utilizing CD should not be used if a modem is not used and CD is not connected.
4.9 Fault Resilient Booting (FRB)
The BIOS and firmware provide a feature to guarantee that the system boots, even if one or more processors fail during POST. The BMC contains two watchdog timers that can be configured to reset the system upon time-out.
4.9.1 FRB3
FRB3 refers to the FRB algorithm that detects whether the BSP is healthy enough to run BIOS at all. The BMC starts the FRB3 timer when the system is powered up or experiences a hard reset. The BIOS stops this timer in the power-on self test (POST) by asserting the FRB3 timer halt signal to the BMC. This requires that the BSP actually runs BIOS code. If the timer is not stopped within five seconds and it expires, the BMC disables the BSP, logs an FRB3 error event, chooses another BSP from the set of non-failed processors, and resets the system. FRB3 provides a check to verify that the selected BSP is not dead on start-up and can actually run code. This process repeats until either the system boots without an FRB3 timeout or all of the remaining pro
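The FRB3 sequence described above can be walked through as a small simulation. This Python sketch is a teaching aid under stated assumptions, not BMC firmware: the function name `frb3_boot`, the use of list order to pick the next BSP, the strict less-than comparison at the five-second boundary, and returning `None` when no candidate is left are all choices made for the example (the excerpt is cut off before it says what happens in that last case).

```python
FRB3_TIMEOUT_S = 5.0  # from the text: the timer must be stopped within five seconds


def frb3_boot(processors, seconds_to_halt):
    """Simulate the FRB3 retry loop.

    processors: candidate processor names; list order is the assumed
        BSP selection order.
    seconds_to_halt: maps a processor name to the time its BIOS code needs
        to assert the FRB3 timer halt signal; a missing entry or None means
        the processor never runs BIOS code.
    Returns (bsp_name_or_None, event_log).
    """
    candidates = list(processors)
    log = []
    while candidates:
        bsp = candidates[0]
        elapsed = seconds_to_halt.get(bsp)
        if elapsed is not None and elapsed < FRB3_TIMEOUT_S:
            log.append(f"{bsp}: BIOS halted the FRB3 timer, boot continues")
            return bsp, log
        # Timer expired: disable this BSP, log the event, reset, try the next one.
        log.append(f"{bsp}: FRB3 timeout, processor disabled, system reset")
        candidates.remove(bsp)
    return None, log  # assumed outcome; not specified in the excerpt


if __name__ == "__main__":
    bsp, events = frb3_boot(["CPU1", "CPU2"], {"CPU1": None, "CPU2": 1.2})
    print("Booted on:", bsp)
    for event in events:
        print(" ", event)
```

In the example run, CPU1 never halts the timer, so it is disabled and the system comes up on CPU2 after one reset.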
22. FRU/SDR and HSC
- Clear the CMOS upon completion. This can be accomplished by moving the clear CMOS jumper, or by holding down the reset button for 4 seconds and, at the end of 4 seconds while holding down the reset button, pressing the power button, then releasing both at the same time.
- Update files can be downloaded from the Sun support web site.
- Download and apply the latest drivers used in your installation. These drivers may include video, network adapter, SCSI, and chipset.
- Check for proper processor installation. Systems with a single processor must have the CPU installed in CPU socket 1. If two processors are installed, the processors must be of the same speed and voltage and within one stepping. Do not attempt to over-clock the processors or other components on this system. Over-clocking is generally not possible and may damage components and void the warranty of your server board and your boxed or tray processor.
- Memory must be of the approved type and be properly seated.
- Verify that all chassis and power supply fans are properly installed and functioning.
- Approved heat sinks must be properly installed on the processors. Do not attempt to run the processors without a heat sink for even a few moments.
- If the system is running slowly or you receive a processor error message, enter BIOS setup and enable processor retest. This test will be ru
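The processor rules in the checklist above (a lone processor goes in socket 1; two processors must match in speed and voltage and be within one stepping) can be expressed as a short check. In this Python sketch the `Cpu` record, the function name `check_cpu_population`, and the representation of a stepping as a plain integer are all assumptions for illustration; real stepping identifiers would need to be mapped to such a number first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cpu:
    speed_mhz: int
    voltage: float
    stepping: int  # assumed numeric stepping index, for illustration only


def check_cpu_population(socket1: Optional[Cpu], socket2: Optional[Cpu]) -> List[str]:
    """Return rule violations for the given socket population (empty if none)."""
    problems = []
    if socket1 is None and socket2 is None:
        problems.append("no processor installed")
    elif socket1 is None:
        # Exactly one processor, and it sits in socket 2.
        problems.append("a single processor must be installed in CPU socket 1")
    elif socket2 is not None:
        # Two processors: compare speed, voltage, and stepping.
        if socket1.speed_mhz != socket2.speed_mhz:
            problems.append("processors must be of the same speed")
        if socket1.voltage != socket2.voltage:
            problems.append("processors must be of the same voltage")
        if abs(socket1.stepping - socket2.stepping) > 1:
            problems.append("processors must be within one stepping")
    return problems


if __name__ == "__main__":
    a = Cpu(speed_mhz=2800, voltage=1.5, stepping=3)
    b = Cpu(speed_mhz=3060, voltage=1.5, stepping=5)
    print(check_cpu_population(None, a))
    print(check_cpu_population(a, b))
```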
23. POST Screen Messages on page 3-16
System Utilities
The following utilities are available to help troubleshoot system errors:
- Platform Confidence Test (PCT): The PCT is used to test major subsystems and analog sensors of the system board.
- System Setup Utility (SSU): The SSU is used to read the System Event Log (SEL).
Platform Confidence Test (PCT)
The PCT consists of up to 31 tests that test the following:
- Processor subsystem
- Memory subsystem
- Input/output subsystem
- Management subsystem
The PCT supplies three testing levels:
- Quick Test: This runs a subset of available tests and identifies processor, memory, cache, and hard drives.
- Comprehensive Tests: This runs Quick Tests and identifies keyboard, mouse, ports, and controllers.
- Comprehensive Tests with Looping: This runs Comprehensive Tests, continually loops through tests until stopped, and enables identification of intermittently failing FRUs.
For information on how to run the PCT, see Run Platform Confidence Test (PCT) on page 4-31.
System Setup Utility (SSU)
The SSU is intended to help with troubleshooting system errors and can be used to read the System Event Log (SEL).
For information on how to run the SSU, see Using the Service Partition Menu on page 4-24.
3.2 LEDs and Pushbuttons
This section describes the LEDs and pushbuttons on the
24. Port 80h Codes
Diagnostic LED Decoder: G = green, R = red, A = amber
POST Code | LEDs (MSB to LSB) | Description
20h | Off Off R Off | Uncompress various BIOS modules
22h | Off Off A Off | Verify password checksum
24h | Off G R Off | Verify CMOS checksum
26h | Off G A Off | Read microcode updates from BIOS ROM
28h | G Off R Off | Initializing the processors. Set up processor registers. Select least featured processor as the BSP
2Ah | G Off A Off | Go to Big Real mode
2Ch | G G R Off | Decompress INT13 module
2Eh | G G A Off | Keyboard controller test: the keyboard controller input buffer is free. Next, the BAT command will be issued to the keyboard controller
30h | Off Off R R | Swap keyboard and mouse ports, if needed
32h | Off Off A R | Write command byte 8042: the initialization after the keyboard controller BAT command test is done. The keyboard command byte will be written next
34h | Off G R R | Keyboard Init: the keyboard controller command byte is written. Next, the pin 23 and 24 blocking and unblocking commands will be issued
36h | Off G A R | Disable and initialize the 8259 programmable interrupt controller
38h | G Off R R | Detect configuration mode, such as CMOS clear
3Ah | G Off A R | Chipset initialization before CMOS initialization
3Ch | G G R R | Init system timer: the 8254 timer test is over. Starting the legacy memory refresh test next
3Eh | G G A R | Check refresh toggle: the memory refresh line is toggling. Checking the 15 second on/off time next
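To predict what the rear-panel LEDs should show for a given checkpoint, the table rows can be regenerated from the code itself. The rows above are consistent with a simple encoding: bit by bit from MSB to LSB, the upper nibble of the code drives the red element of each LED and the lower nibble drives the green element, with amber where both are lit. That encoding is an inference from the rows in this excerpt, and `post_code_to_leds` is a name chosen for this Python sketch.

```python
def post_code_to_leds(code):
    """Return the four LED states (MSB first) for a port 80h POST code.

    Inferred encoding: upper-nibble bit -> red, lower-nibble bit -> green,
    both -> amber, neither -> off.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("POST code must fit in one byte")

    upper, lower = code >> 4, code & 0x0F
    names = {(0, 0): "Off", (1, 0): "R", (0, 1): "G", (1, 1): "A"}
    leds = []
    for bit in (3, 2, 1, 0):  # MSB LED first
        red = (upper >> bit) & 1
        green = (lower >> bit) & 1
        leds.append(names[(red, green)])
    return leds


if __name__ == "__main__":
    for code in (0x20, 0x2E, 0x3C):
        print(f"{code:02X}h", " ".join(post_code_to_leds(code)))
```

Running it over every code in the table is a quick way to spot a transcription error in the LED columns.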
25. To remove the heatsink and processor, follow these steps while referring to Figure 5-15.
As shown in panel 1 of Figure 5-15, remove the SCSI cable clipped to the left side of the processor air duct, then remove the processor air duct by sliding it slightly back and then lifting it from the front edge.
Remove the riser card assembly for better access to the CPU heatsink, if desired.
Determine the location of the processor you are going to remove (see Figure 5-15). CPU 2 is closest to the outside of the server and CPU 1 is toward the inside.
As shown in panel 2 of Figure 5-15, insert the heatsink retention clip removal tool into the hole in the end of one of the retention clips, and then:
a. Use the tool to push the clip down.
b. Move the top of the tool toward the heatsink to release the clip from the tab on the heatsink retainer.
c. Release the pressure on the tool and allow the clip to come up so it clears the tab on the retainer.
d. Release the other end of the clip and slide the clip in a horizontal direction to free it from the middle tab.
Remove both retention clips and the heatsink, as shown in panel 3.
5. As shown in panel 4:
a. Grasp the end of the socket lever and raise it to disengage the processor pins.
b. Lift the processor straight up out of the socket.
Caution: pasted side of the processor or heatsink on any
27. - Is the BIOS set to allow the CD-ROM to be the first bootable device?
- Check cable connections.
- Verify that the CD is bootable in another known-good CD-ROM drive, especially if the CD is a copy.

2.2.10 Memory Configuration Errors
If you have added, removed, or replaced a DIMM and you encounter memory configuration errors during power-on self-test (POST), do the following to clear the errors.
Note: If the errors you see are DIMM population errors (8508, 8509, or 850A), you must reorder the DIMMs. See "Memory DIMM Population Order" on page 2-15 for more information.
1. Reset or turn on the system.
2. Press the F2 key to select SETUP as soon as the option appears on the screen.
3. Once in the main page of the SETUP menu, use the arrow keys to select the Advanced menu.
4. In the Advanced screen, select the Memory Configuration option, then press Enter.
5. In the Memory Configuration screen, select Memory Retest, then select Enabled.
6. Press the F10 function key to exit the SETUP menu and save changes.
Note: The Memory Retest feature returns to its default Disabled condition after the memory test cycle is complete.

2.2.11 Memory DIMM Population Order
If you install modules with mixed memory sizes in your Sun Fire V60x or V65x server, you must install the single-wide memory modules (256 MB or 512 MB) in the lower-numbered slots and the dou
28. 3.2 LEDs and Pushbuttons 3-3
3.2.1 Front Panel LEDs and Pushbuttons 3-4
3.2.1.1 Front Panel LEDs 3-5
3.2.1.2 Front Panel Pushbuttons 3-6
3.2.2 Rear Panel LEDs 3-7
3.2.3 Front Panel System Status LED 3-9
3.2.4 Rear Panel Power Supply Status LED 3-11
3.2.5 Server Main Board Fault LEDs 3-13
3.2.6 System ID LEDs 3-15
3.3 Power-On Self Test (POST) 3-15
3.3.1 POST Screen Messages 3-16
3.3.2 POST Error Beep Codes 3-19
3.3.2.1 BIOS Recovery Beep Codes 3-21
3.3.3 POST Progress Code LED Indicators 3-22
3.4 Contacting Technical Support 3-29
4 Powering On and Configuring the Server 4-1
4.1 Jumper Locations 4-2
4.2 Setting the Serial COM2 Port Jumper 4-4
4.3 Powering On 4-5
4.4 Clearing CMOS 4-6
4.4.1 Using the Front Panel 4-6
4.4.2 Using the Clear CMOS Jumper 4-7
4.5 Booting Up 4-8
4.5.1 Boot Options 4-9
4.5.1.1 BIOS Setup Utility <F2> 4-9
4.5.1.2 Service Partition <F4> 4-12
4.5.1.3 Network Boot <F12> 4-13
4.5.1.4 Choose Boot Device <ESC> 4-13
4.5.2 Other Bootup Items 4-14
4.5.2.1 Ethernet Port Delay 4-14
4.5.2.2 USB-Connected External CD-ROM Drives 4-14
4.5.2.3 Booting the Server When an External SCSI Hard Drive is Connected 4-15
4.5.2.4 PS/2 Mouse Misidentification 4-15
4.6 Loading the Operating System 4-16
4.7 Hyper-threading CPU Feature 4-16
4.8 Configuring an External Serial Console 4-17
4.9 Fault
29. 4-1
- Chapter 5, "Maintaining the Server" on page 5-1

Typographic Conventions
The following table describes the typographic conventions used in this book.

TABLE P-1 Typographic Conventions
Typeface or Symbol      Meaning                                       Example
courier font            Names of commands                             Use ls -a to list all files.
                        Names of files                                Edit your .login file.
                        On-screen computer output                     machine_name% You have mail.
italics                 Book titles, new words                        Read Chapter 6 in the User's Guide.
                        Terms to be emphasized                        These are called class options. You must be root to do this.
                        Variables that you replace with a real value  To delete a file, type rm filename.
boldface courier font   What you type                                 machine_name% su

Related Documentation
These documents contain information related to the tasks described in this book:
- Sun Fire V60x and Sun Fire V65x Server User Guide
- Sun Fire V60x Setup Poster
- Sun Fire V65x Setup Poster

Ordering Sun Documents
The SunDocs(SM) program provides more than 250 manuals from Sun Microsystems, Inc. If you are in the United States, Canada, Europe, or Japan, you can purchase documentation sets or individual manuals by using this program. For a list of documents and how to order them, see the catalog section of the SunExpress Internet site at http://store.sun.com.

Shell Prompts in Command Examples
The following table shows the default system p
30. 5 10 PCI Cards 5 47 5 5 11 Battery 5 49 5 5 12 Keyboard Mouse Y Adapter 5 51 5 5 13 Emergency Management Port Cable 5 52 5 5 13 1 Installing the DSR Peripherals Cable 5 52 5 5 13 2 Installing the DCD Modem Cable 5 52 5 6 Field Replaceable Unit FRU Procedures 5 54 5 6 1 Server Main Board 5 54 5 6 1 1 Sun Fire V60x Server Main Board Replacement 5 54 5 6 1 2 Sun Fire V65x Server Main Board Replacement 5 58 5 6 2 Cable Kit 5 61 5 6 2 1 Sun Fire V60x Server Cable Kit Removal 5 61 5 6 2 2 Sun Fire V60x Server Cable Kit Installation 5 64 5 6 2 3 Sun Fire V65x Server Cable Kit Removal 5 66 5 6 2 4 Sun Fire V65x Server Cable Kit Installation 5 69 5 6 3 System FRU 5 71 5 6 3 1 Sun Fire V60x and Sun Fire V65x Servers System FRU Installation 5 71 x Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 Figures FIGURE 1 1 FIGURE 3 1 FIGURE 3 2 FIGURE 3 3 FIGURE 3 4 FIGURE 3 5 FIGURE 3 6 FIGURE 3 7 FIGURE 4 1 FIGURE 4 2 FIGURE 4 3 FIGURE 4 4 FIGURE 4 5 FIGURE 4 6 FIGURE 4 7 FIGURE 4 8 FIGURE 4 9 FIGURE 4 10 FIGURE 4 11 FIGURE 4 12 Main Board Jumper Locations 1 5 Front Panel Pushbuttons and LEDs 3 4 Rear Panel LEDs 3 7 Location of Front Panel System Status LED 3 9 Location of Sun Fire V60x and Sun Fire V65x Servers Rear Panel Power Supply Status LEDs 3 11 Fault and Status LEDs on the Server Board 3 13 Location of Front Panel ID Pushbutton and L
31. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap. The server is certified to function properly only with Sun CPUs. Do not mix CPU steppings and speeds, or processor family types.

5.5.4.1 Safety Precautions
Warning: If the server has been running, any installed processor and heat sink on the processor board(s) will be hot. To avoid the possibility of a burn, be careful when removing or installing server board components that are located near processors.
Caution: The processor must be appropriate. You may damage the server if you install a processor that is inappropriate for your server. Make sure your server can handle a newer, faster processor with its associated thermal and power considerations. If you are adding a second processor to your system, the second processor must be compatible with the first processor (within one stepping, same voltage, and same speed). For exact information about processor interchangeability, contact your customer service representative.
Caution: Pressing the power button does not turn off power to the server board. Disconnect the server board from its power source and from any telecommunications links, networks, or modems before doing any of the procedures described in this section. Failure to do this can result in personal injury or equipment damage. Some circuitry on the server board may continue to operate even though the front panel power b
32. BIOS Setup and selecting that processor to be retested. If a bad processor is removed from the system and is replaced with a new processor, the BMC automatically detects this condition and clears the status flag for that processor during the next boot.
There are three possible states for each processor slot:
- Processor installed (status only, indicates processor has passed BIOS POST).
- Processor failed. The processor may have failed FRB-2 or FRB-3 and it has been disabled.
- Processor not installed (status only, indicates the processor slot has no processor in it).

4.10 Enabling USB Keyboard and Mouse Operation
To use the Diagnostic CD or Service Partition with a USB keyboard and mouse, you must set up the BIOS to recognize the USB devices. USB support for these functions is disabled by default.
Note: If the OS has not yet been installed, you will need to use a PS/2 keyboard to do the initial installation and configuration of the OS.
To enable a USB keyboard and mouse for use with the Service Partition or Diagnostic CD:
1. Install a USB keyboard and mouse onto the server.
2. Press the F2 key to access the BIOS Setup screen.
3. At the Setup screen, select the Advanced menu and the Peripheral configuration option.
4. In the Peripheral configuration screen, select the Legacy USB Support option and change the setting to Au
33. Configure the J5A2 jumper as shown in Figure 5-37.

Installing the DCD Modem Cable
Follow these steps to install the DCD Modem cable:
1. Plug the DCD Modem cable into the rear panel RJ-45 Serial COM2 connector, as shown in Figure 5-37.
2. Configure the J5A2 jumper as shown in Figure 5-37.

[FIGURE 5-37 Installing the EMP Cable. Shows the rear RJ-45 connector, the DSR Peripherals cable, the DCD Modem cable, and the J5A2 jumper block viewed from the front of the server. DSR signal configuration: RJ-45 pin 7 connected to DSR (pin 6 of DB9). DCD signal configuration: RJ-45 pin 7 connected to DCD (pin 1 of DB9).]

5.6 Field Replaceable Unit (FRU) Procedures
This section explains how to replace the FRUs in the Sun Fire V60x and Sun Fire V65x servers.

5.6.1 Server Main Board
Note: The Main Board contains no DIMMs or CPUs and is packaged in an ESD bag with two foam pieces and an ESD wrist strap. Be sure to observe all ESD safety guidelines when handling the board.
Caution: The procedure below is for the attention of qualified service engineers only. Before touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers
34. DIMMs, see Table 1-3 and Table 1-4 in the Sun Fire V60x and Sun Fire V65x Server User Guide.
Caution: Use of unauthorized DIMM modules may damage the server and may void the warranty.
Note: If you see memory configuration errors after adding or replacing DIMMs, see "Memory Configuration Errors" on page 2-15 for instructions on how to correctly order the DIMMs and clear the errors.
Note: When upgrading RAM from 4 GB or less to more than 4 GB, you must run the bigmem kernel if you want to use all of the available memory.

Sun Fire V60x Server DIMM Replacement
To replace DIMMs in a Sun Fire V60x server, follow these steps while referring to Figure 5-5.
1. Before removing the cover to work inside the system, observe the safety guidelines previously stated.
2. Release the DIMM from the connector slot by pressing down on the DIMM module ejector bars at both ends of the connector slot.
3. Lift the DIMM up and away from the connector slot.
4. With the ejector bars in the open position, align the replacement DIMM notch with the connector slot notch and apply even downward pressure on the DIMM until it slides into the connector slot. The ejector bars will snap inward and lock the memory module in place.

[FIGURE 5-5 DIMM Installation]
35. 5.6.2.4 Sun Fire V65x Server Cable Kit Installation
Before removing the cover to work inside the system, observe the safety guidelines previously given. To replace cables, remove the cover and refer to Figure 5-43 through Figure 5-46 when following these steps.
1. Install the flex cable (A) between the backplane connector (B) and the server board connector (C) (see Figure 5-45). Make sure the end marked "P1 Serverboard" plugs into the server board.
   [FIGURE 5-45 Installing the Flex Cable]
2. Install the flex cable retention clip on the SCSI backplane, as shown in Figure 5-46.
   [FIGURE 5-46 Installing the Backplane Retention Clip]
3. Install the screw-in blue plastic retention clip to hold the flex cable connector in place on the server board.
4. Connect both ends of the SCSI cable (Figure 5-44, D), making sure it routes through the air baffle notch next to the power supply.
5. Install the front panel cable (Figure 5-44, F) between the front panel and the SCSI backplane.
6. Install the USB cable (Figure 5-44, E):
   a. Connect the USB cable to the front panel board.
   b. Connect the USB cable to the main board.
   c. Route the USB cable under the black plastic flap and hooks at the top of the fan module to keep it securely in place.
Note: The Comprehensive Test should be run after changing any FRU or CRU, or adding an optional component. See Run Platform Confiden
36. Beeps  Code  LEDs (MSB to LSB)   Description
             Off  Off  G    Off  EDO is not supported
3      03h   Off  Off  G    G    First row memory test failure
3      04h   Off  G    Off  Off  Mismatched DIMMs in a row
3      05h   Off  G    Off  G    Base memory test failure
3      06h   Off  G    G    Off  Failure on decompressing post module
3      07h   Off  G    G    G    Generic memory error
       08h   G    Off  Off  Off
       09h   G    Off  Off  G
       0Ah   G    Off  G    Off
       0Bh   G    Off  G    G
       0Ch   G    G    Off  Off
       0Dh   G    G    Off  G
3      0Eh   G    G    G    Off  SMBUS protocol error
3      0Fh   G    G    G    G    Generic memory error

3.3.2.1 BIOS Recovery Beep Codes
In rare cases when the system BIOS has been corrupted, a BIOS recovery process must be followed to restore system operability. During recovery mode, the video controller is not initialized. One high-pitched beep announces the start of the recovery process. The entire process takes two to four minutes. A successful update ends with two high-pitched beeps. In the event of a failure, two short beeps are generated and a flash code sequence of 0E9h, 0EAh, 0EBh, 0ECh, and 0EFh appears at the Port 80 diagnostic LEDs (see Table 3-12 on page 3-22).

TABLE 3-12 BIOS Recovery Beep Codes
Beep Code            Error Message         Port 80h LED Indicators                                       Description
1                    Recovery Start                                                                      Recovery process started
2                    Recovery boot error   Flashing series of post codes: E9h, EAh, EBh, ECh, EFh        Unable to boot to floppy, ATAPI, or ATAPI CDROM. Recovery process will retry.
Series of long low   Recovery              EEh                                                           Unable to process valid BIOS recovery pitc
37. Quick Test Results Summary 4 36 Platform Confidence Quick Test Sensor Readings first screen 4 37 Platform Confidence Quick Test Sensor Readings second screen 4 37 Sample RESULT LOG 4 39 Platform Confidence Comprehensive Test Progress 4 41 Platform Confidence Comprehensive Test Results 4 42 Boot Complete from the Diagnostics CD 4 46 Boot Complete from the Diagnostics CD 4 48 Service Partition Administration Menu 4 49 Prompt to Begin BIOS Update 4 50 BIOS Update in Progress Prompt 4 50 First BIOS Update Finished 4 51 Second BIOS Update Pass 4 52 Verifying the BIOS Version 4 52 Location of Recovery Boot Jumper 4 53 Removing the Cover 5 3 Sun Fire V60x Server Bezel Replacement 5 5 xii Sun Fire V60x and Sun Fire V65x servers e November 2003 FIGURE 5 3 FIGURE 5 4 FIGURE 5 5 FIGURE 5 6 FIGURE 5 7 FIGURE 5 8 FIGURE 5 9 FIGURE 5 10 FIGURE 5 11 FIGURE 5 12 FIGURE 5 13 FIGURE 5 14 FIGURE 5 15 FIGURE 5 16 FIGURE 5 17 FIGURE 5 18 FIGURE 5 19 FIGURE 5 20 FIGURE 5 21 FIGURE 5 22 FIGURE 5 23 FIGURE 5 24 FIGURE 5 25 FIGURE 5 26 FIGURE 5 27 FIGURE 5 28 FIGURE 5 29 FIGURE 5 30 FIGURE 5 31 FIGURE 5 32 Sun Fire V65x Server Bezel Replacement 5 6 Floppy CD ROM Module Replacement 5 8 DIMM Installation 5 11 DIMM Pair Locations 5 12 DIMM Fan Removal 5 13 Vertical Fan Support Bar Location 5 14 Connecting the DIMM Fan Power Cable 5 15 Sun Fire V60x Server Heatsi
38. Resilient Booting (FRB) 4-19
4.9.1 FRB3 4-20
4.9.2 FRB2 4-20
4.10 Enabling USB Keyboard and Mouse Operation 4-23
4.11 Using the Service Partition Menu 4-24
4.11.1 Create Diskettes 4-25
4.11.2 System Utilities 4-26
4.11.2.1 Run System Setup Utility 4-27
4.11.2.2 Using the SSU to Manage Logs, Records, Hardware, and Events 4-30
4.11.2.3 Run Platform Confidence Test (PCT) 4-31
4.11.2.4 Run Baseboard Management Controller (BMC) Firmware Update 4-43
4.11.2.5 Run HSC Firmware Update 4-43
4.11.2.6 Run Field Replaceable Unit/Sensor Data Record (FRU/SDR) Update 4-44
4.11.2.7 Run BIOS Update (reboot required) 4-44
4.11.2.8 Reboot to Service Partition 4-44
4.11.2.9 Reboot System 4-44
4.12 Using the Sun Diagnostics CD 4-45
4.12.1 Create Diskettes 4-46
4.12.2 Run System Utilities 4-46
4.12.3 Service Partition 4-47
4.12.4 Restoring the Service Partition 4-48
4.13 Updating the Server Configuration 4-50
4.13.1 Using the Diskette to Update the Server BIOS 4-50
4.13.2 Recovering the BIOS 4-53
4.14 Restarting and Shutting Down 4-55
4.14.1 Software Mechanisms 4-55
4.14.1.1 Software Shutdown Commands for Linux 4-55
4.14.1.2 Software Shutdown Commands for Solaris 4-55
4.14.2 Hardware Mechanisms 4-56
5 Maintaining the Server 5-1
5.1 Tools and Supplies Needed 5-2
5.2 Determining a Faulty Component 5-2
5.3 Safety Before You Remove the Cover 5-2
5.4 Removing and Replacing the Cover 5-3
5.5 Customer Replaceable Unit (CRU) Procedu
39. Sun Fire V65x servers, disconnect all external cables and follow the instructions in "Safety Before You Remove the Cover" on page 5-2 and "Removing and Replacing the Cover" on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
The lithium battery on the server board powers the real-time clock (RTC) for up to 10 years in the absence of power. A low battery condition is stored in the System Event Log (SEL). When the battery starts to weaken, it loses voltage, and the server settings stored in CMOS RAM in the RTC (for example, the date and time) may be wrong. Contact your customer service representative or dealer for a list of approved replacement batteries.
Warning: There is a danger of explosion if the battery is incorrectly replaced. Replace only with the same or equivalent type recommended by the equipment manufacturer. Discard used batteries according to the manufacturer's instructions.
To replace the battery:
1. Before proceeding, record your custom BIOS settings.
2. Observe the safety and ESD precautions at the beginning of this chapter.
3. Open the chassis and locate the battery on the main board near the left front corner.
4. Push the upper end of the metal retainer away from the battery so that the battery pops up (see Figure 5-35).
   [FIGURE 5-35 Replacing the Backup Battery]
5. Remove the battery from its soc
40. You are prompted to check all the cables and your server configuration, and are then returned to the Platform Confidence Test main menu.
3. If the configuration is correct, press Enter to continue.
Several entries are displayed and scroll past on the screen, showing the test progress (see Figure 4-30). The information is much more detailed than that displayed for the Quick Test.
[FIGURE 4-30 Platform Confidence Comprehensive Test Progress. Screen output listing per-sensor threshold bounds checks (LNC, LC, UNC, and UC thresholds), analog and discrete sensor checks, and baseboard fan readings in RPM.]
When the testing is done, the results are summarized (see Figure 4-31).
[FIGURE 4-31 Platform Confidence Comprehensive Test Results. Screen shows "Test Result Summary" with a pass count of 1.]
4. Press any key to see the remainin
41. are sent only to the video port. After bootup is finished, the configuration can be changed to send all messages to the serial console (see "Configuring an External Serial Console" on page 4-17).
Note: The USB ports may be disabled until the OS is booted and the USB drivers are installed. A PS/2 keyboard is required if a keyboard is necessary for initial bootup and configuration.

4.5.1 Boot Options
The first bootup screen is shown in Figure 4-6.
[FIGURE 4-6 First BIOS Bootup Screen. Shows the BIOS version and build banner, processor identification, system memory test results, and the prompt "Press <F2> to enter SETUP, <F4> Service Partition, <F12> Network".]
At the bottom of the screen, you are given the option to press the following function keys:
- F2 to enter the BIOS Setup Utility.
- If the service partition is installed, F4 for the Service Partition (a DOS partition allowing setup, configuration, and server testing).
- F12 to boot from the network.

4.5.1.1 BIOS Setup Utility <F2>
Press F2 to enter the BIOS Setup Utility. The main BIOS Setup Utility screen, shown in Figure 4-7, appears.
42. command to initiate an orderly shutdown and reboot of the server.

4.14.2 Hardware Mechanisms
The following hardware mechanisms are available:
- Press the Reset button. The server is immediately forced to restart. However, you may lose data.
- Press the Power button. The server is immediately forced to power down. However, you may lose data.
Caution: These hardware mechanisms are not recommended and should be used only as a last resort.

CHAPTER 5 Maintaining the Server
This chapter describes how to replace components in the Sun Fire V60x and Sun Fire V65x servers after they have been set up. It contains the following sections:
- "Tools and Supplies Needed" on page 5-2
- "Determining a Faulty Component" on page 5-2
- "Safety Before You Remove the Cover" on page 5-2
- "Removing and Replacing the Cover" on page 5-3
- "Customer Replaceable Unit (CRU) Procedures" on page 5-4
- "Field Replaceable Unit (FRU) Procedures" on page 5-54
Note: The procedures in this chapter for servicing field-replaceable faulty components are for the attention of qualified service engineers only. If a Field Replaceable Unit (FRU) needs replacement, contact your local Sun Sales representative, who will put you in contact with the Sun Enterprise Service branch for your area. You can ar
43. correct?
- Are all jumper and switch settings on add-in boards and peripheral devices correct? To check these settings, refer to the manufacturer's documentation that comes with them. If applicable, ensure that there are no conflicts (for example, two add-in boards sharing the same interrupt).
- Are all DIMMs installed correctly?
- Are all peripheral devices installed correctly?
- If the system has a hard disk drive, is it properly formatted or configured?
- Are all device drivers properly installed?
- Are the configuration settings made in BIOS Setup correct?
- Is the operating system properly loaded? Refer to the operating system documentation.
- Did you press the system power on/off switch on the front panel to turn the server on (power-on light should be lit)?
- Is the system power cord properly connected to the system and plugged into a NEMA 5-15R outlet for 100-120 V or a NEMA 6-15R outlet for 200-240 V?
- Is AC power available at the wall outlet?
- Are there any POST LEDs illuminated? If so, check "Power-On Self Test (POST)" on page 3-15.
- Are there any POST beep codes? If so, check "POST Error Beep Codes" on page 3-19.

1.9 Problems With New Application Software
Problems that occur when you run new application software are usually related to the software. Faulty equipment is much less likely, especially if other software runs correctly.
44. damage to the board or board components. In addition, if too much force is used, you may bend the heatsink retention clip to a point where it may be difficult to replace it without bending it back to its original position.
   a. Secure each end of the retention clip to the tabs in the processor retainer by aligning the clip holes over the tabs and pushing down.
   b. With the tool in the vertical position, firmly grasp it and insert the middle prong of the tool securely into the hole at the center of the retention clip.
   c. Slowly and carefully push the tool downward, making sure the center prong of the tool stays in the retention clip hole.
   d. As you continue to exert downward pressure, move the top of the handle slightly in a direction away from the heatsink, so that the clip is pushed away from the retainer and the hole in the center of the clip is aligned over the retainer tab.
   e. Gradually move the top of the tool handle back toward the heatsink in such a manner as to slide the center of the clip over the retainer tab, securing it in place.
Note: The Comprehensive Test should be run after changing any FRU or CRU, or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31.

5.5.6 Power Supply Unit
Caution: The Sun Fire V60x server does not have a redundant power supply. Before replacing the power supply, you must take the server out of serv
45. disconnect all cables and follow the instructions in "Safety Before You Remove the Cover" on page 5-2 and "Removing and Replacing the Cover" on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.

Sun Fire V60x Server Main Board Replacement
Note: Several assemblies must be removed so that the main board can be replaced. Refer to other sections of this chapter for detailed instructions on how to remove and replace these assemblies.
To replace the Sun Fire V60x server Main Board, follow these steps:
1. Open the box containing the replacement board and remove one of the two antistatic pads. You will need this pad in step 17 as an ESD-safe place to put the old server board.
2. Before removing the cover to work inside the system, observe the previously mentioned safety guidelines.
3. Remove the cover and bezel from the chassis.
4. Remove all drives from the drive bays and flex bay.
5. Remove the PCI riser board assemblies.
6. Remove the power supply.
7. Remove the air baffle.
8. Unscrew and remove the blue plastic retention clip that holds down the flex cable onto the server board.
9. At the backplane board, disconnect the ribbon cable from the front panel board.
10. Remove the processor air duct.
11. Disconnect the fan cable from the serv
46. errors. Some of the error messages are preceded by the string "Error" to highlight the fact that the system may be malfunctioning. All POST errors and warnings are logged in the System Event Log (SEL), unless it is full. See "Managing the System Event Log" on page 4-30 for more details on the SEL.
Note: All POST errors are logged to the SEL, which is capable of holding approximately 3200 entries. After the SEL is full, no further errors are logged. The SEL can be cleared using the SSU or the BIOS setup. The SEL is automatically cleared after running the PCT. See "Managing the System Event Log" on page 4-30 for more details.
Table 3-7 and Table 3-8 contain the POST error messages and error codes.

TABLE 3-7 Standard POST Error Messages and Codes
Error Code  Error Message                Pause On Boot
100         Timer Channel 2 error        Yes
101         Master Interrupt Controller  Yes
102         Slave Interrupt Controller   Yes
103         CMOS battery failure         Yes
104         CMOS options not set         Yes
105         CMOS checksum failure        Yes
106         CMOS display error           Yes
107         Insert key pressed           Yes
108         Keyboard locked message      Yes
109         Keyboard stuck key           Yes
10A         Keyboard interface error     Yes
10B         System memory size error     Yes
10E         External cache failure       Yes
110         Floppy controller error      Yes
111         Floppy A error               Yes
112         Floppy B error               Yes
113         Hard disk 0 error            Yes
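When reading POST output or SEL entries by hand, a small lookup built from the rows of Table 3-7 shown above can save a trip back to the table. This is a sketch only: the dictionary covers just the codes visible in this excerpt, and the name describe_post_error is hypothetical rather than part of the SSU or any Sun tool.

```python
# Codes and messages copied from the rows of Table 3-7 shown above.
# describe_post_error is a hypothetical helper for illustration only.
POST_ERRORS = {
    "100": "Timer Channel 2 error",
    "101": "Master Interrupt Controller",
    "102": "Slave Interrupt Controller",
    "103": "CMOS battery failure",
    "104": "CMOS options not set",
    "105": "CMOS checksum failure",
    "106": "CMOS display error",
    "107": "Insert key pressed",
    "108": "Keyboard locked message",
    "109": "Keyboard stuck key",
    "10A": "Keyboard interface error",
    "10B": "System memory size error",
    "10E": "External cache failure",
    "110": "Floppy controller error",
    "111": "Floppy A error",
    "112": "Floppy B error",
    "113": "Hard disk 0 error",
}


def describe_post_error(code):
    """Return the Table 3-7 message for a POST error code, if listed here."""
    key = code.strip().upper()
    return POST_ERRORS.get(key, "not listed in this excerpt of Table 3-7")


if __name__ == "__main__":
    for code in ("103", "10a", "8508"):
        print(code, "->", describe_post_error(code))
```

Codes that appear elsewhere in the guide, such as the DIMM population errors, are deliberately reported as not listed rather than guessed.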
47. failed and AC fuse open or other critical failure.
Note: If redundant power supplies are used in the Sun Fire V65x server, the power supply LEDs have the following meaning:
- Both LEDs off: no power to power supplies, or both power supplies bad.
- Both LEDs blinking green: power supplies receiving AC power, but server is off.
- Both LEDs solid green: server is fully powered on and power supplies are good.
- One LED solid green and one LED amber: AC power missing from one of the power supplies.

3.2.5 Server Main Board Fault LEDs
There are several fault and status LEDs built into the server board (see Figure 3-5). Some of these LEDs are visible only when the chassis cover is removed. The LEDs are explained in this section.
[FIGURE 3-5 Fault and Status LEDs on the Server Board. Callouts: POST LEDs, ID LED, System Status LED, DIMM Fault LEDs, CPU 1 Fault LED, CPU 2 Fault LED, 5V System Standby LED.]
The fault LEDs are summarized below.
- POST LEDs: To help diagnose POST failures, a set of f
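The four redundant-supply LED combinations in the note above can be expressed as a simple lookup, which is handy when logging rack inspections. The sketch below is illustrative only: the state strings and the function name interpret_psu_leds are assumptions, and any combination the note does not list is reported as unlisted rather than interpreted.

```python
# Meanings copied from the redundant power supply note above.
# State strings and interpret_psu_leds are hypothetical names.
PSU_LED_MEANINGS = {
    ("off", "off"): "No power to power supplies, or both power supplies bad",
    ("blinking green", "blinking green"):
        "Power supplies receiving AC power, but server is off",
    ("solid green", "solid green"):
        "Server is fully powered on and power supplies are good",
    ("amber", "solid green"): "AC power missing from one of the power supplies",
}


def interpret_psu_leds(led_a, led_b):
    """Map the two power supply LED states to the meaning given in the note."""
    # Sort so that the order in which the two supplies are read does not matter.
    key = tuple(sorted((led_a.strip().lower(), led_b.strip().lower())))
    return PSU_LED_MEANINGS.get(key, "Combination not listed in the note")


if __name__ == "__main__":
    print(interpret_psu_leds("solid green", "amber"))
    print(interpret_psu_leds("Off", "off"))
```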
48. grease to top of processor as required. See processor documentation for additional information.
[FIGURE 5-23 Applying Thermal Conducting Material]
9. Orient the heatsink such that it properly and fully contacts the surface of the processor beneath it.
10. Gently lower the heatsink in place, being careful not to damage the thermal interface material (TIM), as shown in Figure 5-24.
[FIGURE 5-24 Installing the Heatsink]
Caution: Misorientation of the heatsink will result in poor contact between heatsink and processor. Not only will the processor overheat, but both processor and socket may be damaged when clamping the heatsink down.
11. Install the heatsink retention clips using the retention clip tool.
Note: Make sure to install both retention clips.
[FIGURE 5-25 Installing the Heatsink Retention Clip (Details). Callouts: the tab at the inside of the retention clip engages the slot on the heatsink base during installation; the center dogleg slot in the clip provides room for side-to-side motion while engaging the retention clip slots located at each side; retention clip; side plastic tab. Note in figure: for ease of installation, both retention clips should be installed simultaneously.]
Warning: Incorrect use of the tool can cause the tool to slip from the retention clip and strike the server board, possibly causing severe
49. is Connected
The external SCSI bus is scanned for disk devices before the internal bus is scanned. As a result, Linux will label external drives before internal drives. Exercise caution when adding and removing external devices, because Linux drive device names (such as /dev/sda) may change, leaving the system unable to boot because the external SCSI device may not be a boot drive.

4.5.2.4 PS/2 Mouse Misidentification
A PS/2 pointing device (mouse) may be misidentified during OS installation. To correct the mouse configuration for the Linux OS, run the setup tool from the command line. Select the Mouse configuration option, then identify the connected pointing device. Save the change and exit the setup utility. The Solaris OS automatically detects the mouse and, if it finds that it needs to change some information, starts kdmconfig on reboot.

4.6 Loading the Operating System
The bootup process eventually takes you to the point where the operating system loads.
Note: If you have a Sun Fire V60x server or a Sun Fire V65x server with the Solaris operating environment preinstalled, the operating system loads during the bootup process. If you have a server without a preinstalled operating system, you need to install the operating system at this time. In all cases, the serial port on the rear panel is operational and you can redirect boot messages to an extern
50. is recorded automatically by the BMC, while the late POST, OS Boot, FRB-2, and AP failures are logged to the SEL by the BIOS. In the case of an FRB-2 failure, some systems log additional information into the OEM data byte fields of the SEL entry. This additional data indicates the last POST task that was executed before the FRB-2 timer expired. This information may be useful for failure analysis. The BMC maintains failure history for each processor in nonvolatile storage. Once a processor is marked failed, it remains failed until the user forces the system to retest the processor. The BIOS reminds the user about a previous processor failure during each boot cycle until all processors have been retested and successfully pass the FRB tests or AP initialization. Processors that have failed in the past are not allowed to become the BSP and are not listed in the MP table and ACPI APIC tables. It might happen that all the processors in the system are marked bad. An example is a uniprocessor system where the processor has failed in the past. If all the processors are bad, the system does not alter the BSP and attempts to boot from the original BSP. Error messages are displayed on the console, and the processor failure is logged in the System Event Log. If the user replaces a processor that has been marked bad by the system, the user must inform the system of this change by running
51. item lists the various diskettes you can create, as shown in Figure 4-15. (Figure 4-15: Create Diskettes Submenu.) You can use this submenu to create various types of standalone diskettes that you can use to boot servers and run particular tests or utilities, or flash the BIOS, independently of the service partition System Utilities menu. The following diskettes can be created. System Setup Utility: choosing this option creates two diskettes that allow you to run the System Setup Utility (SSU) in the same way that you run it from the service partition or from the Sun Fire V60x and Sun Fire V65x servers Diagnostics CD. With the two-diskette set, you can perform the functions described in "Run System Setup Utility" on page 4-27. FRU/SDR Load Utility: choosing this option creates one diskette that allows you to run the FRU/SDR Load Utility in the same way that you run it from the service partition or from the diagnostics CD. With the diskette, you can perform the functions described in "Run Field Replaceable Unit/Sensor Data Record (FRU/SDR) Update" on page 4-44. Platform Confidence Test: choosing this option creates one diskette that a
52. jumper block on the server board, at the left edge of the board. Jumpers at this location are available for the following functions: Recovery Boot, Password Clear, and CMOS Clear. An additional jumper is available for BMC Write Protect (WP). It is located towards the rear of the main board, at the left side of the full-depth PCI slot. For normal operation, these jumpers should be left in their default position as shipped from the factory. 4. What processors are supported on the main board? The server board supports the Intel Xeon processor with 512K cache. 5. What heatsink should I use? The boxed processor is available in two basic package configurations. It is available packaged for the Sun Fire V60x server (1U chassis implementation) and is bundled with a low-profile 1U copper heatsink and air duct. It is also available for the Sun Fire V65x server. This package is bundled with a full-height heatsink and processor air duct. You must select the proper package for your chassis type and configuration. For integration into the Sun Fire V60x server, the 1U version of the packaged processor must be used. For the Sun Fire V65x server chassis, the 2U/Pedestal version of the processor package must be used. Do not attempt to use the 1U version of the packaged processor with the Sun Fire V65x server, as this chassis is designed to provide proper airflow through a plastic shroud and through the fi
53. may restart or shut down the Sun Fire V60x and Sun Fire V65x servers using software or hardware. Software Mechanisms: This section describes the software shutdown commands supported by Linux and Solaris. Software Shutdown Commands for Linux: The following software mechanisms are available for shutting down a Linux system. Ctrl-Alt-Del key combination: use this to shut down the operating system and restart the server at any time. In text mode, this works regardless of whether you are logged in. When running GNOME or other X Window System desktops, you must log in as root first. The Ctrl-Alt-Del key combination works for both PS/2 and USB keyboards. Note: The USB port is disabled until an OS and the USB drivers are installed. A PS/2-type keyboard and/or mouse may be required for initial bootup and configuration. shutdown -h now: type this to initiate an orderly shutdown and halt the server. You may then press the Power button to safely power off the server. shutdown -r now: type this to initiate an orderly shutdown and reboot of the server. reboot: type this to initiate a reboot of the server. Software Shutdown Commands for Solaris: The following software mechanisms are available for shutting down a Solaris system. shutdown -g0 -i0: type this to initiate an orderly shutdown and halt the server. You may then press the Power button to safely power off the server. shutdown -g0 -i6 or reboot: type either
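The commands listed above can be collected into a small lookup, which is handy when one script has to manage both Linux and Solaris hosts. This is only a sketch: the names `SHUTDOWN_COMMANDS` and `shutdown_command` are hypothetical, and the option hyphens are written on the assumption that they were dropped from the extracted text (`shutdown -h now`, `shutdown -g0 -i0`, and so on).

```python
# Hypothetical helper: maps (operating system, action) to the shutdown
# commands named in the text. Nothing is executed here; the function only
# returns the argument list.
SHUTDOWN_COMMANDS = {
    ("linux", "halt"): ["shutdown", "-h", "now"],
    ("linux", "reboot"): ["shutdown", "-r", "now"],
    ("solaris", "halt"): ["shutdown", "-g0", "-i0"],
    ("solaris", "reboot"): ["shutdown", "-g0", "-i6"],
}


def shutdown_command(os_name: str, action: str) -> list[str]:
    """Return the argument list for an orderly halt or reboot."""
    key = (os_name.lower(), action.lower())
    if key not in SHUTDOWN_COMMANDS:
        raise ValueError(f"no documented command for {os_name!r} / {action!r}")
    # Return a copy so callers cannot modify the shared table.
    return list(SHUTDOWN_COMMANDS[key])
```

The returned list could be passed to `subprocess.run` on the target host by a user with sufficient privileges. After a halt, the text says to press the Power button to power off the server.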
54. of the old ones. Note: When installing a new processor or relocating a processor to a different main board, apply thermal paste as needed to the top of the processor. Caution: If you are installing a processor removed from a different server, you must prepare the processor and heatsink so that the heatsink properly conducts the heat away from the processor (see Figure 5-23 on page 5-30). If the processor and heatsink are not properly prepared, damage to the processor or socket can result. You should not allow any surface that has thermal interface material to come in contact with any other surface, as surface contamination may occur. Follow these steps to replace the processor and heatsink. 1. Make sure the old processor has been removed and placed on an antistatic pad or, if you are moving the processor from one main board to another, insert the processor directly into the new board as indicated in the next step. 2. As shown in Figure 5-11, open the socket lever. Open the lever all the way, as shown. (Figure 5-11: Opening the Socket Lever.) 3. Align the corner mark on the processor with the mark on the socket. 4. Insert the processor into the socket, as shown in Figure 5-12. (Figure 5-12: Inserting the Processor. Align the corner mark on the processor to the socket as shown.) 5. Verify that the processor sits flush and level on the socket
55. position on the chassis floor under where the backplane board will be installed, and connect it to the USB connector on the server board. Install the backplane board. Install the power supply. Install the cables between the new server board and the other system components. Install the fan module and connect the fan cable to the server board. Install the air baffle. With a screw, install the blue plastic retention clip that holds down the flex cable onto the server board. Install the processor air duct. Install both PCI riser board assemblies. Replace all drives in the drive bays and flex bay. Replace the chassis cover if you have no additional work to do inside the chassis. Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31. 5.6.1.2 Sun Fire V65x Server Main Board Replacement. Note: Several assemblies must be removed so that the main board can be replaced. Refer to other sections of this chapter for detailed instructions on how to remove and replace these assemblies. To replace the Sun Fire V65x server Main Board, follow these steps. Open the box containing the replacement board and remove one of the two antistatic pads. You will need this pad in step 17 as an ESD-safe place to place the old server board. Before removing the cover to
56. resetting the server or shutting it down and powering it back up. You may restart or shut down the Sun Fire V60x and Sun Fire V65x servers using software or hardware. Software Mechanisms: This section describes the software shutdown commands supported by Linux and Solaris. Software Shutdown Commands for Linux: The following software mechanisms are available for shutting down a Linux system. Ctrl-Alt-Del key combination: use this to shut down the operating system and restart the server at any time. In text mode, this works regardless of whether you are logged in. When running GNOME or other X Window System desktops, you must log in as root first. The Ctrl-Alt-Del key combination works for both PS/2 and USB keyboards. Note: The USB port is disabled until an OS and the USB drivers are installed. A PS/2-type keyboard and/or mouse may be required for initial bootup and configuration. shutdown -h now: type this to initiate an orderly shutdown and halt the server. You may then press the Power button to safely power off the server. shutdown -r now: type this to initiate an orderly shutdown and reboot of the server. reboot: type this to initiate a reboot of the server. Software Shutdown Commands for Solaris: The following software mechanisms are available for shutting down a Solaris system. shutdown -g0 -i0: type this to initiate an orderly shutdown and halt the server. You may then press the Power button to s
57. start by default. Xserver is installed by default in the Solaris OS. Characters Are Distorted or Incorrect: Check the following. Are the brightness and contrast controls properly adjusted on the video monitor? See the manufacturer's documentation. Are the video monitor signal and power cables properly installed? If the problem persists, the video monitor may be faulty or it may be the incorrect type. Contact your service representative or authorized dealer for help. System Cooling Fans Do Not Rotate Properly: If the system cooling fans are not operating properly, system components could be damaged. Check the following. Is AC power available at the wall outlet? Is the system power cord properly connected to the system and the wall outlet? Did you press the power button? Is the power-on light illuminated? Have any of the fan motors stopped (use the server management subsystem to check the fan status)? Are the fan power connectors properly connected to the server board? Is the cable from the front panel board connected to the server board? Are the power supply cables properly connected to the server board? Are there any shorted wires caused by pinched cables or power connector plugs forced into power connector sockets the wrong way? If the switches and connections are correct and AC power is available at the wall outlet, contact your servi
58. sun.com/service/contacting/solution.html For general support and documentation on the servers, see the following link: http://www.sun.com/supporttraining CHAPTER 4: Powering On and Configuring the Server. This chapter explains how to use the Power On switch to apply power to the server, boot to the operating system, use the serial console, update system software, and validate the operation of the Sun Fire V60x and Sun Fire V65x servers. The chapter contains these sections: "Jumper Locations" on page 4-2; "Setting the Serial COM2 Port Jumper" on page 4-4; "Powering On" on page 4-5; "Clearing CMOS" on page 4-6; "Booting Up" on page 4-8; "Loading the Operating System" on page 4-16; "Hyper-threading CPU Feature" on page 4-16; "Configuring an External Serial Console" on page 4-17; "Fault Resilient Booting (FRB)" on page 4-19; "Using the Service Partition Menu" on page 4-24; "Using the Sun Diagnostics CD" on page 4-45; "Updating the Server Configuration" on page 4-50; "Restarting and Shutting Down" on page 4-55. 4.1 Jumper Locations: Part of configuring the server involves setting the jumper positions on the main board. The jumper locations are shown in Figure 4-1 and summarized in Table 4-1.
59. that are located near processors. Caution: The processor must be appropriate. You may damage the server if you install a processor that is inappropriate for your server. Make sure your server can handle a newer, faster processor (thermal and power considerations). If you are adding a second processor to your system, the second processor must be compatible with the first processor (within one stepping, same voltage, and same speed). For exact information about processor interchangeability, contact your customer service representative. Caution: Pressing the power button does not turn off power to this board. Disconnect the server board from its power source and from any telecommunications links, networks, or modems before doing any of the procedures described in this guide. Failure to do this can result in personal injury or equipment damage. Some circuitry on the server board may continue to operate even though the front panel power button is off. Caution: Electrostatic discharge (ESD) can damage server board components. Perform CPU replacement procedures only at an ESD workstation. If no such station is available, you can provide some ESD protection by wearing an antistatic wrist strap and attaching it to a metal part of the computer chassis. Caution: CPU installation must be performed by trained service personnel only. An ESD wrist strap must be used for this procedure.
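The pairing rule in the first caution (within one stepping, same voltage, same speed) can be written as a short check. The sketch below is illustrative only: the `Processor` type, its field names, and the idea of representing a stepping as a plain integer are assumptions, not something the guide defines. The guide directs you to your customer service representative for exact interchangeability information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """Hypothetical description of an installed or candidate processor."""
    stepping: int      # assumed: steppings numbered consecutively
    voltage_mv: int    # core voltage in millivolts
    speed_mhz: int     # core clock speed in MHz


def is_compatible_pair(first: Processor, second: Processor) -> bool:
    """Apply the rule from the text: within one stepping, same voltage, same speed."""
    return (
        abs(first.stepping - second.stepping) <= 1
        and first.voltage_mv == second.voltage_mv
        and first.speed_mhz == second.speed_mhz
    )
```

A check like this only screens out obviously mismatched pairs. It does not cover the thermal and power considerations that the caution also mentions.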
60. the system fan connector on the server board. Install the air baffle. Replace the processor air duct. Replace the chassis cover. Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31. 5.5.9.3 Sun Fire V65x Server Fan Module Removal: Unlike the fan module in the Sun Fire V60x server, the fans in the Sun Fire V65x server are individually replaceable. To replace an individual fan, first remove the fan module according to the instructions below while referring to Figure 5-33. Remove the full-height PCI riser board. Unthread the SCSI cable from the retaining hooks on the plastic processor air duct. Push the air duct slightly toward the back of the chassis, then lift it by its front edge and remove it from the chassis. Remove the flex circuit cable retention clip. Disconnect the flex circuit cable from the backplane. Unthread and remove the USB cable from the clips on top of the fan module. Unplug the fan cables from the server board system fan connectors. At the end of the fan module closest to the chassis centerline, push on the tab to release it from the chassis. While pushing on the tab, lift up on the module to clear the retention stub. Slide the module towards the power supply until it comes free. Lift the fan module out o
61. this occurs, perform one of the following procedures to reset the system, depending on how many DIMMs are in your system. For Systems With Four or More DIMMs: 1. Turn on the system. 2. Press the F2 key to select SETUP when the option appears on the screen. 3. In the main page of the SETUP menu, use the arrow keys to select the Advanced menu. In the Memory Configuration screen, select Memory Retest, then select Enabled. Press the F10 key to exit the SETUP menu and save the changes. The system will now boot correctly. For Systems With Only Two DIMMs: Open the top cover and move the existing two DIMMs from their current bank to one of the other two banks. Power on the system. Press the F2 key to select SETUP when the option appears on the screen. Once in the main page of the SETUP menu, use the arrow keys to select the Advanced menu. In the Memory Configuration screen, select Memory Retest, then select Enabled. Press the F10 key to exit the SETUP menu and save the changes. The system reboots automatically. Power off the system. Move the DIMMs back to their original location. Note: If this step is skipped, the following error messages may appear on the screen and in the System Event Log: "Error 8502: Bad or missing memory in Slot 1A" and "Error 8506: Bad or missing memory in Slot 1B". Replace the top co
62. to any pushbuttons you press and video is disabled, it could be that the front panel is locked. By default, front panel locking is disabled; however, it is possible to enable front panel locking through the BIOS setup. To do this, an administrative password must be set using Security > Set Admin Password. When the password is set, the front panel, mouse, and keyboard are locked after a timeout expires. The video is also blanked. The purpose of this is to prevent unauthorized access to a server by someone who plugs in a keyboard and video monitor. Access is regained simply by using the keyboard to type the password. Note: A corded PS/2 keyboard (not a wireless one) must be plugged into the keyboard/mouse connector at the back of the server. When the front panel is locked, the lights on the keyboard flash, but the server is still fully functional. Server Beeps at Power On or When Booting: The server indicates problems with beep codes during Power On Self Test (POST) in the event there is no displayed video. A complete list of beep codes is given in "POST Error Beep Codes" on page 3-19. The following beep codes identify system events during POST in case video fails to display. TABLE 2-2 Bootup Beep Codes (Beeps: Reason). 1: One short beep before boot (normal, not an error). 1-2: Search for option ROMs. One lon
63. work inside the system, observe the previously mentioned safety guidelines. Remove the cover and bezel from the chassis. Remove the PCI riser board assemblies. Remove the air baffle. Unscrew and remove the blue plastic retention clip that holds down the flex cable onto the server board. Disconnect the USB ribbon cable from the server board, unthread it from the top of the fan module, and lay the free end back over the drive bays. Remove the processor air duct. Disconnect the fan cables from the server board and the DIMM fan power cable from the fan module header, and remove the fan module. Remove the DIMM fan and vertical support bars. Disconnect both ends of all remaining cables that connect the main board to other chassis components, including the power supply. Note: Some cables may be soldered at one end. Remove only the connectorized end. Remove the heatsink, processor air dam, and any processors and memory DIMMs that you wish to use with the new board. Remove the eight screws that secure the processor retention mechanisms and the three mounting screws that secure the server board to the chassis (see Figure 5-39). Caution: Make sure that all 11 screws have been removed before attempting to take out the board. Do not use any tools to try to pry the board o
64. (Figure 3-7: Examples of POST LED Coding. Each example shows an upper nibble and a lower nibble, with high bits on the left and low bits on the right.) During the POST process, each light sequence represents a specific Port 80 POST code. If a system should hang during POST, the diagnostic LEDs present the last test executed before the hang. When you read the LEDs, observe them from the back of the system. The most significant bit (MSB) is the leftmost LED, and the least significant bit (LSB) is the rightmost LED. Note: When comparing a diagnostic LED color sequence from the server Main Board to those listed in the diagnostic LED decoder in the following tables, the LEDs on the Main Board should be referenced when viewed by looking into the system from the back. Reading the LEDs from left to right, the most significant bit is located on the left. TABLE 3-13 Boot Block POST Progress LED Code Table (Port 80h Codes). Columns: POST Code; Diagnostic LED Decoder, MSB to LSB (G = green, R = red, A = amber); Description. 10h: Off, Off, Off, R; The NMI is disabled. Start power-on delay. Initialization code checksum verified. 11h: Off, Off, Off, A; Initialize the DMA controller, perform the keyboard controller BAT test, start memory refresh, and enter 4 GB flat mode. 12h: Off, Off, G, R; Get start of initi
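The three rows visible in Table 3-13 (10h, 11h, 12h) are consistent with a simple mapping: each LED shows one bit of the upper nibble as red and the same bit position of the lower nibble as green, with amber when both bits are set. The sketch below assumes that this mapping holds for the whole table. The function names are hypothetical and are not part of any Sun tool.

```python
def decode_post_leds(code: int) -> list[str]:
    """Return the four LED states, MSB (leftmost) first, for a Port 80h code."""
    if not 0 <= code <= 0xFF:
        raise ValueError("POST code must fit in one byte")
    upper = (code >> 4) & 0xF  # assumed: upper nibble drives the red elements
    lower = code & 0xF         # assumed: lower nibble drives the green elements
    states = []
    for bit in (3, 2, 1, 0):   # bit 3 is the leftmost LED seen from the back
        red = (upper >> bit) & 1
        green = (lower >> bit) & 1
        if red and green:
            states.append("A")
        elif red:
            states.append("R")
        elif green:
            states.append("G")
        else:
            states.append("Off")
    return states


def encode_post_leds(states: list[str]) -> int:
    """Inverse of decode_post_leds: rebuild the Port 80h code from LED colors."""
    if len(states) != 4:
        raise ValueError("expected exactly four LED states")
    upper = lower = 0
    for index, state in enumerate(states):
        bit = 3 - index
        if state not in ("Off", "G", "R", "A"):
            raise ValueError(f"unknown LED state: {state!r}")
        if state in ("R", "A"):
            upper |= 1 << bit
        if state in ("G", "A"):
            lower |= 1 << bit
    return (upper << 4) | lower
```

Reading the LEDs from the back of the chassis, left to right, and passing the colors to `encode_post_leds` gives the code to look up in the POST progress tables.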
65. (Figure 4-7: BIOS Setup Utility Main Screen.) Caution: Changing the BIOS settings may cause undesirable effects and in some cases may disable the server. Be very careful before changing the BIOS configuration. It is important to note the default boot sequence. The boot sequence is accessed by using the right arrow key to select the Boot menu item on the top of the BIOS screen, then pressing Enter. The boot sequence is then displayed. The default boot sequence is as shown in Figure 4-8. (Figure 4-8: Default Boot Sequence.) You can always restore all of the default BIOS settings by scrolling to the Exit menu item along the top of the screen. The BIOS Setup screen then appears as shown in Figure 4-9. (Figure 4-9: BIOS Setup Utility Exit Screen.) To restore all of the default settings, scroll to Load Setup Defaults and press Enter, then select Yes at the prompt and press Enter again. Now press F10 to save the settings and exit. When you exit the BIOS setup utility, the bootup process continues. Note: To update the server BIOS, see "Updating the Server Configuration" on page 4-50. 4.5.1.2 Service Partition <F4>. Note: If you have a Sun Fire V60x server or
66. 6. Close the socket lever until it locks and secures the processor in the socket. (Figure 5-13: Closing the Socket Lever.) Caution: Move the socket lever slowly and make sure that it is engaged on the locking tab on the side of the socket. 7. Orient the heatsink such that it properly and fully contacts the surface of the processor beneath it. 8. Gently lower the heatsink in place, being careful not to damage the thermal interface material (TIM). Caution: Misorientation of the heatsink will result in poor contact between heatsink and processor. Not only will the processor overheat, but both processor and socket may be damaged when securing the heatsink with the metal retention clips. 9. Install the heatsink retention clips using the retention clip tool. Note: Make sure to install both retention clips. The tab at the inside of each retention clip engages a slot on the heatsink base during installation. The center dogleg slot in the clip provides room for side-to-side motion while engaging the retention clip slots located at each side. For ease of installation, both retention clips should be installed simultaneously. (Figure 5-14: Installing the Heatsink Retention Clip, Details.) Warning: Incorrect use of the tool can cause the tool to slip from the retenti
67. 8 memory failure, 3-20; memory sizing, 3-24; Memory test, 3-20; monitor, 2-1. O: operating system, loading, 4-16. P: PCI, 3-17, 3-26; POST, 2-7, 3-15 (beep codes, 3-19; LED indicators, 3-22; screen messages, 3-16); power cord, 2-1; power-on self-test, 3-15; preparing for diagnostic testing, 2-1; problems (after running new application software, 1-11; after system has been running correctly, 1-12; application software, 2-14; bootable CD-ROM not detected, 2-14; cannot connect to network server, 2-13; CD-ROM drive activity light, 2-12; characters on screen are distorted or incorrect, 2-11; confirm OS loading, 2-8; diskette drive light, 2-12; initial system startup, 1-10; key system LEDs, 2-8; network, 2-13; no characters on screen, 2-10; POST, 2-7; power light, 2-9; screen characters incorrect, 2-11; server beeps at power on or when booting, 2-5; server boots automatically at power on, 2-7; server does not power on, 2-3; specific, 2-2; starting up, 2-3; system cooling fans do not rotate, 2-12; with initial system startup, 1-10; with new application software, 1-11; with SNMP, 1-9); problems, specific, 2-2; processor and heatsink configuration, 1-6; processor population order, 1-6, 1-7; Processor slot state, 4-22; pushbutton (power/sleep, 3-6; reset, 3-6; system ID, 3-6); pushbuttons and LEDs, 3-3. R: replacing the cover, 5-3; resetting the server, 1-3; restarting, 4-55; RTC, 3-26
68. CMOS defaults are loaded during the next POST sequence. Note that non-volatile storage for embedded devices may or may not be affected by the clear CMOS operation, depending on the available hardware support. Place the CLR CMOS jumper in the Normal position. (Figure 4-5: Location of Clear CMOS Jumper.) 4.5 Booting Up: As soon as power is applied to the server, the bootup process begins. Boot messages are sent to either a monitor attached to the video port or to a serial console attached to the rear RJ-45 serial COM2 port. The server is configured by default to initially send BIOS and kernel messages to both the serial port and the video port. However, when the boot process reaches the OS load point, the messages and screens
69. CSI backplane board (this must be done to access the USB ribbon cable), while referring to Figure 5-42. a. Unplug all cables from the backplane. b. Remove the blue thumbscrew (A) from the right side of the backplane. c. Grasp the backplane and slide it slightly to the right to free it from the round standoff posts (B, C) that protrude up through the mounting holes. d. Gently lift out the backplane board and set it aside. (Figure 5-42: Removing the SCSI Backplane.) 10. The full USB cable (Figure 5-41 on page 5-63, E) is now uncovered and may be unplugged and removed from the front panel board and from the server board. 5.6.2.2 Sun Fire V60x Server Cable Kit Installation: To replace cables, remove the cover and refer to Figure 5-40, Figure 5-41, and Figure 5-42 while following these steps. 1. If you previously removed the SCSI backplane board to remove the USB cable: a. Replace the USB cable by connecting it from the front panel board to the USB connector (Figure 5-41, E) on the server board. b. Replace the SCSI backplane board: i. Place the board over all six of the round standoff posts. ii. Slide the board to the left until it is fully secured by the standoffs. iii. Install the thumbscrew at the right side of the board. Route the fro
70. Using the SSU to Manage Logs, Records, Hardware, and Events: The System Setup Utility (SSU) allows you to manage the following: System Event Log (SEL), Sensor Data Records (SDR), Field Replaceable Units (FRU), and Platform Events. Managing the System Event Log: The server maintains a system event log (SEL) in non-volatile memory, which holds approximately 3200 SEL entries. The log can be viewed and cleared using the SSU. To manage the log, double-click the SEL Manager menu item on the Available Tasks pane of the main SSU window. The System Event Log appears, and you can use the menu bar at the top of the log window to save the log, open a log, clear the log, or reload the log. The SEL can also be cleared if you select BIOS setup during bootup (choose <F2>) and go into the Server menu. Choose Event Log Configuration > Clear All Event Logs > Yes. Press the Esc key until you receive the prompt to exit setup. If you choose to save your changes, the System Event Log will be cleared the next time you boot. In addition, the SEL is cleared automatically each time you run the Platform Confidence Test (PCT). See "Run Platform Confidence Test (PCT)" on page 4-31 for more details on the PCT. Managing the Sensor Data Records: The Sensor Data Record (SDR) Manager allows you to view the current sensor data for the system, save the SDR data to a file, and view SDR information previously sav
71. Do not place the thermal surface as it may pick up contaminants, causing incorrect processor mating and possible overheating. (Figure 5-15: Sun Fire V65x Server Heatsink and Processor Removal.) 5.5.4.5 Sun Fire V65x Server Heatsink and Processor Replacement: Installing a replacement heatsink and processor is essentially the reverse of the procedure given in the previous section. Note: When a processor kit includes new heatsink retention clips, use them in place of the old ones. Note: When installing a new processor or relocating a processor to a different main board, apply thermal paste as needed to the top of the processor. Caution: If you are installing a processor removed from a different server, you must prepare the processor and heatsink so that the heatsink properly conducts the heat away from the processor (see Figure 5-23 on page 5-30). If the processor and heatsink are not properly prepared, damage to the processor or socket can result. You should not allow any surface that has thermal interface material to come in contact with any other surface, as surface contamination may occur. Follow these steps to replace the processor and heatsink. Make sure the old processor has been removed and place it on an antistatic pad or, if you are moving the processor from one main board to another, insert the processor directly into the new board as in
72. ED, 3-15; Examples of POST LED Coding, 3-23; Main Board Jumper Locations, 4-2; J5A2 Jumper Block Configured for DSR Signal (pin 7 connected to DSR), 4-4; J5A2 Jumper Block Configured for DCD Signal (pin 7 connected to DCD), 4-4; Power and Reset Switches on the Front Panel, 4-5; Location of Clear CMOS Jumper, 4-8; First BIOS Bootup Screen, 4-9; BIOS Setup Utility Main Screen, 4-10; Default Boot Sequence, 4-11; BIOS Setup Utility Exit Screen, 4-11; Network Boot Failed Screen, 4-13; Boot Device Selection Menu, 4-14; Rear Panel Serial COM2 Port, 4-17; FIGURE 4-13 Console Redirection BIOS Setup, 4-18; FIGURE 4-14 Service Partition Menu, 4-24; FIGURE 4-15 Create Diskettes Submenu, 4-25; FIGURE 4-16 System Utilities Submenu, 4-26; FIGURE 4-17 SSU Main Window, 4-27; FIGURE 4-18 Multiboot Add-in Window, 4-28; FIGURE 4-19 Security Main Window, 4-29; FIGURE 4-20 Reboot Prompt, 4-32; FIGURE 4-21 Warning Prompt, 4-33; FIGURE 4-22 Platform Confidence Test Menu, 4-33; FIGURE 4-23 Platform Confidence Quick Test (first screen), 4-34; FIGURE 4-24 Platform Confidence Quick Test Hardware Test Configuration (last screen), 4-35; FIGURE 4-25 Platform Confidence Quick Test Progress, 4-36; FIGURE 4-26 Platform Confidence
73. Hyper-threading can improve performance. One example of a mixed-task environment is a web and media server that simultaneously runs HTTP daemons and floating-point media encoders. If Hyper-threading is enabled, some benchmarks may report inconsistent results due to the chaotic nature of process scheduling on asymmetric logical processors. 4.8 Configuring an External Serial Console: The RJ-45 serial COM2 port on the rear panel of the Sun Fire V60x and Sun Fire V65x servers (see Figure 4-12) can be used to direct boot messages to a serial console (for example, a laptop running HyperTerminal). (Figure 4-12: Rear Panel Serial COM2 Port.) The server is initially configured to send all the initial BIOS and kernel bootup messages to both the serial console and the VGA port. Use the information in this section only if you need to restore or change the configuration. The default communications settings for the serial COM2 port on the rear panel of the server are: 9,600 bps, 8 data bits, 1 stop bit, no parity, no flow control. You can connect to the serial COM2 port if you have an adapter cable that has an RJ-45 connector at one end and a DB9 connector at th
74. Messages and Codes (Continued)
   Error Code | Error Message | Pause On Boot
   114 | Hard disk 1 error | Yes
   115 | Hard disk 2 error | Yes
   116 | Hard disk 3 error | Yes
   117 | CD-ROM disk 0 error | Yes
   118 | CD-ROM disk 1 error | Yes
   119 | CD-ROM disk 2 error | Yes
   11A | CD-ROM disk 3 error | Yes
   11B | Date/time not set | Yes
   11E | Cache memory bad | Yes
   120 | CMOS clear | Yes
   121 | Password clear | Yes
   140 | PCI error | Yes
   141 | PCI memory allocation error | Yes
   142 | PCI IO allocation error | Yes
   143 | PCI IRQ allocation error | Yes
   144 | Shadow of PCI ROM failed | Yes
   145 | PCI ROM not found | Yes
   146 | Insufficient memory to shadow PCI ROM | Yes
   TABLE 3-8 Extended POST Error Messages and Codes
   Error Code | Error Message | Pause On Boot
   8100 | Processor 1 failed BIST | No
   8101 | Processor 2 failed BIST | No
   8110 | Processor 1 internal error (IERR) | No
   8111 | Processor 2 internal error (IERR) | No
   8120 | Processor 1 thermal trip error | No
   8121 | Processor 2 thermal trip error | No
   8130 | Processor 1 disabled | No
   8131 | Processor 2 disabled | No
   8140 | Processor 1 failed FRB-3 timer | No
   8141 | Processor 2 failed FRB-3 timer | No
   8150 | Processor 1 failed initialization on last boot | No
   8151 | Processor 2 failed initialization on last boot | No
   8160 | Processor 01 unable to apply BIOS update | Yes
   8161 | Processor 02 unable to apply BIOS update | Yes
   8170 | Processor P1 L2 cache failed | Yes
   8171 | Proces
75. Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to the technology described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without the prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of this product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, Java, JumpStart, Solaris, and Sun Fire are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in
76. See Figure 1-1 on page 1-5 for the location of the RCVR BOOT jumper.
   - Make sure the monitor is turned on and the video cable is plugged in completely.
   - If you are using a switch box to share a monitor between multiple servers, ensure that you have switched to the proper server.
   - Is a video cable plugged into the front panel video connector? If so, the rear video connector will be disabled.
   - Is there an add-in PCI video card? If so, on-board video will be disabled.
   - Remove all add-in cards and retry booting with just the on-board components. If this is successful, try plugging in the add-in boards one at a time, with a reboot between each addition, to isolate a suspect card.
   - As a last resort, remove and reseat memory modules and processors. Try using memory and processors from a known working system.
   Caution: Removing and replacing the processors is not recommended and should only be done as a last resort. This is a procedure that should be attempted by Sun-qualified service personnel. Instructions for removing and replacing processors are given in the section titled "Replacing a Server CPU and Heatsink" on page 5-16.
   Video can be disabled on the server by means of the BIOS setup. If you are using an add-in video card, make sure your monitor is plugged into the add-in video card. If you suspect that your video controller may be disabled through the BIOS setup, you can attach to the system through a server management connection, either th
77. Server Air Baffle Installation
   To replace the air baffle, follow these steps:
   1. Slide the air baffle in place, ensuring that one tab is flat against the top of the power supply and the other tab is resting on top of the backplane board.
   2. Route the SCSI cable through the slot on the top of the air baffle.
   3. Secure the air baffle to the backplane using the blue thumbscrew.
   Fan Module
   Caution: Before touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers, disconnect all external cables and follow the instructions in "Safety: Before You Remove the Cover" on page 5-2 and "Removing and Replacing the Cover" on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
   Sun Fire V60x Server Fan Module Removal
   The fan assembly is a single component (see Figure 5-32). The individual fans that make up the assembly are not replaceable. Should a fan fail, the entire module will need to be replaced. A tab on the side of the fan module makes replacement of the module tool-less and very simple. The fan module is not hot-swappable. The server must be turned off before the fan module can be replaced. Before removing the cover to work inside the system, observe the safety guidelines previously given. To replace the fan module, remove the cover an
78. Sun Fire V60x and Sun Fire V65x servers
   TABLE 3-1 Server LEDs
   LED Name | Function | Location | Color | Status
   ID | Helps identify the server from the front or rear | One LED on front panel and one at rear corner | Blue | On: ID
   System status | Visible fault indicator | One LED on front panel and one at rear corner | Green or amber | Off: POST in progress or system stop. Green steady on: no fault. Green blinking: degraded. Amber steady: critical or non-recoverable state. Amber blinking: non-critical state
   Disk activity | Indicates hard disk activity | Front panel and main board, left side | Green | Blinking: HDD activity
   Memory fault (1-6) | Identifies failing DIMM module | At the front of each DIMM location on main board | Amber | On: fault
   POST LEDs (1-4) | Displays boot 80 POST codes | Left rear of main board | Each LED can be off, green, red, or amber | See "POST Progress Code LED Indicators" on page 3-22 for POST code LED details
   Fan fault (1-4) | Identifies Sun Fire V65x server fan failure | On Sun Fire V65x server fan module board | Amber | On: fault
   CPU 1 and 2 fault | Identifies CPU failure | Back corner of processor socket on main board | Amber | On: fault
   5V standby | Identifies 5V standby power-on state | Front left on main board | Green | Green: 5V standby power on
   Main power LED | Identifies power state of the server | Front panel | Green | Off: power is off. On: power is on
   3.2.1 Front Panel LEDs and Push
79. Sun Fire V65x server with the Solaris operating environment preinstalled, the Service Partition is also preinstalled. If your server does not have a preinstalled operating system, the Service Partition is not installed. The Service Partition can be installed as described in "Service Partition" on page 4-47. The Service Partition is a special partition that contains utilities used to manage and configure the server. If the Service Partition is installed, use one of the following methods to access the menu:
   - Linux operating environment: Press <F4> during bootup.
   - Solaris operating environment: Let the system boot up to the Solaris Primary Boot Subsystem menu and select the DIAGNOSTIC partition.
   The Service Partition Menu will display (see Figure 4-32). If the Service Partition is not installed, you can install it, or you can run the utilities directly from the Sun Diagnostics CD. Refer to "Service Partition" on page 4-47 for information on installing or restoring the Service Partition or running the utilities from the Sun Diagnostics CD.
   Note: The operations performed with the service partition menus can also be executed from the Sun Fire V60x and Sun Fire V65x servers Diagnostic CD. See "Using the Sun Diagnostics CD" on page 4-45. When you are finished using the service partition, you must press the Ctrl, Alt, and Delete keys simultaneously to reboot.
80. Use this checklist:
   - Does the system meet the minimum hardware requirements for the software? See the software documentation.
   - Is the software an authorized copy? If not, get one; unauthorized copies often do not work.
   - If you are running the software from a diskette, is it a good copy?
   - If you are running the software from a CD-ROM disk, is the disk scratched or dirty?
   - If you are running the software from a hard disk drive, is the software correctly installed?
   - Were errors ignored while installing the software? If so, address these errors and try re-installation.
   - Were all necessary procedures followed and files installed?
   - Are the correct device drivers installed?
   - Is the software correctly configured for the system?
   - Are you using the software correctly?
   If the problems persist, contact the software vendor's customer service representative.
   1.10 Problems After the System Has Been Running Correctly
   Problems that occur after the system hardware and software have been running correctly often indicate equipment failure. Many situations that are easy to correct, however, can also cause such problems. Use this checklist:
   - If you are running the software from a diskette, try a new copy of the software.
   - If you are running the software from a CD-ROM disk, try a different disk to see if the problem occurs on all disks.
   - If you are running the software from a hard disk drive, try running i
81. afely power off the server.
   - shutdown -g0 -i6 or reboot: type either command to initiate an orderly shutdown and reboot of the server.
   1.2.2 Hardware Mechanisms
   The following hardware mechanisms are available:
   - Press the Reset button: the server is immediately forced to restart. However, you may lose data.
   - Press the Power button: the server is immediately forced to power down. However, you may lose data.
   Caution: These hardware mechanisms are not recommended and should be used only as a last resort.
   1.3 Disabling Integrated Components
   Onboard controllers can be disabled through the server board BIOS setup. To enter BIOS setup, press <F2> when prompted during the boot-up process. For more information, see "BIOS Setup Utility <F2>" on page 4-9.
   1.4 Setting Main Board Jumpers
   You should not normally need to set any of the baseboard jumpers. They are set at the factory to default positions for optimal operation. However, if you need to change them, their locations and functions are shown in Figure 1-1 and Table 1-1.
82. al console. See "Configuring an External Serial Console" on page 4-17 for details.
   4.7 Hyper-threading CPU Feature
   The Sun Fire V60x and Sun Fire V65x servers feature Hyper-threading-capable processors. Enabling Hyper-threading causes each physical CPU to act as two logical CPUs. Enabling Hyper-threading on a dual-processor Sun Fire V60x server or Sun Fire V65x server causes the operating system to recognize four distinct processors.
   Note: Hyper-threading is disabled by default when the Sun Fire V60x and Sun Fire V65x servers are shipped.
   Hyper-threading may be enabled or disabled in the system BIOS configuration menu by using the following instructions:
   1. Press the F2 key during the power-on self-test (POST), while the server is booting, to enter the BIOS configuration menu.
   2. Using the down arrow key, scroll to Processor Settings, then press the Enter key.
   3. Toggle the Hyper-threading feature using the Enter and arrow keys.
   4. Press the Esc key once to exit the Processor Settings menu.
   5. Press the left arrow key to highlight the Exit menu.
   6. Press the Enter key to select Exit Saving Changes.
   7. Press the Enter key again to confirm the new BIOS setting.
   In most high-performance computing (HPC) environments, it is best to disable the Hyper-threading feature. In some mixed-task compute environments
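   The processor count described above can be checked from the operating system after changing the BIOS setting. The following is a minimal Python sketch, assuming one core per physical processor as the text implies; the helper name `expected_logical_cpus` is hypothetical and is not part of any Sun utility.

```python
import os


def expected_logical_cpus(physical_cpus: int, hyperthreading: bool) -> int:
    """Logical CPUs the OS should report, assuming one core per physical CPU."""
    if physical_cpus < 1:
        raise ValueError("at least one physical CPU is required")
    # With Hyper-threading enabled, each physical CPU acts as two logical CPUs.
    return physical_cpus * (2 if hyperthreading else 1)


if __name__ == "__main__":
    expected = expected_logical_cpus(physical_cpus=2, hyperthreading=True)
    seen = os.cpu_count()  # may be None if the platform cannot report a count
    print(f"expected logical CPUs: {expected}, reported by this OS: {seen}")
```

   If the reported count still equals the number of physical processors after Hyper-threading is enabled, the BIOS change was probably not saved.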
83. alization code and check BIOS header.
   13h | Off Off G A | Memory sizing.
   14h | Off G Off R | Test base 512K of memory. Return to real mode. Execute any OEM patches and set up the stack.
   15h | Off G Off A | Pass control to the uncompressed code in shadow RAM. The initialization code is copied to segment 0 and control will be transferred to segment 0.
   16h | Off G G R | Control is in segment 0. Verify the system BIOS checksum. If the system BIOS checksum is bad, go to checkpoint code E0h; otherwise, go to checkpoint code D7h.
   17h | Off G G A | Pass control to the interface module.
   18h | G Off Off R | Decompression of the main system BIOS failed.
   19h | G Off Off A | Build the BIOS stack. Disable USB controller. Disable cache.
   1Ah | G Off G R | Uncompress the POST code module. Pass control to the POST code module.
   1Bh | A R Off R | Decompress the main system BIOS runtime code.
   1Ch | A R Off A | Pass control to the main system BIOS in shadow RAM.
   E0h | R R R Off | Start of recovery BIOS. Initialize interrupt vectors, system timer, DMA controller, and interrupt controller.
   E8h | A R R Off | Initialize extra module if present.
   E9h | A R R G | Initialize floppy controller.
   EAh | A R A Off | Try to boot floppy diskette.
   EBh | A R A G | If floppy boot fails, initialize ATAPI hardware.
   ECh | A A R Off | Try booting from ATAPI CD-ROM drive.
   EEh | A A A Off | Jump to boot sector.
   TABLE 3-14 POST Progress LED Code Table
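   Most LED patterns listed above appear to follow a simple bit encoding: each LED represents one bit position, a set bit in the upper nibble of the code lights red, a set bit in the lower nibble lights green, and both together show amber. This encoding is inferred from rows such as 13h, 16h, and E9h; it is not stated in this excerpt, and the rows printed for 1Bh and 1Ch do not fit it. The Python sketch below should therefore be read as an assumption, and the function name `post_code_to_leds` is hypothetical.

```python
from typing import List


def post_code_to_leds(code: int) -> List[str]:
    """Return the four POST LED states, most significant bit first.

    Assumed encoding (inferred from the table, not confirmed by the guide):
    upper-nibble bit set -> red, lower-nibble bit set -> green, both -> amber.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("POST codes are one byte (00h-FFh)")
    upper, lower = code >> 4, code & 0x0F
    states = []
    for bit in (3, 2, 1, 0):  # MSB first, matching the column order in the table
        red = (upper >> bit) & 1
        green = (lower >> bit) & 1
        if red and green:
            states.append("A")
        elif red:
            states.append("R")
        elif green:
            states.append("G")
        else:
            states.append("Off")
    return states


if __name__ == "__main__":
    for code in (0x13, 0x16, 0xE9):
        print(f"{code:02X}h:", " ".join(post_code_to_leds(code)))
```

   Under this assumption, reading the four LEDs back into a code is the reverse mapping: collect the red bits into the upper nibble and the green bits into the lower nibble.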
84. and Sun Fire V65x servers running the Solaris operating environment, refer to the Sun Fire V60x and Sun Fire V65x Server Solaris Operating Environment Installation Guide (817-2875-xx).
   4.13 Updating the Server Configuration
   There are several methods for updating the server's configuration. They are explained in the following sections.
   4.13.1 Using the Diskette to Update the Server BIOS
   Insert the BIOS update diskette into the server and reboot. The prompt shown in Figure 4-35 appears.
   Screen text: Starting ROM DOS. Updating to Grizzly Alpha 2 BIOS v1096 Westville 533 only. Strike a key when ready.
   FIGURE 4-35 Prompt to Begin BIOS Update
   Press any key to continue. A window appears (see Figure 4-36) indicating that the BIOS update is in progress. The floppy disk activity light is illuminated during the update as the BIOS image is copied to the server's flash memory.
   Screen text: Intel Flash Memory Update Utility, Part 643643 035. Programming flash memory area with contents of file. BIOS UPDATE IN PROGRESS. The BIOS is currently being updated. DO NOT REBOOT OR POWER DOWN until the update is completed (typically within three minutes).
   FIGURE 4-36 BIOS Update in Progress Prompt
   Caution: Do not attempt to reboot or power down the server while th
85. ate a type 12 (Compaq Diagnostics) partition, which is the only partition type the CD tool understands. If you are performing a generic Red Hat install, you should create the service partition first and then install Red Hat Linux using the option to preserve existing partitions. As shipped from the factory, the service partition on the Sun Fire V60x and Sun Fire V65x servers already exists (sda1 for single-drive systems and sdb1 for dual-drive systems). During Red Hat Linux installation, make sure to leave these partitions alone. If you have a server with preinstalled Solaris 9 software, you do not have to create a service partition. If you are performing a custom Solaris installation and the service partition does not exist on the server, you will need to install the service partition before performing the Solaris installation.
   4.12.4 Restoring the Service Partition
   If the hard disk service partition is removed, erased, or damaged, you can use the diagnostics CD to restore it, depending on the extent of the problem.
   If the contents of the service partition have been removed, erased, or damaged, but the sda1 or sdb1 partitions still exist:
   1. Boot from the diagnostics CD. After the server has booted, the screen shown in Figure 4-33 appears.
   Screen text: Sun Cobalt Grizzly Diagnostic CD, Beta 1 release. Arrow keys move highlight, ENTER to select, ESC to abort.
   FIGURE 4-33 Boot Complete from t
86. ble-wide memory modules (1 GB or 2 GB) in the higher-numbered slots.
   If single-wide modules are installed in slots with higher numbers than double-wide modules, you will encounter one of the following messages during POST while the system is booting up:
   - Error 8508: A DIMM population error has been detected. Please swap DIMM pair 1A, 1B with DIMM pair 2A, 2B.
   - Error 8509: A DIMM population error has been detected. Please swap DIMM pair 1A, 1B with DIMM pair 3A, 3B.
   - Error 850A: A DIMM population error has been detected. Please swap DIMM pair 2A, 2B with DIMM pair 3A, 3B.
   When you see these messages during POST, shut down the system and reinstall the DIMMs so that all of the single-wide DIMMs are in lower-numbered slots than the double-wide DIMMs. Refer to "Memory" on page 5-9 for more information on how to correctly replace the DIMMs.
   2.2.11.2 Soft Reboot Errors
   Note: This note applies if you are using BIOS Release 2.0 v1161 or earlier. This bug has been fixed in BIOS Release 5.0 v1175.
   After issuing a Soft Reboot, there is a very small probability that the memory will not reset correctly. If the memory does not reset correctly, the system will log an error in the System Event Log, disable the bank of memory that did not reset correctly, and halt. Upon rebooting, the system will either appear to have no memory installed or will boot with one of the memory banks disabled. If
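   The population rule above can be checked before powering on. The following Python sketch assumes that each error code names the lower-numbered bank holding double-wide DIMMs and the higher-numbered bank holding single-wide DIMMs; that reading is inferred from the three messages, and the function name `population_errors` and the "single"/"double" labels are hypothetical.

```python
from typing import Dict, List, Optional

# Error codes and the DIMM bank pairs they name, as listed in the POST messages.
SWAP_CODES = {(1, 2): "8508", (1, 3): "8509", (2, 3): "850A"}


def population_errors(banks: Dict[int, Optional[str]]) -> List[str]:
    """Return the expected POST messages for a DIMM layout.

    banks maps a bank number (1-3) to "single", "double", or None (empty).
    A violation is a double-wide pair in a lower-numbered bank than a
    single-wide pair.
    """
    errors = []
    for (low, high), code in SWAP_CODES.items():
        if banks.get(low) == "double" and banks.get(high) == "single":
            errors.append(
                f"Error {code}: swap DIMM pair {low}A/{low}B "
                f"with DIMM pair {high}A/{high}B"
            )
    return errors


if __name__ == "__main__":
    print(population_errors({1: "double", 2: "single", 3: None}))
    print(population_errors({1: "single", 2: "double", 3: "double"}))
```

   The sketch lists every violating pair; which single message the BIOS reports when more than one pair is out of order is not stated in this excerpt.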
87. buttons
   The front panel contains the pushbuttons and LEDs shown in Figure 3-1. Note that the illustration has the bezel removed.
   FIGURE 3-1 Front Panel Pushbuttons and LEDs (callouts: NIC1 and NIC2 Activity LEDs, Power/Sleep Pushbutton, Power/Sleep LED, System Status LED, ID LED, ID Pushbutton, Hard Disk Status LED, Reset Pushbutton, NMI Pushbutton)
   3.2.1.1 Front Panel LEDs
   The front panel LEDs are summarized in Table 3-2.
   TABLE 3-2 Front Panel LEDs
   LED | Color | Function
   Power | Green | This LED is controlled by software. It is on steadily when the server is powered up and is off when the system is off or in sleep mode.
   NIC1 and NIC2 | Green | These LEDs are on when a good network link has been established. They blink green to reflect network data activity.
   System Status/Fault | Green, Amber | This LED can assume different states (green, amber, steady, blinking) to indicate critical, non-critical, or degraded server operation. Steady green: the system is operating normally. Blinking green: the system is operating in a degraded condition. Blinking amber: the system is in a non-critical condition. Steady amber: the system is in a critical or non-recoverable condition. Off: POST or system stop. See "Front Panel System Status LED" on page 3-9 for more details regarding this LED.
   Hard D
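   The System Status LED states listed in Table 3-2 can be captured as a small lookup, which may be convenient in a monitoring script or a support checklist. This Python sketch only restates the table; the names `STATUS_LED_STATES` and `describe_status_led` are hypothetical.

```python
# (color, pattern) -> meaning, restating the System Status LED row of Table 3-2.
STATUS_LED_STATES = {
    ("green", "steady"): "operating normally",
    ("green", "blinking"): "degraded condition",
    ("amber", "blinking"): "non-critical condition",
    ("amber", "steady"): "critical or non-recoverable condition",
}


def describe_status_led(color: str, pattern: str = "steady") -> str:
    """Translate an observed System Status LED state into its meaning."""
    color, pattern = color.lower(), pattern.lower()
    if color == "off":
        return "POST or system stop"
    try:
        return STATUS_LED_STATES[(color, pattern)]
    except KeyError:
        raise ValueError(f"unknown LED state: {color} {pattern}") from None


if __name__ == "__main__":
    print(describe_status_led("amber", "blinking"))
```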
88. ce Test (PCT)" on page 4-31.
   System FRU
   Caution: The procedure below is for the attention of qualified service engineers only. Before touching or replacing any component inside the server, disconnect all external cables and follow the instructions in "Safety: Before You Remove the Cover" on page 5-2 and "Removing and Replacing the Cover" on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
   A System FRU is the Main Board with SCSI backplane, power supply, power supply distribution board (Sun Fire V60x server), front panel board, fan module, and all cables in a Sun Fire V60x server or Sun Fire V65x server chassis. The System FRU contains no CPU(s), HDDs, Floppy/CD-ROM/DVD combo, or DIMMs. The field engineer transfers the customer's CPU(s), HDDs, Floppy/CD-ROM/DVD combo, and DIMMs to the new assembly. There are special CPU heatsink procedures that must be followed when disassembling heatsinks from processors (see "Replacing a Server CPU and Heatsink" on page 5-16).
   Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31.
   Sun Fire V60x and Sun Fire V65x Servers System FRU Installation
   Before removing the cover to work inside the sys
89. ce representative or authorized dealer for help.
   Disk Drive Activity Light Does Not Light
   Check the following:
   - Are the disk drive power and signal cables properly installed?
   - Are all relevant switches and jumpers on the disk drive set correctly?
   - Is the disk drive properly configured?
   - Is the disk drive activity light always on? If so, the signal cable may be plugged in incorrectly.
   Use the Setup Utility to make sure that the disk drive is enabled. If the problem persists, there may be a problem with the disk drive, server board, or drive signal cable. Contact your service representative or authorized dealer for help.
   CD-ROM Drive Activity Light Does Not Light
   Check the following:
   - Are the power and signal cables to the CD-ROM drive properly installed?
   - Are all relevant switches and jumpers on the drive set correctly?
   - Is the drive properly configured?
   - Is the onboard IDE controller enabled?
   - Is the flex cable between the server board and the backplane installed properly?
   - Verify that the CD-ROM drive works correctly in another system.
   - Check to see if the BIOS detects the CD-ROM during bootup.
   - Check the BIOS setup menu to see if the CD-ROM is present and set up as a boot device.
   Cannot Connect to a Server
   Check the following:
   - Make sure you are using the onboard network controller drivers that are shipped on the ins
90. cessors have been disabled. At this point, if all the processors have been disabled, the BMC will attempt to boot the system on one processor at a time, irrespective of processor error history. This is called desperation mode.
   4.9.2 FRB2
   FRB2 refers to the level of FRB in which the BIOS uses the BMC watchdog timer to back up its operation during POST. The BIOS configures the watchdog timer for approximately 6 to 10 minutes, indicating that the BIOS is using the timer for the FRB2 phase of operation.
   Note: The BIOS factory default is Disable BSP (Boot Strap Processor) on FRB2.
   After the BIOS has identified the BSP and saved that information, it checks whether the watchdog timer expired on the previous boot. If so, it stores the Time Out Reason bits in a fixed CMOS location (token name cmosWDTimerFailReason) for applications or a User Binary to examine and act upon. Next, it sets the watchdog timer FRB2 timer use bit, loads the watchdog timer with the new timeout interval, and disables FRB3 using the FRB3 timer halt signal. This sequence ensures that no gap exists in watchdog timer coverage between FRB3 and FRB2.
   Note: FRB2 is not supported when the BIOS is in Recovery Mode.
   If the watchdog timer expires while the watchdog use bit is set to FRB2, the BMC logs a Watchdog expiration event showing an FRB2 timeout (if so configured). It then hard resets the system, assuming Reset was selected as the watchdog ti
91. d. Gradually move the top of the tool handle back toward the heatsink in such a manner as to slide the center of the clip over the retainer tab, securing it in place.
   10. Replace the air baffle, fan module, and processor air duct.
   Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31.
   Sun Fire V60x and Sun Fire V65x Servers New CPU and Heatsink Installation
   Installing a new processor and heatsink is an extra-cost option. This section describes how to install a new CPU and heatsink in a previously unpopulated CPU location.
   Caution: The procedure below is for the attention of qualified service engineers only. Before touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers, disconnect all external cables and follow the instructions in "Safety: Before You Remove the Cover" on page 5-2 and "Removing and Replacing the Cover" on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
   Safety Precautions
   Warning: If the server has been running, any installed processor and heatsink on the processor board(s) will be hot. To avoid the possibility of a burn, be careful when removing or installing server board components
92. d refer to Figure 5-32 when following these steps:
   - Remove the clear plastic processor air duct.
   - Remove the blue plastic air baffle.
   - Unplug the fan cable from the server board.
   - At the left end of the module, press the release tab.
   - While continuing to press the release tab, lift the left side of the fan module and slide it slightly left to free the L-shaped foot at the right side of the fan module.
   - Remove the fan module from the chassis.
   FIGURE 5-32 Removing the Fan Module
   Sun Fire V60x Server Fan Module Replacement
   Replacing the fan module is essentially the reverse of the procedure described in "Sun Fire V60x Server Fan Module Removal" on page 5-41.
   - Remove the air baffle.
   - Remove the processor air duct.
   - Position the new fan module so that the fan cable is located closest to the center of the chassis.
   - Slide the L-shaped foot on the chassis sidewall end of the fan module under the chassis tab.
   - Lower the module onto the chassis floor and slide it as far to the right as it will go. Ensure that the fan module is situated between the raised guides, not on top of them.
   - Press down on the left end of the module and press in on the release tab until the tab snaps into the chassis slot.
   - Plug the fan module power cable into
93. d run entirely from the remote console. Setup and any other text-based utilities can be accessed through console redirection.
   5. Press Esc to go back to the main BIOS setup menu.
   6. Exit the BIOS setup utility, saving the changes you have made.
   The boot messages are now directed to your external serial console. In this example, the BIOS setup allows the messages to be directed to COM1 of an external PC running HyperTerminal. The intended usage model for the RJ-45 serial connector on the back of the server is as an interface to a serial port concentrator, allowing for remote access to the server's Emergency Management Port (EMP).
   When redirecting the console through a modem, as opposed to a null modem cable, the modem needs to be configured with the following:
   - Auto-answer (for example, ATS0=2 to answer after two rings).
   - Modem reaction to DTR must be set to return to command state (for example, AT&D1). Failure to provide this option will result in the modem either dropping the link when the server reboots (as in AT&D0) or becoming unresponsive to server baud rate changes (as in AT&D2).
   - The Setup/System Setup Utility option for handshaking must be set to CTS/RTS + CD (carrier detect) for optimum performance.
   - If the Emergency Management Port shares the serial port with serial redirection, the handshaking must be set to CTS/RTS
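   The modem settings above can be collected into an initialization sequence before the modem is attached to the server. The Python sketch below only builds the command bytes; it assumes the standard Hayes forms ATS0=n (auto-answer after n rings) and AT&D1 named in the text, each terminated by a carriage return. The function name `modem_init_commands` is hypothetical, and sending the bytes to a real modem is left out.

```python
from typing import List


def modem_init_commands(rings: int = 2) -> List[bytes]:
    """Build the Hayes commands for console redirection through a modem."""
    if not 1 <= rings <= 255:
        raise ValueError("ring count must be between 1 and 255")
    commands = [
        f"ATS0={rings}",  # auto-answer after the given number of rings
        "AT&D1",          # on DTR drop, return to command state
    ]
    # Hayes-style modems expect each command line to end with a carriage return.
    return [(cmd + "\r").encode("ascii") for cmd in commands]


if __name__ == "__main__":
    for line in modem_init_commands():
        print(line)
```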
94. dering Sun Documents
   Shell Prompts in Command Examples
   Notice
   Support
   Sun Welcomes Your Comments
   1. Troubleshooting Guidelines
   1.1 Startup-related Issues
   1.1.1 Ethernet Port Delay
   1.1.2 USB-Connected External CD-ROM Drive Will Not Function
   1.1.3 Inability to Boot Server When an External SCSI Hard Drive is Connected
   1.1.4 PS/2 Mouse Misidentification
   1.2 Resetting the Server
   1.2.1 Software Mechanisms
   1.2.1.1 Software Shutdown Commands for Linux
   1.2.1.2 Software Shutdown Commands for Solaris
   1.2.2 Hardware Mechanisms
   1.3 Disabling Integrated Components
   1.4 Setting Main Board Jumpers
   1.5 Processor and Heatsink Configurations
   1.5.1 Single or Dual Processor Main Boards
   1.5.2 Supported Processors
   1.5.3 Heatsinks and Air Ducts
   1.5.4 Processor Population Order
   1.5.5 Hyper-threading CPU Feature
   1.6 Memory Configurations
   1.7 Problems With SNMP
   1.8 Problems With Initial System Startup
   1.8.1 Checklist
   1.9 Problems With New Application Software
   1.10 Problems After the System Has Been Running Correctly
   2. Troubleshooting Specific Problems
   2.1 Preparing the System for Diagnostic Testing
   2.2 Specific Problems and Corrective Actions
   2.2.1 Problems Starting Up
   2.2.1.1 Server Does Not Power On
   2.2.1.2 Front Panel is Unresponsive and Video is Disabled
   2.2.1.3 Server Beeps at Power On or Wh
95. device selection menu. This menu, shown in Figure 4-11, allows you to select the device from which the system will boot. To select a boot device, scroll to the desired device and press Enter; otherwise, press Esc to exit without changing the boot device.
   Screen text: Please select boot device. Use the arrow keys to change selection. Use ENTER to select and save. Use ESC to exit without save.
   FIGURE 4-11 Boot Device Selection Menu
   Note: When you select a boot device with the menu shown in Figure 4-11, it only affects the current boot. Subsequent boots revert to the device stored in the BIOS default settings.
   4.5.2 Other Bootup Items
   During the boot process, you will be presented with the choices described in the following sections. Type N to keep the current information, Y to change the information.
   4.5.2.1 Ethernet Port Delay
   Ethernet ports may take a short amount of time (less than 1 second) to activate after ifconfig brings them up. This has been noted when the server is running Red Hat Linux v7.2 or v7.3.
   4.5.2.2 USB-Connected External CD-ROM Drives
   Some USB-connected CD-ROM devices perform unreliably on the Sun Fire V60x server and Sun Fire V65x server. Use the internal CD-ROM device when possible.
   4.5.2.3 Booting the Server When an External SCSI Hard Drive
96. dicated in the next step. As shown in Figure 5-16, open the socket lever. Open the lever all the way, as shown.
   FIGURE 5-16 Opening the Socket Lever
   3. Align the corner mark on the processor with the mark on the socket.
   4. Insert the processor into the socket as shown in Figure 5-17.
   FIGURE 5-17 Inserting the Processor (callout: align corner mark on the processor to the socket)
   5. Verify that the processor sits flush and level on the socket.
   6. Close the socket lever until it locks and secures the processor in the socket.
   FIGURE 5-18 Closing the Socket Lever
   Caution: Move the socket lever slowly and make sure that it is engaged on the locking tab on the side of the socket.
   7. Orient the heatsink such that it properly and fully contacts the surface of the processor beneath it.
   8. Gently lower the heatsink in place, being careful not to damage the thermal interface material (TIM).
   Caution: Misorientation of the heatsink will result in poor contact between heatsink and processor. Not only will the processor overheat, but both processor and socket may be damaged when securing the heatsink with the metal retention clips.
   9. Install the heatsink retention clips using the retention clip tool.
   Note: Make sure to install both retention clips
97. e BIOS is being updated. You may get unpredictable results.
   When the first update pass is finished, the screen shown in Figure 4-37 appears.
   Screen text: Intel Flash Memory Update Utility, Part 643643 035. Programming flash memory area with contents of file. Operation completed successfully.
   FIGURE 4-37 First BIOS Update Finished
   The next screen appears as the BIOS update continues.
   Screen text: BIOS UPDATE IN PROGRESS. The BIOS is currently being updated. DO NOT REBOOT OR POWER DOWN until the update is completed (typically within three minutes).
   FIGURE 4-38 Second BIOS Update Pass
   After the update is complete, the CMOS is cleared and the server reboots. At this point, the server boot block and BIOS code have been updated.
   4. When the system reboots (the screen clears), power the system off.
   5. Remove the floppy diskette.
   6. Power the system back on.
   7. Press F2 to enter BIOS setup.
   8. When you reach the BIOS setup screen, press F10 to save and exit. As bootup continues, the system may report that the Clear CMOS jumper needs to be returned to its original position. If this is displayed, simply power the server off for 30 seconds and power back up. This completes the Windows BIOS update procedure.
   9. To verify
98. e LED turns off after a timeout period. See Figure 3-5 on page 3-13 for the location of the rear Main Board LED. The front panel ID LED and the ID activation button are shown in Figure 3-6.
    FIGURE 3-6 Location of Front Panel ID Pushbutton and LED
    3.3 Power-On Self Test (POST)
    The BIOS indicates the current testing phase during POST by writing a hex code to the Enhanced Diagnostic LEDs, which are located on the rear of the server main board and are visible through the back of the chassis. If errors are encountered, error messages or codes are displayed on the video screen. If an error occurs before video initialization, it is reported through a series of audible beep codes. POST errors are logged in the System Event Log (SEL).
    During the power-on self test (POST), the server may indicate a system fault by:
    - Displaying error codes and messages on the display screen
    - Beeping the speaker in a coded sequence
    - Illuminating the POST LEDs (visible from the rear panel) in a coded fashion
    POST Screen Messages
    During POST, if an error is detected, the BIOS displays an error code and message on the screen. The tables in this section describe the standard and extended POST error codes and their associated messages. The BIOS prompts the user to press a key in case of serious
99. e Server
    Figure 5-6 shows how the DIMM pairs are to be installed. DIMMs must be installed in pairs. There are three banks of DIMMs, labeled 1, 2, and 3. Bank 1 contains DIMM locations 1A and 1B, Bank 2 contains 2A and 2B, and Bank 3 contains 3A and 3B. DIMM socket identifiers are marked on the silkscreen next to each DIMM socket on the baseboard. Note that the sockets associated with any given bank are located next to each other.
    FIGURE 5-6 DIMM Pair Locations (sockets J5F1, J5F2, J5F3, J6F1, J6F2, and J6F3 with fault LEDs, arranged in Banks 1, 2, and 3)
    Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See Run Platform Confidence Test (PCT) on page 4-31.
    5.5.9.2 Sun Fire V65x Server DIMM Replacement
    1. Observe all safety precautions and remove the server top cover.
    2. Remove the DIMM fan assembly by disconnecting the DIMM fan cable from the main fan module, then squeezing the vertical fan support bars to release the DIMM fan (see Figure 5-7).
    FIGURE 5-7 DIMM Fan Removal
    3. If you are replacing DIMMs whose ejector bars are engaged by the DIMM fan vertical support bars:
    a. Do not remove the vertical fan support bars. Instead s
100. e chassis. Open the retainer clip on the riser card retention bracket. Pull the PCI card out of the riser board slot. Install the new PCI add-in card on the riser assembly. Insert the riser assembly connector in the server board slot while aligning the tabs on the rear retention bracket with the holes in the chassis. Firmly press the riser assembly straight down until it is seated in the server board slot. Replace the chassis cover if you have no additional work to do inside the chassis.
    Note: Adding or removing network interface PCI cards may change the labelling order for the on-board network interfaces.
    Note: Hardware detection on startup (Kudzu) reports the on-board Intel Ethernet interfaces as generic e1000 devices rather than detecting the actual brand name of the device that is installed.
    FIGURE 5-34 Removing a PCI Card
    Caution: Press the riser assembly straight down into the slot. Tipping it in the slot while installing it may damage the riser card or board slot.
    Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See Run Platform Confidence Test (PCT) on page 4-31.
    Battery
    Caution: Before touching or replacing any component inside the Sun Fire V60x and
101. e chipset. DIMM population order is designated on the board silkscreen. DIMM pairs are populated side by side.
    8. Why is my machine showing 4 CPUs although this is a 2-CPU server?
    Each CPU appears to the OS as 2 CPUs if the Intel Hyperthreading feature is enabled. With Hyperthreading enabled, a dual-CPU server acts like a 4-CPU server.
    CHAPTER 3 Troubleshooting the Server Using Built-In Tools
    This chapter explains how to detect and isolate faulty components within the Sun Fire V60x and Sun Fire V65x Servers. The chapter contains these sections:
    - Diagnosing System Errors on page 3-1
    - LEDs and Pushbuttons on page 3-3
    - Power-On Self Test (POST) on page 3-15
    - Contacting Technical Support on page 3-29
    3.1 Diagnosing System Errors
    Use the following tools to help you isolate server problems:
    - LEDs on page 3-1
    - Beep Codes on page 3-2
    - POST Screen Messages on page 3-2
    - System Utilities on page 3-2
    3.1.1 LEDs
    You can use the diagnostic LED indications to isolate faults. See LEDs and Pushbuttons on page 3-3.
    3.1.2 Beep Codes
    A built-in server speaker indicates failures with audible beeps. See POST Error Beep Codes on page 3-19.
    3.1.3 POST Screen Messages
    For many failures, the BIOS sends error codes and messages to the screen. See
102. e other wired in accordance with Table 2-3, Back Serial COM2 Port Adapter Pinout, in Chapter 2 of the Sun Fire V60x and Sun Fire V65x Server User Guide. You can then connect the COM1 port of a PC or laptop to the serial COM2 port using the adapter cable, and use HyperTerminal or a similar application to communicate with the server.
    An ANSI 500 terminal emulator is needed for the display to appear properly during BIOS setup and when using the Sun Fire V60x and Sun Fire V65x servers Diagnostics CD or Service Partition tools.
    To set up the server to direct messages to an external serial console:
    1. On bootup, press F2 to select BIOS setup.
    2. Select the Security menu on the BIOS Setup Utility window.
    3. Select Console Redirection.
    4. Select the options as shown in Figure 4-13.
    FIGURE 4-13 Console Redirection BIOS Setup (BIOS Redirection Port: Serial 2, RJ45)
    Note: The BIOS supports redirection of both video and keyboard by means of a serial link (the rear RJ45 serial COM2 connector). When console redirection is enabled, the local (host server) keyboard input and video output are passed both to the local keyboard and video connections and to the remote console through the serial link. Keyboard inputs from both sources are valid, and video is displayed to both outputs. As an option, the system can be operated without a keyboard or monitor attached to the host system an
103. e through holes in the rear panel. It can provide a mechanism for identifying one system out of a group of identical systems. This can be particularly useful if the server is used in a rack-mount chassis in a high-density, multiple-system application. The LED is activated by depressing the front panel System ID pushbutton, or when the server receives a remote System Identify command from a remote management console. If activated by the front panel pushbutton, the LED remains on until the pushbutton is depressed again. When the LED illuminates due to a remote System Identify command, the LED turns off after a timeout period. An additional blue System ID LED is located on the front panel that mirrors the operation of the rear Main Board LED.
    This LED reflects the state of the System Status LED on the front panel.
    This is a bi-color LED that can be on, off, green, amber, or blinking, or a combination thereof. See Rear Panel Power Supply Status LED on page 3-11 for more detailed information.
    3.2.3 Front Panel System Status LED
    The front panel system status LED is located as shown in Figure 3-3.
    FIGURE 3-3 Location of Front Panel System Status LED
    The front panel system status LED has the states indicated in Table 3-5.
    TABLE 3-5 System Stat
104. ed to a file. The SDR data is saved in standard SDR format. To manage the SDR data, double-click the SDR Manager menu item on the Available Tasks pane of the main SSU window. The SDR Manager main window appears, and you can use the menu bar at the top of the window to manage SDR data.
    4.11.2.3 Managing the Field Replaceable Units
    The Field Replaceable Unit (FRU) Manager allows you to view the FRU information stored in the managed server. The FRU records contain information about the system components, such as manufacturer's name, product name, part number, version number, product and chassis serial number, and asset tags. This information may prove useful when troubleshooting faults in the server. To manage the FRUs, double-click the FRU Manager menu item on the Available Tasks pane of the main SSU window. The FRU Manager main window appears, and you can use the menu bar at the top of the log window to manage FRU information.
    Managing Platform Events
    The Platform Event Manager (PEM) allows you to configure and manage Platform Event Paging (PEP), Baseboard Management Controller Local Area Network (BMC LAN) Configuration, and the Emergency Management Port (EMP). To use PEM, double-click the Platform Event Manager menu item on the Available Tasks pane of the main SSU window. The Platform Event Manager main window appears. You can click on the buttons in
105. el Power off the system, but leave the AC power connected so the 5V standby is available.
    2. Verify that the Clear CMOS jumper is in the "not clear" position.
    3. Hold down the reset button for at least 4 seconds.
    4. While the reset button is still depressed, press the power on/off button and hold it for at least 2 seconds.
    5. Simultaneously release both the power on/off and reset buttons.
    Upon completion of these steps, the BMC asserts the clear CMOS signal to emulate the movement of the Clear CMOS jumper. The BIOS clears CMOS as if you had moved the Clear CMOS jumper on the main board. CMOS is cleared only once per front panel button sequence. The BMC releases the CMOS clear line during the next system reset. Removing the Clear CMOS jumper from the main board can disable the Front Panel CMOS reset function. In addition, the jumper should be retained in case the CMOS needs to be cleared using the baseboard header.
    4.4.2 Using the Clear CMOS Jumper
    Follow these steps to clear the CMOS using the CLR CMOS jumper on the main board (see Figure 4-5):
    1. Power off the system, but leave the AC power connected so the 5V standby is available.
    2. Verify that the CLR CMOS jumper is in the Clear position.
    3. Press the power on/off button.
    When the BIOS detects a reset CMOS request, either through the front panel or with the Clear CMOS jumper
106. em memory configuration of 2 x 128 MB (256 MB) and a maximum system memory configuration of 6 x 2 GB (12 GB) for the Sun Fire V65x server, or 6 x 1 GB (6 GB) for the Sun Fire V60x server, of DDR-266 or later registered compliant SDRAM. The server supports DIMM sizes of 128 MB, 256 MB, 512 MB, 1 GB, and 2 GB. The main board supports DDR-266 compliant registered(1) ECC DIMMs operating at 266 MHz. Only tested and qualified DIMMs are supported on the main board. Note that all DIMMs are supported by design, but only fully tested DIMMs will be supported.
    The minimum supported DIMM size is 128 MB. Therefore, the minimum main memory configuration is 2 x 128 MB, or 256 MB. The largest DIMM supported is a 2 GB stacked registered DDR-266 ECC DIMM based on 512-megabit technology. Therefore, the largest memory size supported is 12 GB.
    The memory system on the main board has the following features:
    - ECC single-bit errors are corrected and multiple-bit errors are detected.
    - The maximum memory capacity is 12 GB for the Sun Fire V65x server and 6 GB for the Sun Fire V60x server.
    - The minimum memory capacity is 256 MB.
    See Memory on page 5-9 for details on installing memory.
    (1) Registered DIMMs are those with an onboard latch that resynchronizes the address and control lines to the DIMM. These latches are also buffers that allow the Main Board electronics to drive multiple-row devices. It is most common for ECC SDRAM modules to be registered.
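The capacity limits above can be checked with simple arithmetic. The following is a minimal sketch, assuming a hypothetical helper named `total_memory_mb` and a dictionary layout invented for illustration; neither is part of any Sun utility. It uses only the figures stated in this guide (three banks of two sockets, supported DIMM sizes, and per-model maximums) and does not check whether a DIMM is on the tested and qualified list.

```python
# Sketch only: the names and data layout are illustrative assumptions.
SUPPORTED_DIMM_MB = {128, 256, 512, 1024, 2048}        # sizes listed in the text
MAX_MEMORY_MB = {"V60x": 6 * 1024, "V65x": 12 * 1024}  # 6 GB and 12 GB maximums
BANKS = ("1", "2", "3")                                # each bank has sockets A and B


def total_memory_mb(model, banks):
    """Return total memory in MB, or raise ValueError for an unsupported layout.

    banks maps a bank label ("1", "2", "3") to a pair (size_a_mb, size_b_mb).
    """
    if model not in MAX_MEMORY_MB:
        raise ValueError(f"unknown model: {model}")
    if not banks:
        raise ValueError("at least one DIMM pair (2 x 128 MB minimum) is required")
    total = 0
    for bank, pair in banks.items():
        if bank not in BANKS:
            raise ValueError(f"unknown bank: {bank}")
        if len(pair) != 2:
            raise ValueError(f"bank {bank}: DIMMs must be installed in pairs")
        for size in pair:
            if size not in SUPPORTED_DIMM_MB:
                raise ValueError(f"bank {bank}: unsupported DIMM size {size} MB")
        total += sum(pair)
    if total > MAX_MEMORY_MB[model]:
        raise ValueError(f"{total} MB exceeds the {model} maximum")
    return total
```

For example, three banks of 2 GB DIMM pairs total 12288 MB, which the sketch accepts for a V65x and rejects for a V60x.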
107. en Booting
    2.2.1.4 Some Hard Drives Do Not Show Up During POST
    2.2.1.5 Server Starts Booting Automatically at Power On
    2.2.1.6 Power-On Self Test (POST)
    2.2.1.7 Verifying Proper Operation of Key System LEDs
    2.2.1.8 Confirming Loading of the Operating System
    2.2.1.9 KVM (PS/2 Keyboard, Video, Mouse) Unit Causes System To Hang During POST
    2.2.2 Power LED Does Not Light
    2.2.3 Video Problems
    2.2.3.1 No Video Appears on the Screen
    2.2.3.2 Xserver Has Not Started
    2.2.3.3 Characters Are Distorted or Incorrect
    2.2.4 System Cooling Fans Do Not Rotate Properly
    2.2.5 Disk Drive Activity Light Does Not Light
    2.2.6 CD-ROM Drive Activity Light Does Not Light
    2.2.7 Cannot Connect to a Server
    2.2.8 Problems with Network
    2.2.9 Problems with Application Software
    2.2.10 Bootable CD-ROM Is Not Detected
    2.2.11 Memory Configuration Errors
    2.2.11.1 Memory DIMM Population Order
    2.2.11.2 Soft Reboot Errors
    2.2.11.3 Faulty Memory DIMMs
    2.3 Other Problems
    2.4 General Board and Feature Issues
    3. Troubleshooting the Server Using Built-In Tools
    3.1 Diagnosing System Errors
    3.1.1 LEDs
    3.1.2 Beep Codes
    3.1.3 POST Screen Messages
    3.1.4 System Utilities
    3.1.4.1 Platform Confidence Test (PCT)
    3.1.4.2 System Setup Utility (SSU)
108. er board and remove the fan module. Disconnect both ends of all remaining cables that connect the main board to other chassis components.
    Note: Some cables may be soldered at one end. Remove only the connectorized end.
    Remove the backplane board. Disconnect the USB ribbon cable from the server board and lay the free end back over the drive bays. Remove the heat sink, the processor air dam, and any processors and memory DIMMs that you wish to use with the new board. Remove the eight screws that secure the processor retention mechanisms and the three mounting screws that secure the server board to the chassis (see Figure 5-38).
    Caution: Make sure that all 11 screws have been removed before attempting to take out the board. Do not use any tools to try to pry the board out of the server. Attempting to do this could result in severe damage to the board.
    FIGURE 5-38 Location of the Mounting Screws
    Slide the board toward the front of the chassis until the I/O connectors are clear of the chassis I/O openings, lift the server board from the chassis, and place it on an antistatic pad. Remove the powe
109. ey server components. Topics in this chapter include:
    - Safety and Compliance Information on page xvii
    - Who Should Use This Book on page xviii
    - How This Manual is Organized on page xviii
    - Typographic Conventions on page xviii
    - Related Documentation on page xix
    - Ordering Sun Documents on page xix
    - Shell Prompts in Command Examples on page xix
    - Notice on page xx
    - Support on page xx
    - Sun Welcomes Your Comments on page xxi
    Safety and Compliance Information
    Before you service this product, refer to the important safety and compliance information in the Sun Fire V60x and Sun Fire V65x Server Safety and Compliance Guide (817-2028-10). This document is included on the Documentation CD that was shipped with your server and is also available online at http://www.sun.com/products-n-solutions/hardware/docs/Servers/Workgroup_Servers/Sun_Fire_V60x-V65x
    Who Should Use This Book
    The intended audience for this book is Sun field service personnel who are responsible for maintaining Sun Fire V60x and Sun Fire V65x Servers.
    How This Manual is Organized
    This manual contains the following chapters:
    - Chapter 1, Troubleshooting Guidelines, on page 1-1
    - Chapter 2, Troubleshooting Specific Problems, on page 2-1
    - Chapter 3, Troubleshooting the Server Using Built-In Tools, on page 3-1
    - Chapter 4, Powering On and Configuring the Server, on page
110. f the chassis
    FIGURE 5-33 Removing the Fan Module
    5.5.9.4 Sun Fire V65x Server Fan Module Replacement
    Replacing the fan module is essentially the reverse of the procedure described in Sun Fire V65x Server Fan Module Removal on page 5-44.
    1. Note the raised tabs on the chassis floor and the corresponding notches in the bottom of the fan module.
    2. Lower the fan module until it is just above the chassis floor.
    3. Align the notches in the fan module with the raised tabs on the chassis and lower the fan module onto the floor.
    4. While pressing down on the fan module, slide it to the right until the latch snaps into place.
    5. Plug the fan cables into the server board system fan connectors.
    6. Make sure the USB cable is routed along the top of the fan module.
    7. Connect the flex circuit cable (floppy/FP/IDE) to the backplane.
    8. Install the flex circuit cable retention clip.
    9. Install the full-height PCI riser board.
    10. Replace the plastic processor air duct.
    11. Thread the SCSI cable through the retaining hooks on the plastic processor air duct.
    12. Replace the chassis cover.
    Note
111. feature of the SSU allows you to select the boot order for all bootable peripheral devices. To select the boot device priority, double-click the MBA Boot Devices menu item in the Available Tasks pane of the SSU main window. The Multiboot Options Add-in window appears (see Figure 4-18).
    FIGURE 4-18 Multiboot Add-in Window
    To change boot priorities, select a boot device and use the Move Down and Move Up buttons to change the device boot priority.
    Note: This menu allows you to change the boot order without going into the BIOS setup.
    Configuring Security
    The Password Authorization feature of the SSU allows you to set BIOS passwords and other security options. To configure server security, double-click the PWA Security menu item in the Available Tasks pane of the SSU main window. The main Security window appears (see Figure 4-19).
    Note: This menu allows you to change the security settings without going into the BIOS setup.
    FIGURE 4-19 Security Main Window
    Use the Admin Password, User Password, and Options buttons to configure the security options.
112. following:
    - Are you using third-party SCSI adapters? System memory limitations limit the number and size of option ROMs in the system. If you install too many adapters, or adapters that take up too much space in memory, they may not install and show the hard drives connected to them.
    - If the Option ROM scan for your card or the onboard SCSI device has been disabled, no drives connected to that device will show up during POST.
    - Verify that the device power cable is firmly connected.
    - Check your SCSI ID numbers. SCSI devices must have their own unique ID on the SCSI bus. This number must be set with jumpers on the device. The ID number should be set starting at 0 and must be set lower than 8 if booting from the drive.
    - Check for proper termination on the SCSI bus.
    Note: In the unlikely event that the server does not boot, it may be that the server did not recognize the SCSI drive(s). If this happens, try booting again.
    - If you mix internal LVDS SCSI hard drives with different bus speeds in a single system, you may encounter problems. You may need to modify the SCSI Device Configuration settings to allow this type of configuration.
    - You may encounter issues when booting from internal SCSI drives when an external SCSI device or an external SCSI array has been added. To boot the system using internal drives, the SCSI BIOS settings may need to be modified for each channel. See the example below.
    1. To go into the SCSI BIOS setting
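The SCSI ID rules in the checklist above (each device needs a unique ID, and a boot drive needs an ID lower than 8) can be expressed as a short check. This is a minimal sketch under stated assumptions: the function name `check_scsi_ids` and its arguments are hypothetical, and the IDs would have to be read off the device jumpers by hand.

```python
from collections import Counter


def check_scsi_ids(ids, boot_id=None):
    """Return a list of problems found in the SCSI IDs on one bus.

    ids is a list of the ID numbers set on the devices (illustrative input).
    boot_id is the ID of the drive the server boots from, if any.
    """
    problems = []
    # Every device must have its own unique ID on the bus.
    for scsi_id, count in sorted(Counter(ids).items()):
        if count > 1:
            problems.append(f"ID {scsi_id} is used by {count} devices")
    # A drive used for booting must have an ID lower than 8.
    if boot_id is not None:
        if boot_id not in ids:
            problems.append(f"boot drive ID {boot_id} is not on the bus")
        elif boot_id >= 8:
            problems.append(f"boot drive ID {boot_id} must be lower than 8")
    return problems
```

An empty list means that neither rule is violated; it does not confirm termination, cabling, or option ROM settings, which still need to be checked as described in the list.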
113. forcing minimum back online Yes
    POST Error Beep Codes
    The tables in this section list the POST error beep codes. Prior to system video initialization, the BIOS and BMC use these beep codes to notify users of error conditions.
    TABLE 3-9 BMC-Generated POST Beep Codes
    Beep Code(1) | Description
    1 | One short beep before boot (normal, not an error)
    1-2 | Search for option ROMs. One long beep and two short beeps on checksum failure
    1-2-2-3 | BIOS ROM checksum
    1-3-1-1 | Test DRAM refresh
    1-3-1-3 | Test 8742 keyboard controller
    1-3-3-1 | Auto size DRAM. System BIOS stops execution here if the BIOS does not detect any usable memory DIMMs
    1-3-4-1 | Base RAM failure. BIOS stops execution here if entire memory is bad
    2-1-2-3 | Check ROM copyright notice
    2-2-3-1 | Test for unexpected interrupts
    1-5-1-1 | FRB failure (processor failure)
    1-5-2-2 | No processors installed, or processor socket 1 is empty
    1-5-2-3 | Processor configuration error (for example, mismatched VIDs)
    1-5-2-4 | Front-side bus select configuration error (for example, mismatched BSELs)
    1-5-4-2 | Power fault: DC power unexpectedly lost (for example, power good from the power supply was deasserted)
    1-5-4-3 | Chipset control failure
    1-5-4-4 | Power control failure (for example, power good from the power supply did not respond to power request)
    (1) The code
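When counting beeps at the server, a small lookup keyed on the beep groups can save a trip back to the table. The following is a minimal sketch whose entries are copied from Table 3-9; the names `BMC_BEEP_CODES` and `lookup_beep_code` are hypothetical and do not correspond to any utility shipped with the server.

```python
# Entries transcribed from Table 3-9 (BMC-generated POST beep codes).
BMC_BEEP_CODES = {
    "1": "One short beep before boot (normal, not an error)",
    "1-2": "Search for option ROMs (checksum failure)",
    "1-2-2-3": "BIOS ROM checksum",
    "1-3-1-1": "Test DRAM refresh",
    "1-3-1-3": "Test 8742 keyboard controller",
    "1-3-3-1": "Auto size DRAM (no usable memory DIMMs detected)",
    "1-3-4-1": "Base RAM failure (entire memory is bad)",
    "2-1-2-3": "Check ROM copyright notice",
    "2-2-3-1": "Test for unexpected interrupts",
    "1-5-1-1": "FRB failure (processor failure)",
    "1-5-2-2": "No processors installed, or processor socket 1 is empty",
    "1-5-2-3": "Processor configuration error (for example, mismatched VIDs)",
    "1-5-2-4": "Front-side bus select configuration error",
    "1-5-4-2": "Power fault: DC power unexpectedly lost",
    "1-5-4-3": "Chipset control failure",
    "1-5-4-4": "Power control failure",
}


def lookup_beep_code(groups):
    """Map counted beep groups, such as [1, 5, 2, 2], to the table description."""
    key = "-".join(str(count) for count in groups)
    return BMC_BEEP_CODES.get(key, "Unknown beep code: " + key)
```

For instance, `lookup_beep_code([1, 5, 2, 2])` returns the "No processors installed" entry, and an unlisted sequence returns an "Unknown beep code" string rather than raising an error.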
114. g beep and two short beeps on checksum failure
    1-2-2-3 | BIOS ROM checksum
    1-3-1-1 | Test DRAM refresh
    1-3-1-3 | Test 8742 keyboard controller
    1-3-3-1 | Auto size DRAM. System BIOS stops execution here if the BIOS does not detect any usable memory DIMMs
    1-3-4-1 | Base RAM failure. BIOS stops execution here if entire memory is bad
    2-1-2-3 | Check ROM copyright notice
    2-2-3-1 | Test for unexpected interrupts
    1-5-1-1 | FRB failure (processor failure)
    1-5-2-2 | No processors installed
    1-5-2-3 | Processor configuration error (for example, mismatched VIDs)
    1-5-2-4 | Front-side bus select configuration error (for example, mismatched BSELs)
    1-5-4-2 | Power fault
    1-5-4-3 | Chipset control failure
    1-5-4-4 | Power control failure
    Some Hard Drives Do Not Show Up During POST
    The server board includes an embedded Adaptec AIC-7902 controller, which provides dual Ultra320 Low Voltage Differential SCSI (LVDS) channels. The SCSI bus is terminated on the server board with active terminators that cannot be disabled. The onboard device must be at one end of the bus. The device at the other end of the cable must also be terminated. LVDS devices generally do not have termination built in and need to have a termination source provided. Non-LVDS devices generally are terminated through a jumper or resistor pack on the device itself.
    In the event that there is a problem with hard disk drives being recognized, check the
115. g test results.
    5. After the remaining test results are displayed, press any key to see the analog sensor readings, similar to the screens shown in Figure 4-27 and Figure 4-28.
    6. After the analog sensor readings are displayed, press any key to return to the main Platform Confidence Test menu.
    7. You can view the RESULT.LOG file in a manner similar to that previously explained.
    Comprehensive Test With Continuous Looping
    This test performs the same test as the Comprehensive Test, but runs continuously until stopped. To stop the testing and display the test pass count, press Ctrl-Break. The number of successful test loops executed is shown as "Pass Count n" at the upper right side of the screen. The run time for this test is approximately 15 to 20 minutes per pass, depending on the amount of memory installed.
    Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component.
    Run Baseboard Management Controller (BMC) Firmware Update
    Use this menu selection to update the Baseboard Management Controller (BMC) firmware. The BMC code resides both on the Sun Fire V60x and Sun Fire V65x servers Diagnostics CD and on the Service Partition, in the C:\BMC directory. The BMC firmware can also be updated from a standalone bootable floppy diskette.
    Run HSC Firmware Update
    Use
116. 5. Place the edge connector end of the replacement module onto the chassis floor and slide it toward the front of the chassis until the edge connector is fully inserted in the power distribution board connector.
    6. Make sure that the rear of the power supply is fully seated on the chassis floor and in front of the raised guides.
    7. Replace the chassis cover.
    Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See Run Platform Confidence Test (PCT) on page 4-31.
    5.5.6.2 Sun Fire V65x Server Power Supply
    The default configuration for the Sun Fire V65x server is a single 500-watt power supply. Optionally, you can add a second power supply module to provide a redundant (1+1) system.
    Caution: If you do not have the second (redundant) power supply module, you must take the server out of service before replacing the single module.
    To replace a power supply:
    1. Squeeze the module handle to depress the latch (Figure 5-27, panel 1).
    2. Rotate the handle down while pulling the module toward you (Figure 5-27, panel 2).
    3. As you pull the module out, support the module with your free hand.
    4. Insert a new power supply module in the bay.
    5. Grip the module handle, rotate it down, and push the module into the bay.
    6. When the module is nearly all of the way in, the handle will rotate up. At this time, push firmly on the front of the handle to lock the latch
117. h the diagnostics CD inserted in the CD tray.
    Note: Make sure that the boot sequence selects the diagnostics CD as a boot device before the hard disk. This is normally the case by default. Refer to BIOS Setup Utility <F2> on page 4-9 for more information on how to change the boot sequence.
    After bootup, verify that a DOS-like menu appears. It is similar to the Service Partition menu (see Service Partition Menu on page 4-24), except that there is one more item at the top of the screen for managing the service partition, as shown in Figure 4-32. This menu works in both local and remote console redirection modes and allows you to create driver and utilities diskettes, run system utilities, and create, format, or remove a service partition.
    FIGURE 4-32 Boot Complete from the Diagnostics CD (menu items: Create Diskettes, System Utilities, Service Partition)
    With the diagnostics CD menu, you can perform the operations listed in the following sections.
    4.12.1 Create Diskettes
    For more information on creating diskettes, see Create Diskettes on page 4-25.
    4.12.2 Run System Utilities
    For more information on running system utilities, see System Utilities on page 4-26.
118. he Diagnostics CD.
    2. Select Service Partition > Run Service Partition Administrator. The Service Partition Administration Menu shown in Figure 4-34 appears.
    FIGURE 4-34 Service Partition Administration Menu (Release 3.2.1; options: 1 View overview of service partition administration, 2 Scan for existing service partition, 3 Create service partition (first time), 4 Format service partition and install software, 5 Remove a service partition, ESC Exit)
    Select option 4 to reformat the partition. The partition will be reformatted and the diagnostic software will be loaded on the partition.
    Note: The service partition is mounted as diag when running Linux.
    If the service partition was completely removed (intentionally or accidentally) with Linux fdisk:
    In Linux, go into fdisk and manually create a type 12 partition that is at least 41 MB in size on either sda1 or sdb1 (or on the sdc drive, for that matter, if it is installed). Reboot the system from the CD. Select Service Partition > Run Service Partition Administrator, and then choose option 4. The partition will be reformatted and the diagnostic software will be loaded on the partition.
    For instructions on installing the service partition in the Sun Fire V60x
119. hed single beeps failed images BIOS already passed control to OS and flash utility
    Two long high-pitched beeps | Recovery complete | EFh | BIOS recovery succeeded; ready for powerdown or reboot
    POST Progress Code LED Indicators
    To help diagnose POST failures, a set of four bi-color diagnostic LEDs is located on the back edge of the server main board. Each of the four LEDs can have one of four states: Off, Green, Red, or Amber. The LED diagnostics feature consists of a hardware decoder and four dual-color LEDs. During boot block POST and post-boot-block POST, the LEDs display all normal Port 80 codes representing the progress of the BIOS POST. Each POST code is represented by a combination of colors from the four LEDs. The LEDs are in pairs of green and red. The POST codes are broken into two nibbles, an upper and a lower nibble. Each bit in the upper nibble is represented by a red LED, and each bit in the lower nibble is represented by a green LED. If a bit is set in both the upper and lower nibbles, then both the red and green LEDs are lit, resulting in an amber color. Likewise, if both bits are clear, the red and green LEDs are off.
    Figure 3-7 shows examples of how the POST LEDs are coded.
    FIGURE 3-7 POST LEDs as viewed from the back of the server (upper nibble bits drive red, lower nibble bits drive green; example: POST code 95h displays Red, Green, Off, Amber)
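The nibble encoding described above can be worked through in a few lines. This is a minimal sketch, assuming the LEDs are read most significant bit first (as in the Figure 3-7 example, where 95h shows Red, Green, Off, Amber); the function name `post_code_to_leds` is hypothetical.

```python
def post_code_to_leds(code):
    """Return the four LED colors, most significant bit first, for a POST code.

    Each LED combines one bit of the upper nibble (red) with the matching
    bit of the lower nibble (green); both set gives amber, neither gives off.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("POST code must fit in one byte")
    upper = code >> 4     # bits shown by the red LEDs
    lower = code & 0x0F   # bits shown by the green LEDs
    names = {(0, 0): "Off", (0, 1): "Green", (1, 0): "Red", (1, 1): "Amber"}
    colors = []
    for bit in (3, 2, 1, 0):
        red = (upper >> bit) & 1
        green = (lower >> bit) & 1
        colors.append(names[(red, green)])
    return colors
```

Working the 95h example by hand: the upper nibble is 1001 and the lower nibble is 0101, so the four LEDs read Red (1,0), Green (0,1), Off (0,0), and Amber (1,1), which matches the figure.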
120. 4.5.1.3 Network Boot <F12>
    If you press F12 to boot from the network, the server software looks for a valid boot file name on the network. If it finds such a file name, it boots from the network. If it cannot find a valid file name, it stops trying and continues to boot from the hard disk. Figure 4-10 is an example of how the screen appears when booting from the network fails.
    FIGURE 4-10 Network Boot Failed Screen (Intel(R) Boot Agent Version 4.0.19; messages: "PXE-E53: No boot filename received", "PXE-M0F: Exiting Intel PXE ROM")
    Pre-boot Execution Environment (PXE) is a method by which the server can be booted from a remote server. This allows the system to boot without any knowledge of the operating system on the server. The PXE environment uses DHCP to obtain network addresses. PXE is primarily used for loading operating systems, configuring the system, or burn-in type testing. PXE booting works only if a properly configured PXE server is available.
    Note: If you use the F12 key to initiate a boot from the network, you will not have the opportunity to choose the Ethernet port from which to boot. If you want to choose which port to boot from, use the Esc key to initiate the network boot.
    4.5.1.4 Choose Boot Device <ESC>
    Press the Esc key to go to the boot
121. ice. The Sun Fire V65x server, however, may be optionally equipped with a redundant power supply. In this case, it is possible to replace the defective power supply without powering down the server.
    Caution: Before touching or replacing any component inside the server, disconnect all external cables and follow the instructions in Safety: Before You Remove the Cover on page 5-2 and Removing and Replacing the Cover on page 5-3. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
    Sun Fire V60x Server Power Supply
    Before removing the cover to work inside the system, observe the safety guidelines previously mentioned. Follow these steps to replace the Sun Fire V60x server power supply:
    1. Unplug the power cord from the power source and the power supply module.
    2. Lift the rear of the module up (Figure 5-26, panel 2) only enough to clear the raised guides on the chassis floor.
    Caution: As shown in Figure 5-26, lift the rear of the power supply module up only enough to clear the raised guides. Lifting higher may damage the edge connector and power distribution board.
    3. Push the module to the rear of the chassis until it disengages from the power distribution board.
    4. Lift the module out of the chassis.
    FIGURE 5-26 Removing the Power Supply
122. igure 4-23. This phase of the testing determines your server configuration.

    Chassis Name
    Searching for Boot Block Device Info in BMC code
    Device Boot Info Block Revision 5 Build Date Oct 6 2662 14 35 63
    IMB Device BMC Slave Addr 26 Secondary Slave Addr A FW Revision 8 12
    Code Range 66h 2CHEh Boot Struct Checksum 6CE3h
    64K block Range h 2C Dh Boot Code Checksum 6666h
    Searching for OpCode Device Info in BMC code

FIGURE 4-23 Platform Confidence Quick Test (first screen)

More screen entries appear as the test progresses. Eventually, the screen shown in Figure 4-24 appears.

    Hardware Test Configuration
    577 KB
    Intel(R) Xeon(TM) Microprocessor 2791 MHz 0x2 Found
    3F8 is enabled
    1.4Mb 3.5 Inch
    LBA Sectors 71687369 Total Size 35663MB
    512 KB
    VGA compatible controller 256KB RAM ATI RAGE RL CONFIG_CHIP_ID x2 7684 752
    VESA 2 6 ATI MACH64 8 MB

FIGURE 4-24 Platform Confidence Quick Test Hardware Test Configuration (last screen)

This screen indicates the hardware configuration that has been determined from the initial tests.

2. If the hardware configuration does not match the configuration of your server, press the Ctrl and Break keys simultaneously (Ctrl-Break). You are prompted to check all the cables and your server configuration, and then you are returned to the Platform Confidence Test main menu.
123. indicates the beep sequence. For example, 1-5-1-1 means a single beep, then a pause, then 5 beeps in a row, then a pause, then a single beep, then a pause, and then finally a single beep.

TABLE 3-10 BIOS Generated Boot Block POST Beep Codes

Beep Code | Error Message | Description
1 | Refresh timer failure | The memory refresh circuitry on the motherboard is faulty.
2 | Parity error | Parity cannot be reset.
3 | Base memory failure | Base memory test failure. See Table 3-11 on page 3-21 for additional error details.
4 | System timer | System timer is not operational.
5 | Processor failure | Processor failure detected.
6 | Keyboard controller Gate A20 failure | The keyboard controller may be bad. The BIOS cannot switch to protected mode.
7 | Processor exception interrupt error | The CPU generated an exception interrupt.
8 | Display memory read/write error | The system video adapter is either missing or its memory is faulty. This is not a fatal error.
9 | ROM checksum error | System BIOS ROM checksum error.
10 | Shutdown register error | Shutdown CMOS register read/write error detected.
11 | Invalid BIOS | General BIOS ROM error.

TABLE 3-11 Memory 3-Beep and LED POST Error Codes

Beep Code | Debug Port 80h Error Indicator | Diagnostic LED Decoder, MSB to LSB (G = green, R = red, A = amber) | Meaning
3 | 00h | Off Off Off Off | No memory was found in the system.
3 | 01h | Off Off Off G | Memory mixed type detected.
3 | 02h
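The beep-code notation and Table 3-10 can be captured in a small lookup, which may help when logging what an operator heard. The following is only a sketch: the function names are hypothetical, the hyphen-separated notation ("1-5-1-1") is an assumption about how sequences are written down, and the dictionary holds just the error-message column of Table 3-10.

```python
# Error messages from Table 3-10 (BIOS-generated boot block POST beep codes).
BIOS_BOOT_BLOCK_BEEP_CODES = {
    1: "Refresh timer failure",
    2: "Parity error",
    3: "Base memory failure",
    4: "System timer",
    5: "Processor failure",
    6: "Keyboard controller Gate A20 failure",
    7: "Processor exception interrupt error",
    8: "Display memory read/write error",
    9: "ROM checksum error",
    10: "Shutdown register error",
    11: "Invalid BIOS",
}


def parse_beep_sequence(text):
    """Split a hyphen-separated sequence such as '1-5-1-1' into beep counts."""
    return [int(part) for part in text.split("-")]


def describe_beep_sequence(text):
    """Spell out a sequence as groups of beeps separated by pauses."""
    groups = parse_beep_sequence(text)
    parts = ["1 beep" if n == 1 else "%d beeps" % n for n in groups]
    return ", pause, ".join(parts)


def lookup_boot_block_code(beeps):
    """Return the Table 3-10 error message for a single beep count."""
    return BIOS_BOOT_BLOCK_BEEP_CODES.get(beeps, "Unknown beep code")


if __name__ == "__main__":
    print(describe_beep_sequence("1-5-1-1"))
    print(lookup_boot_block_code(3))
```

A count that is not in the table returns "Unknown beep code" rather than raising, since a miscounted beep is more likely than a new code.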
124. ion, press the switch once and power comes on; press again and the power goes off. It is recommended that you use the standard shutdown command before you power down the system using the switch. Activate the front panel switch to remove power only when the OS (for example, Linux) has completed the shutdown process and has halted.

Should you elect to use an ACPI-enabled Linux kernel and run the ACPI daemon (acpid), the behavior of the front panel switch changes to support the standard soft-off capability. That is, when the server is on and the power switch is pressed, the OS is notified and begins a graceful shutdown. Additionally, with ACPI enabled and the server on, pressing the power button for longer than four seconds forces an immediate, non-graceful shutdown. Note that the Solaris OS does not support ACPI.

4.4 Clearing CMOS

It may be necessary to clear CMOS memory in order to restore the default BIOS passwords required to boot the server (user) or access setup functions (supervisor), as well as the default BIOS settings. The CMOS configuration RAM may be reset by one of two methods:

- The CMOS clear button sequence from the front panel
- The Clear CMOS jumper located on the baseboard

The CMOS can also be set to a default setting through the BIOS Setup. It is automatically reset if it becomes corrupted.

4.4.1 Using the Front Panel

Follow these steps to clear the CMOS using the buttons on the front pan
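The front panel power switch behavior described in the powering-on text (with and without ACPI) can be summarized as a small decision function. This is an illustrative sketch only: the function name and the returned strings are hypothetical, and the power-on case when the server is off is assumed to be the same in both modes.

```python
def power_button_result(server_on, acpi_enabled, hold_seconds=0.0):
    """Return what a press of the front panel power switch is expected to do.

    server_on     -- True if the server is currently powered up
    acpi_enabled  -- True if an ACPI-enabled kernel and acpid are running
    hold_seconds  -- how long the button is held down
    """
    if not server_on:
        # Assumption: pressing the switch on a powered-down server
        # powers it up regardless of ACPI.
        return "power on"
    if not acpi_enabled:
        # As shipped (no ACPI) the switch acts as a plain on/off toggle,
        # so the OS should already be halted before it is pressed.
        return "immediate power off"
    if hold_seconds > 4:
        # Holding longer than four seconds forces a non-graceful shutdown.
        return "forced shutdown"
    # A normal press notifies the OS, which begins a graceful shutdown.
    return "graceful shutdown"


if __name__ == "__main__":
    print(power_button_result(server_on=True, acpi_enabled=False))
    print(power_button_result(server_on=True, acpi_enabled=True))
    print(power_button_result(server_on=True, acpi_enabled=True, hold_seconds=5))
```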
125. isk Drive Activity | Green | The Drive Activity LED on the front panel is used to indicate drive activity from the onboard SCSI controller. The server Main Board also provides a header giving access to this LED for add-in IDE or SCSI controllers.
    Blinking green (random): Hard disk activity
    Steady amber: Hard disk fault
    Off: No disk activity nor fault condition, or power is off

System ID | Blue | The blue System Identification LED is used to help identify a system for servicing when it is installed within a high-density rack or cabinet that is populated with several other similar systems. The System ID LED is illuminated when the system ID button, located on the front panel, is pressed. If activated by the front panel pushbutton, the LED remains on until the pushbutton is pressed again. The LED also illuminates when the server receives a remote System Identify command from a remote management console. In this case, the LED turns off after a timeout period. The timeout period is configurable, with a default of 15 seconds. An additional blue System ID LED on the Main Board is visible through the rear panel. It mirrors the operation of the front panel LED.

3.2.1.2 Front Panel Pushbuttons

The front panel pushbuttons are summarized in Table 3-3.

TABLE 3-3 Front Panel Pushbuttons

Switch | Function
Power/Sleep | This pushbutton is used to toggle the system power on and off. This bu
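The two ways of lighting the System ID LED described above (front panel pushbutton and remote System Identify command) can be modeled as a tiny state holder. This is a sketch under stated assumptions: the class and method names are hypothetical, time is passed in explicitly as seconds, and the way the two sources combine (the LED is lit if either one is active) is an assumption the excerpt does not spell out.

```python
class SystemIdLed:
    """Toy model of the blue System ID LED."""

    def __init__(self, remote_timeout_s=15.0):
        # 15 seconds is the default timeout stated in the text; it is configurable.
        self.remote_timeout_s = remote_timeout_s
        self._latched_by_button = False
        self._remote_off_at = None

    def press_button(self):
        """Front panel pushbutton: toggles the LED and it stays that way."""
        self._latched_by_button = not self._latched_by_button

    def remote_identify(self, now_s):
        """Remote System Identify command: LED is lit until the timeout expires."""
        self._remote_off_at = now_s + self.remote_timeout_s

    def is_on(self, now_s):
        """Assumed combination rule: lit if either source is active."""
        remote_active = (
            self._remote_off_at is not None and now_s < self._remote_off_at
        )
        return self._latched_by_button or remote_active


if __name__ == "__main__":
    led = SystemIdLed()
    led.remote_identify(now_s=0)
    print(led.is_on(now_s=10), led.is_on(now_s=20))
```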
126. it (CRU) Procedures

The following equipment is customer replaceable:

- Front Bezel
- DVD/Floppy or CD-ROM/Floppy Combo Module
- Memory
- CPU and Heatsink
- Power Supply Unit
- Hard Disk Drives
- Fan Module
- PCI Cards
- Battery
- Keyboard/Mouse Adapter
- Emergency Management Port Cable

Note: Any configuration changes (CPU, memory, hard disk, add-in PCI cards, and so forth) cause the server to revert to the factory default BIOS settings, regardless of how the server boot options have been set up using the System Setup Utility or the BIOS setup.

5.5.1 Front Bezel

To access the system controls and peripherals when a front bezel is installed, grasp the bezel at the finger hole on the left side and gently pull it towards you, unhinging it at the right, until it unsnaps from the chassis. Replace the bezel using the reverse process (see Figure 5-2 and Figure 5-3).

FIGURE 5-2 Sun Fire V60x Server Bezel Replacement (callouts: Chassis Handle, Bezel Locating Tab)

FIGURE 5-3 Sun Fire V65x Server Bezel Replacement (callouts: Chassis Handle, Bezel Locating Tab)

5.5.2 Floppy/DVD/CD-ROM Combo Module

Caution: A floppy/DVD/CD-ROM module is NOT hot swappable. Before replacing it, you must first take the server out of se
127. it Removal 5 67 Sun Fire V65x Server Cable Routing 5 68 Installing the Flex Cable 5 69 Installing the Backplane Retention Clip 5 70 Sun Fire V60x and Sun Fire V65x Servers System FRU Installation 5 72 Sun Fire V60x and Sun Fire V65x servers e November 2003 Tables TABLE 1 1 TABLE 1 2 TABLE 2 1 TABLE 2 2 TABLE 3 1 TABLE 3 2 TABLE 3 3 TABLE 3 4 TABLE 3 5 TABLE 3 6 TABLE 3 7 TABLE 3 8 TABLE 3 9 TABLE 3 10 TABLE 3 11 TABLE 3 12 TABLE 3 13 TABLE 3 14 TABLE 4 1 Jumper Function Summary 1 6 Supported Processors and Heatsinks 1 7 Index to Problems 2 2 Bootup Beep Codes 2 5 Server LEDs 3 3 Front Panel LEDs 3 5 Front Panel Pushbuttons 3 6 Rear Panel LEDs 3 8 System Status LED States 3 9 Power Supply Status LED States 3 12 Standard POST Error Messages and Codes 3 16 Extended POST Error Messages and Codes 3 17 BMC Generated POST Beep Codes 3 19 BlOS Generated Boot Block POST Beep Codes 3 20 Memory 3 Beep and LED POST Error Codes 3 21 BIOS Recovery Beep Codes 3 22 Boot Block POST Progress LED Code Table Port 80h Codes 3 24 POST Progress LED Code Table Port 80h Codes 3 25 Jumper Function Summary 4 3 XV xvi Sun Fire V60x and Sun Fire V65x servers November 2003 Preface This Troubleshooting Guide provides information on how to identify isolate and fix problems with the Sun Fire V60x and Sun Fire V65x servers It also explains how to remove and replace certain k
128. k Configured for DCD Signal (pin 7 connected to DCD)

4.3 Powering On

Caution: The power switch on the front of the server is an On/Off switch, but it does not isolate the equipment from the AC power being supplied through the AC power cord.

The location of the switch is shown in Figure 4-4.

FIGURE 4-4 Power and Reset Switches on the Front Panel (callouts: Power/Sleep Pushbutton, Reset Pushbutton)

Pushing the power pushbutton sends a signal to monitoring circuitry inside the server. The switch does not directly control high-voltage AC; it controls only low-voltage signals. When the monitoring circuitry detects that the button has been pressed, it activates the power supply and powers up the server. Likewise, when the server is powered up, pushing the switch powers the server down.

The main method for isolating the server from all high voltage is to physically remove the AC power cord. If the power cord is not removed, the only other way to isolate the server from high voltage is to open all external circuit breakers that supply AC voltage to the equipment.

Caution: As shipped, the Sun Fire V60x and Sun Fire V65x servers do not have Advanced Configuration and Power Interface (ACPI) enabled. As a result, the front panel power switch operates as a normal power switch. In this configurat
129. ket. Dispose of the battery according to local ordinances.
7. Remove the new lithium battery from its package.
8. Being careful to observe the correct polarity, lay the battery in the socket.
9. Push the battery down so that the metal retainer locks the battery in the socket.
10. Close the chassis.
11. Run the BIOS setup (press F2 when prompted at bootup) to restore the configuration settings to the RTC.
12. Restore your custom BIOS settings.

Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31.

5.5.12 Keyboard/Mouse Y Adapter

To install the PS/2 keyboard/mouse Y adapter, install the adapter into the round keyboard/mouse connector as shown in Figure 5-36.

FIGURE 5-36 Installing the PS/2 Keyboard/Mouse Y Adapter

5.5.13 Emergency Management Port Cable

Two different serial port adapters may be installed into the rear panel RJ-45 Serial (COM2) Emergency Management Port (EMP) connector:

- DSR Peripherals Cable
- DCD Modem Cable

5.5.13.1 Installing the DSR Peripherals Cable

Follow these steps to install the DSR Peripherals cable. Plug the DSR Peripherals cable into the rear panel RJ-45 Serial (COM2) connector as shown in Figure 5-37.
130. latform Confidence Test. To run this test:

1. Select Run Platform Confidence Test (reboot required) on the System Utilities submenu. The prompt shown in Figure 4-20 is displayed.

FIGURE 4-20 Reboot Prompt

Note: After the PCT is finished, you can view the results of the tests. However, the system will be rebooted to the service partition after you finish PCT.

2. Press any key to bring up the warning screen shown in Figure 4-21. Note that it is advisable to save the System Event Log to a floppy before running the PCT, and that it is recommended that testing not be interrupted during the hardware probe or during the BMC test.

    WARNING: This test utility, as part of its normal test processes, will automatically clear the System Event Log (SEL) upon exiting. We strongly urge you to save the SEL to a floppy diskette using the SSU Utility prior to running this test to maintain a permanent record of the SEL.
    The PCT, when run in continuous loop mode, is sensitive as to when the test is manually interrupted using the ctrl-break key combination. We recommend that the tests not be interrupted during the Hardware Probe or during the BMC test module. Aborting during the Hardware Probe or the BMC test can cause unpredictable results.
    Press ENTER to continue or ESC to exit.

FIGURE 4-21 Warning Prompt

3. Press Enter to continue or Esc to exit. If
131. lems | Reference
Problems Starting Up | page 2-3
Power LED Does Not Light | page 2-9
Video Problems | page 2-10
System Cooling Fans Do Not Rotate Properly | page 2-12
Disk Drive Activity Light Does Not Light | page 2-12
CD-ROM Drive Activity Light Does Not Light | page 2-12
Cannot Connect to a Server | page 2-13
Problems with Network | page 2-13
Problems with Application Software | page 2-14
Bootable CD-ROM Is Not Detected | page 2-14
Memory Configuration Errors | page 2-15

Try the solutions in the order given. If you cannot correct the problem, contact your service representative or authorized dealer for help.

2.2.1 Problems Starting Up

If the server does not start up properly, use the information in this section to diagnose problems.

2.2.1.1 Server Does Not Power On

If the server does not power on, check the following:

- Does the main server board have power? Open the chassis lid and check the 5V Standby LED on the baseboard to see if it is illuminated. If your server is plugged in, this LED should be green. See Figure 3-5, "Fault and Status LEDs on the Server Board," on page 3-13 for the location of this LED.
- Check the power cord connection. The Sun Fire V65x server allows the use of two power supplies, and the system will not power on if one power cord is used and it is plugged into the wro
132. ll Not Function

Some USB-connected CD-ROM devices perform unreliably on the Sun Fire V60x and Sun Fire V65x servers. Use the internal CD-ROM device when possible.

1.1.3 Inability to Boot Server When an External SCSI Hard Drive is Connected

The external SCSI bus is scanned for disk devices before the internal bus is scanned. As a result, the operating system may label external drives before internal drives. Exercise caution when adding and removing external devices, because the operating system drive device names may change, leaving the system unable to boot because the external SCSI device may not be a boot drive. See "Some Hard Drives Do Not Show Up During POST" on page 2-5 for more information and a procedure for forcing the system to boot from internal drives.

1.1.4 PS/2 Mouse Misidentification

A PS/2 pointing device (mouse) may be misidentified during Linux OS installation. To correct the mouse configuration for a Linux OS:

1. Type setup at the command prompt to run the setup tool.
2. Select the Mouse configuration option, then select the connected pointing device.
3. Save the change and exit the setup utility.

A Solaris OS automatically detects the mouse and, if it finds that it needs to change some information, it starts kdmconfig on reboot.

1.2 Resetting the Server

Quite often a problem can be solved merely be
133. ll removable devices in the order defined by the boot priority.

During POST, the server BIOS presents screen messages to indicate error conditions. POST also provides beep codes to give you audible clues regarding the performance and operation of the server when there is no video display that can present error messages. In addition, a set of four bi-color diagnostic LEDs is located on the back edge of the server main board. These LEDs are active during POST and indicate the state of the server. Each of the four LEDs can have one of four states: Off, Green, Red, or Amber. See "Power On Self Test (POST)" on page 3-15 for a complete description of the screen messages, beep codes, and diagnostic LEDs.

Verifying Proper Operation of Key System LEDs

As POST determines the system configuration, it tests for the presence of each mass storage device installed in the system. As each device is checked, its activity light should turn on briefly. Check to see if the disk drive activity light for each drive turns on briefly. If not, see "Disk Drive Activity Light Does Not Light" on page 2-12.

Confirming Loading of the Operating System

Once the system boots up, an operating system prompt similar to the one shown below appears on the screen (this example is for Linux):

    Linux release x.x
    Kernel 2.4.9-31enterprise on an i386
    <hostname> login:

The prompt varies acco
134. llows you to run the Platform Confidence Test in the same way that you run it from the service partition or from the diagnostics CD. With the diskette, you can perform the functions described in "Run Platform Confidence Test (PCT)" on page 4-31.

- Create BIOS Diskette: choosing this option creates one diskette. You may use this diskette to update the BIOS of any server.
- Create HSC Diskette: choosing this option creates one diskette that allows you to update the HSC firmware.
- Create BMC Diskette: choosing this option creates one diskette that allows you to update the BMC firmware.

4.11.2 System Utilities

Pressing Enter with the System Utilities menu item highlighted brings up the submenu shown in Figure 4-16.

Note: If you are running the Solaris operating environment, you must use a PS/2 keyboard when running System Utilities from the Service Partition.

    Run System Setup Utility
    Run Platform Confidence Test (reboot required)
    Run BMC Firmware Update
    Run HSC Firmware Update
    Run FRU/SDR Update
    Run BIOS Update (reboot required)
    Reboot to Service Partition
    Reboot System
    Sun Cobalt Grizzly Service Partition Menu, Beta 1 release
    Arrow keys move highlight, ENTER to select, ESC to abort

FIGURE 4-16 System Utilities Submenu

The following submenus are available:

- Run System Setup Utility
- Run Platform Confidence Test (reboot required)
- Run BMC Firmware Update
135. ments and suggestions. You can email your comments to Sun at docfeedback@sun.com. Please include the part number (817-2024-xx) of your document in the subject line of your email.

CHAPTER 1 Troubleshooting Guidelines

This chapter gives general guidelines and checklists to help you troubleshoot problems with the Sun Fire V60x and Sun Fire V65x servers in an efficient, organized manner. Following these guidelines will save you time and lead you more quickly to problem resolution.

This chapter contains the following sections:

- "Startup-related Issues" on page 1-1
- "Resetting the Server" on page 1-3
- "Disabling Integrated Components" on page 1-4
- "Setting Main Board Jumpers" on page 1-5
- "Processor and Heatsink Configurations" on page 1-6
- "Memory Configurations" on page 1-8
- "Problems With SNMP" on page 1-9
- "Problems With Initial System Startup" on page 1-10
- "Problems With New Application Software" on page 1-11
- "Problems After the System Has Been Running Correctly" on page 1-12

1.1 Startup-related Issues

1.1.1 Ethernet Port Delay

Ethernet ports may take a short amount of time (less than 1 second) to activate after ifconfig brings them up. This has been observed when the server is running Red Hat Linux v7.2 or v7.3.

1.1.2 USB Connected External CD ROM Drive Wi
136. meout action.

The BIOS is responsible for disabling the FRB2 timeout before initiating the option ROM scan, prior to displaying a request for a Boot Password, or prior to an Extensive Memory Test. The BIOS re-enables the FRB2 timer after the Extensive Memory Test. The BIOS provides a user-configurable option to change the FRB2 response behavior. The four options are:

- Disable on FRB2
- Never Disable
- Disable after three consecutive FRB2s
- Disable FRB2 timer

The option of Disable on FRB2 does the following: if the FRB2 timer expires (for example, a processor has failed FRB2), the BMC resets the system. As part of its normal operation, the BIOS obtains the watchdog expiration status from the BMC. If this status shows an expiration of the FRB2 timer, the BIOS logs an FRB2 event, with the event data being the last Port 80h code issued in the previous boot. The BIOS also issues a Set Processor State command to the BMC, indicating an FRB2 failure, and tells it to disable the BSP and reset the system. The BMC then disables the processor that failed FRB2 and resets the system, causing a different processor to become the BSP.

The option of Never Disable performs all the same functions as Disable on FRB2, with the exception that the BIOS does not send a Set Processor State command to the BMC. The BIOS still logs the FRB2 event in the SEL.

The option of Disable af
137. mind that unplugging the system or flipping a switch on the power strip both remove power.
- Follow the correct power removal sequence: make sure the system has shut down before removing the power cord.

Power On Self Test (POST)

Each time you turn on the system, the BIOS begins execution of POST. POST discovers, configures, and tests the processors, memory, keyboard, and most installed peripheral devices. The time needed to test memory depends on the amount of memory installed. POST is stored in flash memory.

To execute and monitor POST:

Turn on your video monitor and system. After a few seconds, POST begins to run and displays a splash screen. While the splash screen is displayed:

- Press <F2> to enter the BIOS Setup (see "BIOS Setup Utility <F2>" on page 4-9), OR
- Press <Esc> to view POST diagnostic messages and change the boot device priority for this boot only (see "Choose Boot Device <ESC>" on page 4-13), OR
- If the Service Partition is installed, press <F4> to run the System Setup Utility (see "Using the Service Partition Menu" on page 4-24).

If you do not press <F2>, <Esc>, or <F4> and do NOT have a device with an operating system loaded, the boot process continues and the system beeps once. The following message is displayed:

    Operating System not found

At this time, pressing any key causes the system to attempt a reboot. The system searches a
138. n off all peripheral devices connected to the system. Turn off the system by pressing the power button on the front of the system. Then unplug the AC power cord from the system or wall outlet. Label and disconnect all peripheral cables and all telecommunication lines connected to I/O connectors or ports on the back of the system. Before handling components, attach a wrist strap to a chassis ground of the system (any unpainted metal surface).

5.4 Removing and Replacing the Cover

Many of the equipment replacement procedures require that you remove the chassis cover. Before you remove the cover, observe the safety instructions in the section titled "Safety Before You Remove the Cover" on page 5-2.

To remove the cover, follow these steps. While pressing the blue latch button (A) with your left thumb, push down on the top cover and slide it back using the heel of your right hand on the blue pad (see Figure 5-1).

FIGURE 5-1 Removing the Cover

Set the cover aside and away from the immediate work area.

Note: A non-skid surface or a stop behind the chassis may be needed if attempting to remove the top cover on a flat surface. Sliding the server chassis on a wooden surface may mar the surface (there are no rubber feet on the bottom of the chassis).

5.5 Customer Replaceable Un
139. n only on the next reboot and will retest the processors and bring them back online if marked failed.
- When using multiple PCI adapter cards in a PCI riser with more than one slot, populate the slots from the bottom up.

2.4 General Board and Feature Issues

1. Why is the serial COM2 port implemented through an RJ-45 connector?

The server board is designed specifically for the High Density Rack Mount (HDRM) environment. Therefore, several HDRM features, such as a high-density serial port, have been implemented. The rear RJ-45 serial port is intended for serial concentrator applications. Several serial concentrators on the market are accessed through an RJ-45 serial port. In order to accommodate both serial concentrator communication standards and standard modem/UPS-type communication standards, there is a set of jumpers located directly behind the RJ-45 serial connector on the baseboard. These jumpers can be used to route the DSR signal to pin 7 on the connector for serial concentrator-type implementations, or to have the DCD signal routed to that pin for modem/UPS-type implementations. See "Setting the Serial COM2 Port Jumper" on page 4-4 for details.

2. How do I disable the integrated components?

Onboard controllers can be disabled through the server board BIOS setup. To enter BIOS setup, press F2 when prompted during the boot-up process.

3. What jumpers are available and how should they be set?

There are three headers on the main
140. nctional status of the server board. Glows green when all systems are operating normally. Glows amber when one or more systems are in a fault status. This LED mirrors the function of the system status LED on the front panel. See Table 3-5 on page 3-9 for a description of the LED states.
- 5V Standby LED: This green LED is on when the server is plugged into AC power, whether or not the server is actually powered on. AC power is applied to the system as soon as the AC cord is plugged into the power supply.
- System ID LED: This blue LED can be illuminated to identify the server when it is part of a large stack of servers. See "System ID LEDs" on page 3-15 for details.

3.2.6 System ID LEDs

A pair of blue LEDs, one at the rear of the server and one on the front panel, can be used to easily identify the server when it is part of a large stack of servers. A single blue LED, located at the back edge of the server board next to the backup battery, is visible through the rear panel. The two LEDs mirror each other and can be illuminated by the Baseboard Management Controller (BMC), either by pressing a button on the chassis front panel or through server management software. When the button is pressed on the front panel, both LEDs illuminate and stay illuminated until the button is pressed again. If the LED is illuminated through a remote System Identify command th
141. ng 4 6 components disabling 1 4 console redirection 3 26 contact technical support 3 29 cover replacing 5 3 CRU procedures 5 4 air baffle 5 38 battery 5 49 emergency management port cable 5 52 fan module 5 41 floppy CD ROM combo module 5 7 hard disk drives 5 36 keyboard mouse adapter 5 51 memory 5 9 PCI cards 5 47 D determining a faulty component 5 2 diagnostic CD using 4 45 disabling components 1 4 diskette BIOS 4 25 BMC 4 26 FRU SDR load utility 4 25 HSC 4 26 Platform Confidence test 4 25 system setup utility 4 25 E EMP Emergency Management Port 4 19 errors diagnosing 3 1 Extended memory 3 26 F Fault Resilient Booting FRB 4 19 faulty component determining 5 2 FRB 1 4 21 FRB 2 4 21 4 22 FRB 3 3 18 4 21 4 22 front panel LEDs and pushbuttons 3 4 FRU procedures 5 54 cable kit 5 61 installing a new CPU and heatsink 5 27 replacing a server CPU and heatsink 5 16 server main board 5 54 system FRU 5 71 H heatsink configuration 1 6 Hyper threading CPU Feature 4 16 J J5A2 4 4 jumpers J5A2 jumper block 4 4 jumpers main board 1 5 L Language 3 26 LEDs 3 5 hard disk drive activity 3 5 network connection activity 3 8 POST 3 8 power 3 5 power supply 3 8 rear panel 3 7 server main board fault 3 13 status 3 3 system ID 3 5 3 8 3 15 system status fault 3 5 LEDs and pushbuttons 3 3 loading the OS 4 16 M memory configuration 1
142. ng power connector.
- Is the flex circuit cable (labeled Floppy/FP/IDE) properly seated on the baseboard and backplane? And are any retention clips used to hold the cable in place properly installed?
- If you are using a Sun Fire V60x server, make sure that the power supply is fully seated in the power distribution board connector.
- Remove all add-in cards and see if the server boots using just the on-board components. If the server boots successfully, add the cards back in one at a time, with a reboot after each addition, to see if you can isolate a suspect card.
- Remove and reseat the memory modules. Ensure that you have properly populated the memory modules. On the main board, memory is populated in pairs. See "Memory" on page 5-9 for memory module installation and placement. Refer to the silkscreen on the main board for proper memory module placement. Try using memory modules from a known compatible server.
- Remove the processor(s) and reseat as a last resort.

Caution: Removing and replacing the processors is not recommended and should only be done as a last resort. This is a procedure that should be attempted by Sun-qualified service personnel. Instructions for removing and replacing processors are given in the section titled "Replacing a Server CPU and Heatsink" on page 5-16.

2.2.1.2 Front Panel is Unresponsive and Video is Disabled

If the front panel is unresponsive
143. nk and Processor Removal 5 18 Opening the Socket Lever 5 19 Inserting the Processor 5 20 Closing the Socket Lever 5 20 Installing the Heatsink Retention Clip Details 5 21 Sun Fire V65x Server Heatsink and Processor Removal 5 23 Opening the Socket Lever 5 24 Inserting the Processor 5 25 Closing the Socket Lever 5 25 Installing the Heatsink Retention Clip Details 5 26 Opening the Socket Lever 5 28 Inserting the Processor 5 29 Closing the Socket Lever 5 29 Applying Thermal Conducting Material 5 30 Installing the Heatsink 5 30 Installing the Heatsink Retention Clip Details 5 31 Removing the Power Supply 5 33 Replacing the Power Supply 5 35 Removing a HDD Assembly From a Bay 5 37 Removing the Air Baffle 5 38 Installing the Air Baffle 5 39 Removing the Air Baffle 5 40 Removing the Fan Module 5 42 xiii xiv FIGURE 5 33 FIGURE 5 34 FIGURE 5 35 FIGURE 5 36 FIGURE 5 37 FIGURE 5 38 FIGURE 5 39 FIGURE 5 40 FIGURE 5 41 FIGURE 5 42 FIGURE 5 43 FIGURE 5 44 FIGURE 5 45 FIGURE 5 46 FIGURE 5 47 Removing the Fan Module 5 45 Removing a PCI Card 5 48 Replacing the Backup Battery 5 50 Installing the PS 2 Keyboard Mouse Y Adapter 5 51 Installing the EMP Cable 5 53 Location of the Mounting Screws 5 56 Location of the Mounting Screws 5 59 Sun Fire V60x Server Cable Kit Removal 5 62 Sun Fire V60x Server Cable Routing 5 63 Removing the SCSI Backplane 5 64 Sun Fire V65x Server Cable K
144. ns of the full-height heatsink.

6. Does it matter which processor is populated first?

Yes. Processor 2 is the processor closest to the outside edge of the board and is labeled CPU 2. Processor 1 is the processor closer to the center of the board and is labeled CPU 1. The server will not boot if only one processor is installed and it is in the CPU 2 socket. When two processors are installed, the server board is designed in such a way that it can boot from either processor, using a technique called Fault Resilient Booting (FRB). If CPU 1 fails to respond in a designated amount of time during POST, CPU 2 is used to complete the boot-up sequence. In the event of a single-processor configuration, the board halts during the boot process, displays a message that it is forcing itself to boot from a potentially bad processor, and continues once you have acknowledged the message. The system bus is automatically terminated; an empty CPU 2 socket does not require a terminator.

7. What memory configurations are supported on the server board?

The server board has slots for six Double Data Rate (DDR) DIMMs and can support a minimum system memory configuration of 256 MB and a maximum system memory configuration of 12 GB. The board supports DIMM sizes of 128 MB, 256 MB, 512 MB, 1 GB, and 2 GB. DDR 200 or DDR 266 memory can be used, but speed is locked at 200 MHz. Memory must be populated in pairs due to dual channel and interleaving supported by th
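The memory rules in answer 7 (six slots, DIMMs installed in pairs, the listed DIMM sizes, 256 MB minimum, 12 GB maximum) lend themselves to a quick sanity check before ordering or installing modules. The sketch below is illustrative only: the function and constant names are hypothetical, sizes are given in megabytes, and it does not check whether the two DIMMs of a pair must match, because the excerpt does not say.

```python
SUPPORTED_DIMM_MB = {128, 256, 512, 1024, 2048}  # DIMM sizes listed in the text
DIMM_SLOTS = 6
MIN_TOTAL_MB = 256
MAX_TOTAL_MB = 12 * 1024


def check_memory_config(dimm_sizes_mb):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if len(dimm_sizes_mb) > DIMM_SLOTS:
        problems.append("more DIMMs than the six available slots")
    if len(dimm_sizes_mb) % 2 != 0:
        problems.append("DIMMs must be installed in pairs")
    unsupported = sorted(set(dimm_sizes_mb) - SUPPORTED_DIMM_MB)
    if unsupported:
        problems.append("unsupported DIMM size(s): %s MB" % unsupported)
    total = sum(dimm_sizes_mb)
    if total < MIN_TOTAL_MB:
        problems.append("total memory is below the 256 MB minimum")
    if total > MAX_TOTAL_MB:
        problems.append("total memory is above the 12 GB maximum")
    return problems


if __name__ == "__main__":
    print(check_memory_config([128, 128]))   # smallest valid configuration
    print(check_memory_config([512]))        # a single DIMM is not a pair
```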
145. nt panel cable Figure 5 41 F from the front panel board to the backplane and attach it to the matching connector Attach one end P1 of the flex circuit cable Figure 5 41 C to the floppy front panel IDE connector on the server board Secure the P1 end of the flex cable with the blue plastic screw in retention clip Route the flex cable to the backplane board and attach the opposite cable end to the matching connector on the backplane Caution After connecting the flex cable ensure that each cable connector is properly seated in the board connector The connector should be parallel to its board connector and not cocked to one side If in doubt remove reinsert and recheck it Connect the auxiliary signal cable Figure 5 41 B from the power distribution board to the 5 pin auxiliary signal connector on the server board Route the power cable A from the power distribution board to the backplane board and insert it in the white 6 pin connector Route the auxiliary power cable H from the power distribution board to the server board and insert it in the white 8 pin connector On the round SCSI cable D locate the end that is labeled server board Connect that end to the SCSI connector on the server board Route the cable between the PCI connector and memory slots on the server board and then to the backplane board Attach the cable connector to the connector on the backplane board Replace
146. nterface test A keyboard reset error or stuck key was found Issuing the keyboard controller interface test command next 84h R G Off Off Check stuck key enable keyboard the keyboard controller interface test is complete Writing the command byte and initializing the circular buffer next 86h R G G Off Disable parity NMI the command byte was written and global data initialization has completed Checking for a locked key next 88h A Off Off Off Display USB devices 8Ah A Off G Off Verify RAM size Checking for a memory size mismatch with CMOS RAM data next 8Ch A G Off Off Lock out PS 2 keyboard mouse if unattended start is enabled 8Eh A G G Off Initialize boot devices the adapter ROM had control and has now returned control to the BIOS POST Performing any required processing after the option ROM returned control 90h R Off Off R Display IDE mass storage devices 92h R Off G R Display USB mass storage devices 94h R G Off R Report the first set of POST errors to Error Manager 96h R G G R Boot password check the password was checked Performing any required programming before Setup next 98h A Off Off R Float processor initialize performing any required initialization before the coprocessor test next 9Ah A Off G R Enable Interrupts 0 1 2 checking the extended keyboard keyboard ID and NUM Lock key next Issuing the keyboard ID command next 9Ch A G Off R Initialize FDD devices Report second set of POST errors to error me
147. o ge al RCVR BOOT Clear Normal CLR PSWD Clear Normal CTI 100 CLR CMOS 11 al Write En end Normal 200 ojo J5A2 1060 3 4 056R0 1 2 DCD FIGURE 1 1 Main Board Jumper Locations Chapter 1 Troubleshooting Guidelines 1 5 TABLE 1 1 Jumper Function Summary Designator Jumper Name Action at System Reset J5A2 RJ 45 Serial COM2 Port Configures either a DSR or a DCD signal to the connector Configuration See Rear Panel RJ 45 Serial COM2 Connector in Chapter 2 of the Sun Fire V60x and Sun Fire V65x Server User Guide and Setting the Serial COM2 Port Jumper on page 4 4 in this document CLR CMOS Clear CMOS If these pins are jumpered the CMOS settings are cleared These pins should not be jumpered for normal operation CLR PSWD Clear Password If these pins are jumpered the password is cleared These pins should not be jumpered for normal operation RCVR BIOS Boot Recovery If these pins are jumpered the system will attempt BIOS BOOT recovery These pins should not be jumpered for normal operation BMC BB WE BMC Boot Block Write Enable If these pins are jumpered BMC boot block is erasable and
148. omatically after the update is complete. The BIOS code resides both on the Sun Fire V60x and Sun Fire V65x servers Diagnostics CD and on the Service Partition in the C:\BIOS directory. The BIOS can also be updated from a standalone bootable floppy diskette. See Updating the Server Configuration on page 4-50 for information on how to update the BIOS. Reboot to Service Partition: Selecting this menu item causes a reboot to the service partition. Reboot System: Choosing this menu item causes a reboot, maintaining normal boot device ordering. (Footnote 4: The update files are on the hard drive service partition.) 4.12 Using the Sun Diagnostics CD. The Sun Fire V60x and Sun Fire V65x servers Diagnostics CD provides basic server configuration software through a text-based menu program that runs when the server boots from the CD. After bootup, the CD runs a menu program that allows the user to create driver diskettes, run utilities, update system components, and install and manage service partitions. The CD ships with all Sun Fire V60x and Sun Fire V65x servers and can be used by Sun service personnel as well as end customers. Note: If you are running the Solaris operating environment, you must use a PS/2 keyboard when using the Diagnostics CD. To begin using the Diagnostics CD, follow these steps: Reboot the Sun Fire V60x and Sun Fire V65x servers wit
149. on an anti static surface 6 Install the new HDD assembly into the drive bay by inserting the tab end left end of the retention lever into the housing slot and gently closing the lever Note Closing the lever should seat the HDD into the backplane connector If the drive does not insert or seat properly do not force the lever Instead check again to make sure the tab of the retention lever is properly inserted into the housing before closing the lever 7 Reinstall a carrier in any bays where you are not reinstalling a HDD assembly 5 36 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 ad Bezel D Chassis Handle amp Bezel Locating Tabs amp Retention Lever FIGURE 5 28 Removing a HDD Assembly From a Bay Note The Comprehensive Test should be run after changing any FRU CRU or adding an optional component See Run Platform Confidence Test PCT on page 4 31 Chapter 5 Maintaining the Server 5 37 5 5 8 Air Baffle This section explains how to remove and replace the air baffles for the servers Caution Before touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers disconnect all external cables and follow the instructions in Safety Before You Remove the Cover on page 5 2 and Removing and Replacing the Cover on page 5 3 Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap
150. on clip and strike the server board, possibly causing severe damage to the board or board components. In addition, if too much force is used, you may bend the heatsink retention clip to a point where it may be difficult to replace it without bending it back to its original position. Secure each end of the retention clip to the tabs in the processor retainer by aligning the clip holes over the tabs and pushing down. a. With the tool in the vertical position, firmly grasp it and insert the middle prong of the tool securely into the hole at the center of the retention clip. b. Slowly and carefully push the tool downward, making sure the center prong of the tool stays in the retention clip hole. c. As you continue to exert downward pressure, move the top of the handle slightly in a direction away from the heatsink so that the clip is pushed away from the retainer and the hole in the center of the clip is aligned over the retainer tab. d. Gradually move the top of the tool handle back toward the heatsink in such a manner as to slide the center of the clip over the retainer tab, securing it in place. 10. Replace the air baffle, fan module, and processor air duct. Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See Run Platform Confidence Test (PCT) on page 4-31. 5.5.4.4 Sun Fire V65x Server Heatsink and Processor Removal
151. onstitute a multiple-bit error, which is unrecoverable. When a multiple-bit memory error is detected, a non-maskable interrupt (NMI) is issued that instructs the system to shut down to avoid data corruption. Multiple-bit errors are very rare. Note that neither the Linux nor the Solaris operating system supports NMI events. Memory scrubbing: Error correction is performed on data being read from memory. The correction is then passed to the requestor and, at the same time, the error is scrubbed (corrected) in main memory. Memory scrubbing prevents the accumulation of single-bit errors in main memory that would then become unrecoverable multiple-bit errors. (Footnote 1: Registered DIMMs are those with an onboard latch that resynchronizes the address/control lines to the DIMM. These latches are also buffers that allow the main board electronics to drive multiple-row devices. It is most common for ECC SDRAM modules to be registered.) x4 single device data correction (x4 SDDC): When x4 memory is installed, the ECC function can detect and correct a four-bit error caused by a single failed memory chip, and the system continues to function, though system performance will be affected. When x8 memory is installed, the ECC function will detect an eight-bit error caused by a single failed memory chip but will not be able to correct the error. In this situation, a fatal error will be issued. For part numbers of optional
152. oost Reading 27h > 39.00 degrees C. Press Any Key to Continue <DONE>. FIGURE 4-28 Platform Confidence Quick Test Sensor Readings (second screen). Press any key to return to the main Platform Confidence Test menu. To view the test results, follow this procedure: Return to the System Utilities submenu (see Figure 4-16 on page 4-26) and use the Quit to DOS menu selection to exit to DOS. Change directories to C:\PCT. Type the following command: type RESULT.LOG | more. A portion of the RESULT.LOG file is displayed each time you press a key. In this way you can see the results, which are divided into the following sections: BIOS ID, Hardware Configuration, Test Summary, Analog Sensor Readings. The RESULT.LOG file is overwritten each time you run a test. A sample of the RESULT.LOG file is shown in Figure 4-29. [FIGURE 4-29 Sample RESULT.LOG] Comprehensive Test: This test fully exe
153. ore touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers disconnect all external cables and follow the instructions in Safety Before You Remove the Cover on page 5 2 and Removing and Replacing the Cover on page 5 3 Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap You can use the cable kit to replace one or more internal server cables The procedures given below assume that you are removing and replacing all of the cables Sun Fire V60x Server Cable Kit Removal Before removing the cover to work inside the system observe the safety guidelines previously given To remove cables remove the cover and refer to Figure 5 40 Figure 5 41 and Figure 5 42 while following these steps Remove the air baffle Remove the fan module Unscrew and remove the blue plastic flex cable retention clip from the server board Remove all hard disk drives including blanks and the floppy CD ROM combo drive Unplug the backplane power cable server board auxiliary signal cable and server board auxiliary power cable shown in Figure 5 40 panel 2 Remove the round SCSI cable Remove the flex circuit cable floppy FP IDE that runs from the connector on the server board to the connector on the backplane Remove the front panel cable that runs from the front panel board to the backplane Chapter 5 Maintaining the Server 5 61
154. our bi-color diagnostic LEDs is located on the back edge of the baseboard. Each of the four LEDs can have one of four states: Off, Green, Red, or Amber. During the POST process, each light sequence represents a specific Port 80 POST code. If a system should hang during POST, the diagnostic LEDs present the last test executed before the hang. When reading the lights, observe the LEDs from the back of the system. The most significant bit (MSB) is the first LED on the left, and the least significant bit (LSB) is the last LED on the right. See POST Progress Code LED Indicators on page 3-22 for details regarding the POST LED display. CPU Fault LEDs: A fault indicator LED is located next to each of the processor sockets. If the server Baseboard Management Controller (BMC) detects a fault in any processor, the corresponding LED illuminates. Memory Fault LEDs: A fault indicator LED is located next to each of the DIMM sockets. If the BMC detects a fault in a given DIMM, the corresponding LED illuminates. The LED for a given DIMM is illuminated if that DIMM has an uncorrectable or multi-bit memory error. The LEDs maintain the same state across power switch power-down or loss of AC power. Fan Fault LEDs: Depending on the server model, the fan header may include a fan fault LED. If the BMC detects a fan fault, the LED illuminates. If the fan fault LED is lit, the entire fan module must be replaced. System Status LED: Indicates fu
155. oved in the process of installing additional DIMM memory 1 Observe all safety precautions and remove the server top cover 2 Slide the DIMM fan assembly over the vertical support bars until the assembly snaps into place see Figure 5 8 and follow steps a through d below FIGURE 5 8 Vertical Fan Support Bar Location 5 14 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 a Orient the support bars so that the curved bottom aligns with the notches in the two middle DIMM ejector bars b Gently push each support bar onto the two ejector bars until they are held firmly in place Make sure the DIMMs stay securely seated in their sockets c Slide the fan assembly down over the two support bars d The flexible tabs at the top of the support bars lock the fan assembly in place 3 Connect the DIMM fan cable to the 3 pin header on the server main fan pack see Figure 5 9 to DIMM fan FIGURE 5 9 Connecting the DIMM Fan Power Cable 4 Replace the server cover Chapter 5 Maintaining the Server 5 15 5 5 4 Replacing a Server CPU and Heatsink Caution The procedure below is for the attention of qualified service engineers only Before touching or replacing any component inside the Sun Fire V60x and Sun Fire V65x servers disconnect all external cables and follow the instructions in Safety Before You Remove the Cover on page 5 2 and Removing and Replacing the Cover on page 5 3
156. pread apart the DIMM ejector bars using the vertical support bars This will eject installed DIMMS from the sockets b Place the new DIMMs in the sockets but do not press them all the way in because the socket latches on each side are tied together by the vertical support bars c Bring the two vertical support bars together enough to engage the keyed half moons on all the DIMMs d Gently press each DIMM one at a time to engage its socket then firmly to fully seat 4 If you are replacing DIMMs whose ejector bars are not engaged by the DIMM fan vertical support bars a Make sure the ejector bars are in the open position b Align the replacement DIMM notch with the connector slot notch and apply even downward pressure on the DIMM until it slides into the connector slot The ejector bars will snap inward and lock the memory module in place Chapter 5 Maintaining the Server 5 13 5 Replace the DIMM fan assembly as explained in Section 5 5 3 3 Installing the DIMM Fan Sun Fire V65x Server Only on page 5 14 Note The Comprehensive Test should be run after changing any FRU CRU or adding an optional component See Run Platform Confidence Test PCT on page 4 31 553 3 Installing the DIMM Fan Sun Fire V65x Server Only Note The Sun Fire V65x server is shipped with the DIMM fan installed This installation procedure is provided to enable installation of the assembly if it is completely rem
157. processor 2 is the processor closer to the corner of the board (see Figure 5-10). The server board is designed so that it can boot from either processor. If the primary processor fails to respond within a designated amount of time during POST, the secondary processor is used to complete the boot-up sequence. In a single-processor configuration, the board halts during the boot process, displays a message indicating that it is forcing itself to boot from a potentially bad processor, and continues after the user has acknowledged the message. For normal operation, it is best if processor 1 is populated first and then processor 2; however, in the event of a mistake or a failed processor, the server is able to compensate. 1.5.5 Hyper-threading CPU Feature. The Sun Fire V60x and Sun Fire V65x servers feature Hyper-threading capable processors. Enabling Hyper-threading causes each physical CPU to act as two logical CPUs. Enabling Hyper-threading on dual-processor Sun Fire V60x and Sun Fire V65x servers causes the operating system to recognize four distinct processors. Hyper-threading may be enabled or disabled in the system BIOS configuration menu. Refer to the Sun Fire V60x and Sun Fire V65x Server User Guide, Chapter 4, for instructions on how to enable or disable this feature. 1.6 Memory Configurations. The server has slots for six 168 pin DIMMs and can support a minimum syst
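As a quick way to confirm the Hyper-threading arithmetic in this excerpt (each physical CPU appears as two logical CPUs, so a dual-processor server reports four), the sketch below compares the expected count with what the operating system reports. It is an illustration only: the function names are hypothetical, and `os.cpu_count()` is the standard Python call for the number of logical CPUs visible to the OS.

```python
import os


def expected_logical_cpus(physical_cpus, hyper_threading_enabled):
    """Logical CPU count implied by the text: two per physical CPU with
    Hyper-threading enabled, one per physical CPU otherwise."""
    if physical_cpus not in (1, 2):
        raise ValueError("these servers hold one or two processors")
    return physical_cpus * (2 if hyper_threading_enabled else 1)


def os_matches_expectation(physical_cpus, hyper_threading_enabled):
    """Compare the OS-reported logical CPU count with the expected value."""
    return os.cpu_count() == expected_logical_cpus(
        physical_cpus, hyper_threading_enabled
    )


if __name__ == "__main__":
    print("Expected with 2 CPUs and Hyper-threading on:",
          expected_logical_cpus(2, True))
    print("OS currently reports:", os.cpu_count())
```

A mismatch does not by itself indicate a fault; it may only mean that the BIOS setting differs from what was assumed.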
158. programmable at next reset. These pins should not be jumpered for normal operation. 1.5 Processor and Heatsink Configurations. This section gives general information regarding the processors and heatsinks used in the Sun Fire V60x and Sun Fire V65x servers. 1.5.1 Single or Dual Processor Main Boards. The servers run with dual processors or with a single processor. A single-processor system must have a processor installed in processor socket 1, and socket 2 must be empty. No terminator needs to be installed in processor socket 2 in a single-processor configuration. Processor 1 is the processor nearest the middle of the board. Processor 2 is located near the edge of the board. 1.5.2 Supported Processors. The server main board supports the 2.8 and 3.06 GHz Intel Xeon self-terminating processors (see Table 1-2). TABLE 1-2 Supported Processors and Heatsinks (columns: Marketing Part Number (Sun), FG Part Number (Sun), Description): 595-6943-01, 2.8 GHz Processor with heatsink; 595-6944-01, 3.06 GHz Processor with heatsink. 1.5.3 Heatsinks and Air Ducts. Boxed processors come with the appropriate heatsink and clear plastic air duct for integration into the server. When installing a heatsink into the server, the heatsink minus the fan should be used. 1.5.4 Processor Population Order. Processor 1 is the processor closest to the inside of the board and
159. r distribution board from the white 24 pin power connector Place the server board in an antistatic bag Remove the replacement server board from its packaging and antistatic bag Connect the power distribution board to the 24 pin power connector on the new server board Ensure that the Mylar insulator sheet is seated securely over the standoffs is laying flat on the chassis floor and that the edge of the sheet is seated below the studs in the rear chassis wall Insert the back edge of the board under the three retention pins located at the rear of the chassis While placing the board on the chassis standoffs carefully align the board I O connectors with the rear chassis I O openings Adjust the board s position so that the three mounting holes rest securely on the shouldered standoffs 5 56 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Reattach the board to the chassis using the three mounting screws Note The server board uses three holes to mount the board to the chassis standoffs Install the processor retention mechanisms using the eight screws you removed earlier along with the processor s heatsink s and DIMMs that you wish to use with the new board If you only have one processor install the processor air dam in the outer processor location Lay the USB ribbon cable in the proper
160. r is detected 64h Off A R Off Start USB controllers in chipset 66h Off Off Set up video parameters in BIOS data area 68h G R R Off Activate ADM the display mode is set Displaying the power on message next 6Ah G R A Off Initialize language module Display splash logo 6Ch G A R Off Display sign on message BIOS ID and processor information 6Eh G A A Off Detect USB devices 70h Off R R R Reset IDE Controllers 72h Off R A Displaying bus initialization error messages 74h Off A R Display setup message the new cursor position has been read and saved Displaying the hit setup message next 76h Off A A R Ensure timer keyboard interrupts are on 78h G R R Extended background memory test start 7Ah G R A R Disable parity and NMI reporting 3 26 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 TABLE 3 14 POST Progress LED Code Table Port 80h Codes Continued POST Diagnostic LED Decoder Code G green R red A amber Description 7Ch G A R R Test 8237 DMA controller the DMA page register test passed Performing the DMA controller 1 base register test next 7Eh G A A R Initialize 8237 DMA controller the DMA controller 2 base register test passed Programming DMA controllers 1 and 2 next 80h R Off Off Off Enable mouse and keyboard the keyboard test has started Clearing the output buffer and checking for stuck keys Issuing the keyboard reset command next 82h R Off G Off Keyboard i
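The complete rows in this table follow a regular pattern that can be used to translate between a Port 80h code and the four LED colors. The sketch below is inferred from those rows and is not taken from the guide itself: it assumes that the upper nibble of the code switches the red element of each LED, the lower nibble switches the green element, and both together show amber, with the leftmost LED (viewed from the rear) being the most significant bit. The function names are hypothetical. Only even codes appear in this excerpt, so the green element of the rightmost LED is never exercised by the rows shown.

```python
from typing import List

# (red_bit, green_bit) -> LED state, using the table's abbreviations.
LED_STATES = {(0, 0): "Off", (0, 1): "G", (1, 0): "R", (1, 1): "A"}
STATE_BITS = {state: bits for bits, state in LED_STATES.items()}


def post_code_to_leds(code: int) -> List[str]:
    """Translate a Port 80h POST code into four LED states, leftmost first.

    Assumption inferred from the table rows: upper nibble = red bits,
    lower nibble = green bits, bit 3 of each nibble = leftmost LED.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("POST code must fit in one byte")
    upper = (code >> 4) & 0xF
    lower = code & 0xF
    leds = []
    for bit in (3, 2, 1, 0):  # most significant bit first (leftmost LED)
        red = (upper >> bit) & 1
        green = (lower >> bit) & 1
        leds.append(LED_STATES[(red, green)])
    return leds


def leds_to_post_code(leds: List[str]) -> int:
    """Inverse translation: four LED states (leftmost first) to a POST code."""
    if len(leds) != 4:
        raise ValueError("exactly four LED states are required")
    upper = lower = 0
    for state in leds:
        red, green = STATE_BITS[state]
        upper = (upper << 1) | red
        lower = (lower << 1) | green
    return (upper << 4) | lower


if __name__ == "__main__":
    for code in (0x64, 0x7C, 0x80):
        print("%02Xh" % code, post_code_to_leds(code))
```

Reading the LEDs on a hung system and passing them to `leds_to_post_code` gives the code to look up in the table; treat the result as a hint and confirm it against the printed table.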
161. range to return the system to Sun for repair under the terms of your warranty Or if under a Sun Service agreement the FRU will be replaced by a Sun Service engineer If a Customer Replaceable Unit CRU needs replacement you can either request a replacement part from Sun or return the entire unit for repair All parts replaced under the system warranty must be returned to Sun within 30 days of receipt of the replacement part Note When working on a server you may want to turn on the blue System ID LED to identify the server that is being worked on See LEDs on page 3 1 for instructions on how to turn on this LED 5 1 or Tools and Supplies Needed All that is needed is an antistatic wrist strap recommended 3 2 Determining a Faulty Component To determine and isolate a faulty component refer to Troubleshooting the Server Using Built In Tools on page 3 1 This chapter can help you isolate a faulty component using the following methods ma Fault and Status LEDs see LEDs on page 3 1 a POST LEDs beep codes and displayed error messages see Diagnosing System Errors on page 3 1 a Platform Confidence Test see Platform Confidence Test PCT on page 3 2 m System Setup Utility see System Setup Utility SSU on page 3 2 9 9 Safety Before You Remove the Cover Before removing the system cover to work inside the server observe these safety guidelines Tur
162. rcises and tests the server system The test takes approximately 15 to 20 minutes to execute depending on the amount of memory installed The following test modules are run during the Comprehensive Test Power On Self Test CPU Test s Cache Memory Test s Math Coprocessor Test s Symmetric Multiprocessing SMP Processor 0 Symmetric Multiprocessing SMP Processor 1 DIMM Memory Test Serverworks HE SL Chipset Test Primary Interrupt Controller Test Programmable Interrupt Timer Test Keyboard Test Hot Swap Controller Test Real Time Clock Test PCI Bus Controller Test Universal Serial Bus Controller Test Super I O Controller Test DMA Controller Test Baseboard Management Controller Test Com Port 1 Controller Test Com Port 2 Controller Test Adaptec SCSI Controller Test Parallel Port Controller Test Floppy A Controller Test ATI Video Adapter Test CD ROM Controller Test Hard Disk Drive Controller and Drives Test Sensor Readings voltage temperature fans and so on To run the Comprehensive Test follow this procedure 1 Select Comprehensive Test using the arrow keys and press Enter The initial testing determines your server configuration and produces screens similar to those shown in Figure 4 30 and Figure 4 31 2 When you are prompted about the configuration of the server if the hardware configuration does not match the configuration of your server press the Ctrl and Break keys simultaneously Ctrl Break
163. rd Disk Drive Press Any Key to Continue lt DONE gt FIGURE 4 26 Platform Confidence Quick Test Results Summary 4 36 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 4 Press any key to see the analog sensor readings see Figure 4 27 Reading Sensor 40h Baseboard Fan Reading 9537 00 RPH Sensor 41h Baseboard Fan Reading 9537 00 RPM Sensor 42h Baseboard Fan Reading 9537 00 RPH Sensor 43h Baseboard Fan Reading 9231 00 RPM Sensor 44h Baseboard Reading 9537 00 RPH Sensor 10h Baseboard 1 Reading 1 25 Volts Sensor 11h Baseboard 2 Reading 2 47 Volts Sensor 12h Baseboard 3 Reading 3 32 Volts Sensor 13h Baseboard 3 Reading 3 29 Volts Sensor 14h Baseboard 5 Reading 4 99 Volts Sensor 15h Baseboard Reading 11 78 Volts Sensor 16h Baseboard Reading 12 18 Volts Sensor 17h Baseboard Reading 3 07 Volts Sensor B8h Processor Reading 1 45 Volts Sensor 38h Baseboard Reading 00 degrees Sensor 33h Basebrd FanBoost Reading 00 degrees Sensor 32h FntPnl Amb Temp Reading 00 degrees Sensor 36h FP Amb FanBoost Reading 00 degrees Sensor 98h Processorl Temp Reading 00 degrees Sensor A h Proci FanBoost Reading 00 degrees Press Any Key to Continue lt MORE gt FIGURE 4 27 Platform Confidence Quick Test Sensor Readings first screen 5 Press any key to see the remaining sensor readings see Figure 4 28 Sensor 99h Processor2 Temp Reading 27h gt 39 00 degrees C Sensor Alh Proc2 FanB
164. rding to the operating system If the operating system prompt does not appear see Loading the Operating System on page 4 16 or Recovering the BIOS on page 4 53 or Processor and Heatsink Configurations on page 1 6 KVM PS 2 Keyboard Video Mouse Unit Causes System To Hang During POST Some KVM switches may cause intermittent problems during Power On Self Test POST Possible issues are as follows a The system may not respond to keyboard or mouse inputs a The system may hang causing the watchdog timer to expire This in turn causes a FRB 2 Fault Resilient Booting event By default if a FRB 2 event occurs on redundant processor systems the Boot Strap Processor will be disabled on the next boot 2 8 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 222 To return a system with redundant processors to normal operation after an FRB2 event follow the instructions below Reset or turn on the system Press the F2 key to select SETUP as soon as the option appears on the screen Once in the main page of the SETUP menu use the arrow keys to select Processor Settings then press Enter In the Processor Settings screen select Processor Retest then select Enabled Press the F10 function key to exit the SETUP menu and save changes The system will then re test both processors to make sure they are in working condition and bring both processors back to normal operation
165. res 5 4 5 5 1 Front Bezel 5 5 5 5 2 Floppy DVD CD ROM Combo Module 5 7 viii Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 5 5 7 5 5 8 Memory 5 9 5 5 3 1 Sun Fire V60x Server DIMM Replacement 5 10 5 5 3 2 Sun Fire V65x Server DIMM Replacement 5 13 5 5 3 3 Installing the DIMM Fan Sun Fire V65x Server Only 5 14 Replacing a Server CPU and Heatsink 5 16 5 5 4 1 Safety Precautions 5 16 5 5 4 2 Sun Fire V60x Server Heatsink and Processor Removal 5 17 5 5 4 3 Sun Fire V60x Server Heatsink and Processor Replacement 5 19 5 5 4 4 Sun Fire V65x Server Heatsink and Processor Removal 5 22 5 5 4 5 Sun Fire V65x Server Heatsink and Processor Replacement 5 24 Sun Fire V60x and Sun Fire V65x Servers New CPU and Heatsink Installation 5 27 5 5 5 1 Safety Precautions 5 27 Power Supply Unit 5 32 5 5 6 1 Sun Fire V60x Server Power Supply 5 32 5 5 6 2 Sun Fire V65x Server Power Supply 5 34 Hard Disk Drives 5 36 Air Baffle 5 38 5 5 8 1 Sun Fire V60x Server Air Baffle Removal 5 38 5 5 8 2 Sun Fire V60x Server Air Baffle Installation 5 39 5 5 8 3 Sun Fire V65x Server Air Baffle Removal 5 40 5 5 8 4 Sun Fire V65x Server Air Baffle Installation 5 40 5 5 9 Fan Module 5 41 5 5 9 1 Sun Fire V60x Server Fan Module Removal 5 41 5 5 9 2 Sun Fire V60x Server Fan Module Replacement 5 43 5 5 9 3 Sun Fire V65x Server Fan Module Removal 5 44 5 5 9 4 Sun Fire V65x Server Fan Module Replacement 5 46 5
166. rompt and superuser prompt for the C Bourne and Korn shell TABLE P 2 Shell Prompt Shell Prompt Bourne shell and Korn shell prompt machine name Bourne shell and Korn shell superuser prompt machine name Preface xix Notice To better illustrate the process being discussed this manual contains examples of data that might be used in daily business operations The examples might include the names of different individuals companies brands and products Only fictitious names are used and any similarity to the names of individuals companies brands and products used by any business enterprise is purely coincidental Support For technical support call the phone numbers listed below according to your location United States Tel 1 800 USA 4SUN 1 800 872 4786 UK Tel 44 870 600 3222 France Tel 33 1 34 03 5080 Germany Tel 49 1805 20 2241 Italy Tel 39 02 92595228 Toll Free 800 605228 Spain Tel 011 3491 767 6000 See the following link for US Europe South America Africa and APAC local country telephone numbers http www sun com service contacting solution html For general support and documentation on the Sun Fire V60x and Sun Fire V65x servers see the following link http www sun com supporttraining xx Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 Sun Welcomes Your Comments Sun is interested in improving its documentation and welcomes your com
167. rough the serial COM2 port or the LAN connector and redirect the BIOS setup screen to a remote console to check For details on how to do this refer to Configuring an External Serial Console on page 4 17 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 2 2 3 2 2 2 3 3 If the preceding steps do not solve the problem check the following m Is the keyboard functioning Check to see that the Num Lock light is functioning Is the video monitor plugged in and turned on Are the brightness and contrast controls on the video monitor properly adjusted Are the video monitor switch settings correct Is the video monitor signal cable properly installed Is the onboard video controller enabled If you are using an add in video controller board do the following Verify that the video controller board is fully seated in the server board connector Reboot the system for changes to take effect If there are still no characters on the screen after you reboot the system and POST emits a beep code write down the beep code you hear This information is useful for your service representative If you do not receive a beep code and characters do not appear the video display monitor or video controller may have failed Contact your service representative or authorized dealer for help Xserver Has Not Started The typical reason that Xserver has not been started is that it is not set up to
168. rvice turn off all peripheral devices connected to the system turn off the system by pressing the power button and unplug the AC power cord from the system or wall outlet Note In the Sun Fire V60x server the floppy DVD CD ROM module may be replaced with a hard disk drive If you do this you need to install a small plastic cover to cover the gap at the right side of the drive that is left by removal of the larger size floppy CD ROM module The plastic cover is included in the accessory kit To replace the Floppy CD ROM module follow these steps Before removing the cover to work inside the system observe the safety guidelines previously stated Remove the bezel from the front of the chassis As shown in Figure 5 4 rotate the module s handle bar up A and pull on the handle bar to remove the module from the flex bay Slide a new module into the flex bay until you feel the connectors touch Push the module in using the handle bar about 3 16 of an inch 5mm more to fully engage the connectors Rotate the handle bar down Reinstall the bezel Chapter 5 Maintaining the Server 5 7 FIGURE 5 4 Floppy CD ROM Module Replacement Note The Comprehensive Test should be run after changing any FRU CRU or adding an optional component See Run Platform Confidence Test PCT on page 4 31 5 8 Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide November 2003 oteto Memory
169. s, press the keys Ctrl-A at POST during the SCSI initialization phase. 2. Once in the SCSI BIOS, select External Channel A (AIC-7902 A at slot 00 04 07 00), then select the following: Configure/View SCSI Controller Settings, and Advanced Configuration. 3. In the Advanced Configuration menu, go to SCSI Controller Int 13 Support and select one of the Disabled options. 4. In the SCSI BIOS settings screen, select Internal Channel B (AIC-7902 B at slot 00 04 07 01), then select the following: Configure/View SCSI Controller Settings, and Advanced Configuration. 5. In the Advanced Configuration menu, go to SCSI Controller Int 13 Support and select the Enabled option if it is not already enabled. 6. Save the settings before exiting the menu. Server Starts Booting Automatically at Power On. The server board saves the last known power state in the event of a power failure. If you remove power before powering down the system using the power switch on the front panel, your system might automatically attempt to return to its previous state after you restore power. You can configure how your server reacts when power is restored in the BIOS Setup Security menu. See "BIOS Setup Utility <F2>" on page 4-9. You can have the server remain off or return to the last known power state. Please keep in
170. Sun Microsystems. Sun Fire V60x and Sun Fire V65x Servers Troubleshooting Guide. Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A., 650-960-1300. Part No. 817-2024-12, November 2003, Revision A. Submit comments about this document at: http://www.sun.com/hwdocs/feedback. Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries. This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Micro
171. sor P2 L2 cache failed  Yes
TABLE 3-8 Extended POST Error Messages and Codes (Continued)
Error Code  Error Message  Pause On Boot
8180  BIOS does not support current stepping for Processor P1  Yes
8181  BIOS does not support current stepping for Processor P2  Yes
8190  Watchdog timer failed on last boot  No
8191  4:1 core to bus ratio: processor cache disabled  Yes
8192  L2 Cache size mismatch  Yes
8193  CPUID, processor stepping are different  Yes
8194  CPUID, processor family are different  Yes
8195  Front side bus speed mismatch. System halted  Yes, Halt
8196  Processor models are different  Yes
8197  CPU speed mismatch  Yes
8198  Failed to load processor microcode  Yes
8300  Baseboard Management Controller (BMC) failed to function  Yes
8301  Front panel controller failed to function  Yes
8305  Hotswap controller failed to function  Yes
8420  Intelligent System Monitoring chassis opened  Yes
84F1  Intelligent System Monitoring forced shutdown  Yes
84F2  Server Management Interface failed  Yes
84F3  BMC in update mode  Yes
84F4  Sensor Data Record (SDR) empty  Yes
84FF  System event log full  No
8500  Bad or missing memory in slot 3A  Yes
8501  Bad or missing memory in slot 2A  Yes
8502  Bad or missing memory in slot 1A  Yes
8504  Bad or missing memory in slot 3B  Yes
8505  Bad or missing memory in slot 2B  Yes
8506  Bad or missing memory in slot 1B  Yes
8601  All memory marked as fail
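The memory-slot rows of the error table above lend themselves to a small lookup. The following is a minimal sketch, assuming the codes are hexadecimal values as printed; the names `MEMORY_SLOT_ERRORS` and `slots_from_error_codes` are hypothetical and are not part of any Sun utility.

```python
# Memory-slot POST error codes, copied from the table rows above.
# The dictionary and function names are illustrative only.
MEMORY_SLOT_ERRORS = {
    0x8500: "3A",
    0x8501: "2A",
    0x8502: "1A",
    0x8504: "3B",
    0x8505: "2B",
    0x8506: "1B",
}


def slots_from_error_codes(codes):
    """Return the DIMM slots named by any memory-slot codes in `codes`, sorted."""
    return sorted({MEMORY_SLOT_ERRORS[c] for c in codes if c in MEMORY_SLOT_ERRORS})


# Codes that are not memory-slot codes (here 0x8190) are simply ignored.
print(slots_from_error_codes([0x8502, 0x8506, 0x8190]))
```

A helper like this could be used to turn a list of logged POST error codes into the set of DIMM slots worth inspecting.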
172. ssager
9Eh  A G G R  Extended background memory test end
TABLE 3-14 POST Progress LED Code Table (Port 80h Codes) (Continued)
POST Code  Diagnostic LED Decoder (G = green, R = red, A = amber)  Description
A0h  R Off R Off  Prepare and run setup. Error manager displays and logs POST errors. Waits for user input for certain errors. Execute setup.
A2h  R Off A Off  Set base expansion memory size
A4h  R G R Off  Program chipset setup options, build ACPI tables, and build INT15h E820h table
A6h  R G A Off  Set display mode
A8h  A Off R Off  Build SMBIOS table and MP tables
AAh  A Off A Off  Clear video screen
ACh  A G R Off  Prepare USB controllers for operating system
AEh  A G A Off  One beep to indicate end of POST. No beep if silent boot is enabled.
000h  Off Off Off Off  POST completed. Passing control to INT 19h boot loader next.
3.4 Contacting Technical Support. For technical support, call the phone numbers listed below according to your location. United States: 1-800-USA-4SUN (1-800-872-4786). UK: Tel +44 870 600 3222. France: Tel +33 1 34 03 5080. Germany: Tel +49 1805 20 2241. Italy: Tel +39 02 92595228, Toll Free 800 605228. Spain: Tel 011 3491 767 6000. See the following link for US, Europe, South America, Africa, and APAC local country telephone numbers: http www
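The LED columns in the rows listed above are consistent with a simple bit mapping of the port 80h code: each LED corresponds to one bit position, the upper nibble drives red, the lower nibble drives green, and both together show amber. This mapping is an inference from the rows in this excerpt rather than a statement quoted from the guide, so treat the sketch below as an assumption; the function name `decode_post_leds` is hypothetical.

```python
def decode_post_leds(code: int) -> list[str]:
    """Return the four diagnostic LED states, most significant bit first.

    Assumed mapping (inferred from the table rows above, not confirmed):
    upper nibble bit set -> red, lower nibble bit set -> green, both -> amber.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("POST code must fit in one byte")
    upper, lower = code >> 4, code & 0x0F
    names = {(0, 0): "Off", (0, 1): "G", (1, 0): "R", (1, 1): "A"}
    states = []
    for bit in (3, 2, 1, 0):
        red = (upper >> bit) & 1
        green = (lower >> bit) & 1
        states.append(names[(red, green)])
    return states


# A4h is listed above as "R G R Off".
print(decode_post_leds(0xA4))
```

Run in reverse, the same idea lets you read the four LEDs on a stalled server and recover the last POST code that was written.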
173. systems, the Sun logo, AnswerBook2, docs.sun.com, Solaris, and Sun Fire are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements. U.S. Government Rights, Commercial use: Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle
174. t from a diskette. If the software runs correctly, there may be a problem with the copy on the hard disk drive. Reinstall the software on the hard disk and try running it again. Make sure all necessary files are installed. If the problems are intermittent, there may be a loose cable, dirt in the keyboard (if keyboard input is incorrect), a marginal power supply, or other random component failures. If you suspect that a transient voltage spike, power outage, or brownout might have occurred, reload the software and try running it again. When voltage spikes or brownouts occur, symptoms include a flickering video display, unexpected system reboots, and the system not responding to user commands. Note: If you are getting random errors in your data files, they may be getting corrupted by voltage spikes on your power line. If you are experiencing any of the above symptoms that might indicate voltage spikes on the power line, install a surge suppressor between the power outlet and the system power cord. CHAPTER 2 Troubleshooting Specific Problems. This chapter instructs you on how to solve specific problems with the Sun Fire V60x and Sun Fire V65x servers. It contains the following sections: "Preparing the System for Diagnostic Testing" on page 2-1; "Specific Problems and Corrective Actions" on page 2-2; "Other Problems" on page 2-20.
175. tallation CDs. Make sure the driver is loaded and the protocols are bound. Make sure the network cable is securely attached to the connector at the system back panel. If the cable is attached but the problem persists, try a different cable. Make sure the hub port is configured for the same duplex mode as the network controller. Check with your LAN administrator about the correct networking software that needs to be installed. If you are directly connecting two servers (no hub), you will need a crossover cable (see your hub documentation for more information on crossover cables). Check the network controller LEDs that are visible through an opening at the system back panel. Problems with Network. If the server hangs when the drivers are loaded: Change the PCI BIOS interrupt settings. If diagnostics pass but the connection fails: Make sure the network cable is securely attached. The Activity LED does not light: Make sure the network hub has power. For Linux, make sure that you are using the e1000 drivers that are part of a 2.4.18-24.7.x, 2.4.18-24.8.x, or later kernel. If using an older kernel, use the e1000 drivers on the SunFire Resource CD. If the controller stopped working when an add-in adapter was installed: Make sure the cable is connected to the port from the onboard network controller. Make sure your PCI BIOS is current. Make sure the other adapter supports shared interrupts. Also, make sure your operating system s
176. tem, observe the safety guidelines previously given. To install the system FRU, follow these steps while referring to Figure 5-47: Install the floppy/CD-ROM combo drive (see "Floppy/DVD/CD-ROM Combo Module" on page 5-7 for details). Install the hard disk drives (see "Hard Disk Drives" on page 5-36 for more details). Install the heatsinks and CPUs (see "Replacing a Server CPU and Heatsink" on page 5-16 for more details). Install the DIMMs (see "Memory" on page 5-9 for more details). FIGURE 5-47 System FRU Installation. Index. NUMERICS: 5V Standby LED, 2-3. A: add-in cards, 2-2. B: BIOS, updating, 4-50; BIOS ID, 3-26; BIOS recovery, 3-24; BIOS recovery beep codes, 3-21; BIOS Setup, how to enter during POST, 2-7; booting up, 4-8; BIOS setup utility, 4-9; choose boot device, 4-13; network, 4-13; service partition, 4-12. C: checklists: application software problems, 2-14; can't connect to a server, 2-13; can't detect bootable CD-ROM, 2-14; CD-ROM activity light is off, 2-12; cooling fans don't operate properly, 2-12; disk drive activity light off, 2-12; distorted characters on video screen, 2-11; hard drives don't show up, 2-5; network problem, 2-13; no video on screen, 2-10; power LED does not light, 2-9; server does not power on, 2-3; server starts to boot at power on, 2-7; clearing CMOS, 4-6; CMOS clear jumper, 3-26; CMOS cleari
177. ter three consecutive FRB2s performs all the same functions as Disable on FRB2, with the following exception: The BIOS maintains a failure history of the successive boots. If the same BSP fails three consecutive boots with an FRB2, the processor is then disabled. If the system successfully boots to a BSP, the failure history maintained by the BIOS is cleared. The option of Disable FRB2 Timer causes the BIOS to not start the FRB2 timer in the BMC during POST. If this option is selected, the system has no FRB protection after the FRB3 timer is disabled. The BIOS and BMC implement additional safeguards to detect and disable the application processors (AP) in a multiprocessor system. If an AP fails to complete initialization within a certain time, it is assumed to be nonfunctional. If the BIOS detects that an AP is nonfunctional, it requests the BMC to disable that processor. When the BMC disables the processor and generates a system reset, the BIOS does not see the bad processor in the next boot cycle. The failing AP is not listed in the MP table (refer to the Multi-Processor Specification, Rev. 1.4) nor in the ACPI APIC tables, and is invisible to the operating system. All the failures (late POST, OS boot, FRB-3, FRB-2, and AP failures), including the failing processor, are recorded in the System Event Log. However, be aware that if the setup option for error logging is disabled, these failures are not recorded. The FRB-3 failure
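The failure-history rule described above (disable the BSP after three consecutive FRB2 failures, clear the history on a successful boot) can be modeled in a few lines. This is only an illustrative model of the described behavior, not BIOS or BMC code; the class name `Frb2History` and its fields are hypothetical.

```python
from dataclasses import dataclass

# "Three consecutive FRB2s" is the threshold stated in the text above.
FRB2_LIMIT = 3


@dataclass
class Frb2History:
    """Toy model of the BIOS failure history for one bootstrap processor."""

    consecutive_failures: int = 0
    bsp_disabled: bool = False

    def record_boot(self, frb2_failed: bool) -> None:
        """Update the history after one boot attempt."""
        if not frb2_failed:
            # A successful boot clears the failure history.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= FRB2_LIMIT:
            self.bsp_disabled = True


history = Frb2History()
for failed in (True, True, False, True, True, True):
    history.record_boot(failed)
print(history.bsp_disabled)
```

In this run the two early failures are forgotten after the successful boot, and the processor is disabled only once three failures occur in a row.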
178. the BIOS has been updated, observe the BIOS Build number as the FIGURE 4-39 Verifying the BIOS Version. 4.13.2 Recovering the BIOS. If an update to the system BIOS is not successful, or if the system fails to complete POST and the BIOS is unable to boot an operating system, it may be necessary to run the BIOS recovery procedure. Note: Recovering the BIOS is the last resort, used only when the main system BIOS will not boot or is corrupt. Follow these steps to perform a BIOS recovery: 1. Turn off system power. 2. Remove the top cover of the server. 3. Move the RCVR BOOT jumper, located on the baseboard, to the Recover position (see Figure 4-40). [Figure 4-40 jumper labels: RCVR BOOT (Recover, Normal), CLR PSWD (Clear, Normal), CLR CMOS, BMC BB WE (Write Enable, Normal).] FIGURE 4-40 Location of Recovery Boot Jumper. With the jumper in the reco
179. the air baffle. Replace the fan module. Replace the top cover of the server. Note: The Comprehensive Test should be run after changing any FRU/CRU or adding an optional component. See "Run Platform Confidence Test (PCT)" on page 4-31. 5.6.2.3 Sun Fire V65x Server Cable Kit Removal. Before removing the cover to work inside the system, observe the safety guidelines previously given. To remove cables, remove the cover and refer to Figure 5-44 and Figure 5-44 when following these steps: Insert your fingers under the blue plastic loops on the full-height PCI riser card and pull the riser card straight up out of the chassis. Unscrew the air baffle screw and remove the floppy/FP/IDE flex cable retention clip. Remove the air baffle. Unscrew and remove the plastic retention clip that holds the flex cable connector to the SCSI backplane. Remove the floppy/FP/IDE flex circuit cable that runs from the connector on the server board to the connector on the backplane. Remove the front panel cable that runs from the front panel board to the backplane. Remove the USB cable that runs from the connector on the front panel board, through the clips on top of the fan module, to the connector on the server main board. Remove the SCSI cable that runs from the SCSI backplane to the server board.
180. the screen either. Turn off the system. Remove the two faulty DIMMs from Slot 1A and Slot 1B. Insert two good DIMMs in Slot 2A and Slot 2B. Refer to the Sun Fire V60x and Sun Fire V65x Server User Guide (817-2023-xx) for information on how to correctly replace the DIMMs. Power on the system. Press the F2 key to select SETUP when the option appears on the screen. In the main page of the SETUP menu, use the arrow keys to select the Advanced Menu. In the Memory Configuration screen, select Memory Retest and select Enabled. Press the F10 key to exit the SETUP menu and save the changes. The system reboots automatically. Power off the system. Move the DIMMs back to Slot 1A and Slot 1B. Note: If this step is skipped, the following error messages may appear on the screen and in the System Event Log: Error 8502: Bad or missing memory in slot 1A. Error 8506: Bad or missing memory in slot 1B. Replace the top cover. Power on the system. The system will boot correctly on subsequent reboots. Note: The Memory Retest feature returns to its default Disabled condition after the memory test cycle is complete. 2.3 Other Problems. If the preceding information does not fix the problem with your server, try the following: Update the firmware files to the latest version. The files used include BIOS, BMC
181. the server. As shown in panel 3 of Figure 5-10, insert the heatsink retention clip removal tool into the hole in the end of one of the retention clips, and then: a. Use the tool to push the clip down. b. Move the top of the tool toward the heatsink to release the clip from the tab on the heatsink retainer. c. Release the pressure on the tool and allow the clip to come up so it clears the tab on the retainer. d. Release the other end of the clip and slide the clip in a horizontal direction to free it from the middle tab. 6. Remove both retention clips and the heatsink as shown in panel 4. 7. As shown in panel 5: a. Grasp the end of the socket lever and raise it to disengage the processor pins. b. Lift the processor straight up out of the socket. Caution: Do not place the thermal-pasted side of the processor or heatsink on any surface, as it may pick up contaminants, causing incorrect processor mating and possible overheating. FIGURE 5-10 Sun Fire V60x Server Heatsink and Processor Removal. 5.5.4.3 Sun Fire V60x Server Heatsink and Processor Replacement. Installing a replacement heatsink and processor is essentially the reverse of the procedure given in the previous section. Note: When a processor kit includes new heatsink retention clips, use them in place
182. this menu selection to update the hard disk controller firmware. The hot-swap controller (HSC) code resides both on the Sun Fire V60x and Sun Fire V65x servers Diagnostics CD and on the Service Partition in the C:\HSC directory. The HSC firmware can also be updated from a standalone bootable floppy diskette. 2 The baseboard refers to the server Main Board. 3 The update files are on the hard drive service partition. Run Field Replaceable Unit/Sensor Data Record (FRU/SDR) Update. Use this menu item to re-inventory the FRUs and Sensor Data Records (SDR) on the Sun Fire V60x and Sun Fire V65x servers. The FRU/SDR code resides both on the Sun Fire V60x and Sun Fire V65x servers Diagnostics CD and on the Service Partition in the C:\FRUSDR directory. Typically, the product and chassis serial numbers are changed only by field service personnel when hardware is changed. The asset tag number is intended for use by customers for their internal tracking system. For example, the asset tag might be updated after a firmware or software update. Run BIOS Update (reboot required). Use this menu item to update the BIOS boot block in the event that the boot block becomes corrupted. Also use this menu item to update the BIOS in the event that the BIOS becomes corrupted or if you want to set the BIOS settings back to their defaults. A reboot occurs aut
183. this window to perform platform management. Run Platform Confidence Test (PCT). The PCT is used to test major subsystems and analog sensors of the system board. Disabling the Platform Event Filter. If you are planning to run the Platform Confidence Test (PCT) from the Sun Fire V60x and V65x Diagnostic CD, you need to disable the Platform Event Filter, because this feature triggers the Baseboard Management Controller (BMC) to shut down the system if the motherboard temperature exceeds the threshold during the PCT. The Platform Event Filter is disabled by default. However, it is automatically enabled in the F2 Setup if the BMC LAN Management feature is enabled in the SSU (System Setup Utility). 1 Baseboard refers to the Main Board in the server. To disable the Platform Event Filter: 1. Reset or turn on the server. 2. Press the F2 key to select SETUP as soon as the option appears on the screen. 3. Select the Server menu on the SETUP screen. 4. Check to see if the Platform Event Filter is disabled. If the Platform Event Filter is disabled, it will not appear as an option in the Server screen. If the Platform Event Filter is enabled, go to Step 5. 5. To disable the Platform Event Filter, select Platform Event Filter from the Server setup screen, press Enter, and select the Disable option. 6. Press the F10 function key to exit the SETUP menu and save changes. Running the P
184. tion is indicated with a blinking amber status LED and signifies that at least one of the following conditions is present: Temperature, voltage, or fan non-critical threshold crossing. Chassis intrusion. A satellite controller sends a non-critical state via the Set Fault Indication command to the BMC. A Set Fault Indication command from the system BIOS. The BIOS may use the Set Fault Indication command to indicate additional non-critical status, such as system memory or CPU configuration changes. Degraded Condition. A degraded condition is indicated with a blinking green status LED and signifies that at least one of the following conditions is present: Non-redundant power supply operation. This only applies when the BMC is configured for a redundant power subsystem. The power unit configuration is configured via OEM SDR records. A processor is disabled by FRB or BIOS. BIOS has disabled or mapped out some of the system memory. This Troubleshooting Guide gives information on how to isolate the server component responsible for any of the critical, non-critical, or degraded conditions listed above. 1 Baseboard refers to the server Main Board. 3.2.4 Rear Panel Power Supply Status LED. The rear panel power supply status LEDs are located as shown in Figure 3-4. Power Supply Status LED
185. to Disabled is the default. Press the F10 key to save the session and exit. Choose to reboot the server when prompted. After the server reboots, you can use the USB keyboard and mouse to use the Diagnostic CD and Service Partition utilities. 4.11 Using the Service Partition Menu. Note: By default, the Sun Fire V60x and Sun Fire V65x servers are shipped without the Service Partition installed. It can be installed as described in "Service Partition" on page 4-47. If the Service Partition is installed, when you press F4 at the initial bootup screen, the Service Partition Menu appears (see Figure 4-14). Note: If you are running the Solaris operating environment on your Sun Fire V60x or V65x server, you will not be able to access the Service Partition by pressing F4 when the BIOS POST is running. Let the system continue to boot up, and when the Solaris Primary Boot Subsystem menu displays, select the DIAGNOSTIC partition. [Screen text: Sun Cobalt Grizzly Service Partition Menu, Beta 1 release. Arrow keys move highlight, ENTER to select, ESC to abort.] FIGURE 4-14 Service Partition Menu. There are three main menu items across the top of this screen: Create Diskettes, System Utilities, and Quit to DOS. 4.11.1 Create Diskettes. Pressing Enter with the Create Diskettes menu
186. tton is also used as a sleep button for operating systems that follow the ACPI specification. Linux, for example, configures the power button to the instant-off mode. There is no ACPI support for the Solaris OS. Reset: Depressing this pushbutton reboots and initializes the system. NMI: Pushing this recessed pushbutton causes a non-maskable interrupt to occur. Note that NMI event trapping is not implemented in either Linux or Solaris. System ID: This pushbutton toggles the state of the front panel ID LED and the server Main Board ID LED. The Main Board ID LED is visible through the rear of the chassis and allows you to locate a particular server from behind a rack of servers. 3.2.2 Rear Panel LEDs. The rear panel contains the LEDs shown in Figure 3-2. [Figure 3-2 callouts: NIC2 Network Activity LED, NIC2 Network Speed LED, Power Supply Status LED, System Status LED, NIC1 Network Activity LED, NIC1 Network Speed LED, ID LED, POST LEDs (4 LEDs are on main board, visible through rear of chassis).]
187. uld not be jumpered for normal operation. RCVR BOOT (BIOS Boot Recovery): If these pins are jumpered, the system will attempt BIOS recovery. These pins should not be jumpered for normal operation. BMC BB WE (BMC Boot Block Write Enable): If these pins are jumpered, the BMC boot block is erasable and programmable at next reset. These pins should not be jumpered for normal operation. 4.2 Setting the Serial COM2 Port Jumper. A serial port jumper on the Main Board is preset by default to the position that satisfies most serial port configurations. The jumper is located at the rear of the server on the Main Board, next to the rear RJ-45 serial connector. The jumper is on the jumper block labeled J5A2. The top cover of the server must be removed to access the jumper. For serial devices that require a DSR signal (default), the J5A2 jumper block must be configured as follows: place the jumper across positions 3 and 4 (the two middle jumper posts), as shown in Figure 4-2. [J5A2 jumper block, viewed from front of server: top row pins 5 3 1, bottom row pins 6 4 2.] FIGURE 4-2 J5A2 Jumper Block Configured for DSR Signal (pin 7 connected to DSR). For serial devices that require a DCD signal, the J5A2 jumper block must be configured as follows: place the jumper across positions 1 and 2, as shown in Figure 4-3. [J5A2 jumper block, viewed from front of server: top row pins 5 3 1, bottom row pins 6 4 2.] FIGURE 4-3 J5A2 Jumper Bloc
188. 1.7 Problems With SNMP. Unless the dmi2snmp service is configured correctly, errors may appear when you attempt to shut down the service. By default, the dmi2snmp service is not configured robustly enough to be started or stopped successfully. Note: The dmi2snmp service is not supported in the Solaris OS. 1.8 Problems With Initial System Startup. Problems that occur at initial system startup are usually caused by incorrect installation or configuration. Hardware failure is a less frequent cause. 1.8.1 Checklist. Are all cables correctly connected and secured? Is the power cord properly inserted and fully seated? Are there any Baseboard Management Controller (BMC) beep codes? You may have to listen carefully two or three times to hear them. See "POST Error Beep Codes" on page 3-19 for beep code details. Is the BMC running? Try pressing the ID button on the front panel. If the blue ID LED fails to illuminate, the BMC is not responding. Are the cables going to the front panel board installed and seated properly? Check the front panel cable, the USB cable, and the 100-pin flex cable. Are the processors fully seated in their sockets on the server board? Are all add-in PCI boards fully seated in their slots on the server board? Are all jumper settings on the server board
189. upports shared interrupts. Try reseating the add-in adapter. If the add-in adapter stopped working without apparent cause: Try reseating the adapter first, then try a different slot if necessary. The network driver files may be corrupt or deleted. Delete and then reinstall the drivers. Run the diagnostics. Note: Disconnecting an Ethernet cable from Network 2 may interrupt network connectivity on other network interfaces. Run the following commands to restore connectivity to other connected network interfaces: /etc/rc.d/init.d/network stop, then /etc/rc.d/init.d/network start. Problems with Application Software. If you have problems with application software, do the following: Verify that the software is properly configured for the system. See the software installation and operation documentation for instructions on setting up and using the software. Try a different copy of the software to see if the problem is with the copy you are using. Make sure all cables are installed correctly. Verify that the server board jumpers are set correctly. See "Setting Main Board Jumpers" on page 1-5. If other software runs correctly on the system, contact your vendor about the failing software. If the problem persists, contact the software vendor's customer service representative for help. Bootable CD-ROM Is Not Detected. Check the following
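The network stop/start sequence from the note above could be scripted when it has to be repeated on several servers. The sketch below assumes a Linux system that ships the init script at the path quoted in the note; the function names are hypothetical, and the runner is injectable so the sequence can be inspected without touching the network.

```python
import subprocess

# Path of the init script as quoted in the note above.
NETWORK_SCRIPT = "/etc/rc.d/init.d/network"


def restart_commands(script: str = NETWORK_SCRIPT) -> list[list[str]]:
    """Return the stop and start commands, in the order the note gives them."""
    return [[script, "stop"], [script, "start"]]


def restart_network(run=subprocess.run) -> None:
    """Run stop, then start; raise if either command exits with an error."""
    for cmd in restart_commands():
        run(cmd, check=True)


# Printing the commands is side-effect free; calling restart_network()
# would actually restart networking and normally requires root privileges.
print(restart_commands())
```

Passing a stand-in for `run` makes it possible to dry-run the sequence before executing it on a live server.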
190. us LED States
System Status LED State  System Condition
CONTINUOUS GREEN  Indicates the system is operating normally.
BLINKING GREEN  Indicates the system is operating in a degraded condition.
BLINKING AMBER  Indicates the system is in a non-critical condition.
CONTINUOUS AMBER  Indicates the system is in a critical or non-recoverable condition.
OFF  Indicates POST/system stop.
Critical Condition. A critical condition or non-recoverable threshold crossing is indicated with a continuous amber status LED and is associated with the following events: Temperature, voltage, or fan critical threshold crossing. Power subsystem failure. The Baseboard Management Controller (BMC) asserts this failure whenever it detects a power control fault (for example, the BMC detects that the system power is remaining on even though the BMC has deasserted the signal to turn off power to the system). The system is unable to power up due to incorrectly installed processor(s) or processor incompatibility. A satellite controller, such as the HSC, or another IPMI-capable device, such as an add-in server management PCI card, sends a critical or non-recoverable state via the Set Fault Indication command to the BMC. Critical Event Logging errors, including System Memory Uncorrectable ECC error and Fatal/Uncorrectable Bus errors such as PCI SERR and PERR. Non-Critical Condition. A non-critical condi
191. ut of the server. Attempting to do this could result in severe damage to the board. FIGURE 5-39 Location of the Mounting Screws. Slide the board toward the front of the chassis until the I/O connectors are clear of the chassis I/O openings, lift the server board from the chassis, and place it on an antistatic pad. Remove the replacement server board from its packaging and antistatic bag. Ensure that the Mylar insulator sheet is seated securely over the standoffs, is lying flat on the chassis floor, and that the edge of the sheet is seated below the studs in the rear chassis wall. Insert the back edge of the board under the three retention pins located at the rear of the chassis. While placing the board on the chassis standoffs, carefully align the board I/O connectors with the rear chassis I/O openings. Adjust the board's position so that the three mounting holes rest securely on the shouldered standoffs. Reattach the board to the chassis using the three mounting screws. Note: The server board uses three holes to mount the board to the chassis standoffs
192. utton is off. Caution: Electrostatic discharge (ESD) can damage server board components. Perform CPU replacement procedures only at an ESD workstation. If no such station is available, you can provide some ESD protection by wearing an antistatic wrist strap and attaching it to an unpainted metal part of the computer chassis. Caution: CPU replacement must be performed by trained service personnel only. An ESD wrist strap must be used for this procedure. 5.5.4.2 Sun Fire V60x Server Heatsink and Processor Removal. To remove the heatsink and processor, follow these steps while referring to Figure 5-10: Remove the plastic air duct that covers the heatsinks and processors (see Figure 5-10). Determine the location of the processor you are going to remove (see Figure 5-10). CPU 2 is closest to the outside of the server and CPU 1 is toward the inside. As shown in panel 1 of Figure 5-10, remove the processor air duct by lifting it up out of the chassis. Remove the air baffle by wiggling it to loosen the tab from the backplane board. Lift the air baffle out of the chassis. As shown in panel 2 of Figure 5-10, remove the fan module: a. Disconnect the fan power cable. b. Push the release tab. c. Slide the module left and up. Note: In the Sun Fire V60x server, the fan module must be removed to access the retention clip nearest the front of
193. ver. Power on the system. The system will boot correctly on subsequent reboots. Note: The Memory Retest feature returns to its default (Disabled) condition after the memory test cycle is complete. 2.2.11 Faulty Memory DIMMs. Note: This note applies if you are using BIOS release 5.0 v1175 or later. If you have not added, removed, or replaced any memory DIMM modules and you encounter memory errors during POST after a soft reset, a hard reset, or powering on the system, then the BIOS is detecting faulty memory DIMM modules during the memory test in POST. The BIOS will log an error in the System Event Log, disable the memory bank that contains the faulty memory DIMM modules, and reset the system. Upon rebooting, the system will either appear to have no memory installed or will boot with one of the memory banks disabled. If this occurs, perform one of the following procedures to reset the system after replacing the faulty memory DIMM modules. Choose the procedure that corresponds to the number of DIMMs in your system. For Systems With Four or More DIMMs: Turn on the system. Boot the system with the Diagnostics CD in the CD-ROM drive to enter the Diagnostic CD menu, or press the F4 key at the initial bootup screen to enter the Service Partition menu. Invoke the SEL manager: (a) Use the arrow key to select the System Utility menu and press Enter.
194. very position, the BIOS is able to execute the recovery BIOS (also known as the boot block) instead of the normal BIOS. The recovery BIOS is a self-contained image that exists solely as a fail-safe mechanism for installing a new BIOS image. Insert a bootable BIOS recovery diskette containing the new BIOS image files. Turn on the system power. The recovery BIOS boots from the DOS-bootable recovery diskette, and the server emits a single beep when it passes control to DOS. The server also emits a single beep to indicate the beginning of the flash operation. After a period of time, the BIOS emits two beeps to indicate that the flash procedure was completed successfully. If the flash procedure fails, the BIOS emits a continuous series of beeps. Note: During the BIOS recovery mode, video is not initialized. One high-pitched beep announces the start of the recovery process. The entire process takes two to four minutes. A successful update ends with two high-pitched beeps. Failure is indicated by a long series of short beeps. When the flash update completes: Turn off the system power. Remove the floppy diskette. Restore the RCVR BOOT jumper to its original position. Turn on the system power. The system should now boot normally using the updated system BIOS. 4.14 Restarting and Shutting Down. You
195. you press Enter, the screen shown in Figure 4-22 appears. [FIGURE 4-22 Platform Confidence Test Menu: screen titled "Grizzly BYO Platform Confidence Test v1.03, (c) Copyright 1997-2003 Intel Corp. All Rights Reserved", with Platform Confidence Test Options: Quick Test, Comprehensive Test with continuous looping, Display Help Text, EXIT; prompt "Highlight selection using Cursor UP/DOWN and press ENTER".] You can use this menu to perform the following tests: Quick Test; Comprehensive Test (default); Comprehensive Test With Continuous Looping. All test results are saved in the RESULT.LOG file of the current directory, which is normally C:\PCT. This file is overwritten for each test. Quick Test: This test performs a quick test of the CPU(s), DIMM memory, CPU cache memory, and hard disk drives. It is not a complete test of these units. Quick Test takes from 2 to 5 minutes, depending on the amount of DIMM memory installed. The following test modules are run during Quick Test: Power-On Self-Test (POST); CPU Test(s); Symmetric Multiprocessing (SMP) Processor 0 Test; Symmetric Multiprocessing (SMP) Processor 1 Test; Hard Disk Drive Test(s); Cache Memory Test(s); DIMM Memory Test; Sensor Readings (voltage, temperature, fans, and so on). To run the Quick Test, follow this procedure: 1. Select Quick Test using the arrow keys and press Enter. The initial testing produces a screen similar to the one shown in F
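
Because video is not initialized during BIOS recovery (excerpt 194), the beeps are the only feedback an operator gets. As a rough aid, the beep signals described there can be written down as a small lookup. This is only an illustrative sketch: the names `RecoveryState` and `interpret_recovery_beeps` are hypothetical and not part of any Sun or Intel tool, and the mapping covers only the patterns the excerpt mentions.

```python
from enum import Enum


class RecoveryState(Enum):
    """Hypothetical labels for the beep patterns described in excerpt 194."""
    IN_PROGRESS = "control passed to DOS, or flash operation started"
    SUCCESS = "flash procedure completed successfully"
    FAILURE = "flash procedure failed"
    UNKNOWN = "pattern not described in the excerpt"


def interpret_recovery_beeps(beep_count: int, continuous: bool = False) -> RecoveryState:
    """Map an observed beep pattern to the state the guide associates with it.

    beep_count: number of beeps heard in one group.
    continuous: True if the server emits a continuous or long series of beeps.
    """
    if continuous:
        # A continuous series of beeps indicates that the flash failed.
        return RecoveryState.FAILURE
    if beep_count == 1:
        # A single beep marks a progress point (handoff to DOS or flash start).
        return RecoveryState.IN_PROGRESS
    if beep_count == 2:
        # Two beeps mark a successful flash update.
        return RecoveryState.SUCCESS
    return RecoveryState.UNKNOWN


if __name__ == "__main__":
    print(interpret_recovery_beeps(2).value)
```

A single beep is deliberately treated as "in progress" rather than as a final result, since the excerpt describes one beep both when control passes to DOS and when the flash operation begins.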
