Method to recover from a boot device failure during reboot or system IPL

Contents

1. [0008] During a reboot or Initial Program Load (IPL) of a POWER5 server, the S-HMC displays progress codes. The progress codes are numeric characters which are displayed sequentially. The progress codes indicate the state of the reboot or IPL progress. If the reboot or IPL fails or is stalled, the displayed progress code indicates the point at which the failure occurred. An indication that a reboot or IPL has failed is the fixed display of the same progress code.
   [0009] Depending on the boot status failure, the action necessary for repair may lead to a physical removal of the failed boot device, a replacement of the failed boot device, a modification of the boot list, or a rebuild of the boot devices. For example, when the boot is stuck with status 0557 (a failure to read the boot sector), the plan of action requires pulling the failed boot device. The mirror boot device must then be used to retry the boot.
   [0010] All the recovery actions of a failed reboot or a failed IPL require manual intervention from the system administrator or an IBM representative. The boot recovery action is prone to further problems due to user mistakes during the repair action. With manual intervention, it takes time to perform the repair actions. In addition, while the failure to boot or IPL persists, the DS8000 is in single logical partitioning (LPAR) mode. This increases the likelihood of exposure to a situation where the storage controller is not available if
2. in real time and without user intervention, recover from a boot failure and an IPL failure of a POWER5 server.
   DETAILED DESCRIPTION
   [0015] Reference will now be made in detail to the subject matter disclosed, which is illustrated in the accompanying drawings.
   [0016] Method 100 provides the capability to dynamically, in real time and without user intervention, recover from a failed reboot or IPL of a POWER5 server. The recovery action is based on a new option in the S-HMC allowing the user to select the type of action the Service Processor will take in the event of a boot failure or IPL failure (110, 120). The actions may be to maintain an order of selection of a selected boot device and a plurality of other devices usable for an OS boot (121), deallocate the failed boot device (122), reduce the priority of the boot device in the bootlist (123), remove the boot device from the bootlist (124), or to take no action, allowing the user to manually fix the boot or IPL problem. One action may be selected, and the action may be changed at any time.
   [0017] When the SP detects a failure to boot or IPL from a boot device (130), the SP will check the action requested for a failure to boot the POWER5 server (110). The SP will then notify the user (160) and, once the SP has selected and executed the recovery action (110), the SP will initiate a reboot of the POWER5 server (140). If for some reason the reboot fails again, the SP will take the same action using the reduced pri
3. of a boot success, or taking no action, allowing for manual user intervention.
   Flow diagram of method 100:
   110 receiving a user selected option of an action upon an event of a failure
   120 complying with the user selected option without real time user intervention, further including:
   121 maintaining an order of selection of a selected boot device and a plurality of reduced priority boot devices on the bootlist
   130 detecting a failure
   140 attempting a reboot of the server and an IPL with the selected boot device
   150 detecting success or failure of the selected boot device
   160 notifying a user of the success or failure of the selected boot device
   170 updating the order of selection of boot devices on the bootlist
   180 selecting a reduced priority boot device from the bootlist
   190 attempting a reboot of the server and an IPL with the reduced priority boot device
   200 continuing the reboot of the server and the IPL attempts using the reduced priority boot devices from the bootlist until detection of a boot success and an IPL success
   210 detecting no further boot devices available on the bootlist to successfully complete the boot success and the IPL success
   220 notifying a user of the failure
   Patent Application Publication, Dec. 31, 2009, US 2009/0327813 A1
   100
   110 receiving a user selected option of an action upon an event of a failure
   120 complying with t
4. of selection of boot devices on the bootlist; selecting a reduced priority boot device from the bootlist; attempting a reboot of the server and an IPL with the reduced priority boot device; continuing the reboot of the server and the IPL attempts using the reduced priority boot devices from the bootlist until detection of a boot success and an IPL success; detecting no further boot devices available on the bootlist to successfully complete the boot success and the IPL success; and notifying a user of the failure. Receiving a user selected option of an action upon an event of a boot device failure; complying with the user selected option without real time user intervention, further including: maintaining a bootlist for a server of a plurality of boot devices, further including: maintaining an order of selection of a selected boot device and a plurality of reduced priority boot devices on the bootlist; deallocating a failed boot device; changing a priority of a boot device on the bootlist; removing a boot device from the bootlist; detecting a boot device failure; attempting a reboot of the server with the selected boot device; detecting success or failure of the selected boot device; notifying a user of the success or failure of the selected boot device; updating the order of selection of boot devices on the bootlist; selecting a reduced priority boot device from the bootlist; attempting a reboot of the server with the reduced pr
5. system (OS) is an open standards-based UNIX operating system that allows a user to run the desired applications on IBM UNIX OS-based servers. The boot devices and the Service Processor (SP) are the POWER5 hardware of special interest to this invention.
   [0003] Each POWER5 server in the DS8000 contains two boot devices. One boot device may be designated the primary boot device and the other a mirror boot device. The boot devices contain the boot files to load the AIX kernel, the AIX Operating System, and the DS8000 application program that performs the functions necessary to manage the storage controller. In addition, the boot devices contain configuration files that are used to manage the hardware resources of the storage controller.
   [0004] The Service Processor (SP) is a PowerPC controller embedded in a POWER5 server. The SP contains application programs and device drivers that are required for the functionality of the service processor hardware. The SP application programs are used to manage and monitor the POWER5 server hardware resources and devices. The SP provides functions to manage the automatic power restart of a POWER5 server, to manage the selection of the boot devices, and the capability to modify the boot list. The auto-restart (reboot) option, when enabled, may reboot the system automatically following an unrecoverable hardware- or software-related failure.
   [0005] The SP provides several distinct surveillance functi
6. US 20090327813A1
   (19) United States
   (12) Patent Application Publication: Coronado et al.
   (10) Pub. No.: US 2009/0327813 A1
   (43) Pub. Date: Dec. 31, 2009
   (54) METHOD TO RECOVER FROM A BOOT DEVICE FAILURE DURING REBOOT OR SYSTEM IPL
   (75) Inventors: Juan A. Coronado, Tucson, AZ (US); Aaron E. Taylor, Tucson, AZ (US); Christina A. Lara, Tucson, AZ (US); David W. Sharik, Tucson, AZ (US); Justin D. Suess, Tucson, AZ (US); Phu Nguyen, Tucson, AZ (US); Richard Cunningham, Tucson, AZ (US); Adote A. Tounou, Tucson, AZ (US)
   Correspondence Address: IBM CORPORATION ACCSP, c/o Suiter Swantz pc llo, 14301 FNB Parkway, Suite 220, Omaha, NE 68154 (US)
   (73) Assignee: International Business Machines Corporation, Armonk, NY (US)
   (21) Appl. No.: 12/146,087
   (22) Filed: Jun. 25, 2008
   Publication Classification
   (51) Int. Cl.: G06F 11/22 (2006.01); G06F 11/20 (2006.01)
   (52) U.S. Cl.: 714/36; 714/E11.145; 714/E11.071
   (57) ABSTRACT
   A method of automatic recovery from a boot device failure and an initial program load (IPL) failure of an operating system (OS) comprises receiving and complying with a user selected option of an action upon an event of a boot device failure and an IPL failure. The user selected option may consist of taking the action of attempting an auto reboot of the server with the selected boot device and continuing the reboot attempts using the reduced priority boot devices from the bootlist until detection
7. and include such changes.
   1. A method of automatic recovery of an operating system (OS) from at least one of a boot device failure and an initial program load (IPL) failure, comprising: receiving a user selected option of an action upon an event of a failure; complying with the user selected option without real time user intervention, further including: maintaining a bootlist for a server of a plurality of boot devices, further including: maintaining an order of selection of a selected boot device and a plurality of reduced priority boot devices on the bootlist; deallocating a failed boot device; reducing a priority of a boot device on the bootlist; removing a boot device from the bootlist; detecting a failure; attempting a reboot of the server and an IPL with the selected boot device; detecting success or failure of the selected boot device; notifying a user of the success or failure of the selected boot device; updating the order of selection of boot devices on the bootlist; selecting a reduced priority boot device from the bootlist; attempting a reboot of the server and an IPL with the reduced priority boot device; continuing the reboot of the server and the IPL attempts using the reduced priority boot devices from the bootlist until detection of a boot success and an IPL success; detecting no further boot devices available on the bootlist to successfully complete the boot succe
8. for some reason a failure occurs in the running server. This also affects overall system performance, because the DS8000 performance has been fine-tuned with the availability of both POWER5 servers in mind. Just as important, this could also increase warranty costs, as it can take several hours to provide on-site support, including diagnosis of the problem. This in turn could add several more hours, as the hardware used as boot devices (HDD) is not a normal piece of hardware carried by a customer engineer.
   SUMMARY
   [0011] A method of automatic recovery of an operating system (OS) from at least one of a boot device failure and an initial program load (IPL) failure, including, but not limited to: receiving a user selected option of an action upon an event of a failure; complying with the user selected option without real time user intervention, further including: maintaining a bootlist for a server of a plurality of boot devices, further including: maintaining an order of selection of a selected boot device and a plurality of reduced priority boot devices on the bootlist; deallocating a failed boot device; reducing a priority of a boot device on the bootlist; removing a boot device from the bootlist; detecting a failure; attempting a reboot of the server and an IPL with the selected boot device; detecting success or failure of the selected boot device; notifying a user of the success or failure of the selected boot device; updating the order
9. he user selected option without real time user intervention, further including:
   121 maintaining an order of selection of a selected boot device and a plurality of reduced priority boot devices on the bootlist
   130 detecting a failure
   140 attempting a reboot of the server and an IPL with the selected boot device
   190 attempting a reboot of the server and an IPL with the reduced priority boot device
   200 continuing the reboot of the server and the IPL attempts using the reduced priority boot devices from the bootlist until detection of a boot success and an IPL success
   210 detecting no further boot devices available on the bootlist to successfully complete the boot success and the IPL success
   220 notifying a user of the failure
   FIG. 1
   METHOD TO RECOVER FROM A BOOT DEVICE FAILURE DURING REBOOT OR SYSTEM IPL
   TECHNICAL FIELD
   [0001] The present disclosure generally relates to the field of computer server recovery, and more particularly to a method that provides the capability for a computer server, such as the POWER5 server, to dynamically recover, in real time and without user intervention, from a failed reboot.
   BACKGROUND
   [0002] The IBM DS8000 storage controller is a dual processor complex controller. The dual processor complex controller includes two POWER5-based servers. A POWER5 server contains processors and memory needed to run the applications that manage the DS8000 functions. The AIX operating
10. iority boot device; continuing the reboot attempts using the reduced priority boot devices from the bootlist until detection of a boot success; detecting no further boot devices available on the bootlist to successfully complete the boot success; and notifying a user of the boot device failure. The method embodies a user selected option that comprises taking an action of attempting a reboot of a server with a selected boot device and continuing reboot attempts using reduced priority boot devices from a plurality of boot devices on a bootlist until detection of a boot success, or taking no action, allowing for manual user intervention. The method embodies an additional aspect where the failure detected is an initial program load failure of an operating system.
   [0012] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not necessarily restrictive of the present disclosure. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate subject matter of the disclosure. Together, the descriptions and the drawings serve to explain the principles of the disclosure.
   BRIEF DESCRIPTION OF THE DRAWINGS
   [0013] The numerous advantages of the disclosure may be better understood by those skilled in the art by reference to the accompanying figures in which:
   [0014] FIG. 1 is a flow diagram illustrating a method to dynamically
11. ons. One SP surveillance function monitors the functionality of the firmware during the boot process, while another surveillance function monitors the health of the Operating System. The surveillance functions allow the SP to take appropriate action when the monitor function detects a failure. The SP can deallocate or deconfigure a hardware resource. The hardware resources deallocated or deconfigured are bypassed during the boot process.
   [0006] In general, the configuration and management of each POWER5 server and the DS8000 hardware resources takes place through a Storage Hardware Management (S-HMC) console. The S-HMC is a stand-alone workstation that provides both a Graphical User Interface (GUI) and a Command Line Interface (CLI). In reality, the S-HMC interfaces with the SP to provide a user the capability to perform configuration, resource management, and maintenance activities on a POWER5 server and additional hardware resources of the DS8000 storage controller. The S-HMC console provides, among several functions, the capability to remotely control the power management of each POWER5 server, such as the ability to manage the power of a POWER5 server.
   [0007] The S-HMC, the POWER5 server, and the Service Processor are interconnected through two 16-port Ethernet switches. Each POWER5 server, Service Processor, and the S-HMC connects to each switch. This configuration provides for a fully redundant management network.
12. ority boot device on the bootlist (180). The SP will continue to take actions (200) until the SP detects that it is impossible to boot the system because there are no longer boot devices available to successfully complete a boot or IPL of the OS and its application (210). In the event that no devices are available to boot the system, Method 100 will notify the user of the failure (220).
   [0018] In the present disclosure, the methods disclosed may be implemented as sets of instructions or software readable by a device. Further, it is understood that the specific order or hierarchy of steps in the methods disclosed is an example of an exemplary approach. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the method can be rearranged while remaining within the disclosed subject matter. The accompanying method claims present elements of the various steps in a sample order, and are not necessarily meant to be limited to the specific order or hierarchy presented.
   [0019] It is believed that the present disclosure and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction, and arrangement of the components without departing from the disclosed subject matter or without sacrificing all of its material advantages. The form described is merely explanatory, and it is the intention of the following claims to encompass
13. ss and the IPL success; and notifying a user of the failure.
   * * * * *
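
The recovery flow described in paragraphs [0016] and [0017] can be illustrated with a minimal Python sketch. It is not the Service Processor's implementation and does not use any IBM interface: the names `RecoveryAction`, `recover_boot`, `try_boot`, and `notify` are hypothetical, the boot attempt and the user notification are passed in as plain callables, and the sketch assumes that deallocating and removing a device have the same effect on the bootlist. It also assumes a device that has failed once is not retried in the same run, so that the loop ends when no untried device is left (step 210).

```python
from enum import Enum
from typing import Callable, List, Optional, Set, Tuple


class RecoveryAction(Enum):
    """User-selected option (steps 110/120). All names are illustrative."""
    DEALLOCATE = "deallocate"            # step 122
    REDUCE_PRIORITY = "reduce_priority"  # step 123
    REMOVE = "remove"                    # step 124
    NO_ACTION = "no_action"              # leave the repair to the user


def recover_boot(
    bootlist: List[str],
    action: RecoveryAction,
    try_boot: Callable[[str], bool],
    notify: Callable[[str], None],
) -> Tuple[Optional[str], List[str]]:
    """Try boot devices in bootlist order.

    Returns (device that booted, or None; updated bootlist).
    """
    order = list(bootlist)    # working copy, highest priority first
    failed: Set[str] = set()  # devices that already failed in this run

    while True:
        candidates = [d for d in order if d not in failed]
        if not candidates:
            # Steps 210/220: nothing left to boot from, tell the user.
            notify("boot failed: no boot devices left on the bootlist")
            return None, order

        device = candidates[0]
        if try_boot(device):  # steps 140/190: attempt the reboot or IPL
            notify(f"boot succeeded from {device}")  # step 160
            return device, order

        notify(f"boot failed from {device}")  # steps 130/150/160
        failed.add(device)

        if action is RecoveryAction.NO_ACTION:
            return None, order  # manual intervention required
        if action is RecoveryAction.REDUCE_PRIORITY:
            # Step 170: move the failed device to the end of the list.
            order.remove(device)
            order.append(device)
        else:
            # Assumption: DEALLOCATE and REMOVE both drop the device here.
            order.remove(device)
```

With a hypothetical two-device bootlist such as `["hdisk0", "hdisk1"]`, where only the mirror device boots, calling `recover_boot` with `RecoveryAction.REDUCE_PRIORITY` retries from the mirror and returns it together with the reordered bootlist; with `RecoveryAction.NO_ACTION` it stops after the first failure and leaves the bootlist unchanged.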
