1. Figure 1-6 Continental Cluster: two Serviceguard clusters (a Los Angeles cluster in Data Center A and a New York cluster in Data Center B) joined by a high-availability network, with data replication and/or mirroring between the sites.

Continentalclusters provides the flexibility to work with any data replication mechanism. It provides pre-integrated solutions that use HP StorageWorks Continuous Access XP, HP StorageWorks Continuous Access EVA, or EMC Symmetrix Remote Data Facility for data replication. The points to consider when configuring a continental cluster over a WAN are:

• Inter-cluster connections are TCP/IP based.
• The physical connection is one or more leased lines managed by a common carrier. Common carriers cannot guarantee the same reliability that a dedicated physical cable can.
• The distance can introduce a time lag for data replication, which creates an issue with data currency. This could increase the cost by requiring higher-speed WAN connections to improve data replication performance and reduce latency.
• Operational issues, such as working with different personnel trained on different processes and conducting failover rehearsals, are made more difficult the further apart the nodes are in the cluster.

Benefits of Continentalclusters

You can build data centers virtually anywhere and still
2. 2. Run the following command to add the mirrored half to the array:
mdadm --add /dev/md0 /dev/hpdev/mylink_sde
The re-mirroring process is initiated. When it is complete, the extended distance cluster detects the added mirror half and accepts S1 as part of md0.

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: A package P1 is running on a node, Node 1. The package uses a mirror, md0, that consists of two storage components: S1, local to Node 1 (/dev/hpdev/mylink_sde), and S2, local to Node 2. Data center 1, which consists of Node 1 and S1, experiences a failure. NOTE: In this example, failures in a data center are instantaneous (for example, a power failure).

What Happens When This Disaster Occurs: The package P1 fails over to Node 2 and starts running with the mirror of md0 that consists of only the storage local to Node 2 (S2).

Recovery Process: Complete the following procedure to initiate a recovery:
1. Restore data center 1 (Node 1 and storage S1). Once Node 1 is restored, it rejoins the cluster. Once S1 is restored, it becomes accessible from Node 2. When the package failed over and started on Node 2, S1 was not a part of md0. As a result, you need to add S1 into md0. Run the following command to add S1 to md0:
mdadm --add /dev/md0 /dev/hpdev/mylink_sde
The re-mirroring process is then initiated; you can watch it complete as shown below.
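The manual does not show it here, but a quick, hedged way to confirm that the re-mirroring has finished is to query the MD device directly. The device names below are the ones used in this example, and the exact output format varies by mdadm version:

    # Watch resync progress; the recovery line disappears when mirroring is complete
    cat /proc/mdstat

    # Or check the array state and both component devices explicitly;
    # both members should eventually report "active sync"
    mdadm --detail /dev/md0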
3. Disaster Tolerant Local Area Networking

Ethernet networks can also be used to connect nodes in a disaster tolerant architecture, within the following guidelines:
• Each node is connected to redundant switches and bridges using two Ethernet host adapters. Bridges, repeaters, or other components that convert from copper to fibre cable may be used to span longer distances.

Disaster Tolerant Wide Area Networking

Disaster tolerant networking for continental clusters is directly tied to the data replication method. In addition to the redundant lines connecting the remote nodes, you also need to consider what bandwidth you need to support the data replication method you have chosen. A continental cluster that handles a high number of transactions per minute will not only require a highly available network, but also one with a large amount of bandwidth.

This is a brief discussion of things to consider when choosing the network configuration for your continental cluster. Details on WAN choices and configurations can be found in Continental Cluster documentation available from http://docs.hp.com -> High Availability.

• Bandwidth affects the rate of data replication, and therefore the currency of the data should there be the need to switch control to another site. The greater the number of transactions you process, the more bandwidth
4. Node 1: Node 1 experiences a failure.

What Happens When This Disaster Occurs: The package P1 fails over to another node, Node 2. This node, Node 2, is configured to take over the package when it fails on Node 1.

Recovery Process: As the network and both the mirrored disk sets are accessible on Node 2, and were also accessible when Node 1 failed, you only need to restore Node 1. Then you must enable the package to run on Node 1 after it is repaired by running the following command:
cmmodpkg -e P1 -n N1

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: A package P1 is running on a node, Node 1. The package uses a mirror, md0, that consists of two storage components: S1, local to Node 1 (/dev/hpdev/mylink_sde), and S2, local to Node 2. Access to S1 is lost from both nodes, either due to a power failure to S1 or loss of the FC links to S1.

What Happens When This Disaster Occurs: The package P1 continues to run on Node 1 with the mirror that consists of only S2.

Recovery Process: Once you restore power to S1, or restore the FC links to S1, the corresponding mirror half of S1 (/dev/hpdev/mylink_sde) is accessible from Node 1. To make the restored mirror half part of the MD array, complete the following procedure (a quick check you can run alongside these steps is sketched below):
1. Run the following command to remove the mirrored half from the array:
mdadm --remove /dev/md0 /dev/hpdev/mylink_sde
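As a sanity check around these steps (not part of the original procedure), you can confirm how the kernel currently sees the S1 component. The device names are the ones from this scenario:

    # A component that lost access shows up with an (F) flag in /proc/mdstat
    cat /proc/mdstat

    # mdadm --detail lists the same component as faulty; only a faulty or
    # inactive component can be removed with "mdadm --remove"
    mdadm --detail /dev/md0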
5. command to restart the package on N2:
cmrunpkg <package_name>
When the package starts up on N2, it automatically adds S1 back into the array and starts re-mirroring from S1 to S2. When re-mirroring is complete, the extended distance cluster detects and accepts S1 as part of md0 again.

For the second failure, restore N1. Once it is restored, it joins the cluster and can access S1 and S2.
1. Run the following command to enable P1 to run on N1:
cmmodpkg -e P1 -n N1

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: This scenario is an extension of the previous failure scenario. In the previous scenario, when the package fails over to N2, it does not start, as the value of RPO_TARGET would have been exceeded.

What Happens When This Disaster Occurs: To forcefully start the package P1 on N2 when the FC links are not restored, check the package log file on N2 and execute the commands that appear in it. If the FC links are not restored on N2, you can only start the package forcefully. You can forcefully start a package only if it is determined that the associated data loss is acceptable. After you execute the force-start commands, package P1 starts on N2 and runs with md0 consisting of only S2 (/dev/hpdev/mylink_sdf).

Recovery Process: Complete the following procedure to initiate
6. requiring the node to make multiple disk I Os Continuous Access XP on the HP StorageWorks Disk Array XP series is an example of physical replication in hardware a single disk I O is replicated across the Continuous Access link to a second XP disk array Advantages of physical replication in hardware are e There is little or no lag time writing to the replica This means that the data remains very current e Replication consumes no additional CPU e The hardware deals with resynchronization if the link or disk fails And resynchronization is independent of CPU failure if the CPU fails and the disk remains up the disk knows it does not have to be resynchronized e Data can be copied in both directions so that if the primary fails and the replica takes over data can be copied back to the primary when it comes back up Disadvantages of physical replication in hardware are Chapter 1 NOTE Chapter 1 Disaster Tolerance and Recovery in a Serviceguard Cluster Disaster Tolerant Architecture Guidelines The logical order of data writes is not always maintained in synchronous replication When a replication link goes down and transactions continue at the primary site writes to the primary disk are queued in a bit map When the link is restored if there has been more than one write to the primary disk then there is no way to determine the original order of transactions until the resynchronization has completed successfully
7. Table 1-1 Comparison of Disaster Tolerant Cluster Solutions (Continued)

DTS Software Licenses Required:
• Extended Distance Cluster: SGLX, XDC.
• CLX: SGLX, CLX XP or CLX EVA.
• Continentalclusters (HP-UX only): SG, Continentalclusters, and Metrocluster Continuous Access XP, or Metrocluster Continuous Access EVA, or Metrocluster EMC SRDF, or Enterprise Cluster Master Toolkit, or a customer-selected data replication subsystem. For CC with RAC: SG, SGeRAC, Continentalclusters.

Disaster Tolerant Architecture Guidelines

Disaster tolerant architectures represent a shift away from massive central data centers and toward more distributed data processing facilities. While each architecture will be different to suit specific availability needs, there are a few basic guidelines for designing a disaster tolerant architecture so that it protects against the loss of an entire data center:

• Protecting nodes through geographic dispersion
• Protecting data through replication
• Using alternative power sources
• Creating highly available networks

These guidelines are in addition to the standard high availability guidelines
8. Cluster solutions in order of preference e Quorum Server running in a Serviceguard cluster e Quorum Server For more information on Quorum Server see the Serviceguard Quorum Server Release Notes for Linux Figure 2 1 is an example of a two data center and third location configuration using DWDM with a quorum server node on the third site 54 Chapter 2 Figure 2 1 Chapier 2 Building an Extended Distance Cluster Using Serviceguard and Software RAID Two Data Center and Quorum Service Location Architectures Two Data Centers and Third Location with DWDM and Quorum Server Quorum Sever Third Location Maximum distance 100 kilometers Figure 2 1 is an example of a two data center and third location configuration using DWDM with a quorum server node on the third site The DWDM boxes connected between the two Primary Data Centers are configured with redundant dark fiber links and the standby fibre feature has been enabled 55 Building an Extended Distance Cluster Using Serviceguard and Software RAID Two Data Center and Quorum Service Location Architectures 56 There are no requirements for the distance between the Quorum Server Data center and the Primary Data Centers however it is necessary to ensure that the Quorum Server can be contacted within a reasonable amount of time should be within the NODE_TIMEOUT period LockLUN arbitration is not allowed in this configuration Chapier 2 Chapter 2 Building an Exte
9. DC2 of the failure. So DC2 assumes that there were no updates after t1, but there may have been. When this scenario occurs, disk writes continue on DC1 until t3. In this case, the effective value of the RPO_TARGET parameter is greater than the expected value of 0.

Again, if the network is set up in such a way that when the links between the sites fail, the communication links to the application clients are also shut down, then the unintended writes are not acknowledged and have no long-term effect.

IMPORTANT: The value you set for RPO_TARGET must be more than the value you set for the RAID_MONITOR_INTERVAL parameter. By default, the RAID_MONITOR_INTERVAL parameter is set to 30 seconds. For example: RPO_TARGET=60 (60 seconds).

• MULTIPLE DEVICES AND COMPONENT DEVICES parameters
The RAID_DEVICE parameter specifies the MD devices that are used by a package. You must begin with RAID_DEVICE[0] and increment the list in sequence. The component device parameters DEVICE_0 and DEVICE_1 specify the component devices for the MD device of the same index. For example, if a package uses multiple devices, such as md0 consisting of devices /dev/hpdev/sde and /dev/hpdev/sdf, and md1 consisting of devices /dev/hpdev/sdg1 and /dev/hpdev/mylink_sdh1, use entries of the following form:
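A sketch of those raid.conf entries, assuming the shell-array index syntax used by the XDC template; verify the exact form against the raid.conf.template shipped with the product:

    # md0 and its two component devices (one per data center)
    RAID_DEVICE[0]=/dev/md0
    DEVICE_0[0]=/dev/hpdev/sde
    DEVICE_1[0]=/dev/hpdev/sdf

    # md1 and its two component devices
    RAID_DEVICE[1]=/dev/md1
    DEVICE_0[1]=/dev/hpdev/sdg1
    DEVICE_1[1]=/dev/hpdev/mylink_sdh1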
10. Disk  101
Adding a Mirror Component Device  102

Printing History

Table 1 Editions and Releases

Printing Date: November 2006; Part Number: T2808-90001; Edition 1. Operating System Releases (see Note below):
• Red Hat 4U3 or later
• Novell SUSE Linux Enterprise Server 9 SP3 or later
• Novell SUSE Linux Enterprise Server 10 or later

Printing Date: May 2008; Part Number: T2808-90006; Edition 2. Operating System Releases:
• Red Hat 4U3 or later
• Novell SUSE Linux Enterprise Server 9 SP3 or later
• Novell SUSE Linux Enterprise Server 10 or later

The printing date and part number indicate the current edition. The printing date changes when a new edition is printed. Minor corrections and updates which are incorporated at reprint do not cause the date to change. The part number changes when extensive technical changes are incorporated. New editions of this manual will incorporate all material updated since the previous edition.

NOTE: This document describes a group of separate software products that are released independently of one another. Not all products described in this document are necessarily supported on all the same operating system releases. Consult your product's Release Notes for information about supported platforms.

HP Printing Division: Business Critical Computing Business Unit, Hewlett-Packard Co., 19111 Pruneridge Ave., Cupertino, CA 95014

Preface

This guide introduces th
11. Table 1-1 Comparison of Disaster Tolerant Cluster Solutions (Continued)

Data replication and resynchronization (continued):
• Extended Distance Cluster: any SG-supported storage; writes are synchronous. Resynchronization can impact performance. Complete resynchronization is required in many scenarios that have multiple failures.
• CLX: XP or EVA with its own replication; replication and resynchronization are performed by the storage subsystem, so the host does not experience a performance hit. Incremental resynchronizations are done based on a bitmap, minimizing the need for full re-syncs.
• Continentalclusters (HP-UX only): your choice of storage and data replication mechanism, or implementing one of HP's pre-integrated solutions, including Continuous Access XP, Continuous Access EVA, and EMC SRDF for array-based replication, or Oracle 8i Standby for host-based replication. You may also choose Oracle 9i Data Guard as a host-based solution. Contributed (that is, unsupported) integration templates exist for Oracle 9i.

Application Failover:
• Extended Distance Cluster: Automatic; no manual intervention required.
• CLX: Automatic; no manual intervention required.
• Continentalclusters: Semi-automatic; the user must push the button to initiate recovery.

Access Mode:
• Extended Distance Cluster: Active/Standby.
• CLX: Active/Standby.
• Continentalclusters: Active/Standby for a package.

Client Transparency:
• Extended Distance Cluster: Client detects the lost connection; you must reconnect once the application is recovered at the second site.
• CLX: Client detects the lost connection; you must reconnect once the application is recovered at the second site.
• Continentalclusters: You must reconnect once the application is recovered at the second site.
12. HA Clusters Using Metrocluster and Continental clusters B7660 900xx e HP StorageWorks Cluster Extension EVA user guide e HP StorageWorks Cluster Extension XP for HP Serviceguard for Linux e HP Serviceguard for Linux Version A 11 16 Re ease Notes e Managing HP Serviceguard for Linux Use the following URL to access HP s High Availability web page http www hp com go ha If you have problems with HP software or hardware products please contact your HP support representative 11 12 Chapter 1 Disaster Tolerance and Recovery in a Serviceguard Cluster Disaster Tolerance and Recovery in a Serviceguard Cluster This chapter introduces a variety of Hewlett Packard high availability cluster technologies that provide disaster tolerance for your mission critical applications It is assumed that you are already familiar with Serviceguard high availability concepts and configurations This chapter covers the following topics Evaluating the Need for Disaster Tolerance on page 14 What is a Disaster Tolerant Architecture on page 16 Understanding Types of Disaster Tolerant Clusters on page 18 Disaster Tolerant Architecture Guidelines on page 37 Managing a Disaster Tolerant Environment on page 48 Additional Disaster Tolerant Solutions Information on page 50 13 Disaster Tolerance and Recovery in a Serviceguard Cluster Evaluating the Need for Disaster Tolerance 14 Evaluating the N
13. RPO_TARGET.
• Extended Distance Cluster disk reads may outperform CLX in normal operations. On the other hand, CLX data resynchronization and recovery performance are better than Extended Distance Cluster.

Continental Cluster

A continental cluster provides an alternative disaster tolerant solution in which distinct clusters can be separated by large distances, with wide area networking used between them. Continental cluster architecture is implemented using the Continentalclusters product, described fully in Chapter 2 of the Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters user's guide. This product is available only on HP-UX and not on Linux.

The design is implemented with two distinct Serviceguard clusters that can be located in different geographic areas with the same or different subnet configuration. In this architecture, each cluster maintains its own quorum, so an arbitrator data center is not used for a continental cluster. A continental cluster can use any WAN connection through a TCP/IP protocol; however, due to data replication needs, high-speed connections such as T1 or T3/E3 leased lines or switched lines may be required. See Figure 1-6. A continental cluster can also be built using multiple clusters that communicate over shorter distances using a conventional LAN.
14. RPO_TARGET values. The discussion below is based on the timeline and events shown in Figure 3-1.

Figure 3-1 RPO Target Definitions: a timeline with points t0 through t4. At t0, the storage links are lost between the data centers before DC1 fails, and disk updates are paused based on the Link Down Timeout parameter. At t1, the Link Down Timeout expires and the package continues writing to DC1 storage. At t2, DC1 fails and heartbeats are lost. At t3, the cluster is reconfigured without the DC1 nodes. At t4, the package is started on DC2 using only local data if t4 minus t1 is less than RPO_TARGET.

To ensure that no data is lost when a package fails over to DC2 and starts with only DC2's local storage, t2 must occur between t0 and t1.

Now consider an XDC configuration such as that shown in Figure 1-3 (DWDM links between data centers). If DC1 fails such that links A and B both fail simultaneously, and DC1's connection to the Quorum Server fails at the same time, Serviceguard ensures that DC2 survives and the package fails over and runs with DC2 local storage. But if DC1's links A and B fail, and later DC1's link to the Quorum Server fails, then both sets of nodes (DC1 and DC2) will try to obtain the cluster lock from the Quorum Server. If the Quorum Server chooses DC1, which is about to experience complete site failure, then the entire cluster will go down. But if the Quorum Server chooses DC2 instead
15. This increases the risk of data inconsistency Also because the replicated data is a write operation to a physical disk block database corruption and human errors such as the accidental removal of a database table are replicated at the remote site Configuring the disk so that it does not allow a subsequent disk write until the current disk write is copied to the replica synchronous writes can limit this risk as long as the link remains up Synchronous writes impact the capacity and performance of the data replication technology Redundant disk hardware and cabling are required This at a minimum doubles data storage costs because the technology is in the disk itself and requires specialized hardware For architectures using dedicated cables the distance between the sites is limited by the cable interconnect technology Different technologies support different distances and provide different data through performance For architectures using common carriers the costs can vary dramatically and the connection can be less reliable depending on the Service Level Agreement Advantages of physical replication in software are There is little or no time lag between the initial and replicated disk I O so data remains very current The solution is independent of disk technology so you can use any supported disk technology Data copies are peers so there is no issue with reconfiguring a replica to function as a primary
16. Maximum Cluster Size Allowed:
• Extended Distance Cluster: 2 nodes for this release.
• CLX: 2 to 16 nodes.
• Continentalclusters: 1 to 16 nodes in each cluster, supporting up to 3 primary clusters and one recovery cluster (a maximum total of 4 clusters, 64 nodes).

Storage:
• Extended Distance Cluster: Identical storage is not required; replication is with MD.
• CLX: Identical storage is required; storage-based mirroring is used.
• Continentalclusters: Identical storage is required if host-based mirroring is used. Identical storage is not required for other data replication implementations.

Table 1-1 Comparison of Disaster Tolerant Cluster Solutions (Continued)

Data Replication Links:
• Extended Distance Cluster: Dark Fiber.
• CLX: Dark Fiber; Continuous Access over IP (pre-integrated solution); Continuous Access over ATM (pre-integrated solution).
• Continentalclusters (HP-UX only): WAN (IP) or LAN; Dark Fiber; Continuous Access over ATM (pre-integrated solution).

Cluster Network:
• Extended Distance Cluster: Single or multiple IP subnet.
• CLX: Single or multiple IP subnet.
• Continentalclusters: Two configurations: a single IP subnet for both clusters with a LAN connection between clusters, or two IP subnets (one per cluster) with a WAN connection between clusters.
17. a recovery:
1. Reconnect the FC links between the data centers. As a result, S1 (/dev/hpdev/mylink_sde) becomes accessible from N2.
2. Run the following command to add S1 to md0 on N2:
mdadm --add /dev/md0 /dev/hpdev/mylink_sde
This command initiates the re-mirroring process from S2 to S1. When re-mirroring is complete, the extended distance cluster detects S1 and accepts it as part of md0.

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: In this case, the package P1 runs with RPO_TARGET set to 60 seconds. Initially, the package P1 is running on node N1. P1 uses a mirror, md0, consisting of S1, local to node N1 (for example, /dev/hpdev/mylink_sde), and S2, local to node N2. The first failure occurs when all FC links between the two data centers fail, causing N1 to lose access to S2 and N2 to lose access to S1. Immediately afterwards, a second failure occurs where node N1 goes down because of a power failure. After N1 is repaired and brought back into the cluster, package switching of P1 to N1 is enabled.

IMPORTANT: While it is not a good idea to enable package switching of P1 to N1, it is described here to show recovery from an operator error.

The FC links between the data centers are not repaired, and N2 becomes inaccessible
18. because of a power failure.

What Happens When This Disaster Occurs: When the first failure occurs, the package P1 continues to run on N1 with md0 consisting of only S1. When the second failure occurs, the package fails over to N2 and starts with S2. When N2 fails, the package does not start on node N1, because a package is allowed to start only once with a single disk. You must repair this failure, and both disks must be synchronized and be a part of the MD array, before another failure of the same pattern occurs.

Recovery Process: In this failure scenario, only S1 is available to P1 on N1, as the FC links between the data centers are not repaired. As P1 started once with S2 on N2, it cannot start on N1 until both disks are available. Complete the following steps to initiate a recovery:
1. Restore the FC links between the data centers. As a result, S2 (/dev/hpdev/mylink_sdf) becomes available to N1, and S1 (/dev/hpdev/mylink_sde) becomes accessible from N2.
2. To start the package P1 on N1, check the package log file in the package directory and run the commands which appear there to force a package start (see the sketch below).
When the package starts up on N1, it automatically adds S2 back into the array and the re-mirroring process is started. When re-mirroring is complete, the extended distance cluster detects and accepts S1 as part of md0.
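The manual deliberately leaves the exact force-start commands to the package log. As an assumption-laden sketch for locating that log (the package directory follows this guide's $SGCONF/&lt;pkgdir&gt; convention; the log file name shown is only an illustration and may differ in your installation):

    # Show cluster, node, and package status to confirm where P1 last ran
    cmviewcl -v

    # List the package directory and review the newest log for the
    # force-start instructions written by the XDC scripts
    ls -lt $SGCONF/P1/
    tail -n 50 $SGCONF/P1/P1.cntl.log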
19. RAID_DEVICE[0]=/dev/md0
DEVICE_0[0]=/dev/hpdev/sde
DEVICE_1[0]=/dev/hpdev/sdf
# md1
RAID_DEVICE[1]=/dev/md1
DEVICE_0[1]=/dev/hpdev/sdg1
DEVICE_1[1]=/dev/hpdev/sdh1

The MD RAID device names and the component device names must be unique across the packages in the entire cluster.

• RAID_MONITOR_INTERVAL
This parameter defines the time interval, in seconds, that the RAID monitor script waits between each check to verify accessibility of both component devices of all mirror devices used by this package. By default, this parameter is set to 30 seconds.

IMPORTANT: After you edit the parameters, ensure that you copy the package control script and the edited raid.conf file to all nodes in the cluster. All the nodes in the cluster must have an identical copy of the files.

After you have installed the XDC software and completed the configuration procedures, your environment is equipped to handle failure scenarios. The subsequent chapter discusses how certain disaster scenarios are handled by the XDC software.

4 Disaster Scenarios and Their Handling

The pre
20. device in data center 2. As a result, the data that is written to this MD device is simultaneously written to both devices. A package that is running on one node in one data center has access to data from both storage devices. The two recommended configurations for the extended distance cluster are both described below.

Figure 1-3 Extended Distance Cluster: two data centers (Datacenter 1 and Datacenter 2) with client connections, plus a separate quorum location.

In the above configuration, the network and FC links between the data centers are combined and sent over common DWDM links. Two DWDM links provide redundancy; when one of them fails, the other may still be active and may keep the two data centers connected. Using the DWDM link, clusters can now be extended to greater distances, which were not possible earlier due to limits imposed by the Fibre Channel link for storage and Ethernet for networks. Storage in both data centers is connected to both the nodes via two FC switches in order to provide multiple paths. This configuration supports a distance of up to 100 km between Datacenter 1 and Datacenter 2.

Figure 1-4 Two Data Center Setup: Datacenter 1 (DC1) and Datacenter 2 (DC
21. for more information on supported devices.

Guidelines on DWDM Links for Network and Data

3 Configuring your Environment for Software RAID

The previous chapters discussed conceptual information on disaster tolerant architectures and procedural information on creating an extended distance cluster. This chapter discusses the procedures you need to follow to configure Software RAID in your extended distance cluster. Following are the topics discussed in this chapter:
• Understanding Software RAID on page 62
• Installing the Extended Distance Cluster Software on page 63
• Configuring the Environment on page 66

Understanding Software RAID

Redundant Array of Independent Disks (RAID) is a mechanism that provides storage fault tolerance and, occasionally, better performance. Software RAID is designed on the concept of RAID 1. RAID 1 uses mirroring, where data is written to two disks at the same time. The Serviceguard XDC product uses the Multiple Device (MD) driver and its associated tool, mdadm, to implement Software RAID. With Software RAID, two disks or disk sets are configured so that the same data is written on both disks as one write transaction. So if data from one disk set is lost or
22. get hours or days behind in replication using asynchronous replication. If the application fails over to the remote site, it would start up with data that is not current. Currently, the two ways of replicating data on-line are physical data replication and logical data replication. Either of these can be configured to use synchronous or asynchronous writes.

Physical Data Replication

Each physical write to disk is replicated on another disk at another site. Because the replication is a physical write to disk, it is not application dependent. This allows each node to run different applications under normal circumstances. Then, if a disaster occurs, an alternate node can take ownership of applications and data, provided the replicated data is current and consistent. As shown in Figure 1-7, physical replication can be done in software or hardware.

Figure 1-7 Physical Data Replication: physical replication in software (MD Software RAID), where direct access to both copies of data is optional and the distance between nodes is limited by the replication link technology (for example ESCON, FC, or DWDM) and the intermediate devices.

MD Software RAID is an example of physical replication done in the software: a disk I/O is written to each array connected to the node.
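To make the software-mirroring idea concrete, here is a minimal, hedged sketch of assembling such a RAID 1 pair with mdadm, using the persistent device names this guide uses elsewhere for the two mirror halves (one per data center); the chapter on configuring Software RAID describes the supported procedure in full:

    # Create a two-way mirror: one component device from each data center
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/hpdev/mylink_sde /dev/hpdev/mylink_sdf

    # Every write to /dev/md0 now lands on both halves of the mirror
    cat /proc/mdstat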
23. have the data centers provide disaster tolerance for each other. Since Continentalclusters uses two clusters, theoretically there is no limit to the distance between the two clusters. The distance between the clusters is dictated by the required rate of data replication to the remote site, the level of data currency, and the quality of networking links between the two data centers.

In addition, inter-cluster communication can be implemented with either a WAN or LAN topology. LAN support is advantageous when you have data centers in close proximity to each other but do not want the data centers configured into a single cluster. One example may be when you already have two Serviceguard clusters close to each other and, for business reasons, you cannot merge these two clusters into a single cluster. If you are concerned with one of the centers becoming unavailable, Continentalclusters can be added to provide disaster tolerance. Furthermore, Continentalclusters can be implemented with an existing Serviceguard cluster architecture while keeping both clusters running, and it provides flexibility by supporting disaster recovery failover between two clusters that are on the same subnet or on different subnets.

You can integrate Continentalclusters with any storage component of choice that is supported by Serviceguard. Continentalclusters provides a structure to work with any type of data replication mechanism. A set of guidelines for integrating other data replica
24. if one disk set is rendered unavailable, the data is always available from the second disk set. As a result, high availability of data is guaranteed. In an extended distance cluster, the two disk sets are in two physically separated locations, so if one location becomes unavailable, the other location still has the data. For more information on Linux Software RAID, see The Software-RAID HOWTO manual, available at http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html

To enable Software RAID in your extended distance cluster, you need to complete the following:
1. Install the extended distance cluster software.
2. Copy the files into package directories.
3. Configure packages that will use Software RAID.

The subsequent sections include information on installing Extended Distance Cluster software and configuring your environment for Software RAID.

Installing the Extended Distance Cluster Software

This section discusses the supported operating systems, prerequisites, and the procedures for installing the Extended Distance Cluster software.

Supported Operating Systems

The Extended Distance Cluster software supports the following operating systems:
• Red Hat 4U3 or later
• Novell SUSE Linux Enterprise Server 9 SP3 or later
• Novell SUSE Linux Enterprise Server 10 or later

Prerequisites

Following are the prerequisite
25. often must be done at both the file system level and the database level in order to replicate all of the data associated with an application. Most database vendors have one or more database replication products; an example is the Oracle Standby Database. Logical replication can be configured to use synchronous or asynchronous writes. Transaction processing monitors (TPMs) can also perform logical replication.

Figure 1-8 Logical Data Replication: logical replication in software between node 1 and node 1a over the network; there is no direct access to both copies of data.

Advantages of using logical replication are:

• The distance between nodes is limited only by the networking technology.
• There is no additional hardware needed to do logical replication, unless you choose to boost CPU power and network bandwidth.
• Logical replication can be implemented to reduce the risk of duplicating human error. For example, if a database administrator erroneously removes a table from the database, a physical replication method will duplicate that error at the remote site as a raw write to disk. A logical replication method can be implemented to delay applying the data at a remote site, so such errors would not be replicated at the remote site. This also means that administrative tasks, such as adding or removing database tables, h
26. the MD mirror from starting. To avoid this problem, HP requires that you make the device names persistent. For more information on configuring persistent device names, see Using Persistent Device Names on page 71.

Create the MD mirror device. To enable Software RAID in your environment, you need to first create the mirror setup. This means that you specify two disks to create a Multiple Device (MD). When configuring disks at RAID 1 level, use a disk or LUN from each data center as one mirror half. Be sure to create disk sets of the same size, as they need to store data of identical size. Differences in disk set size result in a mirror being created with a size equal to the smaller of the two disks.

IMPORTANT: Be sure to create the mirror using the persistent device names of the component devices.

For more information on creating and managing a mirrored device, see Creating a Multiple Disk Device on page 72.

Create volume groups and logical volumes on the MD mirror device. Once the MD mirror device is created, you need to create volume groups and logical volumes on it. You must use the Volume Group Exclusive activation feature. This protects against a volume group that is already active on one node being activated again, accidentally or on purpose, on any other node in the cluster. For more information on creatin
27. then the application running on DC1 will not be able to write to the remote storage, but will continue to write to its local DC1 storage until site failure occurs at t3. If the network is configured to prevent the application from communicating with its clients under these circumstances, the clients will not receive any acknowledgement of these writes; after the failover they will retransmit them, and the writes will be committed and acknowledged at DC2. This is the desired outcome. HP therefore recommends that you configure the network such that when the links between the sites fail, the communication links to the application clients are also shut down.

In the case of an XDC configuration such as that shown in Figure 1-4, there is an additional variable in the possible failure scenarios. Instead of a DWDM link, in this configuration there are two separate LAN and FC links, which can experience failure independently of each other. If the network links between the sites fail within a very short period (on the order of 1 second) after t1, when the storage links had failed, the XDC software on DC1 will not have time to inform the XDC on
28. with the md0 that consists of only S1. After the second failure, the package P1 fails over to N2 and starts with S2. Data that was written to S1 after the FC link failure is now lost, because RPO_TARGET was set to IGNORE.

Recovery Process: In this scenario, no attempts are made to repair the first failure until the second failure occurs. Typically, the second failure occurs before the first failure is repaired.
1. To recover from the first failure, restore the FC links between the data centers. As a result, S1 is accessible from N2.
2. Run the following command to add S1 to md0 on N2:
mdadm --add /dev/md0 /dev/hpdev/mylink_sde
This command initiates the re-mirroring process. When it is complete, the extended distance cluster detects S1 and accepts it as part of md0.
For the second failure, restore N1. Once it is restored, it joins the cluster and can access S1 and S2.
1. Run the following command to enable P1 to run on N1:
cmmodpkg -e P1 -n N1

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: This failure is the same as the previous failure, except that the package P1 is configured with RPO_TARGET set to 60 seconds. In this case, initially the package P1 is running on N1. P1 uses a mirror, md0, consisting of S1, local to node N1 (/dev/hpdev/mylink_sde)
29. you will need The following connection types offer differing amounts of bandwidth TlandT3 low end ISDN and DSL medium bandwidth ATM high end e Reliability affects whether or not data replication happens and therefore the consistency of the data should you need to fail over to the recovery cluster Redundant leased lines should be used and should be from two different common carriers if possible e Cost influences both bandwidth and reliability Higher bandwidth and dual leased lines cost more It is best to address data consistency issues first by installing redundant lines then weigh the price of data currency and select the line speed accordingly Chapter 1 Disaster Tolerance and Recovery in a Serviceguard Cluster Disaster Tolerant Architecture Guidelines Disaster Tolerant Cluster Limitations Disaster tolerant clusters have limitations some of which can be mitigated by good planning Some examples of MPOF that may not be covered by disaster tolerant configurations e Failure of all networks among all data centers This can be mitigated by using a different route for all network cables e Loss of power in more than one data center This can be mitigated by making sure data centers are on different power circuits and redundant power supplies are on different circuits If power outages are frequent in your area and down timeis expensive you may want to invest in a backup generator e Loss of all cop
30. 10.noarch.rpm
This command initializes the XDC software installation. After you install XDC, you need to copy the raid.conf template into each package directory for which you need to enable Software RAID.
6. Run the following command to copy the raid.conf.template file as a raid.conf file into each package directory:
cp $SGROOT/xdc/raid.conf.template $SGCONF/<pkgdir>/raid.conf
The file is copied into the package directory. Installing the Extended Distance Cluster software does not enable Software RAID for every package in your environment. You need to manually enable Software RAID for a package by copying the files into the package directories. Also, if you need to enable Software RAID for more than one package in your environment, you need to copy the files and templates into each of those package directories. You must edit these template files later.

Verifying the XDC Installation

After you install XDC, run the following command to ensure that the software is installed:
rpm -qa | grep xdc
In the output, the product name xdc-A.01.00-0 will be listed. The presence of this entry verifies that the installation is successful.

Configuring the Environment

After setting up the hardware as described in the Extended Distance C
31. 2) with client connections.

Figure 1-4 shows a configuration that is supported with separate network and FC links between the data centers. In this configuration, the FC links and the Ethernet networks are not carried over DWDM links, but each of these links is duplicated between the two data centers for redundancy. The disadvantage of having the network and the FC links separate is that, if there is a link failure between sites, the ability to exchange heartbeats and the ability to write mirrored data will not be lost at the same time. This configuration is supported to a distance of 10 km between data centers.

All the nodes in the extended distance cluster must be configured with the QLogic driver's multipath feature in order to provide redundancy in connectivity to the storage devices. Mirroring for the storage is configured such that each half of the mirror disk set is physically present at one data center. Further, from each of the nodes there are multiple paths to both of these mirror halves.

Also note that the networking in the configuration shown is the minimum; added network connections for additional heartbeats are recommended.

Benefits of Extended Distance Cluster

This configuration implements a single Serviceguard cluster across two data centers and uses either Multiple Device (MD) driver for d
32. 4452K/sec
unused devices: <none>
33. Cluster Extension for EVA or XP e HP StorageWorks Cluster Extension EVA user guide e HP StorageWorks Cluster Extension XP for HP Serviceguard for Linux Chapter 1 Building an Extended Distance Cluster Using Serviceguard and Software RAID 2 Building an Extended Distance Cluster Using Serviceguard and Software RAID Simple Serviceguard clusters are usually configured in a single data center often in a single room to provide protection against failures in CPUs interface cards and software E xtended Serviceguard clusters are specialized cluster configurations which allow a single cluster to extend across two separate data centers for increased disaster tolerance Depending on the type of links employed distances of up to 100 kms between data centers can be achieved This chapter discusses several types of extended distance cluster that use basic Serviceguard technology with software mirroring using MD Software RAID and Fibre Channel Both two data center and three data center architectures are illustrated This chapter discusses the following e Types of Data Link for Storage and Networking on page 52 e Two Data Center and Quorum Service Location Architectures on page 53 e Rules for Separate Network and Data Links on page 57 e Guidelines on DWDM Links for Network and Data on page 58 Chapter 2 51 Building an Extended Distance Cluster Using Serviceguard and Software RAID Types of Data Link for Stor
34. Data Center B Housing remote nodes in another building often implies they are powered by a different circuit soit is especially important to make sure all nodes are powered from a different source if the disaster tolerant cluster is located in two data centers in the same building Some disaster tolerant designs go as far as making sure that their redundant power source is supplied by a different power substation on the grid This adds protection against large scale power failures such as brown outs sabotage or electrical storms Creating Highly Available Networking Standard high availability guidelines require redundant networks Redundant networks may be highly available but they are not disaster tolerant if a single accident can interrupt both network connections For example if you use the same trench to lay cables for both networks you do not have a disaster tolerant architecture because a single accident such as a backhoe digging in the wrong place can sever both cables at once making automated failover during a disaster impossible In a disaster tolerant architecture the reliability of the network is paramount To reduce the likelihood of a single accident causing both networks to fail redundant network cables should be installed so that they use physically different routes for each network How you route cables will depend on the networking technology you use Specific guidelines for some network technologies are listed here
36. HP Serviceguard Extended Distance Cluster for Linux A.01.00 Deployment Guide

Manufacturing Part Number: T2808-90006
May 2008, Second Edition

Legal Notices

Copyright 2006-2008 Hewlett-Packard Development Company, L.P. Publication Date: 2008.

Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Intel and Itanium are registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. Oracle is a registered trademark of Oracle Corporation. UNIX is a registered trademark in the United States and other countries, licensed exclusively through The Open Group. Linux is a registered trademark of Linus Torvalds. Red Hat is a registered trademark of Red Hat Software, Inc. SuSE is a registered trademark of SuSE, Inc.

Contents

1 Disaster Tolerance and Recovery in a Serviceguard Cluster

Evalua
37. ENABLED=YES
SERVICE_HALT_TIMEOUT=300

• RPO_TARGET

If more time elapses than what is specified for RPO_TARGET, the package is prevented from starting on the remote node, assuming that the node still has access only to its own half of the mirror. By default, RPO_TARGET is set to 0. Leave it at 0 to ensure the package does not start on an adoptive node with a mirror half that is not current. This ensures the highest degree of data currency.

IMPORTANT: If RPO_TARGET is not set to 0, the value of RAID_MONITOR_INTERVAL should be less than the value of RPO_TARGET. RAID_MONITOR_INTERVAL should also be less than the value of the Link Down Timeout parameter, so that disk access failure can be recognized early enough in certain failure scenarios. A very low value of RAID_MONITOR_INTERVAL (less than 5 seconds) has some impact on system performance because of the high frequency of polling.

You can also set RPO_TARGET to the special value -1, or to any positive integer. Setting RPO_TARGET to -1 causes the RAID system to ignore any time window checks on the disk set; this allows the package to start with a mirror half that is not current. Setting RPO_TARGET to a positive integer means that the package will start with a mirror half that is out of date by up to that number of seconds. A sketch of how these values fit into raid.conf follows.
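A minimal, hedged raid.conf fragment tying these settings together; the parameter names are the ones discussed above, and the exact syntax and defaults should be taken from the raid.conf.template shipped with XDC:

    # Allow failover with a mirror half that is at most 60 seconds out of date;
    # 0 would demand a fully current mirror, -1 would skip the check entirely
    RPO_TARGET=60

    # Poll both component devices of every mirror this often (seconds);
    # keep this below RPO_TARGET and below the Link Down Timeout
    RAID_MONITOR_INTERVAL=30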
38. age and Networking

Types of Data Link for Storage and Networking

Fibre Channel technology lets you increase the distance between the components in a Serviceguard cluster, thus making it possible to design a disaster tolerant architecture. The following table shows some of the distances possible with a few of the available technologies, including some of the fiber-optic alternatives.

Table 2-1 Link Technologies and Distances

Type of Link: Distance
• Gigabit Ethernet Twisted Pair: 50 meters
• Short Wave Fiber: 500 meters
• Long Wave Fiber: 10 kilometers
• Dense Wave Division Multiplexing (DWDM): 100 kilometers

The development of DWDM technology allows designers to use dark fiber (high-speed communication lines provided by common carriers) to extend the distances that were formerly subject to limits imposed by Fibre Channel for storage and Ethernet for network links. Increased distance often means increased cost and reduced speed of connection.

NOTE: Not all combinations of links are supported in all cluster types. For a current list of supported configurations and supported distances, see the HP Configuration Guide, available through your HP representative.

Two Data Center and Quorum Service Location Architectures

A two data center and Quoru
39. ared storage depend on the storage subsystem and the HBA driver in the configuration. Follow the documentation for those devices when setting up multipath. The MD driver was used with earlier versions of Serviceguard, and may still be used by some storage system/HBA combinations. For that reason, there are references to MD in the template files, worksheets, and other areas. Only use MD if your storage system specifically calls out its use for multipath. If some other multipath mechanism is used (for example, one built into an HBA driver), then references to MD, RAIDTAB, RAIDSTART, etc., should be commented out. If the references are in the comments, they can be ignored. References to MD devices such as /dev/md0 should be replaced with the appropriate multipath device name. For example:
RAIDTAB=/usr/local/cmcluster/conf/raidtab.sg

MD (RAID) COMMANDS: Specify the method of activation and deactivation for md. Leave the default (RAIDSTART=raidstart, RAIDSTOP=raidstop) if you want md to be started and stopped with default methods.
RAIDSTART="raidstart -c $RAIDTAB"
RAIDSTOP="raidstop -c $RAIDTAB"

Creating and Editing the Package Control Scripts

After you install the XDC software, you need to create a package control script and add refere
40. as to be repeated at each site With database replication you can roll transactions forward or backward to achieve the level of currency desired on the replica although this functionality is not available with file system replication Disadvantages of logical replication are It uses significant CPU overhead because transactions are often replicated more than once and logged to ensure data consistency and all but the most simple database transactions take significant CPU It also uses network bandwidth whereas most physical replication methods use a separate data replication link As a result there may be a significant lag in replicating transactions at the remote site which affects data currency 43 Disaster Tolerance and Recovery in a Serviceguard Cluster Disaster Tolerant Architecture Guidelines 44 e Ifthe primary database fails and is corrupt which results in the replica taking over then the process for restoring the primary database so that it can be used as the replica is complex This often involves recreating the database and doing a database dump from the replica e Applications often have to be modified to work in an environment that uses a logical replication database Logic errors in applications or in the RDBMS code itself that cause database corruption will be replicated to remote sites This is also an issue with physical replication e Most logical replication methods do not support personality swapping whic
41. What is a Disaster Tolerant Architecture?

impact. For these types of installations, and many more like them, it is important to guard not only against single points of failure, but against multiple points of failure (MPOF), or against single massive failures that cause many components to fail, such as the failure of a data center, of an entire site, or of a small area. A data center, in the context of disaster recovery, is a physically proximate collection of nodes and disks, usually all in one room.

Creating clusters that are resistant to multiple points of failure or single massive failures requires a different type of cluster architecture, called a disaster tolerant architecture. This architecture provides you with the ability to fail over automatically to another part of the cluster, or manually to a different cluster, after certain disasters. Specifically, the disaster tolerant cluster provides appropriate failover in the case where a disaster causes an entire data center to fail, as in Figure 1-2.

Figure 1-2 Disaster Tolerant Architecture: two data centers (Datacenter 1 and Datacenter 2) with client connections and a separate quorum location.

Understanding Types of Disaster Tolerant Clusters

To protect against multiple points of failure, cluster components must be geographicall
42. ata replication You may choose any mix of Fibre Channel based storage supported by Serviceguard that also supports the QLogic multipath feature This configuration may be the easiest to understand as it is similar in many ways toa standard Serviceguard cluster Application failover is minimized All disks are available to all nodes so that if a primary disk fails but the node stays up and the replica is available there is no failover that is the application continues to run on the same node while accessing the replica Data copies are peers so there is no issue with reconfiguring a replica to function as a primary disk after failover Writes are synchronous so data remains current between the primary disk and its replica unless the link or disk is down Chapter 1 Chapter 1 Disaster Tolerance and Recovery in a Serviceguard Cluster Understanding Types of Disaster Tolerant Clusters Cluster Extension CLX Cluster A Linux CLX cluster is similar toan HP UX metropolitan cluster and is a cluster that has alternate nodes located in different parts of a city or in nearby cities Putting nodes further apart increases the likelihood that alternate nodes will be available for failover in the event of a disaster The architectural requirements are the same as for an extended distance cluster with the additional constraint of a third location for arbitrator node s or quorum server And as with an extended distance cluster the dista
43. ation is fine for many applications for which recovery time is not an issue critical to the business. Although data might be replicated weekly or even daily, recovery could take from a day to a week, depending on the volume of data. Some applications, depending on the role they play in the business, may need to have a faster recovery time, within hours or even minutes.
On-line Data Replication
On-line data replication is a method of copying data from one site to another across a link. It is used when very short recovery time, from minutes to hours, is required. To be able to recover use of a system in a short time, the data at the alternate site must be replicated in real time on all disks.
Data can be replicated either synchronously or asynchronously. Synchronous replication requires one disk write to be completed and replicated before another disk write can begin. This method improves the chances of keeping data consistent and current during replication. However, it greatly reduces replication capacity and performance, as well as system response time. Asynchronous replication does not require the primary site to wait for one disk write to be replicated before beginning another. This can be an issue with data currency, depending on the volume of transactions. An application that has a very large volume of transactions can
44. cenario | What Happens When This Disaster Occurs | Recovery Process
In this case, initially the package P1 is running on node N1. P1 uses a mirror md0 consisting of S1, local to node N1 (for example, /dev/hpdev/mylink_sde), and S2, local to node N2. The first failure occurs with all Ethernet links between the two data centers failing.
With this failure, the heartbeat exchange is lost between N1 and N2. This results in both nodes trying to reach the Quorum Server. If N1 accesses the Quorum Server first, the package continues to run on N1 with S1 and S2, while N2 is rebooted. If N2 accesses the Quorum Server first, the package fails over to N2 and starts running with both S1 and S2, and N1 is rebooted.
Complete the following steps to initiate a recovery:
1. You need only restore the Ethernet links between the data centers, so that N1 and N2 can exchange heartbeats.
2. After restoring the links, you must add the node that was rebooted back into the cluster. Run the cmrunnode command to add the node to the cluster.
NOTE: If this failure is a precursor to a site failure, and if the Quorum Server arbitration selects the site that is likely to have a failure, it is possible that the entire cluster will go down.
In this case, initially the package P1 is running on node N1. P1 uses a mirror md0 consisting of S1, local to node N1 (say, /dev/hpdev/mylink_sde), and S2, local to node N2. The first failure occurs when th
45. ces: <none>
Removing and Adding an MD Mirror Component Disk
There are certain failure scenarios where you would need to manually remove the mirror component of an MD device and add it again later. For example, if the links between two data centers fail, you would need to remove, and later add back, the disks that were marked as failed disks.
When a disk within an MD device fails, the /proc/mdstat file of the MD array displays a message. For example:
[root@dlhctl1 dev]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde[2](F) sdf[0]
      9766784 blocks [2/1] [U_]
unused devices: <none>
In the message, the (F) indicates which disk has failed. In this example, the sde[2] disk has failed. In such a scenario, you must remove the failed disk from the MD array. You need to determine the persistent name of the failed disk before you remove it from the MD array. For this example, run the following command to determine the persistent name of the disk:
udevinfo -q symlink -n sdc1
Following is a sample output:
hpdev/mylink_sdc disk/by-id/scsi-36008053000b9510a6d7f8a6cdb70054-part1 disk/by-path/pci-0000:06:01.0-scsi-0:0:1:30-part1
Run the following command to remove a failed component device from the MD array:
mdadm --remove <md_device_name> <md_mirror_component_persistent_name>
In
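To make the sequence concrete, the following is a brief sketch that uses the example names above, /dev/md0 as the MD device and /dev/hpdev/mylink_sdc as the persistent name of the failed component. These names are illustrative only; substitute the device and persistent names reported on your own system:
# Remove the failed mirror component from the MD device, using its persistent name
mdadm --remove /dev/md0 /dev/hpdev/mylink_sdc
# After the disk or the inter-site link has been repaired, add the component back;
# this starts re-mirroring, whose progress can be followed in /proc/mdstat
mdadm --add /dev/md0 /dev/hpdev/mylink_sdc
cat /proc/mdstat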
46. cess is initiated. When it is complete, the extended distance cluster detects the added mirror half and accepts S1 as part of md0.
Enable P1 to run on Node 1 by running the following command:
cmmodpkg -e -n N1 P1
Table 4-1 Disaster Scenarios and Their Handling (Continued)
Disaster Scenario | What Happens When This Disaster Occurs | Recovery Process
This is a multiple failure scenario where the failures occur in a particular sequence, in the configuration that corresponds to figure 2, where Ethernet and FC links do not go over DWDM. The package P1 is running on a node N1. P1 uses a mirror md0 consisting of S1, local to node N1 (say, /dev/hpdev/mylink_sde), and S2, local to node N2. The first failure occurs with all FC links between the two data centers failing, causing N1 to lose access to S2 and N2 to lose access to S1.
The package P1 continues to run on N1 after the first failure, with md0 consisting of only S1. After the second failure, the package P1 fails over to N2 and starts with S1. Since S2 is also accessible, the extended distance cluster adds S2 and starts re-mirroring of S2.
For the first failure scenario, complete the following procedure to initiate a recovery:
1. Restore the links in both directions between the data centers. As a result, S2 (/dev/hpdev/mylink_sdf) is accessible from N1, and S1 is accessible f
47. che/120851-0-0-225-121.html > HP StorageWorks Cluster Extension EVA or XP
Figure 1-5 shows a CLX for Linux Serviceguard cluster architecture.
Figure 1-5 CLX for Linux Serviceguard Cluster (figure: Data Center A and Data Center B connected by a high availability network, with a third location)
A key difference between extended distance clusters and CLX clusters is the data replication technology used. The extended distance cluster uses Fibre Channel and Linux MD software mirroring for data replication. CLX clusters provide extremely robust hardware-based data replication, available with specific disk arrays, based on the capabilities of the HP StorageWorks Disk Array XP series or the HP StorageWorks EVA disk arrays.
Benefits of CLX
CLX offers a more resilient solution than Extended Distance Cluster, as it provides complete integration between Serviceguard's application package and the data replication subsystem. The storage subsystem is queried to determine the state of the data on the arrays. CLX knows that application package data is replicated between two data centers. It takes advantage of this knowledge to evaluate the status of the local and remote copies of the data, including whether the local site holds t
48. crease the potential for human error. Automated recovery procedures and processes can be transparent to the clients.
Even if recovery is automated, you may choose to, or need to, recover from some types of disasters with manual recovery. A rolling disaster, which is a disaster that happens before the cluster has recovered from a previous disaster, is an example of when you may want to manually switch over. If the data link failed as it was coming up and resynchronizing data, and the data center failed, you would want human intervention to make judgment calls on which site had the most current and consistent data before failing over.
• Who manages the nodes in the cluster, and how are they trained? Putting a disaster tolerant architecture in place without planning for the people aspects is a waste of money. Training and documentation are more complex because the cluster is in multiple data centers. Each data center often has its own operations staff, with their own processes and ways of working. These operations people will now be required to communicate with each other and coordinate maintenance and failover rehearsals, as well as working together to recover from an actual disaster. If the remote nodes are placed in a lights-out data center, the operations staff may want to put additional processes or monitoring s
49. d of whether the device is clean, up, and running, or if there are any errors.
To view the status of the MD device, run the following command on any node:
cat /proc/mdstat
Immediately after the MD devices are created, and during some recovery processes, the devices undergo a re-mirroring process. You can view the progress of this process in the /proc/mdstat file. Following is the output you will see:
[root@dlhctl]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde[2] sdf[0]
      9766784 blocks [2/1] [U_]
      [=>...................]  recovery =  8.9% (871232/9766784) finish=2.7min speed=54452K/sec
unused devices: <none>
NOTE: A status report obtained using the cat /proc/mdstat command shows the MD device name and the actual device names of the two MD component devices. It does not show the persistent device names.
After you create an MD device, you can view the status of the device, stop and start the device, and add and remove a mirror component from the MD device.
Stopping the MD Device
After you create an MD device, it begins to run. You need to stop the device and add the configuration into the raid.conf file. To stop the MD device, run the following command:
mdadm -S <md_device_name>
When you stop this device, all resources that were previously occupied by this device are released. Also, the entry of this device is removed
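In addition to reading /proc/mdstat, the state of a single array can be inspected with mdadm itself. The following is a minimal sketch, assuming the device /dev/md0 used in the examples above; mdadm --detail and watch are standard Linux commands rather than XDC-specific tools:
# Show the detailed state of one MD device: component disks, their state, and any rebuild progress
mdadm --detail /dev/md0
# Re-check re-mirroring progress periodically while a recovery is in progress
watch -n 10 cat /proc/mdstat
Like /proc/mdstat, this output reports the component disks by their kernel names (for example, sde and sdf) rather than by their persistent /dev/hpdev names.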
50. data is written to this device, the MD driver writes to both the underlying disks. In case of read requests, the MD driver reads from one device or the other, based on its algorithms. After creating this device, you treat it like any other LUN that is going to have shared data in a Serviceguard environment, and then create a logical volume and a file system on it.
Creating Volume Groups and Configuring VG Exclusive Activation on the MD Mirror
Once you create the MD mirror device, you need to create volume groups and logical volumes on it.
NOTE: XDC A.01.00 does not support configuring multiple raid1 devices as physical volumes in a single volume group. For example, if you create a volume group vg01, it can have only one MD raid1 device (/dev/md0) as its physical volume. To configure multiple raid1 devices as physical volumes in a single volume group, you must install the XDC A.01.02 patch. To install this patch, you must first upgrade to HP Serviceguard A.11.18 and the latest version of XDC. After upgrading, install the A.01.02 patch that is specific to the operating system in your environment. XDC A.01.02 contains the following patches for Red Hat and SuSE Linux operating systems:
• SGLX_00133 for Red Hat Enterprise Linux 4
• SGLX_00134 for Red Hat Enterprise Linux 5
• SGLX_00135 for SUSE Linu
51. df. You need to register with Hewlett-Packard to access this site.
Setting the Value of the Link Down Timeout Parameter
After you install the QLogic HBA driver, you must set the Link Down Timeout parameter of the QLogic cards to a duration equal to the cluster reformation time. When using the default values of the heartbeat interval and node timeout intervals of Serviceguard for Linux with a Quorum Server, this parameter must be set to 40 seconds. Setting this parameter to 40 seconds, which is the recommended value, prevents further writes to the active half of the mirror disk set when the other half fails. If this failure were to also bring down the node a few moments later, the chance of losing these writes is eliminated.
This parameter prevents any data being written to a disk when a failure occurs. The value of this parameter must be set such that the disks are inaccessible for a time period which is greater than the cluster reformation time. This parameter is important in scenarios where an entire site is in the process of going down. By blocking further writes to the MD device, the two disks of the MD device remain current and synchronized. As a result, when the package fails over, it starts with a disk that has current data. You must set a value for this parameter for all QLogic cards.
The QLogic cards are configured to hold up any dis
52. disk after failover.
• Because there are multiple read devices, that is, the node has access to both copies of data, there may be improvements in read performance.
• Writes are synchronous unless the link or disk is down.
Disadvantages of physical replication in software are:
• As with physical replication in the hardware, the logical order of data writes is not maintained. When the link is restored, if there has been more than one write to the primary disk, there is no way to determine the original order of transactions until the resynchronization has completed successfully.
NOTE: Configuring the software so that a write to disk must be replicated on the remote disk before a subsequent write is allowed can limit the risk of data inconsistency while the link is up.
• Additional hardware is required for the cluster.
• Distance between sites is limited by the physical disk link capabilities.
• Performance is affected by many factors: CPU overhead for mirroring, double I/Os, degraded write performance, and CPU time for resynchronization. In addition, CPU failure may cause a resynchronization even if it is not needed, further affecting system performance.
Logical Data Replication
Logical data replication is a method of replicating data by repeating the sequence of transactions at the remote site. Logical replication
53. e Ethernet links from N1 to the Ethernet switch in data center 1 fails. With this failure, the heartbeat exchange between N1 and N2 is lost. N2 accesses the Quorum Server, as it is the only node which has access to the Quorum Server. The package fails over to N2 and starts running with both S1 and S2, while N1 gets rebooted.
Complete the following procedure to initiate a recovery:
1. Restore the Ethernet links from N1 to the switch in data center 1.
2. After restoring the links, you must add the node that was rebooted back into the cluster. Run the cmrunnode command to add the node to the cluster.
Appendix A Managing an MD Device
This chapter includes additional information on how to manage the MD device. For the latest information on how to manage an MD device, see The Software RAID HOWTO manual, available at http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
Following are the topics discussed in this chapter:
• Viewing the Status of the MD Device on page 98
• Stopping the MD Device on page 99
• Starting the MD Device on page 100
• Removing and Adding an MD Mirror Component Disk on page 101
Viewing the Status of the MD Device
After creating an MD device, you can view its status. By doing so, you can remain informe
54. e concept of Extended Distance Clusters (XDC). It describes how to configure and manage HP Serviceguard Extended Distance Clusters for Linux and the associated Software RAID functionality. In addition, this guide includes information on a variety of Hewlett-Packard (HP) high availability cluster technologies that provide disaster tolerance for your mission critical applications. Serviceguard has supported disaster tolerant clusters on HP-UX for several years now, while it is relatively new on Linux; features of those disaster tolerant HP-UX systems may be used as examples throughout this document.
Intended Audience
It is assumed that you are familiar with the following topics:
• HP Serviceguard configurations
• Basic RAID concepts
Document Organization
The chapters of this guide include:
• Chapter 1, Disaster Tolerance and Recovery in a Serviceguard Cluster, on page 13
• Chapter 2, Building an Extended Distance Cluster Using Serviceguard and Software RAID, on page 51
• Chapter 3, Configuring your Environment for Software RAID, on page 61
• Chapter 4, Disaster Scenarios and Their Handling, on page 85
Related Publications
The following documents contain additional useful information:
• Clusters for High Availability: a Primer of HP Solutions, Second Edition. Hewlett-Packard Professional Books: Prentice Hall PTR, 2001 (ISBN 0-13-089355-2)
• Designing Disaster Tolerant
55. e links between each data center. If using a single DWDM box for the links between each data center, the redundant standby fibre link feature of the DWDM box must be configured. If the DWDM box supports multiple active DWDM links, that feature can be used instead of the redundant standby feature.
• At least two dark fiber optic links are required between each Primary data center, each fibre link routed differently to prevent the backhoe problem. It is allowable to have only a single fibre link routed from each Primary data center to the third location; however, in order to survive the loss of a link between a Primary data center and the third data center, the network routing should be configured so that a Primary data center can also reach the Arbitrator via a route passing through the other Primary data center.
• The network switches in the configuration must support DLPI link level packets. The network switch can be 100BaseT (TX or FX), 1000BaseT (TX or FX), or FDDI. The connection between the network switch and the DWDM box must be fiber optic.
• Fibre Channel switches must be used in a DWDM configuration; Fibre Channel hubs are not supported. Direct Fabric Attach mode must be used for the ports connected to the DWDM link. See the HP Configuration Guide, available through your HP representative
56. eed for Disaster Tolerance
Disaster tolerance is the ability to restore applications and data within a reasonable period of time after a disaster. Most people think of fire, flood, and earthquake as disasters, but a disaster can be any event that unexpectedly interrupts service or corrupts data in an entire data center: the backhoe that digs too deep and severs a network connection, or an act of sabotage. Disaster tolerant architectures protect against unplanned down time due to disasters by geographically distributing the nodes in a cluster, so that a disaster at one site does not disable the entire cluster.
To evaluate your need for a disaster tolerant solution, you need to weigh:
• Risk of disaster. Areas prone to tornadoes, floods, or earthquakes may require a disaster recovery solution. Some industries need to consider risks other than natural disasters or accidents, such as terrorist activity or sabotage. The type of disaster to which your business is prone, whether it is due to geographical location or the nature of the business, will determine the type of disaster recovery you choose. For example, if you live in a region prone to big earthquakes, you are not likely to put your alternate or backup nodes in the same city as your primary nodes, because that sort of disaster affects a large area. The frequency of the disaster also plays an important role in determining whether to invest in a rapid disaster recovery solution. For example, you w
57. es on DWDM Links for Network and Data ... 58
3. Configuring your Environment for Software RAID
Understanding Software RAID ... 62
Installing the Extended Distance Cluster Software ... 63
Supported Operating Systems ... 63
Prerequisites ... 63
Installing XDC ... 63
Verifying the XDC Installation ... 64
Configuring the Environment ... 66
Configuring Multiple Paths to Storage ... 69
Setting the Value of the Link Down Timeout Parameter ... 69
Using Persistent Device Names ... 71
Creating a Multiple Disk Device ... 72
To Create and Assemble an MD Device ... 72
Creating Volume Groups and Configuring VG Exclusive Activation on the MD Mirror ... 74
Configuring the Package Control Script and RAID Configuration File ... 76
Creating and Editing the Package Control Scripts ... 77
Editing the raid.conf File ... 78
4. Disaster Scenarios and Their Handling
A. Managing an MD Device
Viewing the Status of the MD Device ... 98
Stopping the MD Device ... 99
Starting the MD Device ... 100
Removing and Adding an MD Mirror Component
58. f your business if a disaster occurs.
The following pages give a high level view of a variety of disaster tolerant solutions and sketch the general guidelines that you must follow in developing a disaster tolerant computing environment.
What is a Disaster Tolerant Architecture?
In a Serviceguard cluster configuration, high availability is achieved by using redundant hardware to eliminate single points of failure. This protects the cluster against hardware faults, such as the node failure in Figure 1-1.
Figure 1-1 High Availability Architecture (figure: node 1 fails and package A, with its disks, fails over to node 2, which also runs package B with its disks and mirrors; client connections continue)
This architecture, which is typically implemented on one site in a single data center, is sometimes called a local cluster. For some installations, the level of protection given by a local cluster is insufficient. Consider the order processing center where power outages are common during harsh weather. Or consider the systems running the stock market, where multiple system failures, for any reason, have a significant financial
59. from the /proc/mdstat file.
Example A-1 Stopping the MD Device /dev/md0
To stop the MD device /dev/md0, run the following command:
[root@dlhctl dev]# mdadm -S /dev/md0
Once you stop the device, the entry is removed from the /proc/mdstat file. Following is an example of what the file contents will look like:
[root@dlhctl dev]# cat /proc/mdstat
Personalities : [raid1]
unused devices: <none>
NOTE: This command, and the other commands described subsequently, are listed as they may be used during cluster development and during some recovery operations.
Starting the MD Device
After you create an MD device, you would need to stop and start the MD device to ensure that it is active. You would not need to start the MD device in any other scenario, as this is handled by the XDC software. To start the MD device, run the following command:
mdadm -A -R <md_device_name> <md_mirror_component_persistent_name_0> <md_mirror_component_persistent_name_1>
Example A-2 Starting the MD Device /dev/md0
To start the MD device /dev/md0, run the following command:
mdadm -A -R /dev/md0 /dev/hpdev/sde /dev/hpdev/sdf1
Following is an example of what the /proc/mdstat file contents will look like once the MD device is started:
[root@dlhctl dev]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde[1] sdf[0]
      9766784 blocks [2/2] [UU]
unused devi
60. g volume groups and configuring exclusive activation, see Creating Volume Groups and Configuring VG Exclusive Activation on the MD Mirror on page 74.
Configure the package control script and the Extended Distance Cluster configuration script. In order to let Serviceguard know of the existence of the mirror created in the previous step, and hence make use of it, it must be configured as part of a package. This MD device must be specified in the Extended Distance Cluster configuration file, raid.conf. Copy the raid.conf template in the software bundle as raid.conf to the package directory, and edit it to specify the RAID configuration parameters for this package. Using the details mentioned in this file, Serviceguard will start, stop, and monitor this MD mirror for the package. For details on how to configure the package control script and raid.conf, see Configuring the Package Control Script and RAID Configuration File on page 76. Every time you edit the raid.conf file, you must copy this edited file to all nodes in the cluster.
Start the package. Starting a package configured for Software RAID is the same as starting any other package in Serviceguard for Linux.
You also need to keep in mind a few guidelines before you enable Software RAID for a particular package. Following are some of these guidelines you need to follow:
Ensure that t
61. h is the ability, after a failure, to allow the secondary site to become the primary and the original primary to become the new secondary site. This capability can provide increased up time.
Ideal Data Replication
The ideal disaster tolerant architecture, if budgets allow, is the following combination:
• For performance and data currency: physical data replication.
• For data consistency: either a second physical data replication as a point-in-time snapshot, or logical data replication, which would only be used in the cases where the primary physical replica was corrupt.
Using Alternative Power Sources
In a high availability cluster, redundancy is applied to cluster components, such as multiple paths to storage, redundant network cards, power supplies, and disks. In disaster tolerant architectures, another level of protection is required for these redundancies. Each data center that houses part of a disaster tolerant cluster should be supplied with power from a different circuit. In addition to a standard UPS (uninterrupted power supply), each node in a disaster tolerant cluster should be on a separate power circuit (see Figure 1-9).
Figure 1-9 Alternative Power Sources (figure: nodes in Data Center A supplied from separate power circuits)
62. he Quorum Server link is close to the Ethernet links in your setup. In cases of failures of all Ethernet and Fibre Channel links, the nodes can easily access the Quorum Server for arbitration. The Quorum Server is configured in a third location only for arbitration. In scenarios where the link between two nodes is lost, each node considers the other node to be dead. As a result, both nodes will try to access the Quorum Server. The Quorum Server, as an arbitrator, acknowledges the node that reaches it first and allows the package to start on that node.
You also need to configure Network Time Protocol (NTP) in your environment. This protocol resolves the time differences that could occur in a network, for example, nodes in different time zones.
Configuring Multiple Paths to Storage
HP requires that you configure multiple paths to the storage device using the QLogic HBA driver, as it has inbuilt multipath capabilities. Use the install script with the -f option to enable multipath failover mode. For more information on installing the QLogic HBA driver, see the HP StorageWorks Using the QLogic HBA driver for single path or multipath failover mode on Linux systems application notes. This document is available at the following location:
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00169487/c00169487.p
63. he primary copy or the secondary copy of data, whether the local data is consistent or not, and whether the local data is current or not. Depending on the result of this evaluation, CLX decides if it is safe to start the application package, whether a resynchronization of data is needed before the package can start, or whether manual intervention is required to determine the state of the data before the application package is started.
CLX allows for customization of the startup behavior for application packages, depending on your requirements, such as data currency or application availability. This means that, by default, CLX will always prioritize data consistency and data currency over application availability. If, however, you choose to prioritize availability over currency, you can configure CLX to start up even when the state of the data cannot be determined to be fully current, but the data is consistent.
CLX XP supports synchronous and asynchronous replication modes, allowing you to prioritize performance over data currency between the data centers.
Because data replication and resynchronization are performed by the storage subsystem, CLX may provide significantly better performance than Extended Distance Cluster during recovery. Unlike Extended Distance Cluster, CLX does not require any additional CPU time for data replication, which minimizes the impact on the host. There is little or no lag time writing to the replica, so the data rema
64. idelines of redundant components, such as multiple paths to storage, network cards, power supplies, and disks.
Protecting Nodes through Geographic Dispersion
Redundant nodes in a disaster tolerant architecture must be geographically dispersed. If they are in the same data center, it is not a disaster tolerant architecture. Figure 1-2 on page 17 shows a cluster architecture with nodes in two data centers, A and B. If all nodes in data center A fail, applications can fail over to the nodes in data center B and continue to provide clients with service.
Depending on the type of disaster you are protecting against, and on the available technology, the nodes can be as close as another room in the same building, or as far away as another city. The minimum recommended dispersion is a single building with redundant nodes in different data centers using different power sources. Specific architectures based on geographic dispersion are discussed in the following chapter.
Protecting Data through Replication
The most significant losses during a disaster are the loss of access to data, and the loss of data itself. You protect against this loss through data replication, that is, creating extra copies of the data. Data replication should:
• Ensure data consistency by replicating data in a logical order so that it is immediately usable or recoverab
65. ies of the on-line data. This can be mitigated by replicating data off-line (frequent backups). It can also be mitigated by taking snapshots of consistent data and storing it on-line; Business Copy XP and EMC Symmetrix BCV (Business Consistency Volumes) provide this functionality, and the additional benefit of quick recovery should anything happen to both copies of on-line data.
A rolling disaster is a disaster that occurs before the cluster is able to recover from a failure that is not normally considered a disaster. An example is a data replication link that fails; then, as it is being restored and data is being resynchronized, a disaster causes an entire data center to fail. The effects of rolling disasters can be mitigated by ensuring that a copy of the data is stored either off-line or on a separate disk that can quickly be mounted. The trade-off is a lack of currency for the data in the off-line copy.
Managing a Disaster Tolerant Environment
In addition to the changes in hardware and software to create a disaster tolerant architecture, there are also changes in the way you manage the environment. Configuration of a disaster tolerant architecture needs to be carefully planned, implemented, and maintained. There are additional resources needed, and additional decisions to make, concerning the maintenance of a disaste
66. in order to survive the loss of the network link between a primary data center and the arbitrator data center, the network routing should be configured so that a primary data center can also reach the arbitrator via a route which passes through the other primary data center.
• There must be at least two alternately routed Fibre Channel Data Replication links between each data center.
• See the HP Configuration Guide, available through your HP representative, for a list of supported Fibre Channel hardware.
Guidelines on DWDM Links for Network and Data
• There must be less than 200 milliseconds of latency in the network between the data centers.
• No routing is allowed for the networks between the data centers.
• Routing is allowed to the third data center if a Quorum Server is used in that data center.
• The maximum distance supported between the data centers for DWDM configurations is 100 kilometers.
• Both the networking and Fibre Channel Data Replication can go through the same DWDM box; separate DWDM boxes are not required. Since DWDM converters are typically designed to be fault tolerant, it is acceptable to use only one DWDM box in each data center for the links between each data center. However, for the highest availability, it is recommended to use two separate DWDM boxes in each data center for th
67. ins current. Data can be copied in both directions, so that if the primary site fails and the replica takes over, data can be copied back to the primary site when it comes back up.
Disk resynchronization is independent of CPU failure; that is, if the hosts at the primary site fail but the disk remains up, the disk knows it does not have to be resynchronized.
Differences Between Extended Distance Cluster and CLX
The major differences between an Extended Distance Cluster and a CLX cluster are:
• The methods used to replicate data between the storage devices in the two data centers. The two basic methods available for replicating data between the data centers for Linux clusters are either host based or storage array based. Extended Distance Cluster always uses host based replication (MD mirroring on Linux). Any mix of Serviceguard supported Fibre Channel storage can be implemented in an Extended Distance Cluster. CLX always uses array based replication (mirroring), and requires storage from the same vendor in both data centers, that is, a pair of XPs with Continuous Access or a pair of EVAs with Continuous Access.
• Data centers in an Extended Distance Cluster can span up to 100 km, whereas the distance between data centers in a Metrocluster is defined by the shortest of the following distances: Maximum distance that g
68. istance, with a third location supporting the quorum service. Extended distance clusters are connected using a high speed cable that guarantees network access between the nodes, as long as all guidelines for disaster tolerant architecture are followed. Extended distance clusters were formerly known as campus clusters, but that term is not always appropriate because the supported distances have increased beyond the typical size of a single corporate campus. The maximum distance between nodes in an extended distance cluster is set by the limits of the data replication technology and networking limits. An extended distance cluster is shown in Figure 1-3.
NOTE: There are no rules or recommendations on how far the third location must be from the two main data centers. The third location can be as close as the room next door, with its own power source, or can be as far as in a site across town. The distance among all three locations dictates the level of disaster tolerance an extended distance cluster can provide.
In an extended distance cluster, the Multiple Disk (MD) driver is used for data replication. Using the MD kernel driver, you can configure RAID 1 mirroring in your cluster. In a dual data center setup, to configure RAID 1, one LUN from a storage device in data center 1 is coupled with a LUN from a storage
69. k access, and essentially hang for a time period which is greater than the cluster reformation time, when access to a disk is lost. This is achieved by altering the Link Down Timeout value for each port of the card.
Setting a value for the Link Down Timeout parameter for a QLogic card ensures that the MD device hangs when access to a mirror is lost. For configurations with multipath, the MD device hangs when one path to a storage system is lost. However, the MD device resumes activity when the specified hang period expires. This ensures that no data is lost.
This parameter is required to address a scenario where an entire data center fails, but all its components do not fail at the same time and instead undergo a rolling failure. In this case, if the access to one disk is lost, the MD layer hangs and data is no longer written to it. Within the hang period, the node goes down and a cluster reformation takes place. When the package fails over to another node, it starts with a disk that has current data.
The value to be set for the Link Down Timeout parameter depends on the heartbeat interval and the node timeout values configured for a particular cluster. Use the SANsurfer CLI tool to set the value for this parameter. For more information on how to set this parameter, see http://download.qlogic.com/manual/32338/SN0054614-00B.pdf
Table 3-1 lists the heartbeat intervals and the node timeout values for a particular cluster.
Table 3-1 Cluster Reformation Time and Timeout Values
70. le. Inconsistent data is unusable and is not recoverable for processing. Consistent data may or may not be current.
• Ensure data currency by replicating data quickly, so that a replica of the data can be recovered to include all committed disk writes that were applied to the local disks.
• Ensure data recoverability, so that there is some action that can be taken to make the data consistent, such as applying logs or rolling a database.
• Minimize data loss by configuring data replication to address consistency, currency, and recoverability.
Different data replication methods have different advantages with regards to data consistency and currency. Your choice of which data replication methods to use will depend on what type of disaster tolerant architecture you require.
Off-line Data Replication
Off-line data replication is the method most commonly used today. It involves two or more data centers that store their data on tape and either send it to each other through an express service, if need dictates, or store it off-line in a vault. If a disaster occurs at one site, the off-line copy of data is used to synchronize data, and a remote site functions in place of the failed site.
Because data is replicated using physical off-line backup, data consistency is fairly high, barring human error or an untested corrupt backup. However, data currency is compromised by the time delay in sending the tape backup to a remote site. Off-line data replic
71. less than that value. For example, an RPO_TARGET of 45 means that the package will start only if the mirror is up to date, or out of date by less than 45 seconds. Because some timers are affected by polling, the value of this parameter can vary by approximately 2 seconds. This also requires that the minimum value of this parameter is 2 seconds, if a small value is necessary. Change the value of RPO_TARGET, if necessary, after considering the cases discussed below.
Cases to Consider when Setting RPO_TARGET
RPO_TARGET allows for certain failure conditions when data is not synchronized between the two sites.
For example, let us assume that the data storage links in Figure 1-4 fail before the heartbeat links fail. In this case, after the time specified by Link_Down_Timeout has elapsed, a package in data center 1 (DC1) will continue updating the local storage, but not the mirrored data in data center 2 (DC2). While the communication links must be designed to prevent this situation as far as possible, this scenario could occur and may last for a while before one of the sites fails.
NOTE: For more information on how access to disks is disabled in certain failure scenarios, see Setting the Value of the Link Down Timeout Parameter on page 69.
Let us consider a few failure scenarios and the impact of different
72. luster Architecture section, and installing the Extended Distance Cluster software, complete the following steps to enable Software RAID for each package. Subsequent sections describe each of these processes in detail.
1. Configure multipath for storage. In the Extended Distance Cluster setup described in figures 1 and 2, a node has multiple paths to storage. With this setup, each LUN exposed from a storage array shows up as two devices on every node: there are two device entries in the /dev directory for the same LUN, where each device entry pertains to a single path to that LUN. When a QLogic driver is installed and configured for multipath, all device names leading to the same physical device are merged, and only one device entry appears in their place. This happens for devices from both the storage systems. Creating these multiple links to the storage device ensures that each node is not dependent on only one link to write data to that storage device. For more information on configuring multipath, see Configuring Multiple Paths to Storage on page 69.
2. Configure persistent device names for storage devices. Once the multipath has been configured, you need to create persistent device names using udev. In cases of disk or link failure and subsequent reboot, it is possible that device names are renamed or reoriented. Since the MD mirror device starts with the names of the component devices, a change in the device name prevents
73. lusters
Table 1-1 Comparison of Disaster Tolerant Cluster Solutions
Attributes | Extended Distance Cluster | CLX | Continentalclusters (HP-UX only)
Key Benefit:
Extended Distance Cluster: Excellent in normal operations and partial failure. Since all hosts have access to both disks, in a failure where the node is running and the application is up but the disk becomes unavailable, no failover occurs. The node will access the remote disk to continue processing.
CLX: Provides maximum data protection. The state of the data is determined before the application is started. If necessary, data resynchronization is performed before the application is brought up. Better performance than Extended Distance Cluster for resynchronization, as replication is done by the storage subsystem (no impact to the host).
Continentalclusters: Two significant benefits: increased data protection by supporting unlimited distance between data centers; protects against such disasters as those caused by earthquakes or violent attacks, where an entire area can be disrupted.
Table 1-1 Comparison of Disaster Tolerant Cluster Solutions (Continued)
Attributes | Extended Distance Cluster | CLX | Continentalclusters (HP-UX only)
Key Limitation:
Extended Distance Cluster: No ability to check the state of the data before starting up the application. If the volume g
74. m Service location, which is at a third location, have the following configuration requirements. There is no hard requirement on how far the Quorum Service location has to be from the two main data centers. It can be as close as the room next door, with its own power source, or can be as far as in another site across town. The distance between all three locations dictates the level of disaster tolerance a cluster can provide.
• In these solutions, there must be an equal number of nodes in each primary data center, and the third location (known as the arbitrator data center) contains the Quorum Server. Lock LUN is not supported in a Disaster Tolerant configuration. In this release, only one node in each data center is supported.
• The Quorum Server is used as a tie breaker to maintain cluster quorum when all communication between the two primary data centers is lost. The arbitrator data center must be located separately from the primary data centers. For more information about the quorum server, see the Managing Serviceguard user's guide and the Serviceguard Quorum Server Release Notes.
• A minimum of two heartbeat paths must be configured for all cluster nodes. The preferred solution is two separate heartbeat subnets configured in the cluster, each going over a separately routed network path to the other data center. Alternatively, there can be a single dedicated heartbeat subnet with a bonded pair configured for it. Each would go over a separately ro
75. nce separating the nodes in a metropolitan cluster is limited by the data replication and network technology available. In addition, there is no hard requirement on how far the third location has to be from the two main data centers. The third location can be as close as the room next door, with its own power source, or can be as far as in a site across town. The distance between all three locations dictates the level of disaster tolerance a metropolitan cluster can provide.
On Linux, the metropolitan cluster is implemented using CLX:
• CLX for XP
• CLX for EVA
For HP-UX, the Metropolitan cluster architecture is implemented through the following HP products:
• Metrocluster with Continuous Access XP
• Metrocluster with Continuous Access EVA
• Metrocluster with EMC SRDF
The above HP-UX products are described in detail in Chapters 3, 4, and 5 of the Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters user's guide. The Linux products are described in detail in the Getting Started with MC/ServiceGuard for Linux guide. While there are some differences between the HP-UX and the Linux versions, the concepts are similar enough that only Cluster Extension (CLX) will be described here.
On-line versions of the above documents and other HA documentation are available at http://docs.hp.com > High Availability. On-line versions of the Cluster Extension documentation are available at http://h71028.www7.hp.com/enterprise/ca
76. nces to the XDC software to enable Software RAID.
After you create the package control script, you need to complete the following tasks:
• Edit the value of the DATA_REP variable.
• Edit the value of the XDC_CONFIG_FILE variable to point to the location where the raid.conf file is placed.
• Configure the RAID monitoring service.
To Create a Package Control Script
The procedure to create a package control script for the XDC software is identical to the procedure that you follow to create other package control scripts. To create a package control script, run the following command:
cmmakepkg -s <package_file_name>.sh
For example:
cmmakepkg -s oracle_pkg.sh
An empty template file for this package is created. You will need to edit this package control script in order to enable Software RAID in your environment.
To Edit the DATA_REP Variable
The DATA_REP variable defines the nature of data replication that is used. To enable Software RAID, set the value of this variable to MD. You must set this value for every package for which you need to enable Software RAID. When you set this parameter to xdcmd, it enables remote data replication through Software RAID. For example:
DATA_REP=xdcmd
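For illustration, the relevant lines in an edited package control script might look like the following sketch. The package name oracle_pkg is the example used above; the exact DATA_REP value and variable syntax should be taken from the comments in the control script template shipped with your version of XDC, so treat these lines as an assumption rather than a definitive listing:
# Select Software RAID (MD) based remote data replication for this package
DATA_REP="xdcmd"
# Point the package at the RAID configuration file placed in the package directory
XDC_CONFIG_FILE="$SGCONF/oracle_pkg/raid.conf"
As noted earlier in this guide, any edited raid.conf file must be copied to all nodes in the cluster.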
77. nd another LUN from storage 2 (/dev/hpdev/sdf1). Run the following command to create an MD device:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/hpdev/sde1 /dev/hpdev/sdf1
This command creates the MD device. Once the new RAID device /dev/md0 is created on one of the cluster nodes, you must assemble it on the nodes where the package must run. You create an MD device only once, and you can manage other functions using the XDC scripts. To assemble the MD device, complete the following procedure:
1. Stop the MD device on the node where you created it by running the following command:
mdadm -S /dev/md0
2. Assemble the MD device on the other node by running the following command:
mdadm -A -R /dev/md0 /dev/hpdev/sde1 /dev/hpdev/sdf1
3. Stop the MD device on the other node by running the following command:
mdadm -S /dev/md0
You must stop the MD device soon after you assemble it on the second node.
4. If you want to create volume groups, restart the MD device on the first node by running the following command:
mdadm -A -R /dev/md0 /dev/hpdev/sde1 /dev/hpdev/sdf1
5. After you have created the volume groups, stop the MD device by running the following command:
mdadm -S /dev/md0
IMPORTANT: You would need to repeat this procedure to create all MD devices that are used in a package. When
78. Rules for Separate Network and Data Links
• There must be less than 200 milliseconds of latency in the network between the data centers.
• No routing is allowed for the networks between the data centers.
• Routing is allowed to the third data center if a Quorum Server is used in that data center.
• The maximum distance between the data centers for this type of configuration is currently limited by the maximum distance supported for the networking type or Fibre Channel link type being used, whichever is shorter.
• There can be a maximum of 500 meters between the Fibre Channel switches in the two data centers if Short wave ports are used. This distance can be increased to 10 kilometers by using a Long wave Fibre Channel port on the switches. If DWDM links are used, the maximum distance between the data centers is 100 kilometers. For more information on link technologies, see Table 2-1 on page 52.
• There must be at least two alternately routed networking links between each primary data center to prevent the backhoe problem. The backhoe problem can occur when all cables are routed through a single trench, and a tractor on a construction job severs all cables and disables all communications between the data centers. It is allowable to have only a single network link routed from each primary data center to the third location; however
79. o 3 primary clusters and one recovery cluster.
• Failover for Continentalclusters is semi-automatic. If a data center fails, the administrator is advised, and is required to take action to bring the application up on the surviving cluster.
Continental Cluster With Cascading Failover
A continental cluster with cascading failover uses three main data centers distributed between a metropolitan cluster, which serves as a primary cluster, and a standard cluster, which serves as a recovery cluster. Cascading failover means that applications are configured to fail over from one data center to another in the primary cluster, and then to a third recovery cluster if the entire primary cluster fails. Data replication also follows the cascading model: data is replicated from the primary disk array to the secondary disk array in the Metrocluster, and is then replicated to the third disk array in the Serviceguard recovery cluster.
For more information on Cascading Failover configuration, maintenance, and recovery procedures, see the Cascading Failover in a Continental Cluster white paper on the high availability documentation web site at http://docs.hp.com > High Availability > Continentalcluster.
Comparison of Disaster Tolerant Solutions
Table 1-1 summarizes and compares the disaster tolerant solutions that are currently available.
80. o specify the raid.conf file for this package. This file resides in the package directory. For example:
XDC_CONFIG_FILE=$SGCONF/oracle_pkg/raid.conf
To Configure the RAID Monitoring Service
After you have edited the variables in the XDC configuration file (XDC_CONFIG_FILE), you must set up RAID monitoring as a service within Serviceguard. Following is an example of how the file content must look:
SERVICE_NAME[0]="RAID_monitor"
SERVICE_CMD[0]="$SGSBIN/raid_monitor $XDC_CONFIG_FILE"
SERVICE_RESTART[0]=""
Ensure that this service is also configured in the package configuration file, as shown below:
SERVICE_NAME raid_monitor
SERVICE_FAIL_FAST_E
After editing the package control script, you must edit the raid.conf file to enable Software RAID.
Editing the raid.conf File
The raid.conf file specifies the configuration information of the RAID environment of the package. You must place a copy of this file in the package directory of every package for which you have enabled Software RAID. The parameters in this file are:
RPO_TARGET: Given a set of storage that is mirrored remotely, as in Figure 1-4, the RPO_TARGET (Recovery Point Objective Target) is the maximum time allowed between the expiration of the Link_Down_Timeout (t1 in Figure 3-1) after the failure of the data links to the remote storage, and the package starting up on the remote node (t4 in Figure 3-1). If
81. oftware in place to maintain the nodes in the remote location. Rehearsals of failover scenarios are important, to keep prepared. A written plan should outline rehearsal of what to do in cases of disaster, with a minimum recommended rehearsal schedule of once every 6 months, ideally once every 3 months.
• How is the cluster maintained? Planned downtime and maintenance, such as backups or upgrades, must be more carefully thought out, because they may leave the cluster vulnerable to another failure. For example, nodes need to be brought down for maintenance in pairs, one node at each site, so that quorum calculations do not prevent automated recovery if a disaster occurs during planned maintenance. Rapid detection of failures and rapid repair of hardware is essential, so that the cluster is not vulnerable to additional failures. Testing is more complex and requires personnel in each of the data centers. Site failure testing should be added to the current cluster testing plans.
Additional Disaster Tolerant Solutions Information
On-line versions of HA documentation are available at http://docs.hp.com > High Availability > Serviceguard for Linux.
For information on CLX for EVA and XP, see the following document, available at http://h71028.www7.hp.com/enterprise/cache/120851-0-0-225-121.html > HP StorageWorks
82. on or during the recovery process after a failure. A persistent device created based on the instructions in the document mentioned earlier will have a device name that starts with /dev/hpdev. The name of the MD device must be unique across all packages in the cluster. Also, the names of each of their component udev devices must be unique across all nodes in the cluster.
Creating a Multiple Disk Device
As mentioned earlier, the first step for enabling Software RAID in your environment is to create the Multiple Disk (MD) device using two underlying component disks. This MD device is a virtual device which ensures that any data written to it is written to both component disks. As a result, the data is identical on both disks that make up the MD device.
This section describes how to create an MD device. This is the only step that you must complete before you enable Software RAID for a package. The other RAID operations are needed only during maintenance, or during the recovery process after a failure has occurred.
NOTE: For all the steps in the subsequent sections, the persistent device names, and not the actual device names, must be used for the two component disks of the MD mirror.
To Create and Assemble an MD Device
This example shows how to create the MD device /dev/md0; you must create it from a LUN of storage device 1 (/dev/hpdev/sde1) a
83. ould be more likely to protect from hurricanes that occur seasonally, rather than protecting from a dormant volcano.
• Vulnerability of the business. How long can your business afford to be down? Some parts of a business may be able to endure a 1 or 2 day recovery time, while others need to recover in a matter of minutes. Some parts of a business only need local protection from single outages, such as a node failure. Other parts of a business may need both local protection and protection in case of site failure.
It is important to consider the role applications play in your business. For example, you may target the assembly line production servers as most in need of quick recovery. But if the most likely disaster in your area is an earthquake, it would render the assembly line inoperable, as well as the computers. In this case disaster recovery would be moot, and local failover is probably the more appropriate level of protection. On the other hand, you may have an order processing center that is prone to floods in the winter. The business loses thousands of dollars a minute while the order processing servers are down. A disaster tolerant architecture is appropriate protection in this situation.
Deciding to implement a disaster recovery solution really depends on the balance between risk of disaster and the vulnerability o
tolerant architecture.

Manage it in-house, or hire a service? Hiring a service can remove the burden of maintaining the capital equipment needed to recover from a disaster. Most disaster recovery services provide their own off-site equipment, which reduces maintenance costs. Often the disaster recovery site and equipment are shared by many companies, further reducing cost. Managing disaster recovery in-house gives complete control over the type of redundant equipment used and the methods used to recover from disaster, giving you complete control over all means of recovery.

Implement automated or manual recovery? Manual recovery costs less to implement and gives more flexibility in making decisions while recovering from a disaster. Evaluating the data and making decisions can add to recovery time, but it is justified in some situations, for example if applications compete for resources following a disaster and one of them has to be halted. Automated recovery reduces the amount of time, and in most cases eliminates the human intervention, needed to recover from a disaster. You may want to automate recovery for any number of reasons: Automated recovery is usually faster. Staff may not be available for manual recovery, as is the case with lights-out data centers. Reduction in human intervention is also a reduction in human error. Disasters don't happen often, so lack of practice and the stressfulness of the situation may in
from N2.
2. Run the following commands to remove and add S2 to md0 on N1:
   mdadm --remove /dev/md0 /dev/hpdev/mylink_sdf
   mdadm --add /dev/md0 /dev/hpdev/mylink_sdf
   The re-mirroring process is initiated. (The re-mirroring process starts from the beginning on N2 after the second failure.) When it completes, the extended distance cluster detects S2 and accepts it as part of md0 again.
After recovery for the first failure has been initiated, the second failure occurs while re-mirroring is in progress and N1 goes down.

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: This is a multiple-failure scenario where the failures occur in a particular sequence, in the configuration that corresponds to figure 2, where Ethernet and FC links do not go over DWDM. The RPO_TARGET for the package P1 is set to IGNORE. The package is running on Node 1. P1 uses a mirror md0 consisting of S1 (local to node N1, /dev/hpdev/mylink_sde) and S2 (local to node N2). The first failure occurs when all FC links between the two data centers fail, causing Node 1 to lose access to S2 and Node 2 to lose access to S1. After some time, a second failure occurs: Node 1 fails because of a power failure.

What happens when this disaster occurs: The package P1 continues to run on Node 1 after the first failure
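The table does not show how to confirm that re-mirroring has finished. A quick check, sketched below with the same hypothetical device name used above, is to query the MD subsystem directly.

    # Watch resynchronization progress (percentage and estimated finish time):
    cat /proc/mdstat

    # Or query the array state; a clean state with two active devices
    # indicates the mirror is whole again:
    mdadm --detail /dev/md0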
group (vg) can be activated, the application will be started. If mirrors are split or multiple paths to storage are down, as long as the vg can be activated, the application will be started. Data resynchronization does not have a big impact on system performance; however, the performance varies depending on the number of times data resynchronization occurs. In the case of MD, data resynchronization is done one disk at a time, using about 10% of the available CPU time and taking longer to resynchronize multiple LUNs. The amount of CPU time used is a configurable MD parameter. Specialized storage is required; currently XP with Continuous Access and EVA with Continuous Access are supported. No automatic failover between clusters.

Table 1-1 Comparison of Disaster Tolerant Cluster Solutions (Continued)

Attribute: Distance between data centers
- Extended Distance Cluster: Maximum 100 kilometers.
- CLX (HP-UX only): Shortest of the following distances: cluster network latency not to exceed 200 ms; data replication maximum distance; DWDM provider maximum distance.
- Continentalclusters: No distance restrictions.

Attribute: Data replication mechanism
- Extended Distance Cluster: Host-based through MD. Replication can affect performance.
- CLX: Array-based through Continuous Access XP or Continuous Access
- Continentalclusters: You have a choice of either selecting their
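The configurable MD parameter mentioned above is not named in this excerpt. On Linux, the MD resynchronization rate is commonly bounded by the kernel tunables shown in the following sketch; the values are illustrative only.

    # Show the current resync rate limits (in KB/s per device):
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max

    # Lower the ceiling so resynchronization consumes less bandwidth and CPU,
    # for example while the package is under heavy load:
    sysctl -w dev.raid.speed_limit_max=10000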
Heartbeat Intervals    Cluster Reformation Time    Link Down Timeout Value
1 second               38 seconds                  40 seconds
2 seconds              56 seconds                  58 seconds
5 seconds              140 seconds                 142 seconds
10 seconds             250 seconds                 255 seconds

NOTE: The values in this table are approximate. The actual time varies from system to system, depending on the system load.

Configuring your Environment for Software RAID
Using Persistent Device Names

When there is a disk-related failure and subsequent reboot, there is a possibility that the devices are renamed. Linux names disks in the order they are found; the device that was /dev/sdf may be renamed to /dev/sde if any lower device has failed or been removed. As a result, you cannot activate the MD device with the original name. HP requires that the device names be persistent to avoid reorientation after a failure and reboot. For more information on creating persistent device names, see the Using udev to Simplify HP Serviceguard for Linux Configuration white paper that is available at the following location: http://docs.hp.com

NOTE: When creating persistent device names, ensure that the same udev rules file exists on all the nodes. This is necessary for the symlinks to appear and point to the correct device.

Use these persistent device names wherever there is a need to specify the devices for extended cluster configurati
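The white paper referenced above is not reproduced here; the following is only an illustrative sketch of the kind of udev rule that publishes a persistent name under /dev/hpdev. The rule file name, the scsi_id invocation, the WWID, and the symlink name are all assumptions and must be replaced with values appropriate to your distribution and hardware.

    # /etc/udev/rules.d/56-hpdev.rules  (hypothetical file name; copy to every node)
    # Match a LUN by its WWID and publish a stable symlink under /dev/hpdev:
    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/%k", RESULT=="360060e80xxxxxxxx", SYMLINK+="hpdev/mylink_sde"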
for installing the Extended Distance Cluster software (XDC):
- HP Serviceguard for Linux A.11.16.07 or higher.
- Network Time Protocol (NTP): all nodes in the cluster must point to the same NTP server.
- QLogic Driver: The version number of this driver depends on the version of the QLogic cards in your environment. Download the appropriate version of the driver from the following location: http://www.hp.com > Software and Driver Downloads. Select the Download drivers and software and firmware option, enter the HBA name, and click >>. If more than one result is displayed, download the appropriate driver for your operating system.

Installing XDC

You can install XDC from the product CD. You must install the XDC software on all nodes in a cluster to enable Software RAID.

Complete the following procedure to install XDC:
1. Insert the product CD into the drive and mount the CD.
2. Open the command line interface.
3. If you are installing XDC on Red Hat 4, run the following command:
   rpm -Uvh xdc-A.01.00-0.rhel4.noarch.rpm
4. If you are installing XDC on Novell SUSE Linux Enterprise Server 9, run the following command:
   rpm -Uvh xdc-A.01.00-0.sles9.noarch.rpm
5. If you are installing XDC on Novell SUSE Linux Enterprise Server 10, run the following command:
   rpm -Uvh xdc-A.01.00-0.sles
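The manual does not show a verification step here. If you want to confirm that the package landed on every node, a simple check along the following lines can be run on each node; the package name pattern is taken from the rpm file names above, and the HBA driver module name (qla2xxx) is an assumption that depends on your QLogic card.

    # Confirm the XDC package is installed and note its version:
    rpm -qa | grep -i xdc

    # Optionally check which QLogic HBA driver version is loaded:
    modinfo qla2xxx | grep -i version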
sde) and S2 (local to node N2). The first failure occurs when all FC links between the two data centers fail, causing N1 to lose access to S2 and N2 to lose access to S1. After the package resumes activity and runs for 20 seconds, a second failure occurs, causing N1 to fail, perhaps due to a power failure.

What happens when this disaster occurs: Package P1 continues to run on N1 after the first failure, with md0 consisting of only S1. After the second failure, package P1 fails over to N2 and starts with S2. This happens because the disk S2 is non-current by less than 60 seconds; this time limit is set by the RPO_TARGET parameter. Disk S2 has data that is older than the other mirror half S1; however, all data that was written to S1 after the FC link failure is lost.

Recovery Process: In this scenario, no attempts are made to repair the first failure until the second failure occurs. Typically, the second failure occurs before the first failure is repaired.
1. To recover from the first failure, restore the FC links between the data centers. As a result, S1 (/dev/hpdev/mylink_sde) is accessible from N2.
2. Run the following command to add S1 to md0 on N2:
   mdadm --add /dev/md0 /dev/hpdev/mylink_sde
   This command initiates the re-mirroring process. When it is complete, the extended distance cluster detects S1 and accepts it as part of md0 again.
For the second failure, restore N1. Once it is restored, it joins the cluster and can access S1 and S2.
1. Run the follo
the filter, ensure that you use the persistent names of all the devices used in the mirrors. This prevents these mirror devices from being scanned or used for logical volumes. You have to reload LVM with /etc/init.d/lvm force-reload. Once you add the filter to the /etc/lvm/lvm.conf file, create the logical volume infrastructure on the MD mirror device as you would on a single disk. For more information on creating volume groups and logical volumes, see the latest edition of Managing HP Serviceguard 11.16 for Linux at http://docs.hp.com/en/ha.html > Serviceguard for Linux.

Configuring the Package Control Script and RAID Configuration File

This section describes the package control scripts and configuration files that you need to create and edit to enable Software RAID in your Serviceguard environment. Earlier versions of Serviceguard supported MD as multipathing software. As a result, the package control script includes certain configuration parameters that are specific to MD. Do not use these parameters to configure XDC in your environment. Following are the parameters in the configuration file that you must not edit:

MD RAID CONFIGURATION FILE: Specify the configuration file that will be used to define the md raid devices for this package.

NOTE: The multipath mechanisms that are supported for sh
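The LVM commands themselves are not listed in this excerpt. A minimal sketch of building the volume group and logical volume on the mirror device might look like the following; the volume group name, logical volume name, size, and filesystem type are placeholders.

    # Create the LVM infrastructure on the MD mirror, as on a single disk:
    pvcreate /dev/md0
    vgcreate vg_pkg1 /dev/md0
    lvcreate -L 10G -n lv_pkg1 vg_pkg1

    # Put a filesystem on the logical volume for the package to mount:
    mkfs.ext3 /dev/vg_pkg1/lv_pkg1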
this example:
   mdadm --remove /dev/md0 /dev/hpdev/mylink_sdc1
This command removes the failed mirrored disk from the array.

Managing an MD Device
Removing and Adding an MD Mirror Component Disk

Example A-3: Removing a failed MD component disk from the /dev/md0 array
To remove a failed MD component disk from /dev/md0, run the following command:
   mdadm --remove /dev/md0 /dev/hpdev/sde
Following is an example of the status message that is displayed when a failed component is removed from the MD array:
   [root@dlhctl1 dev]# cat /proc/mdstat
   Personalities : [raid1]
   md0 : active raid1 sdf[0]
         9766784 blocks [2/1] [U_]
   unused devices: <none>

Adding a Mirror Component Device
As mentioned earlier, in certain failure scenarios you need to remove a failed mirror disk component, repair it, and then add it back into an MD array. Run the following command to add a mirror component back into the MD array:
   mdadm --add <md device name> <md mirror_component_persistent_name>

Example A-4: Adding a new disk as an MD component to the /dev/md0 array
To add a new disk to the /dev/md0 array, run the following command:
   mdadm --add /dev/md0 /dev/hpdev/sde
Following is an example of the status message displayed in the /proc/mdstat file once the disk is added:
   Personalities : [raid1]
   md0 : active raid1 sde[2] sdf[0]
         9766784 blocks [2/1] [U_]
         [=>...................]  recovery = 8.9% (871232/9766784) finish=2.7min speed=5
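One detail the examples gloss over: in general, mdadm only removes components that are already marked faulty (or are spares). If the component is still listed as active, the disk must be flagged as failed first. The following is a sketch of that extra step, using the same hypothetical device name as above.

    # Mark the component faulty, then remove it in the same invocation:
    mdadm /dev/md0 --fail /dev/hpdev/sde --remove /dev/hpdev/sde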
Evaluating the Need for Disaster Tolerance .......... 14
What is a Disaster Tolerant Architecture? .......... 16
Understanding Types of Disaster Tolerant Clusters .......... 18
Extended Distance Clusters .......... 18
Cluster Extension (CLX) Cluster .......... 23
Continental Cluster .......... 27
Continental Cluster With Cascading Failover .......... 30
Comparison of Disaster Tolerant Solutions .......... 30
Disaster Tolerant Architecture Guidelines .......... 37
Protecting Nodes through Geographic Dispersion .......... 37
Protecting Data through Replication .......... 38
Using Alternative Power Sources .......... 44
Creating Highly Available Networking .......... 45
Disaster Tolerant Cluster Limitations .......... 47
Managing a Disaster Tolerant Environment .......... 48
Additional Disaster Tolerant Solutions Information .......... 50

2 Building an Extended Distance Cluster Using Serviceguard and Software RAID
Types of Data Link for Storage and Networking .......... 52
Two Data Center and Quorum Service Location Architectures .......... 53
Rules for Separate Network and Data Links .......... 57
Guidelines
tion schemes with Continentalclusters is included in the Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters user's guide. Besides selecting your own storage and data replication solution, you can also take advantage of the following HP pre-integrated solutions:

- Storage subsystems implemented by CLX are also pre-integrated with Continentalclusters. Continentalclusters uses the same data replication integration module that CLX implements to check the data status of the application package before package start-up.
- If Oracle DBMS is used and logical data replication is the preferred method, then, depending on the version, either Oracle 8i Standby or Oracle 9i Data Guard with log shipping is used to replicate the data between two data centers. HP provides a supported integration toolkit for Oracle 8i Standby DB in the Enterprise Cluster Management Toolkit (ECMT).
- RAC is supported by Continentalclusters by integrating it with SGeRAC. In this configuration, multiple nodes in a single cluster can simultaneously access the database; that is, nodes in one data center can access the database. If the site fails, the RAC instances can be recovered at the second site.
- Continentalclusters supports a maximum of 4 clusters with up to 16 nodes per cluster, for a maximum of 64 nodes, supporting up to
guarantees a network latency of no more than 200 ms.
- Maximum distance supported by the data replication link.
- Maximum supported distance for DWDM, as stated by the provider.

In an Extended Distance Cluster there is no built-in mechanism for determining the state of the data being replicated. When an application fails over from one data center to another, the package is allowed to start up if the volume group(s) can be activated. A CLX implementation provides a higher degree of data integrity; that is, the application is only allowed to start up based on the state of the data and the disk arrays.

It is possible for data to be updated on the disk system local to a server running a package without the remote data being updated. This happens if the data link between sites is lost, usually as a precursor to a site going down. If that occurs, and the site with the latest data then goes down, that data is lost. The period of time from the link being lost to the site going down is called the recovery point. An objective can be set for the recovery point such that if data is updated for a period less than the objective, automated failover can occur and a package will start. If the time is longer than the objective, the package will not start. In a Linux environment this is a user-configurable parameter.
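Elsewhere in this document the parameter is referred to as RPO_TARGET. The exact file and syntax are not shown in this excerpt, so the following is only a hedged sketch of how such a setting might appear in the package's RAID configuration file, using an assumed key=value format.

    # Assumed key=value format in the package's XDC RAID configuration file:
    # allow automated failover only if the surviving mirror half is no more
    # than 60 seconds behind; IGNORE would disable the check entirely.
    RPO_TARGET=60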
routed physical network path to the other data centers.
- There can be separate networking and Fibre Channel links between the data centers, or both networking and Fibre Channel can go over DWDM links between the data centers.
- Fibre Channel Direct Fabric Attach (DFA) is recommended over Fibre Channel Arbitrated Loop configurations because of the superior performance of DFA, especially as the distance increases. Therefore, Fibre Channel switches are recommended over Fibre Channel hubs.
- Any combination of the following Fibre Channel capable disk arrays may be used: HP StorageWorks 1000 and 1500 series Modular Storage Arrays, HP StorageWorks Enterprise Virtual Arrays, or HP StorageWorks Disk Array XP.
- For disaster tolerance, application data must be mirrored between the primary data centers. You must ensure that the mirror copies reside in different data centers, as the software cannot determine the locations.

NOTE: When a failure results in the mirror copies losing synchronization, MD will perform a full resynchronization when both halves of the mirror are available.

- No routing is allowed for the networks between data centers. Routing is allowed to the third data center if a Quorum Server is used in that data center.

The following is a list of recommended arbitration methods for Extended Distance Clusters
Previous chapters provided information on deploying Software RAID in your environment. In this chapter, you will find information on how Software RAID addresses various disaster scenarios. All the disaster scenarios described in this section have the following three categories:

- Disaster Scenario: Describes the type of disaster, and provides details regarding the cause and the sequence of failures leading to the disaster in the case of multiple failures.
- What happens when this disaster occurs: Describes how the Extended Distance Cluster software handles this disaster.
- Recovery Process: After the disaster strikes and the necessary actions are taken by the software to handle it, you need to ensure that your environment recovers from the disaster. This section describes all the steps that an administrator needs to take to repair the failures and restore the cluster to its original state. Also, all the commands listed in this column must be executed on a single line.

Disaster Scenarios and Their Handling

The following table lists all the disaster scenarios that are handled by the Extended Distance Cluster software. All the scenarios assume that the setup is the same as the one described in Extended Distance Clusters on page 18 of this document.

Table 4-1 Disaster Scenarios and Their Handling

Disaster Scenario: A package P1 is running on a node
wing command to enable P1 to run on N1:
   cmmodpkg -e P1 -n N1

Table 4-1 Disaster Scenarios and Their Handling (Continued)

Disaster Scenario: In this case the package P1 runs with RPO_TARGET set to 60 seconds. Package P1 is running on node N1. P1 uses a mirror md0 consisting of S1 (local to node N1, for example /dev/hpdev/mylink_sde) and S2 (local to node N2). The first failure occurs when all FC links between the two data centers fail, causing N1 to lose access to S2 and N2 to lose access to S1. After the package resumes activity and runs for 90 seconds, a second failure occurs, causing node N1 to fail.

What happens when this disaster occurs: The package P1 continues to run on N1, with md0 consisting of only S1, after the first failure. After the second failure, the package does not start up on N2, because when it tries to start with only S2 on N2 it detects that S2 is non-current for a time period greater than the value of RPO_TARGET.

Recovery Process: In this scenario, no attempts are made to repair the first failure until the second failure occurs. Complete the following procedure to initiate a recovery:
1. To recover from the first failure, restore the FC links between the data centers. As a result, S1 is accessible from N2.
2. After the FC links are restored and S1 is accessible from N2, run the following
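A verification step is not part of the table. If you want to confirm where P1 is running and whether package switching is enabled after issuing cmmodpkg, a Serviceguard query along the following lines can be used; the output format and available options vary by release.

    # Check the package's current node, state, and switching attributes:
    cmviewcl -v -p P1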
x Enterprise Server 10. You can contact HP support personnel to obtain these patches.

When you create a logical volume on an MD device, the actual physical devices that form the MD raid1 mirror must be filtered out to avoid receiving messages from LVM about duplicate PV entries. For example, let us assume that /dev/sde and /dev/sdf are two physical disks that form the MD device /dev/md0. The persistent device names for /dev/sde and /dev/sdf are /dev/hpdev/md0_mirror0 and /dev/hpdev/md0_mirror1, respectively. When you create a logical volume, duplicate entries are detected for the two physical disks that form the mirror device. As a result, the logical volume is not created and an error message is displayed. Following is a sample of the error message that is displayed:

   Found duplicate PV 9w3TIxKZ61FRqwUmOm9tlV5nsdUkTi4i: using /dev/sde not /dev/sdf

With this error, you cannot create a new volume group on /dev/md0. As a result, you must create a filter for LVM. To create a filter, add the following line in the /etc/lvm/lvm.conf file:

   filter = [ "r|/dev/cdrom|", "r|/dev/hpdev/md0_mirror0|", "r|/dev/hpdev/md0_mirror1|" ]

where /dev/hpdev/md0_mirror0 and /dev/hpdev/md0_mirror1 are the persistent device names of the devices /dev/sde and /dev/sdf, respectively.

When adding
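After adding the filter and reloading LVM as described earlier, a quick check along the following lines confirms that LVM now sees only the MD device and not its component disks. The device names are the hypothetical ones from the example above.

    # Reload LVM so the new filter takes effect, then list physical volumes;
    # only /dev/md0 (not /dev/sde or /dev/sdf) should appear:
    /etc/init.d/lvm force-reload
    pvscan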
Geographically dispersed nodes can be put in different rooms, on different floors of a building, or even in separate buildings or separate cities. The distance between the nodes is dependent on the types of disaster from which you need protection and on the technology used to replicate data. Three types of disaster tolerant clusters are described in this guide:

- Extended Distance Clusters
- Cluster Extension (CLX) Cluster
- Continental Cluster

These types differ from a simple local cluster in many ways. Extended distance clusters and metropolitan clusters often require right-of-way from local governments or utilities to lay network and data replication cables or to connect to DWDMs. This can complicate the design and implementation. They also require a different kind of control mechanism for ensuring that data integrity issues do not arise, such as a quorum server. Typically, extended distance and metropolitan clusters use an arbitrator site containing a computer running a quorum application. Continental clusters span great distances and operate by replicating data between two completely separate local clusters. Continental clusters are not supported with HP Serviceguard for Linux; they are described here to show the range of solutions that exist.

Extended Distance Clusters

An extended distance cluster (also known as an extended campus cluster) is a normal Serviceguard cluster that has alternate nodes located in different data centers separated by some distance