Types of disasters and recovery methods
You need to be familiar with different types of failures and disasters so that you can use the MetroCluster configuration to respond appropriately.
Single-node failure
A single component in the local HA pair fails.
In a four-node MetroCluster configuration, this failure might lead to an automatic or a negotiated takeover of the impaired node, depending on the component that failed. Data recovery is described in the High-Availability Configuration Guide.
Site-wide controller failure
All controller modules fail at a site because of loss of power, replacement of equipment, or disaster. Typically, MetroCluster configurations cannot differentiate between failures and disasters. However, witness software, such as the MetroCluster Tiebreaker software, can differentiate between them. A site-wide controller failure condition can lead to an automatic switchover if the Inter-Switch Links (ISLs) and switches are up and the storage is accessible.
The High-Availability Configuration Guide has more information about how to recover from site-wide failures that do not include controller failures, as well as failures that include one or more controllers.
ISL failure
The links between the sites fail. The MetroCluster configuration takes no action. Each node continues to serve data normally, but the mirrors are not written to the respective disaster recovery sites because access to them is lost.
Multiple sequential failures
Multiple components fail in sequence. For example, a controller module, a switch fabric, and a shelf fail one after another, and storage failover, fabric redundancy, and SyncMirror protect in turn against downtime and data loss.
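The conditions given above for a site-wide controller failure can be sketched as a small predicate. This is an illustrative sketch only: `SiteState` and `automatic_switchover_possible` are hypothetical names, not part of any MetroCluster API, and the function captures only the conditions stated in the description (ISLs up, switches up, storage accessible).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteState:
    """Hypothetical snapshot of a failed site as seen from the surviving site."""
    all_controllers_down: bool  # every controller module at the site has failed
    isls_up: bool               # Inter-Switch Links are up
    switches_up: bool           # switches are up
    storage_accessible: bool    # storage at the failed site is still reachable


def automatic_switchover_possible(state: SiteState) -> bool:
    """Return True only when all conditions named in the description hold.

    The description says these conditions *can* lead to an automatic
    switchover, so True means "possible", not "guaranteed".
    """
    return (
        state.all_controllers_down
        and state.isls_up
        and state.switches_up
        and state.storage_accessible
    )
```

For example, a state with all controllers down but `isls_up=False` returns `False`, which matches the ISL failure case, where the configuration takes no action.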
The following table shows the failure types and, for each, the corresponding disaster recovery (DR) mechanism and recovery method:
Failure type | DR mechanism | Summary of recovery method |
---|---|---|
Four-node configuration | Four-node configuration | |
Single-node failure | Local HA failover | Not required if automatic failover and giveback are enabled. |
Site failure | MetroCluster switchover | After the node is restored, manual healing and switchback using the metrocluster heal and metrocluster switchback commands are required. Note The |
Site-wide controller failure | Automatic unplanned switchover (AUSO), only if the storage at the disaster site is accessible. | |
Multiple sequential failures | Local HA failover followed by MetroCluster forced switchover using the metrocluster switchover -forced-on-disaster command. Note: Depending on the component that failed, a forced switchover might not be required. | |
ISL failure | No MetroCluster switchover; the two clusters independently serve their data | Not required for this type of failure. After you restore connectivity, the storage resynchronizes automatically. |
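
The table above can also be expressed as a simple lookup, which may be useful in a runbook or monitoring script. This is a minimal sketch, not a NetApp tool: `Response`, `RESPONSES`, and `lookup` are hypothetical names, the strings paraphrase the table cells, and `None` marks cells that the table leaves empty.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    dr_mechanism: str
    recovery: Optional[str]  # None where the table leaves the cell empty


# Four-node configuration, keyed by lowercase failure type.
RESPONSES = {
    "single-node failure": Response(
        "Local HA failover",
        "Not required if automatic failover and giveback are enabled",
    ),
    "site failure": Response(
        "MetroCluster switchover",
        "Manual healing and switchback after the node is restored",
    ),
    "site-wide controller failure": Response(
        "AUSO, only if the storage at the disaster site is accessible",
        None,
    ),
    "multiple sequential failures": Response(
        "Local HA failover followed by MetroCluster forced switchover",
        None,
    ),
    "isl failure": Response(
        "No MetroCluster switchover; the two clusters independently serve their data",
        "Not required; storage resynchronizes automatically after connectivity is restored",
    ),
}


def lookup(failure_type: str) -> Response:
    """Return the table row for a failure type (case-insensitive)."""
    key = failure_type.strip().lower()
    try:
        return RESPONSES[key]
    except KeyError:
        raise ValueError(f"unknown failure type: {failure_type!r}") from None
```

Calling `lookup("ISL failure")` returns the row whose DR mechanism is no switchover, while an unrecognized failure type raises `ValueError` rather than silently returning a default.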