Types of disasters and recovery methods

You need to be familiar with different types of failures and disasters so that you can use the MetroCluster configuration to respond appropriately.

Single-node failure
A single component in the local HA pair fails.
In a four-node MetroCluster configuration, this failure might lead to an automatic or a negotiated takeover of the impaired node, depending on the component that failed. Data recovery is described in the High-Availability Configuration Guide.
Site-wide controller failure
All controller modules at a site fail because of loss of power, equipment replacement, or a disaster. MetroCluster configurations typically cannot differentiate between failures and disasters. However, witness software, such as the MetroCluster Tiebreaker software, can. A site-wide controller failure can lead to an automatic switchover if the Inter-Switch Links (ISLs) and switches are up and the storage is accessible.
The High-Availability Configuration Guide has more information about how to recover from site failures that do not include controller failures, as well as from failures that include one or more controllers.
ISL failure
The links between the sites fail. The MetroCluster configuration takes no action. Each node continues to serve data normally, but the mirrors are not written to the respective disaster recovery sites because access to them is lost.
Multiple sequential failures
Multiple components fail in sequence. For example, a controller module, a switch fabric, and a shelf fail one after another; storage failover, fabric redundancy, and SyncMirror then protect, in turn, against downtime and data loss.

The following table shows each failure type with its corresponding disaster recovery (DR) mechanism and recovery method:

Note

AUSO (automatic unscheduled switchover) is not supported on MetroCluster IP configurations.

Failure type	DR mechanism (four-node configuration)	Summary of recovery method (four-node configuration)
Single-node failure	Local HA failover	Not required if automatic failover and giveback is enabled.
Site failure	MetroCluster switchover	After the node is restored, manual healing and switchback using the metrocluster heal and metrocluster switchback commands are required. Note: The metrocluster heal commands are not required on MetroCluster IP configurations running ONTAP 9.5.
Site-wide controller failure	AUSO, only if the storage at the disaster site is accessible.	Same as for a site failure.
Multiple sequential failures	Local HA failover followed by MetroCluster forced switchover using the metrocluster switchover -forced-on-disaster command. Note: Depending on the component that failed, a forced switchover might not be required.	Same as for a site failure.
ISL failure	No MetroCluster switchover; the two clusters independently serve their data.	Not required for this type of failure. After you restore connectivity, the storage resynchronizes automatically.
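The table can be read as a small decision function. The following Python sketch encodes it for a four-node configuration. Failure, Response, and plan_response are hypothetical names, not part of ONTAP; the command strings only label the manual steps and do not run anything. The ONTAP 9.5 MetroCluster IP exception for healing is modeled, as an assumption, by a simple heal_required flag, and the automatic-switchover condition follows the site-wide controller failure description (ISLs and switches up, storage accessible, not a MetroCluster IP configuration).

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Failure(Enum):
    """Failure types from the table (hypothetical enum, not an ONTAP object)."""
    SINGLE_NODE = auto()
    SITE = auto()
    SITE_WIDE_CONTROLLER = auto()
    MULTIPLE_SEQUENTIAL = auto()
    ISL = auto()


@dataclass
class Response:
    """DR mechanism plus the manual recovery steps that follow it."""
    mechanism: str
    recovery_steps: list[str] = field(default_factory=list)


def plan_response(failure: Failure, *, ip_config: bool = False,
                  isl_up: bool = True, storage_accessible: bool = True,
                  heal_required: bool = True) -> Response:
    """Hypothetical helper mirroring the failure/recovery table."""
    # Recovery after a switchover: heal (unless not required), then switch back.
    after_switchover = (["metrocluster heal"] if heal_required else []) + [
        "metrocluster switchback"
    ]

    if failure is Failure.SINGLE_NODE:
        # Local HA failover; nothing manual if automatic giveback is enabled.
        return Response("local HA failover")
    if failure is Failure.SITE:
        return Response("MetroCluster switchover", after_switchover)
    if failure is Failure.SITE_WIDE_CONTROLLER:
        # AUSO is unsupported on MetroCluster IP and needs ISLs, switches,
        # and disaster-site storage to be reachable.
        if ip_config or not (isl_up and storage_accessible):
            return Response("no automatic switchover")
        return Response("AUSO", after_switchover)
    if failure is Failure.MULTIPLE_SEQUENTIAL:
        # A forced switchover might not be needed, depending on the component.
        return Response(
            "local HA failover, then metrocluster switchover -forced-on-disaster",
            after_switchover,
        )
    if failure is Failure.ISL:
        # Each cluster keeps serving its own data; storage resynchronizes
        # automatically once connectivity is restored.
        return Response("no MetroCluster switchover")
    raise ValueError(f"unknown failure type: {failure}")
```

For example, plan_response(Failure.SITE_WIDE_CONTROLLER, ip_config=True) reports that no automatic switchover occurs, which reflects the note that AUSO is not supported on MetroCluster IP configurations.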
