Verifying healing and manual switchback
You can test the healing and manual switchback operations to verify that switching the cluster back to the original data center after a negotiated switchover does not affect data availability (except for SMB configurations).
About this task
This test should take about 30 minutes.
The expected result of this procedure is that services are switched back to their home nodes.
The healing steps are not required on systems running ONTAP 9.5 or later, where healing is performed automatically after a negotiated switchover. On systems running ONTAP 9.6 or later, healing is also performed automatically after an unscheduled switchover.
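The version rules above can be summarized as a small decision helper. The following is a minimal sketch in Python, not part of ONTAP: the function name manual_heal_required and the (major, minor) tuple used to represent the ONTAP version are assumptions made for illustration.

```python
def manual_heal_required(ontap_version: tuple, negotiated: bool) -> bool:
    """Return True if the metrocluster heal commands must be run manually.

    ontap_version: hypothetical (major, minor) tuple, for example (9, 4).
    negotiated: True for a negotiated switchover, False for an unscheduled one.
    """
    if negotiated:
        # Healing is automatic after a negotiated switchover from ONTAP 9.5 on.
        return ontap_version < (9, 5)
    # Healing is automatic after an unscheduled switchover from ONTAP 9.6 on.
    return ontap_version < (9, 6)
```

For example, manual_heal_required((9, 4), True) returns True, so both heal steps apply. The helper only encodes the version rule; if automatic healing fails, the heal commands must still be issued manually regardless of version.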
- If the system is running ONTAP 9.4 or earlier, heal the data aggregate: metrocluster heal aggregates
Example
The following example shows the successful completion of the command:
cluster_A::> metrocluster heal aggregates
[Job 936] Job succeeded: Heal Aggregates is successful.
- If the system is running ONTAP 9.4 or earlier, heal the root aggregate: metrocluster heal root-aggregates
This step is required on the following configurations:
MetroCluster FC configurations.
MetroCluster IP configurations running ONTAP 9.4 or earlier.
Example
The following example shows the successful completion of the command:
cluster_A::> metrocluster heal root-aggregates
[Job 937] Job succeeded: Heal Root Aggregates is successful.
- Verify that healing is completed: metrocluster node show
Example
The following example shows the successful completion of the command:
cluster_A::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1           configured     enabled   heal roots completed
      cluster_B
              node_B_2           unreachable    -         switched over
2 entries were displayed.
If the automatic healing operation fails for any reason, you must issue the metrocluster heal commands manually as done in ONTAP versions prior to ONTAP 9.5. You can use the metrocluster operation show and metrocluster operation history show -instance commands to monitor the status of healing and determine the cause of a failure.
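When monitoring is scripted, the DR mode of each node can be read from captured metrocluster node show output. The following is a minimal sketch in Python that assumes the output has already been captured as text and has the same layout as the example above. The function name parse_node_modes, the SAMPLE constant, and the whitespace-based parsing are illustrative assumptions, not an ONTAP interface.

```python
# Captured output, copied from the example above (assumed layout).
SAMPLE = """cluster_A::> metrocluster node show
DR Configuration DR
Group Cluster Node State Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1 cluster_A
node_A_1 configured enabled heal roots completed
cluster_B
node_B_2 unreachable - switched over
2 entries were displayed."""


def parse_node_modes(output: str) -> dict:
    """Map node name -> (configuration state, DR mirroring, DR mode)."""
    nodes = {}
    in_table = False
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) > 1 and set(tokens[0]) == {"-"}:
            # The dashed rule marks the start of the data rows.
            in_table = True
            continue
        if not in_table or line.rstrip().endswith("entries were displayed."):
            continue
        if len(tokens) >= 4:
            # Node rows carry a name, a state, a mirroring flag, and a mode.
            # Shorter rows (group, cluster, wrapped text) are ignored.
            name, state, mirroring = tokens[0], tokens[1], tokens[2]
            nodes[name] = (state, mirroring, " ".join(tokens[3:]))
    return nodes
```

Calling parse_node_modes(SAMPLE) yields one entry per node, so a script can check that the surviving node reports heal roots completed. A mode that wraps onto a second output line is truncated to its first line in this sketch.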
- Verify that all aggregates are mirrored: storage aggregate show
Example
The following example shows that all aggregates have a RAID Status of mirrored:
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate     Size Available Used% State   #Vols  Nodes       RAID Status
--------- -------- --------- ----- ------- ------ ----------- ------------
data_cluster
            4.19TB    4.13TB    2% online       8 node_A_1    raid_dp,
                                                              mirrored,
                                                              normal
root_cluster
           715.5GB   212.7GB   70% online       1 node_A_1    raid4,
                                                              mirrored,
                                                              normal
cluster_B Switched Over Aggregates:
Aggregate     Size Available Used% State   #Vols  Nodes       RAID Status
--------- -------- --------- ----- ------- ------ ----------- ------------
data_cluster_B
            4.19TB    4.11TB    2% online       5 node_A_1    raid_dp,
                                                              mirrored,
                                                              normal
root_cluster_B   -         -     - unknown      - node_A_1    -
- Boot nodes from the disaster site.
- Check the status of switchback recovery: metrocluster node show
Example
cluster_A::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1           configured     enabled   heal roots completed
      cluster_B
              node_B_2           configured     enabled   waiting for switchback
                                                          recovery
2 entries were displayed.
- Perform the switchback: metrocluster switchback
Example
cluster_A::> metrocluster switchback
[Job 938] Job succeeded: Switchback is successful.
Verify switchback
- Confirm status of the nodes: metrocluster node show
Example
cluster_A::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1           configured     enabled   normal
      cluster_B
              node_B_2           configured     enabled   normal
2 entries were displayed.
- Confirm status of the metrocluster operation: metrocluster operation show
Example
The output should show a successful state.
cluster_A::> metrocluster operation show
Operation: switchback
State: successful
Start Time: 2/6/2016 13:54:25
End Time: 2/6/2016 13:56:15
Errors: -
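The final check can also be scripted against captured metrocluster operation show output. The following is a minimal sketch in Python that assumes the "Key: value" layout shown in the example above. The function names parse_operation_show and switchback_succeeded are hypothetical helpers, not ONTAP commands.

```python
def parse_operation_show(output: str) -> dict:
    """Parse "Key: value" lines from captured output into a dictionary."""
    fields = {}
    for line in output.splitlines():
        if "::>" in line or ":" not in line:
            # Skip the CLI prompt line and anything that is not "Key: value".
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def switchback_succeeded(output: str) -> bool:
    """Return True if the last operation was a successful switchback."""
    fields = parse_operation_show(output)
    return (
        fields.get("Operation") == "switchback"
        and fields.get("State") == "successful"
    )
```

Applied to the example output above, switchback_succeeded returns True. Any other State value, or a different Operation, returns False and signals that the operation history should be inspected.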