Verifying healing and manual switchback

You can test the healing and manual switchback operations by switching the cluster back to the original data center after a negotiated switchover. The test verifies that data availability is not affected (except for SMB configurations).

About this task

This test should take about 30 minutes.

The expected result of this procedure is that services are switched back to their home nodes.

The healing steps are not required on systems running ONTAP 9.5 or later, on which healing is performed automatically after a negotiated switchover. On systems running ONTAP 9.6 and later, healing is also performed automatically after unscheduled switchover.
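
The version rule above can be expressed as a small check. The following is a minimal sketch in Python, not an ONTAP tool: the function name `manual_healing_required` and its parameters are hypothetical, and it encodes only the version statement in the preceding paragraph (it does not account for the configuration-specific note in step 2).

```python
def manual_healing_required(ontap_version: str, negotiated: bool) -> bool:
    """Return True if the heal commands must be run manually.

    Encodes only the rule stated above:
    - ONTAP 9.5 and later: automatic healing after a negotiated switchover.
    - ONTAP 9.6 and later: automatic healing after an unscheduled switchover too.
    """
    # Compare (major, minor) as integers so that "9.10" sorts after "9.5".
    major, minor = (int(part) for part in ontap_version.split(".")[:2])
    first_automatic = (9, 5) if negotiated else (9, 6)
    return (major, minor) < first_automatic


if __name__ == "__main__":
    for version in ("9.4", "9.5", "9.6"):
        print(version,
              "negotiated:", manual_healing_required(version, True),
              "unscheduled:", manual_healing_required(version, False))
```

Comparing integer tuples rather than strings avoids misordering two-digit minor releases such as 9.10.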

  1. If the system is running ONTAP 9.4 or earlier, heal the data aggregate: metrocluster heal aggregates

    Example

    The following example shows the successful completion of the command:
    cluster_A::> metrocluster heal aggregates
    [Job 936] Job succeeded: Heal Aggregates is successful.
  2. If the system is running ONTAP 9.4 or earlier, heal the root aggregate: metrocluster heal root-aggregates

    This step is required on the following configurations:
    • MetroCluster FC configurations.

    • MetroCluster IP configurations running ONTAP 9.4 or earlier.

    Example

    The following example shows the successful completion of the command:
    cluster_A::> metrocluster heal root-aggregates
    [Job 937] Job succeeded: Heal Root Aggregates is successful.
  3. Verify that healing is completed: metrocluster node show

    Example

    The following example shows that healing of the root aggregates has completed:
    cluster_A::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster_A
                  node_A_1           configured     enabled   heal roots completed
          cluster_B
                  node_B_2           unreachable    -         switched over
    2 entries were displayed.

    If the automatic healing operation fails for any reason, you must issue the metrocluster heal commands manually, as in ONTAP versions earlier than ONTAP 9.5. You can use the metrocluster operation show and metrocluster operation history show -instance commands to monitor the status of healing and determine the cause of a failure.

  4. Verify that all aggregates are mirrored: storage aggregate show

    Example

    The following example shows that all aggregates have a RAID Status of mirrored:
    cluster_A::> storage aggregate show
    cluster Aggregates:
    Aggregate Size     Available Used% State   #Vols  Nodes       RAID Status
    --------- -------- --------- ----- ------- ------ ----------- ------------
    data_cluster
                4.19TB    4.13TB    2% online       8 node_A_1    raid_dp,
                                                                  mirrored,
                                                                  normal
    root_cluster
               715.5GB   212.7GB   70% online       1 node_A_1    raid4,
                                                                  mirrored,
                                                                  normal
    cluster_B Switched Over Aggregates:
    Aggregate Size     Available Used% State   #Vols  Nodes       RAID Status
    --------- -------- --------- ----- ------- ------ ----------- ------------
    data_cluster_B
                4.19TB    4.11TB    2% online       5 node_A_1    raid_dp,
                                                                  mirrored,
                                                                  normal
    root_cluster_B   -         -     - unknown      - node_A_1    -

  5. Boot the nodes at the disaster site.
  6. Check the status of switchback recovery: metrocluster node show

    Example

    cluster_A::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster_A
                  node_A_1           configured     enabled   heal roots completed
          cluster_B
                  node_B_2           configured     enabled   waiting for switchback
                                                              recovery
    2 entries were displayed.

  7. Perform the switchback: metrocluster switchback

    Example

    cluster_A::> metrocluster switchback
    [Job 938] Job succeeded: Switchback is successful.
  8. Confirm the status of the nodes: metrocluster node show

    Example

    cluster_A::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster_A
                  node_A_1           configured     enabled   normal
          cluster_B
                  node_B_2           configured     enabled   normal

    2 entries were displayed.

  9. Confirm the status of the MetroCluster operation: metrocluster operation show

    Example

    The output should show a successful state.
    cluster_A::> metrocluster operation show
    Operation: switchback
    State: successful
    Start Time: 2/6/2016 13:54:25
    End Time: 2/6/2016 13:56:15
    Errors: -
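
If you capture the output of the last two commands as text, the final checks in steps 8 and 9 can be scripted. The following is a minimal sketch in Python under stated assumptions: the helper names (`parse_node_modes`, `parse_operation`, `switchback_verified`) are hypothetical and not part of ONTAP, the sample text is copied from the examples in this procedure, and the parsing assumes the column layout shown in those examples rather than any documented output contract.

```python
# Hypothetical helpers for checking captured CLI output; not part of ONTAP.

NODE_SHOW = """
cluster_A::> metrocluster node show
DR Configuration DR
Group Cluster Node State Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1 cluster_A
  node_A_1 configured enabled normal
  cluster_B
  node_B_2 configured enabled normal

2 entries were displayed.
"""

OPERATION_SHOW = """
cluster_A::> metrocluster operation show
Operation: switchback
State: successful
Start Time: 2/6/2016 13:54:25
End Time: 2/6/2016 13:56:15
Errors: -
"""


def parse_node_modes(output: str) -> dict[str, str]:
    """Map each node name to its DR mode from `metrocluster node show` text."""
    modes: dict[str, str] = {}
    in_table = False
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if all(set(token) == {"-"} for token in tokens):
            in_table = True  # dashed separator: data rows follow
            continue
        if not in_table or line.rstrip().endswith("displayed."):
            continue
        # Assumption: a node row has at least four fields (node, configuration
        # state, mirroring, then one or more mode words). Cluster rows and
        # wrapped continuation lines have fewer fields and are ignored.
        if len(tokens) >= 4:
            modes[tokens[0]] = " ".join(tokens[3:])
    return modes


def parse_operation(output: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of `metrocluster operation show` text."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        if "::>" in line:
            continue  # skip the prompt line
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def switchback_verified(node_output: str, operation_output: str) -> bool:
    """True if every node is in normal mode and the switchback succeeded."""
    modes = parse_node_modes(node_output)
    operation = parse_operation(operation_output)
    return (
        bool(modes)
        and all(mode == "normal" for mode in modes.values())
        and operation.get("Operation") == "switchback"
        and operation.get("State") == "successful"
    )


if __name__ == "__main__":
    print(switchback_verified(NODE_SHOW, OPERATION_SHOW))
```

Splitting on whitespace keeps the sketch independent of exact column widths. The trade-off is that a mode wrapped onto a second line (as in the step 6 example) is read without its continuation word.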