
Performing aggregate healing and restoring mirrors (MetroCluster IP configurations)

After replacing hardware and assigning disks, on systems running ONTAP 9.5 or earlier you can perform the MetroCluster healing operations. On all versions of ONTAP, you must then confirm that aggregates are mirrored and, if necessary, restart mirroring.

About this task

Starting with ONTAP 9.6, the healing operations are performed automatically when the disaster site nodes boot up. The healing commands are not required.

These steps are performed on the surviving cluster.

  1. If you are using ONTAP 9.6 or later, you must verify that automatic healing completed successfully:
    1. Confirm that the heal-aggr-auto and heal-root-aggr-auto operations completed: metrocluster operation history show

      Example

      The following output shows that the operations have completed successfully on cluster_A.

      cluster_B::*> metrocluster operation history show
      Operation                     State          Start Time       End Time
      ----------------------------- -------------- ---------------- ----------------
      heal-root-aggr-auto           successful     2/25/2019 06:45:58
                                                                    2/25/2019 06:46:02
      heal-aggr-auto                successful     2/25/2019 06:45:48
                                                                    2/25/2019 06:45:52
      .
      .
      .

    2. Confirm that the disaster site is ready for switchback: metrocluster node show

      Example

      The following output shows that automatic healing has completed and that cluster_A is ready for switchback.

      cluster_B::*> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_A
                    node_A_1           configured     enabled   heal roots completed
                    node_A_2           configured     enabled   heal roots completed
            cluster_B
                    node_B_1           configured     enabled   waiting for switchback recovery
                    node_B_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.
  2. If you are using ONTAP 9.5 or earlier, you must perform aggregate healing:
    1. Verify the state of the nodes: metrocluster node show

      Example

      The following output shows that switchover has completed, so healing can be performed.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_B
                    node_B_1           configured     enabled   switchover completed
                    node_B_2           configured     enabled   switchover completed
            cluster_A
                    node_A_1           configured     enabled   waiting for switchback recovery
                    node_A_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.

      cluster_B::>
    2. Perform the aggregates healing phase: metrocluster heal -phase aggregates

      Example

      The following output shows a typical aggregates healing operation.

      cluster_B::*> metrocluster heal -phase aggregates
      [Job 647] Job succeeded: Heal Aggregates is successful.

      cluster_B::*> metrocluster operation show
      Operation: heal-aggregates
      State: successful
      Start Time: 10/26/2017 12:01:15
      End Time: 10/26/2017 12:01:17
      Errors: -

      cluster_B::*>
    3. Verify that heal aggregates has completed and the disaster site is ready for switchback: metrocluster node show

      Example

      The following output shows that the heal aggregates phase has completed on cluster_A.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_A
                    node_A_1           configured     enabled   heal aggregates completed
                    node_A_2           configured     enabled   heal aggregates completed
            cluster_B
                    node_B_1           configured     enabled   waiting for switchback recovery
                    node_B_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.

      cluster_B::>

  3. If disks have been replaced, you must mirror the local and switched-over aggregates:
    1. Display the aggregates: storage aggregate show

      Example

      cluster_B::> storage aggregate show

      cluster_B Aggregates:
      Aggregate          Size Available Used% State    #Vols Nodes            RAID Status
      -------------- -------- --------- ----- ------- ------ ---------------- ------------
      node_B_1_aggr0   1.49TB   74.12GB   95% online       1 node_B_1         raid4,
                                                                              normal
      node_B_2_aggr0   1.49TB   74.12GB   95% online       1 node_B_2         raid4,
                                                                              normal
      node_B_1_aggr1   3.14TB    3.04TB    3% online      15 node_B_1         raid_dp,
                                                                              normal
      node_B_1_aggr2   3.14TB    3.06TB    3% online      14 node_B_1         raid_tec,
                                                                              normal
      node_B_2_aggr1   3.14TB    2.99TB    5% online      37 node_B_2         raid_dp,
                                                                              normal
      node_B_2_aggr2   3.14TB    3.02TB    4% online      35 node_B_2         raid_tec,
                                                                              normal

      cluster_A Switched Over Aggregates:
      Aggregate          Size Available Used% State    #Vols Nodes            RAID Status
      -------------- -------- --------- ----- ------- ------ ---------------- ------------
      node_A_1_aggr1   2.36TB    2.12TB   10% online      91 node_B_1         raid_dp,
                                                                              normal
      node_A_1_aggr2   3.14TB    2.90TB    8% online      90 node_B_1         raid_tec,
                                                                              normal
      node_A_2_aggr1   2.36TB    2.10TB   11% online      91 node_B_2         raid_dp,
                                                                              normal
      node_A_2_aggr2   3.14TB    2.89TB    8% online      90 node_B_2         raid_tec,
                                                                              normal
      12 entries were displayed.

      cluster_B::>
    2. Mirror the aggregate: storage aggregate mirror -aggregate aggregate-name

      Example

      The following output shows a typical mirroring operation.

      cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr1

      Info: Disks would be added to aggregate "node_B_1_aggr1" on node "node_B_1" in
            the following manner:

            Second Plex

              RAID Group rg0, 6 disks (block checksum, raid_dp)
                Position   Disk                      Type       Size
                ---------- ------------------------- ---------- ---------------
                dparity    5.20.6                    SSD        -
                parity     5.20.14                   SSD        -
                data       5.21.1                    SSD        894.0GB
                data       5.21.3                    SSD        894.0GB
                data       5.22.3                    SSD        894.0GB
                data       5.21.13                   SSD        894.0GB

            Aggregate capacity available for volume use would be 2.99TB.

      Do you want to continue? {y|n}: y
    3. Repeat the previous step for each of the aggregates from the surviving site.
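
      Example

      A sketch of repeating the command for the remaining aggregates, assuming the unmirrored aggregates are those named in the earlier storage aggregate show output:

      cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr2
      cluster_B::> storage aggregate mirror -aggregate node_B_2_aggr1
      cluster_B::> storage aggregate mirror -aggregate node_B_2_aggr2

      Each invocation displays the proposed second plex and prompts for confirmation, as shown in the previous step.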
    4. Wait for the aggregates to resynchronize; you can check the status with the storage aggregate show command.

      Example

      The following output shows that a number of aggregates are resynchronizing.

      cluster_B::> storage aggregate show

      cluster_B Aggregates:
      Aggregate          Size Available Used% State    #Vols Nodes            RAID Status
      -------------- -------- --------- ----- ------- ------ ---------------- ------------
      node_B_1_aggr0   1.49TB   74.12GB   95% online       1 node_B_1         raid4,
                                                                              mirrored,
                                                                              normal
      node_B_2_aggr0   1.49TB   74.12GB   95% online       1 node_B_2         raid4,
                                                                              mirrored,
                                                                              normal
      node_B_1_aggr1   2.86TB    2.76TB    4% online      15 node_B_1         raid_dp,
                                                                              resyncing
      node_B_1_aggr2   2.89TB    2.81TB    3% online      14 node_B_1         raid_tec,
                                                                              resyncing
      node_B_2_aggr1   2.73TB    2.58TB    6% online      37 node_B_2         raid_dp,
                                                                              resyncing
      node_B_2_aggr2   2.83TB    2.71TB    4% online      35 node_B_2         raid_tec,
                                                                              resyncing

      cluster_A Switched Over Aggregates:
      Aggregate          Size Available Used% State    #Vols Nodes            RAID Status
      -------------- -------- --------- ----- ------- ------ ---------------- ------------
      node_A_1_aggr1   1.86TB    1.62TB   13% online      91 node_B_1         raid_dp,
                                                                              resyncing
      node_A_1_aggr2   2.58TB    2.33TB   10% online      90 node_B_1         raid_tec,
                                                                              resyncing
      node_A_2_aggr1   1.79TB    1.53TB   14% online      91 node_B_2         raid_dp,
                                                                              resyncing
      node_A_2_aggr2   2.64TB    2.39TB    9% online      90 node_B_2         raid_tec,
                                                                              resyncing
      12 entries were displayed.
    5. Confirm that all aggregates are online and have resynchronized: storage aggregate plex show

      Example

      The following output shows that all aggregates have resynchronized.

      cluster_A::> storage aggregate plex show
                               Is      Is         Resyncing
      Aggregate      Plex      Online  Resyncing    Percent Status
      -------------- --------- ------- ---------- --------- ---------------
      node_B_1_aggr0 plex0     true    false              - normal,active
      node_B_1_aggr0 plex8     true    false              - normal,active
      node_B_2_aggr0 plex0     true    false              - normal,active
      node_B_2_aggr0 plex8     true    false              - normal,active
      node_B_1_aggr1 plex0     true    false              - normal,active
      node_B_1_aggr1 plex9     true    false              - normal,active
      node_B_1_aggr2 plex0     true    false              - normal,active
      node_B_1_aggr2 plex5     true    false              - normal,active
      node_B_2_aggr1 plex0     true    false              - normal,active
      node_B_2_aggr1 plex9     true    false              - normal,active
      node_B_2_aggr2 plex0     true    false              - normal,active
      node_B_2_aggr2 plex5     true    false              - normal,active
      node_A_1_aggr1 plex4     true    false              - normal,active
      node_A_1_aggr1 plex8     true    false              - normal,active
      node_A_1_aggr2 plex1     true    false              - normal,active
      node_A_1_aggr2 plex5     true    false              - normal,active
      node_A_2_aggr1 plex4     true    false              - normal,active
      node_A_2_aggr1 plex8     true    false              - normal,active
      node_A_2_aggr2 plex1     true    false              - normal,active
      node_A_2_aggr2 plex5     true    false              - normal,active
      20 entries were displayed.
  4. On systems running ONTAP 9.5 or earlier, perform the root-aggregates healing phase: metrocluster heal -phase root-aggregates

    Example

    cluster_B::> metrocluster heal -phase root-aggregates
    [Job 651] Job is queued: MetroCluster Heal Root Aggregates Job.
    Oct 26 13:05:00 [Job 651] Job succeeded: Heal Root Aggregates is successful.
  5. Verify that heal root-aggregates has completed and the disaster site is ready for switchback: metrocluster node show

    Example

    The following output shows that the heal roots phase has completed on cluster_A.

    cluster_B::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster_A
                  node_A_1           configured     enabled   heal roots completed
                  node_A_2           configured     enabled   heal roots completed
          cluster_B
                  node_B_1           configured     enabled   waiting for switchback recovery
                  node_B_2           configured     enabled   waiting for switchback recovery
    4 entries were displayed.

    cluster_B::>

After you finish

Proceed to verify the licenses on the replaced nodes.
