Performing aggregate healing and restoring mirrors (MetroCluster IP configurations)
After replacing hardware and assigning disks, you must perform the MetroCluster healing operations on systems running ONTAP 9.5 or earlier. In all versions of ONTAP, you must then confirm that aggregates are mirrored and, if necessary, restart mirroring.
About this task
Starting with ONTAP 9.6, the healing operations are performed automatically when the disaster site nodes boot up. The healing commands are not required.
These steps are performed on the surviving cluster.
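The version rule above can be sketched in Python as follows. This is only an illustration: the helper name `manual_heal_commands` and the assumption that the ONTAP release is supplied as a dotted string such as "9.5" are hypothetical, while the two command strings are the ones used in the steps of this procedure.

```python
def manual_heal_commands(ontap_version):
    """Return the manual healing commands needed for this ONTAP release.

    Hypothetical helper: assumes a dotted version string such as "9.5",
    "9.6P2", or "9.10.1"; only the leading major.minor numbers are compared.
    """
    major_text, minor_text = ontap_version.split(".")[:2]
    # Keep only the leading digits of the minor part ("6P2" -> "6").
    minor_digits = ""
    for ch in minor_text:
        if not ch.isdigit():
            break
        minor_digits += ch
    version = (int(major_text), int(minor_digits))

    if version >= (9, 6):
        # Healing runs automatically when the disaster site nodes boot.
        return []
    # In the procedure, the mirroring checks run between these two phases.
    return [
        "metrocluster heal -phase aggregates",
        "metrocluster heal -phase root-aggregates",
    ]


print(manual_heal_commands("9.5"))
print(manual_heal_commands("9.8"))
```

An empty list means only the verification steps apply; the mirroring checks are still required in every release.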
- If you are using ONTAP 9.6 or later, you must verify that automatic healing completed successfully:
- Confirm that the heal-aggr-auto and heal-root-aggr-auto operations completed: metrocluster operation history show
Example
The following output shows that the operations have completed successfully on cluster_A.
cluster_B::*> metrocluster operation history show
Operation                     State          Start Time         End Time
----------------------------- -------------- ------------------ ------------------
heal-root-aggr-auto           successful     2/25/2019 06:45:58 2/25/2019 06:46:02
heal-aggr-auto                successful     2/25/2019 06:45:48 2/25/2019 06:45:52
.
.
.
- Confirm that the disaster site is ready for switchback: metrocluster node show
Example
The following output shows that the heal roots phase has completed on cluster_A.
cluster_B::*> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1           configured     enabled   heal roots completed
              node_A_2           configured     enabled   heal roots completed
      cluster_B
              node_B_1           configured     enabled   waiting for switchback recovery
              node_B_2           configured     enabled   waiting for switchback recovery
4 entries were displayed.
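If you capture the command output as text, the check on the automatic healing operations can be scripted. The following Python sketch is one possible approach, not a NetApp-supplied tool: the function names are hypothetical, and it assumes each operation row starts with the operation name followed by its state, as in the example output above.

```python
AUTO_HEAL_OPERATIONS = ("heal-aggr-auto", "heal-root-aggr-auto")


def auto_heal_states(history_output):
    """Map each automatic healing operation to the first state listed for it."""
    states = {}
    for line in history_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in AUTO_HEAL_OPERATIONS:
            # Keep only the first row seen for each operation.
            states.setdefault(fields[0], fields[1])
    return states


def auto_heal_succeeded(history_output):
    """True only if both automatic operations appear and report 'successful'."""
    states = auto_heal_states(history_output)
    return all(states.get(op) == "successful" for op in AUTO_HEAL_OPERATIONS)


# Rows taken from the example output of "metrocluster operation history show".
sample = """\
Operation                     State          Start Time         End Time
----------------------------- -------------- ------------------ ------------------
heal-root-aggr-auto           successful     2/25/2019 06:45:58 2/25/2019 06:46:02
heal-aggr-auto                successful     2/25/2019 06:45:48 2/25/2019 06:45:52
"""
print(auto_heal_succeeded(sample))
```

A missing operation is treated the same as a failed one, so an empty or truncated history does not pass the check.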
- If you are using ONTAP 9.5 or earlier, you must perform aggregate healing:
- Verify the state of the nodes: metrocluster node show
Example
The following output shows that switchover has completed, so healing can be performed.
cluster_B::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_B
              node_B_1           configured     enabled   switchover completed
              node_B_2           configured     enabled   switchover completed
      cluster_A
              node_A_1           configured     enabled   waiting for switchback recovery
              node_A_2           configured     enabled   waiting for switchback recovery
4 entries were displayed.
cluster_B::>
- Perform the aggregates healing phase: metrocluster heal -phase aggregates
Example
The following output shows a typical aggregates healing operation.
cluster_B::*> metrocluster heal -phase aggregates
[Job 647] Job succeeded: Heal Aggregates is successful.
cluster_B::*> metrocluster operation show
  Operation: heal-aggregates
      State: successful
 Start Time: 10/26/2017 12:01:15
   End Time: 10/26/2017 12:01:17
     Errors: -
cluster_B::*>
- Verify that heal aggregates has completed and the disaster site is ready for switchback: metrocluster node show
Example
The following output shows that the heal aggregates phase has completed on cluster_A.
cluster_B::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1           configured     enabled   heal aggregates completed
              node_A_2           configured     enabled   heal aggregates completed
      cluster_B
              node_B_1           configured     enabled   waiting for switchback recovery
              node_B_2           configured     enabled   waiting for switchback recovery
4 entries were displayed.
cluster_B::>
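The DR mode check in the node listing can also be scripted against captured output. The Python sketch below is a minimal illustration with hypothetical function names. It assumes the input contains only the command output, that each node row has the form name, configuration state, mirroring, DR mode, and that the mirroring column reads either enabled or disabled.

```python
def dr_modes_by_cluster(node_show_output):
    """Parse 'metrocluster node show' text into {cluster: {node: dr_mode}}."""
    modes = {}
    cluster = None
    for line in node_show_output.splitlines():
        fields = line.split()
        if not fields or "::" in line:
            continue  # skip blank lines and CLI prompts
        if len(fields) >= 4 and fields[2] in ("enabled", "disabled"):
            # Node row: name, configuration state, mirroring, then the DR mode.
            if cluster is not None:
                modes[cluster][fields[0]] = " ".join(fields[3:])
        elif len(fields) <= 2 and not fields[-1].startswith("-"):
            # Cluster row: optional DR group number followed by the cluster name.
            cluster = fields[-1]
            modes.setdefault(cluster, {})
    return modes


def cluster_has_mode(modes, cluster, expected_mode):
    """True if the cluster has nodes and every node reports expected_mode."""
    nodes = modes.get(cluster, {})
    return bool(nodes) and all(mode == expected_mode for mode in nodes.values())


# Rows taken from the example output after the aggregates healing phase.
sample = """\
DR Configuration DR
Group Cluster Node State Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1 cluster_A
node_A_1 configured enabled heal aggregates completed
node_A_2 configured enabled heal aggregates completed
cluster_B
node_B_1 configured enabled waiting for switchback recovery
node_B_2 configured enabled waiting for switchback recovery
4 entries were displayed.
"""
modes = dr_modes_by_cluster(sample)
print(cluster_has_mode(modes, "cluster_A", "heal aggregates completed"))
```

The same helper applies after the root-aggregates phase by passing "heal roots completed" as the expected mode.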
- If disks have been replaced, you must mirror the local and switched over aggregates:
- Display the aggregates: storage aggregate show
Example
cluster_B::> storage aggregate show
cluster_B Aggregates:
Aggregate      Size     Available Used% State   #Vols  Nodes            RAID Status
-------------- -------- --------- ----- ------- ------ ---------------- ------------
node_B_1_aggr0   1.49TB   74.12GB   95% online       1 node_B_1         raid4, normal
node_B_2_aggr0   1.49TB   74.12GB   95% online       1 node_B_2         raid4, normal
node_B_1_aggr1   3.14TB    3.04TB    3% online      15 node_B_1         raid_dp, normal
node_B_1_aggr2   3.14TB    3.06TB    3% online      14 node_B_1         raid_tec, normal
node_B_2_aggr1   3.14TB    2.99TB    5% online      37 node_B_2         raid_dp, normal
node_B_2_aggr2   3.14TB    3.02TB    4% online      35 node_B_2         raid_tec, normal
cluster_A Switched Over Aggregates:
Aggregate      Size     Available Used% State   #Vols  Nodes            RAID Status
-------------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_aggr1   2.36TB    2.12TB   10% online      91 node_B_1         raid_dp, normal
node_A_1_aggr2   3.14TB    2.90TB    8% online      90 node_B_1         raid_tec, normal
node_A_2_aggr1   2.36TB    2.10TB   11% online      91 node_B_2         raid_dp, normal
node_A_2_aggr2   3.14TB    2.89TB    8% online      90 node_B_2         raid_tec, normal
12 entries were displayed.
cluster_B::>
- Mirror the aggregate: storage aggregate mirror -aggregate aggregate-name
Example
The following output shows a typical mirroring operation.
cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr1
Info: Disks would be added to aggregate "node_B_1_aggr1" on node "node_B_1" in
      the following manner:
      Second Plex
        RAID Group rg0, 6 disks (block checksum, raid_dp)
          Position   Disk                      Type       Size
          ---------- ------------------------- ---------- ---------------
          dparity    5.20.6                    SSD        -
          parity     5.20.14                   SSD        -
          data       5.21.1                    SSD        894.0GB
          data       5.21.3                    SSD        894.0GB
          data       5.22.3                    SSD        894.0GB
          data       5.21.13                   SSD        894.0GB
      Aggregate capacity available for volume use would be 2.99TB.
Do you want to continue? {y|n}: y
- Repeat the previous step for each of the aggregates from the surviving site.
- Wait for the aggregates to resynchronize; you can check the status with the storage aggregate show command.
Example
The following output shows that a number of aggregates are resynchronizing.
cluster_B::> storage aggregate show
cluster_B Aggregates:
Aggregate      Size     Available Used% State   #Vols  Nodes            RAID Status
-------------- -------- --------- ----- ------- ------ ---------------- ------------
node_B_1_aggr0   1.49TB   74.12GB   95% online       1 node_B_1         raid4, mirrored, normal
node_B_2_aggr0   1.49TB   74.12GB   95% online       1 node_B_2         raid4, mirrored, normal
node_B_1_aggr1   2.86TB    2.76TB    4% online      15 node_B_1         raid_dp, resyncing
node_B_1_aggr2   2.89TB    2.81TB    3% online      14 node_B_1         raid_tec, resyncing
node_B_2_aggr1   2.73TB    2.58TB    6% online      37 node_B_2         raid_dp, resyncing
node_B_2_aggr2   2.83TB    2.71TB    4% online      35 node_B_2         raid_tec, resyncing
cluster_A Switched Over Aggregates:
Aggregate      Size     Available Used% State   #Vols  Nodes            RAID Status
-------------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_aggr1   1.86TB    1.62TB   13% online      91 node_B_1         raid_dp, resyncing
node_A_1_aggr2   2.58TB    2.33TB   10% online      90 node_B_1         raid_tec, resyncing
node_A_2_aggr1   1.79TB    1.53TB   14% online      91 node_B_2         raid_dp, resyncing
node_A_2_aggr2   2.64TB    2.39TB    9% online      90 node_B_2         raid_tec, resyncing
12 entries were displayed.
- Confirm that all aggregates are online and have resynchronized: storage aggregate plex show
Example
The following output shows that all aggregates have resynchronized.
cluster_B::> storage aggregate plex show
()
                         Is      Is         Resyncing
Aggregate      Plex      Online  Resyncing  Percent   Status
-------------- --------- ------- ---------- --------- ---------------
node_B_1_aggr0 plex0     true    false      -         normal,active
node_B_1_aggr0 plex8     true    false      -         normal,active
node_B_2_aggr0 plex0     true    false      -         normal,active
node_B_2_aggr0 plex8     true    false      -         normal,active
node_B_1_aggr1 plex0     true    false      -         normal,active
node_B_1_aggr1 plex9     true    false      -         normal,active
node_B_1_aggr2 plex0     true    false      -         normal,active
node_B_1_aggr2 plex5     true    false      -         normal,active
node_B_2_aggr1 plex0     true    false      -         normal,active
node_B_2_aggr1 plex9     true    false      -         normal,active
node_B_2_aggr2 plex0     true    false      -         normal,active
node_B_2_aggr2 plex5     true    false      -         normal,active
node_A_1_aggr1 plex4     true    false      -         normal,active
node_A_1_aggr1 plex8     true    false      -         normal,active
node_A_1_aggr2 plex1     true    false      -         normal,active
node_A_1_aggr2 plex5     true    false      -         normal,active
node_A_2_aggr1 plex4     true    false      -         normal,active
node_A_2_aggr1 plex8     true    false      -         normal,active
node_A_2_aggr2 plex1     true    false      -         normal,active
node_A_2_aggr2 plex5     true    false      -         normal,active
20 entries were displayed.
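With many aggregates, scanning the plex listing by eye is error-prone. The following Python sketch shows one way to flag plexes that are offline or still resynchronizing, and aggregates that have fewer than two plexes, from captured output. The function names are hypothetical, and the parser assumes the six-column row layout shown in the example above (aggregate, plex, online, resyncing, percent, status).

```python
def plex_rows(plex_show_output):
    """Yield (aggregate, plex, online, resyncing) for each data row."""
    for line in plex_show_output.splitlines():
        fields = line.split()
        # Data rows have six columns and a plex name such as "plex0".
        if len(fields) >= 6 and fields[1].startswith("plex"):
            yield tuple(fields[:4])


def plexes_not_ready(plex_show_output):
    """List (aggregate, plex) pairs that are offline or still resynchronizing."""
    return [
        (aggregate, plex)
        for aggregate, plex, online, resyncing in plex_rows(plex_show_output)
        if online != "true" or resyncing != "false"
    ]


def unmirrored_aggregates(plex_show_output):
    """List aggregates that report fewer than two plexes."""
    counts = {}
    for aggregate, _plex, _online, _resyncing in plex_rows(plex_show_output):
        counts[aggregate] = counts.get(aggregate, 0) + 1
    return [aggregate for aggregate, count in counts.items() if count < 2]


# A few rows taken from the example "storage aggregate plex show" output.
sample = """\
Aggregate Plex Online Resyncing Percent Status
--------- --------- ------- ---------- --------- ---------------
node_B_1_aggr0 plex0 true false - normal,active
node_B_1_aggr0 plex8 true false - normal,active
node_A_1_aggr1 plex4 true false - normal,active
node_A_1_aggr1 plex8 true false - normal,active
"""
print(plexes_not_ready(sample))
print(unmirrored_aggregates(sample))
```

Both lists being empty corresponds to the state shown in the example: every aggregate has two plexes, and each plex is online and no longer resynchronizing.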
- On systems running ONTAP 9.5 and earlier, perform the root-aggregates healing phase: metrocluster heal -phase root-aggregates
Example
cluster_B::> metrocluster heal -phase root-aggregates
[Job 651] Job is queued: MetroCluster Heal Root Aggregates Job.Oct 26 13:05:00
[Job 651] Job succeeded: Heal Root Aggregates is successful.
- Verify that heal root-aggregates has completed and the disaster site is ready for switchback: metrocluster node show
Example
The following output shows that the heal roots phase has completed on cluster_A.
cluster_B::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1           configured     enabled   heal roots completed
              node_A_2           configured     enabled   heal roots completed
      cluster_B
              node_B_1           configured     enabled   waiting for switchback recovery
              node_B_2           configured     enabled   waiting for switchback recovery
4 entries were displayed.
cluster_B::>
After you finish
Proceed to verify the licenses on the replaced nodes.