Verifying operation after loss of a single storage shelf
You can test the failure of a single storage shelf to verify that there is no single point of failure.
This procedure has the following expected results:
- An error message should be reported by the monitoring software.
- No failover or loss of service should occur.
- Mirror resynchronization starts automatically after the hardware failure is restored.
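Before starting, the `storage failover show` output used in the first step can be screened programmatically. The following is a hypothetical helper, not part of the ONTAP toolset; it assumes the tabular column layout shown in this procedure's sample output.

```python
# Hypothetical helper, not an ONTAP tool: screen `storage failover show`
# output (captured e.g. over SSH) and confirm every node reports takeover
# as possible. Assumes the column layout shown in this procedure's samples.

def failover_ready(output: str) -> bool:
    """True if every data row has 'true' in the Possible column."""
    possible = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like: Node, Partner, Possible, State Description...
        if len(parts) >= 3 and parts[2] in ("true", "false"):
            possible.append(parts[2] == "true")
    return bool(possible) and all(possible)

sample = """\
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node_A_1       node_A_2       true     Connected to node_A_2
node_A_2       node_A_1       true     Connected to node_A_1
"""
print(failover_ready(sample))  # True
```

Header and separator rows are skipped because their third token is neither "true" nor "false"; an empty capture returns False rather than passing vacuously.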
- Check the storage failover status: storage failover show
cluster_A::> storage failover show
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
node_A_1 node_A_2 true Connected to node_A_2
node_A_2 node_A_1 true Connected to node_A_1
2 entries were displayed.
- Check the aggregate status: storage aggregate show
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
mirrored,
normal
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
mirrored,
normal
- Verify that all data SVMs and data volumes are online and serving data:
vserver show -type data
network interface show -fields is-home false
volume show !vol0,!MDV*
cluster_A::> vserver show -type data
Admin Operational Root
Vserver Type Subtype State State Volume Aggregate
----------- ------- ---------- ---------- ----------- ---------- ----------
SVM1 data sync-source running SVM1_root node_A_1_data01_mirrored
SVM2 data sync-source running SVM2_root node_A_2_data01_mirrored
cluster_A::> network interface show -fields is-home false
There are no entries matching your query.
cluster_A::> volume show !vol0,!MDV*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1
SVM1_root
node_A_1_data01_mirrored
online RW 10GB 9.50GB 5%
SVM1
SVM1_data_vol
node_A_1_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_root
node_A_2_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_data_vol
node_A_2_data02_unmirrored
online RW 1GB 972.6MB 5%
- Identify a shelf in Pool 1 for node node_A_2 to power off to simulate a sudden hardware failure:
storage aggregate show -r -node node-name !*root
The shelf you select must contain drives that are part of a mirrored data aggregate. In the following example, shelf ID 31 is selected to fail.
cluster_A::> storage aggregate show -r -node node_A_2 !*root
Owner Node: node_A_2
Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirrored) (block checksums)
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.3 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.4 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.6 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.8 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.5 0 BSAS 7200 827.7GB 828.0GB (normal)
Plex: /node_A_2_data01_mirrored/plex4 (online, normal, active, pool1)
RAID Group /node_A_2_data01_mirrored/plex4/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 1.31.7 1 BSAS 7200 827.7GB 828.0GB (normal)
parity 1.31.6 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.3 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.4 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.5 1 BSAS 7200 827.7GB 828.0GB (normal)
Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.12 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.22 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.21 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.20 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.14 0 BSAS 7200 827.7GB 828.0GB (normal)
15 entries were displayed.
- Physically power off the shelf that you selected.
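When checking aggregate status after the failure, the RAID Status column of `storage aggregate show` wraps across several lines, which makes it awkward to grep. The following is a hypothetical parsing sketch (not an ONTAP tool) that collects the status keywords printed under a given aggregate's row, assuming the wrapped layout shown in the examples in this procedure.

```python
# Hypothetical sketch: extract RAID status keywords for one aggregate from
# `storage aggregate show` output whose status column wraps across lines,
# as in the sample outputs in this procedure.

def raid_keywords(output: str, aggregate: str) -> set[str]:
    """Collect status words (normal/degraded/resyncing/mirrored) printed
    after the named aggregate's row, up to the next aggregate name."""
    words = ("normal", "degraded", "resyncing", "mirrored")
    found: set[str] = set()
    in_block = False
    for line in output.splitlines():
        token = line.strip()
        if token == aggregate:
            in_block = True
            continue
        if in_block:
            # A bare aggregate name on its own line starts the next block.
            if " " not in token and token.startswith("node_"):
                break
            found.update(w for w in words if w in token.replace(",", " ").split())
    return found

sample = """\
node_A_2_data01_mirrored
    4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
                                        mirror
                                        degraded
node_A_2_data02_unmirrored
    2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
                                        normal
"""
print(raid_keywords(sample, "node_A_2_data01_mirrored"))  # {'degraded'}
```

Matching on whole words (after stripping commas) avoids false positives such as "unmirrored" matching "mirrored".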
- Check the aggregate status again:
storage aggregate show
storage aggregate show -r -node node_A_2 !*root
The aggregate with drives on the powered-off shelf should have a degraded RAID status, and drives on the affected plex should have a failed status, as shown in the following example:
cluster_A::> storage aggregate show
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
mirror
degraded
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
mirror
degraded
cluster_A::> storage aggregate show -r -node node_A_2 !*root
Owner Node: node_A_2
Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirror degraded) (block checksums)
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.3 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.4 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.6 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.8 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.5 0 BSAS 7200 827.7GB 828.0GB (normal)
Plex: /node_A_2_data01_mirrored/plex4 (offline, failed, inactive, pool1)
RAID Group /node_A_2_data01_mirrored/plex4/rg0 (partial, none checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity FAILED - - - 827.7GB - (failed)
parity FAILED - - - 827.7GB - (failed)
data FAILED - - - 827.7GB - (failed)
data FAILED - - - 827.7GB - (failed)
data FAILED - - - 827.7GB - (failed)
Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.12 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.22 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.21 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.20 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.14 0 BSAS 7200 827.7GB 828.0GB (normal)
15 entries were displayed.
- Verify that the data is being served and that all volumes are still online:
vserver show -type data
network interface show -fields is-home false
volume show !vol0,!MDV*
cluster_A::> vserver show -type data
Admin Operational Root
Vserver Type Subtype State State Volume Aggregate
----------- ------- ---------- ---------- ----------- ---------- ----------
SVM1 data sync-source running SVM1_root node_A_1_data01_mirrored
SVM2 data sync-source running SVM2_root node_A_1_data01_mirrored
cluster_A::> network interface show -fields is-home false
There are no entries matching your query.
cluster_A::> volume show !vol0,!MDV*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1
SVM1_root
node_A_1_data01_mirrored
online RW 10GB 9.50GB 5%
SVM1
SVM1_data_vol
node_A_1_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_root
node_A_1_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_data_vol
node_A_2_data02_unmirrored
online RW 1GB 972.6MB 5%
- Physically power on the shelf. Resynchronization starts automatically.
- Verify that resynchronization has started:
storage aggregate show
The affected aggregate should have a resyncing RAID status, as shown in the following example:
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
resyncing
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
resyncing
- Monitor the aggregate to confirm that resynchronization is complete:
storage aggregate show
The affected aggregate should have a normal RAID status, as shown in the following example:
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
normal
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
resyncing
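The final two checks amount to polling `storage aggregate show` until the resyncing status clears. A hypothetical polling sketch follows; the status-fetching callable, poll interval, and timeout are placeholders, not values from this procedure, and how the status text is retrieved (for example over SSH) is site-specific and omitted.

```python
import time
from typing import Callable

# Hypothetical sketch: poll until an aggregate's RAID status no longer
# reports 'resyncing'. get_status is a placeholder that should return the
# RAID Status text for the aggregate, e.g. from `storage aggregate show`.

def wait_for_resync(get_status: Callable[[], str],
                    poll_seconds: float = 60.0,
                    timeout_seconds: float = 4 * 3600) -> bool:
    """Return True once 'resyncing' disappears from the status; False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if "resyncing" not in get_status():
            return True
        time.sleep(poll_seconds)
    return False

# Demonstration with a canned status sequence standing in for live output:
statuses = iter(["raid_dp, resyncing", "raid_dp, resyncing", "raid_dp, mirrored, normal"])
print(wait_for_resync(lambda: next(statuses), poll_seconds=0, timeout_seconds=5))  # True
```

Resynchronization time depends on the amount of data on the affected plex, so any real timeout should be sized accordingly.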