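Verifying operation after loss of a single storage shelf
You can test the failure of a single storage shelf to verify that there is no single point of failure.
About this task
This procedure has the following expected results:
The monitoring software should report an error message.
No failover or loss of service should occur.
Mirror resynchronization starts automatically after the hardware failure is restored.
- Check the storage failover status: storage failover show
Example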
cluster_A::> storage failover show
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
node_A_1 node_A_2 true Connected to node_A_2
node_A_2 node_A_1 true Connected to node_A_1
2 entries were displayed.
- Check the aggregate status: storage aggregate show
Example
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
mirrored,
normal
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
mirrored,
normal
- Verify that all data SVMs and data volumes are online and serving data:
vserver show -type data
network interface show -fields is-home false
volume show !vol0,!MDV*
Example
cluster_A::> vserver show -type data
Admin Operational Root
Vserver Type Subtype State State Volume Aggregate
----------- ------- ---------- ---------- ----------- ---------- ----------
SVM1 data sync-source running SVM1_root node_A_1_data01_mirrored
SVM2 data sync-source running SVM2_root node_A_2_data01_mirrored
cluster_A::> network interface show -fields is-home false
There are no entries matching your query.
cluster_A::> volume show !vol0,!MDV*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1
SVM1_root
node_A_1_data01_mirrored
online RW 10GB 9.50GB 5%
SVM1
SVM1_data_vol
node_A_1_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_root
node_A_2_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_data_vol
node_A_2_data02_unmirrored
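online RW 1GB 972.6MB 5%
- Identify a shelf in pool 1 of node node_A_2 to power off, simulating a sudden hardware failure: storage aggregate show -r -node node-name !*root
The shelf you select must contain drives that belong to a mirrored data aggregate.
Example
In the following example, shelf ID 31 is selected to fail.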
cluster_A::> storage aggregate show -r -node node_A_2 !*root
Owner Node: node_A_2
Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirrored) (block checksums)
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.3 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.4 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.6 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.8 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.5 0 BSAS 7200 827.7GB 828.0GB (normal)
Plex: /node_A_2_data01_mirrored/plex4 (online, normal, active, pool1)
RAID Group /node_A_2_data01_mirrored/plex4/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 1.31.7 1 BSAS 7200 827.7GB 828.0GB (normal)
parity 1.31.6 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.3 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.4 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.5 1 BSAS 7200 827.7GB 828.0GB (normal)
Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.12 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.22 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.21 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.20 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.14 0 BSAS 7200 827.7GB 828.0GB (normal)
15 entries were displayed.
- Physically power off the selected shelf.
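As a cross-check on the shelf chosen for power-off, the disk rows in the `storage aggregate show -r` output can be grouped by pool and shelf. The following Python sketch is illustrative only. It assumes that disk names follow a stack.shelf.bay pattern (so 1.31.7 would be on shelf 31), it works on pasted CLI text rather than calling ONTAP, and `shelves_by_pool` is a hypothetical helper name, not part of any NetApp tool.

```python
import re
from collections import defaultdict

# A few rows copied from the `storage aggregate show -r` example output.
SAMPLE = """\
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
dparity 2.30.3 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.6 0 BSAS 7200 827.7GB 828.0GB (normal)
Plex: /node_A_2_data01_mirrored/plex4 (online, normal, active, pool1)
dparity 1.31.7 1 BSAS 7200 827.7GB 828.0GB (normal)
data 1.31.3 1 BSAS 7200 827.7GB 828.0GB (normal)
"""

# "Plex: /<aggregate>/<plex> (...)" lines name the aggregate for the rows below.
PLEX_ROW = re.compile(r"^\s*Plex:\s+/([^/]+)/(\S+)")
# Disk rows: position, disk name (ASSUMED stack.shelf.bay), pool number.
DISK_ROW = re.compile(r"^\s*(dparity|parity|data)\s+(\d+)\.(\d+)\.(\d+)\s+(\d+)\s")


def shelves_by_pool(text):
    """Return {(aggregate, pool): {shelf IDs}} for every disk row found."""
    result = defaultdict(set)
    aggregate = None
    for line in text.splitlines():
        plex = PLEX_ROW.match(line)
        if plex:
            aggregate = plex.group(1)
            continue
        disk = DISK_ROW.match(line)
        if disk and aggregate:
            shelf = int(disk.group(3))
            pool = int(disk.group(5))
            result[(aggregate, pool)].add(shelf)
    return dict(result)


shelves = shelves_by_pool(SAMPLE)
for (aggregate, pool), ids in sorted(shelves.items()):
    print(f"{aggregate} pool{pool}: shelves {sorted(ids)}")
```

With the sample rows, pool 1 of node_A_2_data01_mirrored maps to shelf 31, which matches the shelf selected in the example. Rows reported as FAILED have no disk name and are skipped by this pattern.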
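- Check the aggregate status again:
storage aggregate show
storage aggregate show -r -node node_A_2 !*root
Example
Aggregates with drives on the powered-off shelf should show a degraded RAID status, and drives on the affected plex should show a failed status, as shown in the following example: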
cluster_A::> storage aggregate show
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
mirror
degraded
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
mirror
degraded
cluster_A::> storage aggregate show -r -node node_A_2 !*root
Owner Node: node_A_2
Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirror degraded) (block checksums)
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.3 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.4 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.6 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.8 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.5 0 BSAS 7200 827.7GB 828.0GB (normal)
Plex: /node_A_2_data01_mirrored/plex4 (offline, failed, inactive, pool1)
RAID Group /node_A_2_data01_mirrored/plex4/rg0 (partial, none checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity FAILED - - - 827.7GB - (failed)
parity FAILED - - - 827.7GB - (failed)
data FAILED - - - 827.7GB - (failed)
data FAILED - - - 827.7GB - (failed)
data FAILED - - - 827.7GB - (failed)
Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
Usable Physical
Position Disk Pool Type RPM Size Size Status
-------- --------------------------- ---- ----- ------ -------- -------- ----------
dparity 2.30.12 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.22 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.21 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.20 0 BSAS 7200 827.7GB 828.0GB (normal)
data 2.30.14 0 BSAS 7200 827.7GB 828.0GB (normal)
15 entries were displayed.
- Verify that data is being served and that all volumes are still online:
vserver show -type data
network interface show -fields is-home false
volume show !vol0,!MDV*
Example
cluster_A::> vserver show -type data
Admin Operational Root
Vserver Type Subtype State State Volume Aggregate
----------- ------- ---------- ---------- ----------- ---------- ----------
SVM1 data sync-source running SVM1_root node_A_1_data01_mirrored
SVM2 data sync-source running SVM2_root node_A_1_data01_mirrored
cluster_A::> network interface show -fields is-home false
There are no entries matching your query.
cluster_A::> volume show !vol0,!MDV*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1
SVM1_root
node_A_1_data01_mirrored
online RW 10GB 9.50GB 5%
SVM1
SVM1_data_vol
node_A_1_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_root
node_A_1_data01_mirrored
online RW 10GB 9.49GB 5%
SVM2
SVM2_data_vol
node_A_2_data02_unmirrored
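online RW 1GB 972.6MB 5%
- Physically power on the shelf.
Resynchronization starts automatically.
- Verify that resynchronization has started: storage aggregate show
Example
The affected aggregates should show a resyncing RAID status, as shown in the following example: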
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
resyncing
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
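resyncing
- Monitor the aggregates to confirm that resynchronization is complete: storage aggregate show
Example
The affected aggregates should show a normal RAID status, as shown in the following example: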
cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
4.15TB 3.40TB 18% online 3 node_A_1 raid_dp,
mirrored,
normal
node_A_1_root
707.7GB 34.29GB 95% online 1 node_A_1 raid_dp,
mirrored,
normal
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
normal
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
resyncing
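When repeating the `storage aggregate show` checks in this procedure, it can help to reduce the output to a per-aggregate list of RAID status words. The Python sketch below is one possible approach, not a NetApp utility. It assumes the wrapped layout shown in the examples above (aggregate name on its own line, followed by a row that begins with the size), it parses pasted text rather than querying the cluster, and the function names `raid_status` and `not_settled` are hypothetical.

```python
import re

# Rows copied from the `storage aggregate show` examples (wrapped layout).
SAMPLE = """\
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
resyncing
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
node_A_2_root
707.7GB 34.27GB 95% online 1 node_A_2 raid_dp,
mirror
degraded
"""

# Size, Available, Used%, State, #Vols, Node, then the start of RAID Status.
ROW = re.compile(r"^\s*\S+B\s+\S+B\s+\d+%\s+(\S+)\s+\d+\s+(\S+)\s+(.*)$")


def raid_status(text):
    """Return {aggregate name: [RAID status words]} from wrapped CLI text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    rows = [i for i, ln in enumerate(lines) if ROW.match(ln)]
    result = {}
    for n, i in enumerate(rows):
        if i == 0:
            continue  # no preceding line to take the aggregate name from
        name = lines[i - 1]
        # Status words continue until the line that names the next aggregate.
        end = rows[n + 1] - 1 if n + 1 < len(rows) else len(lines)
        tail = " ".join([ROW.match(lines[i]).group(3)] + lines[i + 1:end])
        result[name] = [tok for tok in re.split(r"[,\s]+", tail) if tok]
    return result


def not_settled(status):
    """Names of aggregates still reporting resyncing or degraded."""
    busy = {"resyncing", "degraded"}
    return sorted(name for name, toks in status.items() if busy & set(toks))


status = raid_status(SAMPLE)
pending = not_settled(status)
print(pending)
```

For the sample text, `pending` contains node_A_2_data01_mirrored and node_A_2_root, and the unmirrored aggregate is not listed. Header lines are ignored, but a trailing "N entries were displayed." line would be appended to the last aggregate's status words, so strip it before parsing.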