
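Verifying operation after loss of a single storage shelf

You can test the failure of a single storage shelf to verify that there is no single point of failure.

About this task

This procedure has the following expected results:

  • The monitoring software should report an error message.

  • No failover or loss of service should occur.

  • Mirror resynchronization starts automatically after the hardware failure is restored.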

  1. Check the storage failover status: storage failover show

    Example

    cluster_A::> storage failover show

    Node           Partner        Possible State Description
    -------------- -------------- -------- -------------------------------------
    node_A_1       node_A_2       true     Connected to node_A_2
    node_A_2       node_A_1       true     Connected to node_A_1
    2 entries were displayed.
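
If you capture the CLI output as text (how it is captured, for example over SSH, is outside this procedure and is assumed here), the check in step 1 can be scripted. The following is a minimal Python sketch, not a NetApp tool: the function name `takeover_possible` is hypothetical, and the code assumes the column order Node, Partner, Possible shown in the example above. Every node should map to True before you power off a shelf.

```python
from __future__ import annotations


# Hypothetical helper: parses text captured from `storage failover show`.
# Assumes the column order Node, Partner, Possible, State Description.
def takeover_possible(output: str) -> dict[str, bool]:
    """Map each node name to the value of its Possible column."""
    result: dict[str, bool] = {}
    in_rows = False
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("---"):
            in_rows = True  # data rows follow the dashed separator
            continue
        if not in_rows or not text or "displayed." in text:
            continue  # skip header, blank lines, and the trailing entry count
        fields = text.split()
        if len(fields) >= 3 and fields[2] in ("true", "false"):
            result[fields[0]] = fields[2] == "true"
    return result


SAMPLE = """\
cluster_A::> storage failover show

Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node_A_1       node_A_2       true     Connected to node_A_2
node_A_2       node_A_1       true     Connected to node_A_1
2 entries were displayed.
"""

status = takeover_possible(SAMPLE)
print(status)                # {'node_A_1': True, 'node_A_2': True}
print(all(status.values()))  # True
```

Splitting on whitespace rather than on fixed column positions keeps the sketch working even if the column padding differs from the example.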

  2. Check the aggregate status: storage aggregate show

    Example

    cluster_A::> storage aggregate show

    cluster Aggregates:
    Aggregate     Size Available Used% State    #Vols Nodes            RAID Status
    --------- -------- --------- ----- ------- ------ ---------------- ------------
    node_A_1_data01_mirrored
                4.15TB    3.40TB   18% online       3 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_1_root
               707.7GB   34.29GB   95% online       1 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_2_data01_mirrored
                4.15TB    4.12TB    1% online       2 node_A_2         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_2_data02_unmirrored
                2.18TB    2.18TB    0% online       1 node_A_2         raid_dp,
                                                                       normal
    node_A_2_root
               707.7GB   34.27GB   95% online       1 node_A_2         raid_dp,
                                                                       mirrored,
                                                                       normal
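
Because `storage aggregate show` wraps long aggregate names and the RAID Status column across several lines, reading the status in a script takes a little reassembly. The Python sketch below, with the hypothetical name `raid_status_by_aggregate`, assumes that every field is a single whitespace-free token and that the columns appear in the order shown above; it joins the wrapped RAID Status fragments back into strings such as `raid_dp, mirrored, normal`. It is an illustration, not a supported parser for ONTAP output.

```python
import re

# Assumed token shapes for the Size/Available and Used% columns.
SIZE = re.compile(r"^[\d.]+[KMGTP]?B$")
PCT = re.compile(r"^\d+%$")


def _values_offset(tokens):
    """Return 0 if the row starts at the Size column, 1 if an aggregate name
    precedes it on the same line, or None if this is not a values row."""
    for off in (0, 1):
        if (len(tokens) >= off + 7 and SIZE.match(tokens[off])
                and SIZE.match(tokens[off + 1]) and PCT.match(tokens[off + 2])):
            return off
    return None


def raid_status_by_aggregate(output):
    """Return {aggregate: (state, raid_status)} from `storage aggregate show` text."""
    lines = output.splitlines()
    sep = next((i for i, l in enumerate(lines) if l.strip().startswith("---")), None)
    if sep is None:
        raise ValueError("no dashed separator line found")
    rows = []
    for line in lines[sep + 1:]:
        if "::>" in line or "displayed." in line:
            break  # stop at the next prompt or the trailing entry count
        if line.strip():
            rows.append(line.split())
    aggrs, pending_name, current = {}, None, None
    for i, tokens in enumerate(rows):
        off = _values_offset(tokens)
        if off is not None:
            # Values row: Size, Available, Used%, State, #Vols, Nodes, RAID...
            name = tokens[0] if off == 1 else pending_name
            current = {"state": tokens[off + 3], "raid": tokens[off + 6:]}
            aggrs[name] = current
            pending_name = None
        elif (len(tokens) == 1 and i + 1 < len(rows)
              and _values_offset(rows[i + 1]) == 0):
            pending_name = tokens[0]  # long name wrapped onto its own line
        elif current is not None:
            current["raid"].extend(tokens)  # wrapped RAID Status fragment
    return {n: (a["state"], " ".join(a["raid"])) for n, a in aggrs.items()}


SAMPLE = """\
cluster_A::> storage aggregate show
Aggregate Size Available Used% State #Vols Nodes RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_2_data01_mirrored
4.15TB 4.12TB 1% online 2 node_A_2 raid_dp,
mirrored,
normal
node_A_2_data02_unmirrored
2.18TB 2.18TB 0% online 1 node_A_2 raid_dp,
normal
"""

for name, (state, raid) in raid_status_by_aggregate(SAMPLE).items():
    print(f"{name}: {state}, {raid}")
# node_A_2_data01_mirrored: online, raid_dp, mirrored, normal
# node_A_2_data02_unmirrored: online, raid_dp, normal
```

The same helper can be reused in steps 6, 9, and 10 to look for `degraded`, `resyncing`, or `normal` in the joined status strings.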

  3. Verify that all data SVMs and data volumes are online and serving data:

    vserver show -type data

    network interface show -fields is-home false

    volume show !vol0,!MDV*

    Example

    cluster_A::> vserver show -type data

    Admin Operational Root
    Vserver Type Subtype State State Volume Aggregate
    ----------- ------- ---------- ---------- ----------- ---------- ----------
    SVM1 data sync-source running SVM1_root node_A_1_data01_mirrored
    SVM2 data sync-source running SVM2_root node_A_2_data01_mirrored

    cluster_A::> network interface show -fields is-home false
    There are no entries matching your query.

    cluster_A::> volume show !vol0,!MDV*
    Vserver   Volume       Aggregate    State      Type       Size  Available Used%
    --------- ------------ ------------ ---------- ---- ---------- ---------- -----
    SVM1      SVM1_root    node_A_1_data01_mirrored
                                        online     RW         10GB     9.50GB    5%
    SVM1      SVM1_data_vol
                               node_A_1_data01_mirrored
                                        online     RW         10GB     9.49GB    5%
    SVM2      SVM2_root    node_A_2_data01_mirrored
                                        online     RW         10GB     9.49GB    5%
    SVM2      SVM2_data_vol
                               node_A_2_data02_unmirrored
                                        online     RW          1GB    972.6MB    5%
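
To check the `volume show` output in a script, note that ONTAP wraps long values onto continuation lines, so counting lines is unreliable. A simpler approach, sketched below in Python under the assumption that every field is a single whitespace-free token (no empty cells), is to read all tokens after the dashed separator and group them eight at a time in the column order shown in the example above. The function names are hypothetical.

```python
# Column order assumed from the `volume show` header in the example.
FIELDS = ("vserver", "volume", "aggregate", "state", "type", "size",
          "available", "used")


def parse_volume_show(output):
    """Return one dict per volume from `volume show` text."""
    lines = output.splitlines()
    sep = next((i for i, l in enumerate(lines) if l.strip().startswith("---")), None)
    if sep is None:
        raise ValueError("no dashed separator line found")
    tokens = []
    for line in lines[sep + 1:]:
        if "::>" in line or "displayed." in line:
            break  # next prompt or trailing "N entries were displayed."
        tokens.extend(line.split())  # wrapped values are just more tokens
    n = len(FIELDS)
    if len(tokens) % n:
        raise ValueError("unexpected token count: a field may be empty or contain spaces")
    return [dict(zip(FIELDS, tokens[i:i + n])) for i in range(0, len(tokens), n)]


def volumes_not_online(output):
    """List (vserver, volume) pairs whose State is anything other than online."""
    return [(v["vserver"], v["volume"]) for v in parse_volume_show(output)
            if v["state"] != "online"]


SAMPLE = """\
cluster_A::> volume show !vol0,!MDV*
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1      SVM1_root    node_A_1_data01_mirrored
                                    online     RW         10GB     9.50GB    5%
SVM2      SVM2_data_vol
                       node_A_2_data02_unmirrored
                                    online     RW          1GB    972.6MB    5%
"""

print(volumes_not_online(SAMPLE))  # []
```

Running the same check before the shelf is powered off and again afterward gives a quick before/after comparison for the "no loss of service" expectation.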


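  4. Identify a storage shelf in pool 1 of node node_A_2 to power off in order to simulate a sudden hardware failure: storage aggregate show -r -node node-name !*root

    The shelf that you select must contain drives that belong to a mirrored data aggregate.

    Example

    In the following example, shelf ID 31 is selected for failure. All of the pool 1 drives of the mirrored aggregate node_A_2_data01_mirrored (plex4) are on this shelf, as the shelf ID 31 in the disk names 1.31.3 through 1.31.7 shows.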

    cluster_A::> storage aggregate show -r -node node_A_2 !*root
    Owner Node: node_A_2
    Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirrored) (block checksums)
    Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
    RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
                                                                 Usable Physical
    Position Disk                        Pool Type     RPM     Size     Size Status
    -------- --------------------------- ---- ----- ------ -------- -------- ----------
    dparity  2.30.3                      0    BSAS    7200  827.7GB  828.0GB (normal)
    parity   2.30.4                      0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.6                      0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.8                      0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.5                      0    BSAS    7200  827.7GB  828.0GB (normal)

    Plex: /node_A_2_data01_mirrored/plex4 (online, normal, active, pool1)
    RAID Group /node_A_2_data01_mirrored/plex4/rg0 (normal, block checksums)
                                                                 Usable Physical
    Position Disk                        Pool Type     RPM     Size     Size Status
    -------- --------------------------- ---- ----- ------ -------- -------- ----------
    dparity  1.31.7                      1    BSAS    7200  827.7GB  828.0GB (normal)
    parity   1.31.6                      1    BSAS    7200  827.7GB  828.0GB (normal)
    data     1.31.3                      1    BSAS    7200  827.7GB  828.0GB (normal)
    data     1.31.4                      1    BSAS    7200  827.7GB  828.0GB (normal)
    data     1.31.5                      1    BSAS    7200  827.7GB  828.0GB (normal)

    Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
    Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
    RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
                                                                 Usable Physical
    Position Disk                        Pool Type     RPM     Size     Size Status
    -------- --------------------------- ---- ----- ------ -------- -------- ----------
    dparity  2.30.12                     0    BSAS    7200  827.7GB  828.0GB (normal)
    parity   2.30.22                     0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.21                     0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.20                     0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.14                     0    BSAS    7200  827.7GB  828.0GB (normal)
    15 entries were displayed.
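
Picking the shelf in step 4 can also be sketched in code. The Python sketch below assumes SAS disk names of the form stack.shelf.bay (for example, 1.31.7 is bay 7 of shelf 31 in stack 1) and uses the `Aggregate:` and `Plex:` lines to keep only pool 1 drives of mirrored aggregates. The function name is hypothetical, and the result is only a list of candidate shelves for you to confirm before powering anything off.

```python
import re
from collections import defaultdict

# Assumed SAS disk naming: <stack>.<shelf>.<bay>, e.g. 1.31.7
DISK_NAME = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def pool1_shelves_of_mirrored_aggregates(output):
    """Map (stack, shelf) -> set of mirrored aggregates with pool 1 drives on it,
    parsed from `storage aggregate show -r` text."""
    shelves = defaultdict(set)
    aggr, mirrored, in_pool1 = None, False, False
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("Aggregate:"):
            aggr = text.split()[1]
            # First parenthesized group, e.g. "(online, raid_dp, mirrored)"
            status = text[text.find("(") + 1:text.find(")")]
            mirrored = "mirror" in status
            in_pool1 = False
        elif text.startswith("Plex:"):
            in_pool1 = "pool1)" in text
        elif aggr and mirrored and in_pool1:
            tokens = text.split()
            # Drive rows start with: Position, Disk, Pool, ...
            if len(tokens) >= 3 and tokens[2] == "1":
                m = DISK_NAME.match(tokens[1])
                if m:
                    shelves[(int(m.group(1)), int(m.group(2)))].add(aggr)
    return dict(shelves)


SAMPLE = """\
Owner Node: node_A_2
Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirrored) (block checksums)
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
dparity 2.30.3 0 BSAS 7200 827.7GB 828.0GB (normal)
parity 2.30.4 0 BSAS 7200 827.7GB 828.0GB (normal)
Plex: /node_A_2_data01_mirrored/plex4 (online, normal, active, pool1)
RAID Group /node_A_2_data01_mirrored/plex4/rg0 (normal, block checksums)
dparity 1.31.7 1 BSAS 7200 827.7GB 828.0GB (normal)
parity 1.31.6 1 BSAS 7200 827.7GB 828.0GB (normal)
Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
dparity 2.30.12 0 BSAS 7200 827.7GB 828.0GB (normal)
"""

print(pool1_shelves_of_mirrored_aggregates(SAMPLE))
# {(1, 31): {'node_A_2_data01_mirrored'}}
```

The unmirrored aggregate is skipped on purpose: powering off a shelf that holds its only plex would take that aggregate offline, which is not what this test is meant to exercise.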

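  5. Physically power off the shelf that you selected.

  6. Check the aggregate status again:

    storage aggregate show

    storage aggregate show -r -node node_A_2 !*root

    Example

    Aggregates that have drives on the powered-off shelf should show a degraded RAID status (mirror degraded), and the drives in the affected plex should show a failed status, as in the following example: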

    cluster_A::> storage aggregate show
    Aggregate     Size Available Used% State    #Vols Nodes            RAID Status
    --------- -------- --------- ----- ------- ------ ---------------- ------------
    node_A_1_data01_mirrored
                4.15TB    3.40TB   18% online       3 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_1_root
               707.7GB   34.29GB   95% online       1 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_2_data01_mirrored
                4.15TB    4.12TB    1% online       2 node_A_2         raid_dp,
                                                                       mirror
                                                                       degraded
    node_A_2_data02_unmirrored
                2.18TB    2.18TB    0% online       1 node_A_2         raid_dp,
                                                                       normal
    node_A_2_root
               707.7GB   34.27GB   95% online       1 node_A_2         raid_dp,
                                                                       mirror
                                                                       degraded

    cluster_A::> storage aggregate show -r -node node_A_2 !*root
    Owner Node: node_A_2
    Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirror degraded) (block checksums)
    Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
    RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
                                                                 Usable Physical
    Position Disk                        Pool Type     RPM     Size     Size Status
    -------- --------------------------- ---- ----- ------ -------- -------- ----------
    dparity  2.30.3                      0    BSAS    7200  827.7GB  828.0GB (normal)
    parity   2.30.4                      0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.6                      0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.8                      0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.5                      0    BSAS    7200  827.7GB  828.0GB (normal)

    Plex: /node_A_2_data01_mirrored/plex4 (offline, failed, inactive, pool1)
    RAID Group /node_A_2_data01_mirrored/plex4/rg0 (partial, none checksums)
                                                                 Usable Physical
    Position Disk                        Pool Type     RPM     Size     Size Status
    -------- --------------------------- ---- ----- ------ -------- -------- ----------
    dparity  FAILED                      -    -          -  827.7GB        - (failed)
    parity   FAILED                      -    -          -  827.7GB        - (failed)
    data     FAILED                      -    -          -  827.7GB        - (failed)
    data     FAILED                      -    -          -  827.7GB        - (failed)
    data     FAILED                      -    -          -  827.7GB        - (failed)

    Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
    Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
    RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
                                                                 Usable Physical
    Position Disk                        Pool Type     RPM     Size     Size Status
    -------- --------------------------- ---- ----- ------ -------- -------- ----------
    dparity  2.30.12                     0    BSAS    7200  827.7GB  828.0GB (normal)
    parity   2.30.22                     0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.21                     0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.20                     0    BSAS    7200  827.7GB  828.0GB (normal)
    data     2.30.14                     0    BSAS    7200  827.7GB  828.0GB (normal)
    15 entries were displayed.
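
After the shelf is powered off, what matters is that each mirrored aggregate keeps at least one online plex; that surviving plex is why no failover or loss of service is expected. As a hedged sketch (hypothetical function names, and an assumed `Plex: /<aggregate>/<plex> (<state>, ...)` line format taken from the example above), the Python code below extracts the state of every plex and flags aggregates that have lost a plex or, worse, have no online plex at all.

```python
import re

# Assumed line format: "Plex: /<aggregate>/<plex> (<state>, <status>, ...)"
PLEX = re.compile(r"^Plex:\s+/(?P<aggr>[^/]+)/(?P<plex>\S+)\s+\((?P<status>[^)]*)\)")


def plex_states(output):
    """Return {aggregate: {plex: first status field}} from `storage aggregate show -r` text."""
    states = {}
    for line in output.splitlines():
        m = PLEX.match(line.strip())
        if m:
            first = m.group("status").split(",")[0].strip()
            states.setdefault(m.group("aggr"), {})[m.group("plex")] = first
    return states


def aggregates_missing_a_plex(output):
    """Aggregates where at least one plex is not online (mirror degraded)."""
    return [a for a, p in plex_states(output).items()
            if any(s != "online" for s in p.values())]


def aggregates_without_online_plex(output):
    """Aggregates with no online plex at all; this list should stay empty."""
    return [a for a, p in plex_states(output).items()
            if "online" not in p.values()]


SAMPLE = """\
Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
Plex: /node_A_2_data01_mirrored/plex4 (offline, failed, inactive, pool1)
Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
"""

print(aggregates_missing_a_plex(SAMPLE))       # ['node_A_2_data01_mirrored']
print(aggregates_without_online_plex(SAMPLE))  # []
```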

  7. Verify that data is being served and that all volumes are still online:

    vserver show -type data

    network interface show -fields is-home false

    volume show !vol0,!MDV*

    Example

    cluster_A::> vserver show -type data

    Admin Operational Root
    Vserver Type Subtype State State Volume Aggregate
    ----------- ------- ---------- ---------- ----------- ---------- ----------
    SVM1 data sync-source running SVM1_root node_A_1_data01_mirrored
    SVM2 data sync-source running SVM2_root node_A_2_data01_mirrored

    cluster_A::> network interface show -fields is-home false
    There are no entries matching your query.

    cluster_A::> volume show !vol0,!MDV*
    Vserver   Volume       Aggregate    State      Type       Size  Available Used%
    --------- ------------ ------------ ---------- ---- ---------- ---------- -----
    SVM1      SVM1_root    node_A_1_data01_mirrored
                                        online     RW         10GB     9.50GB    5%
    SVM1      SVM1_data_vol
                               node_A_1_data01_mirrored
                                        online     RW         10GB     9.49GB    5%
    SVM2      SVM2_root    node_A_2_data01_mirrored
                                        online     RW         10GB     9.49GB    5%
    SVM2      SVM2_data_vol
                               node_A_2_data02_unmirrored
                                        online     RW          1GB    972.6MB    5%

  8. Physically power on the shelf.

    Resynchronization starts automatically.

  9. Verify that resynchronization has started: storage aggregate show

    Example

    The affected aggregates should show a resyncing RAID status, as in the following example:

    cluster_A::> storage aggregate show
    cluster Aggregates:
    Aggregate     Size Available Used% State    #Vols Nodes            RAID Status
    --------- -------- --------- ----- ------- ------ ---------------- ------------
    node_A_1_data01_mirrored
                4.15TB    3.40TB   18% online       3 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_1_root
               707.7GB   34.29GB   95% online       1 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_2_data01_mirrored
                4.15TB    4.12TB    1% online       2 node_A_2         raid_dp,
                                                                       resyncing
    node_A_2_data02_unmirrored
                2.18TB    2.18TB    0% online       1 node_A_2         raid_dp,
                                                                       normal
    node_A_2_root
               707.7GB   34.27GB   95% online       1 node_A_2         raid_dp,
                                                                       resyncing

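  10. Monitor the aggregates to confirm that resynchronization is complete: storage aggregate show

    Example

    Repeat the command until every affected aggregate shows a normal RAID status. In the following example, node_A_2_data01_mirrored has finished resynchronizing and shows normal, while node_A_2_root is still resyncing: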

    cluster_A::> storage aggregate show
    cluster Aggregates:
    Aggregate     Size Available Used% State    #Vols Nodes            RAID Status
    --------- -------- --------- ----- ------- ------ ---------------- ------------
    node_A_1_data01_mirrored
                4.15TB    3.40TB   18% online       3 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_1_root
               707.7GB   34.29GB   95% online       1 node_A_1         raid_dp,
                                                                       mirrored,
                                                                       normal
    node_A_2_data01_mirrored
                4.15TB    4.12TB    1% online       2 node_A_2         raid_dp,
                                                                       normal
    node_A_2_data02_unmirrored
                2.18TB    2.18TB    0% online       1 node_A_2         raid_dp,
                                                                       normal
    node_A_2_root
               707.7GB   34.27GB   95% online       1 node_A_2         raid_dp,
                                                                       resyncing
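
Because resynchronization can take some time, step 10 amounts to polling `storage aggregate show` until no aggregate is still resyncing or degraded. The Python sketch below shows that loop. It assumes a hypothetical callable `get_raid_status()` that runs the command (for example over SSH) and returns a mapping from aggregate name to its RAID Status text; the polling interval and timeout defaults are arbitrary placeholders, not NetApp recommendations.

```python
import time


def wait_for_resync(get_raid_status, interval_s=60.0, timeout_s=4 * 3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until no aggregate's RAID status mentions resyncing or degraded.

    get_raid_status: hypothetical callable returning {aggregate: raid_status_text}.
    Returns the final statuses, or raises TimeoutError once timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        statuses = get_raid_status()
        pending = sorted(a for a, s in statuses.items()
                         if "resyncing" in s or "degraded" in s)
        if not pending:
            return statuses
        if clock() >= deadline:
            raise TimeoutError(f"still not normal: {', '.join(pending)}")
        sleep(interval_s)


# Usage with canned snapshots instead of a live cluster:
snapshots = iter([
    {"node_A_2_data01_mirrored": "raid_dp, resyncing",
     "node_A_2_root": "raid_dp, resyncing"},
    {"node_A_2_data01_mirrored": "raid_dp, normal",
     "node_A_2_root": "raid_dp, resyncing"},
    {"node_A_2_data01_mirrored": "raid_dp, mirrored, normal",
     "node_A_2_root": "raid_dp, mirrored, normal"},
])
final = wait_for_resync(lambda: next(snapshots), sleep=lambda s: None)
print(final["node_A_2_root"])  # raid_dp, mirrored, normal
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time; against a real cluster you would leave the defaults in place.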