Running system-level diagnostics

You should run comprehensive or focused diagnostic tests for specific components and subsystems whenever you replace the controller.

All commands in the diagnostic procedures are issued from the controller where the component is being replaced.

  1. If the controller to be serviced is not at the LOADER prompt, reboot the controller: halt

    After you issue the command, you should wait until the system stops at the LOADER prompt.

  2. Run diagnostics on the caching module: sldiag device run -dev fcache
  3. Display and note the available devices on the controller module: sldiag device show -dev mb
    The controller module devices and ports displayed can be any one or more of the following:
    • bootmedia is the system booting device.
    • cna is a Converged Network Adapter or interface not connected to a network or storage device.
    • fcal is a Fibre Channel-Arbitrated Loop device not connected to a Fibre Channel network.
    • env is motherboard environmentals.
    • mem is system memory.
    • nic is a network interface card.
    • nvram is nonvolatile RAM.
    • nvmem is a hybrid of NVRAM and system memory.
    • sas is a Serial Attached SCSI device not connected to a disk shelf.
  4. Run diagnostics for each component from the maintenance menu. For example: sldiag device status -dev nvmem -long -state failed
    Follow the procedure that matches what you want to test.
    Individual components:
    1. Clear the status logs: sldiag device clearstatus
    2. Display the available tests for the selected devices: sldiag device show -dev dev_name

      dev_name can be any one of the ports and devices identified in the preceding step.

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Run the selected tests: sldiag device run -dev dev_name

      After the test is complete, the following message is displayed:
      *> <SLDIAG:_ALL_TESTS_COMPLETED>

    5. Verify that no tests failed: sldiag device status -dev dev_name -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

    Multiple components at the same time:
    1. Review the enabled and disabled devices in the output from the preceding procedure and determine which ones you want to run concurrently.
    2. List the individual tests for the device: sldiag device show -dev dev_name
    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Verify that the tests were modified: sldiag device show
    5. Repeat these substeps for each device that you want to run concurrently.
    6. Run diagnostics on all of the devices: sldiag device run
      Attention: Do not add to or modify your entries after you start running diagnostics.

      After the test is complete, the following message is displayed:
      *> <SLDIAG:_ALL_TESTS_COMPLETED>

    7. Verify that there are no hardware problems on the controller: sldiag device status -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.
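When several devices are tested together, the per-device substeps repeat before a single run and a single status check. The Python sketch below lays out that ordering using only the commands quoted in this procedure. `multi_device_plan` is a hypothetical helper that builds strings and executes nothing, and it assumes you apply `-selection only` to every device, which the procedure treats as optional.

```python
def multi_device_plan(dev_names):
    """Return the sldiag commands for testing several devices concurrently.

    Hypothetical helper: it builds strings only and runs nothing.
    """
    commands = []
    for dev in dev_names:
        # Substeps 2-4, repeated for each device (substep 5).
        commands.append(f"sldiag device show -dev {dev}")
        commands.append(f"sldiag device modify -dev {dev} -selection only")
        commands.append("sldiag device show")  # verify the modification
    # Substep 6: one run covers all selected devices. Do not modify
    # the selections after this command has started.
    commands.append("sldiag device run")
    # Substep 7: list failures across the controller.
    commands.append("sldiag device status -long -state failed")
    return commands
```

Calling `multi_device_plan(["mem", "nic"])` yields three commands per device followed by the shared run and status commands, which makes it easy to see that all selection changes come before `sldiag device run`.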

  5. Proceed based on the result of the preceding step.
    If the system-level diagnostics tests were completed without any failures:
    1. Clear the status logs: sldiag device clearstatus
    2. Verify that the log was cleared: sldiag device status

      The following default response is displayed:

      SLDIAG: No log messages are present.

    3. Exit Maintenance mode by typing: halt

      The system displays the LOADER prompt.

    4. Type boot_ontap to return the controller to normal operation.

    If the system-level diagnostics tests resulted in some test failures, determine the cause of the problem:
    1. Exit Maintenance mode: halt
    2. Perform a clean shutdown, and then disconnect the power supplies.
    3. Verify that you have observed all of the considerations identified for running system-level diagnostics, that cables are securely connected, and that hardware components are properly installed in the storage system.
    4. Reconnect the power supplies, and then power on the storage system.
    5. Rerun the system-level diagnostics test.
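The two outcomes in the final step can be summarized as a small decision function. The Python sketch below is illustrative only: `follow_up` and `log_cleared` are hypothetical names, the function takes the pass/fail result as an input rather than parsing sldiag output, and each step is tagged as either a console command or a manual action.

```python
# Response quoted in the procedure for an empty status log.
CLEARED_MESSAGE = "SLDIAG: No log messages are present."


def log_cleared(status_output):
    """Return True if captured 'sldiag device status' text shows an empty log."""
    return CLEARED_MESSAGE in status_output


def follow_up(any_failures):
    """Return the follow-up steps as (kind, text) pairs.

    kind is "command" for console input and "manual" for a physical action.
    """
    if not any_failures:
        return [
            ("command", "sldiag device clearstatus"),
            ("command", "sldiag device status"),
            ("command", "halt"),        # exit Maintenance mode to LOADER
            ("command", "boot_ontap"),  # return to normal operation
        ]
    return [
        ("command", "halt"),
        ("manual", "Perform a clean shutdown, then disconnect the power supplies."),
        ("manual", "Check considerations, cabling, and component seating."),
        ("manual", "Reconnect the power supplies and power on the storage system."),
        ("manual", "Rerun the system-level diagnostics test."),
    ]
```

Keeping commands and manual actions as separate kinds reflects the procedure itself: the failure path is mostly physical checks, so only the `halt` step could be typed at the console.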