Memory problems

See this section to resolve issues related to memory.

Multiple memory modules in a channel identified as failing

Note
Each time you install or remove a memory module, you must disconnect the server from the power source; then, wait 10 seconds before restarting the server.

Complete the following procedure to solve the problem.

  1. Reseat the memory modules; then, restart the server.
  2. Remove the highest-numbered memory module of those that are identified and replace it with an identical known good memory module; then, restart the server. Repeat as necessary. If the failures continue after all identified memory modules are replaced, go to step 4.
  3. Return the removed memory modules, one at a time, to their original connectors, restarting the server after each memory module, until a memory module fails. Replace each failing memory module with an identical known good memory module, restarting the server after each memory module replacement. Repeat step 3 until you have tested all removed memory modules.
  4. Replace the highest-numbered memory module of those identified; then, restart the server. Repeat as necessary.
  5. Reverse the memory modules between the channels (of the same processor), and then restart the server. If the problem is related to a memory module, replace the failing memory module.
  6. (Trained technician only) Install the failing memory module into a memory module connector for processor 2 (if installed) to verify that the problem is not the processor or the memory module connector.
  7. (Trained technician only) Replace the system board.

Displayed system memory is less than installed physical memory

Complete the following procedure to solve the problem.

Note
Each time you install or remove a memory module, you must disconnect the server from the power source; then, wait 10 seconds before restarting the server.
  1. Make sure that:
    • No error LEDs are lit on the operator information panel.

    • No memory module error LEDs are lit on the system board.

    • Memory mirroring (mirrored channel mode) does not account for the discrepancy.

    • The memory modules are seated correctly.

    • You have installed the correct type of memory module (see PMEM rules for requirements).

    • After changing or replacing a memory module, memory configuration is updated accordingly in the Setup Utility.

    • All banks of memory are enabled. The server might have automatically disabled a memory bank when it detected a problem, or a memory bank might have been manually disabled.

    • There is no memory mismatch when the server is at the minimum memory configuration.

    • When PMEMs are installed:

      1. If the memory is set to App Direct Mode, all saved data has been backed up and all created namespaces have been deleted before any PMEM is replaced or added.

      2. Refer to PMEM rules and see if the displayed memory fits the mode description.

      3. If the PMEMs were recently set to Memory Mode, switch them back to App Direct Mode and check whether any namespace has not been deleted.

      4. Go to the Setup Utility, select System Configuration and Boot Management > Intel Optane PMEMs > Security, and make sure security of all the PMEM units is disabled.

  2. Reseat the memory modules, and then restart the server.

  3. Check the POST error log:

    • If a memory module was disabled by a systems-management interrupt (SMI), replace the memory module.

    • If a memory module was disabled by the user or by POST, reseat the memory module; then, run the Setup Utility and enable the memory module.

  4. Run memory diagnostics. When you start the server and press the key specified in the on-screen instructions, the LXPM interface is displayed by default. (For more information, see the “Startup” section in the LXPM documentation compatible with your server at Lenovo XClarity Provisioning Manager portal page.) You can perform memory diagnostics with this interface. From the Diagnostic page, go to Run Diagnostic > Memory test or PMEM test.

    Note
    When PMEMs are installed, run diagnostics based on the mode that is currently set:
    • App Direct Mode:

      • Run Memory Test for DRAM memory modules.

      • Run PMEM Test for PMEMs.

    • Memory Mode:

      Run both Memory Test and PMEM Test for PMEMs.

  5. Reverse the modules between the channels (of the same processor), and then restart the server. If the problem is related to a memory module, replace the failing memory module.
    Note
    When PMEMs are installed, use this method only in Memory Mode.
  6. Re-enable all memory modules using the Setup Utility, and then restart the server.

  7. (Trained technician only) Install the failing memory module into a memory module connector for processor 2 (if installed) to verify that the problem is not the processor or the memory module connector.

  8. (Trained technician only) Replace the system board.
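
On a Linux host, the discrepancy itself can be quantified before working through the checks above. The following is a minimal sketch, not a Lenovo tool: it assumes the operating system exposes `/proc/meminfo`, and the function names and the 5% tolerance are hypothetical choices made for illustration. The operating system normally reports somewhat less than the installed capacity because firmware and the kernel reserve some memory, so only a shortfall well beyond that is worth investigating.

```python
import re

KIB_PER_GIB = 1024 * 1024


def parse_mem_total_kib(meminfo_text: str) -> int:
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def missing_gib(installed_gib: float, mem_total_kib: int) -> float:
    """Installed capacity minus OS-visible capacity, in GiB."""
    return installed_gib - mem_total_kib / KIB_PER_GIB


def is_short(installed_gib: float, mem_total_kib: int, tolerance: float = 0.05) -> bool:
    """True if more than `tolerance` (an assumed fraction) of installed memory is missing."""
    return missing_gib(installed_gib, mem_total_kib) > installed_gib * tolerance


# Hypothetical sample; on a real host, read the text from /proc/meminfo instead.
sample = "MemTotal:       65775124 kB\nMemFree:        1234567 kB\n"
visible_kib = parse_mem_total_kib(sample)
print(f"missing: {missing_gib(128, visible_kib):.1f} GiB, short: {is_short(128, visible_kib)}")
```

With 128 GiB installed, the sample above reports roughly half the capacity as missing, which is the kind of shortfall that memory mirroring or a disabled memory bank could explain.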

Invalid memory population detected

If this warning message appears, complete the following steps:
  • Invalid memory population (unsupported DIMM population) detected. Please verify memory configuration is valid.
  1. See Memory module installation rules and order to make sure the present memory module population sequence is supported.
  2. If the present sequence is indeed supported, see if any of the modules is displayed as “disabled” in Setup Utility.
  3. Reseat the module that is displayed as “disabled,” and reboot the system.
  4. If the problem persists, replace the memory module.

Attempt to change to another PMEM mode fails

If the PMEM mode remains unchanged after you change it and the system restarts successfully, check whether the DRAM DIMM and PMEM capacities meet the requirements of the new mode (see PMEM rules).

Extra namespace appears in an interleaved region

If two created namespaces exist in one interleaved region, VMware ESXi ignores them and creates an extra namespace during system boot. Delete the created namespaces in either the Setup Utility or the operating system before booting ESXi for the first time.

Migrated PMEMs are not supported

If this warning message appears, complete the following steps:
  • Intel Optane PMEM interleave set (DIMM X) is migrated from another system (Platform ID: 0x00), these migrated PMEMs are not supported nor warranted in this system.
  1. Move the modules back to the original system with the exact same configuration as the previous one.
  2. Back up stored data in PMEM namespaces.
  3. Disable PMEM security with one of the following options:
    • LXPM

      Go to UEFI Setup > System Settings > Intel Optane PMEMs > Security > Press to Disable Security, and input passphrase to disable security.

    • Setup Utility

      Go to System Configuration and Boot Management > System Settings > Intel Optane PMEMs > Security > Press to Disable Security, and input passphrase to disable security.

  4. Delete the namespaces with the command for the installed operating system:
    • Linux command:

      ndctl destroy-namespace all -f
    • Windows PowerShell command:

      Get-PmemDisk | Remove-PmemDisk
  5. Clear Platform Configuration Data (PCD) and Namespace Label Storage Area (LSA) with the following ipmctl command (for both Linux and Windows).
    ipmctl delete -pcd
    Note
    See the following links to learn how to download and use ipmctl in different operating systems:
  6. Reboot the system, and press the key according to the on-screen instructions to enter Setup Utility. (For more information, see the “Startup” section in the LXPM documentation compatible with your server at Lenovo XClarity Provisioning Manager portal page.)
  7. Power off the system.
  8. Remove the modules to be reused for a new system or configuration.
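
Steps 4 and 5 can be collected into an ordered command list per operating system. This is a hedged sketch: the commands are the ones quoted in those steps, while `cleanup_commands`, `run_cleanup`, the `dry_run` flag, and the use of `powershell -Command` to launch the Windows pipeline are assumptions made for illustration. Because the commands destroy data, the sketch only prints them unless explicitly told to execute.

```python
import subprocess


def cleanup_commands(os_name: str) -> list[list[str]]:
    """Return the namespace-deletion and PCD-clearing commands, in order."""
    if os_name == "linux":
        delete_namespaces = ["ndctl", "destroy-namespace", "all", "-f"]
    elif os_name == "windows":
        # Assumption: the PowerShell pipeline is launched through powershell.exe.
        delete_namespaces = ["powershell", "-Command", "Get-PmemDisk | Remove-PmemDisk"]
    else:
        raise ValueError(f"unsupported operating system: {os_name}")
    # The same ipmctl command applies to both Linux and Windows.
    clear_pcd = ["ipmctl", "delete", "-pcd"]
    return [delete_namespaces, clear_pcd]


def run_cleanup(os_name: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cleanup_commands(os_name):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


run_cleanup("linux")  # dry run: prints the two commands without executing them
```

Keeping the dry run as the default means the data backup in step 2 and the security change in step 3 can be confirmed before anything irreversible runs.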

PMEMs installed in wrong slots after system board replacement

If this warning message appears, complete the following steps:
  • DIMM X of Intel Optane PMEM persistent interleave set should be moved to DIMM Y.
  1. Record each PMEM slot-change instruction from the XCC events.
  2. Power off the system, and remove the PMEMs that are mentioned in the warning messages. It is suggested to label these PMEMs to avoid confusion.
  3. Install the PMEMs in the correct slot number indicated in the warning messages. Remove the labels to avoid blocking airflow and cooling.
  4. Complete the replacement and power on the system. Make sure that no similar warning messages remain in XCC.
Note
To avoid data loss, do not perform any provisioning on the PMEMs while these messages are still present in XCC events.
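
When several PMEMs are affected, the slot changes recorded in step 1 can be extracted from the event text rather than copied by hand. The sketch below is illustrative only: it assumes the XCC events have been exported as plain text and that X and Y in the warning template are decimal slot numbers. `parse_moves` is a hypothetical helper, not part of any Lenovo tool.

```python
import re

# Pattern follows the warning template quoted in this section.
MOVE_PATTERN = re.compile(
    r"DIMM (\d+) of Intel Optane PMEM persistent interleave set "
    r"should be moved to DIMM (\d+)"
)


def parse_moves(event_text: str) -> list[tuple[int, int]]:
    """Return (current_slot, correct_slot) pairs found in the event text."""
    return [(int(src), int(dst)) for src, dst in MOVE_PATTERN.findall(event_text)]


# Hypothetical event export.
events = """\
DIMM 3 of Intel Optane PMEM persistent interleave set should be moved to DIMM 5.
DIMM 5 of Intel Optane PMEM persistent interleave set should be moved to DIMM 3.
"""
for src, dst in parse_moves(events):
    print(f"Move the PMEM in slot {src} to slot {dst}")
```

The resulting list can double as the labeling plan suggested in step 2.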

After PMEMs are reconfigured, error messages and LEDs persist to indicate PMEMs are installed in wrong slots

AC cycle the system (disconnect and reconnect AC power) or restart XCC to solve this problem.

Cannot create goal successfully when installing PMEMs to the system for the first time

If one of the following messages appears:
  • ERROR: Cannot retrieve memory resources info
  • ERROR: One or more PMEM modules do not have PCD data. A platform reboot is recommended to restore valid PCD data.
Complete the following steps to solve the problem.
  1. If the PMEMs have been installed in another system with stored data, perform the following steps to erase the data.
    1. Based on the original population order, install the PMEMs to the original system where they were installed previously, and back up the data from the PMEMs to other storage devices.
    2. Disable PMEM security with one of the following options:
      • LXPM

        Go to UEFI Setup > System Settings > Intel Optane PMEMs > Security > Press to Disable Security, and input passphrase to disable security.

      • Setup Utility

        Go to System Configuration and Boot Management > System Settings > Intel Optane PMEMs > Security > Press to Disable Security, and input passphrase to disable security.

    3. Delete the namespaces with the command for the installed operating system:
      • Linux command:

        ndctl destroy-namespace all -f
      • Windows PowerShell command:

        Get-PmemDisk | Remove-PmemDisk
    4. Clear Platform Configuration Data (PCD) and Namespace Label Storage Area (LSA) with the following ipmctl command (for both Linux and Windows).
      ipmctl delete -pcd
      Note
      See the following links to learn how to download and use ipmctl in different operating systems:
  2. Install the PMEMs back to the target system, and upgrade system firmware to the latest version without entering Setup Utility.
  3. If the problem persists, overwrite PMEMs with the following ndctl command.

    ndctl sanitize-dimm --overwrite all
  4. Monitor the overwrite status with the following command.

    watch -n 1 "ipmctl show -d OverwriteStatus -dimm"
  5. When OverwriteStatus=Completed is shown for every PMEM, reboot the system and check whether the problem persists.
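
The monitoring in steps 4 and 5 can also be scripted instead of watched by eye. The following is a minimal sketch under stated assumptions: it assumes the ipmctl output contains one `OverwriteStatus=<value>` entry per PMEM, and the helper names, polling interval, and poll limit are hypothetical. Only the `ipmctl show -d OverwriteStatus -dimm` command itself comes from the procedure above.

```python
import re
import subprocess
import time
from typing import Callable


def fetch_status() -> str:
    """Run the status command from step 4 and return its output."""
    result = subprocess.run(
        ["ipmctl", "show", "-d", "OverwriteStatus", "-dimm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def overwrite_done(output: str) -> bool:
    """True only if at least one status is reported and every one is Completed."""
    statuses = [s.strip() for s in re.findall(r"OverwriteStatus=([^\r\n]*)", output)]
    return bool(statuses) and all(s == "Completed" for s in statuses)


def wait_for_overwrite(fetch: Callable[[], str] = fetch_status,
                       interval_s: float = 1.0, max_polls: int = 3600) -> bool:
    """Poll until every PMEM reports Completed; False if max_polls is exhausted."""
    for _ in range(max_polls):
        if overwrite_done(fetch()):
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_for_overwrite()` on the server polls once per second, like the `watch` command. Passing a stub as `fetch` lets the parsing logic be checked without touching real hardware.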