SXM5 GPU problems

Use this information to resolve problems with the SXM5 GPUs and the SXM5 GPU board.

Note
Make sure to update the GPU driver, which includes the nvidia-smi utility required for GPU problem determination. The latest driver can be found on the Drivers and Software download website for ThinkSystem SR675 V3.

Health check for SXM5 GPUs

Run the nvidia-smi utility. On a healthy system, the nvidia-smi summary indicates 4 GPUs online.
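
The check above can be scripted. The following is a minimal sketch, not an official Lenovo or NVIDIA tool: it assumes nvidia-smi is on the PATH and uses its `--query-gpu` and `--format=csv,noheader` options to list the GPUs the driver can see. The function names (`parse_gpu_list`, `is_healthy`, `run_health_check`) and the `EXPECTED_GPUS` constant are hypothetical names chosen for this illustration; the value 4 comes from the health check described above.

```python
import shutil
import subprocess

# Assumption: a healthy system reports 4 GPUs online, as stated in the health check.
EXPECTED_GPUS = 4


def parse_gpu_list(csv_text):
    """Parse 'index, name' rows printed by nvidia-smi into (index, name) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        index, sep, name = line.strip().partition(",")
        index = index.strip()
        # Skip blank lines and any message that is not an 'index, name' row.
        if not sep or not index.isdigit():
            continue
        gpus.append((int(index), name.strip()))
    return gpus


def is_healthy(gpus, expected=EXPECTED_GPUS):
    """Return True when the number of detected GPUs matches the expected count."""
    return len(gpus) == expected


def run_health_check(expected=EXPECTED_GPUS):
    """Query nvidia-smi and report how many GPUs are online."""
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; install or update the GPU driver first.")
        return False
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # Parse whatever was printed; non-row output (such as error text) is ignored.
    gpus = parse_gpu_list(result.stdout)
    print(f"{len(gpus)} of {expected} GPUs online")
    for index, name in gpus:
        print(f"  GPU {index}: {name}")
    return result.returncode == 0 and is_healthy(gpus, expected)


if __name__ == "__main__":
    run_health_check()
```

If the script reports fewer than 4 GPUs, or nvidia-smi itself fails, treat that as a detection problem and continue with the troubleshooting steps in this topic.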

System fails to detect the SXM5 GPU board or a specific SXM5 GPU

Go through the following steps to solve the problem.

  1. Power cycle the system.
  2. Check power input related events in XCC.
  3. Check the system temperature.
  4. Reboot the system, and run the health check (see Health check for SXM5 GPUs).
  5. If the problem persists, complete the following steps:
    1. Collect XCC service data (see Collecting service data).
    2. Contact Lenovo Service.