GPU problems
Use this information to resolve problems that are related to GPUs in the compute tray.
Use one of the following commands to check the GPU health status. Before you begin, update the GPU driver, which includes the utilities required by these commands. The latest driver is available on the Drivers and Software download website for Lenovo NVIDIA GB300 NVL72.
For more information about System Management Interface (SMI), see NVIDIA System Management Interface.
nvidia-smi
Run the nvidia-smi command to verify that all four GPUs in the compute tray are online.
Figure 1. nvidia-smi
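The GPU count check above can also be scripted. The following is a minimal Python sketch, not a Lenovo or NVIDIA tool: it assumes nvidia-smi is on the PATH and uses its --query-gpu and --format=csv,noheader options. The function names and the EXPECTED_GPUS constant are hypothetical and introduced here only for illustration.

```python
import subprocess

# Assumption taken from the text above: one compute tray exposes four GPUs.
EXPECTED_GPUS = 4


def parse_gpu_list(csv_text):
    """Parse 'index, name' CSV rows into a list of (index, name) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, name = line.split(",", 1)
        gpus.append((int(index), name.strip()))
    return gpus


def list_online_gpus():
    """Ask nvidia-smi for the GPUs that the driver currently enumerates."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_list(result.stdout)


def all_gpus_online(gpus, expected=EXPECTED_GPUS):
    """Return True when the number of enumerated GPUs matches the expected count."""
    return len(gpus) == expected
```

On a compute tray, calling all_gpus_online(list_online_gpus()) would return False if any GPU is missing from the driver's enumeration.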
nvidia-smi topo -p2p n
Run the nvidia-smi topo -p2p n command to display the internal connection status between GPUs within a single compute tray.
Note: An Unknown status for any GPU link indicates a potential hardware issue with the GPU, NVLink switch tray, or cable cartridge.
Figure 2. nvidia-smi topo -p2p n
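To spot Unknown links without reading the matrix by eye, the command output can be scanned programmatically. The sketch below is illustrative only. It assumes the output is a whitespace-separated matrix with a header row of GPU names, one row per GPU, and the cell value U for an Unknown status; verify this layout against the output of your driver version. The function name find_unknown_links is hypothetical.

```python
def find_unknown_links(topo_text):
    """Return (row_gpu, column_gpu) pairs whose status cell is 'U' (Unknown).

    Assumed layout: a header row such as 'GPU0 GPU1 GPU2 GPU3', followed by
    rows such as 'GPU0 X OK OK OK'. Legend lines are ignored.
    """
    headers = []
    unknown = []
    for line in topo_text.splitlines():
        cells = line.split()
        if not cells:
            continue
        if not headers:
            # The first line made up only of GPU names is treated as the header.
            if all(cell.startswith("GPU") for cell in cells):
                headers = cells
            continue
        # A matrix row has a GPU label followed by one status per header column.
        if cells[0].startswith("GPU") and len(cells) == len(headers) + 1:
            for column, status in zip(headers, cells[1:]):
                if status == "U":
                    unknown.append((cells[0], column))
    return unknown
```

An empty list means no link reported Unknown under these assumptions. Any returned pair identifies the two GPUs whose link should be checked against the GPU, NVLink switch tray, and cable cartridge.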
nvidia-smi -q --id=1 -f <output file name>
Run the nvidia-smi -q --id=1 -f <output file name> command to export GPU inventory information.
Replace <output file name> with the path of the file in which to store the output. For example: nvidia-smi -q --id=1 -f /tmp/queryoam1.txt.
Figure 3. nvidia-smi -q --id=1 -f <output file name>
==============NVSMI LOG==============
Timestamp : Mon Mar 30 02:14:58 2026
Driver Version : 580.105.08
CUDA Version : 13.0
Attached GPUs : 4
GPU 00000009:06:00.0
Product Name : NVIDIA GB300
Product Brand : NVIDIA
Product Architecture : Blackwell
Display Mode : Requested functionality has been deprecated
Display Attached : No
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : ATS
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1652725032738
GPU UUID : GPU-29255b40-4ad2-6e15-a7e2-634503314135
GPU PDI : 0xca89506c512681b3
Minor Number : 1
VBIOS Version : 97.10.4A.00.1F
MultiGPU Board : No
Board ID : 0x90600
Board Part Number : 900-2G548-0081-000
GPU Part Number : 31C2-893-A1
FRU Part Number : N/A
Platform Info
Chassis Serial Number : 1822725187334
Slot Number : 26
Tray Index : 16
Host ID : 1
Peer Type : Switch Connected
Module Id : 1
GPU Fabric GUID : 0xca89506c512681b3
Inforom Version
Image Version : G548.0301.00.03
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2026/03/29 08:57:08.426
Latest Duration : 56215 us
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : Enabled
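Once the inventory has been exported to a file, the fields most often needed for a service case can be pulled out with a short script. The following Python sketch assumes the file uses the "Field : Value" line format shown in the log above. The function names and the default field list are hypothetical choices made for illustration.

```python
def parse_query_fields(text, wanted):
    """Return {field: value} for the first 'Field : Value' line matching each wanted name."""
    found = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" : ")
        key = key.strip()
        # Section headings such as 'MIG Mode' have no separator and are skipped.
        if separator and key in wanted and key not in found:
            found[key] = value.strip()
    return found


def read_inventory(
    path,
    wanted=(
        "Product Name",
        "Serial Number",
        "GPU UUID",
        "VBIOS Version",
        "Board Part Number",
    ),
):
    """Read a file written by 'nvidia-smi -q --id=<n> -f <file>' and extract selected fields."""
    with open(path, encoding="utf-8") as handle:
        return parse_query_fields(handle.read(), wanted)
```

For example, read_inventory("/tmp/queryoam1.txt") would return a dictionary keyed by the field names above. Nested fields that repeat under several headings, such as Current and Pending, are not distinguished by this sketch; only the first occurrence is kept.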
nvidia-smi nvlink -s
Run the nvidia-smi nvlink -s command to display the NVLink connection status.
Figure 4. nvidia-smi nvlink -s
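The NVLink status output can be scanned for links that are not active. This is a rough Python sketch under stated assumptions: it expects each GPU to be introduced by a line beginning "GPU <index>:" and each link to appear as "Link <n>: <value>", with the word "inactive" in the value for a link that is down. Confirm this layout against your driver version before relying on it. The function name find_inactive_links is hypothetical.

```python
import re

# Assumed line shapes; adjust if your nvidia-smi version prints them differently.
GPU_LINE = re.compile(r"^GPU\s+(\d+):")
LINK_LINE = re.compile(r"^\s*Link\s+(\d+):\s*(.+?)\s*$")


def find_inactive_links(nvlink_text):
    """Return (gpu_index, link_index) pairs whose reported value contains 'inactive'."""
    inactive = []
    current_gpu = None
    for line in nvlink_text.splitlines():
        gpu_match = GPU_LINE.match(line)
        if gpu_match:
            current_gpu = int(gpu_match.group(1))
            continue
        link_match = LINK_LINE.match(line)
        if link_match and "inactive" in link_match.group(2).lower():
            inactive.append((current_gpu, int(link_match.group(1))))
    return inactive
```

The text to scan can be captured with subprocess.run(["nvidia-smi", "nvlink", "-s"], capture_output=True, text=True).stdout and passed to the function.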