NVLink switch tray problems
Use this information to resolve problems related to the NVLink switch tray.
For more information about System Management Interface (SMI), see NVIDIA System Management Interface.
nv show system health
Run the nv show system health command to display the NVLink switch tray health status.
Figure 1. nv show system health
nv show cluster apps running
Run the nv show cluster apps running command to display the cluster applications that are currently running in the NVOS cluster.
Figure 2. nv show cluster apps running
nvidia-smi -q | grep -A4 Fabric
Run the nvidia-smi -q | grep -A4 Fabric command to display the cluster connection status.
Figure 3. nvidia-smi -q | grep -A4 Fabric
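The Fabric check can also be scripted so that a problem is reported without reading the output by eye. The following is a minimal sketch, not a supported tool. The function name check_fabric is hypothetical, and the script assumes that each Fabric section of the nvidia-smi -q output contains a State field that reads Completed and a Status field that reads Success when the GPU has joined the fabric. Confirm these field names and values against the output of your driver version before relying on it.

```shell
# Hypothetical helper: reads `nvidia-smi -q` output on stdin.
# Assumption: a healthy Fabric section reports "State : Completed" and
# "Status : Success". Adjust the expected values to match your driver.
check_fabric() {
  awk '
    # A line that contains only the word Fabric starts a section.
    /^[[:space:]]*Fabric[[:space:]]*$/ { left = 4; seen++; next }
    # Inspect the four lines after the heading, as grep -A4 would show.
    left > 0 {
      left--
      if ($1 == "State"  && $NF != "Completed") bad++
      if ($1 == "Status" && $NF != "Success")   bad++
    }
    END {
      if (seen == 0) { print "no Fabric section found"; exit 2 }
      if (bad > 0)   { print "fabric not ready"; exit 1 }
      print "fabric ok"
    }
  '
}
```

A typical invocation on a compute tray would be nvidia-smi -q | check_fabric. The exit status is 0 when every Fabric section looks healthy, 1 when at least one does not, and 2 when no Fabric section is present.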
nvidia-smi topo -p2p n
Run the nvidia-smi topo -p2p n command to display the GPU connection topology status.
Figure 4. nvidia-smi topo -p2p n
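On systems with many GPUs, the peer-to-peer matrix can be filtered so that only problem pairs are shown. The following is a minimal sketch. The function name list_p2p_issues is hypothetical, and the script assumes that the command prints a whitespace-separated matrix with a header row of GPU names, X on the diagonal, and OK for every healthy pair. Verify this layout against your own output first.

```shell
# Hypothetical helper: reads `nvidia-smi topo -p2p n` output on stdin and
# prints "<row GPU> <column GPU> <status>" for every pair that is not OK.
# Assumption: header row of GPU names, "X" on the diagonal, "OK" when healthy.
list_p2p_issues() {
  awk '
    # Header row: every field is a GPU name, so remember the column names.
    !have_header && $1 ~ /^GPU[0-9]+$/ && (NF == 1 || $2 ~ /^GPU[0-9]+$/) {
      for (i = 1; i <= NF; i++) col[i] = $i
      have_header = 1
      next
    }
    # Matrix row: the first field is the row GPU, the rest are statuses.
    have_header && $1 ~ /^GPU[0-9]+$/ {
      for (i = 2; i <= NF; i++)
        if ($i != "X" && $i != "OK") { print $1, col[i - 1], $i; n++ }
    }
    END { exit (n > 0) }
  '
}
```

Run it as nvidia-smi topo -p2p n | list_p2p_issues. No output and an exit status of 0 mean that every pair reported OK.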
nvidia-smi nvlink -s
Run the nvidia-smi nvlink -s command to display the NVLink connection status.
Figure 5. nvidia-smi nvlink -s
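To find inactive links quickly in long output, the link status can be filtered. The following is a minimal sketch. The function name list_inactive_links is hypothetical, and the script assumes that each GPU is introduced by a line that starts with "GPU <index>:" and that a link that is down is shown with the word inactive in place of a speed. Check this against the output on your system.

```shell
# Hypothetical helper: reads `nvidia-smi nvlink -s` output on stdin and
# prints "GPU <n> Link <m>" for every link reported as inactive.
# Assumption: GPU lines start with "GPU <n>:" and a link that is down
# contains the word "inactive" instead of a speed.
list_inactive_links() {
  awk '
    # Remember which GPU the following Link lines belong to.
    /^GPU [0-9]+:/ { gpu = $2; sub(/:$/, "", gpu) }
    # Report each link whose status contains the word inactive.
    /Link [0-9]+:.*inactive/ {
      link = $2; sub(/:$/, "", link)
      print "GPU " gpu " Link " link
      n++
    }
    END { exit (n > 0) }
  '
}
```

Run it as nvidia-smi nvlink -s | list_inactive_links. The exit status is 1 when at least one inactive link is found, which makes the function usable in a health-check script.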
nvidia-smi -q | grep Platform -A 6
Run the nvidia-smi -q | grep Platform -A 6 command to display the compute tray fabric connection status.
Figure 6. nvidia-smi -q | grep Platform -A 6