GPU power capping settings (trained technician only)

This section describes how to read and configure GPU power capping. This procedure must be performed by trained technicians only.

GPU power capping tools

You can set GPU power capping through XCC IPMI commands. See the following sections for the IPMI commands.

XCC and firmware version

Make sure the XCC firmware version is QGX312Q or later. To update the XCC firmware, see Update the firmware.

Set up GPU power capping after replacing the system board

After replacing the system board, make sure to configure GPU power capping.

Reading GPU power capping value

Use IPMI commands to read the GPU power capping value, as described in the following steps.

Steps:

  1. Read the user-defined GPU power capping value with the following command:
    ipmitool raw 0x3a 0x6 0xc0 [Slot]
    The command and its return value appear as follows:
    ipmitool raw 0x3a 0x6 0xc0 [Slot]
    [x] [y]
    where
    • [Slot] is the GPU numbering: GPU 1: [Slot]= [3], GPU 2: [Slot]= [4], GPU 3: [Slot]= [5], GPU 4: [Slot]= [6]

    • [x] is the first digit and [y] is the second and third digits of a three-digit hexadecimal number. Convert the hexadecimal number to a decimal number. The decimal number is the power capping value in watts.

    For example, the return value below shows that the power capping value for GPU 1 ([Slot]= [3]) is 600W (converted from the hexadecimal number 258).
    ipmitool raw 0x3a 0x6 0xc0 3
    02 58

    Read the power capping value of every GPU and note down each value.

    If the command fails to return a value, proceed to Step 2.

  2. (Skip Step 2 if power capping value was read successfully in Step 1.)

    Read the default GPU power capping value with the following command:
    ipmitool raw 0x3a 0x0b 0xf2 0x0 0x10 0x02
    The command and its return value appear as follows:
    ipmitool raw 0x3a 0x0b 0xf2 0x0 0x10 0x02
    [x] [y]

    where [x] is the first digit and [y] is the second and third digits of a three-digit hexadecimal number. Convert the hexadecimal number to a decimal number. The decimal number is the power capping value in watts.

    For example, the return value below shows that the GPU power capping value is 600W (converted from the hexadecimal number 258).
    ipmitool raw 0x3a 0x0b 0xf2 0x0 0x10 0x02
    02 58

    Note down the power capping value.
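
The hexadecimal-to-decimal conversion in the steps above can be sketched in Python. This is a minimal illustration, not part of the official procedure: the names decode_power_cap, read_command, and GPU_SLOTS are hypothetical, and the sketch assumes the response is exactly two hexadecimal bytes, as in the 02 58 example.

```python
# Hypothetical helper names; only the raw bytes and the slot numbers come from the steps above.
GPU_SLOTS = {1: 3, 2: 4, 3: 5, 4: 6}  # GPU number -> [Slot] value


def read_command(gpu: int) -> list[str]:
    """Build the ipmitool argument list that reads the user-defined cap of one GPU."""
    return ["ipmitool", "raw", "0x3a", "0x6", "0xc0", str(GPU_SLOTS[gpu])]


def decode_power_cap(response: str) -> int:
    """Convert a two-byte response such as '02 58' into a decimal value."""
    parts = response.split()
    if len(parts) != 2:
        raise ValueError(f"expected two hexadecimal bytes, got {response!r}")
    x, y = (int(part, 16) for part in parts)
    # [x] is the first hexadecimal digit, [y] the second and third: 0x2 and 0x58 -> 0x258
    return (x << 8) | y


print(decode_power_cap("02 58"))  # 600
```

The sketch only builds the command and decodes a response string. Running ipmitool itself is left out because the execution environment (in-band or remote through XCC) is not specified here.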

Configure GPU power capping with IPMI commands

Note
  • All four GPUs are power capped to the same wattage value.

  • The GPUs can be configured to the following three power capping values:

    • TGP Max mode: 700W (default mode, maximum 4 trays in the enclosure)

    • TGP User selected optimal: 600W (maximum 5 trays in the enclosure)

    • TGP User selected minimum: 500W (maximum 6 trays in the enclosure)
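
The relationship between the three power capping values and the enclosure tray limits in the note above can be captured as a small lookup table. This is an illustrative sketch only. The names POWER_CAP_MODES and max_trays are hypothetical, and the numbers are copied from the note.

```python
# Wattage -> (mode name, maximum trays in the enclosure), as listed in the note above.
POWER_CAP_MODES = {
    700: ("TGP Max mode", 4),
    600: ("TGP User selected optimal", 5),
    500: ("TGP User selected minimum", 6),
}


def max_trays(watts: int) -> int:
    """Return the maximum number of trays allowed for a supported power cap."""
    if watts not in POWER_CAP_MODES:
        raise ValueError(f"unsupported power capping value: {watts}W")
    return POWER_CAP_MODES[watts][1]


print(max_trays(600))  # 5
```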

Steps:

  1. Convert the power capping wattage value from a decimal number to a hexadecimal number.

    For example, the decimal number 600 (600W) converts to the hexadecimal number 258.

  2. Set the power capping value with the following command:
    ipmitool raw 0x3a 0x6 0xc0 0xff [x] [y]

    where [x] is the first digit and [y] is the second and third digits of the converted hexadecimal number.

    For example, the command for capping the GPU power to 600W is:
    ipmitool raw 0x3a 0x6 0xc0 0xff 0x2 0x58
  3. Wait 30 to 50 seconds, and then read the power capping value with the following command:
    ipmitool raw 0x3a 0x6 0xc0 [Slot]
    The command and its return value appear as follows:
    ipmitool raw 0x3a 0x6 0xc0 [Slot]
    [x] [y]
    where
    • [Slot] is the GPU numbering: GPU 1: [Slot]= [3], GPU 2: [Slot]= [4], GPU 3: [Slot]= [5], GPU 4: [Slot]= [6]

    • [x] is the first digit and [y] is the second and third digits of a three-digit hexadecimal number. Convert the hexadecimal number to a decimal number. The decimal number is the power capping value in watts.

    For example, the return value below shows that the power capping value for GPU 1 ([Slot]= [3]) is 600W (converted from the hexadecimal number 258).
    ipmitool raw 0x3a 0x6 0xc0 3
    02 58
  4. Read the power capping value of every GPU. If a returned power capping value is incorrect, perform a DC cycle on the system, repeat Step 2, and then verify the value again. If the problem persists, perform an AC cycle or a virtual reseat, and verify again.
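
The decimal-to-hexadecimal conversion in Step 1 and the command in Step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: encode_power_cap and set_command are hypothetical names, the 0xff byte is copied verbatim from the command in Step 2, and only the three wattage values listed in the note are accepted.

```python
ALLOWED_WATTS = (500, 600, 700)  # the three supported power capping values


def encode_power_cap(watts: int) -> tuple[str, str]:
    """Split a wattage into the [x] and [y] arguments, for example 600 -> ('0x2', '0x58')."""
    if watts not in ALLOWED_WATTS:
        raise ValueError(f"unsupported power capping value: {watts}W")
    x = watts >> 8    # first hexadecimal digit
    y = watts & 0xFF  # second and third hexadecimal digits
    return f"0x{x:x}", f"0x{y:02x}"


def set_command(watts: int) -> list[str]:
    """Build the ipmitool argument list from Step 2 for the given wattage."""
    x, y = encode_power_cap(watts)
    # 0xff is taken as-is from the documented command.
    return ["ipmitool", "raw", "0x3a", "0x6", "0xc0", "0xff", x, y]


print(" ".join(set_command(600)))  # ipmitool raw 0x3a 0x6 0xc0 0xff 0x2 0x58
```

The sketch only builds the command line. The 30 to 50 second wait and the read-back check in Step 3 and Step 4 still need to be performed after the command is run.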