
Overview

The ‘Events and Thresholds’ page lets you view the predefined and custom events for data center (DC) management, along with all the thresholds applied to different entities. To open it, click ‘Events and Thresholds’ in the left-side menu. The page lists all events and thresholds. You can filter the ‘Events’ list by time or by severity level. The severity levels are defined below:

Severity Level    Description

Custom            Associated with all custom events.
Critical          For errors that may cause Energy Manager to stop working properly.
Error             For errors on specific nodes, or non-critical errors in Energy Manager.
Warning           For events that warn that an error may soon occur.
Informative       For events that do not report errors.
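
The filtering by time and severity level described above can be pictured with a small sketch. It is only an illustration of the idea, not how Energy Manager implements it: the `Event` record, the `filter_events` helper, and the sample event names and timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """Hypothetical event record; not the Energy Manager data model."""
    event_type: str
    severity: str  # one of the severity levels listed above
    timestamp: datetime


def filter_events(events, severities=None, start=None, end=None):
    """Return the events that match the severity set and the time window.

    Any filter left as None is not applied.
    """
    selected = []
    for ev in events:
        if severities is not None and ev.severity not in severities:
            continue
        if start is not None and ev.timestamp < start:
            continue
        if end is not None and ev.timestamp > end:
            continue
        selected.append(ev)
    return selected


# Made-up sample data for illustration only.
events = [
    Event("SAMPLE_EVENT_A", "Error", datetime(2024, 1, 1, 9, 0)),
    Event("SAMPLE_EVENT_B", "Critical", datetime(2024, 1, 2, 9, 0)),
    Event("SAMPLE_EVENT_C", "Warning", datetime(2024, 1, 3, 9, 0)),
]

recent_serious = filter_events(
    events,
    severities={"Error", "Critical"},
    start=datetime(2024, 1, 2),
)
print([ev.event_type for ev in recent_serious])
```

Combining both filters narrows the list in the same way as selecting a severity level and a time range together on the page.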

Note
Energy Manager shows events in several places:
  • The ‘Events’ page lists all the predefined events and custom events.

  • The ‘Events’ tab on the ‘Dashboard’ page lists only the critical events and threshold-based events.

  • The ‘Events’ tab on the ‘Datacenter Management’ page lists all the events that apply to the specific group or device.

Note

Some typical events are listed below along with simple troubleshooting tips.

Event Type    Description    Troubleshooting

PLATFORM_OPERATION_FAILED

Description: May be caused by platform issues, including a platform error, an unstable platform, or a temporarily busy platform. Typical error notes may include:

  • Plugin Operation Exception: command code: 34, completion code: ffffff83, failure reason: ERROR_COMPLETION_CODE, detailed message: Response for command: 0x34, NetFn[LUN: 0x1C. Completion code: 0x83 Unknown completion code-125. Additional data 0 bytes: …

  • Platform operation failed: System is on, but ME power measurement is suspended.

  • Platform operation failed: Thermal is not supported in the node: …

  • Platform operation failed: NM3.0 get cups data error. Plugin Operation Exception: ……

  • Platform operation failed: NM 3.0 get cups data error. Receive timeout, state =TIMEOUT

  • Platform operation failed: NM Airflow temperature measurement is pending in the node: …

Troubleshooting:

Check the device's status and its power and temperature trending.

If the status and the trending are displayed normally, you can ignore the event.

If they are abnormal, try the following actions:

  • Cycle the AC power of the managed device.

  • Update the BIOS/BMC firmware to the latest version.
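
In the first error note above, the completion code appears in three forms: `ffffff83`, `0x83`, and `-125`. These are consistent with a single byte, 0x83, being sign-extended and printed as a signed value. The sketch below shows only that arithmetic. It is an interpretation of the log text, not documented Energy Manager behavior, and the helper names are hypothetical.

```python
def low_byte(code_hex: str) -> int:
    """Return the least significant byte of a hex completion-code string."""
    return int(code_hex, 16) & 0xFF


def as_signed_byte(value: int) -> int:
    """Interpret an 8-bit value as a two's-complement signed integer."""
    return value - 0x100 if value >= 0x80 else value


raw = "ffffff83"  # completion code as printed in the error note
byte = low_byte(raw)

print(hex(byte))             # 0x83
print(as_signed_byte(byte))  # -125
```

Reading the three values as the same byte makes it easier to search logs or vendor documentation for the code.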

INTERNAL_ERROR

Description: May be caused by platform issues, including a platform error, a temporarily busy platform, or unsupported actions. Typical error notes may include:

  • Set average period for POWER with value: 60 failed!

Troubleshooting:

  • Cycle the AC power of the managed device.

  • Update the BIOS/BMC firmware to the latest version.

COMMUNICATION_WITH_NODE_FAILED

Description: May be caused by a communication issue with the managed device. Typical error notes may include:

  • Receive timeout, state = TIMEOUT

  • IPMI session has not been created!

Troubleshooting:

  • Check the network status, such as NIC port and network cable, to make sure the device is reachable.

  • Reset the BMC.

CANT_SET_NODE_EVENT

Description: May be caused by a communication or platform issue. Typical error notes may include:

  • Failed to subscribe Predefined-Events with node Id: 46 Error: Failed to receive test event from node…

Troubleshooting:

  • Check the network status, such as NIC port and network cable, to make sure the device is reachable.

  • Update the BMC firmware to the latest version.
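
For operators who script their triage, the table above can be kept as a simple lookup from event type to troubleshooting tips. This is a minimal sketch: the `TROUBLESHOOTING` dictionary and the `troubleshooting_steps` helper are hypothetical and not part of Energy Manager. The step text is condensed from the table.

```python
# Event types and tips condensed from the table above.
# The structure and names are illustrative assumptions.
TROUBLESHOOTING = {
    "PLATFORM_OPERATION_FAILED": [
        "Check the status and the power and temperature trending of the device.",
        "If abnormal, cycle the AC power of the managed device.",
        "If abnormal, update the BIOS/BMC firmware to the latest version.",
    ],
    "INTERNAL_ERROR": [
        "Cycle the AC power of the managed device.",
        "Update the BIOS/BMC firmware to the latest version.",
    ],
    "COMMUNICATION_WITH_NODE_FAILED": [
        "Check the network status (NIC port, network cable) so the device is reachable.",
        "Reset the BMC.",
    ],
    "CANT_SET_NODE_EVENT": [
        "Check the network status (NIC port, network cable) so the device is reachable.",
        "Update the BMC firmware to the latest version.",
    ],
}


def troubleshooting_steps(event_type: str) -> list:
    """Return the tips for an event type, or an empty list if it is not listed."""
    return TROUBLESHOOTING.get(event_type, [])


for step in troubleshooting_steps("COMMUNICATION_WITH_NODE_FAILED"):
    print("-", step)
```

Returning an empty list for unlisted event types keeps the helper safe to call on any event name.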