Types of system-defined performance threshold policies
Unified Manager provides some standard threshold policies that monitor cluster performance and generate events automatically. These policies are enabled by default, and they generate warning or information events when the monitored performance thresholds are breached.
If you are receiving unnecessary events from any system-defined performance threshold policies, you can disable the events for individual policies from the Event Setup page.
Cluster threshold policies
The system-defined cluster performance threshold policies are assigned, by default, to every cluster being monitored by Unified Manager:
- Cluster imbalance threshold
- Identifies situations in which one node is operating at a much higher load than other nodes in the cluster, and therefore potentially affecting workload latencies.
Node threshold policies
The system-defined node performance threshold policies are assigned, by default, to every node in the clusters being monitored by Unified Manager:
- Node resources over-utilized
- Identifies situations in which a single node is operating above the bounds of its operational efficiency, and therefore potentially affecting workload latencies.
- Node HA pair over-utilized
- Identifies situations in which nodes in an HA pair are operating above the bounds of the HA pair operational efficiency.
- Node disk fragmentation
- Identifies situations in which a disk or disks in an aggregate are fragmented, slowing key system services and potentially affecting workload latencies on a node.
Aggregate threshold policies
The system-defined aggregate performance threshold policy is assigned by default to every aggregate in the clusters being monitored by Unified Manager:
- Aggregate disks over-utilized
- Identifies situations in which an aggregate is operating above the limits of its operational efficiency, thereby potentially affecting workload latencies. It identifies these situations by looking for aggregates where the disks in the aggregate are more than 95% utilized for more than 30 minutes. This multicondition policy then performs the following analysis to help determine the cause of the issue:
- Is a disk in the aggregate currently undergoing background maintenance activity?
Some of the background maintenance activities a disk could be undergoing are disk reconstruction, disk scrub, SyncMirror resynchronization, and reparity.
- Is there a communications bottleneck in the disk shelf Fibre Channel interconnect?
- Is there too little free space in the aggregate?
A warning event is issued for this policy only if at least one of the three subordinate policies is also considered breached. A performance event is not triggered if the only condition met is that the disks in the aggregate are more than 95% utilized.
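The multicondition logic described above can be sketched as follows. This is a minimal illustration, not Unified Manager code: the `AggregateSample` class, its field names, and `should_raise_warning` are hypothetical. Only the 95% and 30-minute values and the "primary condition plus at least one subordinate condition" rule come from the text.

```python
from dataclasses import dataclass

UTILIZATION_LIMIT_PCT = 95.0  # from the policy description
MIN_BREACH_MINUTES = 30.0     # from the policy description


@dataclass
class AggregateSample:
    """Hypothetical snapshot of one aggregate; field names are illustrative."""
    disk_utilization_pct: float
    minutes_above_limit: float
    background_maintenance_active: bool  # reconstruction, scrub, resync, reparity
    shelf_interconnect_bottleneck: bool  # disk shelf Fibre Channel interconnect
    low_free_space: bool


def should_raise_warning(sample: AggregateSample) -> bool:
    """Return True only when the primary and a subordinate condition both hold."""
    primary_breached = (
        sample.disk_utilization_pct > UTILIZATION_LIMIT_PCT
        and sample.minutes_above_limit > MIN_BREACH_MINUTES
    )
    subordinate_breached = any((
        sample.background_maintenance_active,
        sample.shelf_interconnect_bottleneck,
        sample.low_free_space,
    ))
    return primary_breached and subordinate_breached
```

Under this reading, an aggregate at 97% utilization for 45 minutes with no maintenance activity, no interconnect bottleneck, and adequate free space produces no event.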
The Aggregate disks over-utilized policy analyzes HDD-only aggregates and
Workload latency threshold policies
The system-defined workload latency threshold policies are assigned to any workload that has a configured Performance Service Level policy with a defined expected latency value:
- Workload Volume/LUN Latency Threshold Breached as defined by Performance Service Level
- Identifies volumes (file shares) and LUNs that have exceeded their expected latency limit, and that are affecting workload performance. This is a warning event.
QoS threshold policies
The system-defined QoS performance threshold policies are assigned to any workload that has a configured ONTAP QoS maximum throughput policy (IOPS, IOPS/TB, or MB/s). Unified Manager triggers an event when the workload throughput value is 15% less than the configured QoS value:
- QoS Max IOPS or MB/s threshold
- Identifies volumes and LUNs that have exceeded their QoS maximum IOPS or MB/s throughput limit, and that are affecting workload latency. This is a warning event.
- QoS Peak IOPS/TB or IOPS/TB with Block Size threshold
- Identifies volumes that have exceeded their adaptive QoS peak IOPS/TB throughput limit (or IOPS/TB with Block Size limit), and that are affecting workload latency. This is a warning event.
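The 15% margin can be illustrated with a short calculation. This sketch assumes that "15% less than the configured QoS value" means an event is triggered once throughput reaches 85% of the configured maximum. That reading is an assumption, and the function names are hypothetical rather than part of any Unified Manager or ONTAP API.

```python
QOS_MARGIN = 0.15  # "15% less than the configured QoS value", from the text


def qos_event_threshold(configured_max: float) -> float:
    """Throughput at which an event would fire, under the assumed reading."""
    return configured_max * (1.0 - QOS_MARGIN)


def breaches_qos_threshold(throughput: float, configured_max: float) -> bool:
    """Compare observed throughput with the derived threshold.

    Both values must use the same unit (IOPS, IOPS/TB, or MB/s).
    """
    return throughput >= qos_event_threshold(configured_max)
```

For example, with a configured maximum of 1000 IOPS, a workload running at 900 IOPS would be flagged under this reading, while one running at 800 IOPS would not.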