Flapping (Hosts)

How monitoring detects flapping hosts

Monitoring supports optional detection of hosts that are "flapping." Flapping occurs when a host changes state too frequently, resulting in a storm of problem and recovery notifications. Flapping can indicate configuration problems (e.g., thresholds set too low), troublesome services, or real network problems.

Whenever monitoring checks the status of a host, it also checks whether the host has started or stopped flapping. It does this by:

  • Storing the results of the last 21 checks of the host.
  • Analyzing the historical check results and determining where state changes/transitions occur.
  • Using the state transitions to determine a percent state change value (a measure of change) for the host.
  • Comparing the percent state change value against low and high flapping thresholds.
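
The steps above can be sketched in a few lines of Python. This is only an illustration, not the product's actual code: the function name `percent_state_change` is hypothetical, and the calculation assumes every transition counts equally, whereas the real implementation may weight recent transitions differently.

```python
from collections import deque

HISTORY_SIZE = 21  # number of stored check results, as described above


def percent_state_change(history):
    """Return the percentage of possible transitions that were state changes.

    Assumption: an unweighted count. With 21 stored results there are
    20 possible transitions between consecutive checks.
    """
    states = list(history)
    if len(states) < 2:
        return 0.0  # not enough history to observe any transition
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


# A bounded deque silently drops the oldest result once 21 are stored.
history = deque(maxlen=HISTORY_SIZE)
for state in ["UP"] * 11 + ["DOWN"] * 10:
    history.append(state)
```

In this sketch, a history with a single UP to DOWN transition yields 1 change out of 20 possible, while a host that alternates on every check yields the maximum value.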

A host is considered to have started flapping when its percent state change first exceeds the high flapping threshold. A host is considered to have stopped flapping when its percent state change drops below the low flapping threshold (assuming that it was previously flapping).
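
Using two thresholds gives the flapping state hysteresis: a value between the low and high thresholds leaves the current state unchanged. A minimal sketch of that rule follows. The function name `update_flapping` and the threshold values 20 and 30 are illustrative assumptions, not documented defaults.

```python
def update_flapping(is_flapping, pct_change, low=20.0, high=30.0):
    """Return the new flapping state for a host.

    low/high are hypothetical example thresholds (percent state change).
    """
    if not is_flapping and pct_change > high:
        return True   # exceeded the high threshold: flapping starts
    if is_flapping and pct_change < low:
        return False  # dropped below the low threshold: flapping stops
    return is_flapping  # in between: keep the previous state
```

For example, with these assumed thresholds a value of 25 does not start flapping on a stable host, but it also does not end flapping on a host that is already flagged.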

Host flap detection works much like service flap detection, with one important difference in when it runs. Monitoring checks whether a host is flapping whenever:
  • The host is checked (actively or passively).
  • A service associated with that host is checked, provided that at least x amount of time has passed since flap detection was last performed for the host, where x is the average check interval of all services associated with the host.
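
The timing condition for service-triggered detection can be sketched as follows. The function and parameter names are hypothetical, all times are assumed to be in seconds, and the handling of a host with no services is a guess made for this example.

```python
def should_run_flap_detection(now, last_flap_check, service_intervals):
    """Decide whether a service check should trigger host flap detection.

    now, last_flap_check: timestamps in seconds (assumed unit).
    service_intervals: check intervals of all services on the host.
    """
    if not service_intervals:
        return False  # assumption: no services means no service-triggered run
    # x = average check interval of all services associated with the host
    average_interval = sum(service_intervals) / len(service_intervals)
    return (now - last_flap_check) >= average_interval
```

With service intervals of 60, 120, and 300 seconds, x is 160 seconds, so a service check arriving 100 seconds after the last flap detection would not trigger a new one, while one arriving 200 seconds after would.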

Flap Handling

When a host is first detected as flapping, monitoring will:
  1. Log a message indicating that the host is flapping.
  2. Add a non-persistent comment to the host indicating that it is flapping.
  3. Send a flapping start notification for the host to appropriate contacts.
  4. Suppress other notifications for the host (this is one of the filters in the notification logic).

When a host stops flapping, monitoring will:
  1. Log a message indicating that the host has stopped flapping.
  2. Delete the comment that was originally added to the host when it started flapping.
  3. Send a flapping stop notification for the host to appropriate contacts.
  4. Remove the block on notifications for the host (notifications will still be bound to the normal notification logic).
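
The two sequences above mirror each other, which the following sketch tries to make visible. It is a toy in-memory model, not the product's implementation: the `Host` class, its fields, and the message strings are all hypothetical.

```python
from dataclasses import dataclass, field

FLAP_COMMENT = "host is flapping"  # hypothetical comment text


@dataclass
class Host:
    name: str
    is_flapping: bool = False
    log: list = field(default_factory=list)       # log messages
    comments: list = field(default_factory=list)  # non-persistent comments
    sent: list = field(default_factory=list)      # notifications sent


def start_flapping(host):
    host.log.append(f"{host.name}: flapping started")  # 1. log a message
    host.comments.append(FLAP_COMMENT)                 # 2. add a comment
    host.sent.append("FLAPPINGSTART")                  # 3. notify contacts
    host.is_flapping = True                            # 4. suppress others


def stop_flapping(host):
    host.log.append(f"{host.name}: flapping stopped")  # 1. log a message
    host.comments.remove(FLAP_COMMENT)                 # 2. delete the comment
    host.sent.append("FLAPPINGSTOP")                   # 3. notify contacts
    host.is_flapping = False                           # 4. lift the block


def passes_flap_filter(host):
    """One filter among several: flapping hosts send no other notifications."""
    return not host.is_flapping
```

Passing this filter does not by itself mean a notification is sent; the remaining notification logic still applies after a host stops flapping.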