Health Tab

Features, descriptions, and a screenshot of the Sonexion System Manager monitoring tool, which is built on the Icinga monitoring software

The Sonexion System Manager monitoring solution is built on Icinga, proven open-source software for monitoring the health and performance of large networks of Linux servers. Icinga checks user-specified hosts and services and sends notifications when problems occur and when they are resolved.

Monitoring tool features include the following:

  • Monitoring network services (including SMTP, POP3, HTTP, NNTP, and PING)
  • Monitoring host resources (including CPU load and disk usage)
  • Monitoring components (including array devices, BMC, batteries, PSUs, cooling fans, I/O modules, disk drives, and enclosure electronics)
  • Parallelized service checks
  • Ability to define a network host hierarchy using "parent" hosts, allowing detection of and distinction between hosts that are down and hosts that are unreachable
  • Contact notifications when service or host problems occur and when they are resolved
  • Ability to define event handlers to be run during service or host events for proactive problem resolution
  • Automatic log file rotation
  • Support for redundant monitoring hosts
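The "parent" host bullet refers to reachability logic. The following is a simplified sketch, assuming the rule documented for Nagios-derived monitors such as Icinga 1.x: when a host's own check fails, the host is DOWN if it has no parents or if at least one parent is UP, and UNREACHABLE if every parent is DOWN or UNREACHABLE. The `Host` record, the `host_state` function, and the host names are hypothetical illustrations, not Sonexion or Icinga code, and the parent hierarchy is assumed to contain no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    check_ok: bool  # result of the host's own check (for example, a ping)
    parents: List[str] = field(default_factory=list)


def host_state(name: str, hosts: Dict[str, Host],
               cache: Optional[Dict[str, str]] = None) -> str:
    """Return "UP", "DOWN", or "UNREACHABLE" for the named host."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    host = hosts[name]
    if host.check_ok:
        state = "UP"
    elif not host.parents:
        # No parents: the host is assumed to be on the monitoring host's own network.
        state = "DOWN"
    elif any(host_state(p, hosts, cache) == "UP" for p in host.parents):
        # At least one path to the host works, so the host itself has failed.
        state = "DOWN"
    else:
        # Every parent is DOWN or UNREACHABLE, so the host cannot be reached.
        state = "UNREACHABLE"
    cache[name] = state
    return state


# Hypothetical topology: two switches, each the parent of one server.
hosts = {
    "switch-a": Host("switch-a", check_ok=False),
    "switch-b": Host("switch-b", check_ok=True),
    "server-1": Host("server-1", check_ok=False, parents=["switch-a"]),
    "server-2": Host("server-2", check_ok=False, parents=["switch-b"]),
}

for name in hosts:
    print(name, host_state(name, hosts))
```

Here server-1 is reported as UNREACHABLE because its only parent, switch-a, is down, while server-2 is reported as DOWN because its parent is up.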

The Health tab displays the Icinga monitoring interface. An example of the main screen, where all of the monitoring information comes together, is shown in Figure 1. The default view is the Tactical Monitoring Overview.

The colors described in Table 1 help the operator interpret the Health tab screen at a glance.

Table 1. Health Tab Color Descriptions
Color                   Description
Green                   State is OK: the host is up or the service is OK, and it is not flapping or in scheduled downtime.
Amber                   State is warning.
Light Amber (2 shades)  State is warning but acknowledged.
Red                     State is critical.
Light Red (2 shades)    State is critical but acknowledged. Can also indicate that a host is down, a service is critical, host or service properties have been disabled, a host or service is flapping or in scheduled downtime, or other host and service problems.
Purple (3 shades)       State is unknown or unreachable.
Blue                    Indicates a pending action.
Figure 1. Health Tab Overview
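Table 1 amounts to a lookup from a check state, plus an acknowledged flag, to a display color. The following minimal sketch assumes Icinga's uppercase state names. It covers only the states that the table names explicitly, ignores the individual shades, and does not model the additional conditions (disabled properties, flapping, downtime) that Table 1 also associates with light red. `state_color` is a hypothetical helper, not part of Sonexion System Manager.

```python
# Simplified lookup derived from Table 1; individual shades are not modeled.
BASE_COLORS = {
    "OK": "green",
    "UP": "green",
    "WARNING": "amber",
    "CRITICAL": "red",
    "UNKNOWN": "purple",
    "UNREACHABLE": "purple",
    "PENDING": "blue",
}


def state_color(state: str, acknowledged: bool = False) -> str:
    """Map a check state (and acknowledgement flag) to a Table 1 color."""
    color = BASE_COLORS[state.upper()]
    # Table 1 lists lighter variants only for acknowledged warning and critical states.
    if acknowledged and color in ("amber", "red"):
        return "light " + color
    return color


print(state_color("WARNING"))                      # amber
print(state_color("critical", acknowledged=True))  # light red
```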

Health Tab UI Elements

Header with Tactical Information

The left side of the header is the Tactical Monitoring Overview (TMO). It serves as a bird's-eye view of all network monitoring activity and allows the operator to quickly see network outages, host status, and service status. It distinguishes between problems that have been handled in some way (for example, acknowledged or with notifications disabled) and problems that have not been handled and need attention. This is useful when many hosts and services are monitored and the operator needs a single screen that shows all problems.

Clicking any of the TMO links loads the related detailed status view into the Alert Summary window, located below the header and to the right of the left panel menu. For example, clicking the All Services Warning link will open the Service Status Details for All Hosts view.
Tip: Hover over a TMO link to see its name.

TMO links include host and service counters for their respective states. For example, there are three counters to the left of the Hosts Down link: Unacknowledged Hosts Down, Acknowledged Hosts Down, and Handled Hosts Down.

TMO links and their related counters are also color coded. If all related counts are zero, the TMO link background color is gray. If any counts indicate a warning, the color is amber. If any counts are critical or down, the color is red.
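The TMO color rule can be expressed as a small function. In this sketch, `tmo_link_color` (a hypothetical helper) receives the counter values for one link, keyed by severity. Red is assumed to take precedence over amber when both apply. Because the rule above does not say how nonzero counts of any other severity are colored, the function raises an error rather than guessing.

```python
from typing import Mapping

# Severities that drive a TMO link's background color.
RED_SEVERITIES = ("CRITICAL", "DOWN")
AMBER_SEVERITIES = ("WARNING",)


def tmo_link_color(counts: Mapping[str, int]) -> str:
    """Choose a TMO link background color from the link's counters.

    `counts` maps a severity such as "DOWN" to the sum of this link's
    counters in that severity (unacknowledged, acknowledged, handled).
    """
    if all(n == 0 for n in counts.values()):
        return "gray"
    if any(counts.get(s, 0) > 0 for s in RED_SEVERITIES):
        return "red"  # assumed to take precedence over amber
    if any(counts.get(s, 0) > 0 for s in AMBER_SEVERITIES):
        return "amber"
    # The documented rule does not cover other nonzero severities.
    raise ValueError(f"no color rule for counts {dict(counts)}")


print(tmo_link_color({"DOWN": 0}))                    # gray
print(tmo_link_color({"WARNING": 2}))                 # amber
print(tmo_link_color({"WARNING": 1, "CRITICAL": 3}))  # red
```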

General Process Information is displayed on the right side of the header:
  • Hosts|Services (active/passive)
  • Hosts|Services execution time (min/avg/max)
  • Hosts|Services latency (min/avg/max)
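Each execution-time and latency entry is a minimum, average, and maximum over check results. Execution time is how long a check ran. In Icinga, latency is the delay between the time a check was scheduled to run and the time it actually started. The following sketch computes both triples from a list of check records. The `CheckResult` record and its field names are hypothetical, not an Icinga data structure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class CheckResult:
    scheduled_at: float  # time the check was due to run (seconds)
    started_at: float    # time the check actually started
    finished_at: float   # time the check completed


def min_avg_max(values: List[float]) -> Tuple[float, float, float]:
    return min(values), mean(values), max(values)


def summarize(results: List[CheckResult]) -> Dict[str, Tuple[float, float, float]]:
    """Return (min, avg, max) execution time and latency for a batch of checks."""
    execution = [r.finished_at - r.started_at for r in results]
    latency = [r.started_at - r.scheduled_at for r in results]
    return {"execution": min_avg_max(execution), "latency": min_avg_max(latency)}


results = [
    CheckResult(0.0, 0.5, 2.5),
    CheckResult(10.0, 10.25, 11.25),
    CheckResult(20.0, 20.75, 23.75),
]
print(summarize(results))
# {'execution': (1.0, 2.0, 3.0), 'latency': (0.25, 0.5, 0.75)}
```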

Left Panel Menu

Click any link in the left menu to display the corresponding detailed status view in the Alert Summary window. The views are grouped into functional areas: General, Status, Problems, System, Reporting, and Configuration.

General View

The General view lets the operator search for a hostname, service name, storage enclosure, or other item. To display the detailed view of an item, select it in the search results.

Status Views

View                   Description
Tactical Overview      The tactical monitoring overview
Host Details           Host status details for all hosts
Service Detail         Service status details for all hosts
Hostgroup Overview     Status overview for all host groups
Servicegroup Overview  Status overview for all service groups

Problems Views

View                    Description
Service Problems        Service status details for all hosts where the status is not OK. This is the default display when the Health tab is first opened or when it is refreshed.
Unhandled Services      Service problems that have not yet been handled
Host Problems           Host status details for all hosts where the status is not OK
Unhandled Hosts         Host problems that have not yet been handled
All Unhandled Problems  All identified system problems that have not yet been handled
Network Outages         Lists all current blocking outages

System Views

The System views display the following system information:

View              Description
Comments          Host and service comments
Downtime          Scheduled host and service downtime
Process Info      Icinga process information, such as whether notifications are enabled
Performance Info  Program-wide performance information, such as check statistics and other metrics
Scheduling Queue  The check scheduling queue

Reporting Views

Operators can use the Reporting views to create reports on system availability, alerts, notifications, and logs. After the operator selects a report, the system guides the operator through the steps to create it. Some reports require the operator to specify parameters; others do not.

Available reports include:

Report         Description
Availability   Four (4) available reports: Hosts, Services, Host groups, and Service groups
Alert History  A list of all host and service alerts by date and time. The operator can filter the results after the report displays.
Alert Summary  Five (5) available reports: 25 Most Recent Hard Alerts, 25 Most Recent Hard Host Alerts, 25 Most Recent Hard Service Alerts, 25 Top Hard Host Alert Producers, and 25 Top Hard Service Alert Producers
Notifications  A list of all system notifications by date and time. The operator can filter the results after the report displays.
Event Logs     Event log entries by date and time
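The Alert Summary reports consider only hard alerts. In Icinga, a problem typically becomes a hard state after its check has failed the configured maximum number of consecutive attempts; earlier failures are soft states. The following sketch shows how the "most recent" and "top producers" summaries can be derived from an alert list. The `Alert` record, its fields, and the sample host and service names are hypothetical. The real reports are generated by Icinga itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Alert:
    timestamp: int          # seconds since epoch
    host: str
    service: Optional[str]  # None for a host alert
    state_type: str         # "HARD" or "SOFT"


def most_recent_hard(alerts: List[Alert], limit: int = 25) -> List[Alert]:
    """Newest hard alerts first, at most `limit` of them."""
    hard = [a for a in alerts if a.state_type == "HARD"]
    return sorted(hard, key=lambda a: a.timestamp, reverse=True)[:limit]


def top_hard_host_producers(alerts: List[Alert], limit: int = 25) -> List[Tuple[str, int]]:
    """Hosts ranked by number of hard host alerts."""
    counts = Counter(a.host for a in alerts
                     if a.state_type == "HARD" and a.service is None)
    return counts.most_common(limit)


def top_hard_service_producers(alerts: List[Alert], limit: int = 25):
    """(host, service) pairs ranked by number of hard service alerts."""
    counts = Counter((a.host, a.service) for a in alerts
                     if a.state_type == "HARD" and a.service is not None)
    return counts.most_common(limit)


alerts = [
    Alert(100, "oss-1", None, "HARD"),
    Alert(200, "oss-1", None, "SOFT"),
    Alert(300, "oss-1", None, "HARD"),
    Alert(400, "mds-1", None, "HARD"),
    Alert(500, "oss-1", "disk usage", "HARD"),
]
print(top_hard_host_producers(alerts))  # [('oss-1', 2), ('mds-1', 1)]
```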

Once a report is created, it may be exported to a file in CSV, JSON, or XML format.

Configuration View

In the Configuration view, click View Configuration to view the current system configuration.