Welcome to docs.opsview.com

Cluster Monitoring Checks in Opsview

Clustered services are common in many data centres, and Opsview can monitor them either at the host level (e.g. clustered firewalls) or at the level of the services running on those hosts (e.g. clustered database servers).

Sometimes it is desirable to have an aggregated view of your clustered services. There are two ways to get an aggregated check result:

  • check_opsview_keyword
  • check_cluster

check_opsview_keyword

Requirements

To monitor a cluster of services, you need to:

  • assign the relevant services to a keyword

Restrictions

This check must run on the Opsview master, because it queries the Runtime database for status information.

Service Check Configuration

Create a new 'Service Check' with:

Plugin: check_opsview_keyword
Args: --keyword=keywordA

where keywordA is the keyword you are interested in. This returns the highest failure state among the services with that keyword, which is the same result as the viewport summary view. Other options:

--algorithm=percent_ok --percent_critical=@0:20 --percent_warning=@20:25

This returns OK if more than 25% of the services are in an OK state, WARNING if between 20% and 25% of the services are OK, and CRITICAL if between 0% and 20% of the services are OK.
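The percent_ok decision can be sketched in Python as follows. This is a hypothetical illustration, not the plugin's source. It assumes the Nagios convention that an @start:end range alerts when the value lies inside it (inclusive) and that the critical range is tested before the warning range. It also assumes each service state is supplied as a string such as "OK" or "CRITICAL". The function names are made up for the example.

```python
# Hypothetical sketch of the percent_ok algorithm; not check_opsview_keyword's code.
# Assumes Nagios "@start:end" ranges (alert when the value is inside, inclusive)
# and that the critical range is tested before the warning range.


def parse_inside_range(spec: str) -> tuple[float, float]:
    """Parse '@start:end' (or '@end', meaning '@0:end'); an empty end is infinity."""
    if not spec.startswith("@"):
        raise ValueError("this sketch only handles '@' (inside) ranges")
    body = spec[1:]
    start, end = body.split(":", 1) if ":" in body else ("0", body)
    low = float(start) if start else 0.0
    high = float(end) if end else float("inf")
    return low, high


def percent_ok_status(states: list[str], critical: str, warning: str) -> str:
    """Classify a keyword by the percentage of its services that are OK."""
    if not states:
        return "UNKNOWN"  # assumption: an empty keyword has nothing to report
    percent = 100.0 * states.count("OK") / len(states)
    for result, spec in (("CRITICAL", critical), ("WARNING", warning)):
        low, high = parse_inside_range(spec)
        if low <= percent <= high:
            return result
    return "OK"


states = ["OK"] * 22 + ["CRITICAL"] * 78  # 22% of services are OK
print(percent_ok_status(states, critical="@0:20", warning="@20:25"))  # WARNING
```

At exactly 20% both ranges match. Because this sketch checks the critical range first, that case is reported as CRITICAL.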

You can then assign this service check to the Opsview host.

check_cluster

Requirements

To effectively monitor a cluster you need to:

  • identify all hosts and/or services that make up the cluster
  • identify the warning and critical levels within the cluster

Restrictions

All parts of the cluster need to be monitored by the same Opsview server, such as a single slave or the master server. The cluster check cannot be split across different monitoring servers.

Host Cluster Configuration

Create a new 'Service Check' using appropriate details, such as:

Name: Host Cluster
Description: Check the host cluster for failures
Plugin: check_cluster
Args: -h -l 'Host Cluster' -w <w> -c <c> -d <hosts>

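Where:

-h checks a cluster of hosts (-s is used for a service cluster)
-l is the label prefixed to the check output
-w is the warning limit
-c is the critical limit
-d is the comma-separated list of host states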

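The warning and critical limits use the Nagios threshold range format. They are compared with the number of hosts in the cluster that are unavailable. A range prefixed with @ alerts when that number falls inside it. For example, suppose there are 3 hosts in the cluster. If a warning alert is required when 1 host is unavailable and a critical alert when 2 or more are unavailable, use:

-w @1:1 -c @2: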
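The following Python sketch shows how these limits behave. It applies Nagios threshold ranges in the way check_cluster is understood to apply them. First, the -d value expands to comma-separated state IDs, where 0 means up or OK. Next, the non-zero entries are counted. Finally, that count is tested against the critical range and then against the warning range. A range without @ alerts when the count is outside it; a range with @ alerts when the count is inside it. The function names are hypothetical, and the sketch only handles the plain, @ and ~ forms of the range syntax.

```python
# Hypothetical sketch of check_cluster-style thresholds; not the plugin's source.
import math


def parse_range(spec: str) -> tuple[float, float, bool]:
    """Parse a Nagios range such as '10', '1:1', '@2:' or '~:5'.

    Returns (low, high, alert_inside)."""
    alert_inside = spec.startswith("@")
    body = spec[1:] if alert_inside else spec
    start, end = body.split(":", 1) if ":" in body else ("0", body)
    low = -math.inf if start == "~" else float(start or 0)
    high = math.inf if end == "" else float(end)
    return low, high, alert_inside


def alerts(value: float, spec: str) -> bool:
    """True if value triggers the range: inside it for '@', outside it otherwise."""
    low, high, alert_inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if alert_inside else not in_range


def count_failed(data: str) -> int:
    """Count the non-zero state IDs in an expanded -d value such as '0,1,0'."""
    return sum(1 for code in data.split(",") if int(code) != 0)


def cluster_status(data: str, warning: str, critical: str) -> str:
    """Critical is tested first, then warning; otherwise the cluster is OK."""
    failed = count_failed(data)
    if alerts(failed, critical):
        return "CRITICAL"
    if alerts(failed, warning):
        return "WARNING"
    return "OK"


for data in ("0,0,0", "0,1,0", "1,2,0", "1,1,1"):
    print(data, cluster_status(data, warning="@1:1", critical="@2:"))
```

Under these rules, a plain range such as 1:1 alerts for every count except exactly 1, including a count of 0. The @ prefix is therefore needed when an alert should fire only for the counts inside the range.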

The hosts value is constructed as follows (hostname1, hostname2 and hostname3 are the hosts in the cluster; the rest of the line should be used as-is):

$HOSTSTATEID:hostname1$,$HOSTSTATEID:hostname2$,$HOSTSTATEID:hostname3$
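For a cluster with many members, typing the macros by hand is error-prone. A short Python helper such as the one below can generate the -d value and the full Args line from a list of hostnames. The helper is hypothetical and is not part of Opsview. The label and limits are placed into the Args format shown earlier, and quotes in the label are not escaped.

```python
# Hypothetical helper for building check_cluster arguments for a host cluster.


def host_cluster_data(hostnames: list[str]) -> str:
    """Return one $HOSTSTATEID:<host>$ macro per host, comma-separated."""
    return ",".join(f"$HOSTSTATEID:{name}$" for name in hostnames)


def host_cluster_args(label: str, warning: str, critical: str, hostnames: list[str]) -> str:
    """Assemble the Args field: -h for hosts, then label, limits and data.

    The label is wrapped in single quotes and is not escaped."""
    return f"-h -l '{label}' -w {warning} -c {critical} -d {host_cluster_data(hostnames)}"


print(host_cluster_data(["hostname1", "hostname2", "hostname3"]))
```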

Then create a host:

Primary Hostname/IP: 127.0.0.1
Host Title: Host Cluster
Host Check Command: <blank>

and assign the check to it. The host check command is left blank because this is effectively a virtual host that does not itself need checking. Perform a reload for the check to become active.

Service Cluster Configuration

As with the host cluster check, create a new 'Service Check' using appropriate details, such as:

Name: Service Cluster
Description: Check the host cluster for service failures
Plugin: check_cluster
Args: -s -l 'Service Cluster' -w <w> -c <c> -d <services>

The services value is constructed as follows (hostname1, hostname2 and hostname3 are the hosts in the cluster, and servicename is the service assigned to all of them; the rest of the line should be used as-is):

$SERVICESTATEID:hostname1:servicename$,$SERVICESTATEID:hostname2:servicename$,$SERVICESTATEID:hostname3:servicename$
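The service macros can be generated in the same way. The Python helper below is a hypothetical sketch and is not part of Opsview. It builds the $SERVICESTATEID:host:service$ list and the -s Args line for one service shared by several hosts. The limits passed in the example call are illustrative values only.

```python
# Hypothetical helper for building check_cluster arguments for a service cluster.


def service_cluster_data(hostnames: list[str], servicename: str) -> str:
    """Return one $SERVICESTATEID:<host>:<service>$ macro per host, comma-separated."""
    return ",".join(f"$SERVICESTATEID:{host}:{servicename}$" for host in hostnames)


def service_cluster_args(
    label: str, warning: str, critical: str, hostnames: list[str], servicename: str
) -> str:
    """Assemble the Args field: -s for services, then label, limits and data."""
    data = service_cluster_data(hostnames, servicename)
    return f"-s -l '{label}' -w {warning} -c {critical} -d {data}"


print(service_cluster_args("Service Cluster", "@1:1", "@2:",
                           ["hostname1", "hostname2", "hostname3"], "servicename"))
```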

Then assign the new check to a host and perform a reload.
