opsview4.6:cluster_checks [2014/09/09 12:19] (current)
 +====== Cluster Monitoring Checks in Opsview ======
 +Clustered services are common in data centres, and Opsview can monitor them either at the host level (e.g. clustered firewalls) or at the level of services running on those hosts (e.g. clustered database servers).
 +
 +Sometimes it is desirable to have an //aggregated// view of your clustered services. There are two ways of getting an aggregated check result:
 +  * ''check_opsview_keyword''
 +  * ''check_cluster''
 +
 +
 +====== check_opsview_keyword ======
 +
 +===== Requirements =====
 +To monitor a cluster of services, you need to:
 +  * assign the relevant services to a keyword
 +
 +===== Restrictions =====
 +This check must run on the Opsview master as it will query the Runtime database to get the status information.
 +
 +===== Service Check Configuration =====
 +
 +Create a new 'Service Check' with:
 +
 +  Plugin: check_opsview_keyword
 +  Args: --keyword=keywordA
 +
 +Where ''keywordA'' is the keyword you are interested in. The check returns the most severe failure state among the services in that keyword, which is the same result as the viewport summary view. Other options:
 +
 +<code>
 +--algorithm=percent_ok --percent_critical=@0:20 --percent_warning=@20:25
 +</code>
 +
 +This returns OK if more than 25% of services are in an OK state, WARNING if between 20% and 25% of services are in an OK state, and CRITICAL if between 0% and 20% of services are in an OK state.
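
As a rough illustration of the thresholds above, the following Python sketch classifies a list of service states. It is not the plugin's implementation: the function name ''percent_ok_state'', the input format (a list of Nagios state IDs, with 0 meaning OK) and the choice to test the critical range before the warning range are assumptions made for this example.

```python
def percent_ok_state(state_ids):
    """Classify a keyword by the percentage of its services that are OK.

    state_ids: Nagios-style state IDs (0 = OK, anything else = not OK).
    Mirrors --percent_critical=@0:20 --percent_warning=@20:25, where a
    leading '@' means "alert when the value is inside the range".
    """
    if not state_ids:
        raise ValueError("keyword has no services")
    percent_ok = 100.0 * sum(1 for s in state_ids if s == 0) / len(state_ids)
    # Assumption: critical is evaluated first, so exactly 20% is CRITICAL.
    if 0 <= percent_ok <= 20:
        return "CRITICAL"
    if 20 <= percent_ok <= 25:
        return "WARNING"
    return "OK"


# Hypothetical keyword with four services, two of them failing (50% OK).
print(percent_ok_state([0, 0, 2, 2]))
```

Because both ranges are inclusive, a keyword with exactly 25% of its services OK still falls in the warning range under this reading.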
 +
 +You can then assign this service check to the Opsview host.
 +
 +
 +
 +====== check_cluster ======
 +
 +===== Requirements =====
 +
 +To effectively monitor a cluster you need to:
 +
 +  * identify all hosts and/or services that make up the cluster
 +  * identify the warning and critical levels within the cluster
 +
 +===== Restrictions =====
 +
 +All parts of the cluster need to be monitored by the same Opsview server, such as a single slave or the master server.  The cluster check cannot be split across different monitoring servers.
 +
 +===== Host Cluster Configuration =====
 +
 +Create a new 'Service Check' using appropriate details, such as:
 +
 +  Name: Host Cluster
 +  Description: Check the host cluster for failures
 +  Plugin: check_cluster
 +  Args: -h -l 'Host Cluster' -w <w> -c <c> -d <hosts>
 +
 +Where:
 +<code>
 +-w is the warning limit
 +-c is the critical limit
 +-d is the list of hosts
 +</code>
 +
 +The warning and critical limits use the [[http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT|threshold format]].  If there are 3 hosts in the cluster and a warning alert is required if 1 is unavailable, and critical if 2 or more are unavailable, then use:
 +
 +<code>
 +-w @1:1 -c @2:
 +</code>
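
The threshold format linked above can be sketched in Python to show how a count of unavailable hosts maps to a state. In that format a bare range alerts when the value falls outside it, and a leading ''@'' alerts when the value falls inside it. The sketch below uses the ''@'' form for both limits. The function names (''parse_range'', ''alerts'', ''cluster_state'') are hypothetical, and testing the critical limit before the warning limit is an assumption, not a description of the ''check_cluster'' source.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range such as '10', '10:', '~:10' or '@1:1'."""
    inside = spec.startswith("@")  # '@' means: alert when inside the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        low_text, high_text = "0", spec  # a bare number means 0..number
    if low_text == "~":
        low = -math.inf
    elif low_text == "":
        low = 0.0
    else:
        low = float(low_text)
    high = float(high_text) if high_text else math.inf
    return inside, low, high


def alerts(spec, value):
    """Return True if value triggers an alert for the given range."""
    inside, low, high = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


def cluster_state(non_ok_count, warning, critical):
    """Return 2 (CRITICAL), 1 (WARNING) or 0 (OK); critical is tested first."""
    if alerts(critical, non_ok_count):
        return 2
    if alerts(warning, non_ok_count):
        return 1
    return 0


# Three-host example: warn when exactly 1 host is down, critical at 2 or more.
for down in range(4):
    print(down, cluster_state(down, "@1:1", "@2:"))
```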
 +
 +The ''hosts'' value is constructed as follows (note: ''hostname1'', ''hostname2'' and ''hostname3''  are the hosts involved here - the rest of the line should be used as-is):
 +
 +  $HOSTSTATEID:hostname1$,$HOSTSTATEID:hostname2$,$HOSTSTATEID:hostname3$
 +
 +Then create a host:
 +
 +  Primary Hostname/IP: 127.0.0.1
 +  Host Title: Host Cluster
 +  Hosts Check Command: <blank>
 +  
 +and assign the check to it. The host check command is left blank because this is a virtual host that does not itself need checking. A reload must be performed before the check becomes active.
 +
 +===== Service Cluster Configuration =====
 +
 +As with the host cluster check, create a new 'Service Check' using appropriate details, such as:
 +
 +  Name: Service Cluster
 +  Description: Check the host cluster for service failures
 +  Plugin: check_cluster
 +  Args: -s -l 'Service Cluster' -w <w> -c <c> -d <services>
 +
 +The ''services'' value is constructed as follows (note: ''hostname1'', ''hostname2'' and ''hostname3''  are the hosts involved here, with the service ''servicename'' assigned to them all):
 +
 +  $SERVICESTATEID:hostname1:servicename$,$SERVICESTATEID:hostname2:servicename$,$SERVICESTATEID:hostname3:servicename$
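
For clusters with many members, the ''hosts'' and ''services'' values can be generated instead of typed by hand. The Python sketch below only builds the macro strings in the pattern shown in this page. The helper names ''host_cluster_data'' and ''service_cluster_data'' are hypothetical and not part of Opsview.

```python
def host_cluster_data(hostnames):
    """Build the -d value for a host cluster check."""
    return ",".join(f"$HOSTSTATEID:{name}$" for name in hostnames)


def service_cluster_data(hostnames, servicename):
    """Build the -d value for a service cluster check (same service on each host)."""
    return ",".join(
        f"$SERVICESTATEID:{name}:{servicename}$" for name in hostnames
    )


hosts = ["hostname1", "hostname2", "hostname3"]
print(host_cluster_data(hosts))
print(service_cluster_data(hosts, "servicename"))
```

The printed strings can be pasted into the ''-d'' argument of the corresponding service check.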
 +
 +Then assign the new check to a host and perform a reload.