
Service Checks

Service checks are the things that you care about. From the Service Check pages, you can define how to check a service, when to check it and what to do if it fails.

Service checks are deliberately kept separate from hosts, so you must associate service checks with hosts to start monitoring them. In Opsview terminology, the combination of a host and a service check is called a service.

Name

This is the name of the service check. This needs to be unique for all service checks.

When combined with a host, this will be the name of the service. Note that reports and performance graphs use this name to find historical information, so if you change the name of a service check, its history will be lost.

The name is constrained to 63 characters and certain characters are blocked from being used. Trailing spaces are automatically removed.
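The two rules stated above (the 63-character limit and the removal of trailing spaces) can be sketched as a small validation helper. This is only an illustration: the function name is hypothetical, the list of blocked characters is not given here and so is not modelled, and rejecting an over-long or empty name (rather than truncating it) is an assumption.

```python
MAX_NAME_LENGTH = 63  # limit stated in the text above


def normalise_check_name(raw: str) -> str:
    """Strip trailing spaces, then enforce the 63-character limit.

    Assumption: over-long or empty names are rejected. Blocked characters
    are not checked because the text does not list them.
    """
    name = raw.rstrip(" ")
    if not name:
        raise ValueError("service check name must not be empty")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(
            f"name is {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    return name
```

For example, normalise_check_name("Disk usage   ") returns "Disk usage".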

If this service check is marked as creating multiple services based on host attributes, then the name of the service created in Nagios® Core will be Service name: {attribute value}.

Some names are not changeable, such as Cluster-node, Slave-node and Interface.

Note: If you change the name of a service check, you will lose history in the RRD graphs as it is considered to be a different service.

Service Group

This is key to the access control. Define the service group to be a logical grouping of service checks that are commonly dealt with by a particular person or team.

Keywords

Assign keywords to this service check. A service will only be associated with a keyword if the same keyword is also set on its host.

You can use a comma separated list of keywords. Keywords are alphanumeric.

Note: You will need to reload for any viewports to update any keyword associations.

Dependencies

From Opsview 3.7.0, you can define service check dependencies.

For this service check, you can define which other service checks it is dependent on. This affects execution and notification.

If the dependent service is in a failed state, then this service check will have a result of UNKNOWN and output of “Dependency failure”. Also, notifications will be suppressed if the dependent service is in a failed state.

A common use of dependencies is for checks that use an agent. For instance, if you use NRPE to monitor disk, CPU or memory utilisation, you can set those three service checks to depend on the NRPE agent check. If NRPE fails, the other checks are not executed and only a single alarm is raised, rather than one for each service.

Note:

  • Dependencies are optional. You can still have NRPE checks that work as normal with no dependencies set up
  • Dependencies are limited to the same host only
  • A dependency will only be set up if the dependent service has been selected for this host (either specifically or via host templates). If not, then the service will be set up as normal but without the dependency included
  • A dependent service's state is based on either its current soft or hard state. This is defined by this system preference
  • Dependent services do not go into an unhandled state as Opsview doesn't know at status time about the dependency hierarchy
  • You cannot be dependent on a service which creates multiple services based on host attributes
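The effect of a dependency on execution and notification can be sketched as follows. This is a simplified model, not Opsview's implementation: the names are hypothetical, "failed state" is assumed to mean any non-OK state, and the soft or hard state preference is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class CheckResult:
    state: int
    output: str
    notify: bool


def evaluate_with_dependencies(
    depended_on_states: List[int],
    run_check: Callable[[], Tuple[int, str]],
) -> CheckResult:
    """Skip the check and suppress notifications if a dependency has failed.

    depended_on_states holds the current states of the services (on the same
    host) that this check depends on. Assumption: any non-OK state counts as
    a failure.
    """
    if any(state != OK for state in depended_on_states):
        return CheckResult(UNKNOWN, "Dependency failure", notify=False)
    state, output = run_check()
    return CheckResult(state, output, notify=(state != OK))
```

With the NRPE agent check in a CRITICAL state, a disk check that depends on it is not executed and returns UNKNOWN with the output "Dependency failure", and no notification is raised for it.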

Type of Check

Opsview has four different built-in types of service check. Other fields will be available depending on the type of service check chosen.

Active Plugin

These are Nagios plugins that run on a regular schedule. They take arguments that change the behaviour of the check, which can include thresholds.

SNMP Polling

This is another form of active check, which uses the check_snmp plugin. Opsview provides helper fields to select which SNMP OID you want monitored and what thresholds you want alerts for.

Passive

This is an empty service. Opsview creates the service, but results are expected to be pushed into Nagios Core via the Nagios Core command pipe from some external system. See the Nagios Core documentation for more details.

As an aside, Opsview creates passive services for all hosts that are monitored from a slave. Opsview uses NSCA to then send results from the slaves up to the master.
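As a rough illustration of pushing a passive result, the sketch below formats a line in the Nagios Core external command syntax (PROCESS_SERVICE_CHECK_RESULT). The function name, host name and service name are hypothetical, and the location of the command pipe depends on your installation, so it is not shown.

```python
import time
from typing import Optional


def format_passive_result(
    host: str,
    service: str,
    return_code: int,
    output: str,
    now: Optional[float] = None,
) -> str:
    """Build one external command line that submits a passive service result.

    return_code uses the usual plugin codes: 0 OK, 1 WARNING, 2 CRITICAL,
    3 UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


# Hypothetical example: a backup job reporting success for host "web1".
line = format_passive_result("web1", "Backup", 0, "OK - backup completed")
```

The external system would write the resulting line to the Nagios Core command pipe. The host and service names must match a passive service that Opsview has already created.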

SNMP Trap

This is a form of passive check. Opsview provides screens to help you configure your SNMP trap rules.

Check Period

This sets the time period the active check is allowed to run in - defaults to 24×7 (run all the time). You may want to run a service check only during work hours - Nagios Core will not schedule any active checks outside this time period. The last state captured will stay as the visible state outside the time period.

Note that if you have any performance graphs, there will be gaps while the active check is not run. If you want graphs 24×7 but no alerts, you can set the check period to 24×7 and set up a timed exception that alters the threshold parameters to stop alerts.

Check Interval

This is the interval between checks for an active check. The default is 5 minutes.

Max Check Attempts

When the plugin returns a non-OK state, this service will run for this many check attempts before raising a notification. The default is 3 failures before notifications are sent.

Retry interval

This sets the interval between checks while a service is in a failed state but before any notifications have been sent.

When a notification has been sent, the service will switch back to being checked at the normal frequency level.
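The interaction between Check Interval, Max Check Attempts and Retry interval can be sketched as a small scheduling function. This is a simplified model rather than the Nagios Core scheduler: the names are hypothetical, the defaults of 5 minutes and 3 attempts come from the text above, and the 1-minute retry interval is an assumed example value.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    failures: int         # consecutive non-OK results so far
    notify: bool          # whether this result triggers the first notification
    next_check_in: float  # minutes until the next check


def after_result(
    is_ok: bool,
    previous_failures: int,
    max_check_attempts: int = 3,
    check_interval: float = 5.0,
    retry_interval: float = 1.0,  # assumed example value
) -> Schedule:
    """Decide what happens after one active check result."""
    if is_ok:
        # Recovery resets the failure count and returns to the normal interval.
        return Schedule(0, False, check_interval)
    failures = previous_failures + 1
    if failures < max_check_attempts:
        # Still retrying: recheck sooner and stay quiet.
        return Schedule(failures, False, retry_interval)
    # Attempts exhausted: notify once, then return to the normal interval.
    return Schedule(failures, failures == max_check_attempts, check_interval)
```

With the defaults, the first two failures are rechecked after 1 minute without alerting, and the third failure raises the notification and returns the service to the 5-minute interval. Later re-notifications are governed by the re-notification interval and are not modelled here.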

Plugin

Set the plugin to be the name of the binary or script that will run the actual check.

If the check is to be run on an agent only, set the plugin to check_nrpe and set the arguments appropriately.

Invert Plugin Results

This checkbox can be ticked to invert certain result codes from a plugin.

Currently, a critical result can be inverted to OK and vice versa.
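A minimal sketch of the inversion described above, using the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name is hypothetical.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def invert_result(return_code: int) -> int:
    """Swap OK and CRITICAL; leave every other return code unchanged."""
    if return_code == OK:
        return CRITICAL
    if return_code == CRITICAL:
        return OK
    return return_code
```

This is useful when the failure of a plugin is the healthy condition, for example checking that a port is closed.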

Arguments

These are the arguments for the plugin used if this is an active check.

You can use host macros, dynamic macros or host attributes within the arguments.

Example host macros:

  • $HOSTADDRESS$ - this will get substituted for the host's primary hostname/IP
  • $ADDRESSES$ - this will get substituted with the full comma separated list from the Other Interfaces field for a host

Example dynamic macros:

  • $SERVICESTATEID:hostname:servicename$ - this will get substituted at execution time with the service state id of the specific hostname and servicename

Example host attributes (from Opsview 3.7.1):

  • %NRPE_PORT% - this will get substituted with the value of the host attribute NRPE_PORT. If the attribute does not exist, this will be replaced with an empty string. If there are multiple attributes with the same name, the first in the list is used
  • %NRPE_PORT:1% - this will get substituted with arg1 of the host attribute NRPE_PORT. If arg1 is not set, then the macro will be replaced with an empty string
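The host attribute rules above can be sketched as a substitution function. This is an illustration under stated assumptions, not Opsview's code: the class and function names are hypothetical, attribute names are assumed to consist of upper-case letters, digits and underscores, and the argument form is assumed to be written %NAME:n%.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HostAttribute:
    name: str
    value: str
    args: Dict[int, str] = field(default_factory=dict)  # e.g. {1: "ssl"}


# Matches %NAME% and %NAME:n% (assumed naming rules).
_MACRO = re.compile(r"%([A-Z0-9_]+)(?::(\d+))?%")


def expand_attributes(arguments: str, attributes: List[HostAttribute]) -> str:
    """Replace host attribute macros in a plugin argument string."""

    def replace(match: "re.Match[str]") -> str:
        name, arg_index = match.group(1), match.group(2)
        # If several attributes share a name, the first in the list wins.
        attr = next((a for a in attributes if a.name == name), None)
        if attr is None:
            return ""  # unknown attribute becomes an empty string
        if arg_index is None:
            return attr.value
        return attr.args.get(int(arg_index), "")  # unset arg becomes empty

    return _MACRO.sub(replace, arguments)
```

For a host with two NRPE_PORT attributes whose values are 5666 and 5667, expanding "-p %NRPE_PORT%" gives "-p 5666".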

Escaping Dollar Symbols

If you need to include a $ in your argument, you will need to escape it by entering it twice. For example, if the plugin is check_dummy, you can enter arguments of:

0 'This is a test which costs $$100'

This will mean the command executed will be:

check_dummy 0 'This is a test which costs $100'

Be aware that shell expansion may occur. If the arguments were: 0 "This is a test which costs $$100" (using double quotes instead of single quotes), then the command executed would be:

check_dummy 0 "This is a test which costs $100"

As the shell will evaluate $1, the output would be:

This is a test which costs 00

Therefore, use single quotes around any expected $ symbols.
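The difference between the two quoting styles can be reproduced with a short script. This is only a demonstration under assumptions: it uses echo in place of check_dummy (which may not be installed), it requires a POSIX sh, and unescape_dollars is a hypothetical helper that models only the $$ to $ step, not full macro processing.

```python
import subprocess


def unescape_dollars(arguments: str) -> str:
    """Model the escaping rule: a doubled $$ becomes a single $."""
    return arguments.replace("$$", "$")


def run_in_shell(command: str) -> str:
    """Run a command line through sh, as a check command would be."""
    completed = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


single = unescape_dollars("'This is a test which costs $$100'")
double = unescape_dollars('"This is a test which costs $$100"')

print(run_in_shell("echo " + single))  # This is a test which costs $100
print(run_in_shell("echo " + double))  # This is a test which costs 00
```

In the double-quoted case the shell expands $1, which is empty, so only "00" remains.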

check_nrpe Arguments

check_nrpe arguments should be set similar to:

check_nrpe -H $HOSTADDRESS$ -c **NRPE_COMMAND** -a '**COMMAND ARGUMENTS**'

Be aware that the $SNMP_COMMUNITY$ macro will be expanded to be shell friendly, i.e. the community name $teve's will be expanded to '$teve'\''s'.
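The shell-friendly form shown above follows the usual single-quote convention: wrap the value in single quotes and replace each embedded single quote with '\''. A minimal sketch, with a hypothetical function name:

```python
def shell_friendly(value: str) -> str:
    """Quote a value so that sh treats it as one literal word."""
    # Close the quote, emit an escaped quote, then reopen the quote.
    return "'" + value.replace("'", "'\\''") + "'"


print(shell_friendly("$teve's"))  # '$teve'\''s'
```

Because the value is single-quoted, the $ in the community name is not expanded by the shell.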

Check Freshness

If you are receiving passive results, you may want to check that you are getting results within a certain timeframe. From Opsview 3.5.1, you can configure this to take an action.

You can enable freshness checking which means that if this service has not been updated for this amount of time, then Nagios Core will force a stale result for the service based on the configuration.

There are two actions that can be taken:

  • Resend Notifications
  • Submit Result

Note: Due to a limitation in Nagios Core, only one of these actions can be chosen.

Resend Notifications

When a passive check is received, there are normally no others that follow. The status on screen will show this last state.

An alert will also be raised through the usual mechanism. However, you do not get a re-notification unless the service fails again.

If this option is selected, then this service check is implemented with a freshness check that just submits the same result back to Nagios Core. The freshness threshold is set to the notification interval so it looks like the service has received the same result again. However, a side effect of this is that if the passive check is run on a slave, the master will not get a stale result for this service. Do not enable this option if you expect regular passive results to arrive.

Note: This feature is available in Opsview 3.5.0, but the user interface options will change in Opsview 3.5.1 onwards.

Submit Result

This submits a result back into Nagios Core if the freshness timeout value has been reached, so you can either change the state to display an error or perhaps reset the state of a service back to OK.

Freshness Timeout

This is the amount of time before Nagios Core considers a service to be not fresh. You can enter this value in a duration format, such as 10m for 10 minutes or 48h 15m for 48 hours and 15 minutes.

Note that due to the way that Nagios Core calculates this value, the stale action will run a few minutes after this timeout value.
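The duration format can be illustrated with a small parser. This sketch handles only the two units shown in the examples above (m for minutes and h for hours); whether Opsview accepts other units is not covered here, and the function name is hypothetical.

```python
import re

# Only the units that appear in the examples above.
_UNIT_SECONDS = {"m": 60, "h": 3600}
_PART = re.compile(r"(\d+)([mh])")


def parse_duration(text: str) -> int:
    """Convert a duration such as '48h 15m' into seconds."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = 0
    for part in parts:
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised duration part: {part!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


print(parse_duration("10m"))      # 600
print(parse_duration("48h 15m"))  # 173700
```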

Stale State

Choose the appropriate state that you want the service to change to when it passes the freshness threshold.

You may want to automatically set a service back to OK after 1 hour for certain types of checks.

Stale Text

Choose what text you want to set as the output. The text will be added to the end of the state phrase (OK, WARNING, CRITICAL, UNKNOWN).

Example Host

For SNMP Polling, an example host field will appear. You can enter a host here and a selection of hosts will appear that have SNMP configured. Select one and press SNMP Walk to get the snmpwalk output for that host.

You can then select a specific OID that you want monitored. Clicking on the OID will populate the OID, Label and Calculate Rate fields appropriately.

OID

This field determines which OID will be polled by this service check. This is a text field which will be translated into the numeric field at Opsview reload time.

Label

From Opsview 3.7.2, you can choose the performance label.

The default performance graphing label is the OID. If you set a string here, this will be used instead.

Note: if you change this value, you will lose history in existing graphs because it is considered to be a different data point.

Calculate Rate

From Opsview 3.7.2, you can choose rate changes in the returned SNMP values.

If this is set to “No rate”, then the raw value is used for performance data and for threshold comparisons.

If this is set to “Per second”, then the values are calculated based on the differences between checks and converted into a per second value. This is especially useful for SNMP data which is of the type Counter32 or Counter64, because you then calculate the rate of change. The alert levels will be against this rate change.

You can also choose “Per minute” or “Per hour” and the rate value will be adjusted appropriately.
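The rate calculation can be sketched as the difference between two consecutive readings divided by the elapsed time, then scaled to the chosen unit. The names below are hypothetical, and counter wrap-around (a Counter32 rolling over to zero) is deliberately not handled.

```python
# Seconds per unit for each rate option.
_SCALE = {"per_second": 1, "per_minute": 60, "per_hour": 3600}


def calculate_rate(
    previous_value: int,
    previous_time: float,
    value: int,
    current_time: float,
    mode: str = "per_second",
) -> float:
    """Return the value used for performance data and threshold comparison.

    Times are in seconds. With mode 'no_rate' the raw value is returned.
    """
    if mode == "no_rate":
        return float(value)
    elapsed = current_time - previous_time
    if elapsed <= 0:
        raise ValueError("current reading must be later than the previous one")
    return (value - previous_value) / elapsed * _SCALE[mode]


# A counter that grew from 1000 to 4000 over 300 seconds:
print(calculate_rate(1000, 0, 4000, 300))                # 10.0
print(calculate_rate(1000, 0, 4000, 300, "per_minute"))  # 600.0
```

Thresholds would then be compared against 10.0 (per second) or 600.0 (per minute), not against the raw counter value of 4000.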

Notify on

Choose which states this service check should notify on.

Note: If the host has no notification options set, then services on that host will not have notifications enabled either.

Notification Period

The time period when notifications are allowed to be sent to contacts. If this is set to blank, the value will be inherited from the host's configuration.

Re-notification Interval

Period of time (in minutes unless configured otherwise) after which a notification is resent if the service is still unhandled. If this is set to 0, then only the first notification is sent (when the service goes into a hard state change).

If this is set to blank, the value will be inherited from the host's re-notification interval.

Multiple Services

This is an advanced feature available from Opsview 3.7.1.

This field defines which variable in the argument field is used to create multiple services. For instance, if you set this to DISK, then multiple services will be created with the name “{servicecheckname}: {value}”, one for each variable named DISK that is assigned to the host.

This will only work when the Type of Check is an Active Plugin.
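The naming rule can be sketched as follows. The function name and the representation of host attributes as (name, value) pairs are assumptions made for illustration.

```python
from typing import List, Tuple


def expand_service_names(
    check_name: str,
    attribute_name: str,
    host_attributes: List[Tuple[str, str]],
) -> List[str]:
    """Return one service name per matching attribute on the host."""
    return [
        f"{check_name}: {value}"
        for name, value in host_attributes
        if name == attribute_name
    ]


attributes = [("DISK", "/"), ("DISK", "/var"), ("NRPE_PORT", "5666")]
print(expand_service_names("Disk", "DISK", attributes))
# ['Disk: /', 'Disk: /var']
```

A host with no DISK attributes gets no services from this service check.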

Flap detection

A service is considered to be flapping if its state changes too frequently. If this option is set, the service will be checked for this flapping condition. While a service is flapping, an icon will appear for it and notifications will be temporarily disabled until it comes out of the flapping state.

We recommend that flap detection is enabled for active checks. However if you find a service is flapping frequently, there is probably another issue that needs investigating.

We recommend that flap detection is disabled for passive checks.

For more information about flap detection, see the Nagios Core documentation for flap detection.

Alert Every Failure

This option forces a notification to be sent on every check in a non-OK state. This is useful if you have a passive service check which receives results.

Note: this overrides the re-notification interval option, so you will get alerts every time.

In Opsview 3.7.0, there are three states for this option:

  • Disabled - only get alerts on state changes
  • Enabled - get alerts for every failed state
  • Enabled with re-notification interval - get alerts for every failed state as long as the re-notification interval has passed. This is useful if you get a lot of results in quick succession

Note: The notification number will increase for every non-OK result and only gets reset to 0 when an OK state is received.
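The three states of this option can be compared with a small decision function, evaluated for each non-OK result. This is a simplified model with hypothetical names. In the Disabled case it only reflects "alerts on state changes" and ignores ordinary re-notification.

```python
from enum import Enum
from typing import Optional


class AlertEveryFailure(Enum):
    DISABLED = 0
    ENABLED = 1
    ENABLED_WITH_INTERVAL = 2


def should_notify(
    mode: AlertEveryFailure,
    state_changed: bool,
    minutes_since_last_notification: Optional[float],
    renotification_interval: float,
) -> bool:
    """Decide whether a non-OK result triggers a notification."""
    if mode is AlertEveryFailure.ENABLED:
        return True  # every failed result alerts
    if mode is AlertEveryFailure.ENABLED_WITH_INTERVAL:
        # Alert only if nothing has been sent yet or the interval has passed.
        return (
            minutes_since_last_notification is None
            or minutes_since_last_notification >= renotification_interval
        )
    return state_changed  # DISABLED: only state changes alert
```

With a re-notification interval of 60 minutes, a burst of failures arriving 2 minutes after the last alert is silent in the third mode but would alert each time in the Enabled mode.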

Event Handler

From Opsview 3.7.1, you can set an event handler for the service check. See the event handlers page for more information.

Markdown Filter

If this option is chosen, then the service output will be filtered through the Markdown plugin. This allows you to mark up the output with bold, italics and URL links. For instance, if the output is:

**Disk failure** on *sd1* - see [internal wiki](http://opsview.org)

This will be displayed as:

Disk failure on sd1 - see internal wiki

Use http://daringfireball.net/projects/markdown/dingus to test your plugin output. Bear in mind that:

  • you cannot use the pipe symbol as Nagios Core interprets this as the start of performance data
  • < and > characters are converted to HTML entities, so you cannot embed other HTML tags
  • you should keep to only one line due to NSCA limitations in a distributed environment

Therefore, you should stick to using just bold, italics and links in your output.

Note: Only Opsview status pages will use the filter to display the output. Pages rendered from Nagios Core will not use the filter. Also, events view does not currently display markdown output.

Deprecated

Notification Dependencies

For this service check, you can define which other service checks it is dependent on, for notification purposes. So if this service fails and is about to send an alert, and the dependent service has also failed (and sent out a notification), then notifications for this service will be suppressed.

A common use of notification dependencies is that checks via NRPE (such as disk, CPU or memory utilisation) will have a dependency on the NRPE agent. This way an NRPE failure will raise only a single alarm and not one for each service.

Note:

  • Dependencies are optional. You can still have NRPE checks that work as normal with no dependencies set up
  • A dependency will only be set up if the dependent service has been selected for this host (either specifically or via host templates). If not, then the service will be set up as normal

Note: This functionality has changed to dependencies from Opsview 3.7.0 onwards.
