
Differences

This shows you the differences between two versions of the page.

opsview4.6:monitoringui [2014/09/09 12:19] (current)
Line 1: Line 1:
 +====== Monitoring Web User Interface ======
 +See the [[quickstart|Quick Start]] guide for information about various page elements.
 +
 +===== General =====
 +
 +The language used in the Opsview user interface can be changed. See the [[i18n|internationalisation]] page for more information.
 +
 +Times in the web user interface are displayed based on the local timezone of the Opsview Master server. However, all times stored in the database are based on UTC.
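
The relationship between stored and displayed times can be sketched as follows. This is only an illustration: the UTC+1 offset for the master server and the function name ''to_master_local'' are assumptions, not part of Opsview.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Opsview Master server runs at UTC+1; substitute your own offset.
MASTER_TZ = timezone(timedelta(hours=1), name="master-local")


def to_master_local(stored_utc: datetime) -> datetime:
    """Convert a timestamp stored in UTC to the master server's local time."""
    if stored_utc.tzinfo is None:
        # Treat naive values as UTC, matching how the times are stored.
        stored_utc = stored_utc.replace(tzinfo=timezone.utc)
    return stored_utc.astimezone(MASTER_TZ)


stored = datetime(2014, 9, 9, 11, 19, tzinfo=timezone.utc)
print(to_master_local(stored).isoformat())
```

A real deployment would normally use a named timezone rather than a fixed offset, so that daylight saving changes are handled.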
 +
 +===== Common Status Messages =====
 +
 +In addition to specific plugin data you may see the following results returned by service checks:
 +
 +
 +||**Message**||**Meaning**||**Notes**||
 +||**Agent not responding**||Monitoring plugin could not connect to SNMP agent||SNMP agent is not running on device, or plugin is configured to use wrong SNMP version||
 +||**Connection Refused**||Monitoring plugin could not connect to host||Often means that service you are checking is not available||
 +||**Plugin timeout after 10 seconds**||Monitoring plugin will stop trying connection after pre-specified time interval (10 seconds)||Service is down or very slow to respond ||
 +||**Service results are stale**||Data has not been received from slave server in a timely manner||Large numbers of 'stale' service checks indicate a problem communicating with the slave server||
 +||**Socket timeout after 20 seconds**||Monitoring plugin will stop trying connection after pre-specified time interval (20 seconds)||Service is down, or very slow to respond ||
 +
 +===== Status views =====
 +
 +Host status:
 +
 +  * UP: Host is responding to network requests
 +  * DOWN: Host is not responding to network requests within the pre-defined timeframe
 +  * UNREACHABLE: It was not possible to establish the status of the host (due to a network outage, for example)
 +
 +Service status:
 +
 +  * OK: Service check completed successfully and no thresholds have been exceeded
 +  * WARNING: Service check completed successfully and warning threshold was exceeded
 +  * CRITICAL: Service check completed successfully and critical threshold was exceeded
 +  * UNKNOWN: It was not possible to establish status of service check, or check failed to run
 +==== Host Group Hierarchy ====
 +
 +Provides a top-down view of your entire system. You can see at a high level which hosts and services are in which state, and drill down into other host groups.
 +
 +**Note**: Only hosts that have at least 1 service will be displayed. Empty hosts are not displayed.
 +
 +  * Host Group: Hosts and network devices are grouped into Host Groups
 +  * Host Status Totals: Status totals for hosts and network devices (UP, DOWN and UNREACHABLE)
 +  * Service Status Totals: Status totals for service checks (OK, WARNING, CRITICAL and UNKNOWN)
 +  * Handled: Host or Service events that have been 'handled' via scheduling downtime or acknowledging
 +  * Unhandled: Host or Service events that have not been handled by an operator. The precise meaning of //unhandled// is:
 +    * A host is unhandled if:
 +      * It is DOWN and
 +      * it has not been acknowledged and
 +      * it is not in a period of downtime,
 +      * Otherwise it is handled
 +    * A service is unhandled if:
 +      * Its host is UP and
 +      * the service is not OK and
 +      * the service has not been acknowledged and
 +      * the service is not in a period of downtime,
 +      * Otherwise it is handled
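
The handled/unhandled rules above can be expressed as two small predicates. This is a minimal sketch: the ''Status'' class and its field names are hypothetical and do not reflect Opsview's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical record holding the facts the rules depend on."""
    state: str                  # e.g. "UP", "DOWN", "OK", "WARNING", "CRITICAL"
    acknowledged: bool = False
    in_downtime: bool = False


def host_is_unhandled(host: Status) -> bool:
    # Unhandled only if DOWN, not acknowledged and not in downtime.
    return host.state == "DOWN" and not host.acknowledged and not host.in_downtime


def service_is_unhandled(host: Status, service: Status) -> bool:
    # A service problem counts only while its host is UP.
    return (
        host.state == "UP"
        and service.state != "OK"
        and not service.acknowledged
        and not service.in_downtime
    )


print(service_is_unhandled(Status("UP"), Status("CRITICAL")))
print(service_is_unhandled(Status("DOWN"), Status("CRITICAL")))
```

Note that a failing service on a DOWN host is treated as handled, because the host problem already covers it.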
 +
 +=== Sorting of status information ===
 +
 +Status information can be sorted (see [[#URL_parameters]] below). With some sort orders, data may be repeated. For example, when sorting by ''state'', a host with both a warning and a critical alert will be shown twice.
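
The repetition can be illustrated with a small grouping sketch. The sample data and the ''rows_sorted_by_state'' function are hypothetical; they only mimic the effect described above and are not Opsview's implementation.

```python
from itertools import groupby

# Rank follows the documented state ordering: OK, UNKNOWN, WARNING, CRITICAL.
STATE_RANK = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}

# Hypothetical sample data: (host, service, state).
services = [
    ("web1", "HTTP", "CRITICAL"),
    ("web1", "Disk", "WARNING"),
    ("db1", "MySQL", "WARNING"),
]


def rows_sorted_by_state(items):
    """Return one (state, host, [services]) row per state/host pair."""
    ordered = sorted(items, key=lambda s: (STATE_RANK[s[2]], s[0]))
    rows = []
    for (state, host), group in groupby(ordered, key=lambda s: (s[2], s[0])):
        rows.append((state, host, [name for _, name, _ in group]))
    return rows


for row in rows_sorted_by_state(services):
    print(row)
```

Because rows are grouped by state first, ''web1'' gets one row under WARNING and another under CRITICAL.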
 +
 +==== Keyword View ====
 +This gives an aggregated view of a collection of services. See the [[opsview4.6:viewport|keyword view]] page for more information.
 +
 +==== Network Map ====
 +
 +Clickable map of network infrastructure and hosts. This map is drawn from the perspective of the monitoring system and may not directly match other network diagrams.
 +
 +  * Layout Method: Different ways of drawing network map
 +  * Scaling factor: Allows you to change scale of network map (zoom in or zoom out)
 +  * Drawing Layers: Allows you to filter devices based on Host Group
 +
 +=== Spacing out a subset of hosts ===
 +
 +There is a hidden option which will try to space out the hosts in the network map.
 +
 +A circular network map can look like this:
 +
 +{{:opsview4.6:full_map.png|}}
 +
 +If you select the host groups (either by including or excluding), you can reduce the number of hosts:
 +
 +{{:opsview4.6:subset.png|}}
 +
 +However, this keeps the same spacing, so the hosts can still overlap and readability does not improve.
 +
 +If you add the hidden option //onlyincludehostsinlayerlist=1// to the URL, the algorithm will space out the hosts based on the host groups in the include list:
 +
 +{{:opsview4.6:spaced_subset.png|}}
 +
 +This algorithm is not perfect, as the links between hosts may not be correct if there is not a full tree between the monitoring system and the hosts, but it can help with visualising the parent tree relationship.
 +
 +You can then bookmark this URL to get this view.
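
Adding the hidden option to an existing URL can be scripted, for example when generating bookmarks. This is a sketch only: the network map URL and the ''layer'' parameter below are hypothetical placeholders, so copy the real URL from your browser. Only ''onlyincludehostsinlayerlist=1'' comes from this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url: str, name: str, value: str) -> str:
    """Return url with the query parameter name set to value."""
    parts = urlsplit(url)
    # Drop any existing copy of the parameter, then append the new value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and existing parameter, for illustration only.
url = "http://opsview/hypothetical/networkmap?layer=1"
print(with_param(url, "onlyincludehostsinlayerlist", "1"))
```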
 +
 +==== Service Detail ====
 +
 +Detailed view of all hosts and service checks.
 +
 +For each service, various icons will appear based on what information is available:
 +
 +^ State ^ Icon ^ Notes ^
 +| Acknowledged | {{:opsview4.6:ack.png|}}| For more information, see the [[opsview4.6:acknowledgements|acknowledgements]] page |
 +| Scheduled Downtime (downtime is in the future) | {{:opsview4.6:sched_downtime4.png|}} | |
 +| Currently in downtime | {{:opsview4.6:downtime4.png|}} | |
 +| Comments | {{:opsview4.6:comment4.png|}} | The comments will not show if an acknowledgement or downtime icon is displayed |
 +| Graphing | {{:opsview4.6:graph4.png|}} | You may have to reload Opsview for new service checks to be marked with the graphing icon |
 +| Flapping | {{:opsview4.6:flapping.png|}} | |
 +
 +===== Host Commands =====
 +
 +===Disable active checks for this host===
 +
 +Disables any active checks for this host. Active checks are those which use Nagios plugins; SNMP traps will still be processed.
 +
 +===Re-schedule the next check of this host===
 +
 +Allows the next host check to be rescheduled. This is useful if the host has recently recovered from a problem.
 +
 +===Start / stop accepting passive checks for this host===
 +
 +Passive checks allow host status data to be submitted for this host from an external source.
 +
 +===Submit passive check for this host===
 +
 +If passive checks are enabled, allows the host status to be set manually.
 +
 +===Start obsessing for this host===
 +
 +===Enable / Disable notifications for this host===
 +
 +Configures whether status notifications should be sent for host (does not affect service checks)
 +
 +===Enable / Disable notifications for all services on this host===
 +
 +Configures whether status notifications should be sent for services associated with this host
 +
 +===Schedule downtime for this host===
 +
 +===Enable / Disable checks for all services on this host===
 +
 +Configures whether the services on this host should be checked. It is strongly recommended that checks are not disabled.
 +
 +===Enable / Disable event handler for this host===
 +
 +Configures whether the event handler is enabled. The event handler is used by distributed monitoring; it can also be used for integration with other management systems.
 +
 +===Enable / Disable flap detection for this host===
 +
 +Flap detection is a mechanism used to suppress notifications for a host which is frequently changing state (e.g. rebooting, recurring network issue). Generally this should be enabled.
 +
 +===== Service Commands =====
 +
 +===Enable / disable active checks for this service===
 +
 +Configures whether active checks should be enabled for this service check. Active checks use Nagios plugins; passive checks may be SNMP traps or data from external systems.
 +
 +===Re-schedule the next check of this service===
 +
 +Allows the next service check to be rescheduled. This is useful if the service has recently recovered from a problem.
 +
 +===Start / stop accepting passive checks for this service===
 +
 +Passive checks allow service status data to be submitted for this service from an external source.
 +
 +===Submit passive check for this service===
 +
 +If passive checks are enabled, allows the service status and check result to be submitted manually.
 +
 +===Start obsessing for this service===
 +
 +===Enable / Disable notifications for this service===
 +
 +Configures whether status notifications should be sent for this service.
 +
 +===Schedule downtime for this service===
 +
 +===Enable / Disable event handler for this service===
 +
 +Configures whether the event handler is enabled. The event handler is used by distributed monitoring; it can also be used for integration with other management systems.
 +
 +===Enable / Disable flap detection for this service===
 +
 +Flap detection is a mechanism used to suppress notifications for a service which is frequently changing state (e.g. service restarting, recurring network issue). Generally this should be enabled.
 +
 +==== URL parameters ====
 +You can add URL parameters to affect the display of services. This works for ''/status/hostgroup'', ''/status/host'', ''/status/service'' and ''/viewport/{keyword}''.
 +
 +Parameters are appended to the URL as a query string:
 +
 +  /status/hostgroup?parameter=value&parameter=value
 +
 +These are some of the filter parameters you can use:
 +||**Parameter**||**Effect**||
 +||changeaccess=1||Will show the list of services that the logged-in contact has change access for||
 +||host={name}||Will filter according to host name. Can use % as a wildcard. Can specify multiple times||
 +||servicecheck={name}||Will filter according to service name. Can use % as a wildcard. Can specify multiple times||
 +||host_state={hoststateid}||Will filter according to host state id. Can specify multiple times (up=0, down=1, unreachable=2)||
 +||state={stateid}     ||Will filter according to service state id. Can specify multiple times (ok=0, warning=1, critical=2, unknown=3)||
 +||host_filter={type}  ||Type is either handled or unhandled. Will filter results according to whether the host is handled or unhandled. ||
 +||filter={type}       ||Type is either handled or unhandled. Will filter results according to whether the service is handled or unhandled. ||
 +||asuser={contactname}||Will show list of services based on this contact. Only available for view all roles||
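
A filter URL can be assembled programmatically. The sketch below uses the parameter names and state ids from the table above. The base URL ''http://opsview'' and the ''status_url'' helper are assumptions for illustration.

```python
from urllib.parse import quote, urlencode

# State ids as listed in the filter table.
HOST_STATE_IDS = {"up": 0, "down": 1, "unreachable": 2}
SERVICE_STATE_IDS = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}


def status_url(base, path, params):
    """Build a status URL from (name, value) pairs; names may repeat."""
    # quote_via=quote encodes a space as %20 (not '+') and '%' as %25.
    return f"{base}{path}?{urlencode(params, quote_via=quote)}"


url = status_url("http://opsview", "/status/service", [
    ("state", SERVICE_STATE_IDS["critical"]),
    ("state", SERVICE_STATE_IDS["warning"]),
    ("host", "web%"),          # % is the wildcard
    ("filter", "unhandled"),
])
print(url)
```

A list of pairs is used instead of a dictionary so that a parameter such as ''state'' can be given more than once.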
 +
 +For /status/service, extra parameters available are:
 +||htid={id}||Will show services related to host template id||
 +||bcid={id}||Will show services related to business component id||
 +||bsid={id}||Will show services related to business service id||
 +
 +In addition, you can specify an ordering for the results (ordering is not available with ''/status/host''). These can be specified using order={value}. You can specify this multiple times, with the priority from left to right. Valid orders are:
 +||**Name**  ||**Effect**||
 +||state     ||Orders by service state (order of states: OK, UNKNOWN, WARNING, CRITICAL)||
 +||service   ||Orders by service name||
 +||host      ||Orders by host name||
 +||host_state||Orders by host state (order of states: UP, DOWN, UNREACHABLE)||
 +||last_check||Orders by time of last check||
 +||last_state_change||Orders by time of last state change||
 +
 +Syntax notes: A space in the URL must be encoded as %20. For example, to match 'Unix Memory', use a URL such as http://opsview/status/service/?state=2&servicename=Unix%20Memory. The % wildcard must itself be encoded as %25. For example, to show all critical service checks whose names start with 'Unix', use http://opsview/status/service/?state=2&servicename=Unix%25
 +
 +In addition, each order can be reversed by suffixing //_desc// to the value.
 +
 +The default order is host, service.
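
The effect of repeated ''order'' values, including the //_desc// suffix, can be emulated locally. This is a sketch of the semantics only, under the assumption that rows are simple dictionaries; it is not how Opsview sorts on the server.

```python
# Rank follows the documented state ordering: OK, UNKNOWN, WARNING, CRITICAL.
STATE_RANK = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}


def sort_rows(rows, orders=("host", "service")):
    """Sort dict rows by order values; the leftmost value has highest priority."""
    def key_for(name):
        if name == "state":
            return lambda r: STATE_RANK[r["state"]]
        return lambda r: r[name]

    result = list(rows)
    # Apply keys right to left; Python's sort is stable, so the leftmost wins.
    for order in reversed(orders):
        desc = order.endswith("_desc")
        name = order[: -len("_desc")] if desc else order
        result.sort(key=key_for(name), reverse=desc)
    return result


# Hypothetical sample rows.
rows = [
    {"host": "web1", "service": "HTTP", "state": "CRITICAL"},
    {"host": "db1", "service": "MySQL", "state": "WARNING"},
    {"host": "web1", "service": "Disk", "state": "WARNING"},
    {"host": "app1", "service": "Load", "state": "OK"},
]

for row in sort_rows(rows, ["state_desc", "host"]):
    print(row["state"], row["host"], row["service"])
```

This corresponds to a URL ending in ''?order=state_desc&order=host'': the most severe states come first, and hosts are ordered by name within each state.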
 +
 +===== General URL parameters =====
 +These parameters are available across Opsview web:
 +  * include_header - if set to 0, then the top banner is removed although the page header will remain
 +  * lang - override the language variable for this page