Distributed Architecture

Opsview supports a distributed architecture, so it is possible to have slave monitoring systems, which can be clustered.

See the slave setup documentation for the process of creating a slave system.

This document explains how the distributed monitoring works.

How Slaves Communicate

All communication between the master and the slave is via an SSH tunnel.

The opsviewd process will start and monitor the SSH tunnel. If you have a security policy that forbids an SSH connection from the master to a slave, it is possible to configure Opsview so that the slave initiates the connection. See the reverse SSH documentation for instructions on how to configure this.
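To illustrate the two tunnel directions, here is a minimal sketch that builds an ssh command line forwarding port 5667. It is not the command opsviewd actually runs; the function name build_tunnel_command and the host names are hypothetical, and only the standard OpenSSH options -N, -R and -L are used.

```python
def build_tunnel_command(remote_host, port=5667, slave_initiated=False):
    """Return an ssh argument list for an NSCA tunnel (illustrative only).

    Assumption: this is not the command opsviewd really uses. It only shows
    how port 5667 on the slave can be made to reach the master in each mode.
    """
    forward = f"{port}:localhost:{port}"
    if slave_initiated:
        # Run on the slave: listen locally and forward to the master's port.
        return ["ssh", "-N", "-L", forward, remote_host]
    # Run on the master: ask the slave to listen and forward back to the master.
    return ["ssh", "-N", "-R", forward, remote_host]


if __name__ == "__main__":
    # Hypothetical host names, for illustration.
    print(" ".join(build_tunnel_command("slave1.example.com")))
    print(" ".join(build_tunnel_command("master.example.com", slave_initiated=True)))
```

In both cases the slave ends up with a local port 5667 that leads to the master; only the side that opens the SSH connection differs.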

When a service is updated on a slave, the results are placed into a log file. Every 5 seconds, Nagios® Core invokes the process-cache-data script to upload all these results to the master. This uses the send_nsca command, writing to localhost port 5667 which is tunnelled back to the master.
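The upload step can be pictured with a short sketch. This is not the process-cache-data script itself; it assumes results are held as (host, service, return code, output) tuples and formats them the way send_nsca expects service results by default: one tab-separated line per result. The function names are hypothetical.

```python
import subprocess


def format_nsca_batch(results):
    """Format (host, service, return_code, output) tuples for send_nsca.

    send_nsca reads one result per line, with fields separated by tabs.
    Tabs and newlines inside the plugin output are replaced with spaces
    so they cannot break the line format.
    """
    lines = []
    for host, service, return_code, output in results:
        clean = output.replace("\t", " ").replace("\n", " ")
        lines.append(f"{host}\t{service}\t{return_code}\t{clean}\n")
    return "".join(lines)


def send_batch(results, port=5667):
    """Pipe a batch to send_nsca on localhost (the tunnelled port)."""
    if not results:
        return
    subprocess.run(
        ["send_nsca", "-H", "localhost", "-p", str(port)],
        input=format_nsca_batch(results),
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical results, printed rather than sent.
    print(format_nsca_batch([("web1", "HTTP", 0, "HTTP OK"), ("db1", "Disk", 2, "CRITICAL")]), end="")
```

Sending results in batches every few seconds, rather than one connection per result, keeps the load on the tunnel low.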

Opsview will create a service, called Slave-node: {name}, to monitor each slave node. This service also checks other information, such as whether the time is synchronised between slave nodes.

How Cluster Nodes Communicate

A cluster node is similar, but only actively monitors a subset of the hosts assigned to the slave system. At Opsview reload time, Opsview splits the list of hosts between all the nodes in the slave cluster to balance the workload. It also works out, for each cluster node, which hosts it would take over in the case of a failure.
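As an illustration of the reload-time calculation, the sketch below splits hosts across cluster nodes and works out what each node would take over. The source does not describe the algorithm Opsview uses, so the round-robin strategy and the names split_hosts and takeover_plan are assumptions.

```python
def split_hosts(hosts, nodes):
    """Assign hosts to nodes round-robin (assumed strategy, not Opsview's)."""
    if not nodes:
        raise ValueError("at least one cluster node is required")
    assignment = {node: [] for node in nodes}
    for index, host in enumerate(sorted(hosts)):
        assignment[nodes[index % len(nodes)]].append(host)
    return assignment


def takeover_plan(assignment):
    """For each node, list the hosts it would take over per failed peer."""
    nodes = list(assignment)
    plan = {node: {} for node in nodes}
    for failed in nodes:
        survivors = [node for node in nodes if node != failed]
        if not survivors:
            continue  # A single-node cluster has nobody to take over.
        # Spread the failed node's hosts evenly over the surviving nodes.
        for index, host in enumerate(assignment[failed]):
            taker = survivors[index % len(survivors)]
            plan[taker].setdefault(failed, []).append(host)
    return plan


if __name__ == "__main__":
    # Hypothetical host and node names.
    assignment = split_hosts(["h1", "h2", "h3", "h4", "h5"], ["node-a", "node-b"])
    print(assignment)
    print(takeover_plan(assignment))
```

Computing the takeover lists in advance means each node already knows what to monitor when a peer fails, without needing to contact the master.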

Each node in the cluster will create a service called Cluster-node: {name} which monitors every other node in the cluster. Each cluster will require SSH keys to be exchanged so that each slave node can speak to all other slave nodes in the cluster. An event handler will be created which specifies the hosts that this node will take over in the case of a failure.

A limitation of Opsview is that all slaves must be working for an Opsview reload to occur. If you have a node in a slave system that has failed and will be offline, you can remove it from the slave system and reload.