Differences

This shows you the differences between two versions of the page.

opsview4.6:nrd-architecture [2014/09/09 12:19] (current)
Line 1: Line 1:
 +====== NRD Architecture ======
 +===== Overview =====
 +From Opsview 3.11.0, [[http://code.google.com/p/nrd/|NRD]], or Nagios Result Distributor, is used for sending results from slaves back up to the Opsview master.
 +NRD has a client/server design: a client runs on each slave server and sends results to the nrd process on the Opsview master.
 +
 +Opsview's implementation of NRD has the following benefits over using NSCA:
 +  * Can send multi-line plugin output back to master
 +  * A 16K limit on the amount of data sent, up from the previous 511 bytes
 +  * 50% performance improvement over NSCA communication
 +  * Results are queued on the slave, which means that sending results now happens in parallel with other Nagios® Core responsibilities
 +  * Results received on the Opsview master are written directly to the check results queue, reducing workload on Nagios Core on the master
 +  * Results are transactional, so if a sending failure occurs, the whole transaction is aborted and retried
 +  * The NRD daemon on the Opsview master can dynamically prefork extra servers based on load
 +  * Timestamp of results is now based on the time the results were stored on the slave, rather than the time of reception on the master
 +  * Packet sizes are now flexible, so only the amount of data required is sent (66% less data than NSCA)
 +
 +===== Architecture =====
 +{{:opsview4.6:nrd_architecture.png|}}
 +===== Server =====
 +The server daemon is ''/usr/local/nagios/bin/nrd''. This is started and stopped with the rest of Opsview using ''/etc/init.d/opsview''.
 +
 +The configuration file used is ''/usr/local/nagios/etc/nrd.conf''. However, this file is generated by an Opsview reload so changes made directly will be overwritten.
 +
 +Opsview runs NRD in prefork mode, so there are up to 12 nrd processes, which are created dynamically as required. If the configuration changes (for example, if the shared password has changed), the nrd process must be restarted.
 +
 +The server logs information to ''/var/log/opsview/opsviewd.log''. You can increase the logging by editing ''/usr/local/nagios/etc/nrd.conf'' and setting the ''log_level'' attribute to 4 for maximum debug output; this requires a restart of the NRD daemon, and the change will be overwritten by the next Opsview reload. As Log4perl has its own filtering capabilities, you will also need to set the following in ''/usr/local/nagios/etc/Log4perl.conf'':
 +<code>
 +log4perl.logger.nrd=DEBUG
 +</code>
 +
 +Changes to this file do not require a restart of NRD, but can take up to 30 seconds to be recognised.
 +
 +Information is encrypted between the client and server based on a shared password. This shared password can be changed in the [[opsview4.6:configuration_files#nrd_shared_password|opsview.conf]] file.
 +
 +
 +===== Client =====
 +The client is implemented as a daemon called ''import_slaveresultsd''. You can also send a result back to NRD on the master manually using:
 +<code>
 +printf "host1\t0\toutput message" | /usr/local/nagios/bin/send_nrd -c /usr/local/nagios/etc/send_nrd.cfg
 +</code>
 +However, ''import_slaveresultsd'' is a long-running daemon that includes all the necessary libraries, so there is no invocation penalty for every result.
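 +
 +As an illustration of the manual route, the sketch below builds the same tab-separated host result line as the ''printf'' example and pipes it to ''send_nrd''. The helper names ''format_host_result'' and ''send_host_result'' are hypothetical, and only the host result layout shown above (host, return code, output) is assumed; other input layouts are not covered here.
 +
```python
import subprocess

# Paths taken from the manual example above.
SEND_NRD = "/usr/local/nagios/bin/send_nrd"
SEND_NRD_CFG = "/usr/local/nagios/etc/send_nrd.cfg"


def format_host_result(host, return_code, output):
    """Build one host result line: host, return code and output, tab-separated."""
    if "\t" in host:
        raise ValueError("host name must not contain a tab")
    # Tabs separate the fields, so replace any tab inside the plugin output.
    safe_output = output.replace("\t", " ")
    return "\t".join([host, str(int(return_code)), safe_output])


def send_host_result(host, return_code, output):
    """Pipe a single result to send_nrd, as the printf example does.

    Hypothetical helper: it starts one send_nrd process per result, which is
    exactly the per-invocation cost the daemon avoids.
    """
    line = format_host_result(host, return_code, output)
    return subprocess.run(
        [SEND_NRD, "-c", SEND_NRD_CFG],
        input=line,
        text=True,
        check=True,  # raise CalledProcessError if send_nrd exits non-zero
    )
```
 +
 +Calling ''send_host_result("host1", 0, "output message")'' sends the same bytes as the ''printf'' pipeline above.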
 +
 +The daemon continually checks the directory ''/usr/local/nagios/var/slaveresults'' for any new files. This directory contains results written by Nagios Core every 5 seconds, with files created based on the timestamp of creation.
 +
 +When the daemon finds files, it processes them oldest first. For each file, it will:
 +  * discard any file older than the value of ''$slave_results_max_cache'' in ''opsview.conf'' (5 minutes by default)
 +  * send the file up to the master
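 +
 +The two steps above can be sketched as a function that decides what to do with each file. This is not the daemon's actual code: ''plan_files'' is a hypothetical name, and measuring a file's age by its modification time is an assumption (the real daemon may derive the age differently).
 +
```python
import os
import time

# Default of $slave_results_max_cache quoted above: 5 minutes.
MAX_CACHE_SECONDS = 5 * 60


def plan_files(directory, max_age=MAX_CACHE_SECONDS, now=None):
    """Return (to_discard, to_send); both lists are ordered oldest first.

    Assumption: a file's age is taken from its modification time.
    """
    now = time.time() if now is None else now
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file():
                entries.append((entry.stat().st_mtime, entry.path))
    entries.sort()  # oldest modification time first

    to_discard, to_send = [], []
    for mtime, path in entries:
        if now - mtime > max_age:
            to_discard.append(path)  # too old to be worth sending
        else:
            to_send.append(path)
    return to_discard, to_send
```
 +
 +A caller would delete everything in the first list and send the files in the second list in order.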
 +
 +The client log file is located at ''/usr/local/nagios/var/log/opsview-slave.log''.
 +
 +You can increase the debugging of ''import_slaveresultsd'' by uncommenting the following line in the file ''/usr/local/nagios/etc/Log4perl-slave.conf'':
 +<code>
 +#log4perl.logger.import_slaveresultsd=DEBUG
 +</code>
 +
 +This does not require a restart of the daemon, though the change could take up to 30 seconds to be recognised.
 +
 +Be aware that, with this option enabled, every successfully sent result is archived to the directory ''/usr/local/nagios/var/slaveresults.archive'', so please remember to disable it again afterwards.
 +
 +===== Communication Flow =====
 +When a client connects, there is an exchange of initial information before results are sent. Internally, the results are sent as JSON data but are encrypted.
 +
 +The server will receive each result and write all the results into Nagios' checkresults directory (bypassing the Nagios Core named pipe which is a known bottleneck). The results are not considered "ready" until the client signals the end of the data, thus ensuring data integrity.
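 +
 +The "not ready until the end of the data" behaviour can be illustrated with a write-then-rename pattern. This page does not describe how NRD marks a batch as ready, so the sketch below swaps in one common technique rather than NRD's own mechanism; ''ResultBatch'' and the file name prefixes are hypothetical.
 +
```python
import os
import tempfile


class ResultBatch:
    """Collect the results of one client connection and publish them together."""

    def __init__(self, spool_dir):
        self.spool_dir = spool_dir
        # Write into a hidden temporary file; a reader that skips dot-files
        # never sees a half-written batch.
        fd, self.tmp_path = tempfile.mkstemp(prefix=".partial-", dir=spool_dir)
        self.handle = os.fdopen(fd, "w")

    def add(self, result_text):
        """Append one received result to the pending batch."""
        self.handle.write(result_text + "\n")

    def commit(self):
        """Publish the batch once the client has signalled end of data."""
        self.handle.flush()
        os.fsync(self.handle.fileno())
        self.handle.close()
        suffix = os.path.basename(self.tmp_path)[len(".partial-"):]
        final_path = os.path.join(self.spool_dir, "result-" + suffix)
        # A rename within one directory is atomic on POSIX file systems.
        os.rename(self.tmp_path, final_path)
        return final_path

    def abort(self):
        """Drop the batch, for example when the connection fails part-way."""
        self.handle.close()
        os.unlink(self.tmp_path)
```
 +
 +If the connection drops before ''commit()'' is called, ''abort()'' leaves nothing behind, so the client can retry the whole transaction.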
 +
 +
 +===== Monitoring =====
 +Monitoring is done on the Opsview master via the //Slave-node// checks. These return the number of backlogged files that exist in the results directory, and alert if the oldest file is older than 70 seconds, as this means the daemon is not working.
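 +
 +The logic of such a check can be sketched as follows. This is not the //Slave-node// plugin itself: ''check_backlog'' is a hypothetical name, file age is assumed to come from the modification time, and returning CRITICAL (2, following the usual Nagios plugin exit codes) for a stale file is an assumption about severity.
 +
```python
import os
import time

# Threshold quoted above: the oldest file should not be older than 70 seconds.
STALE_SECONDS = 70


def check_backlog(directory, stale_after=STALE_SECONDS, now=None):
    """Return (status, message), where status is 0 for OK and 2 for CRITICAL."""
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        mtimes = [entry.stat().st_mtime for entry in it if entry.is_file()]

    count = len(mtimes)
    if count == 0:
        return 0, "OK - 0 backlogged files"

    oldest_age = int(now - min(mtimes))
    if oldest_age > stale_after:
        return 2, "CRITICAL - %d backlogged files, oldest is %d seconds old" % (
            count, oldest_age)
    return 0, "OK - %d backlogged files" % count
```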
 +
 +
 +===== Troubleshooting =====
 +==== Slave results not getting to master ====
 +If you get errors on the master in ''/var/log/opsview/opsviewd.log'' like:
 +<code>
 +[2012/01/31 13:48:01] [nrd] [ERROR] Couldn't unserialize a request: malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "\x{19cb}\x{285}E\x{529}...") at /usr/local/nagios/bin/../perl/lib/NRD/Serialize/plain.pm line 28, <GEN25> line 4.
 +</code>
 +Then restart NRD on the master with ''/etc/init.d/opsview restart''.
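 +
 +To see whether this error is occurring, and when, the log can be scanned for matching lines. The sketch below assumes log lines keep the ''[timestamp] [nrd] [ERROR]'' layout shown in the example above; ''find_unserialize_errors'' is a hypothetical helper, not part of Opsview.
 +
```python
import re

# Matches the start of the error line shown in the example above.
UNSERIALIZE_ERROR = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[nrd\] \[ERROR\] Couldn't unserialize a request"
)


def find_unserialize_errors(lines):
    """Return the timestamps of all NRD unserialize errors in the given lines."""
    timestamps = []
    for line in lines:
        match = UNSERIALIZE_ERROR.match(line)
        if match:
            timestamps.append(match.group("timestamp"))
    return timestamps


def scan_log(path="/var/log/opsview/opsviewd.log"):
    """Scan the master log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_unserialize_errors(handle)
```
 +
 +A non-empty result from ''scan_log()'' suggests the restart described above is needed.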