
Runtime Database

Opsview relies on updates to the Runtime database to hold the latest status data. Opsview uses NDOUtils to handle the interaction between Nagios® Core and the database.

Architecture Diagram

Nagios Core writes its data to the /usr/local/nagios/var/ndo.dat file via the ndomod broker module. This file is moved into the /usr/local/nagios/var/ndologs directory every 5 seconds. (In Opsview 2.14 and earlier, the log file was also rotated on every failed host state; this was patched because of the blocking nature of host checks in Nagios Core 2.x.)
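To see this in action, you can list the ndologs directory and watch files appear as Nagios Core rotates its output (a minimal sketch, assuming the standard Opsview file layout described above):

# List the most recently rotated NDO log files
ls -lt /usr/local/nagios/var/ndologs | head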

The main daemon is import_ndologsd. It controls the importing of data into the ndo2db daemon, which handles the interaction with the Runtime database.

The interface between Nagios Core and import_ndologsd is the set of log files held in /usr/local/nagios/var/ndologs.
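A quick way to confirm that both daemons are running (a sketch, assuming the standard process names):

# Both import_ndologsd and ndo2db should appear in the process list
ps -ef | egrep 'import_ndologsd|ndo2db' | grep -v grep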

Note that if there is a problem with the import process, Nagios Core will continue to run, monitor and alert. However, the status screens in Opsview may not be up to date.

Troubleshooting

Where are the debug logs?

import_ndologsd writes its debug messages to /var/log/opsview/opsviewd.log.

ndo2db writes its log messages to syslog using the user facility, so you will need to configure syslog to capture them appropriately.
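For example, with a traditional syslogd you could direct the user facility to its own file (a sketch; the target path is an assumption and the syntax varies between syslog implementations):

# /etc/syslog.conf - send ndo2db messages (user facility) to a dedicated file
user.*                                          /var/log/ndo2db.log

Remember to reload or restart the syslog daemon after changing its configuration.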

Can I get extra debug?

You can uncomment the following line in /usr/local/nagios/etc/Log4perl.conf. It may take up to 30 seconds for the daemon to recognise the change:

log4perl.logger.import_ndologsd=DEBUG

You will get timing and size information for every log file imported.
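With DEBUG enabled, you can follow the import timing messages as they are written (a sketch, assuming the default log location given above):

tail -f /var/log/opsview/opsviewd.log | grep import_ndologsd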

Note that a DEBUG level will also copy any NDO log files from /usr/local/nagios/var/ndologs into /usr/local/nagios/var/ndologs.archive, so do not leave the debug level on for longer than necessary as this will use a lot of disk space. If you have issues with slow Runtime performance, gather these files over a reload and escalate to Opsview support for further investigation.
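One way to bundle up the archived files collected over a reload before escalating (a sketch; the output filename is arbitrary):

# Collect the NDO files archived while DEBUG was enabled
tar czf /tmp/ndologs-archive-$(date +%Y%m%d).tar.gz -C /usr/local/nagios/var ndologs.archive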

What does "Import of 1257049915.210140, size=1123, took 5.63 seconds > 5 seconds" mean?

If you get log entries such as:

[2009/10/06 21:08:28] [import_ndologsd] [WARN] Import of 1254859702.684564, size=530288, took 6.21 seconds > 5 seconds

This is usually fine: a large import (around 530KB in the example above) is taking place, most likely at the end of the reload process.

If you have entries where the size is small and the time is long, then there is probably some database tuning you can do, for example:

[2009/11/01 04:32:01] [import_ndologsd] [WARN] Import of 1257049915.210140, size=1123, took 5.63 seconds > 5 seconds
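As a starting point for tuning (a sketch only; these are standard MySQL variables rather than Opsview-specific settings, and the right values depend on your MySQL version and workload), you can check a couple of commonly adjusted parameters on the Runtime database server:

mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"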

How do I know that the updates are timely?

In a default install of Opsview, a service check called Opsview NDO will be created and associated with the opsview host.

This service check uses the check_opsview_ndo_import plugin to collect information.

We recommend that you set up appropriate notifications for this service check so that you are informed when imports are not reaching the database in time.
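You can also run the plugin by hand on the Opsview server to see the current import status (a sketch; the plugin path and the absence of required arguments are assumptions, check the plugin's documentation for its exact usage):

su - nagios -c '/usr/local/nagios/libexec/check_opsview_ndo_import'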

What happens when the database is down?

In normal operation, you will see messages like this in syslog:

Nov  3 08:13:48 localhost ndo2db: Successfully connected to MySQL database
Nov  3 08:13:48 localhost ndo2db: Successfully disconnected from MySQL database

These messages are logged at DEBUG level, so you can configure syslog to filter them out.
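For example, with a traditional syslogd you can restrict the selector from the earlier example to info and above, which drops these DEBUG connection messages (a sketch; syntax varies between syslog implementations):

# /etc/syslog.conf - log user facility messages at info level and above only
user.info                                       /var/log/ndo2db.log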

In Opsview versions before 3.3.2, ndo2db would raise an error and drop the log file.

From Opsview 3.3.2 onwards, if the database connection is down, the import process is blocked until the database is available again.

You will see errors like this in syslog:

Nov  3 08:15:33 localhost ndo2db: Error: Could not connect to MySQL database: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
Nov  3 08:15:33 localhost ndo2db: Did not find instance_name 'default' - retrying

Also, file2sock will hang while it writes to the ndo.sock socket. When the database recovers, the import continues.
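To check whether the database is reachable again and how large the backlog of unimported log files is (a sketch, assuming a local MySQL server and the standard ndologs path):

# Is MySQL answering?
mysqladmin -u root -p status
# How many NDO log files are waiting to be imported?
ls /usr/local/nagios/var/ndologs | wc -l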

I had a database problem - how can I get the imports to catch up?

If you have a major database problem and the ndologs directory has many entries, then you can either:

  • leave the import process to continue, so that all the data is inserted, and wait for the latest status data to be displayed
  • or remove the log files from the directory, thus getting the latest status data into the Runtime database straight away, but losing state history data (see the sketch after this list)
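If you choose the second option, a minimal sketch of discarding the backlog (this permanently discards the state history held in those files):

rm -f /usr/local/nagios/var/ndologs/*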

You can get the latest database status time by hovering over the Last Updated value in the status bar (when a refresh occurs). For example, it could read:

(Server status: 2009-11-03 09:02:08)

Can I tell how long updates have historically taken?

A query like this shows all the imports taking more than 5 seconds:

select conninfo_id, connect_time, timediff(disconnect_time,connect_time) as duration from nagios_conninfo
where time_to_sec(timediff(disconnect_time,connect_time)) > 5
order by connect_time desc limit 10