[icinga-users] Passive Services/Hosts stale after many reloads
denis.simonet at adfinis-sygroup.ch
Tue Dec 18 17:20:45 CET 2012
We use Icinga to monitor more than 400 hosts. Active checks work fine, but we sometimes run into problems with passive hosts and services. The issue is somewhat complicated because many components are involved, and we have not yet found the actual cause. So let's start with a description of our setup.
The Icinga configuration files (object files) are generated automatically by a web front end we wrote ourselves. Its export function writes the Icinga configuration files and then runs /etc/init.d/icinga reload. In addition, around 15 satellites send check results to the central Icinga server via NSCA.
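For reference, the passive results that NSCA receives from the satellites are written into Icinga's external command file (the named pipe) as PROCESS_SERVICE_CHECK_RESULT lines. The following minimal Python sketch shows that line format. The pipe path COMMAND_FILE and the helper names are hypothetical; the real path comes from the command_file setting in icinga.cfg. Note also that on POSIX systems, opening a FIFO for writing blocks until some process has it open for reading.

```python
import time
from typing import Optional

# Hypothetical path of the Icinga external command file (named pipe);
# check command_file in icinga.cfg for the real location.
COMMAND_FILE = "/var/lib/icinga/rw/icinga.cmd"

def format_service_result(host: str, service: str, return_code: int,
                          output: str, timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")

def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write one command to the pipe; open() blocks until a reader exists."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Writing a line like this by hand is one way to check whether the pipe accepts results while a reload is in progress.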
Our problem seems to be related to reloads. After many Icinga reloads over a period of time, Icinga at some point considers (almost?) all passive hosts and services stale and triggers the configured error command, which returns a CRITICAL state. The result is roughly 300 alerts at the same time.
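To make the staleness condition concrete, here is a simplified Python model of a freshness test. It assumes that a passive service counts as stale once its last received result is older than its freshness_threshold plus some extra slack. The slack is an assumed 15 seconds, loosely mirroring Icinga's additional_freshness_latency setting. The real Icinga 1.x logic has further cases, such as automatically computed thresholds. PassiveService, is_stale and stale_services are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class PassiveService:
    name: str
    last_result: float        # epoch seconds of the last received passive result
    freshness_threshold: int  # seconds, as set in the object definition

def is_stale(svc: PassiveService, now: float, additional_latency: int = 15) -> bool:
    """Simplified test: stale once the last result exceeds threshold + slack."""
    return now - svc.last_result > svc.freshness_threshold + additional_latency

def stale_services(services: Iterable[PassiveService], now: float,
                   additional_latency: int = 15) -> List[str]:
    """Return the names of all services the simplified model considers stale."""
    return [s.name for s in services if is_stale(s, now, additional_latency)]
```

Even in this simplified form, the model shows a useful property. Services that share a threshold and whose last-result times lie close together will all cross the threshold within the same short window.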
Does anyone have an idea why this happens from time to time? We managed to reproduce the situation with a stress test that reloaded Icinga every 10 seconds for 20 minutes. Shortly after we stopped the test, the alerts began. It would help if you could point us in a direction: is it the Icinga configuration, NSCA, the named pipe, the object definitions, or the general setup?
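The stress test described above can be sketched as a simple reload loop. This is a minimal Python version, assuming the same /etc/init.d/icinga reload command the export uses. The function name stress_reload is hypothetical, and the run and sleep parameters exist only so the loop can be exercised without touching a real system. Reloading every 10 seconds for 20 minutes amounts to 120 reloads.

```python
import subprocess
import time
from typing import Callable, List

# Reload command as used by the export function.
RELOAD_CMD = ["/etc/init.d/icinga", "reload"]

def stress_reload(interval: float = 10, duration: float = 20 * 60,
                  run: Callable[[List[str]], object] = subprocess.run,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Reload Icinga every `interval` seconds for `duration` seconds.

    Returns the number of reloads issued.
    """
    reloads = int(duration // interval)
    for _ in range(reloads):
        run(RELOAD_CMD)   # exit status is ignored, like a plain shell loop
        sleep(interval)
    return reloads
```

Run as root with the defaults, this issues 120 reloads. Afterwards, you can watch for the stale alerts that appear once the loop has finished.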
If you need any other details about our setup, please let me know.
Thank you in advance.
Adfinis SyGroup AG
Denis Simonet, Software Engineer
Brückfeldstrasse 21 | CH-3012 Bern
Tel. 031 550 31 11